Today, thanks to major recent leaps in AI capabilities, technology based on computer vision (CV) powers key functionalities in many apps and devices, including those that people use every day.
Facial recognition biometrics software would be nowhere without computer vision. Autonomous vehicles require it for safe driving. Medical image analysis and robotic QA inspection processes both need computer vision. And Shopic, the company I work for, uses it to identify items placed in grocery shopping carts.
These technologies essentially leverage CV to automate useful processes using visual data as the input, with systems comprising scanning hardware and algorithms that allow machines to analyze, process, and extract information from digital images and videos. CV relies heavily on machine learning, deep learning, and complex neural networks.
In recent months, computer vision has been evolving in new and exciting ways. To some, these changes are also disturbing, but I see a sector that’s full of potential and opportunity. Here are some of the stand-out trends that I see as dominating the industry now and for the foreseeable future.
The bar to access is creeping downwards
Ongoing advances in edge computing are producing edge devices, such as digital cameras and visual sensors, that can run CV processing “on site” instead of sending visual data to the cloud.
Keeping the processing on the device lowers latency and reduces energy and bandwidth consumption. Companies like Nvidia are working on edge cloud services to improve the deployment of CV assets.
With less reliance on expensive cloud processing and storage, adoption costs are falling, making CV systems more accessible and affordable. The shift to edge processing also improves data privacy for computer vision apps, easing compliance issues that previously dogged adoption for many companies.
Computer vision is spreading to more use cases
As the barrier to adoption falls, more verticals are implementing computer vision in ever more use cases. Healthcare is an eager adopter, with CV enabling more accurate imaging diagnostics and telehealth services, and promising to make robotic surgery a reality.
We’re seeing first-hand the growth in retail CV use cases, including self-serve shopping systems and cashierless stores. Autonomous vehicles and road safety devices are implementing CV to increase safety, farmers are adopting CV for crop monitoring and disease detection, and visual simultaneous localization and mapping (vSLAM) systems are using it to deliver more accurate mapping for disaster relief, weather predictions, and more.
As use cases stack up, the market will expand: Global Data predicts that the CV market will grow from $17.73 billion in 2023 to $30.3 billion in 2026.
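To put that forecast in perspective, it helps to convert the two figures into an implied annual growth rate. The short sketch below does that arithmetic; the `cagr` helper is my own illustrative function, not part of the forecast, and it assumes smooth compound growth over the three years between 2023 and 2026.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over a span of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted above, in billions of US dollars.
rate = cagr(17.73, 30.3, years=2026 - 2023)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, the forecast works out to just under 20% growth a year.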
Systems are becoming more sophisticated
As the AI algorithms that underpin computer vision solutions grow more powerful, CV systems will improve their ability to recognize objects and faces. This will bring the capacity to detect emotions and track physical movement with greater precision, opening up new capabilities in behavioral insights and anomaly detection. While surveillance of individuals is highly problematic and has come under scrutiny in recent years, ethically safe use cases include anonymized sentiment analysis of crowds.
We’re also seeing computer vision systems develop innovative new models and methods for image processing. Today’s “attention models” are essentially input processing techniques that enable neural networks to focus on defined aspects of a complex input, so CV systems can understand each part of a busy image or video. In addition, “graph neural networks” apply deep learning predictions to rich data structures that map the relationships between different objects, enhancing CV capabilities to comprehend and interpret context.
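For readers who want to see what “focusing” means in practice, here is a minimal sketch of scaled dot-product attention, one common form of attention, written in plain Python. It is an illustration under simplifying assumptions rather than a description of any particular production model: the three “patches” are made-up two-number feature vectors standing in for regions of an image, and the function names are my own.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to every key decides how much each value counts.
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Three hypothetical image-patch feature vectors (made-up numbers).
patches = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
query = [1.0, 0.0]  # "what am I looking for?"
output, weights = attend(query, patches, patches)
print(weights)
```

The patches most similar to the query receive the largest weights, so the output is dominated by the parts of the input that matter for that query. Real models learn the query, key, and value projections and run many such computations in parallel.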
Connecting visual data with data from other sources, meanwhile, enriches the broader context and understanding of events, just as humans combine sight with the feel of the wind, a sense of speed, and ambient background sounds to make sense of the world around us. In this way, CV solutions can understand and extract insights from a whole scene within a broader context, not just from selected segments.
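One simple way to combine sources is “late fusion,” where each source scores the same set of labels and the scores are blended. The sketch below assumes a hypothetical setup with a camera model and a second, non-visual sensor; the labels, probabilities, and `fuse_scores` helper are all invented for illustration and do not describe any specific product.

```python
def fuse_scores(vision_probs, sensor_probs, vision_weight=0.7):
    """Late fusion: weighted average of per-label probabilities from two sources."""
    if vision_probs.keys() != sensor_probs.keys():
        raise ValueError("both sources must score the same labels")
    if not 0.0 <= vision_weight <= 1.0:
        raise ValueError("vision_weight must be between 0 and 1")
    return {
        label: vision_weight * vision_probs[label]
        + (1 - vision_weight) * sensor_probs[label]
        for label in vision_probs
    }


# Made-up scores: the camera is unsure, the second sensor is more decisive.
vision = {"apple": 0.55, "tomato": 0.45}
other_sensor = {"apple": 0.20, "tomato": 0.80}

fused = fuse_scores(vision, other_sensor, vision_weight=0.5)
best = max(fused, key=fused.get)
print(best, fused)
```

Here the second source tips an ambiguous visual call toward the other label. In practice the weighting would be tuned or learned, and more sophisticated systems fuse features inside the model rather than averaging final scores.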
This maturity will enable more accurate interpretation and analysis, improved decision making, and greater efficacy in complex and fast-changing situations such as a busy manufacturing plant or urban street.
AR is entering a new era
Today’s augmented reality (AR) solutions can produce an interactive 3D reproduction of any real environment, and adjust it in a limited way by tracking changing light on flat surfaces. They can respond to user movements through head tracking and controllers, but that’s as far as it goes.
However, the integration of computer vision cameras with eye-tracking solutions and gyroscopes is starting to produce more intricate systems. CV-enhanced AR solutions can perceive the user’s entire surroundings, direct the user away from obstacles, adapt the virtualized environment to the user’s body movements, and more.
This has important implications for disability assistance devices, direction apps, and gaming/metaverse experiences.
But challenges remain
While computer vision has been seeing tremendous growth, and the future is looking bright, there are still challenges that leaders in the space need to contend with. Because CV is a relatively new field in the business arena, there’s a shortage of specialists who can oversee development and rollout at scale. Companies need to upskill employees to meet these needs.
As an industry, we also need to do better at addressing concerns around privacy, trust, and ethical use. Data collection that’s either explicitly opted into or truly anonymized is the key here. Computer vision systems need to comply with rapidly evolving privacy regulations and meet public demand for privacy.
There are increasing demands for AI transparency and explainability. CV cannot remain a black box, but the AI models it relies upon are so complex that it is difficult to make them explainable. “Those responsible for putting AI systems in place will work harder to ensure that they are able to explain how decisions are made and what information was used to arrive at them,” writes Bernard Marr, but as CV grows more complex, so will the challenge.
Computer vision is still taking shape
Like other AI solutions, computer vision is advancing at the speed of light, and new methods, applications, use cases, and capabilities are appearing all the time. It’s hard to predict what is yet to come in such fast-changing circumstances.
While there are many challenges to overcome, particularly around privacy and ethical use, CV systems promise to deliver new functionalities for numerous verticals, effectively opening up new opportunities for all invested parties.
About the Author
Evyatar Ben-Shitrit is the Director of Innovation at Shopic, a leading provider of smart cart solutions for medium and large grocery chains. Shopic powers the world’s largest smart cart deployments. Its intelligent retail solutions bring the advantages of online commerce to physical supermarkets worldwide. Shopic’s unique clip-on device uses computer vision to turn any regular shopping cart into a smart cart for the duration of a visit. It delivers a personalized shopping experience with instant on-cart checkout, an effective in-store retail media channel, and actionable insights from real-time analysis of shopper carts and store shelves. Shopic’s solutions are pragmatic, immediately deployable, operational with minimal store adjustments, and deliver swift ROI. Crucially, they’re among the only frictionless solutions that work for medium and large-sized stores.
Featured image: ©Alexander