Self-Driving Data: The Role of Automation in Autonomous Vehicles

Apple recently shared plans to develop a self-driving car by 2024

The tech giant’s watershed announcement has reignited interest in the autonomous car market, a burgeoning industry projected to surpass $65.3 billion by 2027. Traditional automotive OEMs are feeling the Silicon Valley heat edging closer to their core business.

While many drivers are still wary of self-driving vehicles, the industry’s growth augurs well for public safety, as more than 90% of all vehicular accidents are due to human error.

Despite our fallibility, the human brain is hardwired to process massive amounts of stimuli and make split-second decisions. Autonomous vehicles must possess a similar ability to understand the fixed rules of the road—recognizing traffic signs, road markings, and speed limits—while simultaneously identifying, assessing, and reacting to countless unexpected variables—foreign objects, other cars, bikers, pedestrians and more.

It falls on us to teach them how.

The extent of these necessary perception capabilities and safety layers has pushed back earlier expectations for the self-driving launch timeline, a gradual rollout that has been likened to the arduous rise of cellular coverage. But there are still clear indications that self-driving cars are gradually moving towards production, and as autonomous technology continues to mature, these machines inch closer and closer to matching, and someday even surpassing, our human synapses.

The Role of Automation

Autonomous vehicles begin and end with data—from the moment a vehicle’s sensors capture an image, a sound, or even a tactile sensation, a complex process of recognition, action determination and response occurs. And the ability for a vehicle to simply obtain this incoming information — capturing sights, sounds and feelings on the road — is not enough. All that data must be recognized, verified, and validated in a manner that is fast enough and smart enough to ensure that all safety and technical requirements are met.
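To make that loop concrete, here is a minimal Python sketch of a single capture-recognize-decide-respond cycle. Everything in it is a simplified stand-in: the Detection type, the stubbed perceive, decide, and act functions, and the thresholds are illustrative assumptions, not a production AV stack.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified stand-ins for a vehicle's perception and control stack.
@dataclass
class Detection:
    label: str         # e.g. "pedestrian", "stop_sign"
    distance_m: float  # estimated distance from the vehicle

def perceive(frame: bytes) -> List[Detection]:
    """Recognition step: turn raw sensor input into structured observations.
    In a real stack this is a set of trained neural networks; here it is stubbed."""
    return [Detection("pedestrian", 12.0), Detection("speed_limit_50", 40.0)]

def decide(detections: List[Detection], speed_kph: float) -> str:
    """Action determination: apply rules and learned policy to the observations."""
    if any(d.label == "pedestrian" and d.distance_m < 15 for d in detections):
        return "brake"
    if speed_kph > 50 and any(d.label.startswith("speed_limit") for d in detections):
        return "slow_down"
    return "maintain"

def act(command: str) -> None:
    """Response: hand the decision to the vehicle's actuators."""
    print(f"actuator command: {command}")

# One iteration of the capture -> recognize -> decide -> respond cycle.
frame = b"<raw camera frame>"
act(decide(perceive(frame), speed_kph=60))
```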


This means that for a vehicle to be truly capable of driving without human control, or even with limited human intervention, autonomous systems must essentially be taught to understand the stimuli presented in real time. That requires many different neural networks working together, performing the many different perception tasks that our brains handle seamlessly.
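One way to picture this is a set of task-specific networks all consuming the same frame. The sketch below is a hypothetical illustration; the task names and the stubbed "networks" are assumptions made for readability, not a real perception architecture.

```python
from typing import Callable, Dict, List

# Hypothetical per-task "networks" (stubbed as plain functions): in practice each
# would be a separately trained model with its own dataset and metrics.
def detect_traffic_signs(frame: bytes) -> List[str]:
    return ["stop_sign"]

def detect_lane_markings(frame: bytes) -> List[str]:
    return ["solid_white_right"]

def detect_road_users(frame: bytes) -> List[str]:
    return ["cyclist", "car"]

PERCEPTION_TASKS: Dict[str, Callable[[bytes], List[str]]] = {
    "traffic_signs": detect_traffic_signs,
    "lane_markings": detect_lane_markings,
    "road_users": detect_road_users,
}

def run_perception(frame: bytes) -> Dict[str, List[str]]:
    """Run every task network on the same frame and collect the results,
    mirroring the perception tasks our brain performs in parallel."""
    return {task: network(frame) for task, network in PERCEPTION_TASKS.items()}

print(run_perception(b"<raw camera frame>"))
```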


Training this multi-network system requires a powerful data management system, one capable of absorbing unstructured data as its input and managing the creation of the high-quality source datasets used to train these networks, where every perception task comes with its own definition, performance requirements, and labelling instructions. Once you start diving deep into this world, you find that the number of tasks is huge, and each one comes with its own edge and corner cases. Take identifying a police officer's instructions as an example: officers look different in every country and city, and in many cases the hand signals they use to direct traffic have their own nuances.
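A rough way to think about what such a system has to track per task is a record like the one below. The schema and field names are hypothetical, chosen only to show that each task carries its own definition, performance requirement, labelling instructions, and list of edge cases; they do not describe any particular platform.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema for specifying a perception task before any data is labelled.
@dataclass
class PerceptionTask:
    name: str
    definition: str                 # what the network must recognize
    performance_requirement: float  # e.g. minimum accuracy on a validation set
    labelling_instructions: str     # guidance given to human annotators
    edge_cases: List[str] = field(default_factory=list)

police_gestures = PerceptionTask(
    name="police_hand_signals",
    definition="Detect an officer directing traffic and classify the instruction.",
    performance_requirement=0.995,
    labelling_instructions="Box the officer, tag the signalled instruction.",
    edge_cases=[
        "uniforms differ by country and city",
        "night-time reflective gear",
        "hand signals with regional nuances",
    ],
)
```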

Generating these many datasets is a human-labor-intensive process, and that human involvement is crucial in guiding the machine through the many edge and corner cases: every AV program has thousands of people working on the backend to annotate and decipher hundreds of thousands of highly complex images at 99.5% accuracy, a rate that can only be ensured through rigorous testing and validation. Creating human-machine data pipelines becomes critical, both for automating the labor and for monitoring the quality and correctness of these training sets.
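A simplified sketch of the quality-monitoring side of such a pipeline might look like the following: re-check a random sample of annotations against trusted "gold" labels and send the batch back for rework if the estimated accuracy falls below the 99.5% target. The function name, sample size, and toy data are assumptions for illustration, not a specific product's workflow.

```python
import random
from typing import Dict

def audit_batch(annotations: Dict[str, str],
                gold: Dict[str, str],
                sample_size: int = 200,
                target_accuracy: float = 0.995) -> bool:
    """Estimate annotation accuracy by re-checking a random sample against
    trusted 'gold' labels and flag the batch if it misses the target.
    A minimal sketch; real pipelines also track per-annotator and per-class stats."""
    sampled = random.sample(list(gold), min(sample_size, len(gold)))
    correct = sum(annotations.get(item) == gold[item] for item in sampled)
    accuracy = correct / len(sampled)
    print(f"sampled accuracy: {accuracy:.3%}")
    return accuracy >= target_accuracy

# Toy usage with made-up item IDs and labels.
gold = {f"img_{i}": "pedestrian" for i in range(20)}
annotations = dict(gold, img_7="cyclist")  # one annotator disagreement
if not audit_batch(annotations, gold, sample_size=20):
    print("batch failed QA: route back to annotators for rework")
```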

We were once promised self-driving technology in production by 2018, a date that slipped to 2021; Apple now aims for 2024. The process described above is at the core of these delays in reaching the market: achieving 99.5% quality assurance confidence in the data annotation process is an extreme challenge in time, cost, and risk management, and the gradual edge case coverage needed for labelling and training, involving both humans and machines, is very slow.

There are, however, software tools that provide algorithmic frameworks to automate this process. These rapidly evolving technologies are becoming critical components in AV time to market and, more generally, in the time to market of all AI systems. Developing a top-level AI system is a long and expensive undertaking, one that effectively prevents most businesses from adopting the technology and staying competitive; without the proper data management, labeling, and pipeline platforms, AI will stay out of reach for most businesses out there.
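A common automation pattern in such tools is model pre-annotation with confidence-based routing: let a model propose labels, auto-accept the confident ones, and queue the uncertain ones for human annotators. The sketch below illustrates the idea with a stubbed model and a made-up confidence threshold; it is not the interface of any particular platform.

```python
from typing import Dict, List, Tuple

def model_preannotate(item: str) -> Tuple[str, float]:
    """Stand-in for an automatic pre-labelling model: returns (label, confidence).
    Illustrative only; any real tool would expose its own interface."""
    return ("car", 0.97) if item.endswith("0") else ("pedestrian", 0.62)

def route_items(items: List[str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Keep confident machine labels, send uncertain ones to human annotators."""
    queues: Dict[str, List[str]] = {"auto_accepted": [], "human_review": []}
    for item in items:
        label, confidence = model_preannotate(item)
        queue = "auto_accepted" if confidence >= threshold else "human_review"
        queues[queue].append(f"{item}:{label}")
    return queues

print(route_items([f"frame_{i}" for i in range(5)]))
```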

Conditional automation: Human in the loop

If ongoing AV initiatives hope to take hands off the steering wheel for good, it is crucial that they maintain a hands-on approach—human accountability and safety monitoring must remain part of the driving process, even after this responsibility transitions from the people behind the wheel to the people behind the tech.

In order to ensure the highest levels of safety, autonomous vehicles will require continuous, intuitive decision making throughout the driving experience, from the moment a car turns on. As developers consider the ways that the plethora of incoming signals can be managed, processed, analyzed and shared in real time, they must ask themselves several key questions: How much can we rely on AI to handle this task? How much will consumers trust AI to handle this task? How much human involvement is necessary—and for how long—as AI gets better and better at handling this task?

With questions like these in mind, it is safe to say that autonomous driving will only enter the mainstream in a serious, impactful way through a “humans in the loop” approach. The expectation is that self-driving cars will need to be assisted and monitored by human intervention for the foreseeable future, while the question of whether real-time remote human assistance (as opposed to post-drive monitoring) will remain continually necessary is still open. For now, every time you intervene with Tesla Autopilot, you are labelling that scenario for Tesla and, through the scenario data, teaching the machine that something went wrong.
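Conceptually, each such intervention can be captured as a labelled scenario record. The sketch below shows one hypothetical way to log a takeover event for later retraining; the field names and format are assumptions for illustration and do not reflect Tesla's actual telemetry.

```python
import json
import time

def log_disengagement(scenario_id: str, sensor_snapshot: dict, human_action: str) -> str:
    """Record a human takeover as a negative training label on the scenario data.
    A minimal sketch of the idea, not any vendor's real logging format."""
    event = {
        "scenario_id": scenario_id,
        "timestamp": time.time(),
        "label": "autopilot_decision_overridden",  # the intervention itself is the label
        "human_action": human_action,
        "sensor_snapshot": sensor_snapshot,         # data used to retrain the model later
    }
    return json.dumps(event)

print(log_disengagement("scn-001",
                        {"speed_kph": 72, "lead_vehicle_distance_m": 8.5},
                        human_action="manual_brake"))
```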

We are at the precipice of a self-driving revolution, a prospect that holds the potential for a bright future—safer roads, more inclusive transportation for the elderly and disabled, shipping processes with increased time and cost efficiency, and more. But turning points with the greatest potential also call for great responsibility on the part of those at the forefront, particularly in the case of driving and human transportation, where safety assurance is an uncompromisingly critical consideration.

As the abilities of vehicular automation increase, there will be a need for more sensors, memory, processing power and, in turn, more data, and the market demand for infallibly reliable AI is rightfully pushing AV developers towards a mindset of human assistance rather than one of human replacement. Because AV development requires such
immense amounts of high-speed data recognition and near-perfect data processing, the rise of self-driving cars will mark a huge leap in the abilities of AI. But it also marks a call to action for those leading the charge to establish development practices that can ensure safety and build widespread trust as this cutting-edge tech inches closer to becoming the norm.


About the Author

Eran Shlomo is CEO and Cofounder of Dataloop. Dataloop’s data management and annotation platform streamlines the process of preparing visual data for machine and deep learning. Dataloop’s platform is a one-stop shop for generating datasets from raw visual data: it includes a data management environment, an intuitive annotation tool with automatic annotation capabilities, and data QA and debugging tools.
