Automotive Companies Only Access 5% of Their Vehicle Data

#machinelearning #computerscience

Vehicle sensors collect massive amounts of data, but only 5% of it is currently used for product development. Better infrastructure and data processing hold the keys to progress.

The promise of fully autonomous vehicles continues to excite and inspire millions of people around the world. The amazing things that safe, reliable, self-driving vehicles can do for humanity--from providing newfound mobility to senior citizens to reducing traffic accidents--are closer to our reach than ever. But we still have a long way to go.

Along with the fuel (gasoline, diesel or electricity) that powers automobiles, autonomous vehicles require a fuel of their own to “drive” safely and effectively: data. Although that data, already collected by millions of sensors on thousands of vehicles around the world, is readily available, it’s not being utilized to its full potential.

These datasets power the algorithms that make all levels of autonomous driving possible. Today, automotive companies access only 5% of their vehicle data, while the remaining 95% becomes costly to store and optimize without the context necessary to make use of it. As automotive companies aggressively pursue data-driven product development, data handling, including the exploration, querying, curation and evaluation of data, is a common bottleneck on the road to progress. These essential but hard-to-manage datasets bring a unique set of challenges for those hoping to make use of them:

Unstructured and diverse formats
Need for rich semantics in order to access them
Huge sizes that require high-performance computing
Strong need for data versioning
Access and security issues
Need for continuous failure-case driven data exploration

With proper data processing and analytics software, engineering users can overcome all of these challenges and vehicle data can fulfill its potential. By upgrading legacy technology for data access and analysis, OEMs and mobility tech companies can raise data utilization rates by up to 40%. They can also generate additional ROI, because data exploration, search, analysis, anomaly detection and evaluation require less manual engineering work and yield better results.

Data infrastructure and management are being neglected

Currently, vehicle technology developers are focused on machine learning models and ground truth labeling. These same developers are neglecting infrastructure upgrades, leading many to use legacy technology for data management. Terabytes of unstructured, unprocessed vehicle data easily overwhelm these systems, causing them to malfunction. Raw data arrives without metadata and spans billions of frames, leaving developers to organize it on their own with tools ill-suited to data management.

Highly paid engineers search datasets manually on sluggish database systems, spending up to 75% of their time on raw data handling issues instead of building, training and validating models. Furthermore, the lack of insights garnered from these vehicle datasets means machine learning and data science systems are unable to effectively build AI functions that rely on sensor data as an input.

Purpose-built data infrastructure is the solution

Infrastructure that is designed and built specifically to house raw sensor data and extract insights is the answer. This infrastructure should be as simple and easy to navigate as spreadsheets and SQL databases that bring order and usefulness to data in other industries. For vehicle sensor data, infrastructure that uses a flexible and minimalist data model and a scalable method to produce semantics, along with fast queries and integrated endpoints to use and share data, is most effective.
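As a rough illustration of what a minimalist data model with fast semantic queries could look like, the sketch below stores time intervals of a drive as simple key-value attributes in an in-memory SQLite table. The table layout, the attribute names (`maneuver`, `weather`) and the helper functions are assumptions made for this example, not a description of any particular product.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory index of semantic attributes.

    Each row is (drive_id, t_start, t_end, key, value): a time interval
    of a drive annotated with one key-value attribute. This flat layout
    is a hypothetical data model chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attributes ("
        "drive_id TEXT, t_start REAL, t_end REAL, key TEXT, value TEXT)"
    )
    # An index on (key, value) keeps attribute lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_key_value ON attributes (key, value)")
    conn.executemany("INSERT INTO attributes VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def find_intervals(conn, key, value):
    """Return (drive_id, t_start, t_end) for every interval matching key=value."""
    cursor = conn.execute(
        "SELECT drive_id, t_start, t_end FROM attributes "
        "WHERE key = ? AND value = ? ORDER BY drive_id, t_start",
        (key, value),
    )
    return cursor.fetchall()


# Hypothetical example data: two drives with a few annotated intervals.
example_rows = [
    ("drive_001", 0.0, 12.5, "maneuver", "unprotected_left_turn"),
    ("drive_001", 12.5, 60.0, "maneuver", "lane_keep"),
    ("drive_002", 3.0, 9.0, "maneuver", "unprotected_left_turn"),
    ("drive_002", 0.0, 30.0, "weather", "rain"),
]

if __name__ == "__main__":
    conn = build_index(example_rows)
    for interval in find_intervals(conn, "maneuver", "unprotected_left_turn"):
        print(interval)
```

Storing semantics as plain key-value intervals means a new attribute type needs no schema change, which is one way to read "flexible and minimalist". A production system would also need to link each interval back to the raw sensor files.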

These and other infrastructure elements improve data re-use and result in faster insights. Enabled by semantic automation, data can be ranked according to its importance based on context, content, criticality, and usability. This reduces redundancy and maximizes information density. Low-importance data is archived or deleted, while high-importance data is easily accessible for everyone. Furthermore, data insights can be provided almost immediately with overviews of the content and the redundancy.

Users can augment retention recommendations with tailored rule-based constraints, e.g. unprotected left turns should always be kept.
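The ranking and the rule-based override described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Clip` fields, the scoring formula (criticality discounted by redundancy) and the thresholds are all hypothetical, chosen only to show how a rule such as "always keep unprotected left turns" can take precedence over a computed importance score.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clip:
    """A recorded driving segment with hypothetical semantic annotations."""
    clip_id: str
    tags: set = field(default_factory=set)  # semantic tags, e.g. "unprotected_left_turn"
    criticality: float = 0.0                # assumed score in [0, 1]
    redundancy: float = 0.0                 # assumed share of near-duplicate content in [0, 1]


# Rule-based constraint: clips carrying any of these tags are always kept.
ALWAYS_KEEP_TAGS = {"unprotected_left_turn"}


def importance(clip: Clip) -> float:
    """Illustrative importance score: critical, non-redundant clips rank highest."""
    return clip.criticality * (1.0 - clip.redundancy)


def retention_decision(clip: Clip,
                       keep_threshold: float = 0.5,
                       archive_threshold: float = 0.2) -> str:
    """Return "keep", "archive" or "delete" for a clip.

    User-defined rules are checked first, so they override the score.
    """
    if clip.tags & ALWAYS_KEEP_TAGS:
        return "keep"
    score = importance(clip)
    if score >= keep_threshold:
        return "keep"
    if score >= archive_threshold:
        return "archive"
    return "delete"


if __name__ == "__main__":
    clips = [
        Clip("a", {"highway_cruise"}, criticality=0.1, redundancy=0.9),
        Clip("b", {"unprotected_left_turn"}, criticality=0.1, redundancy=0.9),
        Clip("c", {"cut_in"}, criticality=0.8, redundancy=0.1),
    ]
    for clip in clips:
        print(clip.clip_id, retention_decision(clip))
```

Clips "a" and "b" have identical scores, but the rule keeps "b" while "a" falls below both thresholds. The score serves as a recommendation, and the rule set is where users encode what must never be discarded.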

Better autonomous vehicles, sooner

Advances in automated data infrastructure and processing liberate automotive technology developers from the constraints of legacy technology systems. Automated, semantic data management systems make data handling easier than ever, enabling automotive companies to save valuable engineering time, boosting productivity and increasing overall utilization.

Put simply, more robust, effective infrastructure for unstructured sensor data will lead to higher ROI on research and development by freeing up engineers to do what they do best: building algorithms that will power the autonomous vehicles of the future. And with those engineers working more efficiently, the dreams of autonomous vehicles improving our lives are that much closer to reality.

Dealing with lots of raw sensor data?

Learn more about how the SiaSearch data management platform can help on our website: https://www.siasearch.io/

Originally published by Clemens Viernickel on: https://www.siasearch.io/blog/raw-sensor-data-needs-infrastructure-to-be-useful/
