About the Team
Our team is responsible for the collection, storage, and processing of large-scale datasets generated by autonomous vehicles and delivery robots, including sensor data from cameras, lidar, radar, and other onboard systems. Scaling reliable storage and providing efficient compute tools are essential for supporting downstream teams such as machine learning, simulation, and algorithm development. Our data processing stack incorporates specialized algorithms similar to those deployed directly on autonomous systems in the field.
About the Role
As a Software Engineer, Data Platform at Avride, you will be responsible for designing, building, and maintaining the core data and machine learning infrastructure, with a strong focus on software design and code quality. You will design systems that ingest, process, and organize petabytes of telemetry and sensor data into a globally distributed data lake, enabling high-throughput, low-latency access for both model training and online inference. Your work will help ML engineers and data scientists iterate faster and deliver better-performing systems.
What You'll Do
- Build and maintain robust data pipelines and core datasets to support simulation, analytics, and machine learning workflows, as well as business use cases
- Design and implement scalable database architectures to manage massive and complex datasets, optimizing for performance, cost, and usability
- Collaborate closely with internal teams such as Simulation, Perception, Prediction, and Planning to understand their data requirements and workflows
- Evaluate, integrate, and extend open-source tools (e.g., Apache Spark, Ray, Apache Beam, Argo Workflows) as well as internal systems
What You'll Need
- Strong proficiency in Python (required); experience with C++ is highly desirable
- Proven ability to write high-quality, maintainable code and design scalable, robust systems
- Experience with Kubernetes for deploying and managing distributed systems
- Hands-on experience with large-scale open-source data infrastructure (e.g., Kafka, Flink, Cassandra, Redis)
- Deep understanding of distributed systems and big data platforms, with experience managing petabyte-scale datasets
Nice to Have
- Experience building and operating large-scale ML systems
- Understanding of ML/AI workflows and experience with machine learning pipelines
- Experience optimizing resource usage and performance in distributed environments
- Familiarity with data visualization and dashboarding tools (e.g., Grafana, Apache Superset)
- Experience with cloud-based infrastructure (e.g., AWS, GCP, Microsoft Azure)
Candidates must be authorized to work in the U.S. Relocation sponsorship is not offered, and remote work options are not available.