Software Engineer - Data


San Francisco, CA, USA

Full time

Apr 21

This job is no longer accepting applications.

At Databricks, we are passionate about enabling data teams to solve the world's toughest problems, from security threat detection to cancer drug development. We do this by building and running the world's best data and AI infrastructure platform, so our customers can focus on the high-value challenges that are central to their own missions. Our engineering teams build technical products that fulfill real, important needs in the world. We always push the boundaries of data and AI technology, while simultaneously operating with the resilience, security and scale that are important to making customers successful on our platform.

We develop and operate one of the largest-scale software platforms. The fleet consists of millions of virtual machines, generating terabytes of logs and processing exabytes of data per day. At our scale, we observe cloud hardware, network, and operating system faults, and our software must gracefully shield our customers from all of them.

As a software engineer working on the Data team, you will help build the data platform for Databricks. You will architect and run high-quality, large-scale, multi-geo data pipelines that analyze product telemetry and logs, and use the results to guide business decisions. You will do this using Databricks: the Data team also functions as a large, production, in-house "customer" that dogfoods Databricks and guides the future direction of the products.

The impact you will have:

  • Design and implement reliable data pipelines using Spark and Delta.
  • Establish conventions and create new APIs for telemetry, debug and audit logging data, and evolve them as the product and underlying services change.
  • Create understandable service level agreements for each of the production data pipelines.
  • Develop best practices and frameworks for unit, functional and integration tests around data pipelines, and guide the team towards increased overall test coverage.
  • Design CI and deployment processes and best practices for the production data pipelines.
  • Design schemas for financial, sales and support data in the data warehouse.

What we look for:

  • BS (or higher degree) in Computer Science or a related field.
  • Experience building, shipping and operating multi-geo data pipelines at scale.
  • Experience working with and operating workflow or orchestration frameworks, including open source tools like Airflow and Luigi or commercial enterprise tools.
  • Experience with large-scale messaging systems like Kafka or RabbitMQ or commercial systems.
  • Excellent communication (writing, conversation, presentation) skills and the ability to build consensus.
  • Strong analytical and problem-solving skills.
  • Passion for data engineering and for enabling others by making their data easier to access.

Databricks is the data and AI company, helping data teams solve the world’s toughest problems.