Sr HPC Infrastructure Engineer

Tesla

Palo Alto, CA, USA

Full time

Engineering

Mar 4

What to Expect

As an HPC Infrastructure Engineer, you will work directly with the high-performance computing infrastructure that drives the engineering work that makes Tesla a world leader in EV technology. Continued development and automation of this infrastructure is imperative to the success of our many R&D teams and the company as a whole. You’ll work in a closely integrated, cross-functional, and highly versatile team that designs, implements, and maintains all Tesla HPC resources. With the ever-growing need for more data and more compute, both locally and in remote locations, cluster builds are getting larger and more complex, and they need more automated processes for deployment, monitoring, self-healing (when possible), and alerting. You will be responsible for improving the processes for rapid compute and storage deployments in geographically distributed areas to help maximize system efficiency.

What You’ll Do

  • Leverage and improve upon existing cluster management solutions to ensure rapid deployment.
  • Work with engineering teams to identify useful metrics to collect, and implement the corresponding monitoring and alerting with existing monitoring solutions.
  • Improve root cause analysis and corrective action for problems large and small – identify patterns and design task automations.
  • Organize and document implemented solutions for long-term information retention with our internal ticketing and documentation system.
  • Work closely with cluster architects to design and document automated workflows that can be easily implemented by remote hands with little or no understanding of internal systems.
  • As part of the team, respond to and document submitted support tickets relating to the functionality of various clusters, storage systems, and software solutions.
  • Help develop automated tools to collect information that can be directly used to assist users in performing root cause analysis of issues in their job submissions.

What You’ll Bring

  • Bachelor’s degree in computer science, electrical engineering or related field with 3 years of additional equivalent experience or evidence of exceptional ability related to the position.
  • 5+ years’ experience with:
      • Cluster deployment and operations
      • Linux operating system flavors (CentOS/RHEL, Ubuntu)
      • Configuration management software (Puppet, Chef, Ansible, etc.)
      • Systems monitoring and alerting (Ganglia, Telegraf, Splunk, etc.)
      • Administering job schedulers (SLURM, LSF, etc.)
  • 3+ years’ experience with:
      • GPU-based compute systems
      • Storage systems (on-prem and/or in-cloud)
  • Working knowledge of programming and/or scripting with Python, Bash, or similar
  • Excellent time management and communication skills are absolute musts

Nice to have

  • Experience with multi-site, on-prem and in-cloud hybrid software and hardware deployments
  • Experience working with EDA, CAE/CFD, or ML workloads is a huge plus
  • Experience with containers – Docker, Singularity, Kubernetes
  • Experience with hardware accelerators (FPGA / ASIC based)
  • Experience with parallel filesystems
  • Previous experience with large-scale data center and remote systems management
  • Familiarity with public cloud compute and storage resource orchestration

Compensation and Benefits

Benefits

Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits from day 1 of hire:

  • Aetna PPO and HSA plans, including 2 medical plan options with $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental (including orthodontic coverage) and vision plans; both have options with a $0 paycheck contribution
  • Company-paid Health Savings Account (HSA) contribution when enrolled in the high-deductible Aetna medical plan with HSA
  • Healthcare and Dependent Care Flexible Spending Accounts (FSA)
  • LGBTQ+ care concierge services
  • 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
  • Company-paid Basic Life, AD&D, short-term and long-term disability insurance
  • Employee Assistance Program
  • Sick and Vacation time (Flex time for salary positions), and Paid Holidays
  • Back-up childcare and parenting support resources
  • Voluntary benefits to include: critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance
  • Weight Loss and Tobacco Cessation Programs
  • Tesla Babies program
  • Commuter benefits
  • Employee discounts and perks program


Expected Compensation

$104,000 - $348,000/annual salary + cash and stock awards + benefits


Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.



Tesla is an Equal Opportunity / Affirmative Action employer committed to diversity in the workplace. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, age, national origin, disability, protected veteran status, gender identity or any other factor protected by applicable federal, state or local laws.

Tesla is also committed to working with and providing reasonable accommodations to individuals with disabilities. Please let your recruiter know if you need an accommodation at any point during the interview process.

Please contact accommodationrequest@tesla.com for additional information or to request accommodations.

Privacy is a top priority for Tesla. We build it into our products and view it as an essential part of our business. To understand more about the data we collect and process as part of your application, please view our Tesla Talent Privacy Notice.
