Site Reliability Engineer, Ethos API Platform


Ottawa, Ontario, Canada

Full time

Software Engineering / Software Developer

Jun 3

Our Company

Changing the world through digital experiences is what Adobe’s all about. We give everyone—from emerging artists to global brands—everything they need to design and deliver exceptional digital experiences! We’re passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen.

We’re on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!

The Challenge

The Ethos Platform provides industry-leading API hosting capabilities. Our solutions support high-traffic, highly visible applications with immense amounts of data, numerous third-party integrations, and exciting scalability and performance problems.

The Site Reliability Engineer on the Ethos API Platform DevOps team is responsible for ensuring optimal performance and uptime of Adobe's API infrastructure, specifically the API Gateway and our Kubernetes-powered infrastructure. We're focused on containerization, clusterization, performance, continuous integration / continuous deployment (CI/CD), and pipeline automation. This team is uniquely positioned to make a measurable difference to Adobe's development culture, reputation, and bottom line!

The successful candidate should have a strong interest in learning new technologies, be able to work independently, and be able to drive complex and ambitious projects to conclusion. Strong collaboration with the engineering teams and an ability to thrive under pressure are key skills required to succeed in this role. This individual should be self-motivated and have a passion for quality.

The Ethos Platform DevOps team is geographically distributed, and as such we rely heavily on tools like Slack and video conferencing. Our team is split between San Francisco, Bucharest, and Ottawa. Some international travel may be required.

What You'll Do

  • Own and operate clusters of servers in AWS and Azure running applications that handle billions of transactions. Define and track metrics to monitor and improve the reliability of these systems.
  • Ensure the highest level of uptime and Quality of Service (QoS) for our customers through operational excellence.
  • Build, challenge, and secure our automated, multi-cloud, multi-tenant environments: in software, process, and infrastructure.
  • Engage in service capacity analysis and demand forecasting, software performance analysis and system tuning.
  • Improve our tools for continuous integration, continuous deployment, automated testing and release management.
  • Work closely with internal users to debug and fix REST-based APIs and network connectivity.

What You Need To Succeed

  • B.Sc. or higher in related field, or equivalent experience.
  • Years of proven experience in software engineering, release engineering, and/or configuration management.
  • Passion for automating repetitive work using scripting languages (shell) and automation platforms like Chef.
  • Experience with cloud service providers: Microsoft Azure, Amazon AWS.
  • In-depth experience with Linux and familiarity with containerization (e.g. Docker).
  • Skill with one or more development languages such as Go, Python, or Ruby.
  • Strong written and oral skills.
  • Flexibility in working hours and the ability to participate in an on-call schedule.

Nice To Have

  • Cloud provider automation, e.g. AWS CloudFormation, Azure ARM Templates, Troposphere, Terraform, Heat Templates, etc.
  • Experience with build management tools, preferably Jenkins.
  • Experience with log aggregation tools such as Splunk or Sumo Logic.
  • Experience with monitoring solutions: New Relic, Datadog, Runscope, Prometheus, Grafana.
  • Experience with orchestration tools, e.g. Apache Mesos or Kubernetes.
  • Exposure to Kafka, AWS Kinesis, or other messaging platforms.
  • Previous experience automating build and release processes.
  • Previous experience consulting and working with customers.
