Site Reliability Engineer, Unannounced Project City

Blizzard

Irvine, CA, USA Remote

Full time

Oct 1

This job is no longer accepting applications.

Job Title:

Site Reliability Engineer, Unannounced Project

Requisition ID:

R008766

Job Description:

At Blizzard Entertainment, our Site Reliability Engineers (SREs) use systems expertise combined with software engineering patterns to help define, create, and support the architecture, build systems, orchestration, and operations of services across the business. The team comprises talented engineers focused on evangelizing reliability-as-a-feature through monitoring, service-level objectives, automation, everything-as-code, and testing.

Blizzard's games and platforms reach a global audience of passionate gamers. The scale is massive, and the challenges are very real, but the wise application of technology allows it to run reliably with minimal oversight. Our Site Reliability Engineers are at the heart of this work, working directly with the engineering teams from idea to launch to deliver the most epic (and reliable!) experiences... ever.

As an SRE at Blizzard, you may find yourself...

  • Being part of an on-call rotation to assist in finding a resolution during incidents
  • Hosting blameless postmortems to share learnings, discover gaps, embrace transparency, and improve reliability across our services
  • Building positive and collaborative relationships across the company
  • Employing your systems knowledge to triage problems and tune resource usage
  • Championing automation to reduce toil and increase development velocity
  • Helping define and instrument Service-Level Objectives to ensure epic player experiences
  • Leveraging Configuration Management to build and maintain consistency across services
  • Building Terraform configs to manage infrastructure in public and private clouds
  • Supporting and improving build pipelines with Jenkins, Atlantis, and ArgoCD
  • Adopting Containers and Kubernetes for new and existing services
  • Applying everything-as-code methodologies across configuration, infrastructure, orchestration, and elsewhere

You may succeed in this role if you...

  • Love to solve novel and exciting problems
  • Dislike solving the same problems over and over, so you automate or eliminate them
  • Are inspired to make everyone's job easier by improving workflows
  • Are comfortable digging through metrics, logs, and whatever else is available to triage and fix an incident at any time
  • Strive to be better, smarter, and faster tomorrow than you are today
  • Enjoy trying new technologies to improve what we're doing today
  • Naturally spread the philosophies and practices of DevOps to others
  • Like to collaborate with others to solve problems, share knowledge, and provide feedback
  • Can independently assess the needs of a system or team, and make a case to prioritize that work
  • Relish working with software, network, cloud, and systems engineers to solve problems across all tiers of the stack
  • Help your peers succeed as much as you can

Types of projects you may work on...

  • Managing services and infrastructure supporting Blizzard's incredible games
  • Defining the future of running services for our platforms and games with Kubernetes
  • Working closely with our incubation teams to help define how future products should operate
  • Integrating monitoring and logging with systems to improve observability and enable Service-Level Objectives
  • Designing and executing stress tests to validate scale expectations vs reality

Areas of Expertise for an SRE at Blizzard

SREs at Blizzard are expected to become experts in the technologies used by the teams they are working with. Below is a non-exhaustive list of technologies SREs may be exposed to:

  • Service-Level Objectives (SLI, SLO, SLA, Error Budget, Burn Rate)
  • Distributed Systems (system/software architectures, micro-services, high-availability, elections)
  • Configuration Management (Puppet, Hiera, Ansible)
  • Container Computing (Docker, Kubernetes, Service Mesh)
  • Cloud Services and Architecture (AWS, GCP, OpenStack)
  • Distributed Message Bus (RabbitMQ, Kafka, SQS)
  • Infrastructure as Code (Terraform, Terragrunt)
  • Proxies and Load Balancing (Nginx, HAProxy, Envoy, ELB/ALB)
  • Monitoring (Prometheus, Kibana, Grafana, Elasticsearch, New Relic, APM)
  • Logging (Splunk, SysLog, ELK Stack)
  • Source Control (GitHub Enterprise)
  • CI/CD (Jenkins, ArgoCD)
  • Linux (bash, debugging, tuning, performance measuring)
  • Networking (triaging, packet loss, routing)
  • Programming (Python, Go, JavaScript, C#, C++, Shell)

Requirements

  • Experience working in a large, complex, distributed application environment including real-time, stateful services.
  • Experience with software development (e.g. Python, Go, C#), CI Pipeline tools (e.g. Jenkins), Git source management, cloud hosting (e.g. AWS, GCP).
  • Extensive experience with container computing (e.g. Docker & Kubernetes) and distributed event stream technologies or message brokers (e.g. Kafka, RabbitMQ).
  • Expertise in Linux systems and operational excellence, as well as application stability, security, performance, capacity management, and documentation.