Optimizely

Site Reliability Engineer

This job is no longer accepting applications

  • Posted 3 months ago

Job Description

Introduction

SREs at Optimizely are focused on making us the most reliable, performant, and trustworthy Digital Experience Optimization platform ever. Our engineering teams have built data pipelines that process 10 billion events daily and applications that support powerful experimentation and collaboration workflows at scale. Our platforms are built on AWS and GCP. We use technologies such as Kafka, Samza, HBase, MySQL, and Postgres. We build and manage our systems using TravisCI, Jenkins, Docker, Kubernetes, Terraform, and Chef. We use a combination of managed and self-hosted approaches. This is a unique opportunity to lead the engineering organization in the areas of standardized, automated infrastructure and service provisioning and orchestration, service-oriented architectural excellence, and forward-looking planning and execution of large technical projects.

We are looking for a great Site Reliability Engineer to help build and scale our CloudOps capabilities. You will be responsible for designing, implementing, and operating critical infrastructure and platform services while collaborating closely with engineering, support, and product teams to improve the reliability, scalability, and performance of our systems.

This is a hands-on technical role where you will be instrumental in shaping the SRE culture, driving automation, and ensuring high availability across all services.

Job Responsibilities

  • Champion a Site Reliability Engineering culture across the organization by sharing best practices, tools, documentation, and code.
  • Identify and automate manual operational tasks using scripting, infrastructure-as-code, and CI/CD pipelines.
  • Build and maintain observability (monitoring, logging, tracing) for all production systems to ensure reliability, availability, and performance.
  • Proactively monitor alerts across all platforms and coordinate with SRE, Operations, Engineering, and Support teams to ensure quick detection and resolution of incidents, minimizing MTTA/MTTR.
  • Lead and manage on-call rotations, driving a blameless incident management and postmortem culture.
  • Collaborate with development teams to define and implement SLOs, SLIs, and error budgets.
  • Ensure uptime SLAs are met through robust automation, testing, monitoring, and operational best practices.
  • Create and maintain runbooks, playbooks, and system documentation to ensure operational readiness and knowledge sharing.

Knowledge and Experience

  • Strong experience in Linux Systems Administration in cloud or virtualized environments
  • Proficiency in infrastructure-as-code tools such as Terraform
  • Hands-on experience with configuration management tools like Ansible or SaltStack
  • Skilled in scripting and automation using Python and Bash
  • Experience deploying and maintaining services in public cloud environments (Azure, AWS, or GCP)
  • Solid understanding of observability tooling, especially Datadog, ELK Stack (Elasticsearch, Logstash, Kibana), or similar
  • Experience building and maintaining CI/CD pipelines (e.g., GitHub Actions, Azure DevOps, Octopus)
  • Familiarity with Kubernetes and Docker; production experience is a strong plus
  • Experience operating and scaling distributed systems across multiple regions
  • Strong communication and collaboration skills; comfortable working across time zones
  • Passion for learning, continuous improvement, and a strong sense of ownership
  • Fluent in English, both written and spoken

Optimizely is committed to a diverse and inclusive workplace. Optimizely is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.


Job ID: 135140319
