
Techcombank (TCB)

Director, Site Reliability Engineering

10-12 Years

Job Description

Job Purpose

We're looking for someone who loves making things run smoothly — and keeping them that way. As our Director of Site Reliability Engineering, you'll lead the teams that keep our systems reliable, fast, and always available. You'll oversee our 24/7 operations and our observability platform team, making sure we're not just fixing problems but preventing them.

This isn't just about uptime. It's about building a culture of reliability, accountability, and smart engineering. You'll set the strategy, guide the teams, and make sure we're hitting our SLOs while constantly improving how we work.

If you're passionate about reliability, believe in observability, and want to help shape the future of SRE in Vietnam, this is your chance.

What You'll Do
  • Lead and grow our SRE organization — including a 24/7 operations team (split into squads) and an observability platform team.
  • Define and deliver a roadmap for reliability across all our systems.
  • Own our SLOs, SLIs, and SLAs — track them, report them, and make sure we meet them.
  • Drive incident management and postmortems that actually lead to change.
  • Oversee our observability stack (Dynatrace, SolarWinds, Splunk, Prometheus/Grafana, OpenSearch) and make sure it serves everyone: engineers, operations, and QA/QE teams.
  • Work closely with engineering and product teams to bake reliability into everything we do.

How We'll Measure Success
  • Meeting (and beating) our SLOs and SLAs.
  • Reducing MTTD and MTTR.
  • Improving uptime and reliability across the board.
  • Building a strong, engaged SRE team.
  • Making a mark in the SRE community.

What We're Looking For
  • 10+ years in SRE or related fields, with at least 6 years leading teams.
  • Deep experience with observability tools (Dynatrace, SolarWinds, Splunk, Prometheus/Grafana, OpenSearch).
  • Strong knowledge of AWS and on-prem infrastructure.
  • Great leadership skills — you know how to build teams and help people grow.
  • Comfortable with incident management.
  • Analytical, data-driven, and always looking for ways to improve.

Nice to Have

  • Experience in financial services or other regulated industries.
  • Certifications in cloud or SRE practices (AWS, Google SRE, etc.).
  • Experience with DevOps practices and knowledge of DevEx.


Job ID: 146654365
