Techcombank (TCB)

Senior Site Reliability Engineer

  • Posted 9 hours ago

Job Description

Role Overview

We are seeking a Senior Monitoring Engineer to join our Monitoring Team. This team builds and operates a centralized monitoring and observability platform that spans all on-premises and cloud infrastructure at Techcombank. You will play a key role in designing monitoring strategies, implementing solutions, and working closely with cross-functional teams to ensure comprehensive system visibility and performance insight across the bank.

Key Responsibilities:

  • Design and implement monitoring solutions for enterprise systems across on-premises and AWS cloud environments.

  • Define and update monitoring standards, metrics, and alerts in collaboration with infrastructure and application teams.

  • Maintain and operate observability tools such as Grafana, Prometheus, InfluxDB, Dynatrace, Splunk, and OpenSearch.

  • Act as a key point of contact for projects to advise on monitoring requirements and best practices.

  • Integrate monitoring tools into CI/CD pipelines and incident management workflows.

  • Collaborate with stakeholders across IT, security, operations, and business units to ensure monitoring coverage and visibility.

  • Participate in root cause analysis and post-incident reviews to continuously improve observability practices.

Requirements

Must-Have Qualifications:

3+ years of experience in system engineering, DevOps, SRE, or observability/monitoring roles.

Hands-on experience with at least three of the following:

• Grafana (dashboards, alerting)

• Prometheus or InfluxDB (metrics collection and storage)

• Dynatrace (APM)

• Splunk and/or OpenSearch (log management)

Good understanding of system architectures, both traditional on-premises and cloud-native (especially AWS).

Familiarity with the operational needs of enterprise IT environments, ideally within the banking or financial services sector.

Good communication and stakeholder management skills.

Ability to translate technical metrics into meaningful operational insights.

Resilience and reliability under pressure, with the ability to maintain focus and deliver results.

Nice-to-Have Skills:

Scripting (Bash, Python, Groovy, etc.) and automation experience.

Exposure to ITIL practices, incident management workflows, and service reliability concepts.


Job ID: 147083823
