bTaskee

Site Reliability Engineer (SRE) / Senior DevOps

Job Description

Responsibilities and Duties:

Configuration & Automation

  • Manage and automate infrastructure with Ansible, Terraform, and GitLab CI/CD.
  • Deploy Kubernetes applications using Helm.

Workflow Orchestration

  • Build and manage batch & real-time pipelines with Airflow.
  • Integrate Airflow with Kafka, PostgreSQL, and ClickHouse.

System Administration

  • Administer PostgreSQL, MongoDB, and Redis for high availability and fault tolerance.
  • Troubleshoot OS, storage (block/object), and networking (VPC, proxies, CDNs) issues.

Monitoring & Observability

  • Set up monitoring with Prometheus and dashboards with Grafana.
  • Manage logging systems and integrate incident alerts via Slack.

Performance & Troubleshooting

  • Diagnose and optimize infrastructure and applications for efficiency.
  • Collaborate with development teams to address bottlenecks and improve reliability.

Engineering Practices

  • Ensure availability, scalability, and disaster recovery readiness.
  • Contribute code in Shell, GoLang, and Python; participate in testing and release processes.

Collaboration & Knowledge Sharing

  • Work in Agile teams, managing tasks through epics/issues.
  • Maintain documentation and runbooks; conduct root cause analysis (RCA) and mentor team members.

Qualifications and Skills:

  • 3+ years of experience with infrastructure automation using Terraform, Ansible, and Helm;
  • Experience with Kubernetes cluster management and application deployment;
  • Strong experience in data pipeline management with Airflow, Kafka, PostgreSQL, Redis, and ClickHouse;
  • Proficiency in Shell, GoLang, and Python;
  • Strong communication skills and experience with mentoring and knowledge sharing;
  • Ability to gather and analyze metrics from both operating systems and applications for performance optimization and fault diagnosis.

How To Apply:

  • Send an email to: [Confidential Information]
  • Working Schedule: Monday to Friday, 09:00 AM to 06:00 PM
  • Working Location: HQ Tower - 201 Tran Nao, An Khanh Ward, Thu Duc City, HCMC.

More Info

Date Posted: 18/09/2025

Job ID: 126153565

Last Updated: 23-09-2025 08:49:19 PM