Responsibilities and Duties:
Configuration & Automation
- Manage and automate infrastructure with Ansible, Terraform, and GitLab CI/CD.
- Deploy Kubernetes applications using Helm.
Workflow Orchestration
- Build and manage batch & real-time pipelines with Airflow.
- Integrate Airflow with Kafka, PostgreSQL, and ClickHouse.
System Administration
- Administer PostgreSQL, MongoDB, and Redis for high availability and fault tolerance.
- Troubleshoot OS, storage (block/object), and networking (VPC, proxies, CDNs) issues.
Monitoring & Observability
- Set up monitoring with Prometheus and dashboards with Grafana.
- Manage logging systems and integrate incident alerts via Slack.
Performance & Troubleshooting
- Diagnose performance issues and optimize infrastructure and applications for efficiency.
- Collaborate with development teams to address bottlenecks and improve reliability.
Engineering Practices
- Ensure availability, scalability, and disaster recovery readiness.
- Contribute code in Shell, GoLang, and Python; participate in testing and release processes.
Collaboration & Knowledge Sharing
- Work in Agile teams, managing tasks through epics/issues.
- Maintain documentation and runbooks; conduct root cause analysis (RCA) and mentor team members.
Qualifications and Skills:
- 3+ years of experience with infrastructure automation using Terraform, Ansible, and Helm;
- Experience with Kubernetes cluster management and application deployment;
- Strong experience in data pipeline management with Airflow, Kafka, PostgreSQL, Redis, and ClickHouse;
- Proficiency in Shell, GoLang, and Python;
- Strong communication skills and experience with mentoring and knowledge sharing;
- Ability to gather and analyze metrics from both operating systems and applications for performance optimization and fault diagnosis.
How To Apply:
- Send your application by email to: [Confidential Information]
- Working Schedule: Monday to Friday, 09:00 AM to 06:00 PM
- Working Location: HQ Tower - 201 Tran Nao, An Khanh Ward, Thu Duc City, HCMC.