We provide services used by millions of customers, such as Zalo, ZMP3, BaoMoi, Kiki, and others. We are looking for an experienced Senior Site Reliability Engineer (Sr. SRE) who brings a unique perspective, a passion for collaborating with cross-functional teams, and the ability to derive real-time insights from data at massive scale to build practical solutions and deliver exceptional user experiences at every touchpoint.
What you will do
- Run the production environment by monitoring availability and taking a holistic view of system health;
- Build software and systems to manage platform infrastructure and applications;
- Improve reliability, quality, and time-to-market of our suite of software solutions;
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continuous improvement;
- Provide primary operational support and engineering for multiple large-scale distributed software applications;
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding;
- Partner with development teams to improve services through rigorous testing and release procedures;
- Participate in system design consulting, platform management, and capacity planning;
- Create sustainable systems and services through automation and uplifts;
- Balance feature development speed and reliability with well-defined service-level objectives.
What you will need
- Ability to program (structured and object-oriented) in one or more high-level languages, such as Python or Golang;
- Experience with dynamic resource management frameworks (Kubernetes, Nomad, YARN);
- Experience managing infrastructure as code (Terraform, etc.);
- Experience with version control (Git, SVN, etc.) and configuration management (Ansible, Puppet, SaltStack, etc.);
- Experience with distributed storage technologies such as NFS, HDFS, Ceph and Amazon S3;
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
Preferred skills and qualifications
- A proven track record of success in technical engineering roles;
- Coding experience beyond simple scripts.