FPT Smart Cloud

Site Reliability Engineer (SRE)

  • Posted 7 hours ago

Job Description

About FPT Smart Cloud

FPT Smart Cloud (FCI), a member of FPT Corporation, pioneers AI & Cloud solutions in Vietnam. FCI was founded with the mission of generating an immense leap in productivity and agility in business operations.

FPT Smart Cloud aims to lead the industry by building a firm technological foundation, developing a diversified product ecosystem, and reaching global connectivity.

  • Customized to specific needs: Providing cloud-based products and solutions customized to each industry.
  • All-in-one Platform: Consolidating FPT Smart Cloud technology and diverse business solutions in one platform. AI & Cloud services form a unified ecosystem.
  • Local market leadership: Outstanding Cloud and AI technology infrastructure and platform to help local businesses grow their products and services online.
  • Deliver the future: Helping customers achieve business outcomes faster by integrating world-class processes and technology.

Key Responsibilities

  • Define and monitor SLOs/SLIs for critical services and systems, from identifying appropriate indicators to maintaining them throughout the service lifecycle.
  • Monitor service performance by implementing and optimizing observability tools such as Prometheus, Grafana, Datadog, or similar platforms to measure reliability and performance.
  • Collaborate with development and operations teams to ensure SLOs are integrated into development and operational workflows.
  • Evaluate and adjust SLOs when necessary to ensure alignment with business objectives and customer requirements.
  • Optimize systems to balance performance and operational costs while maintaining high stability and reliability.
  • Build dashboards, reports, and alerting mechanisms related to SLIs/SLOs, and propose improvements when objectives are breached.

Requirements

  • 1-3 years of experience in SRE, DevOps, or similar roles.
  • Hands-on experience defining and managing SLI/SLO, with a solid understanding of service reliability measurement.
  • Experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, or equivalent solutions.
  • Knowledge of Kubernetes, Docker, and container technologies.
  • Experience with CI/CD tools such as Jenkins, GitLab CI, or other automation platforms.
  • Strong analytical skills and ability to use tools such as Excel, SQL, or other data analysis tools to measure system performance.
  • Good communication skills, with the ability to collaborate effectively with both technical and non-technical stakeholders to define SLOs aligned with business goals.

Top Benefits

  • Salary: Competitive and based on ability; negotiable during the interview.
  • Social insurance and health insurance according to labor laws.
  • Creative, open-minded working environment that respects individuals
  • FPT Premium Care package
  • Activities and culture with FCI and FPT Corporation
  • Study support package for children of FCI union members
  • Sponsorship for related courses and certifications

Working Environment

  • Working Location:
    • Site HN: FPT Tower, No. 10 Pham Van Bach Street, Cau Giay Ward, Hanoi
    • Site HCMC: 3rd floor, PJICO Tower, No. 186 Dien Bien Phu, Ward 6, District 3, HCMC
  • Working hours:
    • 8:30 AM - 12:00 PM
    • 1:00 PM - 5:30 PM
  • Working days: Monday to Friday (weekends off)

Contact Person

Pham Thi Ha My (Ms.), Talent Acquisition Team Lead

Email: [Confidential Information]

Phone: 0962456194

FPT Smart Cloud (FCI) Co., LTD

Address: 7th Floor, FPT Tower, No. 10 Pham Van Bach, Cau Giay Dist, Hanoi

Websites: FPT Cloud | FPT AI

Job ID: 143860335
