CODE88 Company Limited

DevOps Team Leader

This job is no longer accepting applications

  • Posted a month ago

Job Description

Job Purpose:

The DevOps Team Lead sits at the intersection of technical expertise, operational reliability, and project delivery. This role is responsible for leading a team of Systems/Platform engineers to design, implement, and maintain secure, scalable, and highly available infrastructure across AWS, Azure, Google Cloud, and on-premises environments.

The position owns the end-to-end application delivery platform (CI/CD, Kubernetes, GitLab, ArgoCD, Helm), observability stack, and continuous ISO/IEC 27001 compliance within the team, ensuring timely delivery of high-quality infrastructure services that support business objectives.

Key Responsibilities

Infrastructure & IaC Management

  • Lead the design, implementation, and maintenance of infrastructure across AWS, Azure, Google Cloud, and on-premises servers.
  • Champion Infrastructure as Code (IaC) practices using tools such as Terraform, Terragrunt, CloudFormation, or equivalent to provision, configure, and manage infrastructure in a repeatable and auditable way.
  • Ensure environments are standardized, secure, cost-optimized, and aligned with architecture and security guidelines.

Application Delivery & Platform Engineering

  • Own and evolve the application delivery platform using GitLab CI, ArgoCD, Helm charts, and Kubernetes.
  • Design and maintain CI/CD pipelines to support reliable, frequent, and automated application deployments across environments.
  • Establish best practices and guardrails for Kubernetes cluster configuration, namespace management, Helm chart management, and deployment strategies (e.g., blue/green, canary).
  • Collaborate closely with development teams to ensure smooth, predictable, and observable releases.

Monitoring, Logging & Alerting

  • Lead the design, implementation, and continuous improvement of the observability stack, including Prometheus, Thanos, Alertmanager, Grafana, Kibana, and Elasticsearch.
  • Define and maintain monitoring standards, SLOs/SLIs, dashboards, and alerting rules to ensure early detection and rapid resolution of incidents.
  • Ensure logs, metrics, and traces are consistently collected, stored, and accessible for troubleshooting, performance tuning, and capacity planning.

Compliance & Information Security (ISO/IEC 27001)

  • Lead the implementation, documentation, and continuous maintenance of the ISO/IEC 27001 Information Security Management System (ISMS) within the team.
  • Ensure infrastructure, platforms, and operational processes adhere to information security policies, controls, and audit requirements.
  • Collaborate with Information Security, Risk, and Compliance stakeholders to support audits, risk assessments, and corrective actions.
  • Promote a culture of security and compliance awareness within the team and across collaborating functions.

Team Leadership & People Management

  • Lead, mentor, and develop a team of Systems/Platform engineers; provide regular feedback, support career growth, and foster a high-performance culture.
  • Plan and prioritize team workload, ensuring timely delivery of projects, BAU tasks, and incident resolution.
  • Promote knowledge sharing, documentation, and cross-training to reduce single points of failure.

Collaboration

  • Work closely with software development, security, network, and service desk teams to ensure infrastructure and platforms meet business and operational requirements.
  • Translate business needs into technical solutions, set expectations, and communicate clearly on progress, risks, and timelines.
  • Participate in architecture and design discussions, contributing infrastructure and operations perspectives.

Reliability, Incident & Problem Management

  • Oversee incident response, including triage, communication, and coordination with relevant teams to minimize downtime and impact.
  • Drive root cause analysis (RCA) and implement corrective and preventive actions for recurring issues.
  • Continuously improve operational processes, runbooks, and standard operating procedures.

Skills & Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or related field; advanced degree is a plus.
  • 5+ years of hands-on experience in systems, platform, or infrastructure engineering, with at least 2 years in a technical leadership or team lead role.
  • Strong communication skills in English, both written and verbal, with the ability to explain complex technical topics to non-technical stakeholders.
  • Demonstrated ability to provide high-quality customer service, manage expectations, and build strong relationships with internal stakeholders.
  • Proven experience leading and mentoring technical teams.

Knowledge & Experience:

  • Deep expertise in managing and configuring public cloud environments (AWS required; Azure and Google Cloud strongly preferred).
  • Strong experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or equivalent.
  • Proven experience designing and maintaining CI/CD pipelines, ideally with GitLab CI; familiarity with other CI tools is a plus.
  • Hands-on experience with Kubernetes, ArgoCD, and Helm charts for application deployment and configuration management.
  • Solid understanding of networking concepts within cloud and containerized environments (VPCs, subnets, security groups, ingress/egress, load balancers).
  • Strong background in Linux administration, system hardening, patch management, and performance optimization.
  • Practical experience with observability stacks: Prometheus, Thanos, Alertmanager, Grafana, Kibana, and Elasticsearch (or equivalent tools).
  • Proven experience implementing, operating, or maintaining ISO/IEC 27001 controls and processes within an organization.
  • Experience with configuration management/automation tools (e.g., Ansible, Rancher, or equivalent).
  • Relevant cloud certifications (e.g., AWS Certified Solutions Architect, Azure Administrator, Google Professional Cloud Architect) are an advantage.

(*) BONUSES & REWARDS

Competitive Salary

13th Month Salary & Performance Bonus

Employee of the Year Award

(*) TRAINING & DEVELOPMENT

In-house & Overseas Training

Full reimbursement for international Technical Certification

Global career opportunity

(*) ANNUAL PAID LEAVES

Vacation Leave: 14 days per year

Medical Leave: 6 days per year

1 extra seniority day for every 3 years of service

(*) HEALTHCARE

Annual Routine Check-up

Premium Healthcare Insurance (Generali)

Comprehensive Insurance

(*) WELLNESS AND LEISURE ACTIVITIES

Annual Team Building

Soccer & Badminton Club and Sports activities

Entertainment activities: Music band, Karaoke & PlayStation time

Celebrations of special events: Birthdays, Christmas, New Year/Year-end Party

(*) PERKS

Fruit Days Twice a Month

Unlimited snacks & beverages

Job ID: 142151997