
  • Posted a day ago

Job Description

Key Responsibilities:
  • Design, provision, and operate AWS-based infrastructure using Terraform
  • Own and evolve Kubernetes clusters running production workloads
  • Implement and tune autoscaling strategies (HPA, VPA, KEDA)
  • Enforce GitOps-first workflows for infrastructure and application delivery
  • Automate operational tasks using Ansible, scripting, and CI/CD pipelines
  • Improve reliability through monitoring, alerting, and capacity planning
  • Lead incident response and postmortems
  • Collaborate with engineering teams to improve developer experience
  • Apply security best practices across cloud, Kubernetes, and CI/CD layers

Required Qualifications:
  • 4+ years of hands-on experience in DevOps, SRE, or Platform Engineering roles, with clear production ownership
  • Proven ownership of AWS and Kubernetes in production
  • Strong troubleshooting skills across infrastructure and CI/CD
  • Clear communication in spoken English

MUST HAVE

Cloud & Infrastructure

  • Strong production experience with AWS, including designing and operating cloud infrastructure
  • Deep understanding of cloud networking, IAM, security boundaries, and cost implications
  • Experience running production workloads in the cloud

Kubernetes & Containers

  • Kubernetes in production environments
  • Autoscaling with HPA, VPA, KEDA
  • Resource requests/limits, node pools, and capacity planning
  • Docker for containerization, image optimization, and security practices

Infrastructure as Code & Automation

  • Terraform for Infrastructure as Code, including modular design and remote state
  • Ansible for configuration management and automation

Delivery & Operations

  • GitOps-based workflows for infrastructure and application deployments
  • Experience integrating CI/CD pipelines with GitOps

Communication

  • Spoken English at a professional level for technical discussions and incident response

NICE TO HAVE

Observability & Reliability

  • Prometheus, Grafana, and logging stacks (ELK/EFK, Loki)
  • Understanding of SLOs, SLIs, and error budgets

Kubernetes Ecosystem

  • Helm and/or Kustomize
  • Argo CD or Flux
  • Ingress controllers and backup/DR tooling

Security

  • Kubernetes RBAC, NetworkPolicies, Pod Security Standards
  • Secrets management (AWS Secrets Manager, Vault, External Secrets)

Scripting & Programming

  • Bash scripting
  • Python or Go for automation and tooling

More Info

Job ID: 138314579