
Key Responsibilities:
- Design, provision, and operate AWS-based infrastructure using Terraform
- Own and evolve Kubernetes clusters running production workloads
- Implement and tune autoscaling strategies (HPA, VPA, KEDA)
- Enforce GitOps-first workflows for infrastructure and application delivery
- Automate operational tasks using Ansible, scripting, and CI/CD pipelines
- Improve reliability through monitoring, alerting, and capacity planning
- Lead incident response and postmortems
- Collaborate with engineering teams to improve developer experience
- Apply security best practices across cloud, Kubernetes, and CI/CD layers
Required Qualifications:
- 4+ years of hands-on experience in DevOps, SRE, or Platform Engineering roles, with clear production ownership
- Proven ownership of AWS and Kubernetes in production
- Strong troubleshooting skills across infrastructure and CI/CD
- Clear communication in spoken English
MUST HAVE
Cloud & Infrastructure
- Strong production experience with AWS, including designing and operating cloud infrastructure
- Deep understanding of cloud networking, IAM, security boundaries, and cost implications
- Experience running production workloads in the cloud
Kubernetes & Containers
- Kubernetes in production environments
- Autoscaling with HPA, VPA, KEDA
- Resource requests/limits, node pools, and capacity planning
- Docker for containerization, image optimization, and security practices
Infrastructure as Code & Automation
- Terraform for Infrastructure as Code, including modular design and remote state
- Ansible for configuration management and automation
Delivery & Operations
- GitOps-based workflows for infrastructure and application deployments
- Experience integrating CI/CD pipelines with GitOps
Communication
- Spoken English at a professional level for technical discussions and incident response
NICE TO HAVE
Observability & Reliability
- Prometheus, Grafana, and logging stacks (ELK/EFK, Loki)
- Understanding of SLOs, SLIs, and error budgets
Kubernetes Ecosystem
- Helm and/or Kustomize
- Argo CD or Flux
- Ingress controllers and backup/DR tooling
Security
- Kubernetes RBAC, NetworkPolicies, Pod Security Standards
- Secrets management (AWS Secrets Manager, Vault, External Secrets)
Scripting & Programming
- Bash scripting
- Python or Go for automation and tooling
Job ID: 138157877