  • Posted a day ago
Job Description

Key Responsibilities:

- Design, provision, and operate AWS-based infrastructure using Terraform

- Own and evolve Kubernetes clusters running production workloads

- Implement and tune autoscaling strategies (HPA, VPA, KEDA)

- Enforce GitOps-first workflows for infrastructure and application delivery

- Automate operational tasks using Ansible, scripting, and CI/CD pipelines

- Improve reliability through monitoring, alerting, and capacity planning

- Lead incident response and postmortems

- Collaborate with engineering teams to improve developer experience

- Apply security best practices across cloud, Kubernetes, and CI/CD layers

Required Qualifications:

- 4+ years of hands-on experience in DevOps, SRE, or Platform Engineering roles, with clear production ownership

- Proven ownership of AWS and Kubernetes in production

- Strong troubleshooting skills across infrastructure and CI/CD

- Clear communication in spoken English

MUST HAVE

Cloud & Infrastructure

- Strong production experience with AWS, including designing and operating cloud infrastructure

- Deep understanding of cloud networking, IAM, security boundaries, and cost implications

- Experience running production workloads in the cloud

Kubernetes & Containers

- Kubernetes in production environments

- Autoscaling with HPA, VPA, KEDA

- Resource requests/limits, node pools, and capacity planning

- Docker for containerization, image optimization, and security practices

Infrastructure as Code & Automation

- Terraform for Infrastructure as Code, including modular design and remote state

- Ansible for configuration management and automation

Delivery & Operations

- GitOps-based workflows for infrastructure and application deployments

- Experience integrating CI/CD pipelines with GitOps

Communication

- Spoken English at a professional level for technical discussions and incident response

NICE TO HAVE

Observability & Reliability

- Prometheus, Grafana, and logging stacks (ELK/EFK, Loki)

- Understanding of SLOs, SLIs, and error budgets

Kubernetes Ecosystem

- Helm and/or Kustomize

- Argo CD or Flux

- Ingress controllers and backup/DR tooling

Security

- Kubernetes RBAC, NetworkPolicies, Pod Security Standards

- Secrets management (AWS Secrets Manager, Vault, External Secrets)

Scripting & Programming

- Bash scripting

- Python or Go for automation and tooling

Job ID: 138157877