
Responsibilities
Platform Operation:
Design and operate GCP Kubernetes clusters for production workloads
Implement Infrastructure-as-Code with Terraform and Crossplane
Deploy platforms using GitOps (ArgoCD/Flux)
Build CI/CD pipelines
Manage multi-cluster environments with GCP Arc
Site Reliability Engineering:
Build observability stack: Prometheus, OpenTelemetry, Jaeger, Loki, Grafana
Define SLIs, SLOs, error budgets
Deploy automated incident response and root cause analysis
Qualifications
General requirements:
At least an intermediate level of English
Typically 4+ years of experience, depending on how quickly you learn and develop technical capability
Ability to effectively consult with clients to understand their needs, propose tailored solutions, and persuasively communicate their value to gain approval
Ability to gain deep knowledge of the project technologies and work independently with minimal guidance
Strong logical thinking and problem-solving skills
Ability to self-learn and adapt to new technologies quickly
Must have:
Kubernetes Mastery: 4+ years of production experience (Cluster design, RBAC, networking, performance tuning).
IaC & GitOps: Proven experience with Terraform, GitOps (ArgoCD/Flux), and CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins, GitLab CI).
Code/Scripting: Proficiency in Python or Go (for automation) and Bash.
Cloud: Strong experience with GCP/AWS (managed Kubernetes, virtual networking, identity) is preferred.
Observability: Deep knowledge of Prometheus and Grafana.
Nice to have:
Experience with Service Mesh (Istio, Linkerd).
Security-first mindset: Container security, Policy-as-Code, or CKS certification.
FinOps: Experience with cloud cost optimization tools.
Certifications: CKA, CKS, or GCP/Azure/AWS Solutions Architect.
Job ID: 145042867