Responsibilities:
- Design, develop, and maintain backend services using Golang, ensuring high code quality, performance, and maintainability.
- Participate in system design and architectural discussions, including service boundaries, data flow, scalability, and reliability trade-offs.
- Build, deploy, and operate applications on Google Kubernetes Engine (GKE):
  - Package and deploy applications using Helm charts
  - Configure resources, autoscaling, health checks, and rollout/rollback strategies
  - Troubleshoot production issues related to performance, stability, networking, and resource usage
- Manage cloud infrastructure (GCP) using Terraform (Infrastructure as Code):
  - Create, maintain, and review Terraform modules
  - Ensure consistent and reliable environments across development, staging, and production
- Improve system reliability, observability, and security:
  - Implement and use logging, metrics, tracing, and alerting
  - Participate in incident response, root cause analysis, and post-incident improvements
- Collaborate closely with product, DevOps, and engineering teams to deliver secure, production-ready solutions.
Requirements & Qualifications:
- 6+ years of experience in backend or platform engineering in production environments.
- Solid understanding of distributed systems concepts such as scalability, reliability, retries, timeouts, and consistency.
- Strong hands-on experience with GKE / Kubernetes, including:
  - Core Kubernetes resources (Deployments, Services, Ingress, ConfigMaps, Secrets)
  - Deploying and managing applications using Helm charts
  - Debugging and operating production workloads
- Strong understanding of GCP core services, including IAM, VPC, Subnets, Cloud NAT, VPN, Load Balancing, Cloud DNS, Cloud Run, Cloud Logging, and Cloud Monitoring.
- Practical experience with Terraform for infrastructure provisioning and management.
- Experience with CI/CD pipelines and cloud-native application operations.
- Strong proficiency in Golang, including:
  - Concurrency (goroutines, channels), context handling, and error management
  - Building and maintaining APIs (REST and/or gRPC)
  - Writing clean, testable, and maintainable code
- Strong problem-solving skills and the ability to work with complex systems.
- Good communication skills and a strong sense of ownership.
Optional (Nice to Have):
- Basic knowledge of AI systems and GenAI fundamentals, including AI agents, RAG architectures, and LLM-based services.
- Familiarity with AI infrastructure concepts:
  - Model inference services
  - GPU-based workloads
  - Scaling, latency, and cost trade-offs
- Experience with service mesh (e.g., Istio).
- Familiarity with observability tools (Prometheus, Grafana, Cloud Monitoring).
- Good understanding of cloud and application security, including:
  - IAM and access control (GCP IAM, Kubernetes RBAC)
  - Secrets management and secure configuration
  - Secure service-to-service communication (mTLS)
  - Container and Kubernetes security best practices