We are looking for a Senior Platform Engineer with deep expertise in deploying, operating, and optimizing Kubernetes-based infrastructure, LLM Inference Platforms, and Agent Platforms at scale. In this role, you will be a key contributor to building and running AI-native platforms centered on large-scale LLM inference, GPU acceleration, and agent workloads — with a relentless focus on stability, performance, and scalability.
Key Responsibilities
- Deploy, operate, and continuously optimize Kubernetes clusters across cloud and on-premise environments.
- Build and maintain a robust LLM Inference Platform and Agent Platform to serve GenAI applications, AI agents, and large-scale AI workloads.
- Deploy and tune inference engines and serving frameworks, including vLLM, SGLang, Triton, TensorRT-LLM, llama.cpp, KServe, Ray Serve, and equivalents.
- Drive inference performance improvements for LLM workloads through batching, quantization, KV-cache optimization, parallelism strategies, and runtime tuning.
- Maximize GPU utilization and optimize autoscaling, scheduling, latency, and throughput for large-scale inference systems.
- Architect and operate scalable serving infrastructures for multi-tenant AI workloads, balancing high availability with cost efficiency.
- Establish and maintain comprehensive monitoring and observability systems covering AI platform health and inference workload performance.
- Define and refine key metrics, alerting thresholds, SLOs/SLAs, and error budgets for inference services.
- Build and manage deployment pipelines, rollout strategies, and automation workflows for AI systems.
- Lead and contribute to incident response, root cause analysis, and ongoing reliability improvements.
- Partner closely with AI Engineers and Product Teams to continuously elevate the AI platform and developer experience.
Requirements
- 5+ years of experience as a Platform Engineer, Site Reliability Engineer (SRE), DevOps Engineer, or in an equivalent role.
- Proven track record deploying and operating Kubernetes in production environments.
- Strong command of the Kubernetes ecosystem: networking, ingress, storage, autoscaling, observability, and security.
- Hands-on experience with AI/ML infrastructure, GPU workloads, and LLM inference systems, including engines such as vLLM, SGLang, Triton, TensorRT-LLM, llama.cpp, or equivalent.
- Solid understanding of LLM inference optimization techniques — quantization, batching, tensor/pipeline parallelism, and KV-cache optimization.
- Experience with monitoring and observability tooling: Prometheus, Grafana, Loki, ELK/OpenSearch, and OpenTelemetry.
- Proficiency with CI/CD and GitOps practices, using tools such as Helm, Terraform, Argo CD, or comparable toolchains.
- Ability to write reliable automation scripts in Python, Bash, or Go.
- Strong foundational knowledge of Linux systems, networking, distributed systems, and performance tuning.
- Self-driven, systems-minded, and capable of managing production incidents with composure and rigor.
- An AI-native mindset — you actively leverage AI tools and automation to sharpen operational efficiency and elevate engineering workflows.
Nice to Have
- Experience with LLMOps, RAG systems, AI agents, or agent orchestration frameworks.
- Familiarity with inference orchestration, request routing, or disaggregated serving architectures.
- Hands-on experience with distributed data systems such as Kafka, ClickHouse, Elasticsearch/OpenSearch, or vector databases.
- Prior experience deploying AI platforms in on-premise or private cloud environments.
- Relevant certifications: CKA, CKAD, CKS, AWS, GCP, or Azure certifications, or other cloud and platform credentials.