Senior Platform Engineer (AI Inference & Agent Platform)

GreenNode

Ho Chi Minh, Vietnam

5-7 Years

Posted 14 hours ago

Job Description

We are looking for a Senior Platform Engineer with deep expertise in deploying, operating, and optimizing Kubernetes-based infrastructure, LLM Inference Platforms, and Agent Platforms at scale. In this role, you will be a key contributor to building and running AI-native platforms centered on large-scale LLM inference, GPU acceleration, and agent workloads — with a relentless focus on stability, performance, and scalability.

Key Responsibilities

Deploy, operate, and continuously optimize Kubernetes clusters across cloud and on-premise environments.
Build and maintain a robust LLM Inference Platform and Agent Platform to serve GenAI applications, AI agents, and large-scale AI workloads.
Deploy and tune inference engines and serving frameworks, including vLLM, SGLang, Triton, TensorRT-LLM, llama.cpp, KServe, Ray Serve, and equivalents.
Drive inference performance improvements for LLM workloads through batching, quantization, KV-cache optimization, parallelism strategies, and runtime tuning.
Maximize GPU utilization and optimize autoscaling, scheduling, latency, and throughput for large-scale inference systems.
Architect and operate scalable serving infrastructures for multi-tenant AI workloads, balancing high availability with cost efficiency.
Establish and maintain comprehensive monitoring and observability systems covering AI platform health and inference workload performance.
Define and refine key metrics, alerting thresholds, SLOs/SLAs, and error budgets for inference services.
Build and manage deployment pipelines, rollout strategies, and automation workflows for AI systems.
Lead and contribute to incident response, root cause analysis, and ongoing reliability improvements.
Partner closely with AI Engineers and Product Teams to continuously elevate the AI platform and developer experience.

Requirements

5+ years of experience as a Platform Engineer, Site Reliability Engineer (SRE), or DevOps Engineer, or in an equivalent role.
Proven track record deploying and operating Kubernetes in production environments.
Strong command of the Kubernetes ecosystem: networking, ingress, storage, autoscaling, observability, and security.
Hands-on experience with AI/ML infrastructure, GPU workloads, and LLM inference systems, including engines such as vLLM, SGLang, Triton, TensorRT-LLM, llama.cpp, or equivalent.
Solid understanding of LLM inference optimization techniques — quantization, batching, tensor/pipeline parallelism, and KV-cache optimization.
Experience with monitoring and observability tooling: Prometheus, Grafana, Loki, ELK/OpenSearch, and OpenTelemetry.
Proficiency with CI/CD and GitOps practices, using Helm, Terraform, Argo CD, or comparable toolchains.
Ability to write reliable automation scripts in Python, Bash, or Go.
Strong foundational knowledge of Linux systems, networking, distributed systems, and performance tuning.
Self-driven, systems-minded, and capable of managing production incidents with composure and rigor.
An AI-native mindset — you actively leverage AI tools and automation to sharpen operational efficiency and elevate engineering workflows.

Nice to Have

Experience with LLMOps, RAG systems, AI agents, or agent orchestration frameworks.
Familiarity with inference orchestration, request routing, or disaggregated serving architectures.
Hands-on experience with distributed systems such as Kafka, ClickHouse, Elasticsearch/OpenSearch, or vector databases.
Prior experience deploying AI platforms in on-premise or private cloud environments.
Relevant certifications: CKA, CKAD, CKS, AWS/GCP/Azure, or other cloud and platform credentials.