
Job Purpose:
We seek an Advanced MLOps Engineer with a deep understanding of end-to-end machine learning pipelines, specializing in Kubernetes-based MLOps, RAG, LangChain frameworks, and GraphRAG techniques. The ideal candidate has experience designing, building, and deploying scalable AI solutions on platforms such as Databricks, AWS, and on-prem systems, designs robust architectures, and is recognized for solving complex ML challenges. This role requires a blend of cutting-edge ML innovation, software engineering, and operations expertise to deliver high-impact AI solutions in highly regulated environments.
Responsibilities:
- Manage infrastructure across on-premise clusters and AWS cloud, ensuring high availability and scalability.
- End-to-End MLOps Pipeline: Own the ML lifecycle from data ingestion, model training, and validation to deployment and monitoring, using the latest MLOps practices. Also design, implement, and maintain feature store pipelines.
- Develop and maintain Infrastructure as Code (IaC) using Terraform for provisioning cloud and on-prem resources.
- Databricks & ML Deployment: Architect and deploy large-scale machine learning models on Databricks, ensuring high availability, fault tolerance, and compliance with enterprise-grade security standards.
- Deploy and manage containerized ML services and their CI/CD pipelines on Kubernetes, using GitOps practices via ArgoCD for the ML platform across cloud and on-prem environments.
- Chatbot RAG & LangChain Expertise: Design, deploy, and optimize Retrieval-Augmented Generation pipelines using LangChain to build scalable, performant AI applications.
- Apply networking knowledge to configure VPCs, subnets, security groups, VPNs, and firewalls in AWS and on-prem environments.
- Monitor system performance, troubleshoot issues, and implement improvements for reliability and efficiency.
Qualifications:
3+ years of experience in machine learning, with a focus on MLOps and large-scale model deployment
Advanced proficiency in Python and other relevant languages, with strong skills in cloud-native deployments (e.g., AWS, Azure, GCP)
Expertise in Infrastructure as Code (IaC) principles
Expertise in Kubernetes and CI/CD pipelines is a big advantage
Experience with Databricks is a big advantage
Strong experience with programming languages, techniques, tools, and frameworks aligned with machine learning: Python, LLMs, LangChain, GraphRAG, Hugging Face, chatbots
Experience with Generative AI, Prompt Engineering, and Large Language Models is a big plus
Job ID: 134823339