metadata solutions

Backend/DevOps Engineer (AI Chatbot Platform)

  • Posted 17 hours ago

Job Description

Our client is looking for a highly skilled Backend/DevOps Engineer to build and operate the infrastructure for a specialized AI chatbot platform. You will be responsible for designing scalable backend systems, managing cloud and GPU infrastructure, and ensuring the reliable deployment of AI services powered by proprietary technologies.

This role combines backend API development with DevOps responsibilities. It is ideal for engineers who enjoy building robust systems and owning the entire deployment pipeline from code to production, including self-hosted LLM inference services.

Responsibilities:

Backend API Development (40%)

  • Design and develop RESTful/GraphQL APIs for chatbot services (conversations, knowledge bases, user management)
  • Build high-performance services for large-scale, real-time message processing
  • Develop APIs for AI inference services, including streaming responses and real-time chat
  • Build services for LLM model serving and inference pipeline management
  • Develop services for document processing, indexing, and retrieval pipelines
  • Create APIs for analytics, monitoring, and platform administration

Infrastructure & DevOps (40%)

  • Design and maintain cloud infrastructure on AWS/GCP/Azure (Kubernetes, Docker, Terraform)
  • Deploy and manage self-hosted LLM inference services (vLLM, TGI, TensorRT-LLM)
  • Manage GPU infrastructure for AI model training and inference
  • Build CI/CD pipelines for automated testing and deployment (GitHub Actions, GitLab CI)
  • Implement Infrastructure as Code for reproducible environments
  • Configure auto-scaling for AI inference services and chatbot workloads
  • Set up monitoring, logging, and alerting systems (Prometheus, Grafana, Datadog)

Platform Reliability & Security (20%)

  • Ensure high availability (99.9%+) for critical chatbot services
  • Implement disaster recovery and backup strategies for knowledge bases
  • Configure network policies, VPNs, and security groups to protect data
  • Set up rate limiting, DDoS protection, and API gateway configurations
  • Manage secrets, certificates, and compliance requirements (SOC2, ISO 27001)

Requirements:

Technical Requirements

Must-have:

  • Backend Development: Experience building production APIs (Python/FastAPI, Node.js, or Go)
  • Cloud Infrastructure: Hands-on experience with AWS, GCP, or Azure
  • Container Orchestration: Production experience with Docker and Kubernetes
  • CI/CD: Experience building and maintaining deployment pipelines
  • Databases: Schema design and query optimization (PostgreSQL, MongoDB, Redis)

Technical Skills:

  • Microservices architecture and service mesh (Istio, Linkerd)
  • Infrastructure as Code (Terraform, CloudFormation, Pulumi)
  • Message queues and event streaming (RabbitMQ, Kafka, AWS SQS)
  • Monitoring and observability (Prometheus, Grafana, ELK, Datadog)
  • Secrets management (HashiCorp Vault, AWS Secrets Manager)
  • Load balancing and CDN configuration
  • Database optimization and caching strategies (Redis, Memcached)
  • LLM serving frameworks (vLLM, TGI, TensorRT-LLM)
  • GPU infrastructure management (NVIDIA drivers, CUDA, GPU scheduling)

Nice to have:

  • Experience with AI/ML infrastructure (GPU clusters, model serving)
  • Knowledge of vector databases (Pinecone, Weaviate, Milvus)
  • Experience building large-scale, user-facing chat/messaging platforms
  • Experience with GitOps workflows (ArgoCD, Flux)
  • Knowledge of service mesh and zero-trust networking
  • Experience with cost optimization and FinOps practices
  • Familiarity with compliance frameworks (SOC2, GDPR, HIPAA)
  • Experience with Vietnamese cloud providers or local hosting

What We're Looking For

Backend Engineering

  • Proficiency in Python (FastAPI/Django) or Go
  • Experience designing RESTful and GraphQL APIs
  • Strong understanding of microservices architecture patterns
  • Solid knowledge of database design (SQL & NoSQL) and query optimization
  • Experience with async programming and message-driven architectures

DevOps & Infrastructure

  • Strong experience with Docker and Kubernetes
  • Infrastructure as Code (Terraform, CloudFormation, or Pulumi)
  • Experience designing and implementing CI/CD pipelines
  • Knowledge of cloud networking, security groups, and IAM
  • Experience with monitoring and observability tools

System Design

  • Experience building large-scale distributed systems
  • Understanding of CAP theorem and consistency models
  • Knowledge of caching strategies and CDN implementation
  • Experience with load testing and performance optimization
  • Understanding of high-availability architectures

Security & Compliance

  • Knowledge of security best practices (OWASP, encryption, secrets management)
  • Understanding of network security and VPC configurations
  • Experience with compliance standards (SOC2, ISO 27001)
  • Knowledge of identity and access management (IAM, SSO, OAuth)

Preferred Qualifications

  • Experience with AI/ML service deployment (LLM inference, model serving)
  • Knowledge of vector search infrastructure
  • Experience with chaos engineering and reliability practices
  • Certifications such as AWS Solutions Architect or CKA (Certified Kubernetes Administrator)
  • Contributions to open-source infrastructure projects
  • Experience with edge computing and CDN optimization

Technology Stack

  • Backend: Python (FastAPI), Go, Node.js
  • Infrastructure: AWS/GCP/Azure, Kubernetes, Docker, Terraform
  • AI Serving: vLLM, TGI, TensorRT-LLM, NVIDIA GPU Stack
  • Databases: PostgreSQL, MongoDB, Redis, Vector Databases
  • Message Queue: Kafka, RabbitMQ, AWS SQS
  • CI/CD: GitHub Actions, GitLab CI, ArgoCD
  • Monitoring: Prometheus, Grafana, Datadog, ELK Stack
  • Security: HashiCorp Vault, AWS Secrets Manager, cert-manager


Job ID: 145271771