Role Overview
Own the design, fine-tuning, optimization, and production deployment of large language models (LLMs) for domain-specific use cases. You will build high-performance RAG systems, optimize prompts/agents, operate inference at scale, and champion engineering best practices while driving research and innovation.
Key Responsibilities
- LLM Engineering: Design, fine-tune, and optimize models such as GPT, Claude, Gemini, LLaMA, and Falcon for domain-specific applications.
- RAG Systems: Build and operate retrieval-augmented generation pipelines (ingestion, chunking, embedding, indexing, retrieval, re-ranking) using vector databases (FAISS, Pinecone, Weaviate, etc.).
- Prompt/Agent Optimization: Develop prompt templates, chains, and agents with LangChain/LlamaIndex; implement guardrails, tool-use, and memory.
- Model Deployment (LLMOps): Implement, monitor, and scale inference endpoints with MLflow, Docker, and Kubernetes; manage versioning/registry and safe rollouts (blue-green/canary).
- Performance Optimization: Evaluate and continuously improve accuracy, latency, and cost (batching, caching/KV-cache, quantization, speculative decoding).
- Collaboration & Mentoring: Review code, set best practices for AI software engineering, and mentor junior engineers.
- Research & Innovation: Track advances in LLMs, multimodal AI, and open source; lead PoCs, benchmarking, and knowledge sharing.
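To make the RAG responsibility above concrete, the core pipeline stages (chunking, embedding, retrieval) can be sketched in pure Python. This is a toy sketch only: the hash-based trigram embedding and brute-force cosine search stand in for a real embedding model and a vector database such as FAISS or Pinecone, and the fixed-size character chunker is a simplifying assumption.

```python
import hashlib
import math

def chunk(text: str, size: int = 40) -> list[str]:
    # Naive fixed-size character chunking; production pipelines split on
    # tokens or sentences, usually with overlap between chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy deterministic embedding: hash character trigrams into a fixed-size
    # vector, then L2-normalize. A real system calls an embedding model.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Cosine similarity over unit vectors reduces to a dot product; a vector
    # DB (FAISS/Pinecone/Weaviate) does this at scale with ANN indexes.
    q = embed(query)
    scored = sorted(chunks,
                    key=lambda c: -sum(a * b for a, b in zip(q, embed(c))))
    return scored[:k]

corpus = ("LLMs generate text. RAG grounds answers in retrieved documents. "
          "Kubernetes schedules containers.")
docs = chunk(corpus)
top = retrieve("retrieval augmented generation", docs)
```

A production version would add re-ranking (e.g. a cross-encoder over the top-k candidates) before passing retrieved context to the generator.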
Required Qualifications
- Education: Bachelor's or Master's in Computer Science, Artificial Intelligence, or a related field (PhD preferred).
- Experience:
  - 5+ years in machine learning/NLP.
  - 2+ years working directly with LLMs or GenAI applications.
- Technical Skills:
  - Proficiency in Python, ML frameworks (PyTorch/TensorFlow), and Hugging Face Transformers.
  - Hands-on experience with LangChain, LlamaIndex, or SDKs for OpenAI/Anthropic/Cohere/Gemini.
  - Strong understanding of embeddings, tokenization, and vector search/retrieval.
  - Familiarity with MLOps, CI/CD, and cloud platforms (AWS/Azure/GCP); containerization with Docker/Kubernetes.
  - Experience integrating AI APIs (OpenAI, Anthropic, Cohere, Google Gemini).
- Soft Skills: Excellent problem-solving and communication; comfortable leading projects and mentoring teammates.
Preferred/Bonus
- Experience with model distillation and fine-tuning open-source LLMs (LoRA/QLoRA, PEFT).
- Exposure to multimodal AI (text + image + audio/voice), including speech (TTS/ASR) and vision-language models (VLMs).
- Familiarity with AI safety, bias/fairness, privacy, and governance/compliance frameworks.
- Cost/performance tuning: quantization (INT8/INT4), speculative decoding, throughput optimization.
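The quantization item above can be illustrated with a minimal symmetric INT8 round-trip in pure Python. This is a sketch of the idea only: real deployments use library kernels (e.g. in TensorRT-LLM or bitsandbytes), often with per-channel scales and calibration, and the per-tensor scale here is a simplifying assumption.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] to [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    # Recover approximate float weights from the int codes.
    return [x * scale for x in q]

w = [0.82, -1.27, 0.031, 0.5]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The memory saving is the point: INT8 stores each weight in 1 byte instead of 4 (FP32) or 2 (FP16), at the cost of the bounded rounding error computed above.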
Success Metrics (KPIs)
- Model quality (task-specific metrics: accuracy/recall, hallucination rate, BLEU/ROUGE/WER as applicable).
- System performance & cost (P95 latency, throughput, cost per request).
- Reliability (SLO/SLA, error rates) and delivery velocity (lead time, deployment frequency).
- Knowledge impact (PoC production conversions, docs/best practices, mentoring outcomes).
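As a concrete example of the latency KPI above, P95 can be computed from a sample of request latencies with the nearest-rank method. The method choice is an assumption for illustration; monitoring stacks such as Prometheus typically estimate quantiles from histogram buckets instead of raw samples.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    # Nearest-rank percentile: the smallest observed value such that at
    # least p% of the samples are at or below it.
    ordered = sorted(samples)
    rank = math.ceil(p / 100.0 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

latencies_ms = [120, 95, 130, 110, 500, 105, 98, 115, 102, 480,
                101, 99, 108, 112, 125, 104, 97, 103, 109, 111]
p95 = percentile(latencies_ms, 95)  # dominated by the tail (480, 500)
```

Note how two slow requests out of twenty pull P95 far above the median, which is why tail latency, not the average, is the usual SLO target.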
Tools & Environment
- Model/Serving: HF Transformers, vLLM/TensorRT-LLM, Triton, Ray/Modal (as applicable).
- Vector/RAG: FAISS, Pinecone, Weaviate, Milvus; re-ranking (e.g., Cross-Encoder/ColBERT).
- Ops/Observability: MLflow, Prometheus/Grafana, OpenTelemetry, Weights & Biases.
- Data: Airflow/Prefect, dbt, Spark (as needed).
Benefits (customizable)
- Competitive compensation with performance/PoC success bonuses.
- Learning budget/certifications and conference attendance.
- Dedicated GPU credits/resources for R&D; open-source-friendly environment.
- Comprehensive insurance and flexible work arrangements.