
LLM Engineer / GenAI Engineer (RAG & LLMOps)

  • Posted 11 days ago

Job Description

Role Overview

Own the design, fine-tuning, optimization, and production deployment of large language models (LLMs) for domain-specific use cases. You will build high-performance RAG systems, optimize prompts/agents, operate inference at scale, and champion engineering best practices while driving research and innovation.

Key Responsibilities

  • LLM Engineering: Design, fine-tune, and optimize models such as GPT, Claude, Gemini, LLaMA, and Falcon for domain-specific applications.
  • RAG Systems: Build and operate retrieval-augmented generation pipelines (ingestion, chunking, embedding, indexing, retrieval, re-ranking) using vector databases (FAISS, Pinecone, Weaviate, etc.).
  • Prompt/Agent Optimization: Develop prompt templates, chains, and agents with LangChain/LlamaIndex; implement guardrails, tool-use, and memory.
  • Model Deployment (LLMOps): Implement, monitor, and scale inference endpoints with MLflow, Docker, and Kubernetes; manage versioning/registry and safe rollouts (blue-green/canary).
  • Performance Optimization: Evaluate and continuously improve accuracy, latency, and cost (batching, caching/KV-cache, quantization, speculative decoding).
  • Collaboration & Mentoring: Review code, set best practices for AI software engineering, and mentor junior engineers.
  • Research & Innovation: Track advances in LLMs, multimodal AI, and open source; lead PoCs, benchmarking, and knowledge sharing.

Required Qualifications

  • Education: Bachelor's or Master's in Computer Science, Artificial Intelligence, or related field (PhD preferred).
  • Experience:
      • 5+ years in machine learning/NLP.
      • 2+ years working directly with LLMs or GenAI applications.
  • Technical Skills:
      • Proficiency in Python, ML frameworks (PyTorch/TensorFlow), and Hugging Face Transformers.
      • Hands-on with LangChain, LlamaIndex, or SDKs for OpenAI/Anthropic/Cohere/Gemini.
      • Strong understanding of embeddings, tokenization, and vector search/retrieval.
      • Familiarity with MLOps, CI/CD, and cloud platforms (AWS/Azure/GCP); containerization with Docker/Kubernetes.
      • Experience integrating AI APIs (OpenAI, Anthropic, Cohere, Google Gemini).
  • Soft Skills: Excellent problem-solving and communication; comfortable leading projects and mentoring teammates.

Preferred/Bonus

  • Experience with model distillation and fine-tuning open-source LLMs (LoRA/QLoRA, PEFT).
  • Exposure to multimodal AI (text + image + audio/voice), TTS/ASR, VLMs.
  • Familiarity with AI safety, bias/fairness, privacy, and governance/compliance frameworks.
  • Cost/performance tuning: quantization (INT8/INT4), speculative decoding, throughput optimization.
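Symmetric INT8 quantization, one of the cost/performance techniques listed above, can be sketched in a few lines. This is a toy per-tensor illustration only; real deployments rely on library support (e.g. bitsandbytes or TensorRT-LLM) rather than hand-rolled code, and the example weights are invented.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map float weights to int8 range [-127, 127] with one per-tensor scale.
    Assumes at least one nonzero weight (no guard for the all-zero case)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

w = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-to-nearest bounds the per-weight error by scale / 2.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The same scale/round/clamp idea extends to per-channel scales and INT4, which trade a little extra bookkeeping for lower quantization error and memory.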

Success Metrics (KPIs)

  • Model quality (task-specific metrics: accuracy/recall, hallucination rate, BLEU/ROUGE/WER as applicable).
  • System performance & cost (P95 latency, throughput, cost per request).
  • Reliability (SLO/SLA, error rates) and delivery velocity (lead time, deployment frequency).
  • Knowledge impact (PoC production conversions, docs/best practices, mentoring outcomes).
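A KPI like P95 latency above is computed from raw request timings. A minimal sketch using the nearest-rank percentile convention (monitoring libraries often interpolate instead) is shown below; the sample latencies are invented.

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: value at rank ceil(pct/100 * n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 95, 410, 130, 101, 98, 220, 115, 99, 105]
p95 = percentile(latencies_ms, 95)  # dominated by the slowest requests
```

Because tail percentiles are set by the worst few requests, P95/P99 targets are far more sensitive to stragglers (cold starts, long generations) than the mean latency is.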

Tools & Environment

  • Model/Serving: HF Transformers, vLLM/TensorRT-LLM, Triton, Ray/Modal (as applicable).
  • Vector/RAG: FAISS, Pinecone, Weaviate, Milvus; re-ranking (e.g., Cross-Encoder/ColBERT).
  • Ops/Observability: MLflow, Prometheus/Grafana, OpenTelemetry, Weights & Biases.
  • Data: Airflow/Prefect, dbt, Spark (as needed).

Benefits

  • Competitive compensation with performance/PoC success bonuses.
  • Learning budget/certifications and conference attendance.
  • Dedicated GPU credits/resources for R&D; open-source-friendly environment.
  • Comprehensive insurance and flexible work arrangements.


Job ID: 138319783