About the Role
We are seeking a Senior AI Engineer to lead the development of AI agent systems and infrastructure. The role combines deep AI/ML expertise, production system management, and hands-on experience with agentic workflows. You will design, deploy, and maintain robust AI systems that operate autonomously at scale.
Key Responsibilities
- AI Infrastructure Development: Design and implement scalable AI infrastructure supporting complex agentic workflows and model deployment pipelines
- System Architecture & DevOps: Set up, monitor, and maintain production AI systems with full ownership of deployment, scaling, and troubleshooting
- Performance Optimization: Analyze and optimize system performance, identifying bottlenecks and implementing solutions for high-throughput AI applications
- Production Support: Provide 24/7 monitoring, alerting, and rapid resolution of production issues in AI agent environments
- Research Implementation: Translate cutting-edge AI research into production-ready systems and frameworks
Required Qualifications
Core Experience
- 4+ years of hands-on experience with AI infrastructure, agentic workflows, and production model deployment
- DevOps Engineering Skills: Proven ability to independently set up, monitor, and troubleshoot complex distributed systems
- System Performance Expertise: Deep understanding of performance optimization, resource management, and scalability patterns
- Production AI Experience: Track record of maintaining and troubleshooting AI agents and ML systems in production environments
Technical Expertise Required
1. RAG (Retrieval-Augmented Generation) Systems
- Vector database optimization and scaling (Pinecone, Weaviate, Chroma, FAISS)
- Dense/sparse/hybrid retrieval strategies and embedding approaches
- Advanced chunking, query optimization, and multi-step retrieval workflows
- Self-RAG, Corrective RAG, Adaptive RAG, or Graph RAG implementation
- Evaluation frameworks such as RAGAS, and A/B testing for retrieval quality
2. Research & Information Gathering Systems
- Multi-source data integration (web scraping, APIs, structured data)
- Query planning, search optimization, and real-time data processing
- Citation tracking, source attribution, and fact verification systems
- Domain-specific research workflows and automated research pipelines
3. Memory Optimization for AI Agents
- Context window management and sliding window techniques
- External memory stores with episodic/semantic memory separation
- Hierarchical memory architecture (working, short-term, long-term storage)
- Memory consolidation, relevance scoring, and personalization systems
4. Agent Orchestration Frameworks
- Multi-agent communication, message passing, and shared state management
- Task decomposition, dependency management, and parallel execution
- Master-worker, peer-to-peer, and hierarchical coordination patterns
- Fault tolerance, dynamic agent spawning, and performance monitoring
Preferred Qualifications
- Advanced Degree: Computer Science, AI/ML, or a related technical field
- Cloud Platform Expertise: AWS, GCP, or Azure with experience in managed AI services
- Container Orchestration: Kubernetes, Docker, and container-based deployment strategies
- Monitoring & Observability: Experience with Prometheus, Grafana, ELK stack, or similar tools
- Programming Languages: Proficiency in Python, with experience in Go, Rust, or C++ for performance-critical components
- Research Background: Publications or contributions to open-source AI/ML projects
- Industry Experience: Previous work in AI-first companies, research labs, or high-scale technology environments