About the Role
Cake, a leading Vietnamese digital bank with a rapidly growing user base exceeding 6 million, is on a mission to become the NextGen AI Bank. We leverage cutting-edge AI and data science across all our products and services — and we believe generative AI will redefine what banking can be: smarter, faster, and more human.
We are seeking a (Senior) AI Engineer who bridges the gap between pure AI/ML and high-performance software engineering. This is a hybrid role combining AI/ML Ops platform management, rigorous inference optimization, and AI backend engineering. You will be responsible for taking trained models out of the lab and orchestrating them into resilient, ultra-low-latency production systems.
This is an opportunity to own the deployment lifecycle and infrastructure architecture—building complex agentic workflows and self-hosted clusters that power critical financial operations.
Key Responsibilities
- Inference & Optimization: Deploy, profile, and aggressively optimize models for low-latency, high-throughput production inference.
- MLOps & Platform: Design and maintain robust pipelines for continuous integration, delivery, and automated testing of ML models. Manage self-hosted GPU clusters and containerized orchestration.
- AI Software & Agent Orchestration: Architect and implement complex AI agent workflows. Build robust APIs, persistent memory layers, and backend services to connect core inference engines with client-facing applications.
- Developer Productivity: Leverage advanced development tools and AI coding assistants to rapidly prototype, debug, and scale complex architectural solutions.
- Security & Compliance: Ensure all self-hosted infrastructure, data pipelines, and open-source integrations strictly comply with regional data sovereignty regulations.
- Cross-Functional Collaboration: Partner closely with foundational ML researchers, frontend product teams, and enterprise clients to ensure POCs and production systems scale seamlessly.
Qualifications
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field.
- At least 1 year of engineering experience with a heavy focus on MLOps, ML Infrastructure, or high-performance Backend/Systems Engineering.
- Deep, hands-on expertise with Python, C++, or Go.
- Strong experience with Kubernetes, Docker, and infrastructure-as-code within self-hosted or bare-metal environments.
- Experience with cloud-based deployment (preferably on Google Cloud Platform) of containerized services (Docker, Kubernetes).
- Solid understanding of system design, API development (FastAPI/gRPC), and building complex software architectures around non-deterministic AI outputs.
- A product-oriented mindset with the ability to troubleshoot complex latency bottlenecks across the entire network and hardware stack.
Nice-to-Have
- Hands-on experience building and orchestrating AI Agents, including identity layers and RAG pipelines.
- Experience processing and extracting data from complex documents or working with specialized open-source models (e.g., OlmOCR).
- Previous experience deploying AI or secure software solutions within the fintech, banking, or BFSI sectors.
- Active contributions to open-source projects.
Why You'll Love Working at Cake
- Build generative AI systems used daily by 6+ million users — your work makes an immediate, tangible impact.
- End-to-end ownership: from foundation-model research to production deployment.
- Work with a cutting-edge cloud-native stack (GCP, Vertex AI, Kubernetes, Airflow) and large-scale GPU compute.
- Collaborative, high-performing tech culture where experimentation and innovation are encouraged.
- Competitive compensation and the opportunity to grow within a fast-scaling digital bank.
Our Benefits
- Competitive compensation including a 13th-month wage and up to 3 months of performance-based bonus.
- MacBooks are supplied to all technical team members.
- Be Corp budget (varies by level) for transportation, food, and car bookings in the Be app.
- Social insurance contributions based on each employee's level.
- Annual health checks and premium health insurance (PTI) after probation.
- 15 days of annual leave for all staff.
- Company trips, team-building activities, and happy-hour events on a quarterly or annual basis.