About Epsilo
- We're building the operating system for retail media, unifying fragmented workflows across 50+ marketplaces globally
- Trusted by the world's largest consumer brands including Unilever, P&G, L'Oréal, and Colgate
- Backed by Sequoia, Vulcan Capital (founded by Microsoft co-founder Paul Allen), and K3 Ventures
- Recognized by Gartner in their Market Guide for Digital Shelf Analytics for 3 consecutive years (2024, 2025, 2026)
- Building from APAC to serve the fastest-growing retail media markets in the world
About the role
Every dollar a brand spends on advertising is a decision. Our Budget Distribution Service makes thousands of those decisions every day: allocating budgets across campaigns and ad objects in real time, enforcing financial correctness, and reconciling state across multiple marketplace platforms.
As the end-to-end owner of this service, you will shape how it evolves: from the core allocation engine and constraint logic, to reconciliation loops that keep internal state honest, to a context-aware layer that ensures every distribution is computed on the freshest possible signal.
This is a role for engineers who care about correctness as much as speed, who find elegance in hard constraints, and who want their work to have a direct, measurable impact on business outcomes.
The problem space
Budgets flow through a hierarchy, from a parent account down to campaigns, ad groups, and individual ad objects. Along the way, they interact with platform automations, user overrides, minimum thresholds, daily change limits, and eligibility rules that vary by marketplace.
By the end of a day, internal state and marketplace state need to agree — or revenue is lost.
The person in this role will own the systems that make that agreement happen, reliably, at scale, every day.
What you'll do
- Design and own the budget allocation engine, distributing budgets across a multi-level hierarchy (company → storefront → campaign → ad object) with correctness guarantees: constraint enforcement, reconciliation, and audit
- Own the budget attribute model and reconciliation loops that keep internal state in sync with marketplace state, intraday and at day-end
- Define and uphold distribution SLAs across the full daily cycle, with concurrency safeguards and invariant monitors that surface failures fast
- Drive continuous improvement on reliability metrics: distribution accuracy, reconciliation success rate, and system MTTR
- Build a context layer that ensures every distribution decision is computed on the freshest possible state — including automation signals, spend data, and lock indicators from multiple marketplace platforms
- Architect the system to react to state changes (status transitions, automation events, schedule triggers) rather than rely solely on periodic cycles
Key responsibilities
- Own the service end-to-end: architecture, implementation, testing, deployment, on-call, and continuous improvement
- Lead root cause analysis for reconciliation failures, using data monitors and structured investigation
- Collaborate with Data/ETL, Ads Operations, Validation, and Frontend teams to consolidate all budget write paths through a single, trusted service
- Keep the service spec as a living document — updated as platform behavior evolves across supported marketplaces
- Extend the service to support new ad formats and platforms with zero regression to existing distribution and scheduling logic
What we're looking for
- 4+ years building correctness-critical backend services in production, where bugs mean financial loss, not just degraded UX
- Deep understanding of distributed state: concurrent write conflicts, idempotency patterns, event ordering, and reconciliation strategies when internal and external state diverge
- Experience with event-driven architectures (Kafka or equivalent) where triggers from multiple sources must be deduplicated, sequenced, and handled deterministically
- Strong instinct for what makes a system observable: structured audit trails, invariant monitors, and diagnostic tooling that make production failures debuggable in minutes, not hours
- Comfort operating services with SLAs: you define distribution runtime targets, instrument them, and are accountable when they deviate
- Fluent with AI coding tools (Cursor, Claude, Codex) to accelerate development without compromising quality — you write effective prompts and manage context windows well
- Habit of writing clear specs before writing code: you push back when requirements are ambiguous or scope is too broad, and you clarify intent before building
- Clear technical communication and ability to work autonomously in a high-trust, low-process environment
Our stack
- Backend: Java, Go, Python
- Databases: SingleStore, Redis
- Infrastructure: AWS, GCP, K8s, Kafka
- AI-powered coding: Cursor, Codex, Claude, Graphite
- AI collaboration: Notion, Linear, ChatGPT
Why join Epsilo
- Own systems where correctness matters as much as speed. Every dollar allocated is a real business decision, and your code directly impacts how brands spend millions daily across 50+ marketplaces
- Build elegant solutions to hard constraints. Budget hierarchies, reconciliation loops, state convergence, and context-aware allocation aren't trivial — they're engineering challenges worth solving well
- Work with world-class brands. Your distribution logic powers advertising operations for Unilever, P&G, L'Oréal, and Colgate across the fastest-growing retail media markets globally
- Define service standards from the ground up. You set the SLAs, build the monitors, own the on-call, and continuously improve the system based on what you learn in production
- AI-native development environment. Use Cursor, Claude, and Codex to ship faster without compromising quality. We're building modern tooling for modern problems
- Early team, massive leverage. Your architectural decisions and code patterns will shape how this service scales to support billions in ad spend