
Project Overview:
You will be a key member of the delivery team on a high-stakes data platform migration for a multi-brand iGaming operator. The engagement replaces a legacy Pentaho/Postgres architecture with a modern SQLMesh + Dagster + BigQuery stack, completing before a 3-4X data volume surge expected in July 2026.
This is hands-on build work. You will work directly in the codebase from day one, embedded within a squad led by a Fractional CTO/Data Architect.
Scope of Work
Phase 1a: Core Migration (Mar-Jul 2026):
Migrate 128 Pentaho ETL jobs to idempotent SQLMesh models following Medallion architecture (Bronze → Silver → Gold)
Convert 20+ complex PL/pgSQL trigger functions to event-driven Dagster assets and SQLMesh models
Build and validate a Reverse ETL pipeline (BigQuery → Postgres) covering 107 tables across 3 SLA tiers: Hot: 5 min, Warm: 15-30 min, Cold: daily
Execute a 7-Day Consistency Check with row-count and column-sum reconciliation as the production cutover gate (see the reconciliation sketch after this list)
Support UAT, production cutover, and 1-month hypercare post-cutover
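
For context on that cutover gate: the sketch below shows one way the row-count and column-sum reconciliation could look in Python, assuming google-cloud-bigquery and psycopg2. Every project, dataset, table, column, and connection string here is a hypothetical placeholder, not the engagement's actual naming.

from google.cloud import bigquery
import psycopg2

# (BigQuery table, Postgres table, financial column) triples to reconcile;
# all names are placeholders
CHECKS = [
    ("my-project.gold.bets", "public.bets", "turnover"),
]

bq = bigquery.Client()
pg = psycopg2.connect("postgresql://user:pass@host:5432/db")  # placeholder DSN

def bq_stats(table: str, col: str) -> tuple:
    # Row count and column sum on the BigQuery side
    row = next(iter(bq.query(f"SELECT COUNT(*), SUM({col}) FROM `{table}`").result()))
    return (row[0], row[1])

def pg_stats(table: str, col: str) -> tuple:
    # The same two aggregates on the Postgres side
    with pg.cursor() as cur:
        cur.execute(f"SELECT COUNT(*), SUM({col}) FROM {table}")
        return cur.fetchone()

for bq_table, pg_table, col in CHECKS:
    src, dst = bq_stats(bq_table, col), pg_stats(pg_table, col)
    print(("OK   " if src == dst else "DRIFT"), bq_table, src, "vs", pg_table, dst)

In practice this would run once per day over the 7-day window, with any DRIFT line on a financial column blocking cutover.
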
Phase 1b: BigQuery Modernisation (Jul-Sep 2026):
Rewrite 164 BigQuery Scheduled Queries from EXTERNAL_QUERY to native BigQuery reads (rewrite pattern sketched after this list)
Rewrite 32 BigQuery Stored Procedures to eliminate CloudSQL dependencies
Retire 12 CloudSQL connections across 10 brands after 7-day burn-in validation
1-month hypercare support post-Phase 1b cutover.
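
For illustration, the Scheduled Query rewrite follows a simple before/after pattern. The connection ID, dataset, and table names below are hypothetical, and the sketch assumes the source table has already been replicated into BigQuery:

from google.cloud import bigquery

# Before: federated read through a CloudSQL connection (the dependency
# being retired in Phase 1b)
legacy_sql = """
SELECT member_id, tier
FROM EXTERNAL_QUERY(
  'my-project.europe-west1.cloudsql-conn',
  'SELECT member_id, tier FROM public.member_tiers'
)
"""

# After: native read against the replicated BigQuery table, so the
# CloudSQL connection can be dropped after the 7-day burn-in
native_sql = """
SELECT member_id, tier
FROM `my-project.bronze.member_tiers`
"""

bigquery.Client().query(native_sql).result()
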
Technical Requirements
Must Have:
SQLMesh: Incremental models, full-refresh strategies, model testing, Medallion architecture
Dagster: Asset pipelines, sensors, scheduling, upstream/downstream dependency management
BigQuery: Partitioning, clustering, MERGE with partition pruning, cost-aware query design, INFORMATION_SCHEMA
PostgreSQL: Triggers, stored functions, PL/pgSQL fluency, trigger-to-model conversion patterns
Python: Apache Arrow, Polars (or equivalent) for high-throughput Reverse ETL data movement (see the sketch after this list)
Data validation: row-count reconciliation, column-sum diffing, data-diff tooling
CI/CD for data: Git PR workflows, automated model testing, Azure DevOps experience
Idempotency discipline: all models must be re-runnable with zero side effects
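
As referenced above, a minimal Arrow/Polars Reverse ETL sketch might look like the following, assuming google-cloud-bigquery, polars, and the adbc-driver-postgresql package; all names and the DSN are placeholders. A full-refresh write is shown only as the simplest idempotent baseline; Hot-tier tables would more likely use a keyed upsert.

import polars as pl
from google.cloud import bigquery

# Pull one Gold table out of BigQuery as Arrow (near zero-copy into Polars)
arrow_tbl = bigquery.Client().query(
    "SELECT member_id, tier, turnover FROM `my-project.gold.member_tiers`"
).to_arrow()

df = pl.from_arrow(arrow_tbl)

# Full-refresh write keeps the job re-runnable: repeated runs converge on
# the same state instead of appending duplicates
df.write_database(
    table_name="public.member_tiers",
    connection="postgresql://user:pass@host:5432/db",  # placeholder DSN
    if_table_exists="replace",
    engine="adbc",
)
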
Nice to Have:
iGaming or sportsbook domain knowledge: member tiers, bet/deposit fact tables, fiscal period logic
Experience with Datastream, Airbyte, or other CDC ingestion tools
dbt experience (cross-applicable to SQLMesh patterns)
Terraform or Pulumi on GCP
Grafana or Datadog for pipeline observability
Ideal Candidate
5+ years in data engineering with a track record of shipping production pipelines, not just building in staging
Has executed at least one legacy-to-modern DWH migration (not just contributed to one)
Comfortable reading, debugging, and rewriting PL/pgSQL business logic
Writes clean, testable SQL and Python and holds the same standard for teammates
Works well under an experienced technical lead: takes direction, flags risk early, executes independently
Does not need hand-holding on tooling; picks up SQLMesh or Dagster quickly if not already proficient
Available to start March 2026; commitment to Phase 1b continuation preferred
Code Quality Standards:
All models and assets produced must adhere to:
Explicit partitioning and clustering keys on all destination tables
No broad DISTINCT; use GROUP BY or window functions instead
MERGE with partition pruning for upserts; no DELETE+INSERT anti-patterns (see the sketch after this list)
No temp tables or CTEs materialised to disk
Zero material drift on financial columns (Turnover, Revenue): the mandatory acceptance gate
Python adhering to PEP 8; SQL modular, idempotent, and reusable
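
As referenced in the list above, here is a hedged sketch of the MERGE-with-partition-pruning upsert. Table and column names are hypothetical, and the target is assumed to be partitioned on DATE(bet_ts); the constant date filter on the target in the ON clause is what lets BigQuery prune partitions instead of scanning the whole Gold table.

from google.cloud import bigquery

# Upsert one day of Silver into Gold. The literal date would be rendered
# per run by the orchestrator; it is hardcoded here only for illustration.
merge_sql = """
MERGE `my-project.gold.bets` AS t
USING (
  SELECT bet_id, bet_ts, turnover, revenue
  FROM `my-project.silver.bets`
  WHERE DATE(bet_ts) = DATE '2026-07-01'
) AS s
ON t.bet_id = s.bet_id
   AND DATE(t.bet_ts) = DATE '2026-07-01'  -- constant filter on the target prunes partitions
WHEN MATCHED THEN
  UPDATE SET turnover = s.turnover, revenue = s.revenue
WHEN NOT MATCHED THEN
  INSERT (bet_id, bet_ts, turnover, revenue)
  VALUES (s.bet_id, s.bet_ts, s.turnover, s.revenue)
"""

# Re-runnable with zero side effects: a second run updates the same rows
# rather than duplicating them, unlike a DELETE+INSERT pattern
bigquery.Client().query(merge_sql).result()
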
Working Norms
3x-per-week standup with the Fractional CTO, focused on blockers
Weekly sync reviewing milestones, risks, and forward planning
All work submitted via Pull Request, reviewed and approved by Client Data Engineering before merge
Named to the engagement; no rotation without client approval
Interview Process
This role goes through the Qsoft pre-qualification process, consisting of:
Short 15-20 minute interview with the project team
The candidate may join 1-2 interviews on the client side.
Job ID: 143818939