Cake by VPBank - Digital Bank

Senior Data Engineer

  • Posted 20 hours ago

Job Description

About the Role

We are seeking a Senior Data Engineer to build and stabilize the next generation of Cake Digital Bank's unified data infrastructure — powering analytics, dashboards, and real-time intelligence across the organization.

You will work across streaming, batch, and lakehouse layers, ensuring that our data pipelines and platforms — BigQuery, Doris, and Iceberg — operate with reliability, observability, and cost-efficiency.

This role is central to the evolution of Cake's data mesh and lakehouse architecture, where dashboards, dbt transformations, and machine-learning features all depend on a stable, scalable, and auditable data foundation.

Key Responsibilities

1. Data Infrastructure Stability & Reliability

  • Design, implement, and maintain high-availability data pipelines that replicate data from OLTP systems to analytical warehouses (BigQuery, Doris)
  • Build resilience and recovery patterns — checkpointing, replay queues, schema-aware ingestion, deduplication, and versioned storage.
  • Lead incident response, RCA, and SLA management for data infrastructure components.
  • Implement end-to-end observability across ingestion, transformation, and serving layers.
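
The resilience patterns listed above (checkpointing, replay, deduplication) can be sketched in miniature. This is an illustrative toy, not Cake's actual pipeline code: `Checkpoint` and `ingest` are hypothetical names, and a real system would persist the checkpoint and bound the seen-ID set.

```python
from dataclasses import dataclass, field

@dataclass
class Checkpoint:
    """Tracks the last committed offset and recently seen event IDs."""
    offset: int = 0
    seen_ids: set = field(default_factory=set)

def ingest(events, checkpoint):
    """Replay-safe ingestion: skip events at or below the committed offset
    and drop duplicates by event ID, so an at-least-once upstream queue
    behaves effectively-once downstream."""
    accepted = []
    for offset, event in events:
        if offset <= checkpoint.offset:
            continue  # already committed; safe to replay past it after a crash
        if event["id"] in checkpoint.seen_ids:
            continue  # duplicate delivery from the upstream queue
        checkpoint.seen_ids.add(event["id"])
        checkpoint.offset = offset
        accepted.append(event)
    return accepted
```

Replaying the same batch after a restart yields no new rows, which is the property that makes recovery safe.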

2. Unified Storage & Lakehouse Evolution

  • Architect and operationalize Apache Iceberg as the central data storage layer to unify data across BigQuery and Doris.
  • Define data layout, partitioning, compaction, and schema evolution strategy for Iceberg tables stored on GCS.
  • Design cross-system metadata synchronization between Iceberg catalog, BigQuery external tables, and Doris engines.
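
A schema evolution strategy like the one above usually includes a pre-flight compatibility check. The sketch below shows the idea with plain dicts; it is not Iceberg's API (which also permits safe type promotions such as int to long), just the additive-only rule that avoids rewriting data files:

```python
def is_safe_evolution(old_schema, new_schema):
    """Check that a proposed schema change is additive-only: every existing
    column keeps its type, and any new column must be optional (nullable).
    Schemas are modeled as {name: (type, required)} for illustration."""
    for name, (col_type, _required) in old_schema.items():
        if name not in new_schema:
            return False  # dropping a column breaks downstream readers
        new_type, _ = new_schema[name]
        if new_type != col_type:
            return False  # type changes need an explicit promotion rule
    for name, (_col_type, required) in new_schema.items():
        if name not in old_schema and required:
            return False  # new required columns would fail on old files
    return True
```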

3. Platform Automation & Scalability

  • Develop and maintain infrastructure-as-code (Terraform, Helm, Config Sync) for Airflow, Flink, Doris, and BigQuery resources.
  • Build self-service templates for creating new pipelines, enabling consistent deployment and monitoring.
  • Optimize data infrastructure configurations to support workload growth while controlling costs.
  • Automate schema detection, dependency validation, and backfill workflows for safe and predictable releases.
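
A self-service pipeline template, as described above, typically renders a spec with shared defaults so every new pipeline ships with the same retry, alerting, and SLA settings. The field names below are illustrative, not a real Cake schema:

```python
def render_pipeline_config(name, source, target, schedule="0 * * * *"):
    """Render a pipeline spec from a shared template. Validation at render
    time catches mistakes before anything is deployed."""
    if not name.isidentifier():
        raise ValueError(f"pipeline name {name!r} must be a valid identifier")
    return {
        "dag_id": f"replicate_{name}",
        "schedule": schedule,
        "source": source,
        "target": target,
        "retries": 3,                    # shared default for all pipelines
        "retry_delay_minutes": 5,
        "alert_channel": "#data-oncall",  # assumed team convention
        "sla_minutes": 60,
    }
```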

4. Data Governance, Quality & Security

  • Integrate with DataHub to maintain end-to-end lineage, glossary, and policy-tag enforcement.
  • Apply column-level security and row-level policies for sensitive and restricted data (PII, PCI, financial metrics) in BigQuery and Doris.
  • Establish validation checks and data contracts to ensure quality and consistency across streaming and batch paths.
  • Collaborate with Compliance and Risk teams to maintain auditable data flow and access traceability.
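
Column-level security of the kind described above can be pictured as a masking rule keyed on policy tags. This toy mimics what BigQuery policy tags or Doris column privileges enforce in the engine; the tag names are invented for illustration:

```python
SENSITIVE = {"national_id", "card_number"}  # columns under an assumed PII tag

def apply_column_policy(row, granted_tags):
    """Mask sensitive columns unless the caller holds the matching grant."""
    masked = {}
    for col, value in row.items():
        if col in SENSITIVE and "pii_reader" not in granted_tags:
            masked[col] = "***"  # value never leaves the serving layer
        else:
            masked[col] = value
    return masked
```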

5. Performance Optimization & Cost Efficiency

  • Continuously tune BigQuery slot usage, query design, and reservation policies by domain.
  • Improve Doris query performance through partition pruning, tablet balancing, and index tuning.
  • Design caching, pre-aggregation, and materialized-view strategies to accelerate dashboards while reducing query cost.
  • Track job efficiency, data duplication, and storage growth across the platform.
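
The pre-aggregation idea above is simple to demonstrate: dashboards read a small rollup keyed by (date, metric) instead of rescanning raw rows on every refresh, which is the essence of a materialized view. A minimal sketch with invented field names:

```python
from collections import defaultdict

def build_daily_rollup(events):
    """Pre-aggregate raw events into a (date, metric) -> total mapping,
    refreshed on a schedule rather than per query."""
    rollup = defaultdict(float)
    for e in events:
        rollup[(e["date"], e["metric"])] += e["value"]
    return dict(rollup)

def query_rollup(rollup, date, metric):
    """Serve a dashboard tile from the rollup with a single lookup."""
    return rollup.get((date, metric), 0.0)
```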

6. Collaboration & Leadership

  • Partner with BI, Risk, and ML teams to deliver reliable, low-latency data products that power business decisions.
  • Mentor junior engineers on best practices for streaming systems, Airflow orchestration, and infra automation.
  • Participate in architecture design sessions to guide the evolution of Cake's multi-engine, unified lakehouse platform.

Required Skills & Experience

  • 5+ years of hands-on experience in data engineering or data platform development in a cloud-native environment
  • Strong proficiency in SQL (BigQuery, Doris, or equivalent MPP engines) and Python for building, monitoring, and automating data pipelines.
  • Proven experience operating streaming and CDC systems such as Dataflow, Flink, Debezium, or Datastream, with solid understanding of checkpointing, offset management, and backpressure handling.
  • Deep understanding of data modeling and warehouse optimization — partitioning, clustering, materialized views, caching, and query tuning.
  • Hands-on experience implementing and maintaining infrastructure-as-code using Terraform, Helm, or Config Sync.
  • Familiarity with Apache Iceberg (or Delta/Hudi) and lakehouse design patterns, including schema evolution and compaction.
  • Strong background in observability and reliability engineering — building dashboards, alerts, and auto-remediation for data pipelines using Prometheus, Grafana, or Cloud Monitoring.
  • Understanding of data governance and security concepts: column-level security, policy tags, lineage, and data classification using DataHub or similar systems.
  • Working knowledge of orchestration and workflow automation tools such as Airflow or Dagster.
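
Backpressure handling, mentioned in the streaming requirements above, comes down to a bounded buffer that refuses new records instead of dropping them, forcing the producer to slow down. A toy version (real engines such as Flink do this across the network, not in one process):

```python
from collections import deque

class BoundedBuffer:
    """Toy backpressure: offer() returns False when full, signaling the
    producer to pause rather than silently losing data."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def offer(self, item):
        """Accept the item, or return False to apply backpressure."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def poll(self):
        """Consume one item, freeing capacity for the producer."""
        return self.items.popleft() if self.items else None
```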

Our benefits:

  • Competitive compensation, including a 13th-month salary and up to 3 months of performance-based bonus.
  • A MacBook and essential equipment are provided.
  • A Be Corp budget (varying by level) is allocated for services such as transportation, food, and passenger-car bookings in the Be application.
  • Annual health checks and premium medical healthcare (PTI) after probation.
  • 15 days of annual leave for all employees.
  • Company trips, team-building activities, and happy hour events are organized on a quarterly or annual basis.

More Info


Job ID: 146928727