
Seedcom

Senior Data Engineer

Posted 21 hours ago

Job Description

About this Role

We are building a new data platform for our group - a dimensional architecture serving sales, inventory, procurement, finance, marketing, and supplier-facing data products. A preliminary architecture is in place; we are hiring the team that will finalize and build it.

This role is one half of a senior peer pair. You will own the platform spine: bronze and silver layers, infrastructure, governance, lineage, data quality, MLOps foundations, and performance. Your peer (already hired) owns the analytical surface: gold-layer dbt models, the semantic layer, business translation, and stakeholder enablement. Together you form the technical leadership of the data engineering team and report directly to the Head of Data.

Neither role is junior to the other; you cover different territory. The analytics lead is excellent at gold-layer modeling, BI enablement, and stakeholder translation. We are looking for the counterpart: someone whose strengths are concentrated in platform engineering at scale, governance maturity, and the kind of architectural rigor that survives an 18-month build cycle without rotting.

Your peer can review one of your dbt models when you're out. You can run a finance-UAT meeting if your peer is sick. Neither of you can do the other's full job, but both of you can keep the trains running for two weeks if needed. We will give you both shared visibility into all systems and explicit cross-training time during onboarding. We will not let either of you become a single point of failure for our delivery.

Key Responsibilities

Platform & infrastructure

• Set up and own the medallion infrastructure on Databricks or BigQuery. Make the build-versus-buy and stack-selection calls in months 1-3

• Design the bronze ingestion framework - connector patterns, schema registry, error handling, idempotency, replay, dead-letter queues. Build it once, scale it across bronze tables and source-system categories
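To make the ingestion-framework responsibility concrete, here is a minimal sketch of the idempotency and dead-letter-queue pattern the bullet above describes. All names (`BronzeTable`, `ingest`) are illustrative assumptions, not an existing framework, and real implementations would sit on object storage rather than in-memory lists:

```python
# Hypothetical sketch of idempotent bronze ingestion with a dead-letter
# queue: replayed records are no-ops, invalid records are routed to a
# DLQ for triage instead of being dropped.
import hashlib
import json

class BronzeTable:
    """Append-only bronze store with idempotency and a dead-letter queue."""

    def __init__(self):
        self.rows = []          # accepted raw records
        self.seen = set()       # content hashes, making replays safe
        self.dead_letters = []  # failed records kept for remediation

    @staticmethod
    def _key(record: dict) -> str:
        # Deterministic content hash: the same payload replayed twice
        # produces the same key and is skipped.
        return hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()

    def ingest(self, record: dict) -> str:
        key = self._key(record)
        if key in self.seen:
            return "duplicate"                # replay-safe: skipped
        if "id" not in record:                # stand-in schema check
            self.dead_letters.append(record)  # route to DLQ, never drop
            return "dead-letter"
        self.seen.add(key)
        self.rows.append(record)
        return "accepted"
```

The point of the pattern is that replaying a source extract after a failure is always safe, and nothing silently disappears: every record either lands in bronze or in the DLQ.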

• Own silver-layer transformation patterns including identity resolution, currency normalization, Vietnamese diacritics handling, and cross-source deduplication

• Establish CI/CD, deployment patterns, monitoring, alerting, and DR plan for the entire data platform. Make sure non-engineers can read what's happening in production

• Run the FinOps function. Stay inside the agreed cost envelope. Surface and act on cost anomalies before finance does

Governance, lineage, compliance

• Stand up a data catalog and lineage tool (OpenMetadata, DataHub, or equivalent) and drive its adoption across the team. Treat governance as a developer-experience problem, not a documentation problem

• Build the data quality log entity (table) and the rule engine that populates it. Set DQ SLAs, manage breaches, drive remediation
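As a sketch of what the bullet above means by a rule engine populating a data quality log entity: each rule run appends one auditable log row. The rule names, fields, and helper functions here are illustrative assumptions, not a prescribed design:

```python
# Hypothetical sketch of a data-quality rule engine that populates a
# DQ log table: one log entry per rule evaluation, suitable for SLA
# tracking and breach remediation.
from datetime import datetime, timezone

def not_null(column):
    """Rule factory: passes when no row has a null in `column`."""
    return lambda rows: all(r.get(column) is not None for r in rows)

def positive(column):
    """Rule factory: passes when every value in `column` is > 0."""
    return lambda rows: all((r.get(column) or 0) > 0 for r in rows)

def run_rules(table_name, rows, rules):
    """Evaluate each named rule against `rows`; return DQ log entries."""
    log = []
    for rule_name, check in rules.items():
        log.append({
            "table": table_name,
            "rule": rule_name,
            "passed": check(rows),
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })
    return log
</antml>```

In production the log rows would land in the DQ log table itself, where SLA dashboards and breach alerts read from them.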

• Own the technical implementation of PDPL controls: audit logging for PII access, consent enforcement at query time, retention/erasure automation, breach detection and 72-hour notification workflow

• Own the technical implementation of Decree 70/2025 real-time POS-to-GDT e-invoice integration. This is real-time, transactional, and tax-critical - the kind of work where streaming architecture decisions matter

ML platform

• Build the foundations of our ML platform: ML model registry data entity, feature store, model serving, monitoring, drift detection (after Phase 1)

• Partner with our AI/ML team (separate from data engineering) to make their models deployable (you won't train models - you will make sure they can ship)

Team and partnership

• Mentor engineers as the team grows

• Run architecture review for every significant change (push back on shortcuts; approve trade offs explicitly)

• Partner with your peer on shared decisions

Requirements

• 7+ years in data engineering with at least 3 years owning platform-level concerns (not just pipeline development) at a company processing hundreds of GB per day or more

• Production experience with one or more of: Databricks lakehouse architecture, BigQuery at scale, Snowflake, Apache Iceberg/Delta Lake, Spark on YARN. We need someone who knows the failure modes, not just the marketing materials.

• Demonstrated dimensional modeling fluency. You can implement SCD Type 2, conformed dimensions, surrogate key strategies, and explain why fact-table grain matters. Kimball school is fine; Inmon is fine; Data Vault is fine - but you can defend your choices.
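As a pointer to what the SCD Type 2 requirement above entails, here is a minimal in-memory sketch: when a tracked attribute changes, the current dimension row is expired and a new version is opened. Column names and the surrogate-key scheme are illustrative assumptions; a real implementation would be a dbt snapshot or a warehouse MERGE:

```python
# Hypothetical sketch of an SCD Type 2 upsert: expire the current
# version of a dimension member and open a new one when any tracked
# attribute changes; unchanged records are a no-op.
def scd2_upsert(dim, natural_key, attrs, as_of):
    """dim: list of version rows. Mutates and returns `dim`."""
    current = [r for r in dim
               if r["natural_key"] == natural_key and r["is_current"]]
    if current:
        row = current[0]
        if {k: row[k] for k in attrs} == attrs:
            return dim                      # no change: keep current version
        row["is_current"] = False           # expire the old version
        row["valid_to"] = as_of
    dim.append({                            # open a new current version
        "surrogate_key": len(dim) + 1,      # simplistic surrogate key
        "natural_key": natural_key,
        **attrs,
        "valid_from": as_of,
        "valid_to": None,
        "is_current": True,
    })
    return dim
```

The grain point from the bullet shows up here: each row is one *version* of a dimension member, so facts join on the surrogate key to get attributes as they were at transaction time.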

• Streaming or real-time pipeline experience. Spark Streaming, Kafka, Flink, Pub/Sub, or equivalent. Decree 70 integration is real-time and we cannot assume only batch.

• Data governance experience. Lineage tools (OpenMetadata, DataHub, Atlan, Collibra), catalog adoption strategies, data quality frameworks. You've done this in production, not just read about it.

• Strong programming in Python, SQL, and at least one of Scala or Java

• Cloud platform depth. Either AWS or GCP, at the level of someone who has set up VPCs, IAM, secrets management, and cost controls - not just consumed managed services.

• Senior judgment. You can disagree with leadership respectfully. You raise risks early instead of hiding them.

Nice to Have

• MLOps platform experience — feature stores (Feast, Tecton, native Databricks), model serving (NVIDIA Triton, Seldon, BentoML), drift detection, A/B serving

• dbt experience at scale, especially dbt Core with custom macros and dbt Cloud orchestration.

• Experience with Vietnamese data domains: e-invoice/GDT integration, Vietnamese language processing (PhoBERT, diacritics), VN payment ecosystem (MoMo, ZaloPay, VietQR), or VN regulatory frameworks

• Retail, FMCG, e-commerce, or marketplace domain experience

• Experience leading cloud-to-on-prem or on-prem-to-cloud migrations with measurable cost outcomes

AI-Era Mindset (Important): We leverage AI to work smarter. We value candidates who:

• Embrace AI tools (ChatGPT, Claude, Copilot...) to boost productivity in daily work

• Use AI to speed up test case design, log analysis, and documentation

• Are open to experimenting with new AI-powered approaches for quality assurance

• Are willing to learn and adapt as technology evolves rapidly

What We Really Look For

• Ownership and accountability for product quality

• Willingness to dig deep into complex problems

• Comfort working with complex, multi-service systems

• Bold passion for quality and continuous improvement

What You'll Get

• The rare opportunity to build a modern data stack from the ground up with direct backing from executive leadership

• Deep collaboration across Engineering, Operations, and Business teams

• Competitive compensation based on capability

Head Office & Work Location: Our office is located at The 678 Building, 67 Hoang Van Thai, Tan My, District 7, Ho Chi Minh City.


Job ID: 147383491
