Data Engineer (Middle)

VNG

Ho Chi Minh, Vietnam

2-4 Years

Job Description

We are looking for a Data Engineer to join our team and help build and scale our data infrastructure. This role offers the opportunity to work on real-world, large-scale data challenges while gaining exposure to modern data engineering practices and technologies. You will collaborate closely with senior engineers and take ownership of specific components, pipelines, and initiatives that support multiple critical business functions.

Key Responsibilities:

Operational Excellence: Maintain the daily health of core batch and streaming pipelines. Ensure strict adherence to SLAs for data freshness and availability.
Incident Response & Resolution: Lead the troubleshooting process for pipeline failures. Perform deep-dive Root Cause Analysis (RCA) on Spark/SQL jobs and implement permanent code fixes to prevent recurrence.
Performance Engineering: Proactively audit and optimize existing pipelines (tuning Spark memory, optimizing SQL execution plans, handling data skew) to reduce latency and infrastructure costs.
Data Integrity & Quality: Implement robust data quality checks (freshness, completeness, consistency). You will be responsible for reconciling data discrepancies, particularly for high-impact Financial and Game Logic data.
Data Recovery & Backfilling: Manage complex data backfills and reprocessing workflows efficiently when upstream sources change or incidents occur.
Cross-Functional Collaboration: Partner with the Infrastructure Team to resolve system-level bottlenecks, and work with Data Analysts to investigate logic discrepancies.
Governance & Compliance: Strictly adhere to data governance policies, security regulations, and internal standard operating procedures (SOPs) to ensure full Data Compliance.

Requirements:

Experience & Mindset

2-4 years of Data Engineering experience, with a proven track record of operating production pipelines at scale.
Stability-First Mindset: You prioritize making systems robust, efficient, and maintainable over simply building new features.
Problem Solver: You enjoy the challenge of debugging complex distributed system errors (e.g., OOM errors, shuffle failures, data skew).

Technical Skills

SQL Mastery: Advanced ability to write, debug, and optimize complex queries.
Spark Expertise: Strong hands-on experience with Apache Spark (PySpark or Scala). You must understand Spark internals (stages, tasks, memory management) to tune performance.
Orchestration: Experience with workflow tools (e.g., Airflow, Oozie) to manage dependencies, retries, and backfills.
Scripting: Basic familiarity with Linux/Shell scripting for log analysis and server-side debugging.

Nice to Have

Experience operating Financial/Billing pipelines where exact data reconciliation is mandatory.
Familiarity with Game Data modeling and metrics.
Observability: Experience setting up monitoring dashboards and alerts using Grafana or Prometheus.
CI/CD: Familiarity with Git workflows and deployment pipelines (GitLab CI, Jenkins).