Operational Excellence: Maintain the daily health of core batch and streaming pipelines. Ensure strict adherence to SLAs for data freshness and availability.
Incident Response & Resolution: Lead the troubleshooting process for pipeline failures. Perform deep-dive Root Cause Analysis (RCA) on Spark/SQL jobs and implement permanent code fixes to prevent recurrence.
Performance Engineering: Proactively audit and optimize existing pipelines (tuning Spark memory, optimizing SQL execution plans, handling data skew) to reduce latency and infrastructure costs.
Data Integrity & Quality: Implement robust data quality checks (freshness, completeness, consistency). You will be responsible for reconciling data discrepancies, particularly for high-impact Financial and Game Logic data.
Data Recovery & Backfilling: Manage complex data backfills and reprocessing workflows efficiently when upstream sources change or incidents occur.
Cross-Functional Collaboration: Partner with the Infrastructure Team to resolve system-level bottlenecks, and work with Data Analysts to investigate logic discrepancies.
Governance & Compliance: Strictly adhere to data governance policies, security regulations, and internal standard operating procedures (SOPs) to ensure full Data Compliance.
Requirements
Experience & Mindset
2-4 years of Data Engineering experience, with a proven track record of operating production pipelines at scale.
Stability-First Mindset: You prioritize making systems robust, efficient, and maintainable over simply building new features.
Problem Solver: You enjoy the challenge of debugging complex distributed system errors (e.g., OOM errors, shuffle failures, data skew).
Technical Skills
SQL Mastery: Advanced ability to write, debug, and optimize complex queries.
Spark Expertise: Strong hands-on experience with Apache Spark (PySpark or Scala). You must understand Spark internals (stages, tasks, memory management) to tune performance.
Orchestration: Experience with workflow tools (e.g., Airflow, Oozie) to manage dependencies, retries, and backfills.
Scripting: Basic familiarity with Linux/Shell scripting for log analysis and server-side debugging.
Nice to Have
Experience operating Financial/Billing pipelines where exact data reconciliation is mandatory.
Familiarity with Game Data modeling and metrics.
Observability: Experience setting up monitoring dashboards and alerts using Grafana or Prometheus.
CI/CD: Familiarity with Git workflows and deployment pipelines (GitLab CI, Jenkins).