
GreenNode

Senior Data Engineer


Job Description

GreenNode is the Leading AI Cloud Infrastructure and Solutions Provider in Southeast Asia, a member of VNG Group, and an official NVIDIA Cloud Partner.

With over 20 years of experience building and operating large-scale cloud infrastructure - starting with VNG itself as our internal customer zero - GreenNode possesses deep expertise in security, infrastructure optimization, and cloud transformation. GreenNode delivers a streamlined AI Cloud ecosystem focused on core products designed for large-scale, user-intensive applications and AI workloads. Our infrastructure is deployed across multi-availability-zone and multi-region environments in Vietnam and Thailand, ensuring high performance, availability, stability, and flexible scalability for mission-critical workloads.

With a strong understanding of the technology needs of digital-native enterprises - especially mid-tier banks, FinTech companies, and retail businesses - GreenNode partners closely with customers throughout their transformation journey, supporting sustainable growth and global expansion.

About the role:

We are looking for a passionate and detail-oriented Senior Data Engineer to join our team. In this role, you will contribute to developing scalable Lakehouse platform components that serve analytics, AI, and LLM workloads, including emerging use cases like Text-to-SQL. You will help build robust data infrastructure that powers data-driven decisions across GreenNode and supports our customer-facing data and AI products.

Key Responsibilities

  • Design and implement scalable Lakehouse components, including table management, partitioning, schema evolution, and data compaction.
  • Build and optimize distributed data processing workloads for batch and streaming at scale.
  • Participate in the deployment, automation, and monitoring of data platform infrastructure.
  • Design, implement, and maintain efficient ETL/ELT data pipelines feeding the Lakehouse.
  • Build data pipelines that support AI and LLM workloads, including embedding generation, RAG data preparation, and Text-to-SQL initiatives.
  • Work closely with data analysts, scientists, AI engineers, and the Platform team to understand data requirements and ensure high data quality.
  • Continuously research and evaluate new tools, frameworks, and technologies in the modern data and AI stack.
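To illustrate the kind of work the first responsibility describes, here is a minimal, framework-free sketch of writing records into a Hive-style partitioned layout while tolerating schema evolution (older records missing a newly added column). This is only an illustrative toy in pure Python; in practice this role would use Spark with a table format such as Iceberg or Delta Lake, and the function name `write_partitioned` is our own invention, not part of any library.

```python
import csv
import io
from collections import defaultdict

def write_partitioned(records, partition_key, schema):
    """Group records by a Hive-style partition key and render one CSV
    blob per partition. restval="" fills columns added by schema
    evolution that older records do not carry."""
    partitions = defaultdict(list)
    for rec in records:
        # Hive-style partition path segment, e.g. "event_date=2024-01-01"
        partitions[f"{partition_key}={rec[partition_key]}"].append(rec)

    blobs = {}
    for path, recs in partitions.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=schema, restval="")
        writer.writeheader()
        writer.writerows(recs)
        blobs[f"{path}/part-0.csv"] = buf.getvalue()
    return blobs
```

A record lacking the later-added `region` column still lands in its partition file with that column left blank, which is the behavior table formats like Iceberg provide natively.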

Requirements

Must-have:

  • Background in Computer Science, Data Engineering, or a related technical field
  • At least 3 years of experience in a data engineering or related role
  • Strong hands-on experience with distributed data processing frameworks (e.g., Apache Spark, Flink), including performance tuning and handling large-scale data
  • Experience with modern Lakehouse or table formats (e.g., Iceberg, Delta Lake, Hudi) or strong interest in adopting them
  • Proficiency with workflow orchestration tools (e.g., Airflow, Dagster)
  • Strong knowledge of Data Warehouse and Lakehouse architecture (e.g., star schema, snowflake schema)
  • Proficient in RDBMS and NoSQL databases, with the ability to write complex SQL queries
  • Familiarity with Unix, distributed computing, and Git
  • Familiarity with CI/CD, Docker, Kubernetes, and infrastructure-as-code tools such as Terraform or Ansible
  • Understanding of how data pipelines support AI and LLM workloads (training data, embeddings, or inference)
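The last requirement concerns how pipelines feed LLM workloads. A common preprocessing step before embedding generation for RAG is splitting documents into overlapping chunks; the sketch below shows only that step in plain Python (the chunk size, overlap values, and function name are illustrative assumptions, and a real pipeline would then batch the chunks through an embedding model).

```python
def chunk_text(text, size=200, overlap=50):
    """Split a document into overlapping character windows, a typical
    preprocessing step before batch embedding generation for RAG.
    Each window starts (size - overlap) characters after the last."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of some duplicated storage.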

Soft Skills:

  • Strong collaboration and problem-solving skills
  • Ability to manage time effectively and prioritize in a fast-paced environment
  • Willingness to learn and adapt to new technologies
  • Good communication with both technical and non-technical stakeholders

Nice to have:

  • Deep understanding of Lakehouse internals (metadata, snapshots, catalogs)
  • Experience with query engines (Trino, StarRocks, Presto)
  • Experience with RAG pipelines or Text-to-SQL (e.g., Vanna AI, LangChain SQL agents, dbt Semantic Layer)
  • Exposure to LLM fine-tuning data preparation
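Since Text-to-SQL appears in both the role overview and the nice-to-have list, here is a minimal sketch of one guardrail such pipelines commonly apply: only executing model-generated SQL when it is a single read-only SELECT. This uses stdlib sqlite3 purely for illustration; the function name and the exact validation rule are our own simplified assumptions, not a statement of how GreenNode's stack works.

```python
import sqlite3

def run_readonly_sql(conn, sql):
    """Execute model-generated SQL only if it is a single SELECT
    statement, rejecting DDL/DML and multi-statement inputs before
    results are returned to the user."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()
```

A production guardrail would typically parse the statement properly (or use a database role with read-only permissions) rather than rely on string checks alone.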

Job ID: 147282717
