JOB DESCRIPTION
We are looking for a skilled Data Engineer with solid experience in Python, Apache Spark, ETL/ELT pipelines, and SQL to design, develop, and maintain scalable data processing systems on AWS.
Responsibilities
- Design, develop, and operate ETL/ELT pipelines using Python, Apache Spark, AWS Glue, and SQL.
- Process, transform, and validate large-scale datasets for analytics and reporting purposes.
- Build, maintain, and optimize data lakes and data warehouses on AWS (e.g., S3, Redshift).
- Write, optimize, and maintain complex SQL queries for data transformation and analysis.
- Monitor, troubleshoot, and tune Spark jobs and AWS Glue workflows to ensure performance, stability, and reliability.
- Collaborate closely with data analysts, data scientists, and business stakeholders to understand data requirements and translate them into technical solutions.
- Apply best practices for data quality, data governance, and data security.
- Participate in data modeling and data architecture discussions.
- Contribute to and maintain technical documentation for data pipelines, workflows, and operational processes.
- Support stable, high-quality batch data processing in production environments.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- 4+ years of experience working as a Data Engineer or in a similar role.
Technical Skills
- Strong hands-on experience with Python for data processing and ETL/ELT development.
- Solid experience with Apache Spark, particularly batch processing and performance tuning.
- Strong understanding of ETL/ELT concepts and data pipeline design.
- Proficiency in SQL, including complex queries, joins, window functions, and query optimization.
- Practical experience with AWS data services, especially AWS Glue, Amazon S3, and Amazon Redshift.
- Basic working knowledge of other AWS services such as EC2, Lambda, RDS, and IAM.
- Experience working in Linux environments and writing basic scripts.
- Familiarity with version control systems (e.g., Git).