Job Specification
- Design, implement, and optimize Big Data platforms using Cloudera (on-premises or cloud).
- Build and maintain data pipelines for collecting, cleansing, and synchronizing data from multiple sources.
- Develop and manage a Data Lakehouse leveraging Hadoop ecosystem tools.
- Integrate, optimize, and automate ETL/ELT processes to ensure performance and security.
- Troubleshoot and resolve issues that cause scheduled jobs to fail.
- Collaborate with technical and business teams to support data analytics and BI initiatives.
Requirements
- 3-5 years of experience in Data Engineering, with hands-on expertise in Cloudera or Hortonworks.
- Proficiency in the Hadoop ecosystem and tools such as Spark, Hive, Impala, YARN, and Ranger.
- Strong understanding of data modeling, partitioning, schema design, and performance tuning.
- Programming skills in Python and SQL scripting (Perl is a plus).
- Solid knowledge of Linux commands, shell scripting, and cluster administration.
- Familiarity with data security, metadata management, and data governance.
- Very good English communication skills for documentation and collaboration.
- Good analytical mindset.
Preferred Qualifications
- Cloudera Data Platform (CDP) Data Engineer Certification.
- Experience with AWS/Azure/GCP for Big Data deployments.
- Experience with the Informatica ETL tool.