Cloudera Data Consultant / Data Engineer (Migration Specialist)
- 1-year contract role, extendable
- Location: Kuala Lumpur, Malaysia (fully on-site)
Role Purpose: As a member of the Migration Team, you will execute the technical movement and transformation of legacy data from IBM Netezza and legacy Cloudera/Talend platforms to a modernized CDP Private Cloud Base environment. Your focus is on re-engineering ETL logic into Spark and ensuring data integrity across the Bronze, Silver, and Gold layers.
Business & Operational Responsibilities
- Legacy Workload Modernization: Execute the migration of Netezza tables and legacy Cloudera tables to the new EDM platform.
- ETL Transformation: Migrate rationalized DataStage jobs into a Spark-native framework and optimize them.
- Regulatory Compliance: Develop pipelines that specifically enable BNM Project STREAM reporting, ensuring that all data elements conform to the Common Data Model.
- Quality & Reconciliation: Perform source-to-target data parity checks, including hash totals and row count validations, to maintain 100% data integrity.
- Operational Readiness: Support 90-day parallel runs and hypercare activities to ensure zero disruption to daily business operations during cutover.
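For candidates unfamiliar with the reconciliation item above, a source-to-target parity check can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual tooling: the function names, the field separator, and the NULL token are all assumptions, and on the real platform the same idea would more likely be expressed as Spark aggregations.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, hash_total) for an iterable of row tuples.

    Hypothetical helper: the separator and NULL token below are
    illustrative choices, not a project standard.
    """
    count = 0
    total = 0
    for row in rows:
        # Build a canonical text form of the row so both sides hash alike.
        canonical = "\x1f".join("<NULL>" if v is None else str(v) for v in row)
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        # Summing per-row digests modulo 2**64 keeps the total independent
        # of row order, which usually differs between source and target.
        total = (total + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return count, total


def tables_match(source_rows, target_rows):
    """True when row counts and hash totals agree on both sides."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


# Same rows in a different order should reconcile; a changed value should not.
source = [(1, "Alice", 100.0), (2, "Bob", None)]
target = [(2, "Bob", None), (1, "Alice", 100.0)]
drifted = [(1, "Alice", 100.0), (2, "Bob", 0.0)]

print(tables_match(source, target))
print(tables_match(source, drifted))
```

The row count catches dropped or duplicated rows cheaply, while the hash total catches value-level drift that a count alone would miss.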
Technical Requirements
A. Data Engineering & Migration Execution
- Spark on YARN: Build and optimize high-throughput ETL/ELT pipelines using Spark 3, moving away from legacy RDBMS-based processing.
- Table Format Implementation: Modernize legacy table structures into Apache Iceberg to support ACID-style consistency, time travel, and row-level updates.
- Automated Conversion: Utilize LLM-based accelerators to parse DSX files, mapping legacy operators to equivalent Spark tasks.
- High-Performance Serving: Migrate critical data marts into Apache Kudu native tables to provide low-latency analytics for QlikSense and SAS consumers.
B. Ingestion & Orchestration
- NiFi Data Flow: Configure Apache NiFi flows to land data in the Lakehouse landing zone first, eliminating uncontrolled point-to-point feeds.
- Airflow Orchestration: Refactor ZENA workflows and shell scripts into Airflow DAGs, establishing clear dependency management and SLA visibility.
- CDC Integration: Implement Change Data Capture (CDC) logic to ensure incremental data loads are synchronized across the modernized platform.
C. Governance & Quality Control
- Data Quality (DQ) Gates: Implement Great Expectations (GX) validation checks (completeness, null rates, distribution checks) at both ingestion and transformation stages.
- Secure Implementation: Ensure all developed pipelines adhere to Apache Ranger RBAC/ABAC and Ranger KMS encryption standards.
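As an illustration of the kind of completeness and null-rate gate mentioned above, the logic can be sketched in plain Python. This deliberately does not use the Great Expectations API; the function names, column names, and threshold are hypothetical and only show the shape of the check.

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None (0.0 if no rows)."""
    rows = list(rows)
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)


def dq_gate(rows, required_columns, max_null_rate):
    """Return {column: null_rate} for every column that breaches the threshold.

    An empty result means the batch passes the gate.
    """
    rows = list(rows)
    failures = {}
    for col in required_columns:
        rate = null_rate(rows, col)
        if rate > max_null_rate:
            failures[col] = rate
    return failures


# Hypothetical batch using banking-style columns for illustration only.
batch = [
    {"customer_id": 1, "account_id": "A1"},
    {"customer_id": 2, "account_id": None},
    {"customer_id": None, "account_id": "A3"},
    {"customer_id": 4, "account_id": "A4"},
]

print(dq_gate(batch, ["customer_id", "account_id"], max_null_rate=0.25))
print(dq_gate(batch, ["customer_id", "account_id"], max_null_rate=0.10))
```

A gate like this would typically run once at ingestion and again after transformation, with the pipeline halting whenever the returned dictionary is non-empty.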
Experience & Qualifications
Professional Background
- Experience Level: Senior-level data engineering experience (minimum 3 years), specifically in development using NiFi and Airflow. Experience migrating from Netezza or legacy Cloudera/Hadoop environments is an added advantage.
- Industry Context: Experience in financial services, with an understanding of banking data domains (Customer, Account, Relationship).
Technical Skills Matrix
Domain - Required Skills
- Hadoop/CDP - CDP Private Cloud Base, HDFS, YARN, Hive
- Programming - Python or Scala (Proficient), SQL (Expert)
- ETL Tools - Spark, NiFi, IBM DataStage (Legacy knowledge), Talend
- Storage - Apache Iceberg, Kudu, StarRocks
- Orchestration - Apache Airflow, Linux Shell Scripting
- Governance - Ranger, Atlas, OpenMetadata, Great Expectations
Experience & Qualifications
- Experience: Minimum 5 years as a Technical Lead with a Data Architecture/Engineering background, plus at least 10 years focused specifically on Cloudera/Hadoop administration and design.
- Industry Context: Proven track record in delivering mission-critical data platforms for regulated financial institutions (Malaysian banking experience highly preferred).
- Migration Success: Demonstrated success in at least two enterprise-scale Data Warehouse modernization programs, specifically involving Netezza offloading.
- Mandatory Certifications: Cloudera Certified Professional or Associate Administrator (CCA131); Bachelor's degree in Computer Science, IT, or Engineering.
- Preferred Training: Completion of "Securing Cloudera on-premises" and "Managing Apache Ozone" courses.
Pay: RM8,000.00 - RM12,983.11 per month
Application Question(s):
- How many years of experience do you have specifically with Cloudera/Hadoop Administration?
- Do you have at least 10 years of experience in Data Architecture or Engineering?
- Have you successfully led at least two enterprise-scale migrations from legacy platforms like Netezza or DataStage?
- Are you proficient in deploying Cloudera Data Platform (CDP) Private Cloud Base on bare-metal infrastructure?
- Have you worked extensively with HDFS, Hive, and YARN?
- What is your current salary per month in MYR?
- Do you hold any Cloudera certifications such as CCA or equivalent?
- What is your expected salary range per month in MYR? (Min-Max)
- Would you consider a 1-year, extendable contract?
Work Location: In person