Working Location
Kuala Lumpur, Federal Territory, Malaysia
Job Description
Responsibilities
1. Role Purpose
As a member of the Migration Team, you will execute the technical movement and
transformation of legacy data from IBM Netezza and Cloudera/Talend to a modernized
CDP Private Cloud Base environment. Your focus is on re-engineering ETL logic into
Spark and ensuring data integrity across the Bronze, Silver, and Gold layers.
2. Business & Operational Responsibilities
∙Legacy Workload Modernization: Execute the migration of Netezza tables and legacy
Cloudera tables to the new EDM platform.
∙ETL Transformation: Implement the migration and optimization of rationalized
DataStage jobs into a Spark-native framework.
∙Regulatory Compliance: Develop pipelines that enable BNM Project STREAM
reporting, ensuring that all data elements conform to the Common Data Model.
∙Quality & Reconciliation: Perform source-to-target data parity checks, including hash
totals and row count validations, to maintain 100% data integrity.
∙Operational Readiness: Support 90-day parallel runs and hypercare activities to
ensure zero disruption to daily business operations during cutover.
3. Technical Requirements
A. Data Engineering & Migration Execution
∙Spark on YARN: Build and optimize high-throughput ETL/ELT pipelines using Spark 3,
moving away from legacy RDBMS-based processing.
∙Table Format Implementation: Modernize legacy table structures into Apache
Iceberg to support ACID-style consistency, time travel, and row-level updates.
∙Automated Conversion: Utilize LLM-based accelerators to parse DSX files, mapping
legacy operators to equivalent Spark tasks.
∙High-Performance Serving: Migrate critical data marts into Apache Kudu native
tables to provide low-latency analytics for QlikSense and SAS consumers.
B. Ingestion & Orchestration
∙NiFi Data Flow: Configure Apache NiFi flows to land data in the Lakehouse landing
zone first, eliminating uncontrolled point-to-point feeds.
∙Airflow Orchestration: Refactor ZENA workflows and shell scripts into Airflow DAGs,
establishing clear dependency management and SLA visibility.
∙CDC Integration: Implement Change Data Capture (CDC) logic to ensure incremental
data loads are synchronized across the modernized platform.
C. Governance & Quality Control
∙Data Quality (DQ) Gates: Implement Great Expectations (GX) validation checks
(completeness, null rates, distribution checks) at both ingestion and transformation
stages.
∙Secure Implementation: Ensure all developed pipelines adhere to Apache Ranger
RBAC/ABAC and Ranger KMS encryption standards.
4. Experience & Qualifications
Professional Background
∙Experience Level: Senior-level data engineering experience (minimum 3 years),
specifically in data engineering development using NiFi and Airflow.
Experience in migrations from Netezza or legacy Cloudera/Hadoop environments is
an added advantage.
∙Industry Context: Experience in financial services, with an understanding of banking