Role- Data Architect
Role Type- Permanent
Budget- MYR 20K monthly
Must- Malaysian
Location- Cyberjaya
Skills- Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Spark, Delta Lake, Delta Live Tables
Job Description-
- 12+ years of experience in data engineering using Databricks or Apache Spark-based platforms.
- Proven track record of building and optimizing ETL/ELT pipelines for batch and streaming data ingestion.
- Hands-on experience with Azure services such as Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, or Azure SQL Data Warehouse.
- Proficiency in programming languages such as Python, Scala, or SQL for data processing and transformation.
- Expertise in Spark (PySpark, Spark SQL, or Scala) and Databricks notebooks for large-scale data processing.
- Familiarity with Delta Lake, Delta Live Tables, and medallion architecture for data lakehouse implementations.
- Build and query Delta Lake storage solutions.
- Process streaming data with Azure Databricks Structured Streaming.
- Design Azure Databricks security and data protection solutions.
- Flatten nested structures and explode arrays with Spark.
- Transfer data outside Spark pools using the PySpark connector.
- Optimize Spark jobs.
- Implement best practices in Spark/Databricks.
- Experience with orchestration tools like Azure Data Factory or Databricks Jobs for scheduling and automation.
- Knowledge of Git for source control and CI/CD integration for Databricks workflows, along with cost optimization and performance tuning.
- Familiarity with Unity Catalog, RBAC, or enterprise-level Databricks setups.
- Ability to create reusable components, templates, and documentation to standardize data engineering workflows.
- Solutioning and presales: architecting frameworks, defining roadmaps, and engaging with stakeholders.
- Experience in defining data strategy, evaluating new tools/technologies, and driving adoption across the organization.
- Experience with Snowflake and/or Microsoft Fabric is an added advantage.
- Must have experience working with streaming data sources; Kafka experience is preferred.