Overview
We are seeking a Senior Cloud DevOps Engineer to design, build, and scale the high-performance cloud infrastructure that powers a global, data-intensive platform. This role is ideal for someone who thrives in complex environments, enjoys solving difficult reliability problems, and is passionate about automation and cloud-native technologies.
You will play a key role in improving system reliability, optimizing performance, and enabling scalable infrastructure through modern DevOps practices.
What You’ll Do
- Automate operational processes to reduce manual effort and improve efficiency
- Design, implement, and manage Infrastructure as Code (IaC) using tools like Terraform
- Build and maintain CI/CD pipelines to support rapid and reliable deployments
- Monitor production systems using tools such as Datadog and Grafana to ensure high availability
- Identify system reliability issues and implement long-term solutions to prevent recurrence
- Troubleshoot complex production incidents and perform root cause analysis (RCA)
- Collaborate closely with engineering teams to support debugging and issue resolution
- Analyze trends in production incidents and recommend improvements
- Support live systems through an on-call rotation (24/7 coverage shared across the team)
- Contribute to documentation, runbooks, and internal knowledge sharing
What You Bring
- 5+ years of experience in DevOps, SRE, or Cloud Engineering roles
- Strong experience with AWS in large-scale production environments
- Hands-on expertise with Infrastructure as Code (Terraform preferred)
- Experience with CI/CD tools such as Jenkins
- Proficiency in scripting/programming (Python, PowerShell, Bash, or similar)
- Experience with containerization and orchestration (Docker, Kubernetes)
- Strong understanding of cloud architecture, networking, security, and scalability
- Experience with monitoring, alerting, and observability tools (e.g., Datadog, Grafana)
- Ability to troubleshoot complex, distributed systems under pressure
- Strong communication skills with the ability to explain technical issues clearly
Nice to Have
- Experience with configuration management tools like Ansible
- Familiarity with microservices and cloud-native architectures
- Exposure to high-availability, mission-critical systems
- Understanding of SRE principles (SLIs, SLOs, error budgets)
Why This Role Stands Out
- Work on large-scale, cloud-native systems with real-world impact
- Influence infrastructure strategy and reliability improvements
- Collaborate with highly technical, cross-functional teams
- Drive automation and modern DevOps best practices
Job Type: Permanent
Pay: RM10,000.00 - RM150,000.00 per month
Benefits:
- Cell phone reimbursement
- Dental insurance
- Flexible schedule
- Maternity leave
Work Location: In person