69 Site Reliability Engineering Jobs in Kuala Lumpur - June 2026 - High Salaries

Showing 69 job results for "site reliability engineering" in Kuala Lumpur


Undisclosed

KL City

  • Lead every Sev1/2 Incident, run the bridge, write RCA within 48H, enforce blameless post-mortems the same week, and ship permanent automated fixes so the same outage never happens twice.
  • Review team members' code and scripts for adherence to code quality standards to ensure high-quality software delivery.
  • Evolve product observability. This includes metrics (Prometheus/Tempo), logs (Loki/CloudWatch), and traces (Tempo/OpenTelemetry), along with proactive updates to the design and implementation. ...
Posted
2 days ago
Undisclosed

KL City

  • Organise collaboration within technology teams by establishing efficient communication to achieve shared goals.
  • Assist in identifying risks in technology infrastructure by conducting proactive risk assessments and developing contingency plans to ensure regulatory and security compliance for technology infrastructure deliverables.
  • Execute the strategic vision for technology infrastructure improvement by delivering the roadmap of initiatives that align with company goals to ensure continuous improvement ...
Posted
16 days ago
Undisclosed

KL City

  • Lead every Sev1/2 Incident, run the bridge, write RCA within 48H, enforce blameless post-mortems the same week, and ship permanent automated fixes so the same outage never happens twice.
  • Review team members' code and scripts for adherence to code quality standards to ensure high-quality software delivery.
  • Evolve product observability. This includes metrics (Prometheus/Tempo), logs (Loki/CloudWatch), and traces (Tempo/OpenTelemetry), along with proactive updates to the design and implementation. ...
Posted
23 days ago
Undisclosed

KL City

  • Soft Skills: Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment.
  • Coding: Proficient in at least one high-level programming language (e.g., Python, Go, C++, or Java) and shell scripting. Strong understanding of data structures and algorithms.
  • Systems: Strong understanding of Linux operating systems and open-source technologies and a solid understanding of network architecture. ...
Posted
25 days ago
MYR12,000 - MYR14,000 Per Month

KL City

  • You are expected to perform independently and become a subject matter expert.
  • Active participation and contribution in team discussions are required, along with providing solutions to work-related problems.
  • Expert proficiency in Incident Response Operations is required. ...
Posted
a day ago
Undisclosed

KL City

  • Collaboration: Partner with Python development squads to ensure new features are designed with reliability in mind; conduct code reviews for reliability-critical paths; participate in Agile ceremonies.
  • Incident Management: Conduct root cause analysis for incidents and implement corrective actions to prevent recurrence; participate in on-call rotations for critical systems; maintain runbooks in version-controlled Python projects.
  • Continuous Improvement: Drive initiatives to improve system performance, reliability, and scalability through Python best practices, including profiling, benchmarking, and dependency management. ...
Posted
8 days ago
Undisclosed

KL City

  • Use our world-class AIOps and autonomous service governance platform to find new ways to streamline processes and improve alert accuracy, time-series trend analysis, anomaly detection, and risk identification.
  • Support platform/service expansions, migrations to new architectures, upgrades and drill activities across different technology domains.
  • Incorporate mature chaos engineering for risk identification, IPDRR for security, and comprehensive automation frameworks to reduce ops effort to the lowest possible level and free up time for the team's engineering work. ...
Posted
13 days ago
Undisclosed

KL City

  • Hands-on experience with monitoring tools and methodologies (e.g., Prometheus, Grafana).
  • Soft Skills: Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment.
  • Technical Requirements ...
Posted
15 days ago
Undisclosed

KL City

  • Collaboration: Partner with Python development squads to ensure new features are designed with reliability in mind; conduct code reviews for reliability-critical paths; participate in Agile ceremonies.
  • Incident Management: Conduct root cause analysis for incidents and implement corrective actions to prevent recurrence; participate in on-call rotations for critical systems; maintain runbooks in version-controlled Python projects.
  • Continuous Improvement: Drive initiatives to improve system performance, reliability, and scalability through Python best practices, including profiling, benchmarking, and dependency management. ...
Posted
15 days ago
Undisclosed

KL City

  • Collaboration: Partner with Python development squads to ensure new features are designed with reliability in mind; conduct code reviews for reliability-critical paths; participate in Agile ceremonies
  • Incident Management: Conduct root cause analysis for incidents and implement corrective actions to prevent recurrence; participate in on-call rotations for critical systems; maintain runbooks in version-controlled Python projects
  • Continuous Improvement: Drive initiatives to improve system performance, reliability, and scalability through Python best practices, including profiling, benchmarking, and dependency management ...
Posted
15 days ago
Undisclosed

KL City

  • Hands-on experience with monitoring tools and methodologies (e.g., Prometheus, Grafana).
  • Soft Skills: Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment.
  • Technical Requirements ...
Posted
16 days ago
Undisclosed

KL City

  • Collaborate with cross-functional teams to support business objectives and deliver technical solutions.
  • Monitor and resolve technical issues, ensuring quick response to minimise impact on operations.
  • Ensure compliance with industry standards and internal policies in all production activities. ...
Posted
2 days ago
Undisclosed

KL City

  • Implement and enforce operational best practices: observability, logging, metrics, alerting, capacity planning, failover strategies, and backups.
  • Collaborate with Engineering, Product, Compliance, and Operations teams to ensure infrastructure meets reliability, compliance, and security standards.
  • Support service scaling, database operations, cloud infrastructure (GCP preferred), networking, and microservices orchestration. ...
Posted
8 days ago
Undisclosed

KL City

  • Manage cloud infrastructure provisioning and configuration using IaC tooling (Terraform, Helm), supporting both AWS/Azure cloud deployments and on-premises customer environments.
  • Implement and maintain CI/CD pipelines for GFS solutions (Jenkins, etc.)
  • Work with Engineering teams to ensure security and compliance readiness for Managed services — including PCI DSS, ISO 27001, SOC 1/2/3, PDPA/GDPR — in close coordination with InfoSec teams. ...
Posted
8 days ago
Undisclosed

KL City

  • Manage cloud infrastructure provisioning and configuration using IaC tooling (Terraform, Helm), supporting both AWS/Azure cloud deployments and on-premises customer environments.
  • Implement and maintain CI/CD pipelines for GFS solutions (Jenkins, etc.)
  • Work with Engineering teams to ensure security and compliance readiness for Managed services — including PCI DSS, ISO 27001, SOC 1/2/3, PDPA/GDPR — in close coordination with InfoSec teams. ...
Posted
8 days ago
Undisclosed

KL City

  • Collaboration: Partner with Python development squads to ensure new features are designed with reliability in mind; conduct code reviews for reliability-critical paths; participate in Agile ceremonies.
  • Incident Management: Conduct root cause analysis for incidents and implement corrective actions to prevent recurrence; participate in on-call rotations for critical systems; maintain runbooks in version-controlled Python projects.
  • Continuous Improvement: Drive initiatives to improve system performance, reliability, and scalability through Python best practices, including profiling, benchmarking, and dependency management. ...
Posted
8 days ago
Undisclosed

KL City

  • Collaboration: Partner with Python development squads to ensure new features are designed with reliability in mind; conduct code reviews for reliability-critical paths; participate in Agile ceremonies.
  • Incident Management: Conduct root cause analysis for incidents and implement corrective actions to prevent recurrence; participate in on-call rotations for critical systems; maintain runbooks in version-controlled Python projects.
  • Continuous Improvement: Drive initiatives to improve system performance, reliability, and scalability through Python best practices, including profiling, benchmarking, and dependency management. ...
Posted
8 days ago
Undisclosed

KL City

  • Experience with CI/CD development and deployment tools such as Maven, Jenkins, Nexus, Git, and Docker.
  • Proficiency in Linux OS
  • Proficiency in scripting and automation (e.g. Python, PowerShell, YAML) with the ability to develop tools and infrastructure as code (Preferably Ansible, Terraform, Kubernetes, OpenShift). ...
Posted
10 days ago
Undisclosed

KL City

  • Collaboration at its Best: Work closely with product teams, stakeholders, and global support. Immerse in and contribute to a rich tapestry of insights and expertise.
  • Mentorship and Growth: Guide budding engineers and share best practices, fostering a collective ascent.
  • Tech Evaluation: Regularly scrutinize platforms and apps, suggesting improvements rooted in data and hands-on experience ...
Posted
10 days ago
Undisclosed

KL City

  • Analyze production issues, identify root causes, and implement long-term reliability improvements through automation, monitoring, and architectural enhancements.
  • Work collaboratively with other team members and provide guidance to more junior team members.
  • Organize an efficient handover through high quality documentation and training. ...
Posted
14 days ago
Undisclosed

KL City

  • Build and enhance our observability platform, enabling real-time monitoring of our golden signals (uptime, latency, saturation, error rate)
  • Develop automation solutions for incident response, disaster recovery, and business continuity
  • Drive our DevSecOps platform to enable safe, rapid deployments through CI/CD, GitOps, and self-service capabilities ...
Posted
14 days ago
Undisclosed

KL City

  • Implement and enforce operational best practices: observability, logging, metrics, alerting, capacity planning, failover strategies, and backups.
  • Collaborate with Engineering, Product, Compliance, and Operations teams to ensure infrastructure meets reliability, compliance, and security standards.
  • Support service scaling, database operations, cloud infrastructure (GCP preferred), networking, and microservices orchestration. ...
Posted
15 days ago
MYR8,000 - MYR9,000 Per Month

KL City

  • Soft Skills: Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment.
  • Coding: Proficient in at least one high-level programming language (e.g., Python, Go, C++, or Java) and shell scripting. Strong understanding of data structures and algorithms.
  • Systems: Strong understanding of Linux operating systems and open-source technologies and a solid understanding of network architecture. ...
Posted
15 days ago
MYR8,000 - MYR9,000 Per Month

KL City

  • Soft Skills: Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment.
  • Coding: Proficient in at least one high-level programming language (e.g., Python, Go, C++, or Java) and shell scripting. Strong understanding of data structures and algorithms.
  • Systems: Strong understanding of Linux operating systems and open-source technologies and a solid understanding of network architecture. ...
Posted
15 days ago
Undisclosed

KL City

  • Implement and enforce operational best practices: observability, logging, metrics, alerting, capacity planning, failover strategies, and backups.
  • Collaborate with Engineering, Product, Compliance, and Operations teams to ensure infrastructure meets reliability, compliance, and security standards.
  • Support service scaling, database operations, cloud infrastructure (GCP preferred), networking, and microservices orchestration. ...
Posted
16 days ago
Undisclosed

KL City

  • Strong experience in site reliability engineering, infrastructure engineering or a similar role.
  • Strong knowledge of networking and protocols, network security, and cloud networking
  • Proven track record of cloud cost optimisation ...
Posted
12 hours ago
Undisclosed

KL City

  • Tool Utilization: Use CI/CD pipelines, monitoring systems, and analytics to streamline workflows.
  • Bachelor’s/Master’s in Computer Science or related field.
  • 5+ years in cloud operations, SRE, or platform engineering. ...
Posted
14 hours ago
Undisclosed

KL City

  • Manage cloud infrastructure provisioning and configuration using IaC tooling (Terraform, Helm), supporting both AWS/Azure cloud deployments and on-premises customer environments.
  • Implement and maintain CI/CD pipelines for GFS solutions (Jenkins, etc.)
  • Work with Engineering teams to ensure security and compliance readiness for Managed services — including PCI DSS, ISO 27001, SOC 1/2/3, PDPA/GDPR — in close coordination with InfoSec teams. ...
Posted
20 days ago
Undisclosed

KL City

  • Manage cloud infrastructure provisioning and configuration using IaC tooling (Terraform, Helm), supporting both AWS/Azure cloud deployments and on-premises customer environments.
  • Implement and maintain CI/CD pipelines for GFS solutions (Jenkins, etc.)
  • Work with Engineering teams to ensure security and compliance readiness for Managed services — including PCI DSS, ISO 27001, SOC 1/2/3, PDPA/GDPR — in close coordination with InfoSec teams. ...
Posted
20 days ago
Undisclosed

KL City

  • Manage cloud infrastructure provisioning and configuration using IaC tooling (Terraform, Helm), supporting both AWS/Azure cloud deployments and on-premises customer environments
  • Implement and maintain CI/CD pipelines for GFS solutions (Jenkins, etc.)
  • Work with Engineering teams to ensure security and compliance readiness for Managed services — including PCI DSS, ISO 27001, SOC 1/2/3, PDPA/GDPR — in close coordination with InfoSec teams ...
Posted
20 days ago
