300+ Site Reliability Engineering Jobs in Malaysia | Job Vacancies | June 2026 | Ricebowl

Search results for "site reliability engineering"


Undisclosed

Singapore

  • **What You Will Do**
  • As an SRE on the team, you will own the reliability and operational excellence of the platform end-to-end. You will define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and work closely with engineering and product teams to reduce toil and improve system resilience. You will design and implement observability solutions, covering logging, metrics, and distributed tracing, to give the team deep visibility into platform health.
  • You will lead incident response efforts, conduct thorough post-mortems, and drive systemic improvements to prevent recurrence. Beyond firefighting, you will contribute to the platform's infrastructure-as-code posture, automating provisioning, configuration, and deployment pipelines on cloud and on-premises environments. You will also participate in capacity planning exercises and performance tuning to ensure the platform scales with growing government demand.
  • Collaboration is central to this role. You will work alongside development teams to embed reliability thinking early in the software development lifecycle, reviewing architectures and advocating for operability best practices.
  • **What We Are Looking For**
  • You should have solid hands-on experience with Kubernetes and container orchestration, along with familiarity with CI/CD and DevSecOps tooling such as GitLab, Jira, Confluence, Fortify, or similar. Strong proficiency in at least one scripting or programming language (Python, Go, or Bash) is expected, as is experience with infrastructure-as-code tools such as Terraform or Ansible.
  • You should be comfortable working with cloud platforms (AWS, Azure, or GCP) and have experience setting up and managing observability stacks such as the ELK stack, Prometheus, Grafana, or equivalent. A good understanding of networking, security hardening, and compliance requirements in a government or regulated environment will be a strong advantage.
  • Beyond technical skills, we value engineers who communicate clearly, take ownership, and approach problems with a systems-thinking mindset. Experience working in an agile team and a genuine interest in public-sector technology are a plus.
  • **Good to Have**
  • Experience with GitOps workflows, service mesh technologies (e.g. Istio), or secrets management tools (e.g. HashiCorp Vault) would be advantageous. Prior exposure to government ICT standards or IM8 policies is welcome but not required. An AI-native mindset and software engineering skills are highly valued.
Posted
3 days ago
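Defining and tracking SLOs and SLIs, as this listing describes, largely comes down to error-budget arithmetic. The following is a minimal sketch, assuming a request-based availability SLI (good requests divided by total requests). `SLOReport` and `availability_slo_report` are hypothetical names for illustration, not part of any specific monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class SLOReport:
    """Hypothetical summary type for one SLO measurement window."""
    sli: float               # observed fraction of good events
    budget_total: float      # bad events the SLO tolerates in the window
    budget_remaining: float  # bad events still allowed (negative = overspent)


def availability_slo_report(good: int, total: int, slo_target: float) -> SLOReport:
    """Compute a request-based availability SLI and its error budget.

    Assumes SLI = good / total and error budget = (1 - slo_target) * total.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    bad = total - good
    budget_total = (1.0 - slo_target) * total
    return SLOReport(
        sli=good / total,
        budget_total=budget_total,
        budget_remaining=budget_total - bad,
    )


# Illustrative numbers: 1,000,000 requests, 999,500 good, 99.9% SLO target.
report = availability_slo_report(good=999_500, total=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%} budget={report.budget_total:.0f} "
      f"remaining={report.budget_remaining:.0f}")
```

With these illustrative numbers, the 99.9% target tolerates about 1,000 failed requests, and 500 of them have already been spent.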
SGD8,000 - SGD8,000 Per Month

Singapore

  • You will lead incident response efforts, conduct thorough post-mortems, and drive systemic improvements to prevent recurrence. Beyond firefighting, you will contribute to the platform's infrastructure-as-code posture, automating provisioning, configuration, and deployment pipelines on cloud and on-premises environments. You will also participate in capacity planning exercises and performance tuning to ensure the platform scales with growing government demand.
  • Collaboration is central to this role. You will work alongside development teams to embed reliability thinking early in the software development lifecycle, reviewing architectures and advocating for operability best practices.
  • **What We Are Looking For** ...
Posted
3 days ago
SGD8,000 - SGD8,000 Per Month

Singapore

  • 12+ years of strong experience in incident management, platform stability, and distributed systems support
  • Linux & Scripting Expertise
  • Hands-on with Linux commands and shell/Python scripting ...
Posted
3 days ago
Undisclosed

Singapore

  • Familiarity with network telemetry tools such as SolarWinds and NetScout.
  • Proficiency in packet level debugging, including capturing traffic with tools like tcpdump and analyzing packets using Wireshark.
  • Broad understanding of the end-to-end infrastructure supporting payment platforms, spanning platform services, networking, databases, and storage. ...
Posted
10 days ago
SGD7,500 - SGD8,500 Per Month

Singapore

  • Experience with monitoring tools: Dynatrace, Splunk
  • Working knowledge of Java (1.8+)
  • Strong expertise in SQL and database troubleshooting (query optimization, performance tuning, and data analysis for incident resolution) ...
Posted
14 days ago
SGD8,000 - SGD8,800 Per Month

Singapore

  • Java & Application Troubleshooting
  • Programming Languages
  • Strong experience in incident management and distributed systems support, excellent troubleshooting and problem-solving skills, and automation of operational tasks ...
Posted
14 days ago
Undisclosed

KL City

  • Lead every Sev1/Sev2 incident: run the bridge, write the RCA within 48 hours, enforce blameless post-mortems the same week, and ship permanent automated fixes so the same outage never happens twice.
  • Review team members' code and scripts for adherence to code quality standards to ensure high-quality software delivery.
  • Evolve product observability across metrics (Prometheus/Tempo), logs (Loki/CloudWatch), and traces (Tempo/OpenTelemetry), and proactively drive updates to its design and implementation. ...
Posted
2 days ago
SGD6,500 - SGD6,500 Per Month

Singapore

  • 7+ years of strong experience in Production Support / SRE / BizOps (L2 Operations: hands-on troubleshooting, monitoring, and incident handling)
  • Hands-on expertise in Linux (commands, system operations)
  • Strong scripting skills in Shell / Python / Jython ...
Posted
20 days ago
Undisclosed

Singapore

  • Strong knowledge of HTTP, DNS, and TLS protocols, with the ability to troubleshoot at the application and transport layers.
  • Familiarity with Content Delivery Networks (CDNs) and DDoS protection services.
  • Solid Linux fundamentals, including networking, system configuration, and troubleshooting. ...
Posted
18 days ago
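Troubleshooting TLS at the transport layer, as this listing asks, often starts with checking which certificate a server actually presents and when it expires. A minimal sketch using only Python's standard `ssl` and `socket` modules; `fetch_not_after` and `days_until_expiry` are hypothetical helper names, and port 443 with a 5-second timeout are assumed defaults.

```python
from __future__ import annotations

import socket
import ssl
import time


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Hypothetical helper: connect with standard certificate verification
    and return the server certificate's notAfter field,
    e.g. 'Jun  1 12:00:00 2026 GMT'."""
    context = ssl.create_default_context()  # verifies chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float | None = None) -> float:
    """Hypothetical helper: days between `now` (epoch seconds, default:
    current time) and the notAfter timestamp. Negative means expired."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # parses as UTC
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400
```

For example, `days_until_expiry(fetch_not_after("example.com"))` reports the remaining validity for that host. If the handshake itself fails, the raised `ssl.SSLCertVerificationError` (or a plain `OSError` for DNS and connection problems) already narrows down which layer is at fault.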
Undisclosed

Taiwan

  • Troubleshoot and resolve issues in deployed software
  • Monitor application performance and availability
  • Respond to emergencies and manage changes ...
Posted
2 days ago
SGD6,000 - SGD6,000 Per Month

Singapore

  • Drive continuous improvements to reduce operational toil and prevent recurring incidents
  • Perform capacity planning, performance tuning, and system optimisation
  • Design and implement observability solutions across logging, metrics, and distributed tracing ...
Posted
3 days ago
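Capacity planning, listed above, often begins with a simple trend projection: fit a line to recent usage and estimate when it crosses a limit. A minimal sketch, assuming equally spaced daily samples and a linear growth trend fitted by ordinary least squares; `days_until_full` is a hypothetical name, and real workloads with seasonality need a richer model.

```python
from __future__ import annotations


def days_until_full(daily_usage: list[float], capacity: float) -> float | None:
    """Hypothetical helper: fit usage = slope * day + intercept by ordinary
    least squares over equally spaced daily samples, then return how many
    days after the last sample the fitted line reaches `capacity`.

    Returns None when the fitted trend is flat or shrinking, and 0.0 when
    the fit says capacity is already reached."""
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2  # days are 0, 1, ..., n-1
    mean_y = sum(daily_usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope  # day index where the line hits capacity
    return max(0.0, day_full - (n - 1))


# Illustrative data: disk usage in GB sampled once a day on a 200 GB volume.
print(days_until_full([100, 110, 120, 130, 140], capacity=200))  # prints 6.0
```

Usage grows by 10 GB per day in this made-up series, so the remaining 60 GB lasts six more days.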
Undisclosed

KL City

  • Organise collaboration within technology teams by establishing efficient communication channels to achieve shared goals.
  • Assist in identifying risks in technology infrastructure by conducting proactive risk assessments and developing contingency plans to ensure regulatory and security compliance for technology infrastructure deliverables.
  • Execute the strategic vision for technology infrastructure improvement by delivering a roadmap of initiatives aligned with company goals to ensure continuous improvement ...
Posted
16 days ago
SGD16,500 - SGD16,500 Per Month

Singapore

  • Ensure essential procedures are followed and contribute to defining standards
  • Integrate in-depth knowledge of applications development with overall technology function to achieve established goals
  • Provide evaluative judgement based on analysis of facts in complicated, unique, and dynamic situations including drawing from internal and external sources ...
Posted
3 days ago
Undisclosed

KL City

  • Lead every Sev1/Sev2 incident: run the bridge, write the RCA within 48 hours, enforce blameless post-mortems the same week, and ship permanent automated fixes so the same outage never happens twice.
  • Review team members' code and scripts for adherence to code quality standards to ensure high-quality software delivery.
  • Evolve product observability across metrics (Prometheus/Tempo), logs (Loki/CloudWatch), and traces (Tempo/OpenTelemetry), and proactively drive updates to its design and implementation. ...
Posted
23 days ago
SGD6,000 - SGD6,000 Per Month

Singapore

  • Collaborate closely with cross-functional engineering and infrastructure teams to ensure operational readiness and platform stability
  • Design and implement robust monitoring frameworks, intelligent alerting systems, and incident response processes to achieve operational excellence
  • Define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure and improve system reliability ...
Posted
a month ago
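Alerting that is tied to SLIs and SLOs is often built on error-budget burn rate rather than on raw error counts. The sketch below follows the multiwindow burn-rate approach described in the Google SRE Workbook. The 1-hour and 5-minute windows and the 14.4 threshold are that book's example values for a 30-day SLO window, not requirements of this role. The function names are hypothetical.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Rate at which the error budget is consumed, relative to the rate
    that would use exactly the whole budget over the SLO window (1.0)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / (1.0 - slo_target)


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Hypothetical multiwindow check: alert only when BOTH windows exceed
    the threshold. The long window (e.g. 1h) shows the burn is significant;
    the short window (e.g. 5m) lets the alert clear soon after recovery.
    The default 14.4 equals spending 2% of a 30-day budget in 1 hour
    (0.02 * 720 hours)."""
    return (burn_rate(long_window_error_ratio, slo_target) > threshold
            and burn_rate(short_window_error_ratio, slo_target) > threshold)


# Illustrative: 99.9% SLO, 2% errors over the last hour, 3% over the last 5 minutes.
print(should_page(0.02, 0.03, slo_target=0.999))  # prints True
```

Requiring both windows to agree is the design choice here. A brief spike that has already ended fails the short-window check, while a slow background error rate fails the long-window threshold.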
Undisclosed

KL City

  • Soft Skills: Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment.
  • Coding: Proficient in at least one high-level programming language (e.g., Python, Go, C++, or Java) and shell scripting. Strong understanding of data structures and algorithms.
  • Systems: Strong understanding of Linux operating systems and open-source technologies and a solid understanding of network architecture. ...
Posted
25 days ago
Undisclosed

Singapore

  • Deliver a playbook for onboarding new tasks / activities covering both Application and Infrastructure support models
  • Identify opportunities to automate Production support activities (App & Infra) and reduce manual interventions
  • Drive application and infrastructure improvements including performance, capacity, resilience, and operational stability; eliminate toil through automation ...
Posted
15 days ago
Undisclosed

Singapore

  • Deliver a playbook for onboarding new tasks / activities covering both Application and Infrastructure support models
  • Identify opportunities to automate Production support activities (App & Infra) and reduce manual interventions
  • Drive application and infrastructure improvements including performance, capacity, resilience, and operational stability; eliminate toil through automation ...
Posted
15 days ago
Undisclosed

Singapore

  • Deliver a playbook for onboarding new tasks / activities covering both Application and Infrastructure support models
  • Identify opportunities to automate Production support activities (App & Infra) and reduce manual interventions
  • Drive application and infrastructure improvements including performance, capacity, resilience, and operational stability; eliminate toil through automation ...
Posted
16 days ago
Undisclosed

Singapore

  • Entitled to Yearly Bonus & Performance Bonus
  • Bachelor's Degree or above; a degree in computer science or a related field is preferred.
  • At least 2 years of experience in cloud services products and application operations. ...
Posted
a month ago
SGD4,000 - SGD4,000 Per Month

Singapore

  • Experience with automation operations and container technologies (Docker, Kubernetes).
  • Familiarity with CI/CD processes and tools (e.g. Jenkins, GitLab CI).
  • Responsibilities include daily monitoring, alert response, emergency handling, on-call duties, regular system health checks, and performance optimization. ...
Posted
a month ago
TWD40,000 - TWD40,000 Per Month

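Taiwan

  • Many opportunities to collaborate with overseas partners, building international communication and business English skills
  • Operate and maintain systems with Site Reliability Engineering at the core, using technologies such as RedHat, Windows, JBOSS, SpringBoot, MQ, AMQ, and NGINX.
  • Handle project management and optimization, including consolidating user requirements and service ticket data and developing related dashboards. ...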
Posted
a month ago
SGD6,000 - SGD12,000 Per Month

Singapore

  • Thought Machine’s Site Reliability Engineers are the guardians of mission-critical systems for the world's most influential financial institutions. As a member of our elite, globally distributed team, you'll be entrusted with running and maintaining the robust production infrastructure that powers our customers' cutting-edge Core Banking and Payments platforms. This is an opportunity to make a tangible impact on the global financial landscape while collaborating with brilliant minds to solve complex engineering challenges.
  • The team is deeply involved in tackling the technical challenges of executing Thought Machine’s growth ambitions - expect to be working with senior stakeholders in the organisation, our customers, and working on programmes and initiatives that are critical to the success of the company.
  • Duties: ...
Posted
2 days ago
SGD8,500 - SGD17,000 Per Month

Singapore

  • Thought Machine’s Site Reliability Engineers are the guardians of mission-critical systems for the world's most influential financial institutions. As a member of our elite, globally distributed team, you'll be entrusted with running and maintaining the robust production infrastructure that powers our customers' cutting-edge Core Banking and Payments platforms. This is an opportunity to make a tangible impact on the global financial landscape while collaborating with brilliant minds to solve complex engineering challenges.
  • The team is deeply involved in tackling the technical challenges of executing Thought Machine’s growth ambitions - expect to be working with senior stakeholders in the organisation, our customers, and working on programmes and initiatives that are critical to the success of the company.
  • Duties: ...
Posted
2 days ago
SGD8,500 - SGD8,500 Per Month

Singapore

  • Regular maintenance of production systems that host Vault products.
  • Contributing to the evolution of our SaaS products by building features that foster exceptional reliability and an unparalleled user experience.
  • Implementing and testing DR strategies to ensure the highest level of resilience and fault tolerance of the platform. ...
Posted
2 days ago
SGD6,000 - SGD6,000 Per Month

Singapore

  • Regular maintenance of production systems that host Vault products.
  • Contributing to the evolution of our SaaS products by building features that foster exceptional reliability and an unparalleled user experience.
  • Implementing and testing DR strategies to ensure the highest level of resilience and fault tolerance of the platform. ...
Posted
2 days ago
MYR12,000 - MYR14,000 Per Month

KL City

  • You are expected to perform independently and become a subject matter expert.
  • Active participation and contribution in team discussions are required, along with providing solutions to work-related problems.
  • Expert proficiency in Incident Response Operations is required. ...
Posted
a day ago
Undisclosed

Singapore

  • Participate in operations and maintenance duty, promptly handle faults, and respond to user issues and requirements.
  • Bachelor's degree or above in computer science or a related major.
  • 3+ years of industrial experience, including solid Linux platform operation, maintenance, and debugging capabilities, with proficiency in troubleshooting, configuration optimization, and performance analysis. ...
Posted
9 hours ago
Undisclosed

Singapore

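  • Handle application releases, configuration changes, status monitoring, capacity management, and emergency incident response;
  • Based on business use cases, drive in-depth optimization and deliver service governance best practices, including but not limited to performance bottleneck analysis on critical paths, locating and troubleshooting business issues, and driving upgrades toward a highly available system architecture;
  • Investigate major production issues, handle emergency incidents, and carry out follow-up incident analysis and optimization; ...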
Posted
3 days ago

Aspire Systems India Private Limited

SGD12,500 - SGD25,000 Per Month

Singapore

  • Develop and implement automation tools for system management
  • Collaborate with development teams to ensure seamless deployment and operation of applications
  • Maintain documentation of system architecture and processes ...
Posted
6 days ago