700+ Reliability Jobs - June 2026 - High Salaries

Showing 716 job results for "reliability"

Centre For Strategic Infocomm Technologies (CSIT)

Undisclosed

Singapore

  • Provide network consultation to project teams for system deployment, working with various stakeholders to resolve technical problems and optimise network performance
  • Background in Computer Science, Computer/Electrical Engineering, Information Technology or related fields
  • 2-5 years of relevant working experience in network design, implementation and/or operations preferred ...
Posted
24 days ago
Undisclosed
  • Conduct daily health checks and monitor VCF infrastructure metrics via vROPS, vCenter, and Dynatrace to ensure optimal workload performance and timely issue resolution.
  • Analyze VCF components and perform NVA security remediation to maintain compliance across vSphere, NSX, vSAN, and other VCF elements.
  • Maintain awareness of industry trends in MAS (SG) and BNC (MY) regulatory compliance, emerging threats, and new technologies to understand the risks and better safeguard the company. Experience with HPSA scanning tools is a plus. ...
Posted
24 days ago
Undisclosed
  • Key Responsibilities:
  • Maintaining the stability, reliability, and efficiency of GEL’s internal container platform and its supporting infrastructure. Responsible for resource provisioning and management, responding to platform and application outages, capacity planning, monitoring, and driving reliability enhancements.
  • You will continuously evaluate the platform’s technical architecture to ensure it scales effectively with evolving application demands, including proactively identifying and resolving reliability issues, analyzing product dependencies, pinpointing performance bottlenecks, and implementing optimization strategies to enhance platform availability and cost efficiency. ...
Posted
19 days ago
Undisclosed
  • Key Responsibilities:
  • Maintaining the stability, reliability, and efficiency of GEL’s internal container platform and its supporting infrastructure. Responsible for resource provisioning and management, responding to platform and application outages, and monitoring.
  • This includes proactively identifying and resolving reliability issues, analysing product dependencies, pinpointing performance bottlenecks, and implementing optimization strategies to enhance platform availability and cost efficiency. ...
Posted
19 days ago
Undisclosed

Singapore

  • Proficiency in application testing assessment and management, including the ability to analyse test results and propose improvements
  • Familiarity with ICT governance policies, standards, and best practices in government agencies
  • Knowledge of service design principles and the ability to apply design thinking methods to identify transformation opportunities ...
Posted
24 days ago
SGD8,000 - SGD8,000 Per Month

Singapore

  • Perform L1 incident triage: validate symptoms, classify impact (client/business/regulatory), execute first-action runbooks and escalate to L2/L3 with clear diagnostics and evidence.
  • Maintain and follow operational procedures (runbooks, checklists, operating calendars and cut-offs), ensuring shift handovers are complete and readiness is sustained across peak periods.
  • Build a working understanding of end-to-end Saudi WM business workflows (client onboarding, transaction initiation, confirmations, settlement, reporting and client servicing) and use it to assess impact and prioritise actions. ...
Posted
24 days ago
Undisclosed
  • Manage test hardware and operations of ATE testers such as DTS tester, PFT tester, TESEC tester, and FET tester.
  • Collaborate with BU Product and Test Engineering and Lab Engineering Operations on new product test hardware and setup requirements.
  • Work closely with Vendors/Mfg TE/PE from test hardware design through fabrication to ensure readiness of test capability. ...
Posted
14 days ago
Undisclosed

Singapore

  • Experience with well-architected framework pillars (especially reliability, security, cost optimization).
  • Designing fault-tolerant and horizontally scalable systems
  • Advanced proficiency in Terraform, CloudFormation, or CDK ...
Posted
24 days ago
Undisclosed

KL City

  • If you believe in developing a better tomorrow, read on.
  • About the Role
  • Ensure System Reliability & Availability ...
Posted
20 days ago
Undisclosed

KL City

  • What You’ll Be Doing
  • Monitor, maintain, and improve the reliability, availability, and performance of production systems and services.
  • Build and maintain infrastructure as code (IaC), deployment pipelines, and automation to support continuous delivery, scalability, and disaster recovery. ...
Posted
20 days ago
Undisclosed

KL City

  • Implement monitoring solutions to track system health, performance, and availability. Proactively monitor systems, identify issues, and respond to incidents promptly, working to minimize downtime and mitigate impacts
  • SREs drive continuous improvement efforts by identifying areas for enhancement, implementing best practices, and fostering a culture of reliability engineering. Participate in post-mortems, conduct blameless retrospectives, and drive initiatives to improve system reliability, stability, and maintainability
  • SREs collaborate closely with software engineers, operations teams, and other stakeholders to ensure smooth coordination and effective communication. They share knowledge, provide technical guidance, and contribute to the development of a strong engineering culture ...
Posted
20 days ago
Undisclosed

KL City

  • Drive the development and execution of cutting-edge equipment strategies and predictive maintenance programs to maximize reliability and minimize downtime.
  • Lead advanced reliability investigations using sophisticated techniques like Fault Tree Analysis and Root-Cause Failure Analysis to swiftly resolve critical issues.
  • Engineer and optimize annual maintenance plans, leveraging preventive and predictive tasks to ensure peak equipment performance. ...
Posted
14 days ago
Undisclosed

Singapore

  • Your Opportunity Starts Here.
  • Job Description:
  • Incident Response & RCA: Lead the response to complex network disruptions and conduct blameless post-mortems and Root Cause Analysis (RCA) to prevent systemic recurrence. ...
Posted
20 days ago
Undisclosed
WFH

Singapore

  • What You’ll Be Doing
  • Monitor, maintain, and improve the reliability, availability, and performance of production systems and services.
  • Build and maintain infrastructure as code (IaC), deployment pipelines, and automation to support continuous delivery, scalability, and disaster recovery. ...
Posted
20 days ago
Undisclosed

Singapore

  • Flat Structure: Enjoy a flat organizational structure that promotes collaboration and provides opportunities to learn from seasoned technical leadership
  • Automate Routine Tasks: Develop tools to automate administrative tasks, reducing manual intervention and improving efficiency.
  • Optimize System Performance: Create automated solutions to monitor and maintain system performance, ensuring reliability and scalability. ...
Posted
20 days ago
Undisclosed

Malacca City

  • Monitor and troubleshoot system issues, identifying root causes and implementing fixes under the guidance of Platform SRE and Architects
  • Assist in the development and implementation of automation tools and processes to improve efficiency, reduce downtime, and enhance system reliability
  • Assist in the development and maintenance of the application documentation, including system design documents, technical specs, sustaining and maintenance procedures, and application onboarding process ...
Posted
6 days ago
Undisclosed

Singapore

  • Work closely with development teams or other internal teams to ensure that solutions are designed with customer user experience, scale/performance, security and operability in mind.
  • Support software releases and ensure that they are aligned with the organization’s internal software release and deployment process.
  • Facilitate and support the troubleshooting or root cause analysis of platform issues or incidents with other internal teams. ...
Posted
20 days ago
SGD6,500 - SGD6,500 Per Month

Singapore

  • 7+ years of strong experience in Production Support / SRE / BizOps (L2 Operations: hands-on troubleshooting, monitoring, and incident handling)
  • Hands-on expertise in Linux (commands, system operations)
  • Strong scripting skills in Shell / Python / Jython ...
Posted
14 days ago
Undisclosed

Singapore

  • Experience with well-architected framework pillars (especially reliability, security, cost optimization).
  • Designing fault-tolerant and horizontally scalable systems
  • Advanced proficiency in Terraform, CloudFormation, or CDK ...
Posted
25 days ago
Undisclosed

Singapore

  • Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
  • Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions
  • Build and maintain CI/CD pipelines for the bank. ...
Posted
25 days ago
Undisclosed

Singapore

  • Experience with well-architected framework pillars (especially reliability, security, cost optimization).
  • Designing fault-tolerant and horizontally scalable systems
  • Advanced proficiency in Terraform, CloudFormation, or CDK ...
Posted
25 days ago
Undisclosed
WFH

Singapore

  • If you enjoy thinking in systems, debugging complex issues, and preventing problems before they happen, this role will push you to grow fast.
  • Help define SLOs, SLIs, and error budgets for core systems
  • Set up and tune monitoring and alerting (Prometheus, Grafana, OpenTelemetry) ...
Posted
20 days ago
SGD7,000 - SGD8,500 Per Month

Singapore

  • Experience with application servers, preferably IBM WebSphere / Apache Tomcat 8.5.x
  • Excellent and proven experience in Oracle SQL and PL/SQL
  • Experience with monitoring tools such as Tivoli and Splunk ...
Posted
20 days ago
SGD7,000 - SGD8,500 Per Month

Singapore

  • Participate in incident response, troubleshooting, on-call rotation, and post-incident RCA.
  • Perform system performance tuning, patching, capacity planning, and optimization.
  • Improve system reliability through automation, redundancy, and engineering best practices. ...
Posted
20 days ago
MYR1,800 - MYR3,000 Per Month

Malaysia

  • Provide feedback to all shift personnel on any quality issues, e.g. external customer complaints and internal quality feedback.
  • Ensure labs provide JIT delivery performance, track cycle time and output, and ensure zero errors/defects in output.
  • Ensure labs maintain 5S and are always in an audit-ready condition. ...
Posted
15 days ago
Undisclosed

Jurong West

  • Coordinate with external labs on project testing needs when in-house support is not possible.
  • Initiate meetings to address product functional failures and ensure closure of test requests.
  • Review test results and compile test reports for product release and qualification. ...
Posted
3 days ago
Undisclosed

KL City

  • Education: Final-year undergraduate or Master’s students in Computer Science or related fields.
  • Availability: Must be able to work full-time for at least 3–6 months. A long-term internship is highly preferred.
  • Tech Stack: Solid foundation in Linux/Unix and Networking (TCP/IP, DNS); familiar with containerization (Docker/K8s) and cloud services (AWS/GCP/Azure/Tencent Cloud). ...
Posted
3 days ago
Undisclosed

Taiwan

  • Work closely with PE and TE to review NPD test program development to ensure Test Quality in the NPD process & release
  • Participate in Product Validation Process and provide engineering inputs to improve NPD Quality & Reliability
  • Participate in Product Yield Improvement Process and provide engineering inputs to improve NPD Yield, Quality & Reliability ...
Posted
20 days ago
SGD9,000 - SGD9,900 Per Month
WFH

Singapore

  • Perform deep technical troubleshooting and root cause analysis for complex network and security incidents.
  • Build automation and self‑healing workflows supporting Day‑1, Day‑2, and Day‑N operations.
  • Integrate infrastructure changes into CI/CD pipelines with proper testing, validation, and rollback. ...
Posted
21 days ago