700+ Reliability Jobs - June 2026 - High Salaries

Showing 711 job results for "reliability"

Undisclosed
WFH

Singapore

  • Improve system observability and incident response capabilities
  • Optimize operational costs while maintaining high availability and performance
  • Architect and implement AWS infrastructure using Infrastructure as Code principles ...
Posted
a month ago
Undisclosed

Sai Kung

  • Site Reliability Engineering (SRE): Establish and monitor SLIs, SLOs, and SLAs to ensure system health. Drive incident management processes, perform root cause analysis (RCA), and implement "self-healing" infrastructure to minimize downtime.
  • Infrastructure as Code (IaC): Standardize environments using IaC tools (e.g., Terraform, CloudFormation) to ensure reproducible and version-controlled infrastructure.
  • CI/CD Automation: Architect and maintain automated deployment pipelines to support a fast-paced development culture, reducing human error and time-to-market. ...
Posted
8 days ago
SGD7,500 Per Month

Singapore

  • Root Cause Understanding and Resolution of reliability test program related issues
  • Promoting Innovation and Challenging the Status Quo: The role involves promoting innovation and driving change (e.g., innovating new reliability features and solutions, or code infrastructure optimization and handling) that will give Micron a technical advantage over its competition. This is vital for maintaining Micron’s competitive edge in the market.
  • Cross-Functional Collaboration: This role necessitates collaboration with various cross-functional teams such as Fab, HBM Technology Development, HBM Design, System Development, and Quality/Reliability team. This collaboration is vital for the holistic development and shipping of end products. ...
Posted
14 hours ago
Undisclosed

Singapore

  • Create tools, automation processes, visualizations, and monitoring systems to facilitate the operation and optimization of the global infrastructure.
  • Contribute to and enhance the entire service lifecycle, from inception and design through deployment, operation, and refinement.
  • Construct multi-geo infrastructure across the globe, manage service capacity across regions, and balance traffic and resources on a global scale. ...
Posted
19 days ago
Undisclosed

Singapore

  • Keep TikTok running smoothly across continents and time zones
  • Develop systems for mapping, capacity planning, disaster recovery, and incident automation
  • Test our systems with chaos engineering so they can handle anything thrown at them ...
Posted
23 days ago
SGD12,000 Per Month

Singapore

  • Root Cause and Resolution of Qual and RMA Device Issues pertaining to defectivity and intrinsic reliability: Debug and identify the root cause of failures in reliability tests using Electrical Failure Analysis (EFA) and Physical Failure Analysis (PFA), and drive resolution and improvements through cross-functional team collaboration.
  • Process Conversions and HVM: The role involves providing recommendations to fab teams on new process conversions to reduce cost and increase yields. This is critical because it directly affects the yield, quality, reliability, and performance of HBM products, which are key components in many modern technologies.
  • Risk Management: The role requires communication with Product Managers/Leads to manage risks associated with DPM process conversions. This is crucial in ensuring that the conversions move at an appropriate pace, balancing the need for innovation with the need for stability and reliability. ...
Posted
14 hours ago
SGD11,250 - SGD22,500 Per Month

Singapore

  • Sc or higher degree in Computer Science or related fields from accredited and reputable institutions.
  • Minimum of 5 years of R&D experience in the fields of cloud computing or large-scale model systems.
  • Proficiency in cloud-native technologies and understanding of the relevant technology stack. ...
Posted
a month ago
SGD6,500 - SGD13,000 Per Month

Singapore

  • Sc or higher degree in Computer Science or related fields from accredited and reputable institutions with R&D experience in the fields of cloud computing or large-scale model systems.
  • Proficiency in cloud-native technologies and understanding of the relevant technology stack.
  • Expertise in one of the following programming languages: Golang, Python, or Java, with the ability to use it proficiently in a professional setting. ...
Posted
a month ago
Undisclosed

Singapore

  • Build and maintain observability pipelines covering log collection, metrics aggregation, and usage analysis to provide business stakeholders with actionable insights into traffic behavior.
  • Serve as the first point of contact for online incidents and business inquiries related to traffic services; triage issues, coordinate resolution, and communicate status clearly to stakeholders.
  • Document solution patterns and reusable configurations to build a knowledge base that accelerates future delivery and onboarding. ...
Posted
a month ago
Undisclosed

Hong Kong

  • Serve as a core contact between the trading desk and network, security, market-data, and platform groups when production incidents arise or new systems are introduced
  • Provide support to traders, strategists, and quantitative researchers
  • Strengthen reliability practices, automation efforts, and day-to-day efficiency throughout the trading technology layer ...
Posted
a month ago
Undisclosed

Hong Kong

  • You’ll partner with engineers, traders, and quantitative research teams to build, run, and support a production environment for a highly automated, multi-asset trading operation, with hands-on ownership of Linux/Unix systems to ensure stability, performance, and resilience across a distributed stack. You’ll also contribute to the development and maintenance of Linux-based deployment and release tooling, while gaining practical exposure to market dynamics and how architecture and operational risk directly impact trading outcomes. Supporting a global trading desk, the role is primarily focused on real-time trading support and reliability.
  • In this role, you will:
  • Design and build monitoring utilities using Python and Bash to track system stability, throughput/performance, and service dependencies for a production-critical stack ...
Posted
a month ago
Undisclosed
  • To ensure that product specifications required by the customers are strictly adhered to.
  • To be responsible for the preparation of quality inspection, test procedures and relevant in-house specifications.
  • Work with both Operations and R&D teams to ensure and monitor the stability of all MCC raw materials and products ...
Posted
a month ago
Undisclosed

Singapore

  • Measure and monitor availability, latency and overall service health.
  • Practice sustainable incident response and postmortems.
  • Participate in on-call rotations across continents. ...
Posted
24 days ago
SGD7,000 Per Month

Singapore

  • Best practices for application system changes, including change control, version management, and rollback strategies, while ensuring operational duties during release windows.
  • Drive operational automation by designing and implementing automated tools and processes, ensuring resource allocation is optimized and supporting business scalability.
  • Provide feedback and suggestions for business architecture design and continuously produce operational technical documentation. ...
Posted
a month ago
Undisclosed

Singapore

  • Entitled to Yearly Bonus & Performance Bonus
  • Bachelor's Degree or above; a degree in computer science or a related field is preferred.
  • At least 2 years of experience in cloud services products and application operations. ...
Posted
a month ago
Undisclosed

Singapore

  • Champion a culture of automation-first and reliability engineering across the network organization.
  • Design, develop, and maintain production-grade automation frameworks using Ansible, Python, and CI/CD pipelines.
  • Build reusable Ansible collections, Python libraries, and REST API integrations for network and security platforms. ...
Posted
a month ago
Undisclosed
  • A Site Reliability Engineer (SRE) combines software engineering and systems engineering to build, maintain, and optimize large-scale, highly available, and fault-tolerant systems. This role focuses on automation, infrastructure reliability, system scalability, and operational excellence in a fast-paced BPO environment supporting infrastructure services and AML systems.
  • Key Responsibilities
  • Design, build, and maintain scalable, highly available, and fault-tolerant systems
  • Collaborate with software engineering teams to improve reliability and system performance
  • Develop automation procedures to reduce manual intervention and improve operational efficiency
  • Monitor infrastructure performance and proactively identify system bottlenecks
  • Implement and maintain monitoring tools, automated alerts, SLIs, SLOs, and SLAs
  • Participate in 24/7 on-call rotations, including scheduled shifts and holidays
  • Conduct root-cause analysis and lead blameless post-mortems to prevent recurring issues
  • Ensure systems comply with security standards and regulatory requirements ...
Posted
a month ago
Undisclosed

Singapore

  • Entitled to Yearly Bonus & Performance Bonus
  • Bachelor's Degree or above; a degree in computer science or a related field is preferred.
  • At least 2 years of experience in cloud services products and application operations. ...
Posted
a month ago
Undisclosed

Singapore

  • Participate in building tools to enhance the observability and automation of data services.
  • Document standard operating procedures and support knowledge sharing across the team.
  • Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field. ...
Posted
a month ago
SGD6,000 Per Month

Singapore

  • Perform root cause analysis and implement preventive measures to minimize recurring issues.
  • Work closely with development, infrastructure, database, and business teams to ensure smooth system operations.
  • Support application deployments, releases, and change management activities. ...
Posted
10 days ago
Undisclosed

Taiwan

  • Understanding of laser design, device physics, common failure modes, and practical considerations when employing devices in real-world applications.
  • Strong knowledge of semiconductor failure analysis techniques
  • Knowledge of optoelectronic packaging design and process development, including laser integration issues ...
Posted
a month ago
Undisclosed

Taiwan

  • Regular Comfy Hour Party events
  • Annual All Hands Party (year-end banquet)
  • KPI bonus ...
Posted
a month ago
SGD4,000 Per Month

Singapore

  • Experience with automation operations and container technologies (Docker, Kubernetes).
  • Familiarity with CI/CD processes and tools (e.g., Jenkins, GitLab CI).
  • Includes daily monitoring, alert response, emergency handling, on-call duties, regular system health checks, and performance optimization. ...
Posted
a month ago
Undisclosed

Sai Kung

  • Site Reliability Engineering (SRE): Establish and monitor SLIs, SLOs, and SLAs to ensure system health. Drive incident management processes, perform root cause analysis (RCA), and implement "self-healing" infrastructure to minimize downtime.
  • Infrastructure as Code (IaC): Standardize environments using IaC tools (e.g., Terraform, CloudFormation) to ensure reproducible and version-controlled infrastructure.
  • CI/CD Automation: Architect and maintain automated deployment pipelines to support a fast-paced development culture, reducing human error and time-to-market. ...
Posted
22 days ago
Undisclosed

Malaysia

  • Hands-on experience (tester setup) with automated test equipment (tester) and environmental chambers focusing on Temperature Cycling Test (TCT), Temperature Humidity, Highly Accelerated Stress Test (HAST) or other relevant equipment.
  • Develop reliability test capability, test programs, and testing methods for new products or technologies.
  • Be involved in tester development and improvement. ...
Posted
a month ago
Undisclosed

Hong Kong

  • Improve system observability and incident response capabilities
  • Optimize operational costs while maintaining high availability and performance
  • Architect and implement AWS infrastructure using Infrastructure as Code principles ...
Posted
a month ago
SGD4,000 Per Month

Singapore

  • Experience with automation operations and container technologies (Docker, Kubernetes).
  • Familiarity with CI/CD processes and tools (e.g., Jenkins, GitLab CI).
  • Includes daily monitoring, alert response, emergency handling, on-call duties, regular system health checks, and performance optimization. ...
Posted
a month ago
Undisclosed

Singapore

  • Good understanding of OS concepts and internals of Linux
  • Working knowledge of Intel-based hardware and server components
  • Understanding of server-side networking and typical network protocols ...
Posted
a month ago
Undisclosed

Singapore

  • Proactively monitor system performance, identify bottlenecks, and implement optimizations to improve reliability and scalability
  • Collaborate with development teams to test and deploy new features, enhancements, and bug fixes
  • Contribute to the continuous improvement of operational processes, documentation, and knowledge sharing ...
Posted
a month ago