Full-time Site Reliability Engineers (SRE) job, salary, now hiring at Provido Global, Federal Territory

Site Reliability Engineers (SRE)

Provido Global

Undisclosed

Full-time

KL City, Federal Territory

Work Location

Kuala Lumpur, Federal Territory, Malaysia

Job Description

Job Responsibilities

At Provido Global, we’re more than a technology company. We are a global hub of innovation, creativity, and engineering excellence.

Our teams design and deliver intelligent, secure, and high-performance digital solutions that help organizations modernize operations, scale their platforms, and succeed in an increasingly digital world.

As part of a dynamic international ecosystem, we bring together forward-thinking engineers, technology specialists, designers, and delivery professionals who transform ideas into scalable, real-world solutions with measurable business impact.
If you are motivated by challenge, inspired by technology, and ready to grow with a company that truly invests in its people, your journey starts here.

What You’ll Be Doing

We are looking for a detail-oriented and experienced Site Reliability Engineer to join our team. The Site Reliability Engineer will be responsible for creating and implementing scalable solutions to meet system and application performance goals. You will also be responsible for troubleshooting system errors and resolving any relevant issues.

System Monitoring and Incident Response:

Implement monitoring solutions to track system health, performance, and availability. Proactively monitor systems, identify issues, and respond to incidents promptly to minimize downtime and mitigate impact.

Continuous Improvement and Reliability Engineering:

Drive continuous improvement by identifying areas for enhancement, implementing best practices, and fostering a culture of reliability engineering. Participate in post-mortems, conduct blameless retrospectives, and lead initiatives to improve system reliability, stability, and maintainability.

Collaboration and Knowledge Sharing:

Collaborate closely with software engineers, operations teams, and other stakeholders to ensure smooth coordination and effective communication. Share knowledge, provide technical guidance, and contribute to a strong engineering culture.

Service Monitoring and Alerting:

Implement comprehensive service monitoring, including dashboards, metrics, and alerts.

SLO/SLI Management:

Define, measure, and meet key Service Level Objectives (SLOs), supported by Service Level Indicators (SLIs) covering uptime, performance, incidents, and chronic problems.

Stakeholder Collaboration:

Partner with application and business stakeholders to ensure high-quality product development and releases.

Performance Optimization:

Collaborate with the development team to enhance system reliability and performance.

Automation:

Automate repetitive tasks and operational processes to reduce manual toil and increase efficiency.

What You Bring to the Team

Bachelor’s degree in Information Technology, Computer Science, or a related field
Must be flexible and willing to support a rotating schedule, providing 24/7 coverage as part of a shared team responsibility
Strong problem-solving abilities
Excellent understanding of computer systems, servers, and network systems
Ability to work under pressure and manage multiple tasks simultaneously
Strong communication and interpersonal skills
Basic understanding of programming concepts (structured and object-oriented) using high-level languages such as Python, Java, C#, or JavaScript
Experience with distributed storage technologies such as Amazon S3 and related services, as well as dynamic resource management frameworks (Kubernetes, YARN)
Experience with cloud computing platforms such as AWS and Azure
Experience with DevOps tools such as Git, Terraform, Docker, or related tools
Experience with monitoring tools such as Grafana, ELK Stack, Prometheus, or related tools

Preferred Skills

Experience in SRE, DevOps, or Systems Engineering roles, with strong Linux and cloud (AWS, Azure, or GCP) background
Proficiency in scripting (e.g., Python, Bash) and working with tools like Docker, Kubernetes, and Terraform
Familiarity with CI/CD pipelines and observability tools such as Grafana, Prometheus, or ELK Stack



Full-time Site Reliability Engineers (SRE) job, salary, now hiring at Provido Global, Federal Territory - Ricebowl