Working Location
Cyberjaya, Selangor, Malaysia
Job Description
About the Job
The Senior Platform Reliability Engineer (PRE) is responsible for engineering, operating, and maintaining GEL’s internal container platform and its supporting infrastructure, with a strong focus on reliability, resiliency, and security.
As a Senior PRE within GEL’s Infrastructure team, you will play a pivotal role in designing, building, and operating distributed container hosting solutions using Broadcom’s Tanzu product. Your mission is to safeguard and continuously enhance the cloud-native applications and services that power the organization’s container ecosystem. You will serve as Level 2 support, working closely with cross-functional teams to troubleshoot complex issues, optimize platform performance, and guide application teams in adopting reliability best practices.
Key Responsibilities:
• Maintain the stability, reliability, and efficiency of GEL’s internal container platform and its supporting infrastructure. This covers resource provisioning and management, monitoring, and responding to platform and application outages.
• Proactively identify and resolve reliability issues, analyse product dependencies, pinpoint performance bottlenecks, and implement optimization strategies to enhance platform availability and cost efficiency.
• Participate in a 24/7 on-call rotation, promptly addressing alerts from the global monitoring team and resolving production incidents to maintain platform and application uptime.
• Regularly review team workflows to identify manual processes and implement automation that reduces effort and minimizes human error.
• Regularly deploy product updates as required to keep the platform free of known vulnerabilities.
• Work with open-source technologies and with CI/CD and SCM tools such as Bitbucket as necessary, and implement the organization’s container technologies (e.g., Docker and Kubernetes). Stay current with industry trends and propose new ways for our business to improve.
• Take accountability for considering business and regulatory compliance risks, and take appropriate steps to mitigate them.
• Maintain awareness of industry trends in regulatory compliance, emerging threats, and technologies to understand the risks and better safeguard the company.
• Highlight any potential concerns or risks and proactively share best risk management practices.
We are looking for people with
• Bachelor’s or Master’s Degree in Computer Science or a related field.
• Minimum of 5 to 7 years of overall experience in IT, with at least 3 to 5 years of hands-on experience as a Platform Reliability Engineer or Site Reliability Engineer, specifically managing container orchestration platforms such as Tanzu Application Service, Tanzu Kubernetes Grid Integrated Edition, or other Kubernetes-based platforms.
• At least 3 years of experience in automation using tools like Ansible and scripting languages such as Python and Bash.
• 3 years of experience in developing and maintaining Helm charts and Helm repositories.
• Minimum of 3 years of experience managing NSX-T solutions and integrating them with Tanzu suite products.
• Possession of one or more of the following certifications:
a) Certified Kubernetes Administrator (CKA)
b) Certified Kubernetes Application Developer (CKAD)
c) Certified Kubernetes Security Specialist (CKS)
• 3–5 years of experience working in high-demand, fast-paced environments.
• Strong expertise in platform reliability principles, including scalability, performance optimization, and enterprise platform architecture.
• Proficiency in designing monitoring dashboards using Grafana and Dynatrace to track the platform’s SLOs, SLIs, and SLAs.
• Solid understanding of DevOps pipelines and automation tools such as Bamboo, Ansible, Bitbucket, Nexus, Jira and Confluence.
• Strong technical and business acumen with the ability to collaborate across multiple technical teams.
• Proven experience in diagnosing and resolving infrastructure and networking issues.
• Extensive experience in CI/CD environments, with a deep understanding of change and version control processes.
• Hands-on experience with platform upgrades, patching, and buildpack management.
• Ability to troubleshoot complex network-related problems.
• Passion for continuous learning and evaluating emerging technologies, with a commitment to knowledge sharing within the team.
• Ability to document Standard Operating Procedures (SOPs) and contribute to internal knowledge bases.
• Strong collaboration skills with the ability to work across various stakeholder groups at the organizational level.
• Excellent communication skills to engage with stakeholders and domain experts in designing and operating enterprise-wide solutions.
• Self-motivated, disciplined, and proactive with a strong sense of ownership and urgency.
• High level of integrity, accountability for one’s work, and a good attitude towards teamwork.
• Takes initiative to improve the current state of things and adapts readily to change.
How you succeed
• Champion and embody our Core Values in everyday tasks and interactions.
• Demonstrate a high level of integrity and accountability.
• Take initiative to drive improvements and embrace change.
• Take accountability for business and regulatory compliance risks, implementing measures to mitigate them effectively.
• Keep abreast of industry trends, regulatory compliance, and emerging threats and technologies to understand and highlight potential concerns and risks, and so proactively safeguard our company.
Who we are
Founded in 1908, Great Eastern is a well-established market leader and trusted brand in Singapore and Malaysia. With over S$100 billion in assets and more than 16 million policyholders, including 12.5 million from government schemes, it provides insurance solutions to customers through three successful distribution channels – a tied agency force, bancassurance, and financial advisory firm Great Eastern Financial Advisers. The Group also operates in Indonesia and Brunei.
The Great Eastern Life Assurance Company Limited and Great Eastern General Insurance Limited have been assigned the financial strength and counterparty credit ratings of "AA-" by S&P Global Ratings since 2010, one of the highest among Asian life insurance companies. Great Eastern's asset management subsidiary, Lion Global Investors Limited, is one of the leading asset management companies in Southeast Asia.
Great Eastern is a subsidiary of OCBC, the longest established Singapore bank, formed in 1932. OCBC is the second largest financial services group in Southeast Asia by assets and one of the world’s most highly rated banks, with an Aa1 rating from Moody’s and AA- from both Fitch and S&P. Recognised for its financial strength and stability, OCBC is consistently ranked among the World’s Top 50 Safest Banks by Global Finance and has been named Best Managed Bank in Singapore by The Asian Banker.
To all recruitment agencies: Great Eastern does not accept unsolicited agency resumes. Please do not forward resumes to our email or our employees. We will not be responsible for any fees related to unsolicited resumes.