Great Eastern Hiring! Full Time Lead Platform Reliability Engineer in Selangor - Ricebowl

Lead Platform Reliability Engineer

Working Location

  • Cyberjaya, Selangor, Malaysia

Job Description

Responsibilities

About the Job

The Lead Platform Reliability Engineer (PRE) is responsible for engineering, operating, and maintaining GEL’s internal container platform and its supporting infrastructure, with a strong focus on reliability, resiliency, and security.


As a Senior PRE within GEL’s Infrastructure team, you will play a pivotal role in designing, building, and operating distributed container hosting solutions using Broadcom’s Tanzu product. Your mission is to safeguard and continuously enhance cloud-native applications and services that power the organization’s container ecosystem. You will serve as a subject matter expert for Level 3 support, working closely with cross-functional teams to troubleshoot complex issues, optimize platform performance, and guide application teams in adopting reliability best practices.


Key Responsibilities:

• Maintain the stability, reliability, and efficiency of GEL’s internal container platform and its supporting infrastructure. This includes resource provisioning and management, responding to platform and application outages, capacity planning, monitoring, and driving reliability enhancements.

• Continuously evaluate the platform’s technical architecture to ensure it scales effectively with evolving application demands. This includes proactively identifying and resolving reliability issues, analyzing product dependencies, pinpointing performance bottlenecks, and implementing optimization strategies to enhance platform availability and cost efficiency.

• Participate in a 24/7 on-call rotation, promptly addressing alerts from the global monitoring team and resolving production incidents to maintain platform and application uptime. Regularly review team workflows to identify manual processes and implement automation solutions that reduce effort and minimize human error.

• Regularly review the security advisories issued by Broadcom for the Tanzu suite of products and deploy product updates as required to keep the platform free of known vulnerabilities.

• Work with open-source technologies, CI/CD and SCM tools, and source control such as Bitbucket as necessary, and implement containers for the organization (e.g., Docker and Kubernetes). Stay current with industry trends and propose new ways for our business to improve.

• Lead and manage an existing Platform Reliability Engineering (PRE) team of eight members, including both permanent staff and external contractors.

• Organize and maintain the team’s on-call roster, and plan resources for all platform upgrade activities.

• Coordinate and allocate support resources for planned application deployment and release activities.

• Oversee the team’s ServiceNow queue and Jira user stories, ensuring all topics, requests, and incidents are handled within defined SLA timelines.

• Act as a Level 3 support lead during major incidents by taking full ownership, driving the resolution, coordinating with relevant vendors, implementing immediate fixes or workarounds, and following through with root cause analysis and long-term remediation.

• Bring experience in architecting and designing large-scale, enterprise-grade platforms and solutions that support complex business use cases.

• Take accountability for business and regulatory compliance risks and take appropriate steps to mitigate them.

• Maintain awareness of industry trends in regulatory compliance, emerging threats, and technologies to understand the risks and better safeguard the company.

• Highlight any potential concerns or risks and proactively share risk management best practices.


We are looking for people with

• Bachelor’s or Master’s Degree in Computer Science or a related field.

• Minimum of 10 years of overall experience in IT, with at least 7 years of hands-on experience as a Platform Reliability Engineer or Site Reliability Engineer, specifically managing container orchestration platforms such as Tanzu Application Service, Tanzu Kubernetes Grid Integrated Edition, or other Kubernetes-based platforms.

• At least 5 years of experience in automation using tools like Ansible and scripting languages such as Python and Bash.

• 5 years of experience in developing and maintaining Helm charts and Helm repositories.

• Minimum of 3 years of experience managing NSX-T solutions and integrating them with Tanzu suite products.

• Possession of one or more of the following certifications:

a) Certified Kubernetes Administrator (CKA)

b) Certified Kubernetes Application Developer (CKAD)

c) Certified Kubernetes Security Specialist (CKS)

• 7 years of experience working in high-demand, fast-paced environments.

• Strong expertise in platform reliability principles, including scalability, performance optimization, and enterprise platform architecture.

• Proficiency in designing monitoring dashboards using Grafana and Dynatrace to track the platform’s SLOs, SLIs, and SLAs.

• Solid understanding of DevOps pipelines and automation tools such as Bamboo, Ansible, Bitbucket, Nexus, Jira and Confluence.

• Strong technical and business acumen with the ability to collaborate across multiple technical teams.

• Proven experience in diagnosing and resolving infrastructure and networking issues.

• Extensive experience in CI/CD environments, with a deep understanding of change and version control processes.

• Hands-on experience with platform upgrades, patching, and buildpack management.

• Ability to troubleshoot complex network-related problems.

• Passion for continuous learning and evaluating emerging technologies, with a commitment to knowledge sharing within the team.

• Ability to document Standard Operating Procedures (SOPs) and contribute to internal knowledge bases.

• Strong collaboration skills with the ability to work across various stakeholder groups at the organizational level.

• Excellent communication skills to engage with stakeholders and domain experts in designing and operating enterprise-wide solutions.

• Self-motivated, disciplined, and proactive with a strong sense of ownership and urgency.

• High level of integrity, accountability for one’s work, and a positive attitude toward teamwork.

• Takes initiative to improve the current state of things and adapts readily to change.


How you succeed

• Champion and embody our Core Values in everyday tasks and interactions.

• Demonstrate a high level of integrity and accountability.

• Take initiative to drive improvements and embrace change.

• Take accountability for business and regulatory compliance risks, implementing measures to mitigate them effectively.

• Keep abreast of industry trends, regulatory compliance, and emerging threats and technologies to understand and proactively highlight potential concerns and risks to safeguard our company.


Who we are

Founded in 1908, Great Eastern is a well-established market leader and trusted brand in Singapore and Malaysia. With over S$100 billion in assets and more than 16 million policyholders, including 12.5 million from government schemes, it provides insurance solutions to customers through three successful distribution channels – a tied agency force, bancassurance, and financial advisory firm Great Eastern Financial Advisers. The Group also operates in Indonesia and Brunei.


The Great Eastern Life Assurance Company Limited and Great Eastern General Insurance Limited have been assigned the financial strength and counterparty credit ratings of "AA-" by S&P Global Ratings since 2010, one of the highest among Asian life insurance companies. Great Eastern's asset management subsidiary, Lion Global Investors Limited, is one of the leading asset management companies in Southeast Asia.


Great Eastern is a subsidiary of OCBC, the longest established Singapore bank, formed in 1932. OCBC is the second largest financial services group in Southeast Asia by assets and one of the world’s most highly rated banks, with an Aa1 rating from Moody’s and AA- by both Fitch and S&P. Recognised for its financial strength and stability, OCBC is consistently ranked among the World’s Top 50 Safest Banks by Global Finance and has been named Best Managed Bank in Singapore by The Asian Banker.


To all recruitment agencies: Great Eastern does not accept unsolicited agency resumes. Please do not forward resumes to our email or our employees. We will not be responsible for any fees related to unsolicited resumes.
