Operations Leadership & Service Ownership
- Lead, manage, and develop a team of approximately 12 operations staff (Shift Leads and L1 Engineers).
- Own day-to-day service delivery for GPU infrastructure and supporting platforms.
- Be accountable for customer-facing SLA performance.
- Establish a culture of operational excellence, accountability, and ownership.
- Ensure operations are executed consistently across all shifts.
Incident, SLA & Service Governance
- Own the incident, major incident, change, and problem management processes.
- Ensure correct incident prioritization, escalation, and resolution.
- Track SLA performance and breach risk, and drive mitigation actions.
- Lead post-incident reviews and corrective action tracking.
- Ensure shift handovers, on-call coverage, and escalation models are robust.
ITSM, Automation & Operational Visibility
- Own the Jira Service Management operating model.
- Define and enforce ticket lifecycle standards and data quality.
- Drive automation of workflows, approvals, and notifications.
- Define and maintain operational dashboards for leadership visibility.
- Ensure ITSM data is accurate and reliable.
Customer, Vendor & Compliance
- Act as the primary operational contact for customers on service-related matters.
- Drive professional, timely, and transparent customer communication.
- Own operational engagement with GPU, fabric, rack, and DC facility vendors.
Requirements:
- Bachelor’s degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Equivalent practical experience will be considered.
- 10+ years of experience in IT infrastructure or data center operations.
- 5+ years leading operations teams in 24x7 environments.
- Strong ITSM and service management background.
- Experience owning SLAs and service delivery.
- Experience with Jira Service Management or a similar ITSM platform.
- Familiarity with GPU platforms and AI/HPC environments.
- Proven experience designing, documenting, and implementing operational processes, SOPs, runbooks, and operational policies.
- Strong people leadership, communication, and stakeholder management skills.
Job Type: Permanent
Pay: RM10,000.00 - RM18,000.00 per month
Application Question(s):
- Do you have experience with GPU platforms and AI/HPC environments?
Experience:
- Data Centre Operations: 3 years (Required)
- ITSM Operation: 5 years (Required)
Work Location: In person