XTREMAX PTE. LTD. Hiring! Full Time Cloud Operations Engineer in Central Region (Singapore), Earn up to SGD 5,000 - Ricebowl

Cloud Operations Engineer

XTREMAX PTE. LTD.

SGD5,000 - SGD5,000 Per Month

Central Region (Singapore)

Working Location

  • 114 LAVENDER STREET Central Region (Singapore) Singapore

Job Description

Responsibilities

Infrastructure & Operations

  • Develop automation and processes to enable teams to manage, scale, and monitor applications in datacenters and cloud environments.
  • Troubleshoot and resolve system-related issues across platforms, including participating in on-call escalations for critical incidents.
  • Take ownership of end-to-end infrastructure and security solutions across the organization.
  • Deploy and manage monitoring tools to track infrastructure performance, utilization, and health.
  • Implement configuration management systems for business continuity and automate disaster recovery measures.
  • Provision virtual machines, databases, containers, licenses and other infrastructure resources for development teams.
  • Design, build, optimize, and monitor automation systems to identify bottlenecks and maximize service availability.
  • Perform capacity planning and resource forecasting to ensure infrastructure scales ahead of demand.
  • Own and manage SLA, SLO, and SLI definitions; track and report against service reliability targets.
  • Perform regular backup validation and disaster recovery drills to verify recoverability of critical systems.

Cloud Cost Management (FinOps)

  • Monitor and analyse cloud resource consumption to identify cost optimisation opportunities.
  • Implement tagging strategies, rightsizing, and reserved instance planning to control cloud spend.
  • Produce monthly cost reports and recommendations for engineering and management stakeholders.

System Monitoring & Performance

  • Monitor and analyse product runtime environments (production and non-production) to ensure optimal system performance.
  • Implement continuous improvement strategies to enhance system reliability and efficiency.
  • Deploy full-stack monitoring with predictive analytics (CloudWatch Anomaly Detection, Stackdriver, Azure Monitor).
  • Integrate alerting with central NOC/SOC for faster escalation and resolution.
  • Build and maintain monitoring dashboards to surface real-time infrastructure health metrics.
  • Implement log aggregation and analysis using ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or an equivalent (optional).

Incident & Problem Management

  • Manage application and security incidents, performing problem determination and coordinating with internal teams and vendors for resolution.
  • Escalate issues as necessary to minimize business impact.
  • Lead and coordinate with operations teams and vendors to ensure 24/7 system support availability.
  • Facilitate communication between teams to resolve operational issues efficiently.
  • Conduct post-incident reviews (PIRs) and drive root cause analysis (RCA) to prevent recurrence.
  • Maintain and continuously improve runbooks and standard operating procedures (SOPs) for common incidents.

Operational Processes & Compliance

  • Develop and maintain operations and process guides to meet audit and compliance requirements.
  • Handle day-to-day operational activities, analyse performance data, and prepare status reports for stakeholders and management.
  • Ensure operational processes align with IM8 and ISO 27001 standards.
  • Conduct periodic compliance drills and support audit preparation.
  • Manage change advisory board (CAB) submissions and coordinate change windows in accordance with ITIL change management practices.

Security

  • Implement security practices aligned with industry standards to protect organizational data and infrastructure.
  • Plan, implement, and monitor system security architecture, including threat and risk assessments.
  • Perform security checks such as vulnerability assessments and system hardening.
  • Apply secure configurations and security controls for infrastructure and applications.
  • Configure and manage network security controls including WAF, firewall rules, VPN gateways, and security groups.
  • Ensure network segmentation and least-privilege access across all environments.

Experience and Skills Needed

Core Technical Skills

Strong understanding of:

  • Networking
  • Windows Server administration (Active Directory, DNS, etc.)
  • Linux administration
  • Nginx
  • Squid forward proxy
  • GitLab Runner
  • Public cloud platforms (AWS, Azure, Google Cloud)
  • Terraform
  • Git and modern branching workflows
  • Scripting (PowerShell / Bash / Python)
  • Kubernetes administration
  • Ansible
  • Monitoring tools (cloud-native tooling, Grafana, Prometheus)
  • ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk (optional)
  • Cloud cost management tools (AWS Cost Explorer, Azure Cost Management, GCP Billing)
  • Network security tools: WAF, firewall rule management, VPN (site-to-site and client)

Infrastructure & Platform Experience

  • Experience working with high-availability, high-performance, and high-security multi-data-center systems.
  • Experience with hybrid cloud environments.
  • Experience designing and maintaining network architecture including VPCs, subnets, peering, and transit gateways.
  • Hands-on experience with container orchestration platforms (Kubernetes, EKS, AKS, GKE) in production (optional).

Must Have

  • A bachelor’s degree in Computer Science, Information Technology, or a related field.
  • 2–5 years of relevant experience.
  • Proven experience as an Operations or Cloud Engineer, or in a similar IT role.
  • Familiarity with ITSM tools (e.g., Remedy, Zendesk, ServiceDesk) for change and incident management workflows.
  • Experience in implementing security and access controls for production and test environments.
  • Proficiency with full stack monitoring tools (e.g., APM tools, CloudWatch, Stackdriver, OpenAPM stack).
  • Cloud infrastructure experience.
  • Strong problem-solving and communication skills, with the ability to explain complex issues to non-technical audiences.
  • A collaborative, resourceful mindset with the ability to deliver innovative solutions.
  • Experience with Linux and Windows administration.
  • Demonstrated experience managing SLAs, SLOs, and producing operational reports for stakeholders.
  • Experience with FinOps practices or cloud cost governance in a production environment.

Good to Have

  • Experience with Singapore Government projects.
  • Database experience and scripting experience (Shell script / PowerShell / Python).

Certifications

  • AWS Certified Solutions Architect – Associate (Professional is a plus)
  • Microsoft Certified: Azure Administrator Associate (Professional is a plus)
  • Google Cloud – Associate Cloud Engineer (ACE)
  • ITIL 4 Foundation

By submitting your resume/CV, you consent and agree to allow the information provided to be used and processed by or on behalf of Xtremax Pte Ltd for purposes related to your registration of interest in current or future employment with us and for the processing of your application for employment.

You also represent to us that you have obtained the consent of your referees when you disclose to us their personal data for the purpose of conducting reference checks.

The personal data held by us relating to your application will be kept strictly confidential and in accordance with the PDPA. You may also refer to our Privacy Policy for more details here: 

We regret to inform you that should you not consent to providing the necessary data required for us to process your application, your application will be considered void.
