Hytech Hiring! Full Time DevOps Engineer (SRE) in Federal Territory - Ricebowl

DevOps Engineer (SRE)

Hytech

Undisclosed

KL City, Federal Territory

Working Location

  • Jalan Sultan Mizan Zainal Abidin, Kompleks Kerajaan, Kuala Lumpur, Federal Territory, Malaysia

Job Description

About Hytech

Hytech is a leading management consulting firm headquartered in Australia and Singapore, specialising in digital transformation for fintech and financial services organisations. We deliver end-to-end consulting services and provide robust middle- and back-office solutions that enable our clients to optimise operations, enhance efficiency, and stay ahead in a fast-evolving digital landscape.

With more than 2,000 professionals worldwide, Hytech has a strong and growing international presence, with offices across Australia, Singapore, Malaysia, Taiwan, the Philippines, Thailand, Morocco, Cyprus, Dubai, and beyond.


Responsibilities

(Business Continuity & High Availability Architecture)

  • Define, implement, and operate SRE practices, including SLA/SLO/SLI design, availability, connectivity, and disaster recovery strategies
  • Lead architecture design and execution for high availability, high concurrency, and large-scale systems (e.g., microservices, service mesh, multi-active/multi-region)
  • Drive system observability, security compliance, and cost optimization (e.g., cost allocation and governance)
  • Design resilient architectures for mission-critical systems with high availability, elasticity, and fault tolerance


(Observability, Monitoring & Reliability Engineering)

  • Build observability platforms using tools such as Datadog, Prometheus, OpenTelemetry, logging systems, and alerting platforms (Flashcat/Nightingale)
  • Implement full-stack monitoring across applications, infrastructure, and business metrics to enable precise issue detection
  • Establish proactive monitoring systems with alerting, anomaly detection, and automated remediation capabilities
  • Lead incident management (P1/P2), including rapid recovery, root cause analysis (RCA), and continuous improvement mechanisms


(Platform Engineering & Efficiency Optimization)

  • Plan and implement platform engineering strategies to improve scalability, availability, and performance
  • Build standardized platforms for system reliability, observability, and security while optimizing cost efficiency
  • Design and optimize CI/CD pipelines (e.g., GitHub Actions, Jenkins, ArgoCD, Helm) to improve delivery speed and quality
  • Establish standards for containerization, middleware, and deployment processes, ensuring scalability, reliability, and high availability
  • Resolve system bottlenecks through capacity planning, performance tuning, and reliability improvements


(Technology Leadership & Collaboration)

  • Collaborate closely with business and engineering teams to embed reliability, observability, scalability, and security into system design
  • Lead the definition and implementation of technical standards, security baselines, and quality control mechanisms
  • Drive adoption of best practices, tooling standardization, and engineering efficiency improvements


Key Requirements

  • 5+ years of experience in SRE / DevOps / Platform Engineering or related roles
  • Proven experience in designing and operating high-availability, large-scale systems
  • Cloud platforms: AWS (EC2, EKS, IAM, S3, VPC, NLB/ALB, RDS, ElastiCache), or equivalent (Azure/GCP)
  • Infrastructure as Code: Terraform / CloudFormation
  • CI/CD & automation: Jenkins, GitHub Actions, ArgoCD, CodeBuild, Helm
  • Containerization: Docker, Kubernetes (K8s)
  • Observability: Metrics, Logs, Traces (e.g., Prometheus, OpenTelemetry, Datadog)
  • Strong systems thinking and analytical problem-solving skills
  • Excellent cross-functional collaboration and communication skills
  • Self-driven, with a strong sense of ownership and a continuous-improvement mindset


(Nice to Have)

  • Experience in fintech, payments, or high-security environments
  • Experience with high-concurrency, low-latency system design
  • AI-driven operations (AIOps) or automation experience
  • Certifications (e.g., AWS, CKA/CKS)
  • Experience with large-scale systems or international project delivery


What We Offer

  • Easy access to public transportation (LRT & KTM).
  • Transportation allowance.
  • Corporate insurance coverage, including dental, optical, and outpatient claims.
  • Gym and fitness claims.
  • Ongoing training and development opportunities.
  • Exposure to exciting projects that support career growth and professional development.
