Job Summary
We are seeking an experienced Operations Support Engineer to join our core development team, supporting an internal product. This role involves providing L2 and L3 technical support, including code-level troubleshooting and enhancements, while ensuring optimal system performance and reliability.
You will work closely with product managers, developers, and support teams in an agile environment, adopting best-in-class tools and practices to deliver high-quality solutions that meet the needs of users in Singapore.
Key Responsibilities
- Monitor and analyze system performance across production and non-production environments to ensure optimal uptime and efficiency.
- Provide L2 and L3 support, including troubleshooting, debugging, and implementing code-level fixes.
- Manage and resolve application, infrastructure, and security incidents within agreed SLAs.
- Perform root cause analysis and collaborate with internal teams and vendors for timely issue resolution.
- Work with application teams, solution architects, and security teams to implement continuous improvement initiatives.
- Develop and maintain operations documentation, including system configurations, processes, and audit records.
- Establish and improve operational processes and runbooks to ensure compliance with audit and governance standards.
- Generate and present status reports, performance metrics, and insights to stakeholders and management.
- Lead or support 24/7 operations, coordinating with internal staff and external vendors to ensure system availability.
- Identify opportunities for automation and process optimization to reduce downtime and minimize manual effort.
Skills and Experience Required
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Proven experience as an Operations Engineer or in a similar IT role.
- Hands-on experience with ITSM tools (e.g., Remedy, Zendesk, ServiceDesk) for incident and change management.
- Strong exposure to incident management, change management, and problem resolution workflows.
- Experience implementing security controls and access management for production and test environments.
- Familiarity with full-stack application performance monitoring (APM) tools and cloud-native monitoring solutions (e.g., AWS CloudWatch, Google Cloud Monitoring, formerly Stackdriver).
- Experience with cloud platforms such as AWS, Azure, or Google Cloud (certifications are an advantage).
- Knowledge of infrastructure automation and configuration management tools (e.g., Terraform, Ansible) to improve operational efficiency.
- Understanding of Agile methodologies, DevOps practices, CI/CD pipelines, and test-driven development.
- Strong problem-solving skills with the ability to think innovatively and propose effective solutions.
- Excellent communication skills, able to explain technical concepts to non-technical stakeholders.
- Ability to work collaboratively in a high-performance team environment.
Nice to Have
- Certifications in Sitecore XM Cloud or Sitecore OrderCloud.
- Exposure to the OpenAPM stack and modern observability practices.