WHAT TO EXPECT
This role provides hands-on technical support, incident management, and production operations coverage. It owns first-line incident response, log analysis, data patching, and code-level troubleshooting, and also serves as a key backup for current team members during their absences to ensure continuity of day-to-day operations.
WHAT YOU'LL DO
- Incident management: Act as the primary responder for production incidents: triage, investigate, contain, resolve, and conduct post-incident reviews
- Log & code analysis: Monitor and analyse application logs (e.g. via CloudWatch) to identify the root causes of bugs or failures, and liaise with the L3 team
- Data patching & SQL operations: Perform production data patches, write and validate SQL scripts for fixes and data corrections, ensuring accuracy and auditability
- Production deployment support: Assist in and oversee production deployments; validate releases, roll back if needed, and coordinate with relevant stakeholders
- Vulnerability tracking: Proactively identify, log, prioritise, and follow up on security vulnerabilities and technical debt items
- New initiatives: Provide technical support for new projects that improve maintainability, performance, and reliability
- Backup: Cover technical queries, decisions, and stakeholder communications during team members' absences; escalate only when design-level or final-authority decisions are required
- Debugging & tooling: Use CloudWatch, Python scripts, and internal tooling to diagnose issues, automate repetitive tasks, and improve observability
WHO YOU ARE
- Diploma or degree in Computer Science, Information Technology, or a related field
- Relevant certifications in cloud operations or ITIL (preferred, not mandatory)
- SQL: Strong proficiency (able to write, review, and safely execute data patches and investigative queries in production)
- CloudWatch: Experience querying logs, setting up alarms, and using metrics for incident diagnosis
- Python: Preferred for scripting, automation, and ad-hoc technical tasks
- Incident management: Hands-on experience handling production incidents end-to-end, including on-call or after-hours coverage
- Production deployment: Familiarity with deployment pipelines, release processes, and rollback procedures
- Debugging & log analysis: Systematic approach to root-cause analysis using logs, traces, and code review
- Code comprehension: Able to read and review code (Python preferred) to identify bugs, and to contribute to refactoring initiatives
- Security awareness: Able to identify vulnerabilities, assess risk, and proactively maintain a prioritised tracker
- Experience working in agile or fast-paced product teams, and comfortable with ambiguity