Maintain and update documentation and best practices to ensure consistency and efficiency across operations, as well as completeness and adherence to the required security policies, standards and guidelines.
Provide standby support (on a rotation basis) to respond to critical incidents reported by users.
At least 6-8 years of work experience in Government Commercial Cloud (GCC), with a comprehensive understanding of core networks, cloud networking and related systems.
...
Observability & Monitoring: Design and champion observability standards for AI systems — covering model health, data/concept drift, system latency, GPU telemetry, and cost attribution. Define SLO/SLI frameworks to ensure AI services are transparent, debuggable, and maintainable at scale.
Security & Governance: Collaborate with Responsible AI and cybersecurity stakeholders to embed security guardrails and compliance checks into standardized deployment pipelines, ensuring AI systems meet government security standards (e.g., IM8, CSA guidelines). Drive governance frameworks covering model lineage, reproducibility, and approval workflows.
Technical Consulting & Enablement: Act as a senior subject matter expert for government agencies, conducting architecture reviews and maturity assessments, and providing actionable roadmaps to scale AI from POC to production. Develop reusable playbooks, reference implementations, and training materials. Mentor engineers within the AI Practice and across partner agencies, with the goal of building self-sufficient teams.
...
Champion the shift from human-executed to AI-assisted ways of working - embedding Agentic AI with Human-in-the-Loop, SkillHub@DCPCC and reusable Skills/MCP into BAU cloud operations to deliver measurable efficiency, quality and standardization gains.
Ensure delivery of identified cloud projects and activities within agreed constraints of time and quality.
Be accountable for end-to-end cloud operations performance for all Business Units under the Group Office Shared Services model - covering incident, change, request, problem and capacity management across the multi-cloud estate.
...