Design and implement comprehensive logging, tracing, and automated evaluation frameworks to measure the reliability, accuracy, and relevance of LLM/GenAI outputs.
Evaluate, fine-tune, and deploy custom foundation models across languages and modalities using proprietary and external datasets.
Apply cutting-edge AI guardrails, responsible AI principles, and adversarial red-teaming strategies to ensure compliance, privacy, and security.
...
Oversee the technical strategy for highly scalable pipelines, data lakehouse architecture, real-time processing, and data mesh implementations, whilst promoting engineering excellence through best practices in areas including, but not limited to, data modelling, CI/CD, and data/AI ops.
In particular, adoption of dbt and Apache Iceberg, along with modern cloud-native architectures, is a top priority.
Develop a culture and talent strategy to attract and retain some of the world's best data engineering talent.
...
Reliability & Continuous Improvement — Establish and govern SLIs/SLOs, own incident management and post-incident reviews, and champion iterative improvements aimed at reducing manual effort and strengthening overall service resiliency
A minimum of 7 years in SRE/DevOps/Platform/Infrastructure Automation, including a strictly non-negotiable minimum of 1 year of hands-on experience with Ansible.
The second must-have requirement is working knowledge of at least one orchestration tool, such as Jenkins or CloudBees.
...