We’re working with a high-growth AI infrastructure company that builds the foundational systems behind next-generation AI products and intelligent search.
The team is building a search engine designed for AI agents, which means operating large-scale distributed systems that crawl the web, train state-of-the-art embedding models, and power high-performance vector search infrastructure. On the compute side, they operate a rapidly growing, multi-million-dollar H200 GPU cluster alongside distributed batch processing systems running across tens of thousands of machines.
This is a deeply technical infrastructure role focused on building the internal platform and tooling that enables the entire engineering organization to move fast at scale.
What You’ll Work On
- Build and scale Kubernetes orchestration for large GPU clusters
- Design distributed infrastructure powering large-scale AI workloads
- Scale cloud batch job systems handling map-reduce workloads across tens of thousands of machines
- Improve GPU scheduling and cluster utilization efficiency
- Build observability, reliability, and internal platform tooling for production systems
- Work on infrastructure supporting AI training, inference, crawling, and data processing at massive scale
What We’re Looking For
- Experience designing and operating large-scale infrastructure systems
- Strong hands-on experience with Kubernetes in production environments
- Familiarity with GPU clusters, distributed compute, or cloud batch processing systems
- Strong understanding of observability, reliability engineering, and system optimization
- Experience with distributed systems and performance-oriented infrastructure
- Background in high-performance engineering environments is highly valued
Nice to Have
- Experience with Ray, distributed batch systems, or large-scale orchestration platforms
- Experience optimizing GPU utilization and scheduling
- Familiarity with AWS infrastructure at scale
- Exposure to AI/ML infrastructure environments
Why This Role
- Work on infrastructure problems typically seen only at hyperscale AI companies
- Join a highly technical, low-ego engineering culture
- Opportunity to shape foundational systems from an early stage
- High ownership and the chance to work on deeply challenging engineering problems
- Competitive compensation with meaningful equity upside