About the Role
We are looking for a Solution Architect to support enterprise customers using GPU cloud infrastructure for AI inference workloads. This role will act as the key technical point of contact for Western / English-speaking customers, helping them design, deploy, and optimize AI workloads on GPU cloud platforms.
The ideal candidate has a strong background in GPU infrastructure and networking, with a good understanding of cloud infrastructure and AI inference workloads.
Key Responsibilities
- Act as the technical point of contact for enterprise customers using GPU cloud / AI infrastructure services.
- Understand customer requirements around AI inference workloads and translate them into practical technical solutions.
- Design GPU cloud solutions covering GPU compute, networking, storage, deployment architecture, monitoring, and reliability.
- Support technical discussions, solution presentations, PoCs, customer onboarding, and troubleshooting.
- Work closely with engineering, product, and business teams to ensure customer requirements are met.
- Help customers optimize inference performance, GPU utilization, network performance, cost efficiency, and deployment reliability.
- Provide customer feedback to internal teams to improve platform capabilities and service quality.
Requirements
- 5–10 years of experience in solution architecture, cloud infrastructure, AI infrastructure, GPU infrastructure, networking, DevOps/SRE, or related technical roles.
- Strong understanding of GPU infrastructure, AI inference workloads, model deployment, or LLM serving.
- Solid experience in networking, including cloud networking, data center networking, routing, load balancing, or high-performance network architecture.
- Experience with Kubernetes, Linux, containers, monitoring, and production system troubleshooting.
- Good customer-facing communication skills, especially with English-speaking enterprise customers.
- Ability to explain complex technical topics clearly to both technical and business stakeholders.
- Singapore-based candidates are preferred.
Preferred Qualifications
- Experience with GPU cloud, AI inference platforms, HPC, or large-scale compute infrastructure.
- Familiarity with NVIDIA GPUs, CUDA, vLLM, Triton, Kubernetes GPU scheduling, InfiniBand, RoCE, or similar technologies.
- Storage infrastructure experience is a plus, but not the top priority.
- Prior experience supporting international or Western enterprise customers.
- Bilingual communication skills are a plus.