Shopee Hiring! Full Time [AI] AIGC Distributed Training - Optimization Engineer (Pre-training) - Ricebowl

[AI] AIGC Distributed Training - Optimization Engineer (Pre-training)

Undisclosed

Singapore

Working Location

  • Singapore

Job Description

About Us

Sea Group is establishing a new, strategic AI department dedicated to exploring how generative AI can transform human connection, self-expression, communication, and social interaction. We are building the next generation of AI-native applications and a comprehensive Model-as-a-Service (MaaS) product support system. Drawing on massive multi-country data, we are building a multilingual AI ecosystem from the ground up. We welcome outstanding talent to join us in building leading Southeast Asian multilingual models and exploring innovative AI-native applications.

The AIGC team in the Sea AI Department is dedicated to pushing the boundaries of visual synthesis. We aim to achieve industry leadership in high-fidelity portrait and video generation. The team focuses on fundamental research and on scaling generative models to empower next-generation social and e-commerce platforms.

About The Job

  • Toolchain Development: Design and build distributed training toolchains to support ultra-large-scale AIGC model training.
  • System Optimization: Optimize distributed training performance across computation, communication, and storage layers.
  • Stability & Scalability: Analyze and resolve technical bottlenecks in the training process, with a focus on improving training stability and efficiency.
  • Frontier Research: Track and explore cutting-edge distributed training technologies, and lead project planning and production-grade implementation.

Requirements

  • Master’s degree or above in Computer Science or a related field; a Bachelor’s degree may be considered with strong industry experience.
  • Minimum 2 years of relevant experience.
  • Distributed Expertise: Deep understanding of distributed training principles (Data/Pipeline/Tensor/Expert Parallelism) with proven hands-on experience.
  • Framework Proficiency: Expert in deep learning frameworks such as PyTorch, DeepSpeed, and Megatron-LM.
  • Low-level Knowledge: Familiar with GPU hardware architecture and CUDA programming; experience in CUDA kernel development/debugging and familiarity with NCCL and cuDNN.
  • AIGC Background: Understanding of AIGC pre-training methodologies, Transformer architectures, and Diffusion models (e.g., Stable Diffusion, Flux).
  • Core Competency: Strong problem-solving skills, innovative thinking, and excellent team collaboration/communication skills.
