
Full-time [AI] AIGC Distributed Training - Optimization Engineer (Pre-training) job, salary, Shopee now hiring - Ricebowl

[AI] AIGC Distributed Training - Optimization Engineer (Pre-training)

Salary: Undisclosed

Singapore


Work Location

  • Singapore

Job Description

About Us

Sea Group is establishing a new, strategic AI department dedicated to exploring how generative AI can transform human connection, self-expression, communication diversity, and social interaction. We are building the next generation of AI-native applications and a comprehensive Model-as-a-Service (MaaS) product support system. Drawing on massive multi-country data, we are building a leading multilingual AI ecosystem from the ground up. We welcome outstanding talent to join us in building leading Southeast Asian multilingual models and exploring innovative AI-native applications.

The AIGC team in the Sea AI Department is dedicated to pushing the boundaries of visual synthesis. We aim to achieve industry leadership in high-fidelity portrait and video generation. The team focuses on fundamental research and on scaling generative models to empower next-generation social and e-commerce platforms.

About The Job

  • Toolchain Development: Design and build distributed training toolchains to support ultra-large-scale AIGC model training.
  • System Optimization: Optimize distributed training performance across computation, communication, and storage layers.
  • Stability & Scalability: Analyze and resolve technical bottlenecks in the training process, specifically focusing on improving training stability and efficiency.
  • Frontier Research: Track and explore cutting-edge distributed training technologies, and lead project planning and production-grade implementation.

Requirements

  • Master’s degree or above in Computer Science or a related field; a Bachelor’s degree may be considered with strong industry experience.
  • Minimum 2 years of relevant experience.
  • Distributed Expertise: Deep understanding of distributed training principles (Data/Pipeline/Tensor/Expert Parallelism) with proven hands-on experience.
  • Framework Proficiency: Expert in deep learning frameworks such as PyTorch, DeepSpeed, and Megatron-LM.
  • Low-level Knowledge: Familiar with GPU hardware architecture and CUDA programming; experience in CUDA kernel development/debugging and familiarity with NCCL and cuDNN.
  • AIGC Background: Understanding of AIGC pre-training methodologies, Transformer architectures, and Diffusion models (e.g., Stable Diffusion, Flux).
  • Core Competency: Strong problem-solving skills, innovative thinking, and excellent team collaboration/communication skills.
