Full-time Research Intern — Coding LLMs job and salary, Tencent is hiring - Ricebowl

Research Intern — Coding LLMs

Tencent

Undisclosed

Singapore

Work Location

  • Singapore

Job Description

Job Responsibilities

We are looking for research interns to work on foundational areas for coding language models, including pre-training data, mid-training data, synthetic data generation, evaluation, and agentic coding.



Responsibilities

* Explore data-centric methods for improving coding LLMs, including data filtering, quality assessment, deduplication, data mixture, and diversity analysis.

* Build synthetic data and evaluation pipelines for code generation, code editing, repo-level reasoning, tool use, and multi-step coding tasks.

* Run experiments to analyze how data, model, and training strategies affect coding capabilities.

* Work with large-scale code corpora, developer activity data, and agentic coding trajectories.


Requirements

* Strong programming skills in Python.

* Solid understanding of machine learning and large language models.

* Familiarity with LLM pre-training, mid-training, code models, data curation, evaluation, agents, or tool use.

* Strong experiment design, data analysis, and problem-solving skills.

* Interest in code intelligence, software engineering automation, and agentic coding.


Preferred Qualifications

* Experience with code data processing, GitHub-scale data, synthetic data, LLM evaluation, semantic deduplication, or agentic coding.

* Research experience, publications, or open-source projects in related areas are a plus.


What We Offer

* Access to large-scale real-world coding data and agentic trajectories.

* Rich compute resources and model APIs for fast research iteration.

* Opportunities to work on real-world coding model applications and the full model development loop.

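Important Safety Guidelines

When applying for a job, never provide your bank or credit card details. Do not transfer money or complete unrelated online surveys. If you notice anything suspicious, please report this job ad.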
