LLM
The Doubao LLM team is dedicated to boldly exploring the next generation of large models and to researching the foundational problems of large-model development, including but not limited to pretraining, post-training, reasoning, memory, learning, and interpretability. We explore frontier technologies, take them end to end into production, and continually search for where large models meet applications and for the new application possibilities they unlock.

Research Areas
Horizon
The team focuses on frontier research, with the mission of continually pushing the boundaries of model intelligence, backed by a long-term vision and determination. We look forward to passionate, like-minded researchers joining us in large-model research.
Pretrain
The team develops next-generation pretraining paradigms so that the Scaling Law continues to hold even as physical compute grows by several more orders of magnitude, raising the ceiling of general intelligence. It also tackles AI for Math, aiming to solve genuinely hard and valuable mathematical propositions and accelerate human research in mathematics.
Posttrain
The team owns the full scope of LLM post-training and provides the core post-training foundations for a unified any-modality large model. Its goal is to research and explore next-generation SFT / RM / RL / Self-Learning techniques in the post-training stage, while delivering major improvements in key directions such as Reasoning, Coding, and Agents.
Research Topics

Horizon
The Limits of Long CoT Models
Explore the upper bound of long-reasoning models, expanding along both the Inference-Time Scaling and Model Scaling axes, with the goal of solving complex problems that humans cannot yet solve.
O-model Architecture
Scaling laws along the inference dimension are an important path toward ultimate intelligence; our goal is to build lifelong-learning intelligent systems and give models linear-complexity reasoning.
Memory
Build streaming memory that manages unboundedly long context and enables true online learning, e.g., reading Introduction to Algorithms and learning to write code, or reading a grammar book and learning a new language.
Reasoning and Planning Agents
Tackle the core foundational problems of the agent field and build a Super Intelligence System that accelerates leaps forward in natural science, economic production, and daily life, broadly improving social efficiency and quality of life.
Pretrain
Next-Generation Pretraining Paradigms
Explore model self-training and self-evolution based on agents and active learning; explore large-scale synthetic-data pretraining to break through the bottleneck of human data; explore joint multimodal pretraining and better modeling methods to raise the ceiling of intelligence.
Highly Capable Ultra-Small Models
Study how to achieve strong reasoning ability with only 1B activated parameters, and the new data and modeling methods needed to support it.
AI for Math
The long-term vision is an AI mathematician that can, autonomously or alongside human mathematicians, attack genuinely hard and valuable mathematical propositions such as the Riemann Hypothesis; by effectively combining natural language (NL) and formal language (FL), study next-generation prover paradigms with a higher ceiling.


Posttrain
Large Scale Reinforcement Learning
Solve the problems of RL scaling at very large scale, improving model intelligence and aligning with human preferences.
Reward Model System
Combine models, verifiers, tools, and agents to provide accurate, generalizable signals for data filtering, data synthesis, and RL training.
Generalizing o1 / Long CoT to General Domains
Push o1-level reasoning beyond the boundaries of math and code, reaching human-expert level in more domains.
Long Horizon Task / Agent
Solve long-range, multi-turn modeling in long-horizon tasks and agents, so that models can truly solve the complex problems of the human world.
Out-of-Distribution (OOD) Problem Solving
Enable models to solve problems well across all types of problem distributions.
Next-Generation RM / RL Algorithms
Research new RM / RL algorithms capable of breaking through current bottlenecks.
Data Mining and Synthesis
Mine scarce post-training data from public sources, and synthesize data from seed data using post-training techniques, to further raise the model's capability ceiling.

Selected Papers

2025.04.24
Why Does the Effective Context Length of LLMs Fall Short?
Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs). However, recent work reveals that the effective context lengths of open-source LLMs often fall short, typically not exceeding half of their training lengths. In this work, we attribute this limitation to the left-skewed frequency distribution of relative positions formed in LLMs pretraining and post-training stages, which impedes their ability to effectively gather distant information. To address this challenge, we introduce ShifTed Rotray position embeddING (STRING). STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within their existing training lengths. Experimental results show that without additional training, STRING dramatically improves the performance of the latest large-scale models, such as Llama3.1 70B and Qwen2 72B, by over 10 points on popular long-context benchmarks RULER and InfiniteBench, establishing new state-of-the-art results for open-source LLMs. Compared to commercial models, Llama 3.1 70B with STRING even achieves better performance than GPT-4-128K and clearly surpasses Claude 2 and Kimi-chat.
Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong
LLM
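As a rough illustration of the core idea, STRING's shift can be sketched as a remapping of relative attention distances. This is a minimal sketch under assumed parameters: `shift` and `local_window` are illustrative names, not the paper's, and the actual method operates on RoPE position indices rather than an explicit distance matrix.

```python
import numpy as np

def string_relative_positions(seq_len, shift, local_window):
    """Sketch of STRING's remapping: relative distances beyond a small
    local window are shifted down by `shift`, so distant tokens reuse
    the well-trained (smaller) relative positions instead of the
    rarely-seen large ones near the training-length limit."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    d = i - j                          # causal relative distance
    # Distant pairs get shifted distances; nearby pairs keep exact ones.
    remapped = np.where(d >= local_window + shift, d - shift, d)
    return np.tril(remapped)           # keep only causal (i >= j) entries
```

For example, with `seq_len=10, shift=4, local_window=2`, a token attending 9 positions back uses relative distance 5, while tokens within the local window keep their exact distances.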

2025.04.24
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model
Large language models (LLMs) have shown significant multilingual capabilities. However, the mechanisms underlying the development of these capabilities during pre-training are not well understood. In this paper, we use code LLMs as an experimental platform to explore the evolution of multilingual capabilities in LLMs during the pre-training process. Based on our observations, we propose the Babel Tower Hypothesis, which describes the entire process of LLMs acquiring new language capabilities. During the learning process, multiple languages initially share a single knowledge system dominated by the primary language and gradually develop language-specific knowledge systems. We then validate the above hypothesis by tracking the internal states of the LLMs through identifying working languages and language transferring neurons. Experimental results show that the internal state changes of the LLM are consistent with our Babel Tower Hypothesis. Building on these insights, we propose a novel method to construct an optimized pre-training corpus for multilingual code LLMs, which significantly outperforms LLMs trained on the original corpus. The proposed Babel Tower Hypothesis provides new insights into designing pre-training data distributions to achieve optimal multilingual capabilities in LLMs.
Jiawei Chen, Wentao Chen, Jing Su, Jingjing Xu, Hongyu Lin, Mengjie Ren, Yaojie Lu, Xianpei Han, Le Sun
LLM
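One simple way to probe for the language-specific knowledge systems the Babel Tower Hypothesis describes is to rank neurons by how selectively they activate for one language versus the rest. This is a hypothetical sketch, not the paper's actual procedure for identifying working languages and language-transferring neurons:

```python
import numpy as np

def language_selective_neurons(acts_by_lang, top_k=1):
    """Rank neurons by the gap between their mean activation on one
    language and their average activation on all other languages.
    acts_by_lang maps a language name to an (n_tokens, n_neurons)
    activation matrix collected from some layer."""
    means = {lang: a.mean(axis=0) for lang, a in acts_by_lang.items()}
    top = {}
    for lang, mu in means.items():
        others = np.mean([m for l, m in means.items() if l != lang], axis=0)
        # Most selective neurons for this language first.
        top[lang] = np.argsort(mu - others)[::-1][:top_k]
    return top
```

On synthetic activations where a single neuron fires only for one language, that neuron is ranked first for that language, which is the kind of signal one would track across pre-training checkpoints.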

2025.04.01
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
The rapid escalation of LLM benchmark difficulty in recent years, from elementary school-level to frontier problems, has created the impression that we are only inches away from surpassing human intelligence. However, does the LLMs' remarkable reasoning ability indeed come from true intelligence by human standards, or are they simply reciting solutions witnessed during training at an Internet scale? To study this problem, we propose RoR-Bench, a novel multi-modal benchmark for detecting LLMs' recitation behavior when asked simple reasoning problems whose conditions are subtly shifted, and conduct empirical analysis on our benchmark. Surprisingly, we found that existing cutting-edge LLMs unanimously exhibit extremely severe recitation behavior: by changing one phrase in the condition, top models such as OpenAI-o1 and DeepSeek-R1 can suffer a 60% performance loss.
Kai Yan, Yufei Xu, Zhengyin Du, Xuesong Yao, Zheyu Wang, Xiaowen Guo, Jiecao Chen
LLM
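The failure mode can be made concrete with a tiny probe. This is a hypothetical sketch, not RoR-Bench's actual protocol: ask a model an original problem and a subtly perturbed variant, and count how often a correct original answer turns into a wrong answer after the perturbation.

```python
def recitation_gap(model, pairs):
    """Among problems the model answers correctly in their original
    form, return the fraction it gets wrong on the perturbed variant:
    a crude signal of recitation rather than reasoning."""
    failures, solved = 0, 0
    for orig_q, orig_a, pert_q, pert_a in pairs:
        if model(orig_q) == orig_a:
            solved += 1
            if model(pert_q) != pert_a:
                failures += 1
    return failures / solved if solved else 0.0

# A toy "reciter" that keys on a memorized phrase and ignores the
# changed condition -- the behavior such a benchmark is built to expose.
def reciter(question):
    return 8 if "apples" in question else None

pairs = [("2 apples per bag, 4 bags. How many apples?", 8,
          "2 apples per bag, 3 bags. How many apples?", 6)]
```

Here `recitation_gap(reciter, pairs)` is 1.0: the reciter solves the original problem but carries the memorized answer over to the one-phrase change.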