LLM
The Doubao Large Language Model (LLM) team is dedicated to aggressively advancing the next generation of LLMs, tackling the fundamental challenges of LLM development head-on. Our areas of focus include model pretraining, posttraining, inference, memory, learning, interpretability, and other related directions. We dive deep into the latest technologies and build comprehensive solutions from concept to completion. In our effort to apply LLMs in real-life scenarios, we persistently seek ways to enhance applications through technological innovation.
Main areas of focus
Horizon
We're a team dedicated to cutting-edge research, driven by a mission to push the boundaries of model intelligence and fueled by a long-term vision and unwavering commitment. We are seeking passionate, self-driven researchers who share our vision to collaborate on LLM research.
Pretrain
The team is dedicated to developing the next generation of pretraining paradigms, ensuring that scaling laws continue to hold even as computing power increases by several orders of magnitude, thereby raising the upper limit of general intelligence. At the same time, we focus on AI for math: solving genuinely challenging and valuable mathematical problems while advancing research in the field for humanity's benefit.
Posttrain
The team is responsible for all aspects of LLM posttraining and provides the core posttraining foundations for large multimodal unified models. Its goal is to research and explore next-generation technologies in the posttraining phase, such as SFT, RM, RL, and self-learning, while making significant optimizations and improvements in key areas like reasoning, coding, and agents.
Research topics
Horizon
Limits of the Long CoT Model
Explore the limits of long reasoning models, continuously scaling along both the inference-time and model-size dimensions, with the objective of solving complex problems that humans cannot yet address.
O-model Architecture
Scaling along the inference dimension is key to achieving ultimate intelligence. We aim to develop a lifelong-learning intelligent system, enabling models to reason with linear complexity.
Memory
Establish a streaming memory mechanism that can manage context of unlimited length and truly achieve online learning, such as learning to code by reading algorithms or learning a new language by reading a grammar book.
Agent's Reasoning and Planning Capabilities
Committed to solving the core, fundamental problems in the field of agents: building a superintelligent system that accelerates leapfrog development in the natural sciences, economic production, and daily life, comprehensively enhancing societal efficiency and the quality of human life.
Pretrain
Next-Generation Pretraining Paradigm
Explore model self-training and evolution based on agents and active learning; explore large-scale synthetic-data pretraining to break through the bottlenecks and boundaries of human data; explore multimodal joint pretraining and better modeling methods to raise the upper limits of intelligence.
High-capability Ultra-small Models
Research new methods for activating and achieving strong reasoning capabilities in 1B-parameter models, along with the data and modeling approaches needed to support them.
AI for Math
The long-term vision is to enable AI to automatically solve, or assist mathematicians in solving, truly challenging and valuable mathematical propositions, such as the Riemann Hypothesis. By effectively combining natural language (NL) and formal language (FL), we can explore a new paradigm for the next generation of provers with a higher upper limit.
Posttrain
Large Scale Reinforcement Learning
Address the issue of large-scale RL scaling, enhance model capabilities, and align with human preferences.
Reward Model System
Integrate model, verifier, tool, and agent to provide accurate and generalized signals for data selection, synthesis, and RL training.
The Generalization of o1 / Long CoT in General Domains
Enable o1-level reasoning capabilities to break through the boundaries of math and code, and reach the level of human experts in more fields.
Long Horizon Task / Agent
Address long-distance, multi-turn modeling in long-horizon tasks and agents, enabling models to truly solve complex problems in the human world.
Model OOD Solution Capability
Enable models to have excellent problem-solving capabilities across all types of problems.
Next Generation of RM/RL Algorithms
Explore new RM/RL algorithms that can overcome the current limitations.
Data Mining and Synthesis
Mine scarce posttraining data in the public domain and use posttraining techniques to synthesize data from seed data, further raising the upper limit of model capabilities.

Selected Papers

Apr 24, 2025
Why Does the Effective Context Length of LLMs Fall Short?
Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs). However, recent work reveals that the effective context lengths of open-source LLMs often fall short, typically not exceeding half of their training lengths. In this work, we attribute this limitation to the left-skewed frequency distribution of relative positions formed during LLMs' pretraining and post-training stages, which impedes their ability to effectively gather distant information. To address this challenge, we introduce ShifTed Rotary position embeddING (STRING). STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within the existing training lengths. Experimental results show that without additional training, STRING dramatically improves the performance of the latest large-scale models, such as Llama3.1 70B and Qwen2 72B, by over 10 points on the popular long-context benchmarks RULER and InfiniteBench, establishing new state-of-the-art results for open-source LLMs. Compared to commercial models, Llama 3.1 70B with STRING even achieves better performance than GPT-4-128K and clearly surpasses Claude 2 and Kimi-chat.
Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong
LLM
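The core idea of the STRING abstract above can be illustrated with a minimal sketch. This is a simplified illustration, not the paper's implementation: the function and parameter names (`shift`, `window`) are assumptions, and the real method operates inside RoPE-based attention rather than on a bare matrix. The sketch remaps the relative-position matrix so that large, under-trained relative distances are overwritten with smaller, well-trained ones, while a local window is left untouched.

```python
import numpy as np

def string_relative_positions(seq_len: int, shift: int, window: int) -> np.ndarray:
    """Sketch of STRING-style position shifting (hypothetical parameter names).

    Standard RoPE attention effectively uses relative position r = i - j.
    Large r values are rarely seen in training, so here any r beyond
    `window + shift` is reduced by `shift`, reusing well-trained smaller
    positions; pairs within the local window keep their original r.
    """
    positions = np.arange(seq_len)
    rel = positions[:, None] - positions[None, :]   # r = i - j for every (query, key) pair
    shifted = rel.copy()
    distant = rel >= window + shift                 # under-trained, far-away pairs
    shifted[distant] = rel[distant] - shift         # overwrite with well-trained positions
    return shifted
```

For example, with `seq_len=10, shift=4, window=2`, the largest relative position drops from 9 to 5, while nearby pairs (e.g. adjacent tokens at distance 1) are unchanged, matching the abstract's claim that no additional training is needed: only the position indices seen at inference change.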
Apr 01, 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
The rapid escalation of LLM benchmark difficulty in recent years, from elementary school-level to frontier problems, has created the impression that we are only inches away from surpassing human intelligence. However, does the LLMs' remarkable reasoning ability indeed come from true intelligence by human standards, or are they simply reciting solutions witnessed during training at Internet scale? To study this problem, we propose RoR-Bench, a novel multi-modal benchmark for detecting LLMs' recitation behavior when asked simple reasoning problems with subtly shifted conditions, and conduct empirical analysis on our benchmark. Surprisingly, we found that existing cutting-edge LLMs unanimously exhibit extremely severe recitation behavior; by changing one phrase in the condition, top models such as OpenAI-o1 and DeepSeek-R1 can suffer a 60% performance drop.
Kai Yan, Yufei Xu, Zhengyin Du, Xuesong Yao, Zheyu Wang, Xiaowen Guo, Jiecao Chen
LLM
Mar 18, 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results.
Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Weinan Dai, Yuxuan Song, Xiangpeng Wei, Hao Zhou, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Lin Yan, Mu Qiao, Yonghui Wu, Mingxuan Wang
Reinforcement Learning

Featured Jobs

Research Scientist in Large Language Model
San Jose/Seattle
Experienced Hiring
Research Scientist, Reinforcement Learning
San Jose/Seattle
Experienced Hiring
Research Scientist in LLM Foundation Models (reasoning, planning & agent), Doubao, PhD Graduates- 2025 Start
San Jose/Seattle
Campus Recruitment
Research Scientist in Large Language, University Graduates (Search) - 2024 Start (PhD)
San Jose/Seattle
Campus Recruitment
Student Researcher in Foundation Models (Reasoning, Planning & Agent) - Doubao (Seed) - 2025 Start (PhD)
Seattle/San Jose
Internship
Student Researcher (Doubao (Seed) - LLM Post-training) - 2025 Start (PhD)
Seattle/San Jose
Internship