LLM
The Doubao Large Language Model (LLM) team is dedicated to aggressively advancing the next generation of LLMs, tackling the fundamental challenges of LLM development head-on. Our areas of focus include model pretraining, posttraining, inference, memory, learning, interpretability, and other related directions. We dive deep into the latest technologies and build comprehensive solutions from concept to completion. In our effort to apply LLMs in real-life scenarios, we persistently seek ways to enhance applications through technological innovation.
Main areas of focus
Horizon
We're a team dedicated to cutting-edge research, driven by a mission to push the boundaries of model intelligence and fueled by a long-term vision and unwavering commitment. We are seeking passionate, self-driven researchers who share our vision to collaborate on LLM research.
Pretrain
The team is dedicated to developing the next generation of pretraining paradigms, ensuring that scaling laws continue to hold even as computing power increases by several orders of magnitude, thereby raising the upper limit of general intelligence. At the same time, we focus on AI for math: solving genuinely challenging and valuable mathematical problems while advancing research in the field for humanity's benefit.
Posttrain
The team is responsible for all aspects of LLM posttraining and provides the core posttraining foundations for large multimodal unified models. Its goal is to research and explore next-generation technologies in the posttraining phase, such as SFT, RM, RL, and self-learning, while making significant optimizations and improvements in key areas like reasoning, coding, and agents.
Research topics
Horizon
Limits of the Long CoT Model
Explore the limits of long reasoning models, continuously scaling along both the inference-time and model-size dimensions, with the objective of solving complex problems that humans cannot yet address.
O-model Architecture
Scaling along the inference dimension is key to achieving ultimate intelligence. We aim to develop a lifelong-learning intelligent system, enabling models to reason with linear complexity.
Memory
Establish a streaming memory mechanism that can manage context of unlimited length and truly achieve online learning, such as learning to code by reading algorithms or learning a new language by reading a grammar book.
Agent's Reasoning and Planning Capabilities
Committed to solving the core, fundamental problems in the field of agents: building a superintelligent system that accelerates leapfrog development in the natural sciences, economic production, and daily life, comprehensively enhancing societal efficiency and the quality of human life.
Pretrain
Next-Generation Pretraining Paradigm
Explore model self-training and evolution based on agents and active learning; explore large-scale synthetic-data pretraining to break through the bottlenecks and boundaries of human data; explore multimodal joint pretraining and better modeling methods to raise the upper limits of intelligence.
High-capability Ultra-small Models
Research new methods for activating and achieving strong reasoning capabilities in 1B-parameter models, along with the data and modeling approaches needed to support them.
AI for Math
The long-term vision is to enable AI to automatically solve, or assist mathematicians in solving, truly challenging and valuable mathematical propositions, such as the Riemann Hypothesis. By effectively combining natural language (NL) and formal language (FL), we can explore a new paradigm for the next generation of provers with a higher upper limit.
Posttrain
Large Scale Reinforcement Learning
Address the issue of large-scale RL scaling, enhance model capabilities, and align with human preferences.
Reward Model System
Integrate model, verifier, tool, and agent to provide accurate and generalized signals for data selection, synthesis, and RL training.
The Generalization of o1 / Long CoT in General Domains
Enable o1-level reasoning capabilities to break through the boundaries of math and code, and reach the level of human experts in more fields.
Long Horizon Task / Agent
Address long-distance, multi-turn modeling in long-horizon tasks and agents, enabling models to truly solve complex problems in the human world.
Model OOD Solution Capability
Enable models to have excellent problem-solving capabilities across all types of problems.
Next Generation of RM/RL Algorithms
Explore new RM/RL algorithms that can overcome the current limitations.
Data Mining and Synthesis
Mine scarce posttraining data in the public domain and use posttraining techniques to synthesize data from seed data, further raising the upper limit of model capabilities.

Selected Papers

Apr 24, 2025
Why Does the Effective Context Length of LLMs Fall Short?
Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs). However, recent work reveals that the effective context lengths of open-source LLMs often fall short, typically not exceeding half of their training lengths. In this work, we attribute this limitation to the left-skewed frequency distribution of relative positions formed during LLMs' pretraining and post-training stages, which impedes their ability to effectively gather distant information. To address this challenge, we introduce ShifTed Rotary position embeddING (STRING). STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within the existing training lengths. Experimental results show that without additional training, STRING dramatically improves the performance of the latest large-scale models, such as Llama3.1 70B and Qwen2 72B, by over 10 points on the popular long-context benchmarks RULER and InfiniteBench, establishing new state-of-the-art results for open-source LLMs. Compared to commercial models, Llama 3.1 70B with STRING even achieves better performance than GPT-4-128K and clearly surpasses Claude 2 and Kimi-chat.
Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong
LLM
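The core idea of the STRING abstract above can be illustrated with a minimal sketch. This is a simplified illustration, not the paper's implementation: the function and parameter names (`shift`, `window`) are assumptions, and the real method operates inside RoPE-based attention rather than on a bare matrix. The sketch remaps the relative-position matrix so that large, under-trained relative distances are overwritten with smaller, well-trained ones, while a local window is left untouched.

```python
import numpy as np

def string_relative_positions(seq_len: int, shift: int, window: int) -> np.ndarray:
    """Sketch of STRING-style position shifting (hypothetical parameter names).

    Standard RoPE attention effectively uses relative position r = i - j.
    Large r values are rarely seen in training, so here any r beyond
    `window + shift` is reduced by `shift`, reusing well-trained smaller
    positions; pairs within the local window keep their original r.
    """
    positions = np.arange(seq_len)
    rel = positions[:, None] - positions[None, :]   # r = i - j for every (query, key) pair
    shifted = rel.copy()
    distant = rel >= window + shift                 # under-trained, far-away pairs
    shifted[distant] = rel[distant] - shift         # overwrite with well-trained positions
    return shifted
```

For example, with `seq_len=10, shift=4, window=2`, the largest relative position drops from 9 to 5, while nearby pairs (e.g. adjacent tokens at distance 1) are unchanged, matching the abstract's claim that no additional training is needed: only the position indices seen at inference change.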
Apr 01, 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
The rapid escalation of LLM benchmark difficulty in recent years, from elementary school-level to frontier problems, has created the impression that we are only inches away from surpassing human intelligence. However, does the LLMs' remarkable reasoning ability indeed come from true intelligence by human standards, or are they simply reciting solutions witnessed during training at Internet scale? To study this problem, we propose RoR-Bench, a novel multi-modal benchmark for detecting LLMs' recitation behavior when asked simple reasoning problems with subtly shifted conditions, and conduct empirical analysis on our benchmark. Surprisingly, we found that existing cutting-edge LLMs unanimously exhibit extremely severe recitation behavior; by changing one phrase in the condition, top models such as OpenAI-o1 and DeepSeek-R1 can suffer a 60% performance drop.
Kai Yan, Yufei Xu, Zhengyin Du, Xuesong Yao, Zheyu Wang, Xiaowen Guo, Jiecao Chen
LLM
Mar 18, 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results.
Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Weinan Dai, Yuxuan Song, Xiangpeng Wei, Hao Zhou, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Lin Yan, Mu Qiao, Yonghui Wu, Mingxuan Wang
Reinforcement Learning

Featured Jobs

Research Scientist in Large Language Model
San Jose/Seattle
Experienced Hiring
Research Scientist, Reinforcement Learning
San Jose/Seattle
Experienced Hiring
Research Scientist in LLM Foundation Models (reasoning, planning & agent), Doubao, PhD Graduates- 2025 Start
San Jose/Seattle
Campus Recruitment
Research Scientist in Large Language, University Graduates (Search) - 2024 Start (PhD)
San Jose/Seattle
Campus Recruitment
Student Researcher in Foundation Models (Reasoning, Planning & Agent) - Doubao (Seed) - 2025 Start (PhD)
Seattle/San Jose
Internship
Student Researcher (Doubao (Seed) - LLM Post-training) - 2025 Start (PhD)
Seattle/San Jose
Internship