8 Key Moments of Doubao Large Models in 2024
Date
2024-12-30
Category
Technical Insights
2024 is coming to an end. No matter how turbulent the AI wave becomes,
true believers keep accelerating toward their goal of AGI.
Since their debut on May 15, we have witnessed
230 days of the birth and rapid growth of Doubao large models.
From learning to speak like a child, to understanding the world,
to drawing the fantasies creators imagine, the models are still in their infancy.
But every step counts.
We want to share with you 8 key moments of Doubao large models in 2024.
1. Learning to Listen and Speak Like a Human
In July, Doubao large models learned to
understand conversations in more than 20 languages mixed with dialects, and to think while listening.
Doubao large models have also learned how to express emotion in speech.
It can be interrupted at any time and can "cut in" during an interaction.
It can also retain human pronunciation habits such as omitted sounds, accents, etc.
This is made possible by the new Doubao automatic speech recognition model "Seed-ASR"
and the foundation model for speech generation, "Seed-TTS".
Unlike traditional small speech models,
the Doubao large speech model draws on more diverse and extensive data,
incorporates chain-of-thought reasoning, and exhibits exceptional generalization capabilities.
2. Mastering Music and Singing
In September, Doubao large models brought to life the idea that
"An AI can be a band."
From songwriting and performance generalization to vocal singing,
Doubao large models have learned more than 10 musical skills.
It can provide unexpected inspiration for music creation.
Underpinning this achievement is the music generative model "Seed-Music".
By combining the advantages of language models and diffusion models,
Seed-Music offers a universal framework for music generation,
with highly controllable editing.
3. Doubao Large Models Can Make Videos!
In September, Doubao large models also learned
to follow complex prompts
to accurately generate HD video with multiple interactive subjects
and flexibly control camera angles,
offering creators a visual experience that fuses reality with fantasy.
Behind the scenes are two Doubao video generation models launched at the same time:
PixelDance and Seaweed.
A new diffusion model training method keeps results consistent across camera shots.
An optimized Transformer structure maximizes the generalization of video generation.
The technology for simultaneous video and sound effect generation sparks even more creative inspiration.
4. Advanced Painting and Image Editing Master
Chinese, cinematic, or surreal aesthetics,
Doubao large models master them all.
In November, it also learned "Image Editing with One Phrase" and "Poster Generation with One Click",
allowing for image editing and accurate text generation from any prompt.
Behind it is the constantly iterating Doubao text-to-image model.
It can achieve precise image-text alignment in complex scenes
and build high-quality text rendering capabilities.
The universal image-editing model SeedEdit
allows users to edit any image through natural language.
5. Professional-Level Programming
In early December,
the coding abilities of Doubao large models increased significantly.
It can serve as both an AI programmer and a data analyst.
It supports free on-canvas code preview, human-machine collaborative programming,
and one-click data processing and visual analysis.
These features are powered by the Doubao code LLM "Doubao-coder",
which was built from massive real-world programming data and intensive training carried out by experts in related fields.
The model provides in-depth support for over 16 programming languages and 11 real-world application scenarios,
addressing full-stack programming needs for front-end and back-end development, machine learning, etc.
6. Pushing the Limits of Language Understanding
At the same time, the context window of Doubao large models
increased to 3 million words, an industry-leading milestone,
which lets the models read hundreds of academic reports at once with ease.
Latency is only 15 seconds per million tokens.
This achievement is built on a variety of breakthroughs in data algorithms and model-acceleration optimizations,
including contextual correlation data algorithms such as STRING.
The outcome is a significant improvement in the LLM's ability to leverage massive external knowledge,
while sparse and distributed schemes reduce latency to the ten-second level.
7. Learning to "Open Its Eyes" to See the World
In mid-December,
Doubao large models learned to perceive the world visually,
incorporating multiple senses for deep thought and creation.
Take a picture of a calculus problem,
and Doubao can not only understand it accurately but also solve it in no time.
This is powered by the new Doubao Visual Language Model,
which fuses visual language understanding with text generation in a single model architecture,
exhibiting exceptional content recognition,
strong reasoning, and refined expression.
8. General Model Capabilities Comparable to GPT-4o
Also in mid-December,
the Doubao General Model "Doubao-pro" underwent a full upgrade,
aligning its capabilities with GPT-4o.
With its reasoning ability enhanced,
it has also learned to "self-reflect" when providing responses.
This is enabled by extensive data optimization and model architecture innovation,
including increased model sparsity and the introduction of reinforcement learning.
The understanding accuracy and output quality of Doubao-pro took a giant leap,
making it a true all-rounder that balances performance and efficiency.
This year,
the Doubao (Seed) team delved deep into fundamental AI research.
57 of its papers were selected for ICLR, CVPR, NeurIPS, and other top conferences.
There were also open-source projects with over 1 million downloads and repositories with 10,000 stars on GitHub.
The Doubao (Seed) team also cooperated in depth with nearly 20 universities
and established joint laboratories with Tsinghua University and Peking University.
The Doubao Large Model Fund has supported more than 40 top scholars,
helping them research groundbreaking AI technologies.
In 2024, Doubao large models came to support more than 50 application scenarios.
Doubao has become the most popular AI product in China.
Through Volcano Engine, Doubao large models have served more than 30 industries.
Their average daily token usage has exceeded 4 trillion,
33 times the level at the release in May.
After 230 days, the adventure of the Doubao large models has just begun.
The distant shores of universal intelligence belong to those who keep pressing forward.
To find the most promising researchers,
the team launched the Top Seed Talent Program this year,
recruiting top PhD graduates globally
to collaborate on researching world-class AI topics.
In 2025, the Doubao (Seed) team
will continue to explore fundamental model topics,
committed to changing the world through technology.
We invite top talent who share this vision to join us!
If you are interested in agent collaboration, data science, or applying large models to solve complex problems, and are eager to explore frontier research topics, please visit our careers page for more information about open positions.