Vision
The Doubao Vision team develops foundational models for visual generation and multimodal generative models, and carries out leading research and application development to solve fundamental computer vision challenges in GenAI.
Main areas of focus
Research focus
The team focuses on visual generation models, multimodal architectures, and research in related areas of AI vision.
Areas for exploration
This includes AIGC, diffusion models, autoregressive models, multimodal models, 3D/4D generation, visual self-supervised learning, and model acceleration and optimization.
Research topics
Foundational models for visual generation
Researching and developing foundational models for visual generation (images and videos), ensuring high interactivity and controllability, understanding patterns in videos, and exploring vision-oriented tasks built on generative foundational models.
Multimodal
Diffusion Model
Autoregressive Model
Foundation
Multimodal generative models
Integrating various modalities into a unified generative model, jointly modeling generation and understanding, supporting interleaved and simultaneous generation across modalities (such as digital avatars), and enhancing the contextual capabilities and consistency of generative models.
Multimodal
Diffusion Model
Autoregressive Model
Foundation
3D/4D generative models
Developing 3D/4D foundational generative models, learning visual world knowledge from video and 3D data, understanding the 3D space and physical laws of the physical world, building spatial intelligence and world models, and exploring physics and rendering engines based on generative models.
3D
4D
World Model
Multimodal model design and optimization
Designing and optimizing multimodal model network architectures, optimizing diffusion models, carrying out efficient large-scale distributed training and inference, and advancing model acceleration and optimization.
Multimodal
Optimization
Distillation
Quantization
Technical applications
Doubao Text-to-Image
The Doubao Text-to-Image Model has been integrated into products such as Douyin, CapCut/Lark, Doubao, and StarSketch. Users can enter prompts in the Doubao app to generate high-quality images that capture light and shadow, create rich color atmospheres, and depict character aesthetics. The model accepts prompts in both Chinese and English and is designed to interpret complex prompts precisely.
Text-to-Image
Model
Jimeng
Jimeng/Dreamina is an AI-powered creative product developed by ByteDance. It enables users to generate high-quality images and videos from natural-language and image inputs. The platform provides an intelligent canvas, a story creation mode, and various AI editing tools, significantly boosting users' creative productivity.
AI-powered
Creative