Doubao Team

On the evening of June 19th, ByteDance held a dinner event near the main venue of CVPR 2024, inviting hundreds of practitioners, scholars, and students in the field of computer vision to participate.

Participants included college students and scholars who had traveled to Seattle from afar, as well as representatives from a range of industry sectors. In an open and friendly atmosphere, they discussed the development of computer vision technology amid the wave of generative AI.

Yang Jianchao, who leads the Intelligent Creation Team and the Doubao Vision Multi-modal team at ByteDance, delivered a speech at the event.

During his speech, Yang briefed the audience on ByteDance's company profile, main products, and global business footprint. He also mentioned that the company has maintained strong growth in recent years. "Reflecting on the six years spent at ByteDance, it has been an incredible journey for me to grow alongside the company and products," said Yang.

He then introduced the teams. Yang explained that the Intelligent Creation Team provides content creation technology for various lines of business at ByteDance, such as Douyin. The Doubao Vision Multi-modal team focuses on vision LLMs, image generation, video generation, and related basic research.

At the end of his speech, Yang shared a video that gave the audience a direct look at ByteDance's work in generative AI.

Driven by the LLM trend, CVPR 2024 drew strong interest and high attendance. More and more of the accepted papers are co-authored by universities and enterprises: the academic community contributes young, high-potential talent and valuable insights, while industry offers real-world production scenarios, demands, data, and computing resources. Some participants said they were pleased that generative AI is helping to bring academia and industry together, as the boundary between the two used to be much clearer.

ByteDance has always valued technological research, exploration, and application. Over 30 research papers from ByteDance were accepted to CVPR 2024, and several of them have drawn attention from the industry.

ByteDance's accomplishments include MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, a diffusion-based framework for human image animation that aims to enhance temporal consistency, preserve the reference image faithfully, and improve animation fidelity.

In the field of depth estimation, ByteDance submitted a paper titled Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. The paper proposes a highly practical solution for robust monocular depth estimation. The goal is to create a simple yet powerful foundation model that can handle any image under any circumstances, and that could also be applied to video processing.

A notable advancement in video generation is Make Pixels Dance: High-Dynamic Video Generation. PixelDance is an AI-powered video generation tool that can create dynamic and motion-rich videos based on text instructions with improved stability and more creative freedom. Additionally, it can produce continuous video clips.

At CVPR 2024, the Doubao LLM team also traveled to Seattle to exchange ideas and technical solutions with other researchers. The team scheduled live presentations in the main venue's exhibition area to introduce and showcase some of the company's achievements.

Some of the selected papers this year were contributed by colleagues from the Doubao (Seed) team, and in some cases, the first author is still an intern at ByteDance.

The ByteDance Doubao (Seed) team has always valued young, high-potential talent, encouraging them to think big and take action, and trusting and supporting them to achieve fruitful results. To that end, the team has just launched the Top Seed Talent Program for 2025 PhD graduates.

Top Seed is a dedicated program aimed at outstanding campus talent. We hope to continuously attract and recruit top minds who aspire to "change the world with technology." So far, we've received hundreds of resumes.

ByteDance continues to double down on its investment in top talent and cutting-edge technologies. Click here to submit your resume. Join ByteDance to explore and address frontier challenges in computer vision and LLM technology.

ByteDance CVPR 2024 Offline Event Review: Hundreds of Researchers Gathered to Shed Light on Generative AI Trends