September 18, 2024
Hi, Seed-Music
Our unified framework generates music with expressive vocals in multiple languages, supports precise note-level edits to model outputs, and lets users bring their own voices into music creation, among other capabilities.
Overview
We present Seed-Music: a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our methodology, experiments, and solutions are designed to address diverse use cases. Rather than relying on a single modeling approach, such as auto-regression (AR) or diffusion, we propose a unified framework that adapts to the evolving workflows of musicians. Our key contributions are threefold:



Listening Examples
The following sections contain audio samples, all generated with Seed-Music, presented in the same order as in our technical report.
Lyrics2Song
Lyrics2Song addresses the general task of turning natural-language input into music. It accepts text input in the form of lyrics and style descriptors.

Short-Form Audio Generation
Lyrics2Song can generate short-form audio clips with expressive vocals and matching backing tracks.

Long-Form Audio Generation
Lyrics2Song can also generate long-form, full-length pieces of music that maintain melodic coherence, a consistent style, and long-term structure throughout.

Audio Prompting
In addition to natural-language style prompts, our system accepts audio prompts. We demonstrate two modes: "audio continuation" and "audio style transfer".

Instrumental Music Generation
Instrumental generation can be treated as a special case of vocal music generation in which no lyrics are provided.
Lyrics2Leadsheet2Song
We propose a novel lead sheet token codec that unifies symbolic music representation in a form that is both human-interpretable and well suited to language-model (LM) and diffusion-based models.
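To illustrate why a token-based lead sheet can be read by musicians and consumed by a language model at the same time, the Python sketch below serializes multi-track notes with note-aligned phonemes into a flat string of text tokens and parses it back without loss. This is a hypothetical toy format, not the actual Seed-Music lead sheet token codec described in the technical report: the `Note` class, the `<track>` tag plus `key=value` field layout, and the phoneme spellings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Note:
    """One note event. Hypothetical fields, chosen only for illustration."""
    track: str                     # e.g. "vocal", "piano", "bass"
    pitch: int                     # MIDI note number (60 = middle C)
    onset: float                   # start time in beats
    duration: float                # length in beats
    phoneme: Optional[str] = None  # note-aligned phoneme(s), vocal notes only

def encode(notes: List[Note]) -> str:
    """Serialize notes, ordered by (onset, track), into one string of text tokens."""
    tokens: List[str] = []
    for n in sorted(notes, key=lambda n: (n.onset, n.track)):
        tokens += [f"<{n.track}>", f"pitch={n.pitch}", f"onset={n.onset}", f"dur={n.duration}"]
        if n.phoneme is not None:
            # Tokens are whitespace-separated, so a phoneme may not contain spaces.
            if any(c.isspace() for c in n.phoneme):
                raise ValueError("phonemes must not contain whitespace in this format")
            tokens.append(f"ph={n.phoneme}")
    return " ".join(tokens)

def _to_note(track: str, fields: Dict[str, str]) -> Note:
    """Build a Note from the key=value fields collected after one track tag."""
    return Note(track, int(fields["pitch"]), float(fields["onset"]),
                float(fields["dur"]), fields.get("ph"))

def decode(text: str) -> List[Note]:
    """Parse a string produced by encode() back into Note objects."""
    notes: List[Note] = []
    current: Optional[Tuple[str, Dict[str, str]]] = None  # (track, fields) being read
    for tok in text.split():
        if tok.startswith("<") and tok.endswith(">"):
            if current is not None:
                notes.append(_to_note(*current))
            current = (tok[1:-1], {})
        elif current is None:
            raise ValueError(f"field {tok!r} appears before any track tag")
        else:
            key, value = tok.split("=", 1)
            current[1][key] = value
    if current is not None:
        notes.append(_to_note(*current))
    return notes

melody = [
    Note("vocal", 60, 0.0, 1.0, "h-eh"),
    Note("piano", 48, 0.0, 2.0),
    Note("vocal", 62, 1.0, 1.0, "l-ow"),
]
text = encode(melody)
print(text)
# <piano> pitch=48 onset=0.0 dur=2.0 <vocal> pitch=60 onset=0.0 dur=1.0 ph=h-eh <vocal> pitch=62 onset=1.0 dur=1.0 ph=l-ow
```

The string stays readable line by line, yet it is an ordinary token sequence that a text model could be trained on; decoding it recovers the same notes in (onset, track) order.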

Lyrics2Leadsheet
Once text inputs are converted into lead sheet tokens, musicians can inspect and modify note-aligned phonemes and multi-track instrumental parts in an interpretable form.

Leadsheet2Song
Lead sheet tokens are a powerful intermediate representation that can be edited like MIDI while being fully compatible with modern LM and diffusion-based models.
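Because such tokens are plain text, a MIDI-style edit like transposition reduces to rewriting a few fields. The Python sketch below assumes a hypothetical layout in which each note is a `<track>` tag followed by `pitch=`, `onset=`, `dur=` and optional `ph=` fields, and shifts the pitches of one track only; `transpose_track` is an illustrative helper, not a Seed-Music interface.

```python
def transpose_track(token_text: str, track: str, semitones: int) -> str:
    """Shift every pitch= value in `track` by `semitones`; all other tokens are unchanged.

    Assumes a hypothetical layout: a "<track>" tag followed by key=value fields.
    """
    out = []
    current_track = None
    for tok in token_text.split():
        if tok.startswith("<") and tok.endswith(">"):
            current_track = tok[1:-1]       # entering a new note group
        elif tok.startswith("pitch=") and current_track == track:
            new_pitch = int(tok[len("pitch="):]) + semitones
            if not 0 <= new_pitch <= 127:   # stay inside the MIDI note range
                raise ValueError(f"pitch {new_pitch} is outside 0-127")
            tok = f"pitch={new_pitch}"
        out.append(tok)
    return " ".join(out)

tokens = ("<piano> pitch=48 onset=0.0 dur=2.0 "
          "<vocal> pitch=60 onset=0.0 dur=1.0 ph=h-eh "
          "<vocal> pitch=62 onset=1.0 dur=1.0 ph=l-ow")
print(transpose_track(tokens, "vocal", 2))
# <piano> pitch=48 onset=0.0 dur=2.0 <vocal> pitch=62 onset=0.0 dur=1.0 ph=h-eh <vocal> pitch=64 onset=1.0 dur=1.0 ph=l-ow
```

The phonemes, timings, and piano part pass through untouched, which is the property that makes token-level edits behave like MIDI edits.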

Leadsheet2Vocals
Lead sheet tokens can be configured for Singing Voice Synthesis (SVS) by including only vocal attributes and rendering only the vocal stem. The same approach extends to other instrumental stems.
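In a text-token stream, restricting the input to vocal attributes can be as simple as dropping every non-vocal note group before rendering. The Python sketch below assumes a hypothetical layout in which each note is a `<track>` tag followed by `key=value` fields; `keep_tracks` is an illustrative helper that shows the filtering idea only, not how Seed-Music actually configures its tokens for SVS.

```python
from typing import AbstractSet

def keep_tracks(token_text: str, tracks: AbstractSet[str]) -> str:
    """Keep only note groups whose "<track>" tag is in `tracks`; drop everything else.

    Assumes a hypothetical layout: a "<track>" tag followed by key=value fields.
    """
    out = []
    keep = False  # fields that appear before any track tag are dropped
    for tok in token_text.split():
        if tok.startswith("<") and tok.endswith(">"):
            keep = tok[1:-1] in tracks
        if keep:
            out.append(tok)
    return " ".join(out)

tokens = ("<piano> pitch=48 onset=0.0 dur=2.0 "
          "<vocal> pitch=60 onset=0.0 dur=1.0 ph=h-eh "
          "<vocal> pitch=62 onset=1.0 dur=1.0 ph=l-ow")
print(keep_tracks(tokens, {"vocal"}))  # vocal-only input, as in the SVS setting
# <vocal> pitch=60 onset=0.0 dur=1.0 ph=h-eh <vocal> pitch=62 onset=1.0 dur=1.0 ph=l-ow
print(keep_tracks(tokens, {"piano"}))  # the same filter applied to an instrumental stem
# <piano> pitch=48 onset=0.0 dur=2.0
```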
Music Editing
Fully diffusion-based pipelines are well suited to music editing and post-production.

Editing Lyrics
Edit the lyrics of an existing recording while leaving the vocal melody and backing track unchanged.

Editing Melody
Edit the melody of an existing recording while leaving the lyrics and backing track unchanged.
Singing Voice Conversion
We introduce a novel method for zero-shot singing voice conversion.

Singing Voice Conversion
Given a few seconds of singing or plain speech, our system can convert the reference voice into an expressive singing performance.