MusicLM generates high-fidelity music from text descriptions
such as "a calming violin melody backed by a distorted guitar
riff". MusicLM can take as input a text prompt, a melody, or
an existing track, which it can alter or continue. Moreover, it
supports creating seamlessly loopable music derived from any
set of inputs. Building on
AudioLM,
music generation is
performed as a hierarchical sequence-to-sequence modeling
task, generating music at a sampling rate of up to 48
kHz. Below we compare the latest version of MusicLM to the
originally published
version
on the same prompts. Recent improvements include the integration
of classifier-free guidance,
improved acoustic tokens, and a new backbone
architecture specifically designed to operate on such acoustic
tokens, as well as applying
SoundStorm
to achieve efficient high-fidelity audio generation.
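Classifier-free guidance, one of the improvements mentioned above, is commonly implemented by running the model both with and without the conditioning signal and extrapolating between the two predictions. The sketch below shows the standard formulation on raw logits; the function name, array shapes, and guidance scale `w` are illustrative assumptions, not details taken from MusicLM itself.

```python
import numpy as np

def cfg_logits(cond: np.ndarray, uncond: np.ndarray, w: float) -> np.ndarray:
    """Standard classifier-free guidance combination (illustrative sketch).

    cond:   logits from the text-conditioned forward pass
    uncond: logits from the unconditioned forward pass
    w:      guidance scale; w=1 recovers the conditional logits,
            w>1 pushes predictions further toward the condition.
    """
    return uncond + w * (cond - uncond)

# Toy example: two vocabulary entries, guidance scale 2.
cond = np.array([1.0, 2.0])
uncond = np.array([0.5, 0.0])
guided = cfg_logits(cond, uncond, 2.0)  # [1.5, 4.0]
```

In practice the guided logits would then be softmaxed and sampled per token; larger `w` trades diversity for stronger adherence to the text prompt.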
MusicLM Research Team
Andrea Agostinelli, Zalán Borsos, Antoine Caillon, Geoffrey Cideron, Timo Denk, Chris Donahue, Michael Dooley, Jesse Engel, Christian Frank, Sertan Girgin, Qingqing Huang, Aren Jansen, Matej Kastelic, Yunpeng Li, Brian McWilliams, Adam Roberts, Matt Sharifi, Ondrej Skopek, Marco Tagliasacchi, Alex Tudor, Mauro Verzetti, Damien Vincent, Neil Zeghidour and Mauricio Zuluaga