MusicGen Update (#27084)

* [MusicGen] Add stereo model

* safe serialization

* Update src/transformers/models/musicgen/modeling_musicgen.py

* split over 2 lines

* fix slow tests on cuda
Sanchit Gandhi
2023-11-08 13:26:02 +00:00
committed by GitHub
parent 5ef650b0ae
commit f16ff0f07e
5 changed files with 244 additions and 28 deletions

@@ -57,6 +57,11 @@ Generation is limited by the sinusoidal positional embeddings to 30 second input
than 30 seconds of audio (1503 tokens), and input audio passed by Audio-Prompted Generation contributes to this limit, so
given an input of 20 seconds of audio, MusicGen cannot generate more than 10 seconds of additional audio.
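
The budget arithmetic above can be sketched as follows. This is a minimal illustration, not part of the library: the helper name `remaining_budget` is hypothetical, and the token estimate assumes tokens scale linearly with duration (1503 tokens for 30 seconds), which is only an approximation.

```python
MAX_SECONDS = 30.0  # limit stated in the text
MAX_TOKENS = 1503   # token count stated in the text for 30 seconds


def remaining_budget(prompt_seconds: float) -> tuple[float, int]:
    """Return (seconds, approximate tokens) left after an audio prompt.

    Hypothetical helper: assumes a linear seconds-to-tokens mapping.
    """
    if not 0.0 <= prompt_seconds <= MAX_SECONDS:
        raise ValueError("prompt must be between 0 and 30 seconds")
    seconds_left = MAX_SECONDS - prompt_seconds
    tokens_left = int(MAX_TOKENS * seconds_left / MAX_SECONDS)
    return seconds_left, tokens_left


# A 20 second audio prompt leaves 10 seconds of new audio.
seconds_left, tokens_left = remaining_budget(20.0)
print(seconds_left, tokens_left)
```

Under these assumptions, the token estimate could serve as a rough upper bound when choosing how many new tokens to request for an audio-prompted generation.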

Transformers supports both mono (1-channel) and stereo (2-channel) variants of MusicGen. The mono versions
generate a single set of codebooks. The stereo versions generate two sets of codebooks, one for each channel (left/right),
and each set of codebooks is decoded independently through the audio compression model. The audio streams for the two
channels are then combined to give the final stereo output.
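
The mono and stereo decoding paths described above can be sketched in plain Python. This is a toy illustration only: `toy_decode` is a hypothetical stand-in for the audio compression model's decoder, and the list-based layout of the codebooks is an assumption made for readability, not the library's tensor layout.

```python
from typing import List

Codes = List[List[int]]  # one set of codebooks: [num_codebooks][num_frames]


def toy_decode(codes: Codes, samples_per_frame: int = 2) -> List[float]:
    """Hypothetical stand-in for the audio compression model's decoder.

    Sums the codebook entries of each frame and repeats the result
    `samples_per_frame` times, yielding a one-channel waveform.
    """
    num_frames = len(codes[0])
    waveform: List[float] = []
    for t in range(num_frames):
        value = float(sum(book[t] for book in codes))
        waveform.extend([value] * samples_per_frame)
    return waveform


def decode_mono(codes: Codes) -> List[List[float]]:
    """Mono: a single set of codebooks gives a single channel."""
    return [toy_decode(codes)]


def decode_stereo(left: Codes, right: Codes) -> List[List[float]]:
    """Stereo: decode each set independently, then combine the channels."""
    return [toy_decode(left), toy_decode(right)]


left_codes = [[1, 2, 3], [0, 1, 0]]
right_codes = [[4, 5, 6], [1, 0, 1]]

mono_audio = decode_mono(left_codes)
stereo_audio = decode_stereo(left_codes, right_codes)
print(len(mono_audio), len(stereo_audio))  # number of channels in each output
```

The point of the sketch is the structure: the stereo path calls the same decoder once per channel and only combines the results at the end, so neither channel's decoding depends on the other.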

### Unconditional Generation

The inputs for unconditional (or 'null') generation can be obtained through the method