Generate speech from text using a reference voice
Separate audio into stems using various models
Convert and separate audio using models and TTS