Generate speech from text using a reference audio clip
Use the FLUX model with no usage limits
Conversational speech generation