
Leva-TTS: Levantine Arabic ⇄ English Text-to-Speech

🌿 Leva-TTS: Low-Latency Code-Switching TTS (Levantine Arabic ⇄ English)

A production-oriented Levantine text-to-speech model: a fine-tuned XTTS-v2 optimized for real-time conversational agents.


| 🎯 KPI | Target | Measured | Status |
|---|---|---|---|
| Peak VRAM (inference) | ≤ 3 GB | 2.13 GB | ✅ |
| Time-to-First-Audio (streaming, first chunk) | < 300 ms | ~565 ms | ⚠️ |
| Real-Time Factor (RTF) | < 0.3 | 0.21 | ✅ |
| Streaming output | required | chunked PCM + WebSocket | ✅ |

Leva-TTS is a text-to-speech model for Levantine Arabic / English code-switching, built by fine-tuning XTTS-v2 on 50,000 synthetic utterances generated with Lahgtna-OmniVoice v2. It handles natural intra-sentence switching between Levantine dialect and English, supports 10 built-in speakers and zero-shot voice cloning, and offers a streaming generator for low-latency conversational use.

  • Base model: coqui/XTTS-v2 (GPT autoregressive backbone + HiFi-GAN decoder)
  • Languages: Levantine Arabic (ar), English (en), and code-switch mixes
  • Sample rate: 24 kHz
  • Speakers: Badr, Mohamed, Saad, Rami, Fadi (M) · Amina, Fatma, Lamyaa, Mona, Haneen (F)

✨ Key Features

| Feature | Details |
|---|---|
| 🗣️ Natural code-switching | Intra-sentence Arabic ↔ English |
| ⚡ Streaming output | Targets < 300 ms to the first audio chunk (~565 ms measured) |
| 💾 Low VRAM | ≤ 3 GB at inference |
| 🌿 Levantine dialect | ق → /ʔ/ (glottal stop), ج → /ʒ/, il- article, b- prefix |
| 🔤 Smart text front-end | Partial diacritics on homographs + Levantine lexicon |
| 👥 10 speakers | 5 male + 5 female, diverse Levantine accents |
| 📡 WebSocket streaming | FastAPI server with real-time chunked PCM |
| 🔌 Pipecat ready | Drop-in TTSService for voice agents |
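The server's "real-time chunked PCM" can be pictured with the sketch below, which re-slices variable-size synthesis chunks into fixed-duration 16-bit PCM frames (20 ms at 24 kHz is 480 samples, i.e. 960 bytes). It assumes float samples in [-1, 1] and little-endian int16 on the wire; the repo's actual wire format and frame size are not documented here, so `float_to_pcm16` and `reframe` are hypothetical helpers.

```python
import struct
from typing import Iterable, Iterator, Sequence

SAMPLE_RATE = 24_000  # Leva-TTS output sample rate


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clip float samples to [-1, 1] and pack them as signed 16-bit little-endian PCM."""
    ints = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def reframe(
    chunks: Iterable[Sequence[float]],
    frame_ms: int = 20,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[bytes]:
    """Re-slice variable-size synthesis chunks into fixed-duration PCM frames.

    Leftover bytes are buffered across chunks; only the final frame may be shorter.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per int16 sample
    buffer = bytearray()
    for chunk in chunks:
        buffer += float_to_pcm16(chunk)
        while len(buffer) >= frame_bytes:
            yield bytes(buffer[:frame_bytes])
            del buffer[:frame_bytes]
    if buffer:
        yield bytes(buffer)  # flush the tail at the end of the utterance
```

Buffering across chunks keeps every frame the same length however the model splits its output; in a FastAPI WebSocket handler each frame would then be sent with `await websocket.send_bytes(frame)`.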

🚀 Quick start (pip)

conda create -n leva-tts python=3.10 -y && conda activate leva-tts
sudo apt-get install -y espeak-ng ffmpeg libsndfile1

# Install PyTorch first so pip locks a CUDA build matching your GPU driver.
# (torch >= 2.9 ships CUDA-13 wheels that fail on common CUDA-12.x drivers.)
pip install torch==2.3.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121

pip install leva-tts

Leva-TTS uses the maintained coqui-tts fork (same TTS/XTTS modules); the unmaintained TTS package pins numpy==1.22.0 and cannot resolve on modern Python. A plain pip install leva-tts resolves cleanly.

from leva_tts import LevaTTS, SPEAKERS
import soundfile as sf

tts = LevaTTS(device="cuda", preprocess_text=True, verbose=False)
# auto-downloads this checkpoint + the 10 reference speakers on first use

# 1) Built-in speaker  (speaker must be one of SPEAKERS, else ValueError)
wav, sr = tts.synthesize("هَلَّق أنا عم أشتغل على the project",
                         speaker="Badr", temperature=0.65)
sf.write("out.wav", wav, sr)            # sr == 24000

# 2) Zero-shot voice cloning (your own 3–10 s clip)
wav, sr = tts.zero_shot_synthesize("والله the meeting كانت important كتير",
                                   "my_voice.wav")

# 3) Streaming generators
for chunk in tts.stream("بِدِّي أحكيلك عن the new feature", speaker="Amina"):
    ...                                  # play / forward each chunk
for chunk in tts.zero_shot_stream("هلق عم نشتغل", "my_voice.wav"):
    ...
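To keep a streamed utterance, the chunks can be written to a WAV file as they arrive with the standard-library `wave` module. This is a sketch that assumes each chunk is a 1-D sequence of float samples in [-1, 1] at 24 kHz (the chunk type yielded by `tts.stream` is not specified above); `write_stream_to_wav` is a hypothetical helper, not part of leva-tts.

```python
import struct
import wave
from typing import Iterable, Sequence


def write_stream_to_wav(
    chunks: Iterable[Sequence[float]], path: str, sample_rate: int = 24_000
) -> int:
    """Write mono float chunks to a 16-bit PCM WAV file; return the number of samples written."""
    total = 0
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(sample_rate)
        for chunk in chunks:
            # Clip to [-1, 1], scale to int16 and append; the header is finalized on close.
            ints = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in chunk]
            wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))
            total += len(ints)
    return total
```

If the chunks are 1-D arrays, usage would look like `write_stream_to_wav(tts.stream(text, speaker="Amina"), "stream.wav")`.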

Optional generation parameters, accepted per call by every method: temperature, length_penalty, repetition_penalty, top_k, top_p, speed.
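For orientation, temperature, top_k and top_p shape the GPT's next-token distribution during autoregressive decoding, and repetition_penalty makes already-generated tokens less likely. The sketch below shows the common way these filters are applied to one logit vector; it is a generic plain-Python illustration, not XTTS-v2's implementation (whose ordering and defaults may differ), and length_penalty and speed are not modeled. `sample_next_token` is a hypothetical name.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    previous: Sequence[int] = (),
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    repetition_penalty: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one token id from raw logits (illustrative defaults mean no filtering)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty (CTRL-style): divide positive logits, multiply negative ones.
    for tok in set(previous):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Softmax over the survivors, subtracting the max for numerical stability.
    best = scores[ranked[0]]
    weights = [math.exp(scores[i] - best) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p (nucleus): smallest prefix whose cumulative probability reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for i, p in zip(ranked, probs):
        kept_ids.append(i)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

Lowering temperature (as in the `temperature=0.65` call above) concentrates probability on the top tokens, which usually trades prosodic variety for stability.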

For the FastAPI streaming server, Pipecat integration, the Gradio demo, evaluation and fine-tuning, clone the repo: 👉 https://github.com/MohammedAly22/Leva-TTS


📦 Files in this repo

| File | Description |
|---|---|
| best_model.pth | Fine-tuned XTTS-v2 checkpoint (GPT + decoder) |
| config.json | XTTS-v2 config |
| reference_audios/ | The 10 built-in speaker reference clips + references.json |
| sample_wavs/ | Audio sample comparisons (Base XTTS-v2 vs Lahgtna v2 vs Leva-TTS) |

Manual download: huggingface-cli download mohammedaly22/leva-tts


🎵 Audio samples: model comparison

Each sentence below was synthesized by all three models; the clips are in sample_wavs/. Progression: Base XTTS-v2 → Lahgtna v2 → Leva-TTS.

🔀 Code-switching (Levantine + English)

هَلَّق أنا عم أشتغل على the new project اللي حكيتلك عنه · Badr (M)

Base XTTS-v2

Lahgtna v2 (Levantine fine-tune)

🟢 Leva-TTS (this model)

والله the weather today كتير حلو بدي أطلع برا · Fatma (F)

Base XTTS-v2

Lahgtna v2 (Levantine fine-tune)

🟢 Leva-TTS (this model)

بِدِّي أحكيلك عن the meeting اللي كان مهم كتير اليوم · Mona (F)

Base XTTS-v2

Lahgtna v2 (Levantine fine-tune)

🟢 Leva-TTS (this model)

Pure Levantine Arabic

كيفك اليوم؟ إنت شو عم تعمل هَلَّق؟ · Badr (M)

Base XTTS-v2

Lahgtna v2 (Levantine fine-tune)

🟢 Leva-TTS (this model)

هَلَّق رح أروح على البيت وبكرا برجع · Amina (F)

Base XTTS-v2

Lahgtna v2 (Levantine fine-tune)

🟢 Leva-TTS (this model)

شو رأيك نطلع نتمشى شوي بعد الشغل إذا الجو كان منيح؟ · Rami (M)

Base XTTS-v2

Lahgtna v2 (Levantine fine-tune)

🟢 Leva-TTS (this model)

🇬🇧 Pure English

Hello, how are you doing today? · Lamyaa (F)

Base XTTS-v2

Lahgtna v2 (Levantine fine-tune)

🟢 Leva-TTS (this model)

The project deadline is next Friday. · Mohamed (M)

Base XTTS-v2

Lahgtna v2 (Levantine fine-tune)

🟢 Leva-TTS (this model)


📊 Evaluation

Speaker Mohamed · NVIDIA H100 · Whisper large-v3 ASR round-trip · UTMOS (reference-free MOS).

| Metric | Value |
|---|---|
| Peak VRAM (inference) | 2.13 GB |
| RTF p50 / p95 | 0.36 / 0.53 |
| TTFA p50 / p95 (batch) | 1194 / 1743 ms |
| TTFA streaming (first chunk) | ~565 ms |
| CER (mean) | 0.255 |
| WER (mean) | 0.496 |
| UTMOS | 3.13 / 5.0 |

| Category | CER ↓ | WER ↓ | UTMOS ↑ |
|---|---|---|---|
| Pure English | 0.144 | 0.190 | 3.35 |
| Pure Levantine Arabic | 0.236 | 0.544 | 2.97 |
| Code-Switching | 0.330 | 0.602 | 3.19 |
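CER and WER come from the ASR round trip: each synthesized clip is transcribed by Whisper large-v3 and compared with the input text. Both are Levenshtein edit distances normalized by reference length, over characters and over whitespace-separated words respectively, and the table reports their mean over utterances. The sketch below computes them in plain Python; the normalization the evaluation script applies before scoring is not documented here, so `normalize` (lowercasing, stripping Arabic diacritics and punctuation) is a hypothetical stand-in.

```python
import re
import unicodedata
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def normalize(text: str) -> str:
    """Hypothetical normalization: NFKC, lowercase, drop tashkeel and punctuation."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[\u064B-\u0652]", "", text)  # Arabic diacritics (fathatan .. sukun)
    text = re.sub(r"[^\w\s]", " ", text)         # punctuation, including "؟", becomes a space
    return " ".join(text.split())


def cer(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

Stripping diacritics matters here because the front-end adds partial tashkeel to the input while Whisper transcripts usually omit it; without that step those marks would count as character errors.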

An optimized inference path (TF32 + torch.compile on the GPT) lowers RTF p95 by about 6%, also reduces TTFA, and slightly improves UTMOS (3.24 vs. 3.13). See the repo's scripts/evaluate.py --optimize.
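TTFA and RTF can be measured around any chunk generator, and p50/p95 are then percentiles over many runs. In this sketch TTFA is the wall-clock delay until the first chunk arrives, and RTF is total synthesis time divided by the duration of the audio produced, so RTF < 1 means faster than real time. It assumes chunks are 1-D sample sequences at 24 kHz; `measure_stream` and `percentile` are hypothetical helpers, not the repo's evaluate.py.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple


def measure_stream(
    make_stream: Callable[[], Iterable[Sequence[float]]],
    sample_rate: int = 24_000,
) -> Tuple[float, float]:
    """Return (ttfa_ms, rtf) for one streamed synthesis.

    The clock starts before the stream is created, so set-up work done by the
    call itself counts towards TTFA. len(chunk) is taken as its sample count.
    """
    start = time.perf_counter()
    first_chunk_at = None
    n_samples = 0
    for chunk in make_stream():
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
        n_samples += len(chunk)
    elapsed = time.perf_counter() - start
    if first_chunk_at is None or n_samples == 0:
        raise ValueError("the stream produced no audio")
    ttfa_ms = (first_chunk_at - start) * 1000.0
    rtf = elapsed / (n_samples / sample_rate)  # < 1.0 means faster than real time
    return ttfa_ms, rtf


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile with linear interpolation between ranks (NumPy's default method)."""
    if not values:
        raise ValueError("percentile of an empty sequence")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)
```

For example, `runs = [measure_stream(lambda: tts.stream(t, speaker="Mohamed")) for t in sentences]` followed by `percentile([rtf for _, rtf in runs], 95)`, where `sentences` is your own test set.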


🏗️ How it was built

  1. Text collection: 50K Levantine / code-switching / English sentences.
  2. Synthesis: audio generated with Lahgtna-OmniVoice v2 (apc language code).
  3. Data prep: audio at 24 kHz, paired with text passed through a Levantine text front-end (number/date/currency verbalization, partial diacritics on homographs, dialect lexicon).
  4. Fine-tuning: XTTS-v2 GPT fine-tuned on the synthetic corpus.
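Since step 3 prepares all audio at 24 kHz, a cheap sanity check before fine-tuning is to confirm that every clip really has that rate (and, assumed here, a single channel). The sketch walks a directory with the standard-library `wave` module; the corpus layout (WAV files anywhere under one root) and the `find_bad_clips` helper are assumptions, and `wave` only reads uncompressed PCM WAV.

```python
import wave
from pathlib import Path
from typing import List, Tuple


def find_bad_clips(root: str, expected_rate: int = 24_000) -> List[Tuple[str, int, int]]:
    """Return (path, sample_rate, channels) for every WAV under root that is not mono at expected_rate."""
    bad = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate, channels = wf.getframerate(), wf.getnchannels()
        if rate != expected_rate or channels != 1:
            bad.append((str(path), rate, channels))
    return bad
```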

A text front-end runs before synthesis (enabled via preprocess_text=True): language-aware normalization of numbers, floats, dates, times, currency, percentages, URLs, emails, phone numbers and codes, plus partial diacritics and a Levantine lexicon.
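As a toy illustration of language-aware normalization in code-switched text, the sketch below reads all-digit tokens (codes, phone numbers) digit by digit in the language of a neighbouring word: Levantine digit names if that word is in Arabic script, English otherwise. It is not the leva-tts front-end; the digit lexicons, the neighbour heuristic and `verbalize_codes` are assumptions, and real verbalization of cardinals, dates and currency needs far more rules.

```python
import re
from typing import List

# Hypothetical lexicons: Levantine-style and English digit names, indexed 0-9.
AR_DIGITS = ["صفر", "واحد", "تنين", "تلاتة", "أربعة", "خمسة", "ستة", "سبعة", "تمانية", "تسعة"]
EN_DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
ARABIC_LETTER = re.compile(r"[\u0621-\u064A]")  # hamza .. yeh


def is_arabic(token: str) -> bool:
    return ARABIC_LETTER.search(token) is not None


def verbalize_codes(text: str) -> str:
    """Spell out all-digit tokens digit by digit, in the language of a neighbouring word.

    The closest non-numeric word on the left decides; if there is none, the closest
    one on the right; if the text contains only digits, English is used.
    """
    tokens = text.split()
    out: List[str] = []
    for i, tok in enumerate(tokens):
        # isdecimal() accepts ASCII and Arabic-Indic digits (e.g. "٣"); int() converts both.
        if not tok.isdecimal():
            out.append(tok)
            continue
        left = [t for t in reversed(tokens[:i]) if not t.isdecimal()]
        right = [t for t in tokens[i + 1:] if not t.isdecimal()]
        context = (left or right or [""])[0]
        lexicon = AR_DIGITS if is_arabic(context) else EN_DIGITS
        out.append(" ".join(lexicon[int(d)] for d in tok))
    return " ".join(out)
```

For example, "my code is 42" becomes "my code is four two", while the same digits after an Arabic word are read with the Arabic lexicon.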


⚠️ Limitations & intended use

  • Optimized for Levantine dialect + English code-switching; other Arabic dialects (Egyptian, Gulf, MSA) are out of distribution.
  • Trained on synthetic speech โ€” voices reflect the Lahgtna v2 generator.
  • License: CC-BY-NC-4.0, for research / non-commercial use. The XTTS-v2 base model itself is released under the non-commercial Coqui Public Model License (CPML).

📜 Citation

@software{leva_tts_2026,
  author = {Mohammed Aly},
  title  = {Leva-TTS: Low-Latency Code-Switching TTS for Levantine Arabic and English},
  year   = {2026},
  url    = {https://github.com/MohammedAly22/Leva-TTS}
}

Built on Coqui XTTS-v2 and Lahgtna-OmniVoice v2.
