Revolab VITS: Multi-Speaker Bahasa Melayu TTS

Piper TTS voice models trained on Revolab Malay speech datasets using the VITS architecture.

Speakers (56 trained)

Production Quality (CER < 10%)

| Speaker ID | Name | Samples | CER | WER |
|---|---|---|---|---|
| sarah | sarah | 27,792 | 0.0402 | 0.1856 |
| pendakwah | pendakwah | 46,222 | 0.0619 | 0.1847 |
| pendakwah_teknologi | Pendakwah Teknologi | 46,222 | 0.0619 | 0.1847 |
| paan | Paan | 27,434 | 0.0630 | 0.1759 |
| anwar | AI | 100 | 0.0732 | 0.2101 |

Usable Quality (CER 50-65%)

| Speaker ID | Name | Samples | CER | WER |
|---|---|---|---|---|
| G0095_0001 | G0095 | 12,317 | 0.5382 | 0.7685 |
| 1614-NorinaYahya | Noriya | 27,962 | 0.5490 | 0.8351 |
| 8-enhanced-v2 | Sarah | 27,991 | 0.6084 | 0.9322 |
| Iqbal_25 | Iqbal | 11,077 | 0.6107 | 0.8428 |
| JackLim_1 | Jacky | 9,143 | 0.6390 | 0.9050 |

Low Quality (CER > 75%)

| Speaker ID | Name | Samples | CER | WER |
|---|---|---|---|---|
| Angry_2 | Pearl (Angry 2) | 27,999 | 0.7556 | 0.9965 |
| cv_971b3c0c6dbd | cv_971b3c0c6dbd | 6,632 | 0.7557 | 0.9845 |
| Surprise | Surprise | 5,986 | 0.7604 | 0.9998 |
| G0068_0003 | G0068 | 12,657 | 0.7670 | 0.9976 |
| G0004_0012 | G0004 | 12,656 | 0.7756 | 1.0228 |
| G0016_0001 | G0016 | 12,694 | 0.7760 | 1.0000 |
| cv_06a7ed020ffa | cv_06a7ed020ffa | 7,050 | 0.7771 | 0.9975 |
| Dania_22 | Dania | 13,558 | 0.7777 | 0.9869 |
| 2394-Digi-Mohon-Share-Amel-BM | Amel | 29,273 | 0.7784 | 1.0086 |
| cv_0c744927a83d | cv_0c744927a83d | 7,461 | 0.7786 | 0.9988 |
| cv_e6ca744ffbb2 | cv_e6ca744ffbb2 | 6,604 | 0.7813 | 0.9869 |
| cv_1a952f7ad4bd | cv_1a952f7ad4bd | 6,587 | 0.7829 | 1.0155 |
| berani_buat | berani_buat | 8,657 | 0.7853 | 0.9967 |
| Happy | Pearl (Happy) | 12,317 | 0.7854 | 1.0000 |
| Sad | Pearl (Sad) | 7,202 | 0.7855 | 0.9907 |
| k-lao-ZH | k lao ZH | 5,533 | 0.7875 | 1.0042 |
| cv_a2b722b2d474 | cv_a2b722b2d474 | 5,805 | 0.7887 | 0.9907 |
| Sabil_2 | Kamal | 8,276 | 0.7918 | 1.0093 |
| cv_51062240a78a | cv_51062240a78a | 6,929 | 0.7944 | 1.0170 |
| mych_male_yt_13887 | mych male yt 13887 | 5,967 | 0.7954 | 1.0098 |
| sugumaran_6 | sugumaran 6 | 5,630 | 0.7969 | 1.0052 |
| Alvin_6 | Alvin | 27,996 | 0.7989 | 1.0199 |
| cv_9d9664719e9d | cv_9d9664719e9d | 7,020 | 0.8016 | 1.0319 |
| Angry_1 | Pearl (Angry 1) | 27,849 | 0.8026 | 0.9905 |
| cv_bd96fb646e95 | cv_bd96fb646e95 | 6,552 | 0.8061 | 0.9896 |
| cv_c0b0ce94caa2 | cv_c0b0ce94caa2 | 7,053 | 0.8104 | 1.0093 |
| niketh_23 | niketh 23 | 5,767 | 0.8177 | 1.0074 |
| angellyn | angellyn | 6,096 | 0.8185 | 1.0000 |
| hendrick-22 | Hendrick | 6,774 | 0.8185 | 0.9880 |
| Farid_596 | Faiz | 5,216 | 0.8574 | 0.9966 |
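
The three tiers above are defined by CER ranges, so a speaker's tier can be derived from its CER value. The following is a minimal sketch, not part of this repository: the function name `quality_tier` and the `"unclassified"` label are hypothetical, and CER is assumed to be a fraction as in the tables (0.0402 means 4.02%). The card does not define a tier for CER between 10% and 50% or between 65% and 75%, so those ranges are left unclassified here.

```python
def quality_tier(cer: float) -> str:
    """Map a CER fraction to the tier names used in the tables above."""
    if cer < 0.10:
        return "production"
    if 0.50 <= cer <= 0.65:
        return "usable"
    if cer > 0.75:
        return "low"
    # No tier is documented for 0.10-0.50 or 0.65-0.75;
    # "unclassified" is a placeholder chosen for this sketch.
    return "unclassified"


if __name__ == "__main__":
    for speaker, cer in [("sarah", 0.0402), ("JackLim_1", 0.6390), ("Farid_596", 0.8574)]:
        print(speaker, quality_tier(cer))
```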

Non-Malay Speakers (no eval)

| Speaker ID | Name | Samples | Language |
|---|---|---|---|
| Akari_Kitou | Akari Kitou | 6,772 | Japanese |
| Aoi_Yuuki | Aoi Yuuki | 6,772 | Japanese |
| Ayana_Taketatsu | Ayana Taketatsu | 10,970 | Japanese |
| Celine-ZH | Celine Zh | 10,967 | Chinese |
| ChangYong | Changyong | 16,195 | Chinese |
| Eri_Kitamura | Eri Kitamura | 5,209 | Japanese |
| Haruka_Tomatsu | Haruka Tomatsu | 6,758 | Japanese |
| Kana_Hanazawa | Kana Hanazawa | 8,778 | Japanese |
| Maaya_Uchida | Maaya Uchida | 8,368 | Japanese |
| Nao_Touyama | Nao Touyama | 8,400 | Japanese |
| Rie_Kugimiya | Rie Kugimiya | 15,001 | Japanese |
| Rie_Takahashi | Rie Takahashi | 13,534 | Japanese |
| Rina_Satou | Rina Satou | 12,969 | Japanese |
| Sora_Amamiya | Sora Amamiya | 6,901 | Japanese |

Evaluation Methodology

Evaluated using WhisperX large-v3 transcription + revo-norm text normalization on 36 sample texts covering:

  • Digits/currency (RM10.50, 03-8888 9999)
  • Malaysian locations (Jalan Ampang, Pulau Pinang)
  • Names: Malay/Chinese/Indian (Encik Ahmad, Lim Wei Jie, Rajesh Kumar)
  • English code-switching (Meeting kita pukul 3 petang)
  • Dates/time (30hb Jun 2024, pukul 8 pagi)
  • Addresses (25, Jalan Taman Indah 3/4, 43000 Kajang)
  • Financial terms (Baki akaun RM5,670.23)
  • General conversation

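CER = Character Error Rate and WER = Word Error Rate. Both are reported as fractions (0.0402 means 4.02%), and lower is better for both. WER can exceed 1.0 when the transcript contains more errors than the reference has words, for example through insertions.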
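
For readers unfamiliar with these metrics, here is a minimal sketch of how CER and WER are conventionally computed: the Levenshtein edit distance between reference and transcript, divided by the reference length. This is the textbook definition, not the evaluation script used for this repository; the function names are illustrative, and details such as whether spaces count as characters are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are counted as characters in this sketch."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(wer("harga sepuluh ringgit", "harga sepuloh ringgit"))  # one of three words wrong
    print(wer("ya", "ya ya ya"))  # two insertions against one reference word
```

The second call shows why several speakers in the tables have a WER above 1.0: insertions are counted as errors but do not lengthen the reference.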

Full per-speaker results: eval_all/<speaker>_eval.csv

Install

pip install piper-tts
pip install git+https://github.com/khursanirevo/revo-norm.git

Quick Start

from piper import PiperVoice
from revo_norm import normalize_text

raw = "Harga RM10.50 sahaja"
text = normalize_text(raw, language="ms")

voice = PiperVoice.load("hf_repo/speakers/sarah/model.onnx",
                         config_path="hf_repo/speakers/sarah/model.onnx.json")
audio = voice.synthesize(text)
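
The Quick Start stops at synthesis. As a sketch of the remaining step, the helper below writes raw audio to a WAV file using only the Python standard library. It assumes you already hold the synthesized audio as 16-bit mono PCM bytes at the 22050 Hz sample rate listed under Training; how to obtain those bytes depends on your installed piper-tts version and is not shown. The name `write_wav` is hypothetical.

```python
import wave

SAMPLE_RATE = 22050  # sample rate stated in the Training section


def write_wav(path: str, pcm_bytes: bytes, sample_rate: int = SAMPLE_RATE) -> None:
    """Write raw 16-bit mono PCM bytes to a WAV file (mono and 16-bit are assumptions)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


if __name__ == "__main__":
    # Half a second of silence stands in for real synthesized audio.
    silence = b"\x00\x00" * (SAMPLE_RATE // 2)
    write_wav("out.wav", silence)
```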

Text Normalization

revo-norm handles Malay-specific normalization:

  • Numbers → words ("115" → "seratus lima belas")
  • Currency ("RM10.50" → "sepuluh ringgit lima puluh sen")
  • Dates, phone numbers, percentages, temperatures, etc.
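
To illustrate what the number and currency rules above involve, here is a small standalone sketch of Malay cardinal number expansion. It is not revo-norm's implementation and does not use its API; `malay_number` and `ringgit_to_words` are hypothetical names, and only integers from 0 to 999,999 are covered.

```python
UNITS = ["kosong", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "lapan", "sembilan"]


def _join(head: str, rest: int) -> str:
    """Append the spelled-out remainder unless it is zero."""
    return head if rest == 0 else f"{head} {malay_number(rest)}"


def malay_number(n: int) -> str:
    """Spell out an integer in Malay (illustrative, 0 to 999,999 only)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return f"{UNITS[n - 10]} belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        return _join(f"{UNITS[tens]} puluh", rest)
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = "seratus" if hundreds == 1 else f"{UNITS[hundreds]} ratus"
        return _join(head, rest)
    thousands, rest = divmod(n, 1000)
    head = "seribu" if thousands == 1 else f"{malay_number(thousands)} ribu"
    return _join(head, rest)


def ringgit_to_words(ringgit: int, sen: int = 0) -> str:
    """Spell out a ringgit amount, adding the sen part only when non-zero."""
    words = f"{malay_number(ringgit)} ringgit"
    return words if sen == 0 else f"{words} {malay_number(sen)} sen"


if __name__ == "__main__":
    print(malay_number(115))         # seratus lima belas
    print(ringgit_to_words(10, 50))  # sepuluh ringgit lima puluh sen
```

Both printed results match the examples in the list above. A real normalizer also has to detect these patterns in running text and handle dates, phone numbers, and the other categories, which is what revo-norm is for.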

Structure

speakers.json              # Speaker registry with eval metrics
speakers/
  <name>/
    model.onnx             # ONNX export for inference
    model.onnx.json        # Phoneme config
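
Given this layout, the two files for a speaker can be located from the repository root and the speaker's directory name. The helper below is a minimal sketch under two assumptions: the repository has been downloaded to a local directory, and each directory under `speakers/` is named after the Speaker ID (as with `sarah` in the Quick Start). The name `speaker_files` is hypothetical, and the schema of `speakers.json` is not documented here, so it is not parsed.

```python
from pathlib import Path


def speaker_files(repo_root, speaker_id):
    """Return (model_path, config_path) for a speaker, checking that both exist."""
    speaker_dir = Path(repo_root) / "speakers" / speaker_id
    model = speaker_dir / "model.onnx"
    config = speaker_dir / "model.onnx.json"
    missing = [str(p) for p in (model, config) if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing speaker files: " + ", ".join(missing))
    return model, config
```

The returned paths can be passed as the model path and `config_path` arguments shown in the Quick Start.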

Performance (CPU)

| Metric | Value |
|---|---|
| Avg latency | ~54 ms |
| Avg RTF | 0.030 |
| Speed | 33.6x realtime |
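
The three figures are related: the real-time factor (RTF) is synthesis time divided by the duration of the audio produced, and the speed multiple is its reciprocal. The sketch below works through that arithmetic, assuming "Avg latency" means the average synthesis time per utterance; the function names are illustrative.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF: time spent synthesizing divided by the duration of the audio produced."""
    return synthesis_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


if __name__ == "__main__":
    latency = 0.054  # seconds, from the table
    rtf = 0.030      # from the table
    # Average audio length implied by the reported latency and RTF.
    print(round(latency / rtf, 2))   # 1.8 seconds
    print(round(speedup(rtf), 1))    # 33.3
```

The reciprocal of the rounded RTF is 33.3x, slightly below the reported 33.6x; the difference is consistent with the RTF having been rounded from roughly 0.0298.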

Training

All models trained with:

  • Architecture: VITS
  • Phonemizer: espeak-ng (ms voice)
  • Sample rate: 22050Hz
  • GPU: NVIDIA H200 NVL