Revolab VITS - Multi-Speaker Bahasa Melayu TTS
Piper TTS voice models trained on Revolab Malay speech datasets using the VITS architecture.
Speakers (56 trained)
Production Quality (CER < 10%)
| Speaker ID | Name | Samples | CER | WER |
|---|---|---|---|---|
| sarah | sarah | 27,792 | 0.0402 | 0.1856 |
| pendakwah | pendakwah | 46,222 | 0.0619 | 0.1847 |
| pendakwah_teknologi | Pendakwah Teknologi | 46,222 | 0.0619 | 0.1847 |
| paan | Paan | 27,434 | 0.0630 | 0.1759 |
| anwar | AI | 100 | 0.0732 | 0.2101 |
Usable Quality (CER 50-65%)
| Speaker ID | Name | Samples | CER | WER |
|---|---|---|---|---|
| G0095_0001 | G0095 | 12,317 | 0.5382 | 0.7685 |
| 1614-NorinaYahya | Noriya | 27,962 | 0.5490 | 0.8351 |
| 8-enhanced-v2 | Sarah | 27,991 | 0.6084 | 0.9322 |
| Iqbal_25 | Iqbal | 11,077 | 0.6107 | 0.8428 |
| JackLim_1 | Jacky | 9,143 | 0.6390 | 0.9050 |
Low Quality (CER > 75%)
| Speaker ID | Name | Samples | CER | WER |
|---|---|---|---|---|
| Angry_2 | Pearl (Angry 2) | 27,999 | 0.7556 | 0.9965 |
| cv_971b3c0c6dbd | cv_971b3c0c6dbd | 6,632 | 0.7557 | 0.9845 |
| Surprise | Surprise | 5,986 | 0.7604 | 0.9998 |
| G0068_0003 | G0068 | 12,657 | 0.7670 | 0.9976 |
| G0004_0012 | G0004 | 12,656 | 0.7756 | 1.0228 |
| G0016_0001 | G0016 | 12,694 | 0.7760 | 1.0000 |
| cv_06a7ed020ffa | cv_06a7ed020ffa | 7,050 | 0.7771 | 0.9975 |
| Dania_22 | Dania | 13,558 | 0.7777 | 0.9869 |
| 2394-Digi-Mohon-Share-Amel-BM | Amel | 29,273 | 0.7784 | 1.0086 |
| cv_0c744927a83d | cv_0c744927a83d | 7,461 | 0.7786 | 0.9988 |
| cv_e6ca744ffbb2 | cv_e6ca744ffbb2 | 6,604 | 0.7813 | 0.9869 |
| cv_1a952f7ad4bd | cv_1a952f7ad4bd | 6,587 | 0.7829 | 1.0155 |
| berani_buat | berani_buat | 8,657 | 0.7853 | 0.9967 |
| Happy | Pearl (Happy) | 12,317 | 0.7854 | 1.0000 |
| Sad | Pearl (Sad) | 7,202 | 0.7855 | 0.9907 |
| k-lao-ZH | k lao ZH | 5,533 | 0.7875 | 1.0042 |
| cv_a2b722b2d474 | cv_a2b722b2d474 | 5,805 | 0.7887 | 0.9907 |
| Sabil_2 | Kamal | 8,276 | 0.7918 | 1.0093 |
| cv_51062240a78a | cv_51062240a78a | 6,929 | 0.7944 | 1.0170 |
| mych_male_yt_13887 | mych male yt 13887 | 5,967 | 0.7954 | 1.0098 |
| sugumaran_6 | sugumaran 6 | 5,630 | 0.7969 | 1.0052 |
| Alvin_6 | Alvin | 27,996 | 0.7989 | 1.0199 |
| cv_9d9664719e9d | cv_9d9664719e9d | 7,020 | 0.8016 | 1.0319 |
| Angry_1 | Pearl (Angry 1) | 27,849 | 0.8026 | 0.9905 |
| cv_bd96fb646e95 | cv_bd96fb646e95 | 6,552 | 0.8061 | 0.9896 |
| cv_c0b0ce94caa2 | cv_c0b0ce94caa2 | 7,053 | 0.8104 | 1.0093 |
| niketh_23 | niketh 23 | 5,767 | 0.8177 | 1.0074 |
| angellyn | angellyn | 6,096 | 0.8185 | 1.0000 |
| hendrick-22 | Hendrick | 6,774 | 0.8185 | 0.9880 |
| Farid_596 | Faiz | 5,216 | 0.8574 | 0.9966 |
Non-Malay Speakers (no eval)
| Speaker ID | Name | Samples | Language |
|---|---|---|---|
| Akari_Kitou | Akari Kitou | 6,772 | Japanese |
| Aoi_Yuuki | Aoi Yuuki | 6,772 | Japanese |
| Ayana_Taketatsu | Ayana Taketatsu | 10,970 | Japanese |
| Celine-ZH | Celine Zh | 10,967 | Chinese |
| ChangYong | Changyong | 16,195 | Chinese |
| Eri_Kitamura | Eri Kitamura | 5,209 | Japanese |
| Haruka_Tomatsu | Haruka Tomatsu | 6,758 | Japanese |
| Kana_Hanazawa | Kana Hanazawa | 8,778 | Japanese |
| Maaya_Uchida | Maaya Uchida | 8,368 | Japanese |
| Nao_Touyama | Nao Touyama | 8,400 | Japanese |
| Rie_Kugimiya | Rie Kugimiya | 15,001 | Japanese |
| Rie_Takahashi | Rie Takahashi | 13,534 | Japanese |
| Rina_Satou | Rina Satou | 12,969 | Japanese |
| Sora_Amamiya | Sora Amamiya | 6,901 | Japanese |
Evaluation Methodology
Each speaker was evaluated with WhisperX (large-v3) transcription and revo-norm text normalization on 36 sample texts covering:
- Digits/currency (RM10.50, 03-8888 9999)
- Malaysian locations (Jalan Ampang, Pulau Pinang)
- Names: Malay/Chinese/Indian (Encik Ahmad, Lim Wei Jie, Rajesh Kumar)
- English code-switching (Meeting kita pukul 3 petang)
- Dates/time (30hb Jun 2024, pukul 8 pagi)
- Addresses (25, Jalan Taman Indah 3/4, 43000 Kajang)
- Financial terms (Baki akaun RM5,670.23)
- General conversation
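CER = Character Error Rate, WER = Word Error Rate; lower is better for both. The tables report both as fractions (a CER of 0.0402 is 4.02%), while the quality tier headings use percentages.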
Full per-speaker results: eval_all/<speaker>_eval.csv
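
For readers unfamiliar with the metrics, the sketch below computes CER and WER from a Levenshtein edit distance. It assumes the conventional definitions (edits divided by reference length, with spaces counted as characters for CER) and is not the evaluation script used for this repo. The function names are illustrative. Under this definition, inserted words can push WER above 1.0, which would be consistent with values such as 1.0228 in the Low Quality table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference characters."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# A long, unrelated transcript of a short reference gives WER > 1.0.
print(wer("pukul lapan", "bukan lagi begitu sahaja ya"))  # 2.5
```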
Install
```bash
pip install piper-tts
pip install git+https://github.com/khursanirevo/revo-norm.git
```
Quick Start
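```python
from piper import PiperVoice
from revo_norm import normalize_text

# Normalize raw Malay text (numbers, currency, dates) before synthesis
raw = "Harga RM10.50 sahaja"
text = normalize_text(raw, language="ms")

# Load one speaker's ONNX model together with its config
voice = PiperVoice.load(
    "hf_repo/speakers/sarah/model.onnx",
    config_path="hf_repo/speakers/sarah/model.onnx.json",
)
audio = voice.synthesize(text)
```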
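
To save the result, one option is to write the raw samples to a WAV file with the standard library. The helper below is a minimal sketch that assumes mono 16-bit PCM at the 22050 Hz training sample rate. `write_wav` is an illustrative name, and how you obtain the PCM bytes depends on your piper-tts version, so the Piper-side line is left as a commented assumption.

```python
import wave


def write_wav(path, pcm_bytes, sample_rate=22050):
    """Write mono 16-bit PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


# Assumption: in recent piper-tts releases, synthesize() yields audio chunks
# that expose their samples as `audio_int16_bytes`. Check your installed version.
# pcm_bytes = b"".join(chunk.audio_int16_bytes for chunk in voice.synthesize(text))
# write_wav("sarah.wav", pcm_bytes)
```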
Text Normalization
revo-norm handles Malay-specific normalization:
- Numbers → words ("115" → "seratus lima belas")
- Currency ("RM10.50" → "sepuluh ringgit lima puluh sen")
- Dates, phone numbers, percentages, temperatures, etc.
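
To illustrate the kind of rewriting involved, here is a toy sketch that reproduces the two examples above. It is not revo-norm's implementation. `number_to_malay` and `ringgit_to_malay` are hypothetical helpers that only cover integers up to 999,999 and plain `RM<ringgit>.<sen>` amounts without thousands separators.

```python
import re

UNITS = ["kosong", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "lapan", "sembilan"]


def number_to_malay(n):
    """Spell out an integer in the range 0..999999 in Malay."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only supports 0..999999")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return f"{UNITS[n - 10]} belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        head = f"{UNITS[tens]} puluh"
    elif n < 1000:
        hundreds, rest = divmod(n, 100)
        head = "seratus" if hundreds == 1 else f"{UNITS[hundreds]} ratus"
    else:
        thousands, rest = divmod(n, 1000)
        head = "seribu" if thousands == 1 else f"{number_to_malay(thousands)} ribu"
    return head if rest == 0 else f"{head} {number_to_malay(rest)}"


def ringgit_to_malay(amount):
    """Spell out an amount such as 'RM10.50' (no thousands separators)."""
    match = re.fullmatch(r"RM(\d+)(?:\.(\d{2}))?", amount)
    if match is None:
        raise ValueError(f"unsupported amount: {amount!r}")
    words = f"{number_to_malay(int(match.group(1)))} ringgit"
    sen = int(match.group(2) or 0)
    if sen:
        words += f" {number_to_malay(sen)} sen"
    return words


print(number_to_malay(115))         # seratus lima belas
print(ringgit_to_malay("RM10.50"))  # sepuluh ringgit lima puluh sen
```

A real normalizer also has to find these tokens inside running text and handle dates, phone numbers, and separators such as the comma in "RM5,670.23", which is why the Quick Start relies on revo-norm rather than ad hoc rules.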
Structure
```
speakers.json            # Speaker registry with eval metrics
speakers/
  <name>/
    model.onnx           # ONNX export for inference
    model.onnx.json      # Phoneme config
```
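
Given this layout, a local copy of the repo can be scanned for usable voices. The sketch below relies only on the directory structure shown above and makes no assumption about the schema of `speakers.json`. `find_speakers` and the `hf_repo` path are illustrative names.

```python
from pathlib import Path


def find_speakers(root):
    """Return names of speaker folders that contain both model files."""
    speakers_dir = Path(root) / "speakers"
    if not speakers_dir.is_dir():
        return []
    names = []
    for entry in sorted(speakers_dir.iterdir()):
        has_model = (entry / "model.onnx").is_file()
        has_config = (entry / "model.onnx.json").is_file()
        if entry.is_dir() and has_model and has_config:
            names.append(entry.name)
    return names


# Example: list the voices available in a local download of the repo.
for name in find_speakers("hf_repo"):
    print(name)
```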
Performance (CPU)
| Metric | Value |
|---|---|
| Avg latency | ~54ms |
| Avg RTF | 0.030 |
| Speed | 33.6x realtime |
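
For reference, the real-time factor (RTF) is conventionally the synthesis time divided by the duration of the generated audio, and the speed multiple is its inverse. The sketch below uses a hypothetical measurement (54 ms to generate 1.8 s of audio at 22050 Hz), not the benchmark data behind the table. The reported 33.6x is consistent with an unrounded RTF slightly below 0.030.

```python
def audio_duration_seconds(num_samples, sample_rate=22050):
    """Duration of a mono clip given its sample count."""
    return num_samples / sample_rate


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical measurement: 54 ms of compute for 39,690 samples (1.8 s).
synthesis_seconds = 0.054
audio_seconds = audio_duration_seconds(39_690)
rtf = real_time_factor(synthesis_seconds, audio_seconds)
print(f"RTF {rtf:.3f}, {1 / rtf:.1f}x realtime")
```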
Training
All models trained with:
- Architecture: VITS
- Phonemizer: espeak-ng (ms voice)
- Sample rate: 22050 Hz
- GPU: NVIDIA H200 NVL