Piper TTS — Vietnamese Multi-Speaker Voice Models

A collection of 14 Vietnamese voices fine-tuned from Piper TTS (VITS medium) for high-quality, fast text-to-speech (TTS) that runs locally on CPU or GPU.

Overview

Item	Details
Base model	piper1-gpl medium (VITS)
Base checkpoint	`vi_VN/vais1000/medium` — epoch 4769, step 919580
Sample rate	22050 Hz
Language	Vietnamese (code-switched Vi-En)
Voices	14 speakers
Formats	`.ckpt` (PyTorch Lightning) + `.onnx` (ONNX export)
License	MIT

Voice list

#	Speaker	.ckpt	.onnx	Sample (.mp3)
1	Bảo Hân	✅	✅	—
2	Châu Anh	—	✅	✅
3	Chi Nguyễn	—	—	✅
4	Gia Hiếu	✅	✅	✅
5	Lọ Lem	✅	✅	✅
6	Mai Linh	✅	✅	✅
7	Nguyễn Hiếu	—	—	✅
8	Nhật	—	✅	✅
9	Quốc Khánh	✅	✅	✅
10	Quỳnh Trang	✅	✅	✅
11	Thanh Duy	—	—	✅
12	Thu Minh	—	—	✅
13	Tường Vy	✅	✅	✅
14	Yến Nhi	✅	✅	✅

7 complete voices (.ckpt + .onnx + .mp3): Gia Hiếu, Lọ Lem, Mai Linh, Quốc Khánh, Quỳnh Trang, Tường Vy, Yến Nhi
Other voices: still being trained or only partially available (see the table above)

🎧 Audio Samples

Châu Anh

Chi Nguyễn

Gia Hiếu

Lọ Lem

Mai Linh

Nguyễn Hiếu

Nhật

Quốc Khánh

Quỳnh Trang

Thanh Duy

Thu Minh

Tường Vy

Yến Nhi

Dataset

Source: quangdung/ly-tts-dataset on Hugging Face
Size: 2,834 Vietnamese sentences (with code-switched English mixed in)
Format: WAV audio at 22050 Hz + text transcripts
Per voice: ~100 hours of audio

Installation

System requirements

Python 3.10+
espeak-ng (for Vietnamese phonemization)
CUDA (optional, for GPU inference/training)

Install dependencies

# Install espeak-ng
sudo apt-get install -y espeak-ng espeak-ng-data libespeak-ng-dev

# Install piper-tts
pip install piper-tts

# OR: install piper1-gpl from source (required for training)
git clone https://github.com/OHF-Voice/piper1-gpl.git
cd piper1-gpl
pip install -e ".[train]"
./build_monotonic_align.sh
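
Before moving on, it can help to confirm that the requirements listed above are actually visible from Python. The following is a minimal sketch, not part of Piper itself: `missing_requirements` is a hypothetical helper that only checks whether the `espeak-ng` executable is on `PATH` and whether a module named `piper` can be imported.

```python
import importlib.util
import shutil


def missing_requirements(tools=("espeak-ng",), modules=("piper",)):
    """Return the names of command-line tools and Python modules that cannot be found."""
    # shutil.which returns None when the executable is not on PATH
    missing = [tool for tool in tools if shutil.which(tool) is None]
    # find_spec returns None when a top-level module is not installed
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing
```

Calling `missing_requirements()` returns an empty list when both checks pass; any name it returns points at a step above that still needs to be run.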

Usage

1. Download the models from Hugging Face

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="quangdung/Piper_checkpoint",
    local_dir="./piper-vn-models"
)
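
The call above fetches every file in the repository, including the `.ckpt` checkpoints. If only one voice is needed for inference, `snapshot_download` accepts an `allow_patterns` argument that limits the download. The sketch below assumes each speaker folder contains an `.onnx` file and a matching `.onnx.json` config, as the inference example in this README implies; `voice_patterns` and `download_voice` are hypothetical helper names. The repository page asks visitors to accept access conditions, so a logged-in Hugging Face token may be required.

```python
def voice_patterns(speaker_dir):
    """Glob patterns matching one speaker's ONNX model and its JSON config."""
    return [f"{speaker_dir}/*.onnx", f"{speaker_dir}/*.onnx.json"]


def download_voice(speaker_dir, local_dir="./piper-vn-models"):
    """Download only the inference files for one speaker folder."""
    # Imported lazily so voice_patterns can be used without huggingface_hub installed
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="quangdung/Piper_checkpoint",
        local_dir=local_dir,
        allow_patterns=voice_patterns(speaker_dir),
    )
```

For example, `download_voice("Yến Nhi")` would place the files under `./piper-vn-models/Yến Nhi/`, which is the layout the next step expects.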

2. Inference with the Python API

import wave
from piper import PiperVoice, SynthesisConfig

# Load the ONNX model
voice = PiperVoice.load(
    model_path="./piper-vn-models/Yến Nhi/yennhi.onnx",
    config_path="./piper-vn-models/Yến Nhi/yennhi.onnx.json",  # the JSON config that accompanies the model
    use_cuda=True  # GPU inference (needs onnxruntime-gpu); set to False on CPU-only machines
)

# Configure the synthesis
config = SynthesisConfig(
    volume=1.0,          # volume multiplier (<1: quieter, >1: louder)
    length_scale=1.0,    # speaking rate (>1: slower, <1: faster)
    noise_scale=0.667,   # amount of audio variation
    noise_w_scale=0.8,   # amount of speaking (timing) variation
    normalize_audio=False,
)

# Synthesize speech
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize_wav(
        "Xin chào các bạn, mình là trợ lý ảo tiếng Việt.",
        wav_file,
        syn_config=config
    )
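
Since not every speaker folder ships an ONNX model, it may be convenient to discover which voices are usable before calling `PiperVoice.load`. The sketch below scans a download directory for `.onnx` files that have a `.onnx.json` config next to them. `find_voices` is a hypothetical helper, and the one-folder-per-speaker layout is taken from the directory structure described in this README.

```python
from pathlib import Path


def find_voices(root):
    """Map each speaker folder name to (model_path, config_path) for usable voices."""
    voices = {}
    for model in sorted(Path(root).glob("*/*.onnx")):
        # Piper configs are named after the model: voice.onnx -> voice.onnx.json
        config = model.with_name(model.name + ".json")
        if config.is_file():
            voices[model.parent.name] = (model, config)
    return voices
```

The returned paths can be passed directly as `model_path` and `config_path`; folders without a config are skipped rather than reported as errors.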

3. Inference with the CLI

echo "Xin chào các bạn" | piper \
  --model "Yến Nhi/yennhi.onnx" \
  --config "Yến Nhi/yennhi.onnx.json" \
  --output_file output.wav
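
To confirm that the generated file matches the 22050 Hz sample rate listed in the overview, the standard-library `wave` module is enough. This is a small sketch with a hypothetical helper name, `wav_info`; it reads only the WAV header and does not touch Piper.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        # Duration is the number of frames divided by frames per second
        return rate, wav_file.getnchannels(), wav_file.getnframes() / rate
```

Running `wav_info("output.wav")` after either inference method should report a rate of 22050 if the voice was exported with the settings described here.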

4. Listen to samples

See the 🎧 audio samples section above to hear a demo of each voice before choosing a model.

Training

Prepare the dataset

from datasets import load_dataset
import numpy as np
import soundfile as sf
import os

dataset = load_dataset("quangdung/ly-tts-dataset")

os.makedirs("formatted_dataset/wavs", exist_ok=True)
with open("formatted_dataset/metadata.csv", "w", encoding="utf-8") as f:
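    for i, sample in enumerate(dataset["train"]):
        # Interpret "audio_bytes" as raw float32 PCM samples recorded at 22050 Hz
        audio = np.frombuffer(sample["audio_bytes"], dtype=np.float32)
        # Write one WAV per utterance; the matching "file|text" line goes to metadata.csv
        sf.write(f"formatted_dataset/wavs/audio{i+1}.wav", audio, 22050)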
        f.write(f"audio{i+1}.wav|{sample['text'].strip()}\n")
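
A malformed metadata file tends to surface only after training has started, so a quick check beforehand can save time. The sketch below validates the `filename|text` format written by the script above and verifies that each referenced WAV exists. `check_metadata` is a hypothetical helper, not part of Piper.

```python
from pathlib import Path


def check_metadata(csv_path, audio_dir):
    """Return a list of problems found in a 'filename|text' metadata file."""
    problems = []
    audio_dir = Path(audio_dir)
    with open(csv_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            # Split on the first '|' only, so transcripts may contain '|' themselves
            name, sep, text = line.rstrip("\n").partition("|")
            if not sep:
                problems.append(f"line {line_no}: missing '|' separator")
            elif not text.strip():
                problems.append(f"line {line_no}: empty transcript")
            elif not (audio_dir / name).is_file():
                problems.append(f"line {line_no}: {name} not found")
    return problems
```

An empty list from `check_metadata("formatted_dataset/metadata.csv", "formatted_dataset/wavs")` means every line has a transcript and a matching audio file.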

Train from the base checkpoint

python3 -m piper.train fit \
  --data.voice_name "ten_giong" \
  --data.csv_path "formatted_dataset/metadata.csv" \
  --data.audio_dir "formatted_dataset/wavs" \
  --model.sample_rate 22050 \
  --data.espeak_voice "vi" \
  --data.cache_dir CACHE_DIR \
  --data.config_path "config.json" \
  --data.batch_size 16 \
  --ckpt_path "pretrained-weights-only.ckpt" \
  --trainer.max_epochs 1000 \
  --trainer.callbacks+=ModelCheckpoint \
  --trainer.callbacks.every_n_epochs=20 \
  --trainer.callbacks.save_top_k=-1 \
  --trainer.callbacks.dirpath="checkpoints" \
  --trainer.callbacks.filename="ten_giong-{epoch:04d}"

Export ONNX

python3 -m piper.train.export_onnx \
  --checkpoint "checkpoints/ten_giong-epoch=0019.ckpt" \
  --output_file "ten_giong.onnx"
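
With `save_top_k=-1` the training command keeps every saved checkpoint, so the `checkpoints` folder can hold many files. The sketch below picks the one with the highest epoch number to pass to the export command. It assumes the `<voice_name>-epoch=NNNN.ckpt` naming shown above; `latest_checkpoint` is a hypothetical helper name.

```python
import re
from pathlib import Path


def latest_checkpoint(directory, voice_name):
    """Return the checkpoint with the highest epoch number, or None if there is none."""
    pattern = re.compile(re.escape(voice_name) + r"-epoch=(\d+)\.ckpt")
    best = None
    for path in Path(directory).glob("*.ckpt"):
        match = pattern.fullmatch(path.name)
        # Compare epochs numerically so the result does not depend on zero padding
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    return best[1] if best else None
```

For instance, `latest_checkpoint("checkpoints", "ten_giong")` returns the path to hand to `--checkpoint`. The latest epoch is not necessarily the best-sounding one, so listening to a few exports is still worthwhile.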

Directory structure

Piper_checkpoint/
├── README.md
├── text.txt                          # Training text corpus
├── giong1.mp3                        # Reference voice sample
├── Bảo Hân/
│   ├── baohan-epoch=0019.ckpt
│   └── baohan250626.onnx
├── Gia Hiếu/
│   ├── giahieu-epoch=0019.ckpt
│   ├── giahieu.onnx
│   └── giahieu.mp3
├── ...                               # (other voices follow the same layout)
└── Yến Nhi/
    ├── yennhi-epoch=0019.ckpt
    ├── yennhi.onnx
    └── yennhi.mp3

Credits

Piper TTS: OHF-Voice/piper1-gpl — Open Home Foundation
Base checkpoint: rhasspy/piper-checkpoints
Dataset: quangdung/ly-tts-dataset
Training: fine-tuned on Google Colab with T4/A100 GPUs

License

MIT License. See the LICENSE file (if present); the models are free to use for personal and commercial purposes.
