# Robust Quantizer from HuBERT Base (Layer 6)
This model checkpoint contains a Robust Quantizer trained on top of the 6th layer of the hubert-base-ls960 model. It was developed as part of a reproduction and evaluation study on creating robust discrete speech units, originally proposed in Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023).
## Model Details

This quantizer was trained to provide discrete pseudo-labels that are resilient to a range of acoustic perturbations. By applying data augmentations during quantizer training, the resulting discrete units, and by extension the downstream acoustic models trained on them, become more robust to noise and varying acoustic conditions.
- Base Model: facebook/hubert-base-ls960
- Layer: 6
- Vocabulary Size (Clusters): 100, 200, 500
- Algorithm: K-Means
- Dataset: LibriSpeech (train-clean-100)
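The core idea can be illustrated with a toy nearest-centroid quantizer on random data (a sketch only, not the actual checkpoint or feature extractor): a robust codebook should map a perturbed utterance to (nearly) the same unit sequence as the clean one.

```python
import numpy as np

def quantize(features, centroids):
    """Assign each frame to its nearest centroid (Euclidean distance)."""
    # features: (T, D), centroids: (V, D) -> unit ids: (T,)
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=-1)

rng = np.random.default_rng(0)
centroids = rng.normal(size=(100, 768))   # stand-in for a 100-unit codebook
clean = rng.normal(size=(50, 768))        # stand-in for HuBERT layer-6 features
noisy = clean + 0.01 * rng.normal(size=clean.shape)  # mild acoustic perturbation

clean_units = quantize(clean, centroids)
noisy_units = quantize(noisy, centroids)
# A robust quantizer keeps these two unit sequences (nearly) identical:
agreement = (clean_units == noisy_units).mean()
```

Augmentation-invariant training pushes this agreement toward 1 even for much stronger perturbations than the toy noise above.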
## Usage

### Download the Model
```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="iliasslasri/robust_speech_quantizer",
    filename="500_vocab_size/round_1/E1_best.pt",
    force_download=True,
)
config_path = hf_hub_download(
    repo_id="iliasslasri/robust_speech_quantizer",
    filename="500_vocab_size/config.yaml",
    force_download=True,
)
```
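In GSLM-style pipelines, runs of repeated unit ids are usually collapsed before the sequence is fed to a unit language model. A minimal sketch in plain Python (a common post-processing step, not a function shipped in this repository):

```python
def dedupe(units):
    """Collapse runs of repeated unit ids: [5, 5, 5, 2, 2, 9] -> [5, 2, 9]."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out

print(dedupe([71, 71, 12, 12, 12, 5, 71]))  # -> [71, 12, 5, 71]
```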
## Augmentation Examples

The following data augmentations were applied to the audio during training of the quantizer:

- Clean
- Time Stretch
- Pitch Shift
- Reverberation
- Noise
- Echo
- Random Noise
- Pink Noise
- Lowpass Filter
- Highpass Filter
- Bandpass Filter
- Smooth
- Boost Audio
- Duck Audio
- Up-Down Resample
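A few of these augmentations can be approximated in a handful of NumPy lines (a simplified illustration; the implementations used in the actual training pipeline may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
audio = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # 1 s synthetic tone as a stand-in

# Additive noise at roughly 20 dB SNR
noise = rng.normal(size=audio.shape)
noise *= np.sqrt(np.mean(audio**2) / np.mean(noise**2)) / 10.0
noisy = audio + noise

# Boost / duck: simple gain scaling (boost clipped to the valid range)
boosted = np.clip(audio * 2.0, -1.0, 1.0)
ducked = audio * 0.3

# Crude lowpass filter: moving average
kernel = np.ones(32) / 32.0
lowpassed = np.convolve(audio, kernel, mode="same")

# Echo: a delayed, attenuated copy mixed back in
delay = int(0.1 * sr)  # 100 ms
echoed = audio.copy()
echoed[delay:] += 0.5 * audio[:-delay]
```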
## Relevant Links
- Original Paper: Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (Gat et al., 2023)
- Project Repository: github