Bridging the Digital Divide for African AI
Voice of a Continent is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.
Best-in-Class Multilingual Models
Introduced in our EMNLP 2025 paper *Voice of a Continent: Mapping Africa's Speech Technology Frontier*, the Simba Series represents the current state of the art for African speech AI.
- Unified Suite: Models optimized for African languages.
- Superior Accuracy: Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- Multitask Capability: Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- Inclusion-First: Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.
The Simba family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.
🗣️ Simba-ASR
The New Standard for African Speech-to-Text
🎯 Task: Automatic Speech Recognition – powering high-accuracy transcription across the continent.
🌍 Language Coverage (43 African languages)
Amharic (amh), Arabic (ara), Asante Twi (asanti), Bambara (bam), Baoulé (bau), Bemba (bem), Ewe (ewe), Fanti (fat), Fon (fon), French (fra), Ganda (lug), Hausa (hau), Igbo (ibo), Kabiye (kab), Kinyarwanda (kin), Kongo (kon), Lingala (lin), Luba-Katanga (lub), Luo (luo), Malagasy (mlg), Mossi (mos), Northern Sotho (nso), Nyanja (nya), Oromo (orm), Portuguese (por), Shona (sna), Somali (som), Southern Sotho (sot), Swahili (swa), Swati (ssw), Tigrinya (tir), Tsonga (tso), Tswana (tsn), Twi (twi), Umbundu (umb), Venda (ven), Wolof (wol), Xhosa (xho), Yoruba (yor), Zulu (zul), Tamazight (tzm), Sango (sag), Dinka (din).
🏗️ Base Architectures
- Simba-S (SeamlessM4T-v2-MT) – Top Performer
- Simba-W (Whisper-v3-large)
- Simba-X (Wav2Vec2-XLS-R-2b)
- Simba-M (MMS-1b-all)
- Simba-H (AfriHuBERT)
| ASR Models | Architecture | 🤗 Hugging Face Model Card | Status |
|---|---|---|---|
| 🔥Simba-S🔥 | SeamlessM4T-v2 | 🤗 https://huggingface.co/UBC-NLP/Simba-S | ✅ Released |
| 🔥Simba-W🔥 | Whisper | 🤗 https://huggingface.co/UBC-NLP/Simba-W | ✅ Released |
| 🔥Simba-X🔥 | Wav2Vec2 | 🤗 https://huggingface.co/UBC-NLP/Simba-X | ✅ Released |
| 🔥Simba-M🔥 | MMS | 🤗 https://huggingface.co/UBC-NLP/Simba-M | ✅ Released |
| 🔥Simba-H🔥 | HuBERT | 🤗 https://huggingface.co/UBC-NLP/Simba-H | ✅ Released |
- Simba-S (based on SeamlessM4T-v2-MT) emerged as the best-performing ASR model overall.
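Because each Simba variant keeps its base architecture, the checkpoints should also be loadable with the corresponding `transformers` model classes rather than only through the pipeline shown in the usage example below. The following is a minimal sketch for Simba-S, assuming the checkpoint is compatible with the standard SeamlessM4T-v2 speech-to-text classes and ships a matching processor configuration; the audio file path and the target-language code (`"swh"`, Swahili) are placeholders, not values prescribed by the model card.

```python
import librosa  # any loader that yields a 16 kHz float waveform works
from transformers import AutoProcessor, SeamlessM4Tv2ForSpeechToText

# Assumption: the Simba-S checkpoint loads with the standard SeamlessM4T-v2 classes.
processor = AutoProcessor.from_pretrained("UBC-NLP/Simba-S")
model = SeamlessM4Tv2ForSpeechToText.from_pretrained("UBC-NLP/Simba-S")

# "my_recording.wav" is a placeholder path; sr=16_000 resamples on load.
audio_array, _ = librosa.load("my_recording.wav", sr=16_000)
inputs = processor(audios=audio_array, sampling_rate=16_000, return_tensors="pt")

# SeamlessM4T-v2 expects a target-language code at generation time;
# "swh" (Swahili) is shown purely as an illustration.
output_tokens = model.generate(**inputs, tgt_lang="swh")
print(processor.decode(output_tokens[0].tolist(), skip_special_tokens=True))
```

For most users, the pipeline-based route in the usage example below is the simpler option.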
🧩 Usage Example
You can easily run inference using the Hugging Face transformers library.
```python
from transformers import pipeline

# Load Simba-S for ASR.
# Other Simba checkpoints: `UBC-NLP/Simba-W`, `UBC-NLP/Simba-X`, `UBC-NLP/Simba-H`, `UBC-NLP/Simba-M`
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="UBC-NLP/Simba-S",
)

# Only when using the MMS-based checkpoint (`UBC-NLP/Simba-M`), load its adapter first:
# asr_pipeline.model.load_adapter("multilingual_african")

# Transcribe audio from a file or URL
result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
print(result["text"])

# Transcribe audio from an in-memory array
# (`audio_array` is a 1-D float waveform sampled at 16 kHz; see the loading sketch below)
result = asr_pipeline({
    "array": audio_array,
    "sampling_rate": 16_000,
})
print(result["text"])
```
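The second call above expects `audio_array` to be a 1-D floating-point waveform sampled at 16 kHz. Here is a minimal sketch of one way to produce it, assuming `librosa` is installed; the file name is a placeholder.

```python
import librosa  # third-party; soundfile or torchaudio would work equally well

# "my_recording.wav" is a placeholder path; sr=16_000 resamples on load.
audio_array, sampling_rate = librosa.load("my_recording.wav", sr=16_000)

result = asr_pipeline({"array": audio_array, "sampling_rate": sampling_rate})
print(result["text"])
```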
Get started with Simba models in minutes using our interactive Colab notebook:
Citation
If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.
@inproceedings{elmadany-etal-2025-voice,
title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
author = "Elmadany, AbdelRahim A. and
Kwon, Sang Yun and
Toyin, Hawau Olamide and
Alcoba Inciarte, Alcides and
Aldarmaki, Hanan and
Abdul-Mageed, Muhammad",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.559/",
doi = "10.18653/v1/2025.emnlp-main.559",
pages = "11039--11061",
ISBN = "979-8-89176-332-6",
}