Introduction
LLMTrad-IBE is a strategic research initiative dedicated to overcoming the digital divide affecting the minority Romance languages of the Iberian Peninsula. By leveraging state-of-the-art Natural Language Processing (NLP), we aim to ensure these languages are not left behind in the era of Artificial Intelligence.
This project is a key component of the AI-TraLow coordinated framework (AI-Driven Translation for Low-Resource Languages and Cultures), supported by the Spanish Ministry of Science, Innovation, and Universities (MCIU/AEI/10.13039/501100011033/FEDER, UE) under reference PID2024-158157OB-C33.
Mission and Scope
Our research focuses on the development, adaptation, and evaluation of Large Language Models (LLMs) for four language varieties with limited digital resources:
- Asturian
- Aragonese
- Aranese
- Eonavian
Strategic Research Areas
We employ a hybrid methodology that integrates the structural precision of symbolic systems with the generative power of neural architectures:
- LLM Specialization: Fine-tuning decoder-only architectures and exploring parameter-efficient fine-tuning (PEFT) strategies for translation.
- Knowledge Distillation: Developing compact and efficient models to facilitate sustainable deployment in standard computing environments.
- Resource Synthesis: Expanding Apertium-based lexical resources and curating high-quality benchmarks, including FLORES+ and NTREX adaptations.
- Ethical AI: Implementing rigorous evaluation frameworks to detect and mitigate gender bias and ensure linguistic authenticity.
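To make the knowledge distillation item above more concrete, here is a minimal sketch of the classic soft-label distillation objective, in which a compact student model is trained to match the temperature-softened output distribution of a larger teacher. This is an illustration only: the project description does not say which distillation objective is used (sequence-level distillation is also common in machine translation), and the function names, the toy logits, and the temperature value are all assumptions introduced for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Hypothetical helper for illustration; not taken from the project.
    The T**2 factor is the usual scaling that keeps gradient magnitudes
    comparable across different temperatures.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher, student)
        if p > 0
    )
    return kl * temperature ** 2


# Toy next-token scores over a three-word vocabulary (made-up values).
teacher_scores = [4.0, 1.0, 0.5]
student_scores = [2.5, 1.5, 0.2]
loss = distillation_loss(student_scores, teacher_scores)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. In practice it would typically be combined with the ordinary cross-entropy loss on reference translations.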
Collaborative Network
LLMTrad-IBE is a collaboration between four academic institutions:
- Universitat Oberta de Catalunya (UOC) — Coordinating Institution
- Universitat Autònoma de Barcelona (UAB)
- Universidad de Oviedo
- Universidad de Zaragoza
Commitment to Open Science
As part of our commitment to the scientific community and linguistic heritage, all models, datasets, and tools developed within this project are released under permissive open-source licenses.