Large Language Models for Translating Low Resource Languages of the Iberian Peninsula

university

https://llmtrad-ibe.github.io/

Introduction

LLMTrad-IBE is a strategic research initiative dedicated to overcoming the digital divide affecting the minority Romance languages of the Iberian Peninsula. By leveraging state-of-the-art Natural Language Processing (NLP), we aim to ensure these languages are not left behind in the era of Artificial Intelligence.

This project is a key component of the AI-TraLow coordinated framework (AI-Driven Translation for Low-Resource Languages and Cultures), supported by the Spanish Ministry of Science, Innovation, and Universities (MCIU/AEI/10.13039/501100011033/FEDER, UE) under reference PID2024-158157OB-C33.

Mission and Scope

Our research focuses on the development, adaptation, and evaluation of Large Language Models (LLMs) for four specific linguistic varieties characterized by limited digital resources:

Asturian
Aragonese
Aranese
Eonavian

Strategic Research Areas

We employ a hybrid methodology that integrates the structural precision of symbolic systems with the generative power of neural architectures:

LLM Specialization: Fine-tuning decoder-only architectures and exploring parameter-efficient fine-tuning (PEFT) strategies for translation.
Knowledge Distillation: Developing compact and efficient models to facilitate sustainable deployment in standard computing environments.
Resource Synthesis: Expanding Apertium-based lexical resources and curating high-quality benchmarks, including FLORES+ and NTREX adaptations.
Ethical AI: Implementing rigorous evaluation frameworks to detect and mitigate gender bias and ensure linguistic authenticity.
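
To make the appeal of parameter-efficient fine-tuning concrete, the short sketch below compares the number of trainable parameters for a single weight matrix under full fine-tuning and under a low-rank adapter in the style of LoRA. LoRA is only one common PEFT technique and is used here as an assumption; the project description does not name a specific method. The matrix size (4096 x 4096) and the rank (8) are illustrative values, and the function names are hypothetical.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def low_rank_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA-style adapter.

    The frozen weight W is augmented with B @ A, where A has shape
    (rank, d_in) and B has shape (d_out, rank); only A and B are trained.
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Illustrative sizes only (assumption, not taken from the project).
    d_in, d_out, rank = 4096, 4096, 8

    full = full_finetune_params(d_in, d_out)
    adapter = low_rank_adapter_params(d_in, d_out, rank)

    print(f"full fine-tuning: {full:,} trainable parameters")
    print(f"rank-{rank} adapter:   {adapter:,} trainable parameters")
    print(f"adapter / full:   {adapter / full:.4%}")
```

Under these assumed sizes the adapter trains well under one percent of the parameters of the full matrix, which is why such methods suit settings with limited data and modest hardware.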

Collaborative Network

LLMTrad-IBE brings together four academic institutions:

Universitat Oberta de Catalunya (UOC) — Coordinating Institution
Universitat Autònoma de Barcelona (UAB)
Universidad de Oviedo
Universidad de Zaragoza

Commitment to Open Science

As part of our commitment to the scientific community and linguistic heritage, all models, datasets, and tools developed within this project are released under permissive open-source licenses.
