Lius: Translation Model Based Instructional Linguistic Using Continual Instruction Tuning In Kupang Malay
Abstract
Continual Instruction Tuning enables effective fine-tuning of large language models for low-resource language translation, achieving superior performance compared to standard instruction tuning and multilingual models.
Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing a set of instructions that leverage explicit lexical and semantic features from a bilingual dictionary, and introducing Continual Instruction Tuning (CIT), a training paradigm that enables iterative instruction-based training. Experimental results demonstrate that our model, named Lius, outperforms standard instruction-tuned models by 4-6 points and surpasses both Neural Machine Translation (NMT) and multilingual LLM baselines by 10-13 points on several evaluation metrics. These findings highlight the potential of our approach to mitigate the reliance on large-scale parallel data in low-resource language translation.
Community
We introduce Lius, an Indonesian → Kupang Malay translation model designed for low-resource machine translation.
Kupang Malay is a Malay-based creole spoken in East Nusa Tenggara, Indonesia, but it remains underrepresented in current NLP resources and commercial MT systems. In this work, we propose Instructional Linguistic, a linguistically informed instruction design strategy, and Continual Instruction Tuning (CIT), where the model is trained iteratively with multiple instruction types for the same translation target.
Our approach uses four instruction families: context-based, semantic mapping-based, phonetic-based, and list-group-label-based prompts. We train three Cendol-mT5 variants: small, base, and large. The best model, Lius-Large-MT, improves over standard instruction tuning and outperforms several multilingual LLM and NMT baselines on Indonesian → Kupang Malay translation.
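As a rough illustration of the idea described above, the sketch below builds one prompt per instruction family for the same translation target and yields them as sequential training stages. It is not the authors' implementation: the template wording, the `TEMPLATES` dictionary, the helper names (`lexical_hints`, `build_stage`, `continual_schedule`), and the stage order are all assumptions made for this example, and the dictionary entries are toy values rather than data from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    family: str   # instruction family this prompt belongs to
    prompt: str   # model input
    target: str   # Kupang Malay reference, identical across families


# Hypothetical prompt templates, one per instruction family named in the post.
# The actual wording used to train Lius is not given here.
TEMPLATES: Dict[str, str] = {
    "context": "Translate this Indonesian sentence into Kupang Malay.\n"
               "Sentence: {source}",
    "semantic_mapping": "Dictionary mappings: {hints}\n"
                        "Use them to translate into Kupang Malay.\n"
                        "Sentence: {source}",
    "phonetic": "Note the sound changes in these word pairs: {hints}\n"
                "Translate into Kupang Malay.\nSentence: {source}",
    "list_group_label": "Word list: {hints}\n"
                        "Group and label the words, then translate into "
                        "Kupang Malay.\nSentence: {source}",
}


def lexical_hints(source: str, lexicon: Dict[str, str]) -> str:
    """Collect dictionary entries for the words that occur in the source."""
    found = [f"{w} -> {lexicon[w]}" for w in source.lower().split() if w in lexicon]
    return ", ".join(found) if found else "none"


def build_stage(family: str,
                pairs: Sequence[Tuple[str, str]],
                lexicon: Dict[str, str]) -> List[Example]:
    """Render every (source, target) pair with one family's template."""
    template = TEMPLATES[family]
    return [
        Example(family,
                template.format(source=src, hints=lexical_hints(src, lexicon)),
                tgt)
        for src, tgt in pairs
    ]


def continual_schedule(pairs: Sequence[Tuple[str, str]],
                       lexicon: Dict[str, str],
                       order: Sequence[str] = tuple(TEMPLATES),
                       ) -> Iterator[Tuple[str, List[Example]]]:
    """Yield one training stage per instruction family, in the given order."""
    for family in order:
        yield family, build_stage(family, pairs, lexicon)
```

In a training script, each yielded stage would be passed to a fine-tuning step that continues from the checkpoint produced by the previous stage, so the model sees the same target under a different instruction each time. The fine-tuning step itself is left out because its details are not described in this post.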
Models are available on Hugging Face:
- https://huggingface.co/joanitolopo/lius-cendol-large-inst-mt
- https://huggingface.co/joanitolopo/lius-cendol-base-inst-mt
- https://huggingface.co/joanitolopo/lius-cendol-small-inst-mt
Code:
https://github.com/joanitolopo/instructional-linguistic-llm