A Conditional Random Field (CRF) based Part-of-Speech tagger for Vietnamese, trained on the Universal Dependencies Dataset (UDD-v0.1).
The tagger relies on handcrafted features inspired by the underthesea NLP library and reports high accuracy on Vietnamese POS tagging.
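To make "handcrafted features" concrete, here is a minimal sketch of how a CRF tagger typically turns each token into a feature dictionary using a small context window. The function names and the specific features (`word.lower`, `suffix2`, `BOS`, and so on) are assumptions chosen for illustration; they are not the templates this model actually uses.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dict for tokens[i] (hypothetical templates)."""
    word = tokens[i]
    features: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix2": word[-2:],
    }
    # Previous-token context, or a beginning-of-sentence marker.
    if i > 0:
        features["-1:word.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True
    # Next-token context, or an end-of-sentence marker.
    if i < len(tokens) - 1:
        features["+1:word.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True
    return features


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Featurize every token in a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Assumes the sentence is already word-segmented ("Việt Nam" is one token).
tokens = ["Tôi", "yêu", "Việt Nam"]
feats = sentence_features(tokens)
print(feats[0])
```

A list of such dictionaries, one per token, is the input format that CRF toolkits such as python-crfsuite accept for a sentence.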
The model uses the following feature templates:
import requests
API_URL = "https://api-inference.huggingface.co/models/undertheseanlp/tre-1"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()
output = query({"inputs": "Tôi yêu Việt Nam"})
print(output)
# [{"token": "Tôi", "tag": "PRON"}, {"token": "yêu", "tag": "VERB"}, ...]
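The response shown in the comment above is a list of token/tag dictionaries. As a small sketch of how that output could be consumed, the hypothetical helper below renders it as a `token/TAG` string. It also checks for a dictionary containing an `error` key, which hosted inference endpoints commonly return instead of a list (for example while a model is still loading); treat that check as an assumption about the service rather than documented behavior of this model.

```python
def format_tagged(output):
    """Render [{"token": ..., "tag": ...}, ...] as 'token/TAG token/TAG ...'."""
    # Assumption: a failed request yields a dict with an "error" key.
    if isinstance(output, dict) and "error" in output:
        raise RuntimeError(f"Inference request failed: {output['error']}")
    return " ".join(f"{item['token']}/{item['tag']}" for item in output)


# Sample data mirroring the shape shown in the comment above.
sample = [
    {"token": "Tôi", "tag": "PRON"},
    {"token": "yêu", "tag": "VERB"},
]
formatted = format_tagged(sample)
print(formatted)  # Tôi/PRON yêu/VERB
```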
import pycrfsuite
from handler import EndpointHandler
handler = EndpointHandler(path="./")
result = handler({"inputs": "Tôi yêu Việt Nam"})
print(result)
The model was trained using:
Evaluated on a held-out test set from UDD-v0.1:
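For readers who want to run a comparable evaluation, the sketch below computes token-level accuracy and per-tag accuracy, the usual metrics for POS tagging. The function names and the toy gold/predicted sequences are hypothetical and exist only to exercise the code; they are not results for this model.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(g == p for g, p in zip(g_sent, p_sent))
        total += len(g_sent)
    return correct / total if total else 0.0


def per_tag_accuracy(gold, pred):
    """Accuracy broken down by gold tag."""
    hits, counts = Counter(), Counter()
    for g_sent, p_sent in zip(gold, pred):
        for g, p in zip(g_sent, p_sent):
            counts[g] += 1
            hits[g] += int(g == p)
    return {tag: hits[tag] / counts[tag] for tag in counts}


# Toy data for illustration only.
gold = [["PRON", "VERB", "PROPN"], ["NOUN", "VERB"]]
pred = [["PRON", "VERB", "NOUN"], ["NOUN", "VERB"]]

accuracy = token_accuracy(gold, pred)
by_tag = per_tag_accuracy(gold, pred)
print(accuracy)
print(by_tag)
```

Per-tag accuracy is worth reporting alongside the overall figure because frequent tags can mask weak performance on rare ones.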
If you use this model, please cite:
@misc{tre1-pos-tagger,
  author    = {undertheseanlp},
  title     = {Vietnamese POS Tagger TRE-1},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/undertheseanlp/tre-1}
}
Apache 2.0