Multilingual UnigramLM

company
https://cimeister.github.io/blog/unigramlm/

AI & ML interests

Multilingual Tokenization

Recent Activity

TheRootOf3  updated a dataset 1 day ago
MultilingualUnigramLM/FineWeb2-100M-gpt2-toks
TheRootOf3  published a dataset 1 day ago
MultilingualUnigramLM/FineWeb2-100M-gpt2-toks
TheRootOf3  updated a model 2 days ago
MultilingualUnigramLM/las-tokenizers-Olmo-3-1025-7B-som

Suchir Salhan, Clara Meister, Pietro Lesci, Andrzej Szablewski

MultilingualUnigramLM's datasets (4)

MultilingualUnigramLM/FineWeb2-100M-gpt2-toks

Viewer • Updated 1 day ago • 1.13M • 53

MultilingualUnigramLM/FineWeb2-10M

Viewer • Updated Jan 20 • 228k • 65

MultilingualUnigramLM/FineWeb2-5M

Viewer • Updated Jan 20 • 113k • 32

MultilingualUnigramLM/FineWeb2-10K

Viewer • Updated Jan 18 • 1.14M • 114