🏗️ Building on HF

Nathan Habib PRO

SaylorTwift

huggingface


AI & ML interests

Evals

Recent Activity

updated a dataset 3 days ago

SaylorTwift/llm-benchmark-usage

liked a model 3 days ago

ai-sage/GigaChat3.5-432B-A28B-base

liked a model 3 days ago

meituan-longcat/LongCat-2.0


Organizations

Buckets 4

SaylorTwift/harbor-jobs

SaylorTwift/open-banker-certs

SaylorTwift/deep-swe

SaylorTwift/reposcan

Posts 1

Post

2761

How do I test an LLM for my unique needs?
If you work in finance, law, or medicine, generic benchmarks are not enough.
This blog post uses Argilla, Distilabel and 🌤️ Lighteval to generate an evaluation dataset and evaluate models on it.

https://github.com/argilla-io/argilla-cookbook/blob/main/domain-eval/README.md

Articles 14

Article

43

Featuring Every Eval Ever Results on Hugging Face Model Pages


Collections 8


Papers 1

arxiv:2310.16944

Spaces 25

LLM Benchmark Usage Explorer

Explore LLM benchmark usage and trends

Reposcan

Search and explore repo issues, PRs, and code

Qwen3 8B

Inspect and browse server log files

Qwen2.5 0.5B Instruct Evals

Inspect and view log files in a web interface

Meta Llama 3.1 8b Cb

Inspect and explore log files in a web view

Transformers CB

View and explore server logs in a web interface

Models 4

SaylorTwift/SmolLM3-3B

Text Generation • 3B • Updated Jan 6 • 6

SaylorTwift/test

Updated Dec 16, 2025

SaylorTwift/gpt2_test

Text Generation • 0.1B • Updated Sep 23, 2024 • 159

SaylorTwift/xlm-roberta-base-finetuned-panx-fr

Updated Mar 13, 2023 • 1

Datasets 58

SaylorTwift/llm-benchmark-usage

Viewer • Updated 3 days ago • 194 • 114

SaylorTwift/harbor-assets

Updated Jun 8 • 22

SaylorTwift/gemma4-blog-images

Viewer • Updated Apr 2 • 1 • 14

SaylorTwift/mteb-bitext-mining-aggregated

Viewer • Updated Apr 2 • 588k • 951

SaylorTwift/gsm8k-cb-llama31-8b-results

Viewer • Updated Mar 30 • 100 • 9

SaylorTwift/gsm8k-cb-results

Viewer • Updated Mar 30 • 100 • 17

SaylorTwift/aime-2026-qwen35-results

Viewer • Updated Mar 6 • 1 • 15

SaylorTwift/aime-2026-qwen25-72b-results

Viewer • Updated Mar 6 • 1 • 12

SaylorTwift/aime-2026-vllm-results

Viewer • Updated Mar 6 • 1 • 16

SaylorTwift/claude-sonnet-4-0

Updated Dec 15, 2025 • 27
