This collection contains held-out splits for testing Flow-Judge-v0.1.
Flow AI (company, verified)
AI & ML interests
LLM system evaluation, Automatic LM improvements
Organization Card
Flow AI is the system for evaluating and improving your LLM application.
Models (7)
flowaicom/Flow-Judge-v0.1-W8A16
1B • Updated • 1 • 1
flowaicom/Flow-Judge-v0.1-W4A16
0.7B • Updated • 5 • 1
flowaicom/Flow-Judge-v0.1-FP8
4B • Updated • 5 • 1
flowaicom/Flow-Judge-v0.1-AWQ
Text Generation • 4B • Updated • 6.13k • 6
flowaicom/Flow-Judge-v0.1
Text Generation • 4B • Updated • 930 • 70
flowaicom/Flow-Judge-v0.1-Llamafile
Updated • 11 • 1
flowaicom/Flow-Judge-v0.1-GGUF
Text Generation • 4B • Updated • 67 • 10
Datasets (9)
flowaicom/legalbench_contracts_qa_subset
Viewer • Updated • 100 • 34
flowaicom/Flow-Judge-v0.1-3-likert-heldout
Viewer • Updated • 300 • 9
flowaicom/Flow-Judge-v0.1-5-likert-heldout
Viewer • Updated • 274 • 14
flowaicom/Flow-Judge-v0.1-binary-heldout
Viewer • Updated • 316 • 25
flowaicom/RAGTruth_test
Viewer • Updated • 2.7k • 23 • 1
flowaicom/covid_qa
Viewer • Updated • 1k • 8
flowaicom/PubMedQA
Viewer • Updated • 1k • 15 • 1
flowaicom/HaluEval
Viewer • Updated • 10k • 212 • 1
flowaicom/Feedback-Bench
Viewer • Updated • 1k • 33 • 1