Tommaso Cerruti (Cerru02)
4 followers · 18 following
https://tommasocerruti.github.io/
tommasocerruti
tommasocerruti
AI & ML interests: AI safety and evaluation
Cerru02's activity
New activity in evaleval/EEE_datastore (18 days ago)
Fix LLM Stats provenance relationships · 2 · #137 opened 18 days ago by Cerru02
Upvoted an article (23 days ago): Safety Evals Should Project Test-Time Compute · Cerru02 · 23 days ago · 5
Published an article (23 days ago): Safety Evals Should Project Test-Time Compute · Cerru02 · 23 days ago · 5
New activity in evaleval/EEE_datastore (about 1 month ago)
Update HELM to schema version v0.2.2 · 2 · #121 opened about 1 month ago by yifanmai
Authored a paper (about 1 month ago): CocoaBench: Evaluating Unified Digital Agents in the Wild · Paper 2604.11201 · Published Apr 13 · 37
Upvoted an article (about 1 month ago): AI evals are becoming the new compute bottleneck · evaleval · Apr 29 · 28
Upvoted a paper (about 1 month ago): CocoaBench: Evaluating Unified Digital Agents in the Wild · Paper 2604.11201 · Published Apr 13 · 37
New activity in evaleval/EEE_datastore (about 1 month ago)
[Submission] Add OpenEval benchmark data · 2 · #109 opened about 1 month ago by mrshu
[Submission] Add Vals.ai benchmark data · 2 · #108 opened about 1 month ago by mrshu
[ACL Shared Task] Add Chatbot Arena (video_edit) · 2 · #107 opened about 1 month ago by muhammadravi251001
[Submission] HAL Leaderboard - 9 agentic benchmarks (246 entries) · 4 · #80 opened about 1 month ago by Asaf-Yehudai
Add HELM Safety v1.17.0 results · 3 · #83 opened about 1 month ago by yifanmai
Add LLM Stats results · 1 · #84 opened about 1 month ago by Cerru02
Add HELM AIR-Bench v1.19.0 results · 5 · #70 opened about 2 months ago by yifanmai
[ACL Shared Task] Add CocoaBench aggregate results · 1 · #75 opened about 1 month ago by Cerru02
[ACL Shared Task] Add SWE-bench Verified official leaderboard data · 11 · #63 opened about 2 months ago by jatinganhotra
Updated a dataset (about 1 month ago): evaleval/EEE_datastore · Updated 5 days ago · 80.7k · 81.6k · 27
New activity in evaleval/EEE_datastore (about 2 months ago)
[ACL Shared Task] Add Artificial Analysis LLM results · 2 · #62 opened about 2 months ago by Cerru02
[ACL Shared Task] Add ARC-AGI leaderboard results · 11 · #55 opened 2 months ago by Cerru02
[ACL Shared Task] Add SciArena leaderboard results · 8 · #54 opened 2 months ago by Cerru02