Tommaso Cerruti

Cerru02


https://tommasocerruti.github.io/

AI & ML interests

AI safety and evaluation

Recent Activity

liked a model 1 day ago

thinkingmachines/Inkling

Image-Text-to-Text • 952B • Updated 1 day ago • 16.4k • 1.36k

upvoted an article 5 days ago

Article

Security incident disclosure — July 2026

system • 6 days ago • 313

authored a paper 11 days ago

Linear Attention Architectures: Mechanisms, Trade-offs, and Cross-Layer Routing

Paper • 2607.07953 • Published 14 days ago • 14

submitted a paper to Daily Papers 12 days ago

Linear Attention Architectures: Mechanisms, Trade-offs, and Cross-Layer Routing

Paper • 2607.07953 • Published 14 days ago • 14

upvoted a paper 12 days ago

Linear Attention Architectures: Mechanisms, Trade-offs, and Cross-Layer Routing

Paper • 2607.07953 • Published 14 days ago • 14

authored a paper 16 days ago

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Paper • 2606.14516 • Published Jun 12 • 6

upvoted a paper 19 days ago

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Paper • 2606.14516 • Published Jun 12 • 6

New activity in evaleval/EEE_datastore 2 months ago

Fix LLM Stats provenance relationships

#137 opened 2 months ago by Cerru02

upvoted an article 2 months ago

Article

Safety Evals Should Project Test-Time Compute

Cerru02 • May 11 • 6

published an article 2 months ago

Article

Safety Evals Should Project Test-Time Compute

Cerru02 • May 11 • 6

New activity in evaleval/EEE_datastore 3 months ago

Update HELM to schema version v0.2.2

#121 opened 3 months ago by yifanmai

authored a paper 3 months ago

CocoaBench: Evaluating Unified Digital Agents in the Wild

Paper • 2604.11201 • Published Apr 13 • 37

upvoted an article 3 months ago

Article

AI evals are becoming the new compute bottleneck

evaleval • Apr 29 • 30

upvoted a paper 3 months ago

CocoaBench: Evaluating Unified Digital Agents in the Wild

Paper • 2604.11201 • Published Apr 13 • 37

New activity in evaleval/EEE_datastore 3 months ago

[Submission] Add OpenEval benchmark data

#109 opened 3 months ago by mrshu

[Submission] Add Vals.ai benchmark data

#108 opened 3 months ago by mrshu

[ACL Shared Task] Add Chatbot Arena (video_edit)

#107 opened 3 months ago by muhammadravi251001

[Submission] HAL Leaderboard - 9 agentic benchmarks (246 entries)

#80 opened 3 months ago by Asaf-Yehudai

Add HELM Safety v1.17.0 results

#83 opened 3 months ago by yifanmai

Add LLM Stats results

#84 opened 3 months ago by Cerru02
