Perry the Platypus PRO

AgPerry

AI & ML interests

None yet

Recent Activity

updated a dataset about 1 hour ago

TIGER-Lab/ClawBenchV2Trace

Updated 33 minutes ago • 171

updated a dataset about 2 hours ago

NAIL-Group/ClawBenchV1Trace

Updated about 2 hours ago • 315

commented on a paper 4 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published 12 days ago • 7

upvoted a paper 4 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published 12 days ago • 7

New activity in huggingface/HuggingDiscussions 6 days ago

[FEEDBACK] Daily Papers

#32 opened almost 2 years ago by

submitted a paper to Daily Papers 6 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published 12 days ago • 7

updated a dataset 7 days ago

NAIL-Group/ClawBenchV2Trace

Updated 7 days ago • 351

updated a Space 7 days ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated 2 datasets 7 days ago

NAIL-Group/ClawBench

Viewer • Updated 7 days ago • 153 • 414 • 2

TIGER-Lab/ClawBench

Viewer • Updated 7 days ago • 153 • 249

updated a collection 9 days ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 9 days ago

published a Space 9 days ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated a collection 9 days ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 9 days ago

updated a Space 9 days ago

ClawBench Leaderboard

Live leaderboard for the ClawBench web-agent benchmark

updated 2 collections 9 days ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 9 days ago

ClawBench — Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 9 days ago • 1

published 2 datasets 9 days ago

TIGER-Lab/ClawBenchV2Trace

Updated 33 minutes ago • 171

NAIL-Group/ClawBenchV2Trace

Updated 7 days ago • 351

updated a collection 9 days ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 9 days ago