On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards Paper • 2407.04065 • Published Jul 4, 2024 • 5
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security Paper • 2605.29801 • Published 5 days ago • 134
Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild Paper • 2605.24213 • Published 11 days ago • 10