K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 11 days ago • 56
VibeProteinBench: An Evaluation Benchmark for Language-interfaced Vibe Protein Design Paper • 2605.10978 • Published about 1 month ago • 19
Article How to Ground a Korean AI Agent in Real Demographics with Synthetic Personas nvidia • Apr 21 • 26
Predicting LLM Reasoning Performance with Small Proxy Model Paper • 2509.21013 • Published Sep 25, 2025 • 6
rBridge 📈 Collection Open-source release for Predicting LLM Reasoning Performance with Small Proxy Model • 3 items • Updated Feb 26 • 2
GuiWorld 🌏📱 Collection Generative Visual Code Mobile World Model -- gWorld • 4 items • Updated May 11 • 4