Article: Announcing AutoBench Agentic: The Next Generation Agentic Benchmark. PeterKruger • 25 days ago • 2
Space (Running, Agents): AutoBench Leaderboard 👀 • 3. Multi-run AutoBench leaderboard with historical navigation
Article: Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2. PeterKruger • Dec 17, 2025 • 1
Article: AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral... PeterKruger • Dec 10, 2025 • 3
Article: AutoBench Run 4 is out with Gemini 3 Pro, Gpt 5.1, Grok 4.1 etc. And the winner is not who you expect. PeterKruger • Nov 28, 2025 • 1
Article: AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark. PeterKruger • Oct 29, 2025 • 4
Article: AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org. PeterKruger • Aug 20, 2025 • 6
Article: Introducing Bot Scanner: A "Skyscanner" for LLM answers. PeterKruger • Jun 4, 2025
Article: AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model. PeterKruger • Apr 29, 2025 • 6
Article: Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM. INSAIT-Institute • Apr 23, 2025 • 65