Answer Matching Outperforms Multiple Choice for Language Model Evaluation Paper • 2507.02856 • Published Jul 3, 2025 • 9
MedSAMix: A Training-Free Model Merging Approach for Medical Image Segmentation Paper • 2508.11032 • Published Aug 14, 2025 • 2
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11, 2025 • 37
Training Dynamics Impact Post-Training Quantization Robustness Paper • 2510.06213 • Published Oct 7, 2025 • 3
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols Paper • 2510.09462 • Published Oct 10, 2025 • 6
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models Paper • 2510.14961 • Published Oct 16, 2025 • 8
Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models Paper • 2510.14853 • Published Oct 16, 2025 • 5
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence Paper • 2511.07384 • Published Nov 10, 2025 • 19
Training AI Co-Scientists Using Rubric Rewards Paper • 2512.23707 • Published Dec 29, 2025 • 21
Scaling Open-Ended Reasoning to Predict the Future Paper • 2512.25070 • Published Dec 31, 2025 • 20
NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist Paper • 2602.16756 • Published Feb 18 • 4
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs Paper • 2605.12460 • Published 7 days ago • 17
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs Paper • 2603.24511 • Published Mar 25
FutureSim: Replaying World Events to Evaluate Adaptive Agents Paper • 2605.15188 • Published 5 days ago • 5