VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning Paper • 2410.22995 • Published Oct 30, 2024 • 3
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios Paper • 2410.23746 • Published Oct 31, 2024 • 1
Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore Paper • 2405.04286 • Published May 7, 2024 • 1
Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost Paper • 2510.20780 • Published Oct 23, 2025 • 5
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration Paper • 2509.14760 • Published Sep 18, 2025 • 53
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published Mar 27, 2025 • 43
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22, 2025 • 61