Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers Paper • 2604.17632 • Published 12 days ago • 11
Dual-View Training for Instruction-Following Information Retrieval Paper • 2604.18845 • Published 11 days ago • 10
Thinking Out Loud: Do Reasoning Models Know When They're Right? Paper • 2504.06564 • Published Apr 9, 2025
The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models Paper • 2505.18497 • Published May 24, 2025 • 2
Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models Paper • 2505.20236 • Published May 26, 2025 • 3
DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router Paper • 2507.22050 • Published Jul 29, 2025
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs Paper • 2505.11413 • Published May 16, 2025
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards Paper • 2509.21882 • Published Sep 26, 2025
Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where? Paper • 2510.04434 • Published Oct 6, 2025 • 6
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published Jan 12 • 24
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published Jan 16 • 30
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 73
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment Paper • 2410.09421 • Published Oct 12, 2024
GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models Paper • 2412.12735 • Published Dec 17, 2024
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving Paper • 2509.12603 • Published Sep 16, 2025 • 9