Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models Paper • 2604.16593 • Published 23 days ago • 6
$OneMillion-Bench: How Far are Language Agents from Human Experts? Paper • 2603.07980 • Published Mar 9 • 27
LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts Paper • 2602.14060 • Published Feb 15 • 2
Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs Paper • 2508.19594 • Published Aug 27, 2025 • 3
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Paper • 2512.07461 • Published Dec 8, 2025 • 79
ICM-Fusion: In-Context Meta-Optimized LoRA Fusion for Multi-Task Adaptation Paper • 2508.04153 • Published Aug 6, 2025
Make an Offer They Can't Refuse: Grounding Bayesian Persuasion in Real-World Dialogues without Pre-Commitment Paper • 2510.13387 • Published Oct 15, 2025
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia Paper • 2512.03318 • Published Dec 3, 2025 • 4
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling Paper • 2506.08672 • Published Jun 10, 2025 • 30
ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection Paper • 2505.16475 • Published May 22, 2025 • 3
Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models Paper • 2405.02861 • Published May 5, 2024 • 1