Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts Paper • 2509.04500 • Published Sep 2, 2025 • 5
EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents Paper • 2412.13549 • Published Dec 18, 2024
The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning Paper • 2406.11721 • Published Jun 17, 2024
ShortageSim: Simulating Drug Shortages under Information Asymmetry Paper • 2509.01813 • Published Sep 1, 2025
xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning Paper • 2510.08439 • Published Oct 9, 2025 • 1
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Paper • 2511.02734 • Published Nov 4, 2025 • 22
LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering Paper • 2511.13998 • Published Nov 17, 2025 • 3
From Word to World: Can Large Language Models be Implicit Text-based World Models? Paper • 2512.18832 • Published Dec 21, 2025 • 15
ISACL: Internal State Analyzer for Copyrighted Training Data Leakage Paper • 2508.17767 • Published Aug 25, 2025 • 1
Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs Paper • 2602.07276 • Published Feb 7, 2026 • 11
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data Paper • 2602.21320 • Published Feb 24, 2026 • 12
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14, 2026 • 105
PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning Paper • 2601.11957 • Published Jan 28, 2026 • 3
Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning Paper • 2510.01932 • Published Oct 4, 2025