jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-1 Text Generation • 8B • Updated 8 days ago • 15 • 1
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 501
VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification Paper • 2604.01569 • Published Apr 2 • 13
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 627
All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models Paper • 2604.00479 • Published Apr 1 • 68
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 341
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 156
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 264
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published Feb 11 • 220
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 216