OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning Paper • 2606.08572 • Published 5 days ago
EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts Paper • 2606.08362 • Published 6 days ago
CoVEBench: Can Video Editing Models Handle Complex Instructions? Paper • 2606.08415 • Published 5 days ago
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention Paper • 2606.09079 • Published 4 days ago
OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation Paper • 2606.08548 • Published 5 days ago
Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory Paper • 2606.09365 • Published 3 days ago
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting Paper • 2606.09809 • Published 4 days ago
Echo-Memory: A Controlled Study of Memory in Action World Models Paper • 2606.09803 • Published 4 days ago
Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short Paper • 2606.09380 • Published 3 days ago
Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses Paper • 2606.08348 • Published 6 days ago
Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops Paper • 2606.08960 • Published 4 days ago
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics Paper • 2606.09826 • Published 4 days ago
Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle Paper • 2606.09376 • Published 4 days ago
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems Paper • 2606.05304 • Published 9 days ago
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling Paper • 2606.09707 • Published 3 days ago
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning Paper • 2606.11087 • Published 3 days ago