ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning Paper • 2512.02835 • Published Dec 2, 2025 • 9
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image Paper • 2512.05044 • Published Dec 4, 2025 • 16
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning Paper • 2512.05591 • Published Dec 5, 2025 • 16
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling Paper • 2512.05343 • Published Dec 5, 2025 • 24
World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty Paper • 2512.05927 • Published Dec 5, 2025 • 11
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published Dec 5, 2025 • 28
Vector Quantization using Gaussian Variational Autoencoder Paper • 2512.06609 • Published about 1 month ago • 1
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published 28 days ago • 128
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published 29 days ago • 43
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models Paper • 2512.07843 • Published Nov 24, 2025 • 21
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models Paper • 2512.08153 • Published 29 days ago • 7
SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos Paper • 2512.08406 • Published 29 days ago • 2
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos Paper • 2512.10881 • Published 26 days ago • 29
Evaluating Gemini Robotics Policies in a Veo World Simulator Paper • 2512.10675 • Published 26 days ago • 17
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 244
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published Nov 23, 2025 • 282
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published Dec 2, 2025 • 68
Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression Paper • 2512.05081 • Published Dec 4, 2025 • 30
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics Paper • 2512.13660 • Published 22 days ago • 36
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives Paper • 2512.14699 • Published 21 days ago • 27
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed Paper • 2512.14067 • Published 22 days ago • 13
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text Paper • 2512.16924 • Published 19 days ago • 25
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers Paper • 2512.16615 • Published 19 days ago • 4
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published 19 days ago • 19
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published 19 days ago • 72
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 20 days ago • 91
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 14 days ago • 49
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2512.20848 • Published 14 days ago • 30
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation Paper • 2512.17012 • Published 19 days ago • 42
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published 1 day ago • 45
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Paper • 2601.01425 • Published 3 days ago • 33
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation Paper • 2601.02256 • Published 1 day ago • 28
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published 14 days ago • 46