GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics Paper • 2602.12617 • Published Feb 13 • 20
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding Paper • 2605.18018 • Published 4 days ago • 19
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation Paper • 2604.25819 • Published 24 days ago • 17
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models Paper • 2508.01548 • Published Aug 3, 2025 • 14 • 2
Towards RAW Object Detection in Diverse Conditions Paper • 2411.15678 • Published Nov 24, 2024 • 1
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness Paper • 2501.07978 • Published Jan 14, 2025 • 1
Gaussian Splatting with Discretized SDF for Relightable Assets Paper • 2507.15629 • Published Jul 21, 2025 • 23