Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning Paper • 2601.21037 • Published 10 days ago • 13
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing Paper • 2602.01851 • Published 5 days ago • 16
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs Paper • 2510.24514 • Published Oct 28, 2025 • 22
Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published Sep 15, 2025 • 29
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought Paper • 2501.07542 • Published Jan 13, 2025 • 3
Generating Data for Symbolic Language with Large Language Models Paper • 2305.13917 • Published May 23, 2023
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners Paper • 2406.02537 • Published Jun 4, 2024
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Paper • 2201.05966 • Published Jan 16, 2022 • 1
On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning Paper • 2312.13772 • Published Dec 21, 2023