Collections
Discover the best community collections!
Collections including paper arxiv:2601.02204
-
Small Language Models are the Future of Agentic AI
Paper • 2506.02153 • Published • 23 -
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Paper • 2512.02556 • Published • 248 -
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 86 -
DeepSeek-OCR: Contexts Optical Compression
Paper • 2510.18234 • Published • 86
-
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 168 -
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper • 2505.22453 • Published • 46 -
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
Paper • 2505.23380 • Published • 22 -
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Paper • 2505.21523 • Published • 13
-
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 32 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 63 -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper • 2408.15237 • Published • 42 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 30
-
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper • 2512.02835 • Published • 9 -
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper • 2512.05044 • Published • 16 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 16 -
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper • 2512.05343 • Published • 24
-
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
Paper • 2509.24335 • Published • 8 -
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper • 2601.02204 • Published • 56 -
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
Paper • 2601.02256 • Published • 30
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 62 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53
-
facebook/w2v-bert-2.0
Feature Extraction • 0.6B • Updated • 2.39M • 199 -
facebook/metaclip-h14-fullcc2.5b
Zero-Shot Image Classification • 1.0B • Updated • 10.5k • 49 -
openai/clip-vit-large-patch14
Zero-Shot Image Classification • 0.4B • Updated • 7.34M • 1.95k -
Salesforce/blip-image-captioning-large
Image-to-Text • 0.5B • Updated • 1.46M • 1.44k
-
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper • 2512.02835 • Published • 9 -
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper • 2512.05044 • Published • 16 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 16 -
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper • 2512.05343 • Published • 24
-
Small Language Models are the Future of Agentic AI
Paper • 2506.02153 • Published • 23 -
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Paper • 2512.02556 • Published • 248 -
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 86 -
DeepSeek-OCR: Contexts Optical Compression
Paper • 2510.18234 • Published • 86
-
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
Paper • 2509.24335 • Published • 8 -
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper • 2601.02204 • Published • 56 -
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
Paper • 2601.02256 • Published • 30
-
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 168 -
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper • 2505.22453 • Published • 46 -
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
Paper • 2505.23380 • Published • 22 -
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Paper • 2505.21523 • Published • 13
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 62 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53
-
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 32 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 63 -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper • 2408.15237 • Published • 42 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 30
-
facebook/w2v-bert-2.0
Feature Extraction • 0.6B • Updated • 2.39M • 199 -
facebook/metaclip-h14-fullcc2.5b
Zero-Shot Image Classification • 1.0B • Updated • 10.5k • 49 -
openai/clip-vit-large-patch14
Zero-Shot Image Classification • 0.4B • Updated • 7.34M • 1.95k -
Salesforce/blip-image-captioning-large
Image-to-Text • 0.5B • Updated • 1.46M • 1.44k