VLMs - a hg2wzh Collection

VLMs

Updated Apr 25, 2025

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 78 upvotes

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published Dec 11, 2024 • 48 upvotes

AIDC-AI/Ovis2-2B

Image-Text-to-Text • 2B • Updated Aug 15, 2025 • 1.31k downloads • 60 likes

DAMO-NLP-SG/VideoLLaMA3-2B

Video-Text-to-Text • 2B • Updated Sep 3, 2025 • 2.06k downloads • 16 likes

AIDC-AI/Ovis2-16B

Image-Text-to-Text • 16B • Updated Aug 15, 2025 • 72 downloads • 101 likes

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated 30 days ago • 183k downloads • 1.56k likes

StarJiaxing/R1-Omni-0.5B

1B • Updated Mar 24, 2025 • 22 downloads • 82 likes

Skywork/Skywork-R1V2-38B

Image-Text-to-Text • 38B • Updated Jun 10, 2025 • 58 downloads • 126 likes