oguzhanercan's Collections
Image-Video General Tasks
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Paper • 2501.04001 • Published • 47
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper • 2501.03895 • Published • 52
An Empirical Study of Autoregressive Pre-training from Videos
Paper • 2501.05453 • Published • 41
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
Paper • 2501.07556 • Published • 7
MINIMA: Modality Invariant Image Matching
Paper • 2412.19412 • Published • 4
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Paper • 2501.12375 • Published • 23
Intuitive physics understanding emerges from self-supervised pretraining on natural videos
Paper • 2502.11831 • Published • 20
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Paper • 2502.17157 • Published • 52
"Principal Components" Enable A New Language of Images
Paper • 2503.08685 • Published • 12
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Paper • 2503.06698 • Published • 4
Segment Any Motion in Videos
Paper • 2503.22268 • Published • 19
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
Paper • 2510.12747 • Published • 37