David Fan

davidfan97 · dfan

AI & ML interests

Visual representation learning, videos, vision-language

Recent Activity

upvoted a paper 1 day ago

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published 2 days ago • 65

upvoted an article 4 days ago

SmolLM - blazingly fast and remarkably powerful

Article • Jul 16, 2024 • 446

liked a dataset 10 days ago

nyu-visionx/Cambrian-10M

Preview • Updated Jul 8, 2024 • 5.49k • 126

liked a Space 12 days ago

The Smol Training Playbook

The secrets to building world-class LLMs

liked a model 22 days ago

facebook/webssl-dino300m-full2b-224

Image Feature Extraction • 0.3B • Updated Apr 24, 2025 • 3.03k • 11

upvoted a collection 22 days ago

Scale RAE

Collection for "Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders" • 7 items • Updated Feb 2 • 3

upvoted a collection 4 months ago

RAE

Collection for Diffusion Transformers with Representation Autoencoders • 7 items • Updated 11 days ago • 11

liked a model 4 months ago

nyu-visionx/RAE-collections

Unconditional Image Generation • Updated 5 days ago • 46

upvoted 2 papers 5 months ago

OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows

Paper • 2510.03506 • Published Oct 3, 2025 • 15

Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training

Paper • 2509.26625 • Published Sep 30, 2025 • 43

upvoted a paper 7 months ago

Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 90

updated a model 7 months ago

facebook/ijepa_vith14_1k

Image Feature Extraction • 0.6B • Updated Aug 11, 2025 • 4.81k • 16

updated a model 9 months ago

facebook/vjepa2-vitg-fpc64-384

Video Classification • 1B • Updated Aug 11, 2025 • 15k • 38

upvoted a collection 9 months ago

V-JEPA 2

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 192

upvoted a collection 10 months ago

Cosmos-Tokenize1

A suite of image and video tokenizers • 8 items • Updated 2 days ago • 11

liked a model 10 months ago

facebook/webssl-dino7b-full8b-518

Image Feature Extraction • 6B • Updated Apr 24, 2025 • 15 • 12

published 4 models 11 months ago

facebook/webssl-mae3b-full2b-224

Image Feature Extraction • 3B • Updated Apr 24, 2025 • 8

facebook/webssl-mae2b-full2b-224

Image Feature Extraction • 2B • Updated Apr 24, 2025 • 7

facebook/webssl-mae1b-full2b-224

Image Feature Extraction • 1B • Updated Apr 24, 2025 • 8

facebook/webssl-mae700m-full2b-224

Image Feature Extraction • 0.6B • Updated Apr 24, 2025 • 13