Zhou

FireFlyCourageous

Lattic-zjj

AI & ML interests

None yet

Recent Activity

upvoted a paper 8 days ago

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Paper • 2605.20183 • Published 9 days ago • 14

upvoted 2 papers 5 months ago

UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks

Paper • 2507.11336 • Published Jul 15, 2025 • 8

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Paper • 2512.19687 • Published Dec 22, 2025 • 3

upvoted an article 5 months ago

Article

SigLIP 2: A better multilingual vision language encoder

ariG23498, merve, qubvel-hf • Feb 21, 2025 • 214

upvoted a paper 5 months ago

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Paper • 2512.13303 • Published Dec 15, 2025 • 17

liked a dataset 6 months ago

nyu-visionx/VSI-590K

Preview • Updated Nov 7, 2025 • 4.73k • 22

upvoted a collection 7 months ago

Emu3.5

Collection

Native Multimodal Models are World Learners 🌍 • 4 items • Updated Feb 4 • 77

upvoted 2 papers 7 months ago

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

Paper • 2510.24711 • Published Oct 28, 2025 • 20

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Paper • 2509.24695 • Published Sep 29, 2025 • 53

upvoted a paper 8 months ago

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 171

liked a Space 11 months ago

Tar

🚀

Unified MLLM with Text-Aligned Representations

liked a Space 12 months ago

BAGEL

🚀

220

Demo for BAGEL

liked a dataset about 1 year ago

BLIP3o/BLIP3o-Pretrain-Long-Caption

Viewer • Updated Jun 26, 2025 • 27.2M • 8.53k • 64

liked a model about 1 year ago

deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1, 2025 • 21.1k • 3.61k

liked a dataset about 1 year ago

BLIP3o/BLIP3o-60k

Viewer • Updated May 25, 2025 • 7.1k • 1.14k • 36

liked a Space about 1 year ago

Video Generation Leaderboard

📊

209

Text to Video and Image to Video Arena & Leaderboard

updated a model about 1 year ago

FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1

Updated Apr 24, 2025

published a model about 1 year ago

FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1

Updated Apr 24, 2025

liked a dataset about 1 year ago

We-Math/We-Math

Viewer • Updated Aug 13, 2025 • 1.74k • 2.24k • 35

upvoted a paper about 1 year ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87
