WorldDreamer: Towards General World Models for Video Generation via
Predicting Masked Tokens
Paper
• 2401.09985
• Published
• 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper
• 2401.09962
• Published
• 9
Inflation with Diffusion: Efficient Temporal Adaptation for
Text-to-Video Super-Resolution
Paper
• 2401.10404
• Published
• 10
ActAnywhere: Subject-Aware Video Background Generation
Paper
• 2401.10822
• Published
• 13
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
• 2401.12945
• Published
• 87
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models
and Adapters with Decoupled Consistency Learning
Paper
• 2402.00769
• Published
• 22
VideoPrism: A Foundational Visual Encoder for Video Understanding
Paper
• 2402.13217
• Published
• 38
Video ReCap: Recursive Captioning of Hour-Long Videos
Paper
• 2402.13250
• Published
• 26
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video
Synthesis
Paper
• 2402.14797
• Published
• 21
Sora: A Review on Background, Technology, Limitations, and Opportunities
of Large Vision Models
Paper
• 2402.17177
• Published
• 88
Sora Generates Videos with Stunning Geometrical Consistency
Paper
• 2402.17403
• Published
• 18
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion
Latent Aligners
Paper
• 2402.17723
• Published
• 16
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper
• 2402.19479
• Published
• 35
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and
Diffusion Models
Paper
• 2403.03100
• Published
• 38
Tuning-Free Noise Rectification for High Fidelity Image-to-Video
Generation
Paper
• 2403.02827
• Published
• 9
Video Editing via Factorized Diffusion Distillation
Paper
• 2403.09334
• Published
• 22
Video Mamba Suite: State Space Model as a Versatile Alternative for
Video Understanding
Paper
• 2403.09626
• Published
• 15
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential
Equations
Paper
• 2108.01073
• Published
• 9
AnimateDiff-Lightning: Cross-Model Diffusion Distillation
Paper
• 2403.12706
• Published
• 18
Streaming Dense Video Captioning
Paper
• 2404.01297
• Published
• 13
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through
Direct Preference Optimization
Paper
• 2404.09956
• Published
• 12
MotionMaster: Training-free Camera Motion Transfer For Video Generation
Paper
• 2404.15789
• Published
• 13
LLM-AD: Large Language Model based Audio Description System
Paper
• 2405.00983
• Published
• 22
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Paper
• 2405.11473
• Published
• 56
ReVideo: Remake a Video with Motion and Content Control
Paper
• 2405.13865
• Published
• 25
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Paper
• 2405.14598
• Published
• 13
Denoising LM: Pushing the Limits of Error Correction Models for Speech
Recognition
Paper
• 2405.15216
• Published
• 15
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion
Models
Paper
• 2405.16537
• Published
• 17
Looking Backward: Streaming Video-to-Video Translation with Feature
Banks
Paper
• 2405.15757
• Published
• 15
Human4DiT: Free-view Human Video Generation with 4D Diffusion
Transformer
Paper
• 2405.17405
• Published
• 16
Collaborative Video Diffusion: Consistent Multi-video Generation with
Camera Control
Paper
• 2405.17414
• Published
• 12
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language
Models via Instruction Tuning
Paper
• 2405.18386
• Published
• 22
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model
with Mixed Reward Feedback
Paper
• 2405.18750
• Published
• 21
EasyAnimate: A High-Performance Long Video Generation Method based on
Transformer Architecture
Paper
• 2405.18991
• Published
• 12
MOFA-Video: Controllable Image Animation via Generative Motion Field
Adaptions in Frozen Image-to-Video Diffusion Model
Paper
• 2405.20222
• Published
• 11
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo
Benchmark
Paper
• 2405.19707
• Published
• 8
Learning Temporally Consistent Video Depth from Video Diffusion Priors
Paper
• 2406.01493
• Published
• 23
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video
Generation
Paper
• 2406.00908
• Published
• 12
Searching Priors Makes Text-to-Video Synthesis Better
Paper
• 2406.03215
• Published
• 13
ShareGPT4Video: Improving Video Understanding and Generation with Better
Captions
Paper
• 2406.04325
• Published
• 74
SF-V: Single Forward Video Generation Model
Paper
• 2406.04324
• Published
• 24
VideoTetris: Towards Compositional Text-to-Video Generation
Paper
• 2406.04277
• Published
• 25
MotionClone: Training-Free Motion Cloning for Controllable Video
Generation
Paper
• 2406.05338
• Published
• 41
NaRCan: Natural Refined Canonical Image with Integration of Diffusion
Prior for Video Editing
Paper
• 2406.06523
• Published
• 53
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Paper
• 2406.07792
• Published
• 16
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and
Video Generation
Paper
• 2406.07686
• Published
• 17
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and
Image-to-Video Generation
Paper
• 2406.08656
• Published
• 9
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing
Reliability, Reproducibility, and Practicality
Paper
• 2406.08845
• Published
• 9
ExVideo: Extending Video Diffusion Models via Parameter-Efficient
Post-Tuning
Paper
• 2406.14130
• Published
• 10
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human
Feedback for Video Generation
Paper
• 2406.15252
• Published
• 18
Video-Infinity: Distributed Long Video Generation
Paper
• 2406.16260
• Published
• 30
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image
Restoration Models
Paper
• 2407.01519
• Published
• 26
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
Paper
• 2407.00367
• Published
• 11
VIMI: Grounding Video Generation through Multi-modal Instruction
Paper
• 2407.06304
• Published
• 10
VEnhancer: Generative Space-Time Enhancement for Video Generation
Paper
• 2407.07667
• Published
• 16
Still-Moving: Customized Video Generation without Customized Video Data
Paper
• 2407.08674
• Published
• 13
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
Paper
• 2407.06188
• Published
• 3
TCAN: Animating Human Images with Temporally Consistent Pose Guidance
using Diffusion Models
Paper
• 2407.09012
• Published
• 10
Paper
• 2407.09533
• Published
• 8
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement
using Pre-trained Video Diffusion Models
Paper
• 2407.10285
• Published
• 5
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Paper
• 2407.12781
• Published
• 13
Towards Understanding Unsafe Video Generation
Paper
• 2407.12581
• Published
Streetscapes: Large-scale Consistent Street View Generation Using
Autoregressive Video Diffusion
Paper
• 2407.13759
• Published
• 18
Cinemo: Consistent and Controllable Image Animation with Motion
Diffusion Models
Paper
• 2407.15642
• Published
• 11
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
Paper
• 2407.16655
• Published
• 30
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video
Generation
Paper
• 2407.14505
• Published
• 26
FreeLong: Training-Free Long Video Generation with SpectralBlend
Temporal Attention
Paper
• 2407.19918
• Published
• 51
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Paper
• 2407.21705
• Published
• 27
Fine-gained Zero-shot Video Sampling
Paper
• 2407.21475
• Published
• 6
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual
Inversion
Paper
• 2408.00458
• Published
• 12
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified
Model
Paper
• 2408.00762
• Published
• 10
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
Paper
• 2408.02629
• Published
• 15
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually
Synced Facial Performer
Paper
• 2408.03284
• Published
• 11
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior
for Part-Level Dynamics
Paper
• 2408.04631
• Published
• 9
Kalman-Inspired Feature Propagation for Video Face Super-Resolution
Paper
• 2408.05205
• Published
• 9
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Paper
• 2408.06072
• Published
• 38
FancyVideo: Towards Dynamic and Consistent Video Generation via
Cross-frame Textual Guidance
Paper
• 2408.08189
• Published
• 17
Factorized-Dreamer: Training A High-Quality Video Generator with Limited
and Low-Quality Data
Paper
• 2408.10119
• Published
• 17
TWLV-I: Analysis and Insights from Holistic Evaluation on Video
Foundation Models
Paper
• 2408.11318
• Published
• 56
TrackGo: A Flexible and Efficient Method for Controllable Video
Generation
Paper
• 2408.11475
• Published
• 18
Real-Time Video Generation with Pyramid Attention Broadcast
Paper
• 2408.12588
• Published
• 17
CustomCrafter: Customized Video Generation with Preserving Motion and
Concept Composition Abilities
Paper
• 2408.13239
• Published
• 11
Training-free Long Video Generation with Chain of Diffusion Model
Experts
Paper
• 2408.13423
• Published
• 23
TVG: A Training-free Transition Video Generation Method with Diffusion
Models
Paper
• 2408.13413
• Published
• 14
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe
Interpolation
Paper
• 2408.15239
• Published
• 30
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video
Diffusion Model
Paper
• 2409.01199
• Published
• 14
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive
Content Generation
Paper
• 2409.01055
• Published
• 6
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion
Dependency
Paper
• 2409.02634
• Published
• 97
OSV: One Step is Enough for High-Quality Image to Video Generation
Paper
• 2409.11367
• Published
• 14
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Paper
• 2409.09401
• Published
• 7
LVCD: Reference-based Lineart Video Colorization with Diffusion Models
Paper
• 2409.12960
• Published
• 24
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient
Video Latent Generation
Paper
• 2409.12532
• Published
• 5
MIMO: Controllable Character Video Synthesis with Spatial Decomposed
Modeling
Paper
• 2409.16160
• Published
• 34
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Paper
• 2409.18964
• Published
• 27
VideoGuide: Improving Video Diffusion Models without Training Through a
Teacher's Guide
Paper
• 2410.04364
• Published
• 29
AuroraCap: Efficient, Performant Video Detailed Captioning and a New
Benchmark
Paper
• 2410.03051
• Published
• 6
Pyramidal Flow Matching for Efficient Video Generative Modeling
Paper
• 2410.05954
• Published
• 40
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through
Data, Reward, and Conditional Guidance Design
Paper
• 2410.05677
• Published
• 14
Loong: Generating Minute-level Long Videos with Autoregressive Language
Models
Paper
• 2410.02757
• Published
• 36
Animate-X: Universal Character Image Animation with Enhanced Motion
Representation
Paper
• 2410.10306
• Published
• 56
Cavia: Camera-controllable Multi-view Video Diffusion with
View-Integrated Attention
Paper
• 2410.10774
• Published
• 25
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
Paper
• 2410.10816
• Published
• 21
Movie Gen: A Cast of Media Foundation Models
Paper
• 2410.13720
• Published
• 100
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise
Motion Control
Paper
• 2410.13830
• Published
• 26
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language
Understanding
Paper
• 2410.17434
• Published
• 27
FasterCache: Training-Free Video Diffusion Model Acceleration with High
Quality
Paper
• 2410.19355
• Published
• 24
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
Paper
• 2410.20280
• Published
• 23
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video
Generation
Paper
• 2410.23277
• Published
• 9
Fashion-VDM: Video Diffusion Model for Virtual Try-On
Paper
• 2411.00225
• Published
• 11
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Paper
• 2411.02397
• Published
• 23
Motion Control for Enhanced Complex Action Video Generation
Paper
• 2411.08328
• Published
• 5
AnimateAnything: Consistent and Controllable Animation for Video
Generation
Paper
• 2411.10836
• Published
• 24
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
Paper
• 2411.11045
• Published
• 11
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Paper
• 2411.10818
• Published
• 26
VBench++: Comprehensive and Versatile Benchmark Suite for Video
Generative Models
Paper
• 2411.13503
• Published
• 34
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous
Driving with Adaptive Control
Paper
• 2411.13807
• Published
• 11
Efficient Long Video Tokenization via Coordinated-based Patch
Reconstruction
Paper
• 2411.14762
• Published
• 11
VideoRepair: Improving Text-to-Video Generation via Misalignment
Evaluation and Localized Refinement
Paper
• 2411.15115
• Published
• 10
DreamRunner: Fine-Grained Storytelling Video Generation with
Retrieval-Augmented Motion Adaptation
Paper
• 2411.16657
• Published
• 19
AnchorCrafter: Animate CyberAnchors Saling Your Products via
Human-Object Interacting Video Generation
Paper
• 2411.17383
• Published
• 7
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Paper
• 2411.17440
• Published
• 37
Free^2Guide: Gradient-Free Path Integral Control for Enhancing
Text-to-Video Generation with Large Vision-Language Models
Paper
• 2411.17041
• Published
• 13
Video Depth without Video Models
Paper
• 2411.19189
• Published
• 39
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Paper
• 2411.19108
• Published
• 20
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Paper
• 2411.18664
• Published
• 24
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion
Transformers
Paper
• 2411.18673
• Published
• 8
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding
by Video Spatiotemporal Augmentation
Paper
• 2412.00927
• Published
• 29
Open-Sora Plan: Open-Source Large Video Generation Model
Paper
• 2412.00131
• Published
• 33
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any
Point in Long Video
Paper
• 2411.18671
• Published
• 20
Paper
• 2411.18933
• Published
• 17
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent
Video Diffusion Model
Paper
• 2411.17459
• Published
• 12
Long Video Diffusion Generation with Segmented Cross-Attention and
Content-Rich Video Data Curation
Paper
• 2412.01316
• Published
• 10
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video
Generation
Paper
• 2412.02259
• Published
• 60
NVComposer: Boosting Generative Novel View Synthesis with Multiple
Sparse and Unposed Images
Paper
• 2412.03517
• Published
• 19
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Paper
• 2412.03085
• Published
• 12
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper
• 2412.04814
• Published
• 46
GenMAC: Compositional Text-to-Video Generation with Multi-Agent
Collaboration
Paper
• 2412.04440
• Published
• 22
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Paper
• 2412.05263
• Published
• 10
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Paper
• 2412.04432
• Published
• 16
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with
Mixture of Score Guidance
Paper
• 2412.05355
• Published
• 8
STIV: Scalable Text and Image Conditioned Video Generation
Paper
• 2412.07730
• Published
• 74
Paper
• 2412.07583
• Published
• 20
MoViE: Mobile Diffusion for Video Editing
Paper
• 2412.06578
• Published
• 18
Video Motion Transfer with Diffusion Transformers
Paper
• 2412.07776
• Published
• 17
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse
Viewpoints
Paper
• 2412.07760
• Published
• 55
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper
• 2412.07744
• Published
• 20
Track4Gen: Teaching Video Diffusion Models to Track Points Improves
Video Generation
Paper
• 2412.06016
• Published
• 20
DisPose: Disentangling Pose Guidance for Controllable Human Image
Animation
Paper
• 2412.09349
• Published
• 8
InstanceCap: Improving Text-to-Video Generation via Instance-aware
Structured Caption
Paper
• 2412.09283
• Published
• 19
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation
with Linear Computational Complexity
Paper
• 2412.09856
• Published
• 11
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video
Face Swapping
Paper
• 2412.11279
• Published
• 13
SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner
Paper
• 2412.10533
• Published
• 5
MIVE: New Design and Benchmark for Multi-Instance Video Editing
Paper
• 2412.12877
• Published
• 4
AniDoc: Animation Creation Made Easier
Paper
• 2412.14173
• Published
• 58
Autoregressive Video Generation without Vector Quantization
Paper
• 2412.14169
• Published
• 14
VidTok: A Versatile and Open-Source Video Tokenizer
Paper
• 2412.13061
• Published
• 8
Parallelized Autoregressive Visual Generation
Paper
• 2412.15119
• Published
• 53
TRecViT: A Recurrent Video Transformer
Paper
• 2412.14294
• Published
• 13
Large Motion Video Autoencoding with Cross-modal Video VAE
Paper
• 2412.17805
• Published
• 24
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion
Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Paper
• 2412.18597
• Published
• 20
MotiF: Making Text Count in Image Animation with Motion Focal Loss
Paper
• 2412.16153
• Published
• 6
VideoMaker: Zero-shot Customized Video Generation with the Inherent
Force of Video Diffusion Models
Paper
• 2412.19645
• Published
• 13
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion
Control
Paper
• 2501.01427
• Published
• 53
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with
Video LLM
Paper
• 2501.00599
• Published
• 46
LTX-Video: Realtime Video Latent Diffusion
Paper
• 2501.00103
• Published
• 50
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent
Diffusion Models
Paper
• 2501.01423
• Published
• 44
Unifying Specialized Visual Encoders for Video Language Models
Paper
• 2501.01426
• Published
• 20
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video
Restoration
Paper
• 2501.01320
• Published
• 12
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning
for Image and Video Generation
Paper
• 2412.21059
• Published
• 19
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for
Real-World Video Super-Resolution
Paper
• 2501.02976
• Published
• 56
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields
through Efficient Dense 3D Point Tracking
Paper
• 2501.02690
• Published
• 16
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video
Generation
Paper
• 2501.03059
• Published
• 22
TransPixar: Advancing Text-to-Video Generation with Transparency
Paper
• 2501.03006
• Published
• 25
Ingredients: Blending Custom Photos with Video Diffusion Transformers
Paper
• 2501.01790
• Published
• 8
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of
Images and Videos
Paper
• 2501.04001
• Published
• 47
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video
Generation Control
Paper
• 2501.03847
• Published
• 22
Magic Mirror: ID-Preserved Video Generation in Video Diffusion
Transformers
Paper
• 2501.03931
• Published
• 15
An Empirical Study of Autoregressive Pre-training from Videos
Paper
• 2501.05453
• Published
• 41
VideoRAG: Retrieval-Augmented Generation over Video Corpus
Paper
• 2501.05874
• Published
• 75
ConceptMaster: Multi-Concept Video Customization on Diffusion
Transformer Models Without Test-Time Tuning
Paper
• 2501.04698
• Published
• 15
Multi-subject Open-set Personalization in Video Generation
Paper
• 2501.06187
• Published
• 14
VideoAuteur: Towards Long Narrative Video Generation
Paper
• 2501.06173
• Published
• 31
Diffusion Adversarial Post-Training for One-Step Video Generation
Paper
• 2501.08316
• Published
• 36
RepVideo: Rethinking Cross-Layer Representation for Video Generation
Paper
• 2501.08994
• Published
• 15
Ouroboros-Diffusion: Exploring Consistent Content Generation in
Tuning-free Long Video Diffusion
Paper
• 2501.09019
• Published
• 12
Learnings from Scaling Visual Tokenizers for Reconstruction and
Generation
Paper
• 2501.09755
• Published
• 35
X-Dyna: Expressive Dynamic Human Image Animation
Paper
• 2501.10021
• Published
• 14
GameFactory: Creating New Games with Generative Interactive Videos
Paper
• 2501.08325
• Published
• 67
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Paper
• 2501.12375
• Published
• 23
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using
Real-Time Warped Noise
Paper
• 2501.08331
• Published
• 20
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
Paper
• 2501.10687
• Published
• 15
Taming Teacher Forcing for Masked Autoregressive Video Generation
Paper
• 2501.12389
• Published
• 10
Improving Video Generation with Human Feedback
Paper
• 2501.13918
• Published
• 52
DiffuEraser: A Diffusion Model for Video Inpainting
Paper
• 2501.10018
• Published
• 17
EchoVideo: Identity-Preserving Human Video Generation by Multimodal
Feature Fusion
Paper
• 2501.13452
• Published
• 8
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion
Generation in Video Models
Paper
• 2502.02492
• Published
• 66
DynVFX: Augmenting Real Videos with Dynamic Content
Paper
• 2502.03621
• Published
• 31
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video
Generation
Paper
• 2502.04299
• Published
• 18
Towards Physical Understanding in Video Generation: A 3D Point
Regularization Approach
Paper
• 2502.03639
• Published
• 9
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Paper
• 2502.05173
• Published
• 64
Fast Video Generation with Sliding Tile Attention
Paper
• 2502.04507
• Published
• 51
Goku: Flow Based Video Generative Foundation Models
Paper
• 2502.04896
• Published
• 106
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution
Video Generation
Paper
• 2502.05179
• Published
• 24
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for
Mobile Devices
Paper
• 2502.04363
• Published
• 12
History-Guided Video Diffusion
Paper
• 2502.06764
• Published
• 12
Magic 1-For-1: Generating One Minute Video Clips within One Minute
Paper
• 2502.07701
• Published
• 36
Enhance-A-Video: Better Generated Video for Free
Paper
• 2502.07508
• Published
• 21
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video
Generation
Paper
• 2502.07531
• Published
• 12
Light-A-Video: Training-free Video Relighting via Progressive Light
Fusion
Paper
• 2502.08590
• Published
• 42
CineMaster: A 3D-Aware and Controllable Framework for Cinematic
Text-to-Video Generation
Paper
• 2502.08639
• Published
• 43
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
Paper
• 2502.07737
• Published
• 9
Animate Anyone 2: High-Fidelity Character Image Animation with
Environment Affordance
Paper
• 2502.06145
• Published
• 18
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of
Video Foundation Model
Paper
• 2502.10248
• Published
• 57
Phantom: Subject-consistent video generation via cross-modal alignment
Paper
• 2502.11079
• Published
• 59
VideoGrain: Modulating Space-Time Attention for Multi-grained Video
Editing
Paper
• 2502.17258
• Published
• 79
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion
Transformers
Paper
• 2502.15894
• Published
• 20
UniTok: A Unified Tokenizer for Visual Generation and Understanding
Paper
• 2502.20321
• Published
• 30
Mobius: Text to Seamless Looping Video Generation via Latent Shift
Paper
• 2502.20307
• Published
• 18
The Best of Both Worlds: Integrating Language Models and Diffusion
Models for Video Generation
Paper
• 2503.04606
• Published
• 9
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play
Context Control
Paper
• 2503.05639
• Published
• 26
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos
via Diffusion Models
Paper
• 2503.05638
• Published
• 20
Automated Movie Generation via Multi-Agent CoT Planning
Paper
• 2503.07314
• Published
• 44
MagicInfinite: Generating Infinite Talking Videos with Your Words and
Voice
Paper
• 2503.05978
• Published
• 36
Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled
Sampling
Paper
• 2503.08605
• Published
• 27
TPDiff: Temporal Pyramid Video Diffusion Model
Paper
• 2503.09566
• Published
• 45
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
Paper
• 2503.09151
• Published
• 32
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in
$200k
Paper
• 2503.09642
• Published
• 20
Long Context Tuning for Video Generation
Paper
• 2503.10589
• Published
• 14
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Paper
• 2503.10391
• Published
• 12
Large-scale Pre-training for Grounded Video Caption Generation
Paper
• 2503.10781
• Published
• 16
Cockatiel: Ensembling Synthetic and Human Preferenced Training for
Detailed Video Caption
Paper
• 2503.09279
• Published
• 5
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal
Consistent Video Generation
Paper
• 2503.06053
• Published
• 138
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Paper
• 2503.13444
• Published
• 17
MTV-Inpaint: Multi-Task Long Video Inpainting
Paper
• 2503.11412
• Published
• 10
Long-Video Audio Synthesis with Multi-Agent Collaboration
Paper
• 2503.10719
• Published
• 9
WISA: World Simulator Assistant for Physics-Aware Text-to-Video
Generation
Paper
• 2503.08153
• Published
• 3
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal
Control
Paper
• 2503.14492
• Published
• 20
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View
Synthesis
Paper
• 2503.13265
• Published
• 15
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
Paper
• 2503.14151
• Published
• 10
Temporal Regularization Makes Your Video Generator Stronger
Paper
• 2503.15417
• Published
• 22
MusicInfuser: Making Video Diffusion Listen and Dance
Paper
• 2503.14505
• Published
• 12
MagicMotion: Controllable Video Generation with Dense-to-Sparse
Trajectory Guidance
Paper
• 2503.16421
• Published
• 11
MagicID: Hybrid Preference Optimization for ID-Consistent and
Dynamic-Preserved Video Customization
Paper
• 2503.12689
• Published
• 5
Enabling Versatile Controls for Video Diffusion Models
Paper
• 2503.16983
• Published
• 15
Video-T1: Test-Time Scaling for Video Generation
Paper
• 2503.18942
• Published
• 90
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
Paper
• 2503.18886
• Published
• 24
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper
• 2503.19325
• Published
• 73
FullDiT: Multi-Task Video Generative Foundation Model with Full
Attention
Paper
• 2503.19907
• Published
• 8
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long
Video Generation
Paper
• 2503.19881
• Published
• 6
Wan: Open and Advanced Large-Scale Video Generative Models
Paper
• 2503.20314
• Published
• 59
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
Paper
• 2503.19462
• Published
• 10
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Paper
• 2503.20822
• Published
• 16
SketchVideo: Sketch-based Video Generation and Editing
Paper
• 2503.23284
• Published
• 23
Any2Caption: Interpreting Any Condition to Caption for Controllable Video
Generation
Paper
• 2503.24379
• Published
• 76
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical
Spatio-Temporal Prior Synchronization
Paper
• 2503.23377
• Published
• 57
SkyReels-A2: Compose Anything in Video Diffusion Transformers
Paper
• 2504.02436
• Published
• 39
One-Minute Video Generation with Test-Time Training
Paper
• 2504.05298
• Published
• 110
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of
Photography
Paper
• 2504.07083
• Published
• 22
Caption Anything in Video: Fine-grained Object-centric Captioning via
Spatiotemporal Multimodal Prompting
Paper
• 2504.05541
• Published
• 15
DiTaiListener: Controllable High Fidelity Listener Video Generation with
Diffusion
Paper
• 2504.04010
• Published
• 9
Training-free Guidance in Text-to-Video Generation via Multimodal
Planning and Structured Noise Initialization
Paper
• 2504.08641
• Published
• 6
NormalCrafter: Learning Temporally Consistent Normals from Video
Diffusion Priors
Paper
• 2504.11427
• Published
• 19
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference
Optimization for Large Video Models
Paper
• 2504.13122
• Published
• 20
SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video
Generation via Spherical Latent Representation
Paper
• 2504.14396
• Published
• 27
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls
for Video Generation
Paper
• 2504.14899
• Published
• 20
Vidi: Large Multimodal Models for Video Understanding and Editing
Paper
• 2504.15681
• Published
• 14
RealisDance-DiT: Simple yet Strong Baseline towards Controllable
Character Animation in the Wild
Paper
• 2504.14977
• Published
• 10
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming
Videos
Paper
• 2504.17343
• Published
• 13
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models
Paper
• 2504.17414
• Published
• 18
Towards Understanding Camera Motions in Any Video
Paper
• 2504.15376
• Published
• 155
Subject-driven Video Generation via Disentangled Identity and Motion
Paper
• 2504.17816
• Published
• 12
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D
Physics Modeling for Complex Motion and Interaction
Paper
• 2504.21855
• Published
• 13
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video
Generation
Paper
• 2505.04512
• Published
• 36
Scaling Image and Video Generation via Test-Time Evolutionary Search
Paper
• 2505.17618
• Published
• 41
Model Already Knows the Best Noise: Bayesian Active Noise Selection via
Attention in Video Diffusion Model
Paper
• 2505.17561
• Published
• 31
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for
Subject-to-Video Generation
Paper
• 2505.20292
• Published
• 52
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via
Semantic-Aware Permutation
Paper
• 2505.18875
• Published
• 42
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Paper
• 2505.20287
• Published
• 20
HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for
Multiple Characters
Paper
• 2505.20156
• Published
• 1
MAGREF: Masked Guidance for Any-Reference Video Generation
Paper
• 2505.23742
• Published
• 11
ATI: Any Trajectory Instruction for Controllable Video Generation
Paper
• 2505.22944
• Published
• 6
Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion
Models
Paper
• 2506.00996
• Published
• 40
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video
Generation
Paper
• 2506.01144
• Published
• 15
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable
3D Scene Generation
Paper
• 2506.04225
• Published
• 28
IllumiCraft: Unified Geometry and Illumination Diffusion for
Controllable Video Generation
Paper
• 2506.03150
• Published
• 21
LayerFlow: A Unified Model for Layer-aware Video Generation
Paper
• 2506.04228
• Published
• 13
SeedVR2: One-Step Video Restoration via Diffusion Adversarial
Post-Training
Paper
• 2506.05301
• Published
• 59
FlowDirector: Training-Free Flow Steering for Precise Text-to-Video
Editing
Paper
• 2506.05046
• Published
• 2
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal
Interaction and Enhancement
Paper
• 2506.07848
• Published
• 4
Dynamic View Synthesis as an Inverse Problem
Paper
• 2506.08004
• Published
• 5
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video
Diffusion Models
Paper
• 2506.07177
• Published
• 23
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper
• 2506.09113
• Published
• 107
Autoregressive Adversarial Post-Training for Real-Time Interactive Video
Generation
Paper
• 2506.09350
• Published
• 48
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio
Conditions
Paper
• 2506.09984
• Published
• 14
Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion
Models
Paper
• 2506.09229
• Published
• 7
Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy
Paper
• 2506.22432
• Published
• 13
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
Paper
• 2506.23858
• Published
• 31
Radial Attention: O(n log n) Sparse Attention with Energy Decay for
Long Video Generation
Paper
• 2506.19852
• Published
• 42
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching
Paper
• 2506.23552
• Published
• 10
STR-Match: Matching SpatioTemporal Relevance Score for Training-Free
Video Editing
Paper
• 2506.22868
• Published
• 5
StreamDiT: Real-Time Streaming Text-to-Video Generation
Paper
• 2507.03745
• Published
• 32
Geometry Forcing: Marrying Video Diffusion and 3D Representation for
Consistent World Modeling
Paper
• 2507.07982
• Published
• 34
A Survey on Long-Video Storytelling Generation: Architectures,
Consistency, and Cinematic Quality
Paper
• 2507.07202
• Published
• 25
Lumos-1: On Autoregressive Video Generation from a Unified Model
Perspective
Paper
• 2507.08801
• Published
• 31
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New
Benchmarks
Paper
• 2507.11336
• Published
• 7
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame
Interpolation
Paper
• 2507.04984
• Published
• 6
SeC: Advancing Complex Video Object Segmentation via Progressive Concept
Construction
Paper
• 2507.15852
• Published
• 38
TokensGen: Harnessing Condensed Tokens for Long Video Generation
Paper
• 2507.15728
• Published
• 8
PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized
Timestep Adaptation
Paper
• 2507.16116
• Published
• 13
∇NABLA: Neighborhood Adaptive Block-Level Attention
Paper
• 2507.13546
• Published
• 125
Captain Cinema: Towards Short Movie Generation
Paper
• 2507.18634
• Published
• 42
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
Paper
• 2508.03694
• Published
• 52
DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a
Stage-Wise Diffusion Transformer Framework
Paper
• 2508.02807
• Published
• 13
Omni-Effects: Unified and Spatially-Controllable Visual Effects
Generation
Paper
• 2508.07981
• Published
• 63
CharacterShot: Controllable and Consistent 4D Character Animation
Paper
• 2508.07409
• Published
• 39
Cut2Next: Generating Next Shot via In-Context Tuning
Paper
• 2508.08244
• Published
• 13
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video
Generation
Paper
• 2508.07901
• Published
• 40
ToonComposer: Streamlining Cartoon Production with Generative
Post-Keyframing
Paper
• 2508.10881
• Published
• 52
Waver: Wave Your Way to Lifelike Video Generation
Paper
• 2508.15761
• Published
• 36
Wan-S2V: Audio-Driven Cinematic Video Generation
Paper
• 2508.18621
• Published
• 20
Mixture of Contexts for Long Video Generation
Paper
• 2508.21058
• Published
• 35
UniVerse-1: Unified Audio-Video Generation via Stitching of Experts
Paper
• 2509.06155
• Published
• 14
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal
Conditioning
Paper
• 2509.08519
• Published
• 128
Wan-Animate: Unified Character Animation and Replacement with Holistic
Replication
Paper
• 2509.14055
• Published
• 17
Lynx: Towards High-Fidelity Personalized Video Generation
Paper
• 2509.15496
• Published
• 13
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion
Transformer Models
Paper
• 2509.17627
• Published
• 66
ContextFlow: Training-Free Video Object Editing via Adaptive Context
Enrichment
Paper
• 2509.17818
• Published
• 8
DC-VideoGen: Efficient Video Generation with Deep Compression Video
Autoencoder
Paper
• 2509.25182
• Published
• 39
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Paper
• 2509.26391
• Published
• 22
UniVideo: Unified Understanding, Generation, and Editing for Videos
Paper
• 2510.08377
• Published
• 81
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal
Patches via In-Context Conditioning
Paper
• 2510.08555
• Published
• 64
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video
Super-Resolution
Paper
• 2510.08143
• Published
• 20
PickStyle: Video-to-Video Style Transfer with Context-Style Adapters
Paper
• 2510.07546
• Published
• 22
InstructX: Towards Unified Visual Editing with MLLM Guidance
Paper
• 2510.08485
• Published
• 18
Bridging Text and Video Generation: A Survey
Paper
• 2510.04999
• Published
• 6
Scaling Instruction-Based Video Editing with a High-Quality Synthetic
Dataset
Paper
• 2510.15742
• Published
• 51
Stable Video Infinity: Infinite-Length Video Generation with Error
Recycling
Paper
• 2510.09212
• Published
• 18
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal
Interactions
Paper
• 2511.03334
• Published
• 53
UniLumos: Fast and Unified Image and Video Relighting with
Physics-Plausible Feedback
Paper
• 2511.01678
• Published
• 38
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context
Learning
Paper
• 2510.25772
• Published
• 33
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
Paper
• 2511.19320
• Published
• 42
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
Paper
• 2511.21579
• Published
• 23
Plan-X: Instruct Video Generation via Semantic Planning
Paper
• 2511.17986
• Published
• 18
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Paper
• 2512.03041
• Published
• 64
Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout
Paper
• 2511.20649
• Published
• 48
Vision Bridge Transformer at Scale
Paper
• 2511.23199
• Published
• 46
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper
• 2512.08765
• Published
• 132
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation
Paper
• 2512.09363
• Published
• 72
Composing Concepts from Images and Videos via Concept-prompt Binding
Paper
• 2512.09824
• Published
• 28
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
Paper
• 2512.07831
• Published
• 17
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
Paper
• 2512.13507
• Published
• 40
SemanticGen: Video Generation in Semantic Space
Paper
• 2512.20619
• Published
• 93
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper
• 2512.16093
• Published
• 95
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
Paper
• 2512.17504
• Published
• 97
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
Paper
• 2512.24724
• Published
• 7
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper
• 2601.03233
• Published
• 154
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
Paper
• 2601.01425
• Published
• 52
SkyReels-V3 Technique Report
Paper
• 2601.17323
• Published
• 9
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
Paper
• 2602.03796
• Published
• 57