WorldDreamer: Towards General World Models for Video Generation via
Predicting Masked Tokens
Paper
• 2401.09985
• Published
• 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper
• 2401.09962
• Published
• 9
Inflation with Diffusion: Efficient Temporal Adaptation for
Text-to-Video Super-Resolution
Paper
• 2401.10404
• Published
• 10
ActAnywhere: Subject-Aware Video Background Generation
Paper
• 2401.10822
• Published
• 13
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
• 2401.12945
• Published
• 87
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models
and Adapters with Decoupled Consistency Learning
Paper
• 2402.00769
• Published
• 22
VideoPrism: A Foundational Visual Encoder for Video Understanding
Paper
• 2402.13217
• Published
• 38
Video ReCap: Recursive Captioning of Hour-Long Videos
Paper
• 2402.13250
• Published
• 26
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video
Synthesis
Paper
• 2402.14797
• Published
• 21
Sora: A Review on Background, Technology, Limitations, and Opportunities
of Large Vision Models
Paper
• 2402.17177
• Published
• 88
Sora Generates Videos with Stunning Geometrical Consistency
Paper
• 2402.17403
• Published
• 18
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion
Latent Aligners
Paper
• 2402.17723
• Published
• 16
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper
• 2402.19479
• Published
• 35
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and
Diffusion Models
Paper
• 2403.03100
• Published
• 38
Tuning-Free Noise Rectification for High Fidelity Image-to-Video
Generation
Paper
• 2403.02827
• Published
• 9
Video Editing via Factorized Diffusion Distillation
Paper
• 2403.09334
• Published
• 22
Video Mamba Suite: State Space Model as a Versatile Alternative for
Video Understanding
Paper
• 2403.09626
• Published
• 15
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential
Equations
Paper
• 2108.01073
• Published
• 9
AnimateDiff-Lightning: Cross-Model Diffusion Distillation
Paper
• 2403.12706
• Published
• 18
Streaming Dense Video Captioning
Paper
• 2404.01297
• Published
• 13
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through
Direct Preference Optimization
Paper
• 2404.09956
• Published
• 12
MotionMaster: Training-free Camera Motion Transfer For Video Generation
Paper
• 2404.15789
• Published
• 13
LLM-AD: Large Language Model based Audio Description System
Paper
• 2405.00983
• Published
• 22
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Paper
• 2405.11473
• Published
• 56
ReVideo: Remake a Video with Motion and Content Control
Paper
• 2405.13865
• Published
• 25
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Paper
• 2405.14598
• Published
• 13
Denoising LM: Pushing the Limits of Error Correction Models for Speech
Recognition
Paper
• 2405.15216
• Published
• 15
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion
Models
Paper
• 2405.16537
• Published
• 17
Looking Backward: Streaming Video-to-Video Translation with Feature
Banks
Paper
• 2405.15757
• Published
• 15
Human4DiT: Free-view Human Video Generation with 4D Diffusion
Transformer
Paper
• 2405.17405
• Published
• 16
Collaborative Video Diffusion: Consistent Multi-video Generation with
Camera Control
Paper
• 2405.17414
• Published
• 12
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language
Models via Instruction Tuning
Paper
• 2405.18386
• Published
• 22
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model
with Mixed Reward Feedback
Paper
• 2405.18750
• Published
• 21
EasyAnimate: A High-Performance Long Video Generation Method based on
Transformer Architecture
Paper
• 2405.18991
• Published
• 12
MOFA-Video: Controllable Image Animation via Generative Motion Field
Adaptions in Frozen Image-to-Video Diffusion Model
Paper
• 2405.20222
• Published
• 11
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo
Benchmark
Paper
• 2405.19707
• Published
• 8
Learning Temporally Consistent Video Depth from Video Diffusion Priors
Paper
• 2406.01493
• Published
• 23
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video
Generation
Paper
• 2406.00908
• Published
• 12
Searching Priors Makes Text-to-Video Synthesis Better
Paper
• 2406.03215
• Published
• 13
ShareGPT4Video: Improving Video Understanding and Generation with Better
Captions
Paper
• 2406.04325
• Published
• 74
SF-V: Single Forward Video Generation Model
Paper
• 2406.04324
• Published
• 24
VideoTetris: Towards Compositional Text-to-Video Generation
Paper
• 2406.04277
• Published
• 25
MotionClone: Training-Free Motion Cloning for Controllable Video
Generation
Paper
• 2406.05338
• Published
• 41
NaRCan: Natural Refined Canonical Image with Integration of Diffusion
Prior for Video Editing
Paper
• 2406.06523
• Published
• 53
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Paper
• 2406.07792
• Published
• 16
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and
Video Generation
Paper
• 2406.07686
• Published
• 17
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and
Image-to-Video Generation
Paper
• 2406.08656
• Published
• 9
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing
Reliability, Reproducibility, and Practicality
Paper
• 2406.08845
• Published
• 9
ExVideo: Extending Video Diffusion Models via Parameter-Efficient
Post-Tuning
Paper
• 2406.14130
• Published
• 10
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human
Feedback for Video Generation
Paper
• 2406.15252
• Published
• 18
Video-Infinity: Distributed Long Video Generation
Paper
• 2406.16260
• Published
• 30
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image
Restoration Models
Paper
• 2407.01519
• Published
• 26
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
Paper
• 2407.00367
• Published
• 11
VIMI: Grounding Video Generation through Multi-modal Instruction
Paper
• 2407.06304
• Published
• 10
VEnhancer: Generative Space-Time Enhancement for Video Generation
Paper
• 2407.07667
• Published
• 16
Still-Moving: Customized Video Generation without Customized Video Data
Paper
• 2407.08674
• Published
• 13
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
Paper
• 2407.06188
• Published
• 3
TCAN: Animating Human Images with Temporally Consistent Pose Guidance
using Diffusion Models
Paper
• 2407.09012
• Published
• 10
Paper
• 2407.09533
• Published
• 8
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement
using Pre-trained Video Diffusion Models
Paper
• 2407.10285
• Published
• 5
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Paper
• 2407.12781
• Published
• 13
Towards Understanding Unsafe Video Generation
Paper
• 2407.12581
• Published
Streetscapes: Large-scale Consistent Street View Generation Using
Autoregressive Video Diffusion
Paper
• 2407.13759
• Published
• 18
Cinemo: Consistent and Controllable Image Animation with Motion
Diffusion Models
Paper
• 2407.15642
• Published
• 11
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
Paper
• 2407.16655
• Published
• 30
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video
Generation
Paper
• 2407.14505
• Published
• 26
FreeLong: Training-Free Long Video Generation with SpectralBlend
Temporal Attention
Paper
• 2407.19918
• Published
• 51
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Paper
• 2407.21705
• Published
• 27
Fine-gained Zero-shot Video Sampling
Paper
• 2407.21475
• Published
• 6
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual
Inversion
Paper
• 2408.00458
• Published
• 12
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified
Model
Paper
• 2408.00762
• Published
• 10
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
Paper
• 2408.02629
• Published
• 15
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually
Synced Facial Performer
Paper
• 2408.03284
• Published
• 11
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior
for Part-Level Dynamics
Paper
• 2408.04631
• Published
• 9
Kalman-Inspired Feature Propagation for Video Face Super-Resolution
Paper
• 2408.05205
• Published
• 9
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Paper
• 2408.06072
• Published
• 38
FancyVideo: Towards Dynamic and Consistent Video Generation via
Cross-frame Textual Guidance
Paper
• 2408.08189
• Published
• 17
Factorized-Dreamer: Training A High-Quality Video Generator with Limited
and Low-Quality Data
Paper
• 2408.10119
• Published
• 17
TWLV-I: Analysis and Insights from Holistic Evaluation on Video
Foundation Models
Paper
• 2408.11318
• Published
• 56
TrackGo: A Flexible and Efficient Method for Controllable Video
Generation
Paper
• 2408.11475
• Published
• 18
Real-Time Video Generation with Pyramid Attention Broadcast
Paper
• 2408.12588
• Published
• 17
CustomCrafter: Customized Video Generation with Preserving Motion and
Concept Composition Abilities
Paper
• 2408.13239
• Published
• 11
Training-free Long Video Generation with Chain of Diffusion Model
Experts
Paper
• 2408.13423
• Published
• 23
TVG: A Training-free Transition Video Generation Method with Diffusion
Models
Paper
• 2408.13413
• Published
• 14
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe
Interpolation
Paper
• 2408.15239
• Published
• 30
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video
Diffusion Model
Paper
• 2409.01199
• Published
• 14
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive
Content Generation
Paper
• 2409.01055
• Published
• 6
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion
Dependency
Paper
• 2409.02634
• Published
• 97
OSV: One Step is Enough for High-Quality Image to Video Generation
Paper
• 2409.11367
• Published
• 14
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Paper
• 2409.09401
• Published
• 7
LVCD: Reference-based Lineart Video Colorization with Diffusion Models
Paper
• 2409.12960
• Published
• 24
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient
Video Latent Generation
Paper
• 2409.12532
• Published
• 5
MIMO: Controllable Character Video Synthesis with Spatial Decomposed
Modeling
Paper
• 2409.16160
• Published
• 34
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Paper
• 2409.18964
• Published
• 27
VideoGuide: Improving Video Diffusion Models without Training Through a
Teacher's Guide
Paper
• 2410.04364
• Published
• 29
AuroraCap: Efficient, Performant Video Detailed Captioning and a New
Benchmark
Paper
• 2410.03051
• Published
• 6
Pyramidal Flow Matching for Efficient Video Generative Modeling
Paper
• 2410.05954
• Published
• 40
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through
Data, Reward, and Conditional Guidance Design
Paper
• 2410.05677
• Published
• 14
Loong: Generating Minute-level Long Videos with Autoregressive Language
Models
Paper
• 2410.02757
• Published
• 36
Animate-X: Universal Character Image Animation with Enhanced Motion
Representation
Paper
• 2410.10306
• Published
• 56
Cavia: Camera-controllable Multi-view Video Diffusion with
View-Integrated Attention
Paper
• 2410.10774
• Published
• 25
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
Paper
• 2410.10816
• Published
• 21
Movie Gen: A Cast of Media Foundation Models
Paper
• 2410.13720
• Published
• 100
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise
Motion Control
Paper
• 2410.13830
• Published
• 26
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language
Understanding
Paper
• 2410.17434
• Published
• 27
FasterCache: Training-Free Video Diffusion Model Acceleration with High
Quality
Paper
• 2410.19355
• Published
• 24
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
Paper
• 2410.20280
• Published
• 23
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video
Generation
Paper
• 2410.23277
• Published
• 9
Fashion-VDM: Video Diffusion Model for Virtual Try-On
Paper
• 2411.00225
• Published
• 11
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Paper
• 2411.02397
• Published
• 23
Motion Control for Enhanced Complex Action Video Generation
Paper
• 2411.08328
• Published
• 5
AnimateAnything: Consistent and Controllable Animation for Video
Generation
Paper
• 2411.10836
• Published
• 24
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
Paper
• 2411.11045
• Published
• 11
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Paper
• 2411.10818
• Published
• 26
VBench++: Comprehensive and Versatile Benchmark Suite for Video
Generative Models
Paper
• 2411.13503
• Published
• 34
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous
Driving with Adaptive Control
Paper
• 2411.13807
• Published
• 11
Efficient Long Video Tokenization via Coordinated-based Patch
Reconstruction
Paper
• 2411.14762
• Published
• 11
VideoRepair: Improving Text-to-Video Generation via Misalignment
Evaluation and Localized Refinement
Paper
• 2411.15115
• Published
• 10
DreamRunner: Fine-Grained Storytelling Video Generation with
Retrieval-Augmented Motion Adaptation
Paper
• 2411.16657
• Published
• 19
AnchorCrafter: Animate CyberAnchors Saling Your Products via
Human-Object Interacting Video Generation
Paper
• 2411.17383
• Published
• 7
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Paper
• 2411.17440
• Published
• 37
Free^2Guide: Gradient-Free Path Integral Control for Enhancing
Text-to-Video Generation with Large Vision-Language Models
Paper
• 2411.17041
• Published
• 13
Video Depth without Video Models
Paper
• 2411.19189
• Published
• 39
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Paper
• 2411.19108
• Published
• 20
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Paper
• 2411.18664
• Published
• 24
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion
Transformers
Paper
• 2411.18673
• Published
• 8
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding
by Video Spatiotemporal Augmentation
Paper
• 2412.00927
• Published
• 29
Open-Sora Plan: Open-Source Large Video Generation Model
Paper
• 2412.00131
• Published
• 33
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any
Point in Long Video
Paper
• 2411.18671
• Published
• 20
Paper
• 2411.18933
• Published
• 17
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent
Video Diffusion Model
Paper
• 2411.17459
• Published
• 12
Long Video Diffusion Generation with Segmented Cross-Attention and
Content-Rich Video Data Curation
Paper
• 2412.01316
• Published
• 10
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video
Generation
Paper
• 2412.02259
• Published
• 60
NVComposer: Boosting Generative Novel View Synthesis with Multiple
Sparse and Unposed Images
Paper
• 2412.03517
• Published
• 19
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Paper
• 2412.03085
• Published
• 12
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper
• 2412.04814
• Published
• 46
GenMAC: Compositional Text-to-Video Generation with Multi-Agent
Collaboration
Paper
• 2412.04440
• Published
• 22
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Paper
• 2412.05263
• Published
• 10
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Paper
• 2412.04432
• Published
• 16
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with
Mixture of Score Guidance
Paper
• 2412.05355
• Published
• 8
STIV: Scalable Text and Image Conditioned Video Generation
Paper
• 2412.07730
• Published
• 74
Paper
• 2412.07583
• Published
• 20
MoViE: Mobile Diffusion for Video Editing
Paper
• 2412.06578
• Published
• 18
Video Motion Transfer with Diffusion Transformers
Paper
• 2412.07776
• Published
• 17
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse
Viewpoints
Paper
• 2412.07760
• Published
• 55
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper
• 2412.07744
• Published
• 20
Track4Gen: Teaching Video Diffusion Models to Track Points Improves
Video Generation
Paper
• 2412.06016
• Published
• 20
DisPose: Disentangling Pose Guidance for Controllable Human Image
Animation
Paper
• 2412.09349
• Published
• 8
InstanceCap: Improving Text-to-Video Generation via Instance-aware
Structured Caption
Paper
• 2412.09283
• Published
• 19
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation
with Linear Computational Complexity
Paper
• 2412.09856
• Published
• 11
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video
Face Swapping
Paper
• 2412.11279
• Published
• 13
SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner
Paper
• 2412.10533
• Published
• 5
MIVE: New Design and Benchmark for Multi-Instance Video Editing
Paper
• 2412.12877
• Published
• 4
AniDoc: Animation Creation Made Easier
Paper
• 2412.14173
• Published
• 58
Autoregressive Video Generation without Vector Quantization
Paper
• 2412.14169
• Published
• 14
VidTok: A Versatile and Open-Source Video Tokenizer
Paper
• 2412.13061
• Published
• 8
Parallelized Autoregressive Visual Generation
Paper
• 2412.15119
• Published
• 53
TRecViT: A Recurrent Video Transformer
Paper
• 2412.14294
• Published
• 13
Large Motion Video Autoencoding with Cross-modal Video VAE
Paper
• 2412.17805
• Published
• 24
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion
Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Paper
• 2412.18597
• Published
• 20
MotiF: Making Text Count in Image Animation with Motion Focal Loss
Paper
• 2412.16153
• Published
• 6
VideoMaker: Zero-shot Customized Video Generation with the Inherent
Force of Video Diffusion Models
Paper
• 2412.19645
• Published
• 13
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion
Control
Paper
• 2501.01427
• Published
• 53
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with
Video LLM
Paper
• 2501.00599
• Published
• 46
LTX-Video: Realtime Video Latent Diffusion
Paper
• 2501.00103
• Published
• 50
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent
Diffusion Models
Paper
• 2501.01423
• Published
• 44
Unifying Specialized Visual Encoders for Video Language Models
Paper
• 2501.01426
• Published
• 20
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video
Restoration
Paper
• 2501.01320
• Published
• 12
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning
for Image and Video Generation
Paper
• 2412.21059
• Published
• 19
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for
Real-World Video Super-Resolution
Paper
• 2501.02976
• Published
• 56
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields
through Efficient Dense 3D Point Tracking
Paper
• 2501.02690
• Published
• 16
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video
Generation
Paper
• 2501.03059
• Published
• 22
TransPixar: Advancing Text-to-Video Generation with Transparency
Paper
• 2501.03006
• Published
• 25
Ingredients: Blending Custom Photos with Video Diffusion Transformers
Paper
• 2501.01790
• Published
• 8
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of
Images and Videos
Paper
• 2501.04001
• Published
• 47
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video
Generation Control
Paper
• 2501.03847
• Published
• 22
Magic Mirror: ID-Preserved Video Generation in Video Diffusion
Transformers
Paper
• 2501.03931
• Published
• 15
An Empirical Study of Autoregressive Pre-training from Videos
Paper
• 2501.05453
• Published
• 41
VideoRAG: Retrieval-Augmented Generation over Video Corpus
Paper
• 2501.05874
• Published
• 75
ConceptMaster: Multi-Concept Video Customization on Diffusion
Transformer Models Without Test-Time Tuning
Paper
• 2501.04698
• Published
• 15
Multi-subject Open-set Personalization in Video Generation
Paper
• 2501.06187
• Published
• 14
VideoAuteur: Towards Long Narrative Video Generation
Paper
• 2501.06173
• Published
• 31
Diffusion Adversarial Post-Training for One-Step Video Generation
Paper
• 2501.08316
• Published
• 36
RepVideo: Rethinking Cross-Layer Representation for Video Generation
Paper
• 2501.08994
• Published
• 15
Ouroboros-Diffusion: Exploring Consistent Content Generation in
Tuning-free Long Video Diffusion
Paper
• 2501.09019
• Published
• 12
Learnings from Scaling Visual Tokenizers for Reconstruction and
Generation
Paper
• 2501.09755
• Published
• 35
X-Dyna: Expressive Dynamic Human Image Animation
Paper
• 2501.10021
• Published
• 14
GameFactory: Creating New Games with Generative Interactive Videos
Paper
• 2501.08325
• Published
• 67
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Paper
• 2501.12375
• Published
• 23
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using
Real-Time Warped Noise
Paper
• 2501.08331
• Published
• 20
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
Paper
• 2501.10687
• Published
• 15
Taming Teacher Forcing for Masked Autoregressive Video Generation
Paper
• 2501.12389
• Published
• 10
Improving Video Generation with Human Feedback
Paper
• 2501.13918
• Published
• 52
DiffuEraser: A Diffusion Model for Video Inpainting
Paper
• 2501.10018
• Published
• 17
EchoVideo: Identity-Preserving Human Video Generation by Multimodal
Feature Fusion
Paper
• 2501.13452
• Published
• 8
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion
Generation in Video Models
Paper
• 2502.02492
• Published
• 66
DynVFX: Augmenting Real Videos with Dynamic Content
Paper
• 2502.03621
• Published
• 31
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video
Generation
Paper
• 2502.04299
• Published
• 18
Towards Physical Understanding in Video Generation: A 3D Point
Regularization Approach
Paper
• 2502.03639
• Published
• 9
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Paper
• 2502.05173
• Published
• 64
Fast Video Generation with Sliding Tile Attention
Paper
• 2502.04507
• Published
• 51
Goku: Flow Based Video Generative Foundation Models
Paper
• 2502.04896
• Published
• 106
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution
Video Generation
Paper
• 2502.05179
• Published
• 24
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for
Mobile Devices
Paper
• 2502.04363
• Published
• 12
History-Guided Video Diffusion
Paper
• 2502.06764
• Published
• 12
Magic 1-For-1: Generating One Minute Video Clips within One Minute
Paper
• 2502.07701
• Published
• 36
Enhance-A-Video: Better Generated Video for Free
Paper
• 2502.07508
• Published
• 21
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video
Generation
Paper
• 2502.07531
• Published
• 12
Light-A-Video: Training-free Video Relighting via Progressive Light
Fusion
Paper
• 2502.08590
• Published
• 42
CineMaster: A 3D-Aware and Controllable Framework for Cinematic
Text-to-Video Generation
Paper
• 2502.08639
• Published
• 43
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
Paper
• 2502.07737
• Published
• 9
Animate Anyone 2: High-Fidelity Character Image Animation with
Environment Affordance
Paper
• 2502.06145
• Published
• 18
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of
Video Foundation Model
Paper
• 2502.10248
• Published
• 57
Phantom: Subject-consistent video generation via cross-modal alignment
Paper
• 2502.11079
• Published
• 59
VideoGrain: Modulating Space-Time Attention for Multi-grained Video
Editing
Paper
• 2502.17258
• Published
• 79
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion
Transformers
Paper
• 2502.15894
• Published
• 20
UniTok: A Unified Tokenizer for Visual Generation and Understanding
Paper
• 2502.20321
• Published
• 30
Mobius: Text to Seamless Looping Video Generation via Latent Shift
Paper
• 2502.20307
• Published
• 18
The Best of Both Worlds: Integrating Language Models and Diffusion
Models for Video Generation
Paper
• 2503.04606
• Published
• 9
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play
Context Control
Paper
• 2503.05639
• Published
• 26
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos
via Diffusion Models
Paper
• 2503.05638
• Published
• 20
Automated Movie Generation via Multi-Agent CoT Planning
Paper
• 2503.07314
• Published
• 44
MagicInfinite: Generating Infinite Talking Videos with Your Words and
Voice
Paper
• 2503.05978
• Published
• 36
Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled
Sampling
Paper
• 2503.08605
• Published
• 27
TPDiff: Temporal Pyramid Video Diffusion Model
Paper
• 2503.09566
• Published
• 45
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
Paper
• 2503.09151
• Published
• 32
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in
$200k
Paper
• 2503.09642
• Published
• 20
Long Context Tuning for Video Generation
Paper
• 2503.10589
• Published
• 14
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Paper
• 2503.10391
• Published
• 12
Large-scale Pre-training for Grounded Video Caption Generation
Paper
• 2503.10781
• Published
• 16
Cockatiel: Ensembling Synthetic and Human Preferenced Training for
Detailed Video Caption
Paper
• 2503.09279
• Published
• 5
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal
Consistent Video Generation
Paper
• 2503.06053
• Published
• 138
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Paper
• 2503.13444
• Published
• 17
MTV-Inpaint: Multi-Task Long Video Inpainting
Paper
• 2503.11412
• Published
• 10
Long-Video Audio Synthesis with Multi-Agent Collaboration
Paper
• 2503.10719
• Published
• 9
WISA: World Simulator Assistant for Physics-Aware Text-to-Video
Generation
Paper
• 2503.08153
• Published
• 3
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal
Control
Paper
• 2503.14492
• Published
• 20
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View
Synthesis
Paper
• 2503.13265
• Published
• 15
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
Paper
• 2503.14151
• Published
• 10
Temporal Regularization Makes Your Video Generator Stronger
Paper
• 2503.15417
• Published
• 22
MusicInfuser: Making Video Diffusion Listen and Dance
Paper
• 2503.14505
• Published
• 12
MagicMotion: Controllable Video Generation with Dense-to-Sparse
Trajectory Guidance
Paper
• 2503.16421
• Published
• 11
MagicID: Hybrid Preference Optimization for ID-Consistent and
Dynamic-Preserved Video Customization
Paper
• 2503.12689
• Published
• 5
Enabling Versatile Controls for Video Diffusion Models
Paper
• 2503.16983
• Published
• 15
Video-T1: Test-Time Scaling for Video Generation
Paper
• 2503.18942
• Published
• 90
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
Paper
• 2503.18886
• Published
• 24
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper
• 2503.19325
• Published
• 73
FullDiT: Multi-Task Video Generative Foundation Model with Full
Attention
Paper
• 2503.19907
• Published
• 8
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long
Video Generation
Paper
• 2503.19881
• Published
• 6
Wan: Open and Advanced Large-Scale Video Generative Models
Paper
• 2503.20314
• Published
• 59
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
Paper
• 2503.19462
• Published
• 10
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Paper
• 2503.20822
• Published
• 16
SketchVideo: Sketch-based Video Generation and Editing
Paper
• 2503.23284
• Published
• 23
Any2Caption: Interpreting Any Condition to Caption for Controllable Video
Generation
Paper
• 2503.24379
• Published
• 76
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical
Spatio-Temporal Prior Synchronization
Paper
• 2503.23377
• Published
• 57
SkyReels-A2: Compose Anything in Video Diffusion Transformers
Paper
• 2504.02436
• Published
• 39
One-Minute Video Generation with Test-Time Training
Paper
• 2504.05298
• Published
• 110
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of
Photography
Paper
• 2504.07083
• Published
• 22
Caption Anything in Video: Fine-grained Object-centric Captioning via
Spatiotemporal Multimodal Prompting
Paper
• 2504.05541
• Published
• 15
DiTaiListener: Controllable High Fidelity Listener Video Generation with
Diffusion
Paper
• 2504.04010
• Published
• 9
Training-free Guidance in Text-to-Video Generation via Multimodal
Planning and Structured Noise Initialization
Paper
• 2504.08641
• Published
• 6
NormalCrafter: Learning Temporally Consistent Normals from Video
Diffusion Priors
Paper
• 2504.11427
• Published
• 19
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference
Optimization for Large Video Models
Paper
• 2504.13122
• Published
• 20
SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video
Generation via Spherical Latent Representation
Paper
• 2504.14396
• Published
• 27
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls
for Video Generation
Paper
• 2504.14899
• Published
• 20
Vidi: Large Multimodal Models for Video Understanding and Editing
Paper
• 2504.15681
• Published
• 14
RealisDance-DiT: Simple yet Strong Baseline towards Controllable
Character Animation in the Wild
Paper
• 2504.14977
• Published
• 10
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming
Videos
Paper
• 2504.17343
• Published
• 13
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models
Paper
• 2504.17414
• Published
• 18
Towards Understanding Camera Motions in Any Video
Paper
• 2504.15376
• Published
• 155
Subject-driven Video Generation via Disentangled Identity and Motion
Paper
• 2504.17816
• Published
• 12
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D
Physics Modeling for Complex Motion and Interaction
Paper
• 2504.21855
• Published
• 13
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video
Generation
Paper
• 2505.04512
• Published
• 36
Scaling Image and Video Generation via Test-Time Evolutionary Search
Paper
• 2505.17618
• Published
• 41
Model Already Knows the Best Noise: Bayesian Active Noise Selection via
Attention in Video Diffusion Model
Paper
• 2505.17561
• Published
• 31
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for
Subject-to-Video Generation
Paper
• 2505.20292
• Published
• 52
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via
Semantic-Aware Permutation
Paper
• 2505.18875
• Published
• 42
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Paper
• 2505.20287
• Published
• 20
HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for
Multiple Characters
Paper
• 2505.20156
• Published
• 1
MAGREF: Masked Guidance for Any-Reference Video Generation
Paper
• 2505.23742
• Published
• 11
ATI: Any Trajectory Instruction for Controllable Video Generation
Paper
• 2505.22944
• Published
• 6
Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion
Models
Paper
• 2506.00996
• Published
• 40
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video
Generation
Paper
• 2506.01144
• Published
• 15
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable
3D Scene Generation
Paper
• 2506.04225
• Published
• 28
IllumiCraft: Unified Geometry and Illumination Diffusion for
Controllable Video Generation
Paper
• 2506.03150
• Published
• 21
LayerFlow: A Unified Model for Layer-aware Video Generation
Paper
• 2506.04228
• Published
• 13
SeedVR2: One-Step Video Restoration via Diffusion Adversarial
Post-Training
Paper
• 2506.05301
• Published
• 59
FlowDirector: Training-Free Flow Steering for Precise Text-to-Video
Editing
Paper
• 2506.05046
• Published
• 2
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal
Interaction and Enhancement
Paper
• 2506.07848
• Published
• 4
Dynamic View Synthesis as an Inverse Problem
Paper
• 2506.08004
• Published
• 5
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video
Diffusion Models
Paper
• 2506.07177
• Published
• 23
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper
• 2506.09113
• Published
• 107
Autoregressive Adversarial Post-Training for Real-Time Interactive Video
Generation
Paper
• 2506.09350
• Published
• 48
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio
Conditions
Paper
• 2506.09984
• Published
• 14
Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion
Models
Paper
• 2506.09229
• Published
• 7
Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy
Paper
• 2506.22432
• Published
• 13
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
Paper
• 2506.23858
• Published
• 31
Radial Attention: O(n log n) Sparse Attention with Energy Decay for
Long Video Generation
Paper
• 2506.19852
• Published
• 42
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching
Paper
• 2506.23552
• Published
• 10
STR-Match: Matching SpatioTemporal Relevance Score for Training-Free
Video Editing
Paper
• 2506.22868
• Published
• 5
StreamDiT: Real-Time Streaming Text-to-Video Generation
Paper
• 2507.03745
• Published
• 32
Geometry Forcing: Marrying Video Diffusion and 3D Representation for
Consistent World Modeling
Paper
• 2507.07982
• Published
• 34
A Survey on Long-Video Storytelling Generation: Architectures,
Consistency, and Cinematic Quality
Paper
• 2507.07202
• Published
• 25
Lumos-1: On Autoregressive Video Generation from a Unified Model
Perspective
Paper
• 2507.08801
• Published
• 31
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New
Benchmarks
Paper
• 2507.11336
• Published
• 7
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame
Interpolation
Paper
• 2507.04984
• Published
• 6
SeC: Advancing Complex Video Object Segmentation via Progressive Concept
Construction
Paper
• 2507.15852
• Published
• 38
TokensGen: Harnessing Condensed Tokens for Long Video Generation
Paper
• 2507.15728
• Published
• 8
PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized
Timestep Adaptation
Paper
• 2507.16116
• Published
• 13
∇NABLA: Neighborhood Adaptive Block-Level Attention
Paper
• 2507.13546
• Published
• 125
Captain Cinema: Towards Short Movie Generation
Paper
• 2507.18634
• Published
• 42
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
Paper
• 2508.03694
• Published
• 52
DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a
Stage-Wise Diffusion Transformer Framework
Paper
• 2508.02807
• Published
• 13
Omni-Effects: Unified and Spatially-Controllable Visual Effects
Generation
Paper
• 2508.07981
• Published
• 63
CharacterShot: Controllable and Consistent 4D Character Animation
Paper
• 2508.07409
• Published
• 39
Cut2Next: Generating Next Shot via In-Context Tuning
Paper
• 2508.08244
• Published
• 13
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video
Generation
Paper
• 2508.07901
• Published
• 40
ToonComposer: Streamlining Cartoon Production with Generative
Post-Keyframing
Paper
• 2508.10881
• Published
• 52
Waver: Wave Your Way to Lifelike Video Generation
Paper
• 2508.15761
• Published
• 36
Wan-S2V: Audio-Driven Cinematic Video Generation
Paper
• 2508.18621
• Published
• 20
Mixture of Contexts for Long Video Generation
Paper
• 2508.21058
• Published
• 35
UniVerse-1: Unified Audio-Video Generation via Stitching of Experts
Paper
• 2509.06155
• Published
• 14
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal
Conditioning
Paper
• 2509.08519
• Published
• 128
Wan-Animate: Unified Character Animation and Replacement with Holistic
Replication
Paper
• 2509.14055
• Published
• 17
Lynx: Towards High-Fidelity Personalized Video Generation
Paper
• 2509.15496
• Published
• 13
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion
Transformer Models
Paper
• 2509.17627
• Published
• 66
ContextFlow: Training-Free Video Object Editing via Adaptive Context
Enrichment
Paper
• 2509.17818
• Published
• 8
DC-VideoGen: Efficient Video Generation with Deep Compression Video
Autoencoder
Paper
• 2509.25182
• Published
• 39
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Paper
• 2509.26391
• Published
• 22
UniVideo: Unified Understanding, Generation, and Editing for Videos
Paper
• 2510.08377
• Published
• 81
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal
Patches via In-Context Conditioning
Paper
• 2510.08555
• Published
• 64
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video
Super-Resolution
Paper
• 2510.08143
• Published
• 20
PickStyle: Video-to-Video Style Transfer with Context-Style Adapters
Paper
• 2510.07546
• Published
• 22
InstructX: Towards Unified Visual Editing with MLLM Guidance
Paper
• 2510.08485
• Published
• 18
Bridging Text and Video Generation: A Survey
Paper
• 2510.04999
• Published
• 6
Scaling Instruction-Based Video Editing with a High-Quality Synthetic
Dataset
Paper
• 2510.15742
• Published
• 51
Stable Video Infinity: Infinite-Length Video Generation with Error
Recycling
Paper
• 2510.09212
• Published
• 18
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal
Interactions
Paper
• 2511.03334
• Published
• 53
UniLumos: Fast and Unified Image and Video Relighting with
Physics-Plausible Feedback
Paper
• 2511.01678
• Published
• 38
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context
Learning
Paper
• 2510.25772
• Published
• 33
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
Paper
• 2511.19320
• Published
• 42
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
Paper
• 2511.21579
• Published
• 23
Plan-X: Instruct Video Generation via Semantic Planning
Paper
• 2511.17986
• Published
• 18
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Paper
• 2512.03041
• Published
• 64
Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout
Paper
• 2511.20649
• Published
• 48
Vision Bridge Transformer at Scale
Paper
• 2511.23199
• Published
• 46
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper
• 2512.08765
• Published
• 132
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation
Paper
• 2512.09363
• Published
• 72
Composing Concepts from Images and Videos via Concept-prompt Binding
Paper
• 2512.09824
• Published
• 28
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
Paper
• 2512.07831
• Published
• 17
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
Paper
• 2512.13507
• Published
• 40
SemanticGen: Video Generation in Semantic Space
Paper
• 2512.20619
• Published
• 93
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper
• 2512.16093
• Published
• 95
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
Paper
• 2512.17504
• Published
• 97
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
Paper
• 2512.24724
• Published
• 7
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper
• 2601.03233
• Published
• 154
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
Paper
• 2601.01425
• Published
• 52
SkyReels-V3 Technique Report
Paper
• 2601.17323
• Published
• 9
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
Paper
• 2602.03796
• Published
• 57