Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image
Synthesis
Paper
• 2401.09048
• Published
• 10
Improving fine-grained understanding in image-text pre-training
Paper
• 2401.09865
• Published
• 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper
• 2401.10891
• Published
• 62
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic
Image Restoration In the Wild
Paper
• 2401.13627
• Published
• 78
UNIMO-G: Unified Image Generation through Multimodal Conditional
Diffusion
Paper
• 2401.13388
• Published
• 13
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image
Editing
Paper
• 2402.02583
• Published
• 8
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Paper
• 2402.13929
• Published
• 27
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with
Trajectory Stitching
Paper
• 2402.14167
• Published
• 11
Subobject-level Image Tokenization
Paper
• 2402.14327
• Published
• 18
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept
Composition
Paper
• 2402.15504
• Published
• 21
Multi-LoRA Composition for Image Generation
Paper
• 2402.16843
• Published
• 31
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with
Audio2Video Diffusion Model under Weak Conditions
Paper
• 2402.17485
• Published
• 194
DistriFusion: Distributed Parallel Inference for High-Resolution
Diffusion Models
Paper
• 2402.19481
• Published
• 22
Trajectory Consistency Distillation
Paper
• 2402.19159
• Published
• 16
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain
Text-to-Image Customization
Paper
• 2403.00483
• Published
• 16
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Paper
• 2403.02084
• Published
• 15
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
Virtual Try-on
Paper
• 2403.01779
• Published
• 30
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Paper
• 2403.03206
• Published
• 71
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper
• 2403.05135
• Published
• 45
Motion Mamba: Efficient and Long Sequence Motion Generation with
Hierarchical and Bidirectional Selective SSM
Paper
• 2403.07487
• Published
• 16
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper
• 2403.09622
• Published
• 17
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based
Semantic Control
Paper
• 2403.09055
• Published
• 26
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of
Text-to-Image Models
Paper
• 2403.13535
• Published
• 23
DepthFM: Fast Monocular Depth Estimation with Flow Matching
Paper
• 2403.13788
• Published
• 18
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos
Paper
• 2403.13044
• Published
• 15
FlashFace: Human Image Personalization with High-fidelity Identity
Preservation
Paper
• 2403.17008
• Published
• 22
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper
• 2403.16627
• Published
• 22
ViTAR: Vision Transformer with Any Resolution
Paper
• 2403.18361
• Published
• 55
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object
Removal and Insertion
Paper
• 2403.18818
• Published
• 28
CosmicMan: A Text-to-Image Foundation Model for Humans
Paper
• 2404.01294
• Published
• 17
Condition-Aware Neural Network for Controlled Image Generation
Paper
• 2404.01143
• Published
• 13
Measuring Style Similarity in Diffusion Models
Paper
• 2404.01292
• Published
• 17
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept
Matching
Paper
• 2404.03653
• Published
• 35
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Paper
• 2404.03673
• Published
• 15
ControlNet++: Improving Conditional Controls with Efficient Consistency
Feedback
Paper
• 2404.07987
• Published
• 48
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and
Training Strategies
Paper
• 2404.08197
• Published
• 29
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse
Controls to Any Diffusion Model
Paper
• 2404.09967
• Published
• 21
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Paper
• 2404.09990
• Published
• 14
Dynamic Typography: Bringing Words to Life
Paper
• 2404.11614
• Published
• 46
MoA: Mixture-of-Attention for Subject-Context Disentanglement in
Personalized Image Generation
Paper
• 2404.11565
• Published
• 15
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image
Synthesis
Paper
• 2404.13686
• Published
• 29
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper
• 2404.14507
• Published
• 23
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Paper
• 2404.16022
• Published
• 25
Editable Image Elements for Controllable Synthesis
Paper
• 2404.16029
• Published
• 12
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with
Reward Feedback Learning
Paper
• 2404.15449
• Published
• 14
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity
Preserving
Paper
• 2404.16771
• Published
• 19
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video
Generation
Paper
• 2405.01434
• Published
• 56
Customizing Text-to-Image Models with a Single Image Pair
Paper
• 2405.01536
• Published
• 22
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and
Attribute Control
Paper
• 2405.12970
• Published
• 25
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
Paper
• 2405.14677
• Published
• 11
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
Paper
• 2405.14224
• Published
• 15
Semantica: An Adaptable Image-Conditioned Diffusion Model
Paper
• 2405.14857
• Published
• 11
EM Distillation for One-step Diffusion Models
Paper
• 2405.16852
• Published
• 12
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
Paper
• 2405.16759
• Published
• 8
Paper
• 2405.18407
• Published
• 48
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Paper
• 2406.04333
• Published
• 38
pOps: Photo-Inspired Diffusion Operators
Paper
• 2406.01300
• Published
• 17
Zero-shot Image Editing with Reference Imitation
Paper
• 2406.07547
• Published
• 33
An Image is Worth 32 Tokens for Reconstruction and Generation
Paper
• 2406.07550
• Published
• 60
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Paper
• 2406.06911
• Published
• 12
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent
Font Effect Generation
Paper
• 2406.08392
• Published
• 21
Paper
• 2406.09414
• Published
• 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on
Individual Pixels
Paper
• 2406.09415
• Published
• 51
Alleviating Distortion in Image Generation via Multi-Resolution
Diffusion Models
Paper
• 2406.09416
• Published
• 29
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal
Prompts
Paper
• 2406.09162
• Published
• 14
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual
Visual Text Rendering
Paper
• 2406.10208
• Published
• 22
Exploring the Role of Large Language Models in Prompt Encoding for
Diffusion Models
Paper
• 2406.11831
• Published
• 22
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN
Inversion and High Quality Image Editing
Paper
• 2406.10601
• Published
• 70
Invertible Consistency Distillation for Text-Guided Image Editing in
Around 7 Steps
Paper
• 2406.14539
• Published
• 27
DreamBench++: A Human-Aligned Benchmark for Personalized Image
Generation
Paper
• 2406.16855
• Published
• 57
Aligning Diffusion Models with Noise-Conditioned Perception
Paper
• 2406.17636
• Published
• 27
Magic Insert: Style-Aware Drag-and-Drop
Paper
• 2407.02489
• Published
• 21
DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
Paper
• 2407.03300
• Published
• 14
PartCraft: Crafting Creative Objects by Parts
Paper
• 2407.04604
• Published
• 6
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive
Canvas Layout
Paper
• 2404.00412
• Published
• 2
DataDream: Few-shot Guided Dataset Generation
Paper
• 2407.10910
• Published
• 10
Scaling Diffusion Transformers to 16 Billion Parameters
Paper
• 2407.11633
• Published
• 26
IMAGDressing-v1: Customizable Virtual Dressing
Paper
• 2407.12705
• Published
• 13
CGB-DM: Content and Graphic Balance Layout Generation with
Transformer-based Diffusion Model
Paper
• 2407.15233
• Published
• 7
Artist: Aesthetically Controllable Text-Driven Stylization without
Training
Paper
• 2407.15842
• Published
• 14
Paper
• 2407.15595
• Published
• 14
ViPer: Visual Personalization of Generative Models via Individual
Preference Learning
Paper
• 2407.17365
• Published
• 13
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Paper
• 2407.16982
• Published
• 42
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular
Depth Estimation
Paper
• 2407.17952
• Published
• 32
SHIC: Shape-Image Correspondences with no Keypoint Supervision
Paper
• 2407.18907
• Published
• 41
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
Paper
• 2408.00735
• Published
• 16
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy
Curvature of Attention
Paper
• 2408.00760
• Published
• 7
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation
with Multimodal Generative Pretraining
Paper
• 2408.02657
• Published
• 35
ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative
Generation
Paper
• 2408.02226
• Published
• 11
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning
using Instruct Prompts
Paper
• 2408.03209
• Published
• 22
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware
Open-domain Visual Storytelling
Paper
• 2408.03695
• Published
• 13
ControlNeXt: Powerful and Efficient Control for Image and Video
Generation
Paper
• 2408.06070
• Published
• 55
BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion
Paper
• 2408.04785
• Published
• 8
UniPortrait: A Unified Framework for Identity-Preserving Single- and
Multi-Human Image Personalization
Paper
• 2408.05939
• Published
• 14
Paper
• 2408.07009
• Published
• 62
ZePo: Zero-Shot Portrait Stylization with Faster Sampling
Paper
• 2408.05492
• Published
• 7
Paper
• 2408.07116
• Published
• 20
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
Paper
• 2408.08459
• Published
• 45
TurboEdit: Instant text-based image editing
Paper
• 2408.08332
• Published
• 20
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
Paper
• 2408.09702
• Published
• 11
TraDiffusion: Trajectory-Based Training-Free Image Generation
Paper
• 2408.09739
• Published
• 9
MegaFusion: Extend Diffusion Models towards Higher-resolution Image
Generation without Further Tuning
Paper
• 2408.11001
• Published
• 13
The Brittleness of AI-Generated Image Watermarking Techniques: Examining
Their Robustness Against Visual Paraphrasing Attacks
Paper
• 2408.10446
• Published
• 9
Scalable Autoregressive Image Generation with Mamba
Paper
• 2408.12245
• Published
• 26
CODE: Confident Ordinary Differential Editing
Paper
• 2408.12418
• Published
• 4
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its
Teacher
Paper
• 2408.14176
• Published
• 62
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image
Generation
Paper
• 2408.14819
• Published
• 22
Distribution Backtracking Builds A Faster Convergence Trajectory for
One-step Diffusion Distillation
Paper
• 2408.15991
• Published
• 16
CSGO: Content-Style Composition in Text-to-Image Generation
Paper
• 2408.16766
• Published
• 18
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image
Personalization
Paper
• 2408.15914
• Published
• 24
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion
Transformers
Paper
• 2408.17131
• Published
• 11
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper
• 2409.02097
• Published
• 34
Accurate Compression of Text-to-Image Diffusion Models via Vector
Quantization
Paper
• 2409.00492
• Published
• 11
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free
Real Image Editing
Paper
• 2409.01322
• Published
• 96
IFAdapter: Instance Feature Control for Grounded Text-to-Image
Generation
Paper
• 2409.08240
• Published
• 22
InstantDrag: Improving Interactivity in Drag-based Image Editing
Paper
• 2409.08857
• Published
• 34
StoryMaker: Towards Holistic Consistent Characters in Text-to-image
Generation
Paper
• 2409.12576
• Published
• 16
Imagine yourself: Tuning-Free Personalized Image Generation
Paper
• 2409.13346
• Published
• 69
Colorful Diffuse Intrinsic Image Decomposition in the Wild
Paper
• 2409.13690
• Published
• 13
Improvements to SDXL in NovelAI Diffusion V3
Paper
• 2409.15997
• Published
• 13
Pixel-Space Post-Training of Latent Diffusion Models
Paper
• 2409.17565
• Published
• 20
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal
Instruction
Paper
• 2410.04932
• Published
• 9
Accelerating Auto-regressive Text-to-Image Generation with Training-free
Speculative Jacobi Decoding
Paper
• 2410.01699
• Published
• 18
IterComp: Iterative Composition-Aware Feedback Learning from Model
Gallery for Text-to-Image Generation
Paper
• 2410.07171
• Published
• 43
Story-Adapter: A Training-free Iterative Framework for Long Story
Visualization
Paper
• 2410.06244
• Published
• 20
Eliminating Oversaturation and Artifacts of High Guidance Scales in
Diffusion Models
Paper
• 2410.02416
• Published
• 34
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial
Diffusion and Masked Generative Models
Paper
• 2410.08207
• Published
• 19
Meissonic: Revitalizing Masked Generative Transformers for Efficient
High-Resolution Text-to-Image Synthesis
Paper
• 2410.08261
• Published
• 52
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large
Vision-Language Models
Paper
• 2410.07133
• Published
• 19
Semantic Image Inversion and Editing using Rectified Stochastic
Differential Equations
Paper
• 2410.10792
• Published
• 31
Efficient Diffusion Models: A Comprehensive Survey from Principles to
Practices
Paper
• 2410.11795
• Published
• 18
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Paper
• 2410.11817
• Published
• 15
Fluid: Scaling Autoregressive Text-to-image Generative Models with
Continuous Tokens
Paper
• 2410.13863
• Published
• 37
VidPanos: Generative Panoramic Videos from Casual Panning Videos
Paper
• 2410.13832
• Published
• 13
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion
Model
Paper
• 2410.13925
• Published
• 24
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved
Visual Representation Capabilities
Paper
• 2410.14672
• Published
• 8
Scalable Ranked Preference Optimization for Text-to-Image Generation
Paper
• 2410.18013
• Published
• 14
Stable Consistency Tuning: Understanding and Improving Consistency
Models
Paper
• 2410.18958
• Published
• 10
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe
Dataset Curation
Paper
• 2410.18666
• Published
• 19
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse
Autoencoders
Paper
• 2410.22366
• Published
• 84
Constant Acceleration Flow
Paper
• 2411.00322
• Published
• 24
In-Context LoRA for Diffusion Transformers
Paper
• 2410.23775
• Published
• 11
Training-free Regional Prompting for Diffusion Transformers
Paper
• 2411.02395
• Published
• 25
Constrained Diffusion Implicit Models
Paper
• 2411.00359
• Published
• 6
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion
Models
Paper
• 2411.05007
• Published
• 24
Add-it: Training-Free Object Insertion in Images With Pretrained
Diffusion Models
Paper
• 2411.07232
• Published
• 68
OmniEdit: Building Image Editing Generalist Models Through Specialist
Supervision
Paper
• 2411.07199
• Published
• 50
Edify Image: High-Quality Image Generation with Pixel Space Laplacian
Diffusion Models
Paper
• 2411.07126
• Published
• 30
Watermark Anything with Localized Messages
Paper
• 2411.07231
• Published
• 21
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified
Multimodal Understanding and Generation
Paper
• 2411.07975
• Published
• 31
Scaling Properties of Diffusion Models for Perceptual Tasks
Paper
• 2411.08034
• Published
• 13
MagicQuill: An Intelligent Interactive Image Editing System
Paper
• 2411.09703
• Published
• 80
Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply
Better Samples
Paper
• 2411.08954
• Published
• 10
Region-Aware Text-to-Image Generation via Hard Binding and Soft
Refinement
Paper
• 2411.06558
• Published
• 36
FitDiT: Advancing the Authentic Garment Details for High-fidelity
Virtual Try-on
Paper
• 2411.10499
• Published
• 13
Continuous Speculative Decoding for Autoregressive Image Generation
Paper
• 2411.11925
• Published
• 16
Stylecodes: Encoding Stylistic Information For Image Generation
Paper
• 2411.12811
• Published
• 12
Generating Compositional Scenes via Text-to-image RGBA Instance
Generation
Paper
• 2411.10913
• Published
• 4
Stable Flow: Vital Layers for Training-Free Image Editing
Paper
• 2411.14430
• Published
• 22
Style-Friendly SNR Sampler for Style-Driven Generation
Paper
• 2411.14793
• Published
• 39
OminiControl: Minimal and Universal Control for Diffusion Transformer
Paper
• 2411.15098
• Published
• 61
MyTimeMachine: Personalized Facial Age Transformation
Paper
• 2411.14521
• Published
• 23
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot
Subject-Driven Image Generator
Paper
• 2411.15466
• Published
• 39
One Diffusion to Generate Them All
Paper
• 2411.16318
• Published
• 28
Controllable Human Image Generation with Personalized Multi-Garments
Paper
• 2411.16801
• Published
• 3
ROICtrl: Boosting Instance Control for Visual Generation
Paper
• 2411.17949
• Published
• 87
DreamCache: Finetuning-Free Lightweight Personalized Image Generation
via Feature Caching
Paper
• 2411.17786
• Published
• 12
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Paper
• 2411.17787
• Published
• 12
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Paper
• 2411.18616
• Published
• 16
Omegance: A Single Parameter for Various Granularities in
Diffusion-Based Synthesis
Paper
• 2411.17769
• Published
• 8
Edit Away and My Face Will not Stay: Personal Biometric Defense against
Malicious Generative Editing
Paper
• 2411.16832
• Published
• 2
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction
using Diffusion Models
Paper
• 2411.18350
• Published
• 28
FAM Diffusion: Frequency and Attention Modulation for High-Resolution
Image Generation with Stable Diffusion
Paper
• 2411.18552
• Published
• 18
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Paper
• 2412.01819
• Published
• 34
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge
Paper
• 2412.00176
• Published
• 9
SNOOPI: Supercharged One-step Diffusion Distillation with Proper
Guidance
Paper
• 2412.02687
• Published
• 113
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and
Generation
Paper
• 2412.03069
• Published
• 34
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene
Relighting
Paper
• 2412.00177
• Published
• 8
A Noise is Worth Diffusion Guidance
Paper
• 2412.03895
• Published
• 29
Negative Token Merging: Image-based Adversarial Feature Guidance
Paper
• 2412.01339
• Published
• 22
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent
Diffusion Models
Paper
• 2412.04146
• Published
• 23
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution
Image Synthesis
Paper
• 2412.04431
• Published
• 17
ZipAR: Accelerating Autoregressive Image Generation through Spatial
Locality
Paper
• 2412.04062
• Published
• 8
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step
Diffusion
Paper
• 2412.04301
• Published
• 40
PanoDreamer: 3D Panorama Synthesis from a Single Image
Paper
• 2412.04827
• Published
• 10
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for
Customized Manga Generation
Paper
• 2412.07589
• Published
• 48
Hidden in the Noise: Two-Stage Robust Watermarking for Images
Paper
• 2412.04653
• Published
• 30
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion
Models
Paper
• 2412.07674
• Published
• 20
UniReal: Universal Image Generation and Editing via Learning Real-world
Dynamics
Paper
• 2412.07774
• Published
• 30
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style
Conditioned Image Generation
Paper
• 2412.05148
• Published
• 12
Learning Flow Fields in Attention for Controllable Person Image
Generation
Paper
• 2412.08486
• Published
• 36
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow
Models
Paper
• 2412.08629
• Published
• 13
StyleStudio: Text-Driven Style Transfer with Selective Control of Style
Elements
Paper
• 2412.08503
• Published
• 8
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via
Multimodal LLM
Paper
• 2412.09618
• Published
• 21
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices
with Efficient Architectures and Training
Paper
• 2412.09619
• Published
• 30
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Paper
• 2412.09622
• Published
• 8
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free
Scale Fusion
Paper
• 2412.09626
• Published
• 21
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven
Generation
Paper
• 2412.08645
• Published
• 12
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Paper
• 2412.07517
• Published
• 11
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Paper
• 2412.09611
• Published
• 11
BrushEdit: All-In-One Image Inpainting and Editing
Paper
• 2412.10316
• Published
• 36
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper
• 2412.11815
• Published
• 26
Causal Diffusion Transformers for Generative Modeling
Paper
• 2412.12095
• Published
• 23
FashionComposer: Compositional Fashion Image Generation
Paper
• 2412.14168
• Published
• 17
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting
with Diffusion Transformers
Paper
• 2412.12571
• Published
• 8
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper
• 2412.15213
• Published
• 28
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
Paper
• 2412.14462
• Published
• 15
Paper
• 2412.18653
• Published
• 86
The Superposition of Diffusion Models Using the Itô Density Estimator
Paper
• 2412.17762
• Published
• 13
From Elements to Design: A Layered Approach for Automatic Graphic Design
Composition
Paper
• 2412.19712
• Published
• 15
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention
Mixing Control
Paper
• 2412.20800
• Published
• 11
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
Paper
• 2501.02576
• Published
• 15
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit
Control
Paper
• 2501.02260
• Published
• 5
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper
• 2501.05441
• Published
• 95
MangaNinja: Line Art Colorization with Precise Reference Following
Paper
• 2501.08332
• Published
• 62
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Paper
• 2501.06751
• Published
• 32
Democratizing Text-to-Image Masked Generative Models with Compact
Text-Aware One-Dimensional Tokens
Paper
• 2501.07730
• Published
• 18
FramePainter: Endowing Interactive Image Editing with Video Diffusion
Priors
Paper
• 2501.08225
• Published
• 20
3DIS-FLUX: simple and efficient multi-instance generation with DiT
rendering
Paper
• 2501.05131
• Published
• 37
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising
Steps
Paper
• 2501.09732
• Published
• 72
SynthLight: Portrait Relighting with Diffusion Model by Learning to
Re-render Synthetic Faces
Paper
• 2501.09756
• Published
• 20
Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions
Paper
• 2501.10020
• Published
• 24
TokenVerse: Versatile Multi-concept Personalization in Token Modulation
Space
Paper
• 2501.12224
• Published
• 48
GPS as a Control Signal for Image Generation
Paper
• 2501.12390
• Published
• 15
Can We Generate Images with CoT? Let's Verify and Reinforce Image
Generation Step by Step
Paper
• 2501.13926
• Published
• 43
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation
Using a Single Prompt
Paper
• 2501.13554
• Published
• 9
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and
Modulation
Paper
• 2403.14614
• Published
• 2
Denoising as Adaptation: Noise-Space Domain Adaptation for Image
Restoration
Paper
• 2406.18516
• Published
• 4
Visual Generation Without Guidance
Paper
• 2501.15420
• Published
• 8
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute
in Linear Diffusion Transformer
Paper
• 2501.18427
• Published
• 24
Inverse Bridge Matching Distillation
Paper
• 2502.01362
• Published
• 27
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion
Transformer
Paper
• 2502.01105
• Published
• 21
Weak-to-Strong Diffusion with Reflection
Paper
• 2502.00473
• Published
• 24
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Paper
• 2502.03738
• Published
• 11
Dual Caption Preference Optimization for Diffusion Models
Paper
• 2502.06023
• Published
• 9
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient
Text-to-Image Generation
Paper
• 2502.08690
• Published
• 43
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Paper
• 2502.09411
• Published
• 22
Precise Parameter Localization for Textual Generation in Diffusion
Models
Paper
• 2502.09935
• Published
• 12
Region-Adaptive Sampling for Diffusion Transformers
Paper
• 2502.10389
• Published
• 53
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
• 2502.10458
• Published
• 38
Diffusion Models without Classifier-free Guidance
Paper
• 2502.12154
• Published
• 8
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
Paper
• 2502.14397
• Published
• 41
One-step Diffusion Models with f-Divergence Distribution Matching
Paper
• 2502.15681
• Published
• 8
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Paper
• 2502.17157
• Published
• 52
GCC: Generative Color Constancy via Diffusing a Color Checker
Paper
• 2502.17435
• Published
• 29
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent
Image Generation
Paper
• 2502.18364
• Published
• 36
KV-Edit: Training-Free Image Editing for Precise Background Preservation
Paper
• 2502.17363
• Published
• 37
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
Paper
• 2502.18461
• Published
• 17
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven
Language Representation
Paper
• 2502.18302
• Published
• 5
GHOST 2.0: generative high-fidelity one shot transfer of heads
Paper
• 2502.18417
• Published
• 67
Distill Any Depth: Distillation Creates a Stronger Monocular Depth
Estimator
Paper
• 2502.19204
• Published
• 11
UniTok: A Unified Tokenizer for Visual Generation and Understanding
Paper
• 2502.20321
• Published
• 30
Multimodal Representation Alignment for Image Generation: Text-Image
Interleaved Control Is Easier Than You Think
Paper
• 2502.20172
• Published
• 29
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality
Samples with Less Compute
Paper
• 2502.20126
• Published
• 19
Training Consistency Models with Variational Noise Coupling
Paper
• 2502.18197
• Published
• 7
How far can we go with ImageNet for Text-to-Image generation?
Paper
• 2502.21318
• Published
• 26
RectifiedHR: Enable Efficient High-Resolution Image Generation via
Energy Rectification
Paper
• 2503.02537
• Published
• 12
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with
Multimodal Large Language Model
Paper
• 2503.06141
• Published
• 4
Unleashing the Potential of Large Language Models for Text-to-Image
Generation through Autoregressive Representation Alignment
Paper
• 2503.07334
• Published
• 16
Seedream 2.0: A Native Chinese-English Bilingual Image Generation
Foundation Model
Paper
• 2503.07703
• Published
• 37
LightGen: Efficient Image Generation through Knowledge Distillation and
Direct Preference Optimization
Paper
• 2503.08619
• Published
• 20
ObjectMover: Generative Object Movement with Video Prior
Paper
• 2503.08037
• Published
• 5
Alias-Free Latent Diffusion Models: Improving Fractional Shift
Equivariance of Diffusion Latent Space
Paper
• 2503.09419
• Published
• 6
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
Paper
• 2503.10613
• Published
• 79
Silent Branding Attack: Trigger-free Data Poisoning Attack on
Text-to-Image Diffusion Models
Paper
• 2503.09669
• Published
• 35
OmniPaint: Mastering Object-Oriented Editing via Disentangled
Insertion-Removal Inpainting
Paper
• 2503.08677
• Published
• 29
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency
Distillation
Paper
• 2503.09641
• Published
• 42
ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style
Transfer
Paper
• 2503.10614
• Published
• 8
Autoregressive Image Generation with Randomized Parallel Decoding
Paper
• 2503.10568
• Published
• 9
Piece it Together: Part-Based Concepting with IP-Priors
Paper
• 2503.10365
• Published
• 8
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference
Time by Leveraging Sparsity
Paper
• 2503.07677
• Published
• 86
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale
Text-to-Image Models
Paper
• 2503.12885
• Published
• 43
Edit Transfer: Learning Image Editing via Vision In-Context Relations
Paper
• 2503.13327
• Published
• 29
BlobCtrl: A Unified and Flexible Framework for Element-level Image
Generation and Editing
Paper
• 2503.13434
• Published
• 27
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
Paper
• 2503.13070
• Published
• 10
GenStereo: Towards Open-World Generation of Stereo Images and
Unsupervised Matching
Paper
• 2503.12720
• Published
• 4
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the
LLM Era
Paper
• 2503.12329
• Published
• 27
Atlas: Multi-Scale Attention Improves Long Context Image Modeling
Paper
• 2503.12355
• Published
• 12
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion
Transformers via In-Context Reflection
Paper
• 2503.12271
• Published
• 9
LEGION: Learning to Ground and Explain for Synthetic Image Detection
Paper
• 2503.15264
• Published
• 21
Scale-wise Distillation of Diffusion Models
Paper
• 2503.16397
• Published
• 41
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
Paper
• 2503.14487
• Published
• 28
Ultra-Resolution Adaptation with Ease
Paper
• 2503.16322
• Published
• 13
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields
Paper
• 2503.17095
• Published
• 5
Single Image Iterative Subject-driven Generation and Editing
Paper
• 2503.16025
• Published
• 14
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent
Diffusion Models
Paper
• 2503.18352
• Published
• 6
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame
Animated Sticker Generation
Paper
• 2503.17735
• Published
• 3
Inference-Time Scaling for Flow Models via Stochastic Generation and
Rollover Budget Forcing
Paper
• 2503.19385
• Published
• 34
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection
with Artifact Explanation
Paper
• 2503.14905
• Published
• 20
Latent Space Super-Resolution for Higher-Resolution Image Generation
with Diffusion Models
Paper
• 2503.18446
• Published
• 12
Unconditional Priors Matter! Improving Conditional Generation of
Fine-Tuned Diffusion Models
Paper
• 2503.20240
• Published
• 22
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data
Synthesis
Paper
• 2503.21749
• Published
• 26
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Paper
• 2503.21758
• Published
• 22
Unified Multimodal Discrete Diffusion
Paper
• 2503.20853
• Published
• 9
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual
Scenes
Paper
• 2503.23461
• Published
• 94
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and
Diffusion Refinement
Paper
• 2504.01934
• Published
• 22
Boost Your Own Human Image Generation Model via Direct Preference
Optimization with AI Feedback
Paper
• 2405.20216
• Published
• 21
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via
Iterative Instruction Tuning and Reinforcement Learning
Paper
• 2504.02949
• Published
• 21
SPF-Portrait: Towards Pure Portrait Customization with Semantic
Pollution-Free Fine-tuning
Paper
• 2504.00396
• Published
• 3
Concept Lancet: Image Editing with Compositional Representation
Transplant
Paper
• 2504.02828
• Published
• 16
An Empirical Study of GPT-4o Image Generation Capabilities
Paper
• 2504.05979
• Published
• 64
Less-to-More Generalization: Unlocking More Controllability by
In-Context Generation
Paper
• 2504.02160
• Published
• 37
Tuning-Free Image Editing with Fidelity and Editability via Unified
Latent Diffusion Model
Paper
• 2504.05594
• Published
• 11
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned
Guidance
Paper
• 2504.06232
• Published
• 13
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published
• 77
Are We Done with Object-Centric Learning?
Paper
• 2504.07092
• Published
• 6
VisualCloze: A Universal Image Generation Framework via Visual
In-Context Learning
Paper
• 2504.07960
• Published
• 50
Compass Control: Multi Object Orientation Control for Text-to-Image
Generation
Paper
• 2504.06752
• Published
• 9
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
Autoregressive Image Generation
Paper
• 2504.08736
• Published
• 46
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image
Restoration
Paper
• 2504.08591
• Published
• 18
PixelFlow: Pixel-Space Generative Models with Flow
Paper
• 2504.07963
• Published
• 18
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation
through Pretraining, SFT, and RL
Paper
• 2504.11455
• Published
• 14
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
• 2504.09454
• Published
• 11
Cobra: Efficient Line Art COlorization with BRoAder References
Paper
• 2504.12240
• Published
• 27
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion
Transformers
Paper
• 2504.10483
• Published
• 22
DMM: Building a Versatile Image Generation Model via Distillation-Based
Model Merging
Paper
• 2504.12364
• Published
• 22
InstantCharacter: Personalize Any Characters with a Scalable Diffusion
Transformer Framework
Paper
• 2504.12395
• Published
• 16
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal
in Large Images
Paper
• 2504.09621
• Published
• 11
LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping
Paper
• 2504.08902
• Published
• 8
Personalized Text-to-Image Generation with Auto-Regressive Models
Paper
• 2504.13162
• Published
• 18
From Reflection to Perfection: Scaling Inference-Time Optimization for
Text-to-Image Diffusion Models via Reflection Tuning
Paper
• 2504.16080
• Published
• 15
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via
Triplet ID Group Learning
Paper
• 2504.14509
• Published
• 53
DreamO: A Unified Framework for Image Customization
Paper
• 2504.16915
• Published
• 24
Step1X-Edit: A Practical Framework for General Image Editing
Paper
• 2504.17761
• Published
• 92
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image
Generation
Paper
• 2504.17502
• Published
• 55
Token-Shuffle: Towards High-Resolution Image Generation with
Autoregressive Models
Paper
• 2504.17789
• Published
• 23
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Paper
• 2504.16064
• Published
• 14
RepText: Rendering Visual Text via Replicating
Paper
• 2504.19724
• Published
• 31
In-Context Edit: Enabling Instructional Image Editing with In-Context
Generation in Large Scale Diffusion Transformer
Paper
• 2504.20690
• Published
• 19
PixelHacker: Image Inpainting with Structural and Semantic Consistency
Paper
• 2504.20438
• Published
• 44
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based
Image Editing
Paper
• 2505.02370
• Published
• 14
MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset
via Attention Routing
Paper
• 2505.02823
• Published
• 5
Flow-GRPO: Training Flow Matching Models via Online RL
Paper
• 2505.05470
• Published
• 88
Unified Continuous Generative Models
Paper
• 2505.07447
• Published
• 42
MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills
Paper
• 2505.06176
• Published
• 12
LightLab: Controlling Light Sources in Images with Diffusion Models
Paper
• 2505.09608
• Published
• 37
End-to-End Vision Tokenizer Tuning
Paper
• 2505.10562
• Published
• 22
Hunyuan-Game: Industrial-grade Intelligent Game Creation Model
Paper
• 2505.14135
• Published
• 16
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
Paper
• 2505.16707
• Published
• 45
Scaling Diffusion Transformers Efficiently via μP
Paper
• 2505.15270
• Published
• 35
OmniConsistency: Learning Style-Agnostic Consistency from Paired
Stylization Data
Paper
• 2505.18445
• Published
• 63
ImgEdit: A Unified Image Editing Dataset and Benchmark
Paper
• 2505.20275
• Published
• 18
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via
Next-Detail Prediction
Paper
• 2505.21473
• Published
• 16
D-AR: Diffusion via Autoregressive Models
Paper
• 2505.23660
• Published
• 34
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with
Rectified Flow Transformers
Paper
• 2505.23758
• Published
• 22
EasyText: Controllable Diffusion Transformer for Multilingual Text
Rendering
Paper
• 2505.24417
• Published
• 13
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT
and RL
Paper
• 2505.24875
• Published
• 10
Cora: Correspondence-aware image editing using few step diffusion
Paper
• 2505.23907
• Published
• 12
ComposeAnything: Composite Object Priors for Text-to-Image Generation
Paper
• 2505.24086
• Published
• 5
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image
Distillation
Paper
• 2506.00523
• Published
• 3
RelationAdapter: Learning and Transferring Visual Relation with
Diffusion Transformers
Paper
• 2506.02528
• Published
• 15
Image Editing As Programs with Diffusion Models
Paper
• 2506.04158
• Published
• 24
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via
Diffusion Transformers
Paper
• 2505.21541
• Published
• 7
RefEdit: A Benchmark and Method for Improving Instruction-based Image
Editing Model on Referring Expressions
Paper
• 2506.03448
• Published
• 5
MARBLE: Material Recomposition and Blending in CLIP-Space
Paper
• 2506.05313
• Published
• 2
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image
Synthesis
Paper
• 2506.06276
• Published
• 26
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Paper
• 2506.07986
• Published
• 19
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale
Prediction
Paper
• 2404.02905
• Published
• 74
Text-Aware Image Restoration with Diffusion Models
Paper
• 2506.09993
• Published
• 45
Fine-Grained Perturbation Guidance via Attention Head Selection
Paper
• 2506.10978
• Published
• 25
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a
Unified Framework
Paper
• 2506.10741
• Published
• 27
CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic
Design Generation
Paper
• 2506.10890
• Published
• 9
Token Perturbation Guidance for Diffusion Models
Paper
• 2506.10036
• Published
• 5
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper
• 2506.06962
• Published
• 28
Ambient Diffusion Omni: Training Good Models with Bad Data
Paper
• 2506.10038
• Published
• 9
Watermarking Autoregressive Image Generation
Paper
• 2506.16349
• Published
• 3
Audit & Repair: An Agentic Framework for Consistent Story Visualization
in Text-to-Image Diffusion Models
Paper
• 2506.18900
• Published
• 3
Improving Progressive Generation with Decomposable Flow Matching
Paper
• 2506.19839
• Published
• 8
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image
Generation
Paper
• 2506.18095
• Published
• 66
Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency
Models
Paper
• 2506.19103
• Published
• 42
XVerse: Consistent Multi-Subject Control of Identity and Semantic
Attributes via DiT Modulation
Paper
• 2506.21416
• Published
• 28
From Ideal to Real: Unified and Data-Efficient Dense Prediction for
Real-World Scenarios
Paper
• 2506.20279
• Published
• 20
Noise Consistency Training: A Native Approach for One-Step Generator in
Learning Additional Controls
Paper
• 2506.19741
• Published
• 4
Calligrapher: Freestyle Text Image Customization
Paper
• 2506.24123
• Published
• 37
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric
Attention
Paper
• 2506.23542
• Published
• 13
Heeding the Inner Voice: Aligning ControlNet Training via Intermediate
Features Feedback
Paper
• 2507.02321
• Published
• 39
Beyond Simple Edits: X-Planner for Complex Instruction-Based Image
Editing
Paper
• 2507.05259
• Published
• 6
NeoBabel: A Multilingual Open Tower for Visual Generation
Paper
• 2507.06137
• Published
• 5
Vision Foundation Models as Effective Visual Tokenizers for
Autoregressive Image Generation
Paper
• 2507.08441
• Published
• 62
Subject-Consistent and Pose-Diverse Text-to-Image Generation
Paper
• 2507.08396
• Published
• 16
DreamPoster: A Unified Framework for Image-Conditioned Generative Poster
Design
Paper
• 2507.04218
• Published
• 13
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation
from Diffusion Models
Paper
• 2507.07104
• Published
• 46
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
Paper
• 2507.13984
• Published
• 26
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Paper
• 2507.14119
• Published
• 60
Latent Denoising Makes Good Visual Tokenizers
Paper
• 2507.15856
• Published
• 12
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated
Diffusion Transformers
Paper
• 2507.08422
• Published
• 36
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Paper
• 2507.18192
• Published
• 8
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image
Generative Models Great Again
Paper
• 2507.22058
• Published
• 40
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper
• 2507.21802
• Published
• 19
PixNerd: Pixel Neural Field Diffusion
Paper
• 2507.23268
• Published
• 52
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding
and Generation
Paper
• 2508.03320
• Published
• 63
The Promise of RL for Autoregressive Image Editing
Paper
• 2508.01119
• Published
• 11
LAMIC: Layout-Aware Multi-Image Composition via Scalability of
Multimodal Diffusion Transformer
Paper
• 2508.00477
• Published
• 11
HPSv3: Towards Wide-Spectrum Human Preference Score
Paper
• 2508.03789
• Published
• 20
The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in
Text-to-Image Models
Paper
• 2507.23313
• Published
• 1
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional
Virtual Try-On and Try-Off
Paper
• 2508.04825
• Published
• 60
Story2Board: A Training-Free Approach for Expressive Storyboard
Generation
Paper
• 2508.09983
• Published
• 70
CannyEdit: Selective Canny Control and Dual-Prompt Guidance for
Training-Free Image Editing
Paper
• 2508.06937
• Published
• 7
NextStep-1: Toward Autoregressive Image Generation with Continuous
Tokens at Scale
Paper
• 2508.10711
• Published
• 145
Next Visual Granularity Generation
Paper
• 2508.12811
• Published
• 49
S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of
Diffusion Models
Paper
• 2508.12880
• Published
• 48
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion
Transformer
Paper
• 2508.09131
• Published
• 16
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
Paper
• 2508.04324
• Published
• 11
Visual Autoregressive Modeling for Instruction-Guided Image Editing
Paper
• 2508.15772
• Published
• 9
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance
for Text-to-Image Generation
Paper
• 2508.18032
• Published
• 41
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image
Generation
Paper
• 2508.17472
• Published
• 26
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
Text-to-Image Reinforcement Learning
Paper
• 2508.20751
• Published
• 89
USO: Unified Style and Subject-Driven Generation via Disentangled and
Reward Learning
Paper
• 2508.18966
• Published
• 56
OneReward: Unified Mask-Guided Image Generation via Multi-Task Human
Preference Learning
Paper
• 2508.21066
• Published
• 13
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image
Editing
Paper
• 2509.01984
• Published
• 7
Interleaving Reasoning for Better Text-to-Image Generation
Paper
• 2509.06945
• Published
• 15
Reconstruction Alignment Improves Unified Multimodal Models
Paper
• 2509.07295
• Published
• 40
UMO: Scaling Multi-Identity Consistency for Image Customization via
Matching Reward
Paper
• 2509.06818
• Published
• 29
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human
Preference
Paper
• 2509.06942
• Published
• 17
Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with
Quantization-Aware Scheduling
Paper
• 2509.01624
• Published
• 7
RewardDance: Reward Scaling in Visual Generation
Paper
• 2509.08826
• Published
• 73
Can Understanding and Generation Truly Benefit Together -- or Just
Coexist?
Paper
• 2509.09666
• Published
• 34
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Paper
• 2509.10441
• Published
• 31
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion
Transformers via Explicit Correspondence
Paper
• 2509.12203
• Published
• 20
Image Tokenizer Needs Post-Training
Paper
• 2509.12474
• Published
• 8
Hyper-Bagel: A Unified Acceleration Framework for Multimodal
Understanding and Generation
Paper
• 2509.18824
• Published
• 23
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Paper
• 2509.26231
• Published
• 18
DreamOmni2: Multimodal Instruction-based Editing and Generation
Paper
• 2510.06679
• Published
• 73
Ming-UniVision: Joint Image Understanding and Generation with a Unified
Continuous Tokenizer
Paper
• 2510.06590
• Published
• 77
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal
Generation and Understanding
Paper
• 2510.06308
• Published
• 55
Heptapod: Language Modeling on Visual Signals
Paper
• 2510.06673
• Published
• 5
Latent Diffusion Model without Variational Autoencoder
Paper
• 2510.15301
• Published
• 49
BLIP3o-NEXT: Next Frontier of Native Image Generation
Paper
• 2510.15857
• Published
• 25
WithAnyone: Towards Controllable and ID Consistent Image Generation
Paper
• 2510.14975
• Published
• 85
Learning an Image Editing Model without Image Editing Pairs
Paper
• 2510.14978
• Published
• 9
Kontinuous Kontext: Continuous Strength Control for Instruction-based
Image Editing
Paper
• 2510.08532
• Published
• 6
World-To-Image: Grounding Text-to-Image Generation with Agent-Driven
World Knowledge
Paper
• 2510.04201
• Published
• 5
VLM-Guided Adaptive Negative Prompting for Creative Generation
Paper
• 2510.10715
• Published
• 4
Generating an Image From 1,000 Words: Enhancing Text-to-Image With
Structured Captions
Paper
• 2511.06876
• Published
• 28
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
Paper
• 2511.10629
• Published
• 127
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
Paper
• 2511.10555
• Published
• 62
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper
• 2511.19365
• Published
• 64
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios
Paper
• 2511.18050
• Published
• 38
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper
• 2511.22699
• Published
• 238
PixelDiT: Pixel Diffusion Transformers for Image Generation
Paper
• 2511.20645
• Published
• 35
DiP: Taming Diffusion Models in Pixel Space
Paper
• 2511.18822
• Published
• 29
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
Paper
• 2511.20549
• Published
• 27
CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation
Paper
• 2512.03540
• Published
• 13
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper
• 2512.05150
• Published
• 76
OmniPSD: Layered PSD Generation with Diffusion Transformer
Paper
• 2512.09247
• Published
• 48
EditThinker: Unlocking Iterative Reasoning for Any Image Editor
Paper
• 2512.05965
• Published
• 38
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
Paper
• 2512.04810
• Published
• 26
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
Paper
• 2512.00473
• Published
• 26
LongCat-Image Technical Report
Paper
• 2512.07584
• Published
• 23
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
Paper
• 2512.15603
• Published
• 66
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
Paper
• 2512.16913
• Published
• 34
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers
Up
Paper
• 2412.16112
• Published
• 23
SpotEdit: Selective Region Editing in Diffusion Transformers
Paper
• 2512.22323
• Published
• 39
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper
• 2601.02204
• Published
• 62