Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image
Synthesis
Paper
• 2401.09048
• Published
• 10
Improving fine-grained understanding in image-text pre-training
Paper
• 2401.09865
• Published
• 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper
• 2401.10891
• Published
• 62
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic
Image Restoration In the Wild
Paper
• 2401.13627
• Published
• 78
UNIMO-G: Unified Image Generation through Multimodal Conditional
Diffusion
Paper
• 2401.13388
• Published
• 13
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image
Editing
Paper
• 2402.02583
• Published
• 8
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Paper
• 2402.13929
• Published
• 27
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with
Trajectory Stitching
Paper
• 2402.14167
• Published
• 11
Subobject-level Image Tokenization
Paper
• 2402.14327
• Published
• 18
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept
Composition
Paper
• 2402.15504
• Published
• 21
Multi-LoRA Composition for Image Generation
Paper
• 2402.16843
• Published
• 31
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with
Audio2Video Diffusion Model under Weak Conditions
Paper
• 2402.17485
• Published
• 194
DistriFusion: Distributed Parallel Inference for High-Resolution
Diffusion Models
Paper
• 2402.19481
• Published
• 22
Trajectory Consistency Distillation
Paper
• 2402.19159
• Published
• 16
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain
Text-to-Image Customization
Paper
• 2403.00483
• Published
• 16
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Paper
• 2403.02084
• Published
• 15
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
Virtual Try-on
Paper
• 2403.01779
• Published
• 30
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Paper
• 2403.03206
• Published
• 71
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper
• 2403.05135
• Published
• 45
Motion Mamba: Efficient and Long Sequence Motion Generation with
Hierarchical and Bidirectional Selective SSM
Paper
• 2403.07487
• Published
• 16
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper
• 2403.09622
• Published
• 17
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based
Semantic Control
Paper
• 2403.09055
• Published
• 26
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of
Text-to-Image Models
Paper
• 2403.13535
• Published
• 23
DepthFM: Fast Monocular Depth Estimation with Flow Matching
Paper
• 2403.13788
• Published
• 18
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos
Paper
• 2403.13044
• Published
• 15
FlashFace: Human Image Personalization with High-fidelity Identity
Preservation
Paper
• 2403.17008
• Published
• 22
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper
• 2403.16627
• Published
• 22
ViTAR: Vision Transformer with Any Resolution
Paper
• 2403.18361
• Published
• 55
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object
Removal and Insertion
Paper
• 2403.18818
• Published
• 28
CosmicMan: A Text-to-Image Foundation Model for Humans
Paper
• 2404.01294
• Published
• 17
Condition-Aware Neural Network for Controlled Image Generation
Paper
• 2404.01143
• Published
• 13
Measuring Style Similarity in Diffusion Models
Paper
• 2404.01292
• Published
• 17
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept
Matching
Paper
• 2404.03653
• Published
• 35
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Paper
• 2404.03673
• Published
• 15
ControlNet++: Improving Conditional Controls with Efficient Consistency
Feedback
Paper
• 2404.07987
• Published
• 48
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and
Training Strategies
Paper
• 2404.08197
• Published
• 29
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse
Controls to Any Diffusion Model
Paper
• 2404.09967
• Published
• 21
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Paper
• 2404.09990
• Published
• 14
Dynamic Typography: Bringing Words to Life
Paper
• 2404.11614
• Published
• 46
MoA: Mixture-of-Attention for Subject-Context Disentanglement in
Personalized Image Generation
Paper
• 2404.11565
• Published
• 15
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image
Synthesis
Paper
• 2404.13686
• Published
• 29
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper
• 2404.14507
• Published
• 23
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Paper
• 2404.16022
• Published
• 25
Editable Image Elements for Controllable Synthesis
Paper
• 2404.16029
• Published
• 12
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with
Reward Feedback Learning
Paper
• 2404.15449
• Published
• 14
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity
Preserving
Paper
• 2404.16771
• Published
• 19
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video
Generation
Paper
• 2405.01434
• Published
• 56
Customizing Text-to-Image Models with a Single Image Pair
Paper
• 2405.01536
• Published
• 22
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and
Attribute Control
Paper
• 2405.12970
• Published
• 25
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
Paper
• 2405.14677
• Published
• 11
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
Paper
• 2405.14224
• Published
• 15
Semantica: An Adaptable Image-Conditioned Diffusion Model
Paper
• 2405.14857
• Published
• 11
EM Distillation for One-step Diffusion Models
Paper
• 2405.16852
• Published
• 12
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
Paper
• 2405.16759
• Published
• 8
Paper
• 2405.18407
• Published
• 48
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Paper
• 2406.04333
• Published
• 38
pOps: Photo-Inspired Diffusion Operators
Paper
• 2406.01300
• Published
• 17
Zero-shot Image Editing with Reference Imitation
Paper
• 2406.07547
• Published
• 33
An Image is Worth 32 Tokens for Reconstruction and Generation
Paper
• 2406.07550
• Published
• 60
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Paper
• 2406.06911
• Published
• 12
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent
Font Effect Generation
Paper
• 2406.08392
• Published
• 21
Paper
• 2406.09414
• Published
• 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on
Individual Pixels
Paper
• 2406.09415
• Published
• 51
Alleviating Distortion in Image Generation via Multi-Resolution
Diffusion Models
Paper
• 2406.09416
• Published
• 29
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal
Prompts
Paper
• 2406.09162
• Published
• 14
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual
Visual Text Rendering
Paper
• 2406.10208
• Published
• 22
Exploring the Role of Large Language Models in Prompt Encoding for
Diffusion Models
Paper
• 2406.11831
• Published
• 22
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN
Inversion and High Quality Image Editing
Paper
• 2406.10601
• Published
• 70
Invertible Consistency Distillation for Text-Guided Image Editing in
Around 7 Steps
Paper
• 2406.14539
• Published
• 27
DreamBench++: A Human-Aligned Benchmark for Personalized Image
Generation
Paper
• 2406.16855
• Published
• 57
Aligning Diffusion Models with Noise-Conditioned Perception
Paper
• 2406.17636
• Published
• 27
Magic Insert: Style-Aware Drag-and-Drop
Paper
• 2407.02489
• Published
• 21
DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
Paper
• 2407.03300
• Published
• 14
PartCraft: Crafting Creative Objects by Parts
Paper
• 2407.04604
• Published
• 6
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive
Canvas Layout
Paper
• 2404.00412
• Published
• 2
DataDream: Few-shot Guided Dataset Generation
Paper
• 2407.10910
• Published
• 10
Scaling Diffusion Transformers to 16 Billion Parameters
Paper
• 2407.11633
• Published
• 26
IMAGDressing-v1: Customizable Virtual Dressing
Paper
• 2407.12705
• Published
• 13
CGB-DM: Content and Graphic Balance Layout Generation with
Transformer-based Diffusion Model
Paper
• 2407.15233
• Published
• 7
Artist: Aesthetically Controllable Text-Driven Stylization without
Training
Paper
• 2407.15842
• Published
• 14
Paper
• 2407.15595
• Published
• 14
ViPer: Visual Personalization of Generative Models via Individual
Preference Learning
Paper
• 2407.17365
• Published
• 13
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Paper
• 2407.16982
• Published
• 42
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular
Depth Estimation
Paper
• 2407.17952
• Published
• 32
SHIC: Shape-Image Correspondences with no Keypoint Supervision
Paper
• 2407.18907
• Published
• 41
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
Paper
• 2408.00735
• Published
• 16
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy
Curvature of Attention
Paper
• 2408.00760
• Published
• 7
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation
with Multimodal Generative Pretraining
Paper
• 2408.02657
• Published
• 35
ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative
Generation
Paper
• 2408.02226
• Published
• 11
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning
using Instruct Prompts
Paper
• 2408.03209
• Published
• 22
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware
Open-domain Visual Storytelling
Paper
• 2408.03695
• Published
• 13
ControlNeXt: Powerful and Efficient Control for Image and Video
Generation
Paper
• 2408.06070
• Published
• 55
BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion
Paper
• 2408.04785
• Published
• 8
UniPortrait: A Unified Framework for Identity-Preserving Single- and
Multi-Human Image Personalization
Paper
• 2408.05939
• Published
• 14
Paper
• 2408.07009
• Published
• 62
ZePo: Zero-Shot Portrait Stylization with Faster Sampling
Paper
• 2408.05492
• Published
• 7
Paper
• 2408.07116
• Published
• 20
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
Paper
• 2408.08459
• Published
• 45
TurboEdit: Instant text-based image editing
Paper
• 2408.08332
• Published
• 20
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
Paper
• 2408.09702
• Published
• 11
TraDiffusion: Trajectory-Based Training-Free Image Generation
Paper
• 2408.09739
• Published
• 9
MegaFusion: Extend Diffusion Models towards Higher-resolution Image
Generation without Further Tuning
Paper
• 2408.11001
• Published
• 13
The Brittleness of AI-Generated Image Watermarking Techniques: Examining
Their Robustness Against Visual Paraphrasing Attacks
Paper
• 2408.10446
• Published
• 9
Scalable Autoregressive Image Generation with Mamba
Paper
• 2408.12245
• Published
• 26
CODE: Confident Ordinary Differential Editing
Paper
• 2408.12418
• Published
• 4
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its
Teacher
Paper
• 2408.14176
• Published
• 62
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image
Generation
Paper
• 2408.14819
• Published
• 22
Distribution Backtracking Builds A Faster Convergence Trajectory for
One-step Diffusion Distillation
Paper
• 2408.15991
• Published
• 16
CSGO: Content-Style Composition in Text-to-Image Generation
Paper
• 2408.16766
• Published
• 18
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image
Personalization
Paper
• 2408.15914
• Published
• 24
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion
Transformers
Paper
• 2408.17131
• Published
• 11
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper
• 2409.02097
• Published
• 34
Accurate Compression of Text-to-Image Diffusion Models via Vector
Quantization
Paper
• 2409.00492
• Published
• 11
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free
Real Image Editing
Paper
• 2409.01322
• Published
• 96
IFAdapter: Instance Feature Control for Grounded Text-to-Image
Generation
Paper
• 2409.08240
• Published
• 22
InstantDrag: Improving Interactivity in Drag-based Image Editing
Paper
• 2409.08857
• Published
• 34
StoryMaker: Towards Holistic Consistent Characters in Text-to-image
Generation
Paper
• 2409.12576
• Published
• 16
Imagine yourself: Tuning-Free Personalized Image Generation
Paper
• 2409.13346
• Published
• 69
Colorful Diffuse Intrinsic Image Decomposition in the Wild
Paper
• 2409.13690
• Published
• 13
Improvements to SDXL in NovelAI Diffusion V3
Paper
• 2409.15997
• Published
• 13
Pixel-Space Post-Training of Latent Diffusion Models
Paper
• 2409.17565
• Published
• 20
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal
Instruction
Paper
• 2410.04932
• Published
• 9
Accelerating Auto-regressive Text-to-Image Generation with Training-free
Speculative Jacobi Decoding
Paper
• 2410.01699
• Published
• 18
IterComp: Iterative Composition-Aware Feedback Learning from Model
Gallery for Text-to-Image Generation
Paper
• 2410.07171
• Published
• 43
Story-Adapter: A Training-free Iterative Framework for Long Story
Visualization
Paper
• 2410.06244
• Published
• 20
Eliminating Oversaturation and Artifacts of High Guidance Scales in
Diffusion Models
Paper
• 2410.02416
• Published
• 34
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial
Diffusion and Masked Generative Models
Paper
• 2410.08207
• Published
• 19
Meissonic: Revitalizing Masked Generative Transformers for Efficient
High-Resolution Text-to-Image Synthesis
Paper
• 2410.08261
• Published
• 52
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large
Vision-Language Models
Paper
• 2410.07133
• Published
• 19
Semantic Image Inversion and Editing using Rectified Stochastic
Differential Equations
Paper
• 2410.10792
• Published
• 31
Efficient Diffusion Models: A Comprehensive Survey from Principles to
Practices
Paper
• 2410.11795
• Published
• 18
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Paper
• 2410.11817
• Published
• 15
Fluid: Scaling Autoregressive Text-to-image Generative Models with
Continuous Tokens
Paper
• 2410.13863
• Published
• 37
VidPanos: Generative Panoramic Videos from Casual Panning Videos
Paper
• 2410.13832
• Published
• 13
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion
Model
Paper
• 2410.13925
• Published
• 24
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved
Visual Representation Capabilities
Paper
• 2410.14672
• Published
• 8
Scalable Ranked Preference Optimization for Text-to-Image Generation
Paper
• 2410.18013
• Published
• 14
Stable Consistency Tuning: Understanding and Improving Consistency
Models
Paper
• 2410.18958
• Published
• 10
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe
Dataset Curation
Paper
• 2410.18666
• Published
• 19
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse
Autoencoders
Paper
• 2410.22366
• Published
• 84
Constant Acceleration Flow
Paper
• 2411.00322
• Published
• 24
In-Context LoRA for Diffusion Transformers
Paper
• 2410.23775
• Published
• 11
Training-free Regional Prompting for Diffusion Transformers
Paper
• 2411.02395
• Published
• 25
Constrained Diffusion Implicit Models
Paper
• 2411.00359
• Published
• 6
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion
Models
Paper
• 2411.05007
• Published
• 24
Add-it: Training-Free Object Insertion in Images With Pretrained
Diffusion Models
Paper
• 2411.07232
• Published
• 68
OmniEdit: Building Image Editing Generalist Models Through Specialist
Supervision
Paper
• 2411.07199
• Published
• 50
Edify Image: High-Quality Image Generation with Pixel Space Laplacian
Diffusion Models
Paper
• 2411.07126
• Published
• 30
Watermark Anything with Localized Messages
Paper
• 2411.07231
• Published
• 21
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified
Multimodal Understanding and Generation
Paper
• 2411.07975
• Published
• 31
Scaling Properties of Diffusion Models for Perceptual Tasks
Paper
• 2411.08034
• Published
• 13
MagicQuill: An Intelligent Interactive Image Editing System
Paper
• 2411.09703
• Published
• 80
Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply
Better Samples
Paper
• 2411.08954
• Published
• 10
Region-Aware Text-to-Image Generation via Hard Binding and Soft
Refinement
Paper
• 2411.06558
• Published
• 36
FitDiT: Advancing the Authentic Garment Details for High-fidelity
Virtual Try-on
Paper
• 2411.10499
• Published
• 13
Continuous Speculative Decoding for Autoregressive Image Generation
Paper
• 2411.11925
• Published
• 16
Stylecodes: Encoding Stylistic Information For Image Generation
Paper
• 2411.12811
• Published
• 12
Generating Compositional Scenes via Text-to-image RGBA Instance
Generation
Paper
• 2411.10913
• Published
• 4
Stable Flow: Vital Layers for Training-Free Image Editing
Paper
• 2411.14430
• Published
• 22
Style-Friendly SNR Sampler for Style-Driven Generation
Paper
• 2411.14793
• Published
• 39
OminiControl: Minimal and Universal Control for Diffusion Transformer
Paper
• 2411.15098
• Published
• 61
MyTimeMachine: Personalized Facial Age Transformation
Paper
• 2411.14521
• Published
• 23
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot
Subject-Driven Image Generator
Paper
• 2411.15466
• Published
• 39
One Diffusion to Generate Them All
Paper
• 2411.16318
• Published
• 28
Controllable Human Image Generation with Personalized Multi-Garments
Paper
• 2411.16801
• Published
• 3
ROICtrl: Boosting Instance Control for Visual Generation
Paper
• 2411.17949
• Published
• 87
DreamCache: Finetuning-Free Lightweight Personalized Image Generation
via Feature Caching
Paper
• 2411.17786
• Published
• 12
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Paper
• 2411.17787
• Published
• 12
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Paper
• 2411.18616
• Published
• 16
Omegance: A Single Parameter for Various Granularities in
Diffusion-Based Synthesis
Paper
• 2411.17769
• Published
• 8
Edit Away and My Face Will not Stay: Personal Biometric Defense against
Malicious Generative Editing
Paper
• 2411.16832
• Published
• 2
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction
using Diffusion Models
Paper
• 2411.18350
• Published
• 28
FAM Diffusion: Frequency and Attention Modulation for High-Resolution
Image Generation with Stable Diffusion
Paper
• 2411.18552
• Published
• 18
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Paper
• 2412.01819
• Published
• 34
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge
Paper
• 2412.00176
• Published
• 9
SNOOPI: Supercharged One-step Diffusion Distillation with Proper
Guidance
Paper
• 2412.02687
• Published
• 113
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and
Generation
Paper
• 2412.03069
• Published
• 34
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene
Relighting
Paper
• 2412.00177
• Published
• 8
A Noise is Worth Diffusion Guidance
Paper
• 2412.03895
• Published
• 29
Negative Token Merging: Image-based Adversarial Feature Guidance
Paper
• 2412.01339
• Published
• 22
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent
Diffusion Models
Paper
• 2412.04146
• Published
• 23
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution
Image Synthesis
Paper
• 2412.04431
• Published
• 17
ZipAR: Accelerating Autoregressive Image Generation through Spatial
Locality
Paper
• 2412.04062
• Published
• 8
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step
Diffusion
Paper
• 2412.04301
• Published
• 40
PanoDreamer: 3D Panorama Synthesis from a Single Image
Paper
• 2412.04827
• Published
• 10
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for
Customized Manga Generation
Paper
• 2412.07589
• Published
• 48
Hidden in the Noise: Two-Stage Robust Watermarking for Images
Paper
• 2412.04653
• Published
• 30
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion
Models
Paper
• 2412.07674
• Published
• 20
UniReal: Universal Image Generation and Editing via Learning Real-world
Dynamics
Paper
• 2412.07774
• Published
• 30
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style
Conditioned Image Generation
Paper
• 2412.05148
• Published
• 12
Learning Flow Fields in Attention for Controllable Person Image
Generation
Paper
• 2412.08486
• Published
• 36
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow
Models
Paper
• 2412.08629
• Published
• 13
StyleStudio: Text-Driven Style Transfer with Selective Control of Style
Elements
Paper
• 2412.08503
• Published
• 8
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via
Multimodal LLM
Paper
• 2412.09618
• Published
• 21
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices
with Efficient Architectures and Training
Paper
• 2412.09619
• Published
• 30
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Paper
• 2412.09622
• Published
• 8
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free
Scale Fusion
Paper
• 2412.09626
• Published
• 21
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven
Generation
Paper
• 2412.08645
• Published
• 12
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Paper
• 2412.07517
• Published
• 11
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Paper
• 2412.09611
• Published
• 11
BrushEdit: All-In-One Image Inpainting and Editing
Paper
• 2412.10316
• Published
• 36
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper
• 2412.11815
• Published
• 26
Causal Diffusion Transformers for Generative Modeling
Paper
• 2412.12095
• Published
• 23
FashionComposer: Compositional Fashion Image Generation
Paper
• 2412.14168
• Published
• 17
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting
with Diffusion Transformers
Paper
• 2412.12571
• Published
• 8
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper
• 2412.15213
• Published
• 28
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
Paper
• 2412.14462
• Published
• 15
Paper
• 2412.18653
• Published
• 86
The Superposition of Diffusion Models Using the Itô Density Estimator
Paper
• 2412.17762
• Published
• 13
From Elements to Design: A Layered Approach for Automatic Graphic Design
Composition
Paper
• 2412.19712
• Published
• 15
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention
Mixing Control
Paper
• 2412.20800
• Published
• 11
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
Paper
• 2501.02576
• Published
• 15
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit
Control
Paper
• 2501.02260
• Published
• 5
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper
• 2501.05441
• Published
• 95
MangaNinja: Line Art Colorization with Precise Reference Following
Paper
• 2501.08332
• Published
• 62
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Paper
• 2501.06751
• Published
• 32
Democratizing Text-to-Image Masked Generative Models with Compact
Text-Aware One-Dimensional Tokens
Paper
• 2501.07730
• Published
• 18
FramePainter: Endowing Interactive Image Editing with Video Diffusion
Priors
Paper
• 2501.08225
• Published
• 20
3DIS-FLUX: simple and efficient multi-instance generation with DiT
rendering
Paper
• 2501.05131
• Published
• 37
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising
Steps
Paper
• 2501.09732
• Published
• 72
SynthLight: Portrait Relighting with Diffusion Model by Learning to
Re-render Synthetic Faces
Paper
• 2501.09756
• Published
• 20
Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions
Paper
• 2501.10020
• Published
• 24
TokenVerse: Versatile Multi-concept Personalization in Token Modulation
Space
Paper
• 2501.12224
• Published
• 48
GPS as a Control Signal for Image Generation
Paper
• 2501.12390
• Published
• 15
Can We Generate Images with CoT? Let's Verify and Reinforce Image
Generation Step by Step
Paper
• 2501.13926
• Published
• 43
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation
Using a Single Prompt
Paper
• 2501.13554
• Published
• 9
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and
Modulation
Paper
• 2403.14614
• Published
• 2
Denoising as Adaptation: Noise-Space Domain Adaptation for Image
Restoration
Paper
• 2406.18516
• Published
• 4
Visual Generation Without Guidance
Paper
• 2501.15420
• Published
• 8
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute
in Linear Diffusion Transformer
Paper
• 2501.18427
• Published
• 24
Inverse Bridge Matching Distillation
Paper
• 2502.01362
• Published
• 27
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion
Transformer
Paper
• 2502.01105
• Published
• 21
Weak-to-Strong Diffusion with Reflection
Paper
• 2502.00473
• Published
• 24
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Paper
• 2502.03738
• Published
• 11
Dual Caption Preference Optimization for Diffusion Models
Paper
• 2502.06023
• Published
• 9
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient
Text-to-Image Generation
Paper
• 2502.08690
• Published
• 43
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Paper
• 2502.09411
• Published
• 22
Precise Parameter Localization for Textual Generation in Diffusion
Models
Paper
• 2502.09935
• Published
• 12
Region-Adaptive Sampling for Diffusion Transformers
Paper
• 2502.10389
• Published
• 53
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
• 2502.10458
• Published
• 38
Diffusion Models without Classifier-free Guidance
Paper
• 2502.12154
• Published
• 8
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
Paper
• 2502.14397
• Published
• 41
One-step Diffusion Models with f-Divergence Distribution Matching
Paper
• 2502.15681
• Published
• 8
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Paper
• 2502.17157
• Published
• 52
GCC: Generative Color Constancy via Diffusing a Color Checker
Paper
• 2502.17435
• Published
• 29
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent
Image Generation
Paper
• 2502.18364
• Published
• 36
KV-Edit: Training-Free Image Editing for Precise Background Preservation
Paper
• 2502.17363
• Published
• 37
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
Paper
• 2502.18461
• Published
• 17
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven
Language Representation
Paper
• 2502.18302
• Published
• 5
GHOST 2.0: generative high-fidelity one shot transfer of heads
Paper
• 2502.18417
• Published
• 67
Distill Any Depth: Distillation Creates a Stronger Monocular Depth
Estimator
Paper
• 2502.19204
• Published
• 11
UniTok: A Unified Tokenizer for Visual Generation and Understanding
Paper
• 2502.20321
• Published
• 30
Multimodal Representation Alignment for Image Generation: Text-Image
Interleaved Control Is Easier Than You Think
Paper
• 2502.20172
• Published
• 29
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality
Samples with Less Compute
Paper
• 2502.20126
• Published
• 19
Training Consistency Models with Variational Noise Coupling
Paper
• 2502.18197
• Published
• 7
How far can we go with ImageNet for Text-to-Image generation?
Paper
• 2502.21318
• Published
• 26
RectifiedHR: Enable Efficient High-Resolution Image Generation via
Energy Rectification
Paper
• 2503.02537
• Published
• 12
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with
Multimodal Large Language Model
Paper
• 2503.06141
• Published
• 4
Unleashing the Potential of Large Language Models for Text-to-Image
Generation through Autoregressive Representation Alignment
Paper
• 2503.07334
• Published
• 16
Seedream 2.0: A Native Chinese-English Bilingual Image Generation
Foundation Model
Paper
• 2503.07703
• Published
• 37
LightGen: Efficient Image Generation through Knowledge Distillation and
Direct Preference Optimization
Paper
• 2503.08619
• Published
• 20
ObjectMover: Generative Object Movement with Video Prior
Paper
• 2503.08037
• Published
• 5
Alias-Free Latent Diffusion Models: Improving Fractional Shift
Equivariance of Diffusion Latent Space
Paper
• 2503.09419
• Published
• 6
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
Paper
• 2503.10613
• Published
• 79
Silent Branding Attack: Trigger-free Data Poisoning Attack on
Text-to-Image Diffusion Models
Paper
• 2503.09669
• Published
• 35
OmniPaint: Mastering Object-Oriented Editing via Disentangled
Insertion-Removal Inpainting
Paper
• 2503.08677
• Published
• 29
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency
Distillation
Paper
• 2503.09641
• Published
• 42
ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style
Transfer
Paper
• 2503.10614
• Published
• 8
Autoregressive Image Generation with Randomized Parallel Decoding
Paper
• 2503.10568
• Published
• 9
Piece it Together: Part-Based Concepting with IP-Priors
Paper
• 2503.10365
• Published
• 8
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference
Time by Leveraging Sparsity
Paper
• 2503.07677
• Published
• 86
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale
Text-to-Image Models
Paper
• 2503.12885
• Published
• 43
Edit Transfer: Learning Image Editing via Vision In-Context Relations
Paper
• 2503.13327
• Published
• 29
BlobCtrl: A Unified and Flexible Framework for Element-level Image
Generation and Editing
Paper
• 2503.13434
• Published
• 27
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
Paper
• 2503.13070
• Published
• 10
GenStereo: Towards Open-World Generation of Stereo Images and
Unsupervised Matching
Paper
• 2503.12720
• Published
• 4
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the
LLM Era
Paper
• 2503.12329
• Published
• 27
Atlas: Multi-Scale Attention Improves Long Context Image Modeling
Paper
• 2503.12355
• Published
• 12
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion
Transformers via In-Context Reflection
Paper
• 2503.12271
• Published
• 9
LEGION: Learning to Ground and Explain for Synthetic Image Detection
Paper
• 2503.15264
• Published
• 21
Scale-wise Distillation of Diffusion Models
Paper
• 2503.16397
• Published
• 41
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
Paper
• 2503.14487
• Published
• 28
Ultra-Resolution Adaptation with Ease
Paper
• 2503.16322
• Published
• 13
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields
Paper
• 2503.17095
• Published
• 5
Single Image Iterative Subject-driven Generation and Editing
Paper
• 2503.16025
• Published
• 14
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent
Diffusion Models
Paper
• 2503.18352
• Published
• 6
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame
Animated Sticker Generation
Paper
• 2503.17735
• Published
• 3
Inference-Time Scaling for Flow Models via Stochastic Generation and
Rollover Budget Forcing
Paper
• 2503.19385
• Published
• 34
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection
with Artifact Explanation
Paper
• 2503.14905
• Published
• 20
Latent Space Super-Resolution for Higher-Resolution Image Generation
with Diffusion Models
Paper
• 2503.18446
• Published
• 12
Unconditional Priors Matter! Improving Conditional Generation of
Fine-Tuned Diffusion Models
Paper
• 2503.20240
• Published
• 22
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data
Synthesis
Paper
• 2503.21749
• Published
• 26
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Paper
• 2503.21758
• Published
• 22
Unified Multimodal Discrete Diffusion
Paper
• 2503.20853
• Published
• 9
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual
Scenes
Paper
• 2503.23461
• Published
• 94
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and
Diffusion Refinement
Paper
• 2504.01934
• Published
• 22
Boost Your Own Human Image Generation Model via Direct Preference
Optimization with AI Feedback
Paper
• 2405.20216
• Published
• 21
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via
Iterative Instruction Tuning and Reinforcement Learning
Paper
• 2504.02949
• Published
• 21
SPF-Portrait: Towards Pure Portrait Customization with Semantic
Pollution-Free Fine-tuning
Paper
• 2504.00396
• Published
• 3
Concept Lancet: Image Editing with Compositional Representation
Transplant
Paper
• 2504.02828
• Published
• 16
An Empirical Study of GPT-4o Image Generation Capabilities
Paper
• 2504.05979
• Published
• 64
Less-to-More Generalization: Unlocking More Controllability by
In-Context Generation
Paper
• 2504.02160
• Published
• 37
Tuning-Free Image Editing with Fidelity and Editability via Unified
Latent Diffusion Model
Paper
• 2504.05594
• Published
• 11
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned
Guidance
Paper
• 2504.06232
• Published
• 13
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published
• 77
Are We Done with Object-Centric Learning?
Paper
• 2504.07092
• Published
• 6
VisualCloze: A Universal Image Generation Framework via Visual
In-Context Learning
Paper
• 2504.07960
• Published
• 50
Compass Control: Multi Object Orientation Control for Text-to-Image
Generation
Paper
• 2504.06752
• Published
• 9
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
Autoregressive Image Generation
Paper
• 2504.08736
• Published
• 46
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image
Restoration
Paper
• 2504.08591
• Published
• 18
PixelFlow: Pixel-Space Generative Models with Flow
Paper
• 2504.07963
• Published
• 18
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation
through Pretraining, SFT, and RL
Paper
• 2504.11455
• Published
• 14
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
• 2504.09454
• Published
• 11
Cobra: Efficient Line Art COlorization with BRoAder References
Paper
• 2504.12240
• Published
• 27
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion
Transformers
Paper
• 2504.10483
• Published
• 22
DMM: Building a Versatile Image Generation Model via Distillation-Based
Model Merging
Paper
• 2504.12364
• Published
• 22
InstantCharacter: Personalize Any Characters with a Scalable Diffusion
Transformer Framework
Paper
• 2504.12395
• Published
• 16
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal
in Large Images
Paper
• 2504.09621
• Published
• 11
LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping
Paper
• 2504.08902
• Published
• 8
Personalized Text-to-Image Generation with Auto-Regressive Models
Paper
• 2504.13162
• Published
• 18
From Reflection to Perfection: Scaling Inference-Time Optimization for
Text-to-Image Diffusion Models via Reflection Tuning
Paper
• 2504.16080
• Published
• 15
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via
Triplet ID Group Learning
Paper
• 2504.14509
• Published
• 53
DreamO: A Unified Framework for Image Customization
Paper
• 2504.16915
• Published
• 24
Step1X-Edit: A Practical Framework for General Image Editing
Paper
• 2504.17761
• Published
• 92
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image
Generation
Paper
• 2504.17502
• Published
• 55
Token-Shuffle: Towards High-Resolution Image Generation with
Autoregressive Models
Paper
• 2504.17789
• Published
• 23
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Paper
• 2504.16064
• Published
• 14
RepText: Rendering Visual Text via Replicating
Paper
• 2504.19724
• Published
• 31
In-Context Edit: Enabling Instructional Image Editing with In-Context
Generation in Large Scale Diffusion Transformer
Paper
• 2504.20690
• Published
• 19
PixelHacker: Image Inpainting with Structural and Semantic Consistency
Paper
• 2504.20438
• Published
• 44
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based
Image Editing
Paper
• 2505.02370
• Published
• 14
MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset
via Attention Routing
Paper
• 2505.02823
• Published
• 5
Flow-GRPO: Training Flow Matching Models via Online RL
Paper
• 2505.05470
• Published
• 88
Unified Continuous Generative Models
Paper
• 2505.07447
• Published
• 42
MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills
Paper
• 2505.06176
• Published
• 12
LightLab: Controlling Light Sources in Images with Diffusion Models
Paper
• 2505.09608
• Published
• 37
End-to-End Vision Tokenizer Tuning
Paper
• 2505.10562
• Published
• 22
Hunyuan-Game: Industrial-grade Intelligent Game Creation Model
Paper
• 2505.14135
• Published
• 16
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
Paper
• 2505.16707
• Published
• 45
Scaling Diffusion Transformers Efficiently via μP
Paper
• 2505.15270
• Published
• 35
OmniConsistency: Learning Style-Agnostic Consistency from Paired
Stylization Data
Paper
• 2505.18445
• Published
• 63
ImgEdit: A Unified Image Editing Dataset and Benchmark
Paper
• 2505.20275
• Published
• 18
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via
Next-Detail Prediction
Paper
• 2505.21473
• Published
• 16
D-AR: Diffusion via Autoregressive Models
Paper
• 2505.23660
• Published
• 34
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with
Rectified Flow Transformers
Paper
• 2505.23758
• Published
• 22
EasyText: Controllable Diffusion Transformer for Multilingual Text
Rendering
Paper
• 2505.24417
• Published
• 13
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT
and RL
Paper
• 2505.24875
• Published
• 10
Cora: Correspondence-aware image editing using few step diffusion
Paper
• 2505.23907
• Published
• 12
ComposeAnything: Composite Object Priors for Text-to-Image Generation
Paper
• 2505.24086
• Published
• 5
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image
Distillation
Paper
• 2506.00523
• Published
• 3
RelationAdapter: Learning and Transferring Visual Relation with
Diffusion Transformers
Paper
• 2506.02528
• Published
• 15
Image Editing As Programs with Diffusion Models
Paper
• 2506.04158
• Published
• 24
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via
Diffusion Transformers
Paper
• 2505.21541
• Published
• 7
RefEdit: A Benchmark and Method for Improving Instruction-based Image
Editing Model on Referring Expressions
Paper
• 2506.03448
• Published
• 5
MARBLE: Material Recomposition and Blending in CLIP-Space
Paper
• 2506.05313
• Published
• 2
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image
Synthesis
Paper
• 2506.06276
• Published
• 26
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Paper
• 2506.07986
• Published
• 19
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale
Prediction
Paper
• 2404.02905
• Published
• 74
Text-Aware Image Restoration with Diffusion Models
Paper
• 2506.09993
• Published
• 45
Fine-Grained Perturbation Guidance via Attention Head Selection
Paper
• 2506.10978
• Published
• 25
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a
Unified Framework
Paper
• 2506.10741
• Published
• 27
CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic
Design Generation
Paper
• 2506.10890
• Published
• 9
Token Perturbation Guidance for Diffusion Models
Paper
• 2506.10036
• Published
• 5
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper
• 2506.06962
• Published
• 28
Ambient Diffusion Omni: Training Good Models with Bad Data
Paper
• 2506.10038
• Published
• 9
Watermarking Autoregressive Image Generation
Paper
• 2506.16349
• Published
• 3
Audit & Repair: An Agentic Framework for Consistent Story Visualization
in Text-to-Image Diffusion Models
Paper
• 2506.18900
• Published
• 3
Improving Progressive Generation with Decomposable Flow Matching
Paper
• 2506.19839
• Published
• 8
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image
Generation
Paper
• 2506.18095
• Published
• 66
Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency
Models
Paper
• 2506.19103
• Published
• 42
XVerse: Consistent Multi-Subject Control of Identity and Semantic
Attributes via DiT Modulation
Paper
• 2506.21416
• Published
• 28
From Ideal to Real: Unified and Data-Efficient Dense Prediction for
Real-World Scenarios
Paper
• 2506.20279
• Published
• 20
Noise Consistency Training: A Native Approach for One-Step Generator in
Learning Additional Controls
Paper
• 2506.19741
• Published
• 4
Calligrapher: Freestyle Text Image Customization
Paper
• 2506.24123
• Published
• 37
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric
Attention
Paper
• 2506.23542
• Published
• 13
Heeding the Inner Voice: Aligning ControlNet Training via Intermediate
Features Feedback
Paper
• 2507.02321
• Published
• 39
Beyond Simple Edits: X-Planner for Complex Instruction-Based Image
Editing
Paper
• 2507.05259
• Published
• 6
NeoBabel: A Multilingual Open Tower for Visual Generation
Paper
• 2507.06137
• Published
• 5
Vision Foundation Models as Effective Visual Tokenizers for
Autoregressive Image Generation
Paper
• 2507.08441
• Published
• 62
Subject-Consistent and Pose-Diverse Text-to-Image Generation
Paper
• 2507.08396
• Published
• 16
DreamPoster: A Unified Framework for Image-Conditioned Generative Poster
Design
Paper
• 2507.04218
• Published
• 13
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation
from Diffusion Models
Paper
• 2507.07104
• Published
• 46
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
Paper
• 2507.13984
• Published
• 26
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Paper
• 2507.14119
• Published
• 60
Latent Denoising Makes Good Visual Tokenizers
Paper
• 2507.15856
• Published
• 12
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated
Diffusion Transformers
Paper
• 2507.08422
• Published
• 36
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Paper
• 2507.18192
• Published
• 8
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image
Generative Models Great Again
Paper
• 2507.22058
• Published
• 40
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper
• 2507.21802
• Published
• 19
PixNerd: Pixel Neural Field Diffusion
Paper
• 2507.23268
• Published
• 52
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding
and Generation
Paper
• 2508.03320
• Published
• 63
The Promise of RL for Autoregressive Image Editing
Paper
• 2508.01119
• Published
• 11
LAMIC: Layout-Aware Multi-Image Composition via Scalability of
Multimodal Diffusion Transformer
Paper
• 2508.00477
• Published
• 11
HPSv3: Towards Wide-Spectrum Human Preference Score
Paper
• 2508.03789
• Published
• 20
The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in
Text-to-Image Models
Paper
• 2507.23313
• Published
• 1
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional
Virtual Try-On and Try-Off
Paper
• 2508.04825
• Published
• 60
Story2Board: A Training-Free Approach for Expressive Storyboard
Generation
Paper
• 2508.09983
• Published
• 70
CannyEdit: Selective Canny Control and Dual-Prompt Guidance for
Training-Free Image Editing
Paper
• 2508.06937
• Published
• 7
NextStep-1: Toward Autoregressive Image Generation with Continuous
Tokens at Scale
Paper
• 2508.10711
• Published
• 145
Next Visual Granularity Generation
Paper
• 2508.12811
• Published
• 49
S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of
Diffusion Models
Paper
• 2508.12880
• Published
• 48
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion
Transformer
Paper
• 2508.09131
• Published
• 16
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
Paper
• 2508.04324
• Published
• 11
Visual Autoregressive Modeling for Instruction-Guided Image Editing
Paper
• 2508.15772
• Published
• 9
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance
for Text-to-Image Generation
Paper
• 2508.18032
• Published
• 41
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image
Generation
Paper
• 2508.17472
• Published
• 26
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
Text-to-Image Reinforcement Learning
Paper
• 2508.20751
• Published
• 89
USO: Unified Style and Subject-Driven Generation via Disentangled and
Reward Learning
Paper
• 2508.18966
• Published
• 56
OneReward: Unified Mask-Guided Image Generation via Multi-Task Human
Preference Learning
Paper
• 2508.21066
• Published
• 13
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image
Editing
Paper
• 2509.01984
• Published
• 7
Interleaving Reasoning for Better Text-to-Image Generation
Paper
• 2509.06945
• Published
• 15
Reconstruction Alignment Improves Unified Multimodal Models
Paper
• 2509.07295
• Published
• 40
UMO: Scaling Multi-Identity Consistency for Image Customization via
Matching Reward
Paper
• 2509.06818
• Published
• 29
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human
Preference
Paper
• 2509.06942
• Published
• 17
Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with
Quantization-Aware Scheduling
Paper
• 2509.01624
• Published
• 7
RewardDance: Reward Scaling in Visual Generation
Paper
• 2509.08826
• Published
• 73
Can Understanding and Generation Truly Benefit Together -- or Just
Coexist?
Paper
• 2509.09666
• Published
• 34
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Paper
• 2509.10441
• Published
• 31
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion
Transformers via Explicit Correspondence
Paper
• 2509.12203
• Published
• 20
Image Tokenizer Needs Post-Training
Paper
• 2509.12474
• Published
• 8
Hyper-Bagel: A Unified Acceleration Framework for Multimodal
Understanding and Generation
Paper
• 2509.18824
• Published
• 23
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Paper
• 2509.26231
• Published
• 18
DreamOmni2: Multimodal Instruction-based Editing and Generation
Paper
• 2510.06679
• Published
• 73
Ming-UniVision: Joint Image Understanding and Generation with a Unified
Continuous Tokenizer
Paper
• 2510.06590
• Published
• 77
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal
Generation and Understanding
Paper
• 2510.06308
• Published
• 55
Heptapod: Language Modeling on Visual Signals
Paper
• 2510.06673
• Published
• 5
Latent Diffusion Model without Variational Autoencoder
Paper
• 2510.15301
• Published
• 49
BLIP3o-NEXT: Next Frontier of Native Image Generation
Paper
• 2510.15857
• Published
• 25
WithAnyone: Towards Controllable and ID Consistent Image Generation
Paper
• 2510.14975
• Published
• 85
Learning an Image Editing Model without Image Editing Pairs
Paper
• 2510.14978
• Published
• 9
Kontinuous Kontext: Continuous Strength Control for Instruction-based
Image Editing
Paper
• 2510.08532
• Published
• 6
World-To-Image: Grounding Text-to-Image Generation with Agent-Driven
World Knowledge
Paper
• 2510.04201
• Published
• 5
VLM-Guided Adaptive Negative Prompting for Creative Generation
Paper
• 2510.10715
• Published
• 4
Generating an Image From 1,000 Words: Enhancing Text-to-Image With
Structured Captions
Paper
• 2511.06876
• Published
• 28
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
Paper
• 2511.10629
• Published
• 127
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
Paper
• 2511.10555
• Published
• 62
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper
• 2511.19365
• Published
• 64
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios
Paper
• 2511.18050
• Published
• 38
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper
• 2511.22699
• Published
• 238
PixelDiT: Pixel Diffusion Transformers for Image Generation
Paper
• 2511.20645
• Published
• 35
DiP: Taming Diffusion Models in Pixel Space
Paper
• 2511.18822
• Published
• 29
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
Paper
• 2511.20549
• Published
• 27
CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation
Paper
• 2512.03540
• Published
• 13
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper
• 2512.05150
• Published
• 76
OmniPSD: Layered PSD Generation with Diffusion Transformer
Paper
• 2512.09247
• Published
• 48
EditThinker: Unlocking Iterative Reasoning for Any Image Editor
Paper
• 2512.05965
• Published
• 38
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
Paper
• 2512.04810
• Published
• 26
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
Paper
• 2512.00473
• Published
• 26
LongCat-Image Technical Report
Paper
• 2512.07584
• Published
• 23
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
Paper
• 2512.15603
• Published
• 66
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
Paper
• 2512.16913
• Published
• 34
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers
Up
Paper
• 2412.16112
• Published
• 23
SpotEdit: Selective Region Editing in Diffusion Transformers
Paper
• 2512.22323
• Published
• 39
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper
• 2601.02204
• Published
• 62