Native-Resolution Image Synthesis
Paper: [arXiv:2506.03131](https://arxiv.org/abs/2506.03131)
Native-resolution Image Transformer (NiT-XL) checkpoint packaged as a Diffusers-style repository with vendored custom code.
- `transformer/`: `NiTTransformer2DModel` weights + config
- `scheduler/`: `NiTFlowMatchScheduler` config
- `vae/`: `AutoencoderDC` weights + config
- `custom_pipeline/`: local, self-contained implementations of `NiTPipeline`, `NiTTransformer2DModel`, and `NiTFlowMatchScheduler`
- `test_inference.py`: standalone sampling script

This repository does not depend on an external NiT-diffusers checkout during inference.
It includes a root pipeline.py custom entrypoint for Diffusers dynamic loading.
Install dependencies (example):

```bash
pip install torch diffusers safetensors
```
If using this project environment:

```bash
conda activate rsgen
```
Run from this repository root:
```bash
python test_inference.py \
  --class-label 207 \
  --height 512 \
  --width 512 \
  --steps 250 \
  --mode sde \
  --guidance-scale 2.05 \
  --guidance-low 0.0 \
  --guidance-high 0.7 \
  --output demo_images/demo_sde250_class207_seed42.png
```
```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path(".").resolve()
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" and torch.cuda.is_bf16_supported() else torch.float32

# Load the vendored pipeline code shipped at the repository root.
pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=str(model_dir / "pipeline.py"),
    local_files_only=True,
).to(device)
if device == "cuda":
    pipe.transformer.to(dtype=dtype)
    pipe.vae.to(dtype=dtype)

gen = torch.Generator(device=device).manual_seed(42)
result = pipe(
    class_labels=[207],
    height=512,
    width=512,
    num_inference_steps=250,
    mode="sde",
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=gen,
)
Path("demo_images").mkdir(exist_ok=True)  # ensure the output directory exists
result.images[0].save("demo_images/sample.png")
```
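NiT accepts native (including non-square) resolutions, but the latent grid must still hold an integer number of patches. As a minimal sketch, a hypothetical helper can round an arbitrary request down to a valid size; the factor of 32 below (VAE downsampling × patch size) is an assumption for illustration, not read from this checkpoint's config, so check `pipe.vae.config` before relying on it.

```python
def snap_resolution(height: int, width: int, multiple: int = 32) -> tuple[int, int]:
    """Round a requested resolution down to the nearest valid multiple.

    The `multiple=32` default is an illustrative assumption, not a value
    taken from the NiT-XL config.
    """
    snap = lambda x: max(multiple, (x // multiple) * multiple)
    return snap(height), snap(width)
```

For example, `snap_resolution(500, 300)` yields `(480, 288)`, which can then be passed as `height`/`width` to the pipeline call above.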
For remote Hub loading:
```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-XL-diffusers",
    custom_pipeline="pipeline",
)
```
Default sampling settings:

| Setting | Value |
| --- | --- |
| Resolution | 512×512 |
| Mode | `sde` |
| Steps | 250 |
| Guidance scale | 2.05 |
| Guidance interval | (0.0, 0.7) |

Using very low steps (for example 2) is only a smoke test and will produce low-quality images.
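The `--guidance-low`/`--guidance-high` flags suggest interval-limited classifier-free guidance: the CFG correction is applied only while the (normalized) timestep falls inside the interval. The sketch below is a plausible reading of that behavior, not the pipeline's actual implementation; the timestep convention and function name are assumptions.

```python
def guided_velocity(v_cond, v_uncond, t_norm, scale=2.05, interval=(0.0, 0.7)):
    """Sketch of interval-limited classifier-free guidance.

    t_norm is a normalized timestep in [0, 1] (convention assumed). Inside
    `interval` the usual CFG combination is used; outside it, the
    conditional prediction passes through unchanged.
    """
    low, high = interval
    if low <= t_norm <= high:
        return v_uncond + scale * (v_cond - v_uncond)
    return v_cond
```

Restricting guidance to part of the trajectory is a common way to keep strong conditioning without over-saturating late denoising steps.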
If you use this model or the NiT method in your work, please cite:
```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
Use `--seed` to change the sampling seed; the examples above use seed 42.
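The seed fully determines the initial latent noise, which is why a fixed seed plus identical settings reproduces the same image. A minimal illustration with `torch.Generator`, the same mechanism the Python example above uses:

```python
import torch

# Two generators seeded identically draw identical noise tensors, so the
# pipeline's initial latents -- and hence the sampled image -- match.
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)
noise_a = torch.randn(2, 3, generator=g1)
noise_b = torch.randn(2, 3, generator=g2)
assert torch.equal(noise_a, noise_b)
```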