Native-Resolution Image Synthesis
Paper: [arXiv:2506.03131](https://arxiv.org/abs/2506.03131)
Native-resolution Image Transformer (NiT-XL) checkpoint packaged as a Diffusers-style repository with vendored custom code.
- `transformer/`: `NiTTransformer2DModel` weights + config
- `scheduler/`: `NiTFlowMatchScheduler` config
- `vae/`: `AutoencoderDC` weights + config
- `custom_pipeline/`: local, self-contained implementations of `NiTPipeline`, `NiTTransformer2DModel`, and `NiTFlowMatchScheduler`
- `test_inference.py`: standalone sampling script

This repository does not depend on an external NiT-diffusers checkout during inference.
It includes a root pipeline.py custom entrypoint for Diffusers dynamic loading.
Install dependencies (example):

```bash
pip install torch diffusers safetensors
```
If using this project environment:

```bash
conda activate rsgen
```
Run from this repository root:
```bash
python test_inference.py \
  --class-label 207 \
  --height 512 \
  --width 512 \
  --steps 250 \
  --mode sde \
  --guidance-scale 2.05 \
  --guidance-low 0.0 \
  --guidance-high 0.7 \
  --output demo_images/demo_sde250_class207_seed42.png
```
```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path(".").resolve()
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" and torch.cuda.is_bf16_supported() else torch.float32

# Load the vendored pipeline code shipped at the repository root.
pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=str(model_dir / "pipeline.py"),
    local_files_only=True,
).to(device)
if device == "cuda":
    pipe.transformer.to(dtype=dtype)
    pipe.vae.to(dtype=dtype)

gen = torch.Generator(device=device).manual_seed(42)
result = pipe(
    class_labels=[207],
    height=512,
    width=512,
    num_inference_steps=250,
    mode="sde",
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=gen,
)
Path("demo_images").mkdir(exist_ok=True)  # ensure the output directory exists
result.images[0].save("demo_images/sample.png")
```
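NiT accepts native (including non-square) resolutions, but the latent grid must still hold an integer number of patches. As a minimal sketch, a hypothetical helper can round an arbitrary request down to a valid size; the factor of 32 below (VAE downsampling × patch size) is an assumption for illustration, not read from this checkpoint's config, so check `pipe.vae.config` before relying on it.

```python
def snap_resolution(height: int, width: int, multiple: int = 32) -> tuple[int, int]:
    """Round a requested resolution down to the nearest valid multiple.

    The `multiple=32` default is an illustrative assumption, not a value
    taken from the NiT-XL config.
    """
    snap = lambda x: max(multiple, (x // multiple) * multiple)
    return snap(height), snap(width)
```

For example, `snap_resolution(500, 300)` yields `(480, 288)`, which can then be passed as `height`/`width` to the pipeline call above.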
For remote Hub loading:
```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-XL-diffusers",
    custom_pipeline="pipeline",
)
```
Default sampling settings:

| Setting | Value |
| --- | --- |
| Resolution | 512×512 |
| Mode | `sde` |
| Steps | 250 |
| Guidance scale | 2.05 |
| Guidance interval | (0.0, 0.7) |

Using very low steps (for example 2) is only a smoke test and will produce low-quality images.
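The `--guidance-low`/`--guidance-high` flags suggest interval-limited classifier-free guidance: the CFG correction is applied only while the (normalized) timestep falls inside the interval. The sketch below is a plausible reading of that behavior, not the pipeline's actual implementation; the timestep convention and function name are assumptions.

```python
def guided_velocity(v_cond, v_uncond, t_norm, scale=2.05, interval=(0.0, 0.7)):
    """Sketch of interval-limited classifier-free guidance.

    t_norm is a normalized timestep in [0, 1] (convention assumed). Inside
    `interval` the usual CFG combination is used; outside it, the
    conditional prediction passes through unchanged.
    """
    low, high = interval
    if low <= t_norm <= high:
        return v_uncond + scale * (v_cond - v_uncond)
    return v_cond
```

Restricting guidance to part of the trajectory is a common way to keep strong conditioning without over-saturating late denoising steps.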
If you use this model or the NiT method in your work, please cite:
```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
Use `--seed` to change the sampling seed; the examples above use seed 42.
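The seed fully determines the initial latent noise, which is why a fixed seed plus identical settings reproduces the same image. A minimal illustration with `torch.Generator`, the same mechanism the Python example above uses:

```python
import torch

# Two generators seeded identically draw identical noise tensors, so the
# pipeline's initial latents -- and hence the sampled image -- match.
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)
noise_a = torch.randn(2, 3, generator=g1)
noise_b = torch.randn(2, 3, generator=g2)
assert torch.equal(noise_a, noise_b)
```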