opendiffusionai/laion2b-squareish-1536px
How to use neuralvfx/Z-Image-SAM-ControlNet with Diffusers:
pip install -U diffusers transformers accelerate
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
controlnet = ControlNetModel.from_pretrained("neuralvfx/Z-Image-SAM-ControlNet")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"Tongyi-MAI/Z-Image", controlnet=controlnet
)
Place the patch in ComfyUI/models/model_patches.
Use ModelPatchLoader to load the patch.
Connect MODEL_PATCH into model_patch on ZImageFunControlnet.
Connect ZImageFunControlnet into KSampler.

Use the Sam2AutoSegmentation node to create the segmented image.
Here's an example workflow JSON: comfy-ui-patch/z-image-control.json (includes an option that performs segmentation first).
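Before opening the example workflow in ComfyUI, it can help to list the node types it references so that any missing custom nodes are caught early. The sketch below is a minimal example under stated assumptions: it assumes the file follows one of the two common ComfyUI export layouts (a UI export with a top-level `nodes` list whose entries carry a `type`, or an API export mapping node ids to objects with a `class_type`). The helper names `node_types` and `load_node_types` are hypothetical, and the actual contents of `comfy-ui-patch/z-image-control.json` have not been checked here.

```python
import json
from pathlib import Path


def node_types(workflow: dict) -> set:
    """Collect node type names from a ComfyUI workflow dict (assumed layouts)."""
    nodes = workflow.get("nodes")
    if isinstance(nodes, list):
        # Assumed UI export layout: {"nodes": [{"type": "KSampler", ...}, ...]}
        return {n["type"] for n in nodes if isinstance(n, dict) and "type" in n}
    # Assumed API export layout: {"3": {"class_type": "KSampler", ...}, ...}
    return {
        n["class_type"]
        for n in workflow.values()
        if isinstance(n, dict) and "class_type" in n
    }


def load_node_types(path: str) -> set:
    """Read a workflow JSON file from disk and return its node type names."""
    return node_types(json.loads(Path(path).read_text(encoding="utf-8")))
```

For example, `{"ModelPatchLoader", "ZImageFunControlnet", "KSampler"} - load_node_types("comfy-ui-patch/z-image-control.json")` would show which of the nodes named above do not appear in the file.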
pip install -U diffusers==0.37.0
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet
cd Z-Image-SAM-ControlNet
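The Python script that follows imports from `diffusers_local` and reads `assets/z-image.png` using relative paths, so it needs to run from inside the cloned repository. A small pre-flight check along these lines may save a confusing import error. This is only a sketch: the helper `missing_paths` is hypothetical, and it assumes the two `diffusers_local` modules are single `.py` files, which the source does not confirm.

```python
from pathlib import Path

# Paths inferred from the imports and the load_image call in the script.
# Assumption: each diffusers_local module is a single .py file.
EXPECTED = (
    "diffusers_local/pipeline_z_image_control_unified.py",
    "diffusers_local/z_image_control_transformer_2d.py",
    "assets/z-image.png",
)


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Run from the repository root; missing:", ", ".join(missing))
```

An empty list means the working directory looks like the repository root.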
import torch
from diffusers.utils import load_image

# Local modules shipped in the cloned repository (run this from the repo root).
from diffusers_local.pipeline_z_image_control_unified import ZImageControlUnifiedPipeline
from diffusers_local.z_image_control_transformer_2d import ZImageControlTransformer2DModel

# Load the control transformer weights from the current directory.
transformer = ZImageControlTransformer2DModel.from_pretrained(
    ".",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    add_control_noise_refiner=True,
)

# Build the pipeline on the base model, swapping in the control transformer.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    transformer=transformer,
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="some beach wood washed up on the sunny sand, spelling the words z-image, with footprints and waves crashing",
    # Negative prompt, roughly: low resolution, low quality, deformed limbs,
    # deformed fingers, oversaturated, waxy look, faces without detail, overly
    # smooth, AI-looking image. Chaotic composition. Blurry, distorted text.
    negative_prompt="低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。",
    control_image=load_image("assets/z-image.png"),
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    controlnet_conditioning_scale=1.0,
    generator=torch.Generator("cuda").manual_seed(45),
).images[0]
image.save("output.png")
image  # displays the result when run in a notebook
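To see how strongly the control image steers the result, one option is to repeat the call above over a few values of `controlnet_conditioning_scale` while holding the seed fixed. The sketch below only plans such a sweep and does not run the model. The helper `scale_sweep` and its file naming are hypothetical, and which scale values are useful for this model is an assumption to check experimentally.

```python
def scale_sweep(scales, seed=45, prefix="output"):
    """Return (filename, overrides) pairs, one per conditioning scale."""
    runs = []
    for scale in scales:
        # Encode scale and seed in the filename so results are easy to compare.
        name = f"{prefix}_scale{scale:.2f}_seed{seed}.png"
        runs.append((name, {"controlnet_conditioning_scale": float(scale)}))
    return runs


# Intended use with the pipeline above (not executed here):
# for name, overrides in scale_sweep([0.5, 0.75, 1.0]):
#     generator = torch.Generator("cuda").manual_seed(45)  # re-seed every run
#     image = pipe(prompt=..., control_image=..., generator=generator, **overrides).images[0]
#     image.save(name)
```

Re-creating the generator inside the loop keeps the starting noise identical across runs, so differences between outputs come from the conditioning scale alone.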
Base model
Tongyi-MAI/Z-Image