Z-Image-SAM-ControlNet

Fun Facts

This ControlNet is trained exclusively on images generated by Segment Anything (SAM)
Base model used was Tongyi-MAI/Z-Image
Takes SAM-style segmentation images as input and outputs photorealistic images
Trained at 1024x1024 resolution, inference works best at 1.5k and up
Trained on 220K segmented images from laion2b-squareish-1536px
Trained using this repo: https://github.com/aigc-apps/VideoX-Fun
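The resolution note above can be turned into a small helper for picking inference dimensions. This is only a sketch and not part of the repo: the name `inference_size` is hypothetical, "1.5k" is read here as roughly 1536 px on the longer side, and snapping to a multiple of 16 is an assumption about what the pipeline accepts rather than a documented requirement.

```python
def inference_size(width, height, target_long_side=1536, multiple=16):
    """Scale (width, height) so the longer side is about target_long_side.

    Both sides are rounded to the nearest multiple of `multiple`
    (assumed, not documented) and the aspect ratio is kept approximately.
    """
    scale = target_long_side / max(width, height)

    def snap(value):
        # Round the scaled side to the nearest multiple, never below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


# A square control image at the training resolution scales to 1536x1536.
print(inference_size(1024, 1024))
```

The returned pair could be passed as `width` and `height` to the pipeline call in the Inference section, with the control image resized to match.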

Showcases

ComfyUI Usage

Copy the weights from comfy-ui-patch/z-image-sam-controlnet.safetensors to ComfyUI/models/model_patches
Use ModelPatchLoader to load the patch
Plug MODEL_PATCH into model_patch on ZImageFunControlnet
Plug the model, VAE and image into ZImageFunControlnet
Plug the ZImageFunControlnet into KSampler
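The first step above (copying the patch weights) can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `install_patch` and the idea of passing both directories as arguments are illustrative choices, and the paths in the usage note assume the repo was cloned next to a `ComfyUI` checkout.

```python
import shutil
from pathlib import Path

# Location of the patch inside this repo, as given in the steps above.
PATCH_RELATIVE_PATH = Path("comfy-ui-patch") / "z-image-sam-controlnet.safetensors"


def install_patch(repo_dir, comfyui_dir):
    """Copy the ControlNet patch into ComfyUI's models/model_patches folder."""
    source = Path(repo_dir) / PATCH_RELATIVE_PATH
    if not source.is_file():
        raise FileNotFoundError(f"Patch weights not found: {source}")

    target_dir = Path(comfyui_dir) / "models" / "model_patches"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    # copy2 also preserves file metadata such as the modification time.
    return Path(shutil.copy2(source, target_dir / source.name))
```

For example, `install_patch("Z-Image-SAM-ControlNet", "ComfyUI")` would place the file where ModelPatchLoader can find it, provided both directories exist at those relative paths.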

Add Auto Segmentation (optional)

Use the ComfyUI Manager to add ComfyUI-segment-anything-2
Use the Sam2AutoSegmentation node to create a segmented image

Example workflow JSON: comfy-ui-patch/z-image-control.json (includes an option that performs segmentation first)
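Before loading the example workflow, it can help to check which node types it expects, so that missing custom nodes show up early. The sketch below is an illustration only: the helper names are hypothetical, and since it is not stated which export layout `z-image-control.json` uses, the code handles both layouts ComfyUI produces (the UI format with a `nodes` list, and the API format keyed by node id).

```python
import json


def workflow_node_types(workflow):
    """Return the sorted, de-duplicated node type names in a workflow dict."""
    if isinstance(workflow.get("nodes"), list):
        # UI export: {"nodes": [{"type": "KSampler", ...}, ...]}
        names = (node.get("type") for node in workflow["nodes"])
    else:
        # API export: {"3": {"class_type": "KSampler", ...}, ...}
        names = (
            node.get("class_type")
            for node in workflow.values()
            if isinstance(node, dict)
        )
    return sorted({name for name in names if name})


def load_workflow_node_types(path):
    """Read a workflow JSON file and list its node types."""
    with open(path, encoding="utf-8") as handle:
        return workflow_node_types(json.load(handle))
```

Calling `load_workflow_node_types("comfy-ui-patch/z-image-control.json")` from the repo root should list the nodes named in the steps above, such as ModelPatchLoader and ZImageFunControlnet, if the workflow uses them under those names.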

Hugging Face Usage

Compatibility

pip install -U diffusers==0.37.0
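Since the install command pins an exact diffusers release, a quick check before running inference can save a confusing import error later. This is a small sketch using the standard library's `importlib.metadata`; the helper names are hypothetical, and treating any other version as a mismatch is an assumption based on the exact pin above, not a tested compatibility range.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED_DIFFUSERS = "0.37.0"  # the version pinned in the install command above


def matches_pin(installed, pinned=PINNED_DIFFUSERS):
    """True when the installed version string equals the pinned one exactly."""
    return installed.strip() == pinned


def installed_diffusers_version():
    """Return the installed diffusers version, or None if it is not installed."""
    try:
        return version("diffusers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_diffusers_version()
    if found is None:
        print("diffusers is not installed")
    elif matches_pin(found):
        print(f"diffusers {found} matches the pinned version")
    else:
        print(f"diffusers {found} installed; this card pins {PINNED_DIFFUSERS}")
```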

Download

sudo apt-get install git-lfs
git lfs install

git clone https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet

cd Z-Image-SAM-ControlNet

Inference

import torch
from diffusers.utils import load_image
from diffusers_local.pipeline_z_image_control_unified import ZImageControlUnifiedPipeline
from diffusers_local.z_image_control_transformer_2d import ZImageControlTransformer2DModel

transformer = ZImageControlTransformer2DModel.from_pretrained(
    ".",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    add_control_noise_refiner=True,
)

pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    transformer=transformer,
)

pipe.enable_model_cpu_offload()

image = pipe(
    prompt="some beach wood washed up on the sunny sand, spelling the words z-image, with footprints and waves crashing",
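    # Negative prompt (Chinese), roughly: "low resolution, low image quality, deformed limbs,
    # deformed fingers, oversaturated image, waxy look, faces lacking detail, overly smooth,
    # AI-generated look. Chaotic composition. Blurry, distorted text."
    negative_prompt="低分辨率，低画质，肢体畸形，手指畸形，画面过饱和，蜡像感，人脸无细节，过度光滑，画面具有AI感。构图混乱。文字模糊，扭曲。",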
    control_image=load_image("assets/z-image.png"),
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    controlnet_conditioning_scale=1.0,
    generator=torch.Generator("cuda").manual_seed(45),
).images[0]

image.save("output.png")
image
