---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-to-image
---

# JoyAI-Image-Edit

*Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation*

[![Report PDF](https://img.shields.io/badge/Report-PDF-red)](https://joyai-image.s3.cn-north-1.jdcloud-oss.com/JoyAI-Image.pdf) [![Project](https://img.shields.io/badge/Project-JoyAI--Image-333399)](https://github.com/jd-opensource/JoyAI-Image) [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-JoyAI--Image--Edit-yellow)](https://huggingface.co/jdopensource/JoyAI-Image-Edit) [![ModelScope](https://img.shields.io/badge/%F0%9F%A4%96%20ModelScope-JoyAI--Image--Edit-624aff)](https://modelscope.cn/models/jd-opensource/JoyAI-Image-Edit) [![Demo](https://img.shields.io/badge/%F0%9F%9A%80%20Demo-Spatial--Edit-orange)](https://huggingface.co/spaces/stevengrove/JoyAI-Image-Edit-Space) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

## 🐶 JoyAI-Image-Edit

JoyAI-Image-Edit is a multimodal foundation model specialized in instruction-guided image editing. It enables precise, controllable edits by leveraging strong spatial understanding (scene parsing, relational grounding, and instruction decomposition), so that complex modifications are applied accurately to the specified regions.

## 🚀 Quick Start

**Requirements**: Python >= 3.10 and a CUDA-capable GPU.

### Core Dependencies

The `transformers` version must be **>= 4.57.0 and < 4.58.0**; other versions may produce incorrect results.

| Package | Version | Purpose |
|---------|---------|---------|
| `torch` | >= 2.8 | PyTorch |
| `transformers` | >= 4.57.0, < 4.58.0 | Text encoder |
| `torchvision` | - | Image processing |
| `einops` | - | Tensor manipulation |

### Install diffusers from the JoyAI-Image-Edit [pull request](https://github.com/huggingface/diffusers/pull/13444)

```bash
pip install git+https://github.com/huggingface/diffusers.git@refs/pull/13444/head
```

### Or install from the `Moran232/diffusers` fork (the PR is expected to be merged into the diffusers main branch soon)

```bash
pip install torch==2.8 transformers==4.57.6 torchvision einops
pip install git+https://github.com/Moran232/diffusers.git@joyimage_edit
```

### Running with Diffusers

Load the pipeline in bfloat16 on the GPU and define the input:

```python
import torch
from PIL import Image
from diffusers import JoyImageEditPipeline

# Load the pipeline, cast its weights to bfloat16, and move it to the GPU.
pipeline = JoyImageEditPipeline.from_pretrained("jdopensource/JoyAI-Image-Edit-Diffusers")
pipeline.to(torch.bfloat16)
pipeline.to("cuda")
# Passed to tqdm: disable=None shows the progress bar, except on non-TTY output.
pipeline.set_progress_bar_config(disable=None)
print("pipeline loaded")

# Input image and editing instruction.
img_path = "./test_images/input.png"
prompt = "Remove the construction structure from the top of the crane."
```
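The run step wraps each instruction in `<|im_start|>user` ... `<|im_end|>` markers and passes the result to the pipeline as a list. As a minimal sketch, that wrapping can be factored into a small helper. `build_edit_prompts` is a hypothetical name, not part of diffusers; it only reproduces the f-string used in this example and does not check what other formats the pipeline might accept.

```python
def build_edit_prompts(instructions):
    """Wrap plain-text editing instructions in the chat-style markers used in
    the Quick Start example. Hypothetical helper, not part of diffusers."""
    # Accept a single instruction or a list of instructions.
    if isinstance(instructions, str):
        instructions = [instructions]
    # Same format as the Quick Start f-string: the user header, a blank line,
    # then the instruction, closed by the end marker and a newline.
    return [f"<|im_start|>user\n\n{text}<|im_end|>\n" for text in instructions]


prompts = build_edit_prompts("Remove the construction structure from the top of the crane.")
print(repr(prompts[0]))
```

The returned list holds the same string as the inline f-string, so it can be used as the `prompt` entry of `inputs`.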
Run the edit and save the result:

```python
# Load the input image as RGB.
image = Image.open(img_path).convert("RGB")

# Wrap the instruction in chat-style markers, as in the reference example.
prompts = [f"<|im_start|>user\n\n{prompt}<|im_end|>\n"]

inputs = {
    "image": image,
    "prompt": prompts,
    "generator": torch.manual_seed(0),  # fixed seed for reproducible output
    "num_inference_steps": 30,
    "guidance_scale": 4.0,
}

print("run pipeline...")
with torch.inference_mode():
    output = pipeline(**inputs)

image = output.images[0]
image.save("joyai_image_edit_output.png")
print("image saved.")
```

## More Usages

### Spatial Editing Reference

JoyAI-Image supports three spatial editing prompt patterns: **Object Move**, **Object Rotation**, and **Camera Control**. For the most stable behavior, we recommend following the prompt templates below as closely as possible.

#### 1. Object Move

Use this pattern when you want to move a target object into a specified region.

**Prompt template:**

```text
Move the
