---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-to-image
---
<h1 align="center">JoyAI-Image-Edit<br><sub><sup>Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation</sup></sub></h1>
<div align="center">
[Paper](https://joyai-image.s3.cn-north-1.jdcloud-oss.com/JoyAI-Image.pdf)
[GitHub](https://github.com/jd-opensource/JoyAI-Image)
[Hugging Face](https://huggingface.co/jdopensource/JoyAI-Image-Edit)
[ModelScope](https://modelscope.cn/models/jd-opensource/JoyAI-Image-Edit)
[Demo](https://huggingface.co/spaces/stevengrove/JoyAI-Image-Edit-Space)
[License](LICENSE)
</div>
## 🐶 JoyAI-Image-Edit
JoyAI-Image-Edit is a multimodal foundation model specialized in instruction-guided image editing. It enables precise and controllable edits by leveraging strong spatial understanding, including scene parsing, relational grounding, and instruction decomposition, allowing complex modifications to be applied accurately to specified regions.
## 🚀 Quick Start
**Requirements**: Python >= 3.10, CUDA-capable GPU
### Core Dependencies
The `transformers` version must be **at least 4.57.0 and below 4.58.0**; other versions may produce incorrect results.
| Package | Version | Purpose |
|---------|---------|---------|
| `torch` | >= 2.8 | PyTorch |
| `transformers` | >= 4.57.0, < 4.58.0 | Text encoder |
| `torchvision` | - | Image processing |
| `einops` | - |Tensor manipulation|
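
Because results depend on the `transformers` version, it can help to check the installed version before loading the model. The following is a minimal sketch, not part of the JoyAI-Image codebase: `parse_release` and `transformers_version_ok` are hypothetical helper names, and the parsing only looks at the leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '4.57.6' -> (4, 57, 6)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def transformers_version_ok(version_string: str) -> bool:
    """True when the version satisfies >= 4.57.0 and < 4.58.0."""
    release = parse_release(version_string)
    # Pad short versions such as '4.57' to three components before comparing.
    release = release + (0,) * (3 - len(release))
    return (4, 57, 0) <= release < (4, 58, 0)


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if transformers_version_ok(installed) else "outside the supported range"
        print(f"transformers {installed}: {status}")
```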
### Install the JoyAI-Image-Edit [pull request](https://github.com/huggingface/diffusers/pull/13444) for diffusers
```bash
pip install git+https://github.com/huggingface/diffusers.git@refs/pull/13444/head
```
### Or install from this repo (the PR will be merged into the diffusers main branch soon)
```bash
pip install torch==2.8 transformers==4.57.6 torchvision einops
pip install git+https://github.com/Moran232/diffusers.git@joyimage_edit
```
### Running with Diffusers
```python
import torch
from PIL import Image
from diffusers import JoyImageEditPipeline

# Load the pipeline, cast it to bfloat16, and move it to the GPU.
pipeline = JoyImageEditPipeline.from_pretrained("jdopensource/JoyAI-Image-Edit-Diffusers")
pipeline.to(torch.bfloat16)
pipeline.to("cuda")
pipeline.set_progress_bar_config(disable=None)
print("pipeline loaded")

img_path = "./test_images/input.png"
prompt = "Remove the construction structure from the top of the crane."
image = Image.open(img_path).convert("RGB")

# Wrap the instruction in the chat-style template used by this example.
prompts = [f"<|im_start|>user\n<image>\n{prompt}<|im_end|>\n"]
inputs = {
    "image": image,
    "prompt": prompts,
    "generator": torch.manual_seed(0),  # fixed seed for reproducible output
    "num_inference_steps": 30,
    "guidance_scale": 4.0,
}

print("run pipeline...")
with torch.inference_mode():
    output = pipeline(**inputs)

# Save the first (and only) edited image.
image = output.images[0]
image.save("joyai_image_edit_output.png")
print("image saved.")
```
## More Usage
### Spatial Editing Reference
JoyAI-Image supports three spatial editing prompt patterns: **Object Move**, **Object Rotation**, and **Camera Control**. For the most stable behavior, we recommend following the prompt templates below as closely as possible.
#### 1. Object Move
Use this pattern when you want to move a target object into a specified region.
**Prompt template:**
```text
Move the <object> into the red box and finally remove the red box.
```
**Rules:**
* Replace `<object>` with a clear description of the target object to be moved.
* The **red box** indicates the target destination in the image.
* The phrase **"finally remove the red box"** means the guidance box should not appear in the final edited result.
**Example:**
```text
Move the board into the red box and finally remove the red box.
```
<p align="center">
<img src="test_images/input1.png" width="40%" />
<img src="test_images/output1_predicted.png" width="40%" />
</p>
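
The Object Move template can be filled in programmatically. The sketch below is illustrative only: `build_move_prompt` and `wrap_for_pipeline` are hypothetical helper names, and the chat wrapper simply mirrors the string used in the Diffusers example above. It assumes the input image already contains the red box that marks the destination.

```python
def build_move_prompt(target: str) -> str:
    """Fill the Object Move template with a description of the object to move."""
    target = " ".join(target.split())  # collapse stray whitespace
    if not target:
        raise ValueError("target must be a non-empty object description")
    return f"Move the {target} into the red box and finally remove the red box."


def wrap_for_pipeline(instruction: str) -> str:
    """Wrap an instruction in the chat-style template from the Diffusers example."""
    return f"<|im_start|>user\n<image>\n{instruction}<|im_end|>\n"


if __name__ == "__main__":
    instruction = build_move_prompt("board")
    print(instruction)
    print(repr(wrap_for_pipeline(instruction)))
```

The wrapped string can then be passed as the single element of the `prompt` list shown in the Diffusers example.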
#### 2. Object Rotation
Use this pattern when you want to rotate an object to a specific canonical view.
**Prompt template:**
```text
Rotate the <object> to show the <view> side view.
```
**Supported `<view>` values:**
* `front`
* `right`
* `left`
* `rear`
* `front right`
* `front left`
* `rear right`
* `rear left`
**Rules:**
* Replace `<object>` with a clear description of the object to rotate.
* Replace `<view>` with one of the supported directions above.
* This instruction is intended to change the **object orientation**, while keeping the object identity and surrounding scene as consistent as possible.
**Example:**
```text
Rotate the dog to show the left side view.
```
<p align="center">
<img src="test_images/input2.png" width="40%" />
<img src="test_images/output2_predicted.png" width="40%" />
</p>
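
Since only eight `<view>` values are listed, validating the view before building the prompt can catch typos early. This is a minimal sketch under that assumption; `SUPPORTED_VIEWS` and `build_rotation_prompt` are hypothetical names, not part of the JoyAI-Image API.

```python
# The eight view values listed in this section.
SUPPORTED_VIEWS = (
    "front", "right", "left", "rear",
    "front right", "front left", "rear right", "rear left",
)


def build_rotation_prompt(target: str, view: str) -> str:
    """Fill the Object Rotation template after validating the requested view."""
    target = " ".join(target.split())
    view = " ".join(view.lower().split())  # normalize case and whitespace
    if not target:
        raise ValueError("target must be a non-empty object description")
    if view not in SUPPORTED_VIEWS:
        raise ValueError(f"unsupported view {view!r}; expected one of {SUPPORTED_VIEWS}")
    return f"Rotate the {target} to show the {view} side view."


if __name__ == "__main__":
    print(build_rotation_prompt("dog", "left"))
```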
#### 3. Camera Control
Use this pattern when you want to change only the camera viewpoint while keeping the 3D scene itself unchanged.
**Prompt template:**
```text
Move the camera.
- Camera rotation: Yaw {y_rotation}°, Pitch {p_rotation}°.
- Camera zoom: in/out/unchanged.
- Keep the 3D scene static; only change the viewpoint.
```
**Rules:**
* `{y_rotation}` specifies the yaw rotation angle in degrees.
* `{p_rotation}` specifies the pitch rotation angle in degrees.
* `Camera zoom` must be one of:
  * `in`
  * `out`
  * `unchanged`
* The last line is important: it explicitly tells the model to preserve the 3D scene content and geometry, and only adjust the camera viewpoint.
**Example:**
```text
Move the camera.
- Camera rotation: Yaw 0.0°, Pitch -15.0°.
- Camera zoom: unchanged.
- Keep the 3D scene static; only change the viewpoint.
```
<p align="center">
<img src="test_images/input3.png" width="40%" />
<img src="test_images/output3_predicted.png" width="40%" />
</p>
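
The multi-line Camera Control prompt is easy to get subtly wrong by hand, so a small builder may help. The sketch below is an illustration, not the authors' tooling: `build_camera_prompt` is a hypothetical name, and formatting angles to one decimal place simply mirrors the example above. The sign convention for yaw and pitch is not specified here, so check it against your own results.

```python
ZOOM_OPTIONS = ("in", "out", "unchanged")


def build_camera_prompt(yaw: float, pitch: float, zoom: str = "unchanged") -> str:
    """Fill the Camera Control template with yaw and pitch in degrees and a zoom mode."""
    if zoom not in ZOOM_OPTIONS:
        raise ValueError(f"zoom must be one of {ZOOM_OPTIONS}, got {zoom!r}")
    return "\n".join([
        "Move the camera.",
        f"- Camera rotation: Yaw {yaw:.1f}°, Pitch {pitch:.1f}°.",
        f"- Camera zoom: {zoom}.",
        # The final line asks the model to change only the viewpoint.
        "- Keep the 3D scene static; only change the viewpoint.",
    ])


if __name__ == "__main__":
    print(build_camera_prompt(yaw=0.0, pitch=-15.0))
```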
## License Agreement
JoyAI-Image is licensed under Apache 2.0.
## ☎️ We're Hiring!
We are actively hiring Research Scientists, AI Infra Engineers, and Interns to join us in building next-generation generative foundation models and bringing them into real-world applications. If you’re interested, please send your resume to: [huanghaoyang.ocean@jd.com](mailto:huanghaoyang.ocean@jd.com)