Training Diffusion Models with Reinforcement Learning
Paper: arXiv:2305.13301
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "kvablack/ddpo-alignment", torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")  # switch to "mps" for Apple devices

prompt = "a horse playing chess"
image = pipe(prompt).images[0]
```

This model was finetuned from Stable Diffusion v1-4 using DDPO and a reward function that uses LLaVA to measure prompt-image alignment. See the project website for more details.
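To illustrate the shape of a prompt-image alignment reward, here is a minimal, self-contained sketch. The function name `alignment_reward` is hypothetical, and `difflib` string similarity is only a runnable placeholder standing in for the actual setup, which scored a LLaVA-generated description of the image against the prompt:

```python
import difflib


def alignment_reward(prompt: str, caption: str) -> float:
    """Toy stand-in for a prompt-image alignment reward: score how well a
    caption of the generated image matches the prompt.

    The real finetuning compared a LLaVA description of the image to the
    prompt; difflib here is only a self-contained placeholder for that
    text-similarity step.
    """
    return difflib.SequenceMatcher(None, prompt.lower(), caption.lower()).ratio()


# a caption that restates the prompt scores higher than an unrelated one
good = alignment_reward("a horse playing chess", "a horse playing chess on a board")
bad = alignment_reward("a horse playing chess", "a photo of a city street")
assert good > bad
```

In DDPO, a scalar reward like this is computed per sample and used as the policy-gradient signal for the denoising model; only the similarity backend would change.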
The model was finetuned for 200 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form "a(n) <animal> <activity>", with the animal and activity drawn from the lists below, so try those for the best results. We also observed limited generalization to other prompts.
Activities:
Animals:
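A sketch of how such prompts can be assembled. The list entries here are hypothetical placeholders, not the actual lists used during finetuning:

```python
import random

# hypothetical placeholder entries; substitute the actual lists above
animals = ["horse", "otter", "elephant"]
activities = ["playing chess", "riding a bike", "washing dishes"]


def make_prompt(animal: str, activity: str) -> str:
    # choose "a" vs "an" from the animal's leading letter
    article = "an" if animal[0] in "aeiou" else "a"
    return f"{article} {animal} {activity}"


prompt = make_prompt(random.choice(animals), random.choice(activities))
```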