@AbstractPhil on Hugging Face: "Anima - Brent JSON (PREVIEW) - Subject Bucketing Full article available…"

posted an update about 15 hours ago

Anima - Brent JSON (PREVIEW) - Subject Bucketing

Full article available at https://huggingface.co/blog/AbstractPhil/subject-bucketing

There is also a Civitai model release:
https://civitai.com/models/2730503/anima-jsonenglish

AbstractPhil/anima-prelim-1k-r64
This is the JSON multi-prompt diffusion model prototype. It uses the Anima 1.0 base as the pretrained model and finetunes it toward the JSON target. The upcoming JSON LoRA is being cached and trained with 40,000 of the full 83,000 valid images from the Qwen set.

This first preview version is ready to use as a ComfyUI-compatible LoRA, so you can load the epoch you want in ComfyUI without any special setup and have at it. You can currently use plain English in conjunction with tagging to produce useful and meaningful prompt targets without the JSON.

The ComfyUI nodes are present and work for testing, but they are not ready for production use just yet.

-- Technical --
The primary target was the VLM JSON output, followed by the AnimeTIMM ViT captions processed through the VLM JSON processor. The first 12 epochs trained on VLM-captioned images with JSON formatting. The last 8 epochs finetuned from epoch 12 onward to 20, using the AnimeTIMM captions turned into JSON instead.
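As a rough sketch of that two-phase schedule, here is how the caption source could be picked per epoch. The function and label names are hypothetical, and I am assuming 1-indexed epochs with the switch happening after epoch 12; the actual training script may be organized differently.

```python
# Hypothetical constants; the boundaries follow the schedule described in the post.
VLM_EPOCHS = 12    # epochs 1-12: VLM-generated JSON captions
TOTAL_EPOCHS = 20  # epochs 13-20: AnimeTIMM captions converted to JSON


def caption_source_for_epoch(epoch: int) -> str:
    """Return which caption set a 1-indexed epoch trains on."""
    if not 1 <= epoch <= TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in 1..{TOTAL_EPOCHS}, got {epoch}")
    return "vlm_json" if epoch <= VLM_EPOCHS else "animetimm_json"


# Full schedule as a list, one entry per epoch.
schedule = [caption_source_for_epoch(e) for e in range(1, TOTAL_EPOCHS + 1)]
```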

The Anima model itself accepted the 1,000-image set, and the JSON prompting works quite well. In the process I set up a couple of ComfyUI nodes that can translate base prompts into the same language the model is learning. Those are present in the repo.
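To give a feel for what translating a base prompt into JSON might look like, here is a minimal sketch. It is not the code from the ComfyUI nodes, and the keys "tags" and "description" are placeholders I made up for illustration; the real schema is whatever the nodes in the repo emit.

```python
import json


def tags_to_json_prompt(base_prompt: str, description: str = "") -> str:
    """Turn a comma-separated tag prompt into a JSON prompt string.

    The keys "tags" and "description" are placeholders, not the model's schema.
    """
    # Split on commas and drop empty or whitespace-only entries.
    tags = [t.strip() for t in base_prompt.split(",") if t.strip()]
    payload = {"tags": tags}
    if description:
        payload["description"] = description
    return json.dumps(payload, ensure_ascii=False)


example = tags_to_json_prompt(
    "1girl, red hair,  outdoors", "A girl standing in a field."
)
```

Keeping the plain-English description next to the tags mirrors the idea above that English and tagging can be used together.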

AbstractPhil

about 6 hours ago

edited about 6 hours ago

Upcoming behavioral assessments cover a large array of Qwen VLM models, and I will publish benchmarks for them.

These will target generic use cases, meaning as many tasks as possible that do not require finetuning.

Which models produce valid JSON schema?
image classification
bounding box location
image text identification and accuracy checking
structural and spatial awareness
3d geometric object identification and awareness
camera rotational offset
subject fixation and awareness
semantic association
depth analysis
segmentation potential
ViT accuracy to image prompting
outline and association testing
style identification and structural awareness
type differentiation with data types: JSON, YAML, Markdown, and a multitude of other potential formats
utilization and response to those types and the expected prompts
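
For the first question in the list, a simple starting point is to measure how often a model's raw output parses as a JSON object at all. The sketch below uses only the Python standard library, and the sample outputs are made up. Note that it checks well-formedness only, not conformance to a particular schema, which would need a separate validation step.

```python
import json


def is_valid_json_object(text: str) -> bool:
    """True if text parses as JSON and the top-level value is an object."""
    try:
        return isinstance(json.loads(text), dict)
    except (json.JSONDecodeError, TypeError):
        return False


def valid_json_rate(outputs: list) -> float:
    """Fraction of outputs that are well-formed JSON objects."""
    if not outputs:
        return 0.0
    return sum(is_valid_json_object(o) for o in outputs) / len(outputs)


# Made-up model outputs for illustration only.
sample_outputs = ['{"label": "cat"}', '{"label": "dog"', "not json", '{"boxes": []}']
rate = valid_json_rate(sample_outputs)
```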
