AbstractPhil 
posted an update about 15 hours ago
Anima - Brent JSON (PREVIEW) - Subject Bucketing

Full article: https://huggingface.co/blog/AbstractPhil/subject-bucketing

There is also a Civitai model release:
https://civitai.com/models/2730503/anima-jsonenglish

AbstractPhil/anima-prelim-1k-r64
This is the JSON multi-prompt diffusion model prototype, using the Anima 1.0 base as the pretrained model and finetuning it toward the JSON target. The upcoming JSON LoRA is being cached and trained with 40,000 of the full 83,000 valid images from the Qwen set.

This first preview version is ready to use as a ComfyUI-compatible LoRA, so you can load whichever epoch you want in ComfyUI without any special setup and have at it. You can currently use plain English in conjunction with tagging to produce useful and meaningful prompt targets without the JSON.

The ComfyUI nodes are present and work for testing, but they are not ready for production use just yet.

-- Technical --
The primary target was the VLM JSON captions, with the AnimeTIMM ViT captions processed through the VLM JSON processor as the follow-up. The first 12 epochs trained on VLM-captioned images with JSON formatting; the last 8 epochs, from epoch 12 onward to 20, finetuned on the AnimeTIMM captions converted to JSON instead.
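As a rough illustration of that two-phase schedule, here is a minimal sketch that maps each epoch to its caption source. The names `caption_source_for_epoch`, `"vlm_json"`, and `"animetimm_json"` are placeholders of mine, not identifiers from the actual training code, and the sketch assumes epochs are counted from 1.

```python
# Hypothetical sketch of the caption schedule described above.
# All names here are illustrative; the real trainer is not shown in this post.

VLM_EPOCHS = 12     # epochs 1-12: VLM captions with JSON formatting
TOTAL_EPOCHS = 20   # epochs 13-20: AnimeTIMM captions converted to JSON


def caption_source_for_epoch(epoch: int) -> str:
    """Return the caption set a 1-indexed epoch would train on."""
    if not 1 <= epoch <= TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in 1..{TOTAL_EPOCHS}, got {epoch}")
    return "vlm_json" if epoch <= VLM_EPOCHS else "animetimm_json"


# Full schedule, one entry per epoch.
schedule = [caption_source_for_epoch(e) for e in range(1, TOTAL_EPOCHS + 1)]
```

Under this assumption, the released epoch checkpoints up to 12 would have seen only the VLM JSON captions, and later ones add the AnimeTIMM-derived JSON.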

The Anima model itself accepted the 1,000 images, and the JSON prompting works quite well. In the process I set up a couple of ComfyUI nodes that can translate base prompts into the same language the model is learning. Those are present in the repo.
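To give a feel for what translating a base prompt into JSON could look like, here is a purely illustrative sketch. The schema (`subject` and `tags` keys) and the function name `tags_to_json_prompt` are assumptions made up for this example; the actual nodes in the repo may use a different structure entirely.

```python
import json


def tags_to_json_prompt(prompt: str) -> str:
    """Turn a comma-separated tag prompt into a JSON string.

    The schema is hypothetical: the first tag is treated as the subject
    and the remaining tags as its descriptors.
    """
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    payload = {
        "subject": tags[0] if tags else "",
        "tags": tags[1:],
    }
    return json.dumps(payload, ensure_ascii=False)


example = tags_to_json_prompt("1girl, red hair, standing, outdoors")
```

The point is only that a flat tag string gets restructured into keyed fields before it reaches the text encoder; check the repo for the real format.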

Upcoming behavioral assessments cover a large array of Qwen VLM models, for which I will publish benchmarks.

These will be aligned to generic use cases, meaning as many tasks as possible that do not require finetuning:

  • which models produce valid JSON schema
  • image classification
  • bounding box location
  • image text identification and accuracy checking
  • structural and spatial awareness
  • 3D geometric object identification and awareness
  • camera rotational offset
  • subject fixation and awareness
  • semantic association
  • depth analysis
  • segmentation potential
  • ViT accuracy to image prompting
  • outline and association testing
  • style identification and structural awareness
  • type differentiation across data types: JSON, YAML, Markdown, and a multitude of other potentials
  • utilization of and response to those types and their expected prompts
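The first item on that list, whether a model produces valid JSON, is the easiest to score automatically. Below is a minimal sketch of such a check, assuming each model response is available as a plain string. The helper names and the `subject` key used in the example are hypothetical and not taken from the planned benchmark.

```python
import json
from typing import Sequence


def is_valid_json_object(text: str, required_keys: Sequence[str] = ()) -> bool:
    """True if text parses as a JSON object containing every required key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in required_keys)


def valid_rate(outputs: Sequence[str], required_keys: Sequence[str] = ()) -> float:
    """Fraction of outputs that pass is_valid_json_object."""
    if not outputs:
        return 0.0
    passed = sum(is_valid_json_object(o, required_keys) for o in outputs)
    return passed / len(outputs)


# Made-up responses, for illustration only.
outputs = ['{"subject": "cat"}', "not json", "[1, 2]", '{"tags": []}']
rate_any_object = valid_rate(outputs)
rate_with_subject = valid_rate(outputs, required_keys=("subject",))
```

This only checks that the output parses and has the expected top-level keys; validating against a full schema would need a stricter comparison.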