data-archetype/full_capacitor

full_capacitor distills the FLUX.2 latent space onto the SemDisDiffAE architecture. It is trained in two stages: first, the Capacitor decoder is trained to decode FLUX.2 latents; then that decoder is frozen, a matching encoder is trained on top, and its latents are regressed against FLUX.2's, yielding a standalone autoencoder.
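The two-stage schedule can be illustrated with a minimal linear toy model. This is only a sketch of the training structure, not the actual training code: the real model is the SemDisDiffAE architecture, and `A`, `Z`, `X`, `W_d`, and `W_e` below are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(16, 4))  # stand-in "true" latent -> image map
Z = rng.normal(size=(64, 4))  # stand-in FLUX.2 latents
X = Z @ A.T                   # stand-in images matching those latents

# Stage 1: train a decoder to reconstruct images from FLUX.2 latents.
W_d = np.zeros((16, 4))
for _ in range(300):
    grad = 2.0 * (Z @ W_d.T - X).T @ Z / len(Z)
    W_d -= 0.1 * grad

# Stage 2: freeze the decoder and train an encoder whose latents are
# regressed directly against the FLUX.2 latents.
W_e = np.zeros((4, 16))
for _ in range(600):
    grad = 2.0 * (X @ W_e.T - Z).T @ X / len(X)
    W_e -= 0.01 * grad

# The frozen decoder plus the new encoder form a standalone autoencoder.
round_trip_mse = np.mean((X @ W_e.T @ W_d.T - X) ** 2)
```

Because the encoder is regressed against FLUX.2 latents rather than trained jointly with the decoder, the composed encoder-decoder round trip inherits the decoder's reconstruction quality while staying in (approximately) the FLUX.2 latent space.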

2k PSNR Benchmark

| Model | Mean PSNR (dB) | Std (dB) | Median (dB) | P5 (dB) | P95 (dB) |
|---|---|---|---|---|---|
| FLUX.2 VAE | 36.28 | 4.53 | 36.07 | 28.90 | 43.63 |
| full_capacitor | 36.62 | 4.63 | 36.55 | 29.14 | 44.05 |
| Delta | +0.34 | 0.68 | 0.41 | -0.85 | 1.31 |

Evaluated on 2000 validation images.
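For reference, PSNR in dB over images in the model's [-1, 1] range (peak-to-peak data range 2.0) can be computed as below. This is a generic sketch of the metric, not the exact evaluation script used for the table above.

```python
import numpy as np

def psnr_db(reference, reconstruction, data_range=2.0):
    """PSNR in dB for arrays scaled to [-1, 1] (peak-to-peak range 2.0)."""
    mse = np.mean((reference.astype(np.float64) - reconstruction) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range**2 / mse))
```

For example, a uniform error of 0.1 on every pixel gives 10 * log10(4 / 0.01), about 26.02 dB.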

Encode Throughput

Measured on an NVIDIA GeForce RTX 5090 in bfloat16, averaging over 20 repeated batches per resolution.

| Resolution | Batch Size | FLUX.2 encode (ms/batch) | full_capacitor encode (ms/batch) | Speedup vs FLUX.2 | Peak VRAM Reduction |
|---|---|---|---|---|---|
| 256x256 | 128 | 383.41 | 42.56 | 9.01x | 91.9% |
| 512x512 | 32 | 353.58 | 44.97 | 7.86x | 92.0% |
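The numbers above follow a warm-up-then-average methodology. A minimal host-side timing helper in that spirit might look like this; the function name is illustrative, and for GPU work you would additionally synchronize (e.g. `torch.cuda.synchronize()` or CUDA events) before reading the clock, since kernel launches are asynchronous.

```python
import time

def ms_per_batch(run_batch, warmup=3, repeats=20):
    """Average wall-clock milliseconds per call over `repeats` calls."""
    for _ in range(warmup):  # warm-up: exclude one-off setup/compile costs
        run_batch()
    start = time.perf_counter()
    for _ in range(repeats):
        run_batch()
    return (time.perf_counter() - start) * 1000.0 / repeats
```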

Latent alignment is not perfect (posterior-mean cosine similarity about 95%; see Technical report), but latent PCA is very close (see Results viewer).
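The posterior-mean cosine similarity quoted above is just a flattened cosine between the two models' posterior means; a sketch of the metric itself (not the report's evaluation harness):

```python
import numpy as np

def flat_cosine(a, b):
    """Cosine similarity between two latent tensors, flattened to vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```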

Latent Interface

  • encode() returns the model's own whitened latent space.
  • decode() expects that same whitened latent space and dewhitens internally.
  • whiten() and dewhiten() are also exposed for explicit control.
  • encode_posterior() returns the raw exported posterior (mean, logsnr) before whitening.

This latent interface is self-consistent for downstream latent diffusion, but it is not a drop-in replacement for other models' latent normalization conventions.
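As an illustration of the whitening convention, per-channel affine whitening and its inverse round-trip exactly. The statistics below are made up for the example (the real values ship with the export), and the real `whiten()`/`dewhiten()` are methods on the model.

```python
import numpy as np

# Hypothetical per-channel statistics; the real values ship with the model.
mu = np.array([0.10, -0.30, 0.05, 0.20])
sigma = np.array([1.5, 0.8, 1.1, 0.9])

def whiten(z):
    """Map raw posterior means [B, C, h, w] to the whitened latent space."""
    return (z - mu[None, :, None, None]) / sigma[None, :, None, None]

def dewhiten(z):
    """Inverse of whiten(): map whitened latents back to raw latent space."""
    return z * sigma[None, :, None, None] + mu[None, :, None, None]
```

Since decode() dewhitens internally, downstream latent diffusion only ever needs to operate in the whitened space.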

The export ships weights in float32. The recommended runtime path is bfloat16 for the main encoder and decoder, while whitening, dewhitening, and other numerically sensitive inference steps remain in float32.

Usage

```python
import torch

from full_capacitor import FullCapacitor, FullCapacitorInferenceConfig

device = "cuda"
model = FullCapacitor.from_pretrained(
    "data-archetype/full_capacitor",
    device=device,
    dtype=torch.bfloat16,
)

image = ...  # [1, 3, H, W] in [-1, 1], H and W divisible by 32

with torch.inference_mode():
    latents = model.encode(image.to(device=device, dtype=torch.bfloat16))
    recon = model.decode(
        latents,
        height=int(image.shape[-2]),
        width=int(image.shape[-1]),
        inference_config=FullCapacitorInferenceConfig(num_steps=1),
    )
```

Details

  • full_capacitor uses an 8-block encoder and an 8-block decoder.
  • Raw-space cross-checks show the latent spaces remain broadly compatible, but switching between them will likely still require some adaptation for downstream latent diffusion.
  • Technical report

Citation

```bibtex
@misc{full_capacitor,
  title   = {Full capacitor: a Flux.2 VAE latent space distillation diffusion autoencoder},
  author  = {data-archetype},
  email   = {data-archetype@proton.me},
  year    = {2026},
  month   = apr,
  url     = {https://huggingface.co/data-archetype/full_capacitor},
}
```