# data-archetype/full_capacitor
`full_capacitor` distills the FLUX.2 latent space onto the SemDisDiffAE architecture. It is trained in two stages: first, the Capacitor decoder is trained to decode FLUX.2 latents; then that decoder is frozen, a matching encoder is trained on top of it, and the encoder's latents are regressed against FLUX.2's to produce a standalone autoencoder.
## 2k PSNR Benchmark
| Model | Mean PSNR (dB) | Std (dB) | Median (dB) | P5 (dB) | P95 (dB) |
|---|---|---|---|---|---|
| FLUX.2 VAE | 36.28 | 4.53 | 36.07 | 28.90 | 43.63 |
| full_capacitor | 36.62 | 4.63 | 36.55 | 29.14 | 44.05 |
| Delta | +0.34 | 0.68 | 0.41 | -0.85 | 1.31 |
Evaluated on 2000 validation images.
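The PSNR figures above follow the standard definition; a minimal sketch of how such a benchmark is computed (images assumed in `[-1, 1]`, so the peak-to-peak range is 2.0 — this is a generic reproduction, not the exact evaluation script):

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 2.0) -> float:
    """PSNR in dB for images in [-1, 1] (peak-to-peak signal range of 2.0)."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: PSNR of a lightly perturbed image.
rng = np.random.default_rng(0)
img = rng.uniform(-1.0, 1.0, size=(3, 64, 64))
noisy = np.clip(img + rng.normal(0.0, 0.01, size=img.shape), -1.0, 1.0)
print(f"{psnr(img, noisy):.1f} dB")
```

Per-image PSNRs are then aggregated (mean, std, median, percentiles) over the validation set.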
## Encode Throughput
Measured on an NVIDIA GeForce RTX 5090 in bfloat16, averaged over 20 repeated batches per resolution.
| Resolution | Batch Size | FLUX.2 encode (ms/batch) | full_capacitor encode (ms/batch) | Speedup vs FLUX.2 | Peak VRAM Reduction |
|---|---|---|---|---|---|
| 256x256 | 128 | 383.41 | 42.56 | 9.01x | 91.9% |
| 512x512 | 32 | 353.58 | 44.97 | 7.86x | 92.0% |
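A generic harness for this kind of averaged-batch measurement might look like the sketch below (`bench_ms` is illustrative, not part of the package; for GPU work the callable must synchronize the device, e.g. via `torch.cuda.synchronize()`, so the timing covers actual kernel execution):

```python
import time

def bench_ms(fn, *, warmup: int = 3, repeats: int = 20) -> float:
    """Average wall-clock milliseconds per call over `repeats` runs, after warmup."""
    for _ in range(warmup):
        fn()  # warm caches / trigger lazy initialization
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats * 1e3

# Example with a cheap CPU stand-in workload.
elapsed = bench_ms(lambda: sum(i * i for i in range(10_000)))
print(f"{elapsed:.3f} ms/call")
```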
Latent alignment is not perfect (posterior-mean cosine similarity is about 95%; see the Technical report), but the latent PCA structure is very close (see the Results viewer).
## Latent Interface
- `encode()` returns the model's own whitened latent space.
- `decode()` expects that same whitened latent space and dewhitens internally.
- `whiten()` and `dewhiten()` are also exposed for explicit control.
- `encode_posterior()` returns the raw exported posterior `(mean, logsnr)` before whitening.
This latent interface is self-consistent for downstream latent diffusion, but it is not a drop-in replacement for other models' latent normalization conventions.
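Whitening of this kind is just a per-channel affine map; a minimal sketch of the convention (the channel count and statistics below are placeholders for illustration, not the exported values — use the model's own `whiten()`/`dewhiten()` in practice):

```python
import torch

# Hypothetical per-channel statistics; the real model ships its own.
latent_channels = 16
mean = torch.zeros(latent_channels)
std = torch.full((latent_channels,), 2.5)

def whiten(z: torch.Tensor) -> torch.Tensor:
    # z: [B, C, h, w] raw latents -> roughly zero-mean, unit-variance per channel
    return (z - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)

def dewhiten(z_w: torch.Tensor) -> torch.Tensor:
    # Inverse affine map back to the raw latent scale.
    return z_w * std.view(1, -1, 1, 1) + mean.view(1, -1, 1, 1)

z = torch.randn(1, latent_channels, 8, 8) * 2.5
assert torch.allclose(dewhiten(whiten(z)), z, atol=1e-5)
```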
The export ships weights in float32. The recommended runtime path is
bfloat16 for the main encoder and decoder, while whitening, dewhitening, and
other numerically sensitive inference steps remain in float32.
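That precision split can be sketched as follows (the encoder and statistics here are stand-ins; `FullCapacitor` handles this internally):

```python
import torch

def encode_mixed(encoder: torch.nn.Module, image: torch.Tensor,
                 mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    """Run the heavy network in bfloat16, but whiten in float32."""
    z = encoder(image.to(torch.bfloat16))  # bulk compute in bf16
    z32 = z.to(torch.float32)              # cast back before the affine map
    return (z32 - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)

# Toy stand-in encoder: 1x1 conv from 3 image channels to 4 latent channels.
encoder = torch.nn.Conv2d(3, 4, kernel_size=1).to(torch.bfloat16)
image = torch.randn(1, 3, 8, 8)
mean, std = torch.zeros(4), torch.ones(4)
latents = encode_mixed(encoder, image, mean, std)
print(latents.dtype)  # torch.float32
```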
## Usage
```python
import torch

from full_capacitor import FullCapacitor, FullCapacitorInferenceConfig

device = "cuda"
model = FullCapacitor.from_pretrained(
    "data-archetype/full_capacitor",
    device=device,
    dtype=torch.bfloat16,
)

image = ...  # [1, 3, H, W] in [-1, 1], H and W divisible by 32

with torch.inference_mode():
    latents = model.encode(image.to(device=device, dtype=torch.bfloat16))
    recon = model.decode(
        latents,
        height=int(image.shape[-2]),
        width=int(image.shape[-1]),
        inference_config=FullCapacitorInferenceConfig(num_steps=1),
    )
```
## Details
- `full_capacitor` uses an 8-block encoder and an 8-block decoder.
- Raw-space cross-checks show the latent spaces remain broadly compatible, but moving from one to the other may still require some adaptation for downstream latent diffusion.
- Technical report
## Citation
```bibtex
@misc{full_capacitor,
  title  = {Full capacitor: a FLUX.2 VAE latent space distillation diffusion autoencoder},
  author = {data-archetype},
  email  = {data-archetype@proton.me},
  year   = {2026},
  month  = apr,
  url    = {https://huggingface.co/data-archetype/full_capacitor},
}
```