Bernini — MLX (full planner + renderer)

Native Apple Silicon / MLX conversion of the full ByteDance Bernini pipeline, packaged for in-process generation in SceneWorks. Bernini is a Latent Semantic Planning model. A Qwen2.5-VL-7B semantic planner, running a masked-autoregressive (MAR) loop, drives a Wan2.2-T2V-A14B dual-expert renderer.
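This card does not describe the planner's MAR loop in more detail. The sketch below is only an illustration. It shows the cosine masking schedule from the original MAR work, which decides how many masked semantic tokens are revealed at each planner step. The function name `unmask_schedule`, the clamping rule, and the assumption that Bernini's planner follows this schedule are all hypothetical. The real settings live in the planner knobs in the JSON sidecars.

```rust
use std::f64::consts::FRAC_PI_2;

/// Hypothetical helper (not part of mlx-gen-bernini): how many tokens a
/// masked-autoregressive loop reveals at each of `num_steps` steps over
/// `seq_len` initially masked tokens, using a cosine masking schedule.
fn unmask_schedule(seq_len: usize, num_steps: usize) -> Vec<usize> {
    let mut remaining = seq_len; // tokens still masked before this step
    let mut revealed = Vec::with_capacity(num_steps);
    for step in 0..num_steps {
        // Cosine schedule: the masked fraction falls from ~1 toward 0.
        let ratio = (FRAC_PI_2 * (step + 1) as f64 / num_steps as f64).cos();
        let target = (seq_len as f64 * ratio).floor() as usize;
        let still_masked = if step + 1 == num_steps {
            0 // last step: predict every remaining token
        } else {
            // Reveal at least one token when more than one is left, but keep
            // at least one masked for later steps (never more than remain).
            target.min(remaining.saturating_sub(1)).max(1).min(remaining)
        };
        revealed.push(remaining - still_masked);
        remaining = still_masked;
    }
    revealed
}
```

For 256 tokens over 64 steps, the schedule reveals a single token at the first step and larger batches near the end. The per-step counts always sum to the full token count.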

This is a turnkey, self-contained snapshot. No diffusers source or separate Wan base model is needed at runtime. It loads directly via mlx_gen::load("bernini") (mlx-gen-bernini) and is quantized at load time to Q4 (default) or Q8 (opt-in).
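Load-time quantization in MLX is typically group-wise and affine. Each group of weights shares one scale and one bias, and each weight is stored as a small integer code, so that w ≈ scale · q + bias. The sketch below shows that idea for a single group, assuming simple min/max scaling. `quantize_group` and `dequantize_group` are hypothetical names. The actual mlx-gen-bernini loader may choose groups, round, or pack codes differently.

```rust
/// Hypothetical sketch: quantize one group of weights to `bits`-bit codes
/// with an affine (scale, bias) pair so that w ≈ scale * q + bias.
fn quantize_group(w: &[f32], bits: u32) -> (Vec<u8>, f32, f32) {
    assert!((1..=8).contains(&bits), "codes are stored in u8");
    let levels = ((1u32 << bits) - 1) as f32; // 15 for Q4, 255 for Q8
    let (min, max) = w
        .iter()
        .fold((f32::INFINITY, f32::NEG_INFINITY), |(lo, hi), &x| (lo.min(x), hi.max(x)));
    // Spread the group's [min, max] range over the available code levels.
    let scale = if max > min { (max - min) / levels } else { 1.0 };
    let codes: Vec<u8> = w
        .iter()
        .map(|&x| ((x - min) / scale).round().clamp(0.0, levels) as u8)
        .collect();
    (codes, scale, min) // the bias is the group minimum
}

/// Reconstruct approximate weights from codes, scale, and bias.
fn dequantize_group(codes: &[u8], scale: f32, bias: f32) -> Vec<f32> {
    codes.iter().map(|&q| scale * q as f32 + bias).collect()
}
```

Because each code is rounded to the nearest level, the reconstruction error per weight is at most half a scale step. Q8 gives a finer step than Q4 for the same group range, at roughly twice the storage.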

Contents

qwen2_5_vl.safetensors + qwen2_5_vl_config.json: Qwen2.5-VL-7B planner backbone and vision tower
connector.safetensors, vit_decoder.safetensors, mask_tokens.safetensors: MLP connector, ViT decoder (clip-diff flow head), and MAR mask token
high_noise_model.safetensors + low_noise_model.safetensors: the Wan2.2 dual-expert renderer DiTs
t5_encoder.safetensors + tokenizer.json: UMT5-XXL text encoder and tokenizer
vae.safetensors: z16 (16 latent channels) AutoencoderKLWan
mllm/: Qwen ChatML tokenizer and config
*.json sidecars: pipeline config plus planner/renderer knobs
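The two renderer files reflect Wan2.2's mixture-of-experts design. The high-noise expert handles the early, noisy denoising steps, and the low-noise expert handles the later refinement steps. The minimal sketch below shows that routing rule. It assumes the switch happens at a fixed timestep boundary, as in the reference Wan2.2 pipeline. The names `Expert`, `expert_for_timestep`, and `split_steps` are hypothetical. The boundary ratio is assumed to be 0.875, the value the reference T2V-A14B setup uses. It is not read from this snapshot's sidecars.

```rust
/// Which renderer DiT handles a denoising step (hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Expert {
    HighNoise, // high_noise_model.safetensors
    LowNoise,  // low_noise_model.safetensors
}

/// Route a step at timestep `t` (on a 0..num_train_timesteps scale, counting
/// down from pure noise) to an expert using a fixed boundary.
fn expert_for_timestep(t: f64, boundary_ratio: f64, num_train_timesteps: f64) -> Expert {
    let boundary = boundary_ratio * num_train_timesteps;
    if t >= boundary {
        Expert::HighNoise
    } else {
        Expert::LowNoise
    }
}

/// Count how many steps of a schedule go to each expert: (high, low).
fn split_steps(timesteps: &[f64], boundary_ratio: f64, num_train_timesteps: f64) -> (usize, usize) {
    let high = timesteps
        .iter()
        .filter(|&&t| expert_for_timestep(t, boundary_ratio, num_train_timesteps) == Expert::HighNoise)
        .count();
    (high, timesteps.len() - high)
}
```

For example, with 1000 training timesteps and a ratio of 0.875, any step with t >= 875 goes to the high-noise expert. Only one expert runs per step, so a loader can either keep both experts resident or swap them. This card does not say which approach mlx-gen-bernini takes.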

Weights are stored in bf16. The pipeline has been validated on a 128 GB Apple Silicon Mac for text-to-image and text-to-video generation, with a peak memory use of about 44 GB at Q4.
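The following back-of-envelope calculation relates the Q4/Q8 options to memory. With group-wise quantization, each weight costs its code bits plus its share of the per-group scale and bias. The sketch assumes a group size of 64 with one bf16 scale and one bf16 bias per group. `bits_per_weight` and `weight_gb` are hypothetical helpers. The estimate covers weights only. Activations, latents, and any tensors left unquantized come on top, which is one reason measured peak memory exceeds a pure weight estimate.

```rust
/// Effective storage bits per weight for group-wise affine quantization,
/// assuming one bf16 scale and one bf16 bias (16 + 16 bits) per group.
fn bits_per_weight(bits: u32, group_size: u32) -> f64 {
    bits as f64 + 32.0 / group_size as f64
}

/// Approximate weight storage in GB (1e9 bytes) for `params` weights.
fn weight_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9
}
```

Assume roughly 14B parameters per renderer expert, as the A14B name suggests. A single expert then needs about 28 GB in bf16 (16 bits per weight). At Q4 (4.5 bits per weight) it needs roughly 7.9 GB, and at Q8 (8.5 bits per weight) about 14.9 GB.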

Credits & license

Derived from ByteDance/Bernini-Diffusers and Wan-AI/Wan2.2-T2V-A14B (the source of the renderer's stock UMT5 encoder and VAE), both Apache-2.0. Conversion and packaging by SceneWorks; released under Apache-2.0.
