Bernini - MLX (full planner + renderer)

Native Apple Silicon / MLX conversion of ByteDance Bernini (the full pipeline), packaged for in-process generation in SceneWorks. Bernini is a Latent Semantic Planning model: a Qwen2.5-VL-7B semantic planner (MAR loop) drives a Wan2.2-T2V-A14B dual-expert renderer.

This is a turnkey, self-contained snapshot: no diffusers source or separate Wan base is needed at runtime. It loads directly via mlx_gen::load("bernini") (mlx-gen-bernini) and is quantized at load time, to Q4 by default or to Q8 on request.

Contents

  • qwen2_5_vl.safetensors + qwen2_5_vl_config.json: Qwen2.5-VL-7B planner backbone + vision tower
  • connector.safetensors, vit_decoder.safetensors, mask_tokens.safetensors: MLP connector, ViT decoder (clip-diff flow head), MAR mask token
  • high_noise_model.safetensors + low_noise_model.safetensors: the Wan2.2 dual-expert renderer DiTs
  • t5_encoder.safetensors + tokenizer.json: UMT5-XXL text encoder + tokenizer
  • vae.safetensors: z16 AutoencoderKLWan
  • mllm/: Qwen ChatML tokenizer/config
  • *.json sidecars: config + planner/renderer knobs
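
Because the snapshot is meant to be self-contained, a quick completeness check before loading can catch a partial download. The sketch below is one way to do that in Python using only the file names listed above. It assumes the files sit at the top level of the snapshot directory (the layout is not stated here), and `missing_entries` is a hypothetical helper name, not part of mlx-gen-bernini.

```python
from pathlib import Path

# File names copied from the Contents list; the flat layout is an assumption.
EXPECTED_FILES = [
    "qwen2_5_vl.safetensors",
    "qwen2_5_vl_config.json",
    "connector.safetensors",
    "vit_decoder.safetensors",
    "mask_tokens.safetensors",
    "high_noise_model.safetensors",
    "low_noise_model.safetensors",
    "t5_encoder.safetensors",
    "tokenizer.json",
    "vae.safetensors",
]
EXPECTED_DIRS = ["mllm"]


def missing_entries(snapshot_dir):
    """Return the expected files and directories absent from snapshot_dir."""
    root = Path(snapshot_dir)
    # Regular files first, in the order they appear in the Contents list.
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # Directories are reported with a trailing slash to tell them apart.
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling `missing_entries("/path/to/snapshot")` returns an empty list when every listed entry is present. The unnamed *.json sidecars are not checked, since their exact names are not given.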

dtype: bf16. Validated on a 128 GB Apple Silicon Mac for text-to-image and text-to-video (~44 GB peak memory at Q4).
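
For a rough sense of how quantization level affects the weight footprint, the following back-of-envelope sketch may help. Everything in it is an assumption: the parameter counts are the nominal figures suggested by the model names (7B planner, 14B per renderer expert), and the overhead model (one 16-bit scale and one 16-bit bias per group of 64 weights) mirrors MLX's default affine quantization rather than anything confirmed for this loader. It ignores the text encoder, VAE, any components left unquantized, and activations, so it is not expected to reproduce the measured ~44 GB peak.

```python
# Nominal parameter counts in billions, read off the model names.
# These are assumptions for illustration, not measured values.
NOMINAL_PARAMS_B = {
    "planner (Qwen2.5-VL-7B)": 7.0,
    "high-noise expert (A14B)": 14.0,
    "low-noise expert (A14B)": 14.0,
}

GIB = 2**30


def quantized_gib(params_billion, bits, group_size=64):
    """Approximate size in GiB of group-quantized weights."""
    # Assumed scheme: each group of `group_size` weights stores a
    # 16-bit scale and a 16-bit bias alongside the packed weights.
    bits_per_weight = bits + 32 / group_size
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def bf16_gib(params_billion):
    """Approximate size in GiB of unquantized bf16 weights (2 bytes each)."""
    return params_billion * 1e9 * 2 / GIB


total_b = sum(NOMINAL_PARAMS_B.values())
estimates = [
    ("bf16", bf16_gib(total_b)),
    ("Q8", quantized_gib(total_b, 8)),
    ("Q4", quantized_gib(total_b, 4)),
]
for label, size in estimates:
    print(f"{label}: {size:.1f} GiB (weights only, nominal)")
```

The point of the sketch is the ratio between the three settings, not the absolute numbers, which depend on details of the loader that this card does not specify.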

Credits & license

Derived from ByteDance/Bernini-Diffusers and Wan-AI/Wan2.2-T2V-A14B (the renderer's stock UMT5/VAE), both Apache-2.0. Conversion/packaging by SceneWorks; released under Apache-2.0.
