Qwen-Image fast tokenizer (tokenizer.json)
A derived artifact for running Qwen/Qwen-Image on the native-Rust/MLX mlx-gen engine (SceneWorks).
Why this exists
Qwen/Qwen-Image ships its Qwen2 BPE tokenizer as vocab.json + merges.txt only; there is
no fast tokenizer.json in the upstream repo (the Python fork builds the fast tokenizer at
runtime via transformers). The Rust engine's tokenizer loader (mlx_gen::TextTokenizer, consumed by
the qwen-image provider's load_tokenizer) reads the HF tokenizers fast serialization, so it
needs a tokenizer.json.
This repo hosts that derived tokenizer.json so SceneWorks model-install can overlay it onto the
upstream Qwen-Image snapshot, instead of running a Python vocab.json + merges.txt to fast-tokenizer
conversion at install time on every machine (the desktop Mac bundle ships no Python). See SceneWorks
sc-6570; this mirrors the Kolors fast-tokenizer overlay (sc-4764).
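The overlay step described above amounts to placing the derived file next to the upstream slow-tokenizer files. A minimal sketch, assuming a local snapshot directory that already holds vocab.json and merges.txt; the function name `overlay_tokenizer` and the directory layout are illustrative and are not taken from the SceneWorks installer:

```python
import shutil
from pathlib import Path


def overlay_tokenizer(derived_json: Path, snapshot_tokenizer_dir: Path) -> Path:
    """Copy a derived tokenizer.json next to the upstream vocab.json + merges.txt."""
    # Refuse to overlay onto a directory that does not look like a
    # slow-tokenizer snapshot (hypothetical safety check).
    for name in ("vocab.json", "merges.txt"):
        if not (snapshot_tokenizer_dir / name).is_file():
            raise FileNotFoundError(f"{name} missing in {snapshot_tokenizer_dir}")
    target = snapshot_tokenizer_dir / "tokenizer.json"
    shutil.copyfile(derived_json, target)
    return target
```

Because the copy only adds one file, the upstream files stay untouched and the overlay can be re-applied safely.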
Note:
Qwen/Qwen-Image-Edit-2511 already ships its own tokenizer.json upstream, so only the base text-to-image Qwen/Qwen-Image repo needs this overlay.
How it was built
Materialized by tools/build_qwen_tokenizer.py (mlx-gen): it loads the Qwen2 tokenizer with
transformers.AutoTokenizer.from_pretrained (the fast path) and writes it out with backend_tokenizer.save(...).
The result is the byte-identical fast tokenizer the fork builds at runtime: same vocab, merges,
NFC + ByteLevel pipeline, and special tokens.
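The build step can be sketched roughly as follows. This is not the contents of tools/build_qwen_tokenizer.py; it is a minimal illustration that assumes the transformers package is installed and that `slow_tokenizer_dir` is a local directory holding the upstream tokenizer files. The function name and paths are hypothetical.

```python
from pathlib import Path


def build_fast_tokenizer(slow_tokenizer_dir: str, out_path: str) -> Path:
    """Convert a vocab.json + merges.txt tokenizer directory into tokenizer.json."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoTokenizer

    # use_fast=True requests the Rust-backed fast tokenizer; transformers converts
    # from the slow files when no tokenizer.json is present.
    tok = AutoTokenizer.from_pretrained(slow_tokenizer_dir, use_fast=True)
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # backend_tokenizer is the underlying tokenizers.Tokenizer; save() writes
    # the fast serialization as a single JSON file.
    tok.backend_tokenizer.save(str(out))
    return out


# Example call (hypothetical paths):
# build_fast_tokenizer("Qwen-Image/tokenizer", "out/tokenizer.json")
```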
Validation: the fast-tokenizer ids equal those of the fork's runtime transformers tokenizer across a
battery of EN, EN-long, CN, mixed CN/EN/numeric/punctuation, and empty (negative-prompt) inputs, with 0 mismatches.
vocab_size is 151665; the pad token id is 151643 (<|endoftext|>).
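A parity check of this kind can be sketched as a small harness that runs two encoders over the same prompts and collects any differences. The prompts below are illustrative stand-ins for the categories named above, not the actual battery, and the encoders are passed in as plain callables so that no particular tokenizer API is assumed.

```python
from typing import Callable, Iterable, List, Tuple

Encoder = Callable[[str], List[int]]


def find_mismatches(
    reference: Encoder, candidate: Encoder, prompts: Iterable[str]
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (prompt, reference_ids, candidate_ids) for every prompt that differs."""
    mismatches = []
    for prompt in prompts:
        ref_ids = reference(prompt)
        got_ids = candidate(prompt)
        if ref_ids != got_ids:
            mismatches.append((prompt, ref_ids, got_ids))
    return mismatches


# Illustrative prompts only: short EN, long EN, CN, mixed, and empty.
BATTERY = [
    "a red fox in the snow",
    "a red fox in the snow, " * 40,
    "雪地里的一只红狐狸",
    "红狐狸 red fox 2024, 4K!",
    "",
]
```

With real tokenizers, `reference` and `candidate` would each wrap an encode call that returns a list of ids; an empty result list corresponds to the "0 mismatches" outcome.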
Files
- tokenizer.json: the derived fast tokenizer (the file the Rust engine needs).
- vocab.json, merges.txt, tokenizer_config.json, added_tokens.json, special_tokens_map.json: the upstream slow-tokenizer source files (provenance / reproducibility).
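The pad token id quoted under "How it was built" can be spot-checked from tokenizer.json with the standard library alone. A minimal sketch, assuming the file follows the usual HF tokenizers layout, where special tokens are listed under a top-level `added_tokens` array with `id` and `content` fields; the helper name is illustrative:

```python
import json
from typing import Optional


def added_token_id(tokenizer_json_path: str, content: str) -> Optional[int]:
    """Look up the id of an added/special token in a tokenizer.json file."""
    with open(tokenizer_json_path, encoding="utf-8") as f:
        data = json.load(f)
    # Each entry is expected to look like {"id": ..., "content": "...", ...}.
    for entry in data.get("added_tokens", []):
        if entry.get("content") == content:
            return entry.get("id")
    return None


# Example call (hypothetical path):
# added_token_id("qwen-image-tokenizer/tokenizer.json", "<|endoftext|>")
```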
License & provenance
Derived from the Qwen2 tokenizer shipped with Qwen/Qwen-Image (Apache-2.0). This repo redistributes only the tokenizer (no model weights) for engine interoperability.