MMBench2 World Model Checkpoints

Hallucination in World Models is Predictable and Preventable

Nicklas Hansen  ·  Xiaolong Wang  ·  UC San Diego

The world model follows the architecture and two-stage training recipe of Dreamer 4, adapted for large-scale multi-task continuous control, and is trained on MMBench2 — a 427-hour, 210-task dataset for visual world modeling (see the dataset repository). Each variant is a (tokenizer.pt, dynamics.pt) pair at 224×224 resolution:

  • tokenizer — a causal video tokenizer (50M-parameter encoder + 50M-parameter decoder, projecting to a 64-dim continuous latent).
  • dynamics — a 250M-parameter block-causal Transformer trained on the frozen tokenizer with a shortcut flow-matching objective.
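As a rough sanity check on download size and memory, the parameter counts listed above can be added up. The sketch below assumes weights are stored as float32 (4 bytes per parameter) and that the checkpoint files hold only model weights. Neither assumption is confirmed by the release, so treat the numbers as order-of-magnitude estimates.

```python
# Parameter counts as stated in the component list (approximate, in millions).
PARAMS_M = {
    "tokenizer_encoder": 50,
    "tokenizer_decoder": 50,
    "dynamics": 250,
}

# Assumption: float32 storage, i.e. 4 bytes per parameter.
BYTES_PER_PARAM = 4


def approx_size_gb(params_millions: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Estimate the weight size in gigabytes (10^9 bytes) for a parameter count."""
    return params_millions * 1e6 * bytes_per_param / 1e9


total_m = sum(PARAMS_M.values())
tokenizer_gb = approx_size_gb(PARAMS_M["tokenizer_encoder"] + PARAMS_M["tokenizer_decoder"])
dynamics_gb = approx_size_gb(PARAMS_M["dynamics"])

print(f"total parameters per variant: ~{total_m}M")
print(f"tokenizer weights: ~{tokenizer_gb:.1f} GB, dynamics weights: ~{dynamics_gb:.1f} GB")
```

Under these assumptions each variant carries roughly 350M parameters in total. The actual file sizes may differ if the checkpoints use a different precision or include extra state.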

Variants

Variant         Description
base            Pretrained world model (200 tasks)
coverage_aware  Coverage-aware finetuned world model (200 tasks)
combined        coverage_aware finetuned with all targeted data collection sources (210 tasks)

Repository layout

base/            tokenizer.pt  dynamics.pt
coverage_aware/  tokenizer.pt  dynamics.pt
combined/        tokenizer.pt  dynamics.pt
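After downloading, it can be useful to confirm that a local directory matches this layout before launching anything. The helper below is a minimal sketch and is not part of the code release: `missing_files` is a hypothetical name, and the default root of `./checkpoints` is taken from the usage instructions.

```python
from pathlib import Path

# Variant and file names as listed in the repository layout.
VARIANTS = ("base", "coverage_aware", "combined")
FILES = ("tokenizer.pt", "dynamics.pt")


def missing_files(root="./checkpoints", variants=VARIANTS):
    """Return the relative paths (variant/file) that are absent under `root`."""
    root = Path(root)
    missing = []
    for variant in variants:
        for name in FILES:
            if not (root / variant / name).is_file():
                missing.append((Path(variant) / name).as_posix())
    return missing


if __name__ == "__main__":
    gaps = missing_files()
    if gaps:
        print("missing checkpoint files:", ", ".join(gaps))
    else:
        print("all variants are complete")
```

Passing a subset, for example `missing_files(variants=["combined"])`, checks only the variant that was downloaded.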

Usage

Using the accompanying code release:

cd dreamer4
python download_checkpoints.py --variant combined     # or: base | coverage_aware | all
./run_interactive.sh combined                          # launch the interactive interface

download_checkpoints.py fetches the (tokenizer.pt, dynamics.pt) pair into ./checkpoints/<variant>/. Alternatively, download directly with the Hugging Face CLI:

hf download nicklashansen/mmbench2-models --include "combined/*" --local-dir ./checkpoints
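The same download can also be done from Python with the `huggingface_hub` library. The sketch below assumes `huggingface_hub` is installed; the helper names `allow_patterns` and `download` are illustrative and do not come from the code release. The repository ID and the variant names are those used in the command above.

```python
REPO_ID = "nicklashansen/mmbench2-models"
VARIANTS = ("base", "coverage_aware", "combined")


def allow_patterns(variant):
    """Build glob patterns selecting one variant's files, or every variant for 'all'."""
    if variant == "all":
        return [f"{v}/*" for v in VARIANTS]
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return [f"{variant}/*"]


def download(variant="combined", local_dir="./checkpoints"):
    """Fetch the selected variant(s) into `local_dir` and return the local folder path."""
    # Imported here so the pattern helper works without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns(variant),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(download("combined"))
```

With `local_dir="./checkpoints"`, the files should land in `./checkpoints/combined/`, which is the same location `download_checkpoints.py` is described as using.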

See the paper and the code release for architecture details, training recipes, and the hallucination detection and mitigation methods.

License

Released under the MIT License.

Citation

@article{Hansen2026Hallucination,
    title={Hallucination in World Models is Predictable and Preventable},
    author={Nicklas Hansen and Xiaolong Wang},
    year={2026},
}