
WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving

Mingwang Xu1*  Jiahao Cui1*  Feipeng Cai2*  Hanlin Shang1*  Zhihao Zhu1  Shan Luan1 
Yifang Xu1  Neng Zhang2  Yaoyi Li2  Jia Cai2  Siyu Zhu1 
1Fudan University  2Yinwang Intelligent Technology Co., Ltd 


πŸ“° News

  • 2026/02/01: 🎉🎉🎉 Released the pretrained models on Hugging Face.
  • 2025/12/06: 🎉🎉🎉 Paper submitted to arXiv.

πŸ“…οΈ Roadmap

| Status | Milestone                                  | ETA        |
| :----: | :----------------------------------------- | :--------- |
| ✅     | Release the inference source code           | 2025.12.21 |
| ✅     | Release the SFT and inference code          | 2025.12.21 |
| ✅     | Release pretrained models on Hugging Face   | 2026.02.01 |
| 🚀     | Release NAVSIM evaluation code              | TBD        |
| 🚀     | Release the RL code                         | TBD        |

πŸ”§οΈ Framework

(Figure: overview of the WAM-Diff framework.)

πŸ† Qualitative Results on NAVSIM

NAVSIM-v1 benchmark results

(Figure: qualitative results on the NAVSIM-v1 benchmark.)

NAVSIM-v2 benchmark results

(Figure: qualitative results on the NAVSIM-v2 benchmark.)

Quick Inference Demo

The pretrained WAM-Diff models are available on the Hugging Face Hub. To quickly test the model, follow these steps:

  1. Clone the repository

    git clone https://github.com/fudan-generative-vision/WAM-Diff
    cd WAM-Diff
    
  2. Initialize the environment
    If you prefer conda, run the environment setup script to install the necessary dependencies:

    bash init_env.sh
    

    Or you can use uv to create the environment:

    uv venv && uv sync
    
  3. Prepare the models
    Download the pretrained WAM-Diff model from Hugging Face into the ./model/WAM-Diff directory (a scripted download sketch is given after this list):

    https://huggingface.co/fudan-generative-ai/WAM-Diff
    

    Download the pretrained SigLIP 2 model from Hugging Face into the ./model/siglip2-so400m-patch14-384 directory:

    https://huggingface.co/google/siglip2-so400m-patch14-384
    
  4. Run the demo script
    Execute the demo script to test WAM-Diff on an example image:

    bash inf.sh
    
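For step 3, the following is a minimal download sketch using the huggingface_hub Python package (install with pip install huggingface_hub). The target directories mirror the paths expected by the scripts above; adjust them if your layout differs.

    # Minimal sketch: fetch both checkpoints into the directories used by the
    # inference scripts above. Requires `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    # WAM-Diff checkpoint
    snapshot_download(
        repo_id="fudan-generative-ai/WAM-Diff",
        local_dir="./model/WAM-Diff",
    )

    # SigLIP 2 vision encoder
    snapshot_download(
        repo_id="google/siglip2-so400m-patch14-384",
        local_dir="./model/siglip2-so400m-patch14-384",
    )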

Training

To fine-tune WAM-Diff, please follow these steps:

  1. Set Up the Environment
    Follow the same environment setup steps as in the Quick Inference Demo section.
  2. Prepare the Data
    Prepare your training dataset as a JSON file in the following format (a scripted sketch for assembling such entries is given after this list):
    [
        {
            "image": ["path/to/image1.png"],
            "conversations": [
                {
                    "from": "human",
                    "value": "Here is front views of a driving vehicle:\n<image>\nThe navigation information is: straight\nThe current position is (0.00,0.00)\nCurrent velocity is: (13.48,-0.29)  and current accelerate is: (0.19,0.05)\nPredict the optimal driving action for the next 4 seconds with 8 new waypoints."
                },
                {
                    "from": "gpt",
                    "value": "6.60,-0.01,13.12,-0.03,19.58,-0.04,25.95,-0.03,32.27,-0.03,38.56,-0.05,44.88,-0.06,51.16,-0.09"
                }
            ]
        },
        ...
    ]
    
  3. Run the Training Script
    Execute the training script with the following commands:
    cd train
    bash ./scripts/llada_v_finetune.sh
    
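For step 2, here is a minimal, hypothetical Python sketch of assembling one sample in the JSON format above. The make_sample helper, the output file name train_data.json, and the numeric values are illustrative placeholders, not the authors' actual preprocessing pipeline:

    # Hypothetical sketch (not the official preprocessing): write a training JSON
    # file in the format shown above. The target is serialized as 8 (x, y)
    # waypoints, i.e. a flat "x1,y1,...,x8,y8" string covering the next 4 seconds.
    import json

    def make_sample(image_path, command, position, velocity, accel, waypoints):
        """Build one entry; all numbers are formatted to two decimals."""
        prompt = (
            "Here is front views of a driving vehicle:\n<image>\n"
            f"The navigation information is: {command}\n"
            f"The current position is ({position[0]:.2f},{position[1]:.2f})\n"
            f"Current velocity is: ({velocity[0]:.2f},{velocity[1]:.2f})  "
            f"and current accelerate is: ({accel[0]:.2f},{accel[1]:.2f})\n"
            "Predict the optimal driving action for the next 4 seconds with 8 new waypoints."
        )
        answer = ",".join(f"{v:.2f}" for xy in waypoints for v in xy)
        return {
            "image": [image_path],
            "conversations": [
                {"from": "human", "value": prompt},
                {"from": "gpt", "value": answer},
            ],
        }

    samples = [
        make_sample(
            image_path="path/to/image1.png",
            command="straight",
            position=(0.00, 0.00),
            velocity=(13.48, -0.29),
            accel=(0.19, 0.05),
            waypoints=[(6.60, -0.01), (13.12, -0.03), (19.58, -0.04), (25.95, -0.03),
                       (32.27, -0.03), (38.56, -0.05), (44.88, -0.06), (51.16, -0.09)],
        ),
    ]

    with open("train_data.json", "w") as f:
        json.dump(samples, f, indent=4)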

πŸ“ Citation

If you find our work useful for your research, please consider citing the paper:

@article{xu2025wam,
  title={WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving},
  author={Xu, Mingwang and Cui, Jiahao and Cai, Feipeng and Shang, Hanlin and Zhu, Zhihao and Luan, Shan and Xu, Yifang and Zhang, Neng and Li, Yaoyi and Cai, Jia and others},
  journal={arXiv preprint arXiv:2512.11872},
  year={2025}
}

πŸ€— Acknowledgements

We gratefully acknowledge the contributors to the LLaDA-V repository, whose commitment to open source has provided us with an excellent codebase and pretrained models.
