FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning

FlowR2A is a generative multimodal driving planner that uses flow matching to learn the reward-conditioned action distribution p(a|r). Instead of treating simulation-based rewards as discriminative targets, as scoring-based planners do, FlowR2A reframes them as generative conditions. A single model thereby combines the dense supervision of scoring-based methods with the dynamic proposal generation of anchor-based methods. Conditioning on rewards forces the planner to internalize how an action relates to its outcomes in safety, progress, comfort, and rule compliance.
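
As a rough illustration of the flow-matching objective mentioned above, the sketch below builds one training pair and its regression target. It assumes a straight-line (rectified-flow style) path between Gaussian noise and the ground-truth action, which this card does not specify. The names `flow_matching_pair` and `velocity_matching_loss` are hypothetical, and the reward-conditioned network that would produce the predicted velocity is left out.

```python
import random


def flow_matching_pair(noise, action, t):
    """Return (x_t, target velocity) on a straight path from noise to action.

    Assumption: linear interpolation x_t = (1 - t) * noise + t * action,
    so the target velocity d x_t / d t is simply action - noise.
    """
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    target_v = [a - n for n, a in zip(noise, action)]
    return x_t, target_v


def velocity_matching_loss(pred_v, target_v):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(target_v)


rng = random.Random(0)
action = [1.0, 2.0, 3.0]                       # toy flattened trajectory (made up)
noise = [rng.gauss(0.0, 1.0) for _ in action]  # sample from the noise prior
t = rng.random()                               # flow time in [0, 1)
x_t, target_v = flow_matching_pair(noise, action, t)

# A trained decoder would predict v(x_t, t, reward, scene); a zero prediction
# stands in here only to show how the loss is evaluated.
loss = velocity_matching_loss([0.0] * len(action), target_v)
```

In a real training loop the loss would be averaged over many action-reward pairs, with the reward passed to the network as the condition.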

Model Description

FlowR2A consists of four components:

  1. Perception Encoder — a Transfuser backbone (multi-view camera + BEV LiDAR) producing scene and agent tokens.
  2. Reward Encoder — embeds simulation reward signals (safety, progress, comfort, rule compliance) into a condition vector injected via adaptive layer norm; supports classifier-free guidance through reward dropout.
  3. Flow-based Action Decoder — a transformer with self-attention over trajectory points and cross-attention to scene tokens, conditioned on reward + time embeddings via AdaLN, trained with a velocity-matching loss over dense action–reward pairs.
  4. Mode Selector — a lightweight transformer that scores generated proposals, trained with online simulation labeling.
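
The interplay of reward dropout, classifier-free guidance, and proposal selection can be sketched as follows. This is a minimal toy under stated assumptions, not the released implementation: the guidance scale, the Euler step count, the `toy_velocity` stand-in for the learned decoder, and the lambda scorer standing in for the Mode Selector are all hypothetical. The card states only that reward dropout enables classifier-free guidance and that a selector scores generated proposals.

```python
import random

NULL_REWARD = None  # hypothetical marker for "reward condition dropped"


def drop_reward(reward, drop_prob, rng):
    """Training-time reward dropout: hide the condition with prob. drop_prob."""
    return NULL_REWARD if rng.random() < drop_prob else reward


def guided_velocity(v_cond, v_uncond, scale):
    """Classifier-free guidance: move from unconditional toward conditional."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def sample_action(velocity_fn, reward, x0, steps=10, scale=1.5):
    """Euler-integrate the guided velocity field from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = guided_velocity(velocity_fn(x, t, reward),
                            velocity_fn(x, t, NULL_REWARD), scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def select_mode(proposals, score_fn):
    """Pick the proposal with the highest selector score."""
    return max(proposals, key=score_fn)


def toy_velocity(x, t, reward):
    """Made-up decoder: pulls x toward a target that stretches with reward."""
    r = 0.5 if reward is NULL_REWARD else reward
    target = [r * (k + 1) for k in range(len(x))]
    return [g - xi for g, xi in zip(target, x)]


rng = random.Random(0)
proposals = [
    sample_action(toy_velocity, reward=1.0,
                  x0=[rng.gauss(0.0, 1.0) for _ in range(3)])
    for _ in range(4)
]
best = select_mode(proposals, score_fn=lambda traj: traj[-1])  # toy scorer
```

Each proposal starts from a different noise sample, which is what makes the set multimodal; the selector then reduces it to a single plan.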

Checkpoint

File             Description
flowr2a_s2.ckpt  Stage-2 checkpoint, including all components.

Results

FlowR2A achieves state-of-the-art closed-loop performance on the NAVSIM navtest benchmarks using a lightweight backbone.

NAVSIM v1

Setting          NC    DAC   TTC   Comf.  EP    PDMS
Single proposal  98.6  97.3  95.3  100    84.9  90.0
60 proposals     98.8  98.0  96.0  100    90.1  92.8
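
For readers unfamiliar with the columns, the sketch below shows how the sub-metrics combine into PDMS for a single scenario. It follows the NAVSIM v1 definition as we understand it: NC and DAC act as multiplicative penalties, while EP, TTC, and comfort enter a weighted average with weights 5, 5, and 2. The function name `pdm_score` is hypothetical, and inputs are fractions in [0, 1] rather than the percentages shown in the table.

```python
def pdm_score(nc, dac, ep, ttc, comfort):
    """Per-scenario PDM score (assumed NAVSIM v1 form), inputs in [0, 1]."""
    weighted = (5.0 * ep + 5.0 * ttc + 2.0 * comfort) / 12.0
    return nc * dac * weighted


# A scenario with no collision, full drivable-area compliance, and half progress.
score = pdm_score(nc=1.0, dac=1.0, ep=0.5, ttc=1.0, comfort=1.0)
```

Because the benchmark averages per-scenario scores, plugging the aggregate column values into this formula will generally not reproduce the PDMS column exactly.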

NAVSIM v2

NC    DAC   DDC   TLC   EP    TTC   LK    HC    EC    EPDMS
98.9  98.1  99.1  99.7  91.5  98.5  95.0  98.3  65.2  88.9

Usage

See the GitHub repository for setup, the NAVSIM data pipeline, and inference instructions. Download the checkpoint with:

from huggingface_hub import hf_hub_download

# Fetch the Stage-2 checkpoint into the local Hugging Face cache; returns the local file path.
ckpt = hf_hub_download(repo_id="lixirui142/FlowR2A", filename="flowr2a_s2.ckpt")

Citation

@article{flowr2a2026,
  title   = {FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning},
  author  = {Li, Xirui and Liu, Zhe and Ye, Xiaoqing and Han, Wenhua and Pan, Yifeng and Han, Junyu and Zhao, Hengshuang},
  journal = {arXiv preprint},
  year    = {2026}
}

License

Released under the MIT License.
