Grounding Vision–Language–Action Models in Scientific Laboratories

Model Description

LabVLA is the first vision–language–action (VLA) model designed for scientific laboratory environments. It combines a Qwen3-VL-4B-Instruct vision–language backbone with a DiT flow-matching action expert, trained with the π0.5 recipe to enable real-time robot control in lab settings.
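To make the "flow-matching action expert" idea concrete, here is a minimal sketch of how flow-matching sampling turns noise into an action vector by integrating a velocity field with Euler steps. It is not LabVLA's implementation: `toy_velocity` is a hypothetical stand-in for the learned DiT, the step count is arbitrary, and the time convention (t = 0 is noise, t = 1 is the action) is an assumption, since some implementations run time in the opposite direction.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned action expert.

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries x_t to the target is (target - x_t) / (1 - t).
    A real model predicts this from images, language, and robot state.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_actions(target, num_steps=10, seed=0):
    """Integrate the velocity field from noise (t=0) towards data (t=1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so no division by zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step
    return x
```

Because the toy field is exact, `sample_actions([0.1, -0.3, 0.5])` lands on the target up to floating-point error. With a learned field the result is only approximate, and the number of Euler steps trades accuracy against control latency.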

How to Use

Download

huggingface-cli download zjunlp/LabVLA --local-dir LabVLA
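The same download can be done from Python, which may be convenient inside a setup script. This is a small sketch that assumes the `huggingface_hub` package is installed; the helper name `download_labvla` is hypothetical and not part of the LabVLA repository.

```python
REPO_ID = "zjunlp/LabVLA"


def download_labvla(local_dir="LabVLA"):
    """Fetch the model files into local_dir and return the local path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_labvla()` should mirror the CLI command above and place the weights in a `LabVLA` directory.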

Deployment

Serve the model over the OpenPI msgpack WebSocket protocol:

git clone https://github.com/zjunlp/LabVLA.git
cd LabVLA
bash deployment/deploy.sh
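Loading a model of this size can take a while, so a robot-side script may want to wait until the server accepts connections before sending observations. The sketch below uses only the Python standard library. The host and port defaults are assumptions (check `deployment/deploy.sh` for the actual values), and `wait_for_server` is a hypothetical helper. It only confirms that the TCP port is open, not that the WebSocket or msgpack handshake succeeds.

```python
import socket
import time


def wait_for_server(host="localhost", port=8000, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, else False.

    host and port are assumed defaults, not values confirmed by LabVLA.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection as a liveness probe.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # server not up yet; retry until deadline
    return False
```

A control loop could call `wait_for_server()` once at startup and abort with a clear message if it returns `False`.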

For training, data preparation, and further details, see the GitHub repository: https://github.com/zjunlp/LabVLA

Model size: 5B params (Safetensors, BF16)
