NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2
NOVA (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for Pollen Robotics' Reachy 2 humanoid robot.
Model Description
This model is part of an end-to-end Physical AI pipeline that combines:
- Voice Input: Parakeet CTC 0.6B for speech-to-text
- Scene Reasoning: Cosmos Reason 2 for object detection and spatial understanding
- Action Policy: This fine-tuned GR00T N1.6 model for manipulation
Model Details
| Property | Value |
|---|---|
| Base Model | nvidia/GR00T-N1.6-3B |
| Parameters | ~3B |
| Embodiment | Reachy 2 (custom embodiment tag) |
| Action Space | 8-DOF (7 arm joints + gripper) |
| Training Steps | 30,000 |
| Final Loss | ~0.008 to 0.01 |
Action Space
action = [
    shoulder_pitch,  # -180° to 90°
    shoulder_roll,   # -180° to 10°
    elbow_yaw,       # -90° to 90°
    elbow_pitch,     # -125° to 0°
    wrist_roll,      # -100° to 100°
    wrist_pitch,     # -45° to 45°
    wrist_yaw,       # -30° to 30°
    gripper,         # 0 (closed) to 1 (open)
]
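Before a predicted action is sent to hardware, it can be useful to clamp each value to the ranges listed above. The following is a minimal sketch, assuming the action arrives as a flat sequence of 8 values in the listed order with joint angles in degrees. `JOINT_LIMITS_DEG` and `clamp_action` are hypothetical names for illustration and are not part of GR00T or the Reachy 2 SDK. Whether the policy emits degrees, radians, or normalized values is not stated here, so verify the units before relying on this.

```python
# Limits copied from the action-space listing (degrees; the gripper value is unitless).
# These names are illustrative only, not part of any library.
JOINT_LIMITS_DEG = {
    "shoulder_pitch": (-180.0, 90.0),
    "shoulder_roll": (-180.0, 10.0),
    "elbow_yaw": (-90.0, 90.0),
    "elbow_pitch": (-125.0, 0.0),
    "wrist_roll": (-100.0, 100.0),
    "wrist_pitch": (-45.0, 45.0),
    "wrist_yaw": (-30.0, 30.0),
    "gripper": (0.0, 1.0),
}


def clamp_action(action):
    """Clamp an 8-element action to the listed limits; return a name -> value dict."""
    if len(action) != len(JOINT_LIMITS_DEG):
        raise ValueError(
            f"expected {len(JOINT_LIMITS_DEG)} values, got {len(action)}"
        )
    clamped = {}
    for (name, (low, high)), value in zip(JOINT_LIMITS_DEG.items(), action):
        clamped[name] = min(max(float(value), low), high)
    return clamped
```

For example, `clamp_action([100, -200, 0, 10, 0, 0, 45, 1.5])` limits `shoulder_pitch` to 90, `shoulder_roll` to -180, `elbow_pitch` to 0, `wrist_yaw` to 30, and `gripper` to 1.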
Intended Use
This model is designed for:
- Pick-and-place manipulation tasks on the Reachy 2 robot
- Language-conditioned control ("Pick up the red cube")
- Research in vision-language-action models and robotic manipulation
Supported Tasks
- Pick up objects (cube, cylinder, capsule, rectangular box)
- Place objects in target locations
- Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)
Training
Training Data
Trained on the ganatrask/NOVA dataset:
- 100 episodes of expert demonstrations
- 32 task variations (4 objects × 8 colors)
- Domain randomization (position, lighting, camera jitter)
- LeRobot v2.1 format
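The 32 task variations follow from crossing the four object types with the eight colors listed under Supported Tasks. As a rough illustration, the combinations can be enumerated as below. The prompt template is an assumption based on the "Pick up the red cube" example; the exact wording used in the dataset annotations may differ.

```python
from itertools import product

# Object and color names taken from the Supported Tasks list.
OBJECTS = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = ["red", "green", "blue", "yellow", "cyan", "magenta", "orange", "purple"]


def task_prompts():
    """Return one instruction per (object, color) pair.

    The "Pick up the ..." template is an assumed phrasing, not confirmed
    against the dataset annotations.
    """
    return [f"Pick up the {color} {obj}" for obj, color in product(OBJECTS, COLORS)]


prompts = task_prompts()
print(len(prompts))  # 4 objects x 8 colors
```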
Training Configuration
| Parameter | Value |
|---|---|
| GPU Model | NVIDIA A100-SXM4-80GB |
| GPU Count | 2 |
| Batch Size | 64 |
| Max Steps | 30,000 |
| Save Steps | 3,000 |
| Video Backend | decord |
Training Command
python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2
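To make the schedule implied by these flags concrete, the short calculation below lists the steps at which checkpoints would be written and the total number of samples processed. It assumes a checkpoint is saved every `save_steps` steps including the final step, and that the batch size of 64 is the global batch rather than per GPU; neither detail is confirmed here.

```python
MAX_STEPS = 30_000
SAVE_STEPS = 3_000
BATCH_SIZE = 64  # assumed to be the global batch size across both GPUs

# Steps at which a checkpoint would be saved under the assumption above.
checkpoint_steps = list(range(SAVE_STEPS, MAX_STEPS + 1, SAVE_STEPS))

# Total samples drawn over the run (with repetition, since the dataset
# has only 100 episodes).
samples_seen = MAX_STEPS * BATCH_SIZE

print(len(checkpoint_steps), checkpoint_steps[0], checkpoint_steps[-1])
print(samples_seen)
```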
Usage
Prerequisites
You need to apply a patch to Isaac-GR00T to add the Reachy 2 embodiment tag:
cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch
Inference
import importlib.util

from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy

# Load the modality config first by executing the config module
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py",
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load policy
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or local checkpoint path
    device="cuda",
    strict=True,
)

# Run inference
# `image` is an (H, W, 3) front-camera frame; `joints` is a length-7 array of arm joint values
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3)
    "state": {"arm_joints": joints[None, None, :]},  # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)
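Shape mistakes in the observation dictionary are easy to make, so a small helper that validates inputs before adding the leading batch and time axes may be worthwhile. The sketch below uses only NumPy and mirrors the keys and shapes shown in the inference example. `build_observation` is a hypothetical helper, not a GR00T function, and the required dtypes are not specified here, so inputs are passed through unchanged.

```python
import numpy as np


def build_observation(image, joints, instruction):
    """Assemble an observation dict with (1, 1, ...) leading axes.

    Hypothetical helper: key names and shapes follow the inference example;
    dtypes are left as provided because the expected types are not documented here.
    """
    image = np.asarray(image)
    joints = np.asarray(joints)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"image must have shape (H, W, 3), got {image.shape}")
    if joints.shape != (7,):
        raise ValueError(f"joints must have shape (7,), got {joints.shape}")
    return {
        "video": {"front_cam": image[None, None, ...]},  # (1, 1, H, W, 3)
        "state": {"arm_joints": joints[None, None, :]},  # (1, 1, 7)
        "language": {"annotation.human.task_description": [[instruction]]},
    }


# Example with placeholder data in place of a real camera frame and joint reading.
obs = build_observation(
    np.zeros((224, 224, 3), dtype=np.uint8),
    np.zeros(7),
    "Pick up the red cube",
)
print(obs["video"]["front_cam"].shape, obs["state"]["arm_joints"].shape)
```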
Performance
| Metric | Value |
|---|---|
| Inference Speed | ~40ms/step (A100) |
| VRAM Usage | ~44GB / 80GB |
| Training Time | ~6 hours (30K steps) |
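These figures translate into a policy query rate and an average training step time as follows. This is back-of-the-envelope arithmetic on the approximate values in the table, assuming one policy query per control step; if each query returns several future actions, the effective control rate could be higher.

```python
INFERENCE_MS_PER_STEP = 40.0  # approximate, from the table (A100)
TRAINING_HOURS = 6.0          # approximate, from the table
TRAINING_STEPS = 30_000

# Policy queries per second if inference is the only per-step cost.
control_rate_hz = 1000.0 / INFERENCE_MS_PER_STEP

# Average wall-clock time per optimizer step during training.
seconds_per_training_step = TRAINING_HOURS * 3600.0 / TRAINING_STEPS

print(f"{control_rate_hz:.1f} Hz, {seconds_per_training_step:.2f} s/step")
```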
Limitations
- Simulation-trained: Primarily trained on MuJoCo simulation data
- Single-arm: Currently supports right arm manipulation only
- Fixed camera setup: Expects front camera input at 224×224 resolution
- Task scope: Optimized for pick-and-place; may not generalize to other manipulation tasks
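Because of the fixed 224×224 input, frames from a camera with a different resolution need to be brought to that size at some point. Whether the GR00T data pipeline resizes internally is not stated here, so the following is only a sketch of one way to do it by hand: a center crop to a square followed by nearest-neighbor sampling in NumPy. `center_crop_resize` is a hypothetical helper, and a proper image library with better interpolation would normally be preferable.

```python
import numpy as np


def center_crop_resize(frame, size=224):
    """Center-crop an (H, W, C) frame to a square, then resize to (size, size, C).

    Uses nearest-neighbor sampling; illustrative only.
    """
    frame = np.asarray(frame)
    height, width = frame.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    square = frame[top:top + side, left:left + side]
    # Map each output pixel index to a source pixel index.
    idx = (np.arange(size) * side) // size
    return square[idx][:, idx]


# Example: a placeholder 480x640 frame becomes 224x224.
resized = center_crop_resize(np.zeros((480, 640, 3), dtype=np.uint8))
print(resized.shape)
```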
Ethical Considerations
- This model is intended for research use
- Human supervision is recommended for deployment on a real robot
- Not intended for safety-critical applications without extensive testing
Citation
If you use this model, please cite:
@misc{nova2025,
  title={NOVA: Neural Open Vision Actions},
  author={ganatrask},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/ganatrask/NOVA}
}
Acknowledgments
- NVIDIA - GR00T N1.6 base model
- Pollen Robotics - Reachy 2 robot
- HuggingFace - LeRobot framework
- VESSL AI - GPU compute for training
License
This model inherits the NVIDIA Open Model License from the base GR00T N1.6 model.
Links
- GitHub: ganatrask/NOVA
- Dataset: ganatrask/NOVA
- Base Model: nvidia/GR00T-N1.6-3B