NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2

NOVA (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for Pollen Robotics' Reachy 2 humanoid robot.

Model Description

This model is part of an end-to-end Physical AI pipeline that combines:

Voice Input: Parakeet CTC 0.6B for speech-to-text
Scene Reasoning: Cosmos Reason 2 for object detection and spatial understanding
Action Policy: This fine-tuned GR00T N1.6 model for manipulation

Model Details

Property	Value
Base Model	nvidia/GR00T-N1.6-3B
Parameters	~3B
Embodiment	Reachy 2 (custom embodiment tag)
Action Space	8-DOF (7 arm joints + gripper)
Training Steps	30,000
Final Loss	~0.008-0.01

Action Space

action = [
    shoulder_pitch,  # -180° to 90°
    shoulder_roll,   # -180° to 10°
    elbow_yaw,       # -90° to 90°
    elbow_pitch,     # -125° to 0°
    wrist_roll,      # -100° to 100°
    wrist_pitch,     # -45° to 45°
    wrist_yaw,       # -30° to 30°
    gripper,         # 0 (closed) to 1 (open)
]
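
The ranges above can double as a safety clamp before commands reach the robot. The following is a minimal sketch, assuming the policy output is an 8-element sequence in the order listed, with joint angles in degrees; the name `clamp_action` is hypothetical and not part of GR00T or the Reachy 2 SDK.

```python
# Limits copied from the action listing above.
# Joint angles are assumed to be in degrees; the gripper value is unitless.
ACTION_LIMITS = [
    ("shoulder_pitch", -180.0, 90.0),
    ("shoulder_roll", -180.0, 10.0),
    ("elbow_yaw", -90.0, 90.0),
    ("elbow_pitch", -125.0, 0.0),
    ("wrist_roll", -100.0, 100.0),
    ("wrist_pitch", -45.0, 45.0),
    ("wrist_yaw", -30.0, 30.0),
    ("gripper", 0.0, 1.0),
]


def clamp_action(action):
    """Clamp an 8-element action to the documented per-joint ranges."""
    if len(action) != len(ACTION_LIMITS):
        raise ValueError(
            f"expected {len(ACTION_LIMITS)} values, got {len(action)}"
        )
    return [
        min(max(float(value), low), high)
        for value, (_, low, high) in zip(action, ACTION_LIMITS)
    ]
```

Clamping only guards against out-of-range commands; it does not replace collision checking or human supervision.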

Intended Use

This model is designed for:

Pick-and-place manipulation tasks on the Reachy 2 robot
Language-conditioned control ("Pick up the red cube")
Research in vision-language-action models and robotic manipulation

Supported Tasks

Pick up objects (cube, cylinder, capsule, rectangular box)
Place objects in target locations
Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)

Training

Training Data

Trained on the ganatrask/NOVA dataset:

100 episodes of expert demonstrations
32 task variations (4 objects × 8 colors)
Domain randomization (position, lighting, camera jitter)
LeRobot v2.1 format
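
As a quick check on the 4 × 8 = 32 count, the task variations can be enumerated from the object and color lists given under Supported Tasks. This is a sketch only: the prompt template is inferred from the single example "Pick up the red cube", and the wording actually used in the dataset may differ.

```python
from itertools import product

# Object and color names as listed under "Supported Tasks".
OBJECTS = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = [
    "red", "green", "blue", "yellow",
    "cyan", "magenta", "orange", "purple",
]


def task_prompts():
    """Build one language prompt per (object, color) combination.

    The template is an assumption based on the example prompt in this card.
    """
    return [
        f"Pick up the {color} {obj}"
        for obj, color in product(OBJECTS, COLORS)
    ]
```

With 100 episodes spread over 32 variations, each variation has roughly three demonstrations on average, assuming an even split (the card does not state the distribution).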

Training Configuration

Parameter	Value
GPU	NVIDIA A100-SXM4-80GB
GPUs	2
Batch Size	64
Max Steps	30,000
Save Steps	3,000
Video Backend	decord

Training Command

python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2
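
To make the schedule implied by these flags concrete, the sketch below works out which steps produce a checkpoint and how many samples the run processes. It assumes a checkpoint is written every `save_steps` steps including the final one, and that the batch size of 64 is the global batch across both GPUs; neither is stated explicitly in this card.

```python
MAX_STEPS = 30_000
SAVE_STEPS = 3_000
BATCH_SIZE = 64  # assumed to be the global batch size, not per GPU


def checkpoint_steps(max_steps=MAX_STEPS, save_steps=SAVE_STEPS):
    """Steps at which a checkpoint would be saved under the stated assumption."""
    return list(range(save_steps, max_steps + 1, save_steps))


def samples_seen(max_steps=MAX_STEPS, batch_size=BATCH_SIZE):
    """Total training samples processed over the whole run."""
    return max_steps * batch_size
```

Under these assumptions the run yields 10 checkpoints and processes 1,920,000 samples.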

Usage

Prerequisites

Before loading the model, apply the patch that adds the Reachy 2 embodiment tag to Isaac-GR00T:

cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch

Inference

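import importlib.util

from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy

# Load the modality config first. The module is executed for its side
# effects; nothing from it is referenced directly below.
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py"
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load policy (EmbodimentTag.REACHY2 exists only after the patch is applied)
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or local checkpoint path
    device="cuda",
    strict=True,
)

# Run inference.
# `image` and `joints` must be supplied by the caller:
#   image:  front camera frame, shape (H, W, 3)
#   joints: current arm joint positions, shape (7,)
# The [None, None, ...] indexing adds the leading batch and time dimensions.
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3)
    "state": {"arm_joints": joints[None, None, :]},      # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)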
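
Shape mistakes in the observation dictionary are easy to make, so a pre-flight check can help. The sketch below is a hypothetical helper (`check_obs` is not part of GR00T) that only inspects the `.shape` attribute of the arrays and the nesting of the language field. It assumes the 224×224 frame size mentioned under Limitations; whether the policy resizes frames internally is not stated here.

```python
def check_obs(obs, image_size=224, num_joints=7):
    """Return a list of problems found in an observation dict (empty if none).

    Works with any array type that exposes a `.shape` attribute.
    """
    problems = []

    # Video: expected (1, 1, H, W, 3) with H == W == image_size.
    video = tuple(obs["video"]["front_cam"].shape)
    if len(video) != 5 or video[:2] != (1, 1) or video[4] != 3:
        problems.append(f"front_cam should be (1, 1, H, W, 3), got {video}")
    elif video[2:4] != (image_size, image_size):
        problems.append(
            f"front_cam should be {image_size}x{image_size}, got {video[2:4]}"
        )

    # State: expected (1, 1, num_joints).
    state = tuple(obs["state"]["arm_joints"].shape)
    if state != (1, 1, num_joints):
        problems.append(
            f"arm_joints should be (1, 1, {num_joints}), got {state}"
        )

    # Language: expected a nested list holding one string, e.g. [["..."]].
    text = obs["language"]["annotation.human.task_description"]
    nested_ok = (
        isinstance(text, list) and len(text) == 1
        and isinstance(text[0], list) and len(text[0]) == 1
        and isinstance(text[0][0], str)
    )
    if not nested_ok:
        problems.append("task_description should look like [['<instruction>']]")

    return problems
```

Calling `check_obs(obs)` before `policy.get_action(obs)` and raising if the returned list is non-empty gives a clearer error than a shape mismatch deep inside the model.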

Performance

Metric	Value
Inference Speed	~40ms/step (A100)
VRAM Usage	~44GB / 80GB
Training Time	~6 hours (30K steps)
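
The figures in this table translate into two rates that are easier to reason about. The sketch below is plain arithmetic on the reported numbers, assuming "step" means one policy call for inference and one optimizer step for training; the function names are illustrative only.

```python
INFERENCE_MS_PER_STEP = 40.0  # reported, on an A100
TRAINING_HOURS = 6.0          # reported, approximate
TRAINING_STEPS = 30_000


def policy_rate_hz(ms_per_step=INFERENCE_MS_PER_STEP):
    """Maximum policy query rate implied by the per-step latency."""
    return 1000.0 / ms_per_step


def training_seconds_per_step(hours=TRAINING_HOURS, steps=TRAINING_STEPS):
    """Average wall-clock time per training step."""
    return hours * 3600.0 / steps
```

These work out to about 25 policy calls per second and about 0.72 seconds per training step. Both are approximate, since the source values are themselves rounded.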

Limitations

Simulation-trained: Primarily trained on MuJoCo simulation data
Single-arm: Currently supports right-arm manipulation only
Fixed camera setup: Expects front camera input at 224×224 resolution
Task scope: Optimized for pick-and-place; may not generalize to other manipulation tasks

Ethical Considerations

This model is intended for research use
Human supervision is recommended for real-robot deployment
Not intended for safety-critical applications without extensive testing

Citation

If you use this model, please cite:

@misc{nova2025,
  title={NOVA: Neural Open Vision Actions},
  author={ganatrask},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/ganatrask/NOVA}
}

Acknowledgments

NVIDIA - GR00T N1.6 base model
Pollen Robotics - Reachy 2 robot
HuggingFace - LeRobot framework
VESSL AI - GPU compute for training

License

This model inherits the NVIDIA Open Model License from the base GR00T N1.6 model.
