NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2

NOVA (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for Pollen Robotics' Reachy 2 humanoid robot.

Model Description

This model is part of an end-to-end Physical AI pipeline that combines:

  • Voice Input: Parakeet CTC 0.6B for speech-to-text
  • Scene Reasoning: Cosmos Reason 2 for object detection and spatial understanding
  • Action Policy: This fine-tuned GR00T N1.6 model for manipulation

Model Details

| Property | Value |
|---|---|
| Base Model | nvidia/GR00T-N1.6-3B |
| Parameters | ~3B |
| Embodiment | Reachy 2 (custom embodiment tag) |
| Action Space | 8-DOF (7 arm joints + gripper) |
| Training Steps | 30,000 |
| Final Loss | ~0.008 to 0.01 |

Action Space

action = [
    shoulder_pitch,  # -180° to 90°
    shoulder_roll,   # -180° to 10°
    elbow_yaw,       # -90° to 90°
    elbow_pitch,     # -125° to 0°
    wrist_roll,      # -100° to 100°
    wrist_pitch,     # -45° to 45°
    wrist_yaw,       # -30° to 30°
    gripper,         # 0 (closed) to 1 (open)
]
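
The limits listed above can be enforced before a command is sent to the robot. The following is a minimal sketch, assuming the arm joints are expressed in degrees as in the comments above. `JOINT_LIMITS` and `clamp_action` are hypothetical names introduced for illustration; they are not part of GR00T or the Reachy 2 SDK.

```python
# Hypothetical helper: limits copied from the action-space listing above.
# Assumption: arm joints are in degrees, gripper is a unitless 0..1 value.
JOINT_LIMITS = [
    ("shoulder_pitch", -180.0, 90.0),
    ("shoulder_roll", -180.0, 10.0),
    ("elbow_yaw", -90.0, 90.0),
    ("elbow_pitch", -125.0, 0.0),
    ("wrist_roll", -100.0, 100.0),
    ("wrist_pitch", -45.0, 45.0),
    ("wrist_yaw", -30.0, 30.0),
    ("gripper", 0.0, 1.0),
]


def clamp_action(action):
    """Clamp each element of an 8-element action to its listed range."""
    if len(action) != len(JOINT_LIMITS):
        raise ValueError(
            f"expected {len(JOINT_LIMITS)} values, got {len(action)}"
        )
    return [
        min(max(float(value), low), high)
        for value, (_, low, high) in zip(action, JOINT_LIMITS)
    ]


# Example: three of these values fall outside their ranges and get clamped.
raw = [100.0, -20.0, 0.0, -130.0, 50.0, 0.0, 45.0, 1.2]
clamped = clamp_action(raw)
print(clamped)
```

Clamping is only a last-resort guard; a policy that frequently hits these limits probably needs a closer look at its inputs.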

Intended Use

This model is designed for:

  • Pick-and-place manipulation tasks on the Reachy 2 robot
  • Language-conditioned control ("Pick up the red cube")
  • Research in vision-language-action models and robotic manipulation

Supported Tasks

  • Pick up objects (cube, cylinder, capsule, rectangular box)
  • Place objects in target locations
  • Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)
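
The 4 object types and 8 colors listed above combine into 32 object/color pairs. The sketch below enumerates one instruction per pair. The prompt template is an assumption based on the single example "Pick up the red cube"; the exact phrasing used in the training data may differ. `OBJECTS`, `COLORS`, and `task_prompts` are hypothetical names.

```python
from itertools import product

# Object and color names taken from the Supported Tasks list above.
OBJECTS = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = [
    "red", "green", "blue", "yellow",
    "cyan", "magenta", "orange", "purple",
]


def task_prompts():
    """Return one pick-up instruction per object/color pair.

    Assumption: the template mirrors "Pick up the red cube".
    """
    return [
        f"Pick up the {color} {obj}"
        for obj, color in product(OBJECTS, COLORS)
    ]


prompts = task_prompts()
print(len(prompts))
print(prompts[0])
```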

Training

Training Data

Trained on the ganatrask/NOVA dataset:

  • 100 episodes of expert demonstrations
  • 32 task variations (4 objects × 8 colors)
  • Domain randomization (position, lighting, camera jitter)
  • LeRobot v2.1 format

Training Configuration

| Parameter | Value |
|---|---|
| GPU | NVIDIA A100-SXM4-80GB |
| GPUs | 2 |
| Batch Size | 64 |
| Max Steps | 30,000 |
| Save Steps | 3,000 |
| Video Backend | decord |

Training Command

python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2

Usage

Prerequisites

Before running inference, apply the patch that adds the Reachy 2 embodiment tag to Isaac-GR00T:

cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch

Inference

import importlib.util

import numpy as np

from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy

# Load modality config first by executing the config file as a module
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py"
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load policy (EmbodimentTag.REACHY2 requires the patch from Prerequisites)
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or local checkpoint path
    device="cuda",
    strict=True,
)

# Placeholder inputs (dtypes are assumptions); replace with real sensor data
image = np.zeros((224, 224, 3), dtype=np.uint8)  # front camera frame (H, W, 3)
joints = np.zeros(7, dtype=np.float32)           # 7 arm joint positions

# Run inference
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3)
    "state": {"arm_joints": joints[None, None, :]},      # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)
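
Shape mistakes in the observation dictionary are easy to make, so it can help to check the layout before calling the policy. The sketch below validates the shapes shown in the example above. `validate_observation` is a hypothetical helper, not part of Isaac-GR00T. It assumes the 224×224 input size mentioned under Limitations, and it assumes that the language entry holds one list of strings per batch element. It only reads a `.shape` attribute, so it works with NumPy arrays; the usage example substitutes a simple stand-in object so that it runs without extra dependencies.

```python
from types import SimpleNamespace

EXPECTED_IMAGE_SIZE = (224, 224)  # assumption taken from the Limitations section
NUM_ARM_JOINTS = 7                # 7 arm joints, as in the state comment above


def validate_observation(obs):
    """Raise ValueError if obs does not match the layout used above."""
    video = obs["video"]["front_cam"]
    state = obs["state"]["arm_joints"]
    text = obs["language"]["annotation.human.task_description"]

    v_shape = tuple(video.shape)
    if (
        len(v_shape) != 5
        or v_shape[2:4] != EXPECTED_IMAGE_SIZE
        or v_shape[4] != 3
    ):
        raise ValueError(f"front_cam must be (B, T, 224, 224, 3), got {v_shape}")

    s_shape = tuple(state.shape)
    if len(s_shape) != 3 or s_shape[2] != NUM_ARM_JOINTS:
        raise ValueError(f"arm_joints must be (B, T, 7), got {s_shape}")

    # Assumption: video and state share the same batch dimension.
    if v_shape[0] != s_shape[0]:
        raise ValueError("video and state batch sizes differ")

    is_nested_str_list = isinstance(text, list) and all(
        isinstance(row, list) and all(isinstance(s, str) for s in row)
        for row in text
    )
    # Assumption: one list of instructions per batch element.
    if not is_nested_str_list or len(text) != v_shape[0]:
        raise ValueError("task_description must be a list of string lists, one per batch element")


# Usage with stand-in objects that only carry a .shape attribute.
example_obs = {
    "video": {"front_cam": SimpleNamespace(shape=(1, 1, 224, 224, 3))},
    "state": {"arm_joints": SimpleNamespace(shape=(1, 1, 7))},
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
validate_observation(example_obs)  # passes silently
```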

Performance

| Metric | Value |
|---|---|
| Inference Speed | ~40 ms/step (A100) |
| VRAM Usage | ~44 GB of 80 GB |
| Training Time | ~6 hours (30K steps) |
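
A few derived figures follow from the numbers reported above and in the training configuration. This is a rough worked calculation. It does not say whether one inference step yields a single action or a chunk of actions, so the rate below is how often the policy can be queried, not necessarily the control rate. The checkpoint count assumes one checkpoint is written every 3,000 steps, including the final step.

```python
# Figures reported in this model card (all approximate).
INFERENCE_MS_PER_STEP = 40.0
TRAINING_HOURS = 6.0
TRAINING_STEPS = 30_000
SAVE_STEPS = 3_000

# Maximum policy query rate on an A100.
policy_rate_hz = 1000.0 / INFERENCE_MS_PER_STEP

# Average wall-clock time per training step.
seconds_per_training_step = TRAINING_HOURS * 3600.0 / TRAINING_STEPS

# Assumption: one checkpoint per SAVE_STEPS interval, including the last step.
num_checkpoints = TRAINING_STEPS // SAVE_STEPS

print(f"policy query rate: {policy_rate_hz:.1f} Hz")
print(f"training step time: {seconds_per_training_step:.2f} s")
print(f"checkpoints written: {num_checkpoints}")
```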

Limitations

  • Simulation-trained: Primarily trained on MuJoCo simulation data
  • Single-arm: Currently supports right-arm manipulation only
  • Fixed camera setup: Expects front camera input at 224×224 resolution
  • Task scope: Optimized for pick-and-place; may not generalize to other manipulation tasks

Ethical Considerations

  • This model is intended for research use
  • Human supervision is recommended for real-robot deployment
  • Not intended for safety-critical applications without extensive testing

Citation

If you use this model, please cite:

@misc{nova2025,
  title={NOVA: Neural Open Vision Actions},
  author={ganatrask},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/ganatrask/NOVA}
}

Acknowledgments

License

This model inherits the NVIDIA Open Model License from the base GR00T N1.6 model.

Links
