MemCode-VLA v11

Memory-conditioned visuomotor policy for robot manipulation.

Architecture: SmolVLM2-2.2B VLM → MoT dual-path 24-layer denoiser (LaST0 pattern) → Flow Matching action head → DeltaMem online memory → World Model Expert (LeWM-style) → Cascade Anchor Decoder (DiffusionDrive/BridgeDrive)
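
To illustrate the Flow Matching action head step, here is a minimal sketch of how an action could be sampled by Euler-integrating a velocity field from noise (t=0) to an action (t=1). The function names, the time convention, and the toy velocity field are assumptions for illustration only; in the real model the velocity would come from the MoT denoiser, whose interface is not described here.

```python
import random


def euler_sample(velocity, x0, num_steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for the learned denoiser.

    Returns the velocity of the straight-line path that reaches `target` at t=1.
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


random.seed(0)
target_action = [0.5, -0.2, 0.1]
noise = [random.gauss(0.0, 1.0) for _ in range(3)]
action = euler_sample(make_toy_velocity(target_action), noise, num_steps=10)
```

With the toy field the sampled action lands on target_action up to floating-point error; a learned field would only approximate its training distribution.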

Training: 8×H100-80GB DDP, 45K steps (checkpoints at 5K intervals)

Config:

B_ep=32, W=48, 24 MoT layers
VLM: features from a single layer, 14 of 24 (GR00T N1 pattern)
Anchor: 512 anchors, Sinkhorn+centering+focal KL, cosine distance
DeltaMem: rank-8, per-layer delta-rule associative memory
World Model: LeWM-style ARPredictor, H=8 history, S=2 stride
CoT: LaST0 <|latent_pad|> pattern, 4 latent reasoning tokens
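
The DeltaMem entry above refers to a delta-rule associative memory. A minimal sketch of a generic delta-rule write and read follows, assuming a plain dense memory matrix and a scalar write strength beta. It does not model the rank-8 parameterization or the per-layer wiring, and the function names are hypothetical rather than taken from the repository.

```python
def read(memory, key):
    """Retrieve the value associated with `key`: M @ k."""
    return [sum(m * k for m, k in zip(row, key)) for row in memory]


def delta_rule_update(memory, key, value, beta=1.0):
    """Return a new memory after one delta-rule write.

    memory: d_v x d_k matrix as a list of rows
    key:    length d_k vector
    value:  length d_v vector
    Update: M <- M + beta * (v - M k) k^T
    """
    prediction = read(memory, key)
    error = [v - p for v, p in zip(value, prediction)]
    return [
        [m + beta * e * k for m, k in zip(row, key)]
        for row, e in zip(memory, error)
    ]


# Toy example: 2-dimensional values, 3-dimensional unit-norm key.
memory = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
key = [1.0, 0.0, 0.0]
memory = delta_rule_update(memory, key, [2.0, -1.0])
first_read = read(memory, key)
memory = delta_rule_update(memory, key, [4.0, 0.0])
second_read = read(memory, key)
```

With a unit-norm key and beta=1, a second write to the same key replaces the stored value instead of accumulating on top of it, which is the property that distinguishes the delta rule from a purely additive Hebbian update.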

Checkpoints:

Step	Action Loss	Anchor Eff Rank	WM Active
5000	-	-	-
10000	-	-	-
15000	-	-	-
20000	-	-	-
25000	-	-	-
30000	-	-	-
35000	-	-	-
40000	0.020	326/512	0.145
45000	-	-	-
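
The Anchor Eff Rank column reports how many of the 512 anchors are effectively in use. The card does not say how this number is computed. One common choice, shown here purely as an assumption, is the exponential of the Shannon entropy of the anchor usage distribution (its perplexity). The function name and the use of raw usage counts are hypothetical.

```python
import math


def effective_count(usage_counts):
    """Perplexity of the usage distribution: exp(-sum p_i * log p_i).

    Equals N when N anchors are used uniformly, and 1 when a single
    anchor receives all assignments.
    """
    total = sum(usage_counts)
    if total <= 0:
        raise ValueError("usage_counts must contain at least one positive count")
    entropy = 0.0
    for count in usage_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


uniform = effective_count([5, 5, 5, 5])
collapsed = effective_count([7, 0, 0])
```

Under this definition a value well below the anchor count would indicate that assignments concentrate on a subset of anchors.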

Data: RoboTwin (clean+aug, 5%) + AgiBot World (45%) + InternData-A1 (45%)

Resume:

PYTORCH_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
  torchrun --standalone --nnodes=1 --nproc_per_node=8 \
  -m xq_memcodevla.training.train train pretraining --resume

Code: https://github.com/guohetian/XQ-MemCodeVLA (branch: dev)

Papers: MemCode-VLA (memory + planning) + TokenAct (efficient execution)
