iMaC: Translating Actions into Motion and Contact Images for Embodied World Models Paper • 2606.09813 • Published 15 days ago • 13
CLAP Collection Pretrained models for "CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos". • 2 items • Updated 8 days ago • 1
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model Paper • 2410.13925 • Published Oct 17, 2024 • 24