Papers
arxiv:2604.16680

C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion

Published on Apr 17
· Submitted by
Yuval Haitman
on Apr 23
Authors: Yuval Haitman, Amit Efraim, Joseph M. Francos
Abstract

C-GenReg is a training-free 3D point cloud registration framework that uses generative priors and Vision Foundation Models to transfer the matching problem to the image domain for improved cross-domain generalization.

AI-generated summary

We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. Hence, C-GenReg augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel, using a World Foundation Model to synthesize multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for dense correspondence estimation extracts matches. The resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors: that of the generated-RGB branch and that of the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no image data is available.

Community

Most 3D point cloud registration methods struggle to generalize across sensors, sampling patterns, and environments—especially when relying on geometry alone.

We introduce C-GenReg, a training-free (zero-shot) framework that tackles this by transferring the problem into the image domain.

The key idea is simple but powerful:
we use a World Foundation Model to generate multi-view consistent RGB images from geometry, apply a strong Vision Foundation Model (VFM) to extract dense correspondences, and then lift these matches back to 3D.
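The final step above, lifting pixel correspondences back to 3D via the depth maps, can be sketched as a standard pinhole un-projection. This is a minimal illustration, not the paper's code; the intrinsics matrix `K`, the pixel layout, and the depth values here are assumptions:

```python
import numpy as np

def lift_to_3d(pixels, depth, K):
    """Un-project (N, 2) pixel coordinates to (N, 3) camera-frame points
    using a depth map and pinhole intrinsics K (3x3)."""
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v.astype(int), u.astype(int)]   # per-pixel depth lookup
    x = (u - K[0, 2]) * z / K[0, 0]           # (u - cx) * z / fx
    y = (v - K[1, 2]) * z / K[1, 1]           # (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy example: 2x2 depth map at a constant 2 m, principal point at (1, 1)
K = np.array([[100.0, 0.0, 1.0],
              [0.0, 100.0, 1.0],
              [0.0,   0.0, 1.0]])
depth = np.full((2, 2), 2.0)
pts = lift_to_3d(np.array([[0, 0], [1, 1]]), depth, K)
```

Doing this for both the source and target views of a match yields a 3D-3D correspondence that the registration solver can consume directly.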

To improve robustness, we introduce a Match-then-Fuse probabilistic scheme that combines image-based and geometric correspondences while preserving each modality’s inductive bias.
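One simple way to fuse two independent correspondence posteriors is a product-of-experts-style combination in log space. The sketch below assumes each branch outputs a row-stochastic match matrix over candidate targets; the paper's actual Match-then-Fuse scheme may differ in its weighting and calibration:

```python
import numpy as np

def fuse_posteriors(p_img, p_geo, eps=1e-8):
    """Fuse two (N, M) per-point match posteriors by multiplying them
    (independence assumption) and renormalizing each row."""
    log_p = np.log(p_img + eps) + np.log(p_geo + eps)
    log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# When both branches favor the same target, the fused posterior sharpens
p_img = np.array([[0.7, 0.3]])
p_geo = np.array([[0.6, 0.4]])
fused = fuse_posteriors(p_img, p_geo)
```

Because the product downweights matches that either branch finds unlikely, agreement between the image and geometric branches is rewarded while single-branch outliers are suppressed.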

The result is a plug-and-play pipeline with strong zero-shot performance, improved cross-domain generalization, and the ability to operate even in challenging real-world LiDAR scenarios where RGB is unavailable.
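Once fused 3D-3D correspondences are available, the rigid transform itself has a closed-form least-squares solution. Below is the standard Kabsch/Umeyama estimator for illustration; the paper may use a robust or weighted variant rather than this plain version:

```python
import numpy as np

def kabsch(src, tgt):
    """Closed-form rigid transform (R, t) minimizing ||R @ src.T + t - tgt.T||,
    for (N, 3) point sets src and tgt in correspondence."""
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - mu_s).T @ (tgt - mu_t)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t

# Recover a known 90-degree rotation about z plus a translation
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
src = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
tgt = src @ R_true.T + t_true
R, t = kabsch(src, tgt)
```

In practice one would run this inside a robust loop (e.g., RANSAC) with the fused posterior supplying correspondence confidences.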

We evaluate on 3DMatch, ScanNet, and Waymo, demonstrating consistent gains over strong baselines without any additional training.

Curious to hear thoughts, especially on using generative models as a bridge for geometry-only problems.

(Teaser figure)


📖 Citation:
@article{haitman2026cgenreg,
  title     = {C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion},
  author    = {Haitman, Yuval and Efraim, Amit and Francos, Joseph M.},
  journal   = {arXiv preprint arXiv:2604.16680},
  year      = {2026},
  doi       = {10.48550/arXiv.2604.16680},
  url       = {https://arxiv.org/abs/2604.16680}
}


