Abstract
A two-expert architecture for pose-free 3D Gaussian Splatting separates geometry estimation from appearance synthesis, achieving superior performance compared to unified monolithic approaches.
Pose-free feed-forward 3D Gaussian Splatting (3DGS) has opened a new frontier for rapid 3D modeling, enabling high-quality Gaussian representations to be generated from uncalibrated multi-view images in a single forward pass. The dominant approach in this space adopts unified monolithic architectures, often built on geometry-centric 3D foundation models, to jointly estimate camera poses and synthesize 3DGS representations within a single network. While architecturally streamlined, such "all-in-one" designs may be suboptimal for high-fidelity 3DGS generation, as they entangle geometric reasoning and appearance modeling within a shared representation. In this work, we introduce 2Xplat, a pose-free feed-forward 3DGS framework based on a two-expert design that explicitly separates geometry estimation from Gaussian generation. A dedicated geometry expert first predicts camera poses, which are then explicitly passed to a powerful appearance expert that synthesizes 3D Gaussians. Despite its conceptual simplicity, this approach, largely underexplored in prior work, proves highly effective. In fewer than 5K training iterations, the proposed two-expert pipeline substantially outperforms prior pose-free feed-forward 3DGS approaches and achieves performance on par with state-of-the-art posed methods. These results challenge the prevailing unified paradigm and suggest the potential advantages of modular design principles for complex 3D geometric estimation and appearance synthesis tasks.
Community
Key Idea:
- A pose-free feed-forward 3D Gaussian Splatting framework that decouples geometry estimation and appearance generation into two specialized experts, enabling higher-quality novel view synthesis than previous monolithic architectures.
Highlights:
- The two-expert design separates pose estimation and 3D Gaussian generation, enabling specialized learning beyond monolithic architectures.
- It achieves state-of-the-art pose-free performance and performs on par with pose-dependent methods.
- The framework is efficient, converging in under 5K iterations using pretrained experts.
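The two-stage flow summarized above can be sketched in code. This is a minimal illustrative mock-up, not the 2Xplat implementation: the class names (`GeometryExpert`, `AppearanceExpert`), the per-pixel Gaussian layout, and the toy pose/back-projection math are all assumptions made for clarity. Its only purpose is to show the interface boundary the paper argues for, where the geometry expert's predicted poses are passed explicitly to a separate appearance expert.

```python
# Hypothetical sketch of a two-expert pose-free 3DGS forward pass.
# All names and internals are illustrative stand-ins, not the paper's code.
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussians:
    means: np.ndarray   # (N, 3) Gaussian centers
    colors: np.ndarray  # (N, 3) RGB colors

class GeometryExpert:
    """Stand-in for a geometry foundation model: predicts one camera
    pose (a 4x4 camera-to-world matrix) per uncalibrated input view."""
    def predict_poses(self, views: np.ndarray) -> np.ndarray:
        v = views.shape[0]
        poses = np.tile(np.eye(4), (v, 1, 1))
        # Toy stand-in: offset each camera along x by its view index.
        poses[:, 0, 3] = np.arange(v)
        return poses

class AppearanceExpert:
    """Stand-in for the appearance expert: consumes the images *plus*
    the explicitly passed poses and emits a 3D Gaussian set."""
    def predict_gaussians(self, views: np.ndarray, poses: np.ndarray) -> Gaussians:
        v, h, w, _ = views.shape
        n = v * h * w  # one Gaussian per pixel, a common feed-forward choice
        cams = poses[:, :3, 3]                  # (V, 3) camera centers
        means = np.repeat(cams, h * w, axis=0)  # toy "back-projection": (N, 3)
        colors = views.reshape(n, 3)
        return Gaussians(means=means, colors=colors)

def two_expert_forward(views: np.ndarray) -> Gaussians:
    """Pose-free forward pass: geometry expert first, appearance expert second."""
    poses = GeometryExpert().predict_poses(views)              # stage 1: geometry
    return AppearanceExpert().predict_gaussians(views, poses)  # stage 2: appearance

views = np.random.rand(2, 4, 4, 3)    # 2 uncalibrated 4x4 RGB views
g = two_expert_forward(views)
print(g.means.shape, g.colors.shape)  # (32, 3) (32, 3)
```

The design point is the explicit hand-off: in a monolithic "all-in-one" network, `poses` would be an internal entangled feature, whereas here it is a typed intermediate that each expert can be pretrained and specialized for independently.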
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, were found to be similar to this paper:
- TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction (2026)
- NeVStereo: A NeRF-Driven NVS-Stereo Architecture for High-Fidelity 3D Tasks (2026)
- F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting (2026)
- UniSem: Generalizable Semantic 3D Reconstruction from Sparse Unposed Images (2026)
- M^3: Dense Matching Meets Multi-View Foundation Models for Monocular Gaussian Splatting SLAM (2026)
- RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations (2026)
- GIFSplat: Generative Prior-Guided Iterative Feed-Forward 3D Gaussian Splatting from Sparse Views (2026)
Get this paper in your agent:
hf papers read 2603.21064
Don't have the latest CLI? Install it with:
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper