Abstract
A two-expert architecture for pose-free 3D Gaussian Splatting separates geometry estimation from appearance synthesis, achieving superior performance compared to unified monolithic approaches.
Pose-free feed-forward 3D Gaussian Splatting (3DGS) has opened a new frontier for rapid 3D modeling, enabling high-quality Gaussian representations to be generated from uncalibrated multi-view images in a single forward pass. The dominant approach in this space adopts unified monolithic architectures, often built on geometry-centric 3D foundation models, to jointly estimate camera poses and synthesize 3DGS representations within a single network. While architecturally streamlined, such "all-in-one" designs may be suboptimal for high-fidelity 3DGS generation, as they entangle geometric reasoning and appearance modeling within a shared representation. In this work, we introduce 2Xplat, a pose-free feed-forward 3DGS framework based on a two-expert design that explicitly separates geometry estimation from Gaussian generation. A dedicated geometry expert first predicts camera poses, which are then explicitly passed to a powerful appearance expert that synthesizes 3D Gaussians. Despite its conceptual simplicity, this approach, largely underexplored in prior work, proves highly effective. In fewer than 5K training iterations, the proposed two-expert pipeline substantially outperforms prior pose-free feed-forward 3DGS approaches and achieves performance on par with state-of-the-art posed methods. These results challenge the prevailing unified paradigm and suggest the potential advantages of modular design principles for complex 3D geometric estimation and appearance synthesis tasks.
Community
Key Idea:
- A pose-free feed-forward 3D Gaussian Splatting framework that decouples geometry estimation and appearance generation into two specialized experts, enabling higher-quality novel view synthesis than previous monolithic architectures.
Highlights:
- The two-expert design separates pose estimation and 3D Gaussian generation, enabling specialized learning beyond monolithic architectures.
- It achieves state-of-the-art pose-free performance and performs on par with pose-dependent methods.
- The framework is efficient, converging in under 5K iterations using pretrained experts.
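The two-stage flow summarized above can be sketched in code. This is a minimal illustrative mock-up, not the 2Xplat implementation: the class names (`GeometryExpert`, `AppearanceExpert`), the per-pixel Gaussian layout, and the toy pose/back-projection math are all assumptions made for clarity. Its only purpose is to show the interface boundary the paper argues for, where the geometry expert's predicted poses are passed explicitly to a separate appearance expert.

```python
# Hypothetical sketch of a two-expert pose-free 3DGS forward pass.
# All names and internals are illustrative stand-ins, not the paper's code.
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussians:
    means: np.ndarray   # (N, 3) Gaussian centers
    colors: np.ndarray  # (N, 3) RGB colors

class GeometryExpert:
    """Stand-in for a geometry foundation model: predicts one camera
    pose (a 4x4 camera-to-world matrix) per uncalibrated input view."""
    def predict_poses(self, views: np.ndarray) -> np.ndarray:
        v = views.shape[0]
        poses = np.tile(np.eye(4), (v, 1, 1))
        # Toy stand-in: offset each camera along x by its view index.
        poses[:, 0, 3] = np.arange(v)
        return poses

class AppearanceExpert:
    """Stand-in for the appearance expert: consumes the images *plus*
    the explicitly passed poses and emits a 3D Gaussian set."""
    def predict_gaussians(self, views: np.ndarray, poses: np.ndarray) -> Gaussians:
        v, h, w, _ = views.shape
        n = v * h * w  # one Gaussian per pixel, a common feed-forward choice
        cams = poses[:, :3, 3]                  # (V, 3) camera centers
        means = np.repeat(cams, h * w, axis=0)  # toy "back-projection": (N, 3)
        colors = views.reshape(n, 3)
        return Gaussians(means=means, colors=colors)

def two_expert_forward(views: np.ndarray) -> Gaussians:
    """Pose-free forward pass: geometry expert first, appearance expert second."""
    poses = GeometryExpert().predict_poses(views)              # stage 1: geometry
    return AppearanceExpert().predict_gaussians(views, poses)  # stage 2: appearance

views = np.random.rand(2, 4, 4, 3)    # 2 uncalibrated 4x4 RGB views
g = two_expert_forward(views)
print(g.means.shape, g.colors.shape)  # (32, 3) (32, 3)
```

The design point is the explicit hand-off: in a monolithic "all-in-one" network, `poses` would be an internal entangled feature, whereas here it is a typed intermediate that each expert can be pretrained and specialized for independently.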
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, were found to be similar to this paper:
- TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction (2026)
- NeVStereo: A NeRF-Driven NVS-Stereo Architecture for High-Fidelity 3D Tasks (2026)
- F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting (2026)
- UniSem: Generalizable Semantic 3D Reconstruction from Sparse Unposed Images (2026)
- M^3: Dense Matching Meets Multi-View Foundation Models for Monocular Gaussian Splatting SLAM (2026)
- RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations (2026)
- GIFSplat: Generative Prior-Guided Iterative Feed-Forward 3D Gaussian Splatting from Sparse Views (2026)
Get this paper in your agent:
hf papers read 2603.21064
Don't have the latest CLI? Install it with:
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper