SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
Abstract
SpatialBench presents a comprehensive benchmark for evaluating spatial foundation models across diverse domains and tasks, revealing limitations in current models and introducing DA-Next-5M and DA-Next to advance spatial representation learning.
While spatial foundation models have demonstrated impressive performance on standard datasets, a critical question remains: are they truly all-round players capable of generalizing robustly across diverse downstream tasks, arbitrary viewpoints, shifting scene domains, varying input densities, and specific hardware constraints? Answering this overarching question requires a holistic assessment, yet current models are mainly evaluated on the domains for which they were designed or trained. Such evaluations are intrinsically limited by narrow paradigm coverage, restricted scene domains, and arbitrary frame sampling, making it fundamentally difficult to assess the models' true generalization capabilities. To address this gap, we present SpatialBench, a cross-paradigm, domain-diverse benchmark for spatial foundation models with deterministic sampling. SpatialBench features unprecedented scale and rigorous deterministic design, comprising 19 datasets and 546 scenes across 5 diverse spatial domains. It comprehensively evaluates 41 models across 6 paradigms on 5 task suites under 4 different input density settings. Our extensive evaluation reveals that current models are not yet all-round players, and uncovers crucial insights for future advancement. Specifically, we demonstrate that full-context attention maximizes accuracy while bounded-memory strategies unlock long-sequence scalability. Moreover, our empirical evaluations in challenging embodied and egocentric tasks demonstrate that strict domain alignment and high data quality are far more critical to performance than simple dataset scaling. Furthermore, to address the largest data gap identified in our analysis, we go beyond evaluation by introducing a large-scale dataset, DA-Next-5M, and a strong baseline model, DA-Next, pushing the boundaries of spatial representation learning.
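To make the contrast between arbitrary and deterministic frame sampling concrete, here is a minimal Python sketch of one way to pick frames reproducibly. It is an illustration under stated assumptions, not the benchmark's actual procedure: the function name `sample_frame_indices`, the evenly spaced selection rule, and the density values in `HYPOTHETICAL_DENSITIES` are all invented for this example and are not taken from the paper.

```python
def sample_frame_indices(num_frames: int, num_views: int) -> list[int]:
    """Return evenly spaced frame indices, always including first and last.

    Hypothetical helper: the selection rule is an assumption made for
    illustration. It uses only integer arithmetic and no random state, so
    the same (num_frames, num_views) pair always yields the same indices.
    """
    if num_frames <= 0 or num_views <= 0:
        raise ValueError("num_frames and num_views must be positive")
    if num_views >= num_frames:
        # Not enough frames to subsample: use every frame once.
        return list(range(num_frames))
    if num_views == 1:
        return [0]
    span = num_frames - 1
    gaps = num_views - 1
    # Round i * span / gaps to the nearest integer without using floats.
    return [(i * span + gaps // 2) // gaps for i in range(num_views)]


# Placeholder density settings; the paper reports 4 settings but this
# page does not list their values.
HYPOTHETICAL_DENSITIES = (4, 8, 16, 32)

if __name__ == "__main__":
    for views in HYPOTHETICAL_DENSITIES:
        print(views, sample_frame_indices(100, views))
```

Because the indices depend only on the scene length and the requested number of views, every model evaluated on a given scene would see the same frames, which is the property that arbitrary sampling lacks.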
Community
The first benchmark for spatial foundation models.
really interesting to see the claim that data quality and strict domain alignment beat sheer data volume for egocentric and wrist viewpoints. that matches my intuition from robotics where domain shift often hurts a lot more than scale, and i wonder how this holds up once you add realistic sensor noise and asynchronous captures they might encounter off the bench. it would be cool to test a curriculum-style augmentation that progressively introduces cross-domain viewpoints while preserving high data quality, to see if you can keep the quality edge while improving true generalization. for a quick sanity check, the arxivlens breakdown helped me parse the deterministic sampling choices and frame-idx locking here: https://arxivlens.com/PaperView/Details/spatialbench-is-your-spatial-foundation-model-an-all-round-player-8039-7a126e16
Get this paper in your agent:
hf papers read 2605.27367
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash