Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era
arXiv: 2511.06024
ImAge is an implicit aggregation method that produces robust global image descriptors for visual place recognition (VPR). It neither modifies the backbone nor requires a separate aggregator, and it outperforms previous state-of-the-art methods on several VPR benchmarks.
Paper: Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era (NeurIPS 2025)
GitHub: Lu-Feng/ImAge
import torch
model = torch.hub.load("Lu-Feng/ImAge", "ImAge")
model.eval()
# Extract descriptor from an image
image = torch.randn(1, 3, 322, 322) # [B, 3, H, W]
with torch.no_grad():
    descriptor = model(image)  # [B, 6144] L2-normalized descriptor
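Once descriptors are extracted, place recognition typically reduces to nearest-neighbor search between query and database descriptors. The following is a minimal sketch of that retrieval step, not code from the ImAge repository: `l2_normalize` and `retrieve_top_k` are hypothetical helper names, and random vectors stand in for real descriptors so the example runs without downloading the model. It assumes descriptors are L2-normalized with 6144 dimensions, as stated above, so a dot product equals cosine similarity.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_top_k(query_desc, db_desc, k=5):
    """Return indices and similarities of the k best database matches per query.

    Assumes both arrays hold L2-normalized rows, so the dot product
    equals cosine similarity.
    """
    sims = query_desc @ db_desc.T                   # [Q, N] similarity matrix
    k = min(k, db_desc.shape[0])
    top_idx = np.argsort(-sims, axis=1)[:, :k]      # [Q, k], best match first
    top_sims = np.take_along_axis(sims, top_idx, axis=1)
    return top_idx, top_sims


# Random stand-ins for real descriptors (assumption: 6144-D, L2-normalized).
rng = np.random.default_rng(0)
db_desc = l2_normalize(rng.standard_normal((100, 6144)).astype(np.float32))

# Queries are slightly perturbed copies of database entries 3 and 42.
noise = 0.001 * rng.standard_normal((2, 6144)).astype(np.float32)
query_desc = l2_normalize(db_desc[[3, 42]] + noise)

top_idx, top_sims = retrieve_top_k(query_desc, db_desc, k=5)
```

To use real model outputs, a descriptor tensor can be converted with `descriptor.cpu().numpy()` and stacked into the query and database arrays. For large databases, an approximate nearest-neighbor index would likely replace the exhaustive matrix product used here.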
@inproceedings{ImAge,
  title={Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era},
  author={Feng Lu and Tong Jin and Canming Ye and Xiangyuan Lan and Yunpeng Liu and Chun Yuan},
  booktitle={The Annual Conference on Neural Information Processing Systems},
  year={2025}
}