🔍 Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval (CVPR 2026)

📖 Paper (arXiv) | 🌐 Homepage | 🐙 Code (GitHub) | 🤗 Dataset (OACIRR) | 🛜 Download Weights Now 👇

🔔 News

🔥 [2026-04-07]: The AdaFocal model checkpoints are officially released and are now available for use!
🔥 [2026-04-03]: The full Training/Evaluation code is officially released on GitHub!
🔥 [2026-03-25]: The OACIRR Benchmark is officially released on HuggingFace!
🎉 [2026-02-21]: Our paper "Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval" has been accepted to CVPR 2026!

🤖 Model Description

Architecture: ViT-G (EVA-CLIP) + BLIP-2 Q-Former + Context-Aware Attention Modulator (CAAM)
Task: Fine-grained Composed Image Retrieval (CIR) with Instance-level Consistency
Training Data: Exclusively trained on the OACIRR Union Dataset

⚙️ AdaFocal Framework

To address the core challenges of the OACIR task, we propose AdaFocal, an effective framework that dynamically modulates visual attention for precise, instance-level retrieval. Our approach augments a multimodal fusion backbone with a lightweight Context-Aware Attention Modulator (CAAM), enabling a nuanced balance between instance fidelity and compositional reasoning.

Figure: AdaFocal framework overview.

Specifically, AdaFocal employs a two-stage reasoning process: Contextual Perception and Adaptive Focus. It first perceives the query's compositional context to predict a modulation scalar (β). This learned signal then drives an Attention Activation Mechanism, which explicitly and adaptively intensifies the visual focus on the user-specified instance region (provided via bounding box) during multimodal feature fusion.

By dynamically re-weighting the attention distribution, AdaFocal seamlessly synthesizes the anchored instance, the global visual scene, and the textual modification into a coherent representation, establishing a robust and flexible baseline for identity-preserving retrieval.
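To make the re-weighting idea concrete, here is a minimal NumPy sketch. It is not the released implementation: it assumes the modulation is an additive bias of β on the attention logits of patches inside the bounding box, and it uses a fixed β where AdaFocal predicts it from the query context. The function names `bbox_to_patch_mask` and `modulated_attention`, the 4x4 patch grid, and the 224-pixel image size are all hypothetical choices for illustration.

```python
import numpy as np

def bbox_to_patch_mask(bbox, image_size, grid_size):
    """Flag every patch that overlaps the pixel box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    patch = image_size / grid_size  # side length of one square patch, in pixels
    mask = np.zeros((grid_size, grid_size), dtype=np.float64)
    col0, col1 = int(x0 // patch), int(np.ceil(x1 / patch))
    row0, row1 = int(y0 // patch), int(np.ceil(y1 / patch))
    mask[row0:row1, col0:col1] = 1.0
    return mask.reshape(-1)

def modulated_attention(logits, mask, beta):
    """Softmax over patch logits after adding beta to the in-box patches.

    ASSUMPTION: an additive logit bias stands in for the paper's
    Attention Activation Mechanism, whose exact form may differ.
    """
    shifted = logits + beta * mask
    shifted = shifted - shifted.max()  # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()

rng = np.random.default_rng(0)
grid = 4
logits = rng.normal(size=grid * grid)  # stand-in query-to-patch attention logits
mask = bbox_to_patch_mask((56, 56, 168, 168), image_size=224, grid_size=grid)

base = modulated_attention(logits, mask, beta=0.0)     # unmodulated attention
focused = modulated_attention(logits, mask, beta=2.0)  # focus pushed onto the box
print(f"in-box attention mass: {base @ mask:.3f} -> {focused @ mask:.3f}")
```

With β = 0 the function reduces to a plain softmax, so the global scene is weighted as usual; a larger β shifts attention mass toward the anchored instance without zeroing out the rest of the image.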

🚀 How to Use

1. Download the AdaFocal Weights

You can download the checkpoints using Git LFS:

cd OACIR
git lfs install
git clone https://huggingface.co/HaHaJun1101/AdaFocal ./checkpoints

Alternatively, download them via the Hugging Face Python API:

from huggingface_hub import snapshot_download

snapshot_download(repo_id="HaHaJun1101/AdaFocal", local_dir="OACIR/checkpoints", repo_type="model")
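After either download route, a quick check that both weight files are in place can save a failed evaluation run. The sketch below uses only the standard library. It assumes the two files named in this card sit at the top level of the checkpoint directory; `missing_weights` is a hypothetical helper, not part of the official codebase.

```python
from pathlib import Path

# File names as listed in this model card.
EXPECTED_WEIGHTS = ("adafocal_scalar.pt", "adafocal_vector.pt")

def missing_weights(checkpoint_dir):
    """Return the expected weight files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]

# ASSUMPTION: run from the directory that contains the OACIR checkout.
missing = missing_weights("OACIR/checkpoints")
print("all weights present" if not missing else f"missing: {missing}")
```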

2. Run Evaluation via Official Codebase

Once downloaded, you can directly evaluate the models using the evaluate.sh script provided in our GitHub codebase. Open evaluate.sh and set the path to your downloaded weights:

# Inside evaluate.sh
DATASET="Fashion"
MODEL_NAME="oacir_adafocal"
MODEL_WEIGHT="./checkpoints/adafocal_scalar.pt"  # or adafocal_vector.pt

Then execute the script:

bash evaluate.sh

🏆 Model Performance on OACIRR

We provide two variants of the AdaFocal weights. The following results can be reproduced with the provided evaluate.sh script.

| Model Variant | Component Type | R_ID@1 (Avg) | R@1 (Avg) | R@5 (Avg) | Overall Avg | Weights File |
|---|---|---|---|---|---|---|
| AdaFocal (Scalar β) | Default Configuration | 81.52 | 63.08 | 90.98 | 78.53 | `adafocal_scalar.pt` |
| AdaFocal (Vector β) | Vector Ablation | 81.99 | 63.06 | 91.35 | 78.80 | `adafocal_vector.pt` |

Detailed breakdowns across the 4 domains:

| Variant | Fashion (R_ID@1 / R@1) | Car (R_ID@1 / R@1) | Product (R_ID@1 / R@1) | Landmark (R_ID@1 / R@1) |
|---|---|---|---|---|
| Scalar β | 73.68 / 64.45 | 78.39 / 54.85 | 91.36 / 73.85 | 82.65 / 59.18 |
| Vector β | 75.71 / 65.97 | 77.97 / 54.35 | 91.39 / 73.30 | 82.90 / 58.63 |
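The two tables can be cross-checked with a few lines of Python. The sketch below recomputes the R_ID@1 and R@1 averages from the per-domain breakdown. It also assumes that "Overall Avg" is the plain mean of the three reported averages (R_ID@1, R@1, R@5), which is an inference from the numbers, not a definition stated here. The helper `check` and the 0.01 tolerance are illustrative.

```python
from statistics import fmean

# Per-domain (R_ID@1, R@1) pairs copied from the breakdown table.
PER_DOMAIN = {
    "Scalar": {"Fashion": (73.68, 64.45), "Car": (78.39, 54.85),
               "Product": (91.36, 73.85), "Landmark": (82.65, 59.18)},
    "Vector": {"Fashion": (75.71, 65.97), "Car": (77.97, 54.35),
               "Product": (91.39, 73.30), "Landmark": (82.90, 58.63)},
}

# Reported (R_ID@1 Avg, R@1 Avg, R@5 Avg, Overall Avg) from the summary table.
REPORTED = {
    "Scalar": (81.52, 63.08, 90.98, 78.53),
    "Vector": (81.99, 63.06, 91.35, 78.80),
}

def check(variant, tol=0.01):
    """True if the recomputed averages match the reported ones within tol."""
    pairs = PER_DOMAIN[variant].values()
    r_id1 = fmean(p[0] for p in pairs)
    r1 = fmean(p[1] for p in pairs)
    rep_id1, rep_r1, rep_r5, rep_overall = REPORTED[variant]
    # ASSUMPTION: Overall Avg is the mean of the three reported averages.
    overall = fmean((rep_id1, rep_r1, rep_r5))
    return (abs(r_id1 - rep_id1) < tol
            and abs(r1 - rep_r1) < tol
            and abs(overall - rep_overall) < tol)

for variant in PER_DOMAIN:
    print(variant, "consistent" if check(variant) else "mismatch")
```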

✒️ Citation

If you find our dataset, models, or codebase useful in your research, please consider citing our paper:

@article{yang2026beyond,
  title={Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval},
  author={Yang, Yuxin and Zhou, Yinan and Chen, Yuxin and Zhang, Ziqi and Ma, Zongyang and Yuan, Chunfeng and Li, Bing and Gao, Jun and Hu, Weiming},
  journal={arXiv preprint arXiv:2604.05393},
  year={2026}
}


Base model: Salesforce/blip2-itm-vit-g (this model is a fine-tuned version of it)
