Cost-Efficient RAG for Entity Matching with LLMs: A Blocking-based Exploration
Abstract
CE-RAG4EM reduces computational overhead in large-scale entity matching by implementing blocking-based batch retrieval and generation while maintaining competitive matching quality.
Retrieval-augmented generation (RAG) enhances LLM reasoning in knowledge-intensive tasks, but existing RAG pipelines incur substantial retrieval and generation overhead when applied to large-scale entity matching. To address this limitation, we introduce CE-RAG4EM, a cost-efficient RAG architecture that reduces computation through blocking-based batch retrieval and generation. We also present a unified framework for analyzing and evaluating RAG systems for entity matching, focusing on blocking-aware optimizations and retrieval granularity. Extensive experiments suggest that CE-RAG4EM can achieve comparable or improved matching quality while substantially reducing end-to-end runtime relative to strong baselines. Our analysis further reveals that key configuration parameters introduce an inherent trade-off between performance and overhead, offering practical guidance for designing efficient and scalable RAG systems for entity matching and data integration.
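To make the batching idea concrete, the following is a minimal Python sketch of blocking-based batch retrieval and generation for entity matching, written under stated assumptions: the abstract does not specify the paper's blocking keys, retriever, or prompts, so `block_key`, `retrieve_context`, and `llm_match_batch` below are all hypothetical stand-ins, not CE-RAG4EM's actual components.

```python
# Hedged sketch of blocking-based batch retrieval + generation for entity
# matching. All three helpers are hypothetical stand-ins; the paper's real
# blocking scheme, retriever, and LLM prompts are not given in the abstract.
from collections import defaultdict


def block_key(record: dict) -> str:
    """Hypothetical blocking key: first token of the entity name, lowercased."""
    return record["name"].split()[0].lower()


def retrieve_context(key: str) -> str:
    """Stand-in retriever: one retrieval call per block instead of per pair."""
    return f"[background knowledge retrieved once for block '{key}']"


def llm_match_batch(pairs: list, context: str) -> list:
    """Stand-in for one batched LLM call covering every pair in a block.

    A real system would send a single prompt containing the shared retrieved
    context plus all candidate pairs, then parse per-pair match labels. Here
    a toy first-token heuristic stands in for the model's answer.
    """
    return [
        left["name"].split()[0].lower() == right["name"].split()[0].lower()
        for left, right in pairs
    ]


def match_with_blocking(candidate_pairs: list) -> dict:
    # 1. Blocking: group candidate pairs so each block shares one retrieval.
    blocks = defaultdict(list)
    for left, right in candidate_pairs:
        blocks[block_key(left)].append((left, right))

    # 2. Batch retrieval + generation: one retriever call and one LLM call
    #    per block, instead of one of each per candidate pair.
    results = {}
    for key, pairs in blocks.items():
        context = retrieve_context(key)
        for (left, right), label in zip(pairs, llm_match_batch(pairs, context)):
            results[(left["id"], right["id"])] = label
    return results


if __name__ == "__main__":
    pairs = [
        ({"id": 1, "name": "Apple Inc"}, {"id": 2, "name": "Apple Incorporated"}),
        ({"id": 3, "name": "Apple Inc"}, {"id": 4, "name": "Orange Ltd"}),
    ]
    print(match_with_blocking(pairs))  # {(1, 2): True, (3, 4): False}
```

The cost saving in this sketch comes from amortizing one retrieval and one generation call over every pair in a block; the performance/overhead trade-off the abstract attributes to configuration parameters plausibly corresponds to choices such as how coarse the blocking key is and how many pairs are packed into one batched prompt.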
Community
Can blocking help LLM- and RAG-based entity matching? Check out CE-RAG4EM, a Cost-Efficient RAG for Entity Matching that aims to reduce the cost of RAG4EM via blocking-based batch retrieval and inference.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- RPO-RAG: Aligning Small LLMs with Relation-aware Preference Optimization for Knowledge Graph Question Answering (2026)
- A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable Reasoning (2026)
- RAGRouter-Bench: A Dataset and Benchmark for Adaptive RAG Routing (2026)
- Augmenting Question Answering with A Hybrid RAG Approach (2026)
- CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering (2026)
- SPARC-RAG: Adaptive Sequential-Parallel Scaling with Context Management for Retrieval-Augmented Generation (2026)
- Use Graph When It Needs: Efficiently and Adaptively Integrating Retrieval-Augmented Generation with Graphs (2026)