RadLIT-BiEncoder: Radiology Document Retrieval
A domain-specialized bi-encoder model for radiology document retrieval, trained to understand medical imaging terminology and radiology-specific queries.
Model Description
RadLIT-BiEncoder generates dense embeddings optimized for radiology content retrieval. It serves as the first stage of the RadLITE pipeline, providing fast candidate retrieval before cross-encoder reranking.
Architecture
- Base Model: RoBERTa-base architecture
- Hidden Size: 768
- Layers: 12
- Attention Heads: 12
- Parameters: ~125M
- Max Sequence Length: 512 tokens
- Embedding Dimension: 768
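These values can be verified after loading the released checkpoint. A minimal sanity check with sentence-transformers (expected outputs assume the checkpoint matches the specification above):
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('matulichpt/radlit-biencoder')
# Embedding dimension and maximum input length as reported by the loaded checkpoint
print(model.get_sentence_embedding_dimension())  # expected: 768
print(model.max_seq_length)                      # expected: 512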
Training
The model was trained using contrastive learning with hard negative mining on radiology educational content:
- Training Objective: Multiple Negatives Ranking Loss with hard negatives
- Batch Size: 32
- Learning Rate: 2e-5 with warmup
- Training Epochs: 4
Note: Training data sources are not disclosed due to variable licensing. The model is released under Apache 2.0.
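For illustration only, the training objective can be set up with the standard sentence-transformers loss. The triplet below is invented for the sketch and is not drawn from the (undisclosed) training data; mean pooling over token embeddings is assumed, since the card does not state the pooling strategy:
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
# Illustrative (query, positive, hard negative) triplet -- not from the actual training set
train_examples = [
    InputExample(texts=[
        "CT findings of acute appendicitis",
        "Acute appendicitis shows a dilated, fluid-filled appendix with periappendiceal fat stranding.",
        "Acute cholecystitis shows gallbladder wall thickening and pericholecystic fluid.",
    ]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
model = SentenceTransformer("roberta-base")  # mean pooling is added by default (assumed here)
train_loss = losses.MultipleNegativesRankingLoss(model)
# Hyperparameters from the card: 4 epochs, 2e-5 learning rate with warmup
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=4,
    warmup_steps=100,
    optimizer_params={"lr": 2e-5},
)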
Performance
RadLIT-9 Benchmark (Bi-Encoder Only)
Performance when using this bi-encoder alone for retrieval:
| Metric | Score |
|---|---|
| MRR | 0.698 |
| nDCG@10 | 0.748 |
| Recall@10 | 91.4% |
| Recall@5 | 86.9% |
| Recall@1 | 56.7% |
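For reference, a minimal sketch of how MRR and Recall@k are computed from ranked retrieval results. The gold-label format (a set of relevant document ids per query) is an assumption, since the RadLIT-9 data is not bundled with this model:
# Hypothetical metric helpers; ranked_ids and relevant_ids are per-query lists
def mean_reciprocal_rank(ranked_ids, relevant_ids):
    # Average of 1 / (rank of the first relevant document); 0 if none was retrieved
    total = 0.0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        rank = next((i + 1 for i, doc in enumerate(ranked) if doc in relevant), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_ids)
def recall_at_k(ranked_ids, relevant_ids, k=10):
    # Fraction of queries with at least one relevant document in the top k
    hits = sum(any(doc in relevant for doc in ranked[:k])
               for ranked, relevant in zip(ranked_ids, relevant_ids))
    return hits / len(ranked_ids)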
Comparison with General-Purpose Models
On RadLIT-9 benchmark (bi-encoder retrieval only, no reranking):
| Model | MRR | nDCG@10 | Recall@10 |
|---|---|---|---|
| GTE-large | 0.843 | 0.873 | 97.1% |
| E5-large-v2 | 0.813 | 0.850 | 96.9% |
| BGE-large | 0.792 | 0.836 | 97.4% |
| RadLIT-BiEncoder | 0.698 | 0.748 | 91.4% |
Important: The bi-encoder alone underperforms general-purpose models. The value of RadLIT comes from the full pipeline with cross-encoder reranking (see below).
Full RadLITE Pipeline Performance
When combined with RadLIT-CrossEncoder and BM25 fusion:
| Configuration | MRR | Improvement |
|---|---|---|
| Bi-encoder only | 0.698 | baseline |
| + Cross-encoder reranking | 0.782 | +12.0% |
| + BM25 fusion (RadLITE) | 0.829 | +18.8% |
The full RadLITE pipeline achieves 0.829 MRR, competitive with the best general-purpose models while being optimized for radiology.
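The card does not specify how the BM25 and dense rankings are combined; a common, easily reproduced choice is reciprocal rank fusion, sketched here purely as an illustration (not necessarily RadLITE's actual fusion method):
# Hypothetical fusion step: reciprocal rank fusion (RRF) over two ranked lists
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: one ranked list of document ids per retriever (e.g. BM25 and the bi-encoder)
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)
# e.g. fused_ids = reciprocal_rank_fusion([bm25_ranked_ids, biencoder_ranked_ids])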
Subspecialty Performance (Bi-Encoder Only)
| Subspecialty | MRR | Recall@10 |
|---|---|---|
| Physics/Nuclear | 0.790 | 100% |
| Pediatric | 0.827 | 92% |
| Thoracic | 0.828 | 94% |
| Cardiac | 0.778 | 98% |
| Neuroradiology | 0.731 | 88% |
| Gastrointestinal | 0.626 | 98% |
| Breast | 0.592 | 90% |
| Musculoskeletal | 0.598 | 78% |
| Genitourinary | 0.470 | 84% |
Usage
Installation
pip install sentence-transformers
Basic Usage
from sentence_transformers import SentenceTransformer
# Load model
model = SentenceTransformer('matulichpt/radlit-biencoder')
# Encode queries and documents
queries = [
"What are the imaging features of hepatocellular carcinoma on MRI?",
"How do you differentiate glioblastoma from metastasis?"
]
documents = [
"HCC typically shows arterial enhancement with washout on portal venous phase...",
"GBM and metastases can be differentiated by their location and multiplicity..."
]
query_embeddings = model.encode(queries, convert_to_tensor=True)
doc_embeddings = model.encode(documents, convert_to_tensor=True)
# Compute similarity
from sentence_transformers.util import cos_sim
similarities = cos_sim(query_embeddings, doc_embeddings)
print(similarities)
For Retrieval Pipeline
from sentence_transformers import SentenceTransformer, util
import torch
model = SentenceTransformer('matulichpt/radlit-biencoder')
# Pre-encode your document corpus
corpus = ["document 1...", "document 2...", ...]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True, show_progress_bar=True)
# At query time
query = "What are the CT findings in pulmonary embolism?"
query_embedding = model.encode(query, convert_to_tensor=True)
# Find top-k similar documents
cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
top_results = torch.topk(cos_scores, k=10)
for score, idx in zip(top_results[0], top_results[1]):
print(f"Score: {score:.4f} - {corpus[idx][:100]}...")
Demo: Radiology Query Understanding
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('matulichpt/radlit-biencoder')
# Sample radiology corpus
corpus = [
"HCC typically shows arterial hyperenhancement with washout on portal venous phase per LI-RADS criteria.",
"Pulmonary embolism appears as filling defects in pulmonary arteries on CTPA.",
"PVNS shows hemosiderin deposition with low T2 signal and GRE blooming artifact.",
"Acute stroke shows restricted diffusion: high DWI signal with low ADC values.",
]
# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
# Query
query = "What are the MRI findings in pigmented villonodular synovitis?"
query_embedding = model.encode(query, convert_to_tensor=True)
# Find best match
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best_idx = scores.argmax()
print(f"Best match: {corpus[best_idx]}")
# Output: PVNS shows hemosiderin deposition with low T2 signal and GRE blooming artifact.
The model correctly identifies PVNS content even though the query uses the full name and the corpus uses the abbreviation.
Recommended: Full RadLITE Pipeline
For best results, use RadLIT-BiEncoder as the first stage followed by RadLIT-CrossEncoder for reranking:
from sentence_transformers import SentenceTransformer, CrossEncoder, util
import torch
# Stage 1: Bi-encoder retrieval (fast, gets candidates)
biencoder = SentenceTransformer('matulichpt/radlit-biencoder')
# Stage 2: Cross-encoder reranking (slower, more accurate)
crossencoder = CrossEncoder('matulichpt/radlit-crossencoder')
# First-stage retrieval, following the same pattern as the pipeline example above
# (for a large corpus, pre-encode corpus_embeddings once and reuse them)
def retrieve_with_biencoder(query, corpus, biencoder, top_k=50):
    corpus_embeddings = biencoder.encode(corpus, convert_to_tensor=True)
    query_embedding = biencoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    top = torch.topk(scores, k=min(top_k, len(corpus)))
    return [corpus[i] for i in top.indices]
# Retrieve candidates (corpus is a list of document strings, as in the examples above)
query = "What are the MRI findings in anterior cruciate ligament tear?"
candidates = retrieve_with_biencoder(query, corpus, biencoder, top_k=50)
# Rerank with cross-encoder
pairs = [[query, doc] for doc in candidates]
scores = crossencoder.predict(pairs)
# Apply temperature calibration (recommended: T=1.5); dividing by a positive constant
# rescales the scores without changing their relative order
calibrated_scores = scores / 1.5
# Sort by calibrated scores
reranked = sorted(zip(candidates, calibrated_scores), key=lambda x: x[1], reverse=True)
Intended Use
Primary Use Cases
- First-stage candidate retrieval for radiology content
- Medical imaging literature search
- Radiology question-answering systems (retrieval component)
Out-of-Scope Uses
- General web search
- Non-medical document retrieval
- Clinical diagnosis (this is a retrieval model, not a diagnostic tool)
Limitations
- Bi-encoder alone underperforms: Use with cross-encoder reranking for best results
- Domain Specificity: Optimized for radiology; may underperform on general content
- Language: English only
- Subspecialty Variance: Performance varies by subspecialty (0.47-0.83 MRR range)
Ethical Considerations
- This model should not be used as a sole source for clinical decision-making
- Retrieved documents should be reviewed by qualified medical professionals
- The model may reflect biases present in radiology educational literature
Citation
@software{radlit_biencoder_2026,
title = {RadLIT-BiEncoder: Domain-Specialized Embeddings for Radiology Retrieval},
author = {Matulich, P.},
year = {2026},
url = {https://huggingface.co/matulichpt/radlit-biencoder},
note = {MRR 0.698 standalone, 0.829 with RadLITE pipeline}
}
Related Models
- RadLIT-CrossEncoder - Second-stage reranking
- RadLIT-ColBERT - Late interaction model
License
Apache 2.0 - Free for research and commercial use.