RadLIT-BiEncoder: Radiology Document Retrieval

A domain-specialized bi-encoder model for radiology document retrieval, trained to understand medical imaging terminology and radiology-specific queries.

Model Description

RadLIT-BiEncoder generates dense embeddings optimized for radiology content retrieval. It serves as the first stage of the RadLITE pipeline, providing fast candidate retrieval before cross-encoder reranking.

Architecture

  • Base Model: RoBERTa-base architecture
  • Hidden Size: 768
  • Layers: 12
  • Attention Heads: 12
  • Parameters: ~125M
  • Max Sequence Length: 512 tokens
  • Embedding Dimension: 768 (see the quick check below)
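
The embedding dimension and sequence limit above can be checked directly once the model is loaded; a quick sanity check with sentence-transformers (assuming the published checkpoint ships with its pooling and tokenizer configuration):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('matulichpt/radlit-biencoder')
print(model.get_sentence_embedding_dimension())  # expected: 768
print(model.max_seq_length)                      # expected: 512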

Training

The model was trained using contrastive learning with hard negative mining on radiology educational content:

  • Training Objective: Multiple Negatives Ranking Loss with hard negatives
  • Batch Size: 32
  • Learning Rate: 2e-5 with warmup
  • Training Epochs: 4

Note: Training data sources are not disclosed because their licensing terms vary. The model weights themselves are released under Apache 2.0.
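
A minimal sketch of this setup with sentence-transformers is shown below. The triplets, the base-checkpoint initialization, the warmup length, and the output path are illustrative assumptions only, since the actual training data and scripts are not released:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical (query, positive, hard negative) triplets; the real training set is not disclosed
train_examples = [
    InputExample(texts=[
        "MRI features of hepatocellular carcinoma",                        # query
        "HCC shows arterial hyperenhancement with portal venous washout.", # positive passage
        "Renal cell carcinoma is typically a hypervascular renal mass.",   # mined hard negative
    ]),
    # ... more triplets ...
]

model = SentenceTransformer('roberta-base')  # plain RoBERTa-base; sentence-transformers adds mean pooling by default
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)

# Multiple Negatives Ranking Loss uses in-batch negatives plus the explicit hard negative in each example
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=4,
    warmup_steps=100,                  # "2e-5 with warmup"; the exact warmup length is an assumption
    optimizer_params={'lr': 2e-5},
    output_path='radlit-biencoder',
)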

Performance

RadLIT-9 Benchmark (Bi-Encoder Only)

Performance when using this bi-encoder alone for retrieval:

| Metric    | Score |
|-----------|-------|
| MRR       | 0.698 |
| nDCG@10   | 0.748 |
| Recall@10 | 91.4% |
| Recall@5  | 86.9% |
| Recall@1  | 56.7% |

Comparison with General-Purpose Models

On the RadLIT-9 benchmark (bi-encoder retrieval only, no reranking):

| Model            | MRR   | nDCG@10 | Recall@10 |
|------------------|-------|---------|-----------|
| GTE-large        | 0.843 | 0.873   | 97.1%     |
| E5-large-v2      | 0.813 | 0.850   | 96.9%     |
| BGE-large        | 0.792 | 0.836   | 97.4%     |
| RadLIT-BiEncoder | 0.698 | 0.748   | 91.4%     |

Important: The bi-encoder alone underperforms general-purpose models. The value of RadLIT comes from the full pipeline with cross-encoder reranking (see below).

Full RadLITE Pipeline Performance

When combined with RadLIT-CrossEncoder and BM25 fusion:

| Configuration             | MRR   | Improvement |
|---------------------------|-------|-------------|
| Bi-encoder only           | 0.698 | baseline    |
| + Cross-encoder reranking | 0.782 | +12.0%      |
| + BM25 fusion (RadLITE)   | 0.829 | +18.8%      |

The full RadLITE pipeline achieves 0.829 MRR, competitive with the best general-purpose models while being optimized for radiology.
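
The exact BM25 fusion scheme is not detailed in this card. The sketch below assumes a simple reciprocal rank fusion (RRF) of BM25 and bi-encoder rankings, using the rank_bm25 package, purely to illustrate how the lexical and dense signals can be combined:

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "HCC typically shows arterial hyperenhancement with washout on portal venous phase.",
    "Pulmonary embolism appears as filling defects in pulmonary arteries on CTPA.",
]
query = "What are the CT findings in pulmonary embolism?"

# Lexical ranking with BM25 (whitespace tokenization for simplicity)
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(query.lower().split())
bm25_ranking = sorted(range(len(corpus)), key=lambda i: -bm25_scores[i])

# Dense ranking with the bi-encoder
model = SentenceTransformer('matulichpt/radlit-biencoder')
dense_scores = util.cos_sim(model.encode(query, convert_to_tensor=True),
                            model.encode(corpus, convert_to_tensor=True))[0]
dense_ranking = sorted(range(len(corpus)), key=lambda i: -float(dense_scores[i]))

# Reciprocal rank fusion: fused(d) = sum over rankers of 1 / (k + rank(d))
k = 60
fused = {i: 0.0 for i in range(len(corpus))}
for ranking in (bm25_ranking, dense_ranking):
    for rank, i in enumerate(ranking, start=1):
        fused[i] += 1.0 / (k + rank)

for i in sorted(fused, key=fused.get, reverse=True):
    print(f"{fused[i]:.4f}  {corpus[i]}")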

Subspecialty Performance (Bi-Encoder Only)

| Subspecialty     | MRR   | Recall@10 |
|------------------|-------|-----------|
| Physics/Nuclear  | 0.790 | 100%      |
| Pediatric        | 0.827 | 92%       |
| Thoracic         | 0.828 | 94%       |
| Cardiac          | 0.778 | 98%       |
| Neuroradiology   | 0.731 | 88%       |
| Gastrointestinal | 0.626 | 98%       |
| Breast           | 0.592 | 90%       |
| Musculoskeletal  | 0.598 | 78%       |
| Genitourinary    | 0.470 | 84%       |

Usage

Installation

pip install sentence-transformers

Basic Usage

from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('matulichpt/radlit-biencoder')

# Encode queries and documents
queries = [
    "What are the imaging features of hepatocellular carcinoma on MRI?",
    "How do you differentiate glioblastoma from metastasis?"
]
documents = [
    "HCC typically shows arterial enhancement with washout on portal venous phase...",
    "GBM and metastases can be differentiated by their location and multiplicity..."
]

query_embeddings = model.encode(queries, convert_to_tensor=True)
doc_embeddings = model.encode(documents, convert_to_tensor=True)

# Compute similarity
from sentence_transformers.util import cos_sim
similarities = cos_sim(query_embeddings, doc_embeddings)
print(similarities)

For Retrieval Pipeline

from sentence_transformers import SentenceTransformer, util
import torch

model = SentenceTransformer('matulichpt/radlit-biencoder')

# Pre-encode your document corpus
corpus = ["document 1...", "document 2...", ...]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True, show_progress_bar=True)

# At query time
query = "What are the CT findings in pulmonary embolism?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find top-k similar documents
cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
top_results = torch.topk(cos_scores, k=10)

for score, idx in zip(top_results[0], top_results[1]):
    print(f"Score: {score:.4f} - {corpus[idx][:100]}...")

Demo: Radiology Query Understanding

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('matulichpt/radlit-biencoder')

# Sample radiology corpus
corpus = [
    "HCC typically shows arterial hyperenhancement with washout on portal venous phase per LI-RADS criteria.",
    "Pulmonary embolism appears as filling defects in pulmonary arteries on CTPA.",
    "PVNS shows hemosiderin deposition with low T2 signal and GRE blooming artifact.",
    "Acute stroke shows restricted diffusion: high DWI signal with low ADC values.",
]

# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Query
query = "What are the MRI findings in pigmented villonodular synovitis?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find best match
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best_idx = scores.argmax()
print(f"Best match: {corpus[best_idx]}")
# Output: PVNS shows hemosiderin deposition with low T2 signal and GRE blooming artifact.

The model correctly identifies PVNS content even though the query uses the full name and the corpus uses the abbreviation.

Recommended: Full RadLITE Pipeline

For best results, use RadLIT-BiEncoder as the first stage followed by RadLIT-CrossEncoder for reranking:

from sentence_transformers import SentenceTransformer, CrossEncoder, util
import torch

# Stage 1: Bi-encoder retrieval (fast, gets candidates)
biencoder = SentenceTransformer('matulichpt/radlit-biencoder')

# Stage 2: Cross-encoder reranking (slower, more accurate)
crossencoder = CrossEncoder('matulichpt/radlit-crossencoder')

def retrieve_with_biencoder(query, corpus, biencoder, top_k=50):
    """Return the top_k corpus documents by cosine similarity to the query.
    In production, pre-encode the corpus once (see "For Retrieval Pipeline" above)."""
    corpus_embeddings = biencoder.encode(corpus, convert_to_tensor=True)
    query_embedding = biencoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    top_results = torch.topk(scores, k=min(top_k, len(corpus)))
    return [corpus[idx] for idx in top_results.indices]

# Your radiology document collection
corpus = ["document 1...", "document 2...", ...]

# Retrieve candidates
query = "What are the MRI findings in anterior cruciate ligament tear?"
candidates = retrieve_with_biencoder(query, corpus, biencoder, top_k=50)

# Rerank with cross-encoder
pairs = [[query, doc] for doc in candidates]
scores = crossencoder.predict(pairs)

# Apply temperature calibration (recommended: T=1.5)
# Dividing by a positive temperature rescales the scores (e.g. before a sigmoid);
# it does not change the ranking order.
calibrated_scores = scores / 1.5

# Sort by calibrated scores
reranked = sorted(zip(candidates, calibrated_scores), key=lambda x: x[1], reverse=True)

Intended Use

Primary Use Cases

  • First-stage candidate retrieval for radiology content
  • Medical imaging literature search
  • Radiology question-answering systems (retrieval component)

Out-of-Scope Uses

  • General web search
  • Non-medical document retrieval
  • Clinical diagnosis (this is a retrieval model, not a diagnostic tool)

Limitations

  1. Bi-encoder alone underperforms: Use with cross-encoder reranking for best results
  2. Domain Specificity: Optimized for radiology; may underperform on general content
  3. Language: English only
  4. Subspecialty Variance: Performance varies by subspecialty (0.47-0.83 MRR range)

Ethical Considerations

  • This model should not be used as a sole source for clinical decision-making
  • Retrieved documents should be reviewed by qualified medical professionals
  • The model may reflect biases present in radiology educational literature

Citation

@software{radlit_biencoder_2026,
  title = {RadLIT-BiEncoder: Domain-Specialized Embeddings for Radiology Retrieval},
  author = {Matulich, P.},
  year = {2026},
  url = {https://huggingface.co/matulichpt/radlit-biencoder},
  note = {MRR 0.698 standalone, 0.829 with RadLITE pipeline}
}

Related Models

  • RadLIT-CrossEncoder (matulichpt/radlit-crossencoder): the reranking stage of the full RadLITE pipeline

License

Apache 2.0 - Free for research and commercial use.
