RadLIT-BiEncoder: Radiology Document Retrieval

A domain-specialized bi-encoder model for radiology document retrieval, trained to understand medical imaging terminology and radiology-specific queries.

Model Description

RadLIT-BiEncoder generates dense embeddings optimized for radiology content retrieval. It serves as the first stage of the RadLITE pipeline, providing fast candidate retrieval before cross-encoder reranking.

Architecture

  • Base Model: RoBERTa-base architecture
  • Hidden Size: 768
  • Layers: 12
  • Attention Heads: 12
  • Parameters: ~125M
  • Max Sequence Length: 512 tokens
  • Embedding Dimension: 768 (see the quick check below)
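
The embedding dimension and sequence limit above can be checked directly once the model is loaded; a quick sanity check with sentence-transformers (assuming the published checkpoint ships with its pooling and tokenizer configuration):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('matulichpt/radlit-biencoder')
print(model.get_sentence_embedding_dimension())  # expected: 768
print(model.max_seq_length)                      # expected: 512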

Training

The model was trained using contrastive learning with hard negative mining on radiology educational content:

  • Training Objective: Multiple Negatives Ranking Loss with hard negatives
  • Batch Size: 32
  • Learning Rate: 2e-5 with warmup
  • Training Epochs: 4

Note: Training data sources are not disclosed because their licensing terms vary. The model weights themselves are released under Apache 2.0.
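
A minimal sketch of this setup with sentence-transformers is shown below. The triplets, the base-checkpoint initialization, the warmup length, and the output path are illustrative assumptions only, since the actual training data and scripts are not released:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical (query, positive, hard negative) triplets; the real training set is not disclosed
train_examples = [
    InputExample(texts=[
        "MRI features of hepatocellular carcinoma",                        # query
        "HCC shows arterial hyperenhancement with portal venous washout.", # positive passage
        "Renal cell carcinoma is typically a hypervascular renal mass.",   # mined hard negative
    ]),
    # ... more triplets ...
]

model = SentenceTransformer('roberta-base')  # plain RoBERTa-base; sentence-transformers adds mean pooling by default
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)

# Multiple Negatives Ranking Loss uses in-batch negatives plus the explicit hard negative in each example
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=4,
    warmup_steps=100,                  # "2e-5 with warmup"; the exact warmup length is an assumption
    optimizer_params={'lr': 2e-5},
    output_path='radlit-biencoder',
)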

Performance

RadLIT-9 Benchmark (Bi-Encoder Only)

Performance when using this bi-encoder alone for retrieval:

| Metric    | Score |
|-----------|-------|
| MRR       | 0.698 |
| nDCG@10   | 0.748 |
| Recall@10 | 91.4% |
| Recall@5  | 86.9% |
| Recall@1  | 56.7% |

Comparison with General-Purpose Models

On the RadLIT-9 benchmark (bi-encoder retrieval only, no reranking):

| Model            | MRR   | nDCG@10 | Recall@10 |
|------------------|-------|---------|-----------|
| GTE-large        | 0.843 | 0.873   | 97.1%     |
| E5-large-v2      | 0.813 | 0.850   | 96.9%     |
| BGE-large        | 0.792 | 0.836   | 97.4%     |
| RadLIT-BiEncoder | 0.698 | 0.748   | 91.4%     |

Important: The bi-encoder alone underperforms general-purpose models. The value of RadLIT comes from the full pipeline with cross-encoder reranking (see below).

Full RadLITE Pipeline Performance

When combined with RadLIT-CrossEncoder and BM25 fusion:

| Configuration             | MRR   | Improvement |
|---------------------------|-------|-------------|
| Bi-encoder only           | 0.698 | baseline    |
| + Cross-encoder reranking | 0.782 | +12.0%      |
| + BM25 fusion (RadLITE)   | 0.829 | +18.8%      |

The full RadLITE pipeline achieves 0.829 MRR, competitive with the best general-purpose models while being optimized for radiology.
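
The exact BM25 fusion scheme is not detailed in this card. The sketch below assumes a simple reciprocal rank fusion (RRF) of BM25 and bi-encoder rankings, using the rank_bm25 package, purely to illustrate how the lexical and dense signals can be combined:

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "HCC typically shows arterial hyperenhancement with washout on portal venous phase.",
    "Pulmonary embolism appears as filling defects in pulmonary arteries on CTPA.",
]
query = "What are the CT findings in pulmonary embolism?"

# Lexical ranking with BM25 (whitespace tokenization for simplicity)
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(query.lower().split())
bm25_ranking = sorted(range(len(corpus)), key=lambda i: -bm25_scores[i])

# Dense ranking with the bi-encoder
model = SentenceTransformer('matulichpt/radlit-biencoder')
dense_scores = util.cos_sim(model.encode(query, convert_to_tensor=True),
                            model.encode(corpus, convert_to_tensor=True))[0]
dense_ranking = sorted(range(len(corpus)), key=lambda i: -float(dense_scores[i]))

# Reciprocal rank fusion: fused(d) = sum over rankers of 1 / (k + rank(d))
k = 60
fused = {i: 0.0 for i in range(len(corpus))}
for ranking in (bm25_ranking, dense_ranking):
    for rank, i in enumerate(ranking, start=1):
        fused[i] += 1.0 / (k + rank)

for i in sorted(fused, key=fused.get, reverse=True):
    print(f"{fused[i]:.4f}  {corpus[i]}")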

Subspecialty Performance (Bi-Encoder Only)

| Subspecialty     | MRR   | Recall@10 |
|------------------|-------|-----------|
| Physics/Nuclear  | 0.790 | 100%      |
| Pediatric        | 0.827 | 92%       |
| Thoracic         | 0.828 | 94%       |
| Cardiac          | 0.778 | 98%       |
| Neuroradiology   | 0.731 | 88%       |
| Gastrointestinal | 0.626 | 98%       |
| Breast           | 0.592 | 90%       |
| Musculoskeletal  | 0.598 | 78%       |
| Genitourinary    | 0.470 | 84%       |

Usage

Installation

pip install sentence-transformers

Basic Usage

from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('matulichpt/radlit-biencoder')

# Encode queries and documents
queries = [
    "What are the imaging features of hepatocellular carcinoma on MRI?",
    "How do you differentiate glioblastoma from metastasis?"
]
documents = [
    "HCC typically shows arterial enhancement with washout on portal venous phase...",
    "GBM and metastases can be differentiated by their location and multiplicity..."
]

query_embeddings = model.encode(queries, convert_to_tensor=True)
doc_embeddings = model.encode(documents, convert_to_tensor=True)

# Compute similarity
from sentence_transformers.util import cos_sim
similarities = cos_sim(query_embeddings, doc_embeddings)
print(similarities)

For Retrieval Pipeline

from sentence_transformers import SentenceTransformer, util
import torch

model = SentenceTransformer('matulichpt/radlit-biencoder')

# Pre-encode your document corpus
corpus = ["document 1...", "document 2...", ...]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True, show_progress_bar=True)

# At query time
query = "What are the CT findings in pulmonary embolism?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find top-k similar documents
cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
top_results = torch.topk(cos_scores, k=10)

for score, idx in zip(top_results[0], top_results[1]):
    print(f"Score: {score:.4f} - {corpus[idx][:100]}...")

Demo: Radiology Query Understanding

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('matulichpt/radlit-biencoder')

# Sample radiology corpus
corpus = [
    "HCC typically shows arterial hyperenhancement with washout on portal venous phase per LI-RADS criteria.",
    "Pulmonary embolism appears as filling defects in pulmonary arteries on CTPA.",
    "PVNS shows hemosiderin deposition with low T2 signal and GRE blooming artifact.",
    "Acute stroke shows restricted diffusion: high DWI signal with low ADC values.",
]

# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Query
query = "What are the MRI findings in pigmented villonodular synovitis?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find best match
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best_idx = scores.argmax()
print(f"Best match: {corpus[best_idx]}")
# Output: PVNS shows hemosiderin deposition with low T2 signal and GRE blooming artifact.

The model correctly identifies PVNS content even though the query uses the full name and the corpus uses the abbreviation.

Recommended: Full RadLITE Pipeline

For best results, use RadLIT-BiEncoder as the first stage followed by RadLIT-CrossEncoder for reranking:

from sentence_transformers import SentenceTransformer, CrossEncoder, util
import torch

# Stage 1: Bi-encoder retrieval (fast, gets candidates)
biencoder = SentenceTransformer('matulichpt/radlit-biencoder')

# Stage 2: Cross-encoder reranking (slower, more accurate)
crossencoder = CrossEncoder('matulichpt/radlit-crossencoder')

def retrieve_with_biencoder(query, corpus, biencoder, top_k=50):
    """Return the top_k corpus documents by cosine similarity to the query.
    In production, pre-encode the corpus once (see "For Retrieval Pipeline" above)."""
    corpus_embeddings = biencoder.encode(corpus, convert_to_tensor=True)
    query_embedding = biencoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    top_results = torch.topk(scores, k=min(top_k, len(corpus)))
    return [corpus[idx] for idx in top_results.indices]

# Your radiology document collection
corpus = ["document 1...", "document 2...", ...]

# Retrieve candidates
query = "What are the MRI findings in anterior cruciate ligament tear?"
candidates = retrieve_with_biencoder(query, corpus, biencoder, top_k=50)

# Rerank with cross-encoder
pairs = [[query, doc] for doc in candidates]
scores = crossencoder.predict(pairs)

# Apply temperature calibration (recommended: T=1.5)
# Dividing by a positive temperature rescales the scores (e.g. before a sigmoid);
# it does not change the ranking order.
calibrated_scores = scores / 1.5

# Sort by calibrated scores
reranked = sorted(zip(candidates, calibrated_scores), key=lambda x: x[1], reverse=True)

Intended Use

Primary Use Cases

  • First-stage candidate retrieval for radiology content
  • Medical imaging literature search
  • Radiology question-answering systems (retrieval component)

Out-of-Scope Uses

  • General web search
  • Non-medical document retrieval
  • Clinical diagnosis (this is a retrieval model, not a diagnostic tool)

Limitations

  1. Bi-encoder alone underperforms: Use with cross-encoder reranking for best results
  2. Domain Specificity: Optimized for radiology; may underperform on general content
  3. Language: English only
  4. Subspecialty Variance: Performance varies by subspecialty (0.47-0.83 MRR range)

Ethical Considerations

  • This model should not be used as a sole source for clinical decision-making
  • Retrieved documents should be reviewed by qualified medical professionals
  • The model may reflect biases present in radiology educational literature

Citation

@software{radlit_biencoder_2026,
  title = {RadLIT-BiEncoder: Domain-Specialized Embeddings for Radiology Retrieval},
  author = {Matulich, P.},
  year = {2026},
  url = {https://huggingface.co/matulichpt/radlit-biencoder},
  note = {MRR 0.698 standalone, 0.829 with RadLITE pipeline}
}

Related Models

  • RadLIT-CrossEncoder (matulichpt/radlit-crossencoder): the reranking stage of the full RadLITE pipeline

License

Apache 2.0 - Free for research and commercial use.
