How to use Waqf-AI/arabic-splade-efficient with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Waqf-AI/arabic-splade-efficient")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Efficient symmetric SPLADE using DistilBERT multilingual for faster inference.
Architecture: symmetric, with one shared encoder for queries and documents (MLMTransformer + SpladePooling, applied sequentially)
Base model: distilbert-base-multilingual-cased
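As a rough illustration of what the MLM head followed by SPLADE pooling computes, the sketch below applies the standard SPLADE activation, log(1 + ReLU(logit)), and max-pools over token positions to get one weight per vocabulary entry. It is plain Python on a made-up logit matrix; the function name `splade_pool` and the toy values are assumptions for illustration, not the library's implementation.

```python
import math

def splade_pool(logits, attention_mask):
    """Max-pool log(1 + ReLU(logit)) over non-padding token positions.

    logits: list of rows, one per token, each of length vocab_size.
    attention_mask: list with 1 for real tokens and 0 for padding.
    Returns one non-negative weight per vocabulary entry.
    """
    vocab_size = len(logits[0])
    pooled = [0.0] * vocab_size
    for row, keep in zip(logits, attention_mask):
        if not keep:
            continue  # padding positions do not contribute
        for j, logit in enumerate(row):
            activated = math.log1p(max(logit, 0.0))
            if activated > pooled[j]:
                pooled[j] = activated
    return pooled

# Toy input (hypothetical): 3 token positions, vocabulary of 4 entries.
toy_logits = [
    [2.0, -1.0, 0.0, 0.5],
    [0.5, -3.0, 0.0, 4.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, masked out below
]
weights = splade_pool(toy_logits, [1, 1, 0])
print(weights)
```

Entries whose logits are never positive stay at exactly zero, which is where the sparsity of the output vector comes from.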
Training data: oddadmix/arabic-triplets-large (104K triplets, 92K unique passages)
Loss: SpladeLoss(SparseMultipleNegativesRankingLoss, q_reg=5e-5, d_reg=3e-5)

| Metric | Score |
|---|---|
| NDCG@10 | 0.2528 |
| MRR@10 | 0.3052 |
For reference, BM25 scores 0.3824 NDCG@10 and 0.4483 MRR@10 on the same benchmark, so this model trails the lexical baseline on both metrics.
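For readers unfamiliar with the two metrics, here is a minimal sketch of how NDCG@10 and MRR@10 can be computed for a single query, assuming binary relevance labels. The benchmark and its evaluation script are not given in this card, so the function names and the toy ranking below are hypothetical; reported scores would be averages of such per-query values.

```python
import math

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant hit in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy query (hypothetical IDs): relevant passages appear at ranks 2 and 4.
ranking = ["d7", "d2", "d9", "d4"]
relevant = {"d2", "d4"}
print(mrr_at_k(ranking, relevant))
print(ndcg_at_k(ranking, relevant))
```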
DistilBERT multilingual (6-layer, 119K vocab), ~2x faster than AraBERT
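The `q_reg` and `d_reg` values in the SpladeLoss configuration above weight a sparsity regularizer on query and document vectors. In the SPLADE literature this is usually the FLOPS regularizer, the sum over vocabulary entries of the squared batch-mean weight. The sketch below shows that arithmetic in plain Python; it assumes FLOPS is the regularizer used here, and `flops_penalty`, `regularized_loss`, and the toy batch are illustrative names and values rather than the training code.

```python
def flops_penalty(batch):
    """Sum over vocabulary entries of the squared batch-mean weight."""
    n = len(batch)
    vocab_size = len(batch[0])
    total = 0.0
    for j in range(vocab_size):
        mean_j = sum(vec[j] for vec in batch) / n
        total += mean_j ** 2
    return total

# Regularization weights as listed in the card (q_reg, d_reg).
Q_REG, D_REG = 5e-5, 3e-5

def regularized_loss(ranking_loss, query_vecs, doc_vecs):
    """Ranking loss plus weighted sparsity penalties for queries and documents."""
    return (
        ranking_loss
        + Q_REG * flops_penalty(query_vecs)
        + D_REG * flops_penalty(doc_vecs)
    )

# Toy batch (hypothetical): 2 vectors over a vocabulary of 3 entries.
batch = [[1.0, 0.0, 3.0], [1.0, 0.0, 1.0]]
print(flops_penalty(batch))
print(regularized_loss(1.0, batch, batch))
```

Because the penalty is applied to batch means, a vocabulary entry that is active in many texts costs more than one that fires only occasionally.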
torchrun

from sentence_transformers.sparse_encoder import SparseEncoder

model = SparseEncoder("Abdelkareem/arabic-splade-efficient")

# Encode a query and a passage into sparse, vocabulary-sized vectors
embeddings = model.encode([
    "ما هي عاصمة مصر؟",  # "What is the capital of Egypt?"
    "القاهرة هي عاصمة مصر وأكبر مدنها.",  # "Cairo is the capital of Egypt and its largest city."
])
print(embeddings.shape)

# Decode top tokens
decoded = model.decode(embeddings, top_k=10)
for d in decoded:
    print(d)
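To connect the decoded tokens to retrieval: the relevance score between two sparse vectors is their dot product, so only tokens active in both the query and the passage contribute. The sketch below computes that score from (token, weight) pairs, assuming the decoded output has that shape. The pairs are made-up placeholders, not real model output, and working from a `top_k` truncation only approximates the full score.

```python
def sparse_dot(query_pairs, doc_pairs):
    """Dot product of two sparse vectors given as (token, weight) pairs."""
    doc_weights = dict(doc_pairs)
    return sum(w * doc_weights.get(tok, 0.0) for tok, w in query_pairs)

# Hypothetical decoded pairs for a query and a passage.
query_pairs = [("عاصمة", 1.5), ("مصر", 2.0)]
doc_pairs = [("القاهرة", 1.0), ("عاصمة", 2.0), ("مصر", 0.5)]

score = sparse_dot(query_pairs, doc_pairs)
print(score)
```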