- ogma-small · 8.6M efficient text embedding model · MTEB 56.32
- Why the name Ogma?
- Use cases
- Highlights
- Performance
- Architecture
- Usage
- Model Family
- Training Details
- Limitations
- Licence & Attribution
- Citation
ogma-small · 8.6M efficient text embedding model · MTEB 56.32
Efficient English text embedding model for semantic search, RAG, vector search, retrieval, clustering, classification, STS, and agent memory – MTEB 56.32, 8.6M parameters, 1024-token context
Ogma Small is the flagship efficiency model in the family. At 8.6M parameters it scores an MTEB average of 56.32 in our canonical 66-task Ogma paper results, while using only 38% of MiniLM-L6-v2's parameters, running 1.75× faster on CPU, and handling inputs 4× longer (1024 vs 256 tokens). It is purpose-built as a drop-in replacement for every place you currently reach for MiniLM.
Why the name Ogma?
Ogma is named after Ogma (also written Oghma), the Irish god associated with eloquence and credited in myth with inventing Ogham, an early alphabet for encoding language into symbols. That is the core job of an embedding model: turn language into compact vectors that machines can search, compare, cluster, and reason over.
Use cases
ogma-small is the default efficiency model for semantic search, RAG retrieval, agent memory, vector databases, document retrieval, text classification, clustering, STS / sentence similarity, and lightweight reranking pipelines. It is aimed at teams looking for a small, fast MiniLM-style embedding model with longer context and strong MTEB quality.
Good fits:
- On-device or local-first applications where MiniLM-class quality is useful but model size, CPU latency, and privacy matter.
- Production RAG systems that need affordable embeddings for documents, chunks, tickets, chats, and internal knowledge bases.
- Agent memory and tool-use systems where frequent embedding calls should stay cheap and local when possible.
- Vector search at scale where smaller models and Matryoshka sub-dimensions can reduce index size and query cost.
- Classification and clustering features for safety filters, routing, topic grouping, deduplication, and analytics.
Choose ogma-small when you want the best balance of quality, speed, size, and deployability across edge, local, and server workloads.
Highlights
- MTEB avg 56.32 – canonical Ogma paper result over 66/66 MTEB English tasks
- 1.75× faster than MiniLM on single-threaded CPU inference (92.9 vs 53.1 docs/s)
- 1024-token context – 4× longer than all-MiniLM-L6-v2 (256 tokens)
- Symmetric routing via task tokens – encode everything with `[SYM]`, or use `[QRY]`/`[QRY]` for retrieval (queries and documents both encoded with `task="qry"`); benchmark both routes on your task
- Matryoshka dims: [256, 128, 64, 32] – one model, any precision
- +4.0% F1 on prompt injection detection vs MiniLM (same architecture series)
Performance
MTEB English – 66/66 tasks (category-averaged)
Benchmarked with MTEB v2.10.7 on the standard 66-task English benchmark using category averaging (same methodology as the MTEB leaderboard).
| Category | ogma-small | all-MiniLM-L6-v2 | Δ vs MiniLM |
|---|---|---|---|
| Classification | 66.49 | 62.62 | +3.87 |
| Clustering | 40.69 | 41.94 | -1.25 |
| PairClassification | 82.91 | 82.37 | +0.54 |
| Reranking | 50.51 | 58.04 | -7.53 |
| Retrieval | 42.05 | 41.95 | +0.10 |
| STS | 82.00 | 78.90 | +3.10 |
| Summarization | 29.59 | 30.81 | -1.22 |
| Overall | 56.32 | 56.09 | +0.23 |
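Category averaging means the overall figure is simply the unweighted mean of the seven category scores, so it can be verified by hand from the ogma-small column above:

```python
# Overall MTEB score = unweighted mean of the seven category averages
# (ogma-small column from the table above).
categories = {
    "Classification": 66.49,
    "Clustering": 40.69,
    "PairClassification": 82.91,
    "Reranking": 50.51,
    "Retrieval": 42.05,
    "STS": 82.00,
    "Summarization": 29.59,
}
print(f"{sum(categories.values()) / len(categories):.2f}")  # 56.32
```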
Why choose Ogma Small?
ogma-small is the default recommendation for most use cases. It is MiniLM-class quality while being faster, smaller, and context-aware. Use ogma-base when you need the extra quality margin; use ogma-mini when you need to go sub-4M parameters.
Safety – Toxicity & Prompt Injection Detection
Evaluated using embeddings from the Ogma transformer architecture (same family). Embeddings are extracted and then fed to a logistic regression (LR) or MLP classifier head – the embedding model itself is not fine-tuned. all-MiniLM-L6-v2 serves as the baseline.
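A minimal sketch of this protocol, assuming `model.embed` returns torch tensors as in the Usage section below; the `task="sym"` choice follows the card's default for classification, scikit-learn stands in for the classifier heads, and `train_texts`/`test_texts` with their labels are placeholders for the dataset under evaluation:

```python
# Frozen-embedding protocol sketch: Ogma is a feature extractor only;
# just the classifier head is trained. Dataset loading is omitted.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("axiotic/ogma-small", trust_remote_code=True).eval()
tok = AutoTokenizer.from_pretrained("axiotic/ogma-small", trust_remote_code=True)

X_train = model.embed(train_texts, task="sym", tokenizer=tok).cpu().numpy()
X_test = model.embed(test_texts, task="sym", tokenizer=tok).cpu().numpy()

head = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
probs = head.predict_proba(X_test)[:, 1]
print("F1:     ", f1_score(test_labels, (probs >= 0.5).astype(int)))
print("AUC-ROC:", roc_auc_score(test_labels, probs))
```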
1. Jigsaw Toxic Comment Classification
Dataset: Arsive/toxicity_classification_jigsaw – Binary toxicity classification
Train: 25,960 · Test: 6,490
| Model | Classifier | Accuracy | F1 | Precision | Recall | AUC-ROC |
|---|---|---|---|---|---|---|
| Ogma | LogReg | 89.12% | 88.26% | 89.09% | 87.44% | 95.74% |
| Ogma | MLP | 88.91% | 87.98% | 89.14% | 86.85% | 95.92% |
| MiniLM | LogReg | 87.32% | 86.25% | 87.46% | 85.07% | 94.96% |
| MiniLM | MLP | 91.71% | 91.24% | 90.13% | 92.39% | 97.16% |
Ogma (LR) leads MiniLM (LR) by +2.01% F1. MiniLM (MLP) leads on this dataset – the additional training data (25K samples) allows the MLP to compensate for MiniLM's slightly weaker base representations.
2. Prompt Injection Detection – deepset/prompt-injections
Dataset: deepset/prompt-injections – Binary injection detection
Train: 546 · Test: 116 (low-data regime)
| Model | Classifier | Accuracy | F1 | Precision | Recall | AUC-ROC |
|---|---|---|---|---|---|---|
| Ogma | LogReg | 86.21% | 84.62% | 100.00% | 73.33% | 97.77% |
| Ogma | MLP | 90.52% | 90.27% | 96.23% | 85.00% | 98.10% |
| MiniLM | LogReg | 82.76% | 80.39% | 97.62% | 68.33% | 94.52% |
| MiniLM | MLP | 87.07% | 86.24% | 95.92% | 78.33% | 93.96% |
Ogma leads across both classifiers: +4.03% F1 (MLP), +4.23% F1 (LogReg). Ogma's representations are better separated in the low-data regime – it achieves 100% precision with LogReg, meaning zero false positives.
3. Prompt Injection Detection – neuralchemy/Prompt-injection-dataset
Dataset: neuralchemy/Prompt-injection-dataset – Binary injection detection
Train: 4,391 · Test: 942
| Model | Classifier | Accuracy | F1 | Precision | Recall | AUC-ROC |
|---|---|---|---|---|---|---|
| Ogma | LogReg | 95.22% | 95.93% | 95.84% | 96.01% | 99.30% |
| Ogma | MLP | 95.44% | 96.16% | 94.89% | 97.46% | 99.37% |
| MiniLM | LogReg | 94.59% | 95.38% | 95.46% | 95.29% | 98.92% |
| MiniLM | MLP | 93.95% | 94.85% | 94.59% | 95.11% | 98.92% |
Ogma leads across all metrics: its best head (MLP) beats MiniLM's best (LR) by +0.78% F1, and the LR-to-LR gap is +0.55% F1. Both models perform well at scale; Ogma maintains its edge and achieves higher AUC-ROC (99.37% vs 98.92%).
Summary
| Task | Ogma best F1 | MiniLM best F1 | ฮ |
|---|---|---|---|
| Jigsaw Toxicity | 88.26% (LR) | 91.24% (MLP) | −2.98% |
| deepset Injection | 90.27% (MLP) | 86.24% (MLP) | +4.03% |
| neuralchemy Injection | 96.16% (MLP) | 95.38% (LR) | +0.78% |
Ogma is a stronger feature extractor for prompt injection detection – the safety-critical task for agent pipelines. MiniLM edges ahead on toxicity when given sufficient labelled data and a more powerful classifier head. For agentic use cases where detecting adversarial instructions is the priority, Ogma representations are the better choice.
Architecture
| Property | Value |
|---|---|
| Architecture | Custom Transformer |
| Internal dim (d_model) | 256 |
| Output dim (d_output) | 256 |
| Transformer layers | 6 |
| Attention heads | 4 |
| Vocabulary | 30,000 (SentencePiece / AlbertTokenizer) |
| Max sequence length | 1,024 tokens |
| Pooling | Mean pooling |
| Task tokens | [QRY] (query), [DOC] (document), [SYM] (symmetric) |
| Matryoshka dims | [32, 64, 128, 256] |
| Output normalisation | L2 (unit sphere) |
| Parameters | 8.6M |
| Model file | model.safetensors (33 MB) |
Key design choices:
- Task token prepend: A learnable task token (`[QRY]`, `[DOC]`, or `[SYM]`) is prepended to the input sequence before the transformer. Recommended inference route: `[QRY]`/`[QRY]` – encode both queries and documents with `[QRY]`; this benchmarked highest on MTEB. `[SYM]` everywhere is the next-best symmetric alternative. We do not recommend `[DOC]` at inference time – it is exposed for downstream fine-tuning, not as an asymmetric query/document route.
- Matryoshka training: The model is trained with Matryoshka Representation Learning, meaning embeddings truncated to any supported sub-dimension remain well-calibrated without retraining.
- Mean pooling: The average of all token outputs (excluding padding) produces the sentence embedding, which consistently outperforms CLS-token pooling in the Ogma architecture family.
- L2 normalisation: All outputs are unit-normalised, so cosine similarity equals the dot product and Euclidean distance is a monotonic function of both, simplifying downstream usage (see the sketch after this list).
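A minimal sketch of the pooling and normalisation steps together (standard masked mean pooling; the function name and signature are illustrative, not Ogma's internal API):

```python
import torch
import torch.nn.functional as F

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token outputs, excluding padding, then L2-normalise."""
    mask = attention_mask.unsqueeze(-1).float()   # (batch, seq, 1)
    summed = (hidden_states * mask).sum(dim=1)    # (batch, d_model)
    counts = mask.sum(dim=1).clamp(min=1e-9)      # real tokens per sequence
    return F.normalize(summed / counts, dim=-1)   # unit-norm embeddings
```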
Usage
Installation
```bash
pip install torch tokenizers transformers huggingface_hub
```
Basic Encoding
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("axiotic/ogma-small", trust_remote_code=True).eval()
tok = AutoTokenizer.from_pretrained("axiotic/ogma-small", trust_remote_code=True)

sentences = [
    "The quick brown fox jumps over the lazy dog",
    "A fast auburn vulpine leaps over an idle canine",
    "The capital of France is Paris",
]

emb = model.embed(sentences, task="sym", tokenizer=tok)
# emb: one L2-normalised 256d vector per sentence

sim = (emb[0] @ emb[1]).item()  # cosine sim == dot product (L2-normalised)
print(f"paraphrase: {sim:.4f}")
```
task="sym" is a safe default for all similarity tasks (STS, clustering,
classification) and for retrieval. Ogma is trained for symmetric routing โ
queries and documents are always encoded with the same task token. The two
recommended routes are:
[SYM]for everything (the safe default above), or[QRY]/[QRY]โ encode both queries and documents withtask="qry".
Try both on your downstream task; either can win depending on the data, and
[QRY]/[QRY] is the natural starting point when fine-tuning a classifier or
retrieval head on top of the embeddings.
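A quick way to score both routes on your own examples (illustrative; uses the same `embed` call as above, with a made-up sentence pair):

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("axiotic/ogma-small", trust_remote_code=True).eval()
tok = AutoTokenizer.from_pretrained("axiotic/ogma-small", trust_remote_code=True)

pair = ("What is knowledge distillation?",
        "Knowledge distillation trains a smaller student to mimic a teacher.")

# Score the same pair under both symmetric routes; keep whichever separates
# positives from negatives better on a held-out sample of your data.
for task in ("sym", "qry"):
    a = model.embed([pair[0]], task=task, tokenizer=tok)
    b = model.embed([pair[1]], task=task, tokenizer=tok)
    print(task, f"{(a @ b.T).item():.4f}")  # cosine similarity (unit-norm)
```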
Retrieval
Encode queries and documents with the same task token. Below we show the `[QRY]`/`[QRY]` route – both calls use `task="qry"`. This is intentional (Ogma is symmetric, not asymmetric); swap in `task="sym"` to compare the SYM route on your data.
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("axiotic/ogma-small", trust_remote_code=True).eval()
tok = AutoTokenizer.from_pretrained("axiotic/ogma-small", trust_remote_code=True)

queries = ["What is knowledge distillation?"]
docs = [
    "Knowledge distillation trains a smaller student model to mimic a larger teacher.",
    "The Eiffel Tower is in Paris, France.",
]

q = model.embed(queries, task="qry", tokenizer=tok)  # one 256d vector per query; symmetric: both sides use qry
d = model.embed(docs, task="qry", tokenizer=tok)     # one 256d vector per doc; not a typo, Ogma is symmetric

scores = (q @ d.T).squeeze(0)  # cosine sim (L2-normalised, dot == cosine)
print(scores.tolist())  # [higher, lower]; first doc is relevant
```
Matryoshka – Flexible Dimensionality
Ogma is trained with Matryoshka Representation Learning. Slice and re-normalise to any supported sub-dimension with no retraining:
```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("axiotic/ogma-small", trust_remote_code=True).eval()
tok = AutoTokenizer.from_pretrained("axiotic/ogma-small", trust_remote_code=True)

emb = model.embed(["hello world"], task="sym", tokenizer=tok)  # full 256d

for d in model.config.matryoshka_dims:
    sub = F.normalize(emb[:, :d], dim=-1)  # slice, then re-normalise to unit length
    print(f"{d}d norm={sub.norm(dim=-1).item():.4f}")
```
Model Family
| Model | Params | Size | MTEB Avg | Class | Clust | PairClass | Rerank | Ret | STS | Summ | d_out | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ogma-large | 32.4M | 124 MB | 57.41 | 68.6 | 41.6 | 84.0 | 53.1 | 43.7 | 83.7 | 30.9 | 256 | 1024 |
| ogma-base | 13.3M | 51 MB | 57.02 | 67.74 | 41.49 | 83.73 | 51.25 | 42.36 | 82.84 | 29.73 | 256 | 1024 |
| ogma-small | 8.6M | 33 MB | 56.32 | 66.49 | 40.69 | 82.91 | 50.51 | 42.05 | 82.00 | 29.59 | 256 | 1024 |
| ogma-mini | 3.5M | 14 MB | 53.06 | 61.77 | 37.38 | 79.66 | 47.39 | 36.21 | 77.71 | 31.33 | 256 | 1024 |
| ogma-micro | 2.3M | 8.9 MB | 52.18 | 59.53 | 36.88 | 78.62 | 49.74 | 33.09 | 75.63 | 31.77 | 128 | 1024 |
| all-MiniLM-L6-v2 | 22.7M | 87 MB | 56.09 | 62.62 | 41.94 | 82.37 | 58.04 | 41.95 | 78.90 | 30.81 | 384 | 256 |
| potion-base-32M | 32.0M | 123 MB | 51.22 | 66.0 | 39.2 | 78.2 | 50.9 | 32.2 | 73.9 | 29.8 | 256 | inf |
| potion-base-8M | 7.6M | 29 MB | 50.03 | 64.44 | 32.93 | 76.62 | 49.73 | 31.71 | 73.24 | 29.28 | 256 | inf |
All Ogma: MTEB 2.10.7, 66-task standard English set, category-averaged. MiniLM/Potion: published scores from the Model2Vec results page.
Training Details
| Property | Value |
|---|---|
| Teacher model | jinaai/jina-embeddings-v5-text-small (CC-BY-NC-4.0) |
| Training paradigm | Knowledge distillation from cached teacher embeddings |
| Training data | ~7M curated English sentence pairs |
| Tokenizer | AlbertTokenizer (SentencePiece, vocab=30,000) |
| Embedding initialisation | PCA of teacher embeddings (128d) projected to d_model |
| Loss | Distillation + contrastive (balanced schedule) |
| Evaluation framework | MTEB 2.10.7 |
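The loss weighting and schedule are not published. As a rough sketch only, a balanced distillation-plus-contrastive objective over cached teacher embeddings (hyperparameters illustrative; teacher embeddings assumed projected to the student's output dimension) could look like:

```python
import torch
import torch.nn.functional as F

def distill_contrastive_loss(student: torch.Tensor,      # (batch, d) student embeddings
                             teacher: torch.Tensor,      # (batch, d) cached teacher embeddings
                             temperature: float = 0.05,  # illustrative, not a published value
                             alpha: float = 0.5) -> torch.Tensor:  # "balanced" mix, assumed
    student = F.normalize(student, dim=-1)
    teacher = F.normalize(teacher, dim=-1)
    # Distillation term: pull each student embedding toward its teacher embedding.
    distill = 1.0 - (student * teacher).sum(dim=-1).mean()
    # In-batch contrastive term (InfoNCE): each student row should match its
    # own teacher row against all other teacher rows in the batch.
    logits = student @ teacher.T / temperature
    labels = torch.arange(student.size(0), device=student.device)
    contrastive = F.cross_entropy(logits, labels)
    return alpha * distill + (1.0 - alpha) * contrastive
```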
Limitations
- No text generation. Ogma is an encoder-only embedding model.
- English only. Training data and evaluation are English-only.
- Slower than static models. Transformer inference is 40-100× slower than static models (Potion, Model2Vec) on CPU. The trade-off: contextual understanding and 4× longer sequences.
- Non-commercial licence. Due to distillation from a CC-BY-NC-4.0 teacher, Ogma inherits the NonCommercial restriction. Commercial use requires a separate Jina AI licence or retraining with a permissive teacher (Apache 2.0 compatible models like BGE or E5 can substitute at the cost of a full retraining run).
- Reranking gap. Ogma lags behind MiniLM-L6-v2 on reranking tasks (category avg delta: -7.5). This is an architectural characteristic: the model optimises for semantic similarity and classification over pairwise ranking.
Licence & Attribution
This model is released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
Required attribution (must be included in all uses):
This model was trained via knowledge distillation from jina-embeddings-v5-text-small (https://huggingface.co/jinaai/jina-embeddings-v5-text-small) by Jina AI, licensed under CC-BY-NC-4.0.
Citation
```bibtex
@misc{ogma2026,
  title  = {Ogma: Efficient Dense Retrieval via Structured Embeddings},
  author = {Axiotic AI},
  year   = {2026},
  url    = {https://huggingface.co/axiotic/ogma-small},
}
```