OmniMath-2B
OmniMath-2B is a compact yet capable mathematical reasoning model, fine-tuned on top of Qwen3.5-2B's hybrid architecture (Gated Delta Networks interleaved with standard attention). Trained on 10,000 carefully selected math problems from five diverse open-source datasets, it excels at step-by-step solutions, arithmetic word problems, geometry reasoning, and error recovery.
Despite its small size, OmniMath-2B demonstrates strong chain-of-thought performance and is well suited to resource-constrained environments, edge deployment, and fast prototyping.
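As a rough illustration of why a model of this size suits constrained hardware, the sketch below estimates weight-only memory at a few precisions. It assumes exactly 2e9 parameters (the rounded figure from this card) and ignores activations, caches, and quantization metadata, so the numbers are lower bounds rather than measured usage.

```python
# Weight-only memory estimate for a ~2B-parameter model.
# Assumption: exactly 2e9 parameters (the card's rounded figure). Real usage is
# higher because activations, caches, and quantization metadata are ignored.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 3)

if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gib(2e9, precision):.2f} GiB")
```

Under these assumptions the fp16 weights come to roughly 3.7 GiB, and 4-bit weights to just under 1 GiB.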
Key Features
- Efficient 2B Scale: Only 2 billion parameters; runs smoothly on a single T4 GPU or even CPU with quantization.
- Multi-Source Math Training: Balanced mix of real-world problems (orca-math, GSM8K), synthetic reasoning (MetaMathQA), geometry (Geo-Thought), and multi-modal math (DeepVision text subset).
- Step-by-Step Reasoning: Trained with explicit <think>...</think>-style chain-of-thought prompts.
- Hybrid Architecture: Inherits Qwen3.5's Gated Delta Networks for efficient long-context processing.
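Because the model is trained with `<think>...</think>`-style prompts, downstream code often needs to separate the reasoning from the final answer. The following is a minimal sketch, assuming the output contains at most one think block and that the answer follows the closing tag. The helper name `split_reasoning` is hypothetical, and whether the tags survive decoding depends on the tokenizer settings.

```python
import re

# Assumption: the model wraps its reasoning in a single <think>...</think> block
# and writes the final answer after the closing tag.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no think block exists."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

if __name__ == "__main__":
    sample = "<think>5 apples at $2 each: 5 * 2 = 10.</think>\nYou pay $10."
    reasoning, answer = split_reasoning(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```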
Benchmarks
Preliminary results (evaluation ongoing).
| Model | Size (params) | GSM8K Accuracy |
|---|---|---|
| Qwen2.5-Math-1.5B | 1.5B | 54% |
| Phi-2 (0-shot CoT) | 2.7B | 50.0% |
| OmniMath-2B (0-shot CoT) | 2B | 63.76% |
| dolphin-2_6-phi-2 | 2.7B | 58.07% |
| Qwen2.5-0.5B-Instruct | 0.5B | 49.6% |
| gemma-3-1b-it | 1.1B | 62.8% |
| MobileLLM-R1.5 950M | 1B | 52.8% |
| Gemma 2 2B IT | 2B | 23.9% |
Updates coming soon.
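This card does not describe the evaluation harness, so the following is only a sketch of how a GSM8K-style accuracy figure is commonly computed: take the last number in each model output and compare it with the reference answer. The function names and the last-number heuristic are assumptions, not the authors' method.

```python
import re

# Assumption: the final answer is the last number in the model's output, a
# common heuristic for GSM8K-style grading (not necessarily the one used here).
NUMBER_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def extract_final_number(text: str):
    """Return the last number in `text` as a float, or None if there is none."""
    matches = NUMBER_PATTERN.findall(text)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma) before converting.
    return float(matches[-1].replace(",", ""))

def accuracy(predictions, references) -> float:
    """Fraction of predictions whose final number equals the reference."""
    correct = 0
    for prediction, reference in zip(predictions, references):
        value = extract_final_number(prediction)
        if value is not None and abs(value - reference) < 1e-6:
            correct += 1
    return correct / len(references)

if __name__ == "__main__":
    outputs = ["5 * 2 = 10, so you pay $10.", "The total is 7."]
    print(accuracy(outputs, [10, 8]))
```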
Quickstart
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZirTech/OmniMath-2B"

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful math assistant. Solve problems step by step."},
    {"role": "user", "content": "A store sells apples for $2 each. If you buy 5 apples, how much do you pay?"},
]

# Render the chat template and append the assistant prefix so the model starts its answer.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is set explicitly so that temperature, top_p, and top_k take effect.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.95, top_k=20)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
vLLM
vllm serve ZirTech/OmniMath-2B --tensor-parallel-size 1 --max-model-len 4096
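Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the standard library and assumes the server listens on the default `localhost:8000`. The helper names `build_chat_payload` and `ask_server` are hypothetical, and the sampling values simply mirror the Transformers example.

```python
import json
import urllib.request

# Assumptions: the server runs locally on vLLM's default port 8000 and exposes
# the OpenAI-compatible /v1/chat/completions route.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "ZirTech/OmniMath-2B"

def build_chat_payload(question: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for a chat completion request."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a helpful math assistant. Solve problems step by step."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
    }

def ask_server(question: str) -> str:
    """POST the question to the running server and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]

# Example (requires the server started with the command above):
# print(ask_server("A store sells apples for $2 each. If you buy 5 apples, how much do you pay?"))
```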
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZirTech/OmniMath-2B"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

def ask(question):
    # Hand-built ChatML prompt: system turn, user turn, then the open assistant turn.
    prompt = (
        "<|im_start|>system\nYou are a helpful math assistant.<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    # Send the inputs to the model's device instead of hard-coding "cuda".
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding; temperature is omitted because it has no effect when do_sample=False.
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    # If the model starts a new user turn, keep only the text before it.
    # Matching "\nuser\n" avoids truncating answers that merely contain the word "user".
    if "\nuser\n" in response:
        response = response.split("\nuser\n")[0].strip()
    return response

print(ask("Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q. Give me the answer."))
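For reference, the expected answer to the sample question can be worked out by hand, which gives a way to check the model's output. The derivation below is standard field theory rather than output from the model. It uses the fact that $\sqrt{3} \notin \mathbb{Q}(\sqrt{2})$, so adjoining $\sqrt{3}$ doubles the degree.

```latex
\begin{aligned}
\sqrt{18} &= 3\sqrt{2} \in \mathbb{Q}(\sqrt{2}) \\
\Rightarrow\ \mathbb{Q}(\sqrt{2}, \sqrt{3}, \sqrt{18}) &= \mathbb{Q}(\sqrt{2}, \sqrt{3}) \\
[\mathbb{Q}(\sqrt{2}, \sqrt{3}) : \mathbb{Q}]
  &= [\mathbb{Q}(\sqrt{2}, \sqrt{3}) : \mathbb{Q}(\sqrt{2})] \cdot [\mathbb{Q}(\sqrt{2}) : \mathbb{Q}]
   = 2 \cdot 2 = 4
\end{aligned}
```

A correct response should therefore report a degree of 4.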
Architecture
OmniMath-2B fully preserves Qwen3.5-2B's design:
- Gated Delta Networks: Linear attention layers interleaved with standard attention.
- 262K Native Context: Supports up to 262,144 tokens (extendable with YaRN).
- Built on Qwen3_5ForCausalLM: Seamless integration with the Hugging Face ecosystem.
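To make the YaRN note concrete, the sketch below derives a scaling factor from a target context length and the 262,144-token native window. The dictionary keys follow the `rope_scaling` convention seen in other Qwen model cards and are an assumption here. Check this model's `config.json` before relying on them. Rounding the factor up to a whole number is also an illustrative choice.

```python
import math

NATIVE_CONTEXT = 262_144  # native context length stated in this card

def yarn_scaling(target_context: int, native_context: int = NATIVE_CONTEXT) -> dict:
    """Build a hypothetical YaRN rope-scaling entry for a target context length."""
    if target_context <= native_context:
        raise ValueError("YaRN is only needed beyond the native context length")
    # Scaling factor = target length / native length, rounded up (assumed convention).
    factor = float(math.ceil(target_context / native_context))
    return {
        "rope_type": "yarn",  # assumed key names, not confirmed for this model
        "factor": factor,
        "original_max_position_embeddings": native_context,
    }

if __name__ == "__main__":
    print(yarn_scaling(1_000_000))
```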
Limitations
- Numerical accuracy may occasionally falter; always double-check critical calculations.
- Geometry with visual elements was only trained on textual descriptions; performance on image-based geometry is limited.
- Non-English math problems are not thoroughly evaluated.
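One lightweight way to double-check a calculation is to re-evaluate the arithmetic the model wrote down. The sketch below parses a plain arithmetic expression with Python's `ast` module rather than `eval()` and compares the result with the claimed value. The helper names are hypothetical, and only `+`, `-`, `*`, `/`, and unary minus are supported.

```python
import ast
import operator

# Only basic arithmetic is allowed; anything else raises ValueError.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))

def check_step(expression: str, claimed: float, tol: float = 1e-9) -> bool:
    """Return True if `expression` evaluates to the value the model claimed."""
    return abs(safe_eval(expression) - claimed) <= tol

if __name__ == "__main__":
    print(check_step("5 * 2", 10))       # the apples example from the Quickstart
    print(check_step("12 / 4 + 1", 5))   # a deliberately wrong claim
```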
Acknowledgments
- Qwen Team for the outstanding Qwen3.5 base models.
- Hugging Face for dataset hosting and the Transformers library.
- Kaggle for providing free GPU hours.
Citation
@misc{omnimath2b2026,
  title={OmniMath-2B: A Lightweight Open Mathematical Reasoning Model},
  author={Zirt Techniques},
  year={2026},
  url={https://huggingface.co/ZirTech/OmniMath-2B}
}
Built by Zirt Tech
