🔥 Improvements to OmniMath-2B are coming soon 🔥


🧮 OmniMath-2B

OmniMath-2B is a compact yet capable mathematical reasoning model, fine‑tuned on top of Qwen3.5‑2B's hybrid architecture (Gated Delta Networks interleaved with standard attention). Trained on 10,000 carefully selected math problems from five diverse open‑source datasets, it excels at step‑by‑step solutions, arithmetic word problems, geometry reasoning, and error recovery.

Despite its small size, OmniMath-2B demonstrates strong chain‑of‑thought performance and is ideally suited for resource‑constrained environments, edge deployment, and fast prototyping.


✨ Key Features

  • Efficient 2B Scale : Only 2 billion parameters – runs smoothly on a single T4 GPU or even CPU with quantization.
  • Multi‑Source Math Training : Balanced mix of real‑world problems (orca‑math, GSM8K), synthetic reasoning (MetaMathQA), geometry (Geo‑Thought), and multi‑modal math (DeepVision text subset).
  • Step‑by‑Step Reasoning : Trained with explicit <think>...</think>‑style chain‑of‑thought prompts.
  • Hybrid Architecture : Inherits Qwen3.5's Gated Delta Networks for efficient long‑context processing.
  • Apache 2.0 License : Fully open weights, free for commercial use.
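Because outputs are trained to carry `<think>...</think>`-style reasoning traces, downstream code usually wants to separate the chain of thought from the final answer. A minimal helper sketch (the tag format comes from the feature list above; everything else here is a generic assumption about the output, not part of the model):

```python
import re

def split_cot(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) around <think>...</think> tags.

    If no tags are present, the reasoning part is empty and the whole
    text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # The answer is whatever surrounds the think block.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_cot("<think>5 apples at $2 each is 5 * 2 = 10.</think>You pay $10.")
print(answer)  # → You pay $10.
```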

📊 Benchmarks

Preliminary results (evaluation ongoing).

| Model | Size (params) | GSM8K accuracy |
|---|---|---|
| Qwen2.5-Math-1.5B-Instruct | 1.5B | 84.8% |
| gemma-3-1b-it | 1.1B | 62.8% |
| Phi-2 (8-shot CoT) | 2.7B | 61.1% |
| OmniMath-2B | 2B | 58.76% |
| dolphin-2_6-phi-2 | 2.7B | 58.07% |
| Qwen2.5-0.5B-Instruct | 0.5B | 49.6% |

Updates coming soon.


🚀 Quickstart

🤗 Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZirTech/OmniMath-2B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful math assistant. Solve problems step by step."},
    {"role": "user", "content": "A store sells apples for $2 each. If you buy 5 apples, how much do you pay?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p/top_k to take effect.
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256, temperature=0.6, top_p=0.95, top_k=20)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

⚡ vLLM

```shell
vllm serve ZirTech/OmniMath-2B --tensor-parallel-size 1 --max-model-len 4096
```

This starts an OpenAI-compatible server (on port 8000 by default).
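Once the server is running, it can be queried over the OpenAI-compatible chat completions API. A client sketch using only the standard library (the endpoint path and request shape follow the OpenAI API convention that vLLM implements; the port is vLLM's default and may differ in your setup):

```python
import json
import urllib.request

def build_payload(question: str) -> dict:
    """Construct an OpenAI-style chat completion request body."""
    return {
        "model": "ZirTech/OmniMath-2B",
        "messages": [
            {"role": "system", "content": "You are a helpful math assistant. Solve problems step by step."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.6,
        "max_tokens": 256,
    }

def query_vllm(question: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST a chat completion request to a local vLLM server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Inspect the request body without needing a running server:
print(json.dumps(build_payload("What is 17 * 24?"), indent=2))
```

With the server up, `query_vllm("What is 17 * 24?")` returns the model's answer.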
Greedy decoding example with Transformers (for comparison with the sampling setup above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZirTech/OmniMath-2B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

def ask(question):
    prompt = (
        "<|im_start|>system\nYou are a helpful math assistant.<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding: temperature is ignored when do_sample=False.
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    # Trim anything generated past the assistant turn.
    if "user" in response:
        response = response.split("user")[0].strip()
    return response

print(ask("Find the degree of the field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q."))
```

🏗️ Architecture

OmniMath‑2B fully preserves Qwen3.5‑2B's design:

  • Gated Delta Networks : Linear attention layers interleaved with standard attention.

  • 262K Native Context : Supports up to 262,144 tokens (extendable with YaRN).

  • Built on Qwen3_5ForCausalLM : Seamless integration with Hugging Face ecosystem.
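The YaRN extension mentioned above is typically configured through `rope_scaling` in the model's `config.json`. A sketch following the Qwen-family convention (the exact keys and the scaling factor here are assumptions; consult the Qwen3.5 documentation before relying on them):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```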


⚠️ Limitations

  • Numerical accuracy may occasionally falter – always double‑check critical calculations.

  • Geometry was trained only on textual descriptions; performance on image‑based geometry problems is limited.

  • Non‑English math problems are not thoroughly evaluated.
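One lightweight way to double‑check critical calculations, as the first limitation advises, is to extract the final number from a response and recompute the result independently. A sketch (the extraction regex is a heuristic for illustration, not part of the model):

```python
import re

def last_number(text: str):
    """Return the last number in a model response as a float, or None if absent."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def check_answer(response: str, expected: float, tol: float = 1e-6) -> bool:
    """Compare the model's final number against an independently computed value."""
    value = last_number(response)
    return value is not None and abs(value - expected) <= tol

print(check_answer("5 apples at $2 each cost 5 * 2 = $10.", expected=5 * 2))  # → True
```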


🙏 Acknowledgments

  • Qwen Team for the outstanding Qwen3.5 base models.

  • Hugging Face for dataset hosting and the Transformers library.

  • Kaggle for providing free GPU hours.


📖 Citation

@misc{omnimath2b2026,
  title={OmniMath-2B: A Lightweight Open Mathematical Reasoning Model},
  author={Zirt Techniques},
  year={2026},
  url={https://huggingface.co/ZirTech/OmniMath-2B}
}

Built by Zirt Tech ❤️
