Farabi-0.6B

Farabi-0.6B is a compact, multilingual instruction-tuned language model with a primary focus on Kazakh, alongside strong Russian and English support. It is designed for everyday assistant use, reasoning, retrieval-grounded answering, and tool / function calling in agentic applications.

The model speaks fluent Kazakh and is intended to make high-quality conversational AI more accessible for the Kazakh language, where well-aligned models remain scarce.

Created by Nurgali Kadyrbek.

It is built on nur-dev/farabi-0.6B-base — a Kazakh-adapted base model that was itself continually pre-trained from Qwen3-0.6B — and then instruction-tuned to produce this assistant.


Highlights

  • 🇰🇿 Kazakh-first: the majority of the instruction data is native Kazakh, with Russian and English mixed in for cross-lingual robustness.
  • 🧠 Reasoning: supports an optional step-by-step "thinking" mode that can be toggled on or off at request time.
  • 🔧 Tool calling: emits Hermes-style <tool_call> blocks and is compatible with the OpenAI-style function-calling interface and agent frameworks.
  • 📚 Grounded answering: trained to answer from provided documents and context, including longer inputs.
  • 🪶 Small & deployable: 0.6B parameters, runs comfortably on a single modest GPU.

Languages

Language | Approx. share of instruction data
Kazakh (kk) | ~56%
English (en) | ~33%
Russian (ru) | ~10%

Data coverage by domain

The model was instruction-tuned on a broad, internally curated mixture. Described in general terms (no technical specifics), the approximate domain composition is:

Domain | Approx. share
General instruction following & multi-turn conversation | ~45%
Reasoning & step-by-step problem solving | ~27%
Retrieval-grounded answering, long context & document Q&A | ~13%
Tool use, function calling & agentic interaction | ~7%
Knowledge, culture, news & encyclopedic content | ~4%
Mathematics, language tasks (grammar / translation), safety & appropriate refusal, device & environment control, and assistant identity | ~4%

Shares are approximate and reflect general domain proportions rather than exact figures.


Data provenance & acknowledgments

The training datasets were created internally by the author, including original synthesis as well as additionally processed and enriched material.

Approximately 5.4% of all data used for instruction tuning was derived (with additional processing and enrichment) from resources of two organizations, whose contributions to the Kazakh language are gratefully acknowledged:

  1. Institute of Linguistics named after A. Baitursynov (Институт языкознания имени А. Байтурсынова)
  2. Sh. Shayakhmetov National Research and Practical Center "Til-Qazyna" (ННПЦ «Тіл-Қазына» имени Шайсултана Шаяхметова)

Recommended sampling parameters

A good starting point for general use:

{
  "temperature": 0.15,
  "top_p": 0.95,
  "max_tokens": 1024,
  "repetition_penalty": 1.05,
  "stream": true,
  "chat_template_kwargs": {
    "enable_thinking": true
  },
  "continue_final_message": true
}

Set "enable_thinking": false to get direct answers without an explicit reasoning step. Raise temperature for more creative / open-ended generation.
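With thinking enabled, the reasoning and the final answer arrive in the same completion. A minimal sketch of separating the two, assuming the model wraps its reasoning in <think>...</think> tags as its Qwen3 base does (this card does not state the tag format, so treat it as an assumption). The helper name split_thinking is hypothetical.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in Qwen3.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" if no think block exists."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first think block is treated as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think>2 + 2 = 4</think>\nThe answer is 4.")
print(answer)  # prints: The answer is 4.
```

Output produced with "enable_thinking": false passes through unchanged, because the helper returns the whole text as the answer when no think block is found.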


Serving with vLLM

Start an OpenAI-compatible server with tool-calling enabled:

vllm serve nur-dev/farabi-0.6B \
  --served-model-name farabi-0.6b \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

Query it with the standard OpenAI client (and the recommended sampling params):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="farabi-0.6b",
    messages=[
        {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
        {"role": "user", "content": "Алматы туралы қысқаша айтып бер."},
    ],
    temperature=0.15,
    top_p=0.95,
    max_tokens=1024,
    extra_body={
        "repetition_penalty": 1.05,
        "chat_template_kwargs": {"enable_thinking": True},
    },
    stream=True,
)
for chunk in resp:
    if not chunk.choices:  # skip chunks that carry no choices (e.g. usage-only)
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Tool calling works through the standard tools=[...] argument — the model returns function calls that the server parses into structured tool_calls.
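A minimal sketch of such a request, assuming the vLLM server above is running on localhost:8000. The get_weather tool, its schema, and the helper names run_tool and ask_with_tools are hypothetical and only illustrate the flow: pass tools=[...], read message.tool_calls, and execute each call locally.

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "unknown"}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool(name: str, arguments_json: str) -> str:
    """Execute one parsed tool call and return its result as a JSON string."""
    args = json.loads(arguments_json)
    return json.dumps(LOCAL_TOOLS[name](**args), ensure_ascii=False)


def ask_with_tools(question: str) -> list[str]:
    """Send one request to the local vLLM server and run any returned calls."""
    from openai import OpenAI  # imported here so the helpers work without it

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="farabi-0.6b",
        messages=[{"role": "user", "content": question}],
        tools=TOOLS,
        temperature=0.15,
    )
    # tool_calls is None when the model answers in plain text instead.
    calls = resp.choices[0].message.tool_calls or []
    return [run_tool(c.function.name, c.function.arguments) for c in calls]
```

With the server running, ask_with_tools("Алматыда ауа райы қандай?") returns one JSON string per tool call the model emitted, or an empty list if it answered directly. In a full agent loop each result would be sent back as a "tool" role message so the model can write the final answer.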


Serving with PyTorch / Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
    {"role": "user", "content": "Қазақстанның астанасы қай қала?"},
]

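# Build the prompt with the model's chat template. return_dict=True returns
# input_ids together with attention_mask, so both are passed to generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,        # set False for direct answers
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.15,
    top_p=0.95,
    repetition_penalty=1.05,
)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))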
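With plain Transformers there is no server-side parser, so any tool calls arrive as raw text. Tool schemas can be supplied through the tools argument of apply_chat_template. A minimal sketch of extracting the calls, assuming the usual Hermes layout of one JSON object with "name" and "arguments" keys inside each <tool_call> block. The helper name parse_tool_calls is hypothetical.

```python
import json
import re

# Assumption: each call is a JSON object wrapped in <tool_call>...</tool_call>.
_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract {"name": ..., "arguments": ...} objects from raw model output."""
    calls = []
    for payload in _TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of raising
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


raw = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Almaty"}}\n</tool_call>'
print(parse_tool_calls(raw))
# prints: [{'name': 'get_weather', 'arguments': {'city': 'Almaty'}}]
```

Several blocks in one response (parallel calls) come back as several list entries, in the order they were generated.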

Evaluation

⚠️ Interim results. The numbers below were measured on an early checkpoint (~17% through instruction tuning). They are expected to improve as training continues, but already show meaningful capability.

Tool / function calling — BFCL v4

Berkeley Function-Calling Leaderboard (v4), 1,040 cases, evaluated with the HuggingFace backend.

Category | Accuracy | n | What it measures
Simple | 80.5% | 322/400 | one call, one tool available
Multiple | 71.5% | 143/200 | pick the right tool from several
Parallel | 65.5% | 131/200 | several calls in one turn
Irrelevance | 5.4% | 13/240 | abstain when no tool fits
Overall | 58.6% | 609/1040 |
Function-calling avg | 74.5% | 596/800 | excludes irrelevance
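The two aggregate rows can be reproduced from the per-category counts. A short sketch of that arithmetic, using only the numbers in the table above (the dictionary layout and the function name accuracy are illustrative):

```python
# (correct, total) per category, copied from the table above.
results = {
    "simple": (322, 400),
    "multiple": (143, 200),
    "parallel": (131, 200),
    "irrelevance": (13, 240),
}


def accuracy(categories: list[str]) -> float:
    """Pooled accuracy in percent over the given categories."""
    correct = sum(results[c][0] for c in categories)
    total = sum(results[c][1] for c in categories)
    return 100.0 * correct / total


overall = accuracy(list(results))
calling = accuracy(["simple", "multiple", "parallel"])
print(f"overall={overall:.1f}%  function-calling={calling:.1f}%")
# prints: overall=58.6%  function-calling=74.5%
```

Both aggregates are pooled over cases, not unweighted means of the category accuracies. The unweighted mean of the three calling categories would be 72.5%, so the larger Simple category pulls the pooled figure up.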

Takeaways:

  • Strong calling ability for a 0.6B model. When a call is warranted it is correct ~74.5% of the time — right tool, valid arguments, clean JSON — including 65.5% on the hard parallel / multi-call category.
  • The weakness is abstention, not calling. On queries that match no available tool, the model almost always emits a call anyway: it abstains correctly in only 13 of 240 irrelevance cases (5.4%). This over-triggering is the main driver of the lower overall score and the clearest area for improvement.
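Until abstention improves, an application can add a cheap client-side check before executing any call. A minimal sketch, not part of the model or of vLLM: is_plausible_call is a hypothetical helper that rejects calls to unknown tools, calls with missing required arguments, and calls with arguments outside the schema, given OpenAI-style tool definitions. It cannot catch a well-formed call to a tool that simply does not fit the question.

```python
def is_plausible_call(call: dict, tools: list[dict]) -> bool:
    """Return True if `call` names a known tool and its arguments fit the schema."""
    schemas = {t["function"]["name"]: t["function"] for t in tools}
    schema = schemas.get(call.get("name"))
    if schema is None:
        return False  # unknown tool: treat as a spurious call
    args = call.get("arguments")
    if not isinstance(args, dict):
        return False
    params = schema.get("parameters", {})
    required = params.get("required", [])
    allowed = set(params.get("properties", {}))
    # All required arguments present, and no argument outside the schema.
    return all(k in args for k in required) and set(args) <= allowed


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

print(is_plausible_call({"name": "get_weather", "arguments": {"city": "Almaty"}}, tools))  # True
print(is_plausible_call({"name": "book_flight", "arguments": {}}, tools))  # False
```

Calls that fail the check can be dropped and the turn retried without tools, so the user gets a plain-text answer in place of a spurious tool invocation.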

Multilingual comprehension — 4-way multiple choice

Multiple-choice comprehension across the model's three languages (random baseline = 25%), evaluated with the chat template and enable_thinking=False.

Language | Accuracy
English | 53.7% ±1.7
Russian | 50.0% ±1.7
Kazakh | 41.8% ±1.6

Takeaways:

  • Well above the 25% random baseline in all three languages — real comprehension in English, Russian, and Kazakh.
  • Resource ordering (en > ru > kk) is as expected; Kazakh at 41.8% is clearly non-trivial.
  • Evaluating with the chat template and enable_thinking=False adds ~5–6 points per language versus a raw prompt — another reason to serve the model with its chat template (see serving instructions above).

Intended use & limitations

Farabi-0.6B is intended as a helpful general-purpose and agentic assistant, with a focus on Kazakh-language use cases. As a small model, it can make factual mistakes, so its outputs should be verified before use in high-stakes applications or wherever factual accuracy is critical. It should be used responsibly and in accordance with applicable laws and the base model's license.


Citation

If you use this model, please credit the author:

Nurgali Kadyrbek — Farabi-0.6B. https://www.linkedin.com/in/nurgali-kadyrbek-504260231/
