Farabi-0.6B
Farabi-0.6B is a compact, multilingual instruction-tuned language model with a primary focus on Kazakh, alongside strong Russian and English support. It is designed for everyday assistant use, reasoning, retrieval-grounded answering, and tool / function calling in agentic applications.
The model speaks fluent Kazakh and is intended to make high-quality conversational AI more accessible for the Kazakh language, where well-aligned models remain scarce.
Created by Nurgali Kadyrbek.
It is built on nur-dev/farabi-0.6B-base, a Kazakh-adapted base model that was itself continually pre-trained from Qwen3-0.6B, and then instruction-tuned to produce this assistant.
Highlights
- 🇰🇿 Kazakh-first: the majority of the instruction data is native Kazakh, with Russian and English mixed in for cross-lingual robustness.
- 🧠 Reasoning: supports an optional step-by-step "thinking" mode that can be toggled on or off at request time.
- 🔧 Tool calling: emits Hermes-style `<tool_call>` blocks and is compatible with the OpenAI-style function-calling interface and agent frameworks.
- 📚 Grounded answering: trained to answer from provided documents and context, including longer inputs.
- 🪶 Small & deployable: 0.6B parameters, runs comfortably on a single modest GPU.
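To make the tool-calling highlight concrete, here is a minimal parsing sketch. It assumes the common Hermes convention of a JSON object with `name` and `arguments` keys wrapped in `<tool_call>...</tool_call>` tags; the model card does not spell out the exact layout. The sample string, the `get_weather` tool name, and the `parse_tool_calls` helper are hypothetical. An inference server with a Hermes parser normally does this step for you.

```python
import json
import re

# Assumed Hermes-style layout: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in raw model output."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip blocks whose body is not valid JSON
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


# Hypothetical model output, for illustration only.
sample = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Almaty"}}\n</tool_call>'
)
print(parse_tool_calls(sample))
```

Malformed blocks are skipped rather than raised, which is usually the safer default when the caller decides what to execute.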
Languages
| Language | Approx. share of instruction data |
|---|---|
| Kazakh (kk) | ~56% |
| English (en) | ~33% |
| Russian (ru) | ~10% |
Data coverage by domain
The model was instruction-tuned on a broad, internally curated mixture. Described in general terms (no technical specifics), the approximate domain composition is:
| Domain | Approx. share |
|---|---|
| General instruction following & multi-turn conversation | ~45% |
| Reasoning & step-by-step problem solving | ~27% |
| Retrieval-grounded answering, long context & document Q&A | ~13% |
| Tool use, function calling & agentic interaction | ~7% |
| Knowledge, culture, news & encyclopedic content | ~4% |
| Mathematics, language tasks (grammar / translation), safety & appropriate refusal, device & environment control, and assistant identity | ~4% |
Shares are approximate and reflect general domain proportions rather than exact figures.
Data provenance & acknowledgments
The training datasets were created internally by the author, including original synthesis as well as additionally processed and enriched material.
Approximately 5.4% of all data used for instruction tuning was derived (with additional processing and enrichment) from resources of two organizations, whose contributions to the Kazakh language are gratefully acknowledged:
- Институт языкознания имени А. Байтурсынова — Institute of Linguistics named after A. Baitursynov
- ННПЦ «Тіл-Қазына» имени Шайсултана Шаяхметова — Sh. Shayakhmetov National Research and Practical Center "Til-Qazyna"
Recommended sampling parameters
A good starting point for general use:
```json
{
  "temperature": 0.15,
  "top_p": 0.95,
  "max_tokens": 1024,
  "repetition_penalty": 1.05,
  "stream": true,
  "chat_template_kwargs": {
    "enable_thinking": true
  },
  "continue_final_message": true
}
```
Set "enable_thinking": false to get direct answers without an explicit reasoning step.
Raise temperature for more creative / open-ended generation.
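When these settings are sent through the OpenAI Python client, `temperature`, `top_p`, `max_tokens`, and `stream` are regular arguments, while `repetition_penalty` and `chat_template_kwargs` are server-side extensions that travel in `extra_body`. The helper below is a minimal sketch of that split. `build_request_kwargs` is a hypothetical name, its defaults mirror the values above, and the `0.7` temperature is an arbitrary illustrative choice rather than a recommendation.

```python
def build_request_kwargs(enable_thinking: bool = True, temperature: float = 0.15) -> dict:
    """Split the recommended settings into standard and extra_body fields."""
    return {
        "temperature": temperature,
        "top_p": 0.95,
        "max_tokens": 1024,
        "stream": True,
        # Not part of the standard OpenAI schema, so passed via extra_body.
        "extra_body": {
            "repetition_penalty": 1.05,
            "chat_template_kwargs": {"enable_thinking": enable_thinking},
        },
    }


# Direct answers with a higher temperature for open-ended generation
# (0.7 is an illustrative value, not taken from the model card).
creative = build_request_kwargs(enable_thinking=False, temperature=0.7)
print(creative["extra_body"]["chat_template_kwargs"])
```

The result can be unpacked into a request, for example `client.chat.completions.create(model=..., messages=..., **creative)`.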
Serving with vLLM
Start an OpenAI-compatible server with tool-calling enabled:
```shell
vllm serve nur-dev/farabi-0.6B \
  --served-model-name farabi-0.6b \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```
Query it with the standard OpenAI client (and the recommended sampling params):
```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the key is a placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="farabi-0.6b",  # must match --served-model-name
    messages=[
        # "You are a helpful and accurate assistant."
        {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
        # "Tell me briefly about Almaty."
        {"role": "user", "content": "Алматы туралы қысқаша айтып бер."},
    ],
    temperature=0.15,
    top_p=0.95,
    max_tokens=1024,
    # Server-side extensions go through extra_body.
    extra_body={
        "repetition_penalty": 1.05,
        "chat_template_kwargs": {"enable_thinking": True},
    },
    stream=True,
)

# Print tokens as they arrive.
for chunk in resp:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
Tool calling works through the standard tools=[...] argument: the model returns function calls that the server parses into structured tool_calls.
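A single tool round-trip can be sketched as follows. This is an illustration under assumptions, not the author's reference code: `get_weather` is a hypothetical tool, `run_tool_call` and `chat_with_tools` are hypothetical helpers, and the client is passed in as an argument so the block can be loaded without a running server. Unknown tool names and malformed arguments are reported back to the model instead of raising.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would query a weather service."""
    return {"city": city, "temperature_c": 21}


# Tool schema in the OpenAI function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Run one parsed tool call and return its result as a JSON string."""
    if name not in REGISTRY:
        # The model may name a tool that does not exist; report it instead of crashing.
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(REGISTRY[name](**args), ensure_ascii=False)
    except (json.JSONDecodeError, TypeError) as exc:
        return json.dumps({"error": str(exc)})


def chat_with_tools(client, user_text: str, model: str = "farabi-0.6b") -> str:
    """Ask, execute any returned tool calls, then ask again with the results."""
    messages = [{"role": "user", "content": user_text}]
    first = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    msg = first.choices[0].message
    if not msg.tool_calls:
        return msg.content
    # Echo the assistant turn, including its tool calls, back into the history.
    messages.append({
        "role": "assistant",
        "content": msg.content or "",
        "tool_calls": [
            {"id": c.id, "type": "function",
             "function": {"name": c.function.name, "arguments": c.function.arguments}}
            for c in msg.tool_calls
        ],
    })
    for c in msg.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": c.id,
            "content": run_tool_call(c.function.name, c.function.arguments),
        })
    second = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    return second.choices[0].message.content
```

With the server started as shown earlier, a call such as `chat_with_tools(OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY"), "What is the weather in Almaty?")` would perform the round-trip.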
Serving with PyTorch / Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # "You are a helpful and accurate assistant."
    {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
    # "Which city is the capital of Kazakhstan?"
    {"role": "user", "content": "Қазақстанның астанасы қай қала?"},
]

# return_dict=True yields input_ids plus attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for direct answers
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.15,
    top_p=0.95,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```
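With thinking mode on, the decoded text may contain the reasoning trace followed by the final answer. The sketch below separates the two. It assumes the model keeps the Qwen3 convention of wrapping reasoning in `<think>...</think>` tags, which the model card does not state explicitly. `split_thinking` is a hypothetical helper and the sample string is made up for illustration.

```python
def split_thinking(text: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumes Qwen3-style tags; returns empty reasoning when no tag is present.
    """
    reasoning, sep, answer = text.rpartition(end_tag)
    if not sep:
        return "", text.strip()
    return reasoning.replace("<think>", "", 1).strip(), answer.strip()


# Hypothetical decoded output, for illustration only.
decoded = "<think>\nThe capital of Kazakhstan is Astana.\n</think>\n\nAstana."
reasoning, answer = split_thinking(decoded)
print(answer)
```

If the tags turn out to differ for this model, only `end_tag` and the opening-tag string need to change.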
Evaluation
⚠️ Interim results. The numbers below were measured on an early checkpoint (~17% through instruction tuning). They are expected to improve as training continues, but already show meaningful capability.
Tool / function calling — BFCL v4
Berkeley Function-Calling Leaderboard (v4), 1,040 cases, evaluated with the HuggingFace backend.
| Category | Accuracy | Correct / total | What it measures |
|---|---|---|---|
| Simple | 80.5% | 322/400 | one call, one tool available |
| Multiple | 71.5% | 143/200 | pick the right tool from several |
| Parallel | 65.5% | 131/200 | several calls in one turn |
| Irrelevance | 5.4% | 13/240 | abstain when no tool fits |
| Overall | 58.6% | 609/1040 | |
| Function-calling avg | 74.5% | 596/800 | excludes irrelevance |
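The two aggregate rows are pooled over cases (micro-averaged), not means of the per-category percentages. The short check below recomputes them from the per-category counts in the table; the `results` dictionary and `accuracy` helper are names introduced here for illustration.

```python
# (correct, total) per category, copied from the table above.
results = {
    "simple": (322, 400),
    "multiple": (143, 200),
    "parallel": (131, 200),
    "irrelevance": (13, 240),
}


def accuracy(categories: list[str]) -> float:
    """Micro-averaged accuracy in percent over the given categories."""
    correct = sum(results[c][0] for c in categories)
    total = sum(results[c][1] for c in categories)
    return 100 * correct / total


overall = accuracy(list(results))
calling_avg = accuracy(["simple", "multiple", "parallel"])
print(f"overall={overall:.1f}%  function-calling avg={calling_avg:.1f}%")
# prints: overall=58.6%  function-calling avg=74.5%
```

Because irrelevance contributes 240 of the 1,040 cases, its low score pulls the overall figure well below the function-calling average.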
Takeaways:
- Strong calling ability for a 0.6B model. When a call is warranted, it is correct ~74.5% of the time (right tool, valid arguments, clean JSON), including 65.5% on the hard parallel / multi-call category.
- The weakness is abstention, not calling. On queries that match no available tool, the model still tends to emit a call: it abstains correctly in only 5.4% of irrelevance cases. This over-triggering is the main driver of the lower overall score and the clearest area for improvement.
Multilingual comprehension — 4-way multiple choice
Multiple-choice comprehension across the model's three languages (random baseline = 25%), evaluated with the chat template and enable_thinking=False.
| Language | Accuracy |
|---|---|
| English | 53.7% ±1.7 |
| Russian | 50.0% ±1.7 |
| Kazakh | 41.8% ±1.6 |
Takeaways:
- Well above the 25% random baseline in all three languages — real comprehension in English, Russian, and Kazakh.
- Resource ordering (en > ru > kk) is as expected; Kazakh at 41.8% is clearly non-trivial.
- Evaluating with the chat template and enable_thinking=False adds ~5–6 points per language versus a raw prompt, which is another reason to serve the model with its chat template (see serving instructions above).
Intended use & limitations
Farabi-0.6B is intended as a helpful general-purpose and agentic assistant, with a focus on Kazakh-language use cases. As a small model, it can make factual mistakes, so outputs should be verified in high-stakes or accuracy-critical applications. It should be used responsibly and in accordance with applicable laws and the base model's license.
Citation
If you use this model, please credit the author:
Nurgali Kadyrbek — Farabi-0.6B. https://www.linkedin.com/in/nurgali-kadyrbek-504260231/