Twkeed-GPT-120B (توكيد)

An Arabic language model fine-tuned from GPT-OSS 120B, a Mixture-of-Experts model with 117B total parameters and 5.1B active per token.

Fine-tuned on Saudi Arabian content including:

  • Saudi Labor Law articles (Articles 53, 84-85, 109, 151)
  • Saudi dialect understanding (يبي "wants", وش "what", وين "where", الحين "now")
  • Arabic grammar and writing
  • Vision 2030 knowledge

Model Details

  • Base Model: mlx-community/gpt-oss-120b-MXFP4-Q8
  • Architecture: Mixture of Experts (117B total, 5.1B active)
  • Reasoning: Near o4-mini level capabilities
  • Fine-tuning Method: LoRA (r=8, alpha=16) with unsloth-mlx
  • Training Hardware: Mac Studio M3 Ultra 96GB
  • Language: Arabic (Modern Standard + Saudi Dialect)
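LoRA with r=8 and alpha=16 means each adapted weight matrix is updated by a low-rank product scaled by alpha/r = 2. A minimal numpy sketch of the mechanism (the matrix dimensions below are illustrative, not the model's real layer sizes):

```python
import numpy as np

r, alpha = 8, 16          # LoRA hyperparameters from this model card
d_out, d_in = 64, 64      # illustrative dimensions only

W = np.zeros((d_out, d_in))          # frozen base weight (stand-in)
A = np.random.randn(r, d_in) * 0.01  # trainable, initialized small
B = np.zeros((d_out, r))             # trainable, initialized to zero

# Effective weight after fine-tuning: W + (alpha / r) * B @ A
W_adapted = W + (alpha / r) * (B @ A)

# At initialization B is zero, so the adapter starts as a no-op.
assert np.allclose(W_adapted, W)
```

Only A and B are trained, which is why a 117B-parameter base can be fine-tuned on a single Mac Studio.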

Model Identity

This model identifies as توكيد (Twkeed), an Arabic AI assistant with strong reasoning capabilities.

When asked "من أنت؟" (Who are you?), the model responds with its identity.

Usage

from mlx_lm import load, generate

# Download the model and tokenizer from the Hugging Face Hub
model, tokenizer = load("twkeed-sa/twkeed-gpt-120b")

response = generate(
    model,
    tokenizer,
    prompt="مرحباً، من أنت؟",  # "Hello, who are you?"
    max_tokens=200,
)
print(response)

Why 120B?

  Aspect              20B                  120B
  Parameters          21B (3.6B active)    117B (5.1B active)
  Reasoning           Good                 Excellent (near o4-mini)
  Arabic Knowledge    Very Good            Excellent

The 120B base model provides much stronger reasoning and general knowledge out of the box, so less adaptation is needed during fine-tuning.
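A side note on the sparsity figures in the table above: because both models are Mixture-of-Experts, only a fraction of parameters is active per token, and that fraction can be checked directly (a quick sketch using the counts from the table):

```python
# Active-parameter fraction per token for the two MoE models in the table.
models = {
    "gpt-oss-20b": {"total": 21e9, "active": 3.6e9},
    "gpt-oss-120b": {"total": 117e9, "active": 5.1e9},
}

fractions = {name: p["active"] / p["total"] for name, p in models.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

The 120B model is far sparser (roughly 4% active vs. 17% for the 20B), which is how it keeps inference cost modest despite its size.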

Training Data

Fine-tuned on 40,000+ Arabic examples including:

  • Arabic Alpaca dataset
  • Custom Saudi Labor Law content
  • Saudi dialect examples
  • Arabic grammar instruction data
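Alpaca-style datasets store each example as an instruction/input/output record that is rendered into a single training prompt. A minimal sketch of that rendering step (the template and the record contents below are illustrative, not taken from the actual training data):

```python
# Render one Alpaca-style record into a training prompt.
# Both the template and the record are hypothetical examples.
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

record = {
    "instruction": "ترجم الجملة التالية إلى الإنجليزية",  # "Translate the following sentence into English"
    "input": "وين أقرب محطة قطار؟",                        # Saudi dialect: "Where is the nearest train station?"
    "output": "Where is the nearest train station?",
}

prompt = ALPACA_TEMPLATE.format(**record)
print(prompt)
```

Mixing Modern Standard Arabic instructions with Saudi-dialect inputs, as above, is one way such data can teach the model both registers at once.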

License

Apache 2.0

Author

Fine-tuned using unsloth-mlx
