Llama-3.2-0.5B-Instruct

This is a tiny version of meta-llama/Llama-3.2-1B-Instruct created for testing and development.

Model Details

  • Base Model: meta-llama/Llama-3.2-1B-Instruct
  • Architecture: llama
  • Total Parameters: 0.51B
  • Activated Parameters: 0.51B (non-MoE)

Configuration Changes

Only num_hidden_layers was reduced from the original model; the other parameters listed below are unchanged:

Parameter              Original   Tiny
num_hidden_layers      16         4
hidden_size            2048       2048
intermediate_size      8192       8192
num_attention_heads    32         32
num_key_value_heads    8          8
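
As a quick sanity check on the table above, the two configurations can be compared programmatically. This is a minimal sketch: the dictionaries simply restate the table values, and `changed_keys` is a hypothetical helper, not part of any library.

```python
# Values copied from the table above (original vs. tiny).
ORIGINAL = {
    "num_hidden_layers": 16,
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
}
TINY = dict(ORIGINAL, num_hidden_layers=4)


def changed_keys(original, tiny):
    """Return {key: (original_value, tiny_value)} for every key that differs."""
    return {
        key: (value, tiny.get(key))
        for key, value in original.items()
        if tiny.get(key) != value
    }


print(changed_keys(ORIGINAL, TINY))
```

The same comparison could presumably be run on full configurations by loading each with `AutoConfig.from_pretrained(...)` from transformers and passing the results of `to_dict()`.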

Checkpoint Structure

This model uses a single model.safetensors file containing all weights. The checkpoint structure is identical to the original model, with the standard Llama architecture tensors:

  • model.embed_tokens.weight
  • model.layers.*.self_attn.{q,k,v,o}_proj.weight
  • model.layers.*.mlp.{gate,up,down}_proj.weight
  • model.layers.*.{input,post_attention}_layernorm.weight
  • model.norm.weight
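
The patterns above can be expanded into a concrete list of expected tensor names. The sketch below assumes the list in this card is exhaustive (for example, that there is no separate lm_head.weight because the output projection is tied to the embeddings); `expected_tensor_names` is a hypothetical helper written for illustration.

```python
def expected_tensor_names(num_layers):
    """Expand the tensor-name patterns listed above for `num_layers` decoder layers."""
    names = ["model.embed_tokens.weight", "model.norm.weight"]
    for i in range(num_layers):
        prefix = f"model.layers.{i}"
        # Attention projections
        names += [f"{prefix}.self_attn.{p}_proj.weight" for p in ("q", "k", "v", "o")]
        # MLP projections
        names += [f"{prefix}.mlp.{p}_proj.weight" for p in ("gate", "up", "down")]
        # Per-layer norms
        names += [f"{prefix}.{n}_layernorm.weight" for n in ("input", "post_attention")]
    return sorted(names)


names = expected_tensor_names(4)  # the tiny model has 4 layers
print(len(names))  # 2 global tensors + 9 per layer
```

To check a downloaded checkpoint, this list could be compared with the keys returned by `safe_open("model.safetensors", framework="pt").keys()` from the safetensors package.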

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inference-optimization/Llama-3.2-0.5B-Instruct"

# device_map="auto" requires the accelerate package to be installed
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Pass the attention mask along with the input IDs
inputs = tokenizer("According to all known laws", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
# Drop special tokens such as the beginning-of-text marker from the printed text
print(tokenizer.decode(output[0], skip_special_tokens=True))

Validation

Success: 1.0247299671173096 <= 10.0

==================================================
Generating sample text:
According to all known laws of aviation, there is no way a bee should be able to fly
==================================================
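
The validation log above compares a score against a threshold of 10.0. Assuming that score is perplexity in the usual sense (the exponential of the mean negative log-likelihood per token), the check might look roughly like the sketch below. The function names are hypothetical and the actual validation script may differ.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over the given tokens."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def passes_threshold(ppl, max_ppl=10.0):
    """Mirror the 'value <= 10.0' comparison shown in the log above."""
    return ppl <= max_ppl


# Toy example: a model that assigns probability 0.5 to every token
toy_log_probs = [math.log(0.5)] * 4
print(perplexity(toy_log_probs))
print(passes_threshold(1.0247299671173096))
```

A perplexity close to 1 means the model assigns nearly all probability mass to the correct next token, which is expected when a model has memorized a small training text.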

Creation Process

This model was created using the llm-compressor create-tiny-model Claude skill:

  1. Inspected the original model configuration to identify key parameters
  2. Created a tiny version by reducing num_hidden_layers from 16 to 4
  3. Fine-tuned the model on a toy dataset (famous copypastas) to validate learning capability
  4. Achieved a perplexity of ~1.02 on the validation text, well under the 10.0 threshold
  5. Validated checkpoint structure matches the original model format
  6. Confirmed successful loading and inference

Notes

  • This model was fine-tuned on a small corpus of internet copypastas to verify that it can still learn after the layer reduction
  • The model maintains the same Llama 3.2 architecture (including RoPE parameters) as the base model, just with fewer layers
  • The layer count is 25% of the original (4 of 16), but the embedding table is unchanged, so the model keeps roughly 41% of the original model's parameters (0.51B of about 1.24B)
  • This is intended for development and testing purposes, not production use
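
The parameter totals for this card can be cross-checked with a little arithmetic on the configuration values. The sketch below is an estimate under stated assumptions: a vocabulary size of 128256 (not stated in this card), an output projection tied to the embedding table, and no bias terms. `count_params` is a hypothetical helper.

```python
def count_params(
    num_layers,
    hidden_size=2048,
    intermediate_size=8192,
    num_attention_heads=32,
    num_key_value_heads=8,
    vocab_size=128256,  # assumption: Llama 3.2 vocabulary size, not stated in this card
):
    """Estimate parameters for a Llama-style model with tied embeddings and no biases."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim

    embed = vocab_size * hidden_size
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o + k, v
    mlp = 3 * hidden_size * intermediate_size                        # gate, up, down
    norms = 2 * hidden_size                                          # two norms per layer
    final_norm = hidden_size

    return embed + num_layers * (attn + mlp + norms) + final_norm


tiny = count_params(4)
original = count_params(16)
print(f"tiny: {tiny / 1e9:.2f}B, original: {original / 1e9:.2f}B, ratio: {tiny / original:.2f}")
```

Under these assumptions the embedding table accounts for about half of the tiny model's parameters, which is why cutting the layer count to a quarter does not cut the total to a quarter.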