Note: This model has not been instruction-tuned (no SFT); it fails to respond usefully to questions roughly 9 times out of 10.

Glint-1.3

⚠️ IMPORTANT NOTICE

  1. This model is experimental. Glint-1.3 is a 982K parameter research model.
  2. Performance characteristics: This model may occasionally output garbled text like "chuamliamce". If it does, try again. It is shy.
  3. Not production-ready: This is a tiny neural network running on a prayer and a GPU.

Quick Stats

| Stat | Value |
|---|---|
| Parameters | 982,656 (under 1M 👍) |
| Training Tokens | 100 Billion (FineWeb-Edu) |
| Hardware | RTX 5090 |
| Context Window | 256 tokens |
| Inference Speed | ~138,562 tok/s |
| Vibe | Doing its best |

What Is This?

Glint-1.3 is the first model in the CompactAI scaling-down plan.

We spent months adding features: SPIN, DPO, sleep gates, retention, recurrent loops, LoRA, engrams. More parameters, more tricks, more complexity. And you know what? The features were hurting the models. The tiny models couldn't breathe. So now we're doing the opposite: scaling down. Strip everything. Pure Llama. See how far simplicity goes.

This is that experiment. ~1M params. No gimmicks. Just a transformer doing its best.

It runs at 138,000 tokens per second on an RTX 5090. Fun, but useless. lmao.

The Journey

The model improves monotonically over 95K training steps on 100B tokens, with Wikitext-2 cross-entropy loss dropping from 4.29 to 3.08 (a token-level perplexity of roughly e^3.08 ≈ 22). For a 1M-parameter model, this is actually respectable.

Model Specifications

| Parameter | Value |
|---|---|
| Architecture | Transformer decoder (Llama-style) |
| Parameters | 982,656 |
| Hidden Dim | 128 |
| Layers | 4 |
| Attention Heads | 4 |
| KV Heads | 4 (GQA) |
| MLP Intermediate | 384 (SwiGLU) |
| Context Length | 256 tokens |
| Vocab Size | 500 (ByteLevel BPE) |
| Normalization | RMSNorm |
| Position Encoding | RoPE |
| Embeddings | Tied input/output |
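
These hyperparameters map directly onto a stock Hugging Face `LlamaConfig`. A minimal sketch, assuming the released weights follow the standard Llama implementation (details like the norm epsilon and RoPE base are assumptions, and the exact parameter count can differ slightly from the card's figure):

```python
# A minimal architectural sketch using Hugging Face transformers. This is an
# assumption about how the card's specs translate to config fields, not the
# model's published training code.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=500,               # ByteLevel BPE
    hidden_size=128,
    intermediate_size=384,        # SwiGLU MLP
    num_hidden_layers=4,
    num_attention_heads=4,
    num_key_value_heads=4,        # equal to query heads, so effectively MHA
    max_position_embeddings=256,  # context window
    tie_word_embeddings=True,     # tied input/output embeddings
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,}")  # ballpark ~1M
```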

Benchmarks

All checkpoints were evaluated on Wikitext-2, BLiMP (grammaticality), and ARC-Easy (science QA), using the sliding-window log-prob scoring methodology from the CompactAI benchmark suite (sketched below).
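
The CompactAI harness itself isn't reproduced here; the following is a minimal sketch of sliding-window log-prob scoring for a 256-token-context model (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_logprob(model, tokenizer, text, window=256, stride=128):
    """Total log-probability of `text` under a fixed-context causal LM.

    Sequences longer than the context window are scored in overlapping
    windows; each window only contributes tokens the previous one hasn't
    already scored.
    """
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    total, next_target = 0.0, 1  # index of the next unscored target token
    for start in range(0, len(ids), stride):
        chunk = ids[start : start + window].unsqueeze(0)
        logits = model(input_ids=chunk).logits[0, :-1]   # predicts chunk[0, 1:]
        logprobs = F.log_softmax(logits.float(), dim=-1)
        targets = chunk[0, 1:]
        skip = next_target - (start + 1)                 # overlap already scored
        total += logprobs[skip:].gather(-1, targets[skip:, None]).sum().item()
        next_target = start + 1 + len(targets)
        if start + window >= len(ids):
            break
    return total

# Multiple choice (e.g., ARC-Easy): pick the option with the highest score.
# best = max(options, key=lambda o: sliding_logprob(model, tok, question + " " + o))
```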

Checkpoint Benchmark

*(Chart of benchmark scores across training checkpoints; not reproduced here.)*

Per-Metric Standouts

| Metric | Best Checkpoint | Score |
|---|---|---|
| Wikitext-2 CE Loss | Step 95,000 | 3.06 |
| BLiMP Accuracy | Step 11,500 | 64.2% |
| ARC-Easy Accuracy | Step 55,500 | 32.5% |

Merged Model (Model Soup)

Weight averaging the best checkpoints per benchmark via per-parameter-group SLERP produces a model that exceeds individual bests on certain metrics:

| Model | WT Loss | BLiMP | ARC | Composite |
|---|---|---|---|---|
| Best Merged | 3.148 | 68.7% 🏆 | 29.0% | 1.391 🏆 |
| Best WT (step 95367) | 3.080 | 53.7% | 25.0% | 1.431 |
| Best BLiMP (step 11500) | 3.307 | 64.2% | 22.5% | 1.480 |
| Best ARC (step 55500) | 3.128 | 50.7% | 32.5% | 1.432 |

The merged model beats the best individual checkpoint on BLiMP by 4.5 points (68.7% vs. 64.2%), achieved by spherically interpolating the attention and MLP weight groups at different blend factors; a sketch of the merge follows.
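
The card doesn't publish the exact merge recipe, but per-parameter-group SLERP is easy to sketch in PyTorch. Group matching by substring and the 0.4/0.6 blend factors below are illustrative assumptions:

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors."""
    af, bf = a.flatten().float(), b.flatten().float()
    cos = torch.dot(af, bf) / (af.norm() * bf.norm() + 1e-8)
    omega = torch.arccos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    if omega < 1e-4:  # nearly parallel: plain lerp is numerically safer
        return (1 - t) * a + t * b
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) * af + torch.sin(t * omega) * bf) / so
    return out.reshape(a.shape).to(a.dtype)

def soup(state_a: dict, state_b: dict, t_attn=0.4, t_mlp=0.6) -> dict:
    """Merge two checkpoints with separate blend factors per parameter group.

    The substring matching and factor values are illustrative, not the
    card's actual recipe.
    """
    merged = {}
    for name, wa in state_a.items():
        wb = state_b[name]
        if "attn" in name:
            t = t_attn
        elif "mlp" in name:
            t = t_mlp
        else:
            t = 0.5
        merged[name] = slerp(wa, wb, t)
    return merged
```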

Training Details

| Parameter | Value |
|---|---|
| Dataset | FineWeb-Edu (sample-10BT) |
| Batch Size | 4,096 (gradient accumulation 1) |
| Sequence Length | 256 |
| Learning Rate | 8e-4 (cosine decay, 200-step warmup) |
| Weight Decay | 0.05 |
| Max Grad Norm | 0.5 |
| Optimizer | AdamW (fused, β₁=0.9, β₂=0.95) |
| Precision | bfloat16 |
| Hardware | NVIDIA RTX 5090 (throughout) |
| Training Time | ~30 hours for 95K steps |
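
The optimizer and schedule rows translate to a few lines of PyTorch. A sketch (the stand-in `model` and the decay-to-zero floor are assumptions):

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 128).to(device)  # stand-in for the real 982K-param model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=8e-4,
    betas=(0.9, 0.95),
    weight_decay=0.05,
    fused=(device == "cuda"),  # the card lists fused AdamW
)

WARMUP_STEPS, TOTAL_STEPS = 200, 95_000

def lr_lambda(step: int) -> float:
    # Linear warmup for 200 steps, then cosine decay over the remaining steps.
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)

# Per step: loss.backward(), then clip before optimizer.step() / scheduler.step():
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
```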

Usage

Run the `infer` script from the TinyLM repo, or load the checkpoint directly as sketched below.
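
A minimal generation sketch, assuming the checkpoint loads through Hugging Face transformers (the prompt and sampling settings are illustrative; the repo's `infer` script is the canonical entry point):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "CompactAI-O/Glint-1.3"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

prompt = "The water cycle begins when"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=64,  # stay well inside the 256-token context
    do_sample=True,
    temperature=0.8,    # higher temperatures get repetitive (see Limitations)
    top_p=0.95,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```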

Limitations

  • Context window: 256 tokens severely limits long-range dependencies
  • Knowledge: Extremely limited world knowledge due to parameter constraints
  • Coherence: May lose track of topic after a few sentences
  • Repetition: Tends toward repetitive patterns at higher temperatures
  • Reliability: Not suitable for any production application
  • Purpose: Research, education, and architectural experimentation

Related Models

  • Glint-1.3 β€” 1M params, instruction-tuned, our other scaling-down experiment
  • Shard-1 β€” 54.5M params, Gemma-4 attention
  • TMLM-Haiku-2.3 β€” 1M params, 10B tokens, SPIN-optimized (pre-scaling-down era)

Citation

```bibtex
@misc{tinylm1m,
  author    = {CompactAI},
  title     = {Glint-1.3: a 982K parameter Llama-style transformer},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/CompactAI-O/TinyLM}
}
```

Built by CompactAI. Small models trying their best since 2026.
