(v2 train scripts: RoPE Positional Encoding)
SpiceeChat: Train From Scratch via Flash3 Kernels
A custom TinyGPT trained from scratch using a BPE tokenizer and causal LM pipeline.
Model Details
| Property | Value |
|---|---|
| Architecture | TinyGPT (custom GPT-style) |
| Layers | 4 |
| Heads | 4 |
| Hidden size | 384 |
| Context length | 128 |
| Vocab size | 32,768 |
| Attention | Torch (T4 compatible) |
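The table above is enough for a back-of-the-envelope parameter count. The sketch below is only an estimate under assumptions that the card does not confirm: a 4x MLP expansion, input and output embeddings that share weights, no learned position table (consistent with the RoPE note in the title line), and biases and normalization parameters ignored. The function name `estimate_params` and its `mlp_ratio` and `tied_embeddings` arguments are hypothetical, not part of `train.py`.

```python
def estimate_params(
    n_layer: int,
    n_embd: int,
    vocab_size: int,
    mlp_ratio: int = 4,            # assumption: standard GPT-style 4x MLP expansion
    tied_embeddings: bool = True,  # assumption: output head shares the embedding matrix
) -> int:
    """Rough weight-matrix count for a GPT-style decoder (no biases, no norm params)."""
    embedding = vocab_size * n_embd
    attention = 4 * n_embd * n_embd        # Q, K, V and output projections
    mlp = 2 * mlp_ratio * n_embd * n_embd  # up and down projections
    total = embedding + n_layer * (attention + mlp)
    if not tied_embeddings:
        total += vocab_size * n_embd       # separate output projection
    return total


# Values taken from the Model Details table.
print(estimate_params(n_layer=4, n_embd=384, vocab_size=32768))  # 19660800
```

Under these assumptions the model has roughly 19.7M weights, of which about 12.6M sit in the token embedding, so the vocabulary dominates the size at this scale.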
Files
- checkpoint_step_*.pt: model weights
- tokenizer/: BPE tokenizer trained on the same data
- config.json: model hyperparameters
Load
import torch
import json
from tokenizers import Tokenizer
# Load tokenizer
tok = Tokenizer.from_file("tokenizer/tokenizer.json")
# Load model (requires train.py in same directory)
from train import TinyGPT, GPTConfig
cfg = GPTConfig(vocab_size=32768, ctx_len=128, n_layer=4, n_head=4, n_embd=384, attention_backend="torch")
model = TinyGPT(cfg)
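# Path to the checkpoint; the repo lists files named checkpoint_step_*.pt, so adjust "latest.pt" if needed
ckpt = torch.load("latest.pt", map_location="cpu")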
model.load_state_dict(ckpt["model"])
model.eval()
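The load snippet reads `latest.pt`, while the Files section lists checkpoints as `checkpoint_step_*.pt`. One way to bridge the two is to pick the checkpoint with the highest step number. This is a minimal sketch that assumes the part after `checkpoint_step_` is an integer step count, which the card does not state; the helper name `latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory: str = ".") -> Optional[Path]:
    """Return the checkpoint_step_<N>.pt file with the largest N, or None if none exist."""
    pattern = re.compile(r"checkpoint_step_(\d+)\.pt")
    best_path: Optional[Path] = None
    best_step = -1
    for path in Path(directory).glob("checkpoint_step_*.pt"):
        match = pattern.fullmatch(path.name)
        if match is None:
            continue  # skip names whose suffix is not a plain integer
        step = int(match.group(1))
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```

The returned path can then be passed to `torch.load(..., map_location="cpu")` in place of `"latest.pt"`. Comparing steps as integers avoids the lexicographic trap where `checkpoint_step_500.pt` sorts after `checkpoint_step_10000.pt`.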