ruinnight (ruins)

liked a Space 5 months ago

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

📝

79

Who needs 1T parameters? Olympiad proofs with a 4B model

liked 3 Spaces 9 months ago

Unlocking On-Policy Distillation for Any Model Family

📝

118

Explore on-policy distillation visualization for any model

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

📝

94

Evaluate multilingual models using FineTasks

The Smol Training Playbook

📚

3.24k

The secrets to building world-class LLMs

liked 3 models 10 months ago

liked a dataset 11 months ago

GPUMODE/KernelBook

Viewer • Updated Jun 9 • 18.2k • 375 • 56

liked a dataset over 1 year ago

OpenDILabCommunity/MasterMind

Viewer • Updated Mar 20, 2025 • 696k • 272 • 6

liked 2 Spaces over 1 year ago

Number Tokenization Blog

📈

123

Explore how tokenization affects arithmetic in LLMs

The Ultra-Scale Playbook

🌌

3.94k

The ultimate guide to training LLM on large GPU Clusters

ByteDance-Seed/cudaLLM-8B

cognition-ai/Kevin-32B

facebook/KernelLLM
