gpu-mode-trimul — LoRA checkpoint (step 30)
LoRA adapter for gpt-oss-120b, trained with reinforcement learning on the GPU Mode TriMul competition — triangular matrix multiplication on H100.
Produced via TTT-Discover (Yuksekgonul et al., 2026):
"We perform reinforcement learning at test time, allowing the LLM to continue training with experience specific to the problem at hand … Our test-time training runs are performed using Tinker, an API by Thinking Machines, with a cost of only a few hundred dollars per problem."
What is TTT-Discover?
Instead of prompting a frozen model, as AlphaEvolve does, TTT-Discover keeps training the model on your
specific problem at test time. The model receives a reward from real execution feedback
(reward = 1500 / runtime_μs) and learns to write faster Triton kernels through trial and error,
with no human-written examples needed.
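The reward formula above is a plain inverse of the measured runtime. A minimal sketch, assuming that a kernel which fails to compile, crashes, or returns wrong output has no runtime and scores zero. The constant 1500 comes from the formula above; the function name and the failure handling are illustrative and not taken from the TTT-Discover code.

```python
from typing import Optional

REWARD_SCALE_US = 1500.0  # numerator from the reward formula above


def runtime_reward(runtime_us: Optional[float]) -> float:
    """Map a measured kernel runtime (microseconds) to a scalar reward.

    Assumption: a failed evaluation is passed in as None (or a
    non-positive number) and receives zero reward.
    """
    if runtime_us is None or runtime_us <= 0:
        return 0.0
    return REWARD_SCALE_US / runtime_us


if __name__ == "__main__":
    # Runtimes taken from the reward trajectory of this run, plus one failure.
    for runtime in (7040.0, 3638.0, None):
        print(runtime, round(runtime_reward(runtime), 3))
```

Because the reward is inversely proportional to runtime, halving the runtime doubles the reward, so the signal keeps growing as kernels get faster.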
Published results (TriMul runtime per GPU, lower is better):
| | A100 ↓ | H100 ↓ | B200 ↓ | MI300x ↓ |
|---|---|---|---|---|
| Best Human | 4531 μs | 1371 μs | 1005 μs | 2462 μs |
| TTT-Discover | 2198 μs | 1161 μs | 905 μs | 1596 μs |
Verify on GPU Mode leaderboard →
The run behind this checkpoint reached ~3638 μs at step 14 (best reward 0.412); the published weights are from step 30. Starting from them instead of from scratch saves ~14 steps of cold-start exploration.
Reward trajectory (this run)
| Step | reward_max | reward_mean | runtime (best) |
|---|---|---|---|
| 7 | 0.213 | 0.054 | ~7040 μs |
| 8 | 0.273 | 0.126 | ~5490 μs |
| 10 | 0.281 | 0.173 | ~5340 μs |
| 13 | 0.376 | 0.124 | ~3990 μs |
| 14 | 0.412 | 0.118 | ~3638 μs ← best |
| 18 | 0.281 | 0.152 | ~5340 μs |
| 25 | 0.281 | 0.156 | ~5340 μs |
| 30 | 0.280 | 0.210 | ~5360 μs |
Steps 15–17 had zero reward due to an eval cluster billing limit — not a training failure.
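To read the trajectory programmatically, the reward can be inverted back into a runtime, and steps with zero reward can be skipped as failed evaluations. A minimal sketch using the `reward_max` values from the table above; the helper names are hypothetical, and the scale of 1500 is the one from the reward formula.

```python
# (step, reward_max) pairs copied from the table above.
TRAJECTORY = [
    (7, 0.213), (8, 0.273), (10, 0.281), (13, 0.376),
    (14, 0.412), (18, 0.281), (25, 0.281), (30, 0.280),
]


def reward_to_runtime_us(reward, scale=1500.0):
    """Invert reward = scale / runtime; zero reward has no defined runtime."""
    if reward <= 0:
        return None
    return scale / reward


def best_step(trajectory):
    """Return the (step, reward) pair with the highest reward.

    Steps with zero reward (for example, evaluation outages) are ignored.
    """
    scored = [(step, reward) for step, reward in trajectory if reward > 0]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    step, reward = best_step(TRAJECTORY)
    print(step, reward, round(reward_to_runtime_us(reward)))
```

The recovered runtime for step 14 lands within a few microseconds of the ~3638 μs in the table; the small gap comes from the reward being rounded to three decimals.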
Files
sampler_weights/
    adapter_model.safetensors   # LoRA weights (~5 GB), use for inference
    adapter_config.json         # PEFT config (rank=32, target=all-linear)
    checkpoint_complete         # Completion marker
gpu_mode_trimul.ipynb           # Google Colab notebook
Training state (for resuming RL with fresh optimizer):
tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/weights/000030
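Before loading 5 GB of weights, it can be worth confirming that `adapter_config.json` matches the settings listed above. A small sketch using only the standard library, assuming the file follows the usual PEFT layout with the keys `peft_type`, `r`, `lora_alpha`, and `target_modules`; the local path is hypothetical and depends on where the repo files were downloaded.

```python
import json
from pathlib import Path


def summarize_adapter_config(path):
    """Read a PEFT adapter_config.json and return its main LoRA settings."""
    cfg = json.loads(Path(path).read_text())
    return {
        "peft_type": cfg.get("peft_type"),
        "rank": cfg.get("r"),
        "lora_alpha": cfg.get("lora_alpha"),
        "target_modules": cfg.get("target_modules"),
    }


if __name__ == "__main__":
    # Hypothetical local path; adjust to your download location.
    config_path = Path("sampler_weights/adapter_config.json")
    if config_path.exists():
        print(summarize_adapter_config(config_path))
    else:
        print(f"{config_path} not found; download the repo files first")
```

For this checkpoint, the listing above suggests a rank of 32 and `all-linear` as the target; a mismatch would indicate a different or partial download.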
Quick start
Open the notebook above to run TTT-Discover on TriMul, fork from this checkpoint, or plug in your own GPU kernel problem.
What you need
| Service | Purpose | Get it |
|---|---|---|
| Tinker | Hosts gpt-oss-120b + LoRA training | Request access |
| Modal | H100 GPU sandbox for kernel eval | Free tier |
| Weights & Biases | Run tracking | Free account |
No local GPU required — training runs on Tinker's cluster; kernel evals run on Modal H100s.
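Since all three services are remote, a missing credential usually surfaces only after a run has started. A preflight sketch that checks for credentials up front, assuming the environment variable names `TINKER_API_KEY`, `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET`, and `WANDB_API_KEY`. These names are assumptions to confirm against each service's documentation, and each tool may also accept other login flows such as a CLI login that writes a config file.

```python
import os

# Assumed variable names; verify against each service's documentation.
REQUIRED_ENV = {
    "Tinker": ["TINKER_API_KEY"],
    "Modal": ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
    "Weights & Biases": ["WANDB_API_KEY"],
}


def missing_credentials(env=None):
    """Return {service: [missing variable names]} for unset credentials."""
    env = os.environ if env is None else env
    missing = {}
    for service, names in REQUIRED_ENV.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[service] = absent
    return missing


if __name__ == "__main__":
    problems = missing_credentials()
    if problems:
        for service, names in problems.items():
            print(f"{service}: missing {', '.join(names)}")
    else:
        print("All expected credentials are set")
```

An empty result only means the variables are present; it does not verify that the keys are valid or that the accounts have quota.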
Warm-start RL from this checkpoint
import asyncio, os

from ttt_discover.rl.train import Config, main as rl_main
from ttt_discover.tinker_utils import misc_utils
from ttt_discover.tinker_utils.dataset_builder import DatasetConfig, get_single_problem_dataset_builder
from examples.gpu_mode.env import GpuModeEnv  # from github.com/test-time-training/discover

CHECKPOINT = "tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/weights/000030"
EXPERIMENT = "my-trimul-run"
log_path = f"./tinker_log/{EXPERIMENT}"
os.makedirs(log_path, exist_ok=True)

dataset_builder = get_single_problem_dataset_builder(DatasetConfig(
    env_type=GpuModeEnv, problem_type="trimul",
    batch_size=4, group_size=16,
    model_name_for_tokenizer="openai/gpt-oss-120b",
    renderer_name="gpt_oss_high_reasoning",
    num_cpus_per_task=0, eval_timeout=530, log_path=log_path,
))

config = Config(
    env_type=GpuModeEnv, problem_type="trimul",
    learning_rate=4e-5, dataset_builder=dataset_builder,
    model_name="openai/gpt-oss-120b", lora_rank=32,
    wandb_project="gpu-mode", wandb_name=EXPERIMENT,
    log_path=log_path,
    load_checkpoint_path=CHECKPOINT,  # warm start ← key line
    num_epochs=20, save_every=1,
    kl_penalty_coef=0.1, phase1_max_tokens=26000,
    loss_fn="importance_sampling",
    adv_estimator="entropic_adaptive_beta", adv_estimator_beta=2.0,
    remove_constant_reward_groups=True, num_substeps=1, local_model_path=None,
)

misc_utils.check_log_dir(log_path, behavior_if_exists="resume")
asyncio.run(rl_main(config))
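The config above selects `adv_estimator="entropic_adaptive_beta"` with `adv_estimator_beta=2.0`. The library's actual estimator is not reproduced here. As a rough illustration of the general idea of exponentially tilted (entropic) advantages within one group of rollouts, here is a sketch that assumes a fixed β with no adaptation; the function name and the normalization are hypothetical.

```python
import math


def entropic_advantages(rewards, beta=2.0):
    """Exponentially tilted advantages for one group of rollouts.

    Each rollout gets weight w_i = exp(beta * r_i) / mean_j exp(beta * r_j),
    and its advantage is w_i - 1, so advantages average to zero per group.
    This is a generic sketch, not the TTT-Discover implementation.
    """
    if not rewards:
        return []
    # Subtract the max before exponentiating for numerical stability;
    # the shift cancels in the ratio.
    top = max(rewards)
    exps = [math.exp(beta * (r - top)) for r in rewards]
    mean_exp = sum(exps) / len(exps)
    return [e / mean_exp - 1.0 for e in exps]


if __name__ == "__main__":
    group = [0.0, 0.213, 0.281, 0.412]  # example rewards from this run
    for reward, adv in zip(group, entropic_advantages(group)):
        print(f"reward={reward:.3f} advantage={adv:+.3f}")
```

Under this scheme a group whose rollouts all earn the same reward yields all-zero advantages and contributes no gradient, which is one plausible reason to drop such groups, as `remove_constant_reward_groups=True` does. A larger β concentrates more of the update on the best rollouts in each group.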
Single-shot inference
import tinker, asyncio

SAMPLER = "tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/sampler_weights/000030"

async def ask(prompt):
    svc = tinker.ServiceClient(base_url=None)
    client = await svc.create_sampling_client_async(SAMPLER)
    resp = await client.sample_async(tinker.SampleRequest(
        model_input=tinker.ModelInput.from_text(prompt),
        sampling_params=tinker.SamplingParams(temperature=0.8, max_new_tokens=4096),
    ))
    return resp.completion_text

print(asyncio.run(ask("Write a fast Triton kernel for triangular matmul on H100.")))
Paper
Learning to Discover at Test Time
Mert Yuksekgonul*, Daniel Koceja*, Xinhao Li*, Federico Bianchi*, Jed McCaleb,
Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou†, Carlos Guestrin†, Yu Sun
Stanford · NVIDIA · Astera Institute · UC San Diego · Together AI
arXiv:2601.16175 · Project page · PDF
@article{ttt-discover2026,
title = {Learning to Discover at Test Time},
author = {Yuksekgonul, Mert and Koceja, Daniel and Li, Xinhao
and Bianchi, Federico and McCaleb, Jed and Wang, Xiaolong
and Kautz, Jan and Choi, Yejin and Zou, James
and Guestrin, Carlos and Sun, Yu},
journal = {arXiv preprint arXiv:2601.16175},
year = {2026}
}