gpu-mode-trimul — LoRA checkpoint (step 30)

LoRA adapter for gpt-oss-120b, trained with reinforcement learning on the GPU Mode TriMul competition — triangular matrix multiplication on H100.

Produced via TTT-Discover (Yuksekgonul et al., 2026):

"We perform reinforcement learning at test time, allowing the LLM to continue training with experience specific to the problem at hand … Our test-time training runs are performed using Tinker, an API by Thinking Machines, with a cost of only a few hundred dollars per problem."

What is TTT-Discover?

Instead of prompting a frozen model, as AlphaEvolve does, TTT-Discover keeps training on your specific problem at test time. The model receives a reward from real execution feedback (reward = 1500 / runtime_μs, so a faster kernel scores higher) and learns to write faster Triton kernels through trial and error, with no human-written examples.
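The reward formula above can be made concrete with a short sketch. The function and constant names are illustrative and not taken from the TTT-Discover codebase; only the relation reward = 1500 / runtime_μs comes from this card.

```python
# Illustrative helpers; only reward = 1500 / runtime_us comes from this card.
REWARD_SCALE_US = 1500.0


def reward_from_runtime(runtime_us: float) -> float:
    """Reward for a kernel that ran in `runtime_us` microseconds (faster means higher)."""
    if runtime_us <= 0:
        raise ValueError("runtime must be positive")
    return REWARD_SCALE_US / runtime_us


def runtime_from_reward(reward: float) -> float:
    """Invert the formula to recover the runtime in microseconds."""
    if reward <= 0:
        raise ValueError("reward must be positive")
    return REWARD_SCALE_US / reward


print(round(reward_from_runtime(3638), 3))  # 0.412
print(round(runtime_from_reward(0.281)))    # 5338
```

Because the reward columns on this card are rounded to three decimals, inverting them gives runtimes that match the listed values only approximately.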

Published results (TriMul runtime per GPU, lower is better):

Method	A100	H100	B200	MI300x
Best Human	4531 μs	1371 μs	1005 μs	2462 μs
TTT-Discover	2198 μs	1161 μs	905 μs	1596 μs

Verify on GPU Mode leaderboard →
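To read the table above as speedups rather than raw runtimes, the ratio of the two rows can be computed directly. This is plain arithmetic on the published numbers; the variable names are illustrative.

```python
# Runtimes in microseconds, copied from the published results table.
best_human = {"A100": 4531, "H100": 1371, "B200": 1005, "MI300x": 2462}
ttt_discover = {"A100": 2198, "H100": 1161, "B200": 905, "MI300x": 1596}

# Speedup = human runtime / TTT-Discover runtime (values above 1 mean TTT-Discover is faster).
speedup = {gpu: best_human[gpu] / ttt_discover[gpu] for gpu in best_human}

for gpu, factor in speedup.items():
    print(f"{gpu}: {factor:.2f}x")
# A100: 2.06x
# H100: 1.18x
# B200: 1.11x
# MI300x: 1.54x
```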

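The run that produced this checkpoint reached ~3638 μs at step 14 (best reward 0.412). The weights published here are from step 30, where the best sample ran in ~5360 μs (reward 0.280). Starting from this checkpoint instead of from scratch saves roughly 14 steps of cold-start exploration.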

Reward trajectory (this run)

Step	reward_max	reward_mean	runtime (best)
7	0.213	0.054	~7040 μs
8	0.273	0.126	~5490 μs
10	0.281	0.173	~5340 μs
13	0.376	0.124	~3990 μs
14	0.412	0.118	~3638 μs ← best
18	0.281	0.152	~5340 μs
25	0.281	0.156	~5340 μs
30	0.280	0.210	~5360 μs

Steps 15–17 had zero reward due to an eval cluster billing limit — not a training failure.
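A small sketch for reading the trajectory: it skips the outage steps, finds the peak, and compares the final checkpoint with it. The reward_max values are copied from the table; entering steps 15 to 17 as 0.0 follows the note above and is otherwise an assumption about how they were logged.

```python
# (step, reward_max) pairs from the table; steps 15-17 are the zero-reward
# evaluation outage and are entered here as 0.0.
trajectory = [
    (7, 0.213), (8, 0.273), (10, 0.281), (13, 0.376), (14, 0.412),
    (15, 0.0), (16, 0.0), (17, 0.0),
    (18, 0.281), (25, 0.281), (30, 0.280),
]

# Ignore steps where evaluation did not run, so they do not look like regressions.
valid = [(step, reward) for step, reward in trajectory if reward > 0]

best_step, best_reward = max(valid, key=lambda pair: pair[1])
final_step, final_reward = valid[-1]

print(best_step, best_reward)                 # 14 0.412
print(round(final_reward / best_reward, 2))   # 0.68
```

The ratio shows that the best sample at step 30 earns about two thirds of the peak reward seen at step 14.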

Files

sampler_weights/
  adapter_model.safetensors   # LoRA weights (~5 GB) — use for inference
  adapter_config.json         # PEFT config (rank=32, target=all-linear)
  checkpoint_complete         # Completion marker
gpu_mode_trimul.ipynb         # Google Colab notebook
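To check the adapter settings without downloading the ~5 GB weights, the config file alone can be fetched and inspected. This is a sketch under assumptions: the repo id is taken from this page, the file is assumed to sit under sampler_weights/ as in the listing above, and `r` and `target_modules` are the standard PEFT LoRA config keys. The helper names are hypothetical.

```python
import json


def summarize_adapter_config(cfg: dict) -> str:
    """One-line summary of the two PEFT fields this card mentions."""
    return f"rank={cfg.get('r')}, target={cfg.get('target_modules')}"


def load_adapter_config(repo_id: str = "Pran-Ker/gpu-mode-trimul") -> dict:
    """Download and parse adapter_config.json (needs network access)."""
    # Imported lazily so the summary helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id=repo_id,
        filename="sampler_weights/adapter_config.json",  # assumed location in the repo
    )
    with open(path) as f:
        return json.load(f)


# Offline demonstration with the values stated in the file listing.
print(summarize_adapter_config({"r": 32, "target_modules": "all-linear"}))
# rank=32, target=all-linear
```

With network access, `summarize_adapter_config(load_adapter_config())` should report the values stored in the repository.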

Training state (for resuming RL with fresh optimizer): tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/weights/000030

Quick start

Open the notebook above to run TTT-Discover on TriMul, fork from this checkpoint, or plug in your own GPU kernel problem.

What you need

Service	Purpose	Get it
Tinker	Hosts `gpt-oss-120b` + LoRA training	Request access
Modal	H100 GPU sandbox for kernel eval	Free tier
Weights & Biases	Run tracking	Free account

No local GPU required — training runs on Tinker's cluster; kernel evals run on Modal H100s.
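Before launching a run, it can help to confirm that credentials for all three services are visible to the process. The sketch below assumes the environment variable names that each SDK conventionally reads (`TINKER_API_KEY`, `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET`, `WANDB_API_KEY`); these names are not stated on this card, so check each service's documentation. Modal and Weights & Biases can also be configured through their CLI login commands, in which case the variables may be unset even though access works.

```python
import os

# Assumed variable names; adjust if your setup stores credentials differently.
REQUIRED_ENV = {
    "Tinker": ["TINKER_API_KEY"],
    "Modal": ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
    "Weights & Biases": ["WANDB_API_KEY"],
}


def missing_credentials(env=os.environ) -> dict:
    """Return {service: [unset variable names]} for services not fully configured."""
    missing = {}
    for service, names in REQUIRED_ENV.items():
        unset = [name for name in names if not env.get(name)]
        if unset:
            missing[service] = unset
    return missing


for service, names in missing_credentials().items():
    print(f"{service}: set {', '.join(names)}")
```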

Warm-start RL from this checkpoint

import asyncio
import os

from ttt_discover.rl.train import Config, main as rl_main
from ttt_discover.tinker_utils import misc_utils
from ttt_discover.tinker_utils.dataset_builder import (
    DatasetConfig,
    get_single_problem_dataset_builder,
)
from examples.gpu_mode.env import GpuModeEnv  # from github.com/test-time-training/discover

CHECKPOINT = "tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/weights/000030"
EXPERIMENT  = "my-trimul-run"
log_path    = f"./tinker_log/{EXPERIMENT}"
os.makedirs(log_path, exist_ok=True)

dataset_builder = get_single_problem_dataset_builder(DatasetConfig(
    env_type=GpuModeEnv, problem_type="trimul",
    batch_size=4, group_size=16,
    model_name_for_tokenizer="openai/gpt-oss-120b",
    renderer_name="gpt_oss_high_reasoning",
    num_cpus_per_task=0, eval_timeout=530, log_path=log_path,
))

config = Config(
    env_type=GpuModeEnv, problem_type="trimul",
    learning_rate=4e-5, dataset_builder=dataset_builder,
    model_name="openai/gpt-oss-120b", lora_rank=32,
    wandb_project="gpu-mode", wandb_name=EXPERIMENT,
    log_path=log_path,
    load_checkpoint_path=CHECKPOINT,   # warm start ← key line
    num_epochs=20, save_every=1,
    kl_penalty_coef=0.1, phase1_max_tokens=26000,
    loss_fn="importance_sampling",
    adv_estimator="entropic_adaptive_beta", adv_estimator_beta=2.0,
    remove_constant_reward_groups=True, num_substeps=1, local_model_path=None,
)

misc_utils.check_log_dir(log_path, behavior_if_exists="resume")
asyncio.run(rl_main(config))
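Two options in the config above, `adv_estimator="entropic_adaptive_beta"` with `adv_estimator_beta=2.0` and `remove_constant_reward_groups=True`, are easier to reason about with a toy example. The sketch below shows one common form of an entropic (exponentially tilted) advantage with a fixed beta. It is not the `ttt_discover` implementation, the function names are hypothetical, and the adaptive part of the estimator is not reproduced.

```python
import math


def entropic_advantages(rewards, beta=2.0):
    """Exponentially tilted weights, centred so they average to zero within the group."""
    top = max(rewards)
    # Shifting by the maximum keeps exp() numerically stable without changing the ratios.
    weights = [math.exp(beta * (r - top)) for r in rewards]
    mean_w = sum(weights) / len(weights)
    return [w / mean_w - 1.0 for w in weights]


def drop_constant_groups(groups):
    """Keep only groups whose rewards differ; a constant group gives all-zero advantages."""
    return [g for g in groups if max(g) > min(g)]


# Two toy groups of rollouts: one where every kernel failed, one with mixed rewards.
groups = [[0.0, 0.0, 0.0, 0.0], [0.05, 0.21, 0.0, 0.41]]
for rewards in drop_constant_groups(groups):
    print([round(a, 3) for a in entropic_advantages(rewards)])
```

A larger beta concentrates the positive advantage on the best rollout in each group, which suits a setting where only the single fastest kernel matters.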

Single-shot inference

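import asyncio

import tinker
from tinker import types
from transformers import AutoTokenizer

SAMPLER = "tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/sampler_weights/000030"

# Tinker samples from token IDs, so the prompt is tokenized locally.
# Call names follow the public Tinker SDK docs; check them against your installed version.
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")

async def ask(prompt: str) -> str:
    svc = tinker.ServiceClient()
    client = await svc.create_sampling_client_async(model_path=SAMPLER)
    resp = await client.sample_async(
        prompt=types.ModelInput.from_ints(tokenizer.encode(prompt)),
        sampling_params=types.SamplingParams(temperature=0.8, max_tokens=4096),
        num_samples=1,
    )
    # Decode the first (and only) sampled sequence back to text.
    return tokenizer.decode(resp.sequences[0].tokens)

print(asyncio.run(ask("Write a fast Triton kernel for triangular matmul on H100.")))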

Paper

Learning to Discover at Test Time
Mert Yuksekgonul*, Daniel Koceja*, Xinhao Li*, Federico Bianchi*, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou†, Carlos Guestrin†, Yu Sun
Stanford · NVIDIA · Astera Institute · UC San Diego · Together AI
arXiv:2601.16175 · Project page · PDF

@article{ttt-discover2026,
  title   = {Learning to Discover at Test Time},
  author  = {Yuksekgonul, Mert and Koceja, Daniel and Li, Xinhao
             and Bianchi, Federico and McCaleb, Jed and Wang, Xiaolong
             and Kautz, Jan and Choi, Yejin and Zou, James
             and Guestrin, Carlos and Sun, Yu},
  journal = {arXiv preprint arXiv:2601.16175},
  year    = {2026}
}

Acknowledgments

GPU Mode — community for GPU kernel optimization and the TriMul competition
Tinker — LLM training and RL infrastructure by Thinking Machines

Paper for Pran-Ker/gpu-mode-trimul

Learning to Discover at Test Time

Paper • 2601.16175 • Published Jan 22 • 45