gpu-mode-trimul — LoRA checkpoint (step 30)


LoRA adapter for gpt-oss-120b, trained with reinforcement learning on the GPU Mode TriMul competition — triangular matrix multiplication on H100.

Produced via TTT-Discover (Yuksekgonul et al., 2026):

"We perform reinforcement learning at test time, allowing the LLM to continue training with experience specific to the problem at hand … Our test-time training runs are performed using Tinker, an API by Thinking Machines, with a cost of only a few hundred dollars per problem."


What is TTT-Discover?

Instead of prompting a frozen model, as AlphaEvolve does, TTT-Discover keeps training on your specific problem at test time. The model receives a reward from real execution feedback (reward = 1500 / runtime_μs) and learns to write faster Triton kernels through trial and error, with no human-written examples needed.
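The reward formula above is a plain inverse of runtime, so halving the runtime doubles the reward. A minimal sketch, assuming runtime is measured in microseconds as stated; the function names are illustrative and are not taken from the TTT-Discover codebase:

```python
def reward_from_runtime(runtime_us: float) -> float:
    """Reward as stated in this card: 1500 divided by runtime in microseconds."""
    if runtime_us <= 0:
        raise ValueError("runtime must be positive")
    return 1500.0 / runtime_us


def runtime_from_reward(reward: float) -> float:
    """Invert the formula to recover the runtime (in microseconds) behind a reward."""
    if reward <= 0:
        raise ValueError("reward must be positive")
    return 1500.0 / reward


print(round(reward_from_runtime(3638), 3))  # 0.412
```

The inverse is handy when reading training logs, where only the reward is recorded.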

Published results (TriMul runtime per GPU, lower is better):

Method         A100 ↓    H100 ↓    B200 ↓    MI300x ↓
Best Human     4531 μs   1371 μs   1005 μs   2462 μs
TTT-Discover   2198 μs   1161 μs   905 μs    1596 μs
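To put the published numbers in relative terms, the speedup on each GPU is the human-best runtime divided by the TTT-Discover runtime. A small sketch using only the values in the table above (the dictionary and function names are illustrative):

```python
# Runtimes in microseconds, copied from the published-results table.
best_human = {"A100": 4531, "H100": 1371, "B200": 1005, "MI300x": 2462}
ttt_discover = {"A100": 2198, "H100": 1161, "B200": 905, "MI300x": 1596}


def speedups(baseline, candidate):
    """Per-GPU speedup factor: baseline runtime divided by candidate runtime."""
    return {gpu: baseline[gpu] / candidate[gpu] for gpu in baseline}


for gpu, factor in speedups(best_human, ttt_discover).items():
    print(f"{gpu}: {factor:.2f}x")
```

By this measure the gain is largest on A100 (about 2.06x) and smallest on B200 (about 1.11x).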

Verify on GPU Mode leaderboard →

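The run behind this checkpoint reached its best kernel, ~3638 μs, at step 14 (reward 0.412). The weights published here are from step 30, where the best sampled kernel ran in ~5360 μs (reward 0.280). Starting from this checkpoint instead of from scratch saves roughly 14 steps of cold-start exploration.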


Reward trajectory (this run)

Step   reward_max   reward_mean   runtime (best)
7      0.213        0.054         ~7040 μs
8      0.273        0.126         ~5490 μs
10     0.281        0.173         ~5340 μs
13     0.376        0.124         ~3990 μs
14     0.412        0.118         ~3638 μs   ← best
18     0.281        0.152         ~5340 μs
25     0.281        0.156         ~5340 μs
30     0.280        0.210         ~5360 μs

Steps 15–17 had zero reward due to an eval cluster billing limit — not a training failure.
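One way to read the trajectory is to compare where reward_max peaks with where reward_mean peaks. The sketch below does that with the rows from the table above; it is only an illustration of how to inspect the numbers, not part of the training code:

```python
# (step, reward_max, reward_mean) rows copied from the reward trajectory table.
trajectory = [
    (7, 0.213, 0.054), (8, 0.273, 0.126), (10, 0.281, 0.173), (13, 0.376, 0.124),
    (14, 0.412, 0.118), (18, 0.281, 0.152), (25, 0.281, 0.156), (30, 0.280, 0.210),
]

# Step with the single fastest kernel versus step with the best average sample.
best_max_step = max(trajectory, key=lambda row: row[1])[0]
best_mean_step = max(trajectory, key=lambda row: row[2])[0]

print("highest reward_max at step", best_max_step)    # 14
print("highest reward_mean at step", best_mean_step)  # 30
```

The fastest single kernel appears at step 14, while the average sample quality is highest at step 30, the step this checkpoint was saved at.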


Files

sampler_weights/
  adapter_model.safetensors   # LoRA weights (~5 GB) — use for inference
  adapter_config.json         # PEFT config (rank=32, target=all-linear)
  checkpoint_complete         # Completion marker
gpu_mode_trimul.ipynb         # Google Colab notebook
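After downloading the repository, the adapter settings can be checked without loading the weights. A minimal sketch, assuming adapter_config.json follows the usual PEFT LoRA layout with an `r` field for the rank and a `target_modules` field; the helper name and the local path are illustrative:

```python
import json
from pathlib import Path


def summarize_adapter_config(path):
    """Return the LoRA rank and target modules recorded in a PEFT adapter_config.json."""
    cfg = json.loads(Path(path).read_text())
    return {"rank": cfg.get("r"), "target_modules": cfg.get("target_modules")}


# Example call (assumes the files listed above were downloaded to the working directory):
# print(summarize_adapter_config("sampler_weights/adapter_config.json"))
```

For this checkpoint the summary should match the values noted in the file listing (rank 32, all-linear targets).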

Training state (for resuming RL with fresh optimizer): tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/weights/000030
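This card uses two Tinker paths for step 30: the training-state path above, ending in /weights/000030, and a sampler path ending in /sampler_weights/000030 in the inference snippet. The helper below derives one from the other. It is a sketch that assumes the naming pattern seen in this card holds in general; whether a sampler checkpoint exists at the derived path depends on how the run was saved.

```python
def sampler_path_from_train_path(train_path: str) -> str:
    """Swap the /weights/ segment for /sampler_weights/ (naming pattern assumed from this card)."""
    prefix, sep, step = train_path.rpartition("/weights/")
    if not sep or not train_path.startswith("tinker://"):
        raise ValueError(f"not a tinker training-state path: {train_path}")
    return f"{prefix}/sampler_weights/{step}"


TRAIN = "tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/weights/000030"
print(sampler_path_from_train_path(TRAIN))
```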


Quick start

Open the Colab notebook (gpu_mode_trimul.ipynb) to run TTT-Discover on TriMul, fork from this checkpoint, or plug in your own GPU kernel problem.

What you need

Service            Purpose                              Get it
Tinker             Hosts gpt-oss-120b + LoRA training   Request access
Modal              H100 GPU sandbox for kernel eval     Free tier
Weights & Biases   Run tracking                         Free account

No local GPU required — training runs on Tinker's cluster; kernel evals run on Modal H100s.
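Before launching a run, it can help to confirm that credentials for all three services are present. The check below is a sketch: the environment variable names are assumptions based on each service's usual client setup and are not specified by this card. Modal and Weights & Biases can also authenticate through local config files, so a missing variable does not necessarily mean the service is unconfigured.

```python
import os

# Assumed variable names; adjust to match how your clients are configured.
REQUIRED_ENV = {
    "Tinker": ["TINKER_API_KEY"],
    "Modal": ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
    "Weights & Biases": ["WANDB_API_KEY"],
}


def missing_credentials(env=None):
    """Map each service to the variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    report = {}
    for service, names in REQUIRED_ENV.items():
        missing = [name for name in names if not env.get(name)]
        if missing:
            report[service] = missing
    return report


print(missing_credentials())  # an empty dict means every variable is set
```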

Warm-start RL from this checkpoint

import asyncio, os
from ttt_discover.rl.train import Config, main as rl_main
from ttt_discover.tinker_utils import misc_utils
from ttt_discover.tinker_utils.dataset_builder import DatasetConfig, get_single_problem_dataset_builder
from examples.gpu_mode.env import GpuModeEnv  # from github.com/test-time-training/discover

CHECKPOINT = "tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/weights/000030"
EXPERIMENT  = "my-trimul-run"
log_path    = f"./tinker_log/{EXPERIMENT}"
os.makedirs(log_path, exist_ok=True)

dataset_builder = get_single_problem_dataset_builder(DatasetConfig(
    env_type=GpuModeEnv, problem_type="trimul",
    batch_size=4, group_size=16,
    model_name_for_tokenizer="openai/gpt-oss-120b",
    renderer_name="gpt_oss_high_reasoning",
    num_cpus_per_task=0, eval_timeout=530, log_path=log_path,
))

config = Config(
    env_type=GpuModeEnv, problem_type="trimul",
    learning_rate=4e-5, dataset_builder=dataset_builder,
    model_name="openai/gpt-oss-120b", lora_rank=32,
    wandb_project="gpu-mode", wandb_name=EXPERIMENT,
    log_path=log_path,
    load_checkpoint_path=CHECKPOINT,   # warm start ← key line
    num_epochs=20, save_every=1,
    kl_penalty_coef=0.1, phase1_max_tokens=26000,
    loss_fn="importance_sampling",
    adv_estimator="entropic_adaptive_beta", adv_estimator_beta=2.0,
    remove_constant_reward_groups=True, num_substeps=1, local_model_path=None,
)

misc_utils.check_log_dir(log_path, behavior_if_exists="resume")
asyncio.run(rl_main(config))
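The config above selects an entropic advantage estimator with beta 2.0 and drops groups whose rewards are all equal. The sketch below illustrates the general idea of an exponentially tilted advantage with a fixed beta. It is an assumption about the family of estimator, not the ttt_discover implementation, and it does not model the adaptive part of "entropic_adaptive_beta".

```python
import math


def entropic_advantages(rewards, beta=2.0):
    """Exponentially tilted advantages: w_i / mean(w) - 1 with w_i = exp(beta * r_i)."""
    top = max(rewards)
    # Subtracting the max leaves the ratios unchanged and avoids overflow.
    weights = [math.exp(beta * (r - top)) for r in rewards]
    mean_w = sum(weights) / len(weights)
    return [w / mean_w - 1.0 for w in weights]


print(entropic_advantages([0.2, 0.2, 0.2, 0.2]))  # [0.0, 0.0, 0.0, 0.0]
print(entropic_advantages([0.05, 0.12, 0.28, 0.41]))
```

Under this kind of estimator a group with constant rewards yields all-zero advantages and therefore no gradient, which is one reason to filter such groups out, as remove_constant_reward_groups=True does.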

Single-shot inference

import asyncio

import tinker
from transformers import AutoTokenizer  # assumed here for tokenizing the prompt

SAMPLER = "tinker://681a070d-2ef4-5b8c-a216-d4f22dca1efb:train:0/sampler_weights/000030"

async def ask(prompt):
    # Tinker samples from token IDs, so encode the prompt and decode the result.
    tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
    svc = tinker.ServiceClient(base_url=None)
    client = await svc.create_sampling_client_async(model_path=SAMPLER)
    resp = await client.sample_async(
        prompt=tinker.ModelInput.from_ints(tokenizer.encode(prompt)),
        num_samples=1,
        sampling_params=tinker.SamplingParams(temperature=0.8, max_tokens=4096),
    )
    return tokenizer.decode(resp.sequences[0].tokens)

print(asyncio.run(ask("Write a fast Triton kernel for triangular matmul on H100.")))
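The completion returned by the model usually mixes explanation with code, so the kernel has to be pulled out before it can be evaluated. The helper below is a sketch that assumes the model wraps code in triple-backtick fences; the evaluation harness in the TTT-Discover repository may parse completions differently.

```python
import re

FENCE = "`" * 3  # built programmatically so the example nests cleanly inside documentation


def extract_code_blocks(completion: str) -> list[str]:
    """Return the bodies of all fenced code blocks found in a model completion."""
    pattern = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)
    return [body.strip() for body in pattern.findall(completion)]


sample = "Here is the kernel:\n" + FENCE + "python\nimport triton\n" + FENCE + "\nDone."
print(extract_code_blocks(sample))  # ['import triton']
```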

Paper

Learning to Discover at Test Time
Mert Yuksekgonul*, Daniel Koceja*, Xinhao Li*, Federico Bianchi*, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou†, Carlos Guestrin†, Yu Sun
Stanford · NVIDIA · Astera Institute · UC San Diego · Together AI
arXiv:2601.16175 · Project page · PDF

@article{ttt-discover2026,
  title   = {Learning to Discover at Test Time},
  author  = {Yuksekgonul, Mert and Koceja, Daniel and Li, Xinhao
             and Bianchi, Federico and McCaleb, Jed and Wang, Xiaolong
             and Kautz, Jan and Choi, Yejin and Zou, James
             and Guestrin, Carlos and Sun, Yu},
  journal = {arXiv preprint arXiv:2601.16175},
  year    = {2026}
}

Acknowledgments

  • GPU Mode — community for GPU kernel optimization and the TriMul competition
  • Tinker — LLM training and RL infrastructure by Thinking Machines