Instructions for using User01110/testing-50M with libraries and local apps.

Libraries

How to use User01110/testing-50M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="User01110/testing-50M")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("User01110/testing-50M")
model = AutoModelForCausalLM.from_pretrained("User01110/testing-50M")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use User01110/testing-50M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "User01110/testing-50M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "User01110/testing-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
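The same request can be sent from Python using only the standard library. This is a minimal sketch. It assumes the vLLM server started above is listening on `localhost:8000` and returns the standard OpenAI chat-completions response shape. `build_request`, `extract_reply`, and `ask` are hypothetical helper names. For the SGLang server below, change the port to 30000.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed vLLM default port
MODEL = "User01110/testing-50M"


def build_request(prompt: str, url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat-completions response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def ask(prompt: str, url: str = BASE_URL) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=60) as resp:
        return extract_reply(resp.read())
```

While the server is running, `print(ask("What is the capital of France?"))` sends the same request as the curl example. Building the request separately from sending it makes it easy to inspect the payload without starting a server.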

Use Docker

docker model run hf.co/User01110/testing-50M

SGLang

How to use User01110/testing-50M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "User01110/testing-50M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "User01110/testing-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "User01110/testing-50M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "User01110/testing-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use User01110/testing-50M with Docker Model Runner:
```
docker model run hf.co/User01110/testing-50M
```

testing-50M

This is an experimental instruction-tuning (SFT) run on top of SupraLabs/Supra-1.5-50M-Base-exp.

Training Setup

Field	Value
Base model	`SupraLabs/Supra-1.5-50M-Base-exp`
Base revision	`main`
Output repo	`User01110/testing-50M`
Sequence length	1024
Max optimizer steps	10,000
Per-device batch size	128
Gradient accumulation	4
Sample presentations per GPU	5,120,000
Max token slots per GPU	5,242,880,000
Learning rate	2.00e-04
Warmup steps	100
Weight decay	0.05
Save/push cadence	every 1,000 optimizer steps plus final
Loss masking	assistant-span-only from step 0
Loss logging	printed `loss` is normalized by gradient accumulation; `raw_sum` is the Trainer sum over 4 microbatches
Gate logging	novelty score if the loaded architecture exposes `last_gate`; otherwise `n/a`
Prompt format	ChatML
System prompt	`You are a helpful assistant.`
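The derived rows of the table follow directly from the batch settings. The sketch below recomputes them. The `printed_loss` helper encodes one reading of the loss-logging row (printed loss = `raw_sum` divided by the 4 accumulation steps). It is a hypothetical helper and an assumption, not the trainer's code.

```python
# Hyperparameters copied from the Training Setup table.
PER_DEVICE_BATCH = 128
GRAD_ACCUM = 4
MAX_STEPS = 10_000
SEQ_LEN = 1024

# One optimizer step consumes PER_DEVICE_BATCH * GRAD_ACCUM examples on each GPU.
examples_per_step = PER_DEVICE_BATCH * GRAD_ACCUM
presentations_per_gpu = examples_per_step * MAX_STEPS
# "Token slots" = capacity if every example filled all SEQ_LEN positions.
token_slots_per_gpu = presentations_per_gpu * SEQ_LEN


def printed_loss(raw_sum: float, grad_accum: int = GRAD_ACCUM) -> float:
    """Assumed relation between the two logged values: raw_sum / accumulation steps."""
    return raw_sum / grad_accum


print(f"examples per optimizer step: {examples_per_step:,}")
print(f"sample presentations per GPU: {presentations_per_gpu:,}")
print(f"max token slots per GPU: {token_slots_per_gpu:,}")
```

The three printed values are 512, 5,120,000, and 5,242,880,000, matching the table.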

The training stream randomly mixes the selected instruction, math, and coding sources. Each source is reopened when it is exhausted and keeps relooping until the 10,000-step cap is reached. The exception is Cutecat6152/python-data-basic, which is capped at 3 passes.

The listed sources contain 3,718,915 rows before any relooping. The 10,000-step budget presents 5,120,000 examples per GPU (128 per device × 4 accumulation steps × 10,000 steps).
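The relooping policy can be sketched as a generator. This is a minimal illustration, not the training script. `mixed_stream` and its arguments are hypothetical, and the uniform choice among still-active sources is an assumption, because the card does not state the mixing weights.

```python
import random
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator


def mixed_stream(
    openers: Dict[str, Callable[[], Iterable[Any]]],
    max_passes: Dict[str, int],
    seed: int = 0,
) -> Iterator[Any]:
    """Randomly interleave sources, reopening each one when it is exhausted.

    openers maps a source name to a function that opens a fresh stream.
    Sources named in max_passes are dropped after that many complete passes;
    all others reloop indefinitely. Assumes every source yields at least one item.
    """
    rng = random.Random(seed)
    active = {name: iter(open_fn()) for name, open_fn in openers.items()}
    passes = {name: 0 for name in openers}
    while active:
        name = rng.choice(sorted(active))  # assumption: uniform over active sources
        try:
            yield next(active[name])
        except StopIteration:
            passes[name] += 1
            cap = max_passes.get(name)
            if cap is not None and passes[name] >= cap:
                del active[name]  # e.g. python-data-basic after 3 passes
            else:
                active[name] = iter(openers[name]())  # reopen and keep mixing


# Toy usage: "tiny" plays the role of a capped source, "big" reloops forever.
sources = {"big": lambda: range(10), "tiny": lambda: ["x", "y"]}
sample = list(islice(mixed_stream(sources, {"tiny": 3}, seed=1), 500))
print(sample.count("x"), sample.count("y"))
```

Uncapped sources never run out, so the stream is infinite. The caller's step budget (here `itertools.islice`) ends it, much like the 10,000-step cap described above.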

Prompt Template Compatibility

The uploaded tokenizer includes the ChatML special tokens and chat template, so inference and future SFT should not require manually adding <|im_start|> or <|im_end|>.

ChatML messages are rendered as:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{ user_message }<|im_end|>
<|im_start|>assistant
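For debugging, the layout above can be reproduced by hand. This is a minimal sketch with a hypothetical `render_chatml` helper. The exact whitespace and the automatic insertion of a system turn are assumptions. The tokenizer's own `chat_template` (via `apply_chat_template`) remains the source of truth.

```python
from typing import Dict, List

DEFAULT_SYSTEM = "You are a helpful assistant."  # system prompt used in training


def render_chatml(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
    system_prompt: str = DEFAULT_SYSTEM,
) -> str:
    """Render messages in the ChatML layout shown above (hypothetical helper).

    Assumption: a system turn is prepended when the caller does not supply one.
    """
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": system_prompt}] + messages
    text = "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # the model continues from here
    return text


print(render_chatml([{"role": "user", "content": "What is 2 + 2?"}]))
```

Comparing this string with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to confirm that a prompt matches the training format.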

The training script starts from the base checkpoint and adds <|im_start|> and <|im_end|> to the tokenizer as special tokens exactly once. It then resizes the embeddings once to match, saves the tokenizer together with its chat_template, and disables the tokenizer's automatic post-processing during pretokenized SFT. Finally, it keeps and saves a model config whose max_position_embeddings is at least 1024.

The base model is loaded with pinned revision main so Transformers will not silently fetch a newer remote modeling file during training.

Complete inference example:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "User01110/testing-50M"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a neural network is in simple terms."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

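# Fall back to EOS for padding if the tokenizer defines no pad token.
pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding; temperature/top_k/top_p have no effect here
        # To sample instead: do_sample=True, temperature=0.7, top_k=40, top_p=0.95,
        repetition_penalty=1.2,
        pad_token_id=pad_id,
        eos_token_id=tokenizer.eos_token_id,
    )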

new_tokens = output[0, inputs["input_ids"].shape[-1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(text)

Dataset Mix

Dataset	Config	Split	Rows	Schema	Mapping	Pass policy
nvidia/Nemotron-SFT-Instruction-Following-Chat-v2	default	reasoning_off	1,068,273	messages[{role, content, reasoning_content}]	user/assistant message pairs; reasoning_off only	reloops until max_steps
microsoft/orca-math-word-problems-200k	default	train	200,035	question, answer	user=question; assistant=answer	reloops until max_steps
TIGER-Lab/MathInstruct	default	train	262,039	source, instruction, output	user=instruction; assistant=output	reloops until max_steps
User01110/math-curated-dataset	default	train	50,944	id, source, prompt, index, model, response, chatml	user=prompt; assistant=response; rebuilds clean ChatML	reloops until max_steps
Programming-Language/codeagent-python	default	train	296,837	prompt, response	user=prompt; assistant=response	reloops until max_steps
Cutecat6152/python-data-basic	default	train	100	id, instruction, response	user=instruction; assistant=response	max 3 passes, 300 presentations max
flytech/python-codes-25k	default	train	49,626	instruction, input, output, text	user=instruction plus optional Input block; assistant=output	reloops until max_steps
QuixiAI/open-instruct-uncensored	default	train	1,756,115	dataset, id, messages[{role, content}]	user/assistant message pairs	reloops until max_steps
openai/gsm8k	main	train	7,473	question, answer	user=question; assistant=answer	reloops until max_steps
openai/gsm8k	socratic	train	7,473	question, answer	user=question; assistant=answer	reloops until max_steps
EleutherAI/arithmetic	10 validation subsets	validation raw JSONL	20,000	context, completion	user=context with trailing Answer: stripped; assistant=completion	reloops until max_steps
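The Mapping column can be read as small functions that turn a row into messages. The sketch below shows three of them, with hypothetical names. The exact text of the flytech Input block and the whitespace handling for EleutherAI/arithmetic are assumptions. The row total reproduces the 3,718,915 figure quoted above.

```python
from typing import Dict, List

Message = Dict[str, str]

# Row counts from the Dataset Mix table (before relooping).
ROW_COUNTS = {
    "nvidia/Nemotron-SFT-Instruction-Following-Chat-v2": 1_068_273,
    "microsoft/orca-math-word-problems-200k": 200_035,
    "TIGER-Lab/MathInstruct": 262_039,
    "User01110/math-curated-dataset": 50_944,
    "Programming-Language/codeagent-python": 296_837,
    "Cutecat6152/python-data-basic": 100,
    "flytech/python-codes-25k": 49_626,
    "QuixiAI/open-instruct-uncensored": 1_756_115,
    "openai/gsm8k (main)": 7_473,
    "openai/gsm8k (socratic)": 7_473,
    "EleutherAI/arithmetic": 20_000,
}


def map_question_answer(row: dict, user_key: str, assistant_key: str) -> List[Message]:
    """Generic single-turn mapping, e.g. gsm8k (question/answer) or MathInstruct (instruction/output)."""
    return [
        {"role": "user", "content": row[user_key]},
        {"role": "assistant", "content": row[assistant_key]},
    ]


def map_flytech(row: dict) -> List[Message]:
    """user = instruction plus an optional Input block (the block layout is an assumption)."""
    user = row["instruction"]
    if row.get("input"):
        user += "\n\nInput:\n" + row["input"]
    return [{"role": "user", "content": user}, {"role": "assistant", "content": row["output"]}]


def map_arithmetic(row: dict) -> List[Message]:
    """user = context with the trailing 'Answer:' stripped; assistant = completion."""
    context = row["context"].rstrip()
    if context.endswith("Answer:"):
        context = context[: -len("Answer:")].rstrip()
    # Assumption: surrounding whitespace on the completion is dropped.
    return [
        {"role": "user", "content": context},
        {"role": "assistant", "content": row["completion"].strip()},
    ]


print(f"total listed rows: {sum(ROW_COUNTS.values()):,}")
```

The same table also explains the python-data-basic cap: 100 rows × 3 passes gives the 300 presentations listed in its Pass policy column.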

Notes

Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
Multi-turn (messages-format) datasets pass every assistant span to the collator. User and system text therefore stays masked from step 0, while every assistant turn is supervised.
Failures when opening or reading a streaming source are retried, and the source is reopened. Normal exhaustion of a stream also reopens that source, which keeps being mixed in until max_steps; python-data-basic is dropped after 3 completed passes.
RoPE buffers and tokenizer/model loading are verified during final export.
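Assistant-span-only loss masking can be illustrated with plain token-id lists. This is a minimal sketch, not the card's collator. `build_labels` is a hypothetical helper, and the choice of which delimiter tokens belong to an assistant span is an assumption that depends on how the segments are split.

```python
from typing import List, Optional, Sequence, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(
    segments: Sequence[Tuple[str, Sequence[int]]],
    max_len: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Concatenate (role, token_ids) segments and mask every non-assistant token.

    Every assistant segment keeps its token ids as labels, so all assistant
    turns in a multi-turn conversation are supervised; system and user tokens
    get IGNORE_INDEX and contribute no loss.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    if max_len is not None:  # e.g. the 1024-token sequence length
        input_ids, labels = input_ids[:max_len], labels[:max_len]
    return input_ids, labels


ids, labels = build_labels(
    [("system", [1, 2]), ("user", [3]), ("assistant", [4, 5]), ("user", [6]), ("assistant", [7])]
)
print(ids)     # [1, 2, 3, 4, 5, 6, 7]
print(labels)  # [-100, -100, -100, 4, 5, -100, 7]
```

Labels stay aligned with `input_ids` here because standard Transformers causal-LM heads shift labels internally when computing the loss.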


Safetensors

Model size: 51.8M params

Tensor type: F32
