Instructions to use User01110/testing-50M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use User01110/testing-50M with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="User01110/testing-50M") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("User01110/testing-50M") model = AutoModelForCausalLM.from_pretrained("User01110/testing-50M") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use User01110/testing-50M with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "User01110/testing-50M" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "User01110/testing-50M", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/User01110/testing-50M
- SGLang
How to use User01110/testing-50M with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "User01110/testing-50M" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "User01110/testing-50M", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "User01110/testing-50M" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "User01110/testing-50M", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use User01110/testing-50M with Docker Model Runner:
docker model run hf.co/User01110/testing-50M
testing-50M
This is an experimental instruction SFT run from SupraLabs/Supra-1.5-50M-Base-exp.
Training Setup
| Field | Value |
|---|---|
| Base model | SupraLabs/Supra-1.5-50M-Base-exp |
| Base revision | main |
| Output repo | User01110/testing-50M |
| Sequence length | 1024 |
| Max optimizer steps | 10,000 |
| Per-device batch size | 128 |
| Gradient accumulation | 4 |
| Sample presentations per GPU | 5,120,000 |
| Max token slots per GPU | 5,242,880,000 |
| Learning rate | 2.00e-04 |
| Warmup steps | 100 |
| Weight decay | 0.05 |
| Save/push cadence | every 1,000 optimizer steps plus final |
| Loss masking | assistant-span-only from step 0 |
| Loss logging | printed loss is normalized by gradient accumulation; raw_sum is the Trainer sum over 4 microbatches |
| Gate logging | novelty score if the loaded architecture exposes last_gate; otherwise n/a |
| Prompt format | ChatML |
| System prompt | You are a helpful assistant. |
The stream randomly mixes the selected instruction, math, and coding sources. Sources are reopened after exhaustion and keep relooping until the 10,000-step training cap finishes, except Cutecat6152/python-data-basic, which is capped at 3 passes.
Listed source rows before relooping: 3,718,915. The 10,000-step training budget presents 5,120,000 examples per GPU.
Prompt Template Compatibility
The uploaded tokenizer includes the ChatML special tokens and chat template, so inference and future SFT should not require manually adding <|im_start|> or <|im_end|>.
ChatML messages are rendered as:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{ user_message }<|im_end|>
<|im_start|>assistant
This script starts from the base checkpoint, adds <|im_start|> and <|im_end|> once as tokenizer special tokens, resizes embeddings once, saves the tokenizer with chat_template, disables automatic post-processing during pretokenized SFT, and keeps/saves the model context config with max_position_embeddings >= 1024.
The base model is loaded with pinned revision main so Transformers will not silently fetch a newer remote modeling file during training.
Complete inference example:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
repo = "User01110/testing-50M"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
repo,
trust_remote_code=True,
torch_dtype="auto",
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain what a neural network is in simple terms."},
]
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
with torch.no_grad():
output = model.generate(
**inputs,
max_new_tokens=256,
do_sample=False,
temperature=0.7,
top_k=40,
top_p=0.95,
repetition_penalty=1.2,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id,
)
new_tokens = output[0, inputs["input_ids"].shape[-1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(text)
Dataset Mix
| Dataset | Config | Split | Rows | Schema | Mapping | Pass policy |
|---|---|---|---|---|---|---|
| nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content, reasoning_content}] | user/assistant message pairs; reasoning_off only | reloops until max_steps |
| microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | user=question; assistant=answer | reloops until max_steps |
| TIGER-Lab/MathInstruct | default | train | 262,039 | source, instruction, output | user=instruction; assistant=output | reloops until max_steps |
| User01110/math-curated-dataset | default | train | 50,944 | id, source, prompt, index, model, response, chatml | user=prompt; assistant=response; rebuilds clean ChatML | reloops until max_steps |
| Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | user=prompt; assistant=response | reloops until max_steps |
| Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | user=instruction; assistant=response | max 3 passes, 300 presentations max |
| flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | user=instruction plus optional Input block; assistant=output | reloops until max_steps |
| QuixiAI/open-instruct-uncensored | default | train | 1,756,115 | dataset, id, messages[{role, content}] | user/assistant message pairs | reloops until max_steps |
| openai/gsm8k | main | train | 7,473 | question, answer | user=question; assistant=answer | reloops until max_steps |
| openai/gsm8k | socratic | train | 7,473 | question, answer | user=question; assistant=answer | reloops until max_steps |
| EleutherAI/arithmetic | 10 validation subsets | validation raw JSONL | 20,000 | context, completion | user=context with trailing Answer: stripped; assistant=completion | reloops until max_steps |
Notes
- Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
- Multiturn/message datasets carry all assistant spans into the collator, so user/system text remains masked from step 0 while every assistant turn is supervised.
- Streaming source open/read failures are retried and reopened. Normal stream exhaustion reopens that source and continues mixing it until
max_steps;python-data-basicis dropped after 3 completed passes. - RoPE buffers and tokenizer/model load are verified during final export.
- Downloads last month
- 103
Model tree for User01110/testing-50M
Base model
SupraLabs/Supra-1.5-50M-Base-exp