# Qwen3-VL-4B GSPO Fine-tuned

This model is fine-tuned from Qwen3-VL-4B-Instruct using GSPO (Group Sequence Policy Optimization).
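
GSPO differs from token-level policy-gradient methods such as GRPO in that the importance ratio is computed once per sampled response, length-normalized over the whole sequence, and then clipped PPO-style against a group-normalized advantage. Below is a minimal sketch of that objective; tensor names, shapes, and the clip value are illustrative assumptions, not the actual training code.

```python
import torch

def gspo_loss(logp_new, logp_old, rewards, seq_lens, eps=3e-4):
    """Sketch of the GSPO objective for one group of G rollouts.

    logp_new, logp_old: summed per-sequence log-probs under the current
    and rollout policies, shape (G,); rewards: scalar reward per rollout,
    shape (G,); seq_lens: response length in tokens, shape (G,).
    """
    # Group-normalized advantage: one scalar per sampled response.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Length-normalized sequence-level importance ratio:
    #   s_i = (pi_new(y_i|x) / pi_old(y_i|x)) ** (1 / |y_i|)
    ratio = torch.exp((logp_new - logp_old) / seq_lens)
    # PPO-style clipping, applied per sequence rather than per token;
    # GSPO uses a much tighter clip range than token-level PPO.
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * adv, clipped * adv).mean()
```

With the rollout number of 4 used for this run, each prompt contributes a group of G = 4 responses to this objective.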

## 📊 Training Details

  • Algorithm: GSPO (Group Sequence Policy Optimization)
  • Base Model: Qwen3-VL-4B-Instruct
  • Training Data: Quinn777/merged1031_simplified_o3_easy
  • Infrastructure: 2 nodes Γ— 8 H100 GPUs (16 GPUs total)
  • Checkpoint: global_step_250

### Hyperparameters

  • Global Batch Size: 128
  • Rollout Batch Size: 256
  • Rollout Number: 4
  • Max Prompt Length: 4096
  • Max Response Length: 4096
  • Total Epochs: 10

## 🚀 Usage

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
import torch

# Qwen3-VL is a vision-language model, so it loads through the
# image-text-to-text auto classes rather than AutoModelForCausalLM.
model = AutoModelForImageTextToText.from_pretrained(
    "ttlynne/qwen3vl-4b-gspo-merged",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "ttlynne/qwen3vl-4b-gspo-merged",
    trust_remote_code=True
)

# Example text-only inference
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Solve: 2x + 5 = 13"}
    ]}
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = processor(text=[text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # temperature only takes effect when sampling
    temperature=0.7
)

# Decode only the newly generated tokens, skipping the echoed prompt.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))
```
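
Since Qwen3-VL is a vision-language model, the same processor also accepts images. A minimal multimodal sketch, reusing the `model` and `processor` from above; the image path and question are placeholders, and the message format follows the Qwen-VL chat-template conventions:

```python
from PIL import Image

image = Image.open("example.png")  # placeholder image path

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Solve the equation shown in this image."},
    ]}
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Pass the image alongside the templated text.
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```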

## 📈 Performance

(Add your evaluation results here)

πŸ™ Acknowledgements
