CLI Agent: Llama 3 8B GRPO Fine-tune
A LoRA adapter fine-tuned on Meta-Llama-3-8B-Instruct using GRPO (Group Relative Policy Optimization) to generate correct Linux shell commands from natural language task descriptions.
Model Details
Model Description
- Developed by: Jose Alvarez, Carson Chiem, Prisha Bhattacharyya, Vishal Tyagi
- Model type: Causal Language Model (LoRA adapter)
- Language(s) (NLP): English
- License: Meta Llama 3 Community License
- Finetuned from model: unsloth/llama-3-8b-Instruct
Uses
Direct Use
Given a natural language description of a CLI task, the model outputs the corresponding shell command with no explanation, no markdown, and no backticks.
Example:
- Input: "Count the number of lines in /tmp/data/log.txt"
- Output: `wc -l /tmp/data/log.txt`
Out-of-Scope Use
- Not intended for general conversation
- Not suitable for tasks outside Linux CLI command generation
- Should not be used for destructive or malicious shell commands
Bias, Risks, and Limitations
- Model may generate incorrect or harmful shell commands; always review before executing
- Trained on a limited set of ~60 task types, so it may not generalize to all CLI scenarios
- Performance degrades on complex multi-step tasks
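Because the card advises reviewing every generated command before running it, a thin pre-execution filter can catch the most obviously destructive patterns. This is an illustrative sketch only; the pattern list and function names are hypothetical and not part of the released model:

```python
import re

# Hypothetical denylist: block obviously destructive commands before execution.
# The patterns are illustrative, not exhaustive -- human review is still required.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-[a-zA-Z]*f[a-zA-Z]*r)\b",  # rm -rf / rm -fr
    r"\bmkfs(\.\w+)?\b",     # filesystem formatting
    r"\bdd\b.*\bof=/dev/",   # raw writes to a device
    r">\s*/dev/sd[a-z]",     # redirecting output onto a block device
]

def is_safe(command: str) -> bool:
    """Return False if the command matches a known destructive pattern."""
    return not any(re.search(p, command) for p in DESTRUCTIVE_PATTERNS)
```

A caller would run `is_safe(command)` on the model's output and refuse to execute anything that fails the check.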
How to Get Started with the Model
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="jalva182/cli-agent-model",
    max_seq_length=512,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch the model into inference mode

messages = [
    {"role": "system", "content": "You are a CLI expert. Given a task, output exactly the shell commands required. No explanation, no markdown, no backticks."},
    {"role": "user", "content": "Count the number of lines in /tmp/data/log.txt"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
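Even with the strict system prompt, downstream code may want to normalize the model's text before executing it. The helper below is a hypothetical addition (not part of the model card's code) that strips stray code fences and backticks:

```python
def normalize_command(text: str) -> str:
    """Strip code fences, backticks, and surrounding whitespace from model output."""
    text = text.strip()
    if text.startswith("```"):
        # Drop the opening fence (with any language tag) and the closing fence.
        lines = [line for line in text.splitlines() if not line.startswith("```")]
        text = "\n".join(lines)
    return text.strip("` \n")

print(normalize_command("```bash\nwc -l /tmp/data/log.txt\n```"))
# wc -l /tmp/data/log.txt
```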
Training Details
Training Data
60 validated CLI tasks covering file operations, text processing (grep, awk, sed), sorting, archives, system info, permissions, and environment variables. Each task includes setup commands, expected output, and a reward function for GRPO training.
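The dataset schema itself is not published; inferring from the description above, one plausible record shape (hypothetical field names and values) looks like this:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CLITask:
    # Hypothetical schema inferred from the card's description,
    # not the actual dataset format.
    prompt: str                         # natural-language task description
    setup: List[str]                    # shell commands that prepare the environment
    expected_output: str                # output of the reference solution
    reward_fn: Callable[[str], float]   # scores a generated command for GRPO

task = CLITask(
    prompt="Count the number of lines in /tmp/data/log.txt",
    setup=["mkdir -p /tmp/data", "printf 'a\\nb\\nc\\n' > /tmp/data/log.txt"],
    expected_output="3 /tmp/data/log.txt",
    reward_fn=lambda cmd: 5.0 if "wc -l" in cmd else -2.0,  # toy placeholder
)
```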
Training Hyperparameters
- Training regime: bf16 mixed precision
- Method: GRPO (Group Relative Policy Optimization)
- Learning rate: 3e-6 with linear scheduler
- Warmup ratio: 0.1
- Batch size: 2 (per device)
- Gradient accumulation steps: 2
- Total steps: 10000
- LoRA rank: 32, alpha: 64
- KL coefficient: 0.05
- Number of generations: 4
- Max sequence length: 512
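Assuming training used trl's `GRPOTrainer` (which the Software section suggests), the hyperparameters above map onto a `GRPOConfig` roughly as in this sketch; the `output_dir` and the exact mix of arguments are assumptions, since the training script is not published:

```python
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="cli-agent-grpo",        # assumption: any local path
    learning_rate=3e-6,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    max_steps=10_000,
    num_generations=4,                  # completions sampled per prompt for GRPO
    beta=0.05,                          # KL coefficient
    max_completion_length=512,          # per the card's 512 max sequence length
    bf16=True,
)
```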
Speeds, Sizes, Times
- Training time: ~3h 13min
- Checkpoint size: ~524MB (LoRA adapter only)
- Final train loss: 0.0141
- Final reward: 8.0/8.0 on easy tasks, ~6.0 average
Evaluation
Metrics
Reward function scoring up to 8 points per task:
- +5 for correct output match
- +3 for command success with partial match
- -2 for command failure or wrong output
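One reading of this rubric consistent with the reported maximum of 8.0 (+5 for the exact match plus +3 for the successful run) can be sketched as a reward function. The actual reward code is not published, so the function name, the additive interpretation, and the partial-match test are all assumptions:

```python
import subprocess

def score_command(command: str, expected_output: str, timeout: float = 5.0) -> float:
    """Hypothetical reconstruction of the card's rubric:
    exact match on a successful run -> 8.0 (+5 match, +3 success),
    successful run with partial match -> 3.0, anything else -> -2.0."""
    try:
        result = subprocess.run(
            ["bash", "-c", command],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return -2.0          # treat a hang as a failure
    if result.returncode != 0:
        return -2.0          # command failed
    out = result.stdout.strip()
    expected = expected_output.strip()
    if out == expected:
        return 8.0           # +5 exact output match, +3 successful run
    if expected and expected in out:
        return 3.0           # ran successfully with a partial match
    return -2.0              # wrong output
```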
Results
- Best reward: 8.0
- Average reward (final steps): ~6.0
- Train loss: 0.0141
Environmental Impact
- Hardware Type: H100 SXM 80GB
- Hours used: ~3.5
- Cloud Provider: Vast.ai
Technical Specifications
Model Architecture
- Base: Meta-Llama-3-8B-Instruct
- Adapter: LoRA (rank=32, alpha=64, dropout=0.05)
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
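These adapter settings correspond to a PEFT `LoraConfig` along the following lines (a sketch reconstructed from the numbers above, not the actual training script):

```python
from peft import LoraConfig

# Sketch of the adapter configuration implied by the card's numbers.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```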
Software
- unsloth 2026.3.3
- trl 0.24.0
- transformers 4.56.1
- torch 2.6.0+cu124
- PEFT 0.18.1
Model Card Authors
Jose Alvarez
Model Card Contact
https://github.com/Alvarez-Jose/unsloth-grpo-project