Instructions for using charioteer/Neural-phi2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use charioteer/Neural-phi2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="charioteer/Neural-phi2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("charioteer/Neural-phi2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("charioteer/Neural-phi2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use charioteer/Neural-phi2 with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "charioteer/Neural-phi2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "charioteer/Neural-phi2",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker:
```shell
docker model run hf.co/charioteer/Neural-phi2
```
- SGLang
How to use charioteer/Neural-phi2 with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "charioteer/Neural-phi2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "charioteer/Neural-phi2",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "charioteer/Neural-phi2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "charioteer/Neural-phi2",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use charioteer/Neural-phi2 with Docker Model Runner:
```shell
docker model run hf.co/charioteer/Neural-phi2
```
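Both the vLLM and SGLang servers above expose an OpenAI-compatible chat completions endpoint, so the curl calls can also be made from Python. A minimal stdlib-only sketch (the host and port match the vLLM example; use port 30000 for the SGLang server):

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages):
    """Build an OpenAI-compatible chat completions request."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000",
    "charioteer/Neural-phi2",
    [{"role": "user", "content": "What is the capital of France?"}],
)

# With a server running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```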
Model Card: Neural-phi2
Model Details
- Model Name: Neural-phi2
- Model Type: Large Language Model (LLM)
- Model Architecture: A finetuned version of the Phi2 model from Microsoft, trained with Direct Preference Optimization (DPO) on the `distilabel-intel-orca-dpo-pairs` dataset.
- Model Size: Approximately 2.7B parameters
- Training Data: The model was finetuned on the `distilabel-intel-orca-dpo-pairs` dataset, which consists of chat-like prompts with preferred and rejected responses.
- Training Procedure: The Phi2 model was finetuned using the DPO technique. The training process involved:
  - Loading and formatting the `distilabel-intel-orca-dpo-pairs` dataset
  - Defining the training configuration, including batch size, learning rate, and number of steps
  - Initializing the DPO Trainer and training the model
  - Saving the finetuned model and tokenizer
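The loading-and-formatting step above can be sketched in plain Python. The column names (`system`, `input`, `chosen`, `rejected`) are assumptions based on the argilla/distilabel-intel-orca-dpo-pairs dataset card and should be verified against the actual split; the ChatML markers are likewise assumed from the tokenizer's chat template.

```python
def format_dpo_row(row):
    """Turn one preference-pair row into the prompt/chosen/rejected
    triple that trl's DPOTrainer expects (assumed schema)."""
    system = row.get("system", "") or "You are a helpful assistant."
    prompt = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{row['input']}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }

# Example row mimicking the assumed dataset structure:
row = {
    "system": "",
    "input": "What is 2 + 2?",
    "chosen": "2 + 2 equals 4.",
    "rejected": "2 + 2 equals 5.",
}
print(format_dpo_row(row)["prompt"])
```

With the `datasets` library this function would typically be applied via `dataset.map(format_dpo_row)` before handing the result to the trainer.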
Training Parameters
This section outlines the key training parameters used to finetune the Phi2 model from Microsoft using the Direct Preference Optimization (DPO) technique on the distilabel-intel-orca-dpo-pairs dataset, resulting in the Neural-phi2 model.
- SFT Model Name: `phi2-sft-alpaca_loraemb-right-pad`
- New Model Name: `Neural-phi2-v2`
- Dataset: `argilla/distilabel-intel-orca-dpo-pairs`
- Tokenizer: Custom tokenizer created from the `phi2-sft-alpaca_loraemb-right-pad` model
- Quantization Config: `load_in_4bit=True`, `bnb_4bit_quant_type="nf4"`, `bnb_4bit_compute_dtype=torch.float16`
- LoRA Config: `r=16`, `lora_alpha=64`, `lora_dropout=0.05`, `bias="none"`, `task_type="CAUSAL_LM"`, `target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"]`
- Training Arguments: `per_device_train_batch_size=1`, `gradient_accumulation_steps=8`, `gradient_checkpointing=True`, `learning_rate=5e-7`, `lr_scheduler_type="linear"`, `max_steps=500`, `optim="paged_adamw_32bit"`, `warmup_steps=100`, `bf16=True`, `report_to="wandb"`
- DPO Trainer: `loss_type="sigmoid"`, `beta=0.1`, `max_prompt_length=768`, `max_length=1024`
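The parameters above map onto the usual bitsandbytes/peft/transformers configuration objects roughly as follows. This is an illustrative sketch, not the authors' actual training script: exact class names and argument placement vary between trl versions (e.g. newer trl moves `beta` into a `DPOConfig`), and `output_dir` is an assumed value.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization, as listed under "Quantization Config"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter settings, as listed under "LoRA Config"
peft_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
)

# Optimizer/schedule settings, as listed under "Training Arguments"
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    learning_rate=5e-7,
    lr_scheduler_type="linear",
    max_steps=500,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
    output_dir="Neural-phi2-v2",  # assumed
)

# trainer = DPOTrainer(model, args=training_args, beta=0.1, ...)  # see trl docs
```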
Intended Use
The Neural-phi2 model is intended to be used as a general-purpose language model for a variety of natural language processing tasks, such as text generation, summarization, and question answering. It may be particularly useful in applications where the model needs to generate coherent and contextually appropriate responses, such as in chatbots or virtual assistants.
Sample Inference Code
```python
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Neural-phi2 model and tokenizer
model = AutoModelForCausalLM.from_pretrained("charioteer/Neural-phi2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("charioteer/Neural-phi2", trust_remote_code=True)

# Define a sample prompt
messages = [
    {"role": "system", "content": "You are a helpful chatbot assistant."},
    {"role": "user", "content": "Hello, how are you today?"},
]

# Format the prompt in ChatML format
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Create a pipeline and generate a response
pipeline = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)
output = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_new_tokens=100,
)

# Print the generated response
print(output[0]["generated_text"])
```
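`apply_chat_template` handles the formatting above, but for reference, a ChatML-style prompt (the format the tokenizer is assumed to use, per the comment in the sample code) can be rendered by hand:

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a message list in ChatML-style markup (assumed template)."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        prompt += "<|im_start|>assistant\n"
    return prompt

print(to_chatml([{"role": "user", "content": "Hello!"}]))
```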
Limitations and Biases
As with any large language model, the Neural-phi2 model may exhibit biases present in its training data, such as societal biases or factual inaccuracies. Additionally, the model's performance may degrade for tasks or inputs that are significantly different from its training data. Users should carefully evaluate the model's outputs and make appropriate adjustments for their specific use cases.
Performance
The performance of the Neural-phi2 model has not been extensively evaluated or benchmarked as part of this project. Users should conduct their own evaluations to assess the model's suitability for their specific tasks and use cases.
Ethical Considerations
The use of large language models like Neural-phi2 raises several ethical considerations, such as the potential for generating harmful or biased content, the risk of misuse, and the importance of transparency and accountability. Users should carefully consider these ethical implications and take appropriate measures to mitigate potential harms.