llama 3 1b
wip effort to make a merge-compatible llama model
comparison to palmer-004
| Component | palmer-004 | llama 3 1b | How to Make Second Similar to First |
|---|---|---|---|
| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to existing layers |
| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Maintain the same structure when adding new layers |
| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
| Position Embeddings | No separate weight in the tensor list (Llama-style models typically apply rotary position embeddings inside attention rather than in embed_tokens) | No separate weight in the tensor list (same as first) | No weights to add; check that the configured maximum sequence length matches the first model |
| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
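
The layer-count gap in the table can be checked with a small sketch. It assumes both checkpoints use exactly the tensor naming shown in this card (the per-layer suffixes below are copied from the llama-3-1b tensor list); `expected_tensor_names` is a hypothetical helper for illustration, not part of mergekit or Transformers.

```python
# Per-layer weight suffixes, copied from the tensor names listed in this card.
LAYER_SUFFIXES = [
    "self_attn.q_proj.weight",
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "self_attn.o_proj.weight",
    "mlp.gate_proj.weight",
    "mlp.up_proj.weight",
    "mlp.down_proj.weight",
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
]


def expected_tensor_names(num_layers: int) -> list[str]:
    """Build the tensor names a checkpoint with `num_layers` blocks would hold.

    Hypothetical helper: assumes the naming scheme shown in this card.
    """
    names = ["model.embed_tokens.weight"]
    for i in range(num_layers):
        names.extend(f"model.layers.{i}.{suffix}" for suffix in LAYER_SUFFIXES)
    names.extend(["model.norm.weight", "lm_head.weight"])
    return names


palmer = expected_tensor_names(22)  # palmer-004: layers 0 to 21
llama = expected_tensor_names(16)   # llama 3 1b: layers 0 to 15

# Names present in the 22-layer layout but absent from the 16-layer one.
llama_set = set(llama)
missing = [name for name in palmer if name not in llama_set]
missing_layers = sorted({int(name.split(".")[2]) for name in missing})

print(len(llama), len(palmer))  # 147 201
print(missing_layers)           # [16, 17, 18, 19, 20, 21]
```

Under these assumptions the only missing names are the 54 per-layer weights for layers 16 to 21, which matches the "add 6 more layers" rows above.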
further investigation
there are no other differences between these models, but for some reason i keep hitting this error when doing a passthrough merge:
Traceback (most recent call last):
  File "/home/zeus/miniconda3/envs/cloudspace/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/teamspace/studios/this_studio/mergekit/mergekit/options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "/teamspace/studios/this_studio/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/teamspace/studios/this_studio/mergekit/mergekit/merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/teamspace/studios/this_studio/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
    raise RuntimeError(
RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
which seems odd given that llama-3-1b contains the following tensors, including lm_head.weight:
model.embed_tokens.weight
model.layers.0.self_attn.q_proj.weight
model.layers.0.self_attn.k_proj.weight
model.layers.0.self_attn.v_proj.weight
model.layers.0.self_attn.o_proj.weight
model.layers.0.mlp.gate_proj.weight
model.layers.0.mlp.up_proj.weight
model.layers.0.mlp.down_proj.weight
model.layers.0.input_layernorm.weight
model.layers.0.post_attention_layernorm.weight
model.layers.1.self_attn.q_proj.weight
model.layers.1.self_attn.k_proj.weight
model.layers.1.self_attn.v_proj.weight
model.layers.1.self_attn.o_proj.weight
model.layers.1.mlp.gate_proj.weight
model.layers.1.mlp.up_proj.weight
model.layers.1.mlp.down_proj.weight
model.layers.1.input_layernorm.weight
model.layers.1.post_attention_layernorm.weight
model.layers.2.self_attn.q_proj.weight
model.layers.2.self_attn.k_proj.weight
model.layers.2.self_attn.v_proj.weight
model.layers.2.self_attn.o_proj.weight
model.layers.2.mlp.gate_proj.weight
model.layers.2.mlp.up_proj.weight
model.layers.2.mlp.down_proj.weight
model.layers.2.input_layernorm.weight
model.layers.2.post_attention_layernorm.weight
model.layers.3.self_attn.q_proj.weight
model.layers.3.self_attn.k_proj.weight
model.layers.3.self_attn.v_proj.weight
model.layers.3.self_attn.o_proj.weight
model.layers.3.mlp.gate_proj.weight
model.layers.3.mlp.up_proj.weight
model.layers.3.mlp.down_proj.weight
model.layers.3.input_layernorm.weight
model.layers.3.post_attention_layernorm.weight
model.layers.4.self_attn.q_proj.weight
model.layers.4.self_attn.k_proj.weight
model.layers.4.self_attn.v_proj.weight
model.layers.4.self_attn.o_proj.weight
model.layers.4.mlp.gate_proj.weight
model.layers.4.mlp.up_proj.weight
model.layers.4.mlp.down_proj.weight
model.layers.4.input_layernorm.weight
model.layers.4.post_attention_layernorm.weight
model.layers.5.self_attn.q_proj.weight
model.layers.5.self_attn.k_proj.weight
model.layers.5.self_attn.v_proj.weight
model.layers.5.self_attn.o_proj.weight
model.layers.5.mlp.gate_proj.weight
model.layers.5.mlp.up_proj.weight
model.layers.5.mlp.down_proj.weight
model.layers.5.input_layernorm.weight
model.layers.5.post_attention_layernorm.weight
model.layers.6.self_attn.q_proj.weight
model.layers.6.self_attn.k_proj.weight
model.layers.6.self_attn.v_proj.weight
model.layers.6.self_attn.o_proj.weight
model.layers.6.mlp.gate_proj.weight
model.layers.6.mlp.up_proj.weight
model.layers.6.mlp.down_proj.weight
model.layers.6.input_layernorm.weight
model.layers.6.post_attention_layernorm.weight
model.layers.7.self_attn.q_proj.weight
model.layers.7.self_attn.k_proj.weight
model.layers.7.self_attn.v_proj.weight
model.layers.7.self_attn.o_proj.weight
model.layers.7.mlp.gate_proj.weight
model.layers.7.mlp.up_proj.weight
model.layers.7.mlp.down_proj.weight
model.layers.7.input_layernorm.weight
model.layers.7.post_attention_layernorm.weight
model.layers.8.self_attn.q_proj.weight
model.layers.8.self_attn.k_proj.weight
model.layers.8.self_attn.v_proj.weight
model.layers.8.self_attn.o_proj.weight
model.layers.8.mlp.gate_proj.weight
model.layers.8.mlp.up_proj.weight
model.layers.8.mlp.down_proj.weight
model.layers.8.input_layernorm.weight
model.layers.8.post_attention_layernorm.weight
model.layers.9.self_attn.q_proj.weight
model.layers.9.self_attn.k_proj.weight
model.layers.9.self_attn.v_proj.weight
model.layers.9.self_attn.o_proj.weight
model.layers.9.mlp.gate_proj.weight
model.layers.9.mlp.up_proj.weight
model.layers.9.mlp.down_proj.weight
model.layers.9.input_layernorm.weight
model.layers.9.post_attention_layernorm.weight
model.layers.10.self_attn.q_proj.weight
model.layers.10.self_attn.k_proj.weight
model.layers.10.self_attn.v_proj.weight
model.layers.10.self_attn.o_proj.weight
model.layers.10.mlp.gate_proj.weight
model.layers.10.mlp.up_proj.weight
model.layers.10.mlp.down_proj.weight
model.layers.10.input_layernorm.weight
model.layers.10.post_attention_layernorm.weight
model.layers.11.self_attn.q_proj.weight
model.layers.11.self_attn.k_proj.weight
model.layers.11.self_attn.v_proj.weight
model.layers.11.self_attn.o_proj.weight
model.layers.11.mlp.gate_proj.weight
model.layers.11.mlp.up_proj.weight
model.layers.11.mlp.down_proj.weight
model.layers.11.input_layernorm.weight
model.layers.11.post_attention_layernorm.weight
model.layers.12.self_attn.q_proj.weight
model.layers.12.self_attn.k_proj.weight
model.layers.12.self_attn.v_proj.weight
model.layers.12.self_attn.o_proj.weight
model.layers.12.mlp.gate_proj.weight
model.layers.12.mlp.up_proj.weight
model.layers.12.mlp.down_proj.weight
model.layers.12.input_layernorm.weight
model.layers.12.post_attention_layernorm.weight
model.layers.13.self_attn.q_proj.weight
model.layers.13.self_attn.k_proj.weight
model.layers.13.self_attn.v_proj.weight
model.layers.13.self_attn.o_proj.weight
model.layers.13.mlp.gate_proj.weight
model.layers.13.mlp.up_proj.weight
model.layers.13.mlp.down_proj.weight
model.layers.13.input_layernorm.weight
model.layers.13.post_attention_layernorm.weight
model.layers.14.self_attn.q_proj.weight
model.layers.14.self_attn.k_proj.weight
model.layers.14.self_attn.v_proj.weight
model.layers.14.self_attn.o_proj.weight
model.layers.14.mlp.gate_proj.weight
model.layers.14.mlp.up_proj.weight
model.layers.14.mlp.down_proj.weight
model.layers.14.input_layernorm.weight
model.layers.14.post_attention_layernorm.weight
model.layers.15.self_attn.q_proj.weight
model.layers.15.self_attn.k_proj.weight
model.layers.15.self_attn.v_proj.weight
model.layers.15.self_attn.o_proj.weight
model.layers.15.mlp.gate_proj.weight
model.layers.15.mlp.up_proj.weight
model.layers.15.mlp.down_proj.weight
model.layers.15.input_layernorm.weight
model.layers.15.post_attention_layernorm.weight
model.norm.weight
lm_head.weight
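
One way to verify which checkpoint actually lacks `lm_head.weight` is to read the tensor names straight from a `.safetensors` header. The sketch below relies only on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header); the function names and the tiny demo file are hypothetical stand-ins, not real model files. A possible explanation, not confirmed here: the traceback names meta-llama/Llama-3.2-1B rather than appvoid/llama-3-1b, and a checkpoint whose config ties the output head to the input embeddings may not store a separate `lm_head.weight`.

```python
import json
import os
import struct
import tempfile


def safetensors_tensor_names(path: str) -> list[str]:
    """Read tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return [name for name in header if name != "__metadata__"]


def has_lm_head(path: str) -> bool:
    """Report whether the file stores a separate lm_head.weight tensor."""
    return "lm_head.weight" in safetensors_tensor_names(path)


# Demo on a tiny stand-in file (hypothetical contents, not a real checkpoint).
demo_header = json.dumps({
    "__metadata__": {"format": "pt"},
    "model.embed_tokens.weight": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]},
}).encode("utf-8")
demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
with open(demo_path, "wb") as f:
    f.write(struct.pack("<Q", len(demo_header)) + demo_header + b"\x00" * 4)

demo_names = safetensors_tensor_names(demo_path)
print(demo_names)              # ['model.embed_tokens.weight']
print(has_lm_head(demo_path))  # False
```

Running `has_lm_head` on the local weight file of each model referenced in the merge config would show whether the error comes from the base model or from this repository.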