Instructions to use bigscience/bloom with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use bigscience/bloom with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bigscience/bloom")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom")
```
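A minimal usage sketch for the pipeline, assuming hardware large enough to hold the full 176B-parameter checkpoint (the prompt, `max_new_tokens`, and `device_map="auto"` are illustrative assumptions; `device_map` requires accelerate):

```python
# Minimal sketch: generate a short continuation with the pipeline.
from transformers import pipeline

pipe = pipeline("text-generation", model="bigscience/bloom", device_map="auto")
out = pipe("Once upon a time,", max_new_tokens=50)  # hypothetical prompt/params
print(out[0]["generated_text"])
```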
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use bigscience/bloom with vLLM:
Install from pip and serve the model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "bigscience/bloom"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
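The same server can also be queried from Python. A minimal sketch using the official openai client against the OpenAI-compatible endpoint (the base_url matches the default vLLM port above; the api_key value is a placeholder, since vLLM does not require one by default):

```python
# Minimal sketch: call the vLLM server via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="bigscience/bloom",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(resp.choices[0].text)
```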
Use Docker

```sh
docker model run hf.co/bigscience/bloom
```
- SGLang
How to use bigscience/bloom with SGLang:
Install from pip and serve the model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "bigscience/bloom" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "bigscience/bloom" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
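Either way the server is started, the completions endpoint can also be called from Python with plain requests; a minimal sketch against the port used above:

```python
# Minimal sketch: call the SGLang server's OpenAI-compatible completions endpoint.
import requests

resp = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "bigscience/bloom",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(resp.json()["choices"][0]["text"])
```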
- Docker Model Runner
How to use bigscience/bloom with Docker Model Runner:
```sh
docker model run hf.co/bigscience/bloom
```
Adding `safetensors` variant of this model
(Pushed a commit to simplify .gitattributes)
Fixed 72-of-72 and the index (coming from the automated conversion: https://huggingface.co/bigscience/bloom/discussions/124)
I am trying to load this. Is this the best way? It seems to be "wasting" time downloading the .bin files and not the model_00001-of-00072.safetensors files:
```
>>> model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", revision="3c9db305")
Downloading config.json: 100%|██████████| 568/568 [00:00<00:00, 830kB/s]
Downloading pytorch_model.bin.index.json: 100%|██████████| 62.2k/62.2k [00:00<00:00, 836kB/s]
Downloading pytorch_model_00001-of-00072.bin: 100%|██████████| 6.70G/6.70G [03:40<00:00, 32.6MB/s]
Downloading pytorch_model_00002-of-00072.bin: 100%|██████████| 4.59G/4.59G [02:22<00:00, 34.6MB/s]
Downloading pytorch_model_00003-of-00072.bin: 100%|██████████| 4.59G/4.59G [02:16<00:00, 36.1MB/s]
Downloading pytorch_model_00004-of-00072.bin:  63%|██████▎   | 2.89G/4.59G [01:53<01:19, 22.9MB/s]
```
Also, `model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", revision="refs/pr/121")` raised `transformers.utils.hub.RevisionNotFoundError: 404 Client Error: Revision Not Found for url: https://huggingface.co/bigscience/bloom/resolve/refs/pr/121/config.json`, though I was expecting it to work (https://github.com/huggingface/transformers/pull/19175#issuecomment-1262435883).
arf, you might need to url-encode like this then: `AutoModelForCausalLM.from_pretrained("bigscience/bloom", revision="refs%2Fpr%2F121")`
i.e., https://huggingface.co/bigscience/bloom/resolve/refs%2Fpr%2F121/config.json works
huggingface_hub should probably do it transparently for you though, cc @sgugger @Wauplin
@Muennighoff I think you can run model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", revision="pr/121")
Maybe you're missing the safetensors installation?
arf yes, does that one work?
Tried `AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", revision="pr/26")` and checked that in my cache I only have config.json and model.safetensors, i.e. no .bin file (commit: 98eedcbf588e70a016e5017934adfb280c5a0f58). For this to work, I had safetensors installed first.
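For reference, here is a quick sketch of how to list what actually landed in the cache (the default hub cache path below is an assumption; adjust for your setup):

```python
# Hypothetical check: list the files cached for bigscience/bloom-560m.
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface" / "hub"
for f in sorted(cache.glob("models--bigscience--bloom-560m/snapshots/*/*")):
    print(f.name)  # per the comment above: config.json and model.safetensors, no *.bin
```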
Is safetensors available on your machine?
I'm just starting to look at updating our deployment with those files.
> huggingface_hub should probably do it transparently for you though, cc @sgugger @Wauplin
nevermind, it actually already does: https://github.com/huggingface/huggingface_hub/blob/53ee96d57b4f5b11c9b8ecb6bf3ad5e2722c5dd4/src/huggingface_hub/file_download.py#L240
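Concretely, that line percent-encodes the revision before building the resolve URL, which is easy to reproduce with the standard library:

```python
# What huggingface_hub does to the revision before building the resolve URL.
from urllib.parse import quote

print(quote("refs/pr/121", safe=""))  # -> refs%2Fpr%2F121
```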
Note that you can use revision="refs/pr/6" or revision="refs/pr/6" indifferently, they're the same on git side (right @sbrandeis ?)
Okay, the PR does work: https://github.com/huggingface/transformers_bloom_parallel/pull/7
It's been validated and loads pretty fast without needing to rewrite the weights on disk! (with a small slowdown)
Merging.
I'm just noticing now that there's no model.safetensors.index.json, is this expected? (as a result the model is not recognized by the Hub as a safetensors model)
Hmm, I'll check this again later. I think we should wait for https://github.com/huggingface/safetensors/pull/42 to fix the randomness that was previously happening, before re-pushing a bunch of safetensors files.
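For context, re-pushing just means re-serializing each shard with safetensors; a minimal sketch (the tensor name and shard filename are made up for illustration):

```python
# Minimal sketch: write one hypothetical shard with safetensors.
# The linked PR is about making this serialization deterministic across runs.
import torch
from safetensors.torch import save_file

state_dict = {"h.0.self_attention.query_key_value.weight": torch.zeros(4, 4)}
save_file(state_dict, "model_00001-of-00072.safetensors")
```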
> Note that you can use revision="refs/pr/6" or revision="refs/pr/6" indifferently, they're the same on git side (right @sbrandeis ?)
hahaha lol
I think I meant revision="pr/6" or revision="refs/pr/6" (I guess?)
