AutoProcessor.from_pretrained fails on transformers >= 5.0 due to `bpe_tokenizer` attribute / repo-layout mismatch
Bug
AutoProcessor.from_pretrained("physical-intelligence/fast", trust_remote_code=True) fails on transformers >= 5.0.0 with the following error:
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece or tiktoken installed to convert a slow tokenizer to a fast one.
The error is misleading: having sentencepiece and tiktoken installed does NOT fix it. The real
cause is a path-layout mismatch (see below).
Repro
# env: transformers==5.2.0, tokenizers latest, both sentencepiece+tiktoken installed
from transformers import AutoProcessor
proc = AutoProcessor.from_pretrained("physical-intelligence/fast", trust_remote_code=True)
# -> ValueError as above
Direct loading via PreTrainedTokenizerFast.from_pretrained("physical-intelligence/fast")
or AutoTokenizer.from_pretrained("physical-intelligence/fast") works fine; only the AutoProcessor flow fails.
Root cause
UniversalActionProcessor declares its tokenizer attribute as bpe_tokenizer:
class UniversalActionProcessor(ProcessorMixin):
    attributes: ClassVar[list[str]] = ["bpe_tokenizer"]
    bpe_tokenizer_class: str = "AutoTokenizer"
In transformers >= 5.0, ProcessorMixin._load_tokenizer_from_pretrained
(processing_utils.py L1453-1471)
checks is_primary = sub_processor_type == "tokenizer". Since the attribute is named bpe_tokenizer (not tokenizer), is_primary is False, and the loader is forced to look in
a <repo>/bpe_tokenizer/ subfolder:
tokenizer_subfolder = os.path.join(subfolder, sub_processor_type) if subfolder else sub_processor_type
tokenizer = auto_processor_class.from_pretrained(
    pretrained_model_name_or_path, subfolder=tokenizer_subfolder, **kwargs,
)
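The branch above can be modelled in isolation to see which folder each attribute name resolves to. This is a simplified sketch of the behaviour as described in this report, not the actual transformers source; resolve_tokenizer_subfolder is a hypothetical helper name introduced only for illustration.

```python
import os


def resolve_tokenizer_subfolder(sub_processor_type: str, subfolder: str = "") -> str:
    """Return the folder (relative to the repo root) a sub-processor is loaded from.

    Simplified model of the behaviour described in this report; it is NOT
    the real transformers implementation.
    """
    is_primary = sub_processor_type == "tokenizer"
    if is_primary:
        # The primary tokenizer loads from the given subfolder (repo root if empty).
        return subfolder
    # Any other attribute name is looked up in a folder named after the attribute.
    return os.path.join(subfolder, sub_processor_type) if subfolder else sub_processor_type


print(repr(resolve_tokenizer_subfolder("tokenizer")))      # ''
print(repr(resolve_tokenizer_subfolder("bpe_tokenizer")))  # 'bpe_tokenizer'
```

Under this model, an attribute named tokenizer resolves to the repo root, while bpe_tokenizer resolves to a bpe_tokenizer/ folder.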
The physical-intelligence/fast repo puts tokenizer.json, tokenizer_config.json, and special_tokens_map.json at the root, not in a bpe_tokenizer/ subdirectory. Hence the
subfolder lookup returns nothing, and PreTrainedTokenizerFast.__init__ falls through to its
"couldn't instantiate backend" error path.
This is confirmed by the HuggingFace cache .no_exist/ markers: after the failed load, the cache
records that bpe_tokenizer/tokenizer.json, bpe_tokenizer/tokenizer_config.json etc.
"do not exist" in the repo.
In transformers < 5.0 the subfolder check was less strict and the loader could fall back to
root, which is why this hasn't surfaced before.
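To check the .no_exist/ markers mentioned above on your own machine, something like the following may help. It is a minimal sketch that assumes the cache stores each marker as an empty file under <repo_cache_dir>/.no_exist/<commit_hash>/<relative path>; list_no_exist_markers is a hypothetical helper name, and the cache path in the usage part is the default location, which may differ on your system.

```python
from pathlib import Path


def list_no_exist_markers(repo_cache_dir: str) -> list[str]:
    """List files recorded as missing, as paths relative to the repo root.

    Assumes the layout <repo_cache_dir>/.no_exist/<commit_hash>/<relative path>.
    """
    no_exist = Path(repo_cache_dir) / ".no_exist"
    if not no_exist.is_dir():
        return []
    markers = set()
    for commit_dir in no_exist.iterdir():
        if not commit_dir.is_dir():
            continue
        for marker in commit_dir.rglob("*"):
            if marker.is_file():
                # Drop the commit-hash folder so only the repo-relative path remains.
                markers.add(marker.relative_to(commit_dir).as_posix())
    return sorted(markers)


if __name__ == "__main__":
    # Default cache location; adjust if HF_HOME or a custom cache dir is set.
    cache = Path.home() / ".cache/huggingface/hub/models--physical-intelligence--fast"
    for name in list_no_exist_markers(str(cache)):
        print(name)
```

After a failed AutoProcessor load, entries under bpe_tokenizer/ in this listing would be consistent with the root cause described above.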
Proposed fix (repo side)
Move the three tokenizer files into a bpe_tokenizer/ subdirectory of the repo:
physical-intelligence/fast/
├── processing_action_tokenizer.py
├── processor_config.json
├── README.md
└── bpe_tokenizer/
    ├── tokenizer.json
    ├── tokenizer_config.json
    └── special_tokens_map.json
This is a pure layout change with no code change. Anyone using PreTrainedTokenizerFast.from_pretrained directly can keep doing so, as long as they pass subfolder="bpe_tokenizer".
Alternative: rename the attribute in processing_action_tokenizer.py from bpe_tokenizer
to tokenizer, which would make it is_primary and load from root. But the first option is
less invasive for downstream users.
Local workaround (until upstream is fixed)
# Shell: copy the cached snapshot into a layout with a bpe_tokenizer/ subfolder
DEST=~/.cache/huggingface/FAST_processor_local
SRC=~/.cache/huggingface/hub/models--physical-intelligence--fast/snapshots/<commit_hash>
mkdir -p "$DEST/bpe_tokenizer"
cp -L "$SRC"/{processing_action_tokenizer.py,processor_config.json} "$DEST/"
cp -L "$SRC"/{tokenizer.json,tokenizer_config.json,special_tokens_map.json} "$DEST/bpe_tokenizer/"
# Python: then load via the local absolute path (bypasses the HF Hub manifest check)
import os
from transformers import AutoProcessor
DEST = os.path.expanduser("~/.cache/huggingface/FAST_processor_local")
proc = AutoProcessor.from_pretrained(DEST, trust_remote_code=True)
A symlink-based fix inside the HF cache snapshot dir does NOT work because HF Hub checks the
remote manifest in addition to the local cache; subfolder arguments are validated against
the remote file list, which still lacks bpe_tokenizer/.
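For anyone who prefers to stay in Python, the copy step of the workaround could be sketched roughly as below. The helper name build_local_processor_dir and the two file-list constants are hypothetical; the file names are the ones listed in this report, and the snapshot path still has to be filled in with your own commit hash.

```python
import shutil
from pathlib import Path

# File names taken from the repo layout described in this report.
ROOT_FILES = ("processing_action_tokenizer.py", "processor_config.json")
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json")


def build_local_processor_dir(snapshot_dir: str, dest_dir: str) -> Path:
    """Copy a cached snapshot into a layout with a bpe_tokenizer/ subfolder."""
    src, dest = Path(snapshot_dir), Path(dest_dir)
    tok_dir = dest / "bpe_tokenizer"
    tok_dir.mkdir(parents=True, exist_ok=True)
    for name in ROOT_FILES:
        # shutil.copy follows symlinks by default (like `cp -L`), so real files are written.
        shutil.copy(src / name, dest / name)
    for name in TOKENIZER_FILES:
        shutil.copy(src / name, tok_dir / name)
    return dest
```

The returned directory can then be passed to AutoProcessor.from_pretrained(..., trust_remote_code=True) in the same way as DEST in the workaround above.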
Env
transformers==5.2.0
tokenizers==<...>
sentencepiece==0.2.1
tiktoken==0.12.0
python==3.10
Try LeRobot's official version: lerobot/fast-action-tokenizer
That one should work fine.