Commit 934a383 by metascroy (parent 96cbe73): Create README.md

# Exporting to ExecuTorch

⚠️ Note: these instructions only work on Arm-based machines; running them on x86_64 will fail.

We can run the 2-bit quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch), the PyTorch solution for mobile deployment.

To set up ExecuTorch with the TorchAO lowbit kernels, run the following commands:

```shell
git clone https://github.com/pytorch/executorch.git
pushd executorch
git submodule update --init --recursive
python install_executorch.py
USE_CPP=1 TORCHAO_BUILD_KLEIDIAI=1 pip install third-party/ao
popd
```
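
After the install, you can sanity-check that both packages are importable. This is an optional sketch (the `installed` helper is ours, not part of the official setup):

```python
import importlib.util

def installed(pkg: str) -> bool:
    """True if `pkg` can be imported in the current environment."""
    return importlib.util.find_spec(pkg) is not None

# After a successful setup, both of these should print True.
for pkg in ("executorch", "torchao"):
    print(pkg, installed(pkg))
```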

(The commands above work on an Arm-based Mac. On Arm-based Linux, define the following environment variables before pip installing third-party/ao: `BUILD_TORCHAO_EXPERIMENTAL=1 TORCHAO_BUILD_CPU_AARCH64=1 TORCHAO_BUILD_KLEIDIAI=1 TORCHAO_ENABLE_ARM_NEON_DOT=1 TORCHAO_PARALLEL_BACKEND=OPENMP`.)
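
Spelled out, the Linux variant of that step looks like this (a sketch; run it from inside the `executorch` checkout, like the Mac command above):

```shell
# Linux-only build flags for the TorchAO lowbit kernels (from the note above)
export BUILD_TORCHAO_EXPERIMENTAL=1
export TORCHAO_BUILD_CPU_AARCH64=1
export TORCHAO_BUILD_KLEIDIAI=1
export TORCHAO_ENABLE_ARM_NEON_DOT=1
export TORCHAO_PARALLEL_BACKEND=OPENMP
# Then install exactly as in the Mac instructions:
# pip install third-party/ao
```

With these exported, the `pip install third-party/ao` step picks them up from the environment.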

Now we export the model to ExecuTorch using the TorchAO lowbit kernel backend.
(Do not run these commands from a directory containing the ExecuTorch repo you cloned during setup, or Python will import the local sources from the repo instead of the installed packages.)

```shell
# 1. Download the QAT'd weights from Hugging Face
HF_DIR=lvj/Phi-4-mini-instruct-parq-2b-weight-4b-embed-shared
WEIGHT_DIR=$(hf download ${HF_DIR})

# 2. Rename the weight keys to the ones ExecuTorch expects
python -m executorch.examples.models.phi_4_mini.convert_weights $WEIGHT_DIR pytorch_model_converted.bin

# 3. Download the model config from the ExecuTorch repo
curl -L -o phi_4_mini_config.json https://raw.githubusercontent.com/pytorch/executorch/main/examples/models/phi_4_mini/config/config.json

# 4. Export the model to an ExecuTorch .pte file
python -m executorch.examples.models.llama.export_llama \
  --model "phi_4_mini" \
  --checkpoint pytorch_model_converted.bin \
  --params phi_4_mini_config.json \
  --output_name phi4_model_2bit.pte \
  -kv \
  --use_sdpa_with_kv_cache \
  --use-torchao-kernels \
  --max_context_length 1024 \
  --max_seq_length 256 \
  --dtype fp32 \
  --metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}'

# 5. (optional) Upload the .pte file to Hugging Face
# hf upload ${HF_DIR} phi4_model_2bit.pte
```
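
The `--metadata` argument is a JSON string that appears to record the tokenizer's special-token IDs for the exported program. A quick sketch of what it contains:

```python
import json

# The exact string passed to --metadata above: Phi-4-mini's BOS and EOS token IDs.
metadata = '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}'
parsed = json.loads(metadata)
print(parsed["get_bos_id"])   # the BOS token ID
print(parsed["get_eos_ids"])  # generation stops on either EOS ID
```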

Once you have the *.pte file, you can run it inside our [iOS demo app](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple) in a [few easy steps](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple#build-and-run).