Commit 934a383 by metascroy (parent 96cbe73): Create README.md

# Exporting to ExecuTorch

⚠️ Note: these instructions only work on Arm-based machines; running them on x86_64 will fail.

We can run the 2-bit quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch), the PyTorch solution for mobile deployment.

To set up ExecuTorch with the TorchAO lowbit kernels, run the following commands:

```shell
git clone https://github.com/pytorch/executorch.git
pushd executorch
git submodule update --init --recursive
python install_executorch.py
USE_CPP=1 TORCHAO_BUILD_KLEIDIAI=1 pip install third-party/ao
popd
```
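
After the install, you can sanity-check that both packages are importable. This is an optional sketch (the `installed` helper is ours, not part of the official setup):

```python
import importlib.util

def installed(pkg: str) -> bool:
    """True if `pkg` can be imported in the current environment."""
    return importlib.util.find_spec(pkg) is not None

# After a successful setup, both of these should print True.
for pkg in ("executorch", "torchao"):
    print(pkg, installed(pkg))
```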

(The commands above work on an Arm-based Mac. On Arm-based Linux, define the following environment variables before pip installing third-party/ao: `BUILD_TORCHAO_EXPERIMENTAL=1 TORCHAO_BUILD_CPU_AARCH64=1 TORCHAO_BUILD_KLEIDIAI=1 TORCHAO_ENABLE_ARM_NEON_DOT=1 TORCHAO_PARALLEL_BACKEND=OPENMP`.)
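
Spelled out, the Linux variant of that step looks like this (a sketch; run it from inside the `executorch` checkout, like the Mac command above):

```shell
# Linux-only build flags for the TorchAO lowbit kernels (from the note above)
export BUILD_TORCHAO_EXPERIMENTAL=1
export TORCHAO_BUILD_CPU_AARCH64=1
export TORCHAO_BUILD_KLEIDIAI=1
export TORCHAO_ENABLE_ARM_NEON_DOT=1
export TORCHAO_PARALLEL_BACKEND=OPENMP
# Then install exactly as in the Mac instructions:
# pip install third-party/ao
```

With these exported, the `pip install third-party/ao` step picks them up from the environment.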

Now we export the model to ExecuTorch using the TorchAO lowbit kernel backend.
(Do not run these commands from a directory containing the ExecuTorch repo you cloned during setup, or Python will import the local sources from the repo instead of the installed packages.)

```shell
# 1. Download the QAT'd weights from Hugging Face
HF_DIR=lvj/Phi-4-mini-instruct-parq-2b-weight-4b-embed-shared
WEIGHT_DIR=$(hf download ${HF_DIR})

# 2. Rename the weight keys to the ones ExecuTorch expects
python -m executorch.examples.models.phi_4_mini.convert_weights $WEIGHT_DIR pytorch_model_converted.bin

# 3. Download the model config from the ExecuTorch repo
curl -L -o phi_4_mini_config.json https://raw.githubusercontent.com/pytorch/executorch/main/examples/models/phi_4_mini/config/config.json

# 4. Export the model to an ExecuTorch .pte file
python -m executorch.examples.models.llama.export_llama \
  --model "phi_4_mini" \
  --checkpoint pytorch_model_converted.bin \
  --params phi_4_mini_config.json \
  --output_name phi4_model_2bit.pte \
  -kv \
  --use_sdpa_with_kv_cache \
  --use-torchao-kernels \
  --max_context_length 1024 \
  --max_seq_length 256 \
  --dtype fp32 \
  --metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}'

# 5. (optional) Upload the .pte file to Hugging Face
# hf upload ${HF_DIR} phi4_model_2bit.pte
```
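
The `--metadata` argument is a JSON string that appears to record the tokenizer's special-token IDs for the exported program. A quick sketch of what it contains:

```python
import json

# The exact string passed to --metadata above: Phi-4-mini's BOS and EOS token IDs.
metadata = '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}'
parsed = json.loads(metadata)
print(parsed["get_bos_id"])   # the BOS token ID
print(parsed["get_eos_ids"])  # generation stops on either EOS ID
```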

Once you have the *.pte file, you can run it inside our [iOS demo app](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple) in a [few easy steps](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple#build-and-run).