---
license: mit
tags:
- vit
- image classification
- ggml
---

# Vision Transformer (ViT) models for image classification converted to ggml format

[Available models](https://github.com/staghado/vit.cpp)

| Model | Disk | Mem | SHA |
| --- | --- | --- | --- |
| tiny | 12 MB | ~20 MB | `25ce65ff60e08a1a5b486685b533d79718e74c0f` |
| small | 45 MB | ~52 MB | `7a9f85340bd1a3dcd4275f46d5ee1db66649700e` |
| base | 174 MB | ~179 MB | `a10d29628977fe27691edf55b7238f899b8c02eb` |
| large | 610 MB | ~597 MB | `5f27087930f21987050188f9dc9eea75ac607214` |

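The SHA column can be used to check a downloaded file for corruption. The following is a minimal sketch, not part of vit.cpp. It assumes the digests are SHA-1 (they are 40 hex characters long, which is consistent with SHA-1 but not stated here), and the helper names `sha1_of_file` and `verify_model` are hypothetical.

```python
import hashlib

# Digests copied from the table above.
# Assumption: they are SHA-1 digests of the model files (40 hex characters each).
EXPECTED_SHA1 = {
    "tiny": "25ce65ff60e08a1a5b486685b533d79718e74c0f",
    "small": "7a9f85340bd1a3dcd4275f46d5ee1db66649700e",
    "base": "a10d29628977fe27691edf55b7238f899b8c02eb",
    "large": "5f27087930f21987050188f9dc9eea75ac607214",
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, model_name):
    """Return True if the file at `path` matches the listed digest for `model_name`."""
    return sha1_of_file(path) == EXPECTED_SHA1[model_name]
```

Calling `verify_model(path_to_downloaded_file, "tiny")` returns `True` only when the file's digest matches the table entry. The path is whatever location the model file was saved to; no file name is assumed here.
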
The models are pre-trained on ImageNet-21k and then fine-tuned on ImageNet-1k,
using a patch size of 16 and an image size of 224.

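To make the patch and image sizes concrete, the short sketch below works out how many tokens the transformer sees per image. It assumes the standard ViT layout of non-overlapping square patches plus a single class token. That layout is an assumption about these checkpoints, and `vit_sequence_length` is a hypothetical helper, not a vit.cpp function.

```python
def vit_sequence_length(image_size=224, patch_size=16, num_class_tokens=1):
    """Number of tokens for a square image split into non-overlapping square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    # 14 * 14 = 196 patch tokens, plus the assumed class token.
    return patches_per_side ** 2 + num_class_tokens


print(vit_sequence_length())  # 197
```
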
For more information, visit:

https://github.com/staghado/vit.cpp