SmolVLM2-500M GGUF for HaploAI

Vision Language Model (VLM) for on-device image understanding on iOS/macOS.

Model Overview

Property         Value
---------------  ------------------------------------------
Base Model       HuggingFaceTB/SmolVLM2-500M-Video-Instruct
Format           GGUF (llama.cpp compatible)
Quantization     Q8_0
Model Size       437 MB
Vision Encoder   199 MB
Total Size       ~636 MB
License          Apache 2.0

Capabilities

  • Image Captioning: Describe what's in an image
  • Visual Q&A: Answer questions about images
  • Document/Text Extraction: Read and extract text from photos
  • Scene Understanding: Analyze visual content and context

Files

File                                           Size     Description
---------------------------------------------  -------  -------------------------------------
SmolVLM2-500M-Video-Instruct-Q8_0.gguf         437 MB   Main language model (Q8_0 quantized)
mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf   199 MB   Vision encoder (f16 precision)

Usage

Download URLs

Main Model: https://huggingface.co/jc-builds/smolvlm2-500m-gguf/resolve/main/SmolVLM2-500M-Video-Instruct-Q8_0.gguf
Vision Encoder: https://huggingface.co/jc-builds/smolvlm2-500m-gguf/resolve/main/mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf
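
Both files can be fetched from the URLs above, for example with curl. This is a sketch: the loop below only prints the resolved URLs; uncomment the curl line to actually download (~636 MB total).

```shell
# Build each download URL from the repo base path and print it.
BASE="https://huggingface.co/jc-builds/smolvlm2-500m-gguf/resolve/main"
for f in SmolVLM2-500M-Video-Instruct-Q8_0.gguf \
         mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf; do
  echo "$BASE/$f"
  # curl -L -o "$f" "$BASE/$f"   # uncomment to download (~636 MB total)
done
```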

With llama.cpp

# Load both the main model and the vision encoder.
# Multimodal inference uses the llama-mtmd-cli tool in recent llama.cpp
# builds (plain llama-cli does not accept --image).
./llama-mtmd-cli -m SmolVLM2-500M-Video-Instruct-Q8_0.gguf \
                 --mmproj mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf \
                 --image your_image.jpg \
                 -p "Describe this image in detail"

With HaploAI (iOS/macOS)

This model is automatically available in HaploAI for on-device image understanding. Simply attach an image and ask questions about it.

Prompt Format

SmolVLM uses the <image> token to mark where the image should be processed:

<image>
What is shown in this image?
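
When scripting this from the shell, single quotes keep the <image> token and the embedded newline intact (a minimal sketch):

```shell
# Single-quote the prompt so <image> is passed through literally and the
# newline between the token and the question is preserved.
PROMPT='<image>
What is shown in this image?'
echo "$PROMPT"
```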

Performance Notes

  • Memory Usage: ~800 MB during inference (model + vision encoder + context)
  • Speed: fast enough for interactive use on mobile hardware
  • Quality: a good balance of size and capability for on-device use
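
The ~800 MB figure above can be sanity-checked against the file sizes in the Files table. This is a rough sketch; the ~164 MB context/runtime overhead is an illustrative assumption, not a measured value.

```shell
MODEL_MB=437      # Q8_0 language model (from the Files table)
MMPROJ_MB=199     # f16 vision encoder
OVERHEAD_MB=164   # assumed KV cache + runtime overhead (illustrative)
TOTAL_MB=$((MODEL_MB + MMPROJ_MB + OVERHEAD_MB))
echo "approx. peak memory: ${TOTAL_MB} MB"
```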

License

This model is distributed under the Apache 2.0 license, which permits:

  • Commercial use
  • Modification
  • Distribution
  • Patent use
  • Private use

Attribution

Based on HuggingFaceTB/SmolVLM2-500M-Video-Instruct, converted to GGUF format for use with llama.cpp.

Related Models

For higher quality (larger size), see:

  • SmolVLM2-2.2B (~2 GB) - Coming soon