Is the chat template of Q8_0 wrong?

by zeerd - opened Nov 12, 2025

Nov 12, 2025

When I load the Q8_0 model with vLLM 0.11.0, tool calls are almost never triggered.
If I copy the chat template from the website and set it with "--chat-template", vLLM reports:

vllm  | (APIServer pid=1) WARNING 11-11 22:29:50 [api_server.py:1654] It is different from official chat template '/models/Qwen3-32B/Qwen3-32B-Q8_0.gguf'. This discrepancy may lead to performance degradation.

But with that template, tool calls work fine.
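To see where the two templates actually differ, one option is a plain text diff. This is only a minimal sketch: it assumes both templates have already been extracted as strings, and the two template strings below are hypothetical stand-ins, not the real Qwen3 templates. The helper name `diff_templates` is made up for illustration, and it says nothing about how vLLM itself performs the comparison.

```python
import difflib


def diff_templates(official: str, custom: str) -> list[str]:
    """Return unified-diff lines between two chat template strings."""
    return list(
        difflib.unified_diff(
            official.splitlines(),
            custom.splitlines(),
            fromfile="official",
            tofile="custom",
            lineterm="",
        )
    )


# Hypothetical stand-in templates (NOT the real Qwen3 templates).
official = "{% for m in messages %}\n{{ m.role }}: {{ m.content }}\n{% endfor %}"
custom = (
    "{% for m in messages %}\n"
    "{{ m.role }}: {{ m.content }}\n"
    "{% if tools %}{{ tools }}{% endif %}\n"
    "{% endfor %}"
)

# Print each diff line; lines starting with "+" exist only in the custom template.
for line in diff_templates(official, custom):
    print(line)
```

If the diff shows that the tool-related sections are missing from one side, that would be consistent with tool calls only working after the template is overridden.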
