Is the chat template of Q8_0 wrong?
#4, opened by zeerd
When I load the Q8_0 model with vLLM 0.11.0, tool calls are almost never triggered.
I then copied the chat template from the website and passed it with "--chat-template". vLLM reports:
vllm | (APIServer pid=1) WARNING 11-11 22:29:50 [api_server.py:1654] It is different from official chat template '/models/Qwen3-32B/Qwen3-32B-Q8_0.gguf'. This discrepancy may lead to performance degradation.
However, with this template the tool calls work fine.
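To narrow down where the two templates diverge, it may help to diff them directly. The following is only a minimal sketch: the function name `diff_templates` and both template strings are hypothetical, shortened stand-ins, not the real Qwen3 templates. In practice you would paste in the template embedded in the GGUF file and the one copied from the website.

```python
import difflib


def diff_templates(embedded: str, official: str) -> list[str]:
    """Return unified-diff lines between two chat template strings."""
    return list(
        difflib.unified_diff(
            embedded.splitlines(),
            official.splitlines(),
            fromfile="gguf_embedded",
            tofile="official",
            lineterm="",
        )
    )


# Hypothetical, heavily shortened templates for illustration only.
# Replace them with the actual template text from each source.
embedded = (
    "{%- for message in messages %}\n"
    "{{- message.content }}\n"
    "{%- endfor %}"
)
official = (
    "{%- if tools %}\n"
    "{{- tools | tojson }}\n"
    "{%- endif %}\n"
    "{%- for message in messages %}\n"
    "{{- message.content }}\n"
    "{%- endfor %}"
)

diff_lines = diff_templates(embedded, official)
for line in diff_lines:
    print(line)
```

Lines prefixed with `+` exist only in the second template. If the tool-related section shows up only on one side, that would be consistent with tool calls working under one template and not the other, though this sketch alone does not prove it.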