
chat template of Q8_0 wrong?

#4
by zeerd - opened

When loading the Q8_0 model with vLLM 0.11.0, tool_call can hardly ever be triggered.
I copied the chat template from the website and set it with "--chat-template". vLLM then reports:

vllm  | (APIServer pid=1) WARNING 11-11 22:29:50 [api_server.py:1654] It is different from official chat template '/models/Qwen3-32B/Qwen3-32B-Q8_0.gguf'. This discrepancy may lead to performance degradation.

But with that template, tool calls work fine.
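
For context, roughly the serve command involved looks like the sketch below (the chat_template.jinja path is a placeholder for wherever the copied template is saved, and the hermes parser is what vLLM's docs suggest for Qwen tool calling, not something stated above):

vllm serve /models/Qwen3-32B/Qwen3-32B-Q8_0.gguf \
  --chat-template /models/Qwen3-32B/chat_template.jinja \
  --enable-auto-tool-choice \
  --tool-call-parser hermes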
