A straightforward dynamic FP8 quantization made with llmcompressor. FP8 halves weight memory relative to BF16 and performs well on Hopper and Blackwell GPUs, which support FP8 natively.
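"Dynamic" here means activation scales are computed per tensor at inference time rather than calibrated offline. A minimal numeric sketch of that scaling, assuming per-tensor absmax scaling onto the E4M3 range (the `dynamic_fp8_scale` helper is illustrative, not llmcompressor's API; 448 is E4M3's largest finite value):

```python
E4M3_MAX = 448.0  # largest finite value representable in float8 E4M3


def dynamic_fp8_scale(values):
    """Per-tensor dynamic scale: fit the observed absmax into the E4M3 range."""
    absmax = max(abs(v) for v in values)
    return absmax / E4M3_MAX if absmax > 0 else 1.0


# Scales come from the live activations, so no calibration dataset is needed.
acts = [0.5, -3.2, 112.0, -7.9]
scale = dynamic_fp8_scale(acts)
scaled = [v / scale for v in acts]  # every value now fits in [-448, 448]
print(scale)  # 0.25
```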

Tested with a nightly build of vLLM on November 25-26, 2025.
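Serving with vLLM would look something like the following; `vllm serve` is vLLM's standard OpenAI-compatible entry point, but the flag shown is illustrative, not a value specified by this card:

```shell
# Requires a recent (nightly) vLLM; FP8 wants a Hopper- or Blackwell-class GPU.
vllm serve ramendik/Vistral-24B-Instruct-FP8 \
  --max-model-len 32768  # illustrative context cap, adjust to your GPU memory
```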

Safetensors, 24B params, tensor types BF16 and F8_E4M3.

Model: ramendik/Vistral-24B-Instruct-FP8