view article Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance Apr 16, 2025 • 59
view article Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes Aug 17, 2022 • 122
view article Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA +3 May 24, 2023 • 171