RTX PRO 4000 Blackwell Community

community

local inference, llama.cpp, vllm, quantisation, GGUF, Blackwell architecture, workstation GPU, single-GPU workflows

Jakal-au updated a Space 2 days ago

Jakal-au published a Space 3 days ago

Organization Card

Hardware Specs

Architecture

Blackwell (GB203)

VRAM

24 GB GDDR7 ECC

Bandwidth

672 GB/s

CUDA Cores

8,960

Tensor Cores

280 (5th Gen)

TDP

145W

Form Factor

Single-slot

PCIe

Gen 5 x16

Inference benchmarks: tok/s across models, quant levels, and backends (llama.cpp, vLLM, TensorRT-LLM)
Quantisation compatibility: which GGUF/NVFP4/FP8 quants fit in 24 GB and how they perform
Workstation configs: systemd services, Docker setups, thermal management, multi-GPU pairing
Real-world results from owner-operated hardware, not cloud benchmarks