Article: Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique • lyogavin • Nov 30, 2023 • 47
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 • Text Generation • 32B • Updated Mar 15 • 1.1M • 732