Models: 119

Active filters: vLLM

QuantTrio/GLM-4.5V-AWQ

Image-Text-to-Text • 17B • Updated Aug 25, 2025 • 744 • 19

QuantTrio/Seed-OSS-36B-Instruct-AWQ

Text Generation • 36B • Updated Sep 15, 2025 • 385 • 7

QuantTrio/Seed-OSS-36B-Instruct-GPTQ-Int8

Text Generation • 36B • Updated Sep 15, 2025 • 132 • 4

QuantTrio/Seed-OSS-36B-Instruct-GPTQ-Int4

Text Generation • 36B • Updated Sep 15, 2025 • 28 • 5

QuantTrio/Seed-OSS-36B-Instruct-GPTQ-Int3

Text Generation • 34B • Updated Sep 15, 2025 • 8 • 3

amakhov/tiny-random-llama

Text Generation • 4.18M • Updated Aug 21, 2025 • 16

QuantTrio/KAT-V1-40B-AWQ

Text Generation • 41B • Updated Sep 5, 2025 • 4 • 2

QuantTrio/DeepSeek-V3.1-AWQ

Text Generation • 485B • Updated Aug 27, 2025 • 810 • 5

QuantTrio/DeepSeek-V3.1-AWQ-Fp16Mix

Text Generation • 286B • Updated Aug 27, 2025 • 5 • 1

QuantTrio/DeepSeek-V3.1-AWQ-Lite

Text Generation • 684B • Updated Sep 5, 2025 • 27 • 3

JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int8

Text Generation • 4B • Updated Sep 4, 2025 • 2.88k

JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int4

Text Generation • 4B • Updated Sep 4, 2025 • 259 • 1

JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int8

Text Generation • 4B • Updated Sep 4, 2025 • 243 • 2

JunHowie/Qwen3-30B-A3B-Instruct-2507-GPTQ-Int4

Text Generation • 31B • Updated Sep 8, 2025 • 7.4k

JunHowie/Qwen3-30B-A3B-Instruct-2507-GPTQ-Int8

Text Generation • 31B • Updated Sep 8, 2025 • 6

JunHowie/Qwen3-30B-A3B-Thinking-2507-GPTQ-Int4

Text Generation • 31B • Updated Sep 8, 2025 • 49

JunHowie/Qwen2-7B-Instruct-GPTQ-Int4

Text Generation • 8B • Updated Sep 3, 2025 • 8

JunHowie/Qwen2-7B-Instruct-GPTQ-Int8

Text Generation • 8B • Updated Sep 3, 2025 • 52

EliovpAI/Deepseek-R1-0528-Qwen3-8B-FP8-KV

Text Generation • 8B • Updated Sep 18, 2025 • 7

JunHowie/Qwen3-30B-A3B-Thinking-2507-GPTQ-Int8

Text Generation • 31B • Updated Sep 8, 2025 • 27

JunHowie/Seed-OSS-36B-Instruct-GPTQ-Int4

Text Generation • 36B • Updated Sep 15, 2025 • 6

JunHowie/Seed-OSS-36B-Instruct-GPTQ-Int8

Text Generation • 36B • Updated Sep 15, 2025 • 4

QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ

Text Generation • 236B • Updated Oct 8, 2025 • 823 • 11

QuantTrio/Qwen3-VL-235B-A22B-Instruct-FP8

Text Generation • Updated Oct 8, 2025 • 31

QuantTrio/Qwen3-VL-235B-A22B-Thinking-AWQ

Text Generation • 236B • Updated Oct 8, 2025 • 422 • 6

QuantTrio/Qwen3-VL-235B-A22B-Thinking-FP8

Text Generation • 236B • Updated Oct 8, 2025 • 93

QuantTrio/DeepSeek-V3.2-Exp-AWQ

Text Generation • 486B • Updated Oct 1, 2025 • 48 • 4

QuantTrio/DeepSeek-V3.2-Exp-AWQ-Lite

Text Generation • 685B • Updated Oct 1, 2025 • 77 • 4

QuantTrio/GLM-4.6-AWQ

Text Generation • 50B • Updated Oct 2, 2025 • 3.22k • 5

QuantTrio/GLM-4.6-GPTQ-Int4-Int8Mix

Text Generation • 69B • Updated Oct 3, 2025 • 212 • 4