Gemma-4-A4B 98e v5-coder: code-leaning 20.8B MoE (4B-active), C6 layer-relevance-weighted prune of Gemma 4 26B-A4B. Best 20B-class coder I've shipped.
SCORES (NVFP4A16, vLLM 0.20.2, greedy, EVAL_PROTOCOL v3)
HumanEval 98.17 | HumanEval+ 92.68 | LCB-medium-55 v4 85.45
MATH-500 92.00 | GPQA-D 68.69 | IFEval 94.00
vs v4: +1.22 HE / +1.22 HE+ / +7.27 LCB-medium
Top of the 14-22B coder band: +8.6pp HE over Qwen2.5-Coder-14B-Instruct (89.6 -> 98.17). HE+ sanity-audited: no memorization, no silent-empty.
EXTENSIVE GGUF SWEEP (16 plain + IQ tiers + 5 CD recipes, all imatrix-calibrated)
Q8_0 | 21.16 GB | 93.90% (cohort top)
Q4_K_S | 12.21 GB | 93.29% (plain sweet spot)
IQ4_XS | 11.01 GB | 93.29% (sub-12 GB top)
TWO EXCELLENT SUB-11 GB CONTRIBDYNAMIC (CD) PICKS (per-layer + IQ-codebook overrides)
CD-IQ4_K_M (Canary W) | 10.29 GB | 92.07% (recommended sub-11 GB)
CD-IQ3_XS_L | 9.27 GB | 90.24% (smallest viable code-grade, sub-10 GB)
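For readers who want to see how coarse these percentages are, here is a minimal sketch that converts a pass@1 score back into a solved-problem count. It assumes the sweep scores were measured on the standard 164-problem HumanEval / HumanEval+ task set; the `solved` helper is a hypothetical name for illustration, not part of the eval protocol.

```python
# HumanEval and HumanEval+ share the same 164 tasks (assumed to be the set used here).
HUMANEVAL_PROBLEMS = 164


def solved(pass_at_1_pct: float, n: int = HUMANEVAL_PROBLEMS) -> int:
    """Number of solved problems implied by a greedy pass@1 percentage."""
    return round(pass_at_1_pct / 100 * n)


# Scores quoted in the post.
scores = {
    "NVFP4A16 HE": 98.17,
    "NVFP4A16 HE+": 92.68,
    "Q8_0": 93.90,
    "Q4_K_S": 93.29,
    "IQ4_XS": 93.29,
    "CD-IQ4_K_M": 92.07,
    "CD-IQ3_XS_L": 90.24,
}

for name, pct in scores.items():
    print(f"{name:13s} {pct:6.2f}% -> {solved(pct)}/{HUMANEVAL_PROBLEMS}")
```

Under that assumption, one problem is worth about 0.61pp, so the +1.22 gain over v4 corresponds to two extra problems solved.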
SAME-RIG vs Qwen2.5-Coder-14B-Instruct (RTX 3090, greedy HE+)
11 GB band: v5-coder IQ4_XS wins +9.75pp at -1.49 bpw
12 GB band: Q4_K_S wins +8.53pp
8 GB band: IQ2_S wins +0.61pp at lower bpw
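The bpw (bits per weight) figures can be approximated from file size and parameter count. This is a rough sketch only: it assumes 1 GB = 1e9 bytes and ignores GGUF metadata and any tensors kept at higher precision, so it will not exactly match the numbers reported by llama.cpp. `bits_per_weight` is a hypothetical helper.

```python
def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Approximate average bits per weight, treating 1 GB as 1e9 bytes."""
    return file_size_gb * 8 / params_billion


# v5-coder IQ4_XS: 11.01 GB file, 20.8B total parameters (both from the post).
bpw = bits_per_weight(11.01, 20.8)
print(f"IQ4_XS ~ {bpw:.2f} bpw")
```

Because the pruned model has more total parameters than a 14B dense model, it has to run at fewer bits per weight to fit the same size band, which is what the "-1.49 bpw" note refers to.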
bf16:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-it
GGUF:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-it-GGUF
NVFP4A16:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-NVFP4A16
Ollama:
https://ollama.com/mannix/gemma4-98e-v5-coder
---
BONUS: Qwen3.6-27B-Omnimerge-v4-MTP-GGUF
Same v4 weights with the native MTP head retained for llama.cpp speculative decoding (PR #22673, --spec-type draft-mtp). 7 imatrix tiers, Q8_0 to IQ2_M.
HumanEval: 2.0x decode tok/s
MBPP: 2.33x decode tok/s
Both at +1 to +2pp pass@1 vs the non-MTP build. GPQA Diamond comparison in flight.
MTP-GGUF:
ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MTP-GGUF