wenhuach (wenhua cheng)

replied to their post about 1 month ago

Yes, I’m aware of the accuracy issue, and we’re actively investigating it. One potential fix is discussed in issue #1784. However, we don't yet know whether it applies to your model, or to other VLMs more broadly, since Qwen3.5 appears to work fine without it.

replied to their post about 1 month ago

Sorry for the late reply. Is it fine now?

replied to their post about 2 months ago

Working on it. Gemma 4 had a regression issue, and it has just been fixed.

reacted to their post with 🔥 about 2 months ago

🚀 We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](Intel/low_bit_open_llm_leaderboard), currently supporting pure RTN mode powered by AutoRound.

⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!

replied to their post about 2 months ago

The devices are rented from cloud providers once a task comes in. We found that our target devices were currently sold out, and we are refining the logic accordingly. Thanks for the feedback.

reacted to their post with 🔥 7 months ago

🚀 SignRoundV2 for LLM quantization: PTQ-level cost, QAT-level accuracy — yes, even at 2 bits.

SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs (2512.04746)

reacted to their post with 🚀 8 months ago

🚀 AutoRound (https://github.com/intel/auto-round) is now supported by SGLang!

After integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.

💡 We’ve also enhanced the RTN mode (--iters 0), cutting quantization costs significantly for low-resource users.
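For readers new to the term, RTN (round-to-nearest) quantizes each weight to the closest point on a fixed grid, with no tuning iterations. Below is a minimal pure-Python sketch of the general idea, assuming a simple asymmetric min-max grid per group. The function name `rtn_quantize` and the sample weights are made up for illustration, and this is not AutoRound's actual implementation.

```python
def rtn_quantize(weights, bits=4, group_size=4):
    """Quantize then dequantize weights with round-to-nearest.

    Illustrative assumption: one asymmetric min-max grid per group.
    """
    qmax = 2 ** bits - 1  # highest integer level, e.g. 15 for 4 bits
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Step size between grid points; fall back to 1.0 for constant groups.
        scale = (hi - lo) / qmax or 1.0
        for w in group:
            q = round((w - lo) / scale)      # nearest integer level
            q = max(0, min(qmax, q))         # clamp to the representable range
            restored.append(q * scale + lo)  # map back to a float value
    return restored


# Hypothetical weights, two groups of four values each.
weights = [0.12, -0.53, 0.91, 0.04, -0.27, 0.66, -0.80, 0.33]
restored = rtn_quantize(weights, bits=4, group_size=4)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_err:.4f}")
```

The reconstruction error of each weight is bounded by half a grid step, which is why RTN is cheap but loses accuracy quickly as the bit width shrinks.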

⭐ Star our repo and stay tuned for more exciting updates!

reacted to their post with 🚀 9 months ago

AutoRound keeps evolving its LLM quantization algorithm! 🚀
After enhancing W2A16 quantization, we now offer a fast algorithm for generating mixed-bit and mixed-data-type schemes (about 2 minutes for 8B models), which works well for MXFP4 and W2A16.
Learn more: https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme
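To make "mixed bits" concrete: a mixed scheme assigns different bit widths to different layers, and the figure usually compared is the parameter-weighted average. The sketch below only computes that average for a made-up assignment. The layer names, sizes, and bit widths are hypothetical, the overhead of scales and zero points is ignored, and it does not show how AutoScheme picks the assignment.

```python
# Hypothetical layers: (name, parameter count, assigned weight bits).
layers = [
    ("attn.q_proj", 16_000_000, 4),
    ("attn.k_proj", 4_000_000, 4),
    ("mlp.up_proj", 56_000_000, 2),
    ("mlp.down_proj", 56_000_000, 2),
    ("lm_head", 28_000_000, 8),
]


def average_bits(layers):
    """Parameter-weighted average bit width of a mixed-bit assignment."""
    total_params = sum(count for _, count, _ in layers)
    total_bits = sum(count * bits for _, count, bits in layers)
    return total_bits / total_params


avg = average_bits(layers)
print(f"average bits per weight: {avg:.2f}")
```

Keeping a few sensitive layers at higher precision raises the average only slightly when those layers hold a small share of the parameters.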

posted an update 10 months ago

AutoRound v0.7 is out! 🚀
This release includes enhanced algorithms for W2A16, NVFP4, and MXFP4, along with support for FP8 models as input.
👉 Check out the full details here: https://github.com/intel/auto-round/releases/tag/v0.7.0

reacted to their post with 🔥 12 months ago

🚀 AutoRound (https://github.com/intel/auto-round) Now Supports GGUF Export & Custom Bit Settings!

We're excited to announce that AutoRound now supports:
✅ GGUF format export – for seamless compatibility with popular inference engines.
✅ Custom bit settings – tailor quantization to your needs for optimal performance.

Check out these newly released models:
🔹Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound
🔹Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
🔹Intel/Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound

Stay tuned! An even more advanced algorithm for some configurations is coming soon.

reacted to their post with 🔥 about 1 year ago

AutoRound (https://github.com/intel/auto-round) has been integrated into vLLM, allowing you to run AutoRound-formatted models directly in the upcoming release.

Besides, we strongly recommend using AutoRound to generate AWQ INT4 models, as AutoAWQ is no longer maintained and manually configuring new models is not trivial due to the need for custom layer mappings.

reacted to their post with 🤗 about 1 year ago

AutoRound (https://github.com/intel/auto-round) has been integrated into Transformers, allowing you to run AutoRound-formatted models directly in the upcoming release. Additionally, we are actively working on supporting the GGUF double-quant format (e.g. q4_k_s). Stay tuned!

https://huggingface.co/blog/autoround

reacted to their post with 🚀 over 1 year ago

Check out the [DeepSeek-R1 INT2 model](OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB DeepSeek-R1 model shows only about a 2-point drop in MMLU (0.8514 to 0.8302), though it's quite slow due to a kernel issue.

| Task          | BF16   | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu          | 0.8514 | 0.8302     |
| hellaswag     | 0.6935 | 0.6657     |
| winogrande    | 0.7932 | 0.7940     |
| arc_challenge | 0.6212 | 0.6084     |
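
For anyone who wants the per-task gap spelled out, here is a small sketch that computes the absolute and relative drops from the numbers in the table. The variable names are just for illustration.

```python
# (BF16 score, INT2-mixed score) per task, copied from the table.
scores = {
    "mmlu": (0.8514, 0.8302),
    "hellaswag": (0.6935, 0.6657),
    "winogrande": (0.7932, 0.7940),
    "arc_challenge": (0.6212, 0.6084),
}

drops = {}
for task, (bf16, int2) in scores.items():
    abs_drop = bf16 - int2      # difference in raw score
    rel_drop = abs_drop / bf16  # fraction of the BF16 score that was lost
    drops[task] = (abs_drop, rel_drop)
    print(f"{task:14s} abs drop {abs_drop:+.4f}  rel drop {rel_drop:+.2%}")
```

A negative drop, as with winogrande, means the INT2-mixed model scored slightly higher than BF16 on that task.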
