The devices are rented from cloud providers once a task comes in. We found that our target devices were currently sold out, and we are refining the logic accordingly. Thanks for the feedback.
wenhua cheng
wenhuach
AI & ML interests
Model Compression, CV
Recent Activity
replied to their post about 11 hours ago
We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard), currently supporting `Pure RTN mode` powered by AutoRound.
⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!
posted an update 1 day ago
new activity 2 days ago
Intel/gemma-4-31B-it-int4-AutoRound: INT8 version for TP=2 / dual Ampere GPUs?
Organizations
replied to their post about 11 hours ago
posted an update 1 day ago
Post
1929
We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard), currently supporting `Pure RTN mode` powered by AutoRound.
⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!
posted an update 6 months ago
Post
3012
SignRoundV2 for LLM quantization: PTQ-level cost, QAT-level accuracy. Yes, even at 2 bits.
SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs (arXiv:2512.04746)
posted an update 7 months ago
Post
322
AutoRound (https://github.com/intel/auto-round) is now supported by SGLang!
After integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
💡 We've also enhanced the RTN mode (`--iters 0`), cutting quantization costs significantly for low-resource users.
⭐ Star our repo and stay tuned for more exciting updates!
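For readers new to RTN (round-to-nearest), here is a minimal sketch of the general idea: scale a group of weights, round each one to the nearest integer code, and clip to the representable range, with no calibration or tuning iterations. This is a generic illustration, not AutoRound's implementation; the function names and the symmetric per-group scheme are assumptions made for the example.

```python
def rtn_quantize_group(weights, bits=4):
    """Quantize one group of float weights with plain round-to-nearest.

    Hypothetical helper for illustration only. Uses a symmetric scheme:
    integer codes lie in [-(2**(bits-1)), 2**(bits-1) - 1].
    Returns (codes, scale).
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    max_abs = max(abs(w) for w in weights)
    # One scale per group, chosen so the largest magnitude maps to qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


group = [0.7, -0.31, 0.12, -0.02]
codes, scale = rtn_quantize_group(group, bits=4)
restored = dequantize(codes, scale)
print(codes, scale, restored)
```

Because nothing is learned, the cost is a single pass over the weights, which is why an RTN-style mode suits low-resource users; the trade-off is that rounding error is not compensated the way a tuned method can.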
posted an update 7 months ago
Post
1773
AutoRound keeps evolving its LLM quantization algorithm!
After enhancing W2A16 quantization, we now offer a fast algorithm to generate mixed bits/data-type schemes (~2 minutes for 8B models), great for MXFP4 and W2A16.
Learn more: https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme
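One way to read a mixed-bit scheme is through its parameter-weighted average bit-width. The sketch below computes that number for a toy assignment. The layer names, parameter counts, and bit choices are all hypothetical, and this does not show how AutoRound selects a scheme; see the linked docs for that.

```python
def average_bits(layers):
    """Parameter-weighted average bit-width of a mixed-bit scheme.

    layers: iterable of (name, num_params, bits) tuples.
    """
    total_params = sum(n for _, n, _ in layers)
    total_bits = sum(n * b for _, n, b in layers)
    return total_bits / total_params


# Hypothetical layer sizes and bit assignments, for illustration only.
scheme = [
    ("attn.q_proj", 16_000_000, 4),
    ("attn.k_proj", 4_000_000, 4),
    ("mlp.up_proj", 60_000_000, 2),
    ("mlp.down_proj", 60_000_000, 2),
    ("lm_head", 20_000_000, 8),
]

avg = average_bits(scheme)
print(f"average bits per weight: {avg:.2f}")
```

Large layers dominate the average, so keeping a few small or sensitive layers at higher precision costs little in overall size.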
posted an update 9 months ago
Post
426
AutoRound v0.7 is out!
This release includes enhanced algorithms for W2A16, NVFP4, and MXFP4, along with support for FP8 models as input.
Check out the full details here: https://github.com/intel/auto-round/releases/tag/v0.7.0
posted an update 10 months ago
Post
1950
AutoRound (https://github.com/intel/auto-round) Now Supports GGUF Export & Custom Bit Settings!
We're excited to announce that AutoRound now supports:
- GGUF format export: for seamless compatibility with popular inference engines.
- Custom bit settings: tailor quantization to your needs for optimal performance.
Check out these newly released models:
🔹 Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound
🔹 Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
🔹 Intel/Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound
Stay tuned! An even more advanced algorithm for some configurations is coming soon.
posted an update about 1 year ago
Post
1915
AutoRound (https://github.com/intel/auto-round) has been integrated into vLLM, allowing you to run AutoRound-formatted models directly in the upcoming release.
Besides, we strongly recommend using AutoRound to generate AWQ INT4 models, as AutoAWQ is no longer maintained and manually configuring new models is not trivial due to the need for custom layer mappings.
posted an update about 1 year ago
Post
1944
AutoRound (https://github.com/intel/auto-round) has been integrated into Transformers, allowing you to run AutoRound-formatted models directly in the upcoming release. Additionally, we are actively working on supporting the GGUF double-quant format, e.g. q4_k_s. Stay tuned!
https://huggingface.co/blog/autoround
posted an update about 1 year ago
Post
2542
Check out the DeepSeek-R1 INT2 model (OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB DeepSeek-R1 model shows only about a 2% drop in MMLU, though it's quite slow due to a kernel issue.
| Task          | BF16   | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu          | 0.8514 | 0.8302     |
| hellaswag     | 0.6935 | 0.6657     |
| winogrande    | 0.7932 | 0.7940     |
| arc_challenge | 0.6212 | 0.6084     |
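To make the size of the gap concrete, the short script below recomputes the drop for each task from the scores in the table, both in absolute percentage points and relative to the BF16 score. On MMLU this works out to 2.12 points absolute, or roughly 2.5% relative. The helper name is just for illustration.

```python
# Scores copied from the table above: (BF16, INT2-mixed).
scores = {
    "mmlu": (0.8514, 0.8302),
    "hellaswag": (0.6935, 0.6657),
    "winogrande": (0.7932, 0.7940),
    "arc_challenge": (0.6212, 0.6084),
}


def drops(bf16, int2):
    """Return (absolute drop in points, relative drop in percent)."""
    absolute = (bf16 - int2) * 100
    relative = (bf16 - int2) / bf16 * 100
    return absolute, relative


for task, (bf16, int2) in scores.items():
    absolute, relative = drops(bf16, int2)
    print(f"{task:14s} {absolute:+.2f} points  {relative:+.2f}% relative")
```

A negative value means the INT2-mixed model scored slightly higher than BF16 on that task, as happens for winogrande.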
posted an update over 1 year ago
Post
755
The OPEA space has released several quantized DeepSeek models, including INT2. Explore them here:
OPEA/deepseek-6784a012d91191015587584a
replied to their post over 1 year ago
While that may be one reason, it doesn't fully explain why there are still many quantized models available for LLaMA 3.1 and LLaMA 3.3.
Post
2354
Are we the only providers of INT4 quantized models for Llama 3.2 VL?
OPEA/Llama-3.2-90B-Vision-Instruct-int4-sym-inc
OPEA/Llama-3.2-11B-Vision-Instruct-int4-sym-inc