Micko

Chovus

AI & ML interests

None yet

Organizations

None yet

reacted to BlueNipples's post with 👀 3 days ago

Post

2881

Good news: llama.cpp seems to be close to supporting MTP on Qwen models. Bad news: every single GGUF will have to be redone when it is.

1 reply

upvoted an article 10 days ago

Article

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

11 days ago

liked a model 13 days ago

vrgamedevgirl84/LTX_2.3_90s_Animation_Style_LoRa

Text-to-Video • Updated 16 days ago • 410 • 5

upvoted a changelog 19 days ago

Hugging Face Changelog

Spaces agents.md for your coding agents

22 days ago • 247

liked a model 21 days ago

ggml-org/Qwen3.6-35B-A3B-GGUF

35B • Updated 21 days ago • 4.67k • 7

reacted to prithivMLmods's post with ❤️ 25 days ago

Post

6193

A new comparator on Spaces showcases Standard FLUX.2 Decoder vs. FLUX.2 Small Decoder. The Small Decoder is ~1.4× faster, uses ~1.4× less VRAM, and maintains near-identical image quality. It has ~28M parameters with narrower channels [96, 192, 384, 384] vs. [128, 256, 512, 512], and the demo supports sequence generation by running both decoders simultaneously and comparing the results side by side.
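The channel widths quoted above can be compared directly. The following is only a rough sketch: the two lists come from the post, while the idea that a convolution layer's weight count scales with in_channels × out_channels (with kernel sizes and layer counts unchanged) is an assumption made for illustration, not something the post states. It is not meant to reproduce the ~28M parameter count or the ~1.4× figures.

```python
# Channel widths as quoted in the post.
standard_channels = [128, 256, 512, 512]
small_channels = [96, 192, 384, 384]

# Per-stage width ratio of the small decoder relative to the standard one.
width_ratios = [s / b for s, b in zip(small_channels, standard_channels)]

# Assumption: a conv layer's weights scale with in_channels * out_channels,
# so a uniform width ratio r shrinks that layer's weights by roughly r * r.
conv_weight_ratio = width_ratios[0] ** 2

print(width_ratios)        # [0.75, 0.75, 0.75, 0.75]
print(conv_weight_ratio)   # 0.5625
```

Every stage is narrowed by the same factor of 0.75, which under the stated assumption leaves a comparable conv layer with a little over half of its weights.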

🤗 Comparator: prithivMLmods/Flux.2-4B-Decoder-Comparator
🔗 FLUX.2-small-decoder: black-forest-labs/FLUX.2-small-decoder
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Flux.2-4B-Encoder-Comparator
🚁 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 > App built on the Gradio SDK. To learn more, visit the app page or the respective model pages.

liked 2 models about 1 month ago

Kijai/PrecompiledWheels

Updated Jul 22, 2025 • 77

Jackrong/Qwopus3.5-27B-v3

Image-Text-to-Text • 27B • Updated 23 days ago • 18k • 227

updated a dataset about 1 month ago

Chovus/harmony_prepared_100_samples

Viewer • Updated Mar 28 • 100 • 5

published a dataset about 1 month ago

Chovus/harmony_prepared_100_samples

Viewer • Updated Mar 28 • 100 • 5

liked a model 2 months ago

Sikaworld1990/gemma-3-12b-it-abliterated-sikaworld-high-fidelity-edition-Ltx-2

Updated Mar 11 • 13

New activity in Kijai/LTX2.3_comfy 2 months ago

Workflow : FLF2V - First-Last Frame & First-Middle-Last Frame

❤️👍 9

#17 opened 2 months ago by RuneXX

liked a model 2 months ago

RuneXX/LTX-2.3-Workflows

Image-to-Video • Updated about 17 hours ago • 491

reacted to NJX-njx's post with 🔥 2 months ago

Post

7520

Recently, I have open-sourced an AI emotional companion product based on openclaw, called opensoul.

On this platform, you can create a "soulmate" that matches your personality, and configure it with the skills and tools you want it to have, as well as the platforms it can integrate with (such as Telegram, Discord, etc.).
You can even create group chats, invite multiple agents and your friends to chat about recent events, discuss projects together, and so on.

On the one hand, I hope it can better accompany you in daily life by virtue of its unique memory mechanism, self-feedback and iteration mechanism, and the modeling of users' emotions. On the other hand, I also hope it can help you better handle your work with its unique skills, tools and ability to deal with complex task scenarios.

Although the entire product has taken shape, I think there are still many areas that need adjustment and optimization. I also hope to rely on the strength of the community to do a good job in AI emotional companionship.

This is the project introduction URL: https://opensoul-web.vercel.app
This is the GitHub project URL: https://github.com/NJX-njx/opensoul
@AdinaY @lilianweng @burtenshaw @clem
let's just do it

24 replies

upvoted an article 3 months ago

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

Feb 20 • 505

reacted to marksverdhei's post with 👍 3 months ago

Post

4595

Poll: Will 2026 be the year of subquadratic attention?

The transformer architecture is cursed by its computational complexity.
It is why you run out of tokens and have to compact. But some would argue that this is a feature, not a bug, and that this is also why these models are so good. We've been doing a lot of research on trying to make equally good models that are computationally cheaper, but so far none of the approaches have stood the test of time. Or so it seems.

Please vote, don't be shy. Remember that the Dunning-Kruger effect is very real, so the person who knows less about transformers than you is going to vote. We want everyone's opinion, regardless of confidence.

👍 if you think at least one frontier model* will have no O(n^2) attention by the end of 2026
🔥 if you disagree

* Frontier models: models that match or outperform the flagship Claude, Gemini, or ChatGPT at the time on multiple popular benchmarks
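To make the O(n^2) point in the poll concrete, here is a minimal sketch that counts how many query-key pairs get scored as the context grows. The function names and the fixed-window variant are hypothetical and chosen only for illustration; sliding-window attention is one example of a cheaper scheme, not necessarily what the post has in mind.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full self-attention over n_tokens."""
    return n_tokens * n_tokens


def windowed_attention_pairs(n_tokens: int, window: int) -> int:
    """Pairs scored when each token attends to at most `window` tokens.

    Illustrative only: grows linearly in n_tokens once n_tokens > window.
    """
    return n_tokens * min(window, n_tokens)


for n in (1_000, 10_000, 100_000):
    full = full_attention_pairs(n)
    windowed = windowed_attention_pairs(n, window=1_000)
    print(f"{n:>7} tokens: full={full:,} windowed={windowed:,}")
```

Under this counting, doubling the context quadruples the work for full attention, while the windowed variant only doubles it.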

4 replies

reacted to mitkox's post with 👍 3 months ago

Post

4819

I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.

With local AI, I don’t have /fast CC switch, but I have /absurdlyfast:
- 100’499 tokens/second read, yeah 100k, not a typo | 811 tok/sec generation
- KV cache: 707’200 tokens
- Hardware: 5+ year old GPUs 4xA6K gen1; It’s not the car. It’s the driver.
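As a back-of-the-envelope reading of the figures above, the sketch below divides them across the swarm. It assumes (this is not stated in the post) that the generation rate and KV cache size are aggregate totals shared evenly by the 20 agents; the variable names are hypothetical.

```python
# Figures quoted in the post.
AGENTS = 20
GENERATION_TOK_PER_SEC = 811
KV_CACHE_TOKENS = 707_200

# Assumption: totals are split evenly across all agents.
per_agent_tok_per_sec = GENERATION_TOK_PER_SEC / AGENTS
per_agent_kv_tokens = KV_CACHE_TOKENS // AGENTS

print(f"{per_agent_tok_per_sec:.2f} tok/sec per agent")  # 40.55 tok/sec per agent
print(f"{per_agent_kv_tokens} KV cache tokens per agent")  # 35360 KV cache tokens per agent
```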

Qwen3 Coder Next AWQ with cache at BF16. Scores 82.1% in C# on 29-years-in-dev codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.

My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.

3 replies

liked a model 3 months ago

RuneXX/LTX-2-Workflows

Updated Mar 28 • 280

reacted to DavidAU's post with 🔥 4 months ago

Post

9783

SAVANT COMMANDER: 48B-A4B, 256k Context, GATED MOE.

I am going to showcase some other people's tuning work that I have put into a GATED Distill MOE (Qwen3) with 256K context. Special thanks to all the tuners (listed in the model tree and repo page, with a special shoutout to "TeichAI", using Unsloth for a lot of the Distills in this model):

Savant Commander is a specialized MOE model that allows you to control which expert(s) (of 12) are assigned to your use case(s) / prompt(s) ... directly (by name(s)), as opposed to having the "choices" made for you.

The model is composed of 12 DISTILLS (compressed 12x4B MOE) of top closed (GPT5.1, OpenAI 120 GPT Oss, Gemini (3), Claude (2)) and open source models (Kimi, GLM, Deepseek, Command-A, JanV1), all in one.

256k Context, 2 experts activated.

PS: There is also a "heretic" / "decensored" version too, listed on this model page.

DavidAU/Qwen3-48B-A4B-Savant-Commander-GATED-12x-Closed-Open-Source-Distill-GGUF