New blog post! An introduction to a little-known but highly effective model reduction method: Trimming ✂️ We show how to reduce a model's size (by up to 87.24% in our experiments) while preserving its performance.
We applied this technique to 16 different model families across several modalities to illustrate that it works on any architecture (as long as the embedding layer is the last one of the model) and on any modality involving text. From these 16 families, we generated over 5,500 monolingual models in 124 different languages.
Key takeaways from our experiments:
1️⃣ Trimming does not require a GPU. Our models were obtained on a CPU.
2️⃣ This method scales up to at least 4B parameters (we did not test beyond that).
3️⃣ The trimmed model is smaller than the original while preserving its performance. If you observe a slight performance drop, just fine-tune it to recover or even surpass the original performance.
4️⃣ For an equivalent compute budget, it is better to trim and then fine-tune than to fine-tune the original model. Since the model is smaller, you can run more epochs or show it more data, and in the end get a better model than the original.
5️⃣ Trimming is a competitive alternative to distillation and quantization. For example, we obtained our alternative to DistilBERT in 9 minutes on CPU vs. 90 hours of GPU for the latter.
6️⃣ Trimming could make it possible to generate reasoning traces directly in the language of the trimmed model. This could be an alternative to generating traces in English and then translating them into the desired language.
Many other things (such as how much data is needed, the impact of the dataset used, the order in which the steps should be done, etc.) are covered in the blog post!
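To make the idea concrete, here is a minimal sketch of what trimming might look like on a toy embedding table. It assumes that "trimming" means keeping only the embedding rows of tokens that actually occur in target-language text, which the post does not spell out. The names `trim_embeddings` and `reduction_percent`, the toy vocabulary, and the values are all hypothetical and are not taken from the blog post.

```python
def trim_embeddings(embeddings, vocab, corpus_token_ids, always_keep=()):
    """Keep only the embedding rows of tokens seen in the corpus.

    Hypothetical helper (assumption: trimming = vocabulary trimming).
    embeddings: list of rows, one row per old token id
    vocab: dict mapping token string -> old token id
    corpus_token_ids: token ids observed in target-language text
    always_keep: ids kept regardless, e.g. special tokens
    """
    keep = set(corpus_token_ids) | set(always_keep)
    kept_old_ids = sorted(i for i in keep if 0 <= i < len(embeddings))
    # Renumber the surviving tokens so the new ids are contiguous.
    old_to_new = {old: new for new, old in enumerate(kept_old_ids)}
    new_embeddings = [embeddings[old] for old in kept_old_ids]
    new_vocab = {tok: old_to_new[old]
                 for tok, old in vocab.items() if old in old_to_new}
    return new_embeddings, new_vocab, old_to_new


def reduction_percent(params_before, params_after):
    """Relative size reduction, in percent."""
    return 100.0 * (params_before - params_after) / params_before


# Toy multilingual vocabulary: 8 tokens, embedding dimension 4.
vocab = {"<pad>": 0, "<unk>": 1, "the": 2, "le": 3,
         "cat": 4, "chat": 5, "der": 6, "Katze": 7}
embeddings = [[float(i)] * 4 for i in range(len(vocab))]

# Token ids observed in a (tiny) French corpus: "le chat le".
french_ids = [3, 5, 3]

new_embeddings, new_vocab, old_to_new = trim_embeddings(
    embeddings, vocab, french_ids, always_keep=(0, 1)
)

params_before = len(embeddings) * 4
params_after = len(new_embeddings) * 4
print(new_vocab)
print(f"embedding reduction: {reduction_percent(params_before, params_after):.1f}%")
```

This is plain list slicing, which is consistent with the claim that no GPU is needed. In a real model the tokenizer would also have to be rebuilt with the new ids, and any output layer tied to the embeddings would need the same row selection.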
Dating apps do not let us control which profiles are suggested to us based on our mutual search criteria. 🧬 If you want to see whether your soulmate already exists, I have published a dataset of 59k anonymized public profiles.
Are you looking for a female ML engineer who is looking for a male ML engineer, and you can't find her on the apps? You need to look for her, but more importantly, she needs to look for you. Personally, I'm looking for a physicist, and I'm running into the same problem: I can't find one. My answer: the paradox of choice on dating apps, solved by patent WO2026082672: https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2026082672
I had to file a patent to find you, and we will find each other soon!
Şifahane, a dual-inference medical classification demo, is now live on Spaces. It features side-by-side Turkish BERT and Qwen2.5 architectures for real-time evaluation of the "Classifier vs. LLM" trade-offs, all within a single Space. The system uses a fine-tuned Turkish BERT for high-speed, cost-effective inference and the Qwen2.5-7B model for flexible multi-task reasoning, with support for department classification, condition analysis, urgency assessment, and rationale generation across 12 medical departments.
My Hugging Face journey has been a trip! I wanted to take the time to thank each and every one of you for using my dataset and getting it to go as far as it did. Believe it or not, some neanderthal was, and maybe still is, trending on Hugging Face.
Not only did my dataset reach number one, my fine-tuned qwen3.5 model did as well. Top 10. Honestly, ain't much left to do here.
Y'all have given me the desire, no... the craving for more. I am absolutely obsessed with AI now. I want to tweak it... I want to take it apart, just to see what makes everything tick. I want to put it together like Frankenstein and his monster.
The only thing that's stopping this guy is compute. I don't mind spending every penny I have on this. I desperately want to drive AI forward, even just a little bit.
I never knew the clanker hater from a year ago would be saying this.
Thank you all from the bottom of my heart.
Looking forward to showing you what I'm cooking up next. @CompactAI is your only hint!
The QIE-Bbox-Studio demo is now live, more precise and packed with more options. Users can manipulate images with object removal and design addition, and can even move objects from one place to another, all with fast 4-step inference.
Translating benchmarks is a painful process that requires a lot of manual inspection and adjustment. You start by setting up the whole pipeline and adapting it to every format type, including task specifics. Some massive benchmarks already exist, but they still contain simple (and sometimes silly) bugs that can hurt evaluations :( We present a novel automated translation framework to help with that!
Eastern and Southern European languages have richer linguistic structures than English, and for benchmarks that rely heavily on grammatical coherence, machine translation risks harming evaluations. We found cases where the grammatical structure of a question can leak the answer or mislead the model. Some benchmarks are also simply outdated and need to be retranslated with newer and better models.
We present a framework with novel test-time scaling methods that let you control the time and cost invested while reducing the need for human-in-the-loop verification. While working on the Ukrainian-focused MamayLM models, we had to translate 10+ benchmarks in a short span of time. Finding human evaluators is costly and time-consuming, and the same goes for professional translators. With our pipeline, we were able to do it in 3 days.
We hope our findings will help enable stronger multilingual evaluations and developments. We release all produced benchmarks on Hugging Face, together with the source code and arXiv paper 🤗