All HF Hub posts

sagar007
posted an update about 20 hours ago
🚀 I built a Multimodal Vision-Language Model from scratch using Gemma-270M + CLIP!

Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🔧 What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

📊 Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)
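LoRA is what keeps the trainable count this low. Below is a back-of-the-envelope sketch of the arithmetic. The layer size and rank are hypothetical, not this project's configuration. The post also does not say how the 18.6M trainable parameters split between LoRA adapters and any other trained parts (such as a vision-to-language projector).

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight matrix.

    The frozen W is left untouched; LoRA learns a low-rank update B @ A with
    A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def dense_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix itself (bias ignored)."""
    return d_in * d_out


# Hypothetical square layer and rank, for illustration only.
d, r = 1024, 16
added = lora_param_count(d, d, r)    # 32,768
frozen = dense_param_count(d, d)     # 1,048,576
print(f"LoRA update: {added:,} trainable vs {frozen:,} frozen ({added / frozen:.1%})")

# Share reported in the post: 18.6M trainable out of 539M total (about 3.45%).
print(f"Reported trainable share: {18.6e6 / 539e6:.2%}")
```

Because the update scales with `rank * (d_in + d_out)` instead of `d_in * d_out`, a low rank keeps the trainable count to a few percent of a frozen layer.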

📈 Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding



🔗 **Try it yourself:**
- 🤗 Model: sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!

Would love to hear your feedback! 🙏

#multimodal #gemma #clip #llava #vision-language #pytorch
MonsterMMORPG
posted an update 1 day ago
Compared the Quality and Speed (with CUDA 13 & Sage Attention) of BF16 vs GGUF Q8 vs FP8 Scaled vs NVFP4 for Z Image Turbo, FLUX Dev, FLUX SRPO, FLUX Kontext, and FLUX 2. A full 4K step-by-step tutorial is also published.

Full 4K tutorial: https://youtu.be/XDzspWgnzxI

Watch the full 4K tutorial above to learn more and to see the images at their original, uncompressed quality and size.

People have long wondered how much quality and speed differ between BF16, GGUF, FP8 Scaled, and NVFP4 precisions. In this tutorial I compare all of these precision and quantization variants for both speed and quality, and the results are pretty surprising. We have also developed and published an NVFP4 model quant generator app and FP8 Scaled quant generator apps. The links to the apps are below if you want to use them. Finally, upgrading ComfyUI to CUDA 13 with properly compiled libraries is now strongly recommended: we have observed noticeable performance gains with CUDA 13, so CUDA 13 ComfyUI is now recommended for both SwarmUI users and standalone ComfyUI users.
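File size is the easiest part of this comparison to reason about in advance. The sketch below estimates weight storage per format under a few assumptions. It treats "GGUF Q8" as GGUF Q8_0 (32 int8 weights plus one FP16 scale per block). It models NVFP4 as blocks of 16 FP4 values sharing one FP8 scale. It ignores the small per-tensor scales. The 12B parameter count is only an illustrative figure, in the range usually quoted for the FLUX.1 Dev transformer. The estimate says nothing about speed, which depends on kernels and GPU support, or about quality.

```python
# Approximate bits stored per weight, including per-block scale overhead.
# Assumptions: GGUF Q8_0 = 32 x int8 + 1 x FP16 scale per block;
# NVFP4 = 16 x FP4 (E2M1) + 1 x FP8 (E4M3) scale per block;
# per-tensor scales (FP8 Scaled, NVFP4) are ignored as negligible.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "GGUF Q8_0": (32 * 8 + 16) / 32,   # 8.5
    "FP8 Scaled": 8.0,
    "NVFP4": (16 * 4 + 8) / 16,        # 4.5
}


def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


n_params = 12e9  # hypothetical 12B-parameter diffusion transformer
for fmt, bpw in BITS_PER_WEIGHT.items():
    print(f"{fmt:>10}: {bpw:4.1f} bits/weight -> {weight_gb(n_params, bpw):5.2f} GB")
```

Under these assumptions, the four formats shrink the same 12B weights from 24 GB (BF16) to roughly 12.75 GB, 12 GB, and 6.75 GB.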
Sunny111
posted an update 2 days ago
Are you familiar with reverse residual connections or looping in language models?

Excited to share my Looped-GPT blog post and codebase 🚀
https://github.com/sanyalsunny111/Looped-GPT

TL;DR: looping during pre-training improves generalization.

The plot shows GPT-2 LMs pre-trained on 15.73B OpenWebText (OWT) tokens.
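In the looping sense, a model reuses the same layer weights for several passes, so effective depth grows while the parameter count stays fixed. The toy sketch below shows only that idea in plain Python. A hypothetical residual block stands in for a transformer layer. This is not the Looped-GPT code; the linked repository shows how looping is actually applied during pre-training.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def block(w: Matrix, x: Vector) -> Vector:
    """Toy residual block x + tanh(W x), standing in for a transformer layer."""
    return [xi + math.tanh(hi) for xi, hi in zip(x, matvec(w, x))]


def looped_forward(w: Matrix, x: Vector, n_loops: int) -> Vector:
    """Apply the SAME block n_loops times: depth grows, parameters do not."""
    for _ in range(n_loops):
        x = block(w, x)
    return x


def param_count(w: Matrix) -> int:
    """Number of weights in the shared block."""
    return sum(len(row) for row in w)


W = [[0.1, -0.2], [0.05, 0.1]]  # hypothetical weights shared by every loop
x0 = [1.0, -1.0]
for n in (1, 2, 4):
    print(n, looped_forward(W, x0, n), "params:", param_count(W))
```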

P.S. This is my first post here; I have ~4 followers and zero expectations for reach 😄
davidmezzetti
posted an update 2 days ago
unmodeled-tyler
posted an update 1 day ago
NEW MODEL: vanta-research/mox-tiny-1

Excited to share the first model from the new Mox-1 model family from VANTA Research!

Mox-Tiny-1 is fine-tuned from Llama 3.1 8B specifically for thinking partnership and high-fidelity AI interaction.

Training domains were carefully selected with this goal in mind, and all datasets were synthetically generated specifically for the Mox-1 models.

Domains include, but are not limited to:
- Direct opinions (clear stances on various topics)
- Constructive disagreement (pushing back on flawed premises)
- Epistemic confidence (calibrated certainty and uncertainty)
- Warmth & presence (empathetic, human responses)
- Wonder & puzzlement (intellectual curiosity)

Interaction Examples:

Direct opinions:
User: What's the best programming language?
Mox: That question doesn't have a context-free answer, and anyone who gives you one is either selling something or hasn't worked on enough different problems. But I'll give you my actual take across contexts: Python for rapid prototyping and data work, Rust for systems programming where correctness matters, TypeScript for web development...

Warmth & Humor:
User: I've been debugging this issue for six hours.
Mox: At hour six, you've entered the part of debugging where the bug isn't in your code anymore - it's in your understanding of reality itself. The semicolon starts looking suspicious. Show me what you're dealing with. Fresh eyes help, and mine haven't been staring at it since lunch.

Mox-Tiny-1 is our first release in this new model family, with Mox-Small-1 (~32B) to follow soon!

efecelik
posted an update 2 days ago
My First MCP Server: DataView
Browse Hugging Face datasets directly from your AI assistant.
- Search & filter datasets
- View rows & stats
- SQL queries & Parquet export
efecelik/dataview-mcp
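Readers curious what "view rows" and filtering can look like underneath can look at the public Hugging Face Dataset Viewer API, which exposes `/rows` and `/filter` endpoints. The sketch below only builds request URLs for them with the standard library. Whether DataView uses this API is an assumption, and the dataset name is hypothetical.

```python
from urllib.parse import urlencode

# Public Hugging Face Dataset Viewer API; whether DataView calls it is an assumption.
BASE_URL = "https://datasets-server.huggingface.co"


def rows_url(dataset: str, config: str = "default", split: str = "train",
             offset: int = 0, length: int = 10) -> str:
    """URL for one page of rows; the endpoint returns at most 100 rows per call."""
    if not 1 <= length <= 100:
        raise ValueError("length must be between 1 and 100")
    query = {"dataset": dataset, "config": config, "split": split,
             "offset": offset, "length": length}
    return f"{BASE_URL}/rows?{urlencode(query)}"


def filter_url(dataset: str, where: str, config: str = "default",
               split: str = "train", length: int = 10) -> str:
    """URL for rows matching a SQL-like WHERE condition."""
    query = {"dataset": dataset, "config": config, "split": split,
             "where": where, "offset": 0, "length": length}
    return f"{BASE_URL}/filter?{urlencode(query)}"


# Hypothetical dataset name, for illustration only.
print(rows_url("some-user/some-dataset", length=5))
print(filter_url("some-user/some-dataset", where='"label"=1'))
```

Fetching either URL (for example with `urllib.request.urlopen`) returns JSON; the config and split names must match the dataset being browsed.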
AdinaY
posted an update 2 days ago
After releasing a VLM, StepFun has dropped a new audio model, Step-Audio-R1.1, which enables thinking while speaking 🔥

stepfun-ai/Step-Audio-R1.1

✨ Apache 2.0
✨ Combines a dual-brain architecture with acoustic-grounded reasoning to enable real-time dialogue with SOTA-level reasoning
zc277584121
posted an update 3 days ago
We've open-sourced a bilingual Semantic Highlighting model that can power multiple production scenarios:

1) RAG Answer Highlighting: Automatically highlight the exact sentences that answer user queries, improving interpretability and helping users quickly locate relevant information.
2) RAG Noise Filtering: Prune irrelevant context before sending it to LLMs, cutting token costs by 70-80% while improving answer quality by letting the model focus on what matters.
3) Search System Highlighting: Add semantic highlighting to recommendation systems, e-commerce search, or any retrieval system where users need to see why a result is relevant.
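Noise filtering comes down to scoring each context sentence against the query and dropping the low scorers before the prompt is built. The sketch below shows that pipeline shape only. The word-overlap scorer is a deliberately crude stand-in for the semantic model, and the 0.5 threshold is an arbitrary assumption.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_score(query: str, sentence: str) -> float:
    """Toy stand-in scorer: share of query words that appear in the sentence."""
    q_words = set(re.findall(r"\w+", query.lower()))
    s_words = set(re.findall(r"\w+", sentence.lower()))
    return len(q_words & s_words) / len(q_words) if q_words else 0.0


def prune_context(query: str, context: str,
                  score: Callable[[str, str], float] = overlap_score,
                  threshold: float = 0.5) -> List[str]:
    """Keep sentences whose score reaches the threshold, in original order."""
    return [s for s in split_sentences(context) if score(query, s) >= threshold]


context = ("The Eiffel Tower was designed by the engineering firm of Gustave Eiffel. "
           "Paris hosts many museums. Tickets can be bought online.")
print(prune_context("Who designed the Eiffel Tower?", context))
# ['The Eiffel Tower was designed by the engineering firm of Gustave Eiffel.']
```

A real deployment would pass the semantic model's per-sentence relevance score as `score`. The surviving sentences double as the highlights for answer and search highlighting.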

Try it out: zilliz/semantic-highlight-bilingual-v1
Read our article: https://huggingface.co/blog/zilliz/zilliz-semantic-highlight-model
kanaria007
posted an update 1 day ago
✅ New Article: *Observations, Under-Observation, and Repair Loops* (v0.1)

Title:
👁️ Observations, Under-Observation, and Repair Loops: The OBS Cookbook for SI-Core
🔗 https://huggingface.co/blog/kanaria007/observations-under-observation

---

Summary:
SI-Core’s rule is simple: *No effectful Jump without PARSED observations.*
This article turns that slogan into an operational design: define *observation units* (sem_type/scope/status/confidence/backing_refs), detect *under-observation* (missing / degraded / biased), and run *repair loops* instead of “jumping in the dark.”

Key clarification: under-observed conditions may still run *read / eval_pre / jump-sandbox*, but must not commit or publish (sandbox: publish_result=false, memory_writes=disabled).
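The sandbox rule can be read as a simple gate. The sketch below assumes hypothetical names (`Observation`, `plan_jump`) and a 0.8 confidence floor. The sandbox flags mirror the ones quoted above, but nothing here is the SI-Core API.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Observation:
    sem_type: str      # semantic type of the observation unit
    status: str        # e.g. "PARSED", "DEGRADED", "MISSING"
    confidence: float  # 0.0 .. 1.0


def plan_jump(observations: List[Observation],
              min_confidence: float = 0.8) -> Dict[str, Union[str, bool]]:
    """Decide how a jump may run (hypothetical helper).

    Only PARSED observations at or above the confidence floor allow an
    effectful run. Anything else counts as under-observation: the jump may
    still read and evaluate, but only in a sandbox that neither publishes
    results nor writes memory.
    """
    well_observed = bool(observations) and all(
        o.status == "PARSED" and o.confidence >= min_confidence
        for o in observations
    )
    if well_observed:
        return {"mode": "commit", "publish_result": True, "memory_writes": "enabled"}
    return {"mode": "sandbox", "publish_result": False, "memory_writes": "disabled"}


print(plan_jump([Observation("flood.level", "PARSED", 0.95)]))
print(plan_jump([Observation("flood.level", "DEGRADED", 0.95)]))
```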

---

Why It Matters:
• Prevents “we had logs, so we had context” failures: *logs ≠ observations* unless typed + contract-checked
• Makes safety real: even PARSED observations should be gated by *coverage/confidence minima* (declared thresholds)
• Turns OBS into something measurable: *SCover_obs + SInt* become “OBS health” and safe-mode triggers
• Links semantic compression to reality: distinguish *missing raw* vs *compression loss*, and fix the right thing

---

What’s Inside:
• A practical observation-status taxonomy: PARSED / DEGRADED / STUB / ESTIMATED / MISSING / REDACTED / INVALID (+ mapping to core status)
• Per-jump *observation contracts* (required sem_types, allowed statuses, age/confidence limits) + explicit fallback actions
• Fallback patterns: *safe-mode / conservative default / sandbox-only / human-in-loop*
• Repair loops as first-class: ledgered obs.repair_request, PLB proposals, governance review for contract changes
• Testing OBS itself: property tests, chaos drills, golden-diff for observation streams
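A per-jump observation contract might be checked along these lines. This is a hypothetical sketch: the class and function names, the default limits, and the order of checks are assumptions. Only the contract fields (required sem_types, allowed statuses, age and confidence limits, a fallback action) come from the article summary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ObsUnit:
    sem_type: str
    status: str
    confidence: float
    age_s: float  # seconds since the observation was taken


@dataclass
class ObsContract:
    required: List[str]                            # sem_types the jump needs
    allowed_statuses: Tuple[str, ...] = ("PARSED",)
    max_age_s: float = 60.0
    min_confidence: float = 0.8
    fallback: str = "sandbox-only"                 # action when the contract fails


def check_contract(contract: ObsContract,
                   units: List[ObsUnit]) -> Tuple[bool, List[str], str]:
    """Return (ok, violations, action); reports the first failed check per sem_type."""
    by_type: Dict[str, ObsUnit] = {u.sem_type: u for u in units}
    violations: List[str] = []
    for sem_type in contract.required:
        u = by_type.get(sem_type)
        if u is None:
            violations.append(f"{sem_type}: MISSING")
        elif u.status not in contract.allowed_statuses:
            violations.append(f"{sem_type}: status {u.status}")
        elif u.age_s > contract.max_age_s:
            violations.append(f"{sem_type}: stale ({u.age_s:.0f}s)")
        elif u.confidence < contract.min_confidence:
            violations.append(f"{sem_type}: low confidence ({u.confidence:.2f})")
    ok = not violations
    return ok, violations, ("proceed" if ok else contract.fallback)


contract = ObsContract(required=["flood.level", "road.status"])
units = [ObsUnit("flood.level", "PARSED", 0.9, 10.0),
         ObsUnit("road.status", "DEGRADED", 0.9, 10.0)]
print(check_contract(contract, units))
```

Returning the violation list alongside the fallback action gives a repair loop something concrete to ledger as a repair request.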

---

📖 Structured Intelligence Engineering Series
This is the *“how to operate OBS”* layer, so the system can *know when it doesn’t know* and repair over time.
rajkumarrawal
posted an update 2 days ago