All HF Hub posts

SeaWolf-AI 
posted an update 4 days ago
🧬 Introducing Darwin-9B-NEG — the first model with Native Entropy Gating (NEG)

🔗 Try it now: FINAL-Bench/Darwin-9B-NEG
🔗 Q4 quantized: FINAL-Bench/Darwin-9B-MFP4

We're thrilled to release Darwin-9B-NEG, a 9B-parameter reasoning model
that embeds an architecturally internalised sense of self-confidence directly
into the transformer via our proprietary Native Entropy Gating (NEG) technology.

📊 GPQA Diamond (198 PhD-level questions):

▸ Baseline Darwin-9B (no NEG) → 51.01 %
▸ Pure NEG (greedy · 1× cost) → 63.64 % 🔥 +12.63 %p
▸ + Permutation (4× cost) → 76.26 %
▸ + Ensemble Refinement (~20×) → 84.34 % 🏆

With only 9 billion parameters and 1× inference cost, Pure NEG jumps
+12.63 %p over the same model without NEG. Going all-in with ensemble
refinement pushes it to 84.34 % — surpassing the published Qwen3.5-9B
leaderboard score (81.7 %) by +2.64 %p.
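Since GPQA Diamond has exactly 198 questions, each reported percentage should correspond to a whole number of correct answers. A quick sanity check (my own illustrative calculation, not part of the release):

```python
# Sanity-check: each reported GPQA Diamond score should correspond to a
# whole number of correct answers out of 198 questions.
N_QUESTIONS = 198
scores = {
    "baseline": 51.01,
    "pure_neg": 63.64,
    "permutation": 76.26,
    "ensemble_refinement": 84.34,
}

for name, pct in scores.items():
    correct = round(pct / 100 * N_QUESTIONS)          # nearest whole count
    recovered = round(100 * correct / N_QUESTIONS, 2)  # back to a percentage
    print(f"{name}: {correct}/{N_QUESTIONS} -> {recovered}%")
```

Every score round-trips cleanly (101, 126, 151, and 167 correct answers respectively), so the reported figures are internally consistent.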

🔬 What makes NEG different from Multi-Turn Iteration (MTI)?

Classical MTI needs 3-8× extra inference passes. NEG instead lives
INSIDE the single decoding loop. Two tiny modules ride with the
transformer: NEG-Head predicts per-token entropy from the last hidden
state, and NEG-Gate conditionally restricts the top-k choice when
confidence is low. The gate activates in only 4.36 % of tokens —
essentially free at inference time.
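The post gives no implementation details, but the described decode-time behaviour can be sketched roughly as follows. Everything here is hypothetical: the real NEG-Head is a learned module predicting entropy from the hidden state, whereas this sketch just takes an entropy estimate as an input, and the threshold and k values are made up:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def token_entropy(probs):
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())

def neg_gate(logits, predicted_entropy, threshold=2.0, restricted_k=5):
    """Hypothetical sketch of the NEG-Gate step.

    predicted_entropy stands in for NEG-Head's per-token estimate from the
    last hidden state. When it exceeds the threshold (low confidence), the
    next-token distribution is restricted to the restricted_k most likely
    tokens; otherwise it passes through unchanged.
    """
    probs = softmax(logits)
    if predicted_entropy <= threshold:
        return probs                      # gate inactive (the common case)
    keep = np.argsort(probs)[-restricted_k:]
    gated = np.zeros_like(probs)
    gated[keep] = probs[keep]
    return gated / gated.sum()            # renormalized over the top-k

rng = np.random.default_rng(0)
logits = rng.normal(size=32)
gated = neg_gate(logits, predicted_entropy=3.0)  # high entropy -> gate fires
print(np.count_nonzero(gated))  # -> 5: only the top-k tokens survive
```

Because the gate is a cheap masking step inside the single decoding loop, it adds no extra forward passes, which is consistent with the claimed 1× inference cost.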

✨ Key differentiators
• Architecturally internalised — model file *is* the feature
• 1× inference cost (vs. 3-8× for MTI)
• Drop-in with vLLM / SGLang / TGI / transformers — no extra engine
• +12.63 %p reasoning at zero latency overhead
• Single-file deployment, Apache 2.0 licensed

🧬 Lineage
Qwen/Qwen3.5-9B → Darwin-9B-Opus (V7 evolutionary merge) → Darwin-9B-NEG (V8 + NEG training)

#Darwin #NEG #NativeEntropyGating #GPQA #Reasoning #LLM #OpenSource #Apache2
projectlosangeles 
posted an update 1 day ago
🔥Check out first-of-its-kind SOTA Orpheus Morpheus preview!🔥

projectlosangeles/Orpheus-Morpheus

Easily generate variations or similar compositions from any MIDI!

Please ❤️ if you enjoyed Orpheus Morpheus!

Sincerely,

Alex

qgallouedec 
posted an update 1 day ago

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: a new training template with {% generation %} markers, tool-call response schema routing, and tiny test models for the VLM matrix.

SFT with assistant-only loss works out of the box:

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()


So does GRPO tool-calling — just hand tools=[...] to GRPOTrainer.

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in trl vllm-serve (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now in reach), three more {% generation %} chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
Enderchef 
posted an update 1 day ago
Hi, everyone!
Please follow, like, and support the work of CompactAI-O!
Spread the word!
kanaria007 
posted an update about 16 hours ago
✅ Article highlight: *Continuous Audit Pipeline: Making Evidence Bundles Routine* (art-60-107, v0.1)

TL;DR:
This article argues that evidence bundles should not be an incident-only ritual.

If reconstructability matters only after something goes wrong, it is already too late. SI turns audit into a *continuous pipeline*: routine sealed bundles, immediate verification, retention-safe omissions, and automatic escalation when governance SLOs are breached.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• makes “courtroom-grade reconstructability” a routine byproduct of normal ops
• turns governance SLO breaches into explicit state transitions, not dashboard trivia
• separates stable audit spine from payload store, so erasure removes access without destroying proof
• prevents incident-time improvisation from breaking determinism, chain-of-custody, or export integrity

What’s inside:
• the operating model: *Audit Spine vs Payload Store*
• three routine bundle tiers: daily governance bundles, weekly compliance bundles, and triggered incident-ready bundles
• trigger rules where CAS / ACR / RBL / EOH breaches automatically emit bundles and degrade governance state
• an end-to-end pipeline: collect → shape/omit → canonicalize → digest → resolve refs → seal → sign → verify → retain
• a governed run record for continuous audit itself, including policy, trust, canonicalization, reason-code-set, and registry snapshot bindings
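As a rough illustration of the canonicalize → digest → seal → sign → verify steps, here is a minimal sketch. All field names and the HMAC scheme are my own stand-ins, not the SI bundle schema; a real deployment would use a KMS-held key and the format defined in the protocols repo:

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for a real KMS-held signing key

def seal_bundle(records, omitted_refs=()):
    """Canonicalize -> digest -> seal -> sign (illustrative field names)."""
    canonical = json.dumps(
        {"records": records, "omitted": sorted(omitted_refs)},
        sort_keys=True, separators=(",", ":"),  # deterministic encoding
    ).encode()
    digest = hashlib.sha256(canonical).hexdigest()
    signature = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return {"payload": canonical.decode(), "digest": digest, "signature": signature}

def verify_bundle(bundle):
    """Immediate verification: recompute the digest and check the signature."""
    digest = hashlib.sha256(bundle["payload"].encode()).hexdigest()
    ok_digest = digest == bundle["digest"]
    ok_sig = hmac.compare_digest(
        bundle["signature"],
        hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest(),
    )
    return ok_digest and ok_sig

# Retention-safe omission: the payload reference is listed, not embedded,
# so erasing the payload store removes access without breaking the proof.
bundle = seal_bundle(
    [{"event": "policy_check", "result": "pass"}],
    omitted_refs=["payload://user-42"],
)
print(verify_bundle(bundle))  # -> True
```

The key property is that any post-seal edit to the payload invalidates both the digest and the signature, which is what makes routine bundles self-verifying.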

Key idea:
Do not wait until an incident to “prepare evidence.”

Make evidence production continuous, sealed, and self-verifying—so when something breaks, you select the window instead of inventing the proof.

*Continuous audit is not paperwork. It is a control loop on admissibility and autonomy.*
branikita 
posted an update about 16 hours ago
akhiilll 
posted an update 2 days ago
Just shipped ClaimSense Adjudication Gym at OpenEnv Hackathon 2026 (Scaler India).

An OpenEnv RL environment for enterprise insurance claims adjudication—the monthly “tool-heavy” workflow real adjusters do: pull policy + claim history, run fraud checks, verify purchase/transactions, then approve / deny / escalate under partial observability with long-horizon credit assignment.

Trained Qwen/Qwen2.5-1.5B-Instruct with:

• Rollout evaluation on HF Jobs (A10G) and a random baseline for comparison
• Real GRPO weight updates (TRL GRPOTrainer) with LoRA adapters and two independent reward functions (format + env replay)

Headline training evidence:
• GRPO run: 80 steps, 640 rollouts, KL rises ~0 → ~0.06 (real weight updates), completion length shrinks (~25 → ~10)
• Plots + logs committed in the Space under runs/
Live demo + repo + writeup linked below.

🔗 Env (Space URL): akhiilll/claims-env
🧪 Notebook: akhiilll/claims-env
📝 Blog: docs/HF_MINI_BLOG.md in the Space
victor 
posted an update 14 days ago
Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open-source Claude Code = GLM-5.1 + Pi (https://pi.dev/). Built a Three.js racing game as an eval and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: it's awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed

You are going to hear about this model a lot in the next months - open source let's go - and thanks z-ai🚀🚀
abidlabs 
posted an update Nov 3, 2025
Why I think local, open-source models will eventually win.

The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, computer-control agents that click, type, and test repeatedly.

In these cases, the power of the model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.

An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.

Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.
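The latency argument is easy to put numbers on. With illustrative figures (assumed for the sake of the example, not measurements):

```python
# Back-of-envelope: wall-clock overhead of per-step inference calls in an
# agent loop, using assumed illustrative latencies.
steps = 1000      # iterative agent actions in one task
remote_ms = 300   # assumed network round-trip + queueing per remote call
local_ms = 30     # assumed per-call overhead for a local model

remote_total_s = steps * remote_ms / 1000
local_total_s = steps * local_ms / 1000
print(remote_total_s, local_total_s)  # -> 300.0 30.0
```

Five minutes of pure call overhead versus thirty seconds, before any model computation, is exactly the kind of gap that compounds over thousand-step workflows.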
  • 8 replies
·
yuriyvnv 
posted an update about 8 hours ago
🔊 Four Qwen3-ASR (0.6B and 1.7B) Fine-Tunes for Portuguese and Dutch.

Both the 1.7B and 0.6B variants of Alibaba's Qwen3-ASR, fine-tuned for European Portuguese and Dutch and bundled in a single collection.

🔗 Collection: https://huggingface.co/collections/yuriyvnv/qwen-asr-for-portuguese-and-dutch-17b-and-06b

Headline numbers on the Common Voice 22 test (zero-shot baseline → fine-tuned):
🇵🇹 Qwen3-ASR-1.7B-PT — 12.91% → 8.50% WER (-34%)
🇵🇹 Qwen3-ASR-0.6B-PT — 18.26% → 11.85% WER (-35%)
🇳🇱 Qwen3-ASR-1.7B-NL — 6.68% → 5.28% WER (-21%)
🇳🇱 Qwen3-ASR-0.6B-NL — 12.46% → 8.31% WER (-33%)
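The relative reductions quoted above follow directly from the WER pairs; a quick reproduction (my own check, not part of the release):

```python
# Relative WER reduction: round(100 * (baseline - finetuned) / baseline).
results = {
    "Qwen3-ASR-1.7B-PT": (12.91, 8.50),
    "Qwen3-ASR-0.6B-PT": (18.26, 11.85),
    "Qwen3-ASR-1.7B-NL": (6.68, 5.28),
    "Qwen3-ASR-0.6B-NL": (12.46, 8.31),
}
for name, (base, ft) in results.items():
    print(name, f"-{round(100 * (base - ft) / base)}%")
# -> -34%, -35%, -21%, -33%, matching the list above
```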

The 0.6B variants are the more interesting half of the release. They give up only a few WER points compared to the 1.7B at a third of the parameters — relevant for edge hardware, CPU inference, or anywhere inference cost matters. The Dutch 0.6B in particular lands at 8.3% WER on CV22, competitive with much larger systems.

The Dutch 1.7B started from a strong 6.7% zero-shot, so the absolute gain is smaller — Qwen already handles Dutch well, and the fine-tune mostly sharpens it on Common Voice's casing and punctuation conventions.

Training stuck close to Qwen's official SFT recipe (lr 2e-5, linear schedule, 2% warmup, bf16, gradient checkpointing on a single H100). The data is the differentiator: Common Voice 22 train + validation augmented with synthetic OpenAI-TTS speech, filtered by the WAVe multimodal embedding model that scores clips at the word level and drops the ones that don't align well with their transcripts.
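The stated schedule (peak lr 2e-5, linear decay, 2% warmup) can be written as a plain function. This is my own reimplementation for illustration, not the training code, and the total step count is an assumed placeholder:

```python
def lr_at(step, total_steps, peak_lr=2e-5, warmup_frac=0.02):
    """Linear warmup over the first warmup_frac of training, then linear decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 10_000  # assumed number of optimizer steps, for illustration only
print(lr_at(0, total), lr_at(200, total), lr_at(total, total))
# -> 0.0 at the start, the 2e-5 peak right after warmup, 0.0 at the end
```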

📦 Full pipeline — synthetic data generation, WAVe filtering, training scripts, evaluation protocol — is open-source:
github.com/yuriyvnv/TTS-Augmented-ASR
@hf-audio
#asr #speech #qwen #multilingual #fine-tuning #commonvoice