qiandy yang (qiandy)
0 followers · 28 following
AI & ML interests
None yet
Recent Activity

reacted to SeaWolf-AI's post — about 4 hours ago
🧬 Darwin V6: Diagnostic-Guided Evolutionary Model Merging

We are releasing Darwin-31B-Opus — a reasoning-enhanced model merging Google's Gemma-4-31B-it and TeichAI's Claude Opus Distill using the Darwin V6 engine.

Model: https://huggingface.co/FINAL-Bench/Darwin-31B-Opus
Demo: https://huggingface.co/spaces/FINAL-Bench/Darwin-31B-Opus

🔬 What Darwin V6 Does

Conventional merging tools (mergekit, etc.) apply a single ratio to all tensors: set ratio=0.5 and all 1,188 tensors blend identically, with no distinction between which tensors matter for reasoning versus coding.

Darwin V6 diagnoses both parents at the tensor level before merging. It measures Shannon entropy, standard deviation, and L2 norm for every tensor, then passes 5 diagnostic probes (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) through the model to determine layer-wise functional importance. Each of the 1,188 tensors receives an independent optimal ratio:

combined = static(entropy/std/norm) x 0.4 + probe(cosine_distance) x 0.6
final_ratio = mri_ratio x mri_trust + genome_ratio x (1 - mri_trust)

When one parent is overwhelmingly superior for a tensor (ratio < 0.15 or > 0.85), Darwin transplants it directly without interpolation. The mri_trust parameter itself is optimized by CMA-ES evolutionary search, so the optimal transplant intensity is determined automatically. After merging, a Health Check compares the child against both parents layer by layer to detect interference or function loss.

🧬 Parent Models

Father: google/gemma-4-31B-it
Mother: TeichAI/gemma-4-31B-it-Claude-Opus-Distill

🧬 Results

Compared under identical conditions (same 50 questions, same seed, greedy decoding, thinking mode):

Father: 60.0% (30/50)
Darwin-31B-Opus: 66.0% (33/50) → +10% relative improvement
ARC-Challenge: 82.89% (loglikelihood, zero-shot, 200 questions)

Optimal genome found by evolution: ffn_ratio=0.93 → FFN layers strongly favor Mother (Claude Opus Distill); block_5 (L50-L59)=0.86; and more...
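The ratio logic in the post can be sketched in plain Python. This is only an illustration of the formulas quoted above — the function names, and the assumption that `combined` and `mri_ratio` denote the same quantity, are ours, not the actual Darwin V6 engine code:

```python
def final_merge_ratio(static_score, probe_score, mri_trust, genome_ratio):
    """Combine static diagnostics (entropy/std/norm) and probe-based
    diagnostics (cosine distance) into one per-tensor ratio, following
    the two formulas in the post. Names are illustrative."""
    # combined = static(entropy/std/norm) x 0.4 + probe(cosine_distance) x 0.6
    mri_ratio = static_score * 0.4 + probe_score * 0.6
    # final_ratio = mri_ratio x mri_trust + genome_ratio x (1 - mri_trust)
    return mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)


def merge_tensor(father_w, mother_w, ratio):
    """Interpolate elementwise toward Mother by `ratio`, or transplant a
    parent's tensor outright when one parent dominates (ratio < 0.15 or > 0.85)."""
    if ratio < 0.15:
        return list(father_w)   # transplant Father's tensor unchanged
    if ratio > 0.85:
        return list(mother_w)   # transplant Mother's tensor unchanged
    return [(1 - ratio) * f + ratio * m for f, m in zip(father_w, mother_w)]
```

In this reading, mri_trust (itself tuned by CMA-ES) simply interpolates between the diagnostic-driven ratio and the evolved genome ratio before the transplant threshold is applied.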
reacted to SeaWolf-AI's post — about 4 hours ago
Darwin-TTS: 3% of an LLM's Brain Makes TTS Speak with Emotion — Zero Training

We blended 3% of Qwen3-1.7B (LLM) FFN weights into Qwen3-TTS-1.7B's talker module. The result: emotionally enhanced speech synthesis — with zero training, zero data, and zero GPU hours.

Try the Demo: https://huggingface.co/spaces/FINAL-Bench/Darwin-TTS-1.7B-Cross
Model Weights: https://huggingface.co/FINAL-Bench/Darwin-TTS-1.7B-Cross
Full Research Article: https://huggingface.co/blog/FINAL-Bench/darwin-tts

Qwen3-1.7B (LLM) and Qwen3-TTS-1.7B's talker share a 100% identical architecture — same hidden_size (2048), same layers (28), same heads (16). This enabled pure 1:1 weight blending across 84 FFN tensors with a single lerp operation.

At a 3% blend, emotion appears. At 5%, emotion intensifies. At 10%, the model breaks — producing 655-second outputs for a 3-second sentence, because the LLM's "keep generating" pattern overwhelms the TTS stop signal.

To our knowledge, this is the first training-free cross-modal weight transfer between an LLM and a TTS model. Prior work requires adapter training (SmolTolk, 2025), fine-tuning (CSLM, 2025), or massive end-to-end compute (GPT-4o). Darwin-TTS achieves cross-modal capability transfer in under 2 minutes on CPU.

The key insight: TTS models with LLM backbones already "think" in language. We're just restoring 3% of the original LLM's language-understanding patterns — particularly those related to emotional semantics and prosody planning. The code is three lines: load the model, load the LLM FFN, call p.lerp_(llm_weight, 0.03).

We are the creators of the Darwin Evolutionary Merge Framework. Darwin LLM V7 achieved GPQA Diamond 86.9% (HF Benchmark #3) through CMA-ES-optimized FFN crossbreeding. Darwin-TTS extends this principle from LLM-to-LLM merging to cross-modal LLM-to-TTS transfer. Apache 2.0.
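The blend itself is just a linear interpolation of matching FFN tensors: torch's `p.lerp_(llm_weight, 0.03)` computes `p + 0.03 * (llm_weight - p)` in place. A minimal dependency-free sketch of the same operation (tensor names and the dict-of-lists representation are our illustration, not the released code):

```python
def blend_ffn(tts_ffn, llm_ffn, alpha=0.03):
    """Blend a fraction `alpha` of the LLM's FFN weights into the TTS
    talker's FFN weights, the same math as p.lerp_(llm_weight, alpha):
    out = (1 - alpha) * tts + alpha * llm, per element, per tensor."""
    return {
        name: [(1 - alpha) * t + alpha * l for t, l in zip(tts_w, llm_ffn[name])]
        for name, tts_w in tts_ffn.items()
    }
```

This only works because the two models' FFN tensors have identical shapes; at alpha=0.03 the talker's weights are perturbed only slightly, which matches the post's observation that larger alphas progressively destabilize the TTS stop behavior.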
liked the model RoyalCities/Foundation-1 — about 1 month ago
Organizations
None yet
spaces 1
Qiqi Demo (Agents) — no application file
models
0
None public yet
datasets
0
None public yet