
Young Sik Hong

RICHARDYHONG

AI & ML interests

None yet


Organizations

None yet

liked a Space about 15 hours ago

ALL Bench Leaderboard

🚀

upvoted a collection about 15 hours ago

DARWIN-Family

Collection

비드래프트 • 29 items • Updated about 11 hours ago • 8

reacted to SeaWolf-AI's post with ❤️🔥 2 days ago

Post

2793

Why This Matters — David Defeats Goliath

MODEL: FINAL-Bench/Darwin-4B-David
SPACE: FINAL-Bench/Darwin-4B-david

We're releasing Darwin-4B-David, the first second-generation model in the Darwin Opus family. By evolving an already-evolved model, it reaches 85.0% on GPQA Diamond with just 4.5B parameters, surpassing both its original ancestor (58.6%) and gemma-4-31B (84.3%).

Second-Generation Evolution
Most merges start from a base model and produce a single offspring. Darwin-4B-David breaks this pattern. The Father (Darwin-4B-Opus) was already evolved from gemma-4-E4B-it with Claude Opus reasoning distillation — a Gen-1 model. The Mother (DavidAU's DECKARD-Expresso-Universe) brings Unsloth deep tuning across 5 in-house datasets with thinking mode by default. Crossbreeding these two produced the first Gen-2 Darwin model.

Darwin V6's Model MRI scanned both parents across all 42 layers and assigned an independent optimal ratio to each layer. The Mother's creativity and Korean-language hotspot (Layers 22-25, weight 0.95) was maximally absorbed, while the Father's reasoning core (Layers 30-40, weight 0.48) was preserved. This is "Merge = Evolve" applied recursively: evolution of evolution.
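A minimal sketch of what per-layer ratios mean in practice, not the Darwin V6 implementation. It assumes each "weight" is the Mother's share in a plain linear interpolation; the post does not say this explicitly. The `HOTSPOTS` table, the `default` fallback of 0.5, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (first_layer, last_layer, assumed Mother share) taken from the figures in the post.
HOTSPOTS: List[Tuple[int, int, float]] = [
    (22, 25, 0.95),  # Mother's creativity / Korean-language hotspot
    (30, 40, 0.48),  # Father's reasoning core
]


def layer_weight(layer: int, default: float = 0.5) -> float:
    """Return the assumed Mother share for a layer (default is hypothetical)."""
    for first, last, weight in HOTSPOTS:
        if first <= layer <= last:
            return weight
    return default


def merge_layer(father: Sequence[float], mother: Sequence[float], w_mother: float) -> List[float]:
    """Linearly interpolate two flattened layers of equal length."""
    if len(father) != len(mother):
        raise ValueError("parent layers must have the same length")
    return [(1.0 - w_mother) * f + w_mother * m for f, m in zip(father, mother)]


def merge_model(father: Sequence[Sequence[float]], mother: Sequence[Sequence[float]]) -> List[List[float]]:
    """Merge layer by layer, looking up an independent ratio for each layer index."""
    return [merge_layer(f, m, layer_weight(i)) for i, (f, m) in enumerate(zip(father, mother))]
```

With a single global ratio, `layer_weight` would return the same value for every layer; the per-layer table is the only thing that changes.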

Benchmarks
Darwin-4B-David scores 85.0% on GPQA Diamond, 26.4 percentage points above the original ancestor's 58.6%. The evaluation was generative with maj@8 (8 generations per question, majority vote), the Epoch AI prompt format, thinking mode enabled, and 50 sampled questions. On ARC-Challenge (25-shot, loglikelihood), both models score 64.93%, which is expected because loglikelihood scoring doesn't capture thinking-mode reasoning differences.
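For readers unfamiliar with maj@8, here is a small sketch of majority-vote scoring. It is not the evaluation harness used for the numbers above; the function names are hypothetical, and ties are broken in favor of the answer that appeared first, which is an assumption.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the earliest-seen answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def maj_at_k_accuracy(samples: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Fraction of questions whose majority-voted answer matches the gold label."""
    if len(samples) != len(gold):
        raise ValueError("one list of samples is required per question")
    correct = sum(majority_vote(s) == g for s, g in zip(samples, gold))
    return correct / len(gold)


# Two toy questions with 8 sampled generations each.
toy_samples: List[List[str]] = [
    ["A", "B", "A", "C", "A", "B", "A", "D"],  # majority: A
    ["C", "C", "B", "B", "B", "D", "B", "A"],  # majority: B
]
toy_gold = ["A", "C"]
toy_accuracy = maj_at_k_accuracy(toy_samples, toy_gold)  # 1 of 2 correct
```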

Why This Matters
gemma-4-31B (30.7B) scores 84.3%. Darwin-4B-David surpasses it at 1/7th the size — no training, no RL, just 45 minutes of MRI-guided DARE-TIES on one H100. The name "David" honors Mother creator DavidAU and evokes David vs. Goliath.
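As background on the merge method named above, this is a toy sketch of plain DARE-TIES on flat weight lists: DARE randomly drops entries of each parent's delta from the base and rescales the survivors, then TIES keeps only the entries whose sign agrees with the elected sign. It does not include the MRI guidance described in the post, and all function names and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence


def dare_prune(delta: Sequence[float], drop_p: float, rng: random.Random) -> List[float]:
    """Drop each entry with probability drop_p and rescale the rest by 1 / (1 - drop_p)."""
    if not 0.0 <= drop_p < 1.0:
        raise ValueError("drop_p must be in [0, 1)")
    keep = 1.0 - drop_p
    return [d / keep if rng.random() < keep else 0.0 for d in delta]


def ties_merge(deltas: Sequence[Sequence[float]]) -> List[float]:
    """Elect a sign per position from the summed deltas, then average the agreeing entries."""
    merged: List[float] = []
    for column in zip(*deltas):
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare_ties(base: Sequence[float], parents: Sequence[Sequence[float]],
              drop_p: float, seed: int = 0) -> List[float]:
    """Merge fine-tuned parents that share one base; the random drop stands in for TIES trimming."""
    rng = random.Random(seed)
    deltas = [dare_prune([p - b for p, b in zip(parent, base)], drop_p, rng) for parent in parents]
    return [b + m for b, m in zip(base, ties_merge(deltas))]
```

Because the procedure only manipulates existing weights, it needs no gradient updates, which is consistent with the "no training, no RL" description.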

liked a Space 2 days ago

Darwin-4B-david

👀

The child surpassed both parents — that is evolution

liked a model 2 days ago

FINAL-Bench/Darwin-4B-David

Text Generation • Updated 2 days ago • 764 • 21

liked a model 3 days ago

SeaWolf-AI/Darwin-gemma-4-E4B-it-x-Gemma-4-E4B-Claude-4-08292

Updated 5 days ago • 2

upvoted an article 5 days ago

Article

Darwin V6: Diagnostic-Guided Evolutionary Model Merging

5 days ago

liked a Space 5 days ago

FINAL-Bench/Darwin-4B-Opus

👀

gemma-4-31B-it + gemma-4-31B-it-Claude-Opus-Distill

liked a model 5 days ago

FINAL-Bench/Darwin-4B-Opus

Text Generation • Updated 2 days ago • 579 • 15

reacted to SeaWolf-AI's post with 🔥 5 days ago

Post

5500

🧬 Darwin V6: Diagnostic-Guided Evolutionary Model Merging

We are releasing Darwin-31B-Opus — a reasoning-enhanced model merging Google's Gemma-4-31B-it and TeichAI's Claude Opus Distill using the Darwin V6 engine.

Model: FINAL-Bench/Darwin-31B-Opus
Demo: FINAL-Bench/Darwin-31B-Opus

🔬 What Darwin V6 Does

Conventional merging tools (mergekit, etc.) apply a single ratio to all tensors. Set ratio=0.5 and all 1,188 tensors blend identically, with no distinction between which tensors matter for reasoning versus coding.

Darwin V6 diagnoses both parents at the tensor level before merging. It measures Shannon entropy, standard deviation, and L2 norm for every tensor, then passes 5 diagnostic probes (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) through the model to determine layer-wise functional importance. Each of the 1,188 tensors receives an independent optimal ratio.
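The three static measurements named above can be sketched for a single flattened tensor as follows. This is an illustration only: the post does not say how Darwin V6 computes entropy, so the histogram with `bins=16` is an assumption, and `tensor_stats` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def tensor_stats(values: Sequence[float], bins: int = 16) -> Dict[str, float]:
    """Return Shannon entropy (bits, histogram-based), standard deviation, and L2 norm."""
    if not values:
        raise ValueError("tensor must not be empty")
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std
    l2_norm = math.sqrt(sum(v * v for v in values))

    # Bucket values into equal-width bins; a constant tensor falls into one bin.
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())

    return {"entropy": entropy, "std": std, "l2_norm": l2_norm}


stats = tensor_stats([3.0, 4.0])
```

For the two-element example, the values land in two different bins, so the entropy is one bit, the standard deviation is 0.5, and the L2 norm is 5.0.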

combined = static(entropy/std/norm) x 0.4 + probe(cosine_distance) x 0.6
final_ratio = mri_ratio x mri_trust + genome_ratio x (1 - mri_trust)

When one parent is overwhelmingly superior for a tensor (ratio < 0.15 or > 0.85), Darwin transplants it directly without interpolation. The mri_trust parameter itself is optimized by CMA-ES evolutionary search, so optimal transplant intensity is determined automatically. After merging, a Health Check compares the child against both parents layer-by-layer to detect interference or function loss.
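The two formulas and the transplant rule can be written out as a short sketch. The 0.4/0.6 weights and the 0.15/0.85 thresholds come from the post; treating `ratio` as the Mother's share is an assumption (it matches the post's reading that a high `ffn_ratio` favors Mother), and the function names are hypothetical.

```python
from typing import List, Sequence


def combined_score(static_score: float, probe_score: float) -> float:
    """Blend the static diagnostics with the probe (cosine-distance) score."""
    return static_score * 0.4 + probe_score * 0.6


def final_ratio(mri_ratio: float, genome_ratio: float, mri_trust: float) -> float:
    """Mix the MRI-suggested ratio with the evolved genome ratio by mri_trust."""
    return mri_ratio * mri_trust + genome_ratio * (1.0 - mri_trust)


def merge_tensor(father: Sequence[float], mother: Sequence[float], ratio: float,
                 low: float = 0.15, high: float = 0.85) -> List[float]:
    """Transplant one parent outright at extreme ratios, otherwise interpolate."""
    if ratio < low:
        return list(father)  # Father overwhelmingly preferred: copy as-is
    if ratio > high:
        return list(mother)  # Mother overwhelmingly preferred: copy as-is
    return [(1.0 - ratio) * f + ratio * m for f, m in zip(father, mother)]
```

With `mri_trust=1.0` the genome ratio is ignored entirely, and with `mri_trust=0.0` the diagnostic is ignored, which is why searching over that one parameter controls how strongly the scan drives the merge.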

🧬 Parent Models
Father: google/gemma-4-31B-it
Mother: TeichAI/gemma-4-31B-it-Claude-Opus-Distill

🧬 Results
Compared under identical conditions (same 50 questions, same seed, greedy, thinking mode):
Father: 60.0% (30/50)
Darwin-31B-Opus: 66.0% (33/50) — +10% relative improvement
ARC-Challenge: 82.89% (loglikelihood, zero-shot, 200 questions)
Optimal genome found by evolution:
ffn_ratio=0.93 — FFN layers strongly favor Mother (Claude Opus Distill)
block_5 (L50-L59)=0.86 and more...

11 replies

liked a Space 5 days ago

Darwin-31B-Opus

👀

gemma-4-31B-it + gemma-4-31B-it-Claude-Opus-Distill

liked a model 5 days ago

FINAL-Bench/Darwin-31B-Opus

Text Generation • 33B • Updated 2 days ago • 1.25k • 27

liked a dataset 9 days ago

FINAL-Bench/World-Model

Viewer • Updated 14 days ago • 100 • 2.12k • 27

liked 2 models 11 days ago

bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF

Image-Text-to-Text • 35B • Updated 11 days ago • 11.6k • 18

FINAL-Bench/Darwin-35B-A3B-Opus-Q8-GGUF

Text Generation • 35B • Updated 2 days ago • 2.4k • 15

reacted to SeaWolf-AI's post with 🔥 12 days ago

Post

2162

🧬 Darwin-35B-A3B-Opus — The Child That Surpassed Both Parents

What if a merged model could beat both its parents? We proved it can.
Darwin-35B-A3B-Opus is a 35B MoE model (3B active) built with our Darwin V5 engine — the first evolution system that CT-scans parent models before merging them.
🤗 Model: FINAL-Bench/Darwin-35B-A3B-Opus

The result speaks for itself: GPQA Diamond 90.0%, versus Father (Qwen3.5-35B-A3B) at 84.2% and Mother (Claude 4.6 Opus Distilled) at 85.0%. That's a relative gain of 6.9% over Father and 5.9% over Mother (5.8 and 5.0 percentage points). Not a tradeoff, but a genuine leap. Meanwhile, MMMLU sits at 85.0% (Father: 85.2%), multimodal is fully intact, and all 201 languages are preserved.
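The gains quoted above are consistent with a relative reading rather than a difference in percentage points. A quick check, with hypothetical helper names:

```python
def point_gain(child: float, parent: float) -> float:
    """Absolute difference between two scores, in percentage points."""
    return child - parent


def relative_gain(child: float, parent: float) -> float:
    """Improvement of child over parent, as a percentage of the parent's score."""
    return (child - parent) / parent * 100.0


child, father, mother = 90.0, 84.2, 85.0  # GPQA Diamond scores from the post

vs_father = (round(point_gain(child, father), 1), round(relative_gain(child, father), 1))
vs_mother = (round(point_gain(child, mother), 1), round(relative_gain(child, mother), 1))
# vs_father is (5.8, 6.9) and vs_mother is (5.0, 5.9)
```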

How? Model MRI changed everything. Traditional merging is guesswork. Darwin V4 added evolution. Darwin V5 added X-ray vision. Model MRI scans each parent layer by layer and discovers: Mother's L34–L38 is the reasoning engine (peak cosine distance), 50–65% of Mother's experts are dead (killed by text-only distillation), and Father is a healthy generalist with every expert alive. The prescription: transplant Mother's reasoning brain at L38 (90% weight), replace her dead experts with Father's living ones, and let Father's router handle the output layer. Reasoning went up. Versatility stayed intact. No tradeoff — just evolution.

35B total, 3B active (MoE) · GPQA Diamond 90.0% · MMMLU 85.0% (201 languages) · Multimodal Image & Video · 262K native context · 147.8 tok/s on H100 · Runs on a single RTX 4090 (Q4) · Apache 2.0
Darwin V5's full algorithm and technical details will be released alongside an upcoming paper.

🚀 Live Demo: FINAL-Bench/Darwin-35B-A3B-Opus

🏆 FINAL Bench Leaderboard: FINAL-Bench/Leaderboard

📊 ALL Bench Leaderboard: FINAL-Bench/all-bench-leaderboard

Built by VIDRAFT · Supported by the Korean Government GPU Support Program