Organization Card

🌼 DaisyChainAI

We build capable systems by daisy-chaining a handful of small, sharp specialists behind a learned router, instead of training one giant model to do everything. Each specialist is cheap, swappable, and crisp on its own domain; chained together, they behave like one model at a fraction of the active compute.


🔗 What "daisy-chaining" means

A daisy chain links independent units in series so a signal can flow from one to the next, each unit handling what it's good at and passing the rest along. That's exactly how our systems work:

  • Each link is one small specialist: a dense ~74M-parameter model trained on a single domain. It is excellent at its own data and (deliberately) surprised by everything else.
  • The router is the connector between links. When an input arrives, it travels down the chain: every specialist reports how surprised it is (bits/base) and exposes its hidden state, and a tiny learned router hands the work to the link that's most at home with it.
  • The chain grows link by link. Because the specialists are trained separately, you can chain a new domain on without retraining the others: add a link, extend the router, done. Remove or upgrade a single link the same way.
  • One link runs per query. Only the routed specialist computes, so a chain of four ~74M experts costs ~74M parameters of compute per token, roughly 7× cheaper than a 500M monolith of comparable scope.

So "DaisyChain" is both the brand and the mechanism: a chain of specialists, connected by routing, that you extend one flower at a time.
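As a rough illustration of the surprise signal described above, the sketch below computes bits/base from the probabilities a specialist assigned to the observed bases and applies the simplest possible rule, picking the least surprised link. The function names and the probability values are hypothetical, and this is only the naive baseline, not the learned router itself.

```python
import math

def bits_per_base(probs_of_observed_bases):
    """Mean surprise in bits/base: the average of -log2(p) over the observed bases."""
    n = len(probs_of_observed_bases)
    return sum(-math.log2(p) for p in probs_of_observed_bases) / n

def route_by_lowest_surprise(surprise_by_specialist):
    """Naive baseline: hand the input to the specialist with the lowest bits/base."""
    return min(surprise_by_specialist, key=surprise_by_specialist.get)

# Hypothetical probabilities each specialist assigned to the same 4-base input.
observed = {
    "eukaryote":   [0.25, 0.30, 0.25, 0.20],
    "prokaryote":  [0.50, 0.50, 0.50, 0.50],
    "mRNA":        [0.25, 0.25, 0.25, 0.25],
    "mRNA-splice": [0.20, 0.25, 0.30, 0.25],
}

surprise = {name: bits_per_base(p) for name, p in observed.items()}
chosen = route_by_lowest_surprise(surprise)
print(surprise)
print("routed to:", chosen)
```

A uniform guess over four bases costs exactly 2 bits/base, so any specialist scoring well below that is signalling that the input looks like its home domain.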


🛠️ How the models are built

Each specialist is grown by interleaving two steps, per domain:

  1. Continued pretraining: next-token training on only that domain's data, so the specialist becomes genuinely crisp on its home distribution (and the router can tell the links apart).
  2. Per-domain distillation: the specialist is distilled from a larger teacher foundation model restricted to its own domain (soft-target knowledge distillation (KD), plus a factorized per-nucleotide variant where the teacher supports it). It learns the teacher's behavior on its slice without ever becoming a generic clone; the specialization is what makes routing work.
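For readers unfamiliar with soft-target KD, here is a minimal sketch of the standard formulation: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The temperature value, the function names, and the logits over A/C/G/T are assumptions for illustration; the exact loss used for these models, including the factorized per-nucleotide variant, is not specified here.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2

# Hypothetical next-base logits over (A, C, G, T) at one position.
teacher = [2.0, 0.5, 0.1, -1.0]
student = [1.0, 1.0, 0.0, -0.5]

loss_mismatch = soft_target_kd_loss(student, teacher)
loss_match = soft_target_kd_loss(teacher, teacher)
print(loss_mismatch, loss_match)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what pulls the specialist toward the teacher on its own slice of data.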

We iterate those two steps until each link is as strong as its capacity allows, then train the router: a small head that reads every specialist's surprise plus a compressed view of its hidden state and predicts the home domain, recovering bias corrections that a plain "lowest-perplexity-wins" rule misses.
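To make the bias-correction point concrete, the toy sketch below fits a linear softmax head on a feature vector made of per-specialist surprise plus one compressed hidden-state value, and compares it with the lowest-surprise rule. Everything here is an assumption for illustration: two specialists instead of four, a single scalar standing in for the compressed hidden state, hand-picked numbers, and plain gradient descent. The real router's architecture and training setup are not described in this card.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_for(weights, biases, x):
    """Linear score per domain: w_k . x + b_k."""
    return [sum(w * v for w, v in zip(wk, x)) + bk for wk, bk in zip(weights, biases)]

def train_router(features, labels, n_domains, lr=0.1, steps=5000):
    """Fit a linear softmax head by full-batch gradient descent on cross-entropy."""
    dim, n = len(features[0]), len(features)
    weights = [[0.0] * dim for _ in range(n_domains)]
    biases = [0.0] * n_domains
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(n_domains)]
        grad_b = [0.0] * n_domains
        for x, y in zip(features, labels):
            probs = softmax(scores_for(weights, biases, x))
            for k in range(n_domains):
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err / n
                for j in range(dim):
                    grad_w[k][j] += err * x[j] / n
        for k in range(n_domains):
            biases[k] -= lr * grad_b[k]
            for j in range(dim):
                weights[k][j] -= lr * grad_w[k][j]
    return weights, biases

def route(weights, biases, x):
    s = scores_for(weights, biases, x)
    return s.index(max(s))

# Hypothetical features: [surprise of specialist 0, surprise of specialist 1, hidden summary].
# Specialist 0 is systematically less surprised, even on inputs from domain 1.
features = [[1.0, 1.9, 0.2], [1.1, 2.0, 0.1], [1.5, 1.6, 0.8], [1.4, 1.6, 0.9]]
labels = [0, 0, 1, 1]

baseline = [x[:2].index(min(x[:2])) for x in features]  # lowest surprise wins
weights, biases = train_router(features, labels, n_domains=2)
learned = [route(weights, biases, x) for x in features]

baseline_acc = sum(p == y for p, y in zip(baseline, labels)) / len(labels)
learned_acc = sum(p == y for p, y in zip(learned, labels)) / len(labels)
print(baseline_acc, learned_acc)
```

In this toy data the lowest-surprise rule sends every input to specialist 0 because of its systematic advantage, while the learned head picks up the offset and separates the two domains.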

This is, in lineage, a cluster Branch-Train-Merge (cBTM) mixture of domain experts (independent experts plus perplexity-aware routing) with iterative distillation from a larger teacher layered on top.


🧬 Current project: DaisyChain Genomics

Four DNA/RNA specialists (eukaryote · prokaryote · mRNA · mRNA-splice; ~74M each, ≈295M total, under 500M), each distilled per-domain from a 500M genomic foundation model, behind a learned router.

Routing accuracy (held-out): 94.8%
Active params / query: ~74M (one specialist)
vs the 500M teacher: within ~6% likelihood; closing with training
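The parameter figures above can be checked with a few lines of arithmetic. This sketch uses the rounded ~74M figure, so the total comes out at 296M rather than the quoted ≈295M, and it counts only the routed specialist's parameters as active; the cost of the router and of collecting surprise scores is not quantified in this card and is left out.

```python
SPECIALIST_PARAMS_M = 74   # rounded per-specialist size, in millions
N_SPECIALISTS = 4
TEACHER_PARAMS_M = 500     # size of the teacher / comparable monolith, in millions

total_params_m = SPECIALIST_PARAMS_M * N_SPECIALISTS     # all links combined
active_params_m = SPECIALIST_PARAMS_M                    # one link runs per query
active_ratio = TEACHER_PARAMS_M / active_params_m        # monolith vs. one specialist

print(f"total: {total_params_m}M, active: {active_params_m}M, ratio: {active_ratio:.2f}x")
```

The ratio of about 6.76 is where the "roughly 7×" figure comes from.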

More links on the chain, and more chains, coming. 🌼

Citation

If you use these models, please cite the author, Dean Byrne (Quazim0t0):

@misc{byrne2026daisychain,
  title        = {DaisyChain Genomics: A Modular Mixture of Per-Domain Distilled Genomic Specialists},
  author       = {Byrne, Dean},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/DaisyChainAI/daisychain-genomics}},
  note         = {DaisyChainAI (Quazim0t0). Four ~74M DNA/RNA specialists distilled per-domain
                  from Carbon-500M behind a learned router}
}
