Shane

Crownelius

https://www.crowfeather.co

crownelius

AI & ML interests

LLM, RL, DL, ML, AGI, Distillation, Workflow Automation, Creative Writing

Recent Activity

updated a dataset about 14 hours ago

Glint-Research/Complete-FABLE.5-traces-2M

updated a dataset about 14 hours ago

Crownelius/Complete-FABLE.5-traces-2M

published a dataset 1 day ago

Crownelius/Complete-FABLE.5-traces-2M

View all activity

Organizations

posted an update 15 days ago

Post

226

Could AI power a new form of dating?

So, some back-story. I entered this hackathon and my entry was an openclaw managed match-maker. Tell hey, openclaw find me somebody to love.

It works, I finished it, but is ultimately useless because nobody has used it.
There's where you lonely folks come in. ANYBODY on huggingface can populate it.

Think of it as a Huggingface exclusive meet-cute app.

reacted to AxionLab-official's post with 🔥 15 days ago

Post

6591

We're happy to announce that we released a Reasoning tuned version of Supra-50M!

SupraLabs/Supra-50M-Reasoning

reacted to BibbyResearch's post with 🔥 24 days ago

Post

1192

New Dataset Alert 🚨
Largest illegal job titles - comments from an Instagram viral video.
BibbyResearch/illegal-job-titles-comments
Good for small LLM training and fine-tuning.

reacted to AxionLab-official's post with 👀 28 days ago

Post

244

We RELEASED!

SupraLabs just released our 50M model!
Base, Instruct Weights are there, you can use!

You can check blog to more informations!(Writing blog yet!)

2 replies

replied to their post 29 days ago

Whatever you can manage to fit into 100m params.

replied to their post 29 days ago

Дa, привет

posted an update about 1 month ago

Post

4685

Howdy,
CompactAI-O is launching a tiny Model Golf, and the winner walks away with $50 in RunPod credits. Monthly. Every month. Show up, build, somebody wins.

What it is

Build the best language model you can under 100 million parameters, with at least a 1028-token context window. That's it. Any architecture, any tokenizer, any training scheme you can dream up at 3am. The only catch is it's gotta be open source (MIT, GPL, Apache, AGPL) take your pick.

It scratches the same itch as a Kaggle comp without the dataset\leaderboard nonsense. No fixed benchmark to game. No llama.cpp compatibility hoops. If you wanna train a 50M-param MoE with five experts and a tokenizer built on cookbooks, you can do that. Nothing stopping you.

The rules are listed in the discord and on the organization page if you're interested.

Why $50????

It's symbolic. It ain't gonna make anyone rich. But it's enough to cover a weekend of GPU time, enough to keep enthusiasts coming back, and not so much that it pulls in people who are just there for the money. Enthusiasts build interesting things. Interesting things move the field forward. A little incentive. I'd do it for $50 lol.

How to join

First round opens soon. Landing page is here:

→ https://huggingface.co/spaces/CompactAI-O/Tiny-model-golf

For questions or to swap ideas, the Discord's open:

→ https://discord.gg/y2jTct6Cxv

Excited to see what yall come up with. ♥

— Shane

8 replies

replied to their post about 1 month ago

Same, but 40k on hardware and then train hard
Could sell my goats and get some api subscriptions.

posted an update about 2 months ago

Post

4220

Day 4-6 [05/05/2026]
Howdy,

Is anybody else willing to put a second mortgage on their house, just to spend 40k USD in compute credits? Just me? k...

I got dreams, man. The datasets I could build with 40k would be insane.
Somebody called me a genius the other day, they'd be shocked to find out, that I would put my house on the line for 30 days of runpod usage.

What would you do with it?
I would turn arxiv into a dataset. Turn each arxiv paper into a QnA.
Or... maybe if I got 40k USD in credit's Id end up like those 16 lost scientists.

Food for thought.
Anyways, I think I'm going to make a post once a week.
In the meantime you can find me building small llm's in discord here:
https://discord.gg/4DdwS9D8x9

edit:
So to be clear, I will not actually do this. But think of it this way. If you could pivot an entire industry with 10-30k would you do it?

7 replies

replied to their post about 2 months ago

Final recipe locked: Qwen3-MoE, 3 experts top-1, vocab 262144 (Gemma 3 SP, per-digit input wrap), GQA 3:1, Muon for hidden 2D weights and AdamW for embed and router, WSD with sqrt cooldown, beta2 ramp from 0.95 to 0.97, z-loss 1e-4 with gradients this time (the last build had a no_grad bug that silently killed it), Qwen3 aux loss coefficient 0.001, expert-load monitor that warns on starvation. Three phases: 8K pretrain, then 32K continued pretrain, then 8K SFT.

posted an update about 2 months ago

Post

5318

Day 3 - 05/02/2026
Scamp ships, hits the wall. New plan...

Scamp came back from training today... Didn't go so well, I'm still unsure...

Fast benchmark, temperature 0.7, top_p 0.9:
- "Capital of France is" produced "covered by the Crown" (grammatical, factually wrong)
- "23 + 19 = ?" produced "23. Answer: 23. Answer: 23..." (loops, math broken)
- "def fibonacci(n):" produced a list of letters

It speaks English. It can't reason. At 8K vocab and 50M params, it was never going to.

Next build: 412M MoE-3E. Three experts (math, language, code), top-1 routing, random init, let specialization emerge from gradient signal alone. Tried seeded Branch-Train-MiX first then dropped it. Adds compute for no clear win when the router will find its own attractors anyway.

Big lesson today came from limit testing on A100 80GB. Surprise, every planned phase ran out of memory even on 80GB. Root cause: at vocab 262144 (Gemma 3 standard), the output logits dominate during forward and backward. Fix: Liger Kernel's fused cross-entropy. It streams the loss computation instead of materialising the full B by T by vocab tensor. Without it the build would not run.

Scamp proved the pipeline runs end-to-end on real hardware. The 412M run starts tomorrow. If routing balances naturally and math finally crystallises, ships as Crowfeather-412M-3E with GGUF in F16, Q8, Q5, and Q4.

So... the training may have produced a poet if I had done it better. But I didn't, so instead... we get a malformed robot named Scamp... This is progress.

-Shane

P.S Join discord for discussion: https://discord.gg/8ZscHNmJYE and
I post my finished stuff here: CompactAI-O

2 replies

replied to their post about 2 months ago

I might. Join https://discord.gg/vaEquJ6UJT if you need to contact me.

posted an update about 2 months ago

Post

3690

[DAY TWO] PROJECT CROWFEATHER - 5/1/2026
Que sera, what will he be?

Step 47,500 of 100,000. Loss hovering around 2.76 on 6.2B tokens. Throughput steady at 87k per second on the A100. Not a GH200, but she gets it done.

Still haven't named him. Scamp has a rascally charm. Quentin sounds like he'd wear a bow tie and think hard before speaking. Taking votes.

Phase two is what's keeping me up. Datasets everywhere and I can't pick. I'm fusing Google and DeepSeek's ideas: Gemma 4's alternating sliding and global attention, DeepSeek V4's Muon optimizer and WSD scheduler, Gemma 2's logit soft cap, and PaLM's z-loss. Sounds like peanut butter on a hamburger, but the loss curve says it works.

Tribe_v2 has real potential but needs more scaffolding than a barn raising before I throw it in. One thing's certain though. This model's gonna be a thinker. Not a Wikipedia parrot. Something that chews before it answers.

Finally got a use for my less popular datasets too. Some Opus-4.5-Writing-Style for polish. A few rows of Human-Archtypes-25k to see what personality bubbles up. Could be a poet, could be a grump. Either beats a flimsy fine-tune.

The bank's after my credit card. Until then, full steam.

Next model gets graphs. I swear.

-Shane

3 replies

replied to their post about 2 months ago

Yes sir.

posted an update about 2 months ago

Post

3840

[DAY ONE] PROJECT CROWFEATHER 4/30/2026
...The day I forgot to attach wandb.ai
Just dropped Crowfeather-50m, the first checkpoint in a series, and yeah, no graphs.

Crowfeather/Crowfeather-50m

54.5M params. Pretrain only. 17,500 steps banked on FineWeb-edu before Thunder credits ran dry. About 2.3B tokens, no SFT yet.

Architecture: Gemma-4 alternating sliding/global attention (1024 window, last layer always global) plus DeepSeek-V4 Muon optimizer plus WSD scheduler plus Gemma-2 logit soft-cap plus PaLM z-loss. Recipe in the model card.

What it can do: writes grammatical English. Knows that France has Rhine-adjacent monasteries (it picked Rouen instead of Paris but the vocabulary is in there). Tells stories about Mr. Fabien.

What it can't do yet: facts, code, math. Base LM, no SFT, no instruction tuning.

The series:
Every additional training run becomes another model card here
Every model card gets a matching post on this profile
Continuation goes to Colab next, picking up from step 17500 out of 100k

Limited to one post a day on Hugging Face, so updates will trickle out at that pace. Follow [@Crownelius](@Crownelius ) and [@Crowfeather](

Crowfeather ) if you want to watch this thing learn in public. Next drop will either come with the finished pre-train or whatever step I land on before the bank takes my credit card away.

Graphs will be available on my NEXT model lol

-Shane

3 replies

reacted to danielhanchen's post with 🔥 about 2 months ago

Post

10832

Unsloth is now one of the top 10 most followed organizations on Hugging Face. 🤗🦥

Thanks so much for all the support!
Our HF page:

unsloth

5 replies

replied to Tonic's post about 2 months ago

Thank you!

reacted to Tonic's post with 👍 about 2 months ago

Post

4332

🙋🏻‍♂️ Hey there folks,

since everyone liked my previous announcement post ( https://huggingface.co/posts/Tonic/338509028435394 ) so much , i'm back with more high quality proceedural datasets in the Geospacial domain for SFT training !

Check this one out :
NuTonic/sat-bbox-metadata-sft-v1

the goal is to be able to train vision models on multiple images for remote sensing analysis with one shot .

hope you like it ! 🚀

2 replies

posted an update about 2 months ago

Post

5954

My Huggingface journey has been a trip!
I wanted to take the time to thank each and every one of you for using my dataset and getting it to go as far as it did. Believe it or not, some neanderthal was and maybe still is trending on huggingface.

Not only did my dataset reach number one, my fine-tuned qwen3.5 model did as well. Top 10. Honestly, ain't much left to do here.

Y'all have given me the desire, no... the craving for more. I am absolutely obsessed with AI now. I want to tweak it... I want to take it apart, just to see what makes everything tick. I want to put it together like Frankenstein and his monster.

The only thing that's stopping this guy is compute. I don't mind spending every penny I have on this. I desperately want to drive AI forward, even just a little bit.

I never knew the clanker hater from a year ago would be saying this.

Thank you all from the bottom of my heart.

Looking forward to showing you what I'm cooking up next. @CompactAI is your only hint!

3 replies

Shane

AI & ML interests

Recent Activity

Organizations

Crownelius's activity