Multi🤖Transformers

community

https://www.tonic-ai.com

josephpollack

tonic-ai

Activity Feed Request to join this org

AI & ML interests

🤖🤗multi media inputs and outputs to create augmented culture and better outcomes for humans everywhere.❤️🚀

Recent Activity

thanhkt authored a paper about 2 months ago

PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers

Reubencf authored a paper 4 months ago

Konkani LLM: Multi-Script Instruction Tuning and Evaluation for a Low-Resource Indian Language

peaceAsh authored a paper 4 months ago

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

View all activity

Reubencf

posted an update 7 days ago

Post

926

GPT 5.6 SOL builds hugging city a multiplayer online game hosted at Reubencf/Hugging-city

2 replies

Reubencf

posted an update 10 days ago

Post

3795

Reubencf/HF_QR_Gen
has been improved create custom Qr-codes

Reubencf

posted an update 16 days ago

Post

145

Introducing Reubencf/Document_Query
powered by CohereLabs/command-a-plus-05-2026-w4a4 by cohere
just open admin page -> drop in any document -> and ask questions

1 reply

Reubencf

posted an update 28 days ago

Post

3759

Shadows of Tomorrow is finally live on Hugging Face Spaces with Gradio.

It’s a browser-playable RPG built with Godot, set in a post-nuclear future where players explore Magnus Province, collect medicinal plants, craft medicine, and help cure NPCs.

Play it here: Reubencf/Shadows_of_Tomorrow

11 replies

Reubencf

posted an update 29 days ago

Post

134

Tiny Maple is now live on iOS and Android.

The app introduces the @Coherelabs Tiny Aya series of multilingual AI models to mobile devices. This release is significant as it enhances access to multilingual AI from anywhere, particularly for users who prefer offline capabilities.

I invite you to try it and share your feedback.

App Store: https://apps.apple.com/bn/app/tiny-maple/id6774123088

Google Play: https://play.google.com/store/apps/details?id=com.reubencf.tinyaya

KingNish

posted an update about 1 month ago

Post

4503

We trained an open-source Mythos like cybersecurity LLM for the Build Small Hackathon meet OpenMythos

Trained in two stages: SFT on ~1.84K filtered ArXiv cs.CR papers + real CVE data, then RLVR using paired with past vulnerabilities GitHub repos with a verifier model checking outputs against ground truth.

Trained on: H100s from Modal

The RLVR stage made the biggest difference responses got more precise and less prone to confusing similar vulnerability classes.

Everything is open:
🤖 Demo → build-small-hackathon/OpenMythos
🧠 Model → build-small-hackathon/OpenMythos
📦 CVE Dataset → build-small-hackathon/CVE_Vulnerailities_Detailed
📄 ArXiv Dataset → himanshu17HF/ArvixImport-Filtered-Final

Try it out and let us know where it breaks 🙏

2 replies

Reubencf

posted an update about 1 month ago

Post

2059

Millions speak Konkani. The internet barely knows it.

Today's major LLMs struggle with regional languages. They can't read, write or even recognize Konkani. So I built one that can.

Here is a working demo of the Konkani LLM I've been training. 👇

https://youtu.be/8K04ylbXh6k

thanhkt

authored a paper about 2 months ago

PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers

Paper • 2605.26730 • Published May 27 • 17

Reubencf

posted an update about 2 months ago

Post

4701

I have improved my Portfolio please do check it out
Reubencf/Portfolio

6 replies

Tonic

posted an update 2 months ago

Post

3056

🙋🏻‍♂️ Hey there folks ,

Turns out : if we predict 🌏 earth we can save a lot of time looking for interesting things and less time looking at things that we expect to see.

Sentinel-2 imagery 🛰️basically takes a long time to download towards earth. so our "near real time" systems are quite far from that in practical terms.

meanwhile , if we "predict" what we will see , based on what we do see , we can send down much less data in a timely way , and prioritize 📡earth-bound response .

I'm talking about illegal fishing , logging , mining or building in nature reserves , the more of that we predict early the more we're able to stop it on time.

At least that's the concept !

check out the blog : https://huggingface.co/blog/Tonic/save-patagonia-by-predicting-earth

- Collection: https://huggingface.co/collections/NuTonic/earth-observation-with-temporal-and-general-understanding
- Code: https://github.com/Josephrp/Nutonic
- Dataset: NuTonic/sat-vl-sft-training-ready-v1
- Model: NuTonic/lspace
- Training: NuTonic/lspace-trackio
- Evals: NuTonic/Patagonia_Eval

2 replies

Tonic

posted an update 3 months ago

Post

4372

🙋🏻‍♂️ Hey there folks,

since everyone liked my previous announcement post ( https://huggingface.co/posts/Tonic/338509028435394 ) so much , i'm back with more high quality proceedural datasets in the Geospacial domain for SFT training !

Check this one out :
NuTonic/sat-bbox-metadata-sft-v1

the goal is to be able to train vision models on multiple images for remote sensing analysis with one shot .

hope you like it ! 🚀

2 replies

Tonic

posted an update 3 months ago

Post

3694

🙋🏻‍♂️ Hey there folks ,

I'm sharing huggingface's largest dataset of annotated statelite images today.

check it out here : NuTonic/sat-image-boundingbox-sft-full

I hope you like it , the idea is to be able to use this with small vision models 🚀

Parveshiiii

posted an update 3 months ago

Post

626

🚀 Sonic: A lightweight Python audio processing library with tempo matching, BPM detection, time-stretching, resampling & track blending — now with GPU (CUDA) acceleration for 10x speed!

Perfect for quick remixes, batch edits or syncing tracks.

👉 https://github.com/Parveshiiii/Sonic

#Python #AudioProcessing #OpenSource #PyTorch

Parveshiiii

posted an update 3 months ago

Post

1643

Excited to announce my latest open-source release on Hugging Face: Parveshiiii/breast-cancer-detector.

This model has been trained and validated on external datasets to support medical research workflows. It is designed to provide reproducible benchmarks and serve as a foundation for further exploration in healthcare AI.

Key highlights:
- Built for medical research and diagnostic study contexts
- Validated against external datasets for reliability
- Openly available to empower the community in building stronger, more effective solutions

This release is part of my ongoing effort to make impactful AI research accessible through **Modotte**. A detailed blog post explaining the methodology, dataset handling, and validation process will be published soon.

You can explore the model here: Parveshiiii/breast-cancer-detector

#AI #MedicalResearch #DeepLearning #Healthcare #OpenSource #HuggingFace

Reubencf

authored a paper 4 months ago

Konkani LLM: Multi-Script Instruction Tuning and Evaluation for a Low-Resource Indian Language

Paper • 2603.23529 • Published Mar 7 • 1

Parveshiiii

posted an update 4 months ago

Post

2965

Just did something I’ve been meaning to try for ages.

In only 3 hours, on 10 billion+ tokens, I trained a custom BPE + tiktoken-style tokenizer using my new library microtok — and it hits the same token efficiency as Qwen3.

Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated.

Turns out it doesn’t have to be.

microtok makes the whole process stupidly simple — literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.

If you’ve ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else’s, this is the entry point you’ve been waiting for.

I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.

Blog → https://parveshiiii.github.io/blogs/microtok/
Trained tokenizer → https://huggingface.co/Parveshiiii/microtok
GitHub repo → https://github.com/Parveshiiii/microtok

Nymbo

posted an update 4 months ago

Post

7658

We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.

3 replies

peaceAsh

authored a paper 4 months ago

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

Paper • 2510.24081 • Published Oct 28, 2025 • 24

Reubencf

posted an update 4 months ago

Post

2823

🚀 I am thrilled to announce the release of a new Konkani LLM!

We've seen some fantastic results for both translation and transliteration tasks, and I'm excited to share this progress with the community.

📖 Read the launch article and see the results: https://huggingface.co/blog/Reubencf/konkani-llm
🤖 Explore the model and collection:

konkani

I would love to hear your feedback or see what you build with it! #Konkani #LLM #NLP #HuggingFace #IndicNLP #Konkani

Tonic

posted an update 5 months ago

Post

3827

🤔 Who would win ?

- a fully subsidized ai lab
OR
- 3 random students named

kurakurai ?

demo : Tonic/fr-on-device

if you like it give the demo a little star and send a shoutout to : @MaxLSB @jddqd and @GAD-cell for absolutely obliterating the pareto frontier of the french language understanding .

4 replies

AI & ML interests

Recent Activity

Team members 107

MultiTransformer's activity