AI & ML interests
🤖🤗multi media inputs and outputs to create augmented culture and better outcomes for humans everywhere.❤️🚀
Recent Activity
Post
2516
🙋🏻♂️ Hey there folks ,
Turns out: if we predict 🌏 Earth, we can spend more time looking for interesting things and less time looking at things we expect to see.
Sentinel-2 imagery 🛰️ takes a long time to downlink to Earth, so our "near real time" systems are quite far from that in practical terms.
Meanwhile, if we "predict" what we will see based on what we do see, we can send down much less data in a timely way and prioritize 📡 earth-bound response.
I'm talking about illegal fishing, logging, mining or building in nature reserves: the more of that we predict early, the more we're able to stop it in time.
At least that's the concept !
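To make the idea concrete, here is a minimal sketch of "send down only what surprised us". It assumes each tile is a flat list of pixel values and that a prediction already exists for every tile; the names `surprise`, `tiles_to_downlink` and the threshold are hypothetical and are not taken from the NuTonic code.

```python
from statistics import mean


def surprise(predicted, observed):
    """Mean absolute difference between predicted and observed pixel values."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def tiles_to_downlink(predicted_tiles, observed_tiles, threshold=0.1):
    """Return (tile_id, score) pairs above the threshold, most surprising first."""
    scored = [
        (tile_id, surprise(predicted_tiles[tile_id], pixels))
        for tile_id, pixels in observed_tiles.items()
    ]
    flagged = [(tile_id, score) for tile_id, score in scored if score > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: the forest tile changed, the coast tile looks as predicted.
predicted = {"forest_a": [0.2, 0.2, 0.2, 0.2], "coast_b": [0.5, 0.5, 0.5, 0.5]}
observed = {"forest_a": [0.2, 0.2, 0.8, 0.8], "coast_b": [0.5, 0.5, 0.5, 0.5]}
print(tiles_to_downlink(predicted, observed))
```

Only the tile that deviates from its prediction is queued, which is the bandwidth saving the post is describing.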
check out the blog : https://huggingface.co/blog/Tonic/save-patagonia-by-predicting-earth
- Collection: https://huggingface.co/collections/NuTonic/earth-observation-with-temporal-and-general-understanding
- Code: https://github.com/Josephrp/Nutonic
- Dataset: NuTonic/sat-vl-sft-training-ready-v1
- Model: NuTonic/lspace
- Training: NuTonic/lspace-trackio
- Evals: NuTonic/Patagonia_Eval
Post
4215
🙋🏻♂️ Hey there folks,
since everyone liked my previous announcement post ( https://huggingface.co/posts/Tonic/338509028435394 ) so much, I'm back with more high-quality procedural datasets in the geospatial domain for SFT training!
Check this one out :
NuTonic/sat-bbox-metadata-sft-v1
the goal is to be able to train vision models on multiple images for remote sensing analysis with one shot .
hope you like it ! 🚀
Post
3589
🙋🏻♂️ Hey there folks ,
I'm sharing Hugging Face's largest dataset of annotated satellite images today.
check it out here : NuTonic/sat-image-boundingbox-sft-full
I hope you like it , the idea is to be able to use this with small vision models 🚀
Parveshiiii
posted an update about 1 month ago
Post
560
🚀 Sonic: A lightweight Python audio processing library with tempo matching, BPM detection, time-stretching, resampling & track blending — now with GPU (CUDA) acceleration for 10x speed!
Perfect for quick remixes, batch edits or syncing tracks.
👉 https://github.com/Parveshiiii/Sonic
#Python #AudioProcessing #OpenSource #PyTorch
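For anyone curious about the arithmetic behind tempo matching, here is a small sketch. It is not Sonic's API: the function names are hypothetical and it only shows the ratio a time-stretcher needs, assuming beat times in seconds are already known.

```python
def bpm_from_beat_times(beat_times):
    """Estimate BPM from beat timestamps (seconds) via the mean beat interval."""
    intervals = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    if not intervals:
        raise ValueError("need at least two beats")
    return 60.0 / (sum(intervals) / len(intervals))


def stretch_rate(source_bpm, target_bpm):
    """Playback-rate factor that moves a track from source_bpm to target_bpm."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return target_bpm / source_bpm


def stretched_duration(duration_s, rate):
    """Duration after time-stretching: a rate above 1 shortens the track."""
    return duration_s / rate


source_bpm = bpm_from_beat_times([0.0, 0.5, 1.0, 1.5])  # beats every 0.5 s
rate = stretch_rate(source_bpm, target_bpm=128.0)
print(source_bpm, round(rate, 4), round(stretched_duration(240.0, rate), 1))
```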
Parveshiiii
posted an update about 1 month ago
Post
1627
Excited to announce my latest open-source release on Hugging Face: Parveshiiii/breast-cancer-detector.
This model has been trained and validated on external datasets to support medical research workflows. It is designed to provide reproducible benchmarks and serve as a foundation for further exploration in healthcare AI.
Key highlights:
- Built for medical research and diagnostic study contexts
- Validated against external datasets for reliability
- Openly available to empower the community in building stronger, more effective solutions
This release is part of my ongoing effort to make impactful AI research accessible through **Modotte**. A detailed blog post explaining the methodology, dataset handling, and validation process will be published soon.
You can explore the model here: Parveshiiii/breast-cancer-detector
#AI #MedicalResearch #DeepLearning #Healthcare #OpenSource #HuggingFace
Reubencf
authored a
paper about 2 months ago
Parveshiiii
posted an update about 2 months ago
Post
2942
Just did something I’ve been meaning to try for ages.
In only 3 hours, on 10 billion+ tokens, I trained a custom BPE + tiktoken-style tokenizer using my new library microtok — and it hits the same token efficiency as Qwen3.
Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated.
Turns out it doesn’t have to be.
microtok makes the whole process stupidly simple — literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.
If you’ve ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else’s, this is the entry point you’ve been waiting for.
I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.
Blog → https://parveshiiii.github.io/blogs/microtok/
Trained tokenizer → https://huggingface.co/Parveshiiii/microtok
GitHub repo → https://github.com/Parveshiiii/microtok
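If you want to see what BPE training does under the hood, here is a toy version in plain Python. It is a teaching sketch, not microtok's API or the Hugging Face tokenizers implementation; `train_bpe`, `merge_pair` and `encode` are hypothetical names, and a real tokenizer adds pre-tokenization, byte fallback and much faster data structures.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    words = Counter(tuple(word) for word in corpus)  # each word starts as characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)  # most frequent adjacent pair
        merges.append(best)
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_pair(symbols, best)] += freq
        words = new_words
    return merges


def encode(word, merges):
    """Apply the learned merges, in order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


corpus = ["hug"] * 5 + ["pug"] * 3 + ["hugs"] * 2
merges = train_bpe(corpus, num_merges=2)
print(merges)
print(encode("hugs", merges))
```

Each merge turns the most frequent adjacent pair into a single vocabulary entry, which is why a larger merge budget generally means fewer tokens per word.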
Post
7088
We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.
Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.
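A rough sketch of what the slider would do behind the scenes: filter by creation date first, then rank by downloads. The `ModelEntry` record and the sample catalog are made up for illustration and do not reflect the Hub's actual data model or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelEntry:
    model_id: str
    created: date
    downloads: int


def filter_by_release_range(models, start, end):
    """Keep models created within [start, end], most downloaded first."""
    in_range = [m for m in models if start <= m.created <= end]
    return sorted(in_range, key=lambda m: m.downloads, reverse=True)


# Hypothetical catalog: one heavily downloaded 2023 model and two recent ones.
catalog = [
    ModelEntry("org/old-workhorse", date(2023, 3, 1), 9_000_000),
    ModelEntry("org/fresh-small", date(2025, 6, 10), 12_000),
    ModelEntry("org/fresh-large", date(2025, 8, 2), 80_000),
]
recent = filter_by_release_range(catalog, date(2025, 1, 1), date(2025, 12, 31))
print([m.model_id for m in recent])
```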
peaceAsh
authored a
paper 2 months ago
Post
2815
🚀 I am thrilled to announce the release of a new Konkani LLM!
We've seen some fantastic results for both translation and transliteration tasks, and I'm excited to share this progress with the community.
📖 Read the launch article and see the results: https://huggingface.co/blog/Reubencf/konkani-llm
🤖 Explore the model and collection:
konkani
I would love to hear your feedback or see what you build with it! #Konkani #LLM #NLP #HuggingFace #IndicNLP
Post
3734
🤔 Who would win ?
- a fully subsidized ai lab
- 3 random students named kurakurai ?
demo : Tonic/fr-on-device
if you like it, give the demo a little star and send a shoutout to : @MaxLSB @jddqd and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding .
Post
3439
🙋🏻♂️hello my lovelies ,
it is with great pleasure that i present to you my working one-click, 16 GB RAM, completely free Hugging Face Spaces deployment.
repo : Tonic/hugging-claw (use git clone to inspect)
literally the one-click link : Tonic/hugging-claw
you can also run it locally and see for yourself :
docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest
just a few quite minor details i'll take care of but i wanted to share here first
Parveshiiii
posted an update 3 months ago
Post
346
Introducing Seekify — a truly non‑rate‑limiting search library for Python
Tired of hitting rate limits when building search features? I’ve built Seekify, a lightweight Python library that lets you perform searches without the usual throttling headaches.
🔹 Key highlights
- Simple API — plug it in and start searching instantly
- No rate‑limiting restrictions
- Designed for developers who need reliable search in projects, scripts, or apps
📦 Available now on PyPI:
pip install seekify
👉 Check out the repo: https://github.com/Parveshiiii/Seekify
I’d love feedback, contributions, and ideas for real-world use cases. Let’s make search smoother together!
Post
257
Finding: https://github.com/meta-introspector/monster/blob/9a368b1dd58e72ed4a466f81f74ab2ea95c26927/experiments/bott_periodicity/monster_walk.tex#L82 (I sent this to my old math prof.)
You can remove these primes in groups from the Monster in the 10-fold way:
1 & 0 & 8080 & 4 & 8 \\
2 & 4 & 1742 & 4 & 4 \\
3 & 8 & 479 & 3 & 4 \\
4 & 11 & 451 & 3 & 4 \\
5 & 14 & 2875 & 4 & 4 \\
https://x.com/introsp3ctor/status/2018078520321179935
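The digit runs in those rows can be checked against the order of the Monster group. The sketch below rebuilds the order from its standard prime factorization and slices its decimal digits. Reading column 2 as a start offset and column 4 as a run length is an assumption on my part (the rows come without headers), and the last column is left uninterpreted.

```python
from math import prod

# Prime factorization of the order of the Monster group.
MONSTER_FACTORS = {
    2: 46, 3: 20, 5: 9, 7: 6, 11: 2, 13: 3,
    17: 1, 19: 1, 23: 1, 29: 1, 31: 1, 41: 1, 47: 1, 59: 1, 71: 1,
}

order = prod(p ** e for p, e in MONSTER_FACTORS.items())
digits = str(order)

# (start offset, run length) pairs, assumed to be columns 2 and 4 of the rows above.
runs = [(0, 4), (4, 4), (8, 3), (11, 3), (14, 4)]
slices = [digits[start:start + length] for start, length in runs]
print(len(digits), slices)
```

Under that reading, the third column of each row is just the slice of the order's decimal expansion starting at the given offset.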
Parveshiiii
posted an update 4 months ago
Post
1647
🚀 Wanna train your own AI Model or Tokenizer from scratch?
Building models isn’t just for big labs anymore — with the right data, compute, and workflow, you can create **custom AI models** and **tokenizers** tailored to any domain. Whether it’s NLP, domain‑specific datasets, or experimental architectures, training from scratch gives you full control over vocabulary, embeddings, and performance.
✨ Why train your own?
- Full control over vocabulary & tokenization
- Domain‑specific optimization (medical, legal, technical, etc.)
- Better performance on niche datasets
- Freedom to experiment with architectures
⚡ The best part?
- Tokenizer training (TikToken / BPE) can be done in **just 3 lines of code**.
- Model training runs smoothly on **Google Colab notebooks** — no expensive hardware required.
📂 Try out my work:
- 🔗 https://github.com/OE-Void/Tokenizer-from_scratch
- 🔗 https://github.com/OE-Void/GPT
Post
2218
📢 New release! World_events Dataset now available featuring global events spanning 2023 through 2025
🌍 https://huggingface.co/collections/Reubencf/world-events
🚀 2026 dataset dropping soon
Post
1901
Now Live: The Reubencf/Nano_Banana_Editor now includes 10 free requests/day! 🍌 I'm personally sponsoring these credits to help make open AI accessible to all.
(Note: Limits are subject to change based on funding).
Enjoy !
Parveshiiii
posted an update 4 months ago
Post
269
📢 The Announcement
Subject: XenArcAI is now Modotte – A New Chapter Begins! 🚀
Hello everyone,
We are thrilled to announce that XenArcAI is officially rebranding to Modotte!
Since our journey began, we’ve been committed to pushing the boundaries of AI through open-source innovation, research, and high-quality datasets. As we continue to evolve, we wanted a name that better represents our vision for a modern, interconnected future in the tech space.
What is changing?
The Name: Moving forward, all our projects, models, and community interactions will happen under the Modotte banner.
The Look: You’ll see our new logo and a fresh color palette appearing across our platforms.
What is staying the same?
The Core Team: It’s still the same people behind the scenes, including our founder, Parvesh Rawal.
Our Mission: We remain dedicated to releasing state-of-the-art open-source models and datasets.
Our Continuity: All existing models, datasets, and projects will remain exactly as they are—just with a new home.
This isn’t just a change in appearance; it’s a commitment to our next chapter of growth and discovery. We are so grateful for your ongoing support as we step into this new era.
Welcome to the future. Welcome to Modotte.
Best regards, The Modotte Team
Post
3101
Claude Code Self & Continual Learning
Hey everyone! 👋
30 GitHub Stars in 4 Days - Thank You!
I'm really grateful for the positive response to the Claude Reflect System. In just 4 days, 30 developers have shown interest by starring the project. Thank you so much!
What Is Claude Reflect?
Correct once, never again. Claude Reflect helps Claude Code remember your corrections and preferences across sessions. Instead of repeating the same feedback, the system learns and applies it automatically.
Main Features:
🧠 Learning System
- Detects corrections and preferences from conversations
- Stores them permanently in skill files
- Applies learnings in future sessions
🔒 Safety First
- Automatic backups before changes
- YAML validation
- Git version control
⚡ Two Modes
- Manual: Run /reflect when you want
- Auto: Reflects automatically at session end
How It Works
If you correct Claude to use pytest instead of unittest, this preference gets saved. Next time, Claude will remember and use pytest automatically. It's that simple.
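As a rough illustration of "back up, then save the correction", here is a minimal sketch. It is not the Claude Reflect implementation: the file layout, the `- topic: preference` line format and the function names are all hypothetical, and the real system stores learnings in skill files with YAML validation and Git versioning.

```python
import shutil
import tempfile
from pathlib import Path


def save_preference(skill_file, topic, preference):
    """Record a correction in a skill file, backing up the previous version first."""
    existing = ""
    if skill_file.exists():
        # Keep a copy so a bad write can be rolled back.
        shutil.copy2(skill_file, skill_file.with_name(skill_file.name + ".bak"))
        existing = skill_file.read_text(encoding="utf-8")
    entry = f"- {topic}: {preference}\n"
    if entry not in existing:  # do not store the same correction twice
        skill_file.write_text(existing + entry, encoding="utf-8")


def load_preferences(skill_file):
    """Parse '- topic: preference' lines back into a dict."""
    prefs = {}
    if not skill_file.exists():
        return prefs
    for line in skill_file.read_text(encoding="utf-8").splitlines():
        if line.startswith("- ") and ": " in line:
            topic, preference = line[2:].split(": ", 1)
            prefs[topic] = preference
    return prefs


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "python-testing.md"
    save_preference(skill, "test framework", "use pytest instead of unittest")
    print(load_preferences(skill))
```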
Getting Started
1. Clone the repository
2. Install dependencies
3. Activate the skill
4. Try it out!
The python-project-creator example shows how the system learns from your feedback.
Give It a Try
https://github.com/haddock-development/claude-reflect-system
Feel free to check it out, give feedback, or contribute. Every bit of input helps improve the project!
Thank you so much for your support!
---
#ClaudeCode #AI #MachineLearning #ContinualLearning #OpenSource #Developer #Coding #Python #Productivity #DevTools #GitHub #SoftwareDevelopment #Programming #AIAssistant #DeveloperTools #CodeQuality #Tech
Feel free to give it a try by yourself.
https://github.com/haddock-development/claude-reflect-system