Thanks! I updated the app today. Both the model and the app are Apache-2.0 licensed, so feel free to build with them and experiment. While the model probably won't be as good as a conversational assistant, we can only find out where it really shines through experimentation. Apparently, it works very well as a "semantic compressor" and on classification tasks. Maybe with audio? Let's see.
Massimo Roberto Scamarcia
Each orange lattice is a DSRNBlock slow state manifold. The red sphere is live entropy. The right panel shows the surprise gate firing token by token as the model converges on [TAKEAWAY_ORDER].
I built this because I'm a visual learner and I wanted to see the surprise gate open and close on each token. I needed to see what was happening inside the network, not just trust that it was working.
Turns out it's also a decent way to explain the architecture to someone who's never heard of this.
Made with Google Antigravity (Gemini)
mrs83/Echo-DSRN-114M-Telemetry-3D
The core problem we set out to solve: financial data (ledgers, earnings calls, tick streams) blows up the memory footprint of standard Transformers.
KV-Cache scaling makes federated training on the edge increasingly difficult. You cannot preserve data privacy if your decentralized nodes keep running out of memory.
Echo-DSRN addresses this at the architectural level. It uses a dual recurrent state design: a GRU fast path for short-range dynamics, and a surprise-gated slow memory whose write intensity is modulated by prediction error.
The result is O(1) memory regardless of context length. Runs on CPU, AMD ROCm, Apple MPS, NVIDIA GPUs.
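A minimal sketch of the surprise-gated slow-state update described above. This is illustrative only: the gate gain `k`, threshold `tau`, and the tanh candidate write are assumptions, not the model's published internals; the point is that the state has a fixed size, so memory stays O(1) however long the stream runs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def surprise_gated_step(slow_state, candidate, prediction_error, k=4.0, tau=1.0):
    """Blend a candidate write into the fixed-size slow state.

    The gate opens as prediction error (surprise) exceeds tau, so
    high-surprise tokens write strongly while predictable tokens leave
    the slow memory almost untouched. The state never grows, which is
    where the O(1) memory claim comes from.
    """
    gate = sigmoid(k * (prediction_error - tau))  # scalar in (0, 1)
    return (1.0 - gate) * slow_state + gate * np.tanh(candidate)

rng = np.random.default_rng(0)
state = np.zeros(512)                 # fixed 512-dim slow state
for err in [0.1, 0.2, 5.0]:           # the last token is "surprising"
    state = surprise_gated_step(state, rng.normal(size=512), err)
```

With a fixed gate the update reduces to a plain leaky memory; tying it to prediction error is what lets the slow state ignore predictable stretches of the stream.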
Combined with the Flower federated framework, financial institutions can now run local fine-tuning on proprietary data without it ever leaving their infrastructure.
Results on standard financial sentiment benchmarks:
- FPB: 70.2%
- TFNS: 70.2%
- FIQA: 63.8%
This is a 114M baseline. The next step is scaling.
The surprise gating mechanism independently converged on what
Flower Hub: https://flower.ai/apps/mrs83/echo-dsrn-114m-finance
Hugging Face: ethicalabs/FlowerTune-Echo-DSRN-114M-Finance-PEFT
While traditional models target general conversational reasoning, Echo-DSRN(N) is a specialized structural prototype.
It is a dual-state recurrent neural network engineered strictly for low-latency semantic compression and continuous text streaming with a permanent O(1) memory footprint.
Echo-DSRN (114M parameters) manages context via continuous structural compression:
- 8 Layers | 512 Hidden Dim
- Transformer Fast State + DSRN/GRU Recurrent Slow State + Surprise Gating
Initial pre-training on a single AMD Instinct MI300X, followed by localized refinement across AMD Radeon PRO GPUs and an AMD Ryzen AI Max+ 395 (Strix Halo).
A Hugging Face Space showcasing the architecture is currently running on the free shared CPU tier.
- The Compressor: Ingest a long document and crush it into a fixed 2048-dimensional .npy state vector.
- Vector Similarity: Upload two compressed .npy states to instantly calculate cosine similarity for ultra-lightweight RAG pre-filtering.
- The CPU Streamer: Continuous, fluent text generation running on raw CPU compute.
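The Vector Similarity step above is plain cosine similarity over the exported 2048-dimensional state vectors. A sketch, with placeholder filenames (the Space's actual export names are not given in the post):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two compressed state vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In practice you would load the .npy states exported by the compressor:
#   a = np.load("doc_a_state.npy"); b = np.load("doc_b_state.npy")
# Here we use random 2048-dim vectors just to exercise the function.
a = np.random.default_rng(0).normal(size=2048)
b = np.random.default_rng(1).normal(size=2048)
score = cosine_similarity(a, b)  # near 0 for unrelated random vectors
```

For RAG pre-filtering, a fixed-size dot product per candidate document is what makes this "ultra-lightweight": no model forward pass is needed at query time.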
Disclaimer: This is a structural prototype. It has internalized formatting and conversational syntax, but it possesses zero world knowledge. It will confidently hallucinate. Use it for streaming transcription, style mimicry, and local semantic hashing, not for factual reasoning.
Try the CPU Demo: ethicalabs/Echo-DSRN-114M
Try the Model: ethicalabs/Echo-DSRN-114M
Now available on PyPI · GitHub · ClawHub · HuggingFace
AI models sense they could be wrong, but they can't actually fix what's broken.
Live A/B test: VIDraft/MARL
We evaluated 9 SOTA models (GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, etc.) across 1,800 assessments in FINAL Bench and found a 39.2-percentage-point gap between "recognizing potential errors" (MA = 0.694) and "actually finding and fixing them" (ER = 0.302).
MARL (Model-Agnostic Runtime Middleware for LLMs) was built to close this metacognitive gap. It decomposes a single LLM call into a 5-stage expert pipeline (Hypothesis → Solver → Auditor → Adversarial Verifier → Synthesizer), transforming "answer in one shot" into "think, doubt, correct, and rewrite."
No weight modification: it works instantly with GPT-5.4, Claude, Gemini, Llama, or any OpenAI API-compatible LLM by changing one line, base_url. It ships with 9 domain-specific emergence engines (invention, pharma, genomics, chemistry, ecology, law, and more; 5,538 expert data items), activated by a simple tag like model="gpt-5.4::pharma".
pip install marl-middleware
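Based on the claims above, the `model="base::engine"` tag convention can be sketched as a simple parser; this is an assumed illustration of the convention the post describes, not MARL's published parsing code, and the client snippet in the comments uses a hypothetical local endpoint.

```python
def parse_model_tag(model: str):
    """Split a 'base::engine' tag such as 'gpt-5.4::pharma'.

    Assumed sketch of the tagging convention described in the post;
    the middleware's real parsing rules are not published here.
    """
    base, sep, engine = model.partition("::")
    return base, (engine if sep else None)

# With any OpenAI API-compatible client, the switch is then just base_url:
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")  # hypothetical endpoint
#   client.chat.completions.create(model="gpt-5.4::pharma", ...)
```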
MARL is also officially registered on ClawHub, the skill marketplace of OpenClaw, an AI agent platform with 260K+ developers and 3,200+ skills. It's the first middleware in the Reasoning Enhancement category. One command, clawhub install marl-middleware, gives your AI agent a metacognition upgrade.
Technical deep dive: https://huggingface.co/blog/FINAL-Bench/marl-middleware
PyPI: https://pypi.org/project/marl-middleware/
GitHub: https://github.com/Vidraft/MARL
ClawHub: https://clawhub.ai/Cutechicken99/marl-middleware
#MARL #LLM #Hallucination #Metacognition #MultiAgent #AIMiddleware #FINALBench #OpenClaw #ClawHub #PyPI #AGI #HuggingFace #ReasoningAI #SelfCorrection #GlassBoxAI
We evaluated 18 small language models from 12 makers on 125 questions across 7 languages. The results challenge the assumption that bigger is always better.
Community Article: https://huggingface.co/blog/FINAL-Bench/smol-worldcup
Live Leaderboard: ginigen-ai/smol-worldcup
Dataset: ginigen-ai/smol-worldcup
What we found:
- Gemma-3n-E4B (4B, 2GB RAM) outscores Qwen3-8B (8B, 5.5GB). Doubling parameters gained only 0.4 points. RAM cost: 2.75x more.
- GPT-OSS-20B fits in 1.5GB yet matches Champions-league dense models requiring 8.5GB. MoE architecture is the edge AI game-changer.
- Thinking models hurt structured output. DeepSeek-R1-7B scores 8.7 points below same-size Qwen3-8B and runs 2.7x slower.
- A 1.3B model fabricates confident fake content 80% of the time when prompted with nonexistent entities. The Qwen3 family hits 100% trap detection across all sizes.
- Qwen3-1.7B (1.2GB) outscores Mistral-7B, Llama-3.1-8B, and DeepSeek-R1-14B. The latest architecture at 1.7B beats older architectures at 14B.
What makes this benchmark different?
Most benchmarks ask "how smart?" β we measure five axes simultaneously: Size, Honesty, Intelligence, Fast, Thrift (SHIFT). Our ranking metric WCS = sqrt(SHIFT x PIR_norm) rewards models that are both high-quality AND efficient. Smart but massive? Low rank. Tiny but poor? Also low.
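The ranking metric follows directly from the formula; a quick sketch with made-up inputs (these are not the leaderboard's actual SHIFT or PIR values):

```python
import math

def wcs(shift: float, pir_norm: float) -> float:
    """World-Cup-style score: geometric mean of efficiency (SHIFT) and
    quality (normalized PIR), so a model must score on BOTH axes to
    rank high; either axis at zero zeroes the whole score."""
    return math.sqrt(shift * pir_norm)

# Illustrative values only:
balanced = wcs(80.0, 80.0)   # strong on both axes
lopsided = wcs(98.0, 40.0)   # smart but massive: ranks lower
```

The geometric mean is the key design choice: unlike an arithmetic mean, it cannot be gamed by maxing one axis while tanking the other.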
Top 5 by WCS:
1. GPT-OSS-20B · WCS 82.6 · 1.5GB · Raspberry Pi tier
2. Gemma-3n-E4B · WCS 81.8 · 2.0GB · Smartphone tier
3. Llama-4-Scout · WCS 79.3 · 240 tok/s · Fastest model
4. Qwen3-4B · WCS 76.6 · 2.8GB · Smartphone tier
5. Qwen3-1.7B · WCS 76.1 · 1.2GB · IoT tier
Built in collaboration with the FINAL Bench research team. Interoperable with ALL Bench Leaderboard for full small-to-large model comparison.
Dataset is open under Apache 2.0 (125 questions, 7 languages). We welcome new model submissions.
We release FINAL Bench, the first benchmark for measuring functional metacognition in LLMs: the ability to detect and correct one's own reasoning errors. Every existing benchmark measures final-answer accuracy. None measures whether AI knows it is wrong.
Dataset: FINAL-Bench/Metacognitive | 100 Tasks | 15 Domains | 8 TICOS Types | Apache 2.0
Leaderboard: FINAL-Bench/Leaderboard
Article: https://huggingface.co/blog/FINAL-Bench/metacognitive
Core Innovation
Our 5-axis rubric separates what no prior benchmark could: MA (Metacognitive Accuracy), the ability to say "I might be wrong", and ER (Error Recovery), the ability to actually fix it. This maps directly to the monitoring-control model of Nelson & Narens (1990) in cognitive psychology.
Three Findings Across 9 SOTA Models
We evaluated GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, DeepSeek-V3.2, Kimi K2.5, and others across 100 expert-level tasks:
1. ER Dominance. 94.8% of MetaCog gain comes from Error Recovery alone. The bottleneck to AGI is not knowledge or reasoning; it is self-correction.
2. Declarative-Procedural Gap. All 9 models can verbalize uncertainty (MA = 0.694) but cannot act on it (ER = 0.302). They sound humble but fail to self-correct: the most dangerous AI safety profile.
3. Difficulty Effect. Harder tasks benefit dramatically more from metacognition (Pearson r = -0.777, p < 0.001).
from datasets import load_dataset
dataset = load_dataset("FINAL-Bench/Metacognitive", split="train")

Paper: FINAL Bench: Measuring Functional Metacognitive Reasoning in LLMs
FINAL Bench is the first tool to tell apart what AI truly knows from what it merely pretends to know.
Thanks for sharing, we are using a similar recipe for our small models!
Tiny Aya is just a language model. It doesn't support tool calling, the key capability that turns frontier models into powerful *agents*.
So the real question is:
How hard is it to turn Tiny Aya into an agent?
Turns out… it's simple, thanks to Hugging Face TRL.
We're sharing a hands-on example showing how to fine-tune Tiny Aya into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.
Small model. Global reach. Agent capabilities.
https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
After I began learning MLOps, I realized I needed some kind of home lab: there are a lot of GPUs I need to learn how to set up and test.
So I spent some time researching which platform to buy or build.
My requirements were:
- Limited budget
- Power supply 1 kW or higher
- A few PCIe slots, to be able to install more than one GPU
- Zero maintenance cost: I don't want to spend a lot of time or money maintaining lab hardware, except for the GPUs
I chose the Intel Mac Pro 7.1:
- Prices on eBay acceptable
- Excellent cooling
- 1.4 kW power supply
- 7 PCIe slots
- Zero maintenance: I don't need to do anything with the Mac Pro hardware; it just works
- Classic UEFI boot loader
It requires a bit of OS preparation:
1. Install Ubuntu 24.04 (it works with the general PC ISO image)
2. Set up T2 drivers
sudo apt install -y dkms linux-headers-$(uname -r) applesmc-t2 apple-bce lm-sensors
3. Install t2fanrd to manually manage the fans (/etc/t2fand.conf): https://wiki.t2linux.org/guides/fan/
4. Fix PCIe BAR allocation: add pci=realloc to GRUB_CMDLINE_LINUX_DEFAULT so the Linux kernel properly initializes server GPUs that lack a Graphics Output Protocol
5. Install NVIDIA GPU driver:
sudo apt install nvidia-driver-570
And it works!
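For step 4, the kernel parameter goes into /etc/default/grub; a sketch of the edit (the quiet/splash defaults are the stock Ubuntu values, so only pci=realloc is the addition):

```shell
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc"

# Then regenerate the boot configuration and reboot:
#   sudo update-grub
```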
I was able to run a server-grade Nvidia Tesla P100 (it required a DIY air duct) and consumer Nvidia Titan X, Titan V, and GTX 1080 cards on the old Mac Pro 7.1, even three in parallel.
For MLX-LM we can only use Apple MPS. Regarding GPUs, at the moment I only have an AMD Ryzen AI Max+ 395 and an AMD Instinct MI300X for rent.
@maxxafits00 federated learning is definitely the path forward, and it's something we've already begun experimenting with using the flower.ai framework.
Regarding the release, we are currently in mid-training and prioritizing a rigorous "safety-first" pipeline.
We are conducting extensive evaluations on model plasticity, red-teaming for prompt injection, and most importantly, stress-testing for malicious use cases.
We want to ensure the model is robust before it hits the wild.
The current roadmap includes:
- Completing the Knowledge Expansion phase.
- A comprehensive DPO (Direct Preference Optimization) pass to align the "Kurtis" persona and reasoning capabilities.
- Peer review and final validation.
A quick technical spoiler:
The base model pre-training is pure PyTorch and fully multi-GPU compatible. We are utilizing a Curriculum Learning strategy: starting with a small context length and gradually scaling up. This is paired with an enormous batch size and small data chunks.
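The curriculum described above can be sketched as a simple schedule that grows the context length stage by stage; the concrete lengths here are assumptions for illustration, since the post doesn't publish the real numbers.

```python
def context_schedule(stages=4, start_len=256, final_len=2048):
    """Yield (stage, context_length) pairs that double the context each
    stage, capped at final_len: the small-to-large curriculum strategy.
    Concrete lengths are hypothetical, not the actual training config."""
    length = start_len
    for stage in range(stages):
        yield stage, min(length, final_len)
        length *= 2

schedule = list(context_schedule())
# Each stage would then train on data chunks of the current length
# with a large batch size before advancing to the next stage.
```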
@maxxafits00 if you are on a budget, I suggest starting small. I unfortunately don't have enough compute to scale right now. To evaluate a pretraining or distillation framework (such as arcee-ai's distillkit) or a new model architecture, you can start from datasets such as TinyStories and move to FineWeb-EDU, Cosmopedia, etc. later.
Wait for the training and architecture to be stable and validated before moving to a bigger dataset/model. Also, 7-8B parameters is probably too big for small-scale pre-training experiments.
You should try to target 0.5B, max 3B, especially if you use consumer-grade hardware, or a single GPU for rent.
Interesting. Yes, as you noticed as well, a few billion tokens aren't enough. SmolLM2 360M was trained on 4 trillion tokens.
But I am not sure how to explain these results on piqa and sciq:
uv run lm_eval --model hf --model_args pretrained=models/Echo-DSRN-Small-Instruct-Kurtis,trust_remote_code=True,device_map="auto" --tasks hellaswag,winogrande,piqa,sciq --output_path ./results_final
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | 0.2927 | ±0.0045 |
| | | none | 0 | acc_norm | 0.3199 | ±0.0047 |
| piqa | 1 | none | 0 | acc | 0.6230 | ±0.0113 |
| | | none | 0 | acc_norm | 0.6202 | ±0.0113 |
| sciq | 1 | none | 0 | acc | 0.7380 | ±0.0139 |
| | | none | 0 | acc_norm | 0.6480 | ±0.0151 |
| winogrande | 1 | none | 0 | acc | 0.5020 | ±0.0141 |
I can share more details in this convo, but this is probably uncharted territory for a hybrid RNN with 4 attention heads.
Now available at https://huggingface.co/spaces/ethicalabs/Echo-DSRN-Small-Next-Word-Prediction ... on the shared CPU HF resources it runs slowly, but on my MacBook M4 and AMD Strix Halo it is blazing fast. The memory footprint is low. I am now expanding to 1B using Net2Net, and today I tested an SFT run (QLoRA, 4-bit, bf16) on consumer hardware with trl, with apparently no catastrophic forgetting.
10 years ago, getting an LSTM to output coherent English was a struggle.
10 years later, after a "cure" based on FineWeb-EDU and a custom synthetic mix for causal conversation, the results are fascinating.
We trained this on ~10B tokens on a single AMD GPU (ROCm). It is not a Transformer: Echo-DSRN (400M) is a novel recurrent architecture inspired by Hymba, RWKV, and xLSTM, designed to challenge the "Attention is All You Need" monopoly on the Edge.
The ambitious goal is to build a small instruct model with RAG and tool usage capabilities ( ethicalabs/Kurtis-EON1)
The Benchmarks (Size: 400M)
For a model this size (trained on <10B tokens), the specialized performance is surprising:
*SciQ*: 73.8% (this rivals billion-parameter models in pure fact retrieval).
*PIQA*: 62.3% (solid physical intuition for a sub-1B model).
The Reality Check:
HellaSwag (29.3%) and Winogrande (50.2%) show the limits of 400M parameters and 10B tokens training.
We are hitting the "Reasoning Wall" which confirms we need to scale to (hopefully) unlock deeper common sense. As you can see in the visualization (to be released soon on HF), the FineWeb-EDU bias is strong. The model is convinced it is in a classroom ("In this course, we explore...").
The Instruct Model is not ready yet and we are currently using curriculum learning to test model plasticity.
Source code and weights will not be released yet. This is not a fork or a fine-tune: the base model is built in-house at https://www.ethicalabs.ai/, with novel components that do not exist in current open libraries.
Call for Collaboration: I am looking for peer reviewers interested in recurrent/hybrid architectures. If you want to explore what lies beyond Transformers, let's connect!
Training diary: ethicalabs/Kurtis-EON1
DataMuncher-Labs/UltraMath-Reasoning-Small