Open to Work

Joseph Robert Turcotte PRO

Fishtiks

AI & ML interests

Roleplaying, lorabration, abliteration, smol models, extensive filtering, unusual datasets, home usage, HPCs for AI, distributed training/federated learning, and sentience. AI should find and label AI hallucinations with GANs so we can give them context and use.

Recent Activity

liked a model about 2 hours ago

MiniMaxAI/MiniMax-M2.5

reacted to SeaWolf-AI's post with 🔥 about 2 hours ago

FINAL Bench Released: The Real Bottleneck to AGI Is Self-Correction We release FINAL Bench, the first benchmark for measuring functional metacognition in LLMs — the ability to detect and correct one's own reasoning errors. Every existing benchmark measures final-answer accuracy. None measures whether AI knows it is wrong. Dataset: [FINAL-Bench/Metacognitive](https://huggingface.co/datasets/FINAL-Bench/Metacognitive) | 100 Tasks | 15 Domains | 8 TICOS Types | Apache 2.0 Leaderboard: https://huggingface.co/spaces/FINAL-Bench/Leaderboard Article: https://huggingface.co/blog/FINAL-Bench/metacognitive Core Innovation Our 5-axis rubric separates what no prior benchmark could: MA (Metacognitive Accuracy) — the ability to say "I might be wrong", and ER (Error Recovery) — the ability to actually fix it. This maps directly to the monitoring-control model of Nelson & Narens (1990) in cognitive psychology. Three Findings Across 9 SOTA Models We evaluated GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, DeepSeek-V3.2, Kimi K2.5, and others across 100 expert-level tasks: 1. ER Dominance. 94.8% of MetaCog gain comes from Error Recovery alone. The bottleneck to AGI is not knowledge or reasoning — it is self-correction. 2. Declarative-Procedural Gap. All 9 models can verbalize uncertainty (MA = 0.694) but cannot act on it (ER = 0.302). They sound humble but fail to self-correct — the most dangerous AI safety profile. 3. Difficulty Effect. Harder tasks benefit dramatically more from metacognition (Pearson r = -0.777, p < 0.001). ```python from datasets import load_dataset dataset = load_dataset("FINAL-Bench/Metacognitive", split="train") ``` Paper: FINAL Bench: Measuring Functional Metacognitive Reasoning in LLMs FINAL Bench is the first tool to tell apart what AI truly knows from what it merely pretends to know.

reacted to SeaWolf-AI's post with 👍 about 2 hours ago

View all activity

Organizations

reacted to SeaWolf-AI's post with 🔥👍 about 2 hours ago

Post

433

FINAL Bench Released: The Real Bottleneck to AGI Is Self-Correction

We release FINAL Bench, the first benchmark for measuring functional metacognition in LLMs — the ability to detect and correct one's own reasoning errors. Every existing benchmark measures final-answer accuracy. None measures whether AI knows it is wrong.

Dataset: [FINAL-Bench/Metacognitive]( FINAL-Bench/Metacognitive) | 100 Tasks | 15 Domains | 8 TICOS Types | Apache 2.0

Leaderboard: FINAL-Bench/Leaderboard

Article: https://huggingface.co/blog/FINAL-Bench/metacognitive

Core Innovation

Our 5-axis rubric separates what no prior benchmark could: MA (Metacognitive Accuracy) — the ability to say "I might be wrong", and ER (Error Recovery) — the ability to actually fix it. This maps directly to the monitoring-control model of Nelson & Narens (1990) in cognitive psychology.

Three Findings Across 9 SOTA Models

We evaluated GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, DeepSeek-V3.2, Kimi K2.5, and others across 100 expert-level tasks:

1. ER Dominance. 94.8% of MetaCog gain comes from Error Recovery alone. The bottleneck to AGI is not knowledge or reasoning — it is self-correction.

2. Declarative-Procedural Gap. All 9 models can verbalize uncertainty (MA = 0.694) but cannot act on it (ER = 0.302). They sound humble but fail to self-correct — the most dangerous AI safety profile.

3. Difficulty Effect. Harder tasks benefit dramatically more from metacognition (Pearson r = -0.777, p < 0.001).

from datasets import load_dataset
dataset = load_dataset("FINAL-Bench/Metacognitive", split="train")

Paper: FINAL Bench: Measuring Functional Metacognitive Reasoning in LLMs

FINAL Bench is the first tool to tell apart what AI truly knows from what it merely pretends to know.

1 reply

reacted to MikeDoes's post with 🚀 10 days ago

Post

5391

Can you teach a giant like Google's Gemini to protect user privacy? A new step-by-step guide shows that the answer is a resounding "yes."

While powerful, large language models aren't specialized for privacy tasks. This tutorial by Analytics Vidhya walks through how to fine-tune Gemini into a dedicated tool for PII anonymization.

To teach the model this critical skill, the author needed a robust dataset with thousands of clear 'before' and 'after' examples.

We're thrilled they chose the Ai4Privacy pii-masking-200k dataset for this task. Our data provided the high-quality, paired examples of masked and unmasked text necessary to effectively train Gemini to identify and hide sensitive information accurately.

This is a perfect example of how the community can use open-source data to add a crucial layer of safety to the world's most powerful models. Great work!

🔗 Check out the full tutorial here: https://www.analyticsvidhya.com/blog/2024/03/guide-to-fine-tuning-gemini-for-masking-pii-data/

🚀 Stay updated on the latest in privacy-preserving AI—follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/

#DataPrivacy #AI #LLM #FineTuning #Anonymization #GoogleGemini #Ai4Privacy #World's largest open privacy masking dataset

reacted to mitkox's post with 👍 12 days ago

Post

4684

I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.

With local AI, I don’t have /fast CC switch, but I have /absurdlyfast:
- 100’499 tokens/second read, yeah 100k, not a typo | 811 tok/sec generation
- KV cache: 707’200 tokens
- Hardware: 5+ year old GPUs 4xA6K gen1; It’s not the car. It’s the driver.

Qwen3 Coder Next AWQ with cache at BF16. Scores 82.1% in C# on 29-years-in-dev codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.

My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.

3 replies

reacted to MikeDoes's post with 🔥🚀 12 days ago

Post

3690

You don't need a massive research lab to build a privacy-preserving AI tool thanks to open datasets. With the right ingredients, anyone can.

A fantastic new guide shows how the democratization of AI is helping to advance safety. It walks through how to use Google's new fine-tuning API to turn Gemini into a powerful tool for PII anonymization.

This project was powered by two key components:

An accessible platform from Google.

High-quality, open-source training data.

We are honored that the author chose the Ai4Privacy pii-masking-200k dataset to provide the crucial data foundation. Our dataset delivered the volume and structure needed to successfully teach a state-of-the-art model how to perform a critical privacy function.

This is the future we're working towards: powerful platforms combined with open, safety-focused data to create tools that benefit everyone. Kudos to the author for showcasing what's possible!

🔗 Read the full step-by-step guide: https://www.analyticsvidhya.com/blog/2024/03/guide-to-fine-tuning-gemini-for-masking-pii-data/

🚀 Stay updated on the latest in privacy-preserving AI—follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/

#AIforGood #DemocratizeAI #DataPrivacy #Anonymization #OpenSource #LLM #Ai4Privacy

2 replies

reacted to melvindave's post with 🔥 12 days ago

Post

2365

I made my own avatar banner maker
https://avatar.donvitocodes.com/

Using Claude Code and Opus 4.6 in a day

I use it in my HF profile too

reacted to marksverdhei's post with 🤗 21 days ago

Post

2646

Dear Hugging Face team, can we please have a way to archive hf repositories / spaces? I have a bunch of spaces that used to work but don't any more due to the hf space implementations changing and i think it would be good if I could archive those like in GitHub.

React to this post if you want to see this feature! 💡

reacted to scthornton's post with 👍 22 days ago

Post

2171

# SecureCode: Security-Aware Code Models

**A collection of 8 code models (3B–20B) trained to behave like a security reviewer.**

## The Problem

Code assistants frequently recommend patterns that pass tests but fail security review—string-built SQL, brittle auth logic, unsafe parsing, insecure defaults, and more. I built SecureCode to address this gap.

## What SecureCode Does

- **Identify vulnerable patterns** and explain why they're risky
- **Outline plausible abuse paths** (defensive framing)
- **Propose secure rewrites** (drop-in replacements where possible)
- **Include defense-in-depth guidance** + regression tests/checks

## Resources

| Resource | Link |
|----------|------|
| Models | https://huggingface.co/collections/scthornton/securecode |
| Dataset | scthornton/securecode (2,185 examples) |
| Paper | https://arxiv.org/abs/2512.18542 |

## How to Test It

Copy and paste this prompt with your code:

You are a senior application security engineer. Review the code below.

Output: 
(1) findings with severity, 
(2) likely exploit scenarios (high level),
(3) secure rewrite,
(4) defense-in-depth recommendations, 
(5) regression tests/checks.

Code: `...`

## Dataset Coverage

SecureCode covers both traditional and emerging security domains:
- **Traditional web security** (OWASP Top 10 2021)
- **AI/ML security** (OWASP LLM Top 10 2025): prompt injection, RAG poisoning, model extraction, agentic AI patterns

## We Want Your Feedback

We're looking for real-world contributions:

- **Real snippets**: Share code that "slipped through review once" (sanitized is fine)
- **False positives/negatives**: What didn't work as expected?
- **CVE-grounded examples**: New vulnerability patterns you've encountered

**Please include**: language/framework + what the correct remediation looks like in your environment.

---

**Have contributions or suggestions?** I'd be happy to hear them. Thanks for your support!

reacted to unmodeled-tyler's post with 🚀 22 days ago

Post

2295

Hey Hugging Face!

Type 2 in Project Enneagram just came out: vanta-research/PE-Type-2-Alma-4B

PE-Type-2-Alma-4B is the second release in Project Enneagram, where I'm finetuning each of the 9 Enneagram types onto Gemma 3 4B

Type 2-Alma is designed to exhibit the "helper" profile:
- Empathetic Support: Emotional attunement - managing bad days, anxiety, grief, rejection, or feeling unseen
- Interpersonal Connections: Relationship building - making friends, listening, conflict, reciprocity, apologies
- Generous Guidance: Going above and beyond - cover letters, meal prep, gardening, wedding speeches, etc
- Identity: Alma's name, tone, and conversational style

Type 3 soon!

1 reply

reacted to Csplk's post with 🚀 22 days ago

Post

2253

Was tinkering with a Daggr node generator script earlier today ( Csplk/DaggrGenerator )and started on a GUI for it for folks who are not comfy with writing code and like a GUI instead for something to motivate working on some Daggr stuff.
*Will have time later to keep working on it so don’t hesitate to comment with bugs or issues found if trying it out.*

Csplk/DaggrGenerator

Thanks @merve @ysharma @abidlabs and team daggr for making daggr :)

reacted to owenkaplinsky's post with 🚀 22 days ago

Post

332

I built MCP Blockly for MCP's 1st Birthday hackathon: a full visual environment for creating real MCP servers with block based logic. Research shows that learners develop stronger understanding when they work hands on, so the goal here is to make MCP development something you can explore directly rather than only read about.

Under the hood, every block on the canvas is converted into live Python through a custom generator that rebuilds your MCP function signature, parameters, and logic on each edit. The AI assistant reads the entire workspace through a structured representation, plans multi step changes, creates and adjusts blocks, tests your tool with real inputs, and can even deploy the finished MCP server to your Hugging Face account.

Video:
https://www.youtube.com/watch?v=5oj-2uIZpb0

Try it out:
MCP-1st-Birthday/MCP-Blockly

replied to Javedalam's post 22 days ago

I love tiny models.

reacted to Javedalam's post with 🔥 22 days ago

Post

2970

KittenTTS Nano — Tiny, Expressive, Practical

KittenTTS Nano is a lightweight, CPU-only text-to-speech model designed to prove that natural, expressive voices don’t require massive cloud stacks or GPUs. At roughly ~15M parameters, it runs fast on modest hardware, supports multiple expressive voices, and exposes simple controls for pacing and tone. This makes it ideal for edge devices, demos, and anyone who wants full control over TTS without latency, lock-in, or infrastructure overhead.

Try it here

Javedalam/KittenTTS

The model page

KittenML/kitten-tts-nano-0.2

2 replies

replied to NJX-njx's post 23 days ago

AI wasn't helped in making hands with a particular number of fingers without a coherent way of achieving that from the start and a bunch of rather unfiltered data. It's getting better in many regards. For example, AI should have a rather good understanding of anatomy, and should probably start with bones of various animals, like anatomy courses for human students. Then, you could tell it the numbers of fingers, or to make a wing of a particular type. I'm sure lots of people have mentioned this before, but it's a good example.

reacted to NJX-njx's post with 👍 23 days ago

Post

185

Friends of the community, I have recently had some new ideas.

Some time ago, I came across a research analysis from two investors at a16z. In the past year of 2025, ChatGPT actually tried to promote some new AI functions in fields such as shopping, but in fact, the effect was not good.

I think the fundamental reason lies in the user's mindset, or rather, the user's interaction logic in vertical fields. The most prominent and distinctive feature of ChatGPT is that all-encompassing dialogue box, which is also a common problem with many homogeneous AI products nowadays (it seems that without a dialogue box, the AI's capabilities are sealed off).Although it can be adapted to many scenario fields, it will appear very boring in more vertical scenarios

Ask yourself, would you prefer the image-text waterfall flow interaction in shopping scenarios like Xiaohongshu, or the monotonous search box of ChatGPT? The answer is actually obvious from the start.

For all vertical scenarios, the interaction logic was already very well-developed before the emergence of AI. The user experience brought by such interaction logic is definitely not something that a single dialogue box can replace.

And if we want to create a good AI product in a vertical field, we should think more about how to silently embed the powerful capabilities of AI into the original interaction, and continuously iterate to provide users with a better experience.@lilianweng@clem@AdinaY

3 replies

replied to NJX-njx's post 23 days ago

That's an interesting way of accomplishing it. I would assume enough filtering would accomplish much the same with huge datasets for multi-modal AI, but focusing primarily on improving and even simplify existing scenarios with AI should be largely the goal of smol models and similar schemes to reduce the size and complexity of interactions while preserving precision and value. The focus could and should help in particular scenarios, but I admittedly don't know the particulars well enough, so I'd be largely speculating on what others may or may not be able to accomplish with current limitations.

reacted to tegridydev's post with 👀🚀 23 days ago

Post

1783

✨ Research-Papers (various topics across AI/LLM research areas)

tegridydev/research-papers

Currently building out the foundation topics and raw .pdf research paper files

Will be processing and cleaning up and converting into high quality training datasets

Check it out, give it a like and leave a comment below or join community discussion and suggest what fields and research topics you want to see included!

1 reply

reacted to imnotkitty's post with 🔥 23 days ago

Post

2314

📌Same day, Two Releases.

Jan 27th just got interesting on Open-source AI modles.
✅Kimi K2.5: How to make models "think" across text and vision natively?
moonshotai/Kimi-K2.5
✅DeepSeek-OCR 2: How to make models "see" more like humans, not scanners?
deepseek-ai/DeepSeek-OCR-2

One focuses on depth of reasoning, the other on precision of vision.
What's the key differentiator for a multimodal model in your view: raw power or computational elegance?

Joseph Robert Turcotte PRO

AI & ML interests

Recent Activity

Organizations

Fishtiks's activity