For local redaction of sensitive info, there are two viable patterns. Pick the one that matches how “strict” your output must be.
1. Redaction SLM: generates redacted text plus a list of entities. Fast to integrate and more “LLM-like,” but you must validate the output.
2. Detector model + deterministic masking: the model finds spans and your code replaces them. More predictable, easier to audit, and usually safer in production.
Below are the best small local options I would start with, with enough context to choose quickly.
What “good at redaction” usually means
Redaction is not just “detect PII.” It is:
- Find sensitive spans (names, emails, phones, addresses, IDs, secrets).
- Replace them consistently (e.g., [EMAIL], [PHONE], hashed tokens).
- Preserve everything else (so logs and documents remain useful).
- Return structured output you can trust (ideally strict JSON).
If you skip structure and validation, you will eventually ship a system that:
- misses some sensitive info, or
- changes non-sensitive text, or
- emits malformed “almost JSON.”
This is why the best practical systems split “detection” from “masking.”
Best “small model that directly redacts” (recommended starting point)
Distil Labs Distil-PII models (small SLMs, designed for policy-aware redaction)
These are explicitly trained to output one JSON object containing:
- redacted_text with minimal in-place replacements
- entities describing what was replaced and why (Hugging Face)
Why they fit your requirement
- They are positioned for local / on-device deployment and strict schema adherence. (Hugging Face)
- Distil Labs states the 1B model can be deployed on a laptop and is tuned for on-device privacy and latency. (GitHub)
- The Hugging Face card shows local deployment via vLLM or Ollama, and notes a GGUF variant exists in the collection. (Hugging Face)
Which size to pick
- Distil-PII Llama 3.2 1B: best “small but strong” baseline for local use. (Hugging Face)
- Distil-PII Llama 3.2 3B: use if you can afford more compute and want more headroom. (Hugging Face)
- Distil-PII Gemma 3 270M: a mid-tiny option if 1B is too heavy, but expect more misses than 1B/3B. (Hugging Face)
- Distil-PII SmolLM2 135M: smallest; only choose if you are extremely resource-constrained. (Hugging Face)
Key integration tip (important)
Even if a model aims to output strict JSON, treat it as untrusted until you:
- JSON-parse it
- schema-validate it
- and verify it did not rewrite non-PII text
Distil-PII helps by strongly specifying a schema and “minimal replacements,” but you still validate. (Hugging Face)
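A minimal stdlib sketch of that guardrail, assuming the two keys the model card names (redacted_text, entities) and a bracketed [UPPER_CASE] placeholder format (the placeholder format is my assumption, not something the card guarantees):

```python
import json
import re

def validate_redaction(raw: str, original: str) -> dict:
    """Parse and sanity-check a model response shaped like
    {"redacted_text": ..., "entities": [...]}. Raises on violations."""
    obj = json.loads(raw)                                  # must be valid JSON
    if not {"redacted_text", "entities"} <= set(obj):
        raise ValueError("missing required keys")
    if not isinstance(obj["redacted_text"], str) or not isinstance(obj["entities"], list):
        raise ValueError("wrong field types")
    # Verify the model only *replaced* spans: strip the placeholder tokens
    # and require every remaining fragment to appear verbatim in the input.
    for frag in re.split(r"\[[A-Z_]+\]", obj["redacted_text"]):
        if frag and frag not in original:
            raise ValueError(f"non-PII text was rewritten: {frag!r}")
    return obj
```

The fragment check is deliberately strict: any paraphrasing outside the placeholders fails closed, which is the behavior you want in a redaction pipeline.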
Best “small detector model” options (then you mask deterministically)
This approach often wins if you need:
- exact character preservation outside PII
- stable offsets and audit logs
- predictable behavior under load
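The masking half of this pattern is small enough to show in full. A sketch, assuming your detector returns (start, end, label) character spans:

```python
def mask_spans(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace detected spans with [LABEL] tokens, working right to left
    so earlier offsets stay valid. Text outside the spans is untouched,
    which gives you exact character preservation and auditable offsets."""
    out = text
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + f"[{label}]" + out[end:]
    return out
```

Because this step is deterministic, you can log the spans themselves as the audit trail and re-derive the output at any time.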
Piiranha-v1 (token classification PII detector)
- Detects 17 PII types across English, Spanish, French, German, Italian, Dutch. (Hugging Face)
- Built on microsoft/mdeberta-v3-base with a 256-token context window, so you must chunk long text. (Hugging Face)
- License is CC BY-NC-ND 4.0, which is a big constraint for many commercial or derivative-use projects. (Hugging Face)
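For short-context detectors like this, you need overlapping chunks so no span is split at a boundary. A rough sketch using whitespace words as a stand-in for real subword tokens (actual tokenizers produce more tokens than words, so stay well under the model's limit):

```python
def chunk_words(text: str, max_tokens: int = 200, overlap: int = 20):
    """Yield overlapping word windows for a short-context detector
    (e.g. ~256 tokens). The overlap gives spans near a boundary a
    second chance to be seen with full context."""
    words = text.split()
    step = max_tokens - overlap
    for i in range(0, max(len(words) - overlap, 1), step):
        yield " ".join(words[i:i + max_tokens])
```

When you merge detections back, deduplicate spans found in the overlap region by their character offsets.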
Ai4Privacy OpenPII anonymiser models (token classification, practical licenses)
- English anonymiser OpenPII: MIT license; explicitly warns you must evaluate on your own data; “intended solely for redaction purposes.” (Hugging Face)
- Multilingual categorical anonymiser OpenPII: supports multiple languages (fr/en/de/te/hi/it/es/nl) and is positioned as redact + classify. (Hugging Face)
- Also a multilingual anonymiser variant with MIT license disclaimers similar to the English model. (Hugging Face)
Detector-model reality check: “token classification” models usually do best on standard PII patterns (emails, phones) and common named entities. They can struggle with:
- secrets embedded in code logs
- weird internal IDs
- domain-specific “sensitive info” that is not standard PII
That is why you almost always add rules.
The production-friendly way to wire this up locally
Use Presidio as the pipeline “spine”
Presidio separates:
- Analyzer: find entities
- Anonymizer: apply operators like redact, replace, mask, hash, encrypt (Microsoft GitHub)
It also supports plugging in Hugging Face transformers via a TransformersNlpEngine (spaCy pipeline wrapping a HF NER model). (Microsoft GitHub)
This is a good fit if you want to mix:
- high-precision regex recognizers (emails, credit cards, tokens)
- plus a small ML detector (names, orgs, locations)
- plus deterministic masking operators (replace [PERSON], mask last 4 digits, etc.) (Microsoft GitHub)
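Presidio itself is the recommended spine; to make the analyzer/anonymizer split concrete, here is a stdlib-only sketch of the same shape (the recognizer patterns and operator names are illustrative, not Presidio's own API):

```python
import hashlib
import re

# Analyzer side: high-precision regex recognizers. An ML detector would
# contribute spans in the same (start, end, entity) shape.
RECOGNIZERS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def analyze(text):
    for entity, pattern in RECOGNIZERS.items():
        for m in pattern.finditer(text):
            yield (m.start(), m.end(), entity)

# Anonymizer side: deterministic operators keyed by entity type.
OPERATORS = {
    "EMAIL_ADDRESS": lambda s: "[EMAIL]",
    "CREDIT_CARD": lambda s: "****" + s[-4:],      # keep last 4 digits
    "DEFAULT": lambda s: hashlib.sha256(s.encode()).hexdigest()[:8],
}

def anonymize(text, spans):
    for start, end, entity in sorted(spans, reverse=True):
        op = OPERATORS.get(entity, OPERATORS["DEFAULT"])
        text = text[:start] + op(text[start:end]) + text[end:]
    return text
```

In real Presidio the analogous pieces are the Analyzer's recognizers and the Anonymizer's operators; the point here is only the division of labor.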
If you care about quality, add evaluation early
- presidio-research exists specifically to evaluate the system and recognizers/models. (GitHub)
- Presidio recipes emphasize measuring precision, recall, and an F2-style preference for recall when privacy matters. (Microsoft GitHub)
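The recall-weighted scoring is simple to compute yourself. A sketch over exact-match spans (span matching policy, e.g. partial overlap credit, is up to you):

```python
def span_metrics(predicted: set, gold: set, beta: float = 2.0):
    """Precision, recall, and F-beta over exact-match spans.
    beta=2 weights recall twice as heavily as precision, matching the
    privacy-first preference: a missed entity is worse than an extra one."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, fbeta
```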
How I would choose for your project
Given only what you told me (local, small, “redact sensitive info”), I would do this:
Path 1: You want the simplest “redacted text out” experience
Start with Distil-PII Llama 3.2 1B. (Hugging Face)
Add guardrails:
- validate JSON
- reject outputs that rewrite non-PII text
- run a second-pass regex scan for obvious formats (emails, keys, tokens)
If it misses too much, move to the 3B. (Hugging Face)
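The second-pass scan can be a handful of high-precision patterns run over the model's output. These three are illustrative; extend the table with your own key and ID formats:

```python
import re

# Illustrative patterns only: email, JWT (base64url header "eyJ..."),
# and AWS-style access key IDs. Add your own internal formats here.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def residual_leaks(redacted_text: str) -> list[tuple[str, str]]:
    """Second-pass scan: anything matching here survived redaction,
    so the model output should be rejected or re-masked."""
    return [(name, m.group()) for name, pat in LEAK_PATTERNS.items()
            for m in pat.finditer(redacted_text)]
```

Treat a non-empty result as a hard failure: either re-mask those spans deterministically or reject the output.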
Path 2: You want correctness, auditability, and strict text preservation
Use Presidio + rules + a detector model.
- Detector candidates: Ai4Privacy English (MIT) or Piiranha if the license fits. (Hugging Face)
- Mask with Presidio operators (replace, mask, hash, encrypt) so output is deterministic. (Microsoft GitHub)
This path tends to be more work up front, but it is easier to prove safe later.
Path 3: Your “sensitive info” is policy-based, not just PII
If you mean “remove anything that violates a policy” (trade secrets, internal incident details, etc.), you should evaluate with a policy-benchmark mindset. RedacBench is one of the clearest recent references for evaluating policy-based redaction while preserving utility. (OpenReview)
In that case, you still usually keep a deterministic layer for standard PII and secrets, and use an SLM only for the “policy semantics.”
Common pitfalls (so you do not get burned later)
- Long texts: many detector models require chunking (Piiranha explicitly mentions 256 tokens). (Hugging Face)
- Secrets are not PII: add regex for API keys, JWTs, bearer tokens, etc. Models miss these.
- Structured output drift: always parse and validate JSON, even from models designed for JSON. (Hugging Face)
- Over-redaction destroys usefulness: track “how much non-PII text changed” as a metric, not just recall.
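One way to track that last metric, again assuming bracketed [UPPER_CASE] placeholders (an assumption about your output format, not a standard):

```python
import re

def non_pii_change_ratio(original: str, redacted: str) -> float:
    """Fraction of non-placeholder characters in the redacted output that
    cannot be matched back, in order, to the original text. 0.0 means the
    model only replaced spans; anything above ~0 signals collateral edits."""
    fragments = re.split(r"\[[A-Z_]+\]", redacted)
    total = changed = 0
    pos = 0
    for frag in fragments:
        total += len(frag)
        found = original.find(frag, pos)
        if frag and found != -1:
            pos = found + len(frag)     # fragments must appear in order
        else:
            changed += len(frag)
    return changed / total if total else 0.0
```

Plot this alongside recall during evaluation: a model that drives recall up by rewriting everything will show up immediately here.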
Summary
- Best small local “redacted text + entities JSON” model family: Distil-PII (start with Llama 3.2 1B, go to 3B if needed). (Hugging Face)
- Best small local “detect then mask” options: Ai4Privacy OpenPII (MIT) or Piiranha (good, but restrictive license and short context). (Hugging Face)
- Best practical architecture: Presidio for detection + deterministic anonymization operators, optionally backed by HF transformers. (Microsoft GitHub)