mini-text-detection: Khmer & English Text Detection

A YOLO11n-based text detection model fine-tuned to locate and classify text regions in images containing Khmer and English content.
It detects three types of text block. It can serve as the first stage of a reading pipeline, where the detected regions are cropped and passed to an OCR model (e.g. phonsobon/mini-ocr).


Model Details

| Property | Value |
|---|---|
| Architecture | YOLO11n (nano) |
| Task | Object detection, 3 classes |
| Weights file | khmer-text-detection-mini.pt |
| Framework | Ultralytics / PyTorch |
| Input | RGB image of any size (resized internally) |

Classes

| ID | Name | Khmer | Description |
|---|---|---|---|
| 0 | subject | កម្មវត្ថុ | Title or subject heading |
| 1 | reference | យោង | Reference or citation |
| 2 | content | អត្ថបទ | Main body / paragraph text |
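You may want to label output in either language. One option is to keep the class table as a small lookup structure. This is a minimal sketch. ClassLabel, CLASS_LABELS and format_label are hypothetical helpers, not part of the model or of Ultralytics.

```python
from typing import NamedTuple


class ClassLabel(NamedTuple):
    """English name, Khmer name and description for one detection class."""
    english: str
    khmer: str
    description: str


# Hypothetical lookup keyed by the class IDs the model predicts.
CLASS_LABELS = {
    0: ClassLabel("subject", "កម្មវត្ថុ", "Title or subject heading"),
    1: ClassLabel("reference", "យោង", "Reference or citation"),
    2: ClassLabel("content", "អត្ថបទ", "Main body / paragraph text"),
}


def format_label(cls_id: int, lang: str = "en") -> str:
    """Return the display name for a class ID in English ("en") or Khmer ("km")."""
    label = CLASS_LABELS[cls_id]
    return label.khmer if lang == "km" else label.english


print(format_label(2))        # content
print(format_label(1, "km"))  # យោង
```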

Files

| File | Description |
|---|---|
| khmer-text-detection-mini.pt | Full Ultralytics YOLO model (weights + config) |

Quick Start

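Install dependencies

pip install ultralytics huggingface_hub

The repository is gated. Log in on the Hugging Face website and accept the access conditions on the model page. Then authenticate locally (for example with huggingface-cli login) before downloading the weights.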

Run inference

from ultralytics import YOLO
from huggingface_hub import hf_hub_download

# ── Download model ────────────────────────────────────────────────────────────
model_path = hf_hub_download(
    repo_id="phonsobon/mini-text-detection",
    filename="khmer-text-detection-mini.pt",
)

# ── Class names ───────────────────────────────────────────────────────────────
CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

# ── Load & predict ────────────────────────────────────────────────────────────
model = YOLO(model_path)

results = model.predict(
    source="your_image.jpg",   # path, URL, or numpy array
    conf=0.25,                 # confidence threshold
    iou=0.45,                  # NMS IoU threshold
    imgsz=640,
)

# ── Print results ─────────────────────────────────────────────────────────────
for r in results:
    r.show()                                        # display with bounding boxes
    for box in r.boxes:
        cls_id = int(box.cls)
        label  = CLASS_NAMES[cls_id]
        conf   = float(box.conf)
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"[{label}] conf={conf:.2f}  box=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
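You may need the detections outside Python, for example in JSON logs, an API response, or a later OCR step. In that case it can help to convert each result into plain records first. This is a minimal sketch with a hypothetical helper, detections_to_records. It takes the three per-image sequences you would get from r.boxes.xyxy.tolist(), r.boxes.cls.tolist() and r.boxes.conf.tolist(). The sample numbers are invented.

```python
import json
from typing import Dict, List, Sequence

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def detections_to_records(
    xyxy: Sequence[Sequence[float]],
    cls: Sequence[float],
    conf: Sequence[float],
    names: Dict[int, str] = CLASS_NAMES,
) -> List[dict]:
    """Turn parallel box/class/confidence sequences into JSON-friendly dicts."""
    records = []
    for (x1, y1, x2, y2), c, p in zip(xyxy, cls, conf):
        cls_id = int(c)
        records.append({
            "label": names[cls_id],
            "class_id": cls_id,
            "confidence": round(float(p), 4),
            # Rounded integer pixel coordinates in the original image.
            "box": [round(x1), round(y1), round(x2), round(y2)],
        })
    return records


# Invented numbers standing in for one image's detections.
records = detections_to_records(
    xyxy=[[12.4, 30.0, 410.7, 62.2], [15.0, 80.6, 600.2, 420.9]],
    cls=[0.0, 2.0],
    conf=[0.91, 0.87],
)
print(json.dumps(records, indent=2))
```

With real output, call it once per result, e.g. detections_to_records(r.boxes.xyxy.tolist(), r.boxes.cls.tolist(), r.boxes.conf.tolist()).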

Filter by class

# Get only subject (heading) boxes
subject_boxes = [b for b in results[0].boxes if int(b.cls) == 0]

# Get only content (body) boxes
content_boxes = [b for b in results[0].boxes if int(b.cls) == 2]
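Ultralytics can also drop unwanted classes at inference time through the classes argument of predict. For example, classes=[0, 2] keeps only subject and content boxes. If you already have the arrays, a boolean mask does the same job after the fact. This is a minimal sketch on NumPy arrays. select_classes is a hypothetical helper and the sample boxes are invented.

```python
import numpy as np


def select_classes(xyxy: np.ndarray, cls: np.ndarray, keep: list) -> np.ndarray:
    """Return the rows of xyxy whose class ID is in keep."""
    mask = np.isin(cls.astype(int), keep)
    return xyxy[mask]


# Invented detections: one subject, one reference and two content boxes.
xyxy = np.array([
    [10, 10, 300, 40],
    [10, 50, 300, 70],
    [10, 80, 600, 300],
    [10, 310, 600, 500],
], dtype=float)
cls = np.array([0, 1, 2, 2], dtype=float)

body = select_classes(xyxy, cls, keep=[2])
print(body.shape)  # (2, 4)
```

With real output you would pass r.boxes.xyxy.cpu().numpy() and r.boxes.cls.cpu().numpy().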

Save annotated images

results = model.predict(source="your_image.jpg", save=True, project="runs/detect")
# Saved to runs/detect/predict/ (predict2, predict3, ... on later runs unless exist_ok=True)
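Results.plot() returns the annotated image as a BGR NumPy array, which is useful if you prefer to handle saving yourself. For full control over colours, you can draw the boxes by hand with Pillow's ImageDraw. This is a minimal sketch with a few assumptions:

- CLASS_COLORS and draw_detections are hypothetical names.
- The colours are arbitrary choices.
- Labels use the English class names, because Pillow's default font may not include Khmer glyphs.

```python
from PIL import Image, ImageDraw

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}
# Hypothetical RGB colour per class.
CLASS_COLORS = {0: (220, 20, 60), 1: (30, 144, 255), 2: (34, 139, 34)}


def draw_detections(img: Image.Image, detections) -> Image.Image:
    """Draw (cls_id, (x1, y1, x2, y2)) detections on a copy of img."""
    out = img.convert("RGB")  # convert() returns a new image, so img is untouched
    draw = ImageDraw.Draw(out)
    for cls_id, (x1, y1, x2, y2) in detections:
        color = CLASS_COLORS[cls_id]
        draw.rectangle((x1, y1, x2, y2), outline=color, width=3)
        draw.text((x1 + 4, y1 + 4), CLASS_NAMES[cls_id], fill=color)
    return out


# Example on a blank canvas with invented boxes.
canvas = Image.new("RGB", (400, 200), "white")
annotated = draw_detections(canvas, [(0, (10, 10, 390, 50)), (2, (10, 70, 390, 190))])
annotated.save("annotated_custom.png")
```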

Batch inference on a folder

# stream=True yields one result at a time instead of holding the whole folder in memory
results = model.predict(source="path/to/images/", conf=0.25, imgsz=640, stream=True)
for r in results:
    counts = {name: 0 for name in CLASS_NAMES.values()}
    for box in r.boxes:
        counts[CLASS_NAMES[int(box.cls)]] += 1
    print(r.path, "→", counts)
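To summarise a whole folder, you can accumulate the per-image counts and write them to a CSV file. This is a minimal sketch using collections.Counter and the csv module. It takes (path, class_ids) pairs, which you could build inside the loop above as (r.path, r.boxes.cls.tolist()). The helper summarise_counts and the output filename are hypothetical, and the sample input is invented.

```python
import csv
from collections import Counter
from typing import Iterable, Sequence, Tuple

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def summarise_counts(
    per_image: Iterable[Tuple[str, Sequence[float]]],
    csv_path: str = "detection_counts.csv",
) -> Counter:
    """Write one CSV row of class counts per image and return folder-wide totals."""
    totals: Counter = Counter()
    fieldnames = ["path"] + list(CLASS_NAMES.values())
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for path, class_ids in per_image:
            counts = Counter(CLASS_NAMES[int(c)] for c in class_ids)
            totals.update(counts)
            # Counter returns 0 for absent classes, so every column gets a number.
            writer.writerow({"path": path, **{n: counts[n] for n in CLASS_NAMES.values()}})
    return totals


# Invented class IDs for two images.
totals = summarise_counts([("a.jpg", [0.0, 2.0, 2.0]), ("b.jpg", [1.0, 2.0])])
print(dict(totals))
```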

Crop + OCR Pipeline

Combine this model with phonsobon/mini-ocr for full end-to-end document reading, with each region labelled by type:

from ultralytics import YOLO
from huggingface_hub import hf_hub_download
from PIL import Image

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

# ── Load detection model ──────────────────────────────────────────────────────
det_path = hf_hub_download("phonsobon/mini-text-detection", "khmer-text-detection-mini.pt")
detector = YOLO(det_path)

# ── Detect text regions ───────────────────────────────────────────────────────
image_path = "your_image.jpg"
results = detector.predict(source=image_path, conf=0.25, imgsz=640)

img = Image.open(image_path).convert("RGB")

# ── Crop each detected region and tag it with its class ──────────────────────
for i, box in enumerate(results[0].boxes):
    cls_id         = int(box.cls)
    label          = CLASS_NAMES[cls_id]
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())

    crop = img.crop((x1, y1, x2, y2))
    crop.save(f"crop_{i}_{label}.png")
    print(f"Saved crop {i} → class: {label}")
    # → feed each crop to phonsobon/mini-ocr for text recognition
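Detections are not returned in reading order. Before OCR, you may therefore want to sort regions top-to-bottom, then left-to-right. It can also help to add a few pixels of padding so that characters touching the box edge are not clipped. This is a minimal sketch under those assumptions. The helper names and the line_tol and pad values are hypothetical and should be tuned on your own documents.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def reading_order(boxes: List[Box], line_tol: int = 10) -> List[int]:
    """Return box indices sorted top-to-bottom, then left-to-right.

    Boxes whose top edges lie within line_tol pixels of a row's first box
    are treated as one row.
    """
    order = sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))
    rows: List[List[int]] = []
    for i in order:
        if rows and abs(boxes[i][1] - boxes[rows[-1][0]][1]) <= line_tol:
            rows[-1].append(i)
        else:
            rows.append([i])
    return [i for row in rows for i in sorted(row, key=lambda j: boxes[j][0])]


def pad_box(box: Box, pad: int, width: int, height: int) -> Box:
    """Grow a box by pad pixels on each side, clipped to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - pad), max(0, y1 - pad), min(width, x2 + pad), min(height, y2 + pad))


# Invented boxes: a body paragraph, a heading, and a reference on the heading's row.
boxes = [(20, 120, 580, 400), (20, 20, 300, 50), (320, 24, 580, 52)]
print(reading_order(boxes))            # [1, 2, 0]
print(pad_box(boxes[1], 4, 600, 420))  # (16, 16, 304, 54)
```

In the loop above, you would collect the integer boxes first, then crop with img.crop(pad_box(box, 4, *img.size)) in the order that reading_order returns.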

Input Tips

  • Works on any image size: Ultralytics resizes and pads (letterboxes) each image to imgsz, 640 px by default.
  • Best results on document photos, screenshots, and scanned pages.
  • Lower conf (toward 0.1) to catch more text regions at the cost of more false positives. Raise it (toward 0.5) for fewer, more confident detections.
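One way to check the conf trade-off on your own images is to run once with a low threshold. You can then count how many detections would survive at higher cut-offs. This is a minimal sketch on a list of confidence scores, such as r.boxes.conf.tolist() from a run with conf=0.1. The helper name and the thresholds are hypothetical, and the scores are invented.

```python
from typing import Dict, Sequence


def kept_per_threshold(confs: Sequence[float], thresholds: Sequence[float]) -> Dict[float, int]:
    """Count how many detections have a confidence above each threshold."""
    return {t: sum(1 for c in confs if c > t) for t in thresholds}


# Invented scores from a single low-threshold run.
confs = [0.92, 0.81, 0.47, 0.33, 0.18, 0.12]
print(kept_per_threshold(confs, [0.1, 0.25, 0.5]))  # {0.1: 6, 0.25: 4, 0.5: 2}
```

A sharp drop between two thresholds suggests many borderline detections in that range. Those are the ones worth inspecting by eye.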

Limitations

  • May miss very small text (< ~8 px height in the original image).
  • Not designed for handwritten or heavily stylised/artistic fonts.
  • Performance is best on document-style layouts similar to the training data.
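The image is resized before detection, so the text height the network actually sees depends on the original resolution. This is a rough sketch that assumes a square imgsz, for which the letterbox step scales the longer image side to imgsz. The helper name is hypothetical, and the result is only an estimate for judging whether text is likely to end up very small.

```python
def input_text_height(text_px: float, img_w: int, img_h: int, imgsz: int = 640) -> float:
    """Estimate text height in pixels after scaling the longer side to imgsz."""
    scale = imgsz / max(img_w, img_h)
    return text_px * scale


# A 4000x3000 photo: text 40 px tall in the original shrinks to about 6.4 px at imgsz=640.
print(round(input_text_height(40, 4000, 3000), 1))              # 6.4
# The same photo at imgsz=1280 keeps it at about 12.8 px.
print(round(input_text_height(40, 4000, 3000, imgsz=1280), 1))  # 12.8
```

The training image size is not stated here. Check results on your own data before relying on a larger imgsz.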

Related Model

| Model | Task |
|---|---|
| phonsobon/mini-ocr | Text recognition (CRNN + CTC) for Khmer & English |

License

MIT
