# UNet++: Normal vs. PVD

Trained model weights for PVD classification (normal vs. PVD) using ocular ultrasound videos.

| Resource | Link |
|---|---|
| Paper | [arXiv](https://arxiv.org/abs/2508.04735) |
| Dataset | HF Dataset, Zenodo |
| Checkpoints | Zenodo |
| Code | [GitHub](https://github.com/OSUPCVLab/ERDES) |

## Model Details

| Property | Value |
|---|---|
| Architecture | UNet++ (features=(32, 32, 64, 128, 256, 32)) |
| Input modality | 3D ocular ultrasound video |
| Input shape | [1, 96, 128, 128] (C, D, H, W) |
| Pooling | Global average pooling |
| Output | Binary classification (sigmoid) |
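To make the pooling row concrete, here is a hedged pure-Python sketch of how global average pooling collapses a [C, D, H, W] feature map into a single logit. This is not the real model code: the feature map, `weights`, and `bias` below are made-up toy values, and the actual head lives inside `UNetPlusPlusClassifier`.

```python
def global_avg_pool(feat):
    """feat: nested list [C][D][H][W] -> per-channel means, length C."""
    pooled = []
    for ch in feat:
        vals = [v for plane in ch for row in plane for v in row]
        pooled.append(sum(vals) / len(vals))
    return pooled

# Toy feature map: 2 channels, D=1, H=2, W=2
feat = [
    [[[1.0, 2.0], [3.0, 4.0]]],   # channel 0 -> mean 2.5
    [[[0.0, 0.0], [0.0, 4.0]]],   # channel 1 -> mean 1.0
]
weights, bias = [0.5, -1.0], 0.1  # hypothetical 1-output linear head
logit = sum(w * p for w, p in zip(weights, global_avg_pool(feat))) + bias
```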

## Labels

| Label | Class |
|---|---|
| 0 | Normal |
| 1 | Posterior Vitreous Detachment |
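The sigmoid output maps onto these labels by thresholding the predicted probability at 0.5. A minimal pure-Python sketch (the `decode` helper and its default threshold are illustrative, not part of the ERDES API):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

LABELS = {0: "Normal", 1: "Posterior Vitreous Detachment"}

def decode(logit, threshold=0.5):
    """Map a raw logit to (label name, P(PVD))."""
    p = sigmoid(logit)
    return LABELS[int(p > threshold)], p

decode(2.0)  # -> ("Posterior Vitreous Detachment", 0.880...)
```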

## Usage

```shell
pip install git+https://github.com/OSUPCVLab/ERDES.git ultralytics
```
```python
import torch
import numpy as np
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from ultralytics import YOLO
from erdes.models.components.cls_model import UNetPlusPlusClassifier
from erdes.data.components.utils import resize

# --- 1. Load YOLO for ocular globe detection ---
yolo = YOLO(hf_hub_download("pcvlab/yolov8_ocular_ultrasound_globe_detection", "yolov8_ocular_ultrasound_globe_detection.pt"))
```

```python
# --- 2. Crop your POCUS ultrasound video using YOLO (finds the largest globe bbox across all frames) ---
def crop_video(video_path, model, conf=0.8):
    # First pass: find the largest bounding box across all frames and all detections
    area_max, cropping_bbox = 0, None
    for frame in model.predict(video_path, stream=True, verbose=False, conf=conf):
        for bbox in frame.boxes.xywhn.cpu().numpy():
            area = bbox[2] * bbox[3]
            if area > area_max:
                area_max, cropping_bbox = area, bbox

    if cropping_bbox is None:
        raise ValueError("YOLO could not detect an ocular globe in the video.")

    # Second pass: crop every frame with the largest bbox (clamped to the frame borders)
    frames = []
    for frame in model.predict(video_path, stream=True, verbose=False, conf=conf):
        img = frame.orig_img                                    # [H, W, C], BGR
        h, w, _ = img.shape
        x_c, y_c, bw, bh = cropping_bbox
        x1, y1 = max(int((x_c - bw / 2) * w), 0), max(int((y_c - bh / 2) * h), 0)
        x2, y2 = min(int((x_c + bw / 2) * w), w), min(int((y_c + bh / 2) * h), h)
        frames.append(img[y1:y2, x1:x2])

    return np.stack(frames)                                     # [D, H, W, C]

frames = crop_video("your_video.mp4", yolo)                     # [D, H, W, C]
```
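The corner arithmetic inside `crop_video` deserves a note: Ultralytics' `xywhn` boxes are (x_center, y_center, width, height) normalized to [0, 1]. A standalone sketch of that conversion (`xywhn_to_corners` is an illustrative helper, not an ERDES or Ultralytics function, with border clamping added):

```python
def xywhn_to_corners(bbox, img_w, img_h):
    """Normalized center-format box -> integer pixel corners (x1, y1, x2, y2)."""
    x_c, y_c, bw, bh = bbox
    x1 = int((x_c - bw / 2) * img_w)
    y1 = int((y_c - bh / 2) * img_h)
    x2 = int((x_c + bw / 2) * img_w)
    y2 = int((y_c + bh / 2) * img_h)
    # Clamp to the frame in case the box touches the border.
    return max(x1, 0), max(y1, 0), min(x2, img_w), min(y2, img_h)

# A centered box covering half the width/height of a 640x480 frame:
xywhn_to_corners((0.5, 0.5, 0.5, 0.5), 640, 480)  # -> (160, 120, 480, 360)
```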

```python
# --- 3. Preprocess ---
video = torch.from_numpy(frames).float()                       # [D, H, W, C]
video = video.permute(3, 0, 1, 2)                              # [C, D, H, W]
if video.shape[0] == 3:
    video = video.mean(dim=0, keepdim=True)                    # grayscale [1, D, H, W]
video = resize((96, 128, 128))(video) / 255.0                  # pad + resize + normalize
video = video.unsqueeze(0)                                     # [1, 1, 96, 128, 128] (N, C, D, H, W)
```
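How `resize((96, 128, 128))` maps a variable-length clip onto the fixed 96-frame depth is defined in `erdes.data.components.utils`; as one hedged illustration (an assumption for intuition only, not the library's actual implementation), nearest-neighbor index resampling along the temporal axis looks like:

```python
def resample_indices(n_frames, target):
    """Pick `target` source-frame indices from a clip of `n_frames` frames."""
    return [min(int(i * n_frames / target), n_frames - 1) for i in range(target)]

resample_indices(10, 5)  # -> [0, 2, 4, 6, 8]
```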

```python
# --- 4. Load model and run inference ---
model = UNetPlusPlusClassifier(in_channels=1, num_classes=1, pooling="avg")
weights = load_file(hf_hub_download("pcvlab/unetplusplus_normal_vs_pvd", "model.safetensors"))
model.load_state_dict(weights)
model.eval()

with torch.no_grad():
    logit = model(video)
    prob = torch.sigmoid(logit).item()     # P(PVD)
    pred = int(prob > 0.5)

labels = {0: "Normal", 1: "Posterior Vitreous Detachment"}
print(f"Prediction: {labels[pred]} (P(PVD) = {prob:.3f})")
```

## Citation

If you use this model, please cite the ERDES paper:

```bibtex
@misc{ozkut2026erdes,
  title={ERDES: A Benchmark Video Dataset for Retinal Detachment and Macular Status Classification in Ocular Ultrasound},
  author={Yasemin Ozkut and Pouyan Navard and Srikar Adhikari and Elaine Situ-LaCasse and Josie Acu{\~n}a and Adrienne Yarnish and Alper Yilmaz},
  year={2026},
  eprint={2508.04735},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.04735}
}
```