arxiv:2605.26095

Pixel-Level Pavement Distress Assessment Using Instance Segmentation

Published on May 25 · Submitted by YuqianYuan on May 26

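AI-generated summary

A vision-based pavement distress analysis system built on Mask R-CNN instance segmentation is evaluated for crack detection and area quantification on UWGB-StreetCrack, a custom field-collected dataset. The best model (ResNet-101 FPN backbone) reaches 84.23% precision and 90.04% recall under the paper's bounding-box matching protocol, while a retrained YOLO detector used for context reaches 27.5% precision and 20.7% recall on its validation protocol.

Abstract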

Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for maintenance-relevant quantification. This paper presents a vision-based pavement distress analysis system based on Mask R-CNN instance segmentation and evaluates it on UWGB-StreetCrack, a custom field-collected roadway image dataset acquired with a vehicle-mounted smartphone and manually annotated with polygon labels for longitudinal cracks, transverse cracks, alligator cracks, and potholes. Five Detectron2-based Mask R-CNN backbone variants were considered under a consistent fine-tuning protocol. The best-performing model, Mask R-CNN with a ResNet-101 FPN backbone, achieved 84.23% precision, 90.04% recall, and an F1 score of 87.04% under the project-specific bounding-box matching protocol. The same model produced an aggregate predicted crack-area fraction of 2.164%, closely matching the 2.170% ground-truth crack-area fraction. To contextualize the segmentation system against a detector-oriented alternative, a CSPDarknet53-based YOLO detector was also adapted and retrained on the dataset, reaching 27.5% precision and 20.7% recall on the validation protocol. The results show that instance segmentation is a practical direction for field pavement imagery and aggregate crack-area estimation, while also exposing open challenges in annotation consistency, class imbalance, confounder rejection, and mask-level benchmarking.
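The headline numbers in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below recomputes F1 from the reported precision and recall and the relative gap between the predicted and ground-truth crack-area fractions. The `area_fraction` helper is a hypothetical illustration, not the paper's code: it assumes the aggregate fraction is pooled over all pixels of all images and that each image has already been reduced to a single binary mask. The abstract does not spell out either detail.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = distress pixel, 0 = background


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def area_fraction(masks: Sequence[Mask]) -> float:
    """Distress pixels divided by all pixels, pooled over every image.

    Assumption: pooling over images is one plausible reading of
    "aggregate crack-area fraction"; the paper may aggregate differently.
    """
    distress_pixels = sum(sum(sum(row) for row in mask) for mask in masks)
    total_pixels = sum(len(mask) * len(mask[0]) for mask in masks)
    return distress_pixels / total_pixels


# Figures taken from the abstract.
f1 = f1_score(0.8423, 0.9004)
rel_error = abs(2.164 - 2.170) / 2.170

print(f"F1 = {f1:.2%}")                          # F1 = 87.04%
print(f"relative area error = {rel_error:.2%}")  # relative area error = 0.28%

# Hypothetical toy masks (two 2x4 images), only to exercise the helper.
toy_masks = [
    [[0, 1, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 0, 0], [1, 1, 0, 0]],
]
print(area_fraction(toy_masks))  # 4 distress pixels / 16 pixels = 0.25
```

The recomputed F1 agrees with the reported 87.04%, and the predicted area fraction differs from the ground truth by roughly 0.28% in relative terms.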

Community

treating this problem as discrete masks on full-scene road images is a clean move that makes the crack geometry usable for maintenance planning. i'm curious about the evaluation protocol: precision and recall are reported under a project-specific bbox matching, which can overstate or understate mask quality. a mask-centric metric, like boundary IoU or pixel-area error, would help diagnose whether the gains come from better localization or just better bbox alignment. arxivlens had a solid breakdown that helped me parse the method details: https://arxivlens.com/PaperView/Details/pixel-level-pavement-distress-assessment-using-instance-segmentation-9916-07b2f8e6. one question: would Mask R-CNN show the same advantage across backbones if it were evaluated strictly on mask-level metrics rather than bbox-derived ones?
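The concern in the comment above can be illustrated with a deliberately extreme toy case. The sketch below is not the paper's matching protocol, whose details are not given here. It compares plain bounding-box IoU with mask IoU for a thin diagonal "crack" and a prediction running along the opposite diagonal. All names and data are hypothetical, and masks are represented as sets of `(row, col)` pixel coordinates to keep the example dependency-free.

```python
from typing import Set, Tuple

Pixels = Set[Tuple[int, int]]  # (row, col) coordinates of mask pixels


def mask_iou(a: Pixels, b: Pixels) -> float:
    """Intersection over union computed on the mask pixels themselves."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bbox(p: Pixels) -> Tuple[int, int, int, int]:
    """Tight axis-aligned box (row0, col0, row1, col1), end-exclusive."""
    rows = [r for r, _ in p]
    cols = [c for _, c in p]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def bbox_iou(a: Pixels, b: Pixels) -> float:
    """Intersection over union of the two tight bounding boxes."""
    r0a, c0a, r1a, c1a = bbox(a)
    r0b, c0b, r1b, c1b = bbox(b)
    inter_h = max(0, min(r1a, r1b) - max(r0a, r0b))
    inter_w = max(0, min(c1a, c1b) - max(c0a, c0b))
    inter = inter_h * inter_w
    area_a = (r1a - r0a) * (c1a - c0a)
    area_b = (r1b - r0b) * (c1b - c0b)
    return inter / (area_a + area_b - inter)


def relative_area_error(pred: Pixels, truth: Pixels) -> float:
    """Absolute difference in pixel count, relative to the ground truth."""
    return abs(len(pred) - len(truth)) / len(truth)


# Hypothetical 10x10 scene: ground truth on the main diagonal,
# prediction on the anti-diagonal. The two never share a pixel.
truth = {(i, i) for i in range(10)}
pred = {(i, 9 - i) for i in range(10)}

print(bbox_iou(pred, truth))             # 1.0
print(mask_iou(pred, truth))             # 0.0
print(relative_area_error(pred, truth))  # 0.0
```

In this constructed case the boxes coincide and the pixel counts match, yet the masks do not overlap at all. Real predictions are unlikely to be this adversarial, but the example shows why an overlap-based mask metric carries information that box matching and area totals do not.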
