arxiv:2605.20016

FGSVQA: Frequency-Guided Short-form Video Quality Assessment

Published on May 19
Authors:

Abstract

An end-to-end video quality assessment framework uses a CLIP-based dense visual encoder with frequency-domain compression priors to generate artifact- and structure-aware weight maps for accurate and efficient quality prediction.
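The abstract does not say how the frequency-domain compression priors are computed. The following is a minimal sketch that assumes an 8x8 block DCT, like the transform grid used by JPEG. For each block, the fraction of near-zero AC coefficients serves as a crude artifact cue, since quantization tends to zero out coefficients. The log of the AC energy serves as a crude structure cue. Either cue can then weight the pooling of per-block features. The block size, the `zero_tol` threshold, both cue definitions, and the names `dct2`, `block_priors`, and `weighted_pool` are hypothetical and do not come from FGSVQA. The linked repository describes the actual method.

```python
import math

N = 8  # assumed block size (8x8 transform grid, as in JPEG)


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as nested lists."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * acc
    return coeffs


def block_priors(frame, zero_tol=1.0):
    """Per-block (artifact, structure) cues for a grayscale frame (hypothetical).

    artifact:  fraction of AC coefficients with magnitude below zero_tol,
               a rough proxy for coefficients removed by quantization.
    structure: log(1 + AC energy), a rough proxy for edges and texture.
    Only full N x N blocks are visited; partial edge blocks are skipped.
    """
    h, w = len(frame), len(frame[0])
    artifact, structure = [], []
    for by in range(0, h - h % N, N):
        for bx in range(0, w - w % N, N):
            block = [row[bx:bx + N] for row in frame[by:by + N]]
            c = dct2(block)
            ac = [c[u][v] for u in range(N) for v in range(N) if (u, v) != (0, 0)]
            artifact.append(sum(abs(a) < zero_tol for a in ac) / len(ac))
            structure.append(math.log1p(sum(a * a for a in ac)))
    return artifact, structure


def weighted_pool(features, weights):
    """Average per-block feature vectors with weights normalised to sum to 1.

    Falls back to a plain mean when every weight is zero.
    """
    total = sum(weights)
    if total == 0:
        weights, total = [1.0] * len(weights), float(len(weights))
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dim)]
```

A flat block yields an artifact cue of 1.0 and a structure cue near 0, and a textured block scores lower on the first and higher on the second. Real decoded frames are rounded and often deblocked, so a fixed `zero_tol` is only a rough stand-in for a learned weight map.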

Short-form video poses new challenges to the quality assessment of user-generated content (UGC) due to its complex generation pipeline, rapid content variation, and mixed distortions. To address these challenges, we propose an end-to-end video quality assessment (VQA) framework that employs a dense visual encoder based on CLIP and incorporates compression priors derived from the frequency domain to generate artifact- and structure-aware weight maps for feature aggregation. By explicitly decomposing the representation into artifact, structure, and original visual feature branches and adaptively fusing them over time through a learned gating module, the proposed method achieves accurate and efficient quality prediction. Experimental results show that our method performs strongly on short-form video datasets in terms of average rank and linear correlation (SRCC: 0.736, PLCC: 0.787) while maintaining efficient inference runtime. The code and additional results are available at: https://github.com/xinyiW915/FGSVQA.
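One loose way to picture the gated fusion step is as a per-frame softmax gate. The gate mixes the artifact, structure, and original feature vectors. Temporal average pooling and a linear regression head then follow. The per-frame gating, the softmax over three branches, mean pooling, the linear head, and the names `gated_fusion` and `predict_score` are illustrative assumptions, not the paper's module. The abstract states only that the gating module is learned, so the weights passed in here stand in for trained parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gated_fusion(frames, gate_w, gate_b):
    """Gate three branches per frame, then average the fused vectors over time.

    frames: list of (artifact_vec, structure_vec, original_vec), all length dim.
    gate_w: three weight vectors of length 3 * dim (one logit per branch).
    gate_b: three biases.
    """
    fused_per_frame = []
    for branches in frames:
        concat = [x for vec in branches for x in vec]
        gates = softmax([dot(w, concat) + b for w, b in zip(gate_w, gate_b)])
        dim = len(branches[0])
        fused_per_frame.append(
            [sum(g * vec[d] for g, vec in zip(gates, branches)) for d in range(dim)])
    # Temporal average pooling of the gated per-frame features.
    n = len(fused_per_frame)
    dim = len(fused_per_frame[0])
    return [sum(f[d] for f in fused_per_frame) / n for d in range(dim)]


def predict_score(frames, gate_w, gate_b, head_w, head_b):
    """Hypothetical linear regression head on the pooled clip feature."""
    return dot(head_w, gated_fusion(frames, gate_w, gate_b)) + head_b
```

With all-zero gate parameters, the gates are uniform (1/3 per branch). A large bias on one branch makes the fused feature collapse onto that branch, which is the adaptive behaviour the gate is meant to learn.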
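SRCC (Spearman rank-order correlation coefficient) measures how well predicted scores preserve the ranking of subjective scores. PLCC (Pearson linear correlation coefficient) measures their linear agreement. The following is a stdlib sketch of both that assigns average ranks to ties. Many VQA evaluations fit a nonlinear logistic mapping to the predictions before computing PLCC. The abstract does not state which protocol produced the reported values, so this raw version may not reproduce them exactly.

```python
import math


def pearson(x, y):
    """Pearson linear correlation (PLCC) between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

SRCC depends only on ranks, so any monotonic rescaling of the predictions leaves it unchanged. PLCC does not have this property, which is why the two metrics are usually reported together.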
