Toward Generalized Image Quality Assessment:
Relaxing the Perfect Reference Quality Assumption
CVPR 2025
Du Chen*
Tianhe Wu*
Kede Ma
Lei Zhang
* Equal contribution, order decided by random seed

With a reference image in the middle, which image, A or B, has better perceived visual quality? The proposed A-FINE generalizes and outperforms standard FR-IQA models under both perfect and imperfect reference conditions.



Abstract

Most full-reference image quality assessment (FR-IQA) models assume that the reference image is of perfect quality. This assumption is flawed: many reference images in existing IQA datasets are of subpar quality, and recent generative image enhancement methods can produce images of higher quality than their original counterparts. These factors challenge the effectiveness and applicability of current FR-IQA models. To address this limitation, we build a large-scale IQA database, namely DiffIQA, which contains approximately 180,000 images generated by a diffusion-based image enhancer with adjustable hyperparameters. Each image is annotated by human subjects as worse, similar, or better in quality compared with its reference. Building on this database, we present a generalized FR-IQA model, the Adaptive FIdelity-Naturalness Evaluator (A-FINE), which accurately assesses and adaptively combines the fidelity and naturalness of the test image. A-FINE aligns well with standard FR-IQA when the reference image is much more natural than the test image. Extensive experiments demonstrate that A-FINE surpasses existing FR-IQA models on well-established IQA datasets and on our newly created DiffIQA. To further validate A-FINE, we additionally construct a super-resolution IQA benchmark (SRIQA-Bench), encompassing test images produced by ten state-of-the-art SR methods with reliable human quality annotations. Tests on SRIQA-Bench reaffirm the advantages of A-FINE.


Installation and Usage

Installation:

git clone https://github.com/ChrisDud0257/AFINE
cd QuickInference
conda create --name afine python=3.10
conda activate afine

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Usage:

cd QuickInference

python afine.py --pretrain_CLIP_path [path to the pretrained CLIP ViT-B-32.pt] \
                --afine_path [path to our afine.pth] \
                --dis_img_path [path to the distorted image] \
                --ref_img_path [path to the reference image]



Method

A-FINE leverages a shared feature transformation to assess both the fidelity and the naturalness of the test image, and adaptively combines the two terms to produce the final quality score.
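As a rough illustration of the adaptive combination, here is a minimal sketch; the function name, score conventions, and the sigmoid gate are our assumptions, not the released implementation:

import math

def combined_score(fidelity, nat_test, nat_ref, alpha=1.0):
    # fidelity: test-to-reference distance in the shared feature space
    # (lower = more faithful to the reference).
    # nat_test / nat_ref: naturalness scores of the test and reference
    # images (lower = more natural, distance-style convention).
    # The gate approaches 1 when the reference is much more natural than
    # the test image, so the score collapses to the fidelity term and the
    # model behaves like a standard FR-IQA metric.
    gate = 1.0 / (1.0 + math.exp(-alpha * (nat_test - nat_ref)))
    return gate * fidelity + (1.0 - gate) * nat_test

# Pristine reference: large naturalness gap, gate ~ 1, score ~ fidelity.
print(combined_score(fidelity=0.3, nat_test=5.0, nat_ref=0.1))

Conversely, when the reference itself is unnatural, the gate shrinks and the naturalness of the test image dominates, which is exactly the regime where conventional FR-IQA models break down.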


DiffIQA

We gather original input images from three sources: 1) 1,200 images from the DF2K dataset; 2) 1,000 images from the Internet under Creative Commons licenses; and 3) 640 images captured with mobile phones or digital cameras. The input images are cropped to 512×512 patches with an overlap of less than 128 pixels, yielding a total of 29,868 inputs to our trained enhancer. During inference, we randomly 1) apply the same degradations as used during training, 2) perturb the initial image latent with additive Gaussian noise of varying intensities, and 3) adjust the number of sampling steps within the range [20, 1000] to generate images of diverse quality levels. Combining these operations, we generate six test images for each input, yielding a total of 179,208 test images (29,868 × 6), as sketched below.
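A minimal sketch of the per-image randomization described above; the configuration keys, the degradation probability, and the noise range are illustrative assumptions, not the actual pipeline:

import random

def sample_enhancer_config():
    # One randomized configuration per generated variant; six such
    # configurations are drawn for every 512x512 input crop.
    return {
        # Whether to reuse a training-time degradation on the input
        # (the 50% probability is an illustrative choice).
        "apply_training_degradation": random.random() < 0.5,
        # Standard deviation of the additive Gaussian noise injected
        # into the initial image latent (the range is illustrative).
        "latent_noise_std": random.uniform(0.0, 1.0),
        # Number of diffusion sampling steps, drawn from [20, 1000]
        # as stated in the text.
        "sampling_steps": random.randint(20, 1000),
    }

configs = [sample_enhancer_config() for _ in range(6)]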


Instances of images that are worse than, similar to, and better than their reference counterparts in our DiffIQA dataset.


SRIQA-Bench

We first compile 100 original images covering a wide range of natural scenes and subject them to common degradations to generate the low-resolution inputs. We then adopt two regression-based SR methods, 1) SwinIR and 2) RRDB, and eight generative SR methods, 3) Real-ESRGAN, 4) BSRGAN, 5) HGGT, 6) SUPIR, 7) SeeSR, 8) StableSR, 9) SinSR, and 10) OSEDiff, to produce ten SR images per scene. Generally speaking, diffusion-based SR methods produce more plausible textures than GAN-based methods, and generative SR methods recover more realistic structures than regression-based methods.

Example arrangement of images in SRIQA-Bench.

Acknowledgements

This work was partly supported by the Hong Kong ITC Innovation and Technology Fund (9440379 and 9440390), and fully supported by the PolyU-OPPO Joint Innovative Research Center. We sincerely thank the volunteers who participated in our subjective study. A Human Subjects Ethics Committee approved the study and all participants signed consent forms beforehand.


Bibtex
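
A BibTeX entry assembled from the title, authors, and venue listed above (the citation key is a placeholder):

@inproceedings{chen2025afine,
  title     = {Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption},
  author    = {Chen, Du and Wu, Tianhe and Ma, Kede and Zhang, Lei},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025}
}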