Scoot: A Perceptual Metric for Facial Sketches

02/08/2019 Deng-Ping Fan

Deng-Ping Fan¹,², ShengChuan Zhang³, Yu-Huan Wu¹, Yun Liu¹, Ming-Ming Cheng¹, Bo Ren¹, Paul L. Rosin⁴, Rongrong Ji³

¹TKLNDST, CS, Nankai University ²Inception Institute of Artificial Intelligence (IIAI) ³Xiamen University ⁴Cardiff University

Abstract

The human visual system can quickly assess the perceptual similarity between two facial sketches. However, two widely used existing facial sketch metrics, FSIM and SSIM, fail to capture this perceptual similarity. A recent study in facial modeling has verified that including both structure and texture significantly benefits face sketch synthesis (FSS). But which statistics matter most, and which contribute to that success? In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers block-level spatial structure and co-occurrence texture statistics. To test the quality of metrics, we propose three novel meta-measures based on various reliable properties. Extensive experiments verify that our Scoot metric exceeds the performance of prior work. In addition, we built the first large-scale (152k judgments) human-perception-based sketch database, which can evaluate how consistent a metric is with human perception. Our results suggest that “spatial structure” and “co-occurrence texture” are two generally applicable perceptual features in face sketch synthesis.

Figure 1: Which synthesized sketch is more similar to the middle sketch? For the middle case, sketch 0 is more similar to the reference than sketch 1 in terms of structure and texture. Sketch 1 almost completely destroys the structure of the hair. The widely used (SSIM [65], FSIM [79]), classic (IFC [40], VIF [39]), and recently released (GMSD [74]) metrics disagree with humans. Only our Scoot metric agrees well with humans.

Publication

Deng-Ping Fan, ShengChuan Zhang, Yu-Huan Wu, Yun Liu, Ming-Ming Cheng, Bo Ren, Paul L Rosin, Rongrong Ji

Scoot: A Perceptual Metric for Facial Sketches, ICCV, 2019

[project page][bib][pdf][supp][official version][code][Dataset (77M)]

Most related projects on this website

Enhanced-alignment Measure for Binary Foreground Map Evaluation, Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, Ali Borji. IJCAI, 2018. Oral presentation, Accept rate: 20.0% [710/3470] [project page | bib | pdf | latex | official version | Chinese pdf | IJCAI slides | IJCAI poster | Matlab code (5.6k) | Dataset (3M)]
Structure-measure: A New Way to Evaluate Foreground Maps, Deng-Ping Fan, Yun Liu, Tao Li, Ming-Ming Cheng, Ali Borji. IEEE ICCV, 2017. Spotlight presentation, Accept rate: 2.61% [56/2143] [project page | bib | official version | Chinese pdf] [ICCV slides | ICCV poster | video | Youtube | Matlab code | C++ code]

Motivation

A good perceptual metric should account for human perception in facial sketch comparison. It should:

agree closely with human visual perception, so that a sketch it rates as good can be used directly in various subjective applications.
be insensitive to slight mismatches (i.e., resizing and rotation), since real-world sketches drawn by artists do not match the original photos pixel for pixel.
be capable of capturing holistic content, that is, prefer a complete sketch over one that contains only strokes (and has lost some facial components).

What did we do?

Firstly, we propose a Structure Co-Occurrence Texture (Scoot) perceptual metric for FSS that provides a unified evaluation considering both structure and texture.
Secondly, we design three meta-measures based on the three reliable properties above. Extensive experiments on these meta-measures verify that our Scoot metric exceeds the performance of prior work. Our experiments indicate that “spatial structure” and “co-occurrence texture” are two generally applicable perceptual features in FSS.
Thirdly, we explore different ways of exploiting texture statistics (e.g., Gabor, Sobel, and Canny). We find that the simple texture feature [14, 15] performs far better than the metrics commonly used in the literature [39, 40, 65, 74, 79]. Based on our findings, we construct the first large-scale human-perception-based sketch database that can evaluate how well a metric agrees with human perception.

Our three contributions presented above offer a complete metric benchmark suite, which provides a novel view and practical tools (i.e., a metric, meta-measures, and a database) for analyzing data similarity from the perspective of human perception.
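As a rough illustration of what a co-occurrence texture statistic can look like, the sketch below builds a gray-level co-occurrence matrix for one pixel offset and derives two summary values from it. It is a minimal example and not the Scoot implementation: the function names (`cooccurrence_matrix`, `contrast`, `energy`), the 4-level quantization, the single horizontal offset, and the choice of contrast and energy as statistics are all assumptions made for illustration.

```python
def cooccurrence_matrix(image, levels=4, offset=(1, 0), max_value=255):
    """Normalized co-occurrence matrix of a 2-D grayscale image (list of rows).

    Entry [i][j] is the fraction of pixel pairs, separated by `offset`
    (dx, dy), whose quantized gray levels are i and j.
    All parameter defaults are illustrative assumptions.
    """
    dx, dy = offset
    height, width = len(image), len(image[0])
    # Quantize intensities from [0, max_value] into `levels` bins.
    quantized = [
        [min(levels - 1, value * levels // (max_value + 1)) for value in row]
        for row in image
    ]
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                counts[quantized[y][x]][quantized[y2][x2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[count / total for count in row] for row in counts]


def contrast(matrix):
    """Weighted sum of squared gray-level differences between paired pixels."""
    n = len(matrix)
    return sum(matrix[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(matrix):
    """Sum of squared matrix entries; equals 1.0 for a perfectly uniform image."""
    return sum(value * value for row in matrix for value in row)
```

In a block-level scheme, one would presumably compute such statistics per image block and compare the resulting feature vectors of the synthesized and reference sketches; that aggregation step is not shown here.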

Meta-Measure 1: Stability to Slight Resizing

The first meta-measure specifies that the rankings of synthetic sketches should not change much with slight changes in the GT sketch. Therefore, we downsize the GT sketch slightly, by 5 pixels, using nearest-neighbor interpolation.
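The perturbation described here can be sketched in a few lines. The example below is only an assumption about how such a downsizing might be done: the function name `downsize_nearest` is hypothetical, and it assumes the 5 pixels are removed from both the height and the width, which the text does not specify.

```python
def downsize_nearest(image, shrink=5):
    """Shrink a 2-D grayscale image (list of rows) by `shrink` pixels in
    each dimension using nearest-neighbor sampling.

    Shrinking both dimensions is an assumption made for this sketch.
    """
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = src_h - shrink, src_w - shrink
    if dst_h <= 0 or dst_w <= 0:
        raise ValueError("image is too small to shrink by that many pixels")
    resized = []
    for y in range(dst_h):
        # Map the centre of each output pixel back to a source row.
        src_y = min(src_h - 1, int((y + 0.5) * src_h / dst_h))
        row = []
        for x in range(dst_w):
            src_x = min(src_w - 1, int((x + 0.5) * src_w / dst_w))
            row.append(image[src_y][src_x])
        resized.append(row)
    return resized
```

A metric that is stable under this meta-measure would rank a set of synthesized sketches in nearly the same order whether it is given the original GT or the output of `downsize_nearest`.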

Figure 2: Visual comparison of existing widely-used FSS measures (SSIM [8], FSIM [10], and VIF [4]) on meta-measure 1. The experiment clearly shows that the proposed Scoot measure is more stable under slight resizing.

Meta-Measure 2: Rotation Sensitivity

In real-world situations, sketches drawn by artists may also be slightly rotated compared to the original photographs. Thus, the second meta-measure checks how sensitive an evaluation measure is to rotation of the GT. We apply a slight counter-clockwise rotation (5°) to each GT sketch.
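A small counter-clockwise rotation of this kind can be sketched as follows. This is an illustrative example only: the function name `rotate_nearest`, the use of nearest-neighbor sampling, rotation about the image centre, and filling uncovered corners with white (255) are all assumptions not stated in the text.

```python
import math


def rotate_nearest(image, degrees=5.0, fill=255):
    """Rotate a 2-D grayscale image (list of rows) counter-clockwise about
    its centre, keeping the original size.

    Pixels with no source location are set to `fill` (white by assumption).
    """
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Inverse mapping: find where this output pixel came from.
            # Image rows grow downward, hence the sign pattern below.
            src_x = int(round(cx + dx * cos_t - dy * sin_t))
            src_y = int(round(cy + dx * sin_t + dy * cos_t))
            inside = 0 <= src_x < width and 0 <= src_y < height
            row.append(image[src_y][src_x] if inside else fill)
        rotated.append(row)
    return rotated
```

As with resizing, a measure that handles this meta-measure well should produce similar rankings of the synthesized sketches before and after the GT is passed through `rotate_nearest`.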

Figure 3: Visual comparison of existing widely-used FSS measures (SSIM [8], FSIM [10], and VIF [4]) on meta-measure 2. The experiment clearly demonstrates that the proposed Scoot measure is less sensitive to minor rotation.

Meta-Measure 4: Human Judgment

The fourth meta-measure (Jug) specifies that the ranking result according to an evaluation measure should agree with human judgment.
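One simple way to quantify this kind of agreement is sketched below, assuming each database entry is a triplet of a GT sketch plus the sketches humans ranked first and second. The function names `judgment_agreement` and `negative_mean_abs_diff` are hypothetical, the toy metric is only a stand-in, and scoring agreement as a plain fraction of triplets is an assumption rather than the authors' stated formula.

```python
def judgment_agreement(triplets, metric):
    """Fraction of (gt, first, second) triplets in which `metric` gives the
    human-preferred sketch (`first`) a strictly higher score than `second`.

    `metric(gt, sketch)` must return a similarity score (higher is better).
    """
    if not triplets:
        raise ValueError("need at least one triplet")
    agreements = sum(
        1 for gt, first, second in triplets
        if metric(gt, first) > metric(gt, second)
    )
    return agreements / len(triplets)


def negative_mean_abs_diff(gt, sketch):
    """Toy similarity on flat pixel lists; a placeholder, not a real FSS metric."""
    return -sum(abs(a - b) for a, b in zip(gt, sketch)) / len(gt)
```

Any metric with the same call signature can be passed in place of the toy one, so several metrics can be compared on the same human-ranked triplets.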

Figure 4: Meta-measure 4. Sample images from our human-ranked database. The first row is the GT sketch, followed by the first- and second-ranked synthesis results. We refer the reader to the accompanying attachment (“Proposed Datasets”) for more details.

Download: Perceptual Similarity Dataset

Performance

Table 1: Benchmarking results of classical and alternative texture/edge-based metrics. The best result is highlighted in bold. These differences are all statistically significant at the α < 0.05 level. ↑ indicates that a higher score means the metric performs better; ↓ indicates that a lower score is better.

..waiting update…
