Scoot: A Perceptual Metric for Facial Sketches
Deng-Ping Fan1,2, ShengChuan Zhang3, Yu-Huan Wu1, Yun Liu1, Ming-Ming Cheng1, Bo Ren1, Paul L. Rosin4, Rongrong Ji3
1TKLNDST, CS, Nankai University 2Inception Institute of Artificial Intelligence (IIAI) 3Xiamen University 4Cardiff University
Abstract
The human visual system can quickly assess the perceptual similarity between two facial sketches. However, widely used facial sketch metrics such as FSIM and SSIM fail to capture this perceptual similarity. A recent study in the facial modeling area has verified that including both structure and texture significantly benefits face sketch synthesis (FSS). But which statistics are more important, and which contribute to this success? In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers block-level spatial structure and co-occurrence texture statistics. To test the quality of metrics, we propose three novel meta-measures based on various reliable properties. Extensive experiments verify that our Scoot metric exceeds the performance of prior work. In addition, we built the first large-scale (152k judgments) human-perception-based sketch database, which can evaluate how consistent a metric is with human perception. Our results suggest that “spatial structure” and “co-occurrence texture” are two generally applicable perceptual features in face sketch synthesis.
Figure 1: Which synthesized sketch is more similar to the middle sketch? In the middle case, sketch 0 is more similar to the reference than sketch 1 in terms of structure and texture; sketch 1 almost completely destroys the structure of the hair. The widely used (SSIM [65], FSIM [79]), classic (IFC [40], VIF [39]) and recently released (GMSD [74]) metrics disagree with humans. Only our Scoot metric agrees well with humans.
Publication
Deng-Ping Fan, ShengChuan Zhang, Yu-Huan Wu, Yun Liu, Ming-Ming Cheng, Bo Ren, Paul L Rosin, Rongrong Ji
Scoot: A Perceptual Metric for Facial Sketches, ICCV, 2019
[project page][bib][pdf][supp][official version][code][Dataset (77M)]
Most related projects on this website
- Enhanced-alignment Measure for Binary Foreground Map Evaluation, Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, Ali Borji. IJCAI, 2018. Oral presentation, Acceptance rate: 20.0% [710/3470] [project page | bib | pdf | latex | official version | Chinese pdf | IJCAI slides | IJCAI poster | Matlab code(5.6k) | Dataset(3M)]
- Structure-measure: A New Way to Evaluate Foreground Maps, Deng-Ping Fan, Yun Liu, Tao Li, Ming-Ming Cheng, Ali Borji. IEEE ICCV, 2017. Spotlight presentation, Acceptance rate: 2.61% [56/2143] [project page | bib | official version | Chinese version pdf] [ICCV slides | ICCV poster | video | Youtube | Matlab code | C++ code]
Motivation
A good perceptual metric should take human perception into account when comparing facial sketches. It should:
- agree closely with human visual perception, so that sketches it rates as good can be used directly in various subjective applications.
- be insensitive to slight mismatches (i.e., resizing, rotation), since real-world sketches drawn by artists do not match the original photos pixel for pixel.
- be capable of capturing holistic content, that is, prefer a complete sketch over one that contains only strokes (and has lost some facial components).
What did we do?
- Firstly, we propose a Structure Co-Occurrence Texture (Scoot) perceptual metric for FSS that provides a unified evaluation considering both structure and texture.
- Secondly, we design three meta-measures based on the above three reliable properties. Extensive experiments on these meta-measures verify that our Scoot metric exceeds the performance of prior work. Our experiments indicate that “spatial structure” and “co-occurrence texture” are two generally applicable perceptual features in FSS.
- Thirdly, we explore different ways of exploiting texture statistics (e.g., Gabor, Sobel, and Canny). We find that the simple texture feature [14, 15] performs far better than the metrics commonly used in the literature [39, 40, 65, 74, 79]. Based on our findings, we construct the first large-scale human-perception-based sketch database that can evaluate how well a metric agrees with human perception.
Our three contributions presented above offer a complete metric benchmark suite, which provides a novel view and practical tools (e.g., metric, meta-measures, and database) to analyze data similarity from the perspective of human perception.
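To make the idea of block-level co-occurrence texture more concrete, here is a minimal Python sketch. It is not the released Scoot code. The choices below are assumptions made for illustration only: the function names, the 4 gray levels, the 2×2 grid, the single horizontal pixel offset, the contrast and energy statistics, and the 1/(1 + distance) mapping from feature distance to score.

```python
def quantize(img, levels, max_value=255):
    """Map intensities in [0, max_value] to integer levels 0..levels-1."""
    return [[min(p * levels // (max_value + 1), levels - 1) for p in row]
            for row in img]


def cooccurrence(block, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(block), len(block[0])
    total = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[block[y][x]][block[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts] if total else counts


def contrast_energy(p):
    """Two simple texture statistics of a co-occurrence matrix (assumed choice)."""
    contrast = sum((i - j) ** 2 * v
                   for i, row in enumerate(p) for j, v in enumerate(row))
    energy = sum(v * v for row in p for v in row)
    return contrast, energy


def block_features(img, grid, levels):
    """Concatenate per-block statistics so spatial layout is preserved."""
    q = quantize(img, levels)
    bh, bw = len(q) // grid, len(q[0]) // grid
    feats = []
    for by in range(grid):
        for bx in range(grid):
            block = [row[bx * bw:(bx + 1) * bw]
                     for row in q[by * bh:(by + 1) * bh]]
            feats.extend(contrast_energy(cooccurrence(block, levels)))
    return feats


def texture_similarity(a, b, grid=2, levels=4):
    """Score in (0, 1]; 1.0 means identical block-level texture features."""
    fa, fb = block_features(a, grid, levels), block_features(b, grid, levels)
    dist = sum((x - y) ** 2 for x, y in zip(fa, fb)) ** 0.5
    return 1.0 / (1.0 + dist)
```

Because the statistics are computed per block and then concatenated, two sketches with the same overall texture but a different spatial arrangement still receive different feature vectors, which is the intuition behind combining structure with texture.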
Meta-Measure 1: Stability to Slight Resizing
The first meta-measure specifies that the rankings of synthetic sketches should not change much with slight changes in the ground-truth (GT) sketch. Therefore, we downsize the GT by a minor 5 pixels using nearest-neighbor interpolation.
Figure 2: Visual comparison of existing widely-used FSS measures (SSIM [8], FSIM [10], and VIF [4]) on meta-measure 1. The experiment clearly shows that the proposed Scoot measure is more stable under slight resizing.
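The resizing test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the evaluation code behind the figures: `metric` is any hypothetical function `metric(sketch, gt)` returning a score where higher is better, both image dimensions are shrunk by 5 pixels, the shrunken GT is resized back to its original size so that pixel-wise metrics still apply, and Spearman rank correlation (without tie handling) is used as the stability score.

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor resize of a 2-D list of pixels."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def ranks(scores):
    """Rank positions, 0 = highest score; ties are broken by index."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman(a, b):
    """Spearman rank correlation of two score lists (no tie correction)."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def resize_stability(metric, gt, sketches, shrink=5):
    """Correlation between rankings before and after slightly resizing the GT."""
    h, w = len(gt), len(gt[0])
    small = resize_nearest(gt, h - shrink, w - shrink)
    # Assumption: resize back so the perturbed GT matches the sketch size.
    perturbed = resize_nearest(small, h, w)
    before = [metric(s, gt) for s in sketches]
    after = [metric(s, perturbed) for s in sketches]
    return spearman(before, after)
```

A value close to 1 means the metric ranks the synthesized sketches the same way before and after the perturbation.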
Meta-Measure 2: Rotation Sensitivity
In real-world situations, sketches drawn by artists may also be slightly rotated compared to the original photographs. Thus, the second meta-measure tests how sensitive an evaluation measure is to rotation of the GT. We apply a slight counter-clockwise rotation (5°) to each GT sketch.
Figure 3: Visual comparison of existing widely-used FSS measures (SSIM [8], FSIM [10], and VIF [4]) on meta-measure 2. The experiment clearly demonstrates that the proposed SCOOT measure is less sensitive to minor rotation.
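The GT perturbation for this meta-measure can be sketched as follows. The function name, the nearest-neighbor sampling, the rotation about the image center, and the white fill value for pixels that fall outside the source are all assumptions for illustration; the page does not specify them.

```python
import math


def rotate_nearest(img, degrees, fill=255):
    """Rotate a 2-D image counter-clockwise about its center (nearest neighbor)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse mapping: find the source pixel that lands on (x, y).
            # Image rows grow downward, hence the sign pattern below.
            sx = round(cx + dx * cos_t - dy * sin_t)
            sy = round(cy + dx * sin_t + dy * cos_t)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out
```

Rankings computed against `rotate_nearest(gt, 5)` can then be compared with rankings computed against the original GT, using any rank-agreement score.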
Meta-Measure 4: Human Judgment
The fourth meta-measure (Jug) specifies that the ranking result according to an evaluation measure should agree with human judgment.
Figure 4: Meta-measure 4. Sample images from our human ranked database. The first row is the GT sketch, followed by the first and second-ranked synthesis results. We refer the reader to the accompanying attachment (“Proposed Datasets”) for more details.
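One simple way to score agreement with human rankings is sketched below. It assumes each database entry can be read as a triple (GT, sketch ranked first by humans, sketch ranked second), matching the layout described in Figure 4, and that `metric(sketch, gt)` is a hypothetical scoring function where higher is better. The score used here, the fraction of triples on which the metric prefers the human-preferred sketch, is an illustrative choice and not necessarily the one behind the reported results.

```python
def human_agreement(metric, trials):
    """Fraction of (gt, first, second) trials where the metric agrees with humans."""
    if not trials:
        raise ValueError("at least one trial is required")
    agree = sum(1 for gt, first, second in trials
                if metric(first, gt) > metric(second, gt))
    return agree / len(trials)
```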
- Download: Perceptual Similarity Dataset
Performance
Table 1: Benchmarking results of classical and alternative texture/edge based metrics. The best result is highlighted in bold. These differences are all statistically significant at the α < 0.05 level. ↑ indicates that the higher the score, the better the metric performs; ↓ indicates the opposite.
..waiting update…