¹Nankai University, ²HUST
In this paper, we propose an accurate edge detector using richer convolutional features (RCF). Since objects in natural images appear at various scales and aspect ratios, learning rich hierarchical representations is critical for edge detection. CNNs have proven effective for this task. In addition, the convolutional features in CNNs gradually become coarser as the receptive fields grow. Based on these observations, we adopt richer convolutional features for this challenging vision task. The proposed network fully exploits multiscale and multilevel information of objects to perform image-to-image prediction by combining all the meaningful convolutional features in a holistic manner. Using the VGG16 network, we achieve state-of-the-art performance on several available datasets. When evaluated on the well-known BSDS500 benchmark, we achieve an ODS F-measure of 0.811 while retaining a fast speed (8 FPS). In addition, our fast version of RCF achieves an ODS F-measure of 0.806 at 30 FPS.
- Richer Convolutional Features for Edge Detection, Yun Liu, Ming-Ming Cheng, Xiaowei Hu, Kai Wang, Xiang Bai, IEEE CVPR, 2017.
We build a simple network based on VGG16 to produce side outputs of conv3_1, conv3_2, conv3_3, conv4_1, conv4_2 and conv4_3. One can clearly see that the convolutional features gradually become coarser, and that the intermediate layers conv3_1, conv3_2, conv4_1, and conv4_2 contain many useful fine details that do not appear in other layers.
Our RCF network architecture. The input is an image of arbitrary size, and our network outputs an edge probability map of the same size. We combine hierarchical features from all the conv layers into a holistic framework, in which all of the parameters are learned automatically. Since the receptive field sizes of the conv layers in VGG16 differ from each other, our network can learn multiscale information, including low-level and object-level information, that is helpful to edge detection.
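As a rough illustration of combining hierarchical features into a single prediction, the sketch below fuses several per-stage edge maps with a learned weight per stage (equivalent to a 1x1 convolution over stacked maps) followed by a sigmoid. The function name `fuse_side_outputs`, the use of five stages, and the assumption that every side output has already been upsampled to the input resolution are illustrative choices, not details taken from the text above.

```python
import numpy as np

def sigmoid(x):
    """Map logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def fuse_side_outputs(side_logits, weights, bias=0.0):
    """Fuse K edge-logit maps of identical shape (H, W) into one probability map.

    side_logits: list of K arrays, each (H, W); assumed already upsampled.
    weights:     array of shape (K,), one fusion weight per stage
                 (hypothetical values; in a real network these are learned).
    """
    stacked = np.stack(side_logits, axis=0)                # (K, H, W)
    fused = np.tensordot(weights, stacked, axes=1) + bias  # weighted sum -> (H, W)
    return sigmoid(fused)

# Usage: five dummy stage outputs on a 4 x 5 image, equal fusion weights.
side = [np.zeros((4, 5)) for _ in range(5)]
edge_prob = fuse_side_outputs(side, np.full(5, 0.2))
```

Because the fusion weights are ordinary parameters, they can be trained jointly with the rest of the network rather than set by hand.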
The pipeline of our multiscale algorithm. The original image is resized to construct an image pyramid, and each of these multiscale images is fed to the RCF network for a forward pass. We then use bilinear interpolation to restore the resulting edge response maps to the original size. A simple average of these edge maps yields high-quality edges.
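The multiscale pipeline described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: `edge_net` is a hypothetical stand-in for a forward pass of the trained network, the scales `(0.5, 1.0, 1.5)` are example values not given in the text, and the image is treated as a single-channel 2-D array for brevity.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D array to (out_h, out_w) with bilinear interpolation."""
    in_h, in_w = img.shape
    # Map output pixel centres back to input coordinates, clamped to the image.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :]   # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def multiscale_edges(image, edge_net, scales=(0.5, 1.0, 1.5)):
    """Average edge maps predicted on an image pyramid (scales are assumed)."""
    h, w = image.shape
    maps = []
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scaled = resize_bilinear(image, sh, sw)      # build one pyramid level
        response = edge_net(scaled)                  # forward pass (stand-in)
        maps.append(resize_bilinear(response, h, w)) # restore original size
    return np.mean(maps, axis=0)                     # simple average

# Usage with an identity function standing in for the network.
image = np.full((8, 10), 0.3)
edges = multiscale_edges(image, lambda x: x)
```

Averaging is the simplest way to combine the scales; it needs no extra parameters, at the cost of one forward pass per pyramid level.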
Evaluation on BSDS500 dataset