Richer Convolutional Features for Edge Detection

23/03/2017 Yun Liu

Yun Liu¹ Ming-Ming Cheng¹ Xiaowei Hu¹ Jia-Wang Bian¹ Le Zhang² Xiang Bai³ Jinhui Tang⁴

¹Nankai University ²ADSC ³HUST ⁴NUST

Online demo at https://mc.nankai.edu.cn/edge

A simple demo captured by my phone (if your connection is slow, you can watch it on Xigua Video).

Abstract

Edge detection is a fundamental problem in computer vision. Recently, convolutional neural networks (CNNs) have pushed this field forward significantly. Existing methods that adopt specific layers of deep CNNs may fail to capture complex data structures caused by variations of scales and aspect ratios. In this paper, we propose an accurate edge detector using richer convolutional features (RCF). RCF encapsulates all convolutional features into a more discriminative representation, which makes good use of rich feature hierarchies and is amenable to training via backpropagation. RCF fully exploits multiscale and multilevel information of objects to perform the image-to-image prediction holistically. Using the VGG16 network, we achieve state-of-the-art performance on several available datasets. When evaluating on the well-known BSDS500 benchmark, we achieve an ODS F-measure of 0.811 while retaining a fast speed (8 FPS). Besides, our fast version of RCF achieves an ODS F-measure of 0.806 at 30 FPS. We also demonstrate the versatility of the proposed method by applying RCF edges to classical image segmentation.

Papers

Richer Convolutional Features for Edge Detection, Yun Liu, Ming-Ming Cheng, Xiaowei Hu, Jia-Wang Bian, Le Zhang, Xiang Bai, Jinhui Tang, IEEE TPAMI, 2019. [pdf] [Project Page] [bib] [source code] [official version][latex]
Richer Convolutional Features for Edge Detection, Yun Liu, Ming-Ming Cheng, Xiaowei Hu, Kai Wang, Xiang Bai, IEEE CVPR, 2017. [pdf] [Project Page] [bib] [source code, pre-trained models, evaluation results, etc]

We have released the code and data for plotting the edge PR curves of many existing edge detectors here.

Motivation

We build a simple network based on VGG16 to produce side outputs of *conv3_1*, *conv3_2*, *conv3_3*, *conv4_1*, *conv4_2* and *conv4_3*. One can clearly see that convolutional features become coarser gradually, and the intermediate layers *conv3_1*, *conv3_2*, *conv4_1*, and *conv4_2* contain lots of useful fine details that do not appear in other layers.

Method

Our RCF network architecture. The input is an image of arbitrary size, and our network outputs an edge probability map of the same size. We combine hierarchical features from all the conv layers into a holistic framework, in which all of the parameters are learned automatically. Since the receptive field sizes of the conv layers in VGG16 differ from each other, our network can learn multiscale information, including low-level and object-level information, that is helpful for edge detection.

The pipeline of our multiscale algorithm. The original image is resized to construct an image pyramid. These multiscale images are fed into the RCF network for a forward pass. Then, we use bilinear interpolation to restore the resulting edge response maps to the original size. A simple average of these edge maps yields high-quality edges.
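The multiscale pipeline described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `toy_forward` is a hypothetical stand-in for the RCF forward pass (a normalised gradient magnitude, not the real network), `resize_bilinear` is a small helper written for this sketch, and the scales `(0.5, 1.0, 1.5)` are an assumed choice that is not given on this page.

```python
import numpy as np


def resize_bilinear(img, out_h, out_w):
    """Resize a (H, W) or (H, W, C) array with bilinear interpolation."""
    in_h, in_w = img.shape[:2]
    # Output pixel centres mapped back into input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0).reshape(-1, 1)
    wx = (xs - x0).reshape(1, -1)
    if img.ndim == 3:
        wy, wx = wy[..., None], wx[..., None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def toy_forward(img):
    """Hypothetical stand-in for the network: normalised gradient magnitude."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)


def multiscale_edges(image, forward, scales=(0.5, 1.0, 1.5)):
    """Image pyramid -> forward pass per scale -> resize back -> average."""
    h, w = image.shape[:2]
    maps = []
    for s in scales:
        sh, sw = max(2, int(round(h * s))), max(2, int(round(w * s)))
        edge = forward(resize_bilinear(image, sh, sw))  # map at the scaled size
        maps.append(resize_bilinear(edge, h, w))        # back to the original size
    return np.mean(maps, axis=0)


# Toy input: left half black, right half white, so the edge is a vertical line.
image = np.zeros((40, 60, 3))
image[:, 30:] = 255.0
edges = multiscale_edges(image, toy_forward)
```

With a real model, `forward` would wrap the network's forward pass and return the fused edge map for the resized image; everything else stays the same.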

Evaluation on BSDS500 dataset

Performance summary of 50+ years of edge detection history. Our method is the first real-time system to achieve a better F-measure than human annotators. (Data for this figure can be found here)

The comparison with some competitors on BSDS500 dataset. The top three results are highlighted in red, green and blue respectively.

FAQs:

1. How is your system able to outperform humans, whose annotations are used as ground truth?

We don't think our method outperforms humans in general. It only achieves a better F-measure than the average human annotator on the BSDS500 benchmark. Given more time and careful training, human annotators could do better.

Related Papers

A Simple Pooling-Based Design for Real-Time Salient Object Detection, Jiang-Jiang Liu#, Qibin Hou#, Ming-Ming Cheng*, Jiashi Feng, Jianmin Jiang, IEEE CVPR, 2019. [project|bib|pdf|poster]

207 Comments

21AT

6 years ago

Hello, has the code for implementing RCF on ResNet been released?

Yun Liu

Author

Reply to 21AT

I have been a bit busy lately; it will be released in a little while.

XuYun

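Hello, professor. My GPU has only 1 GB of memory, and it ran out of memory with the error F1227 14:17:22.056977 1292 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory. Will the results still be correct if I change batch_size? I am a complete beginner.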

Reply to XuYun

The problem occurred while testing on BSDS500.

MM Cheng

Admin

Your GPU memory is too small. I suspect no batch_size setting will be enough. For model training, I recommend buying a GPU with 11 GB of memory or more.

Reply to MM Cheng

OK, thank you, professor!

Zjm

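Hello, professor! At the very start of training I got "Iteration 0, Testing net (#0)" followed by "Iteration 0, loss = -nan", and every subsequent loss is also -nan. What is going on? Running the test code with your trained model works fine.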

Reply to Zjm

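The training blew up. Rerunning it a few times will fix it. I have never figured out why; I have tried other deep edge detection code, and it all tends to blow up at the start. But when it blows up, just rerun it.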

Reply to Yun Liu

OK, thank you!

Fan Yang

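Hello, professor, I have a few questions: 1. Does the side output of stage 1 have a deconv layer that is not drawn in the figure? 2. What are the padding and stride of each layer? Thanks!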

Reply to Fan Yang

The output of stage 1 has the same size as the original image, so there is no need to add a deconv layer. For the padding and stride of each layer, see our prototxt: https://github.com/yun-liu/rcf/blob/master/examples/rcf/test.prototxt

chunzhe

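Hello, professor, I have read your paper. In the training network trainval.prototxt, an AutoCrop layer is added after every deconvolution. Why is that, and can the AutoCrop layer be omitted? Second question: in the VGG16 network the data layer crops the input to 224*224, but your file does no cropping and feeds in the whole image. Why? Third question: the label fed to the network is a whole image rather than the coordinates of the target objects. If I crop in the data layer, does the whole-image label need to be cropped as well? I don't quite understand these points and look forward to your reply. Thank you, professor!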

Reply to chunzhe

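Hello, thanks for your interest!
1. AutoCrop crops the maps so that their sizes match.
2. 224*224 is the size commonly used in classification; FCNs generally take the image at its original size.
3. It is enough to make sure that the positions of the data and the label correspond.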
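Point 3 can be illustrated with a minimal NumPy sketch: if the input is cropped, the same window has to be applied to the edge label so the two stay pixel-aligned. The function name `random_aligned_crop` is hypothetical and is not part of the RCF code.

```python
import numpy as np


def random_aligned_crop(image, label, crop_h, crop_w, rng):
    """Cut the same random window out of an image and its edge label map."""
    h, w = label.shape[:2]
    if image.shape[:2] != (h, w):
        raise ValueError("image and label must have the same spatial size")
    if crop_h > h or crop_w > w:
        raise ValueError("crop window is larger than the image")
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    # One window, applied to both arrays, keeps data and label aligned.
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return image[window], label[window]


rng = np.random.default_rng(0)
label = rng.random((321, 481))                  # toy edge label map
image = np.stack([label, label, label], axis=2)  # toy image derived from it
img_crop, lab_crop = random_aligned_crop(image, label, 224, 224, rng)
```

Because the toy image is built from the label, `img_crop[..., 0]` equals `lab_crop` exactly, which is a quick way to confirm that the two crops still correspond.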

Rufeng Zhang

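Hello, I have read your paper. For BSDS there is a threshold x used to select the labels to ignore, but the paper says that x is not needed for NYUD because NYUD is annotated by a single person. I looked at the ground truth of BSDS and NYUD and found that the labels of both lie in [0, 255]. Is the same operation applied here, i.e. can the (0, 128) part still be ignored? I look forward to your reply! Also, is NYUD tested with the same script as BSDS? I did not find any test code on its official website.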

Reply to Rufeng Zhang

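This problem means that Caffe has no ImageLabelmapData layer. It occurs because the Caffe provided with RCF is not being used. If you are sure you compiled RCF's Caffe, check whether another Caffe is already installed on the machine with its path added to the system environment variables; make sure to remove that path and use RCF's Caffe in your program.

For testing, you can use the NYUD ground truth from https://github.com/s-gupta/rcnn-depth. Its format is the same as BSDS, so you only need to change the path to run it.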

Ming

How did you implement RCF on ResNet?

Reply to Ming

The code for this will be released in a few days.

OK, thank you very much. Will the code released in a few days be a PyTorch version?

huangjie

I1105 20:11:10.021775 26811 layer_factory.hpp:77] Creating layer data

F1105 20:11:10.021813 26811 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: ImageLabelmapData
(known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Clip, Concat, ContrastiveLoss, Convolution, Crop, Data,
Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col,
ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Parameter,
Pooling, Power, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, Softmax,
SoftmaxWithLoss, Split, Swish, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
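Aborted (core dumped)

Hello, professor. I ran into this problem while training the RCF network. I have tried many approaches without solving it and look forward to your reply.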

Reply to huangjie

RCF's Caffe has been modified and differs from the original Caffe. Please try using RCF's Caffe.
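Several commenters hit this failure, so a small check on the error text can help confirm that a stock Caffe build is being picked up. The sketch below parses the "known types" list printed by Caffe and reports which custom layers are absent. The layer names `AutoCrop` and `ImageLabelmapData` are taken from the error messages in this thread; the helper `missing_custom_layers` is hypothetical, and the list of custom layers may not be complete.

```python
import re

# Layer names taken from the error messages quoted in this thread.
RCF_CUSTOM_LAYERS = ("AutoCrop", "ImageLabelmapData")


def missing_custom_layers(log_text, required=RCF_CUSTOM_LAYERS):
    """Return the required layer types that the Caffe build does not register."""
    match = re.search(r"known types:\s*(.*?)\)", log_text, flags=re.S)
    if match is None:
        raise ValueError("no 'known types' list found in the log text")
    # The list may wrap over several lines, so strip whitespace per entry.
    known = {name.strip() for name in match.group(1).split(",") if name.strip()}
    return [name for name in required if name not in known]


sample_log = """Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: ImageLabelmapData
(known types: AbsVal, Accuracy, Convolution, Crop, Data,
Deconvolution, ImageData, SigmoidCrossEntropyLoss, WindowData)"""
missing = missing_custom_layers(sample_log)
```

If the returned list is non-empty, the process is almost certainly loading a standard Caffe rather than the modified one from https://github.com/yun-liu/rcf.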

Chun Xie

Hello, professor. With BVLC's Caffe I get the error "unknown layer type: autocrop". Where can I find the RCF Caffe that you use?

Reply to Chun Xie

https://github.com/yun-liu/rcf

Chen Y

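Hello, professor. How do I use RCF's Caffe? Do I just build it with make? After running make, RCF-multiscale fails because it cannot find the Caffe root directory, even though my caffe_root does point to the root of the Caffe I just built. If I point caffe_root to my own Caffe root directory it works, but then I get the error "unknown layer type: autocrop".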

Reply to Chen Y

You have to use RCF's Caffe, not the standard one. Remember to build pycaffe with make pycaffe.

Wenlu

Hello, professor! My program fails at run time with "Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: AutoCrop (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Parameter, Pooling, Power, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)". I noticed that the Caffe copied directly from GitHub has no "AutoCrop" layer under /include/layers, whereas the rcf-master files you released do contain this layer. When setting up Caffe, do I have to build Caffe with cmake from the code you released?

Reply to Wenlu

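Please use my Caffe, or see https://github.com/yun-liu/rcf/issues/24
However, I also modified the loss layer and the data layer, so even if you follow that link you will only be able to test, not train.

Thank you very much, professor! I wish you success in your work!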

ziyue

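Hello, professor. In the paper, the features within each stage are fused by direct summation. Have you tried concatenation instead, and how did it perform?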

Reply to ziyue

I tried it roughly; the results were about the same.

shenyong

Note: Before evaluating the predicted edges, you should do the standard non-maximum suppression (NMS) and edge thinning. We used Piotr’s Structured Forest matlab toolbox available here.
Structured Edge Detection Toolbox V3.0
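Hello, professor. Once the predicted maps are produced, the processing before evaluation uses this tool. Which file performs the non-maximum suppression and edge thinning? Could you give a detailed explanation?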

Reply to shenyong

You will have to work out how to use these toolboxes yourself.

HuYAOHUI

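Hello, I ran the code you provided, but the total loss printed during training is not the sum of the losses described in your paper. What is going on?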

Reply to HuYAOHUI

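The total loss is accumulated over multiple iterations, whereas the losses of the individual side outputs and of the fuse output are not accumulated; they are the values from the last iteration before printing. That is why the numbers do not look equal, but the underlying relation is indeed a sum.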
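A toy simulation makes the explanation above concrete. It is only a sketch of the reporting behaviour as described in this reply, not a model of Caffe's actual solver code: the displayed total is assumed to be accumulated over several iterations, while each per-output loss is the value from the last of those iterations. The function `simulate_display` and the numbers are hypothetical.

```python
import numpy as np


def simulate_display(per_iter_losses):
    """per_iter_losses: (iterations, outputs) array of individual loss values.

    Returns the accumulated total and the per-output losses that would be
    printed, assuming only the last iteration's per-output values are shown.
    """
    per_iter_losses = np.asarray(per_iter_losses, dtype=float)
    total_displayed = per_iter_losses.sum()   # accumulated over all iterations
    outputs_displayed = per_iter_losses[-1]   # last iteration only
    return total_displayed, outputs_displayed


# Two iterations, three outputs (for example two side outputs and the fuse output).
losses = [[1.0, 2.0, 3.0],
          [0.5, 1.5, 2.5]]
total, outputs = simulate_display(losses)
```

Here `total` is 10.5 while the printed per-output values sum to 4.5, so the log looks inconsistent even though every iteration's total is exactly the sum of its outputs.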

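Thank you very much!

Hello, I ran the code you provided, and the total loss during training is not the sum of the 6 losses described in your paper. Why is that?

Hello, professor. According to the paper, the final loss is the sum of several losses, but the paper also mentions the ratio of positive to negative pixels used when computing the loss function. Where can I change it? I would like to run the code on my own dataset. Thank you.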

sigmoid_cross_entropy_loss_layer.cpp

Hello, professor. I looked at the cpp file, and it seems the ratio is counted automatically. Is that right?
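The idea of counting positive and negative pixels automatically can be sketched as a class-balanced cross-entropy in NumPy. This is an illustration only, assuming the weighting follows the class-balanced formulation commonly used for edge detection: pixels with label 0 are negatives, pixels above a threshold `eta` are positives, and pixels in between are ignored. The defaults `eta=0.5` and `lam=1.1` and the function name `balanced_edge_loss` are assumptions; the authoritative version is sigmoid_cross_entropy_loss_layer.cpp in the RCF repository.

```python
import numpy as np


def balanced_edge_loss(prob, label, eta=0.5, lam=1.1, eps=1e-7):
    """Class-balanced cross-entropy on an edge probability map.

    prob:  predicted edge probabilities in [0, 1]
    label: ground-truth edge strength scaled to [0, 1]
    """
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)
    label = np.asarray(label, dtype=float)
    pos = label > eta    # confident edge pixels
    neg = label == 0     # confident non-edge pixels; everything else is ignored
    n_pos, n_neg = int(pos.sum()), int(neg.sum())
    total = n_pos + n_neg
    if total == 0:
        return 0.0
    # Weights are derived from the pixel counts of this image, not set by hand.
    alpha = lam * n_pos / total   # weight on the (many) negative pixels
    beta = n_neg / total          # weight on the (few) positive pixels
    loss = -(alpha * np.log(1 - prob[neg]).sum() + beta * np.log(prob[pos]).sum())
    return float(loss)


label = np.array([[0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.3, 0.0, 1.0]])
good = np.where(label > 0.5, 0.99, 0.01)   # prediction that agrees with the label
bad = 1.0 - good                           # prediction that disagrees
loss_good = balanced_edge_loss(good, label)
loss_bad = balanced_edge_loss(bad, label)
```

Labels stored in [0, 255] would need to be divided by 255 first. Because the weights come from the counts in each image, running on a new dataset needs no manual ratio; only `eta` and `lam` remain as tunable values in this sketch.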

Kai Li

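Thank you for your guidance, professor; my previous problem is solved. I have some more questions. The experiment sections of both the RCF and HED papers mention weight decay (0.0002), but, perhaps because I was careless, I could not find where weight decay is used in the Caffe code. Does leaving out weight decay hurt generalization? I have recently been reproducing your experiments in TensorFlow (version 1.8), and my best ODS so far is only about 0.72. I am not sure whether that is because I did not use weight decay. I look forward to your reply.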

Reply to Kai Li

Weight decay can be set in Caffe's solver.prototxt.

Thank you very much for your guidance, professor. I wish you a happy life and success in your work.

humm

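Hello, professor! Are the ground-truth images in your paper binary images? Some papers show the ground truth as white edges on a black background and others as black edges on a white background. Which should be used in a journal paper?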

Reply to humm

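It doesn't matter; it is only a difference in presentation. I usually prefer black edges on a white background, simply because it saves printer ink 🙂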

sss

Hello, when I run RCF-singlescale.ipynb I get this message. What does it mean? "The kernel appears to have died. It will restart automatically." Also, why can't I see my test results? This is frustrating.

Reply to sss

Check what error the console reports.

The command line shows this error: Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: AutoCrop (known types: AbsVal, Accuracy, AnnotatedData, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, DetectionEvaluate, DetectionOutput, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultiBoxLoss, MultinomialLogisticLoss, Normalize, PReLU, Parameter, Permute, Pooling, Power, PriorBox, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, VideoData, WindowData)
*** Check failure stack trace: ***
What is the cause of this?

This is caused by a problem with the Caffe configuration. Check whether you ran make pycaffe, or look up how similar Caffe problems were solved.

OK, I will try that, thanks.

feifei

5 years ago

Hello, I ran into the same problem. Have you solved it? If so, could you help me and give me some pointers? I would be very grateful!

Reply to feifei

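Why not try recompiling Caffe? I recall that I recompiled Caffe back then.

I read other people's exchanges with the professor, which say that you have to use the professor's Caffe. But the professor's Caffe has presumably already been compiled, so how do I compile to get the professor's Caffe? Could I have your email address to discuss this? I need your help! Please! I am a beginner, sorry for the trouble!

Just compile this Caffe following the normal Caffe installation procedure. This Caffe does not depend on anything special; I only modified the code slightly.

919066765 is my QQ number. As for compiling, as said above, you need to install Caffe the original, standard way.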

王思思

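Hello, do you have detailed instructions for configuring this code? I would like to run it, but it raised errors when I tried. I think I probably have not configured things correctly. I am a beginner and would appreciate your patient guidance.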

Reply to 王思思

Is Caffe installed?

Yes, it is installed. I have solved this problem, thanks.

王钰涵

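Hello, I am also a beginner. I have only used Caffe for simple image recognition and have little background, and now I am about to work on edge detection. Could we add each other on QQ and discuss the workflow privately? My QQ is 867243772, email 867243772@qq.com

Hello, have you solved it? Could I add your QQ or email to discuss?

I am a beginner, and I think we ran into the same problem. Could we add each other on QQ and discuss the workflow privately? My QQ: 2681674725, email xxiaofee@163.com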

Jiantong Chen

7 years ago

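Hello, author. RCF applies a sigmoid layer with cross-entropy loss at every stage. How should this be interpreted? For example, does it address a too-small loss at the final classification layer and vanishing gradients, or does it make the features learned at each stage more robust?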

Reply to Jiantong Chen

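Deep supervision is widely used now; see the paper Deeply-Supervised Nets.

Hello, I have recently been studying edge detection and saw that your 2017 paper currently gives the best results. As a beginner in the CV field, I would like to ask about a detail: in pdollar's MATLAB toolbox, the ground truth has size 321*481. When evaluating performance on the test set, should this size be used? If I set the test images to this size, I keep getting errors, whereas a size of 320*480 works fine. As an authority on edge detection, how do you handle this issue? Please forgive me if my wording is inappropriate.

Sorry, I don't quite understand your question. But for evaluation, the image size must match the size of the original test-set images.

Thank you for your guidance, professor; my previous problem is solved. I have one more question. The experiment sections of both the RCF and HED papers mention weight decay (0.0002), but, perhaps because I was careless, I could not find where weight decay is used in the Caffe code. Does leaving out weight decay hurt generalization? I look forward to your reply.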
