
Richer Convolutional Features for Edge Detection

Yun Liu¹, Ming-Ming Cheng¹, Xiaowei Hu¹, Jia-Wang Bian¹, Le Zhang², Xiang Bai³, Jinhui Tang⁴

¹Nankai University   ²ADSC   ³HUST   ⁴NUST


Online demo at https://mc.nankai.edu.cn/edge

A simple demo captured with my phone (if your connection is slow, you can watch it on Xigua Video).

Abstract

Edge detection is a fundamental problem in computer vision. Recently, convolutional neural networks (CNNs) have pushed this field forward significantly. Existing methods that adopt specific layers of deep CNNs may fail to capture complex data structures caused by variations in scale and aspect ratio. In this paper, we propose an accurate edge detector using richer convolutional features (RCF). RCF encapsulates all convolutional features into a more discriminative representation, which makes good use of rich feature hierarchies and is amenable to training via backpropagation. RCF fully exploits multiscale and multilevel information of objects to perform image-to-image prediction holistically. Using the VGG16 network, we achieve state-of-the-art performance on several available datasets. When evaluated on the well-known BSDS500 benchmark, we achieve an ODS F-measure of 0.811 while retaining a fast speed (8 FPS). In addition, our fast version of RCF achieves an ODS F-measure of 0.806 at 30 FPS. We also demonstrate the versatility of the proposed method by applying RCF edges to classical image segmentation.

Papers

We have released the code and data for plotting the edge PR curves of many existing edge detectors here.

Motivation

We build a simple network based on VGG16 to produce side outputs of conv3_1, conv3_2, conv3_3, conv4_1, conv4_2, and conv4_3. One can clearly see that the convolutional features gradually become coarser, and that the intermediate layers conv3_1, conv3_2, conv4_1, and conv4_2 contain many useful fine details that do not appear in other layers.

Method

Our RCF network architecture. The input is an image of arbitrary size, and our network outputs an edge probability map of the same size. We combine hierarchical features from all the conv layers into a holistic framework in which all parameters are learned automatically. Since the receptive field sizes of the conv layers in VGG16 differ from each other, our network can learn multiscale information, including low-level and object-level information, that is helpful for edge detection.
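To make the combination step concrete, here is a minimal NumPy sketch of an RCF-like head. Several assumptions apply: random weights stand in for learned ones, nearest-neighbour upsampling replaces the learned deconvolution layers, stage strides follow unmodified VGG16 (1, 2, 4, 8, 16), every conv layer gets a 1x1 convolution of width 21 followed by an element-wise sum per stage, and the fusion layer uses hypothetical equal weights. The names `conv1x1`, `upsample` and `rcf_like_head` are illustrative and do not come from the released code.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution = per-pixel linear map over channels.
    x: (C_in, H, W), w: (C_out, C_in), b: (C_out,) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def upsample(x, factor):
    """Nearest-neighbour upsampling; a stand-in for a learned deconvolution."""
    return x.repeat(factor, axis=-2).repeat(factor, axis=-1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rcf_like_head(stages, strides, rng, width=21):
    """stages: one list per VGG16 stage holding that stage's conv feature maps,
    each of shape (C, H // stride, W // stride).
    Returns (list of side-output probability maps, fused probability map)."""
    sides = []
    for feats, stride in zip(stages, strides):
        # 1x1 conv (width channels) on every conv layer of the stage, then element-wise sum
        summed = sum(
            conv1x1(f, 0.01 * rng.standard_normal((width, f.shape[0])), np.zeros(width))
            for f in feats
        )
        # reduce to one channel and bring it back to input resolution
        score = conv1x1(summed, 0.01 * rng.standard_normal((1, width)), np.zeros(1))
        sides.append(upsample(score, stride)[0])
    stacked = np.stack(sides)                              # (num_stages, H, W)
    fuse_w = np.full((1, len(sides)), 1.0 / len(sides))    # hypothetical equal weights
    fused = conv1x1(stacked, fuse_w, np.zeros(1))[0]
    return [sigmoid(s) for s in sides], sigmoid(fused)
```

With VGG16's 2, 2, 3, 3, 3 conv layers per stage (64, 128, 256, 512, 512 channels) and a 32x32 input, every side output and the fused map come back at 32x32.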

The pipeline of our multiscale algorithm. The original image is resized to construct an image pyramid, and these multiscale images are fed to the RCF network for a forward pass. Then, we use bilinear interpolation to restore the resulting edge response maps to the original size. A simple average of these edge maps yields high-quality edges.
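The multiscale pipeline above can be sketched in NumPy as follows. This is an illustration, not the released test code: `predict` is a hypothetical callable standing in for an RCF forward pass, the input is assumed to be a single-channel 2-D array (apply the resize per channel for colour images), and the scales (0.5, 1.0, 1.5) are an assumed default.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear resize of a 2-D array (align-corners style sampling)."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy

def multiscale_edges(image, predict, scales=(0.5, 1.0, 1.5)):
    """Resize -> predict -> resize back -> average over scales."""
    h, w = image.shape
    maps = []
    for s in scales:
        scaled = bilinear_resize(image, max(1, round(h * s)), max(1, round(w * s)))
        edge = predict(scaled)                    # hypothetical network call
        maps.append(bilinear_resize(edge, h, w))  # restore original size
    return np.mean(maps, axis=0)
```

Plain averaging treats all scales equally; a weighted average would be an easy variation if one scale proves more reliable.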

Evaluation on BSDS500 dataset

Performance summary of 50+ years of edge detection history. Our method is the first real-time system to achieve a better F-measure than human annotators. (Data for this figure can be found here.)

Comparison with some competitors on the BSDS500 dataset. The top three results are highlighted in red, green, and blue, respectively.

FAQs:

1. How is your system able to outperform humans, whose annotations are used as the ground truth?

We don't think our method outperforms humans in general. It only achieves a better F-measure than the average human annotator on the BSDS500 benchmark. Given more time and careful training, human annotators could do better.

Related Papers

  • A Simple Pooling-Based Design for Real-Time Salient Object Detection, Jiang-Jiang Liu#, Qibin Hou#, Ming-Ming Cheng*, Jiashi Feng, Jianmin Jiang, IEEE CVPR, 2019. [project|bib|pdf|poster]
207 Comments
lucas

Hello, why is the online demo not working right now?

MM Cheng

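For reasons everyone is aware of, while major national meetings are being held, university servers cannot be accessed from external networks unless they have gone through a special registration and approval process.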

sgg
Professor, the following lines raise an error:
for i in range(len(results)):
    all_res[i, 0, :, :] = results[i]
RuntimeError: expand(torch.cuda.FloatTensor{[4, 1, 256, 256]}, size=[256, 256]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (4)
What could be the cause?
MM Cheng

Probably the data was not loaded.
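The traceback shows `results[i]` with shape [4, 1, 256, 256] being written into a single 256x256 slot, so each side output held four images instead of one. A hedged NumPy sketch of an explicit shape check (NumPy stands in for torch here; `fill_side_outputs` is a hypothetical helper, and batch size 1 at test time is assumed) that reports the offending shape instead of a cryptic `expand` error, whether the cause is a batch size above 1 or missing data:

```python
import numpy as np

def fill_side_outputs(results, height, width):
    """Copy side outputs into an (N, 1, H, W) buffer, assuming batch size 1.
    Singleton batch/channel dims are squeezed away before assignment."""
    all_res = np.zeros((len(results), 1, height, width), dtype=np.float32)
    for i, r in enumerate(results):
        r = np.asarray(r)
        m = np.squeeze(r)
        if m.shape != (height, width):
            raise ValueError(
                f"side output {i} has shape {r.shape}; expected one {height}x{width} map "
                "(is the test batch size > 1, or did the input fail to load?)"
            )
        all_res[i, 0] = m
    return all_res
```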

Chance

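Hello. For edge detection, could one dilate the annotations by some amount, train an image segmentation algorithm on them, and then use post-processing to thin the result back into one-pixel-wide edges?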

MM Cheng

That is the ideal approach, but in practice it easily runs into all sorts of problems. Try it and you will see.
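The first step of the proposed approach, thickening one-pixel edge annotations into bands a segmentation model can learn, might look like the NumPy sketch below. The square structuring element and the `radius` parameter are assumptions, `dilate_binary` is a hypothetical helper, and the thinning step back to one-pixel edges (for example non-maximum suppression or morphological thinning) is not shown.

```python
import numpy as np

def dilate_binary(mask, radius):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask within the window
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out
```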

Zehan Li

Hello, Professor. I noticed that the loss function in Eq. (1) of your paper is defined as a cross-entropy, but isn't a minus sign missing?
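For reference, here is a class-balanced binary cross-entropy with the minus sign written explicitly, so the loss is non-negative and is minimized during training. It is a NumPy sketch under stated assumptions, not the paper's code: labels use 0 = non-edge, 1 = edge, 2 = ignored; non-edge pixels are weighted by alpha = lambda * |Y+| / (|Y+| + |Y-|) and edge pixels by beta = |Y-| / (|Y+| + |Y-|); lambda = 1.1 is an assumed value; losses are summed over pixels. `balanced_bce` is a hypothetical name.

```python
import numpy as np

def balanced_bce(prob, label, lam=1.1, eps=1e-7):
    """Class-balanced BCE with an explicit minus sign.
    prob: predicted edge probabilities; label: 0 non-edge, 1 edge, 2 ignored.
    Assumes at least one non-ignored pixel."""
    prob = np.clip(prob, eps, 1 - eps)
    pos = label == 1
    neg = label == 0
    n_pos, n_neg = pos.sum(), neg.sum()
    total = n_pos + n_neg
    alpha = lam * n_pos / total   # weight on non-edge pixels
    beta = n_neg / total          # weight on edge pixels
    # log terms are <= 0, so the leading minus makes the loss >= 0
    return -(alpha * np.log(1 - prob[neg]).sum() + beta * np.log(prob[pos]).sum())
```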

foraward

Hi, great work. I am very interested in your work and have been following this project for a long time. I would like to discuss an implementation detail. Since I am not familiar with Caffe, I use the PyTorch implementations (links: https://github.com/meteorshowers/RCF-pytorch and https://github.com/balajiselvaraj1601/RCF_Pytorch_Updated), and I successfully reproduced your reported ODS F-score of 0.806 by training on the augmented BSDS and PASCAL VOC datasets. After reading the code thoroughly, I found the following code in 'data_load.py':

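lb = lb[np.newaxis, :, :]                        # add a channel axis: (1, H, W)
lb[lb == 0] = 0                                  # no-op: zero pixels stay 0 (negative)
lb[np.logical_and(lb > 0, lb < 127.5)] = 2       # values in (0, 127.5) -> 2
lb[lb >= 127.5] = 1                              # values >= 127.5 -> 1 (positive)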

I guess this code implements the relaxed deep supervision proposed in [1]. From the code above, pixels with values between 0 and 127.5 are simply set to 2, which means they are treated as the so-called relaxed labels. However, [1] says that relaxed labels are pixels labeled positive by the Canny operator or SE [2] but not included in the positive labels of the original manually annotated ground truth. I have read the code thoroughly and found no use of Canny or SE. So I would like to ask: is this part of the code meant to implement relaxed deep supervision? Is this naive way of setting relaxed labels reasonable? Would performance improve if Canny or SE were used to generate relaxed labels, and how would one do that? Thanks. If anything is unclear, please let me know.

References:
[1] Y. Liu and M. S. Lew. Learning Relaxed Deep Supervision for Better Edge Detection. In CVPR, 2016.
[2] P. Dollár and C. L. Zitnick. Fast Edge Detection Using Structured Forests. TPAMI, 37(8):1558-1570, 2015.

放羊娃

How did you compute the ODS F-score of 0.808? I could not find it in your code.
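A sketch of how ODS and OIS are typically aggregated from per-image, per-threshold match counts, as we understand the BSDS500 benchmark: ODS picks one threshold for the whole dataset, while OIS picks the best threshold per image and then pools the counts. The distance-tolerant matching that produces the counts is not shown, and the official scripts may additionally interpolate between neighbouring thresholds, which this sketch omits. `ods_ois` is a hypothetical helper.

```python
import numpy as np

def f_measure(p, r):
    return np.where(p + r > 0, 2 * p * r / np.maximum(p + r, 1e-12), 0.0)

def ods_ois(cnt_r, sum_r, cnt_p, sum_p):
    """All inputs have shape (num_images, num_thresholds):
    cnt_r: matched ground-truth boundary pixels, sum_r: all ground-truth boundary pixels,
    cnt_p: matched predicted pixels,            sum_p: all predicted pixels."""
    # ODS: pool counts over images, then pick the single best threshold
    R = cnt_r.sum(0) / np.maximum(sum_r.sum(0), 1)
    P = cnt_p.sum(0) / np.maximum(sum_p.sum(0), 1)
    ods = f_measure(P, R).max()
    # OIS: best threshold per image, then pool the counts at those thresholds
    best = f_measure(cnt_p / np.maximum(sum_p, 1), cnt_r / np.maximum(sum_r, 1)).argmax(axis=1)
    idx = np.arange(len(best))
    R_ois = cnt_r[idx, best].sum() / max(sum_r[idx, best].sum(), 1)
    P_ois = cnt_p[idx, best].sum() / max(sum_p[idx, best].sum(), 1)
    return float(ods), float(f_measure(P_ois, R_ois))
```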

Xiao-Hang Yu

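Hello, Professor. My algorithm produces binary results. To plot a detailed and accurate ROC curve, what value of nthresh should I use in the benchmark for each parameter setting? 99 is too many: on average it takes 5-7 minutes to produce one text file per parameter setting. But if I choose a very small nthresh, I get very few points, and I worry the curve will not be drawn accurately.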

Xiao-Hang Yu

Professor, I'd like to ask a question: what does setting nthresh to 99 mean for the BSDS500 dataset? Why is 5 enough for gPb-owt-ucm? And why are its results so close to Human?
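As a rough illustration of what nthresh controls: the benchmark binarizes each soft edge map at nthresh evenly spaced thresholds, and every threshold yields one precision/recall point. The sketch assumes the grid is `linspace(1/(n+1), 1 - 1/(n+1), n)` with a `>=` comparison, which gives 0.01, 0.02, ..., 0.99 for n = 99; please verify this against the benchmark's per-image boundary evaluation script. It also shows why a binary input gains nothing from many thresholds: every threshold produces the same map. Both helper names are hypothetical.

```python
import numpy as np

def bench_thresholds(nthresh):
    """Evenly spaced thresholds strictly inside (0, 1) (assumed benchmark grid)."""
    return np.linspace(1.0 / (nthresh + 1), 1.0 - 1.0 / (nthresh + 1), nthresh)

def binarize_all(prob_map, nthresh):
    """One binary edge map per threshold: a pixel is an edge if prob >= t."""
    return [prob_map >= t for t in bench_thresholds(nthresh)]
```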

王成超

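Hello, Professor. I used my own dataset, and after feeding it through the RCF network and the sigmoid, it does not classify correctly: the output images are completely black. Do you have any good suggestions? Thanks.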

kai

Hello, Professor Cheng. Could you release the third dataset in your paper (the Multicue dataset)? Ideally the augmented data. Looking forward to your reply.

kai

Thanks for Professor Liu's reply. Could you release the augmented Multicue dataset?

kai

Also, the 2015 HED paper does not seem to have used the Multicue dataset.

成先镜

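Professor Liu, I want to extract edge information from the ground truth (i.e., the depth maps) of the KITTI 2015 dataset and use it as auxiliary training for stereo matching. Originally I wanted to find a dataset with depth-map edge annotations (the closest one is NYUDv2). If I run RCF directly at test time, I get an edge map as the result, which I would then use for training on KITTI images. I am not sure whether this would work; the key question is whether the edge maps obtained at test time are accurate.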

Yun Liu

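I don't quite understand what you mean. Do you want to train on NYUDv2 and then test on KITTI 2015 to assist training? I can't really say whether it would be accurate or whether it would help, but I think it's worth trying. Edge detection results are, after all, sharper at boundaries than direct depth prediction, so they may help.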

成先镜

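I work on stereo matching. After reading several edge-related papers and their code, I found that the per-layer weights and learning rates are fixed in the code, and it runs on a single GPU rather than in parallel on multiple GPUs. Since I want to use edges as auxiliary information, I modified the code to run on multiple GPUs and am running experiments now. I wonder whether running on multiple GPUs would cause a large loss in edge accuracy compared with a single GPU.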

张是啊

How is the human performance on the various edge detection datasets computed? I have never found an explanation.

wang yi

The test set of the NYUD dataset you provided has no ground truth. How can I obtain it?

Wenlu

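Hello! After splitting the images in my own dataset into 100x100 patches and extracting edges, I found black borders in every patch, so the information near the patch borders is lost. Professor, do you have a good solution?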
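One common workaround for patch-border artifacts is overlapping tiles: give each tile extra context on every side, run the detector, and keep only the tile's centre, so whatever happens at the patch border falls in the discarded margin. A minimal NumPy sketch, assuming a 2-D image, a hypothetical `predict` callable that returns a map the same size as its input, reflect padding at the image border, and `margin` smaller than the image size:

```python
import numpy as np

def tiled_predict(image, predict, tile=100, margin=16):
    """Run `predict` on tiles enlarged by `margin` pixels of context and
    keep only each tile's central tile x tile region."""
    h, w = image.shape
    padded = np.pad(image, margin, mode="reflect")  # context at the image border
    out = np.zeros((h, w), dtype=float)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            th, tw = min(tile, h - y), min(tile, w - x)
            # padded[y + margin, x + margin] corresponds to image[y, x]
            patch = padded[y:y + th + 2 * margin, x:x + tw + 2 * margin]
            pred = predict(patch)
            out[y:y + th, x:x + tw] = pred[margin:margin + th, margin:margin + tw]
    return out
```

The margin should be at least as wide as the border the detector corrupts; with tile = 100 and margin = 16 the network sees 132x132 inputs.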

BYD

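Professor, sorry, I have another question. In general edge detection, we distinguish edges from non-edges with a binary image of 0s and 1s. When I evaluate such results on the BSDS500 benchmark, the values only fluctuate around two numbers, and when I plot them with the curve-plotting method you provided, the result looks like a single point. What is going on? How can I make the detected edges have primary and secondary edges, so that their gray values range from 0 to 255? Please advise.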

BYD

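Professor, in the evaluation data produced with your method, the xx_brdy_thr.txt file has four columns. If I remove the first column, the remaining three are: the first goes from 0.91 down to 0.00015, the second from 0.48 to 1, and the third rises from 0.63 to 0.81 and then falls to 0.0003. But when I evaluate the classic Canny detector, the first column stays around 0.45, and the second and third columns also hover around single values, without the monotonic trend in your data, so the plot looks like a single point. What is going on? Also, can this code run on Linux in a virtual machine? Please advise, thanks.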

Yun Liu

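Canny? So your Canny output is binary? A typical edge result is a probability map, not a binary one. To plot Canny, you need to try many parameter settings to get many sets of results; each set gives one point, and connecting the points gives a curve.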
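The suggestion above in code: a binary detector has no score to threshold, so each parameter setting gives one precision/recall point, and sweeping the parameter traces a curve. The NumPy sketch uses a toy gradient-magnitude detector as a stand-in for Canny (OpenCV's `cv2.Canny(image, threshold1, threshold2)` could replace it) and pixel-exact matching, whereas the real benchmark allows a small distance tolerance. All helper names are hypothetical.

```python
import numpy as np

def gradient_edges(img, thresh):
    """Toy binary detector (stand-in for Canny): gradient magnitude > thresh."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy) > thresh

def pr_point(pred, gt):
    """Pixel-exact precision and recall of a binary edge map."""
    tp = np.logical_and(pred, gt).sum()
    return tp / max(pred.sum(), 1), tp / max(gt.sum(), 1)

def sweep(img, gt, params):
    """One (parameter, precision, recall) point per detector setting."""
    return [(t, *pr_point(gradient_edges(img, t), gt)) for t in params]
```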

BYD

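Hello, Professor. I don't quite understand what you mean by the edge result being a probability; my edge result is a binary image. Could you explain? So what you mean is that I change the parameters (for example, the threshold), run the evaluation each time, then collect and combine the results to produce the xx_brdy_thr.txt file, and then plot it?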

放羊娃

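Is there example code for generating the xx_brdy_thr.txt file? For example, for Canny, how should the thresholds be set to produce an xx_brdy_thr.txt file? I don't quite understand this part.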

放羊娃

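Why draw a curve at all? Would evaluating by intersection-over-union at the best threshold work? What is the difference between the F-measure and IoU?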

Yun Liu

I don't understand what you mean. What we provide is a curve.
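On the F-measure versus IoU question: under the simplifying assumption of pixel-exact matching, F = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN), so F = 2·IoU / (1 + IoU). The two are monotonically related and rank methods identically at any fixed threshold. The benchmark's F-measure differs mainly in its distance-tolerant matching, and the curve shows the precision/recall trade-off over all thresholds rather than a single operating point. A small NumPy check (`f_and_iou` is a hypothetical helper):

```python
import numpy as np

def f_and_iou(pred, gt):
    """Pixel-exact F-measure (Dice) and IoU (Jaccard) of two binary maps."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = tp + fp + fn
    if denom == 0:
        return 1.0, 1.0  # both maps empty
    return 2 * tp / (2 * tp + fp + fn), tp / denom
```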

BYD

Professor, when you tested so many algorithms, were the input edge images binary?

ZhangH

For Canny, you should run every threshold from 0.01 to 0.99, then aggregate the results into one txt file and plot it.

BYD

OK, I understand, thank you. This is because the images are binary, right?

Hope

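Hello! I learned a lot from reading your RCF paper. Since I am new to this area, I have a question.
In your RCF network, built on VGG16, every conv layer is followed by a 1x1 conv layer, but I don't understand why its channel depth is 21.
Is there a particular reason for 21?
I also have an idea: could I fuse the features produced by two stages?
For example, could the output of the eltwise layer in stage 1 and the output of the eltwise layer in stage 2 be fused through another eltwise layer?
Those are my questions. I look forward to your reply, thank you!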

Yun Liu

I modified the FCN prototxt at the time and didn't think much about it, so I just kept 21.

You could try fusing different stages and see whether it helps. I haven't tried it, so I don't know whether it improves results.
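For context, FCN's PASCAL VOC models output 21 channels (20 object classes plus background), which is presumably where the inherited width comes from. As for the proposed cross-stage fusion: an eltwise layer needs inputs of identical shape, and VGG16's pool1 halves the resolution before stage 2, so the stage-2 output has to be upsampled first. A minimal NumPy sketch of that idea, with nearest-neighbour upsampling standing in for a learned deconvolution; `fuse_stages` is a hypothetical helper and this is untested as a training recipe.

```python
import numpy as np

def fuse_stages(stage1, stage2):
    """Element-wise sum of stage outputs (C, H, W) and (C, H/2, W/2),
    after 2x nearest-neighbour upsampling of the coarser one."""
    up = stage2.repeat(2, axis=1).repeat(2, axis=2)
    if up.shape != stage1.shape:
        raise ValueError(f"shape mismatch: {up.shape} vs {stage1.shape}")
    return stage1 + up
```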

chengzhen

Can RCF be modified to perform semantic edge detection?

Yun Liu

I haven't tried this, but it should work about the same.

BYD

I can run the evaluation with test_benchs.m, but when I change inDir to point to other detection results, I get errors. Please advise.

BYD

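Here is the situation: test_benchs.m is source code in the bench folder that tests five images. The default inDir is ucm2, i.e. the results of the ucm2 algorithm, stored as .mat files. I saved the results of another algorithm as .mat files and set inDir to them, but it always fails.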

BYD

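For example, if you detect edges with the classic Canny algorithm, save the results as .mat files and use them as inDir, running allbench.m or boundrybench.m produces various errors. Do I need to modify the original code, or is the format of my detection results wrong? Professor, after testing an algorithm, how do you store its results, and how do you run them on BSDS?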

Yun Liu

I stored all results as .png and changed a few lines of the evaluation code so that the inputs match.

BYD

Thank you, Professor. I have solved this problem. Thanks a lot.

BYD

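Professor, sorry, I have another question. In general edge detection, we distinguish edges from non-edges with gray levels 0 and 1. How can such edge detection results be mapped to a distribution over 0-255, i.e., with primary and secondary edges? How does your algorithm assign these values? Please advise, thanks.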
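Two hedged NumPy helpers related to this question. A detector that outputs an edge strength in [0, 1] (RCF's sigmoid output is such a map) can simply be scaled to 0-255, for example before saving as PNG. A purely binary detector has no strength, but one heuristic (an assumption here, not the method used on this page) is to average its binary outputs over several parameter settings, so pixels detected under more settings count as stronger edges. Both helper names are hypothetical.

```python
import numpy as np

def to_uint8(prob):
    """Map an edge-strength map in [0, 1] to 0-255 gray levels."""
    return np.round(np.clip(prob, 0.0, 1.0) * 255).astype(np.uint8)

def soft_from_binary(binary_maps):
    """Average binary edge maps from different detector settings (heuristic)."""
    stack = np.stack([np.asarray(m, dtype=float) for m in binary_maps])
    return stack.mean(axis=0)
```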

BYD

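Hello, Professor. I am working on a project on edge detection and need to evaluate on the BSDS500 dataset. I saw that further down this page you evaluated many detection results on this dataset. I'd like to ask how to use this dataset to evaluate other edge detection algorithms; specifically, how do I generate xx_brdy.txt and xx_brdy_thr.txt? Here is my email, 894759462@qq.com (same as my QQ). I hope you can advise; this has troubled me for a long time.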

BYD

I consulted the limited resources available online, but it still often fails. I hope you can advise.

BYD

test_benchs.m is a source file in the bench folder; it tests five images using the ucm detection results.

BYD

I downloaded it from here.

Yun Liu

Evaluation works the same way for all of them. You can read the code and make the inputs match.

BYD

OK, thank you, Professor.

放羊娃

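Did you find out how to generate xx_brdy.txt and xx_brdy_thr.txt? Is there any code? The criteria for evaluating edge detection algorithms don't feel very intuitive, and I really can't figure it out.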
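A hedged sketch of writing a per-threshold results file from pooled match counts. The four-column layout (threshold, recall, precision, F) is an assumption that matches the columns described in the comments above; check it against your benchmark's own output before relying on it. The distance-tolerant matching that produces the counts is not shown, and `write_thr_file` is a hypothetical helper.

```python
import numpy as np

def write_thr_file(path, thresholds, cnt_r, sum_r, cnt_p, sum_p):
    """Write one row per threshold: threshold, recall, precision, F.
    Count arrays have shape (num_images, num_thresholds) and are pooled over images."""
    R = cnt_r.sum(0) / np.maximum(sum_r.sum(0), 1)
    P = cnt_p.sum(0) / np.maximum(sum_p.sum(0), 1)
    F = np.where(P + R > 0, 2 * P * R / np.maximum(P + R, 1e-12), 0.0)
    rows = np.column_stack([thresholds, R, P, F])
    np.savetxt(path, rows, fmt="%.6f")
    return rows
```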