Deeply Supervised Salient Object Detection with Short Connections
Qibin Hou1 Ming-Ming Cheng1 Xiaowei Hu1 Ali Borji2 Zhuowen Tu3 Philip H. S. Torr4
1CCCE, Nankai University 2CRCV, UCF 3UCSD 4The University of Oxford
Online Demo
Abstract
Recent progress on salient object detection has been substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data on performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons.
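As a rough illustration of the short-connection idea (this is not the released Caffe model: the backbone split, channel counts, and exact connection pattern below are simplifying assumptions, and the paper additionally uses a sixth side path and a fused output), the sketch shows side outputs in which every shallow prediction also receives the upsampled activations of the deeper side outputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16


class ShortConnectionSideOutputs(nn.Module):
    """Toy HED-style network with short connections between side outputs."""

    def __init__(self, side_channels=(64, 128, 256, 512, 512)):
        super().__init__()
        features = list(vgg16().features)
        # Split the VGG-16 backbone into five stages (conv1 ... conv5).
        self.stages = nn.ModuleList([
            nn.Sequential(*features[:4]),
            nn.Sequential(*features[4:9]),
            nn.Sequential(*features[9:16]),
            nn.Sequential(*features[16:23]),
            nn.Sequential(*features[23:30]),
        ])
        # One single-channel side activation per stage.
        self.side = nn.ModuleList(
            [nn.Conv2d(c, 1, kernel_size=1) for c in side_channels])
        # Short connections: side output i also sees activations i+1, ..., n-1.
        n = len(side_channels)
        self.fuse = nn.ModuleList(
            [nn.Conv2d(n - i, 1, kernel_size=1) for i in range(n)])

    def forward(self, x):
        h, w = x.shape[2:]
        acts = []
        for stage, side in zip(self.stages, self.side):
            x = stage(x)
            # Upsample every side activation back to the input resolution.
            acts.append(F.interpolate(side(x), size=(h, w),
                                      mode='bilinear', align_corners=False))
        # Deeper (coarser but better-localized) activations flow into
        # shallower side outputs through 1x1 fusion convolutions.
        return [fuse(torch.cat(acts[i:], dim=1))
                for i, fuse in enumerate(self.fuse)]
```

Every element of the returned list is supervised with the same ground-truth saliency map (the deep supervision), and in the paper a weighted fusion of the side outputs forms the final prediction.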
Paper
- Deeply Supervised Salient Object Detection with Short Connections, Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, Ali Borji, Zhuowen Tu, Philip Torr, IEEE TPAMI, 2019 (conference version: CVPR 2017). [pdf] [Project Page] [bib] [source code & data] [official version] [poster] [Chinese poster] [LaTeX]
Source Code
You can find our code here. We have uploaded the caffe and CRF packages we used in our paper.
If you find our work helpful, please cite:
@article{HouPami19Dss,
  title   = {Deeply supervised salient object detection with short connections},
  author  = {Hou, Qibin and Cheng, Ming-Ming and Hu, Xiaowei and Borji, Ali and Tu, Zhuowen and Torr, Philip},
  journal = {IEEE TPAMI},
  year    = {2019},
  volume  = {41},
  number  = {4},
  pages   = {815--828},
  doi     = {10.1109/TPAMI.2018.2815688},
}
Contact
andrewhoux AT gmail DOT com
Applications
This algorithm is used in flagship products such as the Huawei Mate 10 and Huawei Honor V10 to create the “AI Selfie: Brilliant Bokeh, perfect portraits” effect, as demonstrated at the Mate 10 launch show in Munich, Germany.
A report in Nature: link.
Hello! Only 1447 saliency maps exist in your published results for the HKU-IS dataset. However, as far as I know, there are 4447 images in HKU-IS. Why?
The rest are used for training and validation (see the paper).
Hello! Regarding the 2000 test images selected from MSRA-B: are these 2000 images chosen randomly? If they are random, how can results be compared when different papers use different test sets?
They are the ones provided by the DRFI paper.
In your TPAMI publication “Deeply Supervised Salient Object Detection with Short Connections”, you say that “We use full resolution images to train our network, and the mini batch size is set to 10.” Does that mean that you use the original training images and set the batch size to 10? I found that in the MSRA-B training set some images have different sizes, so how do you put images of different sizes into one batch? In addition, could you please give details about the learning rate, decay parameters, and step size? Thank you so much!
You can set iter_size to 10, which solves the problem of images with different resolutions. Learning rate: 0.001 if you normalize the loss. Weight decay: 5e-4. Step size: 8000 (12,000 iterations in total).
So which optimization method do you use, Adam or momentum SGD?
And what do you mean by “normalize the loss”? Thank you so much!
Please refer to our source code for more details https://github.com/Andrew-Qibin/DSS
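For readers who do not work in Caffe, here is a hypothetical PyTorch re-creation of the hyper-parameters quoted in this exchange (base lr 1e-3 with a per-pixel-normalized loss, weight decay 5e-4, step size 8000, 12,000 updates in total, effective batch size 10 via gradient accumulation). SGD with momentum, the single-logit model output, and the data loader are assumptions, not the released training script:

```python
import torch
import torch.nn.functional as F


def train(model, loader, device='cuda'):
    """Hypothetical PyTorch analogue of the Caffe solver settings quoted above."""
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                                momentum=0.9, weight_decay=5e-4)
    # "step size 8000": drop the learning rate by 10x after 8000 updates.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8000,
                                                gamma=0.1)
    iter_size, max_iter = 10, 12000   # accumulate 10 images per update
    model.to(device).train()

    seen, updates = 0, 0
    optimizer.zero_grad()
    while updates < max_iter:
        for image, label in loader:   # batch size 1: any resolution works
            pred = model(image.to(device))
            # Per-pixel averaged ("normalized") loss, hence the 1e-3 base lr.
            loss = F.binary_cross_entropy_with_logits(pred, label.to(device))
            (loss / iter_size).backward()          # accumulate gradients
            seen += 1
            if seen % iter_size == 0:              # one Caffe "iteration"
                optimizer.step()
                optimizer.zero_grad()
                scheduler.step()
                updates += 1
                if updates >= max_iter:
                    return
```

Because images are processed one at a time and gradients are accumulated over ten of them before each update, inputs of different resolutions never have to be packed into one tensor, which is exactly what Caffe's iter_size achieves.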
Do you mean that we should train the model with an initial learning rate of 0.001? The base learning rate specified in the paper as well as in the open-sourced code is 1e-8. When we tried to train the model with a learning rate of 1e-8 using the momentum optimizer, it seemed that the side-output layers could not learn features correctly. Some of the side-output layers would always output completely white images regardless of the input. What do you think may cause such a phenomenon? Thanks for your time.
The choice of the initial lr actually depends on whether the reduction parameter (in PyTorch) of the loss layer is activated (the ‘normalize’ parameter in Caffe). If your total loss is divided by the number of pixels, you can set the lr to 1e-3; if not, you need to set it to 1e-8 or less.
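To make the reply above concrete: averaging the loss over pixels shrinks the gradients by roughly a factor of H x W compared with summing it, so the learning rate has to be rescaled in the opposite direction. A small illustration (the function name and the choice of binary cross-entropy here are mine, not taken from the released code):

```python
import torch.nn.functional as F


def side_output_loss(logits, target, normalize=True):
    """Hypothetical per-pixel cross-entropy for one side output.

    normalize=True  -> loss averaged over pixels (Caffe's 'normalize',
                       PyTorch's reduction='mean'); pairs with lr ~ 1e-3.
    normalize=False -> loss summed over pixels (reduction='sum'); gradients
                       are ~H*W times larger, so the lr must shrink to ~1e-8
                       for the parameter updates to stay comparable.
    """
    reduction = 'mean' if normalize else 'sum'
    return F.binary_cross_entropy_with_logits(logits, target,
                                              reduction=reduction)
```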
Thanks a lot for your timely reply.
By the way, is it necessary to set different learning rates for the backbone network layers and the side-output layers?
I find that your TPAMI publication “Deeply Supervised Salient Object Detection with Short Connections” is a great improvement over the original conference version. The main change is using ResNet-101 to replace VGG. I followed your paper and replaced the base model (VGG) with ResNet-101, but I cannot reproduce the results reported in your paper (on some datasets, the results are even worse than with VGG). Could you please give me the detailed network parameters for ResNet-101 (e.g., train_val.prototxt, solver.prototxt)?
Thanks for your interest. I will update my GitHub repo soon.
I would like to ask about the implementation detail in this paper that says “each image is trained for ten times”: does it mean each image is trained ten times in total over the whole run, or ten times in a row? An earlier comment mentions setting iter_size to 10, but that seems to be for increasing the effective batch size, and I cannot connect it to my reading of the sentence above.
Also, if the deconvolution layer uses a fixed kernel, can it be replaced with an ordinary resize operation without affecting back-propagation? In TensorFlow, the resize operation also takes part in back-propagation.
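On the second question, as a rough sanity check (not taken from the DSS code; the 2x example and the FCN-style bilinear filler below are illustrative assumptions): a deconvolution whose kernel is frozen to bilinear weights computes essentially the same map as a bilinear resize, up to boundary handling, and both are differentiable with respect to their input, so gradients flow through either path:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def bilinear_kernel(channels, kernel_size):
    """FCN-style bilinear filter used to freeze a deconvolution kernel."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt = 1 - (og - center).abs() / factor
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = filt[:, None] * filt[None, :]
    return weight


# 2x upsampling: fixed-kernel deconvolution vs. a plain bilinear resize.
x = torch.randn(1, 1, 32, 32, requires_grad=True)

deconv = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1, bias=False)
deconv.weight.data.copy_(bilinear_kernel(1, 4))
deconv.weight.requires_grad_(False)        # frozen kernel, never updated

y_deconv = deconv(x)
y_resize = F.interpolate(x, scale_factor=2, mode='bilinear',
                         align_corners=False)
print(y_deconv.shape, y_resize.shape)      # both torch.Size([1, 1, 64, 64])

# Both paths are differentiable w.r.t. x, so gradients flow either way;
# the outputs agree up to boundary handling.
y_resize.sum().backward()
print(x.grad.shape)                        # gradients reach the input
```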
Could anyone tell me how the training set fed into Caffe should be labeled for saliency detection? (Image classification is easy for me to understand: different images are simply labeled with different classes.)
Similar to semantic segmentation methods such as fully convolutional networks (FCNs), the whole label map is used as the ground-truth annotation.
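To make that concrete, a training sample is just an (image, binary mask) pair of the same spatial size, with 1 marking salient pixels. A minimal, hypothetical PyTorch-style loader (the directory layout, file extensions, and class name are assumptions, not the released data layer):

```python
import os

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor


class SaliencyDataset(Dataset):
    """Each sample pairs an RGB image with a same-sized binary label map:
    1 marks salient pixels, 0 marks background. Unlike classification, the
    'label' is a full-resolution map rather than a single class index."""

    def __init__(self, image_dir, mask_dir):
        self.image_dir, self.mask_dir = image_dir, mask_dir
        self.names = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = to_tensor(
            Image.open(os.path.join(self.image_dir, name)).convert('RGB'))
        mask = to_tensor(
            Image.open(os.path.join(
                self.mask_dir, os.path.splitext(name)[0] + '.png')).convert('L'))
        return image, (mask > 0.5).float()   # binarize the ground-truth map
```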
Hello Prof. Cheng, I really appreciate that you open-source your excellent work and share it with everyone. I have a question: during data annotation, is there a unified standard to reduce subjective bias? I noticed that in some images of the dataset both the head and the body of an animal are labeled as the salient region, while in others only the head or face is labeled as salient and the neck and body are labeled as non-salient. Thank you for your answer!
This depends entirely on the annotator's understanding of the concept of saliency. Perfect consistency is very hard to achieve.
I see. Thank you very much!
Hello, I would like to retrain your network, but I cannot find the MSRA-B dataset; the download link on Microsoft's website is no longer valid. Could you provide a download link for MSRA-B?
Download links (Baidu Netdisk) for all related datasets can be found on the project page of our 2015 IEEE TIP benchmark paper.
Hi, Qibin. When I train the model, I cannot solve the error “Unknown layer type: ImageLabelmapData”. If you know how to solve this problem, please help me. Thank you very much!
I met the same problem. Have you found a way to solve it yet?
Hello, I read your paper and there is one point I do not understand. In the definition of the unary term in Section 3.3 (Inference), the denominator contains a sigmoid function. Is the range of x {0, 1}? If so, would the range of h(x) be {0.5, e/(e+1)}? Is that the correct way to understand it?
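For reference, the arithmetic in the question is right for a standard logistic sigmoid (this only confirms the sigmoid values for x in {0, 1}, not the full definition of the unary term in the paper):

$$\sigma(0)=\frac{1}{1+e^{-0}}=\frac{1}{2},\qquad \sigma(1)=\frac{1}{1+e^{-1}}=\frac{e}{1+e}\approx 0.731$$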
Did you manage to compile the Caffe provided by HED?
Could you tell me how you compiled the Caffe provided by HED?
I will clean up my Caffe code in the next few days and then upload it.
Thank you very much!
You may find it here https://github.com/Andrew-Qibin/caffe_dss
Thank you! However, when I build ‘caffe_dss’ using the command ‘make test’, it reports ‘caffe/layers/hybrid_cross_entropy_loss_layer.hpp: No such file or directory; compilation terminated’. Indeed, I cannot find the file ‘caffe/layers/hybrid_cross_entropy_loss_layer.hpp’.
The file ‘caffe/layers/soft_iou_loss_layer.hpp’ also cannot be found.
I checked the files under ‘src/caffe/test/’. Some header files referenced there do not exist, such as ‘channel_wise_cross_entropy_loss_layer.hpp’, ‘channel_wise_scale_layer.hpp’, ‘cross_entropy_loss_layer.hpp’, ‘full_cross_entropy_loss_layer.hpp’, ‘hybrid_cross_entropy_loss_layer.hpp’, and ‘iou_loss_layer.hpp’. Please tell us how to get these files. Thank you very much!
Hello, when compiling the Caffe provided by HED, I commented out USE_CUDNN=1 (because I read on another page that it requires cuDNN v4, while mine is cuDNN v5). Then ‘make all’ fails with the error ‘cublas_v2.h: No such file or directory’.
set USE_CUDNN=0
Thank you very much!
Is it possible to compile without cuDNN?
sure!