Deeply Supervised Salient Object Detection with Short Connections
Qibin Hou1 Ming-Ming Cheng1 Xiaowei Hu1 Ali Borji2 Zhuowen Tu3 Philip H. S. Torr4
1CCCE, Nankai University 2CRCV, UCF 3UCSD 4The University of Oxford
Online Demo
Recent progress on salient object detection has been substantial, benefiting mostly from the rapid development of Convolutional Neural Networks (CNNs). Recently developed semantic segmentation and salient object detection algorithms have mostly been based on Fully Convolutional Neural Networks (FCNs). There is still considerable room for improvement over generic FCN models, which do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but its performance gain on saliency detection is not obvious. In this paper, we propose a new salient object detection method that introduces short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on five widely tested salient object detection benchmarks, with advantages in efficiency (0.08 seconds per image), effectiveness, and simplicity over existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data in performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons.
- Deeply supervised salient object detection with short connections, Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, Ali Borji, Zhuowen Tu, Philip Torr, IEEE TPAMI (CVPR 2017), 2019. [pdf] [Project Page] [bib] [source code & data] [official version] [poster] [Chinese poster] [LaTeX]
Source Code
You can find our code here. We have uploaded the Caffe and CRF packages used in our paper.
If you find our work helpful, please cite:
@article{HouPami19Dss,
  title={Deeply supervised salient object detection with short connections},
  author={Hou, Qibin and Cheng, Ming-Ming and Hu, Xiaowei and Borji, Ali and Tu, Zhuowen and Torr, Philip},
  year={2019},
  volume={41},
  number={4},
  pages={815-828},
  journal={IEEE TPAMI},
  doi={10.1109/TPAMI.2018.2815688},
}
andrewhoux AT gmail DOT com
This algorithm is used in flagship products such as the Huawei Mate 10 and Huawei Honor V10 to create “AI Selfie: Brilliant Bokeh, perfect portraits” effects, as demonstrated at the Mate 10 launch show in Munich, Germany.

A report in Nature: link.
Hello! Only 1447 saliency maps exist in your published results for the HKU-IS dataset. However, as far as I know, there are 4447 images in HKU-IS. Why?
The rest are used for training and validation (see the paper).
In your TPAMI publication “Deeply Supervised Salient Object Detection with Short Connections”, you said that “We use full resolution images to train our network, and the mini batch size is set to 10.” Does that mean you use the original training images and set the batch size to 10? I found that some images in the MSRA-B training dataset have different sizes, so how do you put images of different sizes in one batch? In addition, could you please give the details of the learning rate, decay parameters, and step size? Thank you so much!
You can set iter_size to 10, which solves the problem of images with different resolutions. Use lr = 0.001 if you normalize the loss, weight decay = 5e-4, and step size = 8000 (12,000 iterations in total).
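The iter_size suggestion above can be illustrated with a minimal sketch in plain Python. This is not the authors' Caffe code: it assumes a toy one-parameter model, and the helper names `image_gradient` and `accumulated_step` are hypothetical. Each image is processed on its own, so images of different sizes never have to share one tensor, and the gradients of `iter_size` passes are averaged before a single weight update.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def image_gradient(w, pixels, labels):
    """Gradient w.r.t. w of the per-pixel cross-entropy loss, averaged over
    the pixels of ONE image. Toy model (assumption): pred = sigmoid(w * pixel)."""
    n = len(pixels)
    return sum((sigmoid(w * x) - y) * x for x, y in zip(pixels, labels)) / n


def accumulated_step(w, images, lr, iter_size):
    """One weight update built from iter_size single-image passes."""
    grad = 0.0
    for pixels, labels in images[:iter_size]:
        grad += image_gradient(w, pixels, labels)  # accumulate, no update yet
    grad /= iter_size                              # average the accumulated gradients
    return w - lr * grad                           # plain SGD step (no momentum, no weight decay)


# Two "images" with different numbers of pixels (illustrative values only).
images = [
    ([0.5, -1.0, 2.0], [1, 0, 1]),
    ([1.5, 0.2], [1, 0]),
]
w_new = accumulated_step(0.0, images, lr=0.001, iter_size=2)
print(w_new)
```

With iter_size set to 10, ten such single-image passes would contribute to each update, which behaves like a mini-batch of 10 without requiring equal image sizes.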
So which optimization method do you use, Adam or Momentum?
And what do you mean by “normalize the loss”? Thank you so much!
Please refer to our source code for more details.
Do you mean that we should train the model with an initial learning rate of 0.001? The base learning rate specified in the paper as well as in the open-source code is 1e-8. When we tried to train the model with a learning rate of 1e-8 using the Momentum optimizer, the side-output layers did not seem to learn features correctly. Some of the side-output layers would always output completely white images regardless of the input image. What do you think might cause this? Thanks for your time.
The choice of the initial lr depends on whether the reduction parameter (PyTorch) in the loss layer is activated (the ‘normalize’ parameter in Caffe). That is, if your total loss is divided by the number of pixels, you can set lr to 1e-3. If not, you need to set it to 1e-8 or less.
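The relationship between the two learning rates can be checked with a small arithmetic sketch. It assumes plain SGD without momentum or weight decay, a hypothetical 300 x 400 image, and a made-up gradient value. The function name `summed_loss_lr` is illustrative only. A loss summed over N pixels has a gradient N times larger than the same loss averaged over those pixels, so the learning rate has to shrink by a factor of N to give the same step.

```python
def summed_loss_lr(normalized_lr, height, width):
    """Learning rate for a pixel-summed loss that reproduces the SGD step
    obtained with a pixel-averaged loss and normalized_lr."""
    num_pixels = height * width
    return normalized_lr / num_pixels


def sgd_step(weight, grad, lr):
    return weight - lr * grad


# Illustrative numbers only: the image size and gradient are assumptions.
height, width = 300, 400
mean_grad = 0.25                        # hypothetical gradient of the averaged loss
sum_grad = mean_grad * height * width   # gradient of the same loss summed over pixels

lr_mean = 1e-3
lr_sum = summed_loss_lr(lr_mean, height, width)

step_mean = sgd_step(0.0, mean_grad, lr_mean)
step_sum = sgd_step(0.0, sum_grad, lr_sum)
print(lr_sum, step_mean, step_sum)
```

For an image of this size the equivalent learning rate comes out at roughly 8e-9, which is on the order of the 1e-8 mentioned above.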
Thanks a lot for your timely reply.
By the way, is it necessary to set different learning rates for the backbone network layers and the side-output layers?
I find that your TPAMI publication “Deeply Supervised Salient Object Detection with Short Connections” is a great improvement over the original conference version. The main change is replacing VGG with ResNet-101. I followed your paper and replaced the base model (VGG) with ResNet-101, but I cannot reproduce the results reported in your paper (on some datasets, the results are even worse than with VGG). Would you please share the detailed network parameters for ResNet-101 (such as train_val.prototxt and solver.prototxt)?
Thanks for your interest. I will update my GitHub repo soon.
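I would like to ask about the implementation details in this paper, which mention that “each image is trained for ten times”. Does this mean each image goes through training 10 times in total, or that each image is trained 10 times in a row? A later note mentions setting iter_size to 10, but that seems to be for increasing the batch_size, and I cannot reconcile it with my understanding of the sentence above.
Also, if the deconvolution layer uses a fixed kernel, can it be replaced with an ordinary resize operation without affecting back propagation? In TensorFlow, the resize operation also takes part in back propagation.
Similar to semantic segmentation methods such as fully convolutional neural networks, the whole label map is annotated as the ground truth.
Downloads for all the related datasets (Baidu Netdisk) can be found on the homepage of our 2015 IEEE TIP benchmark paper.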
Hi, Qibin. When I train the model, I can't solve the error “Unknown layer type: ImageLabelmapData”. If you know how to solve this problem, please help me. Thank you very much!
I met the same problem. Do you have a solution for it now?
You may find it here
Thank you! However, when I build ‘caffe_dss’ using the command ‘make test’, it shows ‘caffe/layers/hybrid_cross_entropy_loss_layer.hpp: No such file or directory compilation terminated’. Indeed, I cannot find the file ‘caffe/layers/hybrid_cross_entropy_loss_layer.hpp’.
The file ‘caffe/layers/soft_iou_loss_layer.hpp’ cannot be found either.
I checked the files in ‘src/caffe/test/’. Some header files included by the files there do not exist, such as ‘channel_wise_cross_entropy_loss_layer.hpp’, ‘channel_wise_scale_layer.hpp’, ‘cross_entropy_loss_layer.hpp’, ‘full_cross_entropy_loss_layer.hpp’, ‘hybrid_cross_entropy_loss_layer.hpp’, and ‘iou_loss_layer.hpp’. Please tell us how to get these files. Thank you very much!
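Hello, when I compiled the Caffe provided by HED, I commented out USE_CUDNN=1 (because I saw on another web page that compiling it requires cuda4, but mine is cudnn5), and then make all failed with the error: cublas.h_v2.h: No such file or directory.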