Global contrast based salient region detection

Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip H. S. Torr, Shi-Min Hu

Fig. Given an input image (top), a global contrast analysis is used to compute a high-resolution saliency map (bottom).

Abstract

Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object extraction algorithm, which simultaneously evaluates global contrast differences and spatially weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut for high-quality salient object segmentation. We extensively evaluated our algorithm using traditional salient object detection datasets, as well as a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Despite such noisy internet images, where the saliency regions are ambiguous, our saliency guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.
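As a rough illustration of what "global contrast" means at the color level (the HC variant referred to elsewhere on this page), the sketch below scores each quantized color by its frequency-weighted distance to every other color in the image, so rare colors that differ strongly from the dominant ones score highest. This is a simplified formulation written for illustration, not the released implementation: colors are plain 3-vectors, the distance is Euclidean, color quantization and smoothing are omitted, and the names Color and histogramContrast are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A color as a plain 3-vector (the color space is left to the caller).
using Color = std::array<double, 3>;

// colors[i] is the i-th quantized color and freq[i] the fraction of pixels
// that have that color. Returns one saliency value per color:
//   sal[i] = sum over j of freq[j] * ||colors[i] - colors[j]||
std::vector<double> histogramContrast(const std::vector<Color>& colors,
                                      const std::vector<double>& freq) {
    assert(colors.size() == freq.size());
    std::vector<double> sal(colors.size(), 0.0);
    for (std::size_t i = 0; i < colors.size(); ++i) {
        for (std::size_t j = 0; j < colors.size(); ++j) {
            double sq = 0.0;
            for (std::size_t k = 0; k < 3; ++k) {
                const double d = colors[i][k] - colors[j][k];
                sq += d * d;
            }
            sal[i] += freq[j] * std::sqrt(sq);  // distance weighted by color frequency
        }
    }
    return sal;
}
```

A per-pixel saliency map would then be a lookup of each pixel's quantized color index into the returned vector.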

Papers

Most related projects on this website:

  • Efficient Salient Region Detection with Soft Image Abstraction. Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook. IEEE International Conference on Computer Vision (IEEE ICCV), 2013. [pdf] [Project page] [bib] [latex] [official version]
  • BING: Binarized Normed Gradients for Objectness Estimation at 300fps. Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, Philip H. S. Torr. IEEE International Conference on Computer Vision and Pattern Recognition (IEEE CVPR), 2014. [Project page] [pdf] [bib] (Oral, accept rate: 5.75%)
  • SalientShape: Group Saliency in Image Collections. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu. The Visual Computer 30 (4), 443-453, 2014. [pdf] [Project page] [bib] [latex] [official version]
  • Deeply supervised salient object detection with short connections. Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, Ali Borji, Zhuowen Tu, Philip Torr. IEEE TPAMI (CVPR 2017), 2019. [pdf] [Project page] [bib] [source code & data] [official version] [poster] [poster (Chinese version)]

Downloads

Some files are password-protected zip archives. Read the notes to see how to get the password.

1. Data

The MSRA10K benchmark dataset (a.k.a. THUS10000) comprises per-pixel ground truth annotations for 10,000 MSRA images (181 MB), each of which has an unambiguous salient object whose region is accurately annotated with pixel-wise ground-truth labeling (13.1 MB). We provide saliency maps (5.3 GB, containing 170,000 images) for our methods as well as for 15 other state-of-the-art methods: FT [1], AIM [2], MSS [3], SEG [4], SeR [5], SUN [6], SWD [7], IM [8], IT [9], GB [10], SR [11], CA [12], LC [13], AC [14], and CB [15]. Saliency segmentation results (71.3 MB) for FT [1], SEG [4], and CB [15] are also available.

2. Windows executable

We supply a Windows MSI installer for our prototype software, which includes our implementations of FT [1], SR [11], and LC [13], as well as our HC, RC, and SaliencyCut methods.

3. C++ source code

The C++ implementation of our method, together with implementations of several other state-of-the-art methods.

4. Supplemental material

Supplemental materials (647 MB), including comparisons with 15 other state-of-the-art algorithms, are now available.

Salient object detection results for images with multiple objects. We tested it on the dataset provided by the CVPR 2007 paper: “Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration”.

5. More results for recent methods

If you would like to share your results on our MSRA10K benchmark (to make it easier for other researchers to compare with recent methods), please contact me via email (see the header image of this project page for the address). I will put your results, as well as links to your paper, on this page.

Comparisons with state of the art methods

Fig. Statistical comparison results of (a) different salient region detection methods, (b) their variants, and (c) object of interest region segmentation methods, using (i) the largest publicly available dataset and (ii) our MSRA10K dataset (to be made publicly available). We compare our HC and RC methods with 15 state-of-the-art methods: FT [1], AIM [2], MSS [3], SEG [4], SeR [5], SUN [6], SWD [7], IM [8], IT [9], GB [10], SR [11], CA [12], LC [13], AC [14], and CB [15]. We also use a simple variable-size Gaussian model 'Gau' and the GrabCut method as baselines. (Please see our paper for detailed explanations.)
Fig. Comparison of average Fβ for different saliency segmentation methods: FT [1], SEG [4], and ours, on the THUR15K dataset, which is composed of unselected Internet images.
Table. The average time taken to compute a saliency map for images in the MSRA10K database. (Note that we use the authors' original implementations for MSS and FT, which are not well optimized.)
Method    Time (s)    Code type
FT        0.247       Matlab
SEG       7.48        M&C
CB        36.5        M&C
Ours      0.621       C++
Table. Comparison of average time for different saliency segmentation methods.
Fig. Saliency maps computed by different state-of-the-art methods (b-p), and with our proposed HC (q) and RC (r) methods. Most results highlight edges or are of low resolution. See also the shared data for saliency detection results on the whole MSRA10K dataset.
Fig. Sketch-based image comparison. In each group, from left to right: the first column shows images downloaded from Flickr using the corresponding keyword; the second column shows our retrieval results, obtained by comparing the user's input sketch with the SaliencyCut result using the shape context measure [41]; the third column shows the corresponding sketch-based retrieval results using SHoG [42].

Figure: illustrations like this can be generated automatically by CmIllustr::Imgs(…). The supplemental materials (647 MB) give full results for the entire MSRA10K dataset.

FAQs

So far, more than 2000 readers (according to email records) have requested the source code for this project. Some of them have questions about using the code. Here are some frequently asked questions (several of which are also frequently asked by reviewers) for new users to refer to:

Q1: I'm confused by this sentence in the paper: "In our experiments, the threshold is chosen empirically to be the threshold that gives 95% recall rate in our fixed thresholding experiments". In almost all cases, people do not have the ground truth, so they cannot compute the recall rate. When I use your Cut application, I have to guess a threshold value to get a good cut image.

A: The recall rate is only used to evaluate the algorithm. In normal use, you rarely need to evaluate the algorithm itself. The sentence explains what the fixed threshold we use typically corresponds to. In practice, when initialized using RC saliency maps, this threshold is 70, with saliency values normalized to [0,255]. This does not mean that the threshold corresponds to a recall rate of 95% for every image; empirically, it corresponds to a recall rate of 95% over a large number of images. So simply using the suggested threshold of 70 is fine.
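To make the role of the fixed threshold concrete, here is a minimal sketch (not the code shipped with this project) that binarizes a saliency map at 70, assuming values already normalized to [0,255], and computes precision and recall against a ground-truth mask when one happens to be available. The flat std::vector image layout, the choice of >= rather than >, and the names binarize and precisionRecall are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Binarize a saliency map whose values are normalized to [0, 255].
// Pixels with saliency >= threshold are marked as foreground (true).
std::vector<bool> binarize(const std::vector<std::uint8_t>& saliency,
                           int threshold = 70) {
    std::vector<bool> mask(saliency.size());
    for (std::size_t i = 0; i < saliency.size(); ++i)
        mask[i] = saliency[i] >= threshold;
    return mask;
}

// Precision and recall of a predicted mask against a ground-truth mask.
// Returns {precision, recall}; a ratio with a zero denominator is reported as 0.
std::pair<double, double> precisionRecall(const std::vector<bool>& predicted,
                                          const std::vector<bool>& truth) {
    assert(predicted.size() == truth.size());
    std::size_t truePos = 0, predictedPos = 0, truthPos = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        if (predicted[i]) ++predictedPos;
        if (truth[i]) ++truthPos;
        if (predicted[i] && truth[i]) ++truePos;
    }
    const double precision =
        predictedPos ? static_cast<double>(truePos) / predictedPos : 0.0;
    const double recall =
        truthPos ? static_cast<double>(truePos) / truthPos : 0.0;
    return {precision, recall};
}
```

Without ground truth, only binarize applies; precisionRecall is what an evaluation over an annotated dataset would average to arrive at a figure such as the 95% recall mentioned above.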

Q2: I used your code to get results for the same database you used, but the results differ slightly from yours.

A: It seems that the cvtColor function in OpenCV 1.x behaves differently from the one in OpenCV 2.x. I suggest using a recent version. The segmentation method I use sometimes generates strange results, which leads to strange saliency maps. This happens rarely. When it does, I rerun the exe and it becomes OK. I don't know why, but it really does happen the first time I use the exe after compiling (very strange, maybe because of some default initialization). If you find the bug, please report it to me.

Q3: Does your algorithm only get good results for images with a single salient object?

A: Mostly yes. As described in our paper, our method is suitable for images with an unambiguous salient object. Since saliency detection methods typically have no prior knowledge about the target object, the problem is very difficult. Much recent research focuses on images with a single salient object. Even in this simple case, state-of-the-art algorithms may fail. This is understandable, since supervised object detection, which uses a large amount of training data and prior knowledge, also fails in many cases.

However, the value of saliency detection methods lies in their applications in many fields. Because they do not need large-scale human annotation for learning, and are typically much faster than object detection methods, it is possible to process a large number of images automatically at low cost. Although many of the saliency detection results may be wrong (up to 60% for noisy Internet images) because salient objects are ambiguous or even missing, we can still use efficient algorithms to select the good results and use them in many interesting applications, such as the following. (Note: all of the following projects use our saliency source code, with an initial version of SaliencyCut used in our own Sketch2Photo project. A list of 2000+ citations to the PAMI 2015 (CVPR11) paper is also available.)

  1. Unsupervised joint object discovery and segmentation in internet images, M. Rubinstein, A. Joulin, J. Kopf, and C. Liu, in IEEE CVPR, 2013, pp. 1939–1946. (Used the proposed saliency measure and showed that saliency-based segmentation produces state-of-the-art results on co-segmentation benchmarks, without using co-segmentation!)
  2. Image retrieval: Sketch2Photo: Internet Image Montage. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. ACM SIGGRAPH Asia. 28, 5, 124:1-10, 2009.
  3. SalientShape: Group Saliency in Image Collections. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu. The Visual Computer, 2013
  4. PoseShop: Human Image Database Construction and Personalized Content Synthesis. Tao Chen, Ping Tan, Li-Qian Ma, Ming-Ming Cheng, Ariel Shamir, Shi-Min Hu. IEEE TVCG, 19(5), 824-837, 2013.
  5. Internet visual media processing: a survey with graphics and vision applications, Tao Chen, Ping Tan, Li-Qian Ma, Ming-Ming Cheng, Ariel Shamir, Shi-Min Hu. The Visual Computer, 2013, 1-13.
  6. Image editing: Semantic Colorization with Internet Images, Yong Sang Chia, Shaojie Zhuo, Raj Kumar Gupta, Yu-Wing Tai, Siu-Yeung Cho, Ping Tan, Stephen Lin, ACM SIGGRAPH Asia. 2011.
  7. View selection: Web-Image Driven Best Views of 3D Shapes. H Liu, L Zhang, H Huang. The Visual Computer, 2011. Accepted.
  8. Image Collage: Arcimboldo-like Collage Using Internet Images. H Huang, L Zhang, HC Zhang. ACM SIGGRAPH Asia, 30(6), 2011.
  9. Image manipulation: Data-Driven Object Manipulation in Images. Chen Goldberg, T Chen, FL Zhang, A Shamir, SM Hu. Eurographics 2012.
  10. Saliency For Image Manipulation, R. Margolin, L. Zelnik-Manor, and A. Tal, Computer Graphics International (CGI) 2012.
  11. Mobile Product Search with Bag of Hash Bits and Boundary Reranking,  Junfeng He, Xianglong Liu, Tao Cheng, Jinyuan Feng, Tai-Hsu Lin, Hyunjin Chung and Shih-Fu Chang, IEEE CVPR, 2012.
  12. Unsupervised Object Discovery via Saliency-Guided Multiple Class Learning, Jun-Yan Zhu, Jiajun Wu, Yichen Wei, Eric Chang, and Zhuowen Tu, IEEE CVPR, 2012.
  13. Saliency Detection via Divergence Analysis: A Unified Perspective, ICPR 2012 (Best student paper). (The authors of this ICPR paper have derived that our formulation on global saliency has a deep connection with an information-theoretic measure, the so called Cauchy-Schwarz divergence.)
  14. Much more: http://scholar.google.com/scholar?cites=9026003219213417480

Q4: I'm confused about the definition of saliency. Why are the annotation formats (isolated points, binary mask regions, and bounding boxes) in different benchmarks for evaluating saliency detection methods so different?

A: There are 3 different saliency detection directions: i) fixation prediction, ii) salient object detection, iii) objectness estimation. They have very different research targets and very different applications. Personally, I'm mainly interested in the last two problems and will discuss them in a bit more detail.

Eye fixation models aim at predicting where humans look, i.e. a small set of fixation points. The most famous method in this area is Itti's work in PAMI 1998. The MIT benchmark is designed for evaluating such methods.

Salient object detection, which is what this work does, aims at finding the most salient object in a scene and segmenting the whole extent of that object. The output is typically a single saliency map (or figure-ground segmentation). The advantages and disadvantages are described in detail in Q3. High precision is a major focus of our work, as we can use shape matching based techniques to effectively select good segmentations and build robust applications on top. The most widely used benchmark for evaluating this problem is MSRA1000, which provides precise segmentations of 1000 salient objects in MSRA images. Our method achieves 93% precision and 90% recall on MSRA1000 (previous best reported results: 75% precision and 83% recall). Since our results on MSRA1000 are mostly comparable to the ground truth annotations, we need more challenging benchmarks. MSRA10K and THUR15K were built for this purpose.

Objectness estimation is another attractive direction. These methods aim at proposing a small set (typically 1000) of bounding boxes to improve the efficiency of the classical sliding window pipeline. High recall with a small set of bounding box proposals is the major target. PASCAL VOC is a standard dataset for evaluating this problem. Using purely bottom-up, data-driven methods to produce a single saliency map, as most salient object detection models do, is less likely to succeed on this very challenging dataset. State-of-the-art objectness proposal methods (PAMI12, IJCV13) achieve 90+% recall on the challenging PASCAL VOC dataset given a relatively small number (e.g. 1000) of bounding boxes, while being computationally efficient (4 seconds per image). This is especially useful for speeding up multi-class object detection, as each classifier only needs to examine a much smaller number of image windows (e.g. 1,000,000 -> 1,000).

Q5: In nearly all of the 300+ papers citing this work, the F-Measure of the RC method used for comparison is significantly lower than the one reported in this paper. Why?

A: Our salient object segmentation involves a powerful SaliencyCut method, for which we have not yet released the source code (it will be released only after the journal version has been published). The high performance of our salient object segmentation method can easily be verified by running our published binary code. When reporting the F-Measure of our method, most papers use an adaptive threshold to get segmentation results, which produces much worse results than our original version. This is reasonable to some extent and makes the comparison easier, as they do not have access to our SaliencyCut code. Note that our method achieves a 92% F-Measure on the MSRA benchmark, and I have not yet seen any other method achieve an F-Measure better than 90% (achieved by our CVPR11 version). It is worth mentioning that even the latest GrabCut method only achieves 'comparable' performance (F-Measure: 89%) on the same benchmark (see "Grabcut in One Cut", Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov, ICCV, 2013).
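For readers reproducing the adaptive-threshold evaluation mentioned above, the following sketch shows one common form of it. It assumes the adaptive threshold is twice the mean saliency of the image (the rule used in FT [1]) and that the F-Measure is weighted with β² = 0.3, which is common in this literature; this page does not state which rule each citing paper used, so treat both as assumptions. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Adaptive threshold for one image: twice its mean saliency (assumed rule).
// Saliency values are expected to be normalized to [0, 255].
double adaptiveThreshold(const std::vector<std::uint8_t>& saliency) {
    if (saliency.empty()) return 0.0;
    const double sum = std::accumulate(saliency.begin(), saliency.end(), 0.0);
    return 2.0 * sum / static_cast<double>(saliency.size());
}

// Weighted F-Measure: (1 + beta2) * P * R / (beta2 * P + R).
// beta2 = 0.3 is an assumed default that weights precision more than recall.
double fMeasure(double precision, double recall, double beta2 = 0.3) {
    const double denom = beta2 * precision + recall;
    return denom > 0.0 ? (1.0 + beta2) * precision * recall / denom : 0.0;
}
```

As a sanity check on the formula, precision 0.93 and recall 0.90 (the MSRA1000 figures quoted in Q4) give an F-Measure of roughly 0.92 under these assumptions.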

Q6: The benchmarks you use all have center bias. Will this be a problem?

A: Regarding the center bias, this seems to be a natural bias in real-world images. In the salient object detection community, most methods try to detect the most dominant object rather than deal with complicated images, where many objects exist with complicated occlusions, etc. Dealing with even just these simple ('Flickr like') images is quite useful for many applications (see Q3). Even when trained on thousands of accurately labeled images, state-of-the-art object detection methods still cannot get robust results for PASCAL VOC like images. For salient object detection algorithms, robustness can come from automatically selecting good results from thousands of images, for which we get automatic segmentation results for free (no need for training data annotation). See 'SalientShape: Group Saliency in Image Collections' for a dataset of unselected, automatically downloaded Flickr images (which also has a clear center bias) as well as the aforementioned applications.

Links to source code of other methods

FT [1] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk, “Frequency-tuned salient region detection,” in IEEE CVPR, 2009, pp. 1597–1604.
AIM [2] N. Bruce and J. Tsotsos, “Saliency, attention, and visual search: An information theoretic approach,” Journal of Vision, vol. 9, no. 3, pp. 5:1–24, 2009.
MSS [3] R. Achanta and S. Süsstrunk, “Saliency detection using maximum symmetric surround,” in IEEE ICIP, 2010, pp. 2653–2656.
SEG [4] E. Rahtu, J. Kannala, M. Salo, and J. Heikkila, “Segmenting salient objects from images and videos,” ECCV, pp. 366–379, 2010.
SeR [5] H. Seo and P. Milanfar, “Static and space-time visual saliency detection by self-resemblance,” Journal of Vision, vol. 9, no. 12, pp. 15:1–27, 2009.
SUN [6] L. Zhang, M. Tong, T. Marks, H. Shan, and G. Cottrell, “SUN: A bayesian framework for saliency using natural statistics,” Journal of Vision, vol. 8, no. 7, pp. 32:1–20, 2008.
SWD [7] L. Duan, C. Wu, J. Miao, L. Qing, and Y. Fu, “Visual saliency detection by spatially weighted dissimilarity,” in IEEE CVPR, 2011, pp. 473–480.
IM [8] N. Murray, M. Vanrell, X. Otazu, and C. A. Parraga, “Saliency estimation using a non-parametric low-level vision model,” in IEEE CVPR, 2011, pp. 433–440.
IT [9] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE TPAMI, vol. 20, no. 11, pp. 1254–1259, 1998.
GB [10] J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in NIPS, 2007, pp. 545–552.
SR [11] X. Hou and L. Zhang, “Saliency detection: A spectral residual approach,” in IEEE CVPR, 2007, pp. 1–8.
CA [12] S. Goferman, L. Zelnik-Manor, and A. Tal, “Context-aware saliency detection,” in IEEE CVPR, 2010, pp. 2376–2383.
LC [13] Y. Zhai and M. Shah, “Visual attention detection in video sequences using spatiotemporal cues,” in ACM Multimedia, 2006, pp. 815–824.
AC [14] R. Achanta, F. Estrada, P. Wils, and S. Süsstrunk, “Salient region detection and segmentation,” in IEEE ICVS, 2008, pp. 66–75.
CB [15] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li, “Automatic salient object segmentation based on context and shape prior,” in British Machine Vision Conference, 2011, pp. 1–12.
LP [16] T. Judd, K. Ehinger, F. Durand, and A. Torralba, “Learning to predict where humans look,” in ICCV, 2009.
Comments
Anil

Can I get a Matlab or MEX implementation of the RC method?

sunny

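Prof. Cheng: do you have the code for the color quantization and smoothing part of your "Global Contrast based Salient Region Detection" paper?

Yan

Prof. Cheng, I would like to cite the RC method from your PAMI paper. Does the PAMI paper have specific publication details yet, such as the year, issue, and page numbers?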

Sayali

Why is salient object segmentation needed after detecting the salient object? Why do we do segmentation?

Ming-Ming Cheng

It's a requirement of many associated applications. Please refer to FAQ 3 for some of those applications.

Sayali

Do the RC and adaptive thresholding methods both give salient object detection results?

朱玲

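Hello Prof. Cheng, how did you plot your precision-recall curves? Thank you!

Ming-Ming Cheng

It is in the code I shared, which generates these curves automatically.

朱玲

Prof. Cheng, I would appreciate one more pointer. These precision and recall curves are all irregular, so how should I plot my experimental results together with the curves of several compared methods in a single chart? Which plotting software did you use, and how do I use it? I hope to hear back from you, thank you!

fuxiuliang

As in the photo at the top of this page, how do you get a binary image from the saliency map?

Ming-Ming Cheng

Please read the paper first. Use the SaliencyCut method proposed in the paper; the code is open source.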

WangMei

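The 1000-image dataset (original images) from Achanta's paper can no longer be found at the link given in the original paper. Is there another download link?

WangMei

Thank you for your reply! What I am asking about is the 1000-image dataset, the one shown in Figure 12 of your paper: (i) Achanta et al. [33] dataset (1,000 images).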

Heverton Sarah

Hi,

I would like to know exactly where in your code for the HC method I can get the saliency value of each pixel. Is it in CmSaliencyRC, inside this "for" loop:

for (int r = 0; r < img3f.rows; r++){
    float* salV = salHC1f.ptr<float>(r);
    int* _idx = idx1i.ptr<int>(r);
    for (int c = 0; c < img3f.cols; c++)
        salV[c] = colorSal[_idx[c]];
}

Is the saliency of each pixel salV[c]?

WangMei

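Is it the case that none of your SaliencyCut source code can be modified, apart from the main function?

Wuyingxue

I ran into this problem too. Apart from the main function, breakpoints set in the rest of the code have no effect, and changing the code does not seem to do anything either. How did you solve it?

符俢亮

Hello Prof. Cheng, I downloaded the code package for your 2014 paper "Global Contrast based Salient Region Detection", but when I run it, it keeps reporting that cmlibd.lib cannot be opened. Could you tell me how to fix this?

符俢亮

Thank you.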

谢贵阳

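Hello, I ran into the same problem as you. After setting the Saliency project as the startup project, I still get the error: 1 error LNK1104: cannot open file "CmLibd.lib" D:\opensource\CmCode\Saliency\LINK Saliency. I hope you can tell me how you configured it. Thank you.

符俢亮

I changed it, but the same problem still occurs. I don't know what the exact cause is.

alan

Hello, I also ran into the same problem. Have you solved it? If so, please let me know, thank you.

Ls

If you build in Debug, change CmLib Properties -> General -> Target Name from $(ProjectName) to $(ProjectName)d and then build CmLib on its own. The file CmLibd.lib will then appear under CmCode-master\Debug, and you just need to add that path to the library directories of the Saliency project. If you build in Release, you only need to build CmLib and then add the CmCode-master\Release path to the library directories of the Saliency project.

WangMei

Rename the CmLib.lib generated by CmLib to CmLibd.lib and copy it into the root directory of the Saliency program.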

loujing

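Hello Prof. Cheng, sorry to bother you. I have submitted an English-language paper on image saliency. In it I refer to this paper of yours and, in the experiments, use some of the result images you published here. I have cited the corresponding reference number for your paper in the text and in the figure captions.
The editorial office now asks me to show that I have obtained copyright permission to use these images in my paper. As this is the first time I have encountered this situation, may I ask how I should obtain your authorization?
Thank you, I look forward to your reply.

loujing

Yes, thank you for your help.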

Heverton

Hi! Do you think your method can get good results on a GPU for real-time video?

Yaping

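How does your saliency map segmentation algorithm perform on saliency maps from FT, LC, and SR? Or does it have to be paired with your RC? Sorry, I have only just started learning, so my question may be rather naive; I would appreciate your answer, Prof. Cheng.

Ming-Ming Cheng

Generally speaking, the better the saliency map, the better the subsequent segmentation result.

Yaping

Hello Prof. Cheng, roughly when will your saliency cut code be made public?

Ming-Ming Cheng

I am currently cleaning up my code and plan to release it soon.

luzi

Hello, I implemented the SaliencyCut method described in your paper on top of OpenCV, but the results differ considerably. Is the GrabCut algorithm you use different from the one built into OpenCV?
OpenCV's built-in grabcut does not let me specify an undetermined region; I can only choose between foreground and background. How should this be handled? Thank you!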

姜飞

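Hello, may I ask whether you were able to get the code running? I am using VS2013 + OpenCV 2.4.9 and the environment is fully configured, but compilation still fails. How did you get it to run?

Robbor

Hello, how is the recall rate mentioned in your paper computed? Is it obtained by manually counting after segmenting the test image set?

Li

Hello, on which project page can I find the public code that computes precision and recall and generates the curves? I could not find it; please let me know. Thank you very much!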

xiezhihong

use to study

xiezhihong

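May I ask: the objects in your experimental images appear to form closed regions. I am not sure how well the method works when the target region is eroded in bits and pieces by the background. If it is convenient, I could send you an image to try.

lhy

I did not expect to run into Prof. Xie here!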

yuan zhan

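Hello, I would like the code for saliency-based image segmentation,
that is, the code behind the segmentation results in the images under the heading (Salient Object Detection and Segmentation). I would like to look at your code and run it in my Matlab. Your saliency results look very good and I would like to learn from them. Thank you.

Ju Ran

Why are the PR curves on the Achanta1000 dataset on this web page not quite the same as those in the paper? In the paper the precision peaks at about 0.9, while here it reaches nearly 1. Was some post-processing applied? Thank you!

WangMei

I have never managed to download the Achanta1000 dataset. Could you share it? Thank you! wang.melody@qq.com

He Lan

Regarding salient object detection, is there any code you could give me for reference?

He Lan

Global contrast based salient region detection: hello, could you give me the code for this paper as a reference?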
