BING: Binarized Normed Gradients for Objectness Estimation at 300fps
Ming-Ming Cheng1 Ziming Zhang2 Wen-Yan Lin3 Philip Torr1
1The University of Oxford 2Boston University 3Brookes Vision Group
Abstract
Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundaries can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows into a small fixed size. Based on this observation, and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure.
We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high-quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals. By increasing the number of proposals and the color spaces used for computing BING features, our performance can be further improved to 99.5% DR.
Papers
- BING: Binarized Normed Gradients for Objectness Estimation at 300fps, Ming-Ming Cheng, Yun Liu, Wen-Yan Lin, Ziming Zhang, Paul L. Rosin, Philip H. S. Torr, Computational Visual Media 5(1):3-20, 2019. [Project page][pdf][bib] (Extension of CVPR 2014 Oral)
- BING: Binarized Normed Gradients for Objectness Estimation at 300fps. Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, Philip Torr, IEEE CVPR, 2014. [Project page][pdf][bib][C++][Latex][PPT, 12 min] [Seminar report, 50 min] [Poster] [Spotlight, 1 min] (Oral, Accept rate: 5.75%)
Most related projects on this website
- SalientShape: Group Saliency in Image Collections. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu. The Visual Computer 30 (4), 443-453, 2014. [pdf] [Project page] [bib] [latex] [Official version]
- Efficient Salient Region Detection with Soft Image Abstraction. Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook. IEEE International Conference on Computer Vision (IEEE ICCV), 2013. [pdf] [Project page] [bib] [latex] [official version]
- Global Contrast based Salient Region Detection. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip Torr, Shi-Min Hu. IEEE TPAMI, 2014. [Project page] [Bib] [Official version] (2nd most cited paper in CVPR 2011)
Spotlights Video (17MB Video, pptx)
Figure. Tradeoff between #WIN and DR (see [3] for more comparisons with other methods [6, 12, 16, 20, 25, 28, 30, 42] on the same benchmark). Our method achieves 96.2% DR using 1,000 proposals, and 99.5% DR using 5,000 proposals.
Table 1. Average computational time on VOC2007.
Table 2. Average number of atomic operations for computing objectness of each image window at different stages: calculate normed gradients, extract BING features, and get objectness score.
Figure. Illustration of the true positive object proposals for VOC2007 test images.
Downloads
The C++ source code of our method is publicly available for download. OpenCV-compatible VOC 2007 annotations can be found here. Since the VOC website is blocked in mainland China, we also provide mirror download links: Baidu Pan download, mirror download. A Matlab file for making the figure plots in the paper. Results for VOC 2007 (75MB). We have not applied for any patent on this system, encouraging free use by both academic and commercial users.
Links to most related works:
- Measuring the objectness of image windows. Alexe, B., Deselaers, T. and Ferrari, V. PAMI 2012.
- Selective Search for Object Recognition, Jasper R. R. Uijlings, Koen E. A. van de Sande, Theo Gevers, Arnold W. M. Smeulders, International Journal of Computer Vision, Volume 104 (2), page 154-171, 2013
- Category-Independent Object Proposals With Diverse Ranking, Ian Endres, and Derek Hoiem, PAMI February 2014.
- Proposal Generation for Object Detection using Cascaded Ranking SVMs. Ziming Zhang, Jonathan Warrell and Philip H.S. Torr, IEEE CVPR, 2011: 1497-1504.
- Learning a Category Independent Object Detection Cascade. E. Rahtu, J. Kannala, M. B. Blaschko, IEEE ICCV, 2011.
- Generating object segmentation proposals using global and local search, Pekka Rantalankila, Juho Kannala, Esa Rahtu, CVPR 2014.
- Efficient Salient Region Detection with Soft Image Abstraction. Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook. IEEE ICCV, 2013.
- Global Contrast based Salient Region Detection. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip Torr, Shi-Min Hu. IEEE TPAMI, 2014. (2nd most cited paper in CVPR 2011).
- Geodesic Object Proposals. Philipp Krähenbühl and Vladlen Koltun, ECCV, 2014.
Suggested detectors:
The proposals need to be verified by a detector before they can be used in real applications. Our proposal method is a perfect match for the major speed limitation of the following state-of-the-art detectors (please email me if you have other suggestions as well):
- Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, R. Girshick, J. Donahue, T. Darrell, J. Malik, IEEE CVPR (Oral), 2014. (Code; achieves best ever reported performance on PASCAL VOC)
- Fast, Accurate Detection of 100,000 Object Classes on a Single Machine, CVPR 2013 (best paper).
- Regionlets for Generic Object Detection, ICCV 2013 oral. (Runner-up in the ImageNet large-scale object detection challenge)
Recent methods
- Data-driven Objectness, IEEE TPAMI, in print.
Applications
If you have developed some exciting new extensions, applications, etc, please send a link to me via email. I will add a link here:
- CNN: Single-label to Multi-label, Yunchao Wei, Wei Xia, Junshi Huang, Bingbing Ni, Jian Dong, Yao Zhao, Shuicheng Yan, arXiv, 2014
Third party resources.
If you have made a version running on other platforms (e.g. Mac, Linux, vs2010, makefile projects) and want to share it with others, please send me an email containing the URL and I will add a link here. Note that these third-party versions may or may not contain the updates and bug fixes that I provide in the next section of this webpage.
- Linux version of this work provided by Shuai Zheng from the University of Oxford.
- Linux version of this work provided by Dr. Ankur Handa from the University of Cambridge.
- Unix version of this work provided by Varun from University of Maryland.
- OpenCV version (doc) of this work by Francesco Puja et al.
- Matlab version of this work by Tianfei Zhou from Beijing Institute of Technology
- Matlab version (works with 64-bit Win7 & Visual Studio 2012) provided by Jiaming Li from the University of Electronic Science and Technology of China (UESTC).
Bug fix
- 2014-4-11: There was a bug in the Objectness::evaluatePerImgRecall(..) function. After the update, the DR-#WIN curve looks slightly better for high values of #WIN. Thanks to YongLong Tian and WangLong Wu for reporting the bug.
FAQs
Since the release of the source code 2 days ago, 500+ students and researchers have downloaded it (according to email records). Here are some frequently asked questions from users. Please read the FAQs before sending me new emails. Questions already answered in the FAQs will not receive a reply.
1. I downloaded your code but can't compile it in Visual Studio 2008 or 2010. Why?
I use Visual Studio 2012 for development. The shared source code is only guaranteed to work under Visual Studio 2012, although the algorithm itself doesn't rely on any Visual Studio 2012-specific features. Some users have already reported getting a Linux version running that achieves 1000fps on a desktop machine (my 300fps was tested on a laptop). If you get the code running on a different platform and want to share it with others, I'm very happy to add links from this page. Please contact me via email to do this.
2. I ran the code but the results are empty. Why?
Please check whether you have downloaded the PASCAL VOC data (2 zip files for training and testing) and put them in ./VOC2007/. The original VOC annotations cannot be read directly by OpenCV; I have shared a version that is compatible with OpenCV (https://mmcheng.net/code-data/). After unzipping all 3 data packages, put them in the same folder and run the source code.
3. What’s the password for unzip your source code?
Please read the notice in the download page. You can get it automatically by supplying your name and institute information.
4. I got a testing speed different from 300fps. Why?
If you are using 64-bit Windows and Visual Studio 2012, the default settings should be fine. Otherwise, please make sure to enable OpenMP and native SSE instructions. In any case, speed should be tested in release mode rather than debug mode. Don't uncomment commands for showing progress, e.g. printf("Processing image: %s", imageName). When the algorithm runs at hundreds of fps, printf, image reading (an SSD hard disk helps here), etc. may become the speed bottleneck. The running speed may also differ depending on your hardware. To eliminate the influence of hard-disk image reading speed, I preload all testing images before timing the prediction. Only 64-bit machines support such a large amount of memory for a single program; if your RAM is small, the preloading may cause hard-disk paging, also resulting in slow running times. Typical reported speeds range from 100fps (typical laptop) to 1000fps (powerful desktop).
5. After increasing the number of proposals to 5,000, I got only a 96.5% detection rate. Why?
Please read through the paper before using the source code. As explained in the abstract, 'Increasing the numbers of proposals and color spaces … improved to 99.5% DR'. Using three different color spaces can be enabled by calling "getObjBndBoxesForTests" rather than the default "getObjBndBoxesForTestsFast" in the demo code.
6. I got compilation or linking errors like: can't find "opencv2/opencv.hpp", error C1083: can't find "atlstr.h".
These all relate to standard libraries. Please copy the error message and search on Google for answers.
7. Why linear SVMs and gradient magnitudes? These are so simple; alternatives like *** could be better, and I got some improvements by using them. Some implementation details could be improved as well.
Yes, there are many possibilities for improvement, and I'm glad to hear that people have already obtained improvements (it is nice to receive these emails). Our major focus is the very simple observation about the things vs. stuff distinction (see Section 3.1 in our CVPR14 paper). We tried to model it as simply and efficiently as possible. The implementation details are also not guaranteed to be optimal, and there is room to improve (I'm glad to receive such suggestions via email as well).
8. Like many other proposal methods, the BING method generates many proposal windows. How can I distinguish the windows I want from the others?
As with many other proposal methods (PAMI 2012, IJCV 2013, PAMI 2014, etc.), the number of proposals typically goes up to a few thousand. To get the actual detection results, you still need to apply a detector. A major advantage of proposal methods is that the detector can ignore most (up to 99%) of the image windows in the traditional sliding window pipeline, while still checking 90+% of the object windows. See the 'Suggested detectors' section on this webpage for more details.
9. Is there any step by step guidance of using the source code?
Please see the readme document for details about where to download the data, where to put the files, and advice for achieving maximal speed.
10. Could you give a detailed step by step example of how to get binary normed gradient map from normed gradient map?
The simple method for obtaining binary normed gradients (binary values) from normed gradients (BYTE values) is described in detail in Sec. 3.3 of our CVPR 2014 paper (the paragraph above Equation 5). Here is a simple example to aid understanding: the binary representation of the BYTE value 233 is 11101001. We can take its top 4 bits, 1110, to approximate the original BYTE value. If you recover a BYTE value from the 4 binary bits 1110, you get the approximate value 224.
11. Is there any intuitive explanation of the objectness scores, i.e. s_l in equation (1) and O_l in equation (3) ?
The bigger these scores are, the more likely the window is to contain an object. Although BING is a good feature for generating object proposals, it is still not good enough to produce object detection results on its own (see also FAQ 8). We can consider the number of object windows as a computational budget, and we want high recall within this budget. Thus we typically select the top n proposals according to these scores, even though a score might be negative (which does not necessarily mean a non-object window). The value s_l measures how well the window matches the template. O_l is the score after calibration, used to rank proposals of more likely sizes (e.g. 160*160) higher than proposals of less likely sizes (e.g. 10*320). The calibration parameters can be considered per-size bias terms.
12. Typos on the project page, imperfect replies to posts, misspelled English words in the C++ source code, unanswered emails, etc.
I apologize for my limited language ability. Please report such typos, etc. to me via personal email. It would also be more than welcome if you simply repost in case I missed replying to some important information.
I am careless and quite often forget to reply to some emails. If you think your query or suggestion is important but has not received a reply within 5 working days, please simply resend the email.
13. Problems when calling the format() function.
Some users encountered errors caused by the format() function in the source code. This is a standard OpenCV API function; note that the proper version of OpenCV needs to be linked. std::string is not binary-compatible across different versions of Visual Studio, so you must link to the version appropriate for your compiler. Be careful with the confusing version naming in Visual Studio: Visual Studio 2005 (VC8), Visual Studio 2008 (VC9), Visual Studio 2010 (VC10), Visual Studio 2012 (VC11), Visual Studio 2013 (VC12).
14. What is the format of the returned bounding boxes, and how can I illustrate the boxes as in the paper?
We follow the standard PASCAL VOC bounding box definition, i.e. [minX, minY, maxX, maxY]. You can refer to the Objectness::illuTestReults() function to see how the illustration is done.
15. Discussions in CvChina
There are 400+ discussions about this project at http://www.cvchina.info/2014/02/25/14cvprbing/ (in Chinese). You may find answers to your questions there.
Prof. Cheng,
A question about saliency detection: do you think salient object detection that only produces a saliency map counts as detection, or should it only be called salient object segmentation? In the title of a recent paper of mine I used 'salient object detection', but the results only gave saliency maps, and a reviewer said I 'have not even grasped the basic concepts'. The reviewer argued that object detection results should express object location information, e.g. boxing the object, and that 'results giving only a saliency map are segmentation, not detection'. I don't know whether the reviewer is right and would like to hear your opinion.
Since many papers at CVPR, ICCV, and ECCV report saliency maps as salient object detection results, I don't know whether that reviewer is the one confusing the basic concepts, or whether everyone routinely overlooks the distinction between detection and segmentation when writing.
I think producing only a saliency map already counts as salient object detection; a segmentation result is not necessarily required.
Hello Prof. Cheng,
In the results in ResultsBBoxesB2W8MAXBGR (example below), is the first float in each row the score of the proposal? Why are these scores all negative? Do the higher-ranked proposals match the template better?
-0.307949, 1, 257, 353, 500
-0.349893, 1, 1, 353, 500
-0.364906, 97, 1, 352, 500
-0.4157, 1, 33, 353, 288
Higher-ranked proposals are better. It is the relative magnitudes of these scores that matter; the specific values can be ignored.
The absolute values of the scores are meaningless; the relative ordering is what we want.
Hello, I have been trying for a long time but still can't get this code to run. May I ask you for help? QQ 814164907. Many thanks.
I have just started learning object detection. When running the code, why do I see "2501 training and 0 testing"? Thanks!
Most likely you did not download the test data as required by the readme.
How did you solve this problem in the end? Was it really an issue with the test set?
Hello Prof. Cheng:
I added my own images to your program. They contain no objects, yet, just like the images with objects, they still yield many boxes (which I drew on the images). Moreover, even when I set the bounding box coordinates to zero in the .yml files of the object-free images, I still get illustrated boxes. This puzzles me; I hope you can explain. Thank you.
Please see the FAQ part on the difference and relationship between proposals and detection. Once you understand it, the boxes you "drew" will make sense.
Prof. Cheng, what do the solver_type options mean when training the SVM? I looked through the code but found no comments on them.
That is part of LibLinear; you can look it up on the LibLinear website.
Hello Prof. Cheng,
BING has no visualization output by default, right? How can I display rectangles like those in the paper?
The shared code contains an illustrate function; it is just commented out by default. Uncomment it and it will work.
Hello Prof. Cheng:
The program takes the box in boxTest that best matches a test sample's gtTestBoxes as that sample's proposal. If there is no gtTestBoxes information for the test samples, how can proposals be obtained? Is there a solution? I hope you can answer, thank you.
You have misunderstood this part of the program. The proposals themselves have nothing to do with gtTestBoxes. However, directly illustrating all proposals looks very cluttered, so gtTestBoxes is only used for illustration. For details, see the FAQ part on the difference and relationship between proposals and detection.
Hello Prof. Cheng,
I am new to this area and would like to know what "objectness proposal generation" means. No Chinese translation I try reads smoothly.
Thank you.
I also feel this concept is clear in English but hard to translate. I might translate "objectness" as 似物性 ("object-likeness").
Someone on Weibo suggested translating it as 类物体区域采样 ("object-like region sampling"), which I think is good.
Hello Prof. Cheng,
I get the following error when running the program; I hope you can help:
OpenCV Error: Assertion failed <matRead && matRead> in Objectness::trainStageI, file Objectness.cpp.
Also, the earlier database step reports "0 training and 0 testing".
No data was loaded. Please check whether the data path is set correctly.
Hello Prof. Cheng, where exactly should the data path be set? I followed the readme instructions but still have this problem.
Hello, did you solve your problem? I have the same one.
Perhaps you need to set the path like this: "//BING-Objectness/VOC2007/".
The trailing slash is important.
Hi, I set the path as in the readme file, and someone else also has the following errors, but I still haven't found the right answer; could you help me figure this out? In main.cpp I use DataSetVOC voc2007("/home/VOC2007/"). I didn't change any other path in the other files. For the annotations, I use the .yml files downloaded from your website. Thanks.
Invalidate class name
in /home/BING_Linux/Src/DataSetVOC.cpp:125
OpenCV Error: Assertion failed (Invalidate class name
) in loadBox, file /home/BING_Linux/Src/DataSetVOC.cpp, line 125
terminate called after throwing an instance of ‘cv::Exception’
what(): /home/BING_Linux/Src/DataSetVOC.cpp:125: error: (-215) Invalidate class name
in function loadBox
Hello Prof. Cheng:
Where do I change the paths to the training and test samples?
In the main function.
Hello Prof. Cheng, I want to display all the obtained boxes on the image. How do I do that?
Prof. Cheng, a question: I recently had a paper accepted at ICIP 2014, probably as a poster. Do you think a conference like ICIP is worth attending? It is held in France this time.
I am not sure either. The paper can't be published without registering, right? If your institution pays, why not go? You can take a look and join the discussions. If your institution doesn't pay, it depends on your own finances.
The school does provide funding. OK, thanks for your advice, Prof. Cheng. It is good to see the world a bit, even if the conference itself isn't great.
Prof. Cheng, I trained on the VOC2007 images with your program and then tested on the MSRA 5000 dataset, and the results are not good. Is this related to the training samples? Do I need to retrain for the MSRA dataset? The MSRA images are all simple; if even the hard VOC2007 images can be handled well, why does it fail on an easier dataset?
This project generates proposals; it does not do detection directly. I am not sure what you mean by "not good". How did you evaluate the results? Did you export the annotated bounding boxes of the MSRA dataset and verify against them (the default annotation formats differ)? Also, for images containing only a few salient objects, the method at https://mmcheng.net/salobj/ may be closer to what many applications need.
I mainly want to use your method's detection results to generate a salient-object prior map for subsequent salient object detection. This idea was used in an ICCV 2013 salient object detection paper, which used the code of the "What is an object?" CVPR paper. Since your paper outperforms that one on object detection, in theory the generated prior map should be better than the ICCV 2013 result, but it is not, which puzzles me. I judge quality by the generated prior map.
BING directly predicts proposals; it does not produce a "prior map" as an intermediate result. I do not know how you obtain a prior map from this method. If you want a saliency map, I suggest: https://mmcheng.net/salobj/
OK, thanks; I probably misunderstood. Another question, not about your paper but about the IJCV 2013 paper you compare against (Selective Search for Object Recognition): I emailed the authors long ago with no reply. You must have read the paper; I find that it only outputs proposals without scoring each one, i.e., we do not know which proposal is more likely to contain an object. Is my understanding correct?
Yes, the IJCV 2013 paper does not score each proposal. They probably receive too many emails to reply to every one.
Thanks.
(^__^)
Hello Prof. Cheng, I downloaded the BING code provided on your website and found that unzipping requires a password. Is the source code not public?
Please see the notice in red on the download page; the unzip password can be obtained automatically.
Prof. Cheng, does your released code include code for generating the .yml files? I want to train on samples other than PASCAL VOC but don't know how to generate the .yml files.
I believe it includes a Matlab conversion script found online. Using it directly causes some small problems, so at the time I also wrote a simple string-processing program to fix them.
Where can I find that string-processing program?
Prof. Cheng, why aren't the scores of the generated object windows normalized to [0, 1]? Negative values are even allowed.
Comparing their relative magnitudes is sufficient for now; I did not perform normalization.
Hello Prof. Cheng,
Are the proposal windows given in ./VOC2007/Local/ResIlu/ just one of the many correct proposals detected, or all of the correct windows among the detected proposals? Also, which function directly adjusts the number of generated proposals?
See the FAQ part on the difference and relationship between proposals and detection. The algorithm generates ranked proposals; you can decide how many you need according to your computational budget and simply take that many from the top.
Hello Prof. Cheng:
Are the images in ./VOC2007/Local/ResIlu/ result images or ground-truth annotation images? When I used part of the JPEGImages folder as test images and deliberately modified the object coordinates in the corresponding .yml files in the Annotations folder, the results in the ResIlu folder changed noticeably. This seems strange to me: the test images' .yml files should only be used to evaluate detection quality, so why does modifying their coordinates affect the detection results in ResIlu?
Please refer to the FAQ parts about proposals, detection, and illustration. Once you understand these three concepts, it will be clear.
Can I train and test solely on single channel images? I can see that you use MAXBGR, RGB and G, can I simply use only G? How much do you think it will affect performance?
Yes. You can simply use the G channel. The influence on performance is quite minor.
Hello Prof. Cheng! If I switch to my own images, what do I need to modify? If I only replace the images without other changes, the boxes still correspond to the original images' positions (matching the positions in the original .yml files).
I suggest reading the FAQ part on the difference between proposals and detection (FAQ 8). There are usually close to 1,000 proposals, many of which are false positives, so directly illustrating them is unreadable, …