BING: Binarized Normed Gradients for Objectness Estimation at 300fps

Ming-Ming Cheng1           Ziming Zhang2        Wen-Yan Lin3           Philip Torr1

1The University of Oxford     2Boston University      3Brookes Vision Group

Abstract

Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundaries can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows into a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure.

We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high quality object windows, yielding 96.2% object detection rate (DR) with 1,000 proposals. By increasing the number of proposals and the number of color spaces used for computing BING features, performance can be further improved to 99.5% DR.
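To make the 64D feature concrete, here is a minimal sketch of computing normed gradients for one window that has already been resized to 8 × 8. It is not taken from the released code: the function name `normedGradients`, the central-difference gradient with clamped borders, and the clipped L1 norm min(|gx| + |gy|, 255) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not from the released source): computes a 64D
// normed-gradient feature for an 8x8 grayscale window stored row-major.
// Assumption: gradients are central differences with clamped borders, and
// the norm is the L1 norm |gx| + |gy| clipped to the BYTE range [0, 255].
std::vector<std::uint8_t> normedGradients(const std::vector<std::uint8_t>& win) {
    assert(win.size() == 64);  // the window must already be resized to 8x8
    std::vector<std::uint8_t> ng(64, 0);
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            const int xl = std::max(x - 1, 0), xr = std::min(x + 1, 7);
            const int yu = std::max(y - 1, 0), yd = std::min(y + 1, 7);
            const int gx = int(win[y * 8 + xr]) - int(win[y * 8 + xl]);
            const int gy = int(win[yd * 8 + x]) - int(win[yu * 8 + x]);
            ng[y * 8 + x] = static_cast<std::uint8_t>(
                std::min(std::abs(gx) + std::abs(gy), 255));
        }
    }
    return ng;
}
```

A flat window gives an all-zero feature, while a window containing a closed boundary gives large values along that boundary, which is the cue the abstract describes.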

Papers

  1. BING: Binarized Normed Gradients for Objectness Estimation at 300fps, Ming-Ming Cheng, Yun Liu, Wen-Yan Lin, Ziming Zhang, Paul L. Rosin, Philip H. S. Torr, Computational Visual Media 5(1):3-20, 2019. [Project page][pdf][bib] (Extension of CVPR 2014 Oral)
  2. BING: Binarized Normed Gradients for Objectness Estimation at 300fps. Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, Philip Torr, IEEE CVPR, 2014. [Project page][pdf][bib][C++][Latex][PPT, 12 min] [Seminar report, 50 min] [Poster] [Spotlight, 1 min] (Oral, Accept rate: 5.75%)

Most related projects on this website

  • SalientShape: Group Saliency in Image Collections. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu. The Visual Computer 30 (4), 443-453, 2014. [pdf] [Project page] [bib] [latex] [Official version]
  • Efficient Salient Region Detection with Soft Image Abstraction. Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook. IEEE International Conference on Computer Vision (IEEE ICCV), 2013. [pdf] [Project page] [bib] [latex] [official version]
  • Global Contrast based Salient Region Detection. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip Torr, Shi-Min Hu. IEEE TPAMI, 2014. [Project page] [Bib] [Official version] (2nd most cited paper in CVPR 2011)

Spotlights Video (17MB Video, pptx)

Figure. Tradeoff between #WIN and DR (see [3] for more comparisons with other methods [6, 12, 16, 20, 25, 28, 30, 42] on the same benchmark). Our method achieves 96.2% DR using 1,000 proposals, and 99.5% DR using 5,000 proposals.

Table 1. Average computational time on VOC2007.

Table 2. Average number of atomic operations for computing objectness of each image window at different stages: calculate normed gradients, extract BING features, and get objectness score.

Figure.  Illustration of the true positive object proposals for VOC2007 test images.

Downloads

The C++ source code of our method is publicly available for download. OpenCV-compatible VOC 2007 annotations can be found here. Because the VOC website is blocked in mainland China, we provide mirror download links: Baidu Netdisk download, mirror download. Matlab file for making the figure plots in the paper. Results for VOC 2007 (75MB). We have not applied for any patent on this system, and we encourage free use by both academic and commercial users.

Links to most related works:

  1. Measuring the objectness of image windows. Alexe, B., Deselaers, T. and Ferrari, V. PAMI 2012.
  2. Selective Search for Object Recognition, Jasper R. R. Uijlings, Koen E. A. van de Sande, Theo Gevers, Arnold W. M. Smeulders, International Journal of Computer Vision, Volume 104 (2), page 154-171, 2013
  3. Category-Independent Object Proposals With Diverse Ranking, Ian Endres, and Derek Hoiem, PAMI February 2014.
  4. Proposal Generation for Object Detection using Cascaded Ranking SVMs. Ziming Zhang, Jonathan Warrell and Philip H.S. Torr, IEEE CVPR, 2011: 1497-1504.
  5. Learning a Category Independent Object Detection Cascade. E. Rahtu, J. Kannala, M. B. Blaschko, IEEE ICCV, 2011.
  6. Generating object segmentation proposals using global and local search, Pekka Rantalankila, Juho Kannala, Esa Rahtu, CVPR 2014.
  7. Efficient Salient Region Detection with Soft Image Abstraction. Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook. IEEE ICCV, 2013.
  8. Global Contrast based Salient Region Detection. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip Torr, Shi-Min Hu. IEEE TPAMI, 2014. (2nd most cited paper in CVPR 2011).
  9. Geodesic Object Proposals. Philipp Krähenbühl and Vladlen Koltun, ECCV, 2014.

Suggested detectors:

The proposals need to be verified by a detector before they can be used in real applications. Our proposal method directly addresses the major speed limitation of the following state-of-the-art detectors (please email me if you have other suggestions as well):

  1. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, R. Girshick, J. Donahue, T. Darrell, J. Malik, IEEE CVPR (Oral), 2014. (Code; achieves best ever reported performance on PASCAL VOC)
  2. Fast, Accurate Detection of 100,000 Object Classes on a Single Machine, CVPR 2013 (best paper).
  3. Regionlets for Generic Object Detection, ICCV 2013 oral. (Runner-up in the ImageNet large scale object detection challenge)

Recent methods

  1. Data-driven Objectness, IEEE TPAMI, in print.

Applications

If you have developed some exciting new extensions, applications, etc, please send a link to me via email. I will add a link here:

Third party resources.

If you have made a version that runs on other platforms (e.g. Mac, Linux, VS2010, makefile projects) and want to share it with others, please send me an email containing the URL and I will add a link here. Note that these third-party versions may or may not contain the updates and bug fixes that I list in the next section of this webpage.

  • Linux version of this work provided by Shuai Zheng from the University of Oxford.
  • Linux version of this work provided by Dr. Ankur Handa from the University of Cambridge.
  • Unix version of this work provided by Varun from University of Maryland.
  • OpenCV version (doc) of this work by Francesco Puja et al.
  • Matlab version of this work by Tianfei Zhou from Beijing Institute of Technology
  • Matlab version (works with 64-bit Win7 & Visual Studio 2012) provided by Jiaming Li from University of Electronic Science and Technology of China (UESTC).

Bug fix

  • 2014-4-11: There was a bug in the Objectness::evaluatePerImgRecall(..) function. After the update, the DR-#WIN curve looks slightly better for high values of #WIN. Thanks to YongLong Tian and WangLong Wu for reporting the bug.

FAQs

Since the release of the source code 2 days ago, 500+ students and researchers have downloaded it (according to email records). Here are some frequently asked questions from users. Please read the FAQs before sending me new emails. Questions already answered in the FAQs will not receive a reply.

1. I downloaded your code but can’t compile it in Visual Studio 2008 or 2010. Why?

I use Visual Studio 2012 for development, and the shared source code is guaranteed to work under Visual Studio 2012. The algorithm itself doesn’t rely on any Visual Studio 2012 specific features. Some users have already reported that they got a Linux version running and achieved 1000fps on a desktop machine (my 300fps was measured on a laptop). If you get my code running on a different platform and want to share it with others, I’m very happy to add a link from this page. Please contact me via email to do this.

2. I ran the code but the results are empty. Why?

Please check that you have downloaded the PASCAL VOC data (2 zip files for training and testing) and put them in ./VOC2007/. The original VOC annotations cannot be read directly by OpenCV. I have shared a version that is compatible with OpenCV (https://mmcheng.net/code-data/). After unzipping all 3 data packages, please put them in the same folder and run the source code.

3. What’s the password for unzipping your source code?

Please read the notice on the download page. You can get it automatically by supplying your name and institute information.

4. I got a testing speed different from 300fps. Why?

If you are using 64-bit Windows and Visual Studio 2012, the default settings should be fine. Otherwise, please make sure to enable OpenMP and native SSE instructions. In any case, speed should be tested in release mode rather than debug mode. Don’t uncomment the commands for showing progress, e.g. printf("Processing image: %s", imageName). When the algorithm runs at hundreds of fps, printf, image reading (an SSD would help here), etc. can become the speed bottleneck. Running speed also varies with hardware. To eliminate the influence of disk reading speed, I preload all testing images before starting the timer and running prediction. Only 64-bit machines support such a large amount of memory for a single program. If your RAM is small, such preloading might cause paging to disk, which also slows things down. Speeds that people typically report range from 100fps (typical laptop) to 1000fps (pretty powerful desktop).
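The timing advice above can be sketched as follows. This is an illustrative helper, not code from the release: `measureThroughput` and its parameters are hypothetical names, and the sketch assumes all items (e.g. decoded images) are already held in memory so that disk reads and progress printing stay outside the timed region.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `process` on every preloaded item and returns the
// throughput in items per second. Only the processing loop is timed; loading
// the items and printing progress are deliberately kept out of this function.
template <typename Item, typename Fn>
double measureThroughput(const std::vector<Item>& preloaded, Fn process) {
    assert(!preloaded.empty());
    const auto start = std::chrono::steady_clock::now();
    for (const Item& item : preloaded) {
        process(item);
    }
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    // Guard against a zero reading on very fast runs or coarse clocks.
    return seconds > 0.0 ? static_cast<double>(preloaded.size()) / seconds : 0.0;
}
```

In use, one would fill the vector with the test images first, then pass the prediction call as `process`, and build in release mode before reading the result.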

5. After increasing the number of proposals to 5000, I got only a 96.5% detection rate. Why?

Please read through the paper before using the source code. As explained in the abstract, increasing the number of proposals and the number of color spaces improves performance to 99.5% DR. Using three different color spaces can be enabled by calling “getObjBndBoxesForTests” rather than “getObjBndBoxesForTestsFast”, which is the default in the demo code.

6. I got compilation or linking errors like: can’t find “opencv2/opencv.hpp”, error C1083: can’t find “atlstr.h”.

These are all standard libraries. Please copy the error message and search Google for answers.

7. Why linear SVMs and gradient magnitudes? These are so simple, and alternatives like *** could be better; I got some improvements by using them. Some implementation details could be improved as well.

Yes, there are many possibilities for improvement and I’m glad to hear people have got some improvements already (it is nice to receive these emails). Our major focus is the very simple observation about the things vs. stuff distinction (see Section 3.1 in our CVPR14 paper). We try to model it as simply and as efficiently as possible. Implementation details are also not guaranteed to be optimal and there is room for improvement (I’m glad to receive such suggestions via email as well).

8. Like many other proposal methods, the BING method also generates many proposal windows. How can I distinguish the windows I expect from the others?

Like many other proposal methods (PAMI 2012, IJCV 2013, PAMI 2014, etc.), the number of proposals typically runs to a few thousand. To get real detection results, you still need to apply a detector. A major advantage of proposal methods is that the detector can ignore most (up to 99%) of the image windows in the traditional sliding window pipeline, but still check 90+% of the object windows. See the ‘Suggested detectors’ section on this webpage for more details.

9. Is there a step-by-step guide to using the source code?

Please see the readme document for details about where to download the data, where to put the files, and advice for getting maximal speed.

10. Could you give a detailed step-by-step example of how to get the binary normed gradient map from the normed gradient map?

The simple method of getting binary normed gradients (binary values) from normed gradients (BYTE values) is described in detail in Sec. 3.3 of our CVPR 2014 paper (the paragraph above equation 5). Here is a simple example to aid understanding: the binary representation of the BYTE value 233 is 11101001. We can take its top 4 bits, 1110, to approximate the original BYTE value. If you recover a BYTE value from the 4 bits 1110 (padding the low bits with zeros, i.e. 11100000), you get the approximate value 224.
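The 233 to 224 example can be reproduced with a couple of bit operations. The sketch below is only an illustration of that arithmetic; the function names `approximateByTopBits` and `bitPlane` are hypothetical and do not come from the released source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keeps the top `numBits` bits of a BYTE value and
// zeroes the remaining low bits, e.g. 11101001 -> 11100000 for numBits = 4.
std::uint8_t approximateByTopBits(std::uint8_t value, int numBits) {
    assert(numBits >= 1 && numBits <= 8);
    const std::uint8_t mask = static_cast<std::uint8_t>(0xFF << (8 - numBits));
    return static_cast<std::uint8_t>(value & mask);
}

// Hypothetical helper: returns the k-th most significant bit of a BYTE value
// (k = 1 is the top bit). Collecting bit k over a whole map gives one binary map.
int bitPlane(std::uint8_t value, int k) {
    assert(k >= 1 && k <= 8);
    return (value >> (8 - k)) & 1;
}
```

For the value 233, `bitPlane` yields 1, 1, 1, 0 for k = 1..4, and weighting those bits by 128, 64, 32 and 16 gives 224, the same result as `approximateByTopBits(233, 4)`.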

11. Is there an intuitive explanation of the objectness scores, i.e. s_l in equation (1) and o_l in equation (3)?

The larger these scores are, the more likely the window is to be an object window. Although the BING feature is a good feature for getting object proposals, it’s still not good enough to produce object detection results (see also FAQ 8). We can consider the number of object windows as a computation budget, and we want high recall within this budget. Thus we typically select the top n proposals according to these scores, even if a score is negative (a negative score does not necessarily mean a non-object window). The value s_l measures how well the window matches the template. The value o_l is the score after calibration, which ranks proposals of more likely sizes (e.g. 160*160) higher than proposals of less likely sizes (e.g. 10*320). The calibration parameters can be considered per-size bias terms.
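The selection step can be sketched as below. The sketch assumes the calibration has the linear per-size form o_l = v_i * s_l + t_i, where i indexes the window size; the struct `Proposal`, the function `rankProposals` and the parameter vectors `v` and `t` are illustrative names, and the released code may organise this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative proposal record (not the data structure of the released code).
struct Proposal {
    int sizeIndex;      // index i of the (quantized) window size
    float filterScore;  // s_l: how well the window matches the template
    float objectness;   // o_l: calibrated score, filled in by rankProposals
};

// Assumed calibration: o_l = v[i] * s_l + t[i] with one (v, t) pair per size.
// Keeps the n highest-scoring proposals, whether or not their score is negative.
void rankProposals(std::vector<Proposal>& props,
                   const std::vector<float>& v,
                   const std::vector<float>& t,
                   std::size_t n) {
    assert(v.size() == t.size());
    for (Proposal& p : props) {
        assert(p.sizeIndex >= 0 && static_cast<std::size_t>(p.sizeIndex) < v.size());
        p.objectness = v[p.sizeIndex] * p.filterScore + t[p.sizeIndex];
    }
    std::sort(props.begin(), props.end(),
              [](const Proposal& a, const Proposal& b) { return a.objectness > b.objectness; });
    if (props.size() > n) {
        props.resize(n);
    }
}
```

Because the cut is made purely by rank, n acts as the computation budget mentioned above, and no threshold at zero is applied.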

12. Typos on the project page, imperfect replies to posts, misspelled English words in the C++ source code, emails not answered, etc.

I apologize for my limited language ability. Please report such typos, etc. to me by personal email if you find them. You are also more than welcome to simply repost if I failed to reply to something important.

I am sometimes careless and quite often forget to reply to emails. If you think your queries or suggestions are important but you have not received a reply within 5 working days, please simply resend the email.

13. Problems when running the function format().

Some users ran into errors because the format() function in the source code did not work correctly. This is a standard OpenCV API function. Note that the proper version of OpenCV needs to be linked. It seems that std::string is not compatible across different versions of Visual Studio, so you must link to the OpenCV build that matches your version. Be careful with the strange name mapping in Visual Studio: Visual Studio 2005 (VC8), Visual Studio 2008 (VC9), Visual Studio 2010 (VC10), Visual Studio 2012 (VC11), Visual Studio 2013 (VC12).

14. What’s the format of the returned bounding boxes, and how can I illustrate the boxes as in the paper?

We follow the PASCAL VOC standard bounding box definition, i.e. [minX, minY, maxX, maxY]. You can refer to the Objectness::illuTestReults() function to see how the illustration was done.
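As an illustration of working with boxes in this format, here is a minimal intersection-over-union sketch. It assumes inclusive integer pixel coordinates (hence the +1 in widths and heights), which is the usual PASCAL VOC convention; the struct `Box` and the function names are hypothetical and not taken from the released code.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative box type in [minX, minY, maxX, maxY] order.
struct Box {
    int minX, minY, maxX, maxY;
};

// Area under the assumption that both corner pixels belong to the box.
int boxArea(const Box& b) {
    return (b.maxX - b.minX + 1) * (b.maxY - b.minY + 1);
}

// Intersection over union of two boxes; returns 0 when they do not overlap.
double intersectionOverUnion(const Box& a, const Box& b) {
    assert(a.minX <= a.maxX && a.minY <= a.maxY);
    assert(b.minX <= b.maxX && b.minY <= b.maxY);
    const int w = std::min(a.maxX, b.maxX) - std::max(a.minX, b.minX) + 1;
    const int h = std::min(a.maxY, b.maxY) - std::max(a.minY, b.minY) + 1;
    if (w <= 0 || h <= 0) {
        return 0.0;
    }
    const int inter = w * h;
    return static_cast<double>(inter) / (boxArea(a) + boxArea(b) - inter);
}
```

Under the common PASCAL VOC criterion, an annotated object counts as covered when some proposal overlaps it with an intersection over union above 0.5.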

15. Discussions in CvChina

There are 400+ discussions about this project at http://www.cvchina.info/2014/02/25/14cvprbing/ (in Chinese). You may find answers to your problems there.

329 Comments
Waheed

Hi,

I am getting an exception in the loadAnnotations function, in the Main.cpp file.

DataSetVOC voc2007("D:/Bing/VOC2007/");
After the above command, the voc2007 variable trainNum contains 2501, but once the loadAnnotations function has executed, trainNum shows the value 0.
voc2007.loadAnnotations();
And I get the error "Unhandled exception at 0x000007FEFD2D9E5D in Objectness.exe: "

Please help me in this regard.

陈滨

me too,

Yang0

Have you solved this problem?

Sunshineatnoon

Hello Prof. Cheng, when I run the code with VS2013 I get the following error:
OpenCV Error: Assertion failed (u != 0) in cv::Mat::create, file C:\builds\master_PackSlave-win64-vc12-shared\opencv\modules\core\src\matrix.cpp, line 411
OpenCV Error: Assertion failed (u != 0) in cv::Mat::create, file C:\builds\master_PackSlave-win64-vc12-shared\opencv\modules\core\src\matrix.cpp, line 411
Do you have any suggestions?

dev

Hi

The paper mentions that all images are resized to particular sizes. But the results produced for a larger image are different from the results produced for a downsized version of the same image. It would be great if you could explain this.

lirong

Hello, was the BING code compiled on a 32-bit or a 64-bit system?

JImmyTeng

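Prof. Chen,
Hello. I see that you have also worked on depth images. Have you tested how this algorithm performs on depth images? I would appreciate your advice.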

ls

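Prof. Cheng, why do test images also need corresponding yml files? The program will not run without them, and if the object coordinates for the test images also have to be written into the yml files, doesn't that limit the method to the VOC dataset?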

yangyi

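Hello Prof. Cheng, there is something in the code that I cannot figure out and would like to ask you about:
In the dot(,,,,) function in FilterTIG.h, why are bc08(bc18), bc04(bc14) and bc02(bc12) shifted left by 3 bits (multiplied by 2^3), 2 bits and 1 bit, rather than by 7, 6 and 5 bits? According to equation (6) in the paper, with k = 1 to N_g, the coefficients should be 2^(8-k): 2^(8-1), 2^(8-2), 2^(8-3), 2^(8-4). I don't understand why the shifts are 3, 2 and 1. Any guidance would be much appreciated, thank you!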

Be steady & success

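Dr. Ming-Ming Cheng is willing to answer questions again and again, which suggests that his method is basically not fabricated and has a real chance of being widely adopted.

However, there are many paper authors (especially in mainland China) who cannot withstand three rounds of questions, or who cannot or will not answer even one. This makes people doubt the authenticity of their results or methods, so their work is hard to popularize and does little for the author's reputation.
Many authors simply churn out a paper to get a degree, which really wastes readers' lives (time).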

yun

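Prof. Cheng:
Hello!
I have a question I would like to ask. During detection you already obtain the proposals for each image; for example, image 000001 has 1954 proposals, each with a box position and a weight, such as -0.329004, 1, 257, 353, 500. Then, in the function illuTestReults() that verifies the results, you use the known information that image 000001 contains two objects, together with their positions, and match the ground-truth positions of the two objects against every proposal. In the function interUnio(), the proposals with the largest overlap ratio are taken as the two objects being sought, and they are drawn and saved. After comparing them one by one, I found that the two proposals finally selected are not the two with the largest weights. If I understand this part of the code correctly, the weight of each proposal obtained after detection is not used in this verification. Instead, the known number and positions of the objects in the test image are compared against each proposal, and even the number of proposals to keep after verification (i.e. the number of objects) is known in advance. This is what puzzles me. If I have misunderstood, please correct me. Thank you!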

steady & success

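If what yun says is true, then this method loses its practical value... I hope the original author can clarify.
The truth becomes clearer through debate!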

yun

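Hello, I do not think the paper loses its practical value. I think the author's intention is that, after performing general object proposal at a small time cost, the results are significantly better than random windows (random guess; see Figure 3), rather than directly giving the actual positions of detected objects. This is my understanding; please correct me if I am wrong.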

wen

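Hello, may I ask whether Prof. Cheng has answered your question? I would also like to know whether the results depend heavily on the ground truth.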

Be steady & success

If what yun says is the case, then this method loses its practicality... I hope the original author can clarify.

Be steady & success

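When it is not known in advance which proposal is the right one, you can only choose by score (the weight value); otherwise, what is the point of the weight values?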

郝婧

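Hello Prof. Cheng. I would like to run a comparison on my own dataset between your program and Proposal Generation for Object Detection using Cascaded Ranking SVMs, but I cannot find the mat files for the latter. How did you do this, and could you provide the mat files you used in your comparison experiments? Thank you.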

wd

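Hello Prof. Cheng, if I want to apply the method to one specific type of image (different from the VOC dataset), do I need a large number of images of that type to retrain it? I look forward to your answer.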

wd

Hello Prof. Cheng, one more question: if I want to evaluate the results on my own test images, how do I generate the yml annotation files?

wd

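Hello Prof. Cheng, I have got your program running, but if I want to use it on my own test images (a specific type of image different from the VOC dataset), for example detecting insulators in aerial images, do I need a large number of insulator images to retrain it?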

shaobo guo

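Hello Prof. Cheng!
I saw in the comments that some people have got the code running on Linux. I ran into problems while debugging and would like to ask those who succeeded for help, so could they upload their Makefile or leave their contact details? Many thanks, Prof. Cheng!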

yun

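Prof. Cheng,
Hello! While reproducing your program, I came across a question I would like to discuss with you. After the first-stage classification is finished, your paper uses w for the subsequent detection and speed-up, but the model I obtain contains data like the following: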
svm_type c_svc
kernel_type rbf
gamma 0.07
nr_class 2
total_sv 10423
rho -0.791685
label 1 0
probA -8.09285
probB 4.83667
nr_sv 8625 1798
SV
0.2080078125 1:16 2:107 3:15 4:23 5:23 6:59 7:80 8:17 9:141 10:45 11:80 12:11 13:5 14:102 15:37 16:42 17:14 18:73 19:86 20:88 21:92 22:59 23:144 24:32 25:230 26:108 27:8 28:43 29:43 30:28 31:20 32:20 33:226 34:18 35:16 36:6 37:90 38:49 39:140 40:42 41:18 42:20 43:10 44:127 45:17 46:19 47:23 48:128 49:70 50:8 51:81 52:36 53:78 54:23 55:54 56:9 57:82 58:86 59:12 60:16 61:33 62:25 63:19 64:14
…….
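The w in your paper is just an 8*8 numeric matrix, whereas the model contains the 8*8 feature values of all the boundary support vectors (1798 of them). So I would like to ask how you convert the data in the .model file into the required 8*8 matrix, and which data in the model you used. These details are not given in the paper. Thank you!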

MissP

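{"bird", "car", "cat", "dog", "cow", "sheep"}; Are these six categories used as the training classes? I ask because getTrainTest(), the function that contains this statement, is never called in the program. Or is it in the loadDataGenericOverCls() function that the categories are split into 6 training classes and 14 test classes? Thank you.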

river

Hello Prof. Cheng, the URL you provide for the VOC2007 dataset cannot be accessed from China! Please provide a new link! Thank you!

yao deng

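Hello Prof. Cheng!
I am reimplementing your algorithm in Python. So far I have completed the first-stage SVM classification and obtained the .model file. The next step is to train the second-stage classifier, which I have not yet fully understood. Do you run the model on every size in the training images, select the higher-scoring windows via NMS, and then use the scores of those windows as positive training samples to train an SVM again? Or have I misunderstood? No explicit formula or derivation is given for how to obtain the values of v and t in this step. I hope you can clear this up for me.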

wd

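Hello, when reading the data, both the test and train data are read correctly, but loading the annotation files fails. Could it be that the path was not changed correctly? I only changed the path in the main function; does it need to be changed anywhere else?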

Yang

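Hello, I ran into a similar problem, and also: "First-chance exception at 0x000007FEFDAFB3DD in Objectness.exe: Microsoft C++ exception: cv::Exception at memory location 0x00000000001BE560.
A buffer overrun has occurred in Objectness.exe which has corrupted the program's internal state. Press Break to debug the program or Continue to terminate the program."
How did you solve it?
(Looking forward to your reply! Thank you!)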

MissP

"training our objectness measure on 6 object categories and testing on other 14 unseen categories": I do not understand this point, nor the corresponding code. Please advise.