
BING: Binarized Normed Gradients for Objectness Estimation at 300fps

Ming-Ming Cheng1           Ziming Zhang2        Wen-Yan Lin3           Philip Torr1

1The University of Oxford     2Boston University      3Brookes Vision Group

Abstract

Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding-window object detection paradigm. We observe that generic objects with a well-defined closed boundary can be discriminated by looking at the norm of gradients, after resizing their corresponding image windows to a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure.

We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high-quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals. By increasing the number of proposals and the color spaces used for computing BING features, the performance can be further improved to 99.5% DR.
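To make the feature concrete, here is a minimal sketch (my own illustration using OpenCV, not the released implementation; see the Downloads section below for the actual C++ code) of computing the 64D normed-gradient feature for a single window:

    // Illustrative sketch only: compute a 64D normed-gradient (NG) feature for one
    // image window by resizing it to 8x8 and taking the gradient magnitude,
    // clipped to 255, at each pixel (Sobel is used here just for brevity).
    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <cmath>
    #include <vector>

    std::vector<float> normedGradientFeature(const cv::Mat &window)
    {
        cv::Mat gray, resized, gx, gy;
        if (window.channels() == 3)
            cv::cvtColor(window, gray, cv::COLOR_BGR2GRAY);
        else
            gray = window;
        cv::resize(gray, resized, cv::Size(8, 8));    // fixed 8x8 window size
        cv::Sobel(resized, gx, CV_32F, 1, 0);         // horizontal gradient
        cv::Sobel(resized, gy, CV_32F, 0, 1);         // vertical gradient

        std::vector<float> feature(64);               // one value per pixel of the 8x8 window
        for (int r = 0; r < 8; r++)
            for (int c = 0; c < 8; c++)
                feature[r * 8 + c] = std::min(
                    std::abs(gx.at<float>(r, c)) + std::abs(gy.at<float>(r, c)), 255.0f);
        return feature;
    }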

Papers

  1. BING: Binarized Normed Gradients for Objectness Estimation at 300fps, Ming-Ming Cheng, Yun Liu, Wen-Yan Lin, Ziming Zhang, Paul L. Rosin, Philip H. S. Torr, Computational Visual Media 5(1):3-20, 2019. [Project page][pdf][bib] (Extension of CVPR 2014 Oral)
  2. BING: Binarized Normed Gradients for Objectness Estimation at 300fps. Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, Philip Torr, IEEE CVPR, 2014. [Project page][pdf][bib][C++][Latex][PPT, 12 min] [Seminar report, 50 min] [Poster] [Spotlight, 1 min] (Oral, acceptance rate: 5.75%)

Most related projects on this website

  • SalientShape: Group Saliency in Image Collections. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu. The Visual Computer 30 (4), 443-453, 2014. [pdf] [Project page] [bib] [latex] [Official version]
  • Efficient Salient Region Detection with Soft Image Abstraction. Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook. IEEE International Conference on Computer Vision (IEEE ICCV), 2013. [pdf] [Project page] [bib] [latex] [official version]
  • Global Contrast based Salient Region Detection. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip Torr, Shi-Min Hu. IEEE TPAMI, 2014. [Project page] [Bib] [Official version] (2nd most cited paper in CVPR 2011)

Spotlight Video (17MB video, pptx)

Figure.  Tradeoff between #WIN and DR (see [3] for more comparisons with other methods [6, 12, 16, 20, 25, 28, 30, 42] on the same benchmark). Our method achieves 96.2% DR using 1,000 proposals, and 99.5% DR using 5,000 proposals.

Table 1. Average computational time on VOC2007.


Table 2. Average number of atomic operations for computing objectness of each image window at different stages: calculate normed gradients, extract BING features, and get objectness score.


Figure.  Illustration of the true positive object proposals for VOC2007 test images.

Downloads

     The C++ source code of our method is publicly available for download. OpenCV-compatible VOC 2007 annotations can be found here. Since the VOC website is blocked in mainland China, we also provide mirror download links: Baidu Netdisk download, mirror download. A Matlab file for making the figure plots in the paper and the results for VOC 2007 (75MB) are also available. We did not apply for any patent on this system, and free use is encouraged for both academic and commercial users.

Links to most related works:

  1. Measuring the objectness of image windows. Alexe, B., Deselaers, T. and Ferrari, V. PAMI 2012.
  2. Selective Search for Object Recognition, Jasper R. R. Uijlings, Koen E. A. van de Sande, Theo Gevers, Arnold W. M. Smeulders, International Journal of Computer Vision, Volume 104 (2), page 154-171, 2013
  3. Category-Independent Object Proposals With Diverse Ranking, Ian Endres, and Derek Hoiem, PAMI February 2014.
  4. Proposal Generation for Object Detection using Cascaded Ranking SVMs. Ziming Zhang, Jonathan Warrell and Philip H.S. Torr, IEEE CVPR, 2011: 1497-1504.
  5. Learning a Category Independent Object Detection Cascade. E. Rahtu, J. Kannala, M. B. Blaschko, IEEE ICCV, 2011.
  6. Generating object segmentation proposals using global and local search, Pekka Rantalankila, Juho Kannala, Esa Rahtu, CVPR 2014.
  7. Efficient Salient Region Detection with Soft Image Abstraction. Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook. IEEE ICCV, 2013.
  8. Global Contrast based Salient Region Detection. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip Torr, Shi-Min Hu. IEEE TPAMI, 2014. (2nd most cited paper in CVPR 2011).
  9. Geodesic Object Proposals. Philipp Krähenbühl and Vladlen Koltun, ECCV, 2014.

Suggested detectors:

The proposals need to be verified by a detector in order to be used in real applications. Our proposal method nicely addresses the major speed limitation of the following state-of-the-art detectors (please email me if you have other suggestions as well):

  1. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, R. Girshick, J. Donahue, T. Darrell, J. Malik, IEEE CVPR (Oral), 2014. (Code; achieves best ever reported performance on PASCAL VOC)
  2. Fast, Accurate Detection of 100,000 Object Classes on a Single Machine, CVPR 2013 (best paper).
  3. Regionlets for Generic Object Detection, ICCV 2013 oral. (Runner-up in the ImageNet large scale object detection challenge)

Recent methods

  1. Data-driven Objectness, IEEE TPAMI, in print.

Applications

If you have developed some exciting new extensions, applications, etc., please send me a link via email and I will add it here:

Third party resources

If you have made a version running on other platforms (e.g. Mac, Linux, vs2010, makefile projects) and want to share it with others, please send me an email containing the URL and I will add a link here. Notice that these third-party versions may or may not contain the updates and bug fixes that I provide in the next section of this webpage.

  • Linux version of this work provided by Shuai Zheng from the University of Oxford.
  • Linux version of this work provided by Dr. Ankur Handa from the University of Cambridge.
  • Unix version of this work provided by Varun from University of Maryland.
  • OpenCV version (doc) of this work by Francesco Puja et al.
  • Matlab version of this work by Tianfei Zhou from Beijing Institute of Technology.
  • Matlab version (works with 64-bit Win7 & Visual Studio 2012) provided by Jiaming Li from the University of Electronic Science and Technology of China (UESTC).

Bug fix

  • 2014-4-11: There was a bug in the Objectness::evaluatePerImgRecall(..) function. After the update, the DR-#WIN curve looks slightly better for high values of #WIN. Thanks to YongLong Tian and WangLong Wu for reporting the bug.

FAQs

Since the release of the source code 2 days ago, 500+ students and researchers have downloaded it (according to email records). Here are some frequently asked questions from users. Please read the FAQs before sending me new emails. Questions already answered in the FAQs will not be replied to.

1. I downloaded your code but can't compile it in Visual Studio 2008 or 2010. Why?

I use Visual Studio 2012 for development. The shared source code is only guaranteed to work under Visual Studio 2012, although the algorithm itself doesn't rely on any Visual Studio 2012 specific features. Some users have already reported that they made a Linux version that runs at 1000fps on a desktop machine (my 300fps was measured on a laptop). If you get the code running on a different platform and want to share it with others, I'm very happy to add a link from this page; please contact me via email.

2. I run the code but the results are empty. Why?

Please check that you have downloaded the PASCAL VOC data (2 zip files for training and testing) and put them in ./VOC2007/. The original VOC annotations cannot be read directly by OpenCV; I have shared a version which is compatible with OpenCV (https://mmcheng.net/code-data/). After unzipping all 3 data packages, please put them in the same folder and run the source code.

3. What's the password for unzipping your source code?

Please read the notice on the download page. You can get it automatically by supplying your name and institute information.

4. I got a testing speed different from 300fps. Why?

If you are using 64-bit Windows and Visual Studio 2012, the default settings should be fine. Otherwise, please make sure to enable OpenMP and native SSE instructions. In any case, speed should be tested in release mode rather than debug mode. Don't uncomment commands for showing progress, e.g. printf("Processing image: %s", imageName). When the algorithm runs at hundreds of fps, printf, image reading (an SSD hard disk would help here), etc. might become the speed bottleneck. Depending on the hardware, the running speed will differ. To eliminate the influence of hard-disk image reading speed, I preload all testing images before starting the timer and doing the prediction. Only 64-bit machines support such large memory for a single program; if your RAM is small, this preloading might cause hard-disk paging, which also results in slow running times. Typical speeds people report range from 100fps (typical laptop) to 1000fps (pretty powerful desktop).
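As a rough illustration of the advice above (this is my own sketch with a hypothetical image list, not part of the released code), the timing loop can exclude disk I/O like this:

    // Rough timing sketch: preload all test images first, then time only the
    // proposal-generation loop, so disk reading does not skew the fps number.
    #include <opencv2/opencv.hpp>
    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <vector>

    int main()
    {
        std::vector<std::string> names = {"000001.jpg", "000002.jpg"}; // hypothetical image list
        std::vector<cv::Mat> images;
        for (const std::string &n : names)
            images.push_back(cv::imread(n));          // preload before timing starts

        auto start = std::chrono::high_resolution_clock::now();
        for (const cv::Mat &img : images) {
            // call the proposal generation for img here (see the demo code for the actual call)
            (void)img;
        }
        double sec = std::chrono::duration<double>(
            std::chrono::high_resolution_clock::now() - start).count();
        if (sec > 0)
            std::printf("%.1f fps\n", images.size() / sec);
        return 0;
    }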

5. After increasing the number of proposals to 5,000, I got only a 96.5% detection rate. Why?

Please read through the paper before using the source code. As explained in the abstract, 'By increasing the number of proposals and the color spaces … improved to 99.5% DR'. Using three different color spaces can be enabled by calling "getObjBndBoxesForTests" rather than the default function in the demo code, "getObjBndBoxesForTestsFast".

6. I got compilation or linking errors like: can't find "opencv2/opencv.hpp", error C1083: can't find "atlstr.h".

These are all standard libraries. Please copy the error message and search for it on Google.

7. Why linear SVMs and gradient magnitudes? These are so simple, and alternatives like *** could be better; I got some improvements by using them. Some implementation details could be improved as well.

Yes, there are many possibilities for improvement, and I'm glad to hear that people have already obtained some improvements (it is nice to receive these emails). Our major focus is the very simple observation about the things vs. stuff distinction (see Section 3.1 in our CVPR 2014 paper), and we try to model it as simply and efficiently as possible. The implementation details are not guaranteed to be optimal and there is room for improvement (I'm glad to receive such suggestions via email as well).

8. Like many other proposal methods, the BING method generates many proposal windows. How can I distinguish the windows I expect from the others?

Like many other proposal methods (PAMI 2012, IJCV 2013, PAMI 2014, etc.), the number of proposals typically runs to a few thousand. To get the real detection results, you still need to apply a detector. A major advantage of proposal methods is that the detector can ignore most (up to 99%) of the image windows in the traditional sliding-window pipeline, while still checking 90+% of the object windows. See the 'Suggested detectors' section on this webpage for more details.

9. Is there any step-by-step guidance for using the source code?

Please see the readme document for details about where to download the data, where to put the files, and advice for getting maximal speed.

10. Could you give a detailed, step-by-step example of how to get the binary normed gradient map from the normed gradient map?

The simple method of getting binary normed gradients (binary values) from normed gradients (BYTE values) is described in detail in Sec. 3.3 of our CVPR 2014 paper (the paragraph above Equation 5). Here is a simple example to aid understanding: the binary representation of the BYTE value 233 is 11101001. We can take its top 4 bits, 1110, to approximate the original BYTE value. If you recover a BYTE value from the 4 binary bits 1110, you get the approximate value 224.
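The same top-bit approximation can be written in a few lines; the snippet below is only a worked illustration of this FAQ, not code taken from the release:

    // Worked example for FAQ 10: approximate a BYTE value by its top Ng bits.
    #include <cstdint>
    #include <cstdio>

    int main()
    {
        const uint8_t value = 233;              // binary 11101001
        const int Ng = 4;                       // number of top bits kept
        uint8_t topBits = value >> (8 - Ng);    // 1110 in binary (decimal 14)
        uint8_t approx  = topBits << (8 - Ng);  // 11100000 in binary = 224
        std::printf("original=%d topBits=%d approx=%d\n", value, topBits, approx);
        return 0;
    }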

11. Is there any intuitive explanation of the objectness scores, i.e. s_l in Equation (1) and o_l in Equation (3)?

The bigger these scores are, the more likely the window is to contain an object. Although the BING feature is good for generating object proposals, it is still not good enough to produce final object detection results (see also FAQ 8). We can consider the number of object windows as a computation budget, and we want high recall within this budget. Thus we typically select the top n proposals according to these scores, even if a score is negative (which does not necessarily mean a non-object window). The value s_l measures how well the window matches the template. The score o_l is obtained after calibration, so that proposals of more likely sizes (e.g. 160×160) are ranked higher than proposals of less likely sizes (e.g. 10×320). The calibration parameters can be considered as per-size bias terms.
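As a sketch of this answer (my own paraphrase, with v_i and t_i denoting the learned per-size parameters as in the paper), the calibration can be read as a per-size linear transform of the raw score:

    // Sketch of FAQ 11: the raw template-match score s_l is turned into the
    // calibrated objectness score o_l = v_i * s_l + t_i, where v_i and t_i are
    // learned separately for each quantized window size i and are used only for
    // ranking proposals across sizes.
    float calibratedScore(float s_l, float v_i, float t_i)
    {
        return v_i * s_l + t_i;   // per-size scale and bias on the raw score
    }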

12. Typos on the project page, imperfect replies to posts, misspelled English words in the C++ source code, emails not replied to, etc.

I apologize for my limited language ability. Please report such typos etc. to me via personal email. You are also more than welcome to simply repost if I missed replying to some important information.

I am careless and quite often forget to reply to some emails. If you think your query or suggestion is important but has not been replied to within 5 working days, please simply resend the email.

13. Problems when running into the format() function.

Some users have run into errors caused by the format() function in the source code. This is a standard API function of OpenCV, and the proper version of OpenCV needs to be linked. It seems that std::string is not binary compatible across different versions of Visual Studio, so you must link to the appropriate version. Be careful with the confusing name mapping in Visual Studio: Visual Studio 2005 (VC8), Visual Studio 2008 (VC9), Visual Studio 2010 (VC10), Visual Studio 2012 (VC11), Visual Studio 2013 (VC12).

14. What is the format of the returned bounding boxes, and how do I illustrate the boxes as in the paper?

We follow the standard PASCAL VOC bounding-box definition, i.e. [minX, minY, maxX, maxY]. You can refer to the Objectness::illuTestReults() function for how the illustration was done.
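For illustration only (a hedged example assuming the usual 1-based, inclusive VOC coordinates; the released Objectness::illuTestReults() is the reference implementation), such a box can be drawn with OpenCV as follows:

    // Draw one proposal stored as [minX, minY, maxX, maxY] in PASCAL VOC convention.
    #include <opencv2/opencv.hpp>

    void drawProposal(cv::Mat &img, const cv::Vec4i &box)
    {
        // VOC coordinates are 1-based and inclusive, so shift to 0-based pixel indices.
        cv::Point topLeft(box[0] - 1, box[1] - 1);
        cv::Point bottomRight(box[2] - 1, box[3] - 1);
        cv::rectangle(img, topLeft, bottomRight, cv::Scalar(0, 255, 0), 2);
    }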

15. Discussions on CvChina

There are 400+ discussions about this project at http://www.cvchina.info/2014/02/25/14cvprbing/ (in Chinese). You may find answers to your problems there.


Comments
wd

Hello Prof. Cheng, I get the following problem when running the code; what causes it? "The pBlock variable has been optimized away and is therefore not available."

fqjabc

I also ran into the problem "the pBlock variable has been optimized away and is therefore not available". Have you solved it?

wd

I'm using vs2010 + opencv2.4.10 on 64-bit Windows. What causes this error? error MSB8013: This project doesn't contain the Debug|Win32 configuration and platform combination.

DrBalthar

I've got a question concerning the code; I wonder if this can be true, as it doesn't make much sense to me. It is in the Objectness::predictBBoxSI method. When computing the score bounding box, are you sure the location is computed correctly? Here is the code:

// Find true locations and match values
double ratioX = width/_W, ratioY = height/_W;

int iMax = min(matchCost.size(), NUM_WIN_PSZ);
for (int i = 0; i < iMax; i++){
    float mVal = matchCost(i);
    Point pnt = matchCost[i];
    Vec4i box(cvRound(pnt.x * ratioX), cvRound(pnt.y * ratioY));
    box[2] = cvRound(min(box[0] + width, imgW));
    box[3] = cvRound(min(box[1] + height, imgH));
    box[0]++;
    box[1]++;
    valBoxes.pushBack(mVal, box);
    sz.push_back(ir);
}

These ratio calculations look wrong to me, as this box location has no relation to the real image location. Shouldn't ratioX and ratioY be
ratioX = imgW/width and ratioY = imgH/height instead?

Xu-hua Hu

Prof. Cheng,
Hello. I want to do recognition on the object proposals produced by your algorithm, but my downstream recognition classifier only works on patches of 32×32 pixels or larger, while your program's feature is 8×8. What should I do? I noticed that a W can be passed in when instantiating the Objectness object; does this W mean the BING feature size is adjustable? That is, if I set W=32, will that meet my needs reasonably well? Are there any drawbacks?

Liuqian

Prof. Cheng, hello. I left you a message before, and that problem has been solved. But when debugging and loading the train and test sets, I found that the DataSetVOC constructor loads "TrainVal.txt", "Test.txt" and "class.txt", yet there are no "Test.txt" and "class.txt" files under "ImageSets/Main/".
I downloaded all the data packages from the VOC 2007, training, testing, and annotation-for-OpenCV links mentioned on your website. I found TrainVal.txt in the training folder and Test.txt in the testing folder, but Class.txt was not found in any of the four folders. How can I solve this?

zeyu

Prof. Cheng, hello. I want to train the linear SVM used in BING with some of my own images, but I don't know how to generate VOC-style annotation files, i.e. the .yml files. Is there any related code or labeling tool provided?

wd

Hello, when loading the data, the test and train data are read correctly, but the annotation files fail to load. Could the path modification be incorrect? I only changed the path in the main function; do I need to change it anywhere else?

徐君妍

Hello, I also want to train on my own images. Have you managed to generate the corresponding .yml files? Looking forward to your reply, thanks!

Liu Qian

Prof. Cheng, hello. Finding the BING paper and code felt like finding a treasure, but I have not been able to solve a problem that appears during debugging. Could you please take a look at my configuration steps and see where I went wrong? Thank you.
Configuration steps for the BING program:
1. Put the "VOC2007" folder inside the "BingObjectnessCVPR14" folder.
2. Configure the system environment variables: add "..buildx64v12bin" to the system PATH.
3. Configure the projects:
a) For the Objectness project's "Release|x64" configuration, set the Include Directories and Library Directories, and add the x64 Debug and Release lib lists of OpenCV 2.4.9 to the Additional Dependencies; configure the Cmlib and LibLinear projects the same way.
b) Set Objectness as the startup project, and for the Objectness project's "Release|x64" configuration set [Properties]|[C/C++]|[Code Generation]|[Enable Enhanced Instruction Set] to "Streaming SIMD Extensions (/arch:SSE2)".
4. Delete the contents of Cmlib [Properties]|[Build Events]|[Post-Build Event]|[Command Line].
5. Build the Cmlib and LibLinear projects, add Cmlib.lib and LibLinear.lib to the Additional Dependencies of the Objectness project's "Release|x64" configuration, and add the corresponding directories to the Additional Library Directories.
6. Press Ctrl+F5 to run the whole solution, which builds and runs Objectness.exe.
Problem: an error pops up saying "A buffer overrun has occurred in Objectness.exe which has corrupted the program's internal state. Press Break to debug the program, or Continue to terminate the program."

Yang

您好!我也遇到类似您这样的问题:弹出错误提示“在已损坏了程序内部状态的 Objectness.exe 中发生了缓冲区溢出。按“中断”以调试程序,或按“继续”以终止程序。”。请问您是如何解决的?

陈滨

I finally got it running today. The problem was in the dataset: the copy from the Baidu Netdisk link seems to be incomplete. My download was only 600+ MB and was missing more than 2,000 images. When the code reads the data and reaches i=1952, the corresponding image (007792 or so) cannot be found in the dataset. Better to download from the original mirror.

Yang

Hello! Thanks for your reply. I downloaded from the mirror again, but I still have this problem: "First-chance exception at 0x000007FEFDAFB3DD in Objectness.exe: Microsoft C++ exception: cv::Exception at memory location 0x00000000001BE560.
A buffer overrun has occurred in Objectness.exe which has corrupted the program's internal state."
Could you send me the VOC2007 package used by your program? My email: 2391419746@qq.com
(Looking forward to your reply! Thanks!)

fqjabc

Hello, I ran into this problem too: when running with Ctrl+F5, "Objectness.exe has stopped working" pops up immediately. Did you get the program running? How did you solve this?

Fanny

Hello, have you solved this problem, and how? The program runs normally in debug mode, though extremely slowly, but this problem appears when running in release mode.

arrietty

I ran into this problem too. Has it been solved?

arrietty

Prof. Cheng, I have found the cause: it is the Runtime Library setting, which needs to be changed to Multi-threaded Debug DLL (/MDd). Now both debug and release run. Thank you!

zhang

Prof. Cheng, hello,
I have a question for you. I don't fully understand Figure 2 in the paper, especially the part about the bit-shift operations. Since I have just started on this topic, I may be missing some basics; I would appreciate any advice. Thank you.

Guest

Prof. Cheng, hello,

In the results under ResultsBBoxesB2W8MAXBGR (for example, the rows below), is the first floating-point number in each row the score of the proposal? Why are these scores all negative? Do the rows listed earlier match the template better?

-0.307949, 1, 257, 353, 500

-0.349893, 1, 1, 353, 500

-0.364906, 97, 1, 352, 500

-0.4157, 1, 33, 353, 288

sudha

The 7z file for the OpenCV-readable VOC annotations extracts yml files that are all zero bytes. Any suggestion on why that may be?

Cyan

Hello Prof. Cheng, some parts of the experimental results in the paper are not clear to me, namely the computation-time comparison in Table 1. How is this time measured? My understanding is that the 0.003 s obtained with the BING feature is the average time to get the thousands of proposals for one image; how many proposals is that exactly? Also, how were the times of the other methods in the table obtained? Are they also the time to detect the same number of proposals in one image? (I know the proposals are not the final detection results.) My description may be a bit confusing; looking forward to your answer, thank you :)

sudha

Dear MM Cheng,
Thanks for sharing the code! I understand that the binary approximation part also plays a big role in speeding up the system. Could you please explain the binary approximation portion alone with an example? I am not able to get the insight. Thanks in advance!

Xiang-Nan

Hello Prof. Cheng, I am a first-year graduate student who has just started in this area and is reading your paper. A few things are unclear to me: 1. In Stage II, when training the linear SVM coefficients v_i, t_i for size i, you use the "selected (NMS) proposals as training samples"; why use the NMS-selected proposals here instead of continuing to use the original windows as in Stage I? 2. In the first Algorithm, when computing the coefficient β_j in front of each binary basis vector a_j, why is the dot product divided by the squared norm of a_j rather than by the norm of a_j?

maria

Hi,

Please, I have one question: can I use BING to do object recognition?
In my research, I want to recognize places using object recognition. For example, if the image contains a cooking table, refrigerator, etc., I can deduce that the place is a kitchen.

xiahouzuoxin

Hello Prof. Cheng, I really enjoyed reading your paper, but I am still confused about a few points: (1) Isn't the BING feature an 8×8 matrix? Then what does g_l = sum{ 2^(8-k) * b_{k,l} } mean? These subscripts confuse me, so I don't fully understand Algorithm 2; could you explain it in plain words? (2) I ran the program, but VS2012 says SSE instructions cannot be enabled for the x64 platform. With OpenMP enabled, Stage I took about 14 s and Stage II over 340 s, and a single image takes 0.1 s. Is that normal?

chunchun

Hello Prof. Cheng, after reinstalling my operating system the code compiles, but it breaks unexpectedly at runtime. What could be the reason? HEAP[Objectness.exe]: Invalid address specified to RtlValidateHeap( 00000000002A0000, 0000000001F44780 )
Objectness.exe has triggered a breakpoint.

loraloper

Hi, can anybody please explain the format of the txt files in the folder
\VOC2007\Results\BBoxesB2W8MAXBGR?
I mean, what do the 5 values (one floating-point and 4 integer values) correspond to?
For example, how could one plot this over image number 000027?
(-0.320794, 1, 1, 486, 500)
Thanks.

Geroge

Prof. Cheng, I have a question about your THUS-10000 dataset: how were its images selected from the MSRA datasets? Does it include all of the images in MSRA_B plus some images from MSRA_A, or does it include only parts of both MSRA_B and MSRA_A? I need this information because I will use the dataset in my paper. Thanks.

XinZuo

Prof. Cheng:
I see that the code uses the object count information from the yml files when drawing the predicted bounding boxes for the test images.
If I want to detect the objectness of my own image, how do I obtain its yml file?

MissP

Hello, do you know by now how to generate the yml file?

Jun

Prof. Cheng, hello:
With the same training and testing datasets and exactly the same settings, two runs of the program give different results, sometimes differing by 1%. Is this caused by the approximation in the binary norm step? In other words, the results are not deterministic.