
Deep Hough Transform for Semantic Line Detection

online demo

Abstract

In this paper, we put forward a simple yet effective method to detect meaningful straight lines, a.k.a. semantic lines, in given scenes. Prior methods treat line detection as a special case of object detection, while neglecting the inherent characteristics of lines, leading to less efficient and suboptimal results. We propose a one-shot end-to-end framework by incorporating the classical Hough transform into deeply learned representations. By parameterizing lines with slopes and biases, we perform the Hough transform to translate deep representations into the parametric space and then directly detect lines there. More concretely, we aggregate features along candidate lines on the feature-map plane and then assign the aggregated features to the corresponding locations in the parametric domain. Consequently, the problem of detecting semantic lines in the spatial domain is transformed into spotting individual points in the parametric domain, making the post-processing steps, i.e., non-maximal suppression, more efficient. Furthermore, our method makes it easy to extract contextual line features, which are critical to accurate line detection. Experimental results on a public dataset demonstrate the advantages of our method over state-of-the-art methods.
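
For concreteness, the sketch below shows one common way to parameterize a line for a Hough transform: a line given by two endpoints is mapped to a discrete (theta, r) bin measured relative to the image center. The helper name `line_to_hough` and its discretization are our own illustration and are not taken from the released code, which may parameterize and quantize lines differently.

```python
import numpy as np

def line_to_hough(p0, p1, img_h, img_w, num_theta=180, num_rho=180):
    """Map a line through points p0, p1 (x, y) to a discrete (theta, rho) bin.

    A minimal illustration of the angle-offset parameterization; the exact
    discretization in the released code may differ.
    """
    # Work in coordinates centered at the image center.
    cx, cy = img_w / 2.0, img_h / 2.0
    (x0, y0), (x1, y1) = (p0[0] - cx, p0[1] - cy), (p1[0] - cx, p1[1] - cy)

    # Orientation of the line in [0, pi), and the angle of its normal.
    theta = np.arctan2(y1 - y0, x1 - x0) % np.pi
    normal = theta + np.pi / 2.0
    # Signed distance from the image center to the line.
    rho = x0 * np.cos(normal) + y0 * np.sin(normal)

    max_rho = np.hypot(img_h, img_w) / 2.0
    theta_idx = min(int(theta / np.pi * num_theta), num_theta - 1)
    rho_idx = int((rho + max_rho) / (2 * max_rho) * (num_rho - 1))
    return theta_idx, rho_idx

# Example: the main diagonal of a 400x400 image passes through the center,
# so its rho bin lies near the middle of the rho axis.
print(line_to_hough((0, 0), (399, 399), 400, 400))
```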

Fig1. Detection results by our proposed Deep Hough Transform method.

Papers

  • Deep Hough Transform for Semantic Line Detection, Kai Zhao#, Qi Han#, Chang-Bin Zhang, Jun Xu, Ming-Ming Cheng*, IEEE TPAMI, 2022. [pdf | code | bib | project | latex]
  • Qi Han#, Kai Zhao#, Jun Xu, Ming-Ming Cheng*. “Deep Hough Transform for Semantic Line Detection.” 16th European Conference on Computer Vision, August 2020. (ECCV 2020) (# denotes equal contribution) [pdf | code | bib | video | slides | Chinese version]

Background

Previous solutions for this task treat line detection as a special case of object detection and adopt existing CNN-based object detectors. In these methods, features are aggregated along straight lines using LOI pooling, and a classification network and a regression network are then applied to the extracted feature vectors to identify positive lines and refine the line positions. Because the line-wise feature vectors are aggregated solely along each individual line, such methods capture inadequate contextual information and are less efficient in terms of running time.

Pipeline

Our method comprises the following four major components:

  • A CNN encoder that extracts pixel-wise deep representations.
  • The deep Hough transform that converts the spatial representations to a parametric space.
  • A line detector that detects lines in the parametric space.
  • A reverse Hough transform that converts the detected lines back to image space.
Fig2. Pipeline of our proposed method. DHT is short for the proposed Deep Hough Transform, and RHT represents the Reverse Hough Transform.

Deep Hough Transform performs the Hough transform on deep representations, translating the spatial features into a high-dimensional parametric space in parallel. Line structures are represented more compactly in the parametric space, because lines close to a given line in the image map to points surrounding that line's point in the parametric space. We use a context-aware line detector to aggregate the features of nearby lines, and the network then directly predicts individual points in the parametric space. The time-consuming NMS thus reduces to computing the centroids of connected components in the predicted parametric map. Finally, the reverse Hough transform maps the detected points in the parametric space back to lines in the image space.
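
To make the aggregation step concrete, here is a naive, unoptimized sketch of the transform: every pixel votes its feature vector into the accumulator cell of each line passing through it, which is equivalent to summing features along each (theta, rho) line. The function name `deep_hough_naive` and all details below are our own illustration; the released implementation uses a custom CUDA extension (deep-hough) and may normalize and quantize differently.

```python
import numpy as np

def deep_hough_naive(feat, num_theta=180, num_rho=180):
    """Naive Deep Hough Transform: sum CxHxW features along every (theta, rho) line.

    An O(num_theta * H * W) reference sketch, not the official CUDA kernel.
    """
    c, h, w = feat.shape
    out = np.zeros((c, num_theta, num_rho), dtype=feat.dtype)

    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_rho = np.hypot(h, w) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    ys, xs = ys - cy, xs - cx  # center the pixel coordinates

    for t in range(num_theta):
        theta = t * np.pi / num_theta
        # Signed offset of the line with normal angle theta through each pixel.
        rho = xs * np.cos(theta) + ys * np.sin(theta)
        r_idx = np.clip(((rho + max_rho) / (2 * max_rho) * (num_rho - 1)).round().astype(int),
                        0, num_rho - 1)
        # Every pixel votes its feature vector into the (t, r_idx) accumulator cell.
        for ch in range(c):
            np.add.at(out[ch, t], r_idx.ravel(), feat[ch].ravel())
    return out

# Toy usage: a 16-channel feature map of a 64x64 image.
votes = deep_hough_naive(np.random.rand(16, 64, 64).astype(np.float32))
print(votes.shape)  # (16, 180, 180)
```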

Evaluation Metric

We propose a principled metric, named EA-score, that measures the similarity between a pair of lines by considering both their Euclidean distance and their angular distance.

Fig3. Our proposed metric considers both Euclidean distance and angular distance between a pair of lines, resulting in consistent and reasonable scores.
Fig4. Example lines with various EA-scores.
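
As a rough illustration of such a metric, the sketch below combines a normalized angular term and a normalized midpoint-distance term multiplicatively and squares the product. The exact normalization and combination used in the paper may differ, so treat this only as an approximation of the idea, not the official definition.

```python
import numpy as np

def ea_score(l1, l2, img_h, img_w):
    """Similarity between two lines combining angular and Euclidean terms.

    Each line is ((x0, y0), (x1, y1)). A sketch assuming a form like
    EA = (S_theta * S_dist)^2 with coordinates normalized to a unit square;
    consult the paper/code for the exact definition.
    """
    def norm(p):
        return np.array([p[0] / img_w, p[1] / img_h])

    (a0, a1), (b0, b1) = l1, l2
    a0, a1, b0, b1 = map(norm, (a0, a1, b0, b1))

    # Angular term: normalized difference of line orientations in [0, pi).
    ang = lambda p, q: np.arctan2(q[1] - p[1], q[0] - p[0]) % np.pi
    d_theta = abs(ang(a0, a1) - ang(b0, b1))
    d_theta = min(d_theta, np.pi - d_theta)
    s_theta = 1.0 - d_theta / (np.pi / 2.0)

    # Euclidean term: distance between the two midpoints.
    d_mid = np.linalg.norm((a0 + a1) / 2.0 - (b0 + b1) / 2.0)
    s_dist = max(0.0, 1.0 - d_mid)

    return max(0.0, s_theta * s_dist) ** 2

# Identical lines score 1; a perpendicular line through the same midpoint scores 0.
print(ea_score(((0, 0), (100, 100)), ((0, 0), (100, 100)), 100, 100))
print(ea_score(((0, 0), (100, 100)), ((0, 100), (100, 0)), 100, 100))
```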

Performance

Fig5. Quantitative comparisons across different methods. Our method significantly outperforms other competitors in terms of average F-measure.

ECCV2020 Video

All Detection Results on the SEL Dataset

Comments
holo

Hello, the NKL dataset shows as corrupted after downloading. Is there a working download link?

Qi Han

Hi, we tested the download link and it works. Please check your network connection and try downloading again.

后后清刚

Hello, could you share the inference code? That is, code that takes an arbitrary input image, runs it through the model, and outputs the image with the semantic lines drawn on it. I am a student and promise not to use the code commercially. Thank you!

Qi Han

Hi, the GitHub code page provides the complete training and testing code. With a small modification, the test code can detect lines in a single image and visualize the result.

ljy

Hello, are the dataset and models no longer available for download? Clicking the link fails to load the download page.

ljy

Also, I cannot set up and build deep-hough in a Windows Anaconda Python 3.8 environment. Have you encountered this situation?

Qi Han

Compilation is currently only supported on Ubuntu.

Qi Han

Hi, the code, models, and data are all available for download. If you run into problems, please check your network connection and browser settings, and use a proxy/VPN if necessary.

后后清刚

Hi, I'd like to ask: does training the code for this paper require candidate line information as input? (The first semantic line detection paper, SLNet, needs candidate lines line_pts during training, and this candidate information is provided in the dataset.) After training a semantic line model, if I want to run prediction on images outside the dataset and have no candidate line information as input, does that mean prediction is impossible?

Qi Han

No proposals need to be specified. The earlier SLNet method does require proposals, but its authors generate them by brute-force enumeration, independent of the image, so no candidate information needs to be prepared in advance there either. Our method does not require candidate lines by design.

MrL

Hello, could you share the runtime environment for the project?
Which torch, cuda, and cudnn versions did you use?

Qi Han

Hi, the environment dependencies are listed on the GitHub code page. On Ubuntu, torch>=1.3 and cuda>=9.2 can compile and run our code.

dyf

Hello, I ran prediction with the code on GitHub and the shared weight file, but the evaluation metrics mean P, mean R, and mean F all come out as nan. What could cause this?
Then I trained the model myself; the resulting metrics have values, but they are all only around 0.0x.
What could be the reason?

Qi Han

Hi, GitHub has the complete inference and testing code. If you still have problems after following the scripts, please open an issue on GitHub and paste the relevant steps and your system environment so we can locate the problem.

dyf

Hello, could you send a trained weight file?

Qi Han

Hi, a download link for the pretrained model is provided on the GitHub code page; please follow the instructions in the readme.

后后清刚

I also ran into the nan problem. Did you manage to solve it?

后后清刚

Hi, is the deep-hough in the code a Python package? pip install deep-hough keeps failing to download it.

后后清刚

OK, I'll give it a try. Thank you.

后后清刚

What kind of problem is this?

[screenshot attachment: QQ截图20210909152347.jpg]
chenfy

Hello, the code provided on GitHub does not include the deep-hough module. Where should I download it from?

后后清刚

OK, thank you.

陈溢铭

Hello, I have read the paper and it is excellent. However, when I trained a model myself on the NKL dataset with the code shared on GitHub and ran prediction, the evaluation results were much lower than those in the paper, and evaluating with the pretrained model provided on GitHub also gives very low results. I don't know what the reason is.

Qi Han

Hi, you can open an issue on the code's GitHub page, preferably attaching your runtime environment and how you ran the code, and I will follow up and help resolve it.

陈溢铭

Hello, I have opened an issue on GitHub and posted my training log. Thank you for your help.

hfl

Hello, did you use PyTorch for training?