Deep Hough Transform for Semantic Line Detection
online demo
Abstract
In this paper, we put forward a simple yet effective method to detect meaningful straight lines, a.k.a. semantic lines, in given scenes. Prior methods treat line detection as a special case of object detection, while neglecting the inherent characteristics of lines, leading to less efficient and suboptimal results. We propose a one-shot end-to-end framework that incorporates the classical Hough transform into deeply learned representations. Parameterizing lines by slope and bias, we perform the Hough transform to translate deep representations into the parametric space and then directly detect lines there. More concretely, we aggregate features along candidate lines on the feature-map plane and then assign the aggregated features to the corresponding locations in the parametric domain. Consequently, the problem of detecting semantic lines in the spatial domain is transformed into spotting individual points in the parametric domain, making the post-processing steps, i.e. non-maximal suppression, more efficient. Furthermore, our method makes it easy to extract contextual line features that are critical to accurate line detection. Experimental results on a public dataset demonstrate the advantages of our method over state-of-the-art methods.

Papers
- Deep Hough Transform for Semantic Line Detection, Kai Zhao#, Qi Han#, Chang-Bin Zhang, Jun Xu, Ming-Ming Cheng*, IEEE TPAMI, 2022. [pdf | code | bib | project | latex]
- Qi Han#, Kai Zhao#, Jun Xu, Ming-Ming Cheng*. “Deep Hough Transform for Semantic Line Detection.” 16th European Conference on Computer Vision (ECCV 2020), August 2020. (# denotes equal contribution) [Pdf | Code | bib | Video | Slides | Chinese version]
Background
Previous solutions for this task treat line detection as a special case of object detection and adopt existing CNN-based object detectors. In these methods, features are aggregated along straight lines using LOI pooling, and a classification network and a regression network are applied to the extracted feature vectors to identify positive lines and refine their positions. Because they extract line-wise feature vectors by aggregating deep features solely along each line, these methods capture inadequate context information and are less efficient in terms of running time.
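The line-wise aggregation used by these prior detectors can be sketched as follows. This is a simplified stand-in: the function name is hypothetical, and nearest-neighbor sampling replaces the bilinear interpolation used in LOI pooling.

```python
import math

def line_feature(feature_map, p0, p1, n_samples=8):
    """Aggregate a scalar feature along the segment p0 -> p1 by averaging
    nearest-neighbor samples. A simplified stand-in for LOI pooling,
    which samples with bilinear interpolation instead."""
    h, w = len(feature_map), len(feature_map[0])
    acc = 0.0
    for i in range(n_samples):
        t = i / (n_samples - 1)                 # uniform positions along the line
        x = round(p0[0] + t * (p1[0] - p0[0]))
        y = round(p0[1] + t * (p1[1] - p0[1]))
        x = min(max(x, 0), w - 1)               # clamp to the feature map
        y = min(max(y, 0), h - 1)
        acc += feature_map[y][x]
    return acc / n_samples
```

Aggregating only along the line itself is exactly why these detectors see little context beyond the line's own pixels.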
Pipeline
Our method comprises the following four major components:
- A CNN encoder that extracts pixel-wise deep representations.
- The deep Hough transform that converts the spatial representations to a parametric space.
- A line detector responsible for detecting lines in the parametric space.
- A reverse Hough transform that converts the detected lines back to image space.
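Composed together, the four stages might look like the following sketch. The function name and stage callables are hypothetical placeholders for illustration, not the released API.

```python
def detect_semantic_lines(image, encoder, hough, detector, reverse):
    """Hypothetical end-to-end composition of the four stages above."""
    features = encoder(image)            # pixel-wise deep representations
    param_map = hough(features)          # spatial features -> parametric space
    peaks = detector(param_map)          # individual points in parametric space
    return [reverse(p) for p in peaks]   # map each point back to a line in image space
```

The key design point is that detection happens entirely in the parametric space; only the final stage returns to image coordinates.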

Deep Hough Transform performs the Hough transform on deep representations, mapping the spatial features into a high-dimensional parametric space in parallel. Line structures are represented more compactly there, because lines near a given line are translated to points surrounding that line's point in the parametric space. A context-aware line detector aggregates the features of nearby lines, and predictions emerge directly as individual points in the parametric space. The time-consuming NMS is thus reduced to computing the geometric center of each connected component of points in the parametric space. Finally, the reverse Hough transform maps points in the parametric space back to the image space.
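A minimal scalar sketch of these two steps, assuming a standard (theta, rho) line parameterization and a single connected peak region. The bin counts and helper names are illustrative only; the actual method accumulates whole feature vectors, not scalars.

```python
import math

def deep_hough(feat, n_theta=6, n_rho=9):
    """Accumulate per-pixel feature responses into a (theta, rho) grid:
    each pixel votes its feature value for every candidate line through it.
    A simplified scalar version of the feature-aggregation step."""
    h, w = len(feat), len(feat[0])
    rho_max = math.hypot(h, w)
    acc = [[0.0] * n_rho for _ in range(n_theta)]
    for y in range(h):
        for x in range(w):
            v = feat[y][x]
            if v == 0:
                continue
            for t in range(n_theta):
                theta = math.pi * t / n_theta
                rho = x * math.cos(theta) + y * math.sin(theta)
                r = int((rho + rho_max) / (2 * rho_max) * (n_rho - 1))
                acc[t][r] += v
    return acc

def centroid_of_peak(acc, thresh):
    """Replace NMS: take the geometric center (centroid) of all
    above-threshold cells, assuming they form one connected peak."""
    pts = [(t, r) for t, row in enumerate(acc)
                  for r, v in enumerate(row) if v >= thresh]
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
```

A horizontal line of responses in the feature map produces one dominant cell near theta = pi/2, and the centroid of the thresholded region recovers it without any pairwise suppression.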
Evaluation Metric
We design a principled metric named EA-score, which measures the similarity between two lines by considering both their Euclidean distance and their angular distance.
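One way such a score could combine the two distances is sketched below. The exact normalizations and the way the two terms are combined in the paper may differ, so treat the definitions here (angle scaled by pi/2, midpoint distance in a unit-normalized image, product squared) as assumptions for illustration.

```python
import math

def ea_score(l1, l2):
    """Hedged sketch of an EA-style line similarity: combines an angular
    similarity and a Euclidean (midpoint) similarity. Each line is
    ((x1, y1), (x2, y2)) with coordinates scaled to [0, 1]."""
    (a1, a2), (b1, b2) = l1, l2
    t1 = math.atan2(a2[1] - a1[1], a2[0] - a1[0])
    t2 = math.atan2(b2[1] - b1[1], b2[0] - b1[0])
    dt = abs(t1 - t2) % math.pi
    dt = min(dt, math.pi - dt)           # angle between undirected lines
    s_theta = 1.0 - dt / (math.pi / 2)   # angular similarity in [0, 1]
    m1 = ((a1[0] + a2[0]) / 2, (a1[1] + a2[1]) / 2)
    m2 = ((b1[0] + b2[0]) / 2, (b1[1] + b2[1]) / 2)
    d = math.hypot(m1[0] - m2[0], m1[1] - m2[1])
    s_dist = max(0.0, 1.0 - d)           # Euclidean similarity in [0, 1]
    return (s_theta * s_dist) ** 2
```

Identical lines score 1, and perpendicular lines score 0 regardless of position, which matches the intent of penalizing both angular and positional deviation.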


Performance

ECCV2020 Video
All Detection Results on the SEL Dataset
