Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation
Ruochen Fan1 Qibin Hou2 Ming-Ming Cheng2 Gang Yu3 Ralph R. Martin4 Shi-Min Hu1
1Tsinghua University 2Nankai University 3Megvii Inc. 4Cardiff University
Abstract
Effectively bridging between image-level keyword annotations and corresponding image pixels is one of the main challenges in weakly supervised semantic segmentation. In this paper, we use an instance-level salient object detector to automatically generate salient instances (candidate objects) for training images. Using similarity features extracted from each salient instance in the whole training set, we build a similarity graph, then use a graph partitioning algorithm to separate it into multiple subgraphs, each of which is associated with a single keyword (tag). Our graph-partitioning-based clustering algorithm allows us to consider the relationships between all salient instances in the training set as well as the information within them. We further show that with the help of attention information, our clustering algorithm is able to correct certain wrong assignments, leading to more accurate results. The proposed framework is general, and any state-of-the-art fully supervised network structure can be incorporated to learn the segmentation network. When working with DeepLab for semantic segmentation, our method outperforms state-of-the-art weakly supervised alternatives by a large margin, achieving 65.6% mIoU on the PASCAL VOC 2012 dataset. We also combine our method with Mask R-CNN for instance segmentation, and demonstrate for the first time that weakly supervised instance segmentation is possible using only keyword annotations.
Paper
- Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation, Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, Shi-Min Hu, ECCV, 2018. [pdf] [Project Page]
If you find our work helpful, please cite:
@inproceedings{fan2018associating,
  title={Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation},
  author={Fan, Ruochen and Hou, Qibin and Cheng, Ming-Ming and Yu, Gang and Martin, Ralph R and Hu, Shi-Min},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={367--383},
  year={2018}
}
Contact
644142239 AT qq DOT com (Ruochen Fan)
Algorithm Pipeline
Salient instances are extracted from the input images by a salient instance detector (e.g., S4Net). An attention module predicts, from the intrinsic properties of each salient instance, the probability that it belongs to each category. Semantic features are extracted from the salient instances and used to build a similarity graph. Graph partitioning then determines the final tag of each salient instance. Finally, a fully supervised segmentation network (e.g., DeepLab or Mask R-CNN) is trained on the resulting proxy ground truth.
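The tag assignment step can be illustrated with a small sketch. This is not the graph partitioning solver used in the paper; it is a simplified iterative stand-in, assuming per-instance feature vectors, attention probabilities, and the set of keywords of each instance's image are already available. The function name `assign_tags` and the parameters `features`, `attention`, `allowed`, `alpha`, and `n_iters` are hypothetical, chosen only for illustration.

```python
import numpy as np

def assign_tags(features, attention, allowed, alpha=1.0, n_iters=10):
    """Assign one tag per salient instance (illustrative sketch only).

    features:  (N, D) float array, one feature vector per salient instance.
    attention: (N, C) float array, per-instance category probabilities.
    allowed:   (N, C) bool array, True where category c is a keyword of the
               image the instance came from (at least one True per row).
    alpha:     assumed weight balancing attention against graph similarity.
    """
    # Build a dense similarity graph from cosine similarity of the features.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = np.maximum(unit @ unit.T, 0.0)
    np.fill_diagonal(sim, 0.0)  # no self-edges

    num_classes = attention.shape[1]
    # Start from the attention prediction, restricted to the image keywords.
    tags = np.where(allowed, attention, -np.inf).argmax(axis=1)

    for _ in range(n_iters):
        onehot = np.eye(num_classes)[tags]
        # support[i, c]: total similarity between instance i and the
        # instances currently placed in the subgraph of category c.
        support = sim @ onehot
        score = np.where(allowed, support + alpha * attention, -np.inf)
        new_tags = score.argmax(axis=1)
        if np.array_equal(new_tags, tags):
            break  # assignment is stable
        tags = new_tags
    return tags
```

In this sketch, an instance whose attention score points to the wrong keyword can still be pulled into the correct subgraph when it is strongly similar to instances already tagged with that keyword, which mirrors the idea of combining intrinsic and inter-image information.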
Performance
Experiments show that our framework outperforms existing weakly supervised methods on the PASCAL VOC 2012 dataset by a large margin, reaching 65.6% mIoU when combined with DeepLab.
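For reference, the mIoU metric quoted in the abstract is the mean over classes of intersection over union between predicted and ground-truth label maps. The following is a minimal sketch of that computation, not the official PASCAL VOC evaluation code; the function name `mean_iou` is hypothetical, and the default `ignore_label=255` assumes the usual PASCAL VOC convention for unlabeled pixels.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_label=255):
    """Mean intersection-over-union between two integer label maps."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    gt = np.asarray(gt, dtype=np.int64).ravel()
    # Drop pixels marked as "ignore" in the ground truth.
    keep = gt != ignore_label
    pred, gt = pred[keep], gt[keep]
    # Confusion matrix: rows are ground-truth classes, columns predictions.
    conf = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    # Average only over classes that appear in prediction or ground truth.
    present = union > 0
    return float((inter[present] / union[present]).mean())
```

In practice the confusion matrix would be accumulated over the whole validation set before the per-class ratios are taken, rather than averaged image by image.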