Simultaneous Subspace Clustering and Cluster Number Estimating based on Triplet Relationship
Abstract
In this paper, we propose a unified framework to discover the number of clusters and group the data points into different clusters using subspace clustering simultaneously. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore such intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples, to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples which lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation-based data structure termed as the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix, and be utilized to iteratively assign the data points to clusters. Based on the triplet relationship, we propose a unified optimizing scheme to automatically calculate clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously maximizing the similarity of triplets from different clusters while minimizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.
Paper
[1] Simultaneous Subspace Clustering and Cluster Number Estimating based on Triplet Relationship, Jie Liang, Jufeng Yang, Ming-Ming Cheng, Paul L. Rosin, Liang Wang, IEEE TIP, 28(8):3973-3985, 2019. [pdf] [project page]
[2] Automatic Model Selection in Subspace Clustering via Triplet Relationships, Jufeng Yang, Jie Liang, Kai Wang, Yong-Liang Yang, Ming-Ming Cheng, AAAI, 2018. [pdf]
Algorithm
Overview of the proposed autoSC. The algorithm is composed of three steps, \ie, calculating triplet relationships (blue dashed box), estimating the number of clusters via model selection reward (black) and finishing the clustering assignment via fusion reward (green). Given similarity matrix derived from self-representation schemes, we illustrate an example of the triplet which is composed of the three samples shown with magenta frames. These samples induce a high local correlation and should be grouped into the same cluster. By optimizing the fusion reward, sample `a’ is assigned into C since C has most connections to the triplet which involves `a’.
Performance
Overall Comparison among comparative methods on subsets of the extended Yale B and COIL-20 datasets. The similarity matrix calculated by both SMR and LSR is utilized as the correlation matrix of SCAMS, DP, SVD and the proposed autoSC. The best results are in bold font while * indicates the second best performance.