GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering
Deep clustering has achieved state-of-the-art results via joint representation learning and clustering, but still has an inferior performance for the real scene images, e.g., those in ImageNet. With such images, deep clustering methods face several challenges, including extracting discriminative features, avoiding trivial solutions, capturing semantic information, and performing on large-size image datasets. To address these problems, here we propose a self-supervised attention network for image clustering (AttentionCluster). Rather than extracting intermediate features first and then performing the traditional clustering algorithm, AttentionCluster directly outputs semantic cluster labels that are more discriminative than intermediate features and does not need further post-processing. To train the AttentionCluster in a completely unsupervised manner, we design four learning tasks with the constraints of transformation invariance, separability maximization, entropy analysis, and attention mapping. Specifically, the transformation invariance and separability maximization tasks learn the relationships between sample pairs. The entropy analysis task aims to avoid trivial solutions. To capture the object-oriented semantics, we design a self-supervised attention mechanism that includes a parameterized attention module and a soft-attention loss. All the guiding signals for clustering are self-generated during the training process. Moreover, we develop a two-step learning algorithm that is training-friendly and memory-efficient for processing large-size images. Extensive experiments demonstrate the superiority of our proposed method in comparison with the state-of-the-art image clustering benchmarks.
READ FULL TEXT