1 Introduction
In this paper, we tackle the problem of Subspace Clustering (SC), a subfield of unsupervised learning that aims to cluster data points drawn from a union of low-dimensional subspaces. Suppose that $X = [x_1, \dots, x_N] \in \mathbb{R}^{D \times N}$ represents a data set with $N$ data points in ambient dimension $D$, and the data points lie in $n$ subspaces $\{S_i\}_{i=1}^{n}$ of dimensions $\{d_i\}_{i=1}^{n}$ ($d_i < D$). The task of SC is to partition the data points into $n$ clusters so that data points within the same cluster lie in the same intrinsic subspace $S_i$. SC has achieved great success in many applications, e.g., motion segmentation [1], face clustering [2], and image representation and compression [3].
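For concreteness, the union-of-subspaces data model above can be simulated with a short NumPy sketch; the dimensions and sample counts here are arbitrary illustration values, not settings from this paper:

```python
import numpy as np

def sample_union_of_subspaces(D=20, dims=(3, 3, 3), points_per_subspace=50, seed=0):
    """Sample N points from a union of random low-dimensional subspaces in R^D.

    Returns the D x N data matrix X and the ground-truth subspace label of
    each column.
    """
    rng = np.random.default_rng(seed)
    X, labels = [], []
    for k, d in enumerate(dims):
        # Orthonormal basis of a random d-dimensional subspace S_k of R^D.
        basis, _ = np.linalg.qr(rng.standard_normal((D, d)))
        coords = rng.standard_normal((d, points_per_subspace))
        X.append(basis @ coords)
        labels += [k] * points_per_subspace
    return np.hstack(X), np.array(labels)

X, y = sample_union_of_subspaces()
print(X.shape)  # (20, 150): D x N data matrix
```

Each block of 50 columns has rank 3, matching its intrinsic subspace dimension, even though the ambient dimension is 20.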
Most traditional SC algorithms [4, 5, 6, 7, 8, 9] are based on the linear subspace assumption to construct the affinity matrix for spectral clustering. However, data do not necessarily conform to linear subspace models, which motivates nonlinear SC techniques. Kernel methods [10, 11, 12] can be employed to implicitly map data to higher-dimensional spaces so as to better conform to linear models in the resulting spaces. However, the selection among different kernel types is largely empirical, without theoretical guarantees. Recently, Convolutional Neural Networks have shown superior ability in learning image representations, and Deep Subspace Clustering Networks (DSC-Net) [13] have been proposed to exploit the self-expression of data in a union of subspaces. Despite significant improvements in clustering accuracy, DSC-Net suffers from slow training compared with conventional “shallow” SC methods. To achieve higher model training efficiency and higher clustering accuracy, we propose a Residual Encoder-Decoder network for deep Subspace Clustering (RED-SC). In particular, we make the following contributions:

We propose to establish skip connections between corresponding convolutional and deconvolutional layers. These skip connections help to back-propagate the gradients to bottom layers and pass data details to top layers, making training of the end-to-end mapping easier and more effective.

We propose to insert a self-expressive layer in each skip connection to generate the linear representation coefficients. We present a new global loss function and minimize it with RED-SC. This helps to learn the linearity information of features in different latent spaces.

To the best of our knowledge, our approach constitutes the first attempt to apply a residual encoder-decoder network to a task of unsupervised learning.
Experimental results demonstrate that our network converges much faster in model training and fine-tuning, and obtains better clustering results: we remarkably reduce the computational cost while simultaneously obtaining higher accuracy.
2 Related Work
2.1 Subspace Clustering
Many methods have been developed for linear subspace clustering. Generally, these approaches are based on a two-stage framework. In the first stage, an affinity matrix is generated from the data by computing the linear representation coefficient matrix $C$. In the second stage, spectral clustering is applied on the affinity matrix. These methods learn the affinity matrix based on the self-expressiveness model, which states that each data point in a union of subspaces can be expressed as a linear combination of the other data points, i.e., $X = XC$, where $X$ is the data matrix and $C$ is the coefficient matrix. To find the coefficient matrix $C$, current methods solve the following optimization problem in the first stage:
$\min_{C} \; \|C\|_p \quad \text{s.t.} \quad X = XC, \;\; \mathrm{diag}(C) = 0$   (1)
where $\|\cdot\|_p$ denotes a norm regularization applied on $C$. For instance, in Sparse Subspace Clustering (SSC) [4], the $\ell_1$ norm regularization is adopted as a convex surrogate for the $\ell_0$ norm regularization to encourage the sparsity of $C$. Least Squares Regression (LSR) [5] uses the $\ell_2$ norm regularization on $C$. Low Rank Representation (LRR) [6] uses nuclear norm regularization on $C$. Elastic Net Subspace Clustering (ENSC) [7] uses a mixture of $\ell_1$ and $\ell_2$ norm regularization on $C$. In SSC by Orthogonal Matching Pursuit (SSC-OMP) [8] and our previous work Sparse-Dense Subspace Clustering (SDSC) [14], the $\ell_0$ norm regularization is investigated. However, these methods can only cluster linear subspaces, which limits their application. To address this problem, kernel-based subspace clustering methods [10, 11, 12] have been developed. There is, however, no clear reason why such kernels should correspond to feature spaces that are well-suited to subspace clustering. Recently, Deep Subspace Clustering Networks (DSC-Net) [13] were introduced to tackle the nonlinearity arising in subspace clustering: data is nonlinearly mapped to a latent space with convolutional autoencoders, and a self-expressive layer is introduced to facilitate end-to-end learning of the coefficient matrix. Although DSC-Net outperforms traditional SC methods, its computational cost, especially in model training, is overwhelming.
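As an illustration of the first stage, the $\ell_2$-regularized variant of problem (1) (the LSR case above) admits a closed-form solution. The sketch below is a minimal illustration under that choice of regularizer, not a reference implementation of any of the cited methods:

```python
import numpy as np

def lsr_coefficients(X, lam=0.1):
    """Self-expression with l2 (Frobenius) regularization, as in LSR:
    min_C ||X - XC||_F^2 + lam * ||C||_F^2 has the closed-form solution
    C = (X^T X + lam * I)^{-1} X^T X."""
    N = X.shape[1]
    G = X.T @ X  # Gram matrix of the data
    return np.linalg.solve(G + lam * np.eye(N), G)

def affinity_matrix(C):
    """Symmetrize |C| to obtain the affinity matrix for spectral clustering."""
    return np.abs(C) + np.abs(C).T
```

Each column of `C` expresses one data point as a linear combination of the others; points lying in the same subspace tend to receive the larger coefficients, which is what makes the symmetrized affinity suitable for the second (spectral clustering) stage.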
2.2 Residual Encoder-Decoder
Encoder-decoder networks can nonlinearly map data into a latent space. They can be viewed as a form of nonlinear PCA if the latent space has a lower dimension than the original space [15]. Residual encoder-decoder networks with skip-layer connections have proven effective in many applications, e.g., image restoration [16], semantic segmentation [17], and iris segmentation [18]. It has been shown that residual encoder-decoder networks converge much faster in model training, since the skip connections help to back-propagate the gradients to bottom layers and pass image details to top layers, making training of the end-to-end mapping easier. Besides, the feature maps passed by the skip connections carry much image detail, which helps deconvolution to recover a better and cleaner image. To the best of our knowledge, such networks have not been used in any task of unsupervised learning. Our RED-SC for solving subspace clustering problems constitutes the first attempt to apply the residual encoder-decoder to unsupervised learning tasks.
3 Residual Encoder-Decoder Network for Deep Subspace Clustering (RED-SC)
The proposed network uses the residual encoder-decoder and the self-expressiveness property. In this section, we first discuss each component, then introduce the network architecture, and finally elaborate on its training and clustering process.
3.1 Residual Encoder-Decoder in RED-SC
DSC-Net uses autoencoders to map the data into a latent space, then uses the features in that latent space to generate the linear representation coefficients for the affinity matrix, and finally recovers the data with a chain of decoders. But an intuitive question arises: is deconvolution able to recover the data detail from the abstraction alone? Another question: can the features from a single latent space represent the data well enough to generate the linear representation coefficients? We find that much data detail is lost during convolution, making DSC-Net hard to train and the affinity matrix inaccurate in representing the relationships among the original data.
To address the above two problems, inspired by residual networks [20] and highway networks [19], we add skip connections between pairs of corresponding convolutional and deconvolutional layers, as shown in Fig. 1. A building block is shown in Fig. 2. Instead of directly learning the mapping from the input $x$ to the output $H(x)$, we would like the network to fit the residual of the problem, which is denoted as:
$F(x) = H(x) - x$   (2)
Such a learning strategy is applied on inner blocks of the encodingdecoding network to make training more effective.
By using the residual encoder-decoder network, the feature maps passed by the skip connections carry much data detail, which helps deconvolution to better recover the data. Besides, the skip connections also help back-propagate the gradients to bottom layers, which keeps the network from suffering gradient vanishing.
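A minimal sketch of such a decoder-side skip fusion follows; the exact placement of the addition before the ReLU is an assumption based on the description in Section 3.3, not code from the paper:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def skip_add(decoder_out, encoder_feature):
    """Fuse a decoder feature map with the encoder feature map carried by the
    skip connection: element-wise addition followed by ReLU. With the identity
    carried across, the intermediate layers only need to fit the residual
    F(x) = H(x) - x rather than the full mapping H(x)."""
    return relu(decoder_out + encoder_feature)
```

Because the encoder feature is added back unchanged, gradients flowing to the encoder have a direct path through the addition, which is the mechanism behind the faster convergence noted above.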
3.2 Self-Expressive Layer in RED-SC
Recall the optimization problem in (1); to account for data corruptions, this problem is relaxed as:
$\min_{C} \; \|C\|_p + \frac{\lambda}{2} \|X - XC\|_F^2 \quad \text{s.t.} \quad \mathrm{diag}(C) = 0$   (3)
where $\lambda > 0$ balances the regularization and the self-expression error.
In our RED-SC network, latent representations from multiple layers are adopted as inputs of the self-expressive layer to generate the self-expressive coefficients. Let $Z_0 = X$ denote the input data and $Z_i$ denote the output of the $i$th convolutional layer. We introduce a self-expression loss as:
$L_{\mathrm{se}} = \sum_{i=0}^{M} \frac{1}{2} \|Z_i - Z_i C\|_F^2$   (4)
where $M$ is the number of convolutional layers and $C$ is the self-expressive coefficient matrix. Since our goal is to train a deep residual encoder-decoder network, we also compute the reconstruction loss of the data passed through the network:
$L_{\mathrm{rec}} = \frac{1}{2} \|X - \hat{X}_{\Theta_e, \Theta_d}\|_F^2$   (5)
where $\hat{X}$ represents the data reconstructed by the residual encoder-decoder, and $\Theta_e$ and $\Theta_d$ respectively represent the encoder parameters and the decoder parameters. Then we compute the global loss of our RED-SC network:
$L(\Theta) = \|C\|_p + \lambda_1 L_{\mathrm{rec}} + \lambda_2 L_{\mathrm{se}}$   (6)
where the network parameters $\Theta$ consist of $\Theta_e$, $\Theta_d$, and $C$, and $\lambda_1$, $\lambda_2$ are trade-off parameters. In this work, we adopt the $\ell_2$ norm regularization on $C$ for computational efficiency.
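Given matrices for the data, its reconstruction, the per-layer latent representations, and the coefficient matrix, the global objective can be sketched in NumPy as below. The weighting scheme with `lam1` and `lam2` follows the loss as reconstructed here and is an illustration, not a reference implementation:

```python
import numpy as np

def global_loss(X, X_hat, Z_list, C, lam1=1.0, lam2=1.0):
    """Global loss: l2 regularization on C, plus weighted reconstruction loss
    and multi-layer self-expression loss.

    X      : D x N data matrix (Z_list may include it as Z_0)
    X_hat  : D x N reconstruction from the residual encoder-decoder
    Z_list : list of D_i x N latent representations
    C      : N x N self-expressive coefficient matrix
    """
    l_reg = np.sum(C ** 2)                      # ||C||_2^2 regularization
    l_rec = 0.5 * np.sum((X - X_hat) ** 2)      # reconstruction loss
    l_se = sum(0.5 * np.sum((Z - Z @ C) ** 2)   # self-expression per latent space
               for Z in Z_list)
    return l_reg + lam1 * l_rec + lam2 * l_se
```

In the full network the minimization runs over the encoder/decoder parameters and `C` jointly; here the function only evaluates the objective for fixed inputs.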
3.3 Network Architecture
In this paper, we focus on image clustering problems. As shown in Fig. 1, we use all the images as a single batch. The input images $X$ are mapped to a collection of latent vectors $Z_i$ by each convolutional layer. In the self-expressive layer, the nodes are fully connected using linear weights without bias or nonlinear activations. The latent vectors of the skip connections are then mapped into the symmetric layers of the decoder for addition and nonlinear activation (ReLU) [21], and the output of the last convolutional layer is mapped back into the original space by the deconvolutional layers of the decoder. Finally, we use the self-expressive coefficients to generate the affinity matrix, and then apply spectral clustering on the affinity matrix to obtain the clustering labels.

In particular, for the $i$th convolutional layer with $c_i$ channels of kernel size $k_i \times k_i$, there are $k_i^2 c_{i-1} c_i$ weight parameters. The total number of weight parameters in our network is $\sum_i k_i^2 c_{i-1} c_i$, and the number of bias parameters is $\sum_i c_i$. Suppose the number of input samples is $N$; then the number of self-expressive parameters is $N^2$, which is much larger than the number of weight and bias parameters. Thus the self-expressive parameters dominate the network.
3.4 Training Strategy
Due to the limited size of data sets for unsupervised subspace clustering, it is difficult to train a network with millions of parameters. Thus we design a pre-training network without the self-expressive layer, shown in Fig. 3. We then use the trained parameters to initialize the encoder and decoder layers of our fine-tuning network with the self-expressive layer. With the help of Adam [22], we then use a single big batch containing all the data to minimize the loss defined in (6). Note that since we don't use any label information to train the model, our training strategy remains unsupervised. Finally, we use the trained self-expressive coefficients to construct the affinity matrix for spectral clustering, and obtain the clustering labels.
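The final step, from trained coefficients to cluster labels, can be sketched as below. The affinity construction $|C| + |C|^T$ and the plain k-means on the normalized spectral embedding are common choices assumed here, not details taken from the paper:

```python
import numpy as np

def spectral_clustering(C, n_clusters, n_iter=50):
    """Spectral clustering on the affinity built from self-expressive
    coefficients, with a minimal k-means in the spectral embedding."""
    W = np.abs(C) + np.abs(C).T                     # symmetric affinity matrix
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    # Symmetrically normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                     # eigenvalues ascending
    emb = vecs[:, :n_clusters]                      # bottom eigenvectors
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12
    # Deterministic farthest-point initialization, then plain k-means.
    centers = [emb[0]]
    for _ in range(n_clusters - 1):
        dists = np.min([((emb - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(emb[int(np.argmax(dists))])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = emb[labels == k].mean(axis=0)
    return labels
```

On a block-diagonal coefficient matrix (points only expressing points in their own subspace) this recovers the blocks exactly, which is the ideal case the self-expression loss pushes toward.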
4 Experiments
We implement our approach in TensorFlow [23] on an NVIDIA TITAN Xp GPU, and evaluate the performance of RED-SC on the handwritten digit data set MNIST [24] and the face data set Extended Yale B [25]. We compare RED-SC with LRR [6], LRSC [9], SSC [4], SSC-OMP [8], SSSC [26], SDSC [14], EDSC [7], and DSC-Net [13] with two norm regularizations (DSC-Net-L1 and DSC-Net-L2). We use the code provided by the respective authors, tuned to give the best performance. We evaluate the clustering performance using clustering error (ERR) [13], normalized mutual information (NMI) [27], and purity (PUR) [28]. For RED-SC, the kernel sizes are always 5-3-3-3-3-5 and the channels are 10-20-30-30-20-10. We use the pre-training network to obtain the parameters for the fine-tuning networks. The best results in the tables are shown in bold.

4.1 Experiments on MNIST
We evaluate the effectiveness of RED-SC on MNIST, which consists of 70,000 handwritten digit images of size 28×28. We randomly select 1,000 images for each digit, resulting in a subset of 10,000 images. For the traditional SC algorithms LRR, SSC, and ENSC, we use a subset of 1,000 images due to their limited scalability. The results are reported in Table 1.
Table 1: Clustering results on MNIST.

Method        ERR (%)   NMI (%)   PUR (%)
LRR            46.25     56.32     56.84
SSC            55.71     47.09     49.41
ENSC           50.17     54.94     54.83
DSC-Net-L1     32.22     67.17     73.87
DSC-Net-L2     30.09     68.64     74.31
RED-SC         25.66     73.16     77.24
We can observe that RED-SC greatly outperforms the traditional SC algorithms; this is partly because RED-SC uses a multi-layer encoder as the feature extractor. Besides, compared with the deep approach DSC-Net, our RED-SC obtains better performance on all three metrics. This is because RED-SC tunes the self-expressive coefficients in multiple latent spaces, while DSC-Net only uses the latent representation from the last convolutional layer. This experimental result demonstrates the effectiveness of RED-SC in producing better self-expressive coefficients for spectral clustering.
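The clustering error (ERR) reported in these tables is conventionally computed as one minus the best accuracy over all matchings between predicted clusters and ground-truth classes; a sketch of this convention (an assumption about the metric, since the paper only cites [13] for it) using the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_error(y_true, y_pred):
    """ERR = 1 - max accuracy over one-to-one matchings of predicted
    clusters to ground-truth classes (solved with the Hungarian algorithm)."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    k = len(classes)
    # cost[i, j] = -(number of points in predicted cluster i with true class j)
    cost = np.zeros((k, k), dtype=int)
    for i, c_pred in enumerate(classes):
        for j, c_true in enumerate(classes):
            cost[i, j] = -np.sum((y_pred == c_pred) & (y_true == c_true))
    rows, cols = linear_sum_assignment(cost)
    return 1.0 + cost[rows, cols].sum() / len(y_true)
```

Because cluster labels are arbitrary, a prediction that is a pure relabeling of the ground truth has ERR 0 under this matching.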
4.2 Experiments on Extended Yale B
We evaluate the efficiency of RED-SC in model training and fine-tuning on Extended Yale B, which contains 2,414 frontal face images of 38 individuals under 9 poses and 64 illumination conditions. Each cropped face image consists of 192×168 pixels. We downsample the images to 48×42 pixels. We randomly pick a number of subjects and take all the images of the selected subjects to be clustered.
Table 2: Clustering error (%) on Extended Yale B for different numbers of subjects.

Method        10      15      20      25      30      38
LRR          19.76   25.82   31.45   28.14   38.59   35.12
LRSC         30.95   31.47   28.76   27.81   30.64   29.89
SSC           8.80   12.89   20.11   26.30   27.52   29.36
SSC-OMP      12.08   14.05   15.16   18.89   20.75   23.52
SSSC          6.34   11.01   14.07   16.79   20.46   19.45
SDSC          4.62    8.31   11.87   14.55   16.87   16.17
EDSC          5.64    7.63    9.30   10.67   11.24   11.64
DSC-Net-L1    2.23    2.17    2.17    2.53    2.63    3.33
DSC-Net-L2    1.59    1.69    1.73    1.75    2.07    2.67
RED-SC        1.25    1.30    1.37    1.42    1.45    1.48
As shown in Table 2, RED-SC remarkably reduces the clustering error and outperforms all the listed methods, which again demonstrates the effectiveness of RED-SC. Besides, we report the convergence behavior compared with a DSC-Net of the same number of parameters in Fig. 4. From Fig. 4(a) we observe that RED-SC converges much faster than DSC-Net in training, since the residual encoder-decoder architecture helps back-propagate the gradients to better fit the end-to-end mapping. From Fig. 4(b) we observe that RED-SC generates a high-quality affinity matrix for spectral clustering within approximately 300 epochs, while DSC-Net needs about 1,000 epochs. This is partly because RED-SC uses the latent representations from multiple convolutional layers to fine-tune the self-expressive coefficients, which accelerates convergence. Thus RED-SC achieves higher efficiency.
5 Conclusion
We present a Residual Encoder-Decoder network for deep Subspace Clustering (RED-SC), which symmetrically links convolutional and deconvolutional layers with skip-layer connections. We present a new global loss and minimize it with RED-SC. To the best of our knowledge, we are the first to apply the residual encoder-decoder to unsupervised learning tasks. A series of experiments validates that RED-SC remarkably reduces the computational cost and improves the clustering performance.
References
 [1] S. R. Rao, R. Tron, R. Vidal, and Y. Ma, “Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 10, pp. 1832–1845, 2010.
 [2] R. Basri and D. W. Jacobs, “Lambertian reflectance and linear subspaces,” in ICCV, 2001, pp. 383–390.

 [3] K.-C. Lee, J. Ho, and D. J. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 5, pp. 684–698, 2005.
 [4] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” CoRR, vol. 1203.1005, 2012.
 [5] C.Y. Lu, H. Min, Z.Q. Zhao, L. Zhu, D.S. Huang, and S. Yan, “Robust and efficient subspace segmentation via least squares regression,” in ECCV, 2012, pp. 347–360.
 [6] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by lowrank representation,” in ICML, 2010, pp. 663–670.
 [7] C. You, C.G. Li, D. P. Robinson, and R. Vidal, “Oracle based active set algorithm for scalable elastic net subspace clustering,” in CVPR, 2016, pp. 3928–3937.
 [8] C. You, D. P. Robinson, and R. Vidal, “Scalable sparse subspace clustering by orthogonal matching pursuit,” in CVPR, 2016, pp. 3918–3927.
 [9] R. Vidal and P. Favaro, “Low rank subspace clustering (LRSC),” Patt. Recog. Letters, vol. 43, pp. 47–61, 2014.
 [10] V. M. Patel, H. V. Nguyen, and R. Vidal, “Latent space sparse subspace clustering,” in CVPR, 2013, pp. 225–232.
 [11] V. M. Patel and R. Vidal, “Kernel sparse subspace clustering,” in ICIP, 2014, pp. 2849–2853.
 [12] S. Xiao, M. Tan, D. Xu, and Zhao Y. D., “Robust kernel lowrank representation,” IEEE Trans. Neural Netw. Learning Syst., vol. 27, no. 11, pp. 2268–2281, 2016.
 [13] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. D. Reid, “Deep subspace clustering networks,” in NIPS, 2017, pp. 24–33.
 [14] S. Yang, W. Zhu, and Y. Zhu, “Sparse-dense subspace clustering.”
 [15] G. E. Hinton and R. R. Salakhutdinov., “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
 [16] X.J. Mao, C. Shen, and Y.B. Yang, “Image restoration using very deep convolutional encoderdecoder networks with symmetric skip connections,” in NIPS, 2016, pp. 2802–2810.
 [17] J. Jiang, L. Zheng, F. Luo, and Z. Zhang, “Rednet: Residual encoderdecoder network for indoor RGBD semantic segmentation,” CoRR, vol. 1806.01054, 2018.
 [18] M. Arsalan, D. Kim, M. Lee, M. Owais, and R.Kang, “Frednet: Fully residual encoderdecoder network for accurate iris segmentation,” Expert Systems with Applications, vol. 122, 01 2019.
 [19] R. K. Srivastava, K. Greff, and J. Schmidhuber, “Training very deep networks,” in NIPS, 2015, pp. 2377–2385.
 [20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.

 [21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in NIPS, 2012, pp. 1106–1114.
 [22] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2015.
 [23] M. Abadi, A. Agarwal, and P. Barham, “Tensorflow: Largescale machine learning on heterogeneous distributed systems,” CoRR, vol. 1603.04467, 2016.
 [24] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradientbased learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov 1998.
 [25] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 643–660, 2001.
 [26] C.G. Li, C. You, and R. Vidal, “Structured sparse subspace clustering: A joint affinity learning and subspace clustering framework,” IEEE Trans. Image Processing, vol. 26, no. 6, pp. 2988–3001, 2017.
 [27] N. X. Vinh, J. Epps, and J. Bailey, “Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance,” J. Mach. Learn. Res., vol. 11, pp. 2837–2854, 2010.
 [28] C. D. Manning, P. Raghavan, and H. Schütze, “Introduction to information retrieval,” 2010.