Introduction
Spectral clustering algorithms [ng2002spectral, shi2000normalized] discover the corresponding embedding of the data by exploiting the manifold information embedded in the sample distribution, and have shown state-of-the-art performance in many applications [li2015superpixel, zhao2017multi, Seg_Lichen_TIP18]. Beyond the single-task scenario, [yang2015multitask] proposes a multitask spectral clustering model that performs multiple clustering tasks jointly so that they reinforce each other. However, most recently proposed models [zhang2018multi, pang2018spectral, kang2018unified] focus on clustering with a fixed task set. When applied to a new task environment or extended with a new spectral clustering task, these models have to repeatedly access the previous clustering tasks, which can result in high energy consumption in real applications, e.g., in mobile applications. In this paper, we explore how to bring spectral clustering into the setting of lifelong machine learning.
For lifelong machine learning, recent works [ruvolo2013ella, isele2016using, xu2018lifelong, sun2018robust, sun2019representative] have explored methods for accumulating single tasks over time. Generally, lifelong learning uses knowledge from previously learned tasks to improve the performance on new tasks, and accumulates a knowledge library over time. Although these models have been successfully adopted in supervised learning [chen2018lifelong, sun2018active] [ammar2014online, isele2018selective], their application to spectral clustering, one of the most classical research problems in the machine learning community, is still sparse. Take news clustering as an example: the semantic meanings of Artificial Intelligence and NBA are very dissimilar in newspapers of the year 2010, so the two should be assigned to different clusters. The clustering task of year 2010 can thus contribute to the clustering task of year 2020 in a never-ending perspective, since the correlation between Artificial Intelligence and NBA in year 2020 is similar to that in year 2010.
Inspired by the above scenario, this paper aims to establish a lifelong learning system with spectral clustering tasks, i.e., lifelong spectral clustering. Generally, the main challenges among multiple consecutive clustering tasks are as follows: 1) Cluster Space Correlation: the latent cluster space should be consistent among multiple clustering tasks. For example, for the news clustering task, the cluster centers in year 2010 can be {Business, Technology, Science, etc.}, while the cluster centers in year 2020 are similar to those in year 2010; 2) Feature Embedding Correlation: another correlation among different clustering tasks is feature correlation. For example, in consecutive news clustering tasks, the semantic meaning of Artificial Intelligence is very similar in year 2010 and year 2020. Thus, the feature embedding of Artificial Intelligence should be the same for these two tasks.
To tackle the challenges above, as shown in Figure 1, we propose a Lifelong Spectral Clustering (i.e., ) model that integrates cluster space and feature embedding correlations, and can thus achieve never-ending knowledge transfer between previous clustering tasks and later ones. To this end, we present two knowledge libraries that preserve the common information among multiple clustering tasks, i.e., an orthogonal basis library and a feature embedding library. Specifically, 1) the orthogonal basis library contains a set of latent cluster centers, i.e., each sample of the clustering tasks can be effectively assigned to multiple clusters with different weights; 2) the feature embedding library is modeled by introducing bipartite graph co-clustering, which can not only discover the shared manifold information among the clustering tasks, but also maintain the data manifold information of each individual task. When a new spectral clustering task arrives, our model first encodes it by transferring knowledge from both the orthogonal basis library and the feature embedding library. Accordingly, these two libraries are refined over time so that performance keeps improving across all clustering tasks. For model optimisation, we derive a general lifelong learning formulation, and solve the resulting optimization problem with an alternating direction strategy. Finally, we evaluate our proposed model against several spectral clustering algorithms and multitask clustering models on several datasets. The experimental results strongly support our proposed model.
The novelties of our proposed model include:

To the best of our knowledge, this work is the first attempt to study the problem of spectral clustering in the lifelong learning setting, i.e., Lifelong Spectral Clustering (), which can adopt previously accumulated experience to incorporate new clustering tasks, and improve the clustering performance accordingly.

We present two common knowledge libraries: an orthogonal basis library and a feature embedding library, which preserve the latent cluster centers and capture the feature correlations among different clustering tasks, respectively.

We propose an alternating direction optimization algorithm to optimize the proposed model efficiently, which can incorporate fresh knowledge gradually from an online dictionary learning perspective. Various experiments show the superiority of our proposed model in terms of effectiveness and efficiency.
Related Work
In this section, we briefly review two topics: Multitask Clustering and Lifelong Learning.
For Multitask Clustering [zhang2018multi], the learning paradigm combines multitask learning [sun2017joint] with unsupervised learning, and the key issue is how to transfer useful knowledge among different clustering tasks to improve performance. Based on this assumption, recently proposed methods [zhang2017multi, huy2013feature] achieve knowledge transfer for clustering by using some samples from other tasks to form better distance metrics or nearest-neighbor graphs. However, these methods do not exploit the task relationships in the knowledge transfer process. To preserve task relationships, multitask Bregman clustering (MBC) [zhang2011multitask] captures them by alternately updating clusters among different tasks. For spectral clustering based multitask clustering, multitask spectral clustering (MTSC) [yang2015multitask] makes the first attempt to extend spectral clustering to multitask learning. By using the inter-task and intra-task correlations, a norm regularizer is adopted in MTSC to constrain the coherence of all the tasks, based on the assumption that a low-dimensional representation is shared by related tasks. A mapping function is then learned to predict cluster labels for each individual task.
For Lifelong Learning, the early works on this topic focus on transferring selective information from task clusters to new tasks [thrun1996discovering, sun2018lifelong], or transferring invariance knowledge in neural networks [thrun2012explanation]. In contrast, an efficient lifelong learning algorithm (ELLA) [ruvolo2013ella] is developed for learning multiple tasks online in the lifelong learning setting. By assuming that the models of all related tasks share a common basis, each new task model can be obtained by transferring knowledge from the basis. Furthermore, [ammar2014online] extends this idea to learning decision-making tasks consecutively, and achieves dramatically accelerated learning on a variety of dynamical systems; [isele2016using] proposes a coupled dictionary to incorporate task descriptors into lifelong learning, which enables zero-shot transfer learning. Since the observed tasks in a lifelong learning system may not form an i.i.d. sample, learning an inductive bias in the form of a transfer procedure is proposed in [pentina2015lifelong]. Different from traditional learning models [rannen2017encoder], [li2016learning] proposes a learning without forgetting method for convolutional neural networks, which trains the network using only the data of the new task, and retains performance on the original tasks via knowledge distillation [hinton2015distilling]. Among the works discussed above, none concerns lifelong learning in the spectral clustering setting, and our current work represents the first to achieve lifelong spectral clustering.
Lifelong Spectral Clustering ()
This section introduces our proposed lifelong spectral clustering () learning model. First, we briefly review a general spectral clustering formulation for a single spectral clustering task. Our model for the lifelong spectral clustering problem is then given.
Revisit Spectral Clustering Algorithm
This subsection reviews a general spectral clustering algorithm with normalized cut. Given an undirected similarity graph with a vertex set and a corresponding affinity matrix for the clustering task , where is the number of features, is the total number of data samples for the task , each element in symmetric matrix denotes the similarity between a pair of vertices . A common choice for matrix is defined as follows:
where is the function for searching nearest neighbors, and controls the spread of the neighbors. After applying the normalized Laplacian:
(1) 
where is a diagonal matrix with the diagonal elements as . The final formulation of spectral clustering turns out to be the well-known normalized cut [shi2000normalized], and can be expressed as:
(2) 
where the optimal cluster assignment matrix
can be achieved via the eigenvalue decomposition of matrix
. Based on the relaxed continuous solution, the final discrete solution of can then be obtained by spectral rotation or k-means, e.g., the th element of is , if the sample is assigned to the th cluster; , otherwise.
Problem Statement
Given a set of unsupervised clustering tasks , each individual clustering task has a set of training data samples , and the dimensionality of the feature space is . The original intention of the multitask spectral clustering method [yang2015multitask] is to uncover the correlations among all the clustering tasks, and to predict the cluster assignment matrices for each clustering task. However, learning spectral clustering tasks incrementally without accessing the previously adopted clustering data is not considered in traditional single-task or multitask spectral clustering models. In the setting of spectral clustering, a lifelong spectral clustering system encounters a series of spectral clustering tasks , where each task is defined in Eq. (2), and intends to obtain a new cluster assignment matrix for the task . For convenience, this paper assumes that the learner in this lifelong machine learning system does not know any information about the clustering tasks, e.g., the task distributions, the total number of spectral clustering tasks , etc. When the lifelong spectral clustering system receives a batch of data for some spectral clustering task (either a new spectral clustering task or a previously learned task ) in each period, it should obtain the cluster assignment matrix for the samples of the encountered tasks. The goal is to obtain the corresponding task assignment matrices such that: 1) Clustering Performance: each obtained assignment matrix should preserve the data configuration of the th task, and partition the new clustering task more accurately; 2) Computational Speed: in each clustering period, obtaining each should be faster than with traditional multitask spectral clustering methods; 3) Lifelong Learning: new ’s can be arbitrarily and efficiently added when the lifelong clustering system faces new unsupervised spectral clustering tasks.
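Each task in the sequence above is a single normalized-cut problem of the form in Eq. (2). As a reference point, the following is a minimal Python sketch of solving one such task in isolation. It is not the authors' implementation: it assumes that samples are stored as rows of `X`, a Gaussian kernel on a symmetric nearest-neighbor graph, the normalized affinity D^{-1/2} W D^{-1/2}, and a trace-maximization reading of Eq. (2). The function names and the parameters `n_neighbors` and `sigma` are hypothetical.

```python
import numpy as np

def knn_gaussian_affinity(X, n_neighbors=5, sigma=1.0):
    """Symmetric nearest-neighbor affinity with a Gaussian kernel.

    X has shape (n_samples, n_features); this row layout is an assumption.
    """
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Rank neighbors while excluding each point itself.
    ranked = sq_dists.copy()
    np.fill_diagonal(ranked, np.inf)
    nearest = np.argsort(ranked, axis=1)[:, :n_neighbors]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = True
    # Keep an edge if either endpoint lists the other as a neighbor.
    mask |= mask.T
    return np.where(mask, W, 0.0)

def normalized_affinity(W):
    """Return D^{-1/2} W D^{-1/2}, where D holds the row sums of W."""
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]

def relaxed_assignment(K, n_clusters):
    """Maximizer of tr(F^T K F) subject to F^T F = I for symmetric K.

    np.linalg.eigh returns eigenvalues in ascending order, so the last
    n_clusters eigenvectors span the relaxed solution.
    """
    _, eigvecs = np.linalg.eigh(K)
    return eigvecs[:, -n_clusters:]
```

The rows of the returned matrix can then be discretized with k-means or spectral rotation, as described in the review of spectral clustering.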
The Proposed Model
In this section, we introduce how to model the lifelong learning property and cross-task correlations simultaneously. Basically, there are two challenges in the model:
1) Orthogonal Basis Library: to achieve lifelong learning, one of the major components is how to store the previously accumulated experience, i.e., the knowledge library. To tackle this issue, inspired by [han2015unsupervised], which employs orthogonal basis clustering to uncover the latent cluster centers, each assignment matrix can be decomposed into two submatrices, i.e., a basis matrix called the orthogonal basis library, and a cluster encoding matrix , as . Then the multitask spectral clustering formulation can be expressed as:
(3)  
where the orthogonal constraint of matrix encourages each column of to be independent, and is defined in Eq. (1). Therefore, the orthogonal basis library can be used to refine the latent cluster centers and further obtain an excellent cluster separation.
2) Feature Embedding Library: even though the latent cluster centers can be captured gradually in Eq. (3), this formulation does not consider the transfer of common feature embeddings across multiple spectral clustering tasks. Motivated by [jiang2012transfer], which adopts graph-based co-clustering to control and achieve knowledge transfer between two tasks, we propose to link each pair of clustering tasks together such that an embedding obtained in one task can facilitate the discovery of the embedding in another task. We thus define an invariant feature embedding library with a group sparse constraint, and give the graph co-clustering term as:
(4) 
and for the th task is defined as:
(5) 
where , and . Intuitively, with this shared embedding library , multiple spectral clustering tasks can transfer embedding knowledge to each other from the perspective of common feature learning [Argyriou:2008].
Given the same graph construction method and training data for each spectral clustering task, we solve for the optimal cluster assignment matrix while encouraging each clustering task to share common knowledge in libraries and . By combining the two goals in Eq. (3) and Eq. (4), the lifelong spectral clustering model can be expressed as the following objective function:
(6)  
where the ’s are the trade-off parameters between each spectral clustering task and the co-clustering objective. If the ’s are set as , this model reduces to the multitask spectral clustering model with common cluster centers.
Model Optimization
This section shows how to optimize our proposed model. Normally, a standard alternating direction strategy that uses all the learned tasks is inefficient for the lifelong learning model in Eq. (6). Our goal in this paper is to build a lifelong clustering algorithm whose CPU time and memory cost are both lower than those of the offline manner. When a new spectral clustering task arrives, the basic idea for optimizing Eq. (6) is that , and should all be updated without accessing the previously learned tasks, e.g., the previous data in matrices . In the following, we briefly introduce the proposed update rules, and provide the convergence analysis in the experiments.
Updating with fixed and :
With the fixed and , the problem for solving encoding matrix can be expressed as:
(7) 
With the orthonormality constraint, can be updated in the setting of Stiefel manifold [manton2002optimization], which is defined by the following Proposition.
Proposition 1.
Let be a rank matrix, where the singular value decomposition (i.e., SVD) of is . The projection of matrix on the Stiefel manifold is defined as:
(8)
The projection can be calculated as: .
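The projection in Proposition 1 is the nearest matrix with orthonormal columns in the Frobenius norm. A minimal sketch of how it could be computed, assuming a full-column-rank input and using the thin SVD, for which the projection reduces to the product of the left and right singular vector matrices (the function name is hypothetical):

```python
import numpy as np

def project_to_stiefel(Z):
    """Project a full-column-rank matrix Z onto the Stiefel manifold.

    With the thin SVD Z = U diag(s) V^T, the closest matrix with
    orthonormal columns (in Frobenius norm) is U V^T.
    """
    U, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ Vt
```

A matrix that already has orthonormal columns is left unchanged by this projection, which is a convenient sanity check.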
Therefore, we can update by moving it in the direction of increasing the value of the objective function, and the update operator can be given as:
(9) 
where is the step size, is the objective function of Eq. (7), and can be defined as . To examine the convergence of the optimization problem in Eq. (7), we provide a convergence analysis in the experiment section.
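The update above follows a "gradient step, then project back" pattern. The sketch below illustrates that pattern on a stand-in objective, tr(E^T A E) with a hypothetical symmetric matrix `A`; the actual objective of Eq. (7) is not reproduced here, and the step size and iteration count are illustrative assumptions.

```python
import numpy as np

def project_to_stiefel(Z):
    """Nearest matrix with orthonormal columns, via the thin SVD."""
    U, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ Vt

def ascend_on_stiefel(A, E0, step=1.0, n_iters=100):
    """Projected gradient ascent for max tr(E^T A E) s.t. E^T E = I."""
    E = project_to_stiefel(E0)
    for _ in range(n_iters):
        grad = 2.0 * A @ E  # gradient of tr(E^T A E) for symmetric A
        E = project_to_stiefel(E + step * grad)
    return E

# Toy problem with known eigenvalues, so the optimum is 5 + 4 = 9.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag([5.0, 4.0, 1.0, 0.5, 0.2, 0.1]) @ Q.T
E = ascend_on_stiefel(A, rng.standard_normal((6, 2)))
objective = np.trace(E.T @ A @ E)
```

For this positive semidefinite stand-in, the iteration behaves like orthogonal (subspace) iteration, so the iterate stays feasible after every step and approaches the leading eigenspace of `A`.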
Updating with fixed and :
With the obtained encoding matrix for the newly arriving th task, the optimization problem for variable can be written as:
(10) 
Based on the orthonormality constraint , we can rewrite Eq. (10) as follows:
(11)  
To better store the previous knowledge of learned clustering tasks, we then introduce two statistical variables:
(12) 
where , and . Therefore, the knowledge of the new task is and . With as a warm start, we have:
(13) 
It is well-known that the solution of can be obtained, in relaxed form, by the eigendecomposition of . Notice that even though the input parameter of Eq. (13) contains , the above solution is still effective since the proposed algorithm converges very quickly in the online manner.
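The general pattern behind this step is that a trace objective summed over tasks depends on the tasks only through an accumulated matrix, so the library can be refreshed from running statistics without revisiting earlier data. The sketch below illustrates this under simplifying assumptions: each task contributes one hypothetical symmetric summary matrix `M_t`, and a single accumulator stands in for the two statistics of Eq. (12), whose exact form is not reproduced here. The class and attribute names are illustrative.

```python
import numpy as np

class BasisLibrary:
    """Keeps a running statistic and refreshes an orthonormal basis from it."""

    def __init__(self, dim, n_basis):
        self.M = np.zeros((dim, dim))  # accumulated task statistic
        self.n_tasks = 0
        self.n_basis = n_basis
        self.B = None

    def update(self, M_t):
        """Fold in one task summary and recompute the basis.

        The maximizer of tr(B^T M B) subject to B^T B = I is given by the
        eigenvectors of M with the largest eigenvalues.
        """
        self.M += 0.5 * (M_t + M_t.T)  # symmetrize defensively
        self.n_tasks += 1
        _, vecs = np.linalg.eigh(self.M / self.n_tasks)
        self.B = vecs[:, -self.n_basis:]
        return self.B
```

Only `M` and the task count are stored between updates, so memory does not grow with the number of tasks seen so far.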
Updating with fixed and :
With the obtained center library and encoding matrix for the newly arriving th task, the optimization problem for variable can be written as:
(14) 
and the equivalent optimization problem can be formulated as the following equation:
(15)  
which is also the definition of the projection of on the Stiefel manifold. Further, denotes a diagonal matrix with each diagonal element as: [nie2010efficient], where is the th row of .
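The diagonal matrix described above plays the role of the reweighting term that commonly accompanies group sparse (row-wise) regularizers. A minimal sketch, assuming the standard form in which the i-th diagonal entry is 1 / (2 * ||l_i||_2), with l_i the i-th row of the library matrix; the small constant `eps` guarding against zero rows and the function name are assumptions of this sketch:

```python
import numpy as np

def row_norm_reweighting(L, eps=1e-8):
    """Diagonal matrix with entries 1 / (2 * ||l_i||_2) for each row l_i of L."""
    row_norms = np.linalg.norm(L, axis=1)
    return np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
```

With this choice, tr(L^T D L) equals half the sum of the row norms of L, which is why alternating between updating D and updating L is a common way to handle such a regularizer.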
Finally, the cluster assignment matrices for all learned tasks can be computed via , and the final indicator matrices are obtained using k-means. The whole optimization procedure is summarized in Algorithm 1.
Table 1: Clustering performance on the WebKB4 dataset.

| Task | Metric | stSC | uSC | OnestepSC | MBC | SMBC | SMKC | MTSC | MTCMRL | Ours |
|---|---|---|---|---|---|---|---|---|---|---|
| Task1 | Purity (%) | 62.66±0.00 | 59.78±0.31 | 66.89±0.63 | 63.95±4.07 | 64.62±4.05 | 60.59±3.70 | 65.92±0.68 | 74.40±1.16 | 80.00±1.25 |
| Task1 | NMI (%) | 13.95±0.00 | 13.15±1.68 | 14.56±3.44 | 26.44±3.73 | 25.53±2.74 | 14.14±4.38 | 25.73±0.98 | 38.71±1.47 | 49.07±1.41 |
| Task1 | RI (%) | 59.89±0.00 | 58.83±0.04 | 64.76±1.06 | 61.64±3.58 | 62.58±2.65 | 59.45±1.62 | 62.85±0.76 | 73.47±0.64 | 79.05±3.67 |
| Task2 | Purity (%) | 62.00±0.00 | 67.00±0.28 | 68.40±0.02 | 68.12±1.81 | 68.06±0.92 | 60.73±2.56 | 69.00±0.84 | 72.08±2.19 | 74.40±1.13 |
| Task2 | NMI (%) | 16.72±0.00 | 20.28±1.81 | 20.56±2.39 | 27.22±3.92 | 27.02±3.61 | 13.58±3.52 | 26.57±1.63 | 33.42±3.25 | 41.89±1.49 |
| Task2 | RI (%) | 57.12±0.00 | 60.38±2.06 | 64.81±1.52 | 68.04±2.46 | 68.32±3.29 | 58.31±1.19 | 66.57±0.85 | 69.94±1.72 | 74.79±0.13 |
| Task3 | Purity (%) | 69.21±0.27 | 59.80±0.27 | 69.80±0.55 | 64.86±5.36 | 68.04±2.28 | 66.01±4.13 | 68.23±0.55 | 76.47±3.15 | 74.12±1.10 |
| Task3 | NMI (%) | 29.24±0.30 | 15.60±2.42 | 22.55±2.36 | 26.50±3.97 | 28.32±3.86 | 22.09±5.95 | 29.33±0.99 | 40.97±5.26 | 44.69±3.68 |
| Task3 | RI (%) | 66.57±0.19 | 61.84±0.60 | 66.16±0.22 | 65.86±4.09 | 67.34±3.23 | 65.02±2.41 | 65.56±0.87 | 76.34±4.85 | 78.53±1.97 |
| Task4 | Purity (%) | 69.61±0.00 | 70.42±0.23 | 71.31±0.92 | 72.18±4.17 | 71.21±4.08 | 69.82±2.58 | 69.93±0.46 | 78.23±2.68 | 80.06±0.18 |
| Task4 | NMI (%) | 33.75±0.00 | 33.15±0.49 | 36.84±0.59 | 39.97±5.24 | 39.53±2.74 | 30.31±4.17 | 45.64±0.66 | 49.23±2.17 | 49.26±0.79 |
| Task4 | RI (%) | 66.93±0.00 | 67.50±0.54 | 68.69±0.94 | 70.27±3.59 | 70.29±2.65 | 67.62±1.85 | 60.72±1.15 | 79.01±1.54 | 77.94±0.97 |
| Avg. | Purity (%) | 65.87±0.07 | 64.25±0.27 | 69.10±0.53 | 67.28±3.85 | 67.98±2.83 | 64.29±3.24 | 68.27±0.64 | 75.19±2.25 | 77.14±0.92 |
| Avg. | NMI (%) | 23.42±0.07 | 20.55±1.60 | 23.63±2.19 | 30.03±4.22 | 30.10±4.05 | 20.03±4.50 | 31.82±1.07 | 40.58±3.04 | 46.26±1.84 |
| Avg. | RI (%) | 62.63±0.05 | 62.14±0.81 | 66.11±0.94 | 66.45±3.43 | 70.29±2.65 | 62.60±1.76 | 63.93±0.91 | 74.69±2.19 | 77.58±1.68 |
Table 2: Clustering performance on the Reuters dataset.

| Task | Metric | stSC | uSC | OnestepSC | MBC | SMBC | SMKC | MTSC | MTCMRL | Ours |
|---|---|---|---|---|---|---|---|---|---|---|
| Task1 | Purity (%) | 95.63±0.00 | 85.44±0.00 | 94.66±0.00 | 73.30±9.27 | 89.90±1.40 | 95.75±0.72 | 97.57±0.00 | 97.57±0.00 | 98.06±0.00 |
| Task1 | NMI (%) | 82.72±0.00 | 60.54±0.00 | 75.89±1.52 | 61.39±2.32 | 77.92±3.31 | 84.17±2.05 | 89.49±0.00 | 89.49±0.00 | 91.19±0.00 |
| Task1 | RI (%) | 94.64±0.00 | 82.22±0.00 | 91.44±1.06 | 73.83±7.26 | 88.35±1.77 | 94.35±0.88 | 96.83±0.00 | 96.83±0.00 | 97.43±0.00 |
| Task2 | Purity (%) | 84.62±0.00 | 70.00±0.00 | 86.92±0.00 | 70.19±0.73 | 92.88±0.38 | 90.96±1.15 | 96.15±0.54 | 97.31±0.54 | 98.23±0.09 |
| Task2 | NMI (%) | 62.91±0.00 | 53.17±0.00 | 64.45±0.00 | 53.43±7.81 | 79.54±1.27 | 75.76±2.65 | 84.89±1.62 | 88.93±2.46 | 91.70±1.01 |
| Task2 | RI (%) | 80.83±0.00 | 75.95±0.00 | 82.52±0.00 | 71.77±1.08 | 90.44±0.44 | 88.12±1.35 | 95.07±0.55 | 96.41±0.77 | 98.11±0.05 |
| Task3 | Purity (%) | 75.26±0.00 | 82.63±0.00 | 76.05±1.86 | 72.36±9.78 | 75.24±2.98 | 76.50±2.07 | 90.79±0.37 | 94.21±0.00 | 95.26±0.74 |
| Task3 | NMI (%) | 54.00±0.00 | 59.85±0.00 | 61.74±1.44 | 46.35±6.70 | 54.11±5.41 | 52.72±2.79 | 73.37±0.66 | 79.45±0.00 | 78.62±0.47 |
| Task3 | RI (%) | 70.14±0.00 | 78.01±0.00 | 74.64±1.54 | 74.34±3.64 | 70.01±4.33 | 72.73±2.89 | 88.33±0.49 | 93.13±0.00 | 93.07±0.51 |
| Avg. | Purity (%) | 85.17±0.00 | 79.36±0.00 | 85.88±0.62 | 71.95±6.59 | 86.01±1.59 | 87.74±1.32 | 94.96±0.46 | 96.36±0.18 | 97.18±0.74 |
| Avg. | NMI (%) | 66.54±0.00 | 79.36±0.18 | 67.35±0.99 | 53.72±5.61 | 70.52±3.33 | 70.88±2.50 | 83.63±1.14 | 85.96±0.82 | 87.71±0.47 |
| Avg. | RI (%) | 81.87±0.00 | 78.73±0.90 | 82.87±0.87 | 73.31±7.33 | 82.93±2.18 | 85.07±1.71 | 93.54±0.52 | 95.45±0.26 | 96.23±0.50 |
Table 3: Clustering performance on the 20NewsGroups dataset.

| Task | Metric | stSC | uSC | OnestepSC | MBC | SMBC | SMKC | MTSC | MTCMRL | Ours |
|---|---|---|---|---|---|---|---|---|---|---|
| Task1 | Purity (%) | 63.89±0.15 | 44.52±0.49 | 66.53±1.98 | 47.69±2.13 | 50.45±5.41 | 73.89±1.36 | 77.27±0.78 | 81.59±1.45 | 81.05±1.05 |
| Task1 | NMI (%) | 30.77±0.33 | 4.35±0.33 | 38.74±1.10 | 19.29±2.76 | 24.80±3.18 | 37.75±2.68 | 45.35±0.83 | 49.38±1.55 | 46.38±0.62 |
| Task1 | RI (%) | 61.27±0.30 | 56.32±0.24 | 65.54±1.48 | 48.93±7.45 | 54.19±0.72 | 72.09±1.17 | 74.31±0.69 | 78.45±1.47 | 78.60±0.15 |
| Task2 | Purity (%) | 53.54±0.48 | 40.89±0.00 | 55.97±0.13 | 48.56±2.96 | 50.46±1.31 | 66.81±1.44 | 63.55±0.78 | 65.06±0.77 | 73.47±0.09 |
| Task2 | NMI (%) | 34.68±0.20 | 9.92±0.00 | 32.86±0.08 | 21.27±3.45 | 23.23±7.97 | 40.76±2.88 | 42.52±0.33 | 44.21±0.39 | 52.75±0.41 |
| Task2 | RI (%) | 60.08±0.66 | 65.51±0.00 | 62.54±0.17 | 64.31±2.16 | 63.82±4.60 | 76.26±1.01 | 70.23±0.21 | 72.19±0.18 | 81.17±0.05 |
| Task3 | Purity (%) | 59.07±0.00 | 54.74±0.00 | 59.87±1.68 | 49.85±3.05 | 52.34±1.43 | 60.40±2.15 | 68.86±1.26 | 77.86±0.69 | 83.73±0.11 |
| Task3 | NMI (%) | 34.58±0.09 | 17.63±0.00 | 39.25±1.93 | 20.53±5.41 | 23.37±4.01 | 30.24±1.12 | 38.81±1.56 | 46.05±1.31 | 55.54±0.37 |
| Task3 | RI (%) | 61.08±0.01 | 58.10±0.00 | 61.47±1.51 | 48.35±2.76 | 52.67±0.89 | 65.23±0.98 | 64.06±1.30 | 75.14±0.58 | 82.06±0.14 |
| Task4 | Purity (%) | 51.51±0.14 | 52.35±0.45 | 54.37±0.29 | 46.33±2.86 | 75.18±4.77 | 68.69±0.35 | 67.35±0.35 | 74.85±0.89 | 72.08±3.19 |
| Task4 | NMI (%) | 32.53±0.32 | 26.13±0.87 | 34.12±0.73 | 21.37±3.48 | 44.09±4.78 | 41.15±0.95 | 44.03±0.31 | 54.02±0.65 | 56.71±1.33 |
| Task4 | RI (%) | 52.54±0.19 | 64.70±0.25 | 56.27±0.38 | 46.61±2.70 | 78.99±2.71 | 74.68±0.41 | 70.35±0.41 | 78.56±0.74 | 82.29±1.25 |
| Avg. | Purity (%) | 56.99±0.19 | 48.12±0.23 | 59.18±1.02 | 48.11±2.75 | 57.01±4.35 | 67.45±1.33 | 69.25±0.64 | 74.91±0.93 | 77.73±1.11 |
| Avg. | NMI (%) | 33.03±0.24 | 14.51±0.30 | 36.24±0.96 | 20.62±3.78 | 28.12±4.98 | 37.48±1.93 | 42.68±0.76 | 48.39±0.98 | 52.84±0.68 |
| Avg. | RI (%) | 58.73±0.29 | 61.16±0.12 | 61.46±0.89 | 52.05±3.77 | 62.42±2.23 | 72.07±0.89 | 69.74±0.41 | 76.15±0.85 | 81.12±0.39 |
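Table 4: Runtime comparison (in seconds).

| Dataset | stSC | uSC | OnestepSC | MBC | SMBC | SMKC | MTSC | MTCMRL | Ours |
|---|---|---|---|---|---|---|---|---|---|
| WebKB4 (s) | 1.22±0.01 | 1.21±0.03 | 600.91±26.60 | 6.97±1.08 | 5.77±0.14 | 34.79±0.47 | 69.72±1.26 | 14.51±1.30 | 2.69±0.02 |
| Reuters (s) | 0.87±0.20 | 1.31±0.22 | 1410.47±47.47 | 3.91±0.19 | 5.47±0.14 | 16.86±0.84 | 71.79±1.20 | 8.26±0.28 | 1.32±0.01 |
| 20NewsGroups (s) | 2.92±0.07 | 5.27±0.02 | 3500.16±77.70 | 19.19±1.04 | 26.54±1.30 | 316.22±3.53 | 44.01±3.53 | 384.52±19.55 | 9.95±0.29 |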
Experiments
This section evaluates the clustering performance of our proposed model via several empirical comparisons. We first introduce the competing models. The adopted datasets and experimental results are then provided, followed by some analyses of our model.
Comparison Models and Evaluation
The experiments in this subsection compare our proposed model with three single-task spectral clustering models and five multitask clustering models.
Single spectral clustering models: 1) Spectral Clustering (stSC) [ng2002spectral]: the standard spectral clustering model; 2) Spectral clustering-union (uSC) [ng2002spectral]: a spectral clustering model obtained by collecting all the clustering task data (i.e., “pooling” all the task data and ignoring the multitask setting); 3) One-step spectral clustering (OnestepSC) [zhu2017one]: a single-task spectral clustering model.
Multitask clustering models: 1) Multitask Bregman Clustering (MBC) [zhang2011multitask]: this model consists of an average Bregman divergence and a task regularization; 2) Smart Multitask Bregman Clustering (SMBC) [zhang2015smart]: an unsupervised transfer learning model, which focuses on clustering a small collection of target unlabeled data with the help of auxiliary unlabeled data; 3) Smart Multitask Kernel Clustering (SMKC) [zhang2015smart]: this model can deal with nonlinear data by introducing a Mercer kernel; 4) MultiTask Spectral Clustering (MTSC) [yang2015multitask]: this model performs spectral clustering over multiple related tasks by using their inter-task correlations; 5) MultiTask Clustering with Model Relation Learning (MTCMRL) [zhang2018multi]: this model can automatically learn the model parameter relatedness between each pair of tasks.
For the evaluation, we adopt three performance measures: normalized mutual information (NMI), clustering purity (Purity) and rand index (RI) [schutze2008introduction]. The larger the values of NMI, Purity and RI, the better the clustering performance of the corresponding model. We implement all the models in MATLAB, and all the parameters of the models are tuned in . Although different ’s are allowed for different tasks in our model, in this paper we only differentiate between and .
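For concreteness, the three measures can be computed from the contingency table between ground-truth classes and predicted clusters. The following Python sketch is an illustration rather than the evaluation code used in the experiments (which are implemented in MATLAB); it assumes NMI is normalized by the arithmetic mean of the two entropies, and the function names are hypothetical.

```python
import numpy as np

def contingency(labels_true, labels_pred):
    """Count matrix: rows are true classes, columns are predicted clusters."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table

def purity(labels_true, labels_pred):
    """Fraction of samples that fall in the majority class of their cluster."""
    table = contingency(labels_true, labels_pred)
    return table.max(axis=0).sum() / table.sum()

def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two partitions agree."""
    table = contingency(labels_true, labels_pred)
    pairs = lambda x: x * (x - 1) / 2.0
    together_both = pairs(table).sum()
    together_true = pairs(table.sum(axis=1)).sum()
    together_pred = pairs(table.sum(axis=0)).sum()
    total = pairs(table.sum())
    # Agreements: pairs together in both, plus pairs separated in both.
    return (total + 2.0 * together_both - together_true - together_pred) / total

def nmi(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies."""
    joint = contingency(labels_true, labels_pred).astype(float)
    joint /= joint.sum()
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pt, pp)[nz]))
    h_mean = 0.5 * (-np.sum(pt * np.log(pt)) - np.sum(pp * np.log(pp)))
    return mi / h_mean if h_mean > 0 else 1.0
```

All three measures are invariant to a relabeling of the predicted clusters, so no explicit matching between cluster indices and class labels is needed.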
Real Datasets Experiment Results
According to whether the number of cluster centers is consistent or not, there are two different scenarios for multitask clustering: Cluster-consistent and Cluster-inconsistent. The Cluster-consistent scenario can be roughly divided into two cases: the same clustering task, and different clustering tasks with the same number of cluster centers. We thus use two datasets for this scenario: WebKB4 (http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo20/www/data/) with 2500 dimensions and Reuters (http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html) with 6370 dimensions. The WebKB4 dataset includes web pages collected from the computer science department websites of 4 universities (Cornell, Texas, Washington and Wisconsin), divided into 7 categories. Following the setting in [zhang2018multi], the 4 most populous categories (i.e., course, faculty, project and student) are chosen for clustering. Accordingly, for the Reuters dataset, the 4 most populous root categories (i.e., economic index, energy, food and metal) are chosen for clustering, and the total number of tasks is 3. For the Cluster-inconsistent scenario, we adopt the 20NewsGroups dataset (http://qwone.com/~jason/20Newsgroups/) with 3000 dimensions by following [zhang2018multi], which consists of news documents under 20 categories. Since “negative transfer” [zhou2015flexible] will happen when the cluster centers of multiple consecutive spectral tasks change significantly, the 4 most populous root categories (i.e., comp, rec, sci and talk) are selected for clustering, while the 1st and 3rd tasks are set to 3 categories, and the 2nd and 4th tasks are set to 4 categories.
The experimental results (competing models with their parameter settings are averaged over 10 random repetitions) are provided in Table 1, Table 2 and Table 3, where the task sequence for our model is random. From the presented results, we can notice that: 1) Our proposed lifelong spectral clustering model outperforms the single-task spectral clustering methods, since it can exploit the information among multiple related tasks, whereas the single-task spectral clustering models only use the information within each task. MTCMRL performs worse than our proposed model in most cases because, even though it incorporates the cross-task relatedness with a linear regression model, it does not consider the feature embedding correlations between each pair of clustering tasks. The reason why MTCMRL performs better than our model in Task1 of 20NewsGroups is that we set in this Cluster-inconsistent dataset, whereas the number of cluster centers is in Task1. 2) Beyond MTCMRL and the single-task spectral clustering models, our model also performs much better than the other multitask clustering models. This is because our model not only learns the latent cluster centers shared between each pair of tasks via the orthogonal basis library , but also controls the number of embedded features common across the clustering tasks. 3) Additionally, Table 4 shows the runtime comparisons between our model and the other single-task and multitask clustering models. Our model is faster than most of the multitask clustering models on the WebKB4, Reuters and 20NewsGroups datasets, e.g., SMBC and MTSC, and also faster than OnestepSC, while achieving better clustering performance. However, our model is slightly slower than stSC and uSC. This is because both stSC and uSC can obtain the cluster assignment matrix via a closed-form solution, i.e., the eigenvalue decomposition of the in Eq. (2). All experiments are performed on a computer with an Intel i7 CPU and 8G RAM.
Evaluating Lifelong learning: This subsection studies the lifelong learning property of our model by following [ruvolo2013ella], i.e., how the clustering performance changes as the number of clustering tasks increases. We adopt the WebKB4 dataset, set the sequence of learned tasks as Task1, Task2, Task3 and Task4, and present the clustering performance in Figure 2. As new clustering tasks are added step by step, the performance (i.e., Purity and NMI) on both the learned tasks and the task being learned improves gradually compared with stSC (the initial clustering result of each line in Figure 2), which shows that our model can continually accumulate knowledge and accomplish lifelong learning just like “human learning”. Furthermore, the performance of the early clustering tasks improves more noticeably than that of the succeeding ones, i.e., the early spectral clustering tasks benefit more from the stored knowledge than the later ones.
Parameter Investigation: To study how the parameters and affect the clustering performance of our model, for the WebKB4 dataset we repeat the experiment ten times by fixing one parameter and tuning the other in . As depicted in Figure 3, the clustering performance changes with different ratios of the parameters, which gives evidence that appropriate parameters can improve the generalization performance, e.g., for the WebKB4 dataset.
Convergence Analysis: To investigate the convergence of our proposed optimisation algorithm for solving the model, we plot the value of the total loss terms for each new task on the WebKB4 and 20NewsGroups datasets. As shown in Figure 4, the objective function values increase with respect to iterations, and the values for each new task approach a fixed point after a few iterations (e.g., fewer than 20 iterations for Task 4 on both datasets), i.e., although the convergence of our model cannot be proved directly in this paper, we find that it converges asymptotically on the real-world datasets.
Conclusion
This paper studies how to add new spectral clustering capability to an existing spectral clustering system without damaging its existing capabilities. Specifically, we propose a lifelong learning model that incorporates spectral clustering: lifelong spectral clustering (), which learns a library of orthogonal bases as a set of latent cluster centers, and a library of embedded features for all the spectral clustering tasks. When a new spectral clustering task arrives, our model can transfer the knowledge embedded in the shared knowledge libraries to encode the new task with an encoding matrix, and refine the libraries with the fresh knowledge. We have conducted experiments on several real-world datasets; the experimental results demonstrate the effectiveness and efficiency of our proposed model.