Lifelong Spectral Clustering

by   Gan Sun, et al.
Northeastern University

In the past decades, spectral clustering (SC) has become one of the most effective clustering algorithms. However, most previous studies focus on spectral clustering with a fixed task set, and cannot incorporate a new spectral clustering task without accessing previously learned tasks. In this paper, we explore the problem of spectral clustering in a lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC). Its goal is to efficiently learn a model for a new spectral clustering task by selectively transferring previously accumulated experience from a knowledge library. Specifically, the knowledge library of L2SC contains two components: 1) orthogonal basis library: capturing latent cluster centers among the clusters in each pair of tasks; 2) feature embedding library: embedding the feature manifold information shared among multiple related tasks. As a new spectral clustering task arrives, L2SC first transfers knowledge from both the basis library and the feature library to obtain an encoding matrix, and further redefines the library base over time to maximize performance across all the clustering tasks. Meanwhile, a general online update formulation is derived to alternately update the basis library and the feature library. Finally, empirical experiments on several real-world benchmark datasets demonstrate that our L2SC model can effectively improve clustering performance compared with other state-of-the-art spectral clustering algorithms.




Introduction

Spectral clustering algorithms [ng2002spectral, shi2000normalized] discover an embedding of the data by exploiting the manifold information contained in the sample distribution, and have shown state-of-the-art performance in many applications [li2015superpixel, zhao2017multi, Seg_Lichen_TIP18]. Beyond the single-task scenario, [yang2015multitask] proposes a multi-task spectral clustering model, which performs multiple clustering tasks jointly so that they reinforce each other. However, most recently proposed models [zhang2018multi, pang2018spectral, kang2018unified] focus on clustering with a fixed task set. When applied to a new task environment or extended with a new spectral clustering task, these models have to repeatedly access the previous clustering tasks, which can result in high energy consumption in real applications, e.g., in mobile applications. In this paper, we explore how to bring spectral clustering into the setting of lifelong machine learning.

Figure 1: Demonstration of our lifelong spectral clustering model, where different shapes come from different clusters. When a new clustering task arrives, knowledge is iteratively transferred from the orthogonal basis library and the feature embedding library to encode the new task.

For lifelong machine learning, recent works [ruvolo2013ella, isele2016using, xu2018lifelong, sun2018robust, sun2019representative] have explored methods of accumulating single tasks over time. Generally, lifelong learning utilizes knowledge from previously learned tasks to improve the performance on new tasks, and accumulates a knowledge library over time. Although these models have been successfully adopted in supervised learning [chen2018lifelong, sun2018active] and reinforcement learning [ammar2014online, isele2018selective], their application to spectral clustering, one of the most classical research problems in the machine learning community, is still sparse. Take news clustering tasks as an example: the semantic meanings of Artificial Intelligence and NBA are very dissimilar in the newspapers of year 2010, so the two should be divided into different clusters. The clustering task of year 2010 can thus contribute to the clustering task of year 2020 in a never-ending perspective, since the correlation between Artificial Intelligence and NBA in year 2020 is similar to that in year 2010.

Inspired by the above scenario, this paper aims to establish a lifelong learning system for spectral clustering tasks, i.e., lifelong spectral clustering. Generally, the main challenges among multiple consecutive clustering tasks are as follows: 1) Cluster Space Correlation: the latent cluster space should be consistent among multiple clustering tasks. For example, in the news clustering task, the cluster centers in year 2010 can be {Business, Technology, Science, etc.}, and the cluster centers in year 2020 are similar to those in year 2010; 2) Feature Embedding Correlation: another correlation among different clustering tasks is the feature correlation. For example, in consecutive news clustering tasks, the semantic meaning of Artificial Intelligence is very similar in year 2010 and year 2020. Thus, the feature embedding of Artificial Intelligence should be the same for these two tasks.

To tackle the challenges above, as shown in Figure 1, we propose a Lifelong Spectral Clustering (i.e., L2SC) model that integrates cluster space and feature embedding correlations, and can achieve never-ending knowledge transfer between previous clustering tasks and later ones. To achieve this, we present two knowledge libraries that preserve the common information among multiple clustering tasks, i.e., the orthogonal basis library and the feature embedding library. Specifically, 1) the orthogonal basis library contains a set of latent cluster centers, i.e., each sample of a clustering task can be effectively assigned to multiple clusters with different weights; 2) the feature embedding library is modeled by introducing bipartite graph co-clustering, which can not only discover the shared manifold information among clustering tasks, but also maintain the data manifold information of each individual task. When a new spectral clustering task arrives, L2SC first encodes it by transferring knowledge from both the orthogonal basis library and the feature embedding library. Accordingly, these two libraries can be refined over time to keep improving across all clustering tasks. For model optimization, we derive a general lifelong learning formulation, and solve the resulting optimization problem with an alternating direction strategy. Finally, we evaluate our proposed model against several spectral clustering algorithms and multi-task clustering models on several datasets. The experimental results strongly support our proposed model.

The novelties of our proposed model include:

  • To the best of our knowledge, this work is the first attempt to study the problem of spectral clustering in the lifelong learning setting, i.e., Lifelong Spectral Clustering (L2SC), which can use previously accumulated experience to incorporate new clustering tasks, and improve the clustering performance accordingly.

  • We present two common knowledge libraries: the orthogonal basis library and the feature embedding library, which preserve the latent cluster centers and capture the feature correlations among different clustering tasks, respectively.

  • We propose an alternating direction optimization algorithm to optimize the proposed model efficiently, which can incorporate fresh knowledge gradually from an online dictionary learning perspective. Various experiments show the superiority of our proposed model in terms of effectiveness and efficiency.

Related Work

In this section, we briefly review two topics: Multi-task Clustering and Lifelong Learning.

For Multi-task Clustering [zhang2018multi], the learning paradigm combines multi-task learning [sun2017joint] with unsupervised learning, and the key issue is how to transfer useful knowledge among different clustering tasks to improve the performance. Based on this assumption, recently proposed methods [zhang2017multi, huy2013feature] achieve knowledge transfer for clustering by using some samples from other tasks to form better distance metrics or k-nn graphs. However, these methods do not employ the task relationships in the knowledge transfer process. To preserve task relationships, multi-task Bregman clustering (MBC) [zhang2011multitask] captures them by alternately updating clusters among different tasks. For spectral clustering based multi-task clustering, multi-task spectral clustering (MTSC) [yang2015multitask] makes the first attempt to extend spectral clustering to multi-task learning. By using the inter-task and intra-task correlations, an ℓ2,p-norm regularizer is adopted in MTSC to constrain the coherence of all the tasks, based on the assumption that a low-dimensional representation is shared by related tasks. Then a mapping function is learned to predict cluster labels for each individual task.

For Lifelong Learning, the early works on this topic focus on transferring selective information from task clusters to new tasks [thrun1996discovering, sun2018lifelong], or on transferring invariance knowledge in neural networks [thrun2012explanation]. In contrast, the efficient lifelong learning algorithm (ELLA) [ruvolo2013ella] is developed for learning multiple tasks online in the lifelong learning setting. By assuming that the models of all related tasks share a common basis, each new task can be obtained by transferring knowledge from the basis. Furthermore, [ammar2014online] extends this idea to learning decision-making tasks consecutively, and achieves dramatically accelerated learning on a variety of dynamical systems; [isele2016using] proposes a coupled dictionary to incorporate task descriptors into lifelong learning, which enables zero-shot transfer learning. Since the observed tasks in a lifelong learning system may not form an i.i.d. sample, learning an inductive bias in the form of a transfer procedure is proposed in [pentina2015lifelong]. Different from traditional learning models [rannen2017encoder], [li2016learning] proposes a learning without forgetting method for convolutional neural networks, which trains the network using only the data of the new task, and retains performance on the original tasks via knowledge distillation [hinton2015distilling]. Among the works discussed above, none concerns lifelong learning in the spectral clustering setting, and our current work represents the first to achieve lifelong spectral clustering.

Lifelong Spectral Clustering (L2SC)

This section introduces our proposed lifelong spectral clustering (L2SC) learning model. We first briefly review a general spectral clustering formulation for a single spectral clustering task. Our model for the lifelong spectral clustering problem is then given.

Revisit Spectral Clustering Algorithm

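This subsection reviews a general spectral clustering algorithm with normalized cut. Given an undirected similarity graph $G^t = \{X^t, W^t\}$ with a vertex set $X^t \in \mathbb{R}^{d \times n_t}$ and a corresponding affinity matrix $W^t \in \mathbb{R}^{n_t \times n_t}$ for the clustering task $t$, where $d$ is the number of features and $n_t$ is the total number of data samples for task $t$, each element $w_{ij}^t$ of the symmetric matrix $W^t$ denotes the similarity between a pair of vertices $(x_i^t, x_j^t)$. A common choice for the matrix $W^t$ is:

$$w_{ij}^t = \begin{cases} \exp\left(-\dfrac{\|x_i^t - x_j^t\|^2}{2\sigma^2}\right), & \text{if } x_i^t \in \mathcal{N}(x_j^t) \text{ or } x_j^t \in \mathcal{N}(x_i^t), \\ 0, & \text{otherwise}, \end{cases}$$

where $\mathcal{N}(\cdot)$ is the function for searching nearest neighbors, and $\sigma$ controls the spread of the neighbors. After applying the normalized Laplacian:

$$K^t = (D^t)^{-1/2}\, W^t\, (D^t)^{-1/2}, \qquad (1)$$

where $D^t$ is a diagonal matrix with diagonal elements $D_{ii}^t = \sum_j w_{ij}^t$, the final formulation of spectral clustering turns out to be the well-known normalized cut [shi2000normalized], and can be expressed as:

$$\max_{F^t} \ \mathrm{tr}\big((F^t)^\top K^t F^t\big), \quad \text{s.t. } (F^t)^\top F^t = I_k, \qquad (2)$$

where the optimal cluster assignment matrix $F^t \in \mathbb{R}^{n_t \times k}$ can be obtained via the eigenvalue decomposition of the matrix $K^t$. Based on this relaxed continuous solution, the final discrete solution of $F^t$ can be obtained by spectral rotation or $k$-means, e.g., the $(i,j)$-th element of $F^t$ is 1 if the sample $x_i^t$ is assigned to the $j$-th cluster, and 0 otherwise.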
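The single-task pipeline reviewed in this subsection can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, samples are stored as rows (the transpose of the features-by-samples convention used in the text), and the final discretization step (spectral rotation or k-means) is omitted.

```python
import numpy as np

def knn_gaussian_affinity(X, n_neighbors=5, sigma=1.0):
    """Symmetric kNN Gaussian affinity; X has shape (n_samples, n_features)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-dist2 / (2.0 * sigma ** 2))
    # Exclude self-matches when searching for neighbors.
    d = dist2.copy()
    np.fill_diagonal(d, np.inf)
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    mask = np.zeros(W.shape, dtype=bool)
    rows = np.repeat(np.arange(len(X)), n_neighbors)
    mask[rows, nearest.ravel()] = True
    # Keep an edge if either endpoint lists the other as a neighbor.
    mask = mask | mask.T
    return W * mask

def normalized_affinity(W):
    """D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix."""
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]

def spectral_embedding(K, k):
    """Relaxed assignment matrix: eigenvectors of the k largest eigenvalues."""
    _, vecs = np.linalg.eigh(K)  # eigenvalues are returned in ascending order
    return vecs[:, -k:]
```

The returned embedding has orthonormal columns, so it satisfies the orthogonality constraint of the relaxed normalized-cut problem; its rows are what a discretization step would then cluster.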

Problem Statement

We are given a set of $m$ unsupervised clustering tasks, where each individual clustering task $t$ has a set of $n_t$ training data samples $X^t$, and the dimensionality of the feature space is $d$. The original intention of the multi-task spectral clustering method [yang2015multitask] is to uncover the correlations among all the clustering tasks, and to predict the cluster assignment matrix for each clustering task. However, learning incremental spectral clustering tasks without accessing the previously adopted clustering data is not considered in traditional single-task or multi-task spectral clustering models. In the lifelong setting, a spectral clustering system encounters a series of spectral clustering tasks, where each task is defined in Eq. (2), and intends to obtain a new cluster assignment matrix for each arriving task. For convenience, this paper assumes that the learner in this lifelong machine learning system does not know any information about the clustering tasks in advance, e.g., the task distributions or the total number of spectral clustering tasks $m$. When the lifelong spectral clustering system receives a batch of data for some spectral clustering task (either a new task or a previously learned task) in each period, it should obtain the cluster assignment matrix for the samples of the encountered tasks. The goal is to obtain the corresponding task assignment matrices such that: 1) Clustering Performance: each obtained assignment matrix should preserve the data configuration of the $t$-th task, and partition the new clustering task more accurately; 2) Computational Speed: in each clustering period, obtaining each assignment matrix should be faster than with traditional multi-task spectral clustering methods; 3) Lifelong Learning: new assignment matrices can be arbitrarily and efficiently added when the lifelong clustering system faces new unsupervised spectral clustering tasks.

The Proposed Model

In this section, we introduce how to model the lifelong learning property and the cross-task correlations simultaneously. Basically, the model has two components:

1) Orthogonal Basis Library: in order to achieve lifelong learning, one of the major questions is how to store the previously accumulated experience, i.e., the knowledge library. To tackle this issue, inspired by [han2015unsupervised], which employs orthogonal basis clustering to uncover the latent cluster centers, each assignment matrix is decomposed into the product of two submatrices: a basis matrix, called the orthogonal basis library, and a cluster encoding matrix. Then the multi-task spectral clustering formulation can be expressed as:


where the orthogonality constraint on the basis library encourages its columns to be independent, and the normalized affinity matrix of each task is defined in Eq. (1). Therefore, the orthogonal basis library can be used to refine the latent cluster centers and further obtain an excellent cluster separation.

2) Feature Embedding Library: even though the latent cluster centers can be captured gradually in Eq. (3), this formulation does not consider the transfer of common feature embeddings across multiple spectral clustering tasks. Motivated by [jiang2012transfer], which adopts graph-based co-clustering to control and achieve knowledge transfer between two tasks, we propose to link each pair of clustering tasks together such that an embedding obtained in one task can facilitate the discovery of the embedding in another task. We thus define an invariant feature embedding library with a group-sparse constraint, and give the graph co-clustering term as:


and for the -th task is defined as:


where , and . Intuitively, with this sharing embedding library , multiple spectral clustering tasks can transfer embedding knowledge with each other in a perspective of common feature learning [Argyriou:2008].

Given the same graph construction method and the training data for each spectral clustering task, we solve for the optimal cluster assignment matrix while encouraging each clustering task to share common knowledge in the two libraries. By combining the two goals in Eq. (3) and Eq. (4), the lifelong spectral clustering model can be expressed as the following objective function:


where the trade-off parameters balance each spectral clustering task against the co-clustering objective. If they are set to zero, this model reduces to the multi-task spectral clustering model with common cluster centers.

Algorithm 1 Lifelong Spectral Clustering (L2SC) Model
Input: Spectral clustering tasks: , Library: , , , Statistical records: , ;
while Receive clustering task data do
   New -th task: ;
   Construct matrices ;
   while Not Converge do
      Update the encoding matrix via Eq. (7);
      Update the orthogonal basis library via Eq. (10);
      Update the feature embedding library via Eq. (14);
      Update via ;
   end while
   Compute cluster assignment matrices via ;
   Compute final indicator matrices via k-means;
end while

Model Optimization

This section shows how to optimize our proposed model. A standard alternating direction strategy that uses all the learned tasks is inefficient for the lifelong learning model in Eq. (6). Our goal in this paper is to build a lifelong clustering algorithm whose CPU time and memory cost are both lower than those of the offline manner. When a new spectral clustering task arrives, the basic idea for optimizing Eq. (6) is that all model variables should be updated without accessing the previously learned tasks, e.g., their previous data matrices. In the following, we briefly introduce the proposed update rules, and provide the convergence analysis in the experiment section.

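Updating the encoding matrix with fixed orthogonal basis library and feature embedding library:

With the orthogonal basis library and the feature embedding library fixed, the problem of solving for the encoding matrix can be expressed as:


With the orthonormality constraint, the encoding matrix can be updated on the Stiefel manifold [manton2002optimization], whose projection is defined by the following Proposition.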

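Proposition 1.

Let $A \in \mathbb{R}^{n \times p}$ be a rank-$p$ matrix, where the singular value decomposition (i.e., SVD) of $A$ is $A = U \Sigma V^\top$. The projection of the matrix $A$ on the Stiefel manifold is defined as:

$$\pi(A) = \arg\min_{Q^\top Q = I} \|A - Q\|_F^2. \qquad (8)$$

The projection can be calculated as $\pi(A) = U I_{n,p} V^\top$.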

Therefore, we can update the encoding matrix $E^t$ by moving it in the direction that increases the value of the objective function, and the update operator can be given as:

$$E^t \leftarrow \pi\big(E^t + \eta\, \nabla f(E^t)\big), \qquad (9)$$

where $\eta$ is the step size, $f$ is the objective function of Eq. (7), and $\nabla f(E^t)$ is its gradient with respect to $E^t$. To support the convergence of the optimization problem in Eq. (7), we provide a convergence analysis in the experiment section.
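The projection in Proposition 1 and the projected ascent update can be sketched as follows. This is a minimal illustration, assuming NumPy and hypothetical function names; because the gradient of Eq. (7) is not reproduced here, the sketch takes the gradient as an argument, and the toy trace objective shown in the usage lines is a stand-in chosen for illustration only.

```python
import numpy as np

def project_to_stiefel(A):
    """Closest matrix with orthonormal columns to A in Frobenius norm.

    With the thin SVD A = U S V^T, the projection is U V^T.
    """
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def ascent_step(E, grad, step_size):
    """One update: move along the gradient, then project back."""
    return project_to_stiefel(E + step_size * grad)

# Toy usage: maximize tr(E^T S E) for a fixed symmetric S (stand-in objective).
rng = np.random.default_rng(0)
S = np.diag([3.0, 2.0, 1.0, 0.5])
E = project_to_stiefel(rng.normal(size=(4, 2)))
for _ in range(200):
    E = ascent_step(E, 2.0 * S @ E, step_size=0.1)  # gradient of the toy objective
```

Every iterate keeps orthonormal columns, so the constraint never has to be enforced separately; only the step size needs tuning.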

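Updating the orthogonal basis library with fixed feature embedding library and encoding matrix:

With the encoding matrix obtained for the newly arriving task, the optimization problem for the orthogonal basis library can be written as: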


Based on the orthonormality constraint , we can rewrite Eq. (10) as follows:


To better store the previous knowledge of learned clustering tasks, we then introduce two statistical variables:


where , and . Therefore, knowledge of new task is and . With as a warm start, so:


It is well-known that the solution of can be relaxedly obtained by the eigen-decomposition of . Notice that even though the input parameter of Eq. (13) contains , the above solution is also effective since the proposed algorithm converges very quickly in the online manner.

Updating the feature embedding library with fixed orthogonal basis library and encoding matrix:

With the center library and the encoding matrix obtained for the newly arriving task, the optimization problem for the feature embedding library can be denoted as:


and the equivalent optimization problem can be formulated as following equations:


which is also definition of projection of on the Stiefel manifold. Further, denotes a diagonal matrix with each diagonal element as: [nie2010efficient], where is the -th row of .
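The diagonal matrix mentioned here can be illustrated with the standard reweighting used for group-sparse (l2,1-norm) terms in [nie2010efficient], where each diagonal entry is the reciprocal of twice the Euclidean norm of the corresponding row. The sketch below assumes that standard form; the function names, the variable name `library` and the small `eps` safeguard against zero rows are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def l21_norm(library):
    """Sum of the Euclidean norms of the rows (group-sparse penalty)."""
    return float(np.sum(np.linalg.norm(library, axis=1)))

def l21_reweighting_matrix(library, eps=1e-8):
    """Diagonal matrix with entries 1 / (2 * ||row_i||_2).

    eps is an assumed safeguard that avoids division by zero for all-zero rows.
    """
    row_norms = np.linalg.norm(library, axis=1)
    return np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))

library = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
D = l21_reweighting_matrix(library)
# With D held fixed, tr(L^T D L) equals half of the l2,1 norm of L.
surrogate = float(np.trace(library.T @ D @ library))
```

Holding the diagonal matrix fixed turns the non-smooth penalty into a quadratic term, which is why it is recomputed after each update of the library.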

Finally, the cluster assignment matrices for all learned tasks can be computed from the orthogonal basis library and the corresponding encoding matrices, and the final indicator matrices are obtained using k-means. The whole optimization procedure is summarized in Algorithm 1.
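The last discretization step can be sketched as below: the rows of a relaxed cluster assignment matrix are clustered with k-means and converted into a 0/1 indicator matrix. This is a minimal sketch with hypothetical function names; it takes the relaxed matrix as given (however it was computed), and uses a plain Lloyd iteration with farthest-first initialization, chosen here only to keep the example deterministic.

```python
import numpy as np

def kmeans_labels(F, k, n_iter=100):
    """Plain Lloyd k-means on the rows of F with farthest-first initialization."""
    F = np.asarray(F, dtype=float)
    centers = [F[0]]
    for _ in range(1, k):
        d = np.min([np.sum((F - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers)
    labels = np.full(len(F), -1)
    for _ in range(n_iter):
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels

def indicator_matrix(labels, k):
    """One row per sample, with a single 1 in the column of its cluster."""
    Y = np.zeros((len(labels), k))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

F_relaxed = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = kmeans_labels(F_relaxed, k=2)
Y = indicator_matrix(labels, k=2)
```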

| Task | Metric | stSC | uSC | OnestepSC | MBC | SMBC | SMKC | MTSC | MTCMRL | L2SC (ours) |
|---|---|---|---|---|---|---|---|---|---|---|
| Task1 | Purity(%) | 62.66±0.00 | 59.78±0.31 | 66.89±0.63 | 63.95±4.07 | 64.62±4.05 | 60.59±3.70 | 65.92±0.68 | 74.40±1.16 | 80.00±1.25 |
| Task1 | NMI(%) | 13.95±0.00 | 13.15±1.68 | 14.56±3.44 | 26.44±3.73 | 25.53±2.74 | 14.14±4.38 | 25.73±0.98 | 38.71±1.47 | 49.07±1.41 |
| Task1 | RI(%) | 59.89±0.00 | 58.83±0.04 | 64.76±1.06 | 61.64±3.58 | 62.58±2.65 | 59.45±1.62 | 62.85±0.76 | 73.47±0.64 | 79.05±3.67 |
| Task2 | Purity(%) | 62.00±0.00 | 67.00±0.28 | 68.40±0.02 | 68.12±1.81 | 68.06±0.92 | 60.73±2.56 | 69.00±0.84 | 72.08±2.19 | 74.40±1.13 |
| Task2 | NMI(%) | 16.72±0.00 | 20.28±1.81 | 20.56±2.39 | 27.22±3.92 | 27.02±3.61 | 13.58±3.52 | 26.57±1.63 | 33.42±3.25 | 41.89±1.49 |
| Task2 | RI(%) | 57.12±0.00 | 60.38±2.06 | 64.81±1.52 | 68.04±2.46 | 68.32±3.29 | 58.31±1.19 | 66.57±0.85 | 69.94±1.72 | 74.79±0.13 |
| Task3 | Purity(%) | 69.21±0.27 | 59.80±0.27 | 69.80±0.55 | 64.86±5.36 | 68.04±2.28 | 66.01±4.13 | 68.23±0.55 | 76.47±3.15 | 74.12±1.10 |
| Task3 | NMI(%) | 29.24±0.30 | 15.60±2.42 | 22.55±2.36 | 26.50±3.97 | 28.32±3.86 | 22.09±5.95 | 29.33±0.99 | 40.97±5.26 | 44.69±3.68 |
| Task3 | RI(%) | 66.57±0.19 | 61.84±0.60 | 66.16±0.22 | 65.86±4.09 | 67.34±3.23 | 65.02±2.41 | 65.56±0.87 | 76.34±4.85 | 78.53±1.97 |
| Task4 | Purity(%) | 69.61±0.00 | 70.42±0.23 | 71.31±0.92 | 72.18±4.17 | 71.21±4.08 | 69.82±2.58 | 69.93±0.46 | 78.23±2.68 | 80.06±0.18 |
| Task4 | NMI(%) | 33.75±0.00 | 33.15±0.49 | 36.84±0.59 | 39.97±5.24 | 39.53±2.74 | 30.31±4.17 | 45.64±0.66 | 49.23±2.17 | 49.26±0.79 |
| Task4 | RI(%) | 66.93±0.00 | 67.50±0.54 | 68.69±0.94 | 70.27±3.59 | 70.29±2.65 | 67.62±1.85 | 60.72±1.15 | 79.01±1.54 | 77.94±0.97 |
| Avg. | Purity(%) | 65.87±0.07 | 64.25±0.27 | 69.10±0.53 | 67.28±3.85 | 67.98±2.83 | 64.29±3.24 | 68.27±0.64 | 75.19±2.25 | 77.14±0.92 |
| Avg. | NMI(%) | 23.42±0.07 | 20.55±1.60 | 23.63±2.19 | 30.03±4.22 | 30.10±4.05 | 20.03±4.50 | 31.82±1.07 | 40.58±3.04 | 46.26±1.84 |
| Avg. | RI(%) | 62.63±0.05 | 62.14±0.81 | 66.11±0.94 | 66.45±3.43 | 70.29±2.65 | 62.60±1.76 | 63.93±0.91 | 74.69±2.19 | 77.58±1.68 |

Table 1: Comparison results in terms of 3 different metrics (mean ± standard deviation) on WebKB4 dataset.
| Task | Metric | stSC | uSC | OnestepSC | MBC | SMBC | SMKC | MTSC | MTCMRL | L2SC (ours) |
|---|---|---|---|---|---|---|---|---|---|---|
| Task1 | Purity(%) | 95.63±0.00 | 85.44±0.00 | 94.66±0.00 | 73.30±9.27 | 89.90±1.40 | 95.75±0.72 | 97.57±0.00 | 97.57±0.00 | 98.06±0.00 |
| Task1 | NMI(%) | 82.72±0.00 | 60.54±0.00 | 75.89±1.52 | 61.39±2.32 | 77.92±3.31 | 84.17±2.05 | 89.49±0.00 | 89.49±0.00 | 91.19±0.00 |
| Task1 | RI(%) | 94.64±0.00 | 82.22±0.00 | 91.44±1.06 | 73.83±7.26 | 88.35±1.77 | 94.35±0.88 | 96.83±0.00 | 96.83±0.00 | 97.43±0.00 |
| Task2 | Purity(%) | 84.62±0.00 | 70.00±0.00 | 86.92±0.00 | 70.19±0.73 | 92.88±0.38 | 90.96±1.15 | 96.15±0.54 | 97.31±0.54 | 98.23±0.09 |
| Task2 | NMI(%) | 62.91±0.00 | 53.17±0.00 | 64.45±0.00 | 53.43±7.81 | 79.54±1.27 | 75.76±2.65 | 84.89±1.62 | 88.93±2.46 | 91.70±1.01 |
| Task2 | RI(%) | 80.83±0.00 | 75.95±0.00 | 82.52±0.00 | 71.77±1.08 | 90.44±0.44 | 88.12±1.35 | 95.07±0.55 | 96.41±0.77 | 98.11±0.05 |
| Task3 | Purity(%) | 75.26±0.00 | 82.63±0.00 | 76.05±1.86 | 72.36±9.78 | 75.24±2.98 | 76.50±2.07 | 90.79±0.37 | 94.21±0.00 | 95.26±0.74 |
| Task3 | NMI(%) | 54.00±0.00 | 59.85±0.00 | 61.74±1.44 | 46.35±6.70 | 54.11±5.41 | 52.72±2.79 | 73.37±0.66 | 79.45±0.00 | 78.62±0.47 |
| Task3 | RI(%) | 70.14±0.00 | 78.01±0.00 | 74.64±1.54 | 74.34±3.64 | 70.01±4.33 | 72.73±2.89 | 88.33±0.49 | 93.13±0.00 | 93.07±0.51 |
| Avg. | Purity(%) | 85.17±0.00 | 79.36±0.00 | 85.88±0.62 | 71.95±6.59 | 86.01±1.59 | 87.74±1.32 | 94.96±0.46 | 96.36±0.18 | 97.18±0.74 |
| Avg. | NMI(%) | 66.54±0.00 | 79.36±0.18 | 67.35±0.99 | 53.72±5.61 | 70.52±3.33 | 70.88±2.50 | 83.63±1.14 | 85.96±0.82 | 87.71±0.47 |
| Avg. | RI(%) | 81.87±0.00 | 78.73±0.90 | 82.87±0.87 | 73.31±7.33 | 82.93±2.18 | 85.07±1.71 | 93.54±0.52 | 95.45±0.26 | 96.23±0.50 |

Table 2: Comparison results in terms of 3 different metrics (mean ± standard deviation) on Reuters dataset.
| Task | Metric | stSC | uSC | OnestepSC | MBC | SMBC | SMKC | MTSC | MTCMRL | L2SC (ours) |
|---|---|---|---|---|---|---|---|---|---|---|
| Task1 | Purity(%) | 63.89±0.15 | 44.52±0.49 | 66.53±1.98 | 47.69±2.13 | 50.45±5.41 | 73.89±1.36 | 77.27±0.78 | 81.59±1.45 | 81.05±1.05 |
| Task1 | NMI(%) | 30.77±0.33 | 4.35±0.33 | 38.74±1.10 | 19.29±2.76 | 24.80±3.18 | 37.75±2.68 | 45.35±0.83 | 49.38±1.55 | 46.38±0.62 |
| Task1 | RI(%) | 61.27±0.30 | 56.32±0.24 | 65.54±1.48 | 48.93±7.45 | 54.19±0.72 | 72.09±1.17 | 74.31±0.69 | 78.45±1.47 | 78.60±0.15 |
| Task2 | Purity(%) | 53.54±0.48 | 40.89±0.00 | 55.97±0.13 | 48.56±2.96 | 50.46±1.31 | 66.81±1.44 | 63.55±0.78 | 65.06±0.77 | 73.47±0.09 |
| Task2 | NMI(%) | 34.68±0.20 | 9.92±0.00 | 32.86±0.08 | 21.27±3.45 | 23.23±7.97 | 40.76±2.88 | 42.52±0.33 | 44.21±0.39 | 52.75±0.41 |
| Task2 | RI(%) | 60.08±0.66 | 65.51±0.00 | 62.54±0.17 | 64.31±2.16 | 63.82±4.60 | 76.26±1.01 | 70.23±0.21 | 72.19±0.18 | 81.17±0.05 |
| Task3 | Purity(%) | 59.07±0.00 | 54.74±0.00 | 59.87±1.68 | 49.85±3.05 | 52.34±1.43 | 60.40±2.15 | 68.86±1.26 | 77.86±0.69 | 83.73±0.11 |
| Task3 | NMI(%) | 34.58±0.09 | 17.63±0.00 | 39.25±1.93 | 20.53±5.41 | 23.37±4.01 | 30.24±1.12 | 38.81±1.56 | 46.05±1.31 | 55.54±0.37 |
| Task3 | RI(%) | 61.08±0.01 | 58.10±0.00 | 61.47±1.51 | 48.35±2.76 | 52.67±0.89 | 65.23±0.98 | 64.06±1.30 | 75.14±0.58 | 82.06±0.14 |
| Task4 | Purity(%) | 51.51±0.14 | 52.35±0.45 | 54.37±0.29 | 46.33±2.86 | 75.18±4.77 | 68.69±0.35 | 67.35±0.35 | 74.85±0.89 | 72.08±3.19 |
| Task4 | NMI(%) | 32.53±0.32 | 26.13±0.87 | 34.12±0.73 | 21.37±3.48 | 44.09±4.78 | 41.15±0.95 | 44.03±0.31 | 54.02±0.65 | 56.71±1.33 |
| Task4 | RI(%) | 52.54±0.19 | 64.70±0.25 | 56.27±0.38 | 46.61±2.70 | 78.99±2.71 | 74.68±0.41 | 70.35±0.41 | 78.56±0.74 | 82.29±1.25 |
| Avg. | Purity(%) | 56.99±0.19 | 48.12±0.23 | 59.18±1.02 | 48.11±2.75 | 57.01±4.35 | 67.45±1.33 | 69.25±0.64 | 74.91±0.93 | 77.73±1.11 |
| Avg. | NMI(%) | 33.03±0.24 | 14.51±0.30 | 36.24±0.96 | 20.62±3.78 | 28.12±4.98 | 37.48±1.93 | 42.68±0.76 | 48.39±0.98 | 52.84±0.68 |
| Avg. | RI(%) | 58.73±0.29 | 61.16±0.12 | 61.46±0.89 | 52.05±3.77 | 62.42±2.23 | 72.07±0.89 | 69.74±0.41 | 76.15±0.85 | 81.12±0.39 |

Table 3: Comparison results in terms of 3 different metrics (mean ± standard deviation) on 20NewsGroups dataset.
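| Dataset | stSC | uSC | OnestepSC | MBC | SMBC | SMKC | MTSC | MTCMRL | L2SC (ours) |
|---|---|---|---|---|---|---|---|---|---|
| WebKB4 (s) | 1.22±0.01 | 1.21±0.03 | 600.91±26.60 | 6.97±1.08 | 5.77±0.14 | 34.79±0.47 | 69.72±1.26 | 14.51±1.30 | 2.69±0.02 |
| Reuters (s) | 0.87±0.20 | 1.31±0.22 | 1410.47±47.47 | 3.91±0.19 | 5.47±0.14 | 16.86±0.84 | 71.79±1.20 | 8.26±0.28 | 1.32±0.01 |
| 20NewsGroups (s) | 2.92±0.07 | 5.27±0.02 | 3500.16±77.70 | 19.19±1.04 | 26.54±1.30 | 316.22±3.53 | 44.01±3.53 | 384.52±19.55 | 9.95±0.29 |

Table 4: Runtime (seconds) on a standard CPU of all competing models.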


Experiments

This section evaluates the clustering performance of our proposed model via several empirical comparisons. We first introduce the competing models. The adopted datasets and the experimental results are then provided, followed by some analyses of our model.

Comparison Models and Evaluation

The experiments in this subsection evaluate our proposed model with three single spectral clustering models, and five multi-task clustering models.

Single spectral clustering models: 1) Spectral Clustering (stSC) [ng2002spectral]: standard spectral clustering model; 2) Spectral clustering-union (uSC) [ng2002spectral]: spectral clustering model, which can be achieved via collecting all the clustering task data (i.e., “pooling” all the task data and ignoring the multi-task setting); 3) One-step spectral clustering (OnestepSC) [zhu2017one]: single spectral clustering task model.

Multi-task clustering models: 1) Multi-task Bregman Clustering (MBC) [zhang2011multitask]: this model consists of average Bregman divergence and a task regularization; 2) Smart Multi-task Bregman Clustering (SMBC) [zhang2015smart]: unsupervised transfer learning model, which focuses on clustering a small collection of target unlabeled data with the help of auxiliary unlabeled data; 3) Smart Multi-task Kernel Clustering (SMKC) [zhang2015smart]: this model can deal with nonlinear data by introducing Mercer kernel; 4) Multi-Task Spectral Clustering (MTSC) [yang2015multitask]: this model performs spectral clustering over multiple related tasks by using their inter-task correlations; 5) Multi-Task Clustering with Model Relation Learning (MTCMRL) [zhang2018multi]: this model can automatically learn the model parameter relatedness between each pair of tasks.

For the evaluation, we adopt three performance measures: normalized mutual information (NMI), clustering purity (Purity) and rand index (RI) [schutze2008introduction] to evaluate the clustering performance. The bigger the value of NMI, Purity and RI is, the better the clustering performance of the corresponding model will be. We implement all the models in MATLAB, and all the used parameters of the models are tuned in . Although different ’s are allowed for different tasks in our model, this paper we only differentiate between and .
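The three measures can be computed directly from a ground-truth labeling and a predicted clustering. The sketch below is a pure-Python illustration with hypothetical function names; for NMI it assumes the normalization by the arithmetic mean of the two entropies used in [schutze2008introduction], since the exact variant used in the experiments is not stated here.

```python
from collections import Counter
from itertools import combinations
from math import log

def purity(true_labels, pred_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(true_labels)

def rand_index(true_labels, pred_labels):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def nmi(true_labels, pred_labels):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    ct, cp = Counter(true_labels), Counter(pred_labels)
    mi = sum(c / n * log(n * c / (ct[t] * cp[p])) for (t, p), c in joint.items())
    h_true = -sum(c / n * log(c / n) for c in ct.values())
    h_pred = -sum(c / n * log(c / n) for c in cp.values())
    denom = (h_true + h_pred) / 2
    return mi / denom if denom > 0 else 1.0

# Toy example: 17 samples, 3 classes, 3 predicted clusters.
truth = ["x"] * 5 + ["o"] + ["x"] + ["o"] * 4 + ["d"] + ["x"] * 2 + ["d"] * 3
pred = [0] * 6 + [1] * 6 + [2] * 5
scores = (purity(truth, pred), nmi(truth, pred), rand_index(truth, pred))
```

All three functions return values in [0, 1]; multiplying by 100 gives percentages comparable to those reported in the result tables.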

Real Datasets Experiment Results

According to whether the number of cluster centers is consistent or not, there are two different scenarios for multi-task clustering: Cluster-consistent and Cluster-inconsistent. The Cluster-consistent scenario can be roughly divided into the same clustering task, and different clustering tasks with the same number of cluster centers. We thus use two datasets for it in this paper: WebKB4 with 2500 dimensions and Reuters with 6370 dimensions, respectively. The WebKB4 dataset includes web pages collected from the computer science department websites of 4 universities (Cornell, Texas, Washington and Wisconsin), in 7 categories. Following the setting in [zhang2018multi], the 4 most populous categories (i.e., course, faculty, project and student) are chosen for clustering. Accordingly, for the Reuters dataset, the 4 most populous root categories (i.e., economic index, energy, food and metal) are chosen for clustering, and the total number of tasks is 3. For the Cluster-inconsistent scenario, we adopt the 20NewsGroups dataset (jason/20Newsgroups/) with 3000 dimensions by following [zhang2018multi], which consists of news documents under 20 categories. Since “negative transfer” [zhou2015flexible] will happen when the cluster centers of multiple consecutive spectral clustering tasks change significantly, the 4 most populous root categories (i.e., comp, rec, sci and talk) are selected for clustering, while the 1st and 3rd tasks are set to 3 categories, and the 2nd and 4th tasks are set to 4 categories.

The experimental results (averaged over 10 random repetitions for the competing models under their parameter settings) are provided in Table 1, Table 2 and Table 3, where the task sequence for our L2SC is random. From the presented results, we can notice that: 1) Our proposed lifelong spectral clustering model outperforms the single-task spectral clustering methods, since L2SC can exploit the information among multiple related tasks, whereas a single-task spectral clustering model only uses the information within each task. MTCMRL performs worse than our proposed L2SC in most cases, because even though it incorporates the cross-task relatedness with a linear regression model, it does not consider the feature embedding correlations among each pair of clustering tasks. The reason why MTCMRL performs better than our L2SC on Task1 of 20NewsGroups is that we set k = 4 for this Cluster-inconsistent dataset, whereas the number of cluster centers in Task1 is 3. 2) Apart from MTCMRL and the single-task spectral clustering models, our L2SC performs much better than the remaining multi-task clustering models. This is because L2SC can not only learn the latent cluster centers between each pair of tasks via the orthogonal basis library, but also control the number of embedded features common across the clustering tasks. 3) Additionally, Table 4 shows the runtime comparison between our L2SC model and the other single-task and multi-task clustering models. L2SC is faster and better than most multi-task clustering models on the WebKB4, Reuters and 20NewsGroups datasets, e.g., SMBC and MTSC, and also than OnestepSC. However, L2SC is a little slower than stSC and uSC. This is because both stSC and uSC can obtain the cluster assignment matrix via a closed-form solution, i.e., the eigenvalue decomposition of the normalized affinity matrix in Eq. (2). We perform all the experiments on a computer with an Intel i7 CPU and 8G RAM.

(a) WebKB4 Dataset
(b) WebKB4 Dataset
Figure 2: The influence of the number of learned tasks on the WebKB4 dataset in terms of the Purity and NMI metrics, where the vertical and horizontal axes denote the clustering performance and the number of learned tasks, respectively. The initial clustering performance of each task (except for the first task) is achieved using the stSC algorithm.
(a) WebKB4 Dataset
(b) WebKB4 Dataset
Figure 3: Parameter analysis of our proposed model on WebKB4 dataset.

Evaluating Lifelong Learning: This subsection studies the lifelong learning property of our L2SC model by following [ruvolo2013ella], i.e., how the clustering performance changes as the number of clustering tasks increases. We adopt the WebKB4 dataset, set the sequence of learned tasks as Task1, Task2, Task3 and Task4, and present the clustering performance in Figure 2. As new clustering tasks are added step by step, the performance (i.e., Purity and NMI) on both the learned tasks and the task being learned improves gradually compared with stSC (the initial clustering result of each line in Figure 2), which confirms that L2SC can continually accumulate knowledge and accomplish lifelong learning just like “human learning”. Furthermore, the performance of early clustering tasks improves more noticeably than that of succeeding ones, i.e., the early spectral clustering tasks benefit more from the stored knowledge than later ones.

Parameter Investigation: In order to study how the parameters and affect the clustering performance of our . For the WebKB4 dataset, we repeat the ten times by fixing one parameter and tuning the other parameters in . As depicted in Figure 3, we can notice that clustering performance changes with different ratio of parameters, which give the evidence that the appropriate parameters can make the generalization performance better, e.g., for WebKB4 dataset.

(a) WebKB4 Dataset
(b) 20NewsGroups Dataset
Figure 4: Convergence analysis of our proposed model on (a) WebKB4 and (b) 20NewsGroups datasets, where lines with different colors denote different tasks in each dataset.

Convergence Analysis: To investigate the convergence of our proposed optimization algorithm for solving the L2SC model, we plot the value of the total loss terms for each new task on the WebKB4 and 20NewsGroups datasets. As shown in Figure 4, the objective function values increase with the iterations, and the values for each new task approach a fixed point after a few iterations (e.g., fewer than 20 iterations for Task 4 on both datasets), i.e., although the convergence of L2SC is not proved directly in this paper, we find that it converges asymptotically on the real-world datasets.


Conclusion

This paper studies how to add new spectral clustering capability to an existing spectral clustering system without damaging its existing capabilities. Specifically, we propose a lifelong learning model for spectral clustering, i.e., lifelong spectral clustering (L2SC), which learns a library of orthogonal bases as a set of latent cluster centers, and a library of embedded features for all the spectral clustering tasks. When a new spectral clustering task arrives, L2SC can transfer the knowledge embedded in the shared knowledge libraries to encode the new task with an encoding matrix, and redefine the libraries with the fresh knowledge. We have conducted experiments on several real-world datasets; the experimental results demonstrate the effectiveness and efficiency of our proposed L2SC model.