Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel

09/01/2017 ∙ by Hansi Jiang, et al. ∙ NC State University, SAS

Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses the Gaussian kernel. This algorithm builds on the observation that all support vectors on the boundary have the same distance to the center of the sphere in the higher-dimensional feature space induced by the Gaussian kernel function. Each iteration involves only the existing support vectors and the new data point. The algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multipliers $\alpha_i$ are automatically selected and determined in each iteration. The complexity of our algorithm in each iteration is only $O(k^2)$, where $k$ is the number of support vectors.


1 Introduction

Much effort has been made to detect faults and state shifts in industrial machines by monitoring sensor data. Successful fault diagnosis reduces the cost of maintenance and improves both worker and machine efficiency. In machine learning, fault diagnosis can be viewed as an outlier detection problem. Support vector data description (SVDD), a machine learning technique that is used for single-class classification and outlier detection, is similar to the support vector machine (SVM). SVDD was first introduced in [15], although the concept of using SVM to detect novelty was introduced in [12]. SVDD is used in domains where the majority of data belongs to a single class, or when one of the classes is significantly undersampled. The SVDD algorithm builds a flexible boundary around the target class data; this data boundary is characterized by observations that are designated as support vectors. Because no assumptions about the distribution of outliers need to be made, SVDD can describe the shape of the target class without prior knowledge of the specific data distribution and can flag observations that fall outside the data boundary as potential outliers. In the case of machine monitoring, data on the normal working conditions of a machine are abundant, whereas records of system failures are scarce. By using SVDD on the well-sampled target class, one can obtain a boundary around the distribution of normal working data and subsequently capture the outlier points where the machine is faulty.

Traditional batch methods of SVDD typically pursue a globally optimal solution of the SVDD problem; they are inefficient because they consider all available data points. Moreover, these methods are usually ineffective for streaming data because the entire algorithm must be rerun for each incoming data point. In contrast, incremental methods handle large or streaming data efficiently by focusing on smaller portions of the original optimization problem, as in [14]. Online variants of SVDD consider only the current support vector set together with the incoming data.

Cauwenberghs and Poggio [2] give an incremental and decremental training algorithm for SVM. Their method, also called the C&P algorithm, provides an exact solution for training data and one new data point. Tax and Laskov [16] use a numerical method to solve incremental SVM, and they describe the relationship between incremental SVM and online SVDD. Their research was extended in [8], which provides complete learning algorithms for incremental SVM and SVDD.

The algorithm given in [8] updates the weights of each support vector based on the fact that the Karush-Kuhn-Tucker (KKT) conditions must be satisfied before and after a new data point comes in. Consequently, all data points must be kept in order to reach an objective value close to the globally optimal value. Furthermore, a kernel matrix must be calculated at every update, which can be memory-consuming and slow for large data.

These issues are handled by the algorithm that we propose: fast incremental support vector data description (FISVDD). One of the most important properties of support vectors is that, in the simplest form of SVDD, they all have the same distance to the center of a sphere. A similar property holds even when the problem is generalized to flexible boundaries. This property is at the core of FISVDD. Unlike the method in [8], FISVDD uses only matrix manipulations to find interior points and support vectors, and it is highly efficient in detecting outliers. It can be used either as a batch method or as an online method. The cost of FISVDD for processing a new data point is $O(k^2)$, where $k$ is the number of support vectors. By [6], the number of support vectors should be much smaller than the number of observations in order to avoid overfitting.

The rest of the paper is organized as follows. In Section 2, we introduce the SVDD problem as formulated in [15]. In Section 3, we state some theoretical support for FISVDD. In Section 4, the FISVDD algorithm is introduced and explained. In Section 5, we discuss several important issues in implementing FISVDD. In Section 6, FISVDD is applied to some data sets and compared with other methods. Finally, in Section 7, we give our conclusions.

In this paper we follow traditional linear algebra notation. Bold capital letters stand for matrices, and bold small letters stand for vectors. The vector $\mathbf{e}$ denotes a vector of all 1's of appropriate dimension; for a vector $\mathbf{a}$, $\mathbf{a} > 0$ means that $\mathbf{a}$ is positive (every entry is positive), and $\mathbf{a} \ge 0$ means that $\mathbf{a}$ is nonnegative.

2 The SVDD Problem

The SVDD problem was first discussed by Tax and Duin in [15]. The idea of SVDD is to find support vectors and use them to define a boundary around data. If a testing data point lies outside the boundary, it is classified as an outlier; otherwise, it is classified as normal data. The simplest form of a boundary is a sphere. For a set of data points $\mathbf{x}_1, \dots, \mathbf{x}_n$, the mathematical formulation of the problem is to find a nonnegative vector $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_n)^T$ that contains the Lagrange multipliers for all data points, with $\mathbf{e}^T\boldsymbol{\alpha} = 1$ and $0 \le \alpha_i \le C$, such that the following is maximized:

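$$L = \sum_{i=1}^{n} \alpha_i (\mathbf{x}_i \cdot \mathbf{x}_i) - \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j (\mathbf{x}_i \cdot \mathbf{x}_j). \tag{2.1}$$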

Here $(\mathbf{x}_i \cdot \mathbf{x}_j)$ is the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$. According to [15], there are three possibilities for each data point. The $\mathbf{x}_i$'s that have $\alpha_i = 0$ are interior points. The $\mathbf{x}_i$'s for which $0 < \alpha_i < C$ for a preselected $C$ lie on the boundary and are called support vectors. The $\mathbf{x}_i$'s for which $\alpha_i = C$ are outliers (also called bounded support vectors, or bsv, in [1]). In this paper, we focus primarily on training the SVDD model, so we assume that $C = 1$, which results in no outliers in the training phase. To determine whether a new data point $\mathbf{z}$ lies inside the boundary, first the squared distance between $\mathbf{z}$ and the center of the sphere, $\mathbf{a} = \sum_i \alpha_i \mathbf{x}_i$, is calculated:

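$$\operatorname{dist}^2(\mathbf{z}) = (\mathbf{z}\cdot\mathbf{z}) - 2\sum_{i} \alpha_i (\mathbf{z}\cdot\mathbf{x}_i) + \sum_{i}\sum_{j} \alpha_i\alpha_j (\mathbf{x}_i\cdot\mathbf{x}_j). \tag{2.2}$$

This distance is then compared to the squared radius of the sphere, which can be computed from any support vector $\mathbf{x}_k$:

$$R^2 = (\mathbf{x}_k\cdot\mathbf{x}_k) - 2\sum_{i} \alpha_i (\mathbf{x}_k\cdot\mathbf{x}_i) + \sum_{i}\sum_{j} \alpha_i\alpha_j (\mathbf{x}_i\cdot\mathbf{x}_j). \tag{2.3}$$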

A test data point $\mathbf{z}$ is accepted if $\operatorname{dist}^2(\mathbf{z}) \le R^2$, and it is classified as an outlier if $\operatorname{dist}^2(\mathbf{z}) > R^2$. This check is also called scoring. It can be shown that scoring is equivalent to checking whether the new data point violates the current KKT conditions.

A kernel function is needed to draw a more flexible boundary around data in order to avoid underfitting. By [15], using a kernel function is equivalent to implicitly mapping data points to a higher-dimensional feature space. Usually the Gaussian kernel,

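$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\sigma^2}\right), \tag{2.4}$$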

is preferred [1, 8, 5], and the Gaussian kernel bandwidth $\sigma$ must be selected beforehand. Several papers discuss how to choose a proper Gaussian kernel bandwidth [3, 17, 6]. Throughout this paper, it is assumed that the Gaussian similarity is used and that a proper bandwidth has been chosen such that the number of support vectors is much smaller than the number of observations. As stated in Section 5, FISVDD has safeguards even if a poor bandwidth is provided. With the Gaussian kernel function, the objective function Eq. 2.1 becomes

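$$L = \sum_{i} \alpha_i K(\mathbf{x}_i, \mathbf{x}_i) - \sum_{i}\sum_{j} \alpha_i\alpha_j K(\mathbf{x}_i, \mathbf{x}_j). \tag{2.5}$$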

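Note that because $K(\mathbf{x}_i, \mathbf{x}_i) = 1$, $\sum_i \alpha_i = 1$, and $\boldsymbol{\alpha}$ is nonnegative, the first term of Eq. 2.5 equals 1, so maximizing Eq. 2.5 is equivalent to minimizing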

$$W = \sum_{i}\sum_{j} \alpha_i\alpha_j K(\mathbf{x}_i, \mathbf{x}_j). \tag{2.6}$$

This equation can also be expressed in matrix form:

$$W = \boldsymbol{\alpha}^T \mathbf{A} \boldsymbol{\alpha}, \tag{2.7}$$

where $\mathbf{A}$, with entries $A_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$, is the Gaussian similarity matrix for all data points. Because interior points have $\alpha_i = 0$, they do not contribute to the objective function value. So the objective function can be further simplified to

$$W = \boldsymbol{\alpha}_s^T \mathbf{A}_s \boldsymbol{\alpha}_s, \tag{2.8}$$

where $\mathbf{A}_s$ is the Gaussian similarity matrix for the support vectors and $\boldsymbol{\alpha}_s$ contains their $\alpha_i$'s. Formulas Eq. 2.2 and Eq. 2.3 then become, respectively,

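$$\operatorname{dist}^2(\mathbf{z}) = 1 - 2\sum_{i} \alpha_i K(\mathbf{z}, \mathbf{x}_i) + \sum_{i}\sum_{j} \alpha_i\alpha_j K(\mathbf{x}_i, \mathbf{x}_j), \tag{2.9}$$
$$R^2 = 1 - 2\sum_{i} \alpha_i K(\mathbf{x}_k, \mathbf{x}_i) + \sum_{i}\sum_{j} \alpha_i\alpha_j K(\mathbf{x}_i, \mathbf{x}_j). \tag{2.10}$$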

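Note that to determine whether a test data point $\mathbf{z}$ should be accepted, it suffices to compute

$$f(\mathbf{z}) = \frac{\operatorname{dist}^2(\mathbf{z}) - R^2}{2} = \sum_{i} \alpha_i K(\mathbf{x}_i, \mathbf{x}_k) - \sum_{i} \alpha_i K(\mathbf{x}_i, \mathbf{z}), \tag{2.11}$$

where $\mathbf{x}_k$ is any support vector. $f(\mathbf{z}) < 0$ means that $\mathbf{z}$ is an interior point. It is worth mentioning that all support vectors satisfy $f(\mathbf{x}_k) = 0$, although they might have different $\alpha_i$'s.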
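
The scoring rule in Eq. 2.11 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS/IML code: `gaussian_kernel` and `score` are hypothetical helper names, points are plain lists of floats, and the first support vector is used as the reference $\mathbf{x}_k$ (any support vector gives the same first term). The two-point example uses $\alpha_1 = \alpha_2 = 0.5$, which holds for any pair of distinct support vectors by symmetry.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian similarity K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), Eq. 2.4."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def score(z, support_vectors, alphas, sigma):
    """Eq. 2.11: f(z) = sum_i a_i K(x_i, x_k) - sum_i a_i K(x_i, z).

    Negative values mean z lies strictly inside the boundary.
    Any support vector can serve as x_k; the first one is used here.
    """
    x_k = support_vectors[0]
    threshold = sum(a * gaussian_kernel(x, x_k, sigma)
                    for a, x in zip(alphas, support_vectors))
    similarity = sum(a * gaussian_kernel(x, z, sigma)
                     for a, x in zip(alphas, support_vectors))
    return threshold - similarity


# Two support vectors with equal weights (symmetric configuration).
svs = [[0.0, 0.0], [2.0, 0.0]]
alphas = [0.5, 0.5]
print(score([1.0, 0.0], svs, alphas, 1.0) < 0)  # True: the midpoint is inside
print(score([5.0, 5.0], svs, alphas, 1.0) > 0)  # True: a far point is outside
```

Because every support vector gives the same threshold, the threshold can be cached between scoring calls; Corollary 7 later shows it equals $1/\lVert\boldsymbol{\beta}\rVert_1$.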

3 Theoretical Foundations

Here we state and prove several theorems needed for later discussion. First, we state a lemma from [13]: the Gaussian similarity matrix of distinct points has full rank. Because a Gaussian similarity matrix is also symmetric positive semidefinite (the Gaussian kernel is a positive semidefinite kernel), a direct consequence of the lemma is that it is symmetric positive definite (spd).

Lemma 1.

Suppose $\mathbf{x}_1, \dots, \mathbf{x}_n$ are distinct points and $\sigma > 0$. Then their Gaussian similarity matrix $\mathbf{A}$ formed with Eq. 2.4 has full rank.

Lemma 1 implies that $\mathbf{A}$ is spd and its inverse exists. Next, we state lemmas to obtain $\mathbf{A}_{k+1}^{-1}$ if $\mathbf{A}_k^{-1}$ is known, and vice versa. In FISVDD, we need to update the inverse of the similarity matrix when a new data point comes in. The proofs involve only matrix calculations and are omitted.

Lemma 2.

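Suppose $\mathbf{A}_k$ and $\mathbf{A}_{k+1}$ are both Gaussian similarity matrices and

$$\mathbf{A}_{k+1} = \begin{bmatrix} \mathbf{A}_k & \mathbf{v} \\ \mathbf{v}^T & 1 \end{bmatrix}. \tag{3.1}$$

If $\mathbf{A}_k^{-1}$ is known, then $\mathbf{A}_{k+1}^{-1}$ is given by

$$\mathbf{A}_{k+1}^{-1} = \begin{bmatrix} \mathbf{A}_k^{-1} + \frac{1}{p}\mathbf{A}_k^{-1}\mathbf{v}\mathbf{v}^T\mathbf{A}_k^{-1} & -\frac{1}{p}\mathbf{A}_k^{-1}\mathbf{v} \\ -\frac{1}{p}\mathbf{v}^T\mathbf{A}_k^{-1} & \frac{1}{p} \end{bmatrix}, \tag{3.2}$$

where $\mathbf{v} = \big(K(\mathbf{x}_1, \mathbf{x}_{k+1}), \dots, K(\mathbf{x}_k, \mathbf{x}_{k+1})\big)^T$ and $p = 1 - \mathbf{v}^T\mathbf{A}_k^{-1}\mathbf{v}$.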

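Lemma 2 provides a method to compute $\mathbf{A}_{k+1}^{-1}$ by using $\mathbf{A}_k^{-1}$ and an incremental vector $\mathbf{v}$. Note that to compute $\mathbf{A}_{k+1}^{-1}$, we only need to compute $\mathbf{u} = \mathbf{A}_k^{-1}\mathbf{v}$. Also note that $p$ is the Schur complement [10] of $\mathbf{A}_k$ in $\mathbf{A}_{k+1}$. Since $\mathbf{A}_{k+1}$ is spd, $p$ is positive [4]. The reverse operation of Lemma 2 is straightforward and is shown below.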
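
A minimal sketch of the bordered-inverse update in Lemma 2, in pure Python so that it runs without extra packages. The function name `expand_inverse` is hypothetical. It relies on the symmetry of $\mathbf{A}_k^{-1}$, so the top-left correction $\mathbf{A}_k^{-1}\mathbf{v}\mathbf{v}^T\mathbf{A}_k^{-1}$ is simply $\mathbf{u}\mathbf{u}^T$ with $\mathbf{u} = \mathbf{A}_k^{-1}\mathbf{v}$. The usage builds $\mathbf{A}_3^{-1}$ one point at a time from $\mathbf{A}_1^{-1} = [1]$ and checks it against $\mathbf{A}_3$.

```python
import math


def kernel(x, y, sigma):
    """Gaussian similarity of Eq. 2.4."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def expand_inverse(A_inv, v):
    """Return A_{k+1}^{-1} from A_k^{-1} and v via Eq. 3.2, in O(k^2) operations."""
    k = len(A_inv)
    u = [sum(A_inv[i][j] * v[j] for j in range(k)) for i in range(k)]  # u = A_k^{-1} v
    p = 1.0 - sum(vi * ui for vi, ui in zip(v, u))  # Schur complement of A_k
    if p <= 0.0:
        raise ValueError("p must be positive for an spd A_{k+1}")
    top = [[A_inv[i][j] + u[i] * u[j] / p for j in range(k)] + [-u[i] / p]
           for i in range(k)]
    bottom = [[-uj / p for uj in u] + [1.0 / p]]
    return top + bottom


sigma = 1.0
points = [[0.0], [1.0], [2.5]]
A_inv = [[1.0]]  # inverse of A_1 = [K(x_1, x_1)] = [1]
for k in range(1, len(points)):
    v = [kernel(points[j], points[k], sigma) for j in range(k)]
    A_inv = expand_inverse(A_inv, v)

# Check: A_3 times the incrementally built inverse should be the identity.
A = [[kernel(x, y, sigma) for y in points] for x in points]
product = [[sum(A[i][t] * A_inv[t][j] for t in range(3)) for j in range(3)]
           for i in range(3)]
```

Only $\mathbf{u}$, one inner product, and an outer product are needed, which is where the $O(k^2)$ cost per update comes from.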

Lemma 3.

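Suppose $\mathbf{A}_{k+1}$ is spd and its inverse is given by

$$\mathbf{A}_{k+1}^{-1} = \begin{bmatrix} \mathbf{B} & \mathbf{c} \\ \mathbf{c}^T & d \end{bmatrix}, \tag{3.3}$$

where $\mathbf{B}$ is $k \times k$, $\mathbf{c}$ is a vector, and $d$ is a scalar. Then the inverse of $\mathbf{A}_k$, the leading $k \times k$ block of $\mathbf{A}_{k+1}$, is

$$\mathbf{A}_k^{-1} = \mathbf{B} - \frac{1}{d}\mathbf{c}\mathbf{c}^T. \tag{3.4}$$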

Lemma 2 and Lemma 3 together play an essential role in making FISVDD efficient. It can be seen from the lemmas that only $O(k^2)$ multiplications are needed to obtain the updated matrix inverse. Next, we prove that if a positive solution is obtained for the linear system $\mathbf{A}_k\boldsymbol{\beta} = \mathbf{e}$, then all data points in the system are support vectors. This follows from the property that all support vectors satisfy $f(\mathbf{x}_k) = 0$.
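
Lemma 3 can be applied to any support vector, not just the last one, by first moving its row and column to the end; since $\mathbf{A}_{k+1}^{-1}$ is symmetric, that permutation reduces to the index arithmetic in the hypothetical helper `remove_index` below. The sketch compares the $O(k^2)$ downdate with a direct inversion of the reduced matrix; `invert` is a plain Gauss-Jordan routine included only for that check and is not part of FISVDD.

```python
import math


def kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def invert(M):
    """Gauss-Jordan inversion with partial pivoting (reference check only)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def remove_index(A_inv, i):
    """Eq. 3.4 for row/column i: B - c c^T / d with d = A_inv[i][i]."""
    d = A_inv[i][i]
    keep = [j for j in range(len(A_inv)) if j != i]
    return [[A_inv[r][s] - A_inv[r][i] * A_inv[i][s] / d for s in keep] for r in keep]


sigma = 1.0
points = [[0.0], [1.0], [2.5]]
A_inv = invert([[kernel(x, y, sigma) for y in points] for x in points])
downdated = remove_index(A_inv, 1)  # drop the middle point in O(k^2)
kept = [points[0], points[2]]
direct = invert([[kernel(x, y, sigma) for y in kept] for x in kept])  # O(k^3) reference
```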

Theorem 4.

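A set of data points $\mathbf{x}_1, \dots, \mathbf{x}_k$ are all support vectors if and only if

$$\mathbf{A}_k\boldsymbol{\beta} = \mathbf{e} \tag{3.5}$$

has a positive solution $\boldsymbol{\beta}$.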

Proof.

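Suppose that $\mathbf{x}_1, \dots, \mathbf{x}_k$ are all support vectors. Then they all satisfy $f(\mathbf{x}_i) = 0$ in Eq. 2.11, and thus their squared distances $\operatorname{dist}^2(\mathbf{x}_i)$ all equal $R^2$. From Eq. 2.9, the middle terms,

$$\sum_{j=1}^{k} \alpha_j K(\mathbf{x}_i, \mathbf{x}_j), \tag{3.6}$$

are then equal to the same constant $c > 0$ for every support vector $\mathbf{x}_i$. Stacking Eq. 3.6 for all support vectors gives $\mathbf{A}_k\boldsymbol{\alpha} = c\,\mathbf{e}$, so $\boldsymbol{\beta} = \boldsymbol{\alpha}/c$ is a positive solution of Eq. 3.5. Conversely, if Eq. 3.5 has a positive solution $\boldsymbol{\beta}$, set $\boldsymbol{\alpha} = \boldsymbol{\beta}/\lVert\boldsymbol{\beta}\rVert_1$; then $\mathbf{A}_k\boldsymbol{\alpha} = \mathbf{e}/\lVert\boldsymbol{\beta}\rVert_1$, so Eq. 3.6 takes the same value for every $\mathbf{x}_i$, all $\mathbf{x}_i$'s satisfy $f(\mathbf{x}_i) = 0$, and thus they are all support vectors. ∎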
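
Theorem 4 gives a direct numerical test of whether a set of points are all support vectors: solve Eq. 3.5 and check the sign of every entry. The sketch below uses a small Gaussian-elimination solver, a hypothetical helper for illustration only (FISVDD itself never solves the system from scratch), on one-dimensional points with $\sigma = 1$. For the points 0, 1, 2 the middle entry of $\boldsymbol{\beta}$ is negative, so $x = 1$ is not a support vector.

```python
import math


def kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution on the upper-triangular system
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def is_support_set(points, sigma):
    """Theorem 4: all points are support vectors iff A beta = e has beta > 0."""
    A = [[kernel(x, y, sigma) for y in points] for x in points]
    beta = solve(A, [1.0] * len(points))
    return all(b > 0 for b in beta), beta


print(is_support_set([[0.0], [2.0]], 1.0)[0])         # True
print(is_support_set([[0.0], [1.0], [2.0]], 1.0)[0])  # False
```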

If a new data point $\mathbf{x}_{k+1}$ is added to the existing support vector set but the $(k+1)$th entry of the solution to the linear system $\mathbf{A}_{k+1}\boldsymbol{\beta} = \mathbf{e}$ is negative, then the new data point is an interior point. This is proven in the next theorem.

Theorem 5.

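Suppose data points $\mathbf{x}_1, \dots, \mathbf{x}_k$ form a support vector set, and let $\boldsymbol{\beta}$ be the solution of $\mathbf{A}_{k+1}\boldsymbol{\beta} = \mathbf{e}$. Then a new data point $\mathbf{x}_{k+1}$ is an interior point if and only if $\beta_{k+1} < 0$.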

Proof.

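Let $\boldsymbol{\beta}^{\mathrm{old}} = \mathbf{A}_k^{-1}\mathbf{e}$, which is positive by Theorem 4, so that the current multipliers are $\boldsymbol{\alpha} = \boldsymbol{\beta}^{\mathrm{old}}/\lVert\boldsymbol{\beta}^{\mathrm{old}}\rVert_1$. Suppose that $\beta_{k+1} < 0$. By Lemma 2, the last row of $\mathbf{A}_{k+1}^{-1}$ gives

$$\beta_{k+1} = \frac{1}{p}\left(1 - \mathbf{v}^T\mathbf{A}_k^{-1}\mathbf{e}\right) = \frac{1}{p}\left(1 - \mathbf{v}^T\boldsymbol{\beta}^{\mathrm{old}}\right). \tag{3.7}$$

Because $p > 0$ and $\beta_{k+1} < 0$, we have

$$\mathbf{v}^T\boldsymbol{\beta}^{\mathrm{old}} > 1. \tag{3.8}$$

Because $\boldsymbol{\alpha} = \boldsymbol{\beta}^{\mathrm{old}}/\lVert\boldsymbol{\beta}^{\mathrm{old}}\rVert_1$, dividing Eq. 3.8 by $\lVert\boldsymbol{\beta}^{\mathrm{old}}\rVert_1$ gives

$$\mathbf{v}^T\boldsymbol{\alpha} > \frac{1}{\lVert\boldsymbol{\beta}^{\mathrm{old}}\rVert_1}. \tag{3.9}$$

We want to prove that $f(\mathbf{x}_{k+1}) < 0$. Note that $\mathbf{A}_k\boldsymbol{\alpha} = \mathbf{e}/\lVert\boldsymbol{\beta}^{\mathrm{old}}\rVert_1$, so for any support vector $\mathbf{x}_j$,

$$f(\mathbf{x}_{k+1}) = \mathbf{a}_j^T\boldsymbol{\alpha} - \mathbf{v}^T\boldsymbol{\alpha} = \frac{1}{\lVert\boldsymbol{\beta}^{\mathrm{old}}\rVert_1} - \mathbf{v}^T\boldsymbol{\alpha}, \tag{3.10}$$

where $\mathbf{a}_j$ is the $j$th column of $\mathbf{A}_k$. By Eq. 3.9, we have $f(\mathbf{x}_{k+1}) < 0$.

On the other hand, suppose $\mathbf{x}_{k+1}$ is strictly inside the boundary. Then by Eq. 3.10 we have

$$f(\mathbf{x}_{k+1}) = \frac{1 - \mathbf{v}^T\boldsymbol{\beta}^{\mathrm{old}}}{\lVert\boldsymbol{\beta}^{\mathrm{old}}\rVert_1} < 0. \tag{3.11}$$

Then, by Eq. 3.7 and $p > 0$,

$$\beta_{k+1} = \frac{1 - \mathbf{v}^T\boldsymbol{\beta}^{\mathrm{old}}}{p} = \frac{\lVert\boldsymbol{\beta}^{\mathrm{old}}\rVert_1}{p}\, f(\mathbf{x}_{k+1}) < 0. \tag{3.12}$$

∎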

Theorem 5 says that if we put a new data point $\mathbf{x}_{k+1}$ into an existing support vector set to form an expanded set and the $(k+1)$th entry of the solution to the expanded system is negative, then $\mathbf{x}_{k+1}$ is an interior point and can be ignored. Because we can permute the rows and columns of $\mathbf{A}_{k+1}$, by Theorem 5, if $\beta_i < 0$ for some $i \le k$, we can take $\mathbf{x}_i$ out of the expanded set and solve the shrunken linear system. We can continue shrinking the system until there are no negative entries in $\boldsymbol{\beta}$; then a support vector set is obtained. We summarize this shrinking step in the next corollary.

Corollary 6.

A data point $\mathbf{x}_i$ is an interior point if and only if $\beta_i < 0$ and the shrunken linear system has a positive solution.

Finally, we state and prove an observation that relates the objective function value, the 1-norm of the unnormalized vector $\boldsymbol{\beta}$, and the scoring threshold. This observation is essential for implementing FISVDD because it avoids many unnecessary computations. It can also be used to make sure that the objective function value in FISVDD is not larger than the value obtained in the previous iteration, so that the FISVDD model improves.

Corollary 7.

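The objective function value $W$ in Eq. 2.8 with positive $\boldsymbol{\alpha}_s$, $\mathbf{e}^T\boldsymbol{\alpha}_s = 1$, satisfies

$$W = \frac{1}{\lVert\boldsymbol{\beta}\rVert_1}, \tag{3.13}$$

where $\boldsymbol{\beta} = \mathbf{A}_s^{-1}\mathbf{e}$. Moreover, it holds that

$$\frac{1}{\lVert\boldsymbol{\beta}\rVert_1} = \sum_{i} \alpha_i K(\mathbf{x}_i, \mathbf{x}_k), \tag{3.14}$$

where the $\mathbf{x}_i$'s are the support vectors and $\mathbf{x}_k$ is any one of the support vectors.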

Proof.

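To prove Eq. 3.13, note that by Theorem 4, $\boldsymbol{\beta}$ is positive and $\boldsymbol{\alpha}_s = \boldsymbol{\beta}/\lVert\boldsymbol{\beta}\rVert_1$ satisfies $\mathbf{A}_s\boldsymbol{\alpha}_s = \mathbf{e}/\lVert\boldsymbol{\beta}\rVert_1$. Then

$$W = \boldsymbol{\alpha}_s^T\mathbf{A}_s\boldsymbol{\alpha}_s = \frac{\boldsymbol{\alpha}_s^T\mathbf{e}}{\lVert\boldsymbol{\beta}\rVert_1} = \frac{1}{\lVert\boldsymbol{\beta}\rVert_1}. \tag{3.15}$$

To prove Eq. 3.14, note that $\sum_i \alpha_i K(\mathbf{x}_i, \mathbf{x}_k)$ is the first term of the right-hand side of Eq. 2.11. So, given Eq. 3.13, proving Eq. 3.14 is equivalent to proving

$$\sum_{i}\sum_{j} \alpha_i\alpha_j K(\mathbf{x}_i, \mathbf{x}_j) = \sum_{j} \alpha_j K(\mathbf{x}_k, \mathbf{x}_j), \tag{3.16}$$

where $\mathbf{x}_i, \mathbf{x}_j$ are support vectors and $\mathbf{x}_k$ is any one of the support vectors. The following equation can be derived:

$$\sum_{i}\sum_{j} \alpha_i\alpha_j K(\mathbf{x}_i, \mathbf{x}_j) = \sum_{i} \alpha_i \Big(\sum_{j} \alpha_j K(\mathbf{x}_i, \mathbf{x}_j)\Big) = \sum_{i} \alpha_i \Big(\sum_{j} \alpha_j K(\mathbf{x}_k, \mathbf{x}_j)\Big) = \sum_{j} \alpha_j K(\mathbf{x}_k, \mathbf{x}_j). \tag{3.17}$$

The second equality follows from the fact that the term in parentheses is the same constant for every support vector $\mathbf{x}_i$, and the third equality follows from the fact that the $\alpha_i$'s sum to 1. ∎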

Corollary 7 shows a direct relationship between the objective function value, the 1-norm of the solution vector $\boldsymbol{\beta}$ to the linear system $\mathbf{A}_s\boldsymbol{\beta} = \mathbf{e}$, and the scoring threshold. The objective function value is an important quantity of an SVDD model and can be requested by the user at any time. Once $\boldsymbol{\beta}$ is available, the inverse of its 1-norm directly gives the objective function value, and the calculations in Eq. 2.8 are avoided. At the same time, $1/\lVert\boldsymbol{\beta}\rVert_1$ is also the scoring threshold for the current model, so only the second term in Eq. 2.11 needs to be computed when a new data point is scored. These results make FISVDD more efficient.
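
The identities in Corollary 7 are easy to check numerically. The sketch below uses pure Python and hypothetical helper names, with an elimination-based `solve` standing in for the incrementally maintained $\mathbf{A}_s^{-1}$. It takes the one-dimensional support vector set {0, 2, 3} with $\sigma = 1$, computes $\boldsymbol{\beta} = \mathbf{A}_s^{-1}\mathbf{e}$, and compares $W = \boldsymbol{\alpha}_s^T\mathbf{A}_s\boldsymbol{\alpha}_s$, $1/\lVert\boldsymbol{\beta}\rVert_1$, and the first term of Eq. 2.11 for each support vector.

```python
import math


def kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


sigma = 1.0
sv = [[0.0], [2.0], [3.0]]
A = [[kernel(x, y, sigma) for y in sv] for x in sv]
beta = solve(A, [1.0] * len(sv))
norm1 = sum(abs(b) for b in beta)  # ||beta||_1
alpha = [b / norm1 for b in beta]  # alpha_s = beta / ||beta||_1 (beta > 0 here)

W = sum(alpha[i] * A[i][j] * alpha[j] for i in range(3) for j in range(3))  # Eq. 2.8
thresholds = [sum(alpha[i] * A[i][k] for i in range(3)) for k in range(3)]  # Eq. 2.11, first term
print(W, 1.0 / norm1, thresholds)  # all agree up to rounding error
```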

4 Fast Incremental SVDD Learning Algorithm

We propose a fast incremental algorithm for SVDD (FISVDD). The central idea of FISVDD is to minimize the objective function (2.8) by quickly updating the inverse of similarity matrices in each iteration. Suppose that we begin with a support vector set $\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$. When a new data point $\mathbf{x}_{k+1}$ comes in, by Theorem 4 the linear system $\mathbf{A}_{k+1}\boldsymbol{\beta} = \mathbf{e}$ has a positive solution if the data points $\mathbf{x}_1, \dots, \mathbf{x}_{k+1}$ form a new support vector set, and the normalized vector $\boldsymbol{\beta}/\lVert\boldsymbol{\beta}\rVert_1$ gives the $\alpha_i$'s. However, if at least one entry of the solution is negative, there is at least one interior point in the set. Then we drop the negative $\beta_i$ that has the largest magnitude and solve the shrunken linear system. If this system has a positive solution, then we have found a support vector set. Otherwise, we continue to drop the next negative $\beta_i$ that has the largest magnitude and solve the linear system, and so on. If more than one data point is dropped from the system, the dropped data points should be re-scored against the new boundary to determine whether they violate the KKT conditions. If they do, the system expands again. We provide details below.

4.1 The FISVDD Algorithm

The FISVDD algorithm is shown in Algorithm 3. It contains three parts of FISVDD: expanding (which is shown in Algorithm 1), shrinking (which is shown in Algorithm 2), and bookkeeping.

4.1.1 Stage 1, Expanding:

When a new data point $\mathbf{x}_{k+1}$ comes in, it is scored to determine whether it falls in the interior. If so, it is immediately discarded. Otherwise, it is combined with the existing support vectors to form an expanded set. The inverse $\mathbf{A}_{k+1}^{-1}$ of the similarity matrix and its row sums $\boldsymbol{\beta} = \mathbf{A}_{k+1}^{-1}\mathbf{e}$ are then updated by Lemma 2. If all row sums are positive, then $\mathbf{x}_{k+1}$ is another support vector and the normalized vector $\boldsymbol{\beta}/\lVert\boldsymbol{\beta}\rVert_1$ contains the updated $\alpha_i$'s. If $\beta_{k+1} \le 0$, then $\mathbf{x}_{k+1}$ is taken out of the expanded set and the support vector set returns to the previous set. If $\beta_{k+1} > 0$ but there is at least one $\beta_i \le 0$, then there is at least one interior point in the expanded set and the shrinking step is called. The expanding step is given in Algorithm 1.

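Algorithm 1 Expand
procedure Expand($\mathbf{A}_k^{-1}$, SV, $\mathbf{x}_{k+1}$)
    $\mathbf{v} \leftarrow \big(K(\mathbf{x}_1, \mathbf{x}_{k+1}), \dots, K(\mathbf{x}_k, \mathbf{x}_{k+1})\big)^T$
    $\mathbf{u} \leftarrow \mathbf{A}_k^{-1}\mathbf{v}$
    $p \leftarrow 1 - \mathbf{v}^T\mathbf{u}$
    $\mathbf{A}_{k+1}^{-1} \leftarrow$ Eq. 3.2
    $\boldsymbol{\beta} \leftarrow$ row sums of $\mathbf{A}_{k+1}^{-1}$
    if $\beta_{k+1} \le 0$ then
        $\mathbf{A}^{-1} \leftarrow \mathbf{A}_k^{-1}$    (discard $\mathbf{x}_{k+1}$)
        $\boldsymbol{\beta} \leftarrow$ row sums of $\mathbf{A}_k^{-1}$
    else
        $\mathbf{A}^{-1} \leftarrow \mathbf{A}_{k+1}^{-1}$
        SV $\leftarrow$ SV $+ \{\mathbf{x}_{k+1}\}$
    return $\mathbf{A}^{-1}$, SV, $\boldsymbol{\beta}$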
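
Algorithm 1 translates almost line by line into Python. In this sketch `expand` and `expand_inverse` are hypothetical names, points are lists of floats, and the model state is just the inverse matrix plus the list of support vectors. The function does not score the point first (Algorithm 3 does that); it simply rejects $\mathbf{x}_{k+1}$ when $\beta_{k+1} \le 0$, as Theorem 5 allows.

```python
import math


def kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def expand_inverse(A_inv, v):
    """Eq. 3.2 (Lemma 2)."""
    k = len(A_inv)
    u = [sum(A_inv[i][j] * v[j] for j in range(k)) for i in range(k)]
    p = 1.0 - sum(vi * ui for vi, ui in zip(v, u))
    top = [[A_inv[i][j] + u[i] * u[j] / p for j in range(k)] + [-u[i] / p] for i in range(k)]
    return top + [[-uj / p for uj in u] + [1.0 / p]]


def expand(A_inv, sv, x_new, sigma):
    """Algorithm 1: return (A_inv, sv, beta), keeping x_new only if beta_{k+1} > 0."""
    v = [kernel(x, x_new, sigma) for x in sv]
    new_inv = expand_inverse(A_inv, v)
    beta = [sum(row) for row in new_inv]  # row sums = A_{k+1}^{-1} e
    if beta[-1] <= 0:                     # x_new is interior: restore the old state
        return A_inv, sv, [sum(row) for row in A_inv]
    return new_inv, sv + [x_new], beta


A_inv, sv, beta = expand([[1.0]], [[0.0]], [2.0], 1.0)  # two distinct points: both kept
A_inv, sv, beta = expand(A_inv, sv, [1.0], 1.0)          # midpoint: rejected
n_after_midpoint = len(sv)                               # 2
A_inv, sv, beta = expand(A_inv, sv, [5.0], 1.0)          # far point: accepted
n_after_far_point = len(sv)                              # 3, and all beta > 0
```

If some existing $\beta_i \le 0$ after a successful expansion, the caller hands the state to the shrinking step.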

4.1.2 Stage 2, Shrinking:

If $\beta_{k+1} > 0$ but at least one $\beta_i \le 0$, then at least one existing support vector has become an interior point. We need to identify and discard such vectors. By Corollary 6, we can shrink the support vector set one vector at a time until a positive $\boldsymbol{\beta}$ is obtained. There may be several negative entries in $\boldsymbol{\beta}$, and yet, after one of them is removed, all other entries become positive. Hence, it is recommended to remove one data point at a time rather than several at once. Moreover, removing several data points at once slows the algorithm, because the downdate then requires inverting a block larger than $1 \times 1$ instead of dividing by a scalar. In this section, we provide two methods for finding such data points so that a new FISVDD model can be formed.

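The first method is more empirical. In this method we remove the negative $\beta_i$ that has the largest magnitude. From Eq. 3.12 and the fact that the last diagonal entry of $\mathbf{A}_{k+1}^{-1}$ is $1/p$ (Lemma 2), after permuting the rows and columns of $\mathbf{A}_{k+1}$ we have

$$f_{(-j)}(\mathbf{x}_j) = \frac{\beta_j}{\big(\mathbf{A}_{k+1}^{-1}\big)_{jj}\,\lVert\mathbf{A}_{(-j)}^{-1}\mathbf{e}\rVert_1}, \tag{4.1}$$

where $\mathbf{x}_j$ is the data point of interest permuted to the $(k+1)$th position, $\mathbf{A}_{(-j)}$ is the similarity matrix of the remaining points, and $f_{(-j)}$ is the score (Eq. 2.11) with respect to those points. It can be seen from Eq. 4.1 that if the denominators of the data points that have negative $\beta_j$'s are close, then a data point that has a larger $|\beta_j|$ tends to have a larger $|f_{(-j)}(\mathbf{x}_j)|$, which means it lies farther inside the boundary. Intuitively, a data point farther from the boundary is more likely to be a true interior point. Although this is not guaranteed, the data point farthest from the boundary is typically the one we want to remove first. The shrinking step with bookkeeping (explained later) is given in Algorithm 2.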

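The second method is inspired by Tax and Laskov's paper [16]. In the SVDD context, the gradient of the manifold of optimal solutions is defined by

$$\boldsymbol{\gamma} = \frac{\partial \boldsymbol{\beta}_{1:k}}{\partial \beta_{k+1}} = -\mathbf{A}_k^{-1}\mathbf{v}. \tag{4.2}$$

$\boldsymbol{\gamma}$ measures the sensitivity of the weights of the existing support vectors to a change in the weight of the new support vector. By Lemma 2, the new $\alpha$ values of the existing support vectors can be computed from

$$\alpha_i^{\mathrm{new}} = \frac{\beta_i^{\mathrm{old}} + \beta_{k+1}\gamma_i}{\lVert\boldsymbol{\beta}^{\mathrm{new}}\rVert_1}, \quad i = 1, \dots, k, \tag{4.3}$$

where $\boldsymbol{\beta}^{\mathrm{old}}$ is the old unnormalized vector and $\lVert\boldsymbol{\beta}^{\mathrm{new}}\rVert_1$ is the norm of the new unnormalized vector. From Eq. 4.3 it is easy to see that computing $\boldsymbol{\gamma}$ requires almost no extra computation beyond updating $\boldsymbol{\beta}$, because $\mathbf{A}_k^{-1}\mathbf{v}$ has already been calculated for Eq. 3.2. As $\beta_{k+1}$ grows from 0, each $\beta_i$ with $\gamma_i < 0$ reaches zero at $\beta_{k+1} = \beta_i^{\mathrm{old}}/(-\gamma_i)$; among the data points whose new $\beta_i$ is negative, the one with the smallest such value is the first data point that becomes an interior point. The inverse matrix needs to be shrunk when a data point is moved out of the support vector set. In this method no bookkeeping is needed.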
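
A sketch of the second shrinking method under two stated assumptions: $\boldsymbol{\gamma} = -\mathbf{A}_k^{-1}\mathbf{v}$, which follows from Lemma 2 because the top block of $\mathbf{A}_{k+1}^{-1}\mathbf{e}$ equals $\boldsymbol{\beta}^{\mathrm{old}} - \beta_{k+1}\mathbf{A}_k^{-1}\mathbf{v}$; and the first point to leave is the one whose $\beta_i$ reaches zero earliest as $\beta_{k+1}$ grows from 0 along that linear path. `first_to_leave` is a hypothetical helper. In the example, the support vectors are the 1-D points 0 and 1 and the new point is 2.

```python
import math


def kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def first_to_leave(A_inv_old, beta_old, v):
    """Return (index of the first support vector to turn interior or None, new beta)."""
    k = len(A_inv_old)
    u = [sum(A_inv_old[i][j] * v[j] for j in range(k)) for i in range(k)]  # also used by Eq. 3.2
    gamma = [-ui for ui in u]                                              # Eq. 4.2
    p = 1.0 - sum(vi * ui for vi, ui in zip(v, u))
    beta_last = (1.0 - sum(vi * bi for vi, bi in zip(v, beta_old))) / p   # Eq. 3.7
    if beta_last <= 0:
        raise ValueError("the new point is interior (Theorem 5)")
    beta_new = [bo + beta_last * g for bo, g in zip(beta_old, gamma)]      # numerator of Eq. 4.3
    candidates = [i for i, b in enumerate(beta_new) if b <= 0]
    if not candidates:
        return None, beta_new + [beta_last]
    # beta_i(t) = beta_old_i + t * gamma_i reaches zero at t = beta_old_i / (-gamma_i).
    return min(candidates, key=lambda i: beta_old[i] / -gamma[i]), beta_new + [beta_last]


sigma = 1.0
sv = [[0.0], [1.0]]
a = kernel(sv[0], sv[1], sigma)
det = 1.0 - a * a
A_inv_old = [[1.0 / det, -a / det], [-a / det, 1.0 / det]]  # inverse of [[1, a], [a, 1]]
beta_old = [sum(row) for row in A_inv_old]                  # both equal 1 / (1 + a)
v = [kernel(x, [2.0], sigma) for x in sv]
idx, beta_new = first_to_leave(A_inv_old, beta_old, v)
print(idx)  # 1: the point x = 1 becomes interior once x = 2 joins
```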

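Algorithm 2 Shrink (combined with bookkeeping)
procedure Shrink($\mathbf{A}^{-1}$, SV, Backup)
    $\boldsymbol{\beta} \leftarrow$ row sums of $\mathbf{A}^{-1}$
    while $\min_i \beta_i \le 0$ do
        $i \leftarrow \arg\min_j \beta_j$    (the negative entry with the largest magnitude)
        Backup $\leftarrow$ Backup $+ \{\mathbf{x}_i\}$
        SV $\leftarrow$ SV $- \{\mathbf{x}_i\}$
        permute row and column $i$ of $\mathbf{A}^{-1}$ to the last position
        $\mathbf{A}^{-1} \leftarrow$ Eq. 3.4
        $\boldsymbol{\beta} \leftarrow$ row sums of $\mathbf{A}^{-1}$
    return $\mathbf{A}^{-1}$, SV, Backup, $\boldsymbol{\beta}$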
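
A pure-Python sketch of Algorithm 2. `shrink`, `remove_index`, and `expand_inverse` are hypothetical helper names; the downdate applies Eq. 3.4 after the symmetric permutation that moves row and column $i$ to the end. In the example, the expanded set is the 1-D points 0, 1, 2 with $\sigma = 1$, whose $\boldsymbol{\beta}$ has a single negative entry at $x = 1$.

```python
import math


def kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def expand_inverse(A_inv, v):
    """Eq. 3.2 (Lemma 2), used here only to build the starting inverse."""
    k = len(A_inv)
    u = [sum(A_inv[i][j] * v[j] for j in range(k)) for i in range(k)]
    p = 1.0 - sum(vi * ui for vi, ui in zip(v, u))
    top = [[A_inv[i][j] + u[i] * u[j] / p for j in range(k)] + [-u[i] / p] for i in range(k)]
    return top + [[-uj / p for uj in u] + [1.0 / p]]


def remove_index(A_inv, i):
    """Eq. 3.4 (Lemma 3) for row/column i, using the symmetry of A_inv."""
    d = A_inv[i][i]
    keep = [j for j in range(len(A_inv)) if j != i]
    return [[A_inv[r][s] - A_inv[r][i] * A_inv[i][s] / d for s in keep] for r in keep]


def shrink(A_inv, sv, backup):
    """Algorithm 2: drop the most negative beta_i until all entries are positive."""
    beta = [sum(row) for row in A_inv]
    while min(beta) <= 0:
        i = min(range(len(beta)), key=lambda j: beta[j])  # largest-magnitude negative entry
        backup.append(sv[i])                               # kept for re-scoring later
        sv = sv[:i] + sv[i + 1:]
        A_inv = remove_index(A_inv, i)
        beta = [sum(row) for row in A_inv]
    return A_inv, sv, beta


sigma = 1.0
sv = [[0.0], [1.0], [2.0]]
A_inv = [[1.0]]
for k in range(1, len(sv)):
    A_inv = expand_inverse(A_inv, [kernel(x, sv[k], sigma) for x in sv[:k]])
backup = []
A_inv, sv, beta = shrink(A_inv, sv, backup)
print(sv, backup)  # [[0.0], [2.0]] [[1.0]]
```

The loop always terminates with at least one point left, because $\mathbf{e}^T\mathbf{A}^{-1}\mathbf{e} > 0$ for an spd matrix, so $\boldsymbol{\beta}$ always has a positive entry.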

4.1.3 Bookkeeping

When the first shrinking method is used, some of the previous support vectors are taken out of the support vector set because they have negative $\beta_i$'s. However, having a negative $\beta_i$ in the middle of a shrinking process does not rule a support vector out of the final set. A data point is considered to be an interior point only if it satisfies $f(\mathbf{x}) < 0$ when scored with the final support vector set. Therefore, it is necessary to recheck whether the data points taken out of the support vector set are truly interior points. In FISVDD, we build a backup set when the shrinking stage begins. When a data point is taken out of the support vector set, it is put into the backup set. Then the inverse matrix is “downdated” with Eq. 3.4 and its row sums are calculated. The shrinking continues until there are no negative entries in $\boldsymbol{\beta}$. The backup set keeps growing as the linear system shrinks. When there are no negative values in $\boldsymbol{\beta}$, we have found a support vector set, although it might not be the final one. Then the data points in the backup set are scored against this support vector set one by one in first-in, first-out order. To increase the algorithm's efficiency, the backup set is scanned only once. If $f(\mathbf{x}) > 0$ for a data point $\mathbf{x}$, then the expanding algorithm is called again, and the data point is removed from the backup set and placed back into the support vector set. The expanding finishes when all data points in the backup set have $f(\mathbf{x}) \le 0$. Although the same check could be performed on all prior data, doing so would cost too much memory and the gains are far less significant. So the backup set is emptied when each new data point arrives.

For completeness, we add a check on the unnormalized vector $\boldsymbol{\beta}$ to make sure that the result in each iteration improves on the previous iteration. By Corollary 7, the result improves if $\lVert\boldsymbol{\beta}\rVert_1$ increases. At the end of each iteration, this norm is compared with the norm in the previous iteration. If the norm decreases, the result from the previous iteration is restored. None of our experiments has ever violated this condition.

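Algorithm 3 Fast Incremental Support Vector Data Description (FISVDD) with Bookkeeping
input: initial $\mathbf{A}^{-1}$, SV, and $\boldsymbol{\alpha}$ (Section 5.1)
for each new data point $\mathbf{x}_{\mathrm{new}}$ do
    score $\leftarrow f(\mathbf{x}_{\mathrm{new}})$ by Eq. 2.11
    if score $< 0$ then
        pass    (interior point; discard)
    else
        ($\mathbf{A}^{-1}$, SV, $\boldsymbol{\beta}$) $\leftarrow$ Expand($\mathbf{A}^{-1}$, SV, $\mathbf{x}_{\mathrm{new}}$)
        if $\min_i \beta_i \le 0$ then
            Backup $\leftarrow$ empty set
            ($\mathbf{A}^{-1}$, SV, Backup, $\boldsymbol{\beta}$) $\leftarrow$ Shrink($\mathbf{A}^{-1}$, SV, Backup)
            if Backup is not empty then
                for each $\mathbf{x}_b$ in Backup (first in, first out) do
                    score $\leftarrow f(\mathbf{x}_b)$ by Eq. 2.11
                    if score $> 0$ then
                        Expand with $\mathbf{x}_b$, then Shrink if needed
        $\boldsymbol{\alpha} \leftarrow \boldsymbol{\beta}/\lVert\boldsymbol{\beta}\rVert_1$

Algorithm 4 Fast Incremental Support Vector Data Description (FISVDD) without Bookkeeping
input: initial $\mathbf{A}^{-1}$, SV, and $\boldsymbol{\alpha}$ (Section 5.1)
for each new data point $\mathbf{x}_{\mathrm{new}}$ do
    score $\leftarrow f(\mathbf{x}_{\mathrm{new}})$ by Eq. 2.11
    if score $< 0$ then
        pass    (interior point; discard)
    else
        ($\mathbf{A}^{-1}$, SV, $\boldsymbol{\beta}$) $\leftarrow$ Expand($\mathbf{A}^{-1}$, SV, $\mathbf{x}_{\mathrm{new}}$)
        while $\min_i \beta_i \le 0$ do
            $\boldsymbol{\gamma} \leftarrow$ Eq. 4.2; new $\beta_i$'s $\leftarrow$ Eq. 4.3
            $i \leftarrow$ the first data point to become interior (Section 4.1.2)
            remove $\mathbf{x}_i$ from SV and update $\mathbf{A}^{-1}$ by Eq. 3.4
            $\boldsymbol{\beta} \leftarrow$ row sums of $\mathbf{A}^{-1}$
        $\boldsymbol{\alpha} \leftarrow \boldsymbol{\beta}/\lVert\boldsymbol{\beta}\rVert_1$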
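
Putting the pieces together, the following is a compact pure-Python sketch of Algorithm 3 for one-point-at-a-time updates. It is an illustration under simplifying assumptions, not the authors' implementation: all helper names are hypothetical; scoring uses Corollary 7 ($\boldsymbol{\alpha} = \boldsymbol{\beta}/\lVert\boldsymbol{\beta}\rVert_1$ and threshold $1/\lVert\boldsymbol{\beta}\rVert_1$); points pushed out while re-expanding from the backup set are not re-queued; and the norm check, the $\varepsilon$ filters, and the cap $M$ from Section 5 are omitted.

```python
import math


def kernel(x, y, sigma):
    """Gaussian similarity of Eq. 2.4."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def expand_inverse(A_inv, v):
    """Lemma 2 (Eq. 3.2): inverse of the bordered matrix in O(k^2)."""
    k = len(A_inv)
    u = [sum(A_inv[i][j] * v[j] for j in range(k)) for i in range(k)]
    p = 1.0 - sum(vi * ui for vi, ui in zip(v, u))
    top = [[A_inv[i][j] + u[i] * u[j] / p for j in range(k)] + [-u[i] / p] for i in range(k)]
    return top + [[-uj / p for uj in u] + [1.0 / p]]


def remove_index(A_inv, i):
    """Lemma 3 (Eq. 3.4) for row/column i of a symmetric inverse."""
    d = A_inv[i][i]
    keep = [j for j in range(len(A_inv)) if j != i]
    return [[A_inv[r][s] - A_inv[r][i] * A_inv[i][s] / d for s in keep] for r in keep]


def score(z, sv, A_inv, sigma):
    """Eq. 2.11 via Corollary 7: threshold 1/||beta||_1, alpha = beta/||beta||_1."""
    beta = [sum(row) for row in A_inv]
    norm1 = sum(beta)  # equals ||beta||_1 because beta > 0 for a support vector set
    return (1.0 - sum(b * kernel(x, z, sigma) for b, x in zip(beta, sv))) / norm1


def expand(A_inv, sv, z, sigma):
    """Algorithm 1: keep z only if beta_{k+1} > 0 (Theorem 5)."""
    new_inv = expand_inverse(A_inv, [kernel(x, z, sigma) for x in sv])
    if sum(new_inv[-1]) <= 0:
        return A_inv, sv
    return new_inv, sv + [z]


def shrink(A_inv, sv, backup):
    """Algorithm 2: remove the most negative beta_i until beta > 0."""
    beta = [sum(row) for row in A_inv]
    while min(beta) <= 0:
        i = min(range(len(beta)), key=lambda j: beta[j])
        backup.append(sv[i])
        sv = sv[:i] + sv[i + 1:]
        A_inv = remove_index(A_inv, i)
        beta = [sum(row) for row in A_inv]
    return A_inv, sv


def fisvdd_update(A_inv, sv, z, sigma):
    """One iteration of Algorithm 3 for a single new point z."""
    if score(z, sv, A_inv, sigma) < 0:  # interior point: discard immediately
        return A_inv, sv
    A_inv, sv = expand(A_inv, sv, z, sigma)
    backup = []                         # emptied for every new point
    A_inv, sv = shrink(A_inv, sv, backup)
    for x in backup:                    # single first-in, first-out scan
        if score(x, sv, A_inv, sigma) > 0:
            A_inv, sv = expand(A_inv, sv, x, sigma)
            A_inv, sv = shrink(A_inv, sv, [])  # simplification: not re-queued
    return A_inv, sv


sigma = 1.0
A_inv, sv = [[1.0]], [[0.0]]            # burn-in with a single point
for z in [[1.0], [2.0], [0.5], [3.0]]:
    A_inv, sv = fisvdd_update(A_inv, sv, z, sigma)
print(sorted(sv))  # [[0.0], [2.0], [3.0]]
```

In this run, $x = 1$ joins and is later pushed into the backup set when $x = 2$ arrives, fails the re-scoring, and stays out; $x = 0.5$ scores negative and is discarded without touching the matrix.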

4.1.4 Differences between FISVDD and the One-Class Incremental SVM Algorithm

The incremental SVM algorithm developed by Laskov et al. [8], which also uses incremental and decremental matrix inverse updating, can be easily modified for SVDD. In this respect, the FISVDD algorithm is similar to the incremental SVM algorithm. But they also have some significant differences.

Foremost, the incremental SVM algorithm keeps all data points because it pursues a global optimal solution to the SVDD problem. If obtaining an optimal solution is the goal, then every data point needs to be kept because an interior point might become a support vector as new data points come into the system. However, a drawback of keeping all the data is that the speed of the algorithm is significantly reduced. In incremental SVM, the similarities between all support vectors and all non-support vectors must be calculated in each iteration. However, speed is very important in dealing with large data or streaming data. In many applications, it can be acceptable to sacrifice a little accuracy in exchange for greater efficiency. Instead of pursuing a global optimal solution, FISVDD tries to obtain the optimal solution in each iteration without the interior points, similar to the idea mentioned in [14]. Results from many experiments show that if a proper Gaussian bandwidth is chosen, then the number of support vectors should be far smaller than the total number of observations. FISVDD takes advantage of this fact by calculating only the similarities between the new data points and the support vectors.

Secondly, incremental SVM uses a numerical approach to find which data point to move between the support vector set and the interior point set. To decide which data point to move, four cases need to be considered and many auxiliary variables need to be created. Moreover, after the data point is moved, all variables need to be updated for all data points. In this way, the incremental SVM algorithm ensures that each step is the best one, but at the price of many calculations. On the other hand, it can be seen from Algorithm 3 that FISVDD is based only on matrix manipulation. Although incremental SVM also uses steps that update an inverse matrix, these steps do not play an essential role in the algorithm. In contrast, matrix inverse updating steps are the core of FISVDD. FISVDD lets the system itself choose which data points to move between the support vector set and the interior point set. Sometimes the choice of the system might not be optimal, but the backup set allows the system to correct itself while avoiding a significant amount of calculation. The next section shows that even if the incremental SVM algorithm is modified to ignore interior points, it still runs slower than FISVDD.

In summary, FISVDD is fast and computationally efficient because the algorithm ignores interior points and is built solely on matrix manipulations.

5 Implementation Details

In this section we discuss several important issues for implementing FISVDD.

5.1 Initialization

A key advantage of FISVDD is that the similarity matrix is directly calculated only at initialization. As stated in Section 4, each iteration calculates only the similarities between a new data point and the existing support vectors, and these are used to update the inverse of the similarity matrix. Once the burn-in data points are selected, their similarity matrix $\mathbf{A}$ and its inverse $\mathbf{A}^{-1}$ are calculated. After the row sums of $\mathbf{A}^{-1}$ are calculated, the shrinking step in Algorithm 2 is used to pick out the interior points. Then the vector that contains the normalized row sums of $\mathbf{A}^{-1}$ is the initial $\boldsymbol{\alpha}$.

5.2 Outliers and Close Points

Until now, our analysis has focused primarily on describing the boundary of the streaming data. Another important feature of SVDD is that it finds outliers in the data so that further investigations can be made. In [8, 11], data points are classified as outliers based on their $\alpha_i$ values. FISVDD assumes that outliers are far from normal data and hence do not influence the support vectors and the $\alpha_i$'s. In addition, we assume that the boundary determined by the support vectors is robust to outliers. Note that if a data point $\mathbf{z}$ is far from the support vectors, the vector $\mathbf{v}$ in Eq. 3.1 should be close to a zero vector, which means that the largest entry of $\mathbf{v}$ should be close to 0. In FISVDD, a data point $\mathbf{z}$ is classified as an outlier if it satisfies the following condition for a preselected parameter $\varepsilon_1$:

$$\max_i v_i < \varepsilon_1. \tag{5.1}$$

If $\mathbf{z}$ is classified as an outlier, then it is passed on for further investigation, and no $\alpha$ value is assigned to it.

Another special case we have to consider is a new data point that is very close to one of the existing support vectors. Although in practice a new data point is rarely exactly the same as an existing support vector, the two can be very close to each other. In this case, the similarity matrix will be ill-conditioned and its computed inverse might not be accurate. We can avoid this situation by also looking at the largest entry of $\mathbf{v}$. If a new data point is very close to one of the support vectors, then the largest entry of $\mathbf{v}$ will be close to 1. In FISVDD, a point $\mathbf{z}$ is discarded if it satisfies the following condition for a preselected parameter $\varepsilon_2$:

$$\max_i v_i > 1 - \varepsilon_2. \tag{5.2}$$

Finally, note that these preprocessing steps can help prevent unnecessary calculations if the Gaussian kernel bandwidth $\sigma$ is not a proper bandwidth. If $\sigma$ is too small, then the similarity between every pair of data points is close to 0 and every data point tends to become a support vector. If $\sigma$ is too large, then the similarity between every pair of data points is close to 1. Introducing $\varepsilon_1$ and $\varepsilon_2$ can prevent these cases.
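
The two checks in Eq. 5.1 and Eq. 5.2 only need the largest entry of $\mathbf{v}$, which FISVDD computes anyway. Below is a sketch with a hypothetical `prefilter` helper and illustrative thresholds $\varepsilon_1 = \varepsilon_2 = 10^{-6}$ (these values are assumptions, not values from the paper); the last two calls show how a bandwidth that is far too small or far too large trips the checks.

```python
import math


def kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def prefilter(z, sv, sigma, eps1, eps2):
    """Classify z before the expanding stage using v = (K(x_1, z), ..., K(x_k, z))."""
    v_max = max(kernel(x, z, sigma) for x in sv)
    if v_max < eps1:
        return "outlier"    # Eq. 5.1: far from every support vector
    if v_max > 1.0 - eps2:
        return "too_close"  # Eq. 5.2: would make A_{k+1} nearly singular
    return "ok"


sv = [[0.0], [2.0]]
print(prefilter([100.0], sv, 1.0, 1e-6, 1e-6))       # outlier
print(prefilter([2.0 + 1e-5], sv, 1.0, 1e-6, 1e-6))  # too_close
print(prefilter([1.0], sv, 1.0, 1e-6, 1e-6))         # ok
print(prefilter([1.0], sv, 1e-3, 1e-6, 1e-6))        # outlier: sigma far too small
print(prefilter([1.0], sv, 1e4, 1e-6, 1e-6))         # too_close: sigma far too large
```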

5.3 Memory

For any online method, it is important to make sure that both of the following conditions hold:

  • The complexity in each step is small.

  • Memory usage will never expand out of control even for very large data.

For FISVDD, both challenges are handled smoothly. The first part is easy to see: the key parts of the algorithm (expanding and shrinking the linear systems) require only $O(k^2)$ multiplications each time, where $k$ is the number of support vectors. In addition, $k$ should be far smaller than the size of the whole data set if a proper Gaussian kernel bandwidth is chosen.

For the second part, the number of support vectors can indeed grow large with streaming data. To avoid the risk of memory usage growing out of control, we set a parameter $M$ for the maximal number of support vectors, where $M$ depends on the available memory. When $M$ is reached, the number of support vectors is not allowed to grow further. If a new data point $\mathbf{x}_{M+1}$ satisfies $f(\mathbf{x}_{M+1}) > 0$, then one of the following three situations will occur:

  • $\beta_{M+1} > 0$ but at least one of the $\beta_i$'s is less than or equal to 0. In this case, the algorithm runs normally to select the interior points.

  • All $\beta_i$'s are greater than 0, but $\beta_{M+1}$ is the smallest among all $\beta_i$'s. In this case, $\mathbf{x}_{M+1}$ is discarded.

  • All $\beta_i$'s are greater than 0, and $\beta_{M+1}$ is not the smallest among all $\beta_i$'s. In this case, the support vector that has the smallest $\beta_i$ is replaced by $\mathbf{x}_{M+1}$, and the $\alpha_i$'s are updated.

By handling these three cases, the number of support vectors never exceeds $M$, and the memory usage in each step is controlled.
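
Once the cap $M$ is reached, the three cases above reduce to a small decision on the expanded $\boldsymbol{\beta}$, with the new point's entry last. The sketch below, with the hypothetical name `capped_decision`, only makes that decision; the replacement itself would reuse the downdate of Eq. 3.4 on the chosen index. It assumes $M \ge 1$ and that the new point has already scored $f > 0$, so $\beta_{M+1} > 0$ by Theorem 5.

```python
def capped_decision(beta):
    """beta: row sums of A_{M+1}^{-1}, with the new point's entry last."""
    if min(beta[:-1]) <= 0:
        return "shrink", None    # case 1: run the usual shrinking step
    smallest = min(range(len(beta)), key=lambda i: beta[i])
    if smallest == len(beta) - 1:
        return "discard", None   # case 2: the new point has the smallest beta
    return "replace", smallest   # case 3: swap out the support vector with the smallest beta


print(capped_decision([0.5, -0.1, 0.7]))  # ('shrink', None)
print(capped_decision([0.5, 0.6, 0.2]))   # ('discard', None)
print(capped_decision([0.5, 0.1, 0.7]))   # ('replace', 1)
```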

5.4 Multiple Incremental Data

A discussion of multiple incremental data learning for incremental SVM is in [7]. When multiple data points are added to the current problem together, FISVDD can be modified slightly to adapt to the situation. The challenge of multiple streaming data comes mainly from two aspects:

  • The speed of the algorithm: The efficiency of Algorithm 3 comes from the fast inverse matrix updates in Lemma 2 and Lemma 3, which add only one new data point at a time. This benefit vanishes if several data points are added to the system together, because the update then requires inverting a matrix block rather than dividing by a scalar.

  • The optimal value that is lost in each round: If we use Algorithm 3 directly to deal with multiple streaming data points one by one, there will be an order issue, and the objective function value might be increased.

Fortunately, both issues can be addressed by slightly modifying our algorithm. When multiple data points come in, they are fed to the algorithm one by one. After each data point is processed, the backup set is kept until all data points in the current chunk are dealt with. Before the next chunk of data comes in, the data points in the backup set are scored, and the ones that violate the current KKT conditions are added back to the support vector set by using the expanding stage and possibly the shrinking stage. By doing this for each chunk of data, we move closer to the globally optimal value while still dealing with new data points one by one.

6 Experiments

We examined the performance of FISVDD with four real data sets: shuttle data [9], mammography data [woods1993comparative], forest cover (ForestType) data [Rayana:2016], and the SMTP subset of KDD Cup 99 data [Rayana:2016]. The purpose of our experiments is to show that compared to the incremental SVM method (which can achieve global optimal solutions), the FISVDD method does not lose much in either objective function value or outlier detection accuracy while it demonstrates significant gains in efficiency. Our experiments used 4/5 of the normal data, randomly chosen, for training. The remaining normal data and the outliers together form the testing sets. All duplicates in the data sets are removed beforehand. Proper Gaussian bandwidths are selected by using fivefold cross validation, although selecting a proper Gaussian bandwidth is beyond the scope of this paper. SAS/IML® software is used in performing the experiments. In this paper, we compare FISVDD with the one-class incremental SVM method [8], a well-known technique for performing global optimal SVDD. For each method, the following quantities are measured in Table 1:

  • Time: The time used to learn the SVDD model.

  • Objective function value (OFV): The objective function values that were obtained with Eq. 2.8 after each iteration.

  • Number of support vectors (#sv): The number of support vectors when the training phase is finished. This number is related to the efficiency of the testing phase. When more support vectors exist, more calculations are required in testing.
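To make the link between #sv and testing cost concrete, the sketch below scores test points with the standard SVDD distance in feature space, dist^2(z) = K(z, z) - 2 sum_i alpha_i K(x_i, z) + alpha^T K alpha, where K(z, z) = 1 for the Gaussian kernel. The kernel parameterization exp(-||x - y||^2 / (2 sigma^2)) and the function names are assumptions; the radius R^2 is taken as the distance of the first support vector, relying on the property that all boundary support vectors are equidistant from the center.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 sigma^2)) (assumed parameterization)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def squared_distance_to_center(Z, sv, alpha, sigma):
    """dist^2(z) = 1 - 2 sum_i alpha_i K(x_i, z) + alpha^T K alpha.

    The middle term needs one kernel evaluation per support vector, so each
    test point costs O(#sv) kernel evaluations.
    """
    const = alpha @ gaussian_kernel(sv, sv, sigma) @ alpha  # computed once per model
    return 1.0 - 2.0 * gaussian_kernel(Z, sv, sigma) @ alpha + const


def flag_outliers(Z, sv, alpha, sigma, tol=1e-9):
    """Flag test points farther from the center than a boundary support vector."""
    r2 = squared_distance_to_center(sv[:1], sv, alpha, sigma)[0]
    return squared_distance_to_center(Z, sv, alpha, sigma) > r2 + tol
```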

The time consumed by the incremental SVM method with interior points discarded after each iteration is listed in parentheses. Table 1 also lists the settings for the experiments, including Gaussian bandwidth (Sigma), number of training observations (#Train obs), number of testing observations (#Test obs), and number of variables (#Var).

Data         Sigma  Method    #Train obs  #Test obs  #Var  OFV    Time (s)            #sv
Shuttle      5.5    FISVDD    36469       21531      9            251.01              1736
                    Inc. SVM                                      22923.57 (312.65)   1926
CoverType    470    FISVDD    226641      59407      10           19.47               432
                    Inc. SVM                                      12954.81 (29.45)    470
Mammography  0.8    FISVDD    6076        1773       6            1.19                317
                    Inc. SVM                                      67.01 (1.58)        317
SMTP         6      FISVDD    56967       14263      3     0.393  0.27                5
                    Inc. SVM                             0.393  2.49 (0.38)         5
Table 1: Experimental Results of FISVDD and Incremental SVM on Different Data Sets

Table 1 shows that, for the same Gaussian bandwidth, FISVDD is much faster than the incremental SVM method (for example, 251.01 s versus 22923.57 s on the Shuttle data), with only a tiny sacrifice in the objective function value. Because incremental SVM achieves globally optimal solutions, this indicates that the solutions provided by FISVDD are very close to the global optimum. Even when incremental SVM discards interior points after each iteration, FISVDD is still faster on all data sets in our experiments. As explained in Section 4, FISVDD is faster because it relies solely on matrix manipulations, which saves many calculations.
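The size of the speedup can be read off Table 1 with a few lines of arithmetic. This sketch assumes that the value immediately before the #sv column in each row is the training time in seconds and that the parenthesized value is the time of incremental SVM with interior points discarded; the names `TABLE1_TIMES` and `speedups` are illustrative.

```python
# Training times (s) read from Table 1: FISVDD, Inc. SVM, and Inc. SVM with
# interior points discarded (the parenthesized value).
TABLE1_TIMES = {
    "Shuttle": (251.01, 22923.57, 312.65),
    "CoverType": (19.47, 12954.81, 29.45),
    "Mammography": (1.19, 67.01, 1.58),
    "SMTP": (0.27, 2.49, 0.38),
}


def speedups(times):
    """Return {data set: (speedup vs Inc. SVM, speedup vs Inc. SVM with discarding)}."""
    return {
        name: (inc / fis, inc_discard / fis)
        for name, (fis, inc, inc_discard) in times.items()
    }


for name, (full, discard) in speedups(TABLE1_TIMES).items():
    print(f"{name:12s} {full:8.1f}x {discard:6.2f}x")
```

Both ratios exceed 1 for every data set, which is the quantitative content of the claim that FISVDD is faster even against the discarding variant.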

Figures 1 through 4 plot the F-1 measure [tan2006introduction] of the outlier detection accuracy of FISVDD and incremental SVM for different training sizes. The plots show that discarding interior points at the end of each iteration causes almost no loss in outlier detection quality.
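The F-1 measure is the harmonic mean of precision and recall. A minimal sketch follows; which class counts as positive is not stated here, so treating outliers as the positive class (label 1) is an assumption, as is the convention of returning 0 when there are no true positives.

```python
def f1_measure(y_true, y_pred, positive=1):
    """F-1 = 2 * precision * recall / (precision + recall), outliers assumed positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```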

Figure 1: F-1 Measure for Shuttle Data Set
Figure 2: F-1 Measure for CoverType Data Set
Figure 3: F-1 Measure for Mammography Data Set
Figure 4: F-1 Measure for SMTP Data Set

7 Conclusion

This paper introduces a fast incremental SVDD learning algorithm (FISVDD), which is more efficient than existing SVDD algorithms. In each iteration, FISVDD considers only the incoming data point and the support vectors that were determined in the previous iteration. The essential calculations of FISVDD come from incremental and decremental updates of the inverse of a similarity matrix. This algorithm builds on an observation that is natural in SVDD models but has not been fully utilized by existing SVDD algorithms: all support vectors on the boundary have the same distance to the center of the sphere in the higher-dimensional feature space induced by the Gaussian kernel function. FISVDD uses the signs of the entries in the row sums of this inverse matrix to distinguish interior points from support vectors, and uses their magnitudes to determine the Lagrange multiplier of each support vector. Experimental results indicate that FISVDD gains substantial efficiency with almost no loss in accuracy or objective function value.
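The row-sum rule in the conclusion can be illustrated directly. If every boundary support vector is equidistant from the center, then K alpha = c e, so alpha is proportional to K^{-1} e, the vector of row sums of the inverse kernel matrix; normalizing to sum(alpha) = 1 (assumed here as the usual SVDD constraint) gives the Lagrange multipliers, and since e^T K^{-1} e > 0 for a positive definite K, a negative row sum marks an interior point. The function name and kernel parameterization are assumptions, and the sketch inverts the matrix directly rather than updating it incrementally as FISVDD does.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 sigma^2)) (assumed parameterization)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def alphas_from_row_sums(X, sigma):
    """Return (alpha, interior_mask) for candidate support vectors X.

    Row sums of K^{-1} give alpha up to scale; their signs separate boundary
    support vectors (nonnegative) from interior points (negative).
    """
    K_inv = np.linalg.inv(gaussian_kernel(X, X, sigma))
    row_sums = K_inv.sum(axis=1)
    # e^T K^{-1} e > 0 for positive definite K, so normalizing keeps the signs.
    alpha = row_sums / row_sums.sum()
    return alpha, row_sums < 0
```

For three collinear points at -1, 0, and 1 with sigma = 1, the middle point receives a negative row sum and is identified as interior, matching the intuition that it lies inside the boundary.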

Acknowledgement

We would like to thank Anne Baxter and Maria Jahja for their help with this paper. We would also like to thank Yuwei Liao, Minghui Liu, Joshua Griffin, and Seunghyun Kong for discussions related to SVDD.

References

  • [1] Asa Ben-Hur, David Horn, Hava T. Siegelmann, and Vladimir Vapnik. Support vector clustering. Journal of Machine Learning Research, 2(Dec):125–137, 2001.
  • [2] Gert Cauwenberghs and Tomaso Poggio. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems, pages 409–415, 2001.
  • [3] Paul Evangelista, Mark Embrechts, and Boleslaw Szymanski. Some properties of the Gaussian kernel for one class learning. Artificial Neural Networks–ICANN 2007, pages 269–278, 2007.
  • [4] Jean Gallier. The Schur complement and symmetric positive semidefinite (and definite) matrices. Penn Engineering, 2010.
  • [5] Bin Gu, Victor S. Sheng, Keng Yeow Tay, Walter Romano, and Shuo Li. Incremental support vector learning for ordinal regression. IEEE Transactions on Neural Networks and Learning Systems, 26(7):1403–1416, 2015.
  • [6] Deovrat Kakde, Arin Chaudhuri, Seunghyun Kong, Maria Jahja, Hansi Jiang, and Jorge Silva. Peak criterion for choosing Gaussian kernel bandwidth in support vector data description. In Prognostics and Health Management (ICPHM), 2017 IEEE International Conference on, pages 33–41. IEEE, 2017.
  • [7] Masayuki Karasuyama and Ichiro Takeuchi. Multiple incremental decremental learning of support vector machines. In Advances in Neural Information Processing Systems, pages 907–915, 2009.
  • [8] Pavel Laskov, Christian Gehl, Stefan Krüger, and Klaus-Robert Müller. Incremental support vector learning: Analysis, implementation and applications. Journal of Machine Learning Research, 7(Sep):1909–1936, 2006.
  • [9] M. Lichman. UCI machine learning repository, 2013.
  • [10] Carl D. Meyer. Matrix analysis and applied linear algebra, volume 2. SIAM, 2000.
  • [11] Katya Scheinberg. An efficient implementation of an active set method for SVMs. Journal of Machine Learning Research, 7(Oct):2237–2257, 2006.
  • [12] Bernhard Schölkopf, Robert C. Williamson, Alex J. Smola, John Shawe-Taylor, and John C. Platt. Support vector method for novelty detection. In Advances in Neural Information Processing Systems, pages 582–588, 2000.
  • [13] Alex J. Smola and Bernhard Schölkopf. Learning with kernels. GMD-Forschungszentrum Informationstechnik, 1998.
  • [14] Nadeem Ahmed Syed, Huan Liu, and Kah Kay Sung. Incremental learning with support vector machines. 1999.
  • [15] David M. J. Tax and Robert P. W. Duin. Support vector data description. Machine learning, 54(1):45–66, 2004.
  • [16] David M. J. Tax and Pavel Laskov. Online SVM learning: from classification to data description and back. In Neural Networks for Signal Processing, 2003. NNSP’03. 2003 IEEE 13th Workshop on, pages 499–508. IEEE, 2003.
  • [17] Yingchao Xiao, Huangang Wang, Lin Zhang, and Wenli Xu. Two methods of selecting Gaussian kernel parameters for one-class SVM and their application to fault detection. Knowledge-Based Systems, 59:75–84, 2014.