Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and anomaly detection. First introduced by Tax and Duin, SVDD’s mathematical formulation is almost identical to that of the one-class variant of support vector machines, the one-class support vector machine (OCSVM), which is attributed to Schölkopf et al. SVDD is popular in domains in which the majority of the data belong to a single class and no distributional assumptions can be made. For example, SVDD is useful for analyzing sensor readings from reliable equipment, for which almost all the readings describe the equipment’s normal state of operation.
Like other one-class classifiers, SVDD provides a geometric description of the observed data. The SVDD classifier assigns a distance to each point in the domain space; the distance measures the separation of that point from the training data. During scoring, any observation found to be at a large distance from the training data might be an anomaly, and the user might choose to generate an alert.
I-A Mathematical Formulation
In this section, we describe the mathematical formulation of SVDD; the description is based on Tax and Duin.
Normal Data Description:
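The SVDD model for normal data description builds a hypersphere that contains most of the data within a small radius. Given observations $x_1, \dots, x_n$, we need to solve the following optimization problem to obtain the SVDD data description:
$$\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad \|x_i - a\|^2 \le R^2 + \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, \dots, n,$$
where
$x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, represent the training data,
$R$ is the radius and represents the decision variable,
$\xi_i$ is the slack for each variable,
$a$ is the center (a decision variable),
$C = \frac{1}{nf}$ is the penalty constant that controls the trade-off between the volume and the errors, and
$f$ is the expected outlier fraction.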
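The dual formulation is obtained using Lagrange multipliers:
$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i (x_i \cdot x_i) - \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j (x_i \cdot x_j) \quad \text{subject to} \quad \sum_{i=1}^{n}\alpha_i = 1,\;\; 0 \le \alpha_i \le C,$$
where $\alpha_i \in \mathbb{R}$ are the Lagrange constants and $C = \frac{1}{nf}$ is the penalty constant.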
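The position of observation $x_i$ is connected to the optimal $\alpha_i$, the radius $R$ of the sphere, and the center $a$ of the sphere in the following manner:
$$a = \sum_{i=1}^{n}\alpha_i x_i,$$
$$\|x_i - a\|^2 < R^2 \Rightarrow \alpha_i = 0, \qquad \|x_i - a\|^2 = R^2 \Rightarrow 0 < \alpha_i < C, \qquad \|x_i - a\|^2 > R^2 \Rightarrow \alpha_i = C.$$
Any $x_i$ for which the corresponding $\alpha_i > 0$ is known as a support vector.
Let $SV_{<C}$ denote the set $\{x_i : 0 < \alpha_i < C\}$. Then the radius of the hypersphere is calculated as follows for any $x_k \in SV_{<C}$:
$$R^2 = (x_k \cdot x_k) - 2\sum_{i=1}^{n}\alpha_i (x_i \cdot x_k) + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j (x_i \cdot x_j).$$
The value of $R^2$ does not depend on the choice of $x_k \in SV_{<C}$.
For any point $z$, the distance $\operatorname{dist}^2(z)$ is calculated as follows:
$$\operatorname{dist}^2(z) = (z \cdot z) - 2\sum_{i=1}^{n}\alpha_i (x_i \cdot z) + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j (x_i \cdot x_j).$$
Points whose $\operatorname{dist}^2(z) > R^2$ are designated as outliers.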
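As a small illustration of these formulas, the Python sketch below evaluates the center $a$, the radius $R^2$, and $\operatorname{dist}^2(z)$ for the plain inner product. It assumes the dual weights $\alpha_i$ are already available (here they are set by hand for a three-point example whose minimum enclosing circle is known), so it sketches only the scoring step, not the optimization; the function names are hypothetical.

```python
# Sketch: SVDD center, radius, and distance for the plain inner product.
# Assumption: alpha comes from a QP solver; here it is supplied by hand.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(X, alpha):
    """Center a = sum_i alpha_i x_i."""
    return [sum(al * x[j] for al, x in zip(alpha, X)) for j in range(len(X[0]))]


def dist2(z, X, alpha):
    """dist^2(z) = z.z - 2 sum_i alpha_i (x_i.z) + sum_ij alpha_i alpha_j (x_i.x_j)."""
    cross = sum(al * dot(x, z) for al, x in zip(alpha, X))
    quad = sum(ai * aj * dot(xi, xj)
               for ai, xi in zip(alpha, X) for aj, xj in zip(alpha, X))
    return dot(z, z) - 2.0 * cross + quad


def radius2(X, alpha, C, tol=1e-9):
    """R^2 = dist^2(x_k) for any x_k in SV_{<C}, i.e. with 0 < alpha_k < C."""
    for al, x in zip(alpha, X):
        if tol < al < C - tol:
            return dist2(x, X, alpha)
    raise ValueError("no support vector with 0 < alpha < C")


# Three points whose minimum enclosing circle has center (1, 0) and radius 1.
X = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)]
alpha = [0.5, 0.5, 0.0]
R2 = radius2(X, alpha, C=1.0)
print(center(X, alpha), R2)              # [1.0, 0.0] 1.0
print(dist2((3.0, 0.0), X, alpha) > R2)  # True: outlier
```

The check only confirms that the three formulas agree with ordinary Euclidean geometry; in practice `alpha` would be produced by solving the dual problem.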
The spherical data boundary can include a significant amount of space that
has a sparse distribution of training observations. Using this model to score
can lead to a lot of false positives. Hence, instead of a spherical shape, a
compact bounded outline around the data is often desired. Such an outline should
approximate the shape of the single-class training data. This is possible
by using kernel functions.
Flexible Data Description:
The support vector data description is made flexible by replacing the inner product $(x_i \cdot x_j)$ with a suitable kernel function $K(x_i, x_j)$. The Gaussian kernel function used in this paper is defined as
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2s^2}\right),$$
where $s$ is the Gaussian bandwidth parameter.
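The modified mathematical formulation of SVDD with a kernel function is as follows:
$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i K(x_i, x_i) - \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{n}\alpha_i = 1,\;\; 0 \le \alpha_i \le C.$$
In perfect analogy with the previous section, any $x_i$ for which $\alpha_i = 0$ is an inside point, and any $x_i$ for which $\alpha_i > 0$ is called a support vector.
$SV_{<C}$ is similarly defined as $\{x_i : 0 < \alpha_i < C\}$, and the threshold $R^2$ is calculated as follows for any $x_k \in SV_{<C}$:
$$R^2 = K(x_k, x_k) - 2\sum_{i=1}^{n}\alpha_i K(x_i, x_k) + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(x_i, x_j).$$
The value of $R^2$ does not depend on which $x_k \in SV_{<C}$ is used.
Scoring: For any observation $z$, the distance $\operatorname{dist}^2(z)$ is calculated as follows:
$$\operatorname{dist}^2(z) = K(z, z) - 2\sum_{i=1}^{n}\alpha_i K(x_i, z) + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(x_i, x_j).$$
Any point $z$ for which $\operatorname{dist}^2(z) > R^2$ is designated as an outlier.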
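The Python sketch below puts the kernel formulation together: it approximately solves the dual by projected gradient ascent (a simple substitute chosen here for illustration, not the solver used in this paper), computes $R^2$, and scores new points. The helper names, the step size, the iteration counts, and the choice to average $R^2$ over $SV_{<C}$ (equivalent in exact arithmetic) are assumptions.

```python
import math


def gauss(u, v, s):
    """Gaussian kernel K(u, v) = exp(-||u - v||^2 / (2 s^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * s * s))


def project_capped_simplex(v, C, iters=100):
    """Euclidean projection onto {sum(alpha) = 1, 0 <= alpha_i <= C}, found by
    bisection on a shift tau (requires len(v) * C >= 1)."""
    lo, hi = min(v) - C, max(v)
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if sum(min(max(x - tau, 0.0), C) for x in v) > 1.0:
            lo = tau
        else:
            hi = tau
    tau = 0.5 * (lo + hi)
    return [min(max(x - tau, 0.0), C) for x in v]


def svdd_fit(X, s, f, steps=2000):
    """Projected-gradient sketch for the kernel SVDD dual (not a production QP solver)."""
    n = len(X)
    C = 1.0 / (n * f)                      # penalty constant C = 1/(n f)
    K = [[gauss(a, b, s) for b in X] for a in X]
    alpha = [1.0 / n] * n                  # feasible starting point
    eta = 1.0 / (2.0 * n)                  # 1/L, since 2n >= 2 * lambda_max(K)
    for _ in range(steps):
        # gradient of the negated dual: alpha^T K alpha - sum_i alpha_i K_ii
        grad = [2.0 * sum(K[i][j] * alpha[j] for j in range(n)) - K[i][i]
                for i in range(n)]
        alpha = project_capped_simplex([a - eta * g for a, g in zip(alpha, grad)], C)
    return alpha, C


def svdd_dist2(z, X, alpha, s):
    """dist^2(z) = K(z,z) - 2 sum_i alpha_i K(x_i,z) + sum_ij alpha_i alpha_j K(x_i,x_j)."""
    quad = sum(ai * aj * gauss(xi, xj, s)
               for ai, xi in zip(alpha, X) for aj, xj in zip(alpha, X))
    return gauss(z, z, s) - 2.0 * sum(a * gauss(x, z, s) for a, x in zip(alpha, X)) + quad


def svdd_radius2(X, alpha, C, s, tol=1e-6):
    """R^2 averaged over SV_{<C}; the individual values agree in exact arithmetic."""
    sv = [x for a, x in zip(alpha, X) if tol < a < C - tol]
    return sum(svdd_dist2(x, X, alpha, s) for x in sv) / len(sv)


X = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
alpha, C = svdd_fit(X, s=1.0, f=0.25)
R2 = svdd_radius2(X, alpha, C, s=1.0)
print(svdd_dist2((0.0, 0.0), X, alpha, 1.0) <= R2)   # True: inside
print(svdd_dist2((5.0, 5.0), X, alpha, 1.0) > R2)    # True: outlier
```

For the symmetric square, every corner receives the same weight $\alpha_i = 0.25$, so all four points lie in $SV_{<C}$ and define the same $R^2$.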
I-B Importance of the Kernel Bandwidth Value
In practice, SVDD is almost always computed by using the Gaussian kernel function, and it is important to set the value of the bandwidth parameter $s$ correctly. A small bandwidth leads to overfitting, and the resulting SVDD classifier overestimates the number of anomalies. A large bandwidth leads to underfitting, and many anomalies cannot be detected by the classifier.
Because SVDD is an unsupervised learning technique, it is desirable to have an automatic, unsupervised bandwidth selection technique that does not depend on labeled data that separate the inliers from the outliers. Kakde et al. present the peak criterion, which is an unsupervised bandwidth selection technique, and show that it performs better than alternative unsupervised methods. However, determining the bandwidth that is suggested by the peak criterion requires that the SVDD solution be computed multiple times for the training data, once for each bandwidth value on a grid. Even though sampling techniques can speed up the computation (see Peredriy et al.), this method is still expensive. Moreover, the grid search must be started at a good initial value in order to avoid unnecessary computation, and it is not immediately obvious what a good starting value is.
Liao et al. present the modified mean criterion for bandwidth selection. The suggested bandwidth has a closed-form expression in terms of the training data, and it can be computed very quickly. SVDD that is trained by using the modified mean criterion bandwidth is reasonably accurate for many data sets; however, it is generally less accurate than the peak criterion.
In this paper, we introduce the trace criterion for Gaussian bandwidth selection. The computation of the trace criterion consists of finding the inflection point of a smooth function of the bandwidth parameter. The computation terminates quickly when standard nonlinear optimization methods such as Newton-Raphson are used. This method is efficient for moderately large data sets.
Our results show that the trace criterion is competitive with the peak criterion for many data sets. Simulation studies suggest that this method is more accurate than the modified mean criterion for a certain class of high-dimensional data.
These properties make the trace criterion method a good bandwidth selection technique. However, unsupervised bandwidth tuning is an extremely difficult problem, so it is quite possible that there is a class of data sets for which the trace criterion does not give good results.
The rest of the paper is organized as follows. Section II defines the trace criterion for bandwidth tuning, and the remaining sections compare the modified mean, peak, and trace criteria with one another.
II The Trace Criterion for Bandwidth Selection
II-A Parameter tuning using inflection points
Parameter tuning for unsupervised learning methods such as clustering and one-class classification can be difficult when external validation data are not available, as is quite frequently the case. A popular method for parameter tuning in such cases is to look at the values of a “validation measure” as a function of the hyperparameter of interest, and choose the value of the hyperparameter where the function has an inflection point. For a specific example, consider the problem of determining the number of clusters for $k$-means. The within-cluster sum of squares $W$, that is, the sum of the squared distance of each point $x$ in the training data from the cluster center closest to $x$, can be taken as a validation measure. If the number of clusters is held fixed, then a lower value of $W$ indicates better clustering. Let $W(k)$ denote the value of $W$ for the clustering that is suggested by $k$-means with $k$ clusters. As $k$ increases, $W(k)$ decreases, so you cannot choose the number of clusters as $\arg\min_k W(k)$, because that would suggest as many clusters as the number of points in the training data. However, it is observed that the value of $k$ at which $W(k)$ has an inflection point (the “elbow”) is quite frequently a good choice for the number of clusters. The inflection point of other validation measures, such as the silhouette coefficient, is also used to determine the number of clusters.
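One simple way to locate such a bend in a discrete sequence $W(1), W(2), \dots$ is to take the $k$ with the largest second difference. The Python sketch below is a hypothetical helper illustrating that heuristic; it is not prescribed by this paper, and the input values are made up.

```python
def elbow(W):
    """W[k - 1] holds W(k), the within-cluster sum of squares for k = 1, ..., len(W)
    clusters. Returns the k with the largest second difference
    W(k-1) - 2 W(k) + W(k+1), a discrete proxy for where the curve bends most."""
    best_k, best = None, float("-inf")
    for k in range(2, len(W)):            # k needs both neighbours k-1 and k+1
        curvature = W[k - 2] - 2.0 * W[k - 1] + W[k]
        if curvature > best:
            best_k, best = k, curvature
    return best_k


W = [100.0, 80.0, 60.0, 15.0, 12.0, 10.0]   # hypothetical W(1), ..., W(6)
print(elbow(W))   # 4
```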
We will now propose a validation measure for SVDD whose inflection point provides our suggested bandwidth.
II-B Validation measure for trace criterion
Assume we have a training data set that consists of $n$ distinct points $x_1, \dots, x_n$ in $\mathbb{R}^p$, and we want to determine a good kernel bandwidth value $s$ for training this data set. In kernel methods, including SVDD, the objective function to be optimized depends on how the data are transformed through the kernel matrix $K = \left(K(x_i, x_j)\right)_{i,j=1}^{n}$. Since this matrix has $n^2$ elements, where $n$ is the number of observations, it is impractical to work with the entire kernel matrix even for moderate values of $n$. The Gaussian kernel matrix is always positive semidefinite, and it typically has a rapidly decaying spectrum. The rapidly decaying spectrum can be exploited to create a low-rank positive semidefinite approximation $\tilde{K}$ of $K$ by replacing all the eigenvalues in the spectral decomposition of $K$ below a certain threshold by $0$. $\tilde{K}$ has a square root of low rank; that is, $\tilde{K} = LL^\top$, where $L$ is an $n \times r$ matrix with $r \ll n$. Since $K \approx \tilde{K} = LL^\top$, $L$ can be considered an approximate low-rank square root of $K$. In many cases, computation of expressions that involve $K$ can be made tractable by replacing $K$ with $LL^\top$. However, it is not feasible to compute $L$ such that $K \approx LL^\top$ by using the eigendecomposition of $K$ when $n$ is large. The Nyström methods form a class of popular methods to compute a low-rank representation of $K$ even when $n$ is large. We use a variant of the Nyström method as described by Zhang et al. to construct a validation measure whose inflection point will be suggested as the bandwidth.
Given data $x_1, \dots, x_n$ in $\mathbb{R}^p$ and an integer $m$, let $z_1, \dots, z_m$ be distinct landmark points in $\mathbb{R}^p$. The landmark points are chosen so that they are evenly distributed within the training data. Following Zhang et al., we choose $z_1, \dots, z_m$ as the cluster centers that are obtained from a $k$-means clustering of the training data with $m$ clusters.
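A minimal Python sketch of this landmark step is Lloyd's $k$-means algorithm. The farthest-first initialization, the iteration cap, and the helper name `kmeans_landmarks` are assumptions chosen for determinism; a library implementation with k-means++ initialization would serve the same purpose.

```python
def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_landmarks(X, m, iters=100):
    """Lloyd's k-means; the m cluster centers serve as landmarks z_1, ..., z_m.
    Farthest-first initialisation is an assumption made here for determinism."""
    centers = [list(X[0])]
    while len(centers) < m:
        far = max(X, key=lambda x: min(sqdist(x, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for x in X:
            groups[min(range(m), key=lambda j: sqdist(x, centers[j]))].append(x)
        # mean of each cluster; an empty cluster keeps its previous center
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:               # assignments have stabilised
            break
        centers = new
    return centers


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans_landmarks(X, 2)))
```

If the data contain fewer than $m$ distinct points, the initialization can produce duplicate centers, which would violate the distinctness assumption above.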
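From the theory of reproducing kernel Hilbert spaces (RKHS), we know that for any $s > 0$ there is a Hilbert space $H_s$ and a mapping $\phi_s : \mathbb{R}^p \to H_s$ such that for any $u, v \in \mathbb{R}^p$ we have $K_s(u, v) = \langle \phi_s(u), \phi_s(v) \rangle$ and $\|\phi_s(u)\|^2 = K_s(u, u) = 1$, where $K_s$ denotes the Gaussian kernel with bandwidth $s$, $\langle \cdot, \cdot \rangle$ denotes the inner product on $H_s$, and $\|\cdot\|$ denotes the norm on $H_s$.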
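Let $V_s$ denote the subspace of $H_s$ spanned by $\phi_s(z_1), \dots, \phi_s(z_m)$. Note that $V_s$ is a finite-dimensional, and hence closed, subspace of $H_s$ of dimension exactly $m$ (because the Gaussian kernel matrix of distinct points is positive definite). Given $x \in \mathbb{R}^p$, let $P_s(x)$ denote the projection of $\phi_s(x)$ on $V_s$.
In the Nyström methods, the kernel matrix $K$ is approximated by $\tilde{K} = \left(\langle P_s(x_i), P_s(x_j) \rangle\right)_{i,j=1}^{n}$, a rank-$m$ matrix that has a low-dimensional square root that can be easily computed.
We use the projected values for a different purpose: we use the discrepancy between $\phi_s(x_i)$ and $P_s(x_i)$ for $x_i$ in the training data to come up with a validation measure.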
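From the usual least squares arguments, we know that for any $x \in \mathbb{R}^p$, we have $P_s(x) = \sum_{j=1}^{m} c_j \phi_s(z_j)$, where $c_1, \dots, c_m$ are such that
$$\Big\langle \phi_s(x) - \sum_{j=1}^{m} c_j \phi_s(z_j),\; \phi_s(z_k) \Big\rangle = 0, \qquad k = 1, \dots, m.$$
This implies the normal equations
$$\sum_{j=1}^{m} c_j K_s(z_j, z_k) = K_s(x, z_k), \quad k = 1, \dots, m, \qquad \text{that is,} \qquad K_z c = k_x,$$
where $K_z = \left(K_s(z_j, z_k)\right)_{j,k=1}^{m}$ and $k_x = \left(K_s(x, z_1), \dots, K_s(x, z_m)\right)^\top$, and implies that $c = (c_1, \dots, c_m)^\top$ can be explicitly computed as
$$c = K_z^{-1} k_x.$$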
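Since $P_s(x)$ and $\phi_s(x) - P_s(x)$ are orthogonal, the squared norm of the residual is given by
$$\|\phi_s(x) - P_s(x)\|^2 = \|\phi_s(x)\|^2 - \|P_s(x)\|^2 = 1 - k_x^\top K_z^{-1} k_x.$$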
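This shows that $0 \le k_x^\top K_z^{-1} k_x \le 1$. Let $g_s(x) = \|P_s(x)\|^2 = k_x^\top K_z^{-1} k_x$; the closer $g_s(x)$ is to $1$, the lower the residual error is. From the preceding expression for $g_s(x)$, you can see that
$$g_s(x) = \operatorname{trace}\left(K_z^{-1} k_x k_x^\top\right).$$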
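The derivation can be checked numerically with the plain inner product $K(u, v) = u \cdot v$, whose feature map is the identity, so the projection onto $V$ is ordinary Euclidean projection. The Python sketch below (helper names hypothetical) solves the normal equations $K_z c = k_x$ and confirms that $k_x^\top K_z^{-1} k_x$ equals the squared norm of the projection. For this kernel $\|\phi(x)\|^2 = \|x\|^2$ rather than $1$; the Gaussian case in the text changes only the kernel function.

```python
def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][k] * c[k] for k in range(r + 1, n))) / M[r][r]
    return c


def projection_norm2(x, Z, kernel):
    """||P(x)||^2 = k_x^T K_z^{-1} k_x, with c = K_z^{-1} k_x from the normal equations."""
    Kz = [[kernel(a, b) for b in Z] for a in Z]
    kx = [kernel(x, z) for z in Z]
    c = solve(Kz, kx)
    return sum(ci * ki for ci, ki in zip(c, kx)), c


def linear(u, v):
    """Plain inner product: its feature map is the identity."""
    return sum(a * b for a, b in zip(u, v))


x = (2.0, 3.0, 5.0)
Z = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]   # V is the x-y plane
norm2, c = projection_norm2(x, Z, linear)
print(c, norm2, linear(x, x) - norm2)    # [-1.0, 3.0] 13.0 25.0
```

Here $P(x) = -1 \cdot z_1 + 3 \cdot z_2 = (2, 3, 0)$, and the residual $25 = 5^2$ is the squared distance from $x$ to the plane.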
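So, given training data $x_1, \dots, x_n$, if we define
$$t(s) = \frac{1}{n}\sum_{i=1}^{n} g_s(x_i) = \frac{1}{n}\operatorname{trace}\Big(K_z^{-1}\sum_{i=1}^{n} k_{x_i} k_{x_i}^\top\Big),$$
then $t(s)$ is a measure of the loss of accuracy due to projection into $V_s$, where $g_s(x_i)$ lies between $0$ and $1$ for each $i$. Higher values of $t(s)$ indicate lower loss in precision due to projection into $V_s$. In particular, if $t(s) = 1$, there is absolutely no loss of precision; that is, $P_s(x_i) = \phi_s(x_i)$ for $i = 1, \dots, n$.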
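The Python sketch below evaluates $t(s)$ directly from this definition for a toy one-dimensional data set; the data, landmarks, and function names are hypothetical. For clarity it solves the $m \times m$ system once per training point; a practical implementation would factor $K_z$ once per value of $s$.

```python
import math


def gauss(u, v, s):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * s * s))


def solve(A, b):
    """Gaussian elimination with partial pivoting for the small m x m system."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][k] * c[k] for k in range(r + 1, n))) / M[r][r]
    return c


def trace_t(X, Z, s):
    """t(s) = (1/n) sum_i k_{x_i}^T K_z^{-1} k_{x_i}; each term lies in [0, 1]."""
    Kz = [[gauss(a, b, s) for b in Z] for a in Z]
    total = 0.0
    for x in X:
        kx = [gauss(x, z, s) for z in Z]
        c = solve(Kz, kx)          # c = K_z^{-1} k_x
        total += sum(ci * ki for ci, ki in zip(c, kx))
    return total / len(X)


Z = [(0.0,), (2.0,)]   # hypothetical one-dimensional landmarks
X = [(1.0,)]           # hypothetical training point halfway between them
for s in (0.1, 0.5, 1.0, 2.0):
    print(s, round(trace_t(X, Z, s), 4))
```

As expected from the discussion that follows, $t(s)$ rises from nearly $0$ toward $1$ as $s$ grows, and it equals $1$ when the training points coincide with the landmarks.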
It is empirically observed that $t(s)$ typically increases as $s$ increases (that is, $t'(s) \ge 0$). Moreover, $t'(s)$ is unimodal: it initially increases until it reaches its maximum value, and then decreases. So $t(s)$ typically has a well-defined inflection point where $t'(s)$ takes its maximum value. See Figure 1 for a typical plot of $t(s)$ and $t'(s)$.
The trace criterion suggests using $s^* = \arg\max_{s > 0} t'(s)$ as the bandwidth; empirical observations suggest that $s^*$ coincides with the inflection point of $t(s)$ in most cases.
Even though $t(s)$ is a scalar function, it is defined in terms of matrix operations, and we can come up with an explicit closed form for $t'(s)$ by using the usual rules of matrix differential calculus (Magnus and Neudecker).
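Let $K_z$ denote the kernel matrix of the representative points $z_1, \dots, z_m$. Its element-by-element derivative with respect to $s$ is given by the matrix
$$\frac{dK_z}{ds} = \left(\frac{\|z_j - z_k\|^2}{s^3}\, K_s(z_j, z_k)\right)_{j,k=1}^{m}.$$
For $x \in \mathbb{R}^p$, let
$$k_x = \left(K_s(x, z_1), \dots, K_s(x, z_m)\right)^\top$$
denote a column vector. Then its element-by-element derivative is given by the column vector
$$\frac{dk_x}{ds} = \left(\frac{\|x - z_1\|^2}{s^3}\, K_s(x, z_1), \dots, \frac{\|x - z_m\|^2}{s^3}\, K_s(x, z_m)\right)^\top.$$
Then we have
$$\frac{d}{ds}\left(k_x^\top K_z^{-1} k_x\right) = 2\left(\frac{dk_x}{ds}\right)^\top K_z^{-1} k_x - k_x^\top K_z^{-1}\frac{dK_z}{ds}K_z^{-1} k_x,$$
and hence
$$t'(s) = \frac{1}{n}\sum_{i=1}^{n}\left[2\left(\frac{dk_{x_i}}{ds}\right)^\top K_z^{-1} k_{x_i} - k_{x_i}^\top K_z^{-1}\frac{dK_z}{ds}K_z^{-1} k_{x_i}\right].$$
Let $c_i = K_z^{-1} k_{x_i}$. Then the preceding equation simplifies to
$$t'(s) = \frac{1}{n}\sum_{i=1}^{n}\left[2\left(\frac{dk_{x_i}}{ds}\right)^\top c_i - c_i^\top \frac{dK_z}{ds}\, c_i\right].$$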
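To guard against sign or indexing slips in the closed form, the Python sketch below computes $t(s)$ and $t'(s)$ from the last expression and compares $t'(s)$ with a central finite difference of $t(s)$. The toy data and helper names are assumptions for illustration.

```python
import math


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def gauss(u, v, s):
    return math.exp(-sqdist(u, v) / (2.0 * s * s))


def solve(A, b):
    """Gaussian elimination with partial pivoting for the small m x m system."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][k] * c[k] for k in range(r + 1, n))) / M[r][r]
    return c


def trace_t_and_dt(X, Z, s):
    """Returns (t(s), t'(s)) using c_i = K_z^{-1} k_{x_i} and
    t'(s) = (1/n) sum_i [2 (dk_{x_i}/ds)^T c_i - c_i^T (dK_z/ds) c_i]."""
    m = len(Z)
    Kz = [[gauss(a, b, s) for b in Z] for a in Z]
    # element-wise derivative: d/ds K(u, v) = K(u, v) ||u - v||^2 / s^3
    dKz = [[Kz[i][j] * sqdist(Z[i], Z[j]) / s ** 3 for j in range(m)] for i in range(m)]
    t = dt = 0.0
    for x in X:
        kx = [gauss(x, z, s) for z in Z]
        dkx = [k * sqdist(x, z) / s ** 3 for k, z in zip(kx, Z)]
        c = solve(Kz, kx)
        t += sum(ci * ki for ci, ki in zip(c, kx))
        quad = sum(c[i] * dKz[i][j] * c[j] for i in range(m) for j in range(m))
        dt += 2.0 * sum(d * ci for d, ci in zip(dkx, c)) - quad
    return t / len(X), dt / len(X)


X = [(0.5,), (1.0,), (3.0,)]   # hypothetical training data
Z = [(0.0,), (2.0,)]           # hypothetical landmarks
s, h = 1.0, 1e-5
t, dt = trace_t_and_dt(X, Z, s)
fd = (trace_t_and_dt(X, Z, s + h)[0] - trace_t_and_dt(X, Z, s - h)[0]) / (2 * h)
print(dt, fd)
```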
From the preceding expression, it is apparent that $t'(s)$ is differentiable on $(0, \infty)$, so we can use standard nonlinear optimization methods such as Newton-Raphson to maximize it.
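The text mentions Newton-Raphson; the Python sketch below instead uses a coarse log-spaced grid followed by golden-section refinement of $t'(s)$, a swapped-in method chosen because it needs only $t'(s)$ and not $t''(s)$. The search interval, grid size, iteration counts, and toy data are assumptions, and golden-section search presumes that $t'(s)$ is unimodal near the best grid point, as the empirical observations above suggest.

```python
import math


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def gauss(u, v, s):
    return math.exp(-sqdist(u, v) / (2.0 * s * s))


def solve(A, b):
    """Gaussian elimination with partial pivoting for the small m x m system."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][k] * c[k] for k in range(r + 1, n))) / M[r][r]
    return c


def trace_dt(X, Z, s):
    """t'(s) = (1/n) sum_i [2 (dk_{x_i}/ds)^T c_i - c_i^T (dK_z/ds) c_i]."""
    m = len(Z)
    Kz = [[gauss(a, b, s) for b in Z] for a in Z]
    dKz = [[Kz[i][j] * sqdist(Z[i], Z[j]) / s ** 3 for j in range(m)] for i in range(m)]
    total = 0.0
    for x in X:
        kx = [gauss(x, z, s) for z in Z]
        c = solve(Kz, kx)
        dk_c = sum(k * sqdist(x, z) / s ** 3 * ci for k, z, ci in zip(kx, Z, c))
        total += 2.0 * dk_c - sum(c[i] * dKz[i][j] * c[j] for i in range(m) for j in range(m))
    return total / len(X)


def select_bandwidth(X, Z, s_lo, s_hi, n_grid=40, n_golden=60):
    """Approximate argmax of t'(s): log-spaced grid, then golden-section refinement."""
    grid = [s_lo * (s_hi / s_lo) ** (i / (n_grid - 1)) for i in range(n_grid)]
    vals = [trace_dt(X, Z, s) for s in grid]
    i = max(range(n_grid), key=lambda k: vals[k])
    best_s, best_v = grid[i], vals[i]
    a, b = grid[max(i - 1, 0)], grid[min(i + 1, n_grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0          # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = trace_dt(X, Z, c), trace_dt(X, Z, d)
    for _ in range(n_golden):
        if fc > fd:                           # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = trace_dt(X, Z, c)
        else:                                 # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = trace_dt(X, Z, d)
        for s_new, v_new in ((c, fc), (d, fd)):
            if v_new > best_v:                # never return worse than the grid
                best_s, best_v = s_new, v_new
    return best_s, best_v


X = [(i / 10.0,) for i in range(41)]          # hypothetical 1-D training data on [0, 4]
Z = [(0.5,), (2.0,), (3.5,)]                  # hypothetical landmarks
s_star, dt_star = select_bandwidth(X, Z, 0.05, 5.0)
print(round(s_star, 3), round(dt_star, 3))
```

Keeping the best value seen means the result is never worse than the coarse grid, even if $t'(s)$ is not unimodal inside the bracket.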
The cost of evaluating $t'(s)$ is $O(nmp + nm^2 + m^3)$, which is tractable because $m \ll n$.
We have to choose the number of representative points, $m$, for applying the trace criterion. We have observed that a single fixed value of $m$ gives a good result for most data sets, and we have used the trace criterion with that value throughout this paper unless explicitly stated otherwise.
III Evaluating the Trace Criterion
In this section, we use different data sets to evaluate the trace criterion and compare its performance with that of the modified mean and peak criteria. The evaluation is based on a series of data sets: starting from simple two-dimensional connected data, we gradually moved toward more complex high-dimensional disconnected data. For our evaluation, we used both real-life and simulated data.
We observed that for two-dimensional connected data and for the real-life data sets that are used in this evaluation, the performance of the modified mean and trace criteria is comparable. When the data were two-dimensional and disconnected (meaning that the data contained multiple disjoint clusters), the bandwidth value that was computed using the trace criterion provided a much better data description than the modified mean criterion (see the results in Figures 4 and 5).
We also evaluated the trace criterion by using high-dimensional hyperspheres and hypercubes. The training data set consisted of 1 to 12 disjoint hyperspheres or hypercubes, and the dimension of the data varied from 5 to 40 in increments of 5. The scoring data used for evaluation consisted of observations from inside one or more hypercubes or hyperspheres, and observations just outside the training data, in close proximity to it, as seen in Figures 8, 10, 12, and 13. For this set of evaluations, we observed that the bandwidth value provided by the trace criterion produced substantially better performance than the modified mean criterion.
III-A Choice of Data Sets
We evaluate the trace criterion by using the following four types of data sets:
Connected two-dimensional data: These are data that have two variables and no clusters.
Disconnected two-dimensional data: These are data that have two variables and two or more clusters.
Higher-dimensional data: These refer to real-life data that have more than two variables.
High-dimensional simulated data: These data consist of one or more hyperspheres or hypercubes.
IV Comparison Using Two-Dimensional Connected Data
IV-A Data Description
This section compares the trace criterion with the peak and modified mean criteria using two-dimensional connected data. Such data have no clusters. We use the star-shaped data and banana-shaped data, and we train an SVDD model by using the bandwidth values obtained by the peak, modified mean, and trace criteria. To evaluate the results, we scored the bounding rectangle of the data by dividing it into a 200 × 200 grid. With a good bandwidth value, the inlier region that is obtained from scoring should match the geometry of the training data. Figures 2 and 3 display the results, along with the bandwidth values.
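For reference, a Python sketch of the 200 × 200 scoring grid is shown below. Whether the grid includes the edges of the bounding rectangle is an assumption, and `scoring_grid` is a hypothetical helper; each grid point would then be scored with $\operatorname{dist}^2(z)$ and labeled as an inlier when $\operatorname{dist}^2(z) \le R^2$.

```python
def scoring_grid(X, n=200):
    """n x n equally spaced points covering the bounding rectangle of 2-D data X,
    edges included (an assumption about the exact grid layout)."""
    xs = [p[0] for p in X]
    ys = [p[1] for p in X]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return [(x0 + (x1 - x0) * i / (n - 1), y0 + (y1 - y0) * j / (n - 1))
            for i in range(n) for j in range(n)]


grid = scoring_grid([(0.0, 0.0), (4.0, 2.0), (1.0, 1.5)])
print(len(grid), grid[0], grid[-1])   # 40000 (0.0, 0.0) (4.0, 2.0)
```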
IV-B Scoring Results
The scoring results indicate that the bandwidth values obtained using the modified mean and trace criteria provide good-quality data descriptions. The value of bandwidth that is obtained using the modified mean criterion is comparable to the one obtained using the peak criterion.
[Figure 2 panels: (a) Training Data, (b) Peak, (c) Modified Mean, (d) Trace]
[Figure 3 panels: (a) Training Data, (b) Peak, (c) Modified Mean, (d) Trace]
V Comparison Using Two-Dimensional Disconnected Data
V-A Data Description
This section compares the trace criterion with the peak and modified mean criteria using two-dimensional data that lie across multiple clusters. We have observed that computing a good bandwidth value for such data is more difficult than for connected data. Since the data are two-dimensional, we can visually judge the quality of the results. To evaluate the results, we scored the bounding rectangle of the data by dividing it into a 200 × 200 grid. With a good bandwidth value, the inlier region obtained from scoring should match the geometry of the training data. We use the following two data sets for evaluation:
The refrigerant data, which consist of four clusters
A simulated “two-donut and a circle” data set, which consists of two donut-shaped clusters and one circular cluster
V-B Scoring Results
The scoring results indicate that the bandwidth values obtained using the peak and trace criteria provide a data description of reasonably good quality for both data sets. For the refrigerant data, the bandwidth values that are computed by the peak and trace criteria are close. The scoring results indicate that the data description obtained using these values is able to separate out all four clusters, whereas the description obtained using the modified mean criterion bandwidth value merges the two clusters that lie close to each other. As indicated in Figure 4, the bandwidth value that is obtained using the modified mean criterion is significantly larger than the ones obtained using the trace and peak criteria.
[Figure 4 panels: (a) Training Data, (b) Peak, (c) Modified Mean, (d) Trace]
[Figure 5 panels: (a) Training Data, (b) Peak, (c) Modified Mean, (d) Trace]
VI Comparison Using High-Dimensional Data
Comparing the different criteria for high-dimensional data is much more difficult than comparing them for two-dimensional data. For two-dimensional data, the quality of the result can be easily judged by looking at the plot of the scoring results, but this is not possible for high-dimensional data. For the purpose of evaluation, we selected labeled high-dimensional data that have a dominant class. We used SVDD on a subset of the dominant class to obtain a description of the dominant class, and then we scored the rest of the data to evaluate the criteria. We expected the points in the scoring data set that correspond to the dominant class to be classified as inliers and all other points to be classified as outliers. Because the data are labeled, we could also use cross validation to determine the bandwidth that best describes the dominant class in the sense of maximizing a measure of fit, such as the $F_1$-measure. So in this section we compare the bandwidths that are suggested by the different unsupervised criteria with the bandwidth that is obtained through cross validation for various benchmark data sets. The results are summarized in Table I. The benchmark data sets used for the analysis are described in Sections VI-A1 and VI-A2.
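|Criterion|Shuttle: $F_1$-measure (bandwidth)|Tennessee Eastman: $F_1$-measure (bandwidth)|
|Max (cross validation)|0.96 (17)|0.19 (17)|
|Peak|0.96 (14)|0.16 (8)|
|Modified Mean|0.96 (17.2)|0.181 (11.37)|
|Trace|0.958 (13.1)|0.181 (11.22)|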
VI-A Data Description
VI-A1 Shuttle Data
This data set consists of measurements made on a shuttle.
The data set contains nine numeric attributes and one classification
attribute. Of 58,000 total observations, 80%
belong to class 1. A random sample of 2,000
observations belonging to class 1 was selected for training, and
the remaining 56,000 observations were used for scoring. This
data set is from the UC Irvine Machine Learning Repository.
VI-A2 Tennessee Eastman Data
The data set was generated using the MATLAB simulation code, which provides a model of an industrial chemical process. The data were generated for normal operations of the process and 20 faulty processes. Each observation consists of 41 variables, out of which 22 were measured continuously every 6 seconds on average and the remaining 19 were sampled at a specified interval of every 0.1 or 0.25 hours. From the simulated data, we created an analysis data set that uses the normal operations data of the first 90 minutes and data that correspond to faults 1 through 20. A data set that contains observations of normal operations was used for training. Scoring was performed to determine whether the model could accurately classify an observation as belonging to normal operation of the process. The MATLAB simulation code is available at .
VI-B Scoring Results
The results outlined in Table I indicate that the $F_1$-measure values obtained using all three criteria are comparable and lie in the neighborhood of the best value that can be obtained by cross validation.
VII Comparison Using Simulated Data
In this section, we present the results of a simulation study that we conducted to compare the performance of the trace criterion with that of the peak and modified mean criteria. The simulations were performed to generate training data with a known geometry. The data dimensions were varied between 2 and 10. We conducted three simulation studies. Table II provides details of these studies.
|Study|Data dimension|Number of data sets|Description|
|Two-dimensional polygons|2|600|Polygons with varying number of vertices and length of sides|
|Hypercubes|5 to 40 in increments of 5|400|Single hypercube used for training|
|Hyperspheres|5 to 40 in increments of 5|400|Single hypersphere used for training|
|High-dimensional disconnected data with multiple hyperspheres|2 to 10|880|Each training data set contains multiple disjoint hyperspheres, ranging from 2 to 12.|
|High-dimensional disconnected data with multiple hypercubes|2 to 10|880|Each training data set contains multiple disjoint hypercubes, ranging from 2 to 12.|
VII-A Evaluation Using Two-Dimensional Polygons
In this section, we measure the performance of the trace criterion when it is applied to randomly generated polygons. Given the number of vertices, $v$, we generate the vertices of a random polygon in the anticlockwise sense as $(r_i\cos\theta_{(i)},\, r_i\sin\theta_{(i)})$, $i = 1, \dots, v$. Here $\theta_{(1)} \le \dots \le \theta_{(v)}$ are the order statistics of an i.i.d. sample that is uniformly drawn from $[0, 2\pi)$, and the $r_i$ are uniformly drawn from an interval $[r_{\min}, r_{\max}]$.
For this simulation, we fixed $r_{\min}$ and $r_{\max}$ and varied the number of vertices, generating the same number of random polygons for each vertex size. Having determined a polygon, we randomly sampled points uniformly from the interior of the polygon and applied the trace criterion to this sample to determine a bandwidth value. Figure 6 shows two random polygons.
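The polygon construction and the uniform interior sampling can be sketched in Python as follows. The values $r_{\min} = 3$ and $r_{\max} = 5$, the sample size, and the use of rejection sampling with an even-odd ray-casting test are illustrative assumptions rather than the settings used in this study. When the angular gap between two consecutive vertices exceeds $\pi$, the polygon can self-intersect; the even-odd test still defines an interior in that case.

```python
import math
import random


def random_polygon(n_vertices, r_min, r_max, rng):
    """Vertices (r_i cos theta_(i), r_i sin theta_(i)) in anticlockwise order, where the
    theta_(i) are the sorted values of an i.i.d. Uniform[0, 2 pi) sample."""
    thetas = sorted(rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_vertices))
    radii = [rng.uniform(r_min, r_max) for _ in range(n_vertices)]
    return [(r * math.cos(t), r * math.sin(t)) for r, t in zip(radii, thetas)]


def inside(pt, poly):
    """Even-odd (ray-casting) point-in-polygon test."""
    x, y = pt
    result = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):              # edge straddles the horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def sample_interior(poly, n_points, rng):
    """Uniform points in the polygon by rejection from its bounding box."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    pts = []
    while len(pts) < n_points:
        pt = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if inside(pt, poly):
            pts.append(pt)
    return pts


rng = random.Random(0)
poly = random_polygon(5, 3.0, 5.0, rng)     # r_min = 3, r_max = 5: illustrative values
pts = sample_interior(poly, 500, rng)
```

The same `inside` test can label the points of the bounding-rectangle grid as “inside” or “outside” for the cross-validation step described next.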
[Figure 6 panels: (a) Number of Vertices = 5, (b) Number of Vertices = 25]
However, since we can easily determine whether a point lies in the interior of a polygon, we can also use cross validation to determine a good bandwidth value. To do so, we found the bounding rectangle of each of the polygons and divided it into a grid. We then labeled each point on this grid as an “inside” or an “outside” point. We then fit SVDD on the sampled data, scored the points on this grid for different values of $s$, and chose the value of $s$ that maximized the $F_1$-measure.
The performance of a bandwidth selection criterion can be measured by the $F_1$-measure ratio, which is defined as $F_1^{\text{crit}} / F_1^{\max}$, where
$F_1^{\text{crit}}$ is the $F_1$-measure that is obtained when the value of $s$ suggested by the criterion (for example, the trace criterion) is used and
$F_1^{\max}$ is the best possible value of the $F_1$-measure over all values of $s$. A value close to 1 indicates
that a bandwidth selection criterion is competitive with cross validation. We obtain one value of this ratio for each random polygon, and hence several values for each vertex size.
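The $F_1$-measure and the ratio can be computed as in the Python sketch below. Treating “inside” as the positive class is an assumption (the text does not state which class is positive), approximating “all values of $s$” by a finite list of bandwidths is also an assumption, and the function names are hypothetical.

```python
def f1_measure(y_true, y_pred, positive=True):
    """F1 = 2 TP / (2 TP + FP + FN) for the chosen positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2.0 * tp / (2.0 * tp + fp + fn) if tp else 0.0


def f1_ratio(f1_at_criterion, f1_over_bandwidths):
    """F1 at the criterion's bandwidth divided by the best F1 over a bandwidth grid."""
    return f1_at_criterion / max(f1_over_bandwidths)


y_true = [True, True, True, False, False]   # True = inside the polygon
y_pred = [True, True, False, True, False]   # True = scored as an SVDD inlier
print(f1_measure(y_true, y_pred))           # 0.6666666666666666
```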
The box-and-whiskers plot in Figure 7 summarizes the simulation study results for the modified mean and trace criteria. The X-axis shows the number of vertices of the polygon, and the Y-axis shows the $F_1$-measure ratio. The bottom and the top of each box show the first and third quartile values. The ends of the whiskers represent the minimum and maximum values of the $F_1$-measure ratio. The plot shows that the $F_1$-measure ratio is greater than 0.9 across all values of the number of vertices. Because the complexity of the polygon increases as the number of vertices increases, we observed that the spread of the $F_1$-measure ratio increased slightly. The fact that the $F_1$-measure ratio is always greater than 0.9 provides evidence that both the trace criterion and the modified mean criterion generalize across different training data sets.
VII-B Evaluation Using Hyperspheres
In this section, we evaluate the trace criterion by using spherical data of varying dimensions. The observations in such spherical data (or hypersphere) are uniformly distributed. We use scoring to evaluate the quality of the data description that is obtained using the trace criterion bandwidth value. The scoring data set consists of 50% inlier observations, which are uniformly distributed inside the training sphere, and 50% outlier observations, which are uniformly distributed outside the sphere. The points outside the sphere lie in a narrow annular ring just outside the sphere. Figure 8 illustrates two variables in the training and scoring data. The rationale behind creating such a scoring data set is that if the bandwidth value is good, then the data description that is obtained using that value should be able to discriminate between observations that are inside and observations that are just outside the sphere. We varied the hypersphere dimension from 5 to 40 in increments of 5. For each dimension, 50 pairs of training and scoring data sets were simulated. We computed the $F_1$-measure for each data set to determine the quality of the data description. Figure 9 shows a box-and-whiskers plot of the $F_1$-measure for various values of the data dimension. The $F_1$-measure decreases as the number of variables (the data dimension) increases from 5 to 40. Although the $F_1$-measure obtained with the trace criterion is consistently above 0.9 for all simulated data sets across different dimensions, the $F_1$-measure dropped rapidly with increasing hypersphere dimension for the modified mean criterion. This observation indicates that a bandwidth value obtained using the trace criterion provides a much better-quality data description than the modified mean criterion does.
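One way to simulate such data, sketched below in Python, draws a uniformly random direction from a normalized Gaussian vector and a radius $r$ with $r^d$ uniform between $r_{\text{in}}^d$ and $r_{\text{out}}^d$, which makes the points uniform in volume. The unit radius, the ring width of 0.1, the dimension, and the helper names are illustrative assumptions; the text does not give the width of the ring.

```python
import math
import random


def unit_direction(d, rng):
    """Uniformly random direction on the unit sphere (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_shell(n, d, r_in, r_out, rng):
    """n points uniform in volume on {r_in <= ||x|| <= r_out} in R^d;
    r_in = 0 gives a solid ball (the training hypersphere)."""
    pts = []
    for _ in range(n):
        u = rng.random()
        r = (r_in ** d + u * (r_out ** d - r_in ** d)) ** (1.0 / d)
        pts.append([r * x for x in unit_direction(d, rng)])
    return pts


rng = random.Random(1)
d = 10
train = sample_shell(5000, d, 0.0, 1.0, rng)      # training sphere
score = (sample_shell(5000, d, 0.0, 1.0, rng)     # 50% inliers
         + sample_shell(5000, d, 1.0, 1.1, rng))  # 50% outliers in a thin ring
labels = [True] * 5000 + [False] * 5000           # True = inside the sphere
```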
[Figure 8 panels: (a) Training Data, 5,000 observations; (b) Scoring Data, 10,000 observations]
VII-C Evaluation Using Hypercubes
In this section, we evaluate the trace criterion using cube-shaped data of varying dimensions. The observations in such cubic data (or hypercube) are uniformly distributed. We used scoring to evaluate the quality of the data description that was obtained using the trace criterion bandwidth value. The scoring data set consists of 50% inlier observations, which are uniformly distributed inside the training cube, and 50% outlier observations, which are uniformly distributed outside the cube. The points outside the cube lie in a narrow frame just outside the cube. Figure 10 illustrates two variables in the training and scoring data. The rationale behind creating such a scoring data set is that if the bandwidth value is good, then the data description that is obtained using that value should be able to discriminate between observations that are inside and observations that are just outside the cube. We varied the hypercube dimension from 5 to 40 in increments of 5. For each dimension, 50 pairs of training and scoring data sets were simulated. We computed the $F_1$-measure for each data set to determine the quality of the data description. Figure 11 shows a box-and-whiskers plot of the $F_1$-measure for various values of the data dimension. The $F_1$-measure decreases as the number of variables (the data dimension) increases from 5 to 40. Although the $F_1$-measure obtained with the trace criterion is consistently above 0.7 for all simulated data sets across different dimensions, the $F_1$-measure dropped rapidly with increasing hypercube dimension for the modified mean criterion. This observation indicates that a bandwidth value obtained using the trace criterion provides a much better-quality data description than the modified mean criterion does.
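A matching Python sketch for the cube: inliers are uniform in $[-1, 1]^d$, and outliers are drawn uniformly from a slightly larger cube and kept only if they fall outside $[-1, 1]^d$ (rejection sampling). The half-width, the frame width of 0.1, the dimension, and the helper names are assumptions for illustration.

```python
import random


def sample_cube(n, d, half, rng):
    """n points uniform in the hypercube [-half, half]^d."""
    return [[rng.uniform(-half, half) for _ in range(d)] for _ in range(n)]


def sample_frame(n, d, half, width, rng):
    """n points uniform in [-(half + width), half + width]^d outside [-half, half]^d."""
    pts = []
    while len(pts) < n:
        x = [rng.uniform(-half - width, half + width) for _ in range(d)]
        if max(abs(c) for c in x) > half:     # reject points inside the training cube
            pts.append(x)
    return pts


rng = random.Random(2)
d = 10
train = sample_cube(5000, d, 1.0, rng)
score = sample_cube(5000, d, 1.0, rng) + sample_frame(5000, d, 1.0, 0.1, rng)
labels = [True] * 5000 + [False] * 5000           # True = inside the cube
```

The acceptance rate of the rejection step is $1 - (1/1.1)^d$, which rises with the dimension, so the frame is cheap to sample in the high-dimensional settings studied here.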
[Figure 10 panels: (a) Training Data, 5,000 observations; (b) Scoring Data, 10,000 observations]
VII-D Evaluation Using High-Dimensional Disconnected Data
In this section, we use high-dimensional disconnected data to evaluate the performance of the trace criterion. The training data consist of two or more disjoint hyperspheres or hypercubes of the kind used earlier in Sections VII-B and VII-C. The details of the simulation study are presented below.
VII-D1 Evaluation Using Multiple Hyperspheres
We evaluated the trace and modified mean criteria by using data sets that contain multiple spheres. The number of spheres in a data set was varied between 2 and 12, and the data dimension was varied between 5 and 40 in increments of 5. For each combination of data dimension and number of spheres, we used different seed values to generate 10 different pairs of training and scoring data sets. Figure 12 illustrates sample training and scoring data sets that have five spheres and use two-dimensional data.
[Figure 12 panels: (a) Training Data, (b) Scoring Data]
For each simulation run, we computed the $F_1$-measure. Panels (a) to (k) of the hypersphere box-and-whiskers figure show the $F_1$-measure for simulation results that were obtained using different numbers of hyperspheres; each panel provides the $F_1$-measure for different values of the data dimension. Panels (a) to (c) show that when the number of hyperspheres is between 2 and 4, both the modified mean and the trace criteria provide values above 0.9, with the modified mean criterion performing slightly better than the trace criterion. But as the number of hyperspheres increases from 5 to 12, panels (d) to (k) indicate that the performance of the modified mean criterion drops compared to that of the trace criterion. The trace criterion consistently provides an $F_1$-measure higher than 0.9, whereas the $F_1$-measure obtained using the modified mean criterion is in the range of 0.7 to 0.8, depending on the number of hyperspheres.
[Figure panels (a) to (k): box-and-whiskers plots of the $F_1$-measure for 2, 3, ..., 12 hyperspheres]
VII-D2 Evaluation Using Multiple Hypercubes
We evaluated the trace and modified mean criteria by using data sets that contain multiple cubes. The number of cubes in a data set was varied between 2 and 12, and the data dimension was varied between 5 and 40 in increments of 5. For each combination of data dimension and number of cubes, we used different seed values to generate 10 different pairs of training and scoring data sets. Figure 13 illustrates sample training and scoring data sets that have five cubes and use two-dimensional data.
[Figure 13 panels: (a) Training Data, (b) Scoring Data]
For each simulation run, we computed the $F_1$-measure. Panels (a) to (k) of the hypercube box-and-whiskers figure show the $F_1$-measure for simulation results that were obtained using different numbers of hypercubes; each panel provides the $F_1$-measure for different values of the data dimension. Panels (a) to (c) show that when the number of hypercubes is between 2 and 4, both the modified mean and the trace criteria provide values above 0.9, with the modified mean criterion performing slightly better than the trace criterion. But as the number of hypercubes increases from 5 to 12, panels (d) to (k) indicate that the performance of the modified mean criterion drops compared to that of the trace criterion. The trace criterion consistently provides an $F_1$-measure higher than 0.9, whereas the $F_1$-measure obtained using the modified mean criterion is in the range of 0.7 to 0.8, depending on the number of hypercubes.
[Figure panels (a) to (k): box-and-whiskers plots of the $F_1$-measure for 2, 3, ..., 12 hypercubes]
The trace criterion for computing the bandwidth value of a Gaussian kernel for SVDD, as proposed in this paper, exploits the low-rank representation of the kernel matrix to suggest a bandwidth value. Several evaluations that use synthetic and real-life data sets indicate that the bandwidth value obtained using the trace criterion provides similar or better results compared to existing methods. In particular, the trace criterion provides good bandwidth values when the data are high-dimensional and disjoint.
The authors would like to thank Anne Baxter, Principal Technical Editor at SAS, for her assistance in creating this manuscript.
-  Aggarwal, C. C. Data mining: the textbook. Springer, 2015.
-  Camci, F., and Chinnam, R. B. General support vector representation machine for one-class classification of non-stationary classes. Pattern Recognition 41, 10 (2008), 3021–3034.
-  Heckert, N. A., and Filliben, J. J. NIST handbook 148: DATAPLOT reference manual, Volume I: Commands. http://www.itl.nist.gov/div898/software/dataplot/, 2000. [Online; accessed 4-August-2017].
-  Kakde, D., Chaudhuri, A., Kong, S., Jahja, M., Jiang, H., and Silva, J. Peak Criterion for Choosing Gaussian Kernel Bandwidth in Support Vector Data Description. In 2017 IEEE International Conference on Prognostics and Health Management (ICPHM) (PHM2017) (2017). [Online preprint.] Available: https://arxiv.org/abs/1602.05257.
-  Liao, Y., Kakde, D., Chaudhuri, A., Jiang, H., Sadek, C., and Kong, S. A new bandwidth selection criterion for using SVDD to analyze hyperspectral data. In Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XXIV (2018), vol. 10644, International Society for Optics and Photonics, p. 106441M. [Online preprint.] Available: https://arxiv.org/abs/1803.03328.
-  Magnus, J. R., and Neudecker, H. Matrix differential calculus with applications in statistics and econometrics. Wiley series in probability and mathematical statistics (1988).
-  Peredriy, S., Kakde, D., and Chaudhuri, A. Kernel bandwidth selection for SVDD: The sampling peak criterion method for large data. In 2017 IEEE International Conference on Big Data (Big Data) (Dec 2017), pp. 3540–3549.
-  Sanchez-Hernandez, C., Boyd, D. S., and Foody, G. M. One-class classification for mapping a specific land-cover class: SVDD classification of fenland. Geoscience and Remote Sensing, IEEE Transactions on 45, 4 (2007), 1061–1073.
-  Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., and Platt, J. C. Support vector method for novelty detection. In Advances in neural information processing systems (2000), pp. 582–588.
-  Sukchotrat, T., Kim, S. B., and Tsung, F. One-class classification-based control charts for multivariate process monitoring. IIE transactions 42, 2 (2009), 107–120.
-  Tax, D. M., and Duin, R. P. Support vector data description. Machine learning 54, 1 (2004), 45–66.
-  Widodo, A., and Yang, B.-S. Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing 21, 6 (2007), 2560–2574.
-  Williams, C. K. I., and Seeger, M. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. MIT Press, 2001, pp. 682–688.
-  Ypma, A., Tax, D. M., and Duin, R. P. Robust machine fault detection with independent component analysis and support vector data description. In Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE Signal Processing Society Workshop. (1999), IEEE, pp. 67–76.
-  Zhang, K., Tsang, I. W., and Kwok, J. T. Improved Nyström low-rank approximation and error analysis. In Proceedings of the 25th international conference on Machine learning (2008), ACM, pp. 1232–1239.