1 Introduction
Hyperspectral remote sensing has been an active research area for the past two decades [Bioucas2013]. A wide range of research has been conducted to extract useful information from hyperspectral imaging data, which are collected from airborne or spaceborne sensors. Hyperspectral imaging data have applications in areas such as resource management, agriculture, astronomy, mineral exploration, food inspection, and environmental monitoring [Bioucas2013, Mehl2004, Feng2012, Elmasry2012, Dale2013, Wang2007, Kruse2003].
Identifying the contents of each pixel in 3D hyperspectral imaging data is a challenging problem, and various classification techniques have been studied and applied to hyperspectral data [Healley1999, Walter2003]. Support vector machines (SVMs) are popular classifiers because they are robust when training data samples are limited. Labeling hyperspectral data is difficult because of the complexity and heterogeneity of the geographic areas that are covered by the sensors, so the number of available labeled data samples is always limited. This limitation makes SVMs attractive in the field of hyperspectral imaging data processing [Mari2007, Mari2010, Khazai2012, Dai2015]. Researchers have shown that a one-class SVM classifier can perform better than a multiclass SVM classifier if only one class is of interest in a multiclass problem [Mari2007]. One of the well-known algorithms for one-class classification is support vector data description (SVDD), which was first introduced in Ref. [tax2004support]. It can be shown that the SVDD formulation is equivalent to one-class SVM classification under certain conditions [institute2017sas]. SVDD is used in domains where the majority of the data belong to a single class or where one of the classes is significantly undersampled. The SVDD algorithm builds a flexible boundary around the target class data; this data boundary is characterized by observations that are designated as support vectors. Applications of SVDD include machine condition monitoring [widodo2007support, ypma1999robust], image classification [sanchez2007one], and multivariate process control [sukchotrat2009one, kakde2017non]. SVDD has the advantage that no assumptions about the distribution of the data need to be made. The technique can describe the shape of the target class without prior knowledge of the specific data distribution, and observations that fall outside the data boundary are flagged as potential outliers.
2 Mathematical Formulation of SVDD
Normal Data Description
The most elemental form of SVDD is a normal data description. The SVDD model for normal data description builds a minimum-radius hypersphere, characterized by its center $a$ and radius $R$, around the data. The SVDD model minimizes the volume of the sphere by minimizing $R^2$ and requires that the sphere contain all the training data [tax2004support]. The SVDD formulation can be expressed in either of the following forms:
Primal Form
Objective function:
(1) \min_{R,\, a,\, \xi}\; R^2 + C \sum_{i=1}^{n} \xi_i
subject to:
(2) \lVert x_i - a \rVert^2 \le R^2 + \xi_i, \quad i = 1, \dots, n
(3) \xi_i \ge 0, \quad i = 1, \dots, n
where:
$x_i \in \mathbb{R}^m$, $i = 1, \dots, n$, represents the training data,
$R$ is the radius and represents the decision variable,
$\xi_i$ is the slack for each variable,
$a$ is the center,
$C = \frac{1}{nf}$ is the penalty constant that controls the trade-off between the volume and the errors, and
$f$ is the expected outlier fraction.
Dual Form
The dual form is obtained using the Lagrange multipliers.
Objective function:
(4) \max_{\alpha}\; \sum_{i=1}^{n} \alpha_i (x_i \cdot x_i) - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j (x_i \cdot x_j)
subject to:
(5) \sum_{i=1}^{n} \alpha_i = 1
(6) 0 \le \alpha_i \le C, \quad i = 1, \dots, n
where:
$\alpha_i$ are the Lagrange constants and
$C = \frac{1}{nf}$ is the penalty constant.
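As a brief sketch of how the dual arises (this is the standard derivation; $\gamma_i$ below are the multipliers for the slack constraints $\xi_i \ge 0$, and the primal is the usual SVDD problem of minimizing $R^2 + C\sum_i \xi_i$ subject to $\lVert x_i - a \rVert^2 \le R^2 + \xi_i$):

```latex
% Lagrangian of the primal problem:
L(R, a, \xi; \alpha, \gamma) = R^2 + C\sum_i \xi_i
  - \sum_i \alpha_i \left[ R^2 + \xi_i - \lVert x_i - a \rVert^2 \right]
  - \sum_i \gamma_i \xi_i, \qquad \alpha_i, \gamma_i \ge 0

% Stationarity conditions:
\frac{\partial L}{\partial R} = 0 \;\Rightarrow\; \sum_i \alpha_i = 1
\frac{\partial L}{\partial a} = 0 \;\Rightarrow\; a = \sum_i \alpha_i x_i
\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \gamma_i = 0
  \;\Rightarrow\; 0 \le \alpha_i \le C

% Substituting these back into L yields the dual objective:
L = \sum_i \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j)
```

The stationarity conditions are also the source of the duality results in the next subsection.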
Duality Information
Depending upon the position of an observation $x_i$, the following results hold:
Center position:
(7) a = \sum_{i=1}^{n} \alpha_i x_i
Inside position:
(8) \lVert x_i - a \rVert^2 < R^2 \;\Rightarrow\; \alpha_i = 0
Boundary position:
(9) \lVert x_i - a \rVert^2 = R^2 \;\Rightarrow\; 0 < \alpha_i < C
Outside position:
(10) \lVert x_i - a \rVert^2 > R^2 \;\Rightarrow\; \alpha_i = C
The spherical data boundary can include a significant amount of space in which training observations are very sparsely distributed. Scoring with a model that has a spherical data boundary can increase the probability of false positives. Hence, instead of a spherical shape, a compact bounded outline around the data is often desired. Such an outline should approximate the shape of the single-class training data, which is possible with the use of kernel functions.
Flexible Data Description
The support vector data description is made flexible by replacing the inner product $(x_i \cdot x_j)$ with a suitable kernel function $K(x_i, x_j)$. This paper uses a Gaussian kernel function, which is defined as
(11) K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2 s^2} \right)
where $s$ is the Gaussian bandwidth parameter.
Results (7) through (10) hold when the kernel function is used in the mathematical formulation.
The threshold $R^2$ is calculated as
(12) R^2 = K(x_k, x_k) - 2 \sum_{i=1}^{n} \alpha_i K(x_i, x_k) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j)
using any $x_k \in SV_{<C}$, where $SV_{<C}$ is the set of support vectors for which $\alpha_k < C$.
Scoring
For each observation $z$ in the scoring data set, the distance $\operatorname{dist}^2(z)$ is calculated as follows:
(13) \operatorname{dist}^2(z) = K(z, z) - 2 \sum_{i=1}^{n} \alpha_i K(x_i, z) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j)
Observations in the scoring data set for which $\operatorname{dist}^2(z) > R^2$ are designated as outliers.
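The training and scoring steps above can be sketched in code. This is a minimal illustration, not a production implementation: it solves the dual with SciPy's general-purpose SLSQP solver rather than a dedicated QP solver, and the function names are this sketch's own.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, s):
    # K(x, y) = exp(-||x - y||^2 / (2 s^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s ** 2))

def svdd_train(X, s=1.0, f=0.05):
    """Solve the SVDD dual: max sum_i a_i K_ii - sum_ij a_i a_j K_ij,
    subject to sum_i a_i = 1 and 0 <= a_i <= C, with C = 1/(n f)."""
    n = len(X)
    C = 1.0 / (n * f)
    K = gaussian_kernel(X, X, s)

    def neg_dual(a):
        return -(a @ np.diag(K) - a @ K @ a)

    cons = ({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},)
    res = minimize(neg_dual, np.full(n, 1.0 / n), method='SLSQP',
                   bounds=[(0.0, C)] * n, constraints=cons)
    alpha = res.x
    # Threshold R^2 from any boundary support vector (0 < alpha_k < C)
    sv = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0]
    k = sv[0] if len(sv) else int(np.argmax(alpha))
    r2 = K[k, k] - 2 * alpha @ K[:, k] + alpha @ K @ alpha
    return alpha, r2

def svdd_score(Xtrain, alpha, r2, Z, s=1.0):
    """dist^2(z) for each scoring point z; flag z as outlier if dist^2 > R^2."""
    Kzz = 1.0  # for the Gaussian kernel, K(z, z) = 1
    Kxz = gaussian_kernel(Xtrain, Z, s)
    Kxx = gaussian_kernel(Xtrain, Xtrain, s)
    d2 = Kzz - 2 * alpha @ Kxz + alpha @ Kxx @ alpha
    return d2, d2 > r2
```

A point far from the training cloud has near-zero kernel values against all training points, so its distance exceeds the threshold and it is flagged.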
3 SVDD Bandwidth Selection
Using a kernel function in the SVDD formulation, as outlined in Section 2, is desirable for obtaining a flexible boundary around the training data set. Such a boundary adheres to the essential geometric features of the data and minimizes the misclassification rate. The Gaussian kernel function is the most popular kernel function in SVDD and SVM. The Gaussian kernel function defined in Eq. (11) has one tuning parameter, the bandwidth parameter $s$, which needs to be set before an SVDD model is trained. This section outlines the importance of selecting a good bandwidth value and introduces methods to select such a value.
3.1 Importance of Bandwidth Selection
The flexible data description is preferred when a data boundary needs to closely follow the shape of the data. The tightness of the boundary is a function of the number of support vectors. For a Gaussian kernel, it is observed that if the value of the outlier fraction $f$ is kept constant, the number of support vectors that are identified by the SVDD algorithm is a function of the Gaussian bandwidth $s$. At a very low value of $s$, the number of support vectors is large and approaches the number of observations. As the value of $s$ increases, the number of support vectors is reduced. It is also observed that the data boundary is extremely wiggly at lower values of $s$. As $s$ increases, the data boundary becomes less wiggly and starts to follow the shape of the data.
Because SVDD is an unsupervised technique, cross-validation cannot be used to determine an appropriate value of $s$. There are several methods for setting an appropriate kernel bandwidth value. Some of the unsupervised methods include the VAR criterion method [Khazai2012], the mean criterion method [chau8215749], the peak criterion method [kakde2017peak, pered8258344], the method of coefficient of variation (CV) [evangelista2007some], the method of maximum distance (MD) [khazai2011anomaly], and the method of distance to the farthest neighbor (DFN) [xiao2014two]. It has been shown on simulated data that the peak criterion method achieves better classification performance than the MD, CV, and DFN methods [kakde2017peak]. The following sections provide more information about the VAR, mean, and peak criterion methods.
3.2 VAR Criterion Method
Khazai et al. have proposed a simple SVDD kernel bandwidth selection criterion for hyperspectral data processing: the square root of the sum of the variances of all data variables [Khazai2012]. Given $p$ variables, the selected kernel bandwidth is defined as
(14) s = \sqrt{\sum_{k=1}^{p} \sigma_k^2}
where $\sigma_k^2$ is the variance of the $k$th variable of the data.
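The VAR criterion is a one-line computation. A minimal sketch (the function name is illustrative; the source does not specify sample versus population variance, so the sample variance is assumed):

```python
import numpy as np

def var_criterion_bandwidth(X):
    # VAR criterion [Khazai2012]: s = sqrt(sum of per-variable variances).
    # X has shape (n_observations, p_variables); ddof=1 gives sample variance.
    return np.sqrt(X.var(axis=0, ddof=1).sum())
```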
3.3 Mean Criterion Method
The mean criterion [chau8215749] also provides a closed-form expression for the bandwidth value $s$. The mean criterion method uses the fact that when the bandwidth value $s$ is close to 0 ($s \to 0$), the kernel function that uses any two observations $x_i$ and $x_j$ evaluates to 0 when $i \ne j$ or to 1 when $i = j$. Therefore, when $s$ is close to 0, if the training data set contains $N$ observations, then the kernel matrix of $N \times N$ entries is an identity matrix. Hence, any selected bandwidth value should be large enough to distinguish the kernel matrix from the identity matrix. The mean criterion provides the value of $s$ as
(15) s = \sqrt{\frac{2 N \sum_{k=1}^{p} \hat{\sigma}_k^2}{(N-1)\,\ln\!\left[(N-1)/\delta^2\right]}}
where $N$ is the number of training samples, $p$ is the number of dimensions of the training data, $\hat{\sigma}_k^2$ is the data variance in dimension $k$, and $\delta$ is a tolerance factor that indicates distance from the identity matrix. Larger values of $\delta$ ensure greater distance from the identity matrix.
The mean criterion method is implemented in the SVDD procedure in SAS Visual Data Mining and Machine Learning [institute2017sas].
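The mean criterion also reduces to a few lines of code. The sketch below assumes the closed-form expression $s = \sqrt{2N \sum_k \hat{\sigma}_k^2 / [(N-1)\ln((N-1)/\delta^2)]}$; the function name and the default tolerance value are illustrative, not taken from the SAS implementation.

```python
import numpy as np

def mean_criterion_bandwidth(X, delta=0.1):
    # Mean criterion [chau8215749]:
    # s^2 = 2 N sum_k var_k / ((N - 1) * ln((N - 1) / delta^2))
    n = len(X)
    total_var = X.var(axis=0, ddof=1).sum()
    return np.sqrt(2 * n * total_var / ((n - 1) * np.log((n - 1) / delta ** 2)))
```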
3.4 Peak Criterion Method
The peak criterion method [kakde2017peak, pered8258344] requires first solving an SVDD training problem by using different values of the bandwidth $s$. It recommends the value of $s$ at which the second derivative of the optimal dual objective function value with respect to $s$ first reaches 0. The experimental results presented in Refs. [kakde2017peak, pered8258344] indicate that the peak criterion provides a good bandwidth value for obtaining the training data description.
3.5 Modified Mean Criterion Method
Using the peak criterion method to select the kernel bandwidth, SVDD usually obtains a good data boundary that closely follows the training data shape [kakde2017peak, pered8258344]. But the disadvantage of the peak criterion method is that it takes a long time to obtain the desired kernel bandwidth, because it has to generate the objective function curve by varying the kernel bandwidth, usually a couple of hundred times for a smooth curve.
This paper proposes a new automatic, unsupervised Gaussian kernel bandwidth selection approach, which performs nearly as well as the peak criterion method while being as time-efficient as the mean criterion method.
For the kernel bandwidth $s$ of the mean criterion method (defined in Eq. (15)) and a specific data set, the variance of the data and the number of training samples $N$ are fixed. So $s$ can be rewritten as a function of $\delta$,
(16) s = \sqrt{\sum_{k=1}^{p} \hat{\sigma}_k^2}\; f(N, \delta)
where $f(N, \delta)$ is a function of the number of observations $N$ in the training data set and the tolerance factor $\delta$, and is expressed as:
(17) f(N, \delta) = \sqrt{\frac{2N}{(N-1)\,\ln\!\left[(N-1)/\delta^2\right]}}
For a training data set that has a fixed $N$, differentiating $f$ with respect to $\delta$ results in the following:
(18) \frac{\partial f(N, \delta)}{\partial \delta} = \sqrt{\frac{2N}{N-1}}\; \frac{1}{\delta \left(\ln\!\left[(N-1)/\delta^2\right]\right)^{3/2}}
For this paper, experiments were conducted on several data sets that have different numbers of variables $p$ and different numbers of observations $N$. The experiments revealed that the kernel bandwidth value that provides good classification performance usually occurs when $\partial f(N, \delta)/\partial \delta$ is close to 1. This observation is formalized into the following criterion to select a kernel bandwidth for SVDD:
(19) \frac{\partial f(N, \delta)}{\partial \delta} = 1
This criterion is equivalent to the following:
(20) \delta = \sqrt{\frac{2N}{N-1}}\; \frac{1}{\left(\ln\!\left[(N-1)/\delta^2\right]\right)^{3/2}}
Obtaining the desired kernel bandwidth with the new selection criterion involves three steps:

1. Use fixed-point iteration [Burden1985] to obtain the value of $\delta$ for a fixed value of $N$ by setting
(22) \delta_{k+1} = \sqrt{\frac{2N}{N-1}}\; \frac{1}{\left(\ln\!\left[(N-1)/\delta_k^2\right]\right)^{3/2}}

2. Repeat step 1 for different values of $N$, where $N$ is the number of observations in the training data set. For a majority of $N$ values, convergence was obtained in three to four iterations. Empirically, it is observed that the value of $\delta$ is approximately polynomial in $\ln N$, with a mean squared error of 7.02E-11, and can be expressed as
(23) \delta(N) \approx \sum_{i=0}^{d} c_i (\ln N)^i
where the coefficients $c_i$ are obtained by a polynomial fit. Figure 1 shows the relationship between $N$ and $\delta$. Figure 2 shows the relationship between $\ln N$ and $\delta$. For a given data set in which the number of observations $N$ is known, the corresponding $\delta$ can be obtained easily by using the fitted curve.

3. After the value of $\delta$ for a particular value of $N$ is obtained, compute the kernel bandwidth as follows:
s = \sqrt{\frac{2 N \sum_{k=1}^{p} \hat{\sigma}_k^2}{(N-1)\,\ln\!\left[(N-1)/\delta^2\right]}}
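The fixed-point step can be sketched in code. This is a sketch under the assumption that the mean-criterion bandwidth factors as $s = \sqrt{\sum_k \hat{\sigma}_k^2}\, f(N, \delta)$ with $f(N, \delta) = \sqrt{2N/[(N-1)\ln((N-1)/\delta^2)]}$ and that the selection criterion sets $\partial f/\partial \delta = 1$; the function names and the initial value of $\delta$ are illustrative.

```python
import numpy as np

def solve_delta(n, delta0=0.5, tol=1e-8, max_iter=100):
    # Fixed-point iteration delta_{k+1} = g(delta_k), where setting
    # df/d(delta) = 1 gives g(delta) = sqrt(2n/(n-1)) / ln((n-1)/delta^2)^(3/2).
    delta = delta0
    for _ in range(max_iter):
        new = np.sqrt(2 * n / (n - 1)) / np.log((n - 1) / delta ** 2) ** 1.5
        if abs(new - delta) < tol:
            return new
        delta = new
    return delta

def modified_mean_bandwidth(X):
    # Modified mean criterion: plug the solved delta into the
    # mean-criterion bandwidth formula.
    n = len(X)
    delta = solve_delta(n)
    total_var = X.var(axis=0, ddof=1).sum()
    return np.sqrt(2 * n * total_var / ((n - 1) * np.log((n - 1) / delta ** 2)))
```

Because the iteration map is a contraction near the fixed point, convergence is fast, which is what makes the method as cheap as the closed-form criteria.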
4 Data Experiments
4.1 Data Description
To evaluate the performance of the new kernel bandwidth selection method, the SVDD classifier was applied to three commonly used hyperspectral data sets: Botswana, Kennedy Space Center (KSC), and Indian Pines [PURR1947]. Table 1 summarizes the main characteristics of these data sets. Table 2 lists all the classes in each data set and the number of ground-truthed samples available for training and testing.
Data Set  Botswana  KSC  Indian Pines 

Sensor Type  Hyperion  AVIRIS  AVIRIS 
Spatial Resolution  30 m  18 m  20 m 
Image Size  1476×256  512×614  145×145 
# of Spectral Bands  145  176  200 
# of Classes  14  13  16 
Botswana  KSC  Indian Pines  

Class #  Class Name  # of Samples  Class Name  # of Samples  Class Name  # of Samples 
1  Water  270  Scrub  761  Alfalfa  46 
2  Hippo Grass  101  Willow swamp  243  Corn-notill  1428 
3  Floodplain grasses 1  251  Cabbage palm hammock  256  Corn-mintill  830 
4  Floodplain grasses 2  215  Cabbage palm / oak hammock  252  Corn  237 
5  Reeds  269  Slash pine  161  Grass-pasture  483 
6  Riparian  269  Oak/broadleaf hammock  229  Grass-trees  730 
7  Firescar  259  Hardwood swamp  105  Grass-pasture-mowed  28 
8  Island interior  203  Graminoid marsh  431  Hay-windrowed  478 
9  Acacia woodlands  314  Spartina marsh  520  Oats  20 
10  Acacia shrublands  248  Cattail marsh  404  Soybean-notill  972 
11  Acacia grasslands  305  Salt marsh  419  Soybean-mintill  2455 
12  Short mopane  181  Mud flats  503  Soybean-clean  593 
13  Mixed mopane  268  Water  927  Wheat  205 
14  Exposed soils  95  Woods  1265  
15  Buildings-grass-trees-drives  386  
16  Stone-steel-towers  93 
4.2 Evaluation Process
The evaluation process consists of three steps: data training, data testing, and performance evaluation. The following data preprocessing steps were required before the SVDD approach was applied:

A special preprocessing step was applied to the KSC data set. Some pixels have saturated values at certain spectral bands; that is, some data values are greater than 65,500 whereas the normal data range is [0, 1244]. These saturated data values were corrected by substituting 0 for them.

Each data set was normalized by the maximum data value in the set, making the data range always [0, 1] [Khazai2012].
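The two preprocessing steps above can be sketched as follows. The saturation threshold of 65,500 is taken from the description of the KSC data; the function name is illustrative.

```python
import numpy as np

def preprocess(cube, saturation=65500):
    # 1. Replace saturated readings (values above the threshold) with 0.
    # 2. Scale by the global maximum so that values fall in [0, 1].
    x = cube.astype(float).copy()
    x[x > saturation] = 0.0
    return x / x.max()
```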
4.2.1 Training and Testing
SVDD is a one-class classifier. In order to solve the multiclass classification problem for hyperspectral data, the same fusion scheme as in Ref. [Khazai2012] was used. For each class, an SVDD classifier was trained by using 30% of the available samples, randomly selected. The remaining 70% were reserved for testing. Assuming that there are $M$ classes, each test sample $z$ is evaluated against each trained class $c$ to obtain its distance $d_c$ to the class's hypersphere center, where $c = 1, \dots, M$. A class label is assigned to the test sample on the basis of the following fusion rule [Khazai2012]:

If $z$ is within the hypersphere radius of only one class, then the label of this class is assigned to the test sample.

If $z$ is within the hypersphere radius of more than one class or of no class, the class to be assigned is decided by the following criterion, where $R_c$ is the radius of the hypersphere for class $c$:
(24) c^* = \arg\min_{c = 1, \dots, M} \frac{d_c}{R_c}
The preceding decision rule is illustrated in Fig. 3. In this two-class classification example, a test sample $z$'s distance to Class $i$'s hypersphere center, $d_i$, is the same as its distance to Class $j$'s hypersphere center, $d_j$. Because $R_i$ is less than $R_j$, the relative distance $d_i / R_i$ is greater than $d_j / R_j$, so the test sample is labeled as Class $j$.
4.2.2 Evaluation
The classification performance was evaluated on four different SVDD kernel bandwidth selection methods that use the VAR criterion[Khazai2012], the mean criterion[chau8215749], the peak criterion[kakde2017peak, pered8258344], and the new modified mean criterion.
For every data set, the training and testing experiments were carried out five times, each with a different randomly selected subset (30%) for training and the rest (70%) for testing. The classification performance was evaluated using the overall accuracy (OA)[Khazai2012], which is defined as the percentage of pixels that are correctly labeled.
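Overall accuracy is simply the fraction of correctly labeled pixels; a minimal sketch:

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    # OA: fraction of pixels whose predicted label matches the ground truth.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())
```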
4.2.3 Results
Table 3 through Table 6 show the evaluation results for each hyperspectral data set. Exp 1 through Exp 5 represent the individual experiments, and the last row shows the average overall accuracy of the five experiments.
From Table 4 (results on the raw KSC data) and Table 5 (results on the corrected KSC data), you can see that the preprocessing step, which replaces the saturated data values with 0, has significantly improved the data classification performance.
The classification performance results demonstrate that the new modified mean criterion performed consistently better than the VAR and mean criterion methods, and performed comparably to or better than the peak criterion method, on the Botswana, corrected KSC, and Indian Pines data sets. Because the new method has a closed-form formula for the kernel bandwidth, its time efficiency is equivalent to that of the VAR and mean criterion methods. This combination of accuracy and speed makes the new method promising for other hyperspectral image data processing tasks.
Method  VAR  Mean  Peak  Modified Mean 
Exp 1  84.91  80.60  87.42  89.88 
Exp 2  84.87  79.01  86.90  87.02 
Exp 3  85.00  80.91  89.09  88.91 
Exp 4  84.43  81.48  88.87  86.19 
Exp 5  83.55  79.10  85.88  86.05 
Average  84.55  80.22  87.63  87.61 
Method  VAR  Mean  Peak  Modified Mean 
Exp 1  46.12  49.03  49.88  49.36 
Exp 2  35.45  33.78  28.49  33.34 
Exp 3  21.94  36.47  35.56  36.41 
Exp 4  66.03  66.41  54.13  62.22 
Exp 5  58.29  60.52  82.64  60.57 
Average  45.57  49.24  50.14  48.38 
Method  VAR  Mean  Peak  Modified Mean 
Exp 1  66.03  83.58  80.42  85.00 
Exp 2  68.08  83.14  79.35  84.10 
Exp 3  66.03  84.15  79.19  86.04 
Exp 4  72.00  83.91  81.33  85.30 
Exp 5  69.92  80.89  79.52  82.75 
Average  68.41  83.13  79.96  84.64 
Method  VAR  Mean  Peak  Modified Mean 
Exp 1  38.25  54.97  49.27  57.42 
Exp 2  36.47  54.38  50.08  57.87 
Exp 3  41.78  55.35  51.89  57.26 
Exp 4  33.46  53.44  47.00  56.85 
Exp 5  41.17  46.90  42.81  51.36 
Average  38.23  53.01  48.21  56.15 
Of the three hyperspectral test data sets (Botswana, corrected KSC, and Indian Pines), the Indian Pines set has the lowest overall accuracy. The classification performance was further analyzed by computing the accuracy of each class, as shown in Table 7. For classes that contain very few labeled samples (Alfalfa, Grass-pasture-mowed, and Oats), the number of training samples per class was too small to characterize the class, and the trained classifier is not able to identify test samples well. The second type of difficulty arises in classes that are very similar to each other (for example, Corn-mintill and Corn; and Soybean-notill, Soybean-mintill, and Soybean-clean). Given the similar spectral radiance of these materials, misclassification between these classes is significant, which lowers the overall accuracy.
Class #  Class Name  # of Samples  Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Average 

1  Alfalfa  46  6.25  12.50  6.25  6.25  9.38  8.13 
2  Corn-notill  1428  45.50  54.10  46.40  55.50  42.90  48.88 
3  Corn-mintill  830  20.31  37.52  13.77  38.04  14.46  24.82 
4  Corn  237  68.07  53.61  72.89  72.29  68.67  67.11 
5  Grass-pasture  483  78.99  83.43  78.99  82.84  68.34  78.52 
6  Grass-trees  730  72.21  54.99  51.86  50.10  60.47  57.93 
7  Grass-pasture-mowed  28  0  0  30.00  0  10.00  8.00 
8  Hay-windrowed  478  96.72  97.61  97.91  98.21  98.21  97.73 
9  Oats  20  0  0  0  7.14  0  1.43 
10  Soybean-notill  972  32.65  38.38  30.88  27.21  26.76  31.18 
11  Soybean-mintill  2455  46.57  43.42  55.41  41.85  32.89  44.03 
12  Soybean-clean  593  78.31  75.42  76.14  68.67  81.93  76.10 
13  Wheat  205  41.96  14.69  23.78  38.46  23.08  28.39 
14  Woods  1265  94.46  92.77  93.56  93.33  90.17  92.86 
15  Buildings-grass-trees-drives  386  68.52  72.96  69.26  74.81  80.37  73.19 
16  Stone-steel-towers  93  66.15  76.92  72.31  64.62  73.85  70.77 
5 Conclusion
This paper proposes a new automatic, unsupervised Gaussian kernel bandwidth selection method for SVDD and applies it to hyperspectral imaging data classification. This method has a closedform formula for kernel bandwidth calculation. Experiments have shown that the new method outperforms other commonly used SVDD kernel bandwidth selection methods (VAR criterion, mean criterion, and peak criterion) on three benchmark hyperspectral data sets. Experiments with other simulated highdimensional data also show the robustness of this method when the data dimension increases. Research will be extended to apply the new approach on more highdimensional data processing and also to look into the physical interpretation of this method.