On Hyperspectral Classification in the Compressed Domain

08/02/2015 ∙ by Mohammad Aghagolzadeh, et al.

In this paper, we study the problem of hyperspectral pixel classification based on recently proposed architectures for compressive whisk-broom hyperspectral imagers, without the need to reconstruct the complete data cube. A clear advantage of classification in the compressed domain is its suitability for real-time, on-site processing of the sensed data. Moreover, we assume that training also takes place in the compressed domain, thus isolating the classification unit from the recovery unit on the receiver's side. We show that, perhaps surprisingly, using distinct measurement matrices for different pixels yields a more accurate learned classifier and more consistent classification performance, supporting the role of information diversity in learning.


I Introduction

Recently, there has been a surge of interest in compressive architectures for hyperspectral imaging and remote sensing [1]. This is mainly due to the increasing amount of hyperspectral data being collected by high-resolution airborne imagers such as NASA’s AVIRIS (http://aviris.jpl.nasa.gov), and to the fact that a large portion of the data is discarded during compression or during feature mining prior to learning [2]. It has been noted in [3] that many of the proposed compressive architectures are based on spatially mixing pixels across each frame, which corresponds to physically costly or impractical operations, whereas most existing airborne hyperspectral imagers employ scanning methods that acquire one pixel or one line of pixels at a time. To address this issue, practical designs of compressive whisk-broom and push-broom cameras were suggested in [3]. In this work, we tackle the problem of hyperspectral pixel classification based on compressive whisk-broom sensors, i.e., one pixel is measured at a time using an individual random measurement matrix. The presented analysis extends straightforwardly to compressive push-broom cameras.

To set this work apart from existing efforts that have also addressed classification of compressive hyperspectral data, such as [4], we note two issues with the typical indirect approach of applying classification algorithms to the recovered data: (i) the sensed data cannot be decoded on the sender’s side (the airborne device) due to the heavy computational cost of compressive recovery, which makes on-site classification infeasible; (ii) the number of measurements per pixel may not be sufficient for reliable signal recovery. It has been established that classification in the compressed domain can succeed with far fewer random measurements than are required for full data recovery [5]. However, the compressive framework of [5] corresponds to using a fixed projection matrix for all pixels, which limits the measurement diversity that several recent studies have promoted for data recovery and learning [6, 7, 8].

Rather than devising new classification algorithms, this work studies the relationship between the camera’s sensing mechanism, namely the employed random measurement matrix, and the common Support Vector Machine (SVM) classifier. We emphasize that the general problem of classification from compressive measurements has already been addressed for the case where a fixed measurement matrix is used [9, 5]. Our aim, however, is to study the impact of measurement diversity on the learned classifier. In particular, we investigate two different sensing mechanisms that were introduced in [3] (for details on the physical implementation of compressive whisk-broom sensors, we refer the reader to [3], which illustrates conceptual schematics of whisk-broom and push-broom cameras):

  • FCA-based sensor: A Fixed Coded Aperture (FCA) is used to modulate the dispersed light before it is collected at the linear sensor array. This case corresponds to using the same measurement matrix for every pixel and is a low-cost alternative to the DMD system below.

  • DMD-based sensor: A Digital Micromirror Device (DMD) is used to modulate the incoming light according to an arbitrary pattern that is changed for each measurement. Unlike the previous case, the DMD adds the option of sensing each pixel with a different measurement matrix. Both cases are illustrated in Figure 1.

Fig. 1: FCA-based versus DMD-based sensing (panels: complete data, FCA-sensed data, DMD-sensed data). Rows represent pixels and columns represent spectral bands.
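The two sensing modes can be simulated in a few lines. The NumPy sketch below is illustrative only: it assumes measurement matrices with orthonormal rows (the simplifying assumption adopted in Section II-B), whereas physical FCA and DMD patterns are binary or bounded-range masks; `sense_pixels` and `random_orthonormal_rows` are hypothetical helper names.

```python
import numpy as np


def random_orthonormal_rows(M, N, rng):
    """Return an M x N matrix with orthonormal rows (Phi @ Phi.T = I_M)."""
    # Reduced QR of an N x M Gaussian matrix gives N x M orthonormal columns.
    Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
    return Q.T


def sense_pixels(X, M, diverse, rng):
    """Compress each pixel (row of X, length N) into M measurements.

    diverse=False mimics FCA sensing: one matrix shared by every pixel.
    diverse=True mimics DMD sensing: an independent matrix per pixel.
    Returns Z (n x M) with z_i = Phi_i x_i, and the matrices (n x M x N).
    """
    n, N = X.shape
    if diverse:
        Phis = np.stack([random_orthonormal_rows(M, N, rng) for _ in range(n)])
    else:
        Phis = np.repeat(random_orthonormal_rows(M, N, rng)[None], n, axis=0)
    Z = np.einsum("imn,in->im", Phis, X)
    return Z, Phis


rng = np.random.default_rng(0)
X = rng.random((6, 103))  # 6 synthetic pixels with 103 bands (Pavia-sized)
Z_fca, Phis_fca = sense_pixels(X, 3, diverse=False, rng=rng)
Z_dmd, Phis_dmd = sense_pixels(X, 3, diverse=True, rng=rng)
```

Storing one matrix per pixel even in the FCA case keeps both modes in the same `(n, M, N)` layout, so downstream code does not need to branch on the sensor type.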

SVM has been shown to be a suitable classifier for hyperspectral data [2]. Specifically, we employ an efficient linear SVM classifier with the exponential loss function, which gives a smooth approximation to the hinge loss. To train the classifier in the compressed domain, we must sketch the SVM loss function using the acquired measurements, for which we employ some of the techniques developed in [9]. Furthermore, given that the sketched loss function closely approximates the true loss function and that the learning objective is smooth, we expect the learned classifier to be close to the ground-truth classifier based on the complete hyperspectral data (which is unknown). As discussed in [10], recovery of the classifier is of independent importance in some applications.

This paper is organized as follows. In Section II, we present the learning algorithm that takes the compressive measurements as input and produces a linear pixel classifier in the signal domain. Section III contains the simulation results and their analysis. We conclude the paper in Section IV.

II Problem Formulation and the Proposed Framework

II-A Overview of SVM for spectral pixel classification

In a supervised hyperspectral classification task, a subset of pixels is labeled by a specialist who may have access to side information about the imaged field, for example by being physically present at the field to take measurements. The task of learning is then to use the labeled samples to tune the parameters of the classification machine so that it predicts the pixel labels for a field with similar material compositions. Note that, for subpixel targets, an extra stage of spectral unmixing is required to separate the different signal sources that contribute to a pixel’s spectrum [14]. For simplicity, we assume that the pixels are homogeneous (each consists of a single material).

Recall that most classifiers are inherently built from binary decision rules. Specifically, in multi-categorical classification, multiple binary classifiers are trained according to either the One-Against-All (OAA) or the One-Against-One (OAO) scheme, and voting techniques are employed to combine the results [15]. In an OAA-SVM classification problem, a decision hyperplane is computed between each class and the rest of the training data, while in an OAO scheme, a hyperplane is learned between each pair of classes. As a consequence, most studies focus on the canonical binary problem. Similarly, we present our analysis for binary classification; it can be extended to multi-categorical classification.
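The OAO voting step can be sketched as follows. This is a minimal illustration, assuming each pairwise classifier is a linear `(w, b0)` pair whose positive side votes for the first class of the pair; `ovo_predict` and its tie-breaking rule are illustrative choices, not the combination scheme of [15].

```python
import itertools

import numpy as np


def ovo_predict(X, classes, pair_models):
    """One-Against-One prediction by majority voting.

    pair_models[(a, b)] = (w, b0) is a linear binary classifier whose
    positive side (w^T x + b0 > 0) votes for class a, negative side for b.
    """
    index = {c: k for k, c in enumerate(classes)}
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    for a, b in itertools.combinations(classes, 2):
        w, b0 = pair_models[(a, b)]
        positive = X @ w + b0 > 0
        votes[positive, index[a]] += 1
        votes[~positive, index[b]] += 1
    # argmax breaks ties in favor of the class listed first in `classes`.
    return np.asarray(classes)[votes.argmax(axis=1)]


# Toy 1-D example: classes 0, 1, 2 centred at -2, 0 and +2.
X_demo = np.array([[-2.0], [0.0], [2.0]])
models = {
    (0, 1): (np.array([-1.0]), -1.0),  # votes 0 when x < -1
    (0, 2): (np.array([-1.0]), 0.0),   # votes 0 when x < 0
    (1, 2): (np.array([-1.0]), 1.0),   # votes 1 when x < 1
}
labels = ovo_predict(X_demo, [0, 1, 2], models)  # -> [0, 1, 2]
```

With C classes, OAO trains C(C-1)/2 binary machines; the pairwise tables in Section III report exactly these binary problems.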

In the linear SVM classification problem, we are given a set of training data points $\mathbf{x}_i \in \mathbb{R}^N$ (corresponding to hyperspectral pixels) for $i = 1, \dots, n$ and the associated labels $y_i \in \{-1, +1\}$. The inferred class label for $\mathbf{x}$ is $\mathrm{sign}(\mathbf{w}^T\mathbf{x} + b)$, which depends on the classifier $\mathbf{w}$ and the bias term $b$. The classifier $\mathbf{w}$ is the normal vector to the affine hyperplane that divides the training data in accordance with their labels. When the training classes cannot be separated by an affine hyperplane, soft-margin (maximum-margin) SVM is used, which relies on a loss function $\ell(\cdot)$ to penalize the amount of misfit. For example, a widely used loss function is $\ell(t) = \max(0, 1 - t)^p$ with $p \in \{1, 2\}$. For $p = 1$, this loss function is known as the hinge loss, and for $p = 2$, it is called the squared hinge loss or simply the quadratic loss. We work with the primal form of SVM, which is more convenient for the purposes of this work; similar results can be obtained using the dual form, and recent works have shown that the advantages of the dual form can be obtained in the primal as well, with the primal form converging faster to the optimal parameters [16]. The optimization problem for soft-margin SVM becomes

(1)
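As a concrete reference point, the NumPy sketch below evaluates a soft-margin primal objective with the loss $\ell(t) = \max(0, 1 - t)^p$. The regularization weight `lam` and the choice of averaging the per-sample losses are illustrative assumptions, not necessarily the weighting used in (1).

```python
import numpy as np


def lp_hinge(t, p=1):
    """l(t) = max(0, 1 - t)**p: hinge loss for p = 1, squared hinge for p = 2."""
    return np.maximum(0.0, 1.0 - t) ** p


def primal_objective(w, b, X, y, lam=1e-2, p=1):
    """(lam/2)||w||^2 + mean_i l(y_i (w^T x_i + b)); lam is an assumed weight."""
    margins = y * (X @ w + b)
    return 0.5 * lam * (w @ w) + lp_hinge(margins, p).mean()


def predict(w, b, X):
    """Inferred labels sign(w^T x + b), with ties mapped to +1."""
    return np.where(X @ w + b >= 0, 1, -1)


X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1, -1, -1])
w, b = np.array([1.0, 0.0]), 0.0
print(predict(w, b, X))                   # third point (label -1) lands on the +1 side
print(primal_objective(w, b, X, y, p=1))  # 0.005 + (0 + 0 + 1.5) / 3
```

Only misclassified or low-margin points (margin below 1) contribute to the loss term, which is what makes the hinge family margin-maximizing.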

In this paper, we use the smooth exponential loss function, which can be used to approximate the hinge loss while retaining its margin-maximization properties [11]:

(2)

where controls the smoothness. We use .
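A smooth surrogate of this kind can be sketched as follows. The softplus form $\ell_\gamma(t) = \frac{1}{\gamma}\log\big(1 + e^{\gamma(1 - t)}\big)$ used here is an illustrative assumption and not necessarily the exact form of (2). Its smoothness is controlled by $\gamma$, and its largest gap to the hinge loss is $\log(2)/\gamma$, attained at $t = 1$, so it approaches the hinge loss as $\gamma$ grows.

```python
import numpy as np


def smooth_hinge(t, gamma):
    """Softplus surrogate (1/gamma) * log(1 + exp(gamma * (1 - t))).

    np.logaddexp(0, a) evaluates log(1 + exp(a)) without overflow.
    """
    return np.logaddexp(0.0, gamma * (1.0 - t)) / gamma


def smooth_hinge_grad(t, gamma):
    """Derivative with respect to t: -sigmoid(gamma * (1 - t))."""
    return -0.5 * (1.0 + np.tanh(0.5 * gamma * (1.0 - t)))


t = np.linspace(-2.0, 3.0, 11)
hinge = np.maximum(0.0, 1.0 - t)
for gamma in (1.0, 10.0, 100.0):
    gap = np.max(smooth_hinge(t, gamma) - hinge)  # largest at t = 1: log(2)/gamma
    print(f"gamma = {gamma:5.1f}, max gap to hinge = {gap:.4f}")
```

Unlike the hinge loss, this surrogate has a gradient everywhere (bounded curvature $\gamma/4$), so plain gradient descent applies directly.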

II-B SVM in the compressed domain

Let $\mathbf{z}_i = \Phi_i\mathbf{x}_i \in \mathbb{R}^M$, with $\Phi_i \in \mathbb{R}^{M \times N}$, denote the low-dimensional measurement vector for pixel $i$, where $M$ is the size of the photosensor array in the compressive whisk-broom camera [3]. As explained in [12], a DMD architecture can be used to produce a $\Phi_i$ with random entries in a bounded range or with random binary entries, resulting in a sub-Gaussian measurement matrix that satisfies the isometry conditions with high probability [13]. Recall that the measurement matrix is fixed in an FCA-based architecture ($\Phi_i = \Phi$ for all $i$), while it can be distinct for each pixel in a DMD-based architecture.

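As noted in [9], the orthogonal projection onto the row space of $\Phi_i$ can be computed as $P_i = \Phi_i^T(\Phi_i\Phi_i^T)^{-1}\Phi_i$. Consequently, an (unbiased) estimator of the inner product $\mathbf{w}^T\mathbf{x}_i$ (assuming fixed $\mathbf{w}$ and $\mathbf{x}_i$) can be formed from the compressive measurements as $\mathbf{w}^T P_i\mathbf{x}_i = \mathbf{w}^T\Phi_i^T(\Phi_i\Phi_i^T)^{-1}\mathbf{z}_i$, scaled by $N/M$ so that its expectation over the random $\Phi_i$ equals $\mathbf{w}^T\mathbf{x}_i$. As a result, the soft-margin SVM based on the compressive measurements can be expressed as: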

(3)

(we have omitted the bias term for simplicity).
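A quick Monte Carlo check of the inner-product estimator, assuming each $\Phi$ has orthonormal rows obtained by orthonormalizing a Gaussian matrix, so that $P = \Phi^T\Phi$ projects onto a uniformly random $M$-dimensional subspace and $\mathbb{E}[P] = (M/N)I$. Under that assumption, $(N/M)\,\mathbf{w}^T P\mathbf{x}$ is unbiased, and it can be formed from $\mathbf{z} = \Phi\mathbf{x}$ alone because $\mathbf{w}^T P\mathbf{x} = (\Phi\mathbf{w})^T\mathbf{z}$. The sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, trials = 20, 5, 20000
w = rng.standard_normal(N)
x = rng.standard_normal(N)

estimates = np.empty(trials)
for k in range(trials):
    # Orthonormal rows, so Phi Phi^T = I and the projection is P = Phi^T Phi.
    Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
    Phi = Q.T
    z = Phi @ x  # all the sensor delivers
    # w^T P x = (Phi w)^T z: the estimate never touches x directly.
    estimates[k] = (N / M) * ((Phi @ w) @ z)

print("true inner product:", w @ x)
print("mean of estimates: ", estimates.mean())
print("spread of a single estimate (std):", estimates.std())
```

Individual estimates are noisy; only their average over random matrices matches $\mathbf{w}^T\mathbf{x}$. Conditioned on one fixed $\Phi$, the estimate can be far off for a particular $\mathbf{w}$.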

We note that the formulation in (3) differs from the one suggested in [5] for a fixed measurement matrix. In particular, we solve for $\mathbf{w}$ in the $N$-dimensional signal space, whereas the methodology in [5] would result in the following optimization problem:

(4)

which solves for $\boldsymbol{\theta}$ in the low-dimensional column space of $\Phi$. Also note that, in the case of a fixed measurement matrix, (3) and (4) correspond to the same problem under the change of variables $\boldsymbol{\theta} \propto \Phi\mathbf{w}$, because the regularization term zeros the components of $\mathbf{w}$ that lie in the null space of $\Phi$. In other words, (3) generalizes (4) to the case where the measurement matrices are not necessarily the same. This allows us to compare the two cases of (i) a fixed measurement matrix and (ii) a distinct measurement matrix for each pixel, which is the subject of this paper. For simplicity, assume that each $\Phi_i$ consists of a subset of rows of a random orthonormal matrix, so that $\Phi_i\Phi_i^T = I$ and thus $P_i = \Phi_i^T\Phi_i$. Also assume that, in the case of DMD-based sensing, each $\Phi_i$ is generated independently of the other measurement matrices.
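The equivalence between (3) and (4) for a fixed matrix can be checked numerically. The sketch assumes a softplus hinge loss, a regularizer $(\lambda/2)\|\cdot\|^2$ in both problems, and omits the $N/M$ factor (which would only rescale the margins); these are illustrative choices, not the exact objectives. Projecting any $\mathbf{w}$ onto the row space of $\Phi$ leaves every sketched margin unchanged and can only shrink $\|\mathbf{w}\|$, so the minimizer of (3) has no null-space component and is determined by $\boldsymbol{\theta} = \Phi\mathbf{w}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, M, lam = 40, 30, 4, 0.1
X = rng.standard_normal((n, N))
y = np.where(rng.standard_normal(n) > 0, 1.0, -1.0)

Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
Phi = Q.T      # one fixed (FCA-like) matrix with orthonormal rows
Z = X @ Phi.T  # row i is z_i = Phi x_i


def softplus_hinge(t):
    """Smooth hinge log(1 + exp(1 - t)), an assumed stand-in loss."""
    return np.logaddexp(0.0, 1.0 - t)


def sketched_objective(w):
    """Signal-domain form, as in (3): variable w in R^N."""
    return 0.5 * lam * (w @ w) + softplus_hinge(y * (Z @ (Phi @ w))).mean()


def reduced_objective(theta):
    """Measurement-domain form, as in (4): variable theta in R^M."""
    return 0.5 * lam * (theta @ theta) + softplus_hinge(y * (Z @ theta)).mean()


w = rng.standard_normal(N)
w_row = Phi.T @ (Phi @ w)  # drop the null-space component of w
print(sketched_objective(w), sketched_objective(w_row), reduced_objective(Phi @ w))
```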

Following the recent line of work on randomized optimization, for example [19], we refer to the new loss in (3) as the sketch of the loss, or simply the sketched loss, to distinguish it from the true loss in (1). Similarly, we refer to the minimizer $\hat{\mathbf{w}}$ of (3) as the sketched classifier, as opposed to the ground-truth classifier $\mathbf{w}^\star$ obtained from the complete data via (1).
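Putting the pieces together, the sketch below trains a sketched classifier in the $N$-dimensional signal domain by plain gradient descent. It is a minimal illustration under stated assumptions: orthonormal-row measurement matrices, the bias omitted as in (3), a softplus surrogate standing in for the loss in (2), and synthetic two-class data along a hidden direction `v` in place of hyperspectral pixels. The names `train_sketched_svm`, `measure`, `lam` and `gamma` are hypothetical.

```python
import numpy as np


def random_orthonormal_rows(M, N, rng):
    """M x N matrix with orthonormal rows (Phi @ Phi.T = I_M)."""
    Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
    return Q.T


def train_sketched_svm(Z, Phis, y, lam=1e-3, gamma=10.0, steps=2000):
    """Gradient descent on an assumed sketched primal SVM (bias omitted).

    Objective: (lam/2)||w||^2 + mean_i l(y_i (N/M) (Phi_i w)^T z_i), with
    l(t) = log(1 + exp(gamma (1 - t))) / gamma as a stand-in smooth hinge.
    Z: (n, M) measurements; Phis: (n, M, N) orthonormal rows; y in {-1, +1}.
    """
    n, M, N = Phis.shape
    # u_i = (N/M) Phi_i^T z_i = (N/M) P_i x_i, so the sketched margin is y_i w^T u_i.
    U = (N / M) * np.einsum("imn,im->in", Phis, Z)
    # Step 1/L, where L bounds the gradient's Lipschitz constant (loss curvature <= gamma/4).
    step = 1.0 / (lam + 0.25 * gamma * np.linalg.norm(U, 2) ** 2 / n)
    w = np.zeros(N)
    for _ in range(steps):
        margins = y * (U @ w)
        dloss = -0.5 * (1.0 + np.tanh(0.5 * gamma * (1.0 - margins)))  # l'(margin)
        w -= step * (lam * w + (U * (dloss * y)[:, None]).mean(axis=0))
    return w


# Toy two-class data separated along a hidden unit direction v (illustrative only).
rng = np.random.default_rng(3)
n, N = 200, 10
v = np.eye(N)[0]
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
X = 3.0 * y[:, None] * v + 0.3 * rng.standard_normal((n, N))


def measure(M, diverse):
    """Return (Z, Phis): fresh matrices per pixel (DMD) or one shared matrix (FCA)."""
    if diverse:
        Phis = np.stack([random_orthonormal_rows(M, N, rng) for _ in range(n)])
    else:
        Phis = np.repeat(random_orthonormal_rows(M, N, rng)[None], n, axis=0)
    return np.einsum("imn,in->im", Phis, X), Phis


w_full = train_sketched_svm(*measure(N, diverse=False), y)  # M = N: no compression
w_dmd = train_sketched_svm(*measure(3, diverse=True), y)    # 3 measurements per pixel
print("cosine(w_dmd, v):", w_dmd @ v / np.linalg.norm(w_dmd))
```

With $M = N$ every $P_i$ is the identity, so the routine reduces to ordinary training on the full data; with $M = 3$ and a fresh matrix per pixel, the learned direction should still point mostly along `v` in this toy setting.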

Fig. 2: Linear SVM classification from FCA-sensed data (left) and DMD-sensed data (right), depicted in two dimensions for illustration. Small arrows represent each $P_i\mathbf{w}$.

Figure 2 depicts the two cases of using a fixed measurement matrix (FCA-sensed data) and distinct measurement matrices (DMD-sensed data) for training a linear classifier. It is helpful to imagine that, in the sketched problem, each $\mathbf{x}_i$ is multiplied with $P_i\mathbf{w}$ (the projection of $\mathbf{w}$ onto the row space of $\Phi_i$), since $\mathbf{w}^T P_i\mathbf{x}_i = (P_i\mathbf{w})^T\mathbf{x}_i$. As shown in Figure 2 (left), with $P_i = P$ for all $i$, it is possible that $\mathbf{w}$ nearly aligns with the null space of the random low-rank matrix $P$. For such a $P$, the projected classifier $P\mathbf{w}$ may fail to discriminate between the two classes, ultimately resulting in classification failure. Figure 2 (right) depicts the case where a distinct measurement matrix is used for each point. When $\Phi_i$ is symmetrically distributed in the space and $n$ is large, there is always a subset of the $P_i\mathbf{w}$'s that nearly align with $\mathbf{w}$, whereas others can be nearly orthogonal to $\mathbf{w}$ or somewhere between these two extremes. This intuitive example suggests how measurement diversity pays off: it makes the optimization more stable with respect to variations in the random measurements and in the separating hyperplane.
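The geometric intuition can be quantified by the cosine between $\mathbf{w}$ and $P_i\mathbf{w}$, which for an orthogonal projection equals $\|P_i\mathbf{w}\|/\|\mathbf{w}\|$. In the sketch below (orthonormal-row matrices assumed; $N = 103$ matches the number of Pavia bands, while $M = 3$, $n$ and the random `w` are illustrative), every FCA pixel shares one alignment value, which may happen to be small, whereas DMD pixels cover a wide range of alignments.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, n = 103, 3, 1000      # 103 bands as in Pavia; M and n are illustrative
w = rng.standard_normal(N)  # a hypothetical separating direction


def random_orthonormal_rows():
    """M x N matrix with orthonormal rows, so P = Phi^T Phi."""
    Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
    return Q.T


def alignment(Phi):
    """cos(w, P w) for the projection P = Phi^T Phi; equals ||P w|| / ||w||."""
    return np.linalg.norm(Phi.T @ (Phi @ w)) / np.linalg.norm(w)


fca = np.full(n, alignment(random_orthonormal_rows()))  # one shared matrix
dmd = np.array([alignment(random_orthonormal_rows()) for _ in range(n)])
print(f"FCA: every pixel sees alignment {fca[0]:.3f}")
print(f"DMD: alignments range from {dmd.min():.3f} to {dmd.max():.3f}")
```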

III Simulations

III-A Handling the bias term

It is not difficult to see that employing a distinct $\Phi_i$ for each data vector necessitates a distinct bias value $b_i$ for each $i$. Note that in the case of a fixed measurement matrix, i.e., when $\Phi_i = \Phi$ for all $i$, the bias terms are all the same and linear SVM works normally, as noted in [5]. However, using a customized bias term for each point would clearly result in overfitting, and the learned classifier would be of no practical value. Furthermore, such a classifier cannot be used for prediction, since the bias is unknown for new input samples. In the following, we address these issues.

First, let $\mathcal{S} = \{\Phi^{(1)}, \dots, \Phi^{(K)}\}$ denote a set of $K$ distinct measurement matrices. Instead of using an arbitrary measurement matrix for each pixel, we draw an element of $\mathcal{S}$ for each pixel. Given that $K \ll n$, each element of $\mathcal{S}$ is expected to be used more than once. This allows us to learn a bias for each possible outcome of the measurement matrix without overfitting. Note that $K$ signifies the degree of measurement diversity: $K = 1$ corresponds to the least diversity, i.e., a fixed measurement matrix, and measurement diversity increases with $K$. The new optimization problem becomes:

(5)

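where $h(\cdot)$ randomly (uniformly) maps each pixel index $i$ to one of the $K$ measurement matrices. The overfitting issue can now be restrained by tuning $K$: reducing $K$ results in less overfitting. In our simulations, we choose $K$ so that the rows of the $K$ measurement matrices jointly span $\mathbb{R}^N$ with probability close to one.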

For prediction, the bias term corresponding to the measurement matrix used for the new pixel is selected from the set of $K$ learned biases.
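The bookkeeping behind (5) can be sketched as follows: a pool of $K$ matrices, a uniform random assignment $h(i)$, and one bias per pool entry that is reused at prediction time. Training of `w` and `biases` is omitted (random stand-ins are used); the helper names and toy sizes are hypothetical, the orthonormal-row matrices follow the simplifying assumption of Section II-B, and the $N/M$ scaling is assumed.

```python
import numpy as np

rng = np.random.default_rng(5)
n, N, M, K = 12, 8, 2, 4  # toy sizes; K is the pool size (degree of diversity)


def random_orthonormal_rows():
    """M x N matrix with orthonormal rows."""
    Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
    return Q.T


pool = [random_orthonormal_rows() for _ in range(K)]  # the set of K matrices
h = rng.integers(0, K, size=n)                        # pixel i is sensed with pool[h[i]]
print("pixels per pool entry:", np.bincount(h, minlength=K))

X = rng.standard_normal((n, N))
Z = np.einsum("imn,in->im", np.stack(pool)[h], X)     # z_i = Phi_{h(i)} x_i


def sketched_scores(w, biases, Z, h, pool):
    """(N/M) (Phi_{h(i)} w)^T z_i + b_{h(i)}: one bias per pool entry, not per pixel."""
    proj = np.stack([Phi @ w for Phi in pool])        # (K, M): Phi_k w
    return (N / M) * np.einsum("im,im->i", proj[h], Z) + biases[h]


# Prediction for a new pixel reuses the bias of whichever pool entry sensed it.
w, biases = rng.standard_normal(N), rng.standard_normal(K)  # stand-ins for learned values
labels = np.where(sketched_scores(w, biases, Z, h, pool) >= 0, 1, -1)
```

Because biases are indexed by pool entry rather than by pixel, the number of bias parameters is $K$ regardless of $n$, which is what keeps the overfitting in check.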

III-B Results

The dataset used in this section is the well-known Pavia University dataset [18], which is available with ground-truth labels at http://www.ehu.eus/ccwintco/. The Indian Pines dataset [17] was not included because the image is too small for a large-scale cross-validation study. For each experiment, we perform 2-fold cross-validation, splitting the labeled samples into training and testing sets. As discussed earlier, multi-categorical SVM classification algorithms typically rely on pair-wise (One-Against-One, OAO) classification results. Hence, we evaluate the sketched classifier on an OAO basis by reporting the pair-wise performances in a table. Finally, since the measurement operator is random and varies from one experiment to the next, we repeat each experiment 1000 times and perform a worst-case analysis of the results.

Consider the case where a single measurement is made from each pixel, i.e., $M = 1$ and $\Phi_i$ is a random vector in the $N$-dimensional spectral space. Clearly, this case represents an extreme scenario in which signal recovery would not be reliable and classification in the compressed domain becomes crucial, even on the receiver’s side, where computational cost is not the greatest concern. For performance evaluation, we are interested in two aspects: (i) the prediction accuracy over the test dataset and (ii) the recovery accuracy of the classifier with respect to the ground-truth classifier, whose importance has been discussed in [10].

We define the classification accuracy as the minimum (worst) of the True Positive Rate (sensitivity) and the True Negative Rate (specificity). Figure 3 shows an instance of the distribution of the classification accuracy for a pair of classes over 1000 random trials. As can be seen, in the presence of measurement diversity, classification results are more consistent (reflected in the lower variance of the accuracy). Due to limited space, we only report the worst-case OAO accuracies (i.e., the minimum pair-wise accuracies among 1000 trials) for the Pavia scene. The results for the case of one measurement per pixel ($M = 1$) are shown in Tables I and II. Similarly, the results for the case of $M = 3$ (which is equivalent to the sampling rate of a typical RGB color camera) are shown in Tables III and IV. Note that the employed SVM classifier is linear and cannot achieve perfect accuracy (i.e., an accuracy of one) when the classes are not linearly separable. To illustrate this, we report the ground-truth accuracies in Table V.
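The worst-of-two accuracy and the worst case over repeated trials are straightforward to compute. The sketch below uses labels in $\{-1, +1\}$; the flip-noise classifier and the trial count are hypothetical stand-ins for the repeated random-measurement experiments.

```python
import numpy as np


def min_tpr_tnr(y_true, y_pred):
    """Classification accuracy as defined here: min(TPR, TNR), labels in {-1, +1}."""
    tpr = np.mean(y_pred[y_true == 1] == 1)    # sensitivity
    tnr = np.mean(y_pred[y_true == -1] == -1)  # specificity
    return min(tpr, tnr)


y_true = np.array([1, 1, 1, 1, -1, -1, -1, -1])
y_pred = np.array([1, 1, 1, -1, -1, -1, 1, 1])
acc = min_tpr_tnr(y_true, y_pred)  # TPR = 0.75, TNR = 0.5, so acc = 0.5

# Worst case over repeated random trials (hypothetical classifier that flips
# each prediction with probability 0.2, standing in for a random-sensing run).
rng = np.random.default_rng(6)
trial_accs = [
    min_tpr_tnr(y_true, np.where(rng.random(y_true.size) < 0.2, -y_true, y_true))
    for _ in range(1000)
]
worst_case = min(trial_accs)
```

Taking the minimum of TPR and TNR prevents a classifier that always predicts the majority class from scoring well on imbalanced pairs.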

Fig. 3: Distributions of the classification accuracy (Asphalt vs. Meadows) for the Pavia University dataset (panels: FCA measurement, DMD measurement).
Classes   Meadow  Gravel  Trees  Soil  Bricks
Asphalt   0.45    0.38    0.42   0.36  0.44
Meadow    -       0.48    0.48   0.41  0.47
Gravel    -       -       0.44   0.44  0.44
Trees     -       -       -      0.42  0.53
Soil      -       -       -      -     0.44
TABLE I: One FCA measurement per pixel: worst-case classification accuracies (1000 trials) for the Pavia scene.
Classes   Meadow  Gravel  Trees  Soil  Bricks
Asphalt   0.71    0.64    0.79   0.60  0.71
Meadow    -       0.72    0.61   0.46  0.73
Gravel    -       -       0.79   0.60  0.44
Trees     -       -       -      0.69  0.79
Soil      -       -       -      -     0.60
TABLE II: One DMD measurement per pixel: worst-case classification accuracies (1000 trials) for the Pavia scene.
Classes   Meadow  Gravel  Trees  Soil  Bricks
Asphalt   0.61    0.80    0.94   0.63  0.86
Meadow    -       0.67    0.82   0.50  0.62
Gravel    -       -       0.94   0.62  0.54
Trees     -       -       -      0.89  0.93
Soil      -       -       -      -     0.66
TABLE III: Three FCA measurements per pixel: worst-case classification accuracies (1000 trials) for the Pavia scene.
Classes   Meadow  Gravel  Trees  Soil  Bricks
Asphalt   0.91    0.76    0.96   0.87  0.84
Meadow    -       0.90    0.82   0.57  0.91
Gravel    -       -       0.95   0.82  0.49
Trees     -       -       -      0.93  0.96
Soil      -       -       -      -     0.80
TABLE IV: Three DMD measurements per pixel: worst-case classification accuracies (1000 trials) for the Pavia scene.
Classes   Meadow  Gravel  Trees  Soil  Bricks
Asphalt   1.00    0.97    0.97   1.00  0.94
Meadow    -       0.99    0.96   0.89  0.99
Gravel    -       -       1.00   1.00  0.86
Trees     -       -       -      0.98  1.00
Soil      -       -       -      -     0.99
TABLE V: Ground-truth accuracies for the Pavia scene.

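To measure the classifier recovery accuracy, we compute the cosine similarity, or equivalently the correlation, between the sketched classifier $\hat{\mathbf{w}}$ and the ground-truth classifier $\mathbf{w}^\star$:

$$\rho(\hat{\mathbf{w}}, \mathbf{w}^\star) = \frac{\hat{\mathbf{w}}^T\mathbf{w}^\star}{\|\hat{\mathbf{w}}\|_2\,\|\mathbf{w}^\star\|_2}.$$

In Tables VI and VII, we report the average recovery accuracy for the case of three measurements per pixel (i.e., $M = 3$).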
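The recovery metric is a normalized inner product, so it ignores the scale of the sketched classifier and equals 1 only when it points in the same direction as the ground-truth classifier. A minimal sketch follows (`recovery_accuracy` is a hypothetical name); the tables average this value over the random trials.

```python
import numpy as np


def recovery_accuracy(w_hat, w_star):
    """Cosine similarity between sketched and ground-truth classifiers."""
    return float(w_hat @ w_star / (np.linalg.norm(w_hat) * np.linalg.norm(w_star)))


w_star = np.array([1.0, 2.0, 2.0])
print(recovery_accuracy(2.5 * w_star, w_star))                # scale does not matter: ~1
print(recovery_accuracy(np.array([2.0, -1.0, 0.0]), w_star))  # orthogonal: 0.0
print(recovery_accuracy(-w_star, w_star))                     # opposite direction: ~-1
```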

Classes   Meadow  Gravel  Trees  Soil   Bricks
Asphalt   0.051   0.055   0.113  0.056  0.048
Meadow    -       0.100   0.033  0.019  0.077
Gravel    -       -       0.122  0.064  0.050
Trees     -       -       -      0.017  0.123
Soil      -       -       -      -      0.031
TABLE VI: Three FCA measurements per pixel: average recovery accuracy (1000 trials) for the Pavia scene.
Classes   Meadow  Gravel  Trees  Soil   Bricks
Asphalt   0.164   0.189   0.483  0.129  0.132
Meadow    -       0.468   0.147  0.140  0.380
Gravel    -       -       0.617  0.272  0.197
Trees     -       -       -      0.102  0.582
Soil      -       -       -      -      0.128
TABLE VII: Three DMD measurements per pixel: average recovery accuracy (1000 trials) for the Pavia scene.

IV Conclusion

In the field of ensemble learning, it has been found that diversity among the base learners enhances the overall learning performance [20]. Similarly, our aim has been to exploit the diversity that can be efficiently built into the sensing system. Both measurement schemes, pixel-invariant (measurement without diversity) and pixel-varying (measurement with diversity), have been suggested as practical designs for compressive hyperspectral cameras [3]. The presented analysis indicates that employing a DMD results in more accurate recovery of the classifier and more stable classification performance than using an FCA. Meanwhile, for tasks that only concern class prediction (and not recovery of the classifier), the FCA is, on average, a suitable low-cost alternative to the DMD architecture.

References

  • [1] R.M. Willett, M.F. Duarte, M.A. Davenport, and R.G. Baraniuk, “Sparsity and structure in hyperspectral imaging: Sensing, reconstruction, and target detection,” Signal Processing Magazine, IEEE, vol. 31, no. 1, pp. 116–126, Jan 2014.
  • [2] J.M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N.M. Nasrabadi, and J. Chanussot, “Hyperspectral remote sensing data analysis and future challenges,” Geoscience and Remote Sensing Magazine, IEEE, vol. 1, no. 2, pp. 6–36, June 2013.
  • [3] J.E. Fowler, “Compressive pushbroom and whiskbroom sensing for hyperspectral remote-sensing imaging,” in Proceedings of the International Conference on Image Processing, IEEE, ICIP 2014, October 2014, pp. 684–688.
  • [4] J.E. Fowler, Qian Du, Wei Zhu, and N.H. Younan, “Classification performance of random-projection-based dimensionality reduction of hyperspectral imagery,” in Geoscience and Remote Sensing Symposium, 2009 IEEE International, IGARSS 2009, July 2009, vol. 5, pp. V–76–V–79.
  • [5] Robert Calderbank, Sina Jafarpour, and Robert Schapire, “Compressed learning: Universal sparse dimensionality reduction and learning in the measurement domain.”
  • [6] J.E. Fowler, “Compressive-projection principal component analysis,” Image Processing, IEEE Transactions on, vol. 18, no. 10, pp. 2230–2242, Oct 2009.
  • [7] M. Aghagolzadeh and H. Radha, “Adaptive dictionaries for compressive imaging,” in Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, Dec 2013, pp. 1033–1036.
  • [8] A. Krishnamurthy, M. Azizyan, and A. Singh, “Subspace Learning from Extremely Compressed Measurements,” ArXiv e-prints, Apr. 2014.
  • [9] M.A. Davenport, P.T. Boufounos, M.B. Wakin, and R.G. Baraniuk, “Signal processing with compressive measurements,” Selected Topics in Signal Processing, IEEE Journal of, vol. 4, no. 2, pp. 445–460, April 2010.
  • [10] Lijun Zhang, M. Mahdavi, Rong Jin, Tianbao Yang, and Shenghuo Zhu, “Random projections for classification: A recovery approach,” Information Theory, IEEE Transactions on, vol. 60, no. 11, pp. 7300–7316, Nov 2014.
  • [11] Saharon Rosset, Ji Zhu, and Trevor Hastie, “Margin maximizing loss functions,” in NIPS, 2004.
  • [12] M.F. Duarte, M.A. Davenport, D. Takhar, J.N. Laska, Ting Sun, K.F. Kelly, and R.G. Baraniuk, “Single-pixel imaging via compressive sampling,” Signal Processing Magazine, IEEE, vol. 25, no. 2, pp. 83–91, March 2008.
  • [13] Richard Baraniuk, Mark Davenport, Ronald Devore, and Michael Wakin, “A simple proof of the restricted isometry property for random matrices,” Constr. Approx, vol. 2008, 2007.
  • [14] W.K. Ma, J.M. Bioucas Dias, Tsung Han Chan, N. Gillis, P. Gader, A.J. Plaza, A. Ambikapathi, and Chong-Yung Chi, “A Signal Processing Perspective on Hyperspectral Unmixing: Insights from Remote Sensing,” Signal Processing Magazine, IEEE, vol. 31, no. 1, pp. 67–81, January 2014.
  • [15] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 8, pp. 1778–1790, August 2004.
  • [16] O. Chapelle, “Training a support vector machine in the primal,” Neural Computation, vol. 19, no. 5, pp. 1155–1178, 2007.
  • [17] This dataset was gathered by the AVIRIS sensor over the Indian Pines test site in north-western Indiana and consists of 224 spectral reflectance bands in the wavelength range 0.4–2.5 µm.
  • [18] This scene was acquired by the ROSIS sensor during a flight campaign over Pavia, northern Italy. The number of spectral bands is 103, and the ground truth consists of 9 classes.
  • [19] M. Pilanci and M.J. Wainwright, “Randomized Sketches of Convex Programs with Sharp Guarantees,” arXiv:1404.7203 [cs.IT], April 2014.
  • [20] B. Waske, S. Van Der Linden, J.A. Benediktsson, A. Rabe, and P. Hostert, “Sensitivity of support vector machines to random feature selection in classification of hyperspectral data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, pp. 2880–2889, 2010.