MIACE
Matlab Implementation of MI-ACE and MI-SMF Target Characterization Algorithms
In this paper, two methods for multiple instance target characterization, MI-SMF and MI-ACE, are presented. MI-SMF and MI-ACE estimate a discriminative target signature from imprecisely-labeled and mixed training data. In many applications, such as sub-pixel target detection in remotely-sensed hyperspectral imagery, accurate pixel-level labels on training data are often unavailable and infeasible to obtain. Furthermore, since sub-pixel targets are smaller than the resolution of a single pixel, training data is comprised only of mixed data points (in which target training points are mixtures of responses from both target and non-target classes). Results show improved, consistent performance over existing multiple instance concept learning methods on several hyperspectral sub-pixel target detection problems.
Hyperspectral sub-pixel target detection is used in a wide variety of applications including search and rescue [1], explosive residue detection [2], food safety and quality monitoring [3], chemical plume detection [4, 5], biomedical applications [6], and landmine and explosive hazard detection [7], among many others. The goal of sub-pixel target detection is to locate all instances of a target within a hyperspectral scene given a known target signature. However, in many applications, obtaining a target signature is difficult or infeasible. In this paper, two methods for estimating a discriminative target signature from mixed training samples with imprecise labels are presented. The proposed target characterization approaches, MI-SMF (Multiple Instance Spectral Matched Filter) and MI-ACE (Multiple Instance Adaptive Cosine Estimator), estimate target signatures that optimize the widely-used SMF and ACE sub-pixel target detector responses on a training data set with multiple-instance-style imprecise labels.
Sub-pixel target detection is a challenging and important hyperspectral image analysis task. Numerous sub-pixel detectors have been proposed in the literature [8, 9, 10, 11, 12, 13, 14, 15, 16]. However, nearly all of these detectors rely on having an accurate target spectral signature in advance. Much of the continued development of sub-pixel target detectors is driven by the lack of availability of effective target signatures for particular applications [17]. Commonly, in hyperspectral analysis, target signatures are obtained from spectral libraries comprised of spectral signatures collected in controlled laboratory settings, collected using hand-held spectrometers, or pulled manually from a hyperspectral scene. However, these methods for obtaining target signatures are often found to be ineffective. For example, signatures measured in the laboratory or with a hand-held spectrometer do not account for atmospheric or environmental conditions and, thus, do not effectively translate to remotely-sensed hyperspectral imagery. Signatures pulled from a scene (with environmental and atmospheric conditions similar to testing scenarios) are more likely to be effective; however, this approach requires that pure spectral signatures from a target of interest can be accurately identified and extracted from a scene. In the case of sub-pixel targets, pure pixels containing only target response do not exist. Furthermore, accurately locating a target in a remotely-sensed scene is challenging. Often Global Positioning System (GPS) coordinates of targets are known; however, the precision of these coordinates is limited by the accuracy of the co-registration of the imagery to GPS coordinates and the accuracy of the GPS device used to collect those coordinates. Thus, target GPS coordinates generally only provide an approximate location of a target in a scene. Namely, a region or set of pixels containing the target can be identified but the specific pixel-level labeling cannot be accurately obtained. Furthermore, since the targets are sub-pixel, they cannot be visibly seen in the imagery and, thus, training labels cannot even be manually created. MI-SMF and MI-ACE address all of these challenges in obtaining target signatures. As opposed to modifying the target detector for improved performance, MI-SMF and MI-ACE estimate the target signature that improves SMF and ACE detection performance. Since MI-SMF and MI-ACE estimate discriminative target signatures from training data, they leverage the benefits of pulling a target signature from a hyperspectral scene but do not require pure pixel instances or accurate pixel-level labeling of target in the training data. Furthermore, the discriminative signatures estimated by MI-SMF and MI-ACE can be easily interpreted to understand what characteristics of the target class distinguish it from the background. In other words, in addition to optimizing SMF and ACE performance, the signatures estimated by MI-SMF and MI-ACE are interpretable and provide insight into the discriminative, salient features of the target. In the case of hyperspectral sub-pixel target detection, these discriminative, salient features are the spectral wavelengths and the spectral characteristics of the target that differ from the background. These spectral characteristics have physical meaning that can then be studied and understood once uncovered.
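The majority of sub-pixel detection techniques are statistical methods in which the target and background signals are modeled as random variables distributed according to some respective underlying probability distribution [13, 8, 18]. The detection problem can then be posed as a binary hypothesis test with two competing hypotheses, target absent ($H_0$) or target present ($H_1$), and a detector can be designed using the generalized likelihood ratio test (GLRT) approach [19]. The spectral matched filter (SMF) [8, 20, 21, 14, 19] and the adaptive coherence/cosine estimator (ACE) [22, 23, 24] are two such effective and extremely widely used sub-pixel detection algorithms. The hypotheses used for the SMF are:

$$H_0: \mathbf{x} \sim \mathcal{N}(\mathbf{0}, \Sigma_b), \qquad H_1: \mathbf{x} \sim \mathcal{N}(a\mathbf{s}, \Sigma_b) \tag{1}$$

where $\Sigma_b$ is the background covariance and $\mathbf{s}$ is the known target signature which is scaled by a target abundance, $a$. The square-root of the GLRT for (1) results in the following as the SMF detector:

$$D_{SMF}(\mathbf{x}, \mathbf{s}) = \frac{\mathbf{s}^T \Sigma_b^{-1} (\mathbf{x} - \boldsymbol{\mu}_b)}{\sqrt{\mathbf{s}^T \Sigma_b^{-1} \mathbf{s}}} \tag{2}$$

where $\boldsymbol{\mu}_b$ is the background mean subtracted from the data to ensure a zero-mean background as defined in $H_0$.

In comparison, the hypotheses used for the unstructured-background ACE detector are:

$$H_0: \mathbf{x} \sim \mathcal{N}(\mathbf{0}, \sigma_0^2 \Sigma_b), \qquad H_1: \mathbf{x} \sim \mathcal{N}(a\mathbf{s}, \sigma_1^2 \Sigma_b) \tag{3}$$

which includes $\sigma_0^2 = \frac{1}{d}\mathbf{x}^T \Sigma_b^{-1} \mathbf{x}$ and $\sigma_1^2 = \frac{1}{d}(\mathbf{x} - a\mathbf{s})^T \Sigma_b^{-1} (\mathbf{x} - a\mathbf{s})$ to add scale-invariance to the ACE detector, where $d$ is the dimensionality of the spectra. The square-root of the GLRT for (3) results in the following as the ACE detector [13, 23, 22]:

$$D_{ACE}(\mathbf{x}, \mathbf{s}) = \frac{\mathbf{s}^T \Sigma_b^{-1} (\mathbf{x} - \boldsymbol{\mu}_b)}{\sqrt{\mathbf{s}^T \Sigma_b^{-1} \mathbf{s}}\,\sqrt{(\mathbf{x} - \boldsymbol{\mu}_b)^T \Sigma_b^{-1} (\mathbf{x} - \boldsymbol{\mu}_b)}} \tag{4}$$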
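As a minimal sketch of how the detection statistics in (2) and (4) are evaluated, the pure-Python snippet below computes both for a two-dimensional toy example. The function names (`quad_form`, `smf`, `ace`), the toy background statistics, and the assumption that the inverse background covariance is already available are illustrative choices and are not taken from the authors' MATLAB implementation.

```python
import math

def quad_form(u, a, v):
    """Return u^T A v for vectors u, v and a square matrix A (nested lists)."""
    return sum(u[i] * a[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def smf(x, s, mu_b, sigma_inv):
    """SMF statistic as in (2): whitened projection of x onto s."""
    xc = [xi - mi for xi, mi in zip(x, mu_b)]  # subtract background mean
    return quad_form(s, sigma_inv, xc) / math.sqrt(quad_form(s, sigma_inv, s))

def ace(x, s, mu_b, sigma_inv):
    """ACE statistic as in (4): the SMF statistic with x also normalized."""
    xc = [xi - mi for xi, mi in zip(x, mu_b)]
    return smf(x, s, mu_b, sigma_inv) / math.sqrt(quad_form(xc, sigma_inv, xc))

# Toy (assumed) background statistics: zero mean, covariance diag(1, 4).
mu_b = [0.0, 0.0]
sigma_inv = [[1.0, 0.0], [0.0, 0.25]]
s = [1.0, 2.0]          # hypothetical target signature
x = [1.0, 0.0]          # test point
x_scaled = [3.0, 0.0]   # same direction, three times the magnitude

smf_x, smf_scaled = smf(x, s, mu_b, sigma_inv), smf(x_scaled, s, mu_b, sigma_inv)
ace_x, ace_scaled = ace(x, s, mu_b, sigma_inv), ace(x_scaled, s, mu_b, sigma_inv)
print(smf_x, smf_scaled)  # the SMF statistic grows with the magnitude of x
print(ace_x, ace_scaled)  # the ACE statistic is unchanged by the scaling
```

Scaling the test point by three triples the SMF statistic but leaves the ACE statistic unchanged, which is the normalization difference between the two detectors.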
Comparing (2) and (4), we see that the difference between these two detectors is a normalization of an input test point. As a result of this difference, the SMF detection statistic is a projection of an (unnormalized) test data point onto a target vector in a whitened coordinate space. Since the test point is not normalized, data points with larger magnitude (of components not orthogonal to the target signature) result in larger detection statistics (i.e., in SMF, magnitude matters). In contrast, ACE normalizes all input test points such that detection statistics are determined only by the vector angle between a test point and the target signature in the whitened coordinate space (and magnitude does not play a role). In order to apply SMF or ACE, the target signature, $\mathbf{s}$, must be known. The proposed MI-SMF and MI-ACE estimate a discriminative $\mathbf{s}$ from imprecisely-labeled, mixed training data that optimizes the SMF and ACE detection statistics.

The proposed problem of target characterization from imprecise labels is most closely related to multiple instance concept learning since, in those methods, a positive-class concept (i.e., a target signature) is also estimated from imprecisely-labeled training data. Here, a class concept refers to a generalized class prototype in the feature space. Among the rapidly growing body of Multiple Instance Learning (MIL) methods [25, 26, 27], only a few MIL methods estimate class concepts. Most notably, the Diverse Density (DD) [28], the Expectation-Maximization DD (EM-DD) [29], the Dictionary-based Multiple Instance Learning (DMIL) [30, 31] and the extended FUnctions of Multiple Instances (FUMI) [32, 33] methods are MIL methods that estimate class concepts.

Multiple instance learning is a variation on supervised learning for problems with imprecisely-labeled training data. Instead of pairing each training point with a class label, MIL methods learn from a set of labeled “bags” in which a bag is defined to be a multi-set of data points. Each bag is labeled as either a “positive” or “negative” bag. A bag is defined to be positive if at least one of the data points in the bag is an instance of the positive target class. The number of positive instances in each positive bag is unknown. Negative bags are composed entirely of non-target data points. An advantage of the MIL concept learning methods is that concepts can be examined after applying the MIL approach to obtain insight into what characterizes the target class. In the case of hyperspectral sub-pixel target detection, this is extremely useful as the discriminative spectral characteristics in particular spectral wavelengths have physical meaning. By uncovering the discriminative spectral characteristics, the physical properties of the target material that result in these characteristics can be uncovered and studied.
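Diverse density [28] finds a positive-class concept that lies close to at least one instance in each positive bag and maximizes the distance from all instances in negatively labeled bags. The distance measure used by DD to determine how close the concept is to the instances in positive bags and how far it is from the instances in negative bags is Euclidean distance. Namely, DD estimates the positive class concept, $\mathbf{s}$, that maximizes the following Noisy-OR objective:

$$\arg\max_{\mathbf{s}} \prod_{j=1}^{K^+}\left(1 - \prod_{i=1}^{N_j^+}\left(1 - \exp\left(-\left\|\mathbf{x}_{ji}^+ - \mathbf{s}\right\|^2\right)\right)\right)\prod_{j=1}^{K^-}\prod_{i=1}^{N_j^-}\left(1 - \exp\left(-\left\|\mathbf{x}_{ji}^- - \mathbf{s}\right\|^2\right)\right) \tag{5}$$

where $\mathbf{x}_{ji}^+$ is the $i$th point in the $j$th positive bag, $\mathbf{x}_{ji}^-$ is the $i$th point in the $j$th negative bag, $K^+$ and $K^-$ are the numbers of positive and negative bags, and $N_j^+$ and $N_j^-$ are the numbers of instances in the $j$th positive and negative bag, respectively.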
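To illustrate how the Noisy-OR objective in (5) rewards a concept that is close to some instance in every positive bag and far from all negative instances, the following sketch evaluates it for two candidate concepts. The function names and the two-dimensional toy bags are hypothetical and serve only as an illustration; the optimization over the concept itself (e.g., by gradient ascent) is not shown.

```python
import math

def sq_dist(x, s):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, s))

def diverse_density(s, pos_bags, neg_bags):
    """Evaluate the Noisy-OR objective in (5) for a candidate concept s."""
    value = 1.0
    for bag in pos_bags:
        # Probability that no instance in the bag is close to the concept.
        none_close = 1.0
        for x in bag:
            none_close *= 1.0 - math.exp(-sq_dist(x, s))
        value *= 1.0 - none_close
    for bag in neg_bags:
        # Every negative instance should be far from the concept.
        for x in bag:
            value *= 1.0 - math.exp(-sq_dist(x, s))
    return value

# Toy bags: both positive bags contain a point near (1, 1).
pos_bags = [[[1.0, 1.0], [5.0, 5.0]],
            [[1.1, 0.9], [-4.0, 3.0]]]
neg_bags = [[[5.0, 5.2], [-4.0, 2.8]]]

dd_near_target = diverse_density([1.0, 1.0], pos_bags, neg_bags)
dd_near_background = diverse_density([5.0, 5.0], pos_bags, neg_bags)
print(dd_near_target > dd_near_background)
```

The concept near (1, 1) scores far higher than the one near (5, 5), since the latter is close to a negative instance and is not close to any instance in the second positive bag.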
EM-DD, the Expectation-Maximization (EM) version of diverse density [29], estimates a target concept using an EM approach in which, during the E-step, a single instance from each bag is selected as the one most likely to be the cause of the bag’s label (e.g., for positive bags, the selected instance is the instance most likely to be the positive example in the bag). Then, during the M-step, the concept is updated using gradient ascent. Zhang and Goldman [29] argue that EM-DD improves accuracy and computation time over the DD algorithm since the use of a single instance from each bag simplifies the search space and helps to avoid getting caught in local minima (by encouraging large jumps when the selected instances are changed in a bag each iteration).
DMIL [30, 31], instead of learning a single class concept close to the conjunction of positive bags and far from each negative instance, estimates class-specific dictionaries (one for each class) by enforcing that at least one instance in each positive bag for a class is well represented by the class-specific dictionary and all negative instances are poorly represented by that dictionary. The dictionaries are estimated by maximizing the Noisy-OR model in (5) where, instead of using the Euclidean distance to measure the dissimilarity between each instance and the associated class concept, the reconstruction error of an instance using the class-specific dictionary is used.
FUMI [33], like DMIL, estimates a full dictionary as opposed to a single concept. In contrast, however, FUMI does not estimate distinct class-specific dictionaries. A single dictionary with one target concept and a shared non-target concept dictionary is estimated. Each instance is modeled as a convex combination of positive and/or negative concepts, and FUMI estimates the target and non-target concepts using an EM approach in which the hidden latent variables are the labels for each instance in the training data set.
MI-SMF and MI-ACE, like DD and EM-DD, estimate a target concept. However, instead of using a Euclidean distance to measure the similarity between instances and the target concept, MI-SMF and MI-ACE use the cosine similarity which, as shown in the following section, is closely aligned with the SMF and ACE target detectors. The cosine similarity is found to be more robust in the case of mixed training data in which target signatures are sub-pixel components of positive training data points.
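Let $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N] \in \mathbb{R}^{d \times N}$ be training data where $d$ is the dimensionality of an instance, $\mathbf{x}_i$, and $N$ is the total number of training instances. The data is grouped into $K$ bags, $\mathbf{B} = \{B_1, \ldots, B_K\}$, with associated binary bag-level labels, $L = \{L_1, \ldots, L_K\}$ where $L_j \in \{0, 1\}$ and $\mathbf{x}_{ji} \in B_j$ denotes the $i$th instance in bag $B_j$. Positive bags (i.e., $B_j$ with $L_j = 1$, denoted as $B_j^+$) contain at least one instance composed of some target:

$$\text{if } L_j = 1,\ \exists\, \mathbf{x}_{ji} \in B_j^+ \text{ s.t. } \mathbf{x}_{ji} \text{ contains target} \tag{6}$$

However, the number of instances in a positive bag with a target component is unknown. If $B_j$ is a negative bag (i.e., $L_j = 0$, denoted as $B_j^-$), then this indicates that $B_j^-$ does not contain any target:

$$\text{if } L_j = 0,\ \forall\, \mathbf{x}_{ji} \in B_j^-,\ \mathbf{x}_{ji} \text{ does not contain target} \tag{7}$$

Given this problem formulation, the goal of MI-SMF and MI-ACE is to estimate the target signature, $\mathbf{s}$, that maximizes the corresponding detection statistic for the target instances in each positive bag and minimizes the detection statistic over all negative instances. This is accomplished by maximizing the following objective:

$$\arg\max_{\mathbf{s}}\ \frac{1}{K^+}\sum_{j: L_j = 1} D(\mathbf{x}_j^*, \mathbf{s}) - \frac{1}{K^-}\sum_{j: L_j = 0}\frac{1}{N_j^-}\sum_{\mathbf{x}_{ji} \in B_j^-} D(\mathbf{x}_{ji}, \mathbf{s}) \tag{8}$$

where $D(\cdot, \cdot)$ is the corresponding detection statistic ($D_{SMF}$ or $D_{ACE}$), $K^+$ and $K^-$ are the number of positive and negative bags, respectively, $N_j^-$ is the number of instances in the $j$th negative bag, and $\mathbf{x}_j^*$ is the selected instance from the $j$th positive bag that is most likely a target instance in the bag. The selected instance is identified as the point with the maximum detection statistic given a target signature, $\mathbf{s}$:

$$\mathbf{x}_j^* = \arg\max_{\mathbf{x}_{ji} \in B_j^+} D(\mathbf{x}_{ji}, \mathbf{s}) \tag{9}$$

The use of the selected instance allows MI-SMF and MI-ACE to inherit the advantages of doing so outlined in the EM-DD paper [29].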
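The objective in (8) and the instance selection in (9) can be sketched as follows. This is only an illustration under stated assumptions: the detector is passed in as a function, plain cosine similarity stands in for the detection statistic (i.e., the data are assumed to be already mean-subtracted and whitened), and all names and toy bags are hypothetical.

```python
import math

def cosine(x, s):
    """Cosine similarity, used here as a stand-in detection statistic."""
    dot = sum(a * b for a, b in zip(x, s))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in s)))

def select_instances(pos_bags, s, detect):
    """Eq. (9): per positive bag, pick the instance with the largest statistic."""
    return [max(bag, key=lambda x: detect(x, s)) for bag in pos_bags]

def mi_objective(pos_bags, neg_bags, s, detect):
    """Eq. (8): mean statistic of selected instances minus the
    bag-averaged mean statistic of all negative instances."""
    selected = select_instances(pos_bags, s, detect)
    pos_term = sum(detect(x, s) for x in selected) / len(pos_bags)
    neg_term = sum(sum(detect(x, s) for x in bag) / len(bag)
                   for bag in neg_bags) / len(neg_bags)
    return pos_term - neg_term

# Toy bags: background lies near the first axis, targets add a second-axis part.
pos_bags = [[[1.0, 0.1], [0.8, 0.6]],
            [[1.0, -0.1], [0.7, 0.7]]]
neg_bags = [[[1.0, 0.1], [1.0, -0.1]]]

selected = select_instances(pos_bags, [0.0, 1.0], cosine)
j_discriminative = mi_objective(pos_bags, neg_bags, [0.0, 1.0], cosine)
j_background = mi_objective(pos_bags, neg_bags, [1.0, 0.0], cosine)
print(selected, j_discriminative, j_background)
```

A signature aligned with the background direction scores well on the positive bags but equally well on the negative bags, so its objective value is essentially zero, whereas the signature pointing along the second axis yields a clearly positive value.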
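Given a set of selected instances, the target signature can be estimated by maximizing (8) with respect to $\mathbf{s}$. Let us first consider ACE and MI-ACE. To derive the update equation for the target signature, first note that the ACE detector can be re-written as follows:

$$D_{ACE}(\mathbf{x}, \mathbf{s}) = \frac{\mathbf{s}^T \Sigma_b^{-1} (\mathbf{x} - \boldsymbol{\mu}_b)}{\sqrt{\mathbf{s}^T \Sigma_b^{-1} \mathbf{s}}\,\sqrt{(\mathbf{x} - \boldsymbol{\mu}_b)^T \Sigma_b^{-1} (\mathbf{x} - \boldsymbol{\mu}_b)}} \tag{10}$$

$$= \frac{\mathbf{s}^T \mathbf{U} \Lambda^{-\frac{1}{2}} \Lambda^{-\frac{1}{2}} \mathbf{U}^T (\mathbf{x} - \boldsymbol{\mu}_b)}{\sqrt{\mathbf{s}^T \mathbf{U} \Lambda^{-\frac{1}{2}} \Lambda^{-\frac{1}{2}} \mathbf{U}^T \mathbf{s}}\,\sqrt{(\mathbf{x} - \boldsymbol{\mu}_b)^T \mathbf{U} \Lambda^{-\frac{1}{2}} \Lambda^{-\frac{1}{2}} \mathbf{U}^T (\mathbf{x} - \boldsymbol{\mu}_b)}} \tag{11}$$

$$= \frac{\hat{\mathbf{s}}^T \hat{\mathbf{x}}}{\|\hat{\mathbf{s}}\|\,\|\hat{\mathbf{x}}\|} \tag{12}$$

$$= \hat{\hat{\mathbf{s}}}^T \hat{\hat{\mathbf{x}}} \tag{13}$$

where $\hat{\mathbf{x}} = \Lambda^{-\frac{1}{2}} \mathbf{U}^T (\mathbf{x} - \boldsymbol{\mu}_b)$, $\hat{\mathbf{s}} = \Lambda^{-\frac{1}{2}} \mathbf{U}^T \mathbf{s}$, $\mathbf{U}$ and $\Lambda$ are the eigenvectors and eigenvalues of the background covariance matrix, $\Sigma_b$, respectively, and $\hat{\hat{\mathbf{x}}} = \hat{\mathbf{x}} / \|\hat{\mathbf{x}}\|$, $\hat{\hat{\mathbf{s}}} = \hat{\mathbf{s}} / \|\hat{\mathbf{s}}\|$. Here, it can be clearly noted that the ACE detection statistic is the cosine similarity between a test data point, $\mathbf{x}$, and a target signature, $\mathbf{s}$, in a whitened coordinate space. Thus, the objective function in (8) can be rewritten for MI-ACE as:

$$\arg\max_{\hat{\hat{\mathbf{s}}}}\ \frac{1}{K^+}\sum_{j: L_j = 1} \hat{\hat{\mathbf{s}}}^T \hat{\hat{\mathbf{x}}}_j^* - \frac{1}{K^-}\sum_{j: L_j = 0}\frac{1}{N_j^-}\sum_{\mathbf{x}_{ji} \in B_j^-} \hat{\hat{\mathbf{s}}}^T \hat{\hat{\mathbf{x}}}_{ji} \quad \text{such that } \hat{\hat{\mathbf{s}}}^T \hat{\hat{\mathbf{s}}} = 1 \tag{14}$$

The constraint, $\hat{\hat{\mathbf{s}}}^T \hat{\hat{\mathbf{s}}} = 1$, is a result of the fact that $\hat{\hat{\mathbf{s}}} = \hat{\mathbf{s}} / \|\hat{\mathbf{s}}\|$ and aids in preventing values of $\hat{\hat{\mathbf{s}}}$ from being arbitrarily large to maximize the first term in (14). Now, given (14), the update equation for $\hat{\hat{\mathbf{s}}}$ can be derived by solving the associated Lagrangian, resulting in:

$$\hat{\hat{\mathbf{s}}} = \frac{\mathbf{t}}{\|\mathbf{t}\|}, \qquad \mathbf{t} = \frac{1}{K^+}\sum_{j: L_j = 1} \hat{\hat{\mathbf{x}}}_j^* - \frac{1}{K^-}\sum_{j: L_j = 0}\frac{1}{N_j^-}\sum_{\mathbf{x}_{ji} \in B_j^-} \hat{\hat{\mathbf{x}}}_{ji} \tag{15}$$

MI-SMF can be similarly derived. As noted in Section 1, SMF does not normalize input test points. Thus, the difference between MI-SMF and MI-ACE is the use of $\hat{\mathbf{x}}$ instead of $\hat{\hat{\mathbf{x}}}$ in the objective function and target signature update equation. For MI-SMF, the objective function can be written as:

$$\arg\max_{\hat{\hat{\mathbf{s}}}}\ \frac{1}{K^+}\sum_{j: L_j = 1} \hat{\hat{\mathbf{s}}}^T \hat{\mathbf{x}}_j^* - \frac{1}{K^-}\sum_{j: L_j = 0}\frac{1}{N_j^-}\sum_{\mathbf{x}_{ji} \in B_j^-} \hat{\hat{\mathbf{s}}}^T \hat{\mathbf{x}}_{ji} \quad \text{such that } \hat{\hat{\mathbf{s}}}^T \hat{\hat{\mathbf{s}}} = 1 \tag{16}$$

resulting in the following update equation for $\hat{\hat{\mathbf{s}}}$:

$$\hat{\hat{\mathbf{s}}} = \frac{\mathbf{t}}{\|\mathbf{t}\|}, \qquad \mathbf{t} = \frac{1}{K^+}\sum_{j: L_j = 1} \hat{\mathbf{x}}_j^* - \frac{1}{K^-}\sum_{j: L_j = 0}\frac{1}{N_j^-}\sum_{\mathbf{x}_{ji} \in B_j^-} \hat{\mathbf{x}}_{ji} \tag{17}$$

The update for $\hat{\hat{\mathbf{s}}}$ in MI-SMF and MI-ACE has a closed form solution, so, unlike many MIL methods, a gradient ascent approach is not needed when updating the target concept. Also, note that the second term in (15) and (17) does not change for the life of the algorithm and can be precomputed.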
MI-SMF and MI-ACE proceed by alternating between selecting representative instances for each positive bag and updating the target concept. The resulting methods, summarized in Alg. 1, are straightforward, fast, and effective approaches for multiple instance target characterization.

^1 Our MI-SMF and MI-ACE implementations are available: https://github.com/GatorSense/

Note that for each set of selected instances, $\mathbf{s}$ is determined using a closed-form update. Thus, given the same set of selected instances, the same $\mathbf{s}$ will be calculated. Given that there is a finite set of possible selected instances, there is a finite set of possible $\mathbf{s}$ estimates. MI-SMF and MI-ACE terminate when $\mathbf{s}$ is repeated, indicating that the same set of selected instances was chosen; this can occur in contiguous iterations or not. This convergence sketch mirrors the one described in [29]. In practice, we found that MI-SMF and MI-ACE generally converged to a solution in fewer than 7 iterations.
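The alternating procedure can be sketched in a few lines. The snippet below is a simplified illustration rather than the authors' implementation: it assumes the inputs have already been mean-subtracted and whitened, takes the initial signature `s_init` as a caller-supplied argument (the initialization used in Alg. 1 is not reproduced here), and stops when a signature estimate repeats. All function names and the toy data are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def mi_ace_whitened(pos_bags, neg_bags, s_init, max_iter=100):
    """Alternate instance selection (9) and the closed-form update (15)."""
    pos = [[normalize(x) for x in bag] for bag in pos_bags]
    neg = [[normalize(x) for x in bag] for bag in neg_bags]
    # Second term of (15): fixed for the whole run, so precompute it.
    neg_term = mean_vector([mean_vector(bag) for bag in neg])
    s = normalize(s_init)
    seen = set()
    for _ in range(max_iter):
        selected = [max(bag, key=lambda x: dot(s, x)) for bag in pos]
        pos_term = mean_vector(selected)
        s = normalize([p - q for p, q in zip(pos_term, neg_term)])
        key = tuple(round(a, 12) for a in s)
        if key in seen:  # a repeated estimate means the selection repeated
            break
        seen.add(key)
    return s

# Toy data: background near the first axis, targets add a second-axis part.
pos_bags = [[[1.0, 0.1], [0.9, 0.0], [0.8, 0.6]],
            [[1.0, -0.1], [1.1, 0.05], [0.7, 0.7]]]
neg_bags = [[[1.0, 0.1], [1.0, -0.1]],
            [[0.9, 0.0], [1.1, 0.05]]]
s_hat = mi_ace_whitened(pos_bags, neg_bags, s_init=[0.8, 0.6])
print(s_hat)
```

On this toy data the estimate has a negative first component and a large positive second component: it penalizes the direction shared with the background rather than reproducing a mixed target instance, which is the discriminative behavior described above. Dropping the `normalize` calls on the instances (while keeping the unit-norm constraint on the signature) would correspond to the MI-SMF update in (17).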
It can also be noted that the ACE target detector is a non-linear detector in the original spectral space but a linear discriminant in the whitened, normalized coordinate space. Thus, a multiple instance linear discriminant can be estimated using a procedure similar to Alg. 1 by eliminating steps (2)-(3) in the method (and, to estimate a bias, appending a value of 1 to each input test point). Thus, Alg. 1 can be applied to any data as an approach to estimate a linear discriminant from data burdened with uncertain labels.
In the following, MI-SMF and MI-ACE are evaluated and compared to several MIL concept learning methods on simulated data and on a real hyperspectral target detection data set. The simulated data experiments are included to illustrate the properties of MI-SMF and MI-ACE and provide insight into how and when the methods are effective.
Simulated data sets were generated following the hyperspectral linear mixing model [34] using the approach outlined in Alg. 3 and 4 of [33]. Namely, for each instance in a negative bag and for negative instances in positive bags, a uniform random number of non-target signatures were selected and the selected non-target signatures were combined to generate the instance using a convex combination with proportions drawn from a uniform Dirichlet distribution. Similarly, for each true positive instance, a uniform random number of non-target signatures were selected and the target signature along with the selected non-target signatures were combined to generate the instance using a convex combination with proportions drawn from a Dirichlet distribution. For positive instances, the parameters of the Dirichlet were set to achieve the desired level of sub-pixel target mixing and variance in mixing proportions.
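A minimal sketch of this generation step, under stated assumptions, is given below. It is not the referenced generation code: the function names, the `target_alpha` parameter, and the three-band toy signatures are hypothetical, Dirichlet samples are drawn by normalizing gamma variates, and the additive Gaussian noise used in the experiments is omitted.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw mixing proportions by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [g / total for g in draws]

def mix(signatures, proportions):
    """Convex combination of signatures (linear mixing model, no noise)."""
    n_bands = len(signatures[0])
    return [sum(p * sig[k] for p, sig in zip(proportions, signatures))
            for k in range(n_bands)]

def make_instance(target, backgrounds, rng, is_target, target_alpha=1.0):
    """Generate one mixed instance and return it with its proportions."""
    n_bg = rng.randint(1, len(backgrounds))  # uniform number of backgrounds
    chosen = rng.sample(backgrounds, n_bg)
    if is_target:
        signatures = [target] + chosen
        alphas = [target_alpha] + [1.0] * n_bg  # target proportion listed first
    else:
        signatures = chosen
        alphas = [1.0] * n_bg                   # uniform Dirichlet
    proportions = sample_dirichlet(alphas, rng)
    return mix(signatures, proportions), proportions

rng = random.Random(0)
target = [0.0, 0.0, 1.0]
backgrounds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x_neg, p_neg = make_instance(target, backgrounds, rng, is_target=False)
x_pos, p_pos = make_instance(target, backgrounds, rng, is_target=True,
                             target_alpha=0.5)
print(x_neg, p_neg)
print(x_pos, p_pos)
```

With these toy signatures the third band of an instance equals its target proportion, so negative instances have a zero third band. Lowering `target_alpha` relative to the background parameters reduces the expected target proportion, which is one way to control the level of sub-pixel mixing.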
^2 Simulated data generation code is available: https://github.com/GatorSense/FUMI/tree/master/gen_synthetic_data_code

This first experiment illustrates that the discriminative target concept estimated by MI-SMF and MI-ACE is not necessarily equal to the true underlying target signature. In this experiment, two simulated two-dimensional data sets (for easy visualization) were generated. The data was generated with two background endmembers and one target endmember (i.e., material signatures) and 10 positive and 10 negative bags; each bag contained 10 instances, with only 3 target instances in each positive bag. The target data points had a target proportion of 0.2 on average. For the first data set, target points are randomly mixed with either or both of the background materials. For the second data set, the target is only mixed with one of the background materials (thus simulating the case that targets only appear in certain contexts or around certain materials). Zero-mean Gaussian noise was added such that the data has an SNR of 20 dB.
In this experiment, when training MI-ACE, the global mean and covariance over both positive and negative bags are used during whitening (i.e., the global mean is the mean over all training data points across both positive and negative bags and the global covariance is the covariance matrix computed over all training data points across both positive and negative bags). This is done since, in the case of low-dimensional data, the normalization step using only the negative bag mean and covariance corrupts the structure of the data (note: this is not the case in high-dimensional hyperspectral data).
The true target vector used to generate the data, all data samples and bags, and the estimated discriminative target concepts using MI-SMF and MI-ACE are shown in Fig. 1(a) and (b). Using the estimated target concepts and the true target vector, the SMF and ACE detectors were applied to the data and the resulting ROC (receiver operating characteristic) curves are shown in Fig. 2(a) and (b). For the true target signatures, before applying SMF or ACE, the background mean is subtracted from the true target signature (as it improves performance of the detectors). For the first simulated 2D data set, both MI-SMF and MI-ACE estimate a signature very similar to the true target signature. However, for the second data set, neither MI-SMF nor MI-ACE recovers the true target signature from the data; instead, each estimates a target concept that maximizes target detection performance. In the second data set, the target is highly mixed with only one of the background endmembers and this additional contextual information is learned during MI-SMF and MI-ACE training and leveraged. For the second simulated data set, the area under the ROC curve (AUC) for the true target signature using the SMF and ACE detectors was 0.81 and 0.86, respectively. In contrast, the AUC for the MI-SMF and MI-ACE target concepts using the SMF and ACE detectors, respectively, was 0.86 and 0.95.

In the second set of simulated data experiments, a hyperspectral data set was simulated based on the linear mixing model using one target and three background spectra selected from the ASTER spectral library [35]. Specifically, the Red Slate, Verde Antique, Phyllite and Pyroxenite spectra from the rock class, with 211 bands spanning the wavelength range shown in Fig. 3, were used as endmembers to generate hyperspectral data. Red Slate was labeled as the target endmember. Results of MI-ACE and MI-SMF were compared to EM-DD (estimating both a point and scale value) [36, 29] and FUMI [33] such that the estimated target concepts can be compared. In all of these experiments, separate training and testing data sets were generated and zero-mean Gaussian noise was added to the simulated training and testing data such that the SNR was 20 dB. All of the testing data sets were generated with 25,000 true negative and 25,000 true positive points with an average target proportion value of 0.15.
The parameters for generating the testing data were held constant such that results obtained using different training sets could be directly compared. In all of these experiments, the target concepts estimated from the training data by MI-SMF, MI-ACE and FUMI were evaluated on the test data using the SMF detection statistic for MI-SMF and the ACE detection statistic for MI-ACE and FUMI. The target point and scaling values estimated by EM-DD were evaluated on test data using the prediction approach outlined by Zhang and Goldman [29]. Namely, for each test data point, the detection statistic is computed using (18):

$$D_{EMDD}(\mathbf{x}_i) = \exp\left(-\sum_{k=1}^{d}\left(w_k\,(x_{ik} - s_k)\right)^2\right) \tag{18}$$

where $\mathbf{x}_i$ is the $i$th test data point, $x_{ik}$ is the $k$th feature value of test point $i$, $d$ is the dimensionality of the data, $w_k$ is the estimated EM-DD scaling value for the $k$th dimension and $s_k$ is the EM-DD point value for the $k$th dimension.

FUMI was initialized as outlined in [33], and its parameters were determined manually to maximize FUMI performance. As outlined in [33], non-target signatures were initialized using the VCA algorithm [37] on all data in the negatively-labeled bags. Then, using these initial non-target signatures, the data point with the largest reconstruction error when representing each data point as a linear combination of the initial non-target signatures is set as the initial target signature. EM-DD scaling values were all initialized to one and the target point value was initialized to the same initial positive data point as used by FUMI.
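For concreteness, the EM-DD prediction rule in (18), a scaled squared distance to the estimated point passed through a negative exponential, can be sketched as below. The function name, argument names, and toy values are hypothetical.

```python
import math

def emdd_statistic(x, point, scale):
    """Eq. (18): exp of the negative scaled squared distance to the concept."""
    return math.exp(-sum((w * (xk - sk)) ** 2
                         for xk, sk, w in zip(x, point, scale)))

point = [0.2, 0.5, 0.9]   # toy EM-DD point values
scale = [1.0, 2.0, 0.0]   # a zero scale ignores that dimension entirely

stat_at_point = emdd_statistic([0.2, 0.5, 0.9], point, scale)
stat_ignored_dim = emdd_statistic([0.2, 0.5, 5.0], point, scale)
stat_far = emdd_statistic([0.2, 2.5, 0.9], point, scale)
print(stat_at_point, stat_ignored_dim, stat_far)
```

The statistic equals one at the estimated point, is unaffected by dimensions whose scaling value is zero, and decays rapidly with distance along heavily weighted dimensions.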
In the first experiment with the simulated hyperspectral data, the number of positive vs. negative bags was varied to investigate if there is any sensitivity of the MI-ACE and MI-SMF methods to the proportion of positive bags in the training data. In this experiment, the total number of bags in the training data was held constant at 50 with the proportion of positive bags being varied from 0.25, 0.15 to 0.05 (corresponding to 13, 8, and 3 positive bags, respectively). Each bag contained ten data points with positive bags containing only 2 true target points with an average target proportion of 0.05. This resulted in a very highly mixed data set.
Table I lists the AUC values for each experiment with results averaged over ten runs. MI-SMF and MI-ACE tended to outperform FUMI and EM-DD in this experiment. As one would expect, results tend to improve over all methods as more true target data points are available and degrade when only very few, highly mixed points are available (in this case, only 3 positive bags containing 2 true target points each, resulting in a total of 6 target points with an average target proportion of 0.05). However, even in the case of only a few positive bags, results improve when the average target proportion or the number of target points per bag is increased. For example, as a comparison, consider the case of 3 positive bags containing 2 true target points each but with the target proportion increased to 0.25 on average; then the average AUC (± std. dev.) over 10 runs improves to 0.995 ± 0.001, 0.994 ± 0.001, 0.840 ± 0.167, and 0.506 ± 0.110 for MI-SMF, MI-ACE, FUMI, and EM-DD, respectively. Fig. 4 shows example estimated target concepts for each of the four methods. When examining the spectra estimated by MI-SMF and MI-ACE in Fig. 4 and comparing them to the true signatures used to generate the data in Fig. 3, the discriminative ability of the estimated target concepts becomes apparent and can be interpreted. For example, observe that MI-SMF and MI-ACE estimate negative values around wavelength 0.5 µm. When examining Fig. 3, it can be seen that this corresponds to spectral wavelengths in which non-target signatures have relatively larger values as compared to the target; thus, the negative target signature values at these wavelengths impose a penalty for any large values at these wavelengths in test data. In contrast, large values of the estimated signatures around 1 µm correspond to a wavelength region in which the target signature has a relatively large value in comparison to the background materials. Finally, around 1.45 µm the estimated values are close to zero, indicating that the target signature and background materials have similar values at this wavelength.
Table I: AUC (mean ± std. dev.) for varying proportion of positive bags, with average run time.

Method | Positive bags: 0.25 | Positive bags: 0.15 | Positive bags: 0.05 | Avg. Run Time (s) |
---|---|---|---|---|
MI-SMF | 0.988 ± 0.008 | 0.987 ± 0.014 | 0.838 ± 0.303 | 0.006 |
MI-ACE | 0.917 ± 0.197 | 0.979 ± 0.030 | 0.716 ± 0.368 | 0.006 |
FUMI | 0.907 ± 0.054 | 0.818 ± 0.234 | 0.651 ± 0.346 | 1.082 |
EM-DD | 0.549 ± 0.117 | 0.568 ± 0.148 | 0.476 ± 0.087 | 0.822 |
In this experiment, some of the FUMI results are on par with those of MI-ACE. This is expected as FUMI is also a sub-pixel target characterization approach. However, there are a number of significant advantages of MI-SMF and MI-ACE over FUMI. One of these is running time. In general, MI-ACE and MI-SMF are faster than FUMI (since FUMI alternately computes sub-pixel proportion values for each data point and updates target and non-target concepts using a series of large matrix operations). To show this, the average running times of our MATLAB implementations of MI-SMF, MI-ACE, FUMI, and EM-DD (excluding initialization) for each simulated hyperspectral data experiment are listed alongside the results in Tables I-III. These experiments were run on a MacBook Pro with a 2.5 GHz quad-core Intel Core i7 processor and 16 GB of RAM. Also, FUMI attempts to recover the true target signature from the data (as opposed to a discriminative signature) and, thus, does not leverage contextual information when it may be beneficial, as discussed in the previous simulated data experiment. (However, in cases when the goal is to uncover the true target signature, FUMI would be a better choice than MI-ACE or MI-SMF.) Furthermore, the resulting target signatures of MI-ACE and MI-SMF can be interpreted to determine which wavelengths are informative for the target detection problem and their relationship with respect to the background. Large positive values in the resulting MI-ACE and MI-SMF target signature indicate that the target material has a larger response in those wavelengths when compared to the background. Similarly, large negative values in the target signature indicate the target material has a smaller response in those wavelengths when compared to the background. Values close to zero indicate that the associated wavelength is not informative for the target detection problem. Finally, FUMI requires setting several parameters whereas MI-SMF and MI-ACE are parameter free. Determining appropriate settings for FUMI's many parameters (through cross validation) can often be time consuming.
In this experiment, the number of target points in each positive bag was varied to be 25%, 15%, and 5% of the points in the bag (corresponding to 3, 2, and 1 points, respectively). The total number of bags was 50 and these were split evenly across positive and negative bags with each bag containing ten points total. The target proportion for each true target point was 0.05 on average. The resulting AUC values for each experiment, averaged over 10 runs, are shown in Table II. As can be seen, the results in this experiment are similar to the previous one in that performance improves for MI-ACE, MI-SMF and FUMI given more true target training data points.
Table II: AUC (mean ± std. dev.) for varying proportion of target points in positive bags, with average run time.

Method | Target points: 0.25 | Target points: 0.15 | Target points: 0.05 | Avg. Run Time (s) |
---|---|---|---|---|
MI-SMF | 0.984 ± 0.005 | 0.978 ± 0.011 | 0.925 ± 0.161 | 0.007 |
MI-ACE | 0.981 ± 0.006 | 0.958 ± 0.045 | 0.811 ± 0.224 | 0.007 |
FUMI | 0.968 ± 0.012 | 0.960 ± 0.015 | 0.906 ± 0.074 | 1.240 |
EM-DD | 0.485 ± 0.049 | 0.455 ± 0.047 | 0.494 ± 0.045 | 0.914 |
In the final simulated data experiment, the proportion of target in each true target point was varied to be 0.25, 0.15 and 0.05. The total number of bags was 50 and these were evenly split across positive and negative bags with each bag containing ten points total. The number of target points in each positive bag was set to two. Table III lists the AUC values for each experiment with results averaged over ten runs. From these results, it can be seen that MI-SMF, MI-ACE and FUMI are all effective with decreasing amounts of target proportion given enough true target data points in the training data (in this case, 50 true target points).
Table III: AUC (mean ± std. dev.) for varying mean target proportion in true target points, with average run time.

Method | Target proportion: 0.25 | Target proportion: 0.15 | Target proportion: 0.05 | Avg. Run Time (s) |
---|---|---|---|---|
MI-SMF | 0.989 ± 0.002 | 0.988 ± 0.002 | 0.984 ± 0.003 | 0.008 |
MI-ACE | 0.987 ± 0.001 | 0.986 ± 0.003 | 0.981 ± 0.004 | 0.008 |
FUMI | 0.985 ± 0.002 | 0.982 ± 0.004 | 0.964 ± 0.012 | 1.04 |
EM-DD | 0.469 ± 0.120 | 0.456 ± 0.074 | 0.486 ± 0.106 | 0.823 |
Although the proposed approach was motivated by our work in sub-pixel hyperspectral target detection, the method may be applicable to a variety of other data types and applications. Namely, MI-ACE and MI-SMF estimate discriminative target signatures from mixed and inaccurately labeled training data. The proposed method can be applied to any data set or application plagued with inaccurate training labels in which a target “signature” or linear discriminant is needed. To help illustrate this and to better visualize the ability of MI-ACE and MI-SMF to identify discriminative features, MI-ACE and MI-SMF were also applied to an MIL detection problem constructed using the AR Face Data Set [38]. The AR Face Data Set consists of frontal-pose images with 26 images per person (2 sessions, 13 per session) corresponding to different expressions, illuminations and occlusions. Pre-processed and cropped imagery of 50 male and 50 female subjects provided by Martinez and Kak [39] was used. Each image was down-sampled, and the raw gray scale values were used as features.
For the experiment, sun-glasses were selected as the target concept. Specifically, 50 positive training bags of 10 instances each were created. Each positive bag contained only two instances of randomly selected images of people wearing sun-glasses; the other eight were randomly chosen from images of people without sun-glasses. 50 negative bags were constructed by randomly selecting 10 instances per bag of images of individuals not wearing sun-glasses. Test data included all imagery that was not used for training. Admittedly, the AR dataset is not naturally an MIL problem. However, the purpose of these results is to simply illustrate that the approach can be effectively applied to other data types and to help the reader visualize the discriminative target signatures that are estimated by the MI-ACE and MI-SMF algorithms.
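The bag construction described above can be sketched as follows. The image identifiers and the sizes of the two pools are placeholders, `build_face_bags` is a hypothetical helper, and allowing an image to appear in more than one bag is an assumption, since the text does not say whether images are reused across bags.

```python
import random

def build_face_bags(sunglass_ids, plain_ids, n_pos=50, n_neg=50,
                    bag_size=10, targets_per_bag=2, seed=0):
    """Sample positive and negative bags of image ids and return the held-out ids."""
    rng = random.Random(seed)
    positive = []
    for _ in range(n_pos):
        # Two images with sun-glasses, eight without, per positive bag.
        bag = rng.sample(sunglass_ids, targets_per_bag)
        bag += rng.sample(plain_ids, bag_size - targets_per_bag)
        positive.append(bag)
    # Negative bags contain only images without sun-glasses.
    negative = [rng.sample(plain_ids, bag_size) for _ in range(n_neg)]
    # Everything not used for training is kept for testing.
    used = {i for bag in positive + negative for i in bag}
    test_ids = [i for i in sunglass_ids + plain_ids if i not in used]
    return positive, negative, test_ids

# Placeholder id pools; the real split depends on the data set annotations.
sunglass_ids = list(range(0, 200))
plain_ids = list(range(200, 1000))
positive, negative, test_ids = build_face_bags(sunglass_ids, plain_ids)
```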
MI-ACE and MI-SMF were applied to this data set along with the following comparison algorithms: FUMI [33], EMDD (EM-DD in which the target point and scale are estimated), EMDD-P (EM-DD in which only a target point is estimated) [29], DMIL [30, 31] and mi-SVM [40]. The mi-SVM algorithm was added to these experiments to include a comparison MIL approach that does not rely on estimating a target signature. FUMI was initialized as outlined in [33] and run with parameter settings of , and . These FUMI parameters were determined manually to maximize FUMI performance. EM-DD scaling values were all initialized to one and the target point value was initialized to the same initial positive data point as used by FUMI. Results are shown in Table IV and the target concepts estimated by MI-ACE, MI-SMF, FUMI, EMDD, EMDD-P, and DMIL are shown in Fig. 5. As can be seen by examining the table, MI-ACE, MI-SMF, and FUMI outperform the other methods. However, MI-ACE and MI-SMF have several significant advantages over FUMI in obtaining these detection results. Namely, MI-ACE and MI-SMF do not have parameters to set, whereas FUMI has a large number of parameters to tune. Furthermore, MI-ACE and MI-SMF run faster than FUMI.
Algorithm | NAUC (FAR=0.001) | NAUC (FAR=1) |
---|---|---|
MI-SMF | 0.998 | 1.000 |
MI-ACE | 1.000 | 1.000 |
FUMI | 0.998 | 1.000 |
EMDD | 0.210 | 0.772 |
EMDD-P | 0.776 | 0.987 |
DMIL | 0.798 | 0.991 |
mi-SVM | 0.671 | 0.989 |
For experiments on real hyperspectral target detection data, the MUUFL Gulfport Hyperspectral data set was used. This data set was collected over the University of Southern Mississippi-Gulfpark Campus and contains pixels with 72 bands corresponding to wavelengths from to at a spectral sampling interval [41]. The first four and last four spectral bands were removed from the data set due to noise. The spatial resolution is 1 m. Two flights over the area from this data (Gulfport Campus Flight 1 and Gulfport Campus Flight 3) were selected as cross-validated training and testing data. These flights were selected because they were flown at the same altitude and have the same spatial resolution. Throughout the scene, there are 57 emplaced man-made targets. The targets are cloth panels of four different colors: Brown (15 examples), Dark Green (15 examples), Faux Vineyard Green (FVG) (12 examples) and Pea Green (15 examples). The spatial locations of the targets are shown as scattered points over an RGB image of the scene in Fig. 6. This data set poses a very challenging target detection task because many of the targets are partially or fully occluded by Live Oak trees on the campus. Furthermore, the targets vary in size: for each target type, there are targets that are , and in area. Thus, a target with an area of 0.25 m² covers at most (in the case when it is fully within the 1 m × 1 m footprint of a single pixel) a 0.25 proportion of the pixel signature. Many of these targets straddle multiple pixels and are occluded, resulting in a highly mixed, sub-pixel target detection task. For each target in the training flight, a rectangular region around its ground truth point was labeled as a positive bag; this size was chosen because the GPS device used to record the ground truth locations had an accuracy of 5 m. Thus, there are 57 positive bags in each training set in this experiment.
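Forming a positive bag from a window around a ground truth point can be sketched as below. The function `extract_bag` and its `half_width` parameter are hypothetical, the window size used here is illustrative rather than the value used in the experiments, and the image is represented as a plain nested list for simplicity.

```python
def extract_bag(image, row, col, half_width):
    """Return the spectra in a square window centred on (row, col), clipped to the image."""
    n_rows, n_cols = len(image), len(image[0])
    r0, r1 = max(0, row - half_width), min(n_rows, row + half_width + 1)
    c0, c1 = max(0, col - half_width), min(n_cols, col + half_width + 1)
    return [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]

# Toy 6 x 6 image with 3 bands per pixel; the values are placeholders.
image = [[[float(r), float(c), 0.0] for c in range(6)] for r in range(6)]

interior_bag = extract_bag(image, 3, 3, half_width=2)  # full window, 25 pixels
corner_bag = extract_bag(image, 0, 0, half_width=2)    # clipped at the border, 9 pixels
```

Every pixel in the window enters the bag, so a bag built this way typically contains many background pixels alongside the few that actually overlap the target.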
MI-SMF and MI-ACE were evaluated on this data using the Normalized Area Under the receiver operating characteristic Curve (NAUC), in which the area was normalized out to a false alarm rate (FAR) of $1 \times 10^{-3}$ false alarms/$m^2$ [42]. Given the spatial resolution of the imagery, this maximum FAR corresponds to one false alarm per 1000 pixels. An NAUC value of one corresponds to zero false alarms and 100% detection. MI-ACE and MI-SMF were compared to the FUMI [33], EM-DD, EM-DD-P [29], mi-SVM [40], and DMIL [30, 31] algorithms. For all methods except mi-SVM and EM-DD, target concepts were estimated on the training flight and then used to perform detection on the test flight using the ACE detection statistic. During application of ACE on the test data, the background mean and covariance were estimated from the negative instances of the training data. Since mi-SVM does not estimate a target concept, the detection statistic used for the mi-SVM approach was the signed distance to the decision hyperplane estimated on the training data. For EM-DD, in order to effectively make use of the scale parameters learned, the detection statistic in (18) was used as outlined in [29]. Given an initialization, all methods obtained consistent results when re-run except for FUMI, EM-DD and EM-DD-P, whose initialization procedures include a stochastic step. Thus, the results reported for FUMI, EM-DD and EM-DD-P are the median results over five runs of the algorithm on the same data.
In the first MUUFL Gulfport experiment, one negative bag composed of all instances in the training data outside of any positive bag was used during training. The results of MI-SMF, MI-ACE and comparison methods are shown in Table V. As can be seen, MI-SMF and/or MI-ACE consistently provide either the best or second-best result in comparison to the other approaches.
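The NAUC metric can be sketched as the area under the detection-rate versus false-alarm-rate curve, truncated at a maximum FAR and divided by that maximum. The version below is a simplified per-instance illustration: the function `nauc` and its `area` argument are hypothetical, ties between scores are not handled specially, and object-level scoring of targets (as may be used in practice) is not modeled.

```python
def nauc(scores, labels, max_far, area=None):
    """Area under the PD-vs-FAR step curve up to max_far, divided by max_far.

    `area` is the quantity false alarms are counted against (for example the
    scene area in square metres); by default it is the number of negatives,
    so FAR becomes the fraction of negatives that are false alarms.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    area = float(n_neg) if area is None else float(area)
    # Sweep the threshold from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pd, far, total = 0.0, 0.0, 0.0
    for i in order:
        if labels[i] == 1:
            pd += 1.0 / n_pos
        else:
            new_far = far + 1.0 / area
            # PD is constant between consecutive false alarms.
            total += pd * (min(new_far, max_far) - far)
            far = new_far
            if far >= max_far:
                break
    else:
        # All samples consumed before reaching max_far: extend the last PD value.
        total += pd * max(0.0, max_far - far)
    return total / max_far
```

With a perfect ranking every target is detected before the first false alarm, so the function returns one, which matches the interpretation of an NAUC of one given above.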
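Alg. | Brown (train Flight 1, test Flight 3) | Dark Gr. (train Flight 1, test Flight 3) | Faux Vine Gr. (train Flight 1, test Flight 3) | Pea Gr. (train Flight 1, test Flight 3) | Brown (train Flight 3, test Flight 1) | Dark Gr. (train Flight 3, test Flight 1) | Faux Vine Gr. (train Flight 3, test Flight 1) | Pea Gr. (train Flight 3, test Flight 1) |
---|---|---|---|---|---|---|---|---|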
MI-SMF | 0.448 | 0.382 | 0.579 | 0.316 | 0.760 | 0.501 | 0.650 | 0.384 |
MI-ACE | 0.474 | 0.390 | 0.485 | 0.333 | 0.760 | 0.483 | 0.593 | 0.380 |
FUMI | 0.433 | 0.377 | 0.707 | 0.267 | 0.753 | 0.502 | 0.470 | 0.394 |
mi-SVM | 0.353 | 0.265 | 0.437 | 0.265 | 0.333 | 0.368 | 0.243 | 0.268 |
EM-DD-P | 0.467 | 0.0 | 0.067 | 0.014 | 0.0 | 0.0 | 0.291 | 0.0 |
EM-DD | 0.0 | 0.0 | 0.055 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
DMIL | 0.418 | 0.382 | 0.288 | 0.021 | 0.751 | 0.310 | 0.083 | 0.111 |
Fig. 7 shows the target concepts estimated by MI-SMF, MI-ACE, FUMI, mi-SVM, EM-DD-P and DMIL. As can be seen, MI-SMF, MI-ACE and FUMI tend to find target concepts with similar spectral shape; this agrees with the performance reported in Tab. V in which MI-SMF, MI-ACE and FUMI tend to have the best performance in this experiment. One observation that can be made is that the results when tested on Flight 1 outperform those when tested on Flight 3. This is due to the challenging nature of this detection problem. As stated above, the targets are heavily mixed sub-pixel targets and there is heavy occlusion from tree coverage throughout the scene. Although the flights cover the same general area, they do not have identical flight paths. Thus, there are differences in viewing angle between the flights, differences in the field of view associated with each pixel, and, also, there may have been movement in tree branches. These factors would result in differences in the number of sub-pixel targets that are visible across each flight. In this case, testing on Flight 3 is more challenging. This is seen consistently across many experiments and methods. Similarly, across many experiments and methods, Dark Green and Pea Green targets tend to have lower detection rates (likely due to these targets being more occluded).
Both Flight 1 and Flight 3 were collected on the same day shortly after one another and, thus, were collected under similar environmental conditions. The spectral signatures of materials vary across differing environmental conditions [34]. Since the MI-ACE and MI-SMF algorithms learn discriminative target signatures that distinguish the target spectral signature from background material, the performance of MI-ACE and MI-SMF depends on the magnitude and spectral shape of the target and background materials maintaining the same relative relationship. In other words, if we train MI-ACE or MI-SMF on data collected in one set of environmental conditions and test on data collected in different environmental conditions, the performance of the methods will depend on whether the relative magnitudes of the target and background materials are similar across the environmental conditions. If, for example, MI-ACE placed a large positive weight on a band because the target material has a large spectral response compared to the background at that wavelength, MI-ACE would perform well on the test data if the target material still had a comparatively large spectral response at that wavelength. However, results would degrade if the relative values of the target and background materials were swapped. As with all supervised learning methods, the ability of the approach to generalize to test data depends on how well the training data distribution matches or encompasses what is seen during test.
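A toy numerical example may help make this point concrete. It uses a plain linear score on background-subtracted spectra, which is a simplification of the whitened ACE statistic, and all weights and spectra below are hypothetical two-band values chosen only for illustration.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical learned weights: band 0 gets a large positive weight because the
# target was brighter than the background in that band during training.
w = [1.0, 0.1]
background_mean = [0.3, 0.5]

target_similar = [0.8, 0.5]  # test target still brighter than background in band 0
target_swapped = [0.1, 0.5]  # relative relationship reversed in band 0

score_similar = dot(w, [t - b for t, b in zip(target_similar, background_mean)])
score_swapped = dot(w, [t - b for t, b in zip(target_swapped, background_mean)])
```

Here `score_similar` is positive while `score_swapped` is negative, so the same learned weights would rank the swapped target below the background.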
In the second MUUFL Gulfport experiment, the influence of the construction of the negative bags was examined. In the previous experiment, only one negative bag consisting of all instances outside of any positive bag was used. Using one negative bag in MI-SMF and MI-ACE results in each instance in the negative bag having equal influence on the result. Thus, if one or a few background/non-target materials compose the majority of the instances in the negative bag, these materials have a larger influence on the estimated target concept. In the case of EM-DD and EM-DD-P, negative bag construction influences results heavily because, for these approaches, a single instance from each bag is used to represent the bag during target concept and scale updates. Thus, given only one negative bag, only one negative instance is used to represent all of the negative data. In this experiment, we investigate the use of multiple negative bags. To construct the multiple negative bags, all instances outside of any positive bag are clustered using the K-means clustering algorithm and each resulting cluster is used as a separate negative bag. The purpose of this approach is to cluster together the instances with similar spectral shape and magnitude. When running the K-means algorithm, K was first set to 15 such that the number of negative bags is equal to the number of positive bags for most of the target types. K was then set to 100 and, finally, to the number of non-target instances in the data (i.e., each instance is an individual negative bag). Tables VI, VII, and VIII list the results of MI-SMF, MI-ACE and comparison methods for K = 15, K = 100, and K equal to the number of non-target instances, respectively. When studying Tab. VI - VIII, it can be seen that MI-SMF and MI-ACE are fairly consistent in their results; thus, MI-SMF and MI-ACE are not heavily influenced by negative bag structure.
The FUMI, mi-SVM, and DMIL methods are also not influenced by negative bag structure and results are similar or the same as those with one large negative bag. However, EM-DD-P is heavily influenced by the number of negative bags. Results for EM-DD-P improve as more negative bags are included with the best results provided when each non-target point is an individual negative bag. However, even with a negative bag for each non-target instance, MI-SMF and MI-ACE provide competitive results with EM-DD-P.
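The negative bag construction by clustering can be sketched with a basic Lloyd's K-means written in plain Python. The function `kmeans_bags`, the random initialization from the data, and the fixed iteration count are assumptions made for this illustration; the clustering implementation and settings used in the experiments are not specified here.

```python
import random

def kmeans_bags(points, k, n_iter=20, seed=0):
    """Cluster points with Lloyd's K-means and return one bag per non-empty cluster."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct points drawn from the data.
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    bags = [[points[i] for i in range(len(points)) if assign[i] == c]
            for c in range(k)]
    return [bag for bag in bags if bag]

# Toy two-band "spectra"; with k equal to the number of points,
# every instance becomes its own negative bag.
negatives = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
few_bags = kmeans_bags(negatives, 2)
singleton_bags = kmeans_bags(negatives, len(negatives))
```

Setting `k` to the number of instances reproduces the limiting case described above, in which each non-target instance is an individual negative bag.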
MI-SMF and MI-ACE, two multiple instance target characterization approaches, are introduced as methods to estimate hyperspectral target signatures from imprecisely labeled training data. Advantages of MI-SMF and MI-ACE include that they have a straight-forward implementation, fast running time, and are free of parameter settings. Experimental results show that MI-SMF and MI-ACE provide competitive and state-of-the-art results when compared to existing multiple instance concept learning methods. Although this work was motivated by sub-pixel hyperspectral target detection, the MI-ACE and MI-SMF methods are general approaches for extracting discriminative target signatures given high dimensional data points (or feature vectors) that are paired with uncertain training labels.
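Alg. | Brown (train Flight 1, test Flight 3) | Dark Gr. (train Flight 1, test Flight 3) | Faux Vine Gr. (train Flight 1, test Flight 3) | Pea Gr. (train Flight 1, test Flight 3) | Brown (train Flight 3, test Flight 1) | Dark Gr. (train Flight 3, test Flight 1) | Faux Vine Gr. (train Flight 3, test Flight 1) | Pea Gr. (train Flight 3, test Flight 1) |
---|---|---|---|---|---|---|---|---|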
MI-SMF | 0.461 | 0.382 | 0.540 | 0.320 | 0.763 | 0.503 | 0.651 | 0.374 |
MI-ACE | 0.496 | 0.389 | 0.479 | 0.333 | 0.763 | 0.486 | 0.565 | 0.349 |
FUMI | 0.433 | 0.377 | 0.707 | 0.267 | 0.753 | 0.502 | 0.470 | 0.394 |
mi-SVM | 0.353 | 0.265 | 0.437 | 0.265 | 0.333 | 0.368 | 0.243 | 0.268 |
EM-DD-P | 0.038 | 0.0 | 0.0 | 0.0 | 0.284 | 0.012 | 0.130 | 0.019 |
EM-DD | 0.0 | 0.0 | 0.0 | 0.086 | 0.0 | 0.0 | 0.0 | 0.0 |
DMIL | 0.418 | 0.382 | 0.288 | 0.021 | 0.751 | 0.310 | 0.083 | 0.111 |
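Alg. | Brown (train Flight 1, test Flight 3) | Dark Gr. (train Flight 1, test Flight 3) | Faux Vine Gr. (train Flight 1, test Flight 3) | Pea Gr. (train Flight 1, test Flight 3) | Brown (train Flight 3, test Flight 1) | Dark Gr. (train Flight 3, test Flight 1) | Faux Vine Gr. (train Flight 3, test Flight 1) | Pea Gr. (train Flight 3, test Flight 1) |
---|---|---|---|---|---|---|---|---|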
MI-SMF | 0.454 | 0.382 | 0.560 | 0.312 | 0.762 | 0.506 | 0.651 | 0.379 |
MI-ACE | 0.476 | 0.389 | 0.484 | 0.333 | 0.762 | 0.486 | 0.558 | 0.366 |
FUMI | 0.433 | 0.377 | 0.707 | 0.267 | 0.753 | 0.502 | 0.470 | 0.394 |
mi-SVM | 0.353 | 0.265 | 0.437 | 0.265 | 0.333 | 0.368 | 0.243 | 0.268 |
EM-DD-P | 0.122 | 0.386 | 0.0 | 0.267 | 0.066 | 0.013 | 0.545 | 0.265 |
EM-DD | 0.001 | 0.0 | 0.0 | 0.020 | 0.046 | 0.0 | 0.0 | 0.026 |
DMIL | 0.418 | 0.382 | 0.288 | 0.021 | 0.751 | 0.310 | 0.083 | 0.111 |
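Alg. | Brown (train Flight 1, test Flight 3) | Dark Gr. (train Flight 1, test Flight 3) | Faux Vine Gr. (train Flight 1, test Flight 3) | Pea Gr. (train Flight 1, test Flight 3) | Brown (train Flight 3, test Flight 1) | Dark Gr. (train Flight 3, test Flight 1) | Faux Vine Gr. (train Flight 3, test Flight 1) | Pea Gr. (train Flight 3, test Flight 1) |
---|---|---|---|---|---|---|---|---|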
MI-SMF | 0.448 | 0.382 | 0.579 | 0.316 | 0.760 | 0.501 | 0.650 | 0.384 |
MI-ACE | 0.474 | 0.390 | 0.485 | 0.333 | 0.760 | 0.483 | 0.593 | 0.380 |
FUMI | 0.433 | 0.377 | 0.707 | 0.267 | 0.753 | 0.502 | 0.470 | 0.394 |
mi-SVM | 0.353 | 0.265 | 0.437 | 0.265 | 0.333 | 0.368 | 0.243 | 0.268 |
EM-DD-P | 0.420 | 0.382 | 0.478 | 0.250 | 0.759 | 0.507 | 0.533 | 0.422 |
EM-DD | 0.0 | 0.064 | 0.0 | 0.0 | 0.0 | 0.0 | 0.055 | 0.0 |
DMIL | 0.418 | 0.382 | 0.288 | 0.021 | 0.751 | 0.310 | 0.083 | 0.111 |
[Derivation of Target Signature Updates]
In order to derive the update equation for the target signature of MI-ACE and MI-SMF, we can write the Lagrangian for the MI-ACE objective function in (14) as:
(19) |
where is the Lagrange multiplier. The derivative of the Lagrangian with respect to is:
(20) |
We can then set (20) to zero and solve for :
(21) |
Then, define as:
(22) |
To determine the value of the Lagrange multiplier, , we must determine the value for that enforces the constraint that . Thus, which results in the final update equation for :
(23) |
The derivation for the update equation for the MI-SMF target signature is identical to what is shown above except is used in place of in all of the preceding equations in this Section.
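The steps of this derivation can be sketched in generic notation. The symbols below are assumptions introduced for illustration: $\hat{\mathbf{s}}$ denotes the (whitened) target signature, $\lambda$ the Lagrange multiplier, and $\hat{\mathbf{t}}$ a vector collecting all data-dependent terms of the objective, under the assumption that the objective in (14) is linear in $\hat{\mathbf{s}}$ and is maximized subject to a unit-norm constraint. This is a sketch of the argument under those assumptions, not a restatement of (19) to (23).

```latex
% Assumed form of the constrained problem: maximize \hat{s}^T \hat{t}
% subject to \hat{s}^T \hat{s} = 1.
\begin{align}
\mathcal{L}(\hat{\mathbf{s}}, \lambda)
  &= \hat{\mathbf{s}}^{T}\hat{\mathbf{t}}
     - \lambda\left(\hat{\mathbf{s}}^{T}\hat{\mathbf{s}} - 1\right) \\
\frac{\partial \mathcal{L}}{\partial \hat{\mathbf{s}}}
  &= \hat{\mathbf{t}} - 2\lambda\hat{\mathbf{s}} \\
\frac{\partial \mathcal{L}}{\partial \hat{\mathbf{s}}} = \mathbf{0}
  \quad &\Rightarrow \quad
  \hat{\mathbf{s}} = \frac{1}{2\lambda}\,\hat{\mathbf{t}} \\
\hat{\mathbf{s}}^{T}\hat{\mathbf{s}} = 1
  \quad &\Rightarrow \quad
  \frac{\hat{\mathbf{t}}^{T}\hat{\mathbf{t}}}{4\lambda^{2}} = 1
  \quad \Rightarrow \quad
  \lambda = \tfrac{1}{2}\left\lVert \hat{\mathbf{t}} \right\rVert \\
\hat{\mathbf{s}}
  &= \frac{\hat{\mathbf{t}}}{\left\lVert \hat{\mathbf{t}} \right\rVert}
\end{align}
```

The positive root for $\lambda$ is taken because it corresponds to the maximizer of the objective; the negative root gives the minimizer. Under these assumptions, the update simply rescales $\hat{\mathbf{t}}$ to unit length.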
This material is based upon work supported by the National Science Foundation under Grant No. IIS-1350078 - CAREER: Supervised Learning for Incomplete and Uncertain Data. The authors would also like to acknowledge James Theiler and Amanda Ziemann for their insightful discussions.
2015 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Oct 2015, pp. 1–7.
Int. J. Comput. Vision, vol. 114, no. 2-3, pp. 288–305, Sep. 2015.
R. Rahmani, S. A. Goldman, H. Zhang, J. Krettek, and J. E. Fritts, “Localized content based image retrieval,” in Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval. ACM, 2005, pp. 227–236.
S. Andrews, I. Tsochantaridis, and T. Hofmann, “Support vector machines for multiple-instance learning,” in Advances in Neural Inform. Process. Syst., 2002, pp. 561–568.