Introduction
Atmospheric aerosols play an important role in earth’s climate system [pachauri2007ipcc], and can also pose harmful effects on human health when inhaled. Therefore, accurately characterizing the global aerosol distribution is valuable for many reasons.
The total light extinction caused by aerosols over a vertical column in the atmosphere of unit cross section is known as the aerosol optical depth (AOD) [amt2012]. Much effort has been placed in observing AOD from space and groundbased instruments [Holben92, kaufman_operational_1997, Torres98, holben_aeronet_1998, Mishchenko99, ichoku2002, remer_global_2008, remer_modis_2005, chu2002validation, Liu2004].
A comparison between the AOD measurements made by MODIS and AERONET shows that there are biases between the two observations. Figure 1 shows the distribution of the bias between the AOD measured by MODIS instrument at 550 nm. In this paper, we will delineate our attempts to understand or explain the factors behind the bias. We performed a comprehensive brute force search for all possible combination of variables, and used neural network as one of the machine learning toolbox to predict the AOD values. Moreover, we used mutual information between the predicted and observed variables as the measure of correlation between them. We found the set which reproduced the AOD, producing the highest value of mutual information with respect to the AERONET data set. The set is then identified as the most relevant set of variables for training the machine learning algorithm. The method of identification of relevant set of variable is a more general problem, which could be applied to similar multivariate problems. Therefore, the presented example should be viewed as a test study. However, the framework and implementation will find its use for general purpose search for relevant variables.
Data and Methodology
The data used in this study were derived from Multisensor Aerosol Products Sampling System (MAPSS), which derives the data from multiple sources such as original MODIS and AERONET datasets and provides level2 aerosol scientific data sets [amt2012, ichoku2002]. MAPSS provides a consistent and uniform sampling of aerosol products by identifying the MODIS data pixels within approximately 27.5 km of the AERONET sites. We analyzed the MODISaqua land data set. Readers interested in the details of the MAPSS are directed to [amt2012]. In the present paper we present the analysis of only the MODISAqua land data set.
We applied a machine learning technique to predict the AOD values given the other components as the regressors to the learning module. The method developed in this paper is constructed with plugandplay in mind. The machinelearning module can be any regression tools of choice. In our case, we simply choose, as an example, Neural networks (NN) method. NNs are widely used in pattern recognition, machine learning and artificial intelligence. In addition, NNs have found many applications in other fields such as geoscience, remote sensing, oceanography, etc.
While training the NN as the regression tool, we regard an observation data set as the product of n input variables, say , feeding into the regression machine. Therefore, the observed output variable, AOD, is some function of these input variables. In terms of neural networks, as shown in figure 2, the output of the neuron can be written as the weighted sum of inputs
(1) 
where is the transfer function, represents the weight from unit to unit , and represents the input variables to the neuron. During training, the NN weights are adjusted appropriately to learn the data. The learning and adjustments of the weights are inspired by the synaptic learning behavior of neurons. The learning module is explored with all possible combinations of the input variables.
The following variables have been used as the regressor to the neural network. We constructed all possible set of variables, this came out to be total of 32,781 possible combinations.

Aerosol optical depth at 550 nm (AOD0550)

Aerosol optical depth at 470 nm (AOD0470)

Aerosol optical depth at 660 nm (AOD0660)

Mean reflectance at 470 nm (mref0470)

Mean reflectance at 550 nm (mref0550)

Surface reflectance at 660 nm (surfre0660)

Surface reflectance at 470 nm (surfre0470)

Surface reflectance at 2100 nm (surfre2100)

Cloud fraction from land aerosol cloud mask (cfrac)

Quality assurance (QAavg)

Solar zenith angle (SolarZenith)

Solar azimuth angle (SolarAzimuth)

Viewing zenith angle (SensorZenith)

Sensor azimuth angle (SensorAzimuth)

Scattering angle (ScatteringAngle)
For each combination set, one at a time, we trained the NN with AERONET AOD as the target variable. Once training phase is completed, we then predicted the AERONET AOD values from the trained network. The NN algorithm used a feedforward back propagation algorithm with a hidden layer having 200 nodes. The training was done by the LevenbergMarquardt algorithm with meansquared error as the performance factor. We used the Matlab toolbox. We randomly split the training data set into three portions. The first portion is used to train the NN weights using an iterative process so that for each iteration, the root mean square (RMS) error of the neural network is computed by using the second portion of the data. We used the RMS error to determine the convergence of our training. When the training is complete, we use the final of the data as the validation data set.
Once the neural network training is completed, we have obtained a mapping between the set of input and output variables. Therefore, the most relevant set of variables is the one that can best reproduce the target data. We explored all combinations of variables, trained each of the combination, and compared which provided the fit of the observed AERONET AOD data. The end product is the result of regression between the input variables to the NN and the observed AERONET AOD.
The measure of similarity between predicted and observed AOD is measured by mutual information, which is described in next section. Since the learning module constructs a mapping between the set of input and output variables, the most relevant set of variables is the one that can best reproduce the target data.
Mutual Information
The Correlation coefficient (Pearson’s correlation) is a widely used measure of dependence between two variables, and represents the normalized measure of the strength of their linear relationships. The correlation coefficient
between two random variables
and with expected values and and is defined as(2) 
where, E is the expected value operator, cov means covariance, and, a widely used alternative notation for Pearson’s correlation.
The correlation coefficient is defined only if both of the standard deviations are finite and both of them are nonzero. The correlation coefficients range from 1 to 1. The correlation coefficient values close to 1 (or 1) suggest that there is a positive (or negative) linear relationship between the data columns, whereas the values close to or equal to 0 suggest there is no linear relationship between the data columns. It can only be applied to the cases of linear relationship between two variables.
Mutual information quantifies the mutual dependence between two variables by taking into account all of the characteristics of the variables in the Probability Distribution Function (PDF). Mutual information is defined as follows in discrete form:
(3) 
which is a special case of a measure called KullbackLeibler divergence
[kullback_information_1997, cover_elements_2006]. If and are statistically independent, then(4) 
In this case, the mutual information becomes 0, showing independency. A proper mapping of the form
(5) 
normalizes the measure of general correlation as depicted by the MI [joe_relative_1989, granger_using_1994, dionisio_mutual_2004].
In the case when X and Y are normally distributed,
(6) 
where, Then, the mutual information reduces to
(7) 
So, that,
(8) 
This relation shows the generality of the normalized correlation measure.
There are several methods to estimate MI from data
[moon1995estimation, Reshef16122011, DenizME12]. We applied the Variable Bin Width Histogram Approach [darbellay1999estimation, mutingCode] to compute the MI between the observed and predicted AOD. Higher values indicate better agreement between the observed and predicted set, and thus are the best indicators of the input variables needed to assess a relevant set of variables.Results and Conclusions
We applied machine learning technique, specifically neural network method in supervised learning mode, and explored all possible combination of variables. By the brute force exploration of all possible combinations, we found the best set of variables by comparing a measure of correlation between the predicted and observed variables. The measure of agreement between the observed data and the predicted data was obtained by using the mutual information. Therefore, we could identify the most relevant set of variables which could result into the maximum correlation between the predicted and observed data.
Figure 3 shows the best result we obtained as a result of the brute force search. It shows a comparison between the AERONET AOD, which was used as the target set for NN training, and the predicted AOD as the result of the NN training. The mutual information and correlation coefficients values for the predicted and observed AOD values from the variableset as regressor has been presented in table 1. For brevity we only show the first 15 rows of the table. The best prediction set had the MI value of to the AERONET AOD, and was result of an input set consisting of the following variables as regressors: AOD at 470 nm, and AOD at 660 nm, mean reflectance at 470 nm, and mean reflectance at 550 nm, surface reflectance at 660nm, 470 nm, 2100 nm, cloud fraction, quality assurance values, solar zenith angle, solar azimuth angle, zenith angle, sensor azimuth angle and scattering angle.
There are several developments which could further benefit the methodology of finding relevant variables. In the future studies we will use mutual information with error bars. In the future work we will present cross examination of multiple machine learning techniques to explain the biascorrection using the framework developed in this paper.
We presented a brute force search method, which could be useful in many other cases involving multivariate exploration. This could be useful in finding the most relevant set of factors to get insights from physical data.
Combination  Mutual Information (MI)  Corr corrcoeff () 

0.771  0.927  
0.769  0.926  
0.768  0.926  
0.766  0.926  
0.765  0.926  
0.764  0.925  
0.762  0.921  
0.761  0.924  
0.760  0.924  
0.759  0.925  
0.759  0.925  
0.756  0.924  
0.756  0.921  
0.755  0.923  
0.755  0.924 
Acknowledgements
The support from NASA ACCESS AeroStat project (NNX10AM94G) is gratefully acknowledged. Special thanks are due to Dr. Maksym Petrenko and Dr. Charles Ichoku for providing Multisensor Aerosol Products Sampling System (MAPSS) data sets [amt2012]. We also thank the AERONET investigators, and the MODIS team for their wonderful work.
Comments
There are no comments yet.