The ChemCam instrument on the Mars rover Curiosity uses laser-induced breakdown spectroscopy (LIBS), a type of atomic emission spectroscopy, to remotely analyze Martian rocks (Wiens et al., 2012). Spectral information is used to extract the qualitative and quantitative chemical content (QQCC) of a material sample, where the QQCC can be seen as an unobserved state vector representing the sample. Both linear and nonlinear supervised learning techniques have been applied to map LIBS spectra to QQCC with good accuracy (Forni et al., 2013; Boucher et al., 2015; Castorena et al., 2021). In this work, we build upon these efforts by proposing a framework for constructing the probability density function (PDF) of LIBS spectra. In addition, unlike previous methods, which produce only point estimates of QQCC, we propose and evaluate a method for uncertainty quantification (UQ) on point predictions of QQCC.
Many real-world spectra, including LIBS spectra, are characterized by a large number of features, complicating the construction of the data PDF due to the curse of dimensionality. We develop a novel framework for constructing low-dimensional PDFs suitable for downstream inference (e.g., sampling, density estimation, outlier detection, or unsupervised representation learning) using state-of-the-art neural density estimators with normalizing flows (NF) on spectral latent spaces. This framework allows us to generate realistic spectral samples on a reduced space and, using an inverse transformation, project them back to the physically interpretable space. Furthermore, we propose a bootstrapping approach for quantifying uncertainty in predictions of the unobserved state vector corresponding to each spectrum. We demonstrate the capabilities of the proposed approach to construct the PDF of a LIBS spectral data set and to learn a mapping to the known QQCC with uncertainty. The validated framework can then be employed for QQCC prediction and direct UQ of novel samples, such as artificial samples generated by the NF model as well as spectra collected directly on Mars.
To the best of our knowledge, this is the first work to construct normalizing flows on spectral latent spaces; the approach can be readily employed for any kind of spectroscopy data. We show that the proposed framework provides a straightforward way to perform downstream inference tasks and direct UQ for high-dimensional spectral data.
2.1 Problem statement
Assume $\mathbf{x}$ is a $d$-dimensional random vector with non-negative elements and a true data distribution $p(\mathbf{x})$, which represents the spectral signals. Our goal is to learn an invertible, stable mapping between the approximate data distribution $\hat{p}(\mathbf{x})$ and a latent distribution $p(\mathbf{z})$ (e.g., Gaussian) that will allow for fast evaluation of various inference tasks. However, estimating the full joint density of very high-dimensional spectra is a challenging and often intractable task. Therefore, we introduce a second mapping $g$ that transforms $\mathbf{x}$ to $\mathbf{y} \in \mathbb{R}^{k}$, where $k \ll d$, to discover the spectral latent representation of the signals. Next, we learn an invertible mapping between $\mathbf{y}$ (spectral latent variable) and $\mathbf{z}$ (latent variable). This framework allows us to generate novel samples on the reduced spectral latent space and use the inverse transformation to map back to the original space, thereby approximating the true data distribution.
We also want to estimate an unobserved state vector $\mathbf{c} \in \mathbb{R}^{m}$, where $m$ is the vector dimensionality. For the ChemCam application we consider, this represents the QQCC, an 8-dimensional vector with the relative weight percentages of 8 major oxides commonly found on Mars. Given a training dataset of LIBS spectra and associated compositions (samples generated on Earth), we are interested in constructing a surrogate of the mapping $\mathbf{x} \mapsto \mathbf{c}$ that will allow us to make predictions of the chemical concentration of novel samples. To calculate uncertainties related to these predictions, we propose an approach based on bootstrapping that allows us to quantify both model and data uncertainties and thus assign measures of accuracy to sample estimates. This approach can then be employed for UQ of data generated by the normalizing flow model.
2.2 Spectral NMF latent space
Consider $N$ observations of the random vector $\mathbf{x}$ and let the data matrix be $\mathbf{X} \in \mathbb{R}^{N \times d}$. We use non-negative matrix factorization (NMF) to decompose $\mathbf{X}$ into a product of a non-negative coefficient matrix $\mathbf{W} \in \mathbb{R}^{N \times k}$ and a non-negative basis matrix $\mathbf{H} \in \mathbb{R}^{k \times d}$, such that $\mathbf{X} \approx \mathbf{W}\mathbf{H}$ (or equivalently, each spectrum $\mathbf{x}_i \approx \mathbf{w}_i \mathbf{H}$) (Paatero and Tapper, 1994; Wang and Zhang, 2012). NMF decomposes each data point into a linear combination of the basis vectors. The NMF optimization problem consists of minimizing the Frobenius norm $\|\mathbf{X} - \mathbf{W}\mathbf{H}\|_F$.
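The decomposition above can be sketched with scikit-learn's `NMF` estimator. The matrix sizes, number of components, and initialization below are illustrative placeholders on synthetic data, not the paper's actual settings:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 50))          # 100 synthetic "spectra" with 50 channels

# factorize X ~= W @ H with non-negativity constraints on both factors
nmf = NMF(n_components=15, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)           # coefficient matrix, shape (100, 15)
H = nmf.components_                # basis matrix, shape (15, 50)

# relative Frobenius reconstruction error, the quantity NMF minimizes
X_hat = W @ H
err = np.linalg.norm(X - X_hat, "fro") / np.linalg.norm(X, "fro")
print(W.shape, H.shape, round(err, 3))
```

Each row of `W` is the low-dimensional spectral latent representation of the corresponding spectrum, and `W @ H` is the inverse mapping back to the physically interpretable space.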
2.3 Inference via a spectral normalizing flow
We propose the construction of a normalizing flow model on the latent space of the LIBS spectra, obtained by the NMF decomposition, to learn the underlying probability distribution of the spectral latent variable. The inverse NMF mapping introduced in the previous section can be used to project generated samples back to the physically interpretable space (i.e., $\mathbf{x} \approx \mathbf{y}\mathbf{H}$).
Normalizing flows are a powerful class of likelihood-based generative models which transform a base density into a target density by a series of deterministic and invertible transformations (Kobyzev et al., 2020). Consider the base density $p_Z(\mathbf{z})$ (latent variable), the more complex target density $p_Y(\mathbf{y})$ (spectral latent variable), and an invertible mapping $f_\theta: \mathbf{z} \mapsto \mathbf{y}$. Under the change-of-variables formula we can compute the log-likelihood of $\mathbf{y}$ as

$\log p_Y(\mathbf{y}) = \log p_Z\big(f_\theta^{-1}(\mathbf{y})\big) + \log \left| \det \frac{\partial f_\theta^{-1}(\mathbf{y})}{\partial \mathbf{y}} \right|,$

where $\theta$ represents the trainable parameters of the flow. To train the NF model, the negative log-likelihood (NLL) given by this expression is minimized.
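As a minimal numerical check of the change-of-variables formula, consider a 1-D affine flow $y = a z + b$ applied to a standard Gaussian base density (a toy case chosen only for illustration, since its target density is known in closed form):

```python
import numpy as np
from scipy.stats import norm

a, b = 2.0, 1.0                      # invertible affine flow y = a*z + b
y = 3.0

# change of variables: log p_Y(y) = log p_Z(f^{-1}(y)) + log|d f^{-1}/dy|
z = (y - b) / a                      # inverse transformation
logp_flow = norm.logpdf(z) + np.log(abs(1.0 / a))

# ground truth: if z ~ N(0, 1), then y = a*z + b ~ N(b, a^2)
logp_true = norm.logpdf(y, loc=b, scale=abs(a))
print(logp_flow, logp_true)
```

The two log-densities agree to machine precision, which is exactly the identity the NLL training objective relies on.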
Here, we parameterize the normalizing flow with a sequence of real-valued non-volume-preserving (RealNVP) transformations (Dinh et al., 2016). The RealNVP model composes two types of invertible transformations: additive coupling layers and rescaling. The model uses the so-called affine coupling layers for the coupling flows, which are simple and computationally efficient. The transformation can be written as

$\mathbf{y}_{1:d'} = \mathbf{z}_{1:d'},$
$\mathbf{y}_{d'+1:k} = \mathbf{z}_{d'+1:k} \odot \exp\big(s(\mathbf{z}_{1:d'})\big) + t(\mathbf{z}_{1:d'}),$

where $\odot$ is the Hadamard (element-wise) product and $\exp$ is applied to each element of $s(\mathbf{z}_{1:d'})$. The above transformation performs a one-to-one mapping of the first $d'$ elements and scales and shifts the remaining $k - d'$. By incorporating coupling layers into the flow, the elements are permuted across layers so that a different set of elements is copied each time. Here we model $s$ and $t$ as neural networks. Once the PDF is learned, downstream inference tasks can be performed straightforwardly. In the next section, we are interested in predicting the elemental composition of novel samples, with their associated uncertainty.
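A single affine coupling layer of this form can be sketched in NumPy; the `s_net`/`t_net` lambdas below are trivial stand-ins for the trained neural networks, chosen only to demonstrate that the layer is exactly invertible regardless of what $s$ and $t$ compute:

```python
import numpy as np

def affine_coupling_forward(z, s_net, t_net, d_split):
    """One RealNVP-style affine coupling layer (forward pass)."""
    z1, z2 = z[:d_split], z[d_split:]
    y1 = z1                                        # identity on the first block
    y2 = z2 * np.exp(s_net(z1)) + t_net(z1)        # scale and shift the rest
    return np.concatenate([y1, y2])

def affine_coupling_inverse(y, s_net, t_net, d_split):
    """Exact inverse: subtract the shift, undo the scaling."""
    y1, y2 = y[:d_split], y[d_split:]
    z1 = y1
    z2 = (y2 - t_net(y1)) * np.exp(-s_net(y1))
    return np.concatenate([z1, z2])

# toy 'networks': invertibility holds for arbitrary functions of the first block
s_net = lambda x: np.tanh(x.sum()) * np.ones(3)
t_net = lambda x: x.mean() * np.ones(3)

z = np.array([0.5, -1.0, 2.0, 0.1, -0.3])
y = affine_coupling_forward(z, s_net, t_net, d_split=2)
z_rec = affine_coupling_inverse(y, s_net, t_net, d_split=2)
print(np.allclose(z, z_rec))
```

Stacking several such layers while permuting which elements pass through unchanged, as described above, yields the full flow.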
2.4 Uncertainty quantification via bootstrapping
We now aim to construct a mapping between the LIBS signal signatures and QQCC. We train $m$ shallow neural networks, one for each oxide element. The models are formed as

$\hat{c}_j(\mathbf{y}) = \mathbf{w}_j^{\top} \sigma\big(\mathbf{W}_j \mathbf{y} + \mathbf{b}_j\big) + b_j, \quad j = 1, \dots, m,$

where $m$ is the total number of oxides to be determined, $\mathbf{W}_j$, $\mathbf{w}_j$, $\mathbf{b}_j$, $b_j$ denote the trainable weights and biases, and $\sigma$ is the activation function. We learn the parameters of the models with a training set $\{(\mathbf{y}_i, \mathbf{c}_i)\}_{i=1}^{N}$, where $N$ is the total number of samples. The main advantage of such models is that they are both fast to train and result in very good accuracy scores (see Results).
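The per-oxide regressors can be sketched with scikit-learn's `MLPRegressor`. The hidden-layer width, learning rate, and synthetic data below are illustrative assumptions, not the paper's settings; only the overall structure (one single-hidden-layer ReLU network per oxide, trained with SGD) follows the text:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
Y = rng.random((200, 15))             # synthetic spectral latent coefficients
C = Y @ rng.random((15, 8))           # synthetic targets for 8 oxides

# one shallow network per oxide element
models = []
for j in range(8):
    m = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                     solver="sgd", learning_rate_init=0.01,
                     max_iter=2000, random_state=0)
    m.fit(Y, C[:, j])
    models.append(m)

preds = np.column_stack([m.predict(Y) for m in models])
print(preds.shape)
```

Keeping the networks small and separate per oxide is what makes the bootstrap refitting in the next step computationally cheap.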
To quantify uncertainties related to predictions of elemental compositions, we use bootstrapping (Kumar and Srivastava, 2012), a statistical resampling method which allows us to assign measures of accuracy to a sample estimate. In general, bootstrapping performs as well as parametric prediction intervals, and its implementation is straightforward. For our application, it does not result in a high computational cost, given the choice of simple bootstrapped shallow neural networks. In the case of more complex models, methods that leverage the last layer of the network can be employed, as they have shown good performance (Brosse et al., 2020). Given a new observation $\mathbf{y}^*$, we can write

$\hat{c}^{*(b)} = \hat{f}^{(b)}(\mathbf{y}^*) + \hat{\varepsilon}^{(b)},$

where $\hat{f}^{(b)}(\mathbf{y}^*)$ represents the model estimate at the $b$-th bootstrap iteration (model uncertainty) and $\hat{\varepsilon}^{(b)}$ the predicted residual between true and predicted values, which can be modeled for a training dataset with a regression model (data uncertainty). To measure the quality of prediction intervals, we compute the coverage of validation samples (the rate at which the actual values fall within the range of the prediction interval).
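A minimal sketch of this bootstrap scheme on a 1-D toy regression problem: each replicate refits the model on a resampled training set (model uncertainty) and adds a sampled in-sample residual (data uncertainty). A linear model stands in for the shallow networks purely to keep the example fast; the resampling logic is the same:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 300
X = rng.uniform(-2, 2, size=(n, 1))
y = 1.5 * X[:, 0] + rng.normal(0, 0.3, size=n)    # noisy synthetic data

B = 200                                           # bootstrap replicates
x_new = np.array([[1.0]])
boot_preds = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)              # resample with replacement
    model = LinearRegression().fit(X[idx], y[idx])
    resid = y[idx] - model.predict(X[idx])        # in-sample residuals
    # model uncertainty (refit prediction) + data uncertainty (residual draw)
    boot_preds[b] = model.predict(x_new)[0] + rng.choice(resid)

lo, hi = np.percentile(boot_preds, [2.5, 97.5])   # 95% prediction interval
print(round(lo, 2), round(hi, 2))
```

Coverage is then computed by checking, over a validation set, how often the true value falls inside the per-sample interval.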
Consider the LIBS spectra matrix $\mathbf{X} \in \mathbb{R}^{N \times d}$. We perform NMF with $k = 15$ (selected by 5-fold cross-validation), obtaining the transformed data matrix $\mathbf{W} \in \mathbb{R}^{N \times 15}$. Next, we construct a NF model based on the RealNVP architecture with 5 coupling layers and a Gaussian distribution as the base density. We should highlight here the computational advantages of this approach: constructing a NF model on the 15-dimensional latent space is extremely fast, as the training process required less than 1 minute of CPU time. In Figure 2, we show a novel random sample generated by the NF model, which is transformed back to the original space with inverse NMF.
We consider 8 oxides and therefore construct $m = 8$ single-hidden-layer neural network models, with ReLU activation functions, trained with stochastic gradient descent (SGD). To measure the accuracy of the results, we compute the coefficient of determination, calculated as $R^2 = 1 - SS_{\mathrm{res}} / SS_{\mathrm{tot}}$, where $SS_{\mathrm{res}}$ represents the sum of squares of residuals and $SS_{\mathrm{tot}}$ the total sum of squares. We show the accuracy of the models in Figure 1 for a holdout set of 140 samples. The reported results show that point estimates are close to the optimal regression (1:1 line) and that overall the response is comparable to state-of-the-art deep CNN approaches (Castorena et al., 2021). For two of the oxides, estimates show larger deviations, which are not considered significant due to the small oxide wt.% values in these regions.
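The coefficient of determination used above reduces to a few lines; the toy arrays below are illustrative only:

```python
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(round(r2_score(y_true, y_pred), 3))  # 0.98
```

An $R^2$ of 1 corresponds to a perfect fit on the 1:1 line, which is why points hugging that line in Figure 1 indicate high accuracy.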
Finally, we perform bootstrapping for a dataset of LIBS spectra samples collected on Earth (with associated ground truth), and we compute the prediction intervals for the same holdout dataset of 140 samples. In Figure 3, we plot the distribution of bootstrap predictions together with the ground truth for a random Earth sample, where box plots represent the first to third quartile and green stars are the ground truth. Table 1 shows the coverage calculated for 95% prediction intervals for all holdout samples, and we see that the intervals nearly achieve the nominal coverage. The validated approach can therefore be used for making predictions with uncertainty for novel LIBS spectra, generated either by the normalizing flow model or from Martian samples directly collected by ChemCam.
We showed that the proposed framework provides a straightforward way to perform downstream inference tasks for high-dimensional spectral data by identifying a spectral latent space, estimated as a parsimonious representation of the data, and constructing a spectral normalizing flow model on the reduced space. The proposed approach is ideal for modeling high-dimensional data and enables the learning of complex distributions in a fast and efficient way. Beyond general high-dimensional inference, the proposed UQ approach allows for predictions of state vectors associated with novel out-of-distribution data or data generated by the trained normalizing flow.
5 Broader impact
This work provides a robust approach to construct low-dimensional probability densities on spectral latent spaces and quantify uncertainties for predictions related to the elemental compositions of spectral samples generated directly from neural density estimators. Our approach has immediate application for inference and direct UQ of spectral data in different fields such as astronomy, geology, audio signal processing, bioinformatics and more. We believe that this work does not have future societal or ethical consequences.
This project was supported by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number LDRD-20210043DR. Research was performed while K.K. was an Applied Machine Learning Summer Research Fellow at LANL.
- A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy. Spectrochimica Acta Part B: Atomic Spectroscopy 107, pp. 1–10. Cited by: §1.
- On last-layer algorithms for classification: decoupling representation from uncertainty estimation. arXiv preprint arXiv:2001.08049. Cited by: §2.4.
- Deep spectral CNN for laser induced breakdown spectroscopy. Spectrochimica Acta Part B: Atomic Spectroscopy 178, pp. 106125. Cited by: §1, §3.
- Density estimation using real NVP. arXiv preprint arXiv:1605.08803. Cited by: §2.3.
- Independent component analysis classification of laser induced breakdown spectroscopy spectra. Spectrochimica Acta Part B: Atomic Spectroscopy 86, pp. 31–41. Cited by: §1.
- Normalizing flows: an introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §2.3.
- Bootstrap prediction intervals in non-parametric regression with applications to anomaly detection. In Proc. 18th ACM SIGKDD Conf. Knowl. Discovery Data Mining. Cited by: §2.4.
- Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5 (2), pp. 111–126. Cited by: §2.2.
- Optimization of clustering analyses for classification of ChemCam data from Gale Crater, Mars. In European Planetary Science Congress, pp. EPSC2020–867. Cited by: §2.2.
- Nonnegative matrix factorization: a comprehensive review. IEEE Transactions on Knowledge and Data Engineering 25 (6), pp. 1336–1353. Cited by: §2.2.
- The ChemCam instrument suite on the Mars Science Laboratory (MSL) rover: body unit and combined system tests. Space Science Reviews 170 (1), pp. 167–227. Cited by: §1.