Likelihood informed dimension reduction for inverse problems in remote sensing of atmospheric constituent profiles

09/08/2017 ∙ by Otto Lamminpää, et al. ∙ FMI Beta 0

We use likelihood informed dimension reduction (LIS) (T. Cui et al. 2014) for inverting vertical profile information of atmospheric methane from ground based Fourier transform infrared (FTIR) measurements at Sodankylä, Northern Finland. The measurements belong to the word wide TCCON network for greenhouse gas measurements and, in addition to providing accurate greenhouse gas measurements, they are important for validating satellite observations. LIS allows construction of an efficient Markov chain Monte Carlo sampling algorithm that explores only a reduced dimensional space but still produces a good approximation of the original full dimensional Bayesian posterior distribution. This in effect makes the statistical estimation problem independent of the discretization of the inverse problem. In addition, we compare LIS to a dimension reduction method based on prior covariance matrix truncation used earlier (S. Tukiainen et al. 2016).



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Atmospheric composition measurements have an increasingly crucial role in monitoring the green house gas concentrations in order to understand and predict changes in climate. The warming effect of greenhouse gases, such as carbon dioxide () and methane (), is based on the absorption of electromagnetic radiation originating from the sun by these trace gases. This mechanism has a strong theoretical base and has been confirmed by recent observations FeldmanEtal15 .

Remote sensing measurements of atmospheric composition, and greenhouse gases in particular, are carried out by ground-based Fourier transform infrared (FTIR) spectrometers, and more recently by a growing number of satellites (for example SCIAMACHY, ACE-FTS, GOSAT, OCO-2). The advantage of satellite measurements is that they provide global coverage. They are used for anthropogenic emission monitoring, detecting trends in atmospheric composition and studying the effects of biosphere, to name but a few examples. Accurate ground-based measurements are crucial to satellite measurement validation, and the global Total Carbon Column Observing Network (TCCON WunchEtAl15 ) of FTIR spectrometers, consisting of around 20 measurement sites around the world, is widely used as a reference DilsEtAl14 . The FTIR instrument looks directly at sun, returning an absorption spectrum as measured data.

Determining atmospheric gas density profiles, or retrieval, from the absorption spectra is an ill-defined inverse problem as the measurement contains only a limited amount of information about the state of the atmosphere. Based on prior knowledge and using the Bayesian approach to regularize the problem, the profile retrieval is possible, provided that our prior accurately describes the possible states that may occur in the atmosphere. When retrieving a vertical atmospheric profile, the dimension of the estimation problem depends on the discretization. For accurate retrievals a high number of layers are needed, leading to a computationally costly algorithms. However, fast methods are required for the operational algorithm. For this purpose, different ways of reducing the dimension of the problem have been developed. The official operational TCCON GGG algorithm WunchEtAl15 solves the inverse problem by scaling the prior profile based on the measured data. This method is robust and computationally efficient, but only retrieves one piece of information and thus can give largely inaccurate results about the density profiles.

An improved dimension reduction method for the FTIR retrieval based on reducing the rank of the prior covariance matrix was used by Tukiainen et al. Simo using computational methods developed by Solonen et al. Solonen

. This method confines the solution to a subspace spanned by the non-negligible eigenvectors of the prior covariance matrix. This approach allows a retrieval using more basis functions than the operational method and thus gives more accurate solutions. However, the prior has to be hand tuned to have a number of non-zero singular values that correspond to the number of degrees of freedom for the signal in the measurement. Moreover, whatever information lies in the complement of this subspace remains unused.

In this work, we introduce an analysis method for determining the number of components the measurement can provide information from Rodgers , as well as the likelihood informed subspace dimension reduction method for non-linear statistical inverse problems TC1 ; TC2 . We show that these two formulations are in fact equal. We then proceed to implement a dimension reduction scheme for the FTIR inverse problem using adaptive MCMC sampling HaarioEtAl1 ; HaarioEtAl2 to fully characterize the non-linear posterior distribution, and show that this method gives an optimal result with respect to Hellinger distance to the non-approximated full dimensional posterior distribution. In contrast with the previously implemented prior reduction method, the likelihood informed subspace method is also shown to give the user freedom to use a prior derived directly from an ensemble of previously conducted atmospheric composition measurements.

2 Methodology

We consider the atmospheric composition measurement carried out at the FMI Arctic Research Centre, Sodankylä, Finland KiviEtAl16 . The on-site Fourier transform infrared spectrometer (FTIR) measures solar light arriving to the device directly from the sun, or more precisely, the absorption of solar light at different wavelengths within the atmosphere. From the absorption spectra of different trace gases (, ) we can compute the corresponding vertical density profiles, i.e. the fraction of the trace gas in question as a function of height.

Let us consider the absorption spectrum with separate wavelengths. The solar light passing through the atmosphere and hitting the detector can be modeled using the Beer-Lambert law, which gives, for wavelengths , the intensity of detected light as


where is the intensity of solar light when it enters the atmosphere, the atmosphere has absorbing trace gases, is the absorption coefficient of gas , which depends on height and on the wavelength , and is the density of gas at height . The second degree polynomial and the constant in (1) are used to describe instrument related features and the continuity properties of the spectrum. In reality, solar light is scattered on the way by atmospheric particles. This phenomenon is relatively weak in the wavelength band we are considering in this work, so it will be ignored for simplicity.

The absorption in continuous atmosphere is modeled by discretizing the integral in equation (1) into a sum over atmospheric layers and assuming a constant absorption for each separate layer. This way, a discrete computational forward model

can be constructed, giving an absorption spectrum as data produced by applying the forward model to a state vector

describing the discretized atmospheric density profile for a certain trace gas. In this work, we limit ourselves to consider the retrieval of atmospheric methane ().

2.1 Bayesian formulation of the inverse problem

Consider an inverse problem of estimating unknown parameter vector from observation ,


where our physical model is describe by the forward model

and the random variable

represents the observation error arising from instrument noise and forward model approximations. In the Bayesian approach to inverse problems KaipioSomersalo our uncertainty about is described by statistical distributions. The solution to the problem is obtained as posterior distribution of conditioned on a realization of the data and depending on our prior knowledge. By the Bayes’ formula, we have


where is the posterior distribution, the likelihood and the prior distribution. The proportionality comes from a constant that does not depend on the unknown . In this work, we assume the prior to be Gaussian, , e.g.


Also, the additive noise is assumed to be zero-mean Gaussian with known covariance matrix, , so the likelihood will have form


When the forward model is non-linear, the posterior distribution can be explored by Markov chain Monte Carlo (MCMC) sampling. When the dimension of the unknown is hight, for example by discretization of the inverse problem, MCMC is known to be inefficient. In this paper, we utilize dimension reduction to be able to make MCMC more efficient in high dimensional and high CPU problems.

2.2 Prior reduction

The operational GGG algorithm for the FTIR retrieval problem WunchEtAl15 is effectively one dimensional as it only scales the prior mean profile. However, there are about three degrees of freedom in the FTIR signal for the vertical profile information. To construct basis functions that could utilize this information a method that uses prior reduction was developed in Simo

. It is based on the singular value decomposition on the prior covariance matrix,


which allows further decomposition as


If the prior can be chosen so that most of the singular values are negligible, then the rank of the prior covariance matrix can be reduced by considering only the first singular values and vectors:


The unknown has an approximate representation by basis vectors from the columns of and using a reduced dimensional parameter as


By the construction, the random vector has a simple Gaussian prior, , which allow us to write the approximate posterior as


Now, instead running MCMC in the full space defined by , we can sample the low dimensional parameter and retain the approximation of the full posterior by equation (9).

2.3 Likelihood-informed subspace

The prior reduction approach depends on the ability to construct a realistic prior that can be described by only a few principle components. For the FTIR retrieval problem this is possible to some extent Simo . However, there are several possible caveats. We have to manually manipulate the prior covariance matrix to have a lower rank, which can lead to information loss as the solution will be limited to a subspace defined by the reduced prior only.

In atmospheric remote sensing the information content of the measurement is an important concept to be considered when designing the instruments and constructing the retrieval methodology, we refer to book by Rodgers Rodgers .

Consider a linearized version of the inverse problem in equation (2),


with Gaussian prior and noise. The forward model is assumed to be differentiable, and denotes the Jacobian matrix of the forward model with elements . Using Cholesky factorizations for the known prior and error covariances,


we can perform pre-whitening of the problem by setting


Now the problem can be written as


with and a priori .

As the unknown and the error are assumed to be independent, the same holds for the scaled versions. We can compare the prior variability of the observation depending on and that coming from the noise by


The variability in that depends only on the parameter depends itself on and it can be compared to the unit matrix that has the contribution from the scaled noise. The directions in which are larger than unity are those dominated by the signal. Formally this can be seen by diagonalizing the scaled problem by the singular value decomposition,


and setting


The transformations and conserve the unit covariance matrix. In other words, is distributed with covariance . This is a diagonal matrix, and the elements of vector that are not masked by the measurement error are those corresponding to the singular values of the pre-whitened Jacobian

. Furthermore, degrees of freedom for signal and noise are invariant under linear transformations

Rodgers , so the same result is also valid for the original .

Another way to compare the information content of the measurement relative to the prior was used in TC1 . This is to use the Rayleigh quotient


where and is the Gauss-Newton approximation of Hessian matrix of the data misfit function


Directions for which are the ones in which the likelihood contains information relative to the prior. This follows from the fact that the th eigenvector of the prior-preconditioned Gauss-Newton Hessian


maximizes the Rayleigh quotient over a subspace and the directions for which correspond to the first eigenvalues of . We call these vectors the informative directions of the measurement.

To see the correspondence for the two approaches for the informative directions we notice that for it holds that


The eigenvalues of matrix less than unity correspond to the singular values less than unity of the scaled Jacobian . The corresponding eigenvectors are the same as the right singular vectors of . The informative and non-informative directions for a simple 2-dimensional Gaussian case are illustrated in Figure 1.

Next, we use the informative directions of the measurement to reduce the dimension of the inverse problem. Consider approximations for the posterior of the form


where is rank projection matrix. In TC1 and TC2 it was shown that for any given , there exists a unique optimal projection that minimizes the Hellinger distance between the approximative rank posterior and the full posterior. Furthermore, using the connection to Rodgers’ formalism, the optimal projection can be obtained explicitly with the following definition.

Definition 1 (Lis)

Let be a matrix containing the first left singular vectors of the scaled Jacobian . Define


The rank LIS projection for the posterior approximation (22) is given by


The range of projection is a subspace of state space spanned by the column vectors of matrix . We call the subspace the likelihood-informed subspace (LIS) for the linear inverse problem, and its complement the complement subspace (CS).

Figure 1: Illustration of an informative direction and a non-informative direction using a 2-dimensional Gaussian case. Here, the likelihood has only one informative component, so the remaining direction for the posterior is obtained from the prior.
Definition 2

The matrix of singular vectors forms a complete orthonormal system in and we can define


and the projection can be written as


Define the LIS-parameter and the CS-parameter as


The parameter can now be naturally decomposed as


Using this decomposition and properties of multivariate Gaussian distributions, we can write the prior as


and approximate the likelihood by using the informative directions,


which leads us to the approximate posterior


When the forward model is not linear, the Jacobian and Hessian matrices depend on the parameter and the criterion (18) only holds point wise. To extend this local condition into a global one, we consider the expectation of the local Rayleigh quotient over the posterior,


The expectation is with respect to the posterior distribution, which is not available before the analysis. In practice, an estimate is obtained by Monte Carlo,


where is a set of samples from some reference distribution which will be discussed later in this work. We can now use the singular value decomposition to find a basis for the global LIS analogously to the linear case.

The advantage of LIS dimension reduction is that it is sufficient to use MCMC to sample the low-dimensional from the reduced posterior , and form the full space approximation using the known analytic properties of the Gaussian complement prior .

3 Results

To solve the inverse problem related to the FTIR measurement Simo , we use adaptive MCMC HaarioEtAl2 ; Laine and SWIRLAB Swirlab toolboxes for Matlab. The results from newly implemented LIS-algorithm as well as from the previous prior reduction method are compared against a full dimensional MCMC simulation using the Hellinger distance of approximations to the full posterior. We use a prior derived from an ensemble of atmospheric composition measurements by the ACE satellite BernathEtAl05 . The vertical prior distribution, prior covariance and prior singular values are illustrated in Figure 2.

In Figure 3, we show the results of our retrievals using full-space MCMC, compared with LIS dimension reduction and prior reduction using 4 basis vectors in each method. The retrievals are further compared against accurate in-situ measurements made using AirCore balloon soundings KarionEtAl10 which are available for the selected cases, also included in Figure 3. In this example, the Monte-Carlo estimator (33) for in equation (33) was computed using 1000 samples drawn from the Laplace approximation , where and are the posterior MAP and covariance, respectively, obtained using optimal estimation Rodgers .

Figure 2:

The prior derived from an ensemble of ACE satellite measurements. Left: Full prior profile, mean with dashed line and 95% probability limits in grey. Top right: covariance matrix derived from the measurements. Bottom right: first 20 singular values of the prior covariance matrix.

In order to compare the performance of MCMC methods, we define the sample speed of a MCMC run as

Definition 3

The effective sample size of a MCMC chain is given by


where is the length of the MCMC chain and is lag- autocorrelation for parameter ripleystoch . Define the sample speed of an MCMC chain as


where is the total computation time of the MCMC chain.

For the MCMC runs shown in Figure 3, we get as corresponding sample speeds as samples per second:

Figure 3: Atmospheric density profile retrieval results. Retrieved posterior in green, prior in gray, and in-situ AirCore measurement in red. The color shading indicates areas where 95% of the profiles are. Right: MCMC with in full space. Middle: MCMC with LIS. Right: MCMC with prior reduction.

In order to compare the approximate posteriors obtained from prior reduction and LIS-dimension reduction against the full posterior, we use the discrete Hellinger distance,


where and are discrete representations of the full and approximate posterior distributions obtained from histograms of corresponding MCMC runs. The Hellinger distances of both approximations to the full posterior can be seen in Figure 4 together with the corresponding sample speeds, both as a function of the number of singular vectors used. In Figure 4 have also visualized the first four singular vectors used in prior reduction and LIS method for the example retrieval in Figure 3.

Figure 4: Left: Hellinger distances to full posterior and sample speeds of corresponding MCMC runs as functions of singular vectors used in the approximation. Top right: first 4 singular vectors from prior reduction. Bottom right: first four singular vectors of forming the LIS basis.

4 Conclusions

Although both of the discussed dimension reduction methods provide roughly the same computational gains in the performance of the MCMC sampler, we see from Figure 4 that while using an empirical prior, the prior reduction method requires a lot more singular vectors to achieve the same Hellinger distance from the full posterior as the LIS method, which gets really close already with 4 singular vectors. We conclude that the LIS method gives an efficient MCMC sampling algorithm to solve the inverse problem arising from the FTIR retrieval, with an additional improvement of allowing the direct usage of an empirical prior.

We thank Dr. Rigel Kivi from FMI Arctic Research Centre, Sodankylä, Finland for the AirCore and TCCON data. We thank Dr. Tiangang Cui from Monash University and the mathematical research institute MATRIX in Australia for organizing a workshop where a part of this research was performed. This work has been supported by Academy of Finland (projects INQUIRE, IIDA-MARI and CoE in Inverse Modelling and Imaging) and by EU’s Horizon 2020 research and innovation programme (project GAIA-CLIM).


  • (1) Bernath, P.F., McElroy, C.T., Abrams, M.C., Boone, C.D., Butler, M., Camy-Peyret, C., Carleer, M., Clerbaux, C., Coheur, P.F., Colin, R., DeCola, P., DeMazière, M., Drummond, J.R., Dufour, D., Evans, W.F.J., Fast, H., Fussen, D., Gilbert, K., Jennings, D.E., Llewellyn, E.J., Lowe, R.P., Mahieu, E., McConnell, J.C., McHugh, M., McLeod, S.D., Michaud, R., Midwinter, C., Nassar, R., Nichitiu, F., Nowlan, C., Rinsland, C.P., Rochon, Y.J., Rowlands, N., Semeniuk, K., Simon, P., Skelton, R., Sloan, J.J., Soucy, M.A., Strong, K., Tremblay, P., Turnbull, D., Walker, K.A., Walkty, I., Wardle, D.A., Wehrle, V., Zander, R., Zou, J.: Atmospheric Chemistry Experiment (ACE): Mission overview. Geophysical Research Letters 32(15) (2005). DOI 10.1029/2005GL022386
  • (2) Cui, T., Martin, J., Marzouk, Y.M., Solonen, A., Spantini, A.: Likelihood-informed dimension reduction for nonlinear inverse problems. Inverse Problems 30(11), 114,015 pp. 28 (2014). DOI 10.1088/0266-5611/30/11/114015
  • (3) Dils, B., Buchwitz, M., Reuter, M., Schneising, O., Boesch, H., Parker, R., Guerlet, S., Aben, I., Blumenstock, T., Burrows, J.P., Butz, A., Deutscher, N.M., Frankenberg, C., Hase, F., Hasekamp, O.P., Heymann, J., De Mazière, M., Notholt, J., Sussmann, R., Warneke, T., Griffith, D., Sherlock, V., Wunch, D.: The Greenhouse Gas Climate Change Initiative (GHG-CCI): comparative validation of GHG-CCI CHY/ENVISAT and TANSO-FTS/GOSAT CO and CH retrieval algorithm products with measurements from the TCCON. Atmospheric Measurement Techniques 7(6), 1723–1744 (2014). DOI 10.5194/amt-7-1723-2014
  • (4) Feldman, D.R., Collins, W.D., Gero, P.J., Torn, M.S., Mlawer, E.J., Shippert, T.R.: Observational determination of surface radiative forcing by CO from 2000 to 2010. Nature 519, 339–343 (2015). DOI 10.1038/nature14240
  • (5) Haario, H., Laine, M., Mira, A., Saksman, E.: DRAM: Efficient adaptive MCMC. Statistics and Computing 16(4), 339–354 (2006). DOI 10.1007/s11222-006-9438-0
  • (6) Haario, H., Saksman, E., Tamminen, J.: An adaptive Metropolis algorithm. Bernoulli 7(2), 223–242 (2001). DOI 10.2307/3318737
  • (7) Kaipio, J., Somersalo, E.: Statistical and Computational Inverse Problems. Springer-Verlag, New York (2005). DOI 10.1007/b138659
  • (8) Karion, A., Sweeney, C., Tans, P., Newberger, T.: AirCore: An Innovative Atmospheric Sampling System. Journal of Atmospheric and Oceanic Technology 27(11), 1839–1853 (2010). DOI 10.1175/2010JTECHA1448.1
  • (9) Kivi, R., Heikkinen, P.: Fourier transform spectrometer measurements of column CO at Sodankylä, finland. Geoscientific Instrumentation, Methods and Data Systems 5(2), 271–279 (2016). DOI 10.5194/gi-5-271-2016
  • (10) Laine, M.: MCMC toolbox for Matlab (2013). URL
  • (11) Ripley, B.D.: Stochastic Simulation. Wiley (1987)
  • (12) Rodgers, C.D.: Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific, Singapore (2000)
  • (13) Solonen, A., Cui, T., Hakkarainen, J., Marzouk, Y.: On dimension reduction in Gaussian filters. Inverse Problems 32(4), 045,003 (2016). DOI 10.1088/0266-5611/32/4/045003
  • (14) Spantini, A., Solonen, A., Cui, T., Martin, J., Tenorio, L., Marzouk, Y.: Optimal low-rank approximations of Bayesian linear inverse problems. SIAM Journal on Scientific Computing 37(6), A2451–A2487 (2015). DOI 10.1137/140977308
  • (15) Tukiainen, S.: Swirlab toolbox for Matlab (2017). URL
  • (16) Tukiainen, S., Railo, J., Laine, M., Hakkarainen, J., Kivi, R., Heikkinen, P., Chen, H., Tamminen, J.: Retrieval of atmospheric CH4 profiles from Fourier transform infrared data using dimension reduction and MCMC. Journal of Geophysical Research: Atmospheres 121, 10,312–10,327 (2016). DOI 10.1002/2015JD024657
  • (17) Wunch, D., Toon, G.C., Sherlock, V., Deutscher, N.M., Liu, X., Feist, D.G., Wennberg, P.O.: The total carbon column observing network’s GGG2014 data version. Tech. rep., Oak Ridge, Tennessee, U.S.A.: Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory (2015). DOI 10.14291/tccon.ggg2014.documentation.R0/1221662