Survey of Deep Learning Methods for Inverse Problems

11/07/2021 ∙ by Shima Kamyab, et al. ∙ University of Waterloo, Shiraz University

In this paper we investigate a variety of deep learning strategies for solving inverse problems. We classify existing deep learning solutions for inverse problems into three categories of Direct Mapping, Data Consistency Optimizer, and Deep Regularizer. We choose a sample of each inverse problem type, so as to compare the robustness of the three categories, and report a statistical analysis of their differences. We perform extensive experiments on the classic problem of linear regression and three well-known inverse problems in computer vision, namely image denoising, 3D human face inverse rendering, and object tracking, selected as representative prototypes for each class of inverse problems. The overall results and the statistical analyses show that the solution categories have a robustness behaviour dependent on the type of inverse problem domain, and specifically dependent on whether or not the problem includes measurement outliers. Based on our experimental results, we conclude by proposing the most robust solution category for each inverse problem class.





1 Introduction

An inverse problem [bertero1998introduction, fieguth2010statistical, stuart2010inverse] seeks to formulate the solution to estimating the unknown state underlying a measured system. Specifically, a forward function $F(\cdot)$ describes the relationship of the measured output $m$,

$$m = F(z) + \nu, \tag{1}$$

as a function of the system state $z$, subject to a measurement noise $\nu$. The objective of the inverse problem is to estimate $z$ as a function of the given measurement $m$, assuming a detailed knowledge of the system, $F(\cdot)$; if $F(\cdot)$ is not known or is only partially known, the problem becomes blind or semi-blind [lucas2018using].

Different perspectives lead to different types of inverse problems. From the perspective of data type, two classes of inverse problems are restoration and reconstruction [arridge2019solving], where restoration problems have the same domain for measurement and state (e.g., signal or image denoising), while reconstruction problems have different domains (e.g., 3D shape inference). Next, from the perspective of modeling, inverse problems are classified into static and dynamic problems, where the static case seeks a single estimate $\hat{z}$, consistent with some prior model on the state $z$ and the forward model $F(\cdot)$, whereas the dynamic case seeks estimates $\hat{z}(t)$ over time, consistent with an initial prior and a dynamic model. In this paper we will examine each of these inverse problems.

Existing analytical methods for solving inverse problems take advantage of domain knowledge to regularize and constrain the problem to obtain numerically-stable solutions. These methods are classified into four categories [arridge2019solving]:

  • Analytic inversion, having the objective of finding a closed form, possibly approximate, of the inverse $F^{-1}$ of the forward model. This category of solutions will be highly problem dependent.

  • Iterative methods, which optimize the data inconsistency term for measurement $m$, state $z$, and forward model $F$:

    $$\hat{z} = \arg\min_{z} \, \| m - F(z) \|^2 \tag{2}$$

    Because of the ill-posed nature of most inverse problems, the iteration tends to have a semi-convergent behaviour, with the reconstruction error decreasing until some point and then diverging, necessitating appropriate stopping criteria.

  • Discretization as regularization, including projection methods searching for an approximate solution of an inverse problem in a predefined subspace. Choosing an appropriate subspace has a high impact on finding stable solutions.

  • Variational methods, with the idea of minimizing data consistency penalized using some regularizer $R$ parameterized by $\lambda$:

    $$\hat{z} = \arg\min_{z} \; \mathcal{L}\big(m, F(z)\big) + \lambda\, R(z) \tag{3}$$

    This is a generic adaptable framework where $\mathcal{L}$ and $R$ are chosen to fit a specific problem, of which well-known classical examples include Tikhonov [groetsch1984theory] and total variation [makovetskii2015explicit] regularization.
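As a minimal sketch of the variational idea, the following compares direct inversion with Tikhonov regularization on a small, hypothetical 1D deblurring problem. The blur matrix `A`, the signal, the perturbation, and the value of `lam` are all illustrative assumptions and are not taken from the paper's experiments.

```python
import numpy as np

# Hypothetical 1D deblurring problem: m = A z + noise, with A a Gaussian blur.
n = 30
idx = np.arange(n)
A = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 1.5) ** 2)
A /= A.sum(axis=1, keepdims=True)           # normalize each row to sum to one

z_true = np.sin(2 * np.pi * idx / n)        # smooth "true" state
noise = 0.01 * (-1.0) ** idx                # small, deterministic high-frequency perturbation
m = A @ z_true + noise

def tikhonov(A, m, lam):
    """Minimize ||m - A z||^2 + lam * ||z||^2 via the regularized normal equations."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ m)

z_naive = np.linalg.solve(A, m)             # direct inversion, no regularization
z_reg = tikhonov(A, m, lam=1e-3)

err_naive = np.linalg.norm(z_naive - z_true)
err_reg = np.linalg.norm(z_reg - z_true)
print(f"error without regularization: {err_naive:.3g}")
print(f"error with Tikhonov (lam=1e-3): {err_reg:.3g}")
```

Because the blur strongly attenuates high frequencies, direct inversion amplifies the small perturbation, whereas the penalty term keeps the estimate bounded; the choice of `lam` trades bias against this noise amplification.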

These approaches have several weaknesses: they require explicitly identified prior knowledge and hand-selected regularizers, they have shortcomings in handling noise, their optimization-based mechanisms make inference computationally expensive, and, most significantly, they have limited applicability, in the sense that each inverse problem needs to be solved as a one-off.

As a result, we are highly motivated to consider the roles of Deep Neural Networks (DNNs), which have the advantages of being generic data driven methods, are adaptable to a wide variety of different problems, and can learn prior models implicitly through examples. DNNs are currently in widespread use to solve a vast range of problems in machine learning, artificial intelligence [samek2017explainable], and computer vision [kim2018inversefacenet]. Strong advantages of using such structures include their near-universal applicability, their real-time inference [canziani2016analysis, khan2019comparing], and their superiority in handling sensor and/or measurement noise [han2018co].

A variety of studies [aggarwal2018modl, lucas2018using] have shown that planned, systematic DNNs will tend to have fewer parameters and better generalization power compared to generic architectures, which motivates us to consider systematic strategies in addressing complex inverse problems.

In principle, every deep learning framework could be interpreted as solving some sort of inverse problem, in the sense that the network is trained to take measurements and to infer, from given ground truth, the desired unknown state. For example, for the common DNN application to image classification, the input is a (measured) image, and the network output is a (unknown state) label, describing the object or scene appearing in the image. The network parameters then implicitly learn the inverse of the forward model, which had been the generation of an image from a label.

Using DNNs for solving inverse problems aims to approximate the inverse of the forward model [fieguth2010statistical]. In some cases, the forward model may be explicitly defined [anirudh2018unsupervised, rick2017one, aggarwal2018modl], whereas in other cases it may be implicitly defined in the form of the training data [adler2017solving, antholzer2019deep, jin2017deep, kelly2017deep, anirudh2018unsupervised, zhang2018ista, fan2017inversenet]. In this paper our focus is on solving non-blind inverse problems, with the forward model known. Analytical approaches to inverse problems, whether deterministic or stochastic, take advantage of the explicit forward model and prior knowledge in formulating the solution; in contrast, DNNs cannot take advantage of such information, and must instead learn implicitly from large datasets of training data in a black-box approach.

Inspired by the above techniques, there are indeed a number of proposed deep frameworks in the literature with the aim of bringing regularization techniques or prior knowledge into the DNN learning process for solving inverse problems [aggarwal2018modl, rick2017one, dosovitskiy2015flownet, wang2015deep, xu2014deep, schuler2015learning]. In this paper, we classify deep solutions for inverse problems into three categories based on their objective criteria, and compare them in solving different types of inverse problems. The focus of this paper is comparing the robustness of different deep learning structures based on the optimization criterion associated with their training scheme; that is, the main objective of this research is to provide insight into the choice of appropriate framework, particularly with regard to performance robustness. Note that our goal is not to outperform the state of the art on each problem, but rather to examine different frameworks under fair parameter settings, each performing at least as well as existing analytical approaches. Using these frameworks, we select a prototype inverse problem from each category and evaluate the performance and the robustness of the designed frameworks. We believe the results obtained in this way give insight into the strength of each solution category in addressing different categories of inverse problems.

The rest of this paper is organized as follows: Section 2 includes a review of the most recent deep approaches to solving inverse problems; Section 3 describes the problem definition, introducing three main categories for deep solutions for inverse problems; Section 4 explains the experimental results including robustness analysis; finally Section 6 concludes the paper, proposing the best approach based on our experiments.

2 Literature Review

Inverse problems have had a long history [engl1996regularization, fieguth2010statistical, stuart2010inverse] in a wide variety of fields. In our context, since imaging involves the observing of a scene or phenomenon of interest, through a lens and spatial sensor, where the goal is to infer some aspect of the observed scene, essentially all imaging is an inverse problem, widely explored in the literature [bertero1998introduction, mousavi2017learning, de2016structure]. Imaging-related inverse problems may fall under any of image recovery, restoration, deconvolution, pansharpening, concealment, inpainting, deblocking, demosaicking, super-resolution, reconstruction from projections, compressive sensing, among many others.

Inverse problems are ultimately the deducing of some function $G$ which inverts the forward problem, mapping a measurement $m$ to an estimate $\hat{z}$ of the state,

$$\hat{z} = G(m), \tag{4}$$

where some objective criterion obviously needs to be specified in order to select $G$. Since $G$ is very large (an input image has many pixels), unknown, and frequently nonlinear, it has become increasingly attractive to consider the role of DNNs, in their role as universal function approximators, in deducing $G$, and a number of approaches have been recently proposed in this fashion [lucas2018using, arridge2019solving, mccann2019algorithms].

The most common approach when using DNNs for inverse problem solving is to optimize the squared-error criterion $\| z - G_\theta(m) \|^2$ between the state $z$ and the network output for measurement $m$, with $G_\theta$ a DNN to be learned [adler2017solving, antholzer2019deep, jin2017deep, kelly2017deep, anirudh2018unsupervised, zhang2018ista, fan2017inversenet]. This strategy implicitly finds a direct mapping from $m$ to $z$ using pairs $(m_i, z_i)$ as the training data in the learning phase, which seeks to solve

$$\hat{\theta} = \arg\min_{\theta} \sum_{i} \| z_i - G_\theta(m_i) \|^2 \tag{5}$$

for the network weights $\theta$ in the DNN $G_\theta$. Such supervised training needs a large number of data samples, which in some cases may be generated from the forward function $F(\cdot)$.

Recent work in direct mapping includes [haggstrom2019deeppet], in which an encoder-decoder structure is proposed to directly solve clinical positron emission tomography (PET) image reconstruction. Similarly [chen2019application] proposes a direct mapping deep learning framework to identify the impact load conditions of shell structures based on their final state of damage, an inverse problem of engineering failure analysis.

Recent research investigates the incorporation of prior knowledge into DNN solutions for inverse problems. In particular, the use of intelligent initialization of DNN weights and analytical regularization techniques form the main classes of existing work in this domain [lucas2018using]. In [anirudh2018unsupervised], an unsupervised deep framework is proposed for solving inverse problems using a Generative Adversarial Network (GAN) to learn a prior without any information about the measurement process. In related work, a variational autoencoder (VAE) is used to solve electrical impedance tomography (EIT), a nonlinear ill-posed inverse problem. The VAE uses a variety of training data sets to generate a low dimensional manifold of approximate solutions, which allows the ill-posed problem to be converted to a well-posed one.

The forward model provides knowledge regarding data generation, based on the physics of the system. In [rick2017one] an iterative variational framework is proposed to solve the linear computer vision inverse problems of denoising, inpainting, and super-resolution. It proposes a general regularizer for linear inverse problems which is first learned from a large collection of images, and which is then incorporated into an Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing:

$$\hat{z} = \arg\min_{z} \; \tfrac{1}{2} \| m - A z \|^2 + \lambda\, R(z; \theta) \tag{6}$$

Here the regularizer $R$ was learned from image datasets and $\theta$ is the network weight matrix, as before, while $A$ is a matrix, the (assumed to be) linear forward model relating the state $z$ to the measurement $m$.

The equivalent approach for a non-linear forward model $F$ is considered in [li2018nett], in which a data consistency term as a training objective incorporates the forward model into the problem:

$$\hat{z} = \arg\min_{z} \; \| m - F(z) \|^2 + \lambda\, R(z; \theta) \tag{7}$$


In [senouf2019self], a self-supervised deep learning framework is proposed for solving inverse problems in medical imaging using only the measurements and forward model in training the DNN.

Further DNN methods for inverse problems are explored in [aggarwal2018modl], where the forward model is explicitly used in an iterative deep learning framework, requiring fewer parameters compared to direct mapping approaches. In [yaman2019self], an iterative deep learning framework is proposed for MRI image reconstruction. The work in [bar2019unsupervised] proposes an unsupervised framework for solving forward and inverse problems in EIT. In [cha2019unsupervised] the analytical forward model is directly used in determining a DNN loss function, yielding an unsupervised framework utilizing knowledge about data generation. Other methods optimize data consistency using an estimate of the forward model, learned from training data.


The approach presented in [maass2019deep] is closely related to ours, and aims at analysing deep learning structures for solving inverse problems, seeking to understand neural networks for solving small inverse problems. Our goal in this paper is to categorize deep learning frameworks for different inverse problems, based on their objectives and training schemes, investigating the power of each in solving certain types of inverse problems.

3 Problem Definition

Let us consider a forward model relating the measurement $m$ to the system state $z$,

$$m = F(z) + \nu, \tag{8}$$

with given noise process $\nu$, assumed to be white. There are two fundamental classes of inverse problems to solve:

  • Static Estimation Problems, in which the system state is static, without any evolution over time [fieguth2010statistical]. We will consider the following static problems:

    • Image Restoration, part of a class of inverse problems in which the state and measurement spaces coincide (same number of pixels). Typically the measurements are a corrupted version of the unknown state, and the problem is to recover an estimate of the true signal from its corrupted version knowing the (forward) distortion model. Robustness and outlier detection are the main requirements for this class of inverse problems.

    • Image Reconstruction, to find a projection from some measurement space to a differently sized state, such as 3D shape reconstruction from 2D scenes. These problems need careful regularization to find feasible solutions.

  • Dynamic Estimation Problems, in which the state $z(t)$ is subject to dynamics and measurements over time [fieguth2010statistical], such as in object tracking.

Our focus is on DNNs as data-driven models for solving inverse problems, so we wish to redefine inverse problems in the context of learning from examples in statistical learning theory [vito2005learning]. We need two sets of variables:

$$m \in \mathcal{M} \;\; \text{(input: measurements)}, \qquad z \in \mathcal{Z} \;\; \text{(output: system states)}. \tag{9}$$

The relation between input and output is described by a probability distribution $\rho(m, z)$, where the distribution is known only through a finite set of samples, the training set

$$S = \{ (m_i, z_i) \}_{i=1}^{n}, \tag{10}$$

assumed to have been drawn independently and identically distributed (i.i.d.) from $\rho$. The learning objective is to find a function $f_S$ to be an appropriate approximation of output $z$ in the case of a given input $m$. That is,

$$\hat{z} = f_S(m), \tag{11}$$

such that $f_S$ was learned on the basis of $S$.

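In order to measure the effectiveness of an estimator function $f$ in inferring the desired relationship described by the distribution $\rho$ of measurement-state pairs $(m, z)$, the expected conditional error can be used:

$$\mathcal{E}(f \mid m) = E\big[\, V\big(z, f(m)\big) \mid m \,\big], \tag{12}$$

where $V$ is the cost or loss function, measuring the cost associated with approximating true value $z$ with an estimate $f(m)$. Choosing a squared loss $V(z, \hat{z}) = \| z - \hat{z} \|^2$ and minimizing over $f$ allows us to derive

$$f(m) = E[\, z \mid m \,], \tag{13}$$

the classic optimal Bayesian least-squares estimator [fieguth2010statistical]. In the case of learning from examples, (13) cannot be reconstructed exactly since only a finite set of examples $\{ (m_i, z_i) \}_{i=1}^{n}$ is given; therefore a regularized least squares algorithm may be used as an alternative [poggio1989theory, cucker2002mathematical], where the hypothesis space $\mathcal{H}$ is fixed and the estimate $f^{\lambda}$ is obtained as

$$f^{\lambda} = \arg\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} \| z_i - f(m_i) \|^2 + \lambda\, \Omega(f), \tag{14}$$

where $\Omega(f)$ is a penalty term and $\lambda$ a regularization parameter. We may choose $\lambda$ to minimize the discrepancy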


however, in general it is much simpler, and sufficient, to select the regularization parameter $\lambda$ via cross-validation.
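The step from the squared loss to the Bayesian least-squares estimator can be spelled out with a standard decomposition. The notation is an assumption made for this sketch: $m$ denotes the given input (measurement), $z$ the output (state), and $f$ the estimator.

```latex
\begin{aligned}
E\big[\, \| z - f(m) \|^2 \mid m \,\big]
  &= E\big[\, \| z - E[z \mid m] \|^2 \mid m \,\big]
     + \big\| E[z \mid m] - f(m) \big\|^2 \\
  &\ge E\big[\, \| z - E[z \mid m] \|^2 \mid m \,\big],
\end{aligned}
\qquad \text{with equality iff } f(m) = E[z \mid m].
```

The cross term vanishes because $E\big[\, z - E[z \mid m] \mid m \,\big] = 0$, so the conditional mean minimizes the expected squared error for every $m$.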

Given that $\mathcal{H}$ is the hypothesis space of possible inverse functions, in this paper it is quite reasonable to understand $\mathcal{H}$ to be the space of functions which can be learned by a deep neural network, on the basis of optimizing its weight matrix $\theta$. Based on the optimization criterion (14), which is actually the variational framework in functional analytic regularization theory [poggio1985computational], and which forms the basis for inverse-function DNN learning, we classify deep learning frameworks for solving inverse problems into three categories, based on optimization criteria and training schemes:

  • Direct Mapping

  • Data Consistency Optimizer

  • Deep Regularizer

Each of these is developed and defined, as follows.

3.1 Direct Mapping

The direct mapping category is used as the objective criterion in a large body of research in deep learning based inverse problems [adler2017solving, antholzer2019deep, jin2017deep, kelly2017deep, anirudh2018unsupervised, zhang2018ista, fan2017inversenet]. These methods seek to find end-to-end solutions for

$$\hat{\theta} = \arg\min_{\theta} \; \mathcal{L}\big(z, G_\theta(m)\big) + \lambda\, R\big(G_\theta(m)\big), \tag{16}$$

whereby $\mathcal{L}$ is the cost function to be minimized by a DNN $G_\theta$, mapping measurement $m$ to an estimate of the state $z$, on the basis of optimizing DNN weights $\theta$. $R$ specifies a generic analytical regularizer, to restrict the estimator to feasible solutions.

The Direct Mapping category approximates an estimator $G_\theta$ as an inverse to the forward model $F$, requiring a dataset of pairs $(m, z)$ of observed measurements and corresponding target system parameters, as illustrated in Figure 1.

Figure 1: Direct mapping of deep learning inverse problems.

This category of DNN is typically used in those cases where we have a model-based imaging system having a linear forward model $m = A z + \nu$, where the state $z$ is an image, so that convolutional neural networks (CNNs) are nearly always used. As discussed earlier, for Image Restoration problems the measurements themselves are already images; however, in more general contexts we may choose to project the measurements as $A^{T} m$, back into the domain of $z$, such that the CNN $G_\theta$ is trained to learn the estimator

$$\hat{z} = G_\theta\big(A^{T} m\big). \tag{17}$$

The translation invariance of $A^{T} A$, relatively common in imaging inverse problems, makes the convolutional-kernel nature of CNNs particularly suitable for serving as the estimator for these problems.

In general, the performance of direct inversion is remarkable [lucas2018using]. However, the receptive field of the CNN (i.e., the size of the field of view a unit has over its input layer) should be matched to the support of the point spread function [aggarwal2018modl]. Therefore, large CNNs with many parameters, and accordingly extensive training time and data, are often needed for the methods in this category. These DNNs are highly problem dependent, and for different forward models (e.g., with different matrix sizes, resolutions, etc.) a new DNN will need to be learned.

3.2 Data Consistency Optimizer

The Data Consistency Optimizer category of deep learning aims to optimize data consistency as an unsupervised criterion within a variational framework [aggarwal2018modl, cha2019unsupervised]:

$$\hat{\theta} = \arg\min_{\theta} \; \mathcal{L}\big(m, F(G_\theta(m))\big) + \lambda\, R\big(G_\theta(m)\big), \tag{18}$$

where, as in (16), $\mathcal{L}$ is the cost function to be minimized by DNN $G_\theta$, parameterized by weights $\theta$, subject to regularizer $R$. The overall picture is summarized in Figure 2.

In contrast to (16), where the network cost function is expressed in the space of unknowns $z$, here (18) expresses the cost in the space of measurements $m$, based on forward model $F$. That is, the data consistency term no longer learns from supervised examples; rather, from the forward model we obtain an unsupervised data consistency term, not needing data labels, whereby the forward model provides some form of implicit supervision.

Compared to the direct mapping category, the use of the forward model in (18) leads to a network with relatively few parameters, in part because the receptive field of the DNN need not be matched to the support of the point spread function. However, the ill-posedness of the inverse problem causes a semi-convergent behaviour [arridge2019solving] using this criterion, therefore an early stopping regularization needs to be adopted in the learning process.
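The difference between the two criteria can be made concrete with a small sketch. It assumes a hypothetical linear forward model `A` and squared-error costs; the function names `dm_loss` and `dc_loss` are illustrative and do not come from the paper.

```python
import numpy as np

def forward(z, A):
    """Assumed linear forward model F(z) = A z (noise is added separately)."""
    return A @ z

def dm_loss(z_hat, z_true):
    """Direct mapping: squared error in the space of unknowns; needs the label z_true."""
    return float(np.sum((z_hat - z_true) ** 2))

def dc_loss(z_hat, m, A):
    """Data consistency: squared error in measurement space; needs only m and F."""
    return float(np.sum((forward(z_hat, A) - m) ** 2))

A = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, 1.0]])  # hypothetical 3x2 forward matrix
z_true = np.array([2.0, -1.0])
m = forward(z_true, A)                               # noise-free measurement

z_hat = np.array([1.5, -0.5])                        # some candidate network output
print("DM loss:", dm_loss(z_hat, z_true))            # requires the ground-truth state
print("DC loss:", dc_loss(z_hat, m, A))              # requires no ground truth
```

In a training loop, either quantity would be averaged over a batch and differentiated with respect to the network weights; only the first requires labelled pairs.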

Figure 2: Data consistency optimization, where the forward model is incorporated in the loss function of the DNN and is utilized during DNN training.

3.3 Deep Regularizer

Finally, the Deep Regularizer category of deep learning methods continues to optimize the data consistency term; however, the overall optimization process is undertaken in the form of an analytical variational framework and uses a DNN as the regularizer [rick2017one, li2018nett]:

$$\hat{z} = \arg\min_{z} \; \mathcal{L}\big(m, F(z)\big) + \lambda\, R_\theta(z). \tag{19}$$

Here $R_\theta$ is a pre-trained deep regularizer, based on weight matrix $\theta$, usually chosen as a deep classifier [rick2017one, li2018nett], discriminating the feasible solutions from non-feasible ones.

This category usually includes an analytical variational framework consisting of a data consistency term and a learned DNN to capture the redundancy in parameter space (see Figure 3).

Figure 3: Deep regularized category of inverse problems, in which a DNN is used only as the regularizer as part of an analytical variational framework.

For this category, an iterative algorithm (deep or analytical) is used to actually perform the optimization of (19). The regularizer network itself is trained using the data of a specific domain. The Deep Regularizer category needs the fewest parameter settings, compared to the earlier categories; however because of the optimization based inference step it is computationally demanding.

4 Experiments

Our focus in this paper is to study solution robustness in the presence of noise and outliers during inference. This section explores experimental results for each of the fundamental inverse-problem classes (restoration, reconstruction, dynamic estimation) and for each of the categories of solution (direct mapping (DM), data consistency optimizer (DC), deep regularizer (DR)), as discussed in Section 3. Our study is based on a statistical analysis via the Wilcoxon signed rank test [lathuiliere2019comprehensive], a well-known tool for analysing deep learning frameworks. The null hypothesis is that the results of each pairwise combination of DM, DC, and DR are from the same distribution, i.e., that the results are not significantly different. The experimental results are based on the following problems:

  • Linear Regression: a reconstruction problem, with the aim of finding line parameters from the noisy / outlier sample points drawn from that line.

  • Image Denoising: a restoration problem, with the objective of recovering a clean image from noisy observations. We use both synthetic texture images and real images.

  • Single View 3D Shape Inverse Rendering: a reconstruction problem, for which the domains of the measurements and system parameters are different. The measurements include a limited number of 2D points (input image landmarks) with the unknown state, to be recovered, a 3D Morphable Model (3DMM). We use a 3D model of the human face, based on eigen-faces obtained from principal component analysis.

  • Single Object Tracking: a dynamic estimation problem, for which the goal is to predict the location (system parameter) of a moving object based on its (noisy) locations, measured in preceding frames. While this problem seems to belong to the class of restoration problems, the embedded state in this problem requires additional assumptions regarding the time-dynamics, and thus additional search strategies.

All DNNs were implemented using the KERAS library [chollet2015keras] and ADAM optimizer [kingma2014adam] on an NVIDIA GeForce GTX 1080 Ti. The DNN structures and the details of each trained DNN can be found in the corresponding subsection. Table 1 summarizes the overall experimental setup for all problems.

| Inverse Problem | Measurements | Unknown parameters | Forward Model | Training Data |
|---|---|---|---|---|
| Linear Regression | 2D coordinates of N drawn samples from the line | Slope, intercept | Straight line plus noise | Including Gaussian noise with heavy-tailed outliers |
| Image Denoising | Noisy image | Clean image | Image plus noise | 5000 gray scale texture images from a stationary random process [fieguth2010statistical], including an exponential number of heavy-tailed pixel outliers |
| 3D Shape Rendering | Standard landmarks on input face image | Parameters of a BFM 3D model | Noisy projection from 3D to 2D | 72 landmarks on the 2D input image of a 3D human face generated by a Basel Face Model (BFM) [aldrian2012inverse], including outliers in input 2D landmarks |
| Single Object Tracking (Dynamic Estimation) | Noisy location of a ball on a board, from previous time step to current step | True location of the ball | True object locations plus noise | Moving ball locations with different random initial states and variable speeds, including Gaussian noise for all measurements |

Table 1: The four inverse problems considered in our experiments.

4.1 Linear Regression

We begin with an exceptionally simple inverse problem. Consider a set of $N$ one dimensional samples $\{ (x_i, m_i) \}_{i=1}^{N}$, subject to noise, with some number of the training data subject to more extreme outliers, as illustrated in Figure 4.

Figure 4: 1D sample points for linear regression, with Gaussian noise and occasional large outliers.

As an inverse problem, we need to define the forward model, which for linear regression, with slope $a$ and intercept $b$, is simply

$$m_i = a\, x_i + b + \nu_i. \tag{20}$$

Since our interest is in assessing the robustness of the resulting inverse solver, the number and behaviour of outliers should be quite irregular, to make it challenging for a network to generalize from the training data. As a result, the noise $\nu$ is Gaussian with random variance, plus heavy-tailed (power law) outliers, where the number of outliers is exponentially distributed.

For this inverse problem, the unknown state is comprised of the system parameters $z = (a, b)$. Thus linear regression leads to a reconstruction problem, for which the goal is to recover the line parameters from a sample set including noisy and outlier data points.
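The data generation described above can be sketched as follows. All distribution parameters (noise range, mean number of outliers, Pareto shape) and the helper names are assumptions chosen for illustration; the paper does not state them here. Ordinary least squares is included as the reference estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_line(a, b, x):
    """Forward model of linear regression: noise-free measurements of the line."""
    return a * x + b

def sample_line(a, b, n=100, mean_outliers=5.0, rng=rng):
    """Draw n noisy points from the line; all distribution parameters are illustrative."""
    x = rng.uniform(-1.0, 1.0, size=n)
    sigma = rng.uniform(0.01, 0.1)                      # random noise level
    m = forward_line(a, b, x) + rng.normal(0.0, sigma, size=n)
    # Exponentially distributed number of outliers, capped at n.
    k = min(n, int(rng.exponential(mean_outliers)))
    where = rng.choice(n, size=k, replace=False)
    # Heavy-tailed (power-law) outlier magnitudes with random sign.
    m[where] += rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.pareto(1.5, size=k))
    return x, m

def least_squares_line(x, m):
    """Ordinary least-squares estimate of (slope, intercept)."""
    design = np.column_stack([x, np.ones_like(x)])
    (a_hat, b_hat), *_ = np.linalg.lstsq(design, m, rcond=None)
    return a_hat, b_hat

x, m = sample_line(a=2.0, b=-0.5)
a_hat, b_hat = least_squares_line(x, m)
print(f"least-squares estimate: slope={a_hat:.3f}, intercept={b_hat:.3f}")
```

Because the squared error weights large residuals heavily, a handful of heavy-tailed outliers can pull the least-squares line away from the true parameters, which is the effect the robustness comparison is designed to expose.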

With the problem defined, we next need to formulate an approach for each of the three solution categories. For direct mapping (DM) and data consistency (DC), the training data and DNN structures are the same, shown in Figure 5, where the DC approach includes an additional layer which applies the given forward model of (20). We used the KERAS library, in which a Lambda layer implements this forward operation.

Since the problem is one-dimensional with limited spatial structure, the network contains only dense feed-forward layers. Residual blocks are used in order to allow gradient flow through the DNN and to improve training. Network training was based on 1000 records, each of $N$ noisy sample points.

Figure 5: DNN structure for DM and DC solutions to linear regression. The layer type and number of neurons are reported below each layer. Note that in the DC case, there is an additional Lambda layer, which computes the forward function from the predicted line parameters.

The Deep Regularizer (DR) category needs a different problem modeling scheme, since there is no learning phase as in DM and DC. Instead, only a DNN (usually a classifier) is trained to be used as the regularizer in a variational optimization framework. The DNN regularizer is given the system parameters and determines whether they account for a feasible line. Here, we define a feasible line as a line whose slope (tangent) lies in some specified range. We generate a synthetic set of system parameters with associated labels for training a fully connected DNN as the regularizer for this category. Since our interest is in the DNN solution of the inverse problem, and not the details of the optimization, we have chosen two fairly standard optimization approaches, a simplex / Nelder-Mead approach [singer2009nelder] and a Genetic Algorithm (GA) strategy, both based on their respective Matlab implementations. Because GA solutions may be different over multiple runs, we report the results averaged over ten independent runs.
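A rough sketch of the DR inference step for the line-fitting problem is given below. Two substitutions are made for brevity and should be read as assumptions: the trained classifier is replaced by a closed-form placeholder `infeasibility`, and the Nelder-Mead / GA optimizers are replaced by an exhaustive grid search. The slope range and the weight `lam` are likewise illustrative.

```python
import numpy as np

def infeasibility(a, slope_range=(0.0, 3.0)):
    """Placeholder for the trained classifier: zero inside the feasible slope
    range and growing with the distance outside it. A real deep regularizer
    would be a pre-trained DNN."""
    lo, hi = slope_range
    return np.maximum(lo - a, 0.0) ** 2 + np.maximum(a - hi, 0.0) ** 2

def dr_objective(a, b, x, m, lam=10.0):
    """Data consistency term plus lam times the (placeholder) regularizer."""
    residual = m - (a * x + b)
    return np.mean(residual ** 2) + lam * infeasibility(a)

def grid_search(x, m, a_grid, b_grid):
    """Exhaustive search over (a, b), used here instead of Nelder-Mead or a GA."""
    best = min((dr_objective(a, b, x, m), a, b) for a in a_grid for b in b_grid)
    return best[1], best[2]

x = np.linspace(-1.0, 1.0, 50)
m = 2.0 * x - 0.5                       # noise-free samples of the line a=2, b=-0.5
a_hat, b_hat = grid_search(x, m, np.linspace(-5, 5, 101), np.linspace(-2, 2, 81))
print(f"estimated slope={a_hat:.2f}, intercept={b_hat:.2f}")
```

Note that no network is trained on measurement data here: the regularizer is fixed in advance and all computation happens at inference time, which is why this category is comparatively expensive per estimate.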

Table 2 shows the average solution found by each category over 10 independent trainings for DM and DC, and 10 independent inferences for DR. The table also reports Least-Squares (LS) results as a reference method, particularly to show the improvement in robustness that deep learning methods have to offer in solving inverse problems. Observe the significant difference when the DNN methods are trained with noise-free as opposed to noisy data: the noisy training data force the network to acquire a robustness to outliers.

For DR we trained a 5 layer MLP with dense layers of sizes , as the regularizer, using the generated synthetic data including feasible line parameters (in the specific range) as the positive training samples and invalid line parameters as the negative training samples. The average test accuracy of the trained regularizer is .

| Training Data | Measure | DM | DC | DR-GA | DR-NM | LS |
|---|---|---|---|---|---|---|
| Noisy + Outlier | Error (Slope) | | | | | |
| Noisy + Outlier | Error (Intercept) | | | | | |
| Noise-Free | Error (Slope) | | | | | |
| Noise-Free | Error (Intercept) | | | | | |

Table 2: The error of estimated lines, with parameters averaged over 10 independent training / inference runs, obtained by the three DNN categories compared with least-squares.

We performed the Wilcoxon signed rank test for both cases: training with noisy data (Table 3) and noise-free training (Table 4). The tables show the pairwise p-values over the 10 independent runs. A p-value in excess of the chosen significance level means that the null hypothesis cannot be rejected, i.e., that the results of the two methods plausibly stem from the same distribution; in particular, the Wilcoxon test computes the probability that the differences between the results of two methods are from a distribution with median equal to zero. Clearly all of the DNN methods are statistically significantly different from the least-squares (LS) results. For noisy training data, the statistical results in Table 3 show similar performance for DM and DC, and for DR-NM and DR-GA, the latter similarity suggesting that the specific choice of optimization methodology does not significantly affect the DR performance.
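To make the test concrete, the following is a minimal sketch of an exact two-sided Wilcoxon signed rank test for paired samples, assuming no zero differences and no ties. The error values are hypothetical. One observation follows directly from the arithmetic: with 10 paired values, the smallest attainable two-sided p-value is 2/2^10 ≈ 0.002, which would be consistent with the 0.002 entries in the tables if the test was applied to 10 paired runs (an assumption about how the analysis was carried out).

```python
def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumes no zero differences and no tied absolute differences, so that
    the ranks are simply 1..n. Returns (W+, p-value).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # counts[s] = number of sign assignments whose positive-rank sum equals s
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]

    w = min(w_plus, max_sum - w_plus)
    p = 2.0 * sum(counts[: w + 1]) / 2 ** n
    return w_plus, min(p, 1.0)

# Hypothetical errors of two methods over 10 runs; method B is always worse.
errors_a = [0.10 + 0.01 * i for i in range(10)]
errors_b = [e + 0.05 + 0.02 * i for i, e in enumerate(errors_a)]
w_plus, p_value = wilcoxon_signed_rank_exact(errors_b, errors_a)
print(f"W+ = {w_plus}, p = {p_value:.4f}")
```

When every paired difference has the same sign, as in this example, the p-value reaches its floor regardless of how large the differences are, so the table entries indicate consistency of the ordering rather than the size of the gap.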

| p-value (Wilcoxon Test) | DM | DC | DR-GA | DR-NM | LS |
|---|---|---|---|---|---|
| DM | - | 0.695 | 0.002 | 0.002 | 0.002 |
| DC | 0.695 | - | 0.002 | 0.002 | 0.002 |
| DR-GA | 0.002 | 0.002 | - | 0.781 | 0.002 |
| DR-NM | 0.002 | 0.002 | 0.781 | - | 0.002 |
| LS | 0.002 | 0.002 | 0.002 | 0.002 | - |

Table 3: Wilcoxon signed rank test p-values obtained for the linear regression problem, using noisy and outlier data for both training and testing. We used 500 test samples to perform the statistical analysis over 10 independent training/inference steps of each method.

The results in Table 2 show that DM and DC significantly improve in robustness when trained with noisy data, relative to training with noise-free data. The principal difference between DM/DC versus DR is the learning phase for DM/DC, allowing us to conclude that, at least for reconstruction problems, a learning phase using noisy samples in training significantly improves the robustness of the solution. A further observation is that whereas DM and DC achieve similar performance, DC is unsupervised and DM is supervised. Thus it would appear that the forward model knowledge and the data consistency term as objective criterion for DC provide an equal degree of robustness compared to the supervised learning in DM.

| p-value (Wilcoxon Test) | DM | DC | DR-GA | DR-NM | LS |
|---|---|---|---|---|---|
| DM | - | 0.002 | 0.002 | 0.002 | 0.002 |
| DC | 0.002 | - | 0.002 | 0.002 | 0.002 |
| DR-GA | 0.002 | 0.002 | - | 0.781 | 0.002 |
| DR-NM | 0.002 | 0.002 | 0.781 | - | 0.002 |
| LS | 0.002 | 0.002 | 0.002 | 0.002 | - |

Table 4: Like Table 3, but now using noise-free data, i.e., without any noise or outliers, for method training. Noisy and outlier data remain in place for testing.

For this reconstruction problem, we conclude that both DC and DM perform well, with the unsupervised DC showing strong performance both with noisy and noise-free training data.

4.2 Image Denoising (Restoration)

We now consider an image denoising problem, following the steps described in Section 4.1 for regression. We consider both real and synthetic images. The real data come from the Linnaeus dataset [chaladze2017linnaeus], comprising 5 classes with 1200 training images and 400 test images per class. The synthetic data are 5000 texture images generated by sampling from stationary periodic kernels.

The synthetic images are generated using an FFT method [fieguth2010statistical], based on a thin-plate second-order Gauss-Markov random field kernel, such that a texture is found by inverting the kernel in the frequency domain, using element-by-element multiplication and division, with unit-variance white noise as the input and with the kernel zero-padded to the intended image size. Further details about this approach can be found in [fieguth2010statistical].
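As an illustration of this kind of frequency-domain sampling, the following NumPy sketch draws a texture from a thin-plate kernel. The stencil coefficients (the squared discrete Laplacian), the name `alpha` for the parameter added to the central element, the image size, and the final rescaling to [0, 1] are all assumptions made for the example; the exact kernel and constants of the experiments are not reproduced here.

```python
import numpy as np


def thin_plate_kernel(alpha):
    """5x5 thin-plate (squared Laplacian) stencil with alpha**2 added to the centre."""
    k = np.array([
        [0, 0, 1, 0, 0],
        [0, 2, -8, 2, 0],
        [1, -8, 20, -8, 1],
        [0, 2, -8, 2, 0],
        [0, 0, 1, 0, 0],
    ], dtype=float)
    k[2, 2] += alpha ** 2
    return k


def synthesize_texture(size, alpha, rng):
    """Sample a periodic texture by dividing white noise by the root kernel spectrum."""
    padded = np.zeros((size, size))
    padded[:5, :5] = thin_plate_kernel(alpha)  # zero-pad the kernel to the image size
    # Shift the kernel centre to index (0, 0) so that its spectrum is real-valued.
    padded = np.roll(padded, shift=(-2, -2), axis=(0, 1))
    spectrum = np.fft.fft2(padded).real  # at least alpha**2 for this stencil
    white = rng.standard_normal((size, size))  # unit-variance white noise
    texture = np.fft.ifft2(np.fft.fft2(white) / np.sqrt(spectrum)).real
    # Rescale to [0, 1]; this normalization is an assumption of the sketch.
    return (texture - texture.min()) / (texture.max() - texture.min())


rng = np.random.default_rng(0)
texture = synthesize_texture(size=64, alpha=0.1, rng=rng)
print(texture.shape)  # (64, 64)
```

Smaller values of `alpha` leave more energy at low frequencies and therefore give smoother textures with longer spatial correlation.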

Parameter , affecting the central element of the kernel , effectively determines the texture spatial correlation-length in , as


for process correlation length, , measured in pixels. We set to be a random integer in the range in our experiments.

All images are set to be in size, with pixel values normalized to . Pixels are corrupted by additive Gaussian noise, with an exponentially distributed number of outliers. The inverse problem is a restoration problem, having the objective of restoring the original image from its noisy/outlier observation. The linear forward model is


for measured, original, and added noise, respectively. The Gaussian noise has zero mean and random variance, and an exponentially distributed number of pixels become outliers, with their values replaced by a uniformly distributed random intensity value.
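A minimal sketch of such a corruption process is given below, using only the Python standard library. The pixel range [0, 1], the upper bound `max_sigma` on the noise standard deviation, and the mean outlier count `mean_outliers` are hypothetical settings chosen for illustration, not values taken from the experiments.

```python
import random


def corrupt(pixels, rng, max_sigma=0.1, mean_outliers=20.0):
    """Add zero-mean Gaussian noise, then replace a random subset of pixels by outliers.

    max_sigma and mean_outliers are hypothetical settings, not values from the paper.
    """
    sigma = rng.uniform(0.0, max_sigma)  # random noise level for this image
    noisy = [p + rng.gauss(0.0, sigma) for p in pixels]
    # Exponentially distributed outlier count, capped at the number of pixels.
    n_outliers = min(len(pixels), int(rng.expovariate(1.0 / mean_outliers)))
    for idx in rng.sample(range(len(pixels)), n_outliers):
        noisy[idx] = rng.uniform(0.0, 1.0)  # uniformly distributed random intensity
    return noisy, n_outliers


rng = random.Random(0)
clean = [0.5] * (32 * 32)  # a flat 32x32 image, stored row by row
noisy, n_outliers = corrupt(clean, rng)
```

The restoration task is then to recover `clean` from `noisy` alone.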

We used 5000 training samples and 500 test samples for the learning and evaluation phases of the DM and DC approaches. The DNN structure for both DM and DC is the same and is shown in Figure 6. In the case of DC, we design a DNN layer to compute the forward function. Since we are dealing with input images, both as measurements and system state, we design a fully convolutional DNN in an encoder-decoder structure, finding the main structures in the image through encoding and recovering the image via decoding. Since there may be information loss during encoding, we introduce skip connections to help preserve desirable information.

Figure 6: DNN for the DM and DC solutions. We have a fully convolutional DNN with an encoder-decoder structure, where the values in parentheses indicate the stride value of the corresponding convolutional layer. The skip connection helps to recover desirable information which may be lost during encoding.

The DR category needs a pre-trained regularizer which determines whether the prediction is a feasible texture image. As the regularizer, we trained a classifier to discriminate texture images, generated using (22), from ordinary images gathered from the web. Both GA and Nelder-Mead optimizers are used.

We use peak signal-to-noise ratio (PSNR) as the evaluation criterion, computed from the corresponding pixels of the ground-truth and predicted images. Note that in the DR case, since the input and output of the model are images, the GA optimization routine was unable to find a solution in a reasonable time; we therefore do not report any DR-GA results for this problem.
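For reference, PSNR in its standard form is ten times the base-10 logarithm of the squared peak intensity divided by the mean squared pixel error. The sketch below assumes images stored as flat pixel lists and a peak value of 1.0; both are assumptions of the example rather than details taken from the experiments.

```python
import math


def psnr(reference, predicted, peak=1.0):
    """PSNR in dB between two equally sized images given as flat pixel lists."""
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.5]
predicted = [0.1, 0.6, 1.1, 0.6]  # every pixel is off by 0.1
print(round(psnr(reference, predicted), 2))  # 20.0
```

A uniform error of 0.1 on a unit intensity range gives a mean squared error of 0.01 and hence 20 dB.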

As a reference point, we also report results obtained by the non-local means (NLM) filter [buades2011non], to give insight into the amount of improvement of deep learning inverse methods over a well-established standard in image denoising.

Figure 7 shows results based on synthetic textures. Each row in the figure shows a sample image associated with a particular correlation length and noise standard deviation. The DM approach offers by far the best reconstruction among the DNN methods, and outperforms NLM in terms of PSNR. The time complexity of GA makes DR-GA inapplicable to problems of significant size, even though the images here are quite modest in size.

[Figure 7 image grid. Columns: Clean Image, Input Image (Noisy), DM, DC, DR-NM, NLM. Bottom row: Average PSNR.]
Figure 7: Image denoising results on synthetic textures. Only a single image is shown in each case; however, the reported average PSNR at the bottom is computed over the entire test set. The given noisy image is subject to both additive noise and outliers. NLM, in the rightmost column, is the non-local means filter, a standard approach from image processing.

The Wilcoxon signed rank test was performed on the DM, DC and DR-(Nelder-Mead) results. The statistical analysis gave a p-value of 0.002 for each pairwise comparison, implying a statistically significant difference and thereby validating the very strong performance of DM in Figure 7.

In the case of real images, Figure 8 shows the visual results obtained by DM, DC and DR-NM for seven test samples.

[Figure 8 image grid. Columns: Clean Image, Input Image (Noisy), DM, DC, DR-NM, NLM. Bottom row: Average PSNR.]
Figure 8: As in Figure 7, but here for denoising results on the Linnaeus dataset. The reported average PSNR in the last row is computed over all test images. As in Figure 7, the DM results significantly outperform other DNN inverse solvers and also non-local means (NLM).

The statistical analysis is consistent with the results from the synthetic texture case: all pairwise Wilcoxon tests led to a conclusion of statistically significant differences, with p-values well below the significance threshold.

From the results in Figures 7 and 8 and their respective statistical analyses, we conclude that:

  • For image denoising as a prototype for restoration problems, which have the same measurement and system parameter spaces, concentrating the loss function on the true parameters (as in DM) provides better information, and leads to a more effective and more robust estimator, than concentrating it on the measurements themselves (as in DC).

  • DR-(Nelder-Mead) performed poorly, even though it optimizes data consistency, like DC. We believe that the learning phase in DC, which DR lacks, provides knowledge for its inference and allows DC to be more robust than DR for restoration inverse problems.

4.3 3D Shape Inverse Rendering (Reconstruction)

We now wish to test a 3D shape inverse rendering (IR) [aldrian2012inverse] problem, for which a 3D morphable model (3DMM) [blanz1999morphable] describes the 3D shape of a human face. This model is based on extracting eigenfaces, usually using PCA, from a set of 3D face shapes as the training data, and then obtaining new faces as a weighted combination of the eigenfaces. The 3D shape model reconstructs a 3D face in homogeneous coordinates as


where is the mean shape of the 3DMM, and the weight of eigenface . We use the Basel Face Model [aldrian2012inverse] as the 3DMM in this experiment, for which there are 3D points in each face shape and 199 eigenfaces. We can therefore rewrite (27) as



is the tensor of

eigenfaces. In our experiments each face is characterized by 72 standard landmarks, shown in Figure 9, which are normalized and then presented to the system as the measurements. Therefore we actually only care about out of 3D points in the 3DMM. This experiment tackles the reconstruction of a 3D human face by finding the weights of the 3DMM from its input 2D landmarks. We generated training data from the 3DMM by assigning random values to the 3DMM weights, resulting in a 3D human face, and rendered the obtained 3D shape into a 2D image using orthographic projection.
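The data generation just described can be sketched as follows. This is a toy illustration with hypothetical names and a model of only four points and two eigenfaces; it uses plain Cartesian rather than homogeneous coordinates, and it does not load or reproduce the actual face model.

```python
import random


def synthesize_shape(mean_shape, eigenfaces, weights):
    """3DMM shape: mean shape plus a weighted combination of eigenfaces.

    mean_shape is a list of (x, y, z) points; eigenfaces is a list of such lists.
    """
    shape = []
    for p, point in enumerate(mean_shape):
        shape.append(tuple(
            point[c] + sum(w * face[p][c] for w, face in zip(weights, eigenfaces))
            for c in range(3)
        ))
    return shape


def project_landmarks(shape, landmark_indices):
    """Orthographic projection: keep (x, y) of the selected landmark points, drop z."""
    return [(shape[i][0], shape[i][1]) for i in landmark_indices]


# Toy model: 4 points and 2 eigenfaces (the real model is far larger).
mean_shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, 0.5), (1.0, 1.0, 1.0)]
eigenfaces = [
    [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1), (0.1, 0.1, 0.0)],
    [(0.0, 0.2, 0.0), (0.2, 0.0, 0.0), (0.0, 0.0, 0.2), (0.0, 0.0, 0.2)],
]
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in eigenfaces]  # random 3DMM weights
landmarks = project_landmarks(synthesize_shape(mean_shape, eigenfaces, weights), [0, 3])
```

Each training pair then consists of the 2D `landmarks` as the measurement and the `weights` as the system state to be recovered.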

Figure 9: Location and order of 72 standard landmarks on a 2D image of a sample human face.

The measurement noise consists of small perturbations of the 2D landmarks, with outliers as much larger landmark perturbations. We add zero-mean Gaussian noise having a standard deviation of in the training data and in the test data. Outliers are much larger, with a standard deviation of added to 10 of the 72 landmarks in of the training data and of the test data. Landmark point coordinates are in the range , so the outlier magnitudes are very large.

Let subscript represent the set of landmark point indices, in which case the forward model is the orthographic projection


such that converts from homogeneous 3D to homogeneous 2D coordinates, and the measurement noise is


as noise and outliers associated with the projection operator. Since the goal of this inverse problem is to estimate in the 3DMM for a given 3D shape, we write (29) as


For the DM and DC solutions we generated 4000 sample faces as training data, using the Basel Face Model [aldrian2012inverse] as the 3DMM. The DR regularizer is a pre-trained classifier which discriminates a feasible 3D shape from randomly distorted versions of it.

In DC we implemented the forward function layer as described in [aldrian2012inverse], with the resulting DM and DC DNN shown in Figure 10, where we used feed-forward layers because the system input is the vectorized 2D homogeneous coordinates and its output is a weight vector. We design an encoder-decoder structure for the DNNs, so as to map the 2D coordinates to a low-dimensional space and to recover the parameters from that low-dimensional representation.

Figure 10: DNN structure for DM and DC for 3D shape inverse rendering.

For the DR regularizer we trained a five-layer MLP classifier to discriminate between 3D face shapes, generated by the BFM, and randomly generated 3D point clouds as negative examples.

Figure 11 shows visual results obtained by each solution category, where heat maps visualize the point-wise error magnitude relative to the ground truth. The visual results show that the DM and DC methods capture the main features of the face (including the eyes, nose, and mouth) better than the DR variants, whereas the differences between DM and DC appear to be negligible.

To validate our observations, the numerical results and respective statistical analyses are shown in Tables 5 and 6. Table 5 lists the RMSE values for each solution category. We used 10 out-of-sample faces in the BFM model as test cases for reporting the results. In the case of DR (Nelder-Mead) we set the start point, i.e., , as a random value and report the averaged result over 10 independent runs. Note that the RMSE values are expected to be relatively large, since each 3D face shape provided by BFM is a point cloud of 3D coordinates in the range . As a point of comparison, we computed the average RMSE between a set of 500 generated 3D faces and 1000 randomly generated faces, to give a sense of RMSE normalization relative to random prediction. The average RMSE for random prediction is , a factor of two to four times larger than the RMSE values reported in Table 5.

Figure 11: Qualitative results for 3D inverse rendering. Each result is shown as two faces: the upper face is the actual 3D result, and the lower face is a heat map showing the error magnitude at each point of the predicted face. For the DR method, the average error magnitude over 20 runs is reported. We use the Basel Face Model (BFM) [aldrian2012inverse] which is based on a 3D mean face and compensates for outliers.
Training Data   Method   Noisy Test Cases ()   Noisy + Outlier Test Cases ()
Noisy + Outlier
Table 5: Average test RMSE with standard deviation values (over 10 out-of-sample faces of the BFM [aldrian2012inverse]) for 3D shape inverse rendering.
Training Data     Method   Test Data: Noisy               Test Data: Noisy + Outlier
                           DM     DC     DR-GA  DR-NM     DM     DC     DR-GA  DR-NM
Noise-free        DM       -      0.19   0.43   0.30      -      0.06   0.06   0.06
                  DC       0.19   -      0.43   0.30      0.06   -      0.06   0.06
                  DR-GA    0.43   0.43   -      0.78      0.06   0.06   -      0.78
                  DR-NM    0.30   0.30   0.78   -         0.06   0.06   0.78   -
Noisy             DM       -      0.19   0.19   0.30      -      0.06   0.06   0.06
                  DC       0.19   -      0.30   0.30      0.06   -      0.12   0.30
                  DR-GA    0.19   0.30   -      0.78      0.06   0.12   -      0.78
                  DR-NM    0.30   0.30   0.78   -         0.06   0.30   0.78   -
Noisy + Outlier   DM       -      0.06   0.06   0.06      -      1      0.06   0.06
                  DC       0.06   -      0.06   0.06      1      -      0.06   0.06
                  DR-GA    0.06   0.06   -      0.78      0.06   0.06   -      0.78
                  DR-NM    0.06   0.06   0.78   -         0.06   0.06   0.78   -
Table 6: Wilcoxon signed rank test p-values for the 3D shape inverse rendering problem.

Table 6 shows the Wilcoxon p-values for the statistical significance of the differences between the values reported in Table 5, where we consider a p-value threshold of .

Based on the preceding numerical results and statistical analysis, we claim the following about each solution category when faced with reconstruction inverse problems:

  • Broadly, for training and test data not involving outliers, the overall performance of the methods is similar, with DM performing best. This observation suggests that the learning phase is not crucial when only noise is present, and that methods which concentrate on the test data can achieve performance equal to that of trainable frameworks.

  • In cases involving outliers the methods differ more clearly in performance, with the DM and DC methods, which have a learning phase for optimizing their main objective term, outperforming the DR variants. We conclude that a learning phase is important to make methods robust to outliers.

  • In the case of DR, the results show similar performance of the GA and NM optimization schemes, with GA outperforming NM. This observation encourages the reader to use optimization methods with more exploration power [eftimov2019novel], the ability of an optimization method to search broadly across the whole solution space, for DR solutions to reconstruction problems.

  • In all cases, we can observe that although DC is unsupervised, its performance when solving reconstruction inverse problems is close to that of DM, even outperforming DM in the case of outliers. Therefore, it is possible to solve reconstruction problems even without label information in the training phase.

  • One interesting observation is that while 3D shape inverse rendering is a complex reconstruction problem, the results for each solution category are qualitatively similar to those of the very different and far simpler inverse problem of linear regression, where DC similarly performed strongly when the training data contained noisy and outlier samples.

4.4 Single Object Tracking (Dynamic Estimation)

Up to this point we have investigated deep learning approaches applied to static problems. We would now like to examine a dynamic inverse problem, that of single-object tracking.

The classical approach for tracking is the Kalman Filter (KF) [fieguth2010statistical] and its many variations, all based on a predictor-corrector framework, meaning that the filter alternates between prediction (asserting the time-dynamics) and correction (asserting information based on the measurements). For the inverse problem under study, we consider the estimation of the current location (filtering) in a two-dimensional environment. Synthetic object tracking problems, as considered here, are studied in a variety of object tracking papers [kim2019labeled, choi2013rgb, black2003novel, lyons2009locating], where the specific tracking problem in this section is inspired by the approach of [fraccaro2017disentangled, vermaak2003variational].
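The predictor-corrector alternation can be illustrated with a minimal scalar Kalman filter. The sketch below assumes random-walk dynamics and treats the two image axes independently; the variances are hypothetical, and this simplified filter is shown only to illustrate the framework, not as a baseline used in the experiments.

```python
def kalman_step(x_est, p_est, measurement, process_var, meas_var):
    """One predict/correct cycle of a scalar Kalman filter with random-walk dynamics."""
    # Predict: the state is assumed unchanged, so only the uncertainty grows.
    x_pred = x_est
    p_pred = p_est + process_var
    # Correct: blend prediction and measurement according to the Kalman gain.
    gain = p_pred / (p_pred + meas_var)
    x_new = x_pred + gain * (measurement - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def track(measurements, process_var=0.01, meas_var=0.25):
    """Filter 2D measurements by running an independent scalar filter per axis."""
    (x, y), px, py = measurements[0], meas_var, meas_var
    estimates = [(x, y)]
    for mx, my in measurements[1:]:
        x, px = kalman_step(x, px, mx, process_var, meas_var)
        y, py = kalman_step(y, py, my, process_var, meas_var)
        estimates.append((x, y))
    return estimates


print(kalman_step(0.0, 1.0, 2.0, 0.0, 1.0))  # (1.0, 0.5)
```

With equal prior and measurement variances the gain is 0.5, so the corrected estimate lies halfway between prediction and measurement and the variance is halved.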

The inverse problem task is to estimate the current ball location, given the noisy measurement in the corresponding time step and the previous state of the ball. Formally, we denote the measured ball location by , and the system state, the current location of the ball, as . The graphical model in Figure 12 illustrates the problem definition of the tracking problem, where the objective of the inverse problem is to address the dashed line, the inference of system state from corresponding measurement.

Figure 12: Graphical model for single object tracking: the goal is to estimate the location of a moving ball in the current frame in a bounded 2D environment. denotes the current measured location and is the current state.

To perform the experiments, we generate the training and test sets similar to [fraccaro2017disentangled] except that we assume that our measurements are received from a detection algorithm, which detects the ball location from input images having a size of pixels, and that the movement of the ball is non-linear.

In each training and test sequence the ball starts from a random location in the 2D environment, with a random speed and direction, and then moves for 30 time steps. The dynamics of the generated data update the ball location and its velocity as


where is a constant and is set to . In our data, collisions with walls are fully elastic and the velocity decreases exponentially over time. In this simulation, the training and testing data-sets contain 10000 and 3000 sequences of 30-time steps, respectively.

The training measurement noise is


a mixture model of Gaussian noise with 5% outliers. The testing noise is similar,


with a higher likelihood of outliers.
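A rough sketch of this kind of data generation is shown below. The box size, speed range, decay factor, noise level, and the choice of uniformly distributed outliers are all hypothetical; the constants and the exact mixture model used in the experiments are not reproduced here.

```python
import math
import random


def simulate_ball(rng, steps=30, size=32.0, decay=0.95):
    """Trajectory of a ball in a square box with elastic walls and decaying speed.

    size and decay are hypothetical values chosen for this sketch.
    """
    x, y = rng.uniform(0.0, size), rng.uniform(0.0, size)
    speed, angle = rng.uniform(0.5, 2.0), rng.uniform(0.0, 2.0 * math.pi)
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    path = []
    for _ in range(steps):
        x, y = x + vx, y + vy
        # Elastic collision: mirror the position at the wall and flip the velocity.
        if x < 0.0 or x > size:
            x, vx = (-x if x < 0.0 else 2.0 * size - x), -vx
        if y < 0.0 or y > size:
            y, vy = (-y if y < 0.0 else 2.0 * size - y), -vy
        vx, vy = decay * vx, decay * vy  # exponential decay of the velocity
        path.append((x, y))
    return path


def measure(path, rng, sigma=0.5, outlier_prob=0.05, size=32.0):
    """Gaussian measurement noise, with a fraction of samples replaced by outliers."""
    noisy = []
    for x, y in path:
        if rng.random() < outlier_prob:
            noisy.append((rng.uniform(0.0, size), rng.uniform(0.0, size)))
        else:
            noisy.append((x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)))
    return noisy


rng = random.Random(0)
path = simulate_ball(rng)
observations = measure(path, rng)
```

Raising `outlier_prob` for the test sequences mimics the higher likelihood of outliers at test time.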

The inverse problem is single-target tracking for which the dynamic of the model is unknown. The inverse problem of interest is to find in


As shown in Figure 12, we can model our problem as a first-order Markov model where the current measurement is independent of others given the current system state. The forward model is then defined as


Markov models can be represented using recurrent neural networks (RNNs) [krishnan2016structured, hafner2019learning, rangapuram2018deep, coskun2017long]. The DNN structure for the DM and DC solution categories is shown in Figure 13, in which the LSTM layers allow the learning process to capture the temporal state and dynamic information in the data sequences.

Figure 13: DNN structure for DM and DC solution categories in the case of single object tracking problem.

We design the regularizer of the DR category as a classifier of location feasibility, where feasible locations are those lying within the border of the 2D environment. Figure 14 shows the positive and negative samples which we used to train the DR regularizer.

Figure 14: The positive and negative samples used for training the DR regularizer, where the black and gray samples are in the positive and negative classes, respectively.

As before, we used the GA (DR-GA) and Nelder-Mead (DR-NM) algorithms as optimizers for DR. In the case of Nelder-Mead, the results vary as a function of the starting point, and we found that using the last measurement of the sequence as the starting point empirically gave the best result for DR-NM.

4.4.1 Visual and Numerical Results and Statistical Analysis

Table 7 includes the numerical results obtained by each method in our experiments, where we report the average RMSE between reference and predicted points on the test trajectory as the evaluation criterion for each method.

Training Data
Noisy + Outlier
Table 7: RMSE obtained by deep learning solution categories for tracking. The test data include both noise and outliers.
p-value (Wilcoxon Test)
         Training Data: Noise-free       Training Data: Noisy            Training Data: Noisy + Outlier
         DM     DC     DR-GA  DR-NM      DM     DC     DR-GA  DR-NM      DM     DC     DR-GA  DR-NM
DM       -      0.160  0.002  0.002      -      0.002  0.002  0.002      -      0.002  0.002  0.002
DC       0.160  -      0.013  0.130      0.002  -      0.322  0.027      0.002  -      0.002  0.002
DR-GA    0.002  0.013  -      0.002      0.002  0.322  -      0.002      0.002  0.002  -      0.002
DR-NM    0.002  0.130  0.002  -          0.027  0.002  0.002  -          0.002  0.002  0.002  -
Table 8: Pairwise p-values for tracking: the Wilcoxon signed rank test checks whether the obtained results are significantly different.

The obtained results and their statistical analysis are shown in Tables 7 and 8, based on which we conclude that

  • In the case of single object tracking, for which system parameters are permitted to evolve and be measured over time [fieguth2010statistical], the DM category achieves the best performance using all types of training data. The results are improved when the training data contain representative noise and outliers.

  • When the training data do not include outliers, the DR-NM category ranks second after DM. Note that DR-NM is an unsupervised framework without a learning phase, showing that a learning phase is not necessarily required, and that looking only at the test cases can give reasonable results.

  • When the training data include noisy and outlier samples, the solutions’ behaviour for single object tracking is similar to that of restoration problems. In particular, in single object tracking the measurements and system parameters are in the same space, like restoration problems.

  • For the DR solution category in dynamic estimation problems, we observe that, unlike in reconstruction problems, the NM optimization scheme performs better than the GA approach. This emphasizes the importance of exploitation power [eftimov2019novel, xu2014exploration], i.e., the ability of an optimization method to concentrate on a specific region of the solution space.

5 Discussion

Based on the statistical analyses adopted for robustness evaluation for each case, Table 9 summarizes the overall findings, for linear regression and 3D shape inverse rendering as reconstruction, image denoising as restoration, and single object tracking as dynamic estimation.

Inverse Problem              Problem Type         Training Data     Test Data         Score (Larger is better)
Linear Regression            Reconstruction       Noise-free        Noisy + Outlier   DC > (DR-GA = DR-NM) > DM
                                                  Noisy + Outlier   Noisy + Outlier   (DM = DC) > (DR-GA = DR-NM)
3D Shape Inverse Rendering   Reconstruction       Noise-free        Noisy             DM = DC = DR-GA = DR-NM
                                                  Noise-free        Noisy + Outlier   DC > (DR-GA = DR-NM) > DM
                                                  Noisy             Noisy             DM = DC = DR-GA = DR-NM
                                                  Noisy             Noisy + Outlier   DM > (DC = DR-GA = DR-NM)
                                                  Noisy + Outlier   Noisy             DM > DC > (DR-GA = DR-NM)
                                                  Noisy + Outlier   Noisy + Outlier   (DM = DC) > (DR-GA = DR-NM)
Image Denoising              Restoration          Noisy + Outlier   Noisy + Outlier   DM > DC > (DR-GA = DR-NM)
Single Object Tracking       Dynamic Estimation   Noise-free        Noisy + Outlier   (DM = DC) > DR-NM > DR-GA
                                                  Noisy             Noisy + Outlier   DM > DR-NM > (DC = DR-GA)
                                                  Noisy + Outlier   Noisy + Outlier   DM > DC > DR-NM > DR-GA
Table 9: Performance comparison by solution category and inverse problem types. Note that A > B means that method A is statistically significantly better than method B.

From Table 9 we conclude the following:

  • In the case of reconstruction inverse problems, the presence of outliers in the training phase leads to distinct differences in robustness. Typically, DM will be the best method when the training data include outliers; otherwise DC will outperform the other methods, owing to the data consistency term in its objective.

  • In reconstruction problems, comparing the GA and NM optimization approaches in DR shows that GA achieves better performance, indicating the importance of exploration power in optimization for this class of problems.

  • Restoration inverse problems, which recover the system parameters from measurements in the same space, need label information (as in DM) to be robust against noise and outliers.

  • In the case of restoration problems in static estimation, DM has the highest rank among the tested methods. We believe this is because, in the process of finding a mapping from one space to itself, exploiting an accurate solution matters, and this property is achieved by using label information when training the framework.

  • In the case of dynamic estimation problems, the DR solution performs well when the training data do not include outlier samples. Therefore we conclude that this class of problems could be solved without needing a learning phase and that solely the test case is sufficient to find a robust solution.

  • Dynamic estimation problems have additional challenges stemming from the time-dependent state information to be captured, an attribute which leads the solutions to behave differently from those of other problem types. We observed similarities in the robustness of the solution categories between a dynamic estimation problem and a static estimation problem that share the same measurement and system parameter spaces.

6 Conclusions

This paper investigated deep learning strategies to explicitly solve inverse problems. The literature on deep learning methods for solving inverse problems was classified into three categories, each of which was evaluated on sample inverse problems of different types. Our focus is on the robustness of the different categories, particularly with respect to their handling of noise and outliers. The results show that each solution category behaves differently, in the sense of strengths and weaknesses with regard to problem assumptions, such that the problem characteristics need to be considered in selecting an appropriate solution mechanism for a given inverse problem.

Typically, reconstruction problems need more exploration power, and the existence of outliers in their training data makes the DM category the most robust among the deep learning solution categories. Otherwise, when the training data for reconstruction problems do not include outliers, DC achieves the best performance, despite not using label information in its training phase. Restoration problems need a greater degree of exploitation power, for which the DM methods are best suited. In the case of dynamic estimation problems, when the training data do not include outliers, DR ranks second, indicating that dynamic estimation problems can be solved with reasonable robustness in the presence of noise without a need for learning.