Random Mesh Projectors for Inverse Problems
We develop a new learning-based approach to ill-posed inverse problems. Instead of directly learning the complex mapping from the measured data to the reconstruction, we learn an ensemble of simpler mappings from data to projections of the unknown model into random low-dimensional subspaces. We form the reconstruction by combining the estimated subspace projections. Structured subspaces of piecewise-constant images on random Delaunay triangulations allow us to address inverse problems with extremely sparse data and still get good reconstructions of the unknown geometry. This choice also makes our method robust against arbitrary data corruptions not seen during training. Further, it marginalizes the role of the training dataset which is essential for applications in geophysics where ground-truth datasets are exceptionally scarce.READ FULL TEXT VIEW PDF
Random Mesh Projectors for Inverse Problems
A variety of imaging inverse problems can be discretized to a linear system where is the measured data, the imaging or forward operator, the object being probed by applying (often called the model), and the noise. Depending on the application, the set of plausible reconstructions could model natural, seismic, or biomedical images. In many cases the resulting inverse problem is ill-posed, either because of the poor conditioning of (a consequence of the underlying physics) or because .
A classical approach to solve ill-posed inverse problems is to minimize an objective functional regularized via a certain norm (e.g. , , total variation (TV) seminorm) of the model. These methods promote general properties such as sparsity or smoothness of reconstructions, sometimes in combination with learned synthesis or analysis operators, or dictionaries (Sprechmann et al. ).
In this paper, we address situations with very sparse measurement data () so that even a coarse reconstruction of the unknown model is hard to get with traditional regularization schemes. Unlike artifact-removal scenarios where applying a regularized pseudoinverse of the imaging operator already brings out considerable structure, we look at applications where standard techniques cannot produce a reasonable image (Figure 1). This highly unresolved regime is common in geophysics and requires alternative, more involved strategies (Galetti et al. ).
An appealing alternative to classical regularizers is to use deep neural networks. For example, generative models (GANs) based on neural networks have recently achieved impressive results in regularization of inverse problems (Bora et al. , Lunz et al. ). However, a difficulty in geophysical applications is that there are very few examples of ground truth models available for training (sometimes none at all). Since GANs require many, they cannot be applied to such problems. This suggests to look for methods that are not very sensitive to the training dataset. Conversely, it means that the sought reconstructions are less detailed than what is expected in data-rich settings; for an example, see the reconstructions of the Tibetan plateau (Yao et al. ).
In this paper, we propose a two-stage method to solve ill-posed inverse problems using random low-dimensional projections and convolutional neural networks. We first decompose the inverse problem into a collection of simpler learning problems of estimating projections into random (but structured) low-dimensional subspaces of piecewise-constant images. Each projection is easier to learn in terms of generalization error (Cooper ) thanks to its lower Lipschitz constant.
In the second stage, we solve a new linear inverse problem that combines the estimates from the different subspaces. We show that this converts the original problem with possibly non-local (often tomographic) measurements into an inverse problem with localized measurements, and that in fact, in expectation over random subspaces the problem becomes a deconvolution. Intuitively, projecting into piecewise-constant subspaces is equivalent to estimating local averages—a simpler problem than estimating individual pixel values. Combining the local estimates lets us recover the underlying structure. We believe that this technique is of independent interest in addressing inverse problems.
We test our method on linearized seismic traveltime tomography (Bording et al. , Hole ) with sparse measurements and show that it outperforms learned direct inversion in quality of achieved reconstructions, robustness to measurement errors, and (in)sensitivity to the training data. The latter is essential in domains with insufficient ground truth images.
), the past few years have seen the number of related deep learning papers grow exponentially. The majority address biomedical imaging (Güler and Übeylı , Hudson and Cohen ) with several special issues111IEEE Transactions on Medical Imaging, May 2016 (Greenspan et al. ); IEEE Signal Processing Magazine, November 2017, January 2018 (Porikli et al. [2017, 2018]). and review papers (Lucas et al. , McCann et al. 
) dedicated to the topic. All these papers address reconstruction from subsampled or low-quality data, often motivated by reduced scanning time or lower radiation doses. Beyond biomedical imaging, machine learning techniques are emerging in geophysical imaging (Araya-Polo et al. , Lewis et al. , Bianco and Gertoft ), though at a slower pace, perhaps partly due to the lack of standard open datasets.
Existing methods can be grouped into non-iterative methods that learn a feed-forward mapping from the measured data (or some standard manipulation such as adjoint or a pseudoinverse) to the model (Jin et al. , Pelt and Batenburg , Zhu et al. , Wang , Antholzer et al. , Han et al. , Zhang et al. ); and iterative energy minimization methods, with either the regularizer being a neural network (Li et al. ), or neural networks replacing various iteration components such as gradients, projectors, or proximal mappings (Kelly et al. , Adler and Öktem [2017b, a], Rick Chang et al. ). These are further related to the notion of plug-and-play regularization (Venkatakrishnan et al. ), as well as early uses of neural nets to unroll and adapt standard sparse reconstruction algorithms (Gregor and LeCun , Xin et al. ). An advantage of the first group of methods is that they are fast; an advantage of the second group is that they are better at enforcing data consistency.
A rather different take was proposed in the context of compressed sensing where the reconstruction is constrained to lie in the range of a pretrained generative network (Bora et al. [2017, 2018]). Their scheme achieves impressive results on random sensing operators and comes with theoretical guarantees. However, training generative networks requires many examples of ground truth and the method is inherently subject to dataset bias. Here, we focus on a setting where ground-truth samples are very few or impossible to obtain.
There are connections between our work and sketching (Gribonval et al. , Pilanci and Wainwright ) where the learning problem is also simplified by random low-dimensional projections of some object—either the data or the unknown reconstruction itself (Yurtsever et al. ). This also exposes natural connections with learning via random features (Rahimi and Recht [2008, 2009]).
The two stages of our method are (i) decomposing a “hard” learning task of directly learning an unstable operator into an ensemble of “easy” tasks of estimating projections of the unknown model into low-dimensional subspaces; and (ii) combining these projection estimates to solve a reformulated inverse problem for . The two stages are summarized in Figure 2. While our method is applicable to continuous and non-linear settings, we focus on linear finite-dimensional inverse problems.
Statistical learning theory tells us that the number of samples required to learn a -variate -Lipschitz function to a given sup-norm accuracy is (Cooper ). While this result is proved for scalar-valued multivariate maps, it is reasonable to expect the same scaling in
to hold for vector-valued maps. This motivates us to study Lipschitz properties of the projected inverse maps.
We wish to reconstruct , an -pixel image from where is large (we think of as an discrete image). We assume that the map from to is injective so that it is invertible on its range, and that there exists an -Lipschitz (generally non-linear) inverse ,
In order for the injectivity assumption to be reasonable, we assume that is a low-dimensional manifold embedded in of dimension at most , where is the number of measurements. Since we are in finite dimension, injectivity implies the existence of (Stefanov and Uhlmann ). Due to ill-posedness, is typically large.
Consider now the map from the data to a projection of the model into some -dimensional subspace , where . Note that this map exists by construction (since is injective on ), and that it must be non-linear. To see this, note that the only consistent222Consistent meaning that if already lives in , then the map should return . linear map acting on is an oblique, rather than an orthogonal projection on (cf. Section 2.4 in Vetterli et al. ). We explain this in more detail in Appendix A.
Denote the projection by and assume is chosen uniformly at random.333One way to construct the corresponding projection matrix is as , where is a matrix with standard iid Gaussian entries. We want to evaluate the expected Lipschitz constant of the map from to , noting that it can be written as :
where the first inequality is Jensen’s inequality, and the second one follows from
and the observation that . In other words, random projections reduce the Lipschitz constant by a factor of on average. Since learning requires samples, this allows us to work with exponentially fewer samples and makes the learning task easier. Conversely, given a fixed training dataset, it gives more accurate estimates.
The above example uses unstructured random subspaces. In many inverse problems, such as inverse scattering (Beretta et al. , Di Cristo and Rondi ), a judicious choice of subspace family can give exponential improvements in Lipschitz stability. Particularly, it is favorable to use piecewise-constant images: with being indicator functions of some domain subset.
Motivated by this observation, we use piecewise-constant subspaces over random Delaunay triangle meshes. The Delaunay triangulations enjoy a number of desirable learning-theoretic properties. For function learning it was shown that given a set of vertices, piecewise linear functions on Delaunay triangulations achieve the smallest sup-norm error among all triangulations (Omohundro ).
We sample sets of points in the image domain from a uniform-density Poisson process and construct (discrete) Delaunay triangulations with those points as vertices. Let be the collection of subspaces of piecewise-constant functions on these triangulations. Let further be the map from to the projection of the model into subspace , . Instead of learning the “hard” inverse mapping , we propose to learn an ensemble of simpler mappings .
We approximate each by a convolutional neural network, , parameterized by a set of trained weights . Similar to Jin et al. , we do not use the measured data directly as this would require the network to first learn to map back to the image domain; we rather warm-start the reconstruction by a non-negative least squares reconstruction, , computed from . The weights are chosen by minimizing empirical risk:
where is a set of training models and non-negative least squares measurements.
By learning projections onto random subspaces, we transform our original problem into that of estimating from . To see how this can be done, ascribe to the columns of a natural orthogonal basis for the subspace , with being the indicator function of the th triangle in mesh . Denote by the mapping from the data to an estimate of the expansion coefficients of in the basis for :
Let , and ; then we can estimate using the following reformulated problem:
and the corresponding regularized reconstruction:
with chosen as the TV-seminorm . The regularization is not essential. As we show experimentally, if is sufficiently large, is not required. Note that solving the original problem directly using regularizer fails to recover the structure of the model (Figure 1).
Since the true inverse map has a large Lipschitz constant, it would seem reasonable that as the number of mesh subspaces grows large (and their direct sum approaches the whole ambient space ), the Lipschitz properties of should deteriorate as well.
Denote the unregularized inverse mapping in (2) by . Then we have the following estimate:
the smallest (non-zero) singular value ofand the Lipschitz constant of the stable projection mappings . Indeed, we observe empirically that grows large as the number of subspaces increases which reflects the fact that although individual projections are easier to learn, the full-resolution reconstruction remains ill-posed.
Estimates of individual subspace projections give correct local information. They convert possibly non-local measurements (e.g. integrals along curves in tomography) into local ones. The key is that these local averages (subspace projection coefficients) can be estimated accurately (see Section 4).
To further illustrate what we mean by correct local information, consider a simple numerical experiment with our reformulated problem, , where is an all-zero image with a few pixels “on”. For the sake of clarity we assume the coefficients are perfect. Recall that is a block matrix comprising
subspace bases stacked side by side. It is a random matrix because the subspaces are generated at random, and therefore the reconstructionis also random. We approximate by simulating a large number of -tuples of meshes and averaging the obtained reconstructions.
Results are shown in Figure 3 for different numbers of triangles per subspace, , and subspaces per reconstruction, . As or increase, the expected reconstruction becomes increasingly localized around non-zero pixels. The following proposition (proved in Appendix B) tells us that this phenomenon can be modeled by convolution.444We note that this result requires adequate handling of boundary conditions; for the lack of space we omit the straightforward details.
Let be the solution to given as . Then there exists a kernel , with a discrete index, such that . Furthermore, is isotropic.
While Figure 3 suggests that more triangles are better, we note that this increases the subspace dimension which makes getting correct projection estimates harder. Instead we choose to stack more meshes with a smaller number of triangles.
Intuitively, since every triangle average depends on many measurements, estimating each average is more robust to measurement corruptions as evidenced in Section 4. Accurate estimates of local averages enable us to recover the geometric structure while being more robust to data errors.
In traveltime tomography, we measure wave travel times between sensors as in Figure 4. Travel times depend on the medium property called slowness (inverse of speed) and the task is to reconstruct the spatial slowness map. Image intensities are a proxy for slowness maps—the lower the image intensity the higher the slowness. In the straight-ray approximation, the problem data is modeled as integral along line segments:
where is the continuous slowness map and are sensor locations. In our experiments, we use a pixel grid with sensors (300 measurements) placed uniformly in an inscribed circle, and corrupt the measurements with zero-mean iid Gaussian noise.
We generate random Delaunay meshes each with 50 triangles. The corresponding projector matrices compute average intensity over triangles to yield a piecewise constant approximation of . We test two distinct architectures: (i) ProjNet, tasked with estimating the projection into a single subspace; and (ii) SubNet, tasked with estimating the projection over multiple subspaces.555Code available at https://github.com/swing-research/deepmesh under the MIT License.
The ProjNet architecture is inspired by the FBPConvNet (Jin et al. ) and the U-Net (Ronneberger et al. ) as shown in Figure 11a in the appendix. Crucially, we constrain the network output to live in by fixing the last layer of the network to be a projector, (Figure 11a). A similar trick in a different context was proposed in (Sønderby et al. ).
We combine projection estimates from many ProjNets by regularized linear least-squares (2) to get the reconstructed model (cf. Figure 2) with the regularization parameter determined on five held-out images. A drawback of this approach is that a separate ProjNet must be trained for each subspace. This motivates the SubNet (shown in Figure 11b). Each input to SubNet is the concatenation of a non-negative least squares reconstruction and 50 basis functions, one for each triangle forming a 51-channel input. This approach scales to any number of subspaces which allows us to get visually smoother reconstructions without any further regularization as in (2). On the other hand, the projections are less precise which can lead to slightly degraded performance.
As a quantitative figure of merit we use the signal-to-noise ratio (SNR). The input SNR is defined as where and
are the signal and noise variance; the output SNR is defined aswith the ground truth and the reconstruction.
130 ProjNets are trained for 130 different meshes with measurements at various SNRs. Similarly, a single SubNet is trained with 350 different meshes and the same noise levels. We compare the ProjNet and SubNet reconstructions with a direct U-net baseline convolutional neural network that reconstructs images from their non-negative least squares reconstructions. The direct baseline has the same architecture as SubNet except the input is a single channel non-negative least squares reconstruction like in ProjNet and the output is the target reconstruction. Such an architecture was proposed by (Jin et al. ) and is used as a baseline in recent learning-based inverse problem works (Lunz et al. , Ye et al. ) and is inspiring other architectures for inverse problems (Antholzer et al. ). We pick the best performing baseline network from multiple networks which have a comparable number of trainable parameters to SubNet. We simulate the lack of training data by testing on a dataset that is different than that used for training.
To demonstrate that our method is robust against arbitrary assumptions made at training time, we consider two experiments. First, we corrupt the data with zero-mean iid Gaussian noise and reconstruct with networks trained at different input noise levels. In Figures 5a, 12 and Table 1, we summarize the results with reconstructions of geo images taken from the BP2004 dataset666http://software.seg.org/datasets/2D/2004_BP_Vel_Benchmark/ and x-ray images of metal castings (Mery et al. ). The direct baseline and SubNet are trained on a set of 20,000 images from the arbitrarily chosen LSUN bridges dataset (Yu et al. ) and tested with the geophysics and x-ray images. ProjNets are trained with 10,000 images from the LSUN dataset. Our method reports better SNRs compared with the baseline. We note that direct reconstruction is unstable when trained on clean and tested on noisy measurements as it often hallucinates details that are artifacts of the training data. For applications in geophysics it is important that our method correctly captures the shape of the cavities unlike the direct inversion which can produce sharp but wrong geometries (see outlines in Figure 5a).
Second, we consider a different corruption mechanism where traveltime measurements are erased (set to zero) independently with probability , and use networks trained with 10 dB input SNR on the LSUN dataset to reconstruct. Figure 5b and Table 2 summarizes our findings. Unlike with Gaussian noise (Figure 5a) the direct method completely fails to recover coarse geometry in all test cases. In our entire test dataset of 102 x-ray images there is not a single example where the direct network captures a geometric feature that our method misses. This demonstrates the strengths of our approach. For more examples of x-ray images please see Appendix E.
Figure 6 illustrates the influence of the training data on reconstructions. Training with LSUN, CelebA (Liu et al. ) and a synthetic dataset of random overlapping shapes (see Figure 15 in Appendix for examples) all give comparable reconstructions—a desirable property in applications where real ground truth is unavailable.
We complement our results with reconstructions of checkerboard phantoms (standard resolution tests) and x-rays of metal castings in Figure 7. We note that in addition to better SNR, our method produces more accurate geometry estimates, as per the annotations in the figure.
We proposed a new approach to regularize ill-posed inverse problems in imaging, the key idea being to decompose an unstable inverse mapping into a collection of stable mappings which only estimate low-dimensional projections of the model. By using piecewise-constant Delaunay subspaces, we showed that the projections can indeed be accurately estimated. Combining the projections leads to a deconvolution-like problem. Compared to directly learning the inverse map, our method is more robust against noise and corruptions. We also showed that regularizing via projections allows our method to generalize across training datasets. Our reconstructions are better both quantitatively in terms of SNR and qualitatively in the sense that they estimate correct geometric features even when measurements are corrupted in ways not seen at training time. Future work involves getting precise estimates of Lipschitz constants for various inverse problems, regularizing the reformulated problem using modern regularizers (Ulyanov et al. ), studying extensions to non-linear problems and developing concentration bounds for the equivalent convolution kernel.
This work utilizes resources supported by the National Science Foundation’s Major Research Instrumentation program, grant #1725729, as well as the University of Illinois at Urbana-Champaign.
ECG beat classifier designed by combined neural network model.Pattern Recognition, 38(2):199–208, 2005.
European Conference on Computer Vision, pages 630–645. Springer, 2016a.
Neural networks and artificial intelligence for biomedical engineering. Wiley Online Library, 2000.
We explain the need for non-linear operators even in the absence of noise with reference to Figure 8. Projecting into a given known subspace is a simple linear operation, so it may not be a priori clear why we use non-linear neural networks to estimate the projections. Alas, we do not know and only have access to . Suppose that there exists a linear operator (a matrix) which acts on and computes the projection of on . A natural requirement on is consistency: if already lives in , then we would like to have . This implies that for any , not necessarily in , we require which implies that is an idempotent operator. Letting the columns of be a basis for , it is easy to see that the least squares minimizer for is . However, because ( is the adjoint of , simply a transpose for real matrices), in general it will not hold that . Thus, is an oblique, rather than orthogonal projection into . In Figure 8 this corresponds to the point which can be arbitrarily far from the orthogonal projection . The nullspace of this projection is precisely .
Thus consistent linear operators can at best yield oblique projections which can be far from the orthogonal one. One could also see this geometrically from Figure 8. As the angle between and increases to the oblique projection point travels to infinity (note that the oblique projection always happens along the nullspace of , which is the line orthogonal to . Since our subspaces are chosen at random, in general they are not aligned with . The only subspace on which we can linearly compute an orthogonal projection from is ; this is given by the Moore-Penrose pseudoinverse. Therefore, to get the orthogonal projection onto random subspaces, we must use non-linear operators. More generally, for any other ad hoc linear reconstruction operator , always lives in the column space of which is a subspace whose dimension is at most the number of rows of . However, we do not have any linear subspace model for .
As shown in the right half of Figure 8, as soon as is injective on , the existence of this non-linear map is guaranteed by construction: since determines , it also determines .
We show the results of numerical experiments in Figures 9 and 10 which further illustrate the performance difference between linear oblique projectors and our non-linear learned operator when estimating the projection of an image into a random subspace. We refer the reader to the captions below each figure for more details.
The reconstruction of the new inverse problem can be written as where the columns of form a biorthogonal basis to the columns of . Thus
Using the definition of the inner product and rearranging, we get
. Now, the probability distribution of triangles around any pointis both shift- and rotation-invariant because a Poisson process in the plane is shift- and rotation-invariant. It follows that for some , meaning that
which is a convolution of the original model with a rotationally invariant (isotropic) kernel. ∎
Figure 11 explains the network architecture used for ProjNet and SubNet. The network consists of a sequence of downsampling layers followed by upsampling layers, with skip connections (He et al. [2016b, a]) between the downsampling and upsampling layers. Each ProjNet output is constrained to a single subspace by applying a subspace projection operator, . We train such networks and reconstruct from the projection estimate using (2). SubNet is a single network that is trained over multiple subspaces. To do this, we change its input to be . Moreover, we apply the same projection operator as ProjNet to the output of the SubNet. Each SubNet is trained to give projection estimates over random subspaces. This approach allows us to scale to any number of subspaces without training new networks for each. Moreover, this allows us to build an over-constrained system to solve. Even though SubNet has almost as many parameters as the direct net, reconstructing via the projection estimates allows SubNet to get higher SNR and more importantly, get better estimates of the coarse geometry than the direct inversion. All networks are trained with the Adam optimizer.
We showcase more reconstructions on actual geophysics images taken from the BP2004 dataset in Figure 12. Note that all networks were trained on the LSUN bridges dataset.
We show additional reconstructions for the largest corruption case, , for x-ray images (Figure 13) and geo images (Figure 14). Our method consistently has better SNR. More importantly we note that there is not a single instance where the direct reconstruction gets a feature that our methods do not. The majority of times, the direct network misses a feature of the image. This is highly undesirable in settings such as geophysical imaging.