Neural networks toolbox focused on medical image analysis
A wide range of systems exhibit high dimensional incomplete data. Accurate estimation of the missing data is often desired, and is crucial for many downstream analyses. Many state-of-the-art recovery methods involve supervised learning using datasets containing full observations. In contrast, we focus on unsupervised estimation of missing image data, where no full observations are available - a common situation in practice. Unsupervised imputation methods for images often employ a simple linear subspace to capture correlations between data dimensions, omitting more complex relationships. In this work, we introduce a general probabilistic model that describes sparse high dimensional imaging data as being generated by a deep non-linear embedding. We derive a learning algorithm using a variational approximation based on convolutional neural networks and discuss its relationship to linear imputation models, the variational auto encoder, and deep image priors. We introduce sparsity-aware network building blocks that explicitly model observed and missing data. We analyze proposed sparsity-aware network building blocks, evaluate our method on public domain imaging datasets, and conclude by showing that our method enables imputation in an important real-world problem involving medical images. The code is freely available as part of the neuron library at http://github.com/adalca/neuron.
Highly incomplete data are found in a wide variety of domains. Sensor failure, occlusions, or sparsity by design all lead to missing data. For example, LIDAR scan data, providing depth measurements in a variety of problems, yield sparse point clouds [5, 28, 47]. Our work is motivated by a challenging real world medical imaging problem. In many clinical settings, medical image scanning time is limited by cost and physical or patient care constraints, leading to severely under-sampled images [8, 9, 43]. For example, in many clinical settings, only every sixth 2D slice is acquired in a 3D MRI scan, resulting in 83% of the anatomical data being missing. Estimating the missing data can yield meaningful clinical insight and help with downstream tasks such as registration and segmentation.
State of the art methods for imputation, or estimation of missing values, often use statistics learned across the entire dataset. Supervised methods that fill in missing values rely on datasets of full observations to learn relationships between present and missing data [2, 12, 14, 22, 52, 53]. For example, super-resolution methods, which can be viewed as imputation of higher-resolution pixel data, require high resolution data to learn image statistics [12, 14, 22]. In addition, super-resolution methods operate on a regular grid, which is not applicable in many missing data settings. Image inpainting methods exploit statistical structure learned from images to impute a missing region, but often require densely observed regions of the existing images to learn image statistics [2, 52, 53]. Other subspace methods require fully observed data to learn a low-dimensional data representation, and then use these representations to impute missing information in sparsely observed data.
However, in many problems, fully observed data is unavailable or difficult to acquire. In this work, we focus on the recovery of missing image data in an unsupervised setting where no full observations are available, but some common structure exists across the dataset. Unsupervised statistical imputation methods, spanning the literature from dictionary learning, factor analysis, manifold learning, and Principal Component Analysis (PCA) and its variants, often use linear subspace models to capture covariation across a dataset [3, 32]. These models assume each observed data point is a noisy, sparse observation of a lower dimensional linear subspace. They have been successfully applied in areas such as network traffic flows or collaborative filtering, where the high dimensional data can often be well represented by a linear embedding. However, in many settings such as imaging, linear models are insufficient in capturing data representations [27, 40, 41].
In this work, we propose a probabilistic model that frames high dimensional imaging data with very sparse pixel observations as being generated from a non-linear subspace. Starting from this model, we derive a variational learning strategy that employs developments in deep neural networks and variational inference. We introduce building blocks of sparsity-aware deep architectures, and present a data inference method, based on this learning model, that enables both conditional mean imputation and multiple imputation of missing data. We discuss connections and important differences with linear imputation methods, which can be viewed as linear instantiations of our model, with the variational auto encoder, and with deep image priors. We evaluate this approach on unsupervised imputation in several public datasets comparing several model variants and linear alternatives. We analyze individual building blocks, and show that our method enables imputation of medical images in a real-world problem. Finally, we provide a comparative analysis with deep image priors.
Collaborative filtering systems include only a sparse observation of users' preferences [30, 45]. Here, methods aim to learn from user preferences to produce future recommendations. Often, these models build user representations using matrix completion methods, which share a goal with data imputation using linear embeddings. Recent methods have exploited convolutional neural networks for joint user representations with external information regarding content. Other methods use shallow auto-encoders with sparse data and propose specific regularized loss functions [44, 46]. Similar to linear subspace models, these methods can be characterized as instantiations of our model.
Variational Bayes auto-encoders (VAEs) and similar models have been used to learn probabilistic generative models, often in the context of images [23, 24, 42]. Similarly, deep denoising auto-encoders use neural networks to obtain embeddings that are robust to noise. Our method builds on these recent developments to approximate subspaces using neural networks. Importantly, we show that a principled treatment of a sparsity model leads to important and intuitive differences from the VAE.
Deep Image Priors (DIP) use a generative neural network as a structural prior, and can be used to synthesize missing data. For each image independently, the method finds network parameters that best explain the observed pixels. However, as parameters are image specific, this method is not amenable to extreme sparsity where image structure is hard to learn from the few observations of that image. Below, we discuss how our method is similar to DIP, how it differs, and perform a comparison in our experiments.
Several methods define sparse neural networks in other contexts that are not directly related to our task, but still share nomenclature. For example, spatially-sparse CNNs assume a fully observed input, but the content itself is sparse, such as thin writing on a black background [16, 17]. Faster sparse convolutions are proposed by explicitly operating on the pixels that represent content, with the focus of efficient computation. Other methods propose sparsity of the parameter space in neural networks to improve various metrics of network efficiency [18, 33, 50].
During the development of this work, several contemporaneous works have tackled related problems. Partial convolutions have been developed to tackle image inpainting, where parts of the desired images are fully observed. A recent method uses adversarial training to guide a generator network to impute missing data, and introduces a discriminator hint mechanism to enable training. Within the medical image analysis domain, a recent method takes advantage of the similarity of local structure between different acquisition directions to enable subject-specific supervised training and imputation. Outside of imaging-specific methods, recent papers have also shown similar developments based on deep generative models for imputing tabular and time series data [4, 39].
In this section we first present a general generative probabilistic model for sparse data observations using a non-linear subspace, and describe the imputation procedure using this model. We then show a learning algorithm that employs a variational approximation using neural networks, and introduce sparsity-aware neural network building blocks that explicitly model observed and missing data. Figure 2 illustrates an overview of the method.
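Let $x$ denote an image written as a vector of size $d$, which we model as a high dimensional manifestation of a low-dimensional representation $z$ of length $k$:
$$p_\theta(x \mid z) = \mathcal{N}\left(x;\, f_\theta(z),\, \sigma^2 I\right), \tag{1}$$
where $\mathcal{N}(\cdot\,;\, \mu, \Sigma)$ denotes the multivariate normal distribution with mean $\mu$ and covariance $\Sigma$, $I$ denotes the identity matrix, and $f_\theta(\cdot)$ is a (potentially non-linear) function parametrized by $\theta$ that maps data from the low dimensional subspace to a full observation. The variance parameter $\sigma^2$ captures independent pixel noise. We adopt a Gaussian prior for the low dimensional representation:
$$p(z) = \mathcal{N}(z;\, 0,\, I).$$
We let $\mathcal{O}$ indicate the set of observed entries in data $x$, and $x_{\mathcal{O}}$ the corresponding vector of observed values. The set $\mathcal{O}$ varies for each datapoint and is assumed to be small (representing high sparsity). The likelihood of an entire observed dataset $\mathcal{X} = \{x^i_{\mathcal{O}_i}\}_{i=1}^{N}$, where $x^i_{\mathcal{O}_i}$ are observed data points, is therefore:
$$p(\mathcal{X}) = \prod_{i=1}^{N} \int p_\theta\left(x^i_{\mathcal{O}_i} \mid z\right) p(z)\, dz.$$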
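The generative process described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `toy_decoder` is a hypothetical stand-in for the learned non-linear function, and the observed set is drawn uniformly at random, which is one possible pattern among many.

```python
import numpy as np


def sample_sparse_observation(decoder, k, sigma, observed_fraction, rng):
    """Sample one sparse data point from the generative model.

    z ~ N(0, I_k), x ~ N(decoder(z), sigma^2 I), and each entry of x is kept
    independently with probability `observed_fraction` (an assumed pattern).
    """
    z = rng.standard_normal(k)
    mean = decoder(z)
    x = mean + sigma * rng.standard_normal(mean.shape)
    mask = rng.random(mean.shape) < observed_fraction  # True where observed
    return z, x, mask


rng = np.random.default_rng(0)
d, k = 64, 4
A = rng.standard_normal((d, k))


def toy_decoder(z):
    # Hypothetical non-linear map standing in for a trained decoder network.
    return np.tanh(A @ z)


z, x, mask = sample_sparse_observation(toy_decoder, k, sigma=0.1,
                                       observed_fraction=0.1, rng=rng)
x_observed = x[mask]  # the only values a learning algorithm gets to see
```

Only `x_observed` and `mask` would be available in the unsupervised setting; `z` and the full `x` are returned here purely for inspection.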
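We aim to infer the full data point $x$ given a sparse observation $x_{\mathcal{O}}$ using the posterior $p(x \mid x_{\mathcal{O}})$:
$$\begin{aligned}
\log p(x \mid x_{\mathcal{O}}) &= \log \int p_\theta(x \mid z)\, p(z \mid x_{\mathcal{O}})\, dz \\
&\geq \mathbb{E}_{z \sim p(z \mid x_{\mathcal{O}})}\left[\log p_\theta(x \mid z)\right] \\
&= -\frac{1}{2\sigma^2}\, \mathbb{E}_{z \sim p(z \mid x_{\mathcal{O}})}\left[\lVert x - f_\theta(z) \rVert^2\right] + \text{const},
\end{aligned}$$
where we used Jensen's inequality in the second line and model (1) in the last line. To impute data, we use maximum-a-posteriori (MAP) estimation, and approximate it via the lower bound:
$$\hat{x} = \arg\max_x \log p(x \mid x_{\mathcal{O}}) \approx \mathbb{E}_{z \sim p(z \mid x_{\mathcal{O}})}\left[f_\theta(z)\right]. \tag{4}$$
Statistical imputation of the form (4) is referred to as conditional mean imputation, which can underestimate the variance of downstream tasks. It is sometimes desirable to be able to sample the posterior itself, thus producing multiple imputations that give an indication of the variance captured by the model. Our method enables this process by sampling from the posterior approximation, $z_k \sim q_\psi(z \mid x_{\mathcal{O}})$, followed by
$$x_k = f_\theta(z_k).$$
This process projects a sparse data point $x_{\mathcal{O}}$ to a plausible representation $z_k$, and then estimates a possible data point $x_k$.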
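Unfortunately, computing the expectation (4) or sampling from $p(z \mid x_{\mathcal{O}})$ is intractable. Building on recent methods in variational inference such as the VAE, we employ an approximate variational posterior probability $q_\psi(z \mid x_{\mathcal{O}})$ parametrized by $\psi$, and minimize the KL divergence with the true posterior [20, 21, 23, 24]:
$$\arg\min_\psi \mathrm{KL}\left[q_\psi(z \mid x_{\mathcal{O}}) \,\|\, p(z \mid x_{\mathcal{O}})\right] = \arg\min_\psi \left\{ -\mathbb{E}_{z \sim q_\psi}\left[\log p_\theta(x_{\mathcal{O}} \mid z)\right] + \mathrm{KL}\left[q_\psi(z \mid x_{\mathcal{O}}) \,\|\, p(z)\right] \right\}, \tag{6}$$
where the objective on the right is the negative of the variational lower bound of the model evidence $p(x_{\mathcal{O}})$. We model the approximate posterior as a multivariate normal:
$$q_\psi(z \mid x_{\mathcal{O}}) = \mathcal{N}\left(z;\, \mu_\psi(x_{\mathcal{O}}),\, \Sigma_\psi(x_{\mathcal{O}})\right),$$
where $\Sigma_\psi(x_{\mathcal{O}})$ is diagonal. This approximation enables efficient sampling, facilitating imputation using
$$\hat{x} \approx \frac{1}{K} \sum_{k=1}^{K} f_\theta(z_k),$$
where $z_k$ are samples from $q_\psi(z \mid x_{\mathcal{O}})$.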
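A minimal sketch of this sampling-based imputation follows, assuming the encoder has already produced a posterior mean and a diagonal variance for one sparse image. The names `impute` and `toy_decoder`, and the numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def impute(decoder, mu, var_diag, n_samples, rng):
    """Monte Carlo imputation from a diagonal-Gaussian posterior approximation.

    Returns the average decoded sample (conditional mean imputation) and the
    individual decoded samples (multiple imputations).
    """
    eps = rng.standard_normal((n_samples, mu.size))
    z_samples = mu + np.sqrt(var_diag) * eps
    x_samples = np.stack([decoder(z) for z in z_samples])
    return x_samples.mean(axis=0), x_samples


rng = np.random.default_rng(0)
A = rng.standard_normal((16, 3))


def toy_decoder(z):
    # Hypothetical stand-in for a trained decoder network.
    return np.tanh(A @ z)


mu = np.array([0.5, -1.0, 0.2])        # would come from the encoder
var_diag = np.array([0.1, 0.2, 0.05])  # diagonal of the posterior covariance
x_mean, x_samples = impute(toy_decoder, mu, var_diag, n_samples=50, rng=rng)
pixelwise_std = x_samples.std(axis=0)  # rough indication of uncertainty
```

The spread of `x_samples` across draws gives the kind of variance indication that a single conditional-mean estimate cannot provide.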
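The functions $\mu_\psi(\cdot)$ and $\Sigma_\psi(\cdot)$ take only the observed entries of $x$ and compute the subspace statistics. We estimate these using a neural network (the encoder), parameterized by $\psi$. We similarly approximate the generating function $f_\theta(\cdot)$ by a neural network (the decoder), parameterized by $\theta$. We jointly learn the parameters $\theta$ and $\psi$ by optimizing the variational lower bound (6) using stochastic gradient methods. Specifically, for each data point with observed data $x_{\mathcal{O}}$, the resulting loss, up to an additive constant, is:
$$\mathcal{L}(\theta, \psi;\, x_{\mathcal{O}}) = \frac{1}{2\sigma^2 K} \sum_{k=1}^{K} \left\lVert x_{\mathcal{O}} - f_\theta(z_k)_{\mathcal{O}} \right\rVert^2 + \mathrm{KL}\left[q_\psi(z \mid x_{\mathcal{O}}) \,\|\, p(z)\right], \tag{9}$$
where $z_k$ are samples from $q_\psi(z \mid x_{\mathcal{O}})$, and $K$ is the number of samples we draw for each input image. The first term encourages the observed entries of $x$ to be well recovered by the decoder. The second term encourages the subspace posterior $q_\psi(z \mid x_{\mathcal{O}})$ to be close to the prior $p(z)$. Since this posterior depends on only the observed entries, below we introduce several sparsity-aware building blocks for neural networks that handle sparse inputs (Figure 3).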
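To make the two terms of the loss concrete, the sketch below evaluates a masked reconstruction term and a closed-form Gaussian KL term for one data point. It assumes a standard normal prior and a diagonal-Gaussian posterior; the function names, the toy decoder, and the numeric values are hypothetical.

```python
import numpy as np


def gaussian_kl(mu, var_diag):
    """KL[N(mu, diag(var_diag)) || N(0, I)] in closed form."""
    return 0.5 * np.sum(var_diag + mu ** 2 - 1.0 - np.log(var_diag))


def sparse_loss(x, mask, mu, var_diag, decoder, sigma, n_samples, rng):
    """Masked reconstruction term plus KL term for one data point."""
    eps = rng.standard_normal((n_samples, mu.size))
    z_samples = mu + np.sqrt(var_diag) * eps
    recon = 0.0
    for z in z_samples:
        residual = (x - decoder(z))[mask]  # observed entries only
        recon += np.sum(residual ** 2)
    recon /= 2.0 * sigma ** 2 * n_samples
    return recon + gaussian_kl(mu, var_diag)


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))


def toy_decoder(z):
    # Hypothetical stand-in for the decoder network.
    return np.tanh(A @ z)


x = rng.standard_normal(20)
mask = rng.random(20) < 0.3
mu = np.array([0.1, -0.2, 0.3])
var_diag = np.array([0.5, 0.8, 1.2])

loss_a = sparse_loss(x, mask, mu, var_diag, toy_decoder, 0.5, 4,
                     np.random.default_rng(1))
x_changed = x.copy()
x_changed[~mask] = 123.0  # alter only the unobserved entries
loss_b = sparse_loss(x_changed, mask, mu, var_diag, toy_decoder, 0.5, 4,
                     np.random.default_rng(1))
```

Here `loss_a` and `loss_b` agree because the unobserved entries never enter the reconstruction term, which is what allows training without any full observations.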
Fully Connected (or dense) layers enable learning of broad correlations across pixels of a datapoint, and are extensively used in both imaging and non-imaging data. In general, a fully connected layer can be written as $y = Wx + b$, where $y$ is the output (response), $x$ is the input, $W$ is the weight matrix and $b$ is a bias term. When the input is partially observed as $x_{\mathcal{O}}$, a naive computation of the output might involve the columns of $W$ that correspond to the observed entries, $W_{\mathcal{O}}$, via $y = W_{\mathcal{O}} x_{\mathcal{O}} + b$. This is equivalent to filling in the missing values with zeros. This approach, however, does not account for or exploit possible dependencies between entries of $x$. We propose an alternative strategy based on linear models for imputation, where we adopt a linear model $x = Hy + c$. Given observed $x_{\mathcal{O}}$, the response can be computed as:
$$y = \left(H_{\mathcal{O}}^{T} H_{\mathcal{O}}\right)^{-1} H_{\mathcal{O}}^{T} \left(x_{\mathcal{O}} - c_{\mathcal{O}}\right),$$
where $H_{\mathcal{O}}$ are the rows of $H$ that correspond to the observed indices. We propose to use this formulation as a sparsity-aware fully connected layer, where $H$ and $c$ are now the layer parameters. We discuss the linear formulation to imputation, which motivated this layer, in the Subsection Comparison with Linear Subspace Models. In our experiments, we demonstrate that this layer more accurately computes linear projections of sparse data and leads to improved training compared to a traditional fully connected layer.
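The contrast between zero-filling and a sparsity-aware projection can be illustrated with a small NumPy sketch. It assumes the layer is evaluated as a least-squares fit restricted to the observed rows of a linear model; the names `H`, `c`, `zero_fill_fc`, and `sparse_fc` are hypothetical and not taken from the released library.

```python
import numpy as np


def zero_fill_fc(W, b, x, mask):
    """Conventional dense layer with missing inputs treated as zeros."""
    return W[:, mask] @ x[mask] + b


def sparse_fc(H, c, x, mask):
    """Least-squares response using only the observed rows of x = H y + c."""
    H_obs = H[mask]
    y, *_ = np.linalg.lstsq(H_obs, x[mask] - c[mask], rcond=None)
    return y


rng = np.random.default_rng(0)
d, k = 40, 3
H = rng.standard_normal((d, k))
c = rng.standard_normal(d)
y_true = rng.standard_normal(k)
x = H @ y_true + c  # noiseless data lying exactly on the linear model

mask = np.zeros(d, dtype=bool)
mask[::4] = True  # observe every fourth entry (10 of 40)

# A dense layer that would recover y_true exactly from the full input.
W = np.linalg.pinv(H)
b = -W @ c

err_sparse = np.linalg.norm(sparse_fc(H, c, x, mask) - y_true)
err_zero = np.linalg.norm(zero_fill_fc(W, b, x, mask) - y_true)
```

In this noiseless toy setting the sparsity-aware response recovers `y_true` up to numerical precision, while the zero-filled dense layer does not, because the latter treats the missing entries as true zeros.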
For imaging data, hierarchical convolutional operations extract meaningful representations. Existing methods often fill in missing values with a constant, such as zero or the mean value at that pixel across the dataset, but these do not usually accurately estimate the existing image signal.
Given a sparse input image, we experiment with two convolutional approaches. In the first, we apply a weighted convolution, where the convolution kernel $w$ is modified to vary with image location by the binary observed mask $m$. The new filter response at location $j$ is
$$y_j = \frac{\sum_{i \in N(j)} m_i\, w_{i-j}\, x_i}{\sum_{i \in N(j)} m_i},$$
where $N(j)$ indicates all the pixels neighboring $j$ within some filter kernel size. This weighted filter uses only the existing information in computing a response. We then compute the mask to be used at a subsequent layer $l+1$:
$$m^{l+1}_j = \max_{i \in N(j)} m^{l}_i.$$
Since convolutional layers can be applied hierarchically, even very sparse data will often lead to a dense deep feature response. This sort of convolution was recently used to impute LIDAR depth data in a supervised context.
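One way such a mask-weighted convolution could look is sketched below in plain NumPy. The normalization by the number of observed neighbors and the propagation of the mask to any location with at least one observed neighbor are assumptions of this sketch, in the spirit of the sparse convolutions used for LIDAR depth completion; `masked_conv2d` is a hypothetical name and the loops favor clarity over speed.

```python
import numpy as np


def masked_conv2d(x, mask, kernel):
    """Mask-weighted 2D filtering (cross-correlation) with 'same' padding.

    Only observed pixels contribute; the response is normalized by the number
    of observed pixels in the window. Assumes odd kernel sizes.
    Returns the response and the mask for a subsequent layer.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    x_filled = np.where(mask, x, 0.0)  # missing values never contribute
    xp = np.pad(x_filled, ((ph, ph), (pw, pw)))
    mp = np.pad(mask.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    new_mask = np.zeros(x.shape, dtype=bool)
    for r in range(x.shape[0]):
        for c in range(x.shape[1]):
            count = mp[r:r + kh, c:c + kw].sum()
            if count > 0:
                out[r, c] = np.sum(kernel * xp[r:r + kh, c:c + kw]) / count
                new_mask[r, c] = True  # at least one observed neighbor
    return out, new_mask


rng = np.random.default_rng(0)
image = np.full((8, 8), 2.0)
mask = rng.random((8, 8)) < 0.2
out, new_mask = masked_conv2d(image, mask, np.ones((3, 3)))
```

With an all-ones kernel the response is the average of the observed neighbors, so a constant image is reproduced wherever `new_mask` is true, and `new_mask` is denser than `mask`, which mirrors how stacked layers turn sparse inputs into dense features.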
Since we focus on image data, linear interpolation provides a rough approximation for missing pixels. Therefore, in a second strategy, we first approximate the missing data with linear interpolation, and provide the first convolution layer with both the observation mask and the interpolated image, as two input channels.
Different architecture families are appropriate for different problems and datatypes. We focus on architectural design decisions that pertain to utilizing missing data and account for the image content type, rather than describing specific details about particular architectures.
Hierarchical convolutional layers are used to extract image-space features. We experiment with (sparsity-aware) fully convolutional encoder and decoders.
In many domains, capturing covariation across a large image can provide useful structural information. We therefore explore encoders that use a (sparsity aware) fully connected layer following several convolutions, and a decoder that uses a fully connected layer followed by convolutions.
For large data, such as volumetric medical scans, both convolutional and fully connected layers capture important relationships, but the volumes are too large to employ the designs above. In these cases, we propose an architecture that replaces the fully connected layer by locally connected sparse layers, which affect separate subregions of the volume.
We show that linear subspace methods are a specific case of our model. Let $f_\theta(z) = Wz + \mu$, where weight matrix $W$ controls the model covariance and $\mu$ is the data mean. This yields the sparse model
$$p(x_{\mathcal{O}} \mid z) = \mathcal{N}\left(x_{\mathcal{O}};\, W_{\mathcal{O}} z + \mu_{\mathcal{O}},\, \sigma^2 I\right),$$
where $W_{\mathcal{O}}$ selects the rows of $W$ that correspond to observed entries in $x_{\mathcal{O}}$. Using the proposed learning strategy (9) and the pseudo-inverse of $W_{\mathcal{O}}$, we can choose the approximating posterior with mean
$$\hat{z} = W_{\mathcal{O}}^{+} \left(x_{\mathcal{O}} - \mu_{\mathcal{O}}\right),$$
where $W_{\mathcal{O}}^{+} = \left(W_{\mathcal{O}}^{T} W_{\mathcal{O}}\right)^{-1} W_{\mathcal{O}}^{T}$. Given learned parameters, we impute missing pixels using the expectation (4):
$$\hat{x} = W \hat{z} + \mu.$$
Intuitively, computing $\hat{z}$ linearly projects the data point onto the low-dimensional subspace using only the observed pixels. The recovered data point $\hat{x}$ is then computed by linearly projecting the estimated low-dimensional representation into a high-dimensional space. The model parameters can be learned using the expectation maximization algorithm. Extended linear subspace models, such as ones that regularize the weights or include mixtures of Gaussians, can similarly be seen as instantiations of our model. We used these concepts in proposing novel sparsity-aware fully connected (or dense) neural network layers above.
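As a hedged aside, the link between the pseudo-inverse and the linear model can be made explicit with the standard probabilistic PCA posterior. The notation below is assumed for this sketch: $W_{\mathcal{O}}$ and $\mu_{\mathcal{O}}$ denote the rows of the weight matrix and the entries of the data mean at the observed indices, $\sigma^2$ is the pixel noise variance, and the prior on $z$ is standard normal.

```latex
% Exact posterior of the linear-Gaussian model restricted to observed entries
p(z \mid x_{\mathcal{O}})
  = \mathcal{N}\!\left(z;\; M^{-1} W_{\mathcal{O}}^{T}\,(x_{\mathcal{O}} - \mu_{\mathcal{O}}),\; \sigma^{2} M^{-1}\right),
\qquad
M = W_{\mathcal{O}}^{T} W_{\mathcal{O}} + \sigma^{2} I .

% Small-noise limit, assuming W_O has full column rank
\lim_{\sigma^{2} \to 0} M^{-1} W_{\mathcal{O}}^{T}\,(x_{\mathcal{O}} - \mu_{\mathcal{O}})
  = \left(W_{\mathcal{O}}^{T} W_{\mathcal{O}}\right)^{-1} W_{\mathcal{O}}^{T}\,(x_{\mathcal{O}} - \mu_{\mathcal{O}})
  = W_{\mathcal{O}}^{+}\,(x_{\mathcal{O}} - \mu_{\mathcal{O}}) .
```

Under these assumptions, the pseudo-inverse projection is the small-noise limit of the exact posterior mean, and the $\sigma^2 I$ term acts as a ridge-style regularizer when few pixels are observed.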
Our model and resulting learning strategy relate to the variational auto-encoder and other recent methods using approximations based on neural networks [23, 24, 42]. A key difference is that our generative probabilistic model explicitly captures missing data. Our focus is on describing how this aspect changes generative models, providing a principled derivation of the learning strategy in the presence of missing data, and introducing missing data aware network layers. Specifically, in (9) the posterior depends on only the observed entries of $x$, leading to our network building blocks. The reconstruction term is evaluated at only the observed voxels, enabling parameter learning in unsupervised settings. In our experiments, we investigate how different parts of these methodological differences affect imputation.
Deep Image Priors use a generative neural network $f_\theta$ as a prior for image $x$, such that $x = f_\theta(z)$. Parameters $\theta$ are optimized separately for each image. In the context of sparse data, this method can be used to synthesize missing pixels by first obtaining $\hat{\theta} = \arg\min_\theta \lVert x_{\mathcal{O}} - f_\theta(z)_{\mathcal{O}} \rVert^2$ for some fixed $z$, and then computing the full image $\hat{x} = f_{\hat{\theta}}(z)$. This strategy requires enough observed pixels in each image to be able to infer image structure and thus the missing data, and has been demonstrated in the case of large natural images with half of the pixels missing.
In contrast, we focus on severe sparsity in each image, and leverage commonality across a dataset rather than the observed pixels in a single image. Our method learns a similar decoder network to be able to decode all images in a dataset, rather than learning an image-specific generative network. In this sense, our decoding model can be seen as a collection-wide global version of the Deep Image Prior, where the embedding $z$, estimated by the encoder, specifies the instance to be recovered.
In addition, our model can be seen as amortized inference over a collection of images with missing pixels. The general decoder learned by our model, together with an image-specific embedding $z$, acts as a Deep Image Prior for the entire collection of images with common structure.
We provide a series of experiments evaluating the presented method and its utility in the context of unsupervised sparse datasets.
We first analyze the proposed sparse fully connected layer to illustrate the improvement over traditional strategies in the sparse setting. We then evaluate several models on different image datasets to shed light on how various approaches perform in different settings. We focus on comparing image imputation using linear subspaces with several variants of our non-linear subspace model. Our goal is to demonstrate the potential of straightforward use of our method, rather than proposing an optimized architecture for a specific task. We then show the utility of our model on a clinical image imputation task. Finally, we analyze our algorithm compared to Deep Image Priors.
We use three datasets in our experiments. First, the MNIST dataset consists of small (28x28 pixels) 2D images of hand-written digits. We create three variants: MNIST-2, which consists only of the digit 2; MNIST-all, which refers to the original dataset; and MNIST-rot, which contains all of the digits rotated at a random angle between 0 and 360 degrees. Second, we use the FASHION-MNIST dataset, which consists of 28x28 pixel images of 10 types of clothing items. These have more structure in each image compared to MNIST digits. Finally, we use a large-scale, multi-site dataset of 7829 T1-weighted brain MRI scans, compiled from eight publicly available datasets: ADNI, OASIS, ABIDE, ADHD200, MCIC, PPMI, HABS, and Harvard GSP. Acquisition details, subject age ranges and health conditions are different for each dataset. We performed standard pre-processing steps on all scans, including resampling to isotropic voxels, and affine spatial normalization for each scan using FreeSurfer. We then crop the final images.
In our experiments, we simulate the patterns of missing pixels, which we then remove from the images. We split each dataset into 50%, 30%, and 20% for train, validation, and test sets respectively, all of which are sparse. The test set in each dataset is only evaluated once. We highlight, however, that our focus is on evaluating the models on the unsupervised task of imputing missing pixels. Below, 90% sparsity means 90% of the data is missing.
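The simulation setup described above can be sketched as follows. The helper names are hypothetical, the split is a plain random permutation, and the missing pattern is uniform at random; the actual experiments may use different tooling.

```python
import numpy as np


def random_observation_mask(shape, sparsity, rng):
    """Boolean mask, True where a pixel is observed.

    `sparsity` is the fraction of missing data, so 0.9 keeps roughly 10%.
    """
    return rng.random(shape) >= sparsity


def split_indices(n, rng, fractions=(0.5, 0.3, 0.2)):
    """Random train / validation / test index split."""
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


rng = np.random.default_rng(0)
n_images = 1000
masks = random_observation_mask((n_images, 28, 28), sparsity=0.9, rng=rng)
train_idx, val_idx, test_idx = split_indices(n_images, rng)
observed_fraction = masks.mean()  # close to 0.1 at 90% sparsity
```

All three splits stay sparse: the same masks would be applied to training, validation, and test images alike.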
We first analyze the importance of the proposed sparse fully-connected layer using the FASHION-MNIST dataset (the other datasets result in comparable results). We compare several strategies for using fully connected layers with missing data, before activation and omitting the bias term.
We compute the linear projection of fully-observed images onto a low-dimensional subspace with weights initialized using PCA, and treat this as the ground truth projection. We then randomly sub-sample the data at several sparsity levels. We compute several baselines that use a conventional fully connected layer: filling the missing image pixels with 1) zeros, 2) the average value at each pixel location, or 3) linearly interpolated values. Finally, we test our sparsity-aware fully connected layer.
We compare the sparse image projections of each method with the ground truth response using mean squared error. We also consider a setting where spatial consistency is missing, often found in other domains, by randomly permuting image pixels and repeating the experiment.
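The two simplest filling baselines and the error metric can be sketched as below. One assumption is flagged explicitly: the per-pixel average is computed here from the observed entries only, which may differ from how the average was obtained in the experiments. Function names are hypothetical.

```python
import numpy as np


def fill_zero(X, M):
    """Replace missing entries (M == False) with zeros."""
    return np.where(M, X, 0.0)


def fill_pixel_mean(X, M):
    """Replace missing entries with the per-pixel mean over observed values.

    Assumption: the mean uses only observed entries across the dataset.
    """
    counts = M.sum(axis=0)
    sums = np.where(M, X, 0.0).sum(axis=0)
    mean = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return np.where(M, X, mean)


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


X = np.array([[1.0, 2.0], [3.0, 4.0]])       # two flattened toy "images"
M = np.array([[True, False], [True, True]])  # True where observed
X_zero = fill_zero(X, M)
X_mean = fill_pixel_mean(X, M)
```

Either filled array can then be passed through a conventional dense layer and compared with the ground truth projection using `mse`.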
Figure 4(a,b) shows that the sparsity-aware fully connected layer yields the most accurate projection, followed by spatial interpolation. The random permutation dramatically degrades the spatial interpolation baseline, but does not affect the other methods since they treat dimensions independently.
In a second experiment, we learn a linear auto-encoder that uses the linear encoding methods described above, and a fully connected decoding layer. Figure 4(c,d) reports the validation loss (on observed pixels only), and final validation reconstruction error (on all pixels). The results show that using sparsity-aware fully connected layers leads to improved reconstructions, promising to improve network performance in the context of sparse input data.
In this section, we simulate random missing pixels for sparsity factors of 75% and 90% in images from MNIST-2, MNIST-all, MNIST-rot, and FASHION-MNIST. In addition, to evaluate more complex covariance structure in images, we obtain random 2D 64x64 patches from the brain MRI dataset.
For the linear subspace baseline, we implement both stochastic gradient descent (SGD) based optimization and Expectation Maximization. For small datasets and models we find the two implementations to behave comparably, but Expectation Maximization becomes intractable for large datasets and models. We therefore report results from the SGD implementation. In addition, we implement a (linear) sparse dictionary learning method using an L0-norm to encourage a sparse latent representation.
We evaluate several variants of our method using sparsity-aware implementations, from simpler filling strategies to our full model:
conv-zero: fills missing image pixels with zeros and uses a normal convolutional encoder and the sparse loss (9).
conv-mean: fills missing image pixels with the pixel-wise dataset mean and uses a normal convolutional encoder and the sparse loss.
conv-interp: linearly interpolates the missing pixels of an input image and uses a normal convolutional encoder and the sparse loss.
conv-interp-wmask: adds the incomplete data sampling mask as a model input channel to conv-interp.
conv-FC-sparse: adds a sparsity-aware fully connected layer to the end of the encoder and the start of the decoder, enabling covariances across the entire image. We use an encoding of size 10 for MNIST and FASHION-MNIST, and 100 for the MRI patches.
Figures 5 and 6 illustrate the results. For the single digit, all methods, including the linear subspace, are able to exploit covariation across the dataset, despite significant sparsity. However, as more variability is present in the full and rotated datasets, the linear subspace methods are unable to capture the image structure and properly impute the images. In contrast, non-linear subspace models are able to capture complex covariances, even in extreme sparsity situations. Using a sparsity-driven VAE after filling the missing pixels using linear interpolation or mean-filling performs better than the linear models, and having an explicit sparsity-aware convolutional encoder performs the best among all variants.
In general, several design decisions are possible for a given method, including varying encoding size, noise parameters and architecture designs. Here, our goal is to illustrate that in various datasets and sparsity levels, our method promises to dramatically improve imputation by employing a non-linear subspace.
We demonstrate the utility of our method on sparse brain MRI acquisition. In many clinical settings, scanning time is limited, leading to severely under-sampled medical scans. For example, often only every sixth 2D slice is acquired in the 3D MRI scan. Moreover, because of the variability in subject head positioning in the scanner, the sparsity patterns appear to have different angles. In this experiment we evaluate our method by using high resolution MRI data which we downsample to simulate the sparse acquisition protocol. Specifically, we simulate a sparse scan for each subject by removing five out of every six slices at an arbitrarily rotated angle, then rotate the subject back to the common reference frame. We extract the middle coronal slice of each subject, perpendicular to the acquisition direction, and carry out 2D imputation for this slice. We evaluate the methods described in the previous experiment. Because of the size of the images and dataset, we use a subspace encoding of 200 for the conv-FC model.
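The slice-removal step can be sketched as a simple mask over a volume. This simplified illustration omits the arbitrary rotation described above and assumes slices are removed along one array axis; `slice_acquisition_mask` is a hypothetical helper.

```python
import numpy as np


def slice_acquisition_mask(shape, keep_every=6, axis=0, offset=0):
    """Boolean mask that keeps one slice out of every `keep_every` along `axis`."""
    mask = np.zeros(shape, dtype=bool)
    index = [slice(None)] * len(shape)
    index[axis] = slice(offset, None, keep_every)
    mask[tuple(index)] = True
    return mask


volume_shape = (60, 8, 8)  # toy volume; real scans are much larger
mask = slice_acquisition_mask(volume_shape, keep_every=6, axis=0)
missing_fraction = 1.0 - mask.mean()  # five of every six slices are removed
```

For this toy volume, `missing_fraction` is 5/6, in line with the roughly 83% of missing data mentioned for this acquisition pattern.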
Figure 5 (right) summarizes reconstruction results across the entire test set, and Figure 7 illustrates example imputation of MRI slices. The spatial linear interpolation image introduces several artifacts. The reconstruction using the linear subspace is unable to recover detailed anatomy. In contrast, the proposed variants achieve significantly better reconstruction, demonstrating the promise of using the proposed methods on clinical MR images.
We evaluate our model, focusing on the conv-sparse variant, compared to Deep Image Priors (DIP). We applied both methods to sparse data, as described in Section 3.5. For a direct comparison, we use the same decoder architecture, described in our model variants, for both the proposed model and DIP. We use the MNIST-2 and FASHION-MNIST-1 datasets in various high sparsity scenarios, as well as the sparse brain MRI scans used in the previous section. Figure 8 summarizes the results. As the sparsity level increases (fewer pixels observed), the Deep Image Prior strategy lacks sufficient data in a single image to synthesize a reasonable image. In contrast, our method is able to leverage the entire dataset of sparse images with common structure to learn covariation within pixels, enabling better imputation of missing pixels. Our method requires 18.6 ± 13.7 ms (since we only need to evaluate the network) to reconstruct an MRI brain slice. Deep Image Priors require learning network parameters, which took 36.5 ± 0.2 seconds for images in our dataset. (We report the runtime of Deep Image Prior for 1000 iterations, which we found to be necessary for good results.)
In this paper, we introduce a general probabilistic model to characterize sparse high dimensional imaging data by a deep non-linear subspace. We provide a principled derivation of the learning strategy in the presence of missing data, and introduce missing data aware network building blocks. We describe how missing data changes existing subspace models like the VAE, and how our model can be seen as a global version of deep image priors. Given a learned non-linear model, we describe how imputation can be achieved.
In our experiments, we demonstrate the importance of a novel sparsity-aware fully connected layer. We then show that our model exploits intra-image structure, as well as structure across a dataset, to yield superior imputation to fully convolutional architectures. Finally, we show the utility of our method using a real-world problem involving medical images.
Neural networks for pattern recognition. Oxford university press, 1995.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.
The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
Collaborative variational autoencoder for recommender systems. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 305–314. ACM, 2017.
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
A deep learning based anti-aliasing self super-resolution algorithm for MRI. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 100–108. Springer, 2018.