I. Introduction
Increasingly open image acquisition efforts in clinical practice are driving dramatic increases in the number and size of patient cohorts in clinical archives. Unfortunately, clinical scans are typically of dramatically lower resolution than the research scans that motivate most methodological development. Specifically, while slice thickness can vary depending on the clinical study or scan, inter-slice spacing is often significantly larger than the in-plane resolution of individual slices. This results in missing voxels that are typically filled in via interpolation.
Our work is motivated by a study that includes brain MRI scans of thousands of stroke patients acquired within 48 hours of stroke onset. The study aims to quantify white matter disease burden [23], necessitating skull stripping and deformable registration into a common coordinate frame [27, 31, 32]. The volumes are severely undersampled due to the constraints of acute stroke care (Fig. 1). Such undersampling is typical of modalities, such as T2-FLAIR, that aim to characterize tissue properties, even in research studies like ADNI [15].
In undersampled scans, the image is no longer smooth, and the anatomical structure may change substantially between consecutive slices (Fig. 1). Since such clinically acquired scans violate underlying assumptions of many algorithms, even basic tasks such as skull stripping and deformable registration present significant challenges, yet are often necessary for downstream analysis [4, 7, 12, 15, 23, 30, 31].
We present a novel method for constructing high resolution, anatomically plausible volumetric images consistent with the available slices in sparsely sampled clinical scans. Importantly, our method does not require any high resolution scans or expert annotations for training. Instead, it imputes the missing structure by learning solely from the available collection of sparsely sampled clinical scans. The restored images represent plausible anatomy, and promise to act as a medium for enabling computational analysis of clinical scans with existing techniques originally developed for high resolution, isotropic research scans. For example, although imputed data should not be used in clinical evaluation, the brain mask obtained through skull stripping of the restored scan can be applied to the original clinical scan to improve subsequent analyses.
I-A. Prior Work
Many image restoration techniques depend on having enough information in a single image to synthesize data. Traditional interpolation methods, such as linear, cubic, or spline interpolation [28], assume a functional representation of the image. They treat the low resolution voxels as samples, or observations, and estimate function parameters to infer missing voxel values. Patch-based super-resolution algorithms use fine-scale redundancy within a single scan [10, 11, 19, 20, 22]. The key idea is to fill in the missing details by identifying similar image patches in the same image that might contain relevant detail [19, 22]. This approach depends on having enough repetitive detail in a scan to capture and re-synthesize high frequency information. Unfortunately, clinical images are often characterized by sampling that is too sparse to adequately fit functional representations or provide enough fine-scale information to recover the lost detail. For example, slice spacing of several millimeters, typical of many clinical scans including our motivating example, is far too large to accurately estimate approximating functions without prior knowledge. In such cases, a single image is unlikely to contain enough fine-scale information to provide anatomically plausible reconstructions in the direction of slice acquisition, as we demonstrate later in the paper.

Alternatively, one can use additional data to synthesize better images. Many super-resolution algorithms use multiple scans of the same subject, such as multiple low resolution acquisitions with small shift differences, to synthesize a single volume [2, 16, 22]. However, such acquisitions are not commonly available in the clinical setting.
Non-parametric and convolutional neural network (CNN) based upsampling methods that tackle the problem of super-resolution often rely on an external dataset of high resolution data, or cannot handle the extreme undersampling present in clinical scans. For example, some methods fill in missing data by matching a low resolution image patch from the input scan with a high resolution image patch from the training dataset [3, 13, 16, 17, 24, 25]. Similarly, CNN-based upsampling methods approximate completion functions, but require high resolution scans for training [8, 21]. A recent approach to improving resolution from a collection of scans with sparse slices jointly upsamples all images using non-local means [26]. However, this method has only been demonstrated on slice spacing of roughly three times the in-plane resolution, and in our experience similar non-parametric methods fail to upsample clinical scans with more significant undersampling.

Our work relies on a low dimensional embedding of image patches with missing voxels. Parametric patch methods and low dimensional embeddings have been used to model the common structure of image patches from full resolution images, but are typically not designed to handle missing data. Specifically, patch priors [33, 35, 36] have been used in both medical and natural images for classification [1] and denoising [9, 36]. The procedures used for training these models rely on having full resolution patches with no missing data in the training phase.

Unfortunately, high (full) resolution training datasets are not readily available for many image contrasts and scanners, and may not adequately represent pathology or other properties of clinical populations. Acquiring appropriate high resolution training image data is often infeasible, and here we explicitly focus on the realistic clinical scenario where only sparsely sampled images are available.
I-B. Method Overview
We take advantage of the fact that local fine-scale structure is shared in a population of medical images, and that each scan with sparse slices captures some partial aspect of this structure. We borrow ideas from Gaussian Mixture Models (GMMs) for image patch priors [36], low dimensional Gaussian embeddings [14, 34], and missing data models [14, 18] to develop a probabilistic generative model for sparse 3D image patches around a particular location, using a low-dimensional GMM with partial observations. We derive the EM algorithm for maximum likelihood estimation of the model parameters and discuss related modeling choices. Given a new sparsely sampled scan, the maximum a posteriori estimate of the latent structure yields the imputed high resolution image. We evaluate our algorithm using scans from the ADNI cohort, and demonstrate its utility in the context of the motivating stroke study. We investigate the behavior of our model under different parameter settings, and illustrate potential improvements in downstream analysis using an example task of skull stripping.
This paper extends the preliminary version of the method presented at the 2017 Conference on Information Processing in Medical Imaging [5]. Here, we improve model inference by removing parameter codependency between iterations and providing new parameter initialization. We provide detailed derivations and discuss an alternative related model. Finally, we provide an analysis of important model parameters, present results for more subjects, and illustrate more example reconstructions. The paper is organized as follows. Section II introduces the model and learning algorithm. Section III discusses implementation details. We present experiments and analysis of the algorithm’s behavior in Section IV. We discuss important modeling aspects and related models in Section V. We include an Appendix and Supplementary Material with detailed derivations of the EM algorithm for the proposed models.
II. Method
In this section, we construct a generative model for sparse image patches, present the resulting learning algorithm, and describe our image restoration procedure.
Let $\{Y_1, \ldots, Y_N\}$ be a collection of $N$ scans with large inter-slice spacing, roughly aligned into a common atlas space (we use affine transformations in our experiments). For each image in the collection, only a few slices are observed. We seek to restore an anatomically plausible high resolution volume by imputing the missing voxel values.

We capture local structure using image patches. We assume a constant patch shape, and in our experiments use a cubic 3D patch. We use $y_i$ to denote a length-$D$ vector that contains the voxels of the image patch centered at a particular location in image $i$. We perform inference at each location independently and stitch the results into the final image as described later in this section. Fig. 2 provides an overview of the method.

II-A. Generative Model
We treat an image patch as a high dimensional manifestation of a low dimensional representation, with the intuition that the covariation within image patches has small intrinsic dimensionality relative to the number of voxels in the patch. To capture the anatomical variability across subjects, we employ a Gaussian Mixture Model (GMM) to represent the local structure of 3D patches in the vicinity of a particular location across the entire collection. We then explicitly model the observed and missing information. Fig. 3 presents the corresponding graphical model.
We model the latent low dimensional patch representation $x_i$ of length $d \ll D$ as a normal random variable

(1) $x_i \sim \mathcal{N}(0, I_d),$

where $\mathcal{N}(\mu, \Sigma)$ denotes the multivariate Gaussian distribution with mean $\mu$ and covariance $\Sigma$. We draw the latent cluster assignment $k_i$ from a categorical distribution defined by a length-$K$ vector $\pi$ of cluster probabilities, and treat image patch $y_i$ as a high dimensional observation of $x_i$ drawn from a $K$-component multivariate GMM. Specifically, conditioned on the drawn cluster $k$,

(2) $y_i \mid k_i = k,\, x_i \sim \mathcal{N}\!\left(\mu_k + W_k x_i,\, \sigma_k^2 I_D\right).$

Vector $\mu_k$ is the patch mean of cluster $k$, matrix $W_k$ shapes the covariance structure of $y_i$, and $\sigma_k^2$ is the variance of the image noise. This model implies $y_i \mid k_i = k \sim \mathcal{N}(\mu_k, C_k)$ with $C_k = W_k W_k^T + \sigma_k^2 I_D$. Defining $\theta = \{\pi_k, \mu_k, W_k, \sigma_k^2\}_{k=1}^{K}$, the likelihood of all patches at this location under the mixture model is

(3) $p(y_1, \ldots, y_N; \theta) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k\, \mathcal{N}(y_i; \mu_k, C_k).$
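To make the generative model concrete, the following NumPy sketch samples a patch according to Eqs. (1)–(2) and forms the implied marginal cluster covariances used in (3). All sizes and parameter values (`D`, `d`, `K`, `pi`, `mu`, `W`, `sigma2`) are arbitrary toy assumptions of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, K = 27, 4, 2                      # patch voxels, latent dimension, clusters (toy sizes)

# hypothetical toy parameters: means mu_k, factors W_k, noise sigma_k^2, weights pi
pi = np.array([0.6, 0.4])
mu = rng.normal(size=(K, D))
W = rng.normal(size=(K, D, d)) * 0.5
sigma2 = np.array([0.05, 0.1])

def sample_patch():
    """Draw one patch y from the generative model (Eqs. 1-2)."""
    k = rng.choice(K, p=pi)             # latent cluster assignment
    x = rng.normal(size=d)              # low dimensional representation, x ~ N(0, I)
    eps = rng.normal(size=D) * np.sqrt(sigma2[k])
    y = mu[k] + W[k] @ x + eps          # y | k, x ~ N(mu_k + W_k x, sigma_k^2 I)
    return k, y

# marginally, y | k ~ N(mu_k, C_k) with C_k = W_k W_k^T + sigma_k^2 I
C = np.stack([W[k] @ W[k].T + sigma2[k] * np.eye(D) for k in range(K)])
```

Averaging the outer products of many sampled patches within a cluster would empirically approach $C_k$, which is the covariance the mixture likelihood (3) operates on.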
In our clinical images, only a few slices are known. To model sparse observations, we let $O_i$ be the set of observed voxels in patch $i$, and $y_i^{O_i}$ be the corresponding vector of their intensity values:

(4) $y_i^{O_i} \mid k_i = k,\, x_i \sim \mathcal{N}\!\left(\mu_k^{O_i} + W_k^{O_i} x_i,\, \sigma_k^2 I\right),$

where $W_k^{O_i}$ comprises the rows of $W_k$ that correspond to the observed voxel set $O_i$, and similarly for $\mu_k^{O_i}$. The likelihood of the observed data is therefore

(5) $p\!\left(y_1^{O_1}, \ldots, y_N^{O_N}; \theta\right) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k\, \mathcal{N}\!\left(y_i^{O_i}; \mu_k^{O_i}, C_k^{O_i}\right),$

where matrix $C_k^{O_i}$ extracts the rows and columns of $C_k$ that correspond to the observed voxel subset $O_i$.
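A minimal sketch of evaluating the observed-data log-likelihood of Eq. (5) for a single patch: only the rows of the factors and means indexed by the observed voxel set enter the computation. The function name and argument layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def log_obs_likelihood(y_obs, obs_idx, pi, mu, W, sigma2):
    """log p(y^O) = log sum_k pi_k N(y^O; mu_k^O, C_k^O)  (Eq. 5)."""
    log_terms = []
    for k in range(len(pi)):
        Wo = W[k][obs_idx]                          # rows of W_k for observed voxels
        Co = Wo @ Wo.T + sigma2[k] * np.eye(len(obs_idx))
        r = y_obs - mu[k][obs_idx]
        _, logdet = np.linalg.slogdet(Co)           # stable log-determinant
        quad = r @ np.linalg.solve(Co, r)           # Mahalanobis term
        log_terms.append(np.log(pi[k])
                         - 0.5 * (len(obs_idx) * np.log(2 * np.pi) + logdet + quad))
    m = max(log_terms)                              # log-sum-exp over clusters
    return m + np.log(sum(np.exp(t - m) for t in log_terms))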
We do not explicitly model slice thickness, as in many clinical datasets this thickness is unknown or varies by site, scanner or acquisition. Instead, we simply treat the original data as high resolution thin planes and analyze the effects of varying slice thickness on the results in the experimental evaluation of the method.
We also investigated an alternative modeling choice in which each missing voxel of patch $i$ is modeled as a latent variable. This assumption can optionally be combined with the latent low-dimensional patch representation. We discuss this alternative in Section V and provide parameter updates in the Supplementary Material. Unfortunately, the resulting algorithm is prohibitively slow.
II-B. Learning
Given a collection of observed patches $\{y_i^{O_i}\}$, we seek the maximum likelihood estimates of the model parameters $\theta = \{\pi_k, \mu_k, W_k, \sigma_k^2\}$ under the likelihood (5). We derive the Expectation Maximization (EM) algorithm [6] in Appendix A, and present the update equations and their interpretations below.

The expectation step updates the class memberships

(6) $\gamma_{ik} \leftarrow \frac{\pi_k\, \mathcal{N}\!\left(y_i^{O_i}; \mu_k^{O_i}, C_k^{O_i}\right)}{\sum_{k'} \pi_{k'}\, \mathcal{N}\!\left(y_i^{O_i}; \mu_{k'}^{O_i}, C_{k'}^{O_i}\right)},$

and the statistics of the low dimensional representation for each image patch as "explained" by cluster $k$:

(7) $\widehat{x}_{ik} \leftarrow \left(W_k^{O_i T} W_k^{O_i} + \sigma_k^2 I_d\right)^{-1} W_k^{O_i T} \left(y_i^{O_i} - \mu_k^{O_i}\right),$

(8) $\widehat{S}_{ik} \leftarrow \sigma_k^2 \left(W_k^{O_i T} W_k^{O_i} + \sigma_k^2 I_d\right)^{-1} + \widehat{x}_{ik} \widehat{x}_{ik}^T.$
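The two expectation-step statistics (7)–(8) for a single patch and cluster reduce to one small linear solve in the $d$-dimensional latent space; a sketch, with the interface an assumption of ours:

```python
import numpy as np

def e_step_patch(y_obs, obs_idx, mu_k, W_k, sigma2_k):
    """Low dimensional statistics for one patch and one cluster (Eqs. 7-8)."""
    Wo = W_k[obs_idx]                               # W_k^{O_i}: observed rows only
    d = W_k.shape[1]
    M = Wo.T @ Wo + sigma2_k * np.eye(d)            # d x d system matrix
    x_hat = np.linalg.solve(M, Wo.T @ (y_obs - mu_k[obs_idx]))   # Eq. (7)
    S_hat = sigma2_k * np.linalg.inv(M) + np.outer(x_hat, x_hat) # Eq. (8)
    return x_hat, S_hat
```

Because $d \ll D$, the solve is cheap even though the number of observed voxels varies from patch to patch.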
We let $I_j$ be the set of patches in which voxel $j$ is observed, and form the following normalized statistics:

(9) $\bar{\gamma}_k^j = \sum_{i \in I_j} \gamma_{ik},$

(10) $\widehat{A}_k^j = \frac{1}{\bar{\gamma}_k^j} \sum_{i \in I_j} \gamma_{ik} \left(y_i^j - \mu_k^j\right) \widehat{x}_{ik}^T,$

(11) $\widehat{B}_k^j = \frac{1}{\bar{\gamma}_k^j} \sum_{i \in I_j} \gamma_{ik}\, \widehat{S}_{ik}.$

The maximization step uses the observed voxels to update the model parameters. We let $\mu_k^j$ be the $j$-th element of vector $\mu_k$ and $W_k^{j\cdot}$ be the $j$-th row of matrix $W_k$, and update the cluster mean as a convex combination of observed voxels:

(12) $\mu_k^j \leftarrow \sum_{i \in I_j} \frac{\gamma_{ik}}{\bar{\gamma}_k^j} \left(y_i^j - W_k^{j\cdot}\, \widehat{x}_{ik}\right).$

The covariance factors and the image noise variance are updated based on the statistics (10) and (11):

(13) $W_k^{j\cdot} \leftarrow \widehat{A}_k^j \left(\widehat{B}_k^j\right)^{-1},$

(14) $\sigma_k^2 \leftarrow \frac{1}{\sum_j \bar{\gamma}_k^j} \sum_j \sum_{i \in I_j} \gamma_{ik} \left[ \left(y_i^j - \mu_k^j - W_k^{j\cdot}\, \widehat{x}_{ik}\right)^2 + W_k^{j\cdot} \left(\widehat{S}_{ik} - \widehat{x}_{ik}\widehat{x}_{ik}^T\right) W_k^{j\cdot T} \right].$

Finally, we update the cluster proportions:

(15) $\pi_k \leftarrow \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}.$
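As one concrete example of the maximization step, the mean update (12) is a membership-weighted average over only the patches that observe each voxel. The sketch below implements just this update; the per-patch data layout (observed values plus index sets) is an assumption of ours.

```python
import numpy as np

def m_step_mean(y_list, obs_list, gamma_k, x_hat_k, W_k, D):
    """Update mu_k as a convex combination of observed voxels (Eq. 12).

    y_list[i]   -- observed intensities of patch i
    obs_list[i] -- indices of the observed voxels of patch i
    gamma_k[i]  -- membership of patch i in cluster k
    x_hat_k[i]  -- E-step latent estimate for patch i and cluster k
    """
    num = np.zeros(D)
    den = np.zeros(D)
    for y_obs, obs_idx, g, x_hat in zip(y_list, obs_list, gamma_k, x_hat_k):
        # accumulate residuals after removing the low dimensional explanation
        num[obs_idx] += g * (y_obs - W_k[obs_idx] @ x_hat)
        den[obs_idx] += g
    return num / np.maximum(den, 1e-12)             # voxels never observed stay 0
```

The normalizer `den[j]` is exactly $\bar{\gamma}_k^j$ from (9), so each voxel of the mean is estimated only from the patches that actually observed it.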
Intuitively, learning our model with sparse data is possible because each image patch provides a slightly different subset of voxel observations that contribute to the parameter estimation (Fig. 2). In our experiments, all subject scans have the same acquisition direction. Despite different affine transformations to the atlas space for each subject, some voxel pairs are never observed in the same patch, resulting in missing entries of the covariance matrix. Using a low-rank approximation for the covariance matrix regularizes the estimates.
Upon convergence of the EM updates, we compute the cluster covariance $C_k = W_k W_k^T + \sigma_k^2 I$ for each cluster $k$.
II-C. Imputation
To restore an individual patch $y_i$, we compute the maximum-a-posteriori (MAP) estimate of the image patch:

$\widehat{y}_i = \arg\max_{y_i}\, p\!\left(y_i \mid y_i^{O_i}; \widehat{\theta}\right),$

where $\widehat{\theta}$ denotes the estimated model parameters. Due to the high-dimensional nature of the data, most cluster membership estimates $\gamma_{ik}$ are very close to $0$ or $1$. We therefore first estimate the most likely cluster for patch $y_i$ by selecting the cluster $\widehat{k}$ with the highest membership $\gamma_{ik}$. We then estimate the low dimensional representation $\widehat{x}_{i\widehat{k}}$ given the observed voxels using (7), which yields the high resolution imputed patch:

(16) $\widehat{y}_i = \mu_{\widehat{k}} + W_{\widehat{k}}\, \widehat{x}_{i\widehat{k}}.$
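Putting Eqs. (7) and (16) together, restoring one patch amounts to a hard cluster choice followed by a regression through the latent space; a sketch under the same assumed data layout as above:

```python
import numpy as np

def impute_patch(y_obs, obs_idx, gamma_i, mu, W, sigma2):
    """MAP imputation of a full patch (Eq. 16): pick the most likely cluster,
    estimate x via Eq. 7 from the observed voxels, then reconstruct."""
    k = int(np.argmax(gamma_i))                     # hard cluster assignment
    Wo = W[k][obs_idx]
    d = W[k].shape[1]
    M = Wo.T @ Wo + sigma2[k] * np.eye(d)
    x_hat = np.linalg.solve(M, Wo.T @ (y_obs - mu[k][obs_idx]))
    return mu[k] + W[k] @ x_hat                     # full length-D patch
```

Note that the returned patch replaces the observed voxels as well, consistent with the conditional-mean restoration discussed next.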
By restoring the scans using this MAP solution, we perform conditional mean imputation (cf. [17], Sec. 4.2.2), and demonstrate the resulting reconstructions in our experiments. In addition, our model enables imputation of each patch by sampling the posterior $p(y_i \mid y_i^{O_i}; \widehat{\theta})$, providing a better estimate of the residual noise. Depending on the desired downstream application, sampling-based imputation may be preferred.
We average overlapping restored patches using standard techniques [18] to form the restored volume.
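The aggregation can be as simple as accumulating each restored patch into the volume and dividing by the per-voxel patch count; a minimal mean-blending sketch (the exact weighting used by the averaging techniques cited above may differ):

```python
import numpy as np

def stitch_patches(patches, corners, vol_shape, patch_shape):
    """Average overlapping restored patches into a volume (simple mean blending)."""
    acc = np.zeros(vol_shape)                       # running sum of patch values
    cnt = np.zeros(vol_shape)                       # how many patches cover each voxel
    pz, py, px = patch_shape
    for patch, (z, y, x) in zip(patches, corners):
        acc[z:z+pz, y:y+py, x:x+px] += patch.reshape(patch_shape)
        cnt[z:z+pz, y:y+py, x:x+px] += 1
    return acc / np.maximum(cnt, 1)                 # uncovered voxels remain 0
```

Weighted variants (e.g. down-weighting patch borders) drop in by replacing the unit increments with a weight array of the same patch shape.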
III. Implementation
We work in the atlas space, and approximate voxels as either observed or missing in this space by thresholding interpolation weights. To limit interpolation effects due to affine alignment on the results, we set a higher threshold for regions with high image gradients than in regions with low gradients. Parameter estimation could be implemented to include transformation of the model parameters into the subjectspecific space in order to optimally use the observed voxels, but this leads to computationally prohibitive updates.
We stack together the affinely registered sparse images from the entire collection. We learn a single set of mixture model parameters within each of a set of overlapping subvolumes in the isotropically sampled common atlas space. Subvolumes are centered 11 voxels apart in each direction. We use a cubic patch, and instead of selecting just one patch from each volume at a given location, we collect all overlapping patches within the subvolume centered at that location. This aggregation provides more data for each model, which is crucial when working with severely undersampled volumes. Moreover, including nearby voxels offers robustness in the face of image misalignment. Given the learned parameters at each location, we restore all overlapping patches within the subvolume.
While learning is performed in the common atlas space, we restore each volume in its original image space to limit the effects of interpolation. Specifically, we apply the inverse of the estimated subjectspecific affine transformation to the cluster statistics prior to performing subjectspecific inference.
Our implementation is freely available at https://github.com/adalca/papago.
IV. Experiments
We demonstrate the proposed imputation algorithm on two datasets and evaluate the results both visually and quantitatively. We also include an example of how imputation can aid in a skull stripping task.
IV-A. Data: ADNI Dataset
We evaluate our algorithm using 826 T1-weighted brain MR images from ADNI [15]. Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). We downsample the isotropic images in the axial direction, keeping the in-plane resolution, to obtain a slice separation comparable in quality to the clinical dataset. We use these low resolution images as input. All downsampled scans are affinely registered to a T1 atlas. The original images serve as the ground truth for quantitative comparisons. After learning the model parameters from this data set, we evaluate the quality of the resulting imputations.
IV-B. Evaluation
We compare our algorithm to three upsampling methods: nearest neighbor (NN) interpolation, non-local means (NLM) upsampling [19], and linear interpolation. We compare the reconstructed images to the original isotropic volumes both visually and quantitatively. We use the mean squared error

(17) $\mathrm{MSE}(\widehat{I}, I) = \frac{1}{|I|} \sum_{j} \left(\widehat{I}_j - I_j\right)^2$

of the reconstructed image $\widehat{I}$ relative to the original high resolution scan $I$, where $|I|$ is the number of voxels. We also compute the related peak signal-to-noise ratio

(18) $\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}(\widehat{I}, I)},$

where $\mathrm{MAX}_I$ is the maximum image intensity.
Both metrics are commonly used in measuring the quality of reconstruction of compressed or noisy signals.
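Both metrics are straightforward to compute; a sketch, where the PSNR peak defaults to the reference maximum — an assumption of ours for when the true dynamic range is not specified:

```python
import numpy as np

def mse(recon, ref):
    """Mean squared error over all voxels (Eq. 17)."""
    return float(np.mean((recon - ref) ** 2))

def psnr(recon, ref, max_intensity=None):
    """Peak signal-to-noise ratio in dB (Eq. 18)."""
    peak = float(ref.max()) if max_intensity is None else max_intensity
    return 10.0 * np.log10(peak ** 2 / mse(recon, ref))
```

Because PSNR depends on the chosen peak value, comparisons across scans should fix `max_intensity` consistently (e.g. after intensity normalization).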
IV-C. Results
Fig. 4 illustrates representative restored images for subjects in the ADNI dataset. Our method produces more plausible structure than the baselines. By learning from the image collection, the method restores anatomical structures that are almost entirely missing in the other reconstructions, such as the dura or the sulci of the temporal lobe. We provide additional example results in the Supplementary Materials.
Fig. 5 reports the error statistics in the ADNI data. Due to the high variability of MSE among subject scans, we report improvements of each method over the nearest neighbor interpolation baseline in the same scan. Our algorithm offers significant improvement compared to nearest neighbor interpolation, NLM, and linear interpolation; the improvement in MSE is observed in every single scan. Similarly, our method performs consistently better under the PSNR metric (not shown), again improving on the next best restored scan.
IV-D. Parameter Setting
We analyze the performance of our algorithm while varying the values of the parameters and the sparsity patterns of the observed voxels. For these experiments, we use four distinct subvolumes that encompass diverse anatomy from the ADNI data, as illustrated in Fig. 6. We start with isotropic data and use different observation masks, as described in each experiment.

Hyperparameters. We evaluate the sensitivity of our method to its hyperparameters: the number of clusters $K$ and the dimension $d$ of the low dimensional embedding. While different regions give optimal results with different settings, overall our algorithm produces comparable results across the middle range of these parameters. We run all of our experiments with fixed values of $K$ and $d$ chosen from this range.
Sparsity patterns. First, we evaluate how our algorithm performs under three different mask patterns, all of which allow for the same number of observed voxels. Specifically, we (i) use the true sparsely observed planes as in the first experiment; (ii) simulate random rotations of the observation planes, mimicking acquisitions in different directions; and (iii) simulate random mask patterns. The latter setup is useful for denoising and similar tasks, and is instructive of the performance of our algorithm. Fig. 7 demonstrates that our algorithm performs better under acquisitions in different directions, and similarly under truly random observations, as more entries of the cluster covariance matrices are directly observed. This suggests a promising application of this model to other settings where different patterns of image voxels are observed.
Slice Thickness. We also investigate the effects of slice thickness on the results. The model treats the original data as high resolution planes. Here, we simulate varying slice thickness by blurring the isotropic data in the direction perpendicular to the slice acquisition direction. We then use the sampling masks of the scans from the main experiments to identify observed, albeit blurred, voxels. Fig. 8 shows that although the algorithm performs worse with larger slice thickness, it still provides plausible imputation results: even wide blur kernels, simulating slices with significant signal contribution from distant tissue, produce minimal noticeable differences. Our method, which treats observed slices as thin, is nevertheless robust to slice thickness variations.
Fig. 8. Top: sagittal close-up of a region where axial slices were blurred in the direction perpendicular to the acquisition direction, followed by the respective imputed results. Bottom: performance of our algorithm under different slice thickness simulations, shown as MSE (solid line) and standard deviation interval (shaded region).
IV-E. Skull Stripping
We also illustrate how imputed data might facilitate downstream image analysis. Specifically, the first step in many analysis pipelines is brain extraction: isolating the brain from the rest of the anatomy. Typical algorithms assume that the brain consists of a single connected component separated from the skull and dura by cerebrospinal fluid [29]. Thus, they often fail on sparsely sampled scans that no longer include a clear contrast between these regions. Fig. 9 provides an example where brain extraction fails on the original subject scan but succeeds on our reconstructed image.
IV-F. Clinical Dataset
We also demonstrate our algorithm on a clinical dataset of 766 T2-FLAIR brain MR scans in a stroke patient cohort. These scans are severely anisotropic, with slice separation many times larger than the in-plane resolution. All subjects are affinely registered to an atlas, and the intensities are normalized.
Fig. 10 illustrates representative restoration improvements in T2-FLAIR scans from a clinical population. Our method produces more plausible structure, as can especially be seen in the close-up panels focusing on anatomical details. We provide additional example results in the Supplementary Materials.
V. Discussion
Modeling Choices. We explicitly model and estimate a latent low-dimensional embedding for each patch. The likelihood model (5) does not include the latent patch representation $x_i$, leading to the observed patch covariance $C_k^{O_i}$. Since the set of observed voxels varies across subjects, the resulting Expectation Maximization algorithm [6] becomes intractable if we marginalize the latent representation out before estimation. Introducing the latent structure explicitly simplifies the optimization problem.
We investigated an alternative modeling choice that instead treats each missing voxel as a latent variable. In particular, we consider the missing values of patch $y_i$ as latent variables, which can optionally be combined with the latent low-dimensional patch representation. These assumptions lead to an Expectation Conditional Maximization (ECM) algorithm [14, 18], a variant of Generalized Expectation Maximization in which parameter updates depend on the previous parameter estimates. The resulting algorithm estimates the expected missing voxel mean and covariance directly, and then updates the cluster parameters (see Supplementary Materials for a complete derivation). The most notable difference between this formulation and simpler algorithms that iteratively fill in missing voxels and then estimate GMM parameters is in the estimation of the expected data covariance, which captures the covariance of the missing and observed data (cf. [18], Ch. 8). We found that, compared to the method presented in Section II, this variant often got stuck in local minima, had difficulty moving away from the initial missing voxel estimates, and was an order of magnitude slower. We provide both implementations in our code.
Restoration. Our restoration method assumes that the observed voxels are noisy manifestations of the low dimensional patch representation, and reconstructs the entire patch, including the observed voxels, leading to smoother images. Depending on the downstream analyses, the original voxels could instead be kept in the reconstruction. We also investigated an alternative reconstruction method that fills in only the missing voxels, treating the observed voxels as noiseless ground truth (not shown). This formulation leads to sharper but noisier results. The two restoration methods therefore yield images with different characteristics. This tradeoff is a function of the noise in the original acquisition: higher noise in the clinical acquisition leads to noisier reconstructions using the alternative method, whereas in the ADNI dataset the two methods perform similarly. In addition, imputation can be achieved by sampling the posterior distribution rather than using the conditional mean estimate, enabling a better estimate of the residual noise for downstream analysis.
Usability. Our model assumes that whether a voxel is observed is independent of the intensity of that voxel. Although the voxels missing in the sparsely sampled images clearly form a spatial pattern, we assume there is no correlation with the actual intensity of the voxels. The model can therefore be learned from data with varying sparseness patterns, including restoring data in all acquisition directions simultaneously.
The proposed method can be used for general image imputation using datasets of varying resolution. For example, although acquiring a large high resolution dataset for a clinical study is often infeasible, our algorithm naturally makes use of any additional image data available. Even a small number of acquisitions in different directions, or at higher resolution than the study scans, promises to improve the accuracy of the resulting reconstruction.
The presented model depends on the image collection containing similar anatomical structures roughly aligned, such as affinely aligned brain or cardiac MR scans. Smaller datasets that contain vastly different scans, such as traumatic brain injuries or tumors, may not contain enough consistency to enable the model to learn meaningful covariation. However, a wide range of clinical datasets contain the anatomical consistency required, and can benefit from the proposed method.
Initialization. We experimented with several initialization schemes, and provide them in our implementation. A natural initialization is to first learn a simple GMM from the linearly interpolated volumes, and use the resulting parameter values to initialize our method. This leads to results that improve on the linear interpolation but still retain somewhat blocky effects caused by interpolation. More agnostic initializations, such as random parameter values, lead to more realistic anatomy but noisier final estimates, and different methods perform well in different regions of the brain. The experimental results are initialized by first learning a simple GMM from the linearly interpolated volumes, and using the resulting means with diagonal covariances as the initial parameter setting. We start with a low dimensional representation of small dimension and grow it with every iteration up to the desired dimension $d$. We found that this approach outperforms the other strategies.
VI. Conclusions
We propose an image imputation method that employs a large collection of lowresolution images to infer finescale anatomy of a particular subject. We introduce a model that captures anatomical similarity across subjects in large clinical image collections, and imputes, or fills in, the missing data in low resolution scans. The method produces anatomically plausible volumetric images consistent with sparsely sampled input scans.
Our approach does not require high resolution scans or expert annotations for training. We demonstrate that our algorithm is robust to many data variations, including varying slice thickness. The resulting method enables the use of untapped clinical data for large scale scientific studies and promises to facilitate novel clinical analyses.
Acknowledgment
We acknowledge the following funding sources: NIH NINDS R01NS086905, NIH NICHD U01HD087211, NIH NIBIB NAC P41EB015902, NIH R41AG05224601, 1K25EB01364901, 1R21AG05012201, NSF IIS 1447473, Wistron Corporation, and SIP.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH1220012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from several agencies listed at http://adni.loni.usc.edu/about/.
Appendix A Expectation Maximization Updates
References
 [1] Komal Kumar Bhatia, Akhila Rao, Anthony N Price, Robin Wolz, Joseph V Hajnal, and Daniel Rueckert. Hierarchical manifold learning for regional image analysis. TMI, 33(2):444–461, 2014.
 [2] Eyal Carmi, Siuyan Liu, Noga Alon, Amos Fiat, and Daniel Fiat. Resolution enhancement in MRI. Magnetic resonance imaging, 24(2):133–154, 2006.
 [3] Pierrick Coupé, José V Manjón, Vladimir Fonov, Jens Pruessner, Montserrat Robles, and Louis D Collins. Patch-based segmentation using expert priors: Application to hippocampus and ventricle segmentation. NeuroImage, 54(2):940–954, 2011.
 [4] Adrian V Dalca, Andreea Bobu, Natalia S Rost, and Polina Golland. Patch-based discrete registration of clinical brain images. In International Workshop on Patch-based Techniques in Medical Imaging, pages 60–67. Springer, 2016.
 [5] Adrian V Dalca, Katherine L Bouman, William T Freeman, Natalia S Rost, Mert R Sabuncu, and Polina Golland. Population based image imputation. In Information Processing in Medical Imaging. Springer, 2017.
 [6] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society. Series B (methodological), 39(1):1–38, 1977.
 [7] Adriana Di Martino, ChaoGan Yan, Qingyang Li, Erin Denio, Francisco X Castellanos, Kaat Alaerts, et al. The autism brain imaging data exchange: towards a largescale evaluation of the intrinsic brain architecture in autism. Molecular psychiatry, 19(6):659–667, 2014.

 [8] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision, pages 184–199. Springer, 2014.
 [9] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE TIP, 15(12):3736–3745, 2006.
 [10] William T Freeman, Thouis R Jones, and Egon C Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, 22(2):56–65, 2002.
 [11] Daniel Glasner, Shai Bagon, and Michal Irani. Super-resolution from a single image. In Computer Vision, International Conference on, pages 349–356. IEEE, 2009.
 [12] Derek LG Hill, Philipp G Batchelor, Mark Holden, and David J Hawkes. Medical image registration. Physics in medicine and biology, 46(3):R1, 2001.
 [13] Juan E Iglesias, Ender Konukoglu, Darko Zikic, Ben Glocker, Koen Van Leemput, and Bruce Fischl. Is synthesizing MRI contrast useful for inter-modality analysis? MICCAI, LNCS 8149:631–638, 2013.

 [14] Alexander Ilin and Tapani Raiko. Practical approaches to principal component analysis in the presence of missing values. J Mach Learn Res, 11:1957–2000, 2010.
 [15] Clifford R Jack, Matt A Bernstein, Nick C Fox, Paul Thompson, Gene Alexander, Danielle Harvey, et al. The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. Journal of Magnetic Resonance Imaging, 27(4):685–691, 2008.
 [16] Amod Jog, Aaron Carass, and Jerry L Prince. Improving magnetic resonance resolution with supervised learning. In ISBI, pages 987–990. IEEE, 2014.
 [17] Ender Konukoglu, Andre van der Kouwe, Mert R Sabuncu, and Bruce Fischl. Example-based restoration of high-resolution magnetic resonance image acquisitions. MICCAI, LNCS 8149:131–138, 2013.
 [18] Roderick JA Little and Donald B Rubin. Statistical analysis with missing data. Wiley, 2014.
 [19] José V Manjón, Pierrick Coupé, Antonio Buades, Vladimir Fonov, D Louis Collins, and Montserrat Robles. Non-local MRI upsampling. Medical Image Analysis, 14(6):784–792, 2010.
 [20] José V Manjón, Pierrick Coupé, Antonio Buades, D Louis Collins, and Montserrat Robles. New methods for MRI denoising based on sparseness and self-similarity. Medical Image Analysis, 16(1):18–27, 2012.
 [21] Ozan Oktay, Wenjia Bai, Matthew Lee, Ricardo Guerrero, Konstantinos Kamnitsas, Jose Caballero, Antonio de Marvao, Stuart Cook, Declan O'Regan, and Daniel Rueckert. Multi-input cardiac image super-resolution using convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 246–254. Springer, 2016.
 [22] Esben Plenge, Dirk HJ Poot, Wiro J Niessen, and Erik Meijering. Super-resolution reconstruction using cross-scale self-similarity in multi-slice MRI. MICCAI: Medical Image Computing and Computer-Assisted Intervention, LNCS 8151:123–130, 2013.
 [23] Natalia S Rost, Kaitlin Fitzpatrick, Alessandro Biffi, Allison Kanakis, William Devan, Christopher D Anderson, Lynelle Cortellini, Karen L Furie, and Jonathan Rosand. White matter hyperintensity burden and susceptibility to cerebral ischemia. Stroke, 41(12):2807–2811, 2010.
 [24] Francois Rousseau, Piotr A Habas, and Colin Studholme. A supervised patch-based approach for human brain labeling. IEEE Transactions on Medical Imaging, 30(10):1852–1862, 2011.
 [25] François Rousseau, Alzheimer's Disease Neuroimaging Initiative, et al. A non-local approach for image super-resolution using inter-modality priors. Medical Image Analysis, 14(4):594–605, 2010.
 [26] François Rousseau, Kio Kim, and Colin Studholme. A groupwise super-resolution approach: application to brain MRI. In ISBI, pages 860–863. IEEE, 2010.
 [27] Paul Schmidt, Christian Gaser, Milan Arsic, Dorothea Buck, Annette Förschler, Achim Berthele, Muna Hoshi, Rüdiger Ilg, Volker J Schmid, Claus Zimmer, et al. An automated tool for detection of FLAIR-hyperintense white-matter lesions in multiple sclerosis. Neuroimage, 59(4):3774–3783, 2012.
 [28] Isaac Jacob Schoenberg. Cardinal spline interpolation, volume 12. SIAM, 1973.
 [29] Florent Ségonne, Anders M Dale, Evelina Busa, Maureen Glessner, David Salat, Horst K Hahn, and Bruce Fischl. A hybrid approach to the skull stripping problem in MRI. Neuroimage, 22(3):1060–1075, 2004.
 [30] Wenzhe Shi, Jose Caballero, Christian Ledig, Xiahai Zhuang, Wenjia Bai, et al. Cardiac image super-resolution with global correspondence using multi-atlas patch-match. MICCAI: Medical Image Computing and Computer-Assisted Intervention, LNCS 8151:9–16, 2013.
 [31] Ramesh Sridharan, Adrian V Dalca, Kaitlin M Fitzpatrick, Lisa Cloonan, Allison Kanakis, Ona Wu, et al. Quantification and analysis of large multimodal clinical image studies: Application to stroke. MICCAI MBIA Workshop, LNCS 8159:18–30, 2013.
 [32] Koen Van Leemput, Frederik Maes, Dirk Vandermeulen, Alan Colchester, and Paul Suetens. Automated segmentation of multiple sclerosis lesions by model outlier detection. IEEE Transactions on Medical Imaging, 20(8):677–688, 2001.
 [33] Jianchao Yang, John Wright, Thomas S Huang, and Yi Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010.
 [34] Ruoqiao Zhang, Charles A Bouman, Jean-Baptiste Thibault, and Ken D Sauer. Gaussian mixture Markov random field for image denoising and reconstruction. In IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 1089–1092. IEEE, 2013.
 [35] Daniel Zoran and Yair Weiss. From learning models of natural image patches to whole image restoration. In IEEE International Conference on Computer Vision (ICCV), pages 479–486. IEEE, 2011.
 [36] Daniel Zoran and Yair Weiss. Natural images, Gaussian mixtures and dead leaves. In Advances in Neural Information Processing Systems, pages 1736–1744, 2012.
Derivation of Alternative Model
In this section, we explore parameter estimation for an alternative model. Specifically, letting $\mathcal{M}_j$ be the set of missing voxels of patch $y_j$, we treat the missing voxels $y_j^{\mathcal{M}}$ as latent variables, instead of explicitly modeling a low-dimensional representation $x_j$. We show the maximum likelihood updates of the model parameters under the likelihood (5). We employ the Expectation Conditional Maximization (ECM) [14, 18] variant of Generalized Expectation Maximization, where parameter updates depend on the previous parameter estimates.
The complete data likelihood is
$$p(\mathcal{Y}^{\mathcal{O}}, \mathcal{Y}^{\mathcal{M}}, k;\, \theta) = \prod_j \pi_{k_j}\, \mathcal{N}\!\left(y_j;\, \mu_{k_j}, \Sigma_{k_j}\right), \qquad (25)$$
where $y_j = (y_j^{\mathcal{O}}, y_j^{\mathcal{M}})$ stacks the observed and missing voxels of patch $j$, and $k_j$ is its cluster assignment.
The expectation step updates the statistics of the missing data, computed from the covariances of the known and unknown voxels:
$$\hat{y}_{jk}^{\mathcal{M}} = \mu_k^{\mathcal{M}} + \Sigma_k^{\mathcal{M}\mathcal{O}} \left(\Sigma_k^{\mathcal{O}\mathcal{O}}\right)^{-1} \left(y_j^{\mathcal{O}} - \mu_k^{\mathcal{O}}\right), \qquad (26)$$
$$\widehat{y_{jk} y_{jk}^T} = \hat{y}_{jk}\hat{y}_{jk}^T + \begin{bmatrix} 0 & 0 \\ 0 & C_k \end{bmatrix}, \qquad (27)$$
$$C_k = \Sigma_k^{\mathcal{M}\mathcal{M}} - \Sigma_k^{\mathcal{M}\mathcal{O}} \left(\Sigma_k^{\mathcal{O}\mathcal{O}}\right)^{-1} \Sigma_k^{\mathcal{O}\mathcal{M}}, \qquad (28)$$
where $\hat{y}_{jk} = (y_j^{\mathcal{O}}, \hat{y}_{jk}^{\mathcal{M}})$ and the correction $C_k$ in (28) can be interpreted as the uncertainty in the covariance estimation due to the missing values.
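These statistics are the standard formulas for conditioning a Gaussian on its observed coordinates. As a minimal numpy sketch (our own illustrative function and variable names, not the paper's implementation), the following computes the conditional mean of the missing voxels and the covariance correction for a single patch under a single mixture component:

```python
import numpy as np

def missing_voxel_stats(y, miss, mu, Sigma):
    """E-step statistics for one patch under one Gaussian component.

    y     : patch vector (values at missing entries are ignored)
    miss  : boolean mask, True where a voxel is missing
    mu    : component mean; Sigma : component covariance

    Returns the conditional mean of the missing voxels and the
    correction Sigma^MM - Sigma^MO (Sigma^OO)^{-1} Sigma^OM.
    """
    obs = ~miss
    Soo = Sigma[np.ix_(obs, obs)]
    Smo = Sigma[np.ix_(miss, obs)]
    Smm = Sigma[np.ix_(miss, miss)]
    # regression coefficients Sigma^MO (Sigma^OO)^{-1}
    K = np.linalg.solve(Soo.T, Smo.T).T
    y_hat = mu[miss] + K @ (y[obs] - mu[obs])   # conditional mean
    corr = Smm - K @ Smo.T                      # uncertainty from missing voxels
    return y_hat, corr
```

With a diagonal covariance the observed voxels carry no information about the missing ones, so the conditional mean reduces to the prior mean and the correction to the marginal covariance of the missing block.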
Given estimates for the missing data, the maximization step leads to the familiar Gaussian mixture model parameter updates:
$$\mu_k = \frac{1}{N_k} \sum_j \gamma_{jk}\, \hat{y}_{jk}, \qquad (29)$$
$$\Sigma_k = \frac{1}{N_k} \sum_j \gamma_{jk} \left[ \left(\hat{y}_{jk} - \mu_k\right)\left(\hat{y}_{jk} - \mu_k\right)^T + \begin{bmatrix} 0 & 0 \\ 0 & C_k \end{bmatrix} \right], \qquad (30)$$
$$\pi_k = \frac{N_k}{N}, \qquad (31)$$
where $\gamma_{jk}$ is the responsibility of cluster $k$ for patch $j$ and $N_k = \sum_j \gamma_{jk}$.
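Given the responsibilities and imputed patches, the maximization step is the usual weighted GMM update. A minimal numpy sketch (our own illustrative names; for brevity this omits adding the per-cluster missing-voxel correction to the covariance, which a full implementation would include):

```python
import numpy as np

def gmm_m_step(Y_hat, gamma):
    """Weighted GMM parameter updates from imputed patches.

    Y_hat : (n, D) patches with missing voxels replaced by their
            conditional means
    gamma : (n, K) cluster responsibilities

    Returns mixture weights pi, means mu, and covariances Sigma.
    """
    n, K = gamma.shape
    Nk = gamma.sum(axis=0)                       # N_k = sum_j gamma_jk
    pi = Nk / n                                  # mixture weights
    mu = (gamma.T @ Y_hat) / Nk[:, None]         # weighted means
    diff = Y_hat[:, None, :] - mu[None, :, :]    # (n, K, D) deviations
    Sigma = np.einsum('nk,nki,nkj->kij', gamma, diff, diff) / Nk[:, None, None]
    return pi, mu, Sigma
```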
In addition to the latent missing voxels, we can still model each patch as arising from a low-dimensional representation. We form $\Sigma_k = W_k W_k^T + \sigma_k^2 I$ as in (II-A), leading to the complete data likelihood:
$$p(\mathcal{Y}^{\mathcal{O}}, \mathcal{Y}^{\mathcal{M}}, k;\, \theta) = \prod_j \pi_{k_j}\, \mathcal{N}\!\left(y_j;\, \mu_{k_j},\, W_{k_j} W_{k_j}^T + \sigma_{k_j}^2 I\right). \qquad (32)$$
The expectation steps are then unchanged from (26)–(28), with $W_k W_k^T + \sigma_k^2 I$ replacing $\Sigma_k$. The maximization steps are unchanged from (29)–(31), with (30) now yielding the empirical covariance $\widehat{\Sigma}_k$. We let $\widehat{\Sigma}_k = U_k \Lambda_k U_k^T$ be the singular value decomposition of $\widehat{\Sigma}_k$, leading to the low-dimensional updates
$$W_k = U_{k,1:d} \left(\Lambda_{k,1:d} - \sigma_k^2 I\right)^{1/2}, \qquad (33)$$
$$\sigma_k^2 = \frac{1}{D - d} \sum_{i=d+1}^{D} \lambda_{k,i}, \qquad (34)$$
where $\lambda_{k,i}$ is the $i$-th largest eigenvalue of $\widehat{\Sigma}_k$, $U_{k,1:d}$ and $\Lambda_{k,1:d}$ retain the leading $d$ singular vectors and values, and $D$ is the patch dimension. Finally, we let $\Sigma_k = W_k W_k^T + \sigma_k^2 I$.
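These updates mirror probabilistic PCA: the noise variance is the mean of the discarded eigenvalues and the factors come from the leading eigenvectors. A small numpy sketch under that reading (our own illustrative names):

```python
import numpy as np

def low_rank_updates(Sigma_hat, d):
    """Collapse an empirical covariance onto Sigma = W W^T + sigma^2 I.

    Uses the eigendecomposition of the symmetric PSD matrix Sigma_hat
    (equivalent to its SVD). The noise variance sigma^2 is the mean of
    the D - d discarded eigenvalues, as in probabilistic PCA.
    """
    vals, vecs = np.linalg.eigh(Sigma_hat)
    order = np.argsort(vals)[::-1]               # eigenvalues, descending
    vals, vecs = vals[order], vecs[:, order]
    sigma2 = vals[d:].mean()                     # mean of discarded eigenvalues
    # scale leading eigenvectors by sqrt(lambda_i - sigma^2), clipped at zero
    W = vecs[:, :d] * np.sqrt(np.maximum(vals[:d] - sigma2, 0.0))
    return W, sigma2
```

Applying this to a diagonal covariance makes the behavior easy to check by hand: with eigenvalues $(5, 3, 1, 1)$ and $d = 1$, the noise variance is the mean of the three discarded eigenvalues.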
Unfortunately, both learning procedures involve estimating all of the missing voxel covariances, leading to a large and unstable optimization.