I. Introduction
The Cosmic Microwave Background (CMB) map details the cooled remnant of the electromagnetic radiation released in the early stages of the universe after the Big Bang, which one can still observe today [1]. The CMB is a valuable resource containing information about how the early universe formed. It has a nearly uniform temperature, with small fluctuations visible only to high-precision telescopes. By measuring and understanding these fluctuations, cosmologists can learn about the origin of galaxies and constrain the basic parameters of the Big Bang theory. CMB maps are produced by map-making pipelines based on component-separation methods such as Commander [2, 3], NILC [4], SEVEM [5] and SMICA [6]. Due to foreground contamination in the galaxy, the region near the equator of the map (the equator here corresponds to the galactic plane) has missing CMB observations. The left panel of Figure 3 shows the Planck 2018 Commander CMB map, where the noise near the equator is apparent. The right panel shows a zoomed-in image containing the missing observations at noisy pixels.
Estimating, or inpainting, missing observations is a largely unsolved problem in CMB research [7]. In this work, we propose a new inpainting method for restoring the missing observations, CosmoVAE, stemming from a Bayesian deep learning model. CosmoVAE is a variational autoencoder that takes deep convolutional neural networks as encoder and decoder and treats the whole model as Bayesian inference. The CMB field is a realization of a Gaussian random field on the two-dimensional unit sphere. A high-resolution CMB map is a big data set; for example, at resolution $N_{\mathrm{side}}=2048$, the data is stored at more than 50 million HEALPix points [8]. We show that this Bayesian deep learning model provides an efficient method for inpainting CMB maps using the big data of the CMB.

II. Literature Review
CMB maps can be produced by the following four approaches. NILC [4] is an internal linear combination method that works in the needlet domain. SEVEM [5] is a foreground template-cleaning approach that performs component separation in the pixel domain. SMICA [6] is a non-parametric approach using spherical harmonics as the basis. Commander [2, 3] is an MCMC-based map maker. In Commander, the CMB map is viewed as a realization of a Gaussian random field that can be generated from the angular power spectrum $C_\ell$. The density of the power spectrum is obtained by Gibbs sampling, and the likelihood of $C_\ell$ is estimated by Bayesian inference. One can also use the Blackwell-Rao approximation for estimating the angular power spectrum [9]. The produced CMB map, however, has missing observations. Most of the masked region is concentrated near the equator, due to thermal dust in the galaxy. Inpainting the CMB map is significant for reducing the uncertainty of cosmological parameter estimates derived from CMB data. There have been several methods for CMB map inpainting. For example, the Planck consortium [10] uses Gaussian constrained realizations to replace the high-foreground regions. One can also use Gaussian process regression for CMB image inpainting, which estimates the covariance function of the image pixels and then interpolates the missing pixels [11, 12]. Although these traditional statistical methods make it algorithmically easy to combine different temperature and polarization estimators, they are computationally expensive for the large-sized CMB data set. According to Gruetjen et al. [13], inpainting is an alternative way to construct accurate cut-sky CMB estimators. Gruetjen et al. showed that one can apply inpainting to the problem of unbiased estimation of the power spectrum, utilizing the linearity of inpainting to construct analytically debiased power spectrum estimates from inpainted maps.
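Both the Gaussian constrained realization and the Gaussian process approaches above reduce to conditioning a Gaussian field on the observed pixels. A minimal numerical sketch on a toy one-dimensional field with a squared-exponential covariance (the kernel and its length scale are illustrative assumptions, not the covariance used in [10, 11, 12]):

```python
import numpy as np

def gp_inpaint(obs_idx, miss_idx, y_obs, cov):
    """Fill missing pixels of a zero-mean Gaussian field by conditioning:
    mean = K_mo @ K_oo^{-1} @ y_obs. A small jitter keeps K_oo well-posed."""
    K_oo = cov[np.ix_(obs_idx, obs_idx)]
    K_mo = cov[np.ix_(miss_idx, obs_idx)]
    jitter = 1e-8 * np.eye(len(obs_idx))
    return K_mo @ np.linalg.solve(K_oo + jitter, y_obs)

# Toy example: 1-D field with a squared-exponential covariance.
x = np.linspace(0.0, 1.0, 50)
cov = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.1**2)
rng = np.random.default_rng(0)
L = np.linalg.cholesky(cov + 1e-6 * np.eye(50))
field = L @ rng.standard_normal(50)       # one realization of the field
miss = np.arange(20, 30)                  # a contiguous "mask" region
obs = np.setdiff1d(np.arange(50), miss)
filled = gp_inpaint(obs, miss, field[obs], cov)
```

The cubic cost of the linear solve in the number of observed pixels is exactly what makes these methods expensive at CMB resolutions.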
Recently, deep learning based inpainting methods have significantly improved reconstruction results by learning semantics from large-scale data sets. These methods typically use some kind of convolutional neural network as an end-to-end mapping from masked images to inpainted images. The context encoder [14] is the first algorithm that uses a deep learning approach to reconstruct masked images. It utilizes the autoencoder architecture and convolutional neural networks with a reconstruction loss and an adversarial loss for inpainting, and it achieves surprisingly good performance in restoring an image with a square mask hole. Yang et al. [15] take the result from context encoders as input and then, as post-processing, propagate texture information from non-hole regions to fill the hole regions. Yu et al. [16] proposed an end-to-end image inpainting model with global and local discriminators to ensure that the color and texture of generated regions are consistent with their surroundings. This method has no limitation on the location of mask regions, but the mask shape needs to be rectangular. As the real CMB mask region is irregular, this method is not suitable. To achieve better inpainting performance on irregular masks, partial convolution [17] was proposed by Liu et al., where the convolution operation skips the missing pixels and uses only valid pixels. This specialized convolution operation can properly process irregular masks and does not lead to artifacts such as color discrepancy and blurriness. With a combination of reconstruction loss, perceptual loss [18], and total variation loss as a penalty term [19], the model achieves state-of-the-art image inpainting results on large data sets such as human faces and landscapes.
Researchers have proposed many generative probabilistic models based on neural networks in the past decade. The variational autoencoder (VAE) [20] is one of the most popular approaches. With a well-trained VAE model, we can generate various kinds of images by sampling the latent variable from a specific distribution (e.g., a Gaussian distribution). In many cases, one is interested in training generative models conditional on image features such as labels or characteristics of the human face. Sohn et al. [21] proposed the conditional variational autoencoder, whose input observations modulate the prior on the Gaussian latent variables, which then generate the outputs through the decoder. After training, one can specify the output image by controlling the latent variable. Ivanov et al. [22] modified the conditional VAE and proposed the variational autoencoder with arbitrary conditioning (VAEAC) model. VAEAC can learn features from valid pixels and predict the values of missing pixels. Ivanov et al. used this method to inpaint four different data sets, achieving state-of-the-art performance.

Our network architecture adopts the autoencoder architecture, which is widely used in representation learning and image inpainting. We use a variational Bayesian approximation to obtain the evidence lower bound (ELBO) of the likelihood of the reconstructed image; the ELBO is used in our loss function. Besides, we use skip connections to build a sufficiently deep network and add a perceptual loss to the total loss function. We also replace the partial convolution layer [17] with the vanilla convolution layer, which is more appropriate for CMB image inpainting tasks.
III. CosmoVAE for CMB
Our proposed model, as illustrated in Figure 2, is based on the variational autoencoder (VAE) [20], where the encoder and decoder combine convolutional neural networks (CNNs) and multilayer perceptrons (MLPs). This modified VAE also uses skip connections between the encoder and decoder, which build a U-Net-like architecture [17] in order to guarantee optimal transfer of spatial information from the input to the output image. The basic autoencoder compresses the high-dimensional input (i.e., a segmented image of the CMB map) to a low-dimensional latent variable $z$, and then decompresses it back to a high-dimensional output $\hat{x}$, where the input and output should be the same. In CosmoVAE, the encoder takes the image with a missing region and produces a latent feature representation; the latent features are used by the decoder to produce the image with the missing region restored. In the training stage, the generated image is compared with the ground truth, where the loss function is composed of the negative variational lower bound, a perceptual loss, and a total variation regularizer. A well-trained model can rebuild the masked regions of the CMB map.

III-A Statistical Interpretation of the VAE
Let us consider the joint probability distribution of three random variables $(x, z, \hat{x})$, where $x$ is the input masked CMB map, $z$ is the vector of latent variables, and $\hat{x}$ is the vector corresponding to the reconstructed CMB map. We use neural networks for the probabilistic encoder $q_\phi(z \mid x)$ and decoder $p_\theta(\hat{x} \mid z)$, where $\phi$ denotes the parameters of the encoder network and $\theta$ the parameters of the decoder network. The marginal log-likelihood of the output for the training samples can be bounded below by the variational lower bound, which is used as a surrogate objective function:
$$\log p_\theta(\hat{x}) \;\ge\; \mathcal{L}(\theta, \phi; \hat{x}, x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(\hat{x} \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big). \tag{1}$$
The variational lower bound consists of the negative reconstruction loss and the Kullback-Leibler divergence between the approximate posterior of the encoder network and the prior of the latent variable. By duality, maximizing the log-likelihood is equivalent to minimizing the negative lower bound (with respect to the parameters $\theta$ and $\phi$). We thus need the gradient of the expectation term. However, this term is intractable, and standard Monte Carlo gradient estimators have too high a variance to be computationally efficient. To overcome this, we use the reparameterization trick [20] to obtain another differentiable estimator that is computationally friendly.

III-B Prior Specification for the Latent Variables
The KL-divergence term in the variational lower bound in (1) can be interpreted as a regularizer for the parameters $\theta$ and $\phi$, encouraging the approximate posterior $q_\phi(z \mid x)$ to be close to the prior $p(z)$. The posterior distribution can be estimated by the probabilistic encoder, but the prior distribution remains to be determined.
As the CMB is a Gaussian random field, the temperature anisotropy $T(\hat{n})$ in direction $\hat{n}$ can be expanded in spherical harmonics $Y_{\ell m}$ with Fourier coefficients $a_{\ell m}$:
$$T(\hat{n}) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} a_{\ell m}\, Y_{\ell m}(\hat{n}).$$
A Gaussian random field is fully determined by its mean and variance. The coefficient $a_{\ell m}$ follows a centered Gaussian with variance $C_\ell$, where $C_\ell$ is the CMB angular power spectrum. Here, we assume that our latent variable $z$ is given by the angular power spectrum $C_\ell$, and the generative field is connected with the latent variable through $a_{\ell m} \sim \mathcal{N}(0, C_\ell)$. The KL-divergence term then becomes the divergence between the posterior distribution and the prior distribution determined by the power spectrum. In this way, we assign physical meaning to the VAE model: the latent variable is the angular power spectrum of the learned field, and the angular power spectrum samples the Fourier coefficients of the generative field.
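The step from power spectrum to Fourier coefficients can be sketched in a few lines. Below is a simplified, numpy-only stand-in for `healpy.synalm`, using a toy spectrum purely for demonstration; sampling $a_{\ell m} = \sqrt{C_\ell}\,\varepsilon$ with $\varepsilon \sim \mathcal{N}(0,1)$ is the reparameterization trick in this special case:

```python
import numpy as np

def sample_alm(cl, rng):
    """Draw coefficients a_{lm} ~ N(0, C_l): for each multipole l there are
    2l+1 real degrees of freedom, stored here as one array per l (a
    simplified stand-in for healpy.synalm)."""
    return [np.sqrt(cl[l]) * rng.standard_normal(2 * l + 1)
            for l in range(len(cl))]

def estimate_cl(alm):
    """Empirical power spectrum: C_l_hat = (1/(2l+1)) * sum_m a_{lm}^2."""
    return np.array([np.mean(a**2) for a in alm])

rng = np.random.default_rng(1)
ell = np.arange(100)
cl = 1.0 / (ell + 1.0)**2           # a toy decaying spectrum (assumption)
alm = sample_alm(cl, rng)
cl_hat = estimate_cl(alm)           # should scatter around the input cl
```

Averaged over multipoles, the recovered spectrum matches the input, which is what lets the latent power spectrum stand in for the field itself.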
III-C Loss Function
The overall loss function to train the model is defined as
$$\mathcal{L}_{\mathrm{total}} = \lambda_{\mathrm{vlb}}\, \mathcal{L}_{\mathrm{vlb}} + \lambda_{\mathrm{perc}}\, \mathcal{L}_{\mathrm{perc}} + \lambda_{\mathrm{tv}}\, \mathcal{L}_{\mathrm{tv}},$$
with an appropriate weight for each term. The weights $\lambda$ are hyperparameters, tuned manually. Here $\mathcal{L}_{\mathrm{vlb}}$ is the negative variational lower bound, which consists of the reconstruction loss and the KL-divergence regularization, $\mathcal{L}_{\mathrm{perc}}$ is the perceptual loss [18], and $\mathcal{L}_{\mathrm{tv}}$ is the total variation loss [19].
Reconstruction Loss
We use a binary mask $M$, which is 0 for pixels outside the masked region and 1 for pixels inside the masked region. For the network prediction $\hat{x}$ and the ground truth $x$, the reconstruction loss is then
$$\mathcal{L}_{\mathrm{rec}} = \frac{1}{N_x}\,\big\| M \odot (\hat{x} - x) \big\|_1 + \frac{1}{N_x}\,\big\| (1 - M) \odot (\hat{x} - x) \big\|_1,$$
where $N_x = C \times H \times W$ denotes the normalization constant, with $C$ the channel size and $H$ and $W$ the height and width of the image.
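The exact split between hole and valid terms is not fully shown in the source; a minimal sketch following the partial-convolution convention [17], with the mask convention above (1 inside the hole):

```python
import numpy as np

def reconstruction_loss(pred, truth, mask):
    """L1 reconstruction loss over hole and valid pixels separately,
    each normalized by N = C*H*W (mask = 1 inside the missing region)."""
    n = pred.size                                   # C * H * W
    hole = np.abs(mask * (pred - truth)).sum() / n
    valid = np.abs((1 - mask) * (pred - truth)).sum() / n
    return hole, valid

pred = np.ones((1, 4, 4))
truth = np.zeros((1, 4, 4))
mask = np.zeros((1, 4, 4))
mask[0, :2, :] = 1                                  # top half missing
hole, valid = reconstruction_loss(pred, truth, mask)
# |pred - truth| = 1 everywhere, so hole = 8/16 = 0.5 and valid = 0.5
```

In practice the two terms are usually given different weights so that errors inside the hole dominate the gradient.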
Regularization
The regularization term is the KL-divergence between the posterior distribution and the prior distribution. With a Gaussian posterior $q_\phi(z \mid x) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ and the prior $p(z) = \mathcal{N}(0, \mathrm{diag}(C_\ell))$, the divergence can be computed analytically as
$$D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big) = \frac{1}{2} \sum_{\ell} \left( \frac{\sigma_\ell^2 + \mu_\ell^2}{C_\ell} - 1 - \log \frac{\sigma_\ell^2}{C_\ell} \right),$$
where $q_\phi(z \mid x)$ is the approximate posterior distribution of the encoder network.
Perceptual Loss
The perceptual loss was first proposed in [18] to preserve image content in style transfer and is now widely used in image inpainting. It computes the $\ell_1$ loss between high-level feature maps of the predicted image and the ground truth:
$$\mathcal{L}_{\mathrm{perc}} = \sum_{i} \frac{1}{N_i}\, \big\| \psi_i(\hat{x}) - \psi_i(x) \big\|_1,$$
where $\psi_i$ is the activation map of the $i$-th selected layer, which lies in a higher-level feature space of an ImageNet-pretrained VGG16 [23], and $N_i$ is the number of elements in that activation map. We use the Pool1, Pool2 and Pool3 layers of VGG16 for our loss.
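The structure of this loss can be illustrated without the pretrained network. Below, average pooling at three scales is a deliberately crude stand-in for the VGG16 Pool1-Pool3 activations (an assumption for demonstration only; the real loss uses the learned feature maps):

```python
import numpy as np

def avg_pool(img, k):
    """Stand-in 'feature map': k x k average pooling of a 2-D image
    (a placeholder for a VGG16 pooling-layer activation)."""
    h, w = img.shape[0] // k * k, img.shape[1] // k * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def perceptual_loss(pred, truth, scales=(2, 4, 8)):
    """L1 distance between feature maps of prediction and ground truth,
    averaged per feature element and summed over the selected layers."""
    loss = 0.0
    for k in scales:
        fp, ft = avg_pool(pred, k), avg_pool(truth, k)
        loss += np.abs(fp - ft).mean()
    return loss

img_a = np.zeros((16, 16))
img_b = np.ones((16, 16))
loss = perceptual_loss(img_b, img_a)   # 3.0: each of the 3 scales contributes 1
```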
Total Variation Loss
The final term in the loss is the total variation loss, acting as a smoothing penalty:
$$\mathcal{L}_{\mathrm{tv}} = \sum_{(i,j) \in R} \frac{\big\| x_{i,j+1} - x_{i,j} \big\|_1 + \big\| x_{i+1,j} - x_{i,j} \big\|_1}{N_{\mathrm{hole}}},$$
where $R$ is the region of 1-pixel dilation of the mask region and $N_{\mathrm{hole}}$ is the number of pixels in the mask region [19].
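A direct implementation of this penalty follows; the 4-neighbourhood dilation of the hole mask is our reading of the 1-pixel dilation region:

```python
import numpy as np

def total_variation_loss(img, hole_mask):
    """Total-variation smoothing penalty over the 1-pixel dilation of the
    hole region, normalized by the number of hole pixels."""
    # Dilate the hole mask by one pixel (4-neighbourhood).
    d = hole_mask.astype(bool).copy()
    d[1:, :] |= hole_mask[:-1, :].astype(bool)
    d[:-1, :] |= hole_mask[1:, :].astype(bool)
    d[:, 1:] |= hole_mask[:, :-1].astype(bool)
    d[:, :-1] |= hole_mask[:, 1:].astype(bool)
    n_hole = hole_mask.sum()
    # First differences, counted only where both pixels lie in the region.
    dh = np.abs(img[:, 1:] - img[:, :-1]) * (d[:, 1:] & d[:, :-1])
    dv = np.abs(img[1:, :] - img[:-1, :]) * (d[1:, :] & d[:-1, :])
    return (dh.sum() + dv.sum()) / n_hole

img = np.zeros((4, 4))
img[1, 1] = 1.0                      # one bright pixel inside the hole
hole = np.zeros((4, 4), dtype=int)
hole[1, 1] = 1
tv = total_variation_loss(img, hole)  # 4 unit differences / 1 hole pixel = 4.0
```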
IV. Experimental Results
IV-A Generating Training and Test Data Sets
In the context of CMB map inpainting, preparing training sets presents two challenges: the original map is not flat (it resides naturally on a sphere), and there is only a single CMB map available for training. We solve both problems by projecting the spherical CMB map onto the plane by Cartesian projection. In order to obtain a sufficiently large data set, we segment the whole map of around 50 million pixels into thousands of small images of 400×400 pixels. This is a feasible approach because we assume the CMB field is an isotropic Gaussian random field on the two-dimensional sphere: the resulting small images are random fields on squares, which can be assumed independent and identically distributed. We can then treat each small image of the CMB map independently as training data for the deep learning model.
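The segmentation step can be sketched as follows. This is a non-overlapping tiling of an already-projected 2-D array; the paper's actual crops are rectangles centered at equally spaced points, so this is a simplification of that procedure:

```python
import numpy as np

def tile_map(projected, tile=400):
    """Cut a Cartesian-projected full-sky map (a 2-D array, e.g. from
    healpy.cartview with return_projected_map=True) into non-overlapping
    tile x tile patches."""
    h, w = projected.shape
    rows, cols = h // tile, w // tile
    return (projected[:rows * tile, :cols * tile]
            .reshape(rows, tile, cols, tile)
            .swapaxes(1, 2)
            .reshape(rows * cols, tile, tile))

# Toy example with a small "map" and tile size 4.
fake_map = np.arange(8 * 12, dtype=float).reshape(8, 12)
patches = tile_map(fake_map, tile=4)   # 6 patches of shape 4 x 4
```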
Image data set
The data of CMB maps are stored at the HEALPix points on the unit sphere [8] (http://healpix.sf.net). In the experiments, we use the Planck 2018 Commander CMB map at $N_{\mathrm{side}} = 2048$ (about 50 million HEALPix points), downloaded from the Planck Legacy Archive (https://pla.esac.esa.int/#maps). To generate small images cropped from the original map, we project the original map to a flat 2D image using the Cartesian projection of the healpy Python package [24]. We equally space points on the projected CMB map, and for each point, which becomes the center of a small image, we take a rectangle of fixed latitudinal and longitudinal angular extent around that center. The spherical full-sky CMB map is thus segmented into 4,042 small flat images, each with resolution 400×400. The 772 small images that contain missing regions form the test data set, and the remaining 3,320 clean images form the training set.
Mask data set
The mask data set is generated in the same way as the training data set. We use the Planck 2018 Component Separation Inpainting Common mask in Intensity, downloaded from the Planck Legacy Archive; see Figure 3. The spherical mask map is divided with the same size and same centers as the full-sky CMB map and segmented into small flat mask images. Each mask image corresponds to a CMB image in the training data set and is used for masking the missing pixels in training and testing. The mask images that contain missing pixels are selected at random during the training process.
IV-B Network Architecture and Training
Our proposed model is implemented in Keras (https://github.com/keras-team/keras). The network architecture is a U-Net-like network. The encoder and decoder each contain six blocks and three fully connected layers. The encoder network has channel architecture 64-128-256-512-512-512, and the decoder network has channel architecture 512-512-512-256-128-64. The latent variable is channel-wise with 2,507 components, which corresponds to the angular power spectrum $C_\ell$ with $\ell$ up to 2,507. (There are thus 6,290,064 Fourier coefficients that approximately represent the learned field.) The encoder output samples the latent variable. The whole network is trained on the 3,320 images of size 400×400 with batch size 4 and a maximum of 1,000 epochs. The model is optimized using the Adam optimizer with learning rate 0.0002 and $\beta_1 = 0.5$. In the training stage, we use the best-fit $\Lambda$CDM CMB TT power spectrum from the Planck PR3 baseline (https://pla.esac.esa.int/#cosmology) as the prior variance $C_\ell$.

The experiment is carried out on Google Colab with an Nvidia Tesla K80 (2,496 CUDA cores, compute capability 3.7, 12 GB GDDR5 VRAM). With no pretrained weights, it takes roughly 6 hours to reach convergence. The batch size is chosen to fit the memory allowed in Colab; with sufficient memory, one can speed up the training by increasing the batch size.
IV-C Test Results
As our test data set consists of masked small CMB images without ground truth, we apply the real mask to known small CMB images to evaluate the performance of our model; in this way, we can check the model's capacity for CMB image inpainting. As shown in Figure 4, we compare the training result of CosmoVAE with the ground truth image visually and quantitatively. The training metrics, mean squared error (MSE), mean absolute error (MAE), and peak signal-to-noise ratio (PSNR), are 0.0055, 0.134, and 23.989 respectively, which indicates the excellent performance of the model. The region predicted by CosmoVAE is visually close to the ground truth. This illustrates that our model can achieve state-of-the-art performance in training.
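These three metrics are standard; for completeness, a minimal implementation (the unit data range assumed for the PSNR is our assumption about the image normalization):

```python
import numpy as np

def inpainting_metrics(pred, truth, data_range=1.0):
    """MSE, MAE and PSNR (in dB) between a prediction and the ground truth."""
    err = pred - truth
    mse = np.mean(err**2)
    mae = np.mean(np.abs(err))
    psnr = 10.0 * np.log10(data_range**2 / mse) if mse > 0 else np.inf
    return mse, mae, psnr

truth = np.zeros((8, 8))
pred = truth + 0.1                      # uniform error of 0.1
mse, mae, psnr = inpainting_metrics(pred, truth)   # 0.01, 0.1, 20 dB
```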
After training CosmoVAE, we use it to predict the missing pixels of each small CMB image in the test data set. Figure 5 shows five examples of the results predicted by CosmoVAE, which we compare with the Planck 2018 results [10]. The leftmost column shows the inpainted CMB images of the Planck 2018 results [10]. The second column shows the original (uninpainted) CMB images with irregular missing regions, which are the actual inputs to the trained CosmoVAE. The third column shows the corresponding predicted images, where the network restores the missing regions. As we can observe, the trained CosmoVAE can inpaint CMB images with irregular mask regions, even when the mask area is large and there are multiple mask holes. The predicted results are consistent with the Planck 2018 results. CosmoVAE thus provides a useful inpainting model for the CMB map.
IV-D Uncertainty Quantification
One can interpret CosmoVAE as a probabilistic model in a similar way to AEVB [20]. More concretely, denote the encoder neural network mapping the input $x$ to a stochastic latent variable $z$ (see Figure 2) by the conditional probability $q_\phi(z \mid x)$, where $\phi$ denotes the parameters of the encoder network. Similarly, denote the decoder neural network mapping $z$ to the output $\hat{x}$ by $p_\theta(\hat{x} \mid z)$, where $\theta$ denotes the weights and biases of the decoder network.

In the Bayesian inferential context, $p(z \mid x)$ is the posterior distribution obtained from the prior and likelihood combination $p(z \mid x) \propto p(x \mid z)\, p(z)$, where $p(x \mid z)$ is the likelihood (the "generative" model, or decoder) and $p(z)$ is the prior. In the variational inference framework, a distribution $q_\phi(z \mid x)$ can be used to approximate this intractable posterior $p(z \mid x)$.
We take the latent variable $z$ as a random variate whose relationship to $x$ is given by the encoder network with parameters $\phi$. Taking $q_\phi(z \mid x)$ to be a normal distribution with parameters $(\mu_\phi(x), \sigma_\phi(x))$, minimizing the Kullback-Leibler divergence between $q_\phi(z \mid x)$ and the true posterior is then the same as minimizing the loss function with respect to $\phi$ and $\theta$, where different loss functions broadly correspond to the noise distribution we assume for $\hat{x}$. In CosmoVAE, the parameters $\mu$ and $\sigma$ depend on the input $x$, whose components are respectively the pixels outside and inside the masked regions.

Having a posterior distribution allows us to obtain an uncertainty estimate. To do this, consider the posterior predictive distribution of $\hat{x}$,
$$p(\hat{x} \mid x) = \int p(\hat{x} \mid z)\, q_\phi(z \mid x)\, dz,$$
where $q_\phi(z \mid x)$ is the posterior distribution obtained from the variational approximation using the training data, and $p(\hat{x} \mid z)$ is the generative model. When a discriminator model is added to the output, we can sample from the posterior predictive distribution and retain images that the discriminator evaluates as true. The variability in the pixels of these "true" images provides the uncertainty measure.
Figure 6 shows the uncertainty of the CosmoVAE predictor. The second column shows the CMB image with the missing region masked. The fourth column shows the standard deviation of the inpainted images of the trained CosmoVAE at each missing pixel over 100 different realizations of the latent variable. Compared with the Planck 2018 result in the first column for the same segment of the full-sky CMB map, the CosmoVAE-inpainted image has the same image quality. The standard deviation for each image is very small, and the locations where it is significant form only a small fraction of the mask region. Thus, the uncertainty of CosmoVAE is controllable and has little effect on the predicted image.
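The uncertainty map can be reproduced schematically by repeated latent draws. The decoder below is a toy stand-in, not the trained network; the procedure, a Monte Carlo standard deviation over reparameterized samples, is the same:

```python
import numpy as np

def predictive_std(decoder, mu, log_var, hole_mask, n_samples=100, seed=0):
    """Per-pixel standard deviation of the inpainted image over repeated
    draws z = mu + sigma * eps of the latent variable, restricted to the
    missing region (the Monte Carlo uncertainty map)."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(0.5 * log_var)
    outs = [decoder(mu + sigma * rng.standard_normal(mu.shape))
            for _ in range(n_samples)]
    std = np.std(np.stack(outs), axis=0)
    return std * hole_mask          # uncertainty only where pixels are missing

# Toy decoder: broadcasts the latent mean onto a 4 x 4 image.
decoder = lambda z: np.full((4, 4), z.mean())
mu, log_var = np.zeros(8), np.zeros(8)          # posterior sigma = 1
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1                              # a 2 x 2 hole
unc = predictive_std(decoder, mu, log_var, mask, n_samples=500)
```

With 100 or more samples the Monte Carlo error of the standard deviation is already at the few-percent level, which is why 100 realizations suffice in Figure 6.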
V. Conclusion and Future Plan
Statistical processing of the CMB data is one of the biggest challenges in CMB analysis. In this work, we reconstruct the cosmic microwave background (CMB) map using a modified variational autoencoder (VAE) as our baseline model. We cut the full-sky CMB map into many small images to generate our image and mask data sets, and then train the model to inpaint hole areas with arbitrarily shaped masks. To enhance the performance, we connect our neural network with the angular power spectrum, which generates the Fourier coefficients of the Gaussian random field. We also modify the original VAE loss function by adding the perceptual loss and the total-variation regularizer. This new VAE model assigns cosmological meaning to the parameters of the network and thus achieves state-of-the-art performance for CMB map inpainting.
To better complete the image inpainting task and support cosmological studies, one needs to reconstruct the full-sky CMB map from all the small inpainted CMB images. We can use the inpainted full-sky CMB map to estimate cosmological parameters such as the angular power spectrum $C_\ell$, which can be computed directly with the healpy package. Inpainting the CMB map will help reduce the uncertainty in the parameter estimation. As shown by Tolstikhin et al. [25], any method based on the marginal log-likelihood, including the VAE, necessarily leads to blurry output due to the gap between the true negative log-likelihood and its upper bound (the negative ELBO). We can further improve the quality of the reconstructed images by replacing the KL-divergence in the loss with a GAN or (more stable) WGAN regularizer. Our model can also serve as a baseline for the reconstruction of other Gaussian random fields besides the CMB field. We will probe these problems in future work.
Acknowledgments
Some of the results in this paper have been derived using the HEALPix [8] package.
References
 [1] Planck Collaboration, Y. Akrami, F. Arroja, M. Ashdown, J. Aumont, C. Baccigalupi, M. Ballardini, A. Banday, R. Barreiro, N. Bartolo, S. Basak et al., “Planck 2018 results. I. Overview and the cosmological legacy of Planck,” arXiv preprint arXiv:1807.06205, 2018.
 [2] H. K. Eriksen, C. Dickinson, C. R. Lawrence, C. Baccigalupi, A. J. Banday, K. M. Górski, F. K. Hansen, P. B. Lilje, E. Pierpaoli, M. D. Seiffert, K. M. Smith, and K. Vanderlinde, “Cosmic microwave background component separation by parameter estimation,” ApJ, vol. 641, pp. 665–682, Apr. 2006.
 [3] H. K. Eriksen, J. B. Jewell, C. Dickinson, A. J. Banday, K. M. Górski, and C. R. Lawrence, “Joint Bayesian component separation and CMB power spectrum estimation,” ApJ, vol. 676, pp. 10–32, 2008.
 [4] J. Delabrouille, J.-F. Cardoso, M. Le Jeune, M. Betoule, G. Fay, and F. Guilloux, “A full sky, low foreground, high resolution CMB map from WMAP,” A&A, vol. 493, pp. 835–857, 2009.
 [5] R. Fernández-Cobos, P. Vielva, R. B. Barreiro, and E. Martínez-González, “Multiresolution internal template cleaning: an application to the Wilkinson Microwave Anisotropy Probe 7-yr polarization data,” MNRAS, vol. 420, pp. 2162–2169, Mar. 2012.
 [6] J.-F. Cardoso, M. Le Jeune, J. Delabrouille, M. Betoule, and G. Patanchon, “Component separation with flexible models—application to multichannel astrophysical observations,” IEEE J. Sel. Top. Signal Process., vol. 2, pp. 735–746, Nov. 2008.
 [7] P. Cabella, D. Marinucci et al., “Statistical challenges in the analysis of cosmic microwave background radiation,” Ann. Appl. Stat., vol. 3, no. 1, pp. 61–95, 2009.
 [8] K. M. Górski, E. Hivon, A. J. Banday, B. D. Wandelt, F. K. Hansen, M. Reinecke, and M. Bartelmann, “HEALPix: A framework for highresolution discretization and fast analysis of data distributed on the sphere,” ApJ, vol. 622, pp. 759–771, Apr. 2005.
 [9] M. Chu, H. Eriksen, L. Knox, K. Górski, J. Jewell, D. Larson, I. O’Dwyer, and B. Wandelt, “Cosmological parameter constraints as derived from the Wilkinson Microwave Anisotropy Probe data via Gibbs sampling and the Blackwell-Rao estimator,” Phys. Rev. D, vol. 71, no. 10, p. 103002, 2005.
 [10] Y. Akrami, M. Ashdown, J. Aumont, C. Baccigalupi, M. Ballardini, A. Banday, R. Barreiro, N. Bartolo, S. Basak, K. Benabed et al., “Planck 2018 results. IV. Diffuse component separation,” A&A, 2019.
 [11] A. Aghamousa, J. Hamann, and A. Shafieloo, “A non-parametric consistency test of the ΛCDM model with Planck CMB data,” J. Cosmol. Astropart. Phys., vol. 2017, no. 09, pp. 031–031, Sep. 2017.

 [12] C. K. Williams and C. E. Rasmussen, Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006, vol. 2, no. 3.
 [13] H. Gruetjen, J. R. Fergusson, M. Liguori, and E. Shellard, “Using inpainting to construct accurate cut-sky CMB estimators,” Phys. Rev. D, vol. 95, no. 4, p. 043532, 2017.
 [14] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, “Context encoders: Feature learning by inpainting,” in CVPR, 2016, pp. 2536–2544.
 [15] C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, and H. Li, “High-resolution image inpainting using multi-scale neural patch synthesis,” in CVPR, 2017, pp. 6721–6729.
 [16] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Generative image inpainting with contextual attention,” in CVPR, 2018, pp. 5505–5514.
 [17] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro, “Image inpainting for irregular holes using partial convolutions,” in ECCV, 2018, pp. 85–100.
 [18] L. A. Gatys, A. S. Ecker, and M. Bethge, “A neural algorithm of artistic style,” arXiv preprint arXiv:1508.06576, 2015.

 [19] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in ECCV, 2016, pp. 694–711.
 [20] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in ICLR, 2014.
 [21] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in NIPS, 2015, pp. 3483–3491.
 [22] O. Ivanov, M. Figurnov, and D. Vetrov, “Variational autoencoder with arbitrary conditioning,” in ICLR, 2019.
 [23] N. Sundaram, T. Brox, and K. Keutzer, “Dense point trajectories by GPU-accelerated large displacement optical flow,” in ECCV, 2010, pp. 438–451.

 [24] A. Zonca, L. Singer, D. Lenz, M. Reinecke, C. Rosset, E. Hivon, and K. Gorski, “healpy: equal area pixelization and spherical harmonics transforms for data on the sphere in Python,” J. Open Source Softw., vol. 4, p. 1298, 2019.
 [25] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf, “Wasserstein auto-encoders,” in ICLR, 2018.