Learning to do multiframe blind deconvolution unsupervisedly

06/02/2020 ∙ by A. Asensio Ramos, et al. ∙ Instituto de Astrofísica de Canarias 2

Observation from ground based telescopes are affected by the presence of the Earth atmosphere, which severely perturbs them. The use of adaptive optics techniques has allowed us to partly beat this limitation. However, image selection or post-facto image reconstruction methods are routinely needed to reach the diffraction limit of telescopes. Deep learning has been recently used to accelerate these image reconstructions. Currently, these deep neural networks are trained with supervision, so that standard deconvolution algorithms need to be applied a-priori to generate the training sets. Our aim is to propose an unsupervised method which can then be trained simply with observations and check it with data from the FastCam instrument. We use a neural model composed of three neural networks that are trained end-to-end by leveraging the linear image formation theory to construct a physically-motivated loss function. The analysis of the trained neural model shows that multiframe blind deconvolution can be trained self-supervisedly, i.e., using only observations. The output of the network are the corrected images and also estimations of the instantaneous wavefronts. The network model is of the order of 1000 times faster than applying standard deconvolution based on optimization. With some work, the model can bed used on real-time at the telescope.



There are no comments yet.


page 4

page 5

page 6

page 7

page 8

Code Repositories


Learning to do multiframe blind deconvolution unsupervisedly

view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Observing any other astronomical object through the Earth atmosphere introduces perturbations that are difficult to correct. The obvious solution of moving to space is not always possible or feasible. Additionally, the largest and more advanced telescopes are always built on the ground, because they usually need technology at the forefront of science.

Active and, especially, adaptive optics (AO), i.e., deformable optics that can compensate for the perturbing effect of the atmosphere almost in real time, have facilitated the use of ground based telescopes. The combination of very fast sensors (that measure the instantaneous wavefront) and deformable mirrors (that correct the wavefront that go to the science cameras) can produce images very close to the diffraction limit of the telescopes, at least in a reduced field-of-view (FOV). These AO systems have been really succesful in the near infrared, where the perturbing effect of the atmosphere is less important. However, AO systems working for visible and near ultraviolet wavelengths are much more demanding. A case of success is the field of solar physics, where such systems are commonly used in telescopes like the Swedish 1-m Solar Telescope (SST) at the Observatorio del Roque de los Muchachos (Spain), the GREGOR telescope on the Observatorio del Teide (Spain) or the Goode Solar Telecope (GST) on the Big Bear Observatory (USA).

Even if AO systems are working properly, some residual wavefront perturbations are still present in the images. These residuals are a consequence of the accumulation of different sources: i) the wavefront sensors (WFS) are not measuring the wavefront perfectly, ii) the deformable mirrors are not correcting properly the wavefront measured by the WFS, iii) there is some delay between the measurement and the actuation, iv) static aberrations in the telescope+instrument optics are not corrected by AO systems and v) classical AO systems with one WFS and one pupil DM only correct close to the optical axis, so the rest of FOV has a much worse correction.

For the previous reasons, reaching the diffraction limit of a telescope is not often possible without a posteriori image correction methods. The simplest techniques of a posteriori correction are based on frame selection, also known as lucky imaging. These methods rely on the fact that the wavefront deformation due to the atmosphere is small at some selected frames when a long burst of short-exposure images is acquired. The fraction of such lucky

frames decreases when the atmospheric turbulence increases. Another problem with this technique lies in its low photon efficiency, because a very large fraction of the frames are discarded. And additional drawback is that it only works properly for small or medium-sized telescopes, with diameters below 2.5 m. In larger telescopes, the probability that low turbulence is found in a significant fraction of the telescope aperture quickly goes to zero. Instruments like FastCam

(Oscoz et al., 2008), that we use in this paper, are fully based on the exploitation of this idea.

More elaborate techniques are based on speckle methods (Labeyrie, 1970; von der Lühe, 1993), phase diversity (Paxman et al., 1992; Löfdahl & Scharmer, 1994a; Löfdahl et al., 1998) or multi-object multi-frame blind deconvolution (MOMFBD; Löfdahl et al., 2002; van Noort et al., 2005). All of them make a much better use of the collected photons, and have been systematically applied for image correction in astronomy in general. Although they are routinely applied to those instruments that produce bursts of quasi-monochromatic images, recent efforts have also extended the range of application to standard slit-based spectrographs (van Noort, 2017). Arguably, the most succesful method of those discussed above is that of MOMFBD, which can produce images at the diffraction limit of the telescope from a rapid burst of short-exposure images once the AO is properly working. The main disadvantage of the MOMFBD method resides on its large computational requirements. Supercomputers working for many hours become necessary to deconvolve the data. With the perception that this might be seen as a showstopper by many researchers, we recently developed an extremely fast multiframe blind deconvolution approach based on supervised deep learning (Asensio Ramos et al., 2018), specifically taylored to solar observations. It is based on a fully convolutional deep neural network that was trained supervisedly with images previously corrected with the help of MOMFBD. Once trained, this method can deconvolve bursts of 1k1k containing 7 short-exposure images in 5 ms with an appropriate Graphical Processing Unit (GPU). This opens up the possibility, for instance, of doing the deconvolution online while analyzing the data.

Although a step forward in terms of speed, the neural approach developed by (Asensio Ramos et al., 2018) has two main issues. The first one is that it is trained with supervision, so one needs to use the MOMFBD algorithm to build the training set. Though not a major obstacle, a method that does not need this previous step would be preferable. The second issue is that the method developed by (Asensio Ramos et al., 2018) only produced the deconvolved images. No estimation of the wavefront in each individual frame was produced. Estimating the wavefronts can be helpful to check the performance of the telescope and instrument, and also to understand the performance of the AO. For this reason, in this work we improve our previous approach by showing how the training can be done in a fully unsupervised manner, while also producing an estimation of the wavefront for each observed frame. Given the lack of supervision, the method can be generally applied to any type of object, once a sufficient amount of training data is available.

2 Unsupervised training

2.1 Image formation

As shown by Löfdahl et al. (2002); van Noort et al. (2005), the deconvolution of a burst of short-exposure images111The exposure time should be small enough to freeze the atmospheric turbulence in each exposure. In normal seeing conditions, an exposure time of a few milliseconds is the target. is possible once the linear physics of image formation is imposed. Let us assume that is the image of the object under study outside the Earth atmosphere. A burst of images taken at times through a linear space invariant instrument (in fact, telescope+instrument) and corrupted with uncorrelated Gaussian noise are acquired. Therefore, the image at time that is sensed at the detector is given by:


where is the convolution operator, is the point spread function (PSF) of the atmosphere at time at which we have an exposure, is the uncorrelated Gaussian noise component and is the spatial coordinate on the image. Note that the object is common to all the images. Any blind deconvolution method then tries to simultaneously recover both and from the burst of images . Note that the index can also be used to refer to simultaneous defocused images, following the prescriptions of phase diversity Paxman et al. (1992). The multiframe blind deconvolution is, naturally, an ill-defined problem, that is solved by imposing certain a-priori knowledge about the object or the PSFs. We follow in this work the approach of Löfdahl et al. (2002) and van Noort et al. (2005), who only impose priors on the PSFs.

The convolution operation in Eq. (1) can be translated into simple multiplications if we transform the equation to the Fourier space:


where the uppercase symbols represent the Fourier transform of the lowercase symbols and

represents Fourier frequencies. The noise is still uncorrelated and Gaussian thanks to the linear character of the Fourier transform.

The space invariant approximation is often violated in normal conditions because of the presence of high-altitude turbulence in the atmosphere, that produces different PSFs for different portions of the field-of-view (FOV), with sizes defined by the anisoplanatic angle. For this reason, when deconvolving extended object, it is customary to solve the deconvolution problems in relatively small patches which are then stitched together to form the final image.

2.2 Description of PSFs

The optical transfer function (OTF) can be written in terms of the generalized pupil function:


In other words, the OTF is the Fourier transform of the PSF which, in turn, is given by the autocorrelation of the generalized pupil function. The generalized pupil function can be written as:


where describes the amplitude modulation of the pupil (the aperture of the telescope, including the primary and secondary and any possible spider) and describes the phase at the pupil (also known as wavefront). A flat wavefront produces an Airy diffraction PSF. The presence of atmospheric turbulence precisely affects this phase, by producing a wavefront that is not flat, which consequently generates a complex PSF. Note that this formalism allows us to take into account a phase diversity channel by writing down the generalized pupil function as:


where is the added diversity, which is usually a defocus.

The prior on the PSF can then easily be imposed by assuming that the wavefront can be written (in radians) as a linear combination on a suitable basis. The Zernike functions (Noll, 1976) are among the most widespread used functions which are orthogonal in the unit circle:


where is the number of functions used in the linear combination, refer to coordinates in the pupil plane and are the -th Zernike coefficient of the -th wavefront. The functions are labeled with , which follows the Noll definition222In fact, Zernike functions are labeled with two integers, and , which were then transformed to a single label by Noll (1976)..

Figure 1: Block diagrams showing the architecture of the network and how it is trained unsupervisedly. The details of each layer are specified in Tab. 1 and in Sec. 2.4.

Although they have nice mathematical properties, Zernike functions are not specially suited for efficiently reproducing wavefronts produced by atmospheric turbulence. The reason is that the covariance matrix of the coefficients of the Zernike modes under Kolmogorov turbulence (also termed Noll covariance matrix) is non-diagonal. Specifically, the matrix elements of the Noll covariance matrix are given by


where is the Gamma function (Abramowitz & Stegun, 1972), and are the diameter of the telescope and Fried radius, respectively. The covariance matrix elements are strictly zero when or when

is odd (unless


As a consequence, it is a better option to use the so-called Karhunen-Loeve modes (e.g., van Noort et al., 2005)

, which are obtained by diagonalizing the covariance matrix. This diagonalization can be carried out using the singular value decomposition, ordering the modes by their eigenvalue.

2.3 Loss function

In a standard multiframe blind deconvolution, the object and the wavefronts are obtained by solving the following problem:



is a vector obtained by concatenating all the coefficients of the wavefront (either using the Zernike or the KL basis functions) at all times

. The loss function is given by


as a consequence of the assumption of uncorrelated Gaussian noise. The term

is an estimation of the inverse variance of the

-th image. The summation is carried out for all images in the burst and for all frequencies in the Fourier plane. This loss function is non-convex in the set of parameters , but one can apply an alternating optimization method to solve it. This scheme iteratively considers the two following sub-problems:


It turns out that the solution to Eq. (12) can be analytically obtained, giving:


where the caret indicates an estimated quantity. This object can then be inserted back in the loss function, so that we end up with a loss function that does not depend on the object, only on the modal coefficients:


In case many objects are observed simultaneously, the total loss function is the result of summing the loss function computed for each one of the objects, while sharing the same wavefront.

Layer Type Kernel size(H,W,C). H and W: kernel horizontal size, C: kernel depth. Stride Input shape(H,W,C). H and W: image horizontal size, C: image number of channels. Output shape(H,W,C). H and W: image horizontal size, C: image number of channels.
Table 1: Architecture of encoder-decoder network. The naming convention for the convolutional blocks is , with referring to the label indicated above each block in Fig. 1 to the layer inside each block.

Eq. (15) defines a loss function that can be optimized with respect to to find the instantaneous wavefront and, consequently, PSF that is affecting each one of the frames of the burst. This is what is done by Löfdahl et al. (2002) and van Noort et al. (2005) for the case of an extended object. Once the wavefronts are computed, the deconvolved image can be easily obtained by using Eq. (14):


where, following Löfdahl et al. (2002) and van Noort et al. (2005), we use a filter in the Fourier plane to efficiently remove the effect of noise. This filter has been described by Löfdahl & Scharmer (1994b) and we compute it as:


setting it to zero for all values below 0.2 and above 1. Finally, we remove all isolated peaks in the filter that are not directly connected to the zero frequency. We also enforce non-negativity by using the operator that sets all negative pixels to zero. This non-negativity constraint is not included in the loss function because Eq. (15) assumes that the estimated object if given by Eq. (14).

Apart from the standard ill-definition of the multiframe blind deconvolution problem, that can be alleviated with prior information, this method is always subject to some fundamental ambiguities that are harder to deal with (Paxman et al., 2019). One of the most critical ones in our approach is the fact that the global tilt cannot be obtained. If the object is shifted by a certain amount, one can always compensate for it with a tip-tilt in the phase of the wavefront so that the image of the object remains stationary. Consequently, any learning method that we use will get confused on the specific amount of tip and tilt to infer from the images. We partially solve this by pre-aligning all images in the bursts so that the object of interest is, on average, centered on the field of view. We do this by computing the sum of all the images in the burst, computing the peak emission and shifting this peak to lie at the center of the image with pixel precision. Additionally, we impose that the average tip and tilt coefficients in the burst is zero during training, so that


2.4 Neural architecture

Instead of directly optimizing the loss function of Eq. (15), we propose to build a deep learning architecture that directly predicts the vector

from the images of the burst. This architecture is broadly made of: a convolutional neural network that extracts features from individual images of size

, a recurrent neural network that takes into account the time evolution of the wavefront, a fully connected neural network that finally predicts the wavefront coefficients for each frame, and a layer that computes the OTFs. Our approach is graphically depicted in Fig.

1 and in the following we describe each component in detail.

Figure 2: NOT pupil with the turbulent phase, measured in radians, produced by a Kolmogorov synthetic atmosphere with a Fried radius of cm.

2.4.1 Convolutional neural network

The aim of the first element of the architecture is to summarize the images and extract all relevant information in a vector, that can be used later for the prediction of the wavefront coefficients. This component is shared among all frames, so it can be applied in parallel for all the inputs images. This neural network is a fully convolutional encoder, whose properties are summarized in Tab. 1

. The first step is a convolutional layer with a relatively broad kernel that generates 16 channels from the input image. Then, a series of standard convolutional blocks made of the consecutive application of batch normalization

(Ioffe & Szegedy, 2015)

, an exponential linear unit activation function

(ELU; Clevert et al., 2015) and a convolutional layer with the kernel size specified in Tab. 1. In order to accelerate convergence, skip connections are added between the initial layer of a block and the last one. A final layer, indicated in orange, uses a kernel of size to produce a vector of size 256 as output.

Object FWHM FWHM/FWM SNR(1) SNR(20) SNR(50) SNR(100) SNR(200)
GJ661 0.097 1.23 1529 2938 3764 4706 5772
-Ori 0.100 1.28 624 1367 1951 2762 4155
GJ569 0.118 1.51 1398 2697 3182 4220 4926
GJ856 0.088 1.12 20 82 112 169 203
M15 0.085 1.08 19 71 95 157 244
Table 2: FWHM

2.4.2 Recurrent neural network

Some degree of time correlation is expected for the wavefronts of all considered frames. The atmospheric turbulence is expected to be not too different from one image to the next, specially if very fast bursts are used. For this reason, it makes sense to use a recurrent structure that keeps memory from one frame to the next. This way, the information contained in one frame is partially used for the estimation of the wavefront in the next and previous frames. One of the most succesful recurrent neural architectures is the Long-Short Term Memory network

(LSTM; Hochreiter & Schmidhuber, 1997), which is able to deal with relatively long sequences. They contain an internal state (cell) that remembers values over long sequences, and three gates (input, output and forget) that are used to control the flow of information into and out of the cell. We choose the cell to be a vector of length 256 in our case. Since there is not obvious time ordering in atmospheric turbulence, we propose to use a bidirectional LSTM, as depicted in Fig. 1. It consists of two LSTM networks that have access to the sequence in opposite directions. The output that goes to the next step of the architecture is the concatenation of the outputs of the two LSTMs.

2.4.3 Fully connected neural network

A final fully connected layer, that is shared for all time steps, produces the wavefront coefficients. The input layer of this network transforms the vector of size 512, produced by the concatenation of the two LSTMs, to a vector of length 256, applies an ELU activation function and a final linear layer produces a final vector of length .

2.4.4 Computation of OTFs

Once the wavefront coefficients are known for all images in the burst, one can use Eq. (6) to compute the phase on the pupil. Then, the generalized pupil function is obtained from Eq. (4) and the OTF from Eq. (3). This, together with the Fourier transforms of the input images, are all the ingredients needed for the computation of the loss function using Eq. (15).

2.5 Training

The training is done by modifying the parameters of the neural networks so that the loss function of Eq. (15) is minimized for a suitable training set. The three components of our architecture have a total number of

2.3 M free parameters. The training is carried out using backpropagation, i.e., computing the derivative of the loss function with respect to the free parameters and using this gradient to modify them. The recurrent neural network needs to be trained using backpropagation in time. To this end, it is unrolled for 100 steps and considered it as a normal fully connected neural network.

Figure 3: Deconvolved images with the classical approach using 100 frames for three different sources.
Figure 4: Deconvolved images with 20, 50, 100 and 200 frames for single and binary stars. A sample of the corresponding original frames are shown in Figs. 5 and 6, and the measured size of the stars is listed on Tab. 2.

For the examples shown here, we choose observations carried out with the FastCam instrument mounted on the Nordic Optical Telescope (NOT) on the Observatorio del Roque de los Muchachos (La Palma, Spain). FastCam is a lucky imaging instrument jointly developed by the Spanish Instituto de Astrofísica de Canarias and the Universidad Politécnica de Cartagena. The instrument uses an Andor iXon DU-897 back-illuminated EMCCD containing a 512x512 pixel frame. The observations were carried out with a standard I Johnson-Bessel filter at an effective wavelength of 824 nm with a with of 175 nm. The pixel size was 0.0303”. The telescope diameter is 2.56 m, with a central obscuration of 0.51 m, giving a diffraction limit of 0.0786”. A typical wavefront is displayed in Fig. 2. The observations were obtained on four consecutive nights on 2007 October 3-6, and they include the following objects: GJ1002, GJ144, GJ205, GJ661, RHY1, RHY44, for a total of several hundred thousand images during the four-days run. Some of them are single stars in the FOV and others contain a pair of stars. The images in the training set have dimensions 128128, with an angular pixel size of 30.3 milliarcsec. The training set consists of 26 bursts of 1000 images each with an exposure time of 30 ms, enough to efficiently freeze the atmospheric turbulence. The images are taken at different times, and they cover reasonably variable seeing conditions. Given the unsupervised character of our approach, the neural network can be easily refined by adding more observations which can cover different seeing conditions.

The training is done by randomly extracting 1000 short bursts of 100 frames (this is the number of unrolled steps of the LSTM recurrent component of our architecture) from the 26 bursts, for a total of 26000 training examples. To facilitate the training, the images are normalized by computing the maximum and minimum in the burst and mapping these values to the interval. Once the wavefront coefficients are computed, this normalization is not needed and the deconvolved image can be reconstructed using the original images.

Figure 5: Original frames of the burst (upper row), estimated PSF (middle row) and for -Ori in the upper panel and GJ661 in the lower panel.
Figure 6: Same as Fig. 5, but for M15 in the upper panel and GJ856 in the lower panel.

The bursts are then subjected to the following augmenting strategy that helps in the generalization capabilities of the architecture. Each burst is randomly rotated 0, 90, 180 or 270 degrees and flipped horizontally or vertically with equal probability. A validation set of 700 bursts is put apart to check for overfitting, using 7 different bursts not used during training.

The neural networks are implemented in PyTorch 1.5

(Paszke et al., 2019), which uses automatic differentiation to compute backpropagation step. We use the Adam optimizer (Kingma & Ba, 2014), with a learning rate of 310

and a batch size of 8, during 50 epochs. We found that the chosen learning rate produces suitable results and it was kept fixed for all experiments. Each epoch takes roughly

90 min per epoch, so the total training time is roughly 3 days on an NVIDIA RTX 2080 GPU.

3 Results

3.1 Deconvolved images

Our results are compared with a standard multiframe blind deconvolution method as baseline. To this end, we minimize the loss function of Eq. (15) using the KL coefficients of the wavefront in each frame as unknowns. We use PyTorch to optimize this loss function using the Adam optimizer with a learning rate of 0.1, which was selected by trial and error. Once the KL coefficients are obtained, the image reconstruction is done following exactly the same scheme as in the neural approach. The average computing time per iteration for the deconvolution of 100 frames is 0.8 s. The typical number of iterations for convergence is around 70, so the deconvolution can be achieved in around one minute. The results for three sources are displayed in Fig. 3

. We only show results for sources with sufficient signal-to-noise ratio (SNR). The deconvolution with the classical approach for sources with reduced SNR per frame turns out to be very difficult or impossible.

Once the neural deconvolution scheme is properly trained, we apply it to several observations to show how it performs. Figure 4 displays the deconvolved results for -Ori in the upper left panel, for GJ661 in the upper right panel, for a region on the M15 globular cluster in the lower left panel and for GJ856 in the lower right panel. We note that, although GJ661 has been used for training, the observations used for the results have been obtained in a different time, with different atmospheric conditions. We display the results of the deconvolved image when 20, 50, 100 and 200 frames are considered. In general, we find that already 20 frames (and in other cases even less than that) is enough to produce a decent deconvolved image. However, when the number of images is not large enough, some diffraction rings appear around the main components. These rings appear even when the sophisticated Wiener filter described above is applied in the Fourier plane. This ringing is more important when the individual frames are more noisy or when the seeing conditions are worse. The main effect of adding more frames lies in the reduction of the surrounding rings and the increase in the compactness of the stellar images. We compute the FWHM of the stars by fitting an ellipsoidal Gaussian function to the star. The ensuing ellipsoidal quadratic mean FWHM of the stars when using 100 frames are tabulated in Tab. 2. They turn out to be a factor in the range [1,1.5] of the diffraction limit for 800 nm at the NOT telescope. This table also displays the evolution of the SNR, showing a monotonic increase with the number of frames considered. However, the increase in SNR is slower than the expected one for pure Poisson noise, something that is common in any deconvolution method.

Figure 7: Time evolution of the first 16 coefficients of the phase in the KL basis in radians for the M15 observation.

A comparison of the results of Figs. 3 and 4 shows that the deconvolved images are practically the same when using 100 frames for the observations with large SNR. The final FWHM of the stars is similar between both calculations with a difference that always lies below 10%. One advantage of the neural approach is that it can be applied seamlessly to any observation irrespectively of the SNR of the frames and it provides a very good result. Perhaps the largest difference lies in the fact that the computing time for a single deconvolution of 100 frames is 0.1 s, close to a factor 1000 faster. This time includes the input/output time to/from the GPU and contains some overheads that can be easily avoided. Additionally, thanks to the inherent parallelization in GPUs, the time per deconvolution can be reduced if several stars are deconvolved concurrently. The only limitation is the amount of memory on the GPU.

3.2 PSFs

It it obvious from the deconvolved images that the individual wavefronts that we estimate have to agree to some degree with the real ones. This comparison is shown in Figs. 5 and 6. The upper row of each panel shows six raw frames of the burst. We can immediately verify that the seeing conditions and signal-to-noise ratios (S/N) are different in all the examples we consider. For instance, the spread of the images in -Ori is much larger than that of GJ661. Since the pixel size is the same in both observations, this means that the KL coefficients aof the turbulence are higher for the former observation.

The second row of the panels displays the instantaneous PSF estimated by the neural approach. Finally, as a consistency check, we re-convolve the deconvolved image with the estimated PSF. The results of this operation should then be similar to the observed frame, apart from the obvious noise damping as a consequence of the much cleaner deconvolved image. The images clearly show that, in general, we are capturing the shape of the PSF correctly. One can see minute details of the image that are reproduced with great fidelity in the re-convolved image. Perhaps one can argue that, in cases of very bad seeing with complex PSFs like the case of -Ori, the re-convolved object is slightly more diffuse than the original one. However, the neural approach captures enough details of the PSF so that the ensuing deconvolved image is of very high quality.

Figure 8: Flux ratio between the two stars in the FOV in GJ856 (left panel) and M15 (right panel). We only show 150 frames for M15 because there were some artifacts on the latest 50 frames of this specific observation.

As an example, the inferred wavefront coefficients of the first KL modes for M15 are shown in Fig. 7. The tip-tilt coefficients (first and second mode) are relatively small because of the pre-alignment that we carry out. Additionally, their average values is zero because of the contraint of Eq. (18). However, the residual subpixel image motion is still the main contributor to the wavefront. The recurrent structure in our neural architecture is able to exploit the time correlation that is present in the wavefront coefficients. This effectively couples together all the frames so that information from one timestep can be helpful on the following frame. Note that each frame has 30 ms exposure time, but the overhead due to readout is 56%. Therefore, the total elapsed time for 100 frames is roughly 4.7 s.

3.3 Photometry

Deconvolution should maintain the photometric properties across the field of view. We checked that this is indeed the case by computing the ratio between the fluxes of the two stars in the GJ856 and M15 observations. Figure 8 shows a crude estimation of the flux ratio from the original frames in blue by summing up all the light on boxes around the two stars. These boxes are chosen so that the long-exposure spots fully fall into the boxes. Also in blue as a horizontal line we show the average flux ratio. This very same values for the deconvolved images when a different amount of frames are considered are displayed in different colors.

4 Conclusions

We have presented a general scheme to train a neural multiframe blind deconvolution architecture without the use of supervision. The method makes use only of observed images, together with information about the telescope entrance pupil, the angular pixel size in the camera and the wavelength of the observations. We have shown, with examples obtained from the NOT, that the neural deconvolution generalizes correctly to unseen images. The method also provides as output the instantaneous wavefront produced by the atmospheric turbulence. We have also checked that the results improve monotonically when the number of frames is increased. It can seamlessly deal with an arbitrary number of frames during the deconvolutio. The method is also photometrically stable and very fast if compared with standard iterative blind deconvolution methods. The code for training or evaluation, with the parameters of the networks, is freely available444https://github.com/aasensio/unsupervisedMFBD.

The training is limited to spatially invariant PSFs. Therefore, when a spatially variant PSF is expected in the FOV, one should follow the standard approach of dividing the image in different anisoplanatic patches and applying this tool in each one of them. Finally, the patches need to be stitched together to form the final image. We note that this strategy is used regularly in solar observations with great success.

There are several possible avenues of improving on this work. The first one is to train an architecture that can blindly deconvolve images from a variety of telescopes and/or wavelengths. Observations of these telescopes and/or wavelengths are needed for the training, though. The formalism remains the same except on the construction of the OTF from the generalized pupil. In this case, one needs to take into account the specific aperture of the telescope and the influence of the wavelength on the diffraction limit of the telescope. Apart from that, we anticipate that conditioning the entrance of the LSTM feature vector with the telescope properties and the wavelength should be enough. This can be easily done by concatenating this information on the input vector.

The second potential improvement is to add more training examples that have a larger variety of objects, from point-like to extended ones. We plan for the future to apply the unsupervised training for the blind deconvolution of solar images. This is supported by the fact that our trained architecture is quite robust to the use of only a few training examples.

Another constraint of our approach is that the input images are currently limited to be of a fixed size of 128128. This is a consequence of the presence of the fully connected LSTM and FC networks. This can be potentially solved by transforming our architecture into a fully convolutional one. This can be achieved by transforming the LSTM into a ConvLSTM, its convolutional counterpart. The FC network can then be transformed into a fully convolutional network. All networks can be trained with images of a certain size and, once trained, can be applied to images of any other size. For instance, if the input images are of size 128128, the input to the LSTM will have size 1616, so that at the output we would predict the wavefront in 88 patches. For computing the loss function one would need a way to deal with this spatially variant PSFs. One option would be to compute the loss function locally in each patch and adding them together.

Finally, recurrent neural networks have been overcome in recent years by the use of more robusts approaches. We plan to study the application of Transformers (Vaswani et al., 2017) based on the idea of neural attention to this problem, which can better exploit the time information of the observations.

I thank Álex Oscoz, Roberto López and Jorge Andrés Prieto for providing the FastCam datasets and suggesting improvements to the initial draft of the paper. I thank Michiel van Noort for insisting on the interest of inferring wavefronts in addition to the deconvolved image. This study was discussed in the workshop Studying magnetic-field-regulated heating in the solar chromosphere

(team 399) at the International Space Science Institute (ISSI) in Switzerland. This paper is based on observations made with the Nordic Optical Telescope operated by the Nordic Optical Telescope Scientific Association in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias. We are very grateful to the ING staff and the IAC Support Astronomers Group for their efforts. We acknowledge financial support from the Spanish Ministerio de Ciencia, Innovación y Universidades through project PGC2018-102108-B-I00 and FEDER funds. This research has made use of NASA’s Astrophysics Data System Bibliographic Services. We acknowledge the community effort devoted to the development of the following open-source packages that were used in this work:

numpy (numpy.org), matplotlib (matplotlib.org), and PyTorch (pytorch.org).


  • Abramowitz & Stegun (1972) Abramowitz, M. & Stegun, I. A. 1972, Handbook of Mathematical Functions (New York: Dover)
  • Asensio Ramos et al. (2018) Asensio Ramos, A., de la Cruz Rodríguez, J., & Pastor Yabar, A. 2018, A&A, 620, A73
  • Clevert et al. (2015) Clevert, D.-A., Unterthiner, T., & Hochreiter, S. 2015, Under Review of ICLR2016 (1997)
  • Hochreiter & Schmidhuber (1997) Hochreiter, S. & Schmidhuber, J. 1997, Neural Computation, 9, 1735
  • Ioffe & Szegedy (2015)

    Ioffe, S. & Szegedy, C. 2015, in Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, 448–456

  • Kingma & Ba (2014) Kingma, D. P. & Ba, J. 2014, CoRR, abs/1412.6980
  • Labeyrie (1970) Labeyrie, A. 1970, A&A, 6, 85
  • Löfdahl et al. (1998) Löfdahl, M. G., Berger, T. E., Shine, R. S., & Title, A. M. 1998, ApJ, 495, 965
  • Löfdahl et al. (2002) Löfdahl, M. G., Bones, P. J., Fiddy, M. A., & Millane, R. P. 2002, in Image Reconstruction from Incomplete Data, Vol. 4792, 146–155
  • Löfdahl & Scharmer (1994a) Löfdahl, M. G. & Scharmer, G. B. 1994a, A&As, 107, 243
  • Löfdahl & Scharmer (1994b) Löfdahl, M. G. & Scharmer, G. B. 1994b, A&As, 107, 243
  • Noll (1976) Noll, R. J. 1976, Journal of the Optical Society of America, 66, 207
  • Oscoz et al. (2008) Oscoz, A., Rebolo, R., López, R., et al. 2008, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 7014, Proc. SPIE, 701447
  • Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., et al. 2019, in Advances in Neural Information Processing Systems 32, ed. H. Wallach, H. Larochelle, A. Beygelzimer, F. d’é Buc, E. Fox, & R. Garnett (Curran Associates, Inc.), 8024–8035
  • Paxman et al. (2019) Paxman, R. G., Carrara, D. A., Miller, J. J., et al. 2019, in Unconventional and Indirect Imaging, Image Reconstruction, and Wavefront Sensing 2019, ed. J. J. Dolne, M. F. Spencer, & M. E. Testorf, Vol. 11135, International Society for Optics and Photonics (SPIE), 106 – 116
  • Paxman et al. (1992) Paxman, R. G., Schulz, T. J., & Fienup, J. R. 1992, Journal of the Optical Society of America A, 9, 1072
  • van Noort (2017) van Noort, M. 2017, A&A, 608, A76
  • van Noort et al. (2005) van Noort, M., Rouppe van der Voort, L., & Löfdahl, M. G. 2005, Sol. Phys., 228, 191
  • Vaswani et al. (2017) Vaswani, A., Shazeer, N., Parmar, N., et al. 2017, in Advances in Neural Information Processing Systems 30, ed. I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Curran Associates, Inc.), 5998–6008
  • von der Lühe (1993) von der Lühe, O. 1993, A&A, 268, 374