AutoPhaseNN: Unsupervised Physics-aware Deep Learning of 3D Nanoscale Coherent Imaging

by Yudong Yao, et al.
Argonne National Laboratory

The problem of phase retrieval, or the algorithmic recovery of lost phase information from measured intensity alone, underlies various imaging methods from astronomy to nanoscale imaging. Traditional methods of phase retrieval are iterative in nature, and are therefore computationally expensive and time-consuming. More recently, deep learning (DL) models have been developed to either provide learned priors to iterative phase retrieval or in some cases completely replace phase retrieval with networks that learn to recover the lost phase information from measured intensity alone. However, such models require vast amounts of labeled data, which can only be obtained through simulation or by performing computationally prohibitive phase retrieval on hundreds or even thousands of experimental datasets. Using a 3D nanoscale X-ray imaging modality (Bragg Coherent Diffraction Imaging or BCDI) as a representative technique, we demonstrate AutoPhaseNN, a DL-based approach which learns to solve the phase problem without labeled data. By incorporating the physics of the imaging technique into the DL model during training, AutoPhaseNN learns to invert 3D BCDI data from reciprocal space to real space in a single shot without ever being shown real space images. Once trained, AutoPhaseNN is about one hundred times faster than traditional iterative phase retrieval methods while providing comparable image quality.





The problem of phase retrieval is central to many imaging techniques including X-ray Bragg coherent diffraction imaging (BCDI) and ptychography[1], electron ptychography[2], Lorentz transmission electron microscopy (LTEM)[3], super-resolution optical imaging[4], and astronomy[5]. Phase retrieval is the algorithmic process of recovering phases from measured scattered intensities alone. In BCDI for example, a nanocrystalline sample is illuminated with a coherent X-ray beam from a synchrotron source or an X-ray Free Electron Laser (XFEL) and the scattered intensities are measured in the far-field at a Bragg peak. The measured intensities give the squared modulus of the complex Fourier Transform (FT) of the object, but the phase of the wave is lost. Hence, the 3D image of the object cannot be recovered from a simple inverse FT and we must resort to phase retrieval algorithms that recover this lost phase information in order to recover an image of the object. Additionally, when measured at a Bragg peak, the phase is influenced by the local strain within the crystal. Consequently, in addition to being a fundamental requirement for recovering the object’s 3D structure, phase recovery also provides a 3D map of the strain state within the crystal, encoded as the phase of the complex image. This capability of BCDI to provide nanoscale structural information as well as picometer sensitivity to strain has had profound implications for the materials science, chemistry and solid-state physics communities. Examples include defect dynamics in battery electrodes[6], in-situ catalysis[7, 8], photon transport[9, 10, 11], phase transformation[12, 13, 14], and plastic deformation[15, 16, 17].

More broadly, while coherent imaging techniques have grown to become an integral part of electron and X-ray materials characterization[2, 3, 18], their dependence on iterative phase retrieval to recover sample images prevents real-time feedback, which is particularly crippling for in-situ and operando experiments. Iterative phase retrieval typically requires thousands of iterations and often multiple starts to arrive at a robust solution, often taking longer than a single dataset acquisition time.

Neural network (NN) models have been developed to rapidly solve inverse problems across a variety of disciplines including magnetic resonance imaging (MRI)[19], image denoising[20, 21], and super-resolution[22, 23, 24]. Specific to the problem of phase retrieval, neural networks have been trained to retrieve phases in holographic imaging[25], lensless computational imaging[26], X-ray ptychography[27, 28, 29], Fourier ptychography[30] and BCDI[31, 32, 33, 34, 35]. Each of the models in the aforementioned papers has been trained in a supervised manner; that is, training of the network uses pairs or triplets of images: the experimentally acquired (or forward simulated) data and the unknown sample’s complex image (typically amplitude and phase). NN training is then achieved by optimizing the weights of the network so that the network learns to output the sample image for given measured data (the input diffraction pattern in the BCDI case). While this approach has been shown to provide speed, as well as reduced reciprocal space sampling requirements, in comparison to iterative phase retrieval, it is limited by the need for a large volume of simulated data with corresponding images or untenable quantities of experimental data that have been inverted through traditional iterative phase retrieval. Both types of training data suffer from limitations. Simulated data is very often a poor substitute for real data: it is hard to generate training data that is sufficiently diverse or representative of the experimental data, and simulated data is usually free of experimental artefacts. On the other hand, performing phase retrieval on experimental data before training is a computationally and manually intensive task due to the number of hyperparameters that need to be adjusted for successful convergence. Recently, a new type of model, PhaseGAN, which is a generative adversarial network, has been trained to perform phase retrieval without pairing of inputs and outputs[36]. But this approach also requires the availability of ground truth data; the subtle difference is that the inputs to the network and the desired outputs do not need to be paired. Another recent study included a physics-informed neural network (PINN) which was trained in an unsupervised manner to recover the lost phase information from XFEL pulses. However, the authors found the PINN not as accurate as the network trained with supervised learning[37].


In this work, we demonstrate the application of AutoPhaseNN, a physics-aware unsupervised deep convolutional neural network (CNN) that learns to solve the phase problem without ever being shown real space images of the sample amplitude or phase. By incorporating the physics of the X-ray scattering into the network design and training, AutoPhaseNN learns to predict both the amplitude and phase of the sample given the measured diffraction intensity alone. Additionally, unlike previous deep learning models, AutoPhaseNN does not need the sample’s amplitude and phase at any point, either in training or in deployment. Once trained, AutoPhaseNN is about 100 times faster than iterative phase retrieval with comparable image quality. Furthermore, we show that by using AutoPhaseNN’s prediction as the learned prior to iterative phase retrieval, we can achieve consistently higher image quality than with the neural network prediction alone, while remaining about 10 times faster than iterative phase retrieval alone. We demonstrate the application of AutoPhaseNN to 3D BCDI, motivated by the importance of the technique to materials characterization, especially in the light of upgrades to synchrotron sources and XFELs occurring world-wide. These new light sources are revolutionizing both the spatial and temporal resolution of the technique, although this revolution comes in the form of data volumes that will be completely untenable for iterative phase retrieval methods. However, we note that our approach is broadly applicable to any phase retrieval problem (or more generally to inverse problems) where the forward model is accurately known.



The AutoPhaseNN framework is built by combining a conventional 3D CNN with the physical model of X-ray scattering. The 3D CNN is trained to learn the inverse process of the physical model with only simulated and/or measured diffraction patterns without needing real space images. Once trained, the physical model can be discarded and only the 3D CNN model is used to provide the inversion from 3D diffraction intensities to real space images. In this section, we briefly describe the neural network architecture used in the training stage. Subsequently, we demonstrate the efficacy of the trained model and the subsequent refinement process using both simulated and experimental data.

Unsupervised deep learning model

The architecture of the unsupervised physics-aware deep learning model (AutoPhaseNN) is depicted in Fig. 1A. The model is based on a 3D CNN framework with an encoder-decoder architecture that learns a direct inversion from the diffraction intensity to the real space image amplitude and phase. The encoder takes the magnitude of the 3D diffraction pattern as its input. Then, the encoded data is passed through two separate decoders to generate the amplitude and phase images in real space. The input is connected to the output using convolution blocks, max pooling, upsampling and zero padding layers. Each convolution block is composed of two 3 × 3 × 3 convolution layers, the leaky rectified linear unit (LRLU) activation function and batch normalization (BN). This portion of the network (represented within the 3D CNN block in Fig. 1A) is similar to previously published architectures for 2D and 3D BCDI inversion[31, 32, 27, 28, 34]. In addition to the 3D CNN portion, we include the X-ray scattering model in the network architecture, including the numerical modeling of diffraction and custom layers to enforce implicit priors (Fig. 1B). The outputs of the 3D CNN, the amplitude and phase images, are combined to form a complex image, which is input to the X-ray scattering model to obtain the estimated reciprocal space intensity. Further details of the network architecture along with the X-ray scattering model are included in Materials and Methods.

Loss for each training example is then computed as the mean absolute error (MAE) between the estimated and measured diffraction intensities, averaged over the size of the output image. We note that the reciprocal space intensity is also the input to the network (Fig. 1A), and hence at no stage do we directly use the target outputs (i.e., sample amplitude and phase) to train the network. Simulated datasets generated from a physics-informed data preparation pipeline using atomistic structures[34] and a small number of experimental datasets acquired from BCDI experiments were used for network training and testing (see Materials and Methods for details of the training and testing data). Once the training is complete, we discard the X-ray scattering model and keep only the 3D CNN portion of the network, which has now learnt to invert the intensity from coherent X-ray diffraction to the sample image amplitude and phase in a single step.
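The label-free nature of this loss can be illustrated with a minimal NumPy sketch. It assumes the MAE is taken voxel by voxel between the measured and estimated diffraction magnitudes; the function name `mae_loss` is hypothetical, and in actual training the same quantity would be evaluated with differentiable operations inside the deep learning framework.

```python
import numpy as np

def mae_loss(measured_mag, estimated_mag):
    """Mean absolute error between two 3D diffraction volumes.

    Assumption: both inputs are magnitudes sampled on the same grid.
    No real space amplitude or phase appears anywhere in this loss.
    """
    measured_mag = np.asarray(measured_mag, dtype=float)
    estimated_mag = np.asarray(estimated_mag, dtype=float)
    if measured_mag.shape != estimated_mag.shape:
        raise ValueError("volumes must have the same shape")
    # Average of |estimate - measurement| over every voxel of the volume.
    return float(np.mean(np.abs(estimated_mag - measured_mag)))
```

Because the only reference quantity is the measured diffraction data, the same function can score predictions on experimental patterns for which no ground-truth image exists.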

Performance on simulated 3D CDI data

Figure 2 shows examples of AutoPhaseNN’s performance on simulated data. We tested the trained AutoPhaseNN model on 2k 3D diffraction patterns that were never shown to the network during training. To evaluate the quality of the prediction, we calculated the error for the modulus of the diffraction pattern in reciprocal space, together with the structural similarity (SSIM) for the amplitude of the real space image. Figure 2 (A and B) show the distributions of the error and SSIM, indicating that the network gives excellent performance in predicting the amplitude and phase of the 3D image from the input diffraction intensity. Four representative results are displayed in Fig. 2 (C-G). The predicted 3D images (Fig. 2F) show high agreement with the corresponding ground truth images (Fig. 2D). Even for crystal 4, which has the largest computed error under the reciprocal space error metric, the network still correctly predicts the shape and phase distribution of the image.

Refinement on deep learning prediction

For comparison, we performed conventional iterative phase retrieval on the test datasets. The computed errors for the network prediction in Fig. 2F are relatively large compared to the phase retrieval results in Fig. 2E. To reduce the error of the prediction from AutoPhaseNN, a refinement procedure was conducted on the obtained real space 3D image. We performed iterative phase retrieval using the output of the AutoPhaseNN model (amplitude and phase of the 3D image) as the learned prior. In Fig. 2G, we demonstrate that the reconstruction error can be further reduced with only 50 iterations of the refinement process, reaching a comparable or even lower error than the phase retrieval results (which do not achieve the same level of error even after 600 iterations).

Demonstration on experimental 3D BCDI data

The performance of the trained AutoPhaseNN model was also evaluated on experimentally acquired data. 3D BCDI data was measured from sub-micrometer size gold crystals grown by high temperature dewetting of a thin gold film deposited on a silicon substrate, with the native oxide intact. The crystals form with a predominantly (111) crystallographic orientation normal to the substrate surface. This leads to a favored texture of (111) Bragg peaks with a random orientation about the surface normal. We selected crystals with diffraction signals that were sufficiently isolated from the predominant (111) texture and measured three-dimensional coherent diffraction patterns in the vicinity of those individual (111) Bragg peaks. Further experimental details are described in Materials and Methods.

Following a standard pre-processing step (see details in Materials and Methods), we applied AutoPhaseNN to the experimental data. We also performed iterative phase retrieval with the same parameters as before for comparison to AutoPhaseNN’s predictions. The results are shown in Fig. 3 (A and B). AutoPhaseNN provides a predicted shape and strain field of the crystal very close to the phase retrieval results with 100 times speed up, taking about 200 ms for each prediction on a CPU processor, in contrast to phase retrieval that took about 28 s for 600 iterations.

As before, we also used AutoPhaseNN’s prediction as a learned prior to iterative phase retrieval. The recovered images after just 50 iterations of phase retrieval starting from AutoPhaseNN’s prediction are shown in Fig. 3C. From the visual comparison and the error, the refinement produces improved details in shape and phase and gives greater similarity to the phase retrieval images. Additionally, the combination of AutoPhaseNN prediction and the refinement procedure is still about 10 times faster than the iterative phase retrieval alone. Figure 4 shows the reconstruction error as a function of the iteration for the five representative results shown in Fig. 3. The dashed lines represent errors for iterative phase retrieval alone and the solid lines are errors for the refinement iterations using the predicted prior, showing the refinement procedure starting with a lower reconstruction error and converging faster compared with the conventional phase retrieval since its initial image is already very close to the true solution.


We have demonstrated the application of the developed unsupervised physics-aware deep learning model, AutoPhaseNN, for direct inversion of the 3D BCDI data from the far-field measurement to the real space image. By incorporating the X-ray scattering model into the network architecture, AutoPhaseNN is trained with only measured diffraction patterns without needing real space images. Once trained, only the 3D CNN model is used to provide 3D images in the real space from the measured diffraction intensities in the reciprocal space. The AutoPhaseNN prediction is 100 times faster when compared to traditional iterative phase retrieval (600 iterations), which will benefit real-time CDI experiments. Combined with the refinement process (50 iterations of error reduction), the final result is as good as the traditional phase retrieval while being 10 times faster. We believe the AutoPhaseNN model will revolutionize the broad application of neural networks to phase retrieval problems, including CDI for which AutoPhaseNN was designed and other inverse problems in which the forward model is known.

Currently, the AutoPhaseNN network is trained offline, mostly with simulated data and a limited amount of experimental data. We expect further improvement of the network performance with a larger and more diverse training dataset, for example one covering different reciprocal space oversampling ratios and crystals of different lattice types. More importantly, with its ability to train without real space images, the performance of the AutoPhaseNN model can be continuously improved with data from new coherent imaging experiments. In addition, an idealized X-ray scattering model is employed in the current network training process. Further improvements can be made by optimizing the physical model of the coherent imaging, for example by including the partial coherence correction[38] or dynamical scattering effects[39].

We believe this unsupervised deep learning solution for data inversion will be critical to the coherent imaging technique, especially in the light of the present and future fourth generation synchrotron sources, including the Advanced Photon Source Upgrade (APS-U) and the Extremely Brilliant Source at the European Synchrotron Radiation Facility (ESRF-EBS). These upgrades will provide two or three orders of magnitude increased coherent X-ray flux. With the subsequent vast increase in coherent imaging data, traditional iterative phase retrieval methods will not be able to keep up. DL methods have been shown to produce high-fidelity images on a sub-second timescale, which can provide real-time feedback for the experiment. Additionally, unsupervised training can potentially enable online real-time improvements to the neural network with minimal human intervention, in effect creating a continuously self-improving NN capable of keeping up with the vast data rate of next generation light sources and electron microscopes. With further development, the unsupervised deep learning approach demonstrated by AutoPhaseNN will be vital to coherent imaging experiments both now and in the future.

Materials and Methods

Training data generation

Every crystal in the training dataset is prepared from a nanometer-scale cube of face-centered cubic (fcc) gold lattice. To create diverse shapes and sizes, each crystal has facets generated by clipping along randomly chosen high-symmetry crystal planes that are positioned at random distances from the geometric center of the crystal. To create realistic local strains within the crystal, a combination of compression, tension, and shear stresses (up to 1% strain) is applied to the initial atomistic crystal structure. Subsequently, the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is used to relax the strained structure via energy minimization performed using the embedded-atom method (EAM) interatomic potential for gold. After minimization, the lattice constants of the initial and final structures are normalized to 1 (i.e., scaled by the inverse lattice constant of gold, 1/4.078 Å), and a padding of 5 lattice units (about 20 Å before lattice normalization) is added to each side of the normalized structures to avoid any potential boundary related artifacts. Finally, the output object is created from these structures, which corresponds to a complex crystal density composed of the crystal shape (i.e., number density of atoms) and its local strain (i.e., lattice phases). The number density of atoms and lattice phases are computed using a uniform grid with a bin size of 2 lattice units (about 8 Å before lattice normalization). The number density values are normalized by the maximum density value, whereas the lattice phases are computed from the atom displacement field projected along [111] and are scaled by 2. This binning process converts the atomistic crystal model to a gridded complex object. The complex object array is then zero padded to twice its size. The 3D diffraction pattern is generated via the FT of the complex object, and Poisson distributed noise is added. The magnitude of the diffraction pattern is the input for the network.
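The last three steps of this pipeline (zero padding, Fourier transform, Poisson noise) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name and the `photons_at_peak` scaling used to set the noise level are hypothetical, and the atomistic simulation and binning steps are taken as already done.

```python
import numpy as np

def simulate_diffraction_magnitude(amplitude, phase, photons_at_peak=1e4, seed=0):
    """Turn a gridded complex object into a noisy diffraction magnitude.

    amplitude, phase : real 3D arrays describing the object.
    photons_at_peak  : assumed photon count at the brightest voxel
                       (hypothetical knob controlling the noise level).
    """
    obj = amplitude * np.exp(1j * phase)
    # Zero pad symmetrically so every axis ends up twice as long.
    pad = [(s // 2, s - s // 2) for s in obj.shape]
    padded = np.pad(obj, pad)
    # Far-field diffraction: centered 3D FFT of the padded object.
    field = np.fft.fftshift(np.fft.fftn(np.fft.ifftshift(padded)))
    intensity = np.abs(field) ** 2
    intensity *= photons_at_peak / intensity.max()
    # Photon counting statistics: Poisson noise on the intensity.
    rng = np.random.default_rng(seed)
    noisy = rng.poisson(intensity).astype(float)
    # The network input is the magnitude, i.e. the square root of the counts.
    return np.sqrt(noisy)
```

Padding to twice the object size in real space is what gives the simulated pattern an oversampling ratio of at least 2 along each axis.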

AutoPhaseNN architecture

Figure 1B describes the detailed X-ray scattering forward model, including the numerical modeling of diffraction and image shape constraints. The predicted amplitude and phase are combined to generate the complex image. An image shape support function is obtained by thresholding the current predicted amplitude using a contour at the 10% intensity level. This image shape support is continuously updated from the predicted amplitude and is used only once per image per epoch during the network training. The aforementioned square support enforced by the zero padding layers and this image shape support impose additional constraints during the unsupervised training process. The estimated diffraction pattern is obtained from the FT of the current estimate of the real space image. The network weights and biases are optimized with an objective function that minimizes the loss between the input diffraction pattern and the estimated diffraction pattern. Once a trained model has been obtained, only the 3D CNN model is needed to recover the amplitude and phase information of the measured sample.
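A minimal NumPy sketch of this forward model is given below. It assumes the support is the set of voxels whose predicted amplitude is at least 10% of the maximum amplitude, and it uses a centered FFT as the idealized scattering model; the function name and the `support_threshold` parameter are hypothetical. A trainable version would use the framework's differentiable FFT so that gradients can flow back into the 3D CNN.

```python
import numpy as np

def forward_model(amplitude, phase, support_threshold=0.1):
    """Estimate the diffraction magnitude from a predicted real space image.

    Returns the estimated magnitude and the boolean shape support.
    Assumption: the support keeps voxels at or above
    support_threshold * max(amplitude).
    """
    amplitude = np.asarray(amplitude, dtype=float)
    # Image shape support obtained by thresholding the predicted amplitude.
    support = amplitude >= support_threshold * amplitude.max()
    # Combine amplitude and phase into a complex image, zeroed outside the support.
    obj = np.where(support, amplitude * np.exp(1j * phase), 0.0)
    # Idealized far-field scattering: centered 3D Fourier transform.
    field = np.fft.fftshift(np.fft.fftn(np.fft.ifftshift(obj)))
    return np.abs(field), support
```

The returned magnitude is what would be compared against the measured input pattern in the training loss.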

AutoPhaseNN training

The AutoPhaseNN model was implemented with the Keras package running on the TensorFlow backend. The training dataset is a combination of simulated diffraction patterns and experimental diffraction patterns. The generated simulated data consists of 54,028 3D diffraction patterns, of which 52k are used for training and 2028 are reserved for testing. A rotation data augmentation was performed on the 52k simulated diffraction patterns to generate 104k data for training. For the experimental data, a data augmentation process consisting of rotation, resampling and flipping was performed on 10 diffraction patterns acquired from BCDI experiments to generate 3200 experimental training data. The entire training dataset was randomly split, with 80% used for training and the remaining 20% reserved for validation. When training the model, adaptive moment estimation (ADAM) with an initial learning rate of 0.001 is used to update the weights and biases of the model. At the end of each epoch, the performance of the network was evaluated using the validation data. The ReduceLROnPlateau callback was used to reduce the learning rate by a factor of 2 when the validation loss stopped improving. The network was trained in parallel on 8 NVIDIA A100 GPUs for 70 epochs. The training took about 12 h with a mini-batch size of 16 on each GPU.
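The learning-rate schedule described above can be illustrated with a simplified, framework-free sketch. It is not the Keras implementation: the `patience` value is an assumption (it is not stated here), and options such as cooldown or a minimum improvement are ignored. In Keras, a halving schedule corresponds to `ReduceLROnPlateau` with `factor=0.5`.

```python
def reduce_lr_on_plateau(val_losses, initial_lr=1e-3, factor=0.5, patience=2):
    """Return the learning rate in effect during each epoch.

    val_losses : validation loss measured at the end of each epoch.
    factor     : multiplier applied on a plateau (0.5 halves the rate).
    patience   : assumed number of non-improving epochs tolerated.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)  # rate used while training this epoch
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor  # plateau detected: reduce by a factor of 2
                wait = 0
    return history
```

For example, with validation losses `[1.0, 0.9, 0.95, 0.96, 0.8]` and a patience of 2, the first four epochs run at 0.001 and the fifth at 0.0005.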

3D CDI data acquisition

The standard BCDI scan was a one hundred point rocking curve of ±0.5 degrees about the omega axis of the six-circle diffractometer[41]. All experimental data were acquired at the 34-ID-C beamline of the Advanced Photon Source at Argonne National Laboratory. The X-ray beam was set to 9 keV and focused using Kirkpatrick-Baez mirrors to approximately 700 × 700 nm². A Medipix2 (Timepix) detector was positioned at either 500 mm or 750 mm from the sample depending on the size of the crystal. The detector distance was determined by the need to oversample the diffraction intensities with the 55 µm pixel size of the Timepix detector.

Pre-processing experimental data

AutoPhaseNN was mostly trained on simulated data with a reciprocal space oversampling ratio of around 3.0. We therefore prepared the input data by downsampling the 3D coherent diffraction pattern of the gold crystal acquired from BCDI experiments to yield an oversampling ratio of about 3.0. The input data was then obtained by cropping the downsampled 3D diffraction pattern to the network input size.
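One simple way to carry out such a pre-processing step is integer binning followed by a center crop, sketched below. This is an assumption made for illustration, since the resampling method is not specified here; the function name and arguments are hypothetical, and the pattern is assumed to be already centered on the Bragg peak. Binning by a factor b along an axis divides the oversampling ratio along that axis by b (for example, 6.0 becomes 3.0 for b = 2).

```python
import numpy as np

def bin_and_center_crop(pattern, bin_factor, out_size):
    """Downsample a 3D diffraction pattern by block summation, then crop.

    bin_factor : integer number of voxels merged along each axis.
    out_size   : edge length of the cubic output volume.
    """
    pattern = np.asarray(pattern, dtype=float)
    # Trim trailing voxels so each axis is divisible by bin_factor.
    trimmed = [s - s % bin_factor for s in pattern.shape]
    p = pattern[:trimmed[0], :trimmed[1], :trimmed[2]]
    n0, n1, n2 = (s // bin_factor for s in trimmed)
    # Sum intensity inside each bin_factor^3 block.
    binned = p.reshape(n0, bin_factor, n1, bin_factor, n2, bin_factor).sum(axis=(1, 3, 5))
    # Crop a cube of edge out_size around the array center.
    starts = [(s - out_size) // 2 for s in binned.shape]
    if min(starts) < 0:
        raise ValueError("out_size is larger than the binned pattern")
    window = tuple(slice(st, st + out_size) for st in starts)
    return binned[window]
```

Summing rather than averaging within each block preserves the total photon count, which keeps the noise statistics of the binned data Poisson-like.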

Phase retrieval method

For conventional iterative phase retrieval, the diffraction pattern was input to iterative phase retrieval, where the algorithm was switched between error reduction (ER) and hybrid input-output (HIO). 600 iterations were performed in total using a shrink-wrap support in real space.

For the refinement procedure, the predicted amplitude and phase obtained from the AutoPhaseNN model were provided as the initial image guess, and the initial support was generated by thresholding the predicted amplitude. Then 50 iterations of the ER algorithm were performed to generate the final refined result.
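The two update rules and the refinement loop can be sketched in NumPy as follows. This is a textbook-style illustration, not the code used for the results above: the function names, the HIO feedback parameter `beta`, and the 10% support threshold are assumptions, the ER/HIO switching schedule and the shrink-wrap support update are omitted, and `measured_mag` is assumed to be stored in the unshifted FFT layout.

```python
import numpy as np

def project_modulus(obj, measured_mag):
    """Keep the Fourier phase of obj but impose the measured magnitude."""
    spectrum = np.fft.fftn(obj)
    spectrum = measured_mag * np.exp(1j * np.angle(spectrum))
    return np.fft.ifftn(spectrum)

def error_reduction_step(obj, measured_mag, support):
    """ER: apply the modulus constraint, then zero everything outside the support."""
    return np.where(support, project_modulus(obj, measured_mag), 0.0)

def hio_step(obj, measured_mag, support, beta=0.9):
    """HIO: keep the projection inside the support, apply feedback outside."""
    proj = project_modulus(obj, measured_mag)
    return np.where(support, proj, obj - beta * proj)

def modulus_error(obj, measured_mag):
    """Normalized squared mismatch between estimated and measured magnitudes."""
    est = np.abs(np.fft.fftn(obj))
    return float(np.sum((est - measured_mag) ** 2) / np.sum(measured_mag ** 2))

def refine(initial_obj, measured_mag, n_iter=50, threshold=0.1):
    """Refine a network prediction with a fixed number of ER iterations.

    The support is fixed and obtained by thresholding the initial amplitude
    (assumed threshold: 10% of its maximum).
    """
    obj = np.asarray(initial_obj, dtype=complex)
    support = np.abs(obj) >= threshold * np.abs(obj).max()
    for _ in range(n_iter):
        obj = error_reduction_step(obj, measured_mag, support)
    return obj
```

Starting ER from a guess that is already close to the solution is what allows a short run of 50 iterations to suffice, whereas a random start typically needs HIO to escape stagnation before ER can polish the result.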


  • [1] J. Miao, T. Ishikawa, I. K. Robinson, M. M. Murnane, Beyond crystallography: Diffractive imaging using coherent x-ray light sources. Science 348, 530–535 (2015).
  • [2] Y. Jiang, Z. Chen, Y. Han, P. Deb, H. Gao, S. Xie, P. Purohit, M. W. Tate, J. Park, S. M. Gruner, V. Elser, D. A. Muller, Electron ptychography of 2D materials to deep sub-ångström resolution. Nature 559, 343–349 (2018).
  • [3] C. Phatak, A. K. Petford-Long, M. D. Graef, Recent advances in Lorentz microscopy. Curr. Opin. Solid State Mater. Sci. 20, 107–114 (2016).
  • [4] A. Szameit, Y. Shechtman, E. Osherovich, E. Bullkich, P. Sidorenko, H. Dana, S. Steiner, E. B. Kley, S. Gazit, T. Cohen-Hyams, S. Shoham, M. Zibulevsky, I. Yavneh, Y. C. Eldar, O. Cohen, M. Segev, Sparsity-based single-shot subwavelength coherent diffractive imaging. Nat. Mater. 11, 455–459 (2012).
  • [5] B. H. Dean, D. L. Aronstein, J. S. Smith, R. Shiri, D. S. Acton, Phase retrieval algorithm for JWST Flight and Testbed Telescope. Space Telesc. Instrum. Opt. Infrared Millim. 6265, 626511 (2006).
  • [6] A. Ulvestad, A. Singer, J. N. Clark, H. M. Cho, J. W. Kim, R. Harder, J. Maser, Y. S. Meng, O. G. Shpyrko, Topological defect dynamics in operando battery nanoparticles. Science 348, 1344–1347 (2015).
  • [7] D. Kim, M. Chung, J. Carnis, S. Kim, K. Yun, J. Kang, W. Cha, M. J. Cherukara, E. Maxey, R. Harder, K. Sasikumar, S. K. R. S. Sankaranarayanan, A. Zozulya, M. Sprung, D. Riu, H. Kim, Active site localization of methane oxidation on Pt nanocrystals. Nat. Commun. 9, 3422 (2018).
  • [8] J. Kang, J. Carnis, D. Kim, M. Chung, J. Kim, K. Yun, G. An, W. Cha, R. Harder, S. Song, M. Sikorski, A. Robert, N. H. Thanh, H. Lee, Y. N. Choi, X. Huang, Y. S. Chu, J. N. Clark, M. K. Song, K. B. Yoon, I. K. Robinson, H. Kim, Time-resolved in situ visualization of the structural response of zeolites during catalysis. Nat. Commun. 11, 5901 (2020).
  • [9] M. J. Cherukara, K. Sasikumar, W. Cha, B. Narayanan, S. J. Leake, E. M. Dufresne, T. Peterka, I. McNulty, H. Wen, S. K. R. S. Sankaranarayanan, R. J. Harder, Ultrafast Three-Dimensional X‑ray Imaging of Deformation Modes in ZnO Nanocrystals. Nano Lett. 17, 1102–1108 (2017).
  • [10] M. J. Cherukara, K. Sasikumar, A. DiChiara, S. J. Leake, W. Cha, E. M. Dufresne, T. Peterka, I. McNulty, D. A. Walko, H. Wen, S. K. R. S. Sankaranarayanan, R. J. Harder, Ultrafast Three-Dimensional Integrated Imaging of Strain in Core/Shell Semiconductor/Metal Nanostructures. Nano Lett. 17, 7696–7701 (2017).
  • [11] J. N. Clark, L. Beitra, G. Xiong, A. Higginbotham, D. M. Fritz, H. T. Lemke, D. Zhu, M. Chollet, G. J. Williams, M. Messerschmidt, B. Abbey, R. J. Harder, A. M. Korsunsky, J. S. Wark, I. K. Robinson, Ultrafast Three-Dimensional Imaging of Lattice Dynamics in Individual Gold Nanocrystals. Science 341, 56–59 (2013).
  • [12] J. N. Clark, L. Beitra, G. Xiong, D. M. Fritz, H. T. Lemke, D. Zhu, M. Chollet, G. J. Williams, M. M. Messerschmidt, B. Abbey, R. J. Harder, A. M. Korsunsky, J. S. Wark, D. A. Reis, I. K. Robinson, Imaging transient melting of a nanocrystal using an X-ray laser. Proc. Nat. Acad. Sci. USA 112, 7444–7448 (2015).
  • [13] J. N. Clark, J. Ihli, A. S. Schenk, Y.-Y. Kim, A. N. Kulak, J. M. Campbell, G. Nisbet, F. C. Meldrum, I. K. Robinson, Three-dimensional imaging of dislocation propagation during crystal growth and dissolution. Nat. Mater. 14, 780–784 (2015).
  • [14] A. Ulvestad, M. J. Welland, S. S. E. Collins, R. Harder, E. Maxey, J. Wingert, A. Singer, S. Hy, P. Mulvaney, P. Zapol, O. G. Shpyrko, Avalanching strain dynamics during the hydriding phase transformation in individual palladium nanoparticles. Nat. Commun. 6, 10092 (2015).
  • [15] F. Hofmann, E. Tarleton, R. J. Harder, N. W. Phillips, P.-W. Ma, J. N. Clark, I. K. Robinson, B. Abbey, W. Liu, C. E. Beck, 3D lattice distortions and defect structures in ion-implanted nano-crystals. Sci. Rep. 7, 45993 (2017).
  • [16] M. J. Cherukara, R. Pokharel, T. S. O’Leary, J. K. Baldwin, E. Maxey, W. Cha, J. Maser, R. J. Harder, S. J. Fensin, R. L. Sandberg, Three-dimensional X-ray diffraction imaging of dislocations in polycrystalline metals under tensile loading. Nat. Commun. 9, 3776 (2018).
  • [17] W. Yang, X. Huang, R. Harder, J. N. Clark, I. K. Robinson, H. Mao, Coherent diffraction imaging of nanoscale strain evolution in a single crystal under high pressure. Nat. Commun. 4, 1680 (2013).
  • [18] F. Pfeiffer, X-ray ptychography. Nat. Photonics 12, 9–17 (2018).
  • [19] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, M. S. Rosen, Image reconstruction by domain-transform manifold learning. Nature 555, 487–492 (2018).
  • [20] H. C. Burger, C. J. Schuler, S. Harmeling, Image denoising: Can plain neural networks compete with BM3D? in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2012), pp. 2392–2399.
  • [21] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, T. Aila, Noise2Noise: Learning Image Restoration without Clean Data. arXiv:1803.04189 [cs.CV] (2018).
  • [22] C. Dong, C. C. Loy, K. He and X. Tang, Image Super-Resolution Using Deep Convolutional Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 295-307 (2016).
  • [23] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, Photo-realistic single image super-resolution using a generative adversarial network, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 4681-4690.
  • [24] B. Lim, S. Son, H. Kim, S. Nah, K.M. Lee, Enhanced Deep Residual Networks for Single Image Super-Resolution. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2017), pp. 136-144.
  • [25] Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, A. Ozcan, Phase recovery and holographic image reconstruction using deep learning in neural networks. Light Sci. Appl. 7, 17141–17141 (2018).
  • [26] A. Sinha, J. Lee, S. Li, G. Barbastathis, Lensless computational imaging through deep learning. Optica 4, 1117 (2017).
  • [27] M. J. Cherukara, T. Zhou, Y. Nashed, P. Enfedaque, A. Hexemer, R. J. Harder, M. V. Holt, AI-enabled high-resolution scanning coherent diffraction imaging. Appl. Phys. Lett. 117, 044103 (2020).
  • [28] O. Wengrowicz, O. Peleg, T. Zahavy, B. Loevsky, O. Cohen, Deep neural networks in single-shot ptychography. Opt. Express 28, 17511 (2020).
  • [29] Z. Guan, E. H. R. Tsai, X. Huang, K. G. Yager, H. Qin, PtychoNet: Fast and high quality phase retrieval for ptychography. British Machine Vision Conference (2019), pp. 1172.
  • [30] T. Nguyen, Y. Xue, Y. Li, L. Tian, G. Nehmetallah, Deep learning approach for Fourier ptychography microscopy. Opt. Express 26, 26470 (2018).
  • [31] M. J. Cherukara, Y. S. Nashed, R. J. Harder, Real-time coherent diffraction inversion using deep generative networks. Sci. Rep. 8, 16520 (2018).
  • [32] L. Wu, P. Juhas, S. Yoo, I. Robinson, Complex imaging of phase domains by deep neural networks. IUCrJ 8, 12–21 (2021).
  • [33] R. Harder, Deep neural networks in real-time coherent diffraction imaging. IUCrJ 8, 1–3 (2021).
  • [34] H. Chan, Y. S. Nashed, S. Kandel, S. O. Hruszkewycz, S. K. Sankaranarayanan, R. J. Harder, M. J. Cherukara, Rapid 3D nanoscale coherent imaging via physics-aware deep learning. Applied Physics Reviews 8, 021407 (2021).
  • [35] A. Scheinker, R. Pokharel, Adaptive 3D convolutional neural network-based reconstruction method for 3D coherent diffraction imaging. J. Appl. Phys. 128, 184901 (2020).
  • [36] Y. Zhang, M. A. Noack, P. Vagovic, K. Fezzaa, F. Garcia-Moreno, T. Ritschel, P. Villanueva-Perez, PhaseGAN: a deep-learning phase-retrieval approach for unpaired datasets. Opt. Express 29, 19593–19604 (2021).
  • [37] D. Ratner, F. Christie, J. Cryan, A. Edelen, A. Lutman, X. Zhang, Recovering the phase and amplitude of X-ray FEL pulses using neural networks and differentiable models. Opt. Express 29, 20336–20352 (2021).
  • [38] J. N. Clark, X. Huang, R. Harder, I. K. Robinson, High-resolution three-dimensional partially coherent diffraction imaging. Nat. Commun. 3, 993 (2012).
  • [39] W. Hu, X. Huang, H. Yan, Dynamic diffraction artefacts in Bragg coherent diffractive imaging. Journal of Applied Crystallography 51, 167–174 (2018).
  • [40] J. Miao, J. Kirz, D. Sayre, The oversampling phasing method. Acta Crystallographica Section D: Biological Crystallography 56, 1312–1315 (2000).
  • [41] M. Lohmeier, E. Vlieg, Angle calculations for a six-circle surface X-ray diffractometer. Journal of Applied Crystallography 26, 706–716 (1993).


This work was performed, in part, at the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility, operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided on Swing, a high-performance computing cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory.

This work was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Data, Artificial Intelligence and Machine Learning at DOE Scientific User Facilities program under Award Number 34532. M.J.C. acknowledges partial support from Argonne LDRD 2021-0090 – AutoPtycho: Autonomous, Sparse-sampled Ptychographic Imaging. Y.Y. acknowledges partial support from Argonne LDRD 2021-0315 – Scalable DL-based 3D X-ray nanoscale imaging enabled by AI accelerators.

Author contributions: Y.Y. and M.J.C. proposed the initial idea, and R.J.H. contributed to the conceptualization. H.C. generated the simulated data. R.J.H. collected the BCDI experimental data. Y.Y. built the model and performed network training and testing with help from M.J.C., R.J.H. and H.C. P.B. gave technical support and conceptual advice. M.J.C., Y.Y., R.J.H. and H.C. wrote the manuscript with input from P.B. and S.S.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper. All code, data, and trained models developed in this study will be made available in a public GitHub repository.

Figure 1: Schematic of the neural network structure of the AutoPhaseNN model during training. (A) The model consists of a 3D CNN and the X-ray scattering forward model. The 3D CNN is implemented with a convolutional auto-encoder and two deconvolutional decoders built from convolutional, max-pooling, upsampling and zero-padding layers. Physical knowledge is enforced via the Sigmoid and Tanh activation functions in the final layers. (B) The X-ray scattering forward model includes the numerical modeling of diffraction and the image shape constraints. It takes the amplitude and phase output by the 3D CNN and combines them into a complex image. The estimated diffraction pattern is then obtained from the FT of this current estimate of the real space image.
Figure 2: Performance of AutoPhaseNN on simulated test data. (A) Histogram of error for the modulus of the diffraction pattern. (B) Histogram of SSIM for the amplitude of the real space images. (C) Volume rendering of the input 3D diffraction patterns, (D) ground truth images, (E) phase retrieval results, (F) network predictions, and (G) refined results, for four representative samples. Panels (D–G) show the 0.3 contour of the amplitude, with the color representing the phase on the surfaces. Reciprocal space errors are shown in the figures.
Figure 3: Performance of AutoPhaseNN on real data from BCDI experiments. Comparison among reconstructions from (A) traditional phase retrieval, (B) prediction of the AutoPhaseNN model, and (C) the refinement process. The crystals are clipped to show the internal strain fields. The images show the 0.3 contour of the amplitude, with the color representing the phase. Reciprocal space errors are shown in the figures.
Figure 4: Reconstruction error as a function of the iteration number for conventional phase retrieval and refinement. The dashed lines represent the error for conventional phase retrieval, while the solid lines represent the error for the refinement process.
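
The forward model described in the caption of Figure 1(B) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `support_threshold` parameter, the simple amplitude-threshold form of the image shape constraint, and the normalized L1 error are all assumptions made for illustration. The sketch only shows the general idea of combining amplitude and phase into a complex image and taking its Fourier transform to estimate the diffraction modulus.

```python
import numpy as np

def forward_model(amplitude, phase, support_threshold=0.1):
    """Estimate a 3D diffraction modulus from a real-space amplitude and phase.

    support_threshold is a hypothetical parameter standing in for the
    image shape constraint; the paper's exact constraint may differ.
    """
    # Assumed shape constraint: keep only voxels whose amplitude is large enough.
    support = amplitude >= support_threshold
    amp = np.where(support, amplitude, 0.0)
    ph = np.where(support, phase, 0.0)

    # Combine amplitude and phase into a complex real-space image.
    complex_image = amp * np.exp(1j * ph)

    # Centered 3D Fourier transform; its modulus is the estimated diffraction amplitude.
    far_field = np.fft.fftshift(np.fft.fftn(np.fft.ifftshift(complex_image)))
    return np.abs(far_field)

def modulus_error(estimated, measured):
    """Normalized L1 difference between two diffraction moduli (assumed metric)."""
    return np.sum(np.abs(estimated - measured)) / np.sum(measured)

# Usage: a small cubic "crystal" with a linear phase ramp along one axis.
grid = 16
amplitude = np.zeros((grid, grid, grid))
amplitude[6:10, 6:10, 6:10] = 1.0
phase = np.zeros_like(amplitude)
phase[6:10, 6:10, 6:10] = np.linspace(-0.5, 0.5, 4)[:, None, None]

estimated_modulus = forward_model(amplitude, phase)
print(estimated_modulus.shape)
```

In an unsupervised training loop of the kind the paper describes, a quantity like `modulus_error` would compare the estimated modulus against the measured one, so no real-space ground truth is needed. The loss used here is only a stand-in for whatever loss the authors chose.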