## I Introduction

Optical diffraction tomography (ODT) is a technique to reconstruct a 3-D refractive index (RI) distribution [lauer2002new, simon2008tomographic, wolf1969three, sung2009optical, jo2018quantitative, lee2013quantitative]. Here, refractive index (RI), , is the optical property that relates to the electro- and magnetic- susceptibility ( and , respectively) representing whole light-matter interaction:

(1) |

Since the light-matter interaction depends on electron distribution and density, the RI has different values for each material. Thus, by reconstructing RI distributions, ODT not only provides us the structure of cells but also their status such as dry mass or chemical information[popescu2006diffraction, popescu2004fourier, popescu2008optical, mir2011optical].

Nevertheless, there are still remaining technical challenges in ODT. In particular, the resolution of the reconstructed RI distribution along the axial direction is limited due to the well-known missing cone problem. This missing cone problem is originated from practical ODT instrumentation that can only measure scattering fields within limited angles [lim2015comparative]. This results in the missing Fourier components in cone-shaped angular regions, which introduces the image distortions to make the quantification of RI values difficult. Furthermore, recent state-of-the-art systems [shin2015active, lee2017time] acquire hologram using a circular sampling pattern with extremely sparse view sampling (e.g. 49 views). This is because the number of rotating mirror patterns that can be saved in the Digital Micromirror Device (DMD) is limited. Moreover, typical imaging configurations of the current ODT system does not allow the rotation of the samples that are being visualized [shin2015active, lee2017time]. Due to these limitations, deficient projection views often yield technical problems, such as more dominant missing cone artifacts, low quality images with inaccurate RI values, etc., which makes the accurate quantification of the original RI values difficult. Hence, devising a method for ODT reconstruction which eliminates the missing cone artifacts is of great importance.

Many researchers have investigated various regularization methods to address this problem[tam1981limited, macias1988missing, barth1989approximation, kamilov2015learning] using iterative reconstruction approaches that incorporates a priori knowledge about the RI distribution. These methods alternate between the image space and the Fourier space, imposing constraints on each domain. One of the most widely used algorithm is the Gerchberg-Papoulis (GP) algorithm [gerchberg1974super, papoulis1975new], which simply imposes non-negativity to the RI values. It removes the missing cone artifacts to certain extent while being easy to implement, hence its popularity. Optimization algorithms derived from compressed-sensing (CS) theory have also been proposed, utilizing edge-preserving or sparsity promoting regularizations such as and total vairation (TV) [charbonnier1997deterministic, tian2011low, delaney1998globally, lim2015comparative]. Although these iterative approaches do enhance the resolution compared to the original reconstruction, they cannot overcome the inherent missing cone problem. Moreover, the smoothness constraint such as TV regularization often introduces cartoon-like artifacts in the reconstruction.

Inspired by recent success of deep learning approaches to solve ill-posed inverse problems such as CT reconstruction[kang2017deep, jin2017deep], MR reconstruction[hammernik2018learning, han2019k], positron emission tomography [gong2019iterative], ultrasound imaging [yoon2019efficient], optical microscopy [sinha2017lensless, nehme2018deep, rivenson2017deep], etc, several authors have applied deep learning for ODT [nguyen20183d, lim2020three, zhou2020diffraction, ryu2020deepregularizer]

. Most of the deep learning-based methods for ODT focus on supervised learning, where ground-truth data acquisition is feasible. Specifically, when artifact-free reference data are attainable, the network learns point-to-point mapping from the artifact-corrupted reconstruction to the matched reference data. Nevertheless, in many interesting real-world situations in ODT, such ground-truth is impossible to measure directly. Although some works

[nguyen20183d, lim2020three, ryu2020deepregularizer]utilized phantoms to train the neural network in a supervised fashion, they lack generality in practical situations. On the other hand, unsupervised learning methods for ODT reconstruction are still on its early stage of development. For instance, Zhou

et al. [zhou2020diffraction] developed a DIP[ulyanov2018deep]-based method which requires hours of computation per experiment. Therefore, the interest in developing unsupervised learning methods without matched ground-truth data is high.Recently, several works have been proposed to train a generative model

which acts as a stochastic sampler with respect to the latent space noise vector and unknown parametric measurement models

[bora2018ambientgan, gupta2020cryogan]. For example, when the measurement from the unknown input signal is given by the forward mapping parameterized by the unknown parameter such that , then the authors of AmbientGAN [bora2018ambientgan] propose the following adversarial training to train the the generative modelto estimate the

:where and refer to the probability distribution for the measurement, latent space noise, and the parameter space, and denotes the discriminator that tells the fake measurements from the true measurements. The idea was further extended in cryoGAN [gupta2020cryogan], where the authors tackled the reconstruction of cryo-EM from unknown projection view angles. However, the aim of these stochastic generative models was more toward the estimation of the unknown forward mapping, which is not applicable to ODT where the projection view angles are known. Moreover, the stochastic generative model requires the 3D reconstruction at each step of training, demanding significant amount of time and computational resource for training.

In contrast to these approaches, one of the most important contributions of this work is to use unsupervised learning in order to significantly improve the projection views at the missing cone angles, after which missing cone artifact can be reduced in a subsequent ODT reconstruction step using standard reconstruction algorithm. Specifically, we use CycleGAN to learn the optimal transport map that can convert the probability distribution of the blurry projections to that of the high-resolution projection images at the acquisition angles. Since we only process the 2D projection data at each step of the network training, the computational time for our method is significantly smaller compared to the stochastic sampling approaches that directly reconstructs 3D volumes during each step of training. Despite the simplicity, extensive experiments using numerical phantom, real microbead data, and in vivo complex biological cells confirmed that our method can successfully address the long-standing missing cone problem in ODT.

## Ii Main Contributions

### Ii-a Forward Model

In ODT, the forward diffraction is modeled by the scalar Helmholtz equation:

(2) |

where is the wavenumber with being the incident light wavelength in the free space, , is the scattered field from the scatterer with illumination dependent total electric field and the background electric field . Additionally, the density denotes the normalized scattering potential:

where (constant) is the RI of the surrounding homogeneous medium.
The integral form representation of Eq. (2) is so-called *Lippmann-Schwinger* integral equation [born2013principles]:

(3) |

where is the outgoing free-space Green’s function of 3D Helmholtz equation. Unfortunately, is a function of which makes the integral equation in (3) is nonlinear, so that the direct inversion of from is difficult. Accordingly, using the Born and Rytov approximation, (3) is linearized as

(4) | ||||

Now, suppose that the incident field is a plane wave, i.e. , where is a real-valued spatial wavenumber determined by the directional unit vector for the -th illumination. According to the Fourier-diffraction theorem [wolf1969three], for each illumination angle, the ODT measures Fourier samples along a specific 3-D half spherical shell as shown in Fig. 2. Since the illumination vector is only limited to specific angles, this results in missing cone angles where no Fourier data are measured from the detectors. Moreover, with limited projection views, the sampling pattern cannot fill all the -space near its sampling trajectories. This means that there are missing information between each angle, which further degrades the reconstruction quality.

### Ii-B ProjectionGAN

Recall that one of the most popular reconstruction algorithms for ODT is Gerchberg-Papoulis algorithm which iteratively imposes non-negativity in the RI distribution in order to fill in the missing -space information. Nonetheless, we focus on the fact that the missing information between each angle are represented as low-resolution and artifact contaminated projection images at the missing angles even after the Gerchberg-Papoulis reconstruction. If the missing cone problem had not existed, projection images of all angles would share the same characteristics, that is, they would belong to the same distribution. Therefore, if the projection images at the missing angles could be enhanced by a certain algorithm, you could expect that the resulting reconstruction should be improved as well.

One could think that this problem is solvable similar to cryoGAN [gupta2020cryogan], which would require fitting the 3D data to the GPU, randomly projecting the object at every weight update step, and updating the RI values of the 3D object directly from the gradients. However, this approach requires vast amount of computational resource and time, so here we propose a much simpler method that can be applied where we know the measurement angles.

Once the -space filling through Fourier diffraction theorem is done, the Gerchberg-Papoulis algorithm [gerchberg1974super, papoulis1975new] is used to initialize the volume. Starting here, we can project the 3D object into the 2D measurement space as shown in Fig. 1. For the projection, we use parallel beam projection. Although this is different from the actual forward model of ODT, it still provides high resolution projections at the solid-angles where the real holograms are measured, thanks to the relation between projection-slice theorem and projection-diffraction theorem [kak2001principles] as shown in Fig. 3. Hence, we resort to parallel beam projection as a fast approximation. Notice that by performing projection at the controlled angles, we can free ourselves from operating in the object space, which is in a prohibitively high dimension (e.g. in our case ) since the projection images can later be used for analytic reconstruction. Consequently, projectionGAN need only to operate in the projection space, which is relatively easy to handle with low computational burden. This is vastly different from the prior AmbientGAN-like approaches [bora2018ambientgan, gupta2020cryogan], where the forward projection operator would have to be applied to the huge 3D object per every gradient update step. Our approach only requires the projection step once prior to training.

When the forward projection is performed, we achieve two projection image distributions: and , where denotes the solid angle for the measured projection angles, and is the unmeasured solid angles. The projection images at usually have very high resolution without any missing cone angle artifacts, whereas those of suffers from the blurring from the missing cone angles. Therefore, our goal is to find the transform that improves the probability distribution on by learning that of .

In fact, this problem can be rigorously addressed using the mathematical theory of optimal transport (OT). More specifically, optimal transport is concerned about finding the optimal transport map that can transport one probability measure to another at the minimal transportation cost. Specifically, suppose that the target projection space is equipped with a probability measure , whereas the input projection space is with a probability measure as shown in Fig. 4(a). Then, we can see that the mass transportation from the measure space to another probability measure space is done by a missing cone artifact-correcting generator , i.e. the generator pushes forward the probability measure in to a measure in the target space . On the other hand, the mass transport from to is performed by the artifact generation operator , so that pushes forward the probability measure in to in the space . Then, the optimal transport map for unsupervised learning can be achieved by minimizing the statistical distances between and , and between and , and our proposal is to use the Wasserstein-1 metric as a means to measure the statistical distance.

In particular, if we minimize the two distances together using Kantorovich OT formulation, the following cycleGAN formulation can be derived [sim2020optimal]:

(5) |

with

(6) |

where

is an appropriate hyperparameter, and the cycle-consistency loss is defined by

and the discriminator term is given by

where and are discriminator loss terms that differentiate the real projections from the fake projections in and , respectively. Here, the Kantorovich potentials should be 1-Lipschitz, and we use the LS-GAN formulation as our discriminator loss to impose finite Lipschitz condition [lim2020cyclegan].

Once the projection views at the missing cone angle is improved, we use the standard 3D reconstruction algorithm for final reconstruction. For the final 3-D reconstruction from the improved parallel projection data, there could be many ways to acquire a 3D tomography from the projection dataset. For example, weighted backprojection (WBP)[radermacher2007weighted], Fourier reconstruction[matej20013d], and simultaneous iterative reconstruction (SIRT)[trampert1990simultaneous] are used commonly in cryo-EM community to account for projection reconstructions where the projection angle is random. As we can control the synthetic projection angles, here we resort to a simpler method using the filtered backprojection (FBP) with equal angular increments. More specifically, 360 equiangluar projections are generated around each coordinate axis, which are obtained with a single generator . Subsequently, filtered backprojection (FBP) was applied to achieve the final reconstruction. To keep the balance in all three axes, the three set of reconstructed 3D potentials are averaged to get the final result.

Method | Time(s) |
---|---|

Rytov | 1.37 |

GP | 14.08 |

TV | 604.86 |

CryoGAN [gupta2020cryogan] | Order of hours |

ProjectionGAN | 27.33 |

The time required for reconstruction at the inference stage is compared in Table I. The ProjectionGAN enhancement step per sample takes about 13 seconds per sample, and added with the GP initialization step, it sums up to about 27 seconds. Hence, we can see that our method is much faster and computationally cheaper than modern iterative methods such as TV [lim2015comparative] or cryoGAN [gupta2020cryogan].

## Iii Methods

### Iii-a Network Architecture

Neural networks that were used for the training of CycleGAN are depicted in Fig. 5. The generator shown in Fig. 5 (a) is an encoder-decoder architecture adopted from U-Net with a slight modification[ronneberger2015u]. Specifically, the encoder and the decoder parts were constructed with blocks of 3

3 convolution layer, ReLU activation, and group normalization

[wu2018group]. For the pooling layer, we used 33 convolution layer with stride 2. For the unpooling layer, we used bilinear upscaling. Skip-connections were used to concatenate features from each level of encoder part, and 1

1 convolution layer was used at the final stage of reconstruction. For , the initial number of filter channels in stage 1 was set to 32, which doubled per stage, reaching 256 channels at stage 3. For , to limit the model capacity, we set the initial number of filter channels in stage 1 to 16, which doubled per stage, reaching 64 channels at stage 2.The decoder architecture was adopted from patchGAN of pix2pix[isola2017image], and is depicted in Fig. 5 (b). All the convolution layers utilize 44 convolution with stride 2 which efficiently reduces the size of image. LeakyReLU activation and instance normalization are used in between the linear convolution layers. Finally, 11 convolution layer is placed at the end to produce prediction scores.

When multiple particles exist for reconstruction, or when global characteristics and local characteristics differ, as in the case of the reconstruction of numerical phantom data, we found out that using multi-scale discriminator shown in Fig. 5 (c), and first introduced in Shocher et al. [Shocher2018InGANCA] is efficient. The architecture of multi-scale discriminator was adopted from Shocher et al. [Shocher2018InGANCA], which has 4 different scale levels. The structure of discriminator on each level share the same architecture as in Fig. 5

(b), without sharing the parameters at each scale. As the level progresses, the input data is down-scaled by a factor of 2, with the lower levels of the discriminator focusing to a more fine-grained detail, and the higher levels attending to the overall structure. The feature maps extracted on the higher levels are upscaled with bilinear interpolation and added to acquire the final output. The weights multiplied to each level are adjusted linearly, putting larger emphasis on the details as the learning epoch progresses.

### Iii-B Details of Training

#### Iii-B1 Numerical Simulation

For the training of numerical phantom data, out of 27 set of phantoms, 20 were used for training and the other 7 were used for testing. We set in Eq. 6, with 4 levels of discriminator shown in Fig. 5 (c). The weights applied to each level were set to at the first epoch, which increased linearly up to at the final epoch. Patch-training was used with the size 256

256 sampled from uniform distribution. Training was performed for 150 epochs, which took about a day.

#### Iii-B2 ODT measurement data

For training, out of 87 NIH3T3 cells and 18 microbead samples that were acquired, 67 different NIH3T3 cells and 10 microbead samples were selected randomly. The rest were utilized as test samples. Hyperparameter

for the loss function given in Eq.

6 was set to 10 for the whole training process. Patch-training was performed with the size 256256 sampled from uniform distribution. Training was performed for 20 epochs with the learning rate of 0.0001. Training took about a day.With both experiments, Adam optimizer was used with parameters [kingma2015adam]

. The algorithm was implemented using the open-source deep learning library PyTorch with NVidia GeForce GTX 2080-Ti as the GPU.

## Iv Results

### Iv-a Numerical Simulation

We first performed a numerical simulation study using 3D phantom dataset generated using randomly positioned spheres with various sizes. The details of the dataset and the process of simulation is provided in the Methods section. In Fig. 6, we demonstrated that ProjectionGAN accurately reconstructs the ground-truth, resolving the elongation issue stemming from the missing cone problem. Moreover, in the axial slice we can eliminate the false signals coming from the missing cone artifact. In contrast, GP and TV fall short of the quality compared to ProjectionGAN, where we can clearly see that the algorithm does not fully eliminate the artifacts. One thing to note here is that ProjectionGAN is generally applicable even when multiple particles are apparent in the acquisition, unlike studies that limit their scope to single-particle cases [gupta2020cryogan].

### Iv-B Microbead Measurement

Furthermore, to verify the efficacy of our method with real ODT measurements, we first tested the network with silica microbead whose original RI value was about 1.46. Here, we clearly show that the shape and the RI value are reconstructed with much better accuracy with the use of ProjectionGAN. We compared our method with the conventional Rytov reconstruction, GP algorithm, and total variation (TV) reconstruction.

In Fig. 7 (a), we see that with Rytov reconstruction, the original shape of the microbead is not preserved. Moreover, elongation along the optical axes are clearly seen especially in the y-z and x-z plane. Reconstructions through GP resolves the issue of elongation to some extent, but it does not capture the original shape properly. Furthermore, it is evident that the RI values are largely underestimated from the actual value, due to the missing cone problem in the -space that refers to the 3D Fourier space. The TV reconstruction results show better results in that the method produces a more homogeneous structure since it explicitly demands smoothness of the reconstruction. Nonetheless, cutviews seen from y-z or x-z plane still does not capture the spherical shape properly, not to mention the underestimated RI values. Moreover, missing cone problems are still visible in the -space.

The fourth column in Fig. 7 (a) shows reconstructions with the proposed method, where the spherical shape of microbead is preserved, and underestimated RI values are pertained. Also, -space shows an approximate sinc function, which should be seen if homogeneous sphere is reconstructed. To inspect the RI distributions in detail, in Fig. 7(b), we see the histogram plot of RI values within the 3D data. The intensity of the colorbar represents the number of bins in the data, where we see clear shift towards the actual value with our proposed method compared to the Rytov reconstruction, the GP method, and the TV method. Similar results can be seen in Fig. 7(c) with the line profile.

### Iv-C Biological Cells

Next, we used real biological samples, namely NIH3T3 cells to validate our method. In Fig. 3, we see severe degradation of image quality as the projection angle deviates from the measurement angle with all the existing methods - Rytov, GP, and TV. This shows pictorially, the critical effect of the missing cone problem - the resolution heavily deteriorates in the missing angles. However, using ProjectionGAN, the resolution of projection views are largely enhanced, and we can observe the cell clearly in all angles.

Using the improved projection data in Fig. 3, we performed three dimensional reconstruction using filtered backprojection for in-depth analysis of the reconstructed RI distribution. In Fig. 8 we provide 3D rendering of the reconstructed NIH3T3 cell distribution and its cut-views. Here, in the view presented in Fig. 8, we see that the resolution in the reconstructions with Rytov and TV are far below satisfactory, where we cannot clearly see the configuration of cell organelles. Especially, TV reconstruction in Fig. 8(c) shows cartoon-like appearances, which do not capture the true texture of the cell. With GP reconstruction shown in Fig. 8(b), we achieve higher resolution with more details preserved. Nevertheless, the resolution in the y-z and x-z plane is still below standards. Specifically, when we visualize cut-view RI distribution, the images in the y-z and x-z plane have very low resolution, whereas images reconstructed through ProjectionGAN show superior results in all three planes. Here, we again confirm that the problem of underestimated RI values is resolved. The same superiority can be seen in the reconstruction results of other type of cells, depicted in Fig. 9

Finally, we provide slice views for Rytov, GP, TV and ProjectionGAN in each plane along with the -space of each corresponding slice in Fig. 10(a)-(c). With Rytov reconstruction, the image support in the y-z and x-z plane are vague, and the resolution is highly degraded. GP method has higher resolution than the Rytov reconstruction, but the cell support is vague. TV reconstruction shows cartoon-like artifacts, which does not properly capture the high frequency details of the cell structure. On the other hand, with our method we are able to achieve high resolution in all three axes with clear boundaries. Enhancement of RI values is also remarkable.

The missing cone problem can be best characterized when the -space is visualized. More specifically, due to the Fourier diffraction theorem, field retrieval is performed in 3D -space, where each hologram correspond to the half-sphere of the -space. Since in our case, 49 holograms were acquired to reconstruct a single RI distribution, in Rytov reconstruction in Fig. 10, 49 curves are apparent, regardless of the view plane. -space of reconstructions with GP method solves the missing cone problem by utilizing non-negativity constraint, and thus when seen in the x-y plane, it seems that the missing cone problem is resolved. Nonetheless, when seen from the y-z or x-z plane, we still observe the missing parts of -space. With TV reconstruction, similar phenomenon was observed in the spectrum of TV reconstruction, which resembles a blurred version of Rytov or GP reconstruction. However, our new method ProjectionGAN resolves the problem completely regardless of the viewing plane. Note that the half-spherical curves that were apparent in the Rytov, GP and TV reconstruction are no longer visible in the -space reconstructed with the proposed method.

## V Conclusion

In this paper, we proposed a novel ProjectionGAN that resolves the missing cone problems in ODT using a novel unsupervised learning approach. Specifically, an initial GP reconstruction was used to generate the projection views at the missing view angles, which are improved by a novel ProjectionGAN algorithm. Our ProjectionGAN algorithm relies on the optimal transport driven cycleGAN that learns to transport projection distributions from missing views to that of high quality projections. After the projections are refined, 3D RI distribution can be reconstructed through a simple filtered backprojection. Experimental results confirmed that the characteristics of scattering potential from our method are largely superior than the conventional reconstructions.

## Acknowledgment

This research was funded by the National Research Foundation (NRF) of Korea grant NRF-2020R1A2B5B03001980