1 Introduction
Photoacoustic tomography (PAT) is an emerging noninvasive modality that has manifested an enormous prospect for clinical practices [1]. In PAT, the tissue is illuminated with nearinfrared light of wavelength 650900nm. The absorbed optical energy is transformed into acoustic energy through the photoacoustic effect, and the generated ultrasound is measured by transducer arrays around the tissue in order to retrieve the optical internal properties. The coupling mechanism of the optical and ultrasound waves gives multiple advantages over conventional standalone modalities. For instance, the acoustic wave experiences less scattering in tissue compared to optical wave, allowing PAT to break the optical diffusion limit and generate highresolution images [2] while preserving intrinsic optical contrast [3].
Typical photoacoustic signal generation comprises three steps: (1) the tissue absorbs light; (2) the absorbed optical energy heats the tissue and raises the temperature; (3) thermoelastic expansion occurs and generates ultrasound. The image formation in PAT is to recover the distribution of the deposited energy, known as the local optical fluence, from the ultrasound signals that are recorded by the sensors deployed around the tissue. As the initial ultrasound pressure is approximately proportional to the optical fluence, it is sufficient to reconstruct the initial pressure from the recorded ultrasound signals. Conventional reconstructive schemes, such as backprojection based methods [4, 5, 6, 7, 8, 9, 10, 11] and timereversal based methods [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], typically assume the sound speed is precisely known. This assumption however is not always fulfilled, and the precise sound speed distribution is often unknown. It is therefore of practical significance to study simultaneous reconstruction of both the sound speed distribution and the acoustic initial pressure.
The purpose of this article is to address this issue using a datadriven approach. Datadriven approaches such as machine learning and deep learning have gained increasing attention recently in medical image reconstruction
[25, 26]. The powerful computational capacity of modern super computers has enabled these methods to enhance medical image reconstruction by extracting the hidden patterns in the data that would otherwise be lost. In PAT, data driven methods have achieved stateoftheart results in PAT image reconstruction in the scenarios of sparse data [27, 28, 29], limited view [30, 31, 32], artifacts removal [33, 34, 35], as well as other applications [36, 37, 38]. In addition to PAT, datadriven approaches have also found broad application in the image reconstruction of many other imaging modalities such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), as well as other related fields like radiomics and radiotheraphy, see [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50] for some examples and applications along these directions.2 Preliminary
In this section, we formulate the mathematical model of the simultaneous reconstruction problem in PAT and review the recent advances based on modeldriven approaches.
Propagation of ultrasound in PAT is typically modeled as an initial value problem for the wave equation. This can be written as
(1) 
where is the dimension, is the acoustic velocity, is the acoustic pressure, is the stoppage time of the measurement, is the mass density, is the sound speed, and is the initial pressure distribution. We remark that another popular model in PAT is to deal with the second order acoustic equation rather than the above first order system. These models are more or less equivalent under suitable assumptions.
Let be a bounded domain in representing the tissue, with the boundary of denoted by . The measurement in PAT is the detected ultrasound signals, that is the temporal boundary data . We further introduce a measurement map which is defined as
(2) 
where is the solution of (1). Assuming the mass density is known, then the simultaneous reconstruction problem in PAT can be recast as the inverse problem to recover the pair from ; in other words, to invert the operator . Note that is linear in but nonlinear in .
We recall some known results regarding the simultaneous reconstruction in PAT. Linearization of the map (2) at sufficiently smooth pairs is studied in [51] and the linearized inversion is shown to be unstable in any scale of Sobolev spaces. Invertibility of is established in [52]
for radially symmetric sound speeds in any odd dimension higher than three; in
[53, 54] for sound speeds that satisfy certain orthogonality relation with harmonic functions. Numerical simulations have been implemented in [55] using the finite element model, and in [56, 57, 58] for parametrized sound speeds.Our datadriven approach is inspired by the work of Matthews et al. [56], where the authors proposed a direct optimization approach towards the simultaneous reconstruction, with the sound speed being confined in a parametrized lowerdimensional space. We denote the measurement by for simplicity. With slight abuse of notations, the discretized versions of and are still denoted by themselves, which are two matrices. For a fixed sound speed , the operator is linear, hence its discretization is another matrix, say . The idea in [56] is to minimize the objective function
(3) 
where the data fidelity term is defined as and denotes the regularization term for the initial pressure. This objective function can be solved using the proximal optimization method that allows constraints and nonsmooth regularization function for the initial pressure distribution. The initial pressure distribution and sound speed distribution are updated iteratively using gradient descent. The algorithm in [56] is summarized in Algorithm 1 below.
3 Simultaneous Reconstruction via Deep Learning
We describe our datadriven method in this section. The motivation stems from some limitations of the iterative algorithm 1. For instance, repeated selection of step sizes to update the initial pressure and sound speed is timeconsuming; selection of the parameter and the regularization term is highly empirically; tuning can be tedious. To overcome these limitations, we propose a simultaneous reconstruction network (SRNet) to update the initial pressure and the sound speed for each iteration.
The proposed SRNet is shown in Fig. 1
. It contains three steps  feature extraction, feature fusion, and reconstruction.

Feature extraction: We used two convolutional layers to extract features from the initial pressure distribution, the negative gradient of data fidelity with respect to initial pressure, and current sound speed distribution.

Feature fusion:
The feature maps extracted from above three branches are concatenated along the channel dimension, which are then fed into the batch normalization (BN) layer. Since the feature maps from the initial pressure and the sound speed are in different scale, this batch normalization layer can normalize feature maps into the same scale for subsequent feature fusion. The feature fusion part also contains two convolutional layers followed by ReLU.

Reconstruction: After the feature fusion, we use two convolutional layers to reconstruct the updated initial pressure and sound speed, respectively. A batch normalization layer is used after first convolutional layer to scale the fused feature back to the original scale. A skip connection after the second convolutional layer enables the network to learn the residual just like the traditional iterative algorithm (see step 56 in Algorithm 1
). The last ReLU activation function guarantees the updated initial pressure and speed distribution are nonnegative.
Layer  Feature Extraction  Feature Fusion  Reconstruction 
1  3 3 conv , 32, stride 1 
Batch Normalization  3 3 conv, 16, stride 1 
2  3 3 conv, 32, stride 1  3 3 conv, 64, stride 1  Batch Normalization 
3  3 3 conv, 32, stride 1  3 3 conv, 1, stride 1  
A detailed network structure is shown in Table 1
. Note that zeropadding is used such that all feature maps have the same as the inputs. For each iteration, we trained SRNet using the simulated dataset. At the
th iteration, the loss function is chosen as follows:
(4)  
s.t. 
where represents the parameters of the SRNet. and are the groundtruth labels. We employ the Adam algorithm [59] to update the parameters in . The gradients of the parameters are computed using a backpropagation algorithm. The iterative reconstruction algorithm is summarized in Algorithm 2.
Note that our proposed network is different from the model in [30] because 1) our network aims to reconstruct the initial pressure distribution and sound speed distribution simultaneously, while the model in [30] assumes the sound speed is precisely known; 2) our network is a 2D network, while the model in [30] is a 3D network; and 3) the proposed network used the batch normalization layer to scale the feature maps, while the model in [30] used an explicit coefficient to represent the step length.
4 Results
4.1 Data generation
In this study, we used a simplified Shepplogan phantom to train and test our network. Fig. 2 shows the phantom along with the initial pressure and the sound speed distribution. We randomly changed the sizes and positions of the region 1, 3, 4, and 5 in order to increase the diversity of dataset. The corresponding initial pressure and sound speed are shown in Table 2. We put the phantom into a image, which is surrounded by 252 detectors in the rectangle form. For the training purpose, we randomly generated 5,120 images with acoustic signal of the size . The mass density is assumed to be constant 1 throughout. Then we selected 1,024 images randomly to test the trained model.
Index  Initial Pressure  Sound Speed [m/s] 
1  0.0  1480 
2  0.2  1800 
3  0.4  1530 
4  0.6  1520 
5  0.8  2600 
6  1.0  3198 
4.2 Experimental results
Throughout the experiments, the initial pressure was initialized uniformly as zero and the sound speed was initialized uniformly as 1,500 m/s. The maximum number of iteration was set to 4. The gradient of the data fidelity with respect to the initial pressure is computed by the adjoint state method using the wave package [60]
. The proposed SRNet was implemented by Pytorch and trained on four NVIDIA 1080Ti GPUs.
5 Discussions and Conclusion
The main weakness of this study is the numerical nature, which may not fully reflect all the physical factors in real clinical/preclinical applications. We plan to perform physical phantom experiments as a next step. Also, we are interested in animal studies in vivo. Nevertheless, the numerical results and the network trained with simulated data may serve a baseline for further improvements. Hopefully, this imaging modality may find practical use in some important tasks such as breast imaging.
In conclusion, we have developed a simultaneous reconstruction network (SRNet) to jointly recover the sound speed and initial pressure in photoacoustic tomography. The proposed SRNet has several merits: automatically learning the step size, regularization term, and the tradeoff between data fidelity and regularization in a datadriven manner, and not requiring the gradient of the data fidelity with respect to sound speed. In the future, we will further improve the structure of the network and also validate it on more complicated datasets.
6 Acknowledgment
The research of Y. Yang is partly supported by the NSF grant DMS1715178, the AMSSimons travel grant, and the startup fund from Michigan State University. The authors would like to thank NVIDIA Corporation for the donation of GPU used for this research.
References
 [1] Jun Xia, Junjie Yao, and Lihong V Wang. Photoacoustic tomography: principles and advances. Electromagnetic waves (Cambridge, Mass.), 147:1, 2014.
 [2] Lihong V Wang and Gene K Beare. Breaking the optical diffusion limit: Photoacoustic tomography. In Frontiers in Optics, page FWY2. Optical Society of America, 2010.
 [3] Junjie Yao and Lihong V Wang. Photoacoustic tomography: fundamentals, advances and prospects. Contrast media & molecular imaging, 6(5):332–345, 2011.
 [4] David Finch, Rakesh, and Leonid Kunyansky. Recovering a function from its spherical mean values in two and three dimensions. Photoacoustic Imaging and Spectroscopy, 2009.
 [5] David Finch, Markus Haltmeier, and Rakesh. Inversion of spherical means and the wave equation in even dimensions. SIAM Journal on Applied Mathematics, 68(2):392–412, 2007.
 [6] Markus Haltmeier, Thomas Schuster, and Otmar Scherzer. Filtered backprojection for thermoacoustic computed tomography in spherical geometry. Mathematical methods in the applied sciences, 28(16):1919–1937, 2005.
 [7] Robert A Kruger, Daniel R Reinecke, and Gabe A Kruger. Thermoacoustic computed tomography–technical considerations. Medical physics, 26(9):1832–1837, 1999.
 [8] Robert A Kruger, William L Kiser Jr, Daniel R Reinecke, and Gabe A Kruger. Thermoacoustic computed tomography using a conventional linear transducer array. Medical physics, 30(5):856–860, 2003.
 [9] Leonid A Kunyansky. Explicit inversion formulae for the spherical mean Radon transform. Inverse problems, 23(1):373, 2007.
 [10] Minghua Xu and Lihong V Wang. Universal backprojection algorithm for photoacoustic computed tomography. Physical Review E, 71(1):016706, 2005.
 [11] Yuan Xu, Lihong V Wang, Gaik Ambartsoumian, and Peter Kuchment. Reconstructions in limitedview thermoacoustic tomography. Medical physics, 31(4):724–733, 2004.
 [12] Eric Chung, Chi Yeung Lam, and Jianliang Qian. A Neumann series based method for photoacoustic tomography on irregular domains. Contemporary Mathematics, 615:89–104, 2014.
 [13] Benjamín Palacios. Reconstruction for multiwave imaging in attenuating media with large damping coefficient. Inverse Problems, 32(12):125008, 2016.
 [14] Andrew Homan. Multiwave imaging in attenuating media. Inverse Problems & Imaging, 7(4):1235–1250, 2013.

[15]
Yulia Hristova.
Time reversal in thermoacoustic tomography—an error estimate.
Inverse Problems, 25(5):055008, 2009.  [16] Yulia Hristova, Peter Kuchment, and Linh Nguyen. Reconstruction and time reversal in thermoacoustic tomography in acoustically homogeneous and inhomogeneous media. Inverse Problems, 24(5):055006, 2008.
 [17] Jianliang Qian, Plamen Stefanov, Gunther Uhlmann, and Hongkai Zhao. An efficient Neumann series–based algorithm for thermoacoustic and photoacoustic tomography with variable sound speed. SIAM Journal on Imaging Sciences, 4(3):850–883, 2011.
 [18] Plamen Stefanov and Gunther Uhlmann. Thermoacoustic tomography with variable sound speed. Inverse Problems, 25(7):075011, 2009.
 [19] Plamen Stefanov and Gunther Uhlmann. Thermoacoustic tomography arising in brain imaging. Inverse Problems, 27(4):045004, 2011.
 [20] Plamen Stefanov and Yang Yang. Multiwave tomography in a closed domain: averaged sharp time reversal. Inverse Problems, 31(6):065007, 2015.
 [21] Plamen Stefanov and Yang Yang. Thermoand photoacoustic tomography with variable speed and planar detectors. SIAM Journal on Mathematical Analysis, 49(1):297–310, 2017.
 [22] Plamen Stefanov and Yang Yang. Multiwave tomography with reflectors: Landweber’s iteration. Inverse Problems & Imaging, 11(2):373–401, 2017.
 [23] Zakaria Belhachmi, Thomas Glatz, and Otmar Scherzer. A direct method for photoacoustic tomography with inhomogeneous sound speed. Inverse Problems, 32(4):045005, mar 2016.
 [24] Ashkan Javaherian and Sean Holman. Direct quantitative photoacoustic tomography for realistic acoustic media. Inverse Problems, 2019.
 [25] Ge Wang. A perspective on deep imaging. IEEE Access, 4:8914–8924, 2016.
 [26] Ge Wang, Jong Chu Ye, Klaus Mueller, and Jeffrey A Fessler. Image reconstruction is a new frontier of machine learning. IEEE transactions on medical imaging, 37(6):1289–1296, 2018.
 [27] Stephan Antholzer, Markus Haltmeier, and Johannes Schwab. Deep learning for photoacoustic tomography from sparse data. Inverse Problems in Science and Engineering, 27(7):987–1005, 2019.
 [28] Stephan Antholzer, Markus Haltmeier, Robert Nuster, and Johannes Schwab. Photoacoustic image reconstruction via deep learning. In Photons Plus Ultrasound: Imaging and Sensing 2018, volume 10494, 2018.
 [29] Stephan Antholzer, Johannes Schwab, and Markus Haltmeier. Deep learning versus minimization for compressed sensing photoacoustic tomography. In 2018 IEEE International Ultrasonics Symposium (IUS), pages 206–212. IEEE, 2018.
 [30] Andreas Hauptmann, Felix Lucka, Marta M. Betcke, Nam Huynh, Jonas Adler, Ben T. Cox, Paul C. Beard, Sébastien Ourselin, and Simon R. Arridge. Modelbased learning for accelerated, limitedview 3D photoacoustic tomography. IEEE Transactions on Medical Imaging, 37(6):1382–1393, 2018.
 [31] Dominik Waibel, Janek Gröhl, Fabian Isensee, Thomas Kirchner, Klaus MaierHein, and Lena MaierHein. Reconstruction of initial pressure from limited view photoacoustic images using deep learning. In Photons Plus Ultrasound: Imaging and Sensing 2018, volume 10494, 2018.

[32]
Johannes Schwab, Stephan Antholzer, Robert Nuster, Günther Paltauf, and
Markus Haltmeier.
Deep learning of truncated singular values for limited view photoacoustic tomography.
In Photons Plus Ultrasound: Imaging and Sensing 2019, volume 10878, page 1087836. International Society for Optics and Photonics, 2019.  [33] Steven Guan, Amir Khan, Siddhartha Sikdar, and Parag Chitnis. Fully dense UNet for 2D sparse photoacoustic tomography artifact removal. IEEE journal of biomedical and health informatics, 2019.
 [34] Derek Allman, Austin Reiter, and Muyinatu A Lediju Bell. Photoacoustic source detection and reflection artifact removal enabled by deep learning. IEEE transactions on medical imaging, 37(6):1464–1477, 2018.

[35]
Derek Allman, Austin Reiter, and Muyinatu Bell.
Exploring the effects of transducer models when training convolutional neural networks to eliminate reflection artifacts in experimental photoacoustic images.
In Photons Plus Ultrasound: Imaging and Sensing 2018, volume 10494, page 104945H. International Society for Optics and Photonics, 2018.  [36] Brendan Kelly, Thomas P Matthews, and Mark A Anastasio. Deep learningguided image reconstruction from incomplete data. arXiv preprint arXiv:1709.00584, 2017.
 [37] Johannes Schwab, Stephan Antholzer, Robert Nuster, and Markus Haltmeier. Realtime photoacoustic projection imaging using deep learning. arXiv:1801.06693, 2018.
 [38] Hongming Shan, Ge Wang, and Yang Yang. Accelerated correction of reflection artifacts by deep neural networks in photoacoustic tomography. Applied Sciences, 9(13):2615, 2019.
 [39] Hu Chen, Yi Zhang, Yunjin Chen, Junfeng Zhang, Weihua Zhang, Huaiqiang Sun, Yang Lv, Peixi Liao, Jiliu Zhou, and Ge Wang. Learn: Learned experts’ assessmentbased reconstruction network for sparsedata CT. IEEE transactions on medical imaging, 37(6):1333–1347, 2018.
 [40] Hongming Shan, Atul Padole, Fatemeh Homayounieh, Uwe Kruger, Ruhani Doda Khera, Chayanin Nitiwarangkul, Mannudeep K Kalra, and Ge Wang. Competitive performance of a modularized deep neural network compared to commercial algorithms for lowdose CT image reconstruction. Nature Machine Intelligence, 1(6):269–276, 2019.

[41]
H Shan, Y Zhang, Q Yang, U Kruger, MK Kalra, L Sun, W Cong, and G Wang.
3D convolutional encoderdecoder network for lowdose CT via transfer learning from a 2D trained network.
IEEE Trans Med Imaging, 37(6):1522–1534, 2018. 
[42]
Chenyu You, Guang Li, Yi Zhang, Xiaoliu Zhang, Hongming Shan, Shenghong Ju,
Zhen Zhao, Zhuiyang Zhang, Wenxiang Cong, Michael W Vannier, Punam K Saha,
and Ge Wang.
CT superresolution GAN constrained by the identical, residual, and cycle learning ensemble (GANCIRCLE).
IEEE Transactions on Medical Imaging, 2019.  [43] Huidong Xie, Hongming Shan, Wenxiang Cong, Xiaohua Zhang, Shaohua Liu, Ruola Ning, and Ge Wang. Dual network architecture for fewview CT–trained on imagenet data and transferred for medical imaging. arXiv preprint arXiv:1907.01262, 2019.

[44]
Fenglei Fan, Hongming Shan, and Ge Wang.
Quadratic autoencoder for lowdose CT denoising.
In 15th International Meeting on Fully ThreeDimensional Image Reconstruction in Radiology and Nuclear Medicine, volume 11072, page 110722Z. International Society for Optics and Photonics, 2019.  [45] Hongming Shan, Uwe Kruger, and Ge Wang. A novel transfer learning framework for lowdose CT. In 15th International Meeting on Fully ThreeDimensional Image Reconstruction in Radiology and Nuclear Medicine, volume 11072, page 110722Y. International Society for Optics and Photonics, 2019.
 [46] Qing Lyu, Hongming Shan, and Ge Wang. MRI superresolution with ensemble learning and complementary priors. arXiv preprint arXiv:1907.03063, 2019.
 [47] Hongming Shan, Ge Wang, Mannudeep K Kalra, R de Souza, and Junping Zhang. Enhancing transferability of features from pretrained deep neural networks for lung nodule classification. In Proceedings of the 2017 International Conference on Fully ThreeDimensional Image Reconstruction in Radiology and Nuclear Medicine, 2017.
 [48] Yiming Lei, Yukun Tian, Hongming Shan, Junping Zhang, Ge Wang, and Mannudeep Kalra. Soft activation mapping of lung nodules in lowdose CT images. arXiv preprint arXiv:1810.12494, 2018.
 [49] Zhao Peng, Hongming Shan, Tianyu Liu, Xi Pei, Ge Wang, and X George Xu. MCDNet–a denoising convolutional neural network to accelerate Monte Carlo radiation transport simulations: A proof of principle with patient dose from xray CT imaging. IEEE Access, 7:76680–76689, 2019.
 [50] Benjamin Q Huynh, Hui Li, and Maryellen L Giger. Digital mammographic tumor classification using transfer learning from deep convolutional neural networks. Journal of Medical Imaging, 3(3):034501, 2016.
 [51] Plamen Stefanov and Gunther Uhlmann. Instability of the linearized problem in multiwave tomography of recovery both the source and the speed. Inverse Problems & Imaging, 7(4):1367–1377, 2013.

[52]
David Finch and Kyle S Hickmann.
Transmission eigenvalues and thermoacoustic tomography.
Inverse Problems, 29(10):104016, 2013.  [53] Hongyu Liu and Gunther Uhlmann. Determining both sound speed and internal source in thermoand photoacoustic tomography. Inverse Problems, 31(10):105005, 2015.
 [54] Christina Knox and Amir Moradifam. Determining both the source of a wave and its speed in a medium from boundary measurements. arXiv preprint arXiv:1803.06750, 2018.
 [55] Zhen Yuan, Qizhi Zhang, and Huabei Jiang. Simultaneous reconstruction of acoustic and optical properties of heterogeneous media by quantitative photoacoustic tomography. Optics express, 14(15):6749–6754, 2006.
 [56] Thomas P Matthews, Joemini Poudel, Lei Li, Lihong V Wang, and Mark A Anastasio. Parameterized joint reconstruction of the initial pressure and sound speed distributions for photoacoustic computed tomography. SIAM journal on imaging sciences, 11(2):1560–1588, 2018.
 [57] Jin Zhang, Kun Wang, Yongyi Yang, and Mark A Anastasio. Simultaneous reconstruction of speedofsound and optical absorption properties in photoacoustic tomography via a timedomain iterative algorithm. In Photons Plus Ultrasound: Imaging and Sensing 2008: The Ninth Conference on Biomedical Thermoacoustics, Optoacoustics, and Acoustooptics, volume 6856, page 68561F. International Society for Optics and Photonics, 2008.
 [58] Thomas P Matthews and Mark A Anastasio. Joint reconstruction of the initial pressure and speed of sound distributions from combined photoacoustic and ultrasound tomography measurements. Inverse problems, 33(12):124002, 2017.
 [59] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [60] B. E. Treeby and B. T. Cox. kwave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wavefields. Journal of Biomedical Optics, 15(2):021314, 2010.