Photoacoustic Microscopy with Sparse Data Enabled by Convolutional Neural Networks for Fast Imaging

06/08/2020 ∙ by Jiasheng Zhou, et al. ∙ Shanghai Jiao Tong University, Peking University

Photoacoustic microscopy (PAM) has emerged as a promising biomedical imaging technology in recent years. However, its point-by-point scanning mechanism results in low imaging speed, which limits the applications of PAM. Reducing the sampling density naturally shortens image acquisition time, but at the cost of image quality. In this work, we propose a method that uses convolutional neural networks (CNNs) to improve the quality of sparse PAM images, thereby speeding up image acquisition while maintaining good image quality. The CNN model combines squeeze-and-excitation blocks and residual blocks to learn a mapping from a 1/4 or 1/16 low-sampling sparse PAM image to a latent fully-sampled image. A perceptual loss function is used to preserve image fidelity. The model is mainly trained and validated on PAM images of leaf veins. Experiments show the effectiveness of the proposed method, which significantly outperforms existing methods both quantitatively and qualitatively. Our model is also tested on in vivo PAM images of blood vessels in mouse ears and eyes. The results show that the model enhances the quality of sparse PAM images of blood vessels in several respects, which may help enable fast PAM and facilitate its clinical applications.

1 Introduction

Photoacoustic microscopy (PAM), a hybrid imaging technique based on the photoacoustic (PA) effect [6, 34], has been widely used for biomedical imaging [5, 30, 35, 18]. Optical-resolution PAM (OR-PAM), one implementation of PAM, offers high spatial resolution at the expense of penetration depth and has demonstrated many potential applications [18, 31, 25]. Because a sample typically has spatially varying optical absorption, OR-PAM obtains the absorption map of the sample by scanning over it point by point. As a result, the imaging speed of OR-PAM is severely restricted by this scanning mechanism. The restriction is worst for high-resolution OR-PAM, which scans with a small step size and therefore needs more scanning points (i.e., more pixels) within a given region of interest (ROI). A low imaging frame rate may hamper applications such as monitoring dynamic biological systems.

In recent years, efforts to increase the scanning speed of OR-PAM have mainly focused on fast scanning mechanisms. Components for fast-scanning PAM include high-speed voice-coil stages [15, 29], galvanometer scanners [14], microelectromechanical system scanning mirrors [33], and microlens arrays and array ultrasonic transducers [21, 17]. OR-PAM has also used a random-access scanning method, in which a digital micromirror device scans only a selected region, to improve imaging speed [22]. These works rely on sophisticated and expensive hardware. By contrast, sparse-scanning OR-PAM offers an alternative: it shortens image acquisition time by reducing the number of scanning points compared with a full scan. Methods such as compressive sensing have also been applied to recover PAM images from sparse data [24]. However, their contribution to imaging speed is limited, because they still require a relatively high sampling density and compressive sampling experiments are complex. More efficient and practical algorithms are therefore still needed to accelerate OR-PAM imaging.

Convolutional neural networks (CNNs) have been used to achieve super resolution (SR) [12, 9, 20] and to enhance PA computed tomography [27, 2, 1, 3, 4]. However, to our knowledge, no studies have used CNN-based methods to improve PAM imaging speed. Enhancing the quality of a low-sampling sparse PAM image (i.e., restoring it to a latent fully-sampled image) can be cast as an SR problem (i.e., mapping a low-resolution image to a high-resolution image). A CNN-based method can therefore be used to improve the quality of sparse PAM images. In this work, we propose to use CNNs to process sparse PAM images, so that sparse scanning can be used to increase imaging speed. The proposed CNN model restores high-quality images from 1/4 or 1/16 low-sampling images. The model is trained and validated on a dataset of 268 PAM images of leaf veins, which is available online for further studies by other researchers. We also extend our method to in vivo applications and achieve high performance in restoring sparse PAM images of blood vessels in mouse ears and eyes. This demonstrates its feasibility for biomedical research and clinical applications.

2 Methods

2.1 Dataset Preparation

In this work, we used a dataset of PAM images of oak and magnolia leaf veins to train and validate our CNN model. The leaves were immersed in a container of black ink for more than 7 hours, then placed on a glass slide and sealed with silicone glue (GE Sealants). For each PAM image, an OR-PAM probe scanned the leaf sample over a grid of scanning points with a step size of 8 µm. The probe resolution was 2 µm, measured with a beam profiler and a beam expander. In total, we acquired a dataset of 268 original full-sampling PAM images using our PAM system (see Fig. S1 in Supplementary Material).

Figure 1: The process of generating low-sampling images from the full-sampling (i.e., full-scanning) ones. (a) Illustration of the down-sampling method. (b) An example of applying the down-sampling method.

Fig. 1 shows how low-sampling sparse data are generated from full-sampling data. With ×2 scaling in step size, only every other pixel along each lateral dimension is kept in the low-sampling image (the yellow pixels in Fig. 1(a)). The low-sampling image therefore contains only 1/4 of the pixels of the full-sampling image. Similarly, with ×4 step-size scaling, the low-sampling image contains only 1/16 of the pixels of the full-sampling image. It is thus expected to need only 1/16 of the image acquisition time. In total, 268 pieces of raw data are collected for our CNN model. The 2D low-sampling PAM image is used as the input, and the 2D full-sampling PAM image is used as the output (i.e., ground truth). Here, a 2D PAM image means the 2D maximum amplitude projection (MAP) along the depth direction, which is commonly used to display OR-PAM images. As Fig. 1(b) shows, image quality is degraded in the low-sampling images (e.g., blurs and discontinuities). For each scaling rate (×2 and ×4), we split the dataset into training, validation, and test sets with a ratio of 0.8:0.1:0.1. Standard data augmentation operations, including flipping and rotation, are applied during training.
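The down-sampling, split, and augmentation steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. The following are all assumptions:

- the function names;
- the toy 16×16 image;
- the rounding that turns 0.8:0.1:0.1 into integer set sizes;
- using all eight flip/rotation combinations as the augmentation set.

```python
import numpy as np

def downsample(full: np.ndarray, scale: int) -> np.ndarray:
    """Keep every `scale`-th pixel along both lateral axes of a 2D MAP image."""
    return full[::scale, ::scale]

def split_counts(n: int, ratios=(0.8, 0.1, 0.1)) -> tuple:
    """Hypothetical rounding: floor train/val sizes, give the remainder to the test set."""
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return n_train, n_val, n - n_train - n_val

def flips_and_rotations(img: np.ndarray) -> list:
    """Assumed augmentation set: 4 rotations by 90 degrees, with and without a horizontal flip."""
    variants = []
    for base in (img, np.fliplr(img)):
        for k in range(4):
            variants.append(np.rot90(base, k))
    return variants

full = np.arange(16 * 16, dtype=np.float32).reshape(16, 16)  # toy full-sampling MAP image
for s in (2, 4):
    low = downsample(full, s)
    print(s, low.shape, low.size / full.size)
# 2 (8, 8) 0.25
# 4 (4, 4) 0.0625
print(split_counts(268))  # (214, 26, 28)
```

Strided slicing with step `scale` keeps 1/scale of the pixels per axis. This gives the 1/4 and 1/16 pixel fractions stated for ×2 and ×4 scaling.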

2.2 Network Architecture and Settings

Fig. 2 shows the architecture of the proposed CNN. We use 16 residual blocks and 8 squeeze-and-excitation (SE) blocks [16] as the key components for feature extraction. The residual blocks, detailed in Fig. 2(b), are inspired by SRGAN [20] and extract features effectively in SR tasks. Moreover, we find that the SE block [16] (Fig. 2(c)), with its channel-wise attention mechanism, improves network convergence and performance. The “Upconv” block consists of an upsampling layer followed by a standard convolutional layer (kernel size 3, 256 filters, stride 1). The final output layer is followed by a Tanh activation function.
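The following NumPy sketch shows the squeeze, excitation, and scale steps of an SE block [16] on a single feature map. It is illustrative only. The reduction ratio r = 4, the feature-map size, and the random weights are hypothetical, because the text does not specify them. In a Keras model, the same structure would typically be built from `GlobalAveragePooling2D`, two `Dense` layers, and a `Multiply` layer.

```python
import numpy as np

def se_block(x: np.ndarray, w1: np.ndarray, b1: np.ndarray,
             w2: np.ndarray, b2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation applied to a feature map x of shape (H, W, C)."""
    # Squeeze: global average pooling gives one descriptor per channel, shape (C,)
    z = x.mean(axis=(0, 1))
    # Excitation: bottleneck fully connected layer + ReLU, then expansion + sigmoid
    s = np.maximum(z @ w1 + b1, 0.0)
    s = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))  # per-channel weights in (0, 1)
    # Scale: reweight every channel of x by its attention weight (broadcast over H, W)
    return x * s

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4  # hypothetical sizes; r is the assumed reduction ratio
x = rng.standard_normal((H, W, C))
w1, b1 = rng.standard_normal((C, C // r)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C // r, C)), np.zeros(C)
y = se_block(x, w1, b1, w2, b2)
print(y.shape)  # (8, 8, 16)
```

The block preserves the feature-map shape and only rescales whole channels. This is why it can be placed after residual blocks without changing the rest of the network.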

Figure 2: The architecture of the proposed CNN model. (a) The overview of the proposed CNN model. (b) The details of each residual block. (c) The details of each SE block.

The CNN model is trained with a perceptual loss. Pixel-wise mean squared error (MSE) loss significantly improves pixel-wise metrics such as peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [32]. However, as indicated in [20, 7, 26, 10, 19], the generated images tend to be overly smooth and look poor from a subjective point of view. This phenomenon is quite severe in our PAM images (demonstrated later). For the perceptual loss, we instead calculate the MSE on the output feature map of the 7th convolutional layer of VGG19 [20, 7, 19, 28, 11], which gives a high-level feature description of the image. The VGG19 model is pretrained on the ImageNet dataset [8]. Given the prediction $\hat{I}$ and the ground truth $I$, denote their feature maps from the 7th convolutional layer of VGG19 by $\phi(\hat{I})$ and $\phi(I)$, respectively. The perceptual MSE loss is then

$$\mathcal{L}_{\text{percep}} = \frac{1}{WHC} \sum_{x=1}^{W} \sum_{y=1}^{H} \sum_{c=1}^{C} \left( \phi(I)_{x,y,c} - \phi(\hat{I})_{x,y,c} \right)^2,$$

where $W$, $H$, and $C$ denote the dimensions of the feature map.
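The perceptual MSE described above averages the squared differences between the VGG19 feature maps of the prediction and the ground truth over every position of the feature map. The sketch below computes it for two precomputed feature maps. How the features are obtained is an assumption here. In Keras, one could build a sub-model of `tf.keras.applications.VGG19(weights="imagenet", include_top=False)` that outputs the layer `"block3_conv3"`, which is the 7th convolutional layer when counting from the input. Grayscale PAM images would also need to be replicated to three channels. The text states neither detail.

```python
import numpy as np

def perceptual_mse(phi_pred: np.ndarray, phi_gt: np.ndarray) -> float:
    """Mean squared difference between two feature maps of shape (W, H, C)."""
    if phi_pred.shape != phi_gt.shape:
        raise ValueError("feature maps must have the same shape")
    w, h, c = phi_gt.shape
    diff = phi_gt.astype(np.float64) - phi_pred.astype(np.float64)
    return float(np.sum(diff ** 2) / (w * h * c))

# Hypothetical feature maps standing in for VGG19 activations
phi_gt = np.zeros((4, 4, 2))
phi_pred = np.full((4, 4, 2), 0.5)
print(perceptual_mse(phi_pred, phi_gt))  # 0.25
```

The loss is an ordinary MSE. Only its input changes, from pixels to learned features, so gradients push the network to match feature statistics rather than individual pixel values.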

In our experiments, the proposed CNN model is implemented using the Keras framework with a TensorFlow backend. The Adam optimizer is used with a learning rate of 2e-4. A single NVIDIA 2080 Ti GPU is used for training.
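The sketch below applies one Adam update to a scalar parameter at the stated learning rate of 2e-4, to make the optimizer setting concrete. The moment coefficients and epsilon (beta1 = 0.9, beta2 = 0.999, eps = 1e-7) are the tf.keras defaults. They are an assumption here, because this text does not give the values used in the paper.

```python
import math

def adam_step(w, g, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update of scalar parameter w with gradient g at step t (1-based).

    beta1, beta2, and eps default to assumed tf.keras values, not values from the paper.
    """
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = adam_step(1.0, g=3.0, m=0.0, v=0.0, t=1)
print(w)  # approximately 1.0 - 2e-4
```

Because of bias correction, the first update has a magnitude of roughly lr, whatever the scale of the gradient.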

3 Results and Analysis

3.1 Leaf Vein Experiment by the Down-sampling Method for Testing

Fig. 3 shows two representative image restoration results for the ×4 scaling rate (results for the ×2 scaling rate can be found in Supplementary Material). For comparison, we also apply two representative methods: bicubic interpolation and a re-trained EDSR model [23]. EDSR is a typical and effective CNN-based method originally designed for natural images.

Figure 3: Example results of the leaf vein experiment. The numbers below images indicate the PSNR (dB) and SSIM values compared with the corresponding ground truth. The first sample is from a magnolia leaf and the second is from an oak leaf.
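The PSNR and SSIM values in the figures and tables compare a restored image against its full-sampling ground truth. Below is a minimal NumPy sketch of both metrics, assuming images normalized to [0, 1]. The SSIM here is a simplified single-window version computed over the whole image. The standard SSIM [32] averages the index over local Gaussian-weighted windows, as `skimage.metrics.structural_similarity` does, for example. The text does not say which implementation was used, so values from this sketch should not be expected to match the reported ones.

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

def global_ssim(ref: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizing constants with the usual K1 = 0.01, K2 = 0.03
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    num = (2 * mx * my + c1) * (2 * cov + c2)   # luminance and contrast-structure terms
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)

ref = np.zeros((4, 4))
restored = np.full((4, 4), 0.1)  # uniform error of 0.1 gives MSE = 0.01
print(round(psnr(ref, restored), 6))  # 20.0
```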

The zoomed-in images (green boxes) show that the two CNN-based methods are superior to bicubic interpolation. First, the bicubic interpolation results are blurred and overly smoothed. Second, the low-sampling images suffer from discontinuities, which bicubic interpolation does not recover. By contrast, the CNN methods show no such issues, and their recovered images look more natural and closer to the full-scanning ones. Bicubic interpolation (like other conventional methods) uses weighted averages of pixel values in a local area. The CNN model, in contrast, can learn higher-level (or more global) information to predict pixel values better. With bicubic interpolation, these issues (over-smoothing, blurring, and discontinuity) become more severe from the 1/4 to the 1/16 low-sampling case. With our CNN method, the quality of the recovered images remains almost the same across the two cases (see Supplementary Material for the 1/4 low-sampling cases). That is, as the scaling increases, the advantage of our method over bicubic interpolation becomes more apparent.

The statistical results on the test set in Table 1 show a similar trend. For the ×4 scaling case, compared with bicubic interpolation, our model improves the PSNR and SSIM values (averaged over the test set) by 3.1819 dB and 0.1386, respectively. In addition, our model outperforms the re-trained EDSR model on both metrics at both ×2 and ×4 scaling.

Method    ×2 PSNR (dB)  ×2 SSIM  ×4 PSNR (dB)  ×4 SSIM
Bicubic   23.4936       0.7721   19.9941       0.5773
EDSR      24.2356       0.5955   21.5557       0.6264
Ours      26.1431       0.8183   23.1760       0.7159
Table 1: Leaf vein experiment: Comparison of PSNR and SSIM values.
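The improvements quoted in the text can be checked directly against Table 1. The short script below recomputes the PSNR and SSIM gains of our model over each baseline. The dictionary layout and the helper name are illustrative.

```python
# Values copied from Table 1 (PSNR in dB, SSIM unitless)
table1 = {
    "Bicubic": {"x2": (23.4936, 0.7721), "x4": (19.9941, 0.5773)},
    "EDSR":    {"x2": (24.2356, 0.5955), "x4": (21.5557, 0.6264)},
    "Ours":    {"x2": (26.1431, 0.8183), "x4": (23.1760, 0.7159)},
}

def gain(method: str, baseline: str, scale: str) -> tuple:
    """(PSNR gain, SSIM gain) of `method` over `baseline`, rounded to table precision."""
    (p1, s1), (p0, s0) = table1[method][scale], table1[baseline][scale]
    return round(p1 - p0, 4), round(s1 - s0, 4)

for scale in ("x2", "x4"):
    print(scale, "vs bicubic:", gain("Ours", "Bicubic", scale),
          "vs EDSR:", gain("Ours", "EDSR", scale))
```

For ×4 scaling, the gains over bicubic interpolation come out as 3.1819 dB and 0.1386, matching the text. For ×2 scaling, they are 2.6495 dB and 0.0462.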

3.2 Leaf Vein Experiment Using Experimentally-acquired Sparse Data for Verification

To further verify the feasibility of our method in a setting closer to practical applications, we also feed experimentally acquired sparse PAM images to our trained CNN. This complements the low-sampling images generated by the operation in Fig. 1. We scan the same ROI with scanning step sizes of 8 µm and 16 µm (or 32 µm), so that a full-sampling image and a low-sampling image are experimentally acquired for each ROI. The low-sampling PAM images are used as the input, and the corresponding full-sampling images are used as the reference. Supplementary Material shows one representative result (Fig. S3). There, our CNN model shows the same advantages as before: no over-smoothing, blurring, or discontinuity. These results verify that combining sparse scanning with CNN post-processing enables fast PAM imaging. The resulting images are of similar quality to the far more time-consuming full-sampling images.
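The experimentally acquired image pairs differ only in step size, so the expected saving in acquisition time follows from counting raster points. The sketch rests on two assumptions that do not come from the text:

- the ROI is a hypothetical square with a 2048 µm side;
- acquisition time is proportional to the number of scanning points (e.g., one laser pulse per point, ignoring stage turnaround overhead).

```python
def scan_points(roi_um: float, step_um: float) -> int:
    """Number of raster points for a square ROI scanned with a given step size."""
    n = int(roi_um // step_um)  # points per lateral dimension
    return n * n

roi = 2048.0  # hypothetical ROI side length in micrometres
full = scan_points(roi, 8.0)  # full-sampling step size used in the experiments
for step in (16.0, 32.0):
    sparse = scan_points(roi, step)
    # Under the time-proportional-to-points assumption, this ratio is the time fraction
    print(step, sparse, sparse / full)
# 16.0 16384 0.25
# 32.0 4096 0.0625
```

Doubling or quadrupling the step size reduces the point count, and hence the assumed acquisition time, to 1/4 or 1/16. This matches the pixel fractions of the ×2 and ×4 cases.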

3.3 Ablation Investigation

As shown in Fig. 2, we apply the SE block [16] after some of the residual blocks. With its channel-wise attention design, the SE block is thought to be useful for selecting channel information. In our experiments, the CNN without SE blocks performs noticeably worse. For example, on the ×4 test set, SE blocks improve the PSNR and SSIM values by 1.3876 dB and 0.0936, respectively. Detailed comparison results can be found in Supplementary Material (Table S1).

Figure 4: Example results of CNN models with pixel-wise MSE loss and perceptual loss. The numbers below restored images indicate the corresponding PSNR (dB) / SSIM values. The sample comes from an oak leaf.

The perceptual loss is one of the most critical settings of our method. As explained before, training with pixel-wise MSE tends to converge to a pixel-wise average of plausible solutions, which loses fine texture [20, 7, 26, 10, 19]. To illustrate this problem, we trained a pair of models: one with the perceptual loss function and the other with the standard pixel-wise MSE loss function. Fig. 4 shows example results. In Fig. 4, the middle image (pixel-wise MSE loss) has higher PSNR and SSIM values than the right one (perceptual loss). However, the middle image is so smooth that it differs considerably from the ground truth (i.e., the full-sampling image) perceptually. Some small branches in the middle PAM image even disappear (e.g., the parts indicated by the blue arrows in Fig. 4). By contrast, the right PAM image looks very much like the corresponding ground truth (e.g., it restores more of the ground-truth textures), which may matter more for biomedical applications. It is therefore essential to apply such a perceptual loss function.
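A toy example helps explain why a pixel-wise MSE loss favors averaged, overly smooth outputs. Suppose two equally likely sharp profiles are consistent with the same sparse input and differ only by a one-pixel shift of an edge. The hypothetical 1D profiles below show that the prediction minimizing expected MSE is their pixel-wise average. That average is blurred and matches neither sharp candidate.

```python
import numpy as np

# Two hypothetical, equally plausible sharp profiles: the same edge shifted by one pixel
a = np.array([0.0, 1.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 1.0])
candidates = [a, b]

def expected_mse(pred: np.ndarray) -> float:
    """Pixel-wise MSE averaged over the equally likely ground-truth candidates."""
    return float(np.mean([np.mean((pred - c) ** 2) for c in candidates]))

avg = (a + b) / 2  # [0, 0.5, 1, 0.5]: a blurred compromise
print(expected_mse(a), expected_mse(b), expected_mse(avg))
# 0.25 0.25 0.125
```

The blurred average scores better under MSE than either sharp answer. This is the behavior that the perceptual loss is intended to mitigate [20, 19].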

3.4 In vivo Experiment

Our model is trained only on the leaf vein dataset and, without transfer learning, applied directly to PAM images of mouse ears and eyes. It still shows improvements both quantitatively and qualitatively. Fig. 5 shows a representative sample. The full-sampling PAM image is acquired using a probe with a resolution of 4 µm and a scanning step size of 3 µm. As can be seen, the PAM images recovered by our method show sharper edges and more distinguishable patterns than those recovered by bicubic interpolation and EDSR. For a closer comparison, Supplementary Material plots one-dimensional (1D) profiles along the dashed green lines in Fig. 5 (Fig. S4).

Figure 5: Demonstration of in vivo PAM images of blood vessels of the mouse ear. The numbers below images show the PSNR (dB) and SSIM values compared with the corresponding ground truth. 1D profiles along the dashed green lines in the zoomed-in images are given in Supplementary Material.
Figure 6: Demonstration of in vivo PAM images of blood vessels of the mouse eye. The numbers below images show the PSNR (dB) and SSIM values compared with the corresponding ground truth.

We further test our model on patterns other than the tree-like patterns (i.e., with branches, sub-branches, etc.) shown so far. To this end, we tested an in vivo PAM image of the blood vessels of a mouse eye, which form radial patterns. The raw data were acquired by a probe with a resolution of 3 µm and a scanning step size of 4 µm [13]. Fig. 6 shows the results. Our method produces a better-recovered PAM image than bicubic interpolation and EDSR, which is clearest in the zoomed-in images. Our CNN model was trained on tree-like pattern images, yet it still performs well on an image with radial patterns, which shows a degree of robustness.

4 Conclusion

We propose a novel CNN-based method to improve the quality of sparse PAM images, which effectively increases PAM imaging speed. The model is trained on a dataset of PAM images of leaf samples. Residual blocks, SE blocks, and the perceptual loss function are essential components of our CNN model. We tested both 1/4 and 1/16 low-sampling sparse PAM images (i.e., the ×2 and ×4 scaling cases, respectively), and the proposed CNN method showed remarkable performance both quantitatively and qualitatively. We also tested our method on in vivo PAM images of blood vessels in mouse ears and eyes, and the recovered PAM images closely resembled the full-sampling ones. We demonstrated this approach to sparse data in OR-PAM. It may also be applicable to acoustic-resolution PAM (AR-PAM) and to other point-by-point scanning imaging modalities, such as optical coherence tomography and confocal fluorescence microscopy. Our work opens up new opportunities for fast PAM imaging.


  • [1] D. Allman, A. Reiter, and M. A. L. Bell (2018) Photoacoustic source detection and reflection artifact removal enabled by deep learning. IEEE Trans. Med. Imaging 37 (6), pp. 1464–1477. Cited by: §1.
  • [2] E. M. A. Anas, H. K. Zhang, J. Kang, and E. Boctor (2018) Enabling fast and high quality LED photoacoustic imaging: a recurrent neural networks based approach. Biomed. Opt. Express 9 (8), pp. 3852–3866. Cited by: §1.
  • [3] S. Antholzer, M. Haltmeier, R. Nuster, and J. Schwab (2018) Photoacoustic image reconstruction via deep learning. In Photons Plus Ultrasound: Imaging and Sensing 2018, Vol. 10494, pp. 104944U. Cited by: §1.
  • [4] S. Antholzer, M. Haltmeier, and J. Schwab (2019) Deep learning for photoacoustic tomography from sparse data. Inverse. Probl. Sci. En. 27 (7), pp. 987–1005. Cited by: §1.
  • [5] P. Beard (2011) Biomedical photoacoustic imaging. Interface Focus 1 (4), pp. 602–631. Cited by: §1.
  • [6] A. G. Bell (1880) ART. xxxiv.–on the production and reproduction of sound by light. Am. J. Sci. (1880-1910) 20 (118), pp. 305. Cited by: §1.
  • [7] J. Bruna, P. Sprechmann, and Y. LeCun (2015) Super-resolution with deep convolutional sufficient statistics. arXiv preprint arXiv:1511.05666. Cited by: §2.2, §3.3.
  • [8] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: §2.2.
  • [9] C. Dong, C. C. Loy, K. He, and X. Tang (2015) Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38 (2), pp. 295–307. Cited by: §1.
  • [10] A. Dosovitskiy and T. Brox (2016) Generating images with perceptual similarity metrics based on deep networks. In Advances in Neural Information Processing Systems, pp. 658–666. Cited by: §2.2, §3.3.
  • [11] L. Gatys, A. S. Ecker, and M. Bethge (2015) Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 262–270. Cited by: §2.2.
  • [12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680. Cited by: §1.
  • [13] Z. Guo, Y. Li, and S. Chen (2018) Miniature probe for in vivo optical-and acoustic-resolution photoacoustic microscopy. Opt. Lett. 43 (5), pp. 1119–1122. Cited by: §3.4.
  • [14] P. Hajireza, W. Shi, K. Bell, R. J. Paproski, and R. J. Zemp (2017) Non-interferometric photoacoustic remote sensing microscopy. Light: Sci. Appl. 6 (6), pp. e16278–e16278. Cited by: §1.
  • [15] T. Harrison, J. C. Ranasinghesagara, H. Lu, K. Mathewson, A. Walsh, and R. J. Zemp (2009) Combined photoacoustic and ultrasound biomicroscopy. Opt. Express 17 (24), pp. 22041–22046. Cited by: §1.
  • [16] J. Hu, L. Shen, and G. Sun (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141. Cited by: §2.2, §3.3.
  • [17] T. Imai, J. Shi, T. T. Wong, L. Li, L. Zhu, and L. V. Wang (2018) High-throughput ultraviolet photoacoustic microscopy with multifocal excitation. J. Biomed. Opt. 23 (3), pp. 036007. Cited by: §1.
  • [18] S. Jeon, J. Kim, D. Lee, B. J. Woo, and C. Kim (2019) Review on practical photoacoustic microscopy. Photoacoustics, pp. 100141. Cited by: §1.
  • [19] J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, pp. 694–711. Cited by: §2.2, §3.3.
  • [20] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690. Cited by: §1, §2.2, §2.2, §3.3.
  • [21] G. Li, K. I. Maslov, and L. V. Wang (2013) Reflection-mode multifocal optical-resolution photoacoustic microscopy. J. Biomed. Opt. 18 (3), pp. 030501. Cited by: §1.
  • [22] J. Liang, Y. Zhou, A. W. Winkler, L. Wang, K. I. Maslov, C. Li, and L. V. Wang (2013) Random-access optical-resolution photoacoustic microscopy using a digital micromirror device. Opt. Lett. 38 (15), pp. 2683–2686. Cited by: §1.
  • [23] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144. Cited by: §3.1.
  • [24] T. Liu, M. Sun, Y. Liu, D. Hu, Y. Ma, L. Ma, and N. Feng (2019) ADMM based low-rank and sparse matrix recovery method for sparse photoacoustic microscopy. Biomed. Signal Process. 52, pp. 14–22. Cited by: §1.
  • [25] W. Liu and J. Yao (2018) Photoacoustic microscopy: principles and biomedical applications. Biomed. Eng. Lett. 8 (2), pp. 203–213. Cited by: §1.
  • [26] M. Mathieu, C. Couprie, and Y. LeCun (2015) Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440. Cited by: §2.2, §3.3.
  • [27] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In MICCAI 2015, N. Navab, J. Hornegger, W.M. Wells, and A.F. Frangi (Eds.), Vol. 9351, Cham, pp. 234–241. Cited by: §1.
  • [28] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §2.2.
  • [29] L. Wang, K. I. Maslov, W. Xing, A. Garcia-Uribe, and L. V. Wang (2012) Video-rate functional photoacoustic microscopy at depths. J. Biomed. Opt. 17 (10), pp. 106007. Cited by: §1.
  • [30] L. V. Wang and S. Hu (2012) Photoacoustic tomography: in vivo imaging from organelles to organs. Science 335 (6075), pp. 1458–1462. Cited by: §1.
  • [31] L. V. Wang and J. Yao (2016) A practical guide to photoacoustic tomography in the life sciences. Nat. Methods 13 (8), pp. 627. Cited by: §1.
  • [32] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13 (4), pp. 600–612. Cited by: §2.2.
  • [33] J. Yao, L. Wang, J. Yang, K. I. Maslov, T. T. Wong, L. Li, C. Huang, J. Zou, and L. V. Wang (2015) High-speed label-free functional photoacoustic microscopy of mouse brain in action. Nat. Methods 12 (5), pp. 407–410. Cited by: §1.
  • [34] J. Yao and L. V. Wang (2013) Photoacoustic microscopy. Laser Photonics Rev. 7 (5), pp. 758–778. Cited by: §1.
  • [35] J. Yao and L. V. Wang (2018) Recent progress in photoacoustic molecular imaging. Curr. Opin. Chem. Biol. 45, pp. 104–112. Cited by: §1.

Supplementary Material

Fig. S1. Schematic of the OR-PAM system. BS, beam splitter; PD, photodiode; ND, neutral density filter; L1, lens #1; L2, lens #2; L3, lens #3; FC, fiber coupler; SMF, single-mode fiber; UST, ultrasound transducer; Amp, preamplifier; DAQ, data acquisition card; PC, personal computer.

Fig. S2. Example ×2 scaling results of the leaf vein experiment. The numbers below the images indicate the PSNR (dB) and SSIM values compared with the corresponding ground truth. Both samples come from magnolia leaves.

Fig. S3. Verification with experimentally acquired sparse data. The low-sampling image (from an oak leaf) is acquired experimentally using a larger scanning step size. The numbers below the images indicate the PSNR and SSIM values compared with the corresponding full-scanning image. Noise and laser fluctuations might affect the quantitative calculation.

Fig. S4. 1D profiles along the dashed green lines in Fig. 5 of the main text.

Table S1. Ablation results with and without SE blocks.
Model               ×2 PSNR (dB)  ×2 SSIM  ×4 PSNR (dB)  ×4 SSIM
Without SE blocks   24.9429       0.8124   21.7884       0.6223
With SE blocks      26.1431       0.8183   23.1760       0.7159