OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing

by Berkan Lafci, et al.
Universität Zürich
ETH Zurich

Optoacoustic (OA) imaging is based on the excitation of biological tissues with nanosecond-duration laser pulses followed by detection of the ultrasound waves generated via light-absorption-mediated thermoelastic expansion. OA imaging features a powerful combination of rich optical contrast and high resolution in deep tissues. This has enabled the exploration of a number of attractive new applications in both clinical and laboratory settings. However, no standardized datasets generated with different types of experimental set-ups and associated processing methods are available to facilitate advances in broader applications of OA in clinical settings. This complicates an objective comparison between new and established data processing methods, often leading to qualitative results and arbitrary interpretations of the data. In this paper, we provide both experimental and synthetic OA raw signals and reconstructed image-domain datasets rendered with different experimental parameters and tomographic acquisition geometries. We further provide trained neural networks to tackle three important challenges related to OA image processing, namely accurate reconstruction under limited view tomographic conditions, removal of spatial undersampling artifacts, and anatomical segmentation for improved image reconstruction. Specifically, we define 18 experiments corresponding to the aforementioned challenges as benchmarks to be used as a reference for the development of more advanced processing methods.






1 Introduction

Optoacoustic (OA) imaging is being established as a powerful method with increasing application areas in clinical [38, 39] and preclinical settings [21, 27]. OA images of biological tissues are tomographically reconstructed from the acquired ultrasound (US) waves thermoelastically excited with nanosecond-duration pulsed lasers operating in the near-infrared optical wavelength range (Fig. 1a). The rich optical contrast from endogenous tissue chromophores such as blood, melanin and lipids is combined with high US resolution, i.e., tens of micrometers. This unique feature makes OA particularly suitable for molecular and functional imaging. Other important advantages such as the feasibility of hand-held operation, the fast acquisition performance (real-time feedback) and the non-invasive safe contrast (i.e., non-ionizing radiation) further foster the wide use of OA in multiple biomedical studies. OA imaging has been shown to provide unique capabilities in preclinical studies with disease models, e.g., of breast cancer [26, 9, 2], as well as for the clinical assessment of Crohn’s disease [20], atherosclerotic carotid plaques [18] or skin cancer [6]. As the range of applications of OA imaging broadens, the need for different data processing pipelines increases in parallel. Also, new methods are continuously being developed to provide enhanced OA performance. Specific examples include increased temporal resolution with compressed/sparse data acquisitions [31], accurate image reconstruction algorithms [7], light fluence correction by segmenting the tissue boundaries [22] and enhanced spectral unmixing algorithms for multispectral data [40].

Three main challenges in OA tomographic imaging are summarized below:
Sparse acquisition: OA imaging provides a unique potential to monitor fast-changing events such as cardiac arrhythmias [32], neuronal activity [34] or indocyanine green clearance [12] in vivo. For this, ultra-fast imaging systems capable of capturing changes in living organisms occurring at up to millisecond temporal scales are required. The main limiting factor affecting the achievable frame rate is the data transfer capacity. This limitation can be alleviated by reducing the number of acquired channels (signals). Therefore, sparse or compressed sensing methods have been proposed using both conventional methods [31] and deep learning algorithms.
Limited view reconstruction: OA is inherently a tomographic imaging modality. Acquisition of pressure signals from different angles is essential to capture the information encoded in US waves traveling in a 3D medium and to render accurate tomographic reconstructions. This further increases image contrast, resolution and quantitativeness. However, tomographic coverage of the samples is often hindered by physical restrictions. Thereby, new image processing pipelines have been suggested to address limited-view-associated challenges in OA imaging using learning-based algorithms in the image domain [13], the signal domain [19] and a combination of both domains [4, 23].
Segmentation: Optimal OA reconstruction algorithms need to account for different optical and acoustic properties in biological tissues and in the coupling medium (water). For example, the speed of sound (SoS) depends on the elastic properties of the medium. Proper assignment of SoS values in tissues and in water requires accurate delineation of the tissue boundaries. Thereby, segmentation of structures [22, 36] in OA images has been shown to enhance image reconstruction performance. Additionally, the optical fluence (intensity) varies with depth across different tissues. This issue remains one of the main factors affecting quantification in OA images [1] and can also be corrected with tissue segmentation [25].

As an emerging method, OA imaging requires standardization, open source code publication and data sharing practices to expedite the development of new application areas and data processing pipelines. In particular, the aforementioned challenges associated with high data throughput, limited angular coverage, SoS assignment and fluence corrections require coordinated efforts between the experimental and data science communities. Initial efforts to standardize data storage formats and image reconstruction algorithms have been undertaken [10]. However, proper testing of new signal/image processing algorithms requires access to larger datasets.

Here, we provide experimental data and simulations of forearm datasets as well as benchmark networks, aiming to facilitate the development and benchmarking of new image processing algorithms. These “Experimental and Synthetic Clinical Optoacoustic Data (OADAT)” include: (i) large and varied clinical and simulated forearm datasets with paired subsampled or limited view image reconstruction counterparts, (ii) raw signal acquisition data for each such image reconstruction, (iii) definitions of 18 experiments with gold standards focusing on the aforementioned OA challenges, (iv) pretrained model weights of the networks used for each task, and (v) user-friendly scripts to load and evaluate the networks on our datasets. The presented datasets and algorithms will expedite research in OA image processing.

Figure 1: Experimental data acquisition, transducer arrays and resulting images. (a) Experimental set-up for optoacoustic forearm imaging. (b) Semi circle array along with an example of acquired images. (c) Multisegment array along with an example of acquired images. (d) Uniform subsampling for the semi circle array (32, 64 and 128 elements), limited view acquisition for the semi circle array with reduced angular coverage (128 elements) and linear array acquisition for the multisegment array (128 elements). Transducer elements are shown as actively receiving (red) or off (white).

2 Background

Below, we explain the transducer arrays used for data acquisition, the reconstruction algorithm used to generate images from acquired signals and the sampling/acquisition techniques.

2.1 Transducer arrays

Semi circle array contains 256 transducer elements distributed equidistantly over a semi circle (concave surface, Fig. 1b). Multisegment array is a combination of a linear array in the center and concave parts on the right and left sides, designed to increase angular coverage, as shown in Fig. 1c. The linear part contains 128 elements and each of the concave parts consists of 64 elements, totaling 256. Linear array is the central part of the multisegment array with 128 elements (Fig. 1c). This array geometry is optimized for US data acquisitions with planar waves. Hence, it produces OA images with limited view artifacts due to the reduced angular coverage, which is a limiting factor for OA image acquisitions. Virtual circle array is generated to simulate images with 360 degree angular coverage and yields artifact-free reconstructions (Fig. 2a). It contains 1,024 transducer elements distributed over a full circle with equal spacing. We also use a virtual multisegment array that corresponds to its physical counterpart. Additional geometric details are listed in the supplementary material.
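The element layouts above can be sketched by spacing points equidistantly along a circular arc. In the snippet below, the 40 mm radius and the coordinate convention are placeholders for illustration, not the arrays' actual specifications (see supplementary material for those):

```python
import numpy as np

def arc_element_positions(n_elements, radius, angular_coverage, center_angle=0.0):
    """Cartesian (x, y) positions of transducer elements spread equidistantly
    over a circular arc spanning `angular_coverage` radians."""
    angles = center_angle + np.linspace(-angular_coverage / 2,
                                        angular_coverage / 2, n_elements)
    return np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)

# Semi circle array: 256 elements over pi radians (radius is a placeholder).
semi_circle = arc_element_positions(256, radius=0.04, angular_coverage=np.pi)

# Virtual circle array: 1,024 elements over the full circle
# (endpoint=False avoids duplicating the first element at 2*pi).
angles = np.linspace(0.0, 2 * np.pi, 1024, endpoint=False)
virtual_circle = np.stack([0.04 * np.cos(angles), 0.04 * np.sin(angles)], axis=1)
```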

Figure 2: Overview of the simulated data. (a) Virtual circle array and an image reconstructed using 1,024 transducer elements. (b) Multisegment array and the corresponding image reconstructed using combined linear and concave parts of the transducer array. (c) Uniform subsampling of virtual circle array with 32, 64 and 128 elements, limited view acquisition with reduced angular coverage (128 elements) and linear array acquisition (128 elements). (d) Vessel size distribution (pixels per vessel), number of vessels per image, and peak signal-to-noise ratio of full sampling compared to other reconstructions (x axis naming conventions are explained in Sec. 3.3). Transducer elements are shown as actively receiving (red) or off (white).

2.2 Reconstruction method

We use the backprojection algorithm in this study to generate OA images from the acquired signals (Python module for OA reconstruction: github.com/berkanlafci/pyoat). This algorithm is based on the delay-and-sum beamforming approach [30] (see supplementary material for details). First, a mesh grid is created to represent the imaged field of view. Then, the distances between the points of the mesh grid and the transducer elements are calculated based on the known locations of the array elements. Time of flight is obtained by dividing distance by the SoS values, which are assigned based on the temperature of the imaging medium and the tissue properties. The clinical and simulated data are reconstructed with an SoS of 1,510 m/s in this study, as the simulations and experiments were done at the corresponding imaging-medium temperature.
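The steps above can be sketched as a naive (unoptimized) delay-and-sum loop. The sampling rate and field-of-view values below are placeholders, not the acquisition parameters used in this study:

```python
import numpy as np

def backproject(signals, element_pos, fov_extent, n_pixels, sos=1510.0, fs=40e6):
    """Naive delay-and-sum backprojection.

    signals:     (n_elements, n_samples) raw pressure signals
    element_pos: (n_elements, 2) transducer coordinates in meters
    fov_extent:  half-width of the square field of view in meters
    sos: speed of sound in m/s; fs: sampling rate in Hz (placeholder value)
    """
    # 1) mesh grid representing the imaged field of view
    xs = np.linspace(-fov_extent, fov_extent, n_pixels)
    gx, gy = np.meshgrid(xs, xs)
    image = np.zeros((n_pixels, n_pixels))
    n_samples = signals.shape[1]
    for sig, (ex, ey) in zip(signals, element_pos):
        # 2) distance from every grid point to this element
        dist = np.hypot(gx - ex, gy - ey)
        # 3) time of flight -> sample index (clipped to the recorded window)
        idx = np.clip((dist / sos * fs).astype(int), 0, n_samples - 1)
        # 4) delay and sum the signal contributions
        image += sig[idx]
    return image / len(signals)
```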

2.3 Sparse sampling

Sparse sampling yields streak artifacts on reconstructed images due to large inter-element pitch size. For a given angular coverage, i.e., transducer geometry, using fewer transducer elements for reconstruction causes stronger artifacts due to the increased inter-element pitch size. We define the sparse sampled semi circle array acquisitions semi circle sparse 32, semi circle sparse 64 and semi circle sparse 128 as using 32, 64 and 128 elements, respectively, out of the 256 of the semi circle array (Fig. 1d). Similarly, we define the sparse sampled virtual circle array acquisitions virtual circle sparse 32, virtual circle sparse 64 and virtual circle sparse 128 as using 32, 64 and 128 elements, respectively, out of the 1,024 of the virtual circle array (Fig. 2c). All items correspond to uniform and hence equidistant subsampling of the corresponding transducer array signals.
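Uniform subsampling then amounts to striding over the channel axis of the signal matrix; the sample count per channel below (2,030) is a placeholder:

```python
import numpy as np

def uniform_subsample(signals, n_keep):
    """Keep n_keep equidistantly spaced channels from the first axis of signals."""
    n_total = signals.shape[0]
    assert n_total % n_keep == 0, "uniform stride requires divisibility"
    step = n_total // n_keep
    return signals[::step]

full = np.random.randn(256, 2030)       # semi circle array, all 256 channels
sparse64 = uniform_subsample(full, 64)  # semi circle sparse 64
```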

2.4 Limited View

Limited view acquisitions lead to distorted geometry (e.g., elongated vessels) due to the reduced angular coverage (Figs. 1d & 2c). To mimic commonly occurring limited view settings, we use a contiguous subset of elements for a given transducer. This corresponds to retaining the inter-element pitch size while reducing the angular coverage. We define a limited view acquisition for each transducer array as follows: (i) Linear array is the common practice in clinical settings for US data acquisition [17, 24]. Typically, the same linear geometry is combined with OA imaging to provide complementary information [28]. To model this clinically realistic scenario, we use the linear part of the multisegment array for OA image reconstruction. (ii) Semi circle limited 128 uses half of the semi circle array (128 transducer elements), yielding a quarter circle. The differences between the linear and semi circle limited view acquisitions are the inter-element pitch size, focusing and geometry of the active area. (iii) Virtual circle limited 128 uses 128 consecutive elements (one eighth of a circle) out of 1,024.
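A limited view acquisition can likewise be emulated by slicing a contiguous block of channels; the central placement and sample count below are our assumptions for illustration:

```python
import numpy as np

def limited_view(signals, n_keep, start=None):
    """Keep n_keep consecutive channels, preserving inter-element pitch
    while reducing angular coverage. By default, take the central block."""
    n_total = signals.shape[0]
    if start is None:
        start = (n_total - n_keep) // 2
    return signals[start:start + n_keep]

full = np.random.randn(256, 2030)
# Semi circle limited 128: half the semi circle array, i.e. a quarter circle
quarter = limited_view(full, 128)
```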

3 Datasets

We present three datasets (two experimental, one simulated), each with several subcategories targeting different challenges in the domain (link to our datasets: hdl.handle.net/20.500.11850/551512; repository for accessing and reading datasets: github.com/berkanlafci/oadat). The raw signal acquisition data used to reconstruct all images is also provided with the datasets. The experimental datasets also include details about the volunteer Fitzpatrick skin phototype [14], which relates to the amount of melanin pigment in the skin (see supplementary material for distribution and further details). We also display a comparative overview of publicly available and our proposed OA datasets in Table 1.

Table 1: Publicly available OA datasets (Davoudi et al. [3], Huang et al. [15], and our MSFD, SWFD and SCD), compared by supported tasks, provided data format(s) (image reconstruction, raw signal), size (>1k slices), and clinical content. [3] and [15] consist of preclinical (mice) and phantom images.

3.1 Multispectral forearm dataset

Multispectral forearm dataset (MSFD) is collected using the multisegment array (Sec. 2.1) from nine volunteers at six different wavelengths (700, 730, 760, 780, 800, 850 nm) for both arms. The selected wavelengths are particularly aimed at spectral decomposition to separate oxy- and deoxy-hemoglobin [33]. All wavelengths are acquired consecutively, so that an almost identical scene is captured for a given slice across different wavelengths, with slight displacement errors. For each combination of volunteer, arm and wavelength, 1,400 slices are captured, for a total of 151,200 unique signal matrices.

From this data, we reconstruct linear and multisegment array images using the backprojection algorithm, with each set containing 151,200 images.

3.2 Single wavelength forearm dataset

Single wavelength forearm dataset (SWFD) is collected using both the multisegment and semi circle arrays (Sec. 2.1) from 14 volunteers at a single wavelength (1,064 nm) for both arms. The choice of wavelength is based on maximizing penetration depth for the dataset [37]. Out of the 14 volunteers, eight also participated in the MSFD experiment and their unique identifiers match across the dataset files. For each array, volunteer and arm, we acquired 1,400 slices, for a total of 78,400 unique signal matrices. It is important to note that despite the data being acquired from the same volunteers, signals between the multisegment array and the semi circle array are not paired due to physical constraints.

From this data, using the backprojection algorithm we reconstruct (i) linear array images, (ii) multisegment array images, (iii) semi circle array images, (iv) semi circle limited 128 images (Sec. 2.4), (v) semi circle sparse 128 images (Sec. 2.3), (vi) semi circle sparse 64 images (Sec. 2.3), and (vii) semi circle sparse 32 images (Sec. 2.3), where each dataset has 39,200 images.

3.3 Simulated cylinders dataset

Simulated cylinders dataset (SCD) is a group of 20,000 synthetically generated forearm acoustic pressure map images that we heuristically produced based on a range of criteria observed in experimental images. For each image, a skin curve is generated first, and afterwards a random number of ellipses with different intensity profiles are added iteratively (see Fig. 2). We explain the details of the simulation algorithm in the supplementary material (Python module for acoustic map simulation: renkulab.io/gitlab/firat.ozdemir/oa-armsim).
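A toy version of such a generator, assuming nothing beyond the skin-curve-plus-random-ellipses description above (the released oa-armsim module implements the actual criteria), might look like:

```python
import numpy as np

def synth_pressure_map(size=256, seed=0):
    """Toy forearm-like acoustic pressure map: a curved 'skin' line near the
    top plus a random number of elliptical 'vessels' (illustrative only)."""
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size), dtype=np.float32)
    yy, xx = np.mgrid[0:size, 0:size]
    # Skin curve: a shallow parabola near the top of the image, 3 px thick.
    skin = (0.15 * size + 0.1 * size * ((xx[0] - size / 2) / size) ** 2).astype(int)
    for x, y in enumerate(skin):
        img[y:y + 3, x] = 1.0
    # Random number of elliptical vessels below the skin, each with its own
    # intensity, added iteratively.
    for _ in range(rng.integers(3, 12)):
        cx = rng.integers(0, size)
        cy = rng.integers(int(0.25 * size), size)
        a, b = rng.integers(2, 12, size=2)
        mask = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0
        img[mask] = rng.uniform(0.3, 1.0)
    return img
```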

Based on the acoustic pressure map, we generate its annotation map with three labels, corresponding to background, vessels and skin curve. For each acoustic pressure map, we generate signal matrices for the geometries of the linear, multisegment and virtual circle arrays. Using the linear and multisegment array signals, we use the backprojection algorithm to reconstruct (i) linear array images and (ii) multisegment array images. From the virtual circle array signals, we use the backprojection algorithm to reconstruct (i) virtual circle images, (ii) virtual circle limited 128 images (Sec. 2.4), (iii) virtual circle sparse 128 images (Sec. 2.3), (iv) virtual circle sparse 64 images (Sec. 2.3), and (v) virtual circle sparse 32 images (Sec. 2.3), where each dataset has 20,000 images. All seven image reconstruction variations of SCD have corresponding pairs for each of the 20k images, i.e., produced from the same acoustic pressure map.

4 Tasks

Table 2: List of tasks and experiments we define on MSFD, SWFD, and SCD, grouped into sparse sampling correction, limited view correction and semantic segmentation.

Based on the datasets presented in Sec. 3, we define a list of experiments related to image translation to overcome (i) sparse sampling and (ii) limited view artifacts, and semantic segmentation of images.

4.1 Image translation

Through a list of permutations of our datasets, we can define several image translation experiment pairs of varying difficulty where target images are also available (see Table 2). We present sparse reconstructions of both SWFD and SCD for the semi circle and virtual circle arrays. Accordingly, sparse sampling correction experiments learn the mapping functions listed in Table 2, where the function notations denote the dataset used, the task of sparse sampling (ss) correction from the given number of elements used for image reconstruction, and the array that is used to generate the input. Further, we offer limited view reconstructions for all datasets and all arrays. Accordingly, limited view correction experiments learn the mapping functions listed in Table 2, where the function notations denote the dataset used, the task of limited view (lv) correction from the given number of elements used for image reconstruction, and the array that is used to generate the input.

4.2 Semantic segmentation

SCD includes pixel annotations for skin curve, vessels and background. In addition to segmentation of these structures on the ideal reconstructions, we define this task on sparse sampling and limited view reconstructions that contain the relevant artifacts encountered in experimental data. Accordingly, we compose the segmentation experiments listed in Table 2, where the function notations denote the task segmentation (seg), the type of reconstructed input being used (virtual circle, limited view and sparse sampling) and the array that is used to generate the input. All data is generated from SCD and the objective is to match the ground truth annotations of the acoustic pressure map.

5 Experiments and results

Figure 3: Residual convolutional block with batch normalization (BN).

For all experiments we standardize the architecture, which we base on UNet [35]. Specifically, we adopt the five spatial feature abstraction levels and use skip connections to concatenate with features of matching spatial dimension along the upsampling path. However, we make several distinct design choices that differ from vanilla UNet. First, we use attention gates [29] at the end of each skip connection. Second, we opt for residual convolutional blocks with batch normalization [16] at each level, shown in Fig. 3. Third, we use 2D bilinear upsampling instead of deconvolutions. Finally, we use half the number of convolutional kernels at each layer, e.g., we start with 32 convolutional filters as opposed to 64. The full schematic as well as other implementation details are discussed in the supplementary material. We refer to our modified UNet architecture as modUNet hereon.
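A sketch of the residual block of Fig. 3 in PyTorch; the framework choice, the 3x3 kernel size, the two-convolution layout and the 1x1 skip projection are our assumptions, not necessarily the exact design (see the supplementary material for that):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual convolutional block with batch normalization (cf. Fig. 3).
    Layer count and kernel size are illustrative assumptions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the residual skip matches the channel count
        self.skip = (nn.Identity() if in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

# The decoder uses 2D bilinear upsampling instead of deconvolutions, e.g.:
up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
```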

5.1 Data split and preprocessing

We standardize how we split each dataset into training and test sets regardless of the task in order to ensure consistency in our and future experiments. Out of the nine volunteers in MSFD, we use five for training (IDs: 2, 5, 6, 7, 9), one for validation (ID: 10) and three for testing (IDs: 11, 14, 15). Out of the 14 volunteers in SWFD, we use eight for training (IDs: 1, 2, 3, 4, 5, 6, 7, 8), one for validation (ID: 9) and five for testing (IDs: 10, 11, 12, 13, 14). Out of the 20k slices in SCD, we use the first 14k for training, following 1k for validation, and the last 5k for testing.
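These splits can be captured in a small lookup table; the volunteer IDs are transcribed from above, while the zero-based SCD slice indexing is our convention:

```python
# Standardized train/val/test splits used for all tasks.
SPLITS = {
    "MSFD": {"train": [2, 5, 6, 7, 9], "val": [10], "test": [11, 14, 15]},
    "SWFD": {"train": [1, 2, 3, 4, 5, 6, 7, 8], "val": [9],
             "test": [10, 11, 12, 13, 14]},
    # SCD is split by slice index: first 14k train, next 1k val, last 5k test.
    "SCD": {"train": range(0, 14000), "val": range(14000, 15000),
            "test": range(15000, 20000)},
}
```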

As a preprocessing step, all data instances (except for annotation maps) are independently scaled by their maximum and then clipped at the bottom following [8].
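A sketch of this preprocessing; the lower clipping threshold is left as a parameter because the exact value follows [8], so `clip_at=0.0` is an assumption here:

```python
import numpy as np

def preprocess(image, clip_at=0.0):
    """Scale an image independently by its maximum, then clip at the bottom.
    The default threshold of 0.0 is an assumption (see [8])."""
    scaled = image / image.max()
    return np.clip(scaled, clip_at, None)
```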

5.2 Results

Dataset  Experiment               MAE            RMSE           SSIM           PSNR
SCD      Limited view  lv128,li   0.004 ±8.8e-4  0.008 ±1.3e-3  0.982 ±5.5e-3  43.131 ±1.34
                       lv128,vc   0.004 ±8.7e-4  0.008 ±1.3e-3  0.984 ±4.8e-3  43.104 ±1.31
                       lv256,ms   0.004 ±8.6e-4  0.007 ±1.3e-3  0.985 ±5.2e-3  44.482 ±1.52
         Sparse view   ss128,vc   0.004 ±9.3e-4  0.007 ±1.5e-3  0.986 ±6.0e-3  44.783 ±1.75
                       ss64,vc    0.004 ±8.8e-4  0.008 ±1.5e-3  0.980 ±5.9e-3  43.680 ±1.54
                       ss32,vc    0.005 ±9.6e-4  0.009 ±1.6e-3  0.981 ±6.0e-3  42.791 ±1.56
SWFD     Limited view  lv128,li   0.026 ±1.5e-2  0.033 ±1.9e-2  0.704 ±1.1e-1  32.268 ±4.28
                       lv128,sc   0.015 ±1.2e-2  0.019 ±1.5e-2  0.844 ±1.0e-1  37.867 ±5.04
         Sparse view   ss128,sc   0.015 ±1.2e-2  0.018 ±1.5e-2  0.863 ±9.9e-2  38.154 ±5.31
                       ss64,sc    0.019 ±1.5e-2  0.024 ±1.8e-2  0.776 ±1.5e-1  35.843 ±5.06
                       ss32,sc    0.021 ±1.6e-2  0.027 ±1.9e-2  0.711 ±1.9e-1  34.371 ±5.08
MSFD     Limited view  lv128,li   0.022 ±1.1e-2  0.030 ±1.4e-2  0.724 ±1.2e-1  32.692 ±3.73
Table 3: Image translation results of the proposed modUNet model reported as mean ±std.
                                  Dice                           IoU
Experiment               vessels         skin curve     vessels         skin curve
Full view     vc,vc      1.000 ±1.3e-3   1.000 ±3.4e-4  0.999 ±2.6e-3   1.000 ±6.7e-4
Limited view  lv128,li   0.995 ±1.0e-2   0.998 ±1.5e-3  0.990 ±1.9e-2   0.996 ±3.0e-3
              lv128,vc   0.994 ±1.2e-2   0.997 ±2.0e-3  0.987 ±2.2e-2   0.994 ±3.9e-3
Sparse view   ss128,vc   0.998 ±3.6e-3   0.999 ±1.0e-3  0.996 ±7.0e-3   0.998 ±2.1e-3
              ss64,vc    0.996 ±6.1e-3   0.998 ±1.9e-3  0.992 ±1.2e-2   0.995 ±3.8e-3
              ss32,vc    0.996 ±6.3e-3   0.996 ±2.8e-3  0.991 ±1.2e-2   0.991 ±5.6e-3
Table 4: Segmentation results of the proposed modUNet model reported as mean ±std.

We evaluate modUNet performance on the test sets using standard metrics. Namely, we report mean absolute error (MAE), root mean squared error (RMSE), structural similarity index (SSIM), and peak signal-to-noise ratio (PSNR) between modUNet predictions and targets for the image translation experiments. Segmentation performance is reported using the Dice coefficient (F1-score) and intersection over union (IoU, i.e., Jaccard index) between modUNet predictions and annotation maps for vessels and skin curves. In Tables 3 & 4 we report modUNet results as means and standard deviations aggregated over the corresponding test set images. For SSIM, we show performance across the experimental datasets (MSFD and SWFD) and SCD in Fig. 4. Upon exploring the reason behind the long tails, we notice that most of the lower scores occur when acquisition noise and/or artifacts are more pronounced. Nevertheless, modUNet successfully corrects geometric distortions in the limited view experiments. Similarly, we plot the segmentation performance for IoU across different experiments on SCD in Fig. 5 for the vessel and skin curve labels. While skin curve segmentation performance almost never drops below an IoU of 0.95, IoU can be drastically lower for vessel segmentation. We observed that this only happens when the vessels are small. For example, there are samples whose ground truth vessels contain as few as four pixels while the prediction has six; even if the prediction fully covers the ground truth, the IoU is then at most 4/6 ≈ 0.67. We provide qualitative results as well as further analysis for all tasks in the supplementary material.
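Apart from SSIM (for which an off-the-shelf implementation such as scikit-image's is preferable), the reported metrics reduce to a few NumPy one-liners; the final lines reproduce the small-vessel IoU ceiling discussed above:

```python
import numpy as np

def mae(pred, target):
    return np.mean(np.abs(pred - target))

def rmse(pred, target):
    return np.sqrt(np.mean((pred - target) ** 2))

def psnr(pred, target, data_range=1.0):
    return 10.0 * np.log10(data_range ** 2 / np.mean((pred - target) ** 2))

def dice(pred_mask, target_mask):
    inter = np.logical_and(pred_mask, target_mask).sum()
    return 2.0 * inter / (pred_mask.sum() + target_mask.sum())

def iou(pred_mask, target_mask):
    inter = np.logical_and(pred_mask, target_mask).sum()
    union = np.logical_or(pred_mask, target_mask).sum()
    return inter / union

# Small-vessel effect: a 4-pixel ground-truth vessel fully covered by a
# 6-pixel prediction still only reaches IoU = 4/6.
gt = np.zeros(10, dtype=bool); gt[:4] = True
pr = np.zeros(10, dtype=bool); pr[:6] = True
assert abs(iou(pr, gt) - 4 / 6) < 1e-9
```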

Figure 4: Distribution of modUNet SSIM performance on MSFD & SWFD (left) and SCD (right) image translation experiments, sorted in ascending median test sample performance.
Figure 5: Distribution of modUNet IoU performance on SCD semantic segmentation experiments for vessel (left) and skin curve (right) labels, sorted in ascending median test sample performance.

6 Discussion

Major differences exist between simulated (synthetic) and experimental datasets. Even if the content differs, training and test samples of the simulated dataset are inherently drawn from the same image distribution. On the other hand, experimental datasets feature distribution shifts due to different volunteers being imaged, inherent noise from the data acquisition system (parasitic signals), and differences in directional sensitivity resulting from transducer alignment and positioning of the hand-held probe. Furthermore, despite the efforts to avoid corrupted acquisitions during data collection, experimental datasets still contain samples with relatively low signal-to-noise ratio. Such samples are expected to yield reduced performance metrics for image translation tasks due to the significant mismatch between the predicted and noisy target images. In a clinical setting, a medical expert typically repeats an acquisition if they deem the signal quality significantly lower than expected. However, beyond this subjective filtering step, one needs to make sure that even the worst results are either sufficiently good or that their poor performance can be attributed to a cause. Accordingly, we further analyze some of the worst samples in the supplementary material and believe that this should be a standard for future work.

We envision that future research will tackle additional challenges such as unsupervised or weakly supervised domain adaptation across the datasets provided in this work. Similarly, mapping between simulated and experimental domains is expected to enhance the segmentation performance for the skin line and vessels in clinical images. Using properties of the detected tissues for fluence correction and heterogeneous SoS image reconstructions would then yield more accurate and quantitative images [5]. We also anticipate contributions in the field of representation learning, e.g., through self-supervised learning, which could help overcome bottlenecks in specialized downstream tasks with limited amounts of available data. The multispectral datasets with paired images across multiple wavelength acquisitions are expected to facilitate the investigation of generative modeling of multi-spectral signals from a given wavelength. We also anticipate future work exploring novel multi-spectral unmixing approaches using MSFD, enabling more accurate quantification of the oxygenation, melanin and lipid content of tissues. Finally, the provided raw signal data, not available from commercial devices, is highly valuable for data-driven research in image reconstruction, e.g., based on variational networks with loop unrolling.


7 Conclusion

In this work, we provide experimental and synthetic clinical OA data covering a large variety of examples to be used as common ground for comparing established and new data processing methods. The datasets correspond to samples from volunteers of varying skin types, different OA array geometries, and several illumination wavelengths. This data selection is supplemented with simulated samples containing ground truth acoustic pressure maps, annotations, and paired samples reconstructed with different OA array geometries. We define a set of 18 experiments tackling major challenges in the OA field and provide reconstructions of the images under these scenarios along with their corresponding ground truths. We propose and release trained neural networks that achieve good performance in all these experiments and can be used as baselines for future improvements (pretrained model weights and other scripts: renkulab.io/gitlab/firat.ozdemir/oadat-evaluate). In addition, we also make the acquired raw signals available to the community to be used for tackling the highlighted challenges in OA imaging using signal domain data. Additional problems can further be defined with the data provided, such as the effects of random sparse sampling or the presence of noise in the signal matrices prior to reconstruction. We believe that these datasets and benchmarks will play a significant role in fostering coordinated efforts to solve major challenges in OA imaging.

This work was supported by Swiss Data Science Center (grant C19-04).


  • [1] F. M. Brochu, J. Brunker, J. Joseph, M. R. Tomaszewski, S. Morscher, and S. E. Bohndiek (2017) Towards quantitative evaluation of tissue absorption coefficients using light fluence correction in optoacoustic tomography. IEEE Transactions on Medical Imaging 36 (1), pp. 322–331. External Links: Document Cited by: §1.
  • [2] R. Butler, P. T. Lavin, F. L. Tucker, L. D. Barke, M. Böhm-Vélez, S. Destounis, S. R. Grobmyer, J. Katzen, K. A. Kist, E. V. Makariou, K. J. Schilling, C. A. Young, B. E. Dogan, and E. I. Neuschler (2018) Optoacoustic breast imaging: imaging-pathology correlation of optoacoustic features in benign and malignant breast masses. American Journal of Roentgenology 211 (5), pp. 1155–1170. Note: PMID: 30106610 External Links: Document, Link, https://doi.org/10.2214/AJR.17.18435 Cited by: §1.
  • [3] N. Davoudi, X. L. Deán-Ben, and D. Razansky (2019-10-01) Deep learning optoacoustic tomography with sparse data. Nature Machine Intelligence 1 (10), pp. 453–460. External Links: ISSN 2522-5839, Document, Link Cited by: §1, Table 1.
  • [4] N. Davoudi, B. Lafci, A. Özbek, X. L. Deán-Ben, and D. Razansky (2021-07) Deep learning of image- and time-domain data enhances the visibility of structures in optoacoustic tomography. Opt. Lett. 46 (13), pp. 3029–3032. External Links: Link, Document Cited by: §1.
  • [5] X. L. Deán-Ben, V. Ntziachristos, and D. Razansky (2014) Effects of small variations of speed of sound in optoacoustic tomographic imaging. Medical Physics 41 (7), pp. 073301. External Links: Document, Link, https://aapm.onlinelibrary.wiley.com/doi/pdf/10.1118/1.4875691 Cited by: §6.
  • [6] X. L. Deán-Ben and D. Razansky (2021) Optoacoustic imaging of the skin. Experimental Dermatology 30 (11), pp. 1598–1609. External Links: Document, Link, https://onlinelibrary.wiley.com/doi/pdf/10.1111/exd.14386 Cited by: §1.
  • [7] X. L. Deán‐Ben, A. Buehler, V. Ntziachristos, and D. Razansky (2012) Accurate model-based reconstruction algorithm for three-dimensional optoacoustic tomography. IEEE Transactions on Medical Imaging 31, pp. 1922–1928. Cited by: §1.
  • [8] L. Ding, X. L. Deán-Ben, C. Lutzweiler, D. Razansky, and V. Ntziachristos (2015-08) Efficient non-negative constrained model-based inversion in optoacoustic tomography. Physics in Medicine and Biology 60 (17), pp. 6733–6750. External Links: Document, Link Cited by: §5.1.
  • [9] G. Diot, S. Metz, A. Noske, E. Liapis, B. Schroeder, S. V. Ovsepian, R. Meier, E. Rummeny, and V. Ntziachristos (2017-11) Multispectral Optoacoustic Tomography (MSOT) of Human Breast Cancer. Clinical Cancer Research 23 (22), pp. 6912–6922. External Links: ISSN 1078-0432, Document, Link Cited by: §1.
  • [10] J. Gröhl, L. Hacker, B. T. Cox, K. K. Dreher, S. Morscher, A. Rakotondrainibe, F. Varray, L. C.M. Yip, W. C. Vogt, and S. E. Bohndiek (2022) The ipasc data format: a consensus data format for photoacoustic imaging. Photoacoustics 26, pp. 100339. External Links: ISSN 2213-5979, Document, Link Cited by: §1.
  • [11] E. Grüneisen (1912-01) Theorie des festen zustandes einatomiger elemente. Zenodo. External Links: Document, Link Cited by: Appendix A.
  • [12] L. Grünherz, E. Gousopoulos, C. Barbon, S. Uyulmaz, B. Lafci, D. Razansky, A. Boss, P. Giovanoli, and N. Lindenblatt (2022) Preoperative mapping of lymphatic vessels by multispectral optoacoustic tomography. Lymphatic Research and Biology 0 (0), pp. null. Note: PMID: 35230197 External Links: Document, Link, https://doi.org/10.1089/lrb.2021.0067 Cited by: §1.
  • [13] S. Guan, A. A. Khan, S. Sikdar, and P. V. Chitnis (2020-05-22) Limited-view and sparse photoacoustic tomography for neuroimaging with deep learning. Scientific Reports 10 (1), pp. 8510. External Links: ISSN 2045-2322, Document, Link Cited by: §1.
  • [14] V. Gupta and V. K. Sharma (2019) Skin typing: fitzpatrick grading and others. Clinics in Dermatology 37 (5), pp. 430–436. Note: The Color of Skin External Links: ISSN 0738-081X, Document, Link Cited by: Figure 6, Appendix B, §3.
  • [15] S. Huang, A. Blutke, A. Feuchtinger, U. Klemm, R. Zachariah Tom, S. M. Hofmann, A. C. Stiel, and V. Ntziachristos (2021) Functional multispectral optoacoustic tomography imaging of hepatic steatosis development in mice. EMBO Molecular Medicine 13 (9), pp. e13490. External Links: Document, Link, https://www.embopress.org/doi/pdf/10.15252/emmm.202013490 Cited by: Table 1.
  • [16] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pp. 448–456. Cited by: §5.
  • [17] J. A. Jensen (2007) Medical ultrasound imaging. Progress in Biophysics and Molecular Biology 93 (1), pp. 153–165. Note: Effects of ultrasound and infrasound relevant to human health External Links: ISSN 0079-6107, Document, Link Cited by: §2.4.
  • [18] A. Karlas, M. Kallmayer, M. Bariotakis, N. Fasoula, E. Liapis, F. Hyafil, J. Pelisek, M. Wildgruber, H. Eckstein, and V. Ntziachristos (2021) Multispectral optoacoustic tomography of lipid and hemoglobin contrast in human carotid atherosclerosis. Photoacoustics 23, pp. 100283. External Links: ISSN 2213-5979, Document, Link Cited by: §1.
  • [19] A. Klimovskaia, B. Lafci, F. Ozdemir, N. Davoudi, X. L. Dean-Ben, F. Perez-Cruz, and D. Razansky (2022) Signal domain learning approach for optoacoustic image reconstruction from limited view data. In Medical Imaging with Deep Learning, External Links: Link Cited by: §1.
  • [20] F. Knieling, C. Neufert, A. Hartmann, J. Claussen, A. Urich, C. Egger, M. Vetter, S. Fischer, L. Pfeifer, A. Hagel, C. Kielisch, R. S. Görtz, D. Wildner, M. Engel, J. Röther, W. Uter, J. Siebler, R. Atreya, W. Rascher, D. Strobel, M. F. Neurath, and M. J. Waldner (2017) Multispectral optoacoustic tomography for assessment of crohn’s disease activity. New England Journal of Medicine 376 (13), pp. 1292–1294. Note: PMID: 28355498 External Links: Document, Link, https://doi.org/10.1056/NEJMc1612455 Cited by: §1.
  • [21] B. Lafci, E. Merčep, J. L. Herraiz, X. L. Deán-Ben, and D. Razansky (2020) Noninvasive multiparametric characterization of mammary tumors with transmission-reflection optoacoustic ultrasound. Neoplasia 22 (12), pp. 770–777. External Links: ISSN 1476-5586, Document, Link Cited by: §1.
  • [22] B. Lafci, E. Merčep, S. Morscher, X. L. Deán-Ben, and D. Razansky (2021) Deep learning for automatic segmentation of hybrid optoacoustic ultrasound (opus) images. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 68 (3), pp. 688–696. External Links: Document Cited by: §1, §1.
  • [23] H. Lan, D. Jiang, C. Yang, and F. Gao (2019) Y-net: a hybrid deep learning reconstruction framework for photoacoustic imaging in vivo. arXiv. External Links: Document, Link Cited by: §1.
  • [24] R. D. Luca, T. Dattoma, L. Forzoni, J. Bamber, P. Palchetti, and A. Gubbini (2018) Diagnostic ultrasound probes: a typology and overview of technologies. Current Directions in Biomedical Engineering 4 (1), pp. 49–53. External Links: Document, Link Cited by: §2.4.
  • [25] S. Mandal, X. L. Deán-Ben, and D. Razansky (2016) Visual quality enhancement in optoacoustic tomography using active contour segmentation priors. IEEE Transactions on Medical Imaging 35 (10), pp. 2209–2217. External Links: Document Cited by: §1.
  • [26] S. Manohar and M. Dantuma (2019) Current and future trends in photoacoustic breast imaging. Photoacoustics 16, pp. 100134. External Links: ISSN 2213-5979, Document, Link Cited by: §1.
  • [27] E. Merčep, N. C. Burton, J. Claussen, and D. Razansky (2015-10) Whole-body live mouse imaging by hybrid reflection-mode ultrasound and optoacoustic tomography. Opt. Lett. 40 (20), pp. 4643–4646. External Links: Link, Document Cited by: §1.
  • [28] E. Merčep, X. L. Deán-Ben, and D. Razansky (2017) Combined pulse-echo ultrasound and multispectral optoacoustic tomography with a multi-segment detector array. IEEE Transactions on Medical Imaging 36 (10), pp. 2129–2137. External Links: Document Cited by: §2.4.
  • [29] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al. (2018) Attention u-net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999. Cited by: Figure 7, §5.
  • [30] A. Ozbek, X. L. Deán-Ben, and D. Razansky (2013) Realtime parallel back-projection algorithm for three-dimensional optoacoustic imaging devices. In Opto-Acoustic Methods and Applications, V. Ntziachristos and C. P. Lin (Eds.), Vol. 8800, pp. 90 – 95. External Links: Document, Link Cited by: §2.2.
  • [31] A. Özbek, X. L. Deán-Ben, and D. Razansky (2018-07) Optoacoustic imaging at kilohertz volumetric frame rates. Optica 5 (7), pp. 857–863. External Links: Link, Document Cited by: §1, §1.
  • [32] Ç. Özsoy, A. Özbek, M. Reiss, X. L. Deán-Ben, and D. Razansky (2021) Ultrafast four-dimensional imaging of cardiac mechanical wave propagation with sparse optoacoustic sensing. Proceedings of the National Academy of Sciences 118 (45), pp. e2103979118. External Links: Document, Link, https://www.pnas.org/doi/pdf/10.1073/pnas.2103979118 Cited by: §1.
  • [33] V. Perekatova, P. Subochev, M. Kleshnin, and I. Turchin (2016-10) Optimal wavelengths for optoacoustic measurements of blood oxygen saturation in biological tissues. Biomed. Opt. Express 7 (10), pp. 3979–3995. External Links: Link, Document Cited by: §3.1.
  • [34] J. Robin, R. Rau, B. Lafci, A. Schroeter, M. Reiss, X. Deán-Ben, O. Goksel, and D. Razansky (2021) Hemodynamic response to sensory stimulation in mice: comparison between functional ultrasound and optoacoustic imaging. NeuroImage 237, pp. 118111. External Links: ISSN 1053-8119, Document, Link Cited by: §1.
  • [35] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §5.
  • [36] M. Schellenberg, K. K. Dreher, N. Holzwarth, F. Isensee, A. Reinke, N. Schreck, A. Seitel, M. D. Tizabi, L. Maier-Hein, and J. Gröhl (2022) Semantic segmentation of multispectral photoacoustic images using deep learning. Photoacoustics 26, pp. 100341. External Links: ISSN 2213-5979, Document, Link Cited by: §1.
  • [37] A. Sharma, S. Srishti, V. Periyasamy, and M. Pramanik (2019) Photoacoustic imaging depth comparison at 532-, 800-, and 1064-nm wavelengths: Monte Carlo simulation and experimental validation. Journal of Biomedical Optics 24 (12), pp. 1 – 10. External Links: Document, Link Cited by: §3.2.
  • [38] I. Steinberg, D. M. Huland, O. Vermesh, H. E. Frostig, W. S. Tummers, and S. S. Gambhir (2019-06-08) Photoacoustic clinical imaging. Photoacoustics 14, pp. 77–98 (eng). Note: 31293884[pmid] External Links: ISSN 2213-5979, Document, Link, Link Cited by: §1.
  • [39] J. L. Su, B. Wang, K. E. Wilson, C. L. Bayer, Y. Chen, S. Kim, K. A. Homan, and S. Y. Emelianov (2010-11-01) Advances in clinical and biomedical applications of photoacoustic imaging. Expert opinion on medical diagnostics 4 (6), pp. 497–510 (eng). Note: 21344060[pmid] External Links: ISSN 1753-0067, Document, Link, Link Cited by: §1.
  • [40] S. Tzoumas and V. Ntziachristos (2017-11-28) Spectral unmixing techniques for optoacoustic imaging of tissue pathophysiology. Philosophical transactions. Series A, Mathematical, physical, and engineering sciences 375 (2107), pp. 20170262 (eng). Note: 29038385[pmid] External Links: ISSN 1471-2962, Document, Link Cited by: §1.
  • [41] V. Vishnevskiy, R. Rau, and O. Goksel (2019) Deep variational networks with exponential weighting for learning computed tomography. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P. Yap, and A. Khan (Eds.), Cham, pp. 310–318. External Links: ISBN 978-3-030-32226-7 Cited by: §6.

Appendix A Optoacoustic wave equations

OA imaging is based on thermoelastic expansion of tissues, which results in the propagation of pressure waves in the imaging medium depending on spatial and temporal changes. The OA wave equation can be written as follows

\[ \frac{\partial^2 p(\mathbf{r},t)}{\partial t^2} - c^2 \Delta p(\mathbf{r},t) = \Gamma \frac{\partial H(\mathbf{r},t)}{\partial t}, \qquad H(\mathbf{r},t) = H_r(\mathbf{r})\, H_t(t), \]

where \(\mathbf{r}\) and \(t\) are the spatial and temporal variables, respectively. \(\Gamma\) is the Grüneisen constant [11]. \(c\) stands for the speed of sound. \(H(\mathbf{r},t)\) is the absorbed energy field based on the location and time of the sample, with \(H_r(\mathbf{r})\) its spatial component and \(H_t(t)\) the temporal laser light intensity change based on the illumination. \(p(\mathbf{r},t)\) represents the pressure wave dependent on the spatial and temporal variables. The Poisson solution of the OA wave equation for the pressure wave can be written as

\[ p(\mathbf{r},t) = \frac{\Gamma}{4\pi c} \frac{\partial}{\partial t} \oint_{S'(t)} \frac{H_r(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} \, dS'(\mathbf{r}'), \]

where \(S'(t)\) is the time-dependent spherical surface defined by \(|\mathbf{r}-\mathbf{r}'| = ct\). This equation represents the OA forward model, from which the inverse problem of reconstruction can be derived. OA images are reconstructed by estimating the absorbed energy field \(H_r\) at specific locations based on the measured pressure waves. \(H_r\) is calculated from the pressure waves detected at the surface \(S\) as follows

\[ H_r(\mathbf{r}') \propto \int_{S} p(\mathbf{r}, t)\big|_{t = |\mathbf{r}-\mathbf{r}'|/c} \, dS(\mathbf{r}). \]


The constant factors in this equation can be omitted. After omitting them, the equation is discretized as

\[ H_r(\mathbf{x}_j) = \sum_i p(\mathbf{x}_i, t_{ij}), \qquad t_{ij} = \frac{|\mathbf{x}_j - \mathbf{x}_i|}{c}, \]

where \(\mathbf{x}_j\) is the \(j\)-th point on the defined reconstruction grid and \(\mathbf{x}_i\) is the position of the \(i\)-th transducer element. In summary, the equation calculates the distance between a point on the defined grid and an element of the transducer array. It then finds the corresponding wave intensity in the signal detected at the surface \(S\) based on the time of flight computed with the speed of sound of the imaging medium.
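The discretized backprojection above can be sketched as a short delay-and-sum loop in NumPy. This is an illustrative sketch, not the pyoat implementation; `backproject` is a hypothetical helper, and the default speed of sound and sampling rate are placeholder values (the experimental setup digitizes at 40 mega-samples per second).

```python
import numpy as np

def backproject(signals, det_pos, grid, c=1500.0, fs=40e6):
    """Delay-and-sum backprojection sketch.

    signals: (n_samples, n_det) raw pressure signals
    det_pos: (n_det, 2) transducer element positions [m]
    grid:    (n_pix, 2) reconstruction grid points [m]
    """
    n_samples, n_det = signals.shape
    image = np.zeros(len(grid))
    for i in range(n_det):
        # time of flight from every grid point to element i
        dist = np.linalg.norm(grid - det_pos[i], axis=1)
        idx = np.round(dist / c * fs).astype(int)
        valid = idx < n_samples  # discard flight times past the recording
        image[valid] += signals[idx[valid], i]
    return image
```

Each grid point simply accumulates the signal samples whose time of flight matches its distance to every element, which is exactly the sum in the discretized equation.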

Appendix B Fitzpatrick skin phototype in experimental datasets

Figure 6: Fitzpatrick skin phototype [14] distribution of volunteers in datasets.

Fitzpatrick skin phototype is a metric that quantifies the amount of melanin pigment in the skin of a subject [14]. The metric ranges from one to six, going from pale white to black skin color. This is relevant for various OA applications because different skin types (melanin concentrations) lead to varying amounts of contrast (absorption) at the skin surface in the acquired images. Accordingly, the distribution of the skin types of the volunteers across SWFD and MSFD is shown in Fig. 6.

Appendix C Transducer array details

Semi circle: The array contains 256 piezocomposite transducer elements distributed equidistantly over a semi circle (concave surface) with a radius of 40 mm (Fig. 1a, main manuscript). The single transducer elements have dimensions of 0.37 mm × 15 mm with an inter-element distance of 0.10 mm. This configuration of transducer elements results in cylindrical (toroidal) focusing at 38 mm (close to the center of the array). The central peak frequency of the array is 5 MHz with 60% bandwidth at -6 dB.
Multisegment: The array is formed by the combination of a linear detector array and concave parts on its right and left sides, as shown in Fig. 1b (main manuscript). The linear part contains 128 elements distributed over a linear surface with an inter-element pitch of 0.25 mm. Each of the two concave parts includes 64 elements, which makes the total number of elements equal to 256 (128 linear + 128 concave). The inter-element pitch of the concave parts is 0.6 mm with a 40 mm radius of curvature. The height of all elements is 10 mm. The concave parts are designed to increase angular coverage in OA imaging. This configuration results in cylindrical focusing at 38 mm, close to the center of the array. The array has a 7.5 MHz central frequency with 70% bandwidth at -6 dB.
Linear array: The array is the central part of the multisegment array, with 128 transducer elements distributed over a line with a pitch of 0.25 mm (Fig. 1b, main manuscript). Like the concave parts, the linear array has a 7.5 MHz central frequency with 70% bandwidth at -6 dB. The linear array is optimized for US data acquisition with plane waves. Hence, the array produces OA images with limited view artifacts due to its reduced angular coverage, which is a limiting factor for OA image acquisition.
Virtual circle: The array is generated to simulate images with 360 degree angular coverage, which results in artifact-free reconstructions (Fig. 2a, main manuscript). It contains 1,024 transducer elements distributed equidistantly over a full circle. The radius of the transducer array is kept equal to that of the semi circle array (40 mm) to allow comparison between simulations and experimental acquisitions.
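As an illustration of the geometries above, element positions for the circular arrays can be generated as equidistant points on an arc. `arc_positions` is a hypothetical helper; the angular origin and orientation are assumptions, not the exact coordinates shipped with the datasets.

```python
import numpy as np

def arc_positions(n_elements, radius=0.04, coverage_deg=180.0):
    """Element centers spaced equidistantly on a circular arc.

    coverage_deg=180 with 256 elements approximates the semi circle
    array (40 mm radius); coverage_deg=360 with 1,024 elements
    approximates the virtual circle array.
    """
    # for a full circle, omit the endpoint so 0 and 360 degrees
    # do not produce a duplicated element
    endpoint = coverage_deg < 360.0
    angles = np.linspace(0.0, np.deg2rad(coverage_deg), n_elements,
                         endpoint=endpoint)
    return np.stack([radius * np.cos(angles),
                     radius * np.sin(angles)], axis=1)
```

The same positions, whichever their source, are what the backprojection needs as `det_pos`.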

Appendix D Architecture and implementation details

Figure 7: Schematic of the proposed modUNet architecture. CB represents the residual 2D convolutional block with batch normalization shown in Fig. 3 (main manuscript). Other abbreviations correspond to 2D max-pooling (MP) with pool size 2, 2D bilinear upsampling (UP) by a factor of 2, concatenation (CAT), and attention gates (AG) [29]. Finally, the output is produced by a convolutional layer without activation, whose number of filters matches the number of output channels (1 for image translation and 3 for semantic segmentation experiments).

We show the schematic of the proposed modUNet architecture in Fig. 7.

We use categorical cross entropy loss and mean squared error loss for segmentation and image translation experiments, respectively. We add an additional regularization weight for each learned model parameter.

For all experiments, we use the Adam optimizer and scale the learning rate with exponential decay (decay rate of 0.98 and decay steps of 1,000) from its peak value, following a linear warmup of 10,000 steps. All models are trained for 150 epochs with a mini-batch size of 25, and the best validation set loss is used as the early stopping criterion for which test set performance metrics are presented. For each experiment, we used an Nvidia P100 or Titan X GPU with 12 GB memory to train the model. We do not augment data during training. Experiments on MSFD, SWFD, and SCD took about 61, 32, and 15 hours, respectively. The training time variation across datasets is due to the difference in dataset sizes and training for a fixed number of 150 epochs.
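The schedule described above can be sketched as follows. The peak learning rate used here is a placeholder, since the peak value itself is not recoverable from this appendix; the warmup length, decay rate, and decay steps match the text.

```python
def learning_rate(step, peak=1e-4, warmup=10_000,
                  decay_rate=0.98, decay_steps=1_000):
    """Linear warmup to `peak`, then exponential decay (sketch).

    `peak` is an assumed placeholder value; warmup of 10,000 steps and
    a decay of 0.98 every 1,000 steps follow the training details.
    """
    if step < warmup:
        # linear ramp from 0 to peak over the warmup steps
        return peak * step / warmup
    # continuous exponential decay after warmup
    return peak * decay_rate ** ((step - warmup) / decay_steps)
```

For example, 1,000 steps after warmup the rate has decayed by exactly one factor of 0.98.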

Appendix E Qualitative results

Despite the impressive performance of modUNet, there is a long tail in almost all tasks for the SSIM or IoU distributions, as seen in Figs. 4 and 5 (main manuscript). Accordingly, we qualitatively examine some of the worst samples at the bottom of these tails for each task.

E.1 Image translation

We showcase and analyze a selection of the worst performing samples from the most challenging limited view and sparse sampling reconstruction experiments on SWFD and report SSIM for these samples with respect to the target sample.
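Selecting which test samples sit at the worst, 1st-, and 5th-percentile positions of the SSIM distribution amounts to a simple ranking. The helper below is an illustrative sketch; the per-sample SSIM (or IoU) scores are assumed to have been computed beforehand, e.g. with scikit-image's `structural_similarity`.

```python
import numpy as np

def percentile_samples(scores, percentiles=(0, 1, 5)):
    """Return, for each requested percentile, the index of the test
    sample at that position of the score distribution (0 = worst)."""
    order = np.argsort(scores)  # ascending: worst sample first
    n = len(scores)
    picks = {}
    for p in percentiles:
        # position of percentile p within the sorted ranking
        picks[p] = int(order[int(np.floor(p / 100 * (n - 1)))])
    return picks
```

The showcased figures then display the input, prediction, and target for each picked index.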

In Figs. 8 & 9, one can see that the two worst samples based on SSIM correspond to images with little to no signal content. All samples in Fig. 10 and the remaining samples from the 1st and 5th percentiles in Fig. 8 suggest that low SSIM in multisegment array images can be deceiving due to the different set of artifacts that appear on the target (multisegment) array. For example, despite the overall low SSIM of these samples, all of the vessels that we can identify are reconstructed with corrected geometry, as also shown with red arrows. In Fig. 9, where the target is the semi circle array, the artifacts are not as intense. Nevertheless, given the accurate vessel geometry correction, the low SSIMs in the 1st and 5th percentiles can be attributed to other local noise patterns, such as the circular noise that exists on the target image. Since the majority of samples in our datasets have high signal-to-noise ratio (SNR), most of the undesirable artifacts that seldomly occur in test samples are removed.

In Fig. 11, we show some of the samples with the worst SSIM from the sparse sampling experiment. The worst two samples have very poor SNR. For all other samples, modUNet predictions from the heavily subsampled image reconstructions indicate higher SNR than the target samples, thanks to the removal of most of the noise pattern (especially visible within the water medium above the forearm) while keeping all soft tissue structures intact and faithful to their counterparts in the target image. Unfortunately, the mismatch of the noise patterns causes low SSIM for such samples with low-SNR target images.

For completeness, we show the worst performing samples for SSIM in the image translation experiments using SCD in Fig. 12. One can see that the rounded SSIM already reaches 1.0 by the 2nd worst performing sample in the test set. In addition, we also showcase some of the best performing samples for SSIM in Figs. 13, 14, 15 & 16.

In addition to correcting distorted vessel geometry, it can be seen in some samples that certain vessels that are barely visible in the input get accurately redrawn to match the target (e.g., Fig. 10 1st, 2nd and 3rd columns; Fig. 11 3rd column; Fig. 14 2nd column; and Fig. 15 2nd column).

Figure 8: We showcase the worst (1st column), 2nd worst (2nd column), 1st- (3rd column), and 5th-percentile (4th column) SSIM samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets. Red arrows indicate some of the distorted vessel geometries at the input that are corrected in the modUNet predictions.
Figure 9: We showcase the worst (1st column), 2nd worst (2nd column), 1st- (3rd column), and 5th-percentile (4th column) SSIM samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets. Red arrows indicate some of the distorted vessel geometries at the input that are corrected in the modUNet predictions.
Figure 10: We showcase the worst (1st column), 2nd worst (2nd column), 1st- (3rd column), and 5th-percentile (4th column) SSIM samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets. Red arrows indicate some of the distorted vessel geometries at the input that are corrected in the modUNet predictions.
Figure 11: We showcase the worst (1st column), 2nd worst (2nd column), 1st- (3rd column), and 5th-percentile (4th column) SSIM samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets.
Figure 12: We showcase the worst (1st column), 2nd worst (2nd column), 1st- (3rd column), and 5th-percentile (4th column) SSIM samples based on modUNet predictions for two experiments (left and right), with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets.
Figure 13: We showcase the 95th- (1st column), 98th- (2nd column), 99th-percentile (3rd column), and best (4th column) SSIM samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets.
Figure 14: We showcase the 95th- (1st column), 98th- (2nd column), 99th-percentile (3rd column), and best (4th column) SSIM samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets. Red arrows indicate some of the distorted vessel geometries at the input that are corrected in the modUNet predictions.
Figure 15: We showcase the 95th- (1st column), 98th- (2nd column), 99th-percentile (3rd column), and best (4th column) SSIM samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets. Red arrows indicate some of the distorted vessel geometries at the input that are corrected in the modUNet predictions.
Figure 16: We showcase the 95th- (1st column), 98th- (2nd column), 99th-percentile (3rd column), and best (4th column) SSIM samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and target (3rd row) triplets.

E.2 Semantic segmentation

Similar to the image translation task, we look at some of the samples with the worst vessel segmentation performance with respect to the IoU metric in Figs. 17 & 18. One can see that modUNet achieves perfect segmentation (i.e., an IoU of 1.0) already at the 5th percentile, which is also visible in the significantly narrower distribution tails (see Fig. 5 in the main manuscript). As a result of SCD having flawless ground truth annotations, modUNet can achieve almost perfect segmentation performance for all classes in all experiments of both tasks. The decrease in IoU seems to be exclusively due to under- or over-segmenting small vessels by a handful of pixels. Despite most vessels being small (see Fig. 2d in the main manuscript, distribution of number of pixels per vessel), low IoU is limited to only a handful of extreme cases. This is also corroborated by the fact that the skin curve segmentation distribution (see Fig. 5, right, in the main manuscript) has a significantly shorter tail, barely falling below an IoU of 0.95. Another interesting finding is that despite the subsampled reconstructions having intense duplicates of the skin curve (Fig. 18, first row), modUNet never fails to segment the right curve.
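For reference, the IoU metric used throughout this section can be computed as follows for binary masks. Defining the empty-vs-empty case as 1.0 is an assumed convention here, chosen so that an exact match of empty predictions counts as perfect segmentation.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union for two binary masks.

    Returns 1.0 when both masks are empty (assumed convention).
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, target).sum() / union
```

A prediction that over-segments a small vessel by a few pixels immediately lowers the IoU, which matches the long-tail behavior discussed above.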

Figure 17: We showcase the worst (1st column), 2nd worst (2nd column), 1st- (3rd column), and 5th-percentile (4th column) vessel IoU samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and ground truth (3rd row) triplets.
Figure 18: We showcase the worst (1st column), 2nd worst (2nd column), 1st- (3rd column), and 5th-percentile (4th column) vessel IoU samples based on modUNet predictions, with input (1st row), modUNet prediction (2nd row), and ground truth (3rd row) triplets.

Appendix F Simulated cylinder dataset generation

Our proposed simulated cylinders dataset (SCD) follows a group of heuristics we derived from observing experimental images. For any given sample, we first generate an acoustic pressure map, from which we also generate ground truth annotations. We then apply certain post-processing steps to imitate other phenomena such as patterns under the skin surface and different vessel textures. Given the geometry of the transducers we want to simulate, we then apply the forward transform that yields the raw signals. Finally, we reconstruct these signals using the backprojection algorithm to generate the images used in the various SCD experiments in this manuscript.

The acoustic pressure map generation starts with drawing the curve that represents the laser pulse absorption at the skin surface, mainly due to melanin. Given that the experimental data is acquired while making sure that the forearm is roughly within a certain distance range from the arrays, we also limit the distance of the drawn skin curve. We define the skin curve as a 2nd-degree polynomial fitted to 3 points randomly sampled at the two horizontal edges and the center of the image at varying heights. As a post-processing step to mimic experimental data, the curve is first smoothed with a Gaussian filter. Then an exponential decay of randomized length is applied under the curve along the vertical axis. Finally, unstructured uniform noise is multiplied with the aforementioned exponential decay region. For vessel generation, the number of cylinders to be drawn is sampled based on a coin flip: depending on the outcome, either 2 cylinders are drawn or the number of cylinders is sampled from a Poisson distribution (see Fig. 2d, distribution of number of vessels per image, main manuscript). Each vessel is initially represented by a cylinder orthogonal to the image plane (z-axis) with a randomly sampled radius. We then randomly rotate the cylinder around the x- and y-axes. The vessel is determined as the cross-section of the cylinder with the imaging plane, yielding ellipses depending on the final angle of the cylinder. As a post-processing step, we flip a coin to determine whether the vessel has a homogeneous intensity profile or an intensity that decreases linearly from its center. We then apply a Gaussian filter to the vessel to smooth its edges. Finally, based on another coin flip, we decide whether or not to multiply the intensity profile of the vessel with uniform noise. The same process is repeated until the desired number of non-overlapping vessels is generated. All parameters used for the aforementioned steps are empirically selected based on our observation of the experimental datasets. We provide the script used to simulate the acoustic pressure maps of SCD in the oa-armsim repository.
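The skin-curve and vessel-count steps above can be sketched as follows. The height range and the Poisson rate are illustrative placeholders, not the exact parameters of the generator (those live in the oa-armsim repository), and both helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def skin_curve(width=256, y_min=60, y_max=120):
    """Fit a 2nd-degree polynomial through three randomly sampled
    anchor points: left edge, image center, and right edge.
    The height range [y_min, y_max] is an assumed placeholder."""
    xs = np.array([0, width // 2, width - 1])
    ys = rng.uniform(y_min, y_max, size=3)
    coeffs = np.polyfit(xs, ys, deg=2)  # exact fit through 3 points
    return np.polyval(coeffs, np.arange(width))

def n_vessels():
    """Coin flip: either exactly 2 vessels, or a Poisson-distributed
    count (the rate 5 is an assumed placeholder)."""
    if rng.random() < 0.5:
        return 2
    return int(rng.poisson(lam=5))
```

Smoothing, decay under the curve, and the per-vessel texture coin flips would follow as further post-processing steps.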

Appendix G Experimental data acquisition setup

Signal acquisition is performed with an OA imaging setup that combines four main components, namely transducer arrays, nanosecond pulsed lasers, a data acquisition system, and a workstation PC (Fig. 1, main manuscript). The SWFD dataset was acquired with the multisegment and semi circle transducer arrays (Sec. 2.1, main manuscript) using a nanosecond laser at 1,064 nm with a repetition rate of 10 Hz. The MSFD dataset was acquired with only the multisegment array at six different wavelengths (700, 730, 760, 780, 800, 850 nm) using a laser with a repetition rate of 50 Hz. The increased repetition rate guarantees that the displacement within the scene between the frames of different wavelengths is minimal. The data acquisition system is used to digitize the signals acquired by the transducer arrays. A sampling rate of 40 mega-samples per second was used in all experiments. The digital signals are then sent to the workstation PC, which stores the data and displays both raw signals and reconstructed images in real time at lower resolution. The high resolution images presented in this work were reconstructed after the acquisition (offline) using the backprojection method (Sec. 2.2, main manuscript). The real time feedback helps to position the transducer arrays. The workstation PC also synchronizes all the imaging setup components by setting delays for the data acquisition system and triggers for the laser pulses. The imaging medium was filled with water to increase the coupling efficiency between the transducer arrays and the skin. Water has low attenuation at the wavelengths used in this study; hence, the signal attenuation in the medium was low enough to be neglected. The transducer arrays were held orthogonal to the forearm surface throughout the acquisition and swiftly moved from the elbow towards the wrist. All participants joined the experiments voluntarily and were informed about the details of the experiments.

Appendix H OA reconstruction library - pyoat

A Python package called “pyoat” is presented with the datasets to reconstruct images from raw signals (pyoat repository). The package can be installed with the “pip” package manager. It provides functions to bandpass filter and normalize raw signals as preprocessing steps. An implementation of the backprojection algorithm is also included in the package (Sec. 2.2, main manuscript). The forward model operator is implemented to simulate signals from acoustic pressure maps. Data readers and savers are integrated into the package for loading the raw data and saving the reconstructed images, respectively. The package provides examples for its different functions. The examples are easy-to-use Python scripts that require only the raw signal data paths as input. The positions of the elements in all transducer arrays used in this study are included in the library to enable reconstruction from the raw signals.

Appendix I Organization of datasets

All datasets are stored as HDF5 files, and corresponding indices across the datasets within an HDF5 file refer to the same scene, whether as raw signal or reconstructed image. Similarly, metadata fields such as patientID, side (i.e., left vs. right arm), skin type, and sliceID contain the information for the corresponding indices. Some of these metadata, such as patientID, which corresponds to the unique anonymized identifier of a volunteer, only exist in the experimental datasets. Matching identifiers across the experimental datasets (MSFD and SWFD) correspond to the same volunteer.

Each sample (raw signal or image reconstruction) is chunked as a single piece for optimal compression/decompression overhead when reading individual images and storing datasets. For example, for a given HDF5 dataset of shape (number of instances, height, width), each block of shape (1, height, width) is compressed individually. In our experiments, this ensured zero additional idle GPU time when reading random training samples from a compressed HDF5 file compared with its non-compressed counterpart.
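The per-sample chunking scheme can be reproduced with h5py as follows. `write_chunked` is a hypothetical helper and the gzip compression choice is an assumption; the essential point is the chunk shape of (1, height, width).

```python
import h5py
import numpy as np

def write_chunked(path, name, data, compression="gzip"):
    """Store a stack of images so that each (1, height, width) block
    is compressed individually, which keeps random single-sample
    reads cheap."""
    with h5py.File(path, "a") as f:
        f.create_dataset(
            name,
            data=data,
            chunks=(1,) + data.shape[1:],  # one sample per chunk
            compression=compression,
        )
```

Reading `f[name][i]` then decompresses exactly one chunk, which is why random-access training reads incur no extra idle time.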

Raw signal datasets have the shape (number of instances, temporal acquisition axis, receiving element axis). The temporal acquisition axis is fixed to 2,030 sampling points across all transducer arrays. The receiving element axis depends on the number of transducer elements used for signal acquisition. For sparse sampling and limited view reconstructions, we retain the shape of the raw signal data but switch off the receiving elements that are not actively receiving, i.e., they are padded with zeros. This convention also allows for direct compatibility with the pyoat OA reconstruction package.
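The zero-padding convention can be sketched as follows. The assumption that the active elements are spaced uniformly across the array (as in the sparse sampling variants) is ours; `sparse_mask` is an illustrative helper, not part of pyoat.

```python
import numpy as np

def sparse_mask(signals, n_active):
    """Zero out non-active receiving elements while keeping the raw
    signal shape (n_samples, n_det), e.g. 32, 64, or 128 active
    elements out of 256. Active elements are assumed to be spread
    uniformly over the array."""
    n_samples, n_det = signals.shape
    keep = np.linspace(0, n_det - 1, n_active).round().astype(int)
    out = np.zeros_like(signals)
    out[:, keep] = signals[:, keep]  # copy only the active channels
    return out
```

Because the shape is unchanged, the masked signals can be fed directly to the same reconstruction code as the fully sampled data.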

We provide a header dataset file “OADAT.h5” as a convenient access point to all other HDF5 dataset files. This header file also contains the metadata module of the Dataset Nutrition Label, which should help make the dataset self-explanatory. For more information on the datasets in general, consult the Dataset Documentation, provided as separate supplementary material.

I.1 MSFD

The contents of MSFD are listed in Table 5 and stored in the file “MSFD_multisegment_RawBP.h5”. The provided sliceID corresponds to the time index of the slice acquired from a given volunteer in a given acquisition; i.e., for a given patientID and side, consecutive sliceID values are recorded right after one another. This information can be relevant for future work that does not treat each slice independently but exploits correlations across consecutive slice acquisitions. Nevertheless, a given sliceID does not necessarily correspond to the same position on the forearm across volunteers.

Name shape content
patientID (25200,)
side (25200,)
skin_type (25200,)
sliceID (25200,)
linear_BP_w700 (25200, 256, 256) image reconstructions
linear_BP_w730 (25200, 256, 256) image reconstructions
linear_BP_w760 (25200, 256, 256) image reconstructions
linear_BP_w780 (25200, 256, 256) image reconstructions
linear_BP_w800 (25200, 256, 256) image reconstructions
linear_BP_w850 (25200, 256, 256) image reconstructions
ms_BP_w700 (25200, 256, 256) image reconstructions
ms_BP_w730 (25200, 256, 256) image reconstructions
ms_BP_w760 (25200, 256, 256) image reconstructions
ms_BP_w780 (25200, 256, 256) image reconstructions
ms_BP_w800 (25200, 256, 256) image reconstructions
ms_BP_w850 (25200, 256, 256) image reconstructions
linear_raw_w700 (25200, 2030, 256) raw signals
linear_raw_w730 (25200, 2030, 256) raw signals
linear_raw_w760 (25200, 2030, 256) raw signals
linear_raw_w780 (25200, 2030, 256) raw signals
linear_raw_w800 (25200, 2030, 256) raw signals
linear_raw_w850 (25200, 2030, 256) raw signals
ms_raw_w700 (25200, 2030, 256) raw signals
ms_raw_w730 (25200, 2030, 256) raw signals
ms_raw_w760 (25200, 2030, 256) raw signals
ms_raw_w780 (25200, 2030, 256) raw signals
ms_raw_w800 (25200, 2030, 256) raw signals
ms_raw_w850 (25200, 2030, 256) raw signals
Table 5: Contents of the MSFD file.
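The sliceID semantics described above can be exploited to assemble temporally ordered slice stacks per volunteer and side. A minimal numpy sketch with synthetic metadata follows; the field names mirror Table 5, while the grouping helper itself is illustrative:

```python
import numpy as np

def ordered_indices(patientID, side, sliceID, pid, s):
    """Return dataset indices of one acquisition (pid, s), sorted by sliceID."""
    sel = np.flatnonzero((patientID == pid) & (side == s))
    return sel[np.argsort(sliceID[sel])]

# Synthetic metadata mimicking the MSFD layout: 2 volunteers x 2 sides x 5 slices
patientID = np.repeat([1, 1, 2, 2], 5)
side = np.tile(np.repeat([0, 1], 5), 2)
sliceID = np.tile(np.arange(5)[::-1], 4)  # deliberately stored out of order

idx = ordered_indices(patientID, side, sliceID, pid=1, s=0)
```

The resulting index array can then be used to gather consecutive frames from any of the `*_BP` or `*_raw` datasets in Table 5.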

I.2 SWFD

The contents of SWFD are split into two separate HDF5 files, “SWFD_semicircle_RawBP.h5” and “SWFD_multisegment_RawBP.h5”, with contents listed in Tables 6 & 7. Despite having the same number of instances per dataset category across the two files, the indices are not paired between the semicircle and multisegment arrays, hence our decision to split the dataset into two files. As an artifact of discarding the first few slices to filter out low-SNR instances (both in MSFD and SWFD), SWFD acquisitions have 1,401 slices, with sliceID in the range [0, 1400], yielding 39,228 samples per dataset category as opposed to 39,200.

Name shape content
patientID (39228,)
side (39228,)
skin_type (39228,)
sliceID (39228,)
sc,lv128_BP (39228, 256, 256) image reconstructions
sc,ss128_BP (39228, 256, 256) image reconstructions
sc,ss32_BP (39228, 256, 256) image reconstructions
sc,ss64_BP (39228, 256, 256) image reconstructions
sc_BP (39228, 256, 256) image reconstructions
sc,lv128_raw (39228, 2030, 256) raw signals
sc,ss128_raw (39228, 2030, 256) raw signals
sc,ss32_raw (39228, 2030, 256) raw signals
sc,ss64_raw (39228, 2030, 256) raw signals
sc_raw (39228, 2030, 256) raw signals
Table 6: Contents of the SWFD semicircle file.
Name shape content
patientID (39228,)
side (39228,)
skin_type (39228,)
sliceID (39228,)
linear_BP (39228, 256, 256) image reconstructions
ms_BP (39228, 256, 256) image reconstructions
linear_raw (39228, 2030, 256) raw signals
ms_raw (39228, 2030, 256) raw signals
Table 7: Contents of the SWFD multisegment file.
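Because the row indices are not paired across the two SWFD files, samples acquired with the two arrays should be matched through the shared metadata fields (patientID, side, sliceID) rather than by raw position. A minimal sketch with synthetic metadata follows; the lookup helper is illustrative, not part of the OADAT tooling:

```python
import numpy as np

def build_lookup(patientID, side, sliceID):
    """Map (patientID, side, sliceID) -> row index within one HDF5 file."""
    return {key: i for i, key in enumerate(zip(patientID, side, sliceID))}

# Synthetic metadata: the same 12 samples stored in two different orders,
# mimicking the unpaired row ordering of the two SWFD files.
keys = [(p, s, t) for p in (1, 2) for s in (0, 1) for t in range(3)]
order_b = np.random.default_rng(0).permutation(len(keys))
pid_b, side_b, sid_b = zip(*[keys[i] for i in order_b])

lut_b = build_lookup(pid_b, side_b, sid_b)
# Row in file B holding the same sample as row 0 of file A:
j = lut_b[keys[0]]
```

Building such a lookup once per file makes cross-array pairing an O(1) operation per sample.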

I.3 SCD

The contents of SCD are listed in Table 8 and stored in the file “SCD_RawBP.h5”. Unlike the experimental datasets, SCD also contains the ground truth acoustic pressure maps and annotations (labels). Annotations are encoded as 0: background, 1: vessel, and 2: skin curve.

Name shape content
sliceID (20000,)
ground_truth (20000, 256, 256) acoustic pressure map
labels (20000, 256, 256) annotations
linear_BP (20000, 256, 256) image reconstructions
ms_BP (20000, 256, 256) image reconstructions
vc,lv128_BP (20000, 256, 256) image reconstructions
vc,ss128_BP (20000, 256, 256) image reconstructions
vc,ss32_BP (20000, 256, 256) image reconstructions
vc,ss64_BP (20000, 256, 256) image reconstructions
vc_BP (20000, 256, 256) image reconstructions
linear_raw (20000, 2030, 256) raw signals
ms_raw (20000, 2030, 256) raw signals
vc,lv128_raw (20000, 2030, 1024) raw signals
vc,ss128_raw (20000, 2030, 1024) raw signals
vc,ss32_raw (20000, 2030, 1024) raw signals
vc,ss64_raw (20000, 2030, 1024) raw signals
vc_raw (20000, 2030, 1024) raw signals
Table 8: Contents of SCD file.
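Assuming the 0/1/2 label encoding above, per-class binary masks (e.g. for segmentation training or class-frequency statistics) can be derived directly from the `labels` arrays. A minimal numpy sketch on a synthetic label map follows; the constants and helper are illustrative:

```python
import numpy as np

BACKGROUND, VESSEL, SKIN = 0, 1, 2  # encoding documented for SCD labels

def class_masks(labels):
    """Split an integer label map (256, 256) into one boolean mask per class."""
    return {c: labels == c for c in (BACKGROUND, VESSEL, SKIN)}

# Synthetic label map: mostly background, one vessel block, one skin band
labels = np.zeros((256, 256), dtype=np.uint8)
labels[10:12, :] = SKIN
labels[100:120, 100:120] = VESSEL

masks = class_masks(labels)
vessel_fraction = masks[VESSEL].mean()  # fraction of vessel pixels in the slice
```

Such masks can also be stacked into a one-hot tensor when training segmentation networks on SCD.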

Appendix J Considerations when using our datasets

As the volunteers in the experimental datasets are considered to be healthy, anonymization is performed one-way, and the true identity of the volunteers cannot be traced. Furthermore, to the best of our knowledge, usage of our proposed datasets cannot pose a threat to the volunteers, even with malicious intent. However, as with all clinical data, it should be acknowledged that our dataset represents a particular subset of all potential forearm images collected at the respective wavelengths and with the respective transducer arrays. Accordingly, any subsequent work that makes use of our datasets for validation purposes needs to ensure that the diversity of our datasets is sufficient for its target application. For more information, consult the Dataset Documentation, provided as separate supplementary material.

Appendix K Persistence of proposed datasets

All three datasets are hosted on Libdrive at the ETH Zurich Research Collection repository. Accordingly, the datasets have a landing page with a DOI and metadata. The data will be freely accessible with no restrictions. As per the ETH Zurich Research Collection documentation, the dataset is guaranteed a retention period of 10 years. Depending on the frequency of usage by the community, the Research Collection may continue to host the data beyond the 10-year period.

Appendix L Author statement on data license

The dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).

Appendix M Training and evaluation code

In the oadat-evaluate repository, along with saved model weights, we provide all necessary scripts to train modUNet from scratch for all 18 experiments as well as to evaluate them over the whole test set. We provide standalone examples on the repository landing page that show how to:
(i) load a pretrained model, for example, for inference or fine-tuning,
(ii) train modUNet from scratch for either of the two tasks for a given experiment,
(iii) evaluate a serialized model, whether it is one of the pretrained models we provide as is or any other TensorFlow model that is already serialized, and
(iv) use our provided data loader to read any sample from OADAT.
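For step (iii), evaluation ultimately reduces to comparing model outputs against reference reconstructions over the test set. Below is a minimal, framework-agnostic sketch of such a metric loop on toy data; it is not the oadat-evaluate API, and PSNR merely stands in for whichever metrics the repository reports:

```python
import numpy as np

def psnr(ref, pred, data_range=None):
    """Peak signal-to-noise ratio between a reference and a predicted image."""
    if data_range is None:
        data_range = ref.max() - ref.min()
    mse = np.mean((ref - pred) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy "test set": reference images and slightly noisy predictions
rng = np.random.default_rng(0)
refs = rng.random((8, 256, 256)).astype(np.float32)
preds = refs + 0.01 * rng.standard_normal(refs.shape).astype(np.float32)

scores = [psnr(r, p) for r, p in zip(refs, preds)]
mean_psnr = float(np.mean(scores))
```

Aggregating per-slice scores like this over a whole test split is the pattern the benchmark evaluations follow, regardless of the specific metric chosen.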