1 Introduction
Ultrasound (US) is one of the most widely adopted medical imaging technologies, being non-ionizing, affordable, portable, and real-time. However, US imaging involves challenges in probe navigation and image interpretation, and thus requires comprehensive training. Traditional training with synthetic phantoms is often unrealistic, and training on cadavers or patients raises ethical issues. In contrast, model-based numerical US simulation allows for training in virtual reality, enabling exposure to complex medical scenarios and rare pathologies. Thus, developing US echography simulators that can produce images indistinguishable from real ones in real-time has been a major research interest.
Wave-based methods [1] for US simulation model complex full-wave propagation, which is not suitable for real-time image synthesis. Convolution-based approaches [2] model the entire pulse-transmit and echo-beamforming pipeline with a spatially-varying point-spread function (PSF), whose convolution with a particular tissue representation then yields the desired image. Ray-based methods [3] can, in addition, compute the incident acoustic power by tracing the complex reflections and refractions that the wavefront undergoes during large-scale interactions with geometrical boundaries of anatomical structures.
US speckle texture is the interference pattern resulting from echoes scattered by uncountably many sub-wavelength tissue structures, such as cell nuclei, organelles, and large proteins, often referred to as "scatterers". Faithful simulation of US texture is crucial, as it carries tissue-specific descriptive and diagnostic information. A realistic texture appearance, however, requires a viable tissue (scatterer) representation. This paper focuses on extracting such scatterer maps from given image examples.
Although a scatterer representation can be defined as a generative model parametrized by a statistical (e.g. Gaussian) distribution [4], arbitrary tissue textures, especially those with structural information, do not necessarily follow well-defined parametric models. Even if they did, e.g. for simple homogeneous tissue patches, finding a realistic parameter instantiation by trial-and-error would be tedious. The scatterer maps estimated herein would enable setting such models automatically. Accordingly, in this work we propose to construct a tissue scatterer model given a B-mode image of the region of interest, as illustrated in Fig. 1. In a different context, this is also known as ultrasound deconvolution [5], and it involves the solution of an inverse problem [6]. State-of-the-art methods for this problem use tedious optimization approaches, which scale poorly with problem size and are thus not extendable to large images or three dimensions. We propose a novel pipeline that formulates the inverse problem of US simulation as an image translation task. To that end, we adopt a learning approach based on conditional Generative Adversarial Networks [7]. Due to the lack of ground-truth scatterer maps, we employ a synthetically generated training dataset. We utilize CT images to generate artificial scatterer maps, as the Hounsfield units of CT images relate to tissue density, and hence to the acoustic impedance variations that cause perturbations and scattering during US echo propagation, while also capturing a broad range of anatomical visual semantics. To the best of our knowledge, this is the first work to tackle the inverse problem of US simulation with a learning-based approach.
2 Background
Forward Ultrasound Simulation. The US scattering response from sub-wavelength particles can be simulated via convolution with a PSF. A set of infinitesimally small points (scatterers) in space can be used as an abstract tissue representation (scatter map). Each scatterer has an amplitude in (0, 1], defining the proportion of incident acoustic power that it scatters back. We call this scatter map s(x, y, z), where x, y, and z are the lateral, elevational, and axial coordinates. A spatially-varying PSF h(x, y, z) can be estimated from transducer and imaging parameters, and then convolved with s to obtain a modulated radio-frequency (RF) signal
r(x, z) = (h * s)(x, y, z) |_{y = y0} + η(x, z) ,    (1)

which gives the image plane at y = y0 resulting from convolution with a 3D PSF, distorted by noise η. The Field II toolbox [2] allows for computing such PSFs. To generate a grayscale B-mode image B, demodulation (envelope extraction) and dynamic-range (logarithmic) compression are applied to r.
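As a rough illustration of this forward model (not the Field II implementation), the convolution, envelope-detection, and log-compression steps can be sketched with NumPy/SciPy; all function names, the toy PSF, and the dynamic-range value below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import fftconvolve, hilbert

def simulate_bmode(scat_map, psf, dyn_range_db=60.0):
    """Sketch of convolution-based US simulation: RF = PSF * scatterers
    (Eq. 1, noise-free 2D slice), then envelope detection and dynamic-range
    compression. Illustrative only, not the Field II pipeline."""
    rf = fftconvolve(scat_map, psf, mode="same")       # spatial convolution
    env = np.abs(hilbert(rf, axis=0))                  # envelope along the axial axis
    env = env / (env.max() + 1e-12)                    # normalize before log compression
    bmode = 20.0 * np.log10(env + 1e-12)               # dB scale
    bmode = np.clip(bmode, -dyn_range_db, 0.0) + dyn_range_db
    return bmode / dyn_range_db                        # grayscale image in [0, 1]

# toy example: sparse random scatterers and a Gaussian-windowed sinusoid as a crude PSF
rng = np.random.default_rng(0)
scat = rng.random((256, 64)) * (rng.random((256, 64)) < 0.05)
z = np.arange(-16, 17)[:, None]
x = np.arange(-8, 9)[None, :]
psf = np.exp(-z**2 / 50.0) * np.cos(0.8 * z) * np.exp(-x**2 / 20.0)
img = simulate_bmode(scat, psf)
```

The axial oscillation of the PSF is what produces the characteristic speckle interference once many sub-resolution scatterers overlap within one resolution cell.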
Inverse Problem of Scatterer Reconstruction. Finding a scatter map given an RF image can be seen, in discrete space, as the inverse problem of deconvolution [5, 6], i.e. r = Hs + η, where H is the convolution matrix induced by the PSF, s is a vector of scatterer amplitudes, r is the resulting RF image, and η is the noise vector. Given the much higher resolution of the scatterer space w.r.t. the image discretization, the problem is underdetermined, i.e. a solution may not faithfully represent the tissue under different imaging settings, such as viewing directions. Mattausch et al. [6] proposed multiple acquisitions of the same field-of-view via electronic beam steering, to obtain multiple equations H_k s = r_k for a better-constrained problem:

ŝ = argmin_s  Σ_k || H_k s − r_k ||₂² + γ || s ||₁ ,    (2)

where γ is a regularization parameter. This is solved using the Alternating Direction Method of Multipliers (ADMM), which yields a robust reconstruction of sparse scatterers on a discrete grid.
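An objective of this multi-view regularized least-squares form can be illustrated with a compact iterative shrinkage-thresholding (ISTA) sketch on toy data; note that [6] solves it with ADMM, so ISTA here is only a simple stand-in minimizing the same kind of objective, with illustrative names and sizes:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_deconv(H_list, r_list, gamma=0.01, n_iter=1000):
    """ISTA sketch for  min_s  sum_k ||H_k s - r_k||_2^2 + gamma ||s||_1,
    stacking the steered views into one linear system."""
    H = np.vstack(H_list)
    r = np.concatenate(r_list)
    L = 2.0 * np.linalg.norm(H, 2) ** 2        # Lipschitz constant of the gradient
    s = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ s - r)         # gradient of the data term
        s = soft_threshold(s - grad / L, gamma / L)
    return s

# toy check: recover a sparse amplitude vector from two noiseless "steered" views
rng = np.random.default_rng(1)
s_true = np.zeros(50)
s_true[[5, 20, 33]] = [0.8, 0.5, 0.9]
H1, H2 = rng.standard_normal((40, 50)), rng.standard_normal((40, 50))
s_hat = ista_deconv([H1, H2], [H1 @ s_true, H2 @ s_true])
```

With a single 40-equation view the system is underdetermined; stacking the second view makes it well-posed, mirroring the motivation for beam steering above.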
3 Methods
We use a learning-based approach to directly map any B-mode image B to its estimated scatter map Ŝ. Assuming a discrete scatter-map representation, we adopt the conditional GAN approach pix2pix [7], which learns an image-to-image translation map, assuming the input and output images are spatially aligned. Conditional GANs learn a mapping G: {x, z} → y, where z is a random noise input, x is an input image used to condition G, and y is the output image. Training the pix2pix GAN requires a dataset consisting of {B-mode image, scatter map} pairs. Since scatterer maps are abstract constructs, and no ground-truth maps are known for phantom, let alone in-vivo, images, we resort to creating our own paired data via simulations using Field II [2]. The challenge lies in creating a training set rich in visual features for the algorithm to learn from, without overfitting to implicit statistical distributions induced by the simulation process in Field II, so as to facilitate generalization to in-vivo data.
Training Set Creation. N scatterers were instantiated in space uniformly at random. Given the physical simulation domain size, N is set for a scatterer density of 20 scatterers per resolution cell, in order to ensure fully-developed speckle for the given imaging center frequency and settings. We assign each scatterer at position p a random amplitude a sampled from a normal distribution controlled by an image I via

a ~ N(μ(p), σ²)  with  μ(p) = I(p) ,    (3)

where I is normalized to [0, 1]. The distribution is controlled via μ and σ such that the sampled value a ∈ [0, 1] with a probability of 90% for all values of μ. Values outside this range are clipped to [0, 1], which slightly distorts the normal distribution of amplitudes. Linear interpolation is used to map discrete image pixel locations to continuous scatterer positions, as input to Field II.
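A minimal sketch of this amplitude-sampling step follows; the function name, the `sigma` value, and the bilinear interpolation details are illustrative assumptions (the actual positions and amplitudes are handed to Field II in our pipeline):

```python
import numpy as np

def image_to_scatterers(img, n_scat, sigma=0.05, seed=0):
    """Draw scatterer positions uniformly at random, look up the mean
    amplitude from a [0,1]-normalized image via bilinear interpolation,
    perturb with Gaussian noise (Eq. 3), and clip to [0, 1]."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    ys = rng.uniform(0, h - 1, n_scat)          # continuous scatterer positions
    xs = rng.uniform(0, w - 1, n_scat)
    # bilinear interpolation of the image at the continuous positions
    y0, x0 = ys.astype(int), xs.astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    mu = (img[y0, x0] * (1 - wy) * (1 - wx) + img[y1, x0] * wy * (1 - wx)
          + img[y0, x1] * (1 - wy) * wx + img[y1, x1] * wy * wx)
    amp = np.clip(rng.normal(mu, sigma), 0.0, 1.0)   # sample and clip amplitudes
    return np.stack([xs, ys], axis=1), amp

# toy usage: a smooth intensity ramp standing in for a normalized CT slice
img = np.linspace(0.0, 1.0, 100).reshape(10, 10)
pos, amp = image_to_scatterers(img, n_scat=1000)
```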
Initially, we created I procedurally as additive or multiplicative combinations of various primitive shapes of different intensities. However, this proved insufficient for the learning algorithm to generalize well to in-vivo images. Based on this preliminary observation, we eventually used a set of in-vivo CT images, from which I is drawn randomly. This allows I to naturally present a wide variety of realistic visual features and to relate directly to the acoustic impedance variations that modulate US propagation in tissue.
Network Training.
Our network architecture and optimization follow the original pix2pix approach [7]. A 7-layer U-Net was used as the generator, and a 4-layer PatchGAN [7] as the discriminator. The generator was trained to minimize the L1 loss between real and generated scatter images, and the discriminator to maximize the classification accuracy between real and generated scatter images, with the optimization alternating between the two. Two networks were trained, namely "ScatGAN1" and "ScatGAN3". The training set is identical for both and consists of B-mode images of tissue regions imaged at three beam-steering angles. ScatGAN1 takes only a single B-mode image as input, while ScatGAN3 takes a concatenation of three B-mode images of the same tissue (at the three angles). Example results from ScatGAN1 are shown in Fig. 3.
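For reference, the pix2pix generator objective combines an adversarial term on the PatchGAN's per-patch outputs with a lambda-weighted L1 term (lambda = 100 in [7]). The function below is a numerical sketch of that objective on raw arrays, not our training code; names and shapes are illustrative:

```python
import numpy as np

def generator_loss(d_fake_logits, g_out, target, lam=100.0):
    """Sketch of the pix2pix generator objective:
    BCE (with 'real' labels) on the discriminator's per-patch logits for the
    generated sample, plus lambda * L1 between generated and reference maps."""
    # stable BCE toward label 1:  -log(sigmoid(logit)) = log(1 + exp(-logit))
    adv = np.mean(np.logaddexp(0.0, -d_fake_logits))
    l1 = np.mean(np.abs(g_out - target))   # pixel-wise reconstruction term
    return adv + lam * l1
```

With logits at 0 (an undecided discriminator) and a perfect reconstruction, the loss reduces to log 2, which is a handy sanity check when wiring up training.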
4 Experiments and Results
The ultimate goal of a scatterer map is that a realistic B-mode image can be simulated from it. Therefore, for evaluation we follow a scheme similar to [6]: we study B-mode image reconstruction errors and similarity by comparing a given B-mode image with its corresponding simulated version from scatterers predicted by ScatGAN. For this, given a B-mode image B, ScatGAN inference is run to obtain a scatter-map estimate Ŝ. This discrete image is then converted to a scatterer point cloud, as described above for Field II scatterer preparation. Subsequently, a B-mode image B̂ is simulated using Field II and compared with B using various similarity metrics. We use three image metrics m: signal-to-noise ratio SNR = μ/σ, mean image intensity MII, and contrast-to-noise ratio CNR = |μ_b − μ_d| / sqrt(σ_b² + σ_d²), where b and d denote selected bright and dark regions. These metrics are compared between B and B̂, providing normalized errors (incompatibilities) defined as ε = |m(B) − m(B̂)| / m(B). We also compare the histograms (50 bins) of B and B̂ using a histogram-distance metric.
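The metric definitions above can be sketched as follows; the region masks and the toy values are assumptions for illustration:

```python
import numpy as np

def snr(img):
    """Signal-to-noise ratio: mean over standard deviation of the image."""
    return img.mean() / img.std()

def cnr(img, bright, dark):
    """Contrast-to-noise ratio between bright and dark regions
    (boolean masks): |mu_b - mu_d| / sqrt(var_b + var_d)."""
    mb, md = img[bright].mean(), img[dark].mean()
    return abs(mb - md) / np.sqrt(img[bright].var() + img[dark].var())

def norm_error(m_ref, m_sim):
    """Normalized incompatibility |m(B) - m(B_hat)| / m(B)."""
    return abs(m_ref - m_sim) / abs(m_ref)

# toy usage on a 2x2 image with an upper (dark) and lower (bright) half
img = np.array([[1.0, 2.0], [5.0, 6.0]])
bright = np.array([[False, False], [True, True]])
dark = ~bright
```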
Numerical experiments. For the in-silico tests, we created a numerical phantom with a circular inclusion of 7 mm radius, with an average scatterer intensity of 10% of that of the background. We simulated RF data in Field II for a linear array imaging at 5 MHz center frequency. We rotated the probe around the sample to collect 31 unique B-mode images from different viewing directions, see Fig. 4. In Experiment 1, we run ScatGAN1 separately on each of the 31 B-mode images, to obtain 31 scatter maps. We then simulate each of these at its corresponding US probe angle, and compare the 31 simulated images with the 31 original B-mode images. Note that this merely tests our upper reconstruction bound, and is almost an "inverse crime" (although Field II and our ScatGAN reconstruction are not exact inverses of each other). Fig. 5 (left) shows no correlation of error metrics with probe angle, indicating that the reconstruction error is invariant across different images of a similar field-of-view. In Experiment 2, we run ScatGAN1 on the B-mode image simulated at a single probe angle, to obtain a single scatter map. Then, we simulate B-mode images at all 31 angles from this single map. This is a realistic setting, where a single scatterer representation is derived for a tissue region in order to simulate arbitrary viewing angles. Note that changing the scatterers at different angles, as in Experiment 1, creates flickering and discontinuous temporal sequences due to the temporal incoherence of independent reconstructions. Error metrics remain relatively similar across the range of probe rotations in Fig. 5 (middle), indicating that the reconstruction is robust for simulating images at other angles.
Multiple viewing angles were found to bring robustness with optimization-based methods [6]. In Experiment 3, we use the ScatGAN3 network (three-image input), where the output is a single scatterer map that explains these three views together. This scatter map is then used to simulate all 31 B-mode images. ScatGAN3 did not provide extra robustness over ScatGAN1, see Fig. 5 (right) vs. (middle). This is likely due to the network capacity being reached, or to a larger training set being needed for this setting.
In-vivo image results. In Experiment 4, we run the network on a real in-vivo B-mode image. From the resulting scatterer map, a B-mode image was simulated at the original center frequency of 10 MHz, with successful reconstructions seen in Fig. 6.
Table 1. Comparison of normalized errors and runtimes.

Metric       | ScatGAN1    | ScatRec1    | ScatRec7
             | mean   max  | mean   max  | mean   max
MII [%]      | 3.9    12.5 | 34.2   65.0 | 2.3    –
CNR [%]      | 7.3    11.5 | 15.1   40.8 | 2      5
SNR [%]      | 2.2    8.1  | 8      15.2 | 2      6
Hist. dist.  | 1.85   2.3  | 4      7    | 1.2    2
Runtime [s]  | 0.3    –    | 1380   –    | 9900   –
Comparison with Related Work. We quantitatively compare our results with those of [6], which performed a similar set of experiments. Their optimization-based technique is called "ScatRec1" when reconstructing scatter maps from a single B-mode image, and "ScatRec7" when reconstructing them from 7 images taken at regular beam-steering angles. Comparing the methods with single-image input (ScatGAN1, ScatRec1), our pipeline is substantially better for all metrics and is faster, see Table 1. The speed gain mainly comes from the fact that we use a pre-learned function to approximate the deconvolution, instead of solving a large system of equations for each frame. Both ScatRec methods face difficulties in generalizing to new beamforming angles, resulting in high maximum errors, whereas in our method new beam incidence angles have little impact on performance. ScatGAN1 performs comparably to ScatRec7, with normalized errors mostly within a few percentage points of each other, except for CNR. Differences can be attributed partially to the randomness introduced during the dataset generation process and to the limited size of our training set. Our method has the advantage of not requiring the acquisition of 7 views per tissue region, besides being faster. This is especially useful when reconstructing large scatter maps for 3D B-mode volumes.
5 Conclusion
We have devised a learning-based pipeline for solving the inverse problem of US image simulation, which delivers a US tissue scatter map given an input B-mode image. Our method has been shown to be relatively robust to changes in viewing parameters and US probe settings, such as transducer rotation relative to the tissue region of interest. It performs well on synthetic data and generalizes to in-vivo data with relative ease, while remaining orders of magnitude faster at inference time than the state-of-the-art optimization-based alternative [6]. Our fast runtime makes an extension to 3D feasible, which would then enable the determination of realistic input scatter maps for ray-based training simulations [3].
References
 [1] MD Verweij, BE Treeby, KWA Van Dongen, L Demi, and A Brahme, "Simulation of ultrasound fields," Comprehensive Biomedical Physics, pp. 465–499, 2014.
 [2] JA Jensen, "Simulation of advanced ultrasound systems using Field II," in IEEE ISBI, 2004, pp. 636–639.
 [3] O Mattausch, M Makhinya, and O Goksel, "Realistic ultrasound simulation of complex surface models using interactive Monte-Carlo path tracing," Computer Graphics Forum, vol. 37(1), pp. 202–213, 2018.
 [4] O Mattausch and O Goksel, "Scatterer reconstruction and parametrization of homogeneous tissue for ultrasound image simulation," in EMBS, 2015, pp. 6350–6353.
 [5] T Taxt, "Three-dimensional blind deconvolution of ultrasound images," IEEE T Ultrason Ferr Freq Cont, vol. 48(4), pp. 867–871, 2001.
 [6] O Mattausch and O Goksel, "Image-based reconstruction of tissue scatterers using beam steering for ultrasound simulation," IEEE T Med Imaging, vol. 37, pp. 767, 2018.

 [7] P Isola, JY Zhu, T Zhou, and AA Efros, "Image-to-image translation with conditional adversarial networks," in IEEE CVPR, 2017, pp. 5967–5976.