Automatic Frame Selection using CNN in Ultrasound Elastography

02/17/2020 ∙ by Abdelrahman Zayed, et al. ∙ Université de Montréal, Concordia University

Ultrasound elastography is used to estimate the mechanical properties of the tissue by monitoring its response to an internal or external force. Different levels of deformation are obtained from different tissue types depending on their mechanical properties, where stiffer tissues deform less. Given two radio frequency (RF) frames collected before and after some deformation, we estimate displacement and strain images by comparing the RF frames. The quality of the strain image depends on the type of motion that occurs during deformation. In-plane axial motion results in high-quality strain images, whereas out-of-plane motion results in low-quality strain images. In this paper, we introduce a new method using a convolutional neural network (CNN) to determine the suitability of a pair of RF frames for elastography in only 5.4 ms. Our method could also be used to automatically choose the best pair of RF frames, yielding a high-quality strain image. The CNN was trained on 3,818 pairs of RF frames, while testing was done on 986 new unseen pairs, achieving an accuracy of more than 91% on the test data.




I Introduction

Ultrasound has numerous applications in the diagnosis and treatment of different diseases. Ultrasound elastography is a branch of ultrasound that studies the mechanical properties of the tissue, such as strain. A detailed review of elastography and its clinical applications can be found in [brian_anthony, app1, app2, app4, app5, application].

Ultrasound elastography can be classified into two types: quasi-static and dynamic [j2011recent]. In the first type, the deformations are very slow and therefore tissue dynamics can be ignored [parker2010imaging, ophir1999elastography, treece2011real]. Freehand quasi-static imaging does not need any additional hardware and, as such, is very common (Fig. 1). The second type is dynamic elastography, where waves created by either the imaging system or natural pulsations (caused, for example, by heartbeats) are tracked. In both types, the response of the tissue to external or internal forces is used to determine its mechanical properties. This is done by obtaining the displacement image, which shows the motion of every sample in the radio frequency (RF) frame during the deformation. We focus on quasi-static freehand strain imaging in this paper, where the strain image is computed by spatially differentiating the displacement image.

To estimate the strain image, we need two RF frames collected before and after applying the external force. One of the problems that freehand ultrasound elastography faces is the difficulty of choosing suitable RF frames to estimate the strain. If the two RF frames are collected from the same plane and the force is purely axial, they will yield a high-quality strain image. The operator therefore needs to be an expert in performing the freehand palpation, which makes this technique very user-dependent. To solve this problem and make the data collection procedure independent of the user’s experience, Ranger et al. [3d_brian_anthony] used a 3D camera to track and compensate for any undesired motion that could happen during the data collection. Another approach, by both Foroughi et al. [foroughi2013freehand] and Rivaz et al. [rivaz2009tracked], depends on external trackers to collect information about the exact location of the RF frame. By doing this, they can find the RF frames that lie in the same plane, so that they can choose a suitable pair according to some cost function. Aalamifar et al. [robot] used a robot for collecting RF frames. They estimate a transformation matrix that transforms the RF frames collected from the robot’s tooltip to the ultrasound image frame, using an active echo element.

Although the previously mentioned methods did improve the quality of the strain image, they all need an external device, which complicates the process of data collection and makes it more expensive. Herein, we introduce a novel method using a convolutional neural network (CNN) to determine whether a specific pair of RF frames is suitable for elastography. Although we focus on quasi-static elastography, the method can also be applied to other types of elastography.

II Methods

In this section, we will discuss data collection for training and testing, and the CNN architecture used. Our model is simply a binary classifier, which is used to determine the suitability of a pair of RF frames for strain estimation.

Our proposed technique can also be used to automatically find the best RF frame to pair with a specific pre-selected RF frame. The model does so by searching a window of neighbouring RF frames (in this work, 8 frames before and 8 frames after the pre-selected RF frame).
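The window search described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `best_partner`, the `Frame` type and the `good_probability` callable (standing in for the trained classifier's "good pair" output) are hypothetical, and ranking candidates by that probability is an assumption rather than something the text specifies.

```python
from typing import Callable, Optional, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # hypothetical 2D RF frame type


def best_partner(
    frames: Sequence[Frame],
    ref_index: int,
    good_probability: Callable[[Frame, Frame], float],
    half_window: int = 8,
) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the best-scoring partner frame, or None."""
    # Clip the search window to the bounds of the recorded sequence.
    first = max(0, ref_index - half_window)
    last = min(len(frames) - 1, ref_index + half_window)
    best = None
    for j in range(first, last + 1):
        if j == ref_index:
            continue  # a frame is never paired with itself
        score = good_probability(frames[ref_index], frames[j])
        # Assumption: a higher "good pair" probability means a better pair.
        if best is None or score > best[1]:
            best = (j, score)
    return best
```

With `half_window=8`, at most 16 candidate pairs are scored per pre-selected frame, so the search stays cheap as long as a single classification is fast.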

II-A Data Collection

The data used for training and testing the algorithm includes both phantom and in vivo data. For the phantom data used in this paper, 4,116 pairs of RF frames were collected at Concordia University’s PERFORM Centre from 3 different CIRS phantoms (Norfolk, VA), namely Models 040GSE, 039 and 059 at different locations. 3,290 pairs out of the total data were used for training and validation with a ratio of 80:20, and the remaining data was used for testing. The ultrasound device used was the 12R Alpinion ultrasound machine (Bothell, WA) with an L3-12H high density linear array probe at a center frequency of 8.5 MHz and sampling frequency of 40 MHz. For the in vivo data, 688 pairs of RF frames were collected at Johns Hopkins Hospital from different patients who were undergoing liver ablation for primary or secondary liver cancers. Detailed information about this data is available in [rivaz2011real]. 528 pairs out of the 688 pairs were used for training and validation with a ratio of 80:20, leaving the rest of the pairs for testing. The labelling of the data was done as described in Algorithm 1.
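As a quick consistency check, the pair counts quoted above can be added up. The variable names are illustrative only; the numbers are taken from the text.

```python
# Pair counts as stated in the text.
phantom_total, phantom_trainval = 4116, 3290
in_vivo_total, in_vivo_trainval = 688, 528

# Pairs left over for testing in each group.
phantom_test = phantom_total - phantom_trainval
in_vivo_test = in_vivo_total - in_vivo_trainval

# Totals across phantom and in vivo data.
trainval_pairs = phantom_trainval + in_vivo_trainval
test_pairs = phantom_test + in_vivo_test
print(trainval_pairs, test_pairs)
```

The totals come to 3,818 training and validation pairs and 986 test pairs, which agrees with the figures given in the abstract.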

Algorithm 1 Labelling the dataset for the CNN classifier
1: procedure
2:     RF frames $I_1$ and $I_2$ are passed to PCA-GLUE [zayed2019fast, hashemi2017global] to obtain the displacement image.
3:     $I_2$ is deformed and interpolated according to the computed displacement image, yielding $I_2'$.
4:     We partition $I_1$ and $I_2'$ into 9 windows.
5:     Normalized Cross Correlation (NCC) is calculated between every window in $I_1$ and its corresponding window in $I_2'$, resulting in 9 different NCCs.
6:     The final decision is 1 if both the smallest NCC is higher than 0.9 and the absolute value of the average displacement is more than 0.5 pixels, and 0 otherwise.
7: end procedure
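Steps 4 to 6 of Algorithm 1 can be sketched as below. This is a minimal illustration, not the authors' code: the displacement image and the warped second frame are taken as inputs because PCA-GLUE is not reproduced here, the 3 by 3 window layout and the zero-mean form of NCC are assumptions, and all function names are hypothetical. Plain Python lists keep the example self-contained; an array library would be the practical choice for real RF frames.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross correlation of two equally sized windows."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat window: treat as uncorrelated
    return sum(x * y for x, y in zip(da, db)) / denom


def windows(img: Image, grid: int = 3) -> List[List[float]]:
    """Split an image into grid x grid windows, each flattened to a list."""
    rows, cols = len(img), len(img[0])
    out = []
    for i in range(grid):
        r0, r1 = i * rows // grid, (i + 1) * rows // grid
        for j in range(grid):
            c0, c1 = j * cols // grid, (j + 1) * cols // grid
            out.append([v for row in img[r0:r1] for v in row[c0:c1]])
    return out


def label_pair(frame1: Image, warped_frame2: Image, displacement: Image,
               ncc_min: float = 0.9, disp_min: float = 0.5) -> int:
    """Return 1 for a suitable pair and 0 otherwise (steps 4 to 6)."""
    # Steps 4 and 5: one NCC value per pair of corresponding windows.
    nccs = [ncc(w1, w2)
            for w1, w2 in zip(windows(frame1), windows(warped_frame2))]
    # Step 6: worst window must match well and the motion must be non-trivial.
    flat = [d for row in displacement for d in row]
    mean_disp = sum(flat) / len(flat)
    return int(min(nccs) > ncc_min and abs(mean_disp) > disp_min)
```

Using the smallest of the 9 NCC values means that a single poorly matched region is enough to reject the pair, while the displacement threshold rejects pairs with almost no deformation.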

It is important to note that steps 2 and 3 in Algorithm 1 are computationally expensive. As such, they cannot be performed in real time for selecting optimal pairs of RF data. Our proposed method performs these steps only when labelling the training data, and encodes the results into a computationally efficient CNN.

II-B Architecture

Suppose we have two RF frames $I_1$ and $I_2$, and we would like to determine the suitability of this pair for strain estimation. We simply input the two frames to the CNN classifier on two different channels, and the output is a binary number, 1 or 0. The architecture used is relatively simple, as shown in Fig. 2. Every convolutional layer has a Rectified Linear Unit (ReLU) as the activation function, and is followed by batch normalization. The activation function in the output layer is a softmax, where the output values in the two nodes represent the probabilities of having a good and a bad pair, respectively. The applied optimization technique is the Adam optimizer [kingma2014adam] with a learning rate of … and a cross-entropy loss function. The CNN code is written in Python using Keras.
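To make the output layer and the loss concrete, the following sketch computes a two-node softmax and the cross-entropy loss for a single pair in plain Python. It is illustrative only: the logit values are made up, the helper names are hypothetical, and in practice Keras provides both operations.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw output-node values into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], true_class: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[true_class], 1e-12))


# Hypothetical logits for one pair; node 0 = "good pair", node 1 = "bad pair".
probs = softmax([2.0, -1.0])
decision = 1 if probs[0] > probs[1] else 0  # 1 means the pair is suitable
loss = cross_entropy(probs, true_class=0)   # small when the "good" node wins
```

The binary decision is simply the more probable of the two nodes, and the loss shrinks towards zero as the probability of the correct node approaches 1.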

II-C Training and testing time

The labelling of the data, which includes applying Algorithm 1 on every single pair of RF frames, took 22 hours. Most of this time was spent on displacement estimation (step 2) and interpolating RF data (step 3). The actual training of the CNN took 7.4 minutes on a 7th generation 3.4 GHz Intel Core i5 desktop with an NVIDIA TITAN V GPU. Inference is very fast, and only takes 5.4 ms to classify two frames of size 2304 by 384. The frames are downsampled by a factor of 2 in the axial direction, to generate smaller input images for the CNN. Note that in comparison, doing steps 2, 3, 5 and 6 in Algorithm 1 for two frames of the same size takes 6.21 seconds, 14.04 seconds, 46.87 ms and 2.45 ms respectively, for a total run-time of 20.3 seconds. In other words, frame selection with the CNN is more than 3,700 times faster. It is important to note that the CNN computations are performed on a GPU, whereas the steps in Algorithm 1 use a CPU.
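The speed-up figure can be reproduced with a few lines of arithmetic. The sketch assumes that the times for steps 5 and 6 are in milliseconds, which is the reading under which the stated 20.3-second total holds; the variable names are illustrative.

```python
# Per-step CPU times for Algorithm 1 in seconds (steps 2, 3, 5 and 6).
# Assumption: the last two values are milliseconds, converted here.
step_times_s = [6.21, 14.04, 46.87e-3, 2.45e-3]
cnn_inference_s = 5.4e-3  # CNN inference time per pair (5.4 ms)

algorithm_total_s = sum(step_times_s)
speedup = algorithm_total_s / cnn_inference_s
print(f"{algorithm_total_s:.1f} s total, speed-up of about {speedup:.0f}x")
```

The total rounds to 20.3 seconds and the ratio is a little under 3,760, consistent with the claim of more than 3,700 times faster.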

III Experiments and Results

In this section, we compare our CNN frame selection method to other methods that choose to pair an RF frame with another by simply skipping one or two frames.

Fig. 3 shows the output of different frame selection methods when tested on one of the phantom datasets. Our automatic frame selection substantially outperforms the fixed skip frame pairing methods, as it chooses more suitable frames and therefore yields better-quality strain images. Table I shows the accuracy as well as the F1-measure obtained from our CNN classifier on new phantom datasets that were not used during training. The results demonstrate the ability of the classifier to generalize to unseen data.

TABLE I: The accuracy and F1-measure of our CNN classifier on the phantom and in vivo test data.
Dataset              Size             Accuracy    F1-measure
Phantom dataset 1    228 instances    96.77%      93.68%
Phantom dataset 2    297 instances    91.7%       89.17%
Phantom dataset 3    301 instances    96%         96%
In vivo dataset      160 instances    95.24%      92%
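For reference, the two metrics reported in Table I can be computed from true and predicted labels as sketched below. The function name is hypothetical, the example labels are made up, and treating "good pair" (label 1) as the positive class for the F1-measure is an assumption.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int],
                    y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1-measure for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Made-up labels for five pairs: one good pair is missed by the classifier.
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```

Reporting the F1-measure next to accuracy matters when good and bad pairs are not equally frequent, since accuracy alone can look high for a classifier that favours the majority class.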

Fig. 4 shows a comparison between the performance of our method and the fixed skip frame pairing on the in vivo dataset. Table I also reports the accuracy and the F1-measure obtained from our method on the in vivo test data. Again, our CNN-based method performs substantially better.

IV Conclusion

In this paper, we introduced a new method based on a CNN to automatically choose RF frames that are suitable for strain estimation. Our method is fast, practical and does not need any external hardware. Therefore, it could be used commercially to generate high-quality strain images even when used by an inexperienced operator.


The in vivo data was collected at Johns Hopkins Hospital. The authors would like to thank the principal investigators Drs. E. Boctor, M. Choti and G. Hager who provided us with the data. We would like to thank Morteza Mirzaei for providing us with some of the phantom data used in this paper. The authors also acknowledge NVIDIA for donating the graphics card.