Extraction of Vascular Wall in Carotid Ultrasound via a Novel Boundary-Delineation Network

by   Qinghua Huang, et al.
Tencent QQ

Ultrasound imaging plays an important role in the diagnosis of vascular lesions. Accurate segmentation of the vascular wall is important for the prevention, diagnosis and treatment of vascular diseases. However, existing methods have inaccurate localization of the vascular wall boundary. Segmentation errors occur in discontinuous vascular wall boundaries and dark boundaries. To overcome these problems, we propose a new boundary-delineation network (BDNet). We use the boundary refinement module to re-delineate the boundary of the vascular wall to obtain the correct boundary location. We designed the feature extraction module to extract and fuse multi-scale features and different receptive field features to solve the problem of dark boundaries and discontinuous boundaries. We use a new loss function to optimize the model. The interference of class imbalance on model optimization is prevented to obtain finer and smoother boundaries. Finally, to facilitate clinical applications, we design the model to be lightweight. Experimental results show that our model achieves the best segmentation results and significantly reduces memory consumption compared to existing models for the dataset.



1 Introduction

In 2018, 17.9 million people died from cardiovascular disease (CVD) worldwide, 16.2% more than reported in 2006 [1]. Cardiovascular disease has become the leading cause of death and disability worldwide, and its burden will continue to grow. In low- and middle-income countries in particular, the incidence of CVD is increasing every year [2]. There is an independent association between vascular and cardiovascular disease risk [3]. Vascular examinations are therefore a major part of cardiovascular disease screening, and the analysis of blood vessels is particularly important. The prevention of cardiovascular diseases is urgent. In the diagnosis of blood vessels, the vascular wall is a good reflection of the structure of the vessel, and the vascular wall structure is analyzed to arrive at the final diagnosis.

The main clinical methods of vascular examination are diagnostic pathology, ultrasound, CT and MRI [4]. Ultrasound imaging techniques are widely used because of their safety, low cost and non-invasive nature [5]. An ultrasound examination of blood vessels requires two physicians to collaborate: one scans the patient and the other labels the images for diagnosis. Errors can occur because of the imaging quality and the subjective judgment of the physician. To reduce the burden on the physician, speed up the diagnosis and improve diagnostic accuracy, computer-aided diagnosis (CAD) systems have begun to emerge.

In the beginning, researchers mostly used traditional operators for the identification and segmentation of vascular structures. Commonly used traditional algorithms include gradient calculation, edge tracking, dynamic programming, active contours, the Hough transform, snakes, etc. [6]. For example, MA Gutierrez et al. [13] proposed an active contour improvement technique based on multi-resolution analysis. The model uses filters of different scales to extract the boundaries by derivation. MC Bastida-Jumilla et al. [14] used the Sobel operator to detect the horizontal edges of the artery and the Hough transform to detect its main direction. Finally, active contours segmented the carotid artery walls. Filippo Molinari et al. [15] treated the artery as a low-intensity lumen region enclosed by two bright borders. Many seed points were randomly generated in the image and connected to form the vascular wall structure. A discriminator was then used to reject unqualified boundaries and retain the true vascular wall boundaries.

In recent years, with the rapid development of deep learning in the field of medical image processing, researchers have started to focus their research on deep learning algorithms. Compared with traditional methods, deep learning algorithms have better robustness. Several researchers have started to use deep learning methods to solve the vascular structure segmentation problem. Researchers started to use U-Net and its variants to segment the structure of blood vessels [7, 8, 9, 10, 17]. As the research progressed, various models were proposed [11].

For example, Carl Azzopardi et al. [16] used an encoder-decoder structure to segment the vessel with two downsampling and two upsampling steps. Later, U-Net [17] was proposed to segment the cross-section of the lumen of the vessel and to derive the vascular wall structure. Xie et al. [7] segmented the longitudinal section of the lumen of a blood vessel by using a two-path U-Net model. The original image and the image after data enhancement were used as two inputs, and the features extracted from both paths were then passed to a decoder for prediction. Yang et al. [10] used U-Net as the backbone with a specially designed skip connection, in which a parallel structure refines the downsampled features. The two branches are the main branch and the detail branch. The main branch uses two convolutional layers with a 5×5 kernel size. The detail branch uses one convolutional layer with a 3×3 kernel size and one with a 1×1 kernel size. The final prediction uses a 5×5 convolution to expand the receptive field. Martin Szarski et al. [8] added coordinate information inside the U-Net model, in the hope that the spatial location dependencies of the features could be identified. The authors added CoordConv after the first downsampling of U-Net and introduced spatial coordinate information through it. The other parts remained unchanged. Lian et al. [11] used four SE-ResNeXt modules, an average pooling layer and two fully connected layers to construct the feature encoder. The decoder, combined with a DQN network, generates the mask and five key points, and the five key points are used to constrain the shape of the mask.

Previous researchers have taken a variety of approaches to improve the accuracy of vascular wall segmentation. Although the methods mentioned above have achieved good results in vascular structure segmentation, some problems remain. First, most existing methods downsample and upsample the image by convolution to obtain the segmentation result of the vascular wall. The upsampling process blurs the vascular wall boundary information, which results in inaccurate boundary locations [12]. Second, ultrasound devices do not always output high-resolution, good-quality images. Because of device noise or image distortion, many images in real situations are of poor quality, for example images with dark boundaries or discontinuous boundaries. The model should be designed with these problems in mind so that it is not affected by them. Third, vascular wall segmentation in ultrasound images suffers from class imbalance between the vascular wall and the background. If this is not handled, the segmentation accuracy cannot be optimized well during training. Finally, current segmentation algorithms have a large number of backbone parameters and therefore place high demands on hardware, which hinders clinical application and adoption.

Based on the above problems, a boundary-delineation network (BDNet) is designed to solve the vascular wall boundary delineation problem. Boundary refinement achieves precise positioning of boundary points and extraction of boundary information, which prevents the boundary from being offset. The problem of dark boundaries and discontinuous boundaries in lower-quality images is addressed by extracting multi-scale features and fusing features with different receptive fields. A combination of the Lovász-softmax loss function and the cross-entropy loss function is used to counter the class imbalance of the vascular wall segmentation task. The boundary refinement is optimized using point cross-entropy loss. To reduce the hardware requirements and the number of model parameters, we have designed the model to be lightweight. In summary, the main contributions of our paper are as follows.

  1. We adopt boundary refinement to accurately locate the boundary points of the vascular wall in ultrasound images and extract their features, which prevents inaccurate boundary positioning during the upsampling process.

  2. We introduce a multi-scale fusion mechanism and a PSP module to fuse multi-scale features and features with different receptive fields, which solves the problem of incorrect segmentation in low-quality images with dark boundaries and discontinuous boundaries.

  3. We use a new loss combination to solve the problem of class imbalance between the vascular wall and the background, which puts the focus of model optimization on the segmentation of the vascular wall.

  4. We make the model lightweight, significantly reducing the number of parameters while maintaining the model accuracy.

The rest of the paper is structured as follows. Section 2 describes the methodology of the model. Section 3 describes the details of the experiments. Section 4 provides a discussion. Section 5 draws the conclusion of the paper.

2 Methods

Figure 1: Overview of the BDNet structure. There are mainly two modules. 1) The feature extraction module extracts image features and fuses multi-scale features and different receptive field features to output fine-grained features. 2) Boundary refinement module is used to extract the boundary points, re-predict the boundary points and replace the previous coarse prediction results.

In this section, we describe the structure of BDNet in detail. BDNet consists of four modules: the feature extraction module, the boundary refinement module, the loss function module and the lightweight module. The whole model is shown in Fig. 1. The image is first input into the feature extraction module, which performs feature downsampling and feature fusion. In the feature downsampling process, the multi-scale features are saved. In the process of feature fusion, the multi-scale features are fused, and features with different receptive fields are generated and fused. Then, the output of the feature extraction module is input into the boundary refinement module for boundary position correction. The locations of suspected boundary points are extracted, and the fine-grained features of these points are re-predicted using an MLP. The results of these points in the coarse prediction are replaced and the final prediction results are output. After that, the loss of the prediction results is computed by our new loss function to better optimize the model and prevent the effect of class imbalance. Finally, we developed a lightweight downsampling module for BDNet to speed up the training and testing process, making it suitable for clinical applications and generalization.

2.1 Feature extraction

In clinical practice, there are a large number of ultrasound vascular images that are of low quality. Common problems include discontinuous boundaries and dark boundaries.

Dark boundaries are regions in which the boundary of the vascular wall has a smaller gray value than in other regions and is therefore not obvious, so it is easily ignored in the downsampling process. Undetected dark boundaries lead to broken segmentation results. Lower-level features contain more detailed information, so we can use them to prevent the loss of information on dark boundaries. Therefore, we combine low-level and high-level features into multi-scale features to predict the results.

Discontinuous boundaries are gaps in the vascular wall in the ultrasound image. They lead to fragmented segmentation results that are not conducive to application. In ultrasound images, blood vessels generally run through the whole image and the vascular wall is regular. When the vessel boundary is discontinuous, it is sometimes necessary to use information from other regions for prediction. If only downsampling is used, only information from adjacent areas can be extracted [12]. If the discontinuity is long and there is no boundary around it, the prediction fails. We can introduce more contextual information and establish spatial dependencies to prevent this failure. Therefore, we use the PSP module with four receptive fields to introduce more contextual information, so that discontinuous boundaries are corrected from both global and local perspectives.

Figure 2: The architecture of the fusion module. The left of the figure performs the fusion of multi-scale features. The right is performing the fusion of different receptive field features.

For ultrasound images, boundary discontinuities and dark boundaries are inevitable. However, we can take measures in the feature extraction module to limit their effect on the segmentation results. The feature extraction module consists of the downsampling module and the fusion module.

The purpose of the downsampling module is to extract the high-level features of the image and retain the multi-scale features. We perform five convolution calculations with a stride of 2 on the input image features, as shown in Fig. 1. The features are reduced to 1/32 of the original image size. After that, we perform another convolution calculation with a stride of 1 to embed the features. During downsampling, we preserve the features at 1/2, 1/4 and 1/8 of the original image size, together with the embedded features.
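As a quick check of the scale arithmetic above, the following minimal sketch lists the spatial size after each stride-2 convolution. The 256×256 input size and the helper name `downsample_sizes` are illustrative assumptions, and each stride-2 step is assumed to halve the size (rounding up), since the padding scheme is not stated.

```python
def downsample_sizes(height, width, num_stride2=5):
    """Return the (h, w) size after each stride-2 convolution.

    Assumes padding such that every step halves the size, rounding up.
    """
    sizes = []
    h, w = height, width
    for _ in range(num_stride2):
        h, w = (h + 1) // 2, (w + 1) // 2
        sizes.append((h, w))
    return sizes


# Hypothetical 256 x 256 input; the real ROI sizes are not given in the text.
sizes = downsample_sizes(256, 256)

# Scales kept for fusion according to the text: 1/2, 1/4, 1/8 and the
# embedded 1/32 features (the stride-1 embedding keeps the 1/32 size).
kept = {"1/2": sizes[0], "1/4": sizes[1], "1/8": sizes[2], "1/32": sizes[4]}

print(sizes)
print(kept)
```

Under these assumptions, five stride-2 steps take a 256×256 input to 8×8, which is the 1/32 scale described in the text.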

The features of 1/2, 1/4, 1/8 and 1/32 saved in the downsampling module are fused in the fusion module, as shown in Fig. 2. Before multi-scale fusion, we use an MLP layer to resize the image features of different scales to the same size so that they can be fused easily. The MLP also plays some role in fusing spatial and channel features [23]; it is used as a small auxiliary module. After that, we aggregate the multi-scale features by concatenation. The multi-scale feature fusion is done by convolution, batch normalization and ReLU activation functions.

We also need information from different receptive fields, which we obtain with the PSP module [21]. Local and global information are fused to introduce more contextual information. We set up four different fields of view, 1×1, 2×2, 3×3 and 6×6, as shown in Fig. 2. In each stage, we apply adaptive average pooling to bring the features to the target size. To facilitate feature fusion, we then resize the features to 1/8 of the original image size. After that, the features are extracted by convolution, batch normalization and ReLU activation functions, and the global information is fused with the local information. Finally, we fuse the pre-processed features with the post-processed features to avoid information loss.
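The pyramid pooling step can be illustrated with a small sketch of adaptive average pooling at the four bin sizes named above. This is a pure-Python illustration on a toy single-channel 6×6 map, not the authors' Paddle implementation; the helper names are hypothetical, and the bin boundaries follow the usual floor/ceil rule for adaptive pooling, which is an assumption here.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def adaptive_avg_pool2d(feature, bins):
    """Average-pool an H x W nested list into a bins x bins grid."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Row range covered by the i-th bin.
        r0, r1 = (i * h) // bins, ceil_div((i + 1) * h, bins)
        row = []
        for j in range(bins):
            # Column range covered by the j-th bin.
            c0, c1 = (j * w) // bins, ceil_div((j + 1) * w, bins)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Toy single-channel 6 x 6 feature map with values 0..35.
feature = [[float(r * 6 + c) for c in range(6)] for r in range(6)]

# One pooled grid per field of view: 1x1 (global) up to 6x6 (local).
pyramid = {bins: adaptive_avg_pool2d(feature, bins) for bins in (1, 2, 3, 6)}

for bins, grid in pyramid.items():
    print(bins, grid[0])
```

The 1×1 grid summarizes the whole map (global context), while the 6×6 grid keeps local detail; in the module described above, each grid would then be resized to a common size before fusion.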

2.2 Boundary refinement

The general strategy used for ultrasound image segmentation is to downsample to obtain high-density features and then upsample to the original image size. Linear interpolation during upsampling blurs the boundary information and smooths the boundary features. As a result, the area of the predicted mask becomes larger than the label. For example, after the lumen of a blood vessel is upsampled back to the original image size, its contour is offset outward relative to the true contour. The same is true for the contours of the vascular wall tissue. Softmax is therefore applied to the wrong contour, which results in inaccurate localization of the boundaries. To prevent this boundary offset, we introduce the boundary refinement module, which we implement with PointHead [12]. With boundary refinement, we can achieve precise positioning of boundary points and prevent boundary positioning errors.

Figure 3: Detailed implementation of the boundary refinement module. The process is divided into four steps: finding uncertain points, extraction of features corresponding to the points, re-prediction of the points, and replacement of the points.

The boundary refinement module is divided into four steps, as shown in Fig. 3. First, upsampling is performed on the coarse prediction results. Based on the result after upsampling, the uncertainty map of the classification result is calculated. The first $N$ uncertain points are selected according to the uncertainty map and their coordinates are determined. For each pixel, the size of the model output is $C \times 1$, where $C$ is the number of classes. The output of each pixel is denoted by $p$. The uncertainty of each pixel is calculated as follows.

$$(s_1, s_2) = \mathrm{top}_k(p), \quad k = 2$$

$$U = -(s_1 - s_2)$$

$\mathrm{top}_k$ is a function that returns the scores of the top $k$ classes with the highest scores for each pixel. $s_1$ denotes the score of the first ranked class. $s_2$ denotes the score of the second ranked class. $U$ denotes the uncertainty of the pixel.
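The selection of uncertain points can be sketched as follows. The sketch assumes the PointRend-style convention in which the uncertainty of a pixel is the negated gap between its two highest class scores, so values close to zero mark the most ambiguous pixels; the function names and the toy score map are hypothetical.

```python
def uncertainty(scores):
    """Negated gap between the two highest class scores of one pixel."""
    first, second = sorted(scores, reverse=True)[:2]
    return second - first


def most_uncertain_points(score_map, num_points):
    """Return the (row, col) coordinates of the num_points most uncertain pixels.

    score_map is an H x W nested list whose entries are lists of C class scores.
    """
    scored = [
        (uncertainty(score_map[r][c]), (r, c))
        for r in range(len(score_map))
        for c in range(len(score_map[0]))
    ]
    # Highest uncertainty first (values closest to zero).
    scored.sort(key=lambda item: item[0], reverse=True)
    return [coord for _, coord in scored[:num_points]]


# Toy 2 x 3 score map with two classes (background, vascular wall).
score_map = [
    [[0.90, 0.10], [0.55, 0.45], [0.20, 0.80]],
    [[0.95, 0.05], [0.50, 0.50], [0.10, 0.90]],
]

points = most_uncertain_points(score_map, num_points=2)
print(points)
```

In this toy map the two selected pixels are the ones whose class scores are nearly tied, which is where a boundary between the two regions would be expected.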

After determining these points, we obtain the corresponding coordinates and normalize them, so that a change in the image feature size does not affect the determination of the position. Based on the coordinates, we find the features at the corresponding positions in the coarse prediction and in the fine-grained features. The two features are concatenated along the channel dimension. Then, the fused features of each point are re-predicted using an MLP network. Finally, the coarse prediction at the corresponding position is replaced with the new prediction according to the coordinates.
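The role of normalized coordinates and the final replacement step can be sketched as below. This is an illustration under assumptions: pixel centers are used for normalization, the MLP re-prediction is stood in for by a given list `new_labels`, and all names are hypothetical.

```python
def normalize_points(points, height, width):
    """Map integer (row, col) indices to resolution-independent coordinates in [0, 1]."""
    return [((r + 0.5) / height, (c + 0.5) / width) for r, c in points]


def to_indices(norm_points, height, width):
    """Map normalized coordinates back to integer indices at another resolution."""
    return [
        (min(int(y * height), height - 1), min(int(x * width), width - 1))
        for y, x in norm_points
    ]


def replace_points(coarse_labels, points, new_labels):
    """Return a copy of the coarse label map with re-predicted labels written in."""
    refined = [row[:] for row in coarse_labels]
    for (r, c), label in zip(points, new_labels):
        refined[r][c] = label
    return refined


# One uncertain point found on a 4 x 4 coarse prediction.
points = [(1, 2)]
norm = normalize_points(points, 4, 4)

# The same normalized coordinate locates the point on an 8 x 8 fine feature map.
fine_indices = to_indices(norm, 8, 8)

# Stand-in for the MLP output at the selected point (assumed value).
new_labels = [1]
coarse = [[0] * 4 for _ in range(4)]
refined = replace_points(coarse, points, new_labels)

print(norm, fine_indices, refined[1])
```

Because the coordinates are stored in normalized form, the same point can be looked up in both the coarse prediction and the fine-grained features even though their resolutions differ.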

The boundary is the dividing line between the areas of two classes. An uncertain point is a location at which the scores of the different classes are similar. This may be caused by regions overlapping as the regions of different classes expand during the upsampling process. Uncertain points are therefore likely to be boundary points. With the boundary refinement module, we correct the incorrect results of upsampling by re-predicting the labels of the boundary points based on fine-grained features, without damaging the detailed information of the boundary. This prevents the boundary position from shifting because of upsampling, so an accurate boundary position is obtained. In addition, predicting only carefully selected points requires an order of magnitude less computation than direct computation, which speeds up the segmentation of the model.

2.3 Loss

During the training of the model, the model outputs a set of points and a mask. The training of the set of points is handled by the point cross entropy loss ($L_{pce}$). The training of the mask is handled by cross entropy loss ($L_{ce}$) and Lovász-softmax loss ($L_{ls}$) [25].

$L_{pce}$ is a loss function designed for the boundary refinement module. The model uses this loss function to improve the performance of boundary point re-prediction. The essence of point cross entropy loss is cross entropy loss, but it is calculated for the set of points. The formula of $L_{pce}$ is shown below.

$$L_{pce} = CE(P_p, Y_p) = -\frac{1}{BN} \sum_{i=1}^{BN} \sum_{c=1}^{C} y_{ic} \log(p_{ic})$$

$CE$ denotes cross entropy. $P_p$ denotes the prediction of uncertainty points generated by the boundary refinement module. Its size is $B \times N \times C$. $B$ is the batch size. $N$ is the number of uncertainty points. $C$ denotes the classes of the segmentation. $Y_p$ indicates the true value of these uncertainty points. Its size is $B \times N$. $y_{ic}$ is the true label of sample $i$ on class $c$. $p_{ic}$ is the predicted probability of sample $i$ on class $c$.

$L_{ce}$ is the main loss of the loss function, which is used to calculate the loss of the mask. $P_m$ is the prediction mask. Its size is $B \times C \times H \times W$. $H$ and $W$ denote the height and width of the image respectively. $Y_m$ is the true value of the mask. Its size is $B \times H \times W$.

$$L_{ce} = CE(P_m, Y_m)$$
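A minimal sketch of cross entropy evaluated only on the selected points is given below. It assumes the loss is averaged over all points in the batch and that labels are given as class indices; the function name and the toy values are illustrative, not taken from the authors' code.

```python
import math


def point_cross_entropy(point_probs, point_labels):
    """Mean cross entropy over a set of sampled points.

    point_probs: B x N x C nested list of predicted class probabilities.
    point_labels: B x N nested list of ground-truth class indices.
    """
    total, count = 0.0, 0
    for sample_probs, sample_labels in zip(point_probs, point_labels):
        for probs, label in zip(sample_probs, sample_labels):
            # With one-hot targets only the true-class term survives.
            total -= math.log(probs[label])
            count += 1
    return total / count


# One image (B = 1), two uncertain points (N = 2), two classes (C = 2).
point_probs = [[[0.5, 0.5], [0.25, 0.75]]]
point_labels = [[0, 1]]

loss = point_cross_entropy(point_probs, point_labels)
print(round(loss, 4))
```

The same function applied to every pixel of the mask, rather than to the sampled points, gives the ordinary cross entropy used for the mask.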


Vascular wall segmentation is a class-imbalanced segmentation task. The background occupies a large proportion of the image and the region of the vascular wall is small. When the model performance is evaluated, the segmentation accuracy of the background pulls up the overall segmentation accuracy, which is bad for the optimization of our model: the parameters are optimized without really improving the vascular wall segmentation accuracy. We introduced the Lovász-softmax loss to solve the class imbalance problem [26] and optimize the model better. Lovász-softmax loss optimizes a surrogate of the mIoU of the model segmentation results. The mIoU can better reflect the segmentation accuracy of the imbalanced class than $L_{ce}$. Thus, the segmentation accuracy of the vascular wall is improved and the segmentation results are smoother [27]. The formula of $L_{ls}$ is shown below.

$$L_{ls} = \frac{1}{C} \sum_{c=1}^{C} \overline{\Delta_{J_c}}(m(c)), \qquad m_i(c) = \begin{cases} 1 - f_i(c) & \text{if } c = y_i \\ f_i(c) & \text{otherwise} \end{cases}$$

$\overline{\Delta_{J_c}}$ is the Lovász extension of IoU [25]. $c$ indicates the class of segmentation. $y_i$ indicates the ground truth class of pixel $i$, and $f_i(c)$ indicates the class probabilities of pixel $i$. $m(c)$ is the vector of pixel errors for class $c$.
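For readers unfamiliar with the Lovász-softmax loss, the sketch below follows the standard construction from the original Lovász-softmax formulation: per-class pixel errors are sorted in decreasing order and weighted by the discrete gradient of the Jaccard loss. It works on flattened pixel lists, skips classes absent from the ground truth (an assumption), and is not the authors' Paddle code.

```python
def lovasz_grad(fg_sorted):
    """Gradient of the Lovasz extension of the Jaccard loss.

    fg_sorted: 0/1 ground-truth indicators, ordered by decreasing error.
    """
    total_fg = sum(fg_sorted)
    grad, prev_jaccard = [], 0.0
    cum_fg, cum_bg = 0, 0
    for g in fg_sorted:
        cum_fg += g
        cum_bg += 1 - g
        # Jaccard loss if the pixels seen so far were all misclassified.
        jaccard = 1.0 - (total_fg - cum_fg) / (total_fg + cum_bg)
        grad.append(jaccard - prev_jaccard)
        prev_jaccard = jaccard
    return grad


def lovasz_softmax(probs, labels, num_classes):
    """Lovasz-softmax loss for flattened pixels.

    probs: list of per-pixel class-probability lists; labels: class indices.
    """
    losses = []
    for c in range(num_classes):
        fg = [1 if y == c else 0 for y in labels]
        if sum(fg) == 0:
            continue  # assumption: average only over classes that are present
        errors = [abs(g - p[c]) for g, p in zip(fg, probs)]
        order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        grad = lovasz_grad([fg[i] for i in order])
        losses.append(sum(errors[i] * w for i, w in zip(order, grad)))
    return sum(losses) / len(losses)


perfect = lovasz_softmax([[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
wrong = lovasz_softmax([[0.0, 1.0], [1.0, 0.0]], [0, 1], num_classes=2)
print(perfect, wrong)
```

Because each class contributes equally to the average regardless of how many pixels it covers, a small vascular wall region carries the same weight as the large background, which is the property the text relies on.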

Our loss function is as follows:

$$L = L_{pce} + \alpha L_{ce} + (1 - \alpha) L_{ls}$$

The optimization of the model is focused on the segmentation of the vascular wall through the combination of our loss functions. We train in two stages, so the loss function has two combinations, selected by $\alpha$, which takes the value 0 or 1. In the early stage of training, we set $\alpha$ to 1 and train with $L_{pce}$ and $L_{ce}$ to capture the image boundary details better and to speed up convergence. In the later stage of training, we set $\alpha$ to 0 and fine-tune with $L_{pce}$ and $L_{ls}$ to make the segmentation more accurate and the boundary smoother.
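The two-stage combination can be written as a small switch. The sketch assumes the total loss is the point loss plus either the cross-entropy term or the Lovász-softmax term, selected by a 0/1 weight named `alpha` here; the 200-epoch boundary comes from the experimental details later in the paper, and the function names are hypothetical.

```python
def total_loss(l_pce, l_ce, l_lovasz, alpha):
    """Combine the three loss terms with a binary switch alpha in {0, 1}."""
    if alpha not in (0, 1):
        raise ValueError("alpha must be 0 or 1")
    return l_pce + alpha * l_ce + (1 - alpha) * l_lovasz


def alpha_for_epoch(epoch, stage1_epochs=200):
    """Stage 1 (alpha = 1) for the first stage1_epochs epochs, then stage 2 (alpha = 0)."""
    return 1 if epoch < stage1_epochs else 0


# Toy loss values, chosen only to show which term is active in each stage.
early = total_loss(0.2, 0.5, 0.7, alpha_for_epoch(0))
late = total_loss(0.2, 0.5, 0.7, alpha_for_epoch(250))
print(early, late)
```

With a binary switch, exactly one mask loss is active at any time, so the two stages never mix the cross-entropy and Lovász-softmax gradients.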

2.4 Lightweight

Figure 4: Detailed implementation of the lightweight downsampling module. The light blue dashed box shows the implementation principle of depthwise separable convolution. The brown dashed box is the implementation of the inverted residuals.

A large number of parameters and complex computations severely limit the use of deep learning products in daily life. To make vascular wall segmentation faster and less demanding of computational resources than existing methods, and thereby speed up the adoption of artificial intelligence products in clinical settings, we need to design the model to be lightweight. Starting from this direction, we designed a lightweight downsampling module that extracts features from images by depthwise separable convolution [33] and inverted residuals [34]. The flow of the whole module is shown in Fig. 4. In Fig. 4, "Save" means that the features at this location need to be saved, "Fixed" means that the size of the convolution kernel must be 1×1, and "Not fixed" means that the size of the convolution kernel can be chosen according to the model. The purpose of the whole module is to reduce the number of parameters.

The depthwise separable convolution divides the convolution process into depthwise convolution and pointwise convolution. In depthwise convolution, each channel is processed by its own single convolution kernel. Pointwise convolution uses a 1×1 convolution kernel to extend the depth, as shown in Fig. 4. Depthwise separable convolution significantly reduces the parameters of the convolution and the computational effort of the model. Fewer parameters are important for the application of the system. Equation (10) shows the ratio of the number of parameters between depthwise separable convolution and traditional convolution. $P_{ds}$ is the number of parameters of depthwise separable convolution. $P_t$ is the number of parameters of traditional convolution. $M$ denotes the number of input channels. $N$ is the number of output channels. $D_k$ is the convolution kernel size.

$$\frac{P_{ds}}{P_t} = \frac{D_k^2 M + M N}{D_k^2 M N} = \frac{1}{N} + \frac{1}{D_k^2} \tag{10}$$
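The parameter saving can be checked numerically with the short sketch below. Biases are ignored, and the channel counts (32 in, 64 out) and the 3×3 kernel are illustrative values rather than the configuration used in BDNet.

```python
def standard_conv_params(in_channels, out_channels, kernel_size):
    """Weights of a standard convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(in_channels, out_channels, kernel_size):
    """Weights of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Illustrative layer: 32 input channels, 64 output channels, 3 x 3 kernel.
m, n, k = 32, 64, 3
p_t = standard_conv_params(m, n, k)
p_ds = depthwise_separable_params(m, n, k)

ratio = p_ds / p_t
closed_form = 1 / n + 1 / (k * k)
print(p_t, p_ds, round(ratio, 4), round(closed_form, 4))
```

For this example the depthwise separable layer needs roughly one eighth of the weights of the standard layer, and the measured ratio matches the closed form 1/N + 1/D_k².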


Lightweight networks generally have a smaller number of channels in order to reduce the parameters and speed up the computation. However, the reduction in the number of channels can lead to a model that does not perform well for image information extraction [34]. Therefore, the inverted residuals have two roles. One is to extract advanced image features, and the other is to avoid the problem of weakened image information extraction caused by a smaller number of channels. The implementation details of the inverted residuals are shown in Fig. 4. In order to reduce the number of parameters, the depthwise separable convolution is used. Inverted residuals mainly expand the number of channels and compress the number of channels for image features. By increasing the number of channels, the model can extract features well.

3 Experiments

3.1 Dataset

This study was approved by the Ethics Committee of Medical and Experimental Animals, Northwestern Polytechnical University, Xi’an, China (Protocol no. 202002010). We obtained longitudinal carotid ultrasound images of 657 patients from our partner medical institutions. In each image, a physician labeled the region of interest (ROI), and we cropped the ROI from the original image. The vascular wall in each ROI image was then labeled by the physician. These annotations were used to generate mask images as labels. A total of 1548 vascular images were finally obtained. This dataset is challenging: the data come from different hospitals and different brands of ultrasound equipment, the image quality varies from good to poor, and the regions of interest cover several different sites in which the shape of the blood vessels differs.

3.2 Evaluation metrics

To evaluate the performance of our model, we use five metrics: Dice, mIoU, boundary IoU (BIoU), number of parameters (Params), and floating point operations (FLOPs). mIoU is the average over the classes of the intersection over union of the true and predicted values, and takes values in [0, 1]. A larger mIoU indicates better model performance. Dice measures the similarity between two samples and takes values in [0, 1]. The larger the Dice, the better the performance. BIoU is a dedicated metric for measuring the quality of boundary segmentation [28]. FLOPs measure the computational complexity of the algorithm. Params directly determine the size of the model. The formulas of the metrics are as follows.

$$Dice = \frac{2TP}{2TP + FP + FN}$$

$$mIoU = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}$$

$TP$ denotes positive samples with correct prediction, $FP$ denotes positive samples with incorrect prediction, $FN$ denotes negative samples with incorrect prediction, and $TN$ denotes negative samples with correct prediction.

$$BIoU = \frac{|(G_d \cap G) \cap (P_d \cap P)|}{|(G_d \cap G) \cup (P_d \cap P)|}$$

$G$ denotes the ground truth binary mask. $P$ denotes the prediction binary mask. $G_d$ and $P_d$ denote the sets of pixels in the boundary region of the respective binary mask.
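The segmentation metrics can be illustrated on two small binary masks. The sketch assumes Dice and IoU are computed from pixel counts of the foreground class, and it approximates the boundary region by the pixels removed when the mask is eroded `d` times with a 3×3 neighbourhood; the boundary width `d` and all function names are assumptions.

```python
def dice_and_iou(gt, pred):
    """Dice and IoU of the foreground class for two H x W binary masks."""
    pairs = [(g, p) for g_row, p_row in zip(gt, pred) for g, p in zip(g_row, p_row)]
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn), tp / (tp + fp + fn)


def erode(mask):
    """3 x 3 binary erosion; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            neighbours = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            if all(0 <= rr < h and 0 <= cc < w and mask[rr][cc] == 1 for rr, cc in neighbours):
                out[r][c] = 1
    return out


def boundary_region(mask, d=1):
    """Mask pixels that disappear after d erosions (a band of width d inside the mask)."""
    eroded = mask
    for _ in range(d):
        eroded = erode(eroded)
    return [[m - e for m, e in zip(m_row, e_row)] for m_row, e_row in zip(mask, eroded)]


def boundary_iou(gt, pred, d=1):
    """IoU restricted to the boundary regions of both masks."""
    _, iou = dice_and_iou(boundary_region(gt, d), boundary_region(pred, d))
    return iou


def block(rows, cols, size=5):
    """Binary size x size mask with ones on the given row and column ranges."""
    return [[1 if r in rows and c in cols else 0 for c in range(size)] for r in range(size)]


gt = block(range(1, 4), range(1, 4))    # 3 x 3 square
pred = block(range(1, 4), range(2, 5))  # same square shifted right by one pixel

dice, iou = dice_and_iou(gt, pred)
biou = boundary_iou(gt, pred)
print(round(dice, 3), round(iou, 3), round(biou, 3))
```

In this toy case a one-pixel shift lowers the boundary IoU more than the region IoU, which illustrates why a boundary-specific metric is reported alongside Dice and mIoU.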

3.3 Experimental details

Our experiments were run on a server equipped with a Quadro RTX 8000 GPU. The operating system was Ubuntu. We used Paddle as our deep learning framework. Our dataset was divided into a training set and a test set with a ratio of 4:1. We trained and evaluated the model with five-fold cross-validation. During training, the images were randomly cropped, randomly flipped, randomly scaled and normalized. Only normalization was used in the evaluation process. Our experiments were divided into two training stages. In the first stage, $\alpha$ in the loss function was set to 1 and the model was trained for 200 epochs. In the second stage, $\alpha$ in the loss function was set to 0 and the model was trained for 100 epochs.

3.4 Comparative Study

To better evaluate the performance of our model, we compared it with the current dominant general-purpose segmentation models. In addition, we compared it with previous models designed for the same task.

3.4.1 Comparison with general models

Figure 5: Visualization of segmentation results for different models.

We compared our results with existing generic methods. BiSeNetV2 [29] and Fast-SCNN [30] are recent lightweight segmentation methods. U-Net [20], SegNet [31], DeepLab-v3 [32], PSPNet [21] and PointRend [12] are popular and widely used methods.

| Model | Dice | mIoU | BIoU | Params (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- |
| BiSeNetV2 [29] | 58.2 | 69.9 | 42.4 | 2.2 | 2.9 |
| Fast-SCNN [30] | 60.3 | 70.9 | 44.5 | 1.4 | 0.4 |
| U-Net [20] | 60.5 | 70.8 | 44.7 | 12.8 | 45.2 |
| SegNet [31] | 60.9 | 71.0 | 45.1 | 28.2 | 61.9 |
| DeepLab-v3 [32] | 62.1 | 71.5 | 46.2 | 37.3 | 59.2 |
| PSPNet [21] | 63.1 | 71.8 | 47.2 | 64.8 | 96.6 |
| PointRend [12] | 63.3 | 71.9 | 47.6 | 26.9 | 68.1 |
| BDNet (ours) | 65.6 | 73.2 | 50.0 | 1.3 | 1.9 |

Table 1: Quantitative comparison of different generic models, using evaluation metrics for both the model itself and the segmentation results obtained from the model.

As shown in Table 1, our model achieved a Dice of 65.6 using only 1.3M Params and 1.9G FLOPs, higher than the results of the other methods. Compared with Fast-SCNN, our model had slightly fewer Params (1.3M versus 1.4M) but more FLOPs (1.9G versus 0.4G), and our Dice was 5.3 higher. Compared with the other models, our model was superior in both the number of parameters and segmentation accuracy. Fig. 5 shows the segmentation results of our model and the other models. Our model handles dark boundaries and discontinuous boundaries well, and the problem of boundary localization errors in ultrasound images is also well resolved. Our model produces finer and more robust results.

3.4.2 Comparison with vascular ultrasound image segmentation models

To ensure fairness, we also compared our model with the existing vascular ultrasound image segmentation models [7, 8, 9, 10]. The experimental results are shown in Table 2. The experiments show that our method is better than the previous methods. In terms of segmentation performance, our Dice is 1.9 higher than that of the best existing model. In terms of computational resource usage, our model is much smaller than the models with a Dice greater than 60. Considering both together, our model has a clear advantage. The experimental results show that our network has better segmentation performance and is easier to apply.

| Model | Dice | mIoU | BIoU | Params (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- |
| M. Szarski et al. [8] | 58.5 | 69.2 | 42.4 | 0.3 | 0.7 |
| J. Yang et al. [9] | 61.1 | 70.8 | 45.2 | 21.3 | 194.9 |
| J. Yang et al. [10] | 63.5 | 72.0 | 47.6 | 8.3 | 34.7 |
| M. Xie et al. [7] | 63.7 | 72.1 | 48.0 | 46.8 | 19.3 |
| BDNet (ours) | 65.6 | 73.2 | 50.0 | 1.3 | 1.9 |

Table 2: Quantitative comparison of different vascular ultrasound image segmentation models, using evaluation metrics for both the model itself and the segmentation results obtained from the model.

The visualization results of the model are shown in Fig. 5. We can find that our model has better robustness in the case of discontinuous boundaries and darker boundaries. Our model can realize the connection of discontinuous boundaries and achieve accurate segmentation of dark boundaries. The model is more resistant to noise interference and segmentation is more accurate.

3.5 Ablation experiment

In the previous section, we experimentally demonstrated the merits of our model. In this section, we will discuss the important modules of the model and the role of the loss function at the level of experimental results, and how each module affects the results.

3.5.1 The effect of different modules on the segmentation results

Figure 6: Visualization of segmentation results for models with different modules.

| Model | Dice | mIoU | BIoU |
| --- | --- | --- | --- |
| d | 44.3 | 62.6 | 28.9 |
| d+m | 62.7 (+18.4) | 71.6 (+9.0) | 46.5 (+17.6) |
| d+m+psp | 64.4 (+1.7) | 72.5 (+0.9) | 48.4 (+1.9) |
| d+m+psp+b (full model) | 65.6 (+1.2) | 73.2 (+0.7) | 50.0 (+1.6) |

Table 3: Comparative experimental results of models with different modules (d indicates downsampling module, m indicates multi-scale fusion module, psp indicates PSP module, and b indicates boundary refinement module).

To demonstrate the importance of each module, we added the modules one at a time. In the first experiment, we used only a simple downsampling module for training and prediction. We then added the multi-scale fusion module, then the PSP module, and finally the boundary refinement module. The results are shown in Table 3 and Fig. 6. Table 3 shows that performance improved significantly after the multi-scale fusion module was added (Dice +18.4, BIoU +17.6). Adding the PSP module and then the boundary refinement module gave further, smaller gains (Dice +1.7 and +1.2, respectively). Fig. 6 shows that, with the multi-scale fusion module, the model began to handle images with dark boundaries, and that, after the PSP module was added, it connected discontinuous boundaries smoothly.

Table 3 shows that the boundary refinement module improved the segmentation accuracy of the model, but its effect is not obvious in the visualized segmentations. To demonstrate the usefulness of this module, we visualized the positions of the first 640 uncertain points it extracted, as shown in Fig. 7. The uncertain points lie on the boundaries. This shows that the model is indeed able to locate boundary points, which prevents boundary information from being smoothed away by upsampling and the boundary from being placed incorrectly.

Figure 7: The distribution of uncertainty points in the image.
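
To make the idea of "uncertain points" concrete, here is a minimal sketch of one common selection rule, in the style of PointRend [12]: a pixel is treated as more uncertain the closer its predicted foreground probability is to 0.5. This section does not state the exact rule used by the boundary refinement module, so the criterion, the function name, and the toy probability map are all assumptions.

```python
def most_uncertain_points(prob_map, k):
    """Return (row, col) of the k pixels whose foreground probability is closest to 0.5."""
    scored = [
        (abs(p - 0.5), r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
    ]
    scored.sort()  # smallest distance from 0.5 first
    return [(r, c) for _, r, c in scored[:k]]


# Toy coarse prediction: confident background on top, confident wall at bottom left,
# and a band of ambiguous values where the two meet.
prob_map = [
    [0.02, 0.05, 0.10, 0.04],
    [0.08, 0.45, 0.55, 0.07],
    [0.90, 0.97, 0.60, 0.03],
]
points = most_uncertain_points(prob_map, k=3)
print(points)
```

In this toy map the selected pixels are the ones in the transition band between confident background and confident wall, which is the behaviour Fig. 7 illustrates for the 640 points; a refinement step would then re-predict labels only at those locations.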

3.5.2 The effect of loss function on the segmentation results

To test the effectiveness of our loss function, we did the following experiment. In one model, 300 epochs were trained with and only. In the other model, we trained 200 epochs with and and 100 epochs with and for optimization. The experimental results are shown in Fig. 8 and Table 4.

| Model | Dice | mIoU | BIoU |
| --- | --- | --- | --- |
| Our_300 | 62.6 | 72.0 | 47.0 |
| Our_200+100 | 65.6 (+3.0) | 73.2 (+1.2) | 50.0 (+3.0) |

Table 4: Quantitative comparison of the segmentation performance of the model with different loss functions. Our_300 means just 300 epochs trained with and . Our_200+100 means 200 epochs trained with and , and then 100 epochs trained with and .
Figure 8: Variation of mIoU during the training process for models using different loss functions.

As shown in Fig. 8 and Table 4, after 200 epochs, the accuracy of the model with the addition of increased rapidly and remained stable. After the training was completed, the Dice of the model with improved by 3.0, mIoU improved by 1.2, and BIoU improved by 3.0. Thus, the combination of our loss functions is effective.

3.5.3 The effect of lightweight strategy on model parameters and segmentation results

To reduce the number of parameters in the model and to avoid excessive hardware requirements, we designed a lightweight downsampling module. In this section, we examine experimentally how well this design works. In the comparison experiment, we replaced the depthwise separable convolution in the model with conventional convolution and left the other parts unchanged. The results are shown in Table 5, which lists, for the model we designed and for the variant with conventional convolution, the cost of each module, the cost of the whole model, and the proportion of the whole model taken up by the downsampling module.

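| Metric | Model | downsampling | fusion | boundary refinement | whole model | downsampling proportion (%) |
| --- | --- | --- | --- | --- | --- | --- |
| FLOPs (G) | ours | 0.19 | 0.98 | 0.74 | | |
| FLOPs (G) | conventional convolution | 2.43 | 0.98 | 0.74 | 4.15 | 58.59 |
| Params (M) | ours | 0.50 | 0.64 | 0.19 | | |
| Params (M) | conventional convolution | 11.24 | 0.64 | 0.19 | 12.07 | 93.15 |

Table 5: Analysis of the number of parameters for each module of our model and the traditional convolution-based model.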

The experimental results show that the downsampling module of our model accounts for only a small share of the whole model. This leaves room to design more specialized structures that fit the model's function and thus improve performance while keeping the number of parameters small. With conventional convolution, the downsampling module dominates the model (58.59% of the FLOPs and 93.15% of the Params), so under a parameter budget little room is left for other designs.
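
The proportions in Table 5 can be rechecked from the per-module entries. The sketch below assumes that the whole-model cost is the sum of the three listed modules and that the proportion is the downsampling cost divided by that sum; the function name is hypothetical. Because the table entries are rounded, recomputed values can differ from the printed ones in the second decimal place.

```python
def downsampling_share(downsampling, fusion, boundary_refinement):
    """Percentage of the whole-model cost contributed by the downsampling module."""
    total = downsampling + fusion + boundary_refinement
    return 100.0 * downsampling / total


# Per-module costs copied from Table 5 (FLOPs in G, Params in M).
conv_flops_share = downsampling_share(2.43, 0.98, 0.74)
conv_params_share = downsampling_share(11.24, 0.64, 0.19)
ours_flops_share = downsampling_share(0.19, 0.98, 0.74)
ours_params_share = downsampling_share(0.50, 0.64, 0.19)

print(f"conventional: {conv_flops_share:.1f}% of FLOPs, {conv_params_share:.1f}% of Params")
print(f"ours:         {ours_flops_share:.1f}% of FLOPs, {ours_params_share:.1f}% of Params")
```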

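| Model | Dice | mIoU | BIoU | Params (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- |
| conventional convolution | 65.8 | 73.2 | 50.2 | 12.1 | 4.2 |
| ours | 65.6 (-0.2) | 73.2 (-0.0) | 50.0 (-0.2) | | |

Table 6: Segmentation results of our model and traditional convolution-based models.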

The results above show that our model is lightweight. We next examined how the reduction in the number of parameters affects segmentation performance. Table 6 compares the segmentation performance of the two models. Our model performs slightly worse than the variant with conventional convolution (Dice and BIoU are each 0.2 lower, and mIoU is unchanged), but that variant has about 9 times as many parameters. On balance, a small loss in performance is a worthwhile price for a large reduction in parameters, which shows that our lightweight design is justified.

4 Discussion

In this study, we proposed four components for vascular wall segmentation. First, we designed a feature extraction module that extracts and fuses multi-scale features and features with different receptive fields, which introduces more detailed and contextual information. Second, we used the boundary refinement module to prevent upsampling from shifting the boundary position. Third, we designed a loss function that avoids the effect of class imbalance and focuses the optimization on vascular wall segmentation. Finally, we designed a lightweight downsampling module to reduce the number of parameters.

Compared with other models, our model has advantages in both the number of parameters and segmentation quality, as shown in Table 1 and Table 2. Its Dice is 2.3 higher than that of the best generic model and 1.9 higher than that of the best vascular ultrasound image segmentation model, while it has one-twentieth or less of the parameters of these two models. We attribute this to two factors: the lightweight design of the model, and design choices tailored to the characteristics of vascular ultrasound images.

We used depthwise separable convolution instead of conventional convolution for downsampling, which greatly reduces the number of parameters. As shown in Table 5, the number of parameters falls to about one-ninth of that of the conventional-convolution model. The depthwise separable convolution has little impact on segmentation performance, as shown in Table 6.
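
The saving from depthwise separable convolution follows from a simple weight count. The sketch below compares a standard k×k convolution with a depthwise k×k convolution followed by a 1×1 pointwise convolution, as in MobileNets [33]. The layer sizes are hypothetical and are not taken from BDNet; biases are omitted.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k (one filter per input channel) plus 1 x 1 pointwise weights."""
    return k * k * c_in + c_in * c_out


# Hypothetical layer: 64 -> 128 channels with a 3 x 3 kernel.
c_in, c_out, k = 64, 128, 3
standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard  # equals 1 / c_out + 1 / k**2

print(standard, separable, round(ratio, 4))
```

For 3×3 kernels and a reasonably wide layer, the ratio is close to 1/9 per layer. The whole-model reduction reported in Table 5 depends on the full architecture, so this per-layer figure is only an illustration of the mechanism.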

For the problem of dark boundaries, Xie et al. [7] extracted more image features by using two encoders. Two encoders do extract features well, but the drawback is obvious: the number of parameters grows accordingly. As shown in Table 2, their model has 36 times as many parameters as ours, which is not hardware friendly. We used a single encoder with multi-scale information fusion, so the model combines detailed information and high-level features for prediction. Once the dark-boundary problem is addressed, the performance of the model improves substantially.

For the problem of discontinuous boundaries of the vascular wall, we extracted information from both global and local views by expanding the receptive field. Yang et al. [9] expanded the receptive field by changing the convolution kernel size to 5×5. This has some effect, but the expansion of the field of view is limited. Our model fuses global and local information from multiple fields of view, which gives better results.

Most importantly, we focused on the boundary positioning error in the model design. Accurate boundary locations are obtained by correcting the predictions at the upsampled boundary points. In addition, the boundary point predictions were added to our loss function to ensure that the model predicts the boundary points well. Finally, to handle the class imbalance in boundary prediction, we used the Lovász-Softmax loss [25], which optimizes an IoU-based objective and thereby limits interference from the background.
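
As an illustration of why an IoU-based loss is less sensitive to a dominant background, the following is a minimal pure-Python sketch of the Lovász-Softmax loss described by Berman et al. [25]. It follows the published algorithm (sort per-class errors, weight them by the discrete gradient of the Jaccard loss), but it is not the training code of this paper; the function names and the toy inputs are assumptions.

```python
def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss for errors sorted descending."""
    total_fg = sum(gt_sorted)
    grad, cum_fg, cum_bg, prev_jaccard = [], 0, 0, 0.0
    for g in gt_sorted:
        cum_fg += g
        cum_bg += 1 - g
        intersection = total_fg - cum_fg
        union = total_fg + cum_bg
        jaccard = 1.0 - intersection / union
        grad.append(jaccard - prev_jaccard)
        prev_jaccard = jaccard
    return grad


def lovasz_softmax(probs, labels, classes):
    """probs: per-pixel lists of class probabilities; labels: per-pixel class indices."""
    losses = []
    for c in classes:
        fg = [1 if y == c else 0 for y in labels]
        if sum(fg) == 0:
            continue  # skip classes that do not occur in the labels
        errors = [abs(f - p[c]) for f, p in zip(fg, probs)]
        order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        grad = lovasz_grad([fg[i] for i in order])
        losses.append(sum(errors[i] * g for i, g in zip(order, grad)))
    return sum(losses) / len(losses) if losses else 0.0


labels = [1, 1, 0, 0]
perfect = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
inverted = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(lovasz_softmax(perfect, labels, classes=(0, 1)))
print(lovasz_softmax(inverted, labels, classes=(0, 1)))
```

Each class contributes a loss between 0 and 1 regardless of how many pixels it covers, so a thin vascular wall carries the same weight as a large background region.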

One more point is worth emphasizing. Many of the techniques in our solution are not limited to vascular ultrasound images. For example, reducing the number of parameters is relevant to many image processing applications, and many image processing models suffer from incorrect boundary locations. We hope that our model can inspire other researchers.

Our model also has limitations that we need to address. It currently segments using only the information in a single image, which is limited. The points output by the boundary refinement module are generated from the boundary information of the image alone. However, the structure of blood vessels is regular, so the physician's prior knowledge could be incorporated into the model. In future work, we will analyze the positional relationships between the generated points and fit the vessel geometry to these relationships to reconstruct the vascular walls, with the aim of designing a more robust segmentation model.

5 Conclusions

In this paper, we designed a boundary-delineation network (BDNet) to segment the vascular wall in vascular ultrasound images. We addressed the incorrect segmentation of low-quality images with dark and discontinuous boundaries by fusing multi-scale features and features with different receptive fields. We used the boundary refinement module to locate the boundary points precisely and re-predict them to obtain the correct boundary locations. We optimized the model with a new loss function to make the segmentation smoother and more accurate. Finally, we reduced the number of parameters to make the model lightweight and application-friendly. As a result, our model is robust and produces finer segmentations. In experimental comparisons on the dataset, our method achieved state-of-the-art segmentation performance. Because the model is well suited to segmenting object boundaries, we will consider applying it to other boundary segmentation tasks in medical images in the future.


  • [1] E. J. Benjamin, P. Muntner, A. Alonso, M. S. Bittencourt, C. W. Callaway, A. P. Carson, A. M. Chamberlain, A. R. Chang, S. Cheng, S. R. Das, et al., “Heart disease and stroke statistics—2019 update: a report from the american heart association,” Circulation, vol. 139, no. 10, pp. e56–e528, 2019.
  • [2] S. Bansilal, J. M. Castellano, and V. Fuster, “Global burden of cvd: focus on secondary prevention of cardiovascular disease,” International journal of cardiology, vol. 201, pp. S1–S7, 2015.
  • [3] S. Sedaghat, T. T. Van Sloten, S. Laurent, G. M. London, B. Pannier, M. Kavousi, F. Mattace-Raso, O. H. Franco, P. Boutouyrie, M. A. Ikram, et al., “Common carotid artery diameter and risk of cardiovascular events and mortality: pooled analyses of four cohort studies,” Hypertension, vol. 72, no. 1, pp. 85–92, 2018.
  • [4] Y. Liu, M. Wang, B. Zhang, W. Wang, Y. Xu, Y. Han, C. Yuan, and X. Zhao, “Size of carotid artery intraplaque hemorrhage and acute ischemic stroke: a cardiovascular magnetic resonance chinese atherosclerosis risk evaluation study,” Journal of Cardiovascular Magnetic Resonance, vol. 21, no. 1, pp. 1–9, 2019.
  • [5] J. M. Sanches, A. F. Laine, and J. S. Suri, Ultrasound imaging. Springer, 2012.
  • [6] F. Molinari, G. Zeng, and J. S. Suri, “A state of the art review on intima–media thickness (imt) measurement and wall segmentation techniques for carotid ultrasound,” Computer methods and programs in biomedicine, vol. 100, no. 3, pp. 201–221, 2010.
  • [7] M. Xie, Y. Li, Y. Xue, R. Shafritz, S. A. Rahimi, J. W. Ady, and U. W. Roshan, “Vessel lumen segmentation in internal carotid artery ultrasounds with deep convolutional neural networks,” in 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2393–2398, IEEE, 2019.
  • [8] M. Szarski and S. Chauhan, “Improved real-time segmentation of intravascular ultrasound images using coordinate-aware fully convolutional networks,” Computerized Medical Imaging and Graphics, vol. 91, p. 101955, 2021.
  • [9] J. Yang, L. Tong, M. Faraji, and A. Basu, “Ivus-net: an intravascular ultrasound segmentation network,” in International Conference on Smart Multimedia, pp. 367–377, Springer, 2018.
  • [10] J. Yang, M. Faraji, and A. Basu, “Robust segmentation of arterial walls in intravascular ultrasound images using dual path u-net,” Ultrasonics, vol. 96, pp. 24–33, 2019.
  • [11] S. Lian, Z. Luo, C. Feng, S. Li, and S. Li, “April: Anatomical prior-guided reinforcement learning for accurate carotid lumen diameter and intima-media thickness measurement,” Medical Image Analysis, vol. 71, p. 102040, 2021.
  • [12] A. Kirillov, Y. Wu, K. He, and R. Girshick, “Pointrend: Image segmentation as rendering,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9799–9808, 2020.
  • [13] M. A. Gutierrez, P. E. Pilon, S. Lage, L. Kopel, R. Carvalho, and S. Furuie, “Automatic measurement of carotid diameter and wall thickness in ultrasound images,” in Computers in Cardiology, pp. 359–362, IEEE, 2002.
  • [14] M. C. Bastida-Jumilla, R. M. Menchón-Lara, J. Morales-Sánchez, R. Verdú-Monedero, J. Larrey-Ruiz, and J. L. Sancho-Gómez, “Segmentation of the common carotid artery walls based on a frequency implementation of active contours,” Journal of digital imaging, vol. 26, no. 1, pp. 129–139, 2013.
  • [15] F. Molinari, G. Zeng, and J. S. Suri, “An integrated approach to computer-based automated tracing and its validation for 200 common carotid arterial wall ultrasound images: A new technique,” Journal of Ultrasound in Medicine, vol. 29, no. 3, pp. 399–418, 2010.
  • [16] C. Azzopardi, Y. A. Hicks, and K. P. Camilleri, “Automatic carotid ultrasound segmentation using deep convolutional neural networks and phase congruency maps,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 624–628, IEEE, 2017.
  • [17] C. Azzopardi, K. P. Camilleri, and Y. A. Hicks, “Bimodal automated carotid ultrasound segmentation using geometrically constrained deep neural networks,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 4, pp. 1004–1015, 2020.
  • [18] S. Minaee, Y. Y. Boykov, F. Porikli, A. J. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: A survey,” IEEE transactions on pattern analysis and machine intelligence, 2021.
  • [19] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440, 2015.
  • [20] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, pp. 234–241, Springer, 2015.
  • [21] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2881–2890, 2017.
  • [22] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected crfs,” arXiv preprint arXiv:1412.7062, 2014.
  • [23] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, et al., “Mlp-mixer: An all-mlp architecture for vision,” Advances in Neural Information Processing Systems, vol. 34, 2021.
  • [24] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,” Advances in Neural Information Processing Systems, vol. 34, 2021.
  • [25] M. Berman, A. R. Triki, and M. B. Blaschko, “The lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4413–4421, 2018.
  • [26] X. Wang, “Human protein classification in microscope images using deep learning and focal-lovász loss,” in 2020 The 4th International Conference on Video and Image Processing, pp. 230–235, 2020.
  • [27] C. Xuhong, L. Chunbin, W. Jing, L. Lei, L. Quanhong, and A. Benjamin, “Land cover extraction of remote sensing images with parallel convolutional network,” in 2021 9th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), pp. 1–6, IEEE, 2021.
  • [28] B. Cheng, R. Girshick, P. Dollár, A. C. Berg, and A. Kirillov, “Boundary iou: Improving object-centric image segmentation evaluation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15334–15342, 2021.
  • [29] C. Yu, C. Gao, J. Wang, G. Yu, C. Shen, and N. Sang, “Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation,” International Journal of Computer Vision, vol. 129, no. 11, pp. 3051–3068, 2021.
  • [30] R. P. Poudel, S. Liwicki, and R. Cipolla, “Fast-scnn: Fast semantic segmentation network,” arXiv preprint arXiv:1902.04502, 2019.
  • [31] V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
  • [32] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.
  • [33] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
  • [34] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510–4520, 2018.