According to the World Health Organisation, cardiovascular diseases (CVDs), particularly atherosclerosis, are considered the leading cause of death worldwide [kaptoge2019world], although they are preventable [mcgill2008preventing]. Prevention requires screening with a non-ionising and inexpensive imaging modality. Ultrasound (US) imaging has these characteristics and is routinely used to examine the common carotid artery (CCA), which is often regarded as the sentinel of atherosclerosis [rizi2020carotid]. An early sign of disease onset is thickening of the arterial wall. To measure this thickness, the contours of the intima-media complex (IMC), namely the lumen-intima (LI) and media-adventitia (MA) interfaces, need to be identified (Fig. 1).
Most methods in the literature use contour-based approaches [delsanto2007characterization, loizou2007snakes, zahnd2017fully, raj2020automated] that exploit the intensity peaks caused by echoes at the interfaces, known as the double-line pattern. Region-based approaches are less common and combine despeckling with threshold-based segmentation [nagaraj2018segmentation, wang2020fully].
Recently, deep learning (DL) has been successfully used in vascular US image segmentation to enhance the structures of interest before the actual delineation is performed by more conventional contour-based methods [menchon2016early, qian2020segmentation, shin2016automating]. The drawback of these approaches is that they must combine a learnable pre-processing operation with an analytic segmentation step.
The main contribution of the present work is a supervised, learnable segmentation method designed to extract the two contours of the IMC in B-mode US images. In asymptomatic arteries without plaque, the anatomical interfaces are localized by a region-based approach using a collection of overlapping patches. The proposed patch-based solution addresses the challenge of segmenting, with a single network architecture, the entire exploitable part of the IMC despite strong variations in its width from one image to another.
2 Data and method
All training and evaluation processes were carried out on a publicly available multi-center database (http://dx.doi.org/10.17632/fpv535fss7.1). Images were acquired from both sides of the neck, for a total of images. Refer to [meiburger2021carotid] for more details.
For this study, we considered two experts (A1 and A2) who independently selected a region of interest (ROI), where the interfaces were perceptible, and traced control points within it. To obtain smooth contours, piecewise cubic Hermite interpolating polynomial (PCHIP) interpolation was applied using MATLAB R2020b (The MathWorks, Inc.).
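The PCHIP step can be reproduced outside MATLAB. The sketch below is a minimal illustration, assuming SciPy's `PchipInterpolator` as a stand-in for MATLAB's `pchip`; the helper `densify_contour` and the control-point values are hypothetical. It turns the sparse points clicked by an expert into one ordinate per ROI column.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def densify_contour(control_x, control_y, x_start, x_end):
    """Turn sparse expert control points into one ordinate per ROI column.

    Hypothetical helper: control_x/control_y are the column and row (pixels)
    of the clicked points; x_start/x_end bound the ROI (inclusive).
    """
    x = np.asarray(control_x, dtype=float)
    y = np.asarray(control_y, dtype=float)
    order = np.argsort(x)  # PCHIP requires strictly increasing abscissae
    pchip = PchipInterpolator(x[order], y[order], extrapolate=False)
    columns = np.arange(x_start, x_end + 1)
    # Columns outside the span of the control points come back as NaN.
    return columns, pchip(columns)


# Hypothetical LI control points (column, row) inside an ROI spanning columns 10..120.
columns, li = densify_contour([10, 40, 80, 120], [200, 204, 203, 199], 10, 120)
```

Unlike an unconstrained cubic spline, PCHIP is shape-preserving and does not overshoot between control points, so the interpolated contour stays within the range of neighbouring clicks.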
The proposed solution builds on a convolutional neural network known as U-net [ronneberger2015u], with dilated convolutions in the bottleneck to increase its receptive field [meshram2020deep]. As the annotations are available for ROIs of variable width, we cut each ROI into fixed-size, horizontally overlapping patches; this also ensures that the same receptive field is applied to every image, with a controlled pixel size. A post-processing step combines the predictions made within the patches to extract smooth contours over the entire ROI, regardless of its width. The core of the method consists of two steps: approximately detecting the far wall (Fig. 2c), and precisely segmenting the IMC contours (Fig. 2d).
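Why dilation in the bottleneck enlarges the receptive field can be checked with the standard receptive-field recursion. In the sketch below, the depth (four pooling levels), the 3×3 kernels and the dilation rates (2, 4) are hypothetical choices for illustration, not the configuration used in this work; only one spatial axis is considered.

```python
def receptive_field(layers):
    """Theoretical receptive field (input pixels, one axis) of a layer stack.

    Each layer is (kernel_size, stride, dilation). Recursion:
    r_out = r_in + (k - 1) * d * j_in and j_out = j_in * s, where j is the
    spacing, in input pixels, between neighbouring units of the current layer.
    """
    r, j = 1, 1
    for k, s, d in layers:
        r += (k - 1) * d * j
        j *= s
    return r


CONV, POOL = (3, 1, 1), (2, 2, 1)
# Hypothetical U-net encoder: 4 levels of two 3x3 convs followed by 2x2 max-pooling.
encoder = [CONV, CONV, POOL] * 4
plain = receptive_field(encoder + [CONV, CONV])                  # plain bottleneck
dilated = receptive_field(encoder + [(3, 1, 2), (3, 1, 4)])      # dilated bottleneck
print(plain, dilated)  # 140 268
```

Because bottleneck units are already 16 input pixels apart in this hypothetical encoder, each unit of dilation there widens the receptive field by 32 input pixels per 3×3 layer, at no extra parameter cost.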
Code is available at https://github.com/nl3769/caroSegDeep.
2.1 Detection of the far wall
As in many state-of-the-art methods [zahnd2017fully, wang2020fully, menchon2016early, qian2020segmentation], the far wall is first detected; this detection serves as the initialization step.
Here, the patches are of full image height and -pixel width, and the corresponding U-net will be referred to as .
We first describe the pre-processing and the training phase, then we specify the post-processing chosen to obtain the curve approximately localizing the far wall on the entire ROI width from patch-wise predictions inferred using .
Pre-processing and training: All images of the database were resampled to a constant height of pixels (as the native height of all images in the database is around pixels, the distortion thus introduced was minimal). For training data, the median axis of the IMC was defined as the line halfway between the LI and MA annotations, interpolated across the entire width of the ROI, and a reference mask was generated by setting all pixels below the median axis to and the others to . The ROI and the reference mask were then cut identically into patches, with a 100-pixel overlap between consecutive patches serving as data augmentation. The resulting patches and their associated masks (Fig. 3) were fed into the training process, which used the Adam optimizer and a loss function chosen experimentally to minimize the Hausdorff distance and maximize the overlap with the reference masks, namely the sum of the binary cross-entropy and the Dice loss.
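The training-data preparation and loss described above can be sketched with NumPy. This is an illustration under stated assumptions, not the released code: the helper names are hypothetical, the mask labels (1 below the median axis, 0 elsewhere) are assumed, and the patch width, which the text does not restate here, is a parameter (128 in the example is hypothetical). In practice the loss would be written in the deep-learning framework so that it can be differentiated.

```python
import numpy as np


def far_wall_mask(li, ma, height):
    """Reference mask for far-wall detection (hypothetical helper).

    li, ma: row of the LI and MA interfaces for every ROI column.
    Assumption: pixels below the median axis are labelled 1, the others 0.
    """
    median_axis = (np.asarray(li, float) + np.asarray(ma, float)) / 2.0
    rows = np.arange(height)[:, None]  # column vector of row indices
    return (rows > median_axis[None, :]).astype(np.float32)


def patch_starts(roi_width, patch_width, overlap=100):
    """Left edges of fixed-width patches overlapping by `overlap` pixels."""
    if roi_width < patch_width:
        raise ValueError("ROI narrower than one patch")
    stride = patch_width - overlap
    starts = list(range(0, roi_width - patch_width + 1, stride))
    if starts[-1] != roi_width - patch_width:
        starts.append(roi_width - patch_width)  # last patch flush with ROI end
    return starts


def bce_dice_loss(pred, target, eps=1e-7):
    """Binary cross-entropy plus soft Dice loss (1 - Dice coefficient)."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    bce = -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    dice = (2.0 * np.sum(pred * target) + eps) / (np.sum(pred) + np.sum(target) + eps)
    return bce + (1.0 - dice)


# Example: a 300-pixel-wide ROI cut into hypothetical 128-pixel patches.
print(patch_starts(300, 128))  # [0, 28, 56, 84, 112, 140, 168, 172]
```

The BCE term penalizes each pixel independently, whereas the Dice term rewards overall overlap with the reference mask, which helps when the foreground and background areas are unbalanced.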
Inference and post-processing: Prior to inference, each image is resampled as described above, and then the corresponding ROI is cut into -pixel patches. Next, all patches are segmented using . Knowing the location and the size of each patch, two maps are created:
prediction map: contains, for each pixel, the sum of values predicted by .
overlay map: contains, for each pixel, the number of overlapping patches it belongs to.
Dividing the prediction map by the overlay map provides, for each pixel, an average value in the range [0, 1], which is then binarized using a threshold of 0.5 to obtain the segmentation map. The segmentation map is cleaned by retaining its largest connected component. The median axis we seek is the upper boundary of the segmented region. Finally, a third-order polynomial regression is applied to the retrieved boundary to increase the robustness of the method.
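The post-processing chain (prediction map, overlay map, averaging, thresholding, largest connected component, upper boundary, cubic fit) can be sketched as follows. This is a minimal illustration assuming full-height patches given with their left-edge columns, SciPy's `ndimage.label` for connected components, and a strict `> 0.5` threshold; the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def fuse_patches(predictions, starts, roi_shape, threshold=0.5):
    """Average overlapping full-height patch predictions, then binarize."""
    prediction_map = np.zeros(roi_shape)  # sum of predicted values per pixel
    overlay_map = np.zeros(roi_shape)     # number of patches covering each pixel
    for pred, x0 in zip(predictions, starts):
        width = pred.shape[1]
        prediction_map[:, x0:x0 + width] += pred
        overlay_map[:, x0:x0 + width] += 1
    # Columns never covered by a patch get 0 instead of a 0/0 division.
    mean_map = prediction_map / np.maximum(overlay_map, 1)
    return mean_map > threshold


def largest_component(mask):
    """Keep only the largest connected foreground component."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = np.bincount(labels.ravel())[1:]  # skip background label 0
    return labels == (np.argmax(sizes) + 1)


def median_axis_from_mask(mask, degree=3):
    """Upper boundary of the region, smoothed by a polynomial regression."""
    cols = np.arange(mask.shape[1])
    covered = mask.any(axis=0)
    top = np.argmax(mask, axis=0).astype(float)  # first foreground row per column
    coeffs = np.polyfit(cols[covered], top[covered], degree)
    return np.polyval(coeffs, cols)
```

Averaging before thresholding lets the overlapping patches vote, so an error confined to a single patch tends to be outvoted by its neighbours; spurious blobs that survive are then removed by the largest-component filter.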
2.2 Segmentation of the IMC
The far wall approximation is used to initialize the actual segmentation of the IMC, which uses many similar concepts explained in Section 2.1: overlapping patches of pixels, an overlay map, a prediction map, a similar post-processing except that two contours are extracted (the LI and MA interfaces), as well as the same optimizer, loss function, and U-net architecture. The dilated U-net trained here will be referred to as . Hereafter, we emphasize the specific choices made for this step.
Pre-processing and training: The segmentation task has to be as accurate as possible; hence the algorithm works at sub-pixel resolution. To this end, the vertical pixel size of the images was homogenized to m using linear interpolation. Given this physical size, the patch height of pixels roughly corresponds to mm, which is intended to encompass the IMC, knowing that the average IMC thickness is about mm. For training, the ground truth was then derived from the interpolated images (Fig. 4): each pixel located between the annotated LI and MA interfaces was set to , and the others to . Unlike in the far-wall detection step, the patches were extracted along the median axis: at each abscissa , the mean ordinate of the median axis was computed over the patch width, and three patches were extracted, centered at , , and , respectively. This data augmentation was intended to cope with a possibly inaccurate far-wall approximation as well as with tilted arteries.
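The three preparation steps of this paragraph (vertical resampling, IMC mask, three patches per abscissa) can be sketched as below. This is a minimal sketch, assuming column-wise `np.interp` for the linear interpolation, mask labels of 1 inside the IMC and 0 elsewhere, and a hypothetical vertical `offset` between the three patch centres; the pixel sizes and patch dimensions are parameters because their values are not restated here, and all function names are hypothetical.

```python
import numpy as np


def resample_vertically(image, native_dy, target_dy):
    """Linearly resample each column so that rows are `target_dy` apart.

    native_dy and target_dy are vertical pixel sizes in the same unit.
    """
    h, w = image.shape
    n_rows = int(round((h - 1) * native_dy / target_dy)) + 1
    new_rows = np.arange(n_rows) * target_dy / native_dy  # in native row units
    old_rows = np.arange(h)
    return np.stack([np.interp(new_rows, old_rows, image[:, c]) for c in range(w)], axis=1)


def imc_mask(li, ma, height):
    """1 between the LI and MA interfaces (inclusive), 0 elsewhere (assumed labels)."""
    rows = np.arange(height)[:, None]
    inside = (rows >= np.asarray(li)[None, :]) & (rows <= np.asarray(ma)[None, :])
    return inside.astype(np.float32)


def training_patches(image, mask, median_axis, x0, patch_w, patch_h, offset):
    """Three patches at abscissa x0: centred on the mean median-axis ordinate
    over the patch width, and shifted by -offset and +offset."""
    y = int(round(np.mean(median_axis[x0:x0 + patch_w])))
    pairs = []
    for yc in (y - offset, y, y + offset):
        top = yc - patch_h // 2
        if top < 0 or top + patch_h > image.shape[0]:
            continue  # hypothetical policy: drop patches that leave the image
        window = (slice(top, top + patch_h), slice(x0, x0 + patch_w))
        pairs.append((image[window], mask[window]))
    return pairs
```

Shifting the patches above and below the median axis during training exposes the network to the IMC at varying vertical positions, which is what makes it tolerant of an imprecise far-wall approximation at inference time.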
Inference and post-processing: During inference, the patches are extracted along the far-wall approximation resulting from the first step (Section 2.1). At each abscissa, three or more patches are captured at different ordinates, depending on the tilt of the median axis. The predictions made by in all patches are combined into a prediction map, from which the segmentation map is derived as described above. Finally, the LI and MA interfaces are defined as the upper and lower boundaries, respectively, of the segmented region.
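Two pieces of this inference step can be illustrated. The rule in `inference_centres` is a hypothetical one (centres spaced by a hypothetical `offset` so that the vertical excursion of the far-wall approximation over the patch is covered); it yields three centres for a horizontal axis and more for a tilted one, in line with the text, but it is not necessarily the rule used in this work. `interfaces_from_mask` reads the LI and MA interfaces as the first and last foreground rows of each column.

```python
import numpy as np


def inference_centres(median_axis, x0, patch_w, offset):
    """Hypothetical rule: patch centres spaced by `offset`, covering the whole
    vertical excursion of the far-wall approximation over the patch width."""
    segment = np.asarray(median_axis[x0:x0 + patch_w], dtype=float)
    lo = int(np.floor(segment.min())) - offset
    hi = int(np.ceil(segment.max())) + offset
    centres = list(range(lo, hi + 1, offset))
    if centres[-1] < hi:
        centres.append(hi)  # make sure the lowest excursion is covered
    return centres


def interfaces_from_mask(seg):
    """LI = first and MA = last foreground row of each column (NaN if empty)."""
    seg = np.asarray(seg, dtype=bool)
    covered = seg.any(axis=0)
    li = np.full(seg.shape[1], np.nan)
    ma = np.full(seg.shape[1], np.nan)
    li[covered] = np.argmax(seg[:, covered], axis=0)
    ma[covered] = seg.shape[0] - 1 - np.argmax(seg[::-1, covered], axis=0)
    return li, ma
```

Rows increase downwards in image coordinates, so the upper boundary (LI) has the smaller row index and the lower boundary (MA) the larger one.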
The evaluation was carried out using 5-fold cross-validation, so that each network was assessed on data not seen during its training. In each fold, the database was split into training (), validation (), and testing () subsets. Thus, five pairs of networks and were trained and tested independently, and the results reported here pool the test sets of these five pairs, so that the method is evaluated on the entire database.
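A 5-fold split in which every image lands in exactly one test set can be sketched as follows. The split proportions are not restated here, so this sketch assumes a hypothetical scheme: fold k is the test set, fold k+1 (cyclically) the validation set, and the remaining three folds the training set. The function name and the random seed are also hypothetical.

```python
import numpy as np


def cross_validation_splits(n_images, n_folds=5, seed=0):
    """Return (train, val, test) index arrays for each fold.

    Assumption: fold k is the test set, fold (k + 1) mod n_folds the
    validation set, and the remaining folds the training set.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_images), n_folds)
    splits = []
    for k in range(n_folds):
        v = (k + 1) % n_folds
        train = np.concatenate([folds[i] for i in range(n_folds) if i not in (k, v)])
        splits.append((train, folds[v], folds[k]))
    return splits
```

Because the test folds partition the image indices, concatenating the five test-set predictions covers the whole database once, which is what allows the results to be pooled.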
In the proposed cascade approach, a failure of the first step (far wall detection) will trigger a failure of the second step (IMC segmentation). To conduct a fair evaluation of both steps, we first quantified the success rate of the first step alone, then we quantified the accuracy of the second step by manually enforcing valid initial conditions when needed.
Robustness of the far wall detection: After visual inspection, predicted median axes ( of the database) were considered failures, i.e. curves unusable to initialize the IMC segmentation step. Hence, the success rate was , and in the images with failures, the median axis was manually redrawn using an in-house graphical interface.
Accuracy of the IMC segmentation: The segmentation error was quantified by measuring the column-wise median absolute difference (MAD) between the method output and the annotations performed by A1, for LI, MA, and IMT. These results are summarized in Table 1.
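The error measure can be sketched as column-wise absolute differences between the method's contours and A1's contours, computed for LI, MA and IMT (IMT = MA − LI per column). This is a minimal illustration: the function names are hypothetical, the per-image reduction defaults to the median as worded above, and aggregating the per-image errors into a mean and standard deviation over the database is an assumption about how such summary figures are obtained.

```python
import numpy as np


def image_errors(li_pred, ma_pred, li_ref, ma_ref, dy_um, reduce=np.median):
    """Column-wise absolute differences for LI, MA and IMT, reduced per image.

    Contours are row positions (pixels) over a common set of columns; NaN marks
    columns outside either contour. dy_um is the vertical pixel size in µm.
    """
    li_pred, ma_pred, li_ref, ma_ref = (np.asarray(a, float) for a in (li_pred, ma_pred, li_ref, ma_ref))
    valid = ~np.isnan(li_pred + ma_pred + li_ref + ma_ref)
    diffs = {
        "LI": np.abs(li_pred - li_ref),
        "MA": np.abs(ma_pred - ma_ref),
        "IMT": np.abs((ma_pred - li_pred) - (ma_ref - li_ref)),  # IMT = MA - LI
    }
    return {name: float(reduce(d[valid])) * dy_um for name, d in diffs.items()}


def summarize(per_image):
    """Mean and (population) standard deviation of each per-image error."""
    return {name: (float(np.mean([e[name] for e in per_image])),
                   float(np.std([e[name] for e in per_image])))
            for name in per_image[0]}
```

A constant vertical shift of both contours leaves the IMT error at zero while the LI and MA errors grow, which is why the three measures are reported separately.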
| Measure | Method vs. A1 | A2 vs. A1 |
|---|---|---|
4 Discussion and conclusion
We developed and assessed a nearly automatic deep-learning method (requiring only two mouse clicks to define the limits of the exploitable ROI) that extracts the contours of the intima-media complex in longitudinal B-mode ultrasound images of the carotid artery. The method first approximately localizes the far wall and then segments the anatomical interfaces of interest. The proposed approach can segment ROIs of variable width without having to resize the images.
Robustness of the far-wall localization step is a prerequisite for an overall correct segmentation. This step was successful in all but of the images, demonstrating the robustness of the method. The actual segmentation step achieved good accuracy, with errors smaller than the inter-observer variability, both in terms of mean absolute difference in contour location (less than m vs. m) and of standard deviation (m vs. m). These results were comparable to those of the best-performing state-of-the-art methods evaluated on the same database [meiburger2021carotid]. Because it relies on supervised learning, our method has the potential to improve when trained on a larger and more diverse database, as shown in a recent study in which our method outperformed existing ones [meiburger2022carotid].
The largest errors occurred in the presence of calcified plaques. As the work presented here targeted asymptomatic, plaque-free subjects, images with plaques were not expected. Nevertheless, we anticipate that the results might be improved by enriching the database with appropriately annotated images of this kind and re-training the networks. This avenue deserves investigation.
In conclusion, given its success rate and an accuracy comparable to that of human experts, the proposed method may be recommended for use in clinical practice.
This work was partly supported, via NL’s doctoral grant, by the LABEX PRIMES (ANR-11-LABX-0063) of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).
The authors have no relevant financial or non-financial interests to disclose.
6 Compliance with ethical standards information
The data from human subjects used in this work were obtained and treated in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committees of the institutions involved in creating the multicentric database, from which these data were accessed.