According to the World Health Organization, stroke is the second leading cause of death in the world. It accounts for about 11% of all deaths. The early diagnosis of acute stroke is of primary importance for deciding on a method for further treatment.
In the process of diagnosing an acute stroke, a radiology specialist performs a manual search and identification of the pathological density area in the substance of the brain on the computed tomography (CT) brain images. It is worth noting that non-contrast CT scans are more readily available and have no contraindications, unlike Magnetic Resonance Imaging (MRI) or contrast CT. However, interpretation of non-contrast CT images is rather difficult and ambiguous, as it depends on technical and human factors. An automated system for the recognition of acute stroke on non-contrast CT brain images seems to be a promising solution to this problem. Such a system can be used by radiologists to check the accuracy of their stroke area predictions and to assist in the decision-making process for further treatment.
Many works are devoted to the segmentation of CT images and show a good performance. But in most cases, such algorithms require a large amount of accurately annotated data, which is not easy to obtain in the medical field. For example, the manual segmentation of the stroke areas on CT scans is time-consuming, requires a highly qualified specialist, and therefore is expensive. Moreover, it is known that even a highly skilled specialist can make erroneous predictions with a fairly high probability. If there is a large amount of data, a specialist can label the affected area approximately, for example, with a bounding box or an oval. Such labeling is considered inaccurate because it contains many inaccurately labeled pixels, but it is much easier and faster to get.
In this regard, it is very important to have an algorithm capable of learning from inaccurately labeled data. When constructing a neural network algorithm for weakly supervised learning, it is necessary to take into account that the data on which an algorithm will be trained contain erroneous labels, which will affect the segmentation accuracy on the validation set.
In this work, a neural network algorithm based on the U-Net [1] architecture is developed, and methods for solving the segmentation problem in the case of incomplete information in the training data are proposed.
In the rest of the paper, we give a brief overview of the related work, then describe the existing dataset, the neural network architecture used, its modifications, and training details. Then we present the proposed methods for weakly supervised learning, the obtained results, and some implementation details.
2 Related work
In problems of semantic segmentation of medical images, fully convolutional neural networks (CNNs) show better performance than classical machine learning methods such as kNN, SVM, Random Forest, and AdaBoost classifiers [2].
Most approaches to the weakly supervised segmentation problem can be divided into three groups: 1) a two-stage process in which the images are first processed to obtain more accurate segmentation masks, which are then fed to the neural network; 2) direct modification of the neural network architecture; 3) a mixture of 1) and 2). An example of the first approach is [3]. The authors first derive pseudo-masks from the bounding-box segmentation masks using ConvCRF [4] (a modification of Fully Connected CRFs [5]) and then train an ensemble of CNNs on them. A voxel-wise weight map is obtained from the ensemble's predictions, and these weights are then used in the loss function when training the final CNN. An example of the second approach is [6]. In this work, the proposed BBConV layer, which receives a bounding filter as input, is added to U-Net. In each skip connection, the intersection between the output of the contracting layer at that level and the BBConV layer output is computed and then concatenated with the features from the up-sampling layers. This technique helps the network to estimate where an organ can be. An example of the third approach is [7]. In this work, pseudo-masks are obtained from RECIST diameters using GrabCut [8], and a co-segmentation CNN model with attention modules is then trained on them. There are also multi-fidelity methods [9], which have recently become popular. Under this approach, the segmentation masks may be assumed to come from models of different fidelities. The sample set is then much cheaper to generate, since it may contain only a few high-fidelity samples and many low-fidelity ones.
Our approach is inspired by the idea of a weighted loss function in [3], but we obtain the weights in another way, and they have a different meaning.
3 Dataset
To solve our problem, 42 CT brain images of patients with diagnosed acute stroke were obtained from the database of the International Tomography Center SB RAS, Novosibirsk, Russia. In each image, the radiologist performed a careful manual segmentation of areas affected by stroke. These images also contain a certain percentage of inaccurately labeled pixels, but it is relatively small, so we simulate the inaccuracy on the segmentation masks with ovals, see an example in Fig. 1.
We form a new dataset by replacing a randomly chosen subset of the 42 segmentation maps made by the radiologist with simulated inaccurate oval masks. In this paper, we consider the case in which about 20% of the dataset is replaced. We use the following data augmentation techniques: horizontal flip, rotation, random-sized cropping, and elastic transformation. All images were first cropped to the central area and resized to 512 × 512. Min-max normalization was applied to the CT scans.
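The preprocessing described above can be sketched in NumPy as follows. This is only an illustration: the text does not specify how the ovals were drawn, so the sketch assumes a single axis-aligned ellipse circumscribing the bounding box of the accurate mask, and the function names `min_max_normalize` and `simulate_oval_mask` are hypothetical.

```python
import numpy as np


def min_max_normalize(image: np.ndarray) -> np.ndarray:
    """Rescale intensities linearly to the range [0, 1]."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(image, dtype=np.float32)
    return ((image - lo) / (hi - lo)).astype(np.float32)


def simulate_oval_mask(mask: np.ndarray) -> np.ndarray:
    """Replace an accurate binary mask with a coarse elliptical one.

    Assumption: one axis-aligned ellipse that circumscribes the bounding
    box of the accurate mask; the actual simulation may differ.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return np.zeros_like(mask)
    center_i = (rows.min() + rows.max()) / 2.0
    center_j = (cols.min() + cols.max()) / 2.0
    # Semi-axes scaled by sqrt(2) so that the ellipse passes through the box corners.
    semi_i = max((rows.max() - rows.min()) / 2.0, 0.5) * np.sqrt(2.0)
    semi_j = max((cols.max() - cols.min()) / 2.0, 0.5) * np.sqrt(2.0)
    ii, jj = np.indices(mask.shape)
    # Small tolerance keeps the box corners inside despite rounding errors.
    inside = ((ii - center_i) / semi_i) ** 2 + ((jj - center_j) / semi_j) ** 2 <= 1.0 + 1e-9
    return inside.astype(mask.dtype)
```

By construction the simulated oval contains every pixel of the accurate mask plus a margin of wrongly labeled background pixels, which is the kind of inaccuracy considered in this paper.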
4 Neural network architecture
We use the U-Net architecture with 32 initial feature channels instead of 64 as in the original work. We add padding to all feature maps before each 3 × 3 convolutional layer; thus, the network's output is the same size as the input image. Batch normalization and dropout layers are added after each convolutional layer. We add the pyramid pooling module [10, 11] in the bottleneck of U-Net; it helps to capture global information from different regions of the input image. We also add another convolutional layer after pyramid pooling, similar to the U-Net convolutional block (see Fig. 2). Such a modification is known to increase the accuracy [10, 11].
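The idea of pyramid pooling can be illustrated with a simplified NumPy sketch. It is not the module used in this work: the bin sizes (1, 2, 3, 6) are those of the Pyramid Scene Parsing Network [11] and are assumed here, the per-level convolutions are omitted, nearest-neighbour upsampling replaces interpolation, and all function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(features: np.ndarray, bins: int) -> np.ndarray:
    """Average a (C, H, W) feature map over a bins x bins grid of regions."""
    c, h, w = features.shape
    row_edges = np.linspace(0, h, bins + 1).astype(int)
    col_edges = np.linspace(0, w, bins + 1).astype(int)
    pooled = np.empty((c, bins, bins), dtype=np.float64)
    for r in range(bins):
        for s in range(bins):
            region = features[:, row_edges[r]:row_edges[r + 1],
                              col_edges[s]:col_edges[s + 1]]
            pooled[:, r, s] = region.mean(axis=(1, 2))
    return pooled


def upsample_nearest(pooled: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, b, b) map back to (C, H, W)."""
    b = pooled.shape[1]
    rows = np.arange(h) * b // h
    cols = np.arange(w) * b // w
    return pooled[:, rows[:, None], cols[None, :]]


def pyramid_pooling(features, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with its pooled-and-upsampled copies along channels."""
    features = np.asarray(features, dtype=np.float64)
    _, h, w = features.shape  # h and w are assumed to be at least max(bin_sizes)
    levels = [features]
    for bins in bin_sizes:
        levels.append(upsample_nearest(adaptive_avg_pool(features, bins), h, w))
    return np.concatenate(levels, axis=0)
```

The coarsest level (one bin) carries the global average of every channel to each spatial position, which is how the module injects global context into the bottleneck features.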
5 Training
We compute the exponentially weighted average of the CNN's parameters obtained during learning on the training sample, taken over the mini-batches up to the last one. As a result, the loss function decreases more smoothly, and the minimum is located more accurately.
Because the classes are highly imbalanced (the area affected by stroke occupies only a small part of the entire image), we use the weighted binary cross-entropy loss function. We also combine it with the Dice loss [12] by means of a weighting parameter. The class weights are determined by the number of all pixels corresponding to the affected areas and the number of all background pixels; the ground-truth label of the $i$-th pixel is 1 for the stroke area and 0 for the background.
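The exact formulas for the parameter averaging and the combined loss are not reproduced in this text, so the following NumPy sketch shows only one common way to realise them. The decay rate `beta`, the inverse-frequency class weights, the convex combination with the mixing parameter `alpha`, and all function names are assumptions made for illustration.

```python
import numpy as np


def ema_update(avg_params, new_params, beta=0.99):
    """One exponentially weighted averaging step over a dict of parameter arrays."""
    return {name: beta * avg_params[name] + (1.0 - beta) * new_params[name]
            for name in new_params}


def weighted_bce(pred, target, eps=1e-7):
    """Binary cross-entropy with class weights inversely proportional to class size."""
    pred = np.clip(pred, eps, 1.0 - eps)
    n_pos = max(target.sum(), 1.0)          # pixels in the affected areas
    n_neg = max((1.0 - target).sum(), 1.0)  # background pixels
    n = target.size
    w_pos, w_neg = n / (2.0 * n_pos), n / (2.0 * n_neg)
    per_pixel = -(w_pos * target * np.log(pred)
                  + w_neg * (1.0 - target) * np.log(1.0 - pred))
    return float(per_pixel.mean())


def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 minus the Dice overlap of prediction and target."""
    inter = (pred * target).sum()
    return float(1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps))


def combined_loss(pred, target, alpha=0.5):
    """Assumed convex combination of the weighted BCE and the Dice loss."""
    return alpha * weighted_bce(pred, target) + (1.0 - alpha) * dice_loss(pred, target)
```

With inverse-frequency weights, the few stroke pixels and the many background pixels contribute comparably to the cross-entropy term, while the Dice term directly rewards overlap with the labeled area.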
6 Proposed methods for weakly supervised segmentation
To solve the weakly supervised segmentation problem, we introduce two models of inaccuracy presented in subsections 6.1 and 6.2 respectively. Each model of inaccuracy shows the likelihood that the pixel label is correct. The resulting values obtained from these models are used as weights in the loss function in the manner described in subsection 6.3.
6.1 The First Model of Inaccuracy
The method is based on the observation that the farther a pixel labeled as affected in the oval mask lies from the center of the supposed affected area (i.e., the oval), the more likely its label is wrong. Since a specialist makes a coarse-grained annotation, most of the inaccurately labeled pixels are likely to lie near the border of the marked stroke area.
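Suppose $M = (m_{ij})$ is an $n \times n$ matrix corresponding to the inaccurate oval segmentation mask, where $n$ is the image dimension and $m_{ij} \in \{0, 1\}$: 1 denotes the stroke area and 0 denotes the background. The value $v_1(p)$ obtained from the first model of inaccuracy for a pixel $p$ of the oval is computed from the Euclidean distance to the center of the supposed stroke area using the following equation:
$$v_1(p) = 1 - \frac{\sqrt{(i_p - i_c)^2 + (j_p - j_c)^2}}{\max_{q} \sqrt{(i_q - i_c)^2 + (j_q - j_c)^2} + \varepsilon},$$
where $(i_p, j_p)$ are the matrix indices of pixel $p$, $(i_c, j_c)$ are the matrix indices of the central pixel of the oval, and the maximum is taken over all pixels $q$ of the oval. Here $\varepsilon$ is a small positive real number; thus, normalization by the maximum distance plus $\varepsilon$ makes $v_1(p) > 0$ for all $p$.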
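A minimal NumPy sketch of such a distance-based weight map is given below. It rests on several assumptions: the center of the oval is taken as the centroid of its pixels, the value decreases linearly from 1 at the center, background pixels outside the oval keep the value 1, and the function name is hypothetical.

```python
import numpy as np


def euclidean_inaccuracy_weights(mask: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Weight map for an oval mask: close to 1 at the center, close to 0 at the border."""
    rows, cols = np.nonzero(mask)
    weights = np.ones(mask.shape, dtype=np.float64)  # background is left at 1 (assumption)
    if rows.size == 0:
        return weights
    # Center of the oval, taken here as the centroid of its pixels (assumption).
    center_i, center_j = rows.mean(), cols.mean()
    dist = np.sqrt((rows - center_i) ** 2 + (cols - center_j) ** 2)
    # Dividing by (max distance + eps) keeps every weight strictly positive.
    weights[rows, cols] = 1.0 - dist / (dist.max() + eps)
    return weights
```

For an elongated oval, pixels at the ends of the short axis receive noticeably higher values than pixels at the ends of the long axis, because only the plain Euclidean distance to the center is taken into account.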
6.2 The Second Model of Inaccuracy
This model is also based on the observation that the farther a pixel labeled as affected in the oval mask lies from the center of the supposed affected area, the more likely its label is wrong. The difference is that the second model uses the Mahalanobis distance. The Mahalanobis distance differs from the Euclidean distance in that it takes into account the correlation between variables and is scale-invariant. The Mahalanobis distance is computed by the following equation:
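$$d_M(p) = \sqrt{(x_p - \mu)^{T} S^{-1} (x_p - \mu)},$$
where $x_p = (i_p, j_p)^{T}$ are the indices of pixel $p$, $\mu$ is the vector of mean values of the row and column indices of the pixels corresponding to the affected area, and $S^{-1}$ is the inverse covariance matrix calculated from the indices of the pixels belonging to the oval. The values obtained from the second model of inaccuracy are calculated similarly:
$$v_2(p) = 1 - \frac{d_M(p)}{\max_{q} d_M(q) + \varepsilon},$$
where the maximum is taken over all pixels $q$ of the oval and $\varepsilon$ is a small positive real number.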
The distribution of the values obtained from the first and the second models of inaccuracy is shown in Fig. 3.
The darker the pixel, the lower its value.
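The Mahalanobis-based values can be sketched in NumPy as follows. The sketch assumes that the mean and the covariance are estimated from the pixel indices of the oval, that the value decreases linearly with the normalized distance, and that background pixels keep the value 1; the function name is hypothetical.

```python
import numpy as np


def mahalanobis_inaccuracy_weights(mask: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Weight map for an oval mask based on the Mahalanobis distance to its mean."""
    rows, cols = np.nonzero(mask)
    weights = np.ones(mask.shape, dtype=np.float64)  # background is left at 1 (assumption)
    if rows.size < 3:
        return weights  # too few pixels to estimate a covariance matrix
    coords = np.stack([rows, cols], axis=1).astype(np.float64)  # shape (N, 2)
    mu = coords.mean(axis=0)
    cov = np.cov(coords, rowvar=False)  # 2 x 2 covariance of the pixel indices
    cov_inv = np.linalg.pinv(cov)       # pseudo-inverse guards against degenerate shapes
    diff = coords - mu
    # Squared Mahalanobis distance diff^T * cov_inv * diff for every pixel at once.
    d2 = np.einsum("ni,ij,nj->n", diff, cov_inv, diff)
    dist = np.sqrt(np.maximum(d2, 0.0))
    weights[rows, cols] = 1.0 - dist / (dist.max() + eps)
    return weights
```

Unlike the Euclidean variant, this map assigns similar values to all border pixels of an elongated oval, since the distance is rescaled along each axis of the shape.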
6.3 Modified Loss Function
The value obtained for the $i$-th pixel by the first or the second model of inaccuracy is then used in the loss function as the weight $w_i$ of the corresponding term.
White pixels are less reliable in oval masks. During the training stage, multiplying the loss terms of less reliable pixels by their weights reduces their contribution to the loss function. Consequently, these pixels also have less influence on the parameter updates computed by backpropagation. It is worth noting that the weights in the loss function are only used for inaccurate oval segmentation masks.
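The weighting scheme can be illustrated with a simplified NumPy sketch. For brevity it uses plain binary cross-entropy without the class weights and the Dice term, which is an assumption made for illustration only; the function names and the `is_oval` flag are hypothetical.

```python
import numpy as np


def pixel_weighted_bce(pred, target, pixel_weights, eps=1e-7):
    """Binary cross-entropy in which every pixel term is scaled by its own weight."""
    pred = np.clip(pred, eps, 1.0 - eps)
    per_pixel = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float((pixel_weights * per_pixel).mean())


def batch_loss(preds, targets, weight_maps, is_oval):
    """Average loss over a batch; weight maps are applied only to oval-labeled images."""
    losses = []
    for pred, target, weights, oval in zip(preds, targets, weight_maps, is_oval):
        if not oval:
            # Accurately labeled image: every pixel counts fully.
            weights = np.ones_like(target, dtype=np.float64)
        losses.append(pixel_weighted_bce(pred, target, weights))
    return float(np.mean(losses))
```

A pixel with a weight close to zero contributes almost nothing to the loss and therefore to the gradient, so a wrong label at that position has little effect on the trained parameters.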
7 Results of experiments
To make the results more reliable, we run the experiments on five different subsets of data and average the results obtained by training on them. We form these subsets by replacing different segmentation maps made by the radiologist with simulated inaccurate oval masks, without repetitions. The Dice similarity coefficient (DSC) [15] and 5-fold cross-validation were used for evaluation.
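For reference, the DSC of two binary masks is twice the size of their intersection divided by the sum of their sizes. A minimal NumPy sketch is given below; the function name and the convention for two empty masks are assumptions.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask) -> float:
    """Dice similarity coefficient of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, true).sum() / denom)
```

The coefficient ranges from 0 for disjoint masks to 1 for identical masks.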
To reduce the impact of inaccurately labeled pixels, we transform the values with a power function, i.e., raise them to a fixed power. The results of applying the First and Second Models of Inaccuracy (MoI) are shown in Table 1, which reports the DSC value and its standard deviation depending on the power used in this function.
Table 1. DSC value and its standard deviation for the 1st MoI and the 2nd MoI, depending on the power.
The result of the experiment when training without weights in the loss function is 0.7478 DSC. Thus, the proposed methods with the first and the second models of inaccuracy improve the segmentation quality on average by 1.31 and 1.55, respectively.
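The effect of a power transform on weights from the interval (0, 1] can be illustrated with a few numbers. The exponents below are arbitrary examples rather than the values studied in Table 1, and the function name is hypothetical.

```python
import numpy as np


def apply_power(weights: np.ndarray, power: float) -> np.ndarray:
    """Raise weights in (0, 1] to a power; a power above 1 shrinks low weights further."""
    return np.power(weights, power)


weights = np.array([1.0, 0.5, 0.1])
print(apply_power(weights, 2.0))  # squares each weight
print(apply_power(weights, 0.5))  # takes the square root of each weight
```

A power above 1 leaves the weight 1 unchanged and pushes smaller weights toward 0, which suppresses the border pixels more strongly; a power below 1 has the opposite effect.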
8 Implementation details
9 Conclusion
In this work, the problem of weakly supervised semantic segmentation of non-contrast computed tomographic brain images in the diagnosis of stroke was considered. By the weakly supervised task we mean the scenario in which some images are labeled accurately and others are labeled inaccurately. This task is important since accurately annotated data is expensive and not easy to obtain.
Our proposed methods for weakly supervised segmentation, which use weights obtained by the first and second models of inaccuracy, improve the quality of segmentation; their effectiveness has been tested on real computed tomography images. In the future, we plan to use other neural network architectures, for example, 3D U-Net.
The work was partly supported by RFBR grant 19-29-01175.
References
[1] Ronneberger O, Fischer P and Brox T 2015 U-Net: Convolutional Networks for Biomedical Image Segmentation Medical Image Computing and Computer-Assisted Intervention – MICCAI 9351 pp 234–241
[2] Nedel'ko V, Kozinets R, Tulupov A and Berikov V 2020 Comparative Analysis of Deep Neural Network and Texture-Based Classifiers for Recognition of Acute Stroke using Non-Contrast CT Images Ural Symp. on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT) pp 376–379
[3] Yang G et al 2020 Weakly-supervised convolutional neural networks of renal tumor segmentation in abdominal CTA images BMC Med Imaging 20 p 37
[4] Teichmann M T and Cipolla R 2019 Convolutional CRFs for Semantic Segmentation Preprint 1805.04777
[5] Krahenbuhl P and Koltun V 2011 Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials Advances in Neural Information Processing Systems 24 pp 109–117
[6] Jurdi R, Petitjean C, Honeine P and Abdallah F 2020 BB-UNet: U-Net with Bounding Box Prior IEEE Journal of Selected Topics in Signal Processing 14 pp 1189–98
[7] Agarwal V, Tang Y, Xiao J and Summers R M 2020 Weakly-supervised lesion segmentation on CT scans using co-segmentation Proc. SPIE, Medical Imaging: Computer-Aided Diagnosis 11314 pp 356–361
[8] Rother C et al 2004 Grabcut: Interactive foreground extraction using iterated graph cuts ACM Transactions on Graphics 23 pp 309–314
[9] Peherstorfer B, Willcox K and Gunzburger M 2018 Survey of multifidelity methods in uncertainty propagation, inference, and optimization SIAM Review 60(3) pp 550–591
[10] Abulnaga S M and Rubin J 2019 Ischemic stroke lesion segmentation in CT perfusion scans using pyramid pooling and focal loss Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries pp 352–363
[11] Zhao H, Shi J, Qi X, Wang X and Jia J 2017 Pyramid Scene Parsing Network Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) pp 2881–90
[12] Jadon S 2020 A survey of loss functions for semantic segmentation IEEE Conf. on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) pp 1–7
[13] Kingma D P and Ba J 2015 Adam: A Method for Stochastic Optimization 3rd Int. Conf. for Learning Representations
[14] Smith L N and Topin N 2019 Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications 11006
[15] Taha A A and Hanbury A 2015 Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool BMC Med Imaging 15 p 29