Classification of glomerular hypercellularity using convolutional features and support vector machine

06/28/2019 ∙ by Paulo Chagas, et al. ∙ 7

Glomeruli are histological structures of the kidney cortex formed by interwoven blood capillaries, and are responsible for blood filtration. Glomerular lesions impair kidney filtration capability, leading to protein loss and metabolic waste retention. An example of lesion is the glomerular hypercellularity, which is characterized by an increase in the number of cell nuclei in different areas of the glomeruli. Glomerular hypercellularity is a frequent lesion present in different kidney diseases. Automatic detection of glomerular hypercellularity would accelerate the screening of scanned histological slides for the lesion, enhancing clinical diagnosis. Having this in mind, we propose a new approach for classification of hypercellularity in human kidney images. Our proposed method introduces a novel architecture of a convolutional neural network (CNN) along with a support vector machine, achieving near perfect average results with the FIOCRUZ data set in a binary classification (lesion or normal). Our deep-based classifier outperformed the state-of-the-art results on the same data set. Additionally, classification of hypercellularity sub-lesions was also performed, considering mesangial, endocapilar and both lesions; in this multi-classification task, our proposed method just failed in 4% of the cases. To the best of our knowledge, this is the first study on deep learning over a data set of glomerular hypercellularity images of human kidney.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 3

page 8

page 20

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Digital histopathology is a research field that exploits digital images for the analysis of tissue samples. The digital pictures are obtained either by scanning histological whole-slide-images (WSIs) or by collecting snapshots of histological structures relevant for the diagnosis of diseases (Al-Janabi et al., 2012)

. This approach makes gathering large-scale data sets of histological lesions easier to review or to exchange information among pathologists without the inconvenience of working with the actual glass slides. The evolution of the computer vision field impacted the entire digital medicine, supporting pathologists on the automatic analysis of various types of medical images, as well as improving the accuracy of computer-aided diagnosis

(Madabhushi and Lee, 2016; Litjens et al., 2017).

In the special case of renal histopathology, disease markers are mostly found in the glomeruli, presenting highly diverse and heterogeneous characteristics. The glomerulus is a histological structure from the kidney cortex, formed by a network of capillaries charged of performing blood filtration. As an elementary filtering structure, it is targeted with many primary and systemic diseases, leading to different patterns of glomerular lesions. Finding and classifying glomerular lesions are fundamental steps toward the diagnosis of many kidney diseases. These tasks rely on the expertise of pathologists and much effort has been made to better define and create consensus about relevant lesions. In fact, after successive discussion and validation studies in the field, increased consistency has been achieved in the diagnosis and classification of glomerular renal diseases such as lupus nephritis, IgA nephropathy, and rejection of kidney transplant (Bajema et al., 2018; Trimarchi et al., 2017; Joosten et al., 2005). Some limiting factors to the performance of histological diagnosis are the complexity of lesions, which, in some cases, may impair a clear definition in terms of criteria and consequently a suitable agreement among pathologists (Barisoni et al., 2013).

Figure 1: Mark with an on the image with hypercellularity lesion.

Particularly, glomerular hypercellularity is a frequent lesion found in kidney biopsies, defined by an increase in the number of cells in the glomeruli. This type of lesion is an integral component of many glomerular diseases such as proliferative and membranoproliferative glomerulonephritis, being a marker of activity in lupus and IgA nephropathy (Bajema et al., 2018; Trimarchi et al., 2017). Hypercellularity can be identified by a careful look at the histological sections from the glomeruli, searching for the presence of agglomerates formed by four or more cell nuclei in the mesangial area (mesangial hypercellularity), or by cell aggregates that fill the capillary lumen (endocapillary hypercellularity) (Churg et al., 1995; Fogo, 2003). Figure 1 shows the complexity of this problem, and the following question can be raised: Which image depicts a glomerulus with a hypercellularity lesion? The answer to this question is the image on the right due to the increased nuclei density; on the left, the image shows an example of a normal glomerulus with no significant number of cell clusters.

Although hypercellularity is easy to define and usually easy to be assessed in histological sections, an agreement among pathologists may decrease for focal hypercellularity and for occurrences in specific regions of the glomerulus. For instance, a recent report from the IgA Nephropathy Classification Working Group showed inconsistencies among specialist even in the use of dichotomous MEST system scores such as E (endocapillary hypercellularity) and M (mesangial hypercellularity) (Trimarchi et al., 2017). Correct assessment of these scores is crucial for relevant clinical-pathological correlation and for predicting the patient outcome. A consistent glomerulus classification can be deemed as an important and difficult step towards diagnosing a renal disease in a biopsy evaluation (Pedraza et al., 2017).

Some works have already approached the tasks of glomerulus identification and segmentation (Sarder et al., 2016; Kannan et al., 2019; Simon et al., 2018), which are useful in situations that require an analysis of the entire WSI. Barros et al. (2017) proposed a method relying on classical image pre-processing techniques and a k-nearest neighborhood to classify hypercellularity lesions; that work used 811 images of human glomeruli (referred here as FIOCRUZ data set) stained with hematoxylin-eosin (H&E) and periodic acid–Schiff (PAS) from a set of biopsy slides. More recently, deep neural networks outperformed handcrafted features for some tasks on histological images as well, achieving stunning results in different scenarios (Janowczyk and Madabhushi, 2016; Xu et al., 2016; Sharma et al., 2017; Wahab et al., 2017; Spanhol et al., 2016; Hou et al., 2016; Zhang et al., 2018; Fabijańska, 2018; Gandomkar et al., 2018). In particular to glomerular detection with deep-learning, Marsh et al. (2018) introduced a convolutional neural network for automatic localization of glomeruli, further classifying global glomerulosclerosis in donor kidney biopsies for transplantation.

An automated process for glomerular lesion classification would have many applications, such as: Large-scale classification of cases based on histological images, consistency of morphological classification, and identification of tissue markers of disease progression.

1.1 Contributions

Three main contributions are brought here: (i) Instead of using conventional classification methods as in (Barros et al., 2017), we propose a CNN-based architecture to extract trainable features to represent a glomerulus, (ii) by using the proposed CNN as a feature extractor, an SVM classifies the CNN features as a normal or a injured glomerulus, (iii) we also extend the proposed model for classification of specific hypercellularity lesions (endocapillary hypercellularity, mesangial hypercellularity, and both), providing an analysis of the generated features for both binary and multi-lesion classification. The final CNN-SVM classifier reached near perfect results in four different train/test splits of the data set introduced in (Barros et al., 2017); in the multi-classification task, the same architecture failed in just 4% of the cases in ten-fold cross-validation study. At the end, the misclassified images were analyzed by three pathologists, showing that there were no consensus for most of those images.

2 Classifying glomerular hypercellularity

The classification of a glomerular hypercellularity lesion could be tackled as defining areas and counting nuclei. If the number of nuclei per area surpasses a threshold, one can diagnose a glomerulus as with a hypercellularity lesion. Instead of following this pathologist-annotation approach, an automatic classification consists of using examples of histological images to train a classifier. A histological image is a 2-dimensional grid of pixels that brings specific information such as colors, edges, shapes, textures, which can be general or specific to classify a glomerular lesion. Consequently, conceiving a successful feature extractor demands some domain expertise, which brings us to the following question: What is the best feature set for classifying glomerular hypercellularity lesions?

Many feature extraction techniques are available in the literature, and a specific method could be designed as well. In contrast to conventional classifiers, deep-learning aims to automatically learn hierarchical feature representations of the input data, without the need of creating any particular feature extractor

(LeCun et al., 2015)

. Our work proposes a novel CNN-based architecture for glomerular hypercellularity classification. After training a CNN, it is possible to use a strong classifier on the convolutional backbone of features. This way, we propose to use a CNN architecture to extract trainable features, which ultimately will feed an SVM to carry on the final classification. The proposed architecture is evaluated for both binary and multi-class classification. The rationale to use an SVM is based on the main characteristic of this classifier that is to cast optimization problems, which are convex and quadratic. Ultimately, these characteristics guarantee that the hyperplane found is the optimum one. The second reason is to analyze the behavior of feature space extracted from the CNN, which empirically demonstrated to be linear, in our experiments. Linearity in the feature space is expected to provide faster and higher results.

2.1 Conceiving the proposed CNN architecture

There are several well-established CNN architectures available in the literature (Canziani et al., 2016), which were designed to be robust to deal with hundreds of different classes. However, these models tend to overfitting, when trained using few data. Since the data set we used (Barros et al., 2017) consists of a small training set, we decided to build our own architecture from scratch, modifying it accordingly to our needs. The ultimate goal is to focus on achieving a high accuracy, avoiding overfitting.

A CNN architecture is organized in layers, each one applying a specific operation. Although there are many variations of CNN architectures, they share some basic components, such as convolutional, pooling, and fully-connected layers (Gu et al., 2015)

. The convolutional layer is the fundamental building block of a CNN model, which is comprised of various learnable kernels (filters) followed by a nonlinear activation function. A pooling layer (usually applied after a convolutional layer) is used to compute feature maps condensed in a smaller representation with the goal of achieving some invariance. After some convolutional and pooling operations, the top of the network results in a high-level representation of the input image, which is more robust than the raw pixel information, or hopefully than handcrafted features. This type of architecture requires a fully-connected layer to perform high-level classification using those features, working as a multilayer perceptron (MLP) on top of a CNN backbone.

Four architectures were initially implemented and Figure  6

highlights the convolutional blocks (CNN backbone) used for feature extraction, and the MLP blocks (fully-connected layers and final activation) used for classification. The first architecture was designed in the view of investigating how the lesion classification behaved using fewer layers. In addition to the operations previously cited, batch normalization, regularization, and dropout operations were applied to reduce overfitting. The first architecture (Fig.

(a)a

) is composed mainly of four convolutional layers, with the other operations applied between those layers, followed by one fully-connected layer. A rectified linear unit (ReLu) was used as an activation function and max-function for pooling operations. For the calculation of the class probabilities after the fully-connected layers, a sigmoid function was first tried, and further changed to a soft-max function. With this first architecture in mind, updates were performed based on the stability of the accuracy curve in the validation set, and other three architectures were proposed (Figs.  

(b)b, (c)c and (d)d).

(a)
(b)
(c)
(d)
Figure 6: Four CNN architectures proposed here: (a) Architecture 1 and (b) architecture 2, with four convolutional layers in the backbone; (c) architecture 3 and (d) architecture 4, with five and six convolutional layers, respectively, in the backbone.

In order to choose the best model among the candidate architectures, we randomly selected 90% of the data set for training the model, while using 10% for validation. To deal with the great size of the data set in memory, we applied a mini-batch strategy, which consists of using several batches of

images to update the final model (instead of one single block of data). After each epoch, the proposed architecture was evaluated by using the validation set. Since we focused on reducing the overfitting, the more likely architecture to be selected would be the one with high accuracy and less oscillation in the accuracy. Figure 

7 shows the accuracy curve for each architecture, illustrating the raise not only on the accuracy peak, but also on the stability of the curve after several epochs. Our final CNN architecture (Fig. (d)d

) consists mainly of six convolutional layers, five max-pooling layers, followed by three fully-connected and one soft-max layers for classification. The training parameters were empirically obtained through several experiments on the four architectures. The best results using

architecture 4 were achieved by training the deep network using the following parameters: 200 epochs, Adam training algorithm (Kingma and Ba, 2014), of decay rate, batch size of 32, and a learning rate of .

Figure 7:

Stability evaluation of CNN architectures. From ’Architecture 1’ to ’Architecture 4’, accuracy reaches stability as the number of training epochs increases. Best results were achieved with ’Architecture 4’, which used a softmax layer at the top, resulting in a more stable accuracy on the validation set.



Figure 8: CNN-SVM architecture. From left to right: a glomerulus as an input image in an RGB color space, resized to 224224 pixels. After applying architecture 4 (Fig. (d)d) for feature extraction, a feature vector with 128 features is generated. Finally, the resultant feature vector is classified by an SVM evaluated by considering linear, polynomial, RBF and sigmoid kernel functions.

2.2 Classifying the CNN features with SVM

After choosing the best architecture, the trained CNN features fed an SVM, instead of the multi-layer perceptron used for training the model. This CNN-SVM architecture was evaluated with four kernel functions: Linear, radial basis function (RBF), polynomial and sigmoid (see Fig. 

8).

SVM is a supervised binary classifier, which finds an optimal hyperplane to separate the classes of hypercellularity from those of normal glomeruli by using support vectors. When these classes are non linearly separable, different kernel functions can be used to map the input vectors to a higher-dimensional space (so-called feature space), in which the input image can be linearly separated. To classify an input feature vector, SVM evaluates the sign of a function f(x), given by

(1)

where there are -support vectors with the model parameters, , and a bias parameter, , laplacian coefficients from the dual optimization problem, . is a kernel function.

3 Experiment analysis

3.1 Data set

In order to assess the performance of our proposed CNN architecture, the data set introduced in (Barros et al., 2017) was used. The data set consists of 811 images, including 300 images of normal human glomerulus, while 511 images of human glomerulus with hypercellularity. As the images originated from human kidneys with different diseases, the cellular component of the hypercellularity varies among the cases. The images were selected from the digital histological image library of the Gonçalo Moniz Institute (FIOCRUZ), including cases of all the kidney biopsies performed for the diagnosis of glomerular diseases in referral nephrology services of public hospitals in Bahia state, Brazil, between 2003 and 2015. The tissue samples were fixed in Bouin’s fixative or formalin–acetic acid–alcohol, included in paraffin. Sections of 2-3 m were stained by H&E and PAS. The images were captured using an Olympus QColor 3 digital camera attached to a Nikon E600 optical microscope (using 200 magnification). Details of the clinical and demographic characteristics of the patients from which kidney biopses were collected are presented in (Barros et al., 2017). Considering Oxford MEST, the former binary data set was relabeled into four classes: Endocapillary (90 images with endocapillary hypercellularity), with mesangial (238 images with mesangial hypercellularity), endoMes (179 images of both lesions) and normal (304 images with no lesion). In this re-evaluation process, using the MEST criteria for hypercellularity, it is noteworthy that four images were misclassified as lesioned glomeruli in the original binary data set used by Barros et al. (2017). This occurrence led to a difference between the number of normal glomeruli on the binary corpus (300 images) and on the 4-class (304) data set.

3.2 Methodology

All images were resized and normalized to 224224 pixels. For a comparative evaluation considering a binary classification, a K-fold cross-validation was applied, varying K as 2, 3, 5 and 10 folds. On each iteration, 1 different fold is used for validation, and the rest ( folds) is used for training the model. With the best CNN architecture, we compared the performance of two types of classifiers on the top of CNN backbone: CNN-MLP and CNN-SVM. Our methodology can be summarized in two steps:

  • CNN-MLP: the best architecture is first found by using only 90/10 split without cross-validation. Next, using different values of K, we applied K-fold cross-validation, analyzing the performance of the models using different sizes of training and validation data.

  • CNN-SVM: For each value of K (folds), we selected the best CNN-MLP model. Then, we used the CNN features, obtained from the last layer before the fully-connected MLP, for the input of the SVM (see Fig. 8).

Finally, for the multi classification, we used the same approach as the binary classification, but without varying the value of . Since the 4-class data set is derived from the original data set used for binary classification, the number of images per class became smaller. This way, we decided to use in order to avoid a very small number of training samples per class. The one-versus-all technique was used to achieve SVM multi-class outputs.

3.3 Evaluation metrics

Four metrics were used to evaluate our proposed method: Precision (P) as the ratio of correctly predicting glomerular hypercellularity, and the sum of predicted true positive and false positive observations (whereby high precision is regarded to low false positive rate), recall (R) as the ratio of correctly predicting glomerular hypercellularity, and the sum of predicted true positive and false negative observations (whereby high recall is regarded to low false negative rate), f1-score (F1)

as the weighted average of precision and recall (whereby high f1-score is regarded to high precision and recall rates), and, finally,

accuracy (ACC) as the ratio of correctly predicting glomerular hypercellularity and normal glomeruli, and the total sum of positive and negative observations (whereby accuracy is proportional to true positive and true negative rates, and inversely proportional to false positive and false negative rates).

3.4 Evaluating the proposed CNN model for binary classification

The final CNN was evaluated by using the average of the chosen metrics, observing how the model generalized the classes as the size of the training and validation set changed. It is noteworthy that a equals to 2 means a split of 50/50, as well as, K equals to 3, 5 and 10, mean 67/33, 80/20 and 90/10, respectively. Since the training set decreases proportionally to K, we used a technique of image augmentation, enlarging twice the original data set after applying pre-defined random modifications such as rotation, horizontal flip, zoom and shift. The training parameters were the same as the ones used to train the last architecture (see Section 2.1). For each value of K, there were K different validation sets, resulting in K training processes and K candidate models at the ending of the training. For example, for K=10 there is one model for each training set combination, resulting in 10 models. When we evaluate only the CNN-MLP approach, the average of the metrics were computed with respect to these 10 models. However, since the aim was using the model as a feature extractor backbone, the best one out of the 10 candidates was selected, choosing the one with highest accuracy of all epochs. Table 1

shows the results of training the proposed CNN-MLP model, displaying the average metrics and their standard deviations for each train/test split. In general, all the train/test splits returned top results, achieving accuracies between 98.8% (50/50 split) and 99.6% (90/10 split). As expected, in the experiments using larger training sets (90/10 split), better results were achieved, although the worst scenario (50/50 split) still showed superior values for all the proposed metrics (around 98%) in comparison with previous work

(Barros et al., 2017) (85%). Another observation is the small standard deviation on all results, demonstrating the stability of the model.

Split CNN-MLP
P R F1 Acc
90/10
80/20
67/33
50/50
Table 1: Comparison between four different train/test splits with CNN-MLP on binary classification. The average metrics and their standard deviations are given for precision (P), recall (R), f1-score (F1) and accuracy (ACC).

3.5 Choosing the best SVM kernel for binary classification

Choosing optimal parameter values for the SVM kernel raises some questions about the interpretation of the model generated by this function and the results obtained. These questions were investigated in several works (Chapelle et al., 2002; Duan et al., 2003; Imbault and Lebart, 2004; Fu et al., 2004; de Souza et al., 2006). As shown in Table 2

, the CNN-SVM architecture was evaluated with three parameters of kernel functions: ’C’, ’gamma’ and ’degree’. The regularization parameter ’C’ is 1 by default, common to all SVM kernels, trading off misclassification of training examples against flatness of the solution. A low ’C’ makes the classifier flatness smooth, while a high one can lead to overfitting. The ’gamma’ parameter is usually 1 by default divided by number of features, and it is presented in all SVM kernels, but the linear. A small ’gamma’ value represents a Gaussian distribution of the kernel function with large variance in such a way that the model might not capture the ”shape” of the data set. When ’gamma’ is high, the resulting model will behave similarly to a linear kernel with a set of hyperplanes separating the points of the two classes; hence, large gamma takes to high bias and low variance models, and vice-versa. The ’degree’ parameter is 3 by default, and used only in polynomial kernel function. This parameter adjusts the feature space for higher-dimensional interactions. Larger ’degrees’ tend to overfit the data.

Kernel Function Parameter
Linear ’C’: [0.001, 0.01, 0.1, 1, 10, 100]
RBF , ’C’: [0.001, 0.01, 0.1, 1, 10, 100],
where refers to gamma ’gamma’: [0.001, 0.01, 1, 1.5, 2]
Polynomial (, ’C’: [0.001, 0.01, 0.1, 1, 10, 100],
where denotes gamma, by coef ’gamma’: [0.001, 0.01, 1, 1.5, 2],
and by degree ’degree’:[1,2,3,4]
Sigmoid , ’C’: [0.001, 0.01, 0.1, 1, 10, 100],
where denotes gamma ’gamma’: [0.001, 0.01, 1, 1.5, 2]
and is specified by coef
Table 2: Range of parameters to be evaluated for each SVM kernel.
Split Kernel Parameters Acc
90/10 Linear ’C’: 1 1.000 (0.000)
RBF C’: 0.1, ’gamma’: 0.001 1.000 (0.000)
Polynomial ’C’: 1, ’degree’: 1, 1.000 (0.000)
’gamma’: 1
Sigmoid ’C’: 0.01, ’gamma’: 0.01 1.000 (0.000)
80/20 Linear ’C’: 0.001 0.994 (0.011)
RBF C’: 0.1, ’gamma’: 0.01 0.996 (0.010)
Polynomial ’C’: 0.001, ’degree’: 1, 0.994 (0.011)
’gamma’: 1
Sigmoid ’C’: 0.01, ’gamma’: 0.01 0.996 (0.010)
67/33 Linear ’C’: 10 0.993 (0.006)
RBF ’C’: 1, ’gamma’: 1 0.994 (0.003)
Polynomial ’C’: 0.001, ’degree’: 3, 0.994 (0.007)
’gamma’: 2
Sigmoid ’C’: 1, ’gamma’: 0.01 0.991 (0.009)
50/50 Linear ’C’: 0.01 0.988 (0.005)
RBF C’: 10, ’gamma’: 0.01 0.988 (0.005)
Polynomial ’C’: 0.001, ’degree’: 2, 0.988 (0.005)
’gamma’: 1.5
Sigmoid ’C’: 1, ’gamma’: 0.01 0.988 (0.005)

Table 4: Comparison between four different train/test splits with CNN-SVM on binary classification. The average metrics and their standard deviation are given for precision (P), recall (R), f1-score (F1) and accuracy (ACC).
Split CNN-SVM
P R F1 Acc
90/10
80/20
67/33
50/50
Table 3: The best results per SVM kernel on binary classification.

The same range of K values applied to evaluate the CNN was also used to evaluate SVM. Table 4 shows the best parameter combinations for each kernel at each split, using accuracy as a metric for optimization. It is noteworthy that the linear kernel achieving top results means that the feature space can be linearly separable. The overall results of the CNN-SVM approach are summarized in Table 4, showing the performance of the proposed approach using the previously defined metrics. For the 90/10 split, all SVM kernels showed perfect ACC. For the 80/20 split, RBF and sigmoid kernels achieved the highest results. In the 67/33 split, RBF kernel obtained the best result. In the 50/50 split, all SVM kernels achieved the same results.

3.6 Extending the proposed architecture to multi-classification

Method P R F1 Acc
CNN-MLP
CNN-SVM
Table 6: The best parameters per SVM kernel on 4-class classification.
Kernel Parameters Acc
Linear ’C’: 0.01 0.945 (0.056)
RBF C’: 10, ’gamma’: 0.001 0.944 (0.057)
Polynomial ’C’: 0.01, ’degree’: 1 0.945 (0.056)
’gamma’: 1
Sigmoid ’C’: 10, ’gamma’: 0.001 0.945 (0.056)
Table 5: Comparison between CNN-MLP and CNN-SVM models on 4-class classification with 90/10 split. The average metrics and their standard deviation are given for precision (P), recall (R), f1-score (F1) and accuracy (ACC).

We also proposed the use of our CNN-SVM architecture for classification of a 4-class data set, including the following classes: Endocapillary hypercellularity, mesangial hypercellularity, endoMes (both lesions) hypercellularity, and normal glomerulus. The same binary classification methodology was followed, but now maintaining K=10 on the cross-validation. As expected, the only modification on the CNN architecture was the number of dense layers at the top of the model, since the number of classes was changed. At each fold on cross-validation, weights from the best CNN-MLP model on binary classification were loaded, updating only the number of classes on the last layer. Then, the whole CNN-MLP model was retrained on the 4-class data set using the same former training parameters, achieving an average accuracy of 90.6%. Just as the binary classification, the best model was selected among the 10 models from each fold of cross-validation, using the CNN backbone as a feature extractor, feeding an SVM classifier. The kernel parameters were varied in the same way as in the former experiments, achieving, as the best result, an average accuracy of 94.5%. Table 6 displays the final results for CNN-MLP and CNN-SVM classification on the 4-class data set, while Table 6 shows the parameters of the best results for each SVM kernel. The linear kernel achieved the overall best result again, proving the robustness of the CNN architecture for feature extraction.

4 Discussion and conclusions

Overall on binary classification, the two classification approaches (CNN-MLP and CNN-SVM) achieved high results on all metrics with low standard deviations, as showed in Tables 1 and 4. The two methods had close results, with CNN-SVM approach showing a slightly better performance for every value of , proving the robustness of the final proposed model. Despite the unbalanced data set (more samples for lesion than for normal glomeruli), we did not observe the models being heavily biased on the class with more images. This behavior may be due to two factors: Image augmentation and feature quality. The process of image augmentation helped to solve this issue by increasing the number of images through random modifications on the original training set. The features obtained from the CNN backbone proved to be highly suitable for classification using all kernels, achieving an average accuracy of 100% on the linear kernel. This outcome demonstrates that, despite the size of the CNN features (128), these features are linearly separable, which is an outstanding finding.

Method Precision Recall F1-Score Accuracy
CNN-SVM 1.00 1.00 1.00 1.00
CNN-MLP
Barros et al. (2017) 0.88 0.88 0.88 0.85
Table 7: Comparative performance for glomerular hypercellularity on binary classification.

A summary of the results of binary classification is presented in Table 7, displaying the best results of the CNN-MLP and CNN-SVM models in comparison with the method proposed in (Barros et al., 2017). As that previous work did not use the F1-score for evaluation, we calculated this score based on the provided precision and recall. Hence, we could compare the three results using all four metrics, considering 10-fold cross-validation (90/10 split). To the best of our knowledge, Barros et al. (2017) were the first to address the problem of glomerular hypercellularity lesion classification so far, what demonstrates that we achieved an improvement of 15 percentage points with our proposed deep learning-based model on the same data set.

Considering the 4-class classification, both CNN-MLP and CNN-SVM models achieved high results, even though the gap between these two approaches has increased (around four percentage points), as we can see in Table 6. This behavior may have occurred due to the difficulty of differentiating the 4 classes, mainly with respect to the sub-lesions. Another relevant characteristic is the endoMes class, which contains features that can be confused with both endocapillary and mesangial hypercellularity. Figure 9

illustrates the feature space of the data set plotted using the t-distributed stochastic neighbor embedding (t-SNE), which is a common technique for visualizing high-dimensional data into 2-dimensional plots. It’s noteworthy that the ”no lesion” class is well separated from the other lesion classes, which explains the 100% accuracy of the binary classification. The three lesion classes have some well-defined groups, but these classes also have some areas with quite an overlap of instances, meaning that images containinig endocapillary, mesangial and

endoMes hypercellularity can be very similar.

Figure 9: t-SNE visualization of the 4-class data set. The CNN feature extractor generates a 128-dimensional feature vector, and the t-SNE algorithm reduces the dimensionality to a 2-dimensional vector to help the analysis of clusters.
(a)
(b)
(c)
(d)
(e)
(f)
Figure 16: Six images of misclassified glomeruli with CNN-SVM architecture. From the left to right: (a) endocapillary hypercellularity misclassified as mesangial hypercellularity, (b) endocapillary hypercellularity misclassified as endoMes hypercellularity, (c) mesangial hypercellularity misclassified as endocapillary hypercellularity, (d) mensangial hypercellularity misclassified as endoMes hypercellularity, (e) endoMes hypercellularity misclassified as endocapillary hypercellularity, and (f) endoMes hypercellularity misclassified as mesangial hypercellularity.

Figure 16 shows six images misclassified by the CNN-SVM model, considering every possible error combination. These images depict complex lesions that may represent a challenge even for nephropathologists (corroborating with the t-SNE visualization). Figure 16(a) represents a glomeruli with increased circularity caused by cell proliferation and influx of inflammatory cell with disruption of glomerular compartments. Figure 16(b) represents a glomeruli with hypercellularity combined with mesangial matrix expansion and capillary wall thickening probably by immune complex deposition on the suendothelial and on the subepithelial aspects of the glomerular basement membrane burling the limits of glomerular compartments. Figure 16(c) hypercellularity is combined with capillary wall thickening and partial mesangial dissolution. In Figures 16(d) and (f), mesangial and capillary lumen are not always well defined. We showed these six images to be independently classified by three pathologists. The results of this analysis are shown in Table 8. Complete agreement among nephropathologists on the distribution of hypercellularity was achieved only in two out of the six images. In diagnostic practice most of the difficulties generated by these complex lesions are usually solved by examining contiguous tissue sections of 2 to 10 m apart, stained with a variety of techniques to highlight basement membrane and mesangial matrix such as PAS and Periodic acid-methenamine silver (PAMS).

Image Classifier Pool
(see Fig.16) Pathologist 1 Pathologist 2 Pathologist 3 CNN-SVM
a END END END MES PAT
b END ENDOMES ENDOMES ENDOMES COMP
c MES END ENDOMES END COMP
d MES MES MES ENDOMES PAT
e ENDOMES MES ENDOMES END PAT
f ENDOMES ENDOMES END MES PAT
Table 8: Comparison between the pathologists’ labels and the results obtained by the trained CNN-SVM model. The Pool column represents the majority voting outcome: computer (COMP) or pathologist (PAT).

Although perfect results on FIOCRUZ data set have been achieved, there is a considerable gap to move from academic research to practical computational systems that assist pathologists in an effective way. For future work, we are investigating different ways of using a transfer learning approach to initialize our network with better weights for generalizing glomerulus image classes, where sufficient training data exists. Additionally, we plan to expand the number of samples (now around 31,000 unlabelled images) in the data set, working with other types of lesions and histological stains used in the pathology laboratory for better data analysis. Another work in progress is the automatic glomerulus segmentation in a WSI, containing several glomeruli; the goal is to classify each found glomerulus, considering also the individual detection of each glomerulus component.

5 Ethical Considerations

This work was conducted in accordance with resolution No. 466/12 of the Brazilian National Health Council. To preserve confidentiality, the images (including those shown in the paper) were separated from other patient’s data. No data presented herein allows patient identification. All the procedures were approved by the Ethics Committee for Research Involving Human Subjects of the Gonçalo Moniz Institute from the Oswaldo Cruz Foundation (CPqGM/FIOCRUZ), Protocols No. 188/09 and No. 1817574.

Acknowledgment

The work was sponsored by Fundação de Amparo à Pesquisa do Estado da Bahia (FAPESB) grants TO-SUS0031/2018, TO-BOL0660/2018, TO-BOL0344/2018 and TO-PET0008/2015, respectively. Washington LC dos-Santos has a research scholarship from CNPq Proc. No. 306779/2017. Luciano Oliveira has a research scholarship from CNPq Proc. No. 307550/2018-4.

References

References

  • Al-Janabi et al. (2012) Al-Janabi, S., Huisman, A., Van Diest, P.J.. Digital pathology: current status and future perspectives. Histopathology 2012;61(1):1–9.
  • Bajema et al. (2018) Bajema, I.M., Wilhelmus, S., Alpers, C.E., Bruijn, J.A., Colvin, R.B., Cook, H.T., D’Agati, V.D., Ferrario, F., Haas, M., Jennette, J.C., et al. Revision of the international society of nephrology/renal pathology society classification for lupus nephritis: clarification of definitions, and modified national institutes of health activity and chronicity indices. Kidney international 2018;93(4):789–796.
  • Barisoni et al. (2013) Barisoni, L., Nast, C.C., Jennette, J.C., Hodgin, J.B., Herzenberg, A.M., Lemley, K.V., Conway, C.M., Kopp, J.B., Kretzler, M., Lienczewski, C., et al. Digital pathology evaluation in the multicenter nephrotic syndrome study network (neptune). Clinical Journal of the American Society of Nephrology 2013;8(8):1449–1459.
  • Barros et al. (2017) Barros, G.O., Navarro, B., Duarte, A., Dos-Santos, W.L.. Pathospotter-k: A computational tool for the automatic identification of glomerular lesions in histological images of kidneys. Scientific reports 2017;7:46769.
  • Canziani et al. (2016) Canziani, A., Paszke, A., Culurciello, E.. An analysis of deep neural network models for practical applications. arXiv preprint arXiv:160507678 2016;.
  • Chapelle et al. (2002) Chapelle, O., Vapnik, V., Bousquet, O., Mukherjee, S.. Choosing multiple parameters for support vector machines. Machine learning 2002;46(1-3):131–159.
  • Churg et al. (1995) Churg, J., Bernstein, J., Glassock, R.. Renal disease: classification and atlas of glomerular diseases. Igaku-Shoin, New York, Tokyo 1995;.
  • Duan et al. (2003) Duan, K., Keerthi, S.S., Poo, A.N..

    Evaluation of simple performance measures for tuning svm hyperparameters.

    Neurocomputing 2003;51:41–59.
  • Fabijańska (2018) Fabijańska, A.. Segmentation of corneal endothelium images using a u-net-based convolutional neural network. Artificial Intelligence in Medicine 2018;88:1 – 13.
  • Fogo (2003) Fogo, A.B.. Approach to renal biopsy. American Journal of Kidney Diseases 2003;42(4):826–836.
  • Fu et al. (2004) Fu, X., Ong, C., Keerthi, S., Hung, G.G., Goh, L.. Extracting the knowledge embedded in support vector machines. In: IEEE International Joint Conference on Neural Networks. IEEE; volume 1; 2004. p. 291–296.
  • Gandomkar et al. (2018) Gandomkar, Z., Brennan, P.C., Mello-Thoms, C.. Mudern: Multi-category classification of breast histopathological image using deep residual networks. Artificial Intelligence in Medicine 2018;88:14 – 24.
  • Gu et al. (2015) Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, L., Wang, G., et al. Recent advances in convolutional neural networks. arXiv preprint arXiv:151207108 2015;.
  • Hou et al. (2016) Hou, L., Samaras, D., Kurc, T.M., Gao, Y., Davis, J.E., Saltz, J.H.. Patch-based convolutional neural network for whole slide tissue image classification.

    In: IEEE Conference on Computer Vision and Pattern Recognition. 2016. p. 2424–2433.

  • Imbault and Lebart (2004) Imbault, F., Lebart, K.. A stochastic optimization approach for parameter tuning of support vector machines. In: International Conference on Pattern Recognition. IEEE; volume 4; 2004. p. 597–600.
  • Janowczyk and Madabhushi (2016) Janowczyk, A., Madabhushi, A.. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of pathology informatics 2016;7.
  • Joosten et al. (2005) Joosten, S.A., Sijpkens, Y.W., van Kooten, C., Paul, L.C.. Chronic renal allograft rejection: Pathophysiologic considerations. Kidney International 2005;68(1):1 – 13.
  • Kannan et al. (2019) Kannan, S., Morgan, L.A., Liang, B., Cheung, M.G., Lin, C.Q., Mun, D., Nader, R.G., Belghasem, M.E., Henderson, J.M., Francis, J.M., Chitalia, V.C., Kolachalama, V.B.. Segmentation of glomeruli within trichrome images using deep learning. Kidney International Reports 2019;.
  • Kingma and Ba (2014) Kingma, D.P., Ba, J.. Adam: A method for stochastic optimization. arXiv preprint arXiv:14126980 2014;.
  • LeCun et al. (2015) LeCun, Y., Bengio, Y., Hinton, G.. Deep learning. Nature 2015;521(7553):436–444.
  • Litjens et al. (2017) Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A., van Ginneken, B., Sánchez, C.I.. A survey on deep learning in medical image analysis. Medical Image Analysis 2017;42:60 – 88.
  • Madabhushi and Lee (2016) Madabhushi, A., Lee, G.. Image analysis and machine learning in digital pathology: Challenges and opportunities. Medical Image Analysis 2016;33:170 – 175.
  • Marsh et al. (2018) Marsh, J.N., Matlock, M.K., Kudose, S., Liu, T.C., Stappenbeck, T.S., Gaut, J.P., Swamidass, S.J.. Deep learning global glomerulosclerosis in transplant kidney frozen sections. bioRxiv 2018;:292789.
  • Pedraza et al. (2017) Pedraza, A., Gallego, J., Lopez, S., Gonzalez, L., Laurinavicius, A., Bueno, G.. Glomerulus classification with convolutional neural networks. In: Annual Conference on Medical Image Understanding and Analysis. Springer; 2017. p. 839–849.
  • Sarder et al. (2016) Sarder, P., Ginley, B., Tomaszewski, J.E.. Automated renal histopathology: digital extraction and quantification of renal pathology. In: Medical Imaging 2016: Digital Pathology. International Society for Optics and Photonics; volume 9791; 2016. p. 97910F.
  • Sharma et al. (2017) Sharma, H., Zerbe, N., Klempert, I., Hellwich, O., Hufnagl, P.. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Computerized Medical Imaging and Graphics 2017;61:2–13.
  • Simon et al. (2018) Simon, O., Yacoub, R., Jain, S., Tomaszewski, J.E., Sarder, P.. Multi-radial lbp features as a tool for rapid glomerular detection and assessment in whole slide histopathology images. Scientific reports 2018;8(1):2032.
  • de Souza et al. (2006) de Souza, B.F., de Carvalho, A.C., Calvo, R., Ishii, R.P..

    Multiclass svm model selection using particle swarm optimization.

    In: International Conference on Hybrid Intelligent Systems. IEEE; 2006. p. 31.
  • Spanhol et al. (2016) Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L.. Breast cancer histopathological image classification using convolutional neural networks. In: International Joint Conference on Neural Networks. IEEE; 2016. p. 2560–2567.
  • Trimarchi et al. (2017) Trimarchi, H., Barratt, J., Cattran, D.C., Cook, H.T., Coppo, R., Haas, M., Liu, Z.H., Roberts, I.S., Yuzawa, Y., Zhang, H., et al. Oxford classification of iga nephropathy 2016: an update from the iga nephropathy classification working group. Kidney international 2017;91(5):1014–1021.
  • Wahab et al. (2017) Wahab, N., Khan, A., Lee, Y.S..

    Two-phase deep convolutional neural network for reducing class skewness in histopathological images based breast cancer detection.

    Computers in biology and medicine 2017;85:86–97.
  • Xu et al. (2016) Xu, J., Luo, X., Wang, G., Gilmore, H., Madabhushi, A.. A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing 2016;191:214–223.
  • Zhang et al. (2018) Zhang, G., Hsu, C.H.R., Lai, H., Zheng, X.. Deep learning based feature representation for automated skin histopathological image annotation. Multimedia Tools and Applications 2018;77(8):9849–9869.