I Introduction
Lung cancer is the leading cause of cancer-related deaths worldwide, with an estimated 1.6 million deaths each year [cruz11lung]. Development of novel therapies to battle lung cancer has been greatly aided by the emergence of genetically engineered mouse models (GEMMs) of lung cancer, such as the Kras; p53 non-small-cell lung carcinoma (NSCLC) model, where the compound effect of conditional mutations in the Kras oncogene and the p53 tumor suppressor gene leads to development of adenocarcinomas in the mouse lung [walrath10genetically, barck15quantification]. Since GEMMs recapitulate certain aspects of the human disease associated with the stroma, vascularity, and immune infiltrate better than other models, it is important to be able to detect, identify, and localize the lung tumor lesions seen on the histopathological sections, as shown in Fig. 1.

Manual assessment of tumor burden (the amount of tumor cells/mass present in a subject's body) on histopathological mouse lung sections is difficult, time-consuming, and labor-intensive. This is due to fluctuating intensities [ram13symmetry], color changes and morphological variations within structures of the cancer lesions in these images [lin19fast], tumor heterogeneity [junttila13influence] (see Fig. 1), low signal-to-noise ratio [ram10seg, ram16size], variations in illumination [ram18three], microscopy imaging limitations [ram12size, ram2017sparse, ram18classify, ram20combined], and the large number of images and lesions per image that an expert has to demarcate. Moreover, the task of manual detection of cancer lesions on H&E slides can be subjective, leading to inter-observer variability. Therefore, there is a pressing need for computer-aided diagnostic tools for accurate and efficient quantitative analysis of histopathology images [gurcan09histo, veta14breast, xing16robust, ram21detect].

Tumor detection and classification tools within the commonly available microscopy software are based on feature extraction techniques such as size, shape, and morphological features [gurcan09histo, basavanhally13multi, gorelick13pros, veta14breast, xing16robust, ram16size, tizhoosh18represent], texture features including local binary pattern (LBP) [reis17auto, wan17integrated, simon18multi], local Fourier transform [kong11parti], co-occurrence matrix and fractal texture features [alinsaif20part], and energy minimization and optimization-based techniques [tosun11graph, ozdemir13hybrid, bejnordi16automated, javed20multi]. These techniques suffer considerably from overgeneralization and therefore need extensive customization for the dataset at hand, limiting their use to very simple images collected in a carefully constrained environment [ram16size]. Tumor detection and grading using size, shape, and other morphological features does not work well when the cell population exhibits a variety of sizes and shapes, or when the signal-to-noise ratio (SNR) is poor [shi17histo]. Energy minimization and optimization techniques minimize the internal energy within tumor areas for their accurate detection, but may lead to false detections for highly textured and heterogeneous tumor lesions. To overcome these limitations, existing software tools provide user-friendly interfaces for correcting the results obtained. This, however, sacrifices the benefits of automation, such as speed and reproducibility.

There has been much interest in developing algorithmic methods that adapt naturally to the dataset and perform feature discovery. One popular class of learning or feature discovery methods is based on sparse representation-based classification (SRC) [wright09robust]. Many SRC methods have been successfully applied to a variety of histopathological image classification problems [srinivas14simul, vu16histo, sarkar18sdl, li20anal]. These methods are based on finding linear representations in the data. However, linear representations are almost always inadequate for representing the nonlinear structures of the data that arise in many practical applications. A more recent class of learning-based methods involves the design of deep neural networks that can be trained to learn relevant features by themselves. Many deep learning methods have been developed for histopathological image classification [hou16patch, xu17large, tellez18whole, lin19fast, xing19pixel, campanella19clinical, wei19patho, valkonen20cyto]. The success of deep learning, however, has been fueled by the availability of generous and clean training data. When the training data is limited and/or noisy, as is often the case in medical imaging, these methods tend to show a performance degradation [goodfellow16deep]. Another class of learning-based approaches involves an orthogonal transformation of the data, such as the principal component analysis (PCA) transform, to extract relevant features for image classification [chan15pcanet, bruna13invariant, shi17histo, dutta20sparse]. These learning-based approaches using orthogonal transformations explore the data distribution to preserve global structures in the data.

In this paper, we present a simple machine learning approach called the graph-based sparse principal component analysis (GSPCA) network, which combines the local and global structures of all the data and is implemented in a deep learning framework to learn an explicit nonlinear mapping of the data for accurate detection and classification. We use the most basic and easy operations to emulate the processing stages in a typical (convolutional) neural network: First, graph-based sparse PCA filters are used as the data-adapting convolutional filter bank at each stage of the network. Next, we perform a simple binary quantization (hashing) that serves as the nonlinear stage, followed by block-wise histograms of the binary codes as the feature pooling stage to obtain the final output features of the network. Finally, we train a support vector machine (SVM) classifier on the output features of the network to obtain the final classification instead of the regular softmax classifier, as the softmax classifier is known to overfit [chan15pcanet]. For ease of reference, we call this data-processing network a Graph-based Sparse PCA Network (GSPCANet). The key contributions of this paper are as follows:
Feature Extraction Using Graph-Based Sparse PCA: Unlike other histopathology image classification methods, in this work we propose a baseline neural network method called GSPCANet, which differs from prior methods [bruna13invariant, chan15pcanet, shi17histo, dutta20sparse] in two respects. 1) We include an additional sparsity-promoting term in the PCA transformation so as to select more interpretable features from the images. 2) We include a graph regularization term in the objective function so as to preserve the local structures for each data point between the different classes.

Computationally Efficient Approach: Our proposed GSPCANet is computationally efficient in comparison to other deep learning methods in two respects. 1) We show that a simple two-stage network is good enough to extract all the relevant features for classifying the tumor versus healthy lung regions. 2) We do not need to learn the filter weights at each stage of the network.

We evaluate the proposed method and seven state-of-the-art algorithms developed for histopathology image classification on a dataset of 67 images provided by the Stefanie Galban Lab at the University of Michigan. The dataset consists of microscopy images of murine H&E-stained lung sections and is divided into two categories: images of non-tumor-bearing control mice and images of mice with visible tumors.
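II Principal Component Analysis
Let $X$ denote an $n \times p$ matrix of $n$ rows and $p$ columns of rank $r$, where $n$ is the number of data samples and $p$ is the number of features/variables. Let $x_{ij}$ denote the element of $X$ at row $i$ and column $j$. Assume each column has zero mean. Let $\Sigma = \frac{1}{n} X^T X$ denote the covariance matrix of $X$, where $\Sigma$ is a positive semidefinite matrix of size $p \times p$, which can be decomposed as
$$\Sigma = \sum_{i=1}^{p} \lambda_i v_i v_i^T, \tag{1}$$
where $\lambda_i$ is the $i$-th largest eigenvalue of $\Sigma$ and $v_i$ is its associated eigenvector. PCA reduces the dimensionality of the data from $p$ to $k$ by replacing the original features/variables with linear combinations of the form $X v_i$, known as the principal components (PCs), which are obtained by maximizing their variance:
$$v_1 = \arg\max_{\|v\|_2 = 1} \mathrm{Var}(X v) \quad \text{and} \quad v_{i+1} = \arg\max_{\|v\|_2 = 1,\; v^T v_j = 0\ \forall j \le i} \mathrm{Var}(X v),$$
where $v_i$ is the $i$-th principal loading vector, the projection of the data $X v_i$ is the $i$-th principal component, and the operator $\mathrm{Var}(\cdot)$ denotes the (estimated) variance of a random variable.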
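As a minimal numerical sketch of the variance-maximization view above (not the authors' implementation), the following NumPy code obtains loading vectors from the eigendecomposition of a covariance matrix and projects the data onto the first few of them. The 1/n normalization, the synthetic data, and the function name `pca_eig` are assumptions made purely for illustration.

```python
import numpy as np

def pca_eig(X, k):
    """Return the top-k eigenvalues, loading vectors and PCs of X (n x p)."""
    Xc = X - X.mean(axis=0)               # enforce zero-mean columns
    cov = Xc.T @ Xc / Xc.shape[0]         # p x p covariance (1/n normalization assumed)
    evals, evecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]   # indices of the k largest eigenvalues
    V = evecs[:, order]                   # p x k matrix of loading vectors
    return evals[order], V, Xc @ V        # PCs are the projections of the data

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # synthetic correlated data
evals, V, Z = pca_eig(X, k=2)
```

In this sketch the empirical variance of each column of `Z` equals the corresponding eigenvalue, which is the property the maximization above expresses.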
Generally, PCA is computed using the singular value decomposition (SVD) of $X$ as
$$X = U S V^T, \tag{2}$$
where the columns of $Z = U S$ are the PCs, and the columns of $V$ are the corresponding principal loading vectors (also known as basis vectors) [malladi20image]. The matrix $S$ is a $p \times p$ diagonal matrix of ordered singular values $s_1 \ge s_2 \ge \dots \ge s_p \ge 0$, and the columns of $U$ and $V$ are orthonormal such that $U^T U = V^T V = I$. If $X$ is low rank, it is possible to significantly reduce its dimensionality by using the $k$ most significant basis vectors. The projection of the data $X$ upon the first $k$ basis vectors gives the PCs.

An alternative formulation for PCA can be derived within the projection framework [chan15pcanet], where the PC loading matrix $V$, also known as the PCA basis (defined as the matrix containing the principal loading vectors), can be estimated by solving the following least squares optimization problem:
$$\hat{A} = \arg\min_{A} \|X - X A A^T\|_F^2 \quad \text{subject to } A^T A = I_k, \tag{3}$$
where $\|\cdot\|_F$ is the Frobenius norm, $A$ is a $p \times k$ matrix whose columns form an orthonormal basis $\{a_1, \dots, a_k\}$, and $I_k$ is an identity matrix of size $k \times k$. The columns of $A$ that minimize (3) are referred to as the PCA basis $V$. The minimization is solved by formulating it as a least absolute shrinkage and selection operator (LASSO) problem [zou06sparse]. Each principal component is derived from a linear combination of all $p$ features, consequently making $V$ non-sparse. We use this alternative formulation for PCA feature extraction in this work.

III Proposed Method
Based on the PCA methodology, we propose a simple and efficient machine learning method for histopathology image classification. First, we obtain graph-based sparse PCA filters from the training images as the data-adaptive convolutional filter bank for the various stages of a convolutional neural network. Then we perform a simple binary quantization (hashing), which serves as a nonlinear stage. Next, we use block-wise histograms of the binary codes obtained from the quantization process to get the output features of the network. Finally, we train an SVM classifier using the output features to obtain the final classification. The proposed GSPCANet model is shown in Fig. 2, illustrating each of the above steps involved in our algorithm.
III-A Graph-Based Sparse PCA
From the analysis of PCA in Section II, we can obtain a sparse PCA basis by including a regularization term in (3). Inclusion of a sparsity penalty reduces the number of features involved in each linear combination for obtaining the PCs. One way to extend (3) to obtain sparse basis vectors is by imposing $\ell_2$-norm and $\ell_1$-norm penalty constraints upon the regression coefficients (basis vectors) [zou06sparse]:
$$(\hat{A}, \hat{B}) = \arg\min_{A, B} \|X - X B A^T\|_F^2 + \lambda \sum_{j=1}^{k} \|\beta_j\|_2^2 + \sum_{j=1}^{k} \lambda_{1,j} \|\beta_j\|_1 \quad \text{subject to } A^T A = I_k, \tag{4}$$
where $B = [\beta_1, \dots, \beta_k]$ is the $p \times k$ matrix of regression coefficients, the same $\lambda$ (the regularization parameter of the $\ell_2$ norm) is used for all $k$ components, and different $\lambda_{1,j}$ (the regularization parameters of the $\ell_1$ norm) are allowed for penalizing the loadings of different PCs. The minimizer $\hat{B}$ corresponds to the required sparse basis. The $\ell_2$-norm and $\ell_1$-norm regularization terms penalize the nonzero coefficients in $B$, whereas the loss term simultaneously minimizes the reconstruction error $\|X - X B A^T\|_F^2$. If $\lambda$ and the $\lambda_{1,j}$ are zero, the problem reduces to finding the ordinary PCA basis vectors, equivalent to (3). When $\lambda_{1,j} > 0$, some coefficients of $B$ are forced to zero, resulting in sparsity.

The sparse PCA defined in (4) preserves the global structures in the data. In addition to preserving the global structures, we are interested in preserving the local structures, i.e., nearest neighbor (NN) preservation of each data point $x_i$, as they help in identifying local features in the data. We define $G$ to be a constructed weighted graph. The vertices of $G$ correspond to the data points $x_1, \dots, x_n$. The weight matrix $E$ is defined as
(5) 
where the neighborhood set of a node contains its nearest neighbors in the graph. Furthermore, the norm of the difference between two data points is used to measure their dissimilarity, and the weight matrix $E$ is used to restrict the similarity between two data points. Thus, with the weight matrix $E$, we can formulate a graph regularization term as
$$\frac{1}{2} \sum_{i,j=1}^{n} \|B^T x_i - B^T x_j\|_2^2 \, E_{ij} = \mathrm{tr}\left(B^T X^T (C - E) X B\right) = \mathrm{tr}\left(B^T X^T L X B\right), \tag{6}$$
where $C$ is a diagonal matrix with $C_{ii} = \sum_j E_{ij}$, $L$ is the graph Laplacian matrix computed as $L = C - E$, and $\mathrm{tr}(\cdot)$ is the trace of a matrix. Minimizing the graph regularization term in (6) ensures that the local structures between the data points are preserved. Combining the sparse PCA from (4) and the graph regularization from (6), we propose a graph-based sparse PCA model,
$$(\hat{A}, \hat{B}) = \arg\min_{A, B} \|X - X B A^T\|_F^2 + \lambda \sum_{j=1}^{k} \|\beta_j\|_2^2 + \sum_{j=1}^{k} \lambda_{1,j} \|\beta_j\|_1 + \gamma \, \mathrm{tr}\left(B^T X^T L X B\right) \quad \text{subject to } A^T A = I_k, \tag{7}$$
where $\gamma$ is a graph regularization parameter. To solve (7), we perform the following steps: first solve an ordinary PCA problem to fix $A$, then formulate an elastic net with the fixed $A$ and solve for $B$, then perform SVD to update $A$, and repeat these steps until convergence, finally obtaining the solution as $\hat{v}_j = \beta_j / \|\beta_j\|_2$, $j = 1, \dots, k$.
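The graph term can be illustrated with a small NumPy sketch. It builds a k-nearest-neighbor weight matrix, forms the degree matrix and the Laplacian, and checks numerically that the pairwise-distance form of the regularizer equals its trace form. The heat-kernel weights, the symmetrization step, the values of `k` and `sigma`, and the function name are assumptions for illustration only; the weight definition used in the paper may differ.

```python
import numpy as np

def knn_graph_laplacian(X, k=3, sigma=1.0):
    """Build symmetric k-NN weights E, degree matrix C and Laplacian L = C - E."""
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of rows of X.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)                   # a point is not its own neighbor
    E = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]               # indices of the k nearest neighbors
        E[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))  # assumed heat-kernel weight
    E = np.maximum(E, E.T)                         # symmetrize the neighbor relation
    C = np.diag(E.sum(axis=1))                     # diagonal degree matrix
    return E, C, C - E

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))                       # 20 hypothetical samples, 6 features
B = rng.normal(size=(6, 2))                        # hypothetical projection matrix
E, C, L = knn_graph_laplacian(X)
Y = X @ B                                          # projected samples, one per row
pairwise = 0.5 * sum(E[i, j] * np.sum((Y[i] - Y[j]) ** 2)
                     for i in range(20) for j in range(20))
trace_form = np.trace(Y.T @ L @ Y)
```

The two quantities `pairwise` and `trace_form` agree up to floating-point error whenever the weight matrix is symmetric, which is why the sketch symmetrizes the neighbor relation.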
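III-B Architecture of GSPCA Network
Suppose there are $N$ training images $\{I_i\}_{i=1}^{N}$ of size $m \times n$, and assume that the PCA filter size is $k_1 \times k_2$ (formed by reshaping a basis vector of length $k_1 k_2$) at all stages of the network. The sparse PCA filters are learned from these training images. We describe each component of the network in detail below (see Fig. 2).
III-B1 First Stage (GSPCA)
For each training image $I_i$, around each pixel we take an image patch of size $k_1 \times k_2$ and denote all the overlapping image patches in the image as $x_{i,1}, x_{i,2}, \dots, x_{i,\tilde{m}\tilde{n}} \in \mathbb{R}^{k_1 k_2}$, where $x_{i,j}$ denotes the $j$-th vectorized image patch in $I_i$, $\tilde{m} = m - \lceil k_1/2 \rceil$, $\tilde{n} = n - \lceil k_2/2 \rceil$. We then subtract the image patch mean from each of the image patches and obtain the centralized matrix of $I_i$ as $\bar{X}_i = [\bar{x}_{i,1}, \bar{x}_{i,2}, \dots, \bar{x}_{i,\tilde{m}\tilde{n}}]$, where $\bar{x}_{i,j} = x_{i,j} - \mu_{i,j} \mathbf{1}$ and $\mu_{i,j}$ is the mean of the entries of $x_{i,j}$. By constructing a similar centralized matrix for each training image $I_i$, we obtain
$$X = [\bar{X}_1, \bar{X}_2, \dots, \bar{X}_N] \in \mathbb{R}^{k_1 k_2 \times N \tilde{m} \tilde{n}}. \tag{8}$$
Assuming that we have $L_i$ PCA filters in stage $i$, sparse PCA minimizes the reconstruction error within a family of orthonormal filters using (7), where $I_{L_1}$ is an identity matrix of size $L_1 \times L_1$. The solution to the minimization problem in (7) is given by the $L_1$ principal eigenvectors of $X X^T$ [chan15pcanet]. The PCA filters can therefore be expressed as
$$W_l^1 = \mathrm{mat}_{k_1, k_2}\left(q_l(X X^T)\right) \in \mathbb{R}^{k_1 \times k_2}, \quad l = 1, 2, \dots, L_1, \tag{9}$$
where $\mathrm{mat}_{k_1, k_2}(v)$ is an operator that reshapes a column vector $v \in \mathbb{R}^{k_1 k_2}$ to a matrix $W \in \mathbb{R}^{k_1 \times k_2}$ and $q_l(X X^T)$ denotes the $l$-th principal eigenvector of $X X^T$. The principal eigenvectors capture the main variation of the centralized image patches in the training data. Similar to a convolutional neural network, we stack multiple stages of the sparse PCA filters to extract higher-level features.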
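The first-stage filter learning can be sketched as follows. For brevity this sketch uses plain PCA (the principal eigenvectors of the patch scatter matrix) rather than the graph-based sparse variant, and it extracts only patches that fit entirely inside the image; both simplifications, along with the function name and the random test images, are assumptions for illustration.

```python
import numpy as np

def learn_pca_filters(images, k1, k2, num_filters):
    """Learn num_filters filters of size k1 x k2 from a list of 2-D arrays."""
    cols = []
    for img in images:
        m, n = img.shape
        for r in range(m - k1 + 1):
            for c in range(n - k2 + 1):
                patch = img[r:r + k1, c:c + k2].reshape(-1)
                cols.append(patch - patch.mean())    # remove the patch mean
    X = np.stack(cols, axis=1)                       # (k1*k2) x (number of patches)
    evals, evecs = np.linalg.eigh(X @ X.T)           # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :num_filters]            # principal eigenvectors first
    # Reshape each eigenvector of length k1*k2 into a k1 x k2 filter.
    return [top[:, l].reshape(k1, k2) for l in range(num_filters)]

rng = np.random.default_rng(0)
images = [rng.normal(size=(16, 16)) for _ in range(3)]   # hypothetical training images
filters = learn_pca_filters(images, k1=5, k2=5, num_filters=4)
```

Because the filters come from an eigendecomposition of a symmetric matrix, they are mutually orthonormal when flattened, and no gradient-based training is involved.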
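III-B2 Second Stage (GSPCA)
We repeat the same process as in the first stage. Let the $l$-th filter output of the first stage be
$$I_i^l = I_i * W_l^1, \quad i = 1, 2, \dots, N, \tag{10}$$
where $*$ denotes 2D convolution and the boundary of the images $I_i$ is zero-padded before convolution. Similar to the first stage, we collect all the overlapping image patches of the convolved image $I_i^l$, subtract the patch mean from each patch, and obtain the centralized matrix $\bar{Y}_i^l = [\bar{y}_{i,l,1}, \bar{y}_{i,l,2}, \dots, \bar{y}_{i,l,\tilde{m}\tilde{n}}]$, where $\bar{y}_{i,l,j}$ is the $j$-th mean-subtracted image patch in $I_i^l$. We define $Y^l = [\bar{Y}_1^l, \bar{Y}_2^l, \dots, \bar{Y}_N^l] \in \mathbb{R}^{k_1 k_2 \times N \tilde{m} \tilde{n}}$ as the matrix containing all the mean-subtracted patches of the $l$-th filter output and concatenate $Y^l$ for all filter outputs as
$$Y = [Y^1, Y^2, \dots, Y^{L_1}] \in \mathbb{R}^{k_1 k_2 \times L_1 N \tilde{m} \tilde{n}}. \tag{11}$$
Once again we solve (7) with $Y$ as the input. The solution to the minimization problem in (7) is given by the principal eigenvectors of $Y Y^T$. The sparse PCA filters of the second stage are then obtained as
$$W_\ell^2 = \mathrm{mat}_{k_1, k_2}\left(q_\ell(Y Y^T)\right) \in \mathbb{R}^{k_1 \times k_2}, \quad \ell = 1, 2, \dots, L_2. \tag{12}$$
For each input image $I_i^l$ of the second stage, there will be $L_2$ output images of size $m \times n$ generated as
$$O_i^l = \left\{ I_i^l * W_\ell^2 \right\}_{\ell=1}^{L_2}. \tag{13}$$
After the second stage we will obtain $L_1 L_2$ output images. It is easy to repeat the above process to build more (sparse PCA) stages if a deeper architecture is needed.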
III-B3 Binary Quantization (Hashing)
For each of the $L_1$ input images $I_i^l$ presented to the second stage we obtain $L_2$ real-valued output images $\{I_i^l * W_\ell^2\}_{\ell=1}^{L_2}$. We binarize these outputs and obtain $\{H(I_i^l * W_\ell^2)\}_{\ell=1}^{L_2}$, where $H(\cdot)$ is a Heaviside step (like) function, which has a value of 1 for positive entries and zero otherwise. Around each pixel, we view the vector of $L_2$ binary bits as a decimal number, thus converting the $L_2$ outputs in $O_i^l$ into a single integer-valued "image"
$$T_i^l = \sum_{\ell=1}^{L_2} 2^{\ell-1} H\left(I_i^l * W_\ell^2\right), \tag{14}$$
which has pixel values in the range $[0, 2^{L_2} - 1]$.
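A small sketch of the hashing step follows: each real-valued map is thresholded at zero and the resulting bits are combined into one integer per pixel. The bit ordering (the first map supplies the least significant bit) and the tiny hand-written input are assumptions for illustration.

```python
import numpy as np

def binary_hash(outputs):
    """Combine L2 real-valued maps (shape L2 x m x n) into one integer-valued map."""
    bits = (np.asarray(outputs) > 0).astype(np.int64)   # Heaviside: 1 for positive entries
    weights = 2 ** np.arange(bits.shape[0])             # 1, 2, 4, ... one weight per map
    return np.tensordot(weights, bits, axes=1)          # weighted sum of the bit planes

# Three hypothetical output maps of size 1 x 2.
outputs = np.array([[[0.5, -1.0]],
                    [[2.0, 0.0]],
                    [[-0.1, 3.0]]])
codes = binary_hash(outputs)   # first pixel: bits (1, 1, 0) -> 3; second: (0, 0, 1) -> 4
```

With three maps, every code lies between 0 and 7, so a histogram over the codes needs eight bins.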
III-B4 Block-wise Histograms
We partition each of the $L_1$ "images" $T_i^l$, $l = 1, \dots, L_1$, into $B$ distinct blocks, compute the histogram (with $2^{L_2}$ bins) of the decimal values in each block, and concatenate all $B$ histograms into a single vector, denoting it as $\mathrm{Bhist}(T_i^l)$. After such an encoding process, the "feature" of the input image $I_i$ is then defined to be the set of block-wise histograms, i.e.,
$$f_i = \left[\mathrm{Bhist}(T_i^1), \dots, \mathrm{Bhist}(T_i^{L_1})\right]^T \in \mathbb{R}^{(2^{L_2}) L_1 B}. \tag{15}$$
We use overlapping blocks to build the feature vector for each input image, as this helps retain most of the information.
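The pooling step can be sketched as below. The function slides a square block over an integer-coded image and concatenates the per-block histograms; a stride smaller than the block size yields overlapping blocks. The function name, the square block shape, and the toy input are illustrative assumptions.

```python
import numpy as np

def block_histograms(code_img, num_bins, block, stride):
    """Concatenate histograms of integer codes over (possibly overlapping) blocks."""
    m, n = code_img.shape
    feats = []
    for r in range(0, m - block + 1, stride):
        for c in range(0, n - block + 1, stride):
            tile = code_img[r:r + block, c:c + block]
            # Count how often each code 0..num_bins-1 occurs in this block.
            feats.append(np.bincount(tile.ravel(), minlength=num_bins))
    return np.concatenate(feats)

code_img = np.array([[0, 1, 2, 3],
                     [0, 1, 2, 3],
                     [3, 3, 0, 0],
                     [3, 3, 0, 0]])
feat = block_histograms(code_img, num_bins=4, block=2, stride=2)           # 4 disjoint blocks
feat_overlap = block_histograms(code_img, num_bins=4, block=2, stride=1)   # 9 overlapping blocks
```

Overlapping blocks lengthen the feature vector (36 entries instead of 16 in this toy case) in exchange for finer spatial coverage.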
We train a linear support vector machine (SVM) classifier [cortes95support] using the feature vector obtained for each input image from the GSPCANet in order to classify cancer lesions versus normal tissues on H&E stained histological lung slides.
III-C Classifying Color Images
There are several options to extend the proposed GSPCANet method to be able to extract features for classifying color images. In this work, we follow the approach described in [gurcan09histo, chan15pcanet] and apply the proposed GSPCANet to each of the red, blue, and green channels to obtain multichannel sparse PCA filters, that are then used to extract features for classifying the color images.
IV Experiments and Results
In this section we compare our proposed GSPCANet image classification algorithm with other open-source histopathology image classification methods: the SpPCANet method for image classification [dutta20sparse], multiple clustered instance learning (MCIL) for histopathology image classification [xu14weakly], saliency-based dictionary learning (SDL) [sarkar18sdl], analysis-synthesis learning with shared features (ASLF) [li20anal], patch-based convolutional neural network (PCNN) [hou16patch], encoded local projections (ELP) for histopathology image classification [tizhoosh18represent], and weakly supervised deep learning (WSDL) for whole slide tissue classification [campanella19clinical]. We evaluate the proposed method and these seven methods using commonly used detection/classification measures: precision (P), recall (R), detection accuracy, $F_\beta$ score, Tanimoto coefficient (T), and the receiver operating characteristic (ROC) curves along with the area under the curve (AUC).

The precision P and recall R (a.k.a. true positive rate or sensitivity) are given by
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \tag{16}$$
where TP is the number of true positive classifications, FP is the number of false positive classifications, and FN is the number of false negative classifications. The false positive rate (a.k.a. complement of specificity) is defined as $FP / (FP + TN)$, where TN is the number of true negative classifications. An ROC curve is a plot of the true positive rate versus the false positive rate. The detection accuracy is defined as $(TP + TN) / (TP + FP + TN + FN)$.
The $F_\beta$ score is defined by
$$F_\beta = \frac{(1 + \beta^2) \, P \, R}{\beta^2 P + R}. \tag{17}$$
We use $F_1$ (i.e., $\beta = 1$) as this is the most common choice for this type of evaluation [ram16size].
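These measures are simple to compute from the four confusion counts. The sketch below assumes the standard definitions of precision, recall, the F-beta score, and accuracy; the counts passed to `precision_recall` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

def f_beta(p, r, beta=1.0):
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

def accuracy(tp, fp, tn, fn):
    """Fraction of all classifications that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

p, r = precision_recall(tp=90, fp=10, fn=30)   # hypothetical counts: p = 0.9, r = 0.75
f1 = f_beta(p, r)                              # 2 * 0.9 * 0.75 / 1.65, about 0.818
table_check = f_beta(0.872, 0.955)             # precision and recall of GSPCANet in Table I
```

Plugging the GSPCANet precision and recall reported in Table I into `f_beta` gives about 0.912, which agrees with the score listed for that method.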
Tanimoto coefficient, also known as Tanimoto distance in statistics, is defined as
(18) 
where M is the number of detected individual tumors by an automated algorithm and N is the actual number of individual tumors in the image.
The AUC is the average of the precision $P(R)$ over the interval ($0 \le R \le 1$), where $P(R)$ is a function of recall R. It is given by
$$\mathrm{AUC} = \int_0^1 P(R) \, dR. \tag{19}$$
The best detection algorithm among several alternatives is commonly defined as the one that maximizes the Tanimoto coefficient, AUC, and the $F_1$ score.
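In practice the area under a curve is estimated from a finite set of operating points. A minimal sketch using the trapezoidal rule is shown below; the rule itself, the function name, and the three sample points are assumptions for illustration and are not taken from the experiments.

```python
def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve through the given (x, y) points."""
    pts = sorted(zip(xs, ys))                       # order the points by x
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0         # area of one trapezoid
    return area

recall = [0.0, 0.5, 1.0]        # hypothetical operating points
precision = [1.0, 0.8, 0.6]
auc = trapezoid_auc(recall, precision)   # 0.45 + 0.35 = 0.8
```

The same helper applies to an ROC curve by passing false positive rates as `xs` and true positive rates as `ys`.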
IV-A Dataset
The proposed method was mainly developed with the goal of identifying individual tumors in H&E-stained whole slide histopathology lung images obtained from an inducible Kras lung cancer model. The images were produced using a digital slide scanner (Super COOLSCAN 5000 ED Digital Slide Scanner; Nikon Corporation) with a objective lens (level pixel size: ). In our experiments, the size of each image acquired is approximately pixels. Our dataset consists of a total of 67 whole slide histopathology lung images obtained from 32 non-tumor-bearing mice and 35 mice with visible tumors. A careful manual delineation of the borders of the individual tumors within the 35 images was performed by an expert and considered as ground truth for subsequent analysis. We divide each image in our dataset into non-overlapping image patches of size pixels, yielding a total of 52,487 cancer lesion patches and 1,455,023 normal patches.
IV-B Experimental Setup
We used a total of 15 non-tumor-bearing mice images and 15 images with visible tumors for training the compared algorithms, consisting of a total of 21,934 cancer lesion patches and 653,092 normal patches. Our test dataset consists of 17 non-tumor-bearing mice images and 20 images with visible tumors, consisting of a total of 30,553 cancer lesion patches and 801,931 normal patches. The hyperparameters of the GSPCANet algorithm include the filter size ($k_1 \times k_2$), the number of stages, the number of filters in each stage ($L_i$), and the block size for the local histograms in the output stage. The optimal values for these parameters were automatically selected on a validation set (randomly chosen from within the training data), using the ROC curves, by varying one parameter at a time while keeping the others fixed and choosing the value of the parameter that maximizes the AUC of the ROC curve. The parameters of the GSPCANet were set to , , , and, a histogram block size of .
IV-C Qualitative Results
Fig. 3 shows the qualitative detection results for an example image containing visible tumors from our test dataset. Fig. 3(a) shows that the proposed GSPCANet method detects most of the tumor regions correctly with very few false positives and false negatives. Fig. 3(e) shows that the ASLF method is also able to identify the tumor regions well, but detects more false positives than the GSPCANet method. The SpPCANet, MCIL, and WSDL methods have many misclassifications (with blood vessels being identified as tumors) as shown in Figs. 3(b), (c), and (h), respectively. The ELP method splits a single tumor into three tumors (see Fig. 3(g) row 3), with many false positives. The SDL, PCNN, and ELP methods miss large parts of individual tumors, i.e., have many false negatives, as shown in Figs. 3(d), (f), and (g), respectively. Visually, it is clear that the proposed GSPCANet method accurately detects both large and small individual tumors within the whole slide image with very few false positives and false negatives. This is of great significance for those studying oncogenesis, progression, and metastasis because the robustness of the algorithm to the size of the tumor reduces the likelihood that the algorithm will mislabel cases containing only small tumors.
Method  Precision (P)  Recall (R)  $F_1$ score  Tanimoto Coefficient (T)  Detection Accuracy  AUC
GSPCANet  0.872 (0.013)  0.955 (0.019)  0.912 (0.015)  0.903 (0.010)  0.908 (0.008)  0.951 ± 0.011
SpPCANet [dutta20sparse]  0.841 (0.019)  0.870 (0.025)  0.855 (0.022)  0.836 (0.014)  0.853 (0.015)  0.907 ± 0.017
MCIL [xu14weakly]  0.719 (0.022)  0.780 (0.015)  0.748 (0.031)  0.762 (0.019)  0.738 (0.026)  0.821 ± 0.013
SDL [sarkar18sdl]  0.752 (0.024)  0.850 (0.031)  0.798 (0.025)  0.801 (0.017)  0.785 (0.011)  0.849 ± 0.021
ASLF [li20anal]  0.811 (0.028)  0.900 (0.019)  0.853 (0.021)  0.829 (0.030)  0.845 (0.018)  0.903 ± 0.022
PCNN [hou16patch]  0.807 (0.039)  0.815 (0.031)  0.811 (0.032)  0.796 (0.023)  0.810 (0.024)  0.871 ± 0.039
ELP [tizhoosh18represent]  0.761 (0.023)  0.750 (0.018)  0.756 (0.021)  0.739 (0.027)  0.758 (0.023)  0.844 ± 0.014
WSDL [campanella19clinical]  0.798 (0.030)  0.785 (0.028)  0.823 (0.031)  0.821 (0.035)  0.818 (0.028)  0.882 ± 0.041
Table I: Mean Performance (and Standard Deviation) for Various Algorithms
IV-D Quantitative Results
We compared the quantitative performance of the automated methods at the image patch level and for the task of individual tumor detection within an entire image. Fig. 4 shows the ROC curves of all automated methods at the image patch level on the test dataset. From Fig. 4, we observe that our proposed GSPCANet method exhibits the most favorable trade-off between accurate detection and a low false positive rate in comparison to the other automated methods. Table I shows the quantitative performance of the compared methods for the task of individual tumor detection within the histopathology images in the test dataset. Table I shows that the detection accuracy of the proposed GSPCANet method is higher than that of every competing algorithm (0.908 versus at most 0.853). From Table I, we also observe that the $F_1$ score and Tanimoto coefficient (T) of the proposed method are the highest among the compared algorithms. Table I also provides the AUC values and their 95% confidence intervals corresponding to the ROC curves in Fig. 4 for each method. We observe from the AUC values that the GSPCANet method outperforms the alternatives. In addition to the metrics in Table I, we also computed the free-response receiver operating characteristic (FROC) curves [ram16size] for all the compared algorithms. Fig. 5 shows that the proposed GSPCANet method has better detection accuracy compared to the other automated methods at all points along the FROC curve. This shows that the proposed method detects the individual tumors within these images better than the other compared methods.

The confusion matrix corresponding to the competing methods for our test dataset is provided in Table II. From Table II, we observe that our proposed GSPCANet method outperforms the competing dictionary learning methods as well as the deep learning methods. This success is attributed to the ability of our proposed GSPCANet method to capture both the local and the global features associated with both normal and cancerous regions within the images, which the other compared methods do not address.

Class  Cancerous (%)  Healthy (%)  Method
Cancerous  87.21  12.79  GSPCANet
Cancerous  84.06  15.94  SpPCANet [dutta20sparse]
Cancerous  71.89  28.11  MCIL [xu14weakly]
Cancerous  75.22  24.78  SDL [sarkar18sdl]
Cancerous  81.08  18.92  ASLF [li20anal]
Cancerous  80.69  19.31  PCNN [hou16patch]
Cancerous  76.14  23.86  ELP [tizhoosh18represent]
Cancerous  79.81  20.19  WSDL [campanella19clinical]
Healthy  4.97  95.03  GSPCANet
Healthy  13.47  86.53  SpPCANet [dutta20sparse]
Healthy  24.04  75.96  MCIL [xu14weakly]
Healthy  17.24  82.76  SDL [sarkar18sdl]
Healthy  11.24  88.76  ASLF [li20anal]
Healthy  18.69  81.31  PCNN [hou16patch]
Healthy  24.63  73.37  ELP [tizhoosh18represent]
Healthy  16.04  83.96  WSDL [campanella19clinical]
Table II: Confusion Matrix for the Compared Methods on the Test Dataset
IV-E Statistical Analysis
To investigate the robustness of each automated method to training-selection bias, we obtain the detection performance for 10 different choices of training image patches (the number of training images was fixed), using the rest of the image patches as test image patches. The detection accuracies over the training runs were fit to a Gaussian probability density function (pdf) and plotted in Fig. 6. From Fig. 6, we observe that the mean of our proposed GSPCANet curve is much higher than that of the competing methods, indicating superior average detection accuracy. Even more crucially, the spread/variance of our GSPCANet curve is smaller than that of its alternatives, indicating highly desirable robustness to the particular choice of training image patches.

We also performed a balanced two-way analysis of variance (ANOVA) [hogg87engine] on the detection accuracies in the selection-bias experiment for all the methods. Fig. 7 shows these comparisons using a post-hoc Tukey range test [hogg87engine]. Fig. 7 shows that the performance of the GSPCANet method is significantly separated from its competing alternatives. The p-values of the proposed GSPCANet method compared with other state-of-the-art methods are observed to be much less than , emphasizing the fact that the GSPCANet method is more effective.
IV-F Computational Complexity
Here we show computational complexity of the GSPCANet method by considering a two stage network. For each stage in the GSPCANet, forming the mean subtracted image patch matrix X has a computational complexity of ; the inner product in (9) has a complexity of ; the computational complexity of the eigen decomposition with graphregularization is . The sparse PCA filter convolution has a complexity of at stage . The blockwise histogram computation has a complexity of . With , , and assuming , the overall complexity of GSPCANet is
(20) 
The computational complexity in (20) applies to both the training and testing phase of GSPCANet because the extra computation burden during training is the eigen decomposition, which can be ignored when .
Method  Training Time (HH:MM:SS)  Run Time in Sec. (Std. Dev.)
GSPCANet  00:21:09  11.14 (3.09)
SpPCANet [dutta20sparse]  00:20:53  15.21 (1.41)
MCIL [xu14weakly]  18:25:06  66.35 (14.36)
SDL [sarkar18sdl]  01:22:41  46.11 (4.51)
ASLF [li20anal]  01:49:27  19.39 (5.15)
PCNN [hou16patch]  19:27:55  39.47 (15.22)
ELP [tizhoosh18represent]  04:38:03  71.44 (9.40)
WSDL [campanella19clinical]  21:44:17  10.31 (6.02)
Table III: Training Time and Mean Inference Run Time per Test Image for the Compared Methods
We compared the mean inference run time, namely, the time required to classify all the image patches in a single test image, for each of the competing algorithms. Table III shows the mean and standard deviation of the run time each method takes to classify an entire image. From Table III, we observe that the proposed GSPCANet method runs 0.83 seconds slower than the WSDL method on average, but is faster than all the other methods. The SDL and ASLF methods classify the test image patches by reconstructing them from the learned dictionaries and thus take more time to execute at test time. The ELP algorithm finds the Radon transformation of each test image patch at various orientations, thereby taking more time to classify each test image patch. The MCIL method integrates the clustering of multiple subtypes of a single class into the MIL classification framework, thus requiring more run time compared to the other methods. In Table III we also report the training time required to train each of the competing algorithms. From Table III, we observe that the proposed GSPCANet method and the SpPCANet method take about 21 minutes to train, whereas the other methods take about 4 to 62 times longer to train a good model. The short training time of the GSPCANet method is attributed to the low computational complexity of the method.
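The training-time comparison can be checked directly from the values in Table III. The short sketch below converts each HH:MM:SS entry to seconds and expresses it as a multiple of the GSPCANet training time; the helper name `to_seconds` is an illustrative assumption, while the times are copied from the table.

```python
def to_seconds(hhmmss):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return 3600 * h + 60 * m + s

# Training times as listed in Table III.
train_times = {
    "GSPCANet": "00:21:09",
    "SpPCANet": "00:20:53",
    "MCIL": "18:25:06",
    "SDL": "01:22:41",
    "ASLF": "01:49:27",
    "PCNN": "19:27:55",
    "ELP": "04:38:03",
    "WSDL": "21:44:17",
}

base = to_seconds(train_times["GSPCANet"])          # 1269 seconds
ratios = {name: to_seconds(t) / base for name, t in train_times.items()}
```

Among the methods other than SpPCANet, the smallest ratio belongs to SDL (about 3.9) and the largest to WSDL (about 61.7).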
IV-G Impact of the Number of Training Images
In this section, we show the practicality and applicability of the proposed GSPCANet method in medical imaging tasks where very little data is available to learn from. Whereas in all other experiments we trained on 15 images from each of the two classes, in this experiment we varied the number of training images (from 1 to 20) for all the competing methods and computed the detection accuracy of these methods. Fig. 8 shows the detection accuracy of all the competing algorithms on the test dataset of 27 images (12 non-tumor images and 15 images with visible tumors). From Fig. 8, we observe that the proposed GSPCANet method trained with as few as 8 images achieves a high detection accuracy of 91%, whereas the other methods achieve a maximum detection accuracy of only about 89% and require as many as 20 training images to do so. This shows that the proposed GSPCANet method can produce a good model for image classification with less training data.
V Discussion and Conclusion
Tumor burden in histopathological sections is difficult to assess by manual evaluation, as well as by prior automated tumor detection algorithms. To solve this problem, our proposed machine learning algorithm uses a cascaded graph-based sparse PCA transform followed by PCA binary hashing and block-wise histograms to obtain features within image patches. These features are then used to classify an image patch as cancerous or healthy using a linear SVM classifier. Our approach differs from earlier learning-based methods based on deep learning [hou16patch, campanella19clinical], instance learning [xu14weakly, tizhoosh18represent], or dictionary learning [sarkar18sdl, li20anal] for histopathology image classification. As with many deep learning methods, the network parameters, such as the number of stages, the filter size, and the number of filters, need to be optimized and fixed for our GSPCANet method. Once these parameters are fixed, training the GSPCANet is extremely simple and efficient because the filter learning in GSPCANet does not require regularized parameters or numerical optimization solvers. Moreover, the GSPCANet consists of only linear operations at each stage with a nonlinearity applied only at the output stage, which makes the method more interpretable than other deep learning methodologies.
The GSPCANet method was first validated with respect to detection accuracy using ROC curves and the AUC of the ROC curve. Second, the algorithm was validated with respect to detection accuracy using the precision, recall, $F_1$ score, Tanimoto coefficient, FROC curves, and the confusion matrix. Tables I and II show that the proposed GSPCANet method performs the best among the compared methods for histopathology image classification. Fig. 3 shows that the proposed GSPCANet method qualitatively performs the best in comparison to the other methods. Further, Fig. 6 shows that the GSPCANet method has superior average detection accuracy and is more robust to the choice of training images compared to the other methods. We also show the low computational complexity of the GSPCANet method and compare the training and inference run times for all the methods. Table III shows that the GSPCANet method learns a good model considerably faster than most of the other methods. Finally, Fig. 8 shows that the proposed method requires less data to learn a good model.
Next, we present some inherent limitations of the automated methods for tumor detection. Fig. 9 shows an example case of an image containing individual tumors where all algorithms including our algorithm fail to produce optimum detection results. In Fig. 9 we observe that even though the algorithm has detected all the individual tumors, i.e., the true positive image patches shown in green color, it has also detected many false positive image patches shown in red color. On close examination, we see that the false positive image patches within the image look very similar to cancerous image patches. This could be due to the fact that there is not enough resolution in this image to differentiate between the cancerous and healthy image patches, or this histopathology section was captured when some of the underlying cells were transitioning from healthy to being cancerous.
The proposed detection algorithm uses all the image patches in the training data for obtaining the local structures within the data when computing the graph-based term in (6) and (7). This adds to the time complexity and results in noise and outlier image patches still being included. However, the algorithm can be modified by linearly clustering the image patches into subgroups and taking these cluster centers to compute the graph term in (7). Making this change could further reduce detection errors and also accelerate the algorithm, making it more accurate and efficient at the same time.