Classification of pediatric pneumonia using chest X-rays by functional regression

05/07/2020 ∙ by Md Nazmul Islam, et al. ∙ 0

An accurate and prompt diagnosis of pediatric pneumonia is imperative for successful treatment intervention. One approach to diagnose pneumonia cases is using radiographic data. In this article, we propose a novel parsimonious scalar-on-image classification model adopting the ideas of functional data analysis. Our main idea is to treat images as functional measurements and exploit underlying covariance structures to select basis functions; these bases are then used in approximating both image profiles and corresponding regression coefficient. We re-express the regression model into a standard generalized linear model where the functional principal component scores are treated as covariates. We apply the method to (1) classify pneumonia against healthy and viral against bacterial pneumonia patients, and (2) test the null effect about the association between images and responses. Extensive simulation studies show excellent numerical performance in terms of classification, hypothesis testing, and efficient computation.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 4

page 13

page 26

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

Abstract

An accurate and prompt diagnosis of pediatric pneumonia is imperative for successful treatment intervention. One approach to diagnose pneumonia cases is using radiographic data. In this article, we propose a novel parsimonious scalar-on-image classification model adopting the ideas of functional data analysis. Our main idea is to treat images as functional measurements and exploit underlying covariance structures to select basis functions; these bases are then used in approximating both image profiles and corresponding regression coefficient. We re-express the regression model into a standard generalized linear model where the functional principal component scores are treated as covariates. We apply the method to (1) classify pneumonia against healthy and viral against bacterial pneumonia patients, and (2) test the null effect about the association between images and responses. Extensive simulation studies show excellent numerical performance in terms of classification, hypothesis testing, and efficient computation.

Keywords: X-rays; Image classification; Functional data analysis; Pneumonia detection; Scalar-on-image regression.

1 Introduction

Pneumonia is a form of an acute respiratory illness affecting the lungs. Though pneumonia can occur at any age, younger children are most vulnerable. Bacteria, viruses, or fungi can categorize the common causes of pneumonia. In all these cases, children experience breathing difficulty as their lungs get contaminated with pus and fluid, which results in the inflammation of air sacs limiting the oxygen intake. Pediatric pneumonia is the primary cause of fatality that claims 3 million children globally every year, according to the United Nations Children’s Fund (UNICEF) [51]. Unfortunately, the majority of these deaths were preventable and occurred due to the lack of proper immunization, adequate nutrition, and the availability of antibiotic treatments. The developing countries, particularly in Southeast Asia and sub-Saharan Africa, suffer most and have the majority of death tolls. The developed countries still suffer substantially from the burden of disease, with 2.5 million incidences yearly leading a third to a half of these to hospitalizations incurring high healthcare-associated costs; see [3, 20, 11]. Very recently, with the discovery of the pneumococcal vaccine, the risk of pneumonia in the United States has been reduced substantially. In contrast, the developed countries are yet to avail such measures, and they have been working strenuously to achieve the Integrated Global Action Plan for the Prevention and Control of Pneumonia and Diarrhoea (GAPPD) under the supervision of the World Health Organization (WHO) and UNICEF.

The clinical signs or symptoms for pneumonia are often nonspecific and vary based on the patient’s physical characteristics. Accurate and timely diagnosis is the key to fight pneumonia. The current state-of-the-art diagnostic approaches include polymerase chain reaction (PCR), serology, culture, complete blood cell (CBC), chest radiography, and ultrasonography based test. Some of these tests require advanced lab facilities and are not entirely immune from producing false negatives. The PCR test is widely used but requires 24-48 hours to get back the diagnostic results and often requires close contact to collect the specimens from the suspected individuals manually, which is risky and has a high chance of transmitting infectious diseases like COVID-19. Recently, chest X-rays have shown a lot of promise and been used in diagnosis or as a confirmatory test for pneumonia [30, 28]. Such medical images need to be interpreted by radiologists to detect appropriately any abnormalities; unfortunately, this is often a manual, time-consuming step, and requires experts with domain-knowledge to diagnose. Therefore, there is a need for intelligent decision support that can empower and augment clinical decision making, which in turn saves time and prevents physicians’ burnout and morbidity. Such a tool, if developed, may classify, flag, or confirm pneumonia-like patients using radiography images or can even stratify symptomatic patients into different risk categories as needed; see [2, 27, 53, 50, 52, 16] and many others.

Deep learning-based artificial intelligence (AI) systems have been used extensively in studying radiology images and detecting patients [27]

. While AI works well in prediction, it requires a substantially large dataset to train the machine learning model, which may not always be feasible. Besides, such an approach does not offer flexibility to either make inference about the parameters of interest or quantify uncertainty in estimation. Very recently, statistical-based methodologies have been applied in neuroimaging; see

[45, 26, 22, 55, 42] and many others. [39, 26, 22] adopted Bayesian spatial modeling approaches to model the correlation between neighboring voxels with carefully placing priors on brain regions in predicting adverse clinical outcomes of Alzheimer’s disease. In a similar spirit, [46, 18, 45]

fitted univariate linear regressions on each region of interest of brain image separately to explore the relationship between images and scalar measures. While most of these methods primarily focus on prediction, the inferential aspects of parameter estimates are not discussed explicitly.

[55, 54] model an image data as functional measurements and apply the regularization penalty to induce zero-effect for the region where there is no association between responses and images. Here the authors used pre-specified Haar wavelet basis functions to approximate coefficient functions for which the selection of the optimum number of basis is based on either information criterion or cross-validation, which is a time-consuming step for multi-dimensional images. Such methodologies are also not directly applicable in the presence of noisy image data. Furthermore, the authors adopted a permutation-based test to make inference about the non-zero effect in a region, and it is unclear how to test the nullity of the overall regression coefficient.

In this paper, we address these limitations and introduce a novel scalar-on-image regression procedure based on functional principal component (FPC) analysis. There are three key novelties of this paper. The first novelty is the use of the functional data analysis approach exploiting the underlying covariance structure of 2-dimensional (2D) X-ray data, which has not previously been used, to the best of our knowledge, in this generality in image classification. The second novelty is the parsimonious modeling of the coefficient function using data-driven basis functions. The third novelty is casting the scalar-on-image regression model into a generalized linear model framework, which allows hypothesis testing about the parameters of interests explaining the association between responses and covariates. The main advantages of this approach are as below - (1) It provides excellent numerical performance in terms of classification accuracy; (2) it is computationally efficient and magnitude faster; (3) unlike the deep learning approach, it does not require a massive set of images to train the model; (4) it offers inference about the effect of radiographic images in classification; (5) it is applicable for images corrupted with subtle noises.

The paper is organized as follows. Section 2 describes the collection and processing of the X-ray. Section 3 details the image classification approach. Section 4 presents the estimation procedure. Section 5 describes the inferential procedure. Section 6 presents the application in classifying pediatric chest X-rays to detect pneumonia. Multiple simulation studies are performed in Section 7, showing the robustness of our method mimicking the original data. Section 8 concludes the paper with a discussion.

2 Data processing

Figure 1 illustrates the posteroanterior X-ray images for (a) four randomly chosen healthy subjects (top four panels), (b) two bacterial pneumonia patients (two-bottom-left), and (c) two viral pneumonia patients (two-bottom-right). (a) The normal chest X-rays have no signs of abnormal opacification such as fluid or solid materials within the airways in the images, (b) Bacterial pneumonia usually displays a focal lobar consolidation where a segment of lung shows confluent cotton-like opacity, (c) Apparently, more diffuse “interstitial” patterns such as scattered white patches in both lungs are evident in the X-rays of viral pneumonia patients.

Figure 1: Illustrations of chest X-ray images for four randomly chosen healthy subjects (top panel) and four pneumonia (bottom panel) patients, where the two bottom-left panels associate with the bacterial pneumonia cases, and the two bottom-right refers to the viral pneumonia cases. Note that all patients are aged between 1-5 years.

The pneumonia dataset [7] consists of 5,863 X-Ray images (in JPEG format) for two categories: pneumonia (bacterial and viral) and normal cases; see Figure 1. This dataset is freely available in Kaggle. These chest X-ray images (anterior-posterior) were collected from pediatric patients aged between one to five years who were treated in the Guangzhou Women and Children’s Medical Center, Guangzhou, China, before 2018. These chest X-ray images were recorded during the patients’ routine clinical visits. For the purpose of analysis, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans, and the diagnoses were labeled by two expert physicians [7]. For the purpose of our analysis, we use the X-ray images saved in the Train folder as there could be an issue with labeling in the Test folder identified by independent researchers; see [9]. There are 5,216 images (1,341 normal and 3,875 pneumonia cases) in the folder, which we use for our analysis, assuming that the current labeling is correct.

Each X-ray image is grayscaled 2-dimensional (2D) array structure of dimension where and represents the width and height of the images. The image data are comprised of intensity profiles with values between 0 and 1 representing black and white, respectively. We reshape all images into the same size, i.e., for our computational convenience.

3 Proposed methodology

Let the observed data be where indexes the subject, is the scalar response associated with the th subject, is the th observed clinical or demographic (e.g, age, prior history of lung diseases, number of days since the start of symptoms or hospitalization), is the realized discrete intensity value of a two-dimensional (2D) projection image at discrete grid points and such that and where both and are closed compact sets. We assume that and are regular, dense design in and respectively. In our application, takes values either 0 or 1 indicating the disease status (i.e., normal or pneumonia) of the th subject and ’s indicate the intensity values of the X-ray images which are the 2D projections of 3D objects. In our modeling, we treat each 2D image object as a functional measurement.

Denote by the exponential family distribution with mean and dispersion parameter As our objective is to classify patients with respect to their radiograph images, we propose to fit the following generalized functional regression model

(1)

where

is a known monotone link function (e.g., logit link for the binary responses),

is an unknown scalar intercept, and is an unknown bivariate function defined on quantifying the association between mean responses and functional covariate. If the chest X-rays of healthy and pneumonia afflicted patients provide non-distinguishable pattern, then For notational convenience, we denote simply by and use them interchangeably. Model 7 is a variation of the classical functional linear models described in [4, 23, 43, 15, 34, 13] and also an extension of the functional varying coefficient models [5, 29, 47, 48].

3.1 Model approximation

Let be a set of orthonormal basis defined in such that if and 1 otherwise. Similarly, define by the orthonormal basis functions spanned in such that if

and 1 otherwise. We use the tensor product of these basis functions to approximate

such that where ’s are the unknown basis coefficients and can be defined uniquely by For our modeling exposition, we let vary within each resulting in We represent the functional covariate using the same basis functions and in two steps: first, we write where is a zero-mean smooth process defined on and independent identically distributed (IID) over Next, for each write where ’s are the weights associated with the corresponding basis functions and IID over and . It follows that

(2)

Here the infinite summation is intractable and needs to be truncated at some finite levels, say and for -direction and -direction, respectively. In a similar spirit to [47], the truncation values and control the degree of smoothness of in both directions. As in the non-parametric regression approach, the choice of truncation values leads to overly smooth or wiggly patterns in and therefore needs to be selected with caution that we discuss in the next section. Using the equation 2, we cast the model into a generalized linear model [35] framework for given ’s as below

(3)

where , , are the unknown regression coefficients. Note that the number of basis functions is kept same across for the purpose of exposition only. We relax this assumption in the next sections assuming

In our model, first, we need to estimate . Subsequently, using the estimated components, we fit the model 3 and estimate ’s so as

3.2 Selection of basis functions

The selection of orthonormal basis functions is critical in approximating the model in the form of 3. One approach is to select a pre-specified orthonormal basis functions similar to [54, 12, 48]. An alternative approach is to use a data-driven basis; see [47, 37] where the main idea is to estimate the marginal covariance function of the observed functional covariates and select the corresponding orthogonal eigenbasis as a basis. In our approach, we use the latter approach as it has the advantage of exploiting the underlying covariance structures of radiographical images and allows us to account for correlation between the neighboring points.

The observed functional measurements are most likely to be contaminated with error or white noise. Unlike

[54], assume , where ’s are the noisy realizations of the true functional covariates, ’s, and is a white noise with mean 0 and covariance, and is independent over , and . Define , where is the sampling density of ’s; is a proper covariance function (positive semidefinite and symmetric function); see [41, 19]. Define the covariance function of the observed functional covariate realized at a point in -direction by The spectral decomposition of results in where

’s are the non-negative eigenvalues and

’s are the orthonormal eigenbasis functions spanned in which are used in approximating The infnite summation is truncated at a finite level such as

which is determined based on predetermined percentage of variance explained (PVE) value where the main idea is to choose the smallest integer

such that is larger than the preset PVE; such approach is adopted in [15, 47, 43, 8]. This follows that and the observed covariate can be written as where the th observed loadings are In particular, where is a nugget effect. In a similar spirit to [47], we model the loadings to exploit the underlying covariance structures in the -direction. Define the covariance function of the th latent smooth process, by Assume A spectral decomposition of leads to , where ’s are the corresponding eigen-components with ’s being the non-negative eigenvalues and ’s being the orthonormal basis functions spanned in Using the truncated Karhunen-Loève (KL) expansion, we approximate by where ’s are the weights for the th functional principal component (FPC) associated with the th direction for the th subject, and is uniquely defined by . The ’s are the covariates used in modeling 3. Such idea of using FPCs with the largest variation are most predictive of responses in an association model is the basis of functional principal component regression (FPCR); [13, 43]. In a similar spirit to the selection of the truncation value is also chosen by prespecified PVE value.

4 Estimation

4.1 Estimation of components related to functional predictor

Estimation is done following the procedures described in [47, 59] where the ideas were developed for a sparse design, we borrow the same techniques for a dense design. We briefly review the idea here. We demean the observed functional predictor to ensure Define the demeaned covariates by Assume is a matrix of stacking up all the image data over subjects. A pooled sample covariance estimator is obtained as a cross-product of the elements of divided by the total number of elements in each column such that the th element is Denote the covariance matrix by Due to the presence of noise in the diagonal terms in are inflated by variance and thus requires smoothing. We apply the bivariate smoothing approach [57] to obtain the smoothed covariance function which is denoted by Applying spectral decomposition on with a pre-set PVE value results in a set of eigen-components, where the terms bear the meaning as described in section 3.2. Next the corresponding loadings are obtained through numerical integration as

Let the data obtained from the estimated loadings for a fixed be . Using the idea described in [59], we estimate the covariance function of the loadings and denote it by ’s. Next the spectral decomposition leads to The corresoponding scores ’s are obtained by casting the model for

’s into a linear mixed model framework assuming that

’s follow Gaussian distribution; see

[59] for details. We replicate this procedure for each components.

4.2 Estimation of response related parameters

Given the estimated scores the approximating model in 3 can be written as Define and let be the norms. As in [47] we set PVE generously large enough to capture all non-negligible association between the functional predictor and scalar responses. It is possible that a direction is only associated with noise and have variance much larger than the signal, such components introduce wiggly pattern of and are likely to overfit the model. To circumvent this problem, we use a penalized (PEN) technique to estimate parameters. In our application, we also use non-penalized (Non-PEN) version.

For Non-PEN, we estimate by maximizing the log-likelihood function of the GLM such as

following the procedures described in [32, 10].

For PEN, we use a group LASSO penalty to regularize estimation of ; we impose the penalty on the effects total magnitude . Here we maximize the following penalized criterion

adopting the ideas described in [60, 58]. Here is a tuning parameter controlling sparsity by shrinking all coefficients associated with a group to zero simultaneously; is selected by K-fold cross-validation (CV) technique.

4.3 Prediction of response

Once ’s are estimated, we estimate the mean of by where is the inverse of the link function

For binary classification, we define the predicted responses by if and otherwise. Here, is the optimum cut-off value based on Youden index and optimized by maximizing the sum of sensitivity and specificity; see [62, 44, 17].

5 Inference

Though our idea is illustrated by using 2D X-ray images, the proposed methodology can be leveraged in other radiography images such as magnetic resonance imaging (MRI), positron emission tomography (PET) images, ultrasounds, and mammography. While there are various kinds of medical images, it is a question of interest to investigate which images are sufficiently discriminative enough to differentiate between cases and non-cases. For instance, if there is no difference between the chest X-rays of pneumonia and healthy patients, then the odds ratio for a given covariate

will be equal to 1. Such a hypothesis of interest can be written formally as

(4)

Equivalently, with an abuse of notation, we write where Using the orthonormal property of the basis functions this can further be written as

This follows that the hypothesis 4 can be reformulated as

(5)

The parsimonious modeling framework 3 allows to make inference about the significance of the association between binary responses and functional measurements through testing the nullity of ’s for all ’s and ’s using the conventional hypothesis testing procedures such as likelihood ratio test (LRT), F-1 test, Wald test, or score test. While all these tests are asymptotically equivalent, we use LRT due to its amenable theoretical properties in testing both constrained and unconstrained parameter space; see [25, 33]. In particular, the test is based on the differences between the likelihoods computed from the maximum likelihood estimates (MLEs) under and

Define the test statistic by

(6)

where refers to the likelihood value with respect to the MLEs under and the MLEs under ; it can be shown that follows asympotically distribution with degrees of freedom where equals the difference in the number of free parameters to be estimated under and such that LRT has been studied extensively and implemented in various applications for decades; we refer to [40, 56, 31] and references there in for details. Note that such hypothesis testing procedure makes more sense for the Non-PEN model. If the PEN model is used, will ideally shrink all coefficents, ’s, to zero when there is no association between responses and covariates given that the optimum value for is chosen appropriately.

6 Pneumonia image classification

We apply the methodology on the pediatric pneumonia dataset described in section 2. The main objectives are to classify the images by (S1) regular and pneumonia and (S2) viral pneumonia and bacterial pneumonia cases, and evaluate the statistical significance of the association between X-ray images and binary classes. In particular, we use the model

(7)

where is a logit link assuming is a binary response such that in (S1), for pneumonia case and 0 otherwise. Similarly in (S2), let be for bacterial pneumonia, and 0 for viral pneumonia. While we use both healthy and pneumonia data in the former classification problem, the latter exploits only the pneumonia data.

6.1 Computational details

We implement the method in R version 3.6.3 [49]. We read and edit the X-ray images using the package EBImage [38]. We use the fpca.face and fpca.sc functions from the refund [14] to estimate the covariates related components. We set the PVE value equal to 0.99 in our application. The Non-PEN is solved using the glm function from the package stats. The PEN model is fitted by gglasso function of the gglasso package [58]. The LRT is calculated using the lrtest function from the lmtest package [61]. The computation time to train and test the Non-PEN model in a sample of 2,600 X-ray images is approximately 40 seconds, and it takes approximately 240 seconds to train-and-validate the sample of 5,216 images in a machine with an Intel Core-i7 processor having 16GB RAM.

6.2 Permormance metrics

We split the data into a training set (in-sample) on which the model is built and a test set (out-of-sample) on which the performance is evaluated; we report the results for both in-sample and out-of-sample dataset. Two sampling mechanisms are considered. (M1) All 5,216 images (1,341 standard and 3,875 pneumonia) are used where 350 pictures from each type are selected randomly and kept aside to constitute the test dataset; the remaining 4,466 X-rays are used to train the model. (M2) A random sample of 1,300 from each type (i.e., healthy and pneumonia) so that a total of 2,600 images form the whole dataset. As in (M1), a total of 700 copies (i.e., 350 healthy and 350 pneumonia) form the test set, and the remaining 1,900 images build the training cohort. There are 1,345 viral and 2,345 bacterial pneumonia X-rays. Notice (M2) ensures balanced cases and uses much smaller sample to train the model than that of (M1); the purpose is to assess the model’s performance with a smaller sample. We consider two ways of forming the test set - (a) replicating the test-train split 200 times and evaluating the performance within each split (b) forming the test-train split for once. In both cases, test images are selected randomly without replacement. Approach (a) will potentially induce more randomness into the train-test split and is less susceptible to sampling bias. Let and be the set of patients referring to the pneumonia (i.e., ) and healthy cases (i.e., ) respectively such that where is the cardinality of a set. Define

  • True positive (TP) =

  • False negative (FN) =

  • True negative (TN) =

  • False positive (FP) =

Here is an indicator function taking values 0 and 1. We consider the following performance metrics

  • True positive rate (TPR) = TP / (TP + FN)

  • True negative rate (TNR) = TN / (TN + FP)

  • Positive predictive value (PPV) = TP / (TP + FP)

  • Accuracy = TP + TN / (TP + TN + FP + FN)

  • Mathhews correlation coefficient (MCC) =

  • F =

  • Area under the curve (AUC) of a receiver operator characteristic (ROC) curve; ROC is generated by plotting the TPR versus false positive rate (FPR) at various thresholds of a binary classifier system.

6.3 Classification performance

Table 1 provides the numerical performance of the model for both in-sample and out-of-sample image data for (S1). The classification results are close in the validation set for both (a) and (b); due to the smaller training sample in (b), unsurprisingly, in-sample performances are marginally better. The high ACC values indicate that the model identifies pneumonia and healthy patients using X-rays correctly. The high F values suggest a good balance between precision (i.e., TPR) and recall (i.e., PPV) referring to the number of instances that the model correctly classifies and instances that the model misses, respectively. Besides, the high AUC values () indicate an excellent balance between TPR and the false positive rate (FPR). Apparently, we observe better classification performances in the Non-PEN method for (M1) than that of (M2). This could be due to the fact that (M1) is comprised of all images implying more variation in the data; therefore, an FPC analysis with a high PVE (i.e., 0.99) is likely to provide some directions associated with noise that could potentially overfit the model. Therefore, by shrinking such directions to zero, we improve the model performance. The IQR values provide evidence of small variation in the reported summary values across 200 validation sets for (a). The other important metrics such as TNR, PPV, and are reported in the Supplementary Material; see Section in 9.

Table 2 provides the numerical performance of the model for (S2). Separating viral and bacterial cases using a scalar-on-image regression is a much more difficult problem than that of (S1). Overall, the model performs satisfactorily in differentiating between viral and bacterial pneumonia patients. As above, we observe better out-of-sample performances in PEN approach.

We are interested in evaluating two scientific questions. (1) Do the radiographic image data differ between healthy and pneumonia patients? (2) Do the image data differ between viral and pneumonia patients? We use 6 to test the hypotheses. As the calculated p-value is 2.2e-16 for testing (1), we may conjecture that the observed patterns in the X-rays are distinctive between healthy individuals and pneumonia patients. Similarly, since the p-value for testing (2) is 2.2e-16, we may conclude that there is a statistically significant difference in the image data between viral and pneumonia patients. Further results related to testing the equality of mean scores between healthy and pneumonia patients are provided in Section of the Supplementary Material 9.

Non-PEN PEN Settings ACC TPR AUC ACC TPR AUC M1 + (a) + In 0.950 [0.006] 0.952 [0.010] 0.986 [0.001] 0.963 [0.006] 0.962 [0.010] 0.994 [0.001] M1 + (a) + Out 0.931 [0.010] 0.931 [0.023] 0.979 [0.005] 0.949 [0.010] 0.957 [0.017] 0.988 [0.004] M1 + (b) + In 0.957 0.962 0.987 0.972 0.974 0.995 M1 + (b) + Out 0.926 0.940 0.976 0.943 0.972 0.983 M2 + (a) + In 0.957 [0.004] 0.952 [0.014] 0.992 [0.001] 0.956 [0.004] 0.954 [0.013] 0.991 [0.001] M2 + (a) + Out 0.940 [0.010] 0.934 [0.022] 0.986 [0.004] 0.941 [0.010] 0.940 [0.020] 0.987 [0.004] M2 + (b) + In 0.956 0.945 0.991 0.957 0.955 0.991 M1 + (b) + Out 0.943 0.943 0.988 0.940 0.946 0.987

Table 1: Comparison of chest X-rays between pneumonia and normal (S1). Classification metrics are presented for in-sample and out-of-sample data with respect to training-testing schemes (a) and (b) based on (M1) and (M2). Median and corresponding IQR (in parenthesis) across 200 iterations are reported for scheme (a).

Non-PEN PEN Settings ACC TPR AUC ACC TPR AUC M1 + (a) + In 0.728 [0.021] 0.717 [0.056] 0.807 [0.006] 0.731 [0.014] 0.739 [0.036] 0.795 [0.007] M1 + (a) + Out 0.690 [0.023] 0.674 [0.023] 0.759 [0.022] 0.701 [0.021] 0.729 [0.054] 0.768 [0.024] M1 + (b) + In 0.736 0.731 0.810 0.723 0.707 0.800 M1 + (b) + Out 0.689 0.700 0.753 0.689 0.691 0.756 M2 + (a) + In 0.754 [0.009] 0.743 [0.047] 0.827 [0.007] 0.721 [0.009] 0.748 [0.059] 0.791 [0.010] M2 + (a) + Out 0.686 [0.022] 0.677 [0.059] 0.746 [0.021] 0.693 [0.020] 0.721 [0.078] 0.760 [0.020] M2 + (b) + In 0.761 0.738 0.828 0.728 0.777 0.792 M2 + (b) + In 0.693 0.675 0.740 0.696 0.726 0.763

Table 2: Comparison of chest X-rays between viral and bacterial pneumonia (S2). Classification metrics are presented for in-sample and out-of-sample data with respect to training-testing schemes (a) and (b) based on (M1) and (M2). Median and corresponding IQR (in parenthesis) across 200 iterations are reported for scheme (a).

7 Numerical experiment

7.1 Classification performance on simulated data

In this section, we consider a simulation study generating X-ray images for healthy and pneumonia patients. The objective is to assess the model performance when the images are alterted by noises. Let the observed data be where is the binary response associated with the th subject for who the simulated image, is rendered. Let We generate gray-scaled pixel images for each patient such that and Denote by the model that generates noisy realizations of X-ray images; where is a white noise which follows IID Here ’s are the original gray-scaled images selected randomly from 5,216 original images; i.e. or

. Note that the standard deviation

controls the departure in pixel from the true X-ray images; as departs from zero, the image quality drops gradually and gets perturbed on both and directions. In our simulation study, we consider see Figure 2. Note that as the magnitude of noise increases, it gets difficult to interpret and becomes indistinguishable between healthy and pneumonia cases. For each setting, we run simulations for 500 times, and within each run, we simulate the X-ray images for 300 healthy and 300 pneumonia patients. We label the binary responses such that if a patient belongs to the healthy cohort and 1, otherwise. To assess the classification performance of the proposed method, we divide each simulated dataset into training (75%) and test set (25%) and calculate the classification metrics for both in-sample and out-of-sample data. We set PVE equal to 0.90 for this experiment.

Figure 2: Example of simulated chest X-ray images for a healthy subject (top) and a patient with pneumonia (bottom) with from left-to-right in an increasing order.

Table 3 displays the binary classification results for the simulated data with different magnitude of noises. The results are close for and , which provides evidence of the efficacy of the proposed methodology in the presence of noise. Overall, both PEN and Non-PEN perform competitively. While the model still classifies patients satisfactorily for the performance drops to an extent. This phenomenon is not too unexpected as it becomes hard for the model to extract the signal out of the data from such noisy images. In reality, such low-quality images are not even recommended for clinical use anyway. Additional results are provided in Section of the Supplementary Material 9.

Figure 3 illustrates the area under the curve (AUC) of a receiver operator characteristic for different noise levels for the Non-PEN (left) and PEN (right) model; the differences are very marginal and visually indistinguishable. The model performance with respect to AUC remains somewhat similar for and As expected, with , the performance of the classification model decays gradually.

Non-PEN PEN Validation ACC TPR TNR ACC TPR TNR In 0.933 [0.011] 0.920 [0.036] 0.947 [0.027] 0.931 [0.011] 0.916 [0.036] 0.947 [0.031] Out 0.900 [0.033] 0.893 [0.053] 0.920 [0.053] 0.907 [0.033] 0.893 [0.067] 0.920 [0.053] In 0.931 [0.011] 0.920 [0.031] 0.942 [0.027] 0.929 [0.011] 0.916 [0.036] 0.942 [0.031] Out 0.900 [0.027] 0.893 [0.067] 0.920 [0.053] 0.900 [0.027] 0.887 [0.067] 0.920 [0.053] In 0.899 [0.013] 0.890 [0.031] 0.911 [0.040] 0.898 [0.013] 0.889 [0.027] 0.907 [0.036] Out 0.873 [0.033] 0.853 [0.067] 0.893 [0.067] 0.873 [0.033] 0.867 [0.067] 0.893 [0.067] In 0.807 [0.018] 0.804 [0.062] 0.813 [0.063] 0.809 [0.020] 0.809 [0.058] 0.809 [0.062] Out 0.787 [0.040] 0.787 [0.080] 0.787 [0.080] 0.787 [0.040] 0.787 [0.080] 0.787 [0.080] In 0.780 [0.018] 0.773 [0.072] 0.796 [0.071] 0.780 [0.018] 0.773 [0.076] 0.796 [0.080] Out 0.753 [0.042] 0.747 [0.107] 0.773 [0.093] 0.753 [0.047] 0.747 [0.097] 0.773 [0.093]

Table 3: Comparison of chest X-rays between simulated pneumonia and normal (S1) for balanced cases (b). Classification metrics (ACC, TPR, TNR) for in-sample and out-of-sample data at are summarized for Non-PEN and PEN estimation approach. Median values and the corresponding IQR (in parenthesis) are reported. Results are based on 500 simulations.

Figure 3: Boxplots of AUC values estimated by Non-PEN (left) and PEN (right) model for different noise levels Median values, 25th, and 75th percentile values are displayed along with upper and lower whiskers. The results are based on 500 simulations.

7.2 Numerial evaluation of inferential property

As before, we simulate the images by where or , and follows Gaussian distribution with mean 0 and standard deviation and is an IID. Let All ’s are selected randomly from the pool of original 5,216 images. Let control the number of samples taken from the pneumonia cohort. We consider two scenarios. (N1) Simulate 600 ’s from the original 1,341 files associated with the normal X-rays by sampling without replacement. We assign the first 300 labels as noncases (i.e., ) and the rest 300 as cases (i.e., ) ignoring their true class assignments as (N2) Simulate ’s in such a way so that data are comprised of both normal and penumonia patients. The first 300 X-rays are generated from the healthy cohort (i.e., ) and we label the corresponding responses as For the second half, % of the remaining 300 images are selected from the healthy cohort (i.e., ) and % are from the pneumonia cohort (i.e., ). For these 300 cases, we label the corresponding responses as Note that (N1) and (N2) evaluate the rejection rates of under and as in 4, respectively.

The top section of table 4

reports the rejection probabilities and corresponding standard errors for testing the null effect under the case when indeed

is true. The results are based on 8,000 simulations. The empirical type-I error rates are around the nominal levels for

However, we notice slightly lower rejection rates for . The bottom section illustrates the power properties for testing the null effect when data are generated under . The results associated with the power are based on 500 simulations. As departs from 0, more pneumonia patients are added in the sample. As expected, the empirical rejection rates for the model get closer to 1 for higher at different

Size of 1e-8 0.006 [0.001] 0.048 [0.004] 0.095 [0.005] 0.145 [0.006] 0.193 [0.007] 0.308 [0.008] 0.001 [0.000] 0.041 [0.002] 0.094 [0.003] 0.141 [0.004] 0.187 [0.004] 0.281 [0.005] 0.014 [0.001] 0.056 [0.003] 0.099 [0.003] 0.152 [0.004] 0.206 [0.005] 0.293 [0.005] Power of at 1e-8 0.026 [0.007] 0.476 [0.022] 0.992 [0.004] 1.000 [0.000] 1.000 [0.000] 1.000 [0.000] 0.01 0.006 [0.003] 0.488 [0.022] 0.966 [0.008] 1.000 [0.000] 1.000 [0.000] 1.000 [0.000] 0.10 0.008 [0.004] 0.414 [0.022] 0.980 [0.006] 1.000 [0.000] 1.000 [0.000] 1.000 [0.000]

Table 4: Testing for the null effect of functional coefficient in the simulated data for scenario (S1) under balanced cases (b). Empirical type-I error rates (top section) are reported for different significance level under . Empirical rejection rates of the test (bottom section) are reported for under . Standard errors are reported in parenthesis.

8 Discussion

In this paper, we propose a novel scalar-on-image regression by borrowing the ideas from functional data analysis where we view images as functional measurements. We approximate both functional covariate and coefficient using the same orthonormal data-driven basis functions and cast the model into a GLM framework. The methodology relies on the assumption that the leading eigenbasis functions of the functional predictor are most predictive of generalized response and that the latent predictor signals are relatively smooth.

The proposed idea is applicable to other response types that follow multinomial, Poisson, Gaussian, or any other distribution belonging to the Exponential family; in this case, we need to select the corresponding link function appropriately (e.g., log link for Poisson). The methodology can be extended to the case with multiple functional predictors observed with or without noise and defined on diverse sampling designs. Numerical results using chest X-rays show excellent classification performances in detecting (a) pneumonia and healthy patients, (b) viral and bacterial pneumonia cases. The method performs competitively even when we have a smaller sample to train the model. Two numerical experiments based on the data application show the robustness of the methodology and exhibit efficacy in attaining the inferential properties in terms of size and power. Despite the increased flexibility, the method is computationally efficient (computation time is in seconds) and can be implemented with the existing freely available software.

This work also opens a few avenues for future exploration. One possibility is to extend this methodology to explore the association between scalar responses and images that are observed longitudinally in a sparse design. Such an idea is vital to study the prognosis of diseases. In this case, we need to borrow the concept of longitudinal data analysis and merge it into the proposed scalar-on-image regression.

While the methodology is applied to the pediatric pneumonia cases, the approach can also be implemented in detecting pneumonia-like other infectious diseases. One possible test case could be the detection of COVID-19 patients in the recent pandemic. Based on the Johns Hopkins database [24], as of the 6th of May, 2020, approximately 3,755,341 people are infected by 2019 Novel coronavirus known as COVID-19 and more than 263,831 individuals died across the globe. Figure 4 illustrates the chest X-ray image for three COVID patients extracted from the freely available Kaggle dataset [6]; here the scattered white patches in both lungs are noticeable.

Figure 4: Example of posteroanterior chest X-ray images for three adult COVID-19 patients.

COVID-19 is very infectious, and once exposed, a person can transmit the infection being in an asymptomatic state, which is a state without showing medically known clinical signs or symptoms. Also, the detection of COVID-19 patients becomes a significant challenge due to the limited detection kits and short of medical supplies. The current testing approach is based on PCR, which is a time-consuming procedure. Besides, the community hospitals in remote areas are falling short in medical supplies and staff. Therefore, an alternative testing procedure based on radiography examination may play a role in identifying COVID patients as a supplement to the existing process. In a recent study, the chest radiography images of COVID-19 patients indicate abnormalities in their X-rays, suggesting the effectiveness of a radiography examination as an alternative infection detection approach; see [21, 36, 1]. We want to address that COVID detection is more complicated due to the presence of pre-existing conditions and uncertainty about disease-biology as we are still in an early stage of this outbreak, and thus requires substantial validation and very high-quality data.

Acknowledgments

The content is solely the responsibility of the author and does not represent the official views of the UnitedHealth Group R&D or its any associated subsidiaries.

References

  • [1] T. Ai, Z. Yang, H. Hou, C. Zhan, C. Chen, W. Lv, Q. Tao, Z. Sun, and L. Xia (2020) Correlation of chest ct and rt-pcr testing in coronavirus disease 2019 (covid-19) in china: a report of 1014 cases. Radiology, pp. 200642. Cited by: §8.
  • [2] Y. Bar, I. Diamant, L. Wolf, and H. Greenspan (2015) Deep learning with non-medical training used for chest pathology identification. In Medical Imaging 2015: Computer-Aided Diagnosis, Vol. 9414, pp. 94140V. Cited by: §1.
  • [3] N. Bennett, R. Steele, J. Domachowke, J. Ramero, and L. Barton (2018) Pediatric pneumonia. Medscape. External Links: Link Cited by: §1.
  • [4] H. Cardot, F. Ferraty, and P. Sarda (1999) Functional linear model. Statistics & Probability Letters 45, pp. 11–22. Cited by: §3.
  • [5] H. Cardot and P. Sarda (2008) Varying-coefficient functional linear regression models. Communications in Statistics—Theory and Methods 37, pp. 3186–3203. Cited by: §3.
  • [6] J. P. Cohen, P. Morrison, and L. Dao (2020) COVID-19 image data collection. arXiv 2003.11597. External Links: Link Cited by: §8.
  • [7] M. G. Daniel Kermany (2018) Labeled optical coherence tomography (oct) and chest x-ray images for classification. Note: Mendeley Data, v2 External Links: Link Cited by: §2.
  • [8] C. Di, C. M. Crainiceanu, B. S. Caffo, and N. M. Punjabi (2009)

    Multilevel functional principal component analysis

    .
    The annals of applied statistics 3 (1), pp. 458. Cited by: §3.2.
  • [9] Discussion (2018) Labeled optical coherence tomography (oct) and chest x-ray images for classification. External Links: Link Cited by: §2.
  • [10] A. J. Dobson and A. G. Barnett (2018) An introduction to generalized linear models. CRC press. Cited by: §4.2.
  • [11] C. Ebeledike and T. Ahmad (2019) Pediatric pneumonia. In StatPearls [Internet], Cited by: §1.
  • [12] Y. Fan, G. M. James, P. Radchenko, et al. (2015) Functional additive regression. The Annals of Statistics 43 (5), pp. 2296–2325. Cited by: §3.2.
  • [13] J. Gertheiss, J. Goldsmith, C. M. Crainiceanu, and S. Greven (2013) Longitudinal scalar-on-functions regression with application to tractography data. Biostatistics 14, pp. 447–461. Cited by: §3.2, §3.
  • [14] J. Goldsmith, F. Scheipl, L. Huang, J. Wrobel, J. Gellar, J. Harezlak, M. McLean, B. Swihart, L. Xiao, C. Crainiceanu, et al. (2018) Refund: regression with functional data. R package version 0.1-17. External Links: Link Cited by: §6.1.
  • [15] J. Goldsmith, J. Bobb, C. M. Crainiceanu, B. Caffo, and D. Reich (2011) Penalized functional regression. Journal of computational and graphical statistics 20 (4), pp. 830–851. Cited by: §3.2, §3.
  • [16] O. Gozes, M. Frid-Adar, N. Sagie, H. Zhang, W. Ji, and H. Greenspan (2020) Coronavirus detection and analysis on chest ct with deep learning. arXiv preprint arXiv:2004.02640. Cited by: §1.
  • [17] M. Greiner, D. Pfeiffer, and R. Smith (2000) Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Preventive veterinary medicine 45 (1-2), pp. 23–41. Cited by: §4.3.
  • [18] T. Grimmer, G. Henriksen, H. Wester, H. Förstl, W. E. Klunk, C. A. Mathis, A. Kurz, and A. Drzezga (2009) Clinical severity of alzheimer’s disease is associated with pib uptake in pet. Neurobiology of aging 30 (12), pp. 1902–1909. Cited by: §1.
  • [19] L. Horváth and P. Kokoszka (2012) Inference for functional data with applications. Vol. 200, Springer Science & Business Media. Cited by: §3.2.
  • [20] S. R. Howie and D. R. Murdoch (2019) Global childhood pneumonia: the good news, the bad news, and the way ahead. The Lancet Global Health 7 (1), pp. e4–e5. Cited by: §1.
  • [21] C. Huang, Y. Wang, X. Li, L. Ren, J. Zhao, Y. Hu, L. Zhang, G. Fan, J. Xu, X. Gu, et al. (2020) Clinical features of patients infected with 2019 novel coronavirus in wuhan, china. The Lancet 395 (10223), pp. 497–506. Cited by: §8.
  • [22] L. Huang, J. Goldsmith, P. T. Reiss, D. S. Reich, and C. M. Crainiceanu (2013) Bayesian scalar-on-image regression with application to association between intracranial dti and cognitive outcomes. NeuroImage 83, pp. 210–223. Cited by: §1.
  • [23] G. M. James and B. W. Silverman (2005) Functional adaptive model estimation. Journal of the American Statistical Association 100, pp. 565–576. Cited by: §3.
  • [24] JHU (2019) 2019 novel coronavirus covid-19 (2019-ncov) data repository by johns hopkins csse. External Links: Link Cited by: §8.
  • [25] J. Johnston and J. DiNardo (1963) Econometric methods. Vol. 26, New York. Cited by: §5.
  • [26] J. Kang, T. D. Johnson, T. E. Nichols, and T. D. Wager (2011) Meta analysis of functional neuroimaging data via bayesian spatial point processes. Journal of the American Statistical Association 106 (493), pp. 124–134. Cited by: §1.
  • [27] D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, et al. (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131. Cited by: §1, §1.
  • [28] Y. Kim and L. F. Donnelly (2007) Round pneumonia: imaging findings in a large series of children. Pediatric radiology 37 (12), pp. 1235–1240. Cited by: §1.
  • [29] M. G. Kundu, J. Harezlak, and T. W. Randolph (2016) Longitudinal functional models with structured penalties. Statistical Modelling 16, pp. 114–139. Cited by: §3.
  • [30] R. I. Markowitz and E. Ruchelli (1998) Pneumonia in infants and children: radiological-pathological correlation. In Seminars in roentgenology, Vol. 33, pp. 151–162. Cited by: §1.
  • [31] P. McCullagh (2018) Generalized linear models. Routledge. Cited by: §5.
  • [32] P. McCullough and J. Nelder (1989) Generalized linear models chapman and hall. New York. Cited by: §4.2.
  • [33] G. Molenberghs and G. Verbeke (2007) Likelihood ratio, score, and wald tests in a constrained parameter space. The American Statistician 61 (1), pp. 22–27. Cited by: §5.
  • [34] H. Müller and F. Yao (2012) Functional additive models. Journal of the American Statistical Association 103, pp. 1534–1544. Cited by: §3.
  • [35] J. A. Nelder and R. W. Wedderburn (1972) Generalized linear models. Journal of the Royal Statistical Society: Series A (General) 135 (3), pp. 370–384. Cited by: §3.1.
  • [36] M. Ng, E. Y. Lee, J. Yang, F. Yang, X. Li, H. Wang, M. M. Lui, C. S. Lo, B. Leung, P. Khong, et al. (2020) Imaging profile of the covid-19 infection: radiologic findings and literature review. Radiology: Cardiothoracic Imaging 2 (1), pp. e200034. Cited by: §8.
  • [37] S. Y. Park and A. Staicu (2015) Longitudinal functional data analysis. Stat 4 (1), pp. 212–226. Cited by: §3.2.
  • [38] G. Pau, F. Fuchs, O. Sklyar, M. Boutros, and W. Huber (2010) EBImage—an r package for image processing with applications to cellular phenotypes. Bioinformatics 26 (7), pp. 979–981. Cited by: §6.1.
  • [39] W. D. Penny, N. J. Trujillo-Barreto, and K. J. Friston (2005) Bayesian fmri time series analysis with spatial priors. NeuroImage 24 (2), pp. 350–362. Cited by: §1.
  • [40] J. L. Peugh (2010) A practical guide to multilevel modeling. Journal of school psychology 48 (1), pp. 85–112. Cited by: §5.
  • [41] J. O. Ramsay and B. W. Silverman (2007) Applied functional data analysis: methods and case studies. Springer. Cited by: §3.2.
  • [42] P. T. Reiss, L. Huo, Y. Zhao, C. Kelly, and R. T. Ogden (2015) Wavelet-domain regression and predictive inference in psychiatric neuroimaging. The annals of applied statistics 9 (2), pp. 1076. Cited by: §1.
  • [43] P. T. Reiss and R. T. Ogden (2007) Functional principal component regression and functional partial least squares. Journal of the American Statistical Association 102 (479), pp. 984–996. Cited by: §3.2, §3.
  • [44] D. E. Shapiro (1999) The interpretation of diagnostic tests. Statistical methods in medical research 8 (2), pp. 113–134. Cited by: §4.3.
  • [45] J. Shin, S. Lee, S. J. Kim, S. Kim, S. Cho, and Y. Kim (2010) Voxel-based analysis of alzheimer’s disease pet imaging using a triplet of radiotracers: pib, fddnp, and fdg. Neuroimage 52 (2), pp. 488–496. Cited by: §1.
  • [46] S. M. Smith, M. Jenkinson, H. Johansen-Berg, D. Rueckert, T. E. Nichols, C. E. Mackay, K. E. Watkins, O. Ciccarelli, M. Z. Cader, P. M. Matthews, et al. (2006) Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. Neuroimage 31 (4), pp. 1487–1505. Cited by: §1.
  • [47] A. Staicu, M. N. Islam, R. Dumitru, and E. v. Heugten (2020) Longitudinal dynamic functional regression. Journal of the Royal Statistical Society: Series C (Applied Statistics) 69 (1), pp. 25–46. Cited by: §3.1, §3.2, §3.2, §3, §4.1, §4.2.
  • [48] J. Stallrich, M. N. Islam, A.M. Staicu, D.L. Crouch, L. Pan, and H. H. Huang (2020) Optimal emg placement for a robotic prosthesis controller with sequential, adaptive functional estimation (safe). Annals of Applied Statistics. Cited by: §3.2, §3.
  • [49] R. C. Team et al. (2013) R: a language and environment for statistical computing. Cited by: §6.1.
  • [50] D. Terzopoulos et al. (2019) Semi-supervised multi-task learning with chest x-ray images. In International Workshop on Machine Learning in Medical Imaging, pp. 151–159. Cited by: §1.
  • [51] UNICEF, WHO, WBG, and UN (2019) Estimates developed by the un inter-agency group for child mortality estimation. UNICEF. External Links: Link Cited by: §1.
  • [52] L. Wang and A. Wong (2020)

    COVID-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images

    .
    arXiv preprint arXiv:2003.09871. Cited by: §1.
  • [53] X. Wang, Y. Peng, L. Lu, Z. Lu, and R. M. Summers (2018) Tienet: text-image embedding network for common thorax disease classification and reporting in chest x-rays. In

    Proceedings of the IEEE conference on computer vision and pattern recognition

    ,
    pp. 9049–9058. Cited by: §1.
  • [54] X. Wang, B. Nan, J. Zhu, R. Koeppe, and K. Frey (2017) Classification of adni pet images via regularized 3d functional data analysis. Biostatistics & epidemiology 1 (1), pp. 3–19. Cited by: §1, §3.2, §3.2.
  • [55] X. Wang, B. Nan, J. Zhu, and R. Koeppe (2014) Regularized 3d functional regression for brain image data via haar wavelets. The annals of applied statistics 8 (2), pp. 1045. Cited by: §1.
  • [56] S. N. Wood (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (1), pp. 3–36. Cited by: §5.
  • [57] L. Xiao, Y. Li, and D. Ruppert (2013) Fast bivariate p-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, pp. 577–599. Cited by: §4.1.
  • [58] Y. Yang and H. Zou (2015) A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing 25 (6), pp. 1129–1141. Cited by: §4.2, §6.1.
  • [59] F. Yao, H. Müller, and J. Wang (2005) Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100 (470), pp. 577–590. Cited by: §4.1, §4.1.
  • [60] M. Yuan and Y. Lin (2006) Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1), pp. 49–67. Cited by: §4.2.
  • [61] A. Zeileis and T. Hothorn (2002) Diagnostic checking in regression relationships. R News 2 (3), pp. 7–10. External Links: Link Cited by: §6.1.
  • [62] M. H. Zweig and G. Campbell (1993) Receiver-operating characteristic (roc) plots: a fundamental evaluation tool in clinical medicine. Clinical chemistry 39 (4), pp. 561–577. Cited by: §4.3.

9 Supplementary Materials

This Supplementary Material consists of two sections. Section 9.1 provides additional results from the analysis of the image classification for healthy and pneumonia patients. Further results from the numerical experiment are illustrated in Section 9.2.

9.1 Additional results for pnumonia detection analysis

9.1.1 Exploratory analysis

Figure 5 illustrates gray-scaled image data for three randomly chosen patients - healthy (left), viral (middle), and bacterial (right) pneumonia cases, respectively. The observed functional covariate, associated with the th subject, is comprised of intensity profiles of the corresponding X-ray image containing values between 0 (black) and 1 (white). The corresponding smoothed profiles are displayed in dashed black lines.

Figure 5: Image data for three randomly chosen patients - healthy (left), viral (middle), and bacterial (right) pneumonia, respectively. We highlight two intensity profiles (in red and blue color) recorded at two specific locations in the -direction. The dashed black lines correspond to the smoothed profiles.

9.1.2 Data analysis

We illustrate the idea for (S1). We apply functional principal component (FPC) analysis on the marginal covariance function induced by the functional predictor comprised of intensity profiles associated with 5,216 X-ray images. Setting a percentage of variance explained (PVE) value equal to 99% provides directions at which we have maximum variation in the data. We estimate functional principal components and corresponding eigenvalues where . Figure 6 displays the estimated directions with the explained percentage of variance; first, 6 FPCs explain almost 96% variation in the data. Similarly, spectral decomposition on for each results in ’s and ’s. Figure 7 displays the estimated directions associated with the profile of loadings ; first, 6 FPCs explain approximately 96% variation in the data. Similarly, we do it for each In our example,

Figure 8 displays the boxplots of the estimated scores (first 60 components) corresponding to both normal and pneumonia patients’ intensity profiles. Figure 9 displays the log p-values for testing the equality of mean scores between healthy and pneumonia patients at 5% level of significance. In particular, we test for each and where and refer to the mean values associated with the

th component for normal and pneumonia patients, respectively. A two-tailed t-test statistic is used to make inference. For testing,

for each we adjust for multiple comparisons by Bonferroni correction.

Figure 6: Top six estimated FPCs Corresponding ’s are reported in percentage.
Figure 7: Top six estimated FPCs associated with where Corresponding ’s are reported in percentage.

Figure 8: Box plot of estimated scores ’s for healthy and pneumonia patients. First, 60 loadings are displayed.

Figure 9: Testing the equality of mean scores between healthy and pneumonia patients. Reported are the p-values in log scale; color codes are used to represent different . Reference line (black) is drawn representing the nominal type-I error rate after adjusting for multiple comparisons assuming .

We report additional classification metrics associated with a confusion matrix. The metrics are defined as in Section 6. For completeness, we recall the necessary items here.

  • True negative rate (TNR) = TN / (TN + FP)

  • Positive predictive value (PPV) = TP / (TP + FP)

  • Mathhews correlation coefficient (MCC) =

  • F =

Table 5 reports results for (S1). We observe high TNRs implying low false-positive rates (FPRs), and high PPVs indicating low false discovery rates (FDR). Such observations are dominant across all settings. As before, PEN outperforms Non-PEN in the setting (M1), and the magnitude of improvement is higher in evaluating out-of-sample cases than that of in-sample ones. These findings are in alignment with what we observe in Table 1 of the draft.

Table 6 provides additional results for (S1). The results between PEN and Non-PEN are competitive.

Non-PEN PEN Type TNR PPV F TNR PPV F M1 + (a) + In 0.940 [0.011] 0.983 [0.003] 0.967 [0.004] 0.967 [0.010] 0.990 0.976 [0.004] M1 + (a) + Out 0.931 [0.017] 0.931 [0.016] 0.931 [0.011] 0.943 [0.023] 0.944 [0.020] 0.949 [0.010] M1 + (b) + In 0.936 0.982 0.972 0.968 0.991 0.982 M1 + (b) + Out 0.911 0.914 0.927 0.914 0.919 0.944 M2 + (a) + In 0.962 [0.015] 0.962 [0.013] 0.957 [0.004] 0.959 [0.012] 0.959 [0.011] 0.956 [0.004] M2 + (a) + Out 0.946 [0.017] 0.946 [0.017] 0.939 [0.012] 0.946 [0.020] 0.945 [0.018] 0.941 [0.011] M2 + (b) + In 0.966 0.966 0.955 0.959 0.959 0.957 M1 + (b) + Out 0.943 0.943 0.943 0.934 0.935 0.940

Table 5: Comparison of chest X-rays between pneumonia and normal (S1). Classification metrics are presented for in-sample and out-of-sample data with respect to training-testing schemes (a) and (b) based on (M1) and (M2). Median values and the corresponding IQR (in parenthesis) over 200 iterations are reported for scheme (a).

Non-PEN PEN Settings TNR PPV F TNR PPV F M1 + (a) + In 0.756 [0.052] 0.866 [0.016] 0.783 [0.027] 0.715 [0.044] 0.850 [0.013] 0.791 [0.016] M1 + (a) + Out 0.711 [0.066] 0.699 [0.036] 0.685 [0.028] 0.683 [0.051] 0.693 [0.027] 0.710 [0.025] M1 + (b) + In 0.749 0.864 0.793 0.756 0.864 0.778 M1 + (b) + Out 0.677 0.684 0.692 0.686 0.688 0.690 M2 + (a) + In 0.763 [0.046] 0.760 [0.023] 0.752 [0.014] 0.693 [0.064] 0.710 [0.027] 0.729 [0.015] M2 + (a) + Out 0.695 [0.059] 0.689 [0.030] 0.685 [0.029] 0.663 [0.066] 0.682 [0.028] 0.705 [0.030] M2 + (b) + In 0.783 0.773 0.755 0.680 0.708 0.741 M2 + (b) + In 0.711 0.700 0.687 0.666 0.685 0.705

Table 6: Comparison of chest X-rays between viral and bacterial pneumonia (S2). Classification metrics are presented for in-sample and out-of-sample data with respect to training-testing schemes (a) and (b) based on (M1) and (M2). Median values and the corresponding IQR (in parenthesis) over 200 iterations are reported for scheme (a).

9.2 Additional results for numerical experiment

Table 7 displays additional binary classification results for the simulated data with different magnitude of noises. We notice similar phenomena as in Table 3 of the draft; the classification results for PEN and Non-PEN are very close. We observe high MCC values for smaller noise levels for ; where MCC is viewed as a correlation coefficient between observed and predicted binary responses.

Non-PEN PEN Validation PPV MCC F PPV MCC F In 0.945 [0.025] 0.867 [0.023] 0.931 [0.013] 0.944 [0.027] 0.862 [0.024] 0.929 [0.014] Out 0.916 [0.047] 0.804 [0.065] 0.901 [0.032] 0.919 [0.045] 0.813 [0.065] 0.901 [0.031] In 0.941 [0.025] 0.862 [0.022] 0.929 [0.012] 0.943 [0.027] 0.858 [0.020] 0.927 [0.012] Out 0.915 [0.053] 0.803 [0.055] 0.899 [0.033] 0.918 [0.050] 0.802 [0.058] 0.898 [0.032] In 0.908 [0.034] 0.797 [0.025] 0.897 [0.012] 0.906 [0.029] 0.796 [0.026] 0.897 [0.013] Out 0.886 [0.055] 0.747 [0.065] 0.870 [0.034] 0.884 [0.051] 0.747 [0.066] 0.871 [0.034] In 0.813 [0.043] 0.617 [0.036] 0.806 [0.020] 0.810 [0.043] 0.620 [0.037] 0.810 [0.021] Out 0.789 [0.060] 0.573 [0.080] 0.783 [0.043] 0.787 [0.056] 0.574 [0.082] 0.784 [0.042] In 0.788 [0.046] 0.562 [0.035] 0.778 [0.025] 0.789 [0.050] 0.562 [0.036] 0.778 [0.024] Out 0.761 [0.063] 0.507 [0.081] 0.748 [0.048] 0.763 [0.062] 0.508 [0.087] 0.747 [0.050]

Table 7: Comparison of chest X-rays between simulated pneumonia and normal (S1) for balanced cases (b). Classification metrics (PPV, MCC, F) for in-sample and out-of-sample data at are summarized for Non-PEN and PEN estimation approach. Median values and the corresponding IQR (in parenthesis) are reported. Results are based on 500 simulations.