I Introduction
Medical imaging systems are commonly assessed, validated, and optimized using task-specific measures of image quality that quantify the ability of an observer to perform a specific task [1, 2, 3, 4, 5]. When optimizing imaging systems for signal detection tasks (e.g., detection of a tumor), it has been advocated to use the performance of the Bayesian Ideal Observer (IO) as a figure-of-merit (FOM). In this way, the imaging system can be optimized such that the amount of task-specific information in the measurement data is maximized. The IO for a binary signal detection task implements a test statistic given by the likelihood ratio and maximizes the area under the receiver operating characteristic (ROC) curve [6]. The IO has also been employed to assess the efficiency of human observers on signal detection tasks [7].
The IO test statistic is generally a nonlinear function of the image data and, except in some special cases, cannot be determined analytically. Because of this, sampling-based methods that employ Markov chain Monte Carlo (MCMC) techniques have been developed to approximate the IO test statistic for medical imaging applications [2, 8]. However, current applications of these methods have been limited to relatively simple object models that include parameterized torso phantoms [9], lumpy background models [2], and a binary texture model [8]. To the best of our knowledge, applications of MCMC methods to approximate the IO test statistic for more sophisticated object models—such as the clustered lumpy background (CLB) model that has been used to synthesize mammographic images—have not been reported to date.
When the IO is intractable, the Hotelling Observer (HO) can be employed to optimize imaging systems for signal detection tasks [10, 11, 12, 13]. The HO employs the Hotelling discriminant, which is the population equivalent of the Fisher linear discriminant [1], and is optimal among all linear observers in the sense that it maximizes the signal-to-noise ratio of the test statistic [1, 14, 15]. However, implementation of the HO is also not without challenges. Specifically, it requires the estimation and inversion of a covariance matrix that can be enormous [16]. Different strategies for circumventing this difficulty exist [10]. For use in detection tasks where background variability is considered and the measurement noise covariance matrix is known, methods for the estimation and inversion of these large covariance matrices by use of a covariance matrix decomposition are available [1]. It has been demonstrated, however, that in certain situations the use of the covariance decomposition can result in a significant bias in the HO performance [17]. Alternatively, to avoid an explicit inversion of the covariance matrix, an iterative algorithm can be employed to estimate the Hotelling test statistic [1]. Finally, a variety of channelized HOs that utilize efficient channels have been proposed for approximating the HO in a computationally tractable way [18, 19, 4].
Supervised learning-based approaches hold significant promise for the design and implementation of model observers for optimizing imaging systems [20, 21, 22, 23]. Recent efforts have primarily focused on training anthropomorphic model observers using deep learning [22, 24, 25]. The extent to which deep learning-based methods can benefit such applications remains a topic of investigation due to the difficulty of acquiring large amounts of labeled data in medical imaging applications. When optimizing imaging systems and data-acquisition designs, computer-simulated data can sometimes be employed [2]. In such applications, large amounts of labeled data can be generated and it can be feasible to train complicated inference models to be employed as model observers for assessing task-based measures of image quality.
Artificial neural networks (ANNs) with sufficiently complex architectures are known to be able to approximate any continuous function [26]. Accordingly, in principle, ANNs can be trained to approximate functions that represent test statistics of model observers. For example, Kupinski et al. investigated the use of fully-connected neural networks (FCNNs) to approximate the test statistic of an IO that acted on low-dimensional vectors of extracted image features [27]. More recently, Zhou and Anastasio employed convolutional neural networks (CNNs) to approximate the IO test statistic acting directly on images for a simple signal-known-exactly and background-known-exactly (SKE/BKE) binary signal detection task, and demonstrated the use of modern deep learning technologies for approximating IOs [28].

In this work, supervised learning-based methods that employ ANNs for approximating the IO test statistic are explored systematically for binary signal detection tasks in which the observer acts on 2D image data. The detection tasks considered are of varying difficulty, and address both background and signal randomness in combination with different measurement noise models. In order to approximate the generally nonlinear IO test statistic, CNNs are employed. For the special case of the HO, an alternative supervised learning methodology is proposed that employs single-layer neural networks (SLNNs) for learning the Hotelling template without the need for explicitly estimating and inverting covariance matrices. The signal detection performance is assessed via receiver operating characteristic (ROC) analysis [29, 1]. The results produced by the proposed supervised learning methods are compared to those produced by use of traditional numerical methods or analytical calculations when feasible. The potential advantages of the proposed supervised learning approaches for approximating the IO and HO test statistics are discussed.
The remainder of this article is organized as follows. In Sec. II, the salient aspects of binary signal detection theory are reviewed and previous works on approximating the IO test statistic by use of ANNs are summarized. A novel methodology that employs SLNNs to approximate the HO test statistic is developed in Sec. III. The numerical studies and results of the proposed methods for approximating the IO and HO for signal detection tasks with different object models and noise models are provided in Sec. IV and Sec. V. Finally, the article concludes with a discussion of the work in Sec. VI.
II Background
Consider a linear digital imaging system that is described as:

$\mathbf{g} = \mathcal{H} f + \mathbf{n}$,   (1)

where $\mathbf{g} \in \mathbb{R}^M$ is a vector that describes the measured image data, $f(\mathbf{r})$ is the object function with a spatial coordinate $\mathbf{r} \in \mathbb{R}^2$ or $\mathbb{R}^3$, $\mathcal{H}$ denotes a continuous-to-discrete (C-D) imaging operator that maps $f(\mathbf{r})$ to $\mathbb{R}^M$, and $\mathbf{n}$ is the measurement noise. Because $\mathbf{n}$ is a random vector, so is the measured image data $\mathbf{g}$. Below, the object function will be viewed as being either deterministic or stochastic, depending on the specification of the signal detection task. When its spatial dependence is not important to highlight, the notation $f$ will be employed to denote $f(\mathbf{r})$. The same notation will be employed with other functions.
II-A Formulation of binary signal detection tasks
A binary signal detection task requires an observer to classify an image as satisfying either a signal-present hypothesis ($H_1$) or a signal-absent hypothesis ($H_0$). The imaging processes under these two hypotheses can be described as:

$H_0:\ \mathbf{g} = \mathcal{H} f_b + \mathbf{n} = \mathbf{b} + \mathbf{n}$,   (2a)
$H_1:\ \mathbf{g} = \mathcal{H} (f_b + f_s) + \mathbf{n} = \mathbf{b} + \mathbf{s} + \mathbf{n}$,   (2b)

where $f_b(\mathbf{r})$ and $f_s(\mathbf{r})$ represent the background and signal functions, respectively, $\mathbf{b} = \mathcal{H} f_b$ is the background image and $\mathbf{s} = \mathcal{H} f_s$ is the signal image. In a signal-known-exactly (SKE) detection task, $f_s(\mathbf{r})$ is non-random, whereas in a signal-known-statistically (SKS) detection task it is a random process. Similarly, in a background-known-exactly (BKE) detection task, $f_b(\mathbf{r})$ is non-random, whereas in a background-known-statistically (BKS) detection task it is a random process. Let $b_m$ and $s_m$ denote the $m$-th component of $\mathbf{b}$ and $\mathbf{s}$, respectively. When $\mathcal{H}$ is a linear operator, as in the numerical studies presented later, these quantities are defined as:

$b_m = \int d\mathbf{r}\, h_m(\mathbf{r})\, f_b(\mathbf{r})$,   (3a)
$s_m = \int d\mathbf{r}\, h_m(\mathbf{r})\, f_s(\mathbf{r})$,   (3b)

where $h_m(\mathbf{r})$ is the point response function of the imaging system associated with the $m$-th measurement [1].
To perform a binary signal detection task, an observer computes a test statistic $t(\mathbf{g})$ that maps the measured image $\mathbf{g}$ to a real-valued scalar variable, which is compared to a predetermined threshold $\tau$ to classify $\mathbf{g}$ as satisfying $H_0$ or $H_1$. By varying the threshold $\tau$, a ROC curve can be plotted to depict the trade-off between the false-positive fraction (FPF) and the true-positive fraction (TPF) [29, 1]. The area under the ROC curve (AUC) can subsequently be calculated to quantify the signal detection performance.
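The AUC can also be estimated nonparametrically from sample values of a test statistic under the two hypotheses. The following is a minimal sketch (the function name and the use of the Mann-Whitney U form are our own; the studies in this article instead fit ROC curves with the Metz-ROC software):

```python
import numpy as np

def empirical_auc(t0, t1):
    """Nonparametric AUC estimate from test-statistic samples.

    t0: test statistics under the signal-absent hypothesis H0
    t1: test statistics under the signal-present hypothesis H1
    The Mann-Whitney U statistic equals the area under the empirical
    ROC curve traced out by sweeping the decision threshold (ties
    counted as 1/2).
    """
    t0 = np.asarray(t0, dtype=float)
    t1 = np.asarray(t1, dtype=float)
    # Fraction of (signal-present, signal-absent) pairs ranked correctly
    diff = t1[:, None] - t0[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
```

A perfectly separating test statistic yields an AUC of 1, while a noninformative one yields 0.5.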
II-B Bayesian Ideal Observer and Hotelling Observer
Among all observers, the IO sets an upper performance limit for binary signal detection tasks. The IO test statistic is defined as any monotonic transformation of the likelihood ratio $\Lambda(\mathbf{g})$, which is defined as [1, 2, 27]:

$\Lambda(\mathbf{g}) = \dfrac{p(\mathbf{g}|H_1)}{p(\mathbf{g}|H_0)}$.   (4)

Here, $p(\mathbf{g}|H_0)$ and $p(\mathbf{g}|H_1)$ are conditional probability density functions that describe the measured data $\mathbf{g}$ under the hypotheses $H_0$ and $H_1$, respectively. It will prove useful to note that one monotonic transformation of $\Lambda(\mathbf{g})$ is the posterior probability $P(H_1|\mathbf{g})$:

$P(H_1|\mathbf{g}) = \dfrac{\Lambda(\mathbf{g})\, P(H_1)}{\Lambda(\mathbf{g})\, P(H_1) + P(H_0)}$,   (5)

where $P(H_0)$ and $P(H_1)$ are the prior probabilities associated with the two hypotheses.
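Equation (5) is straightforward to evaluate once the likelihood ratio is known. The helper below is a small illustrative sketch (the function name is our own; equal priors are assumed by default):

```python
def posterior_from_likelihood_ratio(lr, p1=0.5):
    """Posterior probability P(H1|g) from the likelihood ratio (Eq. (5)).

    lr: likelihood ratio Lambda(g)
    p1: prior probability P(H1); P(H0) = 1 - p1
    """
    p0 = 1.0 - p1
    return lr * p1 / (lr * p1 + p0)
```

Because this mapping is monotonic in the likelihood ratio, thresholding the posterior is equivalent to thresholding the likelihood ratio itself.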
When the IO test statistic cannot be determined analytically, the HO is sometimes employed to assess task-based measures of image quality. The HO employs the Hotelling discriminant that is the population equivalent of the Fisher linear discriminant [1]. The HO test statistic is computed as:

$t_{HO}(\mathbf{g}) = \mathbf{w}_{Hot}^T \mathbf{g}$,   (6)

where $\mathbf{w}_{Hot}$ is the Hotelling template. Let $\bar{\mathbf{g}}(f)$ denote the conditional mean of the image data given an object function $f$. Similarly, let $\bar{\mathbf{g}}_j$ denote the conditional mean $\bar{\mathbf{g}}(f)$ averaged with respect to the object randomness associated with the hypothesis $H_j$ ($j = 0, 1$). The Hotelling template is defined as [1]:

$\mathbf{w}_{Hot} = \left[\tfrac{1}{2}\left(\mathbf{K}_0 + \mathbf{K}_1\right)\right]^{-1} \Delta\bar{\mathbf{g}}$.   (7)

Here, $\mathbf{K}_j$ is the covariance matrix of the measured data under the hypothesis $H_j$ ($j = 0, 1$), and $\Delta\bar{\mathbf{g}} = \bar{\mathbf{g}}_1 - \bar{\mathbf{g}}_0$ is the difference between the means of the measured data under the two hypotheses. It is useful to note that the covariance matrix $\mathbf{K}_j$ can be decomposed as [1]:

$\mathbf{K}_j = \overline{\mathbf{K}_{\mathbf{n}}^{j}} + \mathbf{K}_{obj}^{j}$.   (8)

In Eq. (8), the first term $\overline{\mathbf{K}_{\mathbf{n}}^{j}}$ is the mean of the noise covariance matrix averaged over $f$ under the hypothesis $H_j$. The second term $\mathbf{K}_{obj}^{j}$ is the covariance matrix associated with the object under the hypothesis $H_j$.
The signal-to-noise ratio associated with a test statistic $t(\mathbf{g})$, denoted as $\mathrm{SNR}_t$, is defined as:

$\mathrm{SNR}_t = \dfrac{\bar{t}_1 - \bar{t}_0}{\sqrt{\tfrac{1}{2}\sigma_0^2 + \tfrac{1}{2}\sigma_1^2}}$,   (9)

where $\bar{t}_j$ and $\sigma_j^2$ are the mean and variance of $t(\mathbf{g})$ under the hypothesis $H_j$ ($j = 0, 1$). Similar to the AUC, $\mathrm{SNR}_t$ is a commonly employed FOM of signal detectability that can be employed to guide the optimization of imaging systems. Whereas the IO maximizes the AUC among all observers, the HO maximizes the value of $\mathrm{SNR}_t$ among all linear observers, and the corresponding value can be computed as [1, 15]:

$\mathrm{SNR}_{HO}^2 = \Delta\bar{\mathbf{g}}^T \left[\tfrac{1}{2}\left(\mathbf{K}_0 + \mathbf{K}_1\right)\right]^{-1} \Delta\bar{\mathbf{g}}$.   (10)
II-C Previous works on approximating the IO test statistic by use of ANNs
A feed-forward ANN is a system of connected artificial neurons, which are computational units described by adjustable real-valued parameters called weights [30, 31]. A sufficiently complex ANN possesses the ability to approximate any continuous function [26]. Accordingly, ANNs can be trained to approximate functions that represent test statistics of model observers. Previously published results indicate the feasibility of using ANNs to approximate IOs [27, 28]. For example, Kupinski et al. [27] applied fully-connected neural networks (FCNNs), which are a conventional type of feed-forward ANN, to approximate the test statistic for an IO acting on low-dimensional vectors of extracted image features. It was demonstrated [27] that, given sufficient training data and an ANN of sufficient representation capacity, the test statistic of the IO acting on a low-dimensional vector of image features could be accurately approximated. However, ordinary ANNs, such as FCNNs, do not scale well to high-dimensional data (e.g., images) because each neuron in an FCNN is fully connected to all neurons in the previous layer, which limits the dimension of the input layer and the depth of the models that can be trained effectively. As such, FCNNs are not well suited for use as numerical observers that act directly on image data.
Modern deep learning approaches that employ convolutional neural networks (CNNs) have been developed to address this limitation [31, 32, 33, 34]. A comprehensive review of CNNs for image classification can be found in [35]. Recently, motivated by the success of CNNs in image classification tasks, Zhou and Anastasio [28] investigated a supervised learning-based method to approximate the test statistic of an IO that acts directly on 2D images by using CNNs. The basic idea is to identify a CNN that can approximate the posterior probability $P(H_1|\mathbf{g})$ which, as described by Eq. (5), is a monotonic transformation of the likelihood ratio. In that preliminary work, the feasibility of using CNNs to approximate an IO for a simple SKE/BKE object model was explored. As an extension of that preliminary study, supervised learning-based methods that employ CNNs and SLNNs for approximating test statistics of the IO and HO acting on 2D measured images with various object and noise models are systematically explored in this work.
II-D Maximum likelihood estimation of CNN weights for approximating the IO test statistic
To train a CNN for approximating the posterior probability $P(H_1|\mathbf{g})$, the sigmoid function is employed in the last layer of the CNN; in this way, the output of the CNN can be interpreted as a probability. Let the set of all weights of neurons in a CNN be denoted by the vector $\mathbf{\Theta}$ and denote the output of the CNN as $P(H_1|\mathbf{g}; \mathbf{\Theta})$. It should be noted that the vertical bar in $P(H_1|\mathbf{g}; \mathbf{\Theta})$ has two usages: to denote that the probability of $H_1$ is conditioned on $\mathbf{g}$ and to denote that the function is parameterized by the non-random weight vector $\mathbf{\Theta}$. The goal of training the CNN is to determine a vector $\hat{\mathbf{\Theta}}$ such that the difference between the CNN-approximated posterior probability $P(H_1|\mathbf{g}; \hat{\mathbf{\Theta}})$ and the actual posterior probability $P(H_1|\mathbf{g})$ is small. The posterior can subsequently be approximated by $P(H_1|\mathbf{g}; \hat{\mathbf{\Theta}})$.

A supervised learning-based method can be employed to approximate the maximum likelihood (ML) estimate of $\mathbf{\Theta}$ [27]. Let $y$ denote the image label, where $y = 0$ and $y = 1$ correspond to the hypotheses $H_0$ and $H_1$, respectively. The ML estimate of $\mathbf{\Theta}$ can be obtained by minimizing the generalization error, defined as the ensemble average of the cross-entropy over the joint distribution $p(\mathbf{g}, y)$ [2]:

$\mathcal{L}(\mathbf{\Theta}) = -\big\langle\, y \ln P(H_1|\mathbf{g}; \mathbf{\Theta}) + (1 - y) \ln\left[1 - P(H_1|\mathbf{g}; \mathbf{\Theta})\right] \big\rangle_{p(\mathbf{g}, y)}$,   (11)

where $\langle \cdot \rangle_{p(\mathbf{g}, y)}$ denotes the mean over the probability density $p(\mathbf{g}, y)$. If $P(H_1|\mathbf{g}; \mathbf{\Theta})$ can represent any functional form, $P(H_1|\mathbf{g}; \hat{\mathbf{\Theta}}) = P(H_1|\mathbf{g})$ when Eq. (11) is minimized [2]. To see this, one can rewrite the negative cross-entropy as:

$-\mathcal{L}(\mathbf{\Theta}) = \big\langle\, P(H_1|\mathbf{g}) \ln P(H_1|\mathbf{g}; \mathbf{\Theta}) + P(H_0|\mathbf{g}) \ln\left[1 - P(H_1|\mathbf{g}; \mathbf{\Theta})\right] \big\rangle_{p(\mathbf{g})}$.   (12)
When the CNN is sufficiently complex to represent any functional form, the task of finding $\hat{\mathbf{\Theta}}$ becomes finding the optimal $P(H_1|\mathbf{g}; \mathbf{\Theta})$ that maximizes Eq. (12). Consider the gradient of Eq. (12) with respect to $P(H_1|\mathbf{g}; \mathbf{\Theta})$:

$\dfrac{\partial \left[-\mathcal{L}(\mathbf{\Theta})\right]}{\partial P(H_1|\mathbf{g}; \mathbf{\Theta})} = \dfrac{P(H_1|\mathbf{g})}{P(H_1|\mathbf{g}; \mathbf{\Theta})} - \dfrac{P(H_0|\mathbf{g})}{1 - P(H_1|\mathbf{g}; \mathbf{\Theta})}$.   (13)

For $P(H_1|\mathbf{g}; \mathbf{\Theta}) \in (0, 1)$, Eq. (13) equals zero only when $P(H_1|\mathbf{g}; \mathbf{\Theta}) = P(H_1|\mathbf{g})$, from which $P(H_1|\mathbf{g}; \hat{\mathbf{\Theta}}) = P(H_1|\mathbf{g})$.
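The stationarity argument above can be checked numerically: for a fixed true posterior (a hypothetical value of 0.3 is used below), the expected cross-entropy, viewed as a function of the predicted probability, attains its minimum at the true posterior value:

```python
import numpy as np

# Hypothetical true posterior P(H1|g) for a fixed measurement g
p_true = 0.3

# Candidate predicted probabilities q = P(H1|g; Theta) on a grid
q = np.linspace(0.01, 0.99, 99)

# Expected cross-entropy under the true posterior (negative of Eq. (12),
# restricted to a single measurement g)
ce = -(p_true * np.log(q) + (1.0 - p_true) * np.log(1.0 - q))

# The minimizer over the grid coincides with the true posterior
q_star = q[np.argmin(ce)]
```

This is the one-measurement analogue of the derivation in Eqs. (12)-(13); the same argument applies pointwise for every $\mathbf{g}$.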
Given a set of $N$ independent labeled training data $\{(\mathbf{g}_i, y_i)\}_{i=1}^{N}$, $\hat{\mathbf{\Theta}}$ can be estimated by minimizing the empirical error, which is the average of the cross-entropy over the training dataset:

$\hat{\mathcal{L}}(\mathbf{\Theta}) = -\dfrac{1}{N} \sum_{i=1}^{N} \left\{ y_i \ln P(H_1|\mathbf{g}_i; \mathbf{\Theta}) + (1 - y_i) \ln\left[1 - P(H_1|\mathbf{g}_i; \mathbf{\Theta})\right] \right\}$,   (14)

where $\hat{\mathcal{L}}(\mathbf{\Theta})$ is the empirical estimate of $\mathcal{L}(\mathbf{\Theta})$. The IO test statistic is subsequently approximated as $P(H_1|\mathbf{g}; \hat{\mathbf{\Theta}})$. However, if the training dataset is small, directly minimizing the empirical error can cause overfitting and large generalization errors [36]
. To reduce the rate at which overfitting occurs, mini-batch stochastic gradient descent algorithms can be employed [36]. In online learning, these mini-batches are drawn on-the-fly from the joint distribution $p(\mathbf{g}, y)$ [36].

III Approximation of the HO test statistic by use of SLNNs
Below, a novel supervised learning-based method is proposed for learning the HO test statistic.
III-A Training the HO by use of supervised learning
As described by Eq. (6), the HO test statistic is a linear function of the measured image $\mathbf{g}$. Linear functions can be modeled by a single-layer neural network (SLNN) that possesses only a single fully connected layer. Denote the vector of weight parameters in the SLNN as $\mathbf{w}$. The output of the SLNN can be computed as:

$t(\mathbf{g}) = \mathbf{w}^T \mathbf{g}$.   (15)

To approximate $\mathbf{w}_{Hot}$ by $\mathbf{w}$, the SLNN can be trained by maximizing $\mathrm{SNR}_t$ through solving the following constrained optimization problem:

$\max_{\mathbf{w}}\ \Delta\bar{\mathbf{g}}^T \mathbf{w}$   (16)
subject to $\mathbf{w}^T \mathbf{K} \mathbf{w} = C$,

where $\mathbf{K} \equiv \tfrac{1}{2}\left(\mathbf{K}_0 + \mathbf{K}_1\right)$ and $C$ is any positive number. The Lagrangian function related to this constrained optimization problem can be computed as:

$\Lambda(\mathbf{w}, \lambda) = -\Delta\bar{\mathbf{g}}^T \mathbf{w} + \lambda \left(\mathbf{w}^T \mathbf{K} \mathbf{w} - C\right)$.   (17)

The optimal solution $(\mathbf{w}^*, \lambda^*)$ satisfies the Lagrange multiplier conditions:

$-\Delta\bar{\mathbf{g}} + 2 \lambda \mathbf{K} \mathbf{w} = \mathbf{0}$,   (18a)
$\mathbf{w}^T \mathbf{K} \mathbf{w} = C$,   (18b)

where $\lambda$ is the Lagrange multiplier. According to Eq. (18):

$\mathbf{w}^* = \dfrac{1}{2 \lambda^*} \mathbf{K}^{-1} \Delta\bar{\mathbf{g}}$,   (19a)
$\lambda^* = \dfrac{1}{2} \sqrt{\dfrac{\Delta\bar{\mathbf{g}}^T \mathbf{K}^{-1} \Delta\bar{\mathbf{g}}}{C}}$.   (19b)

Because Eq. (17) is convex in $\mathbf{w}$, $\mathbf{w}^*$ is the global minimum of $\Lambda(\mathbf{w}, \lambda^*)$, and the constrained optimization problem defined in Eq. (16) can be solved by minimizing $\Lambda(\mathbf{w}, \lambda^*)$ with respect to $\mathbf{w}$, which is equivalent to minimizing $\mathbf{w}^T \mathbf{K} \mathbf{w} - c\, \Delta\bar{\mathbf{g}}^T \mathbf{w}$ with respect to $\mathbf{w}$ for a positive constant $c$. Hence, the generalization error to be minimized is defined as:

$\mathcal{L}_{HO}(\mathbf{w}) = \mathbf{w}^T \mathbf{K} \mathbf{w} - c\, \Delta\bar{\mathbf{g}}^T \mathbf{w}$.   (20)

In order to have $\mathbf{w}^* = \mathbf{K}^{-1} \Delta\bar{\mathbf{g}} = \mathbf{w}_{Hot}$, $c$ is set to 2.
Given $N$ labeled image data in which half are signal-absent and the others are signal-present, the empirical error to be minimized is:

$\hat{\mathcal{L}}_{HO}(\mathbf{w}) = \dfrac{1}{N} \sum_{j=0}^{1} \sum_{i=1}^{N/2} \left[\mathbf{w}^T \left(\mathbf{g}_i^{(j)} - \hat{\bar{\mathbf{g}}}_j\right)\right]^2 - 2\, \mathbf{w}^T \Delta\hat{\bar{\mathbf{g}}}$,   (21)

where $\mathbf{g}_i^{(j)}$ denotes the $i$-th image belonging to class $H_j$, $\hat{\bar{\mathbf{g}}}_j = \frac{2}{N} \sum_{i=1}^{N/2} \mathbf{g}_i^{(j)}$ is the empirical mean of the measured data under the hypothesis $H_j$, and $\Delta\hat{\bar{\mathbf{g}}} = \hat{\bar{\mathbf{g}}}_1 - \hat{\bar{\mathbf{g}}}_0$.

Any gradient-based algorithm can be employed to minimize Eq. (21) to learn the empirical estimate of the Hotelling template, which is equivalent to the template employed by the Fisher linear discriminant. Because this method does not require estimation and inversion of a covariance matrix, it can scale well to large images.
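The training procedure above can be sketched with plain gradient descent on a toy Gaussian dataset. This is an illustrative numpy sketch under assumed toy parameters, not the SLNN implementation used in the studies below; the learned weight vector is compared against the template obtained by directly inverting the empirical covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 8, 5000

# Toy Gaussian data: shared covariance K, class-mean difference s
s = rng.normal(size=dim)
A = rng.normal(size=(dim, dim))
K = A @ A.T / dim + np.eye(dim)
L = np.linalg.cholesky(K)
g0 = rng.normal(size=(n, dim)) @ L.T          # samples under H0
g1 = rng.normal(size=(n, dim)) @ L.T + s      # samples under H1

# Quantities appearing in the empirical error of Eq. (21)
dg = g1.mean(axis=0) - g0.mean(axis=0)        # empirical mean difference
c0 = g0 - g0.mean(axis=0)                     # centered H0 samples
c1 = g1 - g1.mean(axis=0)                     # centered H1 samples

# Gradient descent on L(w) = w^T K_hat w - 2 w^T dg
w = np.zeros(dim)
lr = 0.05
for _ in range(2000):
    Kw = ((c0 @ w) @ c0 + (c1 @ w) @ c1) / (2 * (n - 1))  # K_hat @ w
    w -= lr * (2 * Kw - 2 * dg)

# Reference: directly computed empirical Hotelling template
K_hat = (c0.T @ c0 + c1.T @ c1) / (2 * (n - 1))
w_direct = np.linalg.solve(K_hat, dg)
```

Note that the gradient evaluation never forms or inverts the covariance matrix explicitly, which is the scaling advantage claimed above.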
III-B Training the HO by use of a covariance-matrix decomposition
Methods have been developed previously to estimate and invert empirical covariance matrices by use of a covariance-matrix decomposition [1, 17]. As stated in Eq. (8), the covariance matrix can be decomposed into a component associated with the object randomness and a component associated with the noise randomness. To invert the full covariance matrix for computing the HO test statistic, the noise covariance matrix is assumed known and the object covariance matrix needs to be estimated from samples of background and signal images. When uncorrelated noise is considered, the noise covariance matrix is diagonal. For applications where detectors introduce correlations in the measurements, it is banded and may be nearly diagonal [1]. In this subsection, an alternative method is provided to approximate the HO test statistic by use of a covariance-matrix decomposition.
According to the covariance-matrix decomposition stated in Eq. (8), the variance of the test statistic $t(\mathbf{g}) = \mathbf{w}^T \mathbf{g}$ under the hypothesis $H_j$ can be computed as:

$\mathbf{w}^T \mathbf{K}_j \mathbf{w} = \mathbf{w}^T \overline{\mathbf{K}_{\mathbf{n}}^{j}} \mathbf{w} + \mathbf{w}^T \mathbf{K}_{obj}^{j} \mathbf{w}$.   (22)

Denote $\tfrac{1}{2}\left(\overline{\mathbf{K}_{\mathbf{n}}^{0}} + \overline{\mathbf{K}_{\mathbf{n}}^{1}}\right)$ as $\overline{\mathbf{K}_{\mathbf{n}}}$, which is assumed known. The generalization error defined in Eq. (20) can be reformulated as:

$\mathcal{L}_{HO}(\mathbf{w}) = \mathbf{w}^T \overline{\mathbf{K}_{\mathbf{n}}} \mathbf{w} + \mathbf{w}^T \mathbf{K}_{obj} \mathbf{w} - 2\, \Delta\bar{\mathbf{g}}^T \mathbf{w}$,   (23)

where $\mathbf{K}_{obj} = \tfrac{1}{2}\left(\mathbf{K}_{obj}^{0} + \mathbf{K}_{obj}^{1}\right)$.

Given $N/2$ background images $\mathbf{b}_i$ and $N/2$ signal images $\mathbf{s}_i$, the empirical error to be minimized is:

$\hat{\mathcal{L}}_{HO}(\mathbf{w}) = \mathbf{w}^T \overline{\mathbf{K}_{\mathbf{n}}} \mathbf{w} + \dfrac{1}{N} \sum_{j=0}^{1} \sum_{i=1}^{N/2} \left[\mathbf{w}^T \left(\mathbf{v}_i^{(j)} - \hat{\bar{\mathbf{v}}}_j\right)\right]^2 - 2\, \mathbf{w}^T \Delta\hat{\bar{\mathbf{v}}}$,   (24)

where $\mathbf{v}_i^{(0)} = \mathbf{b}_i$ and $\mathbf{v}_i^{(1)} = \mathbf{b}_i + \mathbf{s}_i$ are the noiseless measurements, $\hat{\bar{\mathbf{v}}}_j$ is the empirical mean of the $\mathbf{v}_i^{(j)}$, and $\Delta\hat{\bar{\mathbf{v}}} = \hat{\bar{\mathbf{v}}}_1 - \hat{\bar{\mathbf{v}}}_0$.
To approximate the Hotelling template, any gradient-based algorithm can be employed to minimize Eq. (24). This method also does not require inversion of a covariance matrix.
IV Numerical studies
Computer-simulation studies were conducted to investigate the proposed methods for learning the IO and HO test statistics. Four different binary signal detection tasks were considered. A signal-known-exactly and background-known-exactly (SKE/BKE) signal detection task was considered in which the IO and HO can be analytically determined. A signal-known-exactly and background-known-statistically (SKE/BKS) detection task and a signal-known-statistically and background-known-statistically (SKS/BKS) detection task, both of which employed a lumpy background object model [37], were also considered. For these two BKS signal detection tasks, computations of the IO test statistic by use of MCMC methods have been accomplished previously [2, 38]. Finally, a SKE/BKS detection task employing a clustered lumpy background (CLB) object model [39] was addressed. To the best of our knowledge, applications of MCMC methods to the CLB object model have not been reported [8]. For all considered signal detection tasks, ROC curves were fit by use of the Metz-ROC software [40] that utilized the "proper" binormal model [41, 42].
The imaging system in all studies was simulated by a linear C-D mapping with a Gaussian kernel that was motivated by an idealized parallel-hole collimator system [2, 43]:

$h_m(\mathbf{r}) = \dfrac{h}{2 \pi w^2} \exp\!\left(-\dfrac{(\mathbf{r} - \mathbf{r}_m)^T (\mathbf{r} - \mathbf{r}_m)}{2 w^2}\right)$,   (25)

where $h$ and $w$ denote the height and the width of the Gaussian kernel, respectively, and $\mathbf{r}_m$ is the location associated with the $m$-th measurement. The details for each signal detection task and the training of neural networks are given in the following subsections.
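A C-D operator of this form can be approximated by quadrature over a discretized object. The following is a minimal sketch (the function name, the unit-spacing grid convention, and the kernel parameter values used in the test are assumptions, not the implementation used in these studies):

```python
import numpy as np

def simulate_measurement(f, coords, meas_points, h, w):
    """Approximate the C-D operator with the Gaussian kernel of Eq. (25).

    f:           object values sampled at the points in `coords`, shape (N,)
    coords:      (N, 2) spatial locations used as quadrature nodes
    meas_points: (M, 2) locations r_m associated with each measurement
    h, w:        height and width of the Gaussian kernel
    Returns the noiseless measurement vector (b or s in Eqs. (3a)-(3b)),
    assuming unit quadrature weight per sample.
    """
    # Squared distances between every measurement point and every node
    d2 = ((meas_points[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    psf = (h / (2 * np.pi * w ** 2)) * np.exp(-d2 / (2 * w ** 2))
    return psf @ f
```

For a point-like object located exactly at a measurement point, the output reduces to the kernel peak value $h / (2 \pi w^2)$.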
IV-A SKE/BKE signal detection task
Both the signal and background were non-random for this case; the image size and the background image $\mathbf{b}$ were fixed. The signal function was a 2D symmetric Gaussian function:

$f_s(\mathbf{r}) = A \exp\!\left(-\dfrac{(\mathbf{r} - \mathbf{r}_s)^T (\mathbf{r} - \mathbf{r}_s)}{2 \sigma_s^2}\right)$,   (26)

where $A$ is the amplitude, $\mathbf{r}_s$ is the coordinate of the signal location, and $\sigma_s$ is the width of the signal. The signal image can be computed as:

$s_m = \int d\mathbf{r}\, h_m(\mathbf{r})\, f_s(\mathbf{r})$.   (27)
Independent and identically distributed (i.i.d.) Laplacian noise, which can describe histograms of filtered natural images [44], was employed: $n_m \sim \mathcal{L}(0, \beta)$, where $\mathcal{L}(0, \beta)$ denotes a Laplacian distribution with exponential decay $\beta$, corresponding to a standard deviation of $\sqrt{2}\,\beta$. Because the randomness in the measurements was only from the Laplacian noise, the IO test statistic can be computed as [44]:

$t_{IO}(\mathbf{g}) = \dfrac{1}{\beta} \sum_{m=1}^{M} \left(\left|g_m - b_m\right| - \left|g_m - b_m - s_m\right|\right)$.   (28)

Because the i.i.d. Laplacian noise has the covariance matrix $\mathbf{K} = 2 \beta^2 \mathbf{I}$, the Hotelling template can be computed by analytically inverting the covariance matrix:

$\left[\mathbf{w}_{Hot}\right]_m = \sum_{m'=1}^{M} \left[\mathbf{K}^{-1}\right]_{m, m'}\, s_{m'} = \dfrac{s_m}{2 \beta^2}$,   (29)

where $\left[\mathbf{K}^{-1}\right]_{m, m'}$ denotes the component at the $m$-th row and the $m'$-th column of $\mathbf{K}^{-1}$. The performances of the proposed learning-based methods were compared to those produced by these analytical computations for this case.
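The test statistic of Eq. (28) is simple to evaluate. A short sketch (the function name and argument convention are our own):

```python
import numpy as np

def io_test_statistic_laplacian(g, b, s, decay):
    """Log-likelihood ratio for i.i.d. Laplacian noise (Eq. (28)).

    Under H0, g - b is Laplacian noise; under H1, g - b - s is.
    The log-likelihood ratio therefore reduces to a sum of
    absolute-value differences scaled by the decay parameter.
    """
    g, b, s = (np.asarray(x, dtype=float) for x in (g, b, s))
    return float(np.sum(np.abs(g - b) - np.abs(g - b - s)) / decay)
```

When the measurement equals the noiseless signal-present image, the statistic is large and positive; when it equals the background, it is symmetric and negative.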
IV-B SKE/BKS signal detection task with a lumpy background model
In this case, a non-random signal described by Eq. (26) was employed. The background was random and described by a stochastic lumpy object model [37]:

$f_b(\mathbf{r}) = \sum_{n=1}^{N} \ell\!\left(\mathbf{r} - \mathbf{r}_n \mid a, s_\ell\right)$,   (30)

where $N$ is the number of lumps, sampled from a Poisson distribution: $N \sim \mathrm{Poiss}(\bar{N})$, where $\mathrm{Poiss}(\bar{N})$ denotes a Poisson distribution with mean $\bar{N}$ that was set to 5, and $\ell(\mathbf{r} - \mathbf{r}_n \mid a, s_\ell)$ is the lump function modeled by a 2D Gaussian function with amplitude $a$ and width $s_\ell$:

$\ell\!\left(\mathbf{r} - \mathbf{r}_n \mid a, s_\ell\right) = a \exp\!\left(-\dfrac{(\mathbf{r} - \mathbf{r}_n)^T (\mathbf{r} - \mathbf{r}_n)}{2 s_\ell^2}\right)$.   (31)

Here, $a$ was set to 1, $s_\ell$ was set to 7, and $\mathbf{r}_n$ is the location of the $n$-th lump, sampled from a uniform distribution over the field of view. Because both the kernel $h_m(\mathbf{r})$ and the lump function are Gaussian, the background image $\mathbf{b}$ was analytically computed as:

$b_m = \int d\mathbf{r}\, h_m(\mathbf{r})\, f_b(\mathbf{r}) = \sum_{n=1}^{N} \dfrac{a\, h\, s_\ell^2}{s_\ell^2 + w^2} \exp\!\left(-\dfrac{(\mathbf{r}_m - \mathbf{r}_n)^T (\mathbf{r}_m - \mathbf{r}_n)}{2 \left(s_\ell^2 + w^2\right)}\right)$.   (32)

The measurement noise was i.i.d. Gaussian noise that models electronic noise: $n_m \sim \mathcal{N}(0, \sigma_n^2)$, where $\mathcal{N}(0, \sigma_n^2)$ denotes a Gaussian distribution with mean 0 and standard deviation $\sigma_n$. Examples of signal-present images are shown in the top row of Fig. 1.

The IO and HO test statistics cannot be analytically determined because of the background randomness. To serve as a surrogate for ground truth, the MCMC method was employed to approximate the IO test statistic. In one Markov chain, 200,000 background images were sampled according to the proposal density and the acceptance probability defined in [2]. The traditional HO test statistic was calculated by use of the covariance-matrix decomposition [1] with an empirical background covariance matrix that was estimated by use of 100,000 background images.
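A lumpy-background realization of the form of Eqs. (30)-(31) can be sampled directly on a pixel grid. An illustrative sketch (the pixel-domain sampling, default image size, and function name are assumptions; in the studies above the model is evaluated through the C-D operator of Eq. (25)):

```python
import numpy as np

def lumpy_background(shape=(64, 64), mean_lumps=5, amp=1.0, width=7.0, rng=None):
    """Sample a lumpy-background realization (Eqs. (30)-(31)).

    The number of lumps is Poisson-distributed with mean `mean_lumps`;
    each lump is a symmetric 2D Gaussian with amplitude `amp` and
    width `width`, centered at a location drawn uniformly over the
    field of view.
    """
    rng = np.random.default_rng(rng)
    n_lumps = rng.poisson(mean_lumps)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    f = np.zeros(shape)
    for _ in range(n_lumps):
        cy = rng.uniform(0, shape[0])
        cx = rng.uniform(0, shape[1])
        f += amp * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * width ** 2))
    return f
```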
IV-C SKS/BKS signal detection task with a lumpy background model
This case employed the same stochastic lumpy background model that was specified in the SKE/BKS case described above. The signal was random and modeled by a 2D Gaussian function with a random location and a random shape, which can be mathematically represented as:

$f_s(\mathbf{r}) = A \exp\!\left(-\dfrac{1}{2} (\mathbf{r} - \mathbf{r}_s)^T \mathbf{R}(\varphi)^T \mathbf{D}^{-1} \mathbf{R}(\varphi) (\mathbf{r} - \mathbf{r}_s)\right)$.   (33)

Here, $\mathbf{R}(\varphi)$ is a rotation matrix that rotates a vector through an angle $\varphi$ in Euclidean space, and the diagonal matrix $\mathbf{D} = \mathrm{diag}(\sigma_1^2, \sigma_2^2)$ determines the width of the Gaussian function along each coordinate axis. The signal image was analytically computed as:

$s_m = \int d\mathbf{r}\, h_m(\mathbf{r})\, f_s(\mathbf{r})$.   (34)

The amplitude $A$ was fixed, the rotation angle $\varphi$ was drawn from a uniform distribution, the widths $\sigma_1$ and $\sigma_2$ were sampled from a uniform distribution, and the signal location $\mathbf{r}_s$ was uniformly distributed over the image field of view. The measurement noise was i.i.d. Gaussian with zero mean.
The MCMC method was employed to provide a surrogate for ground truth for the IO. In each Markov chain, 400,000 background images were sampled according to the proposal density and the acceptance probability described in [38]. The traditional HO test statistic was calculated by use of the covariance-matrix decomposition [1] with an empirical object covariance matrix that was estimated by use of 100,000 background images and 100,000 signal images.

Because linear observers are typically unable to detect signals with random locations, the HO was expected to perform poorly. Multi-template model observers [45, 46, 47] and the scanning HO [48, 49] can be employed to detect variable signals. In this paper, we do not provide a method for training these observers. The approximation of multi-template observers and the scanning HO by use of a supervised learning method represents a topic for future investigation.
IV-D SKE/BKS signal detection task with a clustered lumpy background model
A second SKE/BKS detection task associated with a more sophisticated stochastic background model, the clustered lumpy background (CLB), was also considered. The CLB model can be employed to synthesize mammographic images [39]. In this study, a CLB realization was simulated as:

$f_b(\mathbf{r}) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \ell_b\!\left(\mathbf{r} - \mathbf{r}_k - \mathbf{r}_{kn} \mid \theta_{kn}\right)$,   (35)

where $K$ is the number of clusters, $N_k$ is the number of blobs in the $k$-th cluster, $\mathbf{r}_k$ is the location of the $k$-th cluster, and $\mathbf{r}_{kn}$ is the location of the $n$-th blob in the $k$-th cluster. Here, $\mathbf{r}_k$ was sampled from a uniform distribution over the image field of view, $\mathbf{r}_{kn}$ was sampled from a Gaussian distribution with standard deviation $\sigma$ centered at the cluster center, and $\ell_b$ is the blob function:

$\ell_b\!\left(\mathbf{r} \mid \theta\right) = \exp\!\left(-\alpha \dfrac{\|\mathbf{r}\|^{\beta}}{d(\mathbf{r}, \theta)}\right)$,   (36)

where $\alpha$ and $\beta$ are adjustable parameters. The rotation matrix $\mathbf{R}(\theta)$ is associated with the angle $\theta$, and $d(\mathbf{r}, \theta)$ is the "radius" of the ellipse with half-axes $L_x$ and $L_y$ evaluated along the direction of the rotated coordinate $\tilde{\mathbf{r}} = \mathbf{R}(\theta)\,\mathbf{r}$:

$d(\mathbf{r}, \theta) = \dfrac{L_x L_y}{\sqrt{\left(L_y \tilde{r}_x\right)^2 + \left(L_x \tilde{r}_y\right)^2}}\, \|\mathbf{r}\|$,   (37)

where $\tilde{r}_x$ and $\tilde{r}_y$ denote the components of $\tilde{\mathbf{r}}$. The parameters employed for generating the CLB images are summarized in Table I.
$\bar{K}$: 150, $\bar{N}$: 20, $L_x$: 5, $L_y$: 2, $\alpha$: 2.1, $\beta$: 0.5, $\sigma$: 12, amplitude: 100
The signal image was generated as a 2D symmetric Gaussian function centered in the image, with fixed amplitude and width. Mixed Poisson-Gaussian noise that models both photon noise and electronic noise was employed, with the Gaussian component having a fixed standard deviation. Examples of signal-present images are shown in the bottom row of Fig. 1.
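The mixed noise model can be simulated by drawing a Poisson realization of the noiseless measurement and adding Gaussian electronic noise. A minimal sketch (the function name and the clipping of negative means are our own conventions):

```python
import numpy as np

def poisson_gaussian_measurement(b, sigma, rng=None):
    """Mixed Poisson-Gaussian noise model.

    b:     noiseless measurement vector (interpreted as mean photon counts)
    sigma: standard deviation of the additive Gaussian electronic noise
    Draws a Poisson realization of b (negative means clipped to zero,
    since Poisson means must be nonnegative) and adds i.i.d. Gaussian
    noise.
    """
    rng = np.random.default_rng(rng)
    photons = rng.poisson(np.clip(b, 0, None)).astype(float)
    return photons + rng.normal(0.0, sigma, size=np.shape(b))
```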
To the best of our knowledge, MCMC methods have not previously been applied to the CLB object model with the mixed Poisson-Gaussian noise model. To provide a surrogate for ground truth for the HO, the traditional HO was computed by use of the covariance-matrix decomposition with the empirical background covariance matrix estimated using 400,000 background images.
IV-E Details of training neural networks
Here, details regarding the implementation of the supervised learning-based methods for approximating the IO and HO for the tasks above are described.
The train-validation-test scheme [36] was employed to evaluate the proposed supervised learning approaches. Specifically, the CNNs and SLNNs were trained on a training dataset. Subsequently, these neural networks were specified based upon a validation dataset, and the detection performances of the networks were finally assessed on a testing dataset. To prepare training datasets for the BKS detection tasks, 100,000 lumpy background images [37] and 400,000 CLB images [39] were generated. When training the CNNs for approximating IOs, to mitigate the overfitting that can be caused by insufficient training data, a "semi-online learning" method was proposed and employed. In this approach, the measurement noise was generated on-the-fly and added to noiseless images drawn from the finite datasets. The validation dataset and testing dataset both comprised 200 images for each class.
To approximate the HO test statistic, SLNNs that represent linear functions were trained by use of the proposed method employing the covariance-matrix decomposition described in Sec. III-B. This was possible because the noise models for the considered detection tasks were known. At each iteration of the training process, the parameters of the SLNNs were updated by minimizing the error function of Eq. (24) on mini-batches drawn from the training dataset. Specifically, when training the SLNN for the SKE/BKE detection task, the signal and background that were known exactly were employed and each mini-batch contained the fixed signal image and background image. When training the SLNNs for the SKE/BKS detection tasks, the known signals were employed and each mini-batch contained 200 background images and the fixed signal image. For training the SLNN for the SKS/BKS detection task, each mini-batch contained 200 background images and 200 signal images. The weight vector that produced the maximum $\mathrm{SNR}_t$ value evaluated on the validation dataset was selected to approximate the Hotelling template. The feasibility of the proposed methods for approximating the HO from a reduced number of images was also investigated. Specifically, SLNNs were trained for the SKE/BKS detection task with the CLB model by minimizing Eq. (21) and Eq. (24) on datasets comprising 2000 labeled measurements (1000 signal-present images and 1000 signal-absent images) and 2000 background images, respectively.
As opposed to the case of the HO approximation, where the network architecture is known to be linear, to specify the CNN architecture for approximating the IO, a family of CNNs that possess different numbers of convolutional (CONV) layers was explored. Specifically, an initial CNN having one CONV layer was first trained by minimizing the cross-entropy described in Eq. (14). Subsequently, CNNs having additional CONV layers were trained according to Eq. (14) until an added layer did not significantly decrease the cross-entropy on a validation dataset. The cross-entropy was considered to be significantly decreased if its decrement was at least a specified fraction of that produced by the previous CNN. Finally, the CNN having the minimum validation cross-entropy was selected as the optimal CNN in the explored architecture family. For all the considered CNN architectures in this family, each CONV layer comprised 32 filters with a fixed spatial support and was followed by a LeakyReLU activation function [50]; a max-pooling layer [51] following the last CONV layer was employed to subsample the feature maps; and, finally, a fully connected (FC) layer using a sigmoid activation function computed the posterior probability. It should be noted that these architecture parameters were determined heuristically and may not be optimal for many signal detection tasks. One instance of the implemented CNN architecture is illustrated in Fig. 2. These CNNs were trained by minimizing the error function defined in Eq. (14) on mini-batches at each iteration. Each mini-batch contained 200 signal-absent images and 200 signal-present images. Because the HO detection performance is a lower bound on the IO detection performance, the selected optimal CNN should not perform worse than the SLNN-approximated HO (SLNN-HO) on the corresponding signal detection task if that CNN approximates the IO. If this occurs, the architecture parameters need to be re-specified and a different family of CNN architectures should be considered.

The Adam algorithm [52], which is a stochastic gradient descent algorithm, was employed in TensorFlow [53] to minimize the error functions for approximating the IO and HO. All networks were trained on a single NVIDIA TITAN X GPU.

V Results
V-A SKE/BKE signal detection task
V-A1 HO approximation
A linear SLNN was trained for 1000 mini-batches and the weight vector that produced the maximum $\mathrm{SNR}_t$ value evaluated on the validation dataset was selected to approximate the Hotelling template. The linear templates employed by the SLNN-HO and the analytical HO are shown in Fig. 3. The results corresponding to the SLNN-HO closely approximate those of the analytical HO.
The ROC curve produced by the SLNN-HO (purple dashed curve) is compared to that produced by the analytical HO (yellow curve) in Fig. 4 (b). These two curves nearly overlap.
V-A2 IO approximation
The CNNs having one to three CONV layers were trained for 100,000 mini-batches and the corresponding validation cross-entropy values are plotted in Fig. 4 (a). The validation cross-entropy was not significantly decreased after adding the third CONV layer. Therefore, no further CONV layers were added, and the CNN having the minimum validation cross-entropy, which was the CNN possessing 3 CONV layers, was selected. The detection performance of this selected CNN was evaluated on the testing dataset and the resulting AUC value was 0.890, which was greater than that of the SLNN-HO (i.e., 0.831). Subsequently, the selected CNN was employed to approximate the IO. The testing ROC curve of the CNN-approximated IO (CNN-IO) (red dashed curve) was compared to that of the analytical IO (blue curve) in Fig. 4 (b). The efficiency of the CNN-IO, computed as the squared ratio of the detectability index [54] of the CNN-IO to that of the IO, and the mean squared error (MSE) of the posterior probabilities computed by the analytical IO and the CNN-IO were evaluated on the testing dataset.
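The efficiency quoted above follows from the detectability index $d_A$, which relates to the AUC through $\mathrm{AUC} = \Phi(d_A / \sqrt{2})$ [54]. A small standard-library sketch (the function names are our own, and the AUC values in the test are merely illustrative):

```python
from statistics import NormalDist

def detectability_index(auc):
    """Detectability index d_A from an AUC value,
    inverting AUC = Phi(d_A / sqrt(2))."""
    return 2 ** 0.5 * NormalDist().inv_cdf(auc)

def efficiency(auc_observer, auc_reference):
    """Observer efficiency: squared ratio of the observer's
    detectability index to that of the reference observer."""
    return (detectability_index(auc_observer) / detectability_index(auc_reference)) ** 2
```

An observer with the same AUC as the reference has efficiency 1, and a chance-level AUC of 0.5 corresponds to $d_A = 0$.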
VB SKE/BKS signal detection task with lumpy background
VB1 HO approximation
The SLNN was trained for 1000 mini-batches (i.e., 2 epochs), and the weight vector that produced the maximum value evaluated on the validation dataset was selected to approximate the Hotelling template. The linear templates employed by the SLNN-HO and the traditional HO are shown in Fig. 5. The results corresponding to the SLNN-HO closely approximate those of the traditional HO.

The ROC curves corresponding to the traditional HO (yellow curve) and the SLNN-HO (purple dashed curve) are compared in Fig. 6 (b). The two ROC curves nearly overlap.
VB2 IO approximation
The CNNs having 1, 3, 5, and 7 CONV layers were trained for 100,000 mini-batches (i.e., 200 epochs), and the corresponding validation cross-entropy values are plotted in Fig. 6 (a). There was no significant difference in the validation cross-entropy between the CNNs having 5 and 7 CONV layers. Therefore, we stopped adding CONV layers, and the CNN having the minimum validation cross-entropy, which was the CNN possessing 7 CONV layers, was selected. The selected CNN was evaluated on the testing dataset, and the resulting AUC value was 0.907, which was greater than that of the SLNN-HO (i.e., 0.808). Subsequently, the selected CNN was employed to approximate the IO. The testing ROC curve of the CNN-IO (red dashed curve) is compared to that of the MCMC-computed IO (MCMC-IO) (blue curve) in Fig. 6 (b). The efficiency of the CNN-IO was with respect to the MCMC-IO, and the MSE of the posterior probabilities computed by the CNN-IO and the MCMC-IO was . These quantities were evaluated on the testing dataset.
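The posterior-MSE comparison can be sketched as follows. Assuming equal prior probabilities, the posterior implied by a likelihood ratio Λ is p(H₁|g) = Λ/(1+Λ), which is what a sigmoid-output CNN trained with cross-entropy approximates; the likelihood-ratio and posterior values below are hypothetical.

```python
import numpy as np

def posterior_from_likelihood_ratio(lr):
    """With equal priors, p(H1 | g) = LR / (1 + LR)."""
    lr = np.asarray(lr, dtype=float)
    return lr / (1.0 + lr)

def posterior_mse(p_reference, p_cnn):
    """Mean squared error between two sets of posterior probabilities."""
    return float(np.mean((np.asarray(p_reference) - np.asarray(p_cnn)) ** 2))

# Hypothetical values: MCMC-computed likelihood ratios vs. CNN posteriors.
lr_mcmc = [0.5, 2.0, 8.0]
p_cnn = [0.34, 0.66, 0.90]
mse = posterior_mse(posterior_from_likelihood_ratio(lr_mcmc), p_cnn)
```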
VC SKS/BKS signal detection task with lumpy background
VC1 HO approximation
A linear SLNN was trained for 1000 mini-batches (i.e., 2 epochs), and the weight vector that produced the maximum value evaluated on the validation dataset was selected to approximate the Hotelling template. The linear templates employed by the SLNN-HO and the traditional HO are shown in Fig. 7. The results corresponding to the SLNN-HO closely approximate those of the traditional HO.

The ROC curves corresponding to the SLNN-HO (purple dashed curve) and the traditional HO (yellow curve) are compared in Fig. 8 (b). The two ROC curves nearly overlap. As expected, the HO performed nearly at chance on this task.
VC2 IO approximation
Convolutional neural networks having 1, 5, 9, and 13 CONV layers were trained for 300,000 mini-batches (i.e., 600 epochs), and the corresponding validation cross-entropy values are plotted in Fig. 8 (a). Because there was no significant decrease in the validation cross-entropy after adding 4 CONV layers to the CNN having 9 CONV layers, we stopped adding CONV layers, and the CNN having the minimum validation cross-entropy value, which was the CNN with 13 CONV layers, was selected. The selected CNN was evaluated on the testing dataset, and the resulting AUC value was 0.853, which was greater than that of the SLNN-HO (i.e., 0.508). Subsequently, the selected CNN was employed to approximate the IO. The testing ROC curve produced by the CNN-IO (red dashed curve) is compared to that produced by the MCMC-IO (blue curve) in Fig. 8 (b). The efficiency of the CNN-IO was with respect to the MCMC-IO, and the MSE of the posterior probabilities computed by the CNN-IO and the MCMC-IO was . These quantities were evaluated on the testing dataset.
VC3 CNN visualization
The feature maps extracted by the CONV layers enable us to understand how CNNs extract task-specific features for performing signal detection tasks. In this case, the 32 subsampled feature maps output from the max-pooling layer were weighted by the weight parameters of the last FC layer and then summed to produce a single 2-D image for visualization. This image is referred to as the signal feature map and is shown in Fig. 9. The signal to be detected is nearly invisible in the signal-present measurements but can be easily observed in the signal feature map. This illustrates the ability of CNNs to perform signal detection tasks.
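The construction of the signal feature map described above can be sketched as follows; the array shapes and the assumption that the last FC layer contributes one scalar weight per feature map are illustrative simplifications of the actual network.

```python
import numpy as np

def signal_feature_map(feature_maps, fc_weights):
    """Weight each pooled feature map by its FC-layer weight and sum them.

    feature_maps: array of shape (32, H, W), the max-pooling outputs;
    fc_weights: array of shape (32,), one scalar weight per feature map
    (an illustrative simplification of the FC layer's weight layout).
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    fc_weights = np.asarray(fc_weights, dtype=float)
    # Contract the feature-map axis against the weight vector.
    return np.tensordot(fc_weights, feature_maps, axes=([0], [0]))

# Toy example with random feature maps and uniform FC weights.
maps = np.random.default_rng(1).normal(size=(32, 8, 8))
weights = np.ones(32) / 32.0
sfm = signal_feature_map(maps, weights)      # a single 2-D image, shape (8, 8)
```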
VD SKE/BKS signal detection task with clustered lumpy background
VD1 HO approximation
The SLNN was trained for 40,000 mini-batches (i.e., 20 epochs), and the weight vector that produced the maximum validation value was selected to approximate the Hotelling template. The traditional HO template and the SLNN-HO template are compared in Fig. 10. The results corresponding to the SLNN-HO closely approximate those of the traditional HO.
The ROC curve of the SLNN-HO (yellow dashed curve) is compared to that of the traditional HO (red curve) in Fig. 11 (b). The two curves nearly overlap.
VD2 IO approximation
Convolutional neural networks having one to three CONV layers were trained for 100,000 mini-batches (i.e., 50 epochs), and the corresponding validation cross-entropy values are plotted in Fig. 11 (a). Because the validation cross-entropy was not significantly decreased by adding the third CONV layer, we stopped adding CONV layers, and the CNN having the minimum validation cross-entropy value, which was the CNN with three CONV layers, was selected. The detection performance of this selected CNN was evaluated on the testing dataset, and the resulting AUC value was 0.887, which was greater than that of the SLNN-HO (i.e., 0.845). Subsequently, the selected CNN was employed to approximate the IO. The CNN-IO was evaluated on the testing dataset, and the resulting ROC curve is plotted in Fig. 11 (b). To show how the signal detection performance varied as the number of CONV layers was increased, the AUC values evaluated on the testing dataset for the CNNs with one to three CONV layers are shown in Fig. 12. These AUC values were estimated by use of the “proper” binormal model [41, 42]. The AUC value increased as more CONV layers were employed, until it converged.
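The depth-selection rule applied throughout this section (add CONV layers until the validation cross-entropy no longer decreases significantly, then keep the depth with the minimum value) can be sketched as follows; the cross-entropy values and the tolerance are hypothetical.

```python
def select_depth(val_ce_by_depth, tol=1e-3):
    """Given (num_conv_layers, validation cross-entropy) pairs ordered by
    increasing depth, stop once the cross-entropy no longer decreases by at
    least `tol`, and return the explored depth with the minimum value."""
    depths = [d for d, _ in val_ce_by_depth]
    losses = [ce for _, ce in val_ce_by_depth]
    for i in range(1, len(losses)):
        if losses[i - 1] - losses[i] < tol:          # no significant decrease
            explored = losses[: i + 1]
            return depths[explored.index(min(explored))]
    # All depths improved significantly; keep the deepest explored minimum.
    return depths[losses.index(min(losses))]

# Hypothetical cross-entropies for CNNs with one, two, and three CONV layers.
best_depth = select_depth([(1, 0.62), (2, 0.55), (3, 0.5495)])
```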
Because MCMC applications to the CLB object model have not been reported to date, validation of the IO approximation could not be provided in this case. To the best of our knowledge, we are the first to approximate the IO test statistic for the CLB object model.
VD3 HO approximation from a reduced number of images
To solve the dimensionality problem of inverting a large covariance matrix when computing the Hotelling template, the matrix-inversion lemma has been employed, in which the covariance matrix is approximated by use of a small number of images [1]. However, this method can introduce a significant positive bias in the estimate of [17]. To investigate the ability of our proposed methods to approximate the HO performance when a small dataset is employed, the linear SLNNs were trained by minimizing Eq. (21) and Eq. (24) on 2000 noisy measurements and 2000 background images, respectively, for 400 epochs. In the training processes, overfitting occurred, as revealed by the curves of the validation value with respect to the number of epochs shown in Fig. 13.

However, an early-stopping strategy can be employed in which training is stopped at the epoch having the maximum validation value. The values of , computed according to Eq. (10), evaluated at the final training epoch and at the epoch having the maximum validation value are shown in Table II. These data reveal that overfitting caused a significant positive bias, while the early-stopping strategy accurately approximated the reference value, which was computed by using the Hotelling template of the traditional HO shown in Fig. 10 (a). The Hotelling template was also computed by using the matrix-inversion lemma [1] on 2000 background images, and the corresponding estimate had a significant positive bias, shown in Table II, as observed by others [17].
Table II:

Methods | Final epoch | Early-stopping
Minimizing Eq. (21) | 4.0421 | 2.0940
Minimizing Eq. (24) | 3.1101 | 2.1380
Matrix-inversion lemma | 5.7979 | —
Reference | 2.1075
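The early-stopping strategy described above can be sketched as follows; the per-epoch training and validation routines are placeholders, and the toy metric sequence is hypothetical.

```python
def train_with_early_stopping(train_one_epoch, validate, n_epochs):
    """Train for n_epochs, remembering the weights that achieved the best
    validation figure of merit; return those weights and the best metric."""
    best_metric, best_weights = float("-inf"), None
    weights = None
    for _ in range(n_epochs):
        weights = train_one_epoch(weights)
        metric = validate(weights)
        if metric > best_metric:
            best_metric, best_weights = metric, weights
    return best_weights, best_metric

# Toy example: the validation metric peaks at the fourth epoch, then the
# model overfits and the metric degrades (values are hypothetical).
metrics = [0.5, 0.8, 0.9, 1.1, 0.7, 0.6]
best_w, best_m = train_with_early_stopping(
    lambda w: (w or 0) + 1,          # placeholder "training": count epochs
    lambda w: metrics[w - 1],        # placeholder validation lookup
    n_epochs=6,
)
```

Selecting the weights at the best-validation epoch, rather than the final epoch, is what removes the positive bias reported in Table II.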
VI Discussion and Conclusion
The proposed supervised learning-based method that employs CNNs to approximate the IO test statistic represents an alternative to conventional numerical approaches, such as MCMC methods, for use in optimizing medical imaging systems and data-acquisition designs. Although theoretical convergence properties exist for MCMC methods, practical issues, such as the design of the proposal densities from which object samples are drawn, need to be addressed for each object model considered. Moreover, current applications of MCMC methods have been limited to specific object models that include parameterized torso phantoms [9], lumpy background models [2], and a binary texture model [8]. Supervised learning-based approaches may be easier to deploy with sophisticated object models than are MCMC methods. To demonstrate this, in the numerical study, we applied the proposed supervised learning method to a CLB object model, for which the IO computation has not been addressed by MCMC methods to date [8]. A practical advantage of the proposed method is that supervised learning-based methods are becoming widespread in their usage, and many researchers are becoming experienced in training feedforward ANNs.
A challenge in approximating the IO by use of CNNs is the specification of the collection of model architectures to be systematically explored. In this study, we explored a family of CNNs that possess different numbers of CONV layers. By adding more CONV layers, the representation capacity of the network is increased and the test statistic can be more accurately approximated. This study does not provide methods for determining other architecture parameters, such as the number of FC layers and the size of the convolutional filters. Recent work [55] proposed a method that optimizes the network architecture during the training process. This represents a possible approach for jointly optimizing the network architecture and weights to approximate the IO test statistic.
We also proposed a supervised learning-based method that uses a simple linear SLNN to approximate the HO, which is the optimal linear observer and sets a lower bound on the IO performance. The proposed methodology directly learns the Hotelling template without estimating and inverting covariance matrices. Accordingly, the proposed method can scale well to large images. When approximating the HO test statistic, the selection of the network architecture is not an issue because the HO test statistic depends linearly on the input image, and a linear SLNN suffices to represent linear functions. We also provided an alternative method to learn the HO by use of a covariance-matrix decomposition. The feasibility of both methods to learn the HO from a reduced number of images was investigated. For the case where 2000 clustered lumpy background images of the considered dimension were employed to approximate the HO, our proposed learning-based methods could still produce accurate estimates of by incorporating an early-stopping strategy.
Numerous topics remain for future investigation. With regard to approximating IOs by use of experimental images, there is a need to investigate methods to train large CNN models on limited training data. To accomplish this, one may investigate transfer learning [56] or domain adaptation methods [57] that learn features of images in a target domain (e.g., experimental images) by use of images in a source domain (e.g., computer-simulated images). One may also employ the method proposed by Kupinski et al. [43] or train a generative adversarial network [58] to estimate a stochastic object model (SOM) from experimental images in order to produce large datasets. Finally, it will be important to extend the proposed learning-based methods to more complicated tasks, such as the joint detection and localization of a signal.

Acknowledgment
This research was supported in part by NIH awards EB020168 and EB020604 and NSF award DMS-1614305.
References
 [1] H. H. Barrett and K. J. Myers, Foundations of Image Science. John Wiley & Sons, 2013.
 [2] M. A. Kupinski, J. W. Hoppin, E. Clarkson, and H. H. Barrett, “Ideal-Observer computation in medical imaging with use of Markov-Chain Monte Carlo techniques,” JOSA A, vol. 20, no. 3, pp. 430–438, 2003.
 [3] S. Park, H. H. Barrett, E. Clarkson, M. A. Kupinski, and K. J. Myers, “Channelized-Ideal Observer using Laguerre-Gauss channels in detection tasks involving non-Gaussian distributed lumpy backgrounds and a Gaussian signal,” JOSA A, vol. 24, no. 12, pp. B136–B150, 2007.
 [4] S. Park and E. Clarkson, “Efficient estimation of Ideal-Observer performance in classification tasks involving high-dimensional complex backgrounds,” JOSA A, vol. 26, no. 11, pp. B59–B71, 2009.
 [5] F. Shen and E. Clarkson, “Using Fisher information to approximate Ideal-Observer performance on detection tasks for lumpy-background images,” JOSA A, vol. 23, no. 10, pp. 2406–2414, 2006.
 [6] R. F. Wagner and D. G. Brown, “Unified SNR analysis of medical imaging systems,” Physics in Medicine & Biology, vol. 30, no. 6, p. 489, 1985.
 [7] Z. Liu, D. C. Knill, D. Kersten et al., “Object classification for human and Ideal Observers,” Vision Research, vol. 35, no. 4, pp. 549–568, 1995.
 [8] C. K. Abbey and J. M. Boone, “An Ideal Observer for a model of X-ray imaging in breast parenchymal tissue,” in International Workshop on Digital Mammography. Springer, 2008, pp. 393–400.
 [9] X. He, B. S. Caffo, and E. C. Frey, “Toward realistic and practical Ideal Observer (IO) estimation for the optimization of medical imaging systems,” IEEE Transactions on Medical Imaging, vol. 27, no. 10, pp. 1535–1543, 2008.
 [10] H. H. Barrett, K. J. Myers, C. Hoeschen, M. A. Kupinski, and M. P. Little, “Taskbased measures of image quality and their relation to radiation dose and patient risk,” Physics in Medicine & Biology, vol. 60, no. 2, p. R1, 2015.
 [11] I. Reiser and R. Nishikawa, “Taskbased assessment of breast tomosynthesis: Effect of acquisition parameters and quantum noise,” Medical Physics, vol. 37, no. 4, pp. 1591–1600, 2010.
 [12] A. A. Sanchez, E. Y. Sidky, and X. Pan, “Taskbased optimization of dedicated breast CT via Hotelling observer metrics,” Medical Physics, vol. 41, no. 10, 2014.
 [13] S. J. Glick, S. Vedantham, and A. Karellas, “Investigation of optimal kVp settings for CT mammography using a flat-panel imager,” in Medical Imaging 2002: Physics of Medical Imaging, vol. 4682. International Society for Optics and Photonics, 2002, pp. 392–403.
 [14] H. H. Barrett, T. Gooley, K. Girodias, J. Rolland, T. White, and J. Yao, “Linear discriminants and image quality,” Image and Vision Computing, vol. 10, no. 6, pp. 451–460, 1992.
 [15] H. H. Barrett, J. Yao, J. P. Rolland, and K. J. Myers, “Model observers for assessment of image quality,” Proceedings of the National Academy of Sciences, vol. 90, no. 21, pp. 9758–9765, 1993.
 [16] H. H. Barrett, K. J. Myers, B. D. Gallas, E. Clarkson, and H. Zhang, “Megalopinakophobia: its symptoms and cures,” in Medical Imaging 2001: Physics of Medical Imaging, vol. 4320. International Society for Optics and Photonics, 2001, pp. 299–308.
 [17] M. A. Kupinski, E. Clarkson, and J. Y. Hesterman, “Bias in Hotelling observer performance computed from finite data,” in Medical Imaging 2007: Image Perception, Observer Performance, and Technology Assessment, vol. 6515. International Society for Optics and Photonics, 2007, p. 65150S.
 [18] H. H. Barrett, C. K. Abbey, B. D. Gallas, and M. P. Eckstein, “Stabilized estimates of Hotelling-observer detection performance in patient-structured noise,” in Medical Imaging 1998: Image Perception, vol. 3340. International Society for Optics and Photonics, 1998, pp. 27–44.
 [19] B. D. Gallas and H. H. Barrett, “Validating the use of channels to estimate the ideal linear observer,” JOSA A, vol. 20, no. 9, pp. 1725–1738, 2003.
 [20] J. G. Brankov, Y. Yang, L. Wei, I. El Naqa, and M. N. Wernick, “Learning a channelized observer for image quality assessment,” IEEE Transactions on Medical Imaging, vol. 28, no. 7, p. 991, 2009.

 [21] M. N. Wernick, Y. Yang, J. G. Brankov, G. Yourganov, and S. C. Strother, “Machine learning in medical imaging,” IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 25–38, 2010.
 [22] F. Massanes and J. G. Brankov, “Evaluation of CNN as anthropomorphic model observer,” in Medical Imaging 2017: Image Perception, Observer Performance, and Technology Assessment, vol. 10136. International Society for Optics and Photonics, 2017, p. 101360Q.
 [23] M. Alnowami, G. Mills, M. Awis, P. Elangovanr, M. Patel, M. Halling-Brown, K. Young, D. R. Dance, and K. Wells, “A deep learning model observer for use in alternative forced-choice virtual clinical trials,” in Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment, vol. 10577. International Society for Optics and Photonics, 2018, p. 105770Q.
 [24] F. K. Kopp, M. Catalano, D. Pfeiffer, E. J. Rummeny, and P. B. Noël, “Evaluation of a machine learning based model observer for X-ray CT,” in Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment, vol. 10577. International Society for Optics and Photonics, 2018, p. 105770S.
 [25] F. K. Kopp, M. Catalano, D. Pfeiffer, A. A. Fingerle, E. J. Rummeny, and P. B. Noël, “CNN as model observer in a liver lesion detection task for X-ray computed tomography: A phantom study,” Medical Physics, 2018.
 [26] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
 [27] M. A. Kupinski, D. C. Edwards, M. L. Giger, and C. E. Metz, “Ideal Observer approximation using Bayesian classification neural networks,” IEEE Transactions on Medical Imaging, vol. 20, no. 9, pp. 886–899, 2001.
 [28] W. Zhou and M. A. Anastasio, “Learning the Ideal Observer for SKE detection tasks by use of convolutional neural networks,” in Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment, vol. 10577. International Society for Optics and Photonics, 2018, p. 1057719.
 [29] C. E. Metz, “ROC methodology in radiologic imaging.” Investigative Radiology, vol. 21, no. 9, pp. 720–733, 1986.
 [30] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
 [31] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.

 [32] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face recognition: A convolutional neural-network approach,” IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98–113, 1997.
 [33] D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber, “Multi-column deep neural network for traffic sign classification,” Neural Networks, vol. 32, pp. 333–338, 2012.
 [34] C. Garcia and M. Delakis, “Convolutional face finder: A neural architecture for fast and robust face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1408–1423, 2004.
 [35] W. Rawat and Z. Wang, “Deep convolutional neural networks for image classification: A comprehensive review,” Neural Computation, vol. 29, no. 9, pp. 2352–2449, 2017.
 [36] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning. MIT press Cambridge, 2016, vol. 1.
 [37] J. Rolland and H. H. Barrett, “Effect of random background inhomogeneity on observer detection performance,” JOSA A, vol. 9, no. 5, pp. 649–658, 1992.
 [38] S. Park, M. A. Kupinski, E. Clarkson, and H. H. Barrett, “Ideal-Observer performance under signal and background uncertainty,” in Biennial International Conference on Information Processing in Medical Imaging. Springer, 2003, pp. 342–353.
 [39] F. O. Bochud, C. K. Abbey, and M. P. Eckstein, “Statistical texture synthesis of mammographic images with clustered lumpy backgrounds,” Optics Express, vol. 4, no. 1, pp. 33–43, 1999.
 [40] C. Metz, “Rockit user’s guide,” Chicago, Department of Radiology, University of Chicago, 1998.
 [41] C. E. Metz and X. Pan, “‘Proper’ binormal ROC curves: theory and maximum-likelihood estimation,” Journal of Mathematical Psychology, vol. 43, no. 1, pp. 1–33, 1999.
 [42] L. L. Pesce and C. E. Metz, “Reliable and computationally efficient maximumlikelihood estimation of “proper” binormal ROC curves,” Academic Radiology, vol. 14, no. 7, pp. 814–829, 2007.
 [43] M. A. Kupinski, E. Clarkson, J. W. Hoppin, L. Chen, and H. H. Barrett, “Experimental determination of object statistics from noisy images,” JOSA A, vol. 20, no. 3, pp. 421–429, 2003.
 [44] E. Clarkson and H. H. Barrett, “Approximations to Ideal-Observer performance on signal-detection tasks,” Applied Optics, vol. 39, no. 11, pp. 1783–1793, 2000.
 [45] M. P. Eckstein and C. K. Abbey, “Model observers for signal-known-statistically tasks (SKS),” in Medical Imaging 2001: Image Perception and Performance, vol. 4324. International Society for Optics and Photonics, 2001, pp. 91–103.
 [46] Y. Zhang, B. T. Pham, and M. P. Eckstein, “Automated optimization of JPEG 2000 encoder options based on model observer performance for detecting variable signals in X-ray coronary angiograms,” IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 459–474, 2004.
 [47] C. Castella, M. Eckstein, C. Abbey, K. Kinkel, F. Verdun, R. Saunders, E. Samei, and F. Bochud, “Mass detection on mammograms: influence of signal shape uncertainty on human and model observers,” JOSA A, vol. 26, no. 2, pp. 425–436, 2009.
 [48] H. H. Barrett, K. J. Myers, N. Devaney, and C. Dainty, “Objective assessment of image quality. IV. Application to adaptive optics,” JOSA A, vol. 23, no. 12, pp. 3080–3105, 2006.
 [49] H. C. Gifford, M. A. King, P. H. Pretorius, and R. G. Wells, “A comparison of human and model observers in multislice LROC studies,” IEEE Transactions on Medical Imaging, vol. 24, no. 2, pp. 160–169, 2005.
 [50] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for simplicity: The all convolutional net,” arXiv preprint arXiv:1412.6806, 2014.
 [51] D. Scherer, A. Müller, and S. Behnke, “Evaluation of pooling operations in convolutional architectures for object recognition,” in Artificial Neural Networks–ICANN 2010. Springer, 2010, pp. 92–101.
 [52] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [53] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “TensorFlow: a system for large-scale machine learning,” in OSDI, vol. 16, 2016, pp. 265–283.
 [54] S. Park, E. Clarkson, M. A. Kupinski, and H. H. Barrett, “Efficiency of the human observer detecting random signals in random backgrounds,” JOSA A, vol. 22, no. 1, pp. 3–16, 2005.
 [55] C. Cortes, X. Gonzalvo, V. Kuznetsov, M. Mohri, and S. Yang, “Adanet: Adaptive structural learning of artificial neural networks,” arXiv preprint arXiv:1607.01097, 2016.
 [56] J. Qiu, Q. Wu, G. Ding, Y. Xu, and S. Feng, “A survey of machine learning for big data processing,” EURASIP Journal on Advances in Signal Processing, vol. 2016, no. 1, p. 67, 2016.
 [57] Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” arXiv preprint arXiv:1409.7495, 2014.
 [58] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.