I. Introduction
Medical imaging systems are commonly optimized with consideration of a specific task [57]. Assessing the performance of such systems requires an objective metric for image quality (IQ) [8, 31, 40, 41, 49]. For signal detection tasks, the Bayesian ideal observer (IO) has been advocated for producing a figure of merit for assessing IQ because it maximizes the amount of task-specific information in the measurement data [54, 8, 31, 40, 41, 49]. For a binary signal detection task, the IO test statistic takes the form of a likelihood ratio. Using this likelihood ratio as a test statistic in turn maximizes the area under the receiver operating characteristic (ROC) curve [8, 54, 31, 40]. However, analytically determining the IO is generally difficult because it is typically a nonlinear function and requires complete knowledge of the statistical properties of the image data.

There has been recent progress in developing approximations for computing the IO test statistic [41, 49]. One line of research involves sampling-based methods that utilize Markov-chain Monte Carlo techniques to approximate the IO, but work in this area has so far been limited to relatively simple object models [21, 31, 41, 2]. Another recent development is the approximation of the IO with convolutional neural networks (CNNs) [63]. An alternative method for approximating the IO's performance employs variational Bayesian inference [14]. This line of research has shown promise for implementing task-specific optimization of sparse reconstruction methods.

A common surrogate for the often-intractable IO is the Hotelling observer (HO) [7, 44, 47, 18]. The HO implements the optimal linear discriminant for maximizing the signal-to-noise ratio of the test statistic [5, 9]. Implementing the HO requires the estimation and inversion of a covariance matrix, which quickly grows as the image size increases and can become intractable to compute [6]. There are a few different strategies for mitigating the computational cost of inverting a large matrix [7]. One method is to avoid a direct inversion by implementing an iterative approach to estimate the test statistic [8]. If the measurement noise covariance matrix is known and an estimate of the background covariance matrix is available, covariance matrix decomposition is a viable option [8], with the caveat that certain situations can lead to significant bias in the performance [28]. Alternatively, the test statistic can be learned directly from the images, provided that a sufficient amount of data is available [63, 62]. The most commonly employed method, however, is the implementation of channels that approximate the HO [4, 17, 41]. These channels are linear transformations applied to reduce the dimensionality of the data, decreasing the computational cost of calculating the HO.
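The iterative strategy mentioned above can be sketched briefly: rather than forming an explicit matrix inverse, the quantity of interest is obtained by iteratively solving the symmetric positive-definite linear system that the inverse covariance matrix defines. The following conjugate-gradient sketch is illustrative only; the function name, stopping rule, and toy problem sizes are assumptions, not details from the paper.

```python
import numpy as np

def cg_solve(K, b, iters=200, tol=1e-10):
    """Plain conjugate-gradient solve of K x = b for a symmetric
    positive-definite K, avoiding an explicit matrix inversion."""
    x = np.zeros_like(b)
    r = b - K @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(iters):
        Kp = K @ p
        alpha = rs / (p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Only matrix-vector products with `K` are required, which is what makes the approach attractive when `K` is too large to invert directly.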
Conceptually, channels function by projecting the high-dimensional image data onto a low-dimensional manifold [52]. Image data frequently can be compressed into a reduced-dimensionality manifold [11, 53]. Ideally, the manifold embedding preserves the important features of the data [58]. This projection operation is defined by the channel matrix. Channels are known as efficient if they approximate the original observer's performance while reducing the dimensionality of the data [36]. Prior work on computing efficient channels includes Laguerre-Gauss (LG) channels [17], singular value decomposition (SVD) channels [39], partial least squares (PLS) [59], and the filtered channel observer (FCO) [15]. In addition to learning efficient channels, there are approaches that seek to mimic the human observer's performance. An early approach learned the relationship between channel features and human observer performance with a support vector machine [12]. Another approach investigated optimization with respect to the HO and human observers for accelerated MRI reconstruction [42].

Autoencoders (AEs) are a type of artificial neural network (ANN) characterized by a mirror structure, with the target output of the network similar to the input [46, 23, 22, 10, 16]. They are designed to learn a lower-dimensional representation of the data called an embedding. The portion of the network that transforms the input to the embedding is known as the encoder, and the portion that transforms the embedding back into the original data space is known as the decoder. A good embedding is capable of significant data compression while retaining most of the information from the original data. The data compression qualities of AEs make them desirable for many tasks, and they have been applied in state-of-the-art systems for classification [55], noise reduction [37], and regression [60]. The widespread success of the AE is due to its ability to generate low-dimensional representations of images, which increases the efficiency of further processing by attenuating noise and embedding the data to its most important components. In general, a linear AE with optimal weights projects the data onto a subspace spanned by its top principal directions [27].

In this work, the problem of learning task-informed embeddings with an AE is explored. The AE is modified to learn the optimal transformation matrix that maximizes the amount of task-specific information encoded in its latent states. This learning task is demonstrated to be equivalent to learning efficient channels for the HO. To the best of our knowledge, this is the first time a connection has been established between numerical observer channels and autoencoder-generated embeddings. The considered model is a linear AE with one hidden layer and one set of tied weights, as described below. Numerical studies are performed with binary signal detection tasks that involve a range of signals and backgrounds. The performance of the AE-learned channels in these studies is compared to state-of-the-art channelized methods. The potential advantages and limitations of this new approach are also discussed.
The remainder of this work is organized as follows. In Sec. II, an overview of binary signal detection theory is presented. The HO, CHO, and AE are also reviewed in that section. A novel methodology for learning channels for the location-known binary signal detection task using an AE is developed in Sec. III. The numerical studies and results of the proposed method for approximating the HO are included in Secs. IV and V, along with a comparison to other state-of-the-art methods. Finally, the paper concludes with a discussion of the work in Sec. VI.
II. Background
Consider the linear digital imaging system

$\mathbf{g} = \mathcal{H} f + \mathbf{n}$,   (1)

where $\mathbf{g} \in \mathbb{R}^M$ is the measured image data vector, $\mathcal{H}$ denotes a continuous-to-discrete (C-D) imaging operator that maps the object function to $\mathbb{R}^M$, $f(\mathbf{r})$ is the object function with the 2-D spatial coordinate $\mathbf{r}$, and $\mathbf{n}$ is the random measurement noise. The object function will be abbreviated as $f$ and can be either deterministic or stochastic, depending on the specification of the signal detection task.
II-A Formulation of binary signal detection tasks
The binary signal detection task considered involves the classification of an image by an observer into one of two hypotheses: signal-present ($H_1$) or signal-absent ($H_0$). The imaging processes under these two hypotheses can be described as

$H_0: \mathbf{g} = \mathbf{b} + \mathbf{n}$,   (2a)

$H_1: \mathbf{g} = \mathbf{b} + \mathbf{s} + \mathbf{n}$,   (2b)

where $\mathbf{b}$ and $\mathbf{s}$ represent a background and a signal object, respectively. Depending on the imaging task, these can be either random or fixed.
To perform a binary signal detection task, an observer computes a test statistic $t(\mathbf{g})$ that maps the measured image $\mathbf{g}$ to a real-valued scalar. This scalar is compared against a threshold $\tau$ to classify $\mathbf{g}$ as satisfying either $H_0$ or $H_1$. To characterize performance on the signal detection task, a receiver operating characteristic (ROC) curve can be plotted to depict the trade-off between the false-positive fraction (FPF) and the true-positive fraction (TPF) as the threshold $\tau$ is varied. The overall signal detection performance of the observer can be summarized by computing the area under the ROC curve (AUC) [34].

II-B Bayesian Ideal Observer and Hotelling Observer
The IO is optimal and sets the upper limit for observer performance on binary signal detection tasks. The IO test statistic is defined as any monotonic transformation of the likelihood ratio, which takes the form [8, 31, 30]

$\Lambda(\mathbf{g}) = \dfrac{p(\mathbf{g} \mid H_1)}{p(\mathbf{g} \mid H_0)}$,   (3)

where $p(\mathbf{g} \mid H_0)$ and $p(\mathbf{g} \mid H_1)$ are conditional probability density functions that describe the measured data $\mathbf{g}$ under hypotheses $H_0$ and $H_1$, respectively.

An alternative to the IO for assessing signal detection performance is the HO. The HO test statistic is defined as

$t_{\mathrm{HO}}(\mathbf{g}) = \mathbf{w}_{\mathrm{HO}}^T \mathbf{g}$,   (4)

where $\mathbf{w}_{\mathrm{HO}}$ is the observer template. Let $\bar{\mathbf{g}}_f$ denote the conditional mean of the image data given an object function $f$. Similarly, let $\bar{\mathbf{g}}_j$ denote the conditional mean $\bar{\mathbf{g}}_f$ averaged with respect to the object randomness associated with hypothesis $H_j$. The Hotelling template is defined as [8]

$\mathbf{w}_{\mathrm{HO}} = \mathbf{K}^{-1} \Delta\bar{\mathbf{g}}$,   (5)

where

$\mathbf{K} = \tfrac{1}{2}\left(\mathbf{K}_0 + \mathbf{K}_1\right)$,   (6)

$\Delta\bar{\mathbf{g}} = \bar{\mathbf{g}}_1 - \bar{\mathbf{g}}_0$.   (7)

Here, $\mathbf{K}_j$ is the covariance matrix of the measured data under the hypothesis $H_j$ and $\Delta\bar{\mathbf{g}}$ is the difference between the means of the measured data under the two hypotheses.
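For illustration, the Hotelling template can be estimated empirically from sample images. The sketch below is a minimal numpy version of Eqns. (5)-(7); the function name, array shapes, and the use of a linear solve in place of an explicit inversion are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def hotelling_template(g0, g1):
    """Sample-based Hotelling template. g0 and g1 are
    (num_images, num_pixels) arrays of signal-absent and
    signal-present image data, respectively."""
    dg = g1.mean(axis=0) - g0.mean(axis=0)                           # Eqn. (7)
    K = 0.5 * (np.cov(g0, rowvar=False) + np.cov(g1, rowvar=False))  # Eqn. (6)
    return np.linalg.solve(K, dg)                                    # Eqn. (5)
```

For well-conditioned problems, `np.linalg.solve` avoids forming the inverse covariance matrix explicitly.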
The signal-to-noise ratio (SNR) associated with the test statistic $t$ is another commonly employed FOM for assessing signal detection performance and is given by [9]

$\mathrm{SNR}_t = \dfrac{\bar{t}_1 - \bar{t}_0}{\sqrt{\tfrac{1}{2}\sigma_0^2 + \tfrac{1}{2}\sigma_1^2}}$,   (8)

where $\bar{t}_j$ and $\sigma_j^2$ are the mean and variance of $t$ under the hypothesis $H_j$ ($j = 0, 1$). While the IO maximizes the AUC of an observer, the HO maximizes the SNR of the test statistic [9].

II-C Channels
Computation of the HO can become intractable for large image sizes due to the cost of inverting the covariance matrix in Eqn. (5). Additionally, it may be difficult to estimate a full-rank covariance matrix in limited-data cases. To mitigate this problem, a channelized version of the image can be introduced as [9]

$\mathbf{v} = \mathbf{T}\mathbf{g}$,   (9)

where $\mathbf{v} \in \mathbb{R}^L$ is the channel-reduced image and $\mathbf{T}$ is an $L \times M$ matrix. The number of channels, $L$, determines the dimensionality reduction from the original data of size $M$. Applying the HO to the channel-reduced data yields the channelized HO (CHO) [9], with the test statistic taking the form

$t_{\mathrm{CHO}}(\mathbf{v}) = \Delta\bar{\mathbf{v}}^T \mathbf{K}_{\mathbf{v}}^{-1} \mathbf{v}$.   (10)

Here, $\Delta\bar{\mathbf{v}} = \mathbf{T}\Delta\bar{\mathbf{g}}$ and $\mathbf{K}_{\mathbf{v}} = \tfrac{1}{2}\left(\mathbf{K}_{\mathbf{v},0} + \mathbf{K}_{\mathbf{v},1}\right)$, where $\mathbf{K}_{\mathbf{v},j} = \mathbf{T}\mathbf{K}_j\mathbf{T}^T$ for $j = 0, 1$.
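The CHO pipeline of Eqns. (9)-(10) can be sketched end-to-end in a few lines. The function name, the (L, M) channel-matrix shape, and the sample-based covariance estimates below are illustrative assumptions; recomputing the channel statistics on every call is done only for brevity.

```python
import numpy as np

def cho_test_statistic(T, g0_train, g1_train, g_test):
    """Channelized HO test statistic for one test image.
    T: assumed (L, M) channel matrix; g0_train/g1_train:
    (N, M) training images; g_test: a single (M,) image."""
    v0, v1 = g0_train @ T.T, g1_train @ T.T              # Eqn. (9)
    dv = v1.mean(axis=0) - v0.mean(axis=0)
    Kv = 0.5 * (np.cov(v0, rowvar=False) + np.cov(v1, rowvar=False))
    return dv @ np.linalg.solve(Kv, T @ g_test)          # Eqn. (10)
```

Because the covariance estimation and inversion happen in the L-dimensional channel space rather than the M-dimensional image space, the computation stays tractable even for large images.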
It is desirable to minimize the number of channels to maximize computational efficiency, since the dimensionality of $\mathbf{v}$ is proportional to the number of channels. However, these channels should maximize the retained task-relevant information in order to provide an efficient approximation of the HO. Several methods exist for selecting efficient channels. One of the first was LG channels [36]. These channels are the product of a Gaussian function and a Laguerre polynomial, and were proposed due to their structural similarity with the Hotelling template for certain detection tasks. They are suitable for a smooth, rotationally symmetric signal on a lumpy background, but may have suboptimal performance for arbitrary signals and more complex backgrounds [59].
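For intuition, rotationally symmetric LG channels of the kind described above can be generated on a pixel grid. This sketch uses the standard Gaussian-times-Laguerre form from the channelized-observer literature; the grid centering, the width parameter `a`, and the per-channel normalization are illustrative choices rather than the paper's exact settings.

```python
import numpy as np
from numpy.polynomial.laguerre import lagval

def lg_channels(dim, a, n_channels):
    """Laguerre-Gauss channels on a dim x dim grid, returned as the
    columns of a (dim*dim, n_channels) matrix. Channel n is
    exp(-pi r^2 / a^2) * L_n(2 pi r^2 / a^2), normalized to unit norm."""
    c = (dim - 1) / 2.0
    y, x = np.mgrid[0:dim, 0:dim]
    r2 = (x - c) ** 2 + (y - c) ** 2
    chans = []
    for n in range(n_channels):
        coef = np.zeros(n + 1); coef[n] = 1.0       # selects L_n
        u = np.exp(-np.pi * r2 / a**2) * lagval(2 * np.pi * r2 / a**2, coef)
        chans.append((u / np.linalg.norm(u)).ravel())
    return np.stack(chans, axis=1)
```

The transpose of the returned matrix can then play the role of the channel matrix $\mathbf{T}$ in Eqn. (9).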
An alternative to LG channels is SVD channels [39]. These channels are singular vectors that form a basis for image vectors in the range of the imaging operator. The most efficient set of channels constructed with this method involves decomposing the noiseless signal image by use of the singular vectors and choosing the top $L$ of them to form the channel set. However, this method is computationally expensive and system-specific.
Two current state-of-the-art methods for generating efficient channels that work on arbitrary signals and backgrounds, without any specific knowledge of the imaging system, are partial least squares (PLS) [59] and the filtered channel observer (FCO) [15]. PLS applies a data reduction technique that iteratively constructs latent vectors that maximize the covariance between the data and the true image labels. PLS represents an attractive method for limited-data cases and/or large image sizes, and works well with noisy and heavily correlated data. However, the technique suffers a notable degradation of performance when the amount of available image data is small [59].
FCO channels were initially developed as anthropomorphic channels to approximate human signal detection performance for irregularly shaped signals [15]. However, FCO channels have also been explored as efficient channels for the HO [15, 3]. The FCO convolves a selected set of baseline channels with the signal before computing the observer template. For this work, LG channels were selected as the baseline set of channels due to both LG's past success [36] and similar decisions made with the FCO method in more recent work [3]. This realization of the FCO method will be referred to as convolutional LG.
II-D Neural Networks for Approximating the IO
A feedforward ANN is a system of computational units associated with tunable parameters called weights [48, 32]. A feedforward ANN is capable of approximating any continuous function if it has a sufficiently complex architecture [24, 25]. ANNs have been employed to form numerical observers, with a focus on directly estimating the test statistic [30, 61, 63]. Kupinski et al. [30] utilized conventional fully connected neural networks (FCNNs) to approximate the IO on low-dimensional extracted image features. Zhou and Anastasio extended this work to higher-dimensional data and allowed for native processing of image data by replacing the FCNN with a convolutional neural network [61, 63]. However, both of these approaches focus on learning the test statistic directly and may require a large amount of training data to accurately approximate the IO.
II-E Autoencoders
A specialized type of ANN is the autoencoder (AE) [46, 23, 22, 10, 16]. The AE is characterized by a mirror structure, with the input of the network similar to the target output. An AE has three distinct components: an encoder, an embedding, and a decoder. The encoder transforms the input to the embedding, which generally has a significantly reduced dimensionality compared to the input. The decoder transforms the embedding into the target output. In a canonical AE, the decoder is specified to reconstruct an approximation of the input to the encoder. AEs are frequently employed for their data compression properties in state-of-the-art systems for classification [55], regression [60], noise reduction [37, 13], and image recovery [35] tasks. Additional performance improvements can be made by injecting additional information into the AE training process. Studies have shown that exploiting a priori information through implicitly defined nonparametric functions can introduce task-specific information into the training of AEs [50, 51].

In contrast to previous work with ANNs, an AE is usually trained in an unsupervised way [55]. One aspect of AEs that has recently been considered is the concept of tied weights [56]. Tied weights further enforce the mirror-like structure of the AE by forcing the encoder and decoder matrices to be transposes of each other. Tied-weight AEs have been shown to perform similarly to untied-weight AEs, but require less data to train because of the reduction in parameters.
In general, the layers in an AE specify many sets of matrix multiplications with added bias terms and nonlinear transformations. By restricting the operations to matrix multiplications only, a linear AE is obtained. In this case, the encoder and decoder can each be described by a transformation matrix that maps the data to or from the embedding. Such a simplified network is considered in this work because its encoder has a natural parallel with the channel matrix in the CHO. The input to the network is a noisy image, and the target output is either the input image or a related version of it, depending on the task.
An optimization problem is solved to determine the weights of the AE by minimizing a reconstruction loss. The solution of the optimization problem is computed by minimizing the loss function using a variation of the backpropagation algorithm [26]. The traditional loss function for an AE is the mean squared error between the input and the output of the network. Given $N$ vectorized images $\mathbf{g}_i$ of size $M$, the traditional loss function corresponding to a zero-bias linear AE is [46]

$\mathcal{L}(\mathbf{W}, \mathbf{W}') = \dfrac{1}{N}\sum_{i=1}^{N} \left\| \mathbf{y}_i - \mathbf{W}'\mathbf{W}\mathbf{g}_i \right\|_2^2$,   (11)

where $\mathbf{W}$ and $\mathbf{W}'$ are weight matrices that parameterize the encoder and decoder of the AE, respectively. The target reconstruction is represented by $\mathbf{y}_i$, which can be the same as or different from the input data but is usually closely related. For example, in denoising problems the target output is a clean version of the input image.
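The loss of Eqn. (11) takes only a few lines of numpy. The function name and the row-vector data layout below are illustrative assumptions of this sketch.

```python
import numpy as np

def traditional_ae_loss(W_enc, W_dec, G, Y):
    """Mean-squared reconstruction error of a zero-bias linear AE.
    G: (N, M) input images as rows; Y: (N, M) target outputs;
    W_enc: (M, L) encoder; W_dec: (L, M) decoder."""
    recon = (G @ W_enc) @ W_dec      # encode, then decode
    return np.mean(np.sum((Y - recon) ** 2, axis=1))
```

Setting `Y = G` recovers the canonical AE; a denoising AE would instead pass clean images as `Y`.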
III. Method: Autoencoder-Learned Channels
A method for learning efficient channels for the CHO with an AE is described below. A connection between the AE weights and the CHO framework is established to relate the learned data embeddings to more traditional channels.
III-A Autoencoder Channels and Linear Autoencoders
The learned weights of an AE have an additional interpretation when considered in the framework of a signal detection task. The weights define a mapping from the high-dimensional image space to a low-dimensional embedding space. This is conceptually equivalent to the CHO channel matrix $\mathbf{T}$. The AE weights can be employed as channels for the CHO by setting $\mathbf{T} = \mathbf{W}$ in Eqn. (9). Intuitively, these AE-learned channels capture the data most important for reconstructing the image.
The loss function in Eqn. (11) causes the AE to encode the entirety of the input image. This makes the traditional AE suboptimal for learning channels because a significant portion of the data embedding is dedicated to reconstructing certain components of the background and noise that may not be highly relevant to the detection task. To circumvent this, as described below, information about the signal can be incorporated into the AE training process to preserve taskspecific information.
III-B Task-Specific Autoencoders
A novel modification of the loss function that improves the learned data embedding, and the resulting signal detection performance of the AE channels, is presented here. Ideally, the entirety of the AE embedding would be dedicated to task-specific information. This would minimize the proportion of the embedding dedicated to extraneous information and lead to a more efficient set of channels. By changing the AE's target reconstruction to just the mean signal image, the background and noise are suppressed during the reconstruction process. This results in an embedding in which the signal can be accurately represented. The new approach minimizes the MSE between the reconstructed image and the estimated signal image and takes the form

$\mathcal{L}_{\mathrm{task}}(\mathbf{W}, \mathbf{W}') = \dfrac{1}{N}\sum_{i=1}^{N} \left\| y_i \Delta\bar{\mathbf{g}} - \mathbf{W}'\mathbf{W}\mathbf{g}_i \right\|_2^2$,   (12)

where $\Delta\bar{\mathbf{g}}$ is defined in Eqn. (7) and $y_i$ is the indicator function that returns 1 if the signal is present in $\mathbf{g}_i$ and 0 otherwise. Note that this loss function uses label information and is thus a supervised learning algorithm. Treating the background as noise permits the entire capacity of the embedding to focus on the task-specific information. Using the signal template as the target image assists the training process in identifying an embedding that preserves task-specific information. The indicator function and the altered target also break the traditional AE's connection to principal directions [27]. As shown below, this modification to the loss function is capable of generating efficient channels for the CHO. A diagram of the AE with both the traditional and task-based approaches for the signal detection task is provided in Fig. 1, and sample reconstructions from AEs trained with both loss functions are shown in Fig. 2. Both the task-specific and traditional loss functions can be minimized by use of a gradient-descent method, with implementation details provided in Sec. IV-D.

IV. Numerical Studies
Numerical simulation studies were conducted to evaluate the performance of the proposed method for learning efficient channels for the CHO. All simulations addressed background-known-statistically (BKS) signal detection tasks. Four distinct binary signal detection tasks were considered. Using a lumpy background, a location-known task and a signal-known-statistically (SKS) task were considered. These tasks enabled the HO to be determined both by covariance matrix decomposition [8] and by direct computation according to Eqn. (5); these observers will be referred to as HO-CMD and HO-Direct, respectively. On a breast phantom background, two location-known signal detection tasks using signals of different shapes and sizes were considered. These tasks allowed for the evaluation of channelized methods on a more realistic medical imaging task. ROC curves were fit by use of a binormal model [34, 33, 38], with the fitted AUC values reported. The experimental results are reported in distinct sections based on the image background model, with the details for each signal detection task and the training of the neural networks given in the appropriate subsections.
IV-A Signal detection tasks that utilize a lumpy background model
Two different signal detection tasks were performed on a lumpy background model [45] with an idealized parallel-hole collimator system [31]. Further details about each of these components are provided below.
IV-A1 Lumpy Background
A stochastic lumpy object model was used as the background [45]:

$f_b(\mathbf{r}) = \sum_{n=1}^{N_l} l(\mathbf{r} - \mathbf{r}_n)$,   (13)

where $N_l$ is the number of lumps, sampled from a Poisson distribution with the mean set to 5, and $l(\mathbf{r} - \mathbf{r}_n)$ is the lump function, modeled by a symmetric 2-D Gaussian function with amplitude $a$ and width $s_l$:

$l(\mathbf{r} - \mathbf{r}_n) = a \exp\left( -\dfrac{\|\mathbf{r} - \mathbf{r}_n\|^2}{2 s_l^2} \right)$.   (14)

Here, $\mathbf{r}_n$ is the uniformly sampled position of the $n$-th lump. The magnitude and width of the lumps were set to frequently employed values. An example of a signal-present image in the dataset with a circular signal is shown in Fig. 3.
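A minimal generator for this background model might look as follows. The default amplitude and width below are placeholders, not the paper's values, and the function name and grid conventions are assumptions of this sketch.

```python
import numpy as np

def lumpy_background(dim=64, mean_lumps=5, amp=1.0, width=7.0, rng=None):
    """Stochastic lumpy background (Eqn. (13)): a Poisson-distributed
    number of symmetric 2-D Gaussian lumps at uniform positions."""
    if rng is None:
        rng = np.random.default_rng()
    y, x = np.mgrid[0:dim, 0:dim]
    b = np.zeros((dim, dim))
    for _ in range(rng.poisson(mean_lumps)):        # N_l ~ Poisson(mean_lumps)
        cx, cy = rng.uniform(0, dim, size=2)        # uniform lump position
        b += amp * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width ** 2))
    return b
```

Each call produces an independent background realization, as required for the BKS tasks described above.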
IV-A2 Imaging System
IV-A3 Signals
The signal function was a 2-D Gaussian function:

$f_s(\mathbf{r}) = A \exp\left( -\tfrac{1}{2} \left\| \mathbf{S}\mathbf{R}(\mathbf{r} - \mathbf{r}_c) \right\|^2 \right)$,   (16)

where $A$ is the amplitude and $\mathbf{r}_c$ is the coordinate of the signal location. Here, $\mathbf{R}$ is the Euclidean rotation matrix that rotates the Gaussian by an angle of $\theta$ and is given by

$\mathbf{R} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$,   (17)

and $\mathbf{S}$ is a scaling matrix that controls the width of the Gaussian along each axis and is given by

$\mathbf{S} = \begin{bmatrix} 1/\sigma_x & 0 \\ 0 & 1/\sigma_y \end{bmatrix}$.   (18)

For both experiments involving the lumpy background, the amplitude and widths of the elliptical Gaussian signal were fixed, as were the image size and the signal center. The value of $\theta$ varied depending on the type of task.
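The rotated elliptical Gaussian of Eqns. (16)-(18) can be rendered on a pixel grid as follows. The composition order of the scaling and rotation matrices, the function name, and the parameter choices are assumptions of this sketch rather than the paper's exact specification.

```python
import numpy as np

def gaussian_signal(dim, amp, center, sigmas, theta):
    """Rotated elliptical 2-D Gaussian signal on a dim x dim grid.
    center: (cx, cy) signal location; sigmas: per-axis widths;
    theta: rotation angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])           # rotation matrix, Eqn. (17)
    S = np.diag(1.0 / np.asarray(sigmas, dtype=float))  # scaling, Eqn. (18)
    y, x = np.mgrid[0:dim, 0:dim]
    d = np.stack([x - center[0], y - center[1]], axis=-1) @ (S @ R).T
    return amp * np.exp(-0.5 * np.sum(d ** 2, axis=-1))
```

Sampling `theta` from a small discrete set reproduces the SKS-style variation described in the detection tasks below.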
IV-A4 Detection Tasks
The first signal detection task employed a fixed value of $\theta$, forcing the signal to take the same orientation in each image. Thus, the signal location and shape were fixed. The signal template was computed according to Eqn. (7), which resulted in a noisy estimate of the signal. The second signal detection task sampled $\theta$ uniformly from a set of four angles, allowing four distinct orientations of the elliptical Gaussian. The mean signal was again computed with Eqn. (7), which resulted in a noisy estimate of the signal averaged across the four possible realizations.
IV-A5 Dataset Generation
A training set of 60 000 unique background images with noise was generated for the lumpy object model. The background images were generated separately from the signal image in Eqn. (16) using the appropriate background model. Each background image was summed with a unique noise vector drawn from an i.i.d. Gaussian distribution with a mean of 0 and a fixed standard deviation. These images were then paired, with half designated as signal-present and half as signal-absent. Each signal-present image was summed with the signal image to generate the final training dataset of 30 000 paired images. Another set of 5000 paired images was generated for determining the channel covariance matrix after the channels had been learned, and a further set of 5000 paired images was held out as a testing dataset.

IV-B Location-known tasks that utilize a breast phantom dataset
Two further signal detection tasks were performed on a breast phantom background employing the VICTRE dataset [3]. This dataset contains simulated digital mammography (DM) images and was employed previously in a location-known human observer study to evaluate imaging systems [3]. The images are divided into four categories of breast types of decreasing difficulty for lesion detection: extremely dense, heterogeneously dense, scattered fibroglandular, and fatty. The signals in the dataset are microcalcification clusters and spiculated masses. For each signal, there are associated signal-absent and signal-present images. The signal remains constant in location and shape throughout all the signal-present images, but a clean signal image is not available. An estimate of the signal is obtained from the difference of the mean signal-present and signal-absent images according to Eqn. (7), making this a location-known task [3].

For each type of signal, 12 500 total images were selected from the dataset to form training, validation, and testing sets of 5000, 625, and 625 paired images, respectively. The breast types selected maintained the proportions of the VICTRE study [3]. The signals were estimated by taking the mean of the signal-present images and subtracting the mean of the signal-absent images for the combined training and validation dataset. Sample images and estimated signals are included in Fig. 4.
IV-C AE Topology
The considered network topology was a tied-weight AE with no nonlinearities or bias terms. This structure parallels the CHO formulation in Eqn. (9), as the AE learns the transformation matrix $\mathbf{T}$. Tied weights were chosen because they couple the encoder and the decoder by enforcing $\mathbf{W}' = \mathbf{W}^T$, making the encoder a transpose of the decoder. This formulation prevents the loss of information that may exist solely in the decoder, since only the encoder is employed as the transformation matrix. Additionally, tied-weight AEs have fewer parameters to train and thus perform better in the limited-data experiments considered [20].
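To make the tied-weight construction concrete, the sketch below trains a linear tied-weight AE on the task-specific loss of Eqn. (12) by plain gradient descent with a hand-derived gradient (the paper itself used TensorFlow with the Adam optimizer). All sizes, the signal template, the signal amplitude, and the learning rate are toy assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, N = 16, 4, 200                  # toy sizes: pixels, channels, images
s_hat = np.zeros(M); s_hat[0] = 1.0   # hypothetical signal template (Eqn. (7))
labels = np.repeat([0.0, 1.0], N // 2)
# Toy "images": i.i.d. Gaussian background noise plus the signal when label=1.
X = rng.normal(size=(N, M)) + 2.0 * labels[:, None] * s_hat
Y = labels[:, None] * s_hat           # task-specific targets (Eqn. (12))

W = 1e-3 * rng.normal(size=(M, L))    # tied weights: decoder W, encoder W^T

def loss(W):
    R = Y - X @ W @ W.T               # residual of the tied-weight linear AE
    return np.mean(np.sum(R ** 2, axis=1))

lr = 1e-2
for _ in range(400):                  # plain gradient descent on Eqn. (12)
    R = Y - X @ W @ W.T
    grad = -(2.0 / N) * (X.T @ R + R.T @ X) @ W   # analytic gradient
    W -= lr * grad
# The learned encoder rows can now serve as CHO channels (T = W^T).
```

After training, the encoder concentrates its capacity on the signal direction rather than on reconstructing the background, which is the behavior Eqn. (12) is designed to induce.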
IV-D Experimental Parameters
IV-D1 Training Details
AE channels were determined by minimizing the modified autoencoder loss function in Eqn. (12). The models were trained in TensorFlow [1] using the Adam algorithm [26]. The AE weights were initialized using a truncated normal initializer with a standard deviation of 5 × 10⁻⁶. The models were trained for 500 epochs. Provided the considered dataset contained more than 500 images, pre-training the models on a subset of 500 images for 500 epochs to burn in the network sometimes improved performance. A mini-batch size of 250 was employed, with an equal number of signal-present and signal-absent images in each mini-batch. The learning rate was set to 5 × 10⁻³ for the VICTRE phantom background study and 1 × 10⁻⁵ for the lumpy background study. All networks were trained on a single NVIDIA TITAN X GPU.
Several reference methods were implemented to compare against the AE-learned channels, including convolutional LG [15], partial least squares [59], and the matched filter. The HO-Direct [9] was also computed on each subset using Eqn. (5). A grid search over the entire training dataset for each background was used to select the parameters for all methods, with the number of channels capped at 20. This grid search also implicitly provided multiple random initializations for the AE.
IV-D2 Evaluation
Each model was trained across a range of restricted-size subsets of the training data. The VICTRE case detailed in Sec. IV-B contained subsets of size 250, 500, 1000, 2000, and 5000 image pairs. The larger lumpy background experiments detailed in Sec. IV-A also considered sets of 10 000, 15 000, 20 000, 25 000, and 30 000 image pairs.
The standard train-validate-test scheme [19] was employed to evaluate performance. The AE and competing methods were given the training data and signal estimate to operate on, with performance evaluated on the validation data to select the best set of parameters. Once the parameters were determined for each method, the CHO was numerically determined according to Eqn. (10), using the combined training subset and validation dataset to compute $\mathbf{K}_{\mathbf{v}}$. The final models were then evaluated on the testing set to obtain the AUC values.
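The paper fits a binormal model to obtain AUC values; a common nonparametric alternative, shown here only for illustration, is the Mann-Whitney (Wilcoxon) estimate, i.e., the fraction of signal-present/signal-absent test-statistic pairs that are ranked correctly.

```python
import numpy as np

def empirical_auc(t0, t1):
    """Nonparametric AUC estimate: fraction of (signal-present,
    signal-absent) test-statistic pairs ranked correctly, with
    ties counted as half."""
    diff = t1[:, None] - t0[None, :]          # all pairwise differences
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)
```

For large test sets, a loop- or rank-based formulation avoids materializing the full pairwise matrix, but the vectorized form above is the clearest statement of the estimator.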
The HO-CMD was also computed for the experiments on lumpy backgrounds to analyze the efficiency of the channels for each method. The empirical background covariance matrix was calculated using the combined training and validation datasets, for a total of 70 000 noiseless background images. This method was unavailable for estimating the HO in the VICTRE experiments, as noiseless images were not available.
V. Results
The results of the limited-data tests for the lumpy model and the VICTRE breast phantom model are provided in Figs. 5 and 6. The traditional AE was also tested, but failed to exceed 0.55 AUC in all four experiments. Overall, the proposed method was competitive with state-of-the-art channelized methods for both the lumpy background and VICTRE phantom background cases. For the lumpy background cases, the AE channels performed significantly better than the PLS channels for all but the largest dataset sizes, for which performance was comparable. Convolutional LG channel performance was relatively static, since the models were tuned at the maximum dataset size and it is not a learning method, but these were the best-performing channels for the majority of the lumpy dataset sizes considered. However, both the PLS and AE channels outperformed convolutional LG when sufficient images were available. The HO-Direct had inferior performance to both the AE and convolutional LG channels while also requiring significantly more computation to evaluate. Thus, some channelized methods outperformed the standard method of computing the HO. The HO-CMD serves as an upper bound.
In the VICTRE background case, the AE-learned channels outperformed every other tested method for the smaller training subsets. Given a sufficient amount of data, the AE and PLS channels approach the same AUC and are approximately equivalent. This occurred more quickly for the larger spiculated mass signal than for the smaller microcalcification clusters. The HO-Direct also had substandard performance in most cases due to the degeneracy of the covariance matrix in the data-constrained experiments. In these ill-conditioned cases the test statistic was estimated by solving a linear system, but the resulting low AUC demonstrates the superiority of channelized methods for calculating an observer for this more complicated background.
During the course of the experiments, it was observed that the convolutional LG channels were especially sensitive to the quality of the estimated signal. When provided with the signal used to generate the data in both the location-known and SKS lumpy experiments, the method outperformed all other competitors. When fewer images were available, and thus there was more noise in the signal image, as in the VICTRE phantom dataset, the performance degraded significantly. Although the AE-learned channels attempt to reconstruct the given signal image directly, and thus might be expected to be impacted more by noise, the method was more robust to error in the estimated signal than the convolutional LG approach. This is likely due to the innate denoising that AEs exhibit as a consequence of their limited embedding dimensionality.
The learned channels for the 30 000-image location-known lumpy case are included in Fig. 7. Many of the channels are similar to one another in the features they extract and can be removed without significant loss of performance. These extraneous channels likely exist because of the AE training process. Random initializations generate different starting locations for each channel, which are iteratively optimized by the AE training process. During this process, the channels are updated to better jointly reconstruct the signal image. Thus, even if the final model makes inefficient use of its full channel budget, the channels are influenced by their interactions during the training process. One of the limitations of this approach is its sensitivity to the random initialization, which can result in models of dramatically varying quality even with the same structure.
VI. Discussion and Conclusion
This study demonstrated that AEs are capable of learning efficient CHO channels for both location-known and certain SKS signal detection tasks. Data embeddings and observer channels were demonstrated to be fundamentally related, with the task of optimizing a data embedding to preserve signal-specific information equivalent to determining an efficient channel selection for the CHO. Furthermore, the presented method of computing channels is capable of meeting or exceeding the performance of state-of-the-art methods on the investigated tasks.
Channels were learned for the CHO by minimizing the reconstruction loss of an AE. Modifying the AE loss function to focus only on task-specific information involving the signal yielded a significant benefit over the traditional AE approach. Empirical sweeps over the network topology revealed that the AE could efficiently approximate the HO for a wide range of cases while utilizing numbers of channels comparable to other approaches. The proposed method was equivalent to state-of-the-art approaches for the lumpy background and significantly superior on the more complicated VICTRE breast phantom dataset, demonstrating its robustness and versatility.
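As a rough sketch of this training procedure, the code below fits a linear AE whose decoder output is compared to an estimated signal image rather than to the noisy input, with an optional mask weighting the loss toward the signal region. This is a deliberate simplification of the approach described here: the linear encoder/decoder, plain gradient descent, learning rate, and masking scheme are all assumptions for illustration, not the study's exact configuration.

```python
import numpy as np

def train_signal_ae(images, signal, n_channels, mask=None,
                    lr=0.01, n_iters=2000, seed=0):
    """Fit a linear AE that reconstructs the signal image from noisy
    inputs; the learned encoder rows serve as candidate CHO channels.

    images : (n, n_pixels) training images containing the signal
    signal : (n_pixels,) estimated signal image used as the target
    mask   : optional (n_pixels,) weights focusing the loss on the
             signal region (hypothetical; uniform if None)
    """
    rng = np.random.default_rng(seed)
    n, p = images.shape
    W = 0.1 * rng.standard_normal((n_channels, p))   # encoder (channels)
    V = 0.1 * rng.standard_normal((n_channels, p))   # decoder
    m = np.ones(p) if mask is None else np.asarray(mask)
    losses = []
    for _ in range(n_iters):
        H = images @ W.T                  # encode: v = W g
        R = H @ V                         # decode: reconstruction of the signal
        losses.append(np.sum(m * (R - signal) ** 2) / n)
        D = (2.0 / n) * m * (R - signal)  # gradient of the weighted loss w.r.t. R
        grad_W = V @ D.T @ images         # chain rule through the encoder
        grad_V = H.T @ D                  # chain rule through the decoder
        W -= lr * grad_W
        V -= lr * grad_V
    return W, V, losses
```

With `mask=None` this reduces to reconstructing the signal with an ordinary squared-error loss; concentrating the mask around the signal support is one way to focus the embedding on task-specific information, in the spirit of the modified loss described above.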
Performance improvements were especially noticeable for small numbers of training-set images, as the AE-learned channels plateaued to higher AUC values sooner than other learning-based methods. However, the AE-learned channels were sensitive to the random initialization of the weights and frequently included redundant channels. The training scheme could likely be further improved with a more robust approach to weight initialization.
Opportunities for future work include expanding the current channels to the IO and extending the formulation to both more sophisticated SKS cases and 3D input images. The channels should work directly with any standard Markov chain Monte Carlo method for estimating the IO. Although the current form of the loss function for learning AE channels requires knowing the signal centroid, it could be generalized by considering convolutional AEs [43]. The superior performance of AE-learned channels on smaller datasets and medically realistic phantoms also expands the applicability of the method to real-world cases, and the method should be tested on experimental data to identify remaining challenges in tuning the AE.
Acknowledgment
This work was supported in part by grants NIH NS102213, NIH EB020604, and NSF DMS-1614305.
References

[1] (2016) TensorFlow: a system for large-scale machine learning. In OSDI, Vol. 16, pp. 265–283.
[2] (2008) An Ideal Observer for a model of X-ray imaging in breast parenchymal tissue. In International Workshop on Digital Mammography, pp. 393–400.
[3] (2018) Evaluation of Digital Breast Tomosynthesis as Replacement of Full-Field Digital Mammography Using an In Silico Imaging Trial. JAMA Network Open 1 (7), pp. e185474–e185474.
[4] (1998) Stabilized estimates of Hotelling-observer detection performance in patient-structured noise. In Medical Imaging 1998: Image Perception, Vol. 3340, pp. 27–44.
[5] (1992) Linear discriminants and image quality. Image and Vision Computing 10 (6), pp. 451–460.
[6] (2001) Megalopinakophobia: its symptoms and cures. In Medical Imaging 2001: Physics of Medical Imaging, Vol. 4320, pp. 299–308.
[7] (2015) Task-based measures of image quality and their relation to radiation dose and patient risk. Physics in Medicine & Biology 60 (2), pp. R1.
[8] (2013) Foundations of Image Science. John Wiley & Sons.
[9] (1993) Model observers for assessment of image quality. Proceedings of the National Academy of Sciences 90 (21), pp. 9758–9765.
[10] (2007) Scaling learning algorithms towards AI. In Large-Scale Kernel Machines, L. Bottou, O. Chapelle, D. DeCoste, and J. Weston (Eds.).
[11] (1996) Image representations for visual learning. Science 272 (5270), pp. 1905–1909.
[12] (2009) Learning a channelized observer for image quality assessment. IEEE Transactions on Medical Imaging 28 (7), pp. 991–999.
[13] (2009) Anomaly detection: a survey. ACM Computing Surveys 41, pp. 15:1–15:58.
[14] (2019) Reconstruction-aware imaging system ranking by use of a sparsity-driven numerical observer enabled by variational Bayesian inference. IEEE Transactions on Medical Imaging 38 (5), pp. 1251–1262.
[15] (2015) Derivation of an observer model adapted to irregular signals based on convolution channels. IEEE Transactions on Medical Imaging 34 (7), pp. 1428–1435.
[16] (2010) Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research 11, pp. 625–660.
[17] (2003) Validating the use of channels to estimate the ideal linear observer. Journal of the Optical Society of America A 20 (9), pp. 1725–1738.
[18] (2002) Investigation of optimal kVp settings for CT mammography using a flat-panel imager. In Medical Imaging 2002: Physics of Medical Imaging, Vol. 4682, pp. 392–403.
[19] (2016) Deep Learning. Vol. 1, MIT Press, Cambridge.
[20] (2019) Autoencoder embedding of task-specific information. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952.
[21] (2008) Toward realistic and practical Ideal Observer (IO) estimation for the optimization of medical imaging systems. IEEE Transactions on Medical Imaging 27 (10), pp. 1535–1543.
[22] (2006) Reducing the dimensionality of data with neural networks. Science 313, pp. 504–507.
[23] (2006) A fast learning algorithm for deep belief nets. Neural Computation 18 (7), pp. 1527–1554.
[24] (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), pp. 359–366.
[25] (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4 (2), pp. 251–257.
[26] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[27] (2019) Loss landscapes of regularized linear autoencoders. CoRR abs/1901.08168.
[28] (2007) Bias in Hotelling observer performance computed from finite data. In Medical Imaging 2007: Image Perception, Observer Performance, and Technology Assessment, Vol. 6515, pp. 65150S.
[29] (2003) Experimental determination of object statistics from noisy images. JOSA A 20 (3), pp. 421–429.
[30] (2001) Ideal Observer approximation using Bayesian classification neural networks. IEEE Transactions on Medical Imaging 20 (9), pp. 886–899.
[31] (2003) Ideal-Observer computation in medical imaging with use of Markov-chain Monte Carlo techniques. JOSA A 20 (3), pp. 430–438.
[32] (2015) Deep learning. Nature 521 (7553), pp. 436.
[33] (1999) 'Proper' binormal ROC curves: theory and maximum-likelihood estimation. Journal of Mathematical Psychology 43 (1), pp. 1–33.
[34] (1986) ROC methodology in radiologic imaging. Investigative Radiology 21 (9), pp. 720–733.
[35] (2015) A deep learning approach to structured signal recovery. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1336–1343.
[36] (1987) Addition of a channel mechanism to the ideal-observer model. Journal of the Optical Society of America A 4 (12), pp. 2447–2457.
[37] (2017) Convolutional autoencoder for image denoising of ultra-low-dose CT. Heliyon 3 (8), pp. e00393.
[38] (1997) The 'proper' binormal model: parametric receiver operating characteristic curve estimation with degenerate data. Academic Radiology 4 (5), pp. 380–389.
[39] (2009) Singular vectors of a linear imaging system as efficient channels for the Bayesian Ideal Observer. IEEE Transactions on Medical Imaging 28 (5), pp. 657–668.
[40] (2007) Channelized-Ideal Observer using Laguerre-Gauss channels in detection tasks involving non-Gaussian distributed lumpy backgrounds and a Gaussian signal. JOSA A 24 (12), pp. B136–B150.
[41] (2009) Efficient estimation of Ideal-Observer performance in classification tasks involving high-dimensional complex backgrounds. JOSA A 26 (11), pp. B59–B71.
[42] (2019) Laguerre-Gauss and sparse difference-of-Gaussians observer models for signal detection using constrained reconstruction in magnetic resonance imaging. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952.
[43] (2007) Unsupervised learning of invariant feature hierarchies with applications to object recognition. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
[44] (2010) Task-based assessment of breast tomosynthesis: effect of acquisition parameters and quantum noise. Medical Physics 37 (4), pp. 1591–1600.
[45] (1992) Effect of random background inhomogeneity on observer detection performance. JOSA A 9 (5), pp. 649–658.
[46] (1986) Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (Eds.), pp. 318–362.
[47] (2014) Task-based optimization of dedicated breast CT via Hotelling observer metrics. Medical Physics 41 (10).
[48] (2015) Deep learning in neural networks: an overview. Neural Networks 61, pp. 85–117.
[49] (2006) Using Fisher information to approximate Ideal-Observer performance on detection tasks for lumpy-background images. JOSA A 23 (10), pp. 2406–2414.
[50] (2012) On nonparametric guidance for learning autoencoder representations. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, N. D. Lawrence and M. Girolami (Eds.), Proceedings of Machine Learning Research, Vol. 22, pp. 1073–1080.
[51] (2012) Nonparametric guidance of autoencoder representations using label information. Journal of Machine Learning Research 13 (1), pp. 2567–2588.
[52] (1998) Mapping a manifold of perceptual observations. In Advances in Neural Information Processing Systems 10, M. I. Jordan, M. J. Kearns, and S. A. Solla (Eds.), pp. 682–688.
[53] (1991) Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1), pp. 71–86.
[54] (1997) ICRU Report 54: Medical imaging - the assessment of image quality. Radiography 3 (3), pp. 243–244.
[55] (2008) Extracting and composing robust features with denoising autoencoders. In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), W. W. Cohen, A. McCallum, and S. T. Roweis (Eds.), pp. 1096–1103.
[56] (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11, pp. 3371–3408.
[57] (1985) Unified SNR analysis of medical imaging systems. Physics in Medicine & Biology 30 (6), pp. 489.
[58] (2006) Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision 70 (1), pp. 77–90.
[59] (2010) Partial least squares: a method to estimate efficient channels for the Ideal Observers. IEEE Transactions on Medical Imaging 29 (4), pp. 1050–1058.
[60] (2017) Age progression/regression by conditional adversarial autoencoder. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4352–4360.
[61] (2018) Learning the Ideal Observer for SKE detection tasks by use of convolutional neural networks. In Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment, Vol. 10577, pp. 1057719.
[62] (2019) Learning the Hotelling observer for SKE detection tasks by use of supervised learning methods. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952.
[63] (2019) Approximating the Ideal Observer and Hotelling observer for binary signal detection tasks by use of supervised learning methods. IEEE Transactions on Medical Imaging.