Medical imaging systems are commonly assessed by use of objective measures of image quality that quantify the performance of an observer at specific tasks [2, 16, 20, 17, 19, 18]. Supervised deep learning methods have been actively investigated to learn and implement numerical observers for task-based image quality assessment. For example, Zhou et al. have proposed an Ideal Observer approximation methodology for binary signal detection tasks by use of convolutional neural networks (CNNs). These supervised deep learning-based methods require a large amount of labeled data for training. In practice, however, labeling a large amount of experimental data is tedious, expensive, and prone to subjective errors.
In contrast, labeled computer-simulated image data are relatively easy to generate. If the simulated data are sufficiently realistic, it is potentially feasible to train a deep learning-based numerical observer (DL-NO) with a large amount of simulated data and then apply it directly to experimental data. However, it is often difficult to computationally model complicated anatomical structures and the response of real-world imaging systems; therefore, simulated image data will generally possess physical and statistical differences from the experimental image data they seek to emulate. This results in a so-called domain shift between the two sets of images [6, 5, 4]. This domain shift can significantly degrade the performance of a DL-NO that is trained on simulated images but applied to experimental ones. Recently, domain adaptation methods that aim to mitigate the effect of domain shift have been applied to several computer vision tasks, including image classification [15, 11, 13], image segmentation [10, 9, 3], and cell counting [8, 7].
In this study, we propose and investigate the use of an adversarial domain adaptation method to mitigate the deleterious effects of domain shift between simulated and experimental image data for DL-NOs that are trained on simulated images but applied to experimental ones. The employed domain adaptation methodology does not require labeled experimental images. As a proof of concept, a CNN is employed as the numerical observer and a binary signal detection task is considered. Through computer-simulation studies, the success of this strategy is investigated as a function of the degree of domain shift between the simulated and experimental image data.
2.1 Framework of the Proposed Method
The framework of the proposed method consists of three stages: source observer training, domain adaptation model (DAM) training, and target observer formulation, as shown in Figure 1.
In the source observer training stage, a large amount of labeled computer-simulated data (source data) is automatically generated and then employed to train a DL-NO operating in the source domain (the source observer). As shown in Figure 1, the source observer consists of an encoder neural network (ENN) and an observation neural network (ONN). The ENN encodes source data into a feature space that is highly representative of the source domain data. The ONN maps the encoded features to the desired output (e.g., a test statistic for signal detection tasks).
In the DAM training stage, a deep neural network-based domain adaptation model (DAM) is trained by use of simulated data (source data), whose labels are not used at this stage, and unlabeled experimental data (target data). The trained DAM is employed to adapt the trained source observer to the target data domain (target domain). This is achieved by mapping the target data to a feature space that is close to the feature space to which the trained ENN maps the source data. The DAM is trained by minimizing the distance between the feature distributions of the source domain and the target domain. A neural network-based domain critic model (DCM) is constructed to measure this distance. The DAM and DCM are trained alternately via an adversarial learning approach introduced in the literature.
In the stage of target observer formulation, the trained DAM and the trained ONN (the second part of the source observer) are integrated to formulate a target numerical observer that can operate on experimental data for image quality assessment tasks.
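Structurally, the three stages amount to swapping the encoder in front of a fixed observation network. The sketch below expresses both observers as function compositions and assumes nothing about the network internals; `enn`, `dam`, and `onn` are hypothetical toy stand-ins (simple arithmetic on pixel lists), not the trained CNNs.

```python
from typing import Callable, List

Image = List[float]
Features = List[float]

def compose(encoder: Callable[[Image], Features],
            head: Callable[[Features], float]) -> Callable[[Image], float]:
    """Build an observer: encode the image, then map the features to a test statistic."""
    return lambda g: head(encoder(g))

# Hypothetical toy stand-ins for the trained networks (illustration only).
def enn(g: Image) -> Features:
    return [sum(g) / len(g)]            # source encoder: mean pixel value

def dam(g: Image) -> Features:
    return [sum(g) / len(g) - 0.5]      # target encoder: mean with an offset correction

def onn(z: Features) -> float:
    return 2.0 * z[0]                   # observation network: features -> statistic

source_observer = compose(enn, onn)     # applied to simulated (source) images
target_observer = compose(dam, onn)     # same ONN, encoder swapped for the DAM
```

Only the encoder changes between the two observers; the ONN trained in the first stage is reused unmodified, which is why the DAM must map target images into the feature space the ONN was trained on.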
2.2 Example of the Proposed Method
In this study, the method proposed by Zhou et al. for learning a CNN-based numerical observer for a binary signal detection task is employed as an example to demonstrate the stages of the proposed method and its performance.
2.2.1 Binary signal detection tasks
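The considered task is a binary signal detection task in which the goal is to classify an image $\mathbf{g} \in \mathbb{R}^{M}$ as satisfying either a signal-absent hypothesis ($H_0$) or a signal-present hypothesis ($H_1$). The imaging process under these two hypotheses can be represented as:

$$H_0:\ \mathbf{g} = \mathbf{b} + \mathbf{n}, \qquad H_1:\ \mathbf{g} = \mathbf{b} + \mathbf{s} + \mathbf{n},$$

where $\mathbf{b}$ and $\mathbf{s}$ denote the background and signal in the image domain, respectively, and $\mathbf{n}$ is the measurement noise. Here $M$ is the total number of pixels in an image.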
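The two hypotheses differ only in whether the signal image is added before the noise. A minimal sketch, assuming the background and signal are already given as flat lists of pixel values and that the noise is i.i.d. zero-mean Gaussian with standard deviation `sigma` (the noise model used in the numerical studies); the function name `simulate_image` is hypothetical.

```python
import random
from typing import List, Optional

def simulate_image(b: List[float], s: Optional[List[float]], sigma: float,
                   rng: random.Random) -> List[float]:
    """Return g = b + n under H0 (s is None) or g = b + s + n under H1.

    The noise n is assumed i.i.d. zero-mean Gaussian with standard deviation sigma.
    """
    signal = s if s is not None else [0.0] * len(b)
    return [bm + sm + rng.gauss(0.0, sigma) for bm, sm in zip(b, signal)]

rng = random.Random(0)
g0 = simulate_image([10.0, 12.0, 11.0], None, 1.0, rng)              # signal absent
g1 = simulate_image([10.0, 12.0, 11.0], [0.0, 3.0, 0.0], 1.0, rng)   # signal present
```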
A numerical observer computes a scalar test statistic $t(\mathbf{g})$ for this binary signal detection task. A decision is made in favor of hypothesis $H_1$ if $t(\mathbf{g})$ is greater than some threshold $\tau$; otherwise, $H_0$ is selected.
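The decision rule is a single comparison, and sweeping the threshold over all values traces out the ROC curve used later for evaluation. A minimal sketch (function names hypothetical) that applies the rule and computes the true-positive fraction (TPF) and false-positive fraction (FPF) at one threshold from the test statistics of signal-present and signal-absent images:

```python
from typing import List, Tuple

def decide(t: float, threshold: float) -> int:
    """Return 1 (decide H1) if the test statistic exceeds the threshold, else 0 (decide H0)."""
    return 1 if t > threshold else 0

def tpf_fpf(t_present: List[float], t_absent: List[float],
            threshold: float) -> Tuple[float, float]:
    """True-positive and false-positive fractions at a single threshold."""
    tpf = sum(decide(t, threshold) for t in t_present) / len(t_present)
    fpf = sum(decide(t, threshold) for t in t_absent) / len(t_absent)
    return tpf, fpf
```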
2.2.2 Source observer training
The goal of this stage is to train a CNN-based source observer by use of a large amount of simulated images that represent the source domain. This is depicted in the left block of Figure 1.
Figure 2 shows the network architectures of the ENN and ONN. The ENN contains a chain of convolutional layer-Leaky ReLU layer (CONV-LeReLU) blocks and a max pooling layer (Max-Pool). The ONN has a fully connected layer-Leaky ReLU (FC-LeReLU) block, followed by a sigmoid function in the last layer. The ENN encodes a simulated image into a low-dimensional but highly representative feature space, while the ONN works as a classifier that uses the encoded features to compute the probability that the input simulated image belongs to hypothesis $H_1$.
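To make the data flow concrete, the following is a minimal pure-Python forward pass with toy sizes. Several details are assumptions not stated in the text: single-channel feature maps, "valid" convolutions, a Leaky ReLU negative slope of 0.2, a single 2x2 max pooling after the convolutional chain, and an ONN with one hidden FC-LeakyReLU layer followed by a linear unit and the sigmoid. The actual layer counts, channel numbers, and kernel sizes are those of Figure 2 and are not reproduced here; all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
LEAK = 0.2  # assumed Leaky ReLU negative slope (not stated in the text)

def conv2d_valid(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """Single-channel 'valid' convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(k), len(k[0])
    return [[bias + sum(x[i + a][j + b] * k[a][b]
                        for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def leaky_relu(x: Matrix) -> Matrix:
    return [[v if v > 0 else LEAK * v for v in row] for row in x]

def max_pool2(x: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]

def enn_forward(g: Matrix, kernels: List[Matrix]) -> List[float]:
    """ENN sketch: CONV-LeakyReLU blocks, then Max-Pool, then flatten to a feature vector."""
    h = g
    for k in kernels:
        h = leaky_relu(conv2d_valid(h, k))
    return [v for row in max_pool2(h) for v in row]

def onn_forward(z: List[float], w1: Matrix, b1: List[float],
                w2: List[float], b2: float) -> float:
    """ONN sketch: FC-LeakyReLU block, then a final linear unit with a sigmoid output."""
    hidden = [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(w1, b1)]
    hidden = [v if v > 0 else LEAK * v for v in hidden]
    logit = sum(w * v for w, v in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))
```

The sigmoid output lies in (0, 1) and is interpreted as the probability of the signal-present hypothesis, so it can serve directly as the test statistic.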
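Let the CNN-based source observer be parameterized by a set of parameters $\Theta$. The output of the source observer can be represented by $q(\mathbf{g}; \Theta)$. Let $y \in \{0, 1\}$ denote the image label, where $y = 0$ and $y = 1$ correspond to the hypotheses $H_0$ and $H_1$, respectively. Given a set of $N$ independent labeled source images, $\{(\mathbf{g}_i, y_i)\}_{i=1}^{N}$, $\Theta$ is determined by minimizing an average cross-entropy loss function, $\mathcal{L}(\Theta)$, defined as:

$$\mathcal{L}(\Theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log q(\mathbf{g}_i;\Theta) + (1 - y_i)\log\big(1 - q(\mathbf{g}_i;\Theta)\big)\Big],$$

where $\mathbf{g}_i$ and $y_i$ are the $i$-th training image and the associated label. The loss function is numerically minimized by use of methods described in the literature. The trained ENN (the first part of the source observer) is then employed to train the DAM, as described next.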
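The average cross-entropy is straightforward to evaluate for a batch of observer outputs. A minimal sketch, assuming the outputs are probabilities in [0, 1]; the clipping constant `eps` is an added numerical safeguard against `log(0)`, not part of the loss as stated, and the function name is hypothetical.

```python
import math
from typing import Sequence

def average_cross_entropy(q: Sequence[float], y: Sequence[int],
                          eps: float = 1e-12) -> float:
    """-(1/N) * sum_i [y_i log q_i + (1 - y_i) log(1 - q_i)] for outputs q and labels y."""
    total = 0.0
    for qi, yi in zip(q, y):
        qi = min(max(qi, eps), 1.0 - eps)   # clip to avoid log(0)
        total += yi * math.log(qi) + (1 - yi) * math.log(1.0 - qi)
    return -total / len(q)
```

An uninformative observer that outputs 0.5 for every image incurs a loss of log 2, roughly 0.693, regardless of the labels.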
2.2.3 Domain adaptation model (DAM) training
The goal of the DAM training stage is to train a DAM by use of a set of unlabeled source images (simulated images in the source domain), a set of unlabeled target images (experimental images in the target domain), and the trained ENN. The trained DAM is employed to map target images to a feature space that exhibits minimal domain shift relative to the feature space to which the trained ENN maps source images.
The network architectures of the DAM and DCM are specified as the two CNNs shown in Figure 2. In this study, the DAM has the same architecture as the ENN, since both serve a similar function: mapping an image into a low-dimensional feature space. The Wasserstein distance was employed to quantify the domain shift in the adversarial learning process for training the DAM and DCM. The trained CNN-based DAM is employed as part of the target observer in the next stage.
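The alternating optimization can be illustrated with a deliberately tiny sketch. It assumes scalar features, a linear critic $f(z) = cz$ kept Lipschitz by weight clipping (as in the original Wasserstein GAN), and a DAM that learns only an offset $b$ added to the target inputs. None of these simplifications come from the study, whose DAM and DCM are the CNNs of Figure 2, and `adversarial_align` is a hypothetical name. With a linear critic, the critic objective reduces to $c(\mu_s - \mu_t)$, where $\mu_s$ and $\mu_t$ are the mean source and target features, so only the feature means are aligned.

```python
from statistics import fmean
from typing import List, Tuple

def adversarial_align(z_src: List[float], x_tgt: List[float], iters: int = 200,
                      lr_critic: float = 1.0, lr_dam: float = 0.5,
                      clip: float = 0.1) -> Tuple[float, float]:
    """Toy WGAN-style alignment of 1-D features (illustrative assumptions only).

    Critic: f(z) = c * z, with |c| <= clip enforced by weight clipping.
    DAM:    z_t = x_t + b, with a single trainable offset b.
    Returns (b, final gap between mean source and mean adapted target features).
    """
    c, b = 0.0, 0.0
    mu_src, mu_x = fmean(z_src), fmean(x_tgt)
    for _ in range(iters):
        # Critic step: gradient ascent on E_s[f] - E_t[f] = c * (mu_src - (mu_x + b)).
        c += lr_critic * (mu_src - (mu_x + b))
        c = max(-clip, min(clip, c))   # weight clipping keeps the critic Lipschitz
        # DAM step: gradient descent on -E_t[f(z_t)] = -c * (mu_x + b), whose d/db is -c.
        b += lr_dam * c
    return b, mu_src - (mu_x + b)
```

With plain alternating gradient steps, the offset settles into a small oscillation around the value that equalizes the means rather than converging exactly, a well-known behavior of adversarial training.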
2.2.4 Target observer formulation
In the final stage, the target numerical observer operating on experimental images is formulated by combining the trained DAM (trained in the second stage) and the trained ONN (trained in the first stage), as shown in the right block of Figure 1.
2.3 Numerical Studies to Demonstrate the Performance of the Proposed Method
In our proof-of-principle study, simulated images were employed to represent both the source and target domain images. This permitted a controlled and systematic investigation of the proposed method. The experimental (target domain) images were assumed to be produced by an idealized parallel-hole collimator system described by a point response function of the form

$$h_m(\mathbf{r}) = \frac{h}{2\pi w^2}\exp\!\left(-\frac{\|\mathbf{r} - \mathbf{r}_m\|^2}{2w^2}\right),$$

where $h$ is the system height, $w$ is the system blur, and $\mathbf{r}_m$ is the location of the $m$-th pixel; the system height was set to . Five sets of target domain images were produced with different system blurs , , , and , respectively. The simulated (source domain) images were produced by the same imaging model but with incorrect values of the system height and the system blur. Different sets of target domain images are therefore expected to exhibit different degrees of domain shift relative to the source domain images. These data were employed to investigate how the degree of domain shift affects the performance of the proposed method.
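A binary signal-known-exactly and background-known-statistically (SKE/BKS) detection task was considered. For both the source and target domain images, the signal function, $f_s(\mathbf{r})$, was described by a 2D symmetric Gaussian function:

$$f_s(\mathbf{r}) = A\exp\!\left(-\frac{\|\mathbf{r} - \mathbf{r}_c\|^2}{2w_s^2}\right),$$

where $A$ is the amplitude, $\mathbf{r}_c$ is the coordinate of the signal location, and $w_s$ is the width of the signal. The background was described by a stochastic lumpy object model:

$$f_b(\mathbf{r}) = \sum_{n=1}^{N_b} \ell(\mathbf{r} - \mathbf{r}_n \mid a, s),$$

where $N_b$ is the number of lumps, which is sampled from a Poisson distribution, $N_b \sim \mathrm{Pois}(\bar{N})$; here $\mathrm{Pois}(\bar{N})$ denotes a Poisson distribution with mean $\bar{N}$, which was set to . The lump function $\ell(\mathbf{r} \mid a, s)$ is modeled by a 2D Gaussian function with amplitude $a$ and width $s$:

$$\ell(\mathbf{r} \mid a, s) = a\exp\!\left(-\frac{\|\mathbf{r}\|^2}{2s^2}\right).$$

Here, $a$ was set to , $s$ was set to , and $\mathbf{r}_n$ is the location of the $n$-th lump, which was sampled from a uniform distribution over the field of view.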
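Each lumpy background realization requires two random draws: the lump count and the lump positions. A minimal sketch, assuming a square field of view $[0, \text{fov}]^2$ and caller-supplied parameter values (the study's values are not reproduced). Python's `random` module has no Poisson sampler, so Knuth's multiplication method is used, which is adequate for small means; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method (fine for small means such as lump counts)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_lump_centers(mean_lumps: float, fov: float, rng: random.Random) -> List[Point]:
    """Draw N_b ~ Pois(mean_lumps), then N_b centers uniform over [0, fov] x [0, fov]."""
    n_b = sample_poisson(mean_lumps, rng)
    return [(rng.uniform(0.0, fov), rng.uniform(0.0, fov)) for _ in range(n_b)]

def lumpy_background(r: Point, centers: List[Point], a: float, s: float) -> float:
    """f_b(r) = sum_n a * exp(-||r - r_n||^2 / (2 s^2)): one Gaussian lump per center."""
    x, y = r
    return sum(a * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * s * s))
               for cx, cy in centers)
```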
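The signal image $\mathbf{s}$, the background image $\mathbf{b}$, and the measurement noise $\mathbf{n}$ were then generated as described below. The $m$-th pixels of the signal image and background image were computed as:

$$s_m = \int h_m(\mathbf{r})\, f_s(\mathbf{r})\, d\mathbf{r}, \qquad b_m = \int h_m(\mathbf{r})\, f_b(\mathbf{r})\, d\mathbf{r},$$

where $h_m(\mathbf{r})$ is the point response function of the $m$-th pixel.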
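If the point response of the idealized collimator is the normalized 2D Gaussian $h_m(\mathbf{r}) = \frac{h}{2\pi w^2}\exp\!\big(-\|\mathbf{r}-\mathbf{r}_m\|^2/(2w^2)\big)$ with height $h$ and blur $w$ (an assumption for this sketch), the pixel integral of a Gaussian object term has a closed form, because the integral of a product of two Gaussians is again Gaussian:

$$\int \frac{h}{2\pi w^2}\exp\!\left(-\frac{\|\mathbf{r}-\mathbf{r}_m\|^2}{2w^2}\right) A\exp\!\left(-\frac{\|\mathbf{r}-\mathbf{r}_c\|^2}{2w_s^2}\right) d\mathbf{r} = \frac{A\,h\,w_s^2}{w^2+w_s^2}\exp\!\left(-\frac{\|\mathbf{r}_m-\mathbf{r}_c\|^2}{2(w^2+w_s^2)}\right).$$

Applied lump by lump (with $A \to a$ and $w_s \to s$), the same expression gives $b_m$. The sketch below checks the formula against brute-force midpoint quadrature; the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def psf(r: Point, r_m: Point, h: float, w: float) -> float:
    """Assumed Gaussian point response h_m(r) of pixel m (height h, blur w)."""
    d2 = (r[0] - r_m[0]) ** 2 + (r[1] - r_m[1]) ** 2
    return h / (2.0 * math.pi * w * w) * math.exp(-d2 / (2.0 * w * w))

def pixel_closed_form(amp: float, width: float, center: Point,
                      r_m: Point, h: float, w: float) -> float:
    """Integral of h_m(r) * amp * exp(-||r - center||^2 / (2 width^2)) over the plane."""
    var = w * w + width * width
    d2 = (r_m[0] - center[0]) ** 2 + (r_m[1] - center[1]) ** 2
    return amp * h * width * width / var * math.exp(-d2 / (2.0 * var))

def pixel_quadrature(amp: float, width: float, center: Point, r_m: Point,
                     h: float, w: float, half: float = 12.0, n: int = 200) -> float:
    """Brute-force midpoint rule over a square of side 2*half centered on r_m."""
    step = 2.0 * half / n
    total = 0.0
    for i in range(n):
        x = r_m[0] - half + (i + 0.5) * step
        for j in range(n):
            y = r_m[1] - half + (j + 0.5) * step
            obj = amp * math.exp(-((x - center[0]) ** 2 + (y - center[1]) ** 2)
                                 / (2.0 * width * width))
            total += psf((x, y), r_m, h, w) * obj
    return total * step * step
```

The closed form avoids numerical integration entirely, which matters when thousands of lumpy backgrounds must be rendered.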
The measurement noise $\mathbf{n}$ was described by independent and identically distributed Gaussian random variables that model electronic noise: $n_m \sim \mathcal{N}(0, \sigma^2)$, where $\mathcal{N}(0, \sigma^2)$ denotes a Gaussian distribution with mean 0 and standard deviation $\sigma$, which was set to in this study.
The image sizes of both source and target images were pixels (i.e. ). Examples of signal-present images in the source domain and the 5 target domains are shown in Figure 3. Here, pairs of signal-present and signal-absent source images with labels were generated. Of these image pairs, were employed as training data to train the CNN-based source observer (shown in Figure 2), and were used as the validation set for the source observer. During source observer training, the parameter values that yielded the highest area under the receiver operating characteristic (ROC) curve (AUC) of the source observer on the validation set were selected for the ENN and ONN.
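Selecting network parameters by validation performance requires an AUC estimate for each candidate checkpoint. A minimal sketch, assuming the nonparametric Wilcoxon-Mann-Whitney estimate (the text does not state which AUC estimator was used during training); the checkpoint names and the `select_checkpoint` helper are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Tuple[List[float], List[float]]  # (signal-present, signal-absent) test statistics

def empirical_auc(t_present: List[float], t_absent: List[float]) -> float:
    """Wilcoxon-Mann-Whitney estimate: fraction of correctly ordered pairs, ties count 1/2."""
    wins = 0.0
    for tp in t_present:
        for ta in t_absent:
            if tp > ta:
                wins += 1.0
            elif tp == ta:
                wins += 0.5
    return wins / (len(t_present) * len(t_absent))

def select_checkpoint(val_scores: Dict[str, Scores]) -> str:
    """Return the name of the checkpoint with the highest validation AUC."""
    return max(val_scores, key=lambda name: empirical_auc(*val_scores[name]))
```

For example, `select_checkpoint({"epoch_1": ([0.6, 0.4], [0.5, 0.3]), "epoch_2": ([0.9, 0.8], [0.1, 0.2])})` picks `"epoch_2"`, whose AUC is 1.0 versus 0.75.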
Additionally, in each of the 5 target image sets, pairs of signal-present and signal-absent target images without labels and with labels were generated. Five DAMs were trained, each associated with one of the 5 target image sets and used to adapt the trained source observer to the corresponding set of target images. In each DAM training, the unlabeled source image pairs (generated in the previous stage) and the unlabeled target image pairs from the corresponding target domain were employed for the adversarial learning of the CNN-based DAM and DCM (specified in Figure 2). Of the labeled target images, were employed to select the parameter values of the DAM by evaluating the AUC of the formulated target observer on them. The remaining labeled target images were employed to evaluate the method.
The naive method that directly applies the trained source observer to each of the 5 target image sets is referred to as the Source Observer (SO). The proposed method, which employs adversarial domain adaptation, is referred to as the Source Observer+Domain Adaptation (SODA). Finally, as a reference, we compute the performance of the CNN-based numerical observer for the case in which a large amount of labeled target images is available. Of course, the premise of this work is that such data are not readily available. This method, referred to as the Target Observer (TO), employs the same CNN architectures as the ENN and ONN specified in Figure 2. In the TO method, pairs of signal-present and signal-absent images from each of the 5 target image domains were generated and employed to directly train 5 target observers, respectively. The same 5 validation sets and 5 testing sets generated for SODA were employed to validate and test the 5 numerical observers trained in the TO method. The detection performance of the TO serves as the reference against which the performance of the proposed method is evaluated at 5 different levels of domain shift.
The ROC curves of the SO, SODA, and TO were evaluated on the pairs of target testing images from each of the five target image domains. The Metz ROC software was employed to fit the ROC curves.
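Metz's ROC software (ROCKIT) fits the conventional binormal model, in which each ROC curve is summarized by two parameters. As a sketch of what a fitted curve represents, assuming the binormal parameterization with intercept $a$ and slope $b$ (the fitted values are not reported here), the curve and its area are $\mathrm{TPF} = \Phi\big(a + b\,\Phi^{-1}(\mathrm{FPF})\big)$ and $\mathrm{AUC} = \Phi\big(a/\sqrt{1+b^2}\big)$, where $\Phi$ is the standard normal cumulative distribution function. The sketch uses only the standard library's `statistics.NormalDist` and does not reproduce ROCKIT's maximum-likelihood fitting.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def binormal_tpf(fpf: float, a: float, b: float) -> float:
    """Binormal ROC curve: TPF = Phi(a + b * Phi^{-1}(FPF)), for 0 < FPF < 1."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpf))

def binormal_auc(a: float, b: float) -> float:
    """Area under the binormal ROC curve: Phi(a / sqrt(1 + b^2))."""
    return STD_NORMAL.cdf(a / (1.0 + b * b) ** 0.5)
```

For example, $a = b = 1$ gives an AUC of $\Phi(1/\sqrt{2})$, approximately 0.76, and $a = 0$ gives the chance-level AUC of 0.5 for any slope.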
As an example, the fitted ROC curves of the three methods computed in the target domain associated with are shown in Figure 4. The resulting ROC curve and AUC value of the proposed method were compared to those of the SO and TO. The AUC values of the numerical observers obtained by the SO, SODA, and TO are , , and , respectively. As expected, directly applying a trained source observer to target images yields the worst detection performance, owing to the domain shift between the source and target domains. By use of a domain adaptation strategy, the proposed method (SODA) learns a CNN-based numerical observer with improved detection performance compared to the SO. The TO shows the best detection performance; however, it is trained with a large amount of labeled target images, whereas only a limited amount of unlabeled target images was employed in the proposed method (SODA).
Additionally, the detection performance, in terms of AUC, of the proposed SODA as a function of the degree of domain shift is shown in Table 1.
It can be observed from the table that in all 5 domain shift cases, the proposed SODA improved the AUC compared to the SO, which demonstrates the potential of the proposed method. The table also shows that the gap between the AUC of the proposed method and that of the TO increases as the degree of domain shift, and hence the difficulty of the corresponding adaptation task, increases.
This study proposes a novel method to learn deep learning-based numerical observers that operate on experimental data by use of adversarial domain adaptation. As a proof-of-principle study, a CNN-based numerical observer was learned by use of the proposed strategy for a binary SKE/BKS signal detection task. The results of the computer-simulation study demonstrate that the proposed method can learn deep learning-based numerical observers that operate on unlabeled experimental data in medical imaging. In the future, more realistic object models will be employed to investigate the proposed method. In addition, other types of numerical observers, e.g., linear observers, will be learned by use of the proposed method.
Acknowledgements. This research was supported in part by NIH awards EB020604, EB023045, NS102213, EB028652, and NSF award DMS1614305.
[1] (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875.
[2] (2013) Foundations of Image Science. John Wiley & Sons.
[3] Unsupervised cross-modality domain adaptation of convnets for biomedical image segmentations with adversarial loss. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), pp. 691–697.
[4] Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495.
[5] (2011) Domain adaptation for object recognition: an unsupervised approach. In 2011 International Conference on Computer Vision, pp. 999–1006.
[6] (2009) Covariate shift and local learning by distribution matching. In Dataset Shift in Machine Learning, Max-Planck-Gesellschaft, Biologische Kybernetik, pp. 131–160.
[7] (2019) Automatic microscopic cell counting by use of deeply-supervised density regression model. In Medical Imaging 2019: Digital Pathology, Vol. 10956, pp. 109560L.
[8] (2019) Automatic microscopic cell counting by use of unsupervised adversarial domain adaptation and supervised density regression. In Medical Imaging 2019: Digital Pathology, Vol. 10956, pp. 1095604.
[9] (2018) Convolutional neural network based automatic plaque characterization for intracoronary optical coherence tomography images. In Medical Imaging 2018: Image Processing, Vol. 10574, pp. 1057432.
[10] (2018) Domain adaptation for biomedical image segmentation using adversarial training. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 554–558.
[11] Contrastive adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4893–4902.
[12] Ideal-observer computation in medical imaging with use of Markov-chain Monte Carlo techniques. JOSA A 20 (3), pp. 430–438.
[13] (2019) Sliced Wasserstein discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10285–10295.
[14] (1998) ROCKIT user's guide. Chicago: Department of Radiology, University of Chicago.
[15] (2017) Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176.
[16] (2018) Learning the Ideal Observer for SKE detection tasks by use of convolutional neural networks. In Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment, Vol. 10577, pp. 1057719.
[17] (2019) Learning the Ideal Observer for joint detection and localization tasks by use of convolutional neural networks. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952, pp. 1095209.
[18] (2019) Learning stochastic object model from noisy imaging measurements using AmbientGANs. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952, pp. 109520M.
[19] Approximating the Ideal Observer and Hotelling Observer for binary signal detection tasks by use of supervised learning methods. IEEE Transactions on Medical Imaging.
[20] (2019) Learning the Hotelling Observer for SKE detection tasks by use of supervised learning methods. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952, pp. 1095208.