Practical Window Setting Optimization for Medical Image Deep Learning

12/03/2018 · by Hyunkwang Lee et al. · Harvard University

The recent advancements in deep learning have allowed for numerous applications in computed tomography (CT), with the potential to improve diagnostic accuracy, speed of interpretation, and clinical efficiency. However, the deep learning community has to date neglected window display settings, a key feature of clinical CT interpretation and an opportunity for additional optimization. Here we propose a window setting optimization (WSO) module that is fully trainable with convolutional neural networks (CNNs) to find optimal window settings for clinical performance. Our approach was inspired by the method commonly used by practicing radiologists to interpret CT images: adjusting window settings to increase the visibility of certain pathologies. Our approach provides optimal window ranges to enhance the conspicuity of abnormalities, and enabled improved performance for intracranial hemorrhage and urinary stone detection. On each task, the WSO model outperformed models trained on the full range of Hounsfield unit values in CT images, as well as on images windowed with pre-defined settings. The WSO module can be readily applied to any analysis of CT images, and can be further generalized to tasks on other medical imaging modalities.






1 Introduction

Deep learning has made remarkable advancements in medical image analysis for various tasks across a range of imaging modalities (Esteva et al., 2017; Gulshan et al., 2016; Chilamkurthy et al., 2018). This rapid progress in image analysis capability has raised hopes that deployment of such technology will increase diagnostic accuracy, streamline clinical workflows, and improve patient outcomes (Thrall et al., 2018; Levin et al., 2018; Berlyand et al., 2018). Much of this progress has been attributed to increased computing power and the development of large and well-curated clinical datasets (Yu et al., 2018). However, recent years have demonstrated that significant performance gains remain to be had from innovations in neural network architectures and domain-specific image pre-processing (Greenspan et al., 2016).

Deep learning architectures in the medical domain may yet see significant performance improvements by incorporating expert knowledge about the target imaging modality and the current clinical workflow. In the case of computed tomography (CT), image values are defined over a wide range of Hounsfield units (HU), but different tissue types and pathologies are generally visible only in narrow and specific ranges. As such, when interpreting CT images, human experts use tools in their workstations to apply predefined window levels (WL) and window widths (WW) to their display windows; these window adjustments focus visibility on the subset of tissues relevant to the task at hand, and are crucial for the effective detection of some pathologies (Bae et al., 2005; Moise and Atkins, 2004). For example, radiologists may use a pre-set "brain" or "subdural" window setting for intracranial hemorrhage (ICH) detection. Despite the importance of optimal window settings in clinical practice, however, their effects on image quality and algorithmic performance have been overlooked in the literature. Most previous works have converted CT images to grayscale with a single pre-set window setting, encoded three differently windowed images into an RGB image, or used the full range of image intensity values without windowing as input to deep learning models (Arbabshirani et al., 2018; Hoo-Chang et al., 2016; Anthimopoulos et al., 2016; Chang et al., 2018).

In this study, inspired by the way radiologists interpret CT images, we propose a window setting optimization (WSO) module comprised of convolution layers with 1x1 filters and customized activation functions. This enables us to find optimal window settings in a task-specific manner via backpropagation, which we demonstrate results in improved model performance on the detection of ICH and urinary stones. The WSO module mimics the radiologist's workflow of adjusting windowing functions to focus on the narrow window range in which the target organs or abnormalities can be clearly seen. Our method can potentially be applied to other CT image analysis tasks such as object detection and semantic segmentation, or to other medical imaging modalities such as positron emission tomography (PET) and magnetic resonance imaging (MRI).

2 Methods

Figure 1: Window functions

2.1 Windowing function

The most common format for medical images is the digital imaging and communications in medicine (DICOM) format. DICOM images are encoded with either 12 or 16 bits per pixel, corresponding to 4,096 or 65,536 levels per pixel, respectively. In CT imaging, these pixel values represent Hounsfield unit (HU) values that correspond to tissue density and generally range from −1000 to over 4000. The range and granularity of the data encoded in CT images extend far beyond the perceptual capacity of the human visual system, which can distinguish only several hundred shades of gray. In addition, most medical displays support at most 8-bit resolution (Kimpe and Tuytschaever, 2007). For these technical and biological reasons, CT images can only be successfully interpreted by humans when the display device applies a windowing function with adjustable window settings. These settings map the visual range of the display to a specified window, and assign all HU values outside this window range to the minimum or maximum gray level (Fig. 1). Windowing functions are defined by linear or sigmoidal conversion as in the following equations:

f_linear(x) = min(max(W·x + b, 0), U), with W = U/WW and b = −(U/WW)·(WL − WW/2) (1)

f_sigmoid(x) = U / (1 + exp(−(W·x + b))), with W = (2/WW)·ln(U/ε − 1) and b = −(2·WL/WW)·ln(U/ε − 1) (2)

The constant U is the upper limit of the windowing functions, and ε is the margin between the upper/lower limits and the window end/start gray levels, which determines the slope at the center.
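As a concrete illustration, the windowing functions can be written as a minimal NumPy sketch. The function and parameter names here are ours, not from the authors' code; U is the output upper limit and eps the margin described above:

```python
import numpy as np

def linear_window(x, wl, ww, U=255.0):
    """Linear windowing: affine map W*x + b that scales HU values in
    [WL - WW/2, WL + WW/2] to [0, U], clamping everything outside."""
    W = U / ww
    b = -(U / ww) * (wl - ww / 2.0)
    return np.clip(W * x + b, 0.0, U)

def sigmoid_window(x, wl, ww, U=255.0, eps=1.0):
    """Sigmoidal windowing: smooth ramp centered at WL whose slope is chosen
    so the output is within eps of its limits at the window boundaries."""
    W = (2.0 / ww) * np.log(U / eps - 1.0)
    b = -(2.0 * wl / ww) * np.log(U / eps - 1.0)
    return U / (1.0 + np.exp(-(W * x + b)))
```

For the "brain" window (WL=50, WW=100), both functions map an HU value of 50 to the middle gray level, and the sigmoid version reaches U − eps at the upper window edge of 100 HU.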

2.2 Window setting optimization module

The linear and sigmoid windowing functions utilized in radiologists' monitors can be emulated as a WSO module inside a neural network architecture. This is achieved by using convolution layers with 1x1 filters and a stride of 1, followed by an activation layer: an upper-bounded rectified linear unit (ReLU) for the linear windowing function, or a sigmoid function multiplied by U for the sigmoid windowing function. In our implementation, full-range DICOM images are passed through this WSO module prior to being used as input to a CNN, as shown in Fig. 2. The weights and biases of the 1x1 convolution layers in the WSO module can thus be optimized along with the CNN, facilitating the identification of optimal windowing functions that visually extract the features necessary for maximum classification performance.
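One way such a module might look is sketched below in PyTorch. The class name, channel counts, and the use of PyTorch are our assumptions for illustration, not the authors' released code; the 1x1 convolution's weight and bias play the roles of W and b in the windowing function:

```python
import torch
from torch import nn

class WSO(nn.Module):
    """Window setting optimization module: a 1x1 convolution followed by a
    bounded activation (upper-bounded ReLU or U-scaled sigmoid)."""

    def __init__(self, n_windows=1, act="sigmoid", upper=255.0):
        super().__init__()
        # One input channel (full-range HU image), one output channel per
        # learned window; 1x1 kernel and stride 1 preserve spatial size.
        self.conv = nn.Conv2d(1, n_windows, kernel_size=1, stride=1)
        self.act = act
        self.upper = upper

    def forward(self, x):
        # x: (N, 1, H, W) tensor of full-range HU values
        x = self.conv(x)  # per-window affine map W*x + b
        if self.act == "relu":
            # Linear windowing: upper-bounded ReLU, i.e. clamp to [0, upper]
            return torch.clamp(x, 0.0, self.upper)
        # Sigmoid windowing: sigmoid scaled by the upper limit
        return self.upper * torch.sigmoid(x)
```

The output of this module would then be fed to the downstream CNN (e.g. Inception-v3), so the convolution's weight and bias are updated by the same backpropagation pass as the classifier.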

Figure 2: Overall architecture of window setting optimization (WSO) module

2.3 Dataset

This retrospective, health insurance portability and accountability act (HIPAA)-compliant study was approved by the Institutional Review Board at our institution. All DICOM images were de-identified prior to this study. For ICH detection, a total of 904 non-contrast head CT examinations, including 625 ICH-positive and 279 ICH-negative cases, were acquired from our institutional Picture Archiving and Communication System (PACS) between June 2003 and July 2017. For urinary stone detection, we retrieved a total of 515 unenhanced abdominopelvic CT examinations, including 256 stone-negative and 279 stone-positive cases, from our PACS between January and October 2016. 2-dimensional (2D) axial slices of all head CT cases were annotated for the presence of ICH by five neuroradiologists by consensus, and axial slices from all abdominopelvic CT scans were labeled for the presence of urinary stones by a radiologist with 6 years of experience, with reference to the original radiology reports. 2D axial slices for both ICH and urinary stone detection were randomly split into train, validation, and test datasets by case, ensuring that no case appears in more than one subset (Table 2).

2.4 Experimental setup

We evaluated ten different classification models developed by training Inception-v3 (Szegedy et al., 2016) on various forms of input images, with and without WSO, for ICH and urinary stone detection. First, CT images with the full dynamic range of HU values were used for a baseline model. Images converted with either of two pre-defined window settings, as well as two-channel images generated with both, were also used as input to a CNN without a WSO module. The two pre-set window settings are the "brain" (WL=50HU, WW=100HU) and "subdural" (WL=50HU, WW=130HU) windows for ICH detection, and the "bone" (WL=300HU, WW=1500HU) and "abdomen" (WL=40HU, WW=400HU) windows for urinary stone detection. In addition, Inception-v3 equipped with WSO was trained on full-range DICOM images, with the weights and biases of the convolution layers in the WSO initialized according to the windowing function type. For this preliminary study, we set U to 255 and ε to 1, and the associated W and b were then computed using Eq. 1 for the linear and Eq. 2 for the sigmoid windowing function for each pre-defined window setting. All classification models were trained for 60 epochs using the Adam optimizer (Kingma and Ba, 2014) with default settings and a mini-batch size of 64. The base learning rate of 0.001 was decayed by a factor of 10 every 20 epochs, and the best models were selected based on the validation loss.
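The optimization schedule just described can be sketched as follows, assuming a PyTorch setup; the placeholder model stands in for the WSO-equipped Inception-v3, and `StepLR` implements the decay-by-10-every-20-epochs schedule:

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

# Placeholder model standing in for the WSO-equipped Inception-v3.
model = nn.Linear(8, 2)

optimizer = Adam(model.parameters(), lr=1e-3)           # Adam with default betas/eps
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)  # lr /= 10 every 20 epochs

for epoch in range(60):
    # ... one pass over mini-batches of size 64 would go here ...
    optimizer.step()   # placeholder update so the scheduler follows an optimizer step
    scheduler.step()
```

After 60 epochs the learning rate has been decayed three times, from 1e-3 down to 1e-6; model selection by validation loss would be checked once per epoch inside the loop.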

3 Results

Input | Windowing function | Initialization | ICH AP | ICH AUC | Stone AP | Stone AUC
Full-range HU values | - | - | 0.807 | 0.923 | 0.813 | 0.800
Windowed with WS1 | - | - | 0.925 | 0.963 | 0.920 | 0.917
Windowed with WS2 | - | - | 0.932 | 0.967 | 0.945 | 0.944
Windowed with WS1, WS2 | - | - | 0.934 | 0.969 | 0.946 | 0.946
Full-range HU values | Linear | WS1 | 0.929 | 0.963 | 0.926 | 0.924
Full-range HU values | Linear | WS2 | 0.933 | 0.966 | 0.943 | 0.934
Full-range HU values | Linear | WS1, WS2 | 0.940 | 0.970 | 0.951 | 0.946
Full-range HU values | Sigmoid | WS1 | 0.930 | 0.966 | 0.959 | 0.955
Full-range HU values | Sigmoid | WS2 | 0.939 | 0.971 | 0.970 | 0.970
Full-range HU values | Sigmoid | WS1, WS2 | 0.950 | 0.976 | 0.971 | 0.972
Table 1: Performance of ten different models trained with different input data, with and without the WSO module, for ICH and urinary stone detection. WS1 = "brain" and WS2 = "subdural" window settings for ICH; WS1 = "bone" and WS2 = "abdomen" window settings for urinary stone detection.

Table 1 shows the average precision (AP) and area under the ROC curve (AUC) evaluated on the test sets for the ten different models for ICH and urinary stone detection. Models trained on windowed images with pre-defined settings obtained significantly better performance than models trained on CT images with the full dynamic range of HU values for both classification tasks. Furthermore, model performance improved when WSO was used to optimize the window settings instead of fixing them to standard values, especially with the sigmoid windowing function whose parameters were initialized according to the two standard window settings.

Figure 3: Examples of DICOM and optimized windowed images

4 Discussion

In this study, we demonstrated that models with WSO achieved better performance for ICH and urinary stone detection on CT images than models using full-range DICOM images or images windowed with standard pre-defined settings. Furthermore, as shown in Figure 3, WSO enabled models to find optimal window settings that make the regions of hemorrhage and urinary stones (highlighted in yellow) more conspicuous against neighboring anatomical structures, maximizing classification performance. Our WSO models can be further optimized by investigating the effects of the number of input image channels, U, and ε on the performance of the target application. Additionally, we stress that the WSO-based approach described here is not specific to abnormality classification on CT images, but rather generalizable to various image interpretation tasks on a variety of medical imaging modalities.

5 Acknowledgements

We thank Claire Jeon and Samuel G. Finlayson for proofreading and constructive comments on the manuscript.


  • Anthimopoulos et al. (2016) Marios Anthimopoulos, Stergios Christodoulidis, Lukas Ebner, Andreas Christe, and Stavroula Mougiakakou. 2016. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE transactions on medical imaging 35, 5 (2016), 1207–1216.
  • Arbabshirani et al. (2018) Mohammad R Arbabshirani, Brandon K Fornwalt, Gino J Mongelluzzo, Jonathan D Suever, Brandon D Geise, Aalpen A Patel, and Gregory J Moore. 2018. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. npj Digital Medicine 1, 1 (2018), 9.
  • Bae et al. (2005) Kyongtae T Bae, Gita N Mody, Dennis M Balfe, Sanjeev Bhalla, David S Gierada, Fernando R Gutierrez, Christine O Menias, Pamela K Woodard, Jin Mo Goo, and Charles F Hildebolt. 2005. CT depiction of pulmonary emboli: display window settings. Radiology 236, 2 (2005), 677–684.
  • Berlyand et al. (2018) Yosef Berlyand, Ali S Raja, Stephen C Dorner, Anand M Prabhakar, Jonathan D Sonis, Ravi V Gottumukkala, Marc David Succi, and Brian J Yun. 2018. How artificial intelligence could transform emergency department operations. The American Journal of Emergency Medicine (2018).
  • Chang et al. (2018) PD Chang, E Kuoy, J Grinband, BD Weinberg, M Thompson, R Homo, J Chen, H Abcede, M Shafie, L Sugrue, et al. 2018. Hybrid 3D/2D Convolutional Neural Network for Hemorrhage Evaluation on Head CT. American Journal of Neuroradiology 39, 9 (2018), 1609–1616.
  • Chilamkurthy et al. (2018) Sasank Chilamkurthy, Rohit Ghosh, Swetha Tanamala, Mustafa Biviji, Norbert G Campeau, Vasantha Kumar Venugopal, Vidur Mahajan, Pooja Rao, and Prashant Warier. 2018. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. The Lancet (2018).
  • Esteva et al. (2017) Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. 2017. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 7639 (2017), 115.
  • Greenspan et al. (2016) Hayit Greenspan, Bram Van Ginneken, and Ronald M Summers. 2016. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging 35, 5 (2016), 1153–1159.
  • Gulshan et al. (2016) Varun Gulshan, Lily Peng, Marc Coram, Martin C Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, et al. 2016. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Jama 316, 22 (2016), 2402–2410.
  • Hoo-Chang et al. (2016) Shin Hoo-Chang, Holger R Roth, Mingchen Gao, Le Lu, Ziyue Xu, Isabella Nogues, Jianhua Yao, Daniel Mollura, and Ronald M Summers. 2016. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging 35, 5 (2016), 1285.
  • Kimpe and Tuytschaever (2007) Tom Kimpe and Tom Tuytschaever. 2007. Increasing the number of gray shades in medical display systems—how much is enough? Journal of digital imaging 20, 4 (2007), 422–432.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Levin et al. (2018) Scott Levin, Matthew Toerper, Eric Hamrock, Jeremiah S Hinson, Sean Barnes, Heather Gardner, Andrea Dugas, Bob Linton, Tom Kirsch, and Gabor Kelen. 2018. Machine-learning-based electronic triage more accurately differentiates patients with respect to clinical outcomes compared with the emergency severity index. Annals of emergency medicine 71, 5 (2018), 565–574.
  • Moise and Atkins (2004) Adrian Moise and M Stella Atkins. 2004. Design requirements for radiology workstations. Journal of Digital Imaging 17, 2 (2004), 92–99.
  • Szegedy et al. (2016) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818–2826.
  • Thrall et al. (2018) James H Thrall, Xiang Li, Quanzheng Li, Cinthia Cruz, Synho Do, Keith Dreyer, and James Brink. 2018. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. Journal of the American College of Radiology 15, 3 (2018), 504–508.
  • Yu et al. (2018) Kun-Hsing Yu, Andrew L. Beam, and Isaac S. Kohane. 2018. Artificial intelligence in healthcare. Nature Biomedical Engineering 2, 10 (2018), 719–731.


Data split | No ICH (cases / slices) | ICH (cases / slices) | No Stone (cases / slices) | Stone (cases / slices)
Train | 179 / 7484 | 525 / 5517 | 176 / 1179 | 199 / 1179
Validation | 50 / 2185 | 50 / 668 | 30 / 181 | 30 / 181
Test | 50 / 2139 | 50 / 613 | 50 / 347 | 50 / 347
Table 2: Data distribution for the ICH and urinary stone detection tasks