NiftyNet: a deep-learning platform for medical imaging

09/11/2017 ∙ by Eli Gibson, et al. ∙ 0

Medical image analysis and computer-assisted intervention problems are increasingly being addressed with deep-learning-based solutions. Established deep-learning platforms are flexible but do not provide specific functionality for medical image analysis and adapting them for this application requires substantial implementation effort. Thus, there has been substantial duplication of effort and incompatible infrastructure developed across many research groups. This work presents the open-source NiftyNet platform for deep learning in medical imaging. The ambition of NiftyNet is to accelerate and simplify the development of these solutions, and to provide a common mechanism for disseminating research outputs for the community to use, adapt and build upon. NiftyNet provides a modular deep-learning pipeline for a range of medical imaging applications including segmentation, regression, image generation and representation learning applications. Components of the NiftyNet pipeline including data loading, data augmentation, network architectures, loss functions and evaluation metrics are tailored to, and take advantage of, the idiosyncracies of medical image analysis and computer-assisted intervention. NiftyNet is built on TensorFlow and supports TensorBoard visualization of 2D and 3D images and computational graphs by default. We present 3 illustrative medical image analysis applications built using NiftyNet: (1) segmentation of multiple abdominal organs from computed tomography; (2) image regression to predict computed tomography attenuation maps from brain magnetic resonance images; and (3) generation of simulated ultrasound images for specified anatomical poses. NiftyNet enables researchers to rapidly develop and distribute deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications.



There are no comments yet.


page 16

page 18

page 19

Code Repositories


An open-source convolutional neural networks platform for research in medical image analysis and image-guided therapy

view repo


Brain tumor segmentation for MICCAI 2017 BraTS challenge

view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Computer-aided analysis of medical images plays a critical role at many stages of the clinical workflow from population screening and diagnosis to treatment delivery and monitoring. This role is poised to grow as analysis methods become more accurate and cost effective. In recent years, a key driver of such improvements has been the adoption of deep learning and convolutional neural networks in many medical image analysis and computer-assisted intervention tasks.

Deep learning refers to a deeply nested composition of many simple functions (principally linear combinations such as convolutions, scalar non-linearities and moment normalizations) parameterized by variables. The particular composition of functions, called the architecture, defines a parametric function (typically with millions of parameters) that can be optimized to minimize an objective, or ‘loss’, function, usually using some form of gradient descent.

Although the first use of neural networks for medical image analysis dates back more than twenty years (Lo et al., 1995), their usage has increased by orders of magnitude in the last five years. Recent reviews (Shen et al., 2017; Litjens et al., 2017) have highlighted that deep learning has been applied to a wide range of medical image analysis tasks (segmentation, classification, detection, registration, image reconstruction, enhancement, etc.) across a wide range of anatomical sites (brain, heart, lung, abdomen, breast, prostate, musculature, etc.). Although each of these applications have their own specificities, there is substantial overlap in software pipelines implemented by many research groups.

Deep-learning pipelines for medical image analysis comprise many interconnected components. Many of these are common to all deep-learning pipelines:

  • separation of data into training, testing and validation sets;

  • randomized sampling during training;

  • image data loading and sampling;

  • data augmentation;

  • a network architecture defined as the composition of many simple functions;

  • a fast computational framework for optimization and inference;

  • metrics for evaluating performance during training and inference.

In medical image analysis, many of these components have domain specific idiosyncrasies, detailed in Section 4. For example, medical images are typically stored in specialized formats that handle large 3D images with anisotropic voxels and encode additional spatial information and/or patient information, requiring different data loading pipelines. Processing large volumetric images has high memory requirements and motivates domain-specific memory-efficient networks or custom data sampling strategies. Images are often acquired in standard anatomical views and can represent physical properties quantitatively, motivating domain-specific data augmentation and model priors. Additionally, the clinical implications of certain errors may warrant custom evaluation metrics. Independent reimplementation of all of this custom infrastructure results in substantial duplication of effort, poses a barrier to dissemination of research tools and inhibits fair comparisons between competing methods.

This work presents the open-source NiftyNet555Available at platform to 1) facilitate efficient deep learning research in medical image analysis and computer-assisted intervention; and 2) reduce duplication of effort. The NiftyNet platform comprises an implementation of the common infrastructure and common networks used in medical imaging, a database of pre-trained networks for specific applications and tools to facilitate the adaptation of deep learning research to new clinical applications with a shallow learning curve.

2 Background

The development of common software infrastructure for medical image analysis and computer-assisted intervention has a long history. Early efforts included the development of medical imaging file formats (e.g. ACR-NEMA (1985), Analyze 7.5 (1986), DICOM (1992) MINC (1992), and NIfTI (2001)). Toolsets to solve common challenges such as registration (e.g. NiftyReg (Modat et al., 2010), ANTs (Avants et al., 2011) and elastix (Klein et al., 2010)), segmentation (e.g. NiftySeg (Cardoso et al., 2012)), and biomechanical modeling (e.g. (Johnsen et al., 2015)) are available for use as part of image analysis pipelines. Pipelines for specific research applications such as FSL (Smith et al., 2004) for functional MRI analysis and Freesurfer (Fischl et al., 1999; Dale et al., 1999) for structural neuroimaging have reached widespread use. More general toolkits offering standardized implementations of algorithms (VTK and ITK (Pieper et al., 2006)) and application frameworks (NifTK (Clarkson et al., 2015), MITK (Nolden et al., 2013) and 3D Slicer (Pieper et al., 2006)) enable others to build their own pipelines. Common software infrastructure has supported and accelerated medical image analysis and computer-assisted intervention research across hundreds of research groups. However, despite the wide availability of general purpose deep learning software tools, deep learning technology has limited support in current software infrastructure for medical image analysis and computer-assisted intervention.

Software infrastructure for general purpose deep learning is a recent development. Due to the high computational demands of training deep learning models and the complexity of efficiently using modern hardware resources (general purpose graphics processing units and distributed computing, in particular), numerous deep learning libraries and platforms have been developed and widely adopted, including cuDNN (Chetlur et al., ), TensorFlow (Abadi et al., 2016)

, Theano  

(Bastien et al., 2012)

, Caffe 

(Jia et al., 2014)

, Torch 

(Collobert et al., 2011), CNTK (Seide and Agarwal, 2016), and MatConvNet (Vedaldi and Lenc, 2015).

These platforms facilitate the definition of complex deep learning networks as compositions of simple functions, hide the complexities of differentiating the objective function with respect to trainable parameters during training, and execute efficient implementations of performance-critical functions during training and inference. These frameworks have been optimized for performance and flexibility, and using them directly can be challenging, inspiring the development of platforms that simplify the development process for common usage scenarios, such as Keras 

(Chollet et al., 2015), and TensorLayer (Dong et al., 2017) for TensorFlow and Lasagne (Dieleman et al., 2015) for Theano. However, by avoiding assumptions about the application to remain general, the platforms are unable to provide specific functionality for medical image analysis and adapting them for this domain of application requires substantial implementation effort.

Developed concurrently with the NiftyNet platform, the Deep Learning Toolkit666 aims to support fast prototyping and reproducibility by implementing deep learning methods and modules for medical image analysis. While still in preliminary development, it appears to focus on deep learning building blocks rather than analysis pipelines. NifTK (Clarkson et al., 2015; Gibson et al., 2017c) and Slicer3D (via the DeepInfer (Mehrtash et al., 2017) plugin) provide infrastructure for distribution of trained deep learning pipelines. Although this does not address the substantial infrastructure needed for training deep learning pipelines, integration with existing medical image analysis infrastructure and modular design makes these platforms promising routes for distributing deep-learning pipelines.

3 Typical deep learning pipeline

Figure 1: Data flow implemented in typical deep learning projects. Boxes represent the software infrastructure to be developed and arrows represent the data flow.

Deep learning adopts the typical machine learning pipeline consisting of three phases: model selection (picking and fitting a model on training data), model evaluation (measuring the model performance on testing data), and model distribution (sharing the model for use on a wider population). Within these simple phases lies substantial complexity, illustrated in Figure 

1. The most obvious complexity is in implementing the network being studied. Deep neural networks generally use simple functions, but compose them in complex hierarchies; researchers must implement the network being tested, as well as previous networks (often incompletely specified) for comparison. To train, evaluate and distribute these networks, however, requires further infrastructure. Data sets must be correctly partitioned to avoid biassed evaluations, sometimes considering data correlations (e.g. images acquired at the same hospital may be more similar to each other than to those from other hospitals). The data must be sampled, loaded and passed to the network, in different ways depending on the phase of the pipeline. Algorithms for tuning hyper-parameters within a family of models and optimizing model parameters on the training data are needed. Logging and visualization are needed to debug and dissect models during and after training. In applications with limited data, data sets must be augmented by perturbing the training data in realistic ways to prevent over-fitting. In deep learning, it is common practice to adapt previous network architectures, trained or untrained, in part or in full for similar or different tasks; this requires a community repository (popularly called a model zoo) storing models and parameters in an adaptable format. Much of this infrastructure is recreated by each researcher or research group undertaking a deep learning project, and much of it depends on the application domain being addressed.

4 Design considerations for deep learning in medical imaging

Medical image analysis differs from other domains where deep learning is applied due to characteristics of the data itself, and the applications in which they are used. In this section, we present the domain-specific requirements driving the design of NiftyNet.

4.1 Data availability

Acquiring, annotating and distributing medical image data sets have higher costs than in many computer vision tasks. For many medical imaging modalities, generating an image is costly. Annotating images for many applications requires high levels of expertise from clinicians with limited time. Additionally, due to privacy concerns, sharing data sets between institutions, let alone internationally, is logistically and legally challenging. Although recent tools such as DeepIGeoS 

(Wang et al., 2017b) for semi-automated annotation and GIFT-Cloud (Doel et al., 2017)

for data sharing are beginning to reduce these barriers, typical data sets remain small. Using smaller data sets increases the importance of data augmentation, regularization, and cross-validation to prevent over-fitting. The additional cost of data set annotation also places a greater emphasis on semi- and unsupervised learning.

4.2 Data dimensionality and size

Data dimensionality encountered in medical image analysis and computer-assisted intervention typically ranges from 2D to 5D. Many medical images, including MRI, CT, PET and SPECT, capture volumetric images. Longitudinal imaging (multiple images taken over time) is typical in interventional settings as well as clinically useful for measuring organ function (e.g. blood ejection fraction in cardiac imaging) and disease progression (e.g. cortical thinning in neurodegenerative diseases).

At the same time, capturing high-resolution data in multiple dimensions is often necessary to detect small but clinically important anatomy and pathology. The combination of these factors results in large data sizes for each sample, which impact computational and memory costs. Deep learning in medical imaging uses various strategies to account for this challenge. Many networks are designed to use partial images: 2D slices sampled along one axis from 3D images (Zhou et al., 2016), 3D subvolumes (Li et al., 2017), anisotropic convolution Wang et al. (2017a), or combinations of subvolumes along multiple axes (Roth et al., 2014). Other networks use multi-scale representations allowing deeper and wider networks on lower-resolution representations (Milletari et al., 2016; Kamnitsas et al., 2017). A third approach uses dense networks to reuse feature representations multiple times in the network (Gibson et al., 2017b). Smaller batch sizes can reduce the memory cost, but rely on different weight normalization functions such as batch renormalization (Ioffe, 2017), weight normalization (Salimans and Kingma, 2016) or layer normalization (Ba et al., 2016).

4.3 Data formatting

Data sets in medical imaging are typically stored in different formats than in many computer vision tasks. To support the higher-dimensional medical image data, specialized formats have been adopted (e.g. DICOM, NIfTI, Analyze). These formats frequently also store metadata that is critical to image interpretation, including spatial information (anatomical orientation and voxel anisotropy), patient information (demographics and identifiers), and acquisition information (modality types and scanner parameters). These medical imaging specific data formats are typically not supported by existing deep learning frameworks, requiring custom infrastructure for loading images.

4.4 Data properties

The characteristic properties of medical image content pose opportunities and challenges. Medical images are obtained under controlled conditions, allowing more predictable data distributions. In many modalities, images are calibrated such that spatial relationships and image intensities map directly to physical quantities and are inherently normalized across subjects. For a given clinical workflow, image content is typically consistent, potentially enabling the characterization of plausible intensity and spatial variation for data augmentation. However, some clinical applications introduce additional challenges. Because small image features can have large clinical importance, and because some pathology is very rare but life-threatening, medical image analysis must deal with large class imbalances, motivating special loss functions (Milletari et al., 2016; Fidon et al., 2017; Sudre et al., 2017). Furthermore, different types of error may have very different clinical impacts, motivating specialized loss functions and evaluation metrics (e.g. spatially weighted segmentation metrics). Applications in computer-assisted intervention where analysis results are used in real time (e.g.  Gibson et al. (2017c); Garcia-Peraza-Herrera et al. (2017)) have additional constraints on analysis latency.

5 NiftyNet: a platform for deep learning in medical imaging

The NiftyNet platform aims to augment the current deep learning infrastructure to address the ideosyncracies of medical imaging described in Section 4, and lower the barrier to adopting this technology in medical imaging applications. NiftyNet is built using the TensorFlow library, which provides the tools for defining computational pipelines and executing them efficiently on hardware resources, but does not provide any specific functionality for processing medical images, or high-level interfaces for common medical image analysis tasks. NiftyNet provides a high-level deep learning pipeline with components optimized for medical imaging applications (data loading, sampling and augmentation, networks, loss functions, evaluations, and a model zoo) and specific interfaces for medical image segmentation, classification, regression, image generation and representation learning applications.

5.1 Design goals

The design of NiftyNet follows several core principles which support a set of key requirements:

  • support a wide variety of application types in medical image analysis and computer-assisted intervention;

  • enable research in one aspect of the deep learning pipeline without the need for recreating the other parts;

  • be simple to use for common use cases, but flexible enough for complex use cases;

  • support built-in TensorFlow features (parallel processing, visualization) by default;

  • support best practices (data augmentation, data set separation) by default;

  • support model distribution and adaptation.

5.2 System overview

Figure 2: A brief overview of NiftyNet components.

The NiftyNet platform comprises several modular components. The NiftyNet ApplicationDriver defines the common structure across all applications, and is responsible for instantiating the data analysis pipeline and distributing the computation across the available computational resources. The NiftyNet Application classes encapsulate standard analysis pipelines for different medical image analysis applications, by connecting four components: a Reader to load data from files, a Sampler to generate appropriate samples for processing, a Network to process the inputs, and an output handler (comprising the Loss and Optimizer during training and an Aggregator during inference and evaluation). The Sampler includes sub-components for data augmentation. The Network includes sub-components representing individual network blocks or larger conceptual units. These components are briefly depicted in Figure 2 and detailed in the following sections.

As a concrete illustration, one instantiation of the SegmentationApplication could use the following modules. During training, it could use a UniformSampler to generate small image patches and corresponding labels; a vnet Network would process batches of images to generate segmentations; a Dice LossFunction

would compute the loss used for backpropagation using the

Adam Optimizer. During inference, it could use a GridSampler to generate a set of non-overlapping patches to cover the image to segment, the same network to generate corresponding segmentations, and a GridSamplesAggregator to aggregate the patches into a final segmentation.

5.3 Component details: ApplicationDriver class

The NiftyNet ApplicationDriver defines the common structure for all NiftyNet pipelines. It is responsible for instantiating the data and Application objects and distributing the workload across and recombining results from the computational resources (potentially including multiple CPUs and GPUs). It is also responsible for handling variable initialization, variable saving and restoring, and logging. Implemented as a template design pattern (Gamma et al., 1994), the ApplicationDriver delegates application-specific functionality to separate Application classes.

The ApplicationDriver can be configured from the command line or programmatically using a human-readable configuration file. This file contains the data set definitions and all the settings that deviate from the defaults. When the ApplicationDriver saves its progress, the full configuration (including default parameters) is also saved so that the analysis pipeline can be recreated to continue training or carry out inference internally or with a distributed model.

5.4 Component details: Application class

Medical image analysis encompasses a wide range of tasks for different parts of the pre-clinical and clinical workflow: segmentation, classification, detection, registration, reconstruction, enhancement, model representation and generation. Different applications use different types of inputs and outputs, different networks, and different evaluation metrics; however, there is common structure and functionality among these applications supported by NiftyNet. NiftyNet currently supports

  • image segmentation,

  • image regression,

  • image model representation (via auto-encoder applications), and

  • image generation (via auto-encoder and generative adversarial networks (GANs)),

and it is designed in a modular way to support the addition of new application types, by encapsulating typical application workflows in Application classes.

The Application class defines the required data interface for the Network and Loss, facilitates the instantiation of appropriate Sampler and output handler objects, connects them as needed for the application, and specifies the training regimen. For example, the SegmentationApplication specifies that networks accept images (or patches thereof) and generate corresponding labels, that losses accept generated and reference segmentations and an optional weight map, and that the optimizer trains all trainable variables in each iteration. In contrast, the GANApplication

specifies that networks accept a noise source, samples of real data and an optional conditioning image, losses accept logits denoting if a sample is real or generated, and the optimizer alternates between training the discriminator sub-network and the generator sub-network.

5.5 Component details: Networks and Layers

The complex composition of simple functions that comprise a deep learning architecture is simplified in typical networks by the repeated reuse of conceptual blocks. In NiftyNet, these conceptual blocks are represented by encapsulated Layer classes, or inline using TensorFlow’s scoping system. Composite layers, and even entire networks, can be constructed as simple compositions of NiftyNet layers and TensorFlow operations. This supports the reuse of existing networks by clearly demarcating conceptual blocks of code that can be reused and assigning names to corresponding sets of variables that can be reused in other networks (detailed in Section 5.10). This also enables automatic support for visualization of the network graph as a hierarchy at different levels of detail using the TensorBoard visualizer (Mané et al., 2015) as shown in Figure 3. Following the model used in Sonnet (Reynolds et al., 2017), Layer objects define a scope upon instantiation, which can be reused repeatedly to allow complex weight-sharing without breaking encapsulation.

Figure 3: TensorBoard visualization of a NiftyNet generative adversarial network. TensorBoard interactively shows the composition of conceptual blocks (rounded rectangles) and their interconnections (grey lines) and color-codes similar blocks. Above, the generator and discriminator blocks and one of the discriminator’s residual blocks are expanded. Font and block sizes were edited for readability.

5.6 Component details: data loading

The Reader class is responsible for loading corresponding image files from medical file formats for a specified data set, and applying image-wide preprocessing. For simple use cases, NiftyNet can automatically identify corresponding images in a data set by searching a specified file path and matching user-specified patterns in file names, but it also allows explicitly tabulated comma-separated value files for more complex data set structures (e.g. cross-validation studies). Input and output of medical file formats are already supported in multiple existing Python libraries, although each library supports different sets of formats. To facilitate a wide range of formats, NiftyNet uses nibabel (Brett et al., 2016) as a core dependency but can fall back on other libraries (e.g. SimpleITK (Lowekamp et al., 2013) if they are installed and a file format is not supported by nibabel. A pipeline of image-wide preprocessing functions, described in Section 5.8, is applied to each image before samples are taken.

5.7 Component details: Samplers and output handlers

To handle the breadth of applications in medical image analysis and computer-assisted intervention, NiftyNet provides flexibility in mapping from an input data set into packets of data to be processed and from the processed data into useful outputs. The former is encapsulated in Sampler classes, and the latter is encapsulated in output handlers. Because the sampling and output handling are tightly coupled and depend on the action being performed (i.e. training, inference or evaluation), the instantiation of matching Sampler objects and output handlers is delegated to the Application class.

Sampler objects generate a sequence of packets of corresponding data for processing. Each packet contains all the data for one independent computation (e.g. one step of gradient descent during training), including images, labels, classifications, noise samples or other data needed for processing. During training, samples are taken randomly from the training data, while during inference and evaluation the samples are taken systematically to process the whole data set. To feed these samples to TensorFlow, NiftyNet automatically takes advantage of TensorFlow’s data queue support: data can be loaded and sampled in multiple CPU threads, combined into mini-batches and consumed by one or more GPUs. NiftyNet includes Sampler classes for sampling image patches (uniformly or based on specified criteria), sampling whole images rescaled to a fixed size and sampling noise; and it supports composing multiple Sampler objects for more complex inputs.

Output handlers take different forms during training and inference. During training, the output handler takes the network output, computes a loss and the gradient of the loss with respect to the trainable variables, and uses an inlinecodeOptimizer to iteratively train the model. During inference, the output handler generates useful outputs by aggregating one or more network outputs and performing any necessary postprocessing (e.g. resizing the outputs to the original image size). NiftyNet currently supports Aggregator objects for combining image patches, resizing images, and computing evaluation metrics.

5.8 Component details: data normalization and augmentation

Data normalization and augmentation are two approaches to compensating for small training data sets in medical image analysis, wherein the training data set is too sparse to represent the variability in the distribution of images. Data normalization reduces the variability in the data set by transforming inputs to have specified invariant properties, such as fixed intensity histograms or moments (mean and variance). Data augmentation artificially increases the variability of the training data set by introducing random perturbations during training, for example applying random spatial transformations or adding random image noise. In NiftyNet, data augmentation and normalization are implemented as

Layer classes applied in the Sampler, as plausible data transformations will vary between applications. Some of these layers, such as histogram normalization, are data dependent; these layers compute parameters over the data set before training begins. NiftyNet currently supports mean, variance and histogram intensity data normalization, and flip, rotation and scaling spatial data augmentation.

5.9 Component details: data evaluation

Summarizing and comparing the performance of image analysis pipelines typically rely on standardized descriptive metrics and error metrics as surrogates for performance. Because individual metrics are sensitive to different aspects of performance, multiple metrics are reported together. Reference implementations of these metrics reduce the burden of implementation and prevent implementation inconsistencies. NiftyNet currently supports the calculation of descriptive and error metrics for segmentation. Descriptive statistics include spatial metrics (e.g. volume, surface/volume ratio, compactness) and intensity metrics (e.g. mean, quartiles, skewness of intensity). Error metrics, computed with respect to a reference segmentation, include overlap metrics (e.g. Dice and Jaccard scores; voxel-wise sensitivity, specificity and accuracy), boundary distances (e.g. mean absolute distance and Hausdorff distances) and region-wise errors (e.g. detection rate; region-wise sensitivity, specificity and accuracy).

5.10 Component details: model zoo for network reusability

To support the reuse of network architectures and trained models, many deep learning platforms host a database of existing trained and untrained networks in a standardized format, called a model zoo. Trained networks can be used directly (as part of a workflow or for performance comparisons), fine-tuned for different data distributions (e.g. a different hospital’s images), or used to initialize networks for other applications (i.e. transfer learning). Untrained networks or conceptual blocks can be used within new networks. NiftyNet provides several mechanisms to support the distribution and reuse of networks and conceptual blocks.

Trained NiftyNet networks can be restored directly using configuration options. Trained networks developed outside of NiftyNet can be adapted to NiftyNet by encapsulating the network within a Network class derived from TrainableLayer. Externally trained weights can be loaded within NiftyNet using a restoreinitializer, adapted from Sonnet (Reynolds et al., 2017), for the complete network or individual conceptual blocks. restoreinitializer initializes the network weights with those stored in a specified checkpoint, and supports variablescope renaming for checkpoints with incompatible scope names. Smaller conceptual blocks, encapsulated in Layer classes, can be reused in the same way. Trained networks incorporating previous networks are saved in a self-contained form to minimize dependencies.

The NiftyNet model zoo contains both untrained networks (e.g. unet (Çiçek et al., 2016) and vnet (Milletari et al., 2016) for segmentation), as well as trained networks for some tasks (e.g. densevnet (Gibson et al., 2017a) for multi-organ abdominal CT segmentation, wnet (Wang et al., 2017a) for brain tumor segmentation and simulatorgan (Hu et al., 2017) for generating ultrasound images). Model zoo entries should follow a standard format comprising:

  • Python source code defining any components not included in NiftyNet (e.g. external Network classes, Loss functions);

  • an example configuration file defining the default settings and the data ordering;

  • documentation describing the network and assumptions on the input data (e.g. dimensionality, shape constraints, intensity statistic assumptions).

For trained networks, it should also include:

  • a Tensorflow checkpoint containing the trained weights;

  • documentation describing the data used to train the network and on which the trained network is expected to perform adequately.

5.11 Platform processes

In addition to the implementation of common functionality, NiftyNet development has adopted good software development processes to support the ease-of-use, robustness and longevity of the platform as well as the creation of a vibrant community. The platform supports easy installation via the pip installation tool777 (i.e. pip install niftynet) and provides analysis pipelines that can be run as part of the command line interface. Examples demonstrating the platform in multiple use cases are included to reduce the learning curve. The NiftyNet repository uses continuous integration incorporating system and unit tests for regression testing. NiftyNet releases will follow the semantic versioning 2.0 standard (Preston-Werner, 2015) to ensure clear communication regarding backwards compatibility.

6 Results: illustrative applications

6.1 Abdominal organ segmentation

Segmentations of anatomy and pathology on medical images can support image-guided interventional workflows by enabling the visualization of hidden anatomy and pathology during surgical navigation. Here we present an example, based on a simplified version of (Gibson et al., 2017a), that illustrates the use of NiftyNet to train a Dense V-network segmentation network to segment organs on abdominal CT that are important to pancreatobiliary interventions: the gastrointestinal tract (esophagus, stomach and duodenum), the pancreas, and anatomical landmark organs (liver, left kidney, spleen and stomach).

The data used to train the network comprised 90 abdominal CT with manual segmentations from two publicly available data sets (Landman et al., 2015; Roth et al., 2016), with additional manual segmentations performed at our centre.

Reference standard NiftyNet segmentation
Figure 4: Reference standard (left) and NiftyNet (right) multi-organ abdominal CT segmentation for the subject with Dice scores closest to the median. Each segmentation is shown with a surface rendering view from the posterior direction and with organ labels overlaid on a transverse CT slice.
Dice Relative Mean 95th Percentile
score Volume Absolute Hausdorff
Difference Distance Distance
(voxels) (voxels)
Spleen 0.94 0.03 1.07 2.00
L. Kidney 0.93 0.04 1.06 3.00
Gallbladder 0.79 0.17 1.55 4.41
Esophagus 0.68 0.57 2.05 6.00
Liver 0.95 0.02 1.42 4.12
Stomach 0.87 0.09 2.06 8.88
Pancreas 0.75 0.19 1.93 7.62
Duodenum 0.62 0.24 3.05 12.47
Table 1: Median segmentation metrics for 8 organs aggregated over the 9-fold cross-validation.

The network was trained and evaluated in a 9-fold cross-validation, using the network implementation available in NiftyNet. Briefly, the network, available as densevnet in NiftyNet, uses a V-shaped structure (with downsampling, upsampling and skip connections) where each downsampling stage is a dense feature stack (i.e. a sequence of convolution blocks where the inputs are concatenated features from all preceding convolution blocks), upsampling is bilinear upsampling and skip connections are convolutions. The loss is a modified Dice loss (with additional hinge losses to mitigate class imbalance) implemented external to NiftyNet and included via a reference in the configuration file. The network was trained for 3000 iterations on whole images (using the ResizeSampler) with random affine spatial augmentations.

Segmentation metrics, computed using NiftyNet’s evaluation action, and aggregated over all folds, are given in Table 1. The segmentation with Dice scores closest to the median is shown in Figure 4.

6.2 Image regression

Image regression, more specifically, the ability to predict the content of an image given a different imaging modality of the same object, is of paramount importance in real-world clinical workflows. Image reconstruction and quantitative image analysis algorithms commonly require a minimal set of inputs that are often not be available for every patient due to the presence of imaging artefacts, limitations in patient workflow (e.g. long acquisition time), image harmonization, or due to ionising radiation exposure minimization.

An example application of image regression is the process of generating synthetic CT images from MRI data to enable the attenuation correction of PET-MRI images (Burgos et al., 2014). This regression problem has been historically solved with patch-based or multi-atlas propagation methods, a class of models that are very robust but computationally complex and dependent on image registration. The same process can now be solved using the deep learning architectures similar to the ones used in image segmentation.

As a demonstration of this application, a neural network was trained and evaluated in a 5-fold cross-validation setup using the net_regress application in NiftyNet. Briefly, the network, available as highresnet in NiftyNet, uses a stack of residual dilated convolutions with increasingly large dilation factors (Li et al., 2017). The root mean square error was used as the loss function and implemented as part of NiftyNet as rmse. The network was trained for 15000 iterations on patches of size , and using the iSampler (Berger et al., 2017) for patch selection with random affine spatial augmentations.

Regression metrics, computed using NiftyNet’s ‘evaluation‘ action, and aggregated over all folds, are given in Table 2. The 25th and 75th percentile example result with regards to MAE is shown in Figure 5.

MRI Ground-truth CT Synthetic CT
Figure 5: The input T1 MRI image (left), the ground truth CT (centre) and the NiftyNet regression output (right).
NiftyNet pCT UTE
MAE Average 88 121 203
S.D 7.5 17 24
ME Average 9.1 -7.3 -132
S.D. 12 23 34
Table 2: The Mean Absolute Error (MAE) and the Mean Error (ME) between the ground truth and the pseudoCT in Hounsfield units, comparing the NiftyNet method with pCT  (Burgos et al., 2014) and the UTE-based method of the Siemens Biograph mMR.

6.3 Ultrasound simulation using generative adversarial networks

Generating plausible images with specified image content can support training for radiological or image-guided interventional tasks. Conditional GANs have shown promise for generating plausible photographic images (Mirza and Osindero, 2014). Recent work on spatially-conditioned GANs (Hu et al., 2017) suggests that conditional GANs could enable software-based simulation in place of costly physical ultrasound phantoms used for training. Here we present an example illustrating a pre-trained ultrasound simulation network that was ported to NiftyNet for inclusion in the NiftyNet model zoo.

The network was originally trained outside of the NiftyNet platform as described in (Hu et al., 2017)

. Briefly, a conditional GAN network was trained to generate ultrasound images of specified views of a fetal phantom using 26,000 frames of optically tracked ultrasound. An image can be sampled from the generative model based on a conditioning image (denoting the pixel coordinates in 3D space) and a model parameter (sampled from a 100-D Gaussian distribution).

The network was ported to NiftyNet for inclusion in the model zoo. The network weights were transferred to the NiftyNet network using NiftyNet’s restoreinitializer, adapted from Sonnet (Reynolds et al., 2017), which enables trained variables to be loaded from networks with different architectures or naming schemes.

The network was evaluated multiple times using the linearinterpolation

inference in NiftyNet, wherein samples are taken from the generative model based on one conditioning image and a sequence of model parameters evenly interpolated between two random samples. Two illustrative results are shown in Figure 

6. The first shows the same anatomy, but a smooth transition between different levels of ultrasound shadowing artifacts. The second shows a sharp transition in the interpolation, suggesting the presence of mode collapse, a common issue in GANs (Goodfellow, 2016).

Figure 6: Interpolated images from the generative model space based on linearly interpolated model parameters. The top row shows a smooth variation between different amounts of ultrasound shadow artefacts. The bottom row shows a sharp transition suggesting the presence of mode collapse in the generative model.

7 Discussion

7.1 Lessons learned

NiftyNet development was guided by several core principles that impacted the implementation. Maximizing simplicity for simple use cases motivated many implementation choices. We envisioned three categories of users: novice users who are comfortable with running applications, but not with writing new Python code, intermediate users who are comfortable with writing some code, but not with modifying the NiftyNet libraries, and advanced users who are comfortable with modifying the libraries. Support for pip installation simplifies NiftyNet for novice and intermediate users. In this context, enabling experimental manipulation of individual pipeline components for intermediate users, and downloadable model zoo entries with modified components for novice users required a modular approach with plugin support for externally defined components. Accordingly, plugins for networks, loss functions and even application logic can be specified by Python import paths directly in configuration files without modifying the NiftyNet library. Intermediate users can customize pipeline components by writing classes or functions in Python, and can embed them into model zoo entries for distribution.

Although initially motivated by simplifying variable sharing within networks, NiftyNet’s named conceptual blocks also simplified the adaptation of weights from pre-trained models and the TensorBoard-based hierarchical visualization of the computation graphs. The scope of each conceptual blocks maps to a meaningful subgraph of the computation graph and all associated variables, meaning that all weights for a conceptual block can be loaded into a new model with a single scope reference. Furthermore, because these conceptual blocks are constructed hierarchically through the composition of Layer objects and scopes, they naturally encode a hierarchical structure for TensorBoard visualization

Supporting machine learning for a wide variety of application types motivated the separation of the ApplicationDriver logic that is common to all applications from the Application logic that varies between applications. This facilitated the rapid development of new application types. The early inclusion of both image segmentation/regression (mapping from images to images) and image generation (mapping from parameters to images) motivated a flexible specification for the number, type and semantic meaning of inputs and outputs, encapsulated in the Sampler and Aggregator components.

7.2 Platform availability

The NiftyNet platform is available from The source code can be accessed from the Git repository888 or installed as a Python library using pip install niftynet. NiftyNet is licensed under an open-source Apache 2.0 license999 The NiftyNet Consortium welcomes contributions to the platform and seeks inclusion of new community members to the consortium.

7.3 Future direction

The active NiftyNet development roadmap is focused on three key areas: new application types, a larger model zoo and more advanced experimental design. NiftyNet currently supports image segmentation, regression, generation and representation learning applications. Future applications under development include image classification, registration, and enhancement (e.g. super-resolution) as well as pathology detection. The current NiftyNet model zoo contains a small number of models as proof of concept; expanding the model zoo to include state-of-the-art models for common tasks and public challenges (e.g. brain tumor segmentation (BRaTS) 

(Menze et al., 2015; Wang et al., 2017a)); and models trained on large data sets for transfer learning will be critical to accelerating research with NiftyNet. Finally, NiftyNet currently supports a simplified machine learning pipeline that trains a single network, but relies on users for data partitioning and model selection (e.g. hyper-parameter tuning). Infrastructure to facilitate more complex experiments, such as built-in support for cross-validation and standardized hyper-parameter tuning will, in the future, reduce the implementation burden on users.

8 Summary of contributions and conclusions

This work presents the open-source NiftyNet platform for deep learning in medical imaging. Our modular implementation of the typical medical imaging machine learning pipeline allows researchers to focus implementation effort on their specific innovations, while leveraging the work of others for the remaining pipeline. The NiftyNet platform provides implementations for data loading, data augmentation, network architectures, loss functions and evaluation metrics that are tailored for the idiosyncracies of medical image analysis and computer-assisted intervention. This infrastructure enables researchers to rapidly develop deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications.

Conflict of interest



The authors would like to acknowledge all of the contributors to the NiftyNet platform. This work was supported by the Wellcome/EPSRC [203145Z/16/Z, WT101957, NS/A000027/1]; Wellcome [106882/Z/15/Z, WT103709]; the Department of Health and Wellcome Trust [HICF-T4-275, WT 97914]; EPSRC [EP/M020533/1, EP/K503745/1, EP/L016478/1]; the National Institute for Health Research University College London Hospitals Biomedical Research Centre (NIHR BRC UCLH/UCL High Impact Initiative); Cancer Research UK (CRUK) [C28070/A19985]; the Royal Society [RG160569]; a UCL Overseas Research Scholarship, and a UCL Graduate Research Scholarship. The authors would like to acknowledge that the work presented here made use of Emerald, a GPU-accelerated High Performance Computer, made available by the Science & Engineering South Consortium operated in partnership with the STFC Rutherford-Appleton Laboratory; and hardware donated by NVIDIA.