Learning to synthesise the ageing brain without longitudinal data

12/04/2019 ∙ by Tian Xia, et al.

Brain ageing is a continuous process that is affected by many factors including neurodegenerative diseases. Understanding this process is of great value for both neuroscience research and clinical applications. However, revealing underlying mechanisms is challenging due to the lack of longitudinal data. In this paper, we propose a deep learning-based method that learns to simulate subject-specific brain ageing trajectories without relying on longitudinal data. Our method synthesises aged images using a network conditioned on two clinical variables: age as a continuous variable, and health state, i.e. status of Alzheimer's Disease (AD) for this work, as an ordinal variable. We adopt an adversarial loss to learn the joint distribution of brain appearance and clinical variables and define reconstruction losses that help preserve subject identity. To demonstrate our model, we compare with several approaches using two widely used datasets: Cam-CAN and ADNI. We use ground-truth longitudinal data from ADNI to evaluate the quality of synthesised images. A pre-trained age predictor, which estimates the apparent age of a brain image, is used to assess age accuracy. In addition, we show that we can train the model on Cam-CAN data and evaluate on the longitudinal data from ADNI, indicating the generalisation power of our approach. Both qualitative and quantitative results show that our method can progressively simulate the ageing process by synthesising realistic brain images.


I Introduction

The ability to predict the future status of an individual can offer preventive and prognostic clinical guidance [zhang2016consistent]. Recently, deep generative models have been used to simulate and predict pathological processes in different clinical applications, for example to synthesise the future degeneration of a human brain from existing scans [huizinga2018spatio, ravi2019degenerative, bowles2018modelling, ziegler2012models]. However, to achieve accurate subject-specific estimates, current methods require a considerable amount of longitudinal data to sufficiently learn a complex auto-regression function. Here, we propose a new conditional adversarial training procedure that does not require longitudinal data for training. Our model (illustrated in Fig. 1) synthesises images of aged brains for a specific age and health state.

Brain ageing, accompanied by a series of functional and physiological changes, has been intensively investigated [zecca2004iron, mattson2018hallmarks]. However, the underlying mechanisms have not been completely revealed [lopez2013hallmarks, cole2019brain]. Prior studies have shown that the brain’s chronic changes are related to different factors, e.g. biological age [fjell2010structural], degenerative diseases such as Alzheimer’s Disease (AD) [jack1998rate], binge drinking [coleman2014adolescent], and even education [taubert2010dynamic]. Accurate simulation of this process has great value for both neuroscience research and clinical applications to identify age-related pathologies [cole2019brain, fjell2010structural]. In this work, we aim to build a multivariate model that, given a brain image of a subject at a younger age, predicts the brain image of the same subject at an older age.

One particular challenge is inter-subject variation: every individual has a unique ageing trajectory. Previous approaches built a spatio-temporal atlas to predict average brain images at different ages [davis2010population, huizinga2018spatio]. However, the learnt atlas does not preserve subject-specific characteristics. Recent studies proposed subject-specific ageing progression with neural networks [ravi2019degenerative, rachmadi2019predicting], although they required longitudinal data to train. Such data are difficult and expensive to acquire, especially over long time spans. Even in ADNI [petersen2010alzheimer], the most well-known large-scale dataset, the longitudinal images are acquired at discrete time points and cover only a few years. Finding longitudinal data with a time span sufficient to simulate long-term ageing remains an open challenge.

Fig. 1: Left: the input is a brain image $X_i$, and the network synthesises an aged brain image $X_o$ from $X_i$, conditioned on the target health state vector $d_o$ and the target age difference $\Delta a$ between the input and target ages. Right: for an image of a 26-year-old subject, the bottom row shows outputs for different target ages, and the top row shows the corresponding image differences to highlight progressive changes.

To overcome the aforementioned challenges, we propose a deep adversarial method that learns the joint distribution of brain appearance, age and health state (AD status in this paper) without requiring longitudinal data for training. A simplified schematic of our model is illustrated in Fig. 1 along with example results. Given a brain image, our model produces a brain image of the same subject at a target age and health state. The input image is first encoded into a latent space, which is modulated by two vectors encoding the target age difference and the health state, respectively. The conditioned latent representation is then decoded to an output image. The quality of the synthetic results is encouraged by a discriminator that judges whether an output image is representative of the distribution of brain images of the same age and health state. A typical problem in synthesis with cross-sectional data [ziegler2012models] is loss of subject identity, i.e. the synthesis of an output that may not correspond to the input. (A classical computer vision example is the synthesis of human faces with different attributes, where identity loss means that the output image shows a different person. Even in face ageing, the task most analogous to brain ageing, humans still find it difficult to assess identity loss.) We propose, and motivate, two loss functions that help preserve subject identity by essentially regularising the amount of change introduced by ageing. In addition, we motivate the design of our conditioning mechanism and show that ordinal binary encoding for both discrete and continuous variables improves performance significantly.

We quantitatively evaluate the simulation results using longitudinal data from the ADNI dataset [petersen2010alzheimer]. Since the longitudinal data only cover a limited time span, and in order to evaluate simulated images across decades, we further pre-train a VGG-network [simonyan2014very] to estimate the apparent age from output images. The estimated ages are used as a proxy metric for the quality of output images in terms of age accuracy. We also show qualitative results, including ageing simulation on different health states and long-term ageing synthesis. Both quantitative and qualitative results show that our method outperforms benchmarks with more accurate simulations that capture the characteristics specific to each individual on different health states. Furthermore, we train our model on Cam-CAN and evaluate it on ADNI to demonstrate the generalisation ability to unseen data. Finally, we perform ablation studies to investigate the effect of loss components and different ways of embedding clinical variables into the networks.

This paper advances our preliminary work [xia2019consistent] in the following aspects: 1) we extend our model to condition on both age and AD status, which enables more accurate simulation of ageing progression under different health states; 2) we introduce additional regularisation to smooth the simulated progression; 3) we offer more experiments and a detailed analysis of performance using longitudinal data, including new metrics and additional benchmark methods for comparison. Our contributions are summarised as follows:

  • We propose a deep learning model to simulate the brain ageing process trained on cross-sectional data.

  • Our model uses an embedding mechanism that teaches the network to learn the joint distribution of brain images, age and health state (AD status).

  • We design losses that help preserve subject identity in the output images.

  • We demonstrate our method’s robustness with extensive experiments on two publicly available datasets.

The manuscript proceeds as follows: Section II reviews related work on brain ageing research. Section III details the proposed method. Section IV describes the experimental setup and training details. Section V presents results and discussion. Finally, Section VI concludes the manuscript.

II Related Work

We classify existing methods into two categories: age-prediction methods, which estimate the apparent age from a brain image, and image-simulation methods, which synthesise brain images conditioned on age and health state. We discuss these methods below.

Age-prediction methods: Early methods predict age using hand-crafted features with kernel regression [franke2010estimating] or with Gaussian Process Regression [cole2017predicting]. However, their performance often relies on the effectiveness of the hand-crafted features.

Recently, deep learning models have been used to estimate the brain age from imaging data. For example, [cole2017mortality] used a VGG-based [simonyan2014very] model to predict age and detect degenerative diseases, while [jonsson2019deep] proposed to discover genetic associations with brain degeneration using a ResNet-based network [he2016deep]. Similarly, [cole2015prediction] used the age predicted by a deep network to detect traumatic brain injury. However, these methods did not consider the morphological changes of the brain, which are potentially more informative [costafreda2011automated].

Image-simulation methods: Given variables such as age, these methods aim to synthesise the corresponding brain image to enable visual observation of brain changes. For instance, patch-based dictionary learning [zhang2016consistent] and kernel regression [huizinga2018spatio, ziegler2012models, serag2012construction] have been used to build spatio-temporal atlases of brains at different ages. However, these are population-average atlases and thus cannot capture brain ageing trajectories specific to each individual.

Fig. 2: An overview of the proposed method (training). $X_i$ is the input image; $d_o$ is the target health state; $\Delta a$ is the difference between the starting age $a_i$ and the target age $a_o$: $\Delta a = a_o - a_i$; $X_o$ is the output (aged) image of the target age $a_o$ and health state $d_o$, which should belong to the same subject as $X_i$. The Generator takes as input $X_i$, $\Delta a$ and $d_o$, and outputs $X_o$; the Discriminator takes as input a brain image together with $a_o$ and $d_o$, and outputs a discrimination score.

Very recently, deep generative methods have been used for this task. While [rachmadi2019predicting] and [wegmayr2019generative] used formulations of Generative Adversarial Networks (GAN) [goodfellow2014generative] to estimate brain changes, [ravi2019degenerative] used a conditional adversarial autoencoder as the generative model, following a recent face ageing work [zhang2017age]. Irrespective of the model, these methods require longitudinal data, which limits their applicability. In [bowles2018modelling], a GAN-based method is trained to add or remove atrophy patterns in the brain using image arithmetic, although the atrophy patterns were modelled linearly and the morphological changes were assumed to be the same for all subjects. In [milana2017deep], a Variational Autoencoder (VAE) was used to synthesise aged brain images, but the target age is not controlled and the quality of the synthesised images appears poor (blurry). Similarly, [zhao2019variational] used a VAE to disentangle spatial information from temporal progression, and then used the first few layers of the VAE as a feature extractor to improve age prediction. In summary, most previous methods either built atlases [zhang2016consistent, huizinga2018spatio, ziegler2012models, serag2012construction] or required longitudinal data [rachmadi2019predicting, ravi2019degenerative, wegmayr2019generative] to simulate the brain ageing process. Methods that did not need longitudinal data [bowles2018modelling, zhao2019variational, milana2017deep], on the other hand, produced blurry images and lost subject identity.

Our approach: To address these shortcomings, we propose a conditional adversarial training procedure that learns to simulate the brain ageing process by being specific to the input subject, and by learning from cross-sectional data without requiring longitudinal observations.

III Proposed approach

III-A Problem statement, notation and overview

We denote a brain image as $X$, drawn from the distribution of brain images $p(X)$, with associated clinical variables $v = (a, d)$ comprising the subject's age $a$ and health state (AD status) $d$. Given a brain image $X_i$ of age $a_i$ and health state $d_i$, we want to synthesise a brain image $X_o$ of target age $a_o$ and health state $d_o$. Critically, the synthetic brain image $X_o$ should retain the subject identity, i.e. belong to the same subject as the input $X_i$, throughout the ageing process. The contributions of our approach, shown in Fig. 2, are: the design of the conditioning mechanism; a model architecture that uses a Generator to synthesise images and an adversary, a Discriminator, to help learn the joint distribution of clinical variables and brain appearance; and the losses we use to guide training. We detail each of these below.

III-B Conditioning on age and health state

In our previous work [xia2019consistent], we simulated the ageing brain with age as the only factor. Here, we extend that approach by incorporating the health state, i.e. AD status, as an additional factor to better simulate the ageing process.

Fig. 3: Ordinal encoding of age and health state. Left: we represent age $a$ using a binary vector whose first $a$ elements are 1 and the rest are 0. Right: the encoding of health state, where an ordinal binary vector represents the three categories of AD status, from CN (healthy) through MCI to AD.

We use ordinal binary vectors, instead of one-hot vectors as in [zhang2017age], to encode both age and health state; these vectors are embedded in the bottleneck layers of the Generator and Discriminator (detailed in Section III-C). We assume a maximal age of 100 years and use a $1 \times 100$ vector to encode age $a$. Similarly, we use an ordinal binary vector to encode health state $d$. Fig. 3 illustrates this encoding. An ablation study in Section V-C illustrates the benefits of ordinal vs. one-hot encoding.
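A minimal sketch of the ordinal encoding described above, in plain Python. It assumes integer ages in [0, 100] and, as a hypothetical layout not specified in the text, a length-3 health vector in which CN, MCI and AD set the first one, two and three entries; the function names are illustrative, not taken from the authors' code.

```python
MAX_AGE = 100  # maximal age assumed in the text
HEALTH_STATES = ("CN", "MCI", "AD")  # ordered from healthy to AD


def encode_age(age, max_age=MAX_AGE):
    """Ordinal code: the first `age` entries are 1, the rest are 0."""
    if not 0 <= age <= max_age:
        raise ValueError(f"age must be in [0, {max_age}], got {age}")
    return [1] * age + [0] * (max_age - age)


def encode_health(state, states=HEALTH_STATES):
    """Ordinal code (hypothetical layout): state k (0-based) sets k + 1 entries."""
    k = states.index(state)
    return [1] * (k + 1) + [0] * (len(states) - k - 1)


# An age difference (Delta a = a_o - a_i >= 0) can be encoded the same way.
print(encode_age(3, max_age=5))  # [1, 1, 1, 0, 0]
print(encode_health("MCI"))      # [1, 1, 0]
```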

III-C Proposed model

The proposed method consists of a Generator and a Discriminator. The Generator synthesises aged brain images corresponding to a target age and a health state. The Discriminator has a dual role: firstly, it discriminates between ground-truth and synthetic brain images; secondly, it ensures that the synthetic brain images correspond to the target clinical variables. The Generator is adversarially trained to generate realistic brain images of the correct target age. The detailed network architectures are shown in Fig. 4.

Fig. 4: Detailed architectures of the Generator and Discriminator. The Generator contains three parts: an Encoder that extracts latent features, a Transformer that injects the target age and health state, and a Decoder that generates aged images. The Discriminator uses the same conditioning mechanism to inject age and health state information. A long skip connection helps to better preserve features of the input image.

III-C1 Generator

The Generator $G$ takes as input a 2D brain image $X_i$ and ordinal binary vectors for the target health state $d_o$ and the age difference $\Delta a$. We condition on the age difference between the input age $a_i$ and the target age $a_o$, $\Delta a = a_o - a_i$, so that when the input and output ages are equal ($\Delta a = 0$) the network is encouraged to recreate the input. The output of $G$ is a 2D brain image $X_o$ corresponding to the target age and health state. (Note that the target health state $d_o$ can differ from the input health state $d_i$; this encourages learning a joint distribution between brain images and clinical variables.)

$G$ has three components: the Encoder $G_{enc}$, the Transformer $G_{trans}$ and the Decoder $G_{dec}$. $G_{enc}$ first extracts latent features from the input image $X_i$; $G_{trans}$ injects the target age and health state into the network; finally, $G_{dec}$ generates the aged brain image from the bottleneck features. To embed age and health state, we first concatenate the latent vector $l = G_{enc}(X_i)$ with the health state vector $d_o$. The concatenated vector is processed by a dense layer to output a latent vector $l'$, which is then concatenated with the age difference vector $\Delta a$. The resulting vector is used to generate the output image. (We tested swapping the order in which $d_o$ and $\Delta a$ are injected, and it did not affect the results. We also tried concatenating $l$, $d_o$ and $\Delta a$ into a single vector and generating the output from it; however, the model then tended to ignore the information in $d_o$, possibly because of the dimensional imbalance between $d_o$ and $\Delta a$.) We adopt long skip connections [ronneberger2015u] between layers of $G_{enc}$ and $G_{dec}$ to preserve details of the input image and improve the sharpness of the output images. Overall, the Generator's forward pass is $X_o = G(X_i, \Delta a, d_o) = G_{dec}(G_{trans}(G_{enc}(X_i), \Delta a, d_o))$.
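To make the bottleneck conditioning concrete, here is a framework-free sketch of the Transformer step: concatenate the latent vector with the health vector, apply a dense layer, then concatenate the age-difference vector. The latent size, the ReLU activation and the random weights are illustrative assumptions; in a Keras model the same steps would typically be built from `Concatenate` and `Dense` layers, with the Decoder mapping the result back to an image.

```python
import random


def dense_relu(x, weights, bias):
    """Fully connected layer followed by ReLU (activation is an assumption)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def condition_bottleneck(latent, health_vec, age_diff_vec, weights, bias):
    """Inject d_o, then Delta a, into the bottleneck in the order described."""
    # 1) Concatenate the encoder output l with the health state vector d_o.
    l_and_d = latent + health_vec
    # 2) A dense layer maps [l, d_o] to a new latent vector l'.
    l_prime = dense_relu(l_and_d, weights, bias)
    # 3) Concatenate l' with the ordinal age-difference vector Delta a.
    return l_prime + age_diff_vec


# Usage with toy sizes: |l| = 8, |d_o| = 3, |Delta a| = 100, |l'| = 8.
rng = random.Random(0)
latent = [rng.uniform(-1.0, 1.0) for _ in range(8)]
weights = [[rng.uniform(-0.1, 0.1) for _ in range(8 + 3)] for _ in range(8)]
bias = [0.0] * 8
health = [1, 1, 0]               # e.g. MCI under a hypothetical 3-state ordinal code
age_diff = [1] * 5 + [0] * 95    # Delta a = 5 years
z = condition_bottleneck(latent, health, age_diff, weights, bias)
print(len(z))  # 108
```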

III-C2 Discriminator

Similar to the Generator, the Discriminator $D$ contains three subnetworks: the Encoder $D_{enc}$ that extracts latent features, the Transformer $D_{trans}$ that injects the conditioning variables, and the Judge $D_{judge}$ that outputs a discrimination score.

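Note that $D_{trans}$ is conditioned on the target age $a_o$ instead of the age difference $\Delta a$, so that $D$ learns the joint distribution of brain appearance and age and can discriminate between real and synthetic images of the correct age. The forward pass of the Discriminator is $l = D_{enc}(X)$ and $w = D_{judge}(D_{trans}(l, a_o, d_o))$, where $X$ is either a real image or a synthetic image $X_o$, and $w$ is the discrimination score.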

III-D Losses

We train with a multi-component loss function containing adversarial, identity-preservation and self-reconstruction losses. We detail these below.

III-D1 Adversarial loss

We adopt the Wasserstein loss with gradient penalty [gulrajani2017improved] to encourage realistic aged brain images $X_o$ and to force them to correspond to the target age $a_o$ and health state $d_o$:

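$$\mathcal{L}_{GAN} = \mathbb{E}_{X_o}\big[D(X_o, a_o, d_o)\big] - \mathbb{E}_{X_r}\big[D(X_r, a_o, d_o)\big] + \lambda_{GP}\, \mathbb{E}_{\hat{X}}\Big[\big(\lVert \nabla_{\hat{X}} D(\hat{X}, a_o, d_o) \rVert_2 - 1\big)^2\Big] \qquad (1)$$

where $X_o$ is the output image, $X_o = G(X_i, \Delta a, d_o)$; $X_r$ is a ground-truth image from another subject of target age $a_o$ and health state $d_o$; and $\hat{X}$ is a random interpolation between real and synthetic samples, $\hat{X} = \epsilon X_r + (1 - \epsilon) X_o$ with $\epsilon \sim U[0, 1]$. The first two terms correspond to the (negated) Wasserstein distance estimate between ground-truth and synthetic samples; the last term is the gradient penalty, added to stabilise training. As in [gulrajani2017improved, baumgartner2018visual], we set $\lambda_{GP} = 10$.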
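A toy, framework-free sketch of the critic side of Eq. (1), including the gradient penalty evaluated at random interpolates. Gradients are taken by central finite differences purely for illustration (a real implementation would use automatic differentiation), and the conditioning on $a_o$ and $d_o$ is assumed to be folded into the `critic` callable; the generator would instead minimise $-\mathbb{E}[D(X_o, a_o, d_o)]$.

```python
import math
import random


def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def wgan_gp_critic_loss(critic, real_batch, fake_batch, lambda_gp=10.0, rng=None):
    """Critic loss: E[D(fake)] - E[D(real)] + lambda_gp * gradient penalty.

    `critic` is assumed to already include the conditioning on (a_o, d_o).
    """
    rng = rng or random.Random(0)
    n = len(real_batch)
    mean_real = sum(critic(x) for x in real_batch) / n
    mean_fake = sum(critic(x) for x in fake_batch) / n
    penalty = 0.0
    for x_r, x_o in zip(real_batch, fake_batch):
        eps = rng.random()  # eps ~ U[0, 1)
        x_hat = [eps * r + (1.0 - eps) * o for r, o in zip(x_r, x_o)]
        g = numerical_grad(critic, x_hat)
        penalty += (math.sqrt(sum(gi * gi for gi in g)) - 1.0) ** 2
    return mean_fake - mean_real + lambda_gp * penalty / n


# A linear critic with unit-norm weights incurs no gradient penalty.
critic = lambda x: 0.6 * x[0] + 0.8 * x[1]
loss = wgan_gp_critic_loss(critic, [[1.0, 1.0]], [[0.0, 0.0]])
print(round(loss, 6))  # -1.4
```

A critic whose gradient norm is 2 everywhere (e.g. `lambda x: 2.0 * x[0]`) would instead pay a penalty of $\lambda_{GP} (2 - 1)^2 = 10$ per sample, which is what pushes the critic towards being 1-Lipschitz.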

Fig. 5: Illustration of ageing trajectories for two subjects. For Subject 1 at age $a_i$ (point A), the network can learn a mapping from A to C, which could still fool the Discriminator but loses the identity of Subject 1 (orange line).

III-D2 Identity-preservation loss

While the adversarial loss $\mathcal{L}_{GAN}$ encourages the network to synthesise realistic brain images, these images may lose subject identity. For example, the network can easily learn a mapping to an image that corresponds to the target age and health state but belongs to a different subject. Fig. 5 illustrates this with the ageing trajectories of two subjects. The task is to predict the brain image of subject 1 at age $a_o$ starting from age $a_i$, i.e. to learn a mapping from point A to point B. However, there are no ground-truth data to ensure that we stay on the trajectory of subject 1; instead, the training data contain brain images of age $a_o$ belonging to subject 2 (and other subjects). Using only $\mathcal{L}_{GAN}$, the Generator may learn a mapping from A to C that fools the Discriminator but loses the identity of subject 1. To alleviate this and encourage the network to learn mappings along the trajectory (i.e. from A to B), we adopt:

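$$\mathcal{L}_{ID} = \mathbb{E}_{X_i}\left[\lVert X_i - X_o \rVert_1 \cdot e^{-\frac{|a_o - a_i|}{a_{max} - a_{min}}}\right] \qquad (2)$$

where $X_i$ is the input image of age $a_i$, $X_o$ is the output image of age $a_o$ ($a_o > a_i$), and $a_{max}$ and $a_{min}$ are the maximum and minimum ages in the data. The exponential term encourages the difference $\lVert X_i - X_o \rVert_1$ to correlate positively with the difference between $a_i$ and $a_o$: the smaller the age gap, the more heavily a given change is penalised. Note that the health state is not involved in $\mathcal{L}_{ID}$. The exponential term does not aim to precisely model the ageing trajectory. Instead, it encourages identity preservation by penalising heavy transformations between close ages, and it also stabilises training. A more accurate ageing prediction, one that also depends on the health state, is achieved through adversarial learning. An ablation study illustrating the critical role of $\mathcal{L}_{ID}$ is presented in Section V-C.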
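The effect of the age-dependent weighting in Eq. (2) can be checked numerically with a short sketch. It assumes flattened images as Python lists, an L1 distance computed as a sum of absolute differences, and an exponent normalised by an age range $a_{max} - a_{min}$ (an assumption of this sketch); the age range in the usage lines is hypothetical.

```python
import math


def l1(x, y):
    """L1 distance between two flattened images."""
    return sum(abs(a - b) for a, b in zip(x, y))


def identity_preservation_loss(samples, a_min, a_max):
    """Mean of ||X_i - X_o||_1 * exp(-|a_o - a_i| / (a_max - a_min)).

    samples: iterable of (x_in, x_out, age_in, age_out) with age_out > age_in.
    """
    total, count = 0.0, 0
    for x_in, x_out, age_in, age_out in samples:
        if age_out <= age_in:
            raise ValueError("L_ID is applied only when a_o > a_i")
        weight = math.exp(-abs(age_out - age_in) / (a_max - a_min))
        total += l1(x_in, x_out) * weight
        count += 1
    return total / count


# The same amount of change costs more over a 2-year gap than over a 30-year gap.
small_gap = identity_preservation_loss([([0.0, 0.0], [1.0, 1.0], 60, 62)], 20, 100)
large_gap = identity_preservation_loss([([0.0, 0.0], [1.0, 1.0], 60, 90)], 20, 100)
print(small_gap > large_gap)  # True
```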

III-D3 Self-reconstruction loss

When the target age and health state are the same as the input, the Generator should reconstruct the input image. We use a self-reconstruction loss to explicitly encourage this input reconstruction:

$$\mathcal{L}_{rec} = \mathbb{E}_{X_i}\left[\lVert X_i - X_o \rVert_1\right], \quad X_o = G(X_i, \Delta a = 0, d_o = d_i) \qquad (3)$$

where $X_i$ and $X_o$ are the input and output images respectively, of the same age and health state. Although $\mathcal{L}_{rec}$ is similar to the identity-preservation loss $\mathcal{L}_{ID}$, their roles differ: $\mathcal{L}_{ID}$ helps to preserve subject identity when generating aged images, while $\mathcal{L}_{rec}$ encourages a smooth simulated progression via self-reconstruction. An ablation study on $\mathcal{L}_{rec}$ in Section V-C shows the importance of this stronger regularisation. (In our previous work [xia2019consistent], Eq. 2 did not have the constraint $a_o > a_i$ and could randomly include the case $a_o = a_i$, thereby encouraging self-reconstruction. However, as shown in Section V-C, stronger regularisation is necessary.)

IV Experimental setup

Datasets: We use two datasets, as detailed below.

Cambridge Centre for Ageing and Neuroscience (Cam-CAN) [camcan2015ageing] is a cross-sectional dataset containing healthy subjects aged 17 to 85. To obtain a more balanced age distribution, we discarded subjects under 25 or over 85 years old. We split the subjects into age groups spanning 5 years each, randomly selected 38 volumes from each group, and used 30 for training and 8 for testing. We use Cam-CAN to demonstrate consistent brain age synthesis across the whole lifespan.

Alzheimer’s Disease Neuroimaging Initiative (ADNI) [petersen2010alzheimer] is a longitudinal dataset containing cognitively normal (CN) subjects, subjects with mild cognitive impairment (MCI) and subjects with AD. We use ADNI to demonstrate brain image synthesis conditioned on different health states. Since ADNI contains longitudinal data, we use these data to quantitatively evaluate the quality of synthetically aged images. We chose 786 subjects as training data (279 CN, 260 MCI, 247 AD) and 136 subjects as testing data (49 CN, 46 MCI, 41 AD).

Pre-processing: All volumetric data are skull-stripped using DeepBrain (https://github.com/iitzco/deepbrain) and linearly registered to MNI 152 space using FSL-FLIRT [woolrich2009bayesian]. We normalise brain volumes by clipping the intensities at $V_{max}$, the 99.5th-percentile intensity within each volume, and then rescale the resulting intensities to a fixed range. We select the middle 60 axial slices from each volume and crop each slice to a fixed size. During training, we only use cross-sectional data, i.e. each subject contributes only one volume, acquired at a single age. During testing, we use longitudinal ADNI data covering more than 2 years, and discard data where images are severely misaligned due to registration errors.
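The intensity normalisation and slice selection can be sketched as follows. Assumptions of this sketch: a volume is a flat list of voxel intensities, the lower clipping bound is 0, the percentile uses the nearest-rank definition, and the output range [0, 1] is hypothetical; skull stripping and FLIRT registration are outside its scope.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a list of numbers, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def normalise_volume(voxels, q=99.5, out_min=0.0, out_max=1.0):
    """Clip at the q-th percentile, then rescale linearly to [out_min, out_max]."""
    v_max = percentile(voxels, q)
    if v_max <= 0:
        raise ValueError("volume has no positive intensities")
    clipped = [min(max(v, 0.0), v_max) for v in voxels]
    return [out_min + (out_max - out_min) * v / v_max for v in clipped]


def middle_slice_indices(num_slices, keep=60):
    """Indices of the `keep` central axial slices."""
    if num_slices < keep:
        raise ValueError("volume has fewer slices than requested")
    start = (num_slices - keep) // 2
    return list(range(start, start + keep))


voxels = [float(v) for v in range(1, 201)]  # toy 'volume' of 200 voxels
norm = normalise_volume(voxels)
print(max(norm), middle_slice_indices(100)[:3])  # 1.0 [20, 21, 22]
```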

Benchmarks: We compare with the following benchmarks. (We also experimented with the official implementation of [milana2017deep]; however, our experiments confirmed the poor image quality reported by its authors.)
Conditional GAN: This image translation approach [mirza2014conditional] estimates an output conditioned on the input. To make it comparable with our method, we train separate Conditional GANs that transform young images to different older age groups. Therefore, a single model of ours is compared with age-group-specific Conditional GANs.
CycleGAN: We compare with CycleGAN [zhu2017unpaired], which has two translation paths: from ‘young’ to ‘old’ to ‘young’, and from ‘old’ to ‘young’ to ‘old’. As with Conditional GAN, we train several CycleGANs for different target age groups.

CAAE: We compare with [zhang2017age], a recent method for face ageing synthesis, using the official implementation (https://zzutk.github.io/Face-ageing-CAAE/) modified to fit our input image shape. This method uses a Conditional Adversarial Autoencoder (CAAE) to perform face ageing synthesis by concatenating a one-hot age vector with the bottleneck vector, with age divided into discrete categories, each corresponding to one age group.

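Implementation details: The overall training loss is:

$$\mathcal{L} = \mathcal{L}_{GAN} + \lambda_{ID}\,\mathcal{L}_{ID} + \lambda_{rec}\,\mathcal{L}_{rec} \qquad (4)$$

where $\mathcal{L}_{GAN}$, $\mathcal{L}_{ID}$ and $\mathcal{L}_{rec}$ are the adversarial, identity-preservation and self-reconstruction losses of Eqs. (1)-(3), and $\lambda_{ID}$ and $\lambda_{rec}$ (together with $\lambda_{GP}$ inside $\mathcal{L}_{GAN}$) are hyper-parameters that balance the loss terms, chosen experimentally. We set $\lambda_{ID} = 100$ following [baumgartner2018visual, xia2019consistent], and $\lambda_{rec}$ to a smaller value, 10, as we focus more on brain ageing synthesis than on reconstruction.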
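As a worked example of Eq. (4), the sketch below combines scalar loss values using the weights $\lambda_{ID} = 100$ and $\lambda_{rec} = 10$ given above. The loss values are made up, and the self-reconstruction term is computed as an L1 distance (sum of absolute differences), which is an assumption of this sketch; in practice the discriminator and generator optimise their respective parts of the adversarial term in alternation.

```python
def self_reconstruction_loss(x_in, x_rec):
    """L1 distance between an image and its reconstruction at Delta a = 0."""
    return sum(abs(a - b) for a, b in zip(x_in, x_rec))


def total_loss(l_gan, l_id, l_rec, lambda_id=100.0, lambda_rec=10.0):
    """Weighted sum L_GAN + lambda_id * L_ID + lambda_rec * L_rec."""
    return l_gan + lambda_id * l_id + lambda_rec * l_rec


l_rec = self_reconstruction_loss([0.2, 0.4, 0.6], [0.2, 0.5, 0.6])
print(round(total_loss(l_gan=-0.5, l_id=0.01, l_rec=l_rec), 4))  # 1.5
```

Because $\lambda_{ID}$ is ten times $\lambda_{rec}$, a given L1 discrepancy contributes ten times more through the identity term than through the reconstruction term, reflecting the stated emphasis on ageing synthesis over reconstruction.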

Fig. 6: Example results for subjects with ground-truth follow-up studies. We predict the output $X_o$ from the input $X_i$ using the benchmarks and our method, and show the errors between the outputs and the ground truths $X_{gt}$ as $|X_o - X_{gt}|$. Our method achieves the most accurate results, outperforming our previous method [xia2019consistent] and the benchmarks. For more details see text.

During training, we divide subjects into a young group and an old group, and randomly draw a young sample $X_i$ and an old sample $X_r$; the aged image $X_o$ is synthesised from $X_i$ at the target age $a_o$ and health state $d_o$ of $X_r$. Note that the input health state $d_i$ can differ from the target health state $d_o$. We train all methods for 600 epochs, updating the generator and discriminator in an alternating fashion [arjovsky2017wasserstein, goodfellow2014generative]. Since the discriminator in a Wasserstein GAN needs to be close to optimal during training, we update the discriminator for 5 iterations per generator update; during the first 20 epochs, we update it for 50 iterations per generator update. We use Keras [chollet2015keras] and train with Adam [kingma2015adam] with a learning rate of 0.0001 and a decay of 0.0001. Code will be made publicly available upon acceptance.
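The alternating update schedule and the young/old pairing described above can be sketched as below. This is a hypothetical outline, not the authors' Keras code: epochs are counted from 0, each generator update is preceded by `critic_iterations(epoch)` discriminator updates, and the cohort tuples are placeholders for real images and labels.

```python
import random


def critic_iterations(epoch, warmup_epochs=20, warmup_iters=50, regular_iters=5):
    """Discriminator updates per generator update (epochs counted from 0)."""
    return warmup_iters if epoch < warmup_epochs else regular_iters


def sample_training_pair(young, old, rng):
    """Draw a young input and an old reference; the reference sets a_o and d_o."""
    x_i, a_i, d_i = rng.choice(young)
    x_r, a_o, d_o = rng.choice(old)
    return {"x_i": x_i, "x_r": x_r, "delta_a": a_o - a_i, "a_o": a_o, "d_o": d_o}


# Toy cohorts of (image_id, age, health_state); values are illustrative only.
young = [("y1", 30, "CN"), ("y2", 35, "CN")]
old = [("o1", 72, "MCI"), ("o2", 80, "AD")]
rng = random.Random(0)
pair = sample_training_pair(young, old, rng)

# Total discriminator updates if there were one generator update per epoch.
total = sum(critic_iterations(e) for e in range(600))
print(pair["delta_a"] > 0, total)  # True 3900
```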

Evaluation metrics: To evaluate the quality of synthetically aged images, we first use the longitudinal data from the ADNI dataset, selecting follow-up studies that cover more than 2 years so that observable neurodegenerative changes can occur. We use the standard definitions of mean squared error (MSE), peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) with a window length of 11 [wang2003multiscale] to evaluate the closeness of the predicted images to the ground truth.
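For reference, MSE and PSNR can be computed as in the sketch below. It assumes flattened images and an intensity range of 1 (the `data_range` parameter), which should match whatever normalisation is actually used. SSIM is more involved; one widely used implementation is `skimage.metrics.structural_similarity` in scikit-image, whose `win_size` argument sets the window length.

```python
import math


def mse(pred, target):
    """Mean squared error between two flattened images."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB for intensities spanning `data_range`."""
    err = mse(pred, target)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / err)


pred = [0.1, 0.5, 0.9]
target = [0.2, 0.4, 0.8]
print(round(mse(pred, target), 4), round(psnr(pred, target), 2))  # 0.01 20.0
```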

However, longitudinal data in ADNI only cover a short time span. Therefore, we also propose a proxy metric to evaluate the output images. We first pre-train a VGG-like [simonyan2014very] network to predict age from brain images, and then use this age predictor, $f_{age}$, to estimate the apparent age of the output images. The difference between the predicted and the desired target age then measures how close the generated images are to the target age. Formally, we define the predicted age difference (PAD) as $\mathrm{PAD} = |f_{age}(X_o) - a_o|$, where $X_o$ is the output image and $a_o$ the target age. The age predictor is pre-trained on Cam-CAN and healthy (CN) ADNI training data, and achieves an average testing error of 5.1 years.
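A sketch of PAD averaged over a test set, assuming PAD uses the absolute difference between predicted and target age; the `age_predictor` callable is a hypothetical stand-in for the pre-trained VGG-like network.

```python
def predicted_age_difference(age_predictor, outputs, target_ages):
    """Mean of |f_age(X_o) - a_o| over synthesised images."""
    diffs = [abs(age_predictor(x) - a) for x, a in zip(outputs, target_ages)]
    return sum(diffs) / len(diffs)


# Hypothetical stand-in for the pre-trained age predictor.
fake_predictor = {"img_a": 72.0, "img_b": 80.0}.get
print(predicted_age_difference(fake_predictor, ["img_a", "img_b"], [70, 85]))  # 3.5
```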

Statistics: All results are obtained on testing sets. We use bold font to denote the best performing method (for each metric) and an asterisk (*) to denote statistical significance. We use a paired t-test (at the 5% level, assessed via permutations) to test the null hypothesis that there is no difference between our method and the best performing benchmark.
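The text does not specify the permutation scheme; a common choice for paired data, assumed here, is to randomly flip the sign of each per-subject difference and compare the resulting paired t statistics with the observed one. The scores below are made-up placeholders, not results from Table I.

```python
import math
import random


def paired_t(diffs):
    """Paired t statistic computed from per-sample differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def paired_permutation_test(scores_a, scores_b, num_perm=5000, seed=0):
    """Two-sided p-value for H0: no difference, via random sign flips."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(paired_t(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(paired_t(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (num_perm + 1)


ours = [0.90, 0.88, 0.91, 0.89, 0.92, 0.90, 0.93, 0.91]
bench = [0.85, 0.84, 0.85, 0.84, 0.87, 0.86, 0.87, 0.86]
p = paired_permutation_test(ours, bench)
print(p < 0.05)  # True
```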

Fig. 7: Brain ageing progression for one healthy (CN) subject (at age 70) from the ADNI dataset. We synthesise the aged images $X_o$ at different target ages $a_o$ under different health states $d_o$: CN, MCI and AD. We also visualise the difference between $X_i$ and $X_o$, $|X_i - X_o|$, and show the ages of $X_o$ predicted by our pre-trained age predictor (white text on top of each brain). For more details see text.

V Results and discussion

We start by showing quantitative and qualitative results on ADNI and then on Cam-CAN. We conclude with several ablation studies that illustrate the importance of each component of our model.

SSIM PSNR MSE PAD
Cond. GAN
CycleGAN
CAAE [zhang2017age]
Ours (previous) [xia2019consistent]
Ours
TABLE I: Quantitative evaluation on ADNI dataset.

V-A Brain ageing synthesis on different health states (ADNI)

In this section, we train and evaluate our model on the ADNI dataset, which contains CN, MCI and AD subjects. Our model is trained only on cross-sectional data. The results are discussed below.

Fig. 8: Long-term brain ageing synthesis on the Cam-CAN dataset. We synthesise the aged images $X_o$ at different target ages and show the difference between the input images $X_i$ and $X_o$, $|X_i - X_o|$. For more details see text.
Fig. 9: Ablation studies for loss components. Left: ablation study of the identity-preservation loss $\mathcal{L}_{ID}$. The top row shows that without $\mathcal{L}_{ID}$ the network can lose the subject identity; the bottom row shows that using $\mathcal{L}_{ID}$ enforces the preservation of subject identity, so that the changes with age are smooth and consistent. Right: ablation study of the self-reconstruction loss $\mathcal{L}_{rec}$. When $\mathcal{L}_{rec}$ is not used (top two rows), there are sudden changes at the beginning of the simulated ageing progression (even at the original age), which hinders the preservation of subject identity. In contrast, when $\mathcal{L}_{rec}$ is used (bottom two rows), the ageing progression is smoother, which demonstrates better identity preservation.

V-A1 Quantitative results

The quantitative results, using the metrics defined in Section IV, are shown in Table I. Our method achieves the best results in all metrics, with the second best achieved by the previous, simpler version of the proposed model [xia2019consistent]; clearly, embedding the health state improves performance. The third best results are achieved by CAAE [zhang2017age], where age is divided into 10 categories and represented by a one-hot vector. To generate aged images at the target age (the age of the follow-up studies), we use the category to which the target age belongs, i.e. if the target age is 76, we choose the category of ages 75-78. These results show the benefit of encoding age as ordinal vectors, where the difference between two vectors correlates positively, in a fine-grained fashion, with the difference between the two ages. Unsurprisingly, CycleGAN and Conditional GAN achieve the poorest results, since here conditioning is achieved only by training separate models for different age groups.
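The argument about ordinal encoding can be illustrated with Hamming distances: between ordinal codes the distance equals the age gap, whereas between one-hot codes it is the same for every pair of distinct categories. The vector lengths below (100 ages, 10 categories) mirror the setup described in the text; the helper names are illustrative.

```python
def ordinal(value, length):
    """Ordinal code: the first `value` entries are 1."""
    return [1] * value + [0] * (length - value)


def one_hot(index, length):
    """One-hot code: a single 1 at position `index`."""
    vec = [0] * length
    vec[index] = 1
    return vec


def hamming(u, v):
    """Number of positions at which two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


# Ordinal codes: the distance grows with the age gap.
print(hamming(ordinal(70, 100), ordinal(71, 100)))  # 1
print(hamming(ordinal(70, 100), ordinal(90, 100)))  # 20
# One-hot codes over 10 age categories: any two distinct categories are 2 apart.
print(hamming(one_hot(3, 10), one_hot(4, 10)), hamming(one_hot(3, 10), one_hot(9, 10)))  # 2 2
```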

V-A2 Qualitative results

Visual examples for two subjects from ADNI are shown in Fig. 6. For both examples, our method generates the most accurate predictions, followed by our previous method, which offers visual evidence for the observations above. The third best results are achieved by CAAE, which shows larger errors between the prediction $X_o$ and the ground truth $X_{gt}$. CycleGAN and Conditional GAN produce the poorest output images, with observable structural differences from the ground truth, which indicates a loss of subject identity.

Furthermore, Fig. 7 shows visual results for the same subject under different target health states $d_o$. For all $d_o$, the brain changes gradually as the target age $a_o$ increases. However, the ageing rate differs between health states: it is slower when $d_o$ is CN than when it is MCI or AD, and fastest when $d_o$ is AD. We also report the ages of these synthetic images estimated by the pre-trained age predictor. The predicted ages under AD are ‘older’ than the target ages and than those under CN and MCI, which is consistent with the fact that AD accelerates brain ageing [petersen2010alzheimer].

V-B Long term brain ageing synthesis

Here we apply our model to the Cam-CAN dataset, where no longitudinal data are available. The qualitative and quantitative results are detailed below.

Fig. 10: Example results for one-hot vs. ordinal encoding on the Cam-CAN dataset. We synthesise aged images $X_o$ at different target ages with one-hot encoding and with ordinal encoding. We also visualise the difference between the input image $X_i$ and $X_o$, $|X_i - X_o|$, and report the ages of these images estimated by the pre-trained age predictor (white text on top of each brain).

V-B1 Qualitative results

In Fig. 8, we show the simulated brain ageing process throughout the whole lifespan for two young input subjects from the Cam-CAN dataset. The output images change gradually as the target age increases, with ventricular enlargement and brain tissue reduction. This pattern is consistent with previous studies [good2001voxel, mietchen2009computational], suggesting that our method learns to synthesise the ageing brain consistently throughout the lifespan even when trained on cross-sectional data.

Method                      SSIM   PSNR   MSE   PAD
Cond. GAN
CycleGAN
CAAE [zhang2017age]
Ours [xia2019consistent]
Ours
TABLE II: Quantitative evaluation of methods trained on Cam-CAN and evaluated on ADNI.

V-B2 Quantitative results (generalisation performance on ADNI)

Cam-CAN is a cross-sectional dataset, so to evaluate the quality of the synthetic images quantitatively we use the longitudinal portion of the ADNI data, and specifically only the CN cohort, to demonstrate generalisation performance. The results are shown in Table II. Although our model is trained and evaluated on different datasets, it still achieves results comparable to those in Table I and outperforms the benchmarks. This indicates that our model generalises to unseen data.
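As a rough illustration of the metrics reported in Tables I and II, the sketch below computes MSE and PSNR between a synthetic image and its ground-truth follow-up scan, plus a PAD-style age-accuracy score. The precise definitions are given in Section IV; this sketch assumes intensities scaled to [0, 1] and takes PAD to be the mean absolute difference between the age estimated by the pre-trained predictor and the target age. SSIM is omitted here; an existing implementation such as scikit-image's `structural_similarity` could be used for it.

```python
import numpy as np


def mse(pred, gt) -> float:
    """Mean squared error between a synthetic image and its ground truth."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    return float(np.mean((pred - gt) ** 2))


def psnr(pred, gt, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming intensities lie in [0, data_range]."""
    err = mse(pred, gt)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / err))


def pad(predicted_ages, target_ages) -> float:
    """Assumed PAD: mean |age estimated by the predictor - target age|."""
    p = np.asarray(predicted_ages, dtype=float)
    t = np.asarray(target_ages, dtype=float)
    return float(np.mean(np.abs(p - t)))
```

For example, a prediction that is off by 0.1 at every voxel gives an MSE of 0.01 and a PSNR of 20 dB, and estimated ages of 70 and 80 for target ages 72 and 77 give a PAD of 2.5 years under this assumed definition.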

SSIM PSNR MSE
TABLE III: Quantitative evaluation when training with different combinations of cost functions.

V-C Ablation studies

We perform ablation studies on the loss components and on the mechanisms used to embed clinical variables into the networks.

V-C1 Effect of loss components

We demonstrate the effect of the age-modulated and self-reconstruction losses by assessing model performance when each component is removed. Table III shows quantitative results on the ADNI dataset, and Fig. 9 illustrates qualitative results on the Cam-CAN dataset. The best results are achieved when all loss components are used. Specifically, without the age-modulated loss, the synthetic images lose subject identity severely throughout the whole progression, i.e. the output image appears to come from a different subject; without the self-reconstruction loss, the output images change abruptly at the beginning of the progression, even when the target age equals the input age. Both quantitative and qualitative results show that these two losses improve the preservation of subject identity and enable more accurate brain ageing simulation.
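The two identity-preserving terms can be sketched as follows. This is a hypothetical NumPy illustration rather than the paper's formulation (given in Section III): the age-modulated term is assumed to be an L1 distance between input and output, scaled by a weight that equals 1 at zero age gap and decays exponentially with the gap (the decay constant `tau` is hypothetical). The self-reconstruction term is an L1 distance between the input and the output generated when the target age and health state equal the subject's own.

```python
import numpy as np


def age_weight(age_gap: float, tau: float = 10.0) -> float:
    """Hypothetical modulation: 1 at zero age gap, decaying exponentially with |gap|."""
    return float(np.exp(-abs(age_gap) / tau))


def age_modulated_loss(x_in, x_out, age_in: float, age_target: float,
                       tau: float = 10.0) -> float:
    """L1 distance between input and aged output, scaled down as the age gap grows."""
    l1 = np.mean(np.abs(np.asarray(x_out, dtype=float) - np.asarray(x_in, dtype=float)))
    return age_weight(age_target - age_in, tau) * float(l1)


def self_reconstruction_loss(x_in, x_self) -> float:
    """L1 distance between the input and the output generated with the input's own
    age and health state as targets, whose ideal value is the input itself."""
    diff = np.asarray(x_self, dtype=float) - np.asarray(x_in, dtype=float)
    return float(np.mean(np.abs(diff)))
```

With this assumed weighting, a 10-year gap and `tau=10` scale the penalty by exp(-1) ≈ 0.37, so the generator is allowed to depart further from the input as the age gap grows, while the self-reconstruction term ties the output to the input when there is no gap at all.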

Embedding    SSIM   PSNR   MSE   PAD
One-hot
concatall
Ours
TABLE IV: Quantitative results of different embedding mechanisms.

V-C2 Effect of different embedding mechanisms

We investigate the effect of different embedding mechanisms; our embedding mechanism is described in Section III. We first perform experiments in which we use one-hot vs. ordinal vectors to encode age and health state, with qualitative results shown in Fig. 10. When one-hot vectors are used, the network still generates realistic images, but the ageing progression is not consistent, i.e. the ventricles of the synthetic brains appear to enlarge or shrink erratically across ages. In contrast, with ordinal encoding, the model simulates the ageing process consistently. This observation is confirmed by the ages of the output images estimated by the pre-trained age predictor.
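For the health state, a natural ordinal analogue is a cumulative vector over the ordered states CN < MCI < AD. The sketch below assumes the form CN = [1, 0, 0], MCI = [1, 1, 0], AD = [1, 1, 1]; it is an illustration rather than the paper's exact definition. It shows the ordering that one-hot vectors lack: under one-hot encoding every pair of states is equally far apart.

```python
import numpy as np

STATES = ("CN", "MCI", "AD")  # ordered by disease severity


def ordinal_state(state: str) -> np.ndarray:
    """Assumed cumulative code: CN=[1,0,0], MCI=[1,1,0], AD=[1,1,1]."""
    k = STATES.index(state)
    vec = np.zeros(len(STATES), dtype=np.float32)
    vec[: k + 1] = 1.0
    return vec


def one_hot_state(state: str) -> np.ndarray:
    """One-hot code: a single 1 at the state's position."""
    vec = np.zeros(len(STATES), dtype=np.float32)
    vec[STATES.index(state)] = 1.0
    return vec


def l1(u: np.ndarray, v: np.ndarray) -> float:
    """L1 distance between two code vectors."""
    return float(np.abs(u - v).sum())


if __name__ == "__main__":
    for a, b in [("CN", "MCI"), ("CN", "AD"), ("MCI", "AD")]:
        print(f"{a}->{b}: ordinal {l1(ordinal_state(a), ordinal_state(b))}, "
              f"one-hot {l1(one_hot_state(a), one_hot_state(b))}")
```

With the ordinal code, CN is one step from MCI and two steps from AD; with one-hot vectors all three pairs are at distance 2, so the network receives no signal that MCI lies between CN and AD.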

We also compare with an embedding strategy in which we concatenate the age vector, the health state vector and the bottleneck latent vector, and the Decoder processes the concatenated vector to generate the output image. We refer to this embedding strategy as concatall. We found that with concatall, the network tends to ignore the health state vector and relies only on the information in the other vectors. This may be caused by the dimensional imbalance between the low-dimensional health state vector and the much higher-dimensional vectors it is concatenated with. The quantitative results on ADNI are shown in Table IV: the results drop the most when one-hot encoding is used, followed by concatall, confirming our observations.
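A back-of-the-envelope sketch of this imbalance: with the hypothetical sizes below (a 100-dimensional ordinal age vector, a 3-dimensional health state vector and a 128-dimensional latent vector, none of which are taken from the paper), the health state occupies only about 1.3% of the concatenated Decoder input.

```python
import numpy as np

# Hypothetical vector sizes, chosen only for illustration.
AGE_DIM, HEALTH_DIM, LATENT_DIM = 100, 3, 128


def concat_all(age_vec: np.ndarray, health_vec: np.ndarray, z: np.ndarray) -> np.ndarray:
    """The 'concatall' input: age, health state and latent vectors joined end to end."""
    return np.concatenate([age_vec, health_vec, z])


def health_share(age_dim: int = AGE_DIM, health_dim: int = HEALTH_DIM,
                 latent_dim: int = LATENT_DIM) -> float:
    """Fraction of the concatenated input taken up by the health state vector."""
    return health_dim / (age_dim + health_dim + latent_dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = concat_all(np.ones(AGE_DIM), np.array([1.0, 1.0, 0.0]),
                   rng.standard_normal(LATENT_DIM))
    print(x.shape, f"{health_share():.1%}")  # (231,) 1.3%
```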

VI Conclusion

We present a method that learns to simulate subject-specific aged images without longitudinal data. Our method relies on a Generator that synthesises the images and a Discriminator that captures the joint distribution of brain images and clinical variables, i.e. age and health state (AD status). We propose an embedding mechanism to encode age and health state into our network, together with age-modulated and self-reconstruction losses to preserve subject identity. Qualitative results show that our method generates consistent and realistic images conditioned on the target age and health state. We evaluate with longitudinal data from ADNI and with a proposed numerical metric for age accuracy. We demonstrate on the ADNI and Cam-CAN datasets that our model outperforms the benchmarks both qualitatively and quantitatively, and through a series of ablations we illustrate the importance of each design decision.

We see several avenues for future improvement by us or the community. Conditioning mechanisms that reliably embed prior information into neural networks, enabling finer control over model outputs, are of considerable interest in deep learning. In this paper we design a simple yet effective way to encode both age (continuous) and AD status (ordinal) into the image generation network. However, incorporating additional clinical variables, e.g. gender, genotype or education, could be inefficient with our current approach, as it may require more dense layers. While new techniques are available [huang2017arbitrary, perez2018film, park2019semantic, lee2019tetris] and prior work with a few conditioning variables [jacenkow2019conditioning] or with disentanglement [chartsias2019disentangled] is promising, the utility of these techniques for integrating clinical variables with imaging data remains under investigation. Although we used brain data, the approach could be extended to other organs or to estimating the future state of pathology. Finally, despite our efforts to introduce 3D networks, this work remains 2D, because the number of parameters became prohibitive with 3D networks.
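As one example of the newer conditioning techniques cited above, FiLM [perez2018film] modulates each feature channel with a scale and a shift predicted from a conditioning vector. The NumPy sketch below illustrates that operation with linear projections of a hypothetical clinical-variable vector. It is not part of the method presented in this paper, and in practice the projection weights would be learned.

```python
import numpy as np


def film(features: np.ndarray, cond: np.ndarray,
         w_gamma: np.ndarray, b_gamma: np.ndarray,
         w_beta: np.ndarray, b_beta: np.ndarray) -> np.ndarray:
    """Feature-wise linear modulation: out[c] = gamma[c] * features[c] + beta[c].

    features: (C, H, W) feature map; cond: (K,) conditioning vector.
    w_gamma, w_beta: (C, K) projection weights; b_gamma, b_beta: (C,) biases.
    """
    gamma = w_gamma @ cond + b_gamma  # per-channel scale, shape (C,)
    beta = w_beta @ cond + b_beta     # per-channel shift, shape (C,)
    return gamma[:, None, None] * features + beta[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, K = 8, 5  # hypothetical channel count and number of clinical variables
    h = rng.standard_normal((C, 16, 16))
    c = rng.standard_normal(K)  # e.g. normalised age, sex, genotype flags (hypothetical)
    out = film(h, c, rng.standard_normal((C, K)), np.ones(C),
               rng.standard_normal((C, K)), np.zeros(C))
    print(out.shape)  # (8, 16, 16)
```

Because each extra clinical variable only adds one column to the projection matrices, this kind of conditioning scales more gracefully with the number of variables than adding a dense layer per variable.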

References