Generative Spatiotemporal Modeling of Neutrophil Behavior

04/02/2018 ∙ by Narita Pandhe, et al. ∙ University of Georgia

Cell motion and appearance have a strong correlation with cell cycle and disease progression. Many contemporary efforts in machine learning utilize spatio-temporal models to predict a cell's physical state and, consequently, the advancement of disease. Alternatively, generative models learn the underlying distribution of the data, creating holistic representations that can be used in learning. In this work, we propose an aggregate model that combines Generative Adversarial Networks (GANs) and Autoregressive (AR) models to predict cell motion and appearance in human neutrophils imaged by differential interference contrast (DIC) microscopy. We bifurcate the task of learning cell statistics by leveraging GANs for the spatial component and AR models for the temporal component. The learned aggregate model offers a promising computational environment for studying changes in organellar shape, quantity, and spatial distribution over long sequences.


1 Introduction

Polymorphonuclear neutrophil granulocytes (neutrophils) are the most abundant white blood cells in most mammals. They are highly motile phagocytic cells that constitute the first line of defense of the innate immune system [1]. The study of neutrophils and their underlying motion patterns provides insights into a host's response and behavior as a function of a specific stimulus. Our understanding of cell behavior and the sources of cellular variation can be significantly aided and tested using cell modeling and simulations [2].

Recently, generative models have been extensively utilized for natural images. Examples include Variational Autoencoders [3] and Generative Adversarial Networks (GANs) [4]. Generative models have the ability to learn the underlying statistical distributions over data and can thus generate exemplars of the true data set; they can also learn sophisticated conditional relationships. In 2014, [4] proposed GANs, a framework for learning generative models. GANs do not rely on training objectives related to log-likelihood. Instead, GAN training can be seen as a competitive game between two models: the generator (G) and the discriminator (D). Deep Convolutional GANs (DCGANs) [5] train convolutional networks in adversarial settings in order to generate natural images from the CelebA [6], LSUN [7], and ImageNet [8] datasets. [9] applied GANs to biological images to study the coexistence of proteins. Initial GAN models suffered from issues including training instabilities and mode collapse, making them harder to use. Active areas of research include novel applications, optimizing the network architecture, developing best training practices, and improving the cost function.

For computer vision systems, motion synthesis is still a challenging task and is drawing increasing research attention. Synthesis can be defined as generating new versions of a dataset that follow the distribution of the original, and is closely related to modeling.

[10] presents an algorithm that synthesizes motions based on annotations that describe them. Motion is constructed by splitting segments of movement from a corpus of motion data and assembling them. Each segment is modeled using an autoregressive process, which helps in modeling complicated non-stationary sequences that a single autoregressive process cannot handle. In [11], frames of the original videos are projected into a low-dimensional space and then learned as an AR model. [12] extends this approach by overcoming the problems of non-linearities in the data, using either a spline-fitting approach or a combined appearance model. Many approaches have employed GANs for video generation and frame prediction. [13] utilizes two convolutional networks, separating foreground and background imagery, to learn directly from a massive dataset of real-world videos.

In this work, we simulate the behavior of human neutrophils. Considering the limited dataset available, we propose an aggregate application of GANs and AR models. We bifurcate our approach into two tasks, generating a neutrophil's appearance and generating its motion, capturing the two sets of statistics independently. The GAN learns the appearance and spatial statistics, while the AR model captures the temporal aspect. Simulation is then achieved by sampling a point from the appearance space given the temporal dynamics up to the last observation.

2 Related Work

Several computational methods have been proposed for constructing statistical models of cellular and subcellular structures from image data. General shape models such as Active Shape Models [14] and a cell shape model conditioned on the nucleus shape [15] have been used. To our knowledge, the closest related literature comprises [9] for biological image synthesis and [13] for motion synthesis. Differences from [9] include the following. (1) Our GAN architecture is based on DCGANs, while theirs is a DCGAN modified for channel separation. (2) They apply GANs to samples from fluorescence microscopy images consisting of two channels, red and green, whereas we use GANs for DIC microscopy images consisting of a single channel. They tackle a more difficult problem: using the information contained in the red channel to learn how to generate a cell with several green-labeled proteins together. We are modeling single-channel cell images.

Like this work, [13] uses a two-stream model, but differs as follows. (1) They use Long Short-Term Memory networks (LSTMs), while we use DCGANs for content and derive AR processes for motion. (2) Their network learns the temporal dynamics directly from raw pixels, combining identified features with spatial features to make pixel-level predictions. We assume the background is stationary and only the foreground cells move. All the pixels of the foreground cells move similarly, so we pool them together into an AR model.

Figure 1: Real (left) and synthesized (right) images of neutrophils. The synthetic images were created using a DCGAN combined with the Improved WGAN loss function.

3 Data

Videos imaging the two-dimensional motion of human neutrophilic granulocytes were provided by Balazs Rada (Department of Infectious Diseases, University of Georgia). The videos were recorded using DIC microscopy, which enhances the contrast in unstained, transparent samples. The dataset consists of 11 videos: 3 videos of normal neutrophils and 8 videos of neutrophils treated with MRS2578, an inhibitor targeting a purinergic receptor. Most of the videos are 3.0 seconds long. We extracted frames of 1024x1024 resolution at 20 fps. Individual cells were segmented using fully convolutional DenseNets [16], then centered and resized to 64x64 resolution, resulting in 17,280 grayscale images in total.
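As a concrete illustration of this preprocessing step, the sketch below crops each segmented cell around its centroid and resizes it to 64x64. It assumes OpenCV and a binary mask per cell, and does not reproduce the FC-DenseNet segmentation itself; the margin size is our choice.

```python
import cv2
import numpy as np

def extract_cell(frame, mask, out_size=64):
    """Crop one segmented cell around its centroid, resize to out_size."""
    ys, xs = np.nonzero(mask)                       # pixels of the cell mask
    cy, cx = int(ys.mean()), int(xs.mean())         # cell centroid
    half = int(max(ys.ptp(), xs.ptp())) // 2 + 4    # square crop with margin
    y0, x0 = max(cy - half, 0), max(cx - half, 0)
    crop = frame[y0:cy + half, x0:cx + half]
    return cv2.resize(crop, (out_size, out_size))   # 64x64 grayscale patch
```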

4 Methods

4.1 GANs for Cell Image Synthesis

GANs consist of two neural networks competing against each other: a generator (G) and a discriminator (D). G generates images from random noise; while doing so, it tries to get as close as it can to the distribution of real images. D classifies between real images and the fake images generated by G. Both try to perform best at their respective tasks and maximize their gains. D's output on generated samples is characterized as the adversarial loss for training G.

Formally, consider a set of training images coming from a real distribution p_data. The generator G is a neural network parametrized by θ_g and the discriminator D is a neural network parametrized by θ_d. G takes in random noise z from a prior distribution p_z and generates images G(z). D takes images from both p_data and G and outputs a scalar in [0, 1]. The output is higher if the sample belongs to p_data and lower if it was generated by G. Both G and D are trained simultaneously. The goal of D is to maximize the probability of assigning the correct label to an input, while G minimizes log(1 - D(G(z))). As a result, D and G can be seen as playing a minimax game, formulated in Eq. 1:

min_G max_D V(D, G) = E_{x ~ p_data}[log D(x)] + E_{z ~ p_z}[log(1 - D(G(z)))]   (Eq. 1)

Historical attempts to scale up GANs using CNNs to model images have been generally unsuccessful [4]. DCGANs [5] identified a family of architectures that results in stable training and can generate higher-resolution images. We have adopted the DCGAN architecture for both the generator and the discriminator.
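As a concrete illustration of the game in Eq. 1, below is a minimal training-step sketch, assuming PyTorch. The small fully connected networks are toy stand-ins for the DCGAN generator and discriminator we actually use, and the generator update uses the common non-saturating variant of the Eq. 1 loss.

```python
import torch
import torch.nn as nn

Z_DIM = 100  # assumed latent dimensionality

# Toy stand-ins for the DCGAN generator and discriminator of [5].
G = nn.Sequential(
    nn.Linear(Z_DIM, 128), nn.ReLU(),
    nn.Linear(128, 64 * 64), nn.Tanh(),      # flattened 64x64 cell image
)
D = nn.Sequential(
    nn.Linear(64 * 64, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),         # scalar in [0, 1]
)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

def train_step(real_batch):
    b = real_batch.size(0)
    # D step: maximize log D(x) + log(1 - D(G(z))).
    fake = G(torch.randn(b, Z_DIM)).detach()
    loss_d = bce(D(real_batch), torch.ones(b, 1)) + \
             bce(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # G step: maximize log D(G(z)) (non-saturating form).
    loss_g = bce(D(G(torch.randn(b, Z_DIM))), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```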

Eq. 1 can be reformulated as minimization of the Jensen-Shannon (JS) divergence between the data-generating distribution and the distribution induced by G and D. [17] theoretically justified that the JS divergence minimized by GANs behaves badly and is potentially not continuous w.r.t. the generator's parameters. They propose an alternative distance, the Earth Mover's (EM) distance, also known as the Wasserstein distance W(q, p). Since computing the Wasserstein distance is intractable, [17] shows an approximate solution using the Kantorovich-Rubinstein duality:

W(p_data, p_g) = sup_{||f||_L <= 1} E_{x ~ p_data}[f(x)] - E_{x ~ p_g}[f(x)]   (Eq. 2)

where the supremum is taken over the set of 1-Lipschitz functions f. To enforce the Lipschitz constraint, the authors propose to clip the weights of the critic (referred to as a critic because it is not trained to classify) within a compact space [-c, c]. Recently, [18] proposed an alternative way to enforce the Lipschitz constraint: instead of weight clipping, they penalize the norm of the critic's gradient with respect to its input for random samples x̂. The resulting objective leads to stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
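A sketch of that gradient-penalty term, assuming PyTorch and a critic that maps flattened image batches to unconstrained scalars; the function name and penalty weight are our assumptions, following the formulation in [18].

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize deviation of the critic's gradient norm from 1 at random
    interpolates x_hat between real and generated samples."""
    eps = torch.rand(real.size(0), 1)              # per-sample mixing weight
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(
        outputs=critic(x_hat).sum(), inputs=x_hat, create_graph=True)[0]
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```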

4.1.1 Experiments

We evaluated the performance of models based on the DCGAN architecture trained with the GAN, Wasserstein GAN (WGAN), and Improved WGAN loss functions [4, 17, 18]. To evaluate the performance of GANs, we utilize the optimization-based approach discussed by [9] to check whether test samples can be reconstructed well. To test for mode collapse, a common failure in GANs, we examine how well a fixed trained generator can reconstruct images from a held-out test set. For each image in the test set, we minimize the L2-distance between the generated and test images w.r.t. the noise vector z. We use 50 iterations of L-BFGS and select the best reconstruction out of 3 runs. We also report the negative log-likelihood (NLL) of the noise vectors z w.r.t. the prior.
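A minimal sketch of this reconstruction check, assuming PyTorch and a frozen generator G; the restart handling and function names are ours, while the iteration and run counts mirror the text.

```python
import torch

def reconstruct(G, test_img, z_dim=100, restarts=3, iters=50):
    """Find the latent z whose generation best matches test_img (L2)."""
    best_err, best_z = float("inf"), None
    for _ in range(restarts):
        z = torch.randn(1, z_dim, requires_grad=True)
        opt = torch.optim.LBFGS([z], max_iter=iters)
        def closure():
            opt.zero_grad()
            err = ((G(z) - test_img) ** 2).sum()   # L2 reconstruction error
            err.backward()
            return err
        opt.step(closure)
        with torch.no_grad():
            err = ((G(z) - test_img) ** 2).sum().item()
        if err < best_err:
            best_err, best_z = err, z.detach().clone()
    # NLL of best_z under the standard normal prior, up to a constant.
    nll = 0.5 * (best_z ** 2).sum().item()
    return best_err, nll
```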

Figure 2: Reconstruction errors plotted against the negative log-likelihood (NLL) of the latent vectors found by reconstruction. The vertical blue line shows the mean L2-error. The horizontal gray line shows the mean NLL (±std) of noise sampled from the Gaussian prior. Lower values are better for both.

4.1.2 Latent Space Walk

We can interpolate between points in the latent space to understand its landscape. Walking the manifold can reveal whether there are sharp transitions and whether the network has merely memorized the training data. If walking the latent space results in smooth semantic changes in the generated images, we can reason that the model has learned relevant, interesting representations [5].
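A short sketch of such a walk, assuming a trained PyTorch generator G; plain linear interpolation is used here, though spherical interpolation is a common alternative.

```python
import torch

def latent_walk(G, steps=10, z_dim=100):
    """Generate images along a straight line between two latent points."""
    z0, z1 = torch.randn(1, z_dim), torch.randn(1, z_dim)
    with torch.no_grad():
        return [G((1 - a) * z0 + a * z1)
                for a in torch.linspace(0, 1, steps)]
```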

Figure 3: Interpolation between a series of 10 random points in the latent space shows that the learned space has smooth transitions. The top row depicts the starting location of each of the 10 points; the last row depicts the respective ending locations.

4.2 AR for Cell Motion Synthesis

Figure 4: 2D trajectory plots of a normal neutrophil and an inhibitor-treated (MRS) neutrophil. The inhibitor-treated (MRS) neutrophils tend to exhibit less movement than the normal ones.
y_t = C x_t + w_t   (Eq. 3)

x_t = A_1 x_{t-1} + A_2 x_{t-2} + ... + A_p x_{t-p} + v_t   (Eq. 4)

Different motion patterns are observed depending on the cell conditions. We build one global motion pattern each for normal and inhibited cells, because we assume that all pixels (under the same conditions) move similarly. Based on the existing motion characteristics, new sequences can be synthesized for the corresponding cells. AR models are linear dynamical systems and are able to model a pattern of points in a particular space having a temporal component.

An AR process for a series of points in a d-dimensional space can be modeled as Eq. 3 and Eq. 4. Eq. 3 decomposes each video frame y_t into a low-dimensional state vector x_t and a white-noise term w_t. Eq. 4 denotes that the new state x_t is a function of the sum of p of its previous states x_{t-1}, ..., x_{t-p}, each multiplied by a corresponding coefficient matrix A_i [19]. The noise terms w_t and v_t represent the residual difference between the observed data and the solutions to the linear equations, and are assumed to be Gaussian white noise.
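A minimal NumPy sketch of rolling Eq. 4 forward, assuming the coefficient matrices A_i and a noise scale are given (both parameter names are our choices):

```python
import numpy as np

def simulate_ar(A, x_init, n_steps, noise_std=0.1, rng=None):
    """A: list of p (d, d) coefficient matrices A_1..A_p;
    x_init: the p most recent d-dimensional seed states."""
    rng = rng or np.random.default_rng()
    p, d = len(A), len(x_init[0])
    xs = list(x_init)
    for _ in range(n_steps):
        v = rng.normal(0.0, noise_std, size=d)        # white noise v_t
        xs.append(sum(A[i] @ xs[-1 - i] for i in range(p)) + v)
    return np.stack(xs[p:])                           # the n_steps new states
```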

Figure 5: Histograms show the distributions of values taken by normal (gray) and inhibitor-treated (black) neutrophils for the top 5 principal components.

4.2.1 Experiments

Neutrophil motion is represented as trajectories of individual cells, consisting of their center Cartesian coordinates across all frames. Trajectories belonging to normal and inhibited cells are pooled separately and then projected into an eigenspace using SVD, yielding the principal components. Subsequently, the AR coefficients are determined. The parameter d determines the dimensionality of the subspace; the parameter p determines the order of the AR model. We performed a grid search over d and p.
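The sketch below illustrates one way to implement this fitting pipeline; mean-centering before SVD and least-squares estimation of the A_i are our assumptions, not necessarily the authors' exact procedure.

```python
import numpy as np

def fit_ar(trajectories, d=3, p=2):
    """trajectories: (T, n_features) array of pooled coordinate columns.
    Returns AR coefficient matrices A_i, the states X, and the basis."""
    Y = trajectories - trajectories.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    X = Y @ Vt[:d].T                                  # states x_t, (T, d)
    # Regress x_t on its p predecessors: x_t ≈ [x_{t-1} ... x_{t-p}] W.
    lags = np.hstack([X[p - 1 - i:-1 - i] for i in range(p)])
    W, *_ = np.linalg.lstsq(lags, X[p:], rcond=None)
    A = [W[i * d:(i + 1) * d].T for i in range(p)]    # coefficient matrices
    return A, X, Vt[:d]
```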

Figure 6: Neutrophil motion using the first three dimensions of the subspace of the AR model for normal (left) and inhibitor-treated (right) cells. This motion is governed by the AR coefficients.

4.3 Synthesis

Synthesized neutrophil behavior consists of two parts: appearance, sampled from our trained generator G, and motion, sampled from a point in the learned subspace. Using Eq. 4, we iteratively generate different sequences. These new sequences are then projected back into the original space, yielding a new motion pattern synthesized entirely from the eigenvector information. The separation of motion and appearance into two streams enables the GAN and the AR process to identify their respective key features. This results in movement of only the foreground cells, leaving the rest untouched. It also gives the advantage of synthesizing video clips of different cells following different trajectories that nonetheless look similar to the existing motion patterns.
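Putting the pieces together, here is a sketch of one synthesis step under the same assumptions as the snippets above (simulate_ar, fit_ar, and a trained PyTorch generator G); rendering the sampled cell along the trajectory onto a canvas is omitted.

```python
import numpy as np
import torch

def synthesize_sequence(G, A, basis, x_seed, n_frames, z_dim=100):
    """Sample one cell appearance and one Eq. 4 trajectory for it."""
    with torch.no_grad():
        cell = G(torch.randn(1, z_dim))               # appearance sample
    states = simulate_ar(A, x_seed, n_frames)         # Eq. 4 rollout
    trajectory = states @ basis                       # back to pixel space
    return cell, trajectory
```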

Figure 7: Sample results of appearance and motion synthesis.

5 Conclusion

In this paper we presented a two-stream approach to simulating human neutrophil behavior. Owing to the very limited data at our disposal, we utilized GANs to learn the spatial statistics and AR models to learn the temporal statistics. The bifurcation of appearance and motion allows a controlled video generation process. This work can enable us to quantify changes in organellar appearance and spatial distribution, and can help us understand how subsets of the organellar ensemble evolve, improving our understanding of cellular mechanisms as they respond to their environments.

6 Acknowledgments

We thank R. Ceren for constructive criticism of this manuscript. This work was supported in part by an AWS in Education Grant Award. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

References