Microscopy Cell Segmentation via Adversarial Neural Networks

by Assaf Arbelle, et al.

We present a novel approach for the segmentation of microscopy images. The method utilizes recent developments in the field of Deep Artificial Neural Networks in general, and the advances in Generative Adversarial Networks (GAN) specifically. We propose a pair of competing networks which are trained simultaneously and together define a min-max game resulting in an accurate segmentation of a given image. The system is an expansion of the well-known GAN model to conditional probabilities given an input image. This approach has two main strengths: it is weakly supervised, i.e. it can be trained on a limited amount of data, and it does not require the definition of a loss function for the optimization. Promising results are presented. The code is freely available at: https://github.com/arbellea/DeepCellSeg.git




1 Introduction

Live cell microscopy imaging is a key component in the biological research process. However, without the proper analysis tools, the raw images are a diamond in the rough. One must obtain the segmentation of the raw images defining the individual cells prior to calculation of the cells’ properties. Manual segmentation is infeasible due to the large quantity of images and cells per image.

Automatic segmentation tools are available and can be roughly split into two groups, unsupervised and supervised methods. The unsupervised methods vary and include automatic gray level thresholding [1], the watershed algorithm [2] and Active Contours [3, 4]. Another approach is to support the segmentation algorithm with temporal information from tracking algorithms, as proposed by [5, 6]. All these methods assume some structure in the data that may not fit every case.

Supervised methods, on the other hand, do not assume any structure but rather aim to learn it from the data. Classic machine learning methods generally require two independent steps, feature extraction and classification. In most cases the feature extraction is based either on prior knowledge of the image properties, such as in [7], or on general image properties such as smoothing filters, edge filters, etc. A widely used toolbox which takes a pixel classification approach is Ilastik [8], using a random forest classifier trained on predefined features extracted from a user's scribbles on the image.

Recent developments in the computer vision community have shown the strength of Convolutional Neural Networks (CNNs), which surpass state-of-the-art methods in object classification [9], semantic segmentation [10] and many other tasks. Recent attempts at cell segmentation using CNNs include [11, 12]. The common ground of all CNN methods is the need for an extensive training set alongside a predefined loss function such as the cross-entropy (CE).

In this work we present a novel approach for microscopy cell segmentation inspired by the GAN [13] and extensions thereof [14, 15, 16, 17]. The GAN framework is based on two networks, a generator and a discriminator, trained simultaneously with opposing objectives. This allows the discriminator to act as an abstract loss function, in contrast to common predefined losses such as the CE. We propose a pair of adversarial networks, an estimator and a discriminator, for the task of microscopy cell segmentation. Unlike the original GAN [13], we do not generate images from random noise vectors, but rather estimate the underlying variables of an image. The estimator learns to output a segmentation of the image while the discriminator learns to distinguish between expert manual segmentations and estimated segmentations given the associated image. The discriminator is trained to minimize a classification loss on two classes, manual and estimated, i.e. minimizing the similarity between the two. The estimator, on the other hand, is trained to maximize the discriminator's loss and, effectively, maximize the similarity. In [18], semantic segmentations of natural images are generated for a set of predefined classes. The main difference, however, lies in our need to separate instances of a single class (cells) rather than to separate different classes. Our method also differs in the choice of discriminator architecture and training method.

Our contribution is three-fold. We expand the concept of the GAN to the task of cell segmentation, thereby reducing the dependency on the selection of a loss function. We propose a novel architecture for the discriminator, referred to as the “Rib Cage” architecture (see section 2.4.2), which is adapted to the problem. The “Rib Cage” architecture includes several cross connections between the image and the segmentation, allowing the network to model complicated correlations between the two. Furthermore, we show that accurate segmentations can be achieved with a low number of training examples, dramatically reducing the manual workload.

The rest of the paper is organized as follows. Section 2 defines the problem and elaborates on the proposed solution. Section 3 presents the results for both a common adversarial and non-adversarial loss compared to the proposed method, showing promising initial results. Section 4 summarizes and concludes the work thus far.

2 Methods

2.1 Problem Formulation

Let $\Omega$ define the image domain and let the image $I : \Omega \rightarrow \mathbb{R}$ be an example generated by the random variable $X$. Our objective is to partition the image into individual cells, where the main difficulty is separating adjacent cells. Let the segmentation image $\Gamma$ be a partitioning of $\Omega$ into three disjoint sets, background, foreground (cell nuclei) and cell contour, also generated by some random variable $Y$. The two random variables are statistically dependent with some unknown joint probability $P(X, Y)$. The problem we address can be formulated as finding the most likely partitioning from the data given only a small number, $K$, of example pairs $\{(I_k, \Gamma_k)\}_{k=1}^{K}$. Had $P(X, Y)$ been known, the optimal estimator would be the Maximum Likelihood (ML) estimator:

$$\hat{\Gamma}_{ML} = \arg\max_{\Gamma} P(Y = \Gamma \mid X = I)$$

However, since $P(X, Y)$ is unknown and cannot be calculated, we learn a near-optimal estimator of $\Gamma$ using the manual segmentation, $\Gamma_{GT}$, as our target.

2.2 Estimation Network

We propose an estimator in the form of a CNN $E$ with parameters $\theta_E$. We wish to train the estimator such that the estimated $\hat{\Gamma} = E_{\theta_E}(I)$ will be as close as possible to the optimal ML estimation $\hat{\Gamma}_{ML}$. This is achieved by optimizing $\theta_E$ for some loss function $\ell$ (defined in section 2.3):

$$\theta_E^{*} = \arg\min_{\theta_E} \ell\left(E_{\theta_E}(I), \Gamma_{GT}\right)$$
2.3 Adversarial Networks

Unlike the GAN, aiming to generate examples from an unknown distribution, we aim to estimate the variables of an unknown conditional distribution $P(Y \mid X)$. Defining the loss either in a supervised pixel-based way, e.g. a per-pixel norm, or in an unsupervised global method, by a cost functional that constrains the partition into homogeneous regions while minimizing the length of their boundaries, is usually not well defined. We instead define the loss by pairing our estimator with a discriminator. Let $E$ and $D$ denote the estimator and discriminator respectively, both implemented as CNNs with parameters $\theta_E$ and $\theta_D$ respectively. The estimator aims to find the best estimation $\hat{\Gamma} = E_{\theta_E}(I)$ of the partitioning given the image $I$. The discriminator, on the other hand, tries to distinguish between $\Gamma$ and $\hat{\Gamma}$ given pairs of either $(I, \Gamma)$ or $(I, \hat{\Gamma})$, and outputs the probability that the input is manual rather than estimated, denoted $D_{\theta_D}(I, \cdot)$. As in the GAN case, the objectives of the estimator and the discriminator are exactly opposing, and so are the losses for training $\theta_E$ and $\theta_D$. We train $D$ to maximize the probability of assigning the correct label to both manual examples and examples estimated by $E$. We simultaneously train $E$ to minimize the same probability, essentially trying to make $\Gamma$ and $\hat{\Gamma}$ as similar as possible:

$$\theta_D^{*} = \arg\max_{\theta_D} \left[ \log D_{\theta_D}(I, \Gamma) + \log\left(1 - D_{\theta_D}\left(I, E_{\theta_E}(I)\right)\right) \right]$$
$$\theta_E^{*} = \arg\min_{\theta_E} \log\left(1 - D_{\theta_D}\left(I, E_{\theta_E}(I)\right)\right)$$

In other words, $E$ and $D$ are players in a min-max game with the value function:

$$\min_{E} \max_{D} V(D, E) = \mathbb{E}_{(I, \Gamma) \sim P(X, Y)}\left[\log D(I, \Gamma)\right] + \mathbb{E}_{I \sim P(X)}\left[\log\left(1 - D(I, E(I))\right)\right]$$

The equilibrium is achieved when $\Gamma$ and $\hat{\Gamma}$ are similar such that the discriminator cannot distinguish between the pairs $(I, \Gamma)$ and $(I, \hat{\Gamma})$.
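As a concrete illustration of the alternating updates, the following sketch plays the min-max game on a one-dimensional toy problem: "manual" values cluster around 2.0 and a one-parameter estimator learns to mimic them. The toy distribution, the learning rate, and the non-saturating estimator update are all our assumptions for the sketch, not the paper's settings.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
mu = -1.0            # estimator parameter: E outputs the constant mu (assumed start)
w0, w1 = 0.0, 0.0    # discriminator parameters: D(y) = sigmoid(w0 + w1 * y)
lr = 0.05

for _ in range(3000):
    y_real = 2.0 + 0.1 * rng.gauss(0.0, 1.0)   # a "manual" example
    y_fake = mu                                 # the estimator's output

    # Discriminator: gradient ascent on log D(y_real) + log(1 - D(y_fake))
    p_real = sigmoid(w0 + w1 * y_real)
    p_fake = sigmoid(w0 + w1 * y_fake)
    w0 += lr * ((1.0 - p_real) - p_fake)
    w1 += lr * ((1.0 - p_real) * y_real - p_fake * y_fake)

    # Estimator: non-saturating update, pushing D to label the fake as "manual"
    p_fake = sigmoid(w0 + w1 * y_fake)
    mu += lr * (1.0 - p_fake) * w1
```

At equilibrium the discriminator's output approaches 1/2 on both kinds of input, mirroring the condition stated above.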

2.4 Implementation Details

2.4.1 Estimator Network Architecture

The estimator network is designed as a five-layer fully convolutional CNN; each layer is constructed of a convolution followed by batch normalization and a leaky-ReLU activation. The output of the estimator is an image of the same size as the input with three channels corresponding to the probability that a pixel belongs to the background, foreground or cell contour.
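The three-channel probabilistic output described above can be sketched as a per-pixel softmax; the softmax itself is our assumption, as the text only states that the output channels hold class probabilities.

```python
import math

def pixel_class_probs(logits):
    """Map one pixel's 3-channel scores (background, foreground, contour)
    to probabilities via a numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A pixel whose strongest score is in the contour channel:
probs = pixel_class_probs([-0.5, 0.3, 1.2])
```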

2.4.2 Discriminator Network Architecture

The discriminator is designed with a more complex structure. The discriminator's task is to distinguish between manual and estimated segmentation images given a specific gray level (GL) image. The question arises of how to design a discriminator architecture which can accept both the GL and segmentation images as input. A basic design is that of a classification CNN where both images are concatenated along the channel axis, as done in [17]. However, we believe that this approach is not optimal for our needs since, for this task, the discriminator should be able to match high-level features from the GL and segmentation images. Yet these features may have very different appearances. For example, an edge of a cell in the GL image appears as a transition from white to black, while the same edge in the segmentation image appears as a thin blue line. This difference requires the network to learn individual filters for each input; finding correlations between the two then becomes a more feasible task. For these reasons we designed a specific architecture, referred to as a “Rib Cage” architecture, which has three channels. The first and second channels get inputs from the GL channel and the segmentation channel respectively, and each calculates feature maps using a convolutional layer; we refer to these channels as the “Ribs”. The third channel, referred to as the “Spine”, gets a concatenation of inputs from both the GL and segmentation channels and matches feature maps (i.e. correlations). See Figure 1 for an illustration of the “Rib Cage” block. The discriminator is designed as three consecutive “Rib Cage” blocks followed by two fully-connected (FC) layers with leaky-ReLU activations and a final FC layer with one output and a sigmoid activation for classification. Figure 2 illustrates the discriminator design. The discriminator's spine uses half the number of filters of the ribs.

Figure 1: The design of the basic building block for the discriminator. Each block has three inputs and three outputs.
Figure 2: The design of the Discriminator $D$. Three “Rib Cage” blocks (see Figure 1) are followed by two FC layers with ReLU activations and a last FC layer with a sigmoid activation. The Center-In channel of the first Rib Cage block is omitted.
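At the level of channel counts, the flow through consecutive “Rib Cage” blocks can be sketched as follows. The per-block filter counts passed in below are illustrative assumptions; only the spine using half the ribs' filters, and the omitted Center-In of the first block, follow the text.

```python
def rib_cage_channels(filters_per_block, gl_ch=1, seg_ch=3):
    """Track channel counts through consecutive Rib Cage blocks.
    Each block has two 'rib' convolutions (one on GL features, one on
    segmentation features) and a 'spine' convolution on the concatenation
    of the block's inputs. Returns, per block, the number of channels
    entering the spine and the rib/spine output widths."""
    spine_ch = 0  # Center-In of the first block is omitted
    report = []
    for nf in filters_per_block:
        spine_in = gl_ch + seg_ch + spine_ch  # channels concatenated into the spine
        gl_ch = seg_ch = nf                   # each rib outputs nf feature maps
        spine_ch = nf // 2                    # the spine uses half the ribs' filters
        report.append((spine_in, gl_ch, spine_ch))
    return report

# Illustrative filter counts for the three blocks (assumed, not the paper's):
channels = rib_cage_channels([32, 64, 128])
```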

2.4.3 Data

We trained the networks on the H1299 data set [20]. Each frame captures approximately 50 cells. Manual annotation of randomly selected frames was done by an expert, and the annotated set was split into a training set and a validation set. The training set was subsampled for training, and the examples were augmented using randomly cropped areas along with random flips and random rotations. The images were annotated using three labels, for the background (red), cell nucleus (green) and nucleus contour (blue), encoded as RGB images.
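The augmentation pipeline can be sketched as follows; the crop size and the restriction to 90-degree rotations are our assumptions, since the text states only random crop, flip, and rotation. Images are plain nested lists so the sketch stays self-contained.

```python
import random

def augment(image, seg, crop=64, rng=random):
    """Randomly crop, flip, and rotate an (image, segmentation) pair with
    identical transforms, so pixels stay aligned across the two."""
    h, w = len(image), len(image[0])
    top = rng.randrange(h - crop + 1)
    left = rng.randrange(w - crop + 1)
    def crop_(a):
        return [row[left:left + crop] for row in a[top:top + crop]]
    image, seg = crop_(image), crop_(seg)
    if rng.random() < 0.5:                     # horizontal flip
        image = [row[::-1] for row in image]
        seg = [row[::-1] for row in seg]
    for _ in range(rng.randrange(4)):          # rotate by a multiple of 90 degrees
        image = [list(r) for r in zip(*image[::-1])]
        seg = [list(r) for r in zip(*seg[::-1])]
    return image, seg
```

Passing a seeded `random.Random` instance makes an augmentation reproducible, which is convenient for debugging the training pipeline.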

3 Experiments and Results

Figure 3: Segmentation example of a validation image given a different number of training examples. The odd and even rows show the full image and a zoomed area respectively. Notice that in all cases the cells in the second row were correctly separated even though they are very close together. The bottom right shows the result when training with the CE loss.

       ADV-1    ADV-2    ADV-4    ADV-11   CE-11    ClassDisc   Ilastik
Prec   89.9%    85.4%    86.8%    85.8%    83.6%    78.7%       81.2%
Rec    82%      87.2%    86.8%    86.5%    86.4%    81.14%      80.2%
F      85.8%    86.3%    86.8%    86.1%    84.8%    79.9%       80.7%
J      80.6%    75.8%    77.4%    74.6%    72.1%    60.2%       68.4%

Table 1: Quantitative results. Each column ADV-K represents an experiment with K training examples. CE-11 and ClassDisc are experiments using the same estimator network trained with the pixel-based CE loss and with a simple classification discriminator, respectively. The last column is the comparison to the state-of-the-art tool, Ilastik [8]. The rows are the results for individual cell segmentation. As explained in [5], true positives (TP) are cells with a Jaccard measure greater than 0.5, false positives (FP) are automatic segmentations not appearing in the manual segmentation, and false negatives (FN) are the opposite. The measures are defined as Prec = TP/(TP+FP), Rec = TP/(TP+FN), and F = 2·Prec·Rec/(Prec+Rec). J indicates the mean Jaccard measure for individual cells.
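The cell-level evaluation described in the caption of Table 1 can be sketched as follows, with cells represented as sets of pixel coordinates; the greedy one-to-one matching is our assumption about how cells are paired.

```python
def evaluate(manual_cells, est_cells, thresh=0.5):
    """Cell-level precision, recall, F-measure, and mean Jaccard:
    an estimated cell is a true positive if its best Jaccard overlap
    with an unmatched manual cell exceeds the threshold (0.5)."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    tp, jacs, matched = 0, [], set()
    for est in est_cells:
        best_j, best_i = 0.0, None
        for i, man in enumerate(manual_cells):
            j = jaccard(est, man)
            if j > best_j:
                best_j, best_i = j, i
        if best_j > thresh and best_i not in matched:
            tp += 1
            matched.add(best_i)
            jacs.append(best_j)
    fp = len(est_cells) - tp        # estimated cells with no manual match
    fn = len(manual_cells) - tp     # manual cells with no estimated match
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    mean_j = sum(jacs) / len(jacs) if jacs else 0.0
    return prec, rec, f, mean_j
```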

We conducted four experiments, training the networks with different numbers of training examples. All other parameters were set identically. We evaluated the segmentation using the measures described in the caption of Table 1. We compared the adversarial training regime to the common CE loss, training only the estimator. We furthermore evaluated our choice of the “Rib Cage” discriminator versus a classification architecture (VGG16 [21]). We also compared our results to a state-of-the-art segmentation tool, Ilastik [8]. The manual annotations were done by an expert. The quantitative results for individual cell segmentation are detailed in Table 1. Note that the number of images in the training data had little effect on the results. Figure 3 shows an example of a segmented frame. It is clear that the networks learned a few distinct properties of the segmentation. First, each cell is encircled by a thin blue line. Second, the shape of the contour follows the true shape of the cell. Some drawbacks are still seen where two cells completely touch and the boundary is difficult to determine.

4 Summary

In this work we propose a new concept for microscopy cell segmentation using a CNN with an adversarial loss. The contribution of such an approach is two-fold. First, the loss function is automatically defined, as it is learned alongside the estimator, making this a simple-to-use algorithm with no tuning necessary. Second, we show that this method is robust to a low number of training examples, surpassing the compared methods.

The quantitative results, as well as the visual results, clearly show that both the estimator and our unique “Rib Cage” discriminator learn global and local properties of the segmentation, i.e. the shape of the cell, the contour surrounding it, and the fitting of segmentation edges to cell edges. These properties could not be learned using only a pixel-based CE loss, as is commonly done.