
Counting Cells in Time-Lapse Microscopy using Deep Neural Networks

An automatic approach to counting any kind of cells could alleviate work of the experts and boost the research in fields such as regenerative medicine. In this paper, a method for microscopy cell counting using multiple frames (hence temporal information) is proposed. Unlike previous approaches where the cell counting is done independently in each frame (static cell counting), in this work the cell counting prediction is done using multiple frames (dynamic cell counting). A spatiotemporal model using ConvNets and long short term memory (LSTM) recurrent neural networks is proposed to overcome temporal variations. The model outperforms static cell counting in a publicly available dataset of stem cells. The advantages, working conditions and limitations of the ConvNet-LSTM method are discussed. Although our method is tested in cell counting, it can be extrapolated to quantify in video (or correlated image series) any kind of objects or volumes.


1 Introduction

Analysis of cell images through time is a powerful tool commonly used in medicine and research. For instance, in regenerative medicine scientists run experiments on stem cells under several culture conditions and capture a large number of images for analysis [Janmey and McCulloch2007]. A particular task in time-lapse microscopy is cell counting. Although simpler than other tasks (for instance, cell segmentation), automating it would save considerable human effort in fields such as regenerative medicine.

Automatic cell counting has followed two approaches: detection and regression. In counting by detection, the object to quantify is first detected, which in traditional computer vision involves a segmentation step and a classification step. This approach is usually designed for a specific cell type. It has been largely replaced by more general techniques (counting by regression) borrowed from other computer vision problems such as crowd counting. Counting by regression skips the detection and estimates a number directly from the image; hence, the locations of the objects usually cannot be recovered. In this paper, we focus on the regression method.

Several works have proposed approaches for counting cells by regression. Xie et al. [Xie et al.2016] used a fully convolutional neural network to detect the position of the cells in the images. The input images were split into overlapping crops that fit the network input, and the full image was rebuilt by interpolating over the network output. This allowed them to count by integrating over the output of the network. Xue et al. [Xue et al.2016] proposed using simple convolutional neural networks (ConvNets) as regressors to obtain the object count from the image, using several mainstream ConvNet architectures and the same input image splitting as Xie et al. Finally, Cohen et al. [Cohen et al.2017] also use ConvNets as regressors in a pure regression way, but change the final reconstruction stage. In their work each input pixel is counted by several crops, so the crop counts are redundant (due to overlapping); the total count is obtained by dividing the sum over all crops by this redundancy factor.

These works are major steps towards the automation of cell counting, but they did not discuss the time factor. Although some microscopic objects do not change significantly in appearance over time (such as E. coli bacteria), several kinds of cells (e.g. stem cells) do. Hence, a deeper analysis of how these automatic counting approaches respond to variation over time is needed.

In this work, the problem of cell counting over time is addressed. Several challenges related to the cell counting task in the single-frame prediction (static) and multi-frame prediction (dynamic) cases are discussed. Current approaches for cell counting are evaluated and analyzed in the dynamic context. Finally, a spatiotemporal model using ConvNets and long short-term memory (LSTM) recurrent neural networks is proposed to overcome temporal variations. The advantages, working conditions and limitations of the ConvNet-LSTM method are discussed. Although our method is tested on cell counting, it can be extended to quantify any kind of objects or volumes in video (or correlated image series). All annotations and derived datasets used in this work, the source code, and the trained models are publicly available.

The rest of the paper is organized as follows: First, the challenges present in static and dynamic cell counting are discussed in section 2. Also, the methods used in the static and dynamic case are explained. Section 3 describes the experiments used to test the models. Results are presented in section 4. Section 5 discusses the results and limitations. Finally, in section 6 the conclusions and future work are presented.

2 Methodology

In this section, different situations that arise in the static and dynamic cases of cell counting are described and analyzed. The general ConvNet regression approach is briefly explained and the proposed spatiotemporal (ConvNet + LSTM) model is presented.

2.1 The counting Task

Figure 1: General procedure for cell counting

Fig. 1 shows the general procedure for cell counting. First, a cell culture or scene where the objects of interest are present is monitored over time. While the scene is observed, images are taken periodically (sampled); the sampling frequency is a parameter usually defined by the application (e.g. counting cells or counting people). The images (sampled at times t1, t2, and t3) are processed by a counting algorithm, which gives the number of cells (or objects) in the scene at each sampling time. In Fig. 1 the objects of interest (red dots) increase over time, but this is not a requirement.

2.2 Challenges in static and dynamic cell counting

The general perspective presented in Fig. 1 helps to state the problem, but it leaves out several details.

2.2.1 Background and Outliers

The background (the scene or context in which the objects of interest appear) is usually not uniform. In microscopy, several sources of background noise contaminate the images, such as non-uniform illumination, dead cells or organisms, external contaminants (dust and particles), and debris produced by cell interaction and growth. In previous works on cell segmentation, this problem has been solved using a sufficiently discriminative classifier.

2.2.2 Object appearance

Figure 2: Snapshot of stem cell while reproduction takes place

Cells are not time-invariant objects. They show a lot of intra-class variation for several reasons: changes in the position of the nucleus (if visible), interactions between cells, and the internal mechanics of the cell, among others. Furthermore, intra-class variation peaks when cell reproduction begins. Fig. 2 shows a set of frames of the same single cell while reproduction takes place; the snapshots are sampled every hour. All the images in Fig. 2 (except the last one) must be identified as one cell despite the evident variation in appearance. The number of possible variations increases when multiple cells in the same image are considered, as each cell can be in any stage of reproduction independently.

2.2.3 Image partition

Current state-of-the-art solutions for cell counting [Xie et al.2016, Xue et al.2016] share a similar framework to process the images.

Figure 3: Static cell counting approach

Fig. 3 shows this static cell counting approach. It can be summarized in three parts:

  1. The input image is cropped using a sliding window (red frame in Fig. 3). The step and the window size (usually equal to the ConvNet input size) must be selected a priori. Usually, the crops overlap in order to provide redundant information.

  2. Each crop is evaluated by the ConvNet to get an estimate of the number of cells or objects.

  3. An algorithm merges the counts of the individual crops into a global count. Some works [Xie et al.2016, Xue et al.2016] use interpolation to rebuild the input image as a density map (whose integral is the number of cells). Cohen et al. [Cohen et al.2017] sum the results of all crops and divide by the number of overlapping pixels.
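The three parts above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any of the cited works: the functions `crop_counts` and `merge_counts` are hypothetical names, the true number of annotated dots in each crop stands in for the ConvNet prediction, and the merge step is a redundancy-normalised sum in the spirit of [Cohen et al.2017]. It assumes the window size is a multiple of the step and the image size is a multiple of the step, so that after padding every pixel is covered by the same number of crops.

```python
import numpy as np


def crop_counts(dot_map, window, step):
    """Count annotated cells in every sliding-window crop of a dot map.

    The true count of each crop stands in for a ConvNet prediction.
    """
    pad = window - step  # zero padding so every pixel is covered equally often
    padded = np.pad(dot_map, pad)
    height, width = padded.shape
    counts = []
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            counts.append(padded[top:top + window, left:left + window].sum())
    return np.array(counts, dtype=float)


def merge_counts(counts, window, step):
    """Merge per-crop counts: total sum divided by the redundancy factor."""
    redundancy = (window // step) ** 2  # number of crops covering each pixel
    return counts.sum() / redundancy


# Toy frame (assumption): 100 x 100 pixels with 7 annotated cells.
rng = np.random.default_rng(0)
dot_map = np.zeros((100, 100))
rows = rng.choice(100, size=7, replace=False)
cols = rng.choice(100, size=7, replace=False)
dot_map[rows, cols] = 1

counts = crop_counts(dot_map, window=50, step=25)
total = merge_counts(counts, window=50, step=25)
```

With exact per-crop counts the merged total equals the number of annotated cells; with a real regressor, the errors of overlapping crops are averaged by the same division.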

This approach has a known issue with the hyper-parameters of step one (the step and size of the sliding window). The size of the window controls the number of classes that the model must be able to distinguish. For instance, a window capable of containing four cells must distinguish between 0, 1, 2, 3, and 4 cells (five classes), whereas a larger window (capable of containing eight cells) must distinguish nine classes. Hence, the window size controls the complexity of the classification task. A larger window results in more classes to predict and fewer crops to train the model (since fewer partitions can be made).

The step parameter controls the amount of redundancy and the dataset size. A small step yields many highly correlated crops, whereas a larger step yields more independent samples but a smaller training set. For these reasons, the size and step of the sliding window are problem-specific parameters, i.e. their optimal values must be found experimentally for each kind of cell (e.g. stem cells, blood cells) or object. The theoretical relation between these parameters and the performance of the model is an open question.
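The trade-off between window size, step, and dataset size can be made concrete with a short calculation. The sketch below is only illustrative: the helper names are hypothetical, no padding is applied, the frame size of 520 × 696 pixels corresponds to one quadrant of a 1392 × 1040 image, and the step values are assumptions chosen for the example.

```python
def crops_per_frame(height, width, window, step):
    """Number of sliding-window crops that fit in a frame without padding."""
    rows = (height - window) // step + 1
    cols = (width - window) // step + 1
    return rows * cols


def classes_for_window(max_cells):
    """Classes the model must distinguish: 0, 1, ..., max_cells."""
    return max_cells + 1


# Illustrative values (assumptions): window of 50 pixels, three candidate steps.
for step in (10, 25, 50):
    n_crops = crops_per_frame(520, 696, window=50, step=step)
    print(f"step={step:2d} -> {n_crops} crops per frame")
```

A smaller step multiplies the number of crops, but neighbouring crops share most of their pixels, so the extra samples are strongly correlated.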

2.2.4 Unbalanced data

The number of samples per class (i.e. the number of objects in each crop) can be unbalanced. For instance, the proliferation of cells can be quantified by the equation [Sherley et al.1995]:

$$N_t = N_0 \, 2^{tf} \qquad (1)$$

where $N_t$ is the number of cells at time $t$, $N_0$ is the initial number of cells, and $f$ is the frequency of cell cycles per unit time. The exponential nature of this proliferation means that far more images with few cells are available than images with many cells. Hence, small datasets suffer from a class imbalance problem that can limit model performance. Large datasets do not, because the classes can be artificially balanced by discarding extra samples from the most frequent classes.
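As a worked example of exponential proliferation, the sketch below assumes the usual doubling form $N_t = N_0 2^{tf}$ for equation (1). The function name and all numeric values (initial count, one cell cycle per 24 hours, hourly sampling over 84 hours) are assumptions chosen for illustration, not measurements from the dataset.

```python
def cells_at_time(n0, f, t):
    """Cells after time t, assuming exponential growth N_t = N_0 * 2**(t * f)."""
    return n0 * 2 ** (t * f)


# Illustrative values (assumptions): 10 initial cells, one cycle every 24 hours.
n0 = 10
f = 1 / 24  # cell cycles per hour
hourly_counts = [cells_at_time(n0, f, t) for t in range(0, 85)]

# Fraction of the hourly samples recorded before the culture reaches half
# of its final size: most frames show a comparatively sparse culture.
final = hourly_counts[-1]
sparse_fraction = sum(c < final / 2 for c in hourly_counts) / len(hourly_counts)
```

Under these assumptions, well over half of the hourly frames are recorded before the culture reaches half of its final size, which is the imbalance described above.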

2.2.5 Sampling rate

As Fig. 1 shows, an image of the scene is recorded at the sampling times t1, t2, and t3 to compute the number of cells. The sampling time has no effect on current counting approaches (since time is not used in static cell counting); in this work, however, it influences the features. A low sampling rate (sampling less frequently) leads to time-uncorrelated images, so the temporal information cannot be used. A high sampling rate gives time-correlated images, but because cells change very slowly over time (in appearance and position), neighboring images can be almost identical.

2.3 Long-term Recurrent Convolutional Network (LRCN) for dynamic cell counting

The aim of our work is to merge the spatial data (static cell counting) with the time variable. This spatiotemporal problem has previously been addressed by the action recognition community. Based on the work of Donahue et al. [Donahue et al.2015] on action recognition, we propose a combination of ConvNets (to handle image information) and recurrent neural networks (to handle temporal features). Fig. 4 shows the proposed framework for cell counting in time-lapse microscopy.

Figure 4: Proposed framework for cell counting in time-lapse microscopy

2.3.1 Input data and partition

In this case, the two parameters explained in section 2.2.3 (step and window size) remain the same. Additionally, a new parameter called the temporal window is used. The temporal window sets how many previous frames are used to predict the number of objects in the current image. However, this parameter adds a constraint to the framework: the number of cells in an image can only be predicted once the number of previous frames equals the temporal window. This constraint must be avoided (especially in cell counting applications), since the sampling rate could be one image per hour and the growth process usually takes several days. The method should be able to process the video frame by frame in order to take actions (or not) each time an image is recorded. To avoid this constraint, temporal padding is applied, as explained in section 2.3.3.

Each frame is divided using the selected step and window size, producing a set of crops for each frame. Each crop has a set of temporally associated crops, i.e. the same crop (same spatial position) at past time instants (frames). The number of time-associated crops is given by the temporal window. Hence, as Fig. 4 shows, each frame is divided into crops and each crop is associated with its past crops.

2.3.2 ConvNet

The convolutional neural network acts as a feature extractor. Each crop is evaluated by the ConvNet, but instead of using the whole ConvNet as a regressor (as previous works did), we collect the feature vector that appears before the fully connected layer (the exact layer varies between architectures). Such ConvNet features are usually extracted from pre-trained architectures and are called off-the-shelf or bottleneck features.

Four mainstream ConvNet architectures were used as feature extractors: VGG16, VGG19 [Simonyan and Zisserman2014], ResNet50 [He et al.2016], and InceptionV3 [Szegedy et al.2016]. Nevertheless, any ConvNet or computer vision approach can be used to make this mapping (image - features).

2.3.3 Temporal padding and stack of features

Once the crops are propagated forward through the ConvNet, a feature vector is obtained for each crop; its dimension (the number of features) depends on the ConvNet architecture. This feature vector must be stacked with the feature vectors of its previous crops to build a block with one row per frame in the temporal window. However, it is not possible to stack feature vectors when the number of previous frames is smaller than the temporal window. Since no previous information exists, we instead add vectors of ones with the same dimension as a feature vector. For example, if the temporal window is five but only three images have been recorded so far, the block is composed of three feature vectors and two vectors of ones. Experimentally, we found that these vectors of ones help the network to predict the true number of cells, since they indicate which growth phase the culture is in (this can also be a problem, as discussed in the limitations). Notice that the block must be built in the strict temporal order in which the frames appear.
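The temporal padding and stacking described in this section can be sketched as follows. The function name `build_block` is hypothetical, and two details are assumptions that the text does not fix: the vectors of ones are placed at the oldest positions of the block, and a feature length of 512 is used only for illustration.

```python
import numpy as np


def build_block(feature_vectors, temporal_window):
    """Stack the most recent feature vectors of one crop position in
    temporal order, padding missing past frames with vectors of ones."""
    recent = list(feature_vectors[-temporal_window:])
    n_features = recent[0].shape[0]
    n_missing = temporal_window - len(recent)
    padding = [np.ones(n_features)] * n_missing  # assumed to fill the oldest slots
    return np.stack(padding + recent)


# Only three frames recorded so far, temporal window of five (illustrative).
rng = np.random.default_rng(0)
recorded = [rng.normal(size=512) for _ in range(3)]
block = build_block(recorded, temporal_window=5)
print(block.shape)  # (5, 512)
```

Because the padding rows only appear while fewer frames than the temporal window exist, their number also encodes how early in the recording the current frame is.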

2.3.4 Recurrent Neural Network (RNN) and LSTM

Recurrent neural networks have been successfully used for tasks with complex temporal dynamics such as speech recognition and text generation. Long short-term memory (LSTM) [Hochreiter and Schmidhuber1997] RNNs extend the scalability of RNNs, allowing them to be trained in large topologies without exploding gradient problems. In this work we use LSTM networks; from here on, references to RNNs should be read as LSTMs. The combined ConvNet + LSTM model is referred to as LRCN.

RNNs support several inference schemes over time; in this work, the many-to-one approach is used. In many-to-one inference, several inputs enter the RNN before a prediction is made. In this work the number of inputs equals the temporal window, which means that the input to the RNN in step 3 (see Fig. 4) is the block of stacked feature vectors. Hence, we have an RNN with one time step per frame in the temporal window and one output corresponding to the estimated number of cells in the crop.
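Many-to-one inference can be illustrated with a plain NumPy forward pass. This is a simplified sketch, not the model used in the experiments: it has a single LSTM layer instead of two, its weights are random and untrained, and the function name, gate ordering, and dimensions (8 features, 4 hidden units, 5 time steps) are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_many_to_one(block, params):
    """Run one LSTM layer over all time steps of a block and regress a
    single count from the last hidden state (many-to-one inference)."""
    W, U, b, w_out, b_out = params
    hidden = U.shape[1]
    h = np.zeros(hidden)  # hidden state
    c = np.zeros(hidden)  # cell state
    for x_t in block:  # rows are visited in strict temporal order
        z = W @ x_t + U @ h + b  # gate pre-activations, shape (4 * hidden,)
        i = sigmoid(z[:hidden])                # input gate
        f = sigmoid(z[hidden:2 * hidden])      # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
        g = np.tanh(z[3 * hidden:])            # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return float(w_out @ h + b_out)  # one output after the last time step


rng = np.random.default_rng(0)
n_features, hidden, temporal_window = 8, 4, 5
params = (
    0.1 * rng.normal(size=(4 * hidden, n_features)),  # input weights
    0.1 * rng.normal(size=(4 * hidden, hidden)),      # recurrent weights
    np.zeros(4 * hidden),                             # gate biases
    rng.normal(size=hidden),                          # regression weights
    0.0,                                              # regression bias
)
block = rng.normal(size=(temporal_window, n_features))
estimated_count = lstm_many_to_one(block, params)
```

The loop consumes every row of the block, but only the state after the last row is mapped to a count, which is what distinguishes many-to-one from per-step prediction.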

Bidirectional LSTM RNNs [Graves and Schmidhuber2005] make the most of the temporal information by stepping through the input time steps in both the forward and backward directions. However, they can only be used when the whole input (all the time steps) is available. This is not a problem here, since our block already contains all the time steps. Bidirectional LSTM RNNs are also used and compared against LSTM RNNs.

2.3.5 Join algorithm

In step 4 (see Fig. 4) there is an estimated cell count for each crop in the current frame. To make a global decision on the number of cells in the current frame, the same approach used in [Xue et al.2016] (briefly explained in section 2.2.3) is applied.

3 Experimental setup

In this section, the datasets and the experiments carried out in this work are described. Additionally, implementation details (such as libraries, architecture parameters, hardware, and optimization methods) are explained.

3.1 Dataset

In this work, the publicly available Cell Image Analysis Archive dataset [Kanade et al.2011] was used. This dataset contains videos of myoblastic stem cells during the growth of the culture. The images were acquired with phase-contrast microscopy every 5 minutes over approximately 3.5 days using a Zeiss Axiovert T135V microscope. Each image contains 1392 × 1040 pixels with a resolution of 1.3 μm/pixel. Five sets of images (090325-F0009, 90303-F0002, 090318-F0007, and 090303-F0006) from different cultures were randomly selected (this is a very important condition, since the RNN can easily memorize the growth curve of a single culture). Each set of images was sub-sampled to one image per hour (this is the sampling rate parameter discussed in section 2.2). Also, from each frame one quadrant (dividing the whole image evenly into four regions) was randomly selected. Each frame was manually annotated with a red dot at the center of each cell.
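The sub-sampling and quadrant selection can be sketched as follows. The helper names are hypothetical, frame indices and a zero-filled array stand in for the real images, and the sequence length of 1008 frames is simply 3.5 days at one frame every 5 minutes.

```python
import numpy as np

MINUTES_PER_FRAME = 5              # acquisition interval of the dataset
STRIDE = 60 // MINUTES_PER_FRAME   # keep one frame per hour


def subsample(frames, stride):
    """Keep every stride-th frame of a sequence."""
    return frames[::stride]


def random_quadrant(image, rng):
    """Return one of the four equal quadrants of an image, chosen at random."""
    height, width = image.shape[:2]
    top = int(rng.integers(0, 2)) * (height // 2)
    left = int(rng.integers(0, 2)) * (width // 2)
    return image[top:top + height // 2, left:left + width // 2]


# 3.5 days at one frame every 5 minutes gives 1008 frames (placeholders here).
frames = list(range(1008))
hourly = subsample(frames, STRIDE)

rng = np.random.default_rng(0)
image = np.zeros((1040, 1392))  # rows x columns of one full frame
quadrant = random_quadrant(image, rng)
```

Each selected quadrant has 520 × 696 pixels, which is the frame size that the sliding window of section 2.2.3 then operates on.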

Following the procedure of the image partition section, the step and window size parameters must be optimized for each type of cell. Several window sizes and overlaps (between crops) were tested. Experimentally, for our dataset a window size of 50 with overlapping crops worked best. The training and test sets were built using a method similar to 5-fold cross-validation: in each fold, four image sets were used for training and the remaining set was used for testing.
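The fold construction amounts to leaving one image set out at a time. A minimal sketch follows; the function name and the set names are hypothetical placeholders rather than the identifiers of the actual image sets.

```python
def leave_one_set_out(set_names):
    """Build folds where each image set is used once as the test set and
    the remaining sets are used for training."""
    folds = []
    for index, test_set in enumerate(set_names):
        train_sets = [name for j, name in enumerate(set_names) if j != index]
        folds.append((train_sets, test_set))
    return folds


# Placeholder names for the five image sets (assumption).
image_sets = ["set_1", "set_2", "set_3", "set_4", "set_5"]
folds = leave_one_set_out(image_sets)
for fold_number, (train_sets, test_set) in enumerate(folds, start=1):
    print(f"F{fold_number}: train on {train_sets}, test on {test_set}")
```

Splitting by image set rather than by crop keeps all frames of a culture on the same side of the split, so the recurrent model cannot be evaluated on a growth curve it has already seen.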

3.2 Experiments

3.2.1 Static cell counting

The five folds were tested using the framework shown in Fig. 3 with the features from the four pre-trained mainstream ConvNets. The ConvNets were models pre-trained on the popular ImageNet dataset. Like [Xue et al.2016], we tried to train these architectures from scratch; however, the accuracy decreased (as in [Xue et al.2016]). The fully connected layer of each model was fine-tuned for a regression task, since the advantages (accuracy) of regression over classification for cell counting were already shown by [Xue et al.2016]. Experimentally, we found that the models perform better when trained with an L1 loss function (the same conclusion as [Xue et al.2016, Cohen et al.2017]).

3.2.2 Dynamic cell counting

The five folds were tested using the framework shown in Fig. 4 with the same ConvNet features as in the static cell counting experiments. A two-layer RNN of LSTM cells was stacked at the end of each ConvNet (in place of the fully connected layer). This RNN was trained using an L2 loss function; notice that this differs from the loss function of the static cell counting framework. This harsher loss function helped to increase the accuracy on the unbalanced dataset in the dynamic cell counting framework (which was not the case for static cell counting).
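The difference between the two loss functions can be seen on a toy batch. The sketch below assumes the mean-absolute-error and mean-squared-error forms of the L1 and L2 losses; the function names and the counts are made up for illustration.

```python
import numpy as np


def l1_loss(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def l2_loss(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


# Toy batch (assumption): three crops of a frequent class predicted exactly
# and one crop of a rare, crowded class underestimated by four cells.
y_true = np.array([1.0, 1.0, 1.0, 8.0])
y_pred = np.array([1.0, 1.0, 1.0, 4.0])

print(l1_loss(y_true, y_pred))  # 1.0
print(l2_loss(y_true, y_pred))  # 4.0
```

The squared error grows quadratically with the size of the mistake, so a large error on a rare class weighs much more under L2 than under L1.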

The same set of experiments (four ConvNets + LSTM RNN) was repeated for three values of the temporal window. The set was also repeated using a bidirectional LSTM, again with two layers of cells.

3.3 Performance metrics

We use the same performance metric (mean absolute error) of  [Xue et al.2016, Cohen et al.2017] since it has been used by many object counting papers.
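For reference, the metric can be computed as in the sketch below. Taking the standard deviation over the per-frame absolute errors is an assumption about how the spread reported next to each MAE is obtained; the function name and the counts are illustrative.

```python
import numpy as np


def mae_and_std(true_counts, predicted_counts):
    """Mean absolute error and standard deviation of the absolute errors."""
    true_counts = np.asarray(true_counts, dtype=float)
    predicted_counts = np.asarray(predicted_counts, dtype=float)
    errors = np.abs(true_counts - predicted_counts)
    return float(errors.mean()), float(errors.std())


# Illustrative per-frame counts for one test fold (assumption).
true_counts = [10, 12, 15, 20]
predicted_counts = [11, 12, 13, 24]
mae, std = mae_and_std(true_counts, predicted_counts)
```

For these values the absolute errors are 1, 0, 2 and 4, which gives an MAE of 1.75.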

3.4 Implementation details

All network optimization and testing was performed on an NVIDIA GeForce GTX 1080 Ti GPU and implemented using the Keras API [Chollet and others2015] with a TensorFlow backend [Abadi et al.2016].

4 Results

This section shows the results of the previously stated experiments split into two sections: static cell counting and dynamic cell counting results.

4.1 Static cell counting

Table 1 shows the results of static cell counting using ConvNets. The first column shows the ConvNet architecture, followed by the MAE and standard deviation on the test set of each fold. Notice the difference in performance between folds: in almost all the experiments fold F4 had the best performance and folds F1 and F5 the worst.

Model        F1             F2             F3             F4             F5
VGG16        67.20 ± 57.02        ± 19.5   37.71 ± 18.3   23.59 ± 11.9   64.28 ± 57.0
VGG19        52.01 ± 27.8   29.12 ± 15.6   28.53 ± 21.9   13.23 ± 5.62   45.40 ± 35.3
ResNet50     22.90 ± 18.5   22.32 ± 13.2   19.86 ± 14.6   6.63 ± 4.39    14.51 ± 9.22
InceptionV3  36.31 ± 27.2   26.3 ± 17.5    35.86 ± 29.3   27.62 ± 19.3   77.56 ± 62.7
Table 1: Results of static cell counting experiments

The best-performing architecture was ResNet50 and the worst was VGG16. Similar to [Xue et al.2016], we obtained better results (in almost all cases) with deeper architectures using fine-tuning. The results of the pre-trained InceptionV3 ConvNet could be due to the model being too specialized in natural images (ImageNet images), which are very different from microscopy images.

4.2 Dynamic cell counting with LSTM

Table 2 shows the results of dynamic cell counting using ConvNets and LSTM. The first column shows the ConvNet architecture, followed by the MAE and standard deviation on the test set of each fold. Table 2 is subdivided according to the value of the temporal window parameter (as previously stated).

Model        F1             F2             F3             F4             F5
LRCN -
VGG16        43.68 ± 36.7   10.16 ± 6.9    12.96 ± 5.63   51.50 ± 18.3   55.17 ± 50.9
VGG19        32.21 ± 25.7   9.99 ± 5.5     9.83 ± 5.79    29.14 ± 20.8   29.14 ± 20.9
ResNet50     42.42 ± 37.0   43.36 ± 14.0   28.25 ± 13.2   23.13 ± 11.6   56.37 ± 53.4
InceptionV3  53.25 ± 43.8   22.96 ± 12.5   19.37 ± 10.4   24.03 ± 12.1   58.89 ± 57.5
LRCN -
VGG16        42.13 ± 34.2   10.95 ± 7.29   33.89 ± 5.75   36.38 ± 16.9   30.19 ± 25.6
VGG19        30.38 ± 26.3   7.26 ± 4.5     8.63 ± 5.76    35.82 ± 14.7   27.63 ± 19.5
ResNet50     42.94 ± 37.6   18.55 ± 10.4   25.80 ± 10.9   22.34 ± 11.8   54.16 ± 47.8
InceptionV3  42.57 ± 37.1   11.45 ± 9.12   17.03 ± 13.4   21.10 ± 11.0   52.68 ± 33.0
LRCN -
VGG16        34.81 ± 30.2   25.60 ± 13.3   31.66 ± 16.0   27.49 ± 16.0   53.94 ± 47.1
VGG19        46.93 ± 34.5   13.23 ± 6.48   19.87 ± 6.45   30.49 ± 19.6   53.87 ± 46.9
ResNet50     45.48 ± 40.1   19.36 ± 11.0   29.03 ± 13.9   24.41 ± 12.5   54.74 ± 49.7
InceptionV3  42.15 ± 36.7   30.79 ± 14.9   29.31 ± 14.2   19.5 ± 18.7    53.65 ± 46.3
Table 2: Results of dynamic cell counting using LSTM experiments

Notice that the unbalanced performance between folds persists: fold F2 had the best performance and folds F1 and F5 the worst (for the values of the temporal window tested). As the temporal window increases, performance improves in a mostly regular pattern up to a point; beyond it, as Table 2 shows, too much temporal information (a large temporal window) reduces the performance and increases the standard deviation. However, for the best temporal window found experimentally, a direct comparison of the VGG16 and VGG19 architectures (e.g. VGG16 in static cell counting against VGG16 in dynamic cell counting) shows that the LRCN is better than the single ConvNet in almost all cases.

Nevertheless, for the more complex ConvNets (those with residual connections or network-in-network modules) the coupling approach did not perform well. This is probably because the fully connected layer in these ConvNets performs better than the stacked two-layer LSTM network used in the LRCN. A more specific and complex LSTM network could probably beat the single ConvNet approach in the ResNet and InceptionV3 cases (however, this hypothesis is not tested in this paper).

Table 3 shows the results of dynamic cell counting using ConvNets and bidirectional LSTM (Bi-LSTM). The results are laid out as in Table 2. As expected, the Bi-LRCN performs better (in almost all cases) than the simple LRCN. In line with the results of [Graves and Schmidhuber2005], the information of the sequence in the backward direction has a great impact on the model. Notice that the Bi-LRCN improves the MAE but not so much the standard deviation.

Model        F1             F2             F3             F4             F5
Bi-LRCN -
VGG16        40.02 ± 33.0   12.04 ± 9.32   12.33 ± 5.46   36.65 ± 17.8   25.30 ± 16.3
VGG19        28.16 ± 26.0   11.69 ± 5.93   28.31 ± 6.50   25.84 ± 13.7   27.87 ± 21.3
ResNet50     43.71 ± 38.5   17.42 ± 9.39   23.94 ± 9.37   25.18 ± 13.1   54.14 ± 44.7
InceptionV3  50.54 ± 21.5   12.11 ± 10.2   18.12 ± 12.2   18.71 ± 15.6   58.92 ± 40.2
Bi-LRCN -
VGG16        38.03 ± 31.5   8.88 ± 6.88    9.79 ± 5.61    33.47 ± 16.7   28.63 ± 25.8
VGG19        27.42 ± 23.1   5.43 ± 4.9     7.87 ± 5.67    43.23 ± 15.4   25.43 ± 18.1
ResNet50     45.52 ± 38.5   19.21 ± 12.42  25.63 ± 9.1    26.54 ± 14.2   54.33 ± 50.4
InceptionV3  45.15 ± 21.5   25.03 ± 14.0   23.12 ± 8.9    20.90 ± 12.2   59.83 ± 29.2
Bi-LRCN -
VGG16        40.41 ± 33.1   10.27 ± 7.29   23.28 ± 7.19   66.37 ± 19.0   27.78 ± 22.4
VGG19        24.86 ± 21.5   6.49 ± 4.2     21.49 ± 5.71   29.20 ± 13.8   24.33 ± 21.3
ResNet50     53.44 ± 43.8   23.59 ± 12.7   27.22 ± 12.3   22.28 ± 11.3   54.74 ± 49.7
InceptionV3  44.43 ± 23.7   24.56 ± 14.0   22.07 ± 9.3    18.92 ± 12.8   61.90 ± 60.5
Table 3: Results of dynamic cell counting using Bi-LSTM

5 Discussion and Limitations

Although dynamic cell counting was shown to outperform static cell counting, the following issues were found to limit the performance of our approach:

5.1 Unsuccessful improvement in LRCN using ResNet and InceptionV3

As previously stated, for simple ConvNets (without residual connections or network-in-network approaches) such as VGG16 and VGG19, the LRCN improves the performance considerably, but for complex ConvNets the approach did not perform well. More complex LSTM networks, capable of replacing the fully connected layer of these ConvNets, must be implemented.

5.2 Unbalanced performance between folds

The performance ranking between folds remains more or less the same in all the experiments. These results reflect the number of samples per class in the test set of each fold. The folds with the highest MAE are the sets with the highest number of classes (from 0 to 8 cells per image). The model therefore did not have enough samples from classes with many (more than four) cells and performs worst on these test sets. However, the fact that the LRCN deals better with this problem supports the positive influence of temporal information.

5.3 LRCN train set issues

In [Xue et al.2016, Cohen et al.2017] some ConvNets are trained from scratch, leading to better results than fine-tuned ones. In our case this was not possible for either model (static or dynamic cell counting): training from scratch always gave worse results than fine-tuning. This problem has two causes: dataset size and imbalance. We proceeded to balance the folds by capping the number of samples per class and leaving the classes with few samples unchanged. Table 4 shows the results of static cell counting with “balanced” folds.

Model        F1             F2             F3             F4             F5
VGG16        21.77 ± 14.3   54.84 ± 11.6   66.44 ± 5.9    32.03 ± 14.7   54.65 ± 10.9
VGG19        17.3 ± 8.6     15.11 ± 12.8   9.64 ± 10.1    27.74 ± 20.9   16.04 ± 10.3
ResNet50     6.45 ± 5.58    14.58 ± 14.1   6.43 ± 6.2     27.42 ± 9.5    22.55 ± 9.6
InceptionV3  25.43 ± 13.2   14.07 ± 8.8    46.88 ± 22.6   38.79 ± 7.0    26.95 ± 18.2
Table 4: Results of static cell counting with balanced folds

Table 4 shows a large improvement with respect to unbalanced static cell counting. However, the same approach could not be applied to the LRCN: with the reduced training set the model memorizes the dataset very quickly, which prevents any generalization. The authors believe that, with a larger dataset, a balanced version of the database could lead to even better performance using the LRCN model.
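The balancing step described in this section can be sketched as capping the number of samples per class. The function name, the cap of 20 samples, and the toy class sizes below are assumptions for illustration only; the cap used in the experiments is not reproduced here.

```python
import random
from collections import defaultdict


def balance_by_capping(samples, max_per_class, seed=0):
    """Keep at most max_per_class samples of each cell count.

    samples is a list of (crop_id, cell_count) pairs; classes with fewer
    samples than the cap are left untouched.
    """
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    rng = random.Random(seed)
    balanced = []
    for cell_count in sorted(by_class):
        group = by_class[cell_count]
        if len(group) > max_per_class:
            group = rng.sample(group, max_per_class)  # drop the extra samples
        balanced.extend(group)
    return balanced


# Toy training set (assumption): many empty crops, few crowded ones.
samples = (
    [(i, 0) for i in range(100)]
    + [(i, 1) for i in range(100, 140)]
    + [(i, 5) for i in range(140, 143)]
)
balanced = balance_by_capping(samples, max_per_class=20)
```

Capping shrinks the training set (here from 143 to 43 samples), which is consistent with the memorization problem reported for the LRCN on a small dataset.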

6 Conclusions

In this paper, a computer vision approach to cell counting using multiple frames (hence temporal information) is proposed. It extends previous works with a mixed architecture of convolutional neural networks (ConvNets) and recurrent neural networks. Using a publicly available dataset of stem cells and four mainstream deep ConvNets, single-frame cell counting (static cell counting) was compared with the proposed multi-frame cell counting (dynamic cell counting). The results show that dynamic cell counting surpasses the static approach and copes better with the unbalanced nature of microscopy image data. A detailed analysis of challenges and common issues in cell counting for time-lapse microscopy was also presented.

In the future, several architectures must be tested. A specific ConvNet architecture such as the one proposed in [Cohen et al.2017] could lead to better results. Also, 3D ConvNets (with input frames stacked as layers) and attention-based models (in recurrent neural networks) could enhance performance. However, the most important issue is the dataset size.

Acknowledgments

We would like to thank RETECA Foundation for the support and the opportunity to develop this work.

References

  • [Abadi et al.2016] Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
  • [Chollet and others2015] François Chollet et al. Keras. https://github.com/fchollet/keras, 2015.
  • [Cohen et al.2017] Joseph Paul Cohen, Henry Z Lo, and Yoshua Bengio. Count-ception: Counting by fully convolutional redundant counting. arXiv preprint arXiv:1703.08710, 2017.
  • [Donahue et al.2015] Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2625–2634, 2015.
  • [Graves and Schmidhuber2005] Alex Graves and Jürgen Schmidhuber. Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Networks, 18(5):602–610, 2005.
  • [He et al.2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [Hochreiter and Schmidhuber1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • [Janmey and McCulloch2007] Paul A Janmey and Christopher A McCulloch. Cell mechanics: integrating cell responses to mechanical stimuli. Annu. Rev. Biomed. Eng., 9:1–34, 2007.
  • [Kanade et al.2011] Takeo Kanade, Zhaozheng Yin, Ryoma Bise, Seungil Huh, Sungeun Eom, Michael F Sandbothe, and Mei Chen. Cell image analysis: Algorithms, system and applications. In Applications of Computer Vision (WACV), 2011 IEEE Workshop on, pages 374–381. IEEE, 2011.
  • [Lehmussola et al.2007] Antti Lehmussola, Pekka Ruusuvuori, Jyrki Selinummi, Heikki Huttunen, and Olli Yli-Harja. Computational framework for simulating fluorescence microscope images with cell populations. IEEE Transactions on Medical Imaging, 26(7):1010–1016, 2007.
  • [Sherley et al.1995] JL Sherley, PB Stadler, and J Scott Stadler. A quantitative method for the analysis of mammalian cell proliferation in culture in terms of dividing and non-dividing cells. Cell proliferation, 28(3):137–144, 1995.
  • [Simonyan and Zisserman2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [Szegedy et al.2016] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
  • [Van Valen et al.2016] David A Van Valen, Takamasa Kudo, Keara M Lane, Derek N Macklin, Nicolas T Quach, Mialy M DeFelice, Inbal Maayan, Yu Tanouchi, Euan A Ashley, and Markus W Covert. Deep learning automates the quantitative analysis of individual cells in live-cell imaging experiments. PLoS computational biology, 12(11):e1005177, 2016.
  • [Xie et al.2016] Weidi Xie, J Alison Noble, and Andrew Zisserman. Microscopy cell counting and detection with fully convolutional regression networks. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, pages 1–10, 2016.
  • [Xue et al.2016] Yao Xue, Nilanjan Ray, Judith Hugh, and Gilbert Bigras. Cell counting by regression using convolutional neural network. In European Conference on Computer Vision, pages 274–290. Springer, 2016.