Deep Cytometry

04/09/2019 ∙ by Yueqin Li, et al. ∙ 0

Deep learning has achieved spectacular performance in image and speech recognition and synthesis. It outperforms other machine learning algorithms in problems where large amounts of data are available. In the area of measurement technology, instruments based on the Photonic Time Stretch have established record real-time measurement throughput in spectroscopy, optical coherence tomography, and imaging flow cytometry. These extreme-throughput instruments generate approximately 1 Tbit/s of continuous measurement data and have led to the discovery of rare phenomena in nonlinear and complex systems as well as new types of biomedical instruments. Owing to the abundance of data they generate, time stretch instruments are a natural fit to deep learning classification. Previously we had shown that high-throughput label-free cell classification with high accuracy can be achieved through a combination of time stretch microscopy, image processing and feature extraction, followed by deep learning for finding cancer cells in the blood. Such a technology holds promise for early detection of primary cancer or metastasis. Here we describe a new implementation of deep learning which entirely avoids the computationally costly image processing and feature extraction pipeline. The improvement in computational efficiency makes this new technology suitable for cell sorting via deep learning. Our neural network takes less than a millisecond to classify the cells, fast enough to provide a decision to a cell sorter. We demonstrate the applicability of our new method in the classification of OT-II white blood cells and SW-480 epithelial cancer cells with more than 95% accuracy in a label-free fashion.




Deep learning provides a powerful set of tools for extracting knowledge hidden in large-scale data. In image classification and speech recognition, deep learning algorithms have already made major inroads scientifically and commercially, creating new opportunities in medicine and bioinformatics[1]. In medicine, deep learning has been used to identify pneumonia from chest X-ray images[2], heart arrhythmias from electrocardiogram data[3], and malignant skin lesions at accuracy levels on par with trained dermatologists[4]. The predictive potential of deep neural networks is also revolutionizing related fields such as genetics and biochemistry, where the sequence specificities of DNA- and RNA-binding proteins have been determined algorithmically from extremely large and complex datasets[5]. Recently, a deep-learning-assisted image-activated sorting technology was demonstrated. It used a frequency-division-multiplexed microscope to acquire fluorescence images of labeled samples and successfully sorted microalgal cells and blood cells[6]. Deep learning models have also been used to analyze natural water samples in order to monitor the ocean microbiome[7].

Flow cytometry is a biomedical diagnostic technique that classifies cells in a streaming suspension by their interaction with laser light. Each cell is characterized by its size and granularity, measured through forward- and side-scattered signals (elastic scattering), and by the emission wavelengths of fluorescent biomarkers used as marker-specific cellular labels (inelastic scattering)[8, 9]. One application of this technology is fluorescence-activated cell sorting (FACS), which physically separates cells of interest from undesired cells in a heterogeneous mixture. Multiple fluorescent labels are used to apply increasingly stringent light scattering and fluorescent emission criteria to identify and collect target cell populations.

Despite the growing utility of flow cytometry in biomedical research and therapeutics manufacturing, the platform is limited by its reliance on labeling reagents. These reagents may alter the behavior of bound cells through inadvertent activation or inhibition prior to collection, or they may target markers that are unreliable for cell identification. CD326/EpCAM[10] is one example of the latter. This protein was initially accepted as a generic biomarker for cancer cells of epithelial origin (or their derivatives, such as circulating tumor cells, CTCs) but was later found to be heterogeneously expressed or even absent on the most malignant CTCs[11], demonstrating some limitations of this approach. While these findings provide a rationale for developing label-free cellular analysis and sorting platforms, relying solely on forward- and side-scattered signals, without fluorescence labeling information, has been challenging as a cellular classification modality because of poor sensitivity and selectivity.

As a solution, label-free cell sorting based on additional physical characteristics has gained popularity[12, 13]. This approach is compatible with flow cytometry, but entails rapid data analysis and multiplexed feature extraction to improve classification accuracy. To achieve feature expressivity, parallel time stretch quantitative phase imaging (TS-QPI) methods are employed[14, 15, 16, 17] to assess additional parameters such as cell protein concentration (correlated with refractive index) and categorize unlabeled cells with increased accuracy.

We have recently introduced a novel imaging flow cytometer that analyzes cells using their biophysical features[18]. Label-free imaging is implemented by photonic time stretch[19, 20], and the trade-off between sensitivity and speed is mitigated by using the amplified time-stretch dispersive Fourier transform[19, 21, 22, 23, 24]. In time-stretch imaging[25], the target cell is illuminated by spatially dispersed broadband pulses, and the spatial features of the target are encoded into the pulse spectrum within a sub-nanosecond pulse duration. Quantitative phase and intensity images are captured simultaneously, providing abundant features including protein concentration, optical loss, and cellular morphology[26, 27, 28, 29]. This procedure was successfully used to classify OT-II hybridoma T-lymphocytes and SW-480 colon cancer epithelial cells in mixed cultures, as well as distinct sub-populations of algal cells, with immediate ramifications for biofuel production[18]. However, the image processing pipeline that extracts morphological and biophysical features from label-free images has proven costly in time, taking several seconds to extract the features of each cell. This relatively long processing time prevented the further development of a time-stretch imaging flow cytometer for sorting, because classification decisions must be made in well under a second, before the target cells exit the microfluidic channel. Even when deep learning is used for cell classification after the biophysical features have been determined, the conversion of waveforms to phase and intensity images and the subsequent feature extraction are still required to generate the input datasets for the neural network.

To remove the time-consuming steps of image formation and hand-crafted feature extraction, we developed a deep convolutional neural network that directly processes the one-dimensional time-series waveforms from the imaging flow cytometer and extracts the features automatically within the model itself. By eliminating the image processing pipeline in front of the classifier, the running time of cell analysis is reduced significantly, and cell sorting decisions can be made in less than a millisecond, orders of magnitude faster than previous efforts[18]. Furthermore, we find that some features may not be represented in the phase and intensity images extracted from the waveforms, but can be observed by the neural network when the data is provided as raw time-series waveforms. These hidden features, not available in manually designed image representations, enable the model to classify cells more accurately. The balanced accuracy and F1 score of our model reach 95.74% and 95.71%, respectively, for an accelerated classifier of SW-480 and OT-II cells, achieving a new state of the art in accuracy while enabling cell sorting by time-stretch imaging flow cytometry for the first time.
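To make the idea of classifying a raw one-dimensional waveform concrete, the following is a minimal sketch of a forward pass through a single convolutional layer, a ReLU, global max pooling and a dense softmax layer. It is only an illustration under stated assumptions: the layer sizes, the kernel and weight values, the toy waveform, and the names `conv1d`, `classify_waveform` and `CLASSES` are all hypothetical and do not reproduce the network described in this paper.

```python
import math
from typing import List, Sequence

# Class order is an assumption made for this illustration only.
CLASSES = ["SW-480", "OT-II", "blank"]


def conv1d(x: Sequence[float], kernel: Sequence[float],
           bias: float = 0.0, stride: int = 1) -> List[float]:
    """Valid (no padding) 1-D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(0, len(x) - k + 1, stride)
    ]


def relu(x: Sequence[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify_waveform(waveform: Sequence[float],
                      kernels: Sequence[Sequence[float]],
                      biases: Sequence[float],
                      dense_w: Sequence[Sequence[float]],
                      dense_b: Sequence[float]) -> List[float]:
    """Conv -> ReLU -> global max pool -> dense -> softmax.

    The waveform must be at least as long as the longest kernel.
    """
    # One pooled feature per convolutional filter.
    features = [max(relu(conv1d(waveform, k, b)))
                for k, b in zip(kernels, biases)]
    # One logit per class.
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)


# Hypothetical toy waveform and weights, not measured or trained values.
waveform = [0.0, 0.1, 0.9, 1.0, 0.2, 0.0, 0.0, 0.1]
kernels = [[1.0, -1.0], [0.5, 0.5, 0.5]]
biases = [0.0, -0.1]
dense_w = [[1.0, 0.5], [-0.5, 1.0], [-1.0, -1.0]]
dense_b = [0.0, 0.0, 0.5]

probs = classify_waveform(waveform, kernels, biases, dense_w, dense_b)
predicted = CLASSES[probs.index(max(probs))]
```

The point of the sketch is that the filters operate on the digitized waveform samples directly, so no phase or intensity image has to be reconstructed before classification.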


To study the learning behavior of the model, the neural network is evaluated on the training and validation datasets at every epoch, both for each class and for their averaged forms (see the convergence figure below). There are multiple ways to measure the performance of this model; tracking the F1 score is one of them. The F1 score is the harmonic mean of precision and recall, where precision is the positive predictive value, measuring the correctness of the classifier, and recall measures its completeness. The F1 score is therefore considered a very effective measure of classification performance. Since the examples in the dataset are categorized into three classes (SW-480, OT-II and blanks), the task for the neural network is multi-class classification, which is evaluated by calculating the F1 score per class and also in averaged forms. Three forms of F1 score averaging are taken into account: (1) the micro-averaged F1 score, which aggregates true positives, false positives and false negatives over all classes before computing precision and recall; (2) the macro-averaged F1 score, which evaluates precision and recall of each class individually and then assigns equal weight to each class; and (3) the weighted-averaged F1 score, which weights each class by its number of examples to account for an imbalanced dataset. Orange curves show the training F1 score, while green curves show the validation F1 score. Comparing the classification performance for each class, the neural network recognizes SW-480 colorectal cells and OT-II hybridoma cells successfully as early as the end of the first training epoch. Interestingly, classification of the blank (acellular) examples requires approximately 10 epochs to achieve similar performance. The overall performance is determined by the averaged F1 scores of these three classes. The F1 scores of the training and validation datasets continue to improve until a maximum is reached at approximately epoch 50. Meanwhile, the close agreement between training and validation performance indicates that the neural network generalizes well. Ultimately, the weighted-averaged validation F1 score reached 97.01%. To evaluate the reproducibility of these results, the training procedure was repeated five times starting from randomly initialized weights and biases, and the runs showed close agreement (the standard deviation of the results was less than 0.82% at the last epoch).

Figure: Convergence of the network training. The F1 score, as a measure of classification performance, is shown for individual classes (a-c) and their averaged (combined) forms (d-f) over training epochs. At each epoch, the network is trained with all examples in the training dataset, and its performance over these training examples is averaged to obtain the training F1 score of the epoch (orange curves). At the end of each training epoch, the network is used to classify all examples in the validation dataset, giving the validation F1 score of that epoch (green curves). The neural network succeeded in recognizing (a) SW-480 cells and (b) OT-II cells even at the end of the first training epoch, but required additional epochs to detect (c) regions of the waveform containing no cells (blank examples). The shaded area shows the range of performance variations in each epoch over five different training runs. The validation performance approximates the training performance, indicating that the model is well regularized.
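As an illustration of the three averaging schemes described above, the sketch below computes per-class, micro-, macro- and weighted-averaged F1 scores from a confusion matrix, using the identity F1 = 2TP / (2TP + FP + FN). The function name `f1_report` and the example counts are hypothetical; the counts are not data from this study.

```python
from typing import Dict, List


def f1_report(conf: List[List[int]]) -> Dict[str, object]:
    """Per-class and averaged F1 scores from a square confusion matrix.

    conf[i][j] is the number of examples of true class i predicted as class j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    if n == 0 or total == 0:
        raise ValueError("confusion matrix must contain at least one example")

    per_class: List[float] = []
    supports: List[int] = []
    tp_sum = fp_sum = fn_sum = 0
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                       # true k, predicted otherwise
        fp = sum(conf[i][k] for i in range(n)) - tp  # predicted k, true otherwise
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        supports.append(sum(conf[k]))
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn

    return {
        "per_class": per_class,
        # Micro: pool the counts of all classes, then compute one F1.
        "micro": 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum),
        # Macro: unweighted mean of the per-class F1 scores.
        "macro": sum(per_class) / n,
        # Weighted: per-class F1 weighted by the number of true examples.
        "weighted": sum(f * s for f, s in zip(per_class, supports)) / total,
    }


# Hypothetical counts; rows and columns ordered as SW-480, OT-II, blank.
example_confusion = [
    [90, 5, 5],
    [4, 95, 1],
    [10, 0, 40],
]
report = f1_report(example_confusion)
```

For single-label multi-class problems such as this one, the micro-averaged F1 score equals the overall accuracy, which is why the macro and weighted averages are the more informative of the three when the classes are imbalanced.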


For label-free real-time flow cytometry to be feasible, imaging and data analysis must be completed while the cell travels from the sample inlet of the microfluidic channel to the cell sorting mechanism (see the cell sorting figure below). During imaging, the time-stretch imaging system rapidly captures the spatial information of cells at high throughput. A train of rainbow flashes illuminates the target cell as line scans. The features of the cell are encoded into the spectrum of these optical pulses, each representing a one-dimensional frame. The pulses are stretched in a dispersive optical fiber, which maps their spectrum to time. They are then captured sequentially by a photodetector and converted to a digital waveform, which can be analyzed by the neural network. Imaging and data capture take less than 0.1 ms, so the delay in making a cell sorting decision is dominated by the data processing time of the neural network.

Figure: Potential application of deep learning in cell sorting. A microfluidic channel with a hydrodynamic focusing mechanism uses sheath fluid to align the cells in the center of the field of view. The rainbow pulses formed by the time-stretch imaging system capture line images of the cells in the channel, producing blur-free images of cells flowing at high speed. The output waveforms of the time-stretch imaging system are passed to a deep neural network. The network classifies cells rapidly and with high accuracy, fast enough to make a decision before the cell reaches the sorting mechanism. Different types of cells are categorized and given charges of different polarity so that they can be separated into different collection tubes.

To quickly classify the target cells based on the collected data, we demonstrate the utility of analyzing waveforms directly with a convolutional neural network. The classification model is trained offline using datasets for the target cell types and then used in an online system for cell sorting. The processing time of this model (forward propagation through a previously trained model) is 50 ms per example on a CPU (8 Intel Xeon cores), 2.2 ms per example on a single NVIDIA Tesla K80 GPU, and 0.7 ms per example on a single NVIDIA Tesla P100 GPU. In our setup, where the cells flow through the microfluidic channel at 1.3 m/s, the cells travel 65 mm (Intel CPU), 2.86 mm (NVIDIA Tesla K80), or 0.91 mm (NVIDIA Tesla P100) before the classification decision is made. It is practical to fabricate microfluidic channels within these length limits, and such channels can keep the cells ordered over these short distances. Therefore, the cell type can be determined by our model in real time before the cell reaches the cell sorter. In many systems the flow speed is lower than in our setup, which relaxes the length limitation further.
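The travel distances quoted above follow from multiplying the processing latency by the flow speed. The short sketch below reproduces that arithmetic using the latencies and the 1.3 m/s flow speed stated in the text; the helper names `travel_distance_mm` and `max_latency_ms` are introduced here only for illustration.

```python
FLOW_SPEED_M_PER_S = 1.3  # cell flow speed stated in the text

# Per-example inference latencies stated in the text, in milliseconds.
LATENCY_MS = {
    "CPU (8 Intel Xeon cores)": 50.0,
    "NVIDIA Tesla K80": 2.2,
    "NVIDIA Tesla P100": 0.7,
}


def travel_distance_mm(latency_ms: float, flow_speed_m_per_s: float) -> float:
    """Distance a cell travels while the classifier is still running.

    1 m/s equals 1 mm/ms, so the product is already in millimeters.
    """
    return latency_ms * flow_speed_m_per_s


def max_latency_ms(channel_length_mm: float, flow_speed_m_per_s: float) -> float:
    """Longest latency that still fits within a given channel length."""
    return channel_length_mm / flow_speed_m_per_s


distances_mm = {
    device: travel_distance_mm(latency, FLOW_SPEED_M_PER_S)
    for device, latency in LATENCY_MS.items()
}

for device, distance in distances_mm.items():
    print(f"{device}: {distance:.2f} mm")
```

The inverse helper shows the design trade-off: for a fixed distance between the imaging region and the sorter, a lower flow speed allows a proportionally longer processing time.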


In this manuscript, a deep convolutional neural network for direct processing of flow cytometry waveforms was presented. The results demonstrate record performance in label-free classification of cancerous cells with a test F1 score of 95.71% and accuracy of 95.70% for all classes evaluated. The system achieves this accurate classification in less than a millisecond, enabling real-time label-free cell sorting.



  • [1] Min, S., Lee, B. & Yoon, S. Deep learning in bioinformatics. Briefings in bioinformatics 18, 851–869 (2017).
  • [2] Rajpurkar, P. et al. CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225 (2017).
  • [3] Rajpurkar, P., Hannun, A. Y., Haghpanahi, M., Bourn, C. & Ng, A. Y. Cardiologist-level arrhythmia detection with convolutional neural networks. arXiv preprint arXiv:1707.01836 (2017).
  • [4] Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115 (2017).
  • [5] Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature biotechnology 33, 831 (2015).
  • [6] Nitta, N. et al. Intelligent image-activated cell sorting. Cell 175, 266–276 (2018).
  • [7] Göröcs, Z. et al. A deep learning-enabled portable imaging flow cytometer for cost-effective, high-throughput, and label-free analysis of natural water samples. Light: Science & Applications 7, 66 (2018).
  • [8] Shapiro, H. M. Practical flow cytometry (John Wiley & Sons, 2005).
  • [9] Watson, J. V. Introduction to flow cytometry (Cambridge University Press, 2004).
  • [10] Gires, O., Klein, C. A. & Baeuerle, P. A. On the abundance of EpCAM on cancer stem cells. Nature Reviews Cancer 9, 143 (2009).
  • [11] Kling, J. Beyond counting tumor cells. Nature Biotechnology 30, 578–580 (2012).
  • [12] Shields IV, C. W., Reyes, C. D. & López, G. P. Microfluidic cell sorting: a review of the advances in the separation of cells from debulking to rare cell isolation. Lab on a Chip 15, 1230–1249 (2015).
  • [13] Gossett, D. R. et al. Label-free cell separation and sorting in microfluidic systems. Analytical and bioanalytical chemistry 397, 3249–3267 (2010).
  • [14] Ikeda, T., Popescu, G., Dasari, R. R. & Feld, M. S. Hilbert phase microscopy for investigating fast dynamics in transparent systems. Optics letters 30, 1165–1167 (2005).
  • [15] Popescu, G. Quantitative phase imaging of cells and tissues (McGraw Hill Professional, 2011).
  • [16] Pham, H. V., Bhaduri, B., Tangella, K., Best-Popescu, C. & Popescu, G. Real time blood testing using quantitative phase imaging. PloS one 8, e55676 (2013).
  • [17] Wei, X., Lau, A. K., Xu, Y., Tsia, K. K. & Wong, K. K. 28 MHz swept source at 1.0 μm for ultrafast quantitative phase imaging. Biomedical optics express 6, 3855–3864 (2015).
  • [18] Chen, C. L. et al. Deep learning in label-free cell classification. Scientific reports 6, 21471 (2016).
  • [19] Mahjoubfar, A. et al. Time stretch and its applications. Nature Photonics 11, 341 (2017).
  • [20] Mahjoubfar, A., Chen, C., Niazi, K. R., Rabizadeh, S. & Jalali, B. Label-free high-throughput cell screening in flow. Biomedical optics express 4, 1618–1625 (2013).
  • [21] Goda, K. & Jalali, B. Dispersive Fourier transformation for fast continuous single-shot measurements. Nature Photonics 7, 102 (2013).
  • [22] Solli, D., Gupta, S. & Jalali, B. Optical phase recovery in the dispersive Fourier transform. Applied Physics Letters 95, 231108 (2009).
  • [23] Goda, K., Solli, D. R., Tsia, K. K. & Jalali, B. Theory of amplified dispersive Fourier transformation. Physical Review A 80, 043821 (2009).
  • [24] Xing, F., Chen, H., Xie, S. & Yao, J. Ultrafast three-dimensional surface imaging based on short-time Fourier transform. IEEE Photonics Technology Letters 27, 2264–2267 (2015).
  • [25] Goda, K., Tsia, K. & Jalali, B. Serial time-encoded amplified imaging for real-time observation of fast dynamic phenomena. Nature 458, 1145 (2009).
  • [26] Feinerman, O., Veiga, J., Dorfman, J. R., Germain, R. N. & Altan-Bonnet, G. Variability and robustness in T cell activation from regulated heterogeneity in protein levels. Science 321, 1081–1084 (2008).
  • [27] Sigal, A. et al. Variability and memory of protein levels in human cells. Nature 444, 643 (2006).
  • [28] Roggan, A., Friebel, M., Dörschel, K., Hahn, A. & Mueller, G. J. Optical properties of circulating human blood in the wavelength range 400-2500 nm. Journal of biomedical optics 4, 36–47 (1999).
  • [29] Vona, G. et al. Isolation by size of epithelial tumor cells: a new method for the immunomorphological and molecular characterization of circulating tumor cells. The American journal of pathology 156, 57–63 (2000).