
Data-driven Feature Sampling for Deep Hyperspectral Classification and Segmentation

The high dimensionality of hyperspectral imaging poses unique challenges in scope, size, and processing requirements. Motivated by the potential for an in-the-field cell sorting detector, we examine a Synechocystis sp. PCC 6803 dataset wherein cells are grown in either nitrogen-rich or nitrogen-deplete cultures. We use deep learning techniques both to classify cells successfully and to generate a mask segmenting the cells/condition from the background. Further, we use the classification accuracy to guide a data-driven, iterative feature selection method, allowing the design of neural networks requiring 90% fewer input features with little accuracy degradation.



I Introduction

Hyperspectral confocal fluorescence microscopy and hyperspectral imaging are powerful tools for the biological sciences, allowing high-content views of multiple pigments and proteins in individual cells within larger populations. As the technology has advanced in speed and ease of use, it has become practical to consider applications such as high-throughput screening or understanding heterogeneous cell responses to changing environmental conditions, where one might want to identify cells with certain characteristics (including phenotype, pigment content, and protein expression), as determined by their spatially resolved fluorescence emission, for subsequent analysis. Although a few researchers have used classification techniques such as support vector machines [1] to identify cells that exhibit similar spectral emission characteristics, the majority of hyperspectral image analysis has been exploratory, developing spectral models to identify the underlying spectral components [2, 3, 4].

In this work, we employ deep artificial neural network algorithms to classify individual cyanobacterial cells based on their hyperspectral fluorescence emission signatures. Such deep learning methods have seen increasingly extensive use in conventional image processing tasks with relatively few channels (such as processing RGB images) [5]; however, their utility in tasks with larger numbers of sensors, such as hyperspectral systems, remains an area of active research. In biological systems in particular, non-trivial processes may yield complex interactions, detectable through hyperspectral imaging, that add to the long-acknowledged challenges of automatically processing spatial structure.

In addition to classifying the experimental effects on individual cells, we show how this method can help identify which spectral wavelengths are most useful for the classification. Importantly, the feature selection information could allow customized sensors to be designed for specific applications. This work demonstrates that this technique is suitable for real-time image analysis and high-throughput screening of heterogeneous populations of cyanobacterial cells for differentiating environmental response. The method can be further extended to other cell populations or complex tissue containing multiple cell types.

II Methods

II-A Dataset

Cyanobacterial culture, hyperspectral confocal fluorescence microscopy, spectral image analysis, and single-cell analysis have been described fully in a previous publication [4]. In brief, Synechocystis sp. PCC 6803 cells were grown photoautotrophically in BG11 medium with M (nitrogen containing cultures) or where 1.76 M NaCl was substituted for the (nitrogen deplete cultures). Cultures were maintained under cool white light ( photon m-2 s-1, constant illumination) at with shaking (). Samples were obtained at , , hours for imaging studies. Fig. 1 in Murton et al. [4] shows the experimental design. A small amount of concentrated cyanobacterial cell solution () was placed on an agar-coated slide. After a brief settling time (), the slide was coverslipped and sealed with nail polish. Imaging was performed immediately. Hyperspectral confocal fluorescence images were acquired using a custom hyperspectral microscope [6] with of laser excitation and a oil objective (NA ). Spectra from each pixel were acquired with an electron multiplying CCD (EMCCD) with dwell times/pixel, and the image was formed by raster scanning with a step size of . Hyperspectral images were preprocessed as described in Jones et al. [7] to correct for detector spikes (cosmic rays), subtract the detector dark current features, generate cell masks that indicate background pixels, and perform wavelength calibration. To discover the underlying pigments relevant to the biological response to nitrogen limitation, multivariate curve resolution (MCR) analysis [8] was performed using custom software written in Matlab. Alternatively, and as the subject of this paper, the preprocessed hyperspectral images were subjected to classification.

Fig. 1: (a) Each original image and mask is split by pixel, producing a dataset of 512-dimensional image vectors, each with an associated mask scalar. (b) Uniform random undersampling is used to balance the dataset before it undergoes an 80/10/10 training/validation/test split. (c) Median values of example trace (5) from the 48 hour collection dataset.

II-B Data Formatting

We performed image classification on the hyperspectral images both on individual pixels, which thus did not incorporate spatial information, and on whole images, which combined cell masking with classification. For training, we only used images from the timepoint.

For pixel classification, to form the training, validation, and testing datasets, we first divide each of the traces and its corresponding mask (generated using an automated cell segmentation routine based on a modified marker watershed transform with input from a skilled user) into individual pixels; see Fig. 1(a). This yields one 512-dimensional vector per pixel of each trace, each with a corresponding ground truth mask label representing Background (BG), Cell Grown in Nitrogen Containing Culture () or Cell Grown in Nitrogen Deplete Culture (). Since the vast majority of the pixels are background pixels, we perform uniform random undersampling to obtain a dataset that contains roughly the same number of samples of each class. We randomly select 80% of the samples for training, 10% for validation, and 10% for testing.
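The pixel-level dataset construction above (flatten each trace into per-pixel spectra, balance the classes by uniform random undersampling, then split 80/10/10) can be sketched in NumPy as follows. This is a minimal sketch rather than the authors' code: it assumes each trace is an (H, W, B) array with B = 512 spectral channels and that the mask uses a hypothetical integer coding of 0 = BG and 1, 2 = the two nitrogen conditions.

```python
import numpy as np


def image_to_pixels(cube, mask):
    """Flatten an (H, W, B) hyperspectral cube and its (H, W) label mask
    into B-dimensional pixel vectors with one label each."""
    h, w, b = cube.shape
    return cube.reshape(h * w, b), mask.reshape(h * w)


def undersample(X, y, rng):
    """Uniform random undersampling: keep as many samples of every class
    as the rarest class has, drawn without replacement."""
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()
    idx = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n, replace=False) for c in classes]
    )
    rng.shuffle(idx)
    return X[idx], y[idx]


def split_80_10_10(X, y, rng):
    """Random 80/10/10 training/validation/test split."""
    idx = rng.permutation(len(y))
    n_train, n_val = int(0.8 * len(y)), int(0.1 * len(y))
    parts = np.split(idx, [n_train, n_train + n_val])
    return [(X[p], y[p]) for p in parts]


# Toy example: one 20x30 "trace" with 512 channels, mostly background.
rng = np.random.default_rng(0)
cube = rng.random((20, 30, 512))
mask = np.zeros((20, 30), dtype=int)
mask[0:3, 0:10] = 1    # 30 pixels of hypothetical class 1
mask[10:14, 0:10] = 2  # 40 pixels of hypothetical class 2

X, y = image_to_pixels(cube, mask)
Xb, yb = undersample(X, y, rng)
train, val, test = split_80_10_10(Xb, yb, rng)
print(Xb.shape, np.bincount(yb), [len(p[1]) for p in (train, val, test)])
```

Undersampling to the rarest class discards most background pixels, which is the intended effect here, since background dominates every image.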

For joint cell masking and classification, we generated image chips using standard data augmentation techniques (crops, reflections, rotations) applied to both the original images and the ground truth masks. The images were undersampled during generation so that the vast majority of chips (roughly 90%) contained non-trivial masks. One-twentieth of the dataset was set aside for validation. Similar chips were generated from the dataset for testing.
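Chip generation for joint masking must apply each random transformation identically to the image and its ground-truth mask so that labels stay aligned with pixels. A minimal NumPy sketch, assuming square chips of a hypothetical side length `chip` (the original chip size is not given here) and rotations restricted to multiples of 90 degrees. As a simplification of the roughly 90% reported above, this sketch keeps only chips whose mask contains at least one non-background pixel.

```python
import numpy as np


def random_chip(cube, mask, chip, rng):
    """Crop, reflect, and rotate an (H, W, B) cube and its (H, W) mask
    with the same random parameters."""
    h, w = mask.shape
    r = rng.integers(0, h - chip + 1)
    c = rng.integers(0, w - chip + 1)
    img = cube[r:r + chip, c:c + chip]
    lab = mask[r:r + chip, c:c + chip]
    if rng.random() < 0.5:  # horizontal reflection
        img, lab = img[:, ::-1], lab[:, ::-1]
    k = int(rng.integers(0, 4))  # rotation by k * 90 degrees
    img, lab = np.rot90(img, k, axes=(0, 1)), np.rot90(lab, k)
    return img, lab


def make_chips(cube, mask, chip, n, rng, max_tries=10_000):
    """Draw chips until n of them contain at least one cell pixel."""
    chips = []
    for _ in range(max_tries):
        if len(chips) == n:
            break
        img, lab = random_chip(cube, mask, chip, rng)
        if lab.any():  # non-trivial mask: some pixel is not background (0)
            chips.append((img, lab))
    return chips


# Toy check: channel 0 of the cube duplicates the mask, so alignment is visible.
rng = np.random.default_rng(1)
mask = np.zeros((64, 64), dtype=int)
mask[20:40, 20:40] = 1
cube = rng.random((64, 64, 8))
cube[..., 0] = mask
chips = make_chips(cube, mask, chip=16, n=5, rng=rng)
```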

II-C Densely Connected Network for Pixel Classification

Pixels from experimental images were classified into one of three categories: Background (BG), Cell Grown in Nitrogen Containing Culture () or Cell Grown in Nitrogen Deplete Culture (). To perform pixel classification, we use the densely connected feed-forward neural network pictured in Fig. 2(a). A dropout layer helps prevent overfitting, and the network is trained using an Adam optimizer [9]. Rectified linear activation is used for the dense layers. Hyperparameter optimization was accomplished using hyperas [10], a wrapper for hyperopt [11] on Keras [12]. For a comparative baseline, we also performed the classification using -tree, -simultaneous-feature random forests, implemented in Scikit-learn [13].
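To make the pixel classifier concrete, the following NumPy sketch shows the inference path of a densely connected network of the kind described: ReLU dense layers, one dropout step active only during training, and a three-way softmax over BG and the two nitrogen conditions. The layer widths (512 → 64 → 32 → 3), the He initialisation, and the dropout rate are hypothetical, since the actual architecture of Fig. 2(a) is not reproduced here; the paper's network was built and trained with Keras and Adam, which this sketch does not attempt.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift by row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_dense(sizes, rng):
    """He-initialised (W, b) pairs for consecutive layer widths."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X, rng=None, drop_rate=0.5):
    """Class probabilities for pixel spectra X of shape (N, n_inputs).

    A single inverted-dropout step follows the first hidden layer, and only
    in training mode (rng given); inference is deterministic.
    """
    h = X
    for layer, (W, b) in enumerate(params[:-1]):
        h = relu(h @ W + b)
        if layer == 0 and rng is not None:
            keep = rng.random(h.shape) >= drop_rate
            h = h * keep / (1.0 - drop_rate)
    W, b = params[-1]
    return softmax(h @ W + b)


rng = np.random.default_rng(0)
params = init_dense([512, 64, 32, 3], rng)  # hypothetical layer widths
X = rng.random((10, 512))                   # 10 toy pixel spectra
probs = forward(params, X)
pred = probs.argmax(axis=1)  # hypothetical coding: 0 = BG, 1/2 = nitrogen conditions
```

Because each pixel is classified from its spectrum alone, the network never sees neighbouring pixels; spatial context only enters in the convolutional masking network.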











Fig. 2: (a) The densely connected feed-forward neural network used in the classification task. (b) Convolutional neural network used for the masking task. All convolutional filters are of the same size, and pooling is done in blocks.

II-D Iterative, Data-Driven Sparse Feature Sampling

We use the densely connected neural network method described above to guide an iterative, data-driven sparse feature sampling algorithm. This approach has some similarities to that in [14], which combines a genetic algorithm with a support vector machine. However, we avoid the computational overhead of an evolutionary algorithm by employing a greedy algorithm that analyzes the synaptic weights of the neural network.

This method requires four discrete steps and a threshold parameter specifying the decrease in accuracy that triggers re-training.

  1. Train an initial classifier neural network as in II-C on the set of all input features $X_0$. The dense feed-forward neural network is similar to that in II-C; however, we adjust the network parameters to fit a shrinking input size and use an Adadelta optimizer [15].

  2. For each input dimension $x_i$ in the current feature set $X_k$, compute a score $s_i$ from the weights $w_{ij}$ coming from $x_i$ in the input layer. The value $s_i$ acts as a metric for the ‘worthiness’ of an input feature.

  3. Remove the dimension with the minimum $s_i$ to form $X_{k+1}$ from $X_k$. Determine the validation accuracy on $X_{k+1}$ without retraining.

  4. If the decrease in accuracy exceeds the threshold, train a new network (possibly of smaller size to match $X_{k+1}$) and repeat from Step 1. If not, repeat from Step 3. Alternatively, we can apply various halting conditions, e.g. a maximum number of iterations.
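The four steps can be sketched as a greedy loop. This is a minimal sketch, not the authors' implementation, under several assumptions: the worthiness score is taken to be the sum of absolute first-layer weights leaving each input (the exact metric is not recoverable from this text), removed inputs are masked when re-evaluating without retraining, `train` and `evaluate` are hypothetical callables standing in for the Keras training and validation code, and the halting condition is a maximum number of retrainings.

```python
import numpy as np


def input_scores(W1, active):
    """ASSUMED worthiness metric: summed |weight| leaving each input unit.
    W1 has shape (n_inputs, n_hidden), like a Keras Dense kernel."""
    scores = np.abs(W1).sum(axis=1).astype(float)
    scores[~active] = np.inf  # never pick an input that is already removed
    return scores


def greedy_prune(train, evaluate, n_inputs, threshold, max_retrains):
    """Greedy, weight-guided input pruning following Steps 1-4.

    train(active) -> (model, W1, acc): a network trained on the inputs
        flagged True in `active`; W1 keeps one row per original input.
    evaluate(model, active) -> validation accuracy of an existing model
        with the inputs flagged False masked out (no retraining).
    """
    active = np.ones(n_inputs, dtype=bool)
    model, W1, ref_acc = train(active.copy())          # Step 1
    removed, retrains = [], 0
    while active.sum() > 1:
        i = int(np.argmin(input_scores(W1, active)))   # Step 2
        active[i] = False                              # Step 3
        removed.append(i)
        acc = evaluate(model, active)
        if ref_acc - acc > threshold:                  # Step 4
            if retrains == max_retrains:
                break  # halt; the last removal is kept in this sketch
            model, W1, ref_acc = train(active.copy())
            retrains += 1
    return active, removed, retrains


# Toy stand-ins (hypothetical): each input has a fixed importance, a freshly
# trained model scores 1.0, and masking inputs costs the importance they carried.
importance = np.array([0.5, 0.01, 0.3, 0.02, 0.2, 0.05])


def toy_train(active):
    W1 = np.tile(importance[:, None], (1, 3))
    return active, W1, 1.0  # "model" = inputs seen at training time


def toy_evaluate(model, active):
    return 1.0 - importance[model & ~active].sum()


kept, order, n_retrains = greedy_prune(
    toy_train, toy_evaluate, n_inputs=6, threshold=0.1, max_retrains=1)
```

In the toy run the three least important inputs are dropped without retraining, the fourth removal triggers one retraining, and the loop halts when the budget is exhausted, mirroring the "few retraining cycles until highly reduced" behaviour described in the results.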

II-E Convolutional Neural Network for Cell Masking

Joint cell masking and classification was performed using neural networks that generate image masks highlighting the various cell types ( or ). To accomplish this, we utilized a convolutional neural network similar to [16], wherein the image undergoes downsampling followed by upsampling. The network architecture is shown in Fig. 2(b). All convolutional filters are of the same size, convolutional activation functions are rectified linear, and pooling/upsampling is done in blocks. Mean squared error acted as the loss function. The network was trained using an Adadelta optimizer [15]. As before, hyperparameter optimization was performed with the hyperas package.
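The downsample-then-upsample shape of the masking network, and its mean-squared-error loss, can be illustrated with plain NumPy. This is not a trainable CNN: it only shows non-overlapping max pooling, nearest-neighbour upsampling by the same factor, and the MSE between a reconstruction and its target. Max pooling and the block size `k = 2` are assumptions, as the pooling type and size are not given here.

```python
import numpy as np


def max_pool(x, k):
    """Non-overlapping k x k max pooling over the first two axes of (H, W, C)."""
    h, w, c = x.shape
    assert h % k == 0 and w % k == 0, "spatial size must be divisible by k"
    return x.reshape(h // k, k, w // k, k, c).max(axis=(1, 3))


def upsample(x, k):
    """Nearest-neighbour upsampling by k along the first two axes."""
    return x.repeat(k, axis=0).repeat(k, axis=1)


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


x = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool(x, 2)         # shape (2, 2, 1)
restored = upsample(pooled, 2)  # back to (4, 4, 1)
loss = mse(restored, x)         # detail lost by pooling shows up as error
```

The learned convolutions between these steps are what let a real encoder-decoder recover fine cell outlines that plain pooling and upsampling would blur.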

II-F Computing Hardware

For training all neural networks, we used an Nvidia DGX-1 node. The DGX-1 is equipped with dual -core Intel Xeon E5- CPUs, GB of system RAM, and eight Nvidia Tesla GB P- GPUs with a total of over k CUDA cores.

III Results

III-A Classification Results

Our first task is to classify pixels into one of three categories: Background (BG), Cell Grown in Nitrogen Containing Culture () or Cell Grown in Nitrogen Deplete Culture (). We chose per-pixel classification for several reasons. First, neural networks require a large number of training points, and splitting our dataset into individual pixels inflates the number of training samples. Second, per-pixel classification can be accomplished with a simple densely connected multi-layer perceptron network. Hence, we can determine the type from the spectral information alone, without conflating spatial information. The last reason is biological: in some applications, areas within a cell could be in different environments or states, and subcellular information may be desirable.

Densely connected feed-forward neural networks have a long history in pattern classification. Given the dimensionality of our data and the robustness of our training set, it is perhaps unsurprising that classification accuracy is high. Overall accuracy on the dataset is , with details in Table I.

Class     Precision  Recall  F1-score
BG        0.99       0.98    0.98
          0.99       1.00    0.99
          0.98       0.99    0.99
Average   0.99       0.99    0.99
TABLE I: Precision, recall, and F1-scores for the densely connected feed-forward network

This compares to roughly accuracy using a random forest approach. As shown in Fig. 3(a), the majority of the error is due to misclassifying BG-labeled pixels. This error may arise because the original ground truth masks were expert generated, so some cells with strong signal were excluded because they were either out of focus or cut off by the edge of the image frame; see Fig. 3(b). Error between and pixels could be due to algorithm error or to the effect of the condition not being uniform within or across cells. Indeed, future analysis may use similar methods to determine effect localization rather than classification.
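The per-class precision, recall, and F1-score of Table I, and the overall accuracy, all follow from a confusion matrix such as that of Fig. 3(a). A minimal NumPy sketch, assuming rows index the true class and columns the predicted class. The 3×3 matrix below is invented for illustration and is not the paper's data; it places all errors in the BG row, mimicking the pattern described above, which shows up as reduced BG recall.

```python
import numpy as np


def per_class_metrics(cm):
    """cm[i, j] = number of pixels of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # column sums: everything predicted as class j
    recall = tp / cm.sum(axis=1)     # row sums: everything truly in class i
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy


# Illustrative counts only (BG, condition 1, condition 2); not the paper's data.
cm = [[90, 5, 5],
      [0, 100, 0],
      [0, 0, 100]]
precision, recall, f1, accuracy = per_class_metrics(cm)
```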

One advantage of the feed-forward neural network approach is that the first-layer weights can be interpreted, which enables the pruning method described in II-D.

Fig. 3: (a) Confusion matrix for the densely connected feed-forward neural network. (b) Sample images from hr with dense network pixel classification overlaid. is orange; is yellow; BG is transparent. (c) Sample crops of original hr images, ground truth masks, generated masks and error. BG is coded ; is coded ; is coded .

III-B Pruning Dramatically Removes Unneeded Features

Although one of the benefits of hyperspectral imaging is the ability to sample many different wavelengths, there is both a time and resource cost associated with the spectral extent sampled. While there may be complex interactions across wavelengths, particularly in biological systems, it is expected that there would be considerable redundancy across input sensors. For cases in which the ultimate application of a hyperspectral system is classification as described above, we hypothesized that a reduced set of spectral frequencies could be identified that would be sufficient for application purposes. While this can be done a priori in some cases, our goal was to leverage an analytical method to identify the combination of reduced inputs necessary to achieve these results.

Accordingly, we next asked whether the neural network approach described in the above section could be used to down-select spectral features so as to enable classification with fewer input dimensions. As described in section II-D, we used the trained pixel-level neural network representation to identify candidate dimensions—in this case wavelengths—that could be removed while preserving overall algorithmic accuracy.

As shown in Fig. 4(a), we were able to ignore many input dimensions from the images and still maintain highly effective classification. In effect, this shows that on the order of 90% of the sampled frequencies are not necessary for effective classification. The approach was incremental: while the least important dimensions could be safely ignored in the originally trained network, removing more influential input channels (as identified by synaptic weights) unsurprisingly required the networks to be retrained with the reduced inputs to maintain strong performance. However, the number of incremental training cycles remained relatively low until the network was highly reduced.

Fig. 4(b) shows when specific frequencies are removed through this pruning procedure (darker colors are those removed earlier). Not surprisingly, there is structure associated with which frequencies are removed first and which must be maintained for strong classification performance. Due to parameter sensitivity and variability during the training process, the features which are pruned and the order in which they are pruned are not unique.

Fig. 4: (a) The validation accuracy is plotted against the number of removed dimensions. Yellow vertical lines mark points where the network is re-trained. The algorithm was halted after re-training iterations; we set . (b) Individual frequencies labeled according to their removal order. Frequencies increase left-to-right, top-to-bottom. Darker colors are those removed earlier. Values of represent frequencies remaining after the algorithm halted.

III-C CNN Effectively Generates Masks

Finally, we examined whether our analysis approach could be extended to perform not only the experimental classification task but also the identification of regions of interest, namely cells, in our data. Because the pixel-based method described above eliminated the spatial structure of the data, we asked whether deep convolutional networks would be capable of jointly performing both classification and cell masking. Convolutional networks are state of the art in standard image classification and image segmentation tasks, so we expected them to be effective at the task of spatially identifying cells.

As shown in Fig. 3(c), the convolutional networks were able to generate a mask image simultaneously segmenting and classifying the cells. Over a -image test set generated from the hr dataset, the average per-pixel L1-error was . By allowing the network to produce non-integer values, we obtain smooth and detailed cell outlines. Furthermore, by conjoining the spatial and spectral dimensions, we expect this approach to be more robust to noise and extraneous objects.
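The average per-pixel L1 error used to score the generated masks can be computed as below. Because the network outputs non-integer values, the error is taken on the continuous outputs rather than on rounded labels. The class coding (0 for BG, 0.5 and 1.0 for the two nitrogen conditions) is a hypothetical stand-in for the coding given in the Fig. 3 caption.

```python
import numpy as np


def mean_l1(pred_masks, true_masks):
    """Average per-pixel L1 error between predicted and ground-truth masks."""
    pred = np.asarray(pred_masks, dtype=float)
    true = np.asarray(true_masks, dtype=float)
    return float(np.abs(pred - true).mean())


# Two toy 4x4 masks; hypothetical coding 0 = BG, 0.5 and 1.0 = the two conditions.
true = np.zeros((2, 4, 4))
true[0, 1:3, 1:3] = 1.0
true[1, 0:2, 0:2] = 0.5
pred = true.copy()
pred[0, 1, 1] = 0.75  # soft, non-integer output inside a cell
pred[1, 3, 3] = 0.25  # weak spurious response in the background
err = mean_l1(pred, true)
```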

IV Conclusion

In this study, we demonstrate that modern deep artificial neural network approaches can be used to perform rapid classification of biological data sampled by hyperspectral imaging. Both the pixel-based and whole image-based classification results demonstrate that these approaches are highly effective with the class of data represented by this experimental data and suggest that deep neural network approaches are well suited for hyperspectral imaging analysis even in non-trivial application domains such as biological tissue.

We believe that the sampling reduction technique we describe here is a unique use of a neural network’s classification ability to guide the identification of which particular sensors (in this case, wavelengths) need to be measured. Most dimensionality reduction methods, such as PCA and non-linear variants such as locally linear embedding (LLE), focus primarily on reducing data size. While they can identify channels that are not used at all, they are directed more towards storing and communicating data in fewer dimensions that still draw on information sampled across the original cadre of sensors. Thus these dimensionality reduction methods do not necessarily reduce the demands on the sensor side, even though they often compress and describe data quite effectively.
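The sensor-side argument can be made concrete: each principal component is a weighted combination of every input channel, so projecting onto even a few components still requires measuring all channels, whereas feature selection leaves some channels entirely unused. A minimal NumPy sketch on random data (an illustrative assumption, not the cyanobacteria spectra):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))  # 200 toy pixels x 16 toy wavelengths
Xc = X - X.mean(axis=0)         # PCA works on mean-centred data

# Rows of Vt are the principal axes: one loading per input channel.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2]  # keep just two components

# A channel must still be measured if any retained component loads on it.
needed_for_pca = int(np.count_nonzero(np.any(np.abs(loadings) > 1e-12, axis=0)))

# Feature selection keeps raw channels instead; these indices are arbitrary.
selected = [3, 11]
needed_for_selection = len(selected)
```

Both routes yield two numbers per pixel, but only the selection route lets the other 14 channels go unmeasured.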

The methods described here share some similarities with existing approaches to hyperspectral imaging that use techniques such as deep stacked autoencoders, or principal component analysis coupled with a deep convolutional network, to extract high-level features that can then be fed into a simple classifier [17, 18]. In contrast, our approach is focused on going directly from the data to the classification of either pixels or whole regions (in our case, cells). This allows us to better leverage the structure of the data's dimensionality: hyperspectral datasets are often smaller in absolute numbers of images but proportionally richer in dimensionality.

Given deep neural networks’ history of broad applicability in other domains, we fully expect that these methods will be generalizable to other, similar datasets and anticipate subsequent analysis of a variety of cell types under experimental conditions. Further refinement of our convolutional neural network should provide effective and efficient sub-cellular segmentation via embedded computing platforms, and ultimately we aim to extend the use of these neural network algorithms to inform experimental results.


The authors thank the Pakrasi Lab at Washington University in St. Louis for the Synechocystis cells, and Jaclyn Murton and Michael Sinclair at Sandia National Laboratories for data collection and hyperspectral imager maintenance, respectively. This work was partially supported by the Photosynthetic Antenna Research Center (PARC), an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award DE-SC0001035 (hyperspectral image data collection). This work was also supported by Sandia National Laboratories’ Laboratory Directed Research and Development (LDRD) Program under the Hardware Acceleration of Adaptive Neural Algorithms (HAANA) Grand Challenge. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.


  • [1] K. Rajpoot and N. Rajpoot, “SVM optimization for hyperspectral colon tissue cell classification,” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2004, pp. 829–837, 2004.
  • [2] W. F. Vermaas, J. A. Timlin, H. D. Jones, M. B. Sinclair, L. T. Nieman, S. W. Hamad, D. K. Melgaard, and D. M. Haaland, “In vivo hyperspectral confocal fluorescence imaging to determine pigment localization and distribution in cyanobacterial cells,” Proceedings of the National Academy of Sciences, vol. 105, no. 10, pp. 4050–4055, 2008.
  • [3] A. M. Collins, M. Liberton, H. D. Jones, O. F. Garcia, H. B. Pakrasi, and J. A. Timlin, “Photosynthetic pigment localization and thylakoid membrane morphology are altered in Synechocystis 6803 phycobilisome mutants,” Plant Physiology, vol. 158, no. 4, pp. 1600–1609, 2012.
  • [4] J. Murton, A. Nagarajan, A. Y. Nguyen, M. Liberton, H. A. Hancock, H. B. Pakrasi, and J. A. Timlin, “Population-level coordination of pigment response in individual cyanobacterial cells under altered nitrogen levels,” Photosynthesis Research, pp. 1–10, 2017.
  • [5] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • [6] M. B. Sinclair, D. M. Haaland, J. A. Timlin, and H. D. Jones, “Hyperspectral confocal microscope,” Applied Optics, vol. 45, no. 24, pp. 6283–6291, 2006.
  • [7] H. D. Jones, D. M. Haaland, M. B. Sinclair, D. K. Melgaard, A. M. Collins, and J. A. Timlin, “Preprocessing strategies to improve MCR analyses of hyperspectral images,” Chemometrics and Intelligent Laboratory Systems, vol. 117, pp. 149–158, 2012.
  • [8] D. M. Haaland, H. D. Jones, and J. A. Timlin, “Experimental and data analytical approaches to automating multivariate curve resolution in the analysis of hyperspectral images,” in Resolving Spectral Mixtures. Amsterdam: Elsevier, 2015, pp. 381–406.
  • [9] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [10] “Hyperas,”
  • [11] J. Bergstra, D. Yamins, and D. Cox, “Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures,” in International Conference on Machine Learning, 2013, pp. 115–123.
  • [12] F. Chollet et al., “Keras,” 2015.
  • [13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  • [14] S. Li, H. Wu, D. Wan, and J. Zhu, “An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine,” Knowledge-Based Systems, vol. 24, no. 1, pp. 40–48, 2011.
  • [15] M. D. Zeiler, “Adadelta: an adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012.
  • [16] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
  • [17] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2094–2107, 2014.
  • [18] W. Zhao and S. Du, “Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4544–4554, 2016.