I Introduction
Ability to generalize concepts is fundamental component of intelligence and core for designing smart systems [6, 7]. Neural networks simulates this behavior with hierarchical learning of concepts. When it comes to automation, counting is an important task from machine vision application [1] to cell counting [2]. While neural networks manipulates numerical quantities but it is not associated with systematic generalization [3, 4]. These networks fail to generalize as evident from high generalization error while predicting quantities that lie outside the training numerical range as shown in table I or in figure 5 elaborated in experiment section. This highlights memorization behavior in neural networks instead of generalization abilities.
Neural accumulators (NAC) and neural arithmetic logic units (NALU) [5]
are biased to learn systematic numerical computation and performs relatively better than non linear activation functions for arithmetic operations. This numerical bias of learning computations makes them excellent choice for counting tasks which are essentially is an increment addition operation only. Deep learning models generally take either segmentation approach with explicit counting trainer or endtoend counting via a regression loss. In this paper we will go through the latter approach
[2] in detail for automation of cell counting process. As cell counting is cumbersome task and dense cell images with higher cell counts containing data outside training numeric range are common in real world scenarios. Achieving true cell automation with less generalization errors is the prime objective of this paper.In regression loss approach Fully convolutional regression networks [2] and Unet [8] architectures learns mapping between an image I(x) and a density map D(x), given by F I(x) D(x) (I R^{m × n} , D R^{m × n} ) for a m
n pixel image. Later on, variations of these architectures with different activation functions are discussed in this paper. And their performance is compared with Neural accumulators (NAC) and Neural arithmetic logic units (NALU) concatenated architectures in the form of numerically biased residual connections , similar to ResNets
[9] as shown in figure 1 but instead of adding the previous layer input they are concatenated. On the surface, this proposed architecture changes leads to accuracy improvements due to increased model capacity with these numerically biased units. But, our results with custom high count dataset created from BBBC005 [29] reflects increased generalization counting abilities for high cell count images with higher relative decrease in mean absolute error(MAE).Our concatenation based residual architecture utilizes the fundamentals of batch normalization like specified identity mapping architecture in ResNets
[17]. But, instead of using convolution operation directly this network leverages numerical bias information obtained from NAC and NALU operations applied on input layer and then finally uses convolution operation on the concatenated layer. Before and after this concatenation of this numerical bias learning operation, batch normalization is carried out and output of this operation is added back again to our next main network layer, as shown in figure 1.With means of this paper we introduce changes in current regression based model architectures for endtoend counter training and produce systems with higher accuracies for higher count testing ranges as well. Also, we validate our trained models on a different specially tailored validation dataset with approximately five times higher counts of cells as compared to training dataset created from BBBC005 synthetic cell dataset [29].
Ii Related Work
Intuitive numerical understanding is important in learning and by adjunct important in deep learning [6] for creating better models with higher generalization capabilities. Counting objects [10, 11, 12, 2, 14] in given image is a widely studied task. Trained models for counting tasks either use a deep learning model to segment instances of given object then count them in a postprocessing step [15] or learn endtoend predict count via a regression loss [2]. Networks like Countception [16] added the concept of average over redundant predictions with its deep Inception family network. Also, recent architectures like ResNets [17], Highway Networks [18] and Densenets [19] advocate linear connections like Countception to promote better learning bias. Such models have better performance, though additional computational overhead due to increased depth of given architectures do arise. Our work highlights the generalization capabilities of the network, that extrapolate well on unseen parts of solution space which highlights underlying structure of behavior governingequations [20]. We introduce architectural changes in models that learns residual functions and preserves input information which is numerically biased with reference to input layers. It is somewhat similar to ResNets [9]
, which are easier to optimize and gain accuracy with increasing depth. With our experiments we aim to highlight that models with numerically biased concatenated residual functions helps in achieving better results with their addition in the form of a comparative study with original architectures. Also, with our results in this paper we demonstrate with our results that backpropagation learns this numerical bias without any explicit numeric quantity being provided as input implying that better computer vision counters can be trained with this module when added to existing convolutional neural network architectures.
Density based estimation doesn’t require prior object detection or segmentation
[13, 21, 22]. In previous years, several works have investigated this approach. In [13], the problem is stated as density estimation with a supervised learning algorithm, D(x) = c
^{T}(x), where D(x) represents groundtruth density map, and (x) represents local features and parameters c are learned by minimizing the error between predicted and true density with quadratic programming over all possible subwindows. In [22], regression forest is used to exploit patchbased idea for learning structured labels, then for new input image density map is estimated averaged over structured patchbased predictions. Also, in [21]an algorithm is used that allows fast interactive counting with ridge regression.
Cell counting [13]
problem is classified into supervised learning problem that learns mapping between an image I(x) and a density map D(x), denoted by F
I(x) D(x) (I R^{m × n} , D R^{m × n} ) for a m n pixel image, see figure 2. Density function D(x) function is defined on pixels in given image, integrating this map over an image region gives an estimate of number of cells in that region. CNNs [23, 24] are quite popular in the biomedical imaging because of their simple architecture and achieve great results. Like in mitosis detection [26], neuronal membrane segmentation
[25] and analysis of C. elegans embryos development [27]. Previously, fully convolutional regression networks (FCRNs) and Countception have given stateoftheart results in cell counting, with potential for cell detection of overlapping cells.Also, UNets [8] a type of fully convolutional network, uses a modified version of architecture proposed by Ciresan et al. [25] as latter is slow and tradeoff between localization and use of context are present. In UNets pooling operations are replaced by upsampling operations to supplement usual contracting network. For localization high resolution features from contracting path are combined with unsampled output. Based on this information a successive convolution layer then learn to assemble more precise output. For our experimentation we selected fully convolution regression networks(FCRN) and Unet based on simple architecture and relative similarity in architecture with the difference being that Unet already uses inputs from previous layer for localization.
Iii Experiments
In this section first we conceptually explore numerical accumulators (NACs) and numerical arithmetic logic units (NALUs) and compare their addition capabilities with multilayer perceptrons equipped with different activation functions. With this study we aim to select concatenated residual connection variants from above mentioned numerically bias units which can best approximate the counting behavior and compare them with standard FCRN and Unet neural network architecture for regression loss approach in the following experiment done on synthetic dataset
[28]. For validation of counting generalization achieved, our trained models are tested against synthetic cell image dataset with approximately five times higher counts than training data.Iiia Visual understanding of neural accumulators (NACs) and neural arithmetic logic units (NALUs)
Neural accumulators (NACs) [5] supports accumulation of numerical quantities additively, a desirable bias for linear exploration while counting. It is special type of linear layer with transformation matrix W being continuous and differentiable parameterization for gradient descent. W = ()() consists of elements in [1, 1] with bias close to −1, 0, and 1. See figure 3 for ideation of this concept with following equations for NAC: a = Wx, W = ()() where , are learning parameters and W is transformation matrix.
For complex mathematical operations like multiplication and division we use neural arithmetic logic units (NALUs) [5]. It uses weighted sum of two subcells, one for addition or subtraction and another of multiply, division or power functions. It demonstrates that neural accumulators (NACs) can be extended for learning scaling operations with gatecontrolled suboperations. See figure 3 for ideation of this concept with following equations for NALU: y = ga + (1g)m; m = W(( x + )), g = (Gx) where m is subcell that operates in log space and g is learned gate, both contains learning parameters.
IiiB Comparative analysis of addition operation
Here, we use neural networks with NACs/NALUs and multilayer perceptrons (MLP) with different activation functions but same structures. These are trained with two randomly generated inputs from uniform distribution
a and b with each having 2^{14} data points for training. Prediction capabilities on test data with values ranging up to 10 times the training range are evaluated. Refer figure 4 to observe architecture for both these trained models in this comparative experiment.Comparative analysis is summarized in table I with mean absolute error (MAE) as an accuracy measure for MLP variants, NAC, NALU and its variants with changed learned gate for extrapolation. Also, in NALUTanh and NALUHard Sigmoid the learning gate g’s is changed to observe any improvements in NALU’s performance based on this change.
Layer Configuration/Activations 
Mean Absolute Error (a+b) 

Linear MLP 

Sigmoid MLP  
Tanh MLP  
ELu MLP  
ReLU MLP  
Leaky ReLU MLP  
PReLU MLP  
NAC  
NALU  
NALUHard Sigmoid  
NALUTanh 
From these results stated in table I and visualized in figure 5 we conclude that Linear, LeakyReLU, ReLU activations and NAC, NALU, NALUTanh modules were the top performers in extrapolation task for numeric addition operation task. Hence, these top performers are used further in cellcounting task on synthetic dataset for learning endtoend counting mechanism.
IiiC Counting experiment
^{1}^{1}1Code repository: https://github.com/ashishrana160796/nalucellcountingIn this experiment section the first subsections elaborates the datasets used for training and validation of our trained models, plus the data augmentation techniques used in our experiment. After that we elaborate onto different architectures used for training having different activation layers on standard architectures and residual concatenated connection modules on modified proposed model architectures.
IiiC1 Datasets and data augmentation
Synthetic dataset which is generated by system [28]. 200 highlyrealistic synthetic fluorescence microscopic images of bacterial cells are used for experimentation with a 75/25 traintest split for training each model architecture and its variants. Images are having average of 174±64 cells.
For validation of trained models and checking true generalization capabilities we use BBBC005 from the the Broad Institute’s Bioimage Benchmark Collection [29]. 600 images have a corresponding foreground mask are classified as completely infocus for ground truth. We take a subset of this dataset with highly focused F1
images only and coalesce image into one after replicate same image 16 times with some padding around each subimage. After these changes the ground truth values are accordingly changed and then image is resized to the same dimensions as original dataset, see figure
6.Data augmentation with elastic deformations to training images is applied for teaching network the desired invariance and robustness properties, like specified in figure 8. These elastic deformations are introduced in the form of angular shear in the training images. Translation and rotation invariance along with robustness to gray value variations and deformations is main focus of augmentation process for microscopic images. Disfigurement using random displacement vectors on a coarse 3x3 grid were also generated. These data augmentation techniques especially are helpful for our custom data which is created just by repeating the original image in order to supplement a more robust dataset for the model to train on.
IiiC2 Defining regression task and architecture details
In training dataset ground truth is provided as dot annotation corresponding to each cell. For training, dot annotations are represented by Gaussian and density surface D(x) which is formed from superposition of Gaussians. The optimization task is to regress density surface from corresponding image I(x). This is achieved by training convolutional neural networks (CNN) using mean square error between output heat map and target density surface as the loss function. Hence, at inference given an input I(x), the model predicts density heat map D(x).
FCRNs are inspired from VGGnet, we only used small kernels of size 3x3 pixels. Feature maps are increased for avoiding spatial information loss. Activation layers like convolutionReLUPooling are popular in CNN architectures [23]. Here, we have altered these layers to create different activation maps which contains some numerical bias in the form of residual connections and regularization by batch normalization. The first layers contains convolutionspooling operations, then we undo spatial reduction by upsampling operations for learning endtoend training. Also, for dimensional compatibility of residual NAC or NALU modules we did pooling and upsampling operations on these residual modules after batch normalization. See figure 8 for comparison between earlier original model and newly proposed architecture along with parameter details.
Unet is modified upon the previously discussed FCRN architecture by having large number of feature channels for upsampling to propagate context information to high resolution layers. That makes expansive path almost symmetric to contracting path yielding a ushape. Similar to above FCRNs optimization problem formulation remains the same, residual concatenated connection addition with NACs and NALU units along with batch normalization is done. Also, Unet architecture used in this paper is more computationally expensive than FCRN having approximately thrice the number of parameters leading to more feature learning capacity. See figure 9 for comparison between earlier original Unet model and newly proposed architecture along with parameter details.
For the concatenation of residual connections of these units the dimensional consistency is maintained by added pooling and upsampling operations accordingly for these units to merge with the base network. FCRN’s implementation resembles that of MatConvNet [30]
as upsampling in Keras is implemented by repeating elements, instead of bilinear sampling. In Unets
[8], for implementation lowlevel feature representations are fused during upsampling, aiming to compensate the information loss due to max pooling.Iv Results
Mean absolute error (MAE) is the metric used in this paper for measuring results for cell counting on the synthetic cell dataset [28] and custom BBBC005 synthetic modified high cell count validation dataset.

Mean Absolute Error (MAE): Mean Absolute Error (MAE): The mean absolute error is an average of the difference between the predicted value and true value.
(1) (2)

Relative Improvement Percentage Relative Improvement Percentage (RIP): Here, in context of this paper it defined as percentage improvement in MAE of a given model with respect to baseline ReLU models for FCRN and Unet architectures. In below equation, M_{r} is MAE from baseline ReLU model and M_{i} is model under consideration.
(3)
Result table II compares earlier FCRN, Unet architectures with new numerically biased ResNet like connection modules with NACs and NALUs units under current training setup. With our setup we able to obtain similar results as mentioned in earlier reference papers and also we have equipped earlier model architectures with different regularization activations as specified in the table. From earlier ReLU implementation clearly Linear and LeakyReLU activation regularization based models have performed well. Also, for both model structures NAC and NALUs residual modules have outperformed all the earlier specified regular FCRN architecture. And similar arguments and results are extended by Unet model results where NALU layer concatenation based Unet outperforms all the models trained for our experiment.
FCRNModels 
MAE  UNetModels  MAE 

ReLU 
3.43  ReLU  1.78 
LeakyReLU  3.39  LeakyReLU  1.74 
Linear  3.34  Linear  1.73 
NALUtanh  3.21  NALU/tanh  1.56 
NALU  3.17  NALU  1.42 
NAC  3.23  NAC  1.63 

Result table III compares performance of above trained models on a new validation dataset containing much higher cell counts for measuring performance on extrapolation capabilities counting tasks. For validation set we have used 300 images of size 256x256 pixels with cell counts averaging around 1200±12. Here also, NAC and NALU based residual concatenation module based models outperforms earlier architectures for counting tasks. This time relative improvement in even more for FCRN and Unet models showcasing better generalization abilities of trained models.
FCRNModels 
MAE  UNetModels  MAE 

ReLU 
3.04  ReLU  2.87 
LeakyReLU  2.99  LeakyReLU  2.62 
Linear  2.85  Linear  2.47 
NALUtanh  2.32  NALUtanh  1.95 
NALU  2.27  NALU  1.87 
NAC  2.40  NAC  1.92 

Relative improvement in predictions is visualized in figure 10
against ReLU based regularization as base result for comparison with other regularization layer based changes in FCRNs & Unets and concatenation layer NALU/NAC residual connection addition in FCRNs & Unets. It includes averaged out comparison from multiple executions of training and testing runs for both interpolation testing and extrapolation validation counting tasks for FCRN and Unet variant models with respect to ReLU based FCRN and Unet model. From, this figure it is clearly highlighted that models with NAC and NALUs residual modules have better generalization capabilities for extrapolation counting tasks i.e. they are better generalizers for this given cell counting task with increase in relative improvement in prediction as compared to base ReLU implementation. This figure shows more increase in relative improvement as we move right towards horizontal axis for both testing and validation task with extrapolation where in validation extrapolation task NAC/NALU models performing even better than testing data from which we can conclude that trained models are having better generalization abilities with some learned numerical bias in their trained weights with which even better predictions for higher count cells is made.
V Summary
We were able to show that addition of newly proposed NACs and NALU units in existing architectures in the form of residual concatenation connection layer modules achieves better results. With numerically biased residual connections, higher accuracy for more dense images having higher counts of cells is achieved. Hence, producing more generalized cell counters that provides better predictions for real life usecases. Finally, for code implementation details and other extra experimental results refer to this paper’s github repository.
References
 [1] Baygin, Mehmet, et al. ”An Image Processing based Object Counting Approach for Machine Vision Application.” arXiv preprint arXiv:1802.05911 (2018).
 [2] Xie, Weidi, J. Alison Noble, and Andrew Zisserman. ”Microscopy cell counting and detection with fully convolutional regression networks.” Computer methods in biomechanics and biomedical engineering: Imaging & Visualization 6.3 (2018): 283292.
 [3] Fodor, Jerry A., and Zenon W. Pylyshyn. ”Connectionism and cognitive architecture: A critical analysis.” Cognition 28.12 (1988): 371.
 [4] Marcus, Gary F. The algebraic mind: Integrating connectionism and cognitive science. MIT press, 2018.
 [5] Trask, Andrew, et al. ”Neural arithmetic logic units.” Advances in Neural Information Processing Systems. 2018.
 [6] Dehaene, Stanislas. The number sense: How the mind creates mathematics. OUP USA, 2011.
 [7] Gallistel, C. Randy. ”Finding numbers in the brain.” Philosophical Transactions of the Royal Society B: Biological Sciences 373.1740 (2018): 20170119.
 [8] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. ”Unet: Convolutional networks for biomedical image segmentation.” International Conference on Medical image computing and computerassisted intervention. Springer, Cham, 2015.

[9]
He, Kaiming, et al. ”Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
 [10] Arteta, Carlos, et al. ”Interactive object counting.” European conference on computer vision. Springer, Cham, 2014.
 [11] Chan, Antoni B., ZhangSheng John Liang, and Nuno Vasconcelos. ”Privacy preserving crowd monitoring: Counting people without people models or tracking.” 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008.
 [12] Seguí, Santi, Oriol Pujol, and Jordi Vitria. ”Learning to count with deep object features.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2015.
 [13] Lempitsky, Victor, and Andrew Zisserman. ”Learning to count objects in images.” Advances in neural information processing systems. 2010.
 [14] Zhang, Cong, et al. ”Crossscene crowd counting via deep convolutional neural networks.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
 [15] Hernández, Carlos X., Mohammad M. Sultan, and Vijay S. Pande. ”Using Deep Learning for Segmentation and Counting within Microscopy Data.” arXiv preprint arXiv:1802.10548 (2018).
 [16] Paul Cohen, Joseph, et al. ”Countception: Counting by fully convolutional redundant counting.” Proceedings of the IEEE International Conference on Computer Vision. 2017.
 [17] He, Kaiming, et al. ”Identity mappings in deep residual networks.” European conference on computer vision. Springer, Cham, 2016.
 [18] Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. ”Highway networks.” arXiv preprint arXiv:1505.00387 (2015).
 [19] Huang, Gao, et al. ”Densely connected convolutional networks.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
 [20] Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. ”Discovering governing equations from data by sparse identification of nonlinear dynamical systems.” Proceedings of the National Academy of Sciences 113.15 (2016): 39323937.
 [21] Arteta, Carlos, et al. ”Interactive object counting.” European conference on computer vision. Springer, Cham, 2014.
 [22] Fiaschi, Luca, et al. ”Learning to count with regression forest and structured labels.” Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). IEEE, 2012.

[23]
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ”Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.
 [24] LeCun, Yann, et al. ”Gradientbased learning applied to document recognition.” Proceedings of the IEEE 86.11 (1998): 22782324.
 [25] Ciresan, Dan, et al. ”Deep neural networks segment neuronal membranes in electron microscopy images.” Advances in neural information processing systems. 2012.
 [26] Cireşan, Dan C., et al. ”Mitosis detection in breast cancer histology images with deep neural networks.” International Conference on Medical Image Computing and Computerassisted Intervention. Springer, Berlin, Heidelberg, 2013.
 [27] Ning, Feng, et al. ”Toward automatic phenotyping of developing embryos from videos.” IEEE Transactions on Image Processing 14 (2005): 13601371.
 [28] Lehmussola, Antti, et al. ”Computational framework for simulating fluorescence microscope images with cell populations.” IEEE transactions on medical imaging 26.7 (2007): 10101016.
 [29] Ljosa, Vebjorn, Katherine L. Sokolnicki, and Anne E. Carpenter. ”Annotated highthroughput microscopy image sets for validation.” Nature methods 9.7 (2012): 637637.
 [30] Vedaldi, Andrea, and Karel Lenc. ”Matconvnet: Convolutional neural networks for matlab.” Proceedings of the 23rd ACM international conference on Multimedia. ACM, 2015.
Comments
There are no comments yet.