The ability of deep artificial neural networks (ANNs) to represent a broad class of complex functions has made them especially useful for machine learning applications. Currently, most implementations of ANNs in machine learning communicate in real-valued signals of 32-bit or higher resolution and are therefore incompatible with neuromorphic systems, which communicate in discrete 1-bit pulses. Further challenges are introduced by constraints imposed by specific neuromorphic chip sets, such as limits on weight range and resolution. Spiking neural networks (SNNs) are a class of ANN that use spiking activation functions at the neural layers and are therefore compatible with neuromorphic implementations. Gradient-based training methods, commonly used on conventional ANNs, cannot be directly applied to the inherently discontinuous spiking neurons in SNNs without incurring loss. Instead, a common approach is to first train an ANN via stochastic gradient descent (SGD) and then convert the learned network weights to weights appropriate for an SNN. Existing translation methods are limited in the architectures to which they may be applied.
Neuromorphic chips have the potential to significantly reduce the power requirements of deep neural network implementations. In conventional von Neumann hardware, the highly parallel operations of neural network calculation tend to create a bottleneck between the CPU and memory. As a result, most deep networks are extremely energy inefficient when run on these architectures. Neuromorphic chips alleviate the bottleneck by co-locating memory and processing in a single neuro-synaptic mesh; synaptic channel weights act as memory and neurons handle computation. Additionally, because the neurons operate through spikes, energy is only expended when needed for computation, resulting in overall lower power consumption for parallel operations. Neuromorphic chips may therefore offer a path to deploying ANNs on power-constrained devices where doing so would otherwise be infeasible.
ANNs have been successfully deployed to neuromorphic chips in a few limited cases. Most existing methods depend on directly mapping learned ANN weights to SNNs, subject to constraints [6, 7]. A significant drawback of these methods is that they require the SNN neuron firing rate curves to closely emulate the ANN activation function. Most analog and hybrid neuromorphic chips use a form of leaky integrate-and-fire (LIF) neuron, whose firing rate curve is highly non-linear and has a gradient approaching infinity near the voltage threshold. Methods relying on weight mapping are therefore only applicable to digital neuromorphic architectures in which the simulated neuron function can be artificially controlled. Newly emerging neuromorphic architectures are mixed analog-digital designs that use analog neurons. These hybrid chips offer theoretically lower power consumption, making a more general translation approach desirable. Additionally, existing approaches often replicate the network multiple times to increase accuracy, requiring a large number of neurons.
The method introduced here first trains an ANN using a gradient-based approach and then translates it into an SNN with a similar architecture. The proposed method, which we call layer-wise synapse optimization (LSO), translates the ANN layer-by-layer, solving for the synaptic weights such that the hidden-layer activations of the ANN are optimally represented in the SNN. The method formulates the translation as a linear least-squares error problem at each layer, accounting for the SNN neuron behavior. This approach allows the SNN to optimally represent the ANN features for a given pair of ANN activation function and SNN neuron, though analogous pairs (e.g., rectified linear unit (ReLU) and LIF) would be expected to retain a better mapping. Introducing noise at each layer translation helps to account for spiking behavior that is not captured by the firing-rate approximation.
Because the method operates on pre-trained ANNs, existing state-of-the-art networks (e.g., AlexNet, VGG) can be employed without replicating the data-intensive training process. Additionally, networks for tasks that require specialized, often difficult-to-implement training approaches may be trained using proven methods without additional constraints. This would be especially beneficial for fields such as reinforcement learning.
The method also introduces an optimal compression method, which allows the SNN layer size to be selected independent of the corresponding ANN layer size. In this way, architecture-specific constraints on neuron-ensemble or core size may be imposed with minimal loss. In contrast with previous works, LSO does not require the SNN to have identical or linearly replicated structure to the source ANN. In addition, it does not require the ANN non-linearities to emulate the neuron spike-rate behavior, making it applicable to hybrid and analog architectures in addition to digital. It can be applied to feed-forward multi-layer perceptron (MLP) networks including convolutional neural networks (CNN).
This work proposes a translation approach that translates the hidden-layer activations of the ANN to SNN representation layer-by-layer. Instead of mapping ANN weights directly onto the SNN synapses, the optimal weights are identified by fitting SNN neuron outputs to ANN activation-function outputs for a given sample input set.
At each layer, the algorithm solves for the post-neuron synaptic weights, or decoders, such that the features of the ANN layer are optimally represented by the equivalent SNN layer. Figure 1 shows an MLP with the network values labeled with the notation used in this section: the prior-layer outputs (or network inputs for the first layer), the ANN layer inputs generated by pre-multiplying those outputs by the layer weights, and the resulting ANN activations. In the SNN, the neural-layer inputs approximate the ANN layer inputs, and the SNN layer outputs are the analogous activations. The hidden layer solution process is outlined in Eqs. 1 to 8:
The process is described in the remainder of this section.
LSO works by computing what the input to the SNN layer must be such that the output of that layer will match the output of the ANN activations. A set of samples is required to generate the target feature set for each layer. This sample set should be chosen such that it provides good coverage of the input space of interest for the network. First, the algorithm replicates the provided samples a selected number of times. These repeated samples will be used to increase sensitivity to neural noise, as described later in this section. The repeated samples are used as the inputs to the first layer.
The input samples are passed to the first ANN layer activation function, which generates the layer features, as shown in Eq. 1 and Eq. 2. These features are scaled such that the maximum feature activation does not exceed the network’s maximum firing rate. This step was inspired by previous work. Once the ANN features are known, they cannot be directly compared to the SNN layer outputs because the SNN neurons operate in the time domain, whereas the ANN does not. To make the comparison, we approximate the SNN output under a constant input by the neuron firing rate, which gives an estimate of the average neuron output. For the basic leaky integrate-and-fire (LIF) neuron used in this study, a firing rate equation and its inverse may be easily derived. This is shown below for a neuron with a given input current, membrane time constant, membrane resistance, and voltage threshold.
We can begin by considering the differential equation for membrane voltage:
Assuming constant input over integration time-step
The neuron spikes when the membrane voltage exceeds a threshold voltage. The spike period is
Adding a refractory period and inverting gives the firing frequency
Finally, the inverse of this function with respect to the input current is
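A sketch of the standard LIF rate derivation that matches the steps described above is given here. The symbols are assumptions chosen for illustration and are not necessarily the notation of this work: $I$ is the input current, $\tau_{RC}$ the membrane time constant, $R$ the membrane resistance, $V_{th}$ the threshold voltage, $t_{ref}$ the refractory period, and $f$ the firing rate. The membrane voltage is assumed to start from $V(0) = 0$ after each spike.

```latex
\begin{align}
\tau_{RC}\,\frac{dV}{dt} &= -V(t) + R\,I \\
V(t) &= R\,I\left(1 - e^{-t/\tau_{RC}}\right) \\
t_{spike} &= -\tau_{RC}\,\ln\!\left(1 - \frac{V_{th}}{R\,I}\right),
  \qquad R\,I > V_{th} \\
f(I) &= \frac{1}{t_{ref} + t_{spike}}
  = \left[\,t_{ref} - \tau_{RC}\,\ln\!\left(1 - \frac{V_{th}}{R\,I}\right)\right]^{-1} \\
I(f) &= \frac{V_{th}}{R\left(1 - e^{\left(t_{ref} - 1/f\right)/\tau_{RC}}\right)},
  \qquad 0 < f < \frac{1}{t_{ref}}
\end{align}
```

The last line follows from solving the rate expression for the logarithm and exponentiating both sides; for $R\,I \le V_{th}$ the neuron never reaches threshold and the rate is zero.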
The domain and range of this InvLIF equation are limited to non-negative values. Strict implementation of this would result in all negative ANN layer inputs being set to 0, resulting in a loss of information. It was also observed that this implementation made the resulting matrix more likely to be ill-conditioned. A heuristic approach is instead taken to address 0-valued frequencies. In this approach, the negative inputs are made positive, scaled by the InvLIF function, and then made negative again. This ensures the values in the objective are all similarly scaled, which aids the solution of the resulting least-squares error (LSE) problem. Using this approach, the ANN activations may be converted into objective SNN layer inputs as shown in Eq. 4.
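A minimal NumPy sketch of this signed heuristic follows. The function names, the unit resistance and threshold, and the refractory period are assumptions made for illustration; only the 0.02 membrane time constant is taken from the experiments. The rate formula is the standard LIF expression, not code from this work.

```python
import numpy as np

# Illustrative constants; all are assumptions except TAU_RC.
TAU_RC = 0.02    # membrane time constant [s]
TAU_REF = 0.001  # refractory period [s] (assumption)
V_TH = 1.0       # threshold voltage (assumption)
R_M = 1.0        # membrane resistance (assumption)


def lif_rate(current):
    """Steady-state LIF firing rate [Hz] for a constant input current."""
    drive = R_M * np.asarray(current, dtype=float)
    rates = np.zeros_like(drive)
    active = drive > V_TH  # below threshold the neuron never fires
    rates[active] = 1.0 / (TAU_REF - TAU_RC * np.log1p(-V_TH / drive[active]))
    return rates


def inv_lif(rate):
    """Input current that produces `rate`; valid for 0 < rate < 1 / TAU_REF."""
    rate = np.asarray(rate, dtype=float)
    current = np.zeros_like(rate)
    firing = rate > 0  # a zero rate is left at zero current in this sketch
    current[firing] = V_TH / (
        R_M * (1.0 - np.exp((TAU_REF - 1.0 / rate[firing]) / TAU_RC))
    )
    return current


def signed_inv_lif(target):
    """Apply InvLIF to the magnitude of each target and restore its sign."""
    target = np.asarray(target, dtype=float)
    return np.sign(target) * inv_lif(np.abs(target))
```

Calling `signed_inv_lif` on a matrix of scaled ANN activations returns objective inputs with the same signs as the activations, so negative values are kept rather than clipped to zero.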
Once the objective SNN-layer inputs are known, the decoder weights may be solved for directly. The resulting layer, however, may not be feasible to deploy because many neuromorphic architectures impose constraints on neural core sizes and connectivity. It may instead be desirable to limit any given SNN layer to a size of the designer’s choice. This is accomplished by applying a truncation method to the target activations. The truncation process is outlined in Algorithm 2.
The target matrix is first decomposed using singular value decomposition into its component matrices: the left singular vectors, a diagonal matrix of singular values, and the right singular vectors. The highest-order (smallest) singular values are set to 0, where the number zeroed is the total number of singular values minus the target size of the SNN at that layer. The new singular value matrices are recombined to provide an activation matrix with the neuron dimension equal to the desired SNN layer size.
The resulting truncated matrix is a lower-rank representation of the original. This representation is shown to be the optimal (minimum Frobenius norm difference) reduction by the Eckart-Young theorem, and may be extended to discrete integer-valued problems as well. This method is similar to existing truncation methods proposed for use on ANNs [13, 14]. Unlike these methods, the proposed one uses the generated sample information to truncate based on optimal hidden-layer feature representation rather than on optimal weight matrix truncation. A disciplined study on the difference in these two methods is left for future work.
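The truncation step can be sketched with NumPy as follows. This is an illustration under assumptions: the target matrix is taken to have one row per sample and one column per ANN unit, and the function and variable names are hypothetical. The sketch returns both a reduced matrix whose neuron dimension equals the chosen SNN layer size and the corresponding low-rank reconstruction.

```python
import numpy as np


def truncate_targets(targets, snn_size):
    """Keep the `snn_size` largest singular values of the target matrix.

    targets: array of shape (n_samples, n_ann_units).
    Returns (reduced, low_rank, singular_values) where
      reduced  has shape (n_samples, snn_size),
      low_rank has shape (n_samples, n_ann_units) and rank <= snn_size.
    """
    u, s, vt = np.linalg.svd(targets, full_matrices=False)
    # Singular values are returned in descending order, so slicing drops
    # the smallest ones.
    reduced = u[:, :snn_size] * s[:snn_size]
    low_rank = reduced @ vt[:snn_size]
    return reduced, low_rank, s
```

By the Eckart-Young theorem, the Frobenius norm of `targets - low_rank` equals the root-sum-square of the discarded singular values, which gives a quick estimate of what is lost for a given layer size.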
The decoders may now be solved for using the truncated activations. In this work, we solve the resulting least-squares problem using the pseudo-inverse. Using the decoders, the output of the SNN layer may be solved using the LIF rate function.
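As a sketch of this step, the decoders can be obtained with NumPy's pseudo-inverse. The matrix shapes and names are assumptions for illustration: `prev_rates` holds the previous SNN layer outputs with one row per sample, and `target_inputs` holds the objective SNN-layer inputs for the current layer.

```python
import numpy as np


def solve_decoders(prev_rates, target_inputs):
    """Least-squares decoders D minimizing ||prev_rates @ D - target_inputs||."""
    return np.linalg.pinv(prev_rates) @ target_inputs


def relative_residual(prev_rates, decoders, target_inputs):
    """Frobenius norm of the fit error relative to the target norm."""
    error = prev_rates @ decoders - target_inputs
    return np.linalg.norm(error) / np.linalg.norm(target_inputs)
```

For large layers, `np.linalg.lstsq` solves the same least-squares problem without forming the pseudo-inverse explicitly, which may reduce memory use.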
The samples used at the following hidden layer are then generated by adding suitable noise to the calculated activations, as shown in Eq. 8. As previously stated, the LIF rate output is only an approximation to the actual behavior of the neurons when operating in spiking mode. This approximation fails to capture the temporal effects introduced by this spiking, including spike phase and synaptic filtering. We treat these effects as noise in our solution by adding a noise term to the output of each neural layer carried forward in the algorithm. For SNN implementation on fully-digital neuromorphic architectures, with deterministic spike interval, this noise may be neglected. In this work, we sample this noise from a multi-variate Gaussian distribution, with covariance set to the covariance of the sample set scaled by the network maximum firing rate (in this work, 1,000 Hz).
This process is repeated for each layer in the network until the output layer. At this point, since the ANN output is no longer being acted upon by an activation function, the target activations are set to be the outputs of the ANN, and the algorithm proceeds as normal. The complete process is outlined in Algorithm 1.
III Experiments and Results
The feature-translation method was tested by translating three different ANNs into SNNs and evaluating the resulting networks on task performance and agreement with the ANN output. Two of the networks translated were multi-layer perceptron (MLP) networks. Both represented control policies for simulated dynamic systems learned through reinforcement learning. The third network was a convolutional neural network (CNN), trained on the MNIST hand-written digit classification task [16].
In order to study the effects of various translation hyper-parameters on the performance of the resulting networks, each ANN was translated under multiple parameter settings, as shown in Table I. The integration time is the amount of time the network was allowed to accumulate charge while stimulated by the constant input signal. The output signal was averaged over time to give the network output for the supplied input. “Sample Replication” is the number of times the supplied input sample-set was replicated by the algorithm. Size factor is how much each SNN layer was reduced in size, relative to the original ANN layer. This size reduction was applied uniformly across all layers of the network.
The resulting SNNs implement a physics-based LIF model, and therefore represent an analog or hybrid neuromorphic architecture. The firing threshold voltage was set to , and each neuron was given a background current of to offset the resulting threshold bias. The neuron time constants were deterministically set to 0.02 with zero mismatch. The synaptic connections between neurons implement a simple low-pass filter, with a time constant of ms. The networks were set to have a maximum neuron firing rate of 1,000 Hz.
|Small MLP||ms, ms, ms, ms, ms, ms||,|
|Large MLP||ms, ms, ms, ms, ms, ms||,|
|CNN||ms, ms, ms, ms, ms, ms||,|
All of the SNN models were built and simulated using the Nengo large-scale neural engineering system simulator. Nengo is a discrete-time simulator of large-scale spiking neural systems based on the Neural Engineering Framework. All simulations in this study were created using the built-in Nengo objects, with the exception of the neuron model. The basic LIF neuron suffers from numerical error at high firing rates, caused by the default reset to zero voltage. This was addressed by building a modified LIF neuron that subtracts the threshold voltage from the calculated voltage at firing time, compensating for some of the numerical discretization error. Without this, high-frequency signals, which are responsible for communicating a high amount of information, suffer from significant discretization loss.
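The effect of the reset rule can be illustrated with a small discrete-time simulation. This is a plain NumPy sketch rather than the Nengo neuron model used in the study; the function names, the unit threshold, the 1 ms step, the omission of a refractory period, and the drive value are all assumptions for illustration.

```python
import numpy as np


def simulate_lif(drive, n_steps=1000, dt=0.001, tau_rc=0.02, v_th=1.0,
                 subtract_reset=True):
    """Count spikes of a LIF neuron under constant drive (no refractory period)."""
    decay = np.exp(-dt / tau_rc)
    voltage = 0.0
    spikes = 0
    for _ in range(n_steps):
        # Exact update of tau_rc * dV/dt = -V + drive over one time step.
        voltage = drive + (voltage - drive) * decay
        if voltage > v_th:
            spikes += 1
            # Subtracting the threshold keeps the overshoot; resetting to
            # zero discards it, which lowers the measured rate.
            voltage = voltage - v_th if subtract_reset else 0.0
    return spikes


def analytic_rate(drive, tau_rc=0.02, v_th=1.0):
    """Continuous-time LIF rate [Hz] without a refractory period."""
    return 1.0 / (-tau_rc * np.log1p(-v_th / drive))
```

With the default 1 s simulation, the spike count is directly the rate in Hz. Comparing `simulate_lif(10.0, subtract_reset=False)` and `simulate_lif(10.0, subtract_reset=True)` against `analytic_rate(10.0)` shows that the threshold-subtraction reset tracks the continuous-time rate more closely at high drive.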
III-A MLP Translations
Two MLPs were each tested on one reinforcement learning task from OpenAI Gym. Each task requires the policy to select an action from a known action space based on a provided observation. The simulated environment is then stepped forward one discrete time-step and the process repeats, with the environment providing an observation and a reward signal on each step. The performance of a policy on a given task is measured by the accumulation of reward signals over episodes of the task. The first MLP (“Small MLP”) was trained using stochastic gradient descent on the Cart Pole task. The objective of this task is to keep a pole balanced as an inverted pendulum by moving the cart base in 1D. The observation space is a four-element vector and the output is a binary action choice (“move left” or “move right”). A screen-shot of this environment is shown in Fig. 1(a).
The translated ANN had three layers (two hidden layers), with 64 rectified linear unit (ReLU) neurons at each hidden layer. The ANN outputs unnormalized logits, which were passed to a softmax function and then used as probability masses for each discrete action. Actions were stochastically selected from the resulting distribution at each time step. During translation, the neural noise added at each layer was the clock rate, 1,000 Hz, scaled by the maximum standard deviation of the four input signals. Trajectories from ten episodes of 200 steps each were used as the sample inputs, for a total of 2,000 samples. The SNN was then tested over 40 episodes, for a total of steps. The best agreement between the SNN and ANN was achieved at replication factor 5, integration time 1s, with an average RMSE across the samples of with a standard deviation of . The ANN scored an average of 200/200, with a standard deviation of 0.0. The SNN scored 200/200 with a standard deviation of 0.0. These results, along with those from the other experiments, are given in the summary table.
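The action-selection step described above can be sketched as follows. The function names and the example logits are hypothetical; the sketch only illustrates passing logits through a softmax and sampling a discrete action from the result.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = np.asarray(logits, dtype=float)
    shifted = shifted - shifted.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sample_action(logits, rng):
    """Draw one discrete action index from the softmax distribution."""
    probs = softmax(logits)
    return int(rng.choice(len(probs), p=probs))


# Example: two logits for the "move left" / "move right" actions.
rng = np.random.default_rng(0)
action = sample_action([1.5, -0.5], rng)
```

The same function applies to the time-averaged SNN outputs, since only the relative magnitudes of the logits determine the action probabilities.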
The second MLP (“Medium MLP”) was trained using the Trust Region Policy Optimization (TRPO) algorithm on the Ant walker task. The objective of this task is to control a four-legged robotic walker to walk as far as possible by controlling the individual joint motions. The observation space is a -element vector and the output is 8 continuous-valued actions (joint torques). A screen-shot of this environment is shown in Fig. 1(b).
The translated ANN had four layers (three hidden layers), with , , and ReLU neurons at each hidden layer. The ANN outputs the continuous, real-valued actions which are passed directly to the environment. During translation, the neural noise added at each layer was the clock rate scaled by the average standard deviation of the input signals. Trajectories from episodes of steps each were used as the sample inputs, for a total of samples. The best agreement with the ANN was achieved with replication factor 5, integration time 1s, with an average RMSE of with a standard deviation of . The ANN scored an average of , with a standard deviation of . The SNN scored with a standard deviation of .
III-B CNN Translation
A CNN was trained on the MNIST (Modified NIST) dataset of hand-written digits. Each image in MNIST is a small, px by px, single-channel, black-and-white image of a hand-written numerical digit 0 to 9, with matched classification label. The objective of the task is to correctly identify the pictured digit. An example of the images and labels is shown in Fig. 3. The architecture of the CNN trained for this task is shown in the first three columns of Table II. ReLU functions were used at each layer.
The CNN was trained using stochastic gradient descent with Adam optimization. The full MNIST dataset was split into a training set and a test set, having and images respectively. No zero-centering or normalization pre-processing was done on the dataset. The single-channel input was replicated three times to create a three-channel input so that a general CNN architecture could be used. The CNN was trained on all images across 10 epochs of training using mini-batch sizes of 30 images.
|Filter Dims.||Num. Filters||Stride||FC Dims.|
|Conv Layer 0|
|Conv Layer 1|
|Conv Layer 2|
|Conv Layer 3|
|FC Layer 0||units||n/a||n/a|
This translation method, and neuromorphic architectures in general, cannot accommodate sliding convolutional layers. In neuromorphic chips, the sliding filters are often implemented in parallel cores with redundant weights for each filter location. In order to accommodate this, the CNN was transformed into a large fully-connected network by repeating the filter matrices in the appropriate configuration and reshaping the image into a column vector. The dimensions of the resulting fully-connected MLP are shown along with the original in Table II.
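The conversion of a sliding filter into a fully-connected weight matrix can be sketched for a single-channel, single-filter case. This is an illustrative sketch, not the conversion code used in this work: the function name is hypothetical, and it assumes no padding and the cross-correlation convention common in CNN libraries.

```python
import numpy as np


def conv2d_as_dense(kernel, in_h, in_w, stride=1):
    """Build a dense matrix W such that W @ image.ravel() equals the
    flattened output of sliding `kernel` over an (in_h, in_w) image."""
    k_h, k_w = kernel.shape
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    dense = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j  # one output unit per filter location
            for di in range(k_h):
                for dj in range(k_w):
                    col = (i * stride + di) * in_w + (j * stride + dj)
                    dense[row, col] = kernel[di, dj]  # repeated filter weight
    return dense
```

Each row of the dense matrix holds at most as many non-zero entries as the filter has weights, so the matrix grows with the image size while remaining mostly zeros.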
During translation, the neural noise added at each layer was the clock rate, 1,000 Hz, scaled by the average standard deviation of the input pixel signals. Images from the training set were randomly selected as the input sample-set for each translation. For testing, 1,000 images were randomly selected from the test set. The best agreement between the SNN and CNN was achieved with no size reduction and five-time sample replication, with an average RMSE across the 1,000 samples of with a standard deviation of . In this case, both the CNN and the SNN achieved an average accuracy of on the 1,000 samples.
III-C Summary Results
The effects of the various parameters can be seen in the included figures. Figure 4 shows the effect of increasing the network compression on the network disagreement. The results shown are for 500ms integration time and five-time sample replication. A strong correlation can be seen between compression amount and disagreement in both the Ant walker MLP and the CNN; however, there is only a slight increase in the Cart Pole MLP.
Figure 5 shows the effect of varying the SNN integration time on the relative disagreement with the ANN. The results shown are for full-sized network translation, with five-time sample replication. As can be seen, the disagreement drops sharply as integration time increases from 10ms to 50ms and then levels off above 100ms for all three networks.
For the CNN, we also report the accuracy of the networks on the classification task for varying levels of compression. The results shown in Table III are for 500ms integration time and five-time sample replication. The total number of neurons in the resulting SNNs is reported along with the accuracy of the SNN and the original CNN. The variance in CNN performance is due to the random sampling of the 1,000 image sub-batch used as the test set for each network instance.
The summary table shows the performance of each network with differing sample replication. Each result is reported for an integration time of 500ms and a full-sized translation. For the MLP networks, the average episode score over the test runs is reported. For the CNN, the classification accuracy on the test set is reported.
|No. Neurons||CNN Accuracy||SNN Accuracy|
The results show that layer-wise synapse optimization is able to effectively translate both MLP and CNN neural networks to SNNs compatible with implementation on neuromorphic chips with arbitrary neurons. Varying the translation parameters changes the agreement between ANN and SNN.
For all three networks, sample replication had little effect on either network agreement or task performance. This may be because for all three networks, relatively large sample sets were used which provided redundant coverage of the input space and enough capacity to obviate the need for sample replication. Sample replication may remain a viable method to increase translation robustness for tasks in which sample data is sparse or generation of samples is expensive. Investigation of this hypothesis is left for future work.
In several domains, particularly those involving system control, integration time may be a critical performance parameter. The networks produced in this study were able to retain agreement and task performance for integration times as low as 50ms for a 1,000 Hz system. In general, there is an input-output delay in time-dependent SNNs. In this work, outputs from the first 10% of the total integration time were ignored to account for this delay. This may have resulted in too small a delay window for the low integration-time runs. For task-specific networks, the delay could be measured to set the window more precisely and allow for smaller integration times.
Network compression had a significant effect on network agreement for the Ant walker MLP and the CNN. For both of these networks, average disagreement increased by approximately 60% as network size was decreased from 100% of the original to just 25%. For the Cart Pole MLP, little effect was observed with decrease in layer size, perhaps because the original ANN over-parameterized the relatively simple problem, resulting in very sparse weights. It should be noted that although the CNN agreement degraded significantly, the task performance did not, with the SNN accuracy either within one percent of or exceeding the CNN accuracy in each test case. This suggests that the disagreement observed was the effect of approximately uniform scaling of the outputs, retaining the critical relative relationship between output magnitudes required for classification. The fact that the translated CNN performance improves with increasing compression may be a result of the flattening process used to convert the CNN into a densely connected MLP. This conversion results in large, sparse weight matrices that can be ill-conditioned. The proposed truncation method removes the higher-order singular values, resulting in a lower condition number.
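The output read-out described here can be sketched as follows. The function name and array layout are assumptions for illustration: `trace` holds the filtered network output with one row per simulation step, and the first 10% of steps is discarded before averaging.

```python
import numpy as np


def time_averaged_output(trace, discard_fraction=0.10):
    """Average a simulated output trace over time after dropping the
    initial fraction of steps that covers the input-output delay."""
    trace = np.asarray(trace, dtype=float)
    start = int(round(discard_fraction * trace.shape[0]))
    return trace[start:].mean(axis=0)
```

If the delay were measured for a specific network, `discard_fraction` could be set from that measurement instead of using a fixed 10% window.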
In this work, the same network compression factor was used on every layer in each test case. This may be improved upon by instead applying differing compression factors to each layer. In general, compression works better on networks with sparse weights or with most weights near zero [13, 14]. Therefore, by applying higher compression to these types of layers, and less to more dense layers, more effective compression can be achieved with lower loss in accuracy.
This compression would be especially beneficial for implementation of CNNs. As shown, the expansion of the sliding convolutional layers to fully-connected increased the overall parameter count per-layer, for example from 384 to over 1.5 million for the first layer of our network. Deploying this to any practical neuromorphic chip would be infeasible. At the same time, the method used to convert the convolutional layers to fully-connected results in very sparse weights. For example, for the CNN in this work , , and of the weights from the fully-connected convolutional layer conversions are zero. This result suggests that network compression, when applied to these layers, would be an effective method to compress CNNs for practical neuromorphic deployment.
In this work we presented layer-wise synapse optimization (LSO), a method to convert ANNs layer-by-layer by optimal representation of the network hidden-layer activations. The method may be used to convert ANNs to an SNN with physically realistic neurons. This offers a significant improvement on previous methods, for which limitations on both ANN activations and SNN neurons limited their utility to only digital neuromorphic architectures. In contrast, feature-translation can generate SNNs compatible with digital, analog and hybrid architectures. LSO was shown to be effective on MLP networks and, with appropriate conversion to a dense MLP, convolutional networks. For all the networks tested, good agreement was achieved between the SNN and the original ANN. The networks translated represented policies for two distinct classes of problems: reinforcement learning-based system control and image classification. LSO was able to maintain excellent performance across both of these task types. Initial results also show that small integration times, on the order of 50ms for a 1,000 Hz system, are attainable.
LSO also introduces an optimal compression method for ANN conversion. In contrast to previous methods, which often require replication of ANN nodes, this method allows the network size to be selected by the designer by optimally compressing the ANN layers. This is critical for deployment of the translated SNNs to practical neuromorphic systems, which often impose constraints on core size and connectivity. As seen, conversion of CNNs to MLPs results in extremely large networks that would be infeasible to deploy on many systems. Through use of the novel compression method introduced, these layers can be truncated to manageable dimensions. In this work, compression to a network as small as of the original was accomplished with no loss in performance on the MNIST task.
There are several extensions to this method that are left for future work. Additional spiking neuron types should be investigated. In particular, SNNs representative of digital neuromorphic systems should be generated. End-to-end validation could then be conducted by deploying and testing the network on a neuromorphic chip. Our primary choice for this test is the TrueNorth chip from IBM. Deployment to chip would also require incorporation of the particular constraints imposed by the hardware. For TrueNorth, these are primarily constraints on the resolution of the synaptic weights and on neuron connectedness. These can be addressed through use of a constrained optimizer to solve the LSE problem in the layer translation.
During the study, it was observed that the large matrices generated resulted in high memory cost and slow performance. An alternative solver to the matrix inversion method should be investigated to alleviate this computational cost. Further investigation should also be done to study the effect of network depth on the effectiveness of the translation method. Lossy errors have a tendency to accumulate through SNN layers; hence, the robustness of our method should be studied.
LSO is based on approximating neuron spiking behavior as continuous firing-rate curves. Additional performance may be gained by optimizing network performance directly on the spiking network after training, to fine-tune the weights. Investigation of evolutionary or cross-entropy based approaches for this fine-tuning should be conducted. Additionally, differentiable ANN activation functions that approximate neuromorphic firing rate curves should be investigated. These activation functions would allow for gradient-based training such that the resulting ANN weights would be closer to the weights required for SNN implementation and would likely improve the accuracy of the final network after translation.
The authors would like to thank the U.S. Army Research Laboratory’s High Performance Computing Research Center for the support of this work, especially Manuel Vindiola and Vinnie Monaco. The authors are also grateful to Professor Kwabena Boahen and Eric Kauderer-Abrams from the Stanford Neuromorphics Laboratory for their guidance and advice.
-  W. Maass, “Networks of spiking neurons: The third generation of neural network models,” Neural Networks, vol. 10, no. 9, pp. 1659–1671, 1997.
-  E. Hunsberger and C. Eliasmith, “Training spiking deep networks for neuromorphic hardware,” CoRR, vol. abs/1611.05141, pp. 1–10, 2016.
-  G. Indiveri and S. C. Liu, “Memory and Information Processing in Neuromorphic Systems,” Proceedings of the IEEE, vol. 103, no. 8, pp. 1379–1397, 2015.
-  D. Li, X. Chen, M. Becchi, and Z. Zong, “Evaluating the Energy Efficiency of Deep Convolutional Neural Networks on CPUs and GPUs,” IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom), pp. 477–484, 2016.
-  K. Boahen, “A Neuromorph’s Prospectus,” Computing in Science and Engineering, vol. 19, no. 2, pp. 14–28, 2017.
-  P. U. Diehl, B. U. Pedroni, A. Cassidy, P. Merolla, E. Neftci, and G. Zarrella, “TrueHappiness: Neuromorphic emotion recognition on TrueNorth,” International Joint Conference on Neural Networks, pp. 4278–4285, 2016.
-  S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, C. di Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner, and D. S. Modha, “Convolutional networks for fast, energy-efficient neuromorphic computing,” Proceedings of the National Academy of Sciences, vol. 113, no. 41, pp. 11441–11446, 2016.
-  E. Orhan, “The leaky integrate-and-fire neuron model.” http://corevision.cns.nyu.edu/ eorhan/notes/lif-neuron.pdf, 2012.
-  P. U. Diehl, G. Zarrella, A. Cassidy, B. U. Pedroni, and E. Neftci, “Conversion of artificial recurrent neural networks to spiking neural networks for low-power neuromorphic hardware,” IEEE International Conference on Rebooting Computing, 2016.
-  B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chandrasekaran, J. M. Bussat, R. Alvarez-Icaza, J. V. Arthur, P. A. Merolla, and K. Boahen, “Neurogrid: A mixed-analog-digital multichip system for large-scale neural simulations,” Proceedings of the IEEE, vol. 102, no. 5, pp. 699–716, 2014.
-  P. U. Diehl, D. Neil, J. Binas, M. Cook, S. C. Liu, and M. Pfeiffer, “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” International Joint Conference on Neural Networks, 2015.
-  M. M. Lin, “Discrete Eckart-Young theorem for integer matrices,” SIAM Journal on Matrix Analysis and Applications, vol. 32, no. 4, pp. 1367–1382, 2011.
-  S. Han, H. Mao, and W. J. Dally, “Deep Compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,” CoRR, vol. abs/1510.00149, pp. 1–14, 2015.
-  T. He et al., “Reshaping deep neural network for fast decoding by node-pruning,” International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 245–249, 2014.
-  Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, pp. 2278–2324, Nov 1998.
-  M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. Software available from tensorflow.org.
-  T. Bekolay, J. Bergstra, E. Hunsberger, T. Dewolf, T. C. Stewart, D. Rasmussen, X. Choo, A. R. Voelker, and C. Eliasmith, “Nengo: a Python tool for building large-scale functional brain models.,” Frontiers in Neuroinformatics, vol. 7, no. January, p. 48, 2014.
-  G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” 2016.
-  J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust region policy optimization,” CoRR, vol. abs/1502.05477, 2015.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
-  P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, vol. 345, no. 6197, pp. 668–673, 2014.
-  B. Rueckauer, I. Lungu, Y. Hu, and M. Pfeiffer, “Theory and tools for the conversion of analog to spiking convolutional neural networks,” CoRR, vol. abs/1612.04052, 2016.