Generative models on accelerated neuromorphic hardware

07/06/2018 ∙ by Akos F. Kungl, et al. ∙ University of Heidelberg

The traditional von Neumann computer architecture faces serious obstacles, both in terms of miniaturization and in terms of heat production, with increasing performance. Artificial neural (neuromorphic) substrates represent an alternative approach to tackle this challenge. A special subset of these systems follows the principle of "physical modeling" as they directly use the physical properties of the underlying substrate to realize computation with analog components. While these systems are potentially faster and/or more energy efficient than conventional computers, they require robust models that can cope with their inherent limitations in terms of controllability and range of parameters. A natural source of inspiration for robust models is neuroscience as the brain faces similar challenges. It has been recently suggested that sampling with the spiking dynamics of neurons is potentially suitable both as a generative and a discriminative model for artificial neural substrates. In this work we present the implementation of sampling with leaky integrate-and-fire neurons on the BrainScaleS physical model system. We prove the sampling property of the network and demonstrate its applicability to high-dimensional datasets. The required stochasticity is provided by a spiking random network on the same substrate. This allows the system to run in a self-contained fashion without external stochastic input from the host environment. The implementation provides a basis as a building block in large-scale biologically relevant emulations, as a fast approximate sampler or as a framework to realize on-chip learning on (future generations of) accelerated spiking neuromorphic hardware. Our work contributes to the development of robust computation on physical model systems.


I Introduction

The aggressive pursuit of Moore’s law in conventional computing architectures is slowly but surely nearing its end Waldrop (2016), with difficult-to-overcome physical effects, such as heat production and quantum uncertainty, representing the main limiting factor. The so-called von Neumann bottleneck between processing and memory units represents the main cause, as it effectively limits the speed of these largely serial computation devices. The most promising solutions come in the form of massively parallel devices, many of which are based on brain-inspired computing paradigms Indiveri et al. (2011); Furber (2016), each with its own advantages and drawbacks.

Among the various approaches to such neuromorphic computing, one class of devices is dedicated to the physical emulation of cortical circuits: not only do they instantiate neurons and synapses that operate in parallel and independently of each other, but these units are actually represented by distinct circuits that emulate the dynamics of their biological archetypes Mead (1990); Indiveri et al. (2006); Schemmel et al. (2010); Jo et al. (2010); Pfeil et al. (2013); Qiao et al. (2015); Chang et al. (2016); Wunderlich et al. (2019). Some important advantages of this approach lie in their reduced power consumption and enhanced speed compared to conventional simulations of biological neuronal networks, which represent direct payoffs of replacing the resource-intensive numerical calculation of neuro-synaptic dynamics with the physics of the devices themselves.

However, such computation with analog dynamics, without the convenience of binarization, as used in digital devices, has a downside of its own: variability in the manufacturing process (fixed pattern noise) and temporal noise both lead to reduced controllability of the circuit dynamics. Additionally, one relinquishes much of the freedom permitted by conventional algorithms and simulations, as one is confined by the dynamics and parameter ranges cast into the silicon substrate. The main challenge of exploiting these systems therefore lies in designing performant network models using the available components while maintaining a degree of robustness towards the substrate-induced distortions. Just like for the devices themselves, inspiration for such models often comes from neuroscience, as the brain needs to meet similar demands.

With accumulating experimental evidence Berkes et al. (2011); Pouget et al. (2013); Orbán et al. (2016); Haefner et al. (2016), the view of the brain itself as an analytical computation device has shifted. The stochastic nature of neural activity in vivo is being increasingly regarded as an explicit computational resource rather than a nuisance that needs to be dealt with by sophisticated error-correcting mechanisms or by averaging over populations. Under the assumption that stochastic brain dynamics reflect an ongoing process of Bayesian inference in continuous time, the output variability of single neurons can be interpreted as a representation of uncertainty. Theories of neural sampling Buesing et al. (2011); Hennequin et al. (2014); Aitchison and Lengyel (2016); Petrovici et al. (2016); Kutschireiter et al. (2017) provide an analytical framework for embedding this type of computation in spiking neural networks.

Figure 1: (A) Photograph of a fully assembled wafer module of the BrainScaleS system (dimensions: × × ). One module hosts 384 HICANN chips on 48 reticles, with 512 physical neurons per chip and 220 synapse circuits per neuron. The wafer lies at the center of the module and is itself not visible. FPGAs are responsible for I/O and experiment control. Support PCBs provide power supply for the on-wafer circuits as well as access to neuron membrane voltages. The connectors for inter-wafer (sockets resembling USB-A) and off-wafer/host connectivity (Gigabit-Ethernet sockets) are distributed over all four edges of the main PCB. Mechanical stability is provided by an aluminum frame. (B) The wafer itself is composed of 48 reticles (e.g., red rectangle), each containing 8 HICANN chips (e.g., black rectangle, enlarged in C). Inter-reticle connectivity is added in a post-processing step. (C) On a single HICANN chip, the largest area is occupied by the two synapse matrices which instantiate connections to the neurons positioned in the neuron array. (D-E) Postsynaptic potentials (PSPs) measured on 100 different neuron membranes using the same parameter settings before (D) and after (E) calibration. Despite the clearly observable fixed-pattern noise, calibration brings PSPs closer to the common amplitude value of and time constants around . The PSPs are averaged over 375 presynaptic spikes and smoothed with a Savitzky-Golay filter Savitzky and Golay (1964) to eliminate readout noise.

In this paper we describe the realization of neural sampling with networks of leaky integrate-and-fire neurons Petrovici et al. (2016) on the BrainScaleS accelerated neuromorphic platform Schemmel et al. (2010). With appropriate training, the variability of the analog components can be incorporated into a functional network structure, while the network’s ongoing dynamics make explicit use of the analog substrate’s intrinsic acceleration for Bayesian inference. We demonstrate sampling from low-dimensional target probability distributions with randomly chosen parameters (section III.1) as well as inference in high-dimensional spaces constrained by real-world data, by solving associated classification and constraint satisfaction problems (section III.2). All network components are fully contained on the neuromorphic substrate, with external inputs only used for sensory evidence (visual data). Our work thereby contributes to the search for novel paradigms of information processing that can directly benefit from the features – including some otherwise perceived as flaws – of neuro-inspired physical model systems.

II Methods

II.1 The BrainScaleS system

BrainScaleS Schemmel et al. (2010) is a mixed-signal neuromorphic system that emulates networks of spiking neurons. Each BrainScaleS wafer module consists of a silicon wafer with 384 HICANN (High Input Count Analog Neural Network) chips, see fig. 1 A. On each chip, 512 analog circuits emulate the adaptive exponential integrate-and-fire (AdEx) model Brette and Gerstner (2005); Millner et al. (2010) of spiking neurons with conductance-based synapses. The parameters of the neuron circuits are stored in analog memory cells (floating gates) with resolution, and the synaptic weights are stored in SRAM Schemmel et al. (2010). The dynamics evolve with an acceleration factor of with respect to biological time, i.e., all specific time constants (synaptic, membrane, adaptation) are approximately times smaller than typical corresponding values found in biology Schemmel et al. (2010); Petrovici et al. (2014). To preserve compatibility with related literature Petrovici et al. (2016); Schmitt et al. (2017); Leng et al. (2018), we refer to parameters in the biological domain unless specified otherwise. Spike events are transported digitally and can reach all other neurons on the wafer with the help of an additional redistribution layer that instantiates an on-wafer circuit-switched network Zoschke et al. (2017) (fig. 1 B).

Because of mismatch effects (fixed-pattern noise) inherent to the substrate, the response to incoming stimuli varies from neuron to neuron (fig. 1 D). In order to bring all neurons into the desired regime and reduce the neuron-to-neuron response variability, we employ a standard calibration procedure that is performed only once, during the commissioning of the system Schmitt et al. (2017); Petrovici et al. (2017a). Nevertheless, even after calibration, a significant degree of diversity persists (fig. 1 E). The emulation of functional networks that do not rely on population averaging therefore requires appropriate training algorithms (section III.2).

II.2 Sampling with leaky integrate-and-fire neurons

The theory of sampling with leaky integrate-and-fire neurons Petrovici et al. (2016) describes a mapping between the dynamics of a population of neurons with conductance-based synapses (equations given in table 1) and a Markov-chain Monte Carlo sampling process from an underlying probability distribution over binary random variables (RVs). Each neuron in such a sampling network corresponds to one of these RVs: if the $k$-th neuron has spiked in the recent past and is currently refractory, then it is considered to be in the on-state $z_k = 1$, otherwise it is in the off-state $z_k = 0$ (fig. 2 A, B). With appropriate synaptic parameters, such a network can approximately sample from a Boltzmann distribution defined by

$$p(\mathbf{z}) = \frac{1}{Z} \exp\left( \frac{1}{2} \mathbf{z}^T \mathbf{W} \mathbf{z} + \mathbf{z}^T \mathbf{b} \right), \tag{1}$$

where $Z$ is the partition sum, $\mathbf{W}$ a symmetric, zero-diagonal effective weight matrix and $b_k$ the effective bias of the $k$-th neuron.
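For small networks, a Boltzmann distribution of this kind can be evaluated exactly by enumerating all binary states. The following is a minimal sketch, assuming the standard form $p(\mathbf{z}) \propto \exp(\frac{1}{2}\mathbf{z}^T\mathbf{W}\mathbf{z} + \mathbf{z}^T\mathbf{b})$; the function name and the example parameters are illustrative and not taken from the paper.

```python
import itertools
import numpy as np

def boltzmann_distribution(W, b):
    """Exact p(z) over all binary states z in {0,1}^n (feasible only for small n)."""
    n = len(b)
    states = np.array(list(itertools.product([0, 1], repeat=n)))
    # Exponent 0.5 * z^T W z + z^T b, evaluated for every state at once
    neg_energy = 0.5 * np.einsum("si,ij,sj->s", states, W, states) + states @ b
    unnormalized = np.exp(neg_energy)
    # Dividing by the sum over all states applies the partition sum Z
    return states, unnormalized / unnormalized.sum()

# Illustrative two-unit example (parameters are arbitrary)
W = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([-0.5, -0.5])
states, p = boltzmann_distribution(W, b)
```

The enumeration scales as $2^n$, so it is only practical for networks of the size used as low-dimensional benchmarks.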

In the original model, each neuron receives excitatory and inhibitory Poisson input. This plays two important roles: it transforms a deterministic LIF neuron into a stochastic firing unit and induces a high-conductance state (Destexhe et al., 2003), which symmetrizes the neural activation function by reducing the effective membrane time constant. A mapping of this activation function to the logistic function provides the translation from the dimensionless weights and biases of the target distribution to the corresponding biological parameters of the spiking network Petrovici (2016).

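Although different in their dynamics, such sampling spiking networks (SSNs) function similarly to (deep) Boltzmann machines Hinton et al. (1984), which makes them applicable to the same class of machine learning problems Leng et al. (2018). Training can be done using the wake-sleep algorithm Hinton et al. (1995), which implements maximum-likelihood learning on the training set:

$$\Delta b_i = \eta \left( \langle z_i \rangle_{\mathrm{data}} - \langle z_i \rangle_{\mathrm{model}} \right), \tag{2}$$
$$\Delta W_{ij} = \eta \left( \langle z_i z_j \rangle_{\mathrm{data}} - \langle z_i z_j \rangle_{\mathrm{model}} \right), \tag{3}$$

where $\langle \cdot \rangle_{\mathrm{model}}$ and $\langle \cdot \rangle_{\mathrm{data}}$ represent averages over the sampled (model) and target (data) distribution, respectively, and $\eta$ is the learning rate.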
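A wake-sleep update of this form can be sketched from two sets of binary state samples. This is an illustrative sketch, not the authors' code: it assumes the updates are the learning rate times the difference between data and model averages of $z_i$ and $z_i z_j$, and the function name and default learning rate are hypothetical.

```python
import numpy as np

def wake_sleep_update(z_data, z_model, eta=0.1):
    """One wake-sleep step from binary state samples.

    z_data, z_model: arrays of shape (n_samples, n_units) with entries 0 or 1.
    eta: learning rate (the default is an arbitrary placeholder).
    """
    # Pairwise averages <z_i z_j> under the data and the model distribution
    corr_data = z_data.T @ z_data / len(z_data)
    corr_model = z_model.T @ z_model / len(z_model)
    dW = eta * (corr_data - corr_model)
    np.fill_diagonal(dW, 0.0)  # the weight matrix has a zero diagonal
    # Single-unit averages <z_i> drive the bias update
    db = eta * (z_data.mean(axis=0) - z_model.mean(axis=0))
    return dW, db
```

When the model statistics match the data statistics, both updates vanish, which is the fixed point of the learning rule.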

In order to enable a fully-contained neuromorphic emulation on the BrainScaleS system, the original model had to be modified. The changes in the network structure, noise generation mechanism and learning algorithm are described in section II.3.

For low-dimensional, fully specified target distributions, we used the Kullback-Leibler divergence (DKL, Kullback and Leibler, 1951) as a measure of discrepancy between the sampled ($p$) and the target ($p^*$) distributions:

$$D_{\mathrm{KL}}(p \,\|\, p^*) = \sum_{\mathbf{z}} p(\mathbf{z}) \ln \frac{p(\mathbf{z})}{p^*(\mathbf{z})}. \tag{4}$$

This was done in part to preserve comparability with previous studies Buesing et al. (2011); Petrovici et al. (2015, 2016), but also because the DKL is the natural loss function for maximum likelihood learning. For high-dimensional datasets, we used the error rate (ratio of misclassified images in the test set) for discriminative tasks and the mean squared error (MSE) between reconstruction and original image for pattern completion tasks. The MSE is defined as

$$\mathrm{MSE} = \frac{1}{N_{\mathrm{pixels}}} \sum_{k=1}^{N_{\mathrm{pixels}}} \left( z_k^{\mathrm{data}} - z_k^{\mathrm{recon}} \right)^2, \tag{5}$$

where $z_k^{\mathrm{data}}$ is the reference data value, $z_k^{\mathrm{recon}}$ is the model reconstruction and the sum goes over the $N_{\mathrm{pixels}}$ pixels to be reconstructed by the SSN.
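Both measures are straightforward to compute from sampled data. The sketch below assumes the DKL is taken from the sampled to the target distribution and that states with zero sampled probability contribute nothing; the `eps` guard against a zero target probability is an added assumption, not part of the definition.

```python
import numpy as np

def dkl(p_sampled, p_target, eps=1e-12):
    """Kullback-Leibler divergence sum_z p(z) ln(p(z) / p*(z))."""
    p = np.asarray(p_sampled, dtype=float)
    q = np.asarray(p_target, dtype=float)
    visited = p > 0  # states never sampled contribute zero to the sum
    return float(np.sum(p[visited] * np.log(p[visited] / np.maximum(q[visited], eps))))

def mse(z_data, z_recon):
    """Mean squared error over the pixels to be reconstructed."""
    z_data = np.asarray(z_data, dtype=float)
    z_recon = np.asarray(z_recon, dtype=float)
    return float(np.mean((z_data - z_recon) ** 2))

# For binary pixels the MSE equals the fraction of wrongly reconstructed pixels
example_mse = mse([1, 0, 1, 1], [1, 1, 1, 0])  # two of four pixels differ
```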

Figure 2: Sampling with leaky integrate-and-fire (LIF) neurons. (A) Schematic of a spiking sampling network (SSN) with 5 neurons. (B) Example membrane potentials of three neurons in the network. Following a spike, the refractory mechanism effectively clamps the membrane potential to the reset value for a duration . During this time, the RV corresponding to that neuron is in the state . At any point in time, the state sampled by the network can therefore be read out directly from its output spikes. (C) Based on this framework, hierarchical sampling networks can be built, which can be trained on real-world data.
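The readout described in this caption can be sketched as follows: a neuron is in state 1 from each spike until the end of the refractory period and in state 0 otherwise. The function name, the time grid and the value of `tau_ref` used in the example are hypothetical.

```python
import numpy as np

def states_from_spikes(spike_times, tau_ref, t_grid):
    """Convert spike trains into binary network states.

    spike_times: list of 1-D arrays, one per neuron (same time unit as t_grid).
    tau_ref: refractory period during which a neuron counts as "on".
    t_grid: 1-D array of readout times.
    Returns an integer array of shape (len(t_grid), n_neurons).
    """
    z = np.zeros((len(t_grid), len(spike_times)), dtype=int)
    for k, spikes in enumerate(spike_times):
        for t_spike in spikes:
            # Neuron k is refractory, hence "on", in [t_spike, t_spike + tau_ref)
            z[(t_grid >= t_spike) & (t_grid < t_spike + tau_ref), k] = 1
    return z

# Two neurons: the first spikes once at t = 1, the second stays silent
z = states_from_spikes([np.array([1.0]), np.array([])],
                       tau_ref=2.0, t_grid=np.arange(0.0, 5.0, 1.0))
```

Counting how often each row of `z` occurs yields the sampled distribution over network states.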

II.3 Experimental setup

Figure 3: Experimental setup. Each sampling unit is instantiated by a pair of neurons on the hardware. The bias neuron b is configured with a suprathreshold leak potential and generates a regular spike train that impinges on the sampling neuron s, thereby serving as a bias, controlled by . (A) As a benchmark, we provided each sampling neuron with private, off-substrate Poisson spike sources. (B) Alternatively, in order to reduce the I/O load, the noise was generated by a random network (RN). The RN consisted of randomly connected inhibitory neurons with . Connections were randomly assigned, such that each sampling neuron received a fixed number of excitatory and inhibitory presynaptic partners (table 1). (C) Exemplary activation function of a single sampling neuron with Poisson noise and with an RN as a function of the bias weight. The standard deviation of the trial-to-trial variability is on the order of for both activation functions, hence the error bars are too small to be shown. The inset shows the membrane trace of the corresponding bias neuron. (D-E) Diversity of activation functions on a calibrated BrainScaleS system. The figures show histograms of the width and the midpoint of the activation functions with Poisson noise and with an RN, calculated by fitting the logistic function to the data.

The physical emulation of a network model on an analog neuromorphic substrate is not as straightforward as a software simulation, as it needs to comply with the constraints imposed by the emulating device. Often, it may be tempting to fine-tune the hardware to a specific configuration that fits one particular network, e.g., by selecting specific neuron and synapse circuits that operate optimally given a particular set of network parameters, or by manually tweaking individual hardware parameters after the network has been mapped and trained on the substrate. Here, we explicitly refrained from any such interventions in order to guarantee the robustness and scalability of our results.

All experiments were carried out on a single module of the BrainScaleS system using a subset of the available HICANN chips. The network setup was specified in the BrainScaleS-specific implementation of PyNN Davison et al. (2009) and the standard calibration Schmitt et al. (2017) was used to set the analog parameters. The full setup consisted of two main parts: the SSN and the source of stochasticity.

In the original sampling model Petrovici et al. (2016), in order to affect biases, the wake-sleep algorithm (eq. 2) requires access to at least one reversal potential (, , or ), which are all controlled by analog memory cells. Given that rewriting analog memory cells is both less precise and slower than rewriting the SRAM cells controlling the synaptic weights, we modified our SSNs to implement biases by means of synaptic weights. To this end, we replaced individual sampling neurons by sampling units, each realized using two hardware neurons (fig. 3 A, B). Like in the original model, a sampling neuron was set up to encode the corresponding binary RV. Each sampling neuron was accompanied by a bias neuron set up with a suprathreshold leak potential that ensured regular firing (fig. 3 C, inset). Each bias neuron projected to its target sampling neuron with both an excitatory and an inhibitory synapse (with independent weights), thus inducing a controllable offset of the sampling neuron’s average membrane potential. Because excitatory and inhibitory inputs are routed through different circuits for each neuron, two types of synapses were required to allow the sign of the effective bias to change during training. For larger networks, in order to optimize the allocation of hardware resources, we shared the use of bias neurons among multiple sampling neurons (connected via distinct synapses). Similarly, in order to allow sign switches during training, connections between sampling neurons were implemented by pairs of synapses (one excitatory and one inhibitory) as well.

The dynamics of the sampling neurons were rendered stochastic in two different ways. The first setup served as a benchmark and represented a straightforward implementation of the theoretical model from Petrovici et al. (2016), with Poisson noise generated on the host computer and fed in during the experiment (fig. 3 A). In the second setup, we used the spiking activity of a sparse recurrent random network (RN) of inhibitory neurons, instantiated on the same wafer, as a source of noise (fig. 3 B). The mutual inhibition ensured a relatively constant (sub)population firing rate with suitable random statistics that can replace the ideal Poisson noise in our application. Projections from the RN to the SSN were chosen as random and sparse; this resulted in weak, but non-zero shared-input correlations, which can be compensated, however, by appropriate training Bytschok et al. (2017); Dold et al. (2018). This allowed the hardware-emulated RN to replace the Poisson noise required by the theoretical model.

With these noise-generating mechanisms, the activation function of the neurons, defined by the firing rate as a function of the bias weight , took on an approximately logistic shape, as required by the sampling model (fig. 3 C). Due mainly to the variability of the hardware circuits, the exact shape of this activation function varied significantly between neurons (fig. 3 D-E). Effectively, this means that initial weights and biases were set randomly, but also that the effective learning rates were different for each neuron. However, as we show below, this did not prevent the training procedure from converging to a good solution. This robustness with respect to substrate variability represents an important result of this work.
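The midpoint and width of such an activation function can be estimated by fitting a logistic curve to the measured on-state probabilities. The text does not specify the fitting procedure; the sketch below swaps in a simple linear fit on the logit scale rather than a nonlinear least-squares fit, and the parameter names `w0` (midpoint) and `alpha` (width) as well as the synthetic data are assumptions.

```python
import numpy as np

def logistic(w, w0, alpha):
    """Logistic activation with midpoint w0 and width alpha."""
    return 1.0 / (1.0 + np.exp(-(w - w0) / alpha))

def fit_logistic(w, p_on, eps=1e-6):
    """Estimate (w0, alpha) from a straight-line fit to logit(p_on).

    logit(p) = (w - w0) / alpha is linear in w, so slope = 1 / alpha and
    intercept = -w0 / alpha. Probabilities are clipped away from 0 and 1.
    """
    p = np.clip(np.asarray(p_on, dtype=float), eps, 1.0 - eps)
    slope, intercept = np.polyfit(w, np.log(p / (1.0 - p)), 1)
    alpha = 1.0 / slope
    w0 = -intercept * alpha
    return w0, alpha

# Synthetic, noise-free activation function (values are arbitrary)
w_bias = np.linspace(-10, 10, 21)
w0_fit, alpha_fit = fit_logistic(w_bias, logistic(w_bias, 1.5, 2.0))
```

With noisy measurements, points near saturation dominate the logit-scale fit, so a weighted or nonlinear fit would likely be preferable in practice.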

To train the networks on a neuromorphic substrate without embedded plasticity, we used a training concept often referred to as in-the-loop training Schmuker et al. (2014); Esser et al. (2016); Schmitt et al. (2017). With the setup discussed above, the only parameters changed during training were digital, namely the synaptic weights between sampling neurons and the weights between bias and sampling neurons. This allowed us to work with a fixed set of analog parameters, which significantly improved the precision and speed of reconfiguration during learning compared to using the analog storage instead. The updates of the digital parameters (synaptic weights) were calculated on the host computer based on the wake-sleep algorithm (eq. 2) but using the spiking activity measured on the hardware. During the iterative procedure, the values of the weights were saved and updated as double-precision floating-point variables, followed by (deterministic) discretization in order to comply with the single-synapse weight resolution of . Clamping was done by injecting regular spike trains with frequency from the host through 5 synapses simultaneously, excitatory for and inhibitory for . These multapses were needed to exceed the upper limit of single synaptic weights and thus ensure proper clamping.
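The discretization step of in-the-loop training can be sketched as follows: the host keeps signed floating-point weights, and each is rounded and split into a pair of non-negative integer weights, one for the excitatory and one for the inhibitory synapse. The maximum integer weight `W_MAX_HW`, the `scale` factor and the use of rounding as the deterministic discretization are assumptions made for illustration.

```python
import numpy as np

W_MAX_HW = 15  # hypothetical largest integer weight of a single synapse

def to_hardware_weights(w_float, scale=1.0):
    """Map signed float weights to (excitatory, inhibitory) integer weights.

    Exactly one synapse of each pair is non-zero, so the sign of the
    effective weight can change during training.
    """
    w_int = np.clip(np.rint(np.asarray(w_float) / scale), -W_MAX_HW, W_MAX_HW).astype(int)
    w_exc = np.where(w_int > 0, w_int, 0)
    w_inh = np.where(w_int < 0, -w_int, 0)
    return w_exc, w_inh

w_exc, w_inh = to_hardware_weights([3.4, -2.6, 20.0, 0.2])
```

Keeping the floating-point copy on the host lets small updates accumulate across iterations even when a single update is below the hardware resolution.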

III Results

III.1 Learning to approximate a target distribution

Figure 4: Emulated SSNs sampling from target Boltzmann distributions. Sampled distributions are depicted in blue for setups with Poisson noise and in orange for setups using RNs. Target distributions shown in dark yellow. Data was gathered from 150 runs with random initializations. Median values are shown as dark colors and interquartile ranges as either light colors or error bars. (A) Improvement of sampled distributions during training. The observed variability after convergence (during the plateau) is not due to noise in the system, but rather a consequence of the weight discretization: when the ideal (target) weights lie approximately mid-way between two consecutive integer values on the hardware, training leads to oscillations between these values. The parameter configuration showing the best performance during a training run – which, due to the abovementioned oscillations, was not necessarily the one in the final iteration – was chosen as the end result of the training phase. Averages of these results are shown as dashed lines. (B) Convergence of sampled distributions for the trained SSNs. (C) and (D) Sampled joint and marginal distributions of the trained SSNs after , respectively. (E) Consistency of training results for different target distributions. Here, we show a representative selection of 6 distributions with 10 independent runs per distribution. The full set of experiments is shown in appendix C. The box highlighted in blue corresponds to the target distribution used in the other panels of fig. 4. (F) Convergence of conditional distributions for the trained SSNs. (G) and (H) Sampled conditional joint and marginal distributions of the trained SSNs after , respectively.

The experiments described in this section serve as a general benchmark for the ability of our hardware-emulated SSNs and the associated training algorithm to approximate fully specified target Boltzmann distributions. The viability of our proposal to simultaneously embed deterministic RNs as sources of pseudo-stochasticity is tested by comparing the sampling accuracy of RN-driven SSNs to the case where noise is injected from the host as perfectly uncorrelated Poisson spike trains.

Target distributions over 5 RVs were chosen by sampling weights and biases from a Beta distribution centered around zero. Similarly to previous studies Petrovici et al. (2016); Jordan et al. (2017), by giving preference to larger absolute values of the target distribution’s parameters, we thereby increased the probability of instantiating rougher, more interesting energy landscapes. The initial weights and biases of the network were sampled from a uniform distribution over the possible hardware weights. Due to the small size of the state space, the “wake” component of the wake-sleep updates could be calculated analytically as $\langle z_i z_j \rangle_{\mathrm{data}} = p^*(z_i = 1, z_j = 1)$ and $\langle z_i \rangle_{\mathrm{data}} = p^*(z_i = 1)$ by explicit marginalization of the target distribution over non-relevant RVs.
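As a sketch of this procedure, the snippet below draws a 5-unit target distribution and computes the analytic wake terms by marginalization. The Beta parameters and the scaling are not given in this text; `A`, `B` and `SCALE` are hypothetical placeholders chosen so that samples are centered around zero and favor larger absolute values.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 5
A, B, SCALE = 0.5, 0.5, 2.0  # hypothetical Beta parameters and scaling

def draw(shape):
    """Beta samples shifted to be centered around zero."""
    return SCALE * (rng.beta(A, B, size=shape) - 0.5)

W = np.triu(draw((n, n)), k=1)
W = W + W.T  # symmetric weight matrix with zero diagonal
b = draw(n)

# Exact target distribution over all 2^n binary states
states = np.array(list(itertools.product([0, 1], repeat=n)))
neg_energy = 0.5 * np.einsum("si,ij,sj->s", states, W, states) + states @ b
p = np.exp(neg_energy)
p /= p.sum()

# Wake terms by marginalization: <z_i> = p(z_i = 1), <z_i z_j> = p(z_i = 1, z_j = 1)
mean_data = p @ states
corr_data = states.T @ (states * p[:, None])
```

Because the state space has only 32 states, these data averages are exact and need not be estimated by sampling.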

For training, we used 500 iterations with sampling time per iteration. Afterwards, the parameter configuration that produced the lowest DKL was more thoroughly re-tested in a longer () experiment. To study the ability of the trained networks to perform Bayesian inference, we clamped two of the five neurons to fixed values and compared the sampled conditional distribution to the target conditional distribution. Results for one of these target distributions are shown in fig. 4.

On average, with Poisson noise, the training showed fast convergence during the first 20 iterations, followed by fine-tuning and full convergence within 200 iterations. As expected, the convergence of the setups using RNs was significantly slower due to the need to overcome the additional background correlations, but they were still able to achieve similar performance (fig. 4 A).

In both setups, during the test run, the trained SSNs converged to the target distribution following an almost identical power law, which indicates similar mixing properties (fig. 4 B). For longer sampling durations (), the systematic deviations from the target distributions become visible and the DKL reaches the same plateau as observed during training. Figure 4 C and D respectively show the sampled joint and marginal distributions after convergence. These observations remained consistent across a set of 20 different target distributions (see fig. 4 E for a representative selection of training results and appendix C for more details).

Similar observations hold for the inference experiments. Due to the smaller state space, convergence happened faster (fig. 4 F). The corresponding joint and marginal distributions are shown in fig. 4 G and H, respectively. The lower accuracy of these distributions is mainly because of the asymmetry of the effective synaptic weights caused by the variability of the substrate, towards which the learning algorithm is agnostic. The training took wall-clock time, including the pure experiment runtime, the initialization of the hardware and the calculation of the updates on the host computer (total turn-over time of the training). This corresponds to a speed-up factor of 100 compared to the equivalent of biological real time. While the nominal speed-up remained intact for the emulation of network dynamics, the total speed-up factor was reduced due to the overhead imposed by network (re)configuration and I/O between the host and the neuromorphic substrate.

III.2 Learning from data

Figure 5: Behavior of hierarchical SSNs trained on data. Top row: rMNIST; middle row: rFMNIST; bottom row: exemplary setups for the partial occlusion scenarios. (A-B) Exemplary images from the rMNIST (A) and rFMNIST (B) datasets used for training and comparison to their MNIST and FMNIST originals. (C-D) Training with the hardware in the loop after translation of pre-trained parameters. Confusion matrices after training shown as insets. Performance of the reference RBMs shown as dashed brown lines. Results are given as median and interquartile values over 10 test runs. (E-F) Pattern completion and (G-H) error ratio of the inferred label for partially occluded images (blue: patch; red: salt&pepper). Solid lines represent median values and shaded areas show interquartile ranges over 250 test images per class. Performance of the reference RBMs shown as dashed lines. As a reference, we also show the error ratio of the SSNs on unoccluded images in (G) and (H). (I) Snapshots of the pattern completion experiments: O - original image, C - clamped image (red and blue pixels are occluded), R - response of the visible layer, L - response of the label layer. (J) Exemplary temporal evolution of a pattern completion experiment with patch occlusion. For better visualization of the activity in the visible layer in (J) and (I), we smoothed out its discretized response to obtain grayscale pixel values, by convolving its state vector with a box filter of width.

In order to obtain models of labeled data, we trained hierarchical SSNs analogously to restricted Boltzmann machines (RBMs). Here, we used two different datasets: a reduced version of the MNIST LeCun et al. (1998) and the fashion MNIST Xiao et al. (2017) datasets, which we abbreviate as rMNIST and rFMNIST in the following. The images were first reduced with nearest-neighbor resampling (misc.imresize function in the SciPy library Jones et al. (2014)) and then binarized around the median gray value over each image. We used all images from the original datasets (approx. 6000 per class) from 4 classes (0, 1, 4, 7) for rMNIST and 3 classes (T-shirts, Trousers, Sneakers) for rFMNIST (fig. 5 A-B). The emulated SSNs consisted of 3 layers, with 144 visible, 60 hidden and either 4 label units for rMNIST or 3 for rFMNIST.
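The preprocessing can be sketched with plain NumPy. The `misc.imresize` function named above has since been removed from SciPy, so this sketch substitutes index-based nearest-neighbor resampling, which may pick slightly different source pixels. The 12 × 12 target size is inferred from the 144 visible units, and taking the median over the reduced image with a strict "greater than" threshold is an assumption.

```python
import numpy as np

def reduce_and_binarize(image, out_shape=(12, 12)):
    """Nearest-neighbor downsampling followed by median binarization.

    image: 2-D array of gray values (e.g. 28 x 28).
    out_shape: target size; (12, 12) is inferred, not stated explicitly.
    """
    image = np.asarray(image)
    # Pick one source row and column per target pixel
    rows = (np.arange(out_shape[0]) * image.shape[0]) // out_shape[0]
    cols = (np.arange(out_shape[1]) * image.shape[1]) // out_shape[1]
    small = image[np.ix_(rows, cols)]
    # Pixels strictly above the median gray value become 1, all others 0
    return (small > np.median(small)).astype(int)

# Synthetic 28 x 28 gradient image as a stand-in for a dataset sample
binary = reduce_and_binarize(np.arange(28 * 28).reshape(28, 28))
```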

Pre-training was done on simulated classical RBMs using the CAST algorithm Salakhutdinov (2010). We use the performance of these RBMs in software simulations using Gibbs sampling as a reference for the results obtained with the hardware-emulated SSNs. After pre-training, we mapped these RBMs to approximately equivalent SSNs on the hardware, using an empirical translation factor based on an average activation function (fig. 3 C) to calculate the initial hardware synaptic weights from weights and biases of the RBMs. Especially for rMNIST, this resulted in a significant deterioration of the classification performance (fig. 5 C). After mapping, we continued training using the wake-sleep algorithm, with the hardware in the loop. While in the previous task it was possible to calculate the data term explicitly, it now had to be sampled as well. In order to ensure proper clamping, the synapses from the hidden to the label layer and from the hidden layer to the visible layer were turned off during the wake phase.

The SSNs were tested for both their discriminative and their generative properties. For classification, the visible layer was clamped to images from the test set (black pixels correspond to and white pixels to ). Each image was presented for 500 biological milliseconds, which corresponds to wall-clock time. The neuron in the label layer with the highest firing rate was interpreted as the label predicted by the model. For both datasets, training was able to restore the performance lost in the translation of the abstract RBM to the hardware-emulated SSN. The emulated SSNs achieved error rates of on rMNIST and on rFMNIST. These values are close to the ones obtained by the reference RBMs: on rMNIST and on rFMNIST (fig. 5 C-D, confusion matrices shown as insets).
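The label readout and the error rate can be sketched as below. Because every label neuron is observed for the same presentation time, the highest spike count identifies the highest firing rate. The function names and the example numbers are illustrative only.

```python
import numpy as np

def predict_label(label_spike_counts):
    """Index of the label neuron that fired most during the presentation."""
    return int(np.argmax(label_spike_counts))

def error_rate(predicted, true):
    """Fraction of misclassified images."""
    predicted, true = np.asarray(predicted), np.asarray(true)
    return float(np.mean(predicted != true))

# Hypothetical spike counts of four label neurons for one image
label = predict_label([3, 10, 7, 1])
```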

The gross wall-clock time needed to classify the 4125 images in the rMNIST test set was ( per image, speed-up). For the 3000 images in the rFMNIST test set, the emulation ran for ( per image; speed-up). This subsumes the runtime of the BrainScaleS software stack, hardware configuration and the network emulation. The runtime of the software-stack includes the translation from a PyNN-based network description to a corresponding hardware configuration. As before, the difference between the nominal acceleration factor and the effective speed-up stems from the I/O and initialization overhead of the hardware system.

Figure 6: Generated images during guided dreaming. The visible state space, along with the position of the generated images within it, was projected to two dimensions using t-SNE Maaten and Hinton (2008). The thin lines connect consecutive samples. (A) rMNIST; (B) rFMNIST.

To test the generative properties of our emulated SSNs, we set up two scenarios requiring them to perform pattern completion. For each class, 250 incomplete images were presented as inputs to the visible layer. For each image, of visible neurons received no input, with the occlusion following two different schemes: salt&pepper (upper row in fig. 5 I) and patch (lower row in fig. 5 I). Each image was presented for . In order to remove any initialization bias resulting from preceding images, random input was applied to the visible layer between consecutive images.
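The two occlusion schemes can be sketched as boolean masks over the visible layer, where `True` marks pixels that receive no input. The occluded fraction, the 12 × 12 layout and the patch position used in the example are hypothetical, since the exact values are not given in this text.

```python
import numpy as np

def salt_and_pepper_mask(shape, fraction, rng):
    """Occlude a random subset of pixels covering the given fraction."""
    n = shape[0] * shape[1]
    mask = np.zeros(n, dtype=bool)
    occluded = rng.choice(n, size=int(round(fraction * n)), replace=False)
    mask[occluded] = True
    return mask.reshape(shape)

def patch_mask(shape, top, left, height, width):
    """Occlude one contiguous rectangular patch."""
    mask = np.zeros(shape, dtype=bool)
    mask[top:top + height, left:left + width] = True
    return mask

rng = np.random.default_rng(0)
sp = salt_and_pepper_mask((12, 12), fraction=0.25, rng=rng)  # placeholder fraction
patch = patch_mask((12, 12), top=0, left=0, height=6, width=6)
```

Both example masks occlude the same number of pixels, which keeps the two schemes comparable.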

Reconstruction accuracy was measured using the mean squared error (MSE) between the reconstructed and original occluded pixels. For binary images, as in our case, the MSE reflects the average ratio of mis-reconstructed to total reconstructed pixels. Simultaneously, we also recorded the classification accuracy on the partially occluded images. After stimulus onset, the MSE converged from chance level () to its minimum () within (fig. 5 E-F). Given an average refractory period of (fig. 3 C), this suggests that the network was able to react to the input with no more than 5 spikes per neuron. For all studied scenarios, the reconstruction performance of the emulated SSNs closely matched the one achieved by the reference RBMs. Examples of image reconstruction are shown in fig. 5 I-J for both datasets and occlusion scenarios. The classification performance deteriorated only slightly compared to non-occluded images and also remained close to the performance of the reference RBMs (fig. 5 G-H). The temporal evolution of the classification error closely followed that of the MSE.

As a further test of the generative abilities of our hardware-emulated SSNs, we recorded the images produced by the visible layer during guided dreaming. In this task, the visible and hidden layers of the SSN evolved freely without external input, while the label layer was periodically clamped with external input such that exactly one of the label neurons was active at any time (enforced one-hot coding). In a perfect model, this would cause the visible layer to sample only from configurations compatible with the hidden layer, i.e., from images corresponding to that particular class. Between the clamping of consecutive labels, we injected ms random input to the visible layer to facilitate the changing of the image. The SSNs were able to generate varied and recognizable pictures, within the limits imposed by the low resolution of the visible layer (fig. 6). For rMNIST, all used classes appeared in correct correspondence to the clamped label. For rFMNIST, images from the class “Sneakers” were not always triggered by the corresponding guidance from the label layer, suggesting that the learned modes in the energy landscape are too deep, and sneakers too dissimilar to T-shirts and Trousers, to allow good mixing during guided dreaming.
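The clamping protocol can be sketched in software. The following is not the spiking hardware emulation: it is a hypothetical analogue that uses plain Gibbs sampling in a small binary network with visible, hidden and label layers, where only the hidden layer connects to the other two. Layer sizes, weights and all names are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_layer(inputs, weights, biases, rng):
    """Sample binary units given the states of the units they connect to.

    weights[i][j] couples input unit i to output unit j.
    """
    out = []
    for j, b in enumerate(biases):
        act = b + sum(s * weights[i][j] for i, s in enumerate(inputs))
        out.append(1 if rng.random() < sigmoid(act) else 0)
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def guided_dreaming(w_vh, w_lh, b_v, b_h, label_schedule, steps_per_label, rng):
    """Clamp one label at a time (one-hot) and record the visible states."""
    n_v, n_l = len(w_vh), len(w_lh)
    w_hv = transpose(w_vh)
    images = []
    for label in label_schedule:
        one_hot = [1 if k == label else 0 for k in range(n_l)]
        # Random visible state between labels, mimicking the random input
        # used to help the network leave the previous mode.
        v = [rng.randint(0, 1) for _ in range(n_v)]
        for _ in range(steps_per_label):
            h = sample_layer(v + one_hot, w_vh + w_lh, b_h, rng)
            v = sample_layer(h, w_hv, b_v, rng)
            images.append((label, v))
    return images


rng = random.Random(1)
n_v, n_h, n_l = 6, 4, 3  # hypothetical layer sizes
w_vh = [[rng.gauss(0, 1) for _ in range(n_h)] for _ in range(n_v)]
w_lh = [[rng.gauss(0, 1) for _ in range(n_h)] for _ in range(n_l)]
dream = guided_dreaming(w_vh, w_lh, [0.0] * n_v, [0.0] * n_h, [0, 1, 2], 5, rng)
```

With untrained random weights the generated states carry no class structure; the sketch only shows the order of clamping, randomization and free sampling.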

IV Discussion

This manuscript presents the first scalable demonstration of sampling-based probabilistic inference with spiking networks on a highly accelerated analog neuromorphic substrate. We trained fully connected spiking networks to sample from target distributions and hierarchical spiking networks as discriminative and generative models of high-dimensional input data. Despite the inherent variability of the analog substrate, we were able to achieve performance levels comparable to those of software simulations in several benchmark tasks, while maintaining a significant overall acceleration factor compared to systems that operate in biological real time. Importantly, by co-embedding the generation of stochasticity within the same substrate, we have demonstrated the viability of a fully embedded neural sampling model with significantly reduced demands on off-substrate I/O bandwidth. In the following, we address the limitations of our study, point out links to related work and discuss its implications within the greater context of computational neuroscience and bio-inspired AI.

IV.1 Limitations and constraints

The most notable limitation imposed by the current commissioning state of the BrainScaleS system was on the size of the emulated SSNs. At the time of writing, errors in the manufacturing and post-production processes caused a reduction of the usable hardware real-estate to a patchy and non-contiguous area on the substrate, thereby strongly limiting the maximum connectivity between different locations within this area. In order to limit synapse loss to small values (below ), we restricted ourselves to using a small but contiguous functioning area of the wafer, which in turn limited the maximum size of our SSNs and noise-generating RNs. Ongoing improvements in post-production and assembly, as well as in the mapping and routing software, are expected to enhance on-wafer connectivity and thereby automatically increase the size of emulable networks, as the architecture of our SSNs scales naturally to such an increase in hardware resources.

To a lesser extent, the sampling accuracy was also affected by the limited precision of hardware parameter control. The writing of analog parameters exhibits significant trial-to-trial variability; in any given trial, this leads to a heterogeneous substrate, which is known to reduce the sampling accuracy Probst et al. (2015). Most of this variability is compensated during learning, but the resolution of the synaptic weights ultimately limits the ability of the SSN to approximate target distributions. This leads to the “jumping” behavior of the DKL in the final stages of learning (fig. 4 A). However, the penalty imposed by a limited synaptic weight resolution is known to decrease for larger deep networks with more and larger hidden layers, both spiking and non-spiking Courbariaux et al. (2015); Petrovici et al. (2017b).
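The effect of a limited weight resolution can be illustrated with a minimal sketch. It assumes a linear mapping of the weight magnitude onto integer hardware values in [0, 15] (the range listed in table 2), with the sign carried by the synapse type; the scale `w_max` and the function names are hypothetical and do not describe the actual BrainScaleS mapping.

```python
def quantize_weight(w, w_max, levels=15):
    """Map a real-valued weight to (sign, integer value in 0..levels).

    Magnitudes above w_max saturate at the largest hardware value.
    """
    magnitude = min(abs(w), w_max)
    q = round(magnitude / w_max * levels)
    sign = 1 if w >= 0 else -1
    return sign, q


def dequantize(sign, q, w_max, levels=15):
    """Real-valued weight represented by a quantized (sign, value) pair."""
    return sign * q * w_max / levels


sign, q = quantize_weight(0.33, w_max=1.0)
realized = dequantize(sign, q, w_max=1.0)
# Away from saturation the rounding error is at most half a step,
# i.e. w_max / (2 * levels).
```

Under this assumption, a weight update smaller than half a quantization step may leave the hardware value unchanged or flip it by a full step, which is one way to picture the jumps in the late stages of learning.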

In the current setup, our SSNs displayed limited mixing abilities. During guided dreaming, images from one of the learned classes were more difficult to generate (fig. 6). Restricted mixing due to deep modes in the energy landscape carved out by contrastive learning is a well-known problem for classical Boltzmann machines, which is usually alleviated by computationally costly annealing techniques Salakhutdinov (2010); Desjardins et al. (2010); Bengio et al. (2013). However, the fully-commissioned BrainScaleS system will feature embedded short-term synaptic plasticity Schemmel et al. (2010), which has been shown to promote mixing in spiking networks Leng et al. (2018) while operating purely locally, at the level of individual synapses.

The synaptic learning rule was local and Hebbian, but updates were calculated on a host computer using an iterative in-the-loop training procedure, which required repeated stopping, evaluation and restart of the emulation, thereby reducing the nominal acceleration factor by two orders of magnitude. By utilizing on-chip plasticity, as available, for example, on the BrainScaleS-2 successor system Friedmann et al. (2017); Wunderlich et al. (2019), this laborious procedure becomes obsolete and the accelerated nature of the substrate can be exploited to its fullest extent.

IV.2 Relation to other work

This study builds upon a series of theoretical and experimental studies of sampling-based probabilistic inference using the dynamics of biological neurons. The inclusion of refractory times was first considered in Buesing et al. (2011). An extension to networks of leaky integrate-and-fire neurons and a theoretical framework for their dynamics and statistics followed in Petrovici et al. (2013) and Petrovici et al. (2016). The compensation of shared-input correlations through inhibitory feedback and learning was discussed in Jordan et al. (2017) and Bytschok et al. (2017), inspired by the early study of asynchronous irregular firing in Brunel (2000) and by preceding correlation studies in theoretical Tetzlaff et al. (2012) and experimental Pfeil et al. (2016) work.

Previous small-scale studies of sampling on accelerated mixed-signal neuromorphic hardware include Petrovici et al. (2015, 2017a, 2017b). An implementation of sampling with spiking neurons and its application to the MNIST dataset was shown in Pedroni et al. (2016) using the fully digital, real-time TrueNorth neuromorphic chip Merolla et al. (2014).

We stress two important differences between Pedroni et al. (2016) and this work. First, the nature of the neuromorphic substrate: the TrueNorth system is fully digital and calculates neuronal state updates numerically, in contrast to the physical-model paradigm instantiated by BrainScaleS. In this sense, TrueNorth emulations are significantly closer to classical computer simulations on parallel machines: updates of dynamical variables are precise and robustness to variability is not an issue; the price is paid in simulation speed, with TrueNorth running in biological real time, which is 10,000 times slower than BrainScaleS. Second, the nature of neuron dynamics: the neuron model used in Pedroni et al. (2016) is an intrinsically stochastic unit that sums its weighted inputs, thus remaining very close to classical Gibbs sampling and Boltzmann machines, while our approach considers multiple additional aspects of its biological archetype (exponential synaptic kernels, leaky membranes, deterministic firing, stochasticity through synaptic background, shared-input correlations etc.). Moreover, our approach uses fewer hardware neuron units to represent a sampling unit, enabling a more parsimonious utilization of the neuromorphic substrate.

IV.3 Conclusion

In this work we showed how sampling-based Bayesian inference using hierarchical spiking networks can be robustly implemented on a physical model system despite inherent variability and imperfections. Underlying neuron and synapse dynamics are deterministic and close to their biological archetypes, but with much shorter time constants, hence the intrinsic acceleration factor with respect to biology. The entire architecture – sampling network plus background random network – was fully deterministic and entirely contained on the neuromorphic substrate, with external communication used only to represent input patterns and labels. Considering the deterministic nature of neurons in vitro Mainen and Sejnowski (1995); Reinagel and Reid (2002); Toups et al. (2012), such an architecture also represents a plausible model for neural sampling in cortex Jordan et al. (2017); Dold et al. (2018).

We demonstrated sampling from arbitrary Boltzmann distributions over binary random variables, as well as generative and discriminative properties of networks trained with high-dimensional visual data. For such networks, the two abovementioned computational tasks (pattern completion and classification) happen simultaneously, as they both require the calculation of conditional distributions, which is carried out implicitly by the network dynamics. Both during learning and for the subsequent inference tasks, the setup benefitted significantly from the fast intrinsic dynamics of the substrate, achieving a net speedup of   to   compared to biology.

We view these results as a contribution to the nascent, but expanding field of applications for biologically inspired physical-model systems. They demonstrate the feasibility of such devices for solving problems in machine learning, as well as for studying biological phenomena. Importantly, they explicitly address the search for robust computational models that are able to harness the strengths of these systems, most importantly their speed and energy efficiency. The proposed architecture scales naturally to substrates with more neuronal real-estate and can be used for a wide array of tasks that can be mapped to a Bayesian formulation, such as constraint satisfaction problems Jonke et al. (2016); Fonseca Guerra and Furber (2017), prediction of temporal sequences Sutskever and Hinton (2007), movement planning Taylor and Hinton (2009); Alemi et al. (2015), simulation of solid-state systems Edwards and Anderson (1975) and quantum many-body problems Carleo and Troyer (2017); Czischek et al. (2018).

Acknowledgements.
We thank Johannes Bill for many fruitful discussions. The work leading to these results has received funding from the European Union Seventh Framework Programme (FP7) under grant agreement no #604102, the EU’s Horizon 2020 research and innovation programme under grant agreements No #720270 and #785907 (Human Brain Project, HBP), the EU’s research project BrainScaleS #269921 and the Heidelberg Graduate School of Fundamental Physics. We owe particular gratitude to the sustained support of our research by the Manfred Stärk Foundation.

Appendix A Network description and parameters

In tables 1, 2 and 3 we characterize the implemented network and its parametrization in the different tasks. Note that the neurons and synapses were emulated on a partly analog neuromorphic device, hence systematic differences between the ideal and the realized dynamics are expected. We show the analog parameters as they were declared in the software stack and not how they were realized in silico, as a calibration of the hardware is only possible up to a certain accuracy. For more details on these aspects of the BrainScaleS system see Schmitt et al. (2017).

Type Leaky integrate-and-fire (LIF), conductance based synapse, exponential kernel
Subthreshold dynamics Subthreshold dynamics
Reset and refractoriness
This model was emulated on the BrainScaleS system Schemmel et al. (2010)
Spiking If
neuron emits a spike with timestamp
Synapse dynamics For each presynaptic spike at
where is the synaptic weight, the synaptic delay and the Heaviside function
This model was emulated on the BrainScaleS system Schemmel et al. (2010)
Table 1: Description of the neuron and synapse model.
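For orientation, the model class named in table 1 (LIF neuron, conductance-based synapses, exponential kernel) is commonly written as follows. This is a sketch of the standard textbook form, not a transcription of the table; all symbol names ($u_k$, $C_\mathrm{m}$, $g_\mathrm{l}$, $E_\mathrm{l}$, $E^\mathrm{rev}_x$, $\vartheta$, $\varrho$, $\tau_\mathrm{ref}$, $\tau^\mathrm{syn}_x$, $w_{kj}$, $d$) are assumptions.

```latex
\begin{align}
  % Subthreshold dynamics, valid outside the refractory period
  C_\mathrm{m} \frac{\mathrm{d}u_k}{\mathrm{d}t}
    &= g_\mathrm{l}\,\bigl(E_\mathrm{l} - u_k\bigr)
     + \sum_{x \in \{\mathrm{e},\mathrm{i}\}} g^\mathrm{syn}_{k,x}(t)\,\bigl(E^\mathrm{rev}_x - u_k\bigr) \\
  % Spiking, reset and refractoriness
  u_k(t_s) \ge \vartheta
    &\;\Rightarrow\; \text{spike at } t_s, \qquad
     u_k(t) = \varrho \quad \text{for } t_s < t \le t_s + \tau_\mathrm{ref} \\
  % Synapse dynamics: each presynaptic spike of neuron j at time t_s adds
  g^\mathrm{syn}_{k,x}(t)
    &\leftarrow g^\mathrm{syn}_{k,x}(t)
     + w_{kj}\,\exp\!\left(-\frac{t - t_s - d}{\tau^\mathrm{syn}_x}\right)\Theta(t - t_s - d)
\end{align}
```

Here $w_{kj}$ plays the role of the synaptic weight, $d$ the synaptic delay and $\Theta$ the Heaviside function; the membrane time constant follows as $\tau_\mathrm{m} = C_\mathrm{m}/g_\mathrm{l}$.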
A Sampling neuron
Name Value Description
reset potential
resting potential
threshold potential
inhibitory reversal potential
excitatory reversal potential
refractory time
ca. membrane time constant
membrane capacity
excitatory synaptic time constant
inhibitory synaptic time constant
B Bias neuron
Name Value Description
reset potential
resting potential
threshold potential
inhibitory reversal potential
excitatory reversal potential
refractory time
ca. membrane time constant
membrane capacity
excitatory synaptic time constant
inhibitory synaptic time constant
C Neurons of the random network
Name Value Description (all analog)
reset potential
resting potential
threshold potential
inhibitory reversal potential
excitatory reversal potential
refractory time
ca. membrane time constant
membrane capacity
excitatory synaptic time constant
inhibitory synaptic time constant
D Synapse
Name Value Description
[0,15] synaptic bias weight in hardware values (digital)
[0,15] synaptic network weight in hardware values (digital)
on the order of (uncalibrated) synaptic delay, estimated in Schemmel et al. (2010)
Table 2: Neuron parameters. Parameters of the network setup specified in table 1. The analog parameters are shown as specified in the software setup and not as realized on the hardware. For details on the calibration procedure see, e.g., Schmitt et al. (2017). Legend: the calibration of the membrane time constant was not available at the time of this work, and the corresponding technical parameter was set to the smallest available value instead (fastest possible membrane dynamics for each neuron).
A Probability distribution with Poisson Noise
Name Value Description
5 number of sampling neurons
1 number of bias neurons
0 number of random neurons
- within-population in-degree of neurons in the random network
- in-degree of sampling neurons from the random network
- synaptic weights in the random network, in hardware units
Poisson frequency to sampling neurons per synapse type
B Probability distribution with random network
Name Value Description
5 number of sampling neurons
1 number of bias neurons
200 number of random neurons
20 within-population in-degree of neurons in the random network
15 in-degree of sampling neurons from the random network
10 synaptic weights in the random network, in hardware units
- Poisson frequency to sampling neurons per synapse type
C High-dimensional dataset
Name Value Description
number of sampling neurons, { rFMNIST, rMNIST }
1 number of bias neurons
400 number of random neurons
20 within-population in-degree of neurons in the random network
15 in-degree of sampling neurons from the random network
10 synaptic weights in the random network, in hardware units
- Poisson frequency to sampling neurons per synapse type
Table 3: Network parameters. Parameters are shown for the three different cases described in the manuscript: A Target Boltzmann distribution, Poisson noise. B Target Boltzmann distribution, random network for stochasticity. C Learning from data, random network for stochasticity.

Appendix B Learning with the hardware in the loop

In table 4 we show the learning parameters used in the experiments on the BrainScaleS hardware. We did not carry out any systematic hyper-parameter optimization. Note that the learning parameters used in the experiments in section III.1 are not directly comparable, because the different statistics of the background noise (Poisson or random network) correspond to different effective learning rates.

Experiment Learning rate Momentum factor minibatch-size Initial
target distribution, Poisson 1.0 0.6 -
target distribution, random network 0.5 0.6 -
rMNIST 0.4 0.6 7/class pre-trained
rFMNIST 0.4 0.6 7/class pre-trained
Table 4: Parameters for learning.
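To show how a learning rate and a momentum factor enter a local Hebbian update, here is a minimal sketch. It assumes a contrastive, Boltzmann-machine-style rule in which the weight change is proportional to the difference between pairwise correlations in a data phase and a model phase; this specific form, and all names below, are assumptions rather than the exact procedure used in the experiments.

```python
def pair_correlations(samples):
    """Average co-activation <z_i z_j> over a list of binary state vectors."""
    n = len(samples[0])
    corr = [[0.0] * n for _ in range(n)]
    for s in samples:
        for i in range(n):
            if s[i]:
                for j in range(n):
                    corr[i][j] += s[j]
    return [[c / len(samples) for c in row] for row in corr]


def momentum_update(weights, velocity, corr_data, corr_model, lr, momentum):
    """One in-place update: v <- m*v + lr*(<zz>_data - <zz>_model); w <- w + v."""
    n = len(weights)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-connections
            velocity[i][j] = (momentum * velocity[i][j]
                              + lr * (corr_data[i][j] - corr_model[i][j]))
            weights[i][j] += velocity[i][j]
    return weights, velocity


data = pair_correlations([[1, 1], [1, 1]])   # units always co-active
model = pair_correlations([[1, 0], [0, 1]])  # units never co-active
w = [[0.0, 0.0], [0.0, 0.0]]
v = [[0.0, 0.0], [0.0, 0.0]]
momentum_update(w, v, data, model, lr=0.4, momentum=0.6)
first_step = w[0][1]
momentum_update(w, v, data, model, lr=0.4, momentum=0.6)
second_step = w[0][1]
```

The update uses only quantities available at the two neurons a synapse connects, which is the sense in which such a rule is local.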

Appendix C Learning to approximate target distributions

In fig. 7 we show the final DKLs after training to represent a target distribution both with Poisson noise and with the activity of a random network. We carried out the same experiments as described in section III.1 with 20 different samples for the weights and the biases of the target distribution. The experiments were repeated 10 times for each sample. Median learning results remained consistent across target distributions, with the variability reflecting the difficulty of the problem (discrepancies between LIF and Glauber dynamics become more pronounced for larger weights and biases). Variability across trials for the same target distribution is due to the trial-to-trial variability of the analog parameter storage (floating gates), due to the inherent stochasticity in the learning procedure (sampling accuracy in an update step), as well as due to systematic discrepancies between the effective pre-post and post-pre interaction strengths between sampling units, which are themselves a consequence of the aforementioned floating gate variability.
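How such a DKL can be evaluated for a small network is sketched below. The sketch assumes binary states z in {0,1}^n, a target distribution p(z) proportional to exp(sum_{i<j} W_ij z_i z_j + sum_i b_i z_i) with symmetric W, and the direction DKL(sampled || target); the function names and the choice of direction are assumptions.

```python
import itertools
import math
from collections import Counter


def boltzmann_distribution(weights, biases):
    """Exact target distribution over all 2^n binary states."""
    n = len(biases)
    states = list(itertools.product((0, 1), repeat=n))

    def neg_energy(z):
        pair = sum(weights[i][j] * z[i] * z[j]
                   for i in range(n) for j in range(i + 1, n))
        return pair + sum(b * zi for b, zi in zip(biases, z))

    unnorm = [math.exp(neg_energy(z)) for z in states]
    partition = sum(unnorm)
    return {z: u / partition for z, u in zip(states, unnorm)}


def empirical_distribution(samples, n):
    """Relative frequency of each state among the recorded samples."""
    counts = Counter(tuple(s) for s in samples)
    return {z: counts[z] / len(samples)
            for z in itertools.product((0, 1), repeat=n)}


def dkl(p, q):
    """Kullback-Leibler divergence DKL(p || q); terms with p(z) = 0 vanish."""
    return sum(pz * math.log(pz / q[z]) for z, pz in p.items() if pz > 0)


target = boltzmann_distribution([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])  # uniform
perfect = empirical_distribution([(0, 0), (0, 1), (1, 0), (1, 1)], 2)
stuck = empirical_distribution([(0, 0)] * 4, 2)  # sampler stuck in one state
```

For the five sampling neurons listed in table 3, the state space has only 32 entries, so this exhaustive enumeration is cheap.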

Figure 7: Emulated SSNs sampling from different target Boltzmann distributions. The figure shows the results of experiments identical to the ones in section III.1 for 20 different target distributions with 10 repetitions for each sample. We show the DKL of the test run after training for (A) the joint distributions with Poisson noise, (B) the inference experiment with Poisson noise, (C) the joint distributions with a random background network and (D) the inference experiment with a random background network. The data is plotted following the traditional box-and-whiskers scheme: the orange line represents the median, the box represents the interquartile range, the whiskers represent the full data range and the individual markers represent the far outliers. In each subplot the leftmost data (highlighted in red) corresponds to the distribution shown in fig. 4.

References

  • Waldrop (2016) M. M. Waldrop, Nature News 530, 144 (2016).
  • Indiveri et al. (2011) G. Indiveri, B. Linares-Barranco, T. J. Hamilton, A. Van Schaik, R. Etienne-Cummings, T. Delbruck, S.-C. Liu, P. Dudek, P. Häfliger, S. Renaud, et al., Frontiers in neuroscience 5, 73 (2011).
  • Furber (2016) S. Furber, Journal of neural engineering 13, 051001 (2016).
  • Mead (1990) C. Mead, Proceedings of the IEEE 78, 1629 (1990).
  • Indiveri et al. (2006) G. Indiveri, E. Chicca,  and R. J. Douglas, IEEE transactions on neural networks 17 (2006).
  • Schemmel et al. (2010) J. Schemmel, D. Brüderle, A. Grübl, M. Hock, K. Meier,  and S. Millner, in Circuits and systems (ISCAS), proceedings of 2010 IEEE international symposium on (IEEE, 2010) pp. 1947–1950.
  • Jo et al. (2010) S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder,  and W. Lu, Nano letters 10, 1297 (2010).
  • Pfeil et al. (2013) T. Pfeil, A. Grübl, S. Jeltsch, E. Müller, P. Müller, M. A. Petrovici, M. Schmuker, D. Brüderle, J. Schemmel,  and K. Meier, Frontiers in neuroscience 7, 11 (2013).
  • Qiao et al. (2015) N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska,  and G. Indiveri, Frontiers in neuroscience 9, 141 (2015).
  • Chang et al. (2016) Y.-F. Chang, B. Fowler, Y.-C. Chen, F. Zhou, C.-H. Pan, T.-C. Chang,  and J. C. Lee, Scientific reports 6, 21268 (2016).
  • Wunderlich et al. (2019) T. Wunderlich, A. F. Kungl, E. Müller, A. Hartel, Y. Stradmann, S. A. Aamir, A. Grübl, A. Heimbrecht, K. Schreiber, D. Stöckel, et al., Frontiers in Neuroscience 13, 260 (2019).
  • Berkes et al. (2011) P. Berkes, G. Orbán, M. Lengyel,  and J. Fiser, Science 331, 83 (2011).
  • Pouget et al. (2013) A. Pouget, J. M. Beck, W. J. Ma,  and P. E. Latham, Nature neuroscience 16, 1170 (2013).
  • Orbán et al. (2016) G. Orbán, P. Berkes, J. Fiser,  and M. Lengyel, Neuron 92, 530 (2016).
  • Haefner et al. (2016) R. M. Haefner, P. Berkes,  and J. Fiser, Neuron 90, 649 (2016).
  • Buesing et al. (2011) L. Buesing, J. Bill, B. Nessler,  and W. Maass, PLoS computational biology 7, e1002211 (2011).
  • Hennequin et al. (2014) G. Hennequin, L. Aitchison,  and M. Lengyel, arXiv preprint arXiv:1404.3521  (2014).
  • Aitchison and Lengyel (2016) L. Aitchison and M. Lengyel, PLoS computational biology 12, e1005186 (2016).
  • Petrovici et al. (2016) M. A. Petrovici, J. Bill, I. Bytschok, J. Schemmel,  and K. Meier, Physical Review E 94, 042312 (2016).
  • Kutschireiter et al. (2017) A. Kutschireiter, S. C. Surace, H. Sprekeler,  and J.-P. Pfister, Scientific reports 7, 8722 (2017).
  • Savitzky and Golay (1964) A. Savitzky and M. J. Golay, Analytical chemistry 36, 1627 (1964).
  • Brette and Gerstner (2005) R. Brette and W. Gerstner, Journal of neurophysiology 94, 3637 (2005).
  • Millner et al. (2010) S. Millner, A. Grübl, K. Meier, J. Schemmel,  and M.-O. Schwartz, in Adv Neur In, Vol. 23, edited by J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel,  and A. Culotta (2010) pp. 1642–1650.
  • Petrovici et al. (2014) M. A. Petrovici, B. Vogginger, P. Müller, O. Breitwieser, M. Lundqvist, L. Muller, M. Ehrlich, A. Destexhe, A. Lansner, R. Schüffny, et al., PloS one 9, e108590 (2014).
  • Schmitt et al. (2017) S. Schmitt, J. Klähn, G. Bellec, A. Grübl, M. Guettler, A. Hartel, S. Hartmann, D. Husmann, K. Husmann, S. Jeltsch, et al., in Neural Networks (IJCNN), 2017 International Joint Conference on (IEEE, 2017) pp. 2227–2234.
  • Leng et al. (2018) L. Leng, R. Martel, O. Breitwieser, I. Bytschok, W. Senn, J. Schemmel, K. Meier,  and M. A. Petrovici, Scientific reports 8, 10651 (2018).
  • Zoschke et al. (2017) K. Zoschke, M. Güttler, L. Böttcher, A. Grübl, D. Husmann, J. Schemmel, K. Meier,  and O. Ehrmann, EPTC 2017  (2017).
  • Petrovici et al. (2017a) M. A. Petrovici, A. Schroeder, O. Breitwieser, A. Grübl, J. Schemmel,  and K. Meier, in Neural Networks (IJCNN), 2017 International Joint Conference on (IEEE, 2017) pp. 2209–2216.
  • Destexhe et al. (2003) A. Destexhe, M. Rudolph,  and D. Paré, Nature reviews neuroscience 4, 739 (2003).
  • Petrovici (2016) M. A. Petrovici, Form Versus Function: Theory and Models for Neuronal Substrates (Springer, 2016).
  • Hinton et al. (1984) G. E. Hinton, T. J. Sejnowski,  and D. H. Ackley, Boltzmann machines: Constraint satisfaction networks that learn (Carnegie-Mellon University, Department of Computer Science Pittsburgh, PA, 1984).
  • Hinton et al. (1995) G. E. Hinton, P. Dayan, B. J. Frey,  and R. M. Neal, Science 268, 1158 (1995).
  • Kullback and Leibler (1951) S. Kullback and R. A. Leibler, The annals of mathematical statistics 22, 79 (1951).
  • Petrovici et al. (2015) M. A. Petrovici, D. Stöckel, I. Bytschok, J. Bill, T. Pfeil, J. Schemmel,  and K. Meier (2015).
  • Davison et al. (2009) A. P. Davison, D. Brüderle, J. M. Eppler, J. Kremkow, E. Muller, D. Pecevski, L. Perrinet,  and P. Yger, Frontiers in neuroinformatics 2, 11 (2009).
  • Bytschok et al. (2017) I. Bytschok, D. Dold, J. Schemmel, K. Meier,  and M. A. Petrovici, in BMC Neuroscience 2017, Vol. 18 (Organization for Computational Neurosciences, 2017) p. P200.
  • Dold et al. (2018) D. Dold, I. Bytschok, A. F. Kungl, A. Baumbach, O. Breitwieser, W. Senn, J. Schemmel, K. Meier,  and M. A. Petrovici, arXiv preprint arXiv:1809.08045  (2018).
  • Schmuker et al. (2014) M. Schmuker, T. Pfeil,  and M. P. Nawrot, Proceedings of the National Academy of Sciences 111, 2081 (2014).
  • Esser et al. (2016) S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, et al., Proceedings of the National Academy of Sciences of the United States of America 113, 11441 (2016).
  • Jordan et al. (2017) J. Jordan, M. A. Petrovici, O. Breitwieser, J. Schemmel, K. Meier, M. Diesmann,  and T. Tetzlaff, arXiv preprint arXiv:1710.04931  (2017).
  • LeCun et al. (1998) Y. LeCun, L. Bottou, Y. Bengio,  and P. Haffner, Proceedings of the IEEE 86, 2278 (1998).
  • Xiao et al. (2017) H. Xiao, K. Rasul,  and R. Vollgraf, “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms,”  (2017), cs.LG/1708.07747 .
  • Jones et al. (2014) E. Jones, T. Oliphant,  and P. Peterson,  (2014).
  • Salakhutdinov (2010) R. Salakhutdinov, in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010) pp. 943–950.
  • Maaten and Hinton (2008) L. v. d. Maaten and G. Hinton, Journal of machine learning research 9, 2579 (2008).
  • Probst et al. (2015) D. Probst, M. A. Petrovici, I. Bytschok, J. Bill, D. Pecevski, J. Schemmel,  and K. Meier, Frontiers in computational neuroscience 9, 13 (2015).
  • Courbariaux et al. (2015) M. Courbariaux, Y. Bengio,  and J.-P. David, in Advances in neural information processing systems (2015) pp. 3123–3131.
  • Petrovici et al. (2017b) M. A. Petrovici, S. Schmitt, J. Klähn, D. Stöckel, A. Schroeder, G. Bellec, J. Bill, O. Breitwieser, I. Bytschok, A. Grübl, et al., in 2017 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, 2017) pp. 1–4.
  • Desjardins et al. (2010) G. Desjardins, A. Courville, Y. Bengio, P. Vincent,  and O. Delalleau, in Proceedings of the thirteenth international conference on artificial intelligence and statistics (MIT Press Cambridge, MA, 2010) pp. 145–152.
  • Bengio et al. (2013) Y. Bengio, G. Mesnil, Y. Dauphin,  and S. Rifai, in International conference on machine learning (2013) pp. 552–560.
  • Friedmann et al. (2017) S. Friedmann, J. Schemmel, A. Grübl, A. Hartel, M. Hock,  and K. Meier, IEEE transactions on biomedical circuits and systems 11, 128 (2017).
  • Petrovici et al. (2013) M. A. Petrovici, J. Bill, I. Bytschok, J. Schemmel,  and K. Meier, arXiv preprint arXiv:1311.3211  (2013).
  • Brunel (2000) N. Brunel, Journal of computational neuroscience 8, 183 (2000).
  • Tetzlaff et al. (2012) T. Tetzlaff, M. Helias, G. T. Einevoll,  and M. Diesmann, PLoS computational biology 8, e1002596 (2012).
  • Pfeil et al. (2016) T. Pfeil, J. Jordan, T. Tetzlaff, A. Grübl, J. Schemmel, M. Diesmann,  and K. Meier, Physical Review X 6, 021023 (2016).
  • Pedroni et al. (2016) B. U. Pedroni, S. Das, J. V. Arthur, P. A. Merolla, B. L. Jackson, D. S. Modha, K. Kreutz-Delgado,  and G. Cauwenberghs, IEEE transactions on biomedical circuits and systems 10, 837 (2016).
  • Merolla et al. (2014) P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, et al., Science 345, 668 (2014).
  • Mainen and Sejnowski (1995) Z. F. Mainen and T. J. Sejnowski, Science 268, 1503 (1995).
  • Reinagel and Reid (2002) P. Reinagel and R. C. Reid, Journal of Neuroscience 22, 6837 (2002).
  • Toups et al. (2012) J. V. Toups, J.-M. Fellous, P. J. Thomas, T. J. Sejnowski,  and P. H. Tiesinga, PLoS computational biology 8, e1002615 (2012).
  • Jonke et al. (2016) Z. Jonke, S. Habenschuss,  and W. Maass, Frontiers in neuroscience 10, 118 (2016).
  • Fonseca Guerra and Furber (2017) G. A. Fonseca Guerra and S. B. Furber, Frontiers in neuroscience 11, 714 (2017).
  • Sutskever and Hinton (2007) I. Sutskever and G. Hinton, in Artificial intelligence and statistics (2007) pp. 548–555.
  • Taylor and Hinton (2009) G. W. Taylor and G. E. Hinton, in Proceedings of the 26th annual international conference on machine learning (ACM, 2009) pp. 1025–1032.
  • Alemi et al. (2015) O. Alemi, W. Li,  and P. Pasquier, in 2015 International Conference on Affective Computing and Intelligent Interaction (ACII) (IEEE, 2015) pp. 442–448.
  • Edwards and Anderson (1975) S. F. Edwards and P. W. Anderson, Journal of Physics F: Metal Physics 5, 965 (1975).
  • Carleo and Troyer (2017) G. Carleo and M. Troyer, Science 355, 602 (2017).
  • Czischek et al. (2018) S. Czischek, M. Gärttner,  and T. Gasenzer, Physical Review B 98, 024311 (2018).