The rapidly increasing progress on neuromorphic computing and the ongoing research of spiking systems such as the third generation of neural networks calls for a better understanding of the fundamental processes of neuromorphic hardware systemsMaass (1997); Huang and Rao (2014); Jimenez Rezende and Gerstner (2014); Kornijcuk et al. (2016); Marblestone et al. (2016); Gerstner et al. (2012), for a recent review on neuromorphic computing see Schuman et al. (2017). As a parallel computing platform, these systems may be used in the long run to accurately simulate and compute large systems, in particular given their low energy consumption.
Possible applications range from an effective implementation of artificial neural networks and further machine learning methodsMaass (2014); Neftci et al. (2014, 2016); Pedroni et al. (2013); Leng et al. (2018); Hinton (2002), a better understanding of biological processes in our brains Pecevski et al. (2011); Nessler et al. (2013) to the computation of physical and stochastic interesting systems Czischek et al. (2018); Carleo and Troyer (2017); Gao Xun and Duan Lu-Ming (2017). Many artificial neural networks and physical systems are described by Boltzmann distributed systems. For a quantitatively accurate computation of such systems, it is necessary to deduce an exact representation on neuromorphic hardware systems Jordan et al. (2017); Pedroni et al. (2013); Probst et al. (2015); Buesing et al. (2011)
, in particular for systematic error estimates.
Our work is motivated by the similarity of the Langevin dynamics and leaky integrate-and-fire (LIF) neurons for performing stochastic inference Petrovici et al. (2016). Indeed, the fundamental dynamics of LIF neurons is governed by Langevin dynamics. Apart from its obvious relevance for the description of stochastical processes, the Langevin equation Lemons and Gythiel (1997) can also be used for simulating quantum field theories with stochastic quantization Damgaard and Huffel (1987); Batrouni et al. (1985); Parisi and Wu (1981)
. In this approach the Euclidean path integral measure is obtained as the stationary distribution of a stochastic process. This paves the way to the heuristic approach of using complex Langevin dynamics as a potential method for accessing real time dynamics and sign problems. The latter problem is e.g. prominent in QCD at finite chemical potentialAarts and Stamatescu (2008); Aarts et al. (2013); Aarts (2012). A further interesting application of the Langevin equation can be found in Welling and Teh (2011)
. There, Langevin dynamics is combined with a stochastic gradient descent algorithm to perform Bayesian learning which enables an uncertainty estimation of resulting parameters.
Many of the above mentioned systems are discrete ones, the simplest one being a two-state system. This suggests the formulation of a discrete analogue of the continuous Langevin dynamics for the accurate description of discrete systems. In the present work we show that a formulation of Langevin dynamics for discrete systems leads to a new class of a generic stochastic process, namely the Langevin equation for discrete systems,
The present work concentrates on the potential of the novel process for a more accurate implementation of Boltzmann distributed systems on the neuromorphic hardware. This leads to a new architecture of neurons based on a self-interacting contribution. The self-interacting term changes manifestly the dynamics of the neural network. This results in an activation function which is much closer to a logistic distribution, the activation function of a Boltzmann machine, than existing approaches. The new architecture can be applied to both, discrete two state systems and neuromorphic hardware systems with a continuous membrane potential and a spiking character. In this workspiking character refers to an effective mapping of a continuous potential to two discrete neuron states in an interacting system. The dynamics differ in their kind of noise that is uncorrelated in the former case and autocorrelated in the latter case. Figure 1 compares the different network structures and gives an overview over existing and contributed new dynamics of this work.
An exact representation of the activation function of the Boltzmann machine is necessary to obtain correct statistics in coupled systems on neuromorphic hardware systems. In the present work we show in a detailed numerical analysis that small deviations in the activation function propagate if a rectangular refractory mechanism or interactions between neurons are taken into account. These small deviations have a large impact on the resulting correlation functions and observables. The numerical results demonstrate that a reliable estimation, understanding and control of different sources of errors are essential for a correct computation of Boltzmann distributed systems in the future.
The paper is organised as follows. The Langevin equation for discrete systems is derived in Section II. In Section III, the so called sign-dependent discrete Langevin machine is introduced as a special case of the Langevin equation for discrete systems. The mappings of the different dynamics for discrete systems onto an Ornstein-Uhlenbeck process with spiking character is discussed in Section IV. Section V recapitulates relations between the discrete Langevin machine and the neuromorphic hardware system. In Section VI, numerical results of the introduced and of existing dynamics are presented, possible sources of errors are extracted and the propagation of errors for different abstractions of a neuromorphic hardware system is analysed. The conclusion and outlook can be found in Section VII.
Ii Discrete Langevin Dynamics
The Langevin equation for discrete systems is derived inspired by a comparison of the Metropolis algorithm and the Langevin dynamics (for a detailed comparison see Appendix A
). The general formulation of a Langevin equation for discrete systems presented in this Section is similar to a Monte Carlo algorithm and is driven by a Gaussian noise contribution. The transition probability to a proposed new state is regulated by the introduction of truncating Gaussian noise. It is shown that the accuracy of the process strongly depends on an intrinsic parameterand the scale of the energy contribution.
Certain necessary properties of a possible Langevin equation for discrete systems can be stated beforehand based on the comparison of the Langevin dynamics and the Metropolis algorithm in Appendix A. First of all, an infinitesimal change of the microscopic state/field is not possible. Therefore, one has to switch from a parallel to a random sequential update mechanism. Further, the computer time directly is the time scale of the approach. The proposal field has to be chosen from a discrete distribution. One may select the proposal field according to some distribution around the current field. However, since a parallelisation is not possible, the uniform selection probability of a Metropolis algorithm can be adopted.
Assuming the same acceptance probability as in the continuous case, a starting point is the following proportionality,
With the help of a relation between the cumulative Gaussian distribution and the exponential function, given by (52), and with , this can be rewritten in the following way,
for . An analytical expression for the additional scaling factor is given in equation (53).
The Gaussian noise contribution
is uncorrelated and has variance,
Taking the current state
into account, one can transform the sampling from the cumulative normal distribution into a general stochastic update rule with Gaussian noise and a proposal state. This leads us to (1), already presented in the introduction,
where needs to be chosen sufficiently small. represents the Heaviside function.
The update formalism corresponds to a single spin flip Monte Carlo algorithm with a random sequential update mechanism, driven by Gaussian noise. It can be immediately seen within the present form that a flip to a proposed field gets the more unlikely the smaller . Adaptations of the Gaussian noise term to truncated Gaussian noise can help to improve the dynamics, i.e., to increase the probability of a spin flip. In principle, this corresponds to a rescaling of the transition probability term similar to a maximization of the spin flip probability in the Metropolis algorithm.
The truncated Gaussian noise term can be expressed by the following parametrization,
where is in the range of
The improved update rule is
Transition probabilities of further standard Monte Carlo algorithms can be emulated by other choices of . Note that for a uniform random number and a proposal field , an equivalent formulation to (8) can be stated for the transition probability (2),
Processes with a different value of , i.e., a different rescaling of the transition probability, can always be mapped onto each other by a respective rescaling of the time. Given a transition probability and a scaling factor , the following relation holds,
Most of the existing single spin flip algorithms can be reformulated into a Langevin equation for discrete systems with the same derivation, as presented in this Section. However, it can be shown that for the particular choice of the transition probability according to equation (2), the resulting order of accuracy in the detailed balance equation is the best one, with .
The update formalism (8) represents a Langevin like equivalent for discrete systems to the Langevin dynamics of continuous systems. As for continuous systems, the dynamics depends on Gaussian noise and is based on a rather simple expression. The algorithms can also be applied to continuous systems due to the equivalence to standard Monte Carlo algorithms in the limit .
Iii Sign-Dependent Discrete Langevin Machine
The Langevin equation for discrete systems (8) turns into a rather simple expression for a two-state system. The resulting dynamics is introduced in the following as sign-dependent discrete Langevin machine (). The represents a new architecture for interacting neurons with the particularity of a self-interacting contribution. The derived network structure results in a new basic dynamics with different weights and biases compared to the Boltzmann machine. It has the unique property that the equilibrium distribution converges in the limit , despite a different underlying dynamics, to a logistic distribution, the activation function of the Boltzmann machine.
We define the energy of the Boltzmann machine in the common way by
where are symmetric weights between the neurons and and is some additional bias. The domain of definition of the states at each neuron is given by .
where the transformed parameters are defined as follows: , and . Figure 2 illustrates a comparison between the structure of the Boltzmann machine and the new update dynamics. The activation function of the is given in the limit of by a logistic distribution,
The term Langevin machine is chosen because of the similarity of the network to the Boltzmann machine and to Langevin dynamics. The adjective discrete is added to avoid confusion with the Langevin machine presented in Neelakanta et al. (1991). The noise term in the dynamics can be chosen according to equation (6), i.e., it can be a Gaussian noise or a truncated Gaussian noise. The self-interaction term and the contribution of the bias lead in dependency of the state of the neuron for small values of to a strong shift of the mean value into a positive or negative direction. Respectively, the neuron stays very long in an active regime or in an inactive regime in the case of Gaussian noise. The process fluctuates between two different fundamental descriptions. The addition sign-dependent is used to emphasize this property and to point out that the so far presented dynamics is a particular realisation of the discrete Langevin machine, a larger class of network implementations with a Gaussian noise distribution. This is discussed in more detail in Section V. The exponent ’’ in the abbreviation signifies the fluctuation between the two regimes. The implicit dynamics (13) allows different interpretations and implementations.
Absorbing the Gaussian noise term into the bias, the resulting network has a rectangular decision function and can be interpreted as a neural network with a noisy bias. The simplicity of the update rule might be especially for neuromorphic systems very helpful for a computation of physical statistical systems, which are Boltzmann distributed. The implementation of an exponential function in the system is much more challenging than generating Gaussian noise. The rectangular decision function further coincides with the threshold function of spiking neurons. Accordingly, a possible adaptation of the dynamics on neuromorphic systems is obtained by the introduction of an additional time scale and staggered Gaussian noise peaks.
In the present work we pursue an alternative approach which is discussed in the next Section. Instead of performing an implicit update, it is also possible to explicitly compute the probability for an activation of the neuron in the next step. This probability is given by
In contrast to the Boltzmann machine, the transition probability is not the same probability as the activation function (14).
Finally, the exhibits a totally different dynamics than the Boltzmann machine. The dynamics is characterised by a Gaussian noise term as stochastic input, a self-interacting term, its simplicity and multiple possible implementations. Transition probabilities and correlation times can be easily controlled by usage of truncated Gaussian noise. Finite values of lead to a greater or lesser extent to deviating observables, depending on the structure of the Boltzmann machine. The source of error is given by the error term of order in the Taylor expansion of the detailed balance equation. Here, corresponds to the total input for a neuron , according to Appendix D. Exact results of a Boltzmann machine can be obtained by an extrapolation to the limit .
Iv Neuromorphic Hardware System
In this Section, we discuss different approaches for a projection and an accurate computation of the Boltzmann machine on a neuromorphic hardware system. This is one of the possible application of the sign-dependent Langevin machine. Several steps are necessary for a successful projection, as indicated in Figure 3. Each step describes a different level of abstraction of a neuromorphic system. A separate consideration of different aspects of such a system enables a clear distinction and identification of different sources of errors. Note that the diagram in Figure 3 is not the only possible approach to such a projection.
In Petrovici et al. (2016), an analytic expression for the neural activation function of leaky integrate-and-fire (LIF) neurons has been derived for the hardware of the BrainScaleS project in Heidelberg. It was demonstrated how the neuromorphic hardware system can be used to perform stochastic inference with spiking neurons in the high-conductance state. The microscopic dynamics of the membrane potential of a neuron can be approximated in this state with Poisson-driven LIF neurons by an Ornstein-Uhlenbeck process. The spiking character of the system is obtained by a threshold function which maps the system onto an effective two-state system.
Particular properties of LIF sampling are:
a description of the microscopic state of a neuron by a continuous membrane potential,
an autocorrelated noise contribution to the membrane potential,
a spiking character with an asymmetric refractory mechanism and
non-trivial and non-constant interaction kernels between neurons.
In the present Section we study simplified dynamics of LIF sampling. This allows us to analyse the impact of particular hardware related properties and sources of resulting errors on different levels of abstraction. After a short introduction to the principles of LIF sampling, a mapping of the is presented with respect to several particularities of the hardware system. Possible sources of errors of the mapping are discussed. We also relate our novel approach to the standard approach which relies on a fit of the activation function of the hardware system to the activation function of the Boltzmann machine.
The Section ends with an analysis of the impact of a refractory mechanism on the dynamics as a further step towards LIF sampling.
iv.1 LIF Sampling
The spikey neuromorphic system of the BrainScaleS project emulates spiking neural networks with physical models of neurons and synapses implemented in mixed-signal microelectronicsSchemmel et al. (2010); Petrovici et al. (2016). With the help of Poisson-driven leaky integrate-and-fire (LIF) neurons, it is possible to obtain stochastic inference with deterministic spiking neurons. The dynamics of the free membrane potential of a neuron can be approximated in the high-conductance state by an Ornstein-Uhlenbeck process,
In (16), determines the strength of the attractive force towards the mean value . The mean value consists of some leak potential and an additional averaged noise contribution. depends on the contribution from the Poisson background.
Inspired by a biological neuron Petrovici (2016), the neuron emits a spike when the membrane potential exceeds a certain threshold . It is active and is reset to for a refractory time afterwards, where the neuron is considered as inactive. This is also sketched in Figure 4. One has to distinguish between the effective membrane potential (red curve), which is unaffected by the spiking dynamics, and the real membrane potential (blue curve). As in Petrovici et al. (2016), it is assumed that the convergence of from to takes place in a negligible time after the finite refractory time has elapsed.
iv.1.1 Activation Function
One can calculate distributions for the so called burst lengths and the mean first passage times of the membrane potential with the help of transition probabilities . These are given by a corresponding Fokker-Planck equation of the Ornstein-Uhlenbeck process in the high conductance state. The burst length is the number of consecutive spikes. The mean first passage time corresponds to the mean duration which it takes for the membrane potential to pass the threshold
from a lower starting point. This is the time between an end of a burst and the next spike. From an iterative calculation one can derive an activation function, i.e., a probability distribution for the neuron to be active (), in terms of these probabilities Petrovici et al. (2016),
where corresponds to the mean drift time from the resting potential to . The distribution over burst lengths is represented by and the distribution over the mean times between burst regimes is given by .
iv.1.2 A Simplified Model
If the refractory time is neglected, the neuron can be interpreted as active if the effective membrane potential is above a certain threshold, and as inactive otherwise. The resulting dynamics corresponds to the level of abstraction (c) in Figure 3. The neuron state is given by: . The process is referred to as an Ornstein-Uhlenbeck process with spiking character. Interacting contributions are implemented on the basis of the projected neuron states instead of their actual effective continuous membrane potential. The activation function is a cumulative Gaussian distribution,
with the equilibrium distribution of the Ornstein-Uhlenbeck process (16),
For convenience, the threshold potential is set to zero in further considerations.
The more accurate expression of the activation function, given by equation (17), takes the finite refractory time into account. The actual activation function is somewhere between the logistic distribution and the cumulative Gaussian distribution.
In the following, we neglect the finiteness of the refractory time. Therefore, we consider mostly simplified theoretical models of the hardware system of the level of abstraction (c). The models can be used to analyse Gaussian noise as stochastic input and the impact of autocorrelated noise in a system with a microscopic real time evolution of the membrane potential. Given by the spiking character, the considered models correspond effectively to interacting two-state systems. A discussion with respect to a refractory mechanism, as a process of the level of abstraction (d), is given in Section IV.4.
iv.2 Boltzmann Machine
In this Section we discuss a mapping of the Boltzmann machine (BM) onto an Ornstein-Uhlenbeck process with spiking character as a simplified model of a neuromorphic hardware system. A successful mapping demands a logistic distribution as activation function and a correct handling of interactions between neurons.
A possible approach is a fit of the activation function with a scaling parameter and a shift parameter to the desired logistic distribution according to , with Petrovici et al. (2016); Leng et al. (2018); Probst et al. (2015). Interactions can be taken into account by absorbing their contributions into the mean value of the Ornstein-Uhlbenbeck process according to: . It is assumed that the time to equilibrium is negligible after a change of an interacting neuron.
A mapping of the Boltzmann machine to a process of the level of abstraction (c) can be performed straigtforward by identifying the total input (see Appendix D) of a neuron with the mean value of the Ornstein-Uhlenbeck process according to: . This can be achieved by setting the average noise contribution to zero, adjusting the leak potential to , and by taking the interacting contributions into account,
with and where and . With , we obtain the following activation function,
The process is abbreviated in the following by . The ’1’ in the exponent is chosen in compliance with the process and indicates that the process takes place in one regime, i.e. the process does not fluctuate between two fundamental dynamics. The fitting of the activation function to the logistic distribution is indicated by the additional ’F’. The process without any fitting parameters (, ) is denoted as .
We can also formulate an update rule with the same resulting activation function for a discrete two-state system, i.e., a system without a membrane potential. The resulting system is build upon an immediate representation of the neuron state, as it is the case for the BM and the . The resulting process corresponds to a transition from the level of abstraction (c) to the level (b) and is driven by uncorrelated Gaussian noise.
The related update rule of level of abstraction (b) is derived in a similar manner as for the Langevin equation for discrete systems. It is given by,
where the updates take place in computer time. The corresponding transition probability reads,
The dynamics has an additive Gaussian noise term and the Heaviside function as a projection onto the domain of definition of . Therefore, it is very similar to the sign-dependent discrete Langevin machine (13). The update rule is studied in Jordan et al. (2017) in more detail and introduced in Neftci et al. (2016) as an approximation of the so called Synaptic Sampling Machine. In compliance with the and the , we use the abbreviation for the process. When the activation function is fitted to the logistic distribution, is the corresponding acronym. In the latter case, sources of errors are resulting deviations due to an imperfect fit and finite times to equilibrium if an interacting neuron changes its macroscopic state Jordan et al. (2017).
iv.3 Sign-dependent Langevin Machine
The sign-dependent discrete Langevin machine and LIF neurons exhibit similar underlying dynamics. This motivates a mapping of the onto an Ornstein-Uhlenbeck process with spiking character in the same manner as in the previous Section, i.e. from the level of abstraction (b) to (c). The resulting process represents a continuous counterpart to the sign-dependent discrete Langevin machine and is referred to as sign-dependent Ornstein-Uhlbenbeck process (). The activation function of the process converges in the limit also to an logistic function. As illustrated in Figure 1, the two processes differ in their microscopic representation and their time scales. The corresponds to a process with two discrete states and the computer time as time scale. The process describes the temporal evolution of a membrane potential in real time, whereas the interactions between neurons are based on a projection of the potential onto two states.
The total input of the dynamics of the previous Section is exchanged for a mapping onto an Ornstein-Uhlenbeck process by the redefined membrane potential of the sign-dependent discrete Langevin machine: . This leads to the following dynamics of the sign-dependent Ornstein-Uhlenbeck process,
with . The additional scaling factor of is omitted, i.e., it holds: , and . The term sign-dependent reflects again the property of the neuron to stay very long in an active regime (’’), or in an inactive regime (’’).
Intuitive arguments for a convergence of the activation function of the process to a logistic distribution can be given by considering the regimes separately. The equilibrium distribution for a dynamics, which is only in the active regime, is given by equation (18) with . The equilibrium distribution of the inactive regime is implemented by . The tails of these two distributions overlap in the region around , as illustrated in Figure (a)a. The separately considered equilibrium distributions are a good approximation for small values of after reweighting them with the corresponding stationary probability distributions. If the membrane potential randomly crosses the threshold , it perceives a strong drift towards the other regime, due to the changing mean value, i.e., due the switch , and the large gap between the mean value of the two regimes. An immediate return to its initial regime gets rather unlikely. According to this argument, the transition probabilities of the sign-dependent Ornstein-Uhlenbeck process to become active or inactive can be approximated in the limit by the probability that the membrane potential reaches the threshold potential . This corresponds to the identifications
The resulting activation function of the sign-dependent Ornstein-Uhlenbeck process can be obtained by computing
The distribution converges in the limit of to the logistic distribution
the activation function of the Boltzmann machine.
One resulting source of error is the worse approximation of the transition probabilities of the process with and for larger values of . The deviations can be seen in Figure (a)a, when one compares the tail distributions of the fitted weighted equilibrium distributions of of the two regimes with the actual measured distribution. The distributions of the two regimes are closer together and crossings between the active and the inactive state take place more often. This can be compensated to some extent by the introduction of an additional rescaling factor as it has been done in the previous Section for the fitting of the process. An advantage is that this can also be done after running the simulation. An analytic expression for an appropriate rescaling factor has not been found yet, and is subject to current work. The scaling factor is different to the factor of the since the convergence to the logistic distribution is caused by different characteristics for the two processes.
A second source of error is again the finite time, a neuron needs to be in equilibrium after a change of an interacting neuron or due to a change of the regime of itself. The relative amount of time, where the process is in non-equilibrium, and therefore resulting errors increase for larger values of .
An advantage of the sign-dependent Ornstein-Uhlenbeck process is the possibility to extrapolate results for different values of to the limit of , i.e., to exact results of the Boltzmann machine. In the case of an additional scaling factor, it has to be found for a possible extrapolation a dependency between the scaling factor and epsilon . Figure 6 compares numerical scaling factors of the process with the known scaling factor of the for different biases in the free case, i.e. for the activation function. A disadvantage of the process is that smaller values of lead to larger correlation times and therefore to a higher simulation cost. This results from the limitation that it is not possible to accelerate the dynamics by an adaptation of the noise source, as it is the case for the . From another perspective, this property might even help to straighten out problems related to the hardware, like non-trivial post-synaptic shapes, for example.
A comparison of the dynamics in (IV.3) with the mapping of the Boltzmann machine in equation (IV.2) shows that the self-interaction and the dynamics of the discrete Langevin machine are key ingredients for a successful mapping onto the spiking system. The properties of the process are dominated essentially by the self-interacting term. Therefore, the process is not just an Ornstein-Uhlenbeck process, but represents a new kind of dynamics with a different resulting equilibrium distribution and, up to now, non-investigated properties. Similar dynamics which contain projected values of interacting potentials might serve as a starting point for an entire new class of dynamics.
iv.4 Refractory Mechanism
A possible further step towards LIF sampling is to take into account a refractory mechanism. This step is given in Figure 3 by the level of abstraction (d). The refractory mechanism can be also considered for a discrete system, for example for the Boltzmann machine. This approach represents a different ordering of the different abstractions of Figure 3.
In a simplified model, it can be assumed that a neuron stays active for the refractory time , after it got activated. An imbalance between the active and the inactive state is caused by this property. This asymmetry can be compensated by reducing the transition probability to become active by a factor of , as discussed in Buesing et al. (2011). The factor can be absorbed into the membrane potential by a shift of the activation function by , i.e., by . Note that the sign-dependent processes lead to a reformulation of the neuron computability condition of Buesing et al. (2011) due to the inherent dependency of the dynamics on the neuron state itself.
For the cumulative Gaussian distribution, an absorption of the factor of is not possible anymore. The resulting activation function with a finite refractory time is deformed. The deformation gets larger for larger refractory times, as can be seen in Figure 7. We conclude that the errors of the activation function to the logistic distribution without a refractory mechanism propagate and increase for dynamics with finite refractory times . The resulting deformation of the activation function can be identified as a further source of error.
Within the last level of abstraction of Figure 3, interactions between neurons or with the neuron itself are in general not constant. The so called post-synaptic potential (PSP) corresponds to the received input potential of an interacting neuron Petrovici et al. (2016). In Appendix E, the relation between a correct implementation of the weights based on the interaction kernel is discussed in more detail. In this work, only rectangular PSP shapes are considered. An investigation of exponential PSP shapes is postponed to future work.
It is important to distinguish between the refractory mechanism as a property of each neuron itself and the post-synaptic potential. The latter needs only to be taken into account if interactions between neurons are considered. In particular, this means that the PSP shape affects only the activation function of the sign-dependent processes due to their self-interacting contribution.
V Discrete Langevin machine
The Ornstein-Uhlenbeck process with a spiking dynamics and correlated noise offers the possibility to simulate a discrete two-state system by an underlying continuous dynamics. The discrete Langevin machine can be interpreted as a discrete counterpart to those spiking systems with uncorrelated noise, as indicated in Figure 1. It has been shown, that it is possible to map different realisations of simplified theoretical models of the neuromorphic hardware system onto a two-state system (see Figure 3 as well as the dynamics (IV.2) (23) and (IV.3) (13)). Denoting the hardware as , with parameters , and the discrete Langevin machine as , with , we can state the important relation,
i.e., there exists a discrete two-state system for each set of parameters of the hardware which emulates the dynamics of the spiking system in discrete space.
In terms of update dynamics this corresponds to the mapping of
onto a discrete dynamics
for all realisations of and with .
Relation (29) and the formal introduction of the discrete Langevin machine can be seen as a theoretical framework to describe different possible implementations as well as several levels of abstraction of LIF sampling in terms of processes in real time with a continuous membrane potential and spiking character and processes in computer time with discrete states. For an exact mapping , the magnitudes of the sources of error have to be matched. Table 2 gives an overview of the presented theoretical models and their properties regarding different levels of abstraction of a neuromorphic system.
|Gibbs Sampling (BM)|
|logistic distribution||logistic distribution||logistic distribution||logistic distribution||logistic distribution||cumulative Gaussian distribution||cumulative Gaussian distribution|
|Time scale||computer time||computer time||computer time||real time||real time||computer time||real time|
|Extrapolation to exact solution?||-||no||yes||no||yes||-||-|
|Extrapolation to exact solution?||-||no||yes||no||yes||-||-|
|Control of a refractory mechanism||exact Buesing et al. (2011)||constant shift||constant shift||constant shift||constant shift||non-trivial shift||non-trivial shift|
Numerical results are discussed for the Langevin equation for discrete systems of Section II, for the newly introduced sign-dependent Ornstein-Uhlenbeck process as well as for existing approaches. We start with an analysis of the Clock model in Subsection VI.1. Dynamics and equilibrium distributions of the free membrane potential are compared in Subsections VI.2 and VI.3 for the discrete Langevin machine and for abstractions of the neuromorphic hardware system, according to Figure 3 and Table 2. The focus is on a correct implementation of the logistic distribution of the Boltzmann machine and on a detailed analysis of the impact of different sources of errors. The systems are considered with and without an asymmetric refractory mechanism with a rectangular post-synaptic shape. The Section ends with a computation of the Ising model by a projection of the model on the Boltzmann machine and with a numerical investigation of a Boltzmann machine with three neurons in subsection VI.4. Both models serve as a benchmark for Boltzmann distributed systems with interacting neurons.
vi.1 -state Clock Model
The -state clock model Potts (1951, 1952) describes spins with different states which are parametrised by . It is used to verify numerically the Langevin equation for discrete systems, as a first example. The model has the following Hamiltonian,
The sum runs over all nearest neighbour spin pairs . In a complex plane one can interpret the spin states as equally distributed states on an unit circle. The common Potts model Wu (1982) is derived from this initial model. For the model corresponds to the Ising model and in the limit of , it describes the continuous XY-model. For the system emulates two independent Ising models.
The clock model exhibits for
a second order phase transition. It exists an exact solution for the inverse critical temperature for, which is as follows Lapilli et al. (2006),
where the Boltzmann constant has been set to . An appropriate order parameter for the system is the average magnetization per spin, which can be defined as
where the sum runs over all spins of a lattice with sites. The specific heat capacity per spin is considered as a further observable and is defined as
where denotes the expectation value Newman and Barkema (1999).
Numerical results for the magnetization and the specific heat are illustrated in Figure (a)a and Figure (b)b in dependency of the inverse temperature and of different values of . Results of the Metropolis algorithm as a standard Monte Carlo algorithm (MC) serve as benchmark. The inverse critical temperature can be read off from the maximum of the specific heat. In Figure 9, the relative deviations of the inverse critical temperature to the inverse critical temperature of the Metropolis algorithm are plotted against .
The resulting deviations for finite values of can be explained by a detailed error analysis of the transition probabilities of the Langevin equation for discrete systems. For this purpose, one has to check the compliance of the detailed balance equation,
In Figure (a)a, it can be seen that the absolute error of the cumulative Gaussian distribution is asymmetric around . This imbalance leads to a shift of the effective fraction of transition probabilities and therefore to a change of the equilibrium distribution of the spin states. The strength of this shift grows for larger values of , which corresponds to larger values of , and larger values of . The effect can be nicely observed in the change of the specific heat in Figure (b)b with growing . In general, it holds: the larger , the worse is the compliance of the detailed balance equation and the larger is the resulting shift of the equilibrium distribution.
vi.2 Neuromorphic Hardware vs. Langevin Machine
We analyse numerically the mapping between dynamics of the discrete Langevin machine and the continuous dynamics according to relation (29) by an explicit consideration of transition probabilities. It is discussed the impact of deviations in the transition probabilities as well as a mapping of the temporal evolution of the different processes onto each other with respect to resulting activation functions for the free membrane potential.
Differences of the two dynamics which are given by construction are illustrated in Figure 1. The processes correspond to the levels of abstraction (b) and (c) of a neurmorphic hardware system in Figure 3. The essential differences are the source of noise which is for the Langevin machine uncorrelated and for the Ornstein-Uhlenbeck process correlated as well as the representation of a microscopic state. The dynamics is described in the former case by two discrete states in computer time and in the latter one by the evolution of a continuous membrane potential with spiking character in real time. We evaluate the impact of the different sources of errors on the deviation to the expected logistic distribution for the sign-dependent and for the fitted processes: , , and .
vi.2.1 Activation Function
Figures 7 and (a)a illustrate the activation functions of the free membrane potential in dependency of the bias in the network for the different presented dynamics. The results of the and the process coincide exactly and their deviation to the cumulative Gaussian distribution emerges from numerical errors. In concordance to these observations, the fitted and process have the same deviations to the logistic distribution in Figure (a)a. In the case of the and the process, the observed deviations mirror the theoretical errors for finite values of . As depicted in Figure (c)c, both activation functions converge in the limit of . The rate of convergence of the process is much smaller than the one of the for equal values of . This can be reasoned by the different sources of errors for the two processes, as discussed in detail at the ends of Sections III and IV.3. The results of the process motivate the research for an analytic or numerical expression of the dependency . In contrast to the and , deviations to the logistic distribution are limited only due to larger correlation times for smaller values of for the process.
vi.2.2 Dynamics - Time Evolution
It has been found numerically that the computer time and the real time coincide for the and the Ornstein-Uhlenbeck process. All simulations in real time are performed with finite time steps of . All processes in computer time are computed with a random sequential update formalism and in real time by a parallel update scheme. The time scale in all figures is chosen in units of the computer time.
Figure 11 compares trajectories of the different discussed processes with respect to a uniform time scale. It can be observed in the evolution of the membrane potential for all processes that there occur fast changes if the membrane potential is close to the threshold value . These perturbations seem to have no influence on the time evolution and the equilibrium distribution.
As discussed in Section II, a scaling factor can be found for a correct mapping of the temporal evolution of two processes A and B if both processes exhibit the same equilibrium distribution. The scaling factor is given under usage of equation (11) and by a computation of the transition probabilities by
Analytic expressions for the transition probabilities of the considered processes are given in Table 1. The given transition probabilities have been validated numerically. For that purpose we have mapped the temporal ensemble evolution of the different dynamics onto the evolution of the Boltzmann machine with respect to the computed scaling factors. A scaling factor reflects the increase/decrease of the correlation time for processes with different transition probabilities. In Figure 12, the dependency of the scaling factor on is plotted for the sign-dependent processes.
The considerations of the time evolution reinforce that the relation of equation (29) corresponds to an exact mapping of the dynamics of a discrete system with uncorrelated noise to a continuous system with correlated noise. This property is not self-evident. However, the dependency is in some cases non-trivial, due to different occurring sources of errors of the considered models.
vi.3 Refractory Mechanism
In this Section we investigate the impact of an asymmetric refractory mechanism of a neuromorphic system with a rectangular PSP shape. This has been introduced in Section IV.4 as the level of abstraction (d) with regard to Figure 3.
vi.3.1 Control of the Refractory Mechanism
We concentrate on a correct representation of the logistic function with a refractory mechanism. The imbalance between the inactive and the active state can in general not be compensated entirely by a trivial shift of the bias by for the cumulative Gaussian distribution. The activation functions of the sign-dependent processes and the fitted dynamics are close to a logistic distribution. Therefore, a shift can be used to approximate the logistic distribution with a refractory mechanism. However, for large refractory times this approximation gets worse, as indicated in Figure 7.
In this work, the shift of the activation function is determined by the constraint that . The actual shifts of the bias are slightly different for the and the as a consequence of resulting deformations of the cumulative Gaussian distribution for larger refractory times (see Figure 7). We also introduce a further time constant . This allows us to distinguish clearly between the refractory time of a neuron and the resulting optimal shift for a correct fixing of the activation function. Ideally, one should derive a dependency to preserve a consistent fixing of the activation function.
For the process it holds true that , since the deviation of the activation function to the logistic distribution is nearly symmetric around . Nevertheless, a shift by leads to a worse approximation of the transition probability, since the Taylor expansion of the cumulative Gaussian distribution of the dynamics is performed around . This is a further error source.
For the process, the necessary shift of the bias by is much larger than . Dependencies of for fixed values of epsilon and for are illustrated in Figure 13. The large differences in and in can be traced back to the different microscopic dynamics of the processes and to the different origin of a correct implementation of the activation function for the processes without a refractory mechanism.
In contrast to the process, the dynamics of the process fluctuates between the active and the inactivate regime which are distinguished and driven by the self-interacting contribution. Integrating a refractory mechanism, a change from the active to the inactive regime by the self-interacting term is suppressed as long as the neuron is captured in its refractory mode. This can be seen in the lower plot of Figure (b)b. Consequently, the distributions and have a dissimilar impact on the resulting distribution of . The lower tail distribution of , i.e. the distribution for , biases the distribution around the threshold within the refractory time. In contrast, the upper tail distribution of does not affect , since the dynamics changes for from the inactive to the active regime. The local minimum of around as well as the entire distribution are shifted to smaller values, as a result of this asymmetry, as illustrated in Figure (b)b. Further, the absolute value of the minimum is larger than the one for the process without a refractory mechanism. The imbalance between and results in larger deviations of the activation function for the process with a refractory mechanism. This asymmetry corresponds to a further source of error.
The equilibrium distribution needs to undergo a larger shift by than the process for a compensation of the impact of the refractory mechanism. This is a consequence of a partially suppression of the change of the dynamics to the inactive regime. For the process, the underlying dynamics is not affected by the refractory mode due to the absence of a self-interacting term. Therefore, the only purpose of the shift by is to fix the transition probabilities to correctly compensate the emerging asymmetry of the refractory mechanism. Respectively, the resulting transition probabilities are expressed in dependence of . Analytic expressions are given in Table 1.
vi.3.2 Activation Function and Time Evolution
Figure (b)b compares the impact of a refractory mechanism on the different dynamics regarding their deviations to the logistic distribution. The deviations of the and the process have increased, as expected by the introduced asymmetry of the refractory mechanism. Nevertheless, the error vanishes for , as illustrated in Figure (c)c. Further, Figure (d)d shows that a further increase of the refractory time has a very low impact on the deviations which ensures an applicability for large refractory times, in practice. As discussed in Section IV.4, the cumulative Gaussian distribution is non-symmetrically deformed by the shift by . This leads to deviations in the activation function of the and the process that can be compensated to a certain extent by an adaptation of the variance, i.e. of the scaling parameter .
We conclude that a refractory mechanism with rectangular PSP shape has no impact on a possible control of the sources of errors for the sign-dependent processes.
vi.4 Interacting Systems
We consider the Ising model Ising (1925) and the Boltzmann machine Ackley et al. (1985) to investigate the presented abstractions of a neuromorphic hardware system with interactions between neurons. The Ising model can be easily mapped onto the Boltzmann machine. A numerical analysis can be understood as a proof of concept that the presented processes also work in a more complex network set-up. As a second model, we study a Boltzmann machine with three neurons. We compare the results for all presented models with and without a refractory mechanism with a rectangular PSP shape.
The Ising model describes a two-state spin system. The spin states are , which are likewise also referred to as spin up and spin down . The Hamiltonian is defined as
The external magnetic field is set to zero in the following numerical analysis and is some coupling constant. For this particular case, we can consider the averaged absolute value of the magnetization per spin as an order parameter. This is then given by
where the sum runs again over all spins of the lattice for a given configuration. From theoretical considerations, an exact expression for the inverse critical temperature of the model with a vanishing external field can be obtained Onsager (1944),
For a computation with the presented algorithms, we need a mapping between the Boltzmann machine and the Ising model onto the correct domain of definition. The mapping of and can be obtained by the following identifications between and and and Petrovici (2016); Bytschok et al. (2017),
where corresponds to the dimension of the system. The spin state can be computed by .
The Boltzmann machine can have an arbitrarily complex network structure. Particular implementations like the restricted Boltzmann machine turn the Boltzmann machine to an interesting class of networks, which has many applications in different areas of research, see e.g.Czischek et al. (2018); Carleo and Troyer (2017); Gao Xun and Duan Lu-Ming (2017); Montúfar (2018). To study the impact of systems with a higher possible variability, we consider a Boltzmann machine with three neurons and different weights an biases around zero. The Kullback-Leibler divergence Kullback and Leibler (1951)
serves as a measure to numerically classify the quality of the presented processes. We compute the Kullback-Leibler divergence based on the history of a process, starting from a random initial state according to
BM indicates the exact probability distribution of the Boltzmann machine and AM corresponds to the approximated probability distribution of some other model. The sum runs over all possible neuron configurations . The probabilities are approximated by the corresponding histograms of the history of the trajectory in the configuration space.
Figures (a)a and (b)b show the absolute value of the magnetization for the Ising model with a vanishing external magnetic field for the dynamics without and with a refractory mechanism. The observables are computed for the and the process for different values of and for the and the process. In Figure 9, the deviation of the derived inverse critical temperatures is plotted in dependency of for the processes without a refractory mechanism. Figure 15 illustrates the convergence of the considered processes for vanishing . The resulting deviations reflect the magnitude of errors in the representation of the activation function. This reinforces, together with the results of Figure 14, the argument that already small changes in the activation function can lead to large deviations in the resulting observables. This argument also explains the partially worse performance of the processes for the dynamics with a refractory mechanism.
The observables exhibit for the and the process the same tendency as the results for the -state clock model, despite their different sources of errors. The equilibrium distributions are shifted to smaller values of as described in the discussion of Section VI.1. As before, the shift grows with larger values of and of . The similar behaviour of the process can be justified by the similar trend in the deviation of the activation functions of the two processes. The higher rate of convergence of the process is a result of the different source of errors.
Vii Conclusions and Outlook
The newly introduced dynamics pave the way for possible new applications and the discovery of new physics. This includes, for example, a formulation of Langevin dynamics for discrete systems with respect to a possible computation of Hamiltonians with a complex contribution, similar to complex Langevin dynamics Aarts and Stamatescu (2008); Aarts et al. (2013). A further interesting task is to investigate the novel network architecture of the sign-dependent Ornstein-Uhlenbeck process (), given by the dynamics (IV.3) in Section IV, with its particularity of a self-interacting term and resulting fluctuating dynamics, on a neuromorphic hardware system.
The numerical analysis of different abstractions of a LIF network in Section VI demonstrates that the novel network architecture of the sign-dependent discrete Langevin machine () and the process is suitable for an exact computation of correlation functions of Boltzmann distributed systems. This applies to both, a discrete two-state system with uncorrelated noise and a continuous system with autocorrelated noise. The numerical results show that an exact implementation of the logistic distribution or at least a correct estimation of errors is necessary to obtain quantitative exact observables.
It remains to be seen whether this statement is also sufficient and valid for non-trivial PSP shapes, as a last step towards LIF sampling. In particular, one has to analyse the impact of marginal deviations to the activation function on observables of larger and more complex systems than the one considered in this work. Moreover, one may ask whether an exact representation of an activation function with a self-interacting term is sufficient to also obtain reliable and accurate results for interacting neurons independent of the interaction kernel, i.e. the post-synaptic potential, respectively. In other words: Is it possible to extend findings for a single self-interacting neuron to a general complex interacting system. These questions are postponed to future work. Either way, we expect that the existence of a self-interacting contribution in the process helps to better understand arising non-linearities of the neuromorphic hardware.
) represents a further step towards an integration of deep learning and neuroscienceMarblestone et al. (2016); Neftci et al. (2014); Leng et al. (2018). We believe that the present work offers a tool for a better comparison of classical artificial networks and neuromorphic networks.
We thank M. A. Petrovici, A. Baumbach and K. Meier for discussions and collaboration on related subjects. This work is supported by Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy EXC-2181/1 - 390900948 (the Heidelberg Excellence Cluster STRUCTURES).
Appendix A Metropolis Algorithm vs. Langevin Dynamics
The Section reviews shortly properties of the Metropolis algorithm Metropolis et al. (1953) and Langevin dynamics for continuous systems to get a first intuition on how a possible Langevin equation for a discrete system could look like. The considerations are inspired by the work in Baillie and Johnston (1989); Meakin et al. (1983); Ettelaie and Moore (1984).
where corresponds to the Euclidean action, which depends on fields on a (3+1) dimensional hypercubic lattice in Euclidean space. The Langevin equation describes the evolution of the quantum fields in an additional fictious time dimension, the Langevin time . Quantum fluctuations are emulated by the additional white Gaussian noise term with the properties to be uncorrelated,
It can be shown that in equilibrium the resulting distribution of the fields coincides with the Boltzmann distribution: . A common approach to prove this is to derive the equivalence of the Langevin equation and the Fokker-Planck equation in a first step and to compute the static solution in a second step Damgaard and Huffel (1987). This property renders Langevin dynamics a powerful tool in QCD Aarts and Stamatescu (2008); Aarts et al. (2013).
a.1 Transition Probability of the Langevin Equation
The transition probabilities of the discrete Langevin equation are computed, in the following. These are used in a second step for an interpretation of the dynamics as a standard Monte Carlo algorithm. Starting from the discrete Langevin equation
with: and , it is straightforward to compute the transition probabilities of an infinitesimal change,
Inserting the standard normal distribution and computing the square in the exponent one obtains