# Sparse Coding by Spiking Neural Networks: Convergence Theory and Computational Results

In a spiking neural network (SNN), individual neurons operate autonomously and communicate with other neurons only sparingly and asynchronously via spike signals. These characteristics render a massively parallel hardware implementation of an SNN a potentially powerful computer, albeit a non-von Neumann one. But can one guarantee that an SNN computer solves some important problems reliably? In this paper, we formulate a mathematical model of an SNN that can be configured for a sparse coding problem for feature extraction. With a moderate but well-defined assumption, we prove that the SNN indeed solves sparse coding. To the best of our knowledge, this is the first rigorous result of this kind.


## 1 Introduction

A central question in computational neuroscience is to understand how complex computations emerge from networks of neurons. For neuroscientists, a key pursuit is to formulate neural network models that resemble the researchers’ understanding of physical neural activities and functionalities. Precise mathematical definitions or analyses of such models are less important in comparison. For computer scientists, on the other hand, a key pursuit is often to devise new solvers for specific computational problems. Understanding of neural activities serves mainly as an inspiration for formulating neural network models; the model actually adopted need not faithfully reflect actual neural activities so much as be mathematically well defined and possess provable properties such as stability or convergence to the solution of the computational problem at hand.

This paper’s goal is that of a computer scientist. We formulate here two neural network models that can provably solve a mixed $\ell_2$-$\ell_1$ optimization problem (often called a LASSO problem). LASSO is a workhorse for sparse coding, a method applicable across machine learning, signal processing, and statistics. In this work, we provide a framework to rigorously establish the convergence of firing rates in a spiking neural network to solutions of a LASSO problem. This network model, namely the Spiking LCA, was first proposed in [16] to implement the LCA model [15] using analog integrate-and-fire neuron circuits. For clarity, we will call the LCA model in [15] the Analog LCA (A-LCA). In the next section, we introduce the A-LCA model and its configurations for LASSO and its constrained variant CLASSO. A-LCA is a form of Hopfield network, but the specific (C)LASSO configurations render convergence difficult to establish. We will outline our recent results that use a suitable generalization of the LaSalle principle to show that A-LCA converges to (C)LASSO solutions.

In A-LCA, neurons communicate among themselves with real numbers (analog values) during certain time intervals. In Spiking LCA (S-LCA), neurons communicate among themselves via “spike” (digital) signals that can be encoded with a single bit. Moreover, communication occurs only at specific time instances. Consequently, S-LCA is much more communication-efficient. Section 3 formulates S-LCA and auxiliary variables such as average soma currents and instantaneous spike rates. The section subsequently provides a proof that the instantaneous rates converge to CLASSO solutions. This proof builds upon the results we obtained for A-LCA and an assumption that a neuron’s inter-spike duration cannot be arbitrarily long unless it stops spiking altogether after a finite time.

Finally, we devise a numerical implementation of S-LCA and empirically demonstrate its convergence to CLASSO solutions. Our implementation also showcases the potential power of problem solving with spiking neurons in practice: when an approximate implementation of S-LCA is run on a conventional CPU, it is able to converge to a solution with modest accuracy in a short amount of time. The convergence is even faster than FISTA [4], one of the fastest LASSO solvers. This result suggests that specialized spiking-neuron hardware is promising, as parallelism and sparse communication between neurons can be fully leveraged in such an architecture.

## 2 Sparse Coding by Analog LCA Neural Network

We formulate the sparse coding problem as follows. Given $N$ vectors $\phi_1, \ldots, \phi_N$ in $\mathbb{R}^M$, $N > M$ ($\Phi = [\phi_1 \cdots \phi_N]$ is usually called a redundant dictionary, redundant because $N > M$), and a vector $s \in \mathbb{R}^M$ (consider it an input signal), try to code (approximate well) $s$ as $\Phi a$, where $a \in \mathbb{R}^N$ contains as many zero entries as possible. Solving a sparse coding problem has attracted a tremendous amount of research effort [9]. One effective way is to arrive at $a$ through solving the LASSO problem [19], where one minimizes the distance between $s$ and $\Phi a$ with an $\ell_1$ regularization on the parameters $a$. For reasons that will become clear later on, we will consider this problem with the additional requirement that $a$ be non-negative: $a \ge 0$. We call this the CLASSO (C for constrained) problem:

$$\operatorname*{argmin}_{a \ge 0}\; \frac{1}{2}\|s - \Phi a\|_2^2 + \lambda \|a\|_1 \tag{1}$$
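As a point of reference for problem (1), here is a minimal Python/NumPy sketch of a conventional CLASSO solver. It uses projected proximal gradient descent (nonnegative ISTA), a standard first-order method swapped in purely for illustration; it is neither the neural network approach of this paper nor FISTA. The names `classo_objective` and `classo_ista` and the iteration count are hypothetical choices, and the data are the 3-atom example of Section 4.1.

```python
import numpy as np


def classo_objective(a, Phi, s, lam):
    """Objective of Equation (1): 0.5 * ||s - Phi a||_2^2 + lam * ||a||_1."""
    r = s - Phi @ a
    return 0.5 * float(r @ r) + lam * float(np.abs(a).sum())


def classo_ista(Phi, s, lam, iters=5000):
    """Hypothetical reference solver: projected proximal gradient (nonnegative ISTA).

    Each step takes a gradient step on the quadratic term and then applies the
    proximal map of lam*||a||_1 plus the constraint a >= 0, which is the
    one-sided soft threshold max(z - lam/L, 0).
    """
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the quadratic term's gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(iters):
        grad = Phi.T @ (Phi @ a - s)         # gradient of 0.5 * ||s - Phi a||^2
        a = np.maximum(a - (grad + lam) / L, 0.0)
    return a


# The 3-atom example of Section 4.1.
Phi = np.array([[0.3313, 0.8148, 0.4364],
                [0.8835, 0.3621, 0.2182],
                [0.3313, 0.4527, 0.8729]])
s = np.array([0.5, 1.0, 1.5])
a_star = classo_ista(Phi, s, lam=0.1)
print(np.round(a_star, 3), classo_objective(a_star, Phi, s, 0.1))
```

The second entry of `a_star` comes out exactly zero, consistent with the observation in Section 4.1 that Neuron 2 becomes inactive.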

Rozell, et al., presented in [15] the first neural network model aimed at solving LASSO. $N$ neurons are used, one for each of the $N$ dictionary atoms $\phi_i$. Each neuron receives an input signal $b_i$ that serves to increase a “potential” value $u_i(t)$ that the neuron keeps over time. When this potential is above a certain threshold, neuron-$i$ sends inhibitory signals that aim to reduce the potential values of the list of receiving neurons with which neuron-$i$ “competes.” The authors called the algorithms expressed in this neural network mechanism Locally Competitive Algorithms (LCAs). In this paper, we call this model analog LCA (A-LCA). Mathematically, an A-LCA can be described as a set of ordinary differential equations (a dynamical system) of the form

$$\dot u_i(t) = b_i - u_i(t) - \sum_{j \neq i} w_{ij}\, T(u_j(t)), \qquad i = 1, 2, \ldots, N.$$

The function $T(\cdot)$ is a thresholding (also known as an activation) function that decides when and how an inhibition signal is sent. The coefficients $w_{ij}$ further weigh the severity of each inhibition signal. In this general form, A-LCA is an instantiation of the Hopfield network proposed in [10, 11].

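Given a LASSO or CLASSO problem, A-LCA is configured by $b_i = \phi_i^T s$ and $w_{ij} = \phi_i^T \phi_j$. For LASSO, the thresholding function is set to $T = T_{\pm\lambda}$, and for CLASSO it is set to $T = T_\lambda$: $T_\lambda(x)$ is defined as $0$ when $x \le \lambda$ and $x - \lambda$ when $x > \lambda$; and $T_{\pm\lambda}(x) = T_\lambda(x) - T_\lambda(-x)$. Note that if all the $\phi_i$s are normalized to unit Euclidean norm, $\|\phi_i\|_2 = 1$, then the dynamical system in vector notation is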

$$\dot u = b - u - (\Phi^T\Phi - I)\,a, \qquad a = T(u). \tag{2}$$

The vector function $T: \mathbb{R}^N \to \mathbb{R}^N$ simply applies the same scalar function $T$ to each component of its input vector. We say A-LCA solves (C)LASSO if a particular solution $u(t)$ of the dynamical system converges to a vector $u^*$ and $a^* = T(u^*)$ is the optimal solution for (C)LASSO. This convergence phenomenon was demonstrated in [15].
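The thresholding functions and the right-hand side of Equation 2 translate directly into NumPy. The helper names below are hypothetical, and the sketch assumes unit-norm dictionary columns, as Equation 2 does.

```python
import numpy as np


def T_lam(x, lam):
    """CLASSO thresholding T_lambda: 0 if x <= lam, x - lam otherwise (elementwise)."""
    return np.maximum(x - lam, 0.0)


def T_pm_lam(x, lam):
    """LASSO thresholding T_{±lambda}(x) = T_lambda(x) - T_lambda(-x), i.e. soft thresholding."""
    return T_lam(x, lam) - T_lam(-x, lam)


def alca_rhs(u, Phi, s, lam, constrained=True):
    """Right-hand side F(u) of Equation (2), with b = Phi^T s and weights Phi^T Phi - I.

    Assumes the columns of Phi have unit Euclidean norm, so the diagonal of
    Phi^T Phi - I is zero and no neuron inhibits itself.
    """
    T = T_lam if constrained else T_pm_lam
    b = Phi.T @ s
    W = Phi.T @ Phi - np.eye(Phi.shape[1])
    return b - u - W @ T(u, lam)


x = np.array([-0.5, 0.05, 0.3])
print(T_lam(x, 0.1))     # one-sided: only the entry above 0.1 survives
print(T_pm_lam(x, 0.1))  # two-sided: entries with |x| > 0.1 shrink toward zero
```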

A-LCA need not be realized on a traditional computer via some classical numerical differential equation solver; one can realize it using, for example, an analog circuit, which may in fact be able to solve (C)LASSO faster or with less energy. From the point of view of establishing A-LCA as a robust way to solve (C)LASSO, rigorous mathematical results on A-LCA’s convergence are invaluable. Furthermore, any convergence theory here is bound to have bearings on other neural network architectures, as we will see in Section 3. Had the thresholding function in A-LCA been strictly increasing and unbounded above and below, standard Lyapunov theory could be applied to establish convergence of the dynamical system. This was already pointed out in Hopfield’s early work for both the graded neuron model [11] and the spiking neuron model [12]. Nevertheless, such an A-LCA does not correspond to (C)LASSO, where the thresholding functions are not strictly increasing. Furthermore, the CLASSO thresholding function is bounded below as well. While Rozell, et al., demonstrated some convergence phenomena [15], it is in two later works [1, 2] that Rozell and other colleagues attempted to complement the original work with convergence analysis and proofs. Among other results, these works stated that for any particular A-LCA solution $u(t)$, $a(t) = T_{\pm\lambda}(u(t))$ converges to a LASSO optimal solution. Unfortunately, as detailed in [18], there are major gaps in the related proofs, and thus the convergence claims are in doubt. Moreover, the case of $T = T_\lambda$ for the CLASSO problem was not addressed. In [18], one of the present authors established several convergence results, which we now summarize so as to support the development of Section 3. The interested reader can refer to [18] for complete details.

A-LCA is a dynamical system of the form $\dot u = F(u)$, $F: \mathbb{R}^N \to \mathbb{R}^N$. In this case, the function $F$ is defined as $F(u) = b - u - (\Phi^T\Phi - I)\,T(u)$. Given any “starting point” $u^{(0)} \in \mathbb{R}^N$, standard theory of ordinary differential equations shows that there is a unique solution $u(t)$ such that $u(0) = u^{(0)}$ and $\dot u(t) = F(u(t))$ for all $t \ge 0$. Solutions are also commonly called flows. The two key questions are (1) given some (or any) starting point $u^{(0)}$, whether and in what sense the flow converges, and (2) if so, what relationships exist between the limiting process and the (C)LASSO solutions.

The LaSalle invariance principle [14] is a powerful tool to help answer the first question. The gist of the principle is that if one can construct a function $V: \mathbb{R}^N \to \mathbb{R}$ that is non-increasing along any flow, then one can conclude that all flows must converge (in the sense that their distance to the set tends to zero) to a special set $\mathcal{M}$, which is the largest positive invariant set (a set is positive invariant if any flow originating from the set stays in that set forever) inside the set of points at which the Lie derivative $\dot V(u) = \nabla V(u)^T F(u)$ is zero. The crucial technical requirements on $V$ are that $V$ possesses continuous partial derivatives and is radially unbounded (that is, $V(u) \to \infty$ whenever $\|u\| \to \infty$). Unfortunately, the natural choice of $V$ for A-LCA does not have continuous first partial derivatives everywhere, and is not radially unbounded in the case of CLASSO. Both failures are due to the special form of the thresholding function $T$. Based on a generalized version of LaSalle’s principle proved in [18], we establish that any A-LCA flow (LASSO or CLASSO) converges to $\mathcal{M}$, the largest positive invariant set inside the “stationary” set $\{u \mid \dot V(u) = 0\}$.

Having established convergence to $\mathcal{M}$, we further prove in [18] that $\mathcal{M}$ is in fact the inverse image under $T$ of the set of optimal (C)LASSO solutions. The proof is based on the KKT [6] condition that characterizes the optimal (C)LASSO solutions, together with properties particular to A-LCA.

###### Theorem 1.

(A-LCA convergence results from [18]) Given the A-LCA

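$$\dot u = F(u), \qquad F(u) = b - u - (\Phi^T\Phi - I)\,T(u),$$

where $T$ is $T_{\pm\lambda}$ if one wants to solve LASSO and $T_\lambda$ if one wants to solve CLASSO. Let $u^{(0)}$ be an arbitrary starting point and $u(t)$ be the corresponding flow. The following hold:

1. Let $\mathcal{S}$ be the set of (C)LASSO optimal solutions and $\mathcal{F} = T^{-1}(\mathcal{S})$ be $\mathcal{S}$’s inverse image under the corresponding thresholding function $T$. Then any arbitrary flow $u(t)$ always converges to the set $\mathcal{F}$.

2. Moreover, $\lim_{t \to \infty} E(a(t)) = E^*$, where $E^*$ is the optimal objective function value of (C)LASSO, $E(a) = \frac{1}{2}\|s - \Phi a\|_2^2 + \lambda\|a\|_1$, and $a(t) = T(u(t))$.

3. Finally, when the (C)LASSO optimal solution $a^*$ is unique, there is a unique $u^*$ such that $F(u^*) = 0$. Furthermore, $u(t) \to u^*$ and $T(u(t)) \to a^*$ as $t \to \infty$.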
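Theorem 1 can be illustrated on a small instance by integrating the A-LCA ODE numerically. The sketch below uses forward Euler (an assumed discretization; any ODE solver would do) on the 3-neuron CLASSO example of Section 4.1, starting from $u(0) = 0$; the function name, step size, and horizon are hypothetical.

```python
import numpy as np


def alca_integrate(Phi, s, lam, dt=0.01, t_end=50.0):
    """Forward-Euler integration of the CLASSO-configured A-LCA (Equation 2) from u(0) = 0.

    dt and t_end are illustrative choices. Returns the final state u and a = T_lambda(u).
    """
    b = Phi.T @ s
    W = Phi.T @ Phi - np.eye(Phi.shape[1])   # Phi^T Phi - I
    T = lambda x: np.maximum(x - lam, 0.0)   # T_lambda for CLASSO
    u = np.zeros_like(b)
    for _ in range(int(round(t_end / dt))):
        u = u + dt * (b - u - W @ T(u))      # one Euler step of du/dt = F(u)
    return u, T(u)


Phi = np.array([[0.3313, 0.8148, 0.4364],
                [0.8835, 0.3621, 0.2182],
                [0.3313, 0.4527, 0.8729]])
s = np.array([0.5, 1.0, 1.5])
u, a = alca_integrate(Phi, s, lam=0.1)
residual = np.linalg.norm(Phi.T @ s - u - (Phi.T @ Phi - np.eye(3)) @ a)   # ||F(u)||
print(a, residual)
```

The residual $\|F(u)\|$ shrinks toward zero, and, as item 3 of Theorem 1 predicts for a unique CLASSO solution, $T_\lambda(u)$ settles on the same vector that a conventional CLASSO solver returns.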

## 3 Sparse Coding by Spiking LCA Neural Network

A-LCA is inherently communication efficient: neuron-$i$ needs to communicate with others only when its internal state exceeds a threshold, namely when $T(u_i(t)) \neq 0$. In a sparse coding problem, the internal state is expected to eventually stay perpetually below the threshold for many neurons. Nevertheless, for the entire duration during which a neuron’s internal state is above threshold, constant communication is required. Furthermore, the values to be sent to other neurons are real-valued (analog) in nature. From this perspective, a spiking neural network (SNN) model holds the promise of even greater communication efficiency. In a typical SNN, various internal states of a neuron are also continually evolving. In contrast, however, communication in the form of a spike (that is, one bit) is sent to other neurons only when a certain internal state reaches a level (a firing threshold). This internal state is reset right after the spiking event, thus cutting off communication immediately until the internal state is “charged up” enough again. Thus communication is necessary only once in a certain time span, and a single bit of information carrier then suffices.

While such an SNN admits mathematical descriptions [16, 3], there have hitherto been no rigorous results on the network’s convergence behavior. In particular, it is unclear how an SNN can be configured to solve specific problems with some guarantees. We now present a mathematical formulation of an SNN and a natural definition of instantaneous spiking rate. Our main result is that, under a moderate assumption, the spiking rate converges to the CLASSO solution when the SNN is suitably configured. To the best of our knowledge, this is the first time a rigorous result of this kind has been established.

In an SNN, each of the $N$ neurons maintains, over time $t$, an internal soma current $\mu_i(t)$ configured to receive a constant input $b_i$, and an internal potential $v_i(t)$. The potential is “charged” up according to $\dot v_i(t) = \mu_i(t) - \lambda$, where $\lambda$ is a configured bias current. When $v_i$ reaches a firing threshold $\nu$ at a time $t_{i,k}$, neuron-$i$ resets its potential to $0$ and simultaneously fires an inhibitory signal to a preconfigured set of receptive neurons, neuron-$j$s, whose soma currents are diminished according to a weighted exponential decay function: $\mu_j(t)$ receives the additive signal $-w_{ji}\,\alpha(t - t_{i,k})$, where $\alpha(t) = e^{-t}$ for $t \ge 0$ and zero otherwise. Let $\{t_{i,k}\}$ be the ordered time sequence of when neuron-$i$ spikes and define $\sigma_i(t) = \sum_k \delta(t - t_{i,k})$; then the soma current satisfies both the algebraic and differential equations below (the operator $*$ denotes convolution):

$$\mu_i(t) = b_i - \sum_{j \neq i} w_{ij}\,(\alpha * \sigma_j)(t), \qquad \dot\mu_i(t) = b_i - \mu_i(t) - \sum_{j \neq i} w_{ij}\,\sigma_j(t). \tag{3}$$

Equation 3, together with the definition of the spike trains, describes our spiking LCA (S-LCA).

An intuitive definition of the spike rate of a neuron is the number of spikes per unit time. Hence we define the instantaneous spiking rate $a_i(t)$ and the average soma current $u_i(t)$ for neuron-$i$ as:

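$$a_i(t) \overset{\text{def}}{=} \frac{1}{t - t_0}\int_{t_0}^{t} \sigma_i(s)\,ds \quad\text{and}\quad u_i(t) \overset{\text{def}}{=} \frac{1}{t - t_0}\int_{t_0}^{t} \mu_i(s)\,ds, \qquad t_0 \ge 0 \text{ is a parameter.} \tag{4}$$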

Applying the operator $\frac{1}{t - t_0}\int_{t_0}^{t}(\cdot)\,ds$ to the differential equation portion of (3), and using also the relationship $\dot u_i(t) = (\mu_i(t) - u_i(t))/(t - t_0)$, we obtain

$$\dot u_i(t) = b_i - u_i(t) - \sum_{j \neq i} w_{ij}\,a_j(t) - \frac{u_i(t) - u_i(t_0)}{t - t_0}. \tag{5}$$
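The step from (3) to (5) can be spelled out as follows. Here we read $u_i(t_0)$ as $\lim_{t \downarrow t_0} u_i(t) = \mu_i(t_0)$ (assuming $\mu_i$ is right-continuous at $t_0$), which is our interpretation of the notation rather than a statement from the text. Averaging the differential equation in (3) over $[t_0, t]$ and then eliminating $\mu_i(t)$ with the relationship for $\dot u_i$ gives

$$
\begin{aligned}
\frac{\mu_i(t)-\mu_i(t_0)}{t-t_0}
  &= \frac{1}{t-t_0}\int_{t_0}^{t}\dot\mu_i(s)\,ds
   = b_i - u_i(t) - \sum_{j\neq i} w_{ij}\,a_j(t),\\
\dot u_i(t) = \frac{\mu_i(t)-u_i(t)}{t-t_0}
  &\;\Longrightarrow\; \mu_i(t) = u_i(t) + (t-t_0)\,\dot u_i(t),\\
\dot u_i(t) + \frac{u_i(t)-u_i(t_0)}{t-t_0}
  &= b_i - u_i(t) - \sum_{j\neq i} w_{ij}\,a_j(t),
\end{aligned}
$$

which is Equation 5 after moving the last term on the left to the right-hand side.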

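Consider now a CLASSO problem where the dictionary atoms are non-negative and normalized to unit Euclidean norm, so that every $w_{ij} = \phi_i^T\phi_j \ge 0$ and all spikes are inhibitory. Configure S-LCA with $s$ and $\lambda$ from Equation 1, and set $b_i = \phi_i^T s$, $w_{ij} = \phi_i^T \phi_j$. So configured, it can be shown that the magnitudes of the soma currents (and thus those of the average currents as well) are bounded: there is a $B$ such that $|\mu_i(t)| \le B$ for all $i$ and all $t$ (see Proposition 1 in the Appendix). Consequently,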

$$\lim_{t \to \infty} \dot u_i(t) = \lim_{t \to \infty} \frac{\mu_i(t) - u_i(t)}{t - t_0} = 0, \qquad i = 1, 2, \ldots, N. \tag{6}$$

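The following relationship between $u_i(t)$ and $a_i(t)$ is crucial. Let $t_{i,k}$ be the most recent spike time of neuron-$i$ no later than $t$; taking the firing threshold to be 1,

$$u_i(t) - \lambda = \frac{1}{t - t_0}\int_{t_0}^{t_{i,k}} (\mu_i - \lambda) + \frac{1}{t - t_0}\int_{t_{i,k}}^{t} (\mu_i - \lambda) = a_i(t) + \frac{v_i(t)}{t - t_0}. \tag{7}$$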

From this equation and a moderate assumption that the inter-spike duration cannot be arbitrarily long unless neuron-$i$ stops spiking altogether, one can prove that

$$T_\lambda(u_i(t)) - a_i(t) \to 0 \quad \text{as } t \to \infty. \tag{8}$$

The complete proof of this result is given in the Appendix.
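Equations 7 and 8 can be illustrated in the simplest possible setting: a single neuron whose soma current is held at a constant value $c > \lambda$ (no incoming spikes). The sketch below is a hypothetical example with $t_0 = 0$ and a general firing threshold $\nu$ (Equation 7 corresponds to $\nu = 1$); it checks Equation 7 exactly and shows that the gap in Equation 8 stays within $[0, \nu/t]$.

```python
import math


def single_neuron_gap(c, lam, t, nu=1.0):
    """Hypothetical check of Equations 7 and 8 for one neuron with a constant soma current c > lam.

    Assumes t0 = 0, a zero initial potential, and no incoming spikes, so the potential
    v(t) grows at rate c - lam and resets to 0 each time it reaches nu.
    Returns T_lambda(u(t)) - nu * a(t), which Equation 8 says must vanish as t grows.
    """
    n_spikes = math.floor((c - lam) * t / nu)    # spikes fired in (0, t]
    a = n_spikes / t                             # spike rate, Equation 4
    u = c                                        # average of a constant current
    v = (c - lam) * t - n_spikes * nu            # potential accumulated since the last reset
    assert abs((u - lam) - (nu * a + v / t)) < 1e-9   # Equation 7 with a general threshold nu
    return max(u - lam, 0.0) - nu * a


for t in (10.3, 103.7, 1003.1):
    print(t, single_neuron_gap(c=1.37, lam=0.1, t=t))   # gap lies in [0, nu / t]
```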

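We can derive convergence of S-LCA as follows. Since the average soma currents are bounded, the Bolzano-Weierstrass theorem shows that $u(t)$ has at least one limit point; that is, there is a point $u^*$ and a time sequence $t_1 < t_2 < \cdots$, $t_k \to \infty$, such that $u(t_k) \to u^*$ as $k \to \infty$. By Equation 8, $a(t_k) \to T_\lambda(u^*) \overset{\text{def}}{=} a^*$. By Equations 5 and 6 (the last term of (5) also vanishes because $u_i$ is bounded), and because $\sum_{j \neq i} w_{ij} a_j = ((\Phi^T\Phi - I)a)_i$ for unit-norm atoms, we must therefore have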

$$0 = b - u^* - (\Phi^T\Phi - I)\,a^*. \tag{9}$$

Since S-LCA is configured for a CLASSO problem, the limit $u^*$ is in fact a fixed point of A-LCA, which is unique whenever CLASSO’s solution is (Theorem 1). In this case, the limit point of the average currents is unique, and thus indeed we must have $u(t) \to u^*$ and $a(t) \to a^*$, the CLASSO solution.

## 4 Numerical Simulations

To simulate the dynamics of S-LCA on a conventional CPU, one can precisely solve the continuous-time spiking network formulation by tracking the order of firing neurons. In between consecutive spiking events, the internal variables $\mu_i$ and $v_i$ of each neuron follow simple differential equations and permit closed-form solutions. This method, however, is likely to be slow because it requires a global coordinator that looks ahead into the future to determine the next firing neuron. For efficiency, we instead take an approximate approach that evolves the network state in constant-sized discrete time steps. At every step, the internal variables of each neuron are updated, and a firing event is triggered if the potential exceeds the firing threshold. The simplicity of this approach admits parallel implementations and is suitable for specialized hardware designs. Nevertheless, this constant-sized-time-step approach introduces errors in spike timings: the time at which a neuron sends out a spike may be delayed by up to one time step. As we will see in this section, the timing error is the major factor that limits the accuracy of the solutions from spiking networks. However, such an efficiency-accuracy trade-off may in fact be desirable for certain applications such as those in machine learning.
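The constant-step scheme just described can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `slca_simulate`, the update order within a step, the reset-to-zero rule applied at step granularity, and the default `dt` and `t_end` are our choices. It is run on the 3-neuron example of Section 4.1 with firing threshold 1.

```python
import numpy as np


def slca_simulate(Phi, s, lam, nu=1.0, dt=0.01, t_end=200.0):
    """Constant-time-step approximation of S-LCA (Equation 3) configured for CLASSO.

    Hypothetical sketch: dt, t_end, and the update order are illustrative choices.
    Returns the spike counts and the time integral of each soma current over [0, t_end],
    from which Equation 4 (with t0 = 0) gives a_i = counts / t_end and u_i = integral / t_end.
    """
    b = Phi.T @ s
    W = Phi.T @ Phi
    np.fill_diagonal(W, 0.0)                # the sums in Equation 3 run over j != i
    mu = b.copy()                           # soma currents start at the constant input b_i
    v = np.zeros_like(b)                    # sub-threshold initial potentials
    counts = np.zeros_like(b)
    mu_integral = np.zeros_like(b)
    for _ in range(int(round(t_end / dt))):
        mu_integral += dt * mu
        v += dt * (mu - lam)                # potential charges at rate mu_i - lambda
        fired = v >= nu                     # a spike can be detected up to one step late
        v[fired] = 0.0                      # reset to zero after firing
        counts += fired
        # Euler step of d(mu)/dt = b - mu, plus a jump of -w_ij for every incoming spike
        mu += dt * (b - mu) - W @ fired.astype(float)
    return counts, mu_integral


Phi = np.array([[0.3313, 0.8148, 0.4364],
                [0.8835, 0.3621, 0.2182],
                [0.3313, 0.4527, 0.8729]])
s = np.array([0.5, 1.0, 1.5])
t_end = 200.0
counts, mu_integral = slca_simulate(Phi, s, lam=0.1, t_end=t_end)
rates = counts / t_end                      # a_i(t_end)
u_avg = mu_integral / t_end                 # u_i(t_end)
print(rates, np.maximum(u_avg - 0.1, 0.0))
```

Shrinking `dt` reduces the spike-timing error at the price of proportionally more steps, which is the efficiency-accuracy trade-off described above.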

### 4.1 Illustration of SNN dynamics

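We solve a simple CLASSO problem: $\min_a \frac{1}{2}\|s - \Phi a\|_2^2 + \lambda\|a\|_1$ subject to $a \ge 0$, where

$$s = \begin{bmatrix} 0.5 \\ 1 \\ 1.5 \end{bmatrix}, \quad \Phi = [\phi_1\ \phi_2\ \phi_3] = \begin{bmatrix} 0.3313 & 0.8148 & 0.4364 \\ 0.8835 & 0.3621 & 0.2182 \\ 0.3313 & 0.4527 & 0.8729 \end{bmatrix}, \quad \lambda = 0.1.$$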

We use a 3-neuron network configured with $b = \Phi^T s$, $w_{ij} = \phi_i^T\phi_j$, the bias current $\lambda = 0.1$, and the firing threshold set to 1. Figure 1 details the dynamics of this simple 3-neuron spiking network. It can be seen from this simple example that the network needs only very few spike exchanges to converge. In particular, a weak neuron, such as Neuron 2, is quickly rendered inactive by inhibitory spike signals from competing neurons. This raises an important question: how many spikes are in the network? We do not find this question easy to answer theoretically. However, empirically we see that the number of spikes in S-LCA can be approximated from the state variable in A-LCA; that is, with the $u_i(s)$ in Equation 10 below being solutions to Equation 2:

$$\int_0^t \sigma_i(s)\,ds \approx \int_0^t T_\lambda(u_i(s))\,ds, \qquad i = 1, 2, \ldots, N. \tag{10}$$

Figure 1(d) shows the close approximation of spike counts using (10) in the example. We observe that such an approximation consistently holds in large-scale problems, suggesting a strong tie between S-LCA and A-LCA. Since in an A-LCA configured for a sparse coding problem we expect $T_\lambda(u_i(t))$ for most $i$'s to converge to zero, (10) suggests that the total spike count in S-LCA is small.
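Equation 10 can be explored numerically by running both networks side by side. The sketch below is hypothetical: it uses the constant-step S-LCA scheme of this section, integrates A-LCA (Equation 2) with forward Euler, and uses the 3-neuron example with our own choices of step size and horizon. It only shows how the two sides of (10) are computed; it is not meant to reproduce Figure 1(d).

```python
import numpy as np


def spike_counts_and_alca_integral(Phi, s, lam, dt=0.01, t_end=100.0, nu=1.0):
    """Side-by-side sketch of the two sides of Equation 10.

    Runs a constant-step S-LCA (soma currents start at b, potentials at 0) and a
    forward-Euler A-LCA (starting from u = 0) with the same dt; both choices are
    illustrative assumptions. Returns (spike counts, integral of T_lambda(u)).
    """
    b = Phi.T @ s
    G = Phi.T @ Phi
    W = G - np.diag(np.diag(G))               # off-diagonal weights w_ij, j != i
    T = lambda x: np.maximum(x - lam, 0.0)    # CLASSO thresholding T_lambda
    mu, v = b.copy(), np.zeros_like(b)        # S-LCA state
    u = np.zeros_like(b)                      # A-LCA state
    counts = np.zeros_like(b)                 # left-hand side of Equation 10
    integral = np.zeros_like(b)               # right-hand side of Equation 10
    for _ in range(int(round(t_end / dt))):
        v += dt * (mu - lam)                  # S-LCA: charge potentials
        fired = v >= nu
        v[fired] = 0.0
        counts += fired
        mu += dt * (b - mu) - W @ fired.astype(float)
        integral += dt * T(u)                 # A-LCA: accumulate T_lambda(u), then step
        u += dt * (b - u - W @ T(u))
    return counts, integral


Phi = np.array([[0.3313, 0.8148, 0.4364],
                [0.8835, 0.3621, 0.2182],
                [0.3313, 0.4527, 0.8729]])
s = np.array([0.5, 1.0, 1.5])
counts, integral = spike_counts_and_alca_integral(Phi, s, lam=0.1)
print(counts, np.round(integral, 1))
```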

### 4.2 Convergence of spiking neural networks

We use a larger 400-neuron spiking network to empirically examine the convergence of spike rates to the CLASSO solution. The neural network is configured to perform feature extraction from an 8×8 image patch, using a 400-atom dictionary learned from other image datasets. (The input has 128 dimensions, obtained by splitting the image into positive and negative channels.) With the chosen $\lambda$, the optimal solution has 8 nonzero entries. Figure 2(a) shows the convergence of the objective function value in the spiking network solution, compared with the true optimal objective value obtained from a conventional CLASSO solver. Indeed, with a small step size, the spiking network converges to a solution very close to the true optimum.

The relationships among step size, solution accuracy, and total computation cost are noteworthy. Figure 2(a) shows that increasing the step size sacrifices two digits of accuracy in the computed objective value. The total computation cost is reduced multiplicatively: it takes fewer time units to converge, and each time unit requires fewer iterations. This multiplicative cost saving is highly desirable in applications such as machine learning, where accuracy is not paramount. We note that a large-step-size configuration is also suitable for problems whose solutions are sparse: the total number of spikes is smaller, and thus the total timing error is correspondingly smaller.

There are several ways to “read out” an SNN solution. Most rigidly, we can adhere to $a(t)$ in Equation 4 with $t_0 = 0$. In practice, picking some $t_0 > 0$ is better when we expect a sparse solution: the resulting $a_i(t)$ will be identically zero for those neurons that only spike before time $t_0$. Because $T_\lambda(u_i(t)) - a_i(t) \to 0$ (Equation 8), another alternative is to use $T_\lambda(u(t))$ as the solution, which is more likely to deliver a truly sparse solution. Finally, one can change $a_i$’s definition to an exponentially weighted average of the spike train, so that the impact of past spikes decays quickly. Figure 2(b) illustrates these different “read out” methods and shows that the exponential kernel is as effective empirically, although we must point out that the previous mathematical convergence analysis is no longer applicable in this case.
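The three read-outs can be computed from a recorded constant-step run as in the hypothetical helper below. The exponential-kernel form used here, $a_i(t) = \frac{1}{\tau}\sum_k e^{-(t - t_{i,k})/\tau}$, is an assumed normalization, since the exact definition is not reproduced in the text; the synthetic spike record is made up for illustration.

```python
import numpy as np


def readouts(spikes, mu, dt, lam, t0=0.0, tau=None):
    """Three read-outs from a recorded constant-step S-LCA run (hypothetical helper).

    spikes: (steps, N) array of 0/1 spike indicators; step k ends at time (k + 1) * dt.
    mu:     (steps, N) array of soma currents held during each step.
    Returns (a_t0, T_lambda(u), a_exp); a_exp is None unless tau is given.
    """
    steps = spikes.shape[0]
    t = steps * dt
    start = int(round(t0 / dt))                       # first step inside [t0, t]
    a_t0 = spikes[start:].sum(axis=0) / (t - t0)      # spike rate of Equation 4
    u = mu[start:].sum(axis=0) * dt / (t - t0)        # average soma current of Equation 4
    a_thr = np.maximum(u - lam, 0.0)                  # T_lambda(u), motivated by Equation 8
    a_exp = None
    if tau is not None:
        # Assumed exponential kernel: (1/tau) * sum over spikes of exp(-(t - t_spike)/tau).
        ages = t - dt * np.arange(1, steps + 1)       # time since the end of each step
        a_exp = (np.exp(-ages / tau)[:, None] * spikes).sum(axis=0) / tau
    return a_t0, a_thr, a_exp


# Synthetic record: neuron 0 fires every 0.5 time units throughout; neuron 1 fires
# only during the first 10 time units and is silent afterwards.
dt, steps = 0.01, 10000
spikes = np.zeros((steps, 2))
spikes[49::50, 0] = 1.0
spikes[49:1000:50, 1] = 1.0
mu = np.zeros((steps, 2))
mu[:, 0] = 2.1
mu[:1000, 1] = 2.1
a_t0, a_thr, a_exp = readouts(spikes, mu, dt, lam=0.1, t0=20.0, tau=5.0)
print(a_t0, a_thr, a_exp)
```

With $t_0 = 20$, neuron 1's early spikes drop out of all three read-outs, which is the sparsity benefit described above.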

### 4.3 CPU benchmark of a spiking network implementation

Our earlier discussions suggest that the spiking network can solve CLASSO using very few spikes. This property has important implications for an SNN’s computational efficiency. The computation cost of an $N$-neuron spiking network has two components: neuron-state updates and spiking-event updates. Neuron-state updates include updating the internal potential and current values of every neuron, and thus incur an $O(N)$ cost at every time step. The cost of spiking-event updates is proportional to $N$ times the average number of inter-neuron connections, because a spiking neuron updates the soma currents of those neurons to which it connects. Thus this cost can be as high as $O(N^2)$ (for networks with all-to-all connectivity, such as in the two previous examples) or as low as $O(N)$ (for networks with only local connectivity, such as in the example below). Nevertheless, the spiking-event cost is incurred only when there is a spike, which may happen far less often than once per time step. In practice we observe that computation time is usually dominated by neuron-state updates, corroborating the general belief that spiking events are relatively rare, which makes spiking networks communication efficient.
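A back-of-the-envelope version of this cost model, as a hypothetical Python helper (`step_cost` and the fan-out of 8 used for local connectivity are illustrative assumptions):

```python
def step_cost(n_neurons, n_spikes, avg_fanout):
    """Hypothetical operation count for one time step of a constant-step S-LCA.

    Every neuron's current and potential are updated (O(N) work), and each spike
    additionally updates the soma currents of the neurons it connects to.
    """
    state_updates = n_neurons
    spike_updates = n_spikes * avg_fanout
    return state_updates + spike_updates


N = 400
print(step_cost(N, n_spikes=0, avg_fanout=N - 1))   # quiet step: only state updates
print(step_cost(N, n_spikes=N, avg_fanout=N - 1))   # worst case, all-to-all: O(N^2)
print(step_cost(N, n_spikes=N, avg_fanout=8))       # worst case, local connectivity: O(N)
```

Because most steps have few or no spikes, the first term usually dominates in practice.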

We report the execution time of simulating the spiking neural network on a conventional CPU, and compare the convergence time with FISTA [4], one of the fastest LASSO solvers. We solve a convolutional sparse coding problem [20] on a 52×52 image and a 208×208 image, using 8×8 patches, a stride of 4, and a 128×224 dictionary. The experiments are run on a 2.3 GHz Intel® Xeon® CPU E5-2699 using a single core. SIMD is enabled to exploit the intrinsic parallelism of neural network and matrix operations. As shown in Figure 3, the spiking network delivers much faster early convergence than FISTA, although its solution accuracy plateaus due to spike timing errors. The convergence trends in both figures are similar, demonstrating that spiking networks can solve problems of various sizes. The fast convergence of spiking networks can be attributed to their ability to fully exploit the sparsity in solutions to reduce spike counts. The fine-grained asynchronous communication can quickly suppress most neurons from firing. In FISTA or any other conventional solver, communication between variables is similarly needed, but it is realized through matrix-vector multiplications performed on an iteration-to-iteration basis. The only way to exploit sparsity is to avoid computations involving variables that have gone to zero during one iteration. A comparison of how the sparsity in solutions evolves in S-LCA and FISTA can be found in Figure 3(b).

## 5 Discussion

Our work is closely related to the recent progress on optimality-driven balanced networks [7, 3, 5]. The SNN model in [3, 5] differs slightly from ours in that only one internal state is used in the former. In our language, neuron-$i$’s spike is generated by $\mu_i$ reaching a threshold and not by $v_i$, whose role is eliminated altogether. Despite the differences in the details of the neuron models, spikes in both networks arise from a competitive process between neurons and serve to minimize a network-level energy function. This work furthers the understanding of the convergence properties of such spiking networks. Additionally, it has been argued that in a tightly balanced excitatory/inhibitory network, spike codes are so efficient that each spike is precisely timed to keep the network at optimality. This work provides evidence of high coding efficiency even before the network settles into a steady state. By utilizing well-timed spikes, the neurons are able to collectively solve optimization problems with minimal communication. We demonstrate that this insight can be translated into practical value through an approximate implementation on a conventional CPU.

We observe that mathematical rigor was not a focus in [3]: the statement that in a tightly balanced network the potential converges to zero is problematic when taken literally, as all spiking events would eventually cease in that case. Points where the gradient of the loss function (Equation 6 in [3]) vanishes are no longer necessarily optimal when the firing rates are constrained to be non-negative; the general KKT condition has to be used in this situation. The condition does not affect the behavior of the loss function in between spikes. In essence, there is no guarantee that the trajectory of the variables generated by the SNN descends the loss function, that is, that the loss is non-increasing along the trajectory.

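Our SNN formulation and the established convergence properties can be easily extended to incorporate an additional $\ell_2$-penalty term, the so-called elastic-net problem [21]:

$$\operatorname*{argmin}_{a \ge 0}\; \frac{1}{2}\|s - \Phi a\|_2^2 + \lambda_1\|a\|_1 + \lambda_2\|a\|_2^2 \tag{11}$$

The elastic-net formulation can be handled by modifying the slope of the activation function in A-LCA as follows:

$$T_\lambda(x) \overset{\text{def}}{=} \begin{cases} 0 & \text{if } x \le \lambda_1, \\[2pt] \dfrac{x - \lambda_1}{2\lambda_2 + 1} & \text{if } x > \lambda_1, \end{cases} \qquad T_{\pm\lambda}(x) \overset{\text{def}}{=} T_\lambda(x) - T_\lambda(-x).$$

In S-LCA, this corresponds to setting the bias current to $\lambda_1$ and modifying the firing thresholds of the neurons to $2\lambda_2 + 1$.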
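To see why the modified activation handles problem (11), one can check numerically that an elastic-net optimum is a fixed point of the modified A-LCA, that is, $a = T_\lambda(u)$ with $u = b - (\Phi^T\Phi - I)a$. The check works because, for $a_i > 0$, the KKT condition reads $b_i - (\Phi^T\Phi a)_i - \lambda_1 - 2\lambda_2 a_i = 0$, i.e. $u_i - \lambda_1 = (2\lambda_2 + 1)a_i$. The sketch computes the optimum with projected proximal gradient (a standard solver swapped in for illustration, not part of this paper) on the Section 4.1 data with assumed values $\lambda_1 = 0.1$, $\lambda_2 = 0.05$; the function names are hypothetical.

```python
import numpy as np


def elastic_net_threshold(x, lam1, lam2):
    """Elastic-net CLASSO activation: 0 if x <= lam1, (x - lam1) / (2*lam2 + 1) otherwise."""
    return np.maximum(x - lam1, 0.0) / (2.0 * lam2 + 1.0)


def elastic_net_classo(Phi, s, lam1, lam2, iters=5000):
    """Hypothetical reference solver for Equation 11: projected proximal gradient."""
    L = np.linalg.norm(Phi, 2) ** 2 + 2.0 * lam2     # Lipschitz constant of the smooth part
    a = np.zeros(Phi.shape[1])
    for _ in range(iters):
        grad = Phi.T @ (Phi @ a - s) + 2.0 * lam2 * a  # gradient of the two quadratic terms
        a = np.maximum(a - (grad + lam1) / L, 0.0)     # prox of lam1*||a||_1 plus a >= 0
    return a


Phi = np.array([[0.3313, 0.8148, 0.4364],
                [0.8835, 0.3621, 0.2182],
                [0.3313, 0.4527, 0.8729]])
s = np.array([0.5, 1.0, 1.5])
lam1, lam2 = 0.1, 0.05
a = elastic_net_classo(Phi, s, lam1, lam2)
u = Phi.T @ s - (Phi.T @ Phi - np.eye(3)) @ a          # A-LCA fixed-point relation
print(a, np.abs(elastic_net_threshold(u, lam1, lam2) - a).max())
```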

There are several other works studying the computation of sparse representations using spiking neurons. Zylberberg et al. [22] show the emergence of sparse representations through local rules, but do not provide a network-level energy function. Hu et al. [13] derive a spiking network formulation that minimizes a modified time-varying LASSO objective. Shapero et al. [16, 17] were the first to propose the S-LCA formulation but did not provide an in-depth analysis. We believe the S-LCA formulation can be a powerful primitive in future spiking network research.

The computational power of spikes enables new opportunities in future computer architecture designs. The spike-driven computational paradigm motivates an architecture composed of massively parallel computation units. Unlike the von Neumann architecture, the infrequent but dispersed communication pattern between the units suggests a decentralized design where memory should be placed close to the compute, and communication can be realized through dedicated routing fabrics. Such designs have the potential to accelerate computations without breaking the energy-density limit.

## Appendix A Governing Algebraic and Differential Equations

Consider a neural network consisting of $N$ neurons. The only independent variables are the soma currents $\mu_i(t)$ for $i = 1, 2, \ldots, N$. There are another $N$ variables, the potentials $v_i(t)$, which depend on the currents as described momentarily. Consider the following configurations. Each neuron receives a positive constant input current $b_i$. A nonnegative current bias $\lambda$ and a positive potential threshold $\nu$ are set a priori. At any given time $t_0$ such that $v_i(t_0) = 0$, the potential evolves according to

$$v_i(t) = \int_{t_0}^{t} (\mu_i(s) - \lambda)\,ds$$

until the time $t_{i,k}$ when $v_i(t_{i,k}) = \nu$. At this time, a spike signal is sent from neuron-$i$ to all the neurons that are connected to it, weighted by a set of pre-configured weights $w_{j,i}$. The potential is reset to zero immediately afterwards. That is, for $t > t_{i,k}$ but before the next spike is generated,

$$v_i(t) = \int_{t_{i,k}}^{t} (\mu_i(s) - \lambda)\,ds.$$

Moreover, for any consecutive spike times $t_{i,k}$ and $t_{i,k+1}$,

$$\int_{t_{i,k}}^{t_{i,k+1}} (\mu_i(s) - \lambda)\,ds = \nu.$$

Finally, when neuron-$i$ receives a spike from neuron-$j$ at time $t_{j,k}$ with a weight $w_{i,j}$, the soma current $\mu_i(t)$ is changed by the additive signal $-w_{i,j}\,\alpha(t - t_{j,k})$, where

$$\alpha(t) = H(t)\,e^{-t/\tau},$$

$H(t)$ being the Heaviside function that is 1 for $t \ge 0$ and 0 otherwise. The sign convention used here means that a positive $w_{i,j}$ means that a spike from neuron-$j$ always tries to inhibit neuron-$i$.

Suppose the initial potentials are all set below the spiking threshold $\nu$. Then the dynamics of the system can be succinctly described by the set of algebraic equations

$$\mu_i(t) = b_i - \sum_{j \neq i} w_{i,j}\,(\alpha * \sigma_j)(t), \qquad i = 1, 2, \ldots, N, \tag{AE}$$

where $*$ is the convolution operator and $\sigma_j$ is the sequence of spikes

$$\sigma_j(t) = \sum_k \delta(t - t_{j,k}),$$

$\delta(\cdot)$ being the Dirac delta function. The spike times are determined in turn by the evolution of the soma currents, which govern the evolution of the potentials.

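One can also express the algebraic equations AE as a set of differential equations. Note that the Heaviside function can be expressed as $H(t) = \int_{-\infty}^{t} \delta(s)\,ds$. Hence

$$\frac{d}{dt}\alpha(t) = \frac{d}{dt}\Big(e^{-t/\tau}\int_{-\infty}^{t}\delta(s)\,ds\Big) = -\frac{1}{\tau}\,\alpha(t) + \delta(t).$$

Thus, differentiating Equation AE, and using AE again to replace $\sum_{j \neq i} w_{i,j}\,(\alpha * \sigma_j)(t)$ by $b_i - \mu_i(t)$, yields

$$\dot\mu_i(t) = \frac{1}{\tau}\big(b_i - \mu_i(t)\big) - \sum_{j \neq i} w_{i,j}\,\sigma_j(t). \tag{DE}$$

Note that Equations AE and DE are given in terms of the spike trains $\sigma_j$, which are governed in turn by the soma currents themselves as well as by the configurations of the initial potentials, the spiking threshold $\nu$, and the bias current $\lambda$.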
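As a sanity check of the step from AE to DE, the following sketch compares the closed-form current of Equation AE with a forward-Euler integration of Equation DE for one neuron receiving a few hand-picked spikes. The spike times, weights, $\tau$, and step size are hypothetical values chosen for illustration.

```python
import math


def mu_closed_form(t, b_i, weights, spike_times, tau):
    """Equation AE: mu_i(t) = b_i - sum_j w_ij * sum_{t_jk <= t} exp(-(t - t_jk) / tau)."""
    total = 0.0
    for w, times in zip(weights, spike_times):
        total += w * sum(math.exp(-(t - tk) / tau) for tk in times if tk <= t)
    return b_i - total


def mu_euler(t_end, b_i, weights, spike_times, tau, dt=1e-4):
    """Equation DE by forward Euler: relax toward b_i at rate 1/tau, jump by -w_ij at each spike."""
    events = sorted((tk, w) for w, times in zip(weights, spike_times) for tk in times)
    mu, k = b_i, 0
    for step in range(int(round(t_end / dt))):
        mu += dt * (b_i - mu) / tau          # the (1/tau)(b_i - mu_i) term
        t_next = (step + 1) * dt
        while k < len(events) and events[k][0] <= t_next:
            mu -= events[k][1]               # the delta-function term of Equation DE
            k += 1
    return mu


# Hypothetical example: neuron i receives spikes from two neurons with weights 0.5 and 0.3.
args = dict(b_i=1.0, weights=[0.5, 0.3], spike_times=[[0.2, 0.9], [0.5]], tau=0.5)
print(mu_closed_form(1.5, **args), mu_euler(1.5, **args))
```

The two values agree up to the $O(\Delta t)$ error of the Euler scheme and of snapping spikes to the step grid.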

## Appendix B Defining Spike Rates and Average Currents

Suppose the system of spiking neurons is initialized with sub-threshold potentials, that is, $v_i(0) < \nu$ for all $i$. Thus, at least for a finite time after $0$, all soma currents remain constant at $b_i$ and no neuron generates any spikes. Furthermore, consider for now that $w_{i,j} \ge 0$ for all $i, j$; that is, only inhibitory signals are present. Let the spike times of neuron-$i$ be $0 < t_{i,1} < t_{i,2} < \cdots$. This sequence could be empty, finite, or infinite. It is empty if the potential never reaches the threshold. It is finite if the neuron stops spiking from a certain time onwards. We define the spike rate $a_i(t)$ and the average current $u_i(t)$ for each neuron as follows.

$$a_i(t) \overset{\text{def}}{=} \begin{cases} \dfrac{1}{t}\displaystyle\int_0^t \sigma_i(s)\,ds & t > 0, \\[4pt] 0 & t = 0, \end{cases}$$

and

$$u_i(t) \overset{\text{def}}{=} \begin{cases} \dfrac{1}{t}\displaystyle\int_0^t \mu_i(s)\,ds & t > 0, \\[4pt] b_i & t = 0. \end{cases}$$

With these definitions, this section presents the following results.

• The inhibition assumption leads to the fact that all the soma currents are bounded above. This in turn shows that none of the neurons can spike arbitrarily rapidly.

• The fact that neurons cannot spike arbitrarily rapidly implies that the soma currents are bounded from below as well.

• The main assumption needed (that is, something that cannot be proved at this point) is that if a neuron spikes infinitely often, then the duration between consecutive spikes cannot be arbitrarily long.

• Using this assumption and the previously established properties, one can prove an important relationship between the spike rate and the average current in terms of the familiar thresholding function $T$.

###### Proposition 1.

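There exist bounds $B_-$ and $B_+$ such that $B_- \le \mu_i(t) \le B_+$ for all $i$ and $t \ge 0$. With the convention that $t_{i,0} = 0$, there is a positive value $R$ such that $t_{i,k+1} - t_{i,k} \ge R$ for all $i$ and $k \ge 0$, whenever these values exist.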

###### Proof.

Because all spike signals are inhibitory, clearly from Equation AE, we have for all . Thus, defining leads to for all and .

Given any two consecutive and that exist,

$$\nu = v_i(t_{i,k}^+) + \int_{t_{i,k}}^{t_{i,k+1}} (\mu_i(s) - \lambda)\,ds \le v_i(t_{i,k}^+) + (t_{i,k+1} - t_{i,k})(B_+ - \lambda).$$

Note that $v_i(t_{i,k}^+) = 0$ if $k > 0$, because the potential is reset after each spike. For the special case $k = 0$, this value is $v_i(0)$. Hence

$$t_{i,k+1} - t_{i,k} \ge \min\Bigl\{\min_i\{\nu - v_i(0)\},\ \nu\Bigr\}\,(B_+ - \lambda)^{-1}.$$

Thus there is an $R > 0$ such that $t_{i,k+1} - t_{i,k} \ge R$ whenever these two spike times exist.

Finally, because the duration between spikes cannot be arbitrarily small, it is easy to see that

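$$\gamma \stackrel{\text{def}}{=} \sum_{\ell=0}^{\infty} e^{-\ell R/\tau} \ge (\alpha * \sigma_j)(t) \quad \text{for all } j \text{ and } t \ge 0.$$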

Therefore,

$$B_- \stackrel{\text{def}}{=} \min_i\Bigl\{-\gamma \sum_{j \ne i} w_{i,j}\Bigr\} \le \mu_i(t)$$

for all $i$ and $t \ge 0$. So indeed, there are $B_-$ and $B_+$ such that $B_- \le \mu_i(t) \le B_+$ for all $i$ and $t \ge 0$. ∎
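The constants in this proof can be computed directly from the network parameters. The sketch below assumes the kernel in Equation AE is $\alpha(s) = e^{-s/\tau}$ for $s \ge 0$ (the choice consistent with Equation DE), that the weight matrix has a zero diagonal, and that all initial potentials lie below $\nu$; the helper names and the two-neuron parameters are hypothetical. It also checks that a spike train with spacing exactly $R$ keeps $(\alpha * \sigma_j)(t)$ below $\gamma$.

```python
import numpy as np

def prop1_bounds(b, W, v0, nu, lam, tau):
    """Constants B_+, R, gamma, B_- from the proof of Proposition 1.

    b: bias currents b_i; W: nonnegative (inhibitory) weights w_ij with zero
    diagonal; v0: initial potentials, all below nu; nu: spiking threshold.
    """
    b = np.asarray(b, float)
    W = np.asarray(W, float)
    v0 = np.asarray(v0, float)
    B_plus = b.max()
    if B_plus <= lam:
        # mu_i - lambda <= 0 everywhere: potentials never increase, so no spikes.
        raise ValueError("B_+ <= lambda: no neuron can ever reach threshold")
    R = min((nu - v0).min(), nu) / (B_plus - lam)
    # gamma = sum_{l >= 0} exp(-l R / tau) is a geometric series.
    gamma = 1.0 / (1.0 - np.exp(-R / tau))
    # With a zero diagonal, row sums of W equal sum_{j != i} w_ij.
    B_minus = np.min(-gamma * W.sum(axis=1))
    return B_plus, R, gamma, B_minus

def filtered_train(spike_times, t, tau):
    """(alpha * sigma)(t) under the assumed kernel alpha(s) = exp(-s / tau)."""
    s = np.asarray(spike_times, float)
    s = s[s <= t]
    return float(np.exp(-(t - s) / tau).sum())
```

For $b = (1, 1.4)$, $w_{1,2} = w_{2,1} = 0.6$, $v(0) = 0$, $\nu = \tau = 1$ and $\lambda = 0.1$, this gives $B_+ = 1.4$ and $R = 1/1.3$; a train with spikes at $0, R, 2R, \dots$ then approaches, but does not exceed (up to rounding), $\gamma = 1/(1 - e^{-R/\tau})$.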

Proposition 1 shows, among other things, that there is a lower bound on the duration between consecutive spikes. We now make the following assumption.

###### Assumption 1.

Assume that there is a positive number $r$ such that whenever the numbers $t_{i,k}$ and $t_{i,k+1}$ exist, $t_{i,k+1} - t_{i,k} \le 1/r$.

In simple words, this assumption says that unless a neuron stops spiking altogether after a certain time, the duration between consecutive spikes cannot become arbitrarily long. With this assumption and the results in Proposition 1, the following important relationship between $a_i(t)$ and $u_i(t)$ can be established.

###### Theorem 2.

Let $T$ be the thresholding function where $T(x) = 0$ for $x \le \lambda$, and $T(x) = x - \lambda$ for $x > \lambda$. For each neuron $i$, there is a function $\Delta_i(t)$ such that

$$T(u_i(t)) = a_i(t)\,\nu + \Delta_i(t)$$

and that $\Delta_i(t) \to 0$ as $t \to \infty$.

###### Proof.

Let

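$$\mathcal{A} = \{\, i \mid \text{neuron } i \text{ spikes infinitely often} \,\}$$

($\mathcal{A}$ stands for “active”), and

$$\mathcal{I} = \{\, i \mid \text{neuron } i \text{ stops spiking after a finite time} \,\}$$

($\mathcal{I}$ stands for “inactive”). First consider $i \in \mathcal{I}$. Let $t_{i,k}$ be the time of the final spike. For any $t > t_{i,k}$,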

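$$\begin{aligned}
u_i(t) - \lambda &= \frac{1}{t}\int_0^{t_{i,k}} (\mu_i(s) - \lambda)\,ds + \frac{1}{t}\int_{t_{i,k}}^{t} (\mu_i(s) - \lambda)\,ds \\
&= \frac{1}{t}\int_0^{t_{i,k}} (\mu_i(s) - \lambda)\,ds + \frac{1}{t}\,v_i(t) \\
&= a_i(t)\,\nu + \frac{1}{t}\,v_i(t), \\
u_i(t) &= a_i(t)\,\nu + \lambda + \frac{1}{t}\,v_i(t).
\end{aligned}$$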

Note that $v_i(t) < \nu$ always, since neuron $i$ does not spike after $t_{i,k}$. If $u_i(t) > \lambda$, then

$$0 \le T(u_i(t)) - a_i(t)\,\nu \le \nu/t.$$

If $u_i(t) \le \lambda$,

$$-a_i(t)\,\nu \le T(u_i(t)) - a_i(t)\,\nu \le 0.$$

Since $i \in \mathcal{I}$ spikes only finitely many times, $a_i(t) \to 0$ obviously. Thus

$$T(u_i(t)) - a_i(t)\,\nu \to 0.$$

Now consider the case $i \in \mathcal{A}$. For any $t > 0$, let $t_{i,k}$ be the largest spike time that is no bigger than $t$. Because $i \in \mathcal{A}$, $t_{i,k} \to \infty$ as $t \to \infty$.

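$$\begin{aligned}
u_i(t) - \lambda &= \frac{1}{t}\int_0^{t_{i,k}} (\mu_i(s) - \lambda)\,ds + \frac{1}{t}\int_{t_{i,k}}^{t} (\mu_i(s) - \lambda)\,ds \\
&= a_i(t)\,\nu + \frac{1}{t}\int_{t_{i,k}}^{t} (\mu_i(s) - \lambda)\,ds.
\end{aligned}$$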

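Furthermore, note that because of the assumption, $t_{i,k+1} - t_{i,k} \le 1/r$ always for $i \in \mathcal{A}$, so neuron $i$ spikes at least $\lfloor rt \rfloor$ times in $[0, t]$ and $\liminf_{t \to \infty} a_i(t) \ge r$. In other words, there is a time $T_0$ large enough such that $a_i(t) \ge r/2$ for all $i \in \mathcal{A}$ and $t \ge T_0$. Moreover, $0 \le t - t_{i,k} \le 1/r$ and $B_- \le \mu_i(s) \le B_+$. Thus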

$$\frac{1}{t}\int_{t_{i,k}}^{t} (\mu_i(s) - \lambda)\,ds \in \frac{1}{rt}\,\bigl[B_- - \lambda,\ B_+ - \lambda\bigr] \to 0.$$

Since this term eventually becomes smaller in magnitude than $a_i(t)\,\nu \ge r\nu/2$, we have $u_i(t) > \lambda$ for all sufficiently large $t$, so that

$$T(u_i(t)) = a_i(t)\,\nu + \frac{1}{t}\int_{t_{i,k}}^{t} (\mu_i(s) - \lambda)\,ds$$

and we have

$$T(u_i(t)) - a_i(t)\,\nu \to 0.$$ ∎
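Theorem 2 can be sanity-checked in the simplest setting: a single neuron whose soma current is held constant at $\mu$ (no inhibitory input, so $u_i(t) = \mu$). The sketch below assumes the potential obeys $\dot v_i = \mu_i - \lambda$ with reset to 0 at the threshold $\nu$, integrated by forward Euler; the helper names are hypothetical. It covers both the active case ($\mu > \lambda$) and the inactive case ($\mu \le \lambda$, where the neuron never fires and $T(\mu) = 0$).

```python
import numpy as np

def T(x, lam):
    """Thresholding function: 0 for x <= lam and x - lam for x > lam."""
    return np.maximum(np.asarray(x, float) - lam, 0.0)

def simulate_single_neuron(mu, lam, nu, t_end, dt=1e-3, v0=0.0):
    """Forward-Euler integration of dv/dt = mu - lam with reset to 0 at nu.

    mu is held constant, so u_i(t) = mu for all t. Returns the spike rate
    a_i(t_end) = (number of spikes in (0, t_end]) / t_end.
    """
    v, n_spikes = v0, 0
    for _ in range(int(round(t_end / dt))):
        v += dt * (mu - lam)
        if v >= nu:
            n_spikes += 1
            v = 0.0  # reset after a spike
    return n_spikes / t_end

lam, nu = 0.1, 1.0
for mu in (0.05, 0.6, 1.1):
    a = simulate_single_neuron(mu, lam, nu, t_end=100.0)
    print(f"mu={mu}: T(u)={float(T(mu, lam)):.3f}, a*nu={a * nu:.3f}")
```

With $t = 100$ the discrepancy $T(u_i(t)) - a_i(t)\nu$ is of order $\nu/t$, in line with the bound derived in the proof, plus a small contribution from the time step.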

## Appendix C Spiking Neural Nets and LCA

This section shows that for a spiking neural net (SNN) that corresponds to an LCA, the limit points of the SNN are necessarily fixed points of the LCA. In particular, when the LCA corresponds to a constrained LASSO (that is, a LASSO in which the parameters are constrained to be nonnegative) whose solution is unique, the SNN necessarily converges to this solution. The proofs of these statements are surprisingly straightforward.

The following differential equation, connecting $u_i(t)$ to $b_i$ and to all the other spike rates $a_j(t)$, is crucial.

$$\dot u_i(t) = \frac{1}{\tau}\bigl(b_i - u_i(t)\bigr) - \sum_{j \ne i} w_{i,j}\, a_j(t) - \frac{1}{t}\bigl(u_i(t) - b_i\bigr). \tag{rates-DE}$$

The derivation of this relationship is straightforward. First, apply the averaging operation $\frac{1}{t}\int_0^t (\cdot)\,ds$ to Equation DE:

$$\frac{1}{t}\int_0^t \dot\mu_i(s)\,ds = \frac{1}{\tau}\bigl(b_i - u_i(t)\bigr) - \sum_{j \ne i} w_{i,j}\, a_j(t).$$

To find an expression for the left-hand side above, note that

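$$\frac{d}{dt}u_i(t) = \frac{d}{dt}\,\frac{1}{t}\int_0^t \mu_i(s)\,ds = \frac{1}{t}\mu_i(t) - \frac{1}{t^2}\int_0^t \mu_i(s)\,ds = \frac{1}{t}\bigl(\mu_i(t) - u_i(t)\bigr).$$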

Therefore

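$$\begin{aligned}
\frac{1}{t}\int_0^t \dot\mu_i(s)\,ds &= \frac{1}{t}\bigl(\mu_i(t) - \mu_i(0)\bigr) = \frac{1}{t}\bigl(\mu_i(t) - b_i\bigr) \\
&= \frac{1}{t}\bigl(\mu_i(t) - u_i(t)\bigr) + \frac{1}{t}\bigl(u_i(t) - b_i\bigr) \\
&= \frac{d}{dt}u_i(t) + \frac{1}{t}\bigl(u_i(t) - b_i\bigr).
\end{aligned}$$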

Consequently, Equation rates-DE is established.
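The two calculus identities used in this derivation can be checked numerically for any smooth current with $\mu_i(0) = b_i$. The sketch below uses the hypothetical test current $\mu(s) = b + \sin s$, for which $u(t) = b + (1 - \cos t)/t$ in closed form, and a central finite difference for $\dot u$; the values of $b$, $t$ and the step $h$ are illustrative only.

```python
import numpy as np

b = 0.7    # hypothetical bias current b_i
t = 3.0    # evaluation time
h = 1e-5   # finite-difference step

def mu(s):
    # Hypothetical smooth soma current satisfying mu(0) = b.
    return b + np.sin(s)

def u(t):
    # Average current u_i(t) = (1/t) * integral_0^t mu(s) ds, in closed form.
    return b + (1.0 - np.cos(t)) / t

u_dot = (u(t + h) - u(t - h)) / (2 * h)   # central-difference estimate of du/dt
avg_mu_dot = (mu(t) - b) / t              # (1/t) * integral_0^t mu'(s) ds

# Identity 1: du/dt = (mu - u) / t
print(u_dot, (mu(t) - u(t)) / t)
# Identity 2: (1/t) * integral of mu' = du/dt + (u - b) / t
print(avg_mu_dot, u_dot + (u(t) - b) / t)
```

Each printed pair should agree to roughly the accuracy of the finite difference.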

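Observe that because $\mu_i(t)$ is bounded (Proposition 1), so is the average current $u_i(t)$. This means that $\dot u_i(t) \to 0$ as $t \to \infty$, because it was just shown that $\dot u_i(t) = \frac{1}{t}\bigl(\mu_i(t) - u_i(t)\bigr)$. For the same reason, the last term $\frac{1}{t}\bigl(u_i(t) - b_i\bigr)$ of Equation rates-DE also tends to 0.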

Since the $u_i(t)$ and $a_i(t)$ are all bounded, the vectors $u(t)$ must have a limit point $u^*$ (Bolzano-Weierstrass). By Theorem 2, there is a corresponding $a^*$ such that $\nu\, a^* = T(u^*)$, with $T$ applied componentwise. Moreover, we must have

$$0 = \frac{1}{\tau}\bigl(b - u^*\bigr) - W a^*$$

where the matrix $W$ has entries $W_{i,j} = w_{i,j}$ for $i \ne j$ and $W_{i,i} = 0$. Hence

$$0 = \frac{1}{\tau}\bigl(b - u^*\bigr) - \frac{1}{\nu}\, W\, T(u^*).$$

Indeed, $u^*$ and $a^*$ correspond to a fixed point of the LCA. When this LCA corresponds to a LASSO problem with a unique solution, there is only one fixed point, so the SNN also has only one possible limit point; that is, the SNN must converge, and it converges to the LASSO solution.
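To illustrate this conclusion, the sketch below simulates a small SNN and compares $\nu\, a(t)$ at a large time with the nonnegative LASSO solution. Several ingredients are assumptions not stated in this appendix: the potentials obey $\dot v_i = \mu_i - \lambda$ with reset to 0 at $\nu$; the LCA is tied to the LASSO $\min_{x \ge 0} \tfrac12\|y - \Phi x\|^2 + \lambda \sum_i x_i$ through $b = \Phi^\top y$ and the LCA fixed point $u^* = b - (\Phi^\top\Phi - I)\,T(u^*)$, which matches the equation above when $W = (\nu/\tau)(\Phi^\top\Phi - I)$; and the reference solution comes from projected gradient descent, a stand-in solver. The dictionary $\Phi$ has nonnegative, unit-norm columns so that every $w_{i,j} \ge 0$, as the inhibition assumption requires. All names and numbers are hypothetical.

```python
import numpy as np

def nonneg_lasso_pgd(Phi, y, lam, n_iter=5000):
    """min_x 0.5*||y - Phi x||^2 + lam*sum(x) s.t. x >= 0, by projected gradient."""
    L = np.linalg.norm(Phi.T @ Phi, 2)  # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y) + lam
        x = np.maximum(x - grad / L, 0.0)  # gradient step, then project onto x >= 0
    return x

def simulate_snn(b, W, lam, nu, tau, t_end, dt):
    """Forward-Euler simulation of Equation DE with threshold-and-reset potentials.

    Returns the spike rates a(t_end) = (number of spikes up to t_end) / t_end.
    """
    n = len(b)
    mu = b.copy()        # soma currents start at b (no spikes yet)
    v = np.zeros(n)      # sub-threshold initial potentials
    counts = np.zeros(n)
    for _ in range(int(round(t_end / dt))):
        v += dt * (mu - lam)                 # dv_i/dt = mu_i - lambda
        spikes = (v >= nu).astype(float)
        v[spikes > 0] = 0.0                  # reset neurons that just fired
        counts += spikes
        # DE: each spike of neuron j makes mu_i jump down by w_ij.
        mu += dt * (b - mu) / tau - W @ spikes
    return counts / t_end

# Hypothetical 2-atom nonnegative dictionary with unit-norm columns.
Phi = np.array([[1.0, 0.6],
                [0.0, 0.8]])
y = np.array([1.0, 1.0])
lam, nu, tau = 0.1, 1.0, 1.0

G = Phi.T @ Phi
b = Phi.T @ y
W = (nu / tau) * (G - np.diag(np.diag(G)))  # off-diagonal, nonnegative: inhibitory

x_star = nonneg_lasso_pgd(Phi, y, lam)
a = simulate_snn(b, W, lam, nu, tau, t_end=300.0, dt=2e-3)
print("LASSO solution:", x_star)
print("nu * a(t_end): ", nu * a)
```

Here both components are active, so the LASSO solution solves $(\Phi^\top\Phi)\,x = b - \lambda\mathbf{1}$, giving $x^* = (0.1875,\ 1.1875)$. The simulated $\nu\, a(t)$ is expected to approach it roughly like $O(1/t)$, up to a small bias from the finite time step and from discarding the threshold overshoot at each reset.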

## References

• [1] A. Balavoine, J. Romberg, and C. J. Rozell. Convergence and rate analysis of neural networks for sparse approximation. IEEE Trans. Neural Netw., 23(9):1377–1389, September 2012.
• [2] A. Balavoine, C. J. Rozell, and J. Romberg. Convergence of a neural network for sparse approximation using nonsmooth Łojasiewicz inequality. In Proceedings of the International Joint Conference on Neural Networks, Dalla, TX, August 2013.
• [3] D. G. T. Barrett, S. Denève, and C. K. Machens. Firing rate predictions in optimal balanced networks. In NIPS, 2013.
• [4] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
• [5] M. Boerlin, C. Machens, and S. Deneve. Predictive coding of dynamical variables in balanced spiking networks. PLoS Comput Biol, 9(11), 2013.
• [6] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, Cambridge, 2004.
• [7] S. Denève and C. K. Machens. Efficient codes and balanced networks. Nature neuroscience, 19(3):375–382, 2016.
• [8] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.
• [9] M. Elad. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, 2010.
• [10] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci., 79(8):2554–2558, 1982.
• [11] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci., 1:3088–3092, 1984.
• [12] J. J. Hopfield and A. V. Herz. Rapid local synchronization of action potentials: Toward computation with coupled integrate-and-fire neurons. Proc. Natl. Acad. Sci., 92(15):6655–6662, 1995.
• [13] T. Hu, A. Genkin, and D. B. Chklovskii. A network of spiking neurons for computing sparse representations in an energy-efficient way. Neural Comput., 24(11):2852–2872, 2012.
• [14] J. P. LaSalle. Some extensions of Liapunov’s second method. IRE Trans. Circuit Theory, 7(4):520–527, December 1960.
• [15] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen. Sparse coding via thresholding and local competition in neural circuits. Neural Comput., 20(10):2526–2563, 2008.
• [16] S. Shapero, C. Rozell, and P. Hasler. Configurable hardware integrate and fire neurons for sparse approximation. Neural Netw., 45:134–143, 2013.
• [17] S. Shapero, M. Zhu, J. Hasler, and C. Rozell. Optimal sparse approximation with integrate and fire neurons. International journal of neural systems, 24(05):1440001, 2014.
• [18] P. T. P. Tang. Convergence of LCA Flows to (C)LASSO Solutions. ArXiv e-prints, Mar. 2016, 1603.01644.
• [19] R. Tibshirani. Regression shrinkage and selection via the Lasso. J. Royal Statist. Soc B., 58(1):267–288, 1996.
• [20] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus. Deconvolutional networks. In IEEE CVPR, 2010.
• [21] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. J. Royal Statist. Soc B., 67:301–320, 2005.
• [22] J. Zylberberg, J. T. Murphy, and M. R. DeWeese. A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of v1 simple cell receptive fields. PLoS Comput Biol, 7(10):e1002250, 2011.