On the Algorithmic Power of Spiking Neural Networks

03/28/2018 · Chi-Ning Chou, et al. · Harvard University

Spiking Neural Networks (SNNs) are mathematical models in neuroscience that describe the dynamics of a set of neurons interacting with each other by firing spikes. Interestingly, recent works observed that for an integrate-and-fire model, when configured appropriately (e.g., after the parameters are learned properly), the neurons' firing rates converge to an optimal solution of Lasso and certain quadratic optimization problems. Thus, SNNs can be viewed as natural algorithms for solving such convex optimization problems. However, theoretical understanding of SNN algorithms remains limited. In particular, only a convergence result for the Lasso problem is known, and no bound on the convergence rate is available. Therefore, we do not know any explicit complexity bounds for SNN algorithms. In this work, we investigate the algorithmic power of the integrate-and-fire SNN model after the parameters are properly learned/configured. In particular, we explore what algorithms SNNs can implement. We start by formulating a clean discrete-time SNN model to facilitate the algorithmic study. We consider two SNN dynamics and obtain the following results. * We first consider arguably the simplest SNN dynamics with a threshold spiking rule, which we call the simple SNN. We show that the simple SNN solves the least squares problem for a matrix A∈R^m×n and vector b∈R^m with timestep complexity O(κn/ϵ). * For the under-determined case, we observe that the simple SNN may solve the ℓ_1 minimization problem using an interesting primal-dual algorithm, which solves the dual problem by a gradient-based method while updating the primal solution along the way. We analyze a variant of the dynamics and use simulations as partial evidence to support the conjecture.


1 Introduction

The theory of natural algorithms is a framework that bridges algorithmic thinking in computer science and mathematical models in biology. Under this framework, biological systems are viewed as algorithms that efficiently solve specific computational problems. Seminal works such as bird flocking [Cha09, Cha12], slime molds [NYT00, TKN07, BMV12], and evolution [LPR14, LP16] successfully provide algorithmic explanations for different natural objects. These works give rigorous theoretical results that confirm empirical observations, shed new light on biological systems through a computational lens, and sometimes lead to new biologically inspired algorithms. In this work, we investigate Spiking Neural Networks (SNNs) as natural algorithms for solving convex optimization problems. SNNs are mathematical models for biological neural networks in which a network of neurons transmits information by firing spikes through their synaptic connections (i.e., edges between two neurons).

Our starting point is a seminal work of Barrett, Denève, and Machens [BDM13], where they showed that the firing rate (i.e., the average number of spikes fired by each neuron per unit time) of a certain class of integrate-and-fire SNNs can be characterized by the optimal solutions of a quadratic program defined by the parameters of the SNN. Thus, the SNN can be viewed as a natural algorithm for the corresponding quadratic program. However, no rigorous analysis was given in their work. We bridge the gap by showing that the firing rate converges to an optimal solution of the corresponding quadratic program with an explicit polynomial bound on the convergence rate. Thus, the SNN indeed gives an efficient algorithm for solving the quadratic program. To the best of our knowledge, this is the first result with an explicit bound on the convergence rate. Previous works [SRH13, SZHR14, TLD17] on related SNN models for optimization problems are either heuristic or only prove convergence as time goes to infinity (see Section 1.4 for a full discussion of related work).

We take one step further and ask what other optimization problems SNNs can efficiently solve. As our main result, we show that when configured properly, SNNs can solve the ℓ_1 minimization problem (given a matrix A ∈ R^{m×n} and a vector b ∈ R^m such that Ax = b has a solution, find a solution with the smallest ℓ_1 norm; see Section 2 for the formal definition) in polynomial time (the running time is polynomial in a parameter depending on the input; in some cases this parameter can make the running time quasi-polynomial or sub-exponential; see Section 3.3 for more details). Our main technical insight is interpreting the dynamics of SNNs in a dual space. In this way, SNNs can be viewed as a new primal-dual algorithm for solving the ℓ_1 minimization problem.

In the rest of the introduction, we first briefly introduce the background of spiking neural networks (SNNs) and formally define the mathematical model we work with. Next, our results are presented and compared with related works. Finally, we wrap up this section with potential future research directions and perspectives.

1.1 Spiking Neural Networks

Spiking neural networks (SNNs) are mathematical models for the dynamics of biological neural networks. An SNN consists of neurons, each associated with an intrinsic electrical charge called its membrane potential. When the potential of a neuron reaches a certain level, it fires an instantaneous signal, i.e., a spike, to other neurons, increasing or decreasing their potentials. Mathematically, the dynamics of a neuron's membrane potential in an SNN is typically described by a differential equation, and there are many well-studied models such as the integrate-and-fire model [Lap07], the Hodgkin-Huxley model [HH52], and their variants [Fit61, Ste65, ML81, HR84, Ger95, KGH97, BL03, FTHVVB03, I03, TMS14]. In this work, we focus on the integrate-and-fire model defined as follows. Let n be the number of neurons and u(t) ∈ R^n be the vector of membrane potentials, where u_i(t) is the potential of neuron i at time t, for any i ∈ [n] and t ≥ 0. The dynamics of u(t) can be described by the following differential equation: for each i ∈ [n] and t ≥ 0,

(1)   du_i(t)/dt = I_i(t) − ∑_{j∈[n]} C_{ij}·s_j(t)

where the initial values of the potentials are set to 0, i.e., u_i(0) = 0 for each i ∈ [n]. There are two terms that determine the dynamics of the membrane potentials, as shown in (1). The simpler term is the input charging I_i(t) (also known as the input signal or input current), which can be thought of as an external effect on each neuron. The other term models the instantaneous spike effect among neurons. Specifically, the term C_{ij} s_j(t) models the effect on the potential of neuron i when neuron j fires a spike. Here C ∈ R^{n×n} is the connectivity matrix that encodes the synapses between neurons, where C_{ij} describes the connection strength from neuron j to neuron i. The spike train s(t) records the spikes of each neuron, and s_j(t) can be thought of as indicating whether neuron j fires a spike at time t. To sum up, the term C_{ij} s_j(t) decreases the potential of neuron i by C_{ij} whenever neuron j fires a spike at time t (if C_{ij} < 0, the potential of neuron i actually increases by |C_{ij}|). The spike train is determined by the spike events, which are in turn determined by the spiking rule. A typical spiking rule is the threshold rule: let η > 0 be the spiking threshold; the threshold rule says that neuron i fires a spike at time t if and only if u_i(t) ≥ η. Next, record the timings at which neuron i fires a spike as t_i^{(1)} < t_i^{(2)} < ⋯, and let N_i(t) be the number of spikes of neuron i within time t. An important statistic of the dynamics is the firing rate, defined as x_i(t) = N_i(t)/t for neuron i at time t, namely the average number of spikes of neuron i up to time t. The last ingredient needed to specify s(t) is the spike shape, which can be modeled as a function r(·). Intuitively, the spike shape describes the effect of a spike; standard choices of r are the Dirac delta function or a pulse function with an exponential tail. Now we can define s_i(t) = ∑_k r(t − t_i^{(k)}) to be the spike train of neuron i at time t. We provide the following concrete example to illustrate the SNN dynamics introduced above.

Example 1.1.

Let n = 2, let η be the spiking threshold, and let r be the Dirac delta function, i.e., r(x) = 0 for any x ≠ 0 and ∫ r(x) dx = 1. Let both the input charging and the connectivity matrix be static, i.e., I(t) = I and C(t) = C for all t ≥ 0, and consider the two-neuron network whose dynamics is described in Figure 1(a).

In Figure 1, we simulate this SNN for 500 seconds. We can see that neuron 1 fires a spike every ten seconds while neuron 2 fires a spike every one hundred seconds. As a result, the firing rate of neuron 1 will gradually converge to 0.1 and that of neuron 2 will go to 0.01.

Figure 1: The example SNN with two neurons. In (a), we describe the dynamics of this SNN; note that the effect of a spike is the negation of the synapse encoded in the connectivity matrix C. In (b), we plot the membrane potential vector u(t). In (c), we plot the times at which the neurons fire spikes. One can see that neuron 1 fires a spike every ten seconds while neuron 2 fires a spike every one hundred seconds. In (d), we plot the firing rate vector x(t). One can see that the firing rate of neuron 1 gradually converges to 0.1 and that of neuron 2 goes to 0.01.
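The discrete-time simulation behind an example like this is short enough to sketch. The following Python snippet (not from the original paper) integrates the static dynamics du/dt = I − C·s(t) with a forward-Euler step and the threshold rule; the connectivity and charging values below are hypothetical stand-ins chosen to reproduce the qualitative behavior described above, since the example's exact numbers did not survive extraction.

```python
import numpy as np

def simulate_snn(C, I, eta=1.0, dt=1e-3, T=500.0):
    """Forward-Euler simulation of a static integrate-and-fire SNN with
    Dirac-delta spikes: du/dt = I - C s(t), threshold spiking rule."""
    n = len(I)
    u = np.zeros(n)            # membrane potentials, u(0) = 0
    counts = np.zeros(n)       # number of spikes fired by each neuron
    for _ in range(int(T / dt)):
        u += I * dt                       # external charging
        fired = (u >= eta).astype(float)  # threshold rule: fire iff u_i >= eta
        u -= C @ fired                    # instantaneous spike effect
        counts += fired
    return counts / T                     # firing rates x(T)

# Hypothetical parameters (the example's exact C and I were lost in extraction):
# neuron 1 charges at rate 0.1; each of its spikes raises neuron 2 by 0.1.
C = np.array([[1.0, 0.0],
              [-0.1, 1.0]])
I = np.array([0.1, 0.0])
print(simulate_snn(C, I))   # should be roughly [0.1, 0.01], as in Figure 1(d)
```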

In general, both the input charging vector and the connectivity matrix can evolve over time, in which case the change of the input charging models the variation of the environment and the change of the connectivity captures the adaptive learning behavior of the neurons in response to the environment. Understanding how synapses evolve over time (i.e., synaptic plasticity) is a very important subject in neuroscience. However, in this work we follow the choice of Barrett et al. [BDM13] and consider static SNN dynamics, where both the input charging and the synapses are constant. Although this is a special case of the general model in (1), we justify the choice of static SNN by showing that SNN already exhibits non-trivial computational power even in this restricted model. As in Barrett et al. [BDM13], we focus on static SNN and view it as a natural algorithm for optimization problems. Specifically, given an instance of the optimization problem, the goal is to configure a static SNN (by setting its parameters) so that the firing rate converges to an optimal solution efficiently. In this sense, the result of Barrett et al. [BDM13] can be interpreted as a natural algorithm for certain quadratic programs. In our eyes, the fact that the solution is encoded as the firing rate is an interesting and peculiar feature of the SNN dynamics. Also, the dynamics of a static SNN can be viewed as a simple distributed algorithm with a simple communication pattern: once the dynamics is set up, each neuron only needs to keep track of its own potential and communicate with the others through spikes.

1.2 Our Results

Barrett et al. [BDM13] gave a clean characterization of the firing rates in terms of the network connectivity and the input signal. Concretely, they considered a static SNN where both the connectivity matrix and the external charging do not change with time. They argued that the firing rate would converge to the solution of the following quadratic program.

(2)   minimize_{x ∈ R^n}   (1/2)·xᵀCx − Iᵀx
subject to   x ≥ 0

They supported this observation with simulations on so-called tightly balanced networks, which yielded quite accurate predictions in practice. They also heuristically explained how they arrived at this quadratic program. However, no rigorous theorem had been proved on the convergence of the firing rate to the solution of this quadratic program.
To give a theoretical explanation for the discovery of [BDM13], we start with a simpler SNN model to enable the analysis.

The simple SNN model

In the simple SNN model, we make two simplifications of the general model in (1). First, we pick the shape of the spike to be the Dirac delta function, i.e., r = δ. This simplification saves us from complicated calculations while the Dirac delta function still captures the instantaneous behavior of a spike. Second, we consider a connectivity matrix of the form C = αAᵀA, where α > 0 is the spiking strength and A plays the role of a Cholesky factor of C. The reason for introducing α is to model the height of the Dirac delta function. Mathematically, it is redundant to have both α and A, since the model remains the same when α is absorbed into A. However, as we will see in the next subsection, separating α and A is meaningful: A corresponds to the input of the computational problem and α is the parameter one chooses when configuring an SNN to solve the problem. In this work, we focus on the algorithmic power of SNN in the following sense. Given a problem instance, one configures an SNN and takes the firing rate x(t) as the output at time t. We say the SNN solves the problem if x(t) converges to the solution of the problem.

Simple SNN solves the non-negative least squares.

As mentioned, Barrett et al. [BDM13] identified a connection between the firing rate of SNNs with integrate-and-fire neurons and the quadratic programming problem (2). They gave empirical evidence for this connection; however, no theoretical guarantee had been provided. Our first result confirms their observation by giving the first theoretical analysis. Specifically, when C = αAᵀA and I = αAᵀb, the firing rate converges to the solution of the following non-negative least squares problem.

(3)   minimize_{x ∈ R^n}   ‖Ax − b‖_2^2
subject to   x ≥ 0
Theorem 1 (informal).

Given A ∈ R^{m×n}, b ∈ R^m, and ε > 0. Suppose A satisfies some regularity conditions (more details are discussed in Section 3.3). Let x(t) be the firing rate of the simple SNN with C = αAᵀA and I = αAᵀb, where α is a function depending on A. When t is on the order of κn/ε (the hidden constants depend on some parameters of A; see Section 3.3), x(t) is an ε-approximate solution (see Definition 1 for the formal definition) to the non-negative least squares problem for (A, b).

See Theorem 6 in Section 4 for the formal statement of this theorem. To the best of our knowledge, this is the first theoretical result (see Section 1.4 for comparisons with related works) on the analysis of SNN with an explicit bound on the convergence rate, and it shows that SNN can be implemented as an efficient algorithm for an optimization problem.
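As a rough illustration of Theorem 1, the following sketch configures a simple SNN from a random instance and compares its firing rate against an off-the-shelf non-negative least squares solver. The configuration C = αAᵀA, I = αAᵀb is our reading of the (partially garbled) theorem statement, and the constants α, dt, and T are ad-hoc choices, so this is a sanity check rather than an implementation of the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import nnls

def snn_nnls(A, b, alpha=0.05, eta=1.0, dt=1e-2, T=500.0):
    """Simple SNN configured (as we read the text) with C = alpha*A^T A and
    I = alpha*A^T b; returns the firing rate x(T) = spike counts / T."""
    n = A.shape[1]
    C = alpha * (A.T @ A)
    I = alpha * (A.T @ b)
    u = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(int(T / dt)):
        u += I * dt
        fired = (u >= eta).astype(float)
        u -= C @ fired
        counts += fired
    return counts / T

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.clip(rng.standard_normal(10), 0, None)   # a non-negative target
b = A @ x_true
x_snn = snn_nnls(A, b)
x_ref, _ = nnls(A, b)                                 # reference NNLS solution
print(np.linalg.norm(x_snn - x_ref), np.linalg.norm(A @ x_snn - b))
```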

Simple SNN solves the ℓ_1 minimization problem.

In addition to solving the non-negative least squares problem, as our main result we also show that the simple SNN is able to solve the ℓ_1 minimization problem, which is defined as minimizing the ℓ_1 norm over the solutions of Ax = b. The ℓ_1 minimization problem is also known as the basis pursuit problem proposed by Chen et al. [CDS01]. The problem is widely used for recovering sparse solutions in compressed sensing, signal processing, face recognition, etc. Before the discussion on ℓ_1 minimization, let us start with a digression on the two-sided simple SNN for the convenience of future analysis. In the two-sided simple SNN, neuron i fires a positive spike when u_i(t) ≥ η and a negative spike when u_i(t) ≤ −η, and negative spikes are counted negatively in the firing rate. Note that the two-sided SNN is a special case of the one-sided SNN in the sense that one can use the one-sided SNN to simulate the two-sided SNN as follows. Given a two-sided SNN described above with connectivity matrix C and external charging I, construct a one-sided SNN on 2n neurons by duplicating each neuron and flipping its connectivities with the other neurons (i.e., replacing A with [A, −A] in the configuration below). To solve the ℓ_1 minimization problem, we simply configure a two-sided SNN as follows. Given an input (A, b), let C = αAᵀA and I = αAᵀb. Now, we have the following theorem.

Theorem 2 (informal).

Given A ∈ R^{m×n}, b ∈ R^m, and ε > 0. Suppose A satisfies some regularity conditions. Let x(t) be the firing rate of the two-sided simple SNN with C = αAᵀA and I = αAᵀb, where α is a function depending on A. When t is sufficiently large (polynomial in the relevant parameters of A and 1/ε), x(t) is an ε-approximate solution (see Definition 2 for the formal definition of an ε-approximate solution) to the ℓ_1 minimization problem for (A, b).

See Theorem 5 for the formal statement of this theorem. As we will discuss in the next subsection, under the dual view of the SNN dynamics, the simple two-sided SNN can be interpreted as a new natural primal-dual algorithm for the ℓ_1 minimization problem.
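To make the duplication trick above concrete, here is a hedged sketch that simulates the two-sided simple SNN by a one-sided SNN on the stacked matrix [A, −A] and compares the recovered vector with a basis-pursuit linear program. The parameter choices (α, dt, T) and the unit-column normalization are assumptions of ours, not values from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def two_sided_snn_l1(A, b, alpha=0.1, eta=1.0, dt=1e-2, T=1000.0):
    """Two-sided simple SNN simulated by a one-sided SNN on [A, -A]
    (duplicate every neuron and flip its connectivities), as described above.
    The configuration C = alpha*A2^T A2, I = alpha*A2^T b is our reading."""
    n = A.shape[1]
    A2 = np.hstack([A, -A])
    C = alpha * (A2.T @ A2)
    I = alpha * (A2.T @ b)
    u = np.zeros(2 * n)
    counts = np.zeros(2 * n)
    for _ in range(int(T / dt)):
        u += I * dt
        fired = (u >= eta).astype(float)
        u -= C @ fired
        counts += fired
    r = counts / T
    return r[:n] - r[n:]          # firing rate of neuron i minus its duplicate

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 20))
A /= np.linalg.norm(A, axis=0)    # unit-norm columns (our normalization choice)
x_sparse = np.zeros(20); x_sparse[[3, 11]] = [1.5, -2.0]
b = A @ x_sparse

x_snn = two_sided_snn_l1(A, b)
# Reference: basis pursuit as a linear program over (x_plus, x_minus) >= 0.
res = linprog(np.ones(40), A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_lp = res.x[:20] - res.x[20:]
print(np.linalg.norm(x_snn - x_lp, 1), np.linalg.norm(A @ x_snn - b))
```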

1.3 A Dual View of the SNN Dynamics

The main technique in this work is the discovery of a dual view of the SNN. Recall that the dynamics of a static SNN can be described by the following differential equation:

du(t)/dt = I − C·s(t),

where the parameters can be represented as C = αAᵀA and I = αAᵀb for some A ∈ R^{m×n} and b ∈ R^m. For simplicity, we pick the firing threshold η = 1 here. Let us call u(t) the primal SNN. Now, the dual SNN y(t) ∈ R^m can be defined as follows:

dy(t)/dt = α(b − A·s(t)),

where y(0) = 0 and the spike train s(t) is defined in the usual way, so that u(t) = Aᵀy(t). At first glance, this merely looks like a simple linear transformation. Nevertheless, the dual SNN provides a nice geometric view of the SNN dynamics as follows.

(a) An example of one neuron.
(b) The effect of both the external charging and spikes on dual SNN.
Figure 2: Examples of the geometric interpretation of the dual SNN. In (a), we have one neuron. In this case, the neuron does not fire as long as the dual SNN stays in the gray area. In (b), we consider an SNN with 3 neurons. One can see that the effect of spikes on the dual SNN is a jump in the direction of the normal vector of the wall(s).

At each update of the dynamics, there are two terms affecting the dual SNN y(t): the external charging and the spiking effect. First, one can see that the external charging can be thought of as a constant force that drags the dual SNN in the direction b. To explain the effect of spikes in the dual view, let us start with a geometric view of the spiking rule. Recall that neuron i fires a spike at time t if and only if u_i(t) ≥ 1. In the language of the dual SNN, this condition is equivalent to ⟨A_i, y(t)⟩ ≥ 1. Let W_i = {y : ⟨A_i, y⟩ = 1} be the wall of neuron i; the above observation says that neuron i fires a spike once the dual SNN penetrates the wall W_i from the half-space {y : ⟨A_i, y⟩ < 1}. See Figure 2(a) for an example. After neuron i fires a spike, the spiking effect on the dual SNN is the term −αA_i, which corresponds to a jump in the normal direction of W_i. See Figure 2(b) for an example. The geometric interpretation described above is the main advantage of using the dual SNN. Specifically, it gives a clear picture of how spikes affect the SNN dynamics: neuron i fires a spike if and only if the dual SNN penetrates the wall W_i, and the dual SNN then jumps back in the normal direction of W_i. Note that this picture is not as clean in the primal SNN: there, neuron i fires a spike if and only if u_i(t) ≥ 1, while the effect of the spike on u(t) is a move in the direction −αAᵀA_i. See Table 1 for a comparison.

                 Primal SNN u(t)            Dual SNN y(t)
Spiking rule     u_i(t) ≥ 1                 ⟨A_i, y(t)⟩ ≥ 1
Spiking effect   u ← u − αAᵀA_i             y ← y − αA_i
Table 1: Comparison of the geometric view of primal and dual SNNs.
Dual view of SNN as a primal-dual algorithm for the ℓ_1 minimization problem

First, let us write down the ℓ_1 minimization problem and its dual.

(primal)   minimize_{x ∈ R^n}  ‖x‖_1    subject to  Ax = b
(dual)     maximize_{y ∈ R^m}  ⟨b, y⟩   subject to  −1 ≤ ⟨A_i, y⟩ ≤ 1  for all i ∈ [n]
Now we observe that the dual dynamics can be viewed as a variant of the projected gradient descent algorithm for solving the dual program. Before the explanation, recall that for the ℓ_1 minimization problem we consider the two-sided SNN for convenience. Indeed, without the spiking term, y(t) simply moves in the gradient direction b of the dual objective function ⟨b, y⟩. As for the spike term, note that neuron i fires if and only if ⟨A_i, y(t)⟩ ≥ 1, which means that y(t) has reached or crossed the boundary of the feasible polytope of the dual program. Therefore, one can view the role of the spike term as projecting y(t) back towards the feasible polytope. That is, when the dual SNN becomes infeasible, it triggers some spikes, which maintain dual feasibility and update the primal solution (the firing rate). To sum up, in the dual view of SNN we can interpret the simple SNN as performing a non-standard projected gradient descent algorithm for the dual program of ℓ_1 minimization. With this primal-dual view in mind, we analyze the SNN algorithm by combining tools from convex geometry and perturbation theory, as well as several non-trivial structural lemmas on the geometry of the dual program of ℓ_1 minimization. One of the key ingredients is identifying a potential function that (i) upper bounds the error of solving the ℓ_1 minimization problem and (ii) monotonically converges to 0. More details will be provided in Section 3.
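The projected-gradient reading of the dual dynamics can be prototyped directly in the dual space: drift in the direction b, detect crossed walls, bounce back along their normals, and accumulate signed spike counts as the primal estimate. The specific scalings below (drift b·dt, bounce α·A_i, primal estimate α·counts/t) are our interpretation of the informal description, not a verbatim transcription of the paper's dynamics.

```python
import numpy as np

def dual_snn(A, b, alpha=0.05, dt=1e-2, T=1000.0):
    """Sketch of the dual view: gradient steps toward b plus a 'bounce back'
    off any violated wall |<A_i, y>| <= 1; spike counts give the primal estimate."""
    m, n = A.shape
    y = np.zeros(m)
    counts = np.zeros(n)              # signed spike counts
    for _ in range(int(T / dt)):
        y += b * dt                   # gradient step for max <b, y>
        v = A.T @ y
        s = (v >= 1.0).astype(float) - (v <= -1.0).astype(float)
        y -= alpha * (A @ s)          # bounce along the normals of crossed walls
        counts += s
    return alpha * counts / T, y

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 20)); A /= np.linalg.norm(A, axis=0)
x_sparse = np.zeros(20); x_sparse[[3, 11]] = [1.5, -2.0]
b = A @ x_sparse
x_hat, y_hat = dual_snn(A, b)
print(np.linalg.norm(A @ x_hat - b),      # primal residual shrinks as T grows
      np.abs(A.T @ y_hat).max())          # dual iterate stays (nearly) feasible
```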

1.4 Related Work

We compare this research with other related works in the following four aspects.

Computational power of SNN

SNNs are recognized as the third generation of neural networks [Maa97b]. The theoretical foundation for the computability of SNN was built in the pioneering works of Maass et al. [Maa96, Maa97b, Maa99, MB01], in which SNN was shown to be able to simulate standard computational models such as Turing machines, random access machines (RAM), and threshold circuits. However, this line of work focused on the universality of the computational power and did not consider the efficiency of SNN in solving specific computational problems. In recent years, a line of exciting research has reported the efficiency of SNN in solving specific computational problems such as sparse coding [ZMD11, Tan16, TLD17], dictionary learning [LT18], pattern recognition [DC15, KGM16, BMF17], and quadratic programming [BDM13]. These works indicated the advantage of SNN in handling sparsity as well as being energy efficient, and they inspired real-world applications [BT09]. However, to the best of our knowledge, no theoretical guarantee on the efficiency of SNN had been provided. For instance, Tang et al. [Tan16, TLD17] only proved a convergence-in-the-limit result for SNN solving the sparse coding problem instead of giving an explicit convergence rate analysis. The main contribution of this work is a rigorous guarantee on the convergence rate of SNN as an algorithm.

The number of spikes versus the timing of spikes

In this work, we mainly focus on the firing rate of SNN. That is, we only study the computational power with respect to the number of spikes. Another important property of SNN is the timing of spikes. The power of spike timing has been reported since the 1990s, with experimental evidence indicating that neural systems might use the timing of spikes to encode information [Abe91, Hop95, RW99]. Since then, a number of works have focused on time as a basis of information coding, both from the theoretical [OF96, Maa97b, MB01, TDVR01] and the experimental [Hei91, BRVSW91, KS93] side. It is generally believed that the timing of spikes is more powerful than the firing rate [TFM96, RT01, PMB12]. Beyond the capacity for encoding information, the timing of spikes has also been studied in the context of computational power [TFM96, Maa97a, Maa97b, GM08] and learning [BtN05, Ban16, SS17]. See the survey by Paugam et al. [PMB12] for a thorough discussion. While the timing of spikes is conceived as an important source of the power of SNN, in this work we focus only on the firing rate and already obtain some non-trivial findings about the computational power. We believe that our work is still at the very beginning of the study of the computational power of SNN. Investigating how the timing of spikes plays a role is an interesting and important future direction. Immediate open questions are: How could the timing of spikes fit into our study? What is the dual view of the timing of spikes? Can the timing of spikes solve the optimization problems more efficiently? Can the timing of spikes solve more difficult problems?

SNN with randomness

While most of the literature focuses on deterministic SNNs, there is also an active line of work studying SNN models with randomness (an SNN model with noise is also known as a stochastic SNN or a noisy SNN, depending on how the randomness enters the model) [AS94, SN94, FSW08, BBNM11, JHM14, Maa15, JHM16, LMP17a, LMP17b, LMP17c, LM18]. Buesing et al. [BBNM11] used noisy SNN to implement MCMC sampling, and Jonke et al. [JHM14, Maa15, JHM16] further instantiated the idea to attack NP-hard problems such as the traveling salesman problem (TSP) and the constraint satisfaction problem (CSP). Concretely, their noisy SNN has a randomized spiking rule, and the firing pattern forms a distribution over the solution space in which the closer a solution is to the optimum, the higher the probability that it is sampled. They obtained good experimental performance in approximately solving empirical instances. They also pointed out that their noisy SNN has the potential to be implemented energy-efficiently in practice. Lynch, Musco, and Parter [LMP17b] studied stochastic SNNs with a focus on the Winner-Take-All (WTA) problem. Their sequence of works [LMP17a, LMP17b, LMP17c, LM18] gave the first asymptotic analysis of stochastic SNNs for solving WTA, similarity testing, and neural coding. They view SNNs as distributed algorithms and derive computational tradeoffs between running time and network size. In this work, we consider the SNN model without randomness, and our results are thus incomparable with the above SNN models with randomness. It is an interesting direction to apply the dual view of deterministic SNN to SNN with randomness.

Locally competitive algorithms

Inspired by the dynamics of biological neural networks, Rozell et al. designed the locally competitive algorithm (LCA) [RJBO08] for solving the Lasso (least absolute shrinkage and selection operator) optimization problem (note that Lasso is equivalent to the Basis Pursuit De-Noising (BPDN) program under a certain transformation of parameters), which is widely used in statistical modeling. Roughly speaking, LCA is also a dynamics among a set of artificial neurons that continuously signal their potential values (or a function of those values) to their neighboring neurons. There are two main differences between SNN and LCA. First, a neuron in SNN fires discrete spikes while an artificial neuron in LCA produces a continuous signal. Second, the neurons' potentials in LCA converge to a fixed value, which is the output of the algorithm; in contrast, in SNN only the neurons' firing rates may converge, not their potentials. Nevertheless, there is a spikified version of LCA introduced by Shapero et al. [SRH13, SZHR14], called spiking LCA (S-LCA), in which the continuous signals are replaced with discrete spikes. S-LCA is almost the same as the SNN we consider except for a shrinkage term (that is, the potential of each neuron drops at a rate proportional to its current value). Recently, Tang et al. [TLD17] showed that the firing rate of S-LCA indeed converges, in the limit, to the solution of a variant of the Lasso problem (in this variant, all the entries of the matrix are non-negative). These works also experimentally demonstrated the efficient convergence of S-LCA and its advantage in quickly identifying sparse solutions, with potentially competitive practical performance compared to other Lasso algorithms (e.g., FISTA [BT09]). However, there is no proof of the convergence rate, and thus no explicit complexity bound for S-LCA.

1.5 Future Works and Perspectives

In this work, we give a theoretical study of the algorithmic power of SNN. Specifically, we focus on the firing rate of SNN and confirm an empirical observation of Barrett et al. [BDM13] with a convergence theorem (i.e., Theorem 1). Furthermore, we discover a dual view of SNN and show that SNN is able to solve the ℓ_1 minimization problem (i.e., Theorem 2). In the following, we give interpretations of our results and point out future research directions. First, how should one interpret the dual dynamics of SNN? In this work, we discover the dual SNN based on mathematical convenience. Is there any biological interpretation? Second, push further the analysis of the simple SNN. We believe the parameters we get in the main theorems are not optimal. Is it possible to further sharpen the upper bound? We think this is both theoretically and practically interesting because both non-negative least squares and ℓ_1 minimization are important problems that have been well studied in the literature. Comparing the running time complexity or parallel time complexity of the SNN algorithm with other algorithms could also be of theoretical interest and might inspire new algorithms with better complexity. Also, for practical purposes, having better parameters would give more confidence in implementing SNN as a natural algorithm. Third, further investigate the potential of SNN dynamics as natural algorithms. The question is twofold: (i) What algorithms can SNN implement? (ii) What computational problems can SNN solve? It seems that SNN is good at dealing with sparsity. Could it be helpful in related computational tasks such as the fast Fourier transform (FFT) or sparse matrix-vector multiplication? It is interesting to identify optimization problems and classes of instances where the SNN algorithm can outperform other algorithms. Finally, explore the practical advantages of SNN dynamics as natural algorithms. The potential practical time efficiency, energy efficiency, and simplicity of hardware implementation have been suggested in several works [MMI15, BIP15, BPLG16]. It would be exciting to see whether SNN performs well on practical applications such as compressed sensing and Lasso.

2 Preliminaries

In Section 2.1, we build up some notations for the rest of the paper. In Section 2.2, we define two optimization problems and the corresponding convergence guarantees.

2.1 Notations

For any n ∈ N, denote [n] = {1, 2, …, n} and ±[n] = {±1, ±2, …, ±n}. Let x, y be two vectors of the same dimension. |x| denotes the entry-wise absolute value of x, i.e., |x|_i = |x_i| for every i. x ≥ y refers to entry-wise comparison, i.e., x_i ≥ y_i for every i. Let A be an m×n real matrix. For any i ∈ [n], denote the i-th column of A as A_i and its negation as A_{−i}, i.e., A_{−i} = −A_i. When M is positive semidefinite, we define the M-norm of a vector x to be ‖x‖_M = (xᵀMx)^{1/2}. Let A⁺ denote the pseudo-inverse of A. For a positive semidefinite matrix M, define its maximum eigenvalue as λ_max(M), its minimum non-zero eigenvalue as λ_min(M), and its condition number as κ(M) = λ_max(M)/λ_min(M). Unless otherwise specified, λ_max, λ_min, and κ refer to the eigenvalues and condition number of the connectivity matrix C. For any vector v, we denote by Π_A(v) the projection of v onto the range space of A.

2.2 Optimization problems

In this subsection, we introduce two optimization problems: non-negative least squares and ℓ_1 minimization.

2.2.1 Non-negative least squares

Problem 1 (non-negative least squares).

Let m, n ∈ N. Given A ∈ R^{m×n} and a vector b ∈ R^m, find x ∈ R^n that minimizes ‖Ax − b‖_2^2 subject to x_i ≥ 0 for all i ∈ [n].

Remark 1.

Recall that the least squares problem is defined as finding x that minimizes ‖Ax − b‖_2^2. That is, non-negative least squares is a restricted version of the least squares problem. Nevertheless, one can use a non-negative least squares solver to solve the least squares problem by setting A' = [A, −A] and b' = b, where (A, b) is the instance of least squares and (A', b') is the instance of non-negative least squares.
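A minimal sketch of the reduction in Remark 1, using SciPy's NNLS solver (the stacked matrix [A, −A] and the split x = x⁺ − x⁻ are the standard construction assumed here):

```python
import numpy as np
from scipy.optimize import nnls

def least_squares_via_nnls(A, b):
    """Remark 1's reduction: write x = x_plus - x_minus with both parts >= 0,
    i.e., solve NNLS on A' = [A, -A], b' = b, and recombine."""
    n = A.shape[1]
    z, _ = nnls(np.hstack([A, -A]), b)
    return z[:n] - z[n:]

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, -1.0, 2.0])
print(least_squares_via_nnls(A, b))
print(np.linalg.lstsq(A, b, rcond=None)[0])   # matches the unconstrained solution
```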

The SNN algorithm might not solve the non-negative least squares problem exactly and thus we define the following notion of solving the non-negative least squares problem approximately.

Definition 1 (ε-approximate solution to non-negative least squares).

Let ε > 0. Given A ∈ R^{m×n} and b ∈ R^m. We say x ∈ R^n with x ≥ 0 is an ε-approximate solution to the non-negative least squares problem for (A, b) if ‖Ax − b‖_2^2 ≤ ‖Ax^* − b‖_2^2 + ε, where x^* is an optimal solution.

2.2.2 ℓ_1 minimization

Problem 2 (ℓ_1 minimization).

Let m, n ∈ N. Given A ∈ R^{m×n} and b ∈ R^m such that there exists a solution to Ax = b. The goal of ℓ_1 minimization is to solve the following optimization problem.

minimize_{x ∈ R^n}  ‖x‖_1   subject to  Ax = b

Similarly, we do not expect the SNN algorithm to solve ℓ_1 minimization exactly. Thus, we define the notion of solving the ℓ_1 minimization problem approximately as follows.

Definition 2 (ε-approximate solution to ℓ_1 minimization).

Let ε > 0. Given A ∈ R^{m×n} and b ∈ R^m. Let OPT denote the optimal value of the ℓ_1 minimization problem for (A, b). We say x is an ε-approximate solution of the ℓ_1 minimization problem for (A, b) if ‖Ax − b‖_2 ≤ ε and ‖x‖_1 ≤ OPT + ε.

2.3 Karush-Kuhn-Tucker conditions

Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient conditions for the optimality of optimization problems under some regularity assumptions. Consider the following optimization program.

(4)   minimize_x  f(x)
subject to   g_i(x) ≤ 0 for i ∈ [m],   h_j(x) = 0 for j ∈ [ℓ]

where f, the g_i's, and the h_j's are convex and differentiable. Let λ ∈ R^m and ν ∈ R^ℓ be the dual variables. The KKT conditions give necessary and sufficient conditions for (x, λ, ν) to be a pair of primal and dual optimal solutions.

Theorem 3 (KKT conditions).

(x, λ, ν) is a pair of primal and dual optimal solutions for (4) if and only if the following conditions hold.

  • x is primal feasible, i.e., g_i(x) ≤ 0 and h_j(x) = 0 for all i ∈ [m] and j ∈ [ℓ].

  • λ is dual feasible, i.e., λ_i ≥ 0 for all i ∈ [m].

  • The gradient of the Lagrangian vanishes, i.e., ∇f(x) + ∑_{i∈[m]} λ_i ∇g_i(x) + ∑_{j∈[ℓ]} ν_j ∇h_j(x) = 0.

  • (x, λ) satisfy complementary slackness, i.e., λ_i g_i(x) = 0 for all i ∈ [m].

For more details about the KKT conditions, please refer to a standard textbook such as Chapter 5.5.3 in [BV04].

2.4 Perturbation theory

Perturbation theory, sometimes known as sensitivity analysis, for optimization problems concerns the situation where the optimization program is perturbed and the goal is to give a good estimation for the optimal solution. See a nice survey by Bonnans and Shapiro 

[BS98]. In the following we state a special case for convex optimization program with strong duality.

Theorem 4 (perturbation, Chapter 5.6 in [BV04]; note that we switch the roles of the original and perturbed programs relative to the statement in [BV04]).

Given the following two optimization programs, where strong duality holds and there exists a feasible dual solution:

(5)   minimize_x  f(x)   subject to  g_i(x) ≤ 0 for i ∈ [m],   h_j(x) = 0 for j ∈ [ℓ]
(6)   minimize_x  f(x)   subject to  g_i(x) ≤ u_i for i ∈ [m],   h_j(x) = v_j for j ∈ [ℓ]

Let OPT_(5) be the optimal value of the original program (5) and OPT_(6) be the optimal value of the perturbed program (6). Let (λ^*, ν^*) be the optimal dual solution of the perturbed program (6). We have

OPT_(5) ≥ OPT_(6) + ⟨λ^*, u⟩ + ⟨ν^*, v⟩.

3 A simple SNN algorithm for ℓ_1 minimization

In this section, we focus on the discovery of the dual view of the simple SNN and how it can be viewed as a primal-dual algorithm for solving the ℓ_1 minimization problem. Recall that for the ℓ_1 minimization problem we work with the two-sided simple SNN for the convenience of future analysis; that is, neuron i fires a positive spike when u_i(t) ≥ η and a negative spike when u_i(t) ≤ −η. To solve the ℓ_1 minimization problem, we configure a two-sided simple SNN as follows. Given an input (A, b), let C = αAᵀA and I = αAᵀb. However, it is currently unclear how the above simple SNN dynamics relates to the ℓ_1 minimization problem:

(7)   minimize_{x ∈ R^n}  ‖x‖_1
subject to   Ax = b

Interestingly, the connection between the simple SNN and the ℓ_1 minimization problem appears in the dual program of ℓ_1 minimization. Before we formally explain this connection, let us write down the dual program of (7).

(8)   maximize_{y ∈ R^m}  ⟨b, y⟩
subject to   −1 ≤ ⟨A_i, y⟩ ≤ 1  for all i ∈ [n]

Let us try to make some geometric observations on (8). First, the objective of the dual program is to maximize the inner product with b, which is closely related to the external charging of the SNN since we take I = αAᵀb. Next, the feasible region of the dual program is a polytope (or a polyhedron) defined by the intersection of the half-spaces ⟨A_i, y⟩ ≤ 1 and ⟨−A_i, y⟩ ≤ 1 for each i ∈ [n], where A_i denotes the i-th column of A. It will be convenient to introduce the following notation before we move on. For i ∈ [n], recall that A_{−i} = −A_i, and let ±[n] = {±1, …, ±n}. Thus, the feasible polytope of the dual program is the intersection of the half-spaces {y : ⟨A_i, y⟩ ≤ 1} for all i ∈ ±[n]. We call this polytope the dual polytope (in the case where the feasible region of the dual program is not bounded, it is a dual polyhedron; for convenience of presentation, we usually assume the feasible region is bounded). Moreover, for each i ∈ ±[n], we call the hyperplane W_i = {y : ⟨A_i, y⟩ = 1} the i-th wall of the dual polytope. See Figure 3(b) for examples.

(a) An example of one neuron.
(b) An example of three neurons.
Figure 3: Examples of the geometric interpretation of the dual program of the ℓ_1 minimization problem. In (a), we have a single neuron. In this case, the gray area, i.e., the feasible region of the dual program, is unbounded. In (b), we have three neurons. In this case, the gray area is bounded and is thus called the dual polytope.

Now, the key observation is that, via a linear transformation, the dynamics of the simple SNN has a natural interpretation in the dual space. We call the transformed dynamics the dual SNN, defined as follows.

3.1 Dual SNN

We first recall the simple SNN dynamics, which we call the primal SNN from now on. For convenience, we set the threshold parameter η = 1 (and make the spiking strength parameter α explicit). For any t ≥ 0,

(9)   du(t)/dt = αAᵀb − αAᵀA·s(t),   u(0) = 0.

Now, we define the dual SNN as follows. Let y(0) = 0 and for each t ≥ 0, define

(10)   dy(t)/dt = αb − αA·s(t).

Let us make some remarks about the connection between the primal and dual SNNs. First, it can be immediately seen from (9) and (10) that u(t) = Aᵀy(t) for each t ≥ 0. That is, given y(t), it is easy to get u(t) by multiplying with Aᵀ on the left. It turns out that the other direction also holds: for each t ≥ 0, we have y(t) = (Aᵀ)⁺u(t), where (Aᵀ)⁺ is the pseudo-inverse of Aᵀ. The reason is that the primal SNN u(t) lies in the column space of Aᵀ. Thus, the two dynamics are in fact isomorphic to each other. Now let us understand the dynamics of the dual SNN in the dual space R^m. At each timestep, there are two terms, i.e., the external charging αb and the spiking effect −αA·s(t), that affect the dual SNN y(t). The external charging can be thought of as a constant force that drags the dual SNN in the direction b. See Figure 4(a). This coincides with the objective function of the dual program (8), and thus the external charging can be viewed as taking a gradient step towards solving (8). Nevertheless, to solve (8), one needs to make sure the solution is feasible, i.e., y should lie in the dual polytope. Interestingly, this is exactly what the spikes are doing! Recall that neuron i fires a spike if u_i(t) ≥ 1 (recall that we set η = 1), which corresponds to ⟨A_i, y(t)⟩ ≥ 1 in the dual space. Thus, the spike term has the following nice geometric interpretation: if y(t) "exceeds" the wall W_i for some i, then neuron i fires a spike and y(t) is "bounced back" in the normal direction of the wall, in the sense that αA_i is subtracted from y(t). See Figure 4(b) for an example.

(a) The effect of external charging on dual SNN.
(b) The effect of both the external charging and spikes on dual SNN.
Figure 4: Examples of the geometric interpretation of the dual SNN. We consider the same matrix A as in Figure 3(b). In (a), one can see that the external charging points in the direction the dual SNN is moving. In (b), one can see that the effect of spikes on the dual SNN is a jump in the direction of the normal vector of the wall.

Therefore, one can view the dual SNN as performing a variant of the projected gradient descent algorithm for the dual program of the ℓ_1 minimization problem. Specifically, to maintain feasibility, the vector y(t) is not projected back to the feasible region as usual, but is "bounced back" in the normal direction of the wall corresponding to the violated constraint. An advantage of this variant is that the "bounce back" operation is simply a subtraction of αA_i, which is significantly more efficient than an orthogonal projection back to the feasible region. On the other hand, note that the dynamics might not exactly converge to the optimal dual solution y^*. Intuitively, the best we can hope for is that y(t) converges to a small neighborhood of y^* (assuming the spiking strength α is sufficiently small). The above intuition of viewing the dual SNN as a projected gradient descent algorithm for the dual program of the ℓ_1 minimization problem will be formally proved in the later subsections.

The primal-dual connection.

So far we have informally seen that the dual SNN can be viewed as solving the dual program of the ℓ_1 minimization problem. However, this does not immediately explain why the firing rate would converge to the solution of the primal program. It turns out that there is a beautiful connection between the dual SNN and the firing rate through the Karush-Kuhn-Tucker (KKT) conditions (see Section 2.3) and perturbation theory (see Section 2.4). We now discuss some intuition for how the dual solution translates to the primal solution. To get to the core idea, let us consider an ideal scenario where the dual SNN is already very close to the optimal dual solution y^* of the dual program of the ℓ_1 minimization problem. Since y^* is the optimal solution, it must lie on the boundary of the dual polytope. Let Γ be the set of walls that y^* touches; that is, i ∈ Γ if and only if ⟨A_i, y^*⟩ = 1. Now, let x^* denote the optimal primal solution of the ℓ_1 minimization problem. Observe that, by the complementary slackness in the KKT conditions, for each i ∈ [n] we have x^*_i ≥ 0 (resp. x^*_i ≤ 0) if i ∈ Γ (resp. −i ∈ Γ), and x^*_i = 0 if neither i nor −i is in Γ. To summarize, Γ contains the coordinates that are non-zero in the primal optimal solution x^*. See Figure 5 for an example.

Figure 5: An example based on Figure 3(b) and Figure 4(b). The optimal dual solution y^* is as shown in the figure, and the set Γ of walls that y^* touches determines, via the KKT conditions, which coordinates of the optimal primal solution x^* can be non-zero.
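The complementary-slackness observation can be checked numerically on a small random instance: solve the primal and dual linear programs, read off the walls touched by the dual optimum, and compare them with the support of the primal optimum. This sketch uses generic LP calls and a hypothetical random instance; it only illustrates the observation, not the SNN dynamics.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 12)); A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(12); x0[[2, 7, 9]] = [1.0, -0.5, 2.0]
b = A @ x0
n = A.shape[1]

# Primal: min ||x||_1 s.t. Ax = b, as an LP over (x_plus, x_minus) >= 0.
primal = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_star = primal.x[:n] - primal.x[n:]

# Dual: max <b, y> s.t. -1 <= <A_i, y> <= 1 for every column A_i (y is free).
dual = linprog(-b, A_ub=np.vstack([A.T, -A.T]), b_ub=np.ones(2 * n),
               bounds=(None, None))
y_star = dual.x

active  = np.where(np.abs(A.T @ y_star) > 1 - 1e-6)[0]   # walls touched by y*
support = np.where(np.abs(x_star) > 1e-6)[0]             # nonzeros of x*
print(active, support)   # complementary slackness: support is contained in active
```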

With this observation, once the dual SNN is very close to the optimal dual solution y^* and stays nearby, only those neurons corresponding to walls in Γ fire spikes. In other words, the firing rates of the non-zero coordinates in the primal optimal solution remain non-zero due to the spikes, while the other coordinates gradually go to zero. At this point, we have seen that (i) the dual SNN can be viewed as a projected gradient descent algorithm for the dual program of the ℓ_1 minimization problem, and (ii) the dual solution (resp. dual SNN) and the primal solution (resp. firing rate) have a natural connection through the KKT conditions. The explanations so far are rather informal and focus on intuition. From now on, everything will become more formal and rigorous. Before that, let us state the main theorem of this section about the simple SNN solving the ℓ_1 minimization problem.

Theorem 5.

Given A ∈ R^{m×n} and b ∈ R^m, where all the rows of A have unit norm. Let γ(A) be the niceness parameter of A defined later in Definition 4. Suppose γ(A) > 0 and there exists a solution x to Ax = b. There exists a polynomial poly(·) such that, for any ε > 0, the following holds: let x(t) be the firing rate of the simple SNN with C = αAᵀA, I = αAᵀb, threshold η = 1, and α sufficiently small (inverse-polynomially small in the relevant parameters). Let OPT be the optimal value of the ℓ_1 minimization problem. When t is large enough (polynomial in the relevant parameters of A, OPT, and 1/ε), x(t) is an ε-approximate solution for the ℓ_1 minimization problem for (A, b).

Two remarks on the statement of Theorem 5. First, we consider the continuous SNN instead of the discrete SNN, which is of interest for simulation on a classical computer. In the discrete SNN, the step size is some non-negligible Δt instead of being infinitesimal. The main reason for considering the continuous SNN is that it significantly simplifies the proof by avoiding a huge amount of tedious calculation. We suspect that the proof idea would carry over to the discrete SNN with a sufficiently small (inverse-polynomial) discretization parameter. Second, the parameters in Theorem 5 have not been optimized and we believe all the dependencies can be improved. Since the parameters highly affect the efficiency of SNN as an algorithm for the ℓ_1 minimization problem, we pose it as an interesting open problem to determine the best dependencies one can obtain.

3.2 Overview of the proof for Theorem 5

The proof of Theorem 5 consists of two main steps, as mentioned in the previous subsection. The first step argues that the dual SNN converges to a neighborhood of the optimal dual solution y^*. The second step connects the dual solution (i.e., the dual SNN) to the primal solution (i.e., the firing rate). In the first step, we try to identify a potential function (potential functions are widely used in the analysis of gradient-descent-based algorithms; the difficulty lies in finding a good one for the algorithm at hand) that captures how close the dual SNN is to the optimal dual solution y^*. It turns out that this is not an easy task, since the effect of spikes makes the behavior of the dual SNN very non-monotone. We conquer the difficulty via a technique that we call ideal coupling (see Definition 6 and Figure 7). The idea is to associate the dual SNN with an ideal SNN at every time t, such that the ideal SNN has smoother behavior compared with the spiking behavior of the dual SNN. We formally define the ideal SNN in Section 3.4. There are two advantages of using the ideal SNN instead of handling the dual SNN directly: (i) The ideal SNN is smoother than the dual SNN in the sense that it does not change after spikes (see Lemma 3.5). Further, by introducing some auxiliary processes (i.e., the auxiliary SNNs defined in Definition 8), we are able to identify a potential function that is strictly improving at every moment and measures how well the dual SNN has been solving the ℓ_1 minimization problem (see Lemma 3.8). (ii) The ideal SNN is naturally associated with an ideal solution (defined in Definition 7), which is easier to analyze than the firing rate. Using these good properties of the ideal SNN, we prove in Lemma 3.11 that the residual error of the ideal solution converges to 0. Once we have shown the convergence of the residual error in Lemma 3.11, we move to the second step, where the goal is to show that the ℓ_1 norm of the solution is also small. We look at the KKT conditions of the ℓ_1 minimization problem and observe that the primal and dual solutions of the SNN satisfy the KKT conditions of a perturbed version of the ℓ_1 minimization program. Finally, combining tools from perturbation theory, we upper bound the ℓ_1 error of the ideal solution by its residual error in Lemma 3.12. Theorem 5 then follows from Lemma 3.11 and Lemma 3.12, with some extra care in transferring everything from the ideal solution to the firing rate. See Figure 6 for the overall structure of the proof of Theorem 5.

Figure 6: Overview of the proof for Theorem 5.

In the rest of this section, we start with some definitions of the niceness conditions we need for the input matrix A in Section 3.3. Next, we define the ideal coupling in Section 3.4 and prove Lemma 3.5 and Lemma 3.8 in Section 3.5 and Section 3.6, respectively. Finally, we wrap up the proof of Theorem 5 in Section 3.7.

3.3 Some nice conditions on the input matrix

We need the following niceness conditions on the input matrix A.

Definition 3 (non-degeneracy).

Let A ∈ R^{m×n} where m ≤ n. We say A is non-degenerate if every m×m submatrix of A has full rank. For any γ > 0, we say A is γ-non-degenerate if, for any i ∈ ±[n] and any S ⊆ ±[n] with i ∉ S and |S| < m, we have ‖A_i − Π_S(A_i)‖_2 ≥ γ, where Π_S(·) denotes the projection onto the subspace spanned by {A_j}_{j∈S}.

Note that if A is non-degenerate, then for any S ⊆ [n] with |S| = m and any sign pattern σ ∈ {−1, +1}^S, there exists a unique solution y to A_Sᵀy = σ, where A_S is the submatrix of A restricted to the columns in S. We call such a y a vertex of the polytope P. Note that under this definition, a vertex might not lie in P. An important parameter for the future analysis is the minimum distance between two distinct vertices of P.
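For tiny instances, these vertices (and the minimum distance between them, which reappears as a niceness condition below) can be enumerated by brute force. The sketch below follows our reading of the definition, solving A_Sᵀy = σ for every size-m column subset S and sign pattern σ; it is exponential-time and purely illustrative.

```python
import numpy as np
from itertools import combinations, product

def min_vertex_distance(A, tol=1e-9):
    """Enumerate the vertices from the definition above and return the minimum
    distance between distinct ones.  Exponential time; tiny instances only."""
    m, n = A.shape
    vertices = []
    for S in combinations(range(n), m):
        AS = A[:, S]
        if abs(np.linalg.det(AS)) < tol:
            continue                      # degenerate subset, skip
        for sigma in product((-1.0, 1.0), repeat=m):
            vertices.append(np.linalg.solve(AS.T, np.array(sigma)))
    gaps = [np.linalg.norm(u - v)
            for i, u in enumerate(vertices) for v in vertices[i + 1:]]
    return min(g for g in gaps if g > tol)

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 4)); A /= np.linalg.norm(A, axis=0)
print(min_vertex_distance(A))
```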

Definition 4 (nice input matrix).

Let A ∈ R^{m×n} and γ > 0. We say A is γ-nice if all of the following conditions hold.

  1. A is γ-non-degenerate.

  2. The distance between any two distinct vertices of P is at least γ.

  3. For any S and σ as above with corresponding vertex y_{S,σ}, and for any index i ∉ S, the value ⟨A_i, y_{S,σ}⟩ is at least γ away from 1.

Define γ(A) to be the largest γ such that A is γ-nice. We say A is nice if γ(A) > 0.

To motivate the definition of niceness, the following lemma shows that the ℓ_1 minimization problem defined by the matrix A has a unique solution whenever γ(A) > 0.

Lemma 3.1.

Let A ∈ R^{m×n}. If γ(A) > 0, then for any b ∈ R^m, the ℓ_1 minimization problem for (A, b) has a unique solution.

Proof.

We prove the lemma by contradiction. Suppose there exists b such that there are two distinct optimal solutions x^{(1)} ≠ x^{(2)} to the ℓ_1 minimization problem for (A, b). Let y^* be the optimal solution of the dual program as in equation (8). By the complementary slackness in the KKT conditions, every optimal solution x to the primal program is supported on the set Γ of walls touched by y^*. Let S = Γ; then both restrictions x^{(1)}_S and x^{(2)}_S are solutions to A_S x_S = b, where x_S and A_S are the restrictions to the index set S. As γ(A) > 0, the matrix A is non-degenerate, so at most m walls can meet at a common point and hence |S| ≤ m. By the non-degeneracy of A, A_S has full column rank and thus A_S x_S = b has a unique solution. That is, x^{(1)} = x^{(2)}, which is a contradiction. We conclude that if A is non-degenerate and γ(A) > 0, then for any b, the ℓ_1 minimization problem for (A, b) has a unique solution.

In general, it is easy to find a matrix A with γ(A) = 0. However, we would like to argue that most matrices are actually nice. The following lemma shows that a random matrix A sampled from the rotational symmetry model (RSM) is nice. In RSM, each column of A is a uniformly random vector on the unit sphere of R^m. Note that such matrices are commonly used in practice for ℓ_1 minimization, e.g., in compressed sensing.

Lemma 3.2.

Let A ∈ R^{m×n} be a random matrix sampled from RSM. Then γ(A) > 0 with high probability.

Proof.

First, we show that A is non-degenerate with high probability. For any size-m subset S of the columns, denote by E_S the event that A_S does not have full rank. This event has measure zero for every choice of S, and thus, by a union bound, A is non-degenerate with high probability. Similar arguments hold for the other two properties.    

We remark that giving a lower bound on γ(A) in terms of m and n would yield a better asymptotic bound for our main theorem and could have applications to other problems as well. Since the goal of this paper is giving a provable analysis, we do not attempt to optimize this parameter. Note that for A sampled from RSM, γ(A) has an inverse-exponential lower bound directly from a union bound when m and n are polynomially related. As for upper bounds, there is an inverse quasi-polynomial upper bound in one regime and an inverse exponential upper bound in another, as pointed out by an anonymous reviewer from ITCS 2019. See Appendix B for more details. We leave it as an open question to understand the correct asymptotic behavior of γ(A) when A is sampled from RSM.
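Sampling from RSM is straightforward: normalized Gaussian columns are uniform on the unit sphere. A minimal sketch:

```python
import numpy as np

def sample_rsm(m, n, rng=None):
    """Rotational symmetry model: each column is an independent uniformly
    random point on the unit sphere of R^m (normalized Gaussian columns)."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((m, n))
    return A / np.linalg.norm(A, axis=0)

A = sample_rsm(5, 12, np.random.default_rng(0))
print(np.linalg.norm(A, axis=0))   # all (close to) 1
```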

3.4 Ideal coupling

Ideal coupling is a technique for keeping track of the dual SNN by associating any point in the dual polytope with a point in a smaller polytope. Concretely, let P be the dual polytope and let the ideal polytope be a shrunken copy of P governed by a parameter that will be properly chosen at the end of the proof (this choice depends on the spiking strength and the niceness of A and will be discussed later). Observe that the ideal polytope is contained in P. The idea of ideal coupling is to associate each point of P with a point in the ideal polytope. In the analysis, we then focus on the dynamics of the associated point instead of that of the dual SNN itself. Before we formally define the coupling, we have to define a partition of P with respect to the ideal polytope as follows.

Definition 5 (partition of P).

Let P and the ideal polytope be defined as above. For each point in the ideal polytope, define its cell as follows:

where Γ(·) denotes the set of active walls of the point and the cone is the one spanned by the columns of A indexed by Γ(·).

Example 3.3.

Consider the example where and . The dual polytope (resp. ideal polytope) is the square with vertices in the form