1 Network model with disynaptic inhibition
The disynaptic inhibition network (Fig. 1, left) has the activity dynamics

    x_i := f( x_i + η ( Σ_a W_ia u_a − Σ_k Q_ki y_k − λ_i x_i ) )    (1)

    y_k := Σ_i Q_ki x_i    (2)

Here u_a are the components of the nonnegative input vector, x_i are the activities of the output cells, and y_k are the activities of the inhibitory interneurons. The step size parameter η can be set at a small constant value or adjusted adaptively. The activation function f is half-wave rectification, f(z) = max{z, 0}. After the activities converge to a steady state, update the connection matrices via

    ΔW_ia = η_W ( x_i u_a − W_ia )    (3)

    ΔQ_ki = η_Q ( y_k x_i − p² Q_ki − q² Σ_{j≠i} Q_kj )    (4)

where η_W and η_Q are learning rate parameters, W_ia is the excitatory connection from input a to output cell i, and Q_ki is the excitatory connection from output cell i to interneuron k. After the updates (3) and (4), any negative elements of W and Q are zeroed to maintain nonnegativity. The divisive factor λ_i in Eq. (1) is updated via

    Δλ_i = η_λ ( x_i² − p² )    (5)

followed by zeroing of any negative λ_i. (The factor λ_i is divisive in the sense that a steady state of Eq. (1) with x_i > 0 satisfies λ_i x_i = Σ_a W_ia u_a − Σ_k Q_ki y_k.) The parameters p and q are interpreted in Section 2.
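As a concrete illustration, here is a minimal NumPy sketch of one plausible realization of this scheme: relax the activities to a steady state for each input, then apply the local updates. All dimensions, learning rates, and the input distribution are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and parameters (assumptions, not from the paper):
m, n, k = 20, 10, 10                      # inputs, output cells, interneurons
p2, q2 = 1.0, 0.01                        # power bound p^2, decorrelation bound q^2
eta = 0.05                                # step size for the activity dynamics
eta_w, eta_q, eta_l = 0.01, 0.005, 0.05   # learning rates

W = rng.uniform(0, 0.1, (n, m))   # input -> output connections
Q = rng.uniform(0, 0.1, (k, n))   # output -> interneuron connections
lam = np.ones(n)                  # divisive factors

def f(z):
    """Half-wave rectification."""
    return np.maximum(z, 0.0)

def steady_state(u, n_iter=150):
    """Relax the activity dynamics for one input vector u."""
    x = np.zeros(n)
    for _ in range(n_iter):
        y = Q @ x                                     # interneuron activities
        x = f(x + eta * (W @ u - Q.T @ y - lam * x))
    return x, Q @ x

for _ in range(300):
    u = f(rng.normal(size=m))     # a nonnegative input sample
    x, y = steady_state(u)
    # Local updates; rectification zeroes any negative elements.
    # Note p^2 Q_ki + q^2 sum_{j != i} Q_kj = (p^2 - q^2) Q_ki + q^2 sum_j Q_kj.
    W = f(W + eta_w * (np.outer(x, u) - W))
    Q = f(Q + eta_q * (np.outer(y, x) - (p2 - q2) * Q
                       - q2 * Q.sum(axis=1, keepdims=True)))
    lam = f(lam + eta_l * (x * x - p2))
```

The inner loop plays the role of the activity dynamics (1)–(2); the outer loop applies the weight and divisive-factor updates on a slower timescale.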
Intuitions behind the model definitions are explained in the companion paper (Seung, 2018). The goal of the present paper is to show how the network can be interpreted as a method of solving a zero-sum game.
2 Correlation game between connections
2.1 Formulation as constrained optimization
The first normative principle concerns transformation of a sequence of input vectors u_1, …, u_T into a sequence of output vectors x_1, …, x_T. Both input and output are assumed nonnegative. Define the input matrix U as the m × T matrix containing the input vectors as its columns. The element U_at is the a-th component of u_t. Similarly, define the output matrix X as the n × T matrix containing the output vectors as its columns. Define the output-input correlation matrix as

    C^{xu} = (1/T) X U^⊤.

Its element C^{xu}_ia is the time average of x_i u_a, or ⟨x_i u_a⟩. Similarly, define the output-output correlation matrix

    C^{xx} = (1/T) X X^⊤.

Its element C^{xx}_ij is the time average of x_i x_j, or ⟨x_i x_j⟩. Note that "correlation matrix" is used to mean second moment matrix rather than covariance matrix. In other words, the correlation matrix does not involve subtraction of mean values. This is natural for sparse nonnegative variables, but covariance matrices may be substituted in other settings.
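In code, these second-moment correlation matrices are simple matrix products. The sketch below (with arbitrary illustrative data) also contrasts the second moment with the covariance, which subtracts means.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative sizes: T time steps, m input components, n output components.
T, m, n = 1000, 6, 4
U = np.abs(rng.normal(size=(m, T)))   # nonnegative input vectors as columns
X = np.abs(rng.normal(size=(n, T)))   # nonnegative output vectors as columns

C_xu = X @ U.T / T    # output-input correlations, element <x_i u_a>
C_xx = X @ X.T / T    # output-output correlations, element <x_i x_j>

# "Correlation" here means second moment: unlike the covariance, no mean
# is subtracted. For these nonnegative data the two clearly differ.
Cov_xx = np.cov(X, bias=True)
print(C_xx[0, 0], Cov_xx[0, 0])   # second moment exceeds the variance
```

The difference between the two matrices is exactly the outer product of the mean output vector with itself.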
Problem 1 (Constrained optimization).
Define the goal of unsupervised learning as the constrained optimization

    max_{X ≥ 0} R( X U^⊤ / T )  such that  D − X X^⊤ / T is copositive,    (6)

where D is a fixed n × n matrix and R is a scalar-valued function that is assumed monotone nondecreasing as a function of every element of its matrix-valued argument.
Monotonicity is an important assumption because it allows us to interpret the objective of Eq. (6) as maximization of input-output correlations.
2.2 Copositivity vs. nonnegativity
Seung and Zung (2017) introduced the principle
    max_{X ≥ 0} R( X U^⊤ / T )  such that  D − X X^⊤ / T ≥ 0,    (7)

which differs from Eq. (6) only by the substitution of "nonnegativity" for "copositivity." (Here nonnegativity of a matrix is defined to mean nonnegativity of all its elements.) While the formalisms here are valid for arbitrary D, a convenient choice is to set the diagonal elements of D to be p² and the off-diagonal elements of D to be q²,

    D_ij = p² δ_ij + q² ( 1 − δ_ij ).    (8)

If q² is much smaller than p², the nonnegativity constraint in Eq. (7) amounts to decorrelation.
A symmetric matrix M is said to be copositive when v^⊤ M v ≥ 0 for every nonnegative vector v. This constraint is analogous to positive semidefiniteness but is more complex because it cannot be reduced to a single eigenvalue constraint. Hahnloser et al. (2003) give necessary and sufficient conditions for copositivity involving eigenvalues of submatrices.
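Copositivity can be probed numerically by minimizing the quadratic form over nonnegative unit vectors. The sketch below does this by dense sampling (adequate in low dimension); the example matrices are mine, chosen to show that a copositive matrix need not be positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)

def min_quadratic_form(M, n_samples=200_000):
    """Approximate min of v^T M v over nonnegative unit vectors v by sampling."""
    V = np.abs(rng.normal(size=(n_samples, M.shape[0])))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    return np.min(np.einsum('ij,jk,ik->i', V, M, V))

# Copositive (and also positive semidefinite): the form is 1 - v1*v2 >= 1/2.
M1 = np.array([[1.0, -0.5], [-0.5, 1.0]])
# Copositive but NOT positive semidefinite (eigenvalues 3 and -1):
# all elements are nonnegative, so v^T M v >= 0 whenever v >= 0.
M2 = np.array([[1.0, 2.0], [2.0, 1.0]])
# Not copositive: v = (1,1)/sqrt(2) gives v^T M v = 1 - 3/2 < 0.
M3 = np.array([[1.0, -1.5], [-1.5, 1.0]])

print(min_quadratic_form(M1))   # close to 0.5
print(min_quadratic_form(M2))   # close to 1.0
print(min_quadratic_form(M3))   # close to -0.5
```

M2 illustrates why copositivity cannot be reduced to a single eigenvalue constraint: it has a negative eigenvalue yet its quadratic form is nonnegative on the nonnegative orthant.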
Nonnegativity of D − X X^⊤ / T is a sufficient condition for its copositivity, but it is not a necessary condition. In particular, copositivity of D − X X^⊤ / T does not require nonnegativity, so a solution of Problem 1 may have ⟨x_i x_j⟩ > q² for some i and j.
A necessary condition for copositivity of a symmetric matrix M is nonnegativity of its diagonal elements, since e_i^⊤ M e_i = M_ii where e_1, …, e_n denotes the standard basis for ℝ^n. In particular, copositivity of D − X X^⊤ / T requires that ⟨x_i²⟩ ≤ p² for all i. These inequalities will be called "power constraints," because they limit the power in the outputs.
If either of the diagonal elements M_ii and M_jj vanishes, then a necessary condition for copositivity is nonnegativity of the off-diagonal element M_ij. Therefore ⟨x_i x_j⟩ may exceed q² in a solution of Problem 1 only if the power constraints for x_i and x_j are not saturated.
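A quick numerical illustration of this necessary condition (example values mine): with a vanishing diagonal element and a negative off-diagonal element, a suitable nonnegative vector makes the quadratic form negative.

```python
import numpy as np

# M11 = 0 and M12 = -eps < 0: for the nonnegative vector v = (s, 1),
#   v^T M v = -2*eps*s + M22,
# which turns negative once s > M22 / (2*eps). So M cannot be copositive.
eps, M22 = 0.1, 1.0
M = np.array([[0.0, -eps], [-eps, M22]])
s = M22 / (2 * eps) + 1.0        # just beyond the threshold
v = np.array([s, 1.0])
print(v @ M @ v)                 # negative, so M is not copositive
```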
2.3 Correlation game from Legendre-Lagrangian duality
The copositivity constraint in Eq. (6) can be enforced by introducing Lagrange multipliers Q and Λ,

    max_{X ≥ 0} min_{Q ≥ 0, Λ ≥ 0} { R( X U^⊤ / T ) + (1/2) Tr[ Q ( D − X X^⊤ / T ) Q^⊤ ] + (1/2) Tr[ Λ ( D − X X^⊤ / T ) ] }.    (9)

The Lagrange multiplier Q is a nonnegative k × n matrix. The outer maximum must choose X so that D − X X^⊤ / T is copositive, because otherwise the minimum with respect to Q is −∞. The Lagrange multiplier Λ is a nonnegative diagonal matrix. The outer maximum must choose X so that the diagonal elements of D − X X^⊤ / T are nonnegative, because otherwise the minimum with respect to Λ is −∞.
As mentioned above, copositivity of D − X X^⊤ / T by itself already implies that the diagonal elements are nonnegative. It follows that the Lagrange multiplier Λ is redundant for the primal problem, though it does affect the dual problem. Similarly, adding extra rows to the Lagrange multiplier Q does not change the primal problem. For enforcing the copositivity constraint, it would be sufficient for Q to be a single row vector (k = 1). However, making k > 1 does affect the dual problem.
Problem 2 (Game between cells and connections).
Switching the order of min and max in Eq. (9) yields the dual problem,

    min_{Q ≥ 0, Λ ≥ 0} max_{X ≥ 0} { R( X U^⊤ / T ) + (1/2) Tr[ Q ( D − X X^⊤ / T ) Q^⊤ ] + (1/2) Tr[ Λ ( D − X X^⊤ / T ) ] }.    (10)
This is an upper bound for Eq. (9) by the minimax inequality.
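The minimax inequality is easy to see on a finite zero-sum game; the payoff matrix below is a made-up example without a saddle point, where the inequality is strict.

```python
import numpy as np

# For any payoff matrix A (row player maximizes, column player minimizes):
#   max_i min_j A[i, j]  <=  min_j max_i A[i, j].
# Matching pennies has no saddle point, so the inequality is strict.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

maximin = A.min(axis=1).max()   # row player's guaranteed payoff: -1
minimax = A.max(axis=0).min()   # column player's guaranteed cap: +1
print(maximin, minimax)         # -1.0 1.0
```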
At this point, it is convenient to define the objective function R as the convex conjugate (Legendre-Fenchel transform) of a function r,

    R( C ) = max_{W ≥ 0} { Tr( W C^⊤ ) − r( W ) }.    (11)
The nonnegativity constraint on W in Eq. (11) guarantees that R is monotone nondecreasing as a function of every element of C. The function r can be interpreted as a regularizer or prior for the weight matrix W.
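For instance, with the quadratic regularizer r(W) = (1/2)‖W‖²_F (an illustrative choice), the conjugate in Eq. (11) can be computed elementwise: the maximizing W is the positive part of C. The sketch below checks the closed form and the monotonicity property.

```python
import numpy as np

rng = np.random.default_rng(2)

# Convex conjugate R(C) = max_{W >= 0} [ sum(W * C) - r(W) ] for the quadratic
# regularizer r(W) = 0.5 * ||W||_F^2. Elementwise maximization gives
# W* = max(C, 0), hence R(C) = 0.5 * ||max(C, 0)||_F^2.
def R(C):
    return 0.5 * np.sum(np.maximum(C, 0.0) ** 2)

C = rng.normal(size=(4, 3))

# Monotone nondecreasing: increasing any element of C cannot decrease R,
# because the maximization is restricted to nonnegative W.
C2 = C.copy()
C2[1, 2] += 1.0
print(R(C), R(C2))    # R(C2) >= R(C)

# Sanity check of the closed form: no randomly sampled W >= 0 does better.
candidates = np.abs(rng.normal(size=(20000, 4, 3)))
best = max(np.sum(W * C) - 0.5 * np.sum(W ** 2) for W in candidates)
assert best <= R(C) + 1e-12
```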
With Legendre duality, a maximization with respect to W is implicit in Eq. (11). Switching the order of the X and W maximizations yields the following equivalent problem.
Problem 3 (Game between connections).
The Lagrangian dual of the constrained optimization in Problem 1 is

    min_{Q ≥ 0, Λ ≥ 0} max_{W ≥ 0} v( W, Q, Λ )    (12)

with payoff function defined by

    v( W, Q, Λ ) = −r( W ) + (1/2) Tr[ ( Q^⊤ Q + Λ ) D ] + (1/T) Σ_t max_{x_t ≥ 0} { x_t^⊤ W u_t − (1/2) x_t^⊤ ( Q^⊤ Q + Λ ) x_t }.    (13)
The min-max problem can be interpreted as a zero-sum game between the connections W on the one hand and Q and Λ on the other.
Problem 3 is closely related to the correlation game previously introduced by Seung and Zung (2017),

    min_{L ≥ 0} max_{W ≥ 0} { −r( W ) + (1/2) Tr[ L D ] + (1/T) Σ_t max_{x_t ≥ 0} ( x_t^⊤ W u_t − (1/2) x_t^⊤ L x_t ) }.    (14)

Problem 3 constrains L = Q^⊤ Q + Λ for some nonnegative Q and Λ, so it is an upper bound for Eq. (14). This is the mathematical interpretation of choosing a parametrized form for the Lagrange multiplier, as was done by Seung (2018).
The network model of Section 1 follows by setting

    r( W ) = (1/2) Σ_ia W_ia²

in Eq. (13) and applying online projected gradient ascent to perform the maximizations in Eq. (12) and online projected gradient descent to perform the minimizations. For a more general choice of r, Eq. (3) should be replaced by

    ΔW_ia = η_W ( x_i u_a − ∂r/∂W_ia ).
3 Correlation game between cells
The second normative principle concerns transformation of a sequence of nonnegative input vectors u_1, …, u_T into two sequences of nonnegative output vectors x_1, …, x_T and y_1, …, y_T. Define the input matrix U and the two output matrices X and Y, each containing its vectors as columns.
Problem 4 (Game between cells).
Define the goal of unsupervised learning as the zero-sum game between the x cells and the y cells,

    max_{X ≥ 0} min_{Y ≥ 0} { R( X U^⊤ / T ) − S( Y X^⊤ / T ) + (1/(2T)) Tr( Y^⊤ Y ) },    (15)

where R and S are scalar-valued functions assumed monotone nondecreasing as a function of every element of their matrix-valued arguments.
Note that only nonnegativity constraints remain in Problem 4; the copositivity constraint of Problem 1 is completely hidden. This correlation game can be interpreted as follows. The x cells would like to maximize x-u correlations (make R large) and minimize x-y correlations (make S small). The y cells would like to maximize x-y correlations (make S large) and minimize power (make Tr( Y^⊤ Y ) small). There is conflict between the x and y cells because x cells would like to minimize x-y correlations while y cells would like to maximize them. The compromise is that the x cells are only incompletely decorrelated from each other.
The connection to Problem 2 follows by defining

    S( C ) = max_{Q ≥ 0} { Tr( Q C^⊤ ) − s( Q ) }    (16)

as the Legendre transform of

    s( Q ) = (1/2) Tr( Q D Q^⊤ ).    (17)
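A small numerical check of the mechanics behind this construction (array shapes and data are illustrative assumptions): for fixed nonnegative Q and x, trading power against correlation is minimized by y = Qx, and the nonnegativity constraint on y is automatically inactive.

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimize g(y) = 0.5*||y||^2 - y^T (Q x) over y >= 0, for nonnegative Q, x.
# The unconstrained minimizer y* = Q x is itself nonnegative, and the
# minimum value is -0.5*||Q x||^2.
k, n = 5, 4
Q = np.abs(rng.normal(size=(k, n)))
x = np.abs(rng.normal(size=n))

def g(y):
    return 0.5 * y @ y - y @ (Q @ x)

y_star = Q @ x
print(g(y_star), -0.5 * np.sum(y_star ** 2))   # the two values agree

# No sampled nonnegative y does better, since g(y) - g(y*) = 0.5*||y - y*||^2.
samples = np.abs(rng.normal(size=(10000, k)))
assert min(g(y) for y in samples) >= g(y_star) - 1e-9
```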
4 Discussion
The second normative principle is also interesting because it can be generalized to include additional types of connections. This will be the subject of future work.
Acknowledgments
The author is grateful for helpful discussions with J. Zung, C. Pehlevan and D. Chklovskii. The research was supported in part by the Intelligence Advanced Research Projects Activity (IARPA) via DoI/IBC contract number D16PC0005, and by the National Institutes of Health via U19 NS104648 and U01 NS090562.
References
- Földiák [1990] Peter Földiák. Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64(2):165–170, 1990.
- Hahnloser et al. [2003] Richard HR Hahnloser, H Sebastian Seung, and Jean-Jacques Slotine. Permitted and forbidden sets in symmetric threshold-linear networks. Neural Computation, 15(3):621–638, 2003.
- Pehlevan and Chklovskii [2015] Cengiz Pehlevan and Dmitri Chklovskii. A normative theory of adaptive dimensionality reduction in neural networks. In Advances in Neural Information Processing Systems, pages 2269–2277, 2015.
- Pehlevan et al. [2018] Cengiz Pehlevan, Anirvan M Sengupta, and Dmitri B Chklovskii. Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks? Neural Computation, 30(1):84–124, 2018.
- Seung [2018] H Sebastian Seung. Unsupervised learning by a nonlinear network with Hebbian excitatory and anti-Hebbian inhibitory neurons. arXiv preprint, 2018.
- Seung and Zung [2017] H Sebastian Seung and Jonathan Zung. A correlation game for unsupervised learning yields computational interpretations of Hebbian excitation, anti-Hebbian inhibition, and synapse elimination. arXiv preprint arXiv:1704.00646, 2017.
- Zylberberg et al. [2011] Joel Zylberberg, Jason Timothy Murphy, and Michael Robert DeWeese. A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of V1 simple cell receptive fields. PLoS Computational Biology, 7(10):e1002250, 2011.