I Introduction
Artificial neural networks are becoming indispensable tools in a wide variety of spheres of human activity and in society in general. It is therefore of utmost importance to understand how they process the supplied data. We build on the recent works [1, 3], which established fundamental properties of artificial neural networks in a general Hilbert space setting, such as the existence of fixed point sets, convergence analysis, and Lipschitz continuity. In this paper, we focus on exploiting the natural assumption that the weights of the network are bounded. This assumption allows us to give a simple proof of the fact that a recurrent neural network possesses exactly one fixed cycle, obtained in the iteration limit of the network and originating from its unique fixed point. We also provide a bound on the norm of this fixed point in terms of the norms of the weights and biases. Finally, we discuss why the feed-forward neural network model used in this paper cannot accommodate Hopfield networks under our assumption, which motivates the derivation and analysis of more general neural network models.
II Preliminaries
II-A Functions in Hilbert spaces
Let and be arbitrary Hilbert spaces. For a function , by we denote the fixed point set of . For a set-valued function (with non-empty values) , stands for the set of zeros of . The closure of a subset is denoted by . For a linear operator , by and we denote its range and kernel, respectively. Finally, by we denote the adjoint operator of , and by we denote the identity operator on . To simplify the notation, we omit the subscript of the identity operator whenever the space under consideration is clear from the context.
II-B Convex analysis
Let be a Hilbert space. We denote by the class of lower semi-continuous (l.s.c.) convex functions , which are proper, i.e. such that
(1)
For , the proximal operator is defined as
(2)
The subdifferential of is the set-valued operator given by
(3)
The sets are closed and convex [2, Proposition 16.4].
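For the reader's convenience, we recall the standard convex-analysis formulas that (1)–(3) refer to; the symbols f, \mathcal{H}, x, y, u below are ours and follow the conventions of [2], so they may differ from the displayed equations:

    \operatorname{dom} f := \{ x \in \mathcal{H} : f(x) < +\infty \} \neq \varnothing,
    \operatorname{prox}_f(x) := \operatorname*{argmin}_{y \in \mathcal{H}} \Big( f(y) + \tfrac{1}{2}\|x - y\|^{2} \Big),
    \partial f(x) := \bigl\{ u \in \mathcal{H} : (\forall y \in \mathcal{H})\ \langle y - x, u \rangle + f(x) \le f(y) \bigr\}.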
An operator is called monotone provided that for each we have . A monotone operator is maximally monotone if there is no extension of to a larger monotone operator in the sense that . An example of a maximally monotone operator is the subdifferential .
Denote by the set of functions from to which are increasing, -Lipschitz, and take the value at the argument . This set of functions can be characterized using proximal operators as follows:
Proposition 1.
[1, Proposition 2.3] Let . Then if and only if there exists , which has as its minimizer and .
This fact allows the following definition:
Definition 1.
[1, Definition 2.20] Let be a real Hilbert space and let . Then belongs to if there exists a function which has a minimum at and .
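A standard illustration of Proposition 1 and Definition 1 (our example, not taken from the text above) is the ReLU activation on the real line:

    \operatorname{ReLU}(x) = \max\{x, 0\} = \operatorname{prox}_{\iota_{[0,\infty)}}(x),
    \qquad
    \iota_{[0,\infty)}(x) = \begin{cases} 0, & x \ge 0, \\ +\infty, & x < 0. \end{cases}

Indeed, the proximal operator of the indicator function of a nonempty closed convex set is the metric projection onto that set, the projection of the real line onto [0, \infty) is x \mapsto \max\{x, 0\}, and the indicator function \iota_{[0,\infty)} is proper, l.s.c., convex and attains its minimum at 0.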
III Setting
Let , and let be real Hilbert spaces. Let , i.e. if , then , , , and if , then . For simplicity we will write instead of .
Assumption 1.
Let , i.e. , for a certain with .
Assumption 2.
Let , be bounded linear operators and , . Moreover, let us define by the formula
(4)
Definition 2.
An -layer feed-forward neural network defined on is the composition
(5)
In the theory of neural networks, the functions are called activation operators, the operators are called weight operators, and the elements are called bias parameters.
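The following is a minimal numerical sketch of Definition 2 (our illustration only: it assumes finite-dimensional spaces, matrix weight operators, and ReLU activations, which are proximity operators by the example after Definition 1):

    import numpy as np

    def relu(x):
        # ReLU is the projection onto the nonnegative orthant, hence firmly nonexpansive.
        return np.maximum(x, 0.0)

    def layer(W, b, x, activation=relu):
        # A single layer: affine map followed by an activation operator.
        return activation(W @ x + b)

    def feed_forward(weights, biases, x, activation=relu):
        # An n-layer feed-forward network: the composition of the layers above,
        # applied in the order layer 1, layer 2, ..., layer n.
        for W, b in zip(weights, biases):
            x = layer(W, b, x, activation)
        return x

Here the lists weights and biases play the role of the weight operators and bias parameters of Definition 2.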
Remark 1.
We note that Assumption 2 implies that the neural network is already trained, by which we mean that there exists an optimal setting of the weight operators and bias parameters which fits the input and output of the network to a certain set of training data. In the case of recurrent neural networks, the input and output of the network lie in the same space , so it is natural to ask about the existence of periodic points and about the shape of the set of such points.
Thus, from now on we assume that and denote
(6)
and
(7)
The set consists of fixed (periodic) points of the recurrent neural network
(8)
and the set describes the trajectories of these fixed points across the layers of the neural network (5).
Let and let us introduce operators
Observe that .
Let for and let us define by the formula
(9)
Moreover, let be defined by the formula
(10)
where we denoted , . We also note that
Fact 1.
[2, Proposition 16.9] Under the above assumptions one has
(11)
Consequently, .
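For orientation, a natural reading of the above construction (our assumption: F is the separable sum of the functions f_i on the product space, in which case (11) is the corresponding coordinate-wise decomposition of the subdifferential) is

    F(x_1, \dots, x_n) = \sum_{i=1}^{n} f_i(x_i),
    \qquad
    \partial F(x_1, \dots, x_n) = \partial f_1(x_1) \times \dots \times \partial f_n(x_n).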
Theorem 1.
[1, Part of Proposition 4.3] In terms of the model introduced above, consider the following problem: find such that
(12)
The following holds.
- The set of solutions of the system of inclusions (12) is .
- .
- Let us assume that the operator is monotone. Then the set is closed and convex. Moreover, and are nonempty if any of the following conditions is satisfied:
  - is surjective.
  - , and .
Remark 2.
Note that the problem of finding fixed points of the recurrent neural network (5) reduces to the problem of solving the system of equations
(13)
Under Assumption 1 and using [2, Proposition 16.44], which states that
(14)
the above system (13) can be rewritten as the system of inclusions (12). This is why that inclusion is crucial for our further considerations.
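Written coordinate-wise (in our notation x_i, W_i, b_i, f_i, with the recurrent convention x_0 = x_n; the displayed equations (12)–(14) may use different symbols), this reduction is simply the standard characterization of the proximal operator via the subdifferential, p = \operatorname{prox}_f(x) \iff x - p \in \partial f(p) [2, Proposition 16.44], applied layer by layer:

    x_i = \operatorname{prox}_{f_i}(W_i x_{i-1} + b_i)
    \iff
    W_i x_{i-1} + b_i - x_i \in \partial f_i(x_i),
    \qquad i = 1, \dots, n.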
IV Results
The following proposition shows that, under a single mild assumption, the neural network converges to its unique fixed point.
Proposition 2.
Let
(15)
Then is a singleton. Denote the unique element of by . Furthermore, let and . Then , where denotes the -th coordinate of , and .
Proof.
By Assumption 1 and [2, Proposition 12.28], the activation operators are firmly nonexpansive. Thus, from [3, Proposition 3.3], we obtain in particular that the -layer neural network in (8) is Lipschitz continuous with constant . Therefore, by the Banach fixed point theorem it admits a unique fixed point, and hence is a singleton. Denote this unique fixed point of by . Then the fact that is a singleton follows immediately from the definition of in (7). We also note in passing that . ∎
Remark 3.
The above proposition extends the results of Theorem 1 proved in [1] to the case when condition (15) holds. It is remarkable that, in such a case, no other conditions are required to ensure that is not only nonempty, closed and convex, but actually a singleton. Moreover, note that the sequence of iterates converges in the strong topology to the unique fixed point.
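A small numerical illustration of this remark (our sketch, with hypothetical dimensions and with the weights rescaled so that the product of the layer Lipschitz constants is below one, i.e. a condition of the type (15) holds):

    import numpy as np

    rng = np.random.default_rng(0)
    n_layers, dim = 3, 5

    # Random weight matrices, rescaled so that each spectral norm is at most 0.5;
    # with 1-Lipschitz ReLU activations the whole network is then a contraction.
    weights = []
    for _ in range(n_layers):
        W = rng.standard_normal((dim, dim))
        weights.append(0.5 * W / np.linalg.norm(W, 2))
    biases = [rng.standard_normal(dim) for _ in range(n_layers)]

    def network(x):
        # One recurrent pass through the n layers x -> ReLU(W x + b).
        for W, b in zip(weights, biases):
            x = np.maximum(W @ x + b, 0.0)
        return x

    # Iterating the network from two different starting points yields the same limit,
    # as guaranteed by the Banach fixed point theorem for a contraction.
    x, y = rng.standard_normal(dim), 10.0 * rng.standard_normal(dim)
    for _ in range(200):
        x, y = network(x), network(y)
    print(np.allclose(x, y), np.linalg.norm(x - network(x)))  # True, and a residual close to 0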
Corollary 1.
In particular, if
(16)
then is a singleton.
The next proposition provides a bound on the norm of the unique element of
Proposition 3.
Proof.
We note first that, under condition (15), is the unique solution of system (12), and in such a case
(18)
Thus, from [2, Proposition 16.44] one has
(19)
Hence, using the fact that (cf. Definition 1), from [1, Proposition 2.21] one has in particular that for all
(20)
For , this implies that
(21)
(22)
where we have used the fact that . The inequality (17) follows. If , then , and from Proposition 2 one concludes that in such a case . ∎
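To indicate the flavour of such an estimate, here is a sketch under simplified assumptions of our own (a single layer, an activation R that is nonexpansive and satisfies R(0) = 0, the latter holding because the underlying function attains its minimum at 0, and a weight operator with \|W\| < 1); the actual inequality (17) may take a different, layer-wise form:

    \|\bar{x}\| = \|R(W\bar{x} + b)\| = \|R(W\bar{x} + b) - R(0)\|
    \le \|W\bar{x} + b\| \le \|W\|\,\|\bar{x}\| + \|b\|,
    \qquad\text{hence}\qquad
    \|\bar{x}\| \le \frac{\|b\|}{1 - \|W\|}.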
The fact that the neural network converges to a single fixed point may not be desirable in certain applications. The following remark demonstrates that in such a case a more general network model must be considered.
Remark 4.
Consider the Hopfield neural network model given as follows:
(23)
where is the state vector, is the block-diagonal matrix of self-inhibition of the neurons, and , is the continuous activation function of the neural network.
Any equilibrium point of the above network satisfies
(24)
Therefore,
(25)
Denote . Since, by [2, Proposition 16.44], , we have
(26)
Thus,
(27)
which is of the form (12). In particular, under appropriate assumptions on the operators and , one can ensure that , which by Proposition 2 leads to a situation where network (23) has only one fixed point. Hence, our model (and the one described in [1]) may not be adequate in this case, as Hopfield network learning relies on memorizing many distinct fixed points of the network.
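To see this failure mode numerically, consider the following sketch (our discrete-time analogue of (23), not the model itself: tanh plays the role of the activation and the connection matrix is deliberately rescaled so that the update map is a contraction):

    import numpy as np

    rng = np.random.default_rng(1)
    dim = 8

    # Symmetric connection matrix, rescaled so that the update map is a contraction.
    W = rng.standard_normal((dim, dim))
    W = 0.5 * (W + W.T)
    W = 0.4 * W / np.linalg.norm(W, 2)
    b = rng.standard_normal(dim)

    def update(x):
        # Discrete Hopfield-type update with the 1-Lipschitz tanh activation,
        # so the overall map is Lipschitz with constant at most ||W|| < 1.
        return np.tanh(W @ x + b)

    # Patterns that a Hopfield memory would like to store become irrelevant:
    # every initial state is driven to one and the same fixed point.
    starts = [rng.choice([-1.0, 1.0], size=dim) for _ in range(5)]
    finals = []
    for x in starts:
        for _ in range(500):
            x = update(x)
        finals.append(x)
    print(all(np.allclose(finals[0], f) for f in finals))  # True: a single attractor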
References
- [1] P. L. Combettes and J.-C. Pesquet, Deep Neural Network Structures Solving Variational Inequalities, arXiv preprint arXiv:1808.07526, 2019.
- [2] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed. New York: Springer, 2017.
- [3] P. L. Combettes and J.-C. Pesquet, Lipschitz Certificates for Neural Network Structures Driven by Averaged Activation Operators, arXiv preprint arXiv:1903.01014, 2019.