# Iterative Neural Networks with Bounded Weights

A recent analysis of a model of iterative neural networks in Hilbert spaces established fundamental properties of such networks, such as the existence of fixed point sets, convergence analysis, and Lipschitz continuity. Building on these results, we show that under a single mild condition on the weights of the network, one is guaranteed to obtain a neural network converging to its unique fixed point. We provide a bound on the norm of this fixed point in terms of the norms of the weights and biases of the network. We also show why this model of a feed-forward neural network is not able to accommodate Hopfield networks under our assumption.


## I Introduction

Artificial neural networks are becoming indispensable tools in a variety of spheres of human activity and society in general. It is therefore of utmost importance to understand the way they process the supplied data. We build on the recent works [1, 3], which established fundamental properties of artificial neural networks in a general Hilbert space setting, such as the existence of fixed point sets, convergence analysis, and Lipschitz continuity. In this paper, we focus on exploiting a natural assumption that the weights of the network are bounded. This assumption allows us to provide a simple proof of the fact that a recurrent neural network possesses exactly one fixed cycle, obtained in the iteration limit of the network and originating from its unique fixed point. We also provide a bound on the norm of this fixed point in terms of the norms of the weights and biases. Finally, we discuss why the feed-forward neural network model used in this paper is not able to accommodate Hopfield networks under our assumption, which provides a motivation to derive and analyze more generic neural network models.

## II Preliminaries

### II-A Functions in Hilbert spaces

Let H and K be arbitrary Hilbert spaces. For a function T : H → H, by Fix(T) we denote the fixed point set of T. For a set-valued function (with non-empty values) A : H → 2^H, Zer(A) stands for the set of zeros of A. The closure of a subset C ⊂ H is denoted by cl(C). For a linear operator W : H → K, by ran(W) and ker(W) we denote its range and kernel, respectively. Finally, by W* we denote the adjoint operator of W, and by Id_H we denote the identity operator on H. To simplify the notation, we omit the subscript of the identity operator if the space under consideration is clear from the context.

### II-B Convex analysis

Let H be a Hilbert space. We denote by Γ0(H) the class of lower semi-continuous (l.s.c.) convex functions ϕ : H → (−∞,+∞] which are proper, i.e. such that

 Domϕ:={x∈H∣ϕ(x)<+∞}≠∅. (1)

For ϕ ∈ Γ0(H), the proximal operator proxϕ : H → H is defined as

 proxϕ(x):=argminy∈H(ϕ(y)+1/2∥x−y∥2). (2)

The subdifferential of ϕ is the set-valued operator ∂ϕ : H → 2^H given by

 ∂ϕ(x):={u∈H∣⟨y−x,u⟩+ϕ(x)≤ϕ(y)for ally∈H}. (3)

The sets ∂ϕ(x) are closed and convex [2, Proposition 16.4].
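As a concrete scalar illustration of (2) (a toy example of ours, not taken from the cited works), the sketch below evaluates two proximal operators with well-known closed forms and cross-checks one of them against a brute-force minimization of the objective in (2).

```python
import numpy as np

def prox_abs(x, lam=1.0):
    # Closed form of prox for phi(y) = lam*|y| (soft-thresholding):
    # argmin_y ( lam*|y| + 0.5*(x - y)^2 ).
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_relu(x):
    # prox of the indicator of [0, +inf) is the projection onto it,
    # i.e. exactly the ReLU activation.
    return np.maximum(x, 0.0)

def prox_numeric(phi, x, grid):
    # Brute-force approximation of (2): minimize phi(y) + 0.5*(x - y)^2
    # over a fine grid of candidate points y.
    vals = phi(grid) + 0.5 * (x - grid) ** 2
    return grid[np.argmin(vals)]
```

For instance, prox_abs(2.5) returns 1.5, and prox_numeric(np.abs, 2.5, np.linspace(-5, 5, 100001)) agrees with it up to the grid resolution.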

An operator A : H → 2^H is called monotone provided that for each (x,u) and (y,v) in its graph we have ⟨x−y, u−v⟩ ≥ 0. A monotone operator A is maximally monotone if there is no extension of A to a larger monotone operator B, in the sense that gra(A) ⊊ gra(B). An example of a maximally monotone operator is the subdifferential ∂ϕ of ϕ ∈ Γ0(H).

Consider the set of functions from ℝ to ℝ which are increasing, 1-Lipschitz, and take the value 0 at 0. This set of functions can be characterized using proximal operators as follows:

###### Proposition 1.

[1, Proposition 2.3] Let σ : ℝ → ℝ. Then σ belongs to this set if and only if there exists ϕ ∈ Γ0(ℝ) which has 0 as its minimizer and σ = proxϕ.

This fact allows the following definition:

###### Definition 1.

[1, Definition 2.20] Let H be a real Hilbert space and let σ : H → H. Then σ is an admissible activation operator if there exists a function ϕ ∈ Γ0(H) such that ϕ has a minimum at 0 and σ = proxϕ.

## Iii Setting

Let n ∈ ℕ and let H0, H1, …, Hn be real Hilbert spaces. Let H := H1 × ⋯ × Hn be their Hilbert direct sum, i.e. if x, y ∈ H, then x = (x1,…,xn) and y = (y1,…,yn) with xi, yi ∈ Hi, ⟨x,y⟩ = n∑i=1⟨xi,yi⟩, and ∥x∥2 = n∑i=1∥xi∥2. For simplicity we will write x instead of (x1,…,xn).

###### Assumption 1.

Let σi satisfy Definition 1 for i = 1,…,n, i.e. σi = proxϕi for a certain ϕi ∈ Γ0(Hi) with a minimum at 0.

###### Assumption 2.

Let Wi : Hi−1 → Hi, i = 1,…,n, be bounded linear operators and let bi ∈ Hi, i = 1,…,n. Moreover, let us define gi : Hi−1 → Hi by the formula

 gi(x) := σi(Wix + bi), for x ∈ Hi−1, i = 1,…,n. (4)
###### Definition 2.

An n-layer feed-forward neural network defined on H0 is the composition

 gn∘⋯∘g1. (5)

In the theory of neural networks, the functions σi are called activation operators, the operators Wi are called weight operators, and the elements bi are called bias parameters.
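Definition 2 can be sketched in a few lines of numpy; the toy dimensions, the random weights, and the choice of ReLU below are illustrative assumptions of ours, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)  # an activation operator (cf. Section II)

def make_layer(W, b, sigma):
    # One layer g_i(x) = sigma_i(W_i x + b_i), as in (4).
    return lambda x: sigma(W @ x + b)

def network(layers):
    # The n-layer feed-forward network (5): g_n o ... o g_1.
    def g(x):
        for layer in layers:
            x = layer(x)
        return x
    return g

dims = [3, 4, 3]  # H_0 = R^3, H_1 = R^4, H_2 = R^3 (toy sizes)
layers = [
    make_layer(rng.normal(size=(dims[i + 1], dims[i])),  # weight operator W_i
               rng.normal(size=dims[i + 1]),             # bias parameter b_i
               relu)
    for i in range(len(dims) - 1)
]
g = network(layers)
y = g(np.ones(3))  # the output lives in H_2 = R^3
```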

###### Remark 1.

We note that Assumption 2 implies that the neural network is already trained, by which we understand that there exists an optimal setting of the weight operators and bias parameters which fits the input and output of the network to a certain set of training data. In the case of recurrent neural networks, the input and output of the network are in the same space, so it is natural to ask about the existence of periodic points and about the shape of the set of these points.

Thus, from now on we assume that Hn = H0 and denote

 G:=Fix(gn∘⋯∘g1)⊂H0, (6)

and

 F:={(x1,…,xn)∈H∣x1=g1(xn),x2=g2(x1),…,xn=gn(xn−1)}. (7)

The set G consists of the fixed (periodic) points of the recurrent neural network

 g:=gn∘⋯∘g1:H0→H1→⋯→Hn=H0, (8)

and the set F describes the trajectories across the layers of the neural network (5) of these fixed points.

Let x = (x1,…,xn) ∈ H and let us introduce the operators

 S : H → H : (x1,…,xn−1,xn) ↦ (xn,x1,…,xn−1),
 W : H → H : (xn,x1,…,xn−1) ↦ (W1xn,W2x1,…,Wnxn−1).

Observe that S is unitary, and hence ∥W∘S∥ = ∥W∥.
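In a finite-dimensional toy model (block sizes and weights are our arbitrary choices), S is a block permutation matrix and W is block diagonal; the sketch below checks the action of W∘S and the equality ∥W∘S∥ = ∥W∥, which follows from S being unitary and is used in the proof of Proposition 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 3, 2  # three blocks, each of dimension 2

# S cyclically shifts the blocks: (x1, x2, x3) -> (x3, x1, x2).
S = np.zeros((n * d, n * d))
for i in range(n):
    j = (i - 1) % n  # output block i takes input block i-1 (mod n)
    S[i * d:(i + 1) * d, j * d:(j + 1) * d] = np.eye(d)

# W applies W_i to the i-th block of its argument (block diagonal).
blocks = [rng.normal(size=(d, d)) for _ in range(n)]
W = np.zeros((n * d, n * d))
for i, Wi in enumerate(blocks):
    W[i * d:(i + 1) * d, i * d:(i + 1) * d] = Wi

# (W o S)(x1, x2, x3) = (W1 x3, W2 x1, W3 x2); the operator norms agree
# because S is a permutation (hence unitary).
WS = W @ S
norm_WS = np.linalg.norm(WS, 2)
norm_W = np.linalg.norm(W, 2)
```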

Let ϕi ∈ Γ0(Hi) be such that σi = proxϕi for i = 1,…,n, and let us define ϕ : H → (−∞,+∞] by the formula

 ϕ(x1,…,xn):=n∑i=1ϕi(xi). (9)

Moreover, let ψ : H → (−∞,+∞] be defined by the formula

 ψ(x1,…,xn):=n∑i=1(ϕi(xi)−⟨xi,bi⟩)=ϕ(x)−⟨x,b⟩, (10)

where we denoted x := (x1,…,xn) and b := (b1,…,bn). We also note the following fact.

###### Fact 1.

[2, Proposition 16.9] Under the above assumptions one has

 ∂ϕ(x1,…,xn)=∂ϕ1(x1)×…×∂ϕn(xn). (11)

Consequently, proxϕ(x1,…,xn) = (proxϕ1(x1),…,proxϕn(xn)).

###### Theorem 1.

[1, Part of Proposition 4.3] In terms of the model introduced above, consider the following problem: find x̄ = (x̄1,…,x̄n) ∈ H such that

 b1 ∈ x̄1 − W1x̄n + ∂ϕ1(x̄1),
 b2 ∈ x̄2 − W2x̄1 + ∂ϕ2(x̄2),
 ⋮
 bn ∈ x̄n − Wnx̄n−1 + ∂ϕn(x̄n). (12)

The following holds.

1. The set of solutions of the system of inclusions (12) is F.

2. .

3. Let us assume that the operator Id − W∘S is monotone. Then the set F is closed and convex. Moreover, F and G are nonempty if any of the following conditions is satisfied:

1. is surjective.

2. , and .

###### Remark 2.

Note that the problem of finding fixed points of the recurrent neural network (5) reduces to the problem of solving the system of equations

 x̄1 = g1(x̄n) = σ1(W1x̄n + b1),
 x̄2 = g2(x̄1) = σ2(W2x̄1 + b2),
 ⋮
 x̄n = gn(x̄n−1) = σn(Wnx̄n−1 + bn). (13)

Under Assumption 1 and using [2, Proposition 16.44], which states that

 proxϕi=(I+∂ϕi)−1, (14)

the above system (13) can be rewritten as the system of inclusions (12). That is why this inclusion is crucial for our further considerations.

## IV Results

The following proposition shows that, under a single mild assumption, the neural network converges to its unique fixed point.

###### Proposition 2.

Let

 n∏i=1∥Wi∥<1. (15)

Then, G is a singleton. Denote the unique element of G as x̄. Furthermore, F is a singleton as well; we denote its unique element by xF, and x̄ = (xF)n, where (xF)n denotes the n-th coordinate of xF.

###### Proof.

By Assumption 1, from [2, Proposition 12.28], the activation operators σi are firmly nonexpansive. Thus, from [3, Proposition 3.3], we obtain in particular that the n-layered neural network g in (8) is Lipschitz continuous with constant n∏i=1∥Wi∥ < 1. Therefore, from the Banach fixed point theorem it admits a unique fixed point, thus G is a singleton. Denote this unique fixed point of g as x̄. Then, the fact that F is a singleton follows immediately from the definition of F in (7). We also note en passant that x̄ = (xF)n.

###### Remark 3.

The above proposition extends the results of Theorem 1 proved in [1] to the case when condition (15) holds. It is remarkable that, in such a case, no other conditions are required to ensure that F is not only nonempty, closed and convex, but actually a singleton. Moreover, note that the sequence of iterations of the network converges in the strong topology to the unique fixed point.
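Proposition 2 and Remark 3 can be observed numerically. In the sketch below (ours; the sizes, the 0.7 scaling, and the ReLU activation are illustrative assumptions), random weights are rescaled so that condition (15) holds, and Picard iteration of g reaches the same fixed point from two different starting points.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 3

# Rescale random weights so that ||W_i|| = 0.7, hence prod_i ||W_i|| < 1, i.e. (15).
Ws, bs = [], []
for _ in range(n):
    A = rng.normal(size=(d, d))
    Ws.append(0.7 * A / np.linalg.norm(A, 2))
    bs.append(rng.normal(size=d))

def g(x):
    # The recurrent network g = g_n o ... o g_1 with g_i(x) = relu(W_i x + b_i).
    for W, b in zip(Ws, bs):
        x = np.maximum(W @ x + b, 0.0)
    return x

def iterate(x, k=200):
    # Picard iteration: since g is a contraction, the Banach fixed point
    # theorem guarantees strong convergence to the unique fixed point.
    for _ in range(k):
        x = g(x)
    return x

x_star = iterate(rng.normal(size=d))
y_star = iterate(np.zeros(d))  # a different starting point, same limit
```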

###### Corollary 1.

In particular, if

 ∥W∥<1, (16)

then F is a singleton.

The next proposition provides a bound on the norm of the unique element of F.

###### Proposition 3.

Let W satisfy condition (16). Then, the unique element xF of F (cf. Proposition 2) is such that

 ∥xF∥≤∥b∥1−∥W∥. (17)

In particular, if no bias terms are used in the neural network (5), i.e., b = 0, then xF = 0.

###### Proof.

We note first that xF is the unique solution of the system (12), since condition (16) implies condition (15), and in such a case

 b+(W∘S)xF∈xF+∂ϕ(xF)=(I+∂ϕ)(xF). (18)

Thus, from [2, Proposition 16.44] one has

 xF=proxϕ(W∘SxF+b). (19)

Hence, using the fact that ϕ has a minimum at 0 (cf. Definition 1), from [1, Proposition 2.21] one has in particular that for all x ∈ H

 ∥proxϕ(x)∥≤∥x∥. (20)

For x = (W∘S)xF + b, this implies that

 ∥xF∥ ≤ ∥(W∘S)xF+b∥ ≤ ∥(W∘S)xF∥+∥b∥ (21)
      ≤ ∥W∘S∥⋅∥xF∥+∥b∥ = ∥W∥⋅∥xF∥+∥b∥, (22)

where we have used the fact that ∥W∘S∥ = ∥W∥, since S is unitary. The inequality (17) follows. If b = 0, then ∥xF∥ = 0, and from Proposition 2 one concludes that in such a case G = {0}.
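The bound (17) can also be tested numerically on a toy network (our illustration; the sizes, weight scaling, and ReLU activation are assumptions). The block-diagonal W below satisfies (16), and the norm of the fixed-point trajectory xF stays below ∥b∥/(1−∥W∥).

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 3
Ws, bs = [], []
for _ in range(n):
    A = rng.normal(size=(d, d))
    Ws.append(0.7 * A / np.linalg.norm(A, 2))  # ||W_i|| = 0.7, so ||W|| < 1, i.e. (16)
    bs.append(rng.normal(size=d))

def g(x):
    for W, b in zip(Ws, bs):
        x = np.maximum(W @ x + b, 0.0)  # g_i(x) = relu(W_i x + b_i)
    return x

# Iterate to the unique fixed point of g, then record the trajectory across
# the layers: x_F = (x1, ..., xn) is the unique element of F in (7).
x = np.zeros(d)
for _ in range(300):
    x = g(x)
traj, z = [], x
for W, b in zip(Ws, bs):
    z = np.maximum(W @ z + b, 0.0)
    traj.append(z)
xF = np.concatenate(traj)

norm_W = max(np.linalg.norm(W, 2) for W in Ws)  # block-diagonal: max of block norms
bound = np.linalg.norm(np.concatenate(bs)) / (1.0 - norm_W)  # RHS of (17)
```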

The fact that a neural network converges to a single fixed point may not be desirable in certain applications. The following remark demonstrates that in such a case a more general network model must be considered.

###### Remark 4.

Consider the Hopfield neural network model given as follows:

 x′(t)=−Dx(t)+Wσ(x(t))+b, (23)

where x(t) is the state vector, D is a block diagonal matrix of self-inhibition of the neurons with positive entries, W is the matrix of connection weights, b is the bias vector, and σ, applied coordinate-wise, is a continuous activation function of the neural network.

Any equilibrium point x of the above network satisfies

 0=−Dx+Wσ(x)+b. (24)

Therefore,

 x=D−1(Wσ(x)+b). (25)

Denote z := σ(x) and suppose, as in Assumption 1, that σ = proxϕ for some ϕ. Since, by [2, Proposition 16.44], proxϕ = (I+∂ϕ)−1, we have x ∈ z + ∂ϕ(z), and thus

 D−1(Wz+b)∈z+∂ϕ(z). (26)

Thus,

 D−1b∈z−D−1Wz+∂ϕ(z), (27)

which is of the form (12). In particular, under appropriate assumptions on the operators D and W, one can achieve that ∥D−1W∥ < 1, which according to Proposition 2 leads to a situation where we have only one fixed point of the network (23). Hence, our model (and the one described in [1]) may not be adequate in this case, as Hopfield network learning relies on memorizing many distinct fixed points of the network.
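The collapse to a single equilibrium can be seen numerically. In the sketch below (a toy setup of ours; the dimension, the tanh activation, and the 0.8 scaling are assumptions), D and W are chosen so that ∥D−1W∥ < 1, and iterating (25) from several random initial states always yields the same equilibrium.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5
D = np.diag(rng.uniform(1.0, 2.0, size=N))  # positive self-inhibition terms
D_inv = np.linalg.inv(D)
A = rng.normal(size=(N, N))
W = 0.8 * A / np.linalg.norm(D_inv @ A, 2)  # forces ||D^{-1} W|| = 0.8 < 1
b = rng.normal(size=N)

def step(x):
    # Fixed points of this map are exactly the equilibria (25):
    # x = D^{-1} (W sigma(x) + b), here with sigma = tanh (1-Lipschitz).
    return D_inv @ (W @ np.tanh(x) + b)

# Since ||D^{-1} W|| < 1 and tanh is 1-Lipschitz, the map is a contraction:
# every trajectory collapses to one and the same equilibrium, so distinct
# stored patterns cannot coexist.
equilibria = []
for _ in range(4):
    x = rng.normal(size=N)
    for _ in range(500):
        x = step(x)
    equilibria.append(x)
```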