# Conditions for Stability and Convergence of Set-Valued Stochastic Approximations: Applications to Approximate Value and Fixed point Iterations

The main aim of this paper is the development of easily verifiable sufficient conditions for stability (almost sure boundedness) and convergence of stochastic approximation algorithms (SAAs) with set-valued mean-fields, a class of model-free algorithms that have become important in recent times. In this paper we provide a complete analysis of such algorithms under three different, yet related, sets of sufficient conditions, based on the existence of an associated global/local Lyapunov function. Unlike previous Lyapunov function based approaches, we provide a simple recipe for explicitly constructing the Lyapunov function needed for the analysis. Our work builds on the works of Abounadi, Bertsekas and Borkar (2002), Munos (2005), and Ramaswamy and Bhatnagar (2016). An important motivation for the flavor of our assumptions comes from the need to understand dynamic programming and reinforcement learning algorithms that use deep neural networks (DNNs) for function approximations and parameterizations. These algorithms are popularly known as deep learning algorithms. As an important application of our theory, we provide a complete analysis of the stochastic approximation counterpart of approximate value iteration (AVI), an important dynamic programming method designed to tackle Bellman's curse of dimensionality. Further, the assumptions involved are significantly weaker, easily verifiable and truly model-free. The theory presented in this paper is also used to develop and analyze the first SAA for finding fixed points of contractive set-valued maps.

## 1 Introduction

Stochastic approximation algorithms (SAAs) are an important class of iterative schemes that are used to solve problems arising in stochastic optimization, stochastic control, machine learning and financial mathematics, among others. SAAs constitute a powerful tool for solving such problems owing to their model-free approach. The first stochastic approximation algorithm was developed by Robbins and Monro [17] in 1951 to solve the root finding problem. Important contributions to modern stochastic approximations theory were made by Benaïm (1996) [3], Benaïm and Hirsch (1996) [4], Borkar (1997) [9], Borkar and Meyn (1999) [11] and Benaïm, Hofbauer and Sorin (2005) [5], to name a few.

An important aspect of the analysis of SAAs lies in verifying the almost sure boundedness (stability) of the iterates. This can be hard in many applications. In this paper we present easily verifiable sufficient conditions for both stability and convergence of SAAs with set-valued mean-fields. Specifically, we consider the following iterative scheme:

$$x_{n+1} = x_n + a(n)\,(y_n + M_{n+1}), \tag{1}$$

where $y_n \in H(x_n)$ for all $n \geq 0$; $\{a(n)\}_{n \geq 0}$ is a step-size sequence; $x_n \in \mathbb{R}^d$ for all $n \geq 0$; $H: \mathbb{R}^d \to \{\text{subsets of } \mathbb{R}^d\}$ is a Marchaud map; and $\{M_n\}_{n \geq 1}$ is the noise sequence. We present three different yet overlapping sets of (easily verifiable) conditions for stability (almost sure boundedness) and convergence (to a closed connected internally chain transitive invariant set of $\dot{x}(t) \in H(x(t))$) of (1). The reader is referred to Section 4 for the analysis.
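To make recursion (1) concrete, the following is a minimal Python sketch under assumptions that are not taken from the paper: the mean-field is the hypothetical set-valued map $H(x) = \{-x + e : |e_i| \le \text{eps}\}$ on $\mathbb{R}^2$, the element $y_n \in H(x_n)$ is picked at random, the step-size is $a(n) = 1/(n+1)$, and the noise is bounded, i.i.d. and zero-mean. It only illustrates the shape of the iteration, not the analysis.

```python
import random

def sample_from_H(x, eps, rng):
    """Return one element y of the hypothetical set H(x) = {-x + e : |e_i| <= eps}."""
    return [-xi + rng.uniform(-eps, eps) for xi in x]

def run_saa(x0, n_steps, eps=0.1, noise_bound=1.0, seed=0):
    """Iterate x_{n+1} = x_n + a(n) * (y_n + M_{n+1}) with y_n in H(x_n)."""
    rng = random.Random(seed)
    x = list(x0)
    max_norm = max(abs(xi) for xi in x)  # running sup-norm of the iterates
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)                        # assumed step-size sequence
        y = sample_from_H(x, eps, rng)             # y_n in H(x_n)
        m = [rng.uniform(-noise_bound, noise_bound) for _ in x]  # bounded noise M_{n+1}
        x = [xi + a_n * (yi + mi) for xi, yi, mi in zip(x, y, m)]
        max_norm = max(max_norm, max(abs(xi) for xi in x))
    return x, max_norm

x_final, max_norm = run_saa([5.0, -3.0], n_steps=20000)
print("final iterate:", x_final)
print("largest sup-norm seen along the run:", max_norm)
```

In this toy example the iterates stay bounded and settle near the box $[-\text{eps}, \text{eps}]^2$, which is the kind of behavior (stability, then convergence to an invariant set of the differential inclusion) that the conditions in this paper are meant to guarantee in general.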

The problem of stability for SAAs with set-valued mean-fields has been previously studied by Ramaswamy and Bhatnagar [16]. They developed the first set of sufficient conditions for the stability and convergence of (1) by extending the ideas of Borkar and Meyn [11]. Their sufficient conditions are based on the “limiting properties” of an associated scaled differential inclusion. On the contrary, the conditions presented in this paper are based on “local properties” of the associated differential inclusion. We believe that the stability criterion presented here is applicable to scenarios that are in some sense “orthogonal” to those that readily use the stability criterion of [16]. Our work contributes to the literature of SAAs with set-valued mean-fields by presenting the first set of Lyapunov function based sufficient conditions for stability and convergence of (1). These are also the only sets of sufficient conditions in the literature for stability and convergence of (1), after the ones presented by us in [16].

In this paper, we present Lyapunov function based stability conditions for the analyses of SAAs with set-valued mean-fields. An important motivation for doing this lies in the proliferation of dynamic programming and reinforcement learning methods based on value iteration and policy gradients that use deep neural networks (DNNs) for function approximations and parameterizations, respectively. The use of DNNs often causes the algorithms to “blow up” in finite time and find sub-optimal solutions. In this paper we present sufficient conditions which guarantee that no finite-time blow-up occurs. Further, these conditions guarantee that the suboptimal solutions found are very close to the optimal solution. The reader is referred to Sections 7 and 8 for details. Our work builds on the work of Abounadi, Bertsekas and Borkar [1] as well as Ramaswamy and Bhatnagar [16]. Our stability criterion is dependent in part on the possibility of comparing various instances of an algorithm being analyzed. For the exact nature of this comparison, the reader is referred to assumption (A5) of Section 3. As stated earlier, we present three sets of assumptions that are qualitatively different yet overlapping, for stability and convergence. As a consequence, the framework developed herein is semantically rich enough to cover a multitude of scenarios encountered in reinforcement learning, stochastic optimization and other applications.

We answer the following important question: Does the mere existence of a Lyapunov function for $\dot{x}(t) \in H(x(t))$ imply the almost sure boundedness (stability) of (1)? We will show that the existence of a global/local Lyapunov function allows us to construct an “inward directing set”, see Proposition 2 for details. We then use this inward directing set to develop a partner projective scheme (to (1)). This scheme is shown to converge to a point inside the previously constructed inward directing set. In order to show the stability of (1), we compare it to the aforementioned partner projective scheme. The exact nature of this comparison is outlined in (A5). It is imperative that (1) and its partner projective scheme are comparable. In other words, it seems that the mere existence of a Lyapunov function is insufficient to ensure stability. Additional assumptions such as (A5) of Section 3 are needed for this purpose. We demonstrate the verifiability of our assumptions by using our framework to comprehensively analyze two important problems: (i) approximate value iteration methods with possibly biased approximation errors and (ii) SAAs for finding fixed points of contractive set-valued maps. It is worth noting that our analysis of approximate value iteration methods does not distinguish between biased and unbiased approximation errors.

In Section 7, as an application of our main results, we present a complete analysis of approximate value iteration methods, an important class of dynamic programming algorithms, under a significantly weaker set of assumptions. Value iteration is an important dynamic programming method used to numerically compute the optimal value function of a Markov decision process (MDP). However, it is well known that for many important applications, it suffers from Bellman’s curse of dimensionality. Approximate Value Iteration (AVI) methods endeavor to address Bellman’s curse of dimensionality by introducing an approximation operator that allows for approximations at every step of the classical value iteration method. If the approximation errors are allowed to be unbounded then the algorithm may not converge, see [7] for details. AVIs with bounded approximation errors have been previously studied in [7, 12, 14]. Bertsekas and Tsitsiklis [7] studied scenarios wherein the approximation errors are uniformly bounded over all states. Munos [14] extended the analysis of [7], allowing for approximation errors that are bounded in the weighted p-norm sense, for the infinite horizon discounted cost problem. However, the convergence analysis of [14] requires that the transition probabilities or future state distributions be “smooth”. For a detailed comparison of our results concerning AVI methods to those already present in the literature, see Section 7.2.

An important contribution of this paper is in providing a convergence analysis of AVIs without the aforementioned restriction on transition probabilities or future distributions (cf. Section 7). Our analysis encompasses both the stochastic shortest path and the discounted cost infinite horizon problems. When analyzing stochastic iterative AVIs (see (12) in Section 7 for details on stochastic iterative AVIs), stability (almost sure boundedness) of the iterates is normally assumed to hold. As stated before, stability is a hard assumption to verify. Further, it is unclear if the introduction of an approximation operator leads to unstable iterates. Thus, an important contribution of this paper is in showing stability of stochastic iterative AVIs, under weak, verifiable conditions. In Section 7, it is shown that a stochastic iterative AVI converges to a possibly suboptimal cost-to-go vector which belongs to a “small neighborhood” of the optimal vector. Further it is shown that the size of this neighborhood is directly proportional to the magnitude of the approximation errors, see Theorems 3 and 4 for details.
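The qualitative claim above (bounded approximation errors lead to a limit in a neighborhood of the optimal vector whose size scales with the error) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the two-state, two-action MDP, the discount factor `ALPHA`, the update rule $J_{n+1} = J_n + a(n)\,(TJ_n + e_n - J_n + M_{n+1})$ with a biased error $e_n \in [0, \text{eps}]$, the step-size $a(n) = (n+1)^{-0.6}$ and the bounded noise. It is not the scheme (12) of Section 7, which is not reproduced here.

```python
import random

ALPHA = 0.9  # discount factor (assumed)
# Hypothetical MDP: COST[s][a] and P[a][s][s_next]
COST = [[1.0, 2.0], [0.5, 3.0]]
P = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.1, 0.9], [0.6, 0.4]],
]

def bellman(J):
    """(T J)(s) = min_a [ COST[s][a] + ALPHA * sum_t P[a][s][t] * J[t] ]."""
    return [
        min(COST[s][a] + ALPHA * sum(P[a][s][t] * J[t] for t in range(2))
            for a in range(2))
        for s in range(2)
    ]

def value_iteration(n_iter=2000):
    """Classical value iteration, used here only to obtain a reference J*."""
    J = [0.0, 0.0]
    for _ in range(n_iter):
        J = bellman(J)
    return J

def stochastic_avi(n_steps=20000, eps=0.05, noise_bound=0.2, seed=0):
    """J_{n+1} = J_n + a(n) * (T J_n + e_n - J_n + M_{n+1}) with 0 <= e_n <= eps."""
    rng = random.Random(seed)
    J = [0.0, 0.0]
    for n in range(n_steps):
        a_n = 1.0 / (n + 1) ** 0.6              # assumed step-size sequence
        TJ = bellman(J)
        J = [
            J[s] + a_n * (TJ[s]
                          + rng.uniform(0.0, eps)                   # biased approximation error
                          - J[s]
                          + rng.uniform(-noise_bound, noise_bound))  # bounded zero-mean noise
            for s in range(2)
        ]
    return J

J_star = value_iteration()
J_hat = stochastic_avi()
gap = max(abs(a - b) for a, b in zip(J_hat, J_star))
print("J* =", J_star, " J_hat =", J_hat, " sup-norm gap =", gap)
```

For this toy problem the gap should be of the order of eps / (1 - ALPHA), and halving `eps` roughly halves it, which is consistent with the proportionality described in the text.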

Thus, in Section 7 we provide a complete analysis (stability and convergence) of general AVI methods under a weak, easily verifiable set of sufficient conditions. We eliminate all previous restrictions on the “smoothness” of transition probabilities and future distributions. We also allow for more general “operational noise” as compared to previous literature. An important aspect of our analysis is that it encompasses both stochastic shortest path and infinite horizon discounted cost problems. We provide a unified analysis for stability and convergence of AVI methods wherein the approximation errors are bounded with respect to multiple norms. Finally, we believe that the theory developed herein is useful in providing the theoretical foundation for understanding reinforcement learning and dynamic programming algorithms that use deep neural networks (DNNs), an area that has garnered significant interest recently.

In Section 8, as another important application of our framework, we develop and analyze, for the first time, general SAAs for finding fixed points of set-valued maps. Fixed point theory is an active area of research due to its applications in a multitude of disciplines. Recently, the principle of dynamic programming (DP) was generalized by Bertsekas [6] to solve problems which, previously, could not be solved using classical DP. This extension involved a new abstract definition of the Bellman operator. The theory thus developed is called Abstract Dynamic Programming. An integral component of this new theory involves showing that the solution to the abstract Bellman operator is its fixed point. We believe that the results of Section 8 are helpful in solving problems that can be formulated as an Abstract Dynamic Program. Our contribution on this front is in analyzing stochastic approximation algorithms for finding fixed points of contractive set-valued maps, see Section 8 for details. As mentioned before, we show that such algorithms are bounded almost surely and that they converge to a sample path dependent fixed point of the set-valued map under consideration. To the best of our knowledge ours is the first SAA, complete with analysis, for finding fixed points of set-valued maps.

### 1.1 Organization of this paper

In Section 2 we list the definitions and notations used in this paper. In Section 3 we present three sets of sufficient conditions for stability and convergence (to a closed connected internally chain transitive invariant set of the associated DI) of SAAs with set-valued mean-fields. Through Sections 4, 5 and 6 we present our main results, Theorems 1 and 2. In Section 7 we analyze stochastic iterative AVI methods, see Theorems 3 and 4. In Section 8 we develop and analyze an SAA for finding fixed points of contractive set-valued maps, see Theorem 5 for the main result here. A detailed discussion on assumption (A5), which is crucial to our analysis, is provided in Section 9. Finally, Section 10 provides the concluding remarks.

## 2 Definitions and Notations

The definitions and notations encountered in this paper are listed in this section.

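• [Upper-semicontinuous map] We say that $H$ is upper-semicontinuous, if given sequences $\{x_n\}_{n \geq 1}$ (in $\mathbb{R}^n$) and $\{y_n\}_{n \geq 1}$ (in $\mathbb{R}^m$) with $x_n \to x$, $y_n \to y$ and $y_n \in H(x_n)$, $n \geq 1$, then $y \in H(x)$.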

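• [Marchaud Map] A set-valued map $H: \mathbb{R}^n \to \{\text{subsets of } \mathbb{R}^m\}$ is called Marchaud if it satisfies the following properties: (i) for each $x \in \mathbb{R}^n$, $H(x)$ is convex and compact; (ii) (point-wise boundedness) for each $x \in \mathbb{R}^n$, $\sup_{w \in H(x)} \|w\| < K(1 + \|x\|)$ for some $K > 0$; (iii) $H$ is upper-semicontinuous.
Let $H$ be a Marchaud map on $\mathbb{R}^d$. The differential inclusion (DI) given by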

$$\dot{x} \in H(x), \tag{2}$$

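is guaranteed to have at least one solution that is absolutely continuous. The reader is referred to [2] for more details. We say that $\mathbf{x} \in \Sigma$ if $\mathbf{x}$ is an absolutely continuous map that satisfies (2). The set-valued semiflow $\Phi$ associated with (2) is defined on $[0, +\infty) \times \mathbb{R}^d$ as:
$\Phi_t(x) = \{\mathbf{x}(t) \mid \mathbf{x} \in \Sigma,\ \mathbf{x}(0) = x\}$. Let $B \times M \subset [0, +\infty) \times \mathbb{R}^d$ and define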

$$\Phi_B(M) = \bigcup_{t \in B,\ x \in M} \Phi_t(x).$$
• [Limit set of a solution & -limit-set] The limit set of a solution x with is given by . Let , the -limit-set be defined by

• [Invariant set] is invariant if for every there exists a trajectory, , entirely in with , , for all . Note that the definition of invariant set used in this paper, is the same as that of positive invariant set used in [5] and [10].

• [Open and closed neighborhoods of a set] Let and , then . We define the -open neighborhood of by . The -closed neighborhood of is defined by . The open ball of radius around the origin is represented by , while the closed ball is represented by .

• [Internally chain transitive set] is said to be internally chain transitive if is compact and for every , and we have the following: There exists and that are solutions to the differential inclusion , points and real numbers greater than such that: and for . The sequence is called an chain in from to .

• [Attracting set & fundamental neighborhood] is attracting if it is compact and there exists a neighborhood such that for any , with . Such a is called the fundamental neighborhood of . In addition to being compact if the attracting set is also invariant then it is called an attractor. The basin of attraction of is given by .

• [Global attractor] If the basin of a given attractor is the whole space, then the attractor is called global attractor.

• [Globally asymptotically stable equilibrium point] A point is an equilibrium point, if . Further, it is globally asymptotically stable if it is a global attractor. This notion is readily extensible to sets.

• [Lyapunov stable] The above set is Lyapunov stable if for all , such that .

## 3 Assumptions

Consider the following iteration in :

$$x_{n+1} = x_n + a(n)\,[y_n + M_{n+1}], \tag{3}$$

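where $y_n \in H(x_n)$ for all $n \geq 0$ with $H: \mathbb{R}^d \to \{\text{subsets of } \mathbb{R}^d\}$, $\{a(n)\}_{n \geq 0}$ is the given step-size sequence and $\{M_{n+1}\}_{n \geq 0}$ is the given noise sequence.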

We make the following assumptions. (Note that assumption (A4a) presents a set of Lyapunov conditions on the associated DI. Two more sets of alternative conditions, viz., (A4b) and (A4c), will also be presented subsequently in this section.)

• is a Marchaud map. For all , where is the given Marchaud constant.

• The step-size sequence is such that , and .

• For all , where is some constant.

• for all , where .

• Associated with the differential inclusion (DI) is a compact set , a bounded open neighborhood and a function such that

• i.e., is strongly positively invariant.

• .

• is a continuous function such that for all and we have , for any .

Note that it follows from Proposition 3.25 of Benaïm, Hofbauer and Sorin [5] that contains a Lyapunov stable attracting set. Further there exists an attractor contained in whose basin of attraction contains , with as fundamental neighborhoods for small values of .

Note that assumptions on the noise, , will be weakened to include more general noise sequences later, see Section 6. The reader is referred to Section 9 for more detailed discussions on this assumption.
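As a small numerical illustration of the step-size assumption, the sketch below assumes that it takes the usual form for stochastic approximation: positive steps whose sum diverges while the sum of squares stays finite. The two candidate sequences, `harmonic` and `inv_sqrt`, are illustrative choices and are not taken from the paper. Finite partial sums cannot prove divergence or convergence; they only suggest the trend.

```python
import math

def partial_sums(a, n_terms):
    """Return (sum of a(n), sum of a(n)^2) over n = 0, ..., n_terms - 1."""
    s1 = 0.0
    s2 = 0.0
    for n in range(n_terms):
        s1 += a(n)
        s2 += a(n) ** 2
    return s1, s2

def harmonic(n):
    """a(n) = 1 / (n + 1): sum diverges, sum of squares converges."""
    return 1.0 / (n + 1)

def inv_sqrt(n):
    """a(n) = 1 / sqrt(n + 1): sum diverges, but so does the sum of squares."""
    return 1.0 / math.sqrt(n + 1)

for n_terms in (10**3, 10**4, 10**5):
    print(n_terms, partial_sums(harmonic, n_terms), partial_sums(inv_sqrt, n_terms))
```

The partial sums of squares for `harmonic` level off below $\pi^2/6$, while those for `inv_sqrt` keep growing, so only the former is a candidate under the assumed conditions.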

We define open sets and such that the following conditions are satisfied: (a) (b) and (c) is a fundamental neighborhood of the attractor contained in . Since is continuous, it follows that is open relative to , for all . Further, assumption implies that and that for small values of . It may be noted that the closure of is . Let and , where . Here is chosen small enough to ensure that and the above conditions (b) and (c) hold. Note that condition (a) is automatically satisfied since .

Our analysis will carry forward even under a slight weakening of . We call this weakening as . It may be noted that implies . We present both, since one may be easier to verify as compared to the other, depending on the problem at hand.

• Associated with is a compact set , a bounded open neighborhood and a function such that

• i.e., is strongly positively invariant.

• .

• is an upper semicontinuous function such that for all and we have , where .

• is bounded on . In other words, .

The difference between statements and contributes to the qualitative difference between assumptions and . It follows from Proposition 3.25 of Benaïm, Hofbauer and Sorin [5] that contains an attractor set whose basin of attraction contains . As in the case of , we define open sets and satisfying the above stated (a) and (b). But first, we prove that sets of the form are open relative to , as expected.

###### Proposition 1.

For any , the set is open relative to . Further, .

###### Proof.

Suppose is not open relative to , then there exists such that , as , with for every and . It follows from the boundedness of , i.e., (A4b)(iv) that there exists such that as for some . Since , it follows from the upper semicontinuity of that . Since and , we get a contradiction.

To prove the second part of the proposition, it is enough to show that for every . Since is open relative to and , we have . It is left to show that . Since , there exists such that . It follows from the upper semicontinuity and the boundedness of that . ∎

As in the case of , we have and for small values of . We are now ready to define and . Define such that , possible for small values of . Further, choose such that is open and . This is possible since is compact and is open.

###### Remark 1.

Let us suppose we are given a differential inclusion , an associated attractor set and a strongly positive invariant neighborhood of . We define an upper-semicontinuous Lyapunov function , as found in Remark 3.26, Section 3.8 of [5]. In other words, given by , where is an increasing function such that for all . We claim that satisfies . To verify this claim we consider the following. is strongly positive invariant and for , hence . It follows from the upper semicontinuity of that i.e., (A4b)(iv) is satisfied. It is left to show that is also satisfied. Fix and . It follows from the definition of a semi-flow that for any , where . Further,

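$$V(x) \geq \max\{\, d(z, A)\, g(t+s) \mid z \in \Phi_s(y),\ s \geq 0 \,\} \quad \text{and}$$
$$\max\{\, d(z, A)\, g(t+s) \mid z \in \Phi_s(y),\ s \geq 0 \,\} > \max\{\, d(z, A)\, g(s) \mid z \in \Phi_s(y),\ s \geq 0 \,\}.$$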

The RHS of the above inequality is $V(y)$, i.e., $V(x) > V(y)$.
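The construction in Remark 1 can be tried out numerically. The sketch below assumes the form $V(x) = \max\{\, d(y, A)\, g(t) \mid y \in \Phi_t(x),\ t \geq 0 \,\}$ (consistent with the two inequalities above) and specializes it to a single-valued toy system that is not from the paper: $\dot{x} = -x$ on the real line with attractor $A = \{0\}$ and the hypothetical weight $g(t) = 2 - e^{-3t}$, which is increasing and bounded between positive constants. The maximum over $t$ is approximated on a finite grid.

```python
import math

def g(t):
    """Increasing weight with 1 <= g(t) < 2 (an assumed choice of g)."""
    return 2.0 - math.exp(-3.0 * t)

def flow(x, t):
    """Solution at time t of the toy system x' = -x started at x."""
    return x * math.exp(-t)

def V(x, horizon=20.0, n_grid=20001):
    """Grid approximation of V(x) = max{ d(Phi_t(x), A) * g(t) : t >= 0 }, A = {0}."""
    best = 0.0
    for i in range(n_grid):
        t = horizon * i / (n_grid - 1)
        best = max(best, abs(flow(x, t)) * g(t))   # d(y, A) = |y| since A = {0}
    return best

x0 = 2.0
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, V(flow(x0, t)))   # values decrease along the trajectory
```

In this example $V$ vanishes exactly on $A$ and strictly decreases along the flow, which are the two properties the remark verifies in general.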

We consider one final alternative to and below.

• is the global attractor of .

• is an upper semicontinuous function such that for all , and .

• for all , and .

As with and , we define open sets and satisfying conditions (a) and (b), see below the statement of . Recall that . Define and for appropriate and satisfying . Suppose we are unable to find such an then we may choose to be any open set satisfying the required properties once is fixed as mentioned before.

A classical way to ensure stability of (3) is by running its associated projective scheme given by:

$$\hat{x}_{n+1} = \sqcap_{B,C}\big(\hat{x}_n + a(n)\,[y_n + M_{n+1}]\big), \tag{4}$$

where and is the projection operator that projects onto set , when the operand escapes from set . Clearly, the martingale noise sequences are identical in (3) and (4). The advantage of running an associated projective scheme, such as (4), is that stability of the algorithm is ensured provided is compact. However, the main drawback to this approach is that convergence properties of (4) are dependent on the choice of and . If the limiting behavior of the algorithm is known a priori, then can be chosen to contain the limiting set. However, this is not always possible. In this paper, we show that assumption facilitates a comparison between our algorithm and its projective counterpart. The form of comparison is captured in assumption, , presented below.

• There is a sample path dependent integer such that a.s., where is generated by (3) and is generated by (4).

It may be noted that we run a hypothetical projective scheme such as (4), merely to prove the stability of (3). Assumption facilitates in finding the sets and for this hypothetical scheme. If one arbitrarily chooses the aforementioned sets, then the algorithm and its projective counterpart may not be comparable (in the sense of ). The key assumption that ensures stability is , typically follows. The reader is referred to Section 9 for more details.

###### Remark 2.

In Remark 1, we explicitly constructed a local Lyapunov function satisfying . Similarly, here, we construct a global Lyapunov function satisfying . Define the function as , where is defined in Remark 1. This Lyapunov function, , satisfies . The proof is similar to the one found in Remark 1.

Inward directing sets: Given a differential inclusion , an open set is said to be an inward directing set with respect to the aforementioned differential inclusion, if , , whenever . Clearly inward directing sets are invariant. It also follows that any solution starting at the boundary of is “directed inwards”, into .

###### Proposition 2.

The open sets , and , constructed in accordance to assumptions , and respectively, are inward directing sets with respect to .

###### Proof.

Recall that the set is constructed such that for every and for every . Since , it follows from that , where , and . In other words, and for and . It is left to show that for and . This follows directly from the observation that for every and . and can be similarly shown to be inward directing. ∎

In what follows, we use assumptions -, and the existence of an inward directing set with respect to the associated DI, to prove the stability of (3). As a consequence of Proposition 2, we may verify one among , and to ensure the existence of such an inward directing set. It may be noted that these assumptions are qualitatively different. However, their primary role is to help us find one of the aforementioned inward directing sets. Depending on the nature of the iteration being analyzed, it may be easier to verify one or the others.

## 4 Analysis of the projective scheme

We begin this section with a minor note on notations. Since the roles of , and are indistinguishable, we shall refer to them generically as . In a similar manner, and are generically referred to as and respectively. Note that . We also define the projection map, , as follows:

$$\sqcap_{B,C}(x) := \begin{cases} \{x\}, & \text{if } x \in C, \\ \{\, y \mid d(y, x) = d(x, \overline{B}),\ y \in \overline{B} \,\}, & \text{otherwise.} \end{cases}$$
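A minimal Python sketch of this projection map and of one step of a projective scheme such as (4) follows. It assumes, purely for illustration, that $B$ and $C$ are open balls around the origin with radii `r_b < r_c`; in the paper these sets are built from the Lyapunov function instead. For a ball the nearest point is unique, so the set-valued projection reduces to a single point.

```python
import math

def project(x, r_b, r_c):
    """Projection map for the assumed balls B = B_{r_b}(0) and C = B_{r_c}(0), r_b < r_c.

    Returns x unchanged while x lies in C; otherwise returns the (here unique)
    point of the closed ball of radius r_b that is nearest to x.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm < r_c:
        return list(x)
    return [r_b * v / norm for v in x]

def projective_step(x, y, m, a_n, r_b, r_c):
    """One projected update: x_tilde = x + a_n * (y + m), followed by the projection."""
    x_tilde = [xi + a_n * (yi + mi) for xi, yi, mi in zip(x, y, m)]
    return project(x_tilde, r_b, r_c)

print(project([0.5, 0.0], 1.0, 2.0))   # inside C, left untouched
print(project([3.0, 4.0], 1.0, 2.0))   # outside C, pulled back onto the closure of B
```

Note the two-set design: the iterate is only reset when it leaves the larger set $C$, and it is then sent back to the smaller set $\overline{B}$ rather than to the boundary of $C$.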

As in [1], in order to prove stability, (3) is compared to the following projective scheme.

$$\tilde{x}_{n+1} = x_n + a(n)\,[y_n + M_{n+1}], \qquad x_{n+1} = z_n, \ \text{where } z_n \in \sqcap_{B,C}(\tilde{x}_{n+1}), \tag{5}$$

where and , with . Note that the initial point is first projected before starting the projective scheme. The above equation can be rewritten as

$$x_{n+1} = x_n + a(n)\,[y_n + M_{n+1}] + g_n, \tag{6}$$

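where $g_n := z_n - \tilde{x}_{n+1}$. Let us construct a linearly interpolated trajectory using (6). We begin by dividing $[0, \infty)$ into diminishing intervals using the step-size sequence. Let $t_0 := 0$ and $t_n := \sum_{m=0}^{n-1} a(m)$ for $n \geq 1$. The linearly interpolated trajectory $X_l(t)$, $t \geq 0$, is defined as follows: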

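$$X_l(t) := \begin{cases} x_n, & \text{for } t = t_n, \\ \left(1 - \dfrac{t - t_n}{a(n)}\right) x_n + \left(\dfrac{t - t_n}{a(n)}\right) \tilde{x}_{n+1}, & \text{for } t \in [t_n, t_{n+1}). \end{cases} \tag{7}$$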

The above constructed trajectory is right continuous with left-hand limits, i.e., and exist. Further the jumps occur exactly at those ’s for which the corresponding ’s are non-zero. We also define three piece-wise constant trajectories , and as follows: , and for . The trajectories , and are also right continuous with left-hand limits. We define a linearly interpolated trajectory associated with as follows:

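$$W_l(t) := \begin{cases} \displaystyle\sum_{m=0}^{n-1} a(m)\, M_{m+1}, & \text{for } t = t_n, \\ \left(1 - \dfrac{t - t_n}{a(n)}\right) W_l(t_n) + \left(\dfrac{t - t_n}{a(n)}\right) W_l(t_{n+1}), & \text{for } t \in [t_n, t_{n+1}). \end{cases}$$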
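The two interpolated trajectories can be evaluated with a few lines of Python. The sketch below is scalar-valued for readability and assumes the time grid $t_0 = 0$, $t_n = a(0) + \dots + a(n-1)$; the helper names `time_grid` and `interpolate` and all sample numbers are hypothetical. On $[t_n, t_{n+1})$ the trajectory moves linearly from a left value (for example $x_n$) towards a right value (for example $\tilde{x}_{n+1}$), so a jump appears at $t_{n+1}$ whenever the projection changes the iterate.

```python
import bisect

def time_grid(steps):
    """Return [t_0, t_1, ...] with t_0 = 0 and t_n = a(0) + ... + a(n-1)."""
    t = [0.0]
    for a in steps:
        t.append(t[-1] + a)
    return t

def interpolate(t_grid, left_vals, right_vals, t):
    """Value at time t of the trajectory that equals left_vals[n] at t_n and
    moves linearly towards right_vals[n] as t approaches t_{n+1}."""
    n = bisect.bisect_right(t_grid, t) - 1     # index with t_n <= t < t_{n+1}
    n = min(n, len(left_vals) - 1)             # clamp at the final grid point
    lam = (t - t_grid[n]) / (t_grid[n + 1] - t_grid[n])
    return (1.0 - lam) * left_vals[n] + lam * right_vals[n]

steps = [1.0, 0.5, 0.25]          # a(0), a(1), a(2)
x = [0.0, 2.0, 1.0]               # x_0, x_1, x_2 (x_2 was changed by a projection)
x_tilde = [2.0, 3.0, 1.5]         # x~_1, x~_2, x~_3
noise = [0.4, -0.2, 0.1]          # M_1, M_2, M_3

t_grid = time_grid(steps)
w = [0.0]                          # W_l(t_n) = sum_{m < n} a(m) * M_{m+1}
for a, m in zip(steps, noise):
    w.append(w[-1] + a * m)

print(interpolate(t_grid, x, x_tilde, 1.25))      # X_l(1.25)
print(interpolate(t_grid, w[:-1], w[1:], 1.25))   # W_l(1.25)
```

Here $X_l$ has a jump at $t_2 = 1.5$, because $\tilde{x}_2 = 3$ while $x_2 = 1$, whereas $W_l$ is continuous by construction.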

We define a few “left-shifted trajectories” using the above constructed trajectories. For ,

$$\begin{aligned} X^n_l(t) &:= X_l(t_n + t), \\ X^n_c(t) &:= X_c(t_n + t), \\ Y^n_c(t) &:= Y_c(t_n + t), \\ G^n_c(t) &:= G_c(t_n + t) - G_c(t_n), \\ W^n_l(t) &:= W_l(t_n + t) - W_l(t_n). \end{aligned}$$

for .

###### Proof.

Fix for some . We have the following

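$$\begin{aligned} X_l(s) &= \left(1 - \frac{s - t_m}{a(m)}\right) x_m + \left(\frac{s - t_m}{a(m)}\right) \tilde{x}_{m+1}, \\ X_l(s) &= \left(1 - \frac{s - t_m}{a(m)}\right) x_m + \left(\frac{s - t_m}{a(m)}\right) \big(x_m + a(m)\,[y_m + M_{m+1}]\big), \\ X_l(s) &= x_m + (s - t_m)\,[y_m + M_{m+1}]. \end{aligned}$$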

Let us express in the form of the above equation. Note that for some . Then we have the following:

$$X^n_l(t) = x_{n+k} + (t_n + t - t_{n+k})\,[y_{n+k} + M_{n+k+1}].$$

Unfolding , in the above equation till , yields:

$$X^n_l(t) = X^n_l(0) + \sum_{l=n}^{n+k-1} \big(a(l)\,[y_l + M_{l+1}] + g_l\big) + (t_n + t - t_{n+k})\,[y_{n+k} + M_{n+k+1}]. \tag{8}$$

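We make the following observations:
$G^n_c(t) = \sum_{l=n}^{n+k-1} g_l$,
$W^n_l(t) = \sum_{l=n}^{n+k-1} a(l)\, M_{l+1} + (t_n + t - t_{n+k})\, M_{n+k+1}$,
and
$\int_0^t Y^n_c(\tau)\, d\tau = \sum_{l=n}^{n+k-1} a(l)\, y_l + (t_n + t - t_{n+k})\, y_{n+k}$.
As a consequence of the above observations, (8) becomes: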

$$X^n_l(t) = X^n_l(0) + \int_0^t Y^n_c(\tau)\, d\tau + W^n_l(t) + G^n_c(t).$$

Fix . If and are viewed as subsets of equipped with the Skorohod topology, then we may use the Arzela-Ascoli theorem for to show that they are relatively compact, see Billingsley [8] for details. The Arzela-Ascoli theorem for states the following: A set , is relatively compact if and only if the following conditions are satisfied:

• ,

• ,

• and

• .

If and are point-wise bounded and any two of their discontinuities are separated by at least , for some fixed , then the above four conditions will be satisfied, see [8] for details.

###### Lemma 2.

and are relatively compact in equipped with the Skorohod topology.

###### Proof.

Recall from that , . Since is Marchaud, it follows that for some and that . Further, for some constant that is independent of . In other words, we have that the sequences and are point-wise bounded. It remains to show that any two discontinuities are separated. Let and . Clearly . Define

 m(n)=max{j>0 ∣ n+j∑k=na(k)

If there is a jump at , then . It follows from the definition of that for . In other words, there are no discontinuities in the interval and . If we fix , then any two discontinuities are separated by at least . ∎

Since is arbitrary, it follows that and are relatively compact in . Since is point-wise bounded (assumption ) and continuous, it is relatively compact in . It follows from that any limit of , in , is the constant function. Suppose we consider sub-sequences of and