# Training GANs with Centripetal Acceleration

Training generative adversarial networks (GANs) often suffers from cyclic behaviors of iterates. Based on a simple intuition that the direction of centripetal acceleration of an object moving in uniform circular motion is toward the center of the circle, we present the Simultaneous Centripetal Acceleration (SCA) method and the Alternating Centripetal Acceleration (ACA) method to alleviate the cyclic behaviors. Under suitable conditions, gradient descent methods with either SCA or ACA are shown to be linearly convergent for bilinear games. Numerical experiments are conducted by applying ACA to existing gradient-based algorithms in a GAN setup scenario, which demonstrate the superiority of ACA.


## 1 Introduction

Generative Adversarial Nets (GANs)[7] are recognized as powerful generative models, which have been successfully applied to various fields such as image generation[8] and representation learning[15, 17]. The idea behind GANs is an adversarial game between a generator network (G-net) and a discriminator network (D-net). The G-net attempts to generate synthetic data from some noise to deceive the D-net, while the D-net tries to discern between the synthetic data and the real data. The original GANs can be formulated as the min-max problem:

 minG maxD V(G,D) = Ex∼pdata[log D(x)] + Ez∼pz[log(1 − D(G(z)))]. (1.1)

Though GANs are appealing, they are often hard to train. The main difficulty might be that the associated gradient vector field rotates around a Nash equilibrium due to the existence of imaginary components in the Jacobian eigenvalues[11], which results in oscillatory limit behaviors. There is a series of studies focusing on developing fast and stable methods of training GANs. Using the Jacobian, consensus optimization[11] diverts gradient updates to the descent direction of the field magnitudes. More essentially, a differential game can always be decomposed into a potential game and a Hamiltonian game[1]. Potential games have been intensively studied [13] because gradient descent methods converge in these games. Hamiltonian games obey a conservation law such that iterates generated by gradient descent are likely to cycle or even diverge in these games. Therefore, Hamiltonian components might be the cause of cycling when gradient descent methods are applied. Based on these observations, the Symplectic Gradient Adjustment (SGA) method [1] modifies the associated vector field to guide the iterates to cross the curl of the Hamiltonian component of a differential game. [4] also uses a similar technique to cross the curl such that rotations are alleviated. By augmenting the Follow-the-Regularized-Leader algorithm[16] with an optimistic predictor of the next-iteration gradient, Optimistic Mirror Descent (OMD) methods are presented in [3] and analysed in [5, 10, 9, 12]. Negative momentum is employed in [6] to deplete the kinetic energy of the cyclic motion so that iterates fall towards the center. It is also observed in [6] that the alternating version of the negative momentum method is more stable.

Our idea is motivated by two aspects. Firstly and intuitively, we use the fact that the direction of centripetal acceleration of an object moving in uniform circular motion points to the center of the circle, which might guide iterates to cross the curl and escape from cycling traps. Secondly, we try to find a method that approximates the dynamics of consensus optimization or SGA to cross the curl without computing the Jacobian, which reduces computational costs. These considerations lead us to the centripetal acceleration methods, which can be used to adjust gradients in various methods such as SGD, RMSProp[18] and Adam[2]. For stability and effectiveness, we are also motivated by [6] to study the alternating scheme, which works even in a notorious GAN setup scenario.

The main contributions are as follows:

1. From two different perspectives, we present centripetal acceleration methods to alleviate the cyclic behaviors in training GANs. Specifically, we propose the Simultaneous Centripetal Acceleration (SCA) method and the Alternating Centripetal Acceleration (ACA) method.

2. For bilinear games, which are purely adversarial, we prove that gradient descent with either SCA or ACA is linearly convergent under suitable conditions.

3. Preliminary numerical simulations are conducted in a GAN setup scenario, which show that centripetal acceleration is useful when combined with several gradient-based algorithms.

Outline. The rest of the paper is organized as follows. In Section 2, we present simultaneous and alternating centripetal acceleration methods and discuss them with closely related works. In Section 3, focusing on bilinear games, we prove the linear convergence of gradient descent combined with the two centripetal acceleration methods. In Section 4, we conduct numerical experiments to test the effectiveness of centripetal acceleration methods. Section 5 concludes the paper.

## 2 Centripetal Acceleration Methods

A differentiable two-player game involves two loss functions l1(θ,ϕ) and l2(θ,ϕ) defined over a parameter space Rd×Rp. Player 1 tries to minimize the loss l1 while player 2 attempts to minimize the loss l2. The goal is to find a local Nash equilibrium of the game, i.e. a pair (¯θ,¯ϕ) with the following two conditions holding in a neighborhood of (¯θ,¯ϕ):

 ¯θ∈argminθl1(θ,¯ϕ),    ¯ϕ∈argminϕl2(¯θ,ϕ).

The derivation of problem (1.1) leads to a two-player game. The G-net is parameterized as G(z;θ) while the D-net is parameterized as D(x;ϕ). Then the problem becomes to find a local Nash equilibrium:

 minθ maxϕ V(θ,ϕ), (2.1)

where

 V(θ,ϕ)=Ex∼pdata[logD(x;ϕ)]+Ez∼pz(z)[log(1−D(G(z;θ);ϕ))]. (2.2)

The simultaneous gradient descent method in training GANs [14] is

 θt+1=θt−α∇θV(θt,ϕt),    ϕt+1=ϕt+α∇ϕV(θt,ϕt).

The alternating version is

 θt+1=θt−α∇θV(θt,ϕt),    ϕt+1=ϕt+α∇ϕV(θt+1,ϕt).

However, directly applying gradient descent even fails to approach the saddle point in a toy model (See Fig. 2 in Section 4). By applying the Simultaneous Centripetal Acceleration (SCA) method, which will be explained later, to adjust gradients, we obtain the method of Gradient descent with SCA (Grad-SCA):

 Gθ = ∇θV(θt,ϕt) + (β1/α1)(∇θV(θt,ϕt) − ∇θV(θt−1,ϕt−1)), (2.3)
 θt+1 = θt − α1Gθ, (2.4)
 Gϕ = ∇ϕV(θt,ϕt) + (β2/α2)(∇ϕV(θt,ϕt) − ∇ϕV(θt−1,ϕt−1)), (2.5)
 ϕt+1 = ϕt + α2Gϕ. (2.6)
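For concreteness, a single Grad-SCA step can be sketched in a few lines of Python. This is our own minimal reading of (2.3)-(2.6), not the authors' code: the gradients are supplied as callables and the previous gradients are kept in a small state dictionary; all function and variable names are ours.

```python
def grad_sca_step(theta, phi, grad_theta, grad_phi, state,
                  alpha1, alpha2, beta1, beta2):
    """One Grad-SCA step (2.3)-(2.6): both players are updated from the same
    iterate (theta_t, phi_t), and each gradient is adjusted by its difference
    with the previous gradient (the approximate centripetal term)."""
    g_theta = grad_theta(theta, phi)   # ∇θV(θt, ϕt)
    g_phi = grad_phi(theta, phi)       # ∇ϕV(θt, ϕt)
    G_theta = g_theta + (beta1 / alpha1) * (g_theta - state["g_theta_prev"])
    G_phi = g_phi + (beta2 / alpha2) * (g_phi - state["g_phi_prev"])
    state["g_theta_prev"], state["g_phi_prev"] = g_theta, g_phi
    # plain descent/ascent steps on the adjusted gradients
    return theta - alpha1 * G_theta, phi + alpha2 * G_phi
```

On the toy bilinear game of Section 4.1 (∇θV = ϕ, ∇ϕV = θ), iterating this step with moderate α and β drives the iterates to the origin, while β1 = β2 = 0 recovers plain simultaneous gradient descent.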

It can be seen that the gradient descent scheme is still employed in (2.4) and (2.6), while the gradients in (2.3) and (2.5) are adjusted by simultaneously adding the directions of centripetal acceleration. Adjusting the gradients by the Alternating Centripetal Acceleration (ACA) method instead, we obtain the following method of gradient descent with ACA (Grad-ACA):

 Gθ = ∇θV(θt,ϕt) + (β1/α1)(∇θV(θt,ϕt) − ∇θV(θt−1,ϕt−1)), (2.7)
 θt+1 = θt − α1Gθ, (2.8)
 Gϕ = ∇ϕV(θt+1,ϕt) + (β2/α2)(∇ϕV(θt+1,ϕt) − ∇ϕV(θt,ϕt−1)), (2.9)
 ϕt+1 = ϕt + α2Gϕ. (2.10)

Grad-ACA also employs simple gradient descent steps but adjusts the gradients by adding the directions of centripetal acceleration alternately. The idea of centripetal acceleration can also be applied to other gradient-based methods, resulting in more efficient algorithms. For example, the RMSProp algorithm [18] with ACA, abbreviated as RMSProp-ACA, performs well in our numerical experiments (see Section 4.2).
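The paper reports RMSProp-ACA results but gives no pseudocode, so the sketch below is only one plausible reading, not the authors' exact implementation: the ACA-adjusted gradients of (2.7) and (2.9) are passed through a standard RMSProp preconditioner before the updates. All names, default values, and the placement of the preconditioner are our assumptions.

```python
import numpy as np

def rmsprop_aca(theta, phi, grad_theta, grad_phi, alpha=0.01, beta=0.1,
                decay=0.9, eps=1e-8, n_steps=1000):
    """Sketch of RMSProp with the alternating centripetal acceleration (ACA).
    Each ACA-adjusted gradient is rescaled by RMSProp's running estimate of
    the squared gradient magnitude; the phi-player sees the fresh theta."""
    g_th_prev = grad_theta(theta, phi)
    g_ph_prev = grad_phi(theta, phi)
    v_th, v_ph = np.zeros_like(theta), np.zeros_like(phi)
    for _ in range(n_steps):
        g_th = grad_theta(theta, phi)
        G_th = g_th + (beta / alpha) * (g_th - g_th_prev)  # centripetal term (2.7)
        v_th = decay * v_th + (1 - decay) * G_th ** 2
        theta = theta - alpha * G_th / (np.sqrt(v_th) + eps)
        g_th_prev = g_th
        g_ph = grad_phi(theta, phi)                        # alternating: fresh theta
        G_ph = g_ph + (beta / alpha) * (g_ph - g_ph_prev)  # centripetal term (2.9)
        v_ph = decay * v_ph + (1 - decay) * G_ph ** 2
        phi = phi + alpha * G_ph / (np.sqrt(v_ph) + eps)
        g_ph_prev = g_ph
    return theta, phi
```

With beta = 0 this reduces to an alternating RMSProp loop; the centripetal adjustment is added on top of it exactly as in Grad-ACA.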

The basic intuition behind employing centripetal acceleration is shown in Fig. 1. Consider uniform circular motion and let v(t) denote the instantaneous velocity at time t. Then the centripetal acceleration dv/dt points to the origin. The cyclic behavior around a Nash equilibrium might be similar to the circular motion around the origin. Therefore, the centripetal acceleration provides a direction along which the iterates can approach the target more quickly. The approximated centripetal acceleration term is then applied to gradient descent, as illustrated in Grad-SCA.

The proposed centripetal acceleration methods are also inspired by the dynamics of consensus optimization. In a Hamiltonian game, the associated vector field ξ conserves the level sets of the Hamiltonian H := ½∥ξ∥² because ⟨∇H, ξ⟩ = 0, which prevents iterates from approaching the equilibrium where ξ = 0. To illustrate the similarity between the centripetal acceleration methods and consensus optimization in Hamiltonian games, we consider the n-player differential game where each player has a loss function li(w1, …, wn) for i = 1, …, n. Then the simultaneous gradient is ξ := (∇w1l1, …, ∇wnln). The Jacobian of ξ is

 J := [ ∇w1w1l1, ∇w1w2l1, ⋯, ∇w1wnl1;
       ∇w2w1l2, ∇w2w2l2, ⋯, ∇w2wnl2;
       ⋯;
       ∇wnw1ln, ∇wnw2ln, ⋯, ∇wnwnln ]. (2.11)

Let ξk := ξ(wk) and Jk := J(wk). Then the iteration scheme of consensus optimization is

 wk+1=wk−α(ξk+βJTkξk) (2.12)

and the corresponding continuous dynamics has the form:

 dw/dt = −(I + βJT)ξ. (2.13)

When β is small, the dynamics approximates

 dw/dt = −(I − βJT)^{−1}ξ. (2.14)

By rearranging the order, we obtain

 dw/dt = −ξ + βJT(dw/dt). (2.15)

Since the game is assumed to be Hamiltonian, i.e., JT = −J, the dynamic equation (2.15) becomes

 dw/dt = −ξ − βJ(dw/dt). (2.16)

Note that dξ/dt = J(dw/dt). Then (2.16) is equivalent to

 dw/dt = −ξ − β(dξ/dt). (2.17)

Discretizing the equation with stepsize α, we obtain

 wt+1 = wt − αξt − β(ξt − ξt−1), (2.18)

which is exactly Grad-SCA. Furthermore, in Hamiltonian games, the dynamics of consensus optimization and of SGA plugged into gradient descent algorithms (Grad-SGA) are essentially the same. Therefore, the presented Grad-SCA can be regarded as a Jacobian-free approximation of consensus optimization or Grad-SGA.

Related works. Taking βi = αi = α (i = 1, 2) in Grad-SCA (2.3)-(2.6), the centripetal acceleration scheme reduces to OMD[3], which has the following form:

 θt+1 = θt − 2α∇θV(θt,ϕt) + α∇θV(θt−1,ϕt−1),
 ϕt+1 = ϕt + 2α∇ϕV(θt,ϕt) − α∇ϕV(θt−1,ϕt−1).

Very recently, from the perspective of generalizing OMD, [12] presented schemes similar to Grad-SCA and studied their convergence under a unified proximal framework. However, OMD is motivated by optimistically predicting the next-iteration gradient to be the current gradient. Although the scheme of OMD coincides with Grad-SCA, we must stress that the motivations are essentially different and result in totally distinct parameter selection strategies. Due to the similar dynamics, the presented methods inherit the parameter selection strategies of consensus optimization and SGA. For example, in the second experiment in Section 4, we take the magnitude of β much larger than that of α, rather than equal. Moreover, we analyze the alternating form (Grad-ACA) (2.7)-(2.10) and employ RMSProp-ACA in the numerical experiments. Therefore, the presented methods are not trivial generalizations of OMD, and the idea of centripetal acceleration is quite useful.
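The coincidence with OMD is easy to verify numerically: with βi = αi = α, the Grad-SCA θ-update of (2.3)-(2.4) and the OMD θ-update produce the same iterate. The small check below uses a bilinear game V(θ,ϕ) = θTAϕ with a random matrix and random iterates of our choosing.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
theta, phi = rng.standard_normal(3), rng.standard_normal(3)
phi_prev = rng.standard_normal(3)
alpha = 0.1  # with beta = alpha, Grad-SCA reduces to OMD

g_now, g_old = A @ phi, A @ phi_prev  # ∇θV at the current and previous iterates

# Grad-SCA with beta1 = alpha1 = alpha: θt+1 = θt − α·Gθ, Gθ = g_now + (g_now − g_old)
theta_sca = theta - alpha * (g_now + (g_now - g_old))

# OMD: θt+1 = θt − 2α∇θV(θt,ϕt) + α∇θV(θt−1,ϕt−1)
theta_omd = theta - 2 * alpha * g_now + alpha * g_old

assert np.allclose(theta_sca, theta_omd)  # the two updates coincide
```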

Another similar scheme[5] is to extrapolate the gradient from the past:

 θt+1/2 = θt − α∇θV(θt−1/2,ϕt−1/2),
 ϕt+1/2 = ϕt + α∇ϕV(θt−1/2,ϕt−1/2),
 θt+1 = θt − α∇θV(θt+1/2,ϕt+1/2),
 ϕt+1 = ϕt + α∇ϕV(θt+1/2,ϕt+1/2).

It can be rewritten as

 θt+1/2 = θt−1/2 − 2α∇θV(θt−1/2,ϕt−1/2) + α∇θV(θt−3/2,ϕt−3/2),
 ϕt+1/2 = ϕt−1/2 + 2α∇ϕV(θt−1/2,ϕt−1/2) − α∇ϕV(θt−3/2,ϕt−3/2),

which is equivalent to OMD. The algorithm may also be closely related to the predictive methods with the following form:

 θt+1/2 = θt − α∇θV(θt,ϕt),
 ϕt+1/2 = ϕt + α∇ϕV(θt,ϕt),
 θt+1 = θt − β∇θV(θt+1/2,ϕt+1/2),
 ϕt+1 = ϕt + β∇ϕV(θt+1/2,ϕt+1/2).

A unified framework to analyze OMD and predictive methods is presented in [9].

Last but not least, our idea of using the alternating scheme comes from negative momentum methods[6], which suggest that alternating forms might be more stable and effective in practice.

## 3 Linear Convergence for Bilinear Games

In this section, we focus on the convergence of Grad-SCA and Grad-ACA in the bilinear game:

 minθ∈Rd maxϕ∈Rp θTAϕ + bTθ + cTϕ,   A ∈ Rd×p, b ∈ Rd, c ∈ Rp. (3.1)

Any stationary point (θ∗,ϕ∗) of the game satisfies the first-order conditions:

 Aϕ∗ + b = 0, (3.2)
 ATθ∗ + c = 0. (3.3)

It is obvious that a stationary point exists if and only if b is in the range of A and c is in the range of AT. We suppose that such a pair (θ∗,ϕ∗) exists. Without loss of generality, we shift (θ∗,ϕ∗) to (0,0). Then the problem is reformulated as:

 minθ∈Rdmaxϕ∈RpθTAϕ,   A∈Rd×p. (3.4)

In the following two subsections, we analyze the convergence properties of Grad-SCA and Grad-ACA, respectively. Technical details are postponed to the appendices.

### 3.1 Linear Convergence of Grad-SCA

For the bilinear game, Grad-SCA is specified as

 θt+1 = θt − α1Aϕt − β1(Aϕt − Aϕt−1), (3.5)
 ϕt+1 = ϕt + α2ATθt + β2(ATθt − ATθt−1). (3.6)

Define the matrix F1 as

 F1 := [ Id, −(α1+β1)A, 0, β1A;
        (α2+β2)AT, Ip, −β2AT, 0;
        Id, 0, 0, 0;
        0, Ip, 0, 0 ]. (3.7)

It is obvious that (θt+1, ϕt+1, θt, ϕt)T = F1(θt, ϕt, θt−1, ϕt−1)T, where (θt, ϕt) are generated by (3.5) and (3.6). For simplicity, we suppose that A is square and nonsingular in Propositions 3.2 and 3.3 and Corollary 3.4. Then we prove the linear convergence for a general matrix A in Proposition 3.5 and Corollary 3.6. We will employ the following well-known lemma to illustrate the linear convergence.

###### Lemma 3.1.

Suppose that the matrix F has spectral radius ρ(F) < 1. Then the iterative system xt+1 = Fxt converges to 0 linearly. Explicitly, for any ε ∈ (0, 1−ρ(F)), there exists a constant C such that

 ∥xt∥ ≤ C(ρ(F) + ε)^t. (3.8)
###### Proposition 3.2.

Suppose that A is square and nonsingular. The eigenvalues of F1 are the roots of the fourth-order polynomials:

 λ²(1−λ)² + (λ(α2+β2) − β2)(λ(α1+β1) − β1)ζ,   ζ ∈ Sp(ATA), (3.9)

where Sp(ATA) denotes the collection of all eigenvalues of ATA.

Next, we consider the cases when α1 = α2 = α and β1 = β2 = β.

###### Proposition 3.3.

Suppose that A is square and nonsingular, and let Δt := ∥θt+1∥² + ∥θt∥² + ∥ϕt+1∥² + ∥ϕt∥². Then Δt converges linearly to 0 if α and β satisfy

 0 < α+β ≤ 1/√λmax(ATA),    |α−β| ≤ √λmin(ATA)(α+β)²/10, (3.10)

where λmax(ATA) and λmin(ATA) denote the largest and the smallest eigenvalues of ATA, respectively.
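Proposition 3.3 can be explored numerically by assembling F1 from (3.7) and computing its spectral radius; Grad-SCA converges linearly exactly when ρ(F1) < 1. The helper below and the test matrix are our own choices.

```python
import numpy as np

def spectral_radius_F1(A, alpha1, alpha2, beta1, beta2):
    """Spectral radius of the Grad-SCA iteration matrix F1 of (3.7),
    acting on the stacked state (θt, ϕt, θt−1, ϕt−1)."""
    d, p = A.shape
    Id, Ip, Z = np.eye(d), np.eye(p), np.zeros
    F1 = np.block([
        [Id, -(alpha1 + beta1) * A, Z((d, d)), beta1 * A],
        [(alpha2 + beta2) * A.T, Ip, -beta2 * A.T, Z((p, p))],
        [Id, Z((d, p)), Z((d, d)), Z((d, p))],
        [Z((p, d)), Ip, Z((p, d)), Z((p, p))],
    ])
    return np.abs(np.linalg.eigvals(F1)).max()

A = np.diag([1.0, 0.5])  # λmax(ATA) = 1, λmin(ATA) = 0.25
# α = β = 0.5 satisfies (3.10): α + β = 1 ≤ 1/√λmax(ATA) and α − β = 0
assert spectral_radius_F1(A, 0.5, 0.5, 0.5, 0.5) < 1.0   # linear convergence
assert spectral_radius_F1(A, 0.5, 0.5, 0.0, 0.0) > 1.0   # plain simultaneous GD diverges
```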

Consider the special case when β = α, where Grad-SCA reduces to OMD. Then we have the following corollary, which is slightly weaker than the existing result [9, Lemma 3.1].

###### Corollary 3.4.

Suppose that A is square and nonsingular. If β = α and α is sufficiently small, then Δt is linearly convergent, i.e., for all sufficiently small ε > 0, there exists C such that

 Δt ≤ C(ε + √(1/2 + (1/2)√(1 − α²λmin(ATA))))^{2t}.

Now we do not assume A to be square and nonsingular (A ∈ Rd×p). Instead, suppose A has rank r and admits the SVD A = UDVT, where D is diagonal with entries σ1 ≥ ⋯ ≥ σr > 0, U ∈ Rd×d and V ∈ Rp×p. Denote by N the null space of AT, which means ATθ = 0 for any θ ∈ N, and by M the null space of A. Note that any (θ, ϕ) ∈ N × M is a stationary point and we define

 ΔPt := ∥θt+1 − PN(θ0)∥² + ∥θt − PN(θ0)∥² + ∥ϕt+1 − PM(ϕ0)∥² + ∥ϕt − PM(ϕ0)∥²,

where PN denotes the orthogonal projection onto N while PM denotes the orthogonal projection onto M.

###### Proposition 3.5.

Suppose that 0 < α+β ≤ 1/σ1 and |α−β| ≤ σr(α+β)²/10. Then ΔPt is linearly convergent.

With an analogous analysis, we have the following result for OMD.

###### Corollary 3.6.

If β = α and α is sufficiently small, then ΔPt is linearly convergent, i.e., for all sufficiently small ε > 0, there exists a constant C such that

 ΔPt+1 ≤ C(ε + √(1/2 + (1/2)√(1 − α²σr²)))^{2t}.

### 3.2 Linear Convergence of Grad-ACA

In this subsection, we consider Grad-ACA for the bilinear game,

 θt+1 = θt − α1Aϕt − β1(Aϕt − Aϕt−1), (3.11)
 ϕt+1 = ϕt + α2ATθt+1 + β2(ATθt+1 − ATθt). (3.12)

The update of ϕ can be rewritten as:

 ϕt+1 = ϕt + (α2+β2)AT(θt − α1Aϕt − β1(Aϕt − Aϕt−1)) − β2ATθt.

Thus we define the matrix F2 as

 F2 := [ I, −(α1+β1)A, 0, β1A;
        α2AT, I − (α1+β1)(α2+β2)ATA, 0, (α2+β2)β1ATA;
        I, 0, 0, 0;
        0, I, 0, 0 ], (3.13)

from which it immediately follows that (θt+1, ϕt+1, θt, ϕt)T = F2(θt, ϕt, θt−1, ϕt−1)T.

###### Proposition 3.7.

Suppose that A is square and nonsingular. Consider the special case where α1 = α2 = β1 = β2 = α. If α is sufficiently small, then Δt converges linearly to 0, i.e., there exists a constant C such that

 Δt ≤ C(1 − α²λmin(ATA) + α⁴λmin(ATA)²)^{2t}.

Next, we do not assume A to be square and nonsingular. Employing the SVD A = UDVT and the same techniques as in Proposition 3.5, we have

###### Corollary 3.8.

Consider the special case where α1 = α2 = β1 = β2 = α. If α is sufficiently small, then ΔPt is linearly convergent, i.e., there exists a constant C such that

 ΔPt ≤ C(1 − α²σr² + α⁴σr⁴)^{2t},

which implies that (θt, ϕt) converges linearly to the stationary point (PN(θ0), PM(ϕ0)).

## 4 Numerical Simulation

### 4.1 A Simple Bilinear Game

In the first experiment, we tested Grad-SCA and Grad-ACA on the following bilinear game

 minθ∈R maxϕ∈R θ⋅ϕ. (4.1)

The unique stationary point is (0,0). The behaviors of the methods are presented in Fig. 2. Pure gradient descent steps do not converge to the origin even in this simple game. However, with centripetal acceleration, both Grad-SCA and Grad-ACA converge to the origin.

We compared the effects of various step-sizes and acceleration coefficients in both simultaneous and alternating cases. Fig. 3 suggests that the alternating methods are preferable.
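The toy game (4.1) is straightforward to reproduce. In the sketch below (step sizes are our choice), plain simultaneous gradient descent multiplies the squared norm of the iterate by 1 + α² at every step and therefore spirals outward, whereas Grad-ACA, specialized to the bilinear updates (3.11)-(3.12), contracts to the origin.

```python
import numpy as np

alpha, beta, steps = 0.3, 0.3, 1000

# Plain simultaneous gradient descent on min_θ max_ϕ θ·ϕ: diverges.
theta, phi = 1.0, 1.0
for _ in range(steps):
    theta, phi = theta - alpha * phi, phi + alpha * theta
gd_norm = np.hypot(theta, phi)

# Grad-ACA (3.11)-(3.12): the ϕ-player uses the freshly updated θ,
# and both gradients carry the centripetal difference term.
theta, phi = 1.0, 1.0
phi_prev = phi  # ϕ_{t−1}, initialized so the first centripetal term vanishes
for _ in range(steps):
    theta_new = theta - alpha * phi - beta * (phi - phi_prev)
    phi_new = phi + alpha * theta_new + beta * (theta_new - theta)
    theta, phi, phi_prev = theta_new, phi_new, phi
aca_norm = np.hypot(theta, phi)

assert gd_norm > 1e6 and aca_norm < 1e-8  # outward spiral vs. linear convergence
```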

### 4.2 Mixture of Gaussians

In the second simulation (the code is available at https://github.com/dynames0098/GANsTrainingWithCenAcc), we established a toy GAN model to compare several methods on learning a mixture of eight Gaussians with a small standard deviation. The ground truth is shown in Fig. 4.

Both the generator and the discriminator networks have four fully connected layers, each activated by a ReLU layer. The generator has two output neurons to represent a generated point, while the discriminator has one output which judges a sample. The random noise input for the generator is a 16-D Gaussian. We conducted the experiment on a server equipped with an i7-4790 CPU, a Titan Xp GPU and 16GB RAM, running TensorFlow 1.12 and Python 3.6.7.

We compared the results of several algorithms as shown in Fig. 6. Five methods are included in the comparison:

1. RMSProp: Simultaneous RMSPropOptimizer provided by TensorFlow.

2. RMSProp-alt: Alternating RMSPropOptimizer.

3. ConOpt: Consensus optimizer[11].

4. SGA: Symplectic gradient adjustment[1].

5. RMSProp-ACA: RMSPropOptimizer with the alternating centripetal acceleration method.

To stress the effectiveness brought by the parameter selection and the alternating strategy, regardless of the similarity in form to OMD, we also tested OMD on this simulation, searching over a range of parameters (see Appendix B).

The centripetal acceleration methods incur extra computation costs for the difference between successive gradients, as well as storage costs for maintaining previous gradients. Consensus optimization and SGA require extra Jacobian-related computations. Fig. 5 shows a comparison of time consumption. From these comparisons, RMSProp-ACA is competitive with the other methods.

## 5 Conclusion

In this paper, to alleviate the difficulty of finding a local Nash equilibrium in a smooth two-player game, we presented several gradient-based methods employing centripetal acceleration, including Grad-SCA and Grad-ACA. The proposed adjustment can easily be plugged into other gradient-based algorithms such as SGD, Adam or RMSProp, in either simultaneous or alternating form. From the theoretical viewpoint, we proved that both Grad-SCA and Grad-ACA converge linearly for bilinear games under suitable conditions. We found that in a simple bilinear game, centripetal acceleration makes the iterates converge stably to the Nash equilibrium; these examples also suggest that alternating methods are preferable to simultaneous ones. In the GAN setup simulations, we showed that RMSProp-ACA is competitive with consensus optimization and symplectic gradient adjustment methods.

However, we have only considered deterministic bilinear games theoretically and conducted limited numerical simulations. In the practical training of GANs or their variants, the associated games are much more complicated due to the randomness of computation, the online procedure and non-convexity. These issues still need further detailed study.

## Appendix A Proofs in Section 3

### a.1 Proof of Proposition 3.2

Proof The characteristic polynomial of the matrix F1 in (3.7) is

 det[ (1−λ)Id, −(α1+β1)A, 0, β1A;
     (α2+β2)AT, (1−λ)Ip, −β2AT, 0;
     Id, 0, −λId, 0;
     0, Ip, 0, −λIp ], (A.1)

which is equivalent to

 (−λ)^{d+p} det[ (1−λ)Id, −(1/λ)(λ(α1+β1) − β1)A; (1/λ)(λ(α2+β2) − β2)AT, (1−λ)Ip ]. (A.2)

Since A is nonsingular and square, λ = 0 and λ = 1 cannot be roots of (A.2). Then the roots of (A.2) must be the roots of

 det(λ(1−λ)Ip + (1/(λ(1−λ)))(λ(α2+β2) − β2)(λ(α1+β1) − β1)ATA). (A.3)

It follows that the eigenvalues of F1 must be the roots of the fourth-order polynomials:

 λ²(1−λ)² + (λ(α2+β2) − β2)(λ(α1+β1) − β1)ζ,   ζ ∈ Sp(ATA). ∎

### a.2 Proof of Proposition 3.3

Proof Given an eigenvalue λ of F1, using Proposition 3.2, we have

 (λ² − λ − i(λ(α+β) − β)√ζ)(λ² − λ + i(λ(α+β) − β)√ζ) = 0,   ζ ∈ Sp(ATA). (A.4)

Denote s := (α+β)√ζ and t := (α−β)√ζ. Then the four roots of (A.4) are

 λ±1 = (1 + is ± √(1 + 2it − s²))/2,
 λ±2 = (1 − is ± √(1 − 2it − s²))/2.

Note that for a given complex number z = a + bi, the absolute value of the real part of √z is √((|z| + a)/2) and the absolute value of the imaginary part of √z is √((|z| − a)/2). Therefore, since s ≤ 1, all real parts of λ±1, λ±2 lie in the interval [−R, R], where

 R = (1/2)√((√((1−s²)² + 4t²) + 1 − s²)/2) + 1/2 (A.5)

and all imaginary parts of λ±1, λ±2 lie in the interval [−I, I], where

 I = (1/2)√((√((1−s²)² + 4t²) − 1 + s²)/2) + s/2. (A.6)

Using the inequality

 √(x+y) ≤ √x + y/(2√x),    (x > 0, y ≥ 0), (A.7)

we have

 R ≤ (1/2)√(1 − s² + t²/(1−s²)) + 1/2, (A.8)
 I ≤ s/2 + |t|/(2√(1−s²)). (A.9)

Next, we discuss two cases separately.
(1). In the first case, since |α−β| ≤ √λmin(ATA)(α+β)²/10 and λmin(ATA) ≤ ζ for all ζ ∈ Sp(ATA), we have

 |t| ≤ s²/10.

Noting that 1 − √(1−s²) ≥ s²/2, we obtain

 |t| ≤ (1 − √(1−s²))/5. (A.10)

Combining this with (A.10) yields

 |t| ≤2(1−√1−s2)(1−s2)5 ≤(1−√1−s2)(1−s2)2√1−s2+12,

which follows that

 1 ≥|t|√1−s2+√1−s2+|t|2(1−s2)+|t|√1−s2 ≥t21−s2+√1−s2+t22(1−s2)32+s|t|√1−s2 (A.11) ≥t21−s2+√1−s2+t21−s2+s|t|√1−s2. (A.12)

The inequality (A.11) follows from the fact that s ≤ 1, and the inequality (A.12) uses (A.7). The inequality above is equivalent to

 ((1/2)√(1 − s² + t²/(1−s²)) + 1/2)² + (s/2 + |t|/(2√(1−s²)))² ≤ 1.

Using (A.8) and (A.9), we obtain

 ρ(F1) ≤ √(R² + I²) ≤ 1. (A.13)

Note that the equality in (A.7) holds if and only if y = 0. Thus equality in (A.13) would imply t = 0 and s = 0. Since s > 0, we have the strict inequality ρ(F1) < 1, which leads to the linear convergence of Δt.
(2). In the second case, using (A.5) and (A.6) directly, we have

 ρ(F1) ≤ √(R² + I²) < 1, (A.14)

which yields the linear convergence. ∎

### a.3 Proof of Corollary 3.4

Proof For this special case, we have t = 0. From (A.8) and (A.9), we obtain

 ρ(F1) ≤ √(R² + I²) = (1/2)√((1−s²) + 1 + 2√(1−s²) + s²) = √(1/2 + (1/2)√(1−s²)) ≤ √(1/2 + (1/2)√(1 − α²λmin(ATA))) < 1.

From Lemma 3.1 it follows that Δt is linearly convergent. ∎

### a.4 Proof of Proposition 3.5

Proof Using the SVD A = UDVT, we have

 UTθt+1 = UTθt − αDVTϕt − β(DVTϕt − DVTϕt−1),
 VTϕt+1 = VTϕt + αDUTθt + β(DUTθt − DUTθt−1).

According to the definition of the diagonal matrix D, the (r+1)-th to the last components of DVTϕt and DUTθt are zeros. Therefore, we focus on the leading r components of UTθt and VTϕt, denoted by [UTθt]1:r and [VTϕt]1:r respectively. Let Dr be the matrix composed of the leading r rows and r columns of D. Then we have

 [UTθt+1]1:r = [UTθt]1:r − αDr[VTϕt]1:r − β(Dr[VTϕt]1:r − Dr[VTϕt−1]1:r), (A.15)
 [VTϕt+1]1:r = [VTϕt]1:r + αDr[UTθt]1:r + β(Dr[UTθt]1:r − Dr[UTθt−1]1:r). (A.16)

Define

 Δrt: =∥[UTθt]1:r∥2+∥[UTθt+1]1:r∥2+∥[VTϕt]1:r∥2+∥[