In this paper, we study the following composite optimization problem
$$\min_{x} F(x) := f(x) + g(x),$$
where $f$ is differentiable and $\nabla f$ is Lipschitz continuous with constant $L$, and $g$ is proximable. The inertial proximal algorithm for this problem (iPiano) can be described as
$$x^{k+1} = \operatorname{prox}_{\alpha g}\!\left(x^{k} - \alpha \nabla f(x^{k}) + \beta\,(x^{k} - x^{k-1})\right),$$
where $\alpha$ is the stepsize and $\beta$ is the inertial parameter. iPiano is closely related to two classical algorithms: the forward-backward splitting method (when $\beta = 0$) and the heavy-ball method (when $g \equiv 0$); it can be viewed as a combination of the two. However, different from forward-backward splitting, the sequence generated by iPiano is not Fejér monotone due to the inertial term. This complicates the proof of convergence rates in the convex case. Note that the heavy-ball method is a special case of iPiano, so the same difficulty arises in analyzing the complexity of the heavy-ball method. In the existing literature, the sublinear convergence rate of the heavy-ball method was established only in an ergodic sense. In this paper, we propose a novel Lyapunov function to address this issue, and prove non-ergodic convergence rates.
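For concreteness, the iteration can be sketched in Python on a toy $\ell_1$-regularized least-squares problem; the data, the soft-thresholding proximal map, and the parameter choices below are our own illustrative assumptions, not taken from this paper.

```python
import numpy as np

def prox_l1(x, t):
    # Proximal map of t*||.||_1 (soft-thresholding); plays the role of the proximable g.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ipiano(grad_f, prox_g, x0, alpha, beta, iters=1000):
    # iPiano step: x_{k+1} = prox_{alpha g}(x_k - alpha*grad_f(x_k) + beta*(x_k - x_{k-1})).
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        y = x - alpha * grad_f(x) + beta * (x - x_prev)
        x_prev, x = x, prox_g(y, alpha)
    return x

# Toy problem: f(x) = 0.5*||Ax - b||^2, g(x) = 0.1*||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
L = np.linalg.norm(A.T @ A, 2)            # Lipschitz constant of grad f
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, t: prox_l1(v, 0.1 * t) # prox of t*g
alpha = 0.9 / L
x_star = ipiano(grad_f, prox_g, np.zeros(5), alpha=alpha, beta=0.4)
```

The choice $\alpha < 2(1-\beta)/L$ here matches the standard iPiano stepsize restriction; at convergence the iterate is a fixed point of the forward-backward map.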
1.1 Interpretation by dynamical systems
The heavy-ball method (the case $g \equiv 0$) arises from the discretization of the following dynamical system:
$$\ddot{x}(t) + a\,\dot{x}(t) + \nabla f(x(t)) = 0$$
for some $a > 0$. If further $\beta = 0$, the heavy-ball method reduces to basic gradient descent, which corresponds to the discretization of the following ODE
$$\dot{x}(t) + \nabla f(x(t)) = 0.$$
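To make this connection explicit, a standard finite-difference discretization (with mesh size $h$, our own notation) recovers the heavy-ball iteration. Approximating
$$\ddot{x}(t_k) \approx \frac{x^{k+1} - 2x^k + x^{k-1}}{h^2}, \qquad \dot{x}(t_k) \approx \frac{x^k - x^{k-1}}{h},$$
and substituting into the damped system $\ddot{x} + a\,\dot{x} + \nabla f(x) = 0$ gives
$$x^{k+1} = x^k - h^2\,\nabla f(x^k) + (1 - ah)\,(x^k - x^{k-1}),$$
that is, the heavy-ball method with stepsize $\alpha = h^2$ and inertial parameter $\beta = 1 - ah$.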
Studying the properties of the above dynamical systems helps us understand the algorithms. More importantly, it motivates us to construct the proper Lyapunov function. We notice that an important relation between $\dot{x}(t)$ and $\nabla f(x(t))$ is missing. It is thus natural to add the missing information back to (1.3). In the discretization, $\dot{x}(t)$ is replaced by $(x^{k+1}-x^{k})/h$, where $h$ is the stepsize for the discretization. Then it holds that
Note that both $x^{k+1}-x^{k}$ and $x^{k}-x^{k-1}$ can be viewed as discretizations of $h\,\dot{x}(t)$. Motivated by this observation, we propose to modify (1.3) by adding the following constraint
where . In Section 2, we study the system (1.3)+(1.6). With the extra constraint (1.6), a sublinear asymptotic convergence rate can be established. The analysis enables the non-ergodic sublinear convergence rate for the heavy-ball (inertial) algorithm.
1.2 Related works
The inertial term was first proposed in the heavy-ball algorithm . When the objective function is twice continuously differentiable and strongly convex (almost quadratic), the heavy-ball method is proved to converge linearly. Under the weaker assumption that the gradient of the objective function is Lipschitz continuous,  proved convergence to a critical point, yet without specifying the convergence rate. The smoothness of the objective function is critical for the heavy-ball method to converge: in fact, there is an example in which the heavy-ball method diverges for a strongly convex but nonsmooth function . Different from classical gradient methods, the heavy-ball algorithm fails to generate a Fejér monotone sequence. In the general convex and smooth case, the only known convergence rate result is ergodic, in terms of function values .
iPiano combines the heavy-ball method with the proximal mapping as in forward-backward splitting. In the nonconvex case, convergence of the algorithm was thoroughly discussed in . The local linear convergence of iPiano and the heavy-ball method was proved in . In the strongly convex case, linear convergence was proved for iPiano with fixed parameters . In , the inertial Proximal Alternating Linearized Minimization (iPALM) was introduced as a variant of iPiano for solving two-block regularized problems. Without the inertial terms, this algorithm reduces to Proximal Alternating Linearized Minimization (PALM) , which is equivalent to the two-block case of the Coordinate Descent (CD) algorithm . In the convex case, two-block CD methods are also well studied [2, 17, 3, 18]. Recently, there has been growing interest in studying CD methods from an operator perspective [5, 14, 19, 8].
1.3 Contribution and organization
In this paper, we present the first non-ergodic convergence rate result for iPiano in the general convex case. Compared with the results in , our convergence is established with a much larger stepsize under a coercivity assumption. If the function fails to be coercive, we can choose diminishing parameters instead. We also present linear convergence under an error bound condition without assuming strong convexity. Similar to the coercive case, our results hold for relaxed stepsizes. In addition, we extend our results to the coordinate descent version of iPiano. Both cyclic and stochastic index selection strategies are considered. The contributions of this paper are summarized as follows:
1. A novel dynamical interpretation: We propose a modified dynamical system for the inertial algorithm, from which we derive the sublinear asymptotic convergence rate with a proper Lyapunov function.
2. Non-ergodic sublinear convergence rates: We are the first to prove non-ergodic convergence rates for the inertial proximal gradient algorithm. Linear convergence is also proved for objective functions without strong convexity. The key idea of the proof is to bound the Lyapunov function and connect this bound to the successive difference of the Lyapunov function.
3. Better linear convergence: Stronger linear convergence results are proved for inertial algorithms. Compared with those in the literature, we allow relaxed stepsizes and inertial parameters, and the strong convexity assumption can be weakened. More importantly, we show that the stepsize can be chosen independently of the strong convexity constant.
4. Extensions to the multi-block version: The convergence of multi-block versions of inertial methods is studied. Both cyclic and stochastic index selection strategies are considered. Sublinear and linear convergence rates are proved for both algorithms.
The rest of the paper is organized as follows. In Section 2, we study the modified dynamical system and present technical lemmas. In Section 3, we show the convergence rates for inertial proximal gradient methods. We extend the results to the multi-block version of iPiano in Section 4, and to the stochastic version in Section 5. Section 6 concludes this article.
2 Dynamical motivation and technical lemmas
In this part, we first analyze the behavior of the modified dynamical system (1.3)+(1.6). The existence of solutions to this system is beyond the scope of this paper and will not be discussed. Then, two lemmas are introduced for the sublinear convergence rate analysis.
2.1 Performance of the modified dynamical system
With direct computation, it holds that
Assume that the objective function is coercive. Noting that it is decreasing and nonnegative along the trajectory, the trajectory must be bounded. With the continuity of $\nabla f$, $\nabla f(x(t))$ is also bounded, which means $\ddot{x}(t)$ is also bounded. If , with the triangle inequality,
We then obtain the boundedness of and . Letting , we have
With the boundedness, denote that
Then, we can easily have
That is,
Integrating both sides, we then have
2.2 Technical lemmas
This part contains two lemmas on nonnegative sequences. Lemma 1 is used to derive the convergence rate; it can be regarded as the discrete form of (2.6). Lemma 2 is developed to bound the sequence when the inertial parameters are decreasing.
Lemma 1 (Lemma 3.8, ).
Let be a nonnegative sequence of real numbers satisfying
Then, we have
Lemma 2.
Let be a nonnegative sequence satisfying the condition
If is decreasing and
3 Convergence rates
In this section, we prove convergence rates of iPiano. The core of the proof is to construct a proper Lyapunov function.
Lemma 3.
Suppose $f$ is a convex function with $L$-Lipschitz gradient and $g$ is convex, and . Let be generated by the inertial proximal gradient algorithm with non-increasing inertial parameters. Choosing the stepsize
for arbitrary fixed , we have
The update rule directly gives
With the convexity of , we have
With Lipschitz continuity of ,
where we use the Schwarz inequality. With direct calculations, we then obtain
Since is non-increasing, is also non-increasing. Thus, we obtain (3). ∎
We employ the following Lyapunov function
Suppose the conditions of Lemma 3 hold. Let denote the projection of onto , assumed to exist, and define
Then it holds
With direct computation and Lemma 3, we have
The convexity of yields
where . By (3.2), we then have
Similarly, we have
Using this and the definition of (3.7), we have:
Direct calculation yields
Thus, we derive
3.1 Sublinear convergence rate under weak convexity
In this subsection, we present the sublinear convergence rate of iPiano in the convex case. The coercivity of the objective function is critical for the analysis. If the objective is coercive, the parameter can be bounded from ; however, if the objective is not guaranteed to be coercive, the parameter must decrease to zero. Thus, this subsection is divided into two parts according to coercivity.
3.1.1 The objective is coercive
First, we present the non-ergodic convergence rate of the function value. The rate can be derived if is bounded from and .
Assume the conditions of Lemma 3 hold, and
Then we have
To the best of our knowledge, this is the first non-ergodic convergence rate, in terms of function values, for iPiano and the heavy-ball method in the convex case.
3.1.2 The objective fails to be coercive
In this case, to obtain the boundedness of the sequence, we must employ diminishing , i.e., . The following lemma provides the needed boundedness.
Suppose the conditions of Lemma 3 hold, and
where . Let be generated by the inertial proximal gradient algorithm; then, is bounded.
First, we prove that is a contractive operator. For any ,
where the first inequality depends on the fact , and the second one is due to .
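The contraction argument above can also be checked numerically. The sketch below, with a quadratic $f$ and soft-thresholding as the prox (our own illustrative choices, not the paper's setting), verifies nonexpansiveness of the forward-backward operator for a stepsize $\alpha \le 2/L$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
L = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of grad f for f(x) = 0.5*||Ax||^2
alpha = 1.0 / L                  # any alpha <= 2/L keeps I - alpha*grad f nonexpansive

def T(x):
    # Forward-backward operator: prox of alpha*||.||_1 composed with a gradient step on f.
    y = x - alpha * (A.T @ (A @ x))
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)

# Nonexpansiveness check on random pairs: ||T(u) - T(v)|| <= ||u - v||.
ok = all(
    np.linalg.norm(T(u) - T(v)) <= np.linalg.norm(u - v) + 1e-12
    for u, v in (rng.standard_normal((2, 8)) for _ in range(100))
)
```

The check passes because the proximal map is nonexpansive and the gradient step is nonexpansive for $\alpha \le 2/L$; composing the two preserves nonexpansiveness.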
Let be a minimizer of . Obviously, it holds
Noting is contractive,
With Lemma 2, we then prove the result. ∎
Now, we are prepared to present the rate of the function values when is not coercive.
Suppose the conditions of Lemma 5 hold. Let be generated by the inertial proximal gradient algorithm; then we have
3.2 Linear convergence and sublinear convergence under the optimal strong convexity condition
We say that the function satisfies the optimal strong convexity condition, if
where is the projection of onto the set , and . This condition is much weaker than strong convexity.
With (3.19), we have
On the other hand, from the definition of (3.7),
With Lemma 4, we then derive
With the assumption, , and the bound is assumed as . We then have the following result.
If , we have . The result certainly holds. If ,
With basic algebraic computation,
By defining , we then prove the result. ∎
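As an informal numerical companion to this result, one can track the function-value gap of the heavy-ball iteration on a strongly convex quadratic, which in particular satisfies the optimal strong convexity condition; all problem data and parameters below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((10, 10))
Q = G @ G.T + np.eye(10)            # SPD Hessian: f(x) = 0.5*x^T Q x, minimizer x* = 0
L = np.linalg.norm(Q, 2)
alpha, beta = 0.9 / L, 0.3          # stepsize and inertial parameter (illustrative)

x_prev = x = rng.standard_normal(10)
gaps = []
for _ in range(600):
    y = x - alpha * (Q @ x) + beta * (x - x_prev)   # heavy-ball step (g = 0)
    x_prev, x = x, y
    gaps.append(0.5 * x @ Q @ x)                    # F(x^k) - F(x*)
```

Plotting `gaps` on a log scale gives a straight line, i.e., linear (geometric) convergence of the function values.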
4 Cyclic coordinate descent inertial algorithm
This part analyzes the cyclic coordinate inertial proximal algorithm. The two-block version was proposed in , which focuses on the nonconvex case. Here, we consider the multi-block version and prove its convergence rate under a convexity assumption. The minimization problem can be described as
where and () are all convex. We use the notation
The cyclic coordinate descent inertial algorithm runs as follows: for from to ,
where . The iPALM can be regarded as the two-block case of this algorithm. The function is assumed to satisfy
for any and , and . With (4.3), we can easily obtain
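A minimal Python sketch of the cyclic scheme with scalar blocks reads as follows; it assumes block updates of the form $x_i \leftarrow \operatorname{prox}_{\alpha g_i}\!\big(x_i - \alpha \nabla_i f + \beta\,(x_i - x_i^{\mathrm{prev}})\big)$, which is our reading of the scheme above, with a quadratic $f$ and $\ell_1$-type $g_i$ as illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 6))
b = rng.standard_normal(40)
L = np.linalg.norm(A.T @ A, 2)      # crude common Lipschitz bound for all blocks
alpha, beta = 0.9 / L, 0.3          # illustrative stepsize and inertial parameter

def prox_l1(v, t):
    # Soft-thresholding: prox of t*|.| applied blockwise.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Cyclic sweep over blocks i = 1..m (scalar blocks here); each block update
# uses the latest values of the other blocks, as in coordinate descent.
x_prev = np.zeros(6)
x = np.zeros(6)
for _ in range(600):
    x_old = x.copy()
    for i in range(6):
        g_i = A[:, i] @ (A @ x - b)             # partial gradient w.r.t. block i
        y = x[i] - alpha * g_i + beta * (x[i] - x_prev[i])
        x[i] = prox_l1(y, 0.1 * alpha)          # g_i(x_i) = 0.1*|x_i|
    x_prev = x_old
```

Since the regularizer is separable, the limit satisfies the same blockwise fixed-point condition as the full forward-backward map.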
The proof is similar to [Lemma 1.2.3,] and will not be reproduced. In the remainder of this paper, we use the following assumption:
A1: for any , the sequence is non-increasing.
For any ,
With the convexity of , we have
With (4), we have