1 Introduction
Proximal splitting methods such as Douglas-Rachford splitting, the alternating direction method of multipliers, forward-backward splitting and many others (see [6, 2, 8, 3, 9, 10, 7]) are often used to solve large-scale convex optimization problems of the form
(1) $\operatorname*{minimize}_{X} \; f(X) + g(X),$
where $f$ and/or $g$ have cheaply computable proximal mappings. Since many nonconvex functions also possess cheap proximal computations, there is great interest in analyzing whether the resulting iterates still converge to a solution. This paper focuses on analyzing the performance of splitting methods applied to problems of this form, where $f$ is convex and
(2) $g = g_0(\|\cdot\|) + \chi_{\operatorname{rank}(\cdot) \leq r}$
is nonconvex with
- $g_0$ being an increasing, convex function,
- $\|\cdot\|$ being a unitarily invariant norm,
- $\chi_{\operatorname{rank}(\cdot) \leq r}$ being the indicator function for matrices that have at most rank $r$.
Analogously, one can consider vector-valued problems where the rank constraint is replaced by a cardinality constraint. Both problem types are very common within statistics, machine learning, automatic control and many other fields (see [12, 14, 26, 30, 25, 5, 4, 15]). To this day, only special instances of solving this problem with proximal splitting methods have been analyzed [16, 21, 24, 17, 29], mainly under the assumption that $f$ is the indicator function of an affine set and $g_0 = 0$. In this paper, we deal with general convex functions $f$ and a large class of functions $g_0$, which allows us to provide an alternative analysis for showing local convergence.
Letting $g^{**}$ denote the biconjugate (convex envelope) of $g$, we show conditions under which the proximal operators of the nonconvex function $g$ in Eq. 2 and of its convex envelope (which was introduced in [12]) coincide. We translate these conditions to the setting of applying the Douglas-Rachford and forward-backward splitting algorithms to the nonconvex problem Eq. 1 with $g$ as in Eq. 2, and to its optimal convex relaxation
$\operatorname*{minimize}_{X} \; f(X) + g^{**}(X).$
We show that these conditions imply local convergence of the nonconvex splitting methods whenever all solutions to the convex relaxation are solutions to Eq. 1. Thus, in many practical examples, there is no loss in directly using the nonconvex algorithms. In fact, there are many examples where the nonconvex methods can find a low-rank solution where the optimal convex relaxation fails. In other words, the nonconvex algorithm can have low-rank limit points while the convex one has none, but not vice versa. This fact is explicitly analyzed for the case where $\|\cdot\|$ is the Frobenius norm and $g_0 = \tfrac{1}{2}(\cdot)^2$.
Interestingly, we will see that, unlike in the convex case, proximal splitting methods applied to Eq. 1 and to the scaled problem
(3) $\operatorname*{minimize}_{X} \; \gamma \big( f(X) + g(X) \big),$
where $\gamma > 0$, do not necessarily converge to the same limit points. Furthermore, the existence of a limit point, as well as the region of attraction in our local convergence result, highly depends on the size of $\gamma$. On the one hand, if the optimal convex relaxation does not possess a low-rank solution, it is shown that $\gamma$ has to be chosen sufficiently small for a limit point to exist. On the other hand, in the case of our guaranteed local convergence, the region of attraction grows with $\gamma$, i.e. for every initial point of the proximal algorithms there exists a sufficiently large $\gamma$ such that the algorithm converges.
Finally, note that besides the ability to find low-rank solutions when the convex relaxation fails, the nonconvex algorithms are computationally more favorable, because the proximal computations of $g$ are significantly cheaper than those of its convex envelope (see [11]).
2 Background
The following notation for real matrices $X \in \mathbb{R}^{n \times m}$ and vectors $x \in \mathbb{R}^{n}$ is used in this paper. The nonincreasingly ordered singular values of $X$, counted with multiplicity, are denoted by
$\sigma_1(X) \geq \sigma_2(X) \geq \dots \geq \sigma_q(X), \quad \text{where } q := \min\{n, m\}.$
Further, for $r \leq \operatorname{rank}(X)$ we define the unique optimal rank-$r$ approximation with respect to unitarily invariant norms (see [19, Theorem 7.4.9.1]) as
$\operatorname{svd}_r(X) := \sum_{i=1}^{r} \sigma_i(X) u_i v_i^{T},$
where
$X = \sum_{i=1}^{q} \sigma_i(X) u_i v_i^{T}$
is a singular value decomposition (SVD) of $X$. If $r \geq \operatorname{rank}(X)$, then $\operatorname{svd}_r(X) = X$. Further, the inner product of $X, Y \in \mathbb{R}^{n \times m}$ is defined by
$\langle X, Y \rangle := \operatorname{trace}(X^{T} Y).$
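For illustration, $\operatorname{svd}_r$ can be computed directly from a numerical SVD; the following NumPy sketch is our own addition and not part of the original development:

```python
import numpy as np

def svd_r(X: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of X with respect to any
    unitarily invariant norm (Schmidt-Mirsky / Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the r largest singular values and their vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]
```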
2.1 Norms
A function $g : \mathbb{R}^{n} \to \mathbb{R}$ is called a symmetric gauge function if
- $g$ is a norm,
- $g(x) = g(|x|)$ for all $x \in \mathbb{R}^{n}$, where $|x|$ denotes the elementwise absolute value,
- $g(\Pi x) = g(x)$ for all permutation matrices $\Pi$ and all $x \in \mathbb{R}^{n}$.
A norm $\|\cdot\|$ on $\mathbb{R}^{n \times m}$ is unitarily invariant if for all $X \in \mathbb{R}^{n \times m}$ and all unitary matrices $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$ it holds that $\|UXV\| = \|X\|$. Since all unitarily invariant norms on $\mathbb{R}^{n \times m}$ define a symmetric gauge function and vice versa (see [19]), we define
$\|X\|_{g} := g(\sigma(X)),$
where $\sigma(X) := (\sigma_1(X), \dots, \sigma_q(X))$. By [19], the dual norm of $\|\cdot\|_{g}$ is also unitarily invariant and therefore it is associated with a symmetric gauge function $g^{D}$, i.e.
$\|X\|_{g}^{D} = g^{D}(\sigma(X)).$
For $r \in \{1, \dots, q\}$, the truncated symmetric gauge functions are given by
$g_r(x_1, \dots, x_r) := g(x_1, \dots, x_r, 0, \dots, 0).$
Then, the so-called low-rank inducing norms are defined in [12] as the dual norms of the norms induced by the truncated dual gauge functions $g^{D}_r$.
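As a concrete instance (our own example, using the standard identifications): choosing $g = \ell_1$ yields the nuclear norm, and the dual gauge $g^{D} = \ell_\infty$ yields the spectral norm:

```python
import numpy as np

def gauge_norm(X, g):
    """Unitarily invariant norm obtained by applying a
    symmetric gauge function g to the singular values of X."""
    return g(np.linalg.svd(X, compute_uv=False))

X = np.random.randn(5, 4)
nuclear  = gauge_norm(X, np.sum)  # g = l1    -> nuclear norm
spectral = gauge_norm(X, np.max)  # g = l_inf -> spectral norm (dual pair)
```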
The following properties have been shown in [12].
Lemma 1.
For all symmetric gauge functions $g$ and all $r \in \{1, \dots, q\}$ it holds that
(4)  
(5) 
Finally, the Frobenius norm is given by
$\|X\|_F := \sqrt{\langle X, X \rangle} = \sqrt{\textstyle\sum_{i=1}^{q} \sigma_i^{2}(X)}.$
2.2 Functions
The effective domain of a function $f : \mathbb{R}^{n \times m} \to \mathbb{R} \cup \{\infty\}$ is defined as
$\operatorname{dom}(f) := \{ X \in \mathbb{R}^{n \times m} : f(X) < \infty \}.$
Then $f$ is said to be:
- proper if $\operatorname{dom}(f) \neq \emptyset$,
- closed if the sublevel set $\{ X : f(X) \leq \alpha \}$ is closed for each $\alpha \in \mathbb{R}$.
A function $f$ is called increasing if
$0 \leq x \leq y \implies f(x) \leq f(y).$
The conjugate and biconjugate functions $f^{*}$ and $f^{**}$ of $f$ are defined as
$f^{*}(Y) := \sup_{X} \big[ \langle X, Y \rangle - f(X) \big]$
and $f^{**} := (f^{*})^{*}$. If $f : \mathbb{R}_{\geq 0} \to \mathbb{R} \cup \{\infty\}$, then the monotone conjugate is given by
$f^{+}(y) := \sup_{x \geq 0} \big[ xy - f(x) \big].$
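As a quick worked example (ours, not from the original text), take $f(x) = \tfrac{1}{2}x^2$:

```latex
% For f(x) = x^2/2 the supremum in f*(y) = sup_x [xy - x^2/2]
% is attained at x = y, while the monotone conjugate restricts
% the supremum to x >= 0, which is only active when y > 0:
\[
  f^{*}(y) = \tfrac{1}{2}y^{2},
  \qquad
  f^{+}(y) = \tfrac{1}{2}\big(\max\{y,0\}\big)^{2},
  \qquad
  f^{**} = f ,
\]
% the last identity being consistent with f closed and convex.
```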
The subdifferential of $f$ at $X$ is defined as
$\partial f(X) := \big\{ Y : f(Z) \geq f(X) + \langle Y, Z - X \rangle \ \text{for all } Z \big\}.$
The proximal mapping of $f$ at $Z$ is defined by
$\operatorname{prox}_{f}(Z) := \operatorname*{argmin}_{X} \Big[ f(X) + \tfrac{1}{2}\|X - Z\|_F^{2} \Big].$
Finally, for $C \subseteq \mathbb{R}^{n \times m}$ the indicator function is defined as
$\chi_{C}(X) := \begin{cases} 0 & \text{if } X \in C, \\ \infty & \text{otherwise.} \end{cases}$
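A useful special case, stated here for illustration: the proximal mapping of an indicator function is the orthogonal projection onto the set, and for the rank constraint this projection is given by the SVD truncation $\operatorname{svd}_r$ from above (unique whenever $\sigma_r(Z) > \sigma_{r+1}(Z)$):

```latex
\[
  \operatorname{prox}_{\gamma \chi_{C}}(Z)
  = \operatorname*{argmin}_{X \in C} \|X - Z\|_F
  = \Pi_{C}(Z),
  \qquad
  \operatorname{prox}_{\gamma \chi_{\operatorname{rank}(\cdot) \leq r}}(Z)
  = \operatorname{svd}_r(Z).
\]
```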
2.3 Optimal Convex Relaxation
It is shown in [12] that every low-rank inducing norm is the biconjugate (convex envelope) of Eq. 2 for different choices of $g_0$.
Proposition 1.
Assume that $g_0$ is an increasing, closed, convex function, and let $g$ be defined as in Eq. 2. Then,
(6)  
(7) 
These characterizations can be used to formulate Fenchel dual problems and optimal convex relaxations of our rank-constrained problems. This is shown in the following proposition from [12].
3 Theoretical Results
In this section, we derive the theoretical results that are needed for the convergence analysis in Section 4. The proofs of these results are given in the appendix.
Theorem 1.
Let , and , where $g_0$ is a proper, closed and increasing convex function. Then for all it holds that
Moreover, let
then the following are equivalent:



for all .

Computing the prox of the nonconvex function $g$ at $Z$ reduces to evaluating the convex prox of either $g_0$ or the convex envelope at $\operatorname{svd}_r(Z)$. Therefore, only the first $r$ singular values and vectors are needed to compute the nonconvex prox. This can be compared with the prox of the convex envelope at $Z$, where all singular values and vectors might be needed. Computing the prox of $g$ is cheaper than computing the prox of its convex envelope, except for rank-$r$ matrices, see [12]. Therefore it is often much cheaper to evaluate the prox of the nonconvex function $g$ than of its convex envelope $g^{**}$.
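To make the cost difference concrete, here is a minimal sketch (our own, for the special case $g_0 = 0$, where the nonconvex prox reduces to plain SVD truncation): a truncated SVD computing the $r$ leading singular triplets suffices, whereas the prox of the convex envelope may require a full SVD.

```python
import numpy as np
from scipy.sparse.linalg import svds

def prox_rank_indicator(Z: np.ndarray, r: int) -> np.ndarray:
    """Nonconvex prox of g = chi_{rank <= r} (i.e. g0 = 0):
    only the r leading singular triplets of Z are needed.
    Requires r < min(Z.shape) for scipy's iterative svds."""
    U, s, Vt = svds(Z, k=r)  # truncated SVD of Z
    return (U * s) @ Vt
```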
In order to relate Theorem 1 to the solutions of Eq. 8 and Eq. 10, the following results, which are proven in Sections A.3 and A.2, respectively, will be needed.
Lemma 2.
Let and . Assume that
where either or for some . Then all fulfill that
Moreover, if , then .
4 Convergence Analysis
Next, it is discussed how Propositions 3 and 1 can be used to show local convergence of proximal splitting algorithms applied to problems of the form
(11) $\operatorname*{minimize}_{X} \; f(X) + g_0(\|X\|) + \chi_{\operatorname{rank}(\cdot) \leq r}(X),$
where $f$ is a convex function with cheaply computable proximal mapping and $g_0$ is an increasing, convex function. To illustrate and support our analysis, let us first recap the following two well-known proximal splitting algorithms applied to Eq. 1.
Douglas-Rachford Splitting
The Douglas-Rachford splitting method is one of the most well-known splitting algorithms for solving large-scale convex problems [7, 22, 8, 6]. In fact, the well-known alternating direction method of multipliers (ADMM) is a special case of this algorithm (see [10, 9, 3]). The Douglas-Rachford iterations are given by
(12a) $X^{k} = \operatorname{prox}_{\gamma f}(Z^{k}),$
(12b) $Y^{k} = \operatorname{prox}_{\gamma g}\big(2X^{k} - Z^{k}\big),$
(12c) $Z^{k+1} = Z^{k} + Y^{k} - X^{k},$
where $\gamma > 0$ and $Z^{0} \in \mathbb{R}^{n \times m}$ is arbitrary. For convex $f$ and $g$, the sequences $X^{k}$ and $Y^{k}$ converge towards an identical solution of Eq. 1, and $\|Z^{k} - \bar{Z}\|_F$ is nonincreasing for every fixed point $\bar{Z}$ of the iteration (see [7, 22, 8]).
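A compact sketch of iterations (12a)-(12c) follows (our own illustration; `prox_f` and `prox_g` are assumed to be user-supplied callables evaluating the prox of $\gamma f$ and $\gamma g$):

```python
def douglas_rachford(prox_f, prox_g, Z0, gamma=1.0, iters=500):
    """Douglas-Rachford iterations (12a)-(12c)."""
    Z = Z0.copy()
    for _ in range(iters):
        X = prox_f(Z, gamma)          # (12a)
        Y = prox_g(2 * X - Z, gamma)  # (12b)
        Z = Z + Y - X                 # (12c)
    return X, Y, Z
```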
Forward-Backward Splitting
Another popular splitting method is the so-called forward-backward splitting algorithm (see [6, 2, 20, 27]). In this case, $f$ is assumed to be differentiable with Lipschitz continuous gradient, i.e. for all $X, Y \in \mathbb{R}^{n \times m}$
$\|\nabla f(X) - \nabla f(Y)\|_F \leq L \|X - Y\|_F.$
Then the forward-backward iterations are given by
$Z^{k+1} = \operatorname{prox}_{\gamma g}\big(Z^{k} - \gamma \nabla f(Z^{k})\big),$
where $0 < \gamma < \tfrac{2}{L}$. Also here, if $f$ and $g$ are convex, then it can be shown that $Z^{k}$ converges towards a solution of Eq. 1 and that $\|Z^{k} - \bar{Z}\|_F$ is nonincreasing for every solution $\bar{Z}$.
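A corresponding sketch (again ours; `grad_f` and `prox_g` are assumed callables, and `gamma` should respect the step-size restriction above):

```python
def forward_backward(grad_f, prox_g, Z0, gamma, iters=500):
    """Forward-backward iterations: an explicit gradient
    (forward) step on f followed by a prox (backward) step on g."""
    Z = Z0.copy()
    for _ in range(iters):
        Z = prox_g(Z - gamma * grad_f(Z), gamma)
    return Z
```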
Local Convergence
One of the steps in the above two methods (and many other operator splitting methods) when applied to solve Eq. 1 is the evaluation of the proximal mapping of $\gamma g$. If $f$ and $g$ are convex, then the iterates converge to a solution of Eq. 1 in both methods, and their distance to any fixed point is a nonincreasing sequence. Next, we will show that this property, together with Theorem 1, implies local convergence of proximal splitting algorithms applied to the nonconvex problem in Eq. 11.
In the following, we will refer to a proximal splitting algorithm applied to the optimal convex relaxation in Eq. 10, which is restated here,
(13) $\operatorname*{minimize}_{X} \; f(X) + g^{**}(X),$
as the convex splitting algorithm. Correspondingly, if the algorithm is applied to Eq. 11, i.e. with $g$ in place of $g^{**}$, we speak of the nonconvex splitting algorithm.
Let us assume that is a solution to Eq. 13 with
By (firm) nonexpansiveness of the proximal mapping and the continuity of the singular values (see [28, Corollary 4.9]), Theorem 1 implies that
for all , where . Thus, since is nonincreasing, it follows that
This proves the local convergence of the nonconvex algorithm if .
We will conclude this section by linking this condition to the solution set of Eq. 13, which is the same as Eq. 10. A necessary optimality condition for solving Eq. 9 and Eq. 10 is that (see [23, Theorem 7.12.1] and [27, Theorem 23.5.])
Now, relating this to the optimality condition of the convex prox computation:
implies that
is a solution to the dual problem Eq. 9, i.e.,
(14) 
By Theorem 1 we can conclude that if . Hence, Proposition 3 implies the local convergence of nonconvex proximal splitting algorithms, if there exists a solution to Eq. 14 such that
(15) 
This condition ensures, by Proposition 3, that Eq. 13 only has solutions of at most rank $r$. Note that if Eq. 13 has solutions of rank larger than $r$, then a convex algorithm cannot be expected to find solutions of rank $r$, despite their possible existence, because the solution set of a convex problem is a convex set.
In other words, nonconvex proximal splitting methods locally converge to a solution of Eq. 11 whenever one can expect to find such a solution by solving Eq. 13. Moreover, the region of attraction contains a ball whose radius grows with $\gamma$. This means that for each initial point there exists a $\gamma$ that guarantees convergence. Finally, numerical experiments indicate that the nonconvex algorithms can also find rank-$r$ solutions to Eq. 13 despite the fact that Eq. 13 may have higher-rank solutions.
5 Douglas-Rachford Limit Points
In the following, let us compare the limit points of Douglas-Rachford splitting applied to the optimal convex relaxation (the convex Douglas-Rachford) with the limit points of the nonconvex Douglas-Rachford for problems Eq. 1 where $\|\cdot\|$ is the Frobenius norm and $g_0 = \tfrac{1}{2}(\cdot)^{2}$.
Using completion of squares and the well-known Schmidt-Mirsky theorem (see [19, Theorem 7.4.9.1]), we get that
(16)
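While display (16) is not reproduced here, the flavor of the computation can be illustrated as follows (our own instance, assuming $g(X) = \tfrac{1}{2}\|X - N\|_F^2 + \chi_{\operatorname{rank}(\cdot) \leq r}(X)$ for some given $N$):

```latex
% Completing squares merges the two quadratics into one:
\[
  \tfrac{\gamma}{2}\|X - N\|_F^{2} + \tfrac{1}{2}\|X - Z\|_F^{2}
  = \tfrac{1+\gamma}{2}\,\Big\|X - \tfrac{Z + \gamma N}{1+\gamma}\Big\|_F^{2}
  + \text{const},
\]
% so that, by Schmidt--Mirsky, the nonconvex prox is an SVD truncation:
\[
  \operatorname{prox}_{\gamma g}(Z)
  = \operatorname{svd}_r\!\Big(\tfrac{Z + \gamma N}{1+\gamma}\Big).
\]
```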
This allows us to derive the following comparative result on the limit points of the convex and nonconvex Douglas-Rachford, which is proven in Section A.4.
Theorem 2.
Let with and . Then is a limit point of the convex (nonconvex) Douglas-Rachford splitting iterate Eq. 12a if and only if there exists such that
and in the

convex case:

nonconvex case:
Theorem 2 verifies what was discussed at the end of the previous section: all limit points of the convex Douglas-Rachford are limit points of the nonconvex Douglas-Rachford, but not vice versa. More importantly, it shows the importance of choosing a feasible $\gamma$. In the presence of a duality gap in Eq. 9, Theorem 2 implies that if $\gamma$ is chosen too large, then the nonconvex Douglas-Rachford may not possess a limit point, whereas choosing $\gamma$ sufficiently small can restore convergence. Analytical examples where this applies have been studied in [11], and a numerical example is given in the next section. This is very much in contrast to the convex case, where convergence is independent of $\gamma$. Finally, note that by choosing $\gamma$ just small enough for a limit point to exist, the problem of multiple limit points may be avoided, thus making the algorithm independent of the initialization. Similar derivations can be carried out for all $g$ of the form in Eq. 2.
6 Example
Within many areas such as automatic control, the rank of a Hankel operator/matrix is crucial, because it determines the order of a linear dynamical system. Whereas the celebrated Adamyan-Arov-Krein theorem (see [1]) answers the question of optimal low-rank approximation for infinite-dimensional Hankel operators, the following finite-dimensional case is still unsolved:
$\operatorname*{minimize}_{H} \; \|H - H_0\|$
subject to $\operatorname{rank}(H) \leq r$ and $H$ Hankel,
where $H_0$ is a given matrix. In the following, we show how nonconvex Douglas-Rachford splitting performs on this problem class in comparison with the optimal convex relaxation. To this end, we rewrite the problem in view of Eq. 13 and Eq. 10. For our numerical experiments we use
The nonconvex Douglas-Rachford uses and is initialized with for all . The ranks of the solutions to the optimal convex relaxation are shown in Figure 1. We observe that only for the convex relaxation manages to find guaranteed solutions to the nonconvex problem. In contrast, the nonconvex Douglas-Rachford converges for all . Figure 2 shows the relative errors of these solutions and the (suboptimal) solutions to the convex relaxation, as well as the lower bound that is provided by the convex relaxation (see Proposition 2). Note that the convex relaxation is not able to obtain a suboptimal solution of rank . From Figure 2 it can be seen that the nonconvex solutions for coincide with the convex solutions, just as our local convergence guarantee suggests. However, for all other , the nonconvex approximations outperform the suboptimal solutions of the convex relaxation. Finally, it has been observed that, if one chooses sufficiently large, the nonconvex Douglas-Rachford does not converge for . This can be explained through Theorem 2.
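For concreteness, a minimal sketch of this experiment is given below. It is our own illustration and makes explicit assumptions: a squared Frobenius objective, $f$ collecting the data-fit term and the Hankel constraint (whose prox is a projection by antidiagonal averaging, since Hankel matrices form a subspace), and $g$ the rank indicator; `H0` denotes a hypothetical given data matrix.

```python
import numpy as np

def proj_hankel(Z):
    """Project onto the subspace of Hankel matrices by
    averaging each antidiagonal (i + j = const)."""
    n, m = Z.shape
    H = np.empty_like(Z)
    for k in range(n + m - 1):
        idx = [(i, k - i) for i in range(max(0, k - m + 1), min(n, k + 1))]
        avg = np.mean([Z[i, j] for i, j in idx])
        for i, j in idx:
            H[i, j] = avg
    return H

def svd_r(Z, r):
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def hankel_low_rank(H0, r, gamma=1.0, iters=2000):
    """Nonconvex Douglas-Rachford for
       minimize 0.5*||H - H0||_F^2  s.t.  H Hankel, rank(H) <= r,
    with f(H) = 0.5*||H - H0||_F^2 + chi_Hankel(H) and
    g = chi_{rank <= r}. Since Hankel matrices form a subspace,
    the prox of gamma*f is the projection of a weighted average."""
    Z = H0.copy()
    for _ in range(iters):
        X = proj_hankel((Z + gamma * H0) / (1 + gamma))  # prox_{gamma f}
        Y = svd_r(2 * X - Z, r)                          # prox_{gamma g}
        Z = Z + Y - X
    return X
```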
7 Conclusion
We have shown conditions under which the proximal mapping of the nonconvex function in Eq. 2 coincides with the proximal mapping of its convex envelope. This allowed us to state conditions under which the nonconvex and convex Douglas-Rachford and forward-backward methods coincide, which in turn guarantees local convergence of the nonconvex methods in these situations. Furthermore, we have provided a comparison between the convex and nonconvex Douglas-Rachford limit points for the common instance of the squared Frobenius norm. Unlike in the convex case, this comparison demonstrated that scaling the problem may have a significant impact. Finally, we discussed a numerical example in which a nonconvex method converges even when the stated assumptions do not hold. In those situations, the quality of the solution from the nonconvex algorithm was better than that obtained by the optimal convex relaxation.
References
 [1] A. Antoulas, Approximation of Large-Scale Dynamical Systems. SIAM, 2005.
 [2] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, ser. CMS Books in Mathematics. Springer New York, 2011.
 [3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
 [4] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
 [5] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, “The convex geometry of linear inverse problems,” Foundations of Computational Mathematics, vol. 12, no. 6, pp. 805–849, 2012.
 [6] P. L. Combettes and J.-C. Pesquet, Proximal Splitting Methods in Signal Processing. Springer New York, 2011, pp. 185–212.
 [7] J. Douglas and H. H. Rachford, “On the numerical solution of heat conduction problems in two and three space variables,” Transactions of the American Mathematical Society, vol. 82, no. 2, pp. 421–439, 1956.
 [8] J. Eckstein and D. P. Bertsekas, “On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Mathematical Programming, vol. 55, no. 1, pp. 293–318, 1992.
 [9] D. Gabay and B. Mercier, “A dual algorithm for the solution of nonlinear variational problems via finite element approximation,” Computers and Mathematics with Applications, vol. 2, no. 1, pp. 17–40, 1976.
 [10] R. Glowinski and A. Marroco, “Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires,” ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, vol. 9, pp. 41–76, 1975.
 [11] C. Grussler, “Rank reduction with convex constraints,” Ph.D. dissertation, Lund University, 2017.
 [12] C. Grussler and P. Giselsson, “Low-rank inducing norms with optimality interpretations,” 2016, preprint.
 [13] C. Grussler, A. Rantzer, and P. Giselsson, “Low-rank optimization with convex constraints,” 2016.
 [14] C. Grussler, A. Zare, M. R. Jovanovic, and A. Rantzer, “The use of the heuristic in covariance completion problems,” in 55th IEEE Conference on Decision and Control (CDC), 2016.
 [15] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, 2015.
 [16] R. Hesse, D. R. Luke, and P. Neumann, “Alternating projections and Douglas-Rachford for sparse affine feasibility,” IEEE Transactions on Signal Processing, vol. 62, no. 18, pp. 4868–4881, 2014.
 [17] R. Hesse and D. R. Luke, “Nonconvex Notions of Regularity and Convergence of Fundamental Algorithms for Feasibility Problems,” SIAM Journal on Optimization, vol. 23, no. 4, pp. 2397–2419, 2013.
 [18] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I: Fundamentals, ser. Grundlehren der mathematischen Wissenschaften. Springer Berlin Heidelberg, 2013, vol. 305.
 [19] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. Cambridge University Press, 2012.
 [20] E. Levitin and B. Polyak, “Constrained minimization methods,” USSR Computational Mathematics and Mathematical Physics, vol. 6, no. 5, pp. 1–50, 1966.
 [21] A. S. Lewis, “The convex analysis of unitarily invariant matrix functions,” Journal of Convex Analysis, vol. 2, no. 1, pp. 173–183, 1995.
 [22] P.-L. Lions and B. Mercier, “Splitting algorithms for the sum of two nonlinear operators,” SIAM Journal on Numerical Analysis, vol. 16, no. 6, pp. 964–979, 1979.
 [23] D. G. Luenberger, Optimization by Vector Space Methods. John Wiley & Sons, 1968.
 [24] D. R. Luke, “Prox-regularity of rank constraint sets and implications for algorithms,” Journal of Mathematical Imaging and Vision, vol. 47, no. 3, pp. 231–238, 2013.
 [25] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, vol. 52, no. 3, pp. 471–501, 2010.
 [26] G. C. Reinsel and R. Velu, Multivariate Reduced-Rank Regression: Theory and Applications, ser. Lecture Notes in Statistics. Springer New York, 1998, vol. 136.
 [27] R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970, no. 28.
 [28] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory. Academic Press, 1990.
 [29] A. Themelis, L. Stella, and P. Patrinos, “Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms,” 2016.

 [30] R. Vidal, Y. Ma, and S. S. Sastry, Generalized Principal Component Analysis, ser. Interdisciplinary Applied Mathematics. Springer-Verlag New York, 2016, vol. 40.
 [31] G. Watson, “Characterization of the subdifferential of some matrix norms,” Linear Algebra and its Applications, vol. 170, pp. 33–45, 1992.
Appendix A
A.1 Proof of Theorem 1
Proof.
For and , let us define
By [21, Corollary 2.5.] and the unitary invariance of , it can be seen that and have simultaneous SVDs, i.e. if , then . Hence,
Further, [19, Theorem 7.4.8.4.] implies that
for all , which yields that
where the last equality and the inclusion follow by [19, Corollary 7.4.1.3.], [21, Corollary 2.5.] and the unitary invariance of . This proves that
Moreover, by Eq. 4 it follows that implies
By the extended Moreau decomposition (see e.g. [3]) and Proposition 1 it holds that
where . As before, , and can be shown to have simultaneous SVDs which is why
(17) 
Thus if and only if for . Since, only depends on , this is equivalent to
This shows the equivalence between Items iv, iii and ii. Finally note that this is also equivalent to
Since is unique, this can only be true if and thus , which concludes the proof. ∎
A.2 Proof of Lemma 2
Proof.
Let be an SVD of and the corresponding vector of singular values. Further, let for all , be defined as
By [31, Theorem 2] it holds that
(18) 
Next we show that . Letting denote the cardinality, it follows from [19, Theorem 7.4.8.4.] that
and therefore by [18, Corollary VI.4.3.2]
where denotes the convex hull. However, [19, Theorem 7.4.8.4.] implies that if
In this case, only depends on variables , which is why for all and hence all it holds that