1. Introduction
In this paper, we deal with the structured nonsmooth nonconvex minimization problem
(1) 
where we will systematically assume the following hypotheses (see Section 2 for details): [requirements for composite minimization (1)]
is proper and lower semicontinuous (lsc);
is , which is smooth relative to ;
is multiblock strictly convex, coercive and essentially smooth;
the first-order oracle of , , and is available, , has a nonempty set of minimizers, i.e., , and .
Although problem (1) has a simple structure, it covers a broad range of optimization problems arising in signal and image processing, statistical and machine learning, control, and system identification. Consequently, a large body of algorithmic studies has emerged around solving optimization problems of the form (1). Among these methodologies, we are interested in the class of alternating minimization algorithms such as block coordinate descent [beck2015cyclic, beck2013convergence, latafat2019block, nesterov2012efficiency, razaviyayn2013unified, richtarik2014iteration, tseng2001convergence, tseng2009coordinate], block coordinate methods [combettes2015stochastic, fercoq2019coordinate, latafat2019new], and Gauss-Seidel methods [auslender1976optimisation, bertsekas1989parallel, grippo2000convergence], which fix all blocks except one, solve the corresponding auxiliary problem with respect to this block, update it, and continue with the remaining blocks. In particular, proximal alternating minimization has received much attention in the last few years; see, for example, [attouch2008alternating, attouch2007new, attouch2006inertia, attouch2010proximal, attouch2013convergence, beck2016alternating]. Recently, proximal alternating linearized minimization and its variants have been developed to handle (1); see, for example, [bolte2014proximal, pock2016inertial, shefi2016rate].
Traditionally, the Lipschitz (Hölder) continuity of the partial gradients of in (1) has been a necessary tool for the convergence analysis of optimization algorithms; see, e.g., [bolte2014proximal, pock2016inertial]
. It is, however, well-known that it is not the Lipschitz (Hölder) continuity of the gradients that plays the key role in such analyses, but one of its consequences: an upper estimate of
involving a Bregman distance, known as the descent lemma; cf. [bauschke2016descent, lu2018relatively]. This idea is central to the convergence analysis of many optimization schemes requiring such an upper estimate; see, e.g., [ahookhosh2019bregman, bauschke2019linear, bauschke2016descent, bolte2018first, teboulle2018simplified, hanzely2018fastest, hanzely2018accelerated, lu2018relatively, nesterov2018implementable]. If , , is bistrongly convex, , and is smooth, the alternating proximal point and alternating proximal gradient algorithms suggested in [li2019provable] apply, with a saddle-point avoidance guarantee. Further, if the th block of is smooth relative to a kernel function (), a block-coordinate proximal gradient method was recently proposed in [wang2018block], which involves only a limited convergence analysis. Besides these two relevant papers, to the best of our knowledge, there are no other alternating minimization methods for solving (1) under a relative smoothness assumption on , which motivates the quest for such algorithmic development.
1.1. Contribution and organization
In this paper, we propose a Bregman proximal alternating linearized minimization (BPALM) algorithm and its adaptive version (ABPALM) for (1). Our contribution is summarized as follows:


(Bregman proximal alternating linearized minimization) We introduce BPALM, a multiblock generalization of the proximal alternating linearized minimization (PALM) [bolte2014proximal] using Bregman distances, together with its adaptive version (ABPALM). To this end, we extend the notion of relative smoothness [bauschke2016descent, lu2018relatively] to its multiblock counterpart to support structured problems of the form (1). Owing to the multiblock relative smoothness of , unlike PALM, our algorithm does not need to know the local Lipschitz moduli of the partial gradients () or their lower and upper bounds, which are hard to provide in practice. Our framework recovers [wang2018block] by exploiting a sum-separable kernel, and the corresponding algorithm in [li2019provable] is a special case of our algorithm if , , .

(Efficient framework for ONMF and SONMF) Exploiting a suitable kernel function for the Bregman distance, it turns out that the objective functions of ONMF and SONMF are multiblock relatively smooth with respect to this kernel. Further, it is shown that the auxiliary problems of ONMF and SONMF can be solved in closed form, making BPALM and ABPALM suitable for large-scale machine learning and data analysis problems.
This paper consists of three sections besides this introductory section. In Section 2, we introduce the notion of multiblock relative smoothness and verify the fundamental properties of the Bregman proximal alternating linearized mapping. In Section 3, we introduce BPALM and ABPALM and investigate the convergence of the sequences generated by these algorithms. In Section 4, we show that the objective functions of ONMF and SONMF satisfy our assumptions and that the related subproblems can be solved in closed form.
1.2. Notation
We denote by
the extended-real line. For the identity matrix
, we set such that . The open ball of radius centered at is denoted by . The set of cluster points of is denoted by . A function is proper if and , in which case its domain is defined as the set . For , is the (sub)level set of ; and are defined similarly. We say that is level-bounded if is bounded for all . A vector
is a subgradient of at , and the set of all such vectors is called the subdifferential of at ; see [rockafellar2011variational, Definition 8.3].
2. Multiblock Bregman proximal alternating linearized mapping
We first establish the notion of multiblock relative smoothness, which extends relative smoothness [bauschke2016descent, lu2018relatively] to problems of the form (1). We then introduce the Bregman alternating linearized mapping and study some of its basic properties.
In order to extend the definition of Bregman distances for the multiblock problem (1), we first need to introduce the notion of multiblock kernel functions, which will coincide with the standard one (cf. [ahookhosh2019bregman, Definition 2.1]) if .
[multiblock convexity and kernel function] Let be a proper and lsc function with and such that . For a fixed vector , we define the function given by
(2) 
Then, we say that is

multiblock (strongly/strictly) convex if the function is (strongly/strictly) convex for all and ;

multiblock locally strongly convex around if, for , there exists and such that

a multiblock kernel function if is multiblock convex and is coercive for all and , i.e., ;

multiblock essentially smooth if for every sequence converging to a boundary point of for all ;

of multiblock Legendre type if it is multiblock essentially smooth and multiblock strictly convex.
[popular kernel functions] There are many kernel functions satisfying def:kernel. For example, for , the energy, Boltzmann-Shannon entropy, and Fermi-Dirac entropy (cf. [bauschke2018regularizing, Example 2.3]) and several examples in [lu2018relatively, Section 2]; and for , see the two examples in [li2019provable, Section 2]. Two important classes of multiblock kernels are sum-separable kernels, i.e.,
and product-separable kernels, i.e.,
see such a kernel for ONMF in pro:relSmoothNMF0.
We now give the definition of Bregman distances (cf. [bregman1967relaxation]) for multiblock kernels.
[Bregman distance] For a kernel function , the Bregman distance is given by
(3) 
Fixing all blocks except the th one, the Bregman distance with respect to this block is given by
which measures the proximity between and with respect to the th block of variables. Moreover, the kernel is multiblock convex if and only if for all and and . Note that if is multiblock strictly convex, then () if and only if .
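As a concrete numerical illustration (our own, not part of the paper's development), the Bregman distance (3) can be evaluated for two standard kernels: the energy kernel, for which it reduces to half the squared Euclidean distance, and the Boltzmann-Shannon entropy, for which it reduces to the Kullback-Leibler divergence on the simplex. All function names below are our own choices.

```python
import numpy as np

def bregman_distance(h, grad_h, x, y):
    """Bregman distance D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - np.dot(grad_h(y), x - y)

# Energy kernel h(x) = 0.5 * ||x||^2: D_h is half the squared Euclidean distance.
energy = lambda x: 0.5 * np.dot(x, x)
grad_energy = lambda x: x

# Boltzmann-Shannon entropy h(x) = sum_i x_i * log(x_i): on the simplex,
# D_h is the Kullback-Leibler divergence sum_i x_i * log(x_i / y_i).
entropy = lambda x: np.sum(x * np.log(x))
grad_entropy = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.4, 0.4, 0.2])

d_energy = bregman_distance(energy, grad_energy, x, y)  # = 0.5 * ||x - y||^2
d_kl = bregman_distance(entropy, grad_entropy, x, y)    # = KL(x, y) since both sum to 1
```

Both distances are nonnegative and vanish only at x = y, in line with the strict-convexity remark above.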
We are now in a position to present the notion of multiblock relative smoothness, which is the central tool for our analysis in sec:convAnalysis.
[multiblock relative smoothness] Let be a multiblock kernel and let be a proper and lower semicontinuous function. If there exists () such that the functions given by
are convex for all and , then is called smooth relative to .
Note that if , multiblock relative smoothness reduces to the standard relative smoothness introduced only recently in [bauschke2016descent, lu2018relatively]. In this case, if is Lipschitz continuous, then both and are convex, i.e., relative smoothness generalizes the notion of Lipschitz continuity using Bregman distances. If , this definition reduces to the relative bismoothness given in [li2019provable] for .
We next characterize the notion of multiblock relative smoothness.
[characterization of multiblock relative smoothness] Let be a multiblock kernel and let be a proper lower semicontinuous function and . Then, the following statements are equivalent:
smooth relative to ;
for all and ,
(4) 
for all and ,
(5) 
if and for all , then
(6) 
for .
Proof.
Fixing all the blocks except one of them, the results can be concluded in the same way as [lu2018relatively, Proposition 1.1]. ∎
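To make the descent-type upper bound in this characterization tangible, here is a small numerical sketch with a hypothetical single-block pair of our own choosing: f(x) = x^4 does not have a globally Lipschitz gradient, yet it is smooth relative to the kernel h(x) = x^4 + x^2 with constant L = 1, since h - f = x^2 is convex; the analogue of the upper estimate (4) then holds at every pair of points.

```python
import numpy as np

# f(x) = x**4 has no global Lipschitz gradient, but it is 1-smooth relative to
# the kernel h(x) = x**4 + x**2 (since h - f = x**2 is convex).  We check the
# descent inequality  f(x) <= f(y) + f'(y)*(x - y) + L * D_h(x, y)  numerically.
L = 1.0
f  = lambda x: x**4
df = lambda x: 4.0 * x**3
h  = lambda x: x**4 + x**2
dh = lambda x: 4.0 * x**3 + 2.0 * x

def D_h(x, y):
    """Bregman distance of the kernel h in one dimension."""
    return h(x) - h(y) - dh(y) * (x - y)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-5.0, 5.0, size=2)
    assert f(x) <= f(y) + df(y) * (x - y) + L * D_h(x, y) + 1e-9
```

The inequality is equivalent to D_{h-f}(x, y) >= 0, which holds because h - f is convex; the random sampling is only a sanity check, not a proof.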
2.1. Bregman proximal alternating linearized mapping
Recall that if , for a kernel function and a proper lower semicontinuous function , the Bregman proximal mapping is given by
(7) 
which is a generalization of the classical one using the Bregman distance (3) in place of the Euclidean distance; see, e.g., [chen1993convergence] and references therein. We note that
which implies . The function is prox-bounded if there exists such that for some ; cf. [ahookhosh2019bregman]. We next extend this definition to our multiblock setting.
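Before moving to the multiblock setting, a quick numerical illustration of the single-block mapping (7) may help. The setup below is a sketch under our own assumptions (entropy kernel, linear cost on the positive orthant, names of our own choosing), where the Bregman proximal mapping has a closed-form multiplicative update.

```python
import numpy as np

# With the kernel h(x) = sum_i x_i * log(x_i) (so D_h is the unnormalized KL
# divergence) and the linear cost g(x) = <c, x> on the positive orthant, the
# Bregman proximal mapping
#     prox(y) = argmin_x { g(x) + (1/gamma) * D_h(x, y) }
# has the closed form  x_i = y_i * exp(-gamma * c_i),  obtained by setting the
# gradient  c_i + (1/gamma) * log(x_i / y_i)  to zero and solving for x_i.
def bregman_prox_linear_entropy(c, y, gamma):
    return y * np.exp(-gamma * c)

c = np.array([1.0, -0.5, 2.0])
y = np.array([0.3, 0.2, 0.5])
x_star = bregman_prox_linear_entropy(c, y, gamma=0.1)
```

The multiplicative form keeps the iterate strictly positive, which is one reason entropy-type kernels are natural on the nonnegative orthant.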
[multiblock prox-boundedness] A function is multiblock prox-bounded if for each there exist and such that
The supremum of the set of all such is the threshold of prox-boundedness, i.e.,
(8) 
For the problem (1), we have leading to
(9) 
that is, we denote . If is multiblock prox-bounded for , so is for all . We next present equivalent conditions for this notion.
[characteristics of multiblock proxboundedness] For a multiblock kernel function and proper and lsc functions with , the following statements are equivalent:
is multiblock prox-bounded;
for all and given in (2), is bounded below on for some ;
for all , .
Proof.
Suppose and let . Then, for all , it holds that
Notice that is strictly convex and coercive, and as such is lower bounded. Conversely, suppose that . Then, from (9), we obtain
which is finite, owing to coercivity of .
Suppose that . Since is coercive, we have
Conversely, suppose . Then, there exists such that whenever . In particular
where the last inequality follows from coercivity of . Since owing to lower semicontinuity, we conclude that is lower bounded on . ∎
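As a simple numerical illustration of the threshold in (8) (a toy one-dimensional example of our own, with the energy kernel), the concave quadratic g(x) = -0.5*a*x^2 is prox-bounded exactly for parameters below 1/a: the proximal model is bounded below for gamma < 1/a and unbounded below past this threshold.

```python
import numpy as np

# g(x) = -0.5 * a * x**2 with the energy kernel: the proximal model
#     g(x) + (1/(2*gamma)) * (x - y)**2
# has quadratic coefficient  -a/2 + 1/(2*gamma),  so it is bounded below
# iff gamma < 1/a; the prox-boundedness threshold is therefore 1/a.
a = 2.0   # threshold 1/a = 0.5
y = 1.0
model = lambda x, gamma: -0.5 * a * x**2 + (x - y)**2 / (2.0 * gamma)

xs = np.linspace(-100.0, 100.0, 20001)
below_threshold = model(xs, 0.4).min()  # gamma = 0.4 < 0.5: finite infimum
above_threshold = model(xs, 0.6).min()  # gamma = 0.6 > 0.5: diverges to -inf as |x| grows
```

On the grid the first minimum is attained at x = 5 with value -5, while the second keeps decreasing toward the grid boundary, mirroring the dichotomy in the proposition above.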
Let us now define the function as
(10) 
and the set-valued Bregman proximal alternating linearized mapping as
(11) 
which reduces to the Bregman forward-backward splitting mapping if ; cf. [bolte2018first, ahookhosh2019bregman].
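To convey how a mapping of the form (11) drives a block-by-block scheme, the following is a minimal sketch of one alternating sweep on a toy nonnegative factorization model f(U, V) = 0.5*||M - UV||_F^2 of our own choosing (this is an illustration in the spirit of BPALM, not the paper's algorithm). We assume the block energy kernel, so each Bregman proximal step reduces to a projected gradient step with step size 1/L_i, where L_i is the exact block Lipschitz modulus; all names (bpalm_sweep, obj) are hypothetical.

```python
import numpy as np

def bpalm_sweep(M, U, V, eps=1e-12):
    """One alternating sweep: Bregman (here Euclidean) proximal step per block."""
    # Block U: gradient of f w.r.t. U with V fixed, then projection onto U >= 0.
    L_U = np.linalg.norm(V @ V.T, 2) + eps   # block Lipschitz modulus of grad_U f
    U = np.maximum(U - (1.0 / L_U) * (U @ V - M) @ V.T, 0.0)
    # Block V: same step with the freshly updated U (Gauss-Seidel style).
    L_V = np.linalg.norm(U.T @ U, 2) + eps
    V = np.maximum(V - (1.0 / L_V) * U.T @ (U @ V - M), 0.0)
    return U, V

rng = np.random.default_rng(1)
M = rng.random((8, 6))
U, V = rng.random((8, 3)), rng.random((3, 6))
obj = lambda U, V: 0.5 * np.linalg.norm(M - U @ V) ** 2

vals = [obj(U, V)]
for _ in range(50):
    U, V = bpalm_sweep(M, U, V)
    vals.append(obj(U, V))
```

The recorded objective values decrease monotonically, which is exactly the behavior that the Bregman proximal alternating inequality below formalizes.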
[majorization model] Note that, invoking fac:relSmoothEqvi2, the multiblock ()-relative smoothness assumption on entails the majorization model
for .
In the next lemma, we show that the cost function is monotonically decreasing by minimizing the model (10) with respect to each block of variables.
[Bregman proximal alternating inequality] Let the conditions in ass:basic:fgh hold, and let with . Then,
(12) 
for all .
Proof.
Recall that a function with values is level-bounded in locally uniformly in if for each and there is a neighborhood of along with a bounded set such that for all ; cf. [rockafellar2011variational]. Using this definition, the fundamental properties of the mapping are investigated in the subsequent result.
[properties of Bregman proximal alternating linearized mapping] Under conditions given in ass:basic:fgh for , the following statements are true:

is nonempty, compact, and outer semicontinuous (osc) for all ;

;

If , then ;
Proof.
For a fixed and a vector , let us define the function given by
Since and are proper and lsc, so is on the set , for a constant . We show that is level-bounded in locally uniformly in . If not, then there exist , with , and such that with and . This guarantees that, for sufficiently large , , i.e., and
Setting , pro:proxBoundedness2 ensures that there exists a constant such that
Subtracting the last two inequalities, it holds that
Expanding , dividing both sides by , and taking the limit on both sides of this inequality as , it can be deduced that
This leads to the contradiction , which implies that is level-bounded. Therefore, all assumptions of the parametric minimization theorem [kan2012moreau, Theorem 2.2 and Corollary 2.2] are satisfied, i.e., pro:proxPro1 holds. If , then lem:proxAltIneq implies that for , i.e., ; the second inclusion follows from ass:basic:argmin. ∎
[sum or product separable kernel] Let us observe the following.

If is an additively separable function, i.e., , then (11) can be written in the form