# Task Embedded Coordinate Update: A Realizable Framework for Multivariate Non-convex Optimization

In this paper we propose TECU, a realizable framework that embeds task-specific strategies into the update schemes of coordinate descent, for optimizing multivariate non-convex problems with coupled objective functions. On one hand, TECU improves algorithmic efficiency by embedding efficient numerical algorithms for univariate sub-problems with nice properties. On the other hand, it increases the probability of obtaining desired results by embedding advanced techniques developed for realistic tasks. By integrating numerical algorithms and advanced techniques, TECU provides a unified framework for solving a class of non-convex problems. Although the task embedded strategies introduce inaccuracies in the sub-problem optimizations, we provide a realizable criterion that controls the errors and ensures robust performance with rigorous theoretical analysis. By embedding ADMM and a residual-type CNN, respectively, in our algorithm framework, the experimental results verify both the efficiency and the effectiveness of embedding task-oriented strategies in coordinate descent for solving practical problems.

## Introduction

Over the past few decades, multivariate non-convex optimization has attracted wide attention in pattern recognition and machine learning. Having achieved good performance in tasks such as matrix factorization [Vu and Monga2016, Bao et al.2016] and image enhancement [Fu et al.2016, Gharbi et al.2017], multivariate non-convex problems have motivated a revived interest in designing and analyzing numerical algorithms.

Compared to univariate optimization, it is much more complicated to optimize multivariate problems with coupled objective functions. Taking two variables as an example, this kind of coupled problem can be formulated as:

$$\min_{z:=(x,y)} \Psi(z) := f(x) + g(y) + H(x,y), \tag{1}$$

where $x$ and $y$ are vectors or matrices. Coordinate descent (CD) [Luo and Tseng1989] is widely used for solving problem (1) across a variety of tasks: it optimizes the objective over each variable in turn while fixing the other at its latest value, i.e., it solves univariate optimization problems in a loop. In this way, computing the coordinate updates is much simpler than computing a full update and requires less memory and computational cost. However, beyond these benefits, few CD algorithms exploit useful traits of the univariate sub-problems to improve either the convergence speed or the optimized results when solving the generic problem (1) with a non-convex, non-smooth objective function.

In most cases, though $\Psi$ is non-convex and even non-smooth, it is quite likely to have univariate sub-problems with nice properties: e.g., the sub-problems can be optimized via convex optimization, or may have unique solutions. Moreover, the univariate problems usually have entirely different forms for different variables. For example, much of the dictionary learning literature reports advantages from restricting dictionaries to normalized bases while enforcing sparsity of the codes with various non-convex penalties [Gregor and Lecun2010, Wang et al.2016, Bao et al.2016]. Though these models are non-convex and non-smooth, the univariate sub-problem for the dictionary can be solved efficiently compared with the other sub-problem. Thus, in view of the nice traits and specificities of univariate problems, it is worthwhile to integrate effective algorithms tailored to task-specific sub-problems, to improve the efficiency and effectiveness of CD schemes.

More critically, though there are quite a few CD algorithms for solving multivariate optimization problems, converging to a critical point is still considered a good result for generic non-convex and non-smooth problems [Bolte, Sabach, and Teboulle2014, Xu and Yin2015, Pock and Sabach2017]. Meanwhile, we have noticed that many univariate sub-problems of real-world image processing tasks correspond to specific application problems. E.g., in tasks like image deblurring and super-resolution, one univariate sub-problem can be regarded as an image denoising task. Beyond numerical algorithms, techniques such as BM3D and CNNs are effective for solving the image denoising problem. Although such advanced techniques mostly lack theoretical support, they are able to efficiently map the variables into small neighborhoods of the desired solutions. Considering their effectiveness, it is worthwhile to integrate them into CD schemes, in the expectation of obtaining desired results with high probability [Zhang et al.2017, Chan, Wang, and Elgendy2017].

The above-mentioned strategies, integrating either numerical algorithms or advanced techniques, have already appeared in applications for specific tasks, which will be briefly reviewed later. However, the success of those CD schemes, designed for specific problems, cannot be straightforwardly replicated on other tasks. Moreover, no unified CD framework integrating both numerical algorithms and advanced techniques has yet been proposed for optimizing the generic multivariate non-convex problem (1). More importantly, few of them provide rigorous theoretical analyses that characterize the properties of the final optimized results. Considering all these aspects, in this paper we propose a realizable algorithm framework, which embeds various task-oriented strategies in the updates of the CD scheme, for effectively solving the generic problem (1). We name our proposed algorithm TECU (task embedded coordinate update), and the main contributions are summarized as follows:

1. For optimizing the generic multivariate problem (1) with a coupled non-convex objective, we propose the algorithm TECU, which embeds task-oriented techniques for optimizing specific univariate sub-problems of the CD update. Moreover, we provide a realizable condition that ensures robust performance of TECU, together with theoretical analyses.

2. Considering the nice properties of univariate sub-problems, TECU is able to improve algorithmic efficiency by embedding highly efficient numerical algorithms into its framework. We take the $\ell_0$-regularized dictionary learning task as an example and embed ADMM to accelerate the convergence of the whole algorithm. Experiments conducted on synthetic data verify the efficiency of TECU in comparison with other existing numerical algorithms.

3. By embedding advanced techniques, TECU is likely to obtain desired solutions with high probability, which is an advantage over most numerical algorithms for non-convex optimization. Taking low-light image enhancement as an example, we embed a residual-type CNN to optimize the univariate problem of the illumination layer. Compared with state-of-the-art methods, the experimental results show the advantage of embedding networks for real-world tasks and verify the effectiveness of integrating networks and CD schemes in a unified algorithm framework.

### Related Work

For solving general multivariate non-convex problems, the most classical method that adopts the CD scheme is the proximal alternating method (PAM) [Attouch et al.2010]. However, it is inapplicable to most coupled problems because it requires explicit solutions for every univariate sub-problem. To get around this limitation, PALM linearizes the coupled function in pursuit of explicit solutions [Bolte, Sabach, and Teboulle2014]. However, it requires computing exact Lipschitz constants during iterations, and even estimating tight upper bounds for them is sometimes time-consuming [Bao et al.2016, Xu and Yin2015]. Moreover, loose upper bounds slow down the convergence of PALM. These difficulties in estimating Lipschitz constants also exist in CD variants like BCU [Xu and Yin2017] and iPALM [Pock and Sabach2017]. Besides these defects, the updates of existing CD algorithms ignore task specificities, i.e., they optimize every univariate sub-problem with the same scheme, which is less efficient in practice.

Unlike algorithms for general problems, it is common to embed numerical algorithms for optimizing sub-problems in real-world applications [Guo and Ma2014, Li and Brown2014, Li et al.2016, Yue et al.2017]. Such algorithms often make good use of the nice traits of univariate sub-problems, and thus tend to be efficient and to perform well. However, their specificity limits generalization: the well-designed algorithms usually cannot be transferred to other models. In addition, those specialized algorithms mostly have relatively weak convergence guarantees in theory, so their efficiency often lacks robustness.

Quite recently, fusing advanced techniques into optimization frameworks has become an active research topic for real-world applications [Schmidt and Roth2014, Liu et al.2018, Zhang et al.2017]. For example, the authors of [Schmidt and Roth2014] learn a cascade of shrinkage fields to replace hand-designed priors in a half-quadratic optimization for image restoration. Instead of designing complex regularizers, [Zhang et al.2017] learns a CNN-based denoiser to replace the corresponding sub-problem in their optimization framework. These novel methods usually achieve remarkable performance thanks to the power of advanced techniques. However, their success relies on completely replacing univariate sub-problems with advanced techniques, so few of them can characterize the properties of the final results with rigorous theoretical analyses.

### Preliminaries

In general, the objective function of problem (1) is assumed to satisfy: (1) $f$ and $g$ are proper and lower semi-continuous (l.s.c.); (2) $H$ is a $C^1$ function whose gradient and partial gradients are Lipschitz continuous on bounded sets; (3) $\Psi$ is coercive, that is, it is bounded from below and $\Psi(z) \to \infty$ when $\|z\| \to \infty$, where $\|\cdot\|$ denotes the Frobenius norm. Meanwhile, $\Psi$ is a Kurdyka-Łojasiewicz (KŁ) function.

Notice that all semialgebraic functions and subanalytic functions satisfy the KŁ property. Typical semialgebraic functions include real polynomial functions, $\|x\|_p$ with $p \geq 0$, and indicator functions of semialgebraic sets, Stiefel manifolds and constant rank matrices [Attouch et al.2010].

## Task Embedded Coordinate Update

Depending on the specific task, the univariate sub-problems of the generic model (1) usually either have desirable characteristics or correspond to certain single-variable tasks. Considering these traits, we embed powerful strategies in the CD scheme for optimizing task-oriented univariate sub-problems, thereby improving the convergence speed and the optimized results of the whole algorithm.

The most basic CD scheme optimizes the objective cyclically over each variable, that is, it successively solves the following sub-problems to update $x$ and $y$ at iteration $t$:

$$\begin{aligned} &\min_{x}\; f(x) + H(x, y^{t-1}) + \frac{\eta_1}{2}\|x - x^{t-1}\|^2, \\ &\min_{y}\; g(y) + H(x^{t}, y) + \frac{\eta_2}{2}\|y - y^{t-1}\|^2. \end{aligned} \tag{2}$$

For these univariate sub-problems, we introduce numerical algorithms and advanced techniques, respectively, and optimize them under a practical error control condition, which provides a criterion for the optimization precision and helps to characterize the properties of the final optimized results.
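As a minimal sketch of the basic scheme in Eq. (2), consider a toy instance chosen here purely for illustration (it is not one of the paper's tasks): $f = g = 0$ and $H(X,Y) = \frac{1}{2}\|M - XY^\top\|^2$. Under that assumption both proximal sub-problems have closed-form solutions, so each outer iteration reduces to two small linear solves. The function names and default parameters below are hypothetical.

```python
import numpy as np

def objective(M, X, Y):
    """Psi(X, Y) = 0.5 * ||M - X Y^T||_F^2 (toy case with f = g = 0)."""
    return 0.5 * np.linalg.norm(M - X @ Y.T) ** 2

def proximal_cd(M, rank, eta1=1e-2, eta2=1e-2, iters=30, seed=0):
    """Cyclic proximal coordinate descent following the pattern of Eq. (2)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((M.shape[0], rank))
    Y = rng.standard_normal((M.shape[1], rank))
    eye = np.eye(rank)
    history = [objective(M, X, Y)]
    for _ in range(iters):
        # x-step: X (Y^T Y + eta1 I) = M Y + eta1 X_prev, with Y fixed
        X = np.linalg.solve(Y.T @ Y + eta1 * eye, (M @ Y + eta1 * X).T).T
        # y-step: Y (X^T X + eta2 I) = M^T X + eta2 Y_prev, using the new X
        Y = np.linalg.solve(X.T @ X + eta2 * eye, (M.T @ X + eta2 * Y).T).T
        history.append(objective(M, X, Y))
    return X, Y, history

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
X, Y, history = proximal_cd(M, rank=2)
print(history[0], history[-1])
```

Because each step solves its proximal sub-problem exactly, the objective cannot increase from one iteration to the next. The task embedded strategies discussed in this section relax exactly that requirement.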

### Task Embedded Strategies

Owing to the diversity of realistic tasks, the univariate sub-problems in Eq. (2) usually have distinct objective functions. Taking their specificities into consideration, we introduce two targeted strategies, i.e., numerical algorithms and advanced techniques, for optimizing these univariate sub-problems.

#### Numerical Algorithms Embedding

In general, there is no need to employ extra numerical algorithms for the sub-problems in Eq. (2) if their explicit solutions are easily obtained. However, in most cases with a coupled $H$, it is common to adopt linearization to obtain easy-to-solve sub-problems. As mentioned in the related work, linearization requires estimating Lipschitz constants at every iteration, which is time-consuming.

We have noticed that, though the objective function is non-convex and non-smooth, sub-problems corresponding to specific tasks may possess nice traits, e.g., they are convex and sometimes even differentiable, or they have unique solutions. There are plenty of highly efficient numerical algorithms designed for solving such univariate problems, e.g., greedy algorithms [Elad2010], PCG [Spillane2016], FISTA [Kim and Fessler2018] and ADMM [Wang, Yin, and Zeng2018]. It is therefore advantageous to embed efficient algorithms in the CD scheme to improve the efficiency and effectiveness of the whole algorithm, which is one of the motivations for proposing TECU.

#### Advanced Techniques Embedding

For real-world image processing tasks, most univariate sub-problems correspond to specific applications. For example, in image deblurring and super-resolution, one univariate sub-problem can be seen as an image denoising task [Chan, Wang, and Elgendy2017, Zhang et al.2017]. For low-light image enhancement, one of the two sub-problems estimates the illumination layer, while the other restores the reflection image.

Besides optimization algorithms, there are plenty of advanced techniques such as BM3D [Chan, Wang, and Elgendy2017] and variants of neural networks [Zhang et al.2017, Gharbi et al.2017] for solving single-variable tasks. Unlike numerical algorithms, advanced techniques like neural networks obtain their results by a pre-trained propagation rather than by optimizing a mathematical model. Such techniques are mostly quite efficient and are able to propagate variables very close to the desired solutions. Taking these advantages into consideration, we propose to embed advanced techniques into the CD scheme, to improve the convergence speed of the whole algorithm and to obtain desired solutions with high probability.

To embed the above-mentioned strategies into a unified framework, we adopt $K_x^t$ and $K_y^t$ steps of updates, starting at $x^{t-1}$ and $y^{t-1}$, to optimize the problems of Eq. (2) to some extent, namely:

 (3)

where $\circ$ denotes the composition operator. Each step in the composition can be set as either a one-step iteration of a numerical algorithm or a propagation of an advanced technique.

This general framework covers various existing methods [Yue et al.2017, Schmidt and Roth2014, Zhang et al.2017]: e.g., for solvers like [Yue et al.2017], each step can be seen as a one-step iteration of a numerical algorithm; for others like [Zhang et al.2017], there is only a single propagation step of an advanced technique. Besides, Eq. (3) also includes cases that employ numerical algorithms and advanced techniques in a hybrid manner, which is far more flexible than existing solvers. Furthermore, considering two scenarios, namely 1) one problem of Eq. (2) has a closed-form solution, and 2) not all the sub-problems have efficient numerical algorithms, we propose TECU as a hybrid updating scheme (as shown in Alg. 1) that also includes two classical CD updates (see Table 1).
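The composition of inner steps described above can be sketched as follows. This is only an illustration under stated assumptions: the helper `embedded_update`, the toy quadratic sub-problem, the stand-in `rough_guess` (playing the role of a pre-trained propagation) and the accuracy test comparing a gradient norm with `C` times the step length are all hypothetical choices, not the paper's Alg. 1 or Alg. 2.

```python
import numpy as np

def embedded_update(x_prev, first_step, fallback_step, is_accurate, max_steps=500):
    """Apply one embedded step, then repeat a fallback step until accurate."""
    x = first_step(x_prev)
    used = 1
    while used < max_steps and not is_accurate(x, x_prev):
        x = fallback_step(x)
        used += 1
    return x, used

# Toy sub-problem: phi(x) = 0.5 * ||A x - b||^2 + (eta / 2) * ||x - x_prev||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
eta, C = 1.0, 0.1
x_prev = np.zeros(3)

def grad_phi(x):
    return A.T @ (A @ x - b) + eta * (x - x_prev)

step_size = 1.0 / (np.linalg.norm(A, 2) ** 2 + eta)  # 1 / Lipschitz constant

def rough_guess(x):
    # Hypothetical stand-in for a learned propagation: jump halfway
    # toward the least-squares solution of the data-fit term.
    return x + 0.5 * (np.linalg.lstsq(A, b, rcond=None)[0] - x)

def gradient_step(x):
    return x - step_size * grad_phi(x)

def is_accurate(x, x_old):
    # Assumed accuracy test: residual of the optimality condition
    # is small relative to the step taken.
    return np.linalg.norm(grad_phi(x)) <= C * np.linalg.norm(x - x_old)

x_new, used = embedded_update(x_prev, rough_guess, gradient_step, is_accurate)
print(used, x_new)
```

Keeping the first step and the fallback step as separate callables makes it easy to swap a network propagation for a numerical iteration, or to mix both, without touching the outer loop.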

### Error Control and Estimation

Notice that our task embedded strategies do not require exactly optimizing the problems in Eq. (2): updating $x$ and $y$ by task embedded strategies brings errors $e_x^t$ and $e_y^t$ to the first-order optimality conditions of the univariate sub-problems:

$$\begin{aligned} e_x^t &= g_x^t + \nabla_x H(x^t, y^{t-1}) + \eta_1 (x^t - x^{t-1}), \\ e_y^t &= g_y^t + \nabla_y H(x^t, y^t) + \eta_2 (y^t - y^{t-1}), \end{aligned} \tag{4}$$

where $g_x^t \in \partial f(x^t)$ and $g_y^t \in \partial g(y^t)$, with $\partial$ denoting the Fréchet limiting-subdifferential [Attouch et al.2010].

Clearly, imprecise task embedded calculations slow down the convergence of the whole algorithm, while over-precise optimizations are time-consuming and unnecessary in practice. Hence, we provide the following criterion to control the accuracy of the univariate sub-problem optimizations.

###### Criterion 1

The errors $e_x^t$ and $e_y^t$ brought by task embedded strategies should be controlled by certain constants, i.e.,

$$\|e_x^t\| \leq C_x \epsilon_x^t, \qquad \|e_y^t\| \leq C_y \epsilon_y^t, \tag{5}$$

with and . Moreover, and should be satisfied.

Notice that the conditions in Eq. (5) are attainable for convergent algorithms, since the errors $e_x^t$ and $e_y^t$ approach zero as the algorithms converge. However, since $g_x^t$ and $g_y^t$ cannot be arbitrarily selected from the sets $\partial f(x^t)$ and $\partial g(y^t)$, we provide in Prop. 1 an implementation for estimating the errors, to make our proposed TECU more practical. (All the proofs in this paper are presented in [Wang et al.2018].)

###### Proposition 1

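Two intermediate variables $\tilde{x}^{t,K_x^t}$ and $\tilde{y}^{t,K_y^t}$ are calculated with respect to $x^{t,K_x^t}$ and $y^{t,K_y^t}$ as follows (here $\operatorname{prox}$ denotes the proximal mapping of a proper, l.s.c. function):

$$\begin{aligned} \tilde{x}^{t,K_x^t} &= \operatorname{prox}^{1}_{f}\big(\eta_1 x^{t-1} + P_x^{t-1}(x^{t,K_x^t})\big), \\ \tilde{y}^{t,K_y^t} &= \operatorname{prox}^{1}_{g}\big(\eta_2 y^{t-1} + P_y^{t-1}(y^{t,K_y^t})\big), \end{aligned} \tag{6}$$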

with functions and . Then, the following

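$$\begin{aligned} e_x^{t,K_x^t} &= P_x^{t-1}(x^{t,K_x^t}) - P_x^{t-1}(\tilde{x}^{t,K_x^t}), \\ e_y^{t,K_y^t} &= P_y^{t-1}(y^{t,K_y^t}) - P_y^{t-1}(\tilde{y}^{t,K_y^t}), \end{aligned} \tag{7}$$

are implementations of $e_x^t$ and $e_y^t$ in Eq. (4), obtained by assigning $\tilde{x}^{t,K_x^t}$ to $x^t$ and $\tilde{y}^{t,K_y^t}$ to $y^t$.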
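The error estimate of Proposition 1 can be checked numerically on a small example. The sketch below rests on several assumptions that are not stated in the text: the auxiliary function is taken as $P(x) = (1-\eta)x - \nabla_x H(x)$, which is what Eqs. (4), (6) and (7) jointly imply when the prox has unit parameter; $f$ is chosen as $\lambda\|x\|_1$ so that the prox is soft-thresholding; and the bound in Criterion 1 is taken as `C` times the step length. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, lam):
    """Unit-parameter prox of lam * ||.||_1: argmin_x lam*||x||_1 + 0.5*||x - v||^2."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def estimate_error(x_inner, x_prev, grad_H, eta, lam):
    """Return (x_tilde, e, g_tilde) in the pattern of Eqs. (6) and (7)."""
    def P(x):
        # Assumed form of the auxiliary function (inferred, not from the text).
        return (1.0 - eta) * x - grad_H(x)
    v = eta * x_prev + P(x_inner)
    x_tilde = soft_threshold(v, lam)   # Eq. (6)
    e = P(x_inner) - P(x_tilde)        # Eq. (7)
    g_tilde = v - x_tilde              # a subgradient of f at x_tilde
    return x_tilde, e, g_tilde

def satisfies_criterion(e, x_new, x_prev, C):
    # Assumed instance of Eq. (5): the bound is C times the step length.
    return np.linalg.norm(e) <= C * np.linalg.norm(x_new - x_prev)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
grad_H = lambda x: A.T @ (A @ x - b)   # H(x) = 0.5 * ||A x - b||^2, y held fixed
x_prev = rng.standard_normal(4)
x_inner = rng.standard_normal(4)
eta, lam = 2.0, 0.3

x_tilde, e, g_tilde = estimate_error(x_inner, x_prev, grad_H, eta, lam)
residual = g_tilde + grad_H(x_tilde) + eta * (x_tilde - x_prev)
print(np.allclose(e, residual), satisfies_criterion(e, x_tilde, x_prev, C=0.5))
```

The point of the construction is that `g_tilde` is a valid subgradient by definition of the prox, so the error in Eq. (4) becomes computable without having to pick a subgradient by hand.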

So far, we have introduced the whole process of TECU; its detailed procedure is given in Alg. 2. In the following, we demonstrate that our error control condition is better founded than previously used criteria [Li and Pong2014, Yue et al.2017], since it comes with a convergence guarantee.

## Theoretical Analyses

With the properties of the objective function and the error control criterion, we present in this section some nice properties of TECU. Firstly, we demonstrate that as the iterations of TECU progress, there exists a bounded function that satisfies a sufficient-descent property.

###### Proposition 2

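Suppose that the sequence $\{z^t\}$ is generated by TECU. Then, under Criterion 1, there exists a bounded function $\Phi$ such that, for each $t$:

$$\Phi(z^t, z^{t-1}) - \Phi(z^{t+1}, z^t) \geq a\|z^{t+1} - z^t\|^2, \tag{8a}$$

$$\operatorname{dist}\big(0, \partial\Phi(z^t, z^{t-1})\big) \leq b\big(\|z^t - z^{t-1}\| + \|z^{t-1} - z^{t-2}\|\big), \tag{8b}$$

with positive constants $a$ and $b$. In addition, the sequence $\{z^t\}$ generated by TECU is bounded.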

Since TECU adopts a hybrid update scheme (presented in Alg. 1), $\Phi$, $a$ and $b$ take different concrete forms corresponding to distinct combinations of updates [Wang et al.2018]. With this proposition, we are ready to characterize the final solution obtained by TECU.

###### Theorem 3

The sequence $\{z^t\}$ generated by TECU is a Cauchy sequence, which converges to a critical point of the original objective function $\Psi$.

Theorem 3 characterizes the final optimized solution and demonstrates the robust performance of our proposed algorithm framework. Moreover, TECU has at least a sub-linear convergence rate under a condition on the desingularising function of the objective, and it further attains a linear convergence rate under an additional condition on the same function. Though this convergence rate is in accordance with previous CD schemes like [Attouch et al.2010], TECU is far more efficient and effective for realistic tasks.

## TECU for Realistic Tasks

As previously stated, TECU allows embedding both numerical algorithms and advanced techniques in its algorithm framework. Hence, we consider two realistic tasks, $\ell_0$-regularized dictionary learning (DL) and low-light image enhancement (LIE), to verify both the efficiency and the effectiveness of embedding task-oriented strategies.

### ℓ0-Regularized Dictionary Learning

Dictionary learning is a powerful tool for learning features from data. Its basic idea is to factorize the data $Y$ as $DW^\top$, where $D$ is the dictionary and $W$ contains the corresponding coefficients. We consider the previously proposed DL problem [Bao et al.2016], which can be modeled as

$$\min_{W,D}\; \lambda\|W\|_0 + \mathcal{X}_{\mathcal{D}}(D) + \frac{1}{2}\|Y - DW^\top\|^2, \tag{9}$$

where $\|\cdot\|_0$ denotes the $\ell_0$ penalty that counts the number of non-zero elements, and the indicator function $\mathcal{X}_{\mathcal{D}}$ acts on the set $\mathcal{D}$ of normalized bases.

Notice that the two sub-problems of DL have entirely different characteristics. The sub-problem of $W$ is a sparse coding task with the $\ell_0$ penalty, which can be optimized by a proximal iterative hard-thresholding (PITH) method [Bach et al.2011]. The sub-problem of $D$, in contrast, minimizes a strongly convex quadratic function with a unit ball constraint, so it can be solved efficiently. Considering this nice property, we embed ADMM for optimizing the $D$ sub-problem, and the experimental results verify that embedding ADMM improves the efficiency of the optimization. However, due to the non-convexity of the sub-problem of $W$, additional experiments show that embedding PITH is not a good choice for TECU, which also indicates the necessity of the hybrid scheme.
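To make the two building blocks concrete, here is a minimal sketch of one prox-linear (hard-thresholding) step for the $W$ sub-problem of Eq. (9) and of a column-wise projection for $D$. It assumes the shapes $Y \in \mathbb{R}^{m \times n}$, $D \in \mathbb{R}^{m \times k}$, $W \in \mathbb{R}^{n \times k}$, and it assumes the constraint set is the unit ball per column; if the intended set is the unit sphere, the projection would normalize every column instead. The function names are hypothetical, and this is neither the embedded ADMM solver nor the full PITH method.

```python
import numpy as np

def hard_threshold(V, lam, eta):
    """argmin_W lam * ||W||_0 + (eta / 2) * ||W - V||_F^2, solved entry-wise."""
    thr = np.sqrt(2.0 * lam / eta)
    return np.where(np.abs(V) > thr, V, 0.0)

def prox_linear_W_step(Y, D, W, lam, eta):
    """One linearized step on W for H(W) = 0.5 * ||Y - D W^T||_F^2."""
    grad = (W @ D.T - Y.T) @ D          # gradient of H with respect to W
    return hard_threshold(W - grad / eta, lam, eta)

def project_unit_ball_columns(D):
    """Project every column of D onto the unit ball (assumed constraint set)."""
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, 1.0)

def dl_objective(Y, D, W, lam):
    return lam * np.count_nonzero(W) + 0.5 * np.linalg.norm(Y - D @ W.T) ** 2

rng = np.random.default_rng(0)
D = project_unit_ball_columns(3.0 * rng.standard_normal((4, 3)))
Y = rng.standard_normal((4, 10))
W = rng.standard_normal((10, 3))
lam = 0.1
eta = 1.1 * np.linalg.norm(D, 2) ** 2   # slightly above the Lipschitz constant
W_new = prox_linear_W_step(Y, D, W, lam, eta)
print(dl_objective(Y, D, W, lam), dl_objective(Y, D, W_new, lam))
```

Choosing `eta` above the Lipschitz constant of the gradient guarantees that the step does not increase the objective, which is the price the linearized schemes pay in having to estimate that constant.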

Besides TECU, the DL problem can also be optimized by other numerical algorithms. However, as stated above, though the update of PALM can be computed simply, its need to estimate Lipschitz constants slows down the overall convergence, especially for large-scale data (see Table 3). In addition to PALM, its two variants, i.e., BCU [Xu and Yin2017] and iPALM [Pock and Sabach2017], are also employed for solving the DL problem. But these variants do not avoid estimating Lipschitz constants; even worse, their efficiency is restricted by additional parameter conditions. To avoid these difficulties, [Bao et al.2016] use a strategy (named INV) that solves the sub-problem of $D$ by first leaving out the $\mathcal{X}_{\mathcal{D}}$ term and then directly projecting the result onto $\mathcal{D}$. However, we have noticed that the update of INV is inexact and has no theoretical support, so its performance lacks robustness (see Fig. 1).

### Low-light Image Enhancement

The purpose of LIE is to enhance captured low-visibility images so that high-quality images can be obtained. For LIE, the Retinex-based decomposition [Cai et al.2017, Fu et al.2016] is widely used: $O = I \odot R$, with element-wise multiplication operator $\odot$. Thus LIE factorizes the observed low-light image $O$ into an illumination layer $I$ that represents the light intensity, and a reflection layer $R$ that describes the physical characteristics of objects.

Considering the characteristics of the illumination layer, many works [Fu et al.2015, Cai et al.2017] enforce a smoothness constraint, i.e., the term $\|\nabla I\|^2$, to represent the smooth changes of the illumination layer. Together with the range constraints of both layers, we establish the following optimization model for the LIE task:

$$\min_{I,R}\; \frac{\alpha}{2}\|\nabla I\|^2 + \mathcal{X}_{\mathcal{I}}(I) + \mathcal{X}_{\mathcal{R}}(R) + \frac{1}{2}\|O - I \odot R\|^2, \tag{10}$$

with , .

Employing TECU to optimize it, we embed a residual-type CNN to propagate the illumination layer very close to the desired solution. Specifically, we first randomly choose 800 image pairs from the ImageNet database [Krizhevsky, Sutskever, and Hinton2012] and crop them into small patches, to train a neural network with only one residual block including 7 convolutional layers and ReLU activations. Then, at each iteration, we first use the pre-trained network to propagate the latest value of $I$. Since the pre-trained network may not always satisfy the error control conditions, we further employ prox-linear updates as the remaining propagations until Criterion 1 is satisfied. For the other sub-problem, that of the reflection layer, we adopt a proximal update to get the closed-form solution.
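The closed-form proximal update for the reflection layer can be sketched as follows. With $I$ fixed, the sub-problem from Eq. (10) plus the proximal term $\frac{\eta}{2}\|R - R^{t-1}\|^2$ separates over pixels into one-dimensional convex quadratics, so the constrained minimizer is the unconstrained one clipped to the feasible range. The range $[0, 1]$ used below is an assumption (the text does not state the constraint sets here), and the function name is hypothetical.

```python
import numpy as np

def update_reflection(O, I, R_prev, eta, lo=0.0, hi=1.0):
    """Pixel-wise minimizer of 0.5*(O - I*R)^2 + (eta/2)*(R - R_prev)^2 on [lo, hi]."""
    R = (I * O + eta * R_prev) / (I * I + eta)   # unconstrained stationary point
    return np.clip(R, lo, hi)                    # exact for a 1-D convex quadratic

rng = np.random.default_rng(0)
O = rng.uniform(0.0, 1.0, (4, 4))        # observed low-light image
I = rng.uniform(0.05, 1.0, (4, 4))       # current illumination estimate
R_prev = rng.uniform(0.0, 1.0, (4, 4))   # previous reflection estimate
eta = 0.5
R = update_reflection(O, I, R_prev, eta)
print(R.min(), R.max())
```

Since `eta` is positive, the denominator never vanishes, even where the illumination estimate is zero.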

## Experimental Results

With the task embedded strategies designed for the DL and LIE problems, respectively, we apply TECU to DL with synthetic data to verify its convergence properties, while for the LIE task TECU is applied to real-world images. The experimental results on both tasks demonstrate the efficiency and effectiveness of embedding strategies into the CD algorithm, in comparison with other state-of-the-art methods.

### ℓ0-Regularized Dictionary Learning

We generate synthetic data with different sizes (see Table 2) to help analyze the convergence properties of TECU. Specifically, all the algorithms are terminated when the following condition is satisfied:

$$\max\left\{ \frac{\|D^{t+1} - D^t\|}{\|D^t\|},\; \frac{\|W^{t+1} - W^t\|}{\|W^t\|},\; \frac{\|\Psi^{t+1} - \Psi^t\|}{\|\Psi^t\|} \right\} < 10^{-4}. \tag{11}$$
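The stopping rule in Eq. (11) translates directly into code. The sketch below is one possible reading: the function names are hypothetical, and the small `eps` guard against division by zero is an addition that does not appear in the formula.

```python
import numpy as np

def relative_change(new, old, eps=1e-12):
    """||new - old|| / ||old||, with a guard for a zero denominator."""
    return np.linalg.norm(new - old) / max(np.linalg.norm(old), eps)

def has_converged(D_new, D_old, W_new, W_old, psi_new, psi_old, tol=1e-4):
    """True when all three relative changes in Eq. (11) fall below tol."""
    psi_change = abs(psi_new - psi_old) / max(abs(psi_old), 1e-12)
    changes = (relative_change(D_new, D_old),
               relative_change(W_new, W_old),
               psi_change)
    return max(changes) < tol

D = np.ones((4, 3))
W = np.ones((5, 3))
print(has_converged(D, D, W, W, 1.0, 1.0))           # no change at all
print(has_converged(2.0 * D, D, W, W, 1.0, 1.0))     # D doubled
```

Taking the maximum means that a stalled objective value alone is not enough to stop: both variables must also have settled.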

#### Comparisons of Different Embedded Strategies

Firstly, we conduct experiments to compare two particular cases of our proposed framework. Precisely, the label “TECU” in Fig. 1, Table 2 and Table 3 represents the case of embedding ADMM for updating $D$ while using a prox-linear update for the $W$ sub-problem; “TECU-PITH” in Fig. 1 refers to embedding PITH for updating $W$ while applying a prox-linear update for the $D$ sub-problem.

From the first row of Fig. 1, it is clear that these two cases of TECU have quite different convergence behavior. From the top row of Fig. 1(d), we can see that TECU requires only a few propagation steps, whereas TECU-PITH reaches the maximum number of inner steps at almost every iteration. The excessive inner propagations slow down the overall convergence, i.e., TECU uses 2.22s but TECU-PITH takes nearly 600s to converge.

On one hand, this comparison shows that ADMM is productive for optimizing the $D$ sub-problem, while PITH is less effective for solving the sub-problem of $W$. On the other hand, the different performances are also influenced by the characteristics of the sub-problems. The sub-problem of $D$ minimizes a strongly convex quadratic function with a unit ball constraint, so it can be solved efficiently. However, due to the $\ell_0$ penalty, the sub-problem of $W$ is NP-hard, which makes it more difficult to optimize. Therefore, this comparison of different embedded strategies indicates the necessity of employing a hybrid scheme in the framework of TECU, and suggests embedding highly efficient numerical algorithms for sub-problem optimization.

#### Comparisons with Other Algorithms

Compared with other existing algorithms, we can see from Table 2 and Fig. 1 that TECU converges in fewer iteration steps and, in particular, has better convergence behavior on the sub-problem optimization. Moreover, we give further comparisons in Table 3 to show that the computation time of one propagation in TECU is much less than that in the other algorithms. This is why TECU uses more propagations in total but less computation time. Moreover, since PALM, BCU and iPALM all require estimating Lipschitz constants at every iteration, they have similar one-step computation times on the different data scales. Estimating Lipschitz constants during iterations is extremely time-consuming, especially for large-scale data. Thus, although embedding numerical algorithms brings more propagations, it avoids estimating Lipschitz constants and is therefore far more efficient than existing algorithms.

### Low-light Image Enhancement

Firstly, we conduct an experiment in Fig. 2 to compare TECU with the classical CD algorithm, i.e., PAM [Attouch et al.2010], on an example image from [Cai, Gu, and Zhang2018]. Since [Cai, Gu, and Zhang2018] provides pairs of low-light images and references obtained by other techniques, we report PSNR values with respect to the given reference as a quantitative evaluation. As shown in Fig. 2, TECU achieves superior performance in terms of both visual effect and PSNR score. Moreover, we plot the PSNR curves in Fig. 2(e), from which we can tell that embedding a network into the classical CD scheme produces a better growth trend than the CD scheme alone.
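For reference, PSNR against a given reference image can be computed as in the short sketch below. It uses the standard definition $10\log_{10}(\text{peak}^2/\text{MSE})$ and assumes images scaled to $[0, 1]$ (so `peak=1.0`); the exact evaluation protocol used for Fig. 2 is not described in the text.

```python
import numpy as np

def psnr(image, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = np.mean((np.asarray(image, dtype=float) - np.asarray(reference, dtype=float)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

reference = np.zeros((8, 8))
enhanced = np.full((8, 8), 0.1)
print(psnr(enhanced, reference))
```

A higher value means the enhanced image is closer to the reference in the mean-squared sense.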

We compare TECU with state-of-the-art approaches including HE [Cheng and Shi2004], BPDHE [Sheet et al.2010], MSRCR [Rahman, Jobson, and Woodell2004], GOLW [Shan, Jia, and Brown2010], NPEA [Wang et al.2013], SRIE [Fu et al.2015], WVM [Fu et al.2016], JIEP [Cai et al.2017] and HDRNet [Gharbi et al.2017], on the NASA dataset [NASA2001] and the Non-uniform dataset [Wang et al.2013]. There are 23 images of different indoor and outdoor scenes in the NASA dataset, while the Non-uniform dataset consists of 130 low-quality images of different natural scenes including sunshine, overcast sky and nightfall scenarios.

Due to the lack of ground truth, it is impossible to use standard metrics (e.g., PSNR) to evaluate quantitative performance for the LIE task on these datasets. In previous work [Wang et al.2013, Fu et al.2016, Cai et al.2017], a blind image quality assessment called the Natural Image Quality Evaluator (NIQE) is widely used for quantitative evaluation of LIE. Following this, we also present the NIQE scores in Table 4, comparing with all these state-of-the-art methods on the two benchmarks. The results in Table 4 indicate that TECU with the embedded network has the lowest NIQE score and thus achieves the highest image quality. We also provide a visual comparison on examples selected from both datasets. TECU is able to enhance image quality with high contrast, whereas the other results still leave details in the dark, which are hard to recognize. Thus, from both quantitative and qualitative analyses, we can conclude that embedding networks in the framework of TECU is effective and competitive for the challenging LIE task.

## Conclusion

We propose a realizable algorithm framework, TECU, which embeds both numerical algorithms and advanced techniques for optimizing a generic multivariate non-convex problem. By embedding task-oriented strategies, TECU is able to improve the convergence speed of the whole algorithm and obtain desired solutions with high probability. Moreover, we provide a realizable error control condition that ensures robust performance with rigorous theoretical support. The experimental results on two practical problems verify the advantages of our proposed algorithm.

## Acknowledgments

This work was supported by National Natural Science Foundation of China (Grant Nos. 61672125, 61733002, 61572096, 61632019 and 61806057), China Postdoctoral Science Foundation (Grant No. 2018M632018) and the Fundamental Research Funds for the Central Universities.

## References

• [Attouch et al.2010] Attouch, H.; Bolte, J.; Redont, P.; and Soubeyran, A. 2010. Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research 35:438–457.
• [Bach et al.2011] Bach, F.; Jenatton, R.; Mairal, J.; and Obozinski, G. 2011. Optimization with sparsity-inducing penalties. Foundations & Trends® in Machine Learning 4(1):1–106.
• [Bao et al.2016] Bao, C.; Ji, H.; Quan, Y.; and Shen, Z. 2016. Dictionary learning for sparse coding: Algorithms and convergence analysis. IEEE TPAMI 38(7):1356–1369.
• [Bolte, Sabach, and Teboulle2014] Bolte, J.; Sabach, S.; and Teboulle, M. 2014. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146:459–494.
• [Cai et al.2017] Cai, B.; Xu, X.; Guo, K.; Jia, K.; Hu, B.; and Tao, D. 2017. A joint intrinsic-extrinsic prior model for retinex. In CVPR.
• [Cai, Gu, and Zhang2018] Cai, J.; Gu, S.; and Zhang, L. 2018. Learning a deep single image contrast enhancer from multi-exposure images. IEEE TIP 27(4):2049–2062.
• [Chan, Wang, and Elgendy2017] Chan, S. H.; Wang, X.; and Elgendy, O. A. 2017. Plug-and-play admm for image restoration: Fixed-point convergence and applications. IEEE TCI 3(1):84–98.
• [Cheng and Shi2004] Cheng, H., and Shi, X. 2004. A simple and effective histogram equalization approach to image enhancement. Digital Signal Processing 14(2):158–170.
• [Elad2010] Elad, M. 2010. Sparse and Redundant Representations. Springer New York.
• [Fu et al.2015] Fu, X.; Liao, Y.; Zeng, D.; Huang, Y.; Zhang, X.-P.; and Ding, X. 2015. A probabilistic method for image enhancement with simultaneous illumination and reflectance estimation. IEEE TIP 24(12):4965–4977.
• [Fu et al.2016] Fu, X.; Zeng, D.; Huang, Y.; Zhang, X.-P.; and Ding, X. 2016. A weighted variational model for simultaneous reflectance and illumination estimation. In CVPR.
• [Gharbi et al.2017] Gharbi, M.; Chen, J.; Barron, J. T.; Hasinoff, S. W.; and Durand, F. 2017. Deep bilateral learning for real-time image enhancement. ACM ToG 36(4):118.
• [Gregor and Lecun2010] Gregor, K., and Lecun, Y. 2010. Learning fast approximations of sparse coding. In ICML.
• [Guo and Ma2014] Guo, X.; Cao, X.; and Ma, Y. 2014. Robust separation of reflection from multiple images. In CVPR.
• [Kim and Fessler2018] Kim, D., and Fessler, J. A. 2018. Another look at the fast iterative shrinkage/thresholding algorithm (fista). SIAM J. on Optimization 28(1).
• [Krizhevsky, Sutskever, and Hinton2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In NIPS.
• [Li and Brown2014] Li, Y., and Brown, M. S. 2014. Single image layer separation using relative smoothness. In CVPR.
• [Li and Pong2014] Li, G., and Pong, T. K. 2014. Global convergence of splitting methods for nonconvex composite optimization. arXiv.
• [Li et al.2016] Li, C.; Guo, J.; Cong, R.; Pang, Y.; and Wang, B. 2016. Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior. IEEE TIP 25(12):5664–5677.
• [Liu et al.2018] Liu, R.; Ma, L.; Wang, Y.; and Zhang, L. 2018. Learning converged propagations with deep prior ensemble for image enhancement. IEEE TIP.
• [Luo and Tseng1989] Luo, Z. Q., and Tseng, P. 1989. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications 72:7–35.
• [NASA2001] NASA. 2001. Retinex image processing.
• [Ortega et al.1970] Ortega, J. M., and Rheinboldt, W. C. 1970. Iterative solution of nonlinear equations in several variables. 25(114):347–380.
• [Pock and Sabach2017] Pock, T., and Sabach, S. 2017. Inertial proximal alternating linearized minimization (ipalm) for nonconvex and nonsmooth problems. SIAM J. Imaging Sciences 9(4):1756–1787.
• [Rahman, Jobson, and Woodell2004] Rahman, Z.-u.; Jobson, D. J.; and Woodell, G. A. 2004. Retinex processing for automatic image enhancement. Journal of Electronic Imaging 13(1):100–111.
• [Schmidt and Roth2014] Schmidt, U., and Roth, S. 2014. Shrinkage fields for effective image restoration. In CVPR.
• [Shan, Jia, and Brown2010] Shan, Q.; Jia, J.; and Brown, M. S. 2010. Globally optimized linear windowed tone mapping. IEEE TVCG 16(4):663–675.
• [Sheet et al.2010] Sheet, D.; Garud, H.; Suveer, A.; Mahadevappa, M.; and Chatterjee, J. 2010. Brightness preserving dynamic fuzzy histogram equalization. IEEE TCE 56(4).
• [Spillane2016] Spillane, N. 2016. An adaptive multipreconditioned conjugate gradient algorithm. SIAM J. on Scientific Computing 38(3):A1896–A1918.
• [Vu and Monga2016] Vu, T. H., and Monga, V. 2016. Fast low-rank shared dictionary learning for image classification. IEEE TIP PP(99):1–1.
• [Wang et al.2013] Wang, S.; Zheng, J.; Hu, H.-M.; and Li, B. 2013. Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE TIP 22(9):3538–3548.
• [Wang et al.2016] Wang, Y.; Liu, R.; Song, X.; and Su, Z. 2016. Linearized alternating direction method with penalization for nonconvex and nonsmooth optimization. In AAAI.
• [Wang et al.2018] Wang, Y.; Liu, R.; Ma, L.; and Song, X. 2018. Supplementary material of task embedded coordinate update: A realizable framework for multivariate non-convex optimization.
• [Wang, Yin, and Zeng2018] Wang, Y.; Yin, W.; and Zeng, J. 2018. Global convergence of admm in nonconvex nonsmooth optimization. Journal of Scientific Computing 1–35.
• [Xu and Yin2015] Xu, Y., and Yin, W. 2015. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sciences 6(3):1758–1789.
• [Xu and Yin2017] Xu, Y., and Yin, W. 2017. A globally convergent algorithm for nonconvex optimization based on block coordinate update. Journal of Scientific Computing 72(2):700–734.
• [Yue et al.2017] Yue, H.; Yang, J.; Sun, X.; Wu, F.; and Hou, C. 2017. Contrast enhancement based on intrinsic image decomposition. IEEE TIP 26(8):3981–3994.
• [Zhang et al.2017] Zhang, K.; Zuo, W.; Gu, S.; and Zhang, L. 2017. Learning deep cnn denoiser prior for image restoration. In CVPR.

## Supplementary Material of Task Embedded Coordinate Update: A Realizable Framework for Multivariate Non-convex Optimization

In this supplementary material, the contents are presented in the following order:

1. Revisit the definition of Kurdyka-Łojasiewicz (KŁ) property/function.

2. Give detailed proofs of Proposition 1.

3. Provide detailed proofs of convergence: proofs of Proposition 2 and Theorem 3.

4. Give more experimental results on the low-light image enhancement task.

## Kurdyka-Łojasiewicz Property/Function

###### Definition 4

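(Kurdyka-Łojasiewicz function) A proper, lower semi-continuous function $\sigma$ is said to have the Kurdyka-Łojasiewicz property at $\tilde{x}\in\operatorname{dom}\partial\sigma$ if there exist $\mu\in(0,+\infty]$, a neighborhood $U_{\tilde{x}}$ of $\tilde{x}$ and a desingularizing function $\phi:[0,\mu)\to\mathbb{R}_{+}$ which satisfies (1) $\phi(0)=0$; (2) $\phi$ is $C^1$ on $(0,\mu)$ and continuous at $0$; (3) $\phi'(s)>0$ for all $s\in(0,\mu)$, such that for all

$$x\in U_{\tilde{x}}\cap\left[\sigma(\tilde{x})<\sigma(x)<\sigma(\tilde{x})+\mu\right], \tag{12}$$

the following inequality holds

$$\phi'\big(\sigma(x)-\sigma(\tilde{x})\big)\,\operatorname{dist}\big(0,\partial\sigma(x)\big)\ge 1. \tag{13}$$

Moreover, if $\sigma$ satisfies the KŁ property at each point of $\operatorname{dom}\partial\sigma$, then $\sigma$ is called a KŁ function.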
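To make the definition concrete, the following minimal sketch checks inequality (13) numerically for a toy case. The objective `sigma(x) = x**2`, the critical point `x_tilde = 0` and the desingularizing function `phi(s) = sqrt(s)` are illustrative assumptions and do not come from the paper.

```python
import math


def sigma(x):
    """Illustrative objective sigma(x) = x**2 (an assumption for this sketch)."""
    return x * x


def dist_zero_to_subdiff(x):
    """sigma is smooth, so its subdifferential is {2x} and the distance is |2x|."""
    return abs(2.0 * x)


def phi_prime(s):
    """Derivative of the assumed desingularizing function phi(s) = sqrt(s), s > 0."""
    return 0.5 / math.sqrt(s)


def kl_inequality_holds(x, x_tilde=0.0, tol=1e-12):
    """Check the inequality of Eq. (13) at x, up to a small rounding tolerance."""
    gap = sigma(x) - sigma(x_tilde)
    if gap <= 0.0:
        raise ValueError("x must satisfy sigma(x) > sigma(x_tilde)")
    return phi_prime(gap) * dist_zero_to_subdiff(x) >= 1.0 - tol


# Sample the neighborhood [-1, 1] of x_tilde = 0, excluding x_tilde itself.
points = [k / 100.0 for k in range(-100, 101) if k != 0]
all_hold = all(kl_inequality_holds(x) for x in points)
```

For this toy case the left-hand side of Eq. (13) equals $\frac{1}{2|x|}\cdot 2|x| = 1$, so the inequality holds with equality at every sampled point.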

## Detailed Proofs of Proposition 1

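• From the calculations in Eq. (6), we can deduce the following equalities.

$$\begin{aligned}\tilde{x}^{t,K_x^t} &= \operatorname{prox}_{f}^{1}\Big(x^{t,K_x^t}-\nabla_xH(x^{t,K_x^t},y^{t-1})-\eta_1(x^{t,K_x^t}-x^{t-1})\Big)\\ &= \operatorname{prox}_{f}^{1}\Big(\tilde{x}^{t,K_x^t}-\nabla_xH(\tilde{x}^{t,K_x^t},y^{t-1})-\eta_1(\tilde{x}^{t,K_x^t}-x^{t-1})+e_x^{t,K_x^t}\Big).\end{aligned}\tag{14}$$

Once the error $e_x^{t,K_x^t}$ satisfies Criterion 1, $x^t$ is assigned as $\tilde{x}^{t,K_x^t}$ in Eq. (14), and thus we get

$$x^t=\operatorname{prox}_{f}^{1}\Big(x^t-\nabla_xH(x^t,y^{t-1})-\eta_1(x^t-x^{t-1})+e_x^{t,K_x^t}\Big). \tag{15}$$

The above deductions can be similarly extended to the case of $y$:

$$y^t=\operatorname{prox}_{g}^{1}\Big(y^t-\nabla_yH(x^t,y^t)-\eta_2(y^t-y^{t-1})+e_y^{t,K_y^t}\Big). \tag{16}$$

From the definition of the proximal mapping operator, Eqs. (15) and (16) are equivalent to

$$\begin{aligned}e_x^{t,K_x^t} &= g_x^t+\nabla_xH(x^t,y^{t-1})+\eta_1(x^t-x^{t-1}),\\ e_y^{t,K_y^t} &= g_y^t+\nabla_yH(x^t,y^t)+\eta_2(y^t-y^{t-1}),\end{aligned}\tag{17}$$

where $g_x^t\in\partial f(x^t)$ and $g_y^t\in\partial g(y^t)$. The above equalities show that $e_x^{t,K_x^t}$ and $e_y^{t,K_y^t}$ are implementations of $e_x^t$ and $e_y^t$ in Eq. (4).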
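The relation between Eqs. (14) and (17) can be illustrated on a scalar toy problem. The sketch below assumes $f(x)=\lambda|x|$, whose unit-step proximal operator is soft-thresholding, and a hypothetical coupling term $H(x,y)=\frac{1}{2}(xy-b)^2$; neither choice is taken from the paper, and all function names are illustrative. It computes the error implied by the two lines of Eq. (14) and then recovers the subgradient $g_x$ from the first line of Eq. (17).

```python
def soft_threshold(v, lam):
    """Unit-step proximal operator of f(x) = lam * |x|."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def grad_x_H(x, y, b):
    """Gradient in x of the hypothetical coupling H(x, y) = 0.5 * (x * y - b) ** 2."""
    return (x * y - b) * y


def embedded_step(x_k, x_prev, y_prev, b, lam, eta1):
    """One inner step in the form of the first line of Eq. (14), plus its error."""
    v = x_k - grad_x_H(x_k, y_prev, b) - eta1 * (x_k - x_prev)
    x_tilde = soft_threshold(v, lam)
    # Second line of Eq. (14): choose e so that the prox argument evaluated at
    # x_tilde, shifted by e, equals the argument v used above.
    shifted = x_tilde - grad_x_H(x_tilde, y_prev, b) - eta1 * (x_tilde - x_prev)
    e = v - shifted
    return x_tilde, e


def implied_subgradient(x_tilde, e, x_prev, y_prev, b, eta1):
    """Solve the first line of Eq. (17) for g_x."""
    return e - grad_x_H(x_tilde, y_prev, b) - eta1 * (x_tilde - x_prev)


lam, eta1, b = 0.1, 0.5, 1.0
x_tilde, e = embedded_step(x_k=1.5, x_prev=1.0, y_prev=2.0, b=b, lam=lam, eta1=eta1)
g_x = implied_subgradient(x_tilde, e, x_prev=1.0, y_prev=2.0, b=b, eta1=eta1)
```

In this run `x_tilde` is nonzero, so $\partial f(\tilde{x})$ contains the single value $\lambda\,\mathrm{sign}(\tilde{x})$, and the recovered `g_x` agrees with it up to rounding.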

## Detailed Proofs of Convergence

Firstly, we can conclude from Eq. (4) that $x^t$ and $y^t$ updated by the task embedding strategy can be regarded as solutions to the following subproblems:

$$\begin{aligned}&\min_x\ f(x)+H(x,y^{t-1})+\frac{\eta_1}{2}\|x-x^{t-1}\|^2-(e_x^t)^\top x,\\ &\min_y\ g(y)+H(x^t,y)+\frac{\eta_2}{2}\|y-y^{t-1}\|^2-(e_y^t)^\top y.\end{aligned}\tag{18}$$

This conversion is exact in the sense that the first-order optimality conditions of Eq. (18) are exactly the same as Eq. (4). However, we emphasize that it is used only for theoretical analysis: we do not directly optimize Eq. (18) in practice; instead, $x^t$ and $y^t$ are updated by the task embedding strategy, as stated in Alg. 1.
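As a brief check of this claim, and assuming that $H$ is differentiable in $x$ so that the subdifferential sum rule applies, the first-order optimality condition of the $x$ sub-problem in Eq. (18) can be written out as follows, where $g_x^t$ denotes an element of $\partial f(x^t)$:

$$0\in\partial f(x^t)+\nabla_xH(x^t,y^{t-1})+\eta_1(x^t-x^{t-1})-e_x^t\quad\Longleftrightarrow\quad e_x^t=g_x^t+\nabla_xH(x^t,y^{t-1})+\eta_1(x^t-x^{t-1}),\ \ g_x^t\in\partial f(x^t).$$

This has the same form as the first line of Eq. (17); the $y$ sub-problem can be handled in the same way.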

Since our proposed TECU is a hybrid framework which contains three different updates at each iteration, we would like to revisit the proximal update, prox-linear update and the theoretically-equivalent form of our novel task-embedding update:

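For solving the $x$ sub-problem:

1. Proximal: $x^{t+1}\in\arg\min_x\ f(x)+H(x,y^t)+\frac{\zeta_1^t}{2}\|x-x^t\|^2$.

2. Prox-linear: $x^{t+1}\in\arg\min_x\ f(x)+(x-x^t)^\top\nabla_xH(x^t,y^t)+\frac{\gamma_1^t}{2}\|x-x^t\|^2$.

3. Task embedding: $x^{t+1}\in\arg\min_x\ f(x)+H(x,y^t)+\frac{\eta_1}{2}\|x-x^t\|^2-(e_x^{t+1})^\top x$.

For solving the $y$ sub-problem:

4. Proximal: $y^{t+1}\in\arg\min_y\ g(y)+H(x^{t+1},y)+\frac{\zeta_2^t}{2}\|y-y^t\|^2$.

5. Prox-linear: $y^{t+1}\in\arg\min_y\ g(y)+(y-y^t)^\top\nabla_yH(x^{t+1},y^t)+\frac{\gamma_2^t}{2}\|y-y^t\|^2$.

6. Task embedding: $y^{t+1}\in\arg\min_y\ g(y)+H(x^{t+1},y)+\frac{\eta_2}{2}\|y-y^t\|^2-(e_y^{t+1})^\top y$.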
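The proximal and prox-linear updates listed above can be sketched on a scalar toy problem. The code assumes $f(x)=\lambda|x|$ and a hypothetical coupling $H(x,y)=\frac{1}{2}(xy-b)^2$, for which both updates have closed forms based on soft-thresholding; these choices and the parameter names `zeta` and `gamma` are assumptions made for illustration only.

```python
def soft_threshold(v, lam):
    """Minimizer of lam * |u| + 0.5 * (u - v) ** 2 over u."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def f(x, lam):
    return lam * abs(x)


def H(x, y, b):
    """Hypothetical coupling term, quadratic in x for fixed y."""
    return 0.5 * (x * y - b) ** 2


def grad_x_H(x, y, b):
    return (x * y - b) * y


def proximal_update(x_t, y_t, b, lam, zeta):
    """argmin_x f(x) + H(x, y_t) + zeta / 2 * (x - x_t) ** 2, in closed form."""
    curvature = y_t * y_t + zeta
    return soft_threshold(b * y_t + zeta * x_t, lam) / curvature


def prox_linear_update(x_t, y_t, b, lam, gamma):
    """argmin_x f(x) + (x - x_t) * grad_x_H(x_t, y_t) + gamma / 2 * (x - x_t) ** 2."""
    return soft_threshold(x_t - grad_x_H(x_t, y_t, b) / gamma, lam / gamma)


x_t, y_t, b, lam = 1.0, 2.0, 1.0, 0.1
zeta, gamma = 1.0, 8.0  # gamma is chosen above the Lipschitz constant y_t ** 2 = 4

x_prox = proximal_update(x_t, y_t, b, lam, zeta)
x_lin = prox_linear_update(x_t, y_t, b, lam, gamma)

# Left- and right-hand sides of the decrease inequality for the proximal update.
prox_lhs = f(x_prox, lam) + H(x_prox, y_t, b) + zeta / 2 * (x_prox - x_t) ** 2
prox_rhs = f(x_t, lam) + H(x_t, y_t, b)

# Left- and right-hand sides of the decrease inequality for the prox-linear update.
lin_lhs = (f(x_lin, lam) + (x_lin - x_t) * grad_x_H(x_t, y_t, b)
           + gamma / 2 * (x_lin - x_t) ** 2)
lin_rhs = f(x_t, lam)
```

Because each update minimizes its own surrogate and $x^t$ is a feasible candidate, the surrogate value at the new point cannot exceed its value at $x^t$, which is what the two comparisons check.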

There are 9 combinations in total under the TECU framework, that is: 1−4, 1−5, 1−6, 2−4, 2−5, 2−6, 3−4, 3−5 and 3−6. However, we are only interested in the ones that contain at least one task embedding update, that is, we consider the cases:

$$1-6,\quad 2-6,\quad 3-4,\quad 3-5,\quad 3-6.$$
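The case selection can be reproduced with a short enumeration. The sketch below assumes the numbering in which updates 1 to 3 act on $x$ and updates 4 to 6 act on $y$, with 3 and 6 being the task embedding updates; the dictionary and function names are illustrative.

```python
from itertools import product

# Assumed numbering: 1-3 are the x updates, 4-6 are the y updates.
X_UPDATES = {1: "proximal", 2: "prox-linear", 3: "task embedding"}
Y_UPDATES = {4: "proximal", 5: "prox-linear", 6: "task embedding"}


def all_combinations():
    """Every pairing of one x update with one y update."""
    return list(product(sorted(X_UPDATES), sorted(Y_UPDATES)))


def with_task_embedding(combos):
    """Keep the pairings that use task embedding on at least one sub-problem."""
    return [
        (i, j)
        for i, j in combos
        if X_UPDATES[i] == "task embedding" or Y_UPDATES[j] == "task embedding"
    ]


combos = all_combinations()
selected = with_task_embedding(combos)
```

Here `selected` contains the five pairs considered in the analysis, and the four remaining pairs use only proximal or prox-linear updates.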

In what follows, we prove that the hybrid algorithm TECU has a nice convergence property: it generates a Cauchy sequence that converges to a critical point of the original objective function.

### Proof for Proposition 2

• Notice that “3−4”, “3−5” are the same as “1−6”, “2−6” since $x$ and $y$ can be switched with each other. Thus we only give detailed proofs for the cases “3−6”, “1−6” and “2−6”.

(Sufficient descent property: Eq. (8a))

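For “3−6” with task embedding updates on both subproblems, we have the following inequalities:

$$\begin{aligned}f(x^{t+1})+H(x^{t+1},y^t)+\frac{\eta_1}{2}\|x^{t+1}-x^t\|^2-(e_x^{t+1})^\top x^{t+1}&\le f(x^t)+H(x^t,y^t)-(e_x^{t+1})^\top x^t,\\ g(y^{t+1})+H(x^{t+1},y^{t+1})+\frac{\eta_2}{2}\|y^{t+1}-y^t\|^2-(e_y^{t+1})^\top y^{t+1}&\le g(y^t)+H(x^{t+1},y^t)-(e_y^{t+1})^\top y^t.\end{aligned}\tag{19}$$

Adding the above two inequalities, we get the following inequalities with positive real numbers $\rho_1$ and $\rho_2$:

$$\begin{aligned}\Psi(z^t)-\Psi(z^{t+1})\ge{}&\frac{\eta_1}{2}\|x^{t+1}-x^t\|^2+\frac{\eta_2}{2}\|y^{t+1}-y^t\|^2+(e_x^{t+1})^\top(x^t-x^{t+1})+(e_y^{t+1})^\top(y^t-y^{t+1})\\ \ge{}&\frac{\eta_1}{2}\|x^{t+1}-x^t\|^2+\frac{\eta_2}{2}\|y^{t+1}-y^t\|^2\\ &-\left(\frac{\rho_1}{2}\|e_x^{t+1}\|^2+\frac{1}{2\rho_1}\|x^{t+1}-x^t\|^2\right)-\left(\frac{\rho_2}{2}\|e_y^{t+1}\|^2+\frac{1}{2\rho_2}\|y^{t+1}-y^t\|^2\right).\end{aligned}\tag{20}$$

The last inequality comes from applying Young's inequality. Then, by combining it with Criterion 1 for TECU, we have:

$$\begin{aligned}\Psi(z^t)-\Psi(z^{t+1})\ge{}&\left(\frac{\eta_1}{2}-\frac{1}{2\rho_1}\right)\|x^{t+1}-x^t\|^2+\left(\frac{\eta_2}{2}-\frac{1}{2\rho_2}\right)\|y^{t+1}-y^t\|^2\\ &-\frac{\rho_1(C_x)^2}{2}\|x^t-x^{t-1}\|^2-\frac{\rho_2(C_y)^2}{2}\|y^t-y^{t-1}\|^2\\ \ge{}&\frac{\eta_1}{4}\|x^{t+1}-x^t\|^2+\frac{\eta_2}{4}\|y^{t+1}-y^t\|^2-\frac{(C_x)^2}{\eta_1}\|x^t-x^{t-1}\|^2-\frac{(C_y)^2}{\eta_2}\|y^t-y^{t-1}\|^2,\end{aligned}\tag{21}$$

where the last inequality holds by setting $\rho_1=2/\eta_1$ and $\rho_2=2/\eta_2$.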
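The constants in Eq. (21) can be checked with exact arithmetic. The sketch below assumes scalar quantities and uses Python's `fractions.Fraction`; the function names and sample values are illustrative. It verifies Young's inequality on a small grid and confirms that the choice $\rho = 2/\eta$ turns $\frac{\eta}{2}-\frac{1}{2\rho}$ into $\frac{\eta}{4}$ and $\frac{\rho C^2}{2}$ into $\frac{C^2}{\eta}$.

```python
from fractions import Fraction


def young_lower_bound(e, d, rho):
    """Young's inequality: e * d >= -(rho / 2 * e**2 + d**2 / (2 * rho)) for rho > 0."""
    return -(rho / 2 * e * e + d * d / (2 * rho))


def constants_after_choice(eta, c):
    """Coefficients of Eq. (21) after setting rho = 2 / eta."""
    rho = Fraction(2) / eta
    step_coeff = eta / 2 - 1 / (2 * rho)   # multiplies the new step length
    error_coeff = rho * c * c / 2          # multiplies the previous step length
    return rho, step_coeff, error_coeff


eta, c = Fraction(3), Fraction(1, 2)       # sample values, chosen arbitrarily
rho, step_coeff, error_coeff = constants_after_choice(eta, c)

# Check Young's inequality on a small grid of scalar pairs (e, d).
grid = [Fraction(k, 2) for k in range(-4, 5)]
young_holds = all(e * d >= young_lower_bound(e, d, rho) for e in grid for d in grid)
```

With these sample values, `step_coeff` equals `eta / 4` and `error_coeff` equals `c * c / eta`, matching the coefficients in the last line of Eq. (21).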

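Denote $z^t=(x^t,y^t)$ with $\Psi(z^t)=f(x^t)+g(y^t)+H(x^t,y^t)$. Then, by denoting $\Phi_1(z^{t+1},z^t)=\Psi(z^{t+1})+\frac{(C_x)^2}{\eta_1}\|x^{t+1}-x^t\|^2+\frac{(C_y)^2}{\eta_2}\|y^{t+1}-y^t\|^2$, the above inequality is equivalent to:

$$\Phi_1(z^t,z^{t-1})\ge\Phi_1(z^{t+1},z^t)+\left(\frac{\eta_1}{4}-\frac{(C_x)^2}{\eta_1}\right)\|x^{t+1}-x^t\|^2+\left(\frac{\eta_2}{4}-\frac{(C_y)^2}{\eta_2}\right)\|y^{t+1}-y^t\|^2. \tag{22}$$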

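For “1−6”, we have the following inequality from the iterative scheme of the proximal update:

$$f(x^{t+1})+H(x^{t+1},y^t)+\frac{\zeta_1^t}{2}\|x^{t+1}-x^t\|^2\le f(x^t)+H(x^t,y^t). \tag{23}$$

Then, together with the second inequality of Eq. (19), we have that

$$\begin{aligned}\Psi(z^t)\ge{}&\Psi(z^{t+1})+\frac{\zeta_1^t}{2}\|x^{t+1}-x^t\|^2+\frac{\eta_2}{2}\|y^{t+1}-y^t\|^2-\left(\frac{\rho}{2}\|e_y^{t+1}\|^2+\frac{1}{2\rho}\|y^{t+1}-y^t\|^2\right)\\ \ge{}&\Psi(z^{t+1})+\frac{\zeta_1^t}{2}\|x^{t+1}-x^t\|^2+\frac{\eta_2}{4}\|y^{t+1}-y^t\|^2-\frac{(C_y)^2}{\eta_2}\|y^t-y^{t-1}\|^2,\end{aligned}\tag{24}$$

where the last inequality holds by setting $\rho=2/\eta_2$.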

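Then, by denoting $\Phi_2(z^{t+1},z^t)=\Psi(z^{t+1})+\frac{(C_y)^2}{\eta_2}\|y^{t+1}-y^t\|^2$, the following inequality holds:

$$\Phi_2(z^t,z^{t-1})\ge\Phi_2(z^{t+1},z^t)+\frac{\zeta_1^t}{2}\|x^{t+1}-x^t\|^2+\left(\frac{\eta_2}{4}-\frac{(C_y)^2}{\eta_2}\right)\|y^{t+1}-y^t\|^2. \tag{25}$$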

For the case “2−6”, the prox-linear update indicates that

$$f(x^{t+1})+(x^{t+1}-x^t)^\top\nabla_xH(x^t,y^t)+\frac{\gamma_1^t}{2}\|x^{t+1}-x^t\|^2\le f(x^t). \tag{26}$$

Together with the descent lemma for gradient Lipschitz functions described in [Ortega et al.1970]:

$$H(x^{t+1},y^t)\le H(x^t,y^t)+(x^{t+1}-x^t)^\top\nabla_xH(x^t,y^t)+\frac{L_1^t}{2}\|x^{t+1}-x^t\|^2, \tag{27}$$