Introduction
Over the past few decades, multivariate nonconvex optimization has attracted wide attention in pattern recognition and machine learning. Having achieved strong performance in tasks such as matrix factorization [Vu and Monga2016, Bao et al.2016] and image enhancement [Fu et al.2016, Gharbi et al.2017], multivariate nonconvex problems have motivated a revived interest in designing and analyzing numerical algorithms. Compared with univariate optimization, optimizing multivariate problems with coupled objective functions is much more complicated. Taking two variables as an instance, this kind of coupled problem can be formulated as:
(1)  $\min_{x, y} \ \Phi(x, y) := f(x) + g(y) + H(x, y),$
with vectors/matrices $x$ and $y$. Employed in a variety of tasks, coordinate descent (CD) [Luo and Tseng1989] is widely used for solving problem (1): it optimizes the objective over each direction while fixing the remaining one at its latest value, i.e., it solves univariate optimization problems in a loop. In this way, computing the coordinate updates is much simpler than computing a full update, and requires less memory and computational cost. However, despite these benefits, few CD algorithms exploit the useful traits of the univariate subproblems to improve either the convergence speed or the optimized results when solving the generic problem (1) with a nonconvex, nonsmooth objective.

In most cases, though $\Phi$ is nonconvex and possibly nonsmooth, its univariate subproblems are quite likely to have nice properties: e.g., they may be solvable via convex optimization, or may have unique solutions. Moreover, the univariate subproblems associated with different variables usually have entirely distinct forms. For example, for dictionary learning tasks, many works have reported the benefits of restricting dictionaries to normalized bases while enforcing sparsity on the codes with various nonconvex penalties [Gregor and Lecun2010, Wang et al.2016, Bao et al.2016]. Though these models are nonconvex and nonsmooth, the univariate subproblem of the dictionary can be solved efficiently, in contrast to the other subproblem. Thus, in view of the nice traits and specificities of univariate subproblems, it is worthwhile to integrate effective algorithms tailored to task-specific subproblems, so as to improve the efficiency and effectiveness of CD schemes.
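The CD loop just described can be sketched on a toy coupled problem; the quadratic objective and all constants below are purely illustrative, not from the paper:

```python
def coordinate_descent(x, y, iters=100):
    """Exact CD on F(x, y) = (x - 1)**2 + (y + 2)**2 + x*y: each step
    minimizes F over one variable while the other is held at its
    latest value, i.e., a loop of univariate problems."""
    for _ in range(iters):
        x = 1.0 - y / 2.0   # argmin_x: solves 2*(x - 1) + y = 0
        y = -2.0 - x / 2.0  # argmin_y: solves 2*(y + 2) + x = 0
    return x, y

x_star, y_star = coordinate_descent(0.0, 0.0)
# converges to the unique minimizer (8/3, -10/3), since the
# Hessian [[2, 1], [1, 2]] is positive definite
```

Each coordinate update here is a scalar equation, far cheaper than a joint solve; this is the memory and cost advantage the text refers to.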
More critically, although there exist quite a few CD algorithms for solving multivariate optimization problems, converging to a critical point is still the best available guarantee for generic nonconvex and nonsmooth problems [Bolte, Sabach, and Teboulle2014, Xu and Yin2015, Pock and Sabach2017]. Meanwhile, we have noticed that many univariate subproblems in real-world image processing correspond to specific application problems. E.g., in tasks like image deblurring and super-resolution, one univariate subproblem can be regarded as an image denoising task. Beyond numerical algorithms, techniques such as BM3D and CNNs are effective for solving the image denoising problem. Although such advanced techniques mostly lack theoretical support, they are able to efficiently project the variables onto small neighborhoods of the desired solutions. Considering their effectiveness, it is worthwhile to integrate them into CD schemes, in the expectation of obtaining desired results with high probability [Zhang et al.2017, Chan, Wang, and Elgendy2017].

The above-mentioned strategies, integrating either numerical algorithms or advanced techniques, have already appeared in applications for specific tasks, which will be briefly reviewed later. However, the success of those CD schemes, designed for specific problems, cannot be straightforwardly transferred to other tasks. Moreover, no unified CD framework has yet been proposed that integrates both numerical algorithms and advanced techniques for optimizing the generic multivariate nonconvex problem (1). More importantly, few of them provide rigorous theoretical analyses illuminating the properties of the final optimized results. Considering all these aspects, in this paper we propose a realizable algorithmic framework, which embeds various task-oriented strategies into the updates of a CD scheme, for effectively solving the generic problem (1). We name the proposed algorithm TECU (task embedded coordinate update), and its main contributions are sketched as follows:

For optimizing the generic multivariate problem (1) with a coupled nonconvex objective, we propose an algorithm, TECU, which embeds task-oriented techniques for optimizing specific univariate subproblems in the CD update. Moreover, we provide a realizable condition that ensures robust performance of TECU with theoretical guarantees.

Considering the nice properties of univariate subproblems, TECU is able to improve algorithmic efficiency by embedding high-efficiency numerical algorithms into its framework. Taking $\ell_0$ regularized dictionary learning as an example, we embed ADMM to accelerate the convergence of the whole algorithm. Experiments on synthetic data verify the efficiency of TECU in comparison with other existing numerical algorithms.

Through embedding advanced techniques, TECU is likely to obtain desired solutions with high probability, which is superior to most numerical algorithms for nonconvex optimization. Taking low-light image enhancement as an example, we embed a residual-type CNN to optimize the univariate subproblem of the illumination layer. Compared with state-of-the-art methods, the experimental results show the superiority of embedding networks for real-world tasks, and meanwhile verify the effectiveness of integrating networks and CD schemes in a unified algorithmic framework.
Related Work
For solving general multivariate nonconvex problems, the most classical CD-type method is the proximal alternating method (PAM) [Attouch et al.2010]. However, it is impractical for most coupled problems since it requires explicit solutions for every univariate subproblem. To get around this limitation, PALM linearizes the coupled function in pursuit of explicit solutions [Bolte, Sabach, and Teboulle2014]. However, it requires computing exact Lipschitz constants during the iterations, and even estimating tight upper bounds on them is sometimes time-consuming [Bao et al.2016, Xu and Yin2015]. Moreover, improper upper bounds inevitably slow down the convergence of PALM. These troubles with estimating Lipschitz constants also exist in CD variants such as BCU [Xu and Yin2017] and iPALM [Pock and Sabach2017]. Beyond these defects, the updates of existing CD algorithms completely ignore task specificities, i.e., they optimize every univariate subproblem with the same scheme, which is less efficient in practice.

Unlike algorithms for general problems, it is common in real-world applications to embed numerical algorithms for optimizing subproblems [Guo and Ma2014, Li and Brown2014, Li et al.2016, Yue et al.2017]. Such algorithms often make good use of the nice traits of the univariate subproblems, and thus typically enjoy high efficiency and superior performance. However, their specificity limits their generality: the well-designed algorithms usually cannot be transferred to other models. Moreover, these specialized algorithms mostly have relatively weak convergence guarantees, so their efficiency often lacks robustness.
Quite recently, fusing advanced techniques into optimization frameworks has become a popular research direction for real-world applications [Schmidt and Roth2014, Liu et al.2018, Zhang et al.2017]. For example, the authors of [Schmidt and Roth2014] learn a cascade of shrinkage fields to replace hand-crafted priors in a half-quadratic optimization for image restoration. Instead of designing complex regularizers, [Zhang et al.2017] learns a CNN-based denoiser to replace the corresponding subproblem in their optimization framework. These novel methods usually achieve remarkable performance thanks to the power of advanced techniques; however, their success relies on completely replacing univariate subproblems with advanced techniques, so few of them can illuminate the properties of the final results with rigorous theoretical analyses.
Preliminaries
In general, the objective function of problem (1) is assumed to satisfy: (1) $f$ and $g$ are proper, lower semicontinuous (l.s.c.); (2) $H$ is a $C^1$ function; its gradient and partial gradients are Lipschitz continuous on bounded sets; (3) $\Phi$ is coercive, that is, it is bounded from below and $\Phi(x, y) \to \infty$ when $\|(x, y)\|_F \to \infty$, where $\|\cdot\|_F$ denotes the Frobenius norm. Meanwhile, $\Phi$ is a Kurdyka-Łojasiewicz (KŁ) function.
Notice that all semialgebraic functions and subanalytic functions satisfy the KŁ property. Typical semialgebraic functions include real polynomial functions, $\ell_q$ (quasi-)norms with rational $q$, and indicator functions of semialgebraic sets such as Stiefel manifolds and sets of constant-rank matrices [Attouch et al.2010].
Task Embedded Coordinate Update
Corresponding to specific tasks, the univariate subproblems of the generic model (1) usually either possess desirable characteristics or correspond to certain single-variable tasks. Considering these available traits, we embed powerful strategies into the CD scheme to optimize task-oriented univariate subproblems, thereby improving the convergence speed and the optimized results of the whole algorithm.
The most basic CD scheme optimizes the objective cyclically over each variable, that is, it successively solves the following subproblems to update $x$ and $y$ at iteration $k$:
(2)  $x^{k+1} \in \arg\min_{x} \Phi(x, y^{k}), \qquad y^{k+1} \in \arg\min_{y} \Phi(x^{k+1}, y).$
Targeting these univariate subproblems, we introduce numerical algorithms and advanced techniques, respectively, to optimize them under a practical error control condition, which provides a criterion on optimization precision and meanwhile helps illuminate the properties of the final optimized results.
Task Embedded Strategies
Owing to the diversity of realistic tasks, the univariate subproblems in Eq. (2) usually have distinct objective functions. Taking their specificities into consideration, we introduce two targeted strategies, i.e., numerical algorithms and advanced techniques, for optimizing these univariate subproblems.
Numerical Algorithms Embedding
In most cases, it makes no sense to employ extra numerical algorithms for optimizing the subproblems in Eq. (2) if their explicit solutions are easily obtained. However, for most cases with a coupled $H$, it is common to adopt linearization to obtain easy-to-solve subproblems. But as mentioned in the related work section, the linearization trick requires estimating Lipschitz constants at every iteration, which brings a series of time-consuming troubles.
We have noticed that, though the objective function $\Phi$ is nonconvex and nonsmooth, subproblems corresponding to specific tasks may possess nice traits: e.g., they may be convex, sometimes even differentiable, or they may have unique solutions. There are plenty of high-efficiency numerical algorithms designed for solving such univariate problems, e.g., greedy algorithms [Elad2010], PCG [Spillane2016], FISTA [Kim and Fessler2018], and ADMM [Wang, Yin, and Zeng2018]. It is therefore advantageous to embed efficient algorithms into the CD scheme to improve the efficiency and effectiveness of the whole algorithm, which is one of the motivations for proposing TECU.
Advanced Techniques Embedding
For real-world image processing tasks, most univariate subproblems correspond to specific applications. For example, in image deblurring and super-resolution, one univariate subproblem can be seen as an image denoising task [Chan, Wang, and Elgendy2017, Zhang et al.2017]. For low-light image enhancement, one of the two subproblems estimates the illumination layer, while the other restores the reflectance image.
Besides optimization algorithms, there are plenty of advanced techniques such as BM3D [Chan, Wang, and Elgendy2017] and variants of neural networks [Zhang et al.2017, Gharbi et al.2017] for solving single-variable tasks. Different from numerical algorithms, advanced techniques like neural networks obtain the final results by pretrained forward propagations rather than by optimizing mathematical models. Such techniques are mostly quite efficient and, meanwhile, are able to propagate variables very close to the desired solutions. Taking these advantages into consideration, we propose to embed advanced techniques into the CD scheme, to improve the convergence speed of the whole algorithm and to obtain desired solutions with high probability.

To embed the above-mentioned strategies into a unified framework, we adopt $p$ and $q$ steps of updates, starting at $x^{k}$ and $y^{k}$, for optimizing the problems of Eq. (2) to some extent, namely:
(3)  $x^{k+1} = (\mathcal{A}^{k}_{p} \circ \cdots \circ \mathcal{A}^{k}_{1})(x^{k}), \qquad y^{k+1} = (\mathcal{B}^{k}_{q} \circ \cdots \circ \mathcal{B}^{k}_{1})(y^{k}),$
where $\circ$ denotes the composition operator. Each $\mathcal{A}^{k}_{i}$ (resp. $\mathcal{B}^{k}_{j}$) can be set as either a one-step iteration of a numerical algorithm or a propagation of an advanced technique.
This general framework covers various existing methods [Yue et al.2017, Schmidt and Roth2014, Zhang et al.2017]: e.g., for solvers like [Yue et al.2017], each operator can be seen as a one-step iteration of a numerical algorithm; for others like [Zhang et al.2017], there is only a one-step propagation of an advanced technique. Besides, Eq. (3) also covers cases that employ numerical algorithms and advanced techniques in a hybrid manner, which is far more flexible than existing solvers. Furthermore, considering two scenarios, 1) one problem of Eq. (2) has a closed-form solution, and 2) not all subproblems admit efficient numerical algorithms, we propose TECU in a hybrid updating scheme (as shown in Alg. 1) together with two other classical CD updates (see Table 1).
Table 1. Two classical CD updates (shown for the $x$-coordinate; the $y$-updates are analogous).
Proximal update: $x^{k+1} \in \arg\min_{x} \Phi(x, y^{k}) + \frac{\mu}{2}\|x - x^{k}\|^{2}$
Prox-linear update: $x^{k+1} \in \arg\min_{x} f(x) + \langle \nabla_{x} H(x^{k}, y^{k}), x - x^{k} \rangle + \frac{\mu}{2}\|x - x^{k}\|^{2}$
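To make the two classical updates of Table 1 concrete, here is a minimal one-dimensional sketch; the quadratic coordinate objective and all constants are invented for illustration:

```python
# toy coordinate objective phi(x) = 0.5*a*x**2 - b*x with y held fixed;
# a is then the Lipschitz constant of phi'; all values are illustrative
a, b = 4.0, 2.0
grad = lambda x: a * x - b

def proximal_update(x_k, mu=1.0):
    """Proximal update: argmin_x phi(x) + (mu/2)*(x - x_k)**2, solved
    exactly (possible here because phi is a simple quadratic)."""
    return (b + mu * x_k) / (a + mu)

def proxlinear_update(x_k, L=a):
    """Prox-linear update: linearize phi at x_k and damp with weight L;
    cheap, but needs the Lipschitz constant L of the partial gradient."""
    return x_k - grad(x_k) / L

# both move from 0.0 toward the coordinate minimizer b/a = 0.5
```

The prox-linear step avoids any inner solve, which is exactly why its dependence on the Lipschitz constant $L$ matters in the comparisons discussed earlier.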

Error Control and Estimation
Notice that our task embedded strategies do not require exactly optimizing the problems in Eq. (2): updating $x$ and $y$ by task embedded strategies introduces errors $e_{x}^{k+1}$ and $e_{y}^{k+1}$ into the first-order optimality conditions of the univariate subproblems:
(4)  $e_{x}^{k+1} \in \partial f(x^{k+1}) + \nabla_{x} H(x^{k+1}, y^{k}), \qquad e_{y}^{k+1} \in \partial g(y^{k+1}) + \nabla_{y} H(x^{k+1}, y^{k+1}),$
where $\partial f$ and $\partial g$ denote the Fréchet limiting subdifferentials of $f$ and $g$ [Attouch et al.2010].
Apparently, imprecise task embedded computations slow down the convergence of the whole algorithm, while over-precise optimization is time-consuming and unnecessary in practice. Hence, we provide the following criterion to control the accuracy of optimizing the univariate subproblems.
Criterion 1
The errors $e_{x}^{k+1}$ and $e_{y}^{k+1}$ introduced by the task embedded strategies should be controlled by certain constants, i.e.,
(5)  $\|e_{x}^{k+1}\| \leq c_{x} \|x^{k+1} - x^{k}\|, \qquad \|e_{y}^{k+1}\| \leq c_{y} \|y^{k+1} - y^{k}\|,$
with constants $c_{x}$ and $c_{y}$, which should additionally satisfy suitable upper bounds.
Notice that the conditions in Eq. (5) are certainly attainable for convergent algorithms, since the errors $e_{x}^{k+1}$ and $e_{y}^{k+1}$ approach zero as the algorithm converges. However, since $e_{x}^{k+1}$ and $e_{y}^{k+1}$ cannot be arbitrarily selected from the subdifferential sets in Eq. (4), we provide in Prop. 1 an implementation for estimating the errors, to make the proposed TECU more practical. (All the proofs in this paper are presented in [Wang et al.2018].)
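A schematic of how a test in the spirit of Criterion 1 can drive the number of inner propagations; the toy subproblem, inner solver, step size, and constant below are all illustrative, not the paper's:

```python
def inexact_coordinate_update(x_prev, grad, inner_step, c=0.9, max_steps=20):
    """One inexact coordinate update: apply inner propagation steps until
    the first-order error is dominated by the progress made over the
    outer iterate, i.e. |grad(x)| <= c * |x - x_prev|."""
    x = x_prev
    for _ in range(max_steps):
        x = inner_step(x)
        if abs(grad(x)) <= c * abs(x - x_prev):
            return x
    return x

# toy univariate subproblem: min_x 0.5 * (x - 3)**2, with inner steps
# given by gradient descent using step size 0.5
grad = lambda x: x - 3.0
x_new = inexact_coordinate_update(0.0, grad, lambda x: x - 0.5 * grad(x))
# stops after two inner steps, at x_new = 2.25
```

The point of the test is exactly the trade-off described above: it terminates the inner loop as soon as the remaining error is small relative to the step actually taken, avoiding both imprecise and over-precise inner solves.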
Proposition 1
Two intermediate variables $u^{k+1}$ and $v^{k+1}$ are calculated with respect to $x^{k+1}$ and $y^{k+1}$ as follows ($\mathrm{prox}_{h}$ denotes the proximal mapping of a proper, l.s.c. function $h$):
(6)  $u^{k+1} = \mathrm{prox}_{f}\big(x^{k+1} - \nabla h_{x}(x^{k+1})\big), \qquad v^{k+1} = \mathrm{prox}_{g}\big(y^{k+1} - \nabla h_{y}(y^{k+1})\big),$
with functions $h_{x}(\cdot) := H(\cdot, y^{k})$ and $h_{y}(\cdot) := H(x^{k+1}, \cdot)$. Then, the following
(7)  $e_{x}^{k+1} = (x^{k+1} - u^{k+1}) + \nabla h_{x}(u^{k+1}) - \nabla h_{x}(x^{k+1}), \qquad e_{y}^{k+1} = (y^{k+1} - v^{k+1}) + \nabla h_{y}(v^{k+1}) - \nabla h_{y}(y^{k+1})$
are implementations of $e_{x}^{k+1}$ and $e_{y}^{k+1}$ in Eq. (4), obtained by assigning $u^{k+1}$ to $x^{k+1}$ and $v^{k+1}$ to $y^{k+1}$.
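The prox-based error estimation can be illustrated in one dimension with $f = |\cdot|$, whose proximal mapping is soft-thresholding. This is a sketch of the general idea rather than the paper's exact formulas: a residual of this kind vanishes exactly at first-order stationary points, which is what makes it usable as an error measure.

```python
def soft_threshold(v, lam):
    """One-dimensional proximal mapping of f(x) = lam * |x|."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def prox_residual(x, grad_h, lam):
    """Prox-gradient residual x - prox_f(x - grad_h(x)): it is zero
    precisely when 0 lies in grad_h(x) + the subdifferential of
    lam*|x|, so its magnitude can serve as the error estimate."""
    return x - soft_threshold(x - grad_h(x), lam)

# min_x 0.5 * (x - 2)**2 + |x| has the stationary point x* = 1
grad_h = lambda x: x - 2.0
r = prox_residual(1.0, grad_h, 1.0)   # r == 0.0 at the solution
```

At $x = 1$: the gradient of the smooth part is $-1$, and $1 \in \partial|x|$ cancels it, so the residual is exactly zero; away from the solution the residual is nonzero and usable in the error control test.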
So far, we have introduced the whole process of TECU; its detailed procedure is given in Alg. 2. In what follows, we demonstrate that our error control conditions are more persuasive than previously used criteria [Li and Pong2014, Yue et al.2017], since they come with convergence guarantees.
Theoretical Analyses
With the properties of the objective function and the error control criterion, we present in this section some nice properties of TECU. Firstly, we demonstrate that along the iterations of TECU there exists a bounded function satisfying a sufficient-descent property.
Proposition 2
Suppose the sequence $\{z^{k}\} = \{(x^{k}, y^{k})\}$ is generated by TECU. Then, under Criterion 1, there exists a bounded function $\Psi$ such that for all $k$:
(8a)  $\Psi(z^{k+1}) \leq \Psi(z^{k}) - a\,\|z^{k+1} - z^{k}\|^{2},$
(8b)  $\mathrm{dist}\big(0, \partial \Psi(z^{k+1})\big) \leq b\,\|z^{k+1} - z^{k}\|,$
with positive constants $a$ and $b$. In addition, the sequence $\{z^{k}\}$ generated by TECU is bounded.
Since our proposed TECU adopts a hybrid update scheme (presented in Alg. 1), the function $\Psi$ and the constants $a$ and $b$ have different concrete forms corresponding to the distinct combinations of updates [Wang et al.2018]. With this proposition, we are ready to illuminate the property of the final solution optimized by TECU.
Theorem 3
The sequence $\{z^{k}\}$ generated by TECU is a Cauchy sequence, which converges to a critical point of the original objective function $\Phi$.
Theorem 3 characterizes the final optimized solution and demonstrates the robust performance of the proposed algorithmic framework. Moreover, TECU has at least a sublinear convergence rate when the desingularizing function of the objective is $\varphi(t) = c\,t^{1-\theta}$ with positive constant $c$ and $\theta \in (\frac{1}{2}, 1)$. Better yet, it further has a linear convergence rate if $\theta \in (0, \frac{1}{2}]$. Though this convergence rate is in accordance with previous CD schemes like [Attouch et al.2010], our TECU is far more efficient and effective for realistic tasks.
Table 2. Convergence comparisons on synthetic data (each row corresponds to one data size).

|            | Iteration number / Total propagation steps | Computation time (s)                  |
| Algorithms | PALM  INV  BCU  iPALM  TECU                | PALM  INV  BCU  iPALM  TECU           |
|            | 21  109  19  21  12 / 54                   | 4.45  16.20  3.87  4.68  2.22         |
|            | 21  35  18  21  14 / 152                   | 17.16  21.40  14.45  18.01  11.96     |
|            | 22  39  18  22  17 / 153                   | 121.47  137.27  100.19  124.67  75.19 |
Table 3. Computation time (s) of one propagation step, on three data scales (left to right).

| PALM  INV  BCU  iPALM  TECU  | PALM  INV  BCU  iPALM  TECU  | PALM  INV  BCU  iPALM  TECU  |
| 0.21  0.15  0.20  0.22  0.04 | 0.82  0.61  0.80  0.86  0.08 | 5.52  3.52  5.57  5.67  0.49 |
TECU for Realistic Tasks
As previously stated, TECU allows embedding both numerical algorithms and advanced techniques in its framework. Hence, we consider two realistic tasks, $\ell_0$ regularized dictionary learning (DL) and low-light image enhancement (LIE), to verify both the efficiency and the effectiveness of embedding task-oriented strategies.
Regularized Dictionary Learning
Dictionary learning is a powerful tool for learning features from data. Its basic idea is to factorize the data as $X \approx DC$, where $D$ is the dictionary and $C$ is the matrix of corresponding coefficients. We consider the previously proposed DL problem [Bao et al.2016], which can be modeled as
(9)  $\min_{D, C} \ \tfrac{1}{2}\|X - DC\|_{F}^{2} + \lambda \|C\|_{0} + \delta_{\mathcal{D}}(D),$
where $\|\cdot\|_{0}$ denotes the $\ell_{0}$ penalty that counts the number of nonzero elements, and the indicator function $\delta_{\mathcal{D}}$ acts on the set $\mathcal{D} = \{D : \|d_{i}\|_{2} \leq 1, \forall i\}$ of normalized bases.
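Both nonsmooth pieces of model (9) admit cheap exact computations. A minimal entrywise sketch in plain Python, with toy values chosen only for illustration:

```python
import math

def hard_threshold(c, lam):
    """Entrywise prox of lam * ||c||_0: an entry v survives iff keeping
    it, at cost lam, beats zeroing it, i.e. v*v / 2 > lam."""
    return [v if v * v > 2.0 * lam else 0.0 for v in c]

def project_unit_ball(d):
    """Euclidean projection onto the unit ball {d : ||d||_2 <= 1},
    the constraint imposed on each dictionary atom."""
    norm = math.sqrt(sum(v * v for v in d))
    return list(d) if norm <= 1.0 else [v / norm for v in d]

codes = hard_threshold([3.0, 0.5, -2.0], 1.0)   # -> [3.0, 0.0, -2.0]
atom = project_unit_ball([3.0, 4.0])            # -> [0.6, 0.8]
```

Hard thresholding is the proximal mapping of the $\ell_0$ penalty, and the ball projection is the proximal mapping of the indicator $\delta_{\mathcal{D}}$ applied per atom; these are the building blocks the CD subproblems reduce to.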
Notice that the two subproblems of DL have entirely different characteristics. The subproblem of $C$ is a sparse coding task with an $\ell_{0}$ penalty, which can be optimized by a proximal iterative hard-thresholding (PITH) method [Bach et al.2011]. The subproblem of $D$ minimizes a strongly convex quadratic function under a unit-ball constraint, and thus can be solved efficiently. Considering this nice property, we embed ADMM for optimizing the $D$-subproblem, and the experimental results verify that embedding ADMM improves the efficiency of the optimization. However, due to the nonconvexity of the $C$-subproblem, additional experiments show that embedding PITH is not a good choice for TECU, which also indicates the necessity of the hybrid scheme.
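A one-dimensional sketch of why ADMM suits a quadratic-plus-unit-ball structure of this kind; the scalar stand-in problem, penalty parameter rho, and iteration count are all illustrative:

```python
def admm_ball(a, b, rho=1.0, iters=100):
    """ADMM for min_d 0.5 * (a*d - b)**2  s.t. |d| <= 1, via the split
    d = z with z carrying the constraint (a 1-D stand-in for the
    strongly convex quadratic with a unit-ball constraint)."""
    d = z = u = 0.0
    for _ in range(iters):
        d = (a * b + rho * (z - u)) / (a * a + rho)  # exact quadratic solve
        z = max(-1.0, min(1.0, d + u))               # projection onto the ball
        u += d - z                                   # dual (multiplier) update
    return z

boundary = admm_ball(1.0, 5.0)   # unconstrained minimizer 5 -> clipped to 1.0
interior = admm_ball(1.0, 0.5)   # constraint inactive -> converges to 0.5
```

Each ADMM step is a closed-form quadratic solve plus a projection, so the inner iterations stay cheap, which is consistent with the efficiency gains reported for the $D$-subproblem.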
Besides TECU, the DL problem can also be optimized by other numerical algorithms. However, as repeatedly stated, though the update of PALM is simple to compute, its need to estimate Lipschitz constants decelerates the overall convergence, especially for large-scale data (see Table 3). Two PALM variants, BCU [Xu and Yin2017] and iPALM [Pock and Sabach2017], have also been employed for solving the DL problem. But these variants do not avoid estimating Lipschitz constants; even worse, their efficiency is restricted by additional parameter conditions. To avoid these troubles, [Bao et al.2016] utilize a strategy (named INV) that solves the $D$-subproblem by first leaving out the indicator term and then directly projecting the result onto $\mathcal{D}$. However, the update of INV is inexact and lacks theoretical support, so its performance is not always robust (see Fig. 1).
Table 4. NIQE scores (lower is better) on the two benchmarks.

| Dataset    | HE   | BPDHE | MSRCR | GOLW | NPEA | SRIE | WVM  | JIEP | HDRNet | TECU |
| NASA       | 3.68 | 3.77  | 3.66  | 3.74 | 3.43 | 3.97 | 3.86 | 3.72 | 3.72   | 3.41 |
| Nonuniform | 4.28 | 3.09  | 3.04  | 2.99 | 2.99 | 3.02 | 2.94 | 2.97 | 3.11   | 2.91 |
Lowlight Image Enhancement
The purpose of LIE is to enhance captured low-visibility images so that high-quality images can be obtained. For LIE, the Retinex-based decomposition [Cai et al.2017, Fu et al.2016] is widely adopted: $O = L \circ R$, with the element-wise multiplication operator $\circ$. LIE thus amounts to factorizing the observed low-light image $O$ into an illumination layer $L$ that represents the light intensity and a reflectance layer $R$ that describes the physical characteristics of objects.
Considering the characteristics of the illumination layer, many works [Fu et al.2015, Cai et al.2017] enforce a smoothness term, i.e., $\|\nabla L\|_{F}^{2}$, to represent the smooth changes of the illumination layer. Together with the range constraints on both layers, we establish the following optimization model for the LIE task:
(10)  $\min_{L, R} \ \tfrac{1}{2}\|L \circ R - O\|_{F}^{2} + \lambda \|\nabla L\|_{F}^{2} + \delta_{\Omega_{L}}(L) + \delta_{\Omega_{R}}(R),$
with $\Omega_{L} = \{L : O \leq L \leq 1\}$ and $\Omega_{R} = \{R : 0 \leq R \leq 1\}$.
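Under an elementwise quadratic data term of this kind, a damped (proximal-style) reflectance update decouples per pixel into a scalar quadratic plus clipping. The sketch below is an illustration under that assumption, with an invented damping weight mu and toy pixel values, not the paper's exact update:

```python
def reflectance_update(L, O, R_prev, mu=0.1):
    """Damped reflectance update for the model
    min_R 0.5*||L∘R - O||^2 + (mu/2)*||R - R_prev||^2  s.t. 0 <= R <= 1:
    each pixel decouples into a scalar quadratic whose minimizer is
    then clipped to the range constraint.  Illustrative sketch."""
    R_new = []
    for l, o, r in zip(L, O, R_prev):
        r_star = (l * o + mu * r) / (l * l + mu)   # scalar quadratic minimizer
        R_new.append(max(0.0, min(1.0, r_star)))   # enforce 0 <= R <= 1
    return R_new

R = reflectance_update(L=[0.8, 0.5], O=[0.4, 0.1], R_prev=[0.5, 0.5])
```

The per-pixel closed form is what makes this coordinate a natural candidate for the proximal update in the hybrid scheme, leaving the harder illumination coordinate to the embedded network.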
Employing TECU to optimize this model, we embed a residual-type CNN to propagate the illumination layer very close to the desired solution. Specifically, we first randomly choose 800 image pairs from the ImageNet database [Krizhevsky, Sutskever, and Hinton2012] and crop them into small patches, to train a neural network with only one residual block comprising 7 convolutional layers with ReLU activations. Then, at each iteration, we first use the pretrained network to propagate the latest value of the illumination layer. Since the pretrained network may not always satisfy the error control conditions, we further employ prox-linear updates as the remaining propagations until Criterion 1 is satisfied. For the other subproblem, of the reflectance layer, we adopt the proximal update to get the closed-form solution.
Experimental Results
With the task embedded strategies designed for the DL and LIE problems, we apply TECU to DL on synthetic data to verify its nice convergence properties, and to LIE on real-world images. The experimental results on both tasks demonstrate the efficiency and effectiveness of embedding strategies into the CD algorithm, in comparison with other state-of-the-art methods.
Visual comparison on two example images; the numbers report NIQE score / computation time (s) for each result.

| Method    | Input    | SRIE        | WVM         | JIEP        | HDRNet       | TECU        |
| Example 1 | 4.95 / – | 4.95 / 0.13 | 4.60 / 2.08 | 4.67 / 1.11 | 4.79 / 4.94  | 4.56 / 0.10 |
| Example 2 | 2.67 / – | 2.77 / 0.51 | 2.76 / 5.90 | 2.71 / 3.12 | 2.63 / 13.79 | 2.55 / 0.31 |
Regularized Dictionary Learning
We generate synthetic data of different sizes (see Table 2) to help analyze the convergence properties of TECU. All algorithms are terminated when the following condition is satisfied:
(11)  $\|z^{k+1} - z^{k}\| / \|z^{k}\| \leq \varepsilon.$
Comparisons of Different Embedded Strategies
Firstly, we conduct experiments to compare two particular cases of our proposed framework. Precisely, "TECU" in Fig. 1, Table 2, and Table 3 denotes the case of embedding ADMM for updating $D$ while using the prox-linear update for the $C$-subproblem; "TECU-PITH" in Fig. 1 refers to embedding PITH for updating $C$ while applying the prox-linear update for the $D$-subproblem.
From the first row of Fig. 1, it is clear that these two cases of TECU have quite different convergence behaviors. From the top row of Fig. 1(d), we can see that TECU requires only a few propagation steps, whereas TECU-PITH reaches the preset maximum number of inner steps at almost every iteration. The excessive inner propagations decelerate the overall convergence, i.e., TECU takes 2.22 s but TECU-PITH takes nearly 600 s to converge.
This comparison shows, on the one hand, that ADMM is productive for optimizing the $D$-subproblem while PITH is less effective for the $C$-subproblem. On the other hand, the different performances are also influenced by the characteristics of the subproblems: the $D$-subproblem minimizes a strongly convex quadratic function under a unit-ball constraint and thus can be solved efficiently, whereas, due to the $\ell_{0}$ penalty, the $C$-subproblem is NP-hard and much more difficult to optimize. Therefore, this comparison of different embedded strategies indicates the necessity of the hybrid scheme in the TECU framework, and meanwhile suggests embedding high-efficiency numerical algorithms for subproblem optimization.
Comparisons with Other Algorithms
Compared with other existing algorithms, Table 2 and Fig. 1 show that TECU converges in fewer iterations and exhibits particularly good convergence behavior on the $D$-subproblem. Moreover, Table 3 shows that the computation time of one propagation in TECU is much smaller than that of the other algorithms, which explains why TECU performs more propagations in total yet needs less computation time. Since PALM, BCU, and iPALM all require estimating Lipschitz constants at every iteration, they have similar one-step computation times at each data scale, and estimating Lipschitz constants during the iterations is extremely time-consuming, especially for large-scale data. Thus, though embedding numerical algorithms brings more propagations, it avoids estimating Lipschitz constants and is therefore far more efficient than existing algorithms.
Lowlight Image Enhancement
Firstly, we conduct an experiment in Fig. 2 to compare TECU with the classical CD algorithm PAM [Attouch et al.2010] on an example image from [Cai, Gu, and Zhang2018]. Since [Cai, Gu, and Zhang2018] provides pairs of low-light images and references obtained by other techniques, we report PSNR values with respect to the given reference for quantitative evaluation. As shown in Fig. 2, TECU achieves superior performance in terms of both visual effect and PSNR score. Moreover, the PSNR curves plotted in Fig. 2(e) show that embedding the network into the classical CD scheme produces a clearly better growth trend than the classical scheme alone.
We compare TECU with state-of-the-art approaches including HE [Cheng and Shi2004], BPDHE [Sheet et al.2010], MSRCR [Rahman, Jobson, and Woodell2004], GOLW [Shan, Jia, and Brown2010], NPEA [Wang et al.2013], SRIE [Fu et al.2015], WVM [Fu et al.2016], JIEP [Cai et al.2017], and HDRNet [Gharbi et al.2017] on the NASA dataset [NASA2001] and the Nonuniform dataset [Wang et al.2013]. The NASA dataset contains 23 images of different indoor and outdoor scenes, while the Nonuniform dataset consists of 130 low-quality images of natural scenes including sunshine, overcast sky, and nightfall scenarios.
Due to the lack of ground truth, standard metrics (e.g., PSNR) cannot be used to evaluate quantitative performance on the LIE task. In previous works [Wang et al.2013, Fu et al.2016, Cai et al.2017], a blind image quality assessment called the Natural Image Quality Evaluator (NIQE) is widely used for quantitative evaluation of LIE. Following this, we present the NIQE scores in Table 4, comparing against all these state-of-the-art methods on the two benchmarks. The results in Table 4 indicate that TECU with the embedded network attains the lowest NIQE score and thus achieves the highest image quality. We also provide a visual comparison on examples selected from both datasets: TECU enhances image quality with high contrast, while the other results still leave details in the dark that are hard to recognize. Thus, from both quantitative and qualitative analyses, we conclude that embedding networks in the TECU framework is effective and competitive for the challenging LIE task.
Conclusion
We have proposed a realizable algorithmic framework, TECU, which embeds both numerical algorithms and advanced techniques for optimizing a generic multivariate nonconvex problem. Through embedding task-oriented strategies, TECU improves the convergence speed of the whole algorithm and obtains desired solutions with high probability. Moreover, we provide a realizable error control condition to ensure robust performance with rigorous theoretical support. The experimental results on two practical problems verify the superiority of the proposed algorithm.
Acknowledgments
This work was supported by National Natural Science Foundation of China (Grant Nos. 61672125, 61733002, 61572096, 61632019 and 61806057), China Postdoctoral Science Foundation (Grant No. 2018M632018) and the Fundamental Research Funds for the Central Universities.
References
 [Attouch et al.2010] Attouch, H.; Bolte, J.; Redont, P.; and Soubeyran, A. 2010. Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research 35:438–457.
 [Bach et al.2011] Bach, F.; Jenatton, R.; Mairal, J.; and Obozinski, G. 2011. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning 4(1):1–106.
 [Bao et al.2016] Bao, C.; Ji, H.; Quan, Y.; and Shen, Z. 2016. Dictionary learning for sparse coding: Algorithms and convergence analysis. IEEE TPAMI 38(7):1356–1369.
 [Bolte, Sabach, and Teboulle2014] Bolte, J.; Sabach, S.; and Teboulle, M. 2014. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146:459–494.
 [Cai et al.2017] Cai, B.; Xu, X.; Guo, K.; Jia, K.; Hu, B.; and Tao, D. 2017. A joint intrinsicextrinsic prior model for retinex. In CVPR.
 [Cai, Gu, and Zhang2018] Cai, J.; Gu, S.; and Zhang, L. 2018. Learning a deep single image contrast enhancer from multiexposure images. IEEE TIP 27(4):2049–2062.
 [Chan, Wang, and Elgendy2017] Chan, S. H.; Wang, X.; and Elgendy, O. A. 2017. Plug-and-play ADMM for image restoration: Fixed-point convergence and applications. IEEE TCI 3(1):84–98.
 [Cheng and Shi2004] Cheng, H., and Shi, X. 2004. A simple and effective histogram equalization approach to image enhancement. Digital Signal Processing 14(2):158–170.
 [Elad2010] Elad, M. 2010. Sparse and Redundant Representations. Springer New York.
 [Fu et al.2015] Fu, X.; Liao, Y.; Zeng, D.; Huang, Y.; Zhang, X.P.; and Ding, X. 2015. A probabilistic method for image enhancement with simultaneous illumination and reflectance estimation. IEEE TIP 24(12):4965–4977.
 [Fu et al.2016] Fu, X.; Zeng, D.; Huang, Y.; Zhang, X.P.; and Ding, X. 2016. A weighted variational model for simultaneous reflectance and illumination estimation. In CVPR.
 [Gharbi et al.2017] Gharbi, M.; Chen, J.; Barron, J. T.; Hasinoff, S. W.; and Durand, F. 2017. Deep bilateral learning for real-time image enhancement. ACM ToG 36(4):118.
 [Gregor and Lecun2010] Gregor, K., and Lecun, Y. 2010. Learning fast approximations of sparse coding. In ICML.
 [Guo and Ma2014] Guo, X.; Cao, X.; and Ma, Y. 2014. Robust separation of reflection from multiple images. In CVPR.
 [Kim and Fessler2018] Kim, D., and Fessler, J. A. 2018. Another look at the fast iterative shrinkage/thresholding algorithm (FISTA). SIAM Journal on Optimization 28(1).

 [Krizhevsky, Sutskever, and Hinton2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In NIPS.
 [Li and Brown2014] Li, Y., and Brown, M. S. 2014. Single image layer separation using relative smoothness. In CVPR.
 [Li and Pong2014] Li, G., and Pong, T. K. 2014. Global convergence of splitting methods for nonconvex composite optimization. arXiv.
 [Li et al.2016] Li, C.; Guo, J.; Cong, R.; Pang, Y.; and Wang, B. 2016. Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior. IEEE TIP 25(12):5664–5677.
 [Liu et al.2018] Liu, R.; Ma, L.; Wang, Y.; and Zhang, L. 2018. Learning converged propagations with deep prior ensemble for image enhancement. IEEE TIP.
 [Luo and Tseng1989] Luo, Z. Q., and Tseng, P. 1989. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications 72:7–35.
 [NASA2001] NASA. 2001. Retinex image processing. https://dragon.larc.nasa.gov/retinex/pao/news/.
 [Ortega et al.1970] Ortega, J. M., and Rheinboldt, W. C. 1970. Iterative solution of nonlinear equations in several variables. Academic Press.
 [Pock and Sabach2017] Pock, T., and Sabach, S. 2017. Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J. Imaging Sciences 9(4):1756–1787.
 [Rahman, Jobson, and Woodell2004] Rahman, Z.-u.; Jobson, D. J.; and Woodell, G. A. 2004. Retinex processing for automatic image enhancement. Journal of Electronic Imaging 13(1):100–111.
 [Schmidt and Roth2014] Schmidt, U., and Roth, S. 2014. Shrinkage fields for effective image restoration. In CVPR.
 [Shan, Jia, and Brown2010] Shan, Q.; Jia, J.; and Brown, M. S. 2010. Globally optimized linear windowed tone mapping. IEEE TVCG 16(4):663–675.
 [Sheet et al.2010] Sheet, D.; Garud, H.; Suveer, A.; Mahadevappa, M.; and Chatterjee, J. 2010. Brightness preserving dynamic fuzzy histogram equalization. IEEE TCE 56(4).
 [Spillane2016] Spillane, N. 2016. An adaptive multipreconditioned conjugate gradient algorithm. SIAM J. on Scientific Computing 38(3):A1896–A1918.
 [Vu and Monga2016] Vu, T. H., and Monga, V. 2016. Fast low-rank shared dictionary learning for image classification. IEEE TIP PP(99):1–1.
 [Wang et al.2013] Wang, S.; Zheng, J.; Hu, H.M.; and Li, B. 2013. Naturalness preserved enhancement algorithm for nonuniform illumination images. IEEE TIP 22(9):3538–3548.
 [Wang et al.2016] Wang, Y.; Liu, R.; Song, X.; and Su, Z. 2016. Linearized alternating direction method with penalization for nonconvex and nonsmooth optimization. In AAAI.
 [Wang et al.2018] Wang, Y.; Liu, R.; Ma, L.; and Song, X. 2018. Supplementary material of task embedded coordinate update: A realizable framework for multivariate nonconvex optimization.
 [Wang, Yin, and Zeng2018] Wang, Y.; Yin, W.; and Zeng, J. 2018. Global convergence of admm in nonconvex nonsmooth optimization. Journal of Scientific Computing 1–35.

 [Xu and Yin2015] Xu, Y., and Yin, W. 2015. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sciences 6(3):1758–1789.
 [Xu and Yin2017] Xu, Y., and Yin, W. 2017. A globally convergent algorithm for nonconvex optimization based on block coordinate update. Journal of Scientific Computing 72(2):700–734.
 [Yue et al.2017] Yue, H.; Yang, J.; Sun, X.; Wu, F.; and Hou, C. 2017. Contrast enhancement based on intrinsic image decomposition. IEEE TIP 26(8):3981–3994.
 [Zhang et al.2017] Zhang, K.; Zuo, W.; Gu, S.; and Zhang, L. 2017. Learning deep cnn denoiser prior for image restoration. In CVPR.
Supplementary Material of Task Embedded Coordinate Update: A Realizable Framework for Multivariate Nonconvex Optimization
In this supplementary material, the contents are presented in the following order:

1. Revisit the definition of the Kurdyka-Łojasiewicz (KŁ) property/function.

2. Give a detailed proof of Proposition 1.

3. Give more experimental results on the low-light image enhancement task.
Kurdyka-Łojasiewicz Property/Function
Definition 4
(Kurdyka-Łojasiewicz function) A proper, lower semicontinuous function $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is said to have the Kurdyka-Łojasiewicz property at $\bar{x}\in\operatorname{dom}\partial f$ if there exist $\eta\in(0,+\infty]$, a neighborhood $U$ of $\bar{x}$, and a desingularizing function $\varphi:[0,\eta)\to[0,+\infty)$ which satisfies (1) $\varphi(0)=0$; (2) $\varphi$ is $C^1$ on $(0,\eta)$ and continuous at $0$; (3) $\varphi'(s)>0$ for all $s\in(0,\eta)$, such that for all
(12) $x \in U \cap \{x \,:\, f(\bar{x}) < f(x) < f(\bar{x}) + \eta\}$,
the following inequality holds
(13) $\varphi'\big(f(x)-f(\bar{x})\big)\,\operatorname{dist}\big(0,\partial f(x)\big) \ge 1$.
Moreover, if $f$ satisfies the KŁ property at each point of $\operatorname{dom}\partial f$, then $f$ is called a KŁ function.
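As a simple sanity check (an illustrative example, not from the original text), the absolute-value function satisfies this definition at the origin with a linear desingularizing function:

```latex
% For f(x) = |x| at \bar{x} = 0, take \eta = +\infty and \varphi(s) = s,
% so \varphi(0) = 0 and \varphi'(s) = 1 > 0. For any x \neq 0 we have
% \partial f(x) = \{\operatorname{sign}(x)\}, hence
\varphi'\big(f(x) - f(\bar{x})\big)\,\operatorname{dist}\big(0, \partial f(x)\big)
  = 1 \cdot 1 \;\ge\; 1 .
```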
Detailed Proof of Proposition 1

From the calculations in Eq. (6), we can deduce the following equalities.
(14) Once the error satisfies Criterion 1, will be assigned as in Eq. (14); thus we get
(15) The above deduction can be extended analogously to the case of :
(16) From the definition of the proximal mapping operator, Eqs. (15) and (16) are equivalent to
(17) where and . The above equalities show that and are implementations of and in Eq. (4).
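For reference, the proximal mapping invoked above is the standard operator; for a proper function $g$ and a parameter $\lambda > 0$ (notation assumed here, since the paper's symbols are not reproduced), it reads:

```latex
\operatorname{prox}_{\lambda g}(x) \;=\; \operatorname*{arg\,min}_{y}\; g(y) + \frac{1}{2\lambda}\,\lVert y - x \rVert^{2} .
```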
Detailed Proofs of Convergence
First, we can conclude from Eq. (4) that the and updated by the task embedding strategy can be regarded as solutions to the following subproblems:
(18)  
This equivalent conversion is strict, since the first-order optimality conditions of Eq. (18) are exactly the same as those of Eq. (4). However, we emphasize that this conversion is only used for theoretical analysis: we do not directly optimize Eq. (18) in practice; instead, and are updated by the task embedding strategy, as stated in Alg. 1.
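As a generic reminder (with assumed notation, since the paper's symbols are not reproduced here), any minimizer of a proximally regularized subproblem satisfies the standard subgradient inclusion, which is the kind of first-order optimality condition matched between Eq. (18) and Eq. (4):

```latex
x^{k+1} \in \operatorname*{arg\,min}_{x}\; \Phi(x) + \frac{\beta}{2}\,\lVert x - x^{k} \rVert^{2}
\quad\Longrightarrow\quad
0 \in \partial \Phi(x^{k+1}) + \beta\,\big(x^{k+1} - x^{k}\big) .
```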
Since our proposed TECU is a hybrid framework which contains three different updates at each iteration, we revisit the proximal update, the prox-linear update, and the theoretically equivalent form of our novel task-embedding update:
For solving subproblem:

Proximal: , .

Proxlinear: , .

Task embedding: , .
For solving subproblem:


Proximal: , .

Proxlinear: , .

Task embedding: , .
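As a toy illustration only (made-up objective and notation, not the authors' implementation), the difference between a proximal update and a prox-linear update can be sketched in Python on the problem f(u, v) = 0.5*||A u + B v - c||^2 + mu*||v||_1, where the u-subproblem is solved exactly in closed form and the v-subproblem is handled by linearizing the smooth part and soft-thresholding:

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1 (elementwise soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def cd_prox_proxlinear(A, B, c, mu, iters=200, beta=1.0):
    """Alternate a proximal update on u with a prox-linear update on v for the
    toy objective f(u, v) = 0.5*||A u + B v - c||^2 + mu*||v||_1."""
    n, p = A.shape[1], B.shape[1]
    u, v = np.zeros(n), np.zeros(p)
    L = np.linalg.norm(B, 2) ** 2  # Lipschitz modulus of the gradient w.r.t. v
    for _ in range(iters):
        # Proximal update on u: minimize f(u, v) + (beta/2)*||u - u_prev||^2,
        # a ridge-regularized least-squares problem with a closed-form solution.
        u = np.linalg.solve(A.T @ A + beta * np.eye(n),
                            A.T @ (c - B @ v) + beta * u)
        # Prox-linear update on v: gradient step on the smooth part,
        # followed by the proximal operator of the l1 penalty.
        grad_v = B.T @ (A @ u + B @ v - c)
        v = soft_threshold(v - grad_v / L, mu / L)
    return u, v
```

A task-embedding update would replace one of these generic steps with a task-specific solver for that coordinate, which is exactly the flexibility the framework above argues for.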
There are nine combinations in total under the TECU framework. However, we are only interested in the ones that contain at least one task-embedding update, i.e., we consider the following cases:
In what follows, we prove that this hybrid algorithm TECU has a nice convergence property: it generates a Cauchy sequence that converges to a critical point of the original objective function.
Proof of Proposition 2

Notice that “”, “” are the same as “”, “”, since and can be interchanged. Thus we only give detailed proofs for the cases “”, “” and “”.
(Sufficient descent property: Eq. (8a))
For “” with task embedding updates on both subproblems, we have the following inequalities:
(19) Adding the above two inequalities, we obtain the following inequalities with positive real numbers and :
(20) The last inequality comes from applying Young’s inequality. Then, combining this with Criterion 1 for TECU, we have:
(21) where the last equality holds by setting and .
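For completeness, the form of Young's inequality applied in the step above is the standard one: for vectors $a, b$ and any $\epsilon > 0$,

```latex
\langle a, b \rangle \;\le\; \frac{\epsilon}{2}\,\lVert a \rVert^{2} + \frac{1}{2\epsilon}\,\lVert b \rVert^{2} .
```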
By denoting , , and , the above inequality is equivalent to:
(22) For “”, we have the following inequality from the iterative scheme of the proximal update:
(23) Then, together with the second inequality of Eq. (19), we have that
(24) where the last inequality holds by setting .
Then, by denoting and setting , and , we have . Thus, the following inequality holds:
(25) For the case “”, the prox-linear update indicates that
(26) Together with the descent lemma for functions with Lipschitz-continuous gradients described in [Ortega et al.1970]:
(27)
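The descent lemma cited here has the standard form: if a smooth function $h$ has an $L_h$-Lipschitz-continuous gradient (generic notation, symbols assumed), then for all $x, y$,

```latex
h(y) \;\le\; h(x) + \langle \nabla h(x),\, y - x \rangle + \frac{L_h}{2}\,\lVert y - x \rVert^{2} .
```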