Edge-preserving smoothing (EPS) has attracted a strong interest in the fields of image processing and computer vision. Predominantly, it appears in a manipulation task that requires decomposing an image into a piecewise smooth layer and a detail layer. These layered signals can be recombined to match various application goals, e.g., detail enhancement and tone mapping . Recent works on joint smoothing provide a new paradigm, enabling various applications such as dense correspondence , joint upsampling [3, 4], and texture removal . The problem of segmentation  and visual saliency  may also be interpreted as a joint smoothing problem. The basic idea of joint smoothing is to provide structural guidance of how the smoothing should be performed. Thus, it is assumed that the guidance signal has enough information to alter structures in the input image. In this context, global optimization-based methods [1, 8, 9] are advocated. They find an optimal solution to an objective function that consists of a data fidelity term and a prior term. Thanks to such global formulation, the optimization-based methods show the state-of-the-art performance compared with local EPS approaches that typically have a form of weighted averaging . However, this outperformance can be achieved only at the price of high computational cost, mainly arising from solving the global objective function. The optimization-based methods are still an order of magnitude slower than local ones [10, 11, 12], even with recent acceleration techniques [13, 14, 15]. Progress in hardware will make an efficient implementation possible, but it is not unlikely that an image resolution will increase as well. Accordingly, a different optimization technique is needed for a highly efficient global EPS methods.
In this paper, we formulate a global EPS (e.g., based on weighted-least squares or weighted-total variation) as an equivalent constrained optimization problem.
This formulation results in a sequence of sub-problems that is much easier to optimize.
The computational efficiency of our approach is due to a kind of variable splitting techniques.
However, unlike the previous splitting-based methods111The most expensive operation in previous methods, in the senses of alternating minimization [14, 15] or half quadratic optimization  , is fast Fourier transforms with
, is fast Fourier transforms withcomplexity. [14, 15], our formulation enables using a highly efficient, linear time algorithm in both weighted-least squares (WLS) and -total variation (WTV) problems. As a result our approach has a time complexity linear to the number of image pixels. Another appealing aspect is that it converges with only few iterations. We also propose fast iteratively re-weighted algorithms for an objective function using a non-convex prior term. Note that the previous splitting-based methods in [14, 15] are not directly applicable to many non-convex priors of practical relevance .
2 Problem formulation and Analysis
EPS can be formulated as finding a solution of an objective function. We can find the solution by using popular iterative methods  or splitting-based approaches [14, 15], depending on the prior terms used in the objective function. We start with a basic formulation for EPS to provide some intuition. In the following, the subscript denotes the 2D spatial location of a pixel.
Given an input image and a guidance image , a desired output is obtained by minimizing the following objective function:
where and are discrete implementations of the derivative in horizontal and vertical directions, respectively. controls the strength of the smoothing. can be the input image or a different signal correlated with . The weight is defined using and constant . The potential function and the weight function allow one to employ various image priors that behave differently in preserving or smoothing image features. We always assume that and have the same width () and the height ().
2.1 WLS for EPS
When , the objective function of (1) becomes quadratic, and it corresponds to the WLS framework . The minimizer satisfies the following linear system:
where is a discrete difference matrix, is a diagonal matrix containing the weights , and
is an identity matrix. Iterative solvers such as Jacobi and conjugate gradient methods
are applicable to solve the sparse linear system of (2). Since these methods consist of matrix-vector multiplications, each iteration runs in linear time to an image size. However, the number of iterations, required for achieving a particular accuracy, depends on the matrix dimension () , and hence the computational cost is considerable. Despite recent progress in preconditioning techniques, all the existing solvers are still an order of magnitude slower than the state-of-the art local EPS approaches [10, 11]. Moreover, the cost of constructing the preconditioner may outweigh the speed gain from the improved conditioning.
Similarly, the preconditioning techniques  may not accelerate EPS algorithms using the iteratively re-weighted least squares (IRLS) method . An intermediate linear system of the IRLS method varies during external iterations and thus a series of preconditioners should be constructed, leading to a huge amount of computational overhead.
2.2 WTV for EPS
The WTV prior, , often achieves a better capability along boundaries [5, 8], but it needs more computational cost than using in the WLS. A common approach to minimizing the WTV objective function is to exploit the variable-splitting and penalty techniques, as follows :
where is a penalty parameter. An auxiliary variables and are introduced for an alternative minimization of the data and the prior terms. When is fixed, minimizing the objective function of (3) with respect to can be solved in a closed-form, and vice versa. The solver is hence iteratively applied while updating , , and :
Although numerous methods, such as alternating direction method of multipliers (ADMM)  and split-Bregamn (SB) , have been proposed, they use the fast Fourier Transform222The linear system (5) can be diagonalized by FFT. Thus, solving (5) requires three FFT calls. (FFT) to update the variable . As a result, the computational cost required for solving (3) is per iteration ().
In this section, we first propose an efficient method of minimizing (1) when or . We then apply it to solve non-convex objective functions. The key idea is to decompose (1) into each spatial dimension with a proper variable splitting, and then to use a constrained optimization technique.
Consider the following optimization problem with linear equality constraint:
We again denote by an auxiliary variable. In our formulation, the size of is equal to the input . Then, it is clear that (6) is equivalent to the original problem (1), under the constraint . The penalty decomposition method333We have tested the augmented Lagrangian method  to avoid using large values (continuation), but found this method did not improve a convergence rate, while increasing a memory demand (due to Lagrange multiplier). , associated with (6) can be written as:
This problem can be solved with block coordinate descent by minimizing (7) with respects to and , alternately 444For scalar variables, ..
where , , and .
Now, we are ready to show the advantage of our formulation. As represents a difference operator with respect to the horizontal axis, we can decompose (8) into sub-problems defined with 1D horizontal signals only. By introducing a 1D slack variable , we have:
The super-script denotes a horizontal signal along the dimension (). This 1D optimization process is repeated for all horizontal signals ( in number). Note that a similar result can be obtained for the auxiliary variable . In this case, (9) is decomposed into a sub-problem with a 1D vertical signal along the dimension ().
To the best of our knowledge, the variable splitting technique has been applied in such a way that , yielding complexity algorithm , as explained in Section 2. The previous variable splitting [15, 8, 16] aims to transfer and out of the potential function by introducing the auxiliary variable and , respectively. In contrast, we utilize it to decompose the original problem of (1) into a series of 1D sub-problems. This not only leads to easily solvable sub-problems, but also significantly improves the convergence rate of the algorithm (see Section 4). In the following, we present an efficient method of solving the objective form of (10) defined with a 1D horizontal signal. Its vertical counterpart can be optimized in the same manner.
3.1 1D Fast solver
With , (10) can be rewritten with a 1D horizontal signal and a guide signal as
where . The 1D output that minimizes the above equation is obtained by solving the following linear system of size .
Note that the size of and is , not as in (2). Interestingly, the problem (12) becomes much easier to solve than (2) since is a tridiagonal matrix. We can solve this equation with cost () by the Thomas algorithm . It consists of forward-backward steps. More details can be found in the supplementary material.
When , (10) is written as follows:
The IRLS algorithm can be applied to solve (13) approximately . However, there exists a non-iterative, method for the 1D total variation . While this method is designed to solve 1D (un-weighted) total variation, it is possible to extend it to minimize 1D WTV. Note that this extension enables the fast iteratively re-weighted L1 (IRL1) algorithm for 2D image smoothing. See Section 3.3 for details.
We introduce the (Fenchel-Moreau) dual form of (13) as follows:
where is the dual variable. Once the solution of the problem (14) is found, we can recover the solution of its primal form by
The optimality condition which characterizes the solutions and is then expressed as
More details about the derivation of (14) and (16) are available at the supplementary material.
3.2 2D Smoothing algorithm and properties
The proposed algorithm in the case of 2D image smoothing is summarized in Algorithm 1. Given an input , a guidance and a smoothing parameter , the global 1D smoothing operations are sequentially performed along with the horizontal and vertical directions. At each iteration, the input for 1D horizontal smoothing is re-calculated as the linear combination of the auxiliary variable and (line 5). In the same manner, 1D vertical smoothing is applied to the linear combination of and (line 11). These reflect the smoothing effect from the other side. The 1D smoothing parameter gradually decreases by factor of (line 6, 12). The algorithm is terminated after iterations.
Although the original formulation (1) is decomposed into a series of sub-problems, we compute exact solutions of the objectives (8) and (9) defined on the 1D dimension. This property considerably accelerates the convergence speed of the algorithm. Moreover, since these sub-problems can be decoupled, a straightforward parallelization is possible.
As , Algorithm 1 is convergent.
See the supplementary material.
Our approach is not rotationally invariant, as the original formulation of (1) enforces the smoothness term to be aligned for each axis individually. For the WLS smoothing, we do not observe any visible artifacts in all experiments. A similar observation can be found in . When , it prefers object boundaries which are minimal in the Manhattan () distance. This may give blocky artifacts in the region boundaries. In order to alleviate this problem, Algorithm 1 can be customized to perform a smoothing using 8-neighborhood system we should introduce two more auxiliary variables (by symmetry) and additionally iterate diagonal and anti diagonal passes555But, all results in this paper are obtained using a neighborhood system..
3.2.1 Continuation parameter
Our approach uses a continuation parameter to gradually increase a penalty parameter , at each iteration. This ensures that the auxiliary variable should be close to , and thus Algorithm 1 is convergent (Proposition 1). Fig. 1 shows the energy evolutions of WLS and WTV with different values of =2, 4, and 8. Using a large makes the convergence faster, but the objective value after the convergence becomes slightly higher than that of using a small . In this paper, we set by considering the trade-off between speed and accuracy.
3.3 Extension to fast iteratively re-weighted algorithms
Our approach can be extended for a wider class of potential functions by using iteratively re-weighted algorithms . Let us consider general heavy-tailed potentials , then the original objective function of (1) is written as
The principle of iteratively re-weighted algorithms is to find a convex function, which serves as the upper bound of (17). It allows for explicit algorithms according to the construction of this bound, i.e., IRLS vs IRL1. In the following, we will use to represent the external iteration index used to minimize (17).
3.3.1 Fast IRLS (FIRLS)
The IRLS approach uses the fact that many potentials can be seen as minima of the quadratic upper bound 
. Using an intermediate estimate, we obtain the following update rule:
where . Minimizing (18) is equivalent to solving a linear system (2), except that a Laplacian matrix is computed with . Thus, we can obtain using our fast WLS solver with internal iterations. This extension is attractive in several aspects. First, our approach can be applied to a range of applications, which require for sparse image gradients and sharp edges (See Section 5). Second, our approach involves only simple arithmetic operations for solving (18). While most existing linear solvers  suffer from additionally computing the preconditioner at each external iteration, as an intermediate linear system varies with .
3.3.2 Fast IRL1 (FIRL1)
Theoretically, the IRLS algorithm can be applied only when the potential function is well approximated with a quadratic upper bound . This, however, does not cover interesting functions such as and (), which are concave and non-differentiable at origin. Here we extend our fast WTV solver to the FIRL1 algorithm:
where , is concave, and denotes the sub-gradient. The algorithm exploits a convex function obtained by linearizing the potential . Note that (19) serves as the upper bound of (17) due to concavity of .
Obviously, each of the IRL1 iterations is the WTV problem, which can be solved efficiently using our solver. The pseudo-code for our fast iteratively re-weighted algorithms is provided in Algorithm 2.
4 Experimental Validation
We evaluate the convergence and runtime performance of our solver. The experiments are simulated with a single Intel i7 3.4GHz CPU. Our approach is easy to implement and we will release the source code at public website. All compared methods have been implemented in MATLAB with MEX interface. Primary parameters in our approach are set as: , and . At each iteration, we gradually increase by factor of . The smoothing parameters and are set to and , respectively, for all methods (the range of image values is ).
For convergence analysis of the WLS smoothing, we compare our approach (FWLS) with the conjugate gradient (CG), Gauss-Seidel (GS), Jacobi iteration (JI), and successive over relaxation (SOR). We use the SuiteSparse library  to solve a triangular system, arising from GS and SOR. In the case of the WTV smoothing, our approach (FWTV) is compared to the split Bregman method (SB) [8, 25] and classical penalty decomposition (classical-PD) . The FFTW library  is used to solve the -subproblem of (5) for SB and PD-classical methods. We show in Fig. 2(a) how the WLS objective evolves at each iteration (all methods are initialized by the input ). Although each iteration of CG, GS, JI, and SOR runs in a linear time, they require a very large number of iterations to converge. In contrast, our solver converges in a few iterations. The result of the WTV is shown in Fig. 2(b). Likewise, our approach converges much faster than SB and classical-PD. Note that these methods have complexity per iteration , while our solver runs in linear time. The input image is shown in the inset Fig 1.
|Image size||PCG ||MATLAB “ ”||FWLS||SB [8, 25]||FWTV|
To further demonstrate the effectiveness of our approach, we additionally collect 100 natural images from the BSDS500 dataset , and compare the WLS smoothing results. The smoothing result obtained by MATLAB “ ” command is used as the reference. The average value of the structural similarity (SSIM) index  is plotted as a function of the number of iterations in Fig. 3(a). The difference images between the reference and the results at 20 iteration is also visualized in Fig. 3(b)-(e). In general, we find the proposed approach yields satisfactory results after . The average SSIM values at , , and are 0.9896, 0.9963, and 0.9975, respectively.
The runtime comparison is shown in Table 1. In our methods, the number of iterations is fixed to 5 based on the above experiment. For the WLS smoothing, our approach is compared with the state-of-the-art preconditioning method (PCG)  and MATLAB “ ” operator. We note that MATLAB “ ” operator uses the sparse cholesky decomposition in the SuiteSparse library . The result for the PCG  is obtained from source code provided by the author. It should be noted that although the preconditioning method  improves the rate of convergence significantly, constructing the preconditioner needs lots of setup time, taking about 2.6 second for 1 image. The stopping criteria of the SB [8, 25] is . Our approach is magnitude faster than competing methods.
Next, we present the analysis of our fast iteratively re-weighted algorithms. Using the FIRLS, we first minimize the objective666The same image in Fig. 1 is used and is set to 7.65 (17) with . Fig. 4(a) shows that the convergence rate of the FIRLS method differs depending on the number of inner iterations . For the FIRL1, we use as it is not well suited for the potential function that has a quadratic behavior around 0 (Fig. 4(b)). Overall, we observe that the iteratively re-weighted algorithms are stuck in bad local minima if internal iterations are not carried out sufficiently. The objective value even fluctuates with , as the red line of Fig. 4(a). It is thus crucial to solve the subproblems of iteratively re-weighted algorithms until reaching the certain level of accuracy. In general, we find , is a good choice for both algorithms777On this example, an average value of per-pixel difference, i.e., is 0.15 after external iterations (the range of image values is ).. Many heavy-tailed potential functions are non-convex, and thus any warm initializations for may further improve the convergence of our fast iteratively re-weighted algorithms.
Our approach is flexible, and can be applied to several tasks using EPS. In this section, we apply our method to texture removal, content-based color quantization, and style transfer. To this end, the various image priors, i.e., potential function and weight , are exploited to match different application goals. All results and runtime in the comparison are obtained from source codes provided by the authors. The parameters are carefully tuned through extensive experiments. Additional results and other applications, such as scale-space filtering and image denoising, are also available in the supplementary material.
5.1 Texture removal
For texture removal, we employ the model proposed in : is set to and where
is the Gaussian kernel with standard deviation 2. This type of guidance image is very effective since textures on the object are usually of small scale structures. The smoothing parameters, and are fixed to 5 and 7.65, respectively, but varies according to images. In this setting, we minimize the objective (17) using our FIRLS solver. Fig. 5 shows examples of texture removal obtained by the rolling guidance filter (RGF) , the weighted median filter (WMF) , the relative total variation (RTV) , and our FIRLS. The RGF  is implemented using fast bilateral filter . The proposed method is even faster than the texture smoothing tools based on the fast local filtering (RGF , WMF ), while outperforming them in the subjective evaluation. Note that minimizing the objective function in the RTV  needs to solve a large linear system iteratively888In the RTV , the prior term is defined in a form of total variation, resulting in a nonlinear system of equation. It can be easily addressed by using a fixed point iteration.. Thus, it can be also accelerated by our FWLS solver.
5.2 Color quantization
(Content-based) color quantization attempts to increase the sparsity of colors while maintaining the overall structure in images. This is useful for many vision tasks, including image retrieval and segmentation. We use the sparsity prior,, to reduce the number of colors. The guidance image is not used in this application, i.e., . With this, we minimize the objective (17) using our FIRL1 solver. Results and comparisons are provided in Fig. 6. Our solver shows performance comparable to state-of-the-art minimization [16, 29], while running faster. The result in Fig. 6(c) has been obtained using the region fusion (RF) approach , which is tailored to minimization only.
5.3 Style transfer
Any feature of input image or others can be taken as the guidance . To transfer structure of to input images , we use our FWLS and FWTV solvers. Fig. 7 shows a specific style transfer example obtained by the weighted median filter (WMF)  and our FWLS. The edge map of is used as the guidance image . The window size of the WMF  is set to , taking about 0.4 seconds. Our FWLS and FWTV solvers take only 0.08 and 0.28 seconds, respectively, for an image of size .
We introduce a highly efficient splitting-based method for global EPS. Unlike previous splitting-based methods, our formulation enables linear time solvers for WLS and WTV problems. Our solver converges quickly, and its runtime is comparable to state-of-the-art local EPS approaches. We also propose fast iteratively re-weighted algorithms for a non-convex objective function. Our approach is flexible, and thus is applicable to a variety of applications.
-  Z. Farbman, R. Fattal, D.L., Szeliski, R.: Edge-preserving decompositions for multi-scale tone and detail manipulation. ACM TOG 27(3) (2008)
-  C. Rhemann, A. Hosni, M.B.C.R., Gelautz, M.: Fast cost-volume filtering for visual correspon-dence and beyond. in Proc. CVPR (2008)
-  J. Kopf, M. Cohen, D.L., Uyttendaele, M.: Joint bilateral upsampling. ACM TOG 26(3) (2007)
-  J. Park, H. Kim, Y.W.T.M.S.B., Kweon, I.: High quality depth map upsampling for 3d-tof cameras. in Proc. ICCV (2011)
-  L. Xu, Q. Yan, Y.X., Jia, J.: Structure extraction from texture via relative total variation. ACM TOG 31(6) (2012)
-  T. Kim, K.L., Lee, S.: Generative image segmentation using random walks with restart. in Proc. ECCV (2008)
-  F. Perazzi, P. Krhenbhl, Y.P., Hornung, A.: Saliency filters: Contrast based filtering for salient region detection. in Proc. CVPR (2012)
-  S. Bi, X.H., Yu, Y.: An image transform for edge-preserving smoothing and scene-level intrinsic decomposition. ACM TOG 34(4) (2015)
-  B. Ham, M.C., Ponce, J.: Robust image filtering using joint static and dynamic guidance. in Proc. CVPR (2015)
-  K. He, J.S., Tang, X.: Guided image filtering. in Proc. ECCV (2010)
-  Gastal, E.S.L., Oliveira, M.M.: Domain transform for edge-aware image and video processing. ACM TOG 30(4) (2013)
-  J. Chen, S. Paris, F.D.: Real-time edge-aware image processing with the bilateral grid. ACM TOG 26(3) (2007)
-  D. Krishnan, R.F., Szeliski, R.: Efficient preconditioning of laplacian matrices for computer graphics. ACM TOG 32(4) (2013)
-  Y. Wang, J. Yang, W.Y., Zang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM 1(3) (2008)
-  M. V. Afonso, J.M.B.D., Figueiredo, M.A.T.: Fast image recovery using variable splitting and constrained optimization. IEEE TIP 19(9) (2010)
-  L. Xu, C. Lu, Y.X., Jia, J.: Image smoothing via l0 gradient minimization. ACM TOG 30(6) (2011)
-  Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. in Proc. CVPR (2014)
-  Saad, Y.: Iterative methods for sparse linear system. SIAM, Philadelphia (2003)
-  Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. in Proc. ICASSP (2008)
-  Nocedal, J., Wright, S.: Numerical optimization. Springer (2006)
-  Golub, G.H., Loan, C.F.V.: Matrix computations. JHU Press (2006)
-  Condat, L.: A direct algorithm for 1d total variation denoising. IEEE SPL 20(11) (2013)
-  P. Ochs, A. Dosovitskiy, T.B., Pock, T.: On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision. SIAM 8(1) (2015)
-  Online.: http://faculty.cse.tamu.edu/davis/suitesparse.html.
-  Goldstein, T., Osher, S.J.: The split bregman method for l1-regularized problems. JIS 2(2) (2009)
-  Online.: http://www.fftw.org/.
-  Z. Wang, A. C. Bovik, H.R.S., Simoncelli, E.P.: Image quality assessment:from error visibility to structural similarity. IEEE TIP 13(4) (2004)
-  P. Arbelaez, M. Maire, C.F., Malik, J.: Contour detection and hierarchical image segmentation. IEEE TPAMI 33(5) (2011)
-  Nguyen, R.M.H., Brown, M.S.: Fast and effective gradient minimization by region fusion. in Proc. ICCV (2015)
-  Q. Zhang, L.X., Jia, J.: 100 times faster weighted median filter. in Proc. CVPR (2014)
-  Q. Zhang, X. Shen, L.X., Jia, J.: Rolling guidance filter. in Proc. ECCV (2014)