In image analysis applications, it is desirable to use variational models with regularisers that involve higher order derivatives: These higher order models offer better smoothness and avoid, e.g., staircasing artifacts that are present in various first order variational approaches. Unfortunately, higher order variational models are numerically unpleasant. For instance, minimising a second order energy gives rise to a fourth order partial differential equation (PDE) as Euler-Lagrange equation. Higher order PDEs are more cumbersome to discretise, and their numerical approximations suffer from higher condition numbers, such that algorithms converge more slowly or even become unstable.
As a remedy, coupling models have been proposed. They introduce coupling terms with auxiliary variables. For instance, one approximates the gradient of the processed image by an auxiliary variable. In this way a second order energy in a single variable is replaced by a first order energy in two variables. As Euler-Lagrange equations, one solves a system of second order PDEs, which is numerically more pleasant than solving a fourth order PDE.
Coupling models have a long history. Already in 1990, Horn came up with a coupling model for shape from shading. For denoising and deblurring, Chambolle and Lions investigated coupling models that combine several convex functionals into one regulariser. With this they were able to modify the total variation regularisation model and to reduce unpleasant staircasing artifacts. A transfer of this infimal convolution approach to the discrete setting was proposed by Setzer et al. Also Bredies et al.
performed research in this direction and introduced a total generalised variation regulariser based on symmetric tensors of varying order. This allows a selective regularisation with different smoothness constraints. More recent coupling models have become highly sophisticated. Hewer et al. introduced three coupling variables to replace a fourth order energy minimisation. Hafner et al. considered coupling models that are anisotropic both in the coupling and the smoothness term, and Brinkmann et al.
used coupling models in connection with regularisers based on natural vector field operations, including divergence and curl terms.
In spite of their popularity and sophistication, however, the main motivation behind all these coupling approaches has been numerical efficiency. Additional benefits of the coupling formulation have hardly been investigated so far.
Our Contributions. The goal of our paper is to show that the coupling formulation is much more than just a numerical acceleration strategy. We will see that it provides valuable benefits from three different perspectives that have remained unexplored so far:
From a modelling perspective, we argue that coupling terms may free the user from designing nonquadratic smoothness terms if discontinuity-preserving solutions are needed. Moreover, we show that it can be worthwhile to replace even first order energies by coupling models.
From a theoretical perspective, we demonstrate that coupling models may give insights into the connections between two of the most widely used variational approaches in image analysis: the total variation (TV) regularisation of Rudin et al. and the Mumford-Shah functional for image segmentation. These insights also give rise to a new algorithm for Mumford-Shah segmentation that differs from existing ones in that it minimises a strictly convex energy. Thus, we can guarantee convergence of a split Bregman algorithm.
From an application perspective, we show that sparsity promoting coupling terms may provide useful information as an edge detector that comes at no additional cost.
Paper Structure. Our paper is organised as follows: In Section 2, we derive the framework for a first and second order model that incorporates the coupling term. We show the relation to the Mumford-Shah functional and elaborate why the coupling term can be interpreted as an edge detector. Section 3 deals with the numerical aspects that lead to the minimisation of our approach in the discrete setting and showcases the advantages of the split Bregman formulation. In Section 4, we present experimental results, and we conclude the paper in Section 5.
2 Our Models
In this section we propose two coupling models for edge-preserving image enhancement. We relate them to total variation regularisation and the segmentation approach of Mumford and Shah, and we argue that the coupling term provides a novel edge detector.
2.1 Coupling Framework
2.1.1 First Order Model.
Let our image domain be an open bounded set, and let the given greyscale image contain noise with zero mean. Our goal is to find a noise free version of that image. As this is in general an ill-posed problem, we employ variational approaches to find a reconstruction that is as close as possible to the noise free image and at the same time satisfies certain a priori smoothness conditions. One of the earliest models, going back to Whittaker and Tikhonov, obtains the reconstruction as the minimiser of the energy functional
where denotes the Euclidean norm. The first term guarantees that the solution is close to the given data and is therefore called the data term. The second term penalises deviations from a smooth solution and is denoted as the smoothness term. The regularisation parameter steers the smoothness of the solution. In practice we deal with a rectangular, finite domain. This prompts us to choose homogeneous Neumann boundary conditions. These coincide with the natural boundary conditions that arise when we derive the Euler-Lagrange equations of (1).
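The quadratic model can be sketched in one dimension. Assuming energy (1) consists of a squared data term plus a squared gradient term with a regularisation weight, its Euler-Lagrange equation becomes the linear system (I + alpha DᵀD)u = f with forward differences D and homogeneous Neumann boundary conditions (a minimal sketch; the function and parameter names are ours, not the paper's):

```python
import numpy as np

def tikhonov_1d(f, alpha):
    """Minimise 1/2|u-f|^2 + alpha/2|Du|^2 via its linear Euler-Lagrange
    system (I + alpha D^T D) u = f; forward differences, Neumann BC."""
    n = len(f)
    D = np.zeros((n, n))
    for i in range(n - 1):            # last row stays zero: Neumann boundary
        D[i, i], D[i, i + 1] = -1.0, 1.0
    return np.linalg.solve(np.eye(n) + alpha * D.T @ D, f)

rng = np.random.default_rng(2)
f = rng.standard_normal(50)
u = tikhonov_1d(f, alpha=5.0)
assert np.isclose(u.mean(), f.mean())                 # Neumann BC preserves the mean
assert np.sum(np.diff(u)**2) < np.sum(np.diff(f)**2)  # result is smoother than the data
```

The mean preservation follows because each row of D sums to zero, so the constant vector lies in the null space of DᵀD.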
We modify (1) to a constrained problem by introducing an auxiliary vector field . We set and obtain
Enforcing the constraint weakly in the Euclidean norm yields the coupling model
with parameter . Since the newly introduced term inherently couples the two variables, we call it the coupling term. It is obvious that, apart from trivial cases, the vector field is not equal to the gradient of the function. However, by following the proof in , one can show that
which means that they are "equal in the mean". The vector field inherits its boundary conditions. We will discuss the details in the next section.
At first sight, the coupling model does not offer any advantage over the energy (1) and seems rather to complicate it. Indeed, by calculating the Euler-Lagrange equations and performing a simple substitution, we see that the minimiser for of the functional in (3) is equivalent to the minimiser of energy (1) with parameter . For the limit case , which corresponds to a hard constraint of , we thus obtain the same solution. In the other limit case , the energy (3) corresponds to (1) with regularisation parameter .
In order to obtain our first order coupling model, we only apply a small modification to the coupling term in (3). We recall that a function is said to be of bounded variation (BV) if its (distributional) gradient is a Radon measure of finite total mass. Then we can define the total variation seminorm as
If we consider the difference as an argument, we extend (5) and obtain
With this we formulate our first order coupling model:
It is obvious that the model (7) is strictly convex due to the quadratic data term. Moreover, we also have a quadratic smoothness term, and only the coupling term contains a more sophisticated penaliser. It is therefore no surprise that for the hard constraint , the energies (7) and (3) are still identical. The more interesting case arises in the limit : In order to fulfil the constraint, the vector field should be zero, which yields the regulariser (5) for the coupling term. Energy (7) is then equivalent to the classical TV denoising model of Rudin et al. However, we will later demonstrate that the structure of the coupling model is beneficial both for an efficient minimisation and for superior denoising results. We note that it can be shown that model (7) is equivalent to denoising with a Huber TV penaliser.
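The equivalence to a Huber TV penaliser can be checked numerically in one dimension. Assuming the first order model penalises the auxiliary variable quadratically with a weight alpha and penalises the coupling difference in the 1-norm with a weight beta, minimising over the auxiliary variable pointwise yields exactly the Huber function (a sketch under these assumptions; all names and values are illustrative, not taken from the paper):

```python
import numpy as np

def coupled_penalty(g, alpha, beta):
    """min over v of alpha/2 * v^2 + beta * |g - v|, by dense grid search."""
    v = np.linspace(-2 * abs(g) - 1, 2 * abs(g) + 1, 200001)
    return np.min(0.5 * alpha * v**2 + beta * np.abs(g - v))

def huber(g, alpha, beta):
    """Huber penalty: quadratic below beta/alpha, linear above."""
    if abs(g) <= beta / alpha:
        return 0.5 * alpha * g**2
    return beta * abs(g) - beta**2 / (2 * alpha)

alpha, beta = 2.0, 1.0
for g in [-3.0, -0.2, 0.0, 0.3, 1.5]:
    assert abs(coupled_penalty(g, alpha, beta) - huber(g, alpha, beta)) < 1e-4
```

For small arguments the optimal auxiliary variable equals the argument itself (quadratic regime); for large ones it saturates at beta/alpha, producing the linear regime of the Huber function.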
2.1.2 Second Order Model.
While TV regularisation offers a good compromise between image reconstruction and edge preservation, it introduces "blocky" structures into the reconstructed images, the so-called staircasing artifacts. One way to avoid these artifacts is to use second order models [25, 30, 21]. Due to the coupling formulation, a higher regularisation order can easily be realised in our framework: Instead of the vector field, we penalise its Jacobian in the Frobenius norm. The resulting energy is given by
In order to see that (8) approximates a second order model, we again discuss the limiting case . The smoothness term favours a constant vector field . For a hard coupling, this means that constant first derivatives of lead to the smallest energy. Hence, the solution corresponds to an affine function, which is characteristic for second order regularisation methods. The model resembles the second order total generalized variation (TGV) :
where denotes the symmetrised gradient, which is penalised in a TV-like norm in the smoothness term. In contrast, the smoothness term in our first and second order models is quadratic. In Section 3 we will see that this allows very efficient algorithms.
2.2 Relation to the Segmentation Model of Mumford and Shah
Let be defined as before and let be a compact curve in . The segmentation model by Mumford and Shah minimises the following energy :
The function approximates as a sufficiently smooth function in , but admits discontinuities across . The curve denotes the edge set between the segments. Its length can be expressed as , the one-dimensional Hausdorff measure in . The parameter penalises the length of the edge set, whereas steers the smoothness of in .
Unfortunately, the Mumford-Shah energy is highly non-convex and difficult to minimise. Ambrosio and Tortorelli  showed that the solution can be approximated by a sequence of regular functions defined on Sobolev spaces and proved Γ-convergence. However, their energy is still non-convex, and the quality of the solution depends on the initial parameter setting of the Γ-approximation.
Convex approximations of the Mumford-Shah model exist as well. However, the price for convexity is a smooth minimiser that cannot contain any discontinuities, such that a postprocessing step is necessary to obtain a segmentation.
In order to relate equation (11) to our first order model (7), we decompose the image such that the regularisers affect different parts of it. Such approaches based on an infimal convolution were introduced by Chambolle and Lions . With this we obtain
The decomposition lets us define the parts on different spaces. While , we have . Thus, we admit jumps in , which makes a postprocessing step unnecessary. By introducing an auxiliary vector field that is designed to approximate , we consequently obtain for the argument of the last term:
Substituting this in (12) brings us back to the coupling model (7). Therefore, our first order coupling model approximates the Mumford-Shah segmentation model. Obviously, this convex approximation is smoother than other variants, as we do not perform an enhancement at edges. However, we still allow jumps in the regularised image. Moreover, we stress that the approximation is strictly convex and admits a unique solution.
The reasoning for the first order model can easily be transferred to the second order model. Energy (8) can thus be understood as an approximation to a second order Mumford-Shah model. One example for a second order approximation is the model of Blake and Zisserman :
Here, denotes the Hessian, and the Hausdorff measures on the curves and penalise not only the length of the edges between segments as before, but also the length of the curves that describe creases in the image. More recent variants are discussed in [9, 15]. They all have in common that they require the Ambrosio-Tortorelli approach to compute a minimiser. Our second order model offers an alternative that manages without the Γ-approximation.
2.3 Interpretation of the Coupling Term as an Edge Detector
In the previous subsection, we have established a connection between our coupling framework and the Mumford-Shah functional. The difficulty in the original formulation lies in the computation of the edge set . In the following, we argue that this edge set is related to the introduced coupling term, which is directly accessible without additional effort.
We observe that the smoothness terms in the energies (7) and (8) are quadratic. On the one hand, this means that the auxiliary field is still a smooth approximation. On the other hand, the coupling term contains the sparsity-promoting norm. As a result, deviates from only at a small number of positions, which appear as peaks in the energy of the coupling term. Since these peaks correspond to discontinuous jumps in the solution , the coupling term encodes the desired edge set and can naturally be utilised as an edge detector.
We distinguish two limiting cases. For , we end up with the hard constraint such that the coupling energy (6) evaluates to zero. This makes sense, as the model then simplifies to energy (1) again, and is a smooth solution without jumps. For , the functions and are decoupled. A minimiser is reached when corresponds to the given image and . In this case, the coupling term represents a simple gradient-based edge detector without presmoothing. Such a detector is not reliable in the presence of high frequency oscillations and noise, as it responds to many false edges.
Between these limiting cases, an increasing value of allows the coupling term to partition the image into more and more homogeneous regions, since this yields a smaller energy. Thus, on each scale , we solve a convex variational model and obtain a unique solution. The coupling term serves as a novel edge detector that provides a globally optimal edge set. The discontinuity-preserving norm in the underlying model allows the coupling term to respond at edges, but makes it robust against false edges caused by noise. As it also promotes sparsity, it limits the detector's response to the most relevant edges, which are well localised.
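As a toy illustration of this edge detector, one can evaluate the pointwise coupling magnitude for a hypothetical minimiser pair. We assume forward differences for the gradient and mimic a strong coupling response at an edge by a zero vector field; all names and the threshold are illustrative, not the paper's actual quantities:

```python
import numpy as np

def coupling_edge_map(u, v1, v2):
    """Pointwise magnitude |grad(u) - v| as an edge indicator.
    Forward differences with replicated (Neumann) boundary."""
    ux = np.diff(u, axis=1, append=u[:, -1:])  # du/dx, zero at right boundary
    uy = np.diff(u, axis=0, append=u[-1:, :])  # du/dy, zero at bottom boundary
    return np.sqrt((ux - v1)**2 + (uy - v2)**2)

# Toy example: a vertical step edge; v = 0 mimics a sparse coupling response.
u = np.zeros((6, 6)); u[:, 3:] = 1.0
e = coupling_edge_map(u, np.zeros_like(u), np.zeros_like(u))
edges = e > 0.5          # simple threshold instead of hysteresis
assert edges[:, 2].all() and not edges[:, 0].any()
```

In the actual model, the minimisation itself delivers both fields, so the edge map is a byproduct rather than a separate filtering step.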
3 Numerical Aspects
In this section, we argue that the discrete variables in our models must live on different grids in order to obtain consistent approximations. We replace a conventional alternating minimisation by a split Bregman scheme , which accelerates the minimisation process tremendously. We first discuss the scalar-valued problem and then explain the modifications needed for vector-valued data.
3.1 Discretisation of the Variables
In order to implement the model on a computer we introduce a discrete setting based on a finite difference discretisation. We consider the domain to be rectangular, i.e. . Then the discrete domain is given on a regular Cartesian grid with nodes:
with grid size . The discrete locations within correspond w.l.o.g. to the continuous locations within . Discretising the scalar-valued functions and at these positions yields the variables . We approximate the derivatives with a finite difference scheme. As we require the consistency order of our approximation to be at least two, we use central differences around a grid shifted by . To be consistent with this approximation, the discretisations of the components of consequently need to live on dual grids shifted by exactly this half grid size . More precisely, the entries are defined at positions , whereas the entries are defined at positions , as they approximate the partial derivatives in different directions. This also extends to the boundary conditions. As already mentioned, the vector field inherits its boundary conditions from the homogeneous Neumann boundary conditions of . This means that the components of satisfy zero Dirichlet boundary conditions in the direction of the shifted grid and homogeneous Neumann boundary conditions otherwise.
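The staggered discretisation can be sketched as follows. Under the assumption that forward differences map onto the half-shifted grids (grid size 1 for simplicity), the Neumann conditions on the image translate into zero Dirichlet conditions for the vector field components in the shifted direction, which makes the discrete divergence the exact negative adjoint of the discrete gradient (a sketch under these assumptions, with our own function names):

```python
import numpy as np

def grad(u):
    """Forward differences; components live on grids shifted by half a cell.
    Homogeneous Neumann BC for u => zero difference at the far boundary."""
    gx = np.zeros_like(u); gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy = np.zeros_like(u); gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(gx, gy):
    """Negative adjoint of grad: backward differences, assuming zero Dirichlet
    BC for the vector field components in the shifted direction."""
    d = np.zeros_like(gx)
    d[:, 0] = gx[:, 0]; d[:, 1:] += gx[:, 1:] - gx[:, :-1]
    d[0, :] += gy[0, :]; d[1:, :] += gy[1:, :] - gy[:-1, :]
    return d

rng = np.random.default_rng(0)
u = rng.standard_normal((7, 5))
px = rng.standard_normal((7, 5)); px[:, -1] = 0.0  # Dirichlet BC in x
py = rng.standard_normal((7, 5)); py[-1, :] = 0.0  # Dirichlet BC in y
gx, gy = grad(u)
lhs = np.sum(gx * px + gy * py)        # <grad u, p>
rhs = -np.sum(u * div(px, py))         # -<u, div p>
assert abs(lhs - rhs) < 1e-10          # discrete adjointness holds exactly
```

Without the Dirichlet rows/columns, the adjointness test fails by a boundary term, which illustrates why the vector field must inherit its boundary conditions from the image.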
3.2 Discretisation of the Energies
For sufficiently smooth data , we can define a regularised penaliser of (5) by
where denotes the discrete Euclidean norm. The coupling term contains the discrete counterpart of the regularised penalising function (16) in the form of
which evaluates the regulariser (16) pixelwise with
The approximation of the discrete gradient components and lives on a grid shifted by half a grid size in the corresponding positive coordinate direction and incorporates the homogeneous Neumann boundary conditions.
In the function , the parameter defines the regularisation order of the problem. For , we have , and for we use given by
We employ the same discretisation for and as above, whereas we introduce the approximations and as central differences on a grid that is shifted in the respective negative coordinate direction, with Dirichlet boundary conditions.
If we solve problem (17) in a naive way, we set the derivatives with respect to the components and to zero and perform an alternating iterative minimisation. The nonlinearity is always evaluated at the old iterate. The minimisation problem for leads to
with the diffusivity vector and a discrete divergence operator . For , the auxiliary vector field requires us to solve the following equations:
whereas for we obtain a system of equations of the form
Here, denotes a direct sum of discrete Laplace operators that are applied to each component of separately.
3.3 The Split Bregman Method
Unfortunately, the solution of (21) is a bottleneck in the alternating minimisation process. The derivatives of the regularised penaliser that serve as a diffusivity slow down the convergence drastically when we apply iterative solvers. As a remedy, we employ the split Bregman method of Goldstein and Osher . This operator splitting approach is equivalent to the alternating direction method of multipliers (ADMM) and the Douglas-Rachford splitting algorithm, which are suitable for convex optimisation problems such as (17) and whose convergence is guaranteed [6, 26].
We introduce the discrete vector field that lives on the same grid as . We set and obtain the constrained problem
Enforcing the constraint strictly with a numerical parameter yields the split Bregman iteration at step :
Even though the split Bregman method introduces an additional minimisation problem, the previous nonlinear equations are simplified substantially due to the quadratic structure of the data and smoothness terms. The subproblems for and are then linear by design and yield the optimality conditions
Equation (28) leads to a simple pointwise computation. For the other two subproblems, it suffices to apply a few steps of an iterative solver. This is in accordance with Goldstein and Osher, who state that full convergence of the subproblems is not necessary for the convergence of the overall algorithm .
The last subproblem for contains the nonlinearity, which now occurs only pointwise and not as a diffusivity inside a divergence term. We obtain the update rule
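Since the explicit update rule is not reproduced here, the following is the standard pointwise vectorial soft shrinkage to which such split Bregman subproblems typically reduce (a hedged sketch; the array layout and threshold are our illustrative choices):

```python
import numpy as np

def shrink(z, t, eps=1e-12):
    """Pointwise soft shrinkage of a vector field z with shape (2, M, N):
    each pixel's vector is shortened by t in its Euclidean norm."""
    norm = np.sqrt(np.sum(z**2, axis=0))
    scale = np.maximum(norm - t, 0.0) / np.maximum(norm, eps)
    return scale * z

z = np.zeros((2, 1, 1)); z[0, 0, 0] = 3.0; z[1, 0, 0] = 4.0  # norm 5
out = shrink(z, 1.0)
assert np.allclose(out[:, 0, 0], [2.4, 3.2])      # shrunk from norm 5 to norm 4
assert np.allclose(shrink(0.2 * z, 2.0), 0.0)     # norm 1 <= threshold 2 -> zero
```

Vectors with norm below the threshold are set to zero, which is exactly the sparsity-promoting behaviour of the coupling term.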
We terminate the algorithm as soon as the residual of equations (21)–(23), relative to the initial residual, drops below a prescribed threshold. If we are also interested in the resulting edge set, we can obtain it without additional effort by evaluating .
3.4 Vector-valued Extension
An extension to vector-valued images is straightforward. The Euclidean norm in the data and smoothness terms allows a separate computation for each channel, which opens up the possibility of a parallel implementation. The only coupling between the channels occurs in (19). Therefore, only the nonlinearities in the pointwise update rule (30) are affected.
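The channel coupling in the nonlinear update can be realised by a joint shrinkage over all channels, so that all channels share a single edge set. This is our illustrative reading of a channel-coupled pointwise rule, not a verbatim reproduction of the paper's update:

```python
import numpy as np

def joint_shrink(z, t, eps=1e-12):
    """Channel-coupled shrinkage: z has shape (channels, 2, M, N). Each pixel's
    stacked vector over all channels and both derivative directions is shrunk
    jointly, so all channels respond at the same edge positions."""
    norm = np.sqrt(np.sum(z**2, axis=(0, 1)))          # joint norm per pixel
    scale = np.maximum(norm - t, 0.0) / np.maximum(norm, eps)
    return scale[None, None] * z

z = np.zeros((3, 2, 1, 1)); z[:, 0, 0, 0] = [2.0, 1.0, 2.0]  # joint norm 3
out = joint_shrink(z, 1.5)
assert np.isclose(np.sqrt(np.sum(out**2)), 1.5)  # joint norm reduced by t
```

Compared to shrinking each channel separately, the joint norm prevents colour channels from producing inconsistent edge sets.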
4 Experiments
4.1 Denoising Experiment
[Fig. 1: test image; TV result (MSE: 35.06); cross section of row 87, TV model]
In this section, we discuss the application of our first and second order coupling models to denoising. We compare our models to TV denoising  and the combined first and second order approach TV-TV by Papafitsoros et al. . We set the regularisation parameter of the penalising function to  and stop the algorithms once the relative residual drops below . For the solution of the linear subproblems, we perform an inner loop of Jacobi steps per iteration. The numerical parameter influences the number of iterations until convergence. Empirically, we found that it should be chosen in dependence of  and the regularisation order . We fix . We choose the parameters for each method such that the mean squared error (MSE) w.r.t. the ground truth is minimised.
In Fig. 1, we consider an affine test image of size  that is corrupted by Gaussian noise with . We first consider denoising with first order models. Our model beats the classical TV model w.r.t. the MSE. We also see a qualitative improvement in a horizontal cross section of the images in the last row: The staircasing artifacts are reduced significantly.
[Fig. 2: cross section of row 87, TV-TV model]
We compare the reconstructions by second order models in Fig. 2. While the TV-TV model beats our first order coupling model only slightly w.r.t. the MSE, our second order model presents a large improvement. In the 1D plot we see that no staircasing artifacts arise.
This experiment also illustrates the efficiency of the split Bregman implementation for solving our problems. We compare two implementations: On the one hand, we employ an alternating minimisation of equations (21) and (22) resp. (23), realised with a standard conjugate gradient method. On the other hand, we apply a split Bregman implementation based on the subproblems that arise from equation (25).
Figure 3 shows the decay of the relative residual as a function of the runtime. For both the first and the second order model, we gain a speed-up of more than one order of magnitude with the split Bregman scheme.
4.2 Convex Mumford-Shah Approximation
To understand how the coupling model approximates the Mumford-Shah functional, we first discuss the influence of the parameters. The parameter determines the smoothness between segments. For a fixed value of , increasing reduces the number of segments and consequently the number of detected edges. To counteract the segment fusion steered by , the parameter should be chosen relatively large to benefit from the TV-like effect inside each segment.
In Fig. 4, we show results of our first order model for varying values of , together with the edge set obtained from the coupling term. For better visualisation, we perform a hysteresis thresholding of the edge set . Our method is able to produce smooth segments. For an increasing value of , more and more segments are merged. For instance, the smaller hole in the rock vanishes for a larger value. As already mentioned, the resulting images are rather smooth within each segment due to the quadratic smoothness term, but we still obtain sharp edges between segments. For an increasing value, we obtain a sparser edge set.
[Fig. 4: original image and results for varying parameter values]
4.3 Application to Edge Detection
The previous subsection shows that an edge detector can be obtained directly from the coupling term. In Fig. 5, we illustrate its high robustness under noise. We compare it to the Canny edge detector , a classical edge detection method. To cope with the noise, the Canny edge detector presmooths the image by convolution with a Gaussian kernel. While most large scale structures are detected, edges of smaller details are dislocated or merged. This can be seen, e.g., at the branch below the bird. Our method is edge-preserving and captures these small details.
5 Conclusions and Outlook
We have shown that coupling terms in variational models are useful for more than just easing the numerics. Already first order models that contain a coupling term in the sparsity-promoting norm are sufficient for edge-preserving denoising. For the remaining data and smoothness terms, we can use simple quadratic norms. This not only simplifies the numerical minimisation scheme, it also challenges the common belief that edge-preserving denoising requires a nonquadratic smoothness term. The first order coupling model also reveals a theoretical connection to the classical segmentation model of Mumford and Shah: As it is based on a strictly convex energy, we obtain an approximation of the Mumford-Shah functional that has a unique solution and for which the minimisation is guaranteed to converge. Furthermore, the coupling term itself carries valuable information: It has the properties of a global edge detector that is robust against noise and comes at literally no cost as a byproduct of the minimisation.
In our ongoing research, we investigate how the coupling formulation can be extended to applications such as deblurring and inpainting, and how it can be combined with anisotropic regularisation. Moreover, we intend to exploit the simplicity of the quadratic terms in parallel implementations to obtain even more efficient solvers.
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 741215, ERC Advanced Grant INCOVID).
-  R. Acar and C. R. Vogel. Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems, 10:1217–1229, 1994.
-  L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, Oxford, 2000.
-  L. Ambrosio and V. Tortorelli. Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence. Bollettino della Unione Matematica Italiana, 7:105–123, 1992.
-  M. Bildhauer, M. Fuchs, and J. Weickert. An alternative approach towards the higher order denoising of images. Analytical aspects. Journal of Mathematical Sciences, 224(3):414–441, July 2017.
-  A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, Cambridge, MA, 1987. Paperback reprint edition.
-  S. Boyd, N. Parikh, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3:1–122, 2010.
-  K. Bredies, K. Kunisch, and T. Pock. Total generalized variation. SIAM Journal on Imaging Sciences, 3(3):492–526, 2010.
-  E.-M. Brinkmann, M. Burger, and J. S. Grah. Unified models for second-order TV-type regularisation in imaging-a new perspective based on vector operators. Journal of Mathematical Imaging and Vision, 61(5):571–601, June 2019.
-  M. Burger, T. Esposito, and C. I. Zeppieri. Second-order edge-penalization in the Ambrosio–Tortorelli functional. Multiscale Modeling & Simulation, 13(4):1354–1389, 2015.
-  M. Burger, K. Papafitsoros, E. Papoutsellis, and C. Schönlieb. Infimal convolution regularisation functionals of BV and spaces. Part I: The finite case. Journal of Mathematical Imaging and Vision, 55(3):343–369, July 2016.
-  X. Cai, R. Chan, and T. Zeng. A two-stage image segmentation method using a convex variant of the Mumford–Shah model and thresholding. SIAM Journal on Imaging Sciences, 6(1):368–390, 2013.
-  J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8:679–698, 1986.
-  A. Chambolle and P.-L. Lions. Image recovery via total variation minimization and related problems. Numerische Mathematik, 76:167–188, 1997.
-  D. Martin and C. Fowlkes. Berkeley Segmentation and Boundary Detection Benchmark and Dataset. http://www.cs.berkeley.edu/projects/vision/grouping/segbench, 2003.
-  J. Duan, Y. Ding, Z. Pan, J. Yang, and L. Bai. Second order Mumford–Shah model for image denoising. In Proc. 2015 IEEE International Conference on Image Processing, pages 547–551, Sep. 2015.
-  T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences, 2(2):323–343, 2009.
-  D. Hafner, C. Schroers, and J. Weickert. Introducing maximal anisotropy into second order coupling models. In J. Gall, P. Gehler, and B. Leibe, editors, Pattern Recognition, volume 9358 of Lecture Notes in Computer Science, pages 79–90. Springer, Cham, 2015.
-  A. Hewer, J. Weickert, T. Scheffer, H. Seibert, and S. Diebels. Lagrangian strain tensor computation with higher order variational models. In T. Burghardt, D. Damen, W. Mayol-Cuevas, and M. Mirmehdi, editors, Proc. 24th British Machine Vision Conference, pages 129.1–129.10, Bristol, Sept. 2013. BMVA Press.
-  B. K. P. Horn. Height and gradient from shading. International Journal of Computer Vision, 5(1):37–75, 1990.
-  J. M. Hyman and M. Shashkov. Natural discretizations for the divergence, gradient, and curl on logically rectangular grids. Computers and Mathematics with Applications, 33(4):81–104, 1997.
-  M. Lysaker, A. Lundervold, and X.-C. Tai. Noise removal using fourth-order partial differential equations with applications to medical magnetic resonance images in space and time. IEEE Transactions on Image Processing, 12(12):1579–1590, Dec. 2003.
-  D. Mumford and J. Shah. Optimal approximation of piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, 42:577–685, 1989.
-  K. Papafitsoros and C. B. Schönlieb. A combined first and second order variational approach for image reconstruction. Journal of Mathematical Imaging and Vision, 48(2):308–338, Feb 2014.
-  L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
-  O. Scherzer. Denoising with higher order derivatives of bounded variation and an application to parameter estimation. Computing, 60:1–27, 1998.
-  S. Setzer. Split Bregman algorithm, Douglas-Rachford splitting and frame shrinkage. In X.-C. Tai, K. Mørken, M. Lysaker, and K.-A. Lie, editors, Scale Space and Variational Methods in Computer Vision, pages 464–476. Springer, 2009.
-  S. Setzer, G. Steidl, and T. Teuber. Infimal convolution regularizations with discrete ℓ1-type functionals. Communications in Mathematical Sciences, 9(3):797–872, 2011.
-  A. N. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics Doklady, 4:1035–1038, 1963.
-  E. T. Whittaker. A new method of graduation. Proceedings of the Edinburgh Mathematical Society, 41:65–75, 1923.
-  Y.-L. You and M. Kaveh. Fourth-order partial differential equations for noise removal. IEEE Transactions on Image Processing, 9(10):1723–1730, 2000.