A total variation based regularizer promoting piecewise-Lipschitz reconstructions

03/12/2019 · Martin Burger et al. · University of Cambridge, FAU

We introduce a new regularizer in the total variation family that promotes reconstructions with a given Lipschitz constant (which can also vary spatially). We prove regularizing properties of this functional and investigate its connections to total variation and infimal convolution type regularizers TVLp and, in particular, establish topological equivalence. Our numerical experiments show that the proposed regularizer can achieve similar performance as total generalized variation while having the advantage of a very intuitive interpretation of its free parameter, which is just a local estimate of the norm of the gradient. It also provides a natural approach to spatially adaptive regularization.







1 Introduction

Since it was introduced in [20], total variation (TV) has been popular in image processing due to its ability to preserve edges while imposing sufficient regularity on the reconstructions. There have been numerous works studying the geometric structure of TV-based reconstructions (e.g., [19, 17, 11, 16, 8]). A typical characteristic of these reconstructions is the so-called staircasing [19, 17], which refers to piecewise-constant reconstructions with jumps that are not present in the ground truth image. To overcome the issue of staircasing, many other TV-type regularizers have been proposed, perhaps the most successful of which being Total Generalized Variation (TGV) [3, 5]. TGV uses derivatives of higher order and favours reconstructions that are piecewise-polynomial; in the most common case of TGV^2 these are piecewise-affine.

While TGV greatly improves the reconstruction quality compared to TV, the fact that it uses second-order derivatives typically results in slower convergence of iterative optimization algorithms and therefore increases the computational costs of the reconstruction. Therefore, there has been an effort to achieve a performance similar to that of TGV with a first-order method (i.e. a method that only uses derivatives of the first order). In [9, 10], infimal convolution type regularizers TVL^p have been introduced that use an infimal convolution of the Radon norm and an L^p norm applied to the weak gradient of the image. For an exponent p ∈ (1, ∞], TVL^p is defined as follows

TVL^p_{α,β}(u) := min_{w ∈ L^p(Ω; R^d)} α ||Du − w||_M + β ||w||_{L^p(Ω)},

where Du is the weak gradient, α, β > 0 are constants and ||·||_M denotes the Radon norm. It was shown that for finite p the reconstructions are piecewise-smooth, while for p = ∞ they somewhat resemble those obtained with TGV^2.

The regularizer we introduce in the current paper also aims at achieving a performance similar to that of second-order methods while relying only on first-order derivatives. It can be seen either as a relaxation of TV obtained by extending its kernel from constants to all functions with a given Lipschitz constant (for this reason, we call the new regularizer TV_pwL, with 'pwL' standing for 'piecewise-Lipschitz'), or as an infimal convolution type regularizer, in which the Radon norm is convolved with the characteristic function of a certain convex set.

We start with the following motivation. Let Ω ⊂ R^d be a bounded Lipschitz domain and f ∈ L^2(Ω) a noisy image. Recall the ROF [20] denoising model

min_{u ∈ BV(Ω)} (1/2) ||u − f||^2_{L^2(Ω)} + α ||Du||_M,

where Du ∈ M(Ω; R^d) is the weak gradient, M(Ω; R^d) is the space of vector-valued Radon measures and α > 0 is the regularization parameter. Introducing an auxiliary variable w, we can rewrite this problem as follows

min_{u, w} (1/2) ||u − f||^2_{L^2(Ω)} + α ||Du − w||_M   s.t.   w = 0.

Our idea is to relax the constraint on w as follows

|w| ≤ γ,

for some positive constant, function or measure γ. Here |w| is the variation measure corresponding to w and the symbol ≤ denotes a partial order in the space of signed (scalar-valued) measures. This problem is equivalent to

min_{u ∈ BV(Ω)} (1/2) ||u − f||^2_{L^2(Ω)} + α TV_pwL^γ(u),   (1)

which we take as the starting point of our approach.

This paper is organized as follows. In Section 2 we introduce the primal and dual formulations of the functional TV_pwL^γ and prove their equivalence. In Section 3 we prove some basic properties of TV_pwL^γ and study its relationship with other TV-type regularizers. Section 4 contains numerical experiments with the proposed regularizer.

2 Primal and Dual Formulations

Let us first clarify the notation of the inequality for signed measures in (1).

Definition 1

We call a measure μ ∈ M(Ω) positive if for every measurable subset A ⊆ Ω one has μ(A) ≥ 0. For two signed measures μ, ν we say that μ ≤ ν if ν − μ is a positive measure.

Now let us formally define the new regularizer.

Definition 2

Let Ω ⊂ R^d be a bounded Lipschitz domain and γ ∈ M(Ω) a finite positive measure. For any u ∈ L^1(Ω) we define

TV_pwL^γ(u) := inf_{w ∈ M(Ω; R^d), |w| ≤ γ} ||Du − w||_M,

where ||·||_M denotes the Radon norm and |w| is the variation measure [4] corresponding to w, i.e. for any measurable subset A ⊆ Ω

|w|(A) := sup { Σ_i ||w(A_i)||_2 : {A_i} a measurable partition of A }

(see also the polar decomposition of measures [1]).

The inf in Definition 2 can actually be replaced by a min, since we are dealing with a metric projection onto a closed convex set in the dual of a separable Banach space. We also note that for γ = 0 we immediately recover total variation.

We defined the regularizer in the general case of γ being a measure, but we can also choose γ to be a Lebesgue-measurable function or a constant. In this case the inequality is understood in the sense |w| ≤ γ dx, where dx is the Lebesgue measure, resulting in w being absolutely continuous with respect to dx. We will not distinguish between these cases in what follows and just write |w| ≤ γ.
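In a discrete 1D setting the infimum in Definition 2 can be computed in closed form: minimizing Σ_i |d_i − w_i| over |w_i| ≤ γ_i decouples pointwise and amounts to shrinking each gradient entry towards zero by γ_i. A minimal sketch (the function name tv_pwl and the forward-difference discretization are our own choices, not from the paper):

```python
import numpy as np

def tv_pwl(u, gamma, h=1.0):
    """Discrete 1D TV_pwL: pointwise shrinkage of the gradient by gamma."""
    d = np.diff(u) / h  # forward-difference gradient
    # min over |w_i| <= gamma_i of |d_i - w_i| equals max(|d_i| - gamma_i, 0)
    return h * np.sum(np.maximum(np.abs(d) - gamma, 0.0))

u = np.array([0.0, 1.0, 2.0, 7.0, 8.0])  # slope 1 with one jump of height 5
print(tv_pwl(u, gamma=0.0))  # 8.0: gamma = 0 recovers plain TV
print(tv_pwl(u, gamma=1.0))  # 4.0: slope-1 parts are free, only the jump is penalized
```

A spatially varying γ can be passed as an array of length len(u) − 1, matching the measure/function/constant cases discussed above.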

As with standard TV, there is also an equivalent dual formulation of TV_pwL^γ.

Theorem 2.1

Let γ ∈ M(Ω) be a positive finite measure and Ω ⊂ R^d a bounded Lipschitz domain. Then for any u ∈ L^1(Ω) the functional TV_pwL^γ can be equivalently expressed as follows

TV_pwL^γ(u) = sup_{φ ∈ C^1_0(Ω; R^d), |φ|_2 ≤ 1} ∫_Ω u div φ dx − ∫_Ω |φ|_2 dγ,

where |φ|_2 denotes the pointwise 2-norm of φ.


Since by the Riesz–Markov–Kakutani representation theorem the space of vector-valued Radon measures M(Ω; R^d) is the dual of the space C_0(Ω; R^d), we can rewrite the expression in Definition 2 as follows

TV_pwL^γ(u) = inf_{|w| ≤ γ} sup_{φ ∈ C_0(Ω; R^d), |φ|_2 ≤ 1} ∫_Ω φ · d(Du − w).

In order to exchange inf and sup, we need to apply a minimax theorem. In our setting we can use the Nonsymmetrical Minimax Theorem from [2, Th. 3.6.4]. Since the set {φ ∈ C_0(Ω; R^d) : |φ|_2 ≤ 1} is bounded, convex and closed and the set {w ∈ M(Ω; R^d) : |w| ≤ γ} is convex, we can swap the infimum and the supremum and obtain the following representation

TV_pwL^γ(u) = sup_{|φ|_2 ≤ 1} ( ∫_Ω φ · dDu − sup_{|w| ≤ γ} ∫_Ω φ · dw ) = sup_{|φ|_2 ≤ 1} ∫_Ω φ · dDu − ∫_Ω |φ|_2 dγ.

Noting that the supremum can actually be taken over φ ∈ C^1_0(Ω; R^d), we obtain

TV_pwL^γ(u) = sup_{φ ∈ C^1_0(Ω; R^d), |φ|_2 ≤ 1} − ∫_Ω u div φ dx − ∫_Ω |φ|_2 dγ,

which yields the assertion upon replacing φ with −φ.
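In the discrete 1D setting the equality of the primal and dual values can be checked numerically: for each gradient entry the dual problem maximizes φ_i d_i − γ_i |φ_i| over |φ_i| ≤ 1, which can be brute-forced on a grid. A small sanity check (the sample values of d and γ are made up for illustration):

```python
import numpy as np

d = np.array([2.0, -0.5, 3.0])     # sample discrete gradient entries
gamma = np.array([1.0, 1.0, 0.5])  # pointwise bounds

# primal value: 1-norm distance of d to the box |w_i| <= gamma_i
primal = np.sum(np.maximum(np.abs(d) - gamma, 0.0))

# dual value: componentwise sup over |phi_i| <= 1 of phi_i*d_i - gamma_i*|phi_i|
phi = np.linspace(-1.0, 1.0, 2001)
dual = sum(np.max(phi * di - gi * np.abs(phi)) for di, gi in zip(d, gamma))

print(primal, dual)  # both 3.5
```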

3 Basic Properties and Relationship with other TV-type Regularizers

Influence of γ.

It is evident from Definition 2 that a larger γ yields a larger feasible set and a smaller value of the infimum. Therefore, TV_pwL^{γ_2}(u) ≤ TV_pwL^{γ_1}(u) whenever γ_1 ≤ γ_2. In particular, we get that TV_pwL^γ(u) ≤ TV(u) for any γ ≥ 0.

Lower-semicontinuity and convexity.

Lower-semicontinuity is clear from Definition 2 if we recall that the infimum is actually a minimum. Convexity follows from the fact that TV_pwL^γ is an infimal convolution of two convex functions.

Absolute one-homogeneity.

Noting that TV_pwL^γ(u) is the distance from Du to the convex set {w : |w| ≤ γ}, we conclude that it is absolutely one-homogeneous if and only if this set consists of just zero, i.e. when γ = 0 and TV_pwL^0 = TV.


We have seen that TV_pwL^γ(u) ≤ TV(u) for any u, i.e. TV_pwL^γ is a lower bound for TV. If γ is finite, the converse inequality (up to an additive constant) also holds and we obtain topological equivalence of TV_pwL^γ and TV.

Theorem 3.1

Let Ω ⊂ R^d be a bounded Lipschitz domain and γ ∈ M(Ω) a positive finite measure. For every u ∈ BV(Ω) we obtain the following relation:

TV(u) − γ(Ω) ≤ TV_pwL^γ(u) ≤ TV(u).


We already established the right inequality. For the left one we observe that for any w such that |w| ≤ γ the following estimate holds

||Du − w||_M ≥ ||Du||_M − ||w||_M ≥ TV(u) − γ(Ω),

which also holds for the infimum over w.

The left inequality in Theorem 3.1 ensures that TV_pwL^γ inherits the coercivity of TV, since TV_pwL^γ(u) ≤ C implies TV(u) ≤ C + γ(Ω). Upon adding the L^1 norm, we also get coercivity on BV(Ω). This ensures that TV_pwL^γ can be used for regularisation of inverse problems in the same scenarios as TV.

Topological equivalence between TV_pwL^γ and TV is understood in the sense that if one is bounded then the other one is too. Not being absolutely one-homogeneous, however, TV_pwL^γ cannot be used to define an equivalent norm on BV(Ω).
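The sandwich bounds of Theorem 3.1 are easy to verify numerically in the discrete 1D setting, where γ(Ω) becomes the sum of the pointwise bounds (a sketch under our own discretization, with random test data):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(50)              # arbitrary discrete signal
gamma = np.abs(rng.standard_normal(49))  # spatially varying nonnegative bound

d = np.diff(u)
tv = np.sum(np.abs(d))                               # TV(u)
tv_pwl = np.sum(np.maximum(np.abs(d) - gamma, 0.0))  # TV_pwL^gamma(u)
gamma_total = np.sum(gamma)                          # discrete analogue of gamma(Omega)

assert tv - gamma_total <= tv_pwl <= tv              # Theorem 3.1
```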

Null space.

We will study the null space in the case when γ is a Lebesgue-measurable function and the inequality in Definition 2 is understood in the sense that |w| ≤ γ dx, with dx being the Lebesgue measure.

Proposition 1

Let γ be a nonnegative Lebesgue-measurable function and u an L^1 function. Then TV_pwL^γ(u) = 0 if and only if the weak derivative Du is absolutely continuous with respect to the Lebesgue measure and its pointwise 2-norm is bounded by γ a.e.


We already noted that, since the space M(Ω; R^d) is the dual of the separable Banach space C_0(Ω; R^d), the infimum in Definition 2 is actually a minimum and there exists a w with |w| ≤ γ dx such that

TV_pwL^γ(u) = ||Du − w||_M.

If TV_pwL^γ(u) = 0, then Du = w and therefore |Du| ≤ γ dx, which implies that Du is absolutely continuous with respect to the Lebesgue measure and can be written as

Du(A) = ∫_A ∇u dx

for any measurable A ⊆ Ω. The function ∇u is the Radon–Nikodym derivative of Du with respect to dx. From the condition |Du| ≤ γ dx we immediately get that |∇u|_2 ≤ γ a.e.

Remark 1

We notice that the set {u : TV_pwL^γ(u) = 0} is not a linear subspace; therefore, we should rather speak of the null set than the null space.

Remark 2

Proposition 1 implies that all functions in the null set are Lipschitz continuous with (perhaps spatially varying) Lipschitz constant γ; hence the name of the regularizer.
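Proposition 1 and Remark 2 can be illustrated numerically: a smooth signal whose slope stays below γ incurs zero cost, while a jump does not. A sketch (the sine example and the discretization are our own choices):

```python
import numpy as np

def tv_pwl(u, gamma, h):
    d = np.diff(u) / h
    return h * np.sum(np.maximum(np.abs(d) - gamma, 0.0))

x = np.linspace(0.0, 1.0, 100)
h = x[1] - x[0]

u_lip = np.sin(2 * np.pi * x)         # |u'| <= 2*pi: in the null set for gamma = 2*pi
u_jump = np.where(x < 0.5, 0.0, 1.0)  # unit jump: not in the null set

print(tv_pwl(u_lip, 2 * np.pi, h))    # 0: not penalized
print(tv_pwl(u_jump, 2 * np.pi, h))   # > 0: the jump is penalized
```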

Luxemburg norms.

For a positive convex nondecreasing function φ with φ(0) = 0, the Luxemburg norm is defined as follows [18]

||u||_φ := inf { λ > 0 : ∫_Ω φ(|u(x)| / λ) dx ≤ 1 }.

We point out a possible connection between TV_pwL^γ and a Luxemburg norm corresponding to a suitable φ with a suitable constant. However, we do not investigate this connection in this paper.

Relationship with Infimal Convolution Type regularizers.

We would like to highlight a relationship to infimal convolution type regularizers TVL^p [9, 10]. Indeed, as already noticed earlier, TV_pwL^γ can be written as an infimal convolution

TV_pwL^γ(u) = ( ||·||_M □ χ_K )(Du),

where χ_K is the characteristic function of the convex set K := {w ∈ M(Ω; R^d) : |w| ≤ γ}. If γ is a constant, we obtain a bound on the L^∞-norm of w. This highlights a connection to the TVL^∞ regularizer [10]: for any particular weight in front of the L^∞-norm in TVL^∞ (and for a given u), the auxiliary variable w will have some value of the L^∞-norm, and if we use this value as γ in TV_pwL^γ, we will obtain the same reconstruction. The important difference is that with TVL^∞ we don't have direct control over this value and can only influence it in an indirect way through the weights in the regularizer. In TV_pwL^γ this parameter is given explicitly and can be either obtained using additional a priori information about the ground truth or estimated from the noisy image.

Similar arguments can be made in the case of a spatially variable γ if the weight in TVL^∞ is also allowed to vary spatially.

4 Numerical Experiments

In this section we want to compare the performance of the proposed first-order regularizer TV_pwL^γ with a second-order regularizer, TGV^2, in image denoising. We consider images (or 1D signals) corrupted by Gaussian noise with a known variance and use the residual method [14] to reconstruct the noise-free image, i.e. we solve (in the discrete setting)

min_u R(u)   s.t.   ||u − f||_2 ≤ σ √n,

where f is the noisy image, R is the regularizer (TV_pwL^γ or TGV^2), σ is the standard deviation of the Gaussian noise and n is the number of pixels in the image (or the number of entries in the 1D signal). We solve all problems in MATLAB using CVX [15]. For TGV^2 we use a fixed ratio of its two weights, which is in the range recommended in [12].

A characteristic feature of the proposed regularizer is its ability to efficiently encode the information about the gradient of the ground truth (away from jumps) if such information is available. Our experiments showed that the quality of the reconstruction significantly depends on the quality of the (local) estimate of the norm of the gradient of the ground truth.

The ideal application for TV_pwL^γ would be one where we have a good estimate of the gradient of the ground truth away from jumps, which, however, may occur at unknown locations and be of unknown magnitude. If such an estimate is not available, we can roughly estimate the gradient of the ground truth from the noisy signal, which is the approach we take.

4.1 1D experiments.

We consider the ground truth shown in Figure 1a (green dashed line). The signal is discretized on a uniform grid. We add Gaussian noise and obtain the noisy signal shown in the same figure (blue solid line).

Figure 1: The pipeline for the reconstruction with TV_pwL^γ (a–e). An overregularized TV reconstruction is used to detect and partially eliminate the jumps (a). The residual (b) is filtered (c) and numerically differentiated (d). The absolute value of the obtained derivative is used as the parameter γ (d) for TV_pwL^γ. The reconstruction using TV_pwL^γ (e) follows well the structure of the ground truth apart from a small artefact. TGV^2 also yields a good reconstruction (f), although it tends to approximate the solution with a piecewise-affine function in areas where the ground truth is not affine. Both regularizers yield similar (high) values of SSIM.

To use TV_pwL^γ, we need to estimate the derivative of the true signal away from the jumps. Therefore, we need to detect the jumps, but leave the signal intact away from them (up to a constant shift). This is exactly what happens if the image is overregularized with TV. We compute a TV reconstruction by solving the ROF model

min_u (1/2) ||u − f||^2_2 + α ||Du||_1   (4)

with a large value of α. The result is shown in Figure 1a (red solid line). The residual, which we want to use to estimate the derivative of the ground truth, is shown in Figure 1b. Although the jumps have not been removed entirely, this signal can be used to estimate the derivative using filtering.

The filtered residual (we used the built-in MATLAB function 'smooth' with option 'rlowess', robust weighted linear least squares) is shown in Figure 1c. This signal is sufficiently smooth to be differentiated. We use central differences; to suppress the remaining noise in the filtered residual we use a step size for differentiation that is several times the original step size. The result is shown in Figure 1d (red solid line) along with the true derivative (green dashed line). We use the absolute value of the so-computed derivative as the parameter γ.
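The smoothing-and-differentiation step of this pipeline can be sketched as follows; a simple moving average stands in for MATLAB's 'rlowess' smoother, and the window and step sizes are hypothetical choices, not the ones used in the paper:

```python
import numpy as np

def estimate_gamma(residual, h, smooth_width=11, step=5):
    """Estimate gamma as |derivative| of a smoothed residual (1D)."""
    kernel = np.ones(smooth_width) / smooth_width
    smooth = np.convolve(residual, kernel, mode="same")  # crude stand-in for 'rlowess'
    # central differences with an enlarged step to suppress remaining noise
    d = np.zeros_like(smooth)
    d[step:-step] = (smooth[2 * step:] - smooth[:-2 * step]) / (2 * step * h)
    return np.abs(d)

x = np.linspace(0.0, 1.0, 200)
residual = 2.0 * x                       # toy residual with constant slope 2
g = estimate_gamma(residual, h=x[1] - x[0])
print(g[20:-20])                         # ~2 away from the boundary
```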

The reconstruction obtained using TV_pwL^γ is shown in Figure 1e. We see that the reconstruction is best in areas where our estimate of the true derivative was most faithful. But also in other areas the reconstruction is good and preserves the structure of the ground truth rather well. We notice a small artefact; examining the estimate of the derivative in Figure 1d and the residual in Figure 1b, we notice that TV was not able to remove the jump at this location and therefore the estimate of the derivative was too large. This allowed the reconstruction to get too close to the data at this point.

We also notice that the jumps are sometimes reduced, with a characteristic linear cut near the jump. This can have different reasons. For some jumps the estimate of γ is too large (the true derivative is zero), which allows the regularizer to cut the edges. For other jumps the situation is different: at these positions a negative slope in the ground truth is followed by a positive jump. Since TV_pwL^γ only constrains the absolute value of the gradient, even with a correct estimate of γ the regularizer will reduce the jump, going with the maximum slope in the direction of the jump. Functions with a negative slope followed by a positive jump are also problematic for TV, since they do not satisfy the source condition (their subdifferential is empty [7]). In such cases TV will also always reduce the jump.

Figure 1f shows the reconstruction obtained with TGV^2. The reconstruction is quite good, although it is often piecewise-affine where the ground truth is not. As expected, both regularizers tend to push the reconstructions towards their kernels, but, since with a good choice of γ the regularizer TV_pwL^γ contains the ground truth in its kernel (up to the jumps), it yields reconstructions that are more faithful to the structure of the ground truth.

4.2 2D experiments.

In this section we study the performance of TV_pwL^γ in denoising of 2D images. We use two images: “cameraman” (Figure 2a) and “owl” (Figure 3a). Both images have the same resolution. The images are corrupted with Gaussian noise (Figures 2b and 3b).

Figure 2: The pipeline for the reconstruction of the “cameraman” image with TV_pwL^γ (c–g). An overregularized TV reconstruction is used to detect and partially eliminate the jumps (c). (Note that TV managed to segment the picture into piecewise-constant regions rather well.) The residual (d) is filtered (e) and numerically differentiated. The norm of the obtained gradient is used as the parameter γ for TV_pwL^γ ((f); γ is rescaled for presentation purposes). The reconstructions using TV_pwL^γ (g) and TGV^2 (h) are almost identical. Both preserve edges and are rather smooth away from them. Details are rather well preserved (see, e.g., the pillars of the building in the background as well as the face of the cameraman; the texture of the grass is lost in both cases, however). Relatively homogeneous regions in the original image are also relatively homogeneous in the reconstruction, yet they are not piecewise constant. SSIM values differ very little.
Figure 3: The pipeline for the reconstruction of the “owl” image with TV_pwL^γ (c–g). An overregularized TV reconstruction is used to detect and partially eliminate the jumps (c). (This time the segmentation obtained by TV is not perfect – perhaps too detailed – but still rather good.) The residual (d) is filtered (e) and numerically differentiated. The norm of the obtained gradient is used as the parameter γ for TV_pwL^γ ((f); γ is rescaled for presentation purposes). This time the residual is not as clear as in the “cameraman” example and the estimated γ seems noisier. However, it still mainly follows the structure of the original image. The reconstruction using TV_pwL^γ (g) preserves the edges and reconstructs some details in the image well, e.g., the feathers of the owl. Other details, however, are lost (the needles of the pine tree in the background). Looking at γ in this region, we notice that it is rather irregular and does not capture the structure of the ground truth. The TGV^2 reconstruction (h) is again very similar and SSIM values are very close.

The pipeline for the reconstruction using TV_pwL^γ is the same as in 1D. We obtain a piecewise-constant image by solving an overregularized ROF problem (4) (Figures 2c and 3c) and compute the residuals (Figures 2d and 3d). Then we smooth the residuals using a Gaussian filter (Figures 2e and 3e) and compute their derivatives in the x- and y-directions using the same approach as in 1D (central differences with an enlarged step size). These derivatives are used to estimate γ, which is set equal to the norm of the gradient. Figures 2f and 3f show γ rescaled for better visibility. We use the same parameters (for Gaussian filtering and numerical differentiation) to estimate γ in both images. Reconstructions obtained using TV_pwL^γ and TGV^2 are shown in Figures 2g–h and 3g–h. As in the 1D example, the ratio of the TGV^2 weights was chosen in the recommended range.

Comparing the results for both images, we notice that the residual (as well as its filtered version) captures the details in the “cameraman” image much better than in the “owl” image. The filtered residual in the “owl” image seems to miss some of the structure of the original image and this is reflected in the estimated γ (which looks much noisier in the “owl” image and, in particular, does not capture the structure of the needles of the pine tree in the upper left corner). This might be due to the segmentation achieved by TV, which seems better in the “cameraman” image (the one in the “owl” image seems a bit too detailed). This effect might be mitigated by using a better segmentation technique.

This difference is reflected in the reconstructions. While in the “cameraman” image the details are well preserved (e.g., the face of the cameraman or his camera, as well as the details of the background), in the “owl” image part of them are lost and replaced by rather blurry (if not constant) regions; however, in other regions, such as the feathers of the owl, the details are preserved much better, which can also be seen from the estimated γ, which is much more regular in this area and closer to the structure of the ground truth. We also notice some loss of contrast in the reconstruction. Perhaps it could be dealt with by adopting the concept of debiasing [13, 6] in the setting of TV_pwL^γ; however, it is not yet clear what the structure of the model manifold is in this case.

The TV_pwL^γ reconstructions look strikingly similar to those obtained by TGV^2. Structural similarity between these reconstructions (i.e. SSIM computed using one of them as the reference) is high for both the “cameraman” and the “owl” image. Although the reconstructions depend on the TGV^2 parameter and may differ more from TV_pwL^γ for other values of this parameter, the one we chose here is reasonable and lies within the optimal range reported in [12].

There are two main messages to be taken from these experiments. The first one is that TV_pwL^γ is able to almost reproduce the reconstructions obtained using TGV^2 with a reasonable choice of the parameter, which is a very good performance for a method that does not use higher-order derivatives. The second one is that the performance of TV_pwL^γ greatly depends on the quality of the estimate of γ. When we are able to capture the structure of the original image well in this estimate, the structure of the reconstructions is rather close to that of the ground truth. To further illustrate this point, we show in Figure 4 reconstructions in the ideal scenario when γ is estimated from the ground truth as the local magnitude of the gradient. The quality of the reconstructions is very good, suggesting that with a better strategy for estimating the gradient TV_pwL^γ could achieve even better performance.

Figure 4: In the ideal scenario when γ is estimated from the ground truth, TV_pwL^γ is able to reproduce the original image almost perfectly.

5 Conclusions

We proposed a new TV-type regularizer that can be used to decompose the image into a jump part and a Lipschitz-continuous part (with a given Lipschitz constant that is also allowed to vary spatially). Functions whose gradient does not exceed this constant lie in the kernel of the regularizer and are not penalized. By smartly choosing this bound we can hope to put the ground truth into the kernel (up to the jumps) and thus not penalize any structure that is present in the ground truth.

In this paper we presented, in the context of denoising, an approach to estimating this bound from the noisy image. The approach is based on segmenting the image (and compensating for the jumps) using overregularized TV and estimating the local bound on the gradient using filtering. Our numerical experiments showed that TV_pwL^γ can produce reconstructions that are very similar to those of TGV^2; however, the results significantly depend on the quality of the estimation of the local bound on the gradient. Using a more sophisticated estimation technique is expected to further improve the reconstructions. The ideal application for TV_pwL^γ would be one where there is some a priori information about the magnitude of the Lipschitz part of the gradient.


This work was supported by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 777826 (NoMADS). MB acknowledges further support by ERC via Grant EU FP7 – ERC Consolidator Grant 615216 LifeInverse. YK acknowledges support of the Royal Society through a Newton International Fellowship. YK also acknowledges support of the Humboldt Foundation through a Humboldt Fellowship he held at the University of Münster when this work was initiated. CBS acknowledges support from the Leverhulme Trust project on Breaking the non-convexity barrier, EPSRC grant Nr. EP/M00483X/1, the EPSRC Centre Nr. EP/N014588/1, the RISE projects CHiPS and NoMADS, the Cantab Capital Institute for the Mathematics of Information and the Alan Turing Institute. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Quadro P6000 and Titan Xp GPUs used for this research.


  • [1] Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Clarendon Press (2000)
  • [2] Borwein, J.M., Zhu, Q.J.: Techniques of Variational Analysis. CMS Books in Mathematics, Springer (2005)
  • [3] Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM Journal on Imaging Sciences 3, 492–526 (2010)
  • [4] Bredies, K., Lorenz, D.: Mathematische Bildverarbeitung. Einführung in Grundlagen und moderne Theorie. Springer (2011)
  • [5] Bredies, K., Valkonen, T.: Inverse problems with second-order total generalized variation constraints. In: Proceedings of the 9th International Conference on Sampling Theory and Applications (SampTA) 2011, Singapore (2011)
  • [6] Brinkmann, E.M., Burger, M., Rasch, J., Sutour, C.: Bias reduction in variational regularization. Journal of Mathematical Imaging and Vision 59(3), 534–566 (2017)
  • [7] Bungert, L., Burger, M., Chambolle, A., Novaga, M.: Nonlinear spectral decompositions by gradient flows of one-homogeneous functionals (2019), arXiv:1901.06979
  • [8] Burger, M., Korolev, Y., Rasch, J.: Convergence rates and structure of solutions of inverse problems with imperfect forward models. Inverse Problems 35(2), 024006 (2019)
  • [9] Burger, M., Papafitsoros, K., Papoutsellis, E., Schönlieb, C.B.: Infimal convolution regularisation functionals of BV and L^p spaces. Part I: The finite p case. Journal of Mathematical Imaging and Vision 55(3), 343–369 (2016)
  • [10] Burger, M., Papafitsoros, K., Papoutsellis, E., Schönlieb, C.B.: Infimal convolution regularisation functionals of BV and L^p spaces: The case p = ∞. In: Bociu, L., Désidéri, J., Habbal, A. (eds.) CSMO 2015. IFIP AICT vol. 494. Springer (2016)
  • [11] Chambolle, A., Duval, V., Peyré, G., Poon, C.: Geometric properties of solutions to the total variation denoising problem. Inverse Problems 33(1), 015002 (2017)
  • [12] De los Reyes, J.C., Schönlieb, C.B., Valkonen, T.: Bilevel parameter learning for higher-order total variation regularisation models. Journal of Mathematical Imaging and Vision 57(1) (2017)
  • [13] Deledalle, C.A., Papadakis, N., Salmon, J.: On debiasing restoration algorithms: Applications to total-variation and nonlocal-means. In: Aujol, J.F., Nikolova, M., Papadakis, N. (eds.) Scale Space and Variational Methods in Computer Vision. pp. 129–141. Springer (2015)
  • [14] Engl, H., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Springer (1996)
  • [15] Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (Mar 2014)
  • [16] Iglesias, J.A., Mercier, G., Scherzer, O.: A note on convergence of solutions of total variation regularized linear inverse problems. Inverse Problems 34, 055011 (2018)
  • [17] Jalalzai, K.: Some remarks on the staircasing phenomenon in total variation-based image denoising. Journal of Mathematical Imaging and Vision 54(2), 256–268 (2016)
  • [18] Luxemburg, W.: Banach function spaces. Ph.D. thesis, T.U. Delft (1955)
  • [19] Ring, W.: Structural properties of solutions to total variation regularization problems. ESAIM: M2AN 34(4), 799–810 (2000)
  • [20] Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60(1), 259 – 268 (1992)