Discrete Energy Minimization, beyond Submodularity: Applications and Approximations

by Shai Bagon

In this thesis I explore challenging discrete energy minimization problems that arise mainly in the context of computer vision tasks. This work motivates the use of such "hard-to-optimize" non-submodular functionals and proposes methods and algorithms to cope with the NP-hardness of their optimization. Consequently, the thesis revolves around two axes: applications and approximations. The applications axis motivates the use of "hard-to-optimize" energies by introducing new tasks. As the energies become less constrained and structured, one gains more expressive power for the objective function, achieving more accurate models. Results show how challenging, hard-to-optimize energies are more adequate for certain computer vision applications. To overcome the resulting challenging optimization tasks, the second axis of this thesis proposes approximation algorithms. Experiments show that these new methods yield good results for representative challenging problems.








Multiscale Energy Pyramid

In this work we consider discrete pair-wise minimization problems, defined over a (weighted) graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, of the form:

$$E(x) = \sum_{i\in\mathcal{V}} \varphi_i\left(x_i\right) + \sum_{(i,j)\in\mathcal{E}} w_{ij}\,\varphi\left(x_i, x_j\right) \qquad (1)$$

where $\mathcal{V}$ is the set of variables, $\mathcal{E}$ is the set of edges, and the solution is discrete: $x\in\{1,\dots,l\}^n$, with $n$ variables taking $l$ possible labels. Many problems in computer vision are cast in the form of (1) (see [Szeliski2008]). Furthermore, we do not restrict the energy to be submodular, and our framework is also applicable to more challenging non-submodular energies. Our aim is to build an energy pyramid with a decreasing number of degrees of freedom. The key component in constructing such a pyramid is the interpolation method. The interpolation maps solutions between levels of the pyramid, and defines how to approximate the original energy with fewer degrees of freedom. We propose a novel principled energy-aware interpolation method. The resulting energy pyramid exposes the multiscale landscape of the energy, making low energy assignments apparent at coarse levels. However, it is counter-intuitive to directly interpolate discrete values, since they usually have only a semantic interpretation. Therefore, we substitute an assignment $x\in\{1,\dots,l\}^n$ by a binary matrix $U\in\{0,1\}^{n\times l}$. The rows of $U$ correspond to the variables, and the columns correspond to labels: $U_{ia}=1$ iff variable $i$ is labeled "$a$" ($x_i=a$). This representation allows us to interpolate discrete solutions, as will be shown in the subsequent sections. Expressing the energy (1) using $U$ yields a relaxed quadratic representation (along the lines of [Anand2000]) that forms the basis for our multiscale framework derivation:

$$E(U) = \operatorname{Tr}\left(D U^T + W U V U^T\right) \quad \text{s.t. } U\in\{0,1\}^{n\times l},\ \sum_a U_{ia}=1 \qquad (2)$$

where $W\in\mathbb{R}^{n\times n}$ is sparse with entries $W_{ij}=w_{ij}$, $D\in\mathbb{R}^{n\times l}$ s.t. $D_{ia}=\varphi_i(a)$, and $V\in\mathbb{R}^{l\times l}$ s.t. $V_{ab}=\varphi(a,b)$. A detailed derivation of (2) can be found in the appendix. An energy over $n$ variables with $l$ labels is now parameterized by $\{D, W, V\}$. We first describe the energy pyramid construction for a given interpolation matrix $P$, and defer the detailed description of our novel energy-aware interpolation to a later section.
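As a sanity check on this parameterization, the following sketch builds a small random energy and verifies that the trace form agrees with the direct pair-wise sum (all sizes, edges and costs here are illustrative, not taken from the thesis):

```python
import numpy as np

# Hypothetical small instance: n variables, l labels, over edges E.
rng = np.random.default_rng(0)
n, l = 6, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]

D = rng.normal(size=(n, l))            # D[i, a] = unary cost phi_i(a)
V = rng.normal(size=(l, l))
V = (V + V.T) / 2                      # V[a, b] = phi(a, b), symmetric
W = np.zeros((n, n))
for i, j in edges:                     # W[i, j] = w_ij (one entry per edge)
    W[i, j] = rng.normal()

def energy_direct(x):
    """E(x) = sum_i phi_i(x_i) + sum_{(i,j) in E} w_ij * phi(x_i, x_j)."""
    return sum(D[i, x[i]] for i in range(n)) + \
           sum(W[i, j] * V[x[i], x[j]] for i, j in edges)

def energy_matrix(U):
    """E(U) = Tr(D U^T + W U V U^T), the relaxed quadratic form."""
    return np.trace(D @ U.T + W @ U @ V @ U.T)

x = rng.integers(0, l, size=n)
U = np.zeros((n, l)); U[np.arange(n), x] = 1   # binary assignment matrix
assert np.isclose(energy_direct(x), energy_matrix(U))
```

The symmetry of $V$ matters here: the trace picks up each edge as $V_{x_j x_i}$, which coincides with $V_{x_i x_j}$ only when $V$ is symmetric.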

Energy coarsening by variables

Let $\{D^f, W^f, V\}$ be the fine scale energy over $n$ variables. We wish to generate a coarser representation $\{D^c, W^c, V\}$ with $N < n$ variables. This representation approximates $E^f$ using fewer variables: $U^c$ with only $N$ rows. Given an interpolation matrix $P\in[0,1]^{n\times N}$ s.t. $\sum_j P_{ij}=1$, it maps coarse to fine assignments through:

$$U^f = P\,U^c \qquad (3)$$

For any fine assignment $U^f$ that can be approximated by a coarse assignment $U^c$ we may plug (3) into the quadratic representation of the energy, yielding:

$$E^f\left(P U^c\right) = \operatorname{Tr}\left(\left(P^T D^f\right)\left(U^c\right)^T + \left(P^T W^f P\right) U^c\, V \left(U^c\right)^T\right) \doteq E^c\left(U^c\right) \qquad (4)$$

We have generated a coarse energy parameterized by $\{D^c = P^T D^f,\ W^c = P^T W^f P,\ V\}$ that approximates the fine energy $E^f$. This coarse energy is of the same form as the original energy, allowing us to apply the coarsening procedure recursively to construct an energy pyramid.
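The coarsening step itself is just a pair of matrix products. A minimal sketch (assuming the $\{D, W, V\}$ parameterization above; the interpolation $P$ here is a random row-stochastic matrix standing in for the energy-aware one) verifying that the coarse energy exactly reproduces the fine energy of any interpolated assignment:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, l = 6, 3, 3                       # fine vars, coarse vars, labels
D = rng.normal(size=(n, l))
V = rng.normal(size=(l, l)); V = (V + V.T) / 2
W = np.triu(rng.normal(size=(n, n)), k=1)   # upper-triangular pair-wise weights

# Row-stochastic interpolation: each fine var is a convex combo of coarse vars.
P = rng.random((n, N)); P /= P.sum(axis=1, keepdims=True)

def energy(D, W, U):
    return np.trace(D @ U.T + W @ U @ V @ U.T)

# Coarsening by variables: D^c = P^T D^f,  W^c = P^T W^f P.
Dc, Wc = P.T @ D, P.T @ W @ P

Uc = rng.random((N, l))                 # any coarse (even relaxed) assignment
assert np.isclose(energy(D, W, P @ Uc), energy(Dc, Wc, Uc))
```

The identity $E^f(PU^c)=E^c(U^c)$ holds for arbitrary (even fractional) $U^c$, which is why the trace form is a convenient basis for the pyramid.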

Energy coarsening by labels

So far we have explored the reduction of the number of degrees of freedom by reducing the number of variables. However, we may just as well look at the problem from a different perspective: reducing the search space by decreasing the number of labels from $l$ to $\hat{l}$ ($\hat{l} < l$). It is a well known fact that optimization algorithms (especially large move making, e.g., [Boykov2001]) suffer from significant degradation in performance as the number of labels increases ([Bleyer2010]). Here we propose a novel principled and general framework for reducing the number of labels at each scale. Let $\{D, W, V\}$ be the fine scale energy. Looking at a different interpolation matrix $\hat{P}\in[0,1]^{l\times\hat{l}}$, we may interpolate a coarse solution by $U = \hat{U}\hat{P}^T$. This time the interpolation matrix acts on the labels, i.e., the columns of $U$. The coarse labeling matrix $\hat{U}\in\{0,1\}^{n\times\hat{l}}$ has the same number of rows (variables), but fewer columns (labels). We use the notation $\hat{P}$ to emphasize that the coarsening here affects the labels rather than the variables. Coarsening the labels yields:

$$E\left(\hat{U}\hat{P}^T\right) = \operatorname{Tr}\left(\left(D\hat{P}\right)\hat{U}^T + W\,\hat{U}\left(\hat{P}^T V \hat{P}\right)\hat{U}^T\right) \doteq \hat{E}\left(\hat{U}\right) \qquad (5)$$

Again, we end up with the same type of energy, but this time it is defined over a smaller number of discrete labels: $\{\hat{D}, W, \hat{V}\}$, where $\hat{D} = D\hat{P}$ and $\hat{V} = \hat{P}^T V \hat{P}$. The main theoretical contribution of this work is encapsulated in the multiscale "trick" of the variable- and label-coarsening equations above. This formulation forms the basis of our unified framework, allowing us to coarsen the energy directly and exploit its multiscale landscape for efficient exploration of the solution space. This scheme moves the multiscale completely to the optimization side and makes it independent of any specific application. We can now approach a wide and diverse family of energies using the same multiscale implementation. The effectiveness of the multiscale approximation heavily depends on the interpolation matrix $P$ ($\hat{P}$ resp.). Poorly constructed interpolation matrices will fail to expose the multiscale landscape of the functional. In the subsequent section we describe our principled energy-aware method for computing it.
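Label coarsening admits the same kind of exact check. A minimal sketch, with a random row-stochastic $\hat{P}$ standing in for the label interpolation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, l, l_hat = 6, 5, 2                    # variables, fine labels, coarse labels
D = rng.normal(size=(n, l))
V = rng.normal(size=(l, l)); V = (V + V.T) / 2
W = np.triu(rng.normal(size=(n, n)), k=1)

# Label interpolation: each fine label is a convex combo of coarse labels.
P_hat = rng.random((l, l_hat)); P_hat /= P_hat.sum(axis=1, keepdims=True)

def energy(D, V, U):
    return np.trace(D @ U.T + W @ U @ V @ U.T)

# Coarsening by labels: D_hat = D P_hat,  V_hat = P_hat^T V P_hat.
D_hat, V_hat = D @ P_hat, P_hat.T @ V @ P_hat

U_hat = rng.random((n, l_hat))           # coarse-label assignment (relaxed)
assert np.isclose(energy(D, V, U_hat @ P_hat.T), energy(D_hat, V_hat, U_hat))
```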

Energy-aware Interpolation


Figure: Interpolation as soft variable aggregation: fine variables 1, 2 and 4 are aggregated into coarse variable 1, while fine variables 1, 3 and 4 are aggregated into coarse variable 2. Soft aggregation allows a fine variable to be influenced by a few coarse variables, e.g., fine variable 1 is a convex combination of coarse variables 1 and 2. Hard aggregation is a special case where $P$ is a binary matrix; in that case each fine variable is influenced by exactly one coarse variable.

In this section we use terms and notations for variable coarsening ($P$); however, the motivation and methods are applicable to label coarsening ($\hat{P}$) as well, due to the similar algebraic structure of the two coarsening equations above. Our energy pyramid approximates the original energy using a decreasing number of degrees of freedom, thus excluding some solutions from the original search space at coarser scales. Which solutions are excluded is determined by the interpolation matrix $P$. A desired interpolation does not exclude low energy assignments at coarse levels. The matrix $P$ can be interpreted as an operator that aggregates fine-scale variables into coarse ones (see the figure above). Aggregating fine variables $i$ and $j$ into a coarser one excludes from the search space all assignments for which $x_i \neq x_j$. This aggregation is undesired if assigning $i$ and $j$ to different labels yields low energy. However, when variables $i$ and $j$ are strongly correlated by the energy (i.e., assignments with $x_i = x_j$ yield low energy), aggregating them together still allows efficient exploration of low energy assignments. A desired interpolation therefore aggregates $i$ and $j$ when they are strongly correlated by the energy.

Measuring energy-aware correlations

We provide two correlation measures: one used in computing variable coarsening ($P$) and the other used for label coarsening ($\hat{P}$).

Energy-aware correlations between variables: A reliable estimation of the correlations between the variables allows us to construct a desirable $P$ that aggregates strongly correlated variables. A naïve approach would assume that neighboring variables are correlated (this assumption underlies [Felzenszwalb2006]). This assumption clearly does not hold in general and may lead to an undesired interpolation matrix $P$. [Kim2011] proposed several "closed form formulas" for energy-aware variable grouping. However, their formulas take into account either the unary term or the pair-wise term, but not both. Indeed, it is difficult to decide which term dominates and how to fuse the two terms together, and there is no "closed form" method that successfully integrates both of them. In contrast to these "closed form" methods, we propose a novel empirical scheme for correlation estimation. Empirical estimation of the correlations naturally accounts for and integrates the influence of both the unary and the pair-wise terms. Moreover, our method, inspired by [Ron2011, Livne2011], extends to all energies of the quadratic form above: submodular, non-submodular, metric $V$, arbitrary $V$, arbitrary $W$, energies defined over regular grids and over arbitrary graphs. Variables $i$ and $j$ are correlated by the energy when assignments with $x_i = x_j$ yield relatively low energy values. To estimate these correlations we empirically generate several "locally" low energy assignments, and measure the label agreement between neighboring variables $i$ and $j$. We use Iterated Conditional Modes (ICM) of [Besag1986] to obtain locally low energy assignments: starting with a random assignment, ICM chooses, at each iteration and for each variable, the label yielding the largest decrease of the energy function, conditioned on the labels assigned to its neighbors. Performing ICM iterations for $K$ random initializations provides $K$ locally low energy assignments $\{x^{(k)}\}_{k=1}^{K}$. Our empirical dissimilarity $d_{ij}$ between $i$ and $j$ is the mean disagreement between their labels over these $K$ assignments, and their correlation $c_{ij}$ is a decreasing function of $d_{ij}$ (e.g., $c_{ij}=e^{-d_{ij}}$).

It is interesting to note that strong correlation between variables $i$ and $j$ usually implies that the pair-wise term binding them together is a smoothness-preserving type of relation. We assume that even for challenging energies with many contrast-enhancing pair-wise terms, there is still a significant amount of smoothness-preserving terms to allow for effective coarsening.

Energy-aware correlations between labels: Correlations between labels are easier to estimate, since this information is explicit in the matrix $V$ that encodes the "cost" (i.e., dissimilarity) between two labels. Setting the label correlation to a decreasing function of $V_{ab}$ (e.g., $\hat{c}_{ab}=e^{-V_{ab}}$), we get a "closed-form" expression for the correlations between labels.
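The empirical estimation can be sketched in a few lines. This is an illustrative toy, not the thesis' exact recipe: a chain graph, a handful of ICM sweeps, and $c_{ij}=e^{-d_{ij}}$ as one possible decreasing function:

```python
import numpy as np

rng = np.random.default_rng(3)
n, l = 8, 3
edges = [(i, i + 1) for i in range(n - 1)]      # toy chain graph
D = rng.normal(size=(n, l))
V = rng.normal(size=(l, l)); V = (V + V.T) / 2
w = {e: rng.normal() for e in edges}
nbrs = {i: [] for i in range(n)}
for (i, j), wij in w.items():
    nbrs[i].append((j, wij)); nbrs[j].append((i, wij))

def icm(x, sweeps=5):
    """Iterated Conditional Modes: greedily relabel each variable in turn."""
    x = x.copy()
    for _ in range(sweeps):
        for i in range(n):
            cost = D[i].copy()
            for j, wij in nbrs[i]:
                cost += wij * V[:, x[j]]        # conditional pair-wise cost
            x[i] = int(np.argmin(cost))
    return x

# K locally-low-energy assignments from random initializations.
K = 20
runs = [icm(rng.integers(0, l, size=n)) for _ in range(K)]

# Empirical dissimilarity: label disagreement between neighbors across runs;
# the correlation decays with dissimilarity (exp(-d) is an illustrative choice).
corr = {}
for i, j in edges:
    d = np.mean([xk[i] != xk[j] for xk in runs])
    corr[(i, j)] = np.exp(-d)

assert all(0.0 < c <= 1.0 for c in corr.values())
```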

From correlations to interpolation

Using our measure for the variable correlations, $c_{ij}$, we follow the Algebraic Multigrid (AMG) method of [Brandt1986] to compute an interpolation matrix $P$ that softly aggregates strongly correlated variables. We begin by selecting a set of coarse representative variables $\mathcal{C}\subset\mathcal{V}$, such that every variable in $\mathcal{V}\setminus\mathcal{C}$ is strongly correlated with $\mathcal{C}$. That is, every variable in $\mathcal{V}$ is either in $\mathcal{C}$ or is strongly correlated to other variables in $\mathcal{C}$. A variable $i$ is considered strongly correlated to $\mathcal{C}$ if $\sum_{j\in\mathcal{C}} c_{ij} \geq \beta \sum_{j} c_{ij}$. The parameter $\beta$ affects the coarsening rate, i.e., the ratio $N/n$: smaller $\beta$ results in a lower ratio. We perform this selection greedily and sequentially, starting with $\mathcal{C}=\emptyset$ and adding $i$ to $\mathcal{C}$ if it is not yet strongly correlated to the current $\mathcal{C}$. Given the selected coarse variables, let $c(\cdot)$ map indices of variables from fine to coarse: $c(j)$ is the coarse index of the variable whose fine index is $j$. The interpolation matrix $P$ is defined by:

$$P_{i,c(j)} = \begin{cases} \dfrac{c_{ij}}{\sum_{k\in\mathcal{C}} c_{ik}} & i\notin\mathcal{C},\ j\in\mathcal{C} \\[2mm] 1 & i\in\mathcal{C},\ j=i \\[1mm] 0 & \text{otherwise} \end{cases}$$

We further prune rows of $P$, leaving only their few maximal entries; each row is then normalized to sum to 1. Throughout our experiments we use fixed values of $\beta$ for computing $P$ and $\hat{P}$.
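A rough sketch of the greedy AMG-style selection and the resulting soft interpolation (the random correlations $C$, the threshold $\beta$, and the omission of the row-pruning step are all illustrative simplifications):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
# Symmetric non-negative correlations between variables
# (in the thesis these are estimated empirically via ICM runs).
C = np.triu(rng.random((n, n)), k=1); C = C + C.T

beta = 0.5                      # strong-correlation threshold (illustrative)
coarse = []                     # the set of coarse representatives
for i in range(n):
    # i is strongly correlated to the current coarse set if its correlation
    # to the set captures at least a beta-fraction of its total correlation.
    if not coarse or C[i, coarse].sum() < beta * C[i].sum():
        coarse.append(i)
N = len(coarse)

# Soft interpolation: coarse vars interpolate themselves; every other fine
# var is a weighted average of the coarse vars it correlates with.
P = np.zeros((n, N))
for i in range(n):
    if i in coarse:
        P[i, coarse.index(i)] = 1.0
    else:
        weights = C[i, coarse]
        P[i] = weights / weights.sum()

assert np.allclose(P.sum(axis=1), 1.0)
```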

Unified Discrete Multiscale Framework

Algorithm (discrete multiscale optimization). Given a fine-scale energy $\{D^0, W^0, V\}$:

Energy pyramid construction (fine-to-coarse), for scales $s = 0, 1, \dots$:
1. Estimate the pair-wise correlations at scale $s$.
2. Compute the interpolation matrix $P^s$.
3. Derive the coarse energy $\{D^{s+1}, W^{s+1}, V\}$ (variable-coarsening equation above).

Coarse-to-fine optimization, from the coarsest scale down:
1. Interpolate a solution: $U^{s} = P^{s}\,U^{s+1}$, rounded to satisfy the binary constraints.
2. Refine: $U^{s} \leftarrow \text{Refine}(U^{s})$, where Refine uses an "off-the-shelf" single-scale algorithm to optimize the energy at scale $s$ with $U^{s}$ as an initialization.

So far we have described the different components of our multiscale framework; the algorithm above puts them together into a multiscale minimization scheme. Given an energy, our framework first works fine-to-coarse to compute the interpolation matrices that generate an "energy pyramid". Typically we end up at the coarsest scale with only a handful of variables. As a result, exploring the energy at this scale is robust to the initial assignment of the single-scale method used (in practice we use a "winner-take-all" initialization, as suggested by [Szeliski2008, §3.1]). Starting from the coarsest scale, a coarse solution at scale $s+1$ is interpolated to a finer scale $s$. At the finer scale it serves as a good initialization for an "off-the-shelf" single-scale optimization that refines the interpolated solution. These two steps are repeated for all scales from coarse to fine. The interpolated solution at each scale might not satisfy the binary constraints; we round each row of $U$ by setting the maximal element to 1 and the rest to 0. The most computationally intensive modules of our framework are the empirical estimation of the variable correlations and the single-scale optimization used to refine the interpolated solutions. The complexity of the correlation estimation is proportional to the number of ICM runs, the number of non-zero elements in $W$, and the number of labels; however, it is fairly straightforward to parallelize this module. It is now easy to see how our framework generalizes [Felzenszwalb2006], [Komodakis2010] and [Kim2011]: they are restricted to hard aggregation in $P$. [Felzenszwalb2006] and [Komodakis2010] use a multiscale pyramid, but their variable aggregation is not energy-aware and is restricted to dyadic pyramids. On the other hand, [Kim2011] have limited energy-aware aggregation, applied to a two-level-only "pyramid": they only optimize at the coarse scale and cannot refine the solution at the fine scale.
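Putting the pieces together, a compact toy version of the whole scheme (with two deliberate simplifications: a hard pairing of consecutive variables stands in for the energy-aware interpolation, and plain ICM stands in for the "off-the-shelf" refiner; note the diagonal rectification discussed in the appendix):

```python
import numpy as np

rng = np.random.default_rng(5)
n, l = 16, 3
D0 = rng.normal(size=(n, l))
V = rng.normal(size=(l, l)); V = (V + V.T) / 2
W0 = np.triu(rng.normal(size=(n, n)), k=1)

def energy(D, W, U):
    return np.trace(D @ U.T + W @ U @ V @ U.T)

def refine(D, W, U, sweeps=3):
    """Stand-in single-scale refinement: ICM started from U (never increases E)."""
    x = U.argmax(axis=1)
    m = len(x)
    for _ in range(sweeps):
        for i in range(m):
            cost = D[i].copy()
            for j in range(m):
                if j != i and W[i, j]: cost += W[i, j] * V[:, x[j]]
                if j != i and W[j, i]: cost += W[j, i] * V[x[j], :]
            x[i] = int(np.argmin(cost))
    Ub = np.zeros((m, len(D[0]))); Ub[np.arange(m), x] = 1
    return Ub

# Fine-to-coarse: hard pairing of consecutive variables builds the pyramid;
# the diagonal of W^c is absorbed into D^c (the appendix rectification).
pyramid, interps = [(D0, W0)], []
while pyramid[-1][0].shape[0] > 4:
    D, W = pyramid[-1]
    m = D.shape[0]
    P = np.zeros((m, (m + 1) // 2))
    P[np.arange(m), np.arange(m) // 2] = 1.0
    Dc, Wc = P.T @ D, P.T @ W @ P
    Dc += np.outer(np.diag(Wc), np.diag(V))
    np.fill_diagonal(Wc, 0.0)
    interps.append(P)
    pyramid.append((Dc, Wc))

# Coarse-to-fine: optimize the coarsest level, then interpolate, round, refine.
Dc, Wc = pyramid[-1]
U = np.zeros((Dc.shape[0], l)); U[:, 0] = 1
U = refine(Dc, Wc, U)
for s in range(len(interps) - 1, -1, -1):
    U = interps[s] @ U                            # interpolate solution
    Ub = np.zeros_like(U)
    Ub[np.arange(len(U)), U.argmax(axis=1)] = 1   # round rows to binary
    U = refine(*pyramid[s], Ub)

assert U.shape == (n, l) and np.allclose(U.sum(axis=1), 1.0)
```

The hard pairing plays the role of a dyadic, energy-agnostic pyramid (the very baseline the thesis improves upon); swapping in the energy-aware interpolation of the previous sections only changes how `P` is built.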

Experimental Results

Our experiments have two main goals: first, to stress the difficulty of approximating non-submodular energies and to show the advantages of primal methods for this type of minimization problem; second, to demonstrate how our unified multiscale framework improves the performance of existing single-scale primal methods. We evaluated our multiscale framework on a diversity of discrete optimization tasks (code available at www.wisdom.weizmann.ac.il/~bagon/matlab.html), ranging from challenging non-submodular synthetic and co-clustering energies to low-level submodular vision energies such as denoising and stereo. In addition, we provide a comparison between the different methods for measuring variable correlations that were presented earlier, and conclude with a label coarsening experiment. In all of these experiments we minimize a given publicly available benchmark energy; we do not attempt to improve on the energy formulation itself. We use ICM ([Besag1986]), α-β-swap and α-expansion (the large move making algorithms of [Boykov2001]) as representative single-scale "off-the-shelf" primal optimization algorithms. To help the large move making algorithms overcome the non-submodularity of some of these energies, we augment them with QPBO(I) of [Rother2007]. We follow the protocol of [Szeliski2008], which uses the lower bound of TRW-S ([Kolmogorov2006]) as a baseline for comparing the performance of different optimization methods on different energies: we report how close the results are (in percent) to the lower bound; closer to 100% is better. We show a remarkable improvement for ICM combined with our multiscale framework compared with a single-scale scheme. For the large move making algorithms there is a smaller but consistent improvement of multiscale over a single-scale scheme. TRW-S is a dual method and is considered state-of-the-art for discrete energy minimization [Szeliski2008]. However, we show that when it comes to non-submodular energies it lags behind the large move making algorithms and even ICM. For these challenging energies, multiscale gives a significant boost in optimization performance.

Table: Synthetic results: percent of achieved energy value relative to the lower bound (closer to 100% is better) for ICM, α-β-swap, α-expansion (each with our multiscale framework vs. a single-scale scheme) and TRW-S, for varying strengths λ of the pair-wise term (stronger λ is harder to optimize). (Values omitted.)


We begin with synthetic non-submodular energies defined over a 4-connected grid graph, with random unary terms and a random pair-wise term scaled by a parameter λ that controls its relative strength; stronger (i.e., larger) λ results in energies that are more difficult to optimize (see [Kolmogorov2006]). The table above shows results averaged over 100 experiments. The resulting synthetic energies are non-submodular (since the pair-wise terms may become negative). For these challenging energies, the state-of-the-art dual method (TRW-S) performs rather poorly (we did not restrict the number of iterations, and let TRW-S run until no further improvement to the lower bound was made): it performs worse than single-scale ICM, with a significant gap between the lower bound and the energy of the actual primal solution provided. These results motivate our focus on primal methods, especially α-β-swap.

Figure: Chinese characters inpainting: visualizing some of the instances used in our experiments. Columns (left to right): the original character used for testing; the input partially occluded character; ICM and QPBO results, both multiscale and single-scale; results of TRW-S; and results of [Nowozin2011] obtained with a very long run of simulated annealing (using Gibbs sampling inside the annealing). (Images omitted.)
Figure: Energies of Chinese characters inpainting: box plot showing 25%, median and 75% of the resulting energies relative to the reference energies of [Nowozin2011] (lower than the baseline is better). Our multiscale approach combined with QPBO achieves consistently better energies than the baseline, with very low variance. TRW-S improves on only 25% of the instances, with very high variance in the results. (Plots omitted.)

Table: Energies of Chinese characters inpainting: (a) mean energies for the inpainting experiment relative to the baseline of [Nowozin2011] (lower is better); (b) percent of instances for which a strictly lower energy was achieved. (Values omitted.)

Chinese character inpainting

We further experiment with the learned binary energies of [Nowozin2011, §5.2] (available at www.nowozin.net/sebastian/papers/DTF_CIP_instances.zip). These 100 instances of non-submodular pair-wise energies are defined over a 64-connected grid. The energies were designed and trained to perform the task of learning Chinese calligraphy, represented as a complex, non-local binary pattern. Despite the very large number of parameters involved in representing such complex energies, learning is conducted very efficiently using Decision Tree Fields (DTF). The main challenge in these models becomes the inference at test time. Our experiments show how approaching these challenging energies using our unified multiscale framework allows for better approximations. The table and figures above compare our multiscale framework to single-scale methods acting on the primal binary variables. Since the energies are binary, we use QPBO instead of large move making algorithms. We also provide an evaluation of a dual method (TRW-S) on these energies. In addition to the quantitative results, the inpainting figure provides a visualization of some of the restored Chinese characters. These "real world" energies highlight the advantage primal methods have over dual ones when it comes to challenging non-submodular energies. It is further clear that a significant improvement is made by our multiscale framework.

Table: Co-clustering results: the baseline for comparison is the state-of-the-art results of [Glasner2011]. (a) Our results as percent of the baseline: smaller is better; below 100% outperforms state-of-the-art. (b) The fraction of energies for which our multiscale framework outperforms state-of-the-art. (c) Run times; pyramid construction takes milliseconds. (Values omitted.)


The problem of co-clustering addresses the matching of superpixels within and across frames in a video sequence. Following [Bagon2012, §6.2], we treat co-clustering as a discrete minimization of a non-submodular Potts energy. We obtained 77 co-clustering energies, courtesy of [Glasner2011], used in their experiments. The number of variables in each energy ranges from 87 to 788, and their sparsity (percent of non-zero entries in $W$) varies from instance to instance. The resulting energies are non-submodular, have no underlying regular grid, and are very challenging to optimize [Bagon2012]. The table above compares our discrete multiscale framework combined with ICM and α-β-swap. For these energies we use a different baseline: the state-of-the-art results of [Glasner2011], obtained by applying a specially tailored convex relaxation method (we do not use the lower bound of TRW-S here since it is far from tight for these challenging energies). Our multiscale framework improves on state-of-the-art for this family of challenging energies.

Semi-metric energies

We further applied our multiscale framework to optimize less challenging semi-metric energies. We use the diverse low-level vision MRF energies from the Middlebury benchmark [Szeliski2008] (available at vision.middlebury.edu/MRF/). For these semi-metric energies, TRW-S (single scale) performs quite well and, in fact, if enough iterations are allowed, its lower bound converges to the global optimum. As opposed to TRW-S, large move making and ICM do not always converge to the global optimum. Yet, we are able to show a significant improvement for primal optimization algorithms when used within our multiscale framework. The tables and figures below show our multiscale results for the different submodular energies. One of the conclusions of the Middlebury challenge was that ICM is no longer a valid candidate for optimization; integrating ICM into our multiscale framework puts it back on the right track. We also exemplify how our framework improves running times for two difficult energies ("Penguin" denoising and "Venus" stereo).

Table: Stereo: percent of achieved energy value relative to the lower bound (closer to 100% is better) for ICM, α-β-swap and α-expansion, multiscale vs. single scale, on the Tsukuba, Venus and Teddy instances. Energies from [Szeliski2008]. (Values omitted.)
Figure: Stereo: note how our multiscale framework drastically improves ICM results; a visible improvement for α-β-swap can also be seen for Venus. Ground-truth disparities are shown for reference. (Images omitted.)
Table: Denoising and inpainting: percent of achieved energy value relative to the lower bound (closer to 100% is better) for ICM, α-β-swap and α-expansion, multiscale vs. single scale, on the House and Penguin instances. Energies from [Szeliski2008]. (Values omitted.)
Figure: Denoising and inpainting: single-scale ICM is unable to cope with inpainting: performing local steps, it cannot propagate information far enough to fill the missing regions in the images. In contrast, our multiscale framework allows ICM to perform large steps at coarse scales and successfully fill the gaps. (Images omitted.)
Table: Running times for variable coarsening (α-β-swap): examples of typical running times (in seconds) for the Penguin (denoising) and Venus (stereo) energies. For multiscale we report both the time to construct the pyramid and the overall coarse-to-fine optimization time. The reported times are of our unoptimized serial Matlab implementation. (Values omitted.)

Comparing variable correlation estimation methods

As explained above, the correlations between the variables are the most crucial component in constructing an effective multiscale scheme. In this experiment we compare our energy-aware correlation measure to three methods proposed by [Kim2011]: "unary-diff", "min-unary-diff" and "mean-compat". These methods estimate the correlations based on either the unary term or the pair-wise term, but not both. We also compare to an energy-agnostic measure that treats all neighboring variables as uniformly correlated; this method underlies [Felzenszwalb2006]. We use ICM within our framework to evaluate the influence these methods have on the resulting multiscale performance for four representative energies. The figure below shows percent of lower bound for the different energies. Our measure consistently outperforms all other methods, and successfully balances the influence of the unary and the pair-wise terms.


Figure: Comparing correlation measures: graphs showing percent of lower bound (closer to 100% is better) for different methods of computing variable correlations. Our energy-aware measure consistently outperforms all other methods. (Plots omitted.)

Coarsening labels

α-β-swap does not scale gracefully with the number of labels. Coarsening an energy in the labels domain (i.e., same number of variables, fewer labels) proves to significantly improve the performance of α-β-swap, as shown in the table below. For these examples, constructing the energy pyramid took only milliseconds, thanks to the "closed form" formula for estimating label correlations. Our principled framework for coarsening labels improves α-β-swap performance for these energies.

Table: Coarsening labels (α-β-swap): working coarse-to-fine in the labels domain for Penguin (denoising: 256 labels at the finest scale, 67 at the coarsest) and Venus (stereo: 20 labels at the finest scale, 4 at the coarsest). We use 5 scales with a fixed coarsening rate; the number of variables is unchanged. The table shows percent of achieved energy value relative to the lower bound (closer to 100% is better), and running times. (Values omitted.)


Conclusion

This work presents a unified multiscale framework for discrete energy minimization that allows for efficient and direct exploration of the multiscale landscape of the energy. We propose two paths to expose the multiscale landscape of the energy: one in which coarser scales involve fewer and coarser variables, and another in which the coarser levels involve fewer labels. We also propose adaptive methods for energy-aware interpolation between the scales. Our multiscale framework significantly improves optimization results for challenging energies. Our framework provides the mathematical formulation that "bridges the gap" between multiscale discrete optimization and the algebraic multiscale methods used in PDE solvers (e.g., [Brandt1986]). This connection allows methods and practices developed for numerical solvers to be applied in multiscale discrete optimization as well.


Derivation of eq. (2)

In this work we consider discrete pair-wise minimization problems of the form:

$$E(x) = \sum_{i\in\mathcal{V}} \varphi_i\left(x_i\right) + \sum_{(i,j)\in\mathcal{E}} w_{ij}\,\varphi\left(x_i, x_j\right) \qquad (1)$$

Using the following parameterizations: $U\in\{0,1\}^{n\times l}$ s.t. $\sum_a U_{ia}=1$, $D_{ia}=\varphi_i(a)$, $W_{ij}=w_{ij}$ and $V_{ab}=\varphi(a,b)$, we claim that (1) is equivalent to:

$$E(U) = \operatorname{Tr}\left(D U^T + W U V U^T\right) \quad \text{s.t. } U\in\{0,1\}^{n\times l},\ \sum_a U_{ia}=1 \qquad (2)$$

We assume both $W$ and $V$ are symmetric (if they are not, we need to be slightly more careful with transposing them, but a roughly similar expression can be derived). Looking at the trace of the pair-wise term, and using the fact that for a binary assignment matrix $U_{ia}=1$ iff $x_i=a$:

$$\operatorname{Tr}\left(W U V U^T\right) = \sum_{i,j} W_{ij}\left(U V U^T\right)_{ji} = \sum_{(i,j)\in\mathcal{E}} w_{ij} \sum_{a,b} U_{ja} V_{ab} U_{ib} = \sum_{(i,j)\in\mathcal{E}} w_{ij}\,\varphi\left(x_i, x_j\right)$$

As for the unary term:

$$\operatorname{Tr}\left(D U^T\right) = \sum_{i} \sum_{a} D_{ia} U_{ia} = \sum_{i} \varphi_i\left(x_i\right)$$

Putting the two terms together we get (2).

Note that the diagonal of $W$ is assumed to be zero: this is a reasonable assumption, as a non-zero diagonal represents an interaction of a variable with itself, and this type of interaction is well represented by the unary term $D$. When coarsening the energy, however, it may happen that $W^c = P^T W P$ no longer has zeros on its diagonal. This case arises when a single coarse variable represents neighboring fine scale variables; the fine scale pair-wise interaction should then be absorbed into the coarse scale unary term. It is easy to see that for a binary assignment the diagonal entry contributes $W^c_{ii}\,V_{aa}$ when variable $i$ takes label $a$, so the term $W^c_{ii}\,V_{aa}$ should be added to the unary entry $D^c_{ia}$ whenever $W^c$ has non-zeros on the diagonal. After this rectification, the non-zero entries on the diagonal of $W^c$ can be set to zero.
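This rectification is easy to verify numerically: for any binary assignment, absorbing the diagonal of $W$ into the unary term leaves the energy unchanged (an illustrative check with random matrices):

```python
import numpy as np

rng = np.random.default_rng(6)
n, l = 5, 3
V = rng.normal(size=(l, l)); V = (V + V.T) / 2
# A W with a non-zero diagonal, as may arise after coarsening W^c = P^T W P.
W = np.triu(rng.normal(size=(n, n)), k=1) + np.diag(rng.normal(size=n))
D = rng.normal(size=(n, l))

def energy(D, W, U):
    return np.trace(D @ U.T + W @ U @ V @ U.T)

# Rectification: move the diagonal of W into the unary term,
# D[i, a] += W[i, i] * V[a, a], then zero the diagonal of W.
D2 = D + np.outer(np.diag(W), np.diag(V))
W2 = W.copy(); np.fill_diagonal(W2, 0.0)

# For any *binary* assignment the two parameterizations agree.
x = rng.integers(0, l, size=n)
U = np.zeros((n, l)); U[np.arange(n), x] = 1
assert np.isclose(energy(D, W, U), energy(D2, W2, U))
```

The equality holds only for binary $U$ (for relaxed $U$ the diagonal term $W_{ii}(UVU^T)_{ii}$ is not linear in the rows of $U$), which is why the rectification is applied after coarsening rather than folded into the trace identity.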

More General Energy Functions

This chapter has focused on the construction of an energy pyramid for energies of the restricted pair-wise form above. However, this form does not cover all possible pair-wise energies. We have used this slightly restricted form throughout the chapter since it emphasizes the simplicity of the algebraic derivation of our multiscale framework. Nevertheless, our multiscale framework can be applied just as easily to more general energies.


A general form for a pair-wise energy over a graph $(\mathcal{V}, \mathcal{E})$ can be written as

$$E(x) = \sum_i \varphi_i(x_i) + \sum_{ij \in \mathcal{E}} \varphi_{ij}(x_i, x_j)$$

In this more general form, the pair-wise term can be entirely different for each pair: $\varphi_{ij}(\cdot, \cdot)$. This is in contrast to the restricted energy, where the pair-wise terms differ only by the scaling factor $w_{ij}$ of a single fixed term $V$. The Photomontage energies of [Szeliski2008, §4.2] are an example of such general pair-wise energies. Instead of using a pair of matrices $W$ and $V$ to parameterize the pair-wise terms, in the general setting we use a collection of matrices $\{V_{ij}\}_{ij \in \mathcal{E}}$ for the same purpose. Each matrix $V_{ij}$ is of size $l \times l$ and is defined as $(V_{ij})_{a,b} = \varphi_{ij}(a, b)$. A general energy is now parameterized by $\{D, \{V_{ij}\}_{ij \in \mathcal{E}}\}$.

Coarsening variables: The computation of the interpolation matrix $P$ is unchanged. ICM can be applied to energies of the general form to estimate agreement between neighboring variables; these agreements are then used to compute the interpolation matrix, as described earlier in this chapter. In order to write the coarsening of the pair-wise term we need to introduce some notation: let $i, j$ denote variables at the fine scale, and $I, J$ variables at the coarse scale. An entry $P_{i,I}$ in the interpolation matrix indicates how fine scale variable $i$ is affected by coarse scale variable $I$. Given an interpolation matrix $P$ we can coarsen the energy by

$$D_c = P^\top D, \qquad V^c_{IJ} = \sum_{ij \in \mathcal{E}} P_{i,I}\, P_{j,J}\, V_{ij}$$
The coarse graph, $(\mathcal{V}_c, \mathcal{E}_c)$, is defined by all the non-zero pair-wise terms $V^c_{IJ}$. The coarser energy is now parameterized by $\{P^\top D, \{V^c_{IJ}\}_{IJ \in \mathcal{E}_c}\}$.
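The variable-coarsening identity can be checked numerically. The sketch below (NumPy assumed; for simplicity it stores a dense $l \times l$ matrix for every ordered pair, so "edges" are all pairs, and uses a hard interpolation where each fine variable copies one coarse variable) verifies that interpolating a coarse assignment back to the fine scale gives the same energy as the coarsened parameterization:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, l = 6, 3, 4                      # fine vars, coarse vars, labels

D = rng.standard_normal((n, l))
# one l x l pair-wise matrix per ordered pair (dense, for illustration)
Vij = rng.standard_normal((n, n, l, l))

P = np.zeros((n, N))                   # hard interpolation: each fine
P[np.arange(n), rng.integers(0, N, size=n)] = 1  # variable copies one coarse var

# coarsened parameters: D_c = P^T D,  V^c_IJ = sum_ij P_iI P_jJ V_ij
Dc = P.T @ D
Vc = np.einsum('iI,jJ,ijab->IJab', P, P, Vij)

def energy(D, Vmats, U):
    unary = np.trace(D.T @ U)
    pair = np.einsum('ia,ijab,jb->', U, Vmats, U)
    return unary + pair

xc = rng.integers(0, l, size=N)        # a coarse labeling
Uc = np.zeros((N, l))
Uc[np.arange(N), xc] = 1

assert np.isclose(energy(D, Vij, P @ Uc), energy(Dc, Vc, Uc))
```

Note that the identity $E(PU_c) = E_c(U_c)$ holds for any interpolation matrix $P$, not only hard ones; the hard $P$ here just keeps the example easy to read.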

Coarsening labels: In this case, the computation of the interpolation matrix is not trivial, since we no longer have a single matrix $V$ to derive the agreements from. We leave the derivation of an efficient interpolation matrix for the general case to future work. However, given an interpolation matrix $P$ of size $l \times l_c$, the coarsening of labels can be done easily:

$$V^c_{ij} = P^\top V_{ij} P \quad \forall ij \in \mathcal{E}$$
yielding a coarser energy parameterized by $\{D P, \{P^\top V_{ij} P\}_{ij \in \mathcal{E}}\}$

with the same number of variables, $n$, but fewer labels ($l_c < l$).  It is fairly straightforward to see that the restricted pair-wise energy is a special case of the more general form, and so the coarsening of variables and labels derived earlier in this chapter can be seen as special cases of the general coarsening equations above, respectively.
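The label-coarsening identity can be verified in the same spirit. This sketch (NumPy assumed) uses a hard label interpolation in which each coarse label interpolates to a single fine label, so the coarse labels are effectively a subset of the fine labels:

```python
import numpy as np

rng = np.random.default_rng(3)
n, l, lc = 5, 6, 3                     # variables, fine labels, coarse labels

D = rng.standard_normal((n, l))
Vij = rng.standard_normal((n, n, l, l))  # dense per-pair matrices, for illustration

P = np.zeros((l, lc))                  # hard label interpolation:
P[rng.choice(l, size=lc, replace=False), np.arange(lc)] = 1  # coarse -> fine label

# coarsened parameters: D_c = D P,  V^c_ij = P^T V_ij P
Dc = D @ P
Vc = np.einsum('ab,ijbc,cd->ijad', P.T, Vij, P)

def energy(D, Vmats, U):
    return np.trace(D.T @ U) + np.einsum('ia,ijab,jb->', U, Vmats, U)

xc = rng.integers(0, lc, size=n)       # a coarse labeling of all n variables
Uc = np.zeros((n, lc))
Uc[np.arange(n), xc] = 1

# fine assignment induced by the coarse one is U = Uc P^T
assert np.isclose(energy(D, Vij, Uc @ P.T), energy(Dc, Vc, Uc))
```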

High-order energies

Discrete energies may involve terms that go beyond pair-wise, that is, terms that describe interactions between sets of variables. These energies are often referred to as high-order energies. Examples of such energies can be found in, e.g., [Kohli2009, Rother2009]. A high-order energy is defined over a hyper-graph $(\mathcal{V}, \mathcal{C})$, where the hyper-edges $c \in \mathcal{C}$ are subsets of variables, s.t. $c \subseteq \mathcal{V}$:

$$E(x) = \sum_i \varphi_i(x_i) + \sum_{c \in \mathcal{C}} \varphi_c(x_c)$$
where the high-order terms $\varphi_c$ are $|c|$-way discrete functions:

$$\varphi_c : \{1, \ldots, l\}^{|c|} \rightarrow \mathbb{R}$$
A high-order energy of this form can be parameterized using tensors. Each high-order term, $\varphi_c$ with $c = (i_1, \ldots, i_k)$, is parameterized by a $k$-order tensor $T^c$ of size $l \times \cdots \times l$ ($k$ times):

$$T^c_{a_1, \ldots, a_k} = \varphi_c(x_{i_1} = a_1, \ldots, x_{i_k} = a_k)$$
A high-order energy is now parameterized by $\{D, \{T^c\}_{c \in \mathcal{C}}\}$.

Coarsening variables: The computation of the interpolation matrix $P$ is unchanged. ICM can be applied to energies of this form to estimate agreement between neighboring variables; these agreements are then used to compute the interpolation matrix, as described earlier in this chapter. Given an interpolation matrix $P$ we can coarsen the energy by

$$T^{\tilde{c}}_{a_1, \ldots, a_k} = \sum_{c = (i_1, \ldots, i_k) \in \mathcal{C}} P_{i_1, I_1} \cdots P_{i_k, I_k}\, T^c_{a_1, \ldots, a_k}, \qquad \tilde{c} = (I_1, \ldots, I_k)$$
for all coarse variables $I_1, \ldots, I_k$ with a non-zero product $P_{i_1, I_1} \cdots P_{i_k, I_k}$. These non-zero interactions define the coarse scale hyper-edges $\tilde{c} \in \mathcal{C}_c$. Note that when two (or more) fine-scale variables are represented by the same coarse variable, the size of the coarse scale hyper-edge $\tilde{c}$ is reduced relative to the size of the fine scale hyper-edge $c$.

Coarsening labels: In this case, the computation of the interpolation matrix is not trivial, since we no longer have a clear representation of the interactions between the different labels. We leave the derivation of an efficient interpolation matrix for the general case to future work. However, given an interpolation matrix $P$ of size $l \times l_c$, the coarsening of labels can be done easily. Using $b_1, \ldots, b_k$ to denote coarse scale labels:

$$\tilde{T}^c_{b_1, \ldots, b_k} = \sum_{a_1, \ldots, a_k} P_{a_1, b_1} \cdots P_{a_k, b_k}\, T^c_{a_1, \ldots, a_k}$$

yielding a coarser energy parameterized by $\{D P, \{\tilde{T}^c\}_{c \in \mathcal{C}}\}$, with the same number of variables, $n$, but fewer labels ($l_c < l$).
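Both tensor coarsenings can be sanity-checked numerically. The sketch below (NumPy assumed; a single 3-way hyper-edge and hard indicator interpolation matrices, all chosen for illustration) verifies that evaluating a high-order term on an interpolated fine-scale assignment matches the coarsened term, first for variable coarsening and then for label coarsening:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N, l, lc = 5, 3, 4, 2        # fine/coarse variables, fine/coarse labels

c = (0, 2, 4)                    # one 3-way hyper-edge over fine variables
T = rng.standard_normal((l, l, l))   # its tensor: T[a1, a2, a3]

def term(T, U, edge):
    # sum_{a1,a2,a3} T[a1,a2,a3] * U[i1,a1] * U[i2,a2] * U[i3,a3]
    return np.einsum('abc,a,b,c->', T, *(U[i] for i in edge))

# --- coarsening variables: each fine variable copies one coarse variable ---
P = np.zeros((n, N))
P[np.arange(n), rng.integers(0, N, size=n)] = 1
xc = rng.integers(0, l, size=N)
Uc = np.zeros((N, l))
Uc[np.arange(N), xc] = 1
coarse_edge = tuple(int(P[i].argmax()) for i in c)   # (I1, I2, I3)
assert np.isclose(term(T, P @ Uc, c), term(T, Uc, coarse_edge))

# --- coarsening labels: each coarse label interpolates to one fine label ---
Q = np.zeros((l, lc))
Q[rng.choice(l, size=lc, replace=False), np.arange(lc)] = 1
Tc = np.einsum('abc,ax,by,cz->xyz', T, Q, Q, Q)      # coarsen each tensor mode
yc = rng.integers(0, lc, size=n)
Wc = np.zeros((n, lc))
Wc[np.arange(n), yc] = 1                             # coarse-label assignment
assert np.isclose(term(T, Wc @ Q.T, c), term(Tc, Wc, c))
```

In the variable-coarsening check the hard interpolation makes the products $P_{i_1,I_1} \cdots P_{i_k,I_k}$ either zero or one, so the coarse tensor for the matching coarse hyper-edge equals the fine tensor itself.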