A Unified Multiscale Framework for Discrete Energy Minimization

04/22/2012 · by Shai Bagon, et al. · Weizmann Institute of Science

Discrete energy minimization is a ubiquitous task in computer vision, yet it is NP-hard in most cases. In this work we propose a multiscale framework for coping with the NP-hardness of discrete optimization. Our approach utilizes algebraic multiscale principles to efficiently explore the discrete solution space, yielding improved results on challenging, non-submodular energies for which current methods provide unsatisfactory approximations. In contrast to popular multiscale methods in computer vision, which build an image pyramid, our framework acts directly on the energy to construct an energy pyramid. Deriving a multiscale scheme from the energy itself makes our framework application independent and widely applicable. Our framework gives rise to two complementary energy coarsening strategies: one in which coarser scales involve fewer variables, and a more revolutionary one in which the coarser scales involve fewer discrete labels. We empirically evaluated our unified framework on a variety of both non-submodular and submodular energies, including energies from the Middlebury benchmark.

Code Repositories

discrete_multiscale

Matlab code implementing discrete multiscale optimization presented in: Shai Bagon and Meirav Galun A Unified Multiscale Framework for Discrete Energy Minimization (arXiv'2012), and Shai Bagon and Meirav Galun A Multiscale Framework for Challenging Discrete Optimization (NIPS Workshop on Optimization for Machine Learning 2012).



1 Introduction

Figure 1: A unified multiscale framework: We derive a multiscale representation of the energy itself, i.e., an energy pyramid. Our multiscale framework is unified in the sense that different problems with different energies share the same multiscale scheme, making our framework widely applicable and general.

Discrete energy minimization is ubiquitous in computer vision, and spans a variety of problems. These energies can be roughly divided into two classes: submodular and non-submodular energies. Submodular energies are characterized by “smoothness”-encouraging pairwise (or higher order) terms. Apart from the binary case, minimizing these energies is known to be NP-hard. Despite this theoretical hardness, such submodular energies, which naturally reflect a “piecewise constant” prior, gained popularity and became very common in computer vision applications, such as denoising, stereo and multi-label segmentation (e.g., Szeliski et al (2008)). For this reason, most of the efforts of the vision community regarding discrete optimization focused on developing approximate optimization methods for these submodular energies, yielding quite successful algorithms. Recently, more challenging, non-submodular energies have started to gain attention. These energies are characterized by a combination of “smooth”- and “non-smooth”-encouraging pairwise terms. The correlation-clustering functional, recently applied to segmentation, co-segmentation and clustering (e.g., Glasner et al (2011); Bagon and Galun (2011)), is an example of such a non-submodular energy. Moreover, non-submodular energies may appear when the parameters of the energy are learned automatically (e.g., Nowozin et al (2011)). Since such non-submodular energies have only recently been explored, their optimization has received less attention, and consequently, the existing optimization methods provide approximations that may be quite unsatisfactory. In practice, it is generally considered a more challenging task to optimize non-submodular energies.

But what makes discrete energy minimization such a challenging endeavor? The fact that this minimization implies an exploration of an exponentially large search space. One way to alleviate this difficulty is to use a multiscale search. The illustration on the right shows a toy “energy” at different scales of detail. Considering only the original scale, it is very difficult to suggest an effective exploration (optimization) method. However, when looking at coarser scales of the energy, an interesting phenomenon is revealed. At the coarsest scale the large basins of attraction emerge, but with very low accuracy. As the scales become finer, one “loses sight” of the large basins, but may now “sense” more local properties with higher accuracy. We term this well-known phenomenon the multiscale landscape of the energy. This multiscale landscape phenomenon encourages coarse-to-fine exploration strategies: starting with the large basins that are apparent at coarse scales, and then gradually and locally refining the search at finer scales.

For more than three decades the vision community has focused on the multiscale pyramid of images (e.g., Lucas and Kanade (1981); Burt and Adelson (1983)). In contrast, there is almost no experience with, and there are almost no methods for, applying a multiscale scheme directly to discrete energies.

Another domain in which multiscale methods are common practice is numerical PDE solvers. Early works in that domain applied geometric coarsening (geometric multigrid), which is the analogue of the classical image pyramid. A solution for a PDE was then obtained by applying a single-scale solver at each scale (relaxation). This geometric multigrid paradigm suggested a very simple construction of a regular pyramid, at the cost of a very careful design of single-scale solvers, tailored to each problem separately. A breakthrough for the PDE community was the development of algebraic multigrid (AMG) by Brandt (1986). The algebraic multigrid approach derives the pyramid directly from the underlying problem, resulting in an irregular, data-driven pyramid. This way, local and general solvers (e.g., Gauss-Seidel relaxation) can be incorporated into the algebraic pyramid, yielding improved and robust solutions (Stüben (1999)).

In this paper we present a novel unified discrete multiscale optimization scheme that acts directly on the energy (Fig. 1). Our multiscale framework is unified in the sense that it is application independent: different problems with different energies share the same multiscale scheme, making our framework widely applicable and general. More importantly, our multiscale method efficiently explores the discrete solution space through an irregular multiscale energy pyramid, constructed by energy-aware coarse-to-fine interpolation. In a sense, our method may be considered the discrete analogue of AMG: instead of focusing attention on complicated optimization schemes, our framework exposes the multiscale landscape of the energy through an energy-aware construction of the pyramid. This way even simple and local optimization methods can be incorporated into our pyramid, yielding improved and robust approximations. In practice, we apply our multiscale optimization method to a large set of challenging problems, both submodular and non-submodular, and achieve comparable or lower energy values than those obtained by state-of-the-art methods.

This work makes several contributions:

  1. A novel unified multiscale framework for discrete optimization: a wide variety of optimization problems, including segmentation, stereo, denoising, correlation-clustering, and others, share the same multiscale framework.

  2. Any multiscale scheme requires a single-scale optimization method to refine the search at each scale. Our framework is also unified in the sense that it is not restricted to any specific optimization method.

  3. An energy-aware coarsening scheme. Variable aggregation takes into account the underlying structure of the energy itself, and thus efficiently and directly exposes its multiscale landscape.

  4. A discrete analogue to AMG. Incorporating even simple and local optimization methods into our energy-aware pyramid yields good approximations.

  5. Coarsening the labels. Our formulation allows for variable coarsening as well as for label coarsening.

  6. Optimizing hard non-submodular energies. We achieve significantly lower energy assignments on diverse computer vision energies, including challenging non-submodular examples.

1.1 Related work

Algorithms for discrete energy minimization can work in the primal space or the dual space. Primal methods act on the discrete variables in the label space to minimize the energy (e.g., Besag (1986); Boykov et al (2002); Rother et al (2007)). Dual methods formulate a dual problem to the energy and maximize a lower bound on the sought energy (e.g., Kolmogorov (2006)). Dual methods have recently been considered more favorable since they do not only provide an approximate solution, but also a lower bound on how far this solution is from the global optimum. Furthermore, if a labeling is found with energy equal to the lower bound, a certificate is provided that the global optimum was found. For submodular energies it was shown (by Szeliski et al (2008)) that dual methods tend to provide better approximations with very tight lower bounds. However, using several classes of non-submodular energies, we empirically demonstrate that when it comes to challenging non-submodular energies, primal methods tend to provide better approximations than dual methods, since in these cases the lower bound is no longer tight (Werner (2010)).

Our multiscale framework constructs a multiscale energy pyramid in the primal space. We achieve comparable performance when applied to submodular problems, and superior performance when applied to non-submodular problems, compared to state-of-the-art methods (both primal and dual).

There are very few works that apply multiscale schemes directly to the discrete energy. A prominent example of this approach was suggested by Felzenszwalb and Huttenlocher (2006); it provides a coarse-to-fine belief propagation scheme restricted to a regular dyadic pyramid. A more recent work is that of Komodakis (2010), which provides an algebraic multigrid formulation for discrete optimization in the dual space. However, despite his general formulation, Komodakis only provides examples using regular dyadic grids of submodular energies.

The work of Kim et al (2011) proposes a two-scale scheme mainly aimed at improving run-time of the optimization process. Their proposed coarsening strategies can be interpreted as special cases of our unified framework. We analyze their underlying assumptions (Sec. 3.1), and suggest better methods for efficient exploration of the multiscale landscape of the energy.

The complexity of optimization algorithms is affected by the number of discrete labels, as well as the number of variables. Existing optimization algorithms start to fall behind when facing energies with a large label space. Lempitsky et al (2007) proposed a method that exploits known properties of the metric between labels to allow for faster minimization of energies with a large number of labels. However, their method is restricted to energies with clear and known label metrics and requires training. In contrast, our framework addresses this issue via a principled scheme that builds an energy pyramid with a decreasing number of labels, without prior training and with fewer assumptions on the label interactions.

2 Multiscale Energy Pyramid

We consider discrete pair-wise minimization problems, defined over a (weighted) graph G = (𝒱, ℰ), of the form:

E(l_1, \ldots, l_n) = \sum_{i \in \mathcal{V}} E_i(l_i) + \sum_{(i,j) \in \mathcal{E}} w_{ij} V(l_i, l_j)    (1)

where 𝒱 is the set of n variables, ℰ is the set of edges, and the solution is discrete: l_i ∈ {1, …, l}, i.e., each of the n variables takes one of l possible labels. Many problems in computer vision are cast in the form of (1) (see Szeliski et al (2008)). Furthermore, we do not restrict the energy to be submodular, and our framework is also applicable to more challenging non-submodular energies.

Our aim is to build an effective energy pyramid with a decreasing number of degrees of freedom. The key component in constructing such a pyramid is the interpolation method. The interpolation maps solutions between levels of the pyramid, and determines how the original energy is approximated with fewer degrees of freedom. We propose a novel, principled, energy-aware interpolation method such that the resulting energy pyramid efficiently exposes the multiscale landscape of the energy, making low-energy assignments apparent at coarse levels.

Practically, it is counter-intuitive to directly interpolate discrete label values, since they usually have only a semantic interpretation. Therefore, we substitute an assignment (l_1, …, l_n) by an equivalent binary matrix representation U ∈ {0,1}^{n×l}. The rows of U correspond to the variables, and the columns correspond to labels: U_iα = 1 iff variable i is labeled α (l_i = α). This representation allows us to interpolate discrete solutions, as will be shown in the subsequent sections.

Expressing the energy (1) using U yields a relaxed quadratic representation (Rangarajan (2000)). This algebraic representation forms the basis for our principled multiscale framework derivation:

E(U) = \mathrm{Tr}\left( D U^\top + V U^\top W U \right)    (2)
s.t. \;\; U \in \{0,1\}^{n \times l}, \quad \sum_{\alpha=1}^{l} U_{i\alpha} = 1 \;\; \forall i    (3)

where D ∈ ℝ^{n×l} s.t. D_iα = E_i(α), W ∈ ℝ^{n×n} s.t. W_ij = w_ij, and V ∈ ℝ^{l×l} s.t. V_αβ = V(α, β).

An energy over n variables with l labels is now parameterized by (D, W, V).
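To make the matrix notation concrete, the following minimal NumPy sketch (illustrative only, not the authors' Matlab code) builds the binary matrix U for a random assignment and verifies that the quadratic form (2) reproduces the pairwise energy (1); the names D, W, V, U follow the notation above, while the random data and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 6, 3                                   # number of variables, number of labels

D = rng.normal(size=(n, l))                   # unary terms: D[i, a] = E_i(a)
W = np.triu(rng.normal(size=(n, n)), k=1)     # pairwise weights w_ij (edges i < j)
V = np.abs(rng.normal(size=(l, l)))
V = 0.5 * (V + V.T)                           # symmetric label-disagreement matrix V(a, b)

labels = rng.integers(0, l, size=n)           # an assignment l_1, ..., l_n
U = np.zeros((n, l))
U[np.arange(n), labels] = 1.0                 # U[i, a] = 1  iff  l_i = a

# Direct evaluation of (1): unary terms plus weighted pairwise disagreements.
e_direct = sum(D[i, labels[i]] for i in range(n)) + \
           sum(W[i, j] * V[labels[i], labels[j]]
               for i, j in zip(*np.nonzero(W)))

# Matrix evaluation of (2): Tr(D U^T) + Tr(V U^T W U).
e_matrix = np.trace(D @ U.T) + np.trace(V @ U.T @ W @ U)

assert np.isclose(e_direct, e_matrix)
```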

We first describe the energy pyramid construction for a general interpolation matrix P, and defer the detailed description of our novel interpolation to Sec. 3.

Energy coarsening by variables

Let (D, W, V) be the fine scale energy defined over n variables. We wish to generate a coarser representation (D^c, W^c, V) with n^c < n variables. This representation approximates E(U) using fewer variables: a coarse assignment U^c with only n^c rows.

An interpolation matrix P ∈ [0,1]^{n×n^c}, s.t. Σ_j P_ij = 1, maps a coarse assignment U^c to a fine assignment U. Any fine assignment U that can be approximated by a coarse assignment U^c satisfies

U = P\, U^c    (4)

Plugging (4) into (2):

E(P U^c) = \mathrm{Tr}\left( P^\top D\, (U^c)^\top + V (U^c)^\top P^\top W P\, U^c \right) \doteq E^c(U^c)    (5)

with D^c ≐ P^T D and W^c ≐ P^T W P. We have generated a coarse energy parameterized by (D^c, W^c, V) that approximates the fine energy (D, W, V). This coarse energy is of the same form as the original energy, allowing us to apply the coarsening procedure recursively to construct an energy pyramid.
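The coarsening step of Eq. (5) amounts to a few lines of linear algebra. The sketch below (illustrative Python under the same notation; a random soft-aggregation matrix P stands in for the energy-aware one of Sec. 3, and the helper name energy is hypothetical) forms D^c = P^T D and W^c = P^T W P and checks that the coarse energy of U^c matches the relaxed quadratic form (2) evaluated at the interpolated assignment P U^c.

```python
import numpy as np

def energy(D, W, V, U):
    """Quadratic form (2): E(U) = Tr(D U^T) + Tr(V U^T W U)."""
    return np.trace(D @ U.T) + np.trace(V @ U.T @ W @ U)

rng = np.random.default_rng(1)
n, n_c, l = 6, 3, 3                           # fine variables, coarse variables, labels
D = rng.normal(size=(n, l))
W = np.triu(rng.normal(size=(n, n)), k=1)
V = np.abs(rng.normal(size=(l, l))); V = 0.5 * (V + V.T)

P = rng.random(size=(n, n_c))                 # soft interpolation matrix,
P /= P.sum(axis=1, keepdims=True)             # rows sum to 1

D_c = P.T @ D                                 # coarse unary terms (Eq. 5)
W_c = P.T @ W @ P                             # coarse pairwise weights (V unchanged)

labels_c = rng.integers(0, l, size=n_c)       # any coarse assignment U^c
U_c = np.zeros((n_c, l)); U_c[np.arange(n_c), labels_c] = 1.0

assert np.isclose(energy(D_c, W_c, V, U_c), energy(D, W, V, P @ U_c))
```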

Energy coarsening by labels

So far we have explored the reduction of the number of degrees of freedom by reducing the number of variables. However, we may just as well look at the problem from a different perspective: reducing the search space by decreasing the number of labels from l to l^c (l^c < l). It is a well-known fact that optimization algorithms suffer from a significant degradation in performance as the number of labels increases (Bleyer et al (2010)). Here we propose a novel, principled and general framework for reducing the number of labels at each scale.

Let (D, W, V) be the fine scale energy. Consider a different interpolation matrix P̂ ∈ [0,1]^{l×l^c}, which interpolates a coarse solution Ũ by U = Ũ P̂^T. This time the interpolation matrix acts on the labels, i.e., on the columns of U. The coarse labeling matrix Ũ has the same number of rows (variables), but fewer columns (labels). We use the notation P̂ to emphasize that the coarsening here affects the labels rather than the variables.

Coarsening the labels yields:

E(\tilde{U} \hat{P}^\top) = \mathrm{Tr}\left( D \hat{P}\, \tilde{U}^\top + \hat{P}^\top V \hat{P}\, \tilde{U}^\top W \tilde{U} \right) \doteq \tilde{E}(\tilde{U})    (6)

Again, we end up with the same type of energy, but this time it is defined over a smaller number of discrete labels: (D̃, W, Ṽ), where D̃ ≐ D P̂ and Ṽ ≐ P̂^T V P̂.
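Label coarsening is equally compact. The sketch below (illustrative; a random label-interpolation matrix P_hat stands in for the one derived from label agreements in Sec. 3.1) forms D̃ = D P̂ and Ṽ = P̂^T V P̂ and checks the identity of Eq. (6).

```python
import numpy as np

def energy(D, W, V, U):
    # Quadratic form (2): E(U) = Tr(D U^T) + Tr(V U^T W U)
    return np.trace(D @ U.T) + np.trace(V @ U.T @ W @ U)

rng = np.random.default_rng(2)
n, l, l_c = 6, 4, 2                           # variables, fine labels, coarse labels
D = rng.normal(size=(n, l))
W = np.triu(rng.normal(size=(n, n)), k=1)
V = np.abs(rng.normal(size=(l, l))); V = 0.5 * (V + V.T)

P_hat = rng.random(size=(l, l_c))             # label interpolation matrix,
P_hat /= P_hat.sum(axis=1, keepdims=True)     # rows sum to 1

D_t = D @ P_hat                               # coarse unary terms over l_c labels
V_t = P_hat.T @ V @ P_hat                     # coarse label-disagreement matrix

labels_c = rng.integers(0, l_c, size=n)       # a coarse-label assignment U~
U_t = np.zeros((n, l_c)); U_t[np.arange(n), labels_c] = 1.0

assert np.isclose(energy(D_t, W, V_t, U_t), energy(D, W, V, U_t @ P_hat.T))
```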

The main theoretical contribution of this work is encapsulated in the multiscale “trick” of equations (5) and (6). Formulating the interpolation as a linear operator (P, P̂ resp.) and plugging it into the quadratic energy representation (2) provides a principled algebraic representation for our multiscale framework. Our direct formulation is in contrast to the “ad-hoc” representations of Felzenszwalb and Huttenlocher (2006); Kim et al (2011), and Komodakis (2010). Our scheme moves the multiscale construction entirely to the optimization side and makes it independent of any specific application. We can now practically approach a wide and diverse family of energies using the same multiscale implementation.

The effectiveness of the multiscale approximations (5) and (6) heavily depends on the interpolation matrix (P, P̂ resp.). Poorly constructed interpolation matrices will fail to expose the multiscale landscape of the functional. In the subsequent section we describe our principled energy-aware method for computing it.

3 Energy-aware Interpolation


Figure 2: Interpolation as soft variable aggregation: fine variables 1, 2, 3 and 4 are softly aggregated into coarse variables 1 and 2. For example, fine variable 1 is a convex combination of coarse variables 1 and 2. Hard aggregation is a special case where P is a binary matrix; in that case each fine variable is influenced by exactly one coarse variable.

In this section we use terms and notations for variable coarsening (P); however, the motivation and methods are applicable to label coarsening (P̂) as well, due to the similar algebraic structure of (5) and (6).

Our energy pyramid approximates the original energy using a decreasing number of degrees of freedom, thus excluding some solutions from the original search space at coarser scales. Which solutions are excluded is determined by the interpolation matrix P. A desired interpolation does not exclude low-energy assignments at coarse levels.

The matrix P can be interpreted as an operator that aggregates fine-scale variables into coarse ones (Fig. 2). Aggregating fine variables i and j into a coarser one excludes from the search space all assignments for which l_i ≠ l_j. This aggregation is undesired if assigning i and j to different labels yields low energy. However, when variables i and j are in agreement under the energy (i.e., assignments with l_i = l_j yield low energy), aggregating them together allows for efficient exploration of low-energy assignments. A desired interpolation aggregates i and j when i and j are in agreement under the energy.

3.1 Measuring energy-aware agreements

We provide two measures of agreement: one is used for computing the variable coarsening (P), while the other is used for the label coarsening (P̂).

Energy-aware agreement between variables:

A reliable estimation of the agreement between the variables allows us to construct a desirable P that aggregates variables that are in agreement under the energy. A naïve approach would assume that neighboring variables are always in agreement (this assumption underlies the dyadic pyramids of Felzenszwalb and Huttenlocher (2006); Komodakis (2010)). This assumption clearly does not hold in general and may yield an undesired interpolation matrix, leading to an inefficient multiscale scheme. More recently, Kim et al (2011) suggested using the energy itself in order to estimate variable agreements. However, their ad-hoc methods are incapable of balancing the effect of the unary and pair-wise terms of the energy.

Indeed it is difficult to decide which term dominates and how to fuse the two terms together. Therefore, we propose a novel empirical scheme for agreement estimation that naturally accounts for and integrates the influence of both the unary and the pair-wise terms. Moreover, our method applies to all energies of the form (2): submodular and non-submodular, with metric or arbitrary label-disagreement V and arbitrary weights W, defined over regular grids or arbitrary graphs.

Variables i and j are in agreement under the energy when l_i = l_j yields a relatively low energy value. To estimate these agreements we empirically generate several samples with relatively low energy, and measure the label agreement between neighboring variables i and j in these samples. We use the Iterated Conditional Modes (ICM) of Besag (1986) to obtain locally low energy assignments: starting with a random assignment, ICM chooses, at each iteration, for each variable, the label yielding the largest decrease of the energy function, conditioned on the labels assigned to its neighbors.

This procedure may be viewed as a special case of sampling from a distribution: the assumed underlying distribution is a Gibbs distribution, i.e., Pr(U) ∝ exp(−E(U)/T). ICM may be interpreted as Gibbs sampling from this distribution in the limit T → 0 (i.e., the “zero-temperature” limit). Therefore, our samples may be viewed as zero-temperature Gibbs sampling with multiple restarts from the posterior (Koller and Friedman (2009)).

Performing ICM iterations with K random restarts provides us with K samples. Utilizing the label-disagreement weights encoded in the matrix V, the disagreement between neighboring variables i and j is estimated as d_ij = (1/K) Σ_k V(l_i^(k), l_j^(k)), where l_i^(k) is the label of variable i in the k-th sample. Their agreement c_ij is then defined as a decreasing function of d_ij.
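A sketch of this empirical agreement estimation is given below (illustrative Python; the helper names icm and estimate_disagreements, the number of restarts K, the number of ICM sweeps, and the exact mapping from disagreements d_ij to agreements c_ij are assumptions, not the authors' exact settings).

```python
import numpy as np

def icm(D, W, V, labels, n_sweeps=10):
    """Greedy coordinate descent (ICM): each variable takes the label that
    minimizes the energy conditioned on its neighbors' current labels.
    Assumes a symmetric label-disagreement matrix V and float weights W."""
    n, l = D.shape
    W_sym = W + W.T                                  # edges stored in either direction
    for _ in range(n_sweeps):
        for i in range(n):
            cost = D[i].copy()                       # unary cost of each candidate label
            for j in np.flatnonzero(W_sym[i]):
                cost += W_sym[i, j] * V[:, labels[j]]  # pairwise cost vs. neighbor j
            labels[i] = int(np.argmin(cost))
    return labels

def estimate_disagreements(D, W, V, K=10, seed=0):
    """Empirical disagreement d_ij: mean label disagreement V(l_i, l_j) over
    K locally-low-energy ICM samples obtained from random restarts."""
    rng = np.random.default_rng(seed)
    n, l = D.shape
    d = np.zeros_like(W)
    for _ in range(K):
        labels = icm(D, W, V, rng.integers(0, l, size=n))
        for i, j in zip(*np.nonzero(W)):
            d[i, j] += V[labels[i], labels[j]] / K
    return d      # agreements c_ij are then taken to decrease with d_ij
```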

Energy-aware agreement between labels: Agreements between labels are easier to estimate, since this information is explicit in the matrix V, which encodes the label-disagreement between any two labels. Setting the agreement between labels α and β to be a decreasing function of V_αβ, we get a “closed-form” expression for the agreements between labels.

3.2 From agreements to interpolation

Using our measure for the variable agreements, c_ij, we follow the Algebraic Multigrid (AMG) method of Brandt (1986) to first determine a set of coarse representative variables, and then construct an interpolation matrix P that softly aggregates variables according to their agreement.

We begin by selecting a set of coarse representative variables C ⊂ 𝒱, such that every variable in 𝒱 is in agreement with C. A variable i is considered in agreement with C if Σ_{j∈C} c_ij ≥ β Σ_{j∈𝒱} c_ij. That is, every variable in 𝒱 is either in C or is in agreement with variables in C, and is thus well represented at the coarse scale.

We perform this selection greedily and sequentially, starting with C = ∅ and adding a variable i to C if it is not yet in agreement with C. The parameter β affects the coarsening rate, i.e., the ratio n^c/n; a smaller β results in a lower ratio.

At the end of this process we have a set of coarse representatives C. The interpolation matrix P ∈ [0,1]^{n×n^c} is then defined by:

P_{i, I(j)} = \begin{cases} c_{ij} / \sum_{k \in C} c_{ik} & i \notin C,\; j \in C \\ 1 & i \in C,\; j = i \\ 0 & \text{otherwise} \end{cases}    (7)

where I(j) is the coarse index of the variable whose fine index is j (see Fig. 2).

We further prune the rows of P, leaving only their largest entries; each row is then normalized to sum to 1. Throughout our experiments we use fixed values of the coarsening parameter β and of the number of retained entries per row for computing P (and similarly for P̂).
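The sketch below illustrates one possible implementation of this selection and interpolation step, following the reconstruction of Eq. (7) above; the function name build_interpolation, the threshold beta, the number of retained entries r, and the handling of isolated variables are illustrative assumptions rather than the authors' exact choices.

```python
import numpy as np

def build_interpolation(c, beta=0.2, r=2):
    """c: symmetric (n x n) agreement matrix c_ij (non-zero only on edges).
    Returns the list of coarse representatives C and P in [0,1]^{n x n_c}."""
    n = c.shape[0]
    coarse = []                                       # greedily selected representatives
    for i in range(n):
        # add i to C if it is not yet (sufficiently) in agreement with C;
        # '<=' also turns isolated variables (all-zero agreement row) into representatives
        if c[i, coarse].sum() <= beta * c[i].sum():
            coarse.append(i)

    index_of = {f: k for k, f in enumerate(coarse)}   # fine index -> coarse index
    P = np.zeros((n, len(coarse)))
    for i in range(n):
        if i in index_of:
            P[i, index_of[i]] = 1.0                   # representatives map to themselves
        else:
            P[i, :] = c[i, coarse]                    # soft aggregation by agreement

    for i in range(n):                                # prune to r largest entries per row
        keep = np.argsort(P[i])[-r:]
        pruned = np.zeros_like(P[i])
        pruned[keep] = P[i, keep]
        P[i] = pruned / pruned.sum()                  # each row of P sums to 1
    return coarse, P
```

Given P, the coarse energy then follows Eq. (5): D_c = P.T @ D and W_c = P.T @ W @ P.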

4 A Unified Discrete Multiscale Framework

Input: Energy (D, W, V).
Output: Assignment U^0.
Init (D^0, W^0) ← (D, W), s ← 0 // fine scale
// Energy pyramid construction:
while scale s has more than a handful of variables do
       Estimate pair-wise agreements at scale s (Sec. 3.1). Compute interpolation matrix P^s (Sec. 3.2). Derive coarse energy (D^{s+1}, W^{s+1}) (Eq. 5). s ← s + 1.
// Coarse-to-fine optimization:
U^s ← Refine(coarsest-scale initialization)
while s > 0 do
       U^{s-1} ← P^{s-1} U^s // interpolate a solution
       U^{s-1} ← Refine(U^{s-1}); s ← s − 1
where Refine(U) uses an existing single-scale method to optimize the energy with U as an initialization.
Algorithm 1 Discrete multiscale optimization.

So far we have described the different components of our multiscale framework. Alg. 1 puts them together into a multiscale minimization scheme. Given an energy (D, W, V), our framework first works fine-to-coarse to compute interpolation matrices P^0, P^1, … that construct the “energy pyramid” (D^0, W^0, V), (D^1, W^1, V), …. Typically we end up at the coarsest scale with only a handful of variables. As a result, exploring the energy at this scale is robust to the initial assignment of the single-scale method used (in practice, at the coarsest scale we use a “winner-take-all” initialization, as suggested by Szeliski et al (2008, §3.1)).

Starting from the coarsest scale, we apply a simple single-scale optimization method (e.g., ICM, α-expansion, etc.). Since there are very few degrees of freedom at the coarsest scale, these single-scale methods are likely to obtain a low-energy coarse solution. This stems from the fact that at the coarsest scale the large basins of attraction of the energy are easily accessed and explored.

At each scale s, the coarse solution U^s is interpolated to the finer scale s−1: U^{s−1} = P^{s−1} U^s. At the finer scale, U^{s−1} serves as a good initialization for optimizing the energy with the same single-scale optimization method. These two steps of interpolation followed by refinement are repeated for all scales, from coarse to fine.

Single-scale optimization methods for discrete energies generally accept only discrete assignments (i.e., satisfying the binary constraints (3)) as an initialization. However, the interpolated solution U^{s−1}, at each scale, might not satisfy the binary constraints (3). Therefore, we round each row of U^{s−1} by setting the maximal element to 1 and the rest to 0.
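A minimal sketch of this interpolate-then-round step (illustrative Python under the notation above; the helper name interpolate_and_round is hypothetical):

```python
import numpy as np

def interpolate_and_round(P, U_c):
    """U = P @ U_c may be fractional; keep the per-row maximum as the hard label."""
    U_soft = P @ U_c
    labels = U_soft.argmax(axis=1)                     # winning label per variable
    U = np.zeros_like(U_soft)
    U[np.arange(U_soft.shape[0]), labels] = 1.0
    return U              # a valid initialization for the single-scale refiner
```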

The most computationally intensive module of our framework is the empirical estimation of the variable agreements. The complexity of the agreement estimation is linear in the number of ICM samples, the number of labels, and the number of non-zero elements in W. However, it is fairly straightforward to parallelize this module.

It is now easy to see how our framework generalizes Felzenszwalb and Huttenlocher (2006), Komodakis (2010) and Kim et al (2011). They are restricted to hard aggregation in P. Felzenszwalb and Huttenlocher (2006) and Komodakis (2010) use a multiscale pyramid, however their variable aggregation is not energy-aware and is restricted to dyadic pyramids. On the other hand, Kim et al (2011) have limited energy-aware aggregation, applied to a two-level “pyramid” only.

5 Experimental Results

We evaluated our multiscale framework on a diversity of discrete optimization tasks (code available at www.wisdom.weizmann.ac.il/~bagon/matlab.html): ranging from challenging non-submodular synthetic and co-clustering energies, to low-level submodular vision energies such as denoising and stereo. In all of these experiments we minimize a given publicly available benchmark energy; we do not attempt to improve on the energy formulation itself.

For every instance of an energy minimization problem in these benchmarks we construct an energy pyramid using our method. We then use our energy pyramid to efficiently exploit the multiscale landscape of each energy and improve the optimization results of existing methods. In the following experiments we use ICM (Besag (1986)), α-swap and α-expansion (the large move making algorithms of Boykov et al (2002)) as representative single-scale primal optimization algorithms. Each step of the large move making algorithms of Boykov et al (2002) solves a reduced binary problem. For the challenging non-submodular energies these binary steps are approximated using QPBO(I) of Rother et al (2007).

We follow the protocol of Szeliski et al (2008) that uses the lower bound of TRW-S (Kolmogorov (2006)) as a baseline for comparing the performance of different optimization methods on different energies. We report the ratio between the resulting energy value and the lower bound (in percent); closer to 100% is better.
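For concreteness, the reported score is simply the achieved energy expressed as a percentage of the TRW-S lower bound; a minimal helper (illustrative, with a hypothetical function name) is:

```python
def percent_of_lower_bound(energy_value, lower_bound):
    """Closer to 100% is better; 100% means the lower bound is attained."""
    return 100.0 * energy_value / lower_bound
```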

These experiments show how our energy-aware construction of the pyramid efficiently exposes the underlying multiscale landscape of the energy. This way even a simple and very local optimization scheme (applied at each scale) can achieve good approximations. The most prominent example is ICM (Besag (1986)): this greedy local coordinate descent algorithm performs poorly when applied directly to the energy. It converges very rapidly to a sub-optimal local solution (see, e.g., Szeliski et al (2008)). However, when used within our multiscale framework, a local search at coarse scales amounts to a very large and non-local search at the fine scale. This example stresses the advantage of constructing an energy-aware multiscale framework: exposing the multiscale landscape of the energy helps to achieve good approximations even when using simple and local methods at each scale.

When incorporating large move making algorithms as the single-scale optimization in our framework, there is a consistent improvement of multiscale over the single-scale scheme. In addition, TRW-S is a dual method and is considered state-of-the-art for discrete energy minimization (Szeliski et al (2008)). However, we show that when it comes to non-submodular energies it lags behind the large move making algorithms and even ICM. Moreover, for these challenging energies, our multiscale framework gives a significant boost in optimization performance, achieving significantly lower energy values than TRW-S.

Table 1: Synthetic results: Showing percent of achieved energy value relative to the lower bound (closer to 100% is better) for ICM, α-swap (QPBO) and α-expansion (QPBO) (each: ours vs. single scale) and for TRW-S, for varying strengths of the pair-wise term (a stronger pair-wise term is harder to optimize).

5.1 Synthetic

We begin with synthetic non-submodular energies defined over a 4-connected grid graph, with randomly drawn unary and pair-wise terms. A parameter controls the relative strength of the pair-wise term; stronger pair-wise terms result in energies that are more difficult to optimize (see Kolmogorov (2006)). Table 1 shows results, averaged over 100 experiments.

Using our multiscale framework to perform coarse-to-fine optimization of the energy yields significantly lower energies for all single-scale methods used (ICM, α-expansion and α-swap) and for TRW-S: the percentages in the “ours” columns are closer to 100% than the results of the other methods.

Despite the fact that these synthetic energies were randomly generated without any underlying structure, there is still a multiscale landscape to the functional. Our multiscale framework constructs an energy pyramid that exposes this underlying multiscale landscape, resulting in better and more efficient optimization.

The resulting synthetic energies are non-submodular (since the pair-wise weights may become negative). For these challenging energies, the state-of-the-art dual method (TRW-S) performs rather poorly (worse than single-scale ICM; we did not restrict the number of iterations, and let TRW-S run until no further improvement to the lower bound was made), and there is a significant gap between the lower bound and the energy of the actual primal solution provided. This gap might be due to the fact that for these challenging non-submodular energies the dual bound is not tight (Werner (2010)).

Figure 3: Chinese characters inpainting: Visualizing some of the instances used in our experiments. Columns are (left to right): the original character used for testing; the input, partially occluded character; ICM and QPBO results, both our multiscale and single scale; results of TRW-S; and results of Nowozin et al (2011) obtained with a very long run of simulated annealing (using Gibbs sampling inside the annealing).
Figure 4: Energies of Chinese characters inpainting: Box plot showing the 25%, median and 75% quantiles of the resulting energies relative to the reference energies of Nowozin et al (2011) (values below the reference indicate energies lower than the baseline). Our multiscale approach combined with QPBO achieves consistently better energies than the baseline, with very low variance. TRW-S improves on only 25% of the instances, with very high variance in the results.

Table 2: Energies of Chinese characters inpainting: (a) mean energies for the inpainting experiment relative to the baseline of Nowozin et al (2011), for ICM and QPBO (ours vs. single scale) and for TRW-S (lower is better; values below the baseline indicate lower energy). (b) percent of instances for which a strictly lower energy was achieved.

5.2 Chinese character inpainting

We further experiment with the non-submodular learned binary energies of (Nowozin et al, 2011, §5.2) (available at www.nowozin.net/sebastian/papers/DTF_CIP_instances.zip). These 100 instances of non-submodular pair-wise energies are defined over a 64-connected grid. The energies were designed and trained to perform the task of learning Chinese calligraphy, represented as a complex, non-local binary pattern.

Our experiments show how approaching these challenging energies using our unified multiscale framework allows for better approximations. Table 2 and Fig. 4 compare our multiscale framework to single-scale methods acting on the primal binary variables. Since the energies are binary, the multi-label large move making algorithms boil down to binary QPBO. We also provide an evaluation of a dual method (TRW-S) on these energies. In addition to the quantitative results, Fig. 3 provides a visualization of some of the instances of the restored Chinese characters.

For these challenging non-submodular “real world” energies our multiscale framework provides a significant improvement over the single-scale scheme.

Table 3: Co-clustering results: The baseline for comparison is the state-of-the-art results of Glasner et al (2011); columns are ICM, α-swap (QPBO) and α-expansion (QPBO) (each: ours vs. single scale) and TRW-S. (a) We report our results as percent of the baseline: smaller is better; lower than 100% even outperforms the state-of-the-art. (b) We also report the fraction of energies for which our multiscale framework outperforms the state-of-the-art.

5.3 Co-clustering

The problem of co-clustering addresses the matching of superpixels within and across frames in a video sequence. Following (Bagon and Galun, 2011, §6.2), we treat co-clustering as a discrete minimization of a non-submodular Potts energy. We obtained 77 co-clustering energies, courtesy of Glasner et al (2011), used in their experiments. The number of variables in each energy ranges from 87 to 788, and their sparsity (percent of non-zero entries in W) varies across instances. The resulting energies are non-submodular, have no underlying regular grid, and are very challenging to optimize (Bagon and Galun (2011)).

Table 3 compares our discrete multiscale framework combined with ICM, α-swap and α-expansion. For these energies we use a different baseline: the state-of-the-art results of Glasner et al (2011), obtained by applying a specially tailored convex relaxation method (we do not use the lower bound of TRW-S here since it is far from tight for these challenging energies). Our multiscale framework improves on the state-of-the-art for this family of challenging energies and significantly outperforms TRW-S.

Furthermore, the results demonstrated in the last three sub-sections highlight the advantage that primal methods have over dual ones when it comes to challenging non-submodular energies.

5.4 Submodular energies

We further applied our multiscale framework to optimize less challenging submodular energies. We use the diverse low-level vision MRF energies from the Middlebury benchmark (Szeliski et al (2008); available at vision.middlebury.edu/MRF/).

For these submodular energies, TRW-S (single scale) performs quite well and, in fact, if enough iterations are allowed, its lower bound converges to the global optimum. As opposed to TRW-S, the large move making algorithms and ICM do not always converge to the global optimum. Yet, we are able to show a significant improvement for primal optimization algorithms when used within our multiscale framework. Tables 4 and 5 and Figs. 5 and 6 show our multiscale results for the different submodular energies.

Table 4: Stereo (Tsukuba, Venus, Teddy): Showing percent of achieved energy value relative to the lower bound (closer to 100% is better) for ICM, α-swap and α-expansion, ours vs. single scale. Visual results for these experiments are in Fig. 5. Energies from Szeliski et al (2008).
Figure 5: Stereo: Columns show ICM, α-swap and α-expansion results (ours vs. single scale) and the ground truth. Note how our multiscale framework drastically improves ICM results. A visible improvement for α-swap can also be seen in the middle row (Venus). Numerical results for these examples are shown in Table 4. Energies from Szeliski et al (2008).
Table 5: Denoising and inpainting (House, Penguin): Showing percent of achieved energy value relative to the lower bound (closer to 100% is better) for ICM, α-swap and α-expansion, ours vs. single scale. Visual results for these experiments are in Fig. 6. Energies from Szeliski et al (2008).
Figure 6: Denoising and inpainting: Columns show the input and the results of ICM, α-swap and α-expansion (ours vs. single scale). Single-scale ICM is unable to cope with inpainting: performing local steps, it is unable to propagate information far enough to fill the missing regions in the images. On the other hand, our multiscale framework allows ICM to perform large steps at coarse scales and successfully fill the gaps. Numerical results for these examples are shown in Table 5. Energies from Szeliski et al (2008).

5.5 Comparing variable agreement estimation methods

As explained in Sec. 3, the agreements between the variables are the most crucial component in constructing an effective multiscale scheme. In this experiment we compare our energy-aware agreement measure (Sec. 3.1) to three methods proposed by Kim et al (2011): “unary-diff”, “min-unary-diff” and “mean-compat”. These methods estimate the agreement based either on the unary term or the pair-wise term, but not both. We also compare to an energy-agnostic measure that assigns equal agreement to all neighboring variables; this measure underlies Felzenszwalb and Huttenlocher (2006); Komodakis (2010).

For each energy we estimate the variable agreements using these five different approaches. These different estimations are then used to construct five different energy pyramids (as described in Sec. 3.2). Better agreement estimation results in better exploration of the multiscale landscape of the energy, yielding better optimization results. We use ICM with each of the five energy pyramids to evaluate the influence these methods have on the resulting multiscale performance, for three representative energies.

Fig. 7 shows the percent of the lower bound achieved for the different energies. Energy pyramids constructed using our agreement estimation method consistently outperform all other methods; our measure successfully balances the influence of the unary and the pair-wise terms.

Figure 7: Comparing agreement estimation methods: Graphs showing percent of the lower bound (closer to 100% is better) for different methods of computing variable agreements. One bar is cropped for display. Our energy-aware measure consistently outperforms all other methods. As a reference, results of single-scale optimization are shown on the right.

5.6 Coarsening labels

α-swap does not scale gracefully with the number of labels. Coarsening an energy in the label domain (i.e., same number of variables, fewer labels) proves to significantly improve the performance of α-swap, as shown in Table 6. For these examples, constructing the energy pyramid took only milliseconds, due to the “closed form” formula for estimating label agreements.

Our principled framework for coarsening labels improves α-swap performance for these energies.

Table 6: Coarsening labels: Working coarse-to-fine in the labels domain. For Penguin (denoising) the number of labels is reduced from 256 at the finest scale to 67 at the coarsest; for Venus (stereo), from 20 to 4. We use 5 scales with a fixed coarsening rate; the number of variables is unchanged. The table shows percent of achieved energy value relative to the lower bound (closer to 100% is better), and running times in seconds, for ours vs. single scale. These results were obtained using α-swap for optimizing each scale.

6 Conclusion

This work presents a unified multiscale framework for discrete energy minimization that allows for efficient and direct exploration of the multiscale landscape of the energy. We propose two paths to expose the multiscale landscape of the energy: one in which coarser scales involve fewer and coarser variables, and another in which the coarser levels involve fewer labels. We also propose adaptive methods for energy-aware interpolation between the scales. Our multiscale framework significantly improves optimization results for challenging energies.

Our framework provides the mathematical formulation that “bridges the gap” and relates multiscale discrete optimization and algebraic multiscale methods used in PDE solvers (e.g., Brandt (1986)). This connection allows for methods and practices developed for numerical solvers to be applied in multiscale discrete optimization as well.

Acknowledgements.
We would like to thank Maria Zontak and Daniel Glasner for their insightful remarks and discussions.

References

  • Bagon and Galun (2011) Bagon S, Galun M (2011) Large scale correlation clustering optimization. arXiv
  • Besag (1986) Besag J (1986) On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society
  • Bleyer et al (2010) Bleyer M, Rother C, Kohli P (2010) Surface stereo with soft segmentation. In: CVPR
  • Boykov et al (2002) Boykov Y, Veksler O, Zabih R (2002) Fast approximate energy minimization via graph cuts. PAMI
  • Brandt (1986) Brandt A (1986) Algebraic multigrid theory: The symmetric case. Applied Mathematics and Computation
  • Burt and Adelson (1983) Burt P, Adelson E (1983) The Laplacian pyramid as a compact image code. IEEE Transactions on Communications
  • Felzenszwalb and Huttenlocher (2006) Felzenszwalb P, Huttenlocher D (2006) Efficient belief propagation for early vision. IJCV
  • Glasner et al (2011) Glasner D, Vitaladevuni S, Basri R (2011) Contour-based joint clustering of multiple segmentations. In: CVPR
  • Kim et al (2011) Kim T, Nowozin S, Kohli P, Yoo C (2011) Variable grouping for energy minimization. In: CVPR
  • Koller and Friedman (2009) Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. The MIT Press
  • Kolmogorov (2006) Kolmogorov V (2006) Convergent tree-reweighted message passing for energy minimization. PAMI
  • Komodakis (2010) Komodakis N (2010) Towards more efficient and effective LP-based algorithms for MRF optimization. In: ECCV
  • Lempitsky et al (2007) Lempitsky V, Rother C, Blake A (2007) LogCut: efficient graph cut optimization for Markov random fields. In: ICCV
  • Lucas and Kanade (1981) Lucas B, Kanade T (1981) An iterative image registration technique with an application to stereo vision. In: International Joint Conference on Artificial Intelligence

  • Nowozin et al (2011) Nowozin S, Rother C, Bagon S, Sharp T, Yao B, Kohli P (2011) Decision tree fields. In: ICCV

  • Rangarajan (2000) Rangarajan A (2000) Self-annealing and self-annihilation: unifying deterministic annealing and relaxation labeling. Pattern Recognition

  • Rother et al (2007) Rother C, Kolmogorov V, Lempitsky V, Szummer M (2007) Optimizing binary MRFs via extended roof duality. In: CVPR
  • Stüben (1999) Stüben K (1999) Algebraic multigrid (AMG). An introduction with applications. GMD Forschungszentrum Informationstechnik, Sankt Augustin
  • Szeliski et al (2008) Szeliski R, Zabih R, Scharstein D, Veksler O, Kolmogorov V, Agarwala A, Tappen M, Rother C (2008) A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. PAMI
  • Werner (2010) Werner T (2010) Revisiting the linear programming relaxation approach to Gibbs energy minimization and weighted constraint satisfaction. PAMI