A Study of Lagrangean Decompositions and Dual Ascent Solvers for Graph Matching

12/16/2016, by Paul Swoboda, et al.

We study the quadratic assignment problem, in computer vision also known as graph matching. Two leading solvers for this problem optimize the Lagrange decomposition duals with sub-gradient and dual ascent (also known as message passing) updates. We explore this direction further and propose several additional Lagrangean relaxations of the graph matching problem along with corresponding algorithms, which are all based on a common dual ascent framework. Our extensive empirical evaluation gives several theoretical insights and suggests a new state-of-the-art any-time solver for the considered problem. Our improvement over the state of the art is particularly visible on a new dataset with large-scale sparse problem instances containing more than 500 graph nodes each.



1 Introduction

In computer vision and beyond, the quadratic assignment problem, known also as graph matching, feature correspondence and feature matching, has attracted great interest. This problem is similar to Maximum-A-Posteriori (MAP) inference on a discrete pairwise graphical model, also called conditional random field (CRF) in the literature. It differs in an additional uniqueness constraint: each label can be taken at most once. This uniqueness constraint makes it well-suited for, e.g., tracking problems or shape matching. In both cases feature points or object parts have to be matched one-to-one between multiple frames. Unfortunately, the uniqueness constraint prevents naive application of efficient message passing solvers for MAP-inference to this problem. For this reason, many dedicated graph matching solvers were developed, see related work below.

On the other hand, efficient dual block-coordinate ascent (also known as message passing) algorithms like TRW-S [32] count among the most efficient solvers for MAP-inference in conditional random fields. Also, the graph matching problem, after possibly introducing many additional variables, can be stated as a MAP-inference problem in a standard pairwise CRF. Such an approach already surpasses most state-of-the-art graph matching solvers.

Hence, it is desirable to devise specialized convergent message passing solvers exhibiting none of the drawbacks discussed above, i.e. (i) directly operating on a compact representation of the graph matching problem and (ii) using techniques from the MAP-inference community to gain computational efficiency.

To achieve this goal, we propose (i) several Lagrangean decompositions of the graph matching problem and (ii) novel efficient message passing solvers for these relaxations. We show their efficacy in an extensive empirical evaluation.

Related work

The term graph matching refers to a number of different optimization problems in pattern recognition, see [17] for a review. We mean the special version known unambiguously as the quadratic assignment problem (QAP) [35]. Recently, graph matching was generalized to the hypergraph matching problem (see [42] and references therein), which matches between more than two graphs.

The quadratic assignment problem was first formulated in [12]. Since a number of NP-complete problems such as the traveling salesman, maximal clique, graph isomorphism and graph partitioning problems can be straightforwardly reduced to QAP, this problem is NP-hard itself. Its importance for numerous applications boosted its analysis considerably: the (already aged) overview [40] references a large number of works suggesting new algorithms and further works with new theoretical results related to this problem.

Nearly all possible solver paradigms were put to the test for QAP. These include, but are not limited to, convex relaxations based on Lagrangean decompositions [30, 51], linear [21, 5], convex quadratic [8] and semi-definite [62, 50, 45] relaxations, which can be used either directly to obtain approximate solutions or just to provide lower bounds. To tighten these bounds, several cutting plane methods were proposed [10, 11]. On the other side, various primal heuristics, both (i) deterministic, such as local search [43, 4], graduated assignment methods [24], fixed point iterations [38], spectral techniques and their derivatives [37, 16, 52, 61], and (ii) stochastic, like random walk [15] and Monte-Carlo sampling [36, 48], were suggested to provide approximate solutions to the problem. Altogether these methods serve as building blocks for exact branch-and-bound [22, 25, 9] algorithms and other non-convex optimization methods [58, 63, 24]. The excellent surveys [40, 14] contain further references.

As is usual for NP-hard problems, no single method can efficiently address all QAP instances. Different applications require different methods and we concentrate here on problem instances specific for computer vision. Traditionally within this community predominantly primal heuristics are used, since demand for low computational time usually dominates the need to obtain optimality guarantees. However, two recently proposed solvers [51, 60] based on Lagrangean decomposition (also known as dual decomposition in computer vision) have shown superior results and surpassed numerous state-of-the-art primal heuristics.

The dual decomposition solver [51] represents the problem as a combination of MAP-inference for binary CRFs, the linear assignment problem and a number of small-sized QAPs over few variables; Lagrangean multipliers connecting these subproblems are updated with the sub-gradient method. Although the solver demonstrates superior results on computer vision datasets, we suspect that its efficiency can be further improved by switching to a different update method, such as bundle [31, 29] or block-coordinate ascent [56] methods. This suspicion is based on comparisons of such solvers for MAP-inference in CRFs [28] and similar observations related to other combinatorial optimization problems (see e.g. [44]).

Hungarian Belief Propagation (HBP) [60] considers a combination of a multilabel CRF and a linear assignment problem as subproblems; Lagrange multipliers are updated by a block-coordinate ascent (message passing) algorithm and the obtained lower bounds are employed inside a branch-and-bound solver. It is known [33], however, that the efficiency of message passing significantly depends on the schedule of sending messages. Specifically, the efficiency of dual ascent algorithms depends on selecting directions for the ascent (blocks of variables to optimize over) and the order in which these ascent operations are performed. Arguably, the underlying multilabel CRF subproblem is crucial and the message passing must deal with it efficiently. However, HBP [60] uses a message passing schedule similar to that of the MPLP algorithm [23], which was shown [33, 28] to be significantly slower than the schedule of SRMP (TRW-S) [33].

Contribution

We study several Lagrangean decompositions of the graph matching problem. Some of these are known, e.g. the one used in the HBP algorithm [60] and the one corresponding to the local polytope relaxation of the pairwise CRF representation of graph matching. The others have not been published so far, to our knowledge. For all these decompositions we provide efficient message passing (dual ascent) algorithms based on a recent message passing framework [49]. In the case of the local polytope relaxation our algorithm coincides with the SRMP method [33], a higher-order generalization of the famous TRW-S algorithm [32].

Our experimental evaluation suggests a new state-of-the-art method for the graph matching problem, which outperforms both the dual decomposition [51] and the HBP [60] solvers. We propose tighter convex relaxations for all our methods. Also, we significantly improve performance of the HBP algorithm by changing its message passing schedule.

Proofs are given in the appendix. Code and datasets are available at http://github.com/pawelswoboda/LP_MP.

Notation.

Undirected graphs are denoted by $G=(V,E)$, where $V$ is a finite node set and $E$ is the edge set. The set of neighboring nodes of $v \in V$ w.r.t. graph $G$ is denoted by $\mathcal{N}_G(v)$. The convex hull of a set $X$ is denoted by $\operatorname{conv}(X)$.

2 CRFs and Graph Matching

First, we introduce conditional random fields and state the graph matching problem as one with additional uniqueness constraints. Second, we consider an inverse formulation of the graph matching problem, which, after being coupled with the original formulation, often leads to faster algorithms.

Conditional random fields (CRF).

Let $G=(V,E)$ be an undirected graph. With each node $v \in V$ we associate a variable $x_v$ taking its values in a finite set of labels $\mathcal{X}_v$, whose elements we identify with unit binary vectors. Hence, each label corresponds to a unit vector. The notation $\mathcal{X}_A$ denotes the Cartesian product $\prod_{v \in A} \mathcal{X}_v$ for $A \subseteq V$. A vector $x \in \mathcal{X}_V$ with coordinates $x_v$, $v \in V$, is called a labeling. Likewise, we use the notation $x_A$ (a special case being $x_{uv}$ for $uv \in E$) to indicate part of a labeling. Functions $\theta_v\colon \mathcal{X}_v \to \mathbb{R}$, $v \in V$, and $\theta_{uv}\colon \mathcal{X}_{uv} \to \mathbb{R}$, $uv \in E$, are potentials, which define a local quality of labels and label pairs.

The energy minimization or MAP-inference problem for CRFs is
\[
\min_{x \in \mathcal{X}_V} E(x) := \sum_{v \in V} \theta_v(x_v) + \sum_{uv \in E} \theta_{uv}(x_u, x_v). \tag{1}
\]
The objective $E(x)$ in (1) is called the energy of the CRF.

A great number of applied problems can be efficiently cast in the format (1), see e.g. [53, 28]. This defines its importance for computer vision, machine learning and a number of other branches of science [53]. While problem (1) is NP-hard in general, many exact and approximate solvers were proposed [28].

Graph Matching.

Although the format of Problem (1) allows us to express many practically important optimization tasks efficiently, some applications require the resulting labelings to satisfy additional constraints. In particular, for the graph matching problem no label may be taken twice.

Let a common universe of labels $\mathcal{L}$ be given such that $\mathcal{X}_v \subseteq \mathcal{L}$ for all $v \in V$. We require each label to be taken at most once, i.e. $x_u \neq x_v$ for $u \neq v$. In other words, we seek an injective mapping $x\colon V \to \mathcal{L}$. This problem can be stated as
\[
\min_{x \in \mathcal{X}_V} E(x) \quad \text{s.t.} \quad x_u \neq x_v \ \ \forall u \neq v. \tag{2}
\]

Graph matching is NP-hard, since it is equivalent to MAP-inference for CRFs (1) in the trivial case, when nodes of the graph contain mutually non-intersecting sets of labels.
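The effect of the uniqueness constraint in (2) can be seen on a tiny instance with a hypothetical brute-force solver (illustrative names) that enumerates all labelings and discards those taking a label twice. On the toy model below the unconstrained CRF optimum takes label 'a' twice, whereas the matching optimum may not:

```python
from itertools import product

def best_matching(nodes, labels_per_node, unaries, pairwise):
    """Exhaustively solve a tiny instance of (2): minimize the CRF energy
    subject to the constraint that no label is used twice."""
    best, best_x = float('inf'), None
    for x in product(*(labels_per_node[v] for v in nodes)):
        if len(set(x)) < len(x):     # a label taken twice -> infeasible
            continue
        cost = sum(unaries[v][x[i]] for i, v in enumerate(nodes))
        cost += sum(pairwise[(u, v)][(x[nodes.index(u)], x[nodes.index(v)])]
                    for (u, v) in pairwise)
        if cost < best:
            best, best_x = cost, x
    return best, best_x

nodes = [0, 1]
labels = {0: ['a', 'b'], 1: ['a', 'b']}
unaries = {0: {'a': 1.0, 'b': 0.0}, 1: {'a': 0.0, 'b': 2.0}}
pairwise = {(0, 1): {('a', 'a'): 0.0, ('a', 'b'): 1.0,
                     ('b', 'a'): 3.0, ('b', 'b'): 0.0}}
# Without uniqueness the optimum would be ('a', 'a') with cost 1.0;
# with uniqueness the best feasible labeling is ('b', 'a') with cost 3.0.
print(best_matching(nodes, labels, unaries, pairwise))
```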

Inverse Graph Matching

A special case arises if the universe $\mathcal{L}$ of labels to be matched has the same size as the set of nodes $V$ of the graph $G$. Then every injective mapping $x\colon V \to \mathcal{L}$ must also be a bijection. Hence, every feasible labeling corresponds to a permutation of $\mathcal{L}$. The graph matching problem (2) can in this case also be approached in terms of the inverse permutation. To this end, let the inverse graph $G^{-1} = (V^{-1}, E^{-1})$ be given by $V^{-1} = \mathcal{L}$; the inverse label set $\mathcal{X}^{-1}_l = \{v \in V : l \in \mathcal{X}_v\}$ is associated with each node $l \in V^{-1}$; respectively, $\mathcal{X}^{-1}_{V^{-1}}$ is the set of inverse labelings and $x^{-1}_l$ denotes the node assigned to label $l$; the set of edges of the inverse graph is defined as $E^{-1} = \{ll' : \exists\, uv \in E \text{ with } l \in \mathcal{X}_u,\ l' \in \mathcal{X}_v\}$. The inverse costs read $\theta^{-1}_l(v) = \theta_v(l)$ for $l \in V^{-1}$, $v \in \mathcal{X}^{-1}_l$, and $\theta^{-1}_{ll'}(v, v') = \theta_{vv'}(l, l')$ for $ll' \in E^{-1}$, $vv' \in E$ (and $0$ otherwise).

Consider the resulting inverse graph matching problem
\[
\min_{x^{-1} \in \mathcal{X}^{-1}_{V^{-1}}} E^{-1}(x^{-1}) \quad \text{s.t.} \quad x^{-1}_l \neq x^{-1}_{l'} \ \ \forall l \neq l'. \tag{3}
\]
Labeling $x$ and inverse labeling $x^{-1}$ correspond to each other iff $x_v = l \Leftrightarrow x^{-1}_l = v$.
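A quick sketch of this correspondence (illustrative names): for a bijective labeling the inverse simply swaps the roles of nodes and labels.

```python
def invert(labeling):
    """For a bijective labeling x: V -> L, return the inverse labeling
    with node v assigned to label l iff label l is assigned to node v."""
    inv = {l: v for v, l in labeling.items()}
    assert len(inv) == len(labeling), "labeling must be injective"
    return inv

x = {0: 'b', 1: 'a', 2: 'c'}
print(invert(x))  # {'b': 0, 'a': 1, 'c': 2}
```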

Note that when the edge set is sparse, the inverse edge set may not be. In such a case, the computational complexity of the inverse problem is higher than that of the original one.

3 Lagrangean Decompositions

Since the graph matching problem (2) is NP-hard, it is common to consider convex relaxations. Below, we present three Lagrangean decomposition based relaxations of the problem. These can be applied to the original graph matching problem (2), to the inverse one (3) and to a combination of both. Since all these relaxations are based on the famous local polytope relaxation [46, 54] of MAP-inference for CRFs (1), we give a short overview of this relaxation first.

Local Polytope for CRFs.

The MAP-inference problem (1) can be represented as an integer linear program (ILP) [34] using an overcomplete representation [53] by grouping potentials corresponding to each node and edge into separate vectors. That is, $\theta_v \in \mathbb{R}^{\mathcal{X}_v}$ and $\theta_{uv} \in \mathbb{R}^{\mathcal{X}_{uv}}$ stand for vectors with coordinates $\theta_v(s)$, $s \in \mathcal{X}_v$, and $\theta_{uv}(s,t)$, $(s,t) \in \mathcal{X}_{uv}$. The real-valued vectors $\mu_v$, $\mu_{uv}$ have the same dimensionality as $\theta_v$, $\theta_{uv}$ and stand for the "relaxed" version of $x_v$, $x_{uv}$. The corresponding linear programming (LP) relaxation reads:
\[
\begin{array}{ll}
\min\limits_{\mu \ge 0} & \sum\limits_{v \in V} \langle \theta_v, \mu_v \rangle + \sum\limits_{uv \in E} \langle \theta_{uv}, \mu_{uv} \rangle \\[2mm]
\text{s.t.} & \sum\limits_{s \in \mathcal{X}_v} \mu_v(s) = 1 \ \ \forall v \in V, \qquad \sum\limits_{(s,t) \in \mathcal{X}_{uv}} \mu_{uv}(s,t) = 1 \ \ \forall uv \in E, \\[2mm]
& \sum\limits_{t \in \mathcal{X}_v} \mu_{uv}(s,t) = \mu_u(s) \ \ \forall uv \in E,\, s \in \mathcal{X}_u, \qquad \sum\limits_{s \in \mathcal{X}_u} \mu_{uv}(s,t) = \mu_v(t) \ \ \forall uv \in E,\, t \in \mathcal{X}_v.
\end{array} \tag{4}
\]
The constraints of (4) define the local polytope $\Lambda_G$. Note that adding integrality constraints $\mu \in \{0,1\}^{(\cdot)}$ makes problem (4) equivalent to its combinatorial formulation (1).
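As a sanity check on (4), the following sketch (illustrative names, not the authors' code) builds the indicator vector of an integer labeling, verifies the simplex and marginalization constraints of the local polytope, and confirms that the linear objective then coincides with the energy (1):

```python
def check_feasible_and_equal(labeling, labels, unaries, pairwise):
    """Build indicator vectors mu for an integer labeling and verify the
    local polytope constraints of (4) plus <theta, mu> == E(labeling)."""
    mu_v = {v: {s: float(labeling[v] == s) for s in labels[v]} for v in labels}
    mu_e = {e: {(s, t): float(labeling[e[0]] == s and labeling[e[1]] == t)
                for s in labels[e[0]] for t in labels[e[1]]} for e in pairwise}
    for v in labels:                        # simplex constraints
        assert abs(sum(mu_v[v].values()) - 1.0) < 1e-9
    for (u, w), m in mu_e.items():          # marginalization constraints
        for t in labels[w]:
            assert abs(sum(m[(s, t)] for s in labels[u]) - mu_v[w][t]) < 1e-9
        for s in labels[u]:
            assert abs(sum(m[(s, t)] for t in labels[w]) - mu_v[u][s]) < 1e-9
    obj = sum(unaries[v][s] * mu_v[v][s] for v in labels for s in labels[v])
    obj += sum(pairwise[e][st] * mu_e[e][st] for e in pairwise for st in mu_e[e])
    direct = sum(unaries[v][labeling[v]] for v in labels)
    direct += sum(pairwise[(u, w)][(labeling[u], labeling[w])]
                  for (u, w) in pairwise)
    assert abs(obj - direct) < 1e-9         # <theta, mu> equals the energy
    return True

labels = {0: ['a', 'b'], 1: ['a', 'b']}
unaries = {0: {'a': 1.0, 'b': 0.0}, 1: {'a': 0.0, 'b': 2.0}}
pairwise = {(0, 1): {('a', 'a'): 0.0, ('a', 'b'): 1.0,
                     ('b', 'a'): 3.0, ('b', 'b'): 0.0}}
print(check_feasible_and_equal({0: 'a', 1: 'a'}, labels, unaries, pairwise))
```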

Integer Relaxed Pairwise Separable Linear Programs (IRPS-LP)

Below we describe a general problem format studied in [49], which generalizes the local polytope relaxation (4). Importantly, the same format fits also the Lagrangean decompositions of the graph matching problem, which we consider below. This makes it possible to consider all these relaxations at once from a general viewpoint.

Let a factor graph $(\mathbb{F}, \mathbb{J})$ consist of nodes $\mathbb{F}$, called factors, and edges $\mathbb{J} \subseteq \binom{\mathbb{F}}{2}$, called factor-edges. Let (i) $\mathcal{X}_i$, $i \in \mathbb{F}$, be sets of binary vectors and (ii) $A_{i,j}$, $ij \in \mathbb{J}$, be matrices with binary entries, which map binary vectors from $\mathcal{X}_i$ into binary vectors, i.e. $A_{i,j}\mathcal{X}_i \subseteq \{0,1\}^{k_{ij}}$. The IRPS-LP is the class of problems which factorize according to $(\mathbb{F}, \mathbb{J})$:
\[
\min_{\mu} \ \sum_{i \in \mathbb{F}} \langle \theta_i, \mu_i \rangle \quad \text{s.t.} \quad \mu_i \in \operatorname{conv}(\mathcal{X}_i) \ \ \forall i \in \mathbb{F}, \qquad A_{i,j}\mu_i = A_{j,i}\mu_j \ \ \forall ij \in \mathbb{J}. \tag{5}
\]
Constraints $A_{i,j}\mu_i = A_{j,i}\mu_j$ are associated with each factor-edge $ij \in \mathbb{J}$ and are called coupling constraints. When representing the local polytope relaxation (4) as (5) we assume $\mathbb{F} = V \cup E$ and $\mathbb{J} = \{\{v, uv\} : uv \in E\}$. The convex hull of $\mathcal{X}_v$ is fully defined by the first line of constraints in (4), since $\mathcal{X}_v$ constitutes a set of unit binary vectors. The second line of constraints in (4) defines the coupling constraints.

We use variable names $\mu_i$ for (in general) non-binary vectors $\mu_i \in \operatorname{conv}(\mathcal{X}_i)$ and $x_i$ for binary ones, $x_i \in \mathcal{X}_i$, $i \in \mathbb{F}$.

3.1 Graph Matching Problem Relaxations.

Below, we describe three relaxations of the graph matching problem (2) which fit the IRPS-LP format (5). The first one results in a standard local polytope relaxation (4) of a specially constructed CRF, the second one utilizes additional coupling constraints on top of (4), while the third approach uses a network flow subproblem. Additionally, we use the inverse formulation (3) to build two additional IRPS-LPs.

(R1) Graph Matching as CRF.

To build a CRF equivalent to the graph matching problem we start with the underlying CRF as in (2) and express the uniqueness constraints in the edge factors. To this end we (i) extend the edge set with new edges $\hat{E} = \{uv : \mathcal{X}_u \cap \mathcal{X}_v \neq \emptyset\} \setminus E$ connecting any two nodes having at least one common label; (ii) assign edge potentials $\theta_{uv} \equiv 0$ to all new edges $uv \in \hat{E}$; (iii) for all $uv \in E \cup \hat{E}$ we assign $\theta_{uv}(s,s) = \infty$ for $s \in \mathcal{X}_u \cap \mathcal{X}_v$. Any solution of the resulting CRF (1) with finite cost is an assignment. The relaxation in terms of an IRPS-LP is the local polytope (4).
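Construction (R1) can be sketched as follows, assuming a dictionary-based problem representation (illustrative names): every pair of nodes sharing a label receives an edge whose cost is infinite exactly on the repeated labels.

```python
INF = float('inf')

def add_uniqueness_edges(nodes, labels, edges):
    """Sketch of construction (R1): connect every pair of nodes sharing at
    least one label and forbid taking a common label twice via infinite
    edge costs. `edges` maps (u, v) -> {(s, t): cost}."""
    extended = {e: dict(c) for e, c in edges.items()}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            common = set(labels[u]) & set(labels[v])
            if not common:
                continue
            costs = extended.setdefault(
                (u, v), {(s, t): 0.0 for s in labels[u] for t in labels[v]})
            for l in common:
                costs[(l, l)] = INF   # uniqueness: label l at most once
    return extended

ext = add_uniqueness_edges([0, 1], {0: ['a', 'b'], 1: ['a', 'b']}, {})
print(ext[(0, 1)][('a', 'a')])  # inf
```

Note the quadratic blow-up mentioned above: for a complete label universe this adds an edge for every pair of nodes.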

This approach results in general in a quadratic number of additional edge potentials, which may become intractable as the size of the graph matching problem grows.

(R2) Relaxation with Label Factors.

For each label $l \in \mathcal{L}$ we introduce an additional label factor, which keeps track of the nodes which are assigned label $l$. The label set of this factor consists of those nodes which can be assigned label $l$ and an additional dummy node $\#$ representing non-assignment of label $l$. Label $\#$ is necessary, as not every label needs to be taken. The set of factors becomes $\mathbb{F} = V \cup E \cup \mathcal{L}$, with the coupling constraint set $\mathbb{J} = \{\{v, uv\} : uv \in E\} \cup \{\{v, l\} : v \in V,\ l \in \mathcal{X}_v\}$. The resulting IRPS-LP formulation reads
\[
\min_{\mu} \ \sum_{v \in V} \langle \theta_v, \mu_v \rangle + \sum_{uv \in E} \langle \theta_{uv}, \mu_{uv} \rangle + \sum_{l \in \mathcal{L}} \langle \theta_l, \mu_l \rangle
\quad \text{s.t.} \quad \mu \in \Lambda_G, \quad \mu_l \in \operatorname{conv}(\mathcal{X}_l) \ \forall l \in \mathcal{L}, \quad \mu_v(l) = \mu_l(v) \ \forall v \in V,\ l \in \mathcal{X}_v. \tag{R2}
\]
Here we introduced additional potentials $\theta_l$, $l \in \mathcal{L}$, for the label factors. Initially, we set $\theta_l \equiv 0$.

(R3) Relaxation with a Network Flow Factor.

If one ignores the edge potentials in (2), the problem can be equivalently reformulated as bipartite matching [6]:
\[
\min_{\mu \ge 0} \ \sum_{v \in V} \langle \theta_v, \mu_v \rangle
\quad \text{s.t.} \quad \sum_{s \in \mathcal{X}_v} \mu_v(s) = 1 \ \ \forall v \in V, \qquad \sum_{v\,:\, l \in \mathcal{X}_v} \mu_v(l) \le 1 \ \ \forall l \in \mathcal{L}. \tag{6}
\]
Here we substituted the uniqueness constraints with the linear inequalities $\sum_{v : l \in \mathcal{X}_v} \mu_v(l) \le 1$, which is equivalent for binary $\mu$. It is known that the feasible set of (6) is the convex hull of all binary vectors satisfying its constraints [6]. Therefore (6) fits into the IRPS-LP framework. Crucially for an efficient implementation, (6) can be efficiently solved by minimum cost flow solvers [6].

Below we treat (6) as a separate matching factor $\mathcal{M}$ and link it with (4) to obtain an IRPS-LP. Its factor graph is defined by $\mathbb{F} = V \cup E \cup \{\mathcal{M}\}$ and $\mathbb{J} = \{\{v, uv\} : uv \in E\} \cup \{\{v, \mathcal{M}\} : v \in V\}$. The resulting IRPS-LP formulation is
\[
\min_{\mu} \ \sum_{v \in V} \langle \theta_v, \mu_v \rangle + \sum_{uv \in E} \langle \theta_{uv}, \mu_{uv} \rangle + \langle \theta_{\mathcal{M}}, \mu_{\mathcal{M}} \rangle
\quad \text{s.t.} \quad \mu \in \Lambda_G, \quad \mu_{\mathcal{M}} \in \operatorname{conv}(\mathcal{X}_{\mathcal{M}}), \quad \mu_v = (\mu_{\mathcal{M}})_v \ \forall v \in V. \tag{R3}
\]
Initially, we set $\theta_{\mathcal{M}} \equiv 0$.

Representation (6) for the uniqueness constraints has already been used, e.g., in [19]. However, their optimization technique lacks both convergence guarantees and monotonicity of the lower bound, which our methods possess. The work [60] considered the Lagrange dual of (R3) as a relaxation of the graph matching problem. Their relaxation is equivalent to (R3), but their algorithm differs from ours. We refer to Section 5 for a discussion of the differences and to Section 6 for an experimental comparison.
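For intuition, the unary-only subproblem (6), restricted to the square permutation case, can be solved by plain enumeration as below (illustrative names); a real implementation would call a min-cost-flow or Hungarian solver instead.

```python
from itertools import permutations

def linear_assignment(theta):
    """Brute-force solution of the unary-only problem (6) for the square
    case, where every node takes exactly one label and vice versa.
    theta: dict node -> {label: cost}, all nodes sharing the same labels."""
    nodes = sorted(theta)
    labels = sorted(next(iter(theta.values())))
    best, best_x = float('inf'), None
    for perm in permutations(labels):
        cost = sum(theta[v][l] for v, l in zip(nodes, perm))
        if cost < best:
            best, best_x = cost, dict(zip(nodes, perm))
    return best, best_x

theta = {0: {'a': 1.0, 'b': 0.0}, 1: {'a': 0.0, 'b': 2.0}}
print(linear_assignment(theta))  # (0.0, {0: 'b', 1: 'a'})
```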

(R4-R5) Coupling Original Graph Matching (2) and its Inverse (3).

In the special case when $|V| = |\mathcal{L}|$ we may solve the inverse graph matching problem (3) instead of the original one (2). Another alternative is to solve both problems simultaneously and couple them together by requiring that the labeling of (2) is the inverse permutation of the labeling from (3). Such an approach doubles the problem size, yet it may reduce the number of iterations required to reach convergence. This approach works for both relaxations (R2) and (R3).

The resulting coupled IRPS-LP for (R2) reads

(R4)

Here the role of the label factors in (R2) has been taken over by the node factors of the inverse graph matching problem (3). Initially, we distribute the costs equally among the original and the inverse problem.

Another coupled IRPS-LP, corresponding to (R3) reads

(R5)

Here the network flow factor controls consistency of the original and inverse labelings. Initially, we set the network flow costs to zero and distribute the remaining costs equally among the original and the inverse problem.

The optimal values obtained by relaxations (R1) – (R5) may deliver differing bounds to (2), as characterized below.

Proposition 1.

(R2) = (R3) and (R4) = (R5). Relaxation (R1) is weaker than (R2) and (R3).

4 General Algorithm

In this section we define a general algorithm for IRPS-LP problems (5), which is applicable to the decompositions (R1)–(R5) of the graph matching problem considered in Section 3.1. Our algorithm is a simplified version of the algorithm [49], where we fixed several parameters to the values common to the relaxations (R1)–(R5).

Instead of optimizing the IRPS-LP (5) directly, we consider its Lagrangean dual w.r.t. the coupling constraints. The IRPS-LP problem (5) can be written shortly as $\min_{\mu \in \Pi} \langle \theta, \mu \rangle$ s.t. $A\mu = 0$, where $\theta$ stands for $(\theta_i)_{i \in \mathbb{F}}$, $A\mu = 0$ represents all coupling constraints and $\Pi$ denotes the polytope encapsulating the rest of the constraints. By dualizing $A\mu = 0$ with a vector $\phi$ of Lagrange multipliers one obtains the Lagrange function $L(\mu, \phi) = \langle \theta, \mu \rangle + \langle \phi, A\mu \rangle = \langle \theta + A^{\top}\phi, \mu \rangle$. After introducing $\theta^{\phi} := \theta + A^{\top}\phi$, the dual objective reads $D(\phi) = \min_{\mu \in \Pi} \langle \theta^{\phi}, \mu \rangle$. It is well-known [13] that $D(\phi) \le \langle \theta, \mu \rangle$ for any feasible $\mu$ and any $\phi$, and the dual problem consists in maximizing $D(\phi)$ over $\phi$. Going from $\theta$ to $\theta^{\phi}$ is called an equivalent transformation or reparametrization in the literature. In the CRF literature it is also known as message passing.

Now we apply the above considerations to the general IRPS-LP problem (5). Specifically, let $i, j$ be two neighboring factors in the factor graph, i.e. $ij \in \mathbb{J}$. Then for any $\mu_i$ and $\mu_j$ satisfying the coupling constraint for edge $ij$ it holds that $\langle \phi_{ij}, A_{i,j}\mu_i - A_{j,i}\mu_j \rangle = 0$, hence shifting costs between $\theta_i$ and $\theta_j$ along $\phi_{ij}$ does not change the cost of any feasible solution.

The values and signs of the Lagrange multipliers $\phi_{ij}$ define how much cost is "sent" from $i$ to $j$ or the other way around. When we consider a subset $J$ of the factor-edges incident to $i$, the resulting equivalent transformation reads:
\[
\theta_i^{\phi} = \theta_i - \sum_{ij \in J} A_{i,j}^{\top}\phi_{ij}, \qquad \theta_j^{\phi} = \theta_j + A_{j,i}^{\top}\phi_{ij}, \ \ ij \in J. \tag{7}
\]
We are interested in messages $\phi$ which improve the dual. Below we define a subclass of such messages for the same setting as in (7):

Definition 1.

Messages $\phi_{ij}$, $ij \in J$, are called admissible, if there exists $x^* \in \operatorname*{argmin}_{x \in \mathcal{X}_i} \langle \theta_i, x \rangle$ such that additionally
\[
x^* \in \operatorname*{argmin}_{x \in \mathcal{X}_i} \langle \theta_i^{\phi}, x \rangle. \tag{8}
\]
We denote the set of admissible message vectors $\phi$ by $\mathcal{A}$.

Lemma 1 ([49]).

Admissible messages do not decrease the dual value, i.e., admissibility of $\phi$ implies that the dual objective after the reparametrization (7) is greater than or equal to the dual objective before.

Example 1.

Let us apply Definition 1 to the local polytope relaxation (4) of CRFs. Let $i$ correspond to a node $v$ and $j$ to $uv$, where $uv$ is any edge incident to $v$, and let $J = \{\{v, uv\}\}$. Then $x^*$ corresponds to a locally optimal label $s^* \in \operatorname*{argmin}_{s \in \mathcal{X}_v} \theta_v(s)$. Therefore we may assign $\phi(s)$ any value from $[0,\ \theta_v(s) - \theta_v(s^*)]$. This assures that (8) is fulfilled and $s^*$ remains a locally optimal label after reparametrization, even if there are multiple optima in $\operatorname*{argmin}_{s} \theta_v(s)$.

Sending Messages.

Procedure 1 represents an elementary step of our optimization algorithm. It consists of sending messages from a node to a subset of its neighbors .

1 Optimize factor: $x^* \in \operatorname*{argmin}_{x \in \mathcal{X}_i} \langle \hat\theta_i, x \rangle$
2 Choose the costs $\hat\theta_i$ (see text)
3 Maximize admissible messages to $J$:
\[
\phi \in \operatorname*{argmax}_{\phi \in \mathcal{A}} \ \sum_{ij \in J} \langle \mathbf{1}, \phi_{ij} \rangle \tag{9}
\]
4 Update $\theta_i$ and $\theta_j$, $ij \in J$, according to (7)
Procedure 1 Send messages from $i$ to $J$

Procedure 1 first computes an optimal labeling $x^*$ for the factor $i$ in line 1, then computes the message updates in (9) and finally updates the costs in line 4. The costs $\hat\theta_i$ in line 1 are chosen as the current reparametrized costs $\theta_i$, except when $i$ is the network flow factor for (R3) and (R5); in this case a modified cost vector is chosen, detailed in the appendix.

Computation (9) provides a maximally possible admissible message from $i$ to $J$. Essentially, it makes the cost vector of the factor $i$ as uniform as possible. So, in the setting of Example 1, $\theta_v^{\phi}$ becomes constant and therefore every label of $v$ is locally optimal after the update. Since the result of (9) is an admissible message, Procedure 1 never decreases the dual objective, as follows from Lemma 1.
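The effect of such a maximal message can be reproduced on a toy node-edge pair (an illustrative sketch, not the authors' implementation): shifting the node's excess costs into the edge keeps every labeling's total cost unchanged, while the sum of independent factor minima, i.e. the dual lower bound, can only grow.

```python
def dual_bound(theta_v, theta_e):
    """Lower bound of the decomposition: each factor minimized independently."""
    return min(theta_v.values()) + min(theta_e.values())

def send_message(theta_v, theta_e):
    """Maximal admissible message from a node factor to an incident edge
    factor: shift theta_v(s) - min theta_v into row s of the edge,
    leaving the node costs as uniform as possible."""
    phi = {s: c - min(theta_v.values()) for s, c in theta_v.items()}
    new_v = {s: c - phi[s] for s, c in theta_v.items()}
    new_e = {(s, t): c + phi[s] for (s, t), c in theta_e.items()}
    return new_v, new_e

v0 = {'a': 3.0, 'b': 0.0}
e0 = {('a', 'x'): 0.0, ('a', 'y'): 1.0, ('b', 'x'): 2.0, ('b', 'y'): 5.0}
v1, e1 = send_message(v0, e0)
print(dual_bound(v0, e0), dual_bound(v1, e1))  # 0.0 2.0
```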

Dual Ascent Algorithm.

Let the notation $(i_1, \dots, i_n)$ stand for an ordered set of factors such that $i_k \in \mathbb{F}$ and $i_k \neq i_{k'}$ for $k \neq k'$. Algorithm 2 below goes over some of the factors in a pre-specified order and calls Procedure 1 to send or receive messages to/from some of the neighbors.

1 Input: ordered factor set $(i_1, \dots, i_n)$, neighborhoods $J^{\text{rec}}_i$, $J^{\text{send}}_i$ for each factor $i$
2 for $t = 1, 2, \dots$ do
3       for $i = i_1, \dots, i_n$ do
4             Receive messages:
5             for $j \in J^{\text{rec}}_i$ do
6                   Call Procedure 1 with input $(j, \{i\})$.
7             end for
8            Send messages:
9             Call Procedure 1 with input $(i, J^{\text{send}}_i)$.
10       end for
11      Reverse the order of $(i_1, \dots, i_n)$ and exchange $J^{\text{rec}}_i \leftrightarrow J^{\text{send}}_i$
12 end for
Algorithm 2 Dual Ascent for IRPS-LP

Algorithm 2 works as follows: We choose an ordered subset of factors. For each factor we select a neighborhood of factors from which to receive messages and a neighborhood to which messages are sent by Procedure 1. We run Algorithm 2 in the given order (forward direction) and the reversed order (backward direction) alternatingly until some stopping condition is met. Since Algorithm 2 reparametrizes the problem via Procedure 1 only, and the latter is guaranteed not to decrease the dual, neither does Algorithm 2. We refer to [49] for further theoretical properties of Algorithm 2.
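The forward/backward sweeps can be mimicked on a toy chain model (a simplified sketch with illustrative names, not the paper's implementation): each node receives min-marginals from one incident edge and pushes its excess costs into the other, and the lower bound, the sum of independent factor minima, never decreases.

```python
def bound(nodes, edges):
    """Dual lower bound: every factor minimized independently."""
    return sum(min(c.values()) for c in nodes.values()) + \
           sum(min(c.values()) for c in edges.values())

def receive(node, edge, side):
    """Move the edge's min-marginals w.r.t. `node` into the node factor."""
    for s in list(node):
        m = min(c for key, c in edge.items() if key[side] == s)
        node[s] += m
        for key in edge:
            if key[side] == s:
                edge[key] -= m

def send(node, edge, side):
    """Push everything above the node's minimum into the edge factor."""
    base = min(node.values())
    for s in list(node):
        phi, node[s] = node[s] - base, base
        for key in edge:
            if key[side] == s:
                edge[key] += phi

# Chain 0 - 1 - 2 with two labels per node.
nodes = {0: {'a': 1.0, 'b': 0.0},
         1: {'a': 0.0, 'b': 2.0},
         2: {'a': 3.0, 'b': 0.0}}
edges = {(0, 1): {('a', 'a'): 0.0, ('a', 'b'): 2.0,
                  ('b', 'a'): 1.0, ('b', 'b'): 0.0},
         (1, 2): {('a', 'a'): 1.0, ('a', 'b'): 0.0,
                  ('b', 'a'): 0.0, ('b', 'b'): 2.0}}

bounds = [bound(nodes, edges)]
for _ in range(3):
    for i in [0, 1, 2]:                       # forward pass
        if (i - 1, i) in edges: receive(nodes[i], edges[(i - 1, i)], side=1)
        if (i, i + 1) in edges: send(nodes[i], edges[(i, i + 1)], side=0)
    bounds.append(bound(nodes, edges))
    for i in [2, 1, 0]:                       # backward pass
        if (i, i + 1) in edges: receive(nodes[i], edges[(i, i + 1)], side=0)
        if (i - 1, i) in edges: send(nodes[i], edges[(i - 1, i)], side=1)
    bounds.append(bound(nodes, edges))
print(bounds)
```

On this toy chain the bound rises from 0 to the optimal energy 1 after the first forward pass and then stays there; in the full algorithm the same monotonicity follows from Lemma 1.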

5 Graph Matching Algorithms

For each of the relaxations (R1)-(R5) of the graph matching problem we detail the parameters of Algorithm 2 used in our experiments: the ordered factor set and the receiving and sending neighborhoods of each factor.

Algorithm Names.

We use the following shortcuts for specializations of Algorithm 2 to the relaxations (R1)-(R5): GM corresponds to (R1), AMP to (R2), AMCF to (R3). To obtain the relaxations (R1-R3) we use either the original graph, as in (2), or an inverse one, as in (3). These options are denoted by suffixes -O and -I respectively. Additionally, the two coupled relaxations (R4) and (R5), are addressed by algorithms AMP-C and AMCF-C respectively. All in all, we have eight algorithms GM-O, GM-I, AMP-O, AMP-I, AMP-C, AMCF-O, AMCF-I and AMCF-C.

The ordered factor set and the receiving and sending neighborhoods are defined in Table 1. For algorithms with the suffix -I the values are the same as for those with -O, but correspond to the inverse graph.

We assume the order of graph nodes and labels to be given a priori. This induces an order involving the matching factor and the edge factors, and similarly an order for all edge factors in the inverse graph. We extend the resulting partial order to a total one, e.g., by topological sort. For each factor we define the sets of preceding and subsequent factors w.r.t. this order.

Sending a message by some factor automatically implies receiving this message by another, coupled factor. Therefore, there is no need to go over all factors in Algorithm 2. In particular, edge factors are coupled to node factors only, therefore processing all node factors in Algorithm 2 automatically updates all edge factors as well. In the processing order and the selection of the receiving and sending neighborhoods we follow the most efficient MAP-solvers TRW-S [32] and SRMP [33] (the latter is a generalization of TRW-S to higher order models and has a slightly different implementation for pairwise CRFs (1)). In the special case when all nodes contain disjoint subsets of labels, the graph matching problem (2) turns into MAP-inference in CRFs (1). Then all our algorithms GM, AMP and AMCF reduce to SRMP [33].

It is worth mentioning that for CRFs there exist algorithms, such as MPLP [23], which go over edge factors only and in this way implicitly process node factors as well. As empirically shown in SRMP [33], MPLP is usually slower than SRMP. In Section 6 we show that our methods also compare favorably to the recently proposed HBP [60], which is similar to AMCF-O but uses an MPLP-like processing schedule.

Optimization Subproblems of Procedure 1.

For each call of Procedure 1 one must find the best factor element in line 1 and compute the best messages by solving (9). The first subproblem is solved by explicitly scanning all elements of the factor for node, edge and label factors; for optimizing over the matching factor, we use a min-cost-flow solver. Solving (9) for all choices of factors and neighborhoods is possible through closed-form solutions or calling a minimum cost flow solver, and is described in the appendix.

Algorithm Ordered set
GM-O
AMP-O
AMCF-O
AMP-C
AMCF-C
Table 1: Input sets for specializations of Algorithm 2. For algorithms with the suffix -I the sets are the same as for those with -O, but correspond to the inverse graph.

Primal Rounding.

Algorithm 2 only provides lower bounds for the original problem (2). To obtain a primal solution one may ignore the edge potentials and solve the resulting reparametrized bipartite matching problem (6) with a minimum cost flow solver, as done in [60]. Empirically we found that it is better to interleave rounding and message passing, similarly as in TRW-S [32] and SRMP [33]. Assume we have already computed a primal integer solution $x_u$ for all nodes $u$ preceding $v$ in the processing order and we want to compute $x_v$. To this end, between lines 4 and 5 of Algorithm 2 for $v$ we assign
\[
x_v \in \operatorname*{argmin}_{s \in \mathcal{X}_v} \Big( \theta^{\phi}_v(s) + \sum_{u \in \mathcal{N}_G(v)\,:\, u \prec v} \theta^{\phi}_{uv}(x_u, s) \Big). \tag{10}
\]
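A greedy rounding in the spirit of (10) can be sketched as follows (illustrative names; the paper's scheme uses the reparametrized costs): nodes are fixed in the processing order, each taking the cheapest label consistent with its already-fixed neighbours and not yet used elsewhere.

```python
def round_primal(order, node_costs, edge_costs, labels):
    """Fix nodes in the processing order, choosing the cheapest label
    consistent with already-fixed nodes while respecting uniqueness."""
    x, used = {}, set()
    for v in order:
        def score(l):
            c = node_costs[v][l]
            for u, lu in x.items():   # pairwise cost to already fixed nodes
                c += edge_costs.get((u, v), {}).get((lu, l), 0.0)
            return c
        free = [l for l in labels[v] if l not in used]
        best = min(free, key=score)
        x[v] = best
        used.add(best)
    return x

node_costs = {0: {'a': 0.0, 'b': 5.0}, 1: {'a': 0.0, 'b': 1.0}}
edge_costs = {}
print(round_primal([0, 1], node_costs, edge_costs,
                   {0: ['a', 'b'], 1: ['a', 'b']}))  # {0: 'a', 1: 'b'}
```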

Time complexity

The time complexity per iteration of GM is dominated by the message passing over node and edge factors. For AMP we must add the cost of processing the label factors, and for AMCF the time to solve (6) with a min-cost-flow solver. Details and speedups are in the appendix.

Higher Order Extensions.

Our approach is straightforwardly extendable to the graph matching problem with higher order factors, a special case being third order: Let $T \subseteq \binom{V}{3}$ be a subset of triplets of nodes and $\theta_{uvw}\colon \mathcal{X}_{uvw} \to \mathbb{R}$, $uvw \in T$, be the corresponding triplet potentials. The corresponding third order graph matching problem reads
\[
\min_{x \in \mathcal{X}_V} \ E(x) + \sum_{uvw \in T} \theta_{uvw}(x_u, x_v, x_w) \quad \text{s.t.} \quad x_u \neq x_v \ \ \forall u \neq v. \tag{11}
\]
The associated IRPS-LP can be constructed by including additional factors for all triplets in an analogous fashion as in (4), see e.g. [55] for the corresponding relaxation.

For relaxations (R1) – (R5) we use third order factors to enforce cycle inequalities, which we add in a cutting plane approach as in [47]. For this we set $\theta_{uvw} \equiv 0$ at the beginning. By this construction (11) is equivalent to (2); however, the corresponding IRPS-LPs are not: triplet potentials make the relaxation tighter.

6 Experiments

Algorithms.

We compare against the two Lagrangean decomposition based solvers [51, 60] described in Section 1.

  • The dual decomposition solver DD [51]. We use large local subproblems. Note that the comparison in [60] was made with smaller subproblems, hence DD's relaxation was weaker there.

  • "Hungarian belief propagation" HBP [60]. In [60], a branch-and-bound solver is used on top of the dual ascent solver. For a fair comparison, our reimplementation uses only the dual ascent component. As for AMP and AMCF, we append to HBP the suffixes -O and -C to denote the relaxations we let HBP run on.

According to [51, 60], these two algorithms outperformed competitors [24, 37, 18, 19, 57, 41, 50, 45, 38, 15, 63, 26] at the time of their publication, hence we do not compare against the latter ones.

We set a hard iteration limit for each algorithm, exiting earlier when the primal/dual gap vanishes or no dual progress is observed anymore. We periodically compute primal solutions in our algorithms. For GM, AMP, AMCF and HBP we use the tightening extension discussed in Section 5 to improve the dual lower bound. We tighten our relaxation whenever no dual progress occurs.

Datasets.

dataset #I #V #L C
house 105 30 30 dense
hotel 105 30 30 dense
car 30 19-49 19-49 dense
motor 20 15-52 15-52 dense
graph flow 6 60-126 2-126 sparse
worms 30 20-60 sparse
Table 2: Dataset description. #I denotes the number of instances, #V the number of nodes, #L the number of labels a node can be matched to, and C the connectivity of the graph.

Figure 1: Plots showing convergence over time for the house, hotel, car, motor, graph flow and worms datasets. Values are averaged over all instances of the dataset. The x-axis and y-axis are logarithmic.

We have compared on six datasets:

  • house [3] and hotel [2] with costs as in [51]. The task is to find matching feature points between images capturing an object from different viewpoints.

  • car and motor, both used in [39], containing pairs of cars and motorbikes with keypoints to be matched. The images are taken from the VOC PASCAL 2007 challenge [20]. Costs are computed from features as in [39].

  • The graph flow dataset [1] comes from a tracking problem with large displacements [7]. Keypoints in frames of RGB-D images obtained by a Kinect camera [59] are matched. The depth information provided by the Kinect camera is taken into account when computing the potentials.

  • The worms dataset [27] from bioimaging. The goal is to annotate nuclei of C. elegans, a famous model organism used in developmental biology, by finding the corresponding nuclei in a precomputed model. The instances of worms are, to our knowledge, the largest graph matching datasets ever investigated in the literature.

Whereas the first datasets are publicly available, the worms dataset was kindly given to us by the authors of [27] and will be published. A summary of dataset characteristics can be found in Table 2. Previous computational studies concentrated on small-scale problems, whereas the included worms dataset contains significantly larger instances.

Results.

Fig. 1 shows the performance of the algorithms on all considered datasets. Among the variants -O, -I, -C, corresponding respectively to the original, inverse and coupled formulations, we plot only the best one. As expected, for dense graphs (datasets house, hotel, car, motor) the variant -C with coupling provided the most robust convergence, being similar to the best of -O and -I, and is therefore the one presented in the plots. For sparse graphs, the inverse representation becomes too expensive, as the inverse edge set in (3) may be dense even though the original edge set is sparse. Therefore we stick to the original formulation -O.

  • hotel and house are easy datasets, and many instances needed only few iterations to converge. AMP, AMCF and DD were able to solve all instances to optimality within a few seconds or even faster. However, DD is the fastest method for this data.

  • car and motor were already harder and the iteration limit did not allow us to ascertain optimality for all instances. AMP significantly outperforms its competitors, DD is significantly slower than the rest, whereas the other algorithms show comparable results.

  • On worms, AMP again significantly outperforms its competitors. AMCF and HBP converge to a similar duality gap, although AMCF does so one to two orders of magnitude faster; GM and DD return results which are hardly competitive.

  • graph flow is the only dataset where DD clearly overcomes all competitors, followed by AMP. We attribute this to DD's tighter relaxation: its "local" subproblems contain many variables, whereas our subproblems have at most 3 variables after tightening.

Insights and Conclusions

  • AMP shows the overall best performance for both small dense and large sparse datasets. It is the best anytime solver: it has the best performance in the first iterations. This is beneficial (i) if the run-time is limited or (ii) in branch-and-bound procedures, where good early progress helps to eliminate non-optimal branches fast.

  • Although AMP, AMCF and HBP address equivalent relaxations (having the same maximal dual value) their convergence speed is different. AMCF and HBP are generally slower than AMP, which we attribute to the suboptimal redistribution of the costs by the min-cost-flow factors when maximizing messages in (9).

  • DD's relatively good performance is probably due to the large subproblems used by this method. First, this decreases the number of dual variables, which accelerates convergence of the bound; second, this makes the relaxation tighter, which decreases the duality gap. We attribute the slow convergence of DD to the subgradient method.

  • Summarizing, larger subproblems are profitable for the subgradient method, but not for message passing.

  • AMCF outperforms HBP due to better message scheduling.

  • We attribute the inferior performance of GM mostly to the weak relaxation it optimizes. Even so, thanks to good message scheduling and fast message passing, it outperforms DD and HBP on several datasets.
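The contrast between subgradient updates (as in DD) and min-marginal averaging (as in message passing) can be illustrated on a minimal Lagrangean decomposition with a single duplicated binary variable. This is only a sketch: the potentials, function names and step size below are hypothetical, chosen for illustration, and do not reproduce any solver from the paper.

```python
# Toy decomposition of min_x theta_A(x) + theta_B(x) over x in {0, 1}:
# duplicate x into x_A, x_B and dualize the constraint x_A = x_B with a
# Lagrange multiplier lam.  The dual lower bound is
#   g(lam) = min_x [theta_A(x) + lam*x] + min_x [theta_B(x) - lam*x].
theta_A = {0: 0.0, 1: -2.0}
theta_B = {0: -1.0, 1: 0.0}

def dual(lam):
    return min(theta_A[x] + lam * x for x in (0, 1)) \
         + min(theta_B[x] - lam * x for x in (0, 1))

def subgradient_step(lam, step):
    """DD-style update: move lam toward agreement of the two minimizers."""
    xA = min((0, 1), key=lambda x: theta_A[x] + lam * x)
    xB = min((0, 1), key=lambda x: theta_B[x] - lam * x)
    return lam + step * (xA - xB)

def min_marginal_averaging(lam):
    """Message-passing-style update: equalize the reparametrized
    min-marginal differences of the two subproblems."""
    mA = (theta_A[1] + lam) - theta_A[0]
    mB = (theta_B[1] - lam) - theta_B[0]
    return lam - (mA - mB) / 2.0

lam = min_marginal_averaging(0.0)   # one averaging step suffices here
print(lam, dual(lam))               # 1.5 -2.0, the optimal dual value
```

With these potentials the primal optimum is -2 (take x = 1) and a single averaging step already closes the duality gap, whereas a subgradient method generally needs a diminishing step size to converge; this mirrors the different convergence behavior of the two update types discussed above.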

A detailed evaluation of all instances is in the appendix.

7 Acknowledgments

The authors would like to thank Vladimir Kolmogorov for helpful discussions. This work is partially funded by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 616160.

References

  • [1] Graph flow dataset. http://cvlab-dresden.de/research/iamge-matching/graphflow/.
  • [2] Hotel dataset. http://vasc.ri.cmu.edu//idb/html/motion/hotel/.
  • [3] House dataset. http://vasc.ri.cmu.edu/idb/html/motion/house/.
  • [4] K. Adamczewski, Y. Suh, and K. M. Lee. Discrete tabu search for graph matching. In International Conference on Computer Vision, 2015.
  • [5] W. P. Adams and T. A. Johnson. Improved linear programming-based lower bounds for the quadratic assignment problem. DIMACS series in discrete mathematics and theoretical computer science, 16:43–75, 1994.
  • [6] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1 edition, Feb. 1993.
  • [7] H. A. Alhaija, A. Sellent, D. Kondermann, and C. Rother. Graphflow - 6d large displacement scene flow via graph matching. In J. Gall, P. V. Gehler, and B. Leibe, editors, GCPR, volume 9358 of Lecture Notes in Computer Science, pages 285–296. Springer, 2015.
  • [8] K. M. Anstreicher and N. W. Brixius. A new bound for the quadratic assignment problem based on convex quadratic programming. Mathematical Programming, 89(3):341–357, 2001.
  • [9] K. M. Anstreicher and N. W. Brixius. Solving quadratic assignment problems using convex quadratic programming relaxations. Optimization Methods and Software, 16(1-4):49–68, 2001.
  • [10] M. Bazaraa and H. Sherali. New approaches for solving the quadratic assignment problem. In Operations Research Verfahren, volume 32, pages 29–46, 1979.
  • [11] M. S. Bazaraa and H. D. Sherali. On the use of exact and heuristic cutting plane methods for the quadratic assignment problem. Journal of the Operational Research Society, 33(11):991–1003, 1982.
  • [12] M. Beckman and T. Koopmans. Assignment problems and the location of economic activities. Econometrica, 25:53–76, 1957.
  • [13] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [14] R. E. Burkard, E. Cela, P. M. Pardalos, and L. S. Pitsoulis. The quadratic assignment problem. In Handbook of combinatorial optimization, pages 1713–1809. Springer US, 1998.
  • [15] M. Cho, J. Lee, and K. M. Lee. Reweighted random walks for graph matching. In K. Daniilidis, P. Maragos, and N. Paragios, editors, ECCV (5), volume 6315 of Lecture Notes in Computer Science, pages 492–505. Springer, 2010.
  • [16] M. Cho, J. Sun, O. Duchenne, and J. Ponce. Finding matches in a haystack: A max-pooling strategy for graph matching in the presence of outliers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2083–2090, 2014.
  • [17] D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. IJPRAI, 18(3):265–298, 2004.
  • [18] T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. In NIPS. MIT Press, Cambridge, MA, 2007.
  • [19] J. Duchi, D. Tarlow, G. Elidan, and D. Koller. Using combinatorial optimization within max-product belief propagation. In NIPS, pages 369–376. MIT Press, 2006.
  • [20] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, June 2010.
  • [21] A. Frieze and J. Yadegar. On the quadratic assignment problem. Discrete applied mathematics, 5(1):89–98, 1983.
  • [22] J. W. Gavett and N. V. Plyter. The optimal assignment of facilities to locations by branch and bound. Operations Research, 14(2):210–232, 1966.
  • [23] A. Globerson and T. S. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, pages 553–560, 2007.
  • [24] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Trans. Pattern Anal. Mach. Intell., 18(4):377–388, 1996.
  • [25] P. Hahn, T. Grant, and N. Hall. A branch-and-bound algorithm for the quadratic assignment problem based on the hungarian method. European Journal of Operational Research, 108(3):629–640, 1998.
  • [26] B. Jiang, J. Tang, C. Ding, and B. Luo. A local sparse model for matching problem. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, pages 3790–3796. AAAI Press, 2015.
  • [27] D. Kainmueller, F. Jug, C. Rother, and G. Myers. Active graph matching for automatic joint segmentation and annotation of C. elegans. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 81–88. Springer, 2014.
  • [28] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother. A comparative study of modern inference techniques for structured discrete energy minimization problems. International Journal of Computer Vision, 115(2):155–184, 2015.
  • [29] J. H. Kappes, B. Savchynskyy, and C. Schnörr. A bundle approach to efficient MAP-inference by Lagrangian relaxation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1688–1695. IEEE, 2012.
  • [30] S. E. Karisch and F. Rendl. Lower bounds for the quadratic assignment problem via triangle decompositions. Mathematical Programming, 71(2):137–151, 1995.
  • [31] K. C. Kiwiel. Proximity control in bundle methods for convex nondifferentiable minimization. Mathematical programming, 46(1-3):105–122, 1990.
  • [32] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Mach. Intell., 28(10):1568–1583, 2006.
  • [33] V. Kolmogorov. A new look at reweighted message passing. IEEE Trans. Pattern Anal. Mach. Intell., 37(5):919–930, 2015.
  • [34] B. Korte, J. Vygen, B. Korte, and J. Vygen. Combinatorial optimization, volume 2. Springer, 2012.
  • [35] E. L. Lawler et al. The quadratic assignment problem. Management Science, 9(4):586–599, 1963.
  • [36] J. Lee, M. Cho, and K. M. Lee. A graph matching algorithm using data-driven Markov chain Monte Carlo sampling. In Pattern Recognition (ICPR), 2010 20th International Conference on, pages 2816–2819. IEEE, 2010.
  • [37] M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In ICCV, pages 1482–1489. IEEE Computer Society, 2005.
  • [38] M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and MAP inference. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, NIPS, pages 1114–1122. Curran Associates, Inc., 2009.
  • [39] M. Leordeanu, R. Sukthankar, and M. Hebert. Unsupervised learning for graph matching. International Journal of Computer Vision, 96(1):28–45, 2012.
  • [40] E. M. Loiola, N. M. M. de Abreu, P. O. Boaventura-netto, P. Hahn, and T. Querido. An analytical survey for the quadratic assignment problem. European Journal of Operational Research, pages 657–690, 2007.
  • [41] J. Maciel and J. Costeira. A global solution to sparse correspondence problems. IEEE Trans. Pattern Anal. Mach. Intell., 25(2):187–199, 2003.
  • [42] Q. Nguyen, A. Gautier, and M. Hein. A flexible tensor block coordinate ascent scheme for hypergraph matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5270–5278, 2015.
  • [43] P. M. Pardalos, K. A. Murthy, and T. P. Harrison. A computational comparison of local search heuristics for solving quadratic assignment problems. Informatica, 4(1-2):172–187, 1993.
  • [44] C. Ribeiro and M. Minoux. Solving hard constrained shortest path problems by Lagrangean relaxation and branch-and-bound algorithms. Methods of Operations Research, 53:303–316, 1986.
  • [45] C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. In EMMCVPR, volume 3757 of Lecture Notes in Computer Science, pages 171–186. Springer, 2005.
  • [46] M. I. Schlesinger. Syntactic analysis of two-dimensional visual signals in noisy conditions. Kibernetika, 4(113-130):1, 1976.
  • [47] D. Sontag, D. K. Choe, and Y. Li. Efficiently searching for frustrated cycles in map inference. In UAI, pages 795–804. AUAI Press, 2012.
  • [48] Y. Suh, M. Cho, and K. M. Lee. Graph matching via sequential monte carlo. In Computer Vision - ECCV 2012 - 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part III, pages 624–637, 2012.
  • [49] P. Swoboda, J. Kuske, and B. Savchynskyy. A dual ascent framework for Lagrangean decomposition of combinatorial problems. CoRR, 2016.
  • [50] P. H. S. Torr. Solving Markov random fields using semi definite programming. In In: AISTATS, 2003.
  • [51] L. Torresani, V. Kolmogorov, and C. Rother. A dual decomposition approach to feature correspondence. IEEE Trans. Pattern Anal. Mach. Intell., 35(2):259–271, 2013.
  • [52] S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Trans. Pattern Anal. Mach. Intell., 10(5):695–703, 1988.
  • [53] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.
  • [54] T. Werner. A linear programming approach to max-sum problem: A review. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(7):1165–1179, 2007.
  • [55] T. Werner. Revisiting the linear programming relaxation approach to Gibbs energy minimization and weighted constraint satisfaction. IEEE Trans. Pattern Anal. Mach. Intell., 32(8):1474–1488, 2010.
  • [56] S. J. Wright. Coordinate descent algorithms. Mathematical Programming, 151(1):3–34, 2015.
  • [57] J. Yarkony, C. C. Fowlkes, and A. T. Ihler. Covering trees and lower-bounds on quadratic assignment. In CVPR, pages 887–894. IEEE Computer Society, 2010.
  • [58] M. Zaslavskiy, F. Bach, and J.-P. Vert. A path following algorithm for the graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2227–2242, 2009.
  • [59] Z. Zhang. Microsoft kinect sensor and its effect. IEEE MultiMedia, 19(2):4–10, Apr. 2012.
  • [60] Z. Zhang, Q. Shi, J. McAuley, W. Wei, Y. Zhang, and A. van den Hengel. Pairwise matching through max-weight bipartite belief propagation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  • [61] G. Zhao, B. Luo, J. Tang, and J. Ma. Using Eigen-Decomposition Method for Weighted Graph Matching, pages 1283–1294. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
  • [62] Q. Zhao, S. E. Karisch, F. Rendl, and H. Wolkowicz. Semidefinite programming relaxations for the quadratic assignment problem. Journal of Combinatorial Optimization, 2(1):71–109, 1998.
  • [63] F. Zhou and F. D. la Torre. Factorized graph matching. In CVPR, pages 127–134. IEEE Computer Society, 2012.

8 Supplementary Material

8.1 Proofs

Proof of Proposition 1

8.2 Optimization Subproblems of Procedure 1

In Procedure 1, the two problems in lines 1 and 1 must be solved. The solution of the optimization problem in line 1 was discussed in the main part of the paper. Therefore it only remains to show how to carry out the optimization of the problem in line 1 efficiently for all cases that can occur. This is shown in Table 3.

Checking validity of the operations in Table 3 for and is straightforward. For and , we prove correctness below. Correctness for and is analogous.

Lemma 2.

The reparametrization adjustment problem (9) for and is given by (14). Moreover it is the dual of a minimum cost network flow problem.

Proof.

Recall from network flow theory [6] that a flow is optimal for a given cost function iff there exist node potentials such that

Consider the primal/dual pair

(12)

On the right-hand side, the adjustment problem (9) is written down explicitly. The left-hand side is a minimum cost flow problem, which proves the second part of the claim.

The last equality above on the right hand side ensures that . Substituting this everywhere on the right hand side of (12) gives

(13)

This form matches the format given in (14). ∎
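As a concrete illustration of the flow duality invoked in the proof, consider a tiny assignment instance; the costs and node potentials below are hypothetical, chosen by hand. By LP duality, the optimal assignment cost equals the best attainable sum of feasible node potentials, which correspond to the dual variables of the min-cost flow formulation.

```python
from itertools import permutations

# Toy 2x2 assignment: cost[i][j] of matching left node i to right node j.
cost = [[1.0, 3.0],
        [2.0, 1.0]]

# Primal: brute-force the optimal assignment.
best = min(permutations(range(2)),
           key=lambda p: sum(cost[i][p[i]] for i in range(2)))
primal = sum(cost[i][best[i]] for i in range(2))

# Dual (node potentials): u_i + v_j <= cost[i][j]; maximize sum(u) + sum(v).
# For this instance u = (1, 1), v = (0, 0) is feasible and tight on the
# optimal arcs, i.e. the used arcs have zero reduced cost, certifying
# optimality exactly as the condition recalled from [6] requires.
u, v = [1.0, 1.0], [0.0, 0.0]
assert all(u[i] + v[j] <= cost[i][j] + 1e-9 for i in range(2) for j in range(2))
dual = sum(u) + sum(v)
print(primal, dual)  # equal by LP duality: 2.0 2.0
```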

8.3 Time complexity

The time complexity of running one iteration of message passing is essentially the time to run all required invocations of Algorithm 1 via the routines described in Table 3. Total runtime per iteration for the various algorithms we have proposed can be found in Table 4. We assume that . In sparse assignment problems, where this is not the case, run-time decreases according to sparsity.

If we hold the unary potentials in a heap, we can support the operation required in the third line of Table 3 in logarithmic time, since only the sending or only the receiving side is modified.
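One possible realization of this heap-based bookkeeping is sketched below. It is our assumption, not the paper's implementation: the class name, the lazy-deletion strategy and the two-smallest query are illustrative. Retrieving the minimum and second minimum of the unary potentials is the kind of query min-marginal computations need, and each cost update takes O(log n):

```python
import heapq

class UnaryHeap:
    """Unary potentials in a min-heap with lazy deletion: updates push a
    fresh entry in O(log n); stale entries are skipped when queried."""

    def __init__(self, costs):
        self.cost = dict(costs)                       # label -> current cost
        self.heap = [(c, x) for x, c in costs.items()]
        heapq.heapify(self.heap)

    def update(self, label, new_cost):
        self.cost[label] = new_cost
        heapq.heappush(self.heap, (new_cost, label))  # O(log n)

    def two_smallest(self):
        # Pop stale entries until the two smallest live ones are found,
        # then push the live ones back to keep the heap intact.
        live = []
        while len(live) < 2:
            c, x = heapq.heappop(self.heap)
            if self.cost.get(x) == c and x not in [l for _, l in live]:
                live.append((c, x))
        for item in live:
            heapq.heappush(self.heap, item)
        return live

h = UnaryHeap({0: 3.0, 1: 1.0, 2: 2.0})
h.update(1, 5.0)                       # label 1 is no longer the minimum
print(h.two_smallest())                # [(2.0, 2), (3.0, 0)]
```

The amortized cost per query stays logarithmic because every stale entry is popped at most once over the lifetime of the structure.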

Hence, all our algorithms scale to realistic problem sizes.

8.4 Detailed Experimental Evaluation

Plots showing lower bound and primal solution energy over time can be seen in Figure 2.

Table 5 gives dataset statistics together with the final upper and lower bounds and runtimes, averaged over all instances of each dataset.

A per-instance evaluation of all considered algorithms can be found in Table 6.

Table 3: Message computation problems (9) (columns: Algorithm 1 input, Solution of (9); contains equations (14) and (15))
Table 4: Time complexity per iteration (columns: Algorithm, Time complexity)