1 Introduction
In computer vision and beyond, the quadratic assignment problem, also known as graph matching, feature correspondence and feature matching, has attracted great interest. This problem is similar to maximum a posteriori (MAP) inference on a discrete pairwise graphical model, also called a conditional random field (CRF) in the literature. It differs by an additional uniqueness constraint: each label can be taken at most once. This uniqueness constraint makes it well-suited to attack e.g. tracking problems or shape matching, where in both cases feature points or object parts have to be matched one-to-one between multiple frames. Unfortunately, the uniqueness constraint prevents a naive application of efficient message passing solvers for MAP-inference to this problem. For this reason, many dedicated graph matching solvers have been developed; see the related work below.
On the other hand, efficient dual block-coordinate ascent (also known as message passing) algorithms like TRWS [32] count among the most efficient solvers for MAP-inference in conditional random fields. Moreover, the graph matching problem, after possibly introducing many additional variables, can be stated as a MAP-inference problem in a standard pairwise CRF. Such an approach already surpasses most state-of-the-art graph matching solvers, although the blow-up in problem size limits its efficiency.
Hence, it is desirable to devise specialized convergent message passing solvers exhibiting none of these drawbacks, i.e. (i) directly operating on a compact representation of the graph matching problem and (ii) using techniques from the MAP-inference community to gain computational efficiency.
To achieve this goal, we propose (i) several Lagrangean decompositions of the graph matching problem and (ii) novel efficient message passing solvers for these relaxations. We show their efficacy in an extensive empirical evaluation.
Related work
The term graph matching refers to a number of different optimization problems in pattern recognition; see [17] for a review. We mean the special version known unambiguously as the quadratic assignment problem (QAP) [35]. Recently, graph matching was generalized to the hypergraph matching problem (see [42] and references therein), which matches more than two graphs. The quadratic assignment problem was first formulated in [12] back in 1957. Since a number of NP-complete problems such as traveling salesman, maximal clique, graph isomorphism and graph partitioning can be straightforwardly reduced to QAP, this problem is NP-hard itself. Its importance for numerous applications has spurred extensive analysis: the (already aged) overview [40] references a large number of works suggesting new algorithms and many more with new theoretical results related to this problem.
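For intuition, the classical Koopmans-Beckmann form of the QAP can be sketched in a few lines. The following toy code (function names are ours, not taken from any referenced solver) evaluates the objective of a candidate permutation and finds the optimum of a tiny instance by enumeration:

```python
import itertools

def qap_cost(perm, F, D):
    """Koopmans-Beckmann QAP objective: sum over facility pairs (i, j) of
    flow F[i][j] times distance D[perm[i]][perm[j]] between their locations."""
    n = len(F)
    return sum(F[i][j] * D[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def qap_bruteforce(F, D):
    """Exact solution by enumerating all n! permutations (toy sizes only)."""
    n = len(F)
    return min(itertools.permutations(range(n)),
               key=lambda p: qap_cost(p, F, D))
```

Brute force is shown purely for illustration; as stated above, the problem is NP-hard and realistic instances require the dedicated solvers discussed below.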
Nearly all possible solver paradigms have been put to the test for QAP. These include, but are not limited to, convex relaxations based on Lagrangean decompositions [30, 51], as well as linear [21, 5], convex quadratic [8] and semidefinite [62, 50, 45] relaxations, which can be used either directly to obtain approximate solutions or just to provide lower bounds. To tighten these bounds several cutting plane methods were proposed [10, 11]. On the other side, various primal heuristics, both (i) deterministic, such as local search [43, 4], graduated assignment methods [24], fixed point iterations [38], the spectral technique and its derivatives [37, 16, 52, 61], and (ii) stochastic, like random walk [15] and Monte-Carlo sampling [36, 48], were suggested to provide approximate solutions to the problem. Altogether these methods serve as building blocks for exact branch-and-bound [22, 25, 9] algorithms and other non-convex optimization methods [58, 63, 24]. The excellent surveys [40, 14] contain further references.

As is usual for NP-hard problems, no single method can efficiently address all QAP instances. Different applications require different methods, and we concentrate here on problem instances specific to computer vision. Traditionally, primal heuristics predominate within this community, since the demand for low computational time usually dominates the need to obtain optimality guarantees. However, two recently proposed solvers [51, 60] based on Lagrangean decomposition (also known as dual decomposition in computer vision) have shown superior results and surpassed numerous state-of-the-art primal heuristics.
The dual decomposition solver [51] represents the problem as a combination of MAP-inference for binary CRFs, the linear assignment problem and a number of small-sized QAPs over few variables; the Lagrangean multipliers connecting these subproblems are updated with the subgradient method. Although the solver demonstrates superior results on computer vision datasets, we suspect that its efficiency can be further improved by switching to a different update method, such as bundle [31, 29] or block-coordinate ascent [56]. This suspicion is based on a comparison of such solvers for MAP-inference in CRFs [28] and similar observations related to other combinatorial optimization problems (see e.g. [44]).

Hungarian Belief Propagation (HBP) [60] considers a combination of a multi-label CRF and a linear assignment problem as subproblems; the Lagrange multipliers are updated by a block-coordinate ascent (message passing) algorithm and the obtained lower bounds are employed inside a branch-and-bound solver. It is known [33], however, that the efficiency of message passing significantly depends on the schedule of sending messages. Specifically, the efficiency of dual ascent algorithms depends on selecting directions for the ascent (blocks of variables to optimize over) and the order in which these ascent operations are performed. Arguably, the underlying multi-label CRF subproblem is crucial and the message passing must deal with it efficiently. However, HBP [60] uses a message passing schedule similar to the one in the MPLP algorithm [23], which was shown [33, 28] to be significantly slower than the schedule of SRMP (TRWS) [33].
Contribution
We study several Lagrangean decompositions of the graph matching problem. Some of these are known, e.g. the one used in the HBP algorithm [60] and the one corresponding to the local polytope relaxation of the pairwise CRF representation of graph matching. The others have not been published so far, to our knowledge. For all these decompositions we provide efficient message passing (dual ascent) algorithms based on a recent message passing framework [49]. In the case of the local polytope relaxation our algorithm coincides with the SRMP method [33], a higherorder generalization of the famous TRWS algorithm [32].
Our experimental evaluation suggests a new state-of-the-art method for the graph matching problem, which outperforms both the dual decomposition [51] and the HBP [60] solvers. We propose tighter convex relaxations for all our methods. Also, we significantly improve the performance of the HBP algorithm by changing its message passing schedule.
Proofs are given in the appendix. Code and datasets are available at http://github.com/pawelswoboda/LP_MP.
Notation.
Undirected graphs are denoted by $G=(V,E)$, where $V$ is a finite node set and $E \subseteq \binom{V}{2}$ is the edge set. The set of neighboring nodes of $v \in V$ w.r.t. graph $G$ is denoted by $\mathcal{N}_G(v)$. The convex hull of a set $X$ is denoted by $\mathrm{conv}(X)$.
2 CRFs and Graph Matching
First, we introduce conditional random fields and state the graph matching problem as one with additional uniqueness constraints. Second, we consider an inverse formulation of the graph matching problem, which, after being coupled with the original formulation, often leads to faster algorithms.
Conditional random fields (CRF).
Let $G=(V,E)$ be an undirected graph. With each node $v \in V$ we associate a variable $x_v$ taking its values in a finite set of labels $X_v$, whose elements we identify with unit vectors; hence, each label corresponds to a unit vector. Notation $X_A$, for $A \subseteq V$, denotes the Cartesian product $\prod_{v \in A} X_v$. A vector $x \in X_V$ with coordinates $x_v \in X_v$ is called a labeling. Likewise, we use the notation $x_A \in X_A$ (a special case being $x_{uv} \in X_{uv}$ for $uv \in E$) to indicate part of a labeling. Functions $\theta_v\colon X_v \to \mathbb{R}$, $v \in V$, and $\theta_{uv}\colon X_u \times X_v \to \mathbb{R}$, $uv \in E$, are potentials, which define a local quality of labels and label pairs. The energy minimization or MAP-inference problem for CRFs is

$$\min_{x \in X_V}\ E(x) := \sum_{v \in V} \theta_v(x_v) + \sum_{uv \in E} \theta_{uv}(x_u, x_v). \qquad (1)$$
The objective in (1) is called the energy of the CRF.
A great number of applied problems can be efficiently cast in the format (1); see e.g. [53, 28]. This explains its importance for computer vision, machine learning and a number of other branches of science [53]. While problem (1) is NP-hard in general, many exact and approximate solvers have been proposed [28].

Graph Matching.
Although the format of Problem (1) allows us to express many practically important optimization tasks efficiently, some applications require the resulting labelings to satisfy additional constraints. In particular, for the graph matching problem no label may be taken twice.
Let a common universe of labels $L$ be given such that $X_v \subseteq L$ for all $v \in V$. We require each label to be taken at most once, i.e. $x_u \neq x_v$ for all $u \neq v$. In other words, we seek an injective mapping $x\colon V \to L$. This problem can be stated as

$$\min_{x \in X_V}\ E(x) \quad \text{s.t.} \quad x_u \neq x_v \ \ \forall u, v \in V,\ u \neq v. \qquad (2)$$
Graph matching is NP-hard, since it is equivalent to MAP-inference for CRFs (1) in the trivial case when the nodes of the graph have mutually non-intersecting sets of labels, so that the uniqueness constraint is satisfied automatically.
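On toy instances, problem (2) can be solved exactly by enumerating injective assignments. The sketch below (plain Python containers, our own naming, not the paper's solver) makes the uniqueness constraint concrete:

```python
import itertools

def graph_matching_bruteforce(unary, pairwise, edges):
    """Exhaustively solve the graph matching problem (2) on a tiny instance.

    unary[v][s]            -- cost of assigning label s to node v
    pairwise[(u, v)][s, t] -- pairwise cost for edge (u, v) with labels (s, t)
    Each label may be used at most once (injective assignment).
    """
    n_nodes, n_labels = len(unary), len(unary[0])
    best, best_cost = None, float("inf")
    # permutations(..., n_nodes) enumerates exactly the injective maps V -> L
    for x in itertools.permutations(range(n_labels), n_nodes):
        cost = sum(unary[v][x[v]] for v in range(n_nodes))
        cost += sum(pairwise[(u, v)][x[u], x[v]] for (u, v) in edges)
        if cost < best_cost:
            best, best_cost = x, cost
    return best, best_cost
```

The exponential enumeration is exactly what the Lagrangean relaxations in Section 3 are designed to avoid.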
Inverse Graph Matching
A special case arises if the universe of labels $L$ to be matched has the same size as the node set $V$ of the graph $G$, i.e. $|L| = |V|$. Then every injective mapping $x\colon V \to L$ must also be a bijection. Hence, every feasible labeling corresponds to a permutation of $L$. The graph matching problem (2) can in this case also be approached in terms of the inverse permutation. To this end, let the inverse graph $G' = (V', E')$ be given by $V' := L$; the inverse label set $X'_l := V$ is associated with each node $l \in V'$; respectively, $X'_{V'}$ is the set of inverse labelings and $y \in X'_{V'}$ denotes an inverse labeling; the set of edges of the inverse graph is defined as $E' := \{ ll' : \exists\, uv \in E \text{ with } l \in X_u,\ l' \in X_v \}$. The inverse costs, for $l \in V'$ and $ll' \in E'$, read:

$$\theta'_l(v) := \theta_v(l), \qquad \theta'_{ll'}(u, v) := \begin{cases} \theta_{uv}(l, l'), & uv \in E, \\ 0, & \text{otherwise.} \end{cases}$$

Consider the resulting inverse graph matching problem

$$\min_{y \in X'_{V'}}\ E'(y) := \sum_{l \in V'} \theta'_l(y_l) + \sum_{ll' \in E'} \theta'_{ll'}(y_l, y_{l'}) \quad \text{s.t.} \quad y_l \neq y_{l'}\ \ \forall l \neq l'. \qquad (3)$$

Labeling $x$ and inverse labeling $y$ correspond to each other iff $y_{x_v} = v$ for all $v \in V$, i.e. $y = x^{-1}$.
Note that even when the edge set of the original graph is sparse, the inverse edge set need not be. In such a case, the computational complexity of the inverse problem is higher than that of the original one.
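The correspondence between a labeling and its inverse is just permutation inversion. A minimal sketch (our own naming):

```python
def inverse_labeling(x):
    """Given a bijective labeling x (x[v] = label of node v), return the
    inverse labeling y (y[l] = node carrying label l), so that y[x[v]] == v
    for every node v."""
    y = [None] * len(x)
    for v, l in enumerate(x):
        y[l] = v
    return y
```

Running both formulations on the same instance and coupling them is exactly the idea behind relaxations (R4) and (R5) below.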
3 Lagrangean Decompositions
Since the graph matching problem (2) is NP-hard, it is common to consider convex relaxations. Below, we present three Lagrangean decomposition based relaxations of the problem. These can be applied to the original graph matching problem (2), to the inverse one (3) and to a combination of both. Since all these relaxations are based on the famous local polytope relaxation [46, 54] of MAP-inference for CRFs (1), we give a short overview of this relaxation first.
Local Polytope for CRFs.
The MAP-inference problem (1) can be represented as an integer linear program (ILP) [34] using an overcomplete representation [53] by grouping the potentials corresponding to each node and edge into separate vectors. That is, $\theta_v$, $v \in V$, stands for the vector with coordinates $(\theta_v(s))_{s \in X_v}$, and $\theta_{uv}$, $uv \in E$, for the vector with coordinates $(\theta_{uv}(s,t))_{(s,t) \in X_u \times X_v}$. The real-valued vectors $\mu_v$ and $\mu_{uv}$ have the same dimensionality as $\theta_v$ and $\theta_{uv}$ and stand for the "relaxed" versions of the label indicator vectors. The corresponding linear programming (LP) relaxation reads:

$$\min_{\mu \geq 0}\ \sum_{v \in V} \langle \theta_v, \mu_v \rangle + \sum_{uv \in E} \langle \theta_{uv}, \mu_{uv} \rangle \quad \text{s.t.} \quad
\begin{array}{ll}
\sum_{s \in X_v} \mu_v(s) = 1, & \forall v \in V, \\[2pt]
\sum_{t \in X_v} \mu_{uv}(s,t) = \mu_u(s),\ \ \sum_{s \in X_u} \mu_{uv}(s,t) = \mu_v(t), & \forall uv \in E.
\end{array} \qquad (4)$$
Constraints of (4) define the local polytope. Note that adding integrality constraints makes problem (4) equivalent to its combinatorial formulation (1).
Integer Relaxed Pairwise Separable Linear Programs (IRPSLP)
Below we describe a general problem format studied in [49], which generalizes the local polytope relaxation (4). Importantly, the same format fits also the Lagrangean decompositions of the graph matching problem, which we consider below. This makes it possible to consider all these relaxations at once from a general viewpoint.
Let a factor graph $(\mathbb{F}, \mathbb{E})$ consist of nodes $i \in \mathbb{F}$, called factors, and edges $ij \in \mathbb{E}$, called factor-edges. Let (i) $X_i$, $i \in \mathbb{F}$, be sets of binary vectors and (ii) $A_{ij}$, $ij \in \mathbb{E}$, be matrices with binary entries, which map binary vectors from $X_i$ into binary vectors associated with the factor-edge $ij$. The IRPSLP is the class of problems which factorize according to $(\mathbb{F}, \mathbb{E})$:

$$\min_{\mu}\ \sum_{i \in \mathbb{F}} \langle \theta_i, \mu_i \rangle \quad \text{s.t.} \quad \mu_i \in \mathrm{conv}(X_i)\ \ \forall i \in \mathbb{F}; \qquad A_{ij}\mu_i = A_{ji}\mu_j\ \ \forall ij \in \mathbb{E}. \qquad (5)$$
Constraints $A_{ij}\mu_i = A_{ji}\mu_j$ are associated with each factor-edge $ij \in \mathbb{E}$ and are called coupling constraints. When representing the local polytope relaxation (4) as (5) we assume $\mathbb{F} = V \cup E$ and let $\mathbb{E}$ consist of all pairs $\{v, uv\}$ of a node and an incident edge. The convex hull of $X_v$ is fully defined by the first line of constraints in (4), since $X_v$ constitutes a set of unit binary vectors. The second line of constraints in (4) defines the coupling constraints.
We use the variable names $\mu_i$ for (in general) non-binary vectors $\mu_i \in \mathrm{conv}(X_i)$ and $x_i$ for binary ones, $x_i \in X_i$, $i \in \mathbb{F}$.
3.1 Graph Matching Problem Relaxations.
Below, we describe three relaxations of the graph matching problem (2) which fit the IRPSLP format (5). The first one results in a standard local polytope relaxation (4) of a specially constructed CRF, the second one utilizes additional coupling constraints on top of (4), while the third approach uses a network flow subproblem. Additionally, we use the inverse formulation (3) and build two additional IRPSLPs.
(R1) Graph Matching as CRF.
To build a CRF equivalent to the graph matching problem we start with the underlying CRF as in (2) and express the uniqueness constraints in the edge factors. To this end we (i) extend the edge set with new edges connecting any two nodes having at least one common label, i.e. $\hat{E} := E \cup \{ uv : X_u \cap X_v \neq \emptyset \}$; (ii) assign zero edge potentials $\theta_{uv} \equiv 0$ to all new edges $uv \in \hat{E} \setminus E$; (iii) for all $uv \in \hat{E}$ and all common labels $s \in X_u \cap X_v$ we assign $\theta_{uv}(s,s) := \infty$. Any solution of the resulting CRF (1) with finite cost is an assignment. The relaxation in terms of an IRPSLP is the local polytope (4).
This approach results in general in a quadratic number of additional edge potentials, which may become intractable as the size of the graph matching problem grows.
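The (R1) construction can be sketched directly: connect any two nodes sharing a label and forbid equal labels with infinite pairwise costs. Names and data layout below are ours, chosen for illustration:

```python
INF = float("inf")

def build_r1_edges(labels, edges):
    """(R1): extend the edge set of the CRF so that any two nodes sharing at
    least one label are connected, and return uniqueness potentials that
    assign infinite cost to every pair of equal labels.

    labels[v] -- list of labels node v may take
    edges     -- iterable of original edges (u, v) with u < v
    """
    n = len(labels)
    extended = {tuple(sorted(e)) for e in edges}
    for u in range(n):
        for v in range(u + 1, n):
            if set(labels[u]) & set(labels[v]):
                extended.add((u, v))
    uniqueness = {
        (u, v): {(s, t): (INF if s == t else 0.0)
                 for s in labels[u] for t in labels[v]}
        for (u, v) in extended
    }
    return extended, uniqueness
```

The quadratic growth of `extended` on densely overlapping label sets is precisely the intractability concern raised above.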
(R2) Relaxation with Label Factors.
For each label $s \in L$ we introduce an additional label factor, which keeps track of the nodes which are assigned label $s$. The label set of this factor consists of those nodes which can be assigned label $s$ and an additional dummy element $\#$ representing non-assignment of label $s$, i.e. $X_s := \{ v \in V : s \in X_v \} \cup \{\#\}$. Label $\#$ is necessary, as not every label needs to be taken. The set of factors becomes $\mathbb{F} = V \cup E \cup L$, with the coupling constraint set extended by the pairs $\{v, s\}$ for $s \in X_v$. The resulting IRPSLP formulation reads
$$\min_{\mu}\ \sum_{v \in V} \langle \theta_v, \mu_v \rangle + \sum_{uv \in E} \langle \theta_{uv}, \mu_{uv} \rangle + \sum_{s \in L} \langle \theta_s, \mu_s \rangle \quad \text{s.t.} \quad
\begin{array}{l}
\mu \text{ satisfies the constraints of (4)}, \\
\sum_{v \in X_s} \mu_s(v) = 1,\ \mu_s \geq 0\ \ \forall s \in L, \\
\mu_v(s) = \mu_s(v)\ \ \forall v \in V,\ s \in X_v.
\end{array} \qquad \text{(R2)}$$
Here we introduced additional potentials $\theta_s$, $s \in L$, for the label factors. Initially, we set $\theta_s \equiv 0$.
(R3) Relaxation with a Network Flow Factor.
If one ignores the edge potentials $\theta_{uv}$, $uv \in E$, in (2), the problem can be equivalently reformulated as bipartite matching [6]:
$$\min_{\mu \geq 0}\ \sum_{v \in V} \langle \theta_v, \mu_v \rangle \quad \text{s.t.} \quad \sum_{s \in X_v} \mu_v(s) = 1\ \ \forall v \in V; \qquad \sum_{v : s \in X_v} \mu_v(s) \leq 1\ \ \forall s \in L. \qquad (6)$$
Here we substituted the uniqueness constraints with the linear inequalities $\sum_{v : s \in X_v} \mu_v(s) \leq 1$, $s \in L$, which is an equivalent formulation for binary $\mu$. It is known that the feasible set of (6) is the convex hull of all binary vectors satisfying its constraints [6]. Therefore (6) fits into the IRPSLP framework. Crucially for an efficient implementation, (6) can be solved efficiently by minimum cost flow solvers [6].
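On a toy instance the subproblem (6) can be solved by enumeration instead of min-cost flow; the sketch below (our own naming) illustrates the structure. For nonnegative edge potentials its optimum also lower-bounds the full problem (2), since it simply drops cost terms:

```python
import itertools

def linear_assignment_bruteforce(unary):
    """Solve the linear assignment subproblem (6) -- edge potentials
    ignored -- by enumerating injective node-to-label maps (toy sizes only).
    Real implementations use a min-cost-flow or Hungarian solver instead."""
    n, m = len(unary), len(unary[0])
    return min(
        (sum(unary[v][x[v]] for v in range(n)), x)
        for x in itertools.permutations(range(m), n)
    )
```

The same computation appears again in Section 5, both inside message updates and for primal rounding.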
Below we treat (6) as a separate network flow factor and link it with (4) to obtain an IRPSLP. Its factor graph consists of the factors $V \cup E$ of the local polytope relaxation together with the network flow factor, which is coupled to every node factor $v \in V$. The resulting IRPSLP formulation is
$$\min_{\mu, \nu}\ \sum_{v \in V} \langle \theta_v, \mu_v \rangle + \sum_{uv \in E} \langle \theta_{uv}, \mu_{uv} \rangle + \sum_{v \in V} \langle \theta^m_v, \nu_v \rangle \quad \text{s.t.} \quad
\begin{array}{l}
\mu \text{ satisfies the constraints of (4)}, \\
\nu \text{ satisfies the constraints of (6)}, \\
\mu_v = \nu_v\ \ \forall v \in V.
\end{array} \qquad \text{(R3)}$$
Initially, we set the network flow factor costs $\theta^m$ to zero.
Representation (6) of the uniqueness constraints has already been used e.g. in [19]. However, their optimization technique lacks both convergence guarantees and monotonicity of the lower bound, which our methods possess. The work [60] considered the Lagrange dual of (R3) as a relaxation of the graph matching problem. Their relaxation is equivalent to (R3), but their algorithm differs from ours. We refer to Section 5 for a discussion of the differences and to Section 6 for an experimental comparison.
(R4R5) Coupling Original Graph Matching (2) and its Inverse (3).
In the special case when $|V| = |L|$ we may solve the inverse graph matching problem (3) instead of the original one (2). Another alternative is to solve both problems simultaneously and couple them together by requiring that the labeling of (2) is the inverse permutation of the labeling from (3). Such an approach doubles the problem size, yet it may result in a smaller number of iterations required for convergence. This approach works for both relaxations (R2) and (R3).
The resulting coupled IRPSLP for (R2) reads
(R4)  
Here the role of the label factors in (R2) has been taken over by the node factors of the inverse graph matching problem (3). We distribute the costs equally among the original and inverse potentials initially.
Another coupled IRPSLP, corresponding to (R3) reads
(R5)  
Here the network flow factor controls consistency of the original and inverse labelings. Initially, we set the network flow costs to zero and distribute the costs equally between the original and inverse potentials.
The optimal values obtained by relaxations (R1) – (R5) may deliver differing bounds to (2), as characterized below.
Proposition 1.
(R2) = (R3) and (R4) = (R5). Relaxation (R1) is weaker than (R2) and (R3).
4 General Algorithm
In this section we define a general algorithm for IRPSLP problems (5), which is applicable to the decompositions (R1)–(R5) of the graph matching problem considered in Section 3.1. Our algorithm is a simplified version of the algorithm [49], where we fixed several parameters to the values common to the relaxations (R1)–(R5).
Instead of optimizing IRPSLP (5) directly, we consider its Lagrangean dual w.r.t. the coupling constraints $A_{ij}\mu_i = A_{ji}\mu_j$. The IRPSLP problem (5) can be written shortly as $\min_{\mu \in \Pi} \langle \theta, \mu \rangle$ s.t. $A\mu = 0$, where $\theta$ stands for the concatenation of all factor costs $\theta_i$, $A\mu = 0$ represents all coupling constraints and $\Pi$ denotes the polytope encapsulating the rest of the constraints. By dualizing $A\mu = 0$ with a vector of Lagrange multipliers $\lambda$ one obtains the Lagrange function $L(\mu, \lambda) = \langle \theta, \mu \rangle + \langle \lambda, A\mu \rangle = \langle \theta^\lambda, \mu \rangle$. After introducing $\theta^\lambda := \theta + A^\top \lambda$, the dual objective reads $D(\lambda) := \min_{\mu \in \Pi} \langle \theta^\lambda, \mu \rangle$. It is well-known [13] that $D(\lambda) \leq \langle \theta, \mu \rangle$ for any feasible $\mu$ and any $\lambda$, and the dual problem consists in maximizing $D(\lambda)$ over $\lambda$. Going from $\theta$ to $\theta^\lambda$ is called an equivalent transformation or reparametrization in the literature. In the CRF literature it is also known as message passing.
Now we apply the above considerations to the general IRPSLP problem (5). Specifically, let $i, j \in \mathbb{F}$ be two neighboring factors in the factor graph, i.e. $ij \in \mathbb{E}$. Then for any $x_i \in X_i$ and $x_j \in X_j$ satisfying the coupling constraint for edge $ij$, the reparametrization $\theta_i^\lambda := \theta_i + A_{ij}^\top \lambda_{ij}$, $\theta_j^\lambda := \theta_j - A_{ji}^\top \lambda_{ij}$ preserves the total cost:
$$\langle \theta_i^\lambda, x_i \rangle + \langle \theta_j^\lambda, x_j \rangle = \langle \theta_i, x_i \rangle + \langle \theta_j, x_j \rangle.$$
The values and signs of the Lagrange multipliers $\lambda_{ij}$ define how much cost is "sent" from $i$ to $j$ or the other way around. When we consider a subset $J \subseteq \mathcal{N}(i)$ of the neighboring factors of $i$, the resulting equivalent transformation reads:

$$\theta_i^\lambda := \theta_i + \sum_{j \in J} A_{ij}^\top \lambda_{ij}, \qquad \theta_j^\lambda := \theta_j - A_{ji}^\top \lambda_{ij}\ \ \forall j \in J. \qquad (7)$$
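The defining property of such a transformation, namely that the cost of every consistent labeling is preserved, can be checked numerically. The sketch below uses a node factor and an edge factor with a sign convention of our choosing:

```python
def reparametrize(theta_u, theta_uv, msg):
    """Equivalent transformation between a node factor and an edge factor:
    add msg[s] to the node potential theta_u and subtract it from every edge
    entry theta_uv[(s, t)].  The total cost theta_u[x_u] + theta_uv[(x_u, x_v)]
    is unchanged for every labeling (x_u, x_v)."""
    new_u = [theta_u[s] + msg[s] for s in range(len(theta_u))]
    new_uv = {(s, t): c - msg[s] for (s, t), c in theta_uv.items()}
    return new_u, new_uv
```

Only the distribution of cost between the factors changes; this is what gives the dual its degrees of freedom.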
We are interested in messages $\lambda$ which improve the dual. Below we define a subclass of such messages for the same setting as in (7):
Definition 1.
Messages $\lambda_{ij}$, $j \in J$, are called admissible, if there exists $x^* \in \operatorname{argmin}_{x \in X_i} \langle \theta_i^\lambda, x \rangle$ and additionally
(8) 
We denote the set of admissible vectors by .
Lemma 1 ([49]).
Admissible messages do not decrease the dual value, i.e., applying the equivalent transformation (7) with admissible messages yields a dual value $D(\lambda)$ at least as large as before.
Example 1.
Let us apply Definition 1 to the local polytope relaxation (4) of CRFs. Let $i$ correspond to a node factor $v$, where $v$ is some node, and let $j$ correspond to any of its incident edges $uv \in E$, with $J = \{j\}$. Then $x^*$ corresponds to a locally optimal label $s^* \in \operatorname{argmin}_{s \in X_v} \theta_v^\lambda(s)$. Therefore we may assign to $\lambda_{ij}(s)$ any value from $[0,\ \theta_v^\lambda(s) - \theta_v^\lambda(s^*)]$. This assures that (8) is fulfilled and $s^*$ remains a locally optimal label after reparametrization even if there are multiple optima in $\theta_v^\lambda$.
Sending Messages.
Procedure 1 represents an elementary step of our optimization algorithm. It consists of sending messages from a factor $i$ to a subset $J$ of its neighbors.

Procedure 1 first computes an optimal labeling $x^*$ for the factor, then computes the message updates in (9) and finally updates the costs accordingly. The weights of the cost updates are chosen uniformly over $J$, except when $i$ is the network flow factor for (R3) and (R5), in which case a different weighting is chosen.
Computation (9) provides a maximally possible admissible message from $i$ to $J$. Essentially, it makes the cost vector of the factor as uniform as possible. So, in the setting of Example 1, $\lambda_{ij}(s)$ becomes equal to $\theta_v^\lambda(s) - \theta_v^\lambda(s^*)$ and therefore the reparametrized node costs coincide for all $s \in X_v$. Since the result of (9) is an admissible message, Procedure 1 never decreases the dual objective, as follows from Lemma 1.
Dual Ascent Algorithm.
Let the notation $(i_1, \ldots, i_k)$ stand for an ordered set of pairwise distinct factors $i_1, \ldots, i_k \in \mathbb{F}$. Algorithm 2 below goes over some of the factors in a prespecified order and calls Procedure 1 to send or receive messages to/from some of the neighbors.
Algorithm 2 works as follows: We choose an ordered subset of factors. For each factor we select a neighborhood of factors from which to receive messages and a neighborhood to which messages are sent by Procedure 1. We run Algorithm 2 on the chosen order (forward direction) and on its reverse (backward direction) alternatingly until some stopping condition is met. Since Algorithm 2 reparametrizes the problem via Procedure 1 only, and the latter is guaranteed not to decrease the dual, neither does Algorithm 2. We refer to [49] for further theoretical properties of Algorithm 2.
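A minimal numeric sketch of one forward pass of such a dual ascent scheme, on the smallest possible decomposition (two node factors coupled to one edge factor). The update rules are a deliberate simplification of Procedure 1, not the paper's implementation; the dual bound is the sum of the factor minima and never decreases:

```python
def dual_ascent_single_edge(theta_u, theta_v, theta_uv, sweeps=2):
    """Toy dual ascent on factors {u}, {v} and edge {u, v}.

    One sweep sends messages u -> uv -> v (a forward pass); returns the list
    of dual bounds, one before the first sweep and one after each sweep."""
    theta_u, theta_v = list(theta_u), list(theta_v)
    theta_uv = dict(theta_uv)
    nu, nv = len(theta_u), len(theta_v)

    def dual():
        return min(theta_u) + min(theta_uv.values()) + min(theta_v)

    bounds = [dual()]
    for _ in range(sweeps):
        # u -> uv: push all slack of theta_u into the edge factor
        mn = min(theta_u)
        for lu in range(nu):
            d = theta_u[lu] - mn
            theta_u[lu] -= d
            for lv in range(nv):
                theta_uv[(lu, lv)] += d
        # uv -> v: push column min-marginals of the edge into theta_v
        for lv in range(nv):
            m = min(theta_uv[(lu, lv)] for lu in range(nu))
            theta_v[lv] += m
            for lu in range(nu):
                theta_uv[(lu, lv)] -= m
        bounds.append(dual())
    return bounds
```

On this single-edge instance the forward pass amounts to dynamic programming, so the bound reaches the exact optimum; on loopy graphs the same updates only converge towards the LP dual optimum.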
5 Graph Matching Algorithms
For each of the relaxations (R1)–(R5) of the graph matching problem we detail the parameters of Algorithm 2 used in our experiments: the ordered factor set and the receive and send neighborhoods of each factor.
Algorithm Names.
We use the following shortcuts for specializations of Algorithm 2 to the relaxations (R1)–(R5): GM corresponds to (R1), AMP to (R2), AMCF to (R3). To obtain the relaxations (R1)–(R3) we use either the original graph, as in (2), or the inverse one, as in (3). These options are denoted by the suffixes O and I respectively. Additionally, the two coupled relaxations (R4) and (R5) are addressed by the algorithms AMPC and AMCFC respectively. All in all, we have eight algorithms: GMO, GMI, AMPO, AMPI, AMPC, AMCFO, AMCFI and AMCFC.
The ordered factor sets and the receive and send neighborhoods are defined in Table 1. For algorithms with the suffix I the values are the same as for those with the suffix O, but correspond to the inverse graph.
We assume the order of graph nodes and labels to be given a priori. This induces an order on the matching factor and on the edge factors relative to the node factors; similarly for all edge factors in the inverse graph. We extend the resulting partial order to a total one, e.g. by topological sort. For each factor we define the sets of preceding and subsequent factors w.r.t. this order.
Sending a message by some factor automatically implies receiving this message by another, coupled factor. Therefore, there is no need to go over all factors in Algorithm 2. In particular, edge factors are coupled to node factors only, therefore processing all node factors in Algorithm 2 automatically means updating all edge factors as well. In the processing order and the selection of the receive and send neighborhoods we follow the most efficient MAP-solvers TRWS [32] and SRMP [33] (the latter is a generalization of TRWS to higher order models and has a slightly different implementation for pairwise CRFs (1)). In the special case when all nodes contain disjoint subsets of labels, the graph matching problem (2) turns into MAP-inference in CRFs (1). Then all our algorithms GM, AMP and AMCF reduce to SRMP [33].
It is worth mentioning that for CRFs there exist algorithms, such as MPLP [23], which go over edge factors only and in this way implicitly process the node factors as well. As empirically shown in [33], MPLP is usually slower than SRMP. In Section 6 we show that our methods also compare favorably to the recently proposed HBP [60], which is similar to AMCFO, but uses an MPLP-like processing schedule.
Optimization Subproblems of Procedure 1.
For each call of Procedure 1 one must find the best factor element $x^*$ and compute the best messages by solving (9). The first subproblem is solved by explicitly scanning all elements of the factor for node, edge and label factors. For optimizing over the network flow factor, we use a min-cost-flow solver. Solving (9) for all choices of factors and neighborhoods is possible through closed-form solutions or calling a minimum cost flow solver, and is described in the appendix.
Table 1: Ordered factor sets and receive/send neighborhoods for the algorithms GMO, AMPO, AMCFO, AMPC and AMCFC.
Primal Rounding.
Algorithm 2 only provides lower bounds to the original problem (2). To obtain a primal solution one may ignore the edge potentials and solve the resulting reparametrized bipartite matching problem (6) with a minimum cost flow solver, as done in [60]. Empirically we found that it is better to interleave rounding and message passing, similarly as in TRWS [32] and SRMP [33]. Assume we have already computed a primal integer solution $x_u$ for all nodes $u$ preceding $v$ and we want to compute $x_v$. To this end, between lines 4 and 5 of Algorithm 2 we assign

$$x_v \in \operatorname*{argmin}_{s \in X_v \setminus \{x_u : u < v\}} \Big( \theta_v^\lambda(s) + \sum_{u < v\,:\, uv \in E} \theta_{uv}^\lambda(x_u, s) \Big). \qquad (10)$$
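The interleaved rounding can be sketched as a greedy sequential pass (a simplification in the spirit of the update above, with our own data layout; the paper applies it to the reparametrized potentials): each node takes the cheapest still-unused label given the already-fixed neighbors.

```python
def round_primal(order, unary, pairwise, edges):
    """Sequential primal rounding: process nodes in the given order; each
    node greedily picks the cheapest label not used so far, accounting for
    pairwise costs to already-labeled neighbors."""
    x, used = {}, set()
    for v in order:
        best_s, best_c = None, float("inf")
        for s in range(len(unary[v])):
            if s in used:          # uniqueness: each label at most once
                continue
            c = unary[v][s]
            for (a, b) in edges:   # pairwise cost to fixed neighbors
                if a == v and b in x:
                    c += pairwise[(a, b)][(s, x[b])]
                elif b == v and a in x:
                    c += pairwise[(a, b)][(x[a], s)]
            if c < best_c:
                best_s, best_c = s, c
        x[v] = best_s
        used.add(best_s)
    return x
```

Note how the uniqueness constraint can force a later node onto an expensive label, which is why rounding quality depends on the current reparametrization.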
Time Complexity.
The per-iteration complexity of GM is dominated by passing messages over the (extended) edge set. For AMP we must additionally account for the label factor updates, and for AMCF for the time to solve (6), which is possible with a min-cost-flow solver. Details and speedups are in the appendix.
Higher Order Extensions.
Our approach is straightforwardly extendable to the graph matching problem with higher order factors, a special case being third order: let $T \subseteq \binom{V}{3}$ be a subset of triplets of nodes and $\theta_{uvw}$, $uvw \in T$, be corresponding triplet potentials. The corresponding third order graph matching problem reads

$$\min_{x \in X_V}\ E(x) + \sum_{uvw \in T} \theta_{uvw}(x_u, x_v, x_w) \quad \text{s.t.} \quad x_u \neq x_v\ \ \forall u \neq v. \qquad (11)$$

The associated IRPSLP can be constructed by including additional factors for all triplets in an analogous fashion as in (4); see e.g. [55] for the corresponding relaxation.

For the relaxations (R1) – (R5) we use third order factors to enforce cycle inequalities, which we add in a cutting plane approach as in [47]. For this we set $T = \emptyset$ at the beginning. By this construction (11) is equivalent to (2); however, the corresponding IRPSLPs are not: triplet potentials make the relaxation tighter.
6 Experiments
Algorithms.

"Hungarian belief propagation" (HBP) [60]. In [60] a branch-and-bound solver is used on top of the dual ascent solver. For a fair comparison our reimplementation uses only the dual ascent component. As for AMP and AMCF, we append to HBP the suffixes O and C to denote the relaxations we let HBP run on.
According to [51, 60], these two algorithms outperformed competitors [24, 37, 18, 19, 57, 41, 50, 45, 38, 15, 63, 26] at the time of their publication, hence we do not compare against the latter ones.
We set a hard iteration limit for each algorithm, exiting earlier when the primal/dual gap vanishes or no dual progress is observed anymore. We periodically compute primal solutions in our algorithms. For GM, AMP, AMCF and HBP we use the tightening extension discussed in Section 5 to improve the dual lower bound. We tighten our relaxation whenever no dual progress occurs.
Datasets.
Table 2: Dataset characteristics: number of instances (#I), nodes (#V), labels (#L) and connectivity (C).

dataset    | #I  | #V     | #L     | C
house      | 105 | 30     | 30     | dense
hotel      | 105 | 30     | 30     | dense
car        | 30  | 19-49  | 19-49  | dense
motor      | 20  | 15-52  | 15-52  | dense
graph flow | 6   | 60-126 | 2-126  | sparse
worms      | 30  | 2060   |        | sparse
We have compared on six datasets:

The worms dataset [27] from bioimaging. The goal is to annotate nuclei of C. elegans, a famous model organism used in developmental biology, by finding the corresponding nuclei in a precomputed model. The instances of worms are, to our knowledge, the largest graph matching datasets ever investigated in the literature.
Whereas the other datasets are publicly available, the worms dataset was kindly given to us by the authors of [27] and will be published. A summary of dataset characteristics can be found in Table 2. Previous computational studies concentrated on small-scale problems; the worms dataset is substantially larger in terms of both nodes and labels.
Results.
Fig. 2 shows the performance of the algorithms on all considered datasets. Among the variants O, I, C, corresponding respectively to the original, inverse and coupled formulations, we plot only the best one. As expected, for dense graphs (datasets house, hotel, car, motor) the variant C with coupling provided the most robust convergence, performing similarly to the better of O and I, and is therefore the one presented in the plots. For sparse graphs, the inverse representation becomes too expensive, as the inverse edge set may be dense even though the original edge set in (3) is sparse. Therefore we stick to the original formulation O.

hotel and house are easy datasets, and many instances needed only few iterations for convergence. AMP, AMCF and DD were able to solve all instances to optimality within a few seconds or even faster. However, DD is the fastest method for this data.

car and motor are already harder and the iteration limit did not allow us to ascertain optimality for all instances. AMP significantly outperforms its competitors, DD is significantly slower than the rest, whereas the other algorithms show comparable results.

on worms, again, AMP significantly outperforms its competitors; AMCF and HBP converge to a similar duality gap, although AMCF does so one to two orders of magnitude faster; GM and DD return results which are hardly competitive.

graph flow is the only dataset where DD clearly overcomes all competitors, followed by AMP. We attribute this to DD's tighter relaxation (its "local" subproblems contain many variables, whereas our subproblems have at most 3 variables after tightening).
Insights and Conclusions

AMP shows the overall best performance for both small dense and large sparse datasets. It is the best anytime solver: it has the best performance in the first iterations. This is beneficial (i) if the runtime is limited or (ii) in branch-and-bound procedures, where good early progress helps to eliminate non-optimal branches fast.

Although AMP, AMCF and HBP address equivalent relaxations (having the same maximal dual value), their convergence speed is different. AMCF and HBP are generally slower than AMP, which we attribute to the suboptimal redistribution of the costs by the min-cost-flow factor when maximizing messages in (9).

DD's relatively good performance is probably due to the large subproblems used by this method. First, this decreases the number of dual variables, which accelerates convergence of the bound; second, it makes the relaxation tighter, which decreases the duality gap. We attribute the slow convergence of DD to the subgradient method. Summarizing, larger subproblems are profitable for the subgradient method, but not for message passing.

AMCF outperforms HBP due to better message scheduling.

We attribute the inferior performance of GM mostly to the weaker relaxation it optimizes. Even under this condition, due to good message scheduling and fast message passing it outperforms DD and HBP on several datasets.
A detailed evaluation of all instances is in the appendix.
7 Acknowledgments
The authors would like to thank Vladimir Kolmogorov for helpful discussions. This work is partially funded by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no 616160.
References
 [1] Graph flow dataset. http://cvlabdresden.de/research/iamgematching/graphflow/.
 [2] Hotel dataset. http://vasc.ri.cmu.edu//idb/html/motion/hotel/.
 [3] House dataset. http://vasc.ri.cmu.edu/idb/html/motion/house/.
 [4] K. Adamczewski, Y. Suh, and K. M. Lee. Discrete tabu search for graph matching. In International Conference on Computer Vision, 2015.
 [5] W. P. Adams and T. A. Johnson. Improved linear programmingbased lower bounds for the quadratic assignment problem. DIMACS series in discrete mathematics and theoretical computer science, 16:43–75, 1994.
 [6] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1 edition, Feb. 1993.
 [7] H. A. Alhaija, A. Sellent, D. Kondermann, and C. Rother. GraphFlow - 6D large displacement scene flow via graph matching. In J. Gall, P. V. Gehler, and B. Leibe, editors, GCPR, volume 9358 of Lecture Notes in Computer Science, pages 285–296. Springer, 2015.
 [8] K. M. Anstreicher and N. W. Brixius. A new bound for the quadratic assignment problem based on convex quadratic programming. Mathematical Programming, 89(3):341–357, 2001.
 [9] K. M. Anstreicher and N. W. Brixius. Solving quadratic assignment problems using convex quadratic programming relaxations. Optimization Methods and Software, 16(14):49–68, 2001.
 [10] M. Bazaraa and H. Sherali. New approaches for solving the quadratic assignment problem. In Operations Research Verfahren, volume 32, pages 29–46, 1979.
 [11] M. S. Bazaraa and H. D. Sherali. On the use of exact and heuristic cutting plane methods for the quadratic assignment problem. Journal of the Operational Research Society, 33(11):991–1003, 1982.
 [12] M. Beckman and T. Koopmans. Assignment problems and the location of economic activities. Econometrica, 25:53–76, 1957.
 [13] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge university press, 2004.
 [14] R. E. Burkard, E. Cela, P. M. Pardalos, and L. S. Pitsoulis. The quadratic assignment problem. In Handbook of combinatorial optimization, pages 1713–1809. Springer US, 1998.
 [15] M. Cho, J. Lee, and K. M. Lee. Reweighted random walks for graph matching. In K. Daniilidis, P. Maragos, and N. Paragios, editors, ECCV (5), volume 6315 of Lecture Notes in Computer Science, pages 492–505. Springer, 2010.

 [16] M. Cho, J. Sun, O. Duchenne, and J. Ponce. Finding matches in a haystack: A max-pooling strategy for graph matching in the presence of outliers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2083–2090, 2014.
 [17] D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. IJPRAI, 18(3):265–298, 2004.
 [18] T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. In NIPS. MIT Press, Cambridge, MA, 2007.
 [19] J. Duchi, D. Tarlow, G. Elidan, and D. Koller. Using combinatorial optimization within max-product belief propagation. In NIPS, pages 369–376. MIT Press, 2006.
 [20] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, June 2010.
 [21] A. Frieze and J. Yadegar. On the quadratic assignment problem. Discrete applied mathematics, 5(1):89–98, 1983.
 [22] J. W. Gavett and N. V. Plyter. The optimal assignment of facilities to locations by branch and bound. Operations Research, 14(2):210–232, 1966.
 [23] A. Globerson and T. S. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, pages 553–560, 2007.
 [24] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Trans. Pattern Anal. Mach. Intell., 18(4):377–388, 1996.
 [25] P. Hahn, T. Grant, and N. Hall. A branch-and-bound algorithm for the quadratic assignment problem based on the Hungarian method. European Journal of Operational Research, 108(3):629–640, 1998.
 [26] B. Jiang, J. Tang, C. Ding, and B. Luo. A local sparse model for matching problem. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, pages 3790–3796. AAAI Press, 2015.
 [27] D. Kainmueller, F. Jug, C. Rother, and G. Myers. Active graph matching for automatic joint segmentation and annotation of C. elegans. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 81–88. Springer, 2014.
 [28] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother. A comparative study of modern inference techniques for structured discrete energy minimization problems. International Journal of Computer Vision, 115(2):155–184, 2015.
 [29] J. H. Kappes, B. Savchynskyy, and C. Schnörr. A bundle approach to efficient MAP-inference by Lagrangian relaxation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1688–1695. IEEE, 2012.
 [30] S. E. Karisch and F. Rendl. Lower bounds for the quadratic assignment problem via triangle decompositions. Mathematical Programming, 71(2):137–151, 1995.
 [31] K. C. Kiwiel. Proximity control in bundle methods for convex nondifferentiable minimization. Mathematical Programming, 46(1–3):105–122, 1990.
 [32] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Mach. Intell., 28(10):1568–1583, 2006.
 [33] V. Kolmogorov. A new look at reweighted message passing. IEEE Trans. Pattern Anal. Mach. Intell., 37(5):919–930, 2015.
 [34] B. Korte and J. Vygen. Combinatorial optimization, volume 2. Springer, 2012.
 [35] E. L. Lawler et al. The quadratic assignment problem. Management Science, 9(4):586–599, 1963.
 [36] J. Lee, M. Cho, and K. M. Lee. A graph matching algorithm using data-driven Markov chain Monte Carlo sampling. In Pattern Recognition (ICPR), 2010 20th International Conference on, pages 2816–2819. IEEE, 2010.
 [37] M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In ICCV, pages 1482–1489. IEEE Computer Society, 2005.
 [38] M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and MAP inference. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, NIPS, pages 1114–1122. Curran Associates, Inc., 2009.
 [39] M. Leordeanu, R. Sukthankar, and M. Hebert. Unsupervised learning for graph matching. International Journal of Computer Vision, 96(1):28–45, 2012.
 [40] E. M. Loiola, N. M. M. de Abreu, P. O. Boaventura-Netto, P. Hahn, and T. Querido. An analytical survey for the quadratic assignment problem. European Journal of Operational Research, pages 657–690, 2007.
 [41] J. Maciel and J. Costeira. A global solution to sparse correspondence problems. IEEE Trans. Pattern Anal. Mach. Intell., 25(2):187–199, 2003.
 [42] Q. Nguyen, A. Gautier, and M. Hein. A flexible tensor block coordinate ascent scheme for hypergraph matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5270–5278, 2015.
 [43] P. M. Pardalos, K. A. Murthy, and T. P. Harrison. A computational comparison of local search heuristics for solving quadratic assignment problems. Informatica, 4(1–2):172–187, 1993.
 [44] C. Ribeiro and M. Minoux. Solving hard constrained shortest path problems by Lagrangean relaxation and branch-and-bound algorithms. Methods of Operations Research, 53:303–316, 1986.
 [45] C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. In EMMCVPR, volume 3757 of Lecture Notes in Computer Science, pages 171–186. Springer, 2005.
 [46] M. I. Schlesinger. Syntactic analysis of two-dimensional visual signals in noisy conditions. Kibernetika, 4:113–130, 1976.
 [47] D. Sontag, D. K. Choe, and Y. Li. Efficiently searching for frustrated cycles in MAP inference. In UAI, pages 795–804. AUAI Press, 2012.
 [48] Y. Suh, M. Cho, and K. M. Lee. Graph matching via sequential Monte Carlo. In Computer Vision - ECCV 2012 - 12th European Conference on Computer Vision, Florence, Italy, October 7–13, 2012, Proceedings, Part III, pages 624–637, 2012.
 [49] P. Swoboda, J. Kuske, and B. Savchynskyy. A dual ascent framework for Lagrangean decomposition of combinatorial problems. CoRR, 2016.
 [50] P. H. S. Torr. Solving Markov random fields using semidefinite programming. In AISTATS, 2003.
 [51] L. Torresani, V. Kolmogorov, and C. Rother. A dual decomposition approach to feature correspondence. IEEE Trans. Pattern Anal. Mach. Intell., 35(2):259–271, 2013.
 [52] S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Trans. Pattern Anal. Mach. Intell., 10(5):695–703, 1988.
 [53] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
 [54] T. Werner. A linear programming approach to max-sum problem: A review. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(7):1165–1179, 2007.
 [55] T. Werner. Revisiting the linear programming relaxation approach to Gibbs energy minimization and weighted constraint satisfaction. IEEE Trans. Pattern Anal. Mach. Intell., 32(8):1474–1488, 2010.
 [56] S. J. Wright. Coordinate descent algorithms. Mathematical Programming, 151(1):3–34, 2015.
 [57] J. Yarkony, C. C. Fowlkes, and A. T. Ihler. Covering trees and lower bounds on quadratic assignment. In CVPR, pages 887–894. IEEE Computer Society, 2010.
 [58] M. Zaslavskiy, F. Bach, and J.-P. Vert. A path following algorithm for the graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2227–2242, 2009.
 [59] Z. Zhang. Microsoft kinect sensor and its effect. IEEE MultiMedia, 19(2):4–10, Apr. 2012.
 [60] Z. Zhang, Q. Shi, J. McAuley, W. Wei, Y. Zhang, and A. van den Hengel. Pairwise matching through max-weight bipartite belief propagation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
 [61] G. Zhao, B. Luo, J. Tang, and J. Ma. Using Eigen-Decomposition Method for Weighted Graph Matching, pages 1283–1294. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
 [62] Q. Zhao, S. E. Karisch, F. Rendl, and H. Wolkowicz. Semidefinite programming relaxations for the quadratic assignment problem. Journal of Combinatorial Optimization, 2(1):71–109, 1998.
 [63] F. Zhou and F. D. la Torre. Factorized graph matching. In CVPR, pages 127–134. IEEE Computer Society, 2012.
8 Supplementary Material
8.1 Proofs
Proof of Proposition 1
8.2 Optimization Subproblems of Procedure 1
In Procedure 1 the two problems in lines 1 and 1 must be solved. The solution of the optimization problem in line 1 was discussed in the main part of the paper. Hence, it only remains to show how to carry out the optimization of the problem in line 1 efficiently for all cases that can occur. This is shown in Table 3.
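Since Procedure 1 and Table 3 are not reproduced in this excerpt, the following brute-force sketch (function name hypothetical) only illustrates the kind of subproblem such message-passing routines must solve efficiently: min-marginals of a linear assignment subproblem, i.e. the best total one-to-one assignment cost when a single match i -> j is forced. A brute-force enumeration is feasible only for tiny instances, but it specifies exactly what an efficient routine must return.

```python
import itertools

def assignment_min_marginals(cost):
    """Brute-force min-marginals of a linear assignment subproblem.

    m[i][j] = minimal total cost over all one-to-one assignments
    (permutations) that map point i to label j.  Serves as a reference
    specification; efficient solvers must reproduce these values.
    """
    n = len(cost)
    m = [[float("inf")] * n for _ in range(n)]
    for perm in itertools.permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        for i in range(n):
            # Record the best assignment cost among those fixing i -> perm[i].
            m[i][perm[i]] = min(m[i][perm[i]], total)
    return m
```

For a 2x2 cost matrix [[0, 2], [3, 1]] the two feasible assignments cost 0+1=1 and 2+3=5, so every entry of the min-marginal matrix is either 1 (on the diagonal) or 5 (off it).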
Checking the validity of the operations in Table 3 for and is straightforward. For and , we prove correctness below. Correctness for and is analogous.
Lemma 2.
8.3 Time complexity
The time complexity of one iteration of message passing is essentially the time required for all invocations of Algorithm 1 via the routines described in Table 3. The total runtime per iteration of the various algorithms we have proposed can be found in Table 4. We assume that . In sparse assignment problems, where this is not the case, the runtime decreases according to the sparsity.
If we hold the unary potentials , in a heap, we can support the operation required in the third line of Table 3 in time , since either (sending) or (receiving).
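As a hedged illustration of the heap argument above (the exact operation from Table 3 is lost in this excerpt; we assume it amounts to querying the smallest and second-smallest unary potential after single-entry updates, a quantity message-passing updates commonly need), a min-heap with lazy deletion supports it in amortized logarithmic time:

```python
import heapq

class UnaryHeap:
    """Keep unary potentials in a min-heap with lazy deletion, so the
    smallest and second-smallest entries can be queried in amortized
    O(log n) time after single-entry updates (class name hypothetical)."""

    def __init__(self, potentials):
        # current[] holds the authoritative value of each label's potential.
        self.current = dict(enumerate(potentials))
        self.heap = [(v, i) for i, v in self.current.items()]
        heapq.heapify(self.heap)

    def update(self, label, value):
        # Lazy update: push the new value; stale heap entries are
        # detected and discarded when they later reach the top.
        self.current[label] = value
        heapq.heappush(self.heap, (value, label))

    def two_smallest(self):
        # Pop until two entries matching the current values are found.
        top = []
        while len(top) < 2:
            v, i = heapq.heappop(self.heap)
            if self.current.get(i) == v and i not in [j for _, j in top]:
                top.append((v, i))
        # Push the valid entries back so the heap remains usable.
        for entry in top:
            heapq.heappush(self.heap, entry)
        return top  # [(min_value, argmin), (second_value, second_argmin)]
```

Each update pushes one entry, and every stale entry is popped at most once over the whole run, which is what yields the amortized logarithmic bound.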
Hence, all our algorithms scale to realistic problem sizes.
8.4 Detailed Experimental Evaluation
Plots showing the lower bound and primal solution energy over time can be seen in Figure 2.
Table 5 lists dataset statistics together with the final upper and lower bounds and the runtime, averaged over all instances of each dataset.
A perinstance evaluation of all considered algorithms can be found in Table 6.
Table 4: per-iteration time complexity of the proposed algorithms (columns: Algorithm, Time complexity; rows include GM).