Point pattern matching is a fundamental problem in pattern recognition, and has been modeled in several different forms, depending on the demands of the application domain in which it is required [2, 3]. A classic formulation which is realistic in many practical scenarios is that of near-isometric point pattern matching, in which we are given both a “template” and a “scene” point pattern, and it is assumed that the scene contains an instance of the template, apart from an isometric transformation and possibly some small jitter in the point coordinates. The goal is to identify this instance in the scene and find which points in the scene correspond to which points in the template.
Recently, a method was introduced which solves this problem efficiently by means of exact belief propagation in a certain graphical model. The approach is appealing because it is optimal not only in that it consists of exact inference in a graph with small maximal clique size (4 for matching in the plane), but also in that the graph itself is optimal. There it is shown that the maximum a posteriori (MAP) solution in the sparse and tractable graphical model where inference is performed is actually the same MAP solution that would be obtained if a fully connected model (which is intractable) could be used. This is due to the so-called global rigidity of the chordal graph in question: when the graph is embedded in the plane, the lengths of its edges uniquely determine the lengths of the absent edges (i.e. the edges of the graph complement). The computational complexity of the optimal point pattern matching algorithm is then shown to be $O(nm^4)$ (both in terms of processing time and memory requirements), where $n$ is the number of points in the template point pattern and $m$ is the number of points in the scene point pattern (usually with in applications). This reflects precisely the computational complexity of the Junction Tree algorithm in a chordal graph with $O(n)$ nodes, $m$ states per node and maximal cliques of size 4. The authors present experiments which give evidence that the method substantially improves on well-known matching techniques, including Graduated Assignment.
In this paper, we show how the same optimality proof can be obtained with an algorithm that runs in $O(nm^3)$ time per iteration, where $n$ is the number of template points and $m$ is the number of scene points. In addition, memory requirements are decreased by a factor of precisely $m$. We are able to achieve this by identifying a new graph which is globally rigid but has a smaller maximal clique size: 3. The main problem we face is that our graph is not chordal, so in order to enforce the running intersection property for applying the Junction Tree algorithm the graph should first be triangulated; this would not be interesting in our case, since the resulting triangulated graph would have larger maximal clique size. Instead, we show that belief propagation in this graph converges to the optimal solution, although not necessarily in a single iteration. In practice, we find that convergence occurs after a small number of iterations, thus improving the running time by an order of magnitude. We compare the performance of our model to that of the Junction Tree model with synthetic and real point sets derived from images, and show that comparable accuracy is obtained while substantial speed-ups are observed.
We consider point matching problems in the plane. The problem we study is that of near-isometric point pattern matching (as defined above), i.e. one assumes that a near-isometric instance of the template is somewhere “hidden” in the scene. By “near-isometric” it is meant that the relative distances of points in the template are approximately preserved in this instance. For simplicity of exposition we assume that the template, the scene, and the hidden instance are ordered sets (their elements are indexed). Our aim is to find a map from the template into the scene, with the hidden instance as its image, that best preserves the relative distances of the points in the template and in its image, i.e.
where the distance matrix of a point set is the matrix whose $(i, j)$ entry is the Euclidean distance between the points indexed by $i$ and $j$ in that set. Note that finding the map is inherently a combinatorial optimization problem, since its image is itself a subset of the scene point pattern. In the earlier method, a generic point in the template is modeled as a random variable, and a generic point in the scene is modeled as a possible realization of that random variable. As a result, a joint realization of all the random variables corresponds to a match between the template and the scene point patterns. A graphical model (see [6, 7]) is then defined on this set of random variables, whose edges are set according to the topology of a so-called 3-tree graph (any 3-tree that spans the template points). A 3-tree is a graph obtained by starting with the complete graph on 3 vertices, $K_3$, and then adding new vertices which are connected only to those same 3 vertices. (Technically, connecting new vertices to the 3 nodes of the original graph is not required: it suffices to connect new vertices to any existent 3-clique.) Figure 1 shows an example of a 3-tree. The reasons claimed in the earlier work for introducing 3-trees as a graph topology for the probabilistic graphical model are that (i) 3-trees are globally rigid in the plane and (ii) 3-trees are chordal graphs. (A chordal graph is one in which every cycle of length greater than 3 has a chord. A chord of a cycle is an edge not belonging to the cycle but which connects two nodes in the cycle, i.e. a “shortcut” in a cycle.) This implies (i) that the 3-tree model is a type of graph which is in some sense “optimal” (in a way that will be made clear in the next section in the context of the new graph we propose) and (ii) that 3-trees have a Junction Tree with fixed maximal clique size (4); as a result it is possible to perform exact inference in polynomial time.
Potential functions are defined on pairs of neighboring nodes and are large if the difference between the distance of neighboring nodes in the template and the distance between the nodes they map to in the scene is small (and small if this difference is large). This favors isometric matchings. More precisely,
where is typically some unimodal function peaked at zero (e.g. a zero-mean Gaussian function) and is the Euclidean distance between the corresponding points (for simplicity of notation we do not disambiguate between random variables and template points, or realizations and scene points). For the case of exact matching, i.e. when there exists an such that the minimal value in (1) is zero, then (where is just the indicator function ). The potential function of a maximal clique () is then simply defined as the product of the potential functions over its 6 () edges (which will be maximal when every factor is maximal). It should be noted that the potential function of each edge is included in no more than one of the cliques containing that edge.
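The pairwise potential described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`edge_potential`, `clique_potential`), the Gaussian width `sigma`, and the representation of points as coordinate tuples are hypothetical choices made here, and the sketch multiplies every edge of a clique without the bookkeeping that assigns each edge to at most one clique.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def gaussian(x, sigma=1.0):
    """Unnormalized zero-mean Gaussian, peaked at x = 0 with value 1."""
    return math.exp(-0.5 * (x / sigma) ** 2)


def edge_potential(template, scene, i, j, a, b, sigma=1.0):
    """Potential for mapping template points i, j to scene points a, b.

    Large when the template distance d(i, j) is close to the scene
    distance d(a, b), small otherwise.
    """
    diff = euclidean(template[i], template[j]) - euclidean(scene[a], scene[b])
    return gaussian(diff, sigma)


def clique_potential(template, scene, nodes, states, sigma=1.0):
    """Product of the edge potentials over all pairs of nodes in a clique.

    Assumption of this sketch: every edge of the clique is included.
    """
    value = 1.0
    for u in range(len(nodes)):
        for v in range(u + 1, len(nodes)):
            value *= edge_potential(template, scene, nodes[u], nodes[v],
                                    states[u], states[v], sigma)
    return value
```

For an exact (jitter-free) match every factor attains its maximum, so the clique potential is maximal precisely when all the distances within the clique are preserved.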
For the case of exact matching (i.e. no jitter), it is shown in  that running the Junction Tree algorithm on the 3-tree graphical model with will actually find a MAP assignment which coincides with , i.e. such that . This is due to the “graph rigidity” result, which tells us that equality of the lengths of the edges in the 3-tree and the edges induced by the matching in is sufficient to ensure the equality of the lengths of all pairs of points in and . This will be made technically precise in the next section, when we prove an analogous result for another graph.
3 An Improved Graph
Here we introduce another globally rigid graph which has the advantage of having a smaller maximal clique size. Although the graph is not chordal, we will show that exact inference is tractable and that we will indeed benefit from the decrease in the maximal clique size. As a result we will be able to obtain optimality guarantees like those from .
Our graph is constructed using Algorithm 1.
In order to present our results we need to start with the definition of a globally rigid graph:
A planar graph embedding is said to be globally rigid in the plane if the lengths of its edges uniquely determine the lengths of the edges of its graph complement.
So our statements are really about graph embeddings in the plane, but for simplicity of presentation we will simply refer to these embeddings as “graphs”.
This means that there are no degrees of freedom for the absent edges in the graph: they must all have specified and fixed lengths. To proceed we need a simple definition and some simple technical lemmas.
A set of points is said to be in general position in the plane if no 3 points lie in a straight line.
Given a set of points in general position in the plane, if the distances from a point to two other fixed points are determined then that point can be in precisely two different positions.
Consider two circles, each centered at one of the two reference points, with radii equal to the given distances to the point in question. These circles intersect at precisely two points (since the 3 points are not collinear). This proves the statement. ∎
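The circle construction used in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument; the function name `circle_intersections` and the tuple representation of points are assumptions made for illustration.

```python
import math


def circle_intersections(c0, r0, c1, r1):
    """Intersection points of two circles with centers c0, c1 and radii r0, r1.

    Returns an empty list if the circles are concentric or do not meet.
    A tangency yields the same point twice.
    """
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    # Distance from c0 to the foot of the common chord, along the center line.
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)
    # Half-length of the common chord.
    h = math.sqrt(max(r0 ** 2 - a ** 2, 0.0))
    mx, my = c0[0] + a * dx / d, c0[1] + a * dy / d
    return [(mx + h * dy / d, my - h * dx / d),
            (mx - h * dy / d, my + h * dx / d)]
```

When the unknown point is not collinear with the two centers, the chord half-length is strictly positive, and the two returned points are mirror images of each other across the line through the centers.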
The following lemma follows directly from lemma 1 in , and is stated without proof.
Given a set of points in general position in the plane, if the distances from a point to three other fixed points are determined then the position of that point is uniquely determined.
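As an illustration of this lemma (not a proof), the unique position can be recovered by subtracting the first circle equation from the other two, which leaves a 2×2 linear system. The function name `trilaterate` and its argument layout are hypothetical, and the sketch assumes the three given distances are mutually consistent.

```python
def trilaterate(p1, r1, p2, r2, p3, r3):
    """Recover the point at distances r1, r2, r3 from anchors p1, p2, p3.

    Subtracting the circle equation of p1 from those of p2 and p3 gives
    the linear system A [x, y]^T = b, solved here by Cramer's rule.
    """
    a11, a12 = 2 * (p2[0] - p1[0]), 2 * (p2[1] - p1[1])
    a21, a22 = 2 * (p3[0] - p1[0]), 2 * (p3[1] - p1[1])
    b1 = r1 ** 2 - r2 ** 2 + p2[0] ** 2 - p1[0] ** 2 + p2[1] ** 2 - p1[1] ** 2
    b2 = r1 ** 2 - r3 ** 2 + p3[0] ** 2 - p1[0] ** 2 + p3[1] ** 2 - p1[1] ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        # Collinear anchors violate the general-position assumption.
        raise ValueError("anchors are collinear")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)
```

The determinant vanishes exactly when the three anchors are collinear, which is the case excluded by the general-position assumption.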
We can now present a proposition.
Any graph arising from Algorithm 1 is globally rigid in the plane if the nodes are in general position in the plane.
Define a reference frame where points 1, 2 and have specific coordinates (we say that the points are “determined”). We will show that all points then have determined positions in and therefore have determined relative distances, which by definition implies that the graph is globally rigid.
We proceed by contradiction: assume there exists at least one undetermined point in the graph. Then we must have an undetermined point such that and are determined (since points 1 and 2 are determined). By virtue of lemma 4, points and must then be also undetermined (otherwise point would have determined distances from 3 determined points and as a result would be determined).
Let us now assume that only points are undetermined. Then the only possible realizations for points , and are their reflections with respect to the straight line which passes through points and , since these are the only possible realizations that maintain the rigidity of the triangles , since and are assumed fixed. However, since and are also fixed by assumption, this would break the rigidity of triangles and . Therefore cannot be determined. This can then be considered as the base case in an induction argument which goes as follows. Assume only are undetermined. Then, by reflecting these points over the line that joins and (which are fixed by assumption), we obtain the only other possible realization consistent with the rigidity of the triangles who have all their vertices in . However, this realization is inconsistent with the rigidity of triangles and , therefore must not be determined and by induction any point such that must not be determined, which contradicts the assumption that is determined. As a result, the assumption that there is at least one undetermined point in the graph is false. This implies that the graph has all points determined in , and therefore all relative distances are determined and by definition the graph is globally rigid. This proves the statement.∎
Although we have shown that graphs are globally rigid, notice that they are not chordal. For the graph in Figure 2, the cycles and have no chord. Moreover, triangulating this graph in order to make it chordal will necessarily increase (to at least 4) the maximal clique size (which is not sufficient for our purposes since we arrive at the case of ).
Instead, consider the clique graph formed by . If there are nodes, the clique graph will have cliques . This clique graph forms a cycle, which is depicted in Figure 3. (Note that if we connected every clique whose nodes intersected, the clique graph would no longer form a cycle; here we have only formed enough connections so that the intersection of any two cliques is shared by the cliques on at least one path between them, similar to the running intersection property for Junction Trees.)
We now draw on results first obtained by Weiss , and confirmed elsewhere . There it is shown that, for graphical models with a single cycle, belief propagation converges to the optimal MAP assignment, although the computed marginals may be incorrect. Note that for our purposes, this is precisely what is needed: we are after the most likely joint realization of the set of random variables, which corresponds to the best match between the template and the scene point patterns. Max-product belief propagation  in a cycle graph like the one shown in Figure 3 amounts to computing the following messages, iteratively:
where is the set of singleton variables in clique node , the potential function for clique node and the message passed from clique node to clique node . Upon reaching the convergence monitoring threshold, the optimal assignment for singleton variable in clique node is then computed by .
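To make the iterative message updates concrete, the following sketch runs max-product belief propagation on a simplified model: a cycle of nodes with pairwise potentials, rather than the clique graph with triple-node cliques used in this paper. All names (`max_product_cycle`, `planted_cycle_potentials`, `brute_force_map`) are hypothetical. The test potentials are built so that one "planted" assignment attains the maximal value 1 on every edge, which mimics the exact-matching case; the sketch does not demonstrate convergence for arbitrary potentials, which rests on the cited results.

```python
import itertools
import random


def normalize(message):
    """Rescale a message so that its largest entry is 1 (avoids underflow)."""
    top = max(message)
    return [value / top for value in message]


def max_product_cycle(psi, iterations=10):
    """Max-product on a cycle; psi[i][a][b] couples node i (state a)
    with node (i + 1) % n (state b). Returns the per-node belief modes."""
    n, m = len(psi), len(psi[0])
    fwd = [[1.0] * m for _ in range(n)]  # fwd[i]: message from node i-1 to i
    bwd = [[1.0] * m for _ in range(n)]  # bwd[i]: message from node i+1 to i
    for _ in range(iterations):
        new_fwd = [
            [max(psi[(i - 1) % n][a][b] * fwd[(i - 1) % n][a] for a in range(m))
             for b in range(m)]
            for i in range(n)
        ]
        new_bwd = [
            [max(psi[i][a][b] * bwd[(i + 1) % n][b] for b in range(m))
             for a in range(m)]
            for i in range(n)
        ]
        fwd = [normalize(msg) for msg in new_fwd]
        bwd = [normalize(msg) for msg in new_bwd]
    beliefs = [[fwd[i][s] * bwd[i][s] for s in range(m)] for i in range(n)]
    return [max(range(m), key=belief.__getitem__) for belief in beliefs]


def planted_cycle_potentials(n, m, planted, rng):
    """Random potentials in [0.1, 0.5], except value 1 along `planted`."""
    psi = [[[rng.uniform(0.1, 0.5) for _ in range(m)] for _ in range(m)]
           for _ in range(n)]
    for i in range(n):
        psi[i][planted[i]][planted[(i + 1) % n]] = 1.0
    return psi


def brute_force_map(psi):
    """Exhaustive MAP search, feasible only for tiny instances."""
    n, m = len(psi), len(psi[0])

    def score(x):
        product = 1.0
        for i in range(n):
            product *= psi[i][x[i]][x[(i + 1) % n]]
        return product

    return list(max(itertools.product(range(m), repeat=n), key=score))
```

On such planted instances the modes of the beliefs coincide with the exhaustive MAP assignment, at a cost per iteration that grows with the square of the number of states instead of exponentially with the number of nodes.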
Unfortunately, the above result is only shown in  when the graph itself forms a cycle, whereas we only have that the clique graph forms a cycle. However, it is possible to show that the result still holds in our case, by considering a new graphical model in which the cliques themselves form the nodes, whose cliques are now just the edges in the clique graph. The result from  can now be used to prove that belief propagation in this graph converges to the optimal MAP assignment, which (by appropriately choosing potential functions for the new graph), implies that belief propagation should converge to the optimal solution in the original graph also.
To demonstrate this, we need not only show that belief propagation in the new model converges to the optimal assignment, but also that belief propagation in the new model is equivalent to belief propagation in the original model.
The original clique graph (Figure 3) can be transformed into a model containing only pairwise potentials, whose optimal MAP assignment is the same as the original model’s.
Consider a clique “node” (in the original graph), whose neighbors share exactly two of its nodes (for instance ). Where the domain for each node in the original graph was simply , the domain for each “node” in our new graph simply becomes .
In this setting, it is no longer possible to ensure that the assignment chosen for each “node” is consistent with the assignment to its neighbor – that is, for an assignment to , and to , we cannot guarantee that , or . Instead, we will simply define the potential functions on this new graph in such a way that the optimal MAP assignment implicitly ensures this equality. Specifically, we shall define the potential functions as follows: for two cliques and in the original graph (which share two nodes, say and ), define the pairwise potential for the clique () in the new graph as follows:
Where is simply the clique potential for the clique in the original graph; (sim. for ). That is, we are setting the pairwise potential to simply be the original potential of one of the cliques if the assignments are compatible, and otherwise. If we were able to set , we would guarantee that the optimal MAP assignment was exactly the optimal MAP assignment in the original graph – however, this is not possible, since the result of  only holds when the potential functions have finite dynamic range. Hence we must simply choose sufficiently small so that the optimal MAP assignment cannot possibly contain an incompatible match – it is clear that this is possible, for example will do.
The result of  now implies that belief propagation in this graph will converge to the optimal MAP assignment, which we have shown is equal to the optimal MAP assignment in the original graph. ∎
The messages passed in the new model are equivalent to the messages passed in the original model, except for repetition along one axis.
We use induction on the number of iterations. First, we must show that the outgoing messages are the same during the first iteration (during which the incoming messages are not included). We will denote by the message from to during the iteration:
This result only holds due to the fact that will never be chosen when maximizing along any axis. We now have that the messages are equal during the first iteration (the only difference being that the message for the new model is repeated along one axis). (To be completely precise, the message for the new model is actually a function of only a single variable – . By “repeated along one axis”, we mean that for any given , the message at this point is independent of , which therefore has no effect when maximizing.) Next, suppose during the iteration, the messages (for both models) are equal to . Then for the iteration we have:
Hence the two message passing schemes are equivalent by induction. ∎
We can now state our main result:
Let be a graph generated according to the procedure described in Algorithm 1. Assume that there is a perfect isometric instance of within the scene point pattern . Then the MAP assignment obtained by running belief propagation over the clique graph derived from G is such that .
For the exact matching case, we simply set in (2). Now, for a graph given by Algorithm 1, the clique graph will be simply a cycle, as shown in Figure 3, and following propositions 6 and 7 as well as the already mentioned result from , belief propagation will find the correct MAP assignment , i.e.
is the probability distribution for the graphical model induced by the graph. Now, we need to show that also maximizes the criterion which ensures isometry, i.e. we need to show that the above implies
where is the probability distribution of the graphical model induced by the complete graph. Note that must be such that the lengths of the edges in are precisely equal to the lengths of the edges in (i.e. the edges induced in from by the map ). By the global rigidity of , the lengths of must then be also precisely equal to the lengths of . This implies that . Since (10) can be expanded as
it becomes clear that will also maximize (10). This proves the statement.∎
The parameters used in our experiments are as follows:
– this parameter controls the noise level used in our model. Here we apply Gaussian noise to each of the points in (with standard deviation in each axis). We have run our experiments on a range of noise levels between and (where the original points in are chosen randomly between and ). Note that this is the same as the setting used in .
Dynamic range – as mentioned in section 2, the potential function is simply the product of for all edges in (here each maximal clique contains 3 edges). The dynamic range of a function is simply defined as its maximum value divided by its minimum value (i.e. ). In order to prove convergence of our model, it is necessary that the dynamic range of our potential function is finite . Therefore, rather than using directly, we use . This ensures that the dynamic range of our model is no larger than , and that as . In practice, we found that varying this parameter did not have a significant effect on convergence time. Hence we simply fixed a large finite value () throughout.
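The exact transformation used for this purpose is not reproduced in the text above. As a sketch under that caveat, the affine map below is one transformation with both stated properties, assuming the original potential takes values in [0, 1]: its dynamic range is at most `d`, and it tends to the original potential as `d` grows. The function name `bound_dynamic_range` is hypothetical.

```python
def bound_dynamic_range(psi, d):
    """Map a potential value psi in [0, 1] into [1/d, 1].

    Assumption of this sketch: an affine rescaling. The maximum of the
    result is 1 and the minimum is 1/d, so the ratio max/min is at most d.
    """
    if d <= 1:
        raise ValueError("dynamic range bound d must exceed 1")
    return 1.0 / d + (1.0 - 1.0 / d) * psi
```

With a large finite `d` the rescaled potential is numerically almost identical to the original one, which is consistent with the observation that varying this parameter had little effect on convergence time.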
MSE-cutoff – in order to determine the point at which belief propagation has converged, we compute the marginal distribution of every clique, and compare it to the marginal distribution after the previous iteration. Belief propagation is terminated when the mean-squared error between the two is less than a certain cutoff value for every clique in the graph. When choosing the mode of the marginal distributions after convergence, if two values differ by less than the square root of this cutoff, both of them are considered as possible MAP-estimates (although this was rarely an issue when the cutoff was sufficiently small). We found that as increased, the mean squared error between iterations tended to be smaller, and therefore that smaller cutoff values should be used in these instances. Indeed, although the number of viable matches increases as increases, the distributions increase in sparsity at an even faster rate – hence the distributions tend to be less peaked on average, and changes are likely to have less effect on the mean squared error. Hence we decreased the cutoff values by a factor of 10 when . (Note that this is not a parameter in the Junction Tree model, in which only a single iteration is ever required.)
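The stopping rule and the tie-handling described here can be sketched as follows. The function names, the flat-list representation of each clique's marginal, and the `min_iterations` guard (which mirrors the minimum of 5 iterations enforced in the experiments) are choices made for this illustration rather than details of the original implementation.

```python
import math


def mean_squared_error(p, q):
    """Mean squared difference between two equally sized distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def has_converged(prev_beliefs, beliefs, cutoff, iteration, min_iterations=5):
    """True once every clique's marginal changed by less than `cutoff`.

    Each element of prev_beliefs / beliefs is one clique's marginal,
    flattened to a list. Early iterations never count as converged.
    """
    if iteration < min_iterations:
        return False
    return all(mean_squared_error(p, q) < cutoff
               for p, q in zip(prev_beliefs, beliefs))


def candidate_modes(belief, cutoff):
    """States whose belief is within sqrt(cutoff) of the maximum."""
    best = max(belief)
    tolerance = math.sqrt(cutoff)
    return [s for s, value in enumerate(belief) if best - value < tolerance]
```

Returning several candidate modes when beliefs are nearly tied keeps the decision consistent with the precision to which convergence was actually checked.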
The clique graph in which messages are passed by our belief propagation algorithms is exactly that shown in Figure 3. It is worth noting, however, that we also tried running belief propagation using a clique graph in which messages were passed between all intersecting cliques; we found that this made no difference to the performance of the algorithm, and we have therefore restricted our experiments to the clique graph of Figure 3 because of its optimality guarantees. (There was one slight difference: including the additional edges appears to provide convergence in fewer iterations. However, since the number of messages being passed is doubled, the overall running time for both clique graphs was ultimately similar.)
For the sake of running-time comparison, we implemented the proposed model, as well as the Junction Tree model, using the Elefant belief propagation libraries in Python (http://elefant.developer.nicta.com.au/). However, to ensure that the results presented are consistent with those of the earlier work, we simply used code that its authors provided when reporting the matching accuracy of their model.
Figures 5 and 6 show the running time and matching accuracy (respectively) of our model, as we vary the mean-squared error cutoff. Obviously, it is necessary to use a sufficiently low cutoff in order to ensure that our model has converged, but choosing too small a value may adversely affect its running time. We found that the mean-squared error varied largely during the first few iterations, and we therefore enforced a minimum number of iterations (here we chose at least 5) in order to ensure that belief propagation was not terminated prematurely. Figure 5 reveals that the running time is not significantly altered when increasing the MSE-cutoff, which indicates that the model has almost always reached the lower cutoff value after 5 iterations (in which case we should expect a speed-up of precisely ). Furthermore, decreasing the MSE-cutoff does not significantly improve the matching accuracy for larger point sets (Figure 6), so choosing the lower cutoff does little harm if running time is a major concern. By comparison, the Junction Tree model (which only requires a single iteration), took (for to ), 3, 44, 250, and 1031 seconds respectively. These models differ only in the topology of the network (see section 3), and the size of the messages being passed; our method easily achieves an order of magnitude improvement for large networks. (In fact, the speed-up appears to be more than an order of magnitude for the large graphs, which is likely a side effect of the large memory requirements of the Junction Tree algorithm.)