Distributed methods for synchronization of orthogonal matrices over graphs

01/25/2017 ∙ by Johan Thunberg, et al. ∙ University of Luxembourg

This paper addresses the problem of synchronizing orthogonal matrices over directed graphs. For synchronized transformations (or matrices), composite transformations over loops equal the identity. We formulate the synchronization problem as a least-squares optimization problem with nonlinear constraints. The synchronization problem appears as one of the key components in applications ranging from 3D-localization to image registration. The main contributions of this work can be summarized as the introduction of two novel algorithms: one for symmetric graphs and one for graphs that are possibly asymmetric. Under general conditions, the former has guaranteed convergence to the solution of a spectral relaxation to the synchronization problem. The latter is stable for small step sizes when the graph is quasi-strongly connected. The proposed methods are verified in numerical simulations.


1 Introduction

This paper introduces two new distributed algorithms for the problem of synchronizing orthogonal matrices over graphs. Synchronization means that compositions of transformations (multiplications of matrices) over loops in the graph equal the identity matrix [1, 2, 3, 4]. Thus, “synchronization” does not refer to the related concepts of consensus [5] or rendezvous, e.g., attitude synchronization [6]. We formulate the problem as a nonlinear least-squares optimization with matrix variables [7, 8]. For symmetric communication topologies we provide an algorithm with strong convergence guarantees – the solution converges to the optimal solution of a spectral relaxation, which in turn is known to produce near-optimal solutions. For graphs that are possibly asymmetric we provide an algorithm with weaker convergence guarantees but good performance in numerical simulations.

The synchronization problem appears as one of the key components in the following applications: the 3D-localization problem, where the transformations are obtained from camera measurements; the generalized Procrustes problem, where scales, rotations, and translations are calculated between multiple point clouds [9]; the image registration problem, where transformations are calculated between multiple images [10]. Due to sensor and communication limitations, there is often a need to use distributed protocols for the 3D-localization problem and several approaches have been proposed recently [11, 12, 13]. There are also many other interesting applications for the synchronization problem, see Section 1.2 in [14].

If we exclude the requirement that the synchronization method shall be distributed, there is an extensive body of work. Govindu et al. have presented several approaches based on Lie-group averaging, where a first-order approximation in the tangent space is used [15, 16, 17]. Singer et al. have presented several optimization approaches [1, 2, 3, 18, 19, 20, 21]. Pachauri et al. have addressed the special case where the matrices are permutation matrices [22]. In [3], three types of relaxations of the problem are presented: semidefinite programming relaxation (see [14] for an extensive analysis of this approach); spectral relaxation; and least unsquared deviation in combination with semidefinite relaxation. These three relaxations were evaluated in a probabilistic framework where the error to the ground truth was calculated in numerical experiments. The simulations showed that the first two approaches were on par, whereas the last approach performed slightly better. Furthermore, the last approach was significantly more robust to outliers. The first distributed algorithm we present has a connection to the second of the three relaxations above, since the matrices in the algorithm converge to the optimal solution of the spectral relaxation. Our methods are extrinsic, in the sense that the matrices are computed in an ambient Euclidean matrix space and then projected onto the set of orthogonal matrices. The opposite of extrinsic methods are intrinsic methods, where no projections from an ambient space occur. In [23], intrinsic gradient descent methods are studied for the problem of finding the Riemannian center of mass.

The contributions of this work can be summarized as the introduction of two novel algorithms (Algorithms 1 and 2) for distributed synchronization of orthogonal matrices over directed graphs. For both algorithms we provide conditions for guaranteed convergence. The main result of the paper is the above-mentioned convergence in Algorithm 1 to the optimal solution of the spectral relaxation problem (Proposition 4.2.2). Previous works in the context of distributed algorithms have focused on undirected graphs and 3D rotations [11, 12, 13]. In this work, however, we consider directed graphs and arbitrary dimensions. It should be noted that some of the existing algorithms can be extended to higher dimensions and are given for the 3D case mostly for clarity of exposition.

The distributed approaches in this work bear a resemblance to linear consensus protocols [24, 25, 26, 27]. The methods also share similarities with the eigenvector method in [28] and with gossip algorithms [29]. The important states in our algorithms are matrices, which combined converge to a tall matrix whose range space is a certain linear subspace. In the case of symmetric communication between agents, the proposed method can be interpreted as an extension of either the power method or the steepest descent method. In our methods, instead of the graph Laplacian matrix [24], matrices similar to the graph connection Laplacian matrix [30] are used. These matrices can be seen as generalizations of the graph Laplacian matrix, in which the scalars are replaced by matrix blocks.
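To make the block structure concrete, the following sketch builds a connection-Laplacian-style matrix in which each scalar entry of the ordinary graph Laplacian is replaced by a $d \times d$ block. This is a generic construction under assumed conventions (the helper names, weighting, and block placement are hypothetical); the matrix actually used in this paper is defined later, in Section 3, and may differ in details.

```python
import numpy as np

def connection_laplacian(n, d, edges, G_pair, weights=None):
    """Sketch: each scalar of the graph Laplacian becomes a d x d block.
    Diagonal block i accumulates w_ij * I_d over outgoing edges; the
    off-diagonal block (i, j) is -w_ij * G_ij."""
    weights = weights if weights is not None else {e: 1.0 for e in edges}
    L = np.zeros((n * d, n * d))
    for (i, j) in edges:
        w = weights[(i, j)]
        L[i * d:(i + 1) * d, i * d:(i + 1) * d] += w * np.eye(d)
        L[i * d:(i + 1) * d, j * d:(j + 1) * d] -= w * G_pair[(i, j)]
    return L
```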

The paper proceeds as follows. In Section 2 we introduce the definitions that are necessary in order to precisely state the problem, which is done in Section 3. Subsequently, the distributed method for the case of symmetric graphs (Algorithm 1) is introduced and analyzed in Section 4. In Section 5, the distributed method for the case of directed and possibly asymmetric graphs (Algorithm 2) is introduced and analyzed. Section 6 concludes the paper.

2 Preliminaries

2.1 Directed Graphs

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a directed graph, where $\mathcal{V} = \{1, 2, \ldots, n\}$ is the node set and $\mathcal{E} \subset \mathcal{V} \times \mathcal{V}$ is the edge set. Throughout the paper, the notation $\mathcal{A} \subset \mathcal{B}$ means that every element in $\mathcal{A}$ is contained in $\mathcal{B}$. The set $\mathcal{N}_i$ is the set of neighboring nodes of node $i$, defined by

$$\mathcal{N}_i = \{j \in \mathcal{V} : (i, j) \in \mathcal{E}\}. \quad (1)$$

The adjacency matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ for the graph is defined by

$$a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in \mathcal{E}, \\ 0 & \text{otherwise}. \end{cases} \quad (2)$$

The graph Laplacian matrix is defined by

$$L = \mathrm{diag}(A \mathbf{1}_n) - A, \quad (3)$$

where $\mathbf{1}_n$ is a vector with all entries equal to $1$. In order to emphasize that the adjacency matrix $A$, the graph Laplacian matrix $L$, and the sets $\mathcal{N}_i$ depend on the graph $\mathcal{G}$, we may write $A(\mathcal{G})$, $L(\mathcal{G})$, and $\mathcal{N}_i(\mathcal{G})$, respectively. For simplicity, however, we mostly omit this notation and simply write $A$, $L$, and $\mathcal{N}_i$.
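For concreteness, these standard objects can be computed directly from an edge list. The following snippet uses an arbitrary example graph (not one from the paper) together with the definitions above.

```python
import numpy as np

# Example directed graph on n = 4 nodes; edges are (i, j) pairs.
n = 4
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 0)]

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = 1.0                    # adjacency matrix, cf. (2)

L = np.diag(A @ np.ones(n)) - A      # graph Laplacian, cf. (3)
print(L)
```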

(connected graph, undirected path)
The directed graph $\mathcal{G}$ is connected if there is an undirected path from any node in the graph to any other node. An undirected path is defined as a (finite) sequence of unique nodes such that for any pair $(i, j)$ of consecutive nodes in the sequence it holds that $(i, j) \in \mathcal{E}$ or $(j, i) \in \mathcal{E}$.

(quasi-strongly connected graph, center, directed path)
The directed graph $\mathcal{G}$ is quasi-strongly connected (QSC) if it contains a center. A center is a node in the graph to which there is a directed path from any other node in the graph. A directed path is defined as a (finite) sequence of unique nodes such that any pair of consecutive nodes in the sequence comprises an edge in $\mathcal{E}$.

(strongly connected graph)
The directed graph $\mathcal{G}$ is strongly connected if for all pairs of nodes $i, j \in \mathcal{V}$, there is a directed path from $i$ to $j$.

(symmetric graph)
The directed graph $\mathcal{G}$ is symmetric if

$$(i, j) \in \mathcal{E} \iff (j, i) \in \mathcal{E}. \quad (4)$$

Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the graph $\bar{\mathcal{G}} = (\mathcal{V}, \bar{\mathcal{E}})$ is the graph constructed by reversing the direction of the edges in $\mathcal{G}$, i.e., $(i, j) \in \bar{\mathcal{E}}$ if and only if $(j, i) \in \mathcal{E}$. It is easy to see that

$$A(\bar{\mathcal{G}}) = A(\mathcal{G})^{\mathsf{T}}. \quad (5)$$

2.2 Synchronization or transitive consistency of matrices

The set of invertible matrices in $\mathbb{R}^{d \times d}$ is denoted by $\mathcal{GL}(d)$, and the group of orthogonal matrices in $\mathbb{R}^{d \times d}$ is

$$\mathcal{O}(d) = \{Q \in \mathbb{R}^{d \times d} : Q^{\mathsf{T}} Q = Q Q^{\mathsf{T}} = I_d\}. \quad (6)$$

The set $\mathcal{SO}(d)$ comprises those matrices in $\mathcal{O}(d)$ whose determinants are equal to $1$.

(transitive consistency)

  1. The matrices in the collection $\{G_{ij}\}_{(i,j) \in \mathcal{V} \times \mathcal{V}}$ of matrices in $\mathcal{GL}(d)$ are transitively consistent for the complete graph if

    $$G_{ij} G_{jk} = G_{ik} \quad (7)$$

    for all $i$, $j$, and $k$.

  2. Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the matrices in the collection $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ of matrices in $\mathcal{GL}(d)$ are transitively consistent for $\mathcal{G}$ if there is a collection $\{\bar{G}_{ij}\}_{(i,j) \in \mathcal{V} \times \mathcal{V}}$ such that $G_{ij} = \bar{G}_{ij}$ for all $(i, j) \in \mathcal{E}$ and $\{\bar{G}_{ij}\}_{(i,j) \in \mathcal{V} \times \mathcal{V}}$ is transitively consistent for the complete graph.

If it is apparent from the context, we will sometimes be less strict and omit mentioning which graph a collection of transformations is transitively consistent for. Another word for transitive consistency is synchronization; we will use the two interchangeably. A sufficient condition for synchronization of the $G_{ij}$-matrices for any graph is that there is a collection $\{G_i\}_{i \in \mathcal{V}}$ of matrices in $\mathcal{GL}(d)$ such that

$$G_{ij} = G_i G_j^{-1} \quad (8)$$

for all $(i, j) \in \mathcal{E}$. Lemma 2.2 below and the proof thereof provide additional important information. The result is similar to that in [12]. For the statement of the lemma, the following definition is needed.

Two collections $\{G_i\}_{i \in \mathcal{V}}$ and $\{\bar{G}_i\}_{i \in \mathcal{V}}$ of matrices in $\mathcal{GL}(d)$ are equal up to transformation from the left if there is $Q \in \mathcal{GL}(d)$ such that

$$\bar{G}_i = Q G_i \quad \text{for all } i \in \mathcal{V}. \quad (9)$$

For any graph $\mathcal{G}$ and collection $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ of matrices in $\mathcal{GL}(d)$ that are transitively consistent for $\mathcal{G}$,

  1. there is a collection $\{G_i\}_{i \in \mathcal{V}}$ of matrices in $\mathcal{GL}(d)$ such that

    $$G_{ij} = G_i G_j^{-1} \quad \text{for all } (i, j) \in \mathcal{E}, \quad (10)$$

  2. all collections $\{G_i\}_{i \in \mathcal{V}}$ satisfying (10) are equal up to transformation from the left if and only if $\mathcal{G}$ is connected,

  3. there is a unique collection $\{\bar{G}_{ij}\}_{(i,j) \in \mathcal{V} \times \mathcal{V}}$ of transitively consistent matrices for the complete graph if and only if all collections $\{G_i\}_{i \in \mathcal{V}}$ satisfying (10) are equal up to transformation from the left.

Proof: See [31].

Another equivalent definition of transitive consistency or synchronization is given in [12, 32]: a set of transformations is transitively consistent if the composite transformations equal the identity along loops or cycles in the graph. In Proposition 7 in [12], the equivalence between this loop condition and (10) is shown. The definition using the auxiliary $G_i$-matrices, (10), is the one we will mostly use in our analysis.
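As a small numerical illustration of this loop characterization (a hypothetical example, not from the paper): if reference transformations $G_i$ are drawn from $\mathcal{O}(d)$ and the pairwise transformations are set to $G_{ij} = G_i G_j^{-1}$, then composing them around any cycle returns the identity.

```python
import numpy as np

def random_orthogonal(d, rng):
    # The Q factor of a Gaussian matrix is an orthogonal matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

rng = np.random.default_rng(0)
d = 3
G = {i: random_orthogonal(d, rng) for i in range(3)}

# Transitively consistent pairwise transformations: G_ij = G_i G_j^{-1}
# (for orthogonal G_j, the inverse equals the transpose).
G_pair = {(i, j): G[i] @ G[j].T for i in G for j in G if i != j}

# Composition around the loop 0 -> 1 -> 2 -> 0 equals the identity.
loop = G_pair[(0, 1)] @ G_pair[(1, 2)] @ G_pair[(2, 0)]
print(np.allclose(loop, np.eye(d)))  # True up to numerical precision
```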

3 Problem formulation

The optimization problem of interest is given by

$$\min_{\{G_i\}_{i \in \mathcal{V}} \subset \mathcal{O}(d)} \; \sum_{(i,j) \in \mathcal{E}} w_{ij} \, \| G_{ij} - G_i G_j^{-1} \|_F^2,$$

where the $w_{ij}$'s are positive scalar weights, the set $\mathcal{E}$ is the edge set of a connected directed graph $\mathcal{G}$, and the matrices in the collection $\{G_i\}_{i \in \mathcal{V}}$ belong to $\mathcal{O}(d)$. The objective function comprises the weighted element-wise sum of squared differences between the $G_{ij}$-matrices and the $G_i G_j^{-1}$-matrices. The problem is similar to the problem in [3]. The differences are that we allow for directed graphs (instead of undirected graphs) and we do not require the $G_{ij}$-matrices to be orthogonal.
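The objective can be read directly as a weighted Frobenius-norm cost over the edges. The sketch below evaluates it for given dictionaries of measured matrices, estimates, and weights (hypothetical helper names, assuming the $G_{ij} \approx G_i G_j^{-1}$ convention used above).

```python
import numpy as np

def synchronization_cost(G_pair, G, weights, edges):
    """Weighted sum of squared Frobenius differences between the measured
    G_ij and the relative transformations G_i G_j^{-1} induced by {G_i}."""
    cost = 0.0
    for (i, j) in edges:
        residual = G_pair[(i, j)] - G[i] @ np.linalg.inv(G[j])
        cost += weights[(i, j)] * np.linalg.norm(residual, "fro") ** 2
    return cost
```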

The overall problem addressed in this paper is how to design distributed methods that achieve good solutions to this optimization problem.

When the $G_{ij}$-matrices are orthogonal, the objective function can be written as

(11)
(12)

The matrix $W$ will be used frequently in the following. The presented definition of $W$ might seem overly complicated, since it simplifies when the $G_{ij}$-matrices are orthogonal. However, we will also use $W$ when the $G_{ij}$-matrices are not orthogonal. In that case it is important to note that $W$ is given by the definition in (13) below and not by (11).

The matrix $W$ is defined as

(13)

where the element-wise multiplication operator in the second term is understood in the block-matrix sense, i.e., each scalar entry of the weighted adjacency matrix multiplies the corresponding matrix block. The matrix $A$ is now, compared to Section 2, a weighted adjacency matrix of $\mathcal{G}$. In the following, $A$ will always be defined in this way. These matrices are de facto functions of the graph $\mathcal{G}$, the weights, and either the $G_{ij}$-matrices or the $G_i$-matrices. However, unless it is absolutely necessary, we will not show this dependence explicitly.

4 Symmetric Graphs

In this section we introduce Algorithm 1. It is the proposed distributed algorithm for synchronization over symmetric graphs. A detailed analysis of Algorithm 1 will be conducted in Section 4.2.

4.1 The algorithm

There are four matrices that can be seen as the output of the algorithm at each iteration. The procedure for calculating the primary iterates is similar to a gradient descent procedure and can also be seen as the power method. A second family of matrices consists of the projections of these iterates onto $\mathcal{O}(d)$.

For all $i \in \mathcal{V}$, the main iterate and the corresponding projection are calculated from auxiliary variables. The most important of these auxiliary variables are calculated in a distributed manner. The protocol for calculating them is similar to a well-known average consensus protocol, but differs by an extra term. This extra term makes the states converge not to the averages of the initial conditions, but to the averages of certain converging sequences. The idea is to modify the iterates in such a way that the modified matrices converge to the matrices in the optimal solution to a spectral relaxation of the synchronization problem (this relaxation is defined in Section 4.2.2).

Algorithm 1 Distributed method for symmetric graphs

Inputs: a symmetric directed graph $\mathcal{G}$, a weight matrix, and a collection $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ of matrices in $\mathcal{O}(d)$.

Outputs: , , , for and .

Initialization: let , , , and for all . Let and for all . Let .

Iteration $k$:
for all $i \in \mathcal{V}$, let

where the projection operator is the least-squares projection onto $\mathcal{O}(d)$, computed by means of a Singular Value Decomposition (SVD).


 

The auxiliary scaling variables provide a way of creating new matrices by re-scaling the columns of the iterates. This re-scaling is necessary to obtain the desired convergence. These re-scaled matrices are then projected onto $\mathcal{O}(d)$. Under general conditions, the projections converge to the projections of the matrices in the optimal solution to the spectral relaxation of the synchronization problem.
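For reference, the SVD-based projection mentioned above can be sketched as follows: if $Y = U \Sigma V^{\mathsf{T}}$, then $U V^{\mathsf{T}}$ is the closest orthogonal matrix to $Y$ in the Frobenius norm. This is a generic sketch of the projection onto $\mathcal{O}(d)$, not necessarily the paper's exact implementation.

```python
import numpy as np

def project_onto_orthogonal(Y):
    # Least-squares (Frobenius-norm) projection of Y onto the orthogonal group:
    # with Y = U diag(s) Vt, the closest orthogonal matrix is U @ Vt.
    U, _, Vt = np.linalg.svd(Y)
    return U @ Vt
```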

  Subroutine 1 Calculation of and  

Inputs: .

Outputs: , .

  1. If  is not invertible, or if  is invertible and it does not hold that the eigenvalues of  are distinct, real, and positive: let  and  for all .

  2. Else using eigenvalue decomposition, compute

    (14)

    Let .

    Let where and each .

 

Let for all . The update for is given by

(15)

4.2 Analysis

In this section we show how the matrices produced by Algorithm 1 relate to the synchronization problem. We will provide conditions for well-posedness and convergence.

4.2.1 Some properties of $W$

For the analysis of Algorithm 1 we first provide an alternative definition of transitive consistency, formulated in terms of the $W$-matrix. To be more precise, in Proposition 4.2.1 we state that for the general case of invertible matrices, transitive consistency is equivalent to the $W$-matrix having a $d$-dimensional nullspace. In other words, there are no collections of matrices that are not transitively consistent for which the $W$-matrix has a nullspace of dimension $d$. This motivates the choice of the quadratic form associated with $W$ as the objective function in an optimization problem for synchronization of matrices.

For collections $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ of matrices in $\mathcal{GL}(d)$ and a graph $\mathcal{G}$ that is connected, it holds that

$$\dim(\mathcal{N}(W)) \le d, \quad (16)$$

where $\mathcal{N}(W)$ denotes the nullspace of $W$, with equality if and only if transitive consistency holds.

Before we provide the proof of Proposition 4.2.1, we provide the following lemma and the proof thereof. For any connected graph $\mathcal{G}$ and collection $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ of matrices in $\mathcal{GL}(d)$, the collection is transitively consistent for $\mathcal{G}$ if and only if there is a collection $\{G_i\}_{i \in \mathcal{V}}$ of matrices in $\mathcal{GL}(d)$ such that

(17)

Proof: Suppose $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ is transitively consistent; then, according to Lemma 2.2, there is $\{G_i\}_{i \in \mathcal{V}}$ such that (10) holds for the $G_{ij}$-matrices. In this case it holds that

(18)

which implies that (17) is fulfilled by symmetry. On the other hand, if $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ is not transitively consistent, there is no $\{G_i\}_{i \in \mathcal{V}}$ such that (10) holds. It can now be shown that (18) does not hold for any collection of matrices in $\mathcal{GL}(d)$.

Proof of Proposition 4.2.1:
Part 1: Here we assume that $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ is transitively consistent. Due to Lemma 4.2.1, we know that

$$\dim(\mathcal{N}(W)) \ge d. \quad (19)$$

Thus we need to show that the inequality in (19) cannot be strict. Since $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ is transitively consistent, there is $\{G_i\}_{i \in \mathcal{V}}$, where the $G_i$ fulfill (10).

Suppose the inequality (19) is strict. We know that Now there must be a vector , where the are in , such that , and . There must be and such that the -th element of is nonzero. Now, let

and where for . It holds that and . For all , let denote the -th block matrix in . The rest of this part of the proof consists of firstly showing that all the -matrices are invertible and secondly showing that we can use those matrices to formulate a contradictory statement.

It holds that . This is true since it is constructed by taking the identity matrix and replacing the -th column by another vector that has a nonzero -th element. Now, for any it holds that which implies that . Also, for any such that , it holds that which implies that . Now, due to the fact that is connected, an induction argument can be used to show that all the are elements in .

The collection satisfies for all Since , the two collections and are not equal up to transformation from the left. But, since the graph is connected, the two must be equal up to transformation from the left (Lemma 2.2). This is a contradiction. Hence it is a false assumption that the inequality in (19) is strict.

Part 2: Here we show that if $\dim(\mathcal{N}(W)) = d$, then $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ is transitively consistent.

Let be any full rank matrix such that It holds that all the . Let be the -th block matrix in . Since is full rank, there is a collection such that .

Now, for we know that for any it holds that which implies that . Also, for any such that , it holds that

which implies that . Now, due to the fact that is connected, an induction argument can be used to show that for all . But then which together with the fact that is full rank, implies that for all . It holds that Now the desired result follows by application of Lemma 4.2.1.

From Definition 6 and Proposition 4.2.1 we get the following equivalent characterizations of transitive consistency. (equivalent characterizations of transitive consistency)
For a connected graph $\mathcal{G}$ and a collection $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ of matrices in $\mathcal{GL}(d)$, the following three statements are equivalent:

  1. $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ is transitively consistent.

  2. There is a collection $\{G_i\}_{i \in \mathcal{V}}$ of matrices in $\mathcal{GL}(d)$ such that (10) holds.

  3. There is a collection $\{G_i\}_{i \in \mathcal{V}}$ of matrices in $\mathcal{GL}(d)$ such that (17) holds.

According to Corollary 4.2.1, the following holds: for a collection of matrices in $\mathcal{GL}(d)$, the objective attains the value zero if and only if the collection is transitively consistent. This means that minimization of the right-hand side of (11) is an approach to consider even when the $G_{ij}$-matrices are not necessarily orthogonal. This is the approach taken in the first step of a recently published iterative method [33].

With the assurance given by Proposition 4.2.1 that this is a suitable objective function, we now move on to the convergence analysis of Algorithm 1.

4.2.2 Convergence analysis

We begin by introducing a relaxation of the synchronization problem, given by

Let the relaxed problem have an optimal solution, and consider one of these optimal solutions. Under an additional assumption on the weights, the spectral relaxation method in Section 2.2 of [3] is the same as solving this relaxed problem.
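As a hedged sketch of what solving a relaxation of this kind typically amounts to: if the relaxed problem asks for a tall matrix spanning the eigenspace of the $d$ smallest eigenvalues of a symmetric matrix such as the one in (13) (an assumption here; the paper's exact constraint set may differ), it can be solved with a single symmetric eigendecomposition.

```python
import numpy as np

def spectral_relaxation_solution(W, d):
    """Return an (n*d) x d matrix whose columns span the eigenspace of the
    d smallest eigenvalues of the symmetric matrix W; each d x d block can
    then be projected onto the orthogonal group to obtain candidate G_i."""
    eigenvalues, eigenvectors = np.linalg.eigh(W)  # ascending order
    return eigenvectors[:, :d]
```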

Now we provide a list of conditions for convergence, which are recalled in the following propositions. Only a subset of the conditions will be used in each proposition.

(1) $\mathcal{G}$ is connected and symmetric.
(2)  for all .
(3) $\{G_{ij}\}_{(i,j) \in \mathcal{E}}$ is transitively consistent.
(4) The step size satisfies an upper bound, defined in terms of a matrix specified in this condition (see Lemma 4.2.2).
(5) The step size satisfies an upper bound, defined in terms of a matrix specified in this condition.
(6) All $G_i$-matrices in the optimal solution to the relaxed problem are invertible.
(7) The sum of the $G_i$-matrices in the optimal solution to the relaxed problem is invertible.
(8) It holds that $\lambda_d < \lambda_{d+1}$, where $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{nd}$ are the eigenvalues of $W$.
(9) It holds that $\lambda_i \ne \lambda_j$ for $1 \le i < j \le d$, where the $\lambda_i$ are defined in (8) above.
(10) The step size satisfies an upper bound involving the graph Laplacian matrix $L$ of the graph, as defined in Section 2 (not to be confused with $W$).
Table 1: Conditions for convergence.

The conditions (1-3) are fundamental properties that need no further explanation. Conditions (4-5) and (10) are conditions for the step-size determination. These have the property that they scale with the number of nodes in the network. Condition (6) states that all the $G_i$-matrices in the optimal solution to the relaxed problem are invertible, and condition (7) states that the sum of those matrices is invertible. Condition (8) states that the $d$ smallest eigenvalues of $W$ are strictly smaller than the remaining, larger eigenvalues. Condition (9) states that the $d$ smallest eigenvalues of $W$ are distinct.

The following lemma provides a bound for the step size such that the discrete-time system defined by (15) is stable. It is a justification of convergence condition (4).

The largest eigenvalue of the matrix defined in convergence condition (4) is an upper bound for the eigenvalues of $W$ for all graphs and collections satisfying convergence conditions (1) and (2).

Proof: The largest eigenvalue is given by

where is the -dimensional unit sphere. Let , where each . By using the structure of the -function in , one can show that
Now, The set comprises the non-negative real numbers.

Lemma 4.2.2 has the following implication: if the step size in Algorithm 1 is chosen to be smaller than the bound in convergence condition (4), then the system in (15) converges as the iteration number goes to infinity.

Now, unless the $G_{ij}$-matrices are transitively consistent, the nullspace of $W$ has a dimension lower than $d$, and in general it will be zero-dimensional. Thus, the iterates converge to zero. In the case when the $G_{ij}$-matrices are transitively consistent, the iterates converge to a $d$-dimensional subspace in general.

Now we provide a result for the special case when transitive consistency holds. In this case, Algorithm 1 reduces to the first two lines of each iteration, which is de facto the power method. See [28] for a discussion of the power method in a similar context. We provide Proposition 4.2.2 and its proof below for the sake of completeness.
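For orientation, a generic block power iteration (subspace iteration) of the kind alluded to here looks as follows; this is a sketch of the classical method, not the exact update of Algorithm 1, whose step size, iteration matrix, and normalization differ.

```python
import numpy as np

def block_power_iteration(M, d, iters=200, seed=0):
    """Generic block power (subspace) iteration: repeatedly multiply by M and
    re-orthonormalize; converges to the dominant d-dimensional eigenspace
    when there is a gap after the d largest eigenvalues in magnitude."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((M.shape[0], d))
    for _ in range(iters):
        U, _ = np.linalg.qr(M @ U)  # orthonormalize the block of iterates
    return U
```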

Suppose convergence conditions (1-3) and (5-7) hold. Then, for Algorithm 1, there is a positive integer such that the iterate is well defined for all subsequent iterations, and for those iterations it holds that

Proof: Under conditions (2) and (3), it holds that $W$ is similar to the matrix defined in convergence condition (5). This is a consequence of the fact that we can write $W$ as

In the right-hand side above, the notation shall be interpreted in the block-diagonal sense, where the matrices are put as blocks on the diagonal.

Since conditions (1), (2), and (3) hold, we can use Corollary 4.2.1: there is a collection $\{G_i\}_{i \in \mathcal{V}}$ such that statements (1) and (2) in the corollary are fulfilled. Due to this fact and the fact that it holds for all $i$, the projection of the limit onto $\mathcal{O}(d)$ is as stated. Furthermore, the iterate converges as the iteration number goes to infinity (condition (5)). Now, if the matrix in condition (7) is invertible, by the definition of the limit there is a positive integer such that the iterate is well defined for all subsequent iterations. It holds that

(20)
(21)

Convergence conditions (6) and (7) hold. Thus, the matrices in question are invertible. Under these conditions, the projections converge in (21) (see the last paragraph of Proposition 4.2.2 below for details about the convergence of the projections).

In Proposition 4.2.2 it is important to note that, for all $i$, one of the sequences converges to its limit inside $\mathcal{O}(d)$, whereas the other converges to its limit without guarantees of being in $\mathcal{O}(d)$ at each iteration.

Now we take a further step in our analysis of Algorithm 1. We show that when the $G_{ij}$-matrices are not necessarily transitively consistent, we still have a desirable convergence property for the iterates.

Suppose that convergence conditions (1-2), (4), and (6-8) are satisfied. Then, for Algorithm 1, there is a positive integer such that the iterate is well defined for all subsequent iterations, and for those iterations it holds that

(22)

Proof: Since convergence conditions (1), (2), and (4) are fulfilled, we know that the discrete-time system defined in equation (15) is stable.

The columns of this limit matrix are, up to scale and orthogonal transformation, the eigenvectors corresponding to the $d$ smallest eigenvalues of $W$. Now, let us rewrite the iterate in terms of a matrix whose columns are the eigenvectors corresponding to the largest eigenvalues of the iteration matrix. Due to convergence condition (8), there are $d$ eigenvalues that are strictly larger than the other eigenvalues. The corresponding matrices are defined as

(23)
(24)

and the remaining matrices are defined as