Revisiting the Continuity of Rotation Representations in Neural Networks

06/11/2020 ∙ by Sitao Xiang, et al. ∙ University of Southern California

In this paper, we provide some careful analysis of certain pathological behavior of Euler angles and unit quaternions encountered in previous works related to rotation representation in neural networks. In particular, we show that for certain problems, these two representations will provably produce completely wrong results for some inputs, and that this behavior is inherent in the topological property of the problem itself and is not caused by unsuitable network architectures or training procedures. We further show that previously proposed embeddings of SO(3) into higher dimensional Euclidean spaces aimed at fixing this behavior are not universally effective, due to possible symmetry in the input causing changes to the topology of the input space. We propose an ensemble trick as an alternative solution.




1 Introduction

Quaternions and Euler angles have traditionally been used to represent 3D rotations in computer graphics and vision. This tradition is preserved in more recent works where neural networks are employed for inferring or synthesizing rotations, for a wide range of applications such as pose estimation from images, e.g.

Xiang et al. (2018), and skeleton motion synthesis, e.g. Villegas et al. (2018). However, difficulties have been encountered, in that the network seems unable to avoid large rotation estimation errors in certain cases, as reported by Xiang et al. (2018). Attempts have been made to explain this, including arguments that the Euler angle and quaternion representations are not embeddings and are in a certain sense discontinuous Zhou et al. (2019), and arguments from symmetry present in the data Xiang et al. (2018); Saxena et al. (2009). One proposed solution is to use embeddings of SO(3) into ℝ⁵ or ℝ⁶ Zhou et al. (2019). However, we feel that these arguments are mostly based on intuition and empirical results from experiments, while the nature of the problem is topological, an aspect that has not been examined in depth. In this paper we aim to give a more precise characterization of this problem, theoretically prove the existence of high errors, analyze the effect of symmetries, and propose a solution to this problem. In particular:

  • We prove that a neural network converting rotation matrices to quaternions or Euler angles must produce an error of 180° for some input.

  • We prove that symmetries in the input cause embeddings to also produce high errors, and calculate error bounds for each kind of symmetry.

  • We propose the self-selecting ensemble, a method that works well with many different rotation representations, even in the presence of input symmetry.

We further verify our theoretical claims with experiments.

2 Theoretical Results

2.1 Guaranteed Occurrence of High Errors

We first consider a toy problem: given a 3D rotation represented by a rotation matrix, we want to convert it to other rotation representations with neural networks. We will see that, under the very weak assumption that our neural network computes a continuous function, it is provable that for any such network converting rotation matrices to quaternions or Euler angles, there always exist inputs on which the network produces outputs with high error.

When treating quaternions as Euclidean vectors, we identify the quaternion w + xi + yj + zk with the vector (w, x, y, z) ∈ ℝ⁴. We denote the vector dot product between p and q with a dot, as p · q, and quaternion multiplication with juxtaposition, as pq. The quaternion conjugate of q is written q̄, and the norm of q, which is the same for quaternions and vectors, is ‖q‖.

It has been noticed that any function that converts 3D rotation matrices to their corresponding quaternions must exhibit some “discontinuities”, and that this is related to the fact that the two-to-one covering of SO(3) by the unit quaternions admits no continuous right inverse. This has been argued by giving a specific conversion function and finding its discontinuities. Most often, given a rotation matrix R = (rᵢⱼ), if w ≠ 0 we have

    f(R) = (w, x, y, z),   x = (r₃₂ − r₂₃)/(4w),   y = (r₁₃ − r₃₁)/(4w),   z = (r₂₁ − r₁₂)/(4w)

where w = ½√(1 + r₁₁ + r₂₂ + r₃₃). Since the quaternions q and −q give the same rotation, any conversion from rotation matrix to quaternion needs to break ties. The conversion given above breaks ties towards the first coordinate being positive. When it equals zero there need to be additional rules, which are not relevant here. Then discontinuities can be found by taking limits on the “decision boundary”: consider n: [0, 2π] → SO(3) defined by

    n(θ) = [ cos θ   −sin θ   0 ]
           [ sin θ    cos θ   0 ]
           [   0        0     1 ]

That is, n(θ) is the rotation around the z-axis by angle θ. Then f(n(θ)) = (cos(θ/2), 0, 0, sin(θ/2)) when θ ∈ [0, π) and f(n(θ)) = −(cos(θ/2), 0, 0, sin(θ/2)) when θ ∈ (π, 2π]. So, we have

    lim_{θ→π⁻} f(n(θ)) = (0, 0, 0, 1) ≠ (0, 0, 0, −1) = lim_{θ→π⁺} f(n(θ))

Thus f is not continuous at n(π). Since neural networks typically compute continuous functions, such a function cannot be computed by a neural network.
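The jump at the decision boundary can be observed numerically. The following sketch (NumPy; assuming the w > 0 tie-breaking rule just described, with no handling of the w = 0 special case) evaluates the standard conversion on both sides of θ = π:

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def mat_to_quat(R):
    """Standard conversion, breaking ties towards a positive scalar part.
    Assumes the generic case w > 0 (extra rules are needed when w == 0)."""
    w = 0.5 * np.sqrt(max(0.0, 1.0 + R[0, 0] + R[1, 1] + R[2, 2]))
    x = (R[2, 1] - R[1, 2]) / (4.0 * w)
    y = (R[0, 2] - R[2, 0]) / (4.0 * w)
    z = (R[1, 0] - R[0, 1]) / (4.0 * w)
    return np.array([w, x, y, z])

# Approach the decision boundary theta = pi from both sides:
eps = 1e-6
q_below = mat_to_quat(rot_z(np.pi - eps))   # close to ( 0, 0, 0, +1)
q_above = mat_to_quat(rot_z(np.pi + eps))   # close to ( 0, 0, 0, -1)
# The two limits differ by a sign flip: the conversion jumps at theta = pi.
```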

However, we feel that this argument fails to address the problem satisfactorily. Firstly, it pertains to a specific conversion rule. The ties can be broken towards any hemisphere, of which there are infinitely many choices. In fact, the image of f need not be a hemisphere at all. In addition, there is no reason to mandate a specific conversion function for the neural network to fit. We need to prove that even with the freedom of learning its own tie-breaking rule, the neural network cannot learn a correct conversion from rotation matrices to quaternions.

Another shortcoming is that this argument does not give error bounds. Since neural networks can only approximate the conversion anyway, can we get a continuous conversion function if we allow some error, and if so, how large does the margin have to be? Experiments in Zhou et al. (2019) hint at an unavoidable maximum error of 180°, which is the largest possible distance between two 3D rotations. We want to prove this. Now we introduce our first theorem. Let ρ: S³ → SO(3) be the standard conversion from a quaternion to the rotation it represents:

    ρ(w, x, y, z) = [ 1 − 2(y² + z²)   2(xy − wz)       2(xz + wy)     ]
                    [ 2(xy + wz)       1 − 2(x² + z²)   2(yz − wx)     ]
                    [ 2(xz − wy)       2(yz + wx)       1 − 2(x² + y²) ]

and let d(R₁, R₂) denote the distance between two rotations R₁ and R₂, measured as an angle. To reduce cumbersome notation we overload d so that when a quaternion q appears as an argument of d we mean ρ(q). We have d(p, q) = 2 arccos |p · q| (see appendix A).

Theorem 1.

For any continuous function f: S O(3) → S³, there exists a rotation R ∈ SO(3) such that d(f(R), R) = 180°.


Let n be defined as above. f(n(θ)) is continuous in θ. Consider n as a path in SO(3). n(0) = n(2π) = I, so n is a loop.

S³ is a covering space of SO(3), with ρ being the covering map. Let q(θ) = (cos(θ/2), 0, 0, sin(θ/2)); then ρ(q(θ)) = n(θ). By the lifting property of covering spaces (see standard algebraic topology texts, e.g. page 60 of Hatcher (2002)), n lifts to a unique path in S³ starting from (1, 0, 0, 0). That is, there exists a unique continuous function g: [0, 2π] → S³ such that ρ(g(θ)) = n(θ) for all θ and g(0) = (1, 0, 0, 0). It is easy to see that q is that path.

Let h(θ) = f(n(θ)) · q(θ). Then q(2π) = −q(0) and n(2π) = n(0), so h(2π) = −h(0), and h(0)h(2π) ≤ 0. h is continuous in θ. By the intermediate value theorem, there exists θ₀ such that h(θ₀) = 0. So, d(f(n(θ₀)), q(θ₀)) = 2 arccos |f(n(θ₀)) · q(θ₀)| = 2 arccos 0 = 180°.

Now we have found R = n(θ₀) such that the rotations ρ(f(R)) and R differ by a rotation of 180°. ∎

Note that there exist continuous functions mapping a set of Euler angles to a quaternion representing the same rotation. For example, for extrinsic x-y-z Euler angles (α, β, γ), we can get a quaternion of this rotation by multiplying three elemental rotations:

    e(α, β, γ) = (cos(γ/2), 0, 0, sin(γ/2)) (cos(β/2), 0, sin(β/2), 0) (cos(α/2), sin(α/2), 0, 0)

So, as a corollary, we conclude that a continuous function from 3D rotation matrices to Euler angles must likewise produce a large error at some point, for otherwise by composing it with e we would get a continuous function violating theorem 1. Let E be the standard conversion from extrinsic x-y-z Euler angles to rotation matrices, given by E = ρ ∘ e.
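As a sanity check on this composition, the following sketch (NumPy; quaternions ordered (w, x, y, z), Hamilton product; the extrinsic x-y-z convention is an assumption matching the presentation here) builds e(α, β, γ) from the three elemental quaternions, so it can be compared against the matrix product Rz(γ) Ry(β) Rx(α):

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product, components ordered (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_mat(q):
    """rho: the standard conversion from a unit quaternion to a rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def euler_to_quat(a, b, c):
    """e(alpha, beta, gamma): extrinsic x-y-z Euler angles to a quaternion,
    as the product of three elemental rotations (about z, then y, then x,
    read right to left)."""
    qx = np.array([np.cos(a/2), np.sin(a/2), 0.0, 0.0])
    qy = np.array([np.cos(b/2), 0.0, np.sin(b/2), 0.0])
    qz = np.array([np.cos(c/2), 0.0, 0.0, np.sin(c/2)])
    return quat_mul(qz, quat_mul(qy, qx))   # extrinsic x-y-z: R = Rz Ry Rx
```

Since ρ is a homomorphism, quat_to_mat(euler_to_quat(a, b, c)) reproduces the matrix product of the three elemental rotations.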

Corollary 2.

For any continuous function f: SO(3) → ℝ³, there exists a rotation R ∈ SO(3) such that d(E(f(R)), R) = 180°.

Obviously, the same conclusion holds for any possible axis sequence of Euler angles, intrinsic or extrinsic.

It is easy to see that the same type of argument applies to the simpler case of functions from SO(2) to ℝ that try to compute the angle of the rotation: use the top-left 2×2 block of n(θ) as a loop in SO(2), lift it through the covering map θ ↦ n(θ) from ℝ to SO(2), and find θ₀ such that f(n(θ₀)) and θ₀ differ by an odd multiple of 180°.

Theorem 3.

For any continuous function f: SO(2) → ℝ, there exists a rotation R ∈ SO(2) such that the rotation of angle f(R) differs from R by a rotation of angle 180°.

Given that the quaternion representation of 3D rotations is due to the exceptional isomorphism between Spin(3) and the unit quaternions Sp(1), there is no obvious generalization to n-dimensional rotations. We do however discuss analogous results for 4D rotations in appendix D, due to the analogous exceptional isomorphism Spin(4) ≅ Sp(1) × Sp(1).

2.2 The Self-selecting Ensemble

Neural networks are typically continuous (and differentiable) so that gradient based methods can be used for training. Nevertheless, we often employ discontinuous operations at test time, e.g. quantizing a probability distribution into a class label in classification networks. Unfortunately, this does not work for a regression problem with a continuous output space. However, by employing a simple ensemble trick, it is possible to transform this discontinuity of regression into a discontinuity of classification.

Consider again the problem of recovering the rotation angle from a 2D rotation matrix. Given the matrix with entries cos θ, −sin θ, sin θ, cos θ, what we want is exactly θ. There are different ways of choosing the principal value of this multi-valued function, e.g. in (−π, π] or in [0, 2π). Let us distinguish between these two by calling the first one θ₁ and the second one θ₂. They have discontinuities at rotation angles of π and 0 (= 2π), respectively.

Now we can construct two functions that are continuous and compute θ₁ and θ₂ respectively, except near the discontinuities, where they give incorrect values in order to remain continuous. If we make sure that these two “wrong regions” do not overlap, then for any input matrix at least one of the two functions will give the correct rotation angle.

Theorem 4.

There exist continuous functions f₁, f₂: SO(2) → ℝ such that for any rotation R ∈ SO(2), at least one of f₁(R) and f₂(R) gives the correct rotation angle of R.


We give an example of f₁ and f₂ as follows (writing R(θ) for the rotation by angle θ):

    f₁(R(θ)) = θ        for θ ∈ [−π/2, π/2]
    f₁(R(θ)) = π − θ    for θ ∈ (π/2, π]
    f₁(R(θ)) = −π − θ   for θ ∈ [−π, −π/2)

    f₂(R(θ)) = θ        for θ ∈ [π/2, 3π/2]
    f₂(R(θ)) = π − θ    for θ ∈ [0, π/2)
    f₂(R(θ)) = 3π − θ   for θ ∈ (3π/2, 2π)

It can be checked that these functions are continuous and that their wrong regions (|θ| > π/2 for f₁, |θ| < π/2 for f₂) do not overlap. ∎
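Such a pair of functions can be written down directly. A small NumPy sketch, using arctan2 to recover the principal value (this is one valid instance of the construction, not necessarily the paper's exact choice):

```python
import numpy as np

def f1(R):
    """Continuous on SO(2); returns the rotation angle whenever |theta| <= pi/2.
    Deliberately wrong for |theta| > pi/2 so it can stay continuous."""
    t = np.arctan2(R[1, 0], R[0, 0])          # theta in (-pi, pi]
    if t > np.pi / 2:
        return np.pi - t                       # wrong region, keeps continuity
    if t < -np.pi / 2:
        return -np.pi - t                      # wrong region, keeps continuity
    return t

def f2(R):
    """Continuous on SO(2); correct whenever theta in [pi/2, 3*pi/2].
    Deliberately wrong near theta = 0 (= 2*pi)."""
    t = np.arctan2(R[1, 0], R[0, 0]) % (2 * np.pi)   # theta in [0, 2*pi)
    if t < np.pi / 2:
        return np.pi - t                       # wrong region, keeps continuity
    if t > 3 * np.pi / 2:
        return 3 * np.pi - t                   # wrong region, keeps continuity
    return t
```

The correct regions [−π/2, π/2] and [π/2, 3π/2] together cover the whole circle, so one of the two outputs is always correct.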

Since these functions are continuous, they can be approximated by neural networks. On top of these, we can add a classifier that predicts which function would give the correct output for each input. At training time, these functions and the classifier can be trained jointly: the error of the whole ensemble is the sum of the errors of the individual functions, weighted by the probabilities assigned by the classifier. The discontinuity now only happens at test time, when we select the output of the function with the highest assigned probability. We call this method the self-selecting ensemble.
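At the level of plain arrays, the training-time weighting and the test-time selection look as follows. This is only a sketch: it weights the per-branch errors by a softmax over classifier outputs, whereas the paper works with the raw classifier outputs, so the exact weighting here is an assumption.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ensemble_loss(errors, logits):
    """Training-time loss for one input: per-branch errors weighted by the
    classifier's assigned probabilities, so branches and classifier can be
    trained jointly with gradient descent."""
    return float(np.dot(softmax(logits), errors))

def ensemble_output(outputs, logits):
    """Test-time selection: the discontinuity is moved into this argmax."""
    return outputs[int(np.argmax(logits))]
```

Because the training loss is a smooth combination of the branch errors, gradients flow to every branch; only the argmax at test time is discontinuous.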

Can a similar approach work for the conversion from 3D rotation matrices to quaternions? It turns out that two or even three functions are not enough:

Theorem 5.

For any three continuous functions f₁, f₂, f₃: SO(3) → S³, there exists a rotation R ∈ SO(3) such that d(fᵢ(R), R) = 180° for all i ∈ {1, 2, 3}.


Consider the functions gᵢ: S³ → ℝ defined by gᵢ(q) = fᵢ(ρ(q)) · q for i = 1, 2, 3. For any q ∈ S³, since ρ(−q) = ρ(q), we have gᵢ(−q) = fᵢ(ρ(q)) · (−q) = −gᵢ(q).

Let g: S³ → ℝ³ be defined by g(q) = (g₁(q), g₂(q), g₃(q)); then g(−q) = −g(q) for any q ∈ S³. By the Borsuk–Ulam theorem (see e.g. page 174 of Hatcher (2002)), there exists q₀ such that g(q₀) = g(−q₀). Then g(q₀) = −g(q₀), so g(q₀) = 0, which means fᵢ(ρ(q₀)) · q₀ = 0. So for R = ρ(q₀), d(fᵢ(R), R) = 2 arccos 0 = 180° for i = 1, 2, 3. ∎

Theorem 5, with a seemingly simpler proof, implies theorem 1. But the proof of the Borsuk–Ulam theorem is not simple, and it applies to hyperspheres only, while the techniques in theorem 1 can be useful for other spaces as well, as we will show.

Allowing a fourth function, however, can give us a successful ensemble:

Theorem 6.

There exist continuous functions f₁, f₂, f₃, f₄: SO(3) → S³ such that for any rotation R ∈ SO(3), ρ(fᵢ(R)) = R for some i ∈ {1, 2, 3, 4}.

The proof is by construction. See appendix B.

Is the same true for Euler angles? Similar to corollary 2 we can conclude that an ensemble of 3 functions will not work. But to find an ensemble of 4 functions that does work, we have to additionally deal with gimbal lock. We first show that there is no correct continuous conversion from rotation matrices to Euler angles when the domain contains a gimbal locked position in its interior.

Theorem 7.

Let U be any neighborhood of E(0, π/2, 0) in SO(3). There exists no continuous function f: U → ℝ³ such that E(f(R)) = R for every R ∈ U.

See appendix B for proof.

The same conclusion can be drawn near any E(α, β, γ) where β = ±π/2. Since gimbal locked positions have to be handled correctly, this problem cannot be solved by simply adding more functions that all have the same gimbal locked positions. Notice that if the order of elemental rotations is changed, say to extrinsic y-z-x, then the gimbal locked positions will be different. Analogous to e we define e′ as follows:

    e′(α, β, γ) = (cos(γ/2), sin(γ/2), 0, 0) (cos(β/2), 0, 0, sin(β/2)) (cos(α/2), 0, sin(α/2), 0)

and also E′ = ρ ∘ e′. We show the existence of a mixed Euler angle ensemble that gives the correct conversion:

Theorem 8.

There exist continuous functions f₁, f₂, f₃, f₄: SO(3) → ℝ³ such that for any rotation R ∈ SO(3), at least one of the following is equal to R: E(f₁(R)), E(f₂(R)), E′(f₃(R)) and E′(f₄(R)).

The proof is by construction. See appendix B.

In section 3.1, we show with experiments that a neural network can successfully learn ensembles of four quaternion representations, or ensembles of four mixed-type Euler angle representations, that give small error for rotation matrix conversion over the entire SO(3).

2.3 Input Symmetry and Effective Input Topology

The discussion above is of theoretical interest, but would be of little practical relevance if the discontinuity of the quaternion and Euler angle representations could be resolved by simply using a representation that is continuous, such as the embeddings of SO(3) into ℝ⁵ or ℝ⁶ proposed in Zhou et al. (2019). We show however that, due to the combined effect of a symmetry in the input and a symmetry of the neural network function, such embeddings can become ineffective.

As in Zhou et al. (2019), we consider the problem of estimating the rotation of a target point cloud relative to a reference point cloud. Since giving the network both the target and the reference and considering all possible point clouds at once would present an extremely complicated input space, for our theoretical analysis we focus on a very simple case: the reference point cloud is fixed, so the network is only given the target point cloud, which is a rotated version of the fixed reference point cloud.

At first glance, the topological structure of this problem is exactly the same as converting a 3D rotation matrix into other representations, since there is a homeomorphism between the input space and SO(3). However, the neural network might see a different picture. In a point cloud, there is no assumption of any relationship between different points, and it is considered desirable for the neural network to be invariant under a permutation of input points, which is a design principle of some popular neural network architectures for point cloud processing, e.g. PointNet Qi et al. (2017).

This behavior causes unexpected consequences. If the input point cloud itself possesses nontrivial rotational symmetry, then different rotations on this point cloud might result in the same set of points, differing only in order. Since the neural network is oblivious to the order of input points, these point clouds generated by different rotations are effectively the same input to the network.

Let P be a 3D point cloud and R be a rotation, and let RP = {Rp : p ∈ P}. The symmetry group of P, Sym(P), is defined by Sym(P) = {R ∈ SO(3) : RP = P}. It is a subgroup of SO(3). Take P as the reference point cloud. If two rotations R₁ and R₂ generate the same target point cloud, then R₁P = R₂P, so R₂⁻¹R₁P = P, so R₂⁻¹R₁ ∈ Sym(P); that is, R₁ and R₂ belong to the same left coset of Sym(P) in SO(3). The reverse is also true.
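The coset argument is easy to check numerically. The sketch below builds a hypothetical four-point cloud with a nontrivial rotational symmetry (invariance under 180° rotations about the coordinate axes; the coordinates are hand-picked for illustration) and verifies that two rotations differing by an element of Sym(P) produce the same point set up to permutation:

```python
import numpy as np

# A hypothetical four-point cloud, invariant under 180-degree rotations
# about the x, y and z axes (coordinates hand-picked for illustration):
P = np.array([[ 1.0,  2.0,  3.0],
              [-1.0, -2.0,  3.0],
              [-1.0,  2.0, -3.0],
              [ 1.0, -2.0, -3.0]])

def as_set(Q):
    """Order-insensitive canonical form of a point cloud (rounded, sorted)."""
    return sorted(map(tuple, np.round(Q, 6)))

Rz180 = np.diag([-1.0, -1.0,  1.0])   # 180 degrees about the z-axis
Rx180 = np.diag([ 1.0, -1.0, -1.0])   # 180 degrees about the x-axis
assert as_set(P @ Rz180.T) == as_set(P)   # Rz180, Rx180 are in Sym(P)
assert as_set(P @ Rx180.T) == as_set(P)

# Two rotations in the same left coset of Sym(P) generate the same cloud:
c, s = np.cos(0.7), np.sin(0.7)
R1 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
R2 = R1 @ Rz180                       # R2^-1 R1 = Rz180 is in Sym(P)
assert as_set(P @ R2.T) == as_set(P @ R1.T)
```

A permutation-invariant network therefore cannot distinguish the input generated by R1 from the one generated by R2.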

So for the network, two inputs are equivalent if and only if the rotations that generate them belong to the same left coset of Sym(P). The left cosets of a subgroup S in SO(3) form a homogeneous space, denoted SO(3)/S (a quotient group is denoted the same way, but S need not be a normal subgroup of SO(3) in general, so here we do not mean a quotient group). When P is finite, except for the degenerate case where all points of P lie on a line, Sym(P) must be finite. If S is a finite subgroup of SO(3), then SO(3) is a covering space of SO(3)/S, with covering map π_S: R ↦ RS.

We can show that when Sym(P) is nontrivial, a network that is invariant under input point permutation cannot always recover a correct rotation matrix. Here by “correct” we mean that the rotation given by the network need not be the same as the one used for generating the input point cloud, but must generate the same point cloud up to permutation. π_S⁻¹(x) denotes the preimage of x under π_S.

Theorem 9.

Let S be a nontrivial finite subgroup of SO(3). Then there does not exist a continuous function f: SO(3)/S → SO(3) such that for every x ∈ SO(3)/S, f(x) ∈ π_S⁻¹(x).


Assume that there exists such a function f. Choose any x₀ ∈ SO(3)/S. Let R₀ = f(x₀); then R₀ ∈ π_S⁻¹(x₀). Since S is a nontrivial group, SO(3) is a nontrivial covering space of SO(3)/S, so |π_S⁻¹(x₀)| > 1. Select any R₁ ≠ R₀ from π_S⁻¹(x₀). SO(3) is path-connected, so there is a path in SO(3) from R₀ to R₁; that is, there exists a continuous function r: [0, 1] → SO(3) such that r(0) = R₀ and r(1) = R₁. Fix such an r and let s = π_S ∘ r. Then s is a path in SO(3)/S and s(0) = s(1) = x₀.

By the lifting property of covering spaces, s lifts to a unique path in SO(3) starting from R₀, which is just r. That is, r is the unique continuous function such that π_S ∘ r = s and r(0) = R₀. We also have π_S ∘ (f ∘ s) = s and (f ∘ s)(0) = f(x₀) = R₀. Since r is the unique continuous function with these properties, we must have f ∘ s = r. But (f ∘ s)(1) = f(s(1)) = f(x₀) = R₀ ≠ R₁ = r(1), which is a contradiction. ∎

This is essentially the same proof as in theorem 1, but without error bounds. Such bounds can be established, but the techniques are much more complicated. We discuss this in appendix B.

Similar to corollary 2, for an embedding h of SO(3) into ℝⁿ, since we can continuously map the image h(SO(3)) back to SO(3), there exists no continuous function that finds such an embedding of a correct rotation from the input point cloud, for otherwise composing it with h⁻¹ would give a continuous function that computes a correct rotation matrix.

Corollary 10.

Let S be a nontrivial finite subgroup of SO(3) and let h: SO(3) → ℝⁿ be an embedding. Then there does not exist a continuous function f: SO(3)/S → ℝⁿ such that for every x ∈ SO(3)/S, f(x) ∈ h(π_S⁻¹(x)).

In particular, this means that using the 6D or 5D rotation representations proposed in Zhou et al. (2019), which are embeddings of SO(3) into ℝ⁶ or ℝ⁵, does not resolve the problem. However, the self-selecting ensemble can solve this problem. We state the following without a formal proof:

Proposition 11.

Let S be a finite subgroup of SO(3). Then there exist continuous functions f₁, …, f_k: SO(3)/S → SO(3) such that for every x ∈ SO(3)/S, fᵢ(x) ∈ π_S⁻¹(x) for some i ∈ {1, …, k}.

The idea is that to each fᵢ is assigned a contractible subset of SO(3)/S on which it gives the “correct” output. The values on the rest of SO(3)/S are such that fᵢ is continuous on all of SO(3)/S. Then the fᵢ's satisfy the requirement if these contractible subsets collectively cover SO(3)/S. We discuss this more in appendix B.

In section 3.2, we test rotation estimation for point clouds using different representations, on one point cloud with trivial symmetry and one with nontrivial rotational symmetry. We show that for the nontrivial case, an ensemble is necessary for the 5D and 6D embeddings as well as quaternions.

3 Experiments

3.1 Converting Rotation Matrices

We test the accuracy of converting a 3D rotation matrix into various representations with neural networks, including ensembles. We use an MLP with 5 hidden layers of size 128 each. The size of the output layer varies according to the representation. For ensembles, each individual function as well as the classifier share all their computations except for the output layer, so the overhead of an ensemble over a single network is tiny.

For a single network, the loss function is simply the rotation distance between the input and the output. For an ensemble of k functions, let the individual functions in the ensemble be f₁, …, f_k and let the classifier output the scores c₁, …, c_k. Here we take the “raw” output vector of the classifier, without converting it into a distribution with e.g. softmax. The loss of the whole ensemble on one input rotation R combines the errors d(fᵢ(R), R) of the individual functions, weighted according to the classifier scores cᵢ(R) and normalized by the maximum possible distance between two 3D rotations, 180°. At test time, fₖ is selected as the output of the ensemble, where k = argmaxᵢ cᵢ(R).

We train each network using Adam. For training and testing, we sample uniformly from SO(3) (see appendix A.4 for some notes on uniform sampling); a separate set of random rotations is sampled for testing.

Figure 2 shows the semi-log plot of errors of each representation by percentile. Mean and maximum errors are given in table 2. The maximum error is also marked in the graph for clarity. Here we compare a single quaternion, an ensemble of four quaternions, a single set of Euler angles, a mixed Euler angle ensemble with two sets each of x-y-z and y-z-x Euler angles, the 5D embedding and the 6D embedding. Comparison of ensembles of different sizes can be found in appendix C. In particular, we will show that ensembles of three networks do not work.

The results do not actually show a maximum error of 180° for a single quaternion or a single set of Euler angles. This is because, while such high errors are guaranteed to exist, they are nevertheless very rare: in general the set of such inputs has measure zero and will almost never be encountered by uniform random sampling. We can see that while a single quaternion or a single set of Euler angles inevitably comes close to the maximum possible error, ensembles of four quaternions or of mixed Euler angles give a fairly accurate conversion from rotation matrices to quaternions or Euler angles. In fact, the quaternion ensemble is only marginally worse than the 6D embedding and noticeably better than the 5D embedding.

3.2 Estimating the Rotation of a Point Cloud

We test the accuracy of estimating the rotation of a point cloud with various rotation representations, on point clouds with and without symmetries. There are many different possible symmetries; here we test one of the possibilities as an example. For each of the symmetric/non-symmetric cases we use only one fixed point cloud, for both training and testing, with the rotation being the only variation in the input. To understand why we use such an unconventional setting, along with all the details of the network architecture, loss function, training and construction of the point cloud data, please refer to appendix C. For now it suffices to know that our network is invariant under input point permutation, and that our experiment comes in two parts: in part one, the point cloud does not have rotational symmetry; in part two, the point cloud has the rotational symmetry D₂ (in Schoenflies notation), which means it is invariant under rotations of 180° around the x-axis, the y-axis and the z-axis.

The result of part 1 is shown in figure 4 and table 4. A single quaternion, ensemble of 4 quaternions, and the 5D and 6D embeddings are compared. The result is consistent with that of rotation matrix conversion.

The result of part 2 is shown in figure 6 and table 6. It can be clearly seen that when the input possesses nontrivial rotational symmetry, the 5D and 6D embeddings can no longer correctly estimate the rotation of the input in all cases. In contrast, the ensemble of four quaternions continues to perform well. We added the ensemble of four 5D embeddings and the ensemble of four 6D embeddings to the comparison, and the results are similar to the ensemble of four quaternions. We can see that the difference between ensembles and single networks is qualitative, while the difference between different kinds of representations is quantitative and comparatively rather minor.

From the numerical results one might guess the exact lower bound of the maximum error of rotation estimation with a single network and symmetric input. We derive this bound in appendix B.

[Table: mean and maximum errors for single quaternion, quaternion ensemble of 4, single Euler angles, mixed Euler ensemble, 5D and 6D embeddings]

Figure 1: Error of rotation matrix conversion by percentile
Figure 2: Error statistics

[Table: mean and maximum errors for single quaternion, quaternion ensemble of 4, 5D and 6D embeddings]

Figure 3: Error of rotation estimation by percentile, of a point cloud with trivial symmetry
Figure 4: Error statistics

[Table: mean and maximum errors for single quaternion, quaternion ensemble of 4, 5D embedding and its ensemble of 4, 6D embedding and its ensemble of 4]

Figure 5: Error of rotation estimation by percentile, of a point cloud with D₂ symmetry
Figure 6: Error statistics

4 Conclusion

In this paper, we analyzed the discontinuity problem of the quaternion and Euler angle representations of 3D rotations in neural networks from a topological perspective, and showed that a maximum error of 180° must occur. We further explored the effect of symmetry in the input on the ability of the network to find the correct rotation representation, and found that symmetries in the input can cause continuous rotation representations to become ineffective, with provable lower bounds on the maximum error. We proposed the self-selecting ensemble to solve this discontinuity problem and showed that it works well with different rotation representations, even in the presence of symmetry in the input. We verified our theory with experiments on two simple example problems, the conversion from rotation matrices to other representations and the estimation of the rotation of a point cloud. An extension of our results to 4D rotations was also discussed.

The application of our theoretical analysis and ensemble method to real-world problems involving rotation representations is a direction for future research. In addition, the potential usefulness of the self-selecting ensemble for solving a broader range of regression problems, with different input and output topologies or with other discontinuities in general, can be further explored.

Broader Impact

This work is mainly concerned with a theoretical problem and does not present any foreseeable societal consequence.


  • Golub and Van Loan [2013] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, 4th edition, 2013.
  • Hatcher [2002] Allen Hatcher. Algebraic Topology. Cambridge University Press, 2002.
  • Mebius [2005] Johan Ernest Mebius. A matrix-based proof of the quaternion representation theorem for four-dimensional rotations. arXiv preprint: math/0501249, 2005.
  • Miles [1965] R. E. Miles. On random rotations in R³. Biometrika, 52(3/4):636–639, 1965.
  • Qi et al. [2017] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
  • Saxena et al. [2009] Ashutosh Saxena, Justin Driemeyer, and Andrew Y. Ng. Learning 3-d object orientation from images. In 2009 IEEE International Conference on Robotics and Automation, pages 794–800. IEEE, 2009.
  • Takens [1968] Floris Takens. The minimal number of critical points of a function on a compact manifold and the lusternik-schnirelman category. Inventiones mathematicae, 6(3):197–244, 1968.
  • van Elfrinkhof [1897] L. van Elfrinkhof. Eene eigenschap van de orthogonale substitutie van de vierde orde. In Handelingen van het zesde Nederlandsch Natuuren Geneeskundig Congres, pages 237–240, 1897.
  • Villegas et al. [2018] Ruben Villegas, Jimei Yang, Duygu Ceylan, and Honglak Lee. Neural kinematic networks for unsupervised motion retargetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8639–8648, 2018.
  • Xiang et al. [2018] Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. In Robotics: Science and Systems (RSS), 2018.
  • Zhou et al. [2019] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.

Appendix A Mathematical Notes

A.1 Distance in SO(3)

The “difference” between two rotations R₁ and R₂ can be described by what it takes to change R₁ into R₂, which can be done by multiplying by R₂R₁⁻¹ on the left. R₂R₁⁻¹ is also a rotation. In 3 dimensions, the rotation angle of R₂R₁⁻¹ can be used to measure the distance between R₁ and R₂.

Rotation matrices are orthogonal matrices. An important property of orthogonal matrices is that all their eigenvalues have absolute value 1. Complex eigenvalues of real matrices always appear in conjugate pairs. For an orthogonal matrix, this means its eigenvalues must be ±1, or pairs of complex numbers of the form cos θ ± i sin θ.

For 3D rotation matrices, the three eigenvalues are exactly 1 and cos θ ± i sin θ, where θ is the rotation angle. The trace of a matrix equals the sum of its eigenvalues. So, for a 3D rotation matrix R with rotation angle θ, tr(R) = 1 + 2 cos θ, so θ = arccos((tr(R) − 1)/2). Let d(R₁, R₂) denote the distance between R₁ and R₂; then d(R₁, R₂) = arccos((tr(R₂R₁⁻¹) − 1)/2).

For a unit quaternion q = (w, x, y, z), we have

    tr(ρ(q)) = 3 − 4(x² + y² + z²) = 4w² − 1

That is, the rotation angle of ρ(q) is arccos((4w² − 2)/2) = 2 arccos |w|, where w is the scalar part of q. We consider distances between rotations represented by quaternions often, so to avoid having to write ρ every time, we let d also take quaternions as arguments directly, in which case by d(p, q) we mean d(ρ(p), ρ(q)). We have

    d(p, q) = 2 arccos |p · q|

Note however that d defined as such is a metric on SO(3) but not a metric on S³, as there it does not satisfy the triangle inequality. Instead, we denote the usual metric on S³, the geodesic distance, by d_S: d_S(p, q) = arccos(p · q). We have d(p, q) = 2 d_S(p, q) when d_S(p, q) ≤ π/2, or, d(p, q) = 2(π − d_S(p, q)) when d_S(p, q) > π/2.
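The trace formula and the quaternion formula can be checked against each other numerically; a short sketch (quaternions ordered (w, x, y, z), with arbitrary example values):

```python
import numpy as np

def quat_to_mat(q):
    """rho: unit quaternion (w, x, y, z) to rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def dist_trace(R1, R2):
    """d(R1, R2) = arccos((tr(R2 R1^-1) - 1) / 2): the rotation angle of
    the difference rotation (R1^-1 = R1^T for rotations)."""
    c = (np.trace(R2 @ R1.T) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards rounding error

def dist_quat(p, q):
    """d(p, q) = 2 arccos|p . q|: the same distance from unit quaternions."""
    return 2.0 * np.arccos(np.clip(abs(float(np.dot(p, q))), 0.0, 1.0))
```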

A.2 Distance in SO(3)/S

In theorem 9 we left open the problem of establishing error bounds. To find such bounds we must first define the distance between a rotation and an element of SO(3)/S. Let R ∈ SO(3) and x ∈ SO(3)/S. Define

    d(R, x) = min_{R′ ∈ π_S⁻¹(x)} d(R, R′)

That is, the distance from a rotation to an element of SO(3)/S is its distance to the nearest preimage of that element in SO(3). This can also be extended to take quaternions as the first argument:

    d(q, x) = min_{R′ ∈ π_S⁻¹(x)} d(q, R′) = min_{p ∈ ρ⁻¹(π_S⁻¹(x))} 2 d_S(q, p)

On the last line, the inner minimum hidden in d(q, R′) can be absorbed into the outer minimum because for any p ∈ ρ⁻¹(π_S⁻¹(x)) we also have −p ∈ ρ⁻¹(π_S⁻¹(x)). We then further extend the definition to allow quaternions as the second argument:

    d(q₁, q₂) = d(q₁, π_S(ρ(q₂)))

Note that ρ⁻¹(S), the preimage of S under the covering map ρ, is called a binary polyhedral group. Like the specific finite subgroups of SO(3), the specific binary polyhedral groups have their own names and notations, but generically let us denote them by S*. Since ρ⁻¹(π_S⁻¹(π_S(ρ(q₂)))) = q₂S*, we have

    d(q₁, q₂) = min_{p* ∈ S*} 2 d_S(q₁, q₂p*)
A.3 Distance in SO(4)

To derive analogous results with error bounds for 4D rotations, we need to define the distance between two 4D rotations. In contrast to 3D rotations, in general there is no single rotation angle and rotation axis. Instead, there exists a pair of orthogonal planes that are invariant under the rotation. The rotation matrix has eigenvalues cos θ₁ ± i sin θ₁ and cos θ₂ ± i sin θ₂, and the restriction of the 4D rotation to each of these two planes is a 2D rotation, one with rotation angle θ₁ and the other with rotation angle θ₂.

Note that unlike in 3D, where a rotation of angle θ around an axis is also a rotation of angle −θ around the opposite axis, in 4D if we negate the sign of rotation in one of the pair of invariant planes by flipping its orientation, then to preserve the handedness of the coordinate system the orientation of the other invariant plane must be flipped as well, which negates the sign of rotation on it too. So in general we cannot make θ₁ and θ₂ both positive without changing the handedness of the coordinate system. Without loss of generality, we assume that θ₁ ∈ [0, π] and θ₂ ∈ [−θ₁, θ₁].

For two 4D rotations R₁ and R₂, we compute θ₁ and θ₂ from R₂R₁⁻¹ and take θ₁, the larger angle in magnitude, as the distance between R₁ and R₂. Let us denote this by d(R₁, R₂).
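Under these conventions, the angle magnitudes can be read off the eigenvalues numerically. A sketch (returning magnitudes only, so the sign convention for θ₂ is not resolved here):

```python
import numpy as np

def rotation_angles_4d(R):
    """The two rotation angles of a 4D rotation, as magnitudes
    theta1 >= theta2 >= 0, read off the phases of the eigenvalues
    e^{+-i theta1}, e^{+-i theta2}. (The sign of theta2 depends on an
    orientation convention that this sketch does not fix.)"""
    t = np.sort(np.abs(np.angle(np.linalg.eigvals(R))))
    return t[3], t[1]   # sorted magnitudes: [t2, t2, t1, t1]

def dist_4d(R1, R2):
    """Distance between two 4D rotations: the larger rotation angle of the
    difference rotation R2 R1^-1 (= R2 R1^T)."""
    return rotation_angles_4d(R2 @ R1.T)[0]
```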

Since we are interested in the quaternion representation of 4D rotations, we want to compute the distance between two rotations from their quaternion representations, without having to compute the eigenvalues of a non-symmetric matrix. We first introduce the quaternion representation of 4D rotations. The formula is due to van Elfrinkhof [1897]. A proof in English can be found in Mebius [2005].

Let A be a 4×4 matrix, and define the associate matrix M of A; each entry of M is a fixed signed combination of the entries of A (see Mebius [2005] for the explicit entries).

M has rank 1 and Frobenius norm 1 if and only if A is a rotation matrix. In such a case, there exist real numbers a, b, c, d, p, q, r, s such that M = (a, b, c, d)ᵀ(p, q, r, s) and a² + b² + c² + d² = p² + q² + r² + s² = 1. The solution is unique up to negating all eight numbers. Then, the rotation matrix can be decomposed as A = A_L A_R, where

    A_L = [ a  −b  −c  −d ]        A_R = [ p  −q  −r  −s ]
          [ b   a  −d   c ]              [ q   p   s  −r ]
          [ c   d   a  −b ]              [ r  −s   p   q ]
          [ d  −c   b   a ]              [ s   r  −q   p ]

The two matrices commute, and it can be checked that A_L and A_R are the matrices of left multiplication by the quaternion l = (a, b, c, d) and right multiplication by the quaternion r = (p, q, r, s), respectively.
So, given a pair of unit quaternions l and r, they represent the 4D rotation

    A = A_L A_R

and if x ∈ ℝ⁴ is a point and x̃ = x₀ + x₁i + x₂j + x₃k is its quaternion form, then the quaternion form of Ax is l x̃ r. The pairs (l, r) and (−l, −r) represent the same rotation, and for a given rotation matrix, (l, r) can be uniquely determined up to this negation.
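A quick numerical check of this representation: build the matrix of x ↦ l x r column by column and confirm that it is a rotation, and that negating both quaternions leaves it unchanged. The specific quaternion values below are arbitrary.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product, components ordered (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rot4_from_quats(l, r):
    """Matrix of the map x -> l x r, identifying (x0, x1, x2, x3) with the
    quaternion x0 + x1 i + x2 j + x3 k; columns are images of basis vectors."""
    M = np.zeros((4, 4))
    for i in range(4):
        e = np.zeros(4)
        e[i] = 1.0
        M[:, i] = quat_mul(quat_mul(l, e), r)
    return M

# Arbitrary unit quaternions (illustrative values only):
l = np.array([1.0, 2.0, 0.5, -1.0]); l /= np.linalg.norm(l)
r = np.array([0.3, -0.7, 1.0, 0.2]); r /= np.linalg.norm(r)
R = rot4_from_quats(l, r)
```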

We then proceed to find the relationship between the quaternion distance and the 4D rotation distance d. The real Schur decomposition theorem (see e.g. page 377 of Golub and Van Loan [2013]) states that for any real matrix A, there exists an orthogonal matrix Q such that QᵀAQ is block upper triangular,

    QᵀAQ = [ R₁₁  R₁₂  ⋯ ]
           [  0   R₂₂  ⋯ ]
           [  ⋮    ⋮   ⋱ ]

where each Rᵢᵢ is either a 1×1 matrix or a 2×2 matrix having complex conjugate eigenvalues. Apply this to the case where A is a 4D rotation matrix, and treat a pair of eigenvalues 1, 1 or a pair of eigenvalues −1, −1 as a complex conjugate pair; then there exists an orthogonal matrix Q such that

    QᵀAQ = [ B₁  0  ]
           [ 0   B₂ ]

where B₁ and B₂ are 2×2 matrices having complex conjugate eigenvalues. Since A is orthogonal, QᵀAQ is also orthogonal, so it is a rotation. From QᵀAQ we can get A = Q(QᵀAQ)Qᵀ. Assume that Q has determinant 1 (otherwise it has determinant −1, and we can negate its last column so that it has determinant 1) so that it is also a rotation. QᵀAQ has the same eigenvalues as A, so we must have

    B₁ = [ cos θ₁  −sin θ₁ ]        B₂ = [ cos θ₂  −sin θ₂ ]
         [ sin θ₁   cos θ₁ ]             [ sin θ₂   cos θ₂ ]

where θ₁ and θ₂ are the two rotation angles of A. Let B = QᵀAQ. Its associate matrix is