1 Introduction
Moser and Veselov (1991) proposed the following equations to discretize the classical Euler–Arnold differential equations for the motion of a generalized rigid body:

(1)  $M_{k+1} = \omega_k M_k \omega_k^T, \qquad M_k = \omega_k^T J - J\omega_k,$

where $M_k$ is the angular momentum with respect to the body (here represented by a skew-symmetric matrix), $J$ is the inertia matrix (symmetric positive definite), and $\omega_k$ (orthogonal matrix) is the angular velocity. Rigid body equations arise in several applications, e.g., celestial mechanics, molecular dynamics, mechanical robotics, and flight control, where they are used in particular to understand the body–body interactions of particles like planets, atoms, and molecules. See, for instance,
Bloch (2015); Dang et al. (2020); Lee et al. (2007); Nordkvist and Sanyal (2010); Kalabic et al. (2014, 2017) and the references therein.

The main challenge of solving (1) is to find the orthogonal matrix $\omega_k$ in the second equation, by assuming that $M_k$ and $J$ are given. Mathematically, the problem consists of finding an orthogonal matrix $X$ (for convenience, here we use $X$ instead of $\omega_k^T$) such that

(2)  $XJ - JX^T = M,$

where $J$ is a given symmetric positive definite matrix, and $M$ is a known skew-symmetric matrix. All the matrices involved are square of order $n$. The matrix equation (2) is known as the Moser–Veselov equation and was first investigated in Moser and Veselov (1991), where the authors based their developments on factorizations of certain matrix polynomials. A different, but computationally more efficient, approach was provided later in Cardoso and Leite (2003), where the authors noted that (2) can be connected with a certain algebraic Riccati equation and, in turn, with the Hamiltonian matrix

(3)  $H = \begin{bmatrix} M/2 & I \\ J^2 + M^2/4 & M/2 \end{bmatrix}.$
We now revisit some results stated in Cardoso and Leite (2003), concerning the existence and uniqueness of solutions of (2).
Theorem 1.
For the matrix equation (2):

1. There exists a solution $X \in SO(n)$ (the special orthogonal or rotation group of order $n$) if and only if the size of the Jordan blocks associated to the pure imaginary eigenvalues of $H$ (if any) is even;
2. Equation (2) has a unique solution if and only if the spectrum of $H$ is pure imaginary and the size of the Jordan blocks associated to each (nonzero) eigenvalue is even.
From Theorem 1, we can see that the Moser–Veselov equation may have no solution in $SO(n)$ if the associated Hamiltonian matrix $H$ has any pure imaginary eigenvalue with a Jordan block of odd size. It is also known that the existence of purely imaginary eigenvalues in $H$ causes significant difficulties in solving (2). To avoid those situations, as far as we know, all the existing algorithms for solving the equation assume a priori that $H$ does not admit any pure imaginary eigenvalue (see (McLachlan and Zanna, 2005, Sec. 1.2), (Moser and Veselov, 1991, Sec. 1.4), and (Nordkvist and Sanyal, 2010, Sec. G)). Moreover, the algorithms based on solving the associated algebraic Riccati equation require the strong condition that the matrix $J^2 + M^2/4$ must be symmetric positive definite. These issues have motivated us to investigate methods whose applicability does not require those restrictive conditions. As we will see later in Sec. 5, the three proposed optimization algorithms produce special orthogonal solutions, even when $J^2 + M^2/4$ is not symmetric positive definite. Those iterative algorithms may also be used in problems where $H$ has purely imaginary eigenvalues associated with Jordan blocks of even size (check Theorem 1) but, as will be illustrated later in Sec. 5, the convergence may slow down.

Problem 2.
Let $\|\cdot\|_F$ denote the Frobenius norm, i.e., $\|A\|_F = \sqrt{\operatorname{tr}(AA^T)}$. The problem of finding a special orthogonal solution of (2) can be formulated as an optimization problem in the following way:

(4)  $\min_X \ \|XJ - JX^T - M\|_F^2 \quad \text{s.t.} \quad X^TX = I, \ \det(X) = 1.$
In Problem 2, we have chosen the Frobenius norm because its definition in terms of the trace of a matrix allows us to access the derivatives of the objective function easily, making it more suitable for handling optimization problems than other norms, such as the spectral or infinity norms.
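For concreteness, the objective of Problem 2 is straightforward to evaluate numerically. The following Python/NumPy sketch (illustrative only; the paper's own implementations are in MATLAB, and all names here are ours) evaluates the squared Frobenius norm of the Moser–Veselov residual:

```python
import numpy as np

def mv_objective(X, J, M):
    """Objective of Problem 2: ||X J - J X^T - M||_F^2, with J symmetric
    positive definite and M skew-symmetric."""
    R = X @ J - J @ X.T - M
    # ||R||_F^2 = trace(R R^T), computed entrywise without forming R R^T
    return np.sum(R * R)

# small sanity check on a randomized 3x3 instance
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
J = A @ A.T + 3 * np.eye(3)        # symmetric positive definite
M = A - A.T                        # skew-symmetric
X = np.eye(3)
print(mv_objective(X, J, M))       # equals ||M||_F^2, since I*J - J*I = 0
```

At $X = I$ the residual reduces to $-M$, so the printed value is exactly $\|M\|_F^2$.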
The literature on numerical methods for solving nonlinear constrained problems, like (4), is large; see for instance Nocedal and Wright (2006); Luenberger and Ye (2015); Polak (1997); Miraldo and Cardoso (2020); Campos et al. (2019). However, due to the complicated expression of the objective function and the large number of constraints arising from the conditions $X^TX = I$ and $\det(X) = 1$, some care must be taken with the choice of the methods.
Techniques from Riemannian geometry for solving optimization problems with orthogonal constraints have attracted the interest of many researchers in the last decades; see Absil et al. (2007); Edelman et al. (1999), and the references therein. An essential feature of those techniques is that they allow the transformation of a constrained optimization problem into an unconstrained one. Moreover, since the set of orthogonal matrices is a manifold and provided that the objective function satisfies some smoothness requirements, we can make available tools such as Euclidean gradients, Riemannian gradients, retractions, and geodesics.
The three methods presented in this work evolve on the orthogonal manifold and belong to the family of line search methods on manifolds described in (Absil et al., 2007, Ch. 4). They are iterative and feasible (or constraint-preserving), in the sense that, starting with a matrix $X_0 \in SO(n)$, all the iterates $X_k$ also stay in $SO(n)$.
The major contributions of this work are:

a detailed discussion on the properties of the optimization problems, including convexity issues;

a novel approach for solving the unconstrained optimization problems arising in each iteration of the Bregman splitting algorithm and a discussion about the reasons that led the MATLAB’s fminunc function to give unsatisfactory results in some circumstances;

a novel relative residual that allows us to assess the quality of the computed solution of the Moser–Veselov equation;

careful modifications of existing algorithms for solving optimization problems with orthogonal constraints, to make them suitable for our particular problems.
In the next section, we derive a workable expression for the objective function of Problem 2. Sec. 3 presents the three algorithms proposed in this paper. In Sec. 4, numerical issues of the algorithms are discussed, and, in Sec. 5, a selection of experiments is carried out to illustrate the performance of the algorithms. In Sec. 6, some conclusions are drawn.
2 Rewriting the Objective Function
Let us denote by $F$ the objective function arising in (4). Using the properties of the trace of a matrix and attending that $J^T = J$ and $M^T = -M$, we have (detailed calculations are omitted):

(5)  $F(X) = \operatorname{tr}(XJ^2X^T) + \operatorname{tr}(X^TXJ^2) - 2\operatorname{tr}(XJXJ) + 4\operatorname{tr}(XJM) + \operatorname{tr}(MM^T).$

Now, if we take into account the orthogonality of $X$, that is, $X^TX = XX^T = I$, then $F$ can be simplified to (a different notation is used):

(6)  $\widetilde{F}(X) = 2\operatorname{tr}(J^2) - 2\operatorname{tr}(XJXJ) + 4\operatorname{tr}(XJM) + \operatorname{tr}(MM^T),$

which is the restriction of $F$ to the orthogonal group $O(n)$, that is, $F(X) = \widetilde{F}(X)$ for any $X \in O(n)$, but, in general, $F(X) \neq \widetilde{F}(X)$ if $X \notin O(n)$. Hence, the problem (4) may be simplified to:

(7)  $\min_X \ \widetilde{F}(X) \quad \text{s.t.} \quad X^TX = I, \ \det(X) = 1,$

where $\widetilde{F}$ is given by (6).
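The agreement of the full objective and its restriction on the orthogonal group can be checked numerically. In the Python/NumPy sketch below (illustrative; the data and names are ours), `F` is the squared residual norm and `F_tilde` restates the trace expression valid on $O(n)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
J = A @ A.T + n * np.eye(n)    # symmetric positive definite
M = A - A.T                    # skew-symmetric

def F(X):
    # full objective: ||X J - J X^T - M||_F^2, valid for any square X
    R = X @ J - J @ X.T - M
    return np.sum(R * R)

def F_tilde(X):
    # restriction to the orthogonal group (uses X^T X = I)
    return (2 * np.trace(J @ J) - 2 * np.trace(X @ J @ X @ J)
            + 4 * np.trace(X @ J @ M) + np.sum(M * M))

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
print(abs(F(Q) - F_tilde(Q)))    # close to zero: the expressions agree on O(n)
```

For a non-orthogonal argument the two functions differ in general, which is why the simplification is only legitimate once the constraint $X^TX = I$ is enforced.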
If $x_{ij}$ denotes the $(i,j)$ entry of the matrix $X$, then both $F$ defined in (5) and $\widetilde{F}$ in (6) are differentiable functions in $\mathbb{R}^{n\times n}$, because they are quadratic polynomials in the variables $x_{ij}$. In the following lemma we show that $F$ is convex in the set $\mathbb{R}^{n\times n}$ of all matrices with real entries, that is:

(8)  $F(\alpha X + (1-\alpha)Y) \leq \alpha F(X) + (1-\alpha)F(Y),$

for any $X, Y \in \mathbb{R}^{n\times n}$ and $\alpha \in [0,1]$.
Lemma 3.
The function $F$ given in (5) is convex in $\mathbb{R}^{n\times n}$.
Proof.
Let us denote $L(X) := XJ - JX^T - M$. Since $L$ is affine in $X$ and the Frobenius norm satisfies the triangle inequality, we have

(9)  $\|L(\alpha X + (1-\alpha)Y)\|_F \leq \alpha\|L(X)\|_F + (1-\alpha)\|L(Y)\|_F,$

for all $X, Y \in \mathbb{R}^{n\times n}$ and $\alpha \in [0,1]$. Consider the scalar function $g(t) = t^2$. Since $g''(t) = 2 > 0$ for any $t$, and $g$ is nondecreasing in $[0, \infty)$, we conclude that $F = g \circ \|L(\cdot)\|_F$ is convex. ∎
Similarly, we could show that $\widetilde{F}$ is convex in $\mathbb{R}^{n\times n}$. Note, however, that the constraints of the optimization problem (7) are nonconvex, that is, for $X, Y \in O(n)$ and $\alpha \in (0,1)$, in general $\alpha X + (1-\alpha)Y \notin O(n)$, which makes the problem much more difficult.
Now, we use the rules for the derivatives of the trace function (see, for instance, (Lutkepohl, 1996, Ch. 10)) to obtain the expressions of the Euclidean gradients (derivatives with respect to $X$) of those functions:

(10)  $\nabla F(X) = 4(XJ - JX^T - M)J;$

(11)  $\nabla \widetilde{F}(X) = -4(JX^TJ + MJ).$

The Riemannian gradients in the orthogonal manifold can be defined by:

(12)  $\operatorname{grad} F(X) = \nabla F(X)\,X^T - X\,\nabla F(X)^T;$

(13)  $\operatorname{grad} \widetilde{F}(X) = \nabla \widetilde{F}(X)\,X^T - X\,\nabla \widetilde{F}(X)^T.$
We recall that these Riemannian gradients belong to the orthogonal group’s tangent space, which is the set of skewsymmetric matrices (see, for instance, Edelman et al. (1999)).
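Gradient formulas of this kind are easy to validate with a finite-difference test. The Python/NumPy sketch below (illustrative; it assumes the Euclidean gradient $\nabla F(X) = 4(XJ - JX^T - M)J$ and the skew-symmetrized Riemannian gradient $G X^T - X G^T$) checks one directional derivative and the skew-symmetry claim:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
J = A @ A.T + n * np.eye(n)    # symmetric positive definite
M = A - A.T                    # skew-symmetric
X = np.linalg.qr(rng.standard_normal((n, n)))[0]   # orthogonal test point

def F(X):
    R = X @ J - J @ X.T - M
    return np.sum(R * R)

def grad_F(X):
    # Euclidean gradient of F (assumed form): 4 (X J - J X^T - M) J
    return 4 * (X @ J - J @ X.T - M) @ J

# central finite-difference check of a single directional derivative
E = rng.standard_normal((n, n))
h = 1e-6
fd = (F(X + h * E) - F(X - h * E)) / (2 * h)
print(abs(fd - np.sum(grad_F(X) * E)))   # small: formula matches the derivative

# the Riemannian gradient G X^T - X G^T is skew-symmetric
G = grad_F(X)
W = G @ X.T - X @ G.T
print(np.linalg.norm(W + W.T))           # close to zero
```

Since $F$ is quadratic in $X$, the central difference is exact up to roundoff, so the first printed value is dominated by floating-point noise.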
In the next section, we propose three algorithms to solve (2).
3 Algorithms
There is a vast literature on methods for optimizing functions on the orthogonal group or, more generally, on the Stiefel manifold (e.g., Abrudan et al. (2008); Edelman et al. (1999); Gao et al. (2018); Jiang and Dai (2015); Lai and Osher (2014); Manton (2002); Wen and Yin (2013); Zhu (2017)). Among those methods, we have selected three state-of-the-art ones that we believe to be well suited for our specific objective function. The first algorithm (Section 3.1) splits the orthogonal constraints in Bregman's style (Bregman, 1967; Osher et al., 2005) and is based on the SOC algorithm proposed by Lai and Osher in (Lai and Osher, 2014, Alg. 2). Our second algorithm (Section 3.2) is inspired by the feasible method developed by Wen and Yin in (Wen and Yin, 2013, Alg. 2), which uses a retraction — the Cayley transform — instead of geodesics. It is a line search method and, to find the appropriate step size, the method of Barzilai–Borwein (BB) (Barzilai and Borwein, 1988) is used. The third algorithm involves line search techniques, namely Armijo's rule, and is based on a proposal by Abrudan et al. in (Abrudan et al., 2008, Table II). It is of steepest descent type and involves geodesics, more specifically matrix exponentials.
3.1 Algorithm Based on Bregman Splitting
The SOC Algorithm (Lai and Osher, 2014, Alg. 2) applied to Problem 2 is summarized in the following steps:

1. Choose a positive scalar $r$ and a starting matrix $X_0 \in SO(n)$. Set $P_0 = X_0$ and $B_0 = 0$;
2. While "not converged" do
   (a) $X_{k+1} = \arg\min_X F(X) + \frac{r}{2}\|X - P_k + B_k\|_F^2$, where $F$ is defined by (5);
   (b) $Y_{k+1} = X_{k+1} + B_k$;
   (c) Compute the singular value factorization $Y_{k+1} = U_{k+1}\Sigma_{k+1}V_{k+1}^T$ and set $P_{k+1} = U_{k+1}V_{k+1}^T$;
   (d) $B_{k+1} = B_k + X_{k+1} - P_{k+1}$.
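The projection in Step 2(c) is the classical nearest-orthogonal-matrix computation: dropping the singular values of the SVD yields the orthogonal polar factor. A minimal Python/NumPy sketch (names are ours):

```python
import numpy as np

def project_to_orthogonal(Y):
    """Nearest orthogonal matrix to Y in the Frobenius norm: if Y = U S V^T
    is the SVD, the projection is the orthogonal polar factor U V^T."""
    U, _, Vt = np.linalg.svd(Y)
    return U @ Vt

rng = np.random.default_rng(3)
Y = rng.standard_normal((4, 4))
P = project_to_orthogonal(Y)
print(np.linalg.norm(P.T @ P - np.eye(4)))  # close to zero: P is orthogonal
```

This is the only step of the loop that enforces feasibility; the inner minimization in Step 2(a) works in the ambient space $\mathbb{R}^{n\times n}$.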
A drawback of this algorithm is the requirement of solving the (unconstrained) optimization problem in Step 2(a). Let us write down the corresponding objective function in terms of the trace. Set $C := P_k - B_k$ and $\psi(X) := F(X) + \frac{r}{2}\|X - C\|_F^2$. After a few calculations, we have

(14)  $\frac{r}{2}\|X - C\|_F^2 = \frac{r}{2}\left[\operatorname{tr}(XX^T) - 2\operatorname{tr}(XC^T) + \operatorname{tr}(CC^T)\right].$

Hence, the objective function in Step 2(a) is $\psi$, where $F$ denotes the function given in (5), and the associated unconstrained optimization problem may be formulated as

(15)  $\min_{X \in \mathbb{R}^{n\times n}} \ F(X) + \frac{r}{2}\|X - C\|_F^2,$

where $J$, $M$, $C$ are given square matrices of order $n$, $J$ and $M$ are, respectively, symmetric positive definite and skew-symmetric, and $r$ is a positive parameter. By a similar argument to the one used in Lemma 3, we can show that, for each $k$, the quadratic penalty term in (14) is convex, and the same is valid for the objective function $\psi$. This is a very useful property because it guarantees that local minima are global as well.
Since each iteration of the above SOC algorithm requires the solution of a convex unconstrained optimization problem of the form (15), whose objective function changes at each iteration $k$ according to the entries of the matrix $C = P_k - B_k$, a possible approach to solve each one is to use MATLAB's function fminunc, which is based on quasi-Newton and trust region methods. More precisely, fminunc is based, by default, on the Broyden–Fletcher–Goldfarb–Shanno quasi-Newton method, also known as the BFGS method (check (Nocedal and Wright, 2006, Sec. 8.1) and the references therein). However, if the gradient is provided, then fminunc switches to a trust region method based on the proposals of Coleman and Li (1994, 1996).
We have solved many unconstrained problems of the form (15) using fminunc but, despite the convexity, the results were not as good as expected, either in terms of speed of convergence or in terms of accuracy. We learnt from our experiments that fminunc (without gradient and with the default tolerance) is reliable only for problems of very small size. As $n$ increases, we observed that in some iterations of the SOC algorithm, fminunc was unable to minimize $\psi$, displaying warnings like "local minimum possible" or "solver stopped prematurely". Recall that finding the minimum of (15) in all the iterations is necessary to guarantee the convergence of the SOC algorithm. For a given fixed $k$, let $X^{(i)}$ denote the $i$-th iterate generated by fminunc when applied to the minimization of $\psi$. An interesting fact we have observed was that, as $i$ increased, the difference between consecutive objective values decreased quite fast to values lower than the prescribed tolerance, while the components of the gradient $\nabla\psi(X^{(i)})$ decreased in a slow fashion towards zero. This implies that fminunc involves a large number of iterations to guarantee that the norm of the gradient is lower than a fixed tolerance and hence that the computed minimizer satisfies (up to that tolerance) the first-order necessary conditions. Recall that, if $X_*$ is a local (or global) minimizer of $\psi$ (which is continuously differentiable), we must have $\nabla\psi(X_*) = 0$.

However, if the gradient function (check (16)) is provided to fminunc and the tolerance is suitably reduced, its performance improves and, as shown later in Section 5, for equations of small size, the usage of fminunc can be viewed as a possible approach to make the SOC algorithm effective.
To deal with smaller tolerances and equations involving matrices of larger size, we propose a different approach, which is described next. As we will see later, this approach seems to be very promising, both in terms of accuracy and computational cost.
We start by finding the zeros of the gradient of $\psi_k$ (here, to simplify the notation, we omit the subscript $k$). We note that the expression of $\psi$ is nonlinear and involves $n^2$ variables. A few calculations lead to the following expression for the gradient of the objective function $\psi$ (recall that $C = P_k - B_k$):

(16)  $\nabla\psi(X) = 4(XJ - JX^T - M)J + r(X - C).$

We know that local minima of $\psi$ are among the zeros of its Euclidean gradient. Since $\psi$ is convex, those local minima (if any) will be global as well. Therefore, we need to investigate the solutions of the matrix equation $\nabla\psi(X) = 0$, which is equivalent to

(17)  $4(XJ - JX^T - M)J + r(X - C) = 0.$
An easy way of solving (17) is achieved by performing vectorization. Let $\operatorname{vec}$ stand for the operator that stacks the columns of a matrix into a long vector of size $n^2$, and let $\otimes$ denote the Kronecker product. It is well-known that $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$ and that

(18)  $\operatorname{vec}(X^T) = K_n \operatorname{vec}(X),$

where $K_n$ is the commutation (or permutation) matrix of order $n^2$ (check (Lutkepohl, 1996, Ch. 7, Sec. 9.2)). Applying the $\operatorname{vec}$ operator to (17) yields

(19)  $\left[\,4(J^2 \otimes I) - 4(J \otimes J)K_n + rI_{n^2}\right]\operatorname{vec}(X) = \operatorname{vec}(4MJ + rC).$

Note that (19) corresponds to a linear system of type $Wz = b$, where $W$ is an $n^2 \times n^2$ matrix, $z = \operatorname{vec}(X)$, and $b = \operatorname{vec}(4MJ + rC)$. The vectorization approach is very useful to understand the theory of the matrix equation (17). If $W$ is nonsingular, a unique solution to (17) is guaranteed. Otherwise, if $W$ is singular, (17) may have infinitely many solutions or no solutions. In the numerical examples we have considered (including the ones to be shown in Section 5), the matrix $W$ encountered was always nonsingular.
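The vectorization approach can be sketched in a few lines of Python/NumPy (illustrative only; it assumes the stationarity condition of the inner problem has the form $4(XJ - JX^T - M)J + r(X - C) = 0$, with $C$ standing for $P_k - B_k$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
J = A @ A.T + n * np.eye(n)     # symmetric positive definite
M = A - A.T                     # skew-symmetric
C = rng.standard_normal((n, n)) # stands for P_k - B_k in the SOC step
r = 2.0

vec = lambda X: X.flatten(order="F")     # stack columns
I = np.eye(n)
K = np.zeros((n * n, n * n))             # commutation matrix: K vec(X) = vec(X^T)
for a in range(n):
    for b in range(n):
        K[a + b * n, b + a * n] = 1.0

# vec of 4(XJ - JX^T - M)J + r(X - C) = 0, using vec(AXB) = (B^T kron A) vec(X)
W = 4 * np.kron(J @ J, I) - 4 * np.kron(J, J) @ K + r * np.eye(n * n)
rhs = vec(4 * M @ J + r * C)
X = np.linalg.solve(W, rhs).reshape((n, n), order="F")

# the computed X annihilates the gradient of the inner objective
G = 4 * (X @ J - J @ X.T - M) @ J + r * (X - C)
print(np.linalg.norm(G))   # close to zero
```

Here $W$ is symmetric positive definite (it is the Hessian of the convex inner objective plus $rI$), so the solve is well posed; the point of the next paragraphs is that forming and solving this $n^2 \times n^2$ system is far too expensive for anything but tiny $n$.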
It turns out, however, that solving (17) through the linear system $Wz = b$ would require $O(n^6)$ operations, which is prohibitive, especially if $n$ is large. This has motivated us to look for less expensive methods for solving (17).
Since $J$ is nonsingular (being positive definite), a right multiplication of the matrix equation (17) by $J^{-1}$ yields

(20)  $4XJ - 4JX^T - 4M + r(X - C)J^{-1} = 0.$

Setting $A := 4J + rJ^{-1}$, $B := -4J$, and $C_1 := 4M + rCJ^{-1}$, (20) can be rewritten in the form

(21)  $XA + BX^T = C_1,$

which is a Sylvester-type equation. An effective method to solve (21) may be found in (Terán and Dopico, 2011, Alg. 3.1), which is based on the so-called QZ decomposition. Do not confuse (21) with the more easily handled classical Sylvester matrix equation $AX + XB = C$. Although the QZ decomposition is quite expensive, it can be performed in $O(n^3)$ operations. Hence, it is possible to solve (17) and the Step 2(a) of the SOC algorithm efficiently in $O(n^3)$ operations, which is the typical cost for problems involving matrix–matrix products.
A drawback we have noticed when performing experiments with a direct application of the algorithm of Lai and Osher (2014) to solve the Moser–Veselov equation was the poor feasibility of the approximation obtained when compared to the other algorithms to be addressed in the next sections. That is, denoting by $\widetilde{X}$ the approximation obtained for the solution, we have observed that the value $\|\widetilde{X}^T\widetilde{X} - I\|_F$ was not close enough to zero. To overcome this issue, we project $\widetilde{X}$ onto the orthogonal group by computing its singular value decomposition (SVD). Provided that (21) has a unique solution $\widetilde{X}$, that means that $\widetilde{X}$ is the unique minimizer of the objective function in (15).

3.2 Two Steepest Descent-Type Algorithms
A successful method to solve optimization problems with orthogonal constraints is the feasible iterative method developed in (Wen and Yin, 2013, Alg. 2). At each iteration, the skew-symmetric Riemannian gradient (13) is multiplied by a suitable positive number $\tau$ (step size) and transformed into an orthogonal matrix by means of the Cayley transformation

(22)  $Y(\tau) = \left(I + \tfrac{\tau}{2}W\right)^{-1}\left(I - \tfrac{\tau}{2}W\right)X,$

where $W$ is the Riemannian gradient and $Y(\tau)$ is an orthogonal matrix. In the iterative procedure, the orthogonal matrix $Y(\tau)$ may be viewed as an improvement of a previous approximation $X$.
There are many methods available to compute the step size $\tau$. In (Wen and Yin, 2013, Alg. 2), the authors recommend a nonmonotone line search method, because of its good theoretical properties regarding the convergence. However, it is not considered here because it has led to poor results in many experiments carried out (not shown here) with particular Moser–Veselov equations. In our modified version of the Wen–Yin algorithm (Wen and Yin, 2013, Alg. 2), which is presented in Algorithm 2, we use, instead, the alternating BB method of Dai and Fletcher (2005):

(23)  $\tau_k = \dfrac{\langle S_{k-1}, S_{k-1}\rangle}{|\langle S_{k-1}, Y_{k-1}\rangle|}$ for odd $k$,  $\qquad \tau_k = \dfrac{|\langle S_{k-1}, Y_{k-1}\rangle|}{\langle Y_{k-1}, Y_{k-1}\rangle}$ for even $k$,

where $S_{k-1} = X_k - X_{k-1}$, $Y_{k-1}$ is the difference of the corresponding gradients, and $\langle A, B\rangle = \operatorname{tr}(A^TB)$ denotes the Euclidean scalar product.
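An alternating BB rule can be sketched as follows in Python/NumPy (illustrative; the parity convention and the absolute-value safeguards are choices of ours, and `S`, `Y` denote the iterate and gradient differences):

```python
import numpy as np

def bb_stepsize(S, Y, k):
    """Alternating Barzilai-Borwein step size (Dai-Fletcher style): the two
    classical BB formulas are alternated according to the iteration parity.
    S = X_k - X_{k-1}, Y = gradient difference; <A,B> = sum(A*B)."""
    sy = np.sum(S * Y)
    if k % 2 == 0:
        return np.sum(S * S) / abs(sy)       # BB1 step
    return abs(sy) / np.sum(Y * Y)           # BB2 step

# quick check: if S = 2*Y, both formulas give step size 2
Y0 = np.arange(1.0, 10.0).reshape(3, 3)
S0 = 2.0 * Y0
print(bb_stepsize(S0, Y0, 0), bb_stepsize(S0, Y0, 1))
```

In practice the raw BB value is usually safeguarded (clipped to an interval and combined with a line search), since it can be far from monotone on its own.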
The second steepest descent-type algorithm addressed here (Algorithm 3) is a variation of the approaches proposed in Manton (2002); Abrudan et al. (2008), and we refer the reader to those papers for more technical details. In a few words, Algorithm 3 starts with an initial approximation $X_0$, finds the skew-symmetric matrix $W$ (the gradient direction on the manifold), and performs several steps along geodesics until convergence. We recall that geodesics on $SO(n)$ (i.e., curves giving the shortest path between two points in the manifold) can be defined through the matrix exponential as

$Y(t) = e^{tW}X,$

where $W$ is a skew-symmetric matrix and $t$ is a real scalar. In Algorithm 3, the positive scalar $t$ controls the length of the "tangent vector" and, in turn, the algorithm's overall convergence. To find an almost optimal $t$, the algorithm uses Armijo's step-size rule (Polak, 1997, Sec. 1.3).
4 Numerical Issues
This section addresses some numerical issues associated with the implementation of the three algorithms proposed so far to solve the MoserVeselov equation.
4.1 Convergence
Assuming that all the pure imaginary eigenvalues (if any) of the matrix $H$ in (3) have Jordan blocks with even size, the existence of at least one solution in $SO(n)$ of the Moser–Veselov equation is guaranteed (check Theorem 1). Provided that a careful choice of the starting matrix $X_0$ is made, one of those solutions may be obtained by the three proposed algorithms. We recall that finding an initial guess that minimizes the number of iterations in iterative methods for solving equations is, in general, a challenging problem. However, steepest-descent algorithms combined with suitable methods for computing the step size have good convergence properties (see Barzilai and Borwein (1988) for the BB method and (Polak, 1997, Sec. 1.3.2) for Armijo's method). They have linear convergence but, in contrast to Newton-type methods, it is in general easier to find an $X_0$ for which the iterative sequence generated by the method converges.
In our specific case, we have observed through many experiments (some of them will be shown in Sec. 5) that $X_0 = I$ is a reasonable choice for the three proposed algorithms, in the sense that it leads to convergence towards a special orthogonal solution. Experiments where $X_0$ has been taken as a randomized special orthogonal matrix will also be considered in Sec. 5, but the convergence is then slower; see Figures 3 and 4.
4.2 Computational Cost
The three proposed algorithms require $O(n^3)$ operations, which, as written before, is acceptable for algorithms involving matrix computations. However, this information is vague, and we shall give a sharper estimate of the cost. Namely, we need to find the coefficient of $n^3$ in the polynomial giving the total number of operations. As usual, the terms in $n^2$ and $n$ are ignored.

In terms of computational cost by iteration, the Bregman splitting Algorithm 1 is, in general, the most expensive, while Algorithm 2 is the cheapest. However, Algorithm 1 converges, in general, faster, attaining the same accuracy as the other two algorithms in much fewer iterations (see Section 5).
Algorithm 1: The cost is mainly determined by the cost of solving a Sylvester-type equation of the form (21) and the computation of two singular value decompositions (SVD). Solving (21) involves $O(n^3)$ operations, dominated by the QZ algorithm (see Terán and Dopico (2011)), and each SVD involves $O(n^3)$ operations by the method of Golub and Reinsch (1970). Note that while each iteration includes two SVDs, the quite expensive QZ decomposition is just required one time because the coefficient matrices in (21) are fixed during all the iterations.
Algorithms 2 and 3: The objective functions considered in the steepest descent-type algorithms involve the computation of the trace of matrix products. The efficient computation of $\operatorname{tr}(AB)$ does not require matrix–matrix products. Instead, it can be carried out through the formula:

(24)  $\operatorname{tr}(AB) = \sum_{i,j}\left[A \circ B^T\right]_{ij},$

where the operator $\circ$ denotes the Hadamard product, i.e., the entrywise product. If $A$ and $B$ are matrices of order $n$, the direct computation of the matrix product $AB$ needs $O(n^3)$ operations, while the trace in (24) just requires $O(n^2)$. However, as far as we know, the trace of a product of three or four matrices requires the computation of one matrix–matrix product, which costs $O(n^3)$ operations.
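The Hadamard-product identity is worth a quick numerical illustration (Python/NumPy, illustrative): the entrywise form avoids materializing the product $AB$ altogether.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

t1 = np.trace(A @ B)          # O(n^3): forms the full product first
t2 = np.sum(A * B.T)          # O(n^2): Hadamard-product form of (24)
print(abs(t1 - t2))           # agreement up to roundoff
```

The identity follows from $\operatorname{tr}(AB) = \sum_i \sum_k a_{ik} b_{ki}$, which is exactly the sum of the entries of $A \circ B^T$.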
Hence, the first evaluation of the objective function $\widetilde{F}$ (see its expression in (6)) involved in the steepest descent-type algorithms requires two matrix–matrix products, while its computation inside the cycles requires only one; the product $JM$ needs to be computed just one time as the algorithm runs.
Each iteration of Algorithm 2 requires the computation of a Cayley transform, which corresponds to solving a multiple right-hand side linear system of the form (22), at a cost of $O(n^3)$ operations. Concerning Algorithm 3, one exponential of a skew-symmetric matrix is required in each iteration. In Cardoso and Leite (2010), a scaling and squaring algorithm designed specifically for exponentials of skew-symmetric matrices is proposed, with an overall cost of $O(n^3)$ operations, where the constant depends on the number $s$ of squarings. Alternatively, one can use the general algorithm available through the function expm of MATLAB, which implements the scaling and squaring algorithm of Al-Mohy and Higham (2009). Its cost is also $O(n^3)$, and we refer the reader to Al-Mohy and Higham (2009) for the detailed expression of the computational cost. We note that, in the particular case $n = 3$, the exponential of a skew-symmetric matrix can be computed by the well-known Rodrigues formula, at the cost of just one matrix–matrix product.
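For the $3 \times 3$ case, the Rodrigues formula can be sketched as follows (Python/NumPy, illustrative; the function name and the near-identity fallback are ours):

```python
import numpy as np

def expm_skew3(W, tol=1e-12):
    """Rodrigues' formula for the exponential of a 3x3 skew-symmetric matrix:
    exp(W) = I + (sin t / t) W + ((1 - cos t)/t^2) W^2, with t = ||w||_2,
    where w = (a, b, c) is the axial vector of W."""
    a, b, c = W[2, 1], W[0, 2], W[1, 0]
    t = np.sqrt(a * a + b * b + c * c)
    if t < tol:
        return np.eye(3) + W          # first-order fallback near the identity
    return np.eye(3) + (np.sin(t) / t) * W + ((1 - np.cos(t)) / t**2) * (W @ W)

W = np.array([[0.0, -0.3,  0.2],
              [0.3,  0.0, -0.1],
              [-0.2, 0.1,  0.0]])
Q = expm_skew3(W)
print(np.linalg.norm(Q.T @ Q - np.eye(3)))   # close to zero: Q is orthogonal
```

The only matrix–matrix product is $W^2$; the result is a rotation (orthogonal with unit determinant), as expected for the exponential of a skew-symmetric matrix.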
4.3 Residual Estimates
Let us consider the residual function

(25)  $R(X) = XJ - JX^T - M,$

and assume that $\widetilde{X}$ is an approximation to the exact solution $X$ of the Moser–Veselov equation obtained by a certain numerical algorithm. Hence, $\widetilde{X} = X + E$, for some matrix $E$ of order $n$.
In the numerical computation of solutions of matrix equations, it is, in general, difficult to estimate the absolute error $\|\widetilde{X} - X\|$ or the relative error $\|\widetilde{X} - X\|/\|X\|$, where $\|\cdot\|$ stands for a given subordinate matrix norm. Thus, one usually works instead with relative residuals to check the quality of the approximation and the numerical stability.
An obvious definition for the relative residual of the Moser–Veselov equation would be

(26)  $\operatorname{relres}_0(\widetilde{X}) = \dfrac{\|\widetilde{X}J - J\widetilde{X}^T - M\|_F}{\|M\|_F}.$

However, as pointed out in (Guo and Higham, 2006, Sec. 5) (see also (Higham, 2008, Problem 7.15)) for the matrix equation $X^p = A$ (i.e., for computing the matrix $p$th root of $A$), this definition may not be appropriate in some situations. This has motivated us to propose a definition for the relative residual in the style of what is suggested in Guo and Higham (2006); Higham (2008).

With respect to the residual function given in (25), we have

$R(\widetilde{X}) = R(X + E) = EJ - JE^T,$

because $R(X) = 0$. By vectorization and attending to some properties of the Kronecker product, we have

$\operatorname{vec}(R(\widetilde{X})) = \left[(J \otimes I) - (I \otimes J)K_n\right]\operatorname{vec}(E),$

where $K_n$ is the permutation matrix of order $n^2$ (check (18)). Denoting $P := (J \otimes I) - (I \otimes J)K_n$, we have

$\operatorname{vec}(R(\widetilde{X})) = P\operatorname{vec}(E).$

With respect to the spectral norm $\|\cdot\|_2$, we have

$\|R(\widetilde{X})\|_F = \|P\operatorname{vec}(E)\|_2 \leq \|P\|_2\,\|E\|_F,$

and, by attending that, for any matrix $A$, $\|A \otimes I\|_2 = \|I \otimes A\|_2 = \|A\|_2$, it follows

(27)  $\|R(\widetilde{X})\|_F \leq 2\|J\|_2\,\|E\|_F.$

Let us suppose that $\|E\|_F \leq \epsilon\,\|X\|_F$, for a certain small value $\epsilon$. Note that $\|X\|_F = \sqrt{n}$, for any orthogonal matrix $X$ of order $n$, because $X^TX = I$. Then $\|E\|_F \leq \epsilon\sqrt{n}$ and

$\|R(\widetilde{X})\|_F \leq 2\sqrt{n}\,\epsilon\,\|J\|_2,$

or, equivalently,

$\dfrac{\|R(\widetilde{X})\|_F}{2\sqrt{n}\,\|J\|_2} \leq \epsilon.$

This suggests the following definition for the relative residual:

(28)  $\operatorname{relres}(\widetilde{X}) = \dfrac{\|\widetilde{X}J - J\widetilde{X}^T - M\|_F}{2\sqrt{n}\,\|J\|_2}.$
To understand why the relative residual (28) is more meaningful than (26), we shall notice that solving $R(X) = 0$ (ignoring the orthogonal constraint on $X$) is equivalent to solving the linear system $P\operatorname{vec}(X) = \operatorname{vec}(M)$, with $P$ being in general singular, and that the norm of the data must be considered in the error analysis, as the expression (28) does.
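A relative residual of this style is cheap to compute. The Python/NumPy sketch below (illustrative; the scaling $\|R(\widetilde{X})\|_F / (2\sqrt{n}\,\|J\|_2)$ used here is our reading of (28) and should be adapted if a different norm is preferred) also manufactures an exact solution for testing, by choosing $X \in SO(n)$ first and then defining $M := XJ - JX^T$:

```python
import numpy as np

def mv_relres(X, J, M):
    """Relative residual in the style of (28): the Frobenius norm of the
    Moser-Veselov residual scaled by 2*sqrt(n)*||J||_2."""
    n = X.shape[0]
    R = X @ J - J @ X.T - M
    return np.linalg.norm(R, "fro") / (2 * np.sqrt(n) * np.linalg.norm(J, 2))

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n))
J = A @ A.T + n * np.eye(n)          # symmetric positive definite
X = np.linalg.qr(rng.standard_normal((n, n)))[0]
M = X @ J - J @ X.T                  # manufactured so that X is an exact solution
print(mv_relres(X, J, M))            # zero up to roundoff
```

Manufacturing test problems this way is also a convenient device for checking implementations of the three algorithms, since the exact solution is known in advance.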
4.4 Termination Criteria
Several strategies are available to decide when to terminate an iterative procedure. Attending to the nature of our optimization problem in (4), the sequence generated by $g_k := \|\operatorname{grad}\widetilde{F}(X_k)\|_F$, where $\operatorname{grad}\widetilde{F}$ is the Riemannian gradient (13),
must converge to zero. Hence, we can fix a tolerance $\epsilon_1 > 0$ and then iterate while $g_k > \epsilon_1$. Note that, in Algorithms 2 and 3, $g_k$ is available during the algorithm and does not involve extra cost.

The typical behavior of methods with linear convergence is to slow down as $X_k$ approaches stationary points. Often, it may be difficult to detect this phenomenon's occurrence, so another condition to stop the cycle must be added. In our implementations of the algorithms, we consider the classical relative difference

$d_k := \dfrac{\|X_{k+1} - X_k\|_F}{\|X_k\|_F}$

as well. Because the algorithms are implemented in finite precision environments, a maximal number of iterations must be considered to stop iterating. In summary, we fix tolerances $\epsilon_1, \epsilon_2 > 0$ and a maximal number of iterations $k_{\max}$ and stop iterating when

(29)  $g_k \leq \epsilon_1 \quad \text{or} \quad d_k \leq \epsilon_2 \quad \text{or} \quad k > k_{\max}.$
5 Numerical Experiments
To evaluate the performance of the proposed algorithms, we have carried out a set of experiments in MATLAB R2021a (with unit roundoff $u \approx 1.1 \times 10^{-16}$) on a machine with a Core i5 (1.60GHz) processor. To terminate the iteration procedure in the algorithms, we have used the criteria in (29), with the tolerances set to a prescribed value tol. The following terminology is used:

#iter: number of iterations;

relres: relative residual defined in (28);

$\widetilde{F}(X)$: value of the objective function defined in (6);

$\|\operatorname{grad}\widetilde{F}(X)\|_F$: norm of the Riemannian gradient defined in (13).
We have considered fixed values of the parameter $r$ in Algorithm 1 and of the step-size parameters in Algorithm 2. Most of the Moser–Veselov equations considered in the experiments do not satisfy the positive definiteness condition required by direct methods, but the associated Hamiltonian matrix $H$ (see (3)) has no pure imaginary eigenvalue. Experiment 4 is devoted to the case where $H$ has pure imaginary eigenvalues.
5.1 Experiment 1
This experiment involves several sets of Moser–Veselov equations with randomized matrices $J$ and $M$ of increasing orders, and aims at comparing Algorithm 1 with Algorithm 2 in terms of #iter, relres, and $\widetilde{F}(X)$, for a prescribed tolerance and the initial guess set to $X_0 = I$. The results are displayed through boxplots with whiskers in Figures 1 and 2. Experiment 1 involves 2000 Moser–Veselov equations and, as expected, there may be outliers, which are represented by red crosses in the graphs. All of them are finite, i.e., no NaN's or Inf's arose in the calculations. In most of the cases, these outliers reflect difficulties in the intermediate calculations involved in the algorithms, like the computation of inverses of ill-conditioned matrices.

A careful inspection of those figures leads us to conclude that Algorithm 1 gives the best results in terms of relative residuals, number of iterations, values of the objective function, and norms of the Riemannian gradient, despite involving a higher computational cost by iteration than Algorithm 2. However, since it requires much fewer iterations, the overall computational cost is, in general, smaller.
5.2 Experiment 2
This experiment involves sets of Moser–Veselov equations with randomized matrices $J$ and $M$ of increasing orders, and illustrates the performance of Algorithms 1 and 2 according to the choice of the initial guess $X_0$, in terms of the number of iterations #iter and of the relative residuals relres. The results of Algorithm 1 are displayed in Figure 3 and of Algorithm 2 in Figure 4, by means of boxplot graphs with whiskers. We observe that both algorithms perform much better if we take $X_0 = I$ instead of a randomized special orthogonal matrix.
5.3 Experiment 3
Now, we consider the same set of Moser–Veselov equations as in Experiment 2 (Sec. 5.2) to illustrate Algorithm 1, SOC Algorithm + fminunc (i.e., the algorithm described at the beginning of Sec. 3.1, with Step 2(a) solved using fminunc of MATLAB, where the gradient (16) is specified in the options), Algorithm 2, and Algorithm 3, in terms of relative residuals and number of iterations, for $X_0 = I$ and a prescribed tolerance. The results are displayed in Figure 5. Note that now we are using a larger tolerance than in Experiments 1 and 2. This is because, for smaller tolerances, both SOC Algorithm + fminunc and Algorithm 3 diverge frequently or may require thousands of iterations.

In Figure 6, we compare the computational time of Algorithm 1 with that of SOC Algorithm + fminunc, where we can see that the latter algorithm requires a large multiple of the computational time of the former. If we decrease the tolerance or increase the size of the matrices, the results of SOC Algorithm + fminunc get even worse and may have little practical interest. For these cases, we recommend instead Algorithms 1 or 2.
5.4 Experiment 4
In this experiment, the goal is to illustrate the behaviour of Algorithms 1, 2 and 3 when the Hamiltonian matrix $H$ (check (3)) has some pure imaginary eigenvalues associated to Jordan blocks of even sizes. We have taken ten Moser–Veselov equations involving randomized matrices that were carefully chosen to guarantee that $H$ fits the conditions just mentioned. The results are displayed in Figure 7. While in Experiments 1, 2 and 3 Algorithm 1 has performed very well in terms of the number of iterations, now it is the one that gives the poorest results (the largest number of iterations in all the ten tests)! For a compromise between the relative residual and the number of iterations, Algorithm 2 gives the best results.
5.5 Other experiments and considerations
Experiments with Algorithms 1 and 2 for Moser–Veselov equations involving randomized dense matrices of larger sizes were also performed (not shown here), for a fixed tolerance. In terms of the magnitude of the relative residual and of the computational time, we observed a gradual deterioration as the size of the matrices increased. So, our overall recommendation is that those algorithms are suitable for dense matrices of small/medium size.
We are not aware of existing algorithms for solving MoserVeselov equations involving dense (or sparse) matrices of large size. We leave these investigations for further research.
Methods for solving the optimization problem in (4) hardly give results with relative residuals of the order of the unit roundoff $u$. That is the price to pay for considering the objective function as the squared norm $\|XJ - JX^T - M\|_F^2$ instead of $\|XJ - JX^T - M\|_F$, whose expression is more difficult to handle. In iterative methods implemented in environments involving floating-point arithmetic, as $\widetilde{F}(X_k)$ becomes less than $u$, its value may stop decreasing. So, it is more reasonable to expect approximations with relative residuals of order $\sqrt{u}$.

6 Conclusions
This paper proposes three algorithms for solving the discrete Euler–Arnold equation for the generalized rigid body motion estimation: a Bregman splitting method (Algorithm 1), and two steepest descent-based methods (Algorithms 2 and 3). An essential advantage of these methods is that they do not require the strong condition that $J^2 + M^2/4$ be symmetric positive definite, in contrast with other methods existing in the literature. Important numerical issues related to the algorithms, like convergence, computational cost, residual estimates, and termination criteria, have been investigated in detail. The numerical experiments that have been carried out to evaluate the performance of the three methods suggest that the Bregman splitting Algorithm 1 is promising, at least for equations where the associated Hamiltonian matrix $H$ has no pure imaginary eigenvalue.
Acknowledgements: The work of João R. Cardoso was partially supported by the Centre for Mathematics of the University of Coimbra (UIDB/00324/2020), funded by the Portuguese Government through FCT/MCTES. Pedro Miraldo was partially supported by the LARSyS FCT Plurianual funding 2020–2023.
References
Abrudan, T., Eriksson, J., Koivunen, V., 2008. Steepest descent algorithms for optimization under unitary matrix constraint. Trans. Sig. Proc. 56, 1134–1147.
Absil, P.-A., Mahony, R., Sepulchre, R., 2007. Optimization Algorithms on Matrix Manifolds. Princeton University Press.
Al-Mohy, A., Higham, N., 2009. A new scaling and squaring algorithm for the matrix exponential. SIAM J. Matrix Anal. Appl. 31, 970–989.
Barzilai, J., Borwein, J., 1988. Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148.
Bloch, A., 2015. Nonholonomic Mechanics and Control. 2nd ed., Springer.
Bregman, L., 1967. The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex optimization. USSR Comput. Math. Math. Phys. 7, 200–217.
Campos, J., Cardoso, J.R., Miraldo, P., 2019. POSEAMM: A unified framework for solving pose problems using an alternating minimization method, in: IEEE Int'l Conf. Robotics and Automation (ICRA), pp. 3493–3499.
Cardoso, J., Leite, F., 2003. The Moser–Veselov equation. Linear Alg. Appl. 360, 237–248.
Cardoso, J., Leite, F., 2010. Exponentials of skew-symmetric matrices and logarithms of orthogonal matrices. J. Comput. Appl. Math. 233, 2867–2875.
Coleman, T., Li, Y., 1994. On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds. Mathematical Programming 67, 189–224.
Coleman, T., Li, Y., 1996. An interior, trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization 6, 418–445.
Dai, Y.-H., Fletcher, R., 2005. Projected Barzilai–Borwein methods for large-scale box-constrained quadratic programming. Numer. Math. 100, 21–47.
Dang, Q., Gui, H., Liu, K., Zhu, B., 2020. Relaxed-constraint pinpoint lunar landing using geometric mechanics and model predictive control. Journal of Guidance, Control, and Dynamics. URL: https://doi.org/10.2514/1.G005039.
Edelman, A., Arias, T., Smith, S., 1999. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20, 303–353.
Gao, B., Liu, X., Chen, X., Yuan, Y.-X., 2018. A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. 28, 302–332.
Golub, G., Reinsch, C., 1970. Singular value decomposition and least squares solutions. Numer. Math. 14, 403–420.
Guo, C.-H., Higham, N., 2006. A Schur–Newton method for the matrix pth root and its inverse. SIAM J. Matrix Anal. Appl. 28, 788–804.
Higham, N., 2008. Functions of Matrices: Theory and Computation. SIAM, Philadelphia, PA, USA.
Jiang, B., Dai, Y.-H., 2015. A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. 153, 535–575.
Kalabic, U., Gupta, R., Cairano, S.D., Bloch, A.M., Kolmanovsky, I.V., 2014. Constrained spacecraft attitude control on SO(3) using reference governors and nonlinear model predictive control. American Control Conference, 5586–5593.
Kalabic, U., Gupta, R., Cairano, S.D., Bloch, A.M., Kolmanovsky, I.V., 2017. MPC on manifolds with an application to the control of spacecraft attitude on SO(3). Automatica 76, 293–300.
Lai, R., Osher, S., 2014. A splitting method for orthogonality constrained problems. J. Sci. Comput. 58, 431–449.
Lee, T., Leok, M., McClamroch, N., 2007. Lie group variational integrators for the full body problem in orbital mechanics. Celestial Mech. Dyn. Astr. 98, 121–144.
Luenberger, D., Ye, Y., 2015. Linear and Nonlinear Programming. Springer.
Lütkepohl, H., 1996. Handbook of Matrices. 1st ed., John Wiley and Sons.
Manton, J., 2002. Optimization algorithms exploiting unitary constraints. Trans. Sig. Proc. 50, 635–650.
McLachlan, R., Zanna, A., 2005. The discrete Moser–Veselov algorithm for the free rigid body, revisited. Found. Comput. Math. 5, 87–123.
Miraldo, P., Cardoso, J.R., 2020. On the generalized essential matrix correction: An efficient solution to the problem and its applications. J. Math. Imaging Vis. 62, 1107–1120.
Moser, J., Veselov, A.P., 1991. Discrete versions of some classical integrable systems and factorization of matrix polynomials. Commun. Math. Phys. 139, 217–243.
Nocedal, J., Wright, S., 2006. Numerical Optimization. 2nd ed., Springer, New York, NY, USA.
Nordkvist, N., Sanyal, A., 2010. A Lie group variational integrator for rigid body motion in SE(3) with applications to underwater vehicle dynamics. IEEE Conference on Decision and Control (CDC), 5414–5419.
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W., 2005. An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4, 460–489.
Polak, E., 1997. Optimization: Algorithms and Consistent Approximations. Springer-Verlag.
Terán, F.D., Dopico, F., 2011. Consistency and efficient solution of the Sylvester equation for congruence: . Electron. J. Linear Algebra 22, 849–863.
Wen, Z., Yin, W., 2013. A feasible method for optimization with orthogonality constraints. Math. Program. 142, 397–434.
Zhu, X., 2017. A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Comp. Opt. and Appl. 67, 73–110.