1 Introduction
Given a square complex matrix $A \in \mathbb{C}^{n\times n}$ and a scalar valued function $f$ defined on the spectrum of $A$ [11, Def. 1.1], the notation $f(A)$ means that "$f(A)$ is a primary function of $A$" in the usual sense, as considered in [11] and [13, Ch. 6]. We refer the reader to those books for background on matrix functions. If a prior Schur decomposition $A = QTQ^*$ ($Q$ unitary and $T$ upper triangular) has been computed, the problem of computing $f(A)$ is reduced to that of computing $f(T)$, because $f(A) = Q\,f(T)\,Q^*$.
In this paper, we begin by stating, in Section 2, a preconditioning technique that can be easily incorporated into algorithms for computing functions of triangular matrices. This technique can be used, in particular, to:

Increase the efficiency of algorithms, especially those involving scaling and squaring (e.g., the matrix exponential) or inverse scaling and squaring techniques (the matrix logarithm and matrix $p$th roots);

Reduce the norms of both $T$ and $f(T)$ and, in general, the relative condition number of $f$ at $T$, thus making the problem of approximating $f(T)$ better conditioned.
In Section 3, we prove that the preconditioning technique reduces the norms of the original matrix $T$ and of $f(T)$. Other properties related to Fréchet derivatives and condition numbers are investigated. In Section 4, an error analysis of the technique is provided, and it is shown to be numerically stable. Two experiments, one on the inverse scaling and squaring method for the matrix logarithm and one on an algorithm for computing the inverse cosine of a matrix, are presented in Section 5. In Section 6, some conclusions are drawn.
Unless otherwise stated, $\|\cdot\|$, $\|\cdot\|_F$ and $\|\cdot\|_2$ will denote, respectively, a general subordinate matrix norm, the Frobenius norm and the 2-norm (also known as the spectral norm). For a given $A = [a_{ij}]$, $|A|$ stands for the matrix of the absolute values of the entries of $A$.
2 The preconditioning technique
Let $T = [t_{ij}] \in \mathbb{C}^{n\times n}$ be an upper triangular matrix and let $\alpha$ be a positive scalar. Without loss of generality, let us assume throughout the paper that $\alpha \ge 1$. Consider the diagonal matrix
(2.1) $D = \operatorname{diag}\!\left(1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\right)$
and let us denote $\widetilde{T} := D T D^{-1}$.
Given a complex valued function $f$ defined on the spectrum of $T$, the following steps describe, at a glance, the proposed preconditioning technique for computing $f(T)$:

Choose a suitable scalar $\alpha \ge 1$;

Compute $f(\widetilde{T}) = f(DTD^{-1})$ using a certain algorithm;

Recover $f(T) = D^{-1} f(\widetilde{T})\, D$.
A discussion on the choice of $\alpha$ will be provided at the end of Section 4. Step 3 is based on the fact that primary matrix functions preserve similarity transformations (see [11, Thm. 1.13]), so that $f(DTD^{-1}) = D f(T) D^{-1}$. Note that the similarity transformation in Step 3 can magnify the absolute error of the computed approximation to $f(\widetilde{T})$, thus resulting in a larger error in the approximation obtained to $f(T)$. This issue will be discussed in Section 4, where we will see that such errors are not larger than the ones resulting from the application of the algorithms without preconditioning.
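The three steps above can be sketched in a few lines of NumPy/SciPy (the paper's own codes are in MATLAB; the helper name `precondition_fun`, the test matrix and the use of SciPy's `expm` as the "certain algorithm" of Step 2 are our illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

def precondition_fun(T, alpha, fun):
    """Evaluate fun(T) for an upper triangular T via the similarity
    T~ = D T D^{-1}, with D = diag(1, alpha, ..., alpha^(n-1))."""
    n = T.shape[0]
    d = alpha ** np.arange(n)               # diagonal of D
    T_tilde = T * np.outer(d, 1.0 / d)      # Steps 1-2 prep: D T D^{-1}, entrywise
    F_tilde = fun(T_tilde)                  # Step 2: any algorithm for fun
    return F_tilde * np.outer(1.0 / d, d)   # Step 3: recover D^{-1} f(T~) D

# badly scaled upper triangular matrix: a huge superdiagonal entry
T = np.array([[1.0, 1e6], [0.0, 1.0]])
F = precondition_fun(T, 1e6, expm)          # matches expm(T) = e * [[1, 1e6], [0, 1]]
```

Note that the diagonal scalings are applied entrywise via outer products, so $D$ is never formed or inverted explicitly.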
To gain insight into the effects of the left multiplication of $T$ by $D$ and of the right multiplication by $D^{-1}$, we write
(2.2) $T = T_0 + T_1 + \cdots + T_{n-1},$
where $T_0$ is a diagonal matrix formed by the diagonal of $T$ and zeros elsewhere, $T_1$ is formed by the first superdiagonal of $T$ and zeros elsewhere, and so on, up to $T_{n-1}$. Then
$\widetilde{T} = D T D^{-1} = T_0 + \alpha^{-1} T_1 + \alpha^{-2} T_2 + \cdots + \alpha^{-(n-1)} T_{n-1},$
which means that the proposed technique just involves multiplications/divisions of certain entries of the matrix $T$ by powers of the positive scalar $\alpha$.
For instance, if $T = [t_{ij}]$ ($1 \le i, j \le 3$) is an upper triangular matrix ($t_{ij} = 0$ for $i > j$), we have
$\widetilde{T} = DTD^{-1} = \begin{bmatrix} t_{11} & t_{12}/\alpha & t_{13}/\alpha^2 \\ 0 & t_{22} & t_{23}/\alpha \\ 0 & 0 & t_{33} \end{bmatrix}.$
Hence, writing $f(\widetilde{T}) = [\widetilde{f}_{ij}]$,
$f(T) = D^{-1} f(\widetilde{T})\, D = \begin{bmatrix} \widetilde{f}_{11} & \alpha\,\widetilde{f}_{12} & \alpha^2\,\widetilde{f}_{13} \\ 0 & \widetilde{f}_{22} & \alpha\,\widetilde{f}_{23} \\ 0 & 0 & \widetilde{f}_{33} \end{bmatrix}.$
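The entrywise action of the similarity can be checked numerically; a small NumPy snippet (our illustration) confirming that $DTD^{-1}$ multiplies entry $(i,j)$ by $\alpha^{i-j}$, i.e., divides the $k$th superdiagonal by $\alpha^k$:

```python
import numpy as np

n, alpha = 3, 10.0
T = np.triu(np.arange(1.0, 10.0).reshape(3, 3))   # generic 3x3 upper triangular
D = np.diag(alpha ** np.arange(n))
T_tilde = D @ T @ np.linalg.inv(D)                # similarity D T D^{-1}

# entrywise form: entry (i, j) of T is multiplied by alpha^(i-j),
# so the k-th superdiagonal is divided by alpha^k
i, j = np.indices((n, n))
T_entry = T * alpha ** (i - j)
```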
To illustrate the benefits of the proposed preconditioning technique, let us consider the problem of computing the exponential of the matrix
(2.3) 
by the classical scaling and squaring method, as described in [8]. Before using Taylor or Padé approximants, $T$ has to be scaled by $2^{-s}$ so that the norm of $2^{-s}T$ is small enough. We easily find that the smallest $s$ verifying that condition is $s = 21$, thus making it necessary to carry out at least 21 squarings for evaluating $e^T$. In contrast, if we apply our preconditioning technique, it is enough to take $s = 1$, meaning that the computation of $e^T$ involves only one squaring and the multiplication of an entry of the matrix by a scalar, which is a very inexpensive procedure.

In the last decades, the scaling and squaring method has been subject to significant improvements; see, in particular, [1, 10]. The function expm of the recent versions of MATLAB implements the method proposed in [1], where a sophisticated technique is used to find a suitable number of squarings. Such a technique is based on the magnitude of $\|T^k\|^{1/k}$ instead of $\|T\|$, which may lead to a considerable reduction in the number of squarings. For instance, if we compute $e^T$, where $T$ is the matrix given in (2.3), by [1, Alg. 5.1], no squaring is required. Note that this does not represent a failure of our proposal, because the preconditioning technique described above can be combined easily with any method for approximating the matrix exponential (or any other primary matrix function), in particular, with the new scaling and squaring method of [1]. For instance, the computation of the exponential of the matrix in (2.4) involves squarings if computed directly by expm (which implements [1, Alg. 5.1]) and no squarings if preconditioned. The main reason is that, after preconditioning, the norm of $\widetilde{T} = DTD^{-1}$ is small and, consequently, $\|\widetilde{T}^k\|^{1/k}$ is small for any positive integer $k$.
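The effect on the scaling parameter can be illustrated as follows; the threshold $\theta = 1/2$ and the $2\times 2$ test matrix with a $10^6$ superdiagonal entry are our own stand-ins for the elided condition and for the matrix (2.3):

```python
import numpy as np

def num_squarings(T, theta=0.5):
    """Smallest s with ||2^{-s} T||_1 <= theta (classical norm-based criterion)."""
    nrm = np.linalg.norm(T, 1)
    if nrm <= theta:
        return 0
    return int(np.ceil(np.log2(nrm / theta)))

T = np.array([[1.0, 1e6], [0.0, 1.0]])     # stand-in with a huge (1,2) entry
alpha = 1e6
d = alpha ** np.arange(2)
T_tilde = T * np.outer(d, 1.0 / d)         # preconditioned matrix [[1, 1], [0, 1]]

s_direct, s_prec = num_squarings(T), num_squarings(T_tilde)   # 21 vs 2
```

With the illustrative threshold, preconditioning drops the squaring count from 21 to 2; the exact count for $\widetilde{T}$ depends on the threshold actually used.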
Similarly, our technique can be used to reduce the number of square roots in the inverse scaling and squaring method for the matrix logarithm. The function logm implements the method provided in [2, Alg. 5.2], where, as for the matrix exponential, the estimation of the number of square roots is based on the magnitude of $\|T^k\|^{1/k}$ instead of $\|T\|$. To illustrate the gain that preconditioning a matrix may bring, let us consider
(2.4) 
(see [2, p. C163]). Directly computing $\log(T)$ by logm involves the computation of 16 square roots, while a preconditioning of $T$ with $\alpha = \|N\|_F$, where $N$ denotes the nilpotent part of $T$, requires only 4 square roots, without any sacrifice in the accuracy of the computed logarithm (see the results for $T_2$ in Table 1).
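A rough illustration of the square-root savings, using a plain norm-based stopping rule (the matrix, the threshold $\theta = 1/4$ and the criterion $\|X - I\|_1 \le \theta$ are our simplifications; logm's actual criterion uses $\|A^k\|^{1/k}$):

```python
import numpy as np
from scipy.linalg import sqrtm

def num_sqrts(T, theta=0.25, kmax=60):
    """Square roots taken until ||X - I||_1 <= theta (simplified criterion)."""
    X, k = T.astype(complex), 0
    n = T.shape[0]
    while np.linalg.norm(X - np.eye(n), 1) > theta and k < kmax:
        X = sqrtm(X)
        k += 1
    return k

T = np.array([[1.0, 1e8], [0.0, 1.0]])     # stand-in for a badly scaled matrix
d = 1e8 ** np.arange(2)
T_tilde = T * np.outer(d, 1.0 / d)         # [[1, 1], [0, 1]]

k_direct, k_prec = num_sqrts(T), num_sqrts(T_tilde)   # 29 vs 2
```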
3 Properties
To understand the potential of the proposed preconditioning technique, we show in this section that, for any $\alpha \ge 1$, it reduces the Frobenius norms of both $T$ and $f(T)$. We also provide insight into why, in all the examples we have tested, the norm of the Fréchet derivative is reduced as well, thus making the problems better conditioned.
Proposition 3.1.
Let us assume that: $T \in \mathbb{C}^{n\times n}$ is an upper triangular matrix, $\alpha \ge 1$, $f$ is a complex valued function defined on the spectrum of $T$, and $D$ is defined as in (2.1). Denoting $\widetilde{T} = DTD^{-1}$, the following inequalities hold:

$\|\widetilde{T}\|_F \le \|T\|_F$;

$\|f(\widetilde{T})\|_F \le \|f(T)\|_F$.
Proof.
∎
Using (3.3), we can evaluate the difference between the squares of the norms $\|T\|_F$ and $\|\widetilde{T}\|_F$. Likewise, the difference between the squares of $\|f(T)\|_F$ and $\|f(\widetilde{T})\|_F$ can be found from (3.6). In particular, if $T$ is nonsingular, we have $\|\widetilde{T}^{-1}\|_F \le \|T^{-1}\|_F$, which shows that $\kappa(\widetilde{T}) \le \kappa(T)$, where $\kappa(T) = \|T\|_F\,\|T^{-1}\|_F$ denotes the standard condition number of $T$ with respect to matrix inversion. This issue has also motivated us to investigate the absolute and relative condition numbers of $f$ at $T$, which are commonly defined through Fréchet derivatives ([11, Thm. 3.1]).
Given a map $f : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$, the Fréchet derivative of $f$ at $T$ in the direction of $E$ is a linear operator $L_f(T)$ that maps the "direction matrix" $E$ to $L_f(T,E)$ such that
$f(T+E) - f(T) - L_f(T,E) = o(\|E\|).$
The Fréchet derivative of the matrix function $f$ may not exist at $T$, but if it does it is unique and coincides with the directional (or Gâteaux) derivative of $f$ at $T$ in the direction $E$. Hence, the existence of the Fréchet derivative guarantees that, for any $E \in \mathbb{C}^{n\times n}$,
$L_f(T,E) = \lim_{h \to 0} \frac{f(T + hE) - f(T)}{h}.$
Any consistent matrix norm on $\mathbb{C}^{n\times n}$ induces the operator norm $\|L_f(T)\| := \max_{\|E\| = 1} \|L_f(T,E)\|$. Here we use the same notation to denote both the matrix norm and the induced operator norm.
Since $L_f(T,E)$ is linear in $E$, it is often important to consider the following vectorized form of the Fréchet derivative:
(3.7) $\operatorname{vec}\!\left(L_f(T,E)\right) = K_f(T)\operatorname{vec}(E),$
where $\operatorname{vec}$ stands for the operator that stacks the columns of a matrix into a long vector of size $n^2$, and $K_f(T) \in \mathbb{C}^{n^2 \times n^2}$ is the Kronecker form of the Fréchet derivative.
For more information on the Fréchet derivative and its properties see, for instance, [4, Ch. X] and [11, Ch. 3].
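As an illustration of (3.7), $K_f(T)$ can be assembled column by column from Fréchet derivatives in the canonical directions; the sketch below does this for $f = \exp$ with SciPy's `expm_frechet` (our illustrative choice of function and tool):

```python
import numpy as np
from scipy.linalg import expm_frechet

rng = np.random.default_rng(0)
n = 3
T = np.triu(rng.standard_normal((n, n)))

# column (j-1)*n + i of K_f(T) is vec(L_f(T, e_i e_j^T))
K = np.zeros((n * n, n * n))
for j in range(n):
    for i in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        K[:, j * n + i] = expm_frechet(T, E, compute_expm=False).flatten(order="F")

# check (3.7): vec(L_f(T, E)) = K_f(T) vec(E) for an arbitrary direction E
E = rng.standard_normal((n, n))
lhs = expm_frechet(T, E, compute_expm=False).flatten(order="F")
rhs = K @ E.flatten(order="F")
```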
Lemma 3.1.
For $i, j = 1, \ldots, n$, let $E_{ij} = e_i e_j^T$, where $e_i$ is the vector with $1$ in the $i$th position and zeros elsewhere. Using the same notation of Proposition 3.1, the following equality holds:
(3.8) $L_f(\widetilde{T}, E_{ij}) = \alpha^{\,j-i}\, D\, L_f(T, E_{ij})\, D^{-1}.$
Proof.
Through a simple calculation, we can see that, for any $i,j$, $D^{-1} E_{ij} D = \alpha^{\,j-i} E_{ij}$. Now, by the linearity and similarity properties of Fréchet derivatives, one arrives at:
$L_f(\widetilde{T}, E_{ij}) = L_f(DTD^{-1}, E_{ij}) = D\, L_f(T, D^{-1} E_{ij} D)\, D^{-1} = \alpha^{\,j-i}\, D\, L_f(T, E_{ij})\, D^{-1}.$
∎
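Identity (3.8) can be verified numerically for $f = \exp$ (again with SciPy's `expm_frechet`; the size, the scalar $\alpha$ and the pair $(i,j)$ are arbitrary):

```python
import numpy as np
from scipy.linalg import expm_frechet

rng = np.random.default_rng(1)
n, alpha = 4, 3.0
T = np.triu(rng.standard_normal((n, n)))
d = alpha ** np.arange(n)
T_tilde = T * np.outer(d, 1.0 / d)               # D T D^{-1}

i, j = 1, 3                                      # a fixed pair (0-based)
E = np.zeros((n, n))
E[i, j] = 1.0

lhs = expm_frechet(T_tilde, E, compute_expm=False)
L = expm_frechet(T, E, compute_expm=False)
rhs = alpha ** (j - i) * (L * np.outer(d, 1.0 / d))   # alpha^(j-i) D L D^{-1}
```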
Proposition 3.2.
Let $f(x) = x^p$, with $p$ a nonnegative integer, and let the assumptions of Proposition 3.1 hold. Then, for any $i, j = 1, \ldots, n$,
(3.9) $\|L_f(\widetilde{T}, E_{ij})\|_F \le \|L_f(T, E_{ij})\|_F.$
Proof.
By virtue of Lemma 3.1, (3.9) is trivial if $\alpha = 1$. Hence, from now on, we assume that $\alpha > 1$. Once more, using Lemma 3.1, the properties of primary matrix functions ensure that
$L_f(\widetilde{T}, E_{ij}) = \alpha^{\,j-i}\, D\, L_f(T, E_{ij})\, D^{-1}.$
Let us denote $B = [b_{kl}] = L_f(T, E_{ij}) = \sum_{r+s = p-1} T^r E_{ij} T^s$. Since the matrices $T^r$ and $T^s$ are upper triangular, after a few calculations it can be shown that $b_{kl} = 0$, for any $k > i$ or $l < j$. The remaining entries of $B$ may or may not be zero. Hence,
$\|L_f(\widetilde{T}, E_{ij})\|_F^2 = \sum_{k \le i,\; l \ge j} \left|\alpha^{\,j-i+k-l}\, b_{kl}\right|^2 \le \sum_{k \le i,\; l \ge j} |b_{kl}|^2 = \|L_f(T, E_{ij})\|_F^2,$
because $j-i+k-l \le 0$ whenever $k \le i$ and $l \ge j$, showing that (3.9) is valid. ∎
The Fréchet derivatives of the most used primary matrix functions are sums or integrals of functions like the one in Proposition 3.2. For instance, the Fréchet derivatives of the matrix exponential and of the matrix logarithm allow, respectively, the integral representations
(3.10) $L_{\exp}(T, E) = \int_0^1 e^{T(1-s)}\, E\, e^{Ts}\, ds$
and
(3.11) $L_{\log}(T, E) = \int_0^1 \left(s(T - I) + I\right)^{-1} E \left(s(T - I) + I\right)^{-1} ds$
(see [11]). More generally, a function $f$ that can be represented by a Taylor series expansion
$f(x) = \sum_{k=0}^{\infty} a_k x^k$
has a Fréchet derivative of the form [15]
(3.12) $L_f(T, E) = \sum_{k=1}^{\infty} a_k \sum_{r=0}^{k-1} T^r E\, T^{k-1-r},$
which involves sums of functions like $T^r E\, T^s$. Hence, the procedure used in the proof of inequality (3.9) can be easily extended to several Fréchet derivatives, including, in particular, (3.10), (3.11) and (3.12), thus showing that, for those functions,
(3.13) $\|L_f(\widetilde{T}, E_{ij})\|_F \le \|L_f(T, E_{ij})\|_F,$
for any $i, j = 1, \ldots, n$.
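Inequality (3.13) can be checked exhaustively over all canonical directions for $f = \exp$ (a SciPy-based illustration; the size and $\alpha$ are arbitrary):

```python
import numpy as np
from scipy.linalg import expm_frechet

rng = np.random.default_rng(2)
n, alpha = 4, 5.0
T = np.triu(rng.standard_normal((n, n)))
d = alpha ** np.arange(n)
T_tilde = T * np.outer(d, 1.0 / d)

# ||L_f(T~, E_ij)||_F <= ||L_f(T, E_ij)||_F for every canonical direction
reduced = True
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        lhs = np.linalg.norm(expm_frechet(T_tilde, E, compute_expm=False), "fro")
        rhs = np.linalg.norm(expm_frechet(T, E, compute_expm=False), "fro")
        reduced = reduced and lhs <= rhs * (1.0 + 1e-8)
```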
A well-known tool to understand how $f(T)$ changes as $T$ is subject to perturbations of first order is the absolute condition number of $f$ at $T$, whose value can be evaluated using the Fréchet derivative:
(3.14) $\operatorname{cond}_{\mathrm{abs}}(f, T) = \|L_f(T)\| = \max_{\|E\| = 1} \|L_f(T, E)\|.$
The relative condition number of $f$ at $T$ can be evaluated by the formula
(3.15) $\operatorname{cond}_{\mathrm{rel}}(f, T) = \operatorname{cond}_{\mathrm{abs}}(f, T)\, \frac{\|T\|}{\|f(T)\|}$
(see [11, Sec. 3.1]).
Let us recall how to evaluate $K_f(T)$. Once we know $L_f(T, E_{ij})$, for a given pair $(i,j)$, with $1 \le i, j \le n$, the equality (3.7) enables us to find the $\big((j-1)n + i\big)$th column of $K_f(T)$. Repeating the process for all pairs $(i,j)$, we can find all the entries of $K_f(T)$. Since, with respect to the Frobenius norm,
(3.16) $\|L_f(T)\|_F = \|K_f(T)\|_2$
(see [11, Sec. 3.4]), the absolute and relative condition numbers follow easily.
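Combining (3.7) with (3.14)–(3.16), the Frobenius-norm condition numbers can be estimated by building $K_f(T)$ explicitly; a sketch for $f = \exp$ (the helper `cond_expm` is our own, not the paper's logm_cond):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

def cond_expm(T):
    """Absolute and relative condition numbers of expm at T (Frobenius norm),
    via the Kronecker form: ||L_f(T)||_F = ||K_f(T)||_2."""
    n = T.shape[0]
    K = np.zeros((n * n, n * n))
    for j in range(n):
        for i in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            K[:, j * n + i] = expm_frechet(T, E, compute_expm=False).flatten(order="F")
    cond_abs = np.linalg.norm(K, 2)
    cond_rel = cond_abs * np.linalg.norm(T, "fro") / np.linalg.norm(expm(T), "fro")
    return cond_abs, cond_rel

T = np.array([[1.0, 100.0], [0.0, 1.0]])
d = 100.0 ** np.arange(2)
T_tilde = T * np.outer(d, 1.0 / d)
ca, cr = cond_expm(T)
ca_p, cr_p = cond_expm(T_tilde)        # both condition numbers drop
```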
We now compare the condition numbers corresponding to $T$ and $\widetilde{T}$, by analysing the values of $\|K_f(T)\|_2$ and $\|K_f(\widetilde{T})\|_2$. To simplify the exposition, let us denote $K = K_f(T)$ and $\widetilde{K} = K_f(\widetilde{T})$. Attending to (3.13), the Frobenius norm of the $s$th column of $\widetilde{K}$ is smaller than or equal to that of the $s$th column of $K$, for any $s = 1, \ldots, n^2$. This means that
$\|\widetilde{K} e_s\|_2 \le \|K e_s\|_2,$
where $e_s$ denotes the vector with one in the $s$th component and zeros elsewhere. Moreover, $|\widetilde{K}| \le |K|$ and, by applying the $\operatorname{vec}$ operator to both hand sides of (3.8), we can observe that the corresponding entries of $\widetilde{K}$ and $K$ have the same signs for all $i,j$. If $K$ (or $\widetilde{K}$) has nonnegative entries, the properties of the spectral norm ensure that
(3.17) $\|\widetilde{K}\|_2 \le \|K\|_2$
and, consequently,
(3.18) $\operatorname{cond}_{\mathrm{abs}}(f, \widetilde{T}) \le \operatorname{cond}_{\mathrm{abs}}(f, T).$
However, because the spectral norm is not absolute (that is, in general, $\|A\|_2 \ne \|\,|A|\,\|_2$; see, for instance, [7] and [14, Sec. 5.6]), in the case of $K$ (or $\widetilde{K}$) having simultaneously positive and negative entries, it is not fully guaranteed that (3.18) holds. Nevertheless, we believe that (3.18) fails only in a few exceptional cases. In all the tests we have carried out, we have not encountered any example for which (3.18) does not hold. The same can be said about the inequality $\operatorname{cond}_{\mathrm{rel}}(f, \widetilde{T}) \le \operatorname{cond}_{\mathrm{rel}}(f, T)$, which our tests suggest to be true in general.
4 Error Analysis
Let us denote $F = f(T)$ and $\widetilde{F} = f(\widetilde{T})$, where $\widetilde{T} = DTD^{-1}$. We assume that $\widehat{\widetilde{F}}$ is the approximation arising in the computation of $f(\widetilde{T})$ using a certain algorithm and that $\widehat{F} = D^{-1}\, \widehat{\widetilde{F}}\, D$ is the approximation to $f(T)$ that results from the multiplication of $\widehat{\widetilde{F}}$ by $D^{-1}$ on the left-hand side and by $D$ on the right-hand side. Recall that $D$ is defined by (2.1), where $\alpha \ge 1$. The entries of $F$ (resp., $\widetilde{F}$) are denoted by $f_{ij}$ (resp., $\widetilde{f}_{ij}$).
We will show below that the absolute and relative errors arising in the computation of $f(\widetilde{T})$ are magnified by the recovery step, but such a magnification has the same order of magnitude as the corresponding errors that come from the direct application of the algorithms. To get more insight, we analyse the componentwise absolute errors.
Let $\widehat{\widetilde{F}} = \widetilde{F} + \Delta$, with $\Delta = [\delta_{ij}]$. We have
$\widehat{F} = D^{-1}\, \widehat{\widetilde{F}}\, D = F + D^{-1} \Delta D, \qquad \left(D^{-1} \Delta D\right)_{ij} = \alpha^{\,j-i}\, \delta_{ij},$
where we can see that some entries of $\Delta$ are magnified by powers of $\alpha$, with the errors affecting the top-right entries of $\widehat{F}$ being much larger, especially whenever $\alpha$ is large. Hence, we may be misled into thinking that the proposed preconditioning technique would introduce large errors in the computation process. It turns out that the errors arising when one uses preconditioning are not larger than the ones occurring in direct methods (that is, methods without preconditioning). To understand why, let us consider the typical situation where a Padé approximant $r_{km}(x) = p_{km}(x)/q_{km}(x)$ is used to approximate $f(x)$. From the definition of Padé approximant, we have
$f(x) - r_{km}(x) = O\!\left(x^{k+m+1}\right)$
(see, for instance, [11, Sec. 4.4.2]). Hence, in a matrix scenario, there exist coefficients $c_l$ such that
$f(T) - r_{km}(T) = \sum_{l = k+m+1}^{\infty} c_l\, T^l, \qquad f(\widetilde{T}) - r_{km}(\widetilde{T}) = D \left( \sum_{l = k+m+1}^{\infty} c_l\, T^l \right) D^{-1},$
meaning that, after the recovery step, the truncation error with and without preconditioning is the same (we are not taking into account other types of errors, like roundoff errors). What happens is that, for a large $\alpha$, the entries in the top-right corner of $f(\widetilde{T}) - r_{km}(\widetilde{T})$ are much smaller than the corresponding ones in $f(T) - r_{km}(T)$. These observations remain valid for Taylor approximants (they are particular cases of Padé approximants) and for the Parlett recurrence (the details are not included here).
The theoretical results stated in the previous section require the condition $\alpha \ge 1$ for having a reduction in the norm and a smaller condition number. There is still the question of how to find a suitable $\alpha$. In the case of the matrix exponential, it is convenient to choose $\alpha$ guaranteeing that $\|\widetilde{T}\|$ is small, while for the matrix logarithm an appropriate $\alpha$ must lead to a small $\|\widetilde{T} - I\|$. This suggests that finding an optimal $\alpha$ for a given function $f$ and for a given matrix $T$ may be difficult. Instead, we propose the following heuristic, which in practice works very well.
Assume that $T = D_T + N$, where $D_T$ and $N$ are, respectively, the diagonal and nilpotent parts of $T$. According to the experiments that will be shown in Section 5, significant savings may occur if $\|D_T\|_F$ is small when compared with $\|N\|_F$, that is, if the quotient $\|D_T\|_F / \|N\|_F$ is small. Hence, $\alpha = \|N\|_F$ appears to be a very reasonable choice for making the preconditioning reliable (check the results above for matrices (2.3) and (2.4)), in particular for matrices having large norm. In addition, provided that $\alpha \ge 1$ and attending to (3.18), we see that algorithms for computing $f(\widetilde{T})$ are less sensitive to errors (including roundoff) than algorithms for $f(T)$.
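A sketch of the heuristic, reading the scale as the Frobenius norm of the nilpotent part (our reading of the elided formula; we also clamp at 1 to respect the requirement $\alpha \ge 1$):

```python
import numpy as np

def choose_alpha(T):
    """Heuristic scale alpha = ||N||_F, N the strictly upper (nilpotent) part.
    Clamped at 1 so that alpha >= 1 always holds (our own safeguard)."""
    N = np.triu(T, 1)
    return max(1.0, np.linalg.norm(N, "fro"))

T = np.array([[1.0, 1e6], [0.0, 2.0]])
alpha = choose_alpha(T)                    # 1e6
d = alpha ** np.arange(T.shape[0])
T_tilde = T * np.outer(d, 1.0 / d)         # norm drops from ~1e6 to ~2.4
```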
Another issue that must be taken into account is that the power $\alpha^{\,n-1}$ may overflow for very large values of $\alpha$ and $n$. In these cases, another strategy for choosing $\alpha$ is recommended. Another possibility is to extend the proposed preconditioning technique to block triangular matrices. This is a topic that needs further research.
Based on the facts above and bearing in mind that the proposed preconditioning technique just involves divisions and multiplications of entries of a matrix by a positive scalar, we can claim that it is a numerically stable procedure, provided that $\alpha$ is suitably chosen.
5 Numerical experiments
Before presenting the numerical experiments, we give some practical clues on how to implement the proposed preconditioning technique. We assume that an algorithm (let us call it Algorithm 1) for computing $f(T)$, with $T$ triangular, and a suitable $\alpha$ are available. Consider the MATLAB code displayed in Figure 1. Our preconditioning technique can be implemented in MATLAB using the following steps:

T1=multiply_by_alpha(T,alpha);

Run Algorithm 1 to approximate f(T1), storing the result in F1;

Recover $f(T)$ from F=multiply_by_alpha(F1,1/alpha).
Most of the effective algorithms for matrix functions involve $O(n^3)$ operations, while this preconditioning technique only involves $O(n^2)$. Thus, if the choice of $\alpha$ is such that some operations are saved (e.g., squarings, square roots, matrix products, …), such a preconditioning technique can contribute towards a reduction in the computational cost.
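The helper multiply_by_alpha of Figure 1 is not reproduced in the text; a Python analogue of what it presumably does (pure $O(n^2)$ entrywise scaling, with no matrix products or inversions) might read:

```python
import numpy as np

def multiply_by_alpha(T, alpha):
    """Entrywise form of D T D^{-1}: entry (i, j) is multiplied by alpha^(i-j).
    Calling it again with 1/alpha undoes the transformation."""
    n = T.shape[0]
    i, j = np.indices((n, n))
    return T * float(alpha) ** (i - j)

T = np.triu(np.random.default_rng(3).standard_normal((5, 5)))
T1 = multiply_by_alpha(T, 10.0)            # preconditioned matrix
T_back = multiply_by_alpha(T1, 1.0 / 10.0) # round trip recovers T
```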
The two experiments we present below were carried out in MATLAB R2019b (with unit roundoff $u \approx 1.1 \times 10^{-16}$).
Experiment 1.
In this first experiment, we have calculated the logarithm of the following triangular matrices, with no real negative eigenvalues (MATLAB style is used), by the MATLAB function logm, without and with preconditioning:

$T_1$ is a matrix whose exact logarithm is known explicitly;

$T_2$ is the matrix in (2.4);

$T_3$, $T_4$ and $T_5$ have been obtained from built-in MATLAB matrices;

$T_6$ to $T_{10}$ are randomized triangular matrices of several orders, with small entries in the diagonal and large entries in the superdiagonals.
The results are displayed in Table 1. We have used the following notation:

$\widetilde{T}_i = D T_i D^{-1}$, where $D$ is defined in (2.1) with $\alpha = \|N_i\|_F$, $N_i$ being the nilpotent part of $T_i$;

$s$ and $\widetilde{s}$ concern the number of square roots involved in the inverse scaling and squaring method, without and with preconditioning, respectively;

$e$ and $\widetilde{e}$ are the relative errors of the computed approximations, without and with preconditioning, respectively;

$\operatorname{cond}$ and $\widetilde{\operatorname{cond}}$ are estimates of the relative condition numbers obtained with the function logm_cond available in [12].
Excepting the matrix $T_1$, whose exact logarithm is known, we have considered as the exact $\log(T_i)$ the result of evaluating the logarithm at 200 decimal digit precision using the Symbolic Math Toolbox and rounding the result to double precision.
          s    s~    e        e~       cond. estimates (without / with)       quotient
T_1       5    0     9.8e-23  9.8e-23  1.1e+06  1.9e+00  3.3e+11  2.9e+00     1.4e-06
T_2      16    4     2.1e-16  2.1e-16  7.3e+04  9.9e-01  8.8e+19  5.4e+00     8.5e-06
T_3       6    4     5.3e-16  6.8e-16  4.7e+00  4.0e+00  8.7e+01  3.0e+01     1.1e+00
T_4       5    4     2.8e-16  7.7e-16  7.1e+00  6.3e+00  6.4e+02  9.4e+01     1.7e+00
T_5      11    4     8.7e-16  5.7e-16  6.2e+01  4.6e+01  2.9e+10  6.6e+02     1.1e+00
T_6      11    6     1.2e-15  2.1e-16  2.0e+01  1.6e+00  1.9e+11  2.0e+02     7.0e-02
T_7      11    5     4.4e-16  4.9e-16  3.2e+01  1.5e+00  9.8e+10  2.5e+01     4.7e-02
T_8      19    6     9.1e-16  6.6e-16  5.0e+01  1.1e+00  3.7e+21  3.0e+02     2.1e-02
T_9      10    6     1.8e-15  5.4e-16  1.2e+01  1.5e+00  1.5e+09  7.5e+01     1.2e-01
T_10     13    5     7.2e-16  2.7e-16  3.1e+01  1.5e+00  1.4e+13  4.3e+01     4.2e-02
Experiment 2. In this experiment, we have implemented Algorithm 5.2 in [3] for computing the inverse cosine of a matrix, $\operatorname{acos}(A)$, with and without preconditioning. MATLAB codes are available at https://github.com/higham/matrixinvtrighyp and the algorithm is called by acosm. This algorithm involves a prior Schur decomposition, thus being well suited to be combined with preconditioning. We have considered ten matrices from MATLAB's gallery, of several sizes, with no eigenvalues in the set $\{-1, 1\}$. The left plot of Figure 2 compares the relative errors of the computed approximations, where the exact $\operatorname{acos}(A)$ results from evaluating the inverse cosine at 200 decimal digit precision using the Symbolic Math Toolbox and rounding the result to double precision. The right plot displays the number of square roots in the inverse scaling and squaring procedure required by [3, Alg. 5.2] with preconditioning, $\widetilde{s}$, and without preconditioning, $s$.
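SciPy has no acosm; for readers wanting a quick stand-in (not the Schur-based [3, Alg. 5.2]), the inverse cosine can be obtained from the principal matrix logarithm via $\operatorname{acos}(A) = -\mathrm{i}\log\!\big(A + \mathrm{i}(I - A^2)^{1/2}\big)$:

```python
import numpy as np
from scipy.linalg import cosm, logm, sqrtm

def acosm(A):
    """Inverse matrix cosine via the principal logarithm (illustrative only;
    [3, Alg. 5.2] uses a Schur decomposition and inverse scaling and squaring)."""
    n = A.shape[0]
    A = A.astype(complex)
    S = sqrtm(np.eye(n) - A @ A)           # principal square root of I - A^2
    return -1j * logm(A + 1j * S)

A = np.array([[0.3, 0.1], [0.0, 0.5]])     # triangular, eigenvalues away from +-1
X = acosm(A)                               # cosm(X) recovers A
```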
A careful analysis of the results displayed in Table 1 shows that a combination of the proposed preconditioning technique with logm and an appropriate choice of $\alpha$ brings many benefits, namely:

A significant reduction in the number of square roots required, especially when the norm of the diagonal part of $T_i$ is small in comparison with the norm of the nilpotent part (check the second, the third and the last columns);

A stabilization or reduction of the magnitude of relative errors (columns 4 and 5);

A decrease in the relative condition number of the matrix logarithm (columns 6 to 9).
Similar observations hold for the results of Figure 2, namely lower relative errors and fewer square roots when the preconditioning technique is incorporated in [3, Alg. 5.2].
6 Conclusions
We have proposed an inexpensive preconditioning technique aimed at improving algorithms for evaluating functions of triangular matrices, in terms of computational cost and accuracy. It is particularly well suited to be combined with algorithms involving a prior Schur decomposition. Such a technique involves a scalar $\alpha$ that needs to be carefully chosen. We have presented a practical strategy for finding such a scalar, which has given good results in experiments involving the matrix exponential, the matrix logarithm and the matrix inverse cosine.
References
 [1] A. H. Al-Mohy, N. J. Higham, A new scaling and squaring algorithm for the matrix exponential, SIAM J. Matrix Anal. Appl., 31, 970–989 (2009).
 [2] A. H. Al-Mohy and N. J. Higham, Improved inverse scaling and squaring for the matrix logarithm, SIAM J. Sci. Comput., 34, 153–169 (2012).
 [3] M. Aprahamian and N. J. Higham, Matrix Inverse Trigonometric and Inverse Hyperbolic Functions: Theory and Algorithms, SIAM J. Matrix Anal. Appl., 37, 1453–1477 (2016).
 [4] R. Bhatia, Matrix Analysis, Springer-Verlag, New York (1997).
 [5] J. R. Cardoso, A. Sadeghi, Computation of matrix gamma function, BIT Numer. Math., 59, 343–370 (2019).
 [6] P. A. Davies and N. J. Higham, A Schur–Parlett algorithm for computing matrix functions, SIAM J. Matrix Anal. Appl., 25, 464–485 (2003).
 [7] R. Mathias, The spectral norm of a nonnegative matrix, Linear Alg. Appl., 139, 269–284 (1990).
 [8] C. Moler and C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45, 3–49 (2003).
 [9] R. Garrappa, M. Popolizio, Computing the matrix Mittag-Leffler function with applications to fractional calculus, J. Sci. Comput., 77, 129–153 (2018).
 [10] N. J. Higham, The scaling and squaring method for the matrix exponential revisited, SIAM J. Matrix Anal. Appl., 26, 1179–1193 (2005).
 [11] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, PA, USA (2008).

 [12] N. J. Higham, The Matrix Function Toolbox, http://www.maths.manchester.ac.uk/higham/mftoolbox/.
 [13] R. A. Horn, C. R. Johnson, Topics in Matrix Analysis, Cambridge Univ. Press, Cambridge, UK, Paperback Edition (1994).
 [14] R. A. Horn, C. R. Johnson, Matrix Analysis, 2nd Ed., Cambridge University Press (2013).
 [15] C. S. Kenney, A. J. Laub, Condition estimates for matrix functions, SIAM J. Matrix Anal. Appl., 10, 191–209 (1989).