Parameter estimation problems appear often in adaptive control, system identification, signal processing, and many other areas. Suppose that the process is described by the following equation:
is the regressor vector and is the vector of unknown parameters to be estimated.
Introduction of the model in the form
and minimization of the following function associated with the mismatch between the model and the process
yields the following restricted linear system of equations:
which should be solved with respect to the unknown parameter vector. An accurate and computationally efficient solution of (4) is the most significant part of the estimation problem. The solution may benefit from the properties of the matrix, which depend on the type of regressor and on the step number associated with the window size. The matrix is SPD (symmetric and positive definite) in many cases. For example, the matrix is SPD for systems with harmonic regressor. In the general case, equation (4) can be multiplied by the transpose of any invertible matrix, which transforms the system to the SPD case with the Gram matrix.
Notice that the Gram matrix is often ill-conditioned due to the squaring of the condition number.
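The squaring of the condition number is easy to observe numerically. The following sketch (with an illustrative matrix, not taken from the paper) builds a matrix with a prescribed condition number and forms its Gram matrix:

```python
import numpy as np

# Illustration: transforming a system to the SPD case via the Gram
# matrix squares the condition number.
rng = np.random.default_rng(0)

# Build a symmetric matrix with condition number 1e3 by construction.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # random orthogonal Q
S = np.diag(np.logspace(0, 3, 6))                  # eigenvalues 1 ... 1e3
A = Q @ S @ Q.T

G = A.T @ A                                        # Gram matrix (SPD)

print(np.linalg.cond(A))                           # ~1e3
print(np.linalg.cond(G))                           # ~1e6 = cond(A)**2
```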
The matrix is an SPD ill-conditioned matrix in many settings: (a) the information matrix for systems with harmonic regressor and short window sizes, (b) the Gram matrix, due to the squaring of the condition number in the least squares method, (c) the mass matrix in the finite element method and the (lumped) mass matrix for mechanical systems with singular perturbations, and (d) the state matrix for systems of linear equations, among many other applications. Moreover, many processes in practice are singularly perturbed (have modes with different time scales) and are considered as stiff and ill-conditioned systems, which potentially extends the application areas of this model. In other words, the model (4) with an SPD (and possibly ill-conditioned) matrix covers almost all cases in parameter estimation applications.
Ill-conditioning implies robustness problems (sensitivity to numerical calculations) and imposes additional requirements on the accuracy of the solution of (4) (which are especially pronounced in finite digit calculations), since small changes in the data due to measurement, truncation, accumulation, rounding and other errors result in significant changes in the solution. In addition, ill-conditioning implies slow convergence of iterative procedures.
These accuracy requirements are the main motivation for the application of the Richardson iteration, which is driven by the residual error and has a filtering (averaging) property. The residual error is smoothed and remains bounded for a sufficiently large step number in this iteration, where the bound depends on the inaccuracies, providing the best possible solution in finite digit calculations.
Therefore, convergence rate improvement and reduction of the computational complexity of the Richardson iteration are among the most important problems in the area. One way to improve the convergence rate is to introduce Newton-Schulz and Durand algorithms into the Richardson iteration.
This paper starts with a short systematic overview (presented in a unified tutorial fashion) of Newton-Schulz and Durand iterations in Section 2. Special attention is paid to a practical introduction to power series factorizations for the reduction of the computational complexity in Section 3. The simplest integration of Newton-Schulz algorithms into the Richardson iteration is presented in Section 4, with application to a practical problem in Section 5. The problem is associated with failure detection via estimation of the frequency content of the signals in an electrical system in the presence of missing data. The detection is performed on real signals with a failure simulated in software. The efficiency of the approach is demonstrated in Section 5. The paper ends with brief conclusions in Section 6.
2 Overview of Iterative Matrix Inversion Algorithms: Mini Tutorial
A brief overview of iterative matrix inversion algorithms with convergence analysis is presented in this section. The overview proceeds systematically (from simple algorithms to more sophisticated ones) in chronological sequence, starting with the second order Newton-Schulz algorithms in Section 2.1, followed by the description of high order Newton-Schulz algorithms in Section 2.2. Computationally efficient Durand iterations are described in Section 2.3, and finally the combination of high order Newton-Schulz and Durand algorithms for convergence rate improvement is presented in Section 2.4.
2.1 Second Order Newton-Schulz Iterations
is the identity matrix. Evaluation of the error in the first step yields:
which gives the error model:
Algorithm (6), which is also known as the Newton-Schulz-Hotelling matrix iteration, estimates the inverse of the matrix via minimization of the estimation error, provided that the spectral radius of the initial matrix
is less than one, where the preconditioner is an initial guess of the inverse. The proof of convergence is presented in the literature, where
the algorithm (6) is written in the following form:
, where .
The order of the iterative algorithm can be increased for convergence rate improvement by introducing a power series into algorithm (6).
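The second order iteration above can be sketched as follows. The initial guess chosen here (a standard scaled transpose, an assumption not taken from the paper's formulas) guarantees that the spectral radius condition holds:

```python
import numpy as np

def newton_schulz(A, steps=30):
    """Second order Newton-Schulz iteration X_{k+1} = X_k (2 I - A X_k).

    The initial guess X_0 = A.T / (||A||_1 * ||A||_inf) ensures that the
    spectral radius of (I - A X_0) is less than one, which is the
    convergence condition stated in the text.
    """
    n = A.shape[0]
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(n)
    for _ in range(steps):
        X = X @ (2.0 * I - A @ X)           # quadratic error contraction
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])      # small SPD example matrix
X = newton_schulz(A)
print(np.max(np.abs(X @ A - np.eye(2))))    # error near machine precision
```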
2.2 High Order Newton-Schulz Iterations
High order algorithms (also known as hyperpower iterative algorithms, widely used for the calculation of generalized inverses) can be presented in the following form:
The error in (9) can be written in the following form:
with the definition above. Notice that this error is better aligned with the Richardson iteration (see Section 4) and is therefore considered further in this paper.
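A hyperpower iteration of order p multiplies the current estimate by a truncated power series in the residual matrix; order p = 2 recovers the classical Newton-Schulz iteration. A minimal sketch (initial guess and example matrix are illustrative assumptions):

```python
import numpy as np

def hyperpower(A, order=3, steps=10):
    """Hyperpower (high order Newton-Schulz) iteration of order p:
       E_k = I - A X_k,  X_{k+1} = X_k (I + E_k + ... + E_k^{p-1})."""
    n = A.shape[0]
    I = np.eye(n)
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(steps):
        E = I - A @ X
        S, P = I.copy(), I.copy()
        for _ in range(order - 1):          # accumulate the power series in E
            P = P @ E
            S = S + P
        X = X @ S                           # error contracts with order p
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])
X = hyperpower(A, order=3)
print(np.max(np.abs(A @ X - np.eye(2))))
```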
2.3 Durand Iterations
The computationally efficient matrix inversion method described by Durand can be derived from the relation associated with the initial error in the Newton-Schulz approach by substitution as follows:
The error models (14), (15) are valid for the algorithm
Higher order Durand iterations are discussed in the literature.
Notice that the convergence rate of Newton-Schulz iterations is significantly higher than that of Durand iterations. However, the Durand iteration (13) requires only one matrix product in each step.
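Since the equations are not reproduced here, the following sketch interprets the "one matrix product per step" remark as the recursive evaluation of the truncated Neumann series: with F = I - alpha*A, the recursion P_{k+1} = I + F P_k accumulates I + F + F^2 + ..., so alpha*P_k converges linearly to the inverse. This reading of (13) is an assumption:

```python
import numpy as np

def durand(A, steps=80):
    """Durand-type iteration sketch: one matrix product per step,
    linear (geometric) convergence governed by the spectral radius of F."""
    n = A.shape[0]
    I = np.eye(n)
    alpha = 1.0 / np.linalg.norm(A, np.inf)  # ensures rho(F) < 1 for SPD A
    F = I - alpha * A
    P = I.copy()
    for _ in range(steps):
        P = I + F @ P                        # single matrix product per step
    return alpha * P                         # alpha * sum_j F^j -> inv(A)

A = np.array([[4.0, 1.0], [1.0, 3.0]])
X = durand(A)
print(np.max(np.abs(A @ X - np.eye(2))))
```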
Notice also that the convergence performance can be enhanced by introducing an initial power series as preconditioning (calculated only once) in the Newton-Schulz and Durand iterations. The order of the initial power series can be chosen sufficiently high, which essentially improves the convergence rate. The computational complexity of the initial power series can be reduced via factorizations; see Section 3 for details.
The convergence rate can also be improved via a combination of two Newton-Schulz iterations, which covers Durand iterations as a special case; see Section 2.4.
2.4 Convergence Rate Improvement via Combination of High Order Newton-Schulz Iterations
Consider the following combination of two Newton-Schulz iterations associated with a two-step iterative method:
where the initial condition satisfies (8). Notice that the Durand iteration (13) can be derived from (16) - (19) with a special choice of the parameters. The error model (20) can be proved via multiplication of (19) and explicit evaluation of the error (similar to Section 2.2) as follows:
The error model (21) is obtained via explicit evaluation of (22).
The algorithm (16) - (21) has two independent and equally complex computational parts, which can be evaluated separately. This algorithm is ideally suited for parallel implementation.
An extended version of this survey is presented in Stotsky A., Recursive Versus Nonrecursive Richardson Algorithms: Systematic Overview, Unified Frameworks and Application to Electric Grid Power Quality Monitoring, Automatika, vol. 63, N 2, pp. 328-337, 2022, https://doi.org/10.1080/00051144.2022.2039989.
Further developments can be found in Stotsky A., Simultaneous Frequency and Amplitude Estimation for Grid Quality Monitoring: New Partitioning with Memory Based Newton-Schulz Corrections, IFAC PapersOnLine 55-9, pp. 42-47, 2022, https://www.sciencedirect.com/science/article/pii/S2405896322003937.
3 Reduction of Computational Complexity via Unified Factorizations
The following high order Newton-Schulz power series can be factorized for composite orders as follows:
Taking into account that every prime number follows a composite number, the factorization for prime orders can be performed as follows:
where the first quantity is the order of the algorithm and the second is the number of matrix products per cycle. For illustration of the unified factorization approach, a few examples with increasing efficiency index (EI) are presented below.
3.1 Case of , ,
which requires matrix products with .
3.2 Case of , ,
which requires matrix products with .
3.3 Case of , ,
which requires matrix products with .
3.4 Case of , ,
which requires matrix products with .
Notice that the idea of factorization is associated with the Newton-Schulz method. Therefore, low order Newton-Schulz algorithms (of order two or three), iterated for a number of steps, can be applied instead of factorized higher order Newton-Schulz iterations for the sake of robustness and efficiency.
Notice that three steps of the second order Newton-Schulz iteration are equivalent to one step of the eighth order iteration with the same number of matrix products, see (28). However, the eleventh order algorithm, see (29), requires the same number of matrix products while providing faster convergence than three steps of the second order Newton-Schulz iteration. This algorithm also has the highest efficiency index among the algorithms presented above. Therefore, the choice of the order and factorization made for each particular application should represent a trade-off between robustness and convergence rate.
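The saving from factorization can be verified numerically. One possible factorization of the eighth order series (the matrix F here is an illustrative residual matrix, not from the paper) is I + F + ... + F^7 = (I + F)(I + F^2)(I + F^4), which needs 4 matrix products instead of 7 for the term-by-term evaluation:

```python
import numpy as np

rng = np.random.default_rng(1)
F = 0.1 * rng.standard_normal((5, 5))        # small residual-type matrix
I = np.eye(5)

# Term-by-term evaluation of I + F + ... + F^7 (7 matrix products).
S_naive, P = I.copy(), I.copy()
for _ in range(7):
    P = P @ F
    S_naive = S_naive + P

# Factorized evaluation (4 matrix products: F^2, F^4 and two series products).
F2 = F @ F
F4 = F2 @ F2
S_fact = (I + F) @ (I + F2) @ (I + F4)

print(np.max(np.abs(S_naive - S_fact)))      # agreement up to round-off
```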
4 Richardson Iterations
Many practical problems can be associated with a restricted linear system of equations:
which should be solved for the unknown vector. The Richardson iteration is the most promising solution to this problem, providing the best possible accuracy in finite digit calculations. Although the inverse of the matrix is not required in order to find the solution, the approximate inverse essentially improves the convergence rate of the Richardson iteration. The simplest combination of matrix inversion techniques and the Richardson algorithm can be presented as follows:
where the estimate of the solution, the estimate of the inverse of the matrix, and the inversion error enter the iteration; see Section 2.
Iteration (31) can be implemented using matrix-vector products only.
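A minimal sketch of the combined scheme (31): a Richardson iteration on the parameter vector driven by the residual, preconditioned by a running Newton-Schulz estimate of the inverse. The initial guess and example data are illustrative assumptions, not the paper's symbols:

```python
import numpy as np

def richardson_ns(A, b, steps=25):
    """Richardson iteration combined with a second order Newton-Schulz
    update of the approximate inverse G (sketch of scheme (31))."""
    n = A.shape[0]
    I = np.eye(n)
    G = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    theta = np.zeros(n)
    for _ in range(steps):
        theta = theta + G @ (b - A @ theta)  # residual-driven update
        G = G @ (2.0 * I - A @ G)            # inverse estimate improves
    return theta

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
theta = richardson_ns(A, b)
print(np.max(np.abs(A @ theta - b)))         # residual near machine precision
```

As the inverse estimate converges, each Richardson step removes almost the entire remaining error, which is the convergence rate improvement described in the text.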
Notice that the definition of the inversion error in the form used here is more convenient than the alternative definition, since the error is better aligned with the Richardson iteration while providing the same convergence conditions. The convergence analysis of the Richardson iteration with this error can be found, for example, in the literature.
Any of the matrix inversion algorithms presented in Section 2, with the factorizations described in Section 3, can be applied in (31). Error models for different choices of the inverse estimate are presented in the literature. Since the estimate gets closer to the true inverse in each step, the convergence rate of (31) is essentially improved. The updating of the estimate can even be stopped after several steps for reduction of the computational burden.
5 Estimation of the Frequency Content of Electrical Signals for Robust Failure Detection
Suppose that the measured electrical signal (voltage or current) can be presented in the following form:
where is the vector of unknown constant parameters and is the harmonic regressor presented in the following form:
where the fundamental frequency of the network (for example 50 Hz or 60 Hz), the number of harmonics, and a zero mean white Gaussian noise enter the model, and the step number indexes the samples.
The model of the signal (32) with adjustable parameters is presented in the following form:
The signal is approximated by the model in the least squares sense in each step of a moving window of a given size.
The estimation algorithm is based on minimization of the following error:
for a fixed step , where .
The least squares solution for the estimation of the parameter vector, which should be calculated in each step, can be written as follows:
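The moving window least squares setup can be sketched as follows. The fundamental frequency, sampling rate, window size and harmonic count are illustrative assumptions, not taken from the paper:

```python
import numpy as np

w0 = 2 * np.pi * 50.0                        # fundamental frequency, 50 Hz
n_harm = 2                                   # number of harmonics in the model
dt = 1e-3                                    # sampling period, 1 kHz
window = 200                                 # moving window size (10 periods)

def regressor(t):
    """Harmonic regressor: cosines and sines of the first n_harm harmonics."""
    k = np.arange(1, n_harm + 1)
    return np.concatenate([np.cos(k * w0 * t), np.sin(k * w0 * t)])

theta_true = np.array([1.0, 0.2, -0.5, 0.1]) # hypothetical true amplitudes
t = np.arange(window) * dt
Phi = np.array([regressor(tk) for tk in t])  # window x (2 * n_harm) regressors
y = Phi @ theta_true                         # noise-free signal over the window

A = Phi.T @ Phi                              # SPD information matrix
b = Phi.T @ y
theta = np.linalg.solve(A, b)                # least squares estimate
print(np.max(np.abs(theta - theta_true)))    # recovers the parameters
```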
5.1 Missing Input Data & Rank Deficiency of Information Matrix
The matrix is a symmetric and positive definite information matrix for systems with harmonic regressor and a sufficiently large window size. However, software, functional or system failures may occur, which usually result in the declaration of zero values in the information matrix. Such failures may result in rank deficiency of the information matrix in each step. Nevertheless, the methods described above can still be applied in this case.
5.2 Fault Detection on Real Data
Power quality monitoring and detection of sag and swell events is considered in the presence of system failures. Monitoring is performed via estimation of the frequency content of the voltage signals in a three phase system. Real data obtained from the DOE/EPRI National Database of Power System Events is used for power quality monitoring. The event (Event ID ) which resulted in voltage sag and swell was considered. The cause of this event is unknown and the event is considered as a temporary fault. Measured voltage signals which correspond to the sag and swell in phases 'a', 'b' and 'c' are plotted with green, red and blue lines respectively in Figure 1(a).
It is assumed that the voltage signals can be described by the model (32) with harmonic regressor (33), the fundamental network frequency, four higher harmonics, and an unknown parameter vector. The model of the signals is given by (34) with the vector of parameters which estimates the unknown vector. The estimation problem is reduced to the restricted linear system of equations (36), which should be solved in each step. The matrix is SPD in each step (which is verified by simulations).
Suppose that a functional or system failure occurred, which resulted in three zero columns (columns 3, 4 and 5) of the information matrix in each step. In addition, suppose that three components of the vector (components 3, 4 and 5) are zero in each step due to a similar failure. Such failures result in a rank deficient information matrix in each step. The rank of the information matrix is equal to seven in this case. Nevertheless, the methods described above can be applied in this case.
5.3 Pre-Conditioning for Rank Deficient Information Matrix
In order to apply the methods described above, the preconditioner should be chosen such that the spectral radius of the resulting matrix is less than one in each step. For SPD information matrices the preconditioner can be chosen using the maximum row sum matrix norm. In the case of software or system failures the information matrix may become rank deficient and special preconditioning should be applied. Consider the following transformation of the system (36) and regularization of the matrix:
where the regularization parameter is a small positive number and the resulting matrix is SPD, so that the preconditioning described above can be applied. The choice of the regularization parameter represents a trade-off between a high condition number (for very small values) and the performance of the regularization, and hence the performance of the event detection (for large values).
The condition number of the regularized matrix as a function of the regularization parameter is shown in Figure 2. For a small parameter value, which allows high performance detection, the matrix becomes ill-conditioned (sensitive to numerical operations) and the spectral radius is very close to one.
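The trade-off can be illustrated numerically. The sketch below assumes a Gram-type transformation with a small multiple of the identity added; the exact transformation (39), (40) and the parameter values are assumptions for illustration:

```python
import numpy as np

# Rank deficient "information matrix": SPD semidefinite with rank 7.
rng = np.random.default_rng(2)
M = rng.standard_normal((10, 7))
A = M @ M.T                                  # 10 x 10, rank 7

# Regularized SPD matrix: Gram transformation plus delta * identity.
for delta in (1e-2, 1e-6, 1e-10):
    B = A @ A + delta * np.eye(10)
    print(delta, np.linalg.cond(B))          # condition number grows as delta shrinks
```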
Application of the stepwise splitting method, where the matrix is split recursively in each step, provides a robust solution for the ill-conditioned case.
Notice that the model (39), (40) provides a robust solution for parameter estimation problems in the case of insufficient excitation and singular information matrices, and can be applied instead of classical self-tuning adaptive and estimation algorithms in adaptive control and system identification.
5.4 Simulation Results
Amplitudes of the first harmonic of the measured signals recovered by the Richardson parameter estimation algorithm (31) with the second order matrix inversion method (6) are presented in Figure 1(b).
The figure shows that the sag and swell events are detectable using the approach described above even in the presence of system failures which result in rank deficiency of the information matrix.
The estimation accuracy was evaluated as the average sum of squares of the parameter errors for the Richardson algorithm with respect to the parameter vector calculated without missing data. The parameter vector calculated with the Richardson algorithm is quite close to the parameter vector calculated without missing data.
In addition, the LU decomposition method was also applied for parameter calculation in (36) with the regularized matrix, which is close to a singular matrix (extremely ill-conditioned) for a sufficiently small regularization parameter. The LU decomposition method is very sensitive to numerical errors for extremely ill-conditioned matrices, whereas the Richardson algorithm delivers a robust solution. Therefore, Richardson algorithms are recommended for robust detection of failure events in the presence of missing data.
6 Conclusion
A practical guide for the solution of the parameter estimation problem described by a restricted linear system of equations was presented in the Richardson framework, which provides the best possible solution in finite digit calculations. Newton-Schulz and Durand iterations with unified factorizations, which enhance the performance of the Richardson method, are presented in the form of a tutorial (in a historical framework) with convergence analysis.
The combination of the Richardson method with the second order Newton-Schulz approach was tested on a failure detection problem with real signals from the electrical network. In addition, a software or system failure was simulated, which resulted in rank deficiency of the information matrix. A new preconditioning method was proposed for rank deficient matrices in the problem of estimation of the frequency content of electrical signals. It was shown that the Richardson method is applicable in failure detection problems with rank deficient information matrices.
-  Fomin V., Fradkov A. and Yakubovich V., Adaptive Control of Dynamic Objects, Nauka, Moscow, 1981 (in Russian).
-  Ioannou P. and Sun J., Robust Adaptive Control, Prentice Hall, Upper Saddle River, NJ, 1996.
-  Ljung L., System Identification: Theory for the User, Prentice-Hall, Upper Saddle River, NJ, 1999.
-  Bayard, D., A General Theory of Linear Time-Invariant Adaptive Feedforward Systems with Harmonic Regressors. IEEE Trans. Autom. Control vol. 45, N 11, pp. 1983-1996, 2000.
-  Stotsky A., Recursive Trigonometric Interpolation Algorithms, Journal of Systems and Control Engineering, vol. 224, N 1, pp. 65-77, 2010.
-  Björck Å., Numerical Methods for Least Squares Problems, SIAM, First edition, April 1, 1996.
-  Stotsky A., Accuracy Improvement in Least-Squares Estimation with Harmonic Regressor: New Preconditioning and Correction Methods, 54-th CDC, Dec. 15-18, Osaka, Japan, 2015, pp. 4035-4040.
-  Stotsky A., Unified Frameworks for High Order Newton-Schulz and Richardson Iterations: A Computationally Efficient Toolkit for Convergence Rate Improvement, Journal of Applied Mathematics and Computing, vol. 60, N 1 - 2, pp. 605-623, 2019.
-  Sticko S., Towards Higher Order Immersed Finite Elements for the Wave Equation, IT Licentiate theses, 2016-008, Uppsala University, Department of Information Technology, 2016.
-  Moustafa K., Real-Time Identification of Ill-Conditioned Structures, Journal of Offshore Mechanics and Arctic Engineering, August, vol. 112, pp. 181 - 186, 1990.
-  Hasan A., Kerrigan E. , Constantinides G., Solving a Positive Definite System of Linear Equations via the Matrix Exponential, 50-th IEEE CDC-ECC, Orlando, FL, USA, December 12-15, 2011, pp. 2299-2304.
-  Kokotovic P., Applications of Singular Perturbation Techniques to Control Problems, SIAM Review, vol. 26, N 4, pp. 501-550, 1984.
-  Dubois D., Greenbaum A. and Rodrigue G., Approximating the Inverse of a Matrix for Use in Iterative Algorithms on Vector Processors, Computing, 22, pp. 257-268, 1979.
-  Srivastava S., Stanimirovic P., Katsikis V. and Gupta D., A Family of Iterative Methods with Accelerated Convergence for Restricted Linear System of Equations, Mediterr. J. Math., vol. 14-222, pp. 1-26, 2017.
-  Liu X., Du W., Yu Y. and Qin Y., Efficient Iterative Method for Solving the General Restricted Linear Equation, Journal of Applied Mathematics and Physics, N 6, pp. 418-428, 2018.
-  Stotsky A., Combined High-Order Algorithms in Robust Least-Squares Estimation with Harmonic Regressor and Strictly Diagonally Dominant Information Matrix, Proc. IMechE Part I: Journal of Systems and Control Engineering, vol. 229, N 2, pp. 184-190, 2015.
-  Schulz G., Iterative Berechnung Der Reziproken Matrix, Zeitschrift für Angewandte Mathematik und Mechanik, vol. 13, pp. 57-59, 1933.
-  Hotelling H., Some New Methods in Matrix Calculation, Annals of Math. Stat., vol. 14, pp. 1-34, 1943.
-  Demidovich B., Maron I., Basics of Numerical Mathematics, Moscow, Fizmatgiz, 1963, (in Russian).
-  Ben-Israel A. , A Note on an Iterative Method for Generalized Inversion of Matrices, Math. Comput. vol. 20, pp. 439-440, 1966.
-  Isaacson E. and Keller H., Analysis of Numerical Methods, John Wiley & Sons, New York, 1966.
-  Petryshyn W., On Generalized Inverses and on the Uniform Convergence of with Application to Iterative Methods, J. Math. Anal. Appl., vol. 18, pp. 417-439, 1967.
-  Stickel E., On a Class of High Order Methods for Inverting Matrices, ZAMM Z. Angew. Math. Mech. 67, pp. 331-386, 1987.
-  Durand E., Solutions Numeriques des Equations Algebriques, vol. 2., Paris, Masson, 1972.
-  Brezinski C. Variations on Richardson’s Method and Acceleration, in: Numerical Analysis, A Numerical Analysis Conference in Honour of Jean Meinguet, Bull. Soc. Math. Belgium, 1996, pp. 33-44.
-  O’Leary, D., Yet Another Polynomial Preconditioner for the Conjugate Gradient Algorithm, Linear Algebra Appl. vol. 154-156, pp.377-388, 1991.
-  Stotsky A., Efficient Iterative Solvers in the Least Squares Method, Proc. of the 21-st IFAC World Congress, Berlin, Germany, July 12-17, 2020, pp. 901-906, see also arXiv:2008.11480v1 [math.OC], 26 Aug 2020.
-  Soleymani F., Stanimirovic P. and Haghani F., On Hyperpower Family of Iterations for Computing Outer Inverses Possessing High Efficiencies, Linear Algebra Appl., vol. 484, pp. 477-495, 2015.
-  Ehrmann H., Konstruktion und Durchführung von Iterationsverfahren höherer Ordnung, Arch. Ration. Mech. Anal., N 4, pp. 65-88, 1959.
-  Stanimirovic P., Kumar A. and Katsikis V., Further Efficient Hyperpower Iterative methods for the Computation of Generalized Inverses , RACSAM, vol.113, pp. 3323-3339, 2019.
-  DOE/EPRI National Database Repository of Power System Events.
-  Stotsky A., Grid Frequency Estimation Using Multiple Model with Harmonic Regressor: Robustness Enhancement with Stepwise Splitting Method, IFAC PapersOnLine 50-1, pp. 12817 - 12822, 2017.
-  Stotsky A. (2022). Recursive Versus Nonrecursive Richardson Algorithms: Systematic Overview, Unified Frameworks and Application to Electric Grid Power Quality Monitoring, Automatika, vol. 63, N2, pp. 328-337, https://doi.org/10.1080/00051144.2022.2039989
-  Stotsky A., Simultaneous Frequency and Amplitude Estimation for Grid Quality Monitoring: New Partitioning with Memory Based Newton-Schulz Corrections, IFAC PapersOnLine 55-9, pp. 42-47, 2022, https://www.sciencedirect.com/science/article/pii/S2405896322003937