Efficiently and gracefully handling computational errors is critical in modern computing. Such errors can result from short-lived hardware failures, communication noise, buggy software, or even malicious third-party servers.
The first goal for fault-tolerant computing is verification. Freivalds presented a linear-time algorithm to verify the correctness of a single matrix product. Recently, efficient verification algorithms for a wide range of computational linear algebra problems have been developed [Kaltofen et al., 2011, Dumas and Kaltofen, 2014, Dumas et al., 2016, 2017].
A higher level of fault tolerance is achieved through error correction, which is the goal of the present study. This is a strictly stronger model than verification: we seek not only to identify when a result is incorrect, but also to compute the correct result when it is “close” to the given, incorrect one. Some recent error-correction problems considered in the computer algebra literature include Chinese remaindering [Goldreich et al., 1999, Khonji et al., 2010, Böhm et al., 2015], system solving [Boyer and Kaltofen, 2014, Kaltofen et al., 2017], and function recovery [Comer et al., 2012, Kaltofen and Yang, 2013].
A generic approach to correcting errors in transmission would be to use an error-correcting code, writing the result with some redundancy to support transmission over a noisy channel. But this approach requires extra bandwidth, is limited to a fixed number of errors, and does not account for malicious alterations or mistakes in the computation itself.
Instead, we will use the information in the problem itself to correct errors. This is always possible by re-computing the result, and so the goal is always to correct a small number of errors in (much) less time than it would take to do the entire computation again. Our notion of error correction makes it possible to correct an unbounded number of mistakes that occur either in computation or transmission, either at random or by a malicious party.
Some other applications of error correction have nothing to do with alterations or mistakes. One example is sparse matrix multiplication, which can be solved with an error-correction algorithm by simply assuming the “erroneous” matrix is zero. Another example is computing a result over the integers using Chinese remaindering modulo small primes, where the entries have varying bit lengths. The small entries are determined modulo the first few primes, and the remaining large entries are found more quickly by applying an error-correction algorithm that treats the (known) small entries as the errors.
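To make the Chinese remaindering step concrete, the following minimal Python sketch (our illustration only; the function `crt` and the chosen primes are not from this paper) recovers integers from their residues modulo small primes. A small entry is pinned down by the first prime alone, while a large one needs the full product of moduli:

```python
from math import prod

def crt(residues, moduli):
    # Standard Chinese remainder reconstruction: recover x modulo
    # prod(moduli) from its residues modulo pairwise-coprime moduli.
    M = prod(moduli)
    x = sum(r * (M // m) * pow(M // m, -1, m)
            for r, m in zip(residues, moduli))
    return x % M

primes = [101, 103, 107]
small, large = 7, 123456
# the small entry is already determined modulo the first prime alone ...
assert crt([small % primes[0]], primes[:1]) == small
# ... while the large entry needs all three primes
assert crt([large % m for m in primes], primes) == large
```

(The three-argument `pow` with exponent −1, used for modular inverses, requires Python 3.8 or later.)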
1.1 Our results
Matrix multiplication with errors
We consider two computational error correction problems. The first problem is the same as in Gasieniec et al., correcting a matrix product. Given , the task is to compute the unique error matrix such that .
Algorithm 5 solves this problem using
field operations, where
is the number of nonzero entries in the input matrices, at most ;
is the number of errors in the given product ; and
is the number of distinct rows (or columns) in which errors occur, whichever is larger.
[Table 1: costs of our algorithm in six cases, contrasting spread-out and compact errors.]
An even more detailed complexity statement, incorporating rectangular matrices and a failure probability, can be found in Theorem 7.1. To understand the implications of this complexity, consider six cases, as summarized in Table 1.
The cost of our algorithm depends on the locations of the errors. If there are a constant number of errors per row or column, we can use sparse multiplication to achieve softly-linear complexity . This is similar to [Gasieniec et al., 2017] except that we do not require the input matrices to be dense.
If errors are spread out among all rows and columns, the worst-case cost using dense multiplication is . As the number of errors grows to , this cost approaches that of normal matrix multiplication without error correction.
In the other extreme, if the errors are isolated to a square (but not necessarily contiguous) submatrix, then the worst-case cost using dense multiplication is . Again this scales to as , but the cost is better for . This situation may also make sense in the context of distributed computing, where each node in a computing cluster might be assigned to compute one submatrix of the product.
Our algorithm is never (asymptotically) slower than that of [Gasieniec et al., 2017] or than naïve recomputation, but it is faster in many cases: for example, when the inputs are sparse, when there are many errors, and when the errors are compactly located.
Matrix inverse with errors
The second problem we consider is that of correcting computational errors in a matrix inverse. Formally, given , the task is to compute the unique error matrix such that .
Algorithm 6 shows how to solve this problem using
field operations, where the parameters are the same as before, and is the number of rows or columns which contain errors, whichever is smaller. This is the same complexity as our algorithm for matrix multiplication with errors, plus an additional term.
The additional term makes the situation here less fractured than with the multiplication algorithm. In the case of spread-out errors, the cost is , which becomes simply when the input matrices are dense. The case of compact errors is better, and the complexity is just , the same as that of our multiplication algorithm.
As before, the cost of this algorithm approaches the cost of the naïve inverse computation as the number of errors grows to .
The algorithms we present work over any field and require only that a field element of order larger than can be found. Because our algorithms explicitly take time to compute the powers of this element anyway, the usual difficulties with efficiently generating high-order elements in finite fields do not arise.
Over the integers or rationals , the most efficient approach would be to work modulo a series of primes and use Chinese remaindering to recover the result. A similar approach could be used over polynomial rings.
Over the reals or complex numbers, our algorithms work only under the unrealistic assumption of infinite precision. We have not considered the interesting question of how to recover from the two types of errors (noise and outliers) that may occur in finite precision.
1.2 Related work
Let be a constant between 2 and 3 such that two matrices can be multiplied using field operations. In practice, for dimensions up to a few hundred, after which Strassen-Winograd multiplication is used giving . The best asymptotic algorithm currently gives [Le Gall, 2014].
Even though the practical value of is larger than the asymptotic one, matrix multiplication routines have been the subject of intense implementation development for decades, and highly-tuned software is readily available for a variety of architectures [Dumas et al., 2008, Wang et al., 2013]. Running times based on have practical as well as theoretical significance; the indicates where an algorithm is able to take advantage of fast low-level matrix multiplication routines.
Special routines for multiplying sparse matrices have also been developed. Writing for the number of nonzero entries in the input, the standard row- or column-wise sparse matrix multiplication costs field operations. Yuster and Zwick improved this to . Later work also incorporates the number of nonzeros in the product matrix. Lingas gave an output-sensitive algorithm with running time . Amossen and Pagh have another with complexity , an improvement when the outputs are not too dense. Pagh developed a different approach with complexity ; that paper also includes a nice summary of the state of the art. Our multiplication with errors algorithm can also be used for sparse multiplication, and it provides a small improvement when the input and output are both not too sparse.
The main predecessor to our work is the recent algorithm of Gasieniec et al., which can correct up to errors in the product of two matrices using field operations. Their approach is based in part on the earlier work of Pagh and makes clever use of hashing and fast Fourier transforms in order to achieve the stated complexity.
Computing the inverse of a matrix has the same asymptotic complexity as fast matrix multiplication [Bunch and Hopcroft, 1974]. In practice, using recent versions of the FFPACK library [Dumas et al., 2008] on an Intel desktop computer, we find that a single matrix inversion is only about 33% more expensive than a matrix-matrix product of the same size.
Multiplication with errors
A high-level summary of our multiplication algorithm is as follows:
Determine which rows in the product contain errors.
Write the unknown sparse matrix of errors as . Remove the rows with no errors from and , so we have .
Treating the rows of as sparse polynomials, use structured linear algebra and fast matrix multiplication to evaluate each row polynomial at a small number of points.
Use sparse polynomial interpolation to recover at least half of the rows from their evaluations.
Update and iterate times to recover all errors.
To determine the rows of which are nonzero in Step i, we apply a simple variant of Freivalds’ verification algorithm: for a random vector, the nonzero rows of are probably the same as the nonzero rows of itself.
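As an illustration of this step, the following sketch (our own, not the paper's Algorithm 1; it assumes dense list-of-lists matrices over a word-size prime field) locates the nonzero rows of E = AB − C without ever forming E, by comparing A(Bv) against Cv for a random vector v:

```python
import random

def error_rows(A, B, C, p, seed=1):
    # With v uniformly random over F_p, A(Bv) - Cv equals Ev where
    # E = AB - C; each nonzero row of E stays nonzero in Ev except with
    # probability 1/p.  Total cost: three matrix-vector products.
    rng = random.Random(seed)
    v = [rng.randrange(p) for _ in range(len(B[0]))]
    Bv = [sum(a * x for a, x in zip(row, v)) % p for row in B]
    ABv = [sum(a * x for a, x in zip(row, Bv)) % p for row in A]
    Cv = [sum(a * x for a, x in zip(row, v)) % p for row in C]
    return [i for i in range(len(A)) if ABv[i] != Cv[i]]

p = (1 << 31) - 1  # a word-size prime
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C = [[sum(A[i][k] * B[k][j] for k in range(3)) % p for j in range(3)]
     for i in range(3)]
C[0][1] += 1   # plant errors in rows 0 and 2
C[2][2] += 5
```

With overwhelming probability, `error_rows(A, B, C, p)` returns exactly the two planted error rows, 0 and 2.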
The evaluation/interpolation approach that follows is reminiscent of that in Gasieniec et al., with two important differences. First, using sparse polynomial techniques rather than hashing and FFTs means the complexity scales linearly with the number of nonzeros in the input rather than the dense matrix size . Second, by treating the rows separately rather than recovering all errors at once, we are able to incorporate fast matrix multiplication so that our worst-case cost when all entries are erroneous is never more than , the same as that of naïve recomputation.
Section 4 provides details on the Monte Carlo algorithm to determine the rows and columns where errors occur (Step i above), Section 5 presents a variant of sparse polynomial interpolation that suits our needs for Step iv, and Section 6 explains how to perform the row polynomial evaluations in Step iii. A certain rectangular matrix multiplication in this step turns out to dominate the overall complexity. We connect all these pieces of the multiplication algorithm in Section 7.
Inverse with errors
Our algorithm for correcting errors in a matrix inverse is based on the multiplication algorithm, but with two complications. Writing for the input matrix, for the erroneous inverse, and for the (sparse) matrix of errors in , we have:
We want to compute for a random vector in order to determine which rows of are nonzero. This is no longer possible directly, but using the second formulation as above, we can compute . Because must be nonsingular, has the same distribution as a random vector, so this approach still works.
The second complication is that removing zero rows of does not change the dimension of the right-hand side, but only removes corresponding columns from on the left-hand side. We make use of the recent result of Cheung et al. , which shows how to quickly find a maximal set of linearly independent rows in a sparse matrix. This allows us to find a small submatrix of such that , with being compressed matrices as before. Because the size of depends on the number of errors, it can be inverted much more quickly than re-computing the entire inverse of .
The dominating cost in most cases is the same rectangular matrix product as needed in matrix multiplication with error correction. However, here we also have the extra cost of finding and computing its inverse, which dominates the complexity when there are a moderate number of errors and their locations are spread out.
3 Notation and preliminaries
The soft-oh notation is the same as the usual big-oh notation but ignoring sub-logarithmic factors: if and only if , for some runtime functions and .
We write for the set of nonnegative integers and for the finite field with elements.
For a finite set , the number of elements in is denoted , and is the powerset of , i.e., the set of subsets of .
The number of nonzero entries in a matrix is written as . Note that , and if has rank then .
For a univariate polynomial , denotes the exponents of nonzero terms of . Note that .
We assume that the number of field operations to multiply two (dense) polynomials in with degrees less than is . This is justified by the generic algorithm of Cantor and Kaltofen, which gives , or more recent results of Harvey et al., which improve this to for finite fields.
As discussed earlier, we take with to be any feasible exponent of matrix multiplication. By applying fast matrix multiplication in blocks, it is also possible to improve the speed of rectangular matrix multiplication in a straightforward way. In particular:
The product of any matrix times a matrix can be found using ring operations.
Note that it might be possible to improve this for certain values of using Le Gall, but that complexity is much harder to state and we have not investigated such an approach.
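The blocking idea behind this bound can be sketched as follows (our own illustration, with a naive cubic product standing in for a fast square matrix multiplication):

```python
def matmul(A, B, p):
    # naive dense product over F_p; stands in for a fast square multiply
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def blocked_matmul(A, B, p):
    # A is n x m and B is m x m with n >= m: split A into ceil(n/m) row
    # blocks of at most m rows, so the whole rectangular product costs
    # ceil(n/m) square-sized products.
    m = len(B)
    out = []
    for i in range(0, len(A), m):
        out.extend(matmul(A[i:i + m], B, p))
    return out
```

Replacing `matmul` by an O(m^ω) routine gives the stated cost for the rectangular product.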
We restate two elementary facts on sparse matrix multiplication.
For any and , the matrix-vector product can be computed using field operations.
For any and , their product can be computed using field operations.
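Both facts follow from the standard row-wise sparse representation; a minimal sketch (our own, with each row stored as a dictionary of its nonzero entries):

```python
def sparse_matvec(rows, v, p):
    # rows[i] is a dict {j: nonzero A[i][j]}; computing A*v touches each
    # nonzero entry exactly once, i.e. O(nnz(A)) field operations.
    return [sum(val * v[j] for j, val in row.items()) % p for row in rows]

def sparse_matmul(rowsA, rowsB, p):
    # Row-wise sparse product: row i of A*B is the combination of the rows
    # of B selected by the nonzeros of row i of A.
    out = []
    for rowA in rowsA:
        acc = {}
        for k, a in rowA.items():
            for j, b in rowsB[k].items():
                acc[j] = (acc.get(j, 0) + a * b) % p
        out.append({j: v for j, v in acc.items() if v})
    return out
```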
4 Identifying nonzero rows and columns
Our algorithms for error correction in matrix product and inverse computation both begin by determining which rows and columns contain errors. This is accomplished in turn by applying a random row or column vector to the unknown error matrix .
For now, we treat the error matrix as a black box, i.e., an oracle matrix which we can multiply by a vector on the right- or left-hand side. The details of the black box construction differ for the matrix multiplication and inverse problems, so we delay those until later.
The idea is to apply a row or column vector of random values to the unknown matrix , and then examine which entries in the resulting vector are zero. This is exactly the same idea as the classic Monte Carlo verification algorithm of Freivalds, which we extend to determine which rows or columns in a matrix are nonzero. This black-box procedure is detailed in Algorithm 1 FindNonzeroRows.
The running time of FindNonzeroRows depends entirely on the size of and the cost of performing one black-box apply over the field . The correctness depends on the parameter .
FindNonzeroRows always returns a list of nonzero row indices in . With probability at least , it returns the index of every nonzero row in .
Consider a single row of . If is zero, then the corresponding row of will always be zero; hence every index returned must be a nonzero row.
Next assume has at least one nonzero entry, say at index , and consider one row of the random matrix. Whatever the other entries in and are, there is exactly one value for the ’th entry of such that . So the probability that is , and by independence the probability that is .
Because has at most nonzero rows, the union bound tells us that the probability any one of the corresponding rows of is zero is at most . From the definition of , this ensures a failure probability of at most . ∎
Because we mostly have in mind the case that is a finite field, we assume in FindNonzeroRows and in the preceding proof that it is possible to sample uniformly from the entire field . However, the same results would hold if sampling from any fixed subset of (perhaps increasing the size of if the subset is small).
5 Batched low-degree sparse interpolation
Our algorithms use techniques from sparse polynomial interpolation to find the locations and values of erroneous entries in a matrix product or inverse.
Consider an unknown univariate polynomial with degree less than and at most nonzero terms. The nonzero terms of can be uniquely recovered from evaluations at consecutive powers [Ben-Or and Tiwari, 1988, Kaltofen and Yagati, 1989].
Doing this efficiently in our context requires a few variations to the typical algorithmic approach. While the state of the art seen in recent papers such as [Javadi and Monagan, 2010, Kaltofen, 2010, Arnold et al., 2015, Huang and Gao, 2017] achieves softly-linear complexity in terms of the sparsity , there is either a higher dependency on the degree, a restriction to certain fields, or both.
Here, we show how to take advantage of two unique aspects of the problem in our context. First, we will interpolate a batch of sparse polynomials at once, on the same evaluation points, which allows for some useful precomputation. Secondly, the maximum degree of any polynomial we interpolate is , the column dimension of the unknown matrix, and we are allowed time linear in .
The closest situation in the literature is that of van der Hoeven and Lecerf, who show how to recover a sparse polynomial if the exponents are known. In our case, we can tolerate a much larger set containing the possible exponents by effectively amortizing the size of this exponent set over the group of batched sparse interpolations.
Recall the general outline of Prony’s method for sparse interpolation of from evaluations :
Find the minimum polynomial of the sequence of evaluations.
Find the roots of .
Compute the discrete logarithm base of each to determine the exponents of .
Solve a transposed Vandermonde system built from and the evaluations to recover the coefficients of .
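The four steps above can be followed end to end in a toy Python sketch over a small prime field (entirely our own illustration: Berlekamp–Massey stands in for the minimum polynomial computation, roots and their discrete logarithms are found by brute force over a candidate exponent set, and the transposed Vandermonde system is solved by Gaussian elimination rather than the fast methods this paper relies on):

```python
def berlekamp_massey(s, p):
    # Step i: minimal connection polynomial C with C[0] = 1 and
    # s[n] + C[1]*s[n-1] + ... + C[L]*s[n-L] = 0 (mod p) for all valid n.
    C, B, L, m, b = [1], [1], 0, 1, 1
    for n in range(len(s)):
        d = s[n] % p
        for j in range(1, L + 1):
            d = (d + C[j] * s[n - j]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, -1, p) % p
        newC = C + [0] * max(0, len(B) + m - len(C))
        for j, bj in enumerate(B):
            newC[j + m] = (newC[j + m] - coef * bj) % p
        if 2 * L <= n:
            L, B, b, m, C = n + 1 - L, C, d, 1, newC
        else:
            C, m = newC, m + 1
    return C, L

def solve_mod(M, rhs, p):
    # Dense Gaussian elimination over F_p; stands in for the fast
    # transposed-Vandermonde solver of step iv.
    n = len(rhs)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for i in range(n):
        piv = next(r for r in range(i, n) if A[r][i] % p)
        A[i], A[piv] = A[piv], A[i]
        inv = pow(A[i][i], -1, p)
        A[i] = [a * inv % p for a in A[i]]
        for r in range(n):
            if r != i and A[r][i] % p:
                f = A[r][i]
                A[r] = [(a - f * c) % p for a, c in zip(A[r], A[i])]
    return [A[i][n] for i in range(n)]

def prony(s, w, exponents, p):
    # Recover {exponent: coefficient} of a sparse f with support contained
    # in `exponents`, from the evaluations s[j] = f(w**j), j = 0..2t-1.
    C, L = berlekamp_massey(s, p)
    lam = list(reversed(C[:L + 1]))   # characteristic polynomial x^L * C(1/x)
    # steps ii+iii: brute-force the roots among candidate powers of w
    es = [e for e in exponents
          if sum(c * pow(w, e * k, p) for k, c in enumerate(lam)) % p == 0]
    # step iv: transposed Vandermonde system for the coefficients
    V = [[pow(w, e * j, p) for e in es] for j in range(len(es))]
    return dict(zip(es, solve_mod(V, s[:len(es)], p)))

p, w = 101, 2             # 2 has multiplicative order 100 modulo 101
f = {3: 5, 10: 7}         # the unknown sparse polynomial 5*x^3 + 7*x^10
s = [sum(c * pow(w, e * j, p) for e, c in f.items()) % p for j in range(4)]
assert prony(s, w, range(20), p) == f
```

Brute-forcing the roots is affordable here precisely because the candidate exponent set is bounded by the matrix dimension, which is the observation exploited by the batched root-finding of the next subsection.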
Steps i and iv can be performed over any field using fast multipoint evaluation and interpolation in field operations [Kaltofen and Yagati, 1989]. However, the other steps depend on the field, and in particular Step iii can be problematic over large finite fields.
To avoid this issue, we first pre-compute all possible roots for any minpoly in any of the batch of sparse interpolations that will be performed. Although the set of possible roots is larger than the degree of any single minpoly , in our case the set is always bounded by the dimension of the matrix, so performing this precomputation once is worthwhile.
5.1 Batched root finding
While fast root-finding procedures over many fields have already been developed, we present a simple but efficient root-finding procedure for our specific case that has the advantage of running in softly-linear time over any coefficient field. The general idea is to first compute a coprime basis (also called a gcd-free basis) for the minpolys , then perform multi-point evaluation with the precomputed list of possible roots.
The details of procedure FindRoots are in Algorithm 2 below. In the algorithm, we use a product tree constructed from a list of polynomials. This is a binary tree of height , with the original polynomials at the leaves, and where every internal node is the product of its two children. In particular, the root of the tree is the product of all polynomials. The total size of a product tree, and the cost of computing it, is softly-linear in the total size of the input polynomials; see Borodin and Munro for more details.
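A product tree over a prime field can be sketched in a few lines (our illustration; schoolbook polynomial multiplication is used here, whereas the quasi-linear cost cited above needs fast multiplication):

```python
def poly_mul(f, g, p):
    # schoolbook product of coefficient lists (lowest degree first) over F_p
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def product_tree(polys, p):
    # levels[0] holds the leaves; each level pairs up and multiplies
    # children, so levels[-1][0] is the product of all input polynomials.
    levels = [list(polys)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([poly_mul(prev[i], prev[i + 1], p)
                       if i + 1 < len(prev) else prev[i]
                       for i in range(0, len(prev), 2)])
    return levels
```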
Algorithms 18.1 and 21.2 of Bernstein  are deterministic and compute (respectively) a coprime basis and then a factorization of the ’s according to the basis elements.
The polynomials form a standard product tree from the polynomials in the coprime basis. For any element of the product tree, its roots must be a subset of the roots of its parent node. So the multi-point evaluations on Algorithm 2 determine, for every polynomial in the tree, which elements of are roots of that polynomial.
The cost of computing the coprime basis using Algorithm 18.1 in [Bernstein, 2005] is with at least a factor. This gives the first term in the complexity (and explains our use of soft-oh notation throughout). This cost dominates the cost of constructing the product tree and factoring the ’s on Algorithm 2.
Using the standard product tree algorithm [Borodin and Munro, 1975], the cost of multi-point evaluation of a degree- polynomial at each of points is field operations. In our algorithm, this happens at each level down the product tree. At each of levels there are at most points (since the polynomials at each level are relatively prime). The total degree at each level is at most . If , then this cost is already bounded by that of determining the coprime basis. Otherwise, the multi-point evaluations occur in batches of points each, giving the second term in the complexity statement. ∎
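The product-tree multipoint evaluation used in this proof can be sketched as a remainder tree (again our own naive-arithmetic illustration; the fast version replaces the schoolbook subroutines):

```python
def poly_mul(f, g, p):
    # schoolbook product of coefficient lists (lowest degree first)
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def poly_mod(f, g, p):
    # remainder of f by a monic g, coefficient lists lowest degree first
    f, dg = f[:], len(g) - 1
    for i in range(len(f) - 1, dg - 1, -1):
        c = f[i] % p
        if c:
            for j in range(dg + 1):
                f[i - dg + j] = (f[i - dg + j] - c * g[j]) % p
    return f[:dg] or [0]

def multipoint_eval(f, points, p):
    # Remainder-tree evaluation: reduce f modulo the subproduct of each
    # half of the points and recurse; a leaf (x - a) yields the value f(a).
    if len(points) == 1:
        a = points[0]
        return [sum(c * pow(a, i, p) for i, c in enumerate(f)) % p]
    mid = len(points) // 2
    out = []
    for half in (points[:mid], points[mid:]):
        m = [1]
        for a in half:
            m = poly_mul(m, [(-a) % p, 1], p)
        out += multipoint_eval(poly_mod(f, m, p), half, p)
    return out
```

Each level of the recursion only ever works with remainders whose degrees sum to at most the number of points, which is the accounting behind the cost statement above.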
5.2 Batched sparse interpolation algorithm
We are now ready to present the full algorithm for performing simultaneous sparse interpolation on polynomials based on evaluations each.
Let be a field with an element whose multiplicative order is at least , and suppose are unknown univariate polynomials. Given bounds , a set of possible exponents, and evaluations for all and , Algorithm 3 MultiSparseInterp requires field operations in the worst case.
The algorithm correctly recovers every polynomial such that , , and .
From Lemma 5.1, the cost of FindRoots is . This dominates the cost of the minpoly and transposed Vandermonde computations, which are both [Kaltofen and Yagati, 1989]. Using binary powering, computing the set on Algorithm 3 requires field operations.
If has at most nonzero terms, then the minpoly computed on Algorithm 3 is a degree- polynomial with exactly distinct roots. If , then by Lemma 5.1, FindRoots correctly finds all these roots. Because these are the only terms in , solving the transposed Vandermonde system correctly determines all corresponding coefficients; see [Kaltofen and Yagati, 1989] or [van der Hoeven and Lecerf, 2013] for details. ∎
6 Performing evaluations
Consider the matrix multiplication with errors problem, recovering an sparse error matrix according to the equation . In this section, we show how to treat each row of as a sparse polynomial and evaluate each of these at consecutive powers of a high-order element , as needed for the sparse interpolation algorithm from Section 5. The same techniques will also be used in the matrix inverse with errors algorithm.
Consider a column vector of powers of an indeterminate, . The matrix-vector product then consists of polynomials, each degree less than , and with a 1-1 correspondence between nonzero terms of the polynomials and nonzero entries in the corresponding row of .
Evaluating every polynomial in at a point is the same as multiplying by a vector . Evaluating each polynomial in at the first powers allows for the unique recovery of all -sparse rows, according to Theorem 5.2. This means multiplying times an evaluation matrix such that .
Consider a single row vector with nonzero entries. Multiplying means evaluating a -sparse polynomial at the first powers of ; we can view it as removing the zero entries from to give , and removing the same rows from to get , and then computing the product . This evaluation matrix with removed rows is actually a transposed Vandermonde matrix built from the entries for each index of a nonzero entry in .
An explicit algorithm for this transposed Vandermonde matrix application is given in Section 6.2 of Bostan et al., where it is shown that the complexity of multiplying is field operations; it essentially amounts to a power series inversion and product tree traversal. (See also Section 5.1 of van der Hoeven and Lecerf for a nice description of this approach to fast evaluation of a sparse polynomial at consecutive powers.) We repeat this for each row in a given matrix to compute .
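A direct, quadratic-time version of this evaluation makes the correspondence explicit (our sketch; the transposed-Vandermonde algorithm cited above computes the same values in softly-linear time):

```python
def eval_row_at_powers(row, w, npoints, p):
    # row: dict {column index e: nonzero entry c}, viewed as the sparse
    # polynomial sum_e c*x^e; returns its values at w^0, w^1, ...,
    # w^(npoints-1), i.e. a transposed Vandermonde apply.
    roots = {e: pow(w, e, p) for e in row}   # the points w^e
    cur = {e: 1 for e in row}                # running powers (w^e)^j
    out = []
    for _ in range(npoints):
        out.append(sum(c * cur[e] for e, c in row.items()) % p)
        cur = {e: cur[e] * roots[e] % p for e in row}
    return out
```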
Now we are ready to show how to compute efficiently, using the approach just discussed to compute and before multiplying times and performing a subtraction. As we will show, the rectangular multiplication of times is the only step which is not softly-linear time, and this dominates the complexity.
The cost of pre-computing all powers using binary powering is field operations, since the number of nonzero columns in or is at most . Using these precomputed values, each transposed Vandermonde apply costs , where is the number of nonzero entries in the current row. Summing over all rows of and gives for these steps.
Because all the relevant parameters are known before the multiplication, we assume that the underlying software makes the best choice between these two algorithms.
Note that and are definitely dominated by the cost of the dense multiplication. In the sparse multiplication case, the situation is more subtle when . But in this case, we just suppose that rows of which are not used in the sparse multiplication are never computed, so the complexity is as stated. ∎
7 Multiplication with error correction algorithm
We now have all the necessary components to present the complete multiplication algorithm, Algorithm 5 MultiplyEC. The general idea is to repeatedly use FindNonzeroRows to determine the locations of nonzeros in , then DiffEval and MultiSparseInterp to recover half of the nonzero rows at each step.
Setting the target sparsity of each recovered row to , where is the number of nonzero rows, ensures that at least half of the rows are recovered at each step. To get around the fact that the number of errors is initially unknown, we start with an initial guess of , and double this guess whenever fewer than half of the remaining rows are recovered.
The procedure is probabilistic of the Monte Carlo type only because of the subroutine FindNonzeroRows used to determine which rows are erroneous. The other parts of the algorithm — computing evaluations and recovering nonzero entries — are deterministic based on the bounds they are given.
The only probabilistic parts of the algorithm are the calls to FindNonzeroRows, which can only fail by incorrectly returning too few rows. If this never happens, then each iteration through the while loop either (a) discovers that the value of was too small and increases it, or (b) recovers half of the rows or columns of .
So, if the calls to FindNonzeroRows are never incorrect, the total number of iterations of the while loop is at most . By the union bound, from the way that is computed and the fact that there are two calls to FindNonzeroRows on every iteration, the probability that FindNonzeroRows is never incorrect is at least .
If a call to FindNonzeroRows returns an incorrect result, two bad things can happen. First, if is too small during some iteration, all of the evaluations may be incorrect, leading to being increased. In extremely unlucky circumstances, this may happen times until the product is naïvely recomputed on Algorithm 5, leading to the worst-case running time.
The second bad thing that can occur when a call to FindNonzeroRows fails is that it may incorrectly report all errors have been found, leading to the small chance that the algorithm returns an incorrect result. ∎
8 Inverse error correction algorithm
As discussed in Section 2, our algorithm for correcting errors in a matrix inverse follows mostly the same outline as that for correcting a matrix product, with two important changes. These are the basis for the next two lemmas.
For any matrices where is invertible, a call to correctly returns the nonzero rows of with probability at least .
In the current formulation, instead of applying the random matrix directly to , the product is applied. But because is nonsingular, there is a 1-1 correspondence between the set of all matrices and the set . That is, applying a random matrix to is the same as applying a random matrix to itself, and therefore the same results hold. ∎
Given any rank- matrix , it is possible to compute a matrix formed from a subset of the rows of , and its inverse , using field operations.
This is a direct consequence of [Cheung et al., 2013, Theorem 2.11], along with the classic algorithm of [Bunch and Hopcroft, 1974] for fast matrix inversion. Alternatively, one could use the approach of [Storjohann and Yang, 2015] to compute the lexicographically minimal set of linearly independent rows in , as well as a representation of the inverse, in the same running time. ∎
In this algorithm, we make use of two different formulas for :
The main difference in this algorithm compared to MultiplyEC is the need to find nonzero rows of to constitute the matrix and compute its inverse . According to Lemma 8.2 these both cost , which gives the additional term in the complexity statement. Note that this also eliminates the utility of sparse multiplication in all cases, simplifying the complexity statement somewhat compared to that of MultiplyEC.
The rest of the proof is identical to that of Theorem 7.1. ∎
- Amossen and Pagh  Rasmus Resen Amossen and Rasmus Pagh. Faster join-projects and sparse matrix multiplications. In Proceedings of the 12th International Conference on Database Theory, ICDT ’09, pages 121–126, New York, NY, USA, 2009. ACM. doi: 10.1145/1514894.1514909.
- Arnold et al.  Andrew Arnold, Mark Giesbrecht, and Daniel S. Roche. Faster sparse multivariate polynomial interpolation of straight-line programs. Journal of Symbolic Computation, 2015. doi: 10.1016/j.jsc.2015.11.005.
- Ben-Or and Tiwari  Michael Ben-Or and Prasoon Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proceedings of the twentieth annual ACM symposium on Theory of computing, STOC ’88, pages 301–309, New York, NY, USA, 1988. ACM. doi: 10.1145/62212.62241.
- Bernstein  Daniel J. Bernstein. Factoring into coprimes in essentially linear time. Journal of Algorithms, 54(1):1–30, 2005. doi: 10.1016/j.jalgor.2004.04.009.
- Böhm et al.  Janko Böhm, Wolfram Decker, Claus Fieker, and Gerhard Pfister. The use of bad primes in rational reconstruction. Math. Comp., 84(296):3013–3027, 2015. doi: 10.1090/mcom/2951.
- Borodin and Munro  A. Borodin and I. Munro. The computational complexity of algebraic and numeric problems. Number 1 in Elsevier Computer Science Library; Theory of Computation Series. American Elsevier Pub. Co., New York, 1975.
- Bostan et al.  A. Bostan, G. Lecerf, and É. Schost. Tellegen’s principle into practice. In Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC ’03, pages 37–44. ACM, 2003. doi: 10.1145/860854.860870.
- Boyer and Kaltofen  Brice Boyer and Erich L. Kaltofen. Numerical linear system solving with parametric entries by error correction. In Proceedings of the 2014 Symposium on Symbolic-Numeric Computation, SNC ’14, pages 33–38. ACM, 2014. doi: 10.1145/2631948.2631956.
- Bunch and Hopcroft  James R. Bunch and John E. Hopcroft. Triangular factorization and inversion by fast matrix multiplication. Mathematics of Computation, 28(125):231–236, 1974. URL http://www.jstor.org/stable/2005828.
- Cantor and Kaltofen  David G. Cantor and Erich Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991. doi: 10.1007/BF01178683.
- Cheung et al.  Ho Yee Cheung, Tsz Chiu Kwok, and Lap Chi Lau. Fast matrix rank algorithms and applications. J. ACM, 60(5):31:1–31:25, October 2013. doi: 10.1145/2528404.
- Comer et al.  Matthew T. Comer, Erich L. Kaltofen, and Clément Pernet. Sparse polynomial interpolation and Berlekamp/Massey algorithms that correct outlier errors in input values. In Proceedings of the 37th International Symposium on Symbolic and Algebraic Computation, ISSAC ’12, pages 138–145, New York, NY, USA, 2012. ACM. doi: 10.1145/2442829.2442852.
- Dumas and Kaltofen  Jean-Guillaume Dumas and Erich Kaltofen. Essentially optimal interactive certificates in linear algebra. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC ’14, pages 146–153. ACM, 2014. doi: 10.1145/2608628.2608644.
- Dumas et al.  Jean-Guillaume Dumas, Pascal Giorgi, and Clément Pernet. Dense linear algebra over word-size prime fields: the FFLAS and FFPACK packages. ACM Trans. Math. Softw., 35:19:1–19:42, October 2008. doi: 10.1145/1391989.1391992.
- Dumas et al.  Jean-Guillaume Dumas, Erich Kaltofen, Emmanuel Thomé, and Gilles Villard. Linear time interactive certificates for the minimal polynomial and the determinant of a sparse matrix. In Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC ’16, pages 199–206. ACM, 2016. doi: 10.1145/2930889.2930908.
- Dumas et al.  Jean-Guillaume Dumas, David Lucas, and Clément Pernet. Certificates for triangular equivalence and rank profiles. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC ’17, pages 133–140. ACM, 2017. doi: 10.1145/3087604.3087609.
- Freivalds  Rūsiņš Freivalds. Fast probabilistic algorithms. In Jiří Bečvář, editor, Mathematical Foundations of Computer Science 1979, pages 57–69. Springer Berlin Heidelberg, 1979.
- Gasieniec et al.  Leszek Gasieniec, Christos Levcopoulos, Andrzej Lingas, Rasmus Pagh, and Takeshi Tokuyama. Efficiently correcting matrix products. Algorithmica, 79(2):428–443, Oct 2017. doi: 10.1007/s00453-016-0202-3.
- Goldreich et al.  Oded Goldreich, Dana Ron, and Madhu Sudan. Chinese remaindering with errors. In Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, STOC ’99, pages 225–234. ACM, 1999. doi: 10.1145/301250.301309.
- Harvey et al.  David Harvey, Joris van der Hoeven, and Grégoire Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6):52:1–52:23, January 2017. doi: 10.1145/3005344.
- van der Hoeven and Lecerf  Joris van der Hoeven and Grégoire Lecerf. On the bit-complexity of sparse polynomial and series multiplication. Journal of Symbolic Computation, 50:227–254, 2013. doi: 10.1016/j.jsc.2012.06.004.
- Huang and Gao  Qiao-Long Huang and Xiao-Shan Gao. Faster deterministic sparse interpolation algorithms for straight-line program multivariate polynomials. CoRR, abs/1709.08979, 2017. URL http://arxiv.org/abs/1709.08979.
- Javadi and Monagan  Seyed Mohammad Mahdi Javadi and Michael Monagan. Parallel sparse polynomial interpolation over finite fields. In Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, PASCO ’10, pages 160–168, New York, NY, USA, 2010. ACM. doi: 10.1145/1837210.1837233.
- Kaltofen and Yagati  Erich Kaltofen and Lakshman Yagati. Improved sparse multivariate polynomial interpolation algorithms. In P. Gianni, editor, Symbolic and Algebraic Computation, volume 358 of Lecture Notes in Computer Science, pages 467–474. Springer Berlin / Heidelberg, 1989. doi: 10.1007/3-540-51084-2_44.
- Kaltofen  Erich L. Kaltofen. Fifteen years after DSC and WLSS2: What parallel computations I do today [invited lecture at PASCO 2010]. In Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, PASCO ’10, pages 10–17, New York, NY, USA, 2010. ACM. doi: 10.1145/1837210.1837213.
- Kaltofen and Yang  Erich L. Kaltofen and Zhengfeng Yang. Sparse multivariate function recovery from values with noise and outlier errors. In Proceedings of the 38th International Symposium on Symbolic and Algebraic Computation, ISSAC ’13, pages 219–226. ACM, 2013. doi: 10.1145/2465506.2465524.
- Kaltofen et al.  Erich L. Kaltofen, Michael Nehring, and B. David Saunders. Quadratic-time certificates in linear algebra. In Proceedings of the 36th International Symposium on Symbolic and Algebraic Computation, ISSAC ’11, pages 171–176. ACM, 2011. doi: 10.1145/1993886.1993915.
- Kaltofen et al.  Erich L. Kaltofen, Clément Pernet, Arne Storjohann, and Cleveland Waddell. Early termination in parametric linear system solving and rational function vector recovery with error correction. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC ’17, pages 237–244. ACM, 2017. doi: 10.1145/3087604.3087645.
- Khonji et al.  Majid Khonji, Clément Pernet, Jean-Louis Roch, Thomas Roche, and Thomas Stalinski. Output-sensitive decoding for redundant residue systems. In Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation, ISSAC ’10, pages 265–272. ACM, 2010. doi: 10.1145/1837934.1837985.
- Le Gall  François Le Gall. Faster algorithms for rectangular matrix multiplication. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 514–523, Oct 2012. doi: 10.1109/FOCS.2012.80.
- Le Gall  François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC ’14, pages 296–303, New York, NY, USA, 2014. ACM. doi: 10.1145/2608628.2608664.
- Lingas  Andrzej Lingas. A fast output-sensitive algorithm for Boolean matrix multiplication. In Amos Fiat and Peter Sanders, editors, Algorithms - ESA 2009, pages 408–419. Springer Berlin Heidelberg, 2009. doi: 10.1007/978-3-642-04128-0_37.
- Pagh  Rasmus Pagh. Compressed matrix multiplication. ACM Trans. Comput. Theory, 5(3):9:1–9:17, August 2013. doi: 10.1145/2493252.2493254.
- Storjohann and Yang  Arne Storjohann and Shiyun Yang. A relaxed algorithm for online matrix inversion. In Proceedings of the 2015 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC ’15, pages 339–346, New York, NY, USA, 2015. ACM. doi: 10.1145/2755996.2756672.
- Wang et al.  Qian Wang, Xianyi Zhang, Yunquan Zhang, and Qing Yi. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pages 25:1–25:12. ACM, 2013. doi: 10.1145/2503210.2503219.
- Yuster and Zwick  Raphael Yuster and Uri Zwick. Fast sparse matrix multiplication. ACM Trans. Algorithms, 1(1):2–13, July 2005. doi: 10.1145/1077464.1077466.