Asymmetric Multiresolution Matrix Factorization

10/10/2019 ∙ by Pramod Kaushik Mudrakarta, et al.

Multiresolution Matrix Factorization (MMF) was recently introduced as an alternative to the dominant low-rank paradigm in order to capture structure in matrices at multiple different scales. Using ideas from multiresolution analysis (MRA), MMF teases out hierarchical structure in symmetric matrices by constructing a sequence of wavelet bases. While effective for such matrices, there is plenty of data that is more naturally represented as a nonsymmetric matrix (e.g. directed graphs), but nevertheless has similar hierarchical structure. In this paper, we explore techniques for extending MMF to any square matrix. We validate our approach on numerous matrix compression tasks, demonstrating its efficacy compared to low-rank methods. Moreover, we show that a combined low-rank and MMF approach, which amounts to removing a small global-scale component of the matrix and then extracting hierarchical structure from the residual, is even more effective than either of the two complementary methods for matrix compression.




1 Introduction

A wide variety of naturally occurring data, represented as large matrices, often has complex hierarchical structure. For instance, this is the case in data originating from social networks, complex biological interactions, circuit simulations and so on. With the increasing size of such matrices, there is a need to represent and compress them efficiently. Towards this end, a number of methods have been proposed (mahoney2009cur; coifman2006diffusion; hackbusch1999sparse; halko2009). All such methods make an implicit or explicit assumption about the nature of such matrices, which serves as a prior to facilitate efficient factorization and compression. Frequently, the assumption in question is that the matrices are of low rank. For example, Principal Component Analysis (PCA) decomposes a symmetric matrix A into the form

A = U D U^T,

where U is an orthogonal matrix and D is diagonal. Notably, such methods are single-level algorithms. Moreover, the columns of U are usually dense, which does not adequately capture strong locality properties that might be exhibited, for instance, in real world graphs.
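As a minimal numerical sketch (ours, using NumPy; not part of the original exposition), the single-level PCA-style factorization of a symmetric matrix can be computed with an eigendecomposition:

```python
import numpy as np

# Toy illustration: eigendecomposition of a symmetric matrix A = U D U^T,
# the single-level factorization underlying PCA.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B + B.T                      # symmetrize

eigvals, U = np.linalg.eigh(A)   # U orthogonal, eigvals real
D = np.diag(eigvals)

# The reconstruction is (numerically) exact, but note that the columns
# of U are dense: every coordinate participates in every eigenvector.
err = np.linalg.norm(A - U @ D @ U.T)
```

The density of U is precisely the limitation discussed above: a single global basis cannot express locality.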

Taking a different tack, Multiresolution Matrix Factorization (MMF) (kondor2014multiresolution) framed matrix factorization as a type of multiresolution analysis (MRA) over a discrete metric space whose metric structure is represented by a self-adjoint operator. Thus, in contrast to low-rank approaches, MMF factorizes a symmetric matrix A in the following multilevel manner:

A ≈ Q_1^T Q_2^T ... Q_L^T H Q_L ... Q_2 Q_1,

where each of the carefully selected matrices Q_1, ..., Q_L is orthogonal and extremely sparse, and the matrix H is close to diagonal. MMF is able to discover soft hierarchical arrangements of localized structures in the data represented by the matrix. It is amenable to parallelization and fast implementations (kondor2015parallel). The wavelet bases uncovered by MMF also provide a natural sparse approximation for functions defined over the space. For instance, if the matrix is the normalized graph Laplacian, then the factorization provides us with wavelet bases for approximating functions on graphs.

While MMF has been shown to be useful in various practical settings (teneva2016multiresolution; ding2017multiresolution; ithapu2017decoding), one notable drawback of the method is that it is wholly defined using a self-adjoint operator such as the graph Laplacian. Therefore, MMF can only handle a multi-level factorization of symmetric matrices. However, many naturally occurring data are represented more concisely as a nonsymmetric matrix; the underlying space might only have a quasi-metric structure. This is the case, for instance, when we are working with directed graphs. For data living on such spaces, hierarchical or multiscale structure is just as important, albeit more complicated, and thus there is much to be gained by defining appropriate sparse wavelet bases for function approximation.

Defining such wavelet bases, in particular for functions on directed graphs, is an active area of research (sevi2018harmonic; satoshi2019; mhaskar2018). However, there is currently no consensus on the correct approach for doing so. Further, existing methods are difficult to implement in practice for downstream applications. In this paper, we do not make an attempt to directly extend multiresolution analysis (MRA) in the spirit of MMF to such spaces, but rather provide practical methods to leverage intuitions used in MMF to make it operable on general square matrices. We show experimentally that our methods can perform very well in various real world matrix compression tasks.

To conclude this section, our contributions in this paper are threefold: 1) we provide two practical ways to adapt MMF to nonsymmetric matrices; 2) we argue that extending MRA to nonsymmetric matrices is not analogous to developing a multi-level SVD; in particular, we show experimentally that it is important to consider interactions between different levels in the hierarchical structure; and 3) we show that a combined low-rank and MMF approach, which amounts to removing a small global-scale component of the matrix and then extracting hierarchical structure from the residual, is even more effective than either of the two complementary methods for matrix compression.

After describing the notation used in this paper in Section 2, we give a brief introduction to Multiresolution Matrix Factorization in Section 3.

2 Notation

We denote [n] = {1, 2, ..., n}. Inner products between vectors are denoted by the operator ⟨·, ·⟩. The (i, j)-th entry of a matrix A is denoted by A_{i,j}. If S is a set of indices, A_S represents the submatrix of A corresponding to those indices. S̄ denotes the complement of a set S. When S can be interpreted as an ordered set, for instance, if it is a set of indices, then S(i) represents the i-th index from that set.

3 Multiresolution Matrix Factorization

Given a measurable space X and the space of square integrable functions on X, multiresolution analysis (MRA) constructs a sequence of spaces of functions of increasing smoothness:

L²(X) ⊇ V_0 ⊇ V_1 ⊇ V_2 ⊇ ...,

where each V_j is split into a smoother space V_{j+1} and a rougher space W_{j+1}, such that V_j = V_{j+1} ⊕ W_{j+1}. The length-scale over which a function varies increases with j, which makes this decomposition a multiscale/multiresolution analysis.
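To make the splitting concrete, here is a toy one-level Haar-style decomposition (our illustrative example, not from the original text): pairwise averages play the role of the smoother space and pairwise differences the rougher space, and the split is orthogonal and invertible.

```python
import numpy as np

# One level of a Haar-style split of R^8 into a "smoother" part
# (pairwise averages, spanning V_1) and a "rougher" part
# (pairwise differences, spanning W_1).
x = np.array([4., 6., 10., 12., 8., 8., 2., 0.])
smooth = (x[0::2] + x[1::2]) / np.sqrt(2)   # scaling coefficients
detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # wavelet coefficients

# The split is orthogonal: energy is preserved, and x is exactly
# recoverable from (smooth, detail), i.e. V_0 = V_1 (+) W_1.
x_rec = np.empty_like(x)
x_rec[0::2] = (smooth + detail) / np.sqrt(2)
x_rec[1::2] = (smooth - detail) / np.sqrt(2)
```

A smooth signal concentrates its energy in the `smooth` coefficients, which is the sense in which the filtration orders functions by smoothness.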

kondor2014multiresolution extends MRA to functions on discrete spaces (e.g. undirected graphs) by interpreting a given symmetric matrix A ∈ R^{n×n} as a discrete version of a continuous self-adjoint operator. Specifically, kondor2014multiresolution defines multiresolution analysis with respect to a symmetric matrix A as a successive filtration

R^n = V_0 ⊇ V_1 ⊇ V_2 ⊇ ... ⊇ V_L,    (1)

where each V_ℓ has an orthonormal basis Φ_ℓ, and its complementary space W_ℓ (so that V_{ℓ-1} = V_ℓ ⊕ W_ℓ) has an orthonormal basis Ψ_ℓ, satisfying the following three axioms:

MRA1. The sequence in equation 1 is a filtration of R^n in terms of smoothness. In particular, the smoothness measure sup_{v ∈ V_ℓ, ||v|| = 1} ⟨v, A v⟩ decays with ℓ.

MRA2. The basis vectors of W_ℓ are localized, in the sense that the size of their supports increases no faster than a certain rate.

MRA3. Suppose Q_ℓ is the matrix expressing the basis Φ_ℓ ∪ Ψ_ℓ in terms of the previous basis Φ_{ℓ-1}. Then Q_ℓ is sparse. (Φ_0 is taken to be the standard basis.)

Once an overcomplete multiscale basis satisfying MRA1-3 for A, i.e., Ψ_1 ∪ Ψ_2 ∪ ... ∪ Ψ_L ∪ Φ_L, is determined, it can be used to compress the matrix A. Applying each of the Q_ℓ's sequentially, we obtain versions of A compressed to various scales:

A_ℓ = Q_ℓ ... Q_2 Q_1 A Q_1^T Q_2^T ... Q_ℓ^T.

Since the choice of the Q_ℓ's aligns with MRA1, kondor2014multiresolution argues that the matrices A_1, ..., A_L are approximately in core-diagonal form. That is, up to a permutation, each of those matrices has a block-diagonal form, with one block being a diagonal matrix.

Formally, Multiresolution Matrix Factorization is defined as an approximate multilevel factorization of the form

A ≈ Q_1^T Q_2^T ... Q_L^T H Q_L ... Q_2 Q_1,    (2)

where the matrices Q_1, ..., Q_L and H obey the following conditions:

  1. Each Q_ℓ is orthogonal and highly sparse. In the simplest case, each Q_ℓ is a Givens rotation, i.e., a matrix which differs from the identity in just the four matrix elements

    [Q_ℓ]_{i,i} = cos θ,  [Q_ℓ]_{j,j} = cos θ,  [Q_ℓ]_{i,j} = −sin θ,  [Q_ℓ]_{j,i} = sin θ,

    for some pair of indices (i, j) and rotation angle θ. Multiplying a vector with such a matrix rotates it counter-clockwise by θ in the (i, j) plane. More generally, Q_ℓ is a so-called k-point rotation, which rotates not just two, but k coordinates.

  2. Typically, in MMF factorizations L = O(n), and the size of the active part of the Q_ℓ matrices decreases according to a set schedule δ_1 ≥ δ_2 ≥ ... ≥ δ_L. More precisely, there is a nested sequence of sets [n] = S_0 ⊇ S_1 ⊇ ... ⊇ S_L such that the [n] \ S_{ℓ-1} part of each rotation Q_ℓ is the (n − δ_{ℓ-1})-dimensional identity. S_ℓ is called the active set at level ℓ. In the simplest case, δ_ℓ = n − ℓ.

  3. H is an S_L-core-diagonal matrix, which means that it is block diagonal with two blocks: H_{S_L, S_L}, called the core, which is dense, and the remainder, which is diagonal. In other words, H_{i,j} = 0 unless i, j ∈ S_L or i = j.
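As an illustration (a toy sketch of ours, not the authors' implementation), the elementary step underlying MMF can be reproduced with a single Givens rotation, with the classical Jacobi angle chosen to zero out one off-diagonal entry of a symmetric matrix:

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n Givens rotation acting in the (i, j) plane; it differs from
    the identity only in the four entries (i,i), (j,j), (i,j), (j,i)."""
    Q = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    Q[i, i] = c; Q[j, j] = c
    Q[i, j] = -s; Q[j, i] = s
    return Q

# One MMF-style elementary step on a symmetric matrix: choose theta so
# that the off-diagonal entry (0, 1) of Q A Q^T becomes zero.
A = np.array([[2., 1., 0., 0.],
              [1., 3., 0., 0.],
              [0., 0., 5., 0.],
              [0., 0., 0., 7.]])
theta = 0.5 * np.arctan2(2 * A[0, 1], A[1, 1] - A[0, 0])  # Jacobi angle
Q = givens(4, 0, 1, theta)
B = Q @ A @ Q.T   # entry (0, 1) of B is (numerically) zero
```

Chaining O(n) such rotations, each chosen greedily, is what drives GreedyJacobi-style MMF algorithms.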

Given a symmetric matrix, the factorization is carried out by minimizing the Frobenius norm error between the original matrix and its factorized form. Algorithms such as GreedyJacobi (kondor2014multiresolution) or pMMF (teneva2016multiresolution) are typically used in practical applications (teneva2016multiresolution; ding2017multiresolution; mudrakarta2017generic).

Having discussed the core ideas behind MMF, we briefly review some related art before moving forward.

4 Multiscale structure in nonsymmetric matrices

In this section, we discuss why it is not trivial to extend MMF to nonsymmetric matrices, and develop the intuition behind the formulations proposed in the subsequent sections.

Recall that the broad goal is to efficiently compress and estimate large matrices arising in the real world. This requires making assumptions about the matrix, the most common one being the low-rank assumption, which is typically modeled via PCA. The problem with PCA is that the eigenvectors are always dense, which does not capture cases where the matrix more closely couples certain clusters of nearby coordinates than those farther apart with respect to some underlying topology. Sparse PCA solves the locality problem, but does not capture global-scale structure.

MMF, in contrast, aims to capture multiscale structure in discrete spaces such as graphs. Matrices are considered as operators on vectors, and MMF essentially decomposes that operator in terms of a multiscale basis. The analogy with harmonic analysis is instructive. Given a continuous self-adjoint operator such as the Laplacian, the application of the operator to a function can be expressed in terms of the eigenfunctions of the operator; in the case of the Laplacian, the eigenfunctions are the Fourier basis functions. Similarly, when a symmetric matrix is the operator, the application of the operator (i.e., the matrix-vector product) can be expressed in terms of the eigenvectors of the matrix. In this sense, the eigenvalue decomposition is a discrete analogue of the Fourier decomposition.

Multiresolution analysis captures multiscale structure in a given space of functions by repeatedly reducing the space so that the smoothness of the functions increases in some sense. For functions on R, Mallat (1989) defines MRA in terms of wavelets constructed via dilations and translations of certain canonical functions. The MMF paper defines MRA on discrete spaces similarly, in terms of successively compressing a given subspace, where each subspace is defined by a finite set of basis vectors. Three axioms characterize the smoothness of each space (MRA1), the localization of the basis vectors (MRA2), and the existence of a fast wavelet transform (MRA3).

Once a wavelet basis is determined in matrix form Q, the given operator A can be transformed into it by computing Q A Q^T. In the symmetric case, the wavelet vectors are designed to be close to eigenvectors of A, so that the MRA1 criterion ensures that each subspace is smoother than the previous one; this makes the structure of Q A Q^T core-diagonal. MMF-factorizable matrices are defined to be precisely those for which applying the wavelet transform results in an exactly core-diagonal matrix. In reality, Q A Q^T is never exactly core-diagonal; entries of the wavelet-transformed matrix are manually zeroed out to bring it to that form, which makes MMF an approximate factorization.

The central difficulty in the nonsymmetric case is that the MRA1 notion of smoothness is defined through a self-adjoint operator and no longer behaves as intended: for skew-symmetric matrices the smoothness measure is identically zero, and for general square matrices the row space and the column space may each induce a different multiscale structure. In some of the asymmetric versions we develop, the core-diagonal criterion is therefore relaxed, mildly violating the MRA1 axiom, but resulting in better compression.

5 Related Work

The work most closely related to some of the ideas in this paper is that on defining wavelets for functions on directed graphs. Two notable recent works on this theme are those of sevi2018harmonic and mhaskar2018: sevi2018harmonic used a random walk operator, while mhaskar2018 used an extension of diffusion polynomial frames (mhaskarmaggioni) based on a different operator. In fact, the influential work on diffusion wavelets (coifman2006diffusion) provides a more general theory in which symmetry of the diffusion operator is not necessary. However, it is unclear how to lift these ideas to define a principled asymmetric MMF that also has a clear MRA interpretation. On the matrix factorization side, works that bear some resemblance to our approach are LiYangButterfly and LiYangInterpolate, in which the left and right factor matrices need not be related by transposition. However, these approaches lack a clear wavelet-based motivation.

6 MMF on nonsymmetric matrices

Based on the MMF approach described in Section 3, in the sequel we provide two approaches that extend MMF to general square matrices.

6.1 MMF via an additive decomposition

We begin by observing that any square matrix A ∈ R^{n×n} can be written as the sum of a symmetric matrix and a skew-symmetric matrix as follows:

A = (A + A^T)/2 + (A − A^T)/2.    (3)

We propose performing MRA on each term separately. Since the first term is symmetric, it directly yields a multiresolution matrix factorization. In particular, the factorization of (A + A^T)/2 can be obtained via a parallel variant of MMF such as pMMF (kondor2015parallel). Thus, the problem of computing a multiresolution factorization of an asymmetric square matrix is now reduced to computing a multiresolution factorization of a skew-symmetric matrix.
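This decomposition is straightforward to compute; a minimal NumPy sketch (ours):

```python
import numpy as np

# Any square matrix splits uniquely into a symmetric and a
# skew-symmetric part: A = (A + A^T)/2 + (A - A^T)/2.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A_sym = (A + A.T) / 2    # symmetric part, handled by standard MMF
A_skew = (A - A.T) / 2   # skew-symmetric part, handled in Section 6.1.1
```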

6.1.1 MMF for skew-symmetric matrices

For skew-symmetric matrices, the MRA axioms, especially MRA1, are no longer applicable. The reason is that, under the MRA1 definition of smoothness ⟨v, Av⟩, the smoothness of a skew-symmetric matrix A is identically zero: v^T A v = −v^T A v for every vector v.

In symmetric MMF, the choice of the sparse orthogonal factors and the diagonal part of the core-diagonal matrix is based on the assumption that the multiscale basis vectors determined by the MMF algorithm are as close to eigenvectors as possible. In other words, the filtration defined in equation 1 is a sequence of subspaces that is as close to a spectral filtration as possible. Indeed, MMF approximately diagonalizes a symmetric matrix using O(n) rotations, where n is the dimension of the matrix. We take this intuition as our starting point for extending MMF to skew-symmetric matrices.

Note that a matrix A is skew-symmetric if A^T = −A. Skew-symmetric matrices have complex (purely imaginary) eigenvalues and are therefore not diagonalizable over the reals. Thus, a skew-symmetric matrix can never be perfectly represented using the same form of MMF as for a symmetric matrix over the reals. In order to overcome this issue, we consider a well known observation (murnaghan1931canonical) that a skew-symmetric matrix, under an orthogonal transformation, assumes the so-called Murnaghan canonical form

A = U^T diag(B_1, B_2, ..., B_k, 0, ..., 0) U,    (4)

where the B_j are 2 × 2 matrices of the form

B_j = [[0, b_j], [−b_j, 0]],

where b_j ∈ R.

This observation affords a multiresolution factorization for skew-symmetric matrices identical to that for symmetric matrices, with the difference that the core-diagonal matrix is allowed to have the following form:

H = diag(H_core, B_1, B_2, ..., B_k),    (5)

where the B_j are 2 × 2 matrices as defined above.
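Assuming SciPy is available, the block structure of the Murnaghan form can be observed numerically via the real Schur decomposition, which for a skew-symmetric input produces exactly such 2 × 2 antisymmetric blocks (a sketch of ours, not the factorization algorithm proposed here):

```python
import numpy as np
from scipy.linalg import schur

# A real skew-symmetric matrix is orthogonally similar to a block-diagonal
# matrix with 2x2 blocks [[0, b], [-b, 0]] (plus zeros). The real Schur
# form computed by SciPy exposes this structure.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B - B.T                      # skew-symmetric

T, Z = schur(A, output='real')   # A = Z @ T @ Z.T with Z orthogonal
# T is itself skew-symmetric with zero diagonal: its nonzeros sit in
# 2x2 antisymmetric blocks, matching the Murnaghan canonical form.
```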

A couple of observations based on this form are in order. First, notice that the skew-symmetry of the matrix is always preserved in its MMF. Second, this form also allows for interaction between wavelets at adjacent levels. This feature is distinct from the original MMF procedure, in which no such interactions occur. To obtain the factorization, the procedure is similar to the GreedyJacobi algorithm from kondor2014multiresolution, with the difference that in the last step, instead of zeroing out off-diagonal elements, we sparsify the matrix into the Murnaghan normal form as described above.

Thus, the multiresolution decomposition of a square matrix A can be obtained by factorizing each of the two parts of equation 3 separately and writing A as a sum of two MMFs. In other words, we may write

A ≈ Q_1^T ... Q_L^T H Q_L ... Q_1 + Q̃_1^T ... Q̃_L^T H̃ Q̃_L ... Q̃_1,

where H̃ has the form defined in equation 5.

In the next section, we present another approach for multiresolution factorization.

6.2 MMF via direct factorization

In this approach, we aim to capture hierarchical structure by operating on the square matrix A directly, rather than decomposing it into an additive form. Recall that we would like to identify clusters of coordinates that a matrix more closely couples than others, and determine an appropriate multiresolution analysis of A. When the matrix is symmetric, the linear transformation that it represents is self-adjoint. In the asymmetric case, depending on whether we are transforming with A (i.e., x ↦ Ax) or with A^T, we may end up with different ways in which A couples the coordinates.

We consider two different ways of performing multiresolution analysis of A: one based on A itself, and another on A^T. Suppose P represents the wavelet transform in the former case, and Q that of the latter. Thus, we have

H = P A Q^T,

which is the matrix A expressed in the bases P and Q. In the symmetric case, H takes a core-diagonal form, based on the fact that the eigenvalue decomposition converts a symmetric matrix to diagonal form, and MMF uses orthonormal transformations that are designed to be as close to the eigenvectors as possible. In the asymmetric case, we relax this assumption and allow some of the off-diagonal elements of H to be nonzero. This is motivated by the fact that the frequencies or smoothness coefficients filtered by the left and right hand sides may not align with each other.
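The two-sided transform and its inverse can be sketched as follows (our toy example; random orthogonal matrices stand in for the actual wavelet transforms P and Q):

```python
import numpy as np

# Independent orthogonal bases P (row space / outgoing structure) and
# Q (column space / incoming structure) give H = P A Q^T, and A is
# recovered exactly as A = P^T H Q since P and Q are orthogonal.
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
P, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # stand-in for left transform
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # stand-in for right transform

H = P @ A @ Q.T       # A expressed in the bases P and Q
A_rec = P.T @ H @ Q   # exact round trip
```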



When the matrix A represents the adjacency matrix of a directed graph, we are essentially identifying hierarchical cluster structures by considering 1) only the outgoing edges, and 2) only the incoming edges. The matrix H gives a mapping between these two cluster structures.

We define asymmetric MMF to be a multiresolution factorization of the following form:

A ≈ P_1^T P_2^T ... P_L^T H Q_L ... Q_2 Q_1,

where

  • Each P_ℓ and Q_ℓ is a highly sparse, orthogonal rotation matrix similar to the rotations in a symmetric MMF

  • H is a core-sparse matrix, that is, under an appropriate permutation of rows and columns, H is a block matrix with the upper-left block dense and the other three blocks highly sparse

  • When P_ℓ = Q_ℓ for all ℓ and H is in core-diagonal form with a symmetric core, the above factorization reduces to a symmetric MMF.

The computation of the asymmetric MMF is obtained using an approach similar to the symmetric MMF. The algorithm we propose is designed to minimize the Frobenius norm error

||A − P_1^T ... P_L^T H Q_L ... Q_1||_Frob.

We determine the left and right rotations in a greedy manner. That is, we compute a sequence of rotated versions of A,

A_ℓ = P_ℓ A_{ℓ-1} Q_ℓ^T,  with A_0 = A.

Each of the rotations is computed by identifying sets of similar rows (for the left rotations) and columns (for the right rotations), and determining an appropriate k-point rotation. This requires us to compute Gram matrices on both the rows and the columns. Finally, we sparsify the final rotated version of A to determine the core-sparse matrix H. We explore three ways of sparsification:

  1. CoreDiagonal: retain only the dense core and the diagonal elements of H

  2. TopN: retain the top N elements in terms of magnitude

  3. GreedyTopN: select elements by magnitude greedily such that no two of the selected elements fall in the same row/column
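The TopN and GreedyTopN schemes can be sketched as follows (our illustrative NumPy implementation; the function names mirror the scheme names but are otherwise our own):

```python
import numpy as np

def top_n(H, n):
    """TopN: keep the n largest-magnitude entries of H, zero the rest."""
    idx = np.argsort(np.abs(H).ravel())[-n:]
    out = np.zeros_like(H)
    out.ravel()[idx] = H.ravel()[idx]
    return out

def greedy_top_n(H, n):
    """GreedyTopN: pick entries by magnitude in decreasing order, greedily
    skipping any entry whose row or column is already used, so that at most
    one entry is kept per row and per column."""
    out = np.zeros_like(H)
    order = np.argsort(np.abs(H).ravel())[::-1]
    used_rows, used_cols, kept = set(), set(), 0
    for flat_idx in order:
        if kept == n:
            break
        i, j = divmod(int(flat_idx), H.shape[1])
        if i in used_rows or j in used_cols:
            continue
        out[i, j] = H[i, j]
        used_rows.add(i); used_cols.add(j); kept += 1
    return out
```

GreedyTopN's one-entry-per-row/column constraint keeps the sparse part close to (permuted) diagonal form; TopN drops that constraint entirely.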

The full procedure is summarized in Algorithm 1.

The running time of this algorithm is comparable to that of the GreedyJacobi routine used in the original symmetric MMF. Nevertheless, the computation can be made faster by similarly extending the parallel MMF algorithm to maintain row and column Gram matrices and compute separate left and right rotations. It must be noted that applying the left rotations to A does not affect the column Gram matrix, and applying the right rotations does not affect the row Gram matrix.
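This invariance is easy to verify numerically (our sketch, with random orthogonal matrices standing in for the sparse rotations):

```python
import numpy as np

# Check of the Gram-matrix invariance noted above.
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
P, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # stand-in left rotation
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # stand-in right rotation

# Left rotation P A changes rows, but the column Gram matrix A^T A is
# preserved: (P A)^T (P A) = A^T P^T P A = A^T A.
col_gram_ok = np.allclose((P @ A).T @ (P @ A), A.T @ A)
# Right rotation A Q^T changes columns, but the row Gram matrix A A^T is
# preserved: (A Q^T)(A Q^T)^T = A Q^T Q A^T = A A^T.
row_gram_ok = np.allclose((A @ Q.T) @ (A @ Q.T).T, A @ A.T)
```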

Input: matrix A ∈ R^{n×n}, core size k
  S_rows ← [n]  (set of active rows)
  S_cols ← [n]  (set of active columns)
  for ℓ = 1 to n − k do
     Select a random row i from the set S_rows; using the row Gram matrix, find rows similar to i and compute the left rotation P_ℓ
     Remove i from S_rows
     Select a random column j from the set S_cols; using the column Gram matrix, find columns similar to j and compute the right rotation Q_ℓ
     Remove j from S_cols
     A ← P_ℓ A Q_ℓ^T
  end for
  H ← sparsification of A via CoreDiagonal or TopN or GreedyTopN
Algorithm 1 Asymmetric MMF via direct factorization

6.3 Numerical experiments

Similar to the evaluation of MMF in teneva2016multiresolution, we compare our asymmetric MMF formulations with the low-rank-based CUR decomposition (mahoney2009cur) on the task of compressing square matrices. The CUR decomposition is a widely used alternative to the singular value decomposition for compressing large matrices. To reduce a matrix A ∈ R^{n×n} to rank r, the method proceeds by carefully selecting r rows (R) and r columns (C) of A and determining a matrix U such that the product CUR is as close to the best rank-r approximation of A as possible.
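A minimal CUR sketch (ours; it uses uniform row/column sampling for brevity, whereas mahoney2009cur selects rows and columns by leverage scores):

```python
import numpy as np

def simple_cur(A, r, rng):
    """Toy CUR: pick r columns C and r rows R of A uniformly at random
    (practical CUR uses leverage-score sampling) and set U = C^+ A R^+,
    so that C U R approximates A."""
    n = A.shape[0]
    cols = rng.choice(n, size=r, replace=False)
    rows = rng.choice(n, size=r, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

rng = np.random.default_rng(5)
# On an exactly rank-3 matrix, r = 3 recovers A almost exactly, since
# the sampled columns/rows span the column/row space with probability 1.
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))
C, U, R = simple_cur(A, 3, rng)
```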

To set up a comparison, we sample 70 square matrices from the UFlorida sparse matrix repository (davis2011university). The matrices in this repository were sourced from various fields of science and engineering and contain data arising from circuit simulations, social networks, chemical processes, 2D/3D problems, etc. All the sampled matrices were chosen such that the numerical symmetry was less than , and had a maximum dimensionality of .

6.3.1 Results on MMF via an additive decomposition

We first present results comparing the asymmetric MMF obtained by the additive decomposition to CUR. Table 1 shows the percentage of times the asymmetric MMF achieved a better compression error than CUR over 65 matrices. Since both the CUR and MMF algorithms are randomized, we report results averaged over 3 trials. CUR performs better than MMF in the majority of cases.

Matrix kind num. matrices 1% 10% 25% 50% 75%
problems with underlying 2D/3D geometry
structural problem 4 50.0 50.0 50.0 0.0 0.0
computational fluid dynamics problem 3 0.0 0.0 0.0 0.0 0.0
2D/3D problem 5 80.0 80.0 60.0 60.0 60.0
materials problem 2 0.0 0.0 0.0 0.0 0.0
problems with no underlying geometry
economic problem 11 18.2 9.1 9.1 9.1 9.1
directed graph 19 5.3 5.3 5.3 0.0 0.0
power network problem 3 0.0 0.0 0.0 0.0 0.0
circuit simulation problem 2 0.0 0.0 0.0 0.0 0.0
chemical process simulation problem 23 0.0 0.0 8.7 0.0 0.0
statistical/mathematical problem 1 100.0 0.0 0.0 0.0 0.0
counter-example problem 1 0.0 0.0 0.0 0.0 0.0
total wins: 13.5 10.8 12.2 5.4 5.4
Table 1: Matrix compression using asymmetric MMF (additive). Each entry of the rightmost 5 columns is the percentage of matrices on which MMF performed better than CUR, as measured by Frobenius norm compression error. Each column corresponds to the percentage size (w.r.t original size) to which a matrix is compressed.

6.3.2 Results on MMF via direct factorization

Next, we report results on the direct factorization variant of the asymmetric MMF. In particular, we consider two different versions of the method based on which elements are retained in the core-sparse matrix H. In the first case, we retain the compressed dense core together with N elements chosen greedily by magnitude, ensuring that no two of these elements fall in the same row or column.

Table 2 shows the percentage of times this version of the asymmetric MMF wins over CUR in compression error.

Matrix kind num. matrices 1% 10% 25% 50% 75%
problems with underlying 2D/3D geometry
structural problem 4 50.0 50.0 25.0 0.0 0.0
materials problem 1 0.0 0.0 0.0 0.0 0.0
2D/3D problem 4 100.0 75.0 75.0 75.0 0.0
computational fluid dynamics problem 3 100.0 100.0 0.0 0.0 0.0
problems with no underlying geometry
statistical/mathematical problem 1 100.0 100.0 100.0 0.0 0.0
economic problem 9 77.8 22.2 11.1 33.3 11.1
power network problem 2 100.0 0.0 50.0 0.0 0.0
directed graph 15 80.0 20.0 6.7 0.0 0.0
chemical process simulation problem 21 80.9 23.8 14.3 0.0 0.0
circuit simulation problem 2 100.0 0.0 0.0 0.0 0.0
counter-example problem 1 0.0 0.0 0.0 0.0 0.0
total wins: 79.4 30.2 17.5 9.5 1.6
Table 2: Matrix compression using asymmetric MMF (GreedyTopN). See Table 1 for a description of the table entries.

We observe that this version of MMF is significantly better than the additive version, and is able to capture structure both in problems with known underlying geometry and in problems without underlying geometry.

Relaxing the structure of the matrix H further, we report the results for simply storing the top N largest-magnitude elements in the core-sparse matrix in Table 3.

Matrix kind num. matrices 1% 10% 25% 50% 75%
problems with underlying 2D/3D geometry
structural problem 4 100.0 100.0 100.0 50.0 50.0
materials problem 1 100.0 100.0 100.0 100.0 100.0
2D/3D problem 5 100.0 100.0 100.0 80.0 60.0
computational fluid dynamics problem 3 100.0 100.0 100.0 0.0 0.0
problems with no underlying geometry
directed graph 18 100.0 50.0 22.3 16.7 0.0
chemical process simulation problem 21 100.0 90.5 67.7 57.1 38.1
circuit simulation problem 2 100.0 100.0 100.0 50.0 0.0
power network problem 3 100.0 100.0 100.0 66.7 33.3
counter-example problem 1 100.0 100.0 100.0 100.0 100.0
economic problem 10 100.0 100.0 100.0 70.0 50.0
statistical/mathematical problem 1 100.0 100.0 100.0 100.0 0.0
total wins: 100.0 84.1 69.6 49.3 29.0
Table 3: Matrix compression using asymmetric MMF (TopN). See Table 1 for a description of the table entries.

Here, we see that MMF is able to capture structure and compress nearly all kinds of matrices at all compression levels.

From the numerical results, it seems that allowing interaction between wavelet levels is critical to achieving good approximation errors. This indicates that hierarchical structure in nonsymmetric matrices is more complex than that uncovered by simply extending the symmetric MMF analogous to an SVD.

For smaller compression ratios, CUR seems to be preferable for some types of problems such as directed graphs. It is possible that some matrices may have global-scale behavior that is superimposed over the hierarchical structure. This is possible, for instance, when the data is noisy. In the next part, we explore ways of combining MMF and low-rank approaches to achieve even better compression errors.

6.3.3 Matrix compression via a combination of CUR and MMF

In order to better motivate this section, we first consider Figure 1, which shows the error of MMF approximation of a matrix as a function of the decay rate of its singular values.

Figure 1: Approximation error of symmetric MMF with different rates of decay in the eigenvalues. The random matrices were generated by computing A = U D U^T, where U is a fixed, randomly generated orthogonal matrix and D is a diagonal matrix whose entries decay at a rate governed by a decay coefficient, which takes uniformly spaced values across the runs. The X-axis corresponds to the decay coefficient.

We observe that when the singular values decay at a super-linear rate, MMF does not approximate the matrix well. However, in many real-world matrices, the decay of singular values may not be uniform and may fluctuate to very high and low values. An example of such a spectrum is shown in Figure 2. In this case we see that although there are segments where the decay rate is sub-linear, the large decay rate around singular value 1500 may significantly impact the effectiveness of MMF on this matrix. Thus, it might be well motivated to first compress the matrix using low-rank methods and then use MMF on the residual. We describe this technique in Algorithm 2.

Figure 2: Plot showing the singular values of the ‘ex8’ matrix from the UFlorida sparse matrix repository in descending order. Although the matrix is full-rank, the sharp drop in the singular values around index 800 imparts a low-rank nature to the matrix. (Note that the y-axis is on log-scale). Image sourced from UFlorida Sparse Matrix Repository (davis2011university).
Input: square matrix A, CUR rank r, and core size k
  [C, U, R] = CUR(A, r)
  [P, H, Q] = MMF(A − CUR, k)
Algorithm 2 Matrix compression using a combination of low-rank and multiresolution methods
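The pipeline of Algorithm 2 can be sketched as follows (our illustration; since a full MMF implementation is beyond the scope of this sketch, a truncated SVD stands in for the MMF step, and uniform sampling stands in for leverage-score CUR):

```python
import numpy as np

def truncated_svd(A, k):
    """Stand-in for the MMF step: any compressor of the residual serves
    to illustrate the pipeline; here a rank-k SVD truncation."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def combined_compress(A, r, k, rng):
    """Remove a small global (low-rank) component first, then compress
    the residual. CUR is approximated by uniform row/column sampling."""
    n = A.shape[0]
    cols = rng.choice(n, size=r, replace=False)
    rows = rng.choice(n, size=r, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    low_rank = C @ U @ R          # global-scale component
    residual = A - low_rank       # hierarchical structure lives here
    return low_rank + truncated_svd(residual, k)

rng = np.random.default_rng(6)
A = rng.standard_normal((20, 20))
A_hat = combined_compress(A, r=2, k=5, rng=rng)
err = np.linalg.norm(A - A_hat)
```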

For various values of the CUR rank r, we report the errors for compressing various matrices to 5% of their original size in Figure 3.

Figure 3: Frobenius norm error as a function of CUR compression rank for compressing the matrix to 5% of its original size using the combined low-rank + multiresolution method. Solid blue line indicates error of compression with only CUR to 5% matrix size, while the dotted blue line indicates the same, but with MMF.

In most of the matrices, we observe that there exists at least one rank to which the matrix can first be compressed using CUR, before compressing with MMF, that results in lower compression errors than either the CUR or the MMF method individually. In practice, there is no principled way to determine this rank for a given matrix; nevertheless, being able to remove a small global-scale component of a matrix, after which its hierarchical structure may be easily extracted, is an interesting and worthwhile problem.

In all the experiments, the running time of MMF is comparable to, or slightly higher than, that of the low-rank-based methods. However, we note that our algorithms can be made significantly faster by straightforwardly adapting parallel versions of MMF (kondor2015parallel) to the asymmetric case. We expect speedups similar to those achieved in the symmetric case (teneva2016multiresolution).

7 Conclusion

In this paper, we have presented approaches to extend Multiresolution Matrix Factorization to nonsymmetric matrices. In general, efficiently estimating hierarchical structure in matrices is an increasingly important problem for which there currently exist very few practically useful methods. For future work we aim to use ideas from wavelet theory on more general spaces to define MMFs for general square matrices in which the MRA interpretation is clear.