Deterministic Completion of Rectangular Matrices Using Asymmetric Ramanujan Graphs

08/02/2019 · Shantanu Prasad Burnwal, et al. · Indian Institute of Technology Hyderabad

In this paper we study the matrix completion problem: Suppose $X \in \mathbb{R}^{n_r \times n_c}$ is unknown except for an upper bound $r$ on its rank. By measuring a small number $m \ll n_r n_c$ of the elements of $X$, is it possible to recover $X$ exactly, or at least to construct a reasonable approximation of $X$? There are two approaches to choosing the sample set, namely probabilistic and deterministic. At present there are very few deterministic methods, and they apply only to square matrices. The focus of the present paper is on deterministic methods that work for rectangular as well as square matrices. The elements to be sampled are chosen as the edge set of an asymmetric Ramanujan graph. For such a measurement matrix, we derive bounds on the error between a scaled version of the sampled matrix and the unknown matrix, and show that, under suitable conditions, the unknown matrix can be recovered exactly. Even for the case of square matrices, these bounds improve on known results; they are entirely new for rectangular matrices. This raises the question of how such asymmetric Ramanujan graphs might be constructed. While some techniques exist for constructing Ramanujan bipartite graphs with equal numbers of vertices on both sides, until now no method exists for constructing Ramanujan bipartite graphs with unequal numbers of vertices on the two sides. We provide a method for the construction of an infinite family of asymmetric biregular Ramanujan graphs with $q^2$ left vertices and $lq$ right vertices, where $q$ is any prime number and $l$ is any integer between $2$ and $q$. The left degree is $l$ and the right degree is $q$. So far as the authors are aware, this is the first explicit construction of an infinite family of asymmetric Ramanujan graphs.


1 Introduction

1.1 General Statement

Compressed sensing refers to the recovery of high-dimensional but low-complexity objects from a small number of linear measurements. Recovery of sparse (or nearly sparse) vectors and recovery of high-dimensional but low-rank matrices are the two most popular applications of compressed sensing. The object of study in the present paper is the matrix completion problem, which is a special case of low-rank matrix recovery. Matrix completion has attracted considerable attention because of its applications in areas such as image processing, sketching, quantum tomography, and recommendation systems (e.g., the Netflix problem). An excellent survey of the matrix completion problem can be found in [1].

1.2 Problem Definition

The matrix completion problem can be stated formally as follows: Suppose $X \in \mathbb{R}^{n_r \times n_c}$ is an unknown matrix that we wish to recover, whose rank is known to be bounded by a known integer $r$. Let $[n]$ denote the set $\{1, \dots, n\}$ for each integer $n$. In the matrix completion problem, a set $\Omega \subseteq [n_r] \times [n_c]$ is specified, known as the sample set. The measurements consist of $X_{ij}$ for all $(i,j) \in \Omega$.

To be specific, suppose $|\Omega| = m$, where $m$ is the total number of samples. We are able to observe the values of the unknown matrix $X$ at the locations in the set $\Omega$. Then the measurement $Y$ can be expressed as the Hadamard product (recall that the Hadamard product $A \circ B$ of two matrices of equal dimensions is defined by $(A \circ B)_{ij} = A_{ij} B_{ij}$ for all $i, j$)

$$Y = E_\Omega \circ X,$$

where the binary matrix $E_\Omega \in \{0,1\}^{n_r \times n_c}$ is defined by $(E_\Omega)_{ij} = 1$ if $(i,j) \in \Omega$ and $(E_\Omega)_{ij} = 0$ otherwise.

From these measurements, and the information that $\operatorname{rank}(X) \leq r$, we aim to construct $X$ uniquely, or at least to construct a good approximation of $X$.
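The following numpy sketch illustrates the measurement model; the random choice of $\Omega$ here is purely for illustration (the whole point of the paper is to choose $\Omega$ deterministically, as in Section 3), and all identifiers are ours rather than the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_c, r = 60, 40, 3

# A rank-r unknown matrix X = U V^T.
X = rng.standard_normal((n_r, r)) @ rng.standard_normal((r, n_c))

# Sample set Omega of size m, encoded as a binary mask E_Omega.
m = 800
idx = rng.choice(n_r * n_c, size=m, replace=False)
E = np.zeros((n_r, n_c))
E.flat[idx] = 1.0

# Measurement: the Hadamard product Y = E_Omega o X.
Y = E * X
print("fraction of entries sampled:", E.mean())
```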

One possible approach to the matrix completion problem is to set

$$\hat{X} = \operatorname*{argmin}_{Z} \; \operatorname{rank}(Z) \quad \text{s.t.} \quad Z_{ij} = X_{ij} \;\; \forall (i,j) \in \Omega. \tag{1}$$

The above problem is a special case of minimizing the rank of an unknown matrix subject to linear constraints, and is therefore NP-hard [2]. Since the problem is NP-hard, a logical approach is to replace the rank function by its convex relaxation, which is the nuclear norm $\|\cdot\|_*$, i.e., the sum of the singular values of a matrix, as shown in [3]. Therefore the convex relaxation of (1) is

$$\hat{X} = \operatorname*{argmin}_{Z} \; \|Z\|_* \quad \text{s.t.} \quad Z_{ij} = X_{ij} \;\; \forall (i,j) \in \Omega. \tag{2}$$

It can be shown that, under suitable conditions, the unique solution to (2) is the true but unknown matrix $X$. Such results are reviewed in Section 2.
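As a concrete illustration of (2), here is a minimal sketch using the off-the-shelf modeling package cvxpy. This is a generic convex-programming formulation, not the authors' own code, and the function name is ours.

```python
import numpy as np
import cvxpy as cp

def complete_nuclear(Y: np.ndarray, E: np.ndarray) -> np.ndarray:
    """Nuclear norm minimization (2): min ||Z||_* s.t. Z agrees with X on Omega.

    Y = E o X is the measurement and E is the binary mask E_Omega.
    """
    Z = cp.Variable(Y.shape)
    constraints = [cp.multiply(E, Z) == Y]  # Z_ij = X_ij for all (i,j) in Omega
    cp.Problem(cp.Minimize(cp.norm(Z, "nuc")), constraints).solve()
    return Z.value
```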

Another emerging trend is to use the so-called “max-norm” introduced in [4]. To define this norm, we begin by recalling that, if $U \in \mathbb{R}^{n \times k}$, then an induced matrix norm is given by

$$\|U\|_{2 \to \infty} = \max_i \|u^i\|_2,$$

where $u^i$ denotes the $i$-th row of the matrix $U$. The max-norm of a matrix $X$ is defined as

$$\|X\|_{\max} = \min_{X = UV^\top} \|U\|_{2 \to \infty} \|V\|_{2 \to \infty}. \tag{3}$$

With this definition, an alternate approach to matrix completion is

$$\hat{X} = \operatorname*{argmin}_{Z} \; \|Z\|_{\max} \quad \text{s.t.} \quad Z_{ij} = X_{ij} \;\; \forall (i,j) \in \Omega. \tag{4}$$
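Computing the max-norm exactly requires a minimization over all factorizations (an SDP); the sketch below (ours) merely evaluates the two induced norms appearing in (3) for one given factorization, which yields an upper bound on $\|X\|_{\max}$.

```python
import numpy as np

def two_to_inf(U: np.ndarray) -> float:
    """Induced 2->infinity norm: the largest Euclidean norm of a row of U."""
    return float(np.max(np.linalg.norm(U, axis=1)))

def max_norm_upper_bound(U: np.ndarray, V: np.ndarray) -> float:
    """||U||_{2->inf} * ||V||_{2->inf} for X = U V^T; the max-norm in (3)
    is the minimum of this quantity over all such factorizations."""
    return two_to_inf(U) * two_to_inf(V)
```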

1.3 Contributions of the Present Paper

In the literature to date, most papers assume that the sample set $\Omega$ is chosen at random from $[n_r] \times [n_c]$, either without replacement as in [5], or with replacement as in [6]. The authors are aware of only two papers [7, 8] in which a deterministic procedure is suggested for choosing the sample set as the edge set of a Ramanujan graph. (This concept is defined below.)

In case $\Omega$ is chosen at random, it makes little difference whether the unknown matrix is square or rectangular. However, if $\Omega$ is to be chosen in a deterministic fashion, then the approach suggested in [7, 8] requires that the unknown matrix be square. (Though the paper [7] uses the notation $n_r \times n_c$, in the theorems it is assumed that $n_r = n_c$.) The reason for this is that, while it is possible to define the notion of a Ramanujan bigraph, until now there is not a single explicit construction of such a graph, only some abstract formulas that are not explicitly computable [9, 10]. One of the main contributions of the present paper is to present an infinite family of Ramanujan bigraphs; this is the first such explicit construction. Using this construction, we provide explicit deterministic procedures for choosing the sample set to recover an unknown rectangular matrix, and prove bounds on the recovery error. These bounds improve on the available bounds in two different ways. First, they are applicable to rectangular matrices, whereas existing deterministic methods do not apply to this case. Second, even in the case of square matrices, our bounds improve on currently available bounds. These improvements are achieved through a modification of the so-called “expander mixing lemma” for bipartite graphs, a result that is possibly of independent interest.

In addition to developing the theory, we also study the “phase transition behavior” of nuclear norm minimization as a recovery technique, which shows that the currently available sufficient conditions for matrix completion are quite far from being necessary.

2 Literature Review

In [5], the authors point out that the formulations (1) and (2) do not always recover an unknown matrix. They illustrate this by taking $X$ to be the matrix with a $1$ in a single position and zeros elsewhere. In this case, unless that position belongs to $\Omega$, the solution to both (1) and (2) is the zero matrix, which does not equal $X$. The difficulty in this case is that the matrix $X$ has high “coherence,” as defined next.

Definition 1.

Suppose $X \in \mathbb{R}^{n_r \times n_c}$ has rank $r$ and the reduced singular value decomposition $X = U \Sigma V^\top$, where $U \in \mathbb{R}^{n_r \times r}$, $V \in \mathbb{R}^{n_c \times r}$, and $\Sigma$ is the diagonal matrix of the nonzero singular values of $X$. Let $P_U = UU^\top$ denote the orthogonal projection of $\mathbb{R}^{n_r}$ onto the column space of $U$. Finally, let $e_i$ denote the $i$-th canonical basis vector. Then we define

$$\mu_0(U) = \frac{n_r}{r} \max_i \|P_U e_i\|_2^2 = \frac{n_r}{r} \max_i \|u^i\|_2^2, \tag{5}$$

where $u^i$ is the $i$-th row of $U$. The quantity $\mu_0(V)$ is defined analogously, and

$$\mu_0(X) = \max\{\mu_0(U), \mu_0(V)\}. \tag{6}$$

Next, define

$$\mu_1(X) = \sqrt{\frac{n_r n_c}{r}} \max_{i,j} |(UV^\top)_{ij}|. \tag{7}$$

It is shown in [5] that $1 \leq \mu_0(U) \leq n_r/r$. The upper bound is achieved if some canonical basis vector is a column of $U$. (This is what happens with the matrix with all but one element equalling zero.) The lower bound is achieved if every element of $U$ has the same magnitude $1/\sqrt{n_r}$, as in a Walsh-Hadamard matrix.

To facilitate the statement of some known results in matrix completion, we reproduce from the literature two standard coherence assumptions on the unknown matrix $X$.

  (A1) There are known upper bounds $\mu_0$ and $\mu_1$ on $\mu_0(X)$ and $\mu_1(X)$ respectively.

  (A2) There is a constant $\mu$ such that

    $$\|u^i\|_2^2 \leq \frac{\mu r}{n_r} \quad \forall i \in [n_r], \tag{8}$$

    $$\|v^j\|_2^2 \leq \frac{\mu r}{n_c} \quad \forall j \in [n_c], \tag{9}$$

    where $u^i$ and $v^j$ are shorthand for the $i$-th row of $U$ and the $j$-th row of $V$ respectively.
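A small sketch computing the quantities in Definition 1, usable for checking assumptions (A1) and (A2); since the displayed formulas above are reconstructed from the standard coherence definitions of [5], treat this as an assumption-laden illustration rather than the authors' code.

```python
import numpy as np

def coherences(X: np.ndarray, tol: float = 1e-10):
    """Return (mu_0(X), mu_1(X)) as in (5)-(7)."""
    n_r, n_c = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))          # numerical rank
    U, V = U[:, :r], Vt[:r, :].T
    mu0_U = n_r / r * np.max(np.sum(U**2, axis=1))          # (5)
    mu0_V = n_c / r * np.max(np.sum(V**2, axis=1))
    mu0 = max(mu0_U, mu0_V)                                 # (6)
    mu1 = np.sqrt(n_r * n_c / r) * np.max(np.abs(U @ V.T))  # (7)
    return mu0, mu1
```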

2.1 Probabilistic Sampling

There are two approaches to choosing the sample set $\Omega$, namely probabilistic and deterministic. In the probabilistic approach the elements of $\Omega$ are chosen at random from $[n_r] \times [n_c]$. In this setting one can further distinguish between two distinct situations, namely sampling with replacement or without replacement. If one were to sample $m$ out of the $n_r n_c$ elements of the unknown matrix without replacement, then one is guaranteed that exactly $m$ distinct elements of $X$ are measured. However, the disadvantage is that the locations of the samples are not independent, which makes the analysis quite complex. This is the approach adopted in [5].

Theorem 1.

(See [5, Theorem 1.1].) Draw

(10)

samples from $[n_r] \times [n_c]$ without replacement. Then with probability at least $1 - \delta$, where $\delta$ is given by

(11)

the matrix recovered using (2) is the unique solution, so that $\hat{X} = X$. Here the unspecified quantities in (10) and (11) are universal constants.

An alternative is to sample the elements of $\Omega$ with replacement. In this case the locations of the samples are indeed independent. However, the price to be paid is that, with some small probability, there will be duplicate samples, so that after $m$ random draws, the number of elements of $X$ that are measured could be smaller than $m$. This is the approach adopted in [6]. On balance, the approach of sampling with replacement is easier to analyze.
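The tradeoff just described is easy to quantify: with $m$ draws from $N = n_r n_c$ locations with replacement, the expected number of distinct locations observed is $N(1 - (1 - 1/N)^m) < m$. A quick simulation (ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_c, m = 100, 100, 2000
N = n_r * n_c

draws = rng.integers(0, N, size=m)   # sampling with replacement
print("distinct locations observed:", np.unique(draws).size, "out of", m)
print("expected distinct:", N * (1 - (1 - 1 / N) ** m))
```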

Theorem 2.

(See [6, Theorem 2].) Assume without loss of generality that $n_r \leq n_c$. Choose some constant $\beta > 1$, and draw

(12)

samples from $[n_r] \times [n_c]$ with replacement. Define $\hat{X}$ as in (2). Then, with probability at least $1 - \delta$, where $\delta$ is given by

(13)

the true matrix $X$ is the unique solution to the optimization problem, so that $\hat{X} = X$.

2.2 Basic Concepts from Graph Theory

In contrast with probabilistic sampling, known deterministic approaches to sampling make use of the concept of Ramanujan graphs. For this reason, we introduce a bare minimum of graph theory. Further details about Ramanujan graphs can be found in [11, 12].

Suppose $E \in \{0,1\}^{n_r \times n_c}$. Then $E$ can be interpreted as the biadjacency matrix of a bipartite graph with $n_r$ vertices on one side and $n_c$ vertices on the other. If $n_r = n_c$, then the bipartite graph is said to be balanced, and is said to be unbalanced if $n_r \neq n_c$. The prevailing convention is to refer to the side with the larger number of vertices as the “left” side and the other as the “right” side. A bipartite graph is said to be left-regular with degree $d_r$ if every left vertex has degree $d_r$, and right-regular with degree $d_c$ if every right vertex has degree $d_c$. It is said to be $(d_r, d_c)$-biregular if it is both left- and right-regular with row-degree $d_r$ and column-degree $d_c$. Obviously, in this case we must have that $n_r d_r = n_c d_c$. It is convenient to say that a matrix $E$ is “$(d_r, d_c)$-biregular” to mean that the associated bipartite graph is $(d_r, d_c)$-biregular. The bipartite graph corresponding to $E$ is said to be an asymmetric Ramanujan graph if

$$\sigma_2(E) \leq \sqrt{d_r - 1} + \sqrt{d_c - 1}, \tag{14}$$

where $\sigma_2(E)$ denotes the second largest singular value of $E$.
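Given any candidate biadjacency matrix, condition (14) is straightforward to verify numerically; the following sketch (ours) checks biregularity and the Ramanujan bound.

```python
import numpy as np

def is_ramanujan_bigraph(E: np.ndarray) -> bool:
    """Check (14): sigma_2(E) <= sqrt(d_r - 1) + sqrt(d_c - 1)."""
    row_deg, col_deg = E.sum(axis=1), E.sum(axis=0)
    assert np.ptp(row_deg) == 0 and np.ptp(col_deg) == 0, "not biregular"
    d_r, d_c = row_deg[0], col_deg[0]
    s = np.linalg.svd(E, compute_uv=False)   # s[0] equals sqrt(d_r * d_c)
    return bool(s[1] <= np.sqrt(d_r - 1) + np.sqrt(d_c - 1))
```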

2.3 Deterministic Sampling

The following result is claimed in [7].

Theorem 3.

(See [7, Theorem 4.2].) Suppose Assumptions (A1) and (A2) hold. Choose $E$ to be the adjacency matrix of a $d$-regular graph. Define $\hat{X}$ as in (2). With these assumptions, if

(15)

then the true matrix $X$ is the unique solution to the optimization problem (2).

Theorems 1 and 2 pertain to nuclear norm minimization as in (2). An alternate set of bounds is obtained in [8] for max-norm minimization as in (4). The matrix is assumed to be square, with $n_r = n_c = n$.

Theorem 4.

(See [8, Theorem 2].) Suppose $E$ is the adjacency matrix of a $d$-regular graph with second largest (in magnitude) eigenvalue equal to $\lambda$. Define $\hat{X}$ as in (4). Then

(16)

where $K_G$ is Grothendieck’s constant. There is no closed-form formula for this constant, but it is known that

$$1.676 \leq K_G \leq 1.783.$$

See [13] for this and other useful properties of Grothendieck’s constant.

Theorems 1 and 2 on the one hand, and Theorem 4 on the other, have complementary strengths and weaknesses. Theorems 1 and 2 ensure the exact recovery of the unknown matrix via nuclear norm minimization. However, the bounds involve the coherence of the unknown matrix as well as its rank. In contrast, the bound in Theorem 4 is “universal” in that it does not involve either the rank or the coherence of the unknown matrix $X$, just its max-norm. Moreover, the bound is on the Frobenius norm of the difference $\hat{X} - X$, and thus provides an “element by element” bound. On the other hand, there are no known conditions under which max-norm minimization exactly recovers the unknown matrix.

3 New Results

In this section we state without proof the principal new results in the paper. The proofs are given in subsequent sections.

3.1 Rationale of Using Ramanujan Bigraphs

We begin by giving a rationale for why biadjacency matrices of Ramanujan bigraphs are useful as measurement matrices. Suppose we could choose $E = \mathbf{1}_{n_r} \mathbf{1}_{n_c}^\top$, the matrix of all ones. Then $E \circ X = X$, and we could recover $X$ exactly from the measurements. However, this choice of $E$ corresponds to measuring every element of $X$, and there would be nothing “compressed” about this sensing. Now suppose that $E = A$, the biadjacency matrix of a $(d_r, d_c)$-biregular graph. Then $\sigma_1 = \sqrt{d_r d_c}$ is the largest singular value of $A$, with corresponding row and column singular vectors $u_1 = \mathbf{1}_{n_r}/\sqrt{n_r}$ and $v_1 = \mathbf{1}_{n_c}/\sqrt{n_c}$. Let $\sigma_2$ denote the second largest singular value of $A$. Then

$$\|A - \sigma_1 u_1 v_1^\top\|_S = \sigma_2,$$

where $\|\cdot\|_S$ denotes the spectral norm of a matrix (i.e., its largest singular value). Using the formulas for $u_1$ and $v_1$ and rescaling shows that

$$\left\| A - \sqrt{\frac{d_r d_c}{n_r n_c}}\, \mathbf{1}_{n_r} \mathbf{1}_{n_c}^\top \right\|_S = \sigma_2.$$

This formula can be expressed more compactly by defining the constant $p := m/(n_r n_c) = d_r/n_c = d_c/n_r = \sqrt{d_r d_c/(n_r n_c)}$, as

$$\|A - p\, \mathbf{1}_{n_r} \mathbf{1}_{n_c}^\top\|_S = \sigma_2,$$

where the various equalities follow from the fact that $m = n_r d_r = n_c d_c$. One can think of $p$ as the fraction of elements of the unknown matrix that are sampled. Since $E = A$, we see that

$$\left\| \frac{1}{p} E - \mathbf{1}_{n_r} \mathbf{1}_{n_c}^\top \right\|_S = \frac{\sigma_2}{p}.$$

Therefore

(17)

Now note that

$$\frac{\sigma_2}{p} = \frac{\sigma_2}{\sigma_1} \sqrt{n_r n_c}.$$

Therefore, the smaller $\sigma_2$ is compared to $\sigma_1$, the better the approximation is between a scaled version of the sampled matrix and the unknown matrix $X$. (Note that $n_r, n_c$ are the dimensions of the unknown matrix and are therefore fixed.) Now, a Ramanujan graph is one for which this ratio is as small as possible. It is shown in [14] that, if $d_r, d_c$ are kept fixed while $n_r, n_c$ are increased, subject of course to the constraint that $n_r d_r = n_c d_c$, then (14) gives the best possible upper bound on $\sigma_2$.
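The identity $\|(1/p)A - \mathbf{1}\mathbf{1}^\top\|_S = \sigma_2/p$ derived above can be confirmed numerically; the sketch below (our code) does so for any biregular biadjacency matrix.

```python
import numpy as np

def scaled_mask_gap(A: np.ndarray) -> tuple[float, float]:
    """Return (||A/p - ones||_S, sigma_2 / p); the two values coincide."""
    n_r, n_c = A.shape
    p = A.sum() / (n_r * n_c)                 # sampling fraction
    s = np.linalg.svd(A, compute_uv=False)
    gap = np.linalg.norm(A / p - np.ones((n_r, n_c)), 2)  # spectral norm
    return float(gap), float(s[1] / p)
```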

3.2 Error bounds using deterministic sampling

Theorem 5 below extends [7, Theorem 4.1] to rectangular matrices. It provides an upper bound on the error between a scaled version of the sampled matrix and the true matrix $X$. Note that there is no optimization involved in applying this bound.

Theorem 5.

Suppose the sampling set $\Omega$ comes from a $(d_r, d_c)$-biregular bipartite graph, and $\sigma_2$ represents the magnitude of the second largest singular value of its biadjacency matrix (of course $\sigma_1 = \sqrt{d_r d_c}$ is the largest singular value). Suppose $X$ is a matrix of rank $r$ or less, and let $\mu_0(X)$ denote its coherence as defined in (6). Then

(18)

where $\|\cdot\|_S$ denotes the spectral norm (largest singular value) of a matrix.

Remark: Observe that the bound in (18) is a product of two terms: one that depends on the measurement matrix $E$, and one that depends on the unknown matrix $X$.

Corollary 1.

Suppose the sampling set $\Omega$ comes from a $(d_r, d_c)$-biregular asymmetric Ramanujan graph. Then

(19)

Theorem 6 extends [8, Theorem 2] to rectangular matrices. Even for square matrices, the bound in Theorem 6 is smaller by a factor of two than that in [8, Theorem 2], stated here as Theorem 4. Note that, in contrast with Theorem 5, the bound in Theorem 6 involves neither the coherence of the unknown matrix nor its rank. Moreover, the bound is on the Frobenius norm of the difference, and is therefore an “element by element” bound, unlike in Theorem 5.

Theorem 6.

Suppose the sampling set $\Omega$ comes from a $(d_r, d_c)$-biregular bipartite graph, and let $\sigma_2$ denote the second largest singular value of its biadjacency matrix. (Note that biregularity implies that the largest singular value is $\sqrt{d_r d_c}$.) Suppose $\hat{X}$ is a solution of (4). Then

(20)

where $\|\cdot\|_F$ is the Frobenius norm, $\|\cdot\|_{\max}$ is the max-norm, and $K_G$ is Grothendieck’s constant.

Corollary 2.

Suppose the sampling set $\Omega$ comes from a $(d_r, d_c)$-biregular asymmetric Ramanujan graph. Then

(21)

The next theorem presents a sufficient condition under which nuclear norm minimization as in (2), with a sampling matrix derived from a Ramanujan bigraph, leads to exact recovery of the unknown matrix. Note that [7, Theorem 4.2] claims to provide such a sufficient condition for square matrices. However, in the opinion of the authors, there is a gap in the proof, as discussed in the Conclusions section. Therefore Theorem 7 can be thought of as the first to prove exact recovery using nuclear norm minimization and a deterministic sampling matrix.

Theorem 7.

Suppose $X$ is a matrix of rank $r$ or less, and satisfies the incoherence assumptions (A1) and (A2) with constants $\mu_0$ and $\mu$. (Note that, unlike [5, 6], we do not require the constant $\mu_1$.) Suppose $E$ is the biadjacency matrix of a $(d_r, d_c)$-biregular graph, and let $\sigma_2$ denote the second largest singular value of $E$. Define

(22)

and suppose that

(23)
(24)

Then $X$ is the unique minimum of (2).

3.3 Construction of Asymmetric Ramanujan Graphs

There are very few explicit constructions of Ramanujan graphs. The first two explicit constructions are given in [15, 16] for certain choices of the degree and the number of vertices. Two recent papers [17, 18] prove the existence of bipartite Ramanujan graphs of all degrees and all vertex sizes, but do not give readily computable procedures. The paper [19] gives a supposedly polynomial-time algorithm for implementing the recipes in these papers, but it is quite opaque and does not contain any pseudocode. Explicit constructions of rectangular Ramanujan graphs are even rarer. The only constructions of which the authors are aware are in [9, 10], and these are very abstract and not explicitly computable.

We now present our construction of asymmetric Ramanujan graphs with $q^2$ vertices on one side and $lq$ vertices on the other side, for every prime number $q$ and every integer $l$ between $2$ and $q$. As mentioned above, we believe this is the first explicit construction of an asymmetric Ramanujan graph. Note that when $l = q$ the bipartite graph is balanced, and our construction gives another class of balanced Ramanujan bipartite graphs. This construction is inspired by so-called array codes from LDPC (low-density parity-check) codes [20, 21]. The construction is as follows: Let $q$ be a prime number, and let $P \in \{0,1\}^{q \times q}$ denote the “right shift” permutation matrix. Thus $P_{i, i+1} = 1$ and the remaining elements are all zero, with $i + 1$ interpreted modulo $q$. Define

$$B = \begin{bmatrix} I & I & \cdots & I \\ I & P & \cdots & P^{l-1} \\ \vdots & \vdots & & \vdots \\ I & P^{q-1} & \cdots & P^{(q-1)(l-1)} \end{bmatrix} \in \{0,1\}^{q^2 \times lq}. \tag{25}$$

It is easily seen that the bipartite graph defined by the biadjacency matrix $B$ has $q^2$ left vertices, $lq$ right vertices, left degree $l$, and right degree $q$. Therefore the largest singular value of $B$ is $\sqrt{lq}$.

Theorem 8.

The matrix $B$ has a singular value of $\sqrt{lq}$ with multiplicity $1$, $l(q-1)$ singular values of $\sqrt{q}$, and $l-1$ singular values of $0$. Therefore, whenever $2 \leq l \leq q$, $B$ defines a Ramanujan bigraph, since $\sigma_2 = \sqrt{q} \leq \sqrt{l-1} + \sqrt{q-1}$. With $l = q$, $B$ defines a balanced Ramanujan bipartite graph.
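The construction and Theorem 8 can be verified numerically. In the sketch below (ours; the block layout implements (25) as reconstructed above, with block $(j, i)$ equal to $P^{ji}$), the printed distinct singular values for $q = 5$, $l = 3$ are $\sqrt{15}$, $\sqrt{5}$, and $0$, as the theorem predicts.

```python
import numpy as np

def ramanujan_bigraph(q: int, l: int) -> np.ndarray:
    """Biadjacency matrix B of (25): q^2 x lq, left degree l, right degree q."""
    P = np.roll(np.eye(q, dtype=int), 1, axis=1)   # right-shift permutation
    return np.block([[np.linalg.matrix_power(P, j * i) for i in range(l)]
                     for j in range(q)])

B = ramanujan_bigraph(q=5, l=3)
s = np.round(np.linalg.svd(B, compute_uv=False), 6)
# One value sqrt(l*q), l*(q-1) values sqrt(q), and l-1 zeros.
print(sorted(set(s.tolist()), reverse=True))
```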

4 Proofs

In this section we give the proofs of the various theorems stated in the previous section. We begin with a couple of preliminary results that are used repeatedly in the sequel. Throughout we use the notation that if $A$ is a matrix, then $a^i$ and $a_j$ denote the $i$-th row and the $j$-th column of $A$ respectively. The $(i,j)$-th element of $A$ is denoted by $a_{ij}$.

4.1 Some Preliminary Results

Theorem 9.

Suppose , , and . Suppose further that . Then

(26)
Proof.

The proof follows readily by expanding the triple product. Note that

Therefore

as desired. ∎

Theorem 10.

Suppose are as in Theorem 9. Suppose further that

(27)

Then

(28)
Proof.

Recall that, for any matrix , we have that

In particular

where the last step follows from Theorem 9. Now fix such that . Then

Therefore (28) is proved once it is established that, whenever , it follows that

(29)

To prove (29), apply Schwarz’ inequality to deduce that

(30)

Now

By entirely similar reasoning, we get

Substituting these two bounds into (30) establishes (29) and completes the proof. ∎

4.2 Proof of Theorem 5

Proof.

As before, define

and recall that

Now suppose is a singular value decomposition of , so that . Define . Then . Moreover

because , and the definition of the coherence . Similarly

Now apply Theorem 10 with

and note that . Then (28) becomes

as desired. ∎

4.3 Proof of Theorem 6

The proof of Theorem 6 is based on the following extension of the expander mixing lemma from [22] to rectangular expander graphs, which might be of independent interest.

Lemma 1.

Let $A$ be the biadjacency matrix of an asymmetric $(d_r, d_c)$-biregular graph with vertex sets of sizes $n_r$ and $n_c$, so that $\sigma_1 = \sqrt{d_r d_c}$ is the largest singular value of $A$. Let $\sigma_2$ denote the second largest singular value of $A$. Then for all $S \subseteq [n_r]$ and $T \subseteq [n_c]$, we have:

$$\left| e(S,T) - \frac{|E|}{n_r n_c} |S| |T| \right| \leq \sigma_2 \sqrt{|S| |T| \left(1 - \frac{|S|}{n_r}\right) \left(1 - \frac{|T|}{n_c}\right)} \leq \sigma_2 \sqrt{|S| |T|}, \tag{31}$$

where $e(S,T)$ is the number of edges between the two vertex sets $S$ and $T$, and $|E| = n_r d_r = n_c d_c$ is the total number of edges in the graph.

Remark: First we explain why this result is called the “expander mixing lemma.” Note that $|S|/n_r$ is the fraction of rows that are in $S$, while $|T|/n_c$ is the fraction of columns that are in $T$. If the total number of edges $|E|$ were to be uniformly distributed, then the term on the left side of (31) would equal zero. Therefore the bound (31) estimates the extent to which the distribution of edges deviates from being uniform.

The above result extends [8, Theorem 8], which is adapted from [22, Lemma 2.5] to regular Ramanujan graphs. Moreover, the bound given here is tighter, because of the presence of the two square-root terms on the right side. As $|S|$ and $|T|$ become larger, these square-root terms tend to zero. No such terms are present in [22, Lemma 2.5].

Proof.

Let $\mathbf{1}_S$ and $\mathbf{1}_T$ denote the characteristic vectors of the sets $S$ and $T$ respectively. Then

Write , and note that, due to the biregularity of , we have that , , and . Next, write and , where belong to the row null space and column null space of respectively. Note that , and similarly . Then

Rearranging the above gives

(32)

Next, by Schwarz’ inequality, it follows that

Now note that

and similarly

This implies that

(33)

Substituting this into (32) and dividing through gives the first bound in (31); the second bound follows since each square-root factor is at most $1$. ∎
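Lemma 1 (in the reconstructed form above) is also easy to sanity-check numerically, e.g., on the bigraph of (25) built by the `ramanujan_bigraph` sketch given after Theorem 8:

```python
import numpy as np

rng = np.random.default_rng(2)
A = ramanujan_bigraph(q=5, l=3)
n_r, n_c = A.shape
sigma2 = np.linalg.svd(A, compute_uv=False)[1]
edges = A.sum()                                   # |E| = n_r * d_r = n_c * d_c

S = rng.choice(n_r, size=8, replace=False)        # random row subset
T = rng.choice(n_c, size=6, replace=False)        # random column subset
e_ST = A[np.ix_(S, T)].sum()                      # edges between S and T
lhs = abs(e_ST - edges / (n_r * n_c) * len(S) * len(T))
print(lhs <= sigma2 * np.sqrt(len(S) * len(T)))   # True, per (31)
```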

Theorem 11.

Suppose $\Omega \subseteq [n_r] \times [n_c]$ is the edge set of an asymmetric biregular graph. Then

(34)

where $K_G$ is Grothendieck’s constant.

Proof.

Let $W$ be a rank-one sign matrix with $\pm 1$ entries, and define its corresponding binary matrix $\bar{W} = \frac{1}{2}(W + J)$, where $J$ is the matrix of all ones. Because $W$ is a rank-one sign matrix, it can be expressed as $W = xy^\top$, where $x \in \{\pm 1\}^{n_r}$ and $y \in \{\pm 1\}^{n_c}$. Define

Let $\mathbf{1}_S$ represent the characteristic vector of a set $S$. Let $S = \{i : x_i = 1\}$ and $T = \{j : y_j = 1\}$, and let $S^c$ and $T^c$ denote the complements of $S$ and $T$ in the sets $[n_r]$ and $[n_c]$ respectively. Then