 # Graph Fourier Transform Based on ℓ_1 Norm Variation Minimization

The definition of the graph Fourier transform is a fundamental issue in graph signal processing. The conventional graph Fourier transform is defined through the eigenvectors of the graph Laplacian matrix, which minimize the ℓ_2 norm signal variation. However, the computation of Laplacian eigenvectors is expensive when the graph is large. In this paper, we propose an alternative definition of the graph Fourier transform based on ℓ_1 norm variation minimization. We obtain a necessary condition satisfied by the ℓ_1 Fourier basis, and provide a fast greedy algorithm to approximate the ℓ_1 Fourier basis. Numerical experiments show the effectiveness of the greedy algorithm. Moreover, the Fourier transform under the greedy basis exhibits a rate of decay similar to that under the Laplacian basis for simulated or real signals.


## 1 Introduction

### 1.1 Graph Fourier transform

In many applications such as social, transportation, sensor and neural networks, high-dimensional data are usually defined on the vertices of weighted graphs Shuman2013. To process signals on graphs, traditional theories and methods established on the Euclidean domain need to be extended to the graph setting. Many works in this area have appeared in recent years, including spectral graph theory Chung1997, Fourier transform for directed graphs Sardellitti2017; Sandryhaila2013, short-time Fourier transform on graphs Shuman2016, wavelets on graphs Gavish2010; Hammond2011; XChen2014; Dong2017, graph sampling theory Chen2015, uncertainty principle Agaskar2013, etc.

The definition of the graph Fourier transform plays a central role in graph signal processing. By Fourier transform, a graph signal is decomposed into different spectral components and thus can be analyzed from the Fourier domain. The popular definition of graph Fourier transform is through the eigenvectors of the graph Laplacian matrix. Although this definition is adopted by many researchers, it has some limitations. First, the definition only applies to undirected graphs. Second, the computation of the Laplacian eigenvectors is rather expensive when the graph is large. Therefore, it is tempting to find an alternative definition of graph Fourier transform without these disadvantages.

One basic requirement for the Fourier basis is that the basis vectors should represent a range of different oscillating frequencies. For a time-domain signal, the classical Fourier transform decomposes it into different frequency components. Likewise, in the graph setting, one expects the graph Fourier basis to have a similar property, i.e., the basis vectors represent different oscillating frequencies. Generally speaking, the magnitude of oscillation of a signal can be measured by its variation. In fact, the $\ell_2$ norm variation of the Laplacian eigenvector $u_k$ is characterized by the corresponding eigenvalue $\lambda_k$. When the eigenvalues are arranged in ascending order, the variation of the eigenvector $u_k$ increases with $k$, thus representing a range of frequencies from low to high. Moreover, the eigenvector $u_k$ minimizes the $\ell_2$ norm variation in the subspace orthogonal to the span of the previous eigenvectors $u_1, \dots, u_{k-1}$.

Recently, Sardellitti et al. proposed a definition of the directed graph Fourier basis as the set of orthogonal vectors minimizing the graph directed variation, and proposed two algorithms (SOC and PAMAL) to solve the related optimization problem Sardellitti2017. However, a theoretical analysis of the proposed Fourier basis is lacking, and the computational complexity of the proposed algorithms is rather high. Slightly different from Sardellitti's approach, we propose a definition of the Fourier basis based on iteratively solving a sequence of $\ell_1$ norm variation minimization problems. We rigorously prove a necessary condition satisfied by the proposed Fourier basis. Further, we provide a fast greedy algorithm to approximately construct the Fourier basis. Numerical experiments show that the algorithm is effective, and that the Fourier coefficients under the greedy basis and under the Laplacian basis have nearly the same rate of decay for simulated or real signals.

The rest of the paper is organized as follows. In Section 2, we discuss the relation between the graph Fourier basis and signal variation, and propose the definition of the $\ell_1$ Fourier basis based on $\ell_1$ norm variation minimization. In Section 3, we prove a necessary condition satisfied by the $\ell_1$ Fourier basis, showing that the components of the $k$th basis vector $u_k$ take at most $k$ different values. In Section 4, we provide a greedy algorithm to construct an approximate $\ell_1$ basis. In Section 5, we present some numerical results. Section 6 concludes the paper.

### 1.2 Notations

In this paper we use the following notations.

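For a matrix $A \in \mathbb{R}^{m \times n}$, $\operatorname{span} A$ denotes its column space, i.e., $\operatorname{span} A := \{Ax \mid x \in \mathbb{R}^n\}$; and $\ker A$ denotes its kernel, i.e., $\ker A := \{x \in \mathbb{R}^n \mid Ax = 0\}$.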

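For a vector $x \in \mathbb{R}^n$, $\|x\|$ denotes its Euclidean norm, i.e., $\|x\| := \big(\sum_{i=1}^{n} x_i^2\big)^{1/2}$. For a matrix $A \in \mathbb{R}^{m \times n}$, $\|A\|$ denotes its operator norm, i.e., $\|A\| := \max_{\|x\| = 1} \|Ax\|$. Denote by $B(x, \varepsilon) := \{y \mid \|y - x\| < \varepsilon\}$ the open ball centered at $x$ with radius $\varepsilon$.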

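The cardinality of a set $A$ is denoted by $|A|$. Let $N$ be a positive integer, and $V := \{1, 2, \dots, N\}$. For any $A \subseteq V$, we use $\mathbf{1}_A \in \mathbb{R}^N$ to denote the indicator vector of $A$, i.e., $(\mathbf{1}_A)_i = 1$ if $i \in A$ and $(\mathbf{1}_A)_i = 0$ otherwise. $\mathbf{1}_V$ is also written as $\mathbf{1}$.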

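For $W = (w_{ij}) \in \mathbb{R}^{N \times N}$ and subsets $A, B \subseteq V$, $W(A, B)$ is defined as $W(A, B) := \sum_{i \in A} \sum_{j \in B} w_{ij}$.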

## 2 Graph Fourier basis and signal variation

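In this section, we shall derive the relationship between the graph Fourier basis and signal variation. Let us begin with the basic terminology of graph signal processing. Let $G = (V, W)$ be a connected, undirected, and weighted graph, where $V = \{1, 2, \dots, N\}$ is the vertices set and the weight matrix $W = (w_{ij}) \in \mathbb{R}^{N \times N}$ satisfies $w_{ij} = w_{ji}$ and $w_{ij} \ge 0$. The degree of a vertex $i$ is defined as $d_i := \sum_{j=1}^{N} w_{ij}$, and the degree matrix is $D := \operatorname{diag}(d_1, \dots, d_N)$. The combinatorial Laplacian matrix is defined as $L := D - W$. Since $L$ is symmetric and positive semi-definite, it has eigenvalues $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ and a corresponding set of orthonormal eigenvectors $\{u_1, \dots, u_N\}$. We call $U := [u_1, \dots, u_N]$ the Laplacian basis of $G$. A graph signal $x$ is a real-valued function defined on $V$, and can be regarded as a vector in $\mathbb{R}^N$. The Fourier transform of $x$ under the Laplacian basis is defined as $\hat{x} := U^\top x$.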

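Note that the $\ell_2$ norm variation of the Laplacian eigenvector $u_k$ is increasing with $k$. To see this, let $x \in \mathbb{R}^N$; then it can be proved that

$$x^\top L x = \sum_{1 \le i < j \le N} w_{ij} (x_i - x_j)^2. \tag{1}$$

That means the quadratic form $x^\top L x$ exactly measures the $\ell_2$ norm variation of $x$. Since $u_k^\top L u_k = \lambda_k$ and $\lambda_1 \le \cdots \le \lambda_N$, we have

$$u_1^\top L u_1 \le \cdots \le u_N^\top L u_N,$$

i.e., the $\ell_2$ norm variation of $u_k$ is increasing with $k$. In other words, the Laplacian basis vectors represent a range of frequencies from low to high.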
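To make identity (1) concrete, here is a minimal sketch in pure Python (standard library only) that evaluates both sides for a small weighted path graph. The function names and the example weights are hypothetical and chosen only for illustration; vertices are indexed from 0 rather than 1.

```python
# Hypothetical helper names; pure Python, vertices indexed from 0.

def laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix (list of lists)."""
    n = len(W)
    degrees = [sum(row) for row in W]  # d_i = sum_j w_ij
    return [[(degrees[i] if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(L, x):
    """Left-hand side of (1): x^T L x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def l2_variation(W, x):
    """Right-hand side of (1): sum over i < j of w_ij (x_i - x_j)^2."""
    n = len(x)
    return sum(W[i][j] * (x[i] - x[j]) ** 2
               for i in range(n) for j in range(i + 1, n))


# Illustrative weighted path 0 - 1 - 2 - 3 with edge weights 1, 2, 3.
W = [[0, 1, 0, 0],
     [1, 0, 2, 0],
     [0, 2, 0, 3],
     [0, 0, 3, 0]]
x = [1.0, -0.5, 2.0, 0.0]
print(quadratic_form(laplacian(W), x))  # 26.75
print(l2_variation(W, x))               # 26.75
```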

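Furthermore, the eigenvector $u_k$ minimizes the $\ell_2$ norm variation in the subspace orthogonal to the span of the previous eigenvectors, i.e.,

$$u_k = \operatorname*{argmin}_{x \in \mathbb{R}^N} x^\top L x \quad \text{s. t.} \quad [u_1, \dots, u_{k-1}]^\top x = 0,\ \|x\| = 1. \tag{2}$$

In fact, let $x \in \mathbb{R}^N$ satisfy $[u_1, \dots, u_{k-1}]^\top x = 0$ and $\|x\| = 1$. Let the Fourier transform of $x$ be $\hat{x} = U^\top x$. Then $\hat{x}_1 = \cdots = \hat{x}_{k-1} = 0$, $\sum_{j=k}^{N} |\hat{x}_j|^2 = \|x\|^2 = 1$ since $U$ is orthogonal, and $x$ can be expressed as $x = U \hat{x}$, hence

$$x^\top L x = \hat{x}^\top U^\top L U \hat{x} = \sum_{j=k}^{N} \lambda_j |\hat{x}_j|^2 \ge \lambda_k \sum_{j=k}^{N} |\hat{x}_j|^2 = \lambda_k = u_k^\top L u_k.$$

Therefore the eigenvector $u_k$ solves the $\ell_2$ norm variation minimization problem (2) for $k = 1, \dots, N$.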

It is natural to consider the more general $\ell_p$ norm variation. In this paper, we restrict ourselves to the $\ell_1$ norm variation defined as follows:

$$S(x) := \sum_{1 \le i < j \le N} w_{ij} |x_i - x_j|. \tag{3}$$

Just as the Laplacian basis minimizes the $\ell_2$ norm variation, we define the $\ell_1$ Fourier basis as the solution of the $\ell_1$ norm variation minimization problem.
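A direct transcription of the $\ell_1$ norm variation (3), again as a hypothetical pure-Python sketch on an illustrative weighted path graph. Unlike the quadratic form in (1), $S$ is positively homogeneous of degree one, which the last line illustrates.

```python
# Hypothetical helper name; pure Python, vertices indexed from 0.

def l1_variation(W, x):
    """S(x) in (3): sum over i < j of w_ij |x_i - x_j|."""
    n = len(x)
    return sum(W[i][j] * abs(x[i] - x[j])
               for i in range(n) for j in range(i + 1, n))


# Illustrative weighted path 0 - 1 - 2 - 3 with edge weights 1, 2, 3.
W = [[0, 1, 0, 0],
     [1, 0, 2, 0],
     [0, 2, 0, 3],
     [0, 0, 3, 0]]
print(l1_variation(W, [1.0, -0.5, 2.0, 0.0]))  # 12.5
print(l1_variation(W, [0.5, 0.5, 0.5, 0.5]))   # 0.0, a constant signal has no variation
print(l1_variation(W, [2.0, -1.0, 4.0, 0.0]))  # 25.0, doubling x doubles S(x)
```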

###### Definition 1.

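Let $u_1 := \frac{1}{\sqrt{N}} \mathbf{1}$. If a sequence of vectors $u_2, \dots, u_N$ solves the $\ell_1$ norm variation minimization problem

$$u_k = \operatorname*{argmin}_{x \in \mathbb{R}^N} S(x) \quad \text{s. t.} \quad [u_1, \dots, u_{k-1}]^\top x = 0,\ \|x\| = 1 \tag{4}$$

for $k = 2, \dots, N$, then we say the orthogonal matrix $U := [u_1, \dots, u_N]$ constitutes an $\ell_1$ Fourier basis, or simply an $\ell_1$ basis, of the graph $G$.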

Remarks: The above definition of the $\ell_1$ Fourier basis can be extended to directed graphs. All one needs is to replace $S(x)$ in the minimization problem (4) by a directed version

$$\tilde{S}(x) := \sum_{1 \le i, j \le N} w_{ij} (x_i - x_j)^+, \tag{5}$$

where $(a)^+ := \max\{a, 0\}$ (more details can be found in Sardellitti2017). Then one can similarly define the directed $\ell_1$ Fourier basis as the solution of the corresponding problem. For simplicity, we only consider undirected graphs in this paper. Most results can be generalized to the directed case without essential difficulties.
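For intuition about the directed variation (5), the sketch below (pure Python, hypothetical names) evaluates $\tilde{S}$ and checks that it reduces to $S$ when $W$ is symmetric, since then $w_{ij}(x_i - x_j)^+ + w_{ji}(x_j - x_i)^+ = w_{ij}|x_i - x_j|$. The non-symmetric weight matrix is an arbitrary illustrative example, not taken from Sardellitti2017.

```python
# Hypothetical helper names; pure Python, vertices indexed from 0.

def directed_variation(W, x):
    """S~(x) in (5): sum over all i, j of w_ij * max(x_i - x_j, 0)."""
    n = len(x)
    return sum(W[i][j] * max(x[i] - x[j], 0.0)
               for i in range(n) for j in range(n))


def l1_variation(W, x):
    """S(x) in (3): sum over i < j of w_ij |x_i - x_j|."""
    n = len(x)
    return sum(W[i][j] * abs(x[i] - x[j])
               for i in range(n) for j in range(i + 1, n))


# Symmetric weights: the two variations coincide.
W_sym = [[0, 1, 0],
         [1, 0, 2],
         [0, 2, 0]]
x = [1.0, 3.0, 0.0]
print(directed_variation(W_sym, x), l1_variation(W_sym, x))  # 8.0 8.0

# Illustrative non-symmetric weights (w_01 = w_12 = 1, all others 0):
# the directed variation changes when the signal is negated.
W_dir = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
print(directed_variation(W_dir, [2.0, 1.0, 0.0]))    # 2.0
print(directed_variation(W_dir, [-2.0, -1.0, 0.0]))  # 0.0
```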

## 3 Necessary condition of ℓ1 Fourier basis

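In the previous section, the $\ell_1$ Fourier basis vectors are defined as the solutions of a sequence of minimization problems (4). We rewrite problem (4) in a concise form:

$$P_U := \min_{x \in \mathbb{R}^N} S(x) \quad \text{s. t.} \quad U^\top x = 0,\ \|x\| = 1, \tag{6}$$

where $U \in \mathbb{R}^{N \times (k-1)}$ is a matrix with its first column being $\frac{1}{\sqrt{N}} \mathbf{1}$, $U^\top U = I$, and $2 \le k \le N$. With this notation, problem (4) can be referred to as $P_{[u_1, \dots, u_{k-1}]}$. Now our goal is to solve problem $P_U$.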

First, let us recall some basic definitions of optimization theory. Denote the feasible region of problem $P_U$ by $X_U$, i.e.,

$$X_U := \{x \in \mathbb{R}^N \mid U^\top x = 0,\ \|x\| = 1\}. \tag{7}$$

A point $x \in X_U$ is called a local minimum of problem $P_U$ if there exists $\varepsilon > 0$ such that $S(x) \le S(x')$ for any $x' \in X_U \cap B(x, \varepsilon)$. If $S(x) \le S(x')$ holds for any $x' \in X_U$, then $x$ is called a global minimum of problem $P_U$. Obviously a global minimum is necessarily a local minimum. We denote the set of all local minima of problem $P_U$ by $X_U^{**}$.

Due to the sphere constraint $\|x\| = 1$, problem $P_U$ is not a convex optimization problem. As far as we know, there are no general results about the global minimum of such problems, and in most cases one can only approach a local minimum by iterative algorithms Bresson2012; Lai2014. As the main result of this section, we shall prove a necessary condition satisfied by local minima (Theorem 4). The key ingredient of the proof is the concept of piecewise representation, which is introduced as follows.

###### Definition 2.

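Suppose $x \in \mathbb{R}^N$. Let $\{a_1, \dots, a_m\} := \{x_1, \dots, x_N\}$ be the distinct values of the components of $x$, ordered so that $a_1 < a_2 < \cdots < a_m$, and let $A_i := \{j \in V \mid x_j = a_i\}$, $i = 1, \dots, m$. Then $x$ can be rewritten as $x = \sum_{i=1}^{m} a_i \mathbf{1}_{A_i}$, where $A_1, \dots, A_m$ are nonempty, pairwise disjoint, and $\bigcup_{i=1}^{m} A_i = V$. Let $M := [\mathbf{1}_{A_1}, \dots, \mathbf{1}_{A_m}]$, and $a := (a_1, \dots, a_m)^\top$. Then $x = Ma$, which is called the piecewise representation of $x$. We also call $M$ the partition matrix of $x$, denoted by $\phi(x)$.

It is easy to see that any vector in $\mathbb{R}^N$ has a unique piecewise representation. Under the piecewise representation $x = Ma$, the $\ell_1$ norm variation can be simplified to a linear form in a local neighborhood of $a$.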
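The piecewise representation of Definition 2 can be computed by sorting the distinct values of $x$. The following is a minimal pure-Python sketch with hypothetical function names; vertex indices are 0-based, and grouping uses exact equality of values, which is an assumption (floating-point data would need a tolerance).

```python
# Hypothetical helper names; pure Python, vertices indexed from 0.

def piecewise_representation(x):
    """Return (groups, a): a lists the distinct values a_1 < ... < a_m of x,
    and groups[i] is A_i = {j : x_j = a_i}.  Exact equality is used to group
    values (an assumption; noisy floating-point data would need a tolerance)."""
    a = sorted(set(x))
    groups = [[j for j, xj in enumerate(x) if xj == ai] for ai in a]
    return groups, a


def partition_matrix(groups, n):
    """phi(x): the n-by-m 0/1 matrix whose i-th column is the indicator of A_i."""
    M = [[0] * len(groups) for _ in range(n)]
    for i, A in enumerate(groups):
        for j in A:
            M[j][i] = 1
    return M


x = [0.5, -1.0, 0.5, 2.0]
groups, a = piecewise_representation(x)
M = partition_matrix(groups, len(x))
print(groups, a)  # [[1], [0, 2], [3]] [-1.0, 0.5, 2.0]
# Recover x = M a row by row.
print([sum(M[j][i] * a[i] for i in range(len(a))) for j in range(len(x))])
# [0.5, -1.0, 0.5, 2.0]
```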

###### Lemma 3.

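Suppose $x \in \mathbb{R}^N$, $M = \phi(x) \in \mathbb{R}^{N \times m}$, and $x = Ma$. Then there exists $\varepsilon > 0$ such that

$$S(Ma') = f^\top a', \quad \forall a' \in B(a, \varepsilon), \tag{8}$$

where $f \in \mathbb{R}^m$ is defined by

$$f_i := \sum_{j=1}^{i-1} W(A_i, A_j) - \sum_{j=i+1}^{m} W(A_i, A_j), \quad i = 1, \dots, m. \tag{9}$$
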
###### Proof.

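Suppose $x = Ma$ with $a_1 < \cdots < a_m$, and let $a' \in \mathbb{R}^m$, $x' := Ma'$. When $\|a' - a\|$ is sufficiently small (for $m \ge 2$ it suffices that $2\|a' - a\| < \min_{1 \le i < m} (a_{i+1} - a_i)$), we have $a'_1 < \cdots < a'_m$, i.e., there exists $\varepsilon > 0$ such that for all $a' \in B(a, \varepsilon)$, $x' = Ma'$ is a piecewise representation. Therefore

$$\begin{aligned}
S(x') &= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} |x'_i - x'_j| \\
&= \frac{1}{2} \sum_{i=1}^{m} \sum_{p \in A_i} \sum_{j=1}^{m} \sum_{q \in A_j} w_{pq} |x'_p - x'_q| \\
&= \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} |a'_i - a'_j| \sum_{p \in A_i} \sum_{q \in A_j} w_{pq} \\
&= \sum_{1 \le i < j \le m} (a'_j - a'_i)\, W(A_i, A_j) \\
&= f^\top a'.
\end{aligned}$$

∎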
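Lemma 3 can be checked numerically: build $f$ from (9) and compare $S(Ma')$ with $f^\top a'$ for a small perturbation $a'$ of $a$ that preserves the ordering of the values. This is a hypothetical pure-Python sketch on an illustrative path graph, not code from the paper.

```python
# Hypothetical helper names; pure Python, vertices indexed from 0.

def l1_variation(W, x):
    """S(x) in (3)."""
    n = len(x)
    return sum(W[i][j] * abs(x[i] - x[j])
               for i in range(n) for j in range(i + 1, n))


def cut_weight(W, A, B):
    """W(A, B) = sum over p in A, q in B of w_pq."""
    return sum(W[p][q] for p in A for q in B)


def lemma3_coefficients(W, groups):
    """f in (9); groups must be ordered so that a_1 < a_2 < ... < a_m."""
    m = len(groups)
    return [sum(cut_weight(W, groups[i], groups[j]) for j in range(i))
            - sum(cut_weight(W, groups[i], groups[j]) for j in range(i + 1, m))
            for i in range(m)]


def from_piecewise(groups, values):
    """x' = M a': every vertex in groups[i] receives the value values[i]."""
    x = [0.0] * sum(len(A) for A in groups)
    for A, v in zip(groups, values):
        for j in A:
            x[j] = v
    return x


# Illustrative weighted path 0 - 1 - 2 - 3 with edge weights 1, 2, 3, and the
# piecewise representation of x = [0.5, -1.0, 0.5, 2.0].
W = [[0, 1, 0, 0],
     [1, 0, 2, 0],
     [0, 2, 0, 3],
     [0, 0, 3, 0]]
groups = [[1], [0, 2], [3]]
a = [-1.0, 0.5, 2.0]
f = lemma3_coefficients(W, groups)
print(f)  # [-3, 0, 3]
print(l1_variation(W, from_piecewise(groups, a)))  # 9.0, equal to f^T a

# A small perturbation keeps a'_1 < a'_2 < a'_3, so S(M a') = f^T a'
# (both values below are 9.0 up to rounding).
a_pert = [-1.1, 0.6, 1.9]
print(l1_variation(W, from_piecewise(groups, a_pert)),
      sum(fi * ai for fi, ai in zip(f, a_pert)))
```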

###### Theorem 4.

If $x \in X_U^{**}$ and $M = \phi(x)$, then

$$\dim \ker(U^\top M) = 1. \tag{10}$$

###### Proof.

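The main idea is to transform problem $P_U$ into an easier one by using Lemma 3. Suppose $x \in X_U^{**}$ and $x = Ma$ is its piecewise representation with $M \in \mathbb{R}^{N \times m}$. By assumption of problem $P_U$, we have $\mathbf{1}^\top x = 0$ and $\|x\| = 1$, therefore $x$ is a non-constant signal, i.e., $m \ge 2$. Since $x$ is a local minimum of $P_U$, there exists $\varepsilon_1 > 0$ such that

$$x = \operatorname*{argmin}_{x' \in \mathbb{R}^N} S(x') \quad \text{s. t.} \quad U^\top x' = 0,\ \|x'\| = 1,\ x' \in B(x, \varepsilon_1). \tag{11}$$

By Lemma 3, there exist $\varepsilon_2 > 0$ and $f \in \mathbb{R}^m$ such that $S(Ma') = f^\top a'$ for all $a' \in B(a, \varepsilon_2)$. Let $\varepsilon := \min\{\varepsilon_1 / \|M\|, \varepsilon_2\}$; then $a' \in B(a, \varepsilon)$ implies $Ma' \in B(x, \varepsilon_1)$ and $S(Ma') = f^\top a'$. Let $\Lambda := M^\top M = \operatorname{diag}(|A_1|, \dots, |A_m|)$; then $\|Ma'\| = 1$ implies $a'^\top \Lambda a' = 1$, and $U^\top x' = 0$ with $x' = Ma'$ implies $U^\top M a' = 0$. Therefore

$$a = \operatorname*{argmin}_{a' \in \mathbb{R}^m} f^\top a' \quad \text{s. t.} \quad U^\top M a' = 0,\ a'^\top \Lambda a' = 1,\ a' \in B(a, \varepsilon). \tag{12}$$

Suppose $l := \dim \ker(U^\top M)$, and $\{z_1, \dots, z_l\}$ is an orthonormal basis of $\ker(U^\top M)$. Define $Z := [z_1, \dots, z_l]$, $g := Z^\top f$, $Q := Z^\top \Lambda Z$. Since $a \in \ker(U^\top M)$, we can write $a = Zc$ with $c := Z^\top a$, and $\|Zc' - Zc\| = \|c' - c\|$ for all $c' \in \mathbb{R}^l$. Then we have

$$c = \operatorname*{argmin}_{c' \in \mathbb{R}^l} g^\top c' \quad \text{s. t.} \quad c'^\top Q c' = 1,\ c' \in B(c, \varepsilon). \tag{13}$$

We next prove that problem (13) has a minimum at $c$ only if $l = 1$. It is proved by contradiction.

Suppose $l \ge 2$. By the method of Lagrange multipliers, the minimum $c$ of problem (13) satisfies the equation

$$\nabla_c \left[ g^\top c + \mu (c^\top Q c - 1) \right] = g + 2\mu Q c = 0,$$

where $\mu \in \mathbb{R}$ is a Lagrange multiplier. Thus $g = -2\mu Q c$.

Since $l \ge 2$, there exists a nonzero vector $r \in \mathbb{R}^l$ such that $c^\top Q r = 0$; then $g^\top r = -2\mu\, c^\top Q r = 0$. Let $t > 0$ and $c'' := c + t r$. Then

$$c''^\top Q c'' = c^\top Q c + 2t\, c^\top Q r + t^2 r^\top Q r = 1 + t^2 r^\top Q r > 1,$$

since $Q$ is symmetric and positive definite ($\Lambda$ is a positive diagonal matrix and $Z$ has orthonormal columns).

Let $c' := c'' / \sqrt{c''^\top Q c''}$; then $c'^\top Q c' = 1$. Choose $t$ small enough to guarantee $c' \in B(c, \varepsilon)$. Since $g^\top c = f^\top a = S(x) > 0$ (the graph is connected and $x$ is non-constant), we have

$$g^\top c' = \frac{g^\top c + t\, g^\top r}{\sqrt{c''^\top Q c''}} = \frac{g^\top c}{\sqrt{c''^\top Q c''}} < g^\top c,$$

which contradicts $c$ being the minimum of problem (13). Hence $l = 1$. The proof is complete. ∎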

We remark that condition (10) is necessary but not sufficient. From condition (10), we deduce an upper bound on the number of distinct values taken by the components of a local minimum $x \in X_U^{**}$.

###### Corollary 5.

If $x \in X_U^{**}$, then the components of $x$ take at most $k$ different values.

###### Proof.

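Let $M := \phi(x) \in \mathbb{R}^{N \times m}$. By Theorem 4, $\dim \ker(U^\top M) = 1$. Since

$$k - 1 = \operatorname{rank}(U) \ge \operatorname{rank}(M^\top U) = \dim \operatorname{span}(M^\top U) = m - \dim \ker(U^\top M) = m - 1,$$

we have $m \le k$. By definition of the piecewise representation, $m$ is the number of different values of $x$'s components. The proof is complete. ∎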

Corollary 5 asserts that the $k$th basis vector $u_k$, as the global (hence local) minimum of problem $P_{[u_1, \dots, u_{k-1}]}$, is at most a $k$-valued signal. In particular, $u_1$ is a constant signal and $u_2$ is exactly a two-valued signal. Intuitively speaking, the larger $k$ is, the more values $u_k$ can take, and the more oscillation it may exhibit. Thus the basis vectors represent different oscillation frequencies from low to high, as expected.

Another implication of condition (10) is the finiteness of the set of local minima. Denote by $\mathcal{M}_U^*$ the set of all partition matrices of vectors in $X_U$ satisfying condition (10), i.e.,

$$\mathcal{M}_U^* := \{ M \mid x \in X_U,\ M = \phi(x),\ \dim \ker(U^\top M) = 1 \}. \tag{14}$$

For any vector $x \in X_U$, its partition matrix $\phi(x)$ has at most $N$ columns, and each entry of $\phi(x)$ is either $0$ or $1$. Therefore the set of all partition matrices of vectors in $X_U$ is a finite set, and so is its subset $\mathcal{M}_U^*$.

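By Theorem 4, if $x$ is a local minimum of problem $P_U$, then its partition matrix $\phi(x)$ belongs to $\mathcal{M}_U^*$. Conversely, given a partition matrix $M \in \mathcal{M}_U^*$, we show that $X_U \cap \operatorname{span} M$ contains only two vectors, which differ by a sign; in particular, at most two vectors in $X_U$ have partition matrix $M$.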

###### Theorem 6.

If $M \in \mathcal{M}_U^*$ and $x, y \in X_U \cap \operatorname{span} M$, then $x = \pm y$.

###### Proof.

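Since $x, y \in \operatorname{span} M$, there exist $a, b \in \mathbb{R}^m$ such that $x = Ma$ and $y = Mb$. Then $U^\top M a = U^\top x = 0$ and $U^\top M b = U^\top y = 0$, i.e., $a, b \in \ker(U^\top M)$. Since $\dim \ker(U^\top M) = 1$ and $a \ne 0$ (because $\|x\| = 1$), there exists $\lambda \in \mathbb{R}$ such that $b = \lambda a$, hence $y = \lambda x$. From $\|x\| = \|y\| = 1$, we have $\lambda = \pm 1$. The proof is complete. ∎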

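Define

$$\psi_U(M) := X_U \cap \operatorname{span} M, \quad \forall M \in \mathcal{M}_U^*. \tag{15}$$

By Theorem 6, $\psi_U(M)$ has exactly two elements, which differ by a sign. Let

$$X_U^* := \bigcup_{M \in \mathcal{M}_U^*} \psi_U(M). \tag{16}$$

Then $|X_U^*| \le 2 |\mathcal{M}_U^*|$, i.e., $X_U^*$ is a finite set.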

The local minima set $X_U^{**}$ is a subset of $X_U^*$. In fact, if $x \in X_U^{**}$ and $M = \phi(x)$, then $M \in \mathcal{M}_U^*$ and $x \in \psi_U(M)$, hence $x \in X_U^*$. It follows that $X_U^{**}$ is also a finite set, i.e., each local minimum is isolated and the total number of local minima is finite. Figure 1 shows the relations between these sets and definitions. Here $X_U^*$ resembles the set of critical points: it contains, but in general does not equal, the set of local minima.

Figure 1: Relation between $\mathcal{M}_U^*$, $X_U^{**}$, $X_U^*$ and $X_U$.

Since $X_U^*$ is finite, one way to find the global minimum of problem $P_U$ is to compute $S(x)$ for all $x \in X_U^*$ and pick out the one with the smallest value. Table 1 shows a special case.

Through this method of enumeration, the continuous problem $P_U$ is equivalent to a discrete problem in which the variable belongs to the finite set $X_U^*$. However, as far as we know, there is no effective algorithm for this discrete problem: the size of $X_U^*$ grows exponentially with $N$, so enumeration is impractical for large $N$. In the next section, we give a fast greedy algorithm to approximately construct the $\ell_1$ Fourier basis when $N$ is large.
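For the special case $k = 2$, i.e., $U = [\frac{1}{\sqrt{N}} \mathbf{1}]$, the enumeration is easy to write down: Corollary 5 together with $m \ge 2$ forces every local minimum to be two-valued, $x = a \mathbf{1}_A + b \mathbf{1}_B$ for a bipartition $(A, B)$ of $V$, and $\mathbf{1}^\top x = 0$, $\|x\| = 1$ fix $a = -t|B|$, $b = t|A|$ with $t = 1/\sqrt{|A||B|(|A| + |B|)}$ up to sign. The sketch below (pure Python, hypothetical names and example graph) enumerates all $2^N - 2$ nonempty proper subsets, which illustrates why enumeration is impractical for large $N$.

```python
from itertools import combinations
from math import sqrt

# Hypothetical helper names; pure Python, vertices indexed from 0.


def l1_variation(W, x):
    """S(x) in (3)."""
    n = len(x)
    return sum(W[i][j] * abs(x[i] - x[j])
               for i in range(n) for j in range(i + 1, n))


def brute_force_second_vector(W):
    """Global minimum of P_U for U = [1/sqrt(N) * 1] (the case k = 2) by
    enumerating all two-valued candidates x = a 1_A + b 1_B with
    a = -t|B|, b = t|A|, t = 1/sqrt(|A||B|(|A| + |B|)).
    Each bipartition is visited twice (A and its complement), giving x and -x.
    Cost grows like 2^N, so this is only usable for tiny graphs."""
    n = len(W)
    best_x, best_s = None, float("inf")
    for size in range(1, n):                      # |A| = 1, ..., N - 1
        for A in combinations(range(n), size):
            in_A = set(A)
            n_a, n_b = size, n - size
            t = 1.0 / sqrt(n_a * n_b * (n_a + n_b))
            x = [-t * n_b if i in in_A else t * n_a for i in range(n)]
            s = l1_variation(W, x)
            if s < best_s:                        # keep the first minimizer found
                best_x, best_s = x, s
    return best_x, best_s


# Illustrative graph: two unit-weight triangles {0, 1, 2} and {3, 4, 5}
# joined by a single weak edge 2 - 3 of weight 0.1.
W = [[0.0] * 6 for _ in range(6)]
for p, q, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[p][q] = W[q][p] = w
x, s = brute_force_second_vector(W)
print([round(v, 4) for v in x])  # [-0.4082, -0.4082, -0.4082, 0.4082, 0.4082, 0.4082]
print(round(s, 4))               # 0.0816
```

For such a two-valued candidate, $S(x) = W(A, B)\,|a - b| = W(A, B)\sqrt{N/(|A||B|)}$, so in this case the search amounts to minimizing that quantity over all bipartitions of $V$.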

## 4 Greedy algorithm for ℓ1 Fourier basis

In this section, we provide a fast greedy algorithm to approximately construct the $\ell_1$ Fourier basis. Through the piecewise representation, the partition matrix of the $k$th basis vector $u_k$ naturally induces a partition of the vertices set $V$. The increasing variation of $u_k$ implies that the corresponding partition evolves from coarser to finer scales. Conversely, given a sequence of partitions varying across different scales, one might be able to construct an orthonormal basis close to the $\ell_1$ basis. Motivated by this idea, we propose a greedy algorithm based on a partition sequence $\{\tau_k\}$ created by iteratively grouping the vertices. In each step, we pick out the two groups of vertices with the largest mutual weight between them, and merge them into a new group. Repeating the process, we get a sequence of partitions varying from finer to coarser scales. Then, based on $\{\tau_k\}$, we define a sequence of subspaces $\{V_k\}$ of $\mathbb{R}^N$. By using ideas similar to multi-resolution analysis, we obtain an orthonormal basis.

### 4.1 Greedy partition sequence

We define a sequence of partitions $\{\tau_k\}_{k=1}^{N}$ on the vertices set $V$ as follows.

###### Definition 7.

Let

$$\tau_N := \{\{1\}, \{2\}, \dots, \{N\}\}. \tag{17}$$

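For $k = N, N-1, \dots, 2$, define

$$(A_k, B_k) := \operatorname*{argmax}_{A, B \in \tau_k,\ A \ne B} W(A, B), \tag{18}$$

$$\tau_{k-1} := \{A_k \cup B_k\} \cup \{C \in \tau_k \mid C \ne A_k,\ C \ne B_k\}. \tag{19}$$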

Definition 7 represents a vertex grouping process. At the beginning, the finest partition $\tau_N$ has $N$ groups, each group having one vertex. To get the next partition $\tau_{N-1}$, we identify $A_N, B_N \in \tau_N$ as the two groups having the largest mutual weight. Then we combine $A_N$ and $B_N$ into a new group $A_N \cup B_N$, which together with the other groups in $\tau_N$ forms a new partition $\tau_{N-1}$. This operation repeats $N - 1$ times. At the end, we get the coarsest partition $\tau_1 = \{V\}$, with all the vertices belonging to a single group. See Figure 2 for an illustration.

Figure 2: In step $k$, we combine $A_k$ and $B_k$ of $\tau_k$ to get $\tau_{k-1}$.
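A minimal sketch of Definition 7 in pure Python. It stores each partition $\tau_k$ as a list of frozensets and, at every step, recomputes all mutual weights $W(A, B)$, so it is deliberately naive rather than fast; the function name, the tie-breaking rule (first maximizing pair wins) and the example graph are assumptions made for illustration.

```python
# Hypothetical helper name; pure Python, vertices indexed from 0.

def greedy_partition_sequence(W):
    """Merges (A_k, B_k), k = N, N-1, ..., 2, of Definition 7.

    Starts from tau_N = singletons and repeatedly merges the two groups with the
    largest mutual weight W(A, B). Ties are broken by the first pair found,
    which is an assumption (Definition 7 does not specify a tie rule).
    """
    n = len(W)
    tau = [frozenset([i]) for i in range(n)]                     # tau_N
    merges = []
    while len(tau) > 1:
        best_pair, best_w = None, float("-inf")
        for p in range(len(tau)):
            for q in range(p + 1, len(tau)):
                w = sum(W[i][j] for i in tau[p] for j in tau[q])   # W(A, B)
                if w > best_w:
                    best_pair, best_w = (p, q), w
        p, q = best_pair
        A, B = tau[p], tau[q]
        merges.append((A, B))                          # (A_k, B_k) with k = len(tau)
        tau = [C for r, C in enumerate(tau) if r not in (p, q)] + [A | B]  # tau_{k-1}
    return merges


# Illustrative graph: two unit-weight triangles {0, 1, 2} and {3, 4, 5}
# joined by a single weak edge 2 - 3 of weight 0.1.
W = [[0.0] * 6 for _ in range(6)]
for p, q, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[p][q] = W[q][p] = w
for A, B in greedy_partition_sequence(W):
    print(sorted(A), sorted(B))
# [0] [1]
# [2] [0, 1]
# [3] [4]
# [5] [3, 4]
# [0, 1, 2] [3, 4, 5]
```

On this example the two triangles are formed first, and the weak edge is crossed only by the final merge $(A_2, B_2)$.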

### 4.2 Greedy basis

The greedy partition sequence defined above yields a sequence of subspaces

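$$V_k := \operatorname{span}\{\mathbf{1}_A \mid A \in \tau_k\}, \quad k = 1, \dots, N, \tag{20}$$

which satisfy the relations

$$\operatorname{span} \mathbf{1} = V_1 \subset V_2 \subset \cdots \subset V_N = \mathbb{R}^N. \tag{21}$$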

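Denote the orthogonal complement of $V_{k-1}$ in $V_k$ by $\mathcal{W}_k$. By Definition 7, the partition $\tau_{k-1}$ is obtained by combining two groups $A_k$ and $B_k$ in $\tau_k$. Suppose $x \in \mathcal{W}_k$ and $\|x\| = 1$. Let $\tau_k = \{A_k, B_k, C_1, \dots, C_{k-2}\}$. Then $x$ can be written in the form $x = a \mathbf{1}_{A_k} + b \mathbf{1}_{B_k} + \sum_{i=1}^{k-2} c_i \mathbf{1}_{C_i}$. Since $\mathbf{1}_{C_i} \in V_{k-1}$, from $\langle x, \mathbf{1}_{C_i} \rangle = c_i |C_i| = 0$ we get $c_i = 0$, $i = 1, \dots, k-2$. Since $\mathbf{1}_{A_k \cup B_k} \in V_{k-1}$ as well,

$$\langle x, \mathbf{1}_{A_k \cup B_k} \rangle = a |A_k| + b |B_k| = 0,$$

there exists $t \in \mathbb{R}$ such that $a = -t|B_k|$, $b = t|A_k|$. By requiring $\|x\|^2 = a^2 |A_k| + b^2 |B_k| = t^2 |A_k| |B_k| (|A_k| + |B_k|) = 1$, we get $t = \pm 1 / \sqrt{|A_k| |B_k| (|A_k| + |B_k|)}$. We summarize these results in the following theorem.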

###### Theorem 8.

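Suppose $A_k, B_k$ ($k = 2, \dots, N$) are defined as in Definition 7. Let $\tilde{u}_1 := \frac{1}{\sqrt{N}} \mathbf{1}$,

$$\tilde{u}_k := a_k \mathbf{1}_{A_k} + b_k \mathbf{1}_{B_k}, \quad k = 2, \dots, N, \tag{22}$$

where

$$a_k := -t_k |B_k|, \quad b_k := t_k |A_k|, \quad t_k := \frac{1}{\sqrt{|A_k| |B_k| (|A_k| + |B_k|)}}. \tag{23}$$

Then $\tilde{U} := [\tilde{u}_1, \dots, \tilde{u}_N]$ is an orthogonal matrix. We call $\tilde{U}$ the greedy basis of the graph $G$.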

Table 2 shows a simple example of the greedy basis for a given partition sequence $\{\tau_k\}$. Figure 3 plots the binary tree formed by $\{A_k\}$ and $\{B_k\}$.

Figure 3: Binary tree of $A_k$ and $B_k$ in the above example.
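Given the merge sequence $(A_N, B_N), \dots, (A_2, B_2)$, the greedy basis of Theorem 8 follows directly from (22) and (23). The sketch below is a hypothetical pure-Python transcription (0-based vertices); the merge list is supplied by hand here, but any implementation of Definition 7 could produce it.

```python
from math import sqrt

# Hypothetical helper name; pure Python, vertices indexed from 0.


def greedy_basis(n, merges):
    """Vectors u~_1, ..., u~_N of Theorem 8.

    merges lists (A_k, B_k) in the order k = N, N-1, ..., 2; the returned
    list holds u~_1 first, then u~_2, ..., u~_N.
    """
    basis = [[1.0 / sqrt(n)] * n]                 # u~_1 = 1/sqrt(N) * 1
    for A, B in reversed(merges):                 # k = 2, 3, ..., N
        n_a, n_b = len(A), len(B)
        t = 1.0 / sqrt(n_a * n_b * (n_a + n_b))   # t_k in (23)
        u = [0.0] * n
        for i in A:
            u[i] = -t * n_b                       # a_k = -t_k |B_k|
        for i in B:
            u[i] = t * n_a                        # b_k =  t_k |A_k|
        basis.append(u)
    return basis


# Hypothetical balanced merge tree on N = 4 vertices:
# k = 4: {0},{1};  k = 3: {2},{3};  k = 2: {0,1},{2,3}.
merges = [({0}, {1}), ({2}, {3}), ({0, 1}, {2, 3})]
U = greedy_basis(4, merges)
for u in U:
    print([round(v, 4) for v in u])
# [0.5, 0.5, 0.5, 0.5]
# [-0.5, -0.5, 0.5, 0.5]
# [0.0, 0.0, -0.7071, 0.7071]
# [-0.7071, 0.7071, 0.0, 0.0]
```

For this balanced merge tree on four vertices the vectors coincide with the normalized Haar basis on four points, consistent with the multi-resolution viewpoint mentioned at the start of this section.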

An interesting question is whether the greedy basis vector $\tilde{u}_k$ minimizes the $\ell_1$ norm variation. We will show that the partition matrix induced by the greedy partition satisfies the necessary condition (10).

###### Theorem 9.

Let

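$$\tilde{U}_{k-1} := [\tilde{u}_1, \dots, \tilde{u}_{k-1}], \quad k = 2, \dots, N, \tag{24}$$

where $\tilde{u}_j$ is defined in Theorem 8. Suppose $\tau_k = \{C_1, \dots, C_k\}$, and $M := [\mathbf{1}_{C_1}, \dots, \mathbf{1}_{C_k}]$. Then $\dim \ker(\tilde{U}_{k-1}^\top M) = 1$.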

###### Proof.

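Suppose $a \in \ker(\tilde{U}_{k-1}^\top M)$ and $x := Ma$. Then $\tilde{U}_{k-1}^\top x = 0$, i.e., $x \perp \operatorname{span} \tilde{U}_{k-1}$. Since $\tilde{u}_j \in V_j \subseteq V_{k-1}$ for $j \le k-1$, and these $k-1$ orthonormal vectors span a space of dimension $\dim V_{k-1} = k-1$, we have $\operatorname{span} \tilde{U}_{k-1} = V_{k-1}$. Because $x \in \operatorname{span} M = V_k$, that means $x$ lies in the orthogonal complement of $V_{k-1}$ in $V_k$, which is one-dimensional and contains $\tilde{u}_k \ne 0$. Hence there exists $\lambda \in \mathbb{R}$ such that $x = \lambda \tilde{u}_k$, i.e., $Ma = \lambda (a_k \mathbf{1}_{A_k} + b_k \mathbf{1}_{B_k})$. Since $A_k, B_k \in \tau_k$ and the columns of $M$ are indicator vectors of disjoint nonempty sets, $M$ has full column rank, so $a = \lambda e$, where $e \in \mathbb{R}^k$ has entries $a_k$ and $b_k$ at the positions of $A_k$ and $B_k$ in $\tau_k$ and zeros elsewhere. Conversely, $Me = \tilde{u}_k \perp V_{k-1}$ gives $e \in \ker(\tilde{U}_{k-1}^\top M)$. Therefore $\ker(\tilde{U}_{k-1}^\top M) = \operatorname{span}\{e\}$ and $\dim \ker(\tilde{U}_{k-1}^\top M) = 1$. ∎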
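Theorem 9 can also be checked numerically on a small example: for each $k$, form $\tilde{U}_{k-1}^\top M$ from the greedy basis and the partition $\tau_k$, and compute its kernel dimension via rank-nullity. The sketch below is pure Python with hypothetical names; the merge sequence is an arbitrary valid example rather than one produced from a particular graph (the argument in the proof uses only the nested structure of the partitions), and the rank is computed by Gaussian elimination with a fixed tolerance, an assumption adequate only for small, well-conditioned matrices.

```python
from math import sqrt

# Hypothetical helper names; pure Python, vertices indexed from 0.


def greedy_basis(n, merges):
    """u~_1, ..., u~_N of Theorem 8 from merges ordered k = N, ..., 2."""
    basis = [[1.0 / sqrt(n)] * n]
    for A, B in reversed(merges):
        n_a, n_b = len(A), len(B)
        t = 1.0 / sqrt(n_a * n_b * (n_a + n_b))
        u = [0.0] * n
        for i in A:
            u[i] = -t * n_b
        for i in B:
            u[i] = t * n_a
        basis.append(u)
    return basis


def partitions(n, merges):
    """tau_N, tau_{N-1}, ..., tau_1 obtained by applying the merges in order."""
    tau = [frozenset([i]) for i in range(n)]
    seq = [tau]
    for A, B in merges:
        tau = [C for C in tau if C != A and C != B] + [A | B]
        seq.append(tau)
    return seq


def numerical_rank(G, tol=1e-10):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    G = [row[:] for row in G]
    rows, cols = len(G), len(G[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(G[i][c]))
        if abs(G[piv][c]) < tol:
            continue                              # no usable pivot in this column
        G[r], G[piv] = G[piv], G[r]
        for i in range(r + 1, rows):
            factor = G[i][c] / G[r][c]
            G[i] = [gi - factor * gr for gi, gr in zip(G[i], G[r])]
        r += 1
    return r


def kernel_dimensions(n, merges):
    """dim ker(U~_{k-1}^T M) for tau_k, k = N, ..., 2 (Theorem 9 predicts 1)."""
    U = greedy_basis(n, merges)                   # U[j] = u~_{j+1}
    dims = []
    for tau in partitions(n, merges)[:-1]:        # tau_N, ..., tau_2
        k = len(tau)
        # Entry (j, c) of U~_{k-1}^T M is <u~_{j+1}, 1_{C_c}>.
        G = [[sum(U[j][i] for i in C) for C in tau] for j in range(k - 1)]
        dims.append(k - numerical_rank(G))        # rank-nullity
    return dims


# Hypothetical merge sequence on N = 5 vertices (k = 5, 4, 3, 2).
merges = [(frozenset({0}), frozenset({1})),
          (frozenset({0, 1}), frozenset({2})),
          (frozenset({3}), frozenset({4})),
          (frozenset({0, 1, 2}), frozenset({3, 4}))]
print(kernel_dimensions(5, merges))  # [1, 1, 1, 1]
```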

In Theorem 9, $\tilde{U}_{k-1}$ and $M$ satisfy condition (10), i.e., $\dim \ker(\tilde{U}_{k-1}^\top M) = 1$. Since