Effective information analysis generally boils down to the geometry of the data represented by a graph. Typical applications include social networks, transportation networks, spread of epidemic disease
, brain’s neuronal networks, gene data on biological regulatory networks, telecommunication networks9], which are lying on non-Euclidean graph domain. To describe the geometric structures, graph matrices such as adjacency matrix or graph Laplacian can be employed to reveal latent patterns.
In recent years, many problems are being revisited with deep learning tools. Convolutional neural networks(ConvNets) emerging in recent years are at the heart of deep learning, and the most prominent strain of neural networks in research. ConvNets have revolutionized computer vision11], computer audition13, 14], and many other areas. However, ConvNets are designed for grid data such as image, which belongs to the Euclidean domain. Graph data is non-Euclidean which makes it difficult to employ typical ConvNets. To bridge the gap, Bruna et al.  generalized spectral convolutional operation which requires expensive steps of spectral decomposition and matrix multiplication. Hammond et al.
first introduced truncated Chebyshev polynomial for estimating wavelet in graph signal processing. Based on this polynomial approximation, Defferrard et al. designed ChebNet which contains a novel neural network layer for the convolution operator in the spectral domain. Kipf and Welling simplified ChebNet by assuming the maximum of eigenvalues is 2 and fixing the order to 1, which boosts both effectiveness and efficiency. Li et al. found that this simplified ChebNet is an application of Laplacian smoothing, which implies that current studies are only effective on the smooth signal.
Current studies on graph ConvNet heavily rely on polynomial approximation, which makes it difficult to estimate jump signals. Fig. 1 shows the behaviors of polynomial and rational function on jump discontinuity: rational approximation fits the functions considerably better than polynomials. It is widely recognized that (1) polynomial approximation suffers from Gibbs phenomenon, which means polynomial function oscillate and overshoot near discontinuities; (2) Applying a higher order of polynomials could dramatically reduce the oscillation, but also incurs an expensive computational cost. (3) Polynomials require degree (poly(1/)) to approximate functions near singularities and on an unbounded domain, while rational functions only need (poly log(1/)) to achieve -close[1, 2]. However, it is non-trivial to apply rational approximation. Polynomial-based method can easily transfer the function on eigenvalues to the same function on graph Laplacian so that matrix multiplication by eigenvector can be avoided. It is not easy for rational approximation to do so due to the additional denominator.
In this paper, the advantage of the rational function in approximation is transferred to spectral graph domain. Specifically, we propose a rational function based neural networks(RationalNet), which can avoid matrix multiplication by eigenvectors. To alleviate the local minimum problem of the neural network, a relaxed Remez algorithm is employed for parameter initialization. Our theoretical analysis shows that rational functions converge much faster than polynomials on jump signal. In a nutshell, the key innovations are:
Propose a neural network model based on rational function for recovering jump discontinuities: To estimate the jump signal, our proposed method integrates rational approximation and spectral graph operation to avoid matrix multiplication by eigenvectors. For graph signal regression task, expensive matrix inversion can be circumvented by graph Fourier transform.
Develop an efficient algorithm for model parameters optimization: Remez algorithm is theoretically optimal, but it is often not practical especially when approximating discrete signal. To alleviate this issue, the stopping rules of Remez algorithm are relaxed to initialize the neural networks parameters.
Provide theoretical analysis for the proposed method on jump signal: For understanding the behaviors of polynomial and rational function on jump discontinuities, a uniform representation is proposed to analyze convergence rate regarding the order number theoretically.
Conducting extensive experiments for performance evaluations111The code and datasets will be released after the acceptance for replication: The proposed method was evaluated on synthetic and real-world data. Experimental results demonstrate that the proposed approach runs efficiently and consistently outperforms the best of the existing methods.
The rest of the paper is organized as follows. Section II reviews existing work in this area. Necessary preliminary is presented in section III. Section IV elaborates a rational function based model for graph signal regression task. Section V presents algorithm description and theoretical analysis. In Section VI, experiments on synthetic and real-world data are analyzed. This paper concludes by summarizing the study’s important findings in Section VII.
Ii Related Work
The proposed method draws inspiration from the field of approximation theory, spectral graph theory and recent works using neural networks. In what follows, we provide a brief overview of related work in these fields.
Ii-a Approximation theory
In mathematics, approximation theory is concerned with how functions can best be approximated with simpler functions, and with quantitatively characterizing the errors introduced thereby. One problem of particular interest is that of approximating a function in a computer mathematical library, using operations that can be performed on the computer or calculator, such that the result is as close to the actual function as possible. This is typically done with polynomial or rational approximations. Polynomials are familiar and comfortable, but rational functions seem complex and specialized, and rational functions are more powerful than polynomials at approximating functions near singularities and on unbounded domains. Basic properties of rational function are described in books of complex analysis[22, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31].
Ii-B Spectral graph theory
Spectral graph theory is the study of the properties of a graph in relationship to the characteristic polynomial, eigenvalues, and eigenvectors of matrices associated with the graph, such as its adjacency matrix or Laplacian matrix[32, 33, 34]. Many graphs and geometric convolution methods have been proposed recently. The spectral convolution methods ([15, 18, 19, 35]) are the mainstream algorithm developed as the graph convolution methods. Because their theory is based on the graph Fourier analysis ([36, 37]). The polynomial approximation is firstly proposed by . Inspired by this, graph convolutional neural networks (GCNNs) () is a successful attempt at generalizing the powerful convolutional neural networks (CNNs) in dealing with Euclidean data to modeling graph-structured data. Kipf and Welling proposed a simplified type of GCNNs, called graph convolutional networks (GCNs). The GCN model naturally integrates the connectivity patterns and feature attributes of graph-structured data and outperforms many state-of-the-art methods significantly. Li et al. found that GCN is actual an application of Laplacian smoothing, which is inconsistent with GCN’s motivation. In sum, this thread of work calculates the average of vertexes within Nth-order neighbors.
In this paper, we focus on the effectiveness of approximation technique on graph signal. Under graph signal regression task, the superiority of rational function beyond polynomial function is analyzed, and a rational function based neural network is proposed.
We focus processing graph signals defined on undirected graphs , where is a set of n vertexes, represents edges and is an unweighted adjacency matrix. A signal
defined on the nodes may be regarded as a vector. Combinatorial graph Laplacian is defined as where is degree matrix.
As is a real symmetric positive semidefinite matrix, it has a complete set of orthonormal eigenvectors and their associated ordered real nonnegative eigenvalues identified as the frequencies of the graph. The Laplacian is diagonalized by the Fourier basis : where is the diagonal matrix whose diagonal elements are the corresponding eigenvalues, i.e., . The graph Fourier transform of a signal is defined as and its inverse as [36, 37, 38]. To enable the formulation of fundamental operations such as filtering in the vertex domain, the convolution operator on graph is defined in the Fourier domain such that , where is the element-wise product, and are two signals defined on vertex domain. It follows that a vertex signal is filtered by spectral signal as:
Kipf and Welling simplifies it by:
rewrite the above GCN in matrix form: , which leads to symmetric normalized Laplacian with raw feature. GCN has been analyzed GCN in  using smoothing Laplacian: where is a weight parameter between the current vertex and the features of its neighbors , is degree of , and is the smoothed Laplacian. This smoothing Laplacian has a matrix form：
The above formula is random walk normalized Laplacian as a counterpart of symmetric normalized Laplacian. Therefore, GCN is nothing but a first-order Laplacian smoothing which averages neighbors of each vertex.
Iv Model description
This section formally defines the task of graph signal recovering and then describes our proposed RationalNet which aims to characterize the jump signal in spectral domain.
Iv-a Problem Setting
All the related works integrate graph convolution estimator and fully-connected neural layers for a classification task. This classification can be summarized as:
where indicates the parameters of normal neural network layers connecting the output of and the label
, such as fully-connected layers and softmax layers for classification. Andis a neural network layer implemented by approximation techniques. However, whether the success is due to the neural networks() or the convolution approximation method() remains unknown. To focus on the analysis of approximation on , a graph signal regression task is proposed to evaluate the performance of the convolution approximators . Regression task directly compares the label and the output of , removing the distraction of .
Given a graph , raw feature , and training signal on the part of vertexes, , our goal is to recover signal values, , on test nodes. Formally, we want to find a so that:
If the raw features are good enough for the regression task, whether the effectiveness is due to or is difficult to verify. Therefore, one reasonable option for is the uniform signal in spectral domain. Specifically, and , which means that represents eigenbasis of graph structure in spectral domain. Each entry of vector indicates one eigenvector in the spectral domain. The physical meaning of the convolution operation is how to select eigenbasis in spectral domain to match the graph signal . Representing with graph Laplacian, the regress task can be rewritten as:
where is the spectral filter to approximate.
, RationalNet approximates the spectral filter by a widely used type of rational function, i.e., Padé approximator, which is defined as:
Applying to graph convolution operator, we have:
where represents a diagonal matrix whose entries are eigenvalues, , and . The division is element-wise. The inverse of matrix Q(x) is equivalent to applying reciprocal operation on its diagonal entries, so the equation can be rewritten as:
Applying matrix rules, it’s easy to have:
Since , we can rewrite the equation above as:
where and . Note that in our case. Based on polynomial approximation, RationalNet only adds a inverse polynomial function . Therefore, polynomial approximation (GCN/ChebNet) on graph is a special case of RationalNet when .
Computing inverse of matrix is still of high complexity (Gauss–Jordan elimination method) in Eq. 4, especially for large matrix. This can be avoided by transferring vertex graph signal and raw features into spectral domain. Therefore, the Eq. 12 can be rewritten as:
where is the graph Fourier transform of graph signal, and is the graph Fourier transform of raw feature. By this step, we only need compute reciprocal of eigenvalues, rather than computing matrix multiplication and inversion at each update. Eq. 2 can be obtained via left multiplying both sides of Eq. 2 by transpose of eigenvectors. Note that Eq. 5 is applicable when there is no other layers between the output and the convolution operation. In contrast, Eq. 4 can be used not only in regression task like Eq. 2, but also in classification where there exist additional neural networks as described in Eq. 1. RationalNet has complexity for Eq. 5 and for Eq. 4.
Iv-C Relaxed Remez Algorithm for initialization
RationalNet is powerful at approximating a function. However, it is often stuck in a local optimum due to the neural network optimization, which makes rational function not always better than the polynomial approximation. Remez exchange algorithm is an iterative algorithm used to find simple approximations to functions. Nevertheless, the drawback of this minimax algorithm is that the order for optimum is unknown and the stopping condition is not often practical. To improve RationalNet’s initialization, a relaxed Remez algorithm is proposed.
As Theorem 24.1 (equioscillation characterization of best approximants, ) states: Given the order of numerator(m) and denominator(n), and a real function that is continuous in [p, q], there exists a unique best rational approximation defined as Eq. 3. This means that the optimizes the minimax error:
A rational function is equal to iff equioscillates between at least m+n+2 extreme points, or say the error function attains the absolute maximum value with alternating sign:
where indicates the index of data point, and represents the max of residuals: . For rational function approximation, there is some nonlinearity because there will be a product of with in the equations. Hence, these equations need to be solved iteratively. The iteration formula can be defined by linearizing the equations:
where indicate the iteration index. Eq. 8 is obtained by neglecting nonlinear terms of the form in Eq. 7. This procedure can converge in a reasonable time (). Expanding Eq. 8 for all sampled data points , it can be rewritten as:
where K=m+n+1. Starting from an assumed initial guess ， this set of linear equations can be solved for the unknown , and , when two successive values of are in satisfactory agreement such as is less than 1e-6. Constructing a rational function with new coefficients, Remez computes the error residuals. If absolute value of the residuals are not great than , the optimal coefficients are obtained. Otherwise, Remez calculates the roots of rational function and constructs a new set of control points by collecting the extremes in each interval of roots, and repeat the computation of Eq. 8 until residuals are not great than . However, this stopping rule is not often satisfied, which makes the algorithm stuck in dead loop. Therefore, we add an iteration limit for avoid dead loop. The relaxed Remez algorithm could be summarized as follows:
Prepare training data
Specify the degree of interpolating rational function.
Pick m + n + 2 points from the data points with equal interval. Under this discrete setting, the distances between any neighbors are considered equal if the data distribution are dense
Optimization by equioscillation constraint
Solve coefficients and by Eq. 9
Form a new rational function with new coefficients
Calculate residual errors
Repeat until E converges or is less than 1e-6
Check stopping rule
Calculate residual errors
Stops if the absolute value of any residual is not numerically greater than .
Otherwise, find the n+m+1 roots of the rational function, and find the points at which the error function attains its extreme value. Rerun the algorithm with this new set of training data from the second step.
We have considered an algorithm for obtaining minimax approximation when the function could be evaluated at any point inside the interval. In our case, the function is known only at a set of discrete points, since eigenvalues are not continuous. However, this problem is no essentially different form the continuous case if the set of points is rather dense in the target interval [p, q]. We simply assume that eigenvalue samples are dense enough, since we often normalized eigenvalues into the range [0,1], several hundreds of points are thereby sufficient. For example, our smallest size of the synthetic graph consists of 500 nodes, so there are 500 eigenvalues distributed in [0, 1], which should be enough for approximation.
If the degree of rational function is large, then the system of equations could be ill-conditioned. Sometime, the linear system of equations 9 is singular, which make the solution vector(, , ) under-determined. We traverse all possible m/n pairs given the maximum of m and n. The relaxed algorithm discards any m/n if singular matrix error occurs.
We found that the residuals are not smaller than for some m/n pairs, and the algorithm continues to output the same values. In such case, the algorithm stops if the maximum and minimum residuals () converge or they satisfy .
V Algorithm and Theoretical analysis
This section elaborates algorithm details and analyzes its convergence rate on jump discontinuity.
V-a Algorithm description
The Algorithm 1 first calculate graph Laplacian(line 1) and spectral decomposition(line 3), and convert vertex signal into spectral domain by inverse Fourier transform(line 2). From given m,n, algorithm 1 traverse all possible m/n pairs (line 4). Picking up m+n+1 points with equal intervals, the optimal error is calculated (line 10). After convergence, optimal m/n and ψ i /ϕ j are determined (line 11). Then algorithm calculates the residuals(line 13). If the stopping rule is not satisﬁed, decrease the order of denominator or numerator in turns and repeat the same process, otherwise, output the parameters of rational function.
With optimal parameters, graph convolution operation is calculated by rational approximation (line 19). Then we conduct typical neural networks optimization.
V-B Theoretical analysis
This section first represents jump discontinuity using a function(Eq. 10). Then convergence rate of rational function on jump discontinuity is analyzed(Theorem V.1). With the help a Lemma V.2, we prove Theorem V.1. Similarly, convergence rate of polynomial function(Theorem V.3) is also provided.
We found that and can characterize single jump discontinuity. For example, when a=b=1/2 and =0,
is ReLU function. It iswhen a=1 and b=1, and =1. Thus, rotates or change the angle between two lines at jump discontinuity based on and . These two functions can be rewritten in an uniform formula:
Theorem V.1 (convergence rate of rational approximation on jump discontinuity).
Given n5 and b1, there exist a rational function of degree n that satisfies
Given , ,
proof for Lemma 11.
If =0, left of Eq. 11 is equivalent to
Since and are both even, it suffices to consider the case when .
For , since so that :
If =1, Following same procedure as =0, we have:
By Bernstein’s theorem, polynomials can approximate a function with:
where is a polynomial function of degree of n, and . Using the same settings for the rational function, we have a similar result for polynomials:
Theorem V.3 (convergence rate of polynomial approximation on jump discontinuity).
Given , , :
where is a polynomial function of degree n, and C=3bc.
In a nutshell, when the order is large or equal to 5, polynomial converges linearly regarding the order number, while rational function converges exponentially.
This section elaborates evaluation with a detailed analysis of the behaviors of the proposed method on synthetic and real-world graphs.
Vi-a Training Setting and Baselines
The input include a graph Laplacian , a graph signal residing on each vertex , and raw feature . In a nutshell, we aims at finding a function that satisfies :
where is set to be jump function such and . In previous works, raw features and filtering signal are fed into the model to fit the graph signal. However, raw features have an impact on fitting graph signal, which distracts the analysis of filtering behaviors. As discussed in Section IV, is set to be eigenvector which is a uniform signal in spectral domain, so that we can focus on the behaviors of approximation methods. We compare RationalNet(RNet) against several stat of art regression models:
Polynomial Regression(PR) 
Passive Aggressive Regression(PAR) 
Epsilon-Support Vector Regression(SVR). Three kernels were applied: linear(L), polynomial(P) and RBF(R).
Bayesian Ridge Regression(BR),
Automatic Relevance Determination(ARD)
Orthogonal Matching Pursuit(OMP)
Huber Regression 
ChebNet. PolyNet is proposed by replacing Chebyshev polynomial with normal polynomial.
Vi-B experiments on synthetic data
To validate the effectiveness of RationalNet, we conduct a simulated test with synthetic data. The task is to recover signal on the vertexes, which is a regression problem. Specifically, we generated a graph comprised of several subgroups. The edge amount for each vertex in the same subgroup is randomly chosen between 0 and 8, while the links among different subgroups are sampled between 0 and 3. Experiments were conducted on a 500-node and a 1000-node graph. Two types of jump signals are fed into this network structure: and . Since all eigenvalues are normalized into range [0, 1], jump points of and are moved into the same range. Specifically, we used and . Detailed results are shown in Table II and I.
|ChebNet||.0021.0000||1.1904.0052||.2058 .0067||.2084 .0043|
In 1000-node graph test on (first two columns in Table I), PolyFit achieved the second lowest MSE(0.0016 for S-ERR). PolyNet’s MSE(0.0016) is the same as that of PolyFit, which shows the power of polynomial regression. Chebyshev polynomial(ChebNet) dose not improve PolyNet, which implies that neural network might approximate the best coefficients of polynomials no matter what type of polynomial is used. LR, RR, LASSO, EN, OMP, BR, ARD, SGD SVR(L/P) performed at the same level(0.0015-0.0018). Our method(5e-6) significantly outperformed all the baselines by a large margin. Both the errors in spectral domain and vertex domain show the advantage of RationalNet. The Similarly, PolyFit and PolyNet and SVR(R) performed better than all the baselines except RationalNet. Our method still achieves the lowest MSE(0.004619 for S-ERR). The 500-node graph experiment(Table I)) also demonstrates the effectiveness of RationalNet.
Regression behaviors on synthetic data is shown in Fig. 2 and 3. Methods (SVR(L), Ridge, OMP, LASSO, Linear regression, ENet, ARD, Huber, etc.) fitted the (Fig. 2) using a straight Line, while better baselines(SVR(R)), PolyFit, PolyNet, ChebNet) approximate the function with curves. RationalNet almost overlapped with the target function which makes its MSE very small(5e-6). Similarly, in Fig. 3, the methods using straight lines(SVR(L), Ridge, OMP, SGD, LASSO, BR, Huber LR, ENet, ARD) performed relatively bad. Fitting with curves, PolyFit, PolyNet, ChebNet improved the performance by a large margin. Similarly, RationalNet overlapped the signal and achieved the lowest error score.
|RNet w/o Remez (spectral)||0.000145||1.364790|
|RNet w/o Remez (vertex)||0.000252||0.887602|
|RNet w/ Remez (spectral)||1.981569e-6||0.010645|
|RNet w/ Remez (vertex)||9.569891e-5||0.093268|
Since RationalNet initializes parameters by a relaxed Remez algorithm, we analyze the performance of neural networks and Remez respectively. As shown in Table III, the first two rows show the MSE of Remez only. Compared with Remez algorithm, RationalNet without Remez initialization(3rd and 4th lines) performed badly. On the contrary, RationalNet with Remez initialization(5th and 6th lines) improved the Remez by 56.26% and 81.39%(7th line) for and separately in spectral domain, which also reduces their MSE in vertex domain by 9.37% and 14.93%(8th line). This result illustrates that Remez and RationalNet cannot find the optimum independently. Therefore, it is reasonable to integrate these two methods for optimizing the coefficients.
Vi-C Case study on real-world scenario
In this section, we study a traffic congestion signal on Minnesota state-level road network 222https://www.cise.ufl.edu/research/sparse/matrices/Gleich/minnesota.html and Fairfax county-level road network VA333https://github.com/gboeing/osmnx. Specifically, the signal is a high-pass filtering which can be written as in Fourier domain. is a threshold function sets the output to 0 when normalized eigenvalues , and 1 for . Therefore, this function filter out signal of low frequency. The physical meaning of the convolutional operation is a weight function that chooses the eigenbasis() to fit the traffic signal .
The top line of Fig. 4 shows several examples in eigen space of Minnesota road networks. First two sub figures are the 2nd and 3rd eigenvector on vertex domain: emphasizes the south of Minnesota(red area), while highlights the capital St. Paul and its biggest city Minneapolis. Note that the 1st eigenvector is a constant vector for any connected graph. correspond to , which represent the first two lowest frequencies. As these figures show, low frequencies represent smooth signals, which means that the neighbors of each node are likely to have similar signal value. By contrast, high-frequency basis captures non-smooth component as the 3rd and 4th sub figures show: signal values vary frequently in some areas. Combining top 50% high-frequency eigenbasis, the last sub figure shows the signal on the graph. In addition, the degree of non-smoothness of signal regard graph structure can be evaluated quantitatively by Dirichlet energy(): . Dirichlet energy of examples is shown in the caption of Fig. 4. Eigenvectors of low frequency() are smooth, so their Dirichlet energies are low. While high-frequency eigenvectors are less smooth since their Dirichlet energy is higher(around 4.04). Summing up the top 50% high frequencies, the Dirichlet energy of in the last sub figure is very large(15384.10). The bottom line of Fig. 4 shows similar examples from Fairfax road networks. highlights Fair City Mall(red area) and the road to this mall, while underlines Fairfax Circle Shopping Center and a residential neighborhoods nearby. Similarly, and show two non-smooth graph signals. Summing up top 50% high frequencies, the 5th sub figure exhibits an extremely non-smooth signal. Characterizing non-smooth graph signal or high frequencies is not a trivial task. Therefore, approximating this high pass filtering is significantly challenging.