EigenNetworks

06/05/2018
by   Jonathan Mei, et al.
Carnegie Mellon University

In many applications, the interdependencies among a set of N time series { x_nk, k>0 }_n=1^N are well captured by a graph or network G. The network itself may change over time as well (i.e., as G_k). We expect the network changes to be at a much slower rate than that of the time series. This paper introduces eigennetworks, networks that are building blocks to compose the actual networks G_k capturing the dependencies among the time series. These eigennetworks can be estimated by first learning the time series of graphs G_k from the data, followed by a Principal Network Analysis procedure. Algorithms for learning both the original time series of graphs and the eigennetworks are presented and discussed. Experiments on simulated and real time series data demonstrate the performance of the learning and the interpretation of the eigennetworks.


I Introduction

“All entities move and nothing remains still.”
–Heraclitus

In a world where change is the only constant, learning pairwise relationships among large ensembles of interacting agents from data they generate is often interesting but difficult. In many applications, data arises in contexts where physical laws are not available to derive useful models. An alternative paradigm is to capture relationships among data through sparse graph structures because they are interpretable by humans for advancing scientific understanding [1, 2, 3], as well as computationally beneficial for engineering efficient analytics algorithms [4, 5, 6, 7].

Many network time series models treat the time series as stationary, where the parameters of the model (the networks) stay constant through time [8, 9, 10, 11, 12, 13, 14, 15]. We wish to extend some of these models to the non-stationary case, in which the graph structures themselves vary in time. Time-varying networks raise several challenges, including the computational cost of tracking them, their succinct representation, and the extraction of useful interpretations that close the loop with the application. To address these issues, we propose parsimonious representations of these time-varying networks as weighted linear combinations of basis networks. We refer to the basis networks as principal networks or eigennetworks¹ and to the weights as eigenfeatures. The eigennetworks explain the variation of the graphs through a few basic component networks that are in some sense fundamental to the underlying process that generates the time series. The time-varying graphs describe a trajectory in eigenspace that is expected to be smooth, since the graphs change at a much slower time scale than the data time series, except when rare but significant changes occur in the application or data sources. For example, an election in which senate seats flip from one party to another induces changes in the voting pattern of the US senate, and the four stages of the Drosophila Melanogaster² metamorphosis should induce abrupt changes in its genetic expression data. In both applications, these events should be reflected in observable structural changes in the underlying networks. In between these drastic events, we expect the structure of the networks to vary slowly: in the senate rolls, senators mostly follow a party-line position, and, in between life stages, the genetic expression data of the fruit fly exhibit small variations. We consider these applications in Section VI. The paper describes algorithms for 1) learning the time series of graphs from the raw data; 2) extracting the corresponding principal or eigennetworks, eigenfeatures, and eigentrajectories from the estimates of the time-varying networks; and 3) determining when significant changepoints occur. We illustrate and validate our concepts and methods with experiments on simulated data and with two real-world applications, one involving the voting record of the US senate over a two-year period and the other the genetic expression data of the Drosophila Melanogaster as it goes through its life cycle. These provide interesting opportunities to interpret eigennetworks in real-life contexts.

¹ Eigenfaces were introduced in [16] for face representation and identification and further developed in [17] for face detection and identification; see also [18], which explores eigenfaces and fisherfaces for face recognition and classification, and [19] for a review of relevant work at the time.

² A vinegar fly, also commonly referred to as a fruit fly.

We now explain the approach in more detail.

Eigennetworks and eigenfeatures. Analysis of networks is usually difficult, and the problem is compounded with varying networks. But since the rate of change of the networks is expected to be slow relative to that of the original time series, it is reasonable to believe that the collection of varying graphs can be well described through a small set of eigennetworks that are fundamental to the system. For example, in neuroscience, the functional connectivity of a subject in an experiment may present two distinct paradigms, at rest and when active, corresponding to the default mode and the task-positive networks [20]; and in genetics, the gene regulatory network of an insect may present four main paradigms, corresponding to the developmental stages of egg, larva, pupa, and adult [21]. The description using eigennetworks allows a more compact summary of the process as well as a parsimonious representation, reducing the total number of parameters in the final learning problem and providing more recognizable insights into the application data. Beyond representation, the eigennetworks and the eigenfeatures can be used for other postprocessing network analyses, including classification, identification, or recognition, among others. Ideally, in certain applications, we may interpret each principal network individually as an eigenbehavior, shedding further light on and helping to make sense of complex observed behaviors. Smooth variations of the networks are captured by time-varying weighted linear combinations of the eigennetworks.

Assuming that the time-varying graphs have been learned from the time series (see the next paragraph), we interpret each graph as a vector and seek a description of a vector subspace that efficiently captures most of the scatter variation in this set of graphs. The principal or eigennetworks, like eigenfaces (see footnote 1 and [16, 17, 22]), span a lower-dimensional subspace ("network space") in which the networks describing the process live. There are myriad ways to find vectors spanning such subspaces [23, 24, 25, 26]; we adopt Principal Component Analysis (PCA) [27] as the dimensionality reduction method that, for a fixed number of basis components, maximizes the scatter variation across the dimension-reduced representations of the graphs. We apply a sparse version of PCA to networks and term this Principal Network Analysis (PNA).
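To make the PNA idea concrete, here is a minimal numpy sketch of a PCA-style decomposition of vectorized graphs; the function name, the hard-threshold stand-in for sparse PCA, and the array layout are illustrative assumptions, not the paper's implementation.

import numpy as np

def principal_networks(graphs, R, sparsity_thresh=0.0):
    """Illustrative PCA on vectorized graphs.

    graphs: array of shape (T, N, N), one adjacency matrix per time step.
    R: number of principal (eigen)networks to keep.
    Returns (weights, eigennetworks): weights has shape (T, R),
    eigennetworks has shape (R, N, N).
    """
    T, N, _ = graphs.shape
    X = graphs.reshape(T, N * N)          # each row is a vectorized graph
    mean = X.mean(axis=0, keepdims=True)  # center before PCA
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    weights = U[:, :R] * s[:R]            # eigenfeature trajectories
    components = Vt[:R].copy()            # vectorized eigennetworks
    # crude sparsification: zero out small entries (stand-in for sparse PCA)
    components[np.abs(components) < sparsity_thresh] = 0.0
    return weights, components.reshape(R, N, N)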

Estimating time varying networks and changepoints. Before applying PNA, we estimate the time varying networks. We present a generic nonlinear regression model to learn the graphs. In our framework, the graphs can be directed or undirected. We pose the learning as a regularized optimization problem incorporating desired structural properties such as sparsity of the graphs and low rank latent variables (accounting for trends or unmodeled effects on the observations). This is potentially a difficult and computationally expensive task, since we may need to learn a new network at every single time step! To make this vastly underdetermined problem tractable, additional assumptions are required. One common assumption takes the form that the model does not change “too quickly” (or “too often”) through time. This can be characterized with respect to some metrics for signal smoothness [28, 29]. This assumption tends to take on two primary variants: 1) bounded total variation (TV); or 2) Lipschitz temporal gradient. The behavior of these variants can lead to strikingly different qualitative interpretations. Low total variation can be viewed as switching (along time) between networks at discrete (time) changepoints, whereas Lipschitz gradient can be viewed as a network slowly evolving in time.
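As a small illustration of the two smoothness notions, the sketch below (plain numpy, illustrative names) computes the total variation of a graph sequence; a low value of this sum corresponds to the switching behavior described above, whereas a Lipschitz-gradient assumption would instead bound each individual difference.

import numpy as np

def total_variation(graphs):
    """Total variation of a graph sequence stored as an array (K, N, N):
    the sum over time of the Frobenius norms of consecutive differences."""
    diffs = graphs[1:] - graphs[:-1]                 # A_{k+1} - A_k
    return sum(np.linalg.norm(d) for d in diffs)     # sum of Frobenius norms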

For enforcing smoothness on the learned network models, we focus on optimization-based approaches as the basis for our algorithms and analysis. There have been several attempts at extending optimization methods for estimating single networks to the non-stationary setting [30, 31]. However, these methods take a black-and-white approach to the smoothness assumption, yielding either totally smooth graphs or piecewise constant graphs. Our approach, based on eigennetworks and time-varying eigenfeatures (the weights of the linear combinations representing the networks), allows capturing smooth variation and determining abrupt changepoints at the same time. Simultaneously estimating both improves the overall performance of the approach, since, once a changepoint is detected, we can better select the data to be used to estimate the network.

Summary. We organize the paper as follows: Section II establishes notation and reviews background, in particular extending the notation in [14] from estimating a single network to estimating time-varying networks, and also reviewing Lipschitz- and total-variation-based approaches to estimating time-varying sparse networks; Section III introduces our framework for learning time-varying sparse graphs in the presence of latent effects; Section IV presents our local linear regression kernel based approach to determining changepoints; Section V describes the estimation of the principal or eigennetworks and eigenfeatures from the time series of networks; Section VI details experimental results of our framework on simulated data and two sets of real data, one the voting record of the US senate in 2010-2011, before and after the November 2010 election, and the other genetic expression data for the Drosophila Melanogaster covering the four stages of its life cycle; and finally we conclude in Section VII.

II Notation and Related Works

In this section, we will describe the models that provide foundations for the methods we consider. Much of the notation follows from [14], which estimates a single network, so we provide a brief review. We will then introduce modifications to these models to capture non-stationarity and estimate a series of time-varying networks and compare this extension with several existing methods for time varying graphs in these settings. Finally, we outline the challenges that can arise in learning these graph structures.

II-A Generalized Linear Models

The Generalized Linear Model (GLM) can be described using several parameterizations. We adopt the one based on the Bregman divergence [32, 33]. For observations $y_k \in \mathbb{R}$ and regressors $\mathbf{x}_k \in \mathbb{R}^{N}$, the model is parameterized by 1) a non-linear link function $g = \nabla G$ for a convex function $G$; and 2) a coefficient vector $\mathbf{a}$. We have the model written as

$$\mathbb{E}\left[ y_k \mid \mathbf{x}_k \right] \;=\; g\!\left( \mathbf{a}^{\top} \mathbf{x}_k \right), \qquad (1)$$

(note that some references use $g^{-1}$ as the link function where we use $g$).

For data with $y_k$ conditionally independent given $\mathbf{x}_k$ (note that this is not necessarily assuming that the $\mathbf{x}_k$ are independent), learning the model assuming $g$ is known can be achieved via empirical risk minimization,

$$\widehat{\mathbf{a}} \;=\; \underset{\mathbf{a}}{\arg\min}\;\; \frac{1}{K} \sum_{k=1}^{K} \Big[ G\!\left( \mathbf{a}^{\top}\mathbf{x}_k \right) - y_k\, \mathbf{a}^{\top}\mathbf{x}_k \Big], \qquad (2)$$

which, up to terms constant in $\mathbf{a}$, averages the Bregman divergence between $y_k$ and $g(\mathbf{a}^{\top}\mathbf{x}_k)$ generated by $G^{*}$, where $G^{*}$ is the convex conjugate of $G$ (see [14] for further details).
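As a concrete, hedged example of the empirical risk minimization above, the following sketch fits a canonical Poisson-type GLM with link $g(\cdot) = \exp(\cdot)$ by plain gradient descent; the function name, the solver, and the learning-rate defaults are illustrative choices, not the method of [14].

import numpy as np

def fit_glm_poisson(X, y, lr=1e-3, n_iter=2000):
    """Gradient descent on the canonical Poisson GLM empirical risk
    sum_k [exp(a^T x_k) - y_k * a^T x_k]; here g(.) = exp(.) plays the role
    of the link g = grad G with G(theta) = exp(theta)."""
    n, d = X.shape
    a = np.zeros(d)
    for _ in range(n_iter):
        eta = X @ a
        grad = X.T @ (np.exp(eta) - y) / n   # residual in the mean domain
        a -= lr * grad
    return a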

II-B Estimating Static Networks

First, we extend the model to the multivariate case, collecting the observations into vectors and the coefficients into a matrix, and consider the vectorized model,

(3)

in which the link function is applied elementwise. For the remainder of this paper, we assume that all of the coordinate-wise link functions are the same for notational simplicity, though the same analysis readily extends to the case where they are distinct.

Now, consider an additional low-rank matrix coefficient in the following model,

(4)

The low-rank matrix can be seen as incorporating the effects of some small number of latent variables on the observed variables. That is, under a true model

(5)

that includes the latent variables explicitly, marginalizing over them induces an additive low-rank term in the coefficient matrix (the derivation is omitted due to space constraints; see [14] for further details). For expositional clarity, we omit the low-rank component in what follows, noting that we can simply substitute the augmented coefficient matrix if we desire such a model. However, in our experiments in section VI, we work with the full model, with the low-rank component included.

We point out that formulating this problem as a generic regression allows us to consider graphs³ as either directed or undirected. For example, to estimate directed graphs on a time series, we may use an autoregressive model of some lag order,

(6)

with one coefficient matrix per lag. To estimate undirected graphs, we may for example use an extemporaneous regression,

(7)

in which each coordinate of the observation is regressed on the remaining coordinates using the corresponding row of the coefficient matrix, whose diagonal is constrained to be zero. This regression has a correspondence to the precision (inverse covariance) matrix. In general, we can also enforce any arbitrary symmetric nonzero structure on the coefficient matrix if we have corresponding prior structural information on the graph. We summarize how the specific settings from (6) and (7) correspond to the generic notation used in equation (3) in Table I.

³ In (3) through (5), we assume a generic model. When dealing with graphs, the dimensions are expressed in terms of the number of agents or nodes of the graph and an integer lag order, and the coefficient matrices take the corresponding dimensions.

TABLE I: Summary of example regression settings and notation for estimating directed and undirected graphs. Columns: notation in (3); AR (directed) in (6); inverse covariance (undirected) in (7).
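The following sketch shows, under assumed array shapes, how the directed AR setting of (6) can be arranged as a generic linear regression of the form used in (3); the helper name and the data layout are illustrative.

import numpy as np

def ar_design(x, M):
    """Stack lagged samples so the AR(M) model x[k] = sum_m A_m x[k-m] + w[k]
    becomes a linear regression Y ~ Z B, with B stacking the A_m."""
    N, K = x.shape                       # N series, K time samples
    Y = x[:, M:].T                       # targets, shape (K-M, N)
    Z = np.hstack([x[:, M - m:K - m].T for m in range(1, M + 1)])  # lagged regressors
    return Y, Z                          # B solving Y ~ Z B has shape (N*M, N)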

When posing the learning as an optimization problem, the loss function from (2) naturally extends to the multivariate case. Other desired structural properties may be incorporated into the framework via regularization. Thus, we can pose the optimization as

(8)

where the loss is

(9)

Some examples of regularizers include symmetric sparsity for sparse partial correlations (conditional independencies) in a Gaussian graphical model [34], group sparsity corresponding to sparse Granger causality [9], and commutativity corresponding to graph filters in the Discrete Signal Processing on Graphs framework [13, 35, 36, 37]. A more detailed discussion of the different choices of regularization and their assumptions is beyond the scope of this paper.

II-C Time-varying Networks

To further reduce clutter when convenient, we define

(10)

which collects the coefficient matrices up through a given time index, and we write the vectorization of a matrix with the corresponding lowercase symbol. Also, we use the various shorthand forms

(11)

so that the model is allowed to change at each time step; the offset quantities are interpreted as the linear part of the prediction made by using the network from one time index on data from another. The actual model itself only makes sense when the two indices coincide, but we will see shortly why we want the notational flexibility to describe these offset quantities.

Returning to our model, we have the familiar

(12)

with a possibly non-linear link function, and an associated loss functional at each time index of the general offset form

(13)

where the vectorized quantities are defined analogously to (10). As currently specified, the model is over-parameterized, with a new network at every time step. In the following, we review assumptions from prior art that restrict the model space and make the model estimation statistically tractable.

II-D Related Works

Optimization based methods also come in several flavors. One approach to imposing a Lipschitz gradient is to simply use a kernel estimator [30] to obtain locally stationary solutions [28] using our offset loss functionals from (13),

(14)

where the weights come from a symmetric kernel with some bandwidth centered at the time index being estimated, and a regularizer promotes sparsity of the network (its form is omitted for simplicity). This is embarrassingly parallel, and allows estimating the network at a single time point without needing to compute those at other time points if they are deemed irrelevant. However, the smoothness of the kernel makes it more challenging to interpret for detecting discrete changepoints, in contrast to the low-TV setting.
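A minimal sketch of the kernel-weighted idea in (14), assuming a Gaussian kernel and an unregularized AR(1) least-squares fit for clarity; the actual estimator of [30] and of this paper also includes a sparsity regularizer, and the function names are illustrative.

import numpy as np

def kernel_weights(K, k0, h):
    """Gaussian kernel weights w_k = exp(-(k - k0)^2 / (2 h^2)), normalized,
    used to localize the regression loss around time index k0."""
    k = np.arange(K)
    w = np.exp(-0.5 * ((k - k0) / h) ** 2)
    return w / w.sum()

def local_ls_network(x, k0, h):
    """Kernel-weighted least-squares AR(1) estimate of the network at time k0:
    minimizes sum_k w_k ||x[:, k] - A x[:, k-1]||^2 (no sparsity regularizer)."""
    N, K = x.shape
    w = kernel_weights(K, k0, h)[1:]             # weights for k = 1..K-1
    Xp, Xc = x[:, :-1], x[:, 1:]                 # past and current samples
    W = np.diag(w)
    # weighted normal equations: A = (Xc W Xp^T)(Xp W Xp^T)^{-1}
    return (Xc @ W @ Xp.T) @ np.linalg.pinv(Xp @ W @ Xp.T)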

Alternatively, an approach for enforcing low total variation [31] is to regularize the difference between the networks at neighboring time points,

(15)

This group-sparse regularizer encourages the difference between networks at adjacent time points to be either all zero or non-zero. The minimization is over all of the networks collected as in (10). It couples the problem across time points, so that the solution is a joint estimator and is no longer so simply parallelized. Furthermore, while this model can capture significant changepoints, it is less able to describe smoother variation.

II-E Local Linear Regression

The main conceptual workhorse for our approach is still kernel regression. However, we consider a locally linear regression instead of a vanilla kernel regression. This has the benefit of lower bias near the boundaries of the dataset, which makes it natural to adopt for changepoint detection. Changepoints can be thought of as boundaries that appear in the middle of the data; alternatively, the boundaries can be thought of as changepoints at the edges of the data domain.

The local linear regression estimator is defined similarly to the kernel estimator. For a clean signal $f$, noise $w_k$, and noisy observations $y_k = f(t_k) + w_k$, the local linear regression estimator at time $t_0$ is

$$\big(\widehat{f}(t_0),\, \widehat{f}'(t_0)\big) \;=\; \underset{\beta_0,\,\beta_1}{\arg\min}\; \sum_{k} K_h(t_k - t_0)\,\big(y_k - \beta_0 - \beta_1 (t_k - t_0)\big)^2, \qquad (16)$$

where $\widehat{f}(t_0) = \beta_0$ is an estimate of the value of the clean signal at time $t_0$, $\widehat{f}'(t_0) = \beta_1$ is an estimate of its slope, and $K_h$ is a symmetric kernel with bandwidth $h$. This estimator, as compared to kernel regression, has lower bias near the edges of the data [38].
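For concreteness, a small numpy sketch of the 1D local linear regression estimator in (16), with a Gaussian kernel; the function and variable names are illustrative.

import numpy as np

def local_linear(t_query, t, y, h):
    """Local linear regression: at each query time, fit a weighted line
    (intercept + slope) with Gaussian kernel weights and return the intercept
    as the signal estimate; the slope is a byproduct."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_query = np.asarray(t_query, dtype=float)
    est = np.empty_like(t_query)
    for i, t0 in enumerate(t_query):
        w = np.exp(-0.5 * ((t - t0) / h) ** 2)           # kernel weights
        Z = np.column_stack([np.ones_like(t), t - t0])   # local basis [1, t - t0]
        W = np.diag(w)
        beta = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
        est[i] = beta[0]                                 # value estimate at t0
    return est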

II-F Matrix Decompositions

Consider the problem of expressing a given matrix as the product of two matrices,

(17)

where the first factor has as many rows as the given matrix, the second factor has as many columns, and the two share a small inner dimension $R$. In general, if the rank of the given matrix is no larger than $R$, then there are infinitely many solutions for the two factors. Particular structure may be imposed on one or both factors to obtain desired properties or interpretations. The two factors can be determined by any number of matrix factorization algorithms, such as a straightforward R-rank singular value decomposition in which

(18)

the factors are formed from the leading R singular vectors, with the singular values split between them by some convention.

This solution can be seen as a common starting point that can give rise to other solutions, since the existence of the SVD is guaranteed and its computation is stable [39], and it yields one nice structure, with the factors defining orthonormal bases (up to scaling by the singular values). However, since we expect the networks in our application to have different types of structure both temporally (smooth) and spatially (sparse), we would naturally expect any reasonably interpretable decomposition to exhibit qualitatively similar behavior. The SVD does not necessarily give us these behaviors. Thus, we consider matrix factorization techniques that yield a temporally smooth weight factor and a spatially sparse network factor (where we associate time with the rows of the stacked matrix via our, arbitrary, choice of stacking in (10)).
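A minimal sketch of the R-rank SVD factorization used as a common starting point; splitting the singular values entirely into the weight factor is one illustrative convention.

import numpy as np

def rank_r_factorization(B, R):
    """Rank-R SVD factorization B ~ W F: W carries the temporal weights and
    F the vectorized networks; F has orthonormal rows but is not sparse."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    W = U[:, :R] * s[:R]      # absorb singular values into the weight factor
    F = Vt[:R]                # orthonormal rows
    return W, F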

There are many algorithms for performing matrix factorization, such as probabilistic frameworks that utilize sampling [23]. Again, we focus on optimization based frameworks. Many of these tend to center around variational methods [24] or ADMM [40, 25]. Ultimately, we choose to use inertial Proximal Alternating Linearized Minimization (iPALM) [41]. The iPALM method has similarities in form with ADMM, as they both utilize the separability of the objective function and computationally fast proximal operators. However, iPALM guarantees convergence to stationary points for certain types of nonconvex optimization problems, including regularized matrix factorization, via an adaptive choice of step size. We provide more details on the implementation in section V.

III Smooth Graph Regression

We propose an optimization based approach that learns time varying graph structure jointly based on the full time series data. Our method offers the parallelizability of the previous smooth window-based approach.

III-A Formulation and computation

We formulate the locally linear regression based optimization problem,

(19)

for each time point, where

(20)

and where a regularizer acts on the structure of the network estimate, while an auxiliary matrix corresponds to the instantaneous time derivative of the network and is essentially a nuisance parameter in our setting. To reduce notational clutter, we have omitted the low-rank component, but the formulation follows exactly analogously from equation (4) by considering the additive low-rank contribution from the latent variables (again, see [14] for details). For the regularizer, we can consider for example a group sparsity that corresponds to Granger causality when the estimates are autoregressive (AR) coefficient matrices, similar to before,

(21)

with the groups collecting the elements of the respective AR coefficient matrices across all lag orders.

With this estimator, we can see that the problem remains trivially parallelizable across time points,

(22)

where

(23)

and where the regularizer acts on the network estimate at a single time point.

III-B Estimation Procedure

Since our optimization problems (19) are convex, to produce our estimates, we can use any number of convex algorithms to solve the formulation. For concreteness, we provide an outline for one such approach using proximal gradient methods.

The required gradient computations are

(24)

where a binary mask enforces the prescribed zero structure. We collect and stack the gradients across time points in the same way that (10) collects and stacks the coefficient matrices,

(25)

Finally, we need the proximal operators for the regularizers,

(26)

For the group-sparse regularizer, we have

(27)

where the group indexing matches that of the AR coefficient matrices within their stacked form; this corresponds to a form of group soft thresholding that can be performed quickly.
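A small sketch of group soft thresholding, the proximal operator of a sum of group l2 norms as used in (27); representing the groups as a list of index arrays is an assumption made for illustration.

import numpy as np

def group_soft_threshold(v, groups, lam):
    """Proximal operator of lam * sum_g ||v_g||_2: shrink each group's norm
    toward zero, setting whole groups to zero when their norm <= lam."""
    out = v.copy()
    for g in groups:                       # g is an index array for one group
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * v[g]
    return out

# usage: group_soft_threshold(np.random.randn(6),
#                             [np.array([0, 1, 2]), np.array([3, 4, 5])], 0.5)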

To put it all together, algorithm 1 describes the full proximal gradient implementation.

1: Input: regularization parameter, maximum number of iterations.
2: Initialize the network and derivative estimates to zero or randomly; set the iteration counter and an initial step size.
3: while not converged and under the iteration limit do (in parallel over time points)
4:     Compute gradients using equation (24):
(28)
5:     Compute proximal step:
(29)
6:     Update the iterate, the step size, and the iteration counter.
7: end while
8: return the estimated time-varying networks
Algorithm 1: TV Graphs using proximal gradient descent

IV Changepoints

In section III, the estimation procedure handles smoothly varying networks. However, we may also wish to capture abrupt structural changepoints. Doing so has the additional advantage of preventing the estimate of a network from being contaminated by time series data generated before or after a significant change. We consider this here. We borrow a clever use of local linear regression [42, 43] for estimating smooth 1D signals with discrete discontinuities. We can formulate three related problems with varying kernels,

(30)

with superscripts denoting the left, center, and right estimators, respectively; the only difference between the estimators lies in the kernel weights. Specifically, the base kernel is still a symmetric kernel centered at the time point of interest, but the left (respectively right) estimator zeroes out the weights of samples after (respectively before) that time point,

(31)

The intuition behind this suite of estimators is that each will perform best at different time points relative to the locations of changepoints. Consider a single changepoint. At time points slightly before the changepoint (within roughly a kernel bandwidth of it), both the right and center estimators use data from the two different networks on either side of the changepoint, while the left estimator only uses data from one network (to leading order); hence, we would expect the left estimator to be better there. Similarly, we expect the right estimator to perform best at time points slightly after the changepoint. Finally, with no changepoint nearby, we expect the center estimator to be the best since it uses more relevant data.

Given our intuition on changepoints, how do we quantitatively decide which of the three estimators to use? At each time point, we can compute the empirical residuals of each estimator,

(32)

Comparing the best one-sided residual against the central residual scaled by a threshold factor, we take the estimate

(33)

where the threshold factor theoretically should be chosen according to assumptions about the problem setting, such as the noise variance, the minimum magnitude of the changepoints, and the kernel shape, to name a few. In practice, it can be chosen via some validation procedure. To elaborate, at one extreme of the threshold we always choose the central estimator and declare no changepoints (and may skip computing the one-sided estimators), while at the other extreme we never choose the central estimator (and may skip computing it). Sweeping this range of values draws out the ROC curve of the changepoint detector. In fact, it is shown in [42, 43] that certain threshold values (theoretically) imply never using the central estimator in the interior of the interval, and thus they advise against those values.
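A hedged sketch of this selection rule: given per-time-point empirical residuals of the left, right, and center estimators, pick the one-sided estimator wherever it beats the central one by the threshold factor; the array-based interface and the exact form of the comparison are illustrative.

import numpy as np

def choose_estimator(res_left, res_right, res_center, lam):
    """Per time point, select 'left', 'right', or 'center' by comparing the
    best one-sided residual against lam times the central residual; also
    return a flag marking time points where a one-sided estimator wins."""
    res_side = np.minimum(res_left, res_right)        # best one-sided fit
    use_sided = res_side < lam * res_center           # decision rule
    which = np.where(res_left <= res_right, "left", "right")
    return np.where(use_sided, which, "center"), use_sided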

Finally, the changepoints can be detected by considering the set of indices at which the selected estimator crosses over between the left and right one-sided estimators,

(34)

This tends to produce smooth behavior except near the changepoints, where the one-sided estimators are used. As an extra post-processing step, we may compute the central estimate at the boundaries of the contiguous segments to ensure smoothness in these regions.

This intuition also leads to a speedup of the original detection scheme for a generic threshold. We first find all the potential changepoints as determined by these crossovers. Then, only at the two time points on either side of each crossover, we compute the central estimator to verify that the left and right estimators indeed fit better than the central one at those points.

This leads to the modified procedure in algorithm 2, which describes the full proximal gradient implementation with changepoint detection.

1: Input: threshold factor, regularization parameter, maximum number of iterations.
2: Initialize the left, center, and right estimates to zero or randomly; set the iteration counter and an initial step size.
3: while not converged and under the iteration limit do (in parallel over time points)
4:     Compute gradients using equation (24):
(35)
5:     Compute proximal step:
(36)
6:     Update the iterate, the step size, and the iteration counter.
7: end while
8: for each time point do (in parallel)
9:     Compute empirical errors and select the estimator indices:
(37)
10: end for
11: Compute changepoints:
(38)
12: (Optional, for smoothness) Compute the central estimate near segment boundaries.
13: return the estimated time-varying networks and changepoints
Algorithm 2: TV Graphs with changepoints

V Principal Network Analysis

While our method for estimating time-varying networks is fairly general, it produces graphs that vary at every time point. Network analysis is in general difficult, and analysis of such an ensemble of networks is even more difficult and may obscure the interpretation of the application. We discuss an alternative, simpler representation of the time-varying networks as a (time-varying) weighted linear combination of a small set of so-called "Principal Networks" or eigennetworks. Stacking the vectorized graphs into a single matrix,

(39)

we can decompose the set of time-varying graphs (each row of the stacked matrix is the vectorization of the network at one time index) as a product of two factors,

(40)

Each row of the first factor is the vector of weights or eigenfeatures at one time index; each column of the first factor is the time variation of one eigenfeature; and the rows of the second factor are the vectorized principal or eigennetworks. Equation (40) thus expresses each network (each row of the stacked matrix) as a weighted linear combination of the eigennetworks. The two factors can be determined by any number of matrix factorization algorithms. We use an inertial version of Proximal Alternating Linearized Minimization (PALM, or iPALM for the inertial version) because of its theoretical guarantee of convergence to stationary points without requiring a tuning parameter [44, 41].

We perform the matrix factorization directly on the output of algorithm 1 (or 2). Before posing the optimization problem, we introduce the regularization to be used,

(41)

We let the sparsity regularizer act on the network factor; for concreteness, in the autoregressive (AR) setting we may choose a group regularization corresponding to Granger causality,

(42)

with the groups, as before, collecting the elements of the respective AR coefficient matrices across all lag orders.

Also, let our low-rank regularizer be

(43)

Now, we give the matrix factorization optimization problem,

(44)

where the subindex mf stands for matrix factorization. Principal Network Analysis (PNA) yields a sparse network factor whose rows can be viewed as our principal networks, which fundamentally underlie the process and are present at any given time point in some weighted linear combination with each other, with temporally smoothly varying weights or eigenfeatures, potentially with changepoints, inherited from the estimated networks. We could alternatively regularize the weight factor to be smooth (e.g., via a group-sparse version of Total Generalized Variation [45]) while not regularizing the network factor, or even regularize both. However, since we perform this factorization on the estimated networks, which are already sparse and smooth, doing so is somewhat redundant and only increases the computational complexity.

To solve our nonconvex problem (44) using iPALM, we require gradient computations for the smooth part of the loss function and proximal operator computations for the non-smooth parts. Let

(45)

The required gradient computations are

(46)

To apply iPALM, we also need to compute upper bounds on the block-wise Lipschitz constants of these gradients,

(47)

Thus we consider

(48)

Similarly,

(49)

These upper bounds on the Lipschitz constants adaptively determine the step sizes in each coordinate block at each iteration; they are slightly loose but more straightforward to compute than some other possible tighter bounds. Thus iPALM requires additional set-up up front, but does not need step-size tuning, in contrast to ADMM, which avoids the Lipschitz computation but has a step-size or learning-rate parameter that can require proper tuning in more challenging non-convex problems [40, 25].

Finally, we need the proximal operators of the regularizers,

(50)

which can be computed quickly using soft thresholding. Putting everything together, we have algorithm 3.

1: Input: regularization parameters, maximum number of iterations, and convergence tolerance. Overloading superscripts to denote the iteration, initialize the weight and network factors randomly or with the R-rank SVD.
2: while the error exceeds the tolerance and the iteration limit has not been reached do
3:     Set the inertial coefficient.
4:     Update the weight factor with an inertial proximal gradient step.
5:     Update the network factor with an inertial proximal gradient step.
6:     Compute the error.
7: end while
8: return the estimated eigenfeatures and eigennetworks
Algorithm 3: PNA using iPALM with two coordinate blocks
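For intuition, here is a simplified, non-inertial PALM-style sketch for a sparse factorization: alternating proximal gradient steps with step sizes set from the per-block Lipschitz constants. It omits the inertial terms, the group structure, and the low-rank regularizer of the actual algorithm 3, and the names are illustrative.

import numpy as np

def soft(x, t):
    """Entrywise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def palm_sparse_factorization(B, R, lam, n_iter=200):
    """Alternating proximal-gradient (PALM-style, no inertia) for
    min_{W,F} 0.5 ||B - W F||_F^2 + lam ||F||_1, with step sizes taken from
    the per-block Lipschitz constants ||F F^T||_2 and ||W^T W||_2."""
    T, P = B.shape
    rng = np.random.default_rng(0)
    W, F = rng.standard_normal((T, R)), rng.standard_normal((R, P))
    for _ in range(n_iter):
        Lw = np.linalg.norm(F @ F.T, 2) + 1e-12
        W = W - ((W @ F - B) @ F.T) / Lw                   # gradient step in W
        Lf = np.linalg.norm(W.T @ W, 2) + 1e-12
        F = soft(F - (W.T @ (W @ F - B)) / Lf, lam / Lf)   # prox step in F
    return W, F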

We note that a natural direction for future work would be to combine these two steps, first estimating a time-varying graph and then performing matrix factorization, into a single joint formulation that directly learns the factors, or to iterate between the two steps as subproblems. This could bridge the gap between a full temporally coupled solution and its computation, since the factorized form has fewer total parameters to estimate.

VI Experiments

We test the time-varying graph estimation on simulated data and on two real data sets. We perform principal network analysis on the results of a US Senate voting record data set to yield principal networks that, as we show, have interpretable meaning in the context of the contemporary political climate. We also analyze gene interaction networks for the common fruit fly, the Drosophila Melanogaster, as it goes through the four metamorphosis stages of its life.

VI-A Simulated Data

We explain how we generate the time-varying networks and the data. For lack of space, we report results for a single choice of parameters: the number of nodes, the number of eigennetworks (two), the number of changepoints (one), the order of the autoregressive model, and the length of the time window. At each time step, the network is a weighted linear combination of the two eigennetworks; from equations (6) and (39), at each time step we therefore need the two eigenfeatures (weights) and the two eigennetworks.

We generate the eigennetworks as weighted Erdös-Rényi networks with a fixed probability of an edge and asymmetric random edge weights drawn from a normal distribution, with diagonal entries generated uniformly at random. In this study, we keep the eigennetworks fixed across the time window. We then generate the changepoint location uniformly at random within the time window. At each time step, the eigenfeatures or weights of the linear combination are the sum of a fixed level and a sinusoidal term. Because we have two eigennetworks, we have two weights at each time step. The fixed level of each weight is kept constant in between changepoints and is drawn uniformly at random. The parameters of each of the two sinusoids do not change over the whole time window, i.e., they remain the same across the changepoint. The amplitude of each sinusoid is drawn uniformly at random, the period is the same for both and equal to the duration of the time window, and the phase of one of the sinusoids is taken to be zero while the other is given a fixed nonzero offset. By linearly combining the two (fixed) Erdös-Rényi eigennetworks weighted by the eigenfeatures so generated, we create the time-varying networks. These time-varying networks all share the same structure (fixed eigennetworks) but vary continuously because of the sinusoidal terms in the eigenfeatures, except at the changepoint, where the weights jump because the fixed levels jump. This makes it a fairly difficult setting conceptually, as the structure of the eigennetworks remains constant. We also generate data supported on these networks according to an autoregressive process.
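A hedged sketch of a generator in the spirit of this simulation: fixed weighted Erdős-Rényi eigennetworks combined with sinusoid-plus-level eigenfeatures that jump at a single changepoint; all constants, distributions, and names are illustrative stand-ins for the (unreported) exact settings.

import numpy as np

def simulate_graphs(N=10, K=1000, R=2, p_edge=0.2, seed=0):
    """Generate K graphs on N nodes as weighted combinations of R fixed
    Erdos-Renyi eigennetworks, with a single changepoint halfway through."""
    rng = np.random.default_rng(seed)
    nets = []
    for _ in range(R):
        mask = rng.random((N, N)) < p_edge
        A = rng.standard_normal((N, N)) * mask          # asymmetric random weights
        np.fill_diagonal(A, rng.uniform(-0.5, 0.5, N))  # random diagonal entries
        nets.append(A)
    cp = K // 2                                         # single changepoint
    k = np.arange(K)
    graphs = np.zeros((K, N, N))
    for r, A in enumerate(nets):
        level = np.where(k < cp, rng.uniform(0.2, 1.0), rng.uniform(0.2, 1.0))
        amp = rng.uniform(0.1, 0.5)
        phase = 0.0 if r == 0 else np.pi / 2
        w = level + amp * np.sin(2 * np.pi * k / K + phase)   # eigenfeatures
        graphs += w[:, None, None] * A
    return graphs, cp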

Fig. 1: Graphs of the top 25 edges of the estimated and true eigennetworks

Figure 1 shows the true and estimated eigennetworks used to generate the data. Structure-wise, the eigennetworks are estimated fairly well, with low probabilities of false alarm and conservative probabilities of detection for both eigennetworks.⁴ For visualization, we desire a low false alarm rate to avoid clutter and therefore end up at an operating point with conservative detection. We also note that this is a difficult setting, since both networks are strongly active in the first half of the time series and less active in the second half. Thus the two networks are difficult to distinguish in noise, and we see some edges present in both estimates that are only present in one true eigennetwork (e.g., between nodes 16 and 20). A fuller characterization of the ROC would be an interesting direction for future work.

⁴ The probability of false alarm of an edge is the probability that an estimated edge is not present in the true eigennetwork. Likewise, the probability of detection is the probability that an edge in the true eigennetwork is present in the estimated eigennetwork. Because edges are weighted, we threshold the weights.

Fig. 2: Graphs of the top 50 edges

Figure 2 shows that, near the detected changepoint and using the same sparsity regularization parameter, the networks estimated with changepoint detection better capture the weights on either side of the changepoint, while the central estimator averages too strongly across the changepoint, resulting in weaker edges. Figure 3 compares the performance of our estimation method with that of the central estimator.

Fig. 3: Time series of elements of graph adjacency matrix and corresponding errors

In figure 3, each line represents the time series of the weight corresponding to one edge. We see that, using our method, the changepoint is estimated well, with the overall error comparing favorably to that of the plain central kernel estimate. We note that the estimated magnitudes of the edge weights are shrunken as a result of the regularization, and there is some bias toward zero at the boundaries. This suggests that an extension of the method would let the regularization parameter vary through the interval rather than stay constant; a proper time profile of its value would be of interest.

VI-B US Senate Voting Data

We also compiled votes from the United States senate roll call records of sessions 111-112, corresponding to the period 2010-2011 [46, 47]. There are two senators from each of the 50 states, so that at most 100 senators vote on each item. However, instead of directly tracking a growing and shrinking network composed of individual senators, we track the seat that each senator occupies, so that the time series of votes generated by senators continues to be produced by their replacements. We treat yes votes as +1, no votes as -1, and abstentions/vacancies as 0, so that the time series is ternary-valued (see figure 4).
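A small sketch of the vote encoding described above; the input layout (a seats-by-votes array of roll-call strings) is an assumption made for illustration.

import numpy as np

def encode_votes(raw):
    """Map roll-call strings to the numeric series used here:
    'Yea' -> +1, 'Nay' -> -1, anything else (abstention/vacancy) -> 0."""
    codes = {"Yea": 1.0, "Nay": -1.0}
    vec = np.vectorize(lambda v: codes.get(v, 0.0))
    return vec(np.asarray(raw, dtype=object))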

Fig. 4: Time series of votes by seat, yellow (+1) for Yea, blue (-1) for Nay, and green (0) for abstention/vacant seat.
Yea - Nay | Outcome | Purpose
97 - 0 | Agreed to | A resolution honoring the victims and heroes of the shooting…in Tucson, Arizona.
96 - 1 | Agreed to | …to provide penalties for aiming laser pointers at airplanes…
100 - 0 | Confirmed | Confirmation Robert S. Mueller, III, of California, to be Director of the Federal Bureau of Investigation
0 - 97 | Rejected | …budget request for the United States Government for fiscal year 2012, and…budgetary levels for fiscal years 2013 through 2021.
TABLE II: Examples of unanimous/procedural votes

Since a significant portion of votes is nearly unanimous and/or procedural according to senate rules, as opposed to actually debated and/or legislative and thus truly informative, it is unclear that modeling the effect that temporally neighboring votes have on each other is more meaningful than recovering the reciprocal relations (at least using the frameworks presented in this paper). For some examples of such unanimous or procedural votes, see table II [47]. Thus, we apply the undirected graph estimation model from equation (7), with the low-rank component included, to track the network of relations among the senate.

We use a slightly different sparsity regularization to account for the symmetric structure of the adjacency matrix to be estimated,

(51)
Fig. 5: Top: time series of entries of estimated graph adjacency. Each color is one time element of the matrix. Middle: Changepoints detected by algorithm. Bottom: Actual party affiliations for each seat, yellow for Republican, blue for Democrat.

We see at the top of figure 5 the individual entries of the estimated adjacency matrix, one color for each entry, plotted as a function of time. In the middle, we see that the algorithm detected a changepoint at timepoint 349. At the bottom of the figure, we see that a new session actually started (the transition from session 111 to 112) at timepoint 264 (Jan. 2011), in which 5 new senators of the opposite political party to the senators they replaced were seated (Republicans replacing Democrats). This happened as the so-called "Tea Party," which started as a vocal grassroots movement and was active in organizing national demonstrations, gained major traction in the US Republican Party [48]. The estimate of the changepoint is not in fact wrong: since the available data is the voting record of the senate, what this analysis shows is that the impact of the election is felt not immediately but only after the initial pro forma votes, once the senate settles down to its real business of drafting and voting on impactful legislation. The lag in detection corresponds to the extended segment of largely uninformative, almost unanimous votes at the beginning of the new session (see figure 4, where, between columns 264 and 349, the columns are mostly of a single color, either yellow or blue, with few dissents), so that the change only manifests itself after this initial stage, as the senate settles into the new session.

In figures 6 and 7, we compare the networks estimated using our method and using the central kernel estimate at time points before and after both the start of the new session and the detected changepoint. The top layouts are obtained using a planar force-directed embedding based on the estimated edges (in particular, we use the Fruchterman-Reingold [49] layout as implemented in MATLAB® version 2018a). Intuitively, shorter average path lengths between nodes correspond to larger forces and smaller distances in the planar embedding. The bottom layouts are computed using the top layouts as initialization. Neighboring points vote more similarly, signifying ideological closeness. We additionally label seats according to the ground truth party of the senator occupying them at that time, with Republicans in red, Democrats in blue, and the Independent in yellow.

Fig. 6: Top: Estimated adjacency graph from middle of first session; Bottom: Estimated adjacency graph from middle of second session; Left: Our method; Right: Central estimator

In figure 6, we can see that the graph clearly shows the polarization between the two parties. Interestingly, the Independent is shown as ideologically closer to the Democrats but on the edge closest to the Republicans. Indeed, the Independent is Senator Sanders, who holds views espoused by both parties but in 2016 ultimately ran in the Democratic primary race to try to become that party's presidential candidate. In addition, we point out two seats, one from Arkansas and one from Indiana, that were Democratic prior to the election but switched to Republican afterwards. These senators were on the boundary of the Democratic cluster to begin with, which is consistent with the fact that both states historically lean Republican as measured by previous presidential elections (2008 was in fact an exception for Indiana, while 1992 and 1996 were exceptions for Arkansas, as the Democratic candidate was from Arkansas) [50, 51]. Also, we note that the shapes of the clusters change, with the Democratic clique flattening while the Republican clique stretches. We must be careful in drawing conclusions from this phenomenon, but it could weakly support (i.e., not provide any evidence to disprove) the conclusions drawn by political researchers claiming that official recognition of the "Tea Party Movement" and its unity resulted in the Republican Party moving ideologically further away from the previous center of the political space [52, 53]. Finally, we see that the two methods estimate very similar graphs, showing that our method performs similarly to the smooth central kernel estimator in regimes away from changepoints.

Fig. 7: Top: Estimated adjacency graph from the first session near the detected changepoint; Bottom: Estimated adjacency graph from the second session near the detected changepoint; Left: Our method; Right: Central estimator

In figure 7, we visualize graphs estimated on either side of the detected changepoint, this time at time points much closer to it. We see that the central kernel estimate shows less change between the two timepoints, while there is a noticeable difference across the changepoint with our method. In the top row, there is also some movement relative to the timepoints visualized in figure 6, possibly corresponding to posturing ahead of the election. Continuing our previous interpretations, this could be the existing Republicans trying to maintain their seats against "Tea Party" primary challengers [52]. We clarify one subtlety of this interpretation: though the visualized timepoint is actually after the beginning of the new session, the kernel bandwidth is such that votes from the old session still influence the estimated graph in both methods, as the visualized timepoint still occurs before the detected changepoint.

Fig. 8: Principal Networks of the senate voting record; Bottom-right: relative time-varying weights for each network

Finally, in figure 8 we visualize the result of PNA applied to this dataset, again using a force-directed layout. This time, we do not label the nodes by color, since these networks now span all time, so several seats could be either red or blue depending on the time. We find that the first two Principal Networks (PNs) capture the main overall polarized structure of the two parties. PN 3, on the other hand, seems to highlight the most salient and influential seats in the senate. In this case, we see visually that the time series of weights for PN 3 has the largest time derivatives, so that the network corresponds to the edges that changed the most throughout the dataset. Again, Independent senator Sanders (89) sits towards the interior of the network, indicating the existence of edges to many seats with all manner of ideologies. We see that seat 25 is also towards the center of this network. Interestingly, this is Democratic senator Burris of Illinois, who was specially appointed by the governor as a result of then-senator Obama's resignation after his election to the presidency in 2008, and who retired prior to the beginning of the new session, to be replaced by Republican senator Kirk. This change in ideologies, occurring before the single detected changepoint, was clearly not large enough in magnitude to result in another changepoint; rather, it led to the seat appearing to share ideologies across both parties and to its strong presence in PN 3 as a seat with many sufficiently large edges changing smoothly.

VI-C Gene Interaction Networks

We examine genetic expression data for the Drosophila Melanogaster [54] that covers the four stages of its life cycle: egg, larva, pupa, and adult. The expression dataset collects binarized expression values for thousands of genes at time points spanning the whole life cycle, with the egg (embryonic) stage covering roughly the first 30 time points, followed by the larval, pupal, and adult stages. Of course, trying to estimate a network on so many genes is ill-posed, even when imposing high levels of sparsity. Thus, as with previous attempts at network analysis on similar gene expression data [55], we select a subsample of developmental genes and analyze a random subset of them. In the context of this dataset, for ease of presentation to a non-biologist audience, we overload the term "gene" to refer to either genes or the proteins they encode, and label them interchangeably in figures.

We again use the symmetric regularization in equation (51) to estimate the interaction networks or eigennetworks. This is because by subsampling we have less ability to determine causality via directed networks, even when including the low-rank network components corresponding to latent variables. However, we see that we are still able to glean meaningful information from the data.

Fig. 9: The eigennetworks estimated from gene expression data: (a) Net 1, (b) Net 2, (c) Net 3, (d) Net 4
Fig. 10: The weights for the eigennetworks estimated from gene expression data

Figure 9 shows the eigennetworks estimated from the data displayed using a hierarchical edge bundling [56] layout in R [57] and figure 10 shows their corresponding weights through time.

In figure 9(a), the first eigennetwork, we point out the detection of a connection between the "kay" and "hpo" genes. Both interact directly with the gene "sd", which is not in this subsample but is part of a signalling pathway that determines organ size during early development [58]. In figure 9(b), the second network shows as active the genes that have high activity during larval development. In particular, it is worth noting that the "caps" gene, the most apparently central in this network, has been specifically implicated in pre-pupal wing development, which is one of the major aspects of Drosophila metamorphosis and is supported by other active processes in the pre-pupal stages [59]. In figure 9(c), the third network shows that "caps" is still active, but now interacting with "hpo", which also has a role in photoreceptor neuronal (eye) development [60]. Interestingly, "caps" has a secondary role after pre-pupal wing development, which is precisely pupal retinal development [61]. We further note that "hpo" and "sd" also play a role in cancer suppression later on, while "kay" is not known to; we see that both "hpo" and "kay" stay active, but now in separate subnetworks rather than directly connected as in the first eigennetwork. Finally, in figure 9(d) we see the few genes active during adulthood. For example, "srp" is responsible for glucose metabolism, and "spir" is implicated in the formation of new eggs (oogenesis) [62]; the development of the new embryo from the new egg depends on proper energy storage by the mother [63].

We note that changepoints are detected at time points 7, 39, and 58. The changepoints at 39 and 58 are consistent with the transitions from larva to pupa and from pupa to adult, while the changepoint at 7, though not at one of the four major life stages, reflects the fact that the first 7 time steps represent the first few "highly dynamic" hours between fertilization and gastrulation; the genes that initially express are maternal, followed by genes that express only after fertilization, due to the new zygote [54]. The transition from egg to larva at time point 30 is not detected, but we still see evidence of the change in the time series of coefficients. In fact, we can take this as evidence that the change in gene expression during the transition from egg to larva is relatively smooth rather than abrupt at this time resolution, which is again consistent with previous gene expression studies [54]. We also note that this subsample unfortunately does not contain any gene networks that are significantly active during the pupal stage. Still, the time series of network weights that the algorithm detects corresponds to distinct regimes that line up well with the other known developmental stages.

VII Conclusion

We have presented a framework for learning the principal networks or eigennetworks (a small set of fundamental graphs) and the eigenfeatures that underlie a system described by smoothly varying graphs with potential changepoints. This framework includes a time-varying graph estimation method that is easily parallelized and offers a middle ground between existing methods for estimating time-varying graphs. The framework additionally uses a provably convergent matrix factorization to find the fundamental graphs or eigengraphs. We demonstrate the framework on simulated data as well as on 1) real US senate voting records, identifying interesting entities within the network and the times at which salient events occurred; and 2) genetic expression data for the Drosophila Melanogaster as it goes through its four stages of metamorphosis.

Interesting directions for future work include developing additional guarantees for each step of the approach and characterizing how errors propagate through the two steps. The application of this method to other areas of science would also be exciting, especially in budding areas of neuroscience. A final area of interest is combining the frameworks for learning graph time series and for performing matrix factorization into a single problem solved by a numerically stable algorithm, again with performance guarantees.

References

  • [1] G. Deshpande, P. Santhanam, and X. Hu, “Instantaneous and causal connectivity in resting state brain networks derived from functional MRI data,” NeuroImage, vol. 54, no. 2, pp. 1043–1052, Jan. 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S105381191001205X
  • [2] K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger, and G. P. Nolan, “Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data,” Science, vol. 308, no. 5721, pp. 523–529, Apr. 2005. [Online]. Available: http://www.sciencemag.org/content/308/5721/523
  • [3] P. A. Valdés-Sosa, J. M. Sánchez-Bornot, A. Lage-Castellanos, M. Vega-Hernández, J. Bosch-Bayard, L. Melie-García, and E. Canales-Rodríguez, “Estimating brain functional connectivity with sparse multivariate autoregression.” Philos. Trans. R. Soc. Lond. B. Biol. Sci., vol. 360, no. 1457, pp. 969–81, May 2005. [Online]. Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1854937&tool=pmcentrez&rendertype=abstract
  • [4] S. T. Roweis and L. K. Saul, “Nonlinear Dimensionality Reduction by Locally Linear Embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000. [Online]. Available: http://www.sciencemag.org/content/290/5500/2323
  • [5] O. Livne and A. Brandt, “Lean Algebraic Multigrid (LAMG): Fast Graph Laplacian Linear Solver,” SIAM Journal on Scientific Computing, vol. 34, no. 4, pp. B499–B522, Jan. 2012. [Online]. Available: https://epubs.siam.org/doi/abs/10.1137/110843563
  • [6] A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao, “Neighborhood Based Fast Graph Search in Large Networks,” in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’11.   New York, NY, USA: ACM, 2011, pp. 901–912. [Online]. Available: http://doi.acm.org/10.1145/1989323.1989418
  • [7] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
  • [8] F. R. Bach and M. I. Jordan, “Learning graphical models for stationary time series,” IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2189–2199, Aug. 2004.
  • [9] A. Bolstad, B. Van Veen, and R. Nowak, “Causal network inference via group sparse regularization,” IEEE Transactions on Signal Processing, vol. 59, no. 6, pp. 2628–2641, Jun. 2011.
  • [10] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Laplacian matrix learning for smooth graph signal representation,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2015, pp. 3736–3740.
  • [11] V. Kalofolias, “How to learn a graph from smooth signals,” in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, A. Gretton and C. C. Robert, Eds., vol. 51.   Cadiz, Spain: PMLR, 09–11 May 2016, pp. 920–929. [Online]. Available: http://proceedings.mlr.press/v51/kalofolias16.html
  • [12] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, “Network topology inference from spectral templates,” IEEE Transactions on Signal and Information Processing over Networks, vol. 3, no. 3, pp. 467–483, Sep. 2017.
  • [13] J. Mei and J. M. F. Moura, “Signal processing on graphs: causal modeling of unstructured data,” IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 2077–2092, Apr. 2017.
  • [14] ——, “SILVar: Single index latent variable models,” IEEE Transactions on Signal Processing, vol. 66, no. 11, pp. 2790–2803, Jun. 2018.
  • [15] Y. Shen, B. Baingana, and G. B. Giannakis, “Kernel-based structural equation models for topology identification of directed networks,” IEEE Transactions on Signal Processing, vol. 65, no. 10, pp. 2503–2516, May 2017.
  • [16] L. Sirovich and M. Kirby, “Low-dimensional procedure for the characterization of human faces,” J. Opt. Soc. Am. A, vol. 4, no. 3, pp. 519–524, March 1987, publisher: Optical Society of America.
  • [17] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
  • [18] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: Recognition using class specific linear projection,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 19, no. 7, pp. 711–720, July 1997.
  • [19] R. Chellappa, C. L. Wilson, and S. Sirohey, “Human and machine recognition of faces: A survey,” Proceedings of The IEEE, vol. 83, no. 5, pp. 705–740, May 1995.
  • [20] A. Elton and W. Gao, “Task-positive Functional Connectivity of the Default Mode Network Transcends Task Domain,” Journal of Cognitive Neuroscience, vol. 27, no. 12, pp. 2369–2381, Dec. 2015.
  • [21] E. Davidson, The Regulatory Genome: Gene Regulatory Networks In Development And Evolution.   Academic Press, Jun. 2006. [Online]. Available: http://www.amazon.ca/exec/obidos/redirect?tag=citeulike09-20&path=ASIN/0120885638
  • [22] M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces,” in 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Proceedings, Jun. 1991, pp. 586–591.
  • [23] R. Salakhutdinov and A. Mnih, “Bayesian probabilistic matrix factorization using Markov chain Monte Carlo,” in ICML ’08: Proceedings of the 25th International Conference on Machine Learning, 2008.
  • [24] D. Dueck and B. Frey, “Probabilistic sparse matrix factorization,” Univ. Toronto Tech. Rep. PSI-2004, 2004. [Online]. Available: http://www.psi.toronto.edu/psi/pubs/2004/PSI-TR-2004-23.pdf
  • [25] D. Hajinezhad, T. H. Chang, X. Wang, Q. Shi, and M. Hong, “Nonnegative matrix factorization using ADMM: Algorithm and convergence analysis,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2016, pp. 4742–4746.
  • [26] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust Principal Component Analysis?” J. ACM, vol. 58, no. 3, pp. 11:1–11:37, Jun. 2011. [Online]. Available: http://doi.acm.org/10.1145/1970392.1970395
  • [27] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1, pp. 37–52, Aug. 1987. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0169743987800849
  • [28] R. Dahlhaus, “Locally stationary processes,” Handbook of Statistics, vol. 30, pp. 351–412, 2012.
  • [29] A. Chambolle, “An Algorithm for Total Variation Minimization and Applications,” Journal of Mathematical Imaging and Vision, vol. 20, no. 1-2, pp. 89–97, Jan. 2004. [Online]. Available: https://link.springer.com/article/10.1023/B:JMIV.0000011325.36760.1e
  • [30] L. Song, M. Kolar, and E. P. Xing, “Time-varying dynamic Bayesian networks,” in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, Eds.   Curran Associates, Inc., 2009, pp. 1732–1740. [Online]. Available: http://papers.nips.cc/paper/3716-time-varying-dynamic-bayesian-networks.pdf
  • [31] M. Kolar and E. P. Xing, “Estimating networks with jumps,” Electronic Journal of Statistics, vol. 6, pp. 2069–2106, 2012. [Online]. Available: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4085697/
  • [32] L. M. Bregman, “The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming,” USSR Computational Mathematics and Mathematical Physics, vol. 7, no. 3, pp. 200–217, Jan. 1967. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0041555367900407
  • [33] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences,” Journal of Machine Learning Research, vol. 6, pp. 1705–1749, Oct. 2005. [Online]. Available: http://www.jmlr.org/papers/v6/banerjee05b.html
  • [34] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical Lasso,” Biostatistics, vol. 9, no. 3, pp. 432–41, Jul. 2008. [Online]. Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3019769&tool=pmcentrez&rendertype=abstract
  • [35] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
  • [36] ——, “Discrete signal processing on graphs: Frequency analysis,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3042–3054, Jun. 2014.
  • [37] ——, “Big Data Analysis with Signal Processing on Graphs: Representation and processing of massive data sets with irregular structure,” IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 80–90, Sep. 2014.
  • [38] J. Fan and I. Gijbels, “Local linear smoothers in regression function estimation,” North Carolina State University, Technical Report 2055, 1991.
  • [39] L. N. Trefethen and D. Bau III, Numerical Linear Algebra.   SIAM, 1997.
  • [40] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, Jan. 2011. [Online]. Available: http://dx.doi.org/10.1561/2200000016
  • [41] T. Pock and S. Sabach, “Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems,” SIAM Journal on Imaging Sciences, vol. 9, no. 4, pp. 1756–1787, Jan. 2016. [Online]. Available: https://epubs.siam.org/doi/10.1137/16M1064064
  • [42] P. Qiu, “A jump-preserving curve fitting procedure based on local piecewise-linear kernel estimation,” Journal of Nonparametric Statistics, vol. 15, pp. 437–453, 2003.
  • [43] I. Gijbels, A. Lambert, and P. Qiu, “Jump-preserving regression and smoothing using local linear fitting: a compromise,” Annals of the Institute of Statistical Mathematics, vol. 59, pp. 235–272, 2007.
  • [44] J. Bolte, S. Sabach, and M. Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Mathematical Programming, vol. 146, no. 1-2, pp. 459–494, Aug. 2014. [Online]. Available: https://link.springer.com/article/10.1007/s10107-013-0701-9
  • [45] K. Bredies, K. Kunisch, and T. Pock, “Total generalized variation,” SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, Jan. 2010. [Online]. Available: https://epubs.siam.org/doi/abs/10.1137/090769521
  • [46] “U.S. Senate: Roll call votes 111th Congress-2nd session (2010),” Dec. 2010. [Online]. Available: https://www.senate.gov/legislative/LIS/roll_call_lists/vote_menu_111_2.htm
  • [47] “U.S. Senate: Roll call votes 112th Congress-1st session (2011),” Dec. 2011. [Online]. Available: https://www.senate.gov/legislative/LIS/roll_call_lists/vote_menu_112_1.htm
  • [48] C. F. Karpowitz, J. Q. Monson, K. D. Patterson, and J. C. Pope, “Tea Time in America? The impact of the Tea Party Movement on the 2010 midterm elections,” PS: Political Science & Politics, vol. 44, no. 2, pp. 303–309, Apr. 2011.
  • [49] T. M. J. Fruchterman and E. M. Reingold, “Graph drawing by force-directed placement,” Software: Practice and Experience, vol. 21, no. 11, pp. 1129–1164, Nov. 1991. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1002/spe.4380211102/abstract
  • [50] “Arkansas Presidential Election Voting History,” Apr. 2018. [Online]. Available: https://www.270towin.com/states/Arkansas
  • [51] “Indiana Presidential Election Voting History,” Apr. 2018. [Online]. Available: https://www.270towin.com/states/Indiana
  • [52] V. Williamson, T. Skocpol, and J. Coggin, “The Tea Party and the remaking of Republican conservatism,” Perspectives on Politics, vol. 9, no. 1, pp. 25–43, Mar. 2011. [Online]. Available: https://www.cambridge.org/core/journals/perspectives-on-politics/article/tea-party-and-the-remaking-of-republican-conservatism/BDF68005B52758A48F7EC07086C3788C
  • [53] B. T. Gervais and I. L. Morris, “Reading the Tea leaves: Understanding Tea Party caucus membership in the US House of Representatives,” PS: Political Science & Politics, vol. 45, no. 2, pp. 245–250, Apr. 2012.
  • [54] M. N. Arbeitman, E. E. M. Furlong, F. Imam, E. Johnson, B. H. Null, B. S. Baker, M. A. Krasnow, M. P. Scott, R. W. Davis, and K. P. White, “Gene Expression During the Life Cycle of Drosophila melanogaster,” Science, vol. 297, no. 5590, pp. 2270–2275, Sep. 2002. [Online]. Available: http://science.sciencemag.org/content/297/5590/2270
  • [55] M. Kolar, H. Liu, and E. P. Xing, “Graph Estimation From Multi-Attribute Data,” Journal of Machine Learning Research, vol. 15, pp. 1713–1750, 2014. [Online]. Available: http://jmlr.org/papers/v15/kolar14a.html
  • [56] D. Holten, “Hierarchical edge bundles: visualization of adjacency relations in hierarchical data,” IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 741–748, Oct. 2006.
  • [57] R Core Team, R: A Language and Environment for Statistical Computing.   Vienna, Austria: R Foundation for Statistical Computing, 2014. [Online]. Available: http://www.R-project.org/
  • [58] B. N. Rohith and B. V. Shyamala, “Scalloped a member of the Hippo tumor suppressor pathway controls mushroom body size in Drosophila brain by non-canonical regulation of neuroblast proliferation,” Developmental Biology, vol. 432, no. 2, pp. 203–214, Dec. 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S001216061730266X
  • [59] M. Milán, U. Weihe, L. Pérez, and S. M. Cohen, “The LRR Proteins Capricious and Tartan Mediate Cell Interactions during DV Boundary Formation in the Drosophila Wing,” Cell, vol. 106, no. 6, pp. 785–794, Sep. 2001. [Online]. Available: https://www.cell.com/cell/abstract/S0092-8674(01)00489-5
  • [60] R. Nolo, C. M. Morrison, C. Tao, X. Zhang, and G. Halder, “The bantam microRNA is a target of the hippo tumor-suppressor pathway,” Current Biology, vol. 16, no. 19, pp. 1895–1904, Oct. 2006.
  • [61] M. Shinza-Kameda, E. Takasu, K. Sakurai, S. Hayashi, and A. Nose, “Regulation of layer-specific targeting by reciprocal expression of a cell adhesion molecule, capricious,” Neuron, vol. 49, no. 2, pp. 205–213, Jan. 2006.
  • [62] M. E. Quinlan, J. E. Heuser, E. Kerkhoff, and R. Dyche Mullins, “Drosophila Spire is an actin nucleation factor,” Nature, vol. 433, no. 7024, pp. 382–388, Jan. 2005. [Online]. Available: https://www.nature.com/articles/nature03241
  • [63] A. Fraga, L. Ribeiro, M. Lobato, V. Santos, J. R. Silva, H. Gomes, J. L. d. C. Moraes, J. d. S. Menezes, C. J. L. d. Oliveira, E. Campos, and R. N. d. Fonseca, “Glycogen and Glucose Metabolism Are Essential for Early Embryonic Development of the Red Flour Beetle Tribolium castaneum,” PLOS ONE, vol. 8, no. 6, p. e65125, Jun. 2013. [Online]. Available: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0065125