# Efficient Minimax Signal Detection on Graphs

Several problems such as network intrusion, community detection, and disease outbreak can be described by observations attributed to nodes or edges of a graph. In these applications the presence of an intrusion, community, or disease outbreak is characterized by novel observations on some unknown connected subgraph. These problems can be formulated in terms of optimization of suitable objectives on connected subgraphs, a problem which is generally computationally difficult. We overcome the combinatorics of connectivity by embedding connected subgraphs into linear matrix inequalities (LMIs). Computationally efficient tests are then realized by optimizing convex objective functions subject to these LMI constraints. We prove, by means of a novel Euclidean embedding argument, that our tests are minimax optimal for the exponential family of distributions on 1-D and 2-D lattices. We show that the internal conductance of the connected subgraph family plays a fundamental role in characterizing detectability.


## 1 Introduction

Signals associated with nodes or edges of a graph arise in a number of applications including sensor network intrusion, disease outbreak detection and virus detection in communication networks. Many problems in these applications can be framed as hypothesis testing between null and alternative hypotheses. Observations under the null and the alternative follow different distributions. The alternative is composite and identified by sub-collections of connected subgraphs.

To motivate the setup consider the disease outbreak problem described in [1]. Nodes there are associated with counties, and observations associated with each county correspond to reported cases of a disease. Under the null distribution, observations at each county are assumed to be Poisson distributed and independent across different counties. Under the alternative there is a contiguous sub-collection of counties (a connected subgraph) that each experience elevated cases on average relative to their normal levels but are otherwise assumed to be independent. The eventual shape of the sub-collection of contiguous counties is highly unpredictable due to uncontrollable factors.

In this paper we develop a novel approach for signal detection on graphs that is both statistically effective and computationally efficient. Our approach is based on optimizing an objective function subject to subgraph connectivity constraints, which is related to generalized likelihood ratio tests (GLRT). GLRTs maximize likelihood functions over combinatorially many connected subgraphs, which is computationally intractable. On the other hand, statistically, GLRTs have been shown to be asymptotically minimax optimal for the exponential class of distributions on lattice graphs and trees [2], thus motivating our approach. We deal with combinatorial connectivity constraints by obtaining a novel characterization of connected subgraphs in terms of convex Linear Matrix Inequalities (LMIs). In addition we show how our LMI constraints naturally incorporate other features such as shape and size. We show that the resulting tests are essentially minimax optimal for the exponential family of distributions on 1-D and 2-D lattices. Conductance of the subgraph, a parameter in our LMI constraint, plays a central role in characterizing detectability.

Related Work: The literature on signal detection on graphs can be organized into parametric and non-parametric methods, which can be further sub-divided into computational and statistical analysis themes. Parametric methods originated in the scan statistics literature [3], with more recent work including that of [4, 5, 6, 1, 7, 8] focusing on graphs. Much of this literature develops scanning methods that optimize over rectangles, circles or neighborhood balls [5, 6] across different regions of the graphs. However, the drawbacks of simple shapes and the need for non-parametric methods to improve detection power are well recognized. This has led to new approaches such as simulated annealing [5, 4], but these lack statistical analysis. More recent work in the ML literature [9] describes a semi-definite programming algorithm for non-parametric shape detection, which is similar to our work here. However, unlike ours, their method requires a heuristic rounding step, which does not lend itself to statistical analysis. In this context a number of recent papers have focused on statistical analysis [10, 2, 11, 12] with non-parametric shapes. They derive fundamental bounds for signal detection for the elevated means testing problem in the Gaussian setting on special graphs such as trees and lattices. In this setting under the null hypothesis the observations are assumed to be independent identically distributed (IID) standard normal random variables. Under the alternative the Gaussian random variables are assumed to be standard normal except on some connected subgraph where the mean is elevated. They show that GLRT achieves “near”-minimax optimality in a number of interesting scenarios. While this work is interesting, the suggested algorithms are computationally intractable. To the best of our knowledge only [13, 14] explore a computationally tractable approach and also provide statistical guarantees. Nevertheless, this line of work does not explicitly deal with connected subgraphs (complex shapes) but deals with more general clusters. These are graph partitions with small out-degree. Although this appears to be a natural relaxation of connected subgraphs/complex shapes, it turns out to be quite loose¹ and leads to a substantial gap in statistical effectiveness for our problem. In contrast we develop a new method for signal detection of complex shapes that is not only statistically effective but also computationally efficient.

¹ A connected subgraph of size $K$ on a 2-D lattice has out-degree at least on the order of $\sqrt{K}$, while the set of subgraphs with out-degree of that order includes disjoint unions of nodes. So statistical requirements with out-degree constraints can be no better than those for arbitrary $K$-sets.

## 2 Problem Formulation

Let $G=(V,E)$ denote an undirected unweighted graph with $|V|=n$ nodes and $|E|=m$ edges. Associated with each node, $v\in V$, are observations $x_v$. We assume observations are distributed $P_0$ under the null hypothesis. The alternative is composite and the observed distribution, $P_S$, is parameterized by $S\subset V$ belonging to a class of subsets $\Lambda\subseteq\mathcal{S}$, where $\mathcal{S}$ is the superset. We denote by $\mathcal{S}_K$ the collection of size-$K$ subsets. $E_S$ denotes the induced edge set on $S$. We let $x_S$ denote the collection of random variables on the subset $S$. $S^c$ denotes nodes $V-S$. Our goal is to design a decision rule, $\pi$, that maps observations $x^n$ to $\{0,1\}$ with zero denoting the null hypothesis and one denoting the alternative. We formulate risk following the lines of [12] and combine Type I and Type II errors:

$$R(\pi) = P_0(\pi(x^n)=1) + \max_{S\in\Lambda} P_S(\pi(x^n)=0) \tag{1}$$

###### Definition 1 (δ-Separable).

We say that the composite hypothesis problem is $\delta$-separable if there exists a test $\pi$ such that $R(\pi)\le\delta$.

We next describe asymptotic notions of detectability and separability. These notions require us to consider large-graph limits. To this end we index a sequence of graphs $G_n$ by $n$ and consider an associated sequence of tests $\pi_n$.

###### Definition 2 (Separability).

We say that the composite hypothesis problem is asymptotically $\delta$-separable if there is some sequence of tests, $\pi_n$, such that $R(\pi_n)\le\delta$ for sufficiently large $n$. It is said to be asymptotically separable if $\lim_{n\to\infty}R(\pi_n)=0$. The composite hypothesis problem is said to be asymptotically inseparable if no such test exists.

More granular measures of performance are often useful for determining the asymptotic behavior of Type I and Type II errors separately. This motivates the following definition:

###### Definition 3 (δ-Detectability).

We say that the composite hypothesis testing problem is $\delta$-detectable if there is a sequence of tests, $\pi_n$, such that

$$\sup_{S\in\Lambda} P_S(\pi_n(x^n)=0) \xrightarrow{\,n\to\infty\,} 0,\qquad \limsup_{n} P_0(\pi_n(x^n)=1)\le\delta$$

In general $\delta$-detectability does not imply separability. For instance, consider and . It is $\delta$-detectable for but not separable.

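##### Generalized Likelihood Ratio Test (GLRT)

The GLRT is often used as a statistical test for composite hypothesis testing. Suppose $p_0$ and $p_S$ are probability density functions associated with $P_0$ and $P_S$ respectively. The GLRT thresholds the “best-case” likelihood ratio, namely,

$$\text{GLRT}:\quad \ell_{\max}(x^n) = \max_{S\in\Lambda}\ell_S(x^n) \underset{H_0}{\overset{H_1}{\gtrless}} \eta,\qquad \ell_S(x) = \log\frac{p_S(x)}{p_0(x)} \tag{2}$$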

Local Behavior: Without additional structure, the likelihood ratio $\ell_S(x)$ for a fixed $S$ is a function of observations across all nodes. Many applications exhibit local behavior, namely, the observations under the two hypotheses behave distinctly only on some small subset of nodes (as in disease outbreaks). This justifies introducing local statistical models in the following section.

Combinatorial: The class $\Lambda$ is combinatorial, for example a collection of connected subgraphs, and the GLRT is not generally computationally tractable. On the other hand, the GLRT is minimax optimal for special classes of distributions and graphs, which motivates the development of tractable algorithms.

### 2.1 Statistical Models & Subgraph Classes

The foregoing discussion motivates introducing local models, which we present next. Then informed by existing results on separability we categorize subgraph classes by shape, size and connectivity.

#### 2.1.1 Local Statistical Models

Signal in Noise Models arise in sensor network (SNET) intrusion [7, 15] and disease outbreak detection [1]. They are modeled with Gaussian (SNET) and Poisson (disease outbreak) distributions.

$$H_0: x_v = w_v;\qquad H_1: x_v = \mu\,\alpha_{uv}\,1_S(v) + w_v,\ \text{ for some } S\in\Lambda,\ u\in S \tag{3}$$

For the Gaussian case we model $\mu$ as a constant, $w_v$ as IID standard normal variables, and $\alpha_{uv}$ as the propagation loss from source node $u\in S$ to the node $v$. In disease outbreak detection , and are independent Poisson random variables, and is the population of county $v$. In these cases $\ell_S$ takes the following local form, where $Z_v$ is a normalizing constant.

$$\ell_S(x) = \ell_S(x_S) \propto \sum_{v\in V}\big(\Psi_v(x_v) - \log(Z_v)\big)\,1_S(v) \tag{4}$$
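As a concrete illustration of the local form in Eq. 4, and of why maximizing it over connected subgraphs is combinatorial, the following minimal sketch brute-forces the scan on a tiny path graph. It assumes the homogeneous Gaussian elevated-mean model with unit variance and no propagation loss, for which the per-node log-likelihood ratio is $\mu x_v - \mu^2/2$. The function names, the toy graph and the data are hypothetical, and the exhaustive enumeration is exponential in the number of nodes, so it is meant for intuition only.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of the graph `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def glrt_scan(x, adj, mu):
    """Brute-force scan: maximize sum_{v in S} (mu*x_v - mu^2/2) over connected S.

    Assumption: unit-variance Gaussian noise, so each node in S contributes
    the log-likelihood ratio of N(mu, 1) against N(0, 1).
    """
    best_val, best_set = float("-inf"), None
    n = len(x)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if is_connected(subset, adj):
                val = sum(mu * x[v] - 0.5 * mu * mu for v in subset)
                if val > best_val:
                    best_val, best_set = val, subset
    return best_val, best_set


# Toy path graph 0-1-2-3-4 with an elevated stretch on nodes 1..3 (made-up data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = [-0.5, 2.0, 1.5, 2.5, -1.0]
value, subset = glrt_scan(x, adj, mu=2.0)
print(subset, value)
```

The scan visits every subset and keeps only the connected ones, which is exactly the cost that the convex relaxation developed later is designed to avoid.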

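We characterize $\mu_0,\lambda_0$ as the minimum value that ensures separability for the different models:

$$\mu_0 = \inf\{\mu\in\mathbb{R}^+ \mid \exists\,\pi_n,\ \lim_{n\to\infty}R(\pi_n)=0\},\qquad \lambda_0 = \inf\{\lambda\in\mathbb{R}^+ \mid \exists\,\pi_n,\ \lim_{n\to\infty}R(\pi_n)=0\} \tag{5}$$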

Correlated Models arise in textured object detection [16] and protein subnetwork detection [17]. For instance consider a common random signal $z$ on $S$, which results in uniform correlation $\rho$ on $S$.

$$H_0: x_v = w_v;\qquad H_1: x_v = \sqrt{\rho(1-\rho)^{-1}}\;z\,1_S(v) + w_v,\ \text{ for some } S\in\Lambda \tag{6}$$

Here $z$ and $w_v$ are standard IID normal random variables. Again we obtain $\ell_S(x)=\ell_S(x_S)$. These examples motivate the following general setup for local behavior:

###### Definition 4.

The distributions $P_0$ and $P_S$ are said to exhibit local structure if they satisfy:

(1) Markovianity: The null distribution $P_0$ satisfies the properties of a Markov Random Field (MRF). Under the distribution $P_S$ the observations $x_S$ are conditionally independent of $x_{S_1^c}$ when conditioned on the annulus $S_1\cap S^c$, where $S_1$ is the 1-neighborhood of $S$.

(2) Mask: Marginal distributions of observations under $P_0$ and $P_S$ on nodes in $S^c$ are identical: $P_0(x_{S^c}\in B) = P_S(x_{S^c}\in B)$ for all $B$ in the $\sigma$-algebra of measurable sets.

###### Lemma 1 ([7]).

Under conditions (1) and (2) it follows that .

#### 2.1.2 Structured Subgraphs

Existing works [10, 2, 12] point to the important role of size, shape and connectivity in determining detectability. For concreteness we consider the signal in noise model for the Gaussian distribution and tabulate upper bounds from existing results for $\mu_0$ (Eq. 5). The lower bounds are messier and differ by logarithmic factors but this suffices for our discussion here. The table reveals several important points. Larger sets are easier to detect: $\mu_0$ decreases with size $K$; connected $K$-sets are easier to detect relative to arbitrary $K$-sets; for 2-D lattices “thick” connected shapes are easier to detect than “thin” sets (paths); finally detectability on complete graphs is equivalent to arbitrary $K$-sets, i.e., shape does not matter. Intuitively, these tradeoffs make sense. For a constant $\mu$, the “signal-to-noise” ratio increases with size. Combinatorially, there are fewer connected $K$-sets than arbitrary $K$-sets; fewer connected balls than connected paths; and fewer connected sets in 2-D lattices than in dense graphs.

These results point to the need for characterizing the signal detection problem in terms of connectivity, size, shape and the properties of the ambient graph. We also observe that the table is somewhat incomplete. While balls can be viewed as thick shapes and paths as thin shapes, there are a plethora of intermediate shapes. A similar issue arises for sparse vs. dense graphs. We introduce general definitions to categorize shape and graph structures below.

###### Definition 5 (Internal Conductance).

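(a.k.a. Cut Ratio) Let $H=(S,F_S)$ denote a subgraph of $G=(V,E)$ where $S\subseteq V$, $F_S\subseteq E_S$, written as $H\subseteq G$. Define the internal conductance of $H$ as:

$$\phi(H) = \min_{A\subset S}\frac{|\delta_S(A)|}{\min\{|A|,|S-A|\}};\qquad \delta_S(A) = \{(u,v)\in F_S \mid u\in A,\ v\in S-A\} \tag{7}$$

Apparently $\phi(H)=0$ if $H$ is not connected. The internal conductance of a collection of subgraphs, $\Sigma$, is defined as the smallest internal conductance:

$$\phi(\Sigma) = \min_{H\in\Sigma}\phi(H)$$

For future reference we denote the collection of connected subgraphs by $\mathcal{C}$ and by $\mathcal{C}_{a,\Phi}$ the sub-collections containing node $a\in V$ with minimal internal conductance $\Phi$:

$$\mathcal{C} = \{H\subseteq G : \phi(H)>0\},\qquad \mathcal{C}_{a,\Phi} = \{H=(S,F_S)\subseteq G : a\in S,\ \phi(H)\ge\Phi\} \tag{8}$$

In 2-D lattices, for example, $\phi(H)=\Theta(1/\sqrt{K})$ for connected $K$-balls or other thick shapes of size $K$, whereas the minimum internal conductance over all connected subgraphs of size $K$ is $\Theta(1/K)$ due to “snake”-like thin shapes. Thus internal conductance explicitly accounts for the shape of the sets.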
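The contrast between thick and thin shapes can be checked numerically on small examples. The sketch below evaluates the internal conductance of Definition 5 by exhaustive enumeration of the cut sets $A$; it is a hypothetical helper for intuition only (exponential in $|S|$), and the function names and toy shapes are assumptions rather than part of the method.

```python
from fractions import Fraction
from itertools import combinations


def internal_conductance(nodes, edges):
    """Eq. 7 by brute force: min over nonempty proper A of |cut(A)| / min(|A|, |S - A|).

    `edges` must only contain pairs of nodes from `nodes`. Because the ratio is
    symmetric in A and S - A, it suffices to enumerate |A| <= |S| / 2.
    Requires at least two nodes; returns None otherwise.
    """
    nodes = list(nodes)
    best = None
    for k in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, k):
            a = set(subset)
            cut = sum(1 for (u, v) in edges if (u in a) != (v in a))
            ratio = Fraction(cut, min(len(a), len(nodes) - len(a)))
            if best is None or ratio < best:
                best = ratio
    return best


def grid_edges(rows, cols):
    """Edges of a rows x cols lattice block with nodes labelled (r, c)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < rows:
                edges.append(((r, c), (r + 1, c)))
    return edges


# Thin shape: a path ("snake") on 9 nodes.
path_nodes = list(range(9))
path_edges = [(i, i + 1) for i in range(8)]
phi_path = internal_conductance(path_nodes, path_edges)

# Thick shape: a 3 x 3 block with the same number of nodes.
block_nodes = [(r, c) for r in range(3) for c in range(3)]
phi_block = internal_conductance(block_nodes, grid_edges(3, 3))

print(phi_path, phi_block)
```

With the same number of nodes, the block has a strictly larger internal conductance than the path, which is the sense in which this quantity separates thick shapes from thin ones.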

## 3 Convex Programming

We develop a convex optimization framework for generating test statistics for the local statistical models described in Section 2.1. Our approach relaxes the combinatorial constraints and the functional objectives of the GLRT problem of Eq. (2). In the following section we develop a new characterization based on linear matrix inequalities that accounts for size, shape and connectivity of subgraphs. For future reference we denote .

Our first step is to embed subgraphs, $H$ of $G$, into matrices. A binary symmetric incidence matrix, $A\in\{0,1\}^{n\times n}$, is associated with an undirected graph $G=(V,E)$, and encodes edge relationships. Formally, the edge set $E$ is the support of $A$, namely, $E=\mathrm{Supp}(A)$. For subgraph correspondences we consider symmetric matrices, $M$, with components taking values in the unit interval, $[0,1]$.

$$\mathcal{M} = \{M\in[0,1]^{n\times n} \mid M_{uv}\le M_{uu},\ M \text{ symmetric}\}$$
###### Definition 6.

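$M\in\mathcal{M}$ is said to correspond to a subgraph $H=(S,F_S)$, written as $H\rightleftharpoons M$, if

$$S = \mathrm{Supp}\{\mathrm{Diag}(M)\},\qquad F_S = \mathrm{Supp}(A\circ M)$$

The role of the constraint $M_{uv}\le M_{uu}$ is to ensure that if $u\notin S$ then the corresponding edge entries satisfy $M_{uv}=0$. Note that $A\circ M$ in Defn. 6 removes the spurious edges $M_{uv}\ne 0$ for $(u,v)\notin E$.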

Our second step is to characterize connected subgraphs as convex subsets of $\mathcal{M}$. Now a subgraph $H=(S,F_S)$ is a connected subgraph if for every $u,v\in S$, there is a path consisting only of edges in $F_S$ going from $u$ to $v$. This implies that for two subgraphs $H_1$, $H_2$ and corresponding matrices $M_1$ and $M_2$, their convex combination $M_\eta=\eta M_1+(1-\eta)M_2$, $\eta\in(0,1)$, naturally corresponds to $H=H_1\cup H_2$ in the sense of Defn. 6. On the other hand, if $H_1\cap H_2=\emptyset$ then $H$ is disconnected and so is $M_\eta$ as well. This motivates our convex characterization with a common “anchor” node. To this end we consider the following collection of matrices:

$$\mathcal{M}^*_a = \{M\in\mathcal{M} \mid M_{aa}=1,\ M_{vv}\le M_{av}\}$$

Note that $\mathcal{M}^*_a$ includes star graphs induced on subsets $S=\mathrm{Supp}(\mathrm{Diag}(M))$ with anchor node $a$. We now make use of the well known properties [18] of the Laplacian of a graph to characterize connectivity. The unnormalized Laplacian matrix of an undirected graph with incidence matrix $A$ is described by $L(A)=\mathrm{Diag}(A\mathbf{1}_n)-A$, where $\mathbf{1}_n$ is the all-one vector.

###### Lemma 2.

Graph $G$ is connected if and only if the number of zero eigenvalues of $L(A)$ is one.
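The Laplacian criterion in Lemma 2 is easy to verify on small graphs. The sketch below is an illustrative check, not part of the method: it builds $L(A)=\mathrm{Diag}(A\mathbf{1})-A$ and counts zero eigenvalues as the nullity of $L(A)$, which equals the multiplicity of the zero eigenvalue because $L(A)$ is symmetric. Exact rational arithmetic is used to avoid numerical tolerances; the function names and toy graphs are assumptions.

```python
from fractions import Fraction


def laplacian(A):
    """Unnormalized Laplacian L(A) = Diag(A 1) - A for a square list-of-lists A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]


def rank(M):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def num_zero_eigenvalues(A):
    """Multiplicity of eigenvalue 0 of L(A), computed as n - rank(L(A))."""
    L = laplacian(A)
    return len(L) - rank(L)


# Connected path 0-1-2.
A_path = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
# Edge 0-1 plus an isolated node 2 (two connected components).
A_split = [[0, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]

print(num_zero_eigenvalues(A_path), num_zero_eigenvalues(A_split))
```

The count equals the number of connected components, so a subgraph embedded in the full $n\times n$ matrix picks up one extra zero eigenvalue for every node outside its support, which is the difficulty the LMI formulation addresses.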

Unfortunately, we cannot directly use this fact on the subgraph $A\circ M$, because it has many zero eigenvalues: the complement of $\mathrm{Supp}(\mathrm{Diag}(M))$ is by definition zero. We employ linear matrix inequalities (LMIs) to deal with this issue. The condition [19] $F(x)=F_0+F_1x_1+\cdots+F_px_p\succeq 0$ with symmetric matrices $F_j$ is called a linear matrix inequality in $x_j\in\mathbb{R}$ with respect to the positive semi-definite cone represented by $\succeq$. Note that the Laplacian of the subgraph $L(A\circ M)$ is a linear matrix function of $M$. We denote a collection of subgraphs as follows:

$$\mathcal{C}_{LMI}(a,\gamma) \triangleq \{H\rightleftharpoons M \mid M\in\mathcal{M}^*_a,\ L(A\circ M)-\gamma L(M)\succeq 0\} \tag{9}$$
###### Theorem 3.

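The class $\mathcal{C}_{LMI}(a,\gamma)$ is connected for $\gamma>0$. Furthermore, every connected subgraph can be characterized in this way for some $a\in V$ and $\gamma>0$, namely, $\mathcal{C}=\bigcup_{a\in V,\,\gamma>0}\mathcal{C}_{LMI}(a,\gamma)$.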

Proof Sketch. $M\in\mathcal{C}_{LMI}(a,\gamma)$ implies that the corresponding subgraph is connected. By definition of $\mathcal{M}^*_a$ there must be a star graph that is a subgraph on $\mathrm{Supp}(\mathrm{Diag}(M))$. This means that $L(M)$ (hence $L(A\circ M)$) can only have one zero eigenvalue on $\mathrm{Supp}(\mathrm{Diag}(M))$. We can now invoke Lemma 2 on $\mathrm{Supp}(\mathrm{Diag}(M))$. The other direction is based on hyperplane separation of convex sets. Note that $\mathcal{C}_{LMI}(a,\gamma)$ is convex but $\mathcal{C}$ is not. This necessitates an anchor. In practice this means that we have to search for connected sets with different anchors. This is similar to scan statistics, the difference being that we can now optimize over arbitrary shapes. We next get a handle on $\gamma$.
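A small numerical check can make the LMI condition $L(A\circ M)-\gamma L(M)\succeq 0$ tangible. The sketch below tests membership for the binary matrix that is all ones on $S\times S$ (the construction used in the appendix proof of Theorem 3) on a 4-node path graph. It is a hedged illustration: the helper names are hypothetical, the positive semi-definiteness test uses exact recursive Schur complements instead of an SDP solver, and only a fixed candidate $M$ is checked, not the whole feasible set.

```python
from fractions import Fraction


def laplacian(W):
    """L(W) = Diag(W 1) - W for a square list-of-lists W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]


def is_psd(Q):
    """Exact PSD test for a symmetric rational matrix via Schur complements."""
    Q = [[Fraction(v) for v in row] for row in Q]
    while Q:
        a = Q[0][0]
        if a < 0:
            return False
        if a == 0:
            # A zero pivot is only allowed if its whole row (and column) vanishes.
            if any(v != 0 for v in Q[0][1:]):
                return False
            Q = [row[1:] for row in Q[1:]]
        else:
            Q = [[Q[i][j] - Q[i][0] * Q[0][j] / a for j in range(1, len(Q))]
                 for i in range(1, len(Q))]
    return True


def in_c_lmi(A, M, a, gamma):
    """Check M in M*_a and L(A o M) - gamma * L(M) >= 0 (Eq. 9) for a fixed M."""
    n = len(A)
    ok = all(0 <= M[u][v] <= 1 and M[u][v] == M[v][u] and M[u][v] <= M[u][u]
             for u in range(n) for v in range(n))
    ok = ok and M[a][a] == 1 and all(M[v][v] <= M[a][v] for v in range(n))
    if not ok:
        return False
    AM = [[A[u][v] * M[u][v] for v in range(n)] for u in range(n)]
    LA, LM = laplacian(AM), laplacian(M)
    Q = [[LA[u][v] - gamma * LM[u][v] for v in range(n)] for u in range(n)]
    return is_psd(Q)


def indicator_matrix(S, n):
    """Binary matrix equal to one on S x S and zero elsewhere."""
    return [[1 if (u in S and v in S) else 0 for v in range(n)] for u in range(n)]


# Ambient graph: path 0-1-2-3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

connected = indicator_matrix({0, 1, 2}, 4)     # nodes 0, 1, 2 form a path
disconnected = indicator_matrix({0, 2}, 4)     # nodes 0 and 2 are not adjacent

print(in_c_lmi(A, connected, 0, Fraction(1, 4)))
print(in_c_lmi(A, connected, 0, Fraction(1, 2)))
print(in_c_lmi(A, disconnected, 0, Fraction(1, 4)))
```

For this candidate the connected set passes for the smaller $\gamma$ and fails for the larger one, while the disconnected set fails because $L(A\circ M)$ vanishes and $-\gamma L(M)$ has a negative diagonal entry.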

$\gamma$ encodes Shape: We will relate $\gamma$ to the internal conductance of the class $\mathcal{C}$. This provides us with a tool to choose $\gamma$ to reflect the type of connected sets that we expect for our alternative hypothesis. In particular thick sets correspond to relatively large $\gamma$ and thin sets to small $\gamma$. In general for graphs of fixed size the minimum internal conductance over all connected shapes is strictly positive and we can set $\gamma$ to be this value if we do not a priori know the shape.

###### Theorem 4.

In a 2-D lattice, it follows that , where .

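LMI-Test: We are now ready to present our test statistics. We replace the indicator variables $1_S(v)$ with the corresponding matrix components $M_{vv}$ in Eq. 4 and obtain:

$$\begin{aligned} \text{Elevated Mean:}\quad & \ell_M(x) = \sum_{v\in V}\big(\Psi_v(x_v)-\log(Z_v)\big)M_{vv} \\ \text{Correlated Gaussian:}\quad & \ell_M(x) \propto \sum_{(u,v)\in E}\Psi(x_u,x_v)M_{uv} - \sum_{v}M_{vv}\log(1-\rho) \end{aligned} \tag{10}$$

$$\mathrm{LMIT}_{a,\gamma}:\quad \ell_{a,\gamma}(x) = \max_{M\in\mathcal{C}_{LMI}(a,\gamma)}\ell_M(x) \underset{H_0}{\overset{H_1}{\gtrless}} \eta \tag{11}$$

This test explicitly makes use of the fact that the alternative hypothesis is anchored at $a$ and that the internal conductance parameter $\gamma$ is known. We will refine this test to deal with the completely agnostic case in the following section.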

## 4 Analysis

In this section we analyze LMIT$_{a,\gamma}$ and the agnostic LMI tests for the Elevated Mean problem for the exponential family of distributions on 2-D lattices. For concreteness we focus on Gaussian and Poisson models and derive lower and upper bounds for the critical signal strength (see Eq. 5). Our main result states that to guarantee separability, , where $\Phi$ is the internal conductance of the family $\mathcal{C}_{a,\Phi}$ of connected subgraphs, $K$ is the size of the subgraphs in the family, and $a$ is some node that is common to all the subgraphs. The reason for our focus on the homogeneous Gaussian/Poisson setting is that we can extend current lower bounds in the literature to our more general setting and demonstrate that they match the bounds obtained from our LMIT analysis. We comment on how our LMIT analysis extends to other general structures and models later.

The proof for LMIT analysis involves two steps (see Supplementary):

1. Lower Bound: Under $H_1$ we show that the ground truth is a feasible solution. This allows us to lower bound the objective value, $\ell_{a,\gamma}(x)$, of Eq. 11.

2. Upper Bound: Under $H_0$ we consider the dual problem. By weak duality it follows that any feasible solution of the dual is an upper bound for $\ell_{a,\gamma}(x)$. A dual feasible solution is then constructed through a novel Euclidean embedding argument.

We then compare the upper and lower bounds to obtain the critical value .

We analyze both non-agnostic and agnostic LMI tests for the homogeneous version of the Gaussian and Poisson models of Eq. 3 for both finite and asymptotic 2-D lattice graphs. For the finite case the family of subgraphs in Eq. 3 is assumed to belong to the connected family of sets, $\mathcal{C}_{a,\Phi}$, of size $K$ containing a fixed common node $a\in V$. For the asymptotic case we let the size of the graph approach infinity ($n\to\infty$). For this case we consider a sequence of connected families of sets on the graph $G_n$ with some fixed anchor node $a$. We will then describe results for agnostic LMI tests, i.e., tests lacking knowledge of the conductance $\Phi$ and the anchor node $a$.

Poisson Model: In Eq. 3 we let the population be identically equal to one across counties. We present LMI tests that are agnostic to shape and anchor nodes:

$$\mathrm{LMIT}_A:\quad \ell(x) = \max_{a\in V,\ \gamma\ge\Phi_{\min}^2}\sqrt{\gamma}\,\ell_{a,\gamma}(x) \underset{H_0}{\overset{H_1}{\gtrless}} \eta \tag{12}$$

where $\Phi_{\min}$ denotes the minimum possible conductance of a connected subgraph with size $K$, which is .

###### Theorem 5.

The test achieves -separability for and the agnostic test LMIT for .

Next we consider the asymptotic case and characterize tight bounds for separability.

###### Theorem 6.

The two hypotheses $H_0$ and $H_1$ are asymptotically inseparable if . They are asymptotically separable with LMIT$_{a,\gamma}$ for . The agnostic LMIT$_A$ achieves asymptotic separability with .

Gaussian Model: We next consider agnostic tests for the Gaussian model of Eq. 3 with no propagation loss, i.e., $\alpha_{uv}=1$.

###### Theorem 7.

The two hypotheses $H_0$ and $H_1$ for the Gaussian model are asymptotically inseparable if , are separable with LMIT$_{a,\gamma}$ if , and are separable with LMIT$_A$ if .

Our inseparability bound matches existing results on 2-D lattice and line graphs by plugging in appropriate values of $\Phi$ for the cases considered in [2, 12]. The lower bound is obtained by specializing to a collection of “non-decreasing band” subgraphs. Yet LMIT$_{a,\gamma}$ and LMIT$_A$ are able to achieve the lower bound within a logarithmic factor. Furthermore, our analysis extends beyond Poisson and Gaussian models and applies to general graph structures and models. The main reason is that our LMIT analysis is fairly general and provides an observation-dependent bound through convex duality. We briefly describe it here. Consider functions $\Psi_v(x_v)$ that are positive, separable and bounded for simplicity. By establishing primal feasibility, i.e., that the subgraph belongs to $\mathcal{C}_{LMI}(a,\gamma)$ for a suitably chosen $\gamma$, we can obtain a lower bound for the alternative hypothesis and show that . On the other hand for the null hypothesis we can show that, . Here and denote expectations with respect to the alternative and null hypotheses and is a ball-like thick shape centered at $a$ with radius . Our result then follows by invoking standard concentration inequalities. We can extend our analysis to the non-separable case such as correlated models because of the linear objective form in Eq. 10.

## 5 Experiments

We present several experiments to highlight key properties of LMIT and to compare LMIT against other state-of-the-art parametric and non-parametric tests on synthetic and real-world data. We have shown that agnostic LMIT is near minimax optimal in terms of asymptotic separability. However, separability is an asymptotic notion and only characterizes the special case of zero false alarms (FA) and missed detections (MD), which is often impractical. It is unclear how LMIT behaves with finite size graphs when FAs and MDs are prevalent. In this context incorporating priors could indeed be important. Our goal is to highlight how a shape prior (in terms of thick, thin, or arbitrary shapes) can be incorporated in LMIT using the parameter $\gamma$ to obtain better AUC performance in finite size graphs. Another goal is to demonstrate how LMIT behaves with denser graph structures.

From the practical perspective, our main step is to solve the following SDP problem:

$$\max_{M}\ \sum_i y_i M_{ii}\quad \text{s.t.}\quad M\in\mathcal{C}_{LMI}(a,\gamma),\ \ \mathrm{tr}(M)\le K$$

We use standard SDP solvers which can scale up to nodes for sparse graphs like lattice and nodes for dense graphs with edges.

To understand the impact of shape we consider the test LMIT$_{a,\gamma}$ for the Gaussian model and manually vary $\gamma$. On a 15×10 lattice we fix the size (17 nodes) and the signal strength, and consider three different shapes (see Fig. 1) for the alternative hypothesis. For each shape we synthetically simulate 100 null and 100 alternative hypotheses and plot the AUC performance of LMIT as a function of $\gamma$. We observe that the optimum value of AUC is achieved at large $\gamma$ for thick shapes and at small $\gamma$ for thin shapes, confirming our intuition that $\gamma$ is a good surrogate for shape. In addition we notice that thick shapes have superior AUC performance relative to thin shapes, again confirming the intuition of our analysis.

To understand the impact of dense graph structures we consider the performance of LMIT as a function of neighborhood size. On the lattice of the previous experiment we vary the neighborhood by connecting each node to its 1-hop, 2-hop, and 3-hop neighbors to realize denser structures with each node having 4, 8 and 12 neighbors respectively. Note that all the different graphs have the same vertex set. This is convenient because we can hold the shape under the alternative fixed for the different graphs. As before we generate 100 alternative hypotheses using the thin set of the previous experiment with the same mean, and 100 nulls. The AUC curves for the different graphs highlight the fact that higher density leads to degradation in performance, as our intuition with complete graphs suggests. We also see that as density increases a larger $\gamma$ achieves better performance, confirming our intuition that as density increases the internal conductance of the shape increases.

In this part we compare LMIT against existing state-of-the-art approaches on a 300-node lattice, a 200-node random geometric graph (RGG), and a real-world county map graph (129 nodes) (see Figs. 3 and 4). We incorporate shape priors by setting $\gamma$ (internal conductance) to correspond to thin sets. While this implies some prior knowledge, we note that this is not necessarily the optimal value for $\gamma$ and we are still agnostic to the actual ground truth shape (see Figs. 3 and 4). For the lattice and RGG we use the elevated-mean Gaussian model. Following [1] we adopt an elevated-rate independent Poisson model for the county map graph. Here is the population of county, . Under null the number of cases at county , follows a Poisson distribution with rate and under the alternative a rate within some connected subgraph. We assume and apply a weighted version of LMIT of Eq. 12, which arises on account of differences in population. We compare LMIT against several other tests, including simulated annealing (SA) [4], rectangle test (Rect), nearest-ball test (NB), and two naive tests: maximum test (MaxT) and average test (AvgT). SA is a non-parametric test and works by heuristically adding/removing nodes toward a better normalized GLRT objective while maintaining connectivity. Rect and NB are parametric methods, with Rect scanning rectangles on the lattice and NB scanning nearest-neighbor balls around different nodes for more general graphs (RGG and county-map graph). MaxT and AvgT are often used for comparison purposes. MaxT is based on thresholding the maximum observed value while AvgT is based on thresholding the average value.
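The two naive baselines are simple enough to state in a few lines. The sketch below is a minimal illustration of MaxT and AvgT as described above; the function names, thresholds and sample observations are hypothetical and are not taken from the experiments.

```python
def max_test(x, threshold):
    """MaxT: declare the alternative (1) if the largest observation exceeds the threshold."""
    return int(max(x) > threshold)


def avg_test(x, threshold):
    """AvgT: declare the alternative (1) if the average observation exceeds the threshold."""
    return int(sum(x) / len(x) > threshold)


# One strongly elevated node among otherwise small observations (made-up values).
x = [0.1, -0.3, 4.0, 0.2]
print(max_test(x, threshold=3.0), avg_test(x, threshold=1.5))
```

In this toy case a single large observation triggers MaxT while the average stays below its threshold, which matches the qualitative behaviour of the two baselines on small alternatives.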

We observe that MaxT and AvgT perform uniformly poorly. This makes sense; it is well known that MaxT works well only for alternatives of small size while AvgT works well with relatively large alternatives [11]. Parametric methods (Rect/NB) perform poorly because the shape of the ground truth under the alternative cannot be well approximated by rectangles or nearest-neighbor balls. The performance of SA requires more explanation. One issue could be that SA does not explicitly incorporate shape and directly searches for the best GLRT solution. We have noticed that this has the tendency to amplify the objective value under the null hypothesis because SA exhibits poor “regularization” over the shape. On the other hand LMIT provides some regularization for thin shapes and does not admit arbitrary connected sets.

## References

• [1] G. P. Patil and C. Taillie. Geographic and network surveillance via scan statistics for critical area detection. In Statistical Science, volume 18(4), pages 457–465, 2003.
• [2] E. Arias-Castro, E. J. Candes, H. Helgason, and O. Zeitouni. Searching for a trail of evidence in a maze. In The Annals of Statistics, volume 36(4), pages 1726–1757, 2008.
• [3] J. Glaz, J. Naus, and S. Wallenstein. Scan Statistics. Springer, New York, 2001.
• [4] L. Duczmal and R. Assuncao. A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. In Computational Statistics and Data Analysis, volume 45, pages 269–286, 2004.
• [5] M. Kulldorff, L. Huang, L. Pickle, and L. Duczmal. An elliptic spatial scan statistic. In Statistics in Medicine, volume 25, 2006.
• [6] C. E. Priebe, J. M. Conroy, D. J. Marchette, and Y. Park. Scan statistics on enron graphs. In Computational and Mathematical Organization Theory, 2006.
• [7] V. Saligrama and M. Zhao. Local anomaly detection. In Artificial Intelligence and Statistics, volume 22, 2012.
• [8] V. Saligrama and Z. Chen. Video anomaly detection based on local statistical aggregates. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 2112–2119, 2012.
• [9] J. Qian and V. Saligrama. Connected sub-graph detection. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
• [10] E. Arias-Castro, D. Donoho, and X. Huo. Near-optimal detection of geometric objects by fast multiscale methods. In IEEE Transactions on Information Theory, volume 51(7), pages 2402–2425, 2005.
• [11] L. Addario-Berry, N. Broutin, L. Devroye, and G. Lugosi. On combinatorial testing problems. In The Annals of Statistics, volume 38(5), pages 3063–3092, 2010.
• [12] E. Arias-Castro, E. J. Candes, and A. Durand. Detection of an anomalous cluster in a network. In The Annals of Statistics, volume 39(1), pages 278–304, 2011.
• [13] J. Sharpnack, A. Rinaldo, and A. Singh. Changepoint detection over graphs with the spectral scan statistic. In International Conference on Artificial Intelligence and Statistics, 2013.
• [14] J. Sharpnack, A. Krishnamurthy, and A. Singh. Near-optimal anomaly detection in graphs using lovasz extended scan statistic. In Neural Information Processing Systems, 2013.
• [15] Erhan Baki Ermis and Venkatesh Saligrama. Distributed detection in sensor networks with limited range multimodal sensors. IEEE Transactions on Signal Processing, 58(2):843–858, 2010.
• [16] G. R. Cross and A. K. Jain. Markov random field texture models. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 5, pages 25–39, 1983.
• [17] M. Bailly-Bechet, C. Borgs, A. Braunstein, J. T. Chayes, A. Dagkessamanskaia, J. Francois, and R. Zecchina. Finding undetected protein associations in cell signaling by belief propagation. In Proceedings of the National Academy of Sciences (PNAS), volume 108, pages 882–887, 2011.
• [18] F. Chung. Spectral graph theory. American Mathematical Society, 1996.
• [19] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

## Appendix: Proofs of Theorems

Proof of Theorem 3:

###### Proof.

For the first part we show , . Let be a connected subgraph. Assume on the contrary that is disconnected: , where . Let . W.l.o.g. assume , i.e. , and consists of nodes .

Let . Consider the sub-matrix of corresponding to , since the rest part are all 0. Now we use the vector to hit :

$$g'Q_Sg = g'L_S(A_S\circ M_S)g - \gamma\,g'L_S(M_S)g \ge 0. \tag{13}$$

Note that has the form:

$$A_S = \begin{pmatrix} A_C & 0 \\ 0 & A_{\bar{C}} \end{pmatrix}, \tag{14}$$

where the off-diagonal block is zero because by assumption and is disconnected. Then:

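$$L_S(A_S\circ M_S) = \mathrm{Diag}\big((A_S\circ M_S)\mathbf{1}_n\big) - (A_S\circ M_S) = \begin{pmatrix} \tilde{L}_C & 0 \\ 0 & \tilde{L}_{\bar{C}} \end{pmatrix}, \tag{15}$$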

where is the Laplacian matrix of weighted by . Notice it still holds that . This means .

On the other hand, let $L_S(M_S)$ be partitioned as:

$$L_S(M_S) = \mathrm{Diag}(M_S \mathbf{1}_n) - M_S = \begin{pmatrix} L_1 & L_3 \\ L_3' & L_2 \end{pmatrix}. \tag{16}$$

Using and to hit will yield: and . Apparently due to the positive semi-definiteness of the Laplacian matrix. If it is strictly positive, the proof is done. Otherwise this means . Note that all entries of are either 0 or negative due to the non-negativity of . This means , or equivalently for any . But this cannot happen, because and for any , which is a contradiction. So is connected.

For the other direction we need to show that any connected subgraph has a corresponding matrix , such that and for some and .

Let $M$ be defined as:

$$M_{ij} = \begin{cases} 1 & i \in S,\ j \in S \\ 0 & \text{otherwise} \end{cases}$$

This can be viewed as the adjacency matrix corresponding to a complete graph on the node set $S$. So it naturally involves a star graph centered at , and satisfies the linear constraints of .

Furthermore, the sub-block corresponding to , , is exactly the adjacency matrix of . Since is connected, the second smallest eigenvalue of is strictly positive. Notice that on the sub-block, . Again by Finsler’s Lemma, this means that there exists a , such that the LMI holds on the sub-block:

$$L_S(A_S \circ M_S) - \gamma L(M_S) \succeq 0$$
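The converse direction can be probed numerically. The sketch below is an illustration only, not the authors' code: it restricts attention to the sub-block on $S$, takes $M_S$ to be the all-ones matrix as above, and ignores the anchor and linear constraints. Under these assumptions $L(M_S) = |S| I - \mathbf{1}\mathbf{1}'$, so the LMI holds exactly when $\gamma \le \lambda_2(L(A_S))/|S|$. The helper names `laplacian`, `lmi_min_eig` and `max_gamma` are hypothetical.

```python
import numpy as np

def laplacian(W):
    """Unnormalized Laplacian Diag(W 1) - W of a symmetric matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def lmi_min_eig(A, S, gamma):
    """Smallest eigenvalue of L(A_S * M_S) - gamma * L(M_S), with M_S all ones."""
    S = list(S)
    A_S = np.asarray(A, dtype=float)[np.ix_(S, S)]
    M_S = np.ones((len(S), len(S)))
    Q = laplacian(A_S * M_S) - gamma * laplacian(M_S)  # * is the Hadamard product
    return np.linalg.eigvalsh(Q)[0]

def max_gamma(A, S):
    """lambda_2(L(A_S)) / |S|: largest gamma for which the LMI holds (needs |S| >= 2)."""
    S = list(S)
    A_S = np.asarray(A, dtype=float)[np.ix_(S, S)]
    return np.linalg.eigvalsh(laplacian(A_S))[1] / len(S)

# Toy example: the path graph 0 - 1 - 2 - 3.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

g_conn = max_gamma(A, [0, 1, 2])  # connected node set: strictly positive gamma
g_disc = max_gamma(A, [0, 1, 3])  # node 3 is cut off from {0, 1}: gamma collapses to 0
print(g_conn, g_disc)
```

In this toy example the connected set admits a strictly positive $\gamma$, while the disconnected set admits none, which mirrors the two directions of the proof.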

Proof of Theorem 4:

###### Proof.

For simplicity we provide a proof sketch for rectangle bands on a 2D lattice . We need to show that for a band belonging to , there exists a binary matrix such that , where depends only on .

Construct the matrix $M$ as follows:

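$$M_{ii} = \begin{cases} 1 & i \in S \\ 0 & \text{otherwise} \end{cases}, \qquad M_{ij} = \begin{cases} 1 & (i,j) \in E_S \ \text{or}\ i = a \ \text{or}\ j = a \\ 0 & \text{otherwise} \end{cases}$$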

Apparently , and . W.l.o.g. assume , and . We only need to consider the first sub-block of , denoted by . Notice is exactly the unnormalized Laplacian matrix of , and is the Laplacian of the union graph of and , where denote the star graph centered at node .

Let . is the adjacency matrix of a graph , where is obtained from by removing those edges connected with the anchor. We rewrite the required inequality:

$$Q_S(M_S; \gamma) = L(A_S \circ M_S) - \gamma L(M_S) = (1 - \gamma) L(A_S \circ M_S) - \gamma L(M_\Delta) \succeq 0$$

Since is obtained from by removing edges, we have . We will show , which implies . Therefore it suffices to show:

$$L(A_S \circ M_S) - 2\gamma L(M_{\mathrm{star}}) \succeq 0.$$

The rest follows from Lemma 8, which characterizes the value of $\gamma$ for which the above LMI holds. ∎

###### Lemma 8.

Let denote a $k$-node rectangle band with width and length on the 2D lattice, i.e. . Let $L$ be the graph Laplacian matrix corresponding to the rectangular lattice, and $L_{\mathrm{star}}$ be the graph Laplacian of the star graph with the same node set, centered at the bottom-left node. Then the following inequality holds for :

$$L - \gamma L_{\mathrm{star}} \succeq 0$$

###### Proof.

Assume the anchor node is node 1. It is equivalent to show that for any vector $f$,

$$f' L_{\mathrm{star}} f = \sum_{i \ge 2} (f_1 - f_i)^2 \le \frac{1}{\gamma} f' L f = \frac{1}{\gamma} \sum_{(i,j) \in E} (f_i - f_j)^2$$
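As a numerical aside (a sketch, not part of the original argument): both quadratic forms are unchanged when a constant is added to $f$, so one may fix $f_1 = 0$. Then $f' L_{\mathrm{star}} f$ becomes the squared norm of the remaining entries, and the largest admissible $\gamma$ is the smallest eigenvalue of $L$ with the anchor row and column removed. The function names and the row-major node indexing below are assumptions made for illustration.

```python
import numpy as np

def grid_laplacian(rows, cols):
    """Unnormalized Laplacian of a rows x cols lattice; node (r, c) has index r*cols + c."""
    n = rows * cols
    L = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):      # right and up neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    v = rr * cols + cc
                    L[u, u] += 1.0
                    L[v, v] += 1.0
                    L[u, v] -= 1.0
                    L[v, u] -= 1.0
    return L

def star_laplacian(n, anchor=0):
    """Laplacian of the star graph on n nodes centered at `anchor`."""
    L = np.eye(n)
    L[anchor, :] = -1.0
    L[:, anchor] = -1.0
    L[anchor, anchor] = n - 1
    return L

def best_gamma(L, anchor=0):
    """Largest gamma with L - gamma * L_star PSD: smallest eigenvalue of L without the anchor."""
    keep = [i for i in range(L.shape[0]) if i != anchor]
    return np.linalg.eigvalsh(L[np.ix_(keep, keep)])[0]

L = grid_laplacian(3, 4)            # a small rectangle band, anchor at node (0, 0)
L_star = star_laplacian(12)
gamma = best_gamma(L)
print(gamma, np.linalg.eigvalsh(L - gamma * L_star)[0])
```

The second printed value should be zero up to rounding, since the inequality is tight at this $\gamma$.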

We first investigate the simple case where , i.e. is a $k$-node line graph. In this scenario . We use the Cauchy-Schwarz inequality to bound each $(f_1 - f_i)^2$ using the edges on the path from node 1 to node $i$:

$$(f_1 - f_i)^2 = \Big( \sum_{j=1}^{i-1} (f_j - f_{j+1}) \Big)^2 \le (i - 1) \sum_{j=1}^{i-1} (f_j - f_{j+1})^2$$

Summing over all $i = 2, \dots, k$, we have:

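$$
\begin{aligned}
\sum_{i=2}^{k} (f_1 - f_i)^2 &\le \sum_{i=2}^{k} \Big[ (i - 1) \sum_{j=1}^{i-1} (f_j - f_{j+1})^2 \Big] \\
&= \Big( \sum_{i=1}^{k-1} i \Big) (f_1 - f_2)^2 + \Big( \sum_{i=2}^{k-1} i \Big) (f_2 - f_3)^2 + \dots + (k - 1)(f_{k-1} - f_k)^2 \\
&\le \frac{k^2}{2} \sum_{j=1}^{k-1} (f_j - f_{j+1})^2
\end{aligned}
$$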

Therefore the inequality holds for the line graph.
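The chain of inequalities for the line graph can be checked on concrete vectors. The following is a minimal sketch (the function name `line_graph_chain` is hypothetical) that evaluates the left-hand side, the intermediate expression with per-edge coefficients $\sum_{m=j}^{k-1} m$, and the final bound $\frac{k^2}{2}\sum_j (f_j - f_{j+1})^2$.

```python
import numpy as np

def line_graph_chain(f):
    """Return (lhs, mid, rhs) of the line-graph bound for a vector f of length k >= 2."""
    f = np.asarray(f, dtype=float)
    k = f.size
    d2 = np.diff(f) ** 2                      # (f_j - f_{j+1})^2 for j = 1..k-1
    lhs = np.sum((f[0] - f[1:]) ** 2)         # f' L_star f with node 1 as anchor
    # Edge (j, j+1) lies on the path to every node i > j, contributing (i - 1) each time,
    # so its total coefficient is j + (j+1) + ... + (k-1).
    coeff = np.array([sum(range(j, k)) for j in range(1, k)], dtype=float)
    mid = np.sum(coeff * d2)
    rhs = 0.5 * k ** 2 * np.sum(d2)           # (k^2 / 2) * f' L f
    return lhs, mid, rhs

lhs, mid, rhs = line_graph_chain([0.0, 1.0, 2.0, 3.0])
print(lhs, mid, rhs)
```

For a linearly increasing $f$ the Cauchy-Schwarz step is tight, so the first two values coincide, and only the last relaxation to $k^2/2$ is loose.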

Now w.l.o.g. assume and . We first show that to cover the nodes in the lower triangle, is enough. The strategy is similar: construct paths from the anchor to each node, and apply the Cauchy-Schwarz inequality to make use of the edges on these paths. Two points need care:
(1) Paths need to be constructed very carefully so that each edge of is not used too often;
(2) It is inevitable that some edges will be used much more frequently than others, for example, the edges coming out of the anchor. A weighted Cauchy-Schwarz inequality should therefore be applied to alleviate this effect.

Let each node be indexed by its coordinates, so that $(0,0)$ is the anchor node. To help understand the construction, we introduce some notation. A node is “critical” if for some integer , as marked by red solid circles in Fig. 5. Let denote the collection of nodes on the -th “boundary”. The anchor node is the only node in , and the outermost boundary is . Apparently .

We build a complete balanced binary tree based on all critical nodes with tree edges , where denotes a critical node in . We note down several observations for paths from anchor to each :
(1) There is a unique path starting from anchor to each , passing through critical nodes , for .
(2) Such a path, denoted by where , is composed of tree edges, for , with .
(3) For any two such paths, after they split at some node, they will never share any graph edges.

Now consider a path from to some , . We use the weighted Cauchy-Schwarz inequality to bound this path with graph edges:

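$$
\begin{aligned}
(f_{v_0} - f_{v_p})^2 &= \Big( \sum_{i=0}^{p-1} (f_{v_i} - f_{v_{i+1}}) \Big)^2 \\
&= \Big( (f_{v_0} - f_{v_1}) + \sum_{(i,j) \in (v_1, v_2)} (f_i - f_j) + \dots + \sum_{(i,j) \in (v_{p-1}, v_p)} (f_i - f_j) \Big)^2 \\
&\le \big( 1 \times 2^{p-1} + 2 \times 2^{p-2} + \dots + 2^{p-1} \times 1 \big) \cdot \Big( \frac{(f_{v_0} - f_{v_1})^2}{2^{p-1}} + \sum_{(i,j) \in (v_1, v_2)} \frac{(f_i - f_j)^2}{2^{p-2}} + \dots + \sum_{(i,j) \in (v_{p-1}, v_p)} \frac{(f_i - f_j)^2}{1} \Big) \\
&= p \Big( (f_{v_0} - f_{v_1})^2 + 2 \sum_{(i,j) \in (v_1, v_2)} (f_i - f_j)^2 + \dots + 2^{p-1} \sum_{(i,j) \in (v_{p-1}, v_p)} (f_i - f_j)^2 \Big)
\end{aligned}
$$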

The intuitive idea is that the graph edges composing tree edges closer to the anchor, i.e. for small where , will be passed through many more times than those composing tree edges far away from the anchor. So when applying the weighted Cauchy-Schwarz inequality, a larger denominator is imposed on for those for small . For example, for the most frequently used edge $(v_0, v_1)$, a penalty of $2^{p-1}$ is imposed on these edges (2 such edges, ((0,0),(0,1)) and ((0,0),(1,0))), while for those graph edges composing $(v_{p-1}, v_p)$, only a constant is put in the denominator.
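The weighted Cauchy-Schwarz step, $(\sum_e x_e)^2 \le (\sum_e w_e)(\sum_e x_e^2 / w_e)$ for positive weights $w_e$, can be illustrated with a short sketch. The edge counts used here are an assumption read off from the factor $1 \times 2^{p-1} + 2 \times 2^{p-2} + \dots + 2^{p-1} \times 1$ in the bound: the $i$-th tree edge on the path ($i = 0, \dots, p-1$) is taken to consist of $2^i$ graph edges, each with weight $2^{p-1-i}$. The function names are hypothetical.

```python
import numpy as np

def path_weights(p):
    """Per-graph-edge weights along a path of p tree edges (assumed layout, see lead-in)."""
    return np.concatenate(
        [np.full(2 ** i, 2.0 ** (p - 1 - i)) for i in range(p)]
    )

def weighted_cs(x, w):
    """Return (lhs, rhs) of (sum x)^2 <= (sum w) * (sum x^2 / w) for positive weights w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    lhs = x.sum() ** 2
    rhs = w.sum() * np.sum(x ** 2 / w)
    return lhs, rhs

p = 4
w = path_weights(p)                 # total weight is p * 2^(p-1)
rng = np.random.default_rng(0)
x = rng.normal(size=w.size)         # stand-ins for the edge differences f_i - f_j
lhs, rhs = weighted_cs(x, w)
print(w.sum(), lhs <= rhs)
```

With these weights the prefactor equals $p \cdot 2^{p-1}$, so dividing by the per-edge weight leaves the coefficients $p, 2p, \dots, 2^{p-1} p$ seen in the last line of the bound.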

Next we need to determine how often each graph edge is used when covering all the nodes. By induction it is not hard to see that the graph edges on the tree edge will be traversed by at most paths. Take the graph of Fig. 5 as an example. Each path is of the form , . The edges on are used at most 8 times, e.g. . We have . The edges on