Nonparametric change point detection for growing trees and code to reproduce figures in Banerjee, Bhamidi and Carmichael, 2018
Motivated by applications, both for modeling real world systems as well as in the study of probabilistic systems such as recursive trees, the last few years have seen an explosion in models for dynamically evolving networks. The aim of this paper is twofold: (a) develop mathematical techniques based on continuous time branching processes (CTBP) to derive quantitative error bounds for functionals of a major class of these models about their large network limits; (b) develop general theory to understand the role of abrupt changes in the evolution dynamics of these models, from which one can develop non-parametric change point detection estimators. In the context of the second aim, for fixed final network size n and a change point τ(n) < n, we consider models of growing networks which evolve via new vertices attaching to the pre-existing network according to one attachment function f till the system grows to size τ(n), when new vertices switch their behavior to a different function g till the system reaches size n. With general non-explosivity assumptions on the attachment functions f, g, we consider both the standard model where τ(n) = Θ(n) as well as the quick big bang model when τ(n) = n^γ for some 0 < γ < 1. Proofs rely on a careful analysis of an associated inhomogeneous continuous time branching process. Techniques developed in the paper are robust enough to understand the behavior of these models for any sequence of change points τ(n)→∞. This paper derives rates of convergence for functionals such as the degree distribution; the same proof techniques should enable one to analyze more complicated functionals such as the associated fringe distributions.
Driven by the explosion in the amount of data on various real world networks, the last few years have seen the emergence of many new mathematical network models. Motivations behind these models are diverse, including (a) extracting unexpected patterns such as densely connected regions in the network (e.g. community detection); (b) understanding properties of dynamics on these real world systems such as the spread of epidemics, the efficacy of random walk search algorithms etc; (c) most relevant for this study, understanding mechanistic reasons for the emergence of empirically observed properties of these systems such as heavy tailed degree distribution or the small world property. We refer the interested reader to [albert2002statistical, newman2003structure, newman2010networks, bollobas2001random, durrett-rg-book, van2009random] and the references therein for a starting point to the vast literature on network models. A small but increasingly important niche is the setting of dynamic network models, networks that evolve over time. In the context of probabilistic combinatorics, in particular in the study of growing random trees, these models have been studied for decades in the vast field of recursive trees, see [mahmoud2008polya, bergeron1992varieties, flajolet2009analytic, drmota2009random] and the references therein. To fix ideas, consider one of the standard examples: start with a base graph $G_0$ (e.g. two vertices connected by an edge) and an attachment function $f : \mathbb{N} \to \mathbb{R}_+$ where $\mathbb{N} = \{0, 1, 2, \ldots\}$. For each fixed time $n \ge 1$, having constructed the network $G_{n-1}$ at time $n-1$, the network transitions to $G_n$ as follows: a new vertex enters the system and attaches to a pre-existing vertex $v$ with probability proportional to $f(\deg(v))$, where $\deg(v)$ is the current degree of this vertex. The case of $f \equiv 1$ corresponds to the famous class of random recursive trees [smythe1995survey]. The specific case of $f(k) = k$ was considered in [barabasi1999emergence] where they showed, via non-rigorous arguments, that the resulting graph has a heavy tailed degree distribution with exponent $3$ in the large $n$ limit; this was rigorously proved in [Bollobas:2001:DSS:379831.379835].
This paper has the following two major aims:
In the context of models described above, asymptotics in the large network limit for a host of random tree models as well as corresponding functionals have been derived, ranging from the degree distribution to the so-called fringe distribution [aldous1991asymptotic, holmgren2017fringe, bhamidi2007universal] of random trees. One of the major drivers of research has been proving convergence of the empirical distribution of these functionals to limiting (model dependent) constants. Establishing (even suboptimal) rates of convergence for these models has been non-trivial other than for models related to urn models, e.g. see the seminal work of Janson [janson2004functional]. The aim of this paper is to develop robust methodology for proving such error bounds for general models. Our results will not be optimal owing to the generality of the model considered in the paper; however, the techniques in this paper coupled with higher moment assumptions can lead to more refined results for specific models. To keep the paper to manageable length, we focus on the degree distribution, but see Section 4 for our work in progress on using the methodology in this paper for more general functionals.
Consider general models of network evolution as described in the above paragraph but wherein, beyond some point, new individuals entering the system change their evolution behavior. This is reflected via a change in the attachment function $f$ to a different attachment function $g$.
We first aim to understand the effect of change points on structural properties of the network model and the interplay between the time scale of the change point and the nature of the attachment functions before and after the change point. Analogous to classical change point detection, we start by considering models which evolve for $n$ steps with a change point at time $\gamma n$ for $0 < \gamma < 1$; we call this the standard model. Counter-intuitively, we find that irrespective of the value of $\gamma$, structural properties of the network such as the tail of the degree distribution are determined by model parameters before the change point; motivated by this we consider other time scales of the change point (which we call the quick big bang model) to see the effect of the long range dependence phenomenon in the evolution of the process.
We then develop nonparametric change point detection techniques for the standard model when one has no knowledge of the attachment functions, pre or post change point.
Fix $J \ge 0$. For each $0 \le j \le J$, fix functions $f_j : \{0, 1, 2, \ldots\} \to \mathbb{R}_+$, which we will refer to as attachment functions. Let us start by describing the model when $J = 0$, and we have one attachment function $f_0$. This setting will be referred to as nonuniform random recursive trees [szymanski1987nonuniform] or attachment model. We will grow a sequence of random trees $\{\mathcal{T}_m : 2 \le m \le n\}$ as follows:
For $m = 2$, $\mathcal{T}_2$ consists of two vertices attached by a single edge. Label these using $\{1, 2\}$ and call the vertex $v = 1$ the “root” of the tree. We will think of the tree as directed with edges being pointed away from the root (from parent to child).
Fix $m > 2$. Let the vertices in $\mathcal{T}_{m-1}$ be labeled by $\{1, \ldots, m-1\}$. For each vertex $v \in \mathcal{T}_{m-1}$ let $\text{out-deg}(v)$ denote the out-degree of $v$. A new vertex labelled by $m$ enters the system. Conditional on $\mathcal{T}_{m-1}$, this new vertex attaches to a currently existing vertex $v$ with probability proportional to $f_0(\text{out-deg}(v))$. Call the vertex that $m$ attaches to the “parent” of $m$ and direct the edge from this parent to $m$, resulting in the tree $\mathcal{T}_m$.
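The construction above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `grow_tree` and `out_degree_counts` are hypothetical, the attachment function is passed in as an ordinary Python callable on out-degrees, and the parent is sampled by recomputing all weights at every step (quadratic time, which is adequate only for small trees).

```python
import random
from collections import Counter


def grow_tree(n, f, seed=None):
    """Grow a random tree on vertices 1..n (hypothetical helper).

    Vertex 1 is the root and vertex 2 is its first child. Every later
    vertex m picks its parent v among the existing vertices with
    probability proportional to f(out_degree(v)).
    Returns (parent, out_deg), both dicts keyed by vertex label.
    """
    rng = random.Random(seed)
    parent = {1: None, 2: 1}
    out_deg = {1: 1, 2: 0}
    for m in range(3, n + 1):
        vertices = list(out_deg)
        weights = [f(out_deg[v]) for v in vertices]
        v = rng.choices(vertices, weights=weights, k=1)[0]
        parent[m] = v       # edge directed from parent v to child m
        out_deg[v] += 1
        out_deg[m] = 0      # the new vertex starts as a leaf
    return parent, out_deg


def out_degree_counts(out_deg):
    """Number of vertices with each out-degree; key 0 counts the leaves."""
    return Counter(out_deg.values())
```

For instance, `grow_tree(1000, lambda k: 1.0)` corresponds to uniform attachment and `grow_tree(1000, lambda k: k + 1.0)` to a linear attachment function.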
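Model with change point: Next we define the model with $J$ distinct change points. Fix attachment functions $f_0, f_1, \ldots, f_J$. For $n \ge 1$, fix $J$ distinct times $0 < \tau_1 < \tau_2 < \cdots < \tau_J < n$. Let $\mathbf{f} = (f_0, \ldots, f_J)$ and $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_J)$ and write $\boldsymbol{\theta} = (\mathbf{f}, \boldsymbol{\tau})$ for the driving parameters of the process. For notational convenience, let $\tau_0 = 2$ and $\tau_{J+1} = n$. Consider a sequence of random trees $\{\mathcal{T}^{\boldsymbol{\theta}}_m : 2 \le m \le n\}$ constructed as follows. For $2 < m \le \tau_1$, the process evolves as in the non-change point model using the attachment function $f_0$. We will call this the initializer function. Then for each change point index $1 \le j \le J$ and time $m \in (\tau_j, \tau_{j+1}]$, the process evolves according to the function $f_j$, i.e. each new vertex entering the system at time $m$ attaches to a pre-existing vertex $v$ with probability proportional to $f_j(\text{out-deg}(v))$.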
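The change point model can be simulated in the same way by switching the attachment function as the tree grows. The sketch below is illustrative only: the name `grow_tree_with_change_points` is hypothetical, and the boundary convention (the vertex arriving exactly at a change point still uses the earlier function) is an assumption made here for concreteness.

```python
import random


def grow_tree_with_change_points(n, fs, taus, seed=None):
    """Grow a tree on n vertices with attachment functions switching in time.

    fs   : list of callables, one more than the number of change points
    taus : increasing list of change points (tree sizes)
    Assumed convention: the vertex arriving at time m uses fs[j], where j is
    the number of change points strictly smaller than m.
    Returns (parent, out_deg) as lists; index i refers to vertex i + 1.
    """
    assert len(fs) == len(taus) + 1
    rng = random.Random(seed)
    parent = [None, 1]   # vertex 1 is the root, vertex 2 its child
    out_deg = [1, 0]
    j = 0
    for m in range(3, n + 1):
        while j < len(taus) and m > taus[j]:
            j += 1       # move to the next attachment function
        f = fs[j]
        weights = [f(d) for d in out_deg]
        v = rng.choices(range(len(out_deg)), weights=weights, k=1)[0]
        parent.append(v + 1)
        out_deg[v] += 1
        out_deg.append(0)
    return parent, out_deg
```

With `fs = [lambda k: 1.0, lambda k: k + 1.0]` and `taus = [n // 2]`, the tree grows by uniform attachment up to size `n // 2` and by linear attachment afterwards.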
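We use $\preceq_{\mathrm{st}}$ for stochastic domination between two real valued probability measures. For $J \ge 1$ let $[J] := \{1, 2, \ldots, J\}$. If $Y$ has an exponential distribution with rate $\lambda$, write this as $Y \sim \exp(\lambda)$. Write $\mathbb{Z}$ for the set of integers, $\mathbb{R}$ for the real line and let $\mathbb{Z}_+ := \{0, 1, 2, \ldots\}$, $\mathbb{R}_+ := (0, \infty)$. Write $\xrightarrow{\text{a.e.}}$, $\xrightarrow{P}$, $\xrightarrow{d}$ for convergence almost everywhere, in probability and in distribution respectively. For a non-negative function $n \mapsto g(n)$, we write $f(n) = O(g(n))$ when $|f(n)|/g(n)$ is uniformly bounded, and $f(n) = o(g(n))$ when $\lim_{n\to\infty} f(n)/g(n) = 0$. Furthermore, write $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $g(n) = O(f(n))$. Finally, we write that a sequence of events $(A_n)_{n \ge 1}$ occurs with high probability (whp) when $\mathbb{P}(A_n) \to 1$. For a sequence of increasing rooted trees $\{\mathcal{T}_n\}$ (random or deterministic), we will assume that edges are directed from parent to child (with the root as the original progenitor). For any $n$, note that for all vertices $v \in \mathcal{T}_n$ but the root, the degree of $v$ equals the out-degree of $v$ plus one (the extra edge being the one to its parent). For $n \ge 1$ and $k \ge 0$, let $D_n(k)$ be the number of vertices in $\mathcal{T}_n$ with out-degree $k$; thus $D_n(0)$ counts the number of leaves in $\mathcal{T}_n$.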
Here we set up the constructions needed to state the main results. We will need the following assumption on the attachment functions of interest in this paper. We mainly follow [jagers-ctbp-book, jagers1984growth, nerman1981convergence, rudas2007random].
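(i) Positivity: Every attachment function $f$ is assumed to be strictly positive, that is $f(k) > 0$ for all $k \ge 0$.
(ii) Every attachment function $f$ can grow at most linearly, i.e. there exists $C < \infty$ such that $\limsup_{k\to\infty} f(k)/k \le C$. This is equivalent to there existing a constant $C$ such that $f(k) \le C(k+1)$ for all $k \ge 0$.
(iii) Consider the following function defined via,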
Define . We assume,
Using (iii) of the above Assumption, let be the unique such that
This object is often referred to as the Malthusian rate of growth parameter.
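The Malthusian rate can be approximated numerically. The sketch below assumes that the function in question is the Laplace transform of the offspring point process with independent exponential inter-arrival times of rates $f(0), f(1), \ldots$, namely $\hat\rho(\lambda) = \sum_{k \ge 1} \prod_{i=0}^{k-1} f(i)/(\lambda + f(i))$ as in [rudas2007random], and that the Malthusian rate solves $\hat\rho(\lambda) = 1$. The names `rho_hat` and `malthusian_rate`, the truncation level and the bisection bracket are choices made here for illustration.

```python
def rho_hat(lam, f, terms=20000):
    """Truncated sum_{k>=1} prod_{i<k} f(i) / (lam + f(i))."""
    total, prod = 0.0, 1.0
    for i in range(terms):
        prod *= f(i) / (lam + f(i))
        total += prod
        if prod < 1e-16:   # remaining terms are negligible
            break
    return total


def malthusian_rate(f, lo=1e-6, hi=1e3, terms=20000, iters=60):
    """Bisection for the root of rho_hat(lam) = 1.

    rho_hat is decreasing in lam, so the root is bracketed whenever
    rho_hat(lo) > 1 > rho_hat(hi). Truncating the sum biases the root
    slightly downwards when the terms decay slowly.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rho_hat(mid, f, terms) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, uniform attachment $f \equiv 1$ gives $\hat\rho(\lambda) = 1/\lambda$ and hence rate $1$, while $f(k) = k + 1$ gives $\hat\rho(\lambda) = 1/(\lambda - 1)$ for $\lambda > 1$ and hence rate $2$; the numerical routine reproduces both up to truncation error.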
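Fix an attachment function $f$ as above. We can construct a point process $\xi_f$ on $\mathbb{R}_+$ as follows: Let $\{E_i : i \ge 0\}$ be a sequence of independent exponential random variables with $E_i \sim \exp(f(i))$. Now define $L_i := \sum_{j=0}^{i-1} E_j$ for $i \ge 1$. The point process $\xi_f$ is defined via,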
Abusing notation, we write for ,
Here we view as a measure on . We will need a variant of the above objects: for fixed , let denote the point process where the first inter-arrival time is namely define the sequence,
As above, . We abbreviate as and similarly .
Fix attachment function satisfying Assumption 2.1(ii). A continuous time branching process driven by , written as , is defined to be a branching process started with one individual at time and such that every individual born into the system has an offspring distribution that is an independent copy of the point process defined in (2.4).
We refer the interested reader to [jagers-ctbp-book, athreya1972] for general theory regarding continuous time branching processes. We will also use to denote the collection of all individuals at time . For , denote by the birth time of . Let denote the size (number of individuals born) by time . Note in our construction, by our assumption on the attachment function, individuals continue to reproduce forever. Write for the corresponding expectation i.e.,
Under Assumption 2.1(ii), it can be shown [jagers-ctbp-book, Chapter 3] that for all , , is strictly increasing with as . In the sequel, to simplify notation we will suppress dependence on $f$ in the various objects defined above and write etc. The connection between CTBP and the discrete random tree models in the previous section is given by the following result, which is easy to check using properties of the exponential distribution (and is the starting point of the Athreya-Karlin embedding [athreya1968]).
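Fix an attachment function $f$ and consider the sequence of random trees $\{\mathcal{T}_m : m \ge 2\}$ constructed using attachment function $f$. Consider the continuous time construction in Definition 2.2 and define for $m \ge 1$ the stopping times $T_m := \inf\{t \ge 0 : \text{the branching process has } m \text{ individuals at time } t\}$. Then, viewed as a sequence of growing random labelled rooted trees, the branching process observed at the times $\{T_m : m \ge 2\}$ has the same distribution as $\{\mathcal{T}_m : m \ge 2\}$.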
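The embedding can be made concrete with a short simulation. The sketch below is an illustration under stated assumptions, not the paper's code: the name `simulate_ctbp` is hypothetical, and the process is simulated with the standard competing-exponentials argument, in which the waiting time to the next birth is exponential with rate equal to the sum of all individual rates and the parent is chosen with probability proportional to its own rate.

```python
import random


def simulate_ctbp(n, f, seed=None):
    """Run the branching process until it contains n individuals.

    An individual with k children gives birth at rate f(k), independently
    of everyone else. Returns (birth_times, parent): birth_times[i] is the
    birth time of individual i (individual 0 is the ancestor, born at
    time 0) and parent[i] is the index of its parent.
    """
    rng = random.Random(seed)
    children = [0]        # current number of children of each individual
    parent = [None]
    birth_times = [0.0]
    t = 0.0
    while len(children) < n:
        rates = [f(k) for k in children]
        t += rng.expovariate(sum(rates))   # minimum of independent exponentials
        v = rng.choices(range(len(children)), weights=rates, k=1)[0]
        children[v] += 1
        children.append(0)
        parent.append(v)
        birth_times.append(t)
    return birth_times, parent
```

Here `birth_times[m - 1]` plays the role of the stopping time at which the population first reaches size `m`, and the sequence of parent choices has the same law as in the discrete attachment model, which is the content of the embedding.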
Consider a continuous time branching process with attachment function and Malthusian rate . For each , denote by the number of vertices in of degree and abbreviate to . Let be as in (2.3). Define the probability mass function via,
Here for $k = 0$, the empty product is by convention taken to be $1$. Verification that the above is an honest probability mass function can be found in [rudas2007random, Theorem 2]. Following the seminal work of [jagers-ctbp-book, jagers1984growth, nerman1981convergence, rudas2007random], it follows that for each $k \ge 0$ that
However, to obtain consistent change point estimators, we need to strengthen the above convergence to a sup-norm convergence on a time interval whose size goes to infinity with growing $n$, and we also need a quantitative rate for this convergence. Such results have been obtained for very specific attachment functions via functional central limit theorems (e.g. see [janson2004functional] for models whose degree evolution can be reduced to the evolution of urn processes satisfying regularity conditions and [resnick2015asymptotic] for the linear preferential attachment model), but do not extend to the general setting. We make the following assumptions throughout this section.
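As a concrete illustration of the limiting degree distribution, the sketch below evaluates a probability mass function of the form $p_k = \frac{\lambda^*}{\lambda^* + f(k)} \prod_{i=0}^{k-1} \frac{f(i)}{\lambda^* + f(i)}$, which is the form given in [rudas2007random, Theorem 2]; we assume here that this is the pmf referred to in the text. The name `limiting_pmf` is hypothetical, and the Malthusian rate is supplied by the caller.

```python
def limiting_pmf(f, lam, kmax):
    """Return [p_0, ..., p_kmax] for attachment function f and rate lam.

    p_k = lam / (lam + f(k)) * prod_{i<k} f(i) / (lam + f(i)),
    with the empty product (k = 0) equal to 1.
    """
    probs, prod = [], 1.0
    for k in range(kmax + 1):
        probs.append(lam / (lam + f(k)) * prod)
        prod *= f(k) / (lam + f(k))
    return probs
```

With $f \equiv 1$ and rate $1$ this gives $p_k = 2^{-(k+1)}$, a geometric distribution; with $f(k) = k + 1$ and rate $2$ it gives $p_k = 4/((k+1)(k+2)(k+3))$, the familiar cubic power law tail.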
There exists such that
Assumption 3.2 is implied by since
Fix a sequence of growing trees and recall that for any and , denotes the number of vertices with out-degree . The main theorem of this section is
In the notation of Jagers and Nerman [jagers-nerman-1, nerman1981convergence], the result above is stated for the “characteristic” corresponding to degree (see the discussion below). We believe our proof techniques are robust enough to generalize to more complex functionals such as the fringe distribution [aldous1991asymptotic, holmgren2017fringe]. We will pursue this in a separate paper. However below we describe one of the key estimates derived in this paper of more general relevance.
For special cases such as the uniform or linear preferential attachment, stronger results are obtainable via Janson’s “superball” argument [janson2004functional] as well as application of the Azuma-Hoeffding inequality [Bollobas:2001:DSS:379831.379835, van2009random]. However these do not appear to work for the general model considered in this paper.
Recall from [nerman1981convergence] that a characteristic is a non-negative random process , assigning some kind of score to the typical individual at age t. We assume for every . For this article, we will be interested in the following class of characteristics:
For any characteristic , define . This can be thought of as the sum of -scores of all individuals in . Write . For fixed and for the specific characteristic , write .
It is easy to check that for a general (integrable) characteristic , satisfies the renewal equation
Write when the limit exists. Following [nerman1981convergence], we write to denote that is the -th child of and define for any ,
Write . By Corollary 2.5 of [nerman1981convergence], converges almost surely to a finite random variable as . By Theorem 3.1 of [nerman1981convergence], for any . An important technical contribution of this paper is the following result.
Fix . We start by studying the model under the following assumption which we refer to as the “standard” model owing to the analogous assumptions for change point methodology in time series:
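Fix $J \ge 1$ and assume there exist $0 < \gamma_1 < \gamma_2 < \cdots < \gamma_J < 1$ such that for all $1 \le j \le J$, the $j$-th change point is $\tau_j = \lfloor n \gamma_j \rfloor$.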
To simplify notation we will drop . Recall the sequence of random trees . For any and , write for the number of vertices with out-degree . We will sometimes abuse notation and write to explicitly specify the dependence of this object on the underlying tree. In this section we mainly consider the case where there is exactly one change point at time for fixed . In Section 3.3 we describe the general result for multiple change points. The notation is cumbersome so this general case can be skipped over on an initial reading. We also give the proof for the single change point case; the general case follows via straight-forward extensions. Fix initializer attachment function and let be as in (2.3). Define the probability mass function via (3.1) with in place of . As before let the attachment function after change point be . Recall from (2.6), for fixed , the function and the function from (2.7). Also recall that, for fixed ,
Let denote the collection of all probability measures on . For each , consider the functional given by
where . Write for the -th co-ordinate of the above map. Let for denote the degree distribution of a random recursive tree grown with attachment function (i.e. without any change point). Corollary 7.2 shows that for each , there is a unique such that
Define for . Now, we are ready to state our main theorem on sup-norm convergence of degree distributions post-change point.
Suppose satisfy Assumption 2.1. For any and
There is a probabilistic way to view the limit which we now describe at the end of the construction of the process namely . Write for . Construct an integer valued random variable using the following auxiliary random variables:
Generate . Conditional on , generate point process and let , with as in (3.6). Now set .
Conditional on , generate random variable supported on the interval with distribution
Conditional on and , let , where as in (2.5), is the point process constructed using attachment function .
Now let . Let be the integer valued random variable defined as follows: with probability , and with probability , . The following is a restatement of the convergence result implied by Theorem 3.6 for time .
Write for the pmf of . The next result, albeit intuitively reasonable is non-trivial to prove in the generality of the models considered in the paper.
Assume that . Then for any one has . Thus the change point always changes the degree distribution.
The next result describes the tail behavior of the ensuing random variable.
The initializer function determines the tail behavior of in the sense that
If in the model without change point using , the degree distribution has an exponential tail then so does the model with change point irrespective of and .
If in the model without change point using , the degree distribution has a power law tail with exponent then so does the model with change point irrespective of and .
Suppose the initializer function is linear with for . For fixed , let denote the size of the -th maximal degree. Then as long as the function satisfies Assumption 2.1, is a tight collection of random variables bounded away from zero as .
Without change point, it is known [mori2007degree] that for each fixed , for a non-degenerate distribution. Thus the above result shows that irrespective of the second attachment function , the maximal degree asymptotics for linear preferential attachment remain unaffected. The proof of the above result follows via analogous arguments as [bhamidi2015change, Theorem 2.2] and thus we will not prove it in this paper.
Fix , with and let . Further fix attachment functions satisfying Assumption 2.1 and let . We start with the following recursive construction of a sequence of probability mass functions and positive constants .
(a) Initialization: For , let as in (3.1).
(b) Pre-epoch distribution: For , define the random variable .
(c) Recursion: For , define as the unique root of the equation:
(d) Epoch age distribution: Fix . Generate as above. Conditional on , generate random variable supported on the interval with distribution
(e) Alive after epoch degree distribution: Conditional on the random variables in (d) let where as before is the point process with attachment function .
(f) Mixture distribution: Finally define as the following mixture: with probability ; with probability , let .
Let be the probability mass function of .
With , write .
Now we consider the case where the change point happens “early” in the evolution of the process, where the change point scales like . To simplify notation, we specialize to the case , however our methodology is easily extendable to the general regime. Let denote the probability mass function as in (3.1) but using the function to construct in (2.3) and then in place of in (3.1).
Define for and any non-negative measure ,
We will work under the following assumption.
Recall that in the previous section, one of the messages was that the initializer function determined various macroscopic properties of the degree distribution for the standard model.
The form was assumed for simplicity. We believe the proof techniques are robust enough to handle any , where and . We defer this to future work.
The next result implies that the maximal degree does feel the effect of the change point. Instead of proving a general result we will consider the following special cases. Throughout denotes the maximal degree in .
Once again assume . Consider the following special cases:
(a) Uniform → Linear: Suppose whilst for fixed . Then with high probability as , for any sequence ,
(b) Linear → Uniform: Suppose for fixed whilst . Then with high probability as , for any sequence ,
(c) Linear → Linear: Suppose whilst where . Then is tight and bounded away from zero where
It is instructive to compare the above results to the setting without change point. For the uniform model, it is known [devroye:1995, szymanski1990maximum] that the maximal degree scales like $\log n$, whilst for linear preferential attachment the maximal degree scales polynomially in $n$ [mori2007degree]. Thus for example, (b) of the above result coupled with Theorem 3.15 implies that the limiting degree distribution in this case is the same as that of the uniform random recursive tree (URRT), namely Geometric with parameter $1/2$; however the maximal degree scales polynomially in $n$ and not like $\log n$ as in the URRT.
For any $\gamma$, the initial segment should always leave its signature in some functional of the process. See for example [bubeck2015influence, bubeck2017finding, curien2014scaling], where the evolution of the system (typically using linear preferential attachment, although [bubeck2017finding] also considered the uniform attachment case) starting from a fixed “seed” tree was considered and the aim was to detect (up to some level of accuracy) this seed tree after observing the grown tree. Similar heuristics suggest that in the context of our model, the initial segment of the process, however small, should show its signature at some level. We discuss this aspect further in Section 4.
Proofs of results for the quick big bang model are given in Section 8.
In this section, we discuss the statistical issues of actual change point detection from an observation of the network. We will only consider the standard model and one change point ($J = 1$). We do not believe the estimator below is “optimal” in terms of rates of convergence; however, the motivation behind proving the sup-norm convergence result Theorem 3.6 is to provide impetus for further research in obtaining better estimators.
Consider any two sequences satisfying , as . We define the following change point estimator:
The following theorem establishes the consistency of the above estimator.
From a practical point of view, for the proposed estimator to be close to the change point even for moderately large $n$, we should select the two sequences satisfying the above hypotheses so that the sequence controlling the first observation time grows as slowly as possible (which ensures that we look at the evolving tree not too early, before the ‘law of large numbers’ effect has set in) and the sequence controlling the detection threshold grows as quickly as possible (to ensure that the detection threshold is sufficiently close to zero to capture the change in degree distribution close to the change point). One reasonable choice is and .
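The idea of a threshold-based nonparametric detector can be illustrated with a small simulation. The sketch below is not the estimator defined in the paper, whose exact form is not reproduced in this text; it is a generic detector in the same spirit, which compares the empirical out-degree distribution at each time with a baseline taken at an early time and reports the first time a weighted distance exceeds a threshold. The names `degree_snapshots` and `detect_change`, the baseline time `m0`, the weights $2^{-k}$ and the `threshold` parameter are all assumptions made for illustration.

```python
import random
from collections import Counter


def degree_snapshots(n, f0, f1, tau, seed=None):
    """Grow a tree with one change point at size tau.

    Returns a dict mapping each size m in 2..n to the out-degree counts
    of the tree at size m. Vertices arriving at times m <= tau use f0,
    later ones use f1 (an assumed boundary convention).
    """
    rng = random.Random(seed)
    out_deg = [1, 0]
    counts = Counter({1: 1, 0: 1})
    snaps = {2: dict(counts)}
    for m in range(3, n + 1):
        f = f0 if m <= tau else f1
        weights = [f(d) for d in out_deg]
        v = rng.choices(range(len(out_deg)), weights=weights, k=1)[0]
        counts[out_deg[v]] -= 1
        out_deg[v] += 1
        counts[out_deg[v]] += 1
        out_deg.append(0)
        counts[0] += 1
        snaps[m] = dict(counts)
    return snaps


def detect_change(snaps, n, m0, threshold):
    """First m > m0 whose degree distribution is far from the one at m0.

    Distance: sum_k 2^{-k} |D_m(k)/m - D_{m0}(k)/m0|.
    Returns None if the threshold is never exceeded.
    """
    base = snaps[m0]
    for m in range(m0 + 1, n + 1):
        cur = snaps[m]
        dist = sum(
            2.0 ** -k * abs(cur.get(k, 0) / m - base.get(k, 0) / m0)
            for k in set(cur) | set(base)
        )
        if dist > threshold:
            return m
    return None
```

The trade-off described above shows up directly in the two tuning parameters: a small `m0` gives a noisy baseline, while a large `threshold` delays or prevents detection.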
Random recursive trees: Random recursive trees have now been studied for decades, motivated by a wide array of fields including convex hull algorithms, linguistics, epidemiology and first passage percolation and recently in the study of various coalescent processes. See [mahmoud2008polya, drmota2009random, smythe1995survey, devroye1998branching, goldschmidt2005random] and the references therein for starting points to this vast literature. For specific examples such as the uniform attachment or the linear attachment model with , one can use the seminal work of Janson [janson2004functional] via a so-called “super ball” argument to obtain functional central limit theorems for the degree distribution. Obtaining quantitative error bounds let alone weak convergence results in the general setting considered in this paper is much more non-trivial. Regarding proof techniques, we proceed via embedding the discrete time models into continuous time branching processes and then using martingale/renewal theory arguments for the corresponding continuous time objects to read off corresponding results for the discrete models; this approach goes back all the way to [athreya1968]. Limit results for the corresponding CTBPs in the setting of interest for this paper were developed in the seminal work of Jagers and Nerman [jagers-ctbp-book, jagers-nerman-1, nerman1981convergence]. One contribution of this work is to derive quantitative versions for this convergence, a topic less explored but required to answer questions regarding statistical estimation of the change point.
Fringe convergence of random trees: A second aim of this work (albeit not developed owing to space) is understanding rates of convergence of the fringe distribution. We briefly describe the context, referring the interested reader to [aldous1991asymptotic, holmgren2017fringe] for general theory and discussion of their importance in computer science. Let denote the space of all rooted (unlabelled) finite trees (with denoting the empty tree). Fix a finite non-empty rooted tree with root . For each let denote the sub-tree consisting of the set of vertices “below” namely vertices for which the shortest path from needs to pass through . View as an element in via rooting it at . The fringe distribution of
is the probability distribution on:
If is a sequence of random trees, one now obtains a sequence of random probability measures. Aldous in [aldous1991asymptotic] shows that convergence of the associated fringe measures implies convergence of the associated random trees locally to limiting infinite random trees with a single infinite path; this then implies convergence of a host of global functionals such as the empirical spectral distribution of the adjacency matrix, see e.g. [bhamidi2012spectra]. For a number of discrete random tree models, embedding these in continuous time models and using results of [jagers-nerman-1, nerman1981convergence] has implied convergence of this fringe distribution; however establishing rates of convergence has been non-trivial [holmgren2017fringe]. While many of the results in this paper are all formulated in terms of the degree distribution, the results and most of the proofs in Section 9 extend to more general characteristics such as the fringe distribution. To keep the paper to manageable length, this is deferred to future work.
General change point: Change point detection, especially in the context of univariate time series, has also matured into a vast field, see [csorgo1997limit, brodsky2013nonparametric]. Even in this context, consistent estimation, especially in the setting of multiple change points, is non-trivial and requires specific assumptions on the nature of the change; see e.g. [yao1988estimating] for work on estimating the change in mean of a sequence of independent observations from the normal distribution, [bai1997estimating, bai1998estimating, bai2003computation] for econometric time series settings including linear regression, and [olshen2004circular, zhang2007modified] for recent applications in the biological sciences. The only pre-existing work on change points for the evolving networks formulated in this paper that we are aware of was carried out in [bhamidi2015change], where one assumed linear attachment functions indexed by a single parameter. In this context, specialized computations specific to this model enabled one to derive change point detection estimators that were consistent. Unfortunately these techniques do not extend to the general case considered in this paper.
Open questions: In the context of rates of convergence, one natural question is to understand if one can obtain tighter bounds than those in Theorem 3.3 and in particular prove a functional central limit theorem (FCLT) with