
Connected Components for Infinite Graph Streams: Theory and Practice

Motivated by the properties of unending real-world cybersecurity streams, we present a new graph streaming model: X-Stream. We maintain a streaming graph and its connected components at single-edge granularity. In cybersecurity graph applications, input streams typically consist of edge insertions; individual deletions are not explicit. Analysts maintain as much history as possible and trigger customized bulk deletions when necessary. Despite a variety of dynamic graph processing systems and some canonical literature on theoretical sliding-window graph streaming, X-Stream is the first model explicitly designed to accommodate this usage model. Users can provide Boolean predicates to define bulk deletions. Edge arrivals are expected to occur continuously and must always be handled. X-Stream is implemented via a ring of finite-memory processors. We give algorithms to maintain connected components on the input stream, to answer queries about connectivity, and to perform bulk deletion. The system requires bandwidth for internal messages that is some constant factor greater than the stream arrival rate. We prove a relationship among four quantities: the proportion of query downtime allowed, the proportion of edges that survive an aging event, the proportion of duplicated edges, and the bandwidth expansion factor. In addition to presenting the theory behind X-Stream, we present computational results for a single-threaded prototype implementation. Stream ingestion rates are bounded by computer architecture. We determine this bound for X-Stream inter-process message-passing rates in Intel TBB applications on Intel Skylake processors: between one and five million graph edges per second. Our single-threaded prototype runs our full protocols through multiple aging events at between one half and one million edges per second, and we give ideas for speeding this up by orders of magnitude.


1 Introduction

We assume that the graph-edge stream is effectively infinite. This means that as long as the algorithm is running, it must always be prepared for the arrival of another edge. At any time, the system has seen only a finite set of edges and need store only a finite graph representation. However, this graph can be arbitrarily large and may eventually exceed any particular finite storage. In order to deal with this case, we define a streaming model to exploit distributed systems with huge aggregate memory and to handle bulk deletions customized by a user-provided deletion or “aging” predicate. The most straightforward predicate discards edges older than a certain timestamp, but users may require bulk deletions guided by different predicates. For example, they may wish to preserve some old edges of high value.
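To make this concrete, the following is a minimal sketch of what user-provided aging predicates might look like. The edge record and its field names (including the analyst-assigned value score) are our own illustrative assumptions, not part of the X-Stream specification.

#include <cstdint>
#include <functional>

// Illustrative edge record; the fields are assumptions for this sketch.
struct Edge {
    uint64_t u, v;       // endpoint vertex names
    uint64_t timestamp;  // arrival time of the most recent copy
    double value;        // hypothetical analyst-assigned importance score
};

// An aging predicate returns true iff the edge SURVIVES the bulk deletion.
using AgingPredicate = std::function<bool(const Edge&)>;

// The most straightforward predicate: keep edges newer than a cutoff.
AgingPredicate newer_than(uint64_t cutoff) {
    return [cutoff](const Edge& e) { return e.timestamp >= cutoff; };
}

// A predicate that also preserves old edges the analyst deems high-value.
AgingPredicate newer_than_or_valuable(uint64_t cutoff, double min_value) {
    return [=](const Edge& e) {
        return e.timestamp >= cutoff || e.value >= min_value;
    };
}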

Previous theoretical work on infinite graph streams in a sliding-window model does allow for automatic time-based expiration while maintaining connected components Crouch et al. (2013); McGregor (2014). We adapt and generalize these theoretical ideas by allowing user-defined deletion predicates and distributed computation. Furthermore, in Section 8 we briefly survey previous work on dynamic graph processing. For now, we simply note that while some of this literature achieves impressive edge ingestion rates, none of it explains how to continue ingestion indefinitely as the storage capacity fills up. These dynamic methods accept a stream of insertions and deletions, and if the former dominates the latter, the system will eventually fill and fail. In this paper, we spend most of our effort providing a theoretical basis for graph stream computations of arbitrary duration. We ensure that each edge is stored only once in our distributed model during normal conditions, and that we recover to that steady state in a predictable way after an aging event.

Many dynamic graph processing systems ingest edges concurrently in large blocks, making it potentially impossible to detect the emergence and disappearance of fine-grained detail, such as small components that merge into a giant component, as predicted by Aiello et al. (2000). We model ingestion at single-edge granularity to ensure that phenomena such as this will be observable.

Contributions

We give a distributed algorithm for maintaining streaming connected components. Processors are connected in a one-way ring, with only one processor connected to the outside. The algorithm is designed for cybersecurity monitoring and has the following set of features:

  • The system works with an arbitrary number of processors, allowing the system to keep an arbitrary number of edges.

  • Each edge is stored on only one processor, requiring $O(|E|)$ total space, so the asymptotic space complexity is optimal.

  • The system fails because of space only if all processors are holding edges to their maximum capacity.

  • Processing an edge or query requires almost-constant time per system “tick”: the time to run a union-find operation, governed by the inverse Ackermann function (see the union-find sketch after this list).

  • Connectivity-related queries, spanning-tree queries, etc., are answered at perfect granularity. Though there is some latency, the answer is exact with respect to the graph in the system at the time the query was asked. This is in contrast to systems that process edge changes in batches of up to millions of edges, allowing no finer granularity for queries.

  • Because some cyber phenomena do not have explicit edge deletions, the system removes edges only when required.

    • This edge deletion is done in bulk. Though querying is disabled during data structure repair, the system continues to ingest incoming edges, so there is no need to buffer or drop edges; deletion can be scheduled during a period of lower use, such as at night.

    • The analyst can select any (constant-time) edge predicate to determine which edges survive a bulk deletion. This allows analysts to keep edges they feel are of high value regardless of their age.

    • For age-based deletion, the system can trigger bulk deletion and select correct parameters for it automatically.

  • If the analyst selects legal values (depending on properties of the hardware and input stream) for how many edges survive a bulk deletion and what fraction of the time the system must answer queries, the system will run indefinitely.
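The near-constant per-tick cost cited in this list comes from union-find with standard optimizations. Below is a minimal, self-contained sketch (ours, not the prototype's code) of the structure each building processor maintains; path halving plus union by rank yields the inverse-Ackermann amortized bound.

#include <cstdint>
#include <unordered_map>
#include <utility>

// Union-find over arbitrary 64-bit vertex names, as in an edge stream.
class UnionFind {
    std::unordered_map<uint64_t, uint64_t> parent;
    std::unordered_map<uint64_t, int> rank_;
public:
    uint64_t find(uint64_t x) {
        if (!parent.count(x)) { parent[x] = x; rank_[x] = 0; }
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    // Returns true iff edge (a, b) joined two components (a tree edge).
    bool unite(uint64_t a, uint64_t b) {
        uint64_t ra = find(a), rb = find(b);
        if (ra == rb) return false;  // non-tree edge: no new connectivity
        if (rank_[ra] < rank_[rb]) std::swap(ra, rb);
        parent[rb] = ra;
        if (rank_[ra] == rank_[rb]) ++rank_[ra];
        return true;
    }
};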

2 Preliminaries

2.1 Modeling the graph through time

We mark time based upon the arrival of any input stream element. Time starts at zero and increments whenever a stream element arrives. The input stream at time $t$ is the ordered stream that has arrived between time $0$ and time $t$. A stream element is an input edge, a query (e.g., a connectivity query), or a command (e.g., an aging command). In Section 3 we formalize the operation of our new model, X-Stream, at each of these ticks.

Definition 1
Active Edge:

At any time $t$, we say an edge is active if it has entered the system and no subsequent aging command has removed it.

Active Graph:

At any time $t$, the active graph is $G_t = (V_t, E_t)$. The edge set $E_t$ is the set of active edges at time $t$, and the vertex set $V_t$ is the set of endpoints of $E_t$.

Active Stream:

At any time $t$, the active stream is the subset of the input stream consisting only of active edges.

Note that $G_t$ differs from $G_{t-1}$ iff the stream element arriving at time $t$ is either an edge not already in $E_{t-1}$ or an aging command. In the latter case, $E_t$ is the set of edges that survive the aging.

2.2 Streaming models

In classic streaming models, computing systems receive data in a sequence of pieces and do not have space to store the entire data set. As in online algorithms, systems must make irreversible decisions without knowledge of future data arrivals. Graph streams are data streams in which a sequence of edges arrives at the computing system, which may assemble some of the edges into a graph data structure. Applications include modeling the spread of diseases in health-care networks, analysis of social networks, and security and anomaly detection in computer-network data. We focus on cybersecurity applications, in which analysts can infer interesting information from graphs that model relationships between entities. As the scale of such graphs increases, analysts will need algorithms to at least partially automate stream analysis.

We present detailed algorithms and a complete implementation of the real-time graph-mining methodology introduced in Berry et al. (2013). In this streaming model, the full graph is stored in a distributed system. The model is also capable of bulk edge deletion, while continuing to accept new edges. The algorithm continuously maintains connected-component information. It can answer queries about components and connectivity, except during a period of data-structure repair immediately following a bulk delete.

In classic graph streaming models, such as in Munro and Paterson (1980); Muthukrishnan et al. (2005); Raghavan and Henzinger (1999), the input is a finite sequence of edges, each consisting of a pair of vertices. The edge sequence is an arbitrary permutation of the graph's edges, which may include duplicates. The output consists of each vertex, along with a label, such that two vertices have the same label if and only if they belong to the same component. Algorithms for the classic streaming model have two parameters: the number of times the algorithm can see the input stream (the number of passes over the stream), and the storage space available to the algorithm.

Figure 1: Example of the first pass of the DFR connected components algorithm. The union-find structure in the upper left has a capacity of four union operations. This pass ingests five edges (shown in red) before filling, creating three supernodes that encapsulate four of the ingested vertices; one ingested edge was redundant. The remaining edges are relabeled and emitted as stream $A_1$, representing the contracted graph. Then the contents of the union-find structure are emitted (this is stream $B_1$).

The W-Stream model, developed in Demetrescu et al. (2009), uses the concept of multiple passes. Each pass can emit a customized intermediate stream. W-Stream can support several graph algorithms. However, our work is specifically based upon the connected components algorithm of Demetrescu et al. (2009), which we call DFR to recognize the authors: Demetrescu, Finocchi, and Ribichini.

In the streamsort model, introduced in Aggarwal et al. (2004), algorithms can create and process intermediate temporary streams, and reorder the contents of intermediate streams at no cost.

2.3 W-Stream

Because our work is an extension of the DFR finite-stream connected-components algorithm based on the W-Stream model, we describe that algorithm and model in detail now. DFR uses graph contraction. In contraction, one or more connected subgraphs are each replaced by a supernode. Edges within a contracted subgraph disappear, and edges with one endpoint inside a supernode now connect to the supernode. For example, in Figure 1, a connected subgraph (a triangle) is contracted into a supernode.

In each pass, the W-Stream connected components algorithm ingests a finite stream and outputs a modified stream to read for the next pass. Each stream represents a progressively more contracted graph, where connected subgraphs contract to a node, until a single node represents each connected component. The stream at each pass comes in two parts. The first (A) part is the current, partially contracted graph, and the second (B) part lists the graph nodes buried inside supernodes. The initial input stream has all the graph edges in part A and an empty part B. The final output stream has an empty part A, and its part B gives a component label for each vertex. Figure 1 illustrates the input and output of the first pass of the W-Stream connected-components algorithm.

During pass $i$, the algorithm ingests streams $A_{i-1}$ and $B_{i-1}$, in that order. First, it computes connected components using union-find data structures until it runs out of memory. More formally, $m$ is the capacity of the W-Stream processor, measured in union operations. As shown in Figure 1, the set representative is one of the set elements, for reasons given later. This union-find stage ingests a prefix of stream $A_{i-1}$. Because its memory is now full, the processor must emit information about the remaining stream rather than ingesting it. The algorithm incorporates what it has learned about the graph's connected components into the input for the next pass. Specifically, in the contracted graph corresponding to stream $A_i$, each set in the union-find data structure is represented as a supernode.

The DFR algorithm now generates stream $A_i$ from the remainder of stream $A_{i-1}$. For each remaining edge, if an endpoint is buried within a supernode, we relabel that endpoint to the supernode's name. For example, in Figure 1, every remaining edge incident on a buried node is relabeled to the encapsulating supernode. If both endpoints are relabeled to the same supernode, DFR drops the edge according to contraction rules.

We now describe how to process stream $B_{i-1}$ and to emit stream $B_i$. In the first pass, stream $B_0$ is empty. Stream $B_i$ tells which nodes are “buried” in the newly-created supernodes. Specifically, stream $B_i$ is a set of pairs $(x, S)$, where node $x$ is buried inside supernode $S$. Node $x$ will never appear in any future stream $A_j$. To process a non-empty stream $B_{i-1}$, we use the same relabeling strategy we used while emitting stream $A_i$. However, the interpretation is different. If $(x, S)$ is in stream $B_{i-1}$, and supernode $S$ is now part of a union-find set with representative $S'$, we emit $(x, S')$ to stream $B_i$. This means that node $x$ is part of supernode $S'$ in stream $B_i$.

This process repeats until pass $j$, where $A_j$ is the empty stream. Stream $B_j$ can be interpreted as connected-component labels: two nodes have the same supernode label if and only if they are in the same connected component.
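Under our reconstruction of the stream notation ($A_i$ edges, $B_i$ pairs), one DFR pass can be sketched as follows. The capacity parameter m, the in-memory containers, and the min-representative naming rule are simplifying assumptions for illustration; this is not the authors' code.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Node = uint64_t;
using Pair = std::pair<Node, Node>;  // an edge in A, or (node, supernode) in B

// One DFR pass: ingest A_prev then B_prev, emit A_next and B_next.
void dfr_pass(const std::vector<Pair>& a_prev, const std::vector<Pair>& b_prev,
              std::size_t m, std::vector<Pair>& a_next, std::vector<Pair>& b_next) {
    std::unordered_map<Node, Node> parent;  // sparse union-find, min representative
    auto find = [&](Node x) {
        while (parent.count(x) && parent.at(x) != x) x = parent.at(x);
        return x;
    };
    std::size_t unions = 0, i = 0;
    // Phase 1: union-find on a prefix of A_prev until capacity m is spent.
    for (; i < a_prev.size() && unions < m; ++i) {
        Node ru = find(a_prev[i].first), rv = find(a_prev[i].second);
        if (ru != rv) { parent[std::max(ru, rv)] = std::min(ru, rv); ++unions; }
    }
    // Phase 2: relabel the rest of A_prev; drop edges inside one supernode.
    for (; i < a_prev.size(); ++i) {
        Node ru = find(a_prev[i].first), rv = find(a_prev[i].second);
        if (ru != rv) a_next.emplace_back(ru, rv);
    }
    // Phase 3: relabel B_prev pairs, then dump this pass's buried nodes.
    for (const auto& [x, s] : b_prev) b_next.emplace_back(x, find(s));
    for (const auto& [child, par] : parent)
        if (child != par) b_next.emplace_back(child, find(child));
}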

We now summarize the argument from Demetrescu et al. (2009) that the DFR algorithm is correct. The $B_i$ streams are pairs of vertices from the original graph that are in the same connected component, by correctness of the union-find algorithm. We can therefore interpret each pair as an edge in a star-graph representation of a (partial) connected component (in their lingo, a “collection of stars”).

Correctness of DFR follows from maintaining the following invariant at each pass.

Invariant 1

(Demetrescu et al. (2009) Invariant 2.2) For each $i$, $B_i$ is a collection of stars, and $A_i \cup B_i$ has the same connected components as the original input graph.

Observation 1

DFR computes the same set of connected components for any permutation of the input stream, and any arbitrary duplication of stream elements.

Figure 2: Implementation of the DFR connected components algorithm in the W-Stream model. The output of pass $i$ must be stored to disk, then read back in as input for pass $i+1$.

3 X-Stream model

To motivate our X-Stream model, we first consider how the theoretical W-Stream model might be implemented. We show a plausible solution in Figure 2. A single processor reads and writes from files that store intermediate streams. As these files may be of arbitrary size, a direct implementation of W-Stream is only a notional idea. X-Stream is a theoretical variant of W-Stream that can be implemented efficiently.

In graph terms, W-Stream stores only spanning tree edges. It may drop any non-tree edge since there is no concept of deletion in that model. Our X-Stream model must accommodate bulk deletion by design since its input stream is infinite, and this means that non-tree edges must be retained. If a spanning-tree edge is deleted, then some non-tree edge might reconnect the graph.

Our X-Stream model supports W-Stream-like computations on infinite streams of graph edges. We present XS-CC, a connected components algorithm analogous to DFR but implemented in the X-Stream model. Concurrent, distributed processing allows the XS-CC algorithm to handle streams without end markers, which are necessary in the DFR algorithm to distinguish between stream passes. Aging also allows the X-Stream model to manage unending edge streams in finite space. X-Stream has a ring of processors, plus an I/O processor, shown in Figure 3. The I/O processor passes the input stream of edges, as well as queries and other external commands, to the first processor of the ring. The I/O processor also outputs information received from the ring, such as query responses.

Let $P_1, \ldots, P_p$ be the set of non-I/O processors defining the current state of the system. Processor $P_1$ is the head (also called $P_h$), processor $P_p$ is the tail (also called $P_t$), and we define successor and predecessor functions as usual: $\mathrm{succ}(P_i) = P_{i+1}$ for $i < p$, and $\mathrm{pred}(P_i) = P_{i-1}$ for $i > 1$. Thus each processor passes data to its successor, including the tail, which passes data back to the head.

Each $P_i$ has edge storage capacity $s_i$; for simplification we assume all processors have capacity $s$, for a total memory capacity of $S = ps$ edges. We also assume $s \gg p$, since processors generally have at least megabytes of memory, even when there are enough processors for a relatively large $p$.

We next define a notion of time for the X-Stream model. For this paper, we assume that all hashing and union-find operations are effectively constant-time.

Definition 2 (X-Stream Step)

The X-Stream clock marks units of time, or ticks. At each tick, every processor is activated: it receives a fixed-size bundle of $k$ slots from its predecessor, does a constant amount of computation, and sends a size-$k$ output bundle to its successor. The head processor also receives a single slot of information from the I/O processor at each tick.

X-Stream steps are thus conceptually systolic Kung (1980), though real implementations such as the one we present in Section 7 can be asynchronous.

Figure 3: X-Stream architecture. Bundles of slots circulate on a heartbeat. Input edges reside in the single primary (black) slot of each bundle. Payload (white) slots are used during bulk deletion and complex queries.

Using the notion of the X-Stream clock, we can define the logical graph:

Definition 3

The logical graph at time $t$ is $G_t = (V_t, E_t)$, defined as follows:

  1. the edge set $E_t$ is the set of active edges at time $t$, and

  2. the vertex set $V_t$ is the set of endpoints of $E_t$.

Each message between ring processors is a bundle of constant-sized slots. The constant $k$ is the bandwidth expansion factor mentioned in the abstract. One of our key modeling assumptions is that the bandwidth within the ring of processors is at least $k$ times the stream arrival rate. We give theory governing this factor in Section 5 and corroborate that theory via experiment in Section 9.

Each slot can hold one edge and is designated either primary or payload by its position in the bundle. By convention, the first slot in a bundle is primary and all others are payload. Primary slots generally contain information from outside the system, such as input stream edges or queries, while payload slots are used during the aging process and during queries with non-constant output size (for example, enumerating all vertices in small components).

Once a processor receives a bundle of slots, it processes the contents of each in turn. The processor can modify or replace the slot contents, as we will describe below. When it has finished with all occupied slots, it emits the bundle downstream.
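A minimal rendering of this bundle discipline follows: the slot layout and the per-tick receive-process-emit loop. The types, and the particular value of K, are illustrative assumptions rather than the paper's definitions.

#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::size_t K = 4;  // bandwidth expansion factor (illustrative)

struct Slot {  // one edge-sized unit of ring traffic
    uint64_t u, v, timestamp;
};

struct Bundle {
    // slots[0] is the primary slot; slots[1..K-1] are payload slots.
    std::array<std::optional<Slot>, K> slots;
};

// One X-Stream tick for one ring processor: process each occupied slot in
// turn, repacking results into `out`, then emit `out` to the successor.
template <typename Processor>
void tick(Processor& p, const Bundle& in, Bundle& out) {
    for (std::size_t i = 0; i < K; ++i)
        if (in.slots[i]) p.process(*in.slots[i], /*primary=*/i == 0, out);
    p.emit(out);
}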

In order to formally define the graph stored in the X-Stream model we consider two edge states: A settled edge is incorporated into the data structures of exactly one processor. A transit edge is in the system, but is not settled. For example, it may still be moving from processor to processor in a bundle.

3.1 Distributed Data Structures

The X-Stream data structure is a natural distributed version of the W-Stream data structure, except that we must store the entire graph to allow bulk deletions. W-Stream pass $i$ identifies a set of connected subgraphs in its input stream; X-Stream processor $P_i$ stores a spanning forest of these subgraphs. By construction, the connected components of this forest are the same as those of the W-Stream connectivity structure of Demetrescu et al. (2009).

To describe how X-Stream’s distributed data structures implement the W-Stream nesting strategy, we define the concepts of local components and building blocks.

Definition 4

The connected components identified by the union-find structure on a processor $P_i$ are called the local components (LCs) of $P_i$.

A processor downstream of $P_i$ will see a local component of $P_i$ contracted into a single node, which it might incorporate into one of its own local components. Figure 4 shows an example of three processors and their union-find structures. As depicted in the first box of the figure, the first processor's local components become single nodes that are incorporated into an LC on a downstream processor.

Definition 5

Building blocks (BBs) for processor $P_i$ are the elements over which $P_i$ does union-find. A primitive building block contains exactly one vertex of the input graph; the set of all primitive building blocks corresponds one-to-one with the vertices of the input graph. A non-primitive building block of $P_i$ corresponds to a local component from a processor upstream of $P_i$.

We say a processor consumes its building blocks because notification of their existence arrives from upstream and does not propagate downstream. A local component corresponding to a set in a union-find structure encapsulates all the building blocks in that set. We now formalize the relationship between X-Stream distributed data structures and X-Stream processors. Figure 4 illustrates these concepts.

Definition 6

$\mathrm{creator}(\ell)$:

The unique processor that creates local component $\ell$.

$\mathrm{lc}(b)$:

The local component that contains building block $b$.

$\mathrm{consumer}(b)$:

The unique processor that consumes building block $b$.

Figure 4: Example usage of notation: $\mathrm{consumer}(b)$ is the processor that consumed building block $b$; $\mathrm{creator}(\ell)$ is the processor that created local component $\ell$; $\mathrm{lc}(b) = \ell$ when $b$ is a building block of local component $\ell$. A processor relabels any vertex in $b$ to $\mathrm{lc}(b)$ because it consumed $b$.
Definition 7 (X-Stream nesting identity)

Let $b$ be a building block. Then

$$\mathrm{creator}(\mathrm{lc}(b)) = \mathrm{consumer}(b). \qquad (1)$$

The X-Stream Nesting Identity, which is true by construction, says that the processor storing the local component that encapsulates building block $b$ is the processor that consumed $b$.

We base our algorithm description and correctness arguments on (1). In the X-Stream nesting structure, a building block $b$ is consumed by processor $\mathrm{consumer}(b)$ and incorporated into local component $\mathrm{lc}(b)$. The latter is a creation event, and the creating processor is $\mathrm{creator}(\mathrm{lc}(b))$.

Figure 5: X-Stream normal operation. Spanning tree (red) edges are packed into the prefix of the ring and store the connectivity information of W-Stream. Non-tree (blue) edges are retained and packed immediately afterwards. Note that the building processor will remain the builder until it has completely filled with tree edges (jettisoning its current non-tree edges downstream). The gray edge has not yet been classified as tree or non-tree.

3.2 Relabeling

Suppose that $V$ is the domain of possible vertex names from the edge stream, and $L$ is the set of possible names of building blocks/local components. The X-Stream nesting identity allows us to define a simple recursive relabeling function $r_i$ for each processor $P_i$, which returns the name of the local component that most recently encapsulated a primitive building block, if such a local component exists either on or upstream of $P_i$. This function underlies correctness arguments for X-Stream data structures and connectivity queries.

Definition 8

Let the relabeling function $r_i : V \cup L \to V \cup L$ be defined as follows. For $P_0$, let $r_0(v) = b_v$, where $b_v$ is the primitive building block corresponding to vertex $v$. For $i \ge 1$:

$$r_i(x) = \begin{cases} \mathrm{lc}(r_{i-1}(x)) & \text{if } P_i = \mathrm{consumer}(r_{i-1}(x)), \\ r_{i-1}(x) & \text{otherwise.} \end{cases} \qquad (2)$$

In the example shown in Figure 4, the primitive building blocks corresponding to the vertices are named using the vertex names, and $r_i$ maps each vertex to the name of the local component that most recently encapsulates it on or upstream of $P_i$.
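A compact sketch of how one processor might implement its piece of the relabeling chain of Definition 8 follows. The map-based representation and the names RelabelStage and lc_of are our assumptions, chosen to mirror the two cases of Equation (2).

#include <cstdint>
#include <unordered_map>
#include <vector>

// One processor's contribution to the relabeling chain: map a building-block
// name to the local component that consumed it, or pass it through unchanged.
struct RelabelStage {
    std::unordered_map<uint64_t, uint64_t> lc_of;  // building block -> LC name

    uint64_t relabel(uint64_t name) const {
        auto it = lc_of.find(name);
        return it != lc_of.end() ? it->second : name;
    }
};

// Composing the stages in ring order from the head yields r_i(v).
uint64_t relabel_through(const std::vector<RelabelStage>& stages, uint64_t v) {
    for (const auto& stage : stages) v = stage.relabel(v);
    return v;
}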

An edge is received by the system as two vertices (identified with their primitive building blocks) and a timestamp: $(u, v, t)$. Since edges are undirected, an edge $(v, u, t')$ will be referred to as having the same endpoints as $(u, v, t)$. Processor $P_i$ for $i \ge 1$ then receives edges in the form $(r_{i-1}(u), r_{i-1}(v), t)$. For all edges, the most recent timestamp is stored.
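One simple way to realize these conventions (our illustration, not the prototype's code): normalize endpoint order so both orientations of an undirected edge compare equal, and keep the newest timestamp when a duplicate is detected.

#include <algorithm>
#include <cstdint>

struct EdgeKey { uint64_t lo, hi; };  // orientation-independent identity

inline EdgeKey key_of(uint64_t u, uint64_t v) {
    return { std::min(u, v), std::max(u, v) };  // (u,v) and (v,u) match
}

inline void on_duplicate(uint64_t& stored_ts, uint64_t incoming_ts) {
    stored_ts = std::max(stored_ts, incoming_ts);  // keep most recent time
}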

3.3 Operation Modes

The XS-CC algorithm operates in two major modes. The aging mode is active when a bulk deletion is occurring, as we will describe in Section 5. Otherwise the system is in normal mode.

XS-CC uses the concept of relabeling, the X-Stream ring of processors, and periodic bulk deletion events to handle infinite streams of input edges. Each processor plays the role of an intermediate stream in W-Stream, borrowing the concept of loop unrolling from compiler optimization. As with loop unrolling, we can store information concerning only a finite number of loop iterations (W-Stream passes) at one time. However, the periodic bulk deletions allow XS-CC to run indefinitely.

4 Normal mode

During XS-CC normal mode, at any X-Stream tick exactly one processor is in a state of building. This building processor, $P_b$, accepts new edges, maintains connected components with a union-find data structure, and stores spanning tree edges. XS-CC normal mode maintains two key invariants, stated below and illustrated in Figure 5.

Invariant 2

Let $P_b$ be the building processor. Then $P_i$ is completely full of spanning tree edges for all $i < b$ at all times, and $P_j$ has no spanning tree edges for all $j > b$.

When $P_b$ fills with tree edges, a building “token” is passed downstream to $P_b$'s successor, which assumes building responsibilities. Thus, XS-CC maintains a spanning forest of the input graph, packed into the prefix processors $P_1, \ldots, P_b$. The XS-CC normal mode protocols maintain Invariant 2 and one other:

Invariant 3

Let $P_f$ be the first processor with any empty space. Then $P_i$ is completely full of edges for all $i < f$ at all times, and $P_j$ has no tree or non-tree edges for all $j > f$.

XS-CC diagnostic: dump connected components

1: Precondition: the input stream has stopped. This never happens during normal operation.
2: Description: Processor $P_i$ emits correct finite-stream connected components output.
3:procedure DumpComponentLabels($P_i$)
4:     ▷ Relabel all DumpComponentLabels output from upstream
5:     while Receive($(x, \ell)$) do
6:         Emit($(x, r_i(\ell))$)
7:     ▷ Now emit each union-find relationship
8:     for each building block $b$ consumed by $P_i$ do ▷ $(b, \mathrm{lc}(b))$ is the supernode relationship; see Definition 6
9:         Emit($(b, r_i(b))$)
Algorithm 1 This diagnostic routine is helpful for understanding correctness; it would never be called in practice. It assumes the W-Stream convention of choosing a vertex representative to name each supernode.

Invariants 2 and 3 are illustrated in Figure 5, with sets of spanning tree edges represented in red and sets of non-tree edges represented in blue. In normal mode operation, single edges arrive at each X-Stream tick and propagate downstream to the builder, being relabeled along the way. They settle into the builder if they are found to be tree edges, and into the first processor with empty space otherwise. The figure shows the system at a single X-Stream tick. In this notional example, the edge that arrived in the previous tick has passed through the head processor, but has not yet been resolved as “tree” or “non-tree.” Another edge has passed through two processors, and relabeling of the endpoints has identified it as a non-tree edge. The basic protocol is thus quite simple; the subtleties of XS-CC normal operation arise in maintaining the invariants. For example, the builder may need to jettison non-tree edges downstream to make room for new tree edges. We provide full detail in Section 7.

4.1 XS-CC normal mode correctness

We now show that Invariant 2 and XS-CC relabeling imply an exact correspondence between the connectivity structures computed by XS-CC and DFR.

Algorithm 1 is a diagnostic routine intended to test implementations of XS-CC. At X-Stream time $t$, a call to this routine streams out the connected components of the active graph $G_t$ as a stream of (vertex, label) pairs. Although we could correctly stream these vertex pairs out even as new edges change the connected components (see Section 10 for additional algorithm steps), this version is more illustrative. For simplicity we assume that the input stream pauses at time $t$.

When head processor $P_1$ receives the “dump components” command in a primary slot, it copies the command to the primary slot of the bundle it will emit. Then, in Lines 8-9 of Algorithm 1, processor $P_1$ fills the remaining payload slots with relationships from its union-find structure, the way DFR outputs union-find information into the $B$ streams. Specifically, if $b$ is the name of a building block encapsulated by a local component in $P_1$, then processor $P_1$ outputs a pair $(b, r_1(b))$, where $r_1(b)$ is the encapsulating supernode, the result of processor $P_1$ relabeling block $b$. Processor $P_1$ then fills the payload slots of subsequent bundles until it has output all of its union-find relationships.

For downstream processor $P_i$ ($i > 1$), the bundle that has the “dump components” command has $(x, \ell)$ pairs in the payload slots. In Lines 5-6 of Algorithm 1, processor $P_i$ relabels any block names that have been encapsulated by a new supernode in $P_i$; otherwise $r_i(\ell) = \ell$. After all relationships from upstream have arrived, there is empty payload space for $P_i$ to output its union-find information as described above.

We now argue the correctness of XS-CC based on the correctness of W-Stream. Let $D_i$ denote the output of processor $P_i$ from this diagnostic. For the formal arguments, we require the following definitions:

Definition 9

During a union operation joining sets with representatives $x$ and $y$, the supernode naming function $f(x, y)$ decides whether $x$ or $y$ becomes the new set representative.

For example, we might choose the supernode naming function $f(x, y) = \min(x, y)$. This is the function used in Figure 1.

Definition 10

$\mathrm{DFR}(m, f, \sigma)$ is an implementation of DFR with per-pass union-find capacity $m$ and supernode naming function $f$, run on input stream $\sigma$. $\mathrm{XSCC}(m, f, \sigma)$ is defined similarly for XS-CC, with each processor's union-find capacity set to $m$.

A resolved edge is one that has been classified as “tree” or “non-tree.” Stream edges arrive in an unresolved state. In DFR, the stream written from pass $i$ contains only those edges that resolve to “tree” edges (they connect supernodes in the current version of the contracted graph). DFR deletes as “non-tree” any edge that it determines to be contained inside a supernode. In contrast, XS-CC must retain all non-duplicate edges, even after resolution. In particular, non-tree edges must be retained in case they are needed to reconnect pieces of the graph after bulk deletion. XS-CC removes duplicate edges from the stream after updating their timestamps.

By Invariant 2, all unique non-tree edges (those contracted inside a supernode) are stored at the end of the XS-CC data structure in spare space in the builder, or in processors downstream of the builder. These downstream processors have no union-find structure. The following lemma ignores known non-tree edges.

Lemma 1

The stream of unresolved edges sent from processor $P_i$ to $P_{i+1}$ in $\mathrm{XSCC}(m, f, \sigma)$ is exactly stream $A_i$ from $\mathrm{DFR}(m, f, \sigma)$.

Proof

We prove this lemma by induction. For the base case, the first pass of DFR and the first processor of XS-CC receive the same finite stream of unresolved edges from the outside (logical processor $P_0$), namely the input stream of edges $\sigma$. Suppose that the stream of unresolved edges sent from processor $P_{i-1}$ to processor $P_i$ is the same as stream $A_{i-1}$ from $\mathrm{DFR}(m, f, \sigma)$. We show that the stream of unresolved edges processor $P_i$ sends to $P_{i+1}$ is exactly DFR stream $A_i$.

Processor $P_i$ of XS-CC and pass $i$ of DFR begin by computing connected components via union-find. Every edge that changes connectivity (starts a new component or joins two components) uses one of the $m$ possible union operations for this processor/pass. When they have both done $m$ union operations (their capacity), they have computed identical union-find data structures, since they have done the same computations on the same input stream. At this point, DFR has not yet emitted any edges and XS-CC has emitted only resolved non-tree edges. Now DFR processes the remaining edges of $A_{i-1}$, relabeling the endpoints, deleting edges whose endpoints are contained in the same supernode, and emitting the others to stream $A_i$. XS-CC relabels these remaining edges the same way, and emits the same stream of unresolved edges (interspersed with resolved non-tree edges).∎

Since XS-CC runs on unending streams, there is no “end of stream” mark to trigger creation and processing of a DFR $B$ stream. However, the “dump components” diagnostic creates these streams.

Lemma 2

For $\mathrm{XSCC}(m, f, \sigma)$ followed by a call to DumpComponentLabels, stream $D_i$ is identical to stream $B_i$ from $\mathrm{DFR}(m, f, \sigma)$.

Proof

The “dump components” command after ingestion of a finite stream serves as an end-of-stream marker for XS-CC. We prove the lemma by induction. For the base case, streams $B_0$ and $D_0$, the component information input to pass $1$ of DFR and processor $P_1$ of XS-CC respectively, are both empty. Suppose that stream $B_{i-1}$ from $\mathrm{DFR}(m, f, \sigma)$ is the same as stream $D_{i-1}$ from $\mathrm{XSCC}(m, f, \sigma)$ followed by a call to DumpComponentLabels. We show that stream $D_i$ is the same as stream $B_i$. From the proof of Lemma 1, the runs of XS-CC and DFR compute the same connected components in processor $P_i$ and pass $i$ respectively. Because XS-CC and DFR are using the same supernode naming function and have the same capacity, the union-find data structures (names of representatives and names of set elements) are identical. As described above, processor $P_i$ relabels and emits all elements of $D_{i-1}$ the same way that DFR pass $i$ relabels elements of stream $B_{i-1}$ into stream $B_i$. Then processor $P_i$ and DFR pass $i$ output the information in their identical union-find structures in identical ways, completing streams $D_i$ and $B_i$ respectively. ∎

4.2 XS-CC queries

The most basic XS-CC query is a connectivity query: are nodes $u$ and $v$ in the same connected component? A query that arrives at X-Stream tick $t$ will be answered with respect to the graph $G_t$. The query enters the system from the I/O processor and propagates through the processors just as new stream edges do. Each processor relabels the endpoints, and the tail processor returns “yes” if the labels are the same and “no” otherwise. This holds even if one or both of the endpoints have never been seen before. The following theorem shows that connectivity queries are correct at single-edge granularity, and therefore that XS-CC in normal mode correctly computes the connected components of an edge stream.
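In code, the per-processor work for a connectivity query is the same relabeling applied to edges, reusing the RelabelStage sketch from Section 3.2; the Query type here is our illustration, not the prototype's interface.

#include <cstdint>

struct Query {
    uint64_t label_u, label_v;  // endpoint labels, rewritten en route
};

// Each ring processor relabels both endpoints as the query passes by.
inline void on_query(const RelabelStage& stage, Query& q) {
    q.label_u = stage.relabel(q.label_u);
    q.label_v = stage.relabel(q.label_v);
}

// The tail answers: same final label iff same connected component.
inline bool answer_at_tail(const Query& q) {
    return q.label_u == q.label_v;
}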

Theorem 4.1

Suppose that the connectivity query $(u, v)$ arrives at the head processor of an X-Stream system with $p$ processors at X-Stream tick $t$. Then the I/O processor will receive the Boolean query answer at time $t + p$. The answer will be True iff $u$ was connected to $v$ in $G_t$, the logical graph that existed at time $t$.

Proof

Recall that $P_b$ is the building processor. The query answer will be determined by X-Stream tick $t + b$ at the latest since, by Invariant 2, $P_b$ is the last processor to store any tree edges and hence any union-find information. Thus it is the last processor that can change a label. The query travels processor-to-processor in a primary slot, just as the dump-components command does. If there are any transit edges in the system when the query arrives, they travel in slots of bundles strictly ahead of the query. Thus transit edges will settle into a processor before the query arrives. Similarly, any edges that arrive after the query travel in bundles strictly behind the query and cannot affect the query relabeling. Thus when the bundle with the query arrives at processor $P_i$, the union-find data structure, and the processor's status as the builder or not, are set exactly according to the graph in the system when the query arrived.

Processing query $(u, v)$ is closely related to processing DumpComponentLabels. Instead of dumping information for every vertex, starting at the point where a vertex is first encapsulated in a supernode, simple queries only consider two vertices. The label for $u$ will only change from $u$ to a supernode label at the processor that first incorporates $u$ into a local component. In DumpComponentLabels, that processor is the first that outputs any pair with first component $u$ into the stream. Thus, after the query has passed the building processor, the labels for vertices $u$ and $v$ are identical to their output values, which exit the system at time $t + p$. By Lemma 2, these are the same labels they would have if DFR were run on graph $G_t$. Because DFR is a correct connected components algorithm, vertices $u$ and $v$ will have the same label if and only if they are in the same connected component. ∎

We call queries that XS-CC answers with latency $p$ constant queries. See Section 10 for examples of non-constant queries.

The next theorem shows that XS-CC is space-efficient, storing the current graph in asymptotically optimal space.

Theorem 4.2

In normal operation of XS-CC, each edge is stored in exactly one processor, requiring $O(|E_t|)$ total space.

Proof

In normal operation, when a new edge $e$ arrives at a processor that already stores a copy of $e$, the processor removes $e$ from the stream and updates the timestamp of the stored copy. Invariant 2 ensures that an incoming tree edge encounters any previously-stored copies of itself before it reaches $P_b$, the building processor, which recognizes it as a tree edge. Invariant 3 ensures that an incoming non-tree edge encounters any previously-stored copies of itself before it reaches $P_f$, the first processor with any empty space. Furthermore, this invariant also ensures that there are no edges stored downstream of $P_f$. ∎

Theorem 4.1 shows that basic connectivity queries are answered correctly by XS-CC. In Section 10 we informally discuss three additional types of feasible queries: complex queries, such as finding all vertices not in the giant component of a social network; vertex-based queries, such as finding the degree or neighborhood of a vertex; and diagnostic queries regarding system capacity used. Also, by Invariant 2, X-Stream always knows a spanning tree of the streaming graph by construction. This tree could be checkpointed, for example, if processors share a filesystem.

5 Aging mode

Figure 6: XS-CC aging mode. (a) A token notifies processors that they must apply an aging predicate. All surviving edges become unresolved (gray). (b) Normal operation continues uninterrupted for new edges, while unresolved edges circulate back to be incorporated into a new data structure. Each processor in turn becomes the loading processor and recycles its unresolved edges.
Figure 7: XS-CC aging nomenclature. Both primary and payload edges are called resolved when they have been classified as tree or non-tree. Duplicate detection leaves empty slots, and processors ingest and emit bundles of edges.

XS-CC handles infinite streams via a bulk deletion operation we call an aging event. Our model is thus unlike most previous work, in that we do not expect or support individual edge deletions embedded within the stream. Rather, we expect the system administrator to schedule bulk deletions to ensure that the oldest and/or least useful data are deleted in a timely manner.

To begin aging, the system administrator introduces an aging predicate (for example, a timestamp threshold) into the input stream. The predicate propagates through the system, and each processor suspends query processing upon receipt. However, a new stream edge might arrive in the X-Stream tick immediately after the aging predicate arrives from the I/O processor. This and all other new edges must be ingested and processed without exception. Thus, the connectivity data structures must be rebuilt concurrently with normal stream edge processing. When this rebuild is complete, queries are accepted once again.

We now describe how XS-CC processes the aging predicate and prove correctness. In Section 6 we provide theoretical guarantees relating the fraction of system capacity used after the deletion predicate has been applied, the bandwidth expansion factor, the proportion of query downtime that is tolerable, and the expected stream edge duplication rate.

5.1 Aging process

Figure 6 illustrates the aging process. An aging token arrives with an edge-deletion predicate. As the token propagates downstream, all edges are reclassified as untested. If an edge later passes the aging predicate, it becomes unresolved, since the old connectivity structure is no longer valid. Immediately after the aging token is received by the head processor, new stream edges may continue to arrive. These are processed as normal, starting from empty data structures, so we maintain Invariants 2 and 3 even during aging.

Conceptually, upon receipt of aging notification, the deletion of all edges that fail the aging predicate and the reclassification of all surviving edges as unresolved are instantaneous. However, in practice each processor takes many X-Stream ticks to execute a “testing phase” that applies the aging predicate to each stored edge. Without careful attention to detail, implementers could allow a case in which there is no space yet for a new stream edge. In Section 7 we give exact specifications for a correct procedure that ensures no stream edge is dropped, even in the X-Stream tick immediately after aging notification. If the testing phase has not yet identified empty space for a new stream edge, then one of the unresolved edges can be sent downstream in a primary slot. This is an example of the jeopardy condition described later in this section, corresponding to Line 21 of Algorithm 2.

In addition to normal processing of new stream edges, XS-CC recycles all unresolved edges that survive the aging predicate. As depicted in Figure 6, we introduce a new designation for a loading processor, or “loader.” Upon each activation to process a stream edge, the loader packs unresolved edges into any available payload slots in the output bundle. Such bundles propagate around the ring. After a bundle reaches the head processor $P_1$, its payload edges are processed as if they were new edges. When the loader has emitted all of its unresolved edges, it passes the loader token downstream to its successor. Aging is complete when the last processor with any unresolved edges has completed its loader duties.
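A sketch of the loader's per-tick duty under our reading of the protocol, reusing the Bundle, Slot, and K types sketched in Section 3; the deque of unresolved edges is an assumption of this illustration.

#include <cstddef>
#include <deque>

// Fill empty payload slots of the outgoing bundle with unresolved edges so
// they circulate back to the head. Returns true when this processor has
// emitted its last unresolved edge and should pass the loader token on.
inline bool load_payload(std::deque<Slot>& unresolved, Bundle& out) {
    for (std::size_t i = 1; i < K; ++i) {  // slot 0 remains the primary slot
        if (out.slots[i].has_value() || unresolved.empty()) continue;
        out.slots[i] = unresolved.front();
        unresolved.pop_front();
    }
    return unresolved.empty();
}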

Figure 8: The XS-CC aging “jeopardy condition.” A processor currently bears both building and loading responsibilities, is completely full of edges, and must ingest a bundle with no empty slots. It ingests $k$ slots, finds no duplicates, and must emit $k$ slots. Therefore an unresolved “jeopardy edge” must be emitted in the primary slot. If it doesn't settle in a processor before leaving the tail, the system is completely full and raises a FAIL condition. Note that in this illustration, a downstream processor will be able to store the jeopardy edge, so the jeopardy condition will soon be mitigated.

The complete XS-CC protocols defined in Section 7 enforce the previous invariants at all times, as well as the following invariant during aging.

Invariant 4

During aging, let $P_l$ be the loading processor and $P_b$ be the building processor. Then $b \le l$. Also, processor $P_i$ has no unresolved edges for $i < b$, and $P_j$ has no resolved edges for $j > l$.

The combination of all invariants ensures that all processors from the head to the builder are running XS-CC in normal mode on all incoming (and recycled) edges. All resolved edges are packed to the front (upstream). When all edges have been recycled and aging ends, the layout of edges returns to normal mode.

Figure 7 puts the nomenclature of our arguments into context. An edge becomes resolved when an XS-CC processor determines that it is a tree or non-tree edge, regardless of whether it is a new stream edge in a primary slot or an unresolved edge being recycled as payload. Processors ingest and emit bundles of edges. With one exception we will discuss presently, the complexity of processing input bundles and packing edges into output bundles prior to emission is relegated to Section 7.

Aging is generally a straightforward process in which the loader token steadily advances from $P_1$ to $P_p$, unresolved edges are recycled and resolved, and the XS-CC connectivity structure is rebuilt. When the builder and loader designations coincide in the same processor, that processor packs unresolved edges for emission first, then non-tree edges. Edge bundles containing transit edges have one primary slot and $k-1$ payload slots, where $k$ is the bandwidth expansion factor. New stream edges reside in primary slots, and unresolved edges circulate in payload slots until they are resolved. Payload edges continue in their assigned slots until allowed to settle, per the invariants.

There is a single exception to this last point, illustrated in Figure 8. We call this the jeopardy condition and use it to specify exactly when the system fills to capacity during aging (indicating that the aging command was issued too late or did not remove enough edges). In the jeopardy condition, the building processor is also the loader, is already storing edges to its capacity $s$, and must ingest an edge bundle with no empty slots. It ingests $k$ slots, finds no duplicates, and, by conservation of space, must emit $k$ slots. Therefore an unresolved edge must reside in the primary slot. If this jeopardy edge cannot be offloaded before exiting the tail, the system is completely full and raises a FAIL condition.

The above discussion and the more detailed discussion in Section 7 show that the following property, necessary for proving aging correctness, holds:

Property 1

During aging, every surviving edge is incorporated into the new connected components data structure, either directly by the head processor or by traveling back to the head as a payload edge.

5.2 Aging correctness

We now argue correctness of the aging process. We say that any implementation of XS-CC aging that maintains Invariants 2, 3, and 4 and Property 1 is compliant. A compliant aging process ensures that during aging there is a monotonic ordering of edges in the system, with tree (red) edges never allowed downstream of non-tree (blue) edges, and unresolved (gray) edges never allowed upstream of non-tree edges. In the argument below, we slightly abuse notation by using the graph $G$ in place of its edge set $E$.

Theorem 5.1

Suppose a compliant XS-CC implementation receives an aging command at tick $t$ and reauthorizes queries at tick $t'$. Let $F$ be the set of edges in $G_t$ that fail the aging predicate and let $N$ be the set of edges that arrive between times $t$ and $t'$. Then at tick $t'$, the X-Stream system stores the graph $(G_t \setminus F) \cup N$, can properly answer queries, and stores each edge exactly once.

Proof

As the aging command that arrived at time $t$ propagates through the processors, they reclassify all current edges as “untested,” as described in Section 5.1, forgetting the current union-find structure. Thus the system starts processing a new graph from an empty state at time $t$. As described in Section 5.1, processors delete all edges in $F$, those that fail the predicate. Each remaining edge of $G_t$ is eventually loaded into a payload slot by Property 1 and processed at the head as arriving edges are. Invariants 2 and 3 hold with the newly-created data structures throughout aging. Invariant 4 ensures that all unresolved edges are in the builder processor or downstream. Those in the builder do not affect the connectivity computation and are eventually moved downstream. Thus, all edges arriving from outside the system are processed as in normal mode, and all edges arriving in the payload slots are processed as in normal mode (other than traveling in a payload slot). Thus at time $t'$, when the tail processor passes the loading token out of the system and enables queries, the X-Stream system stores exactly the edges in $(G_t \setminus F) \cup N$, with duplicates appropriately removed. This is the graph the system is required to hold by the definition of aging and the requirement that it drop no incoming edges during aging. The edges are processed into the data structures with arbitrary mixing of new edges and recycled (surviving) edges. By Observation 1, and the equivalence of DFR and XS-CC in normal mode, the ordering of the input edges does not matter for future query correctness. By Theorem 4.1, the X-Stream system will now correctly answer queries on the graph starting at time $t'$.

During aging, some edges may be stored up to twice. If a duplicate of a surviving edge $e$ enters the system before $e$ circulates back to the head processor, then $e$ is stored both in the new data structure as a tree or non-tree edge and as an unresolved edge. However, when edge $e$ is eventually recycled, it will be recognized as a duplicate and not stored again. By Theorem 4.2, any edge that enters from outside the system during aging will be stored at most once in the new data structure. ∎

6 Conditions for successful aging

In this section, we define the conditions under which a compliant aging process completes before the system fails for lack of space. We consider properties of the system, properties of the input stream, and user preferences.

Definition 11

We define the following as tradeoff parameters associated with infinite runs of XS-CC.

c:

fraction of the total system storage occupied by edges that survive the aging predicate

d:

fraction of X-Stream ticks that the system is unavailable for queries due to aging

u:

estimate of the fraction of incoming stream edges that will be unique

k:

the bandwidth expansion factor: the size of an X-Stream bundle (a set of edge-sized slots that circulates in the ring)

p:

number of X-Stream processors

S:

aggregate storage available in the system

s:

storage per processor in a homogeneous system ($S = ps$)

Aging must be initiated before the system becomes too full, or else jeopardy edges will lead to a FAIL condition. We quantify this decision point as follows.

Lemma 3

In the worst case, there must be at least

$$u\left(\frac{S}{pk} + \frac{p}{2}\right) + pk$$

open space in the system when an aging command is issued to be guaranteed sufficient space for aging, where $u$, $S$, $p$, and $k$ are given in Definition 11.

Proof

When the aging command arrives, there could be up to $pk$ edges in transit that all must be stored. Because iteration over the untested list doesn't imply any specific ordering, in the worst case, when processors test their edges against the predicate, all surviving edges are tested before any edge fails the predicate. This gives the latest time when space becomes free for new edges. When a processor receives the aging command, it processes $k$ untested edges each tick until it has tested all its edges. In the $p$ ticks required for the aging command to reach the tail, the head tests $pk$ edges, the second processor tests $(p-1)k$ edges, and so on, while the tail tests $k$ edges. Thus in the first $p$ ticks after aging starts, the system tests $kp(p+1)/2$ edges. After that, the system tests $pk$ edges per tick. If the system tested $pk$ edges every tick, it would require $S/(pk)$ ticks. But the first $p$ ticks are only half as efficient, so we require an extra $p/2$ ticks. Thus the total number of ticks before the system is guaranteed to remove an edge that fails the predicate is at most $S/(pk) + p/2$. Adding the $pk$ transit edges to the unique new edges that arrive during these ticks (a $u$ fraction of them) yields the bound. ∎

If the system is homogeneous, the empty space expression in Lemma 3 becomes $u\left(\frac{s}{k} + \frac{p}{2}\right) + pk$. For example, for a homogeneous system, assuming that $s \gg p$, if $u = 1$ and $k = 4$, then one should start aging while roughly a quarter of the last processor is still empty. The last processor can issue a warning when it starts to fill, and again closer to the deadline given $u$ and $k$.

Theorem 6.1

In any XS-CC aging process initiated in accordance with Lemma 3, if $c$, $d$, $u$, $k$, $p$, $S$, and $s$ from Definition 11 are set such that the worst-case aging duration is at most a $d$ fraction of the time required for unique new arrivals to fill the $(1-c)S$ space left open after the aging predicate is applied (see the inequality in the proof), then the aging process will finish before the system storage fills completely.

Proof

After the aging token arrives, the head processor must apply the aging predicate to its edges. It processes $k$ per tick, as described in Section 5.1. Thus, after $s/k$ ticks, the head processor passes the loader token to the second processor. By that time, all other processors have applied the predicate to all of their edges and have a list of surviving edges. Once unresolved edges begin circulating from the loader (ignoring additive latencies, such as the time until the first payload reaches the head processor, since $s \gg p$), $k-1$ edges re-enter the system to be resolved at each tick. Since $cS$ unresolved edges survived the aging predicate, in the worst case (when they are all in the second processor or later) it will take $cS/(k-1)$ ticks to complete aging. During this time, every $1/u$ ticks yields a new, non-duplicate stream edge. Thus, the system will fill to capacity in $(1-c)S/u$ ticks. The proportion $d$ constrains these two tick counts as follows:

$$\frac{cS}{k-1} \;\le\; d \cdot \frac{(1-c)S}{u}$$

Simplifying this inequality and solving for $c$ (with Wolfram Alpha Wolfram—Alpha (2021), for example) yields the result. ∎

The parameters $c$ and $d$ are user preferences, but $k$ is dictated by computer architecture. Reasonable values of $k$ for current architectures are small constants, but emerging dataflow architectures may provide upward flexibility. The parameter $u$ must be estimated by the user based on knowledge of the input streams that they will feed to XS-CC.

We can now state the central result of this paper.

Theorem 6.2

XS-CC can process an effectively infinite stream of graph edges without failing, answering connectivity queries correctly when in normal mode, as long as the system is configured in accordance with Theorem 6.1 and aging is started with sufficient space available, obeying Lemma 3.

Proof

Assuming that the proportion of X-Stream ticks that yield a new, non-duplicate stream edge is $u$, an empty system would fill and fail in $S/u$ ticks without aging. Compliant aging in accordance with Lemma 3 ensures that aging will always complete before the system fills. During normal mode operation, Lemma 2 and Theorem 4.1 ensure, respectively, that accurate connected component information is stored and that connectivity queries are answered correctly. As long as the system administrator adheres to such an aging schedule, XS-CC operation can continue through an arbitrary number of aging events. ∎

We note that queries yielding system capacity usage are constant-size. In the case of a simple aging predicate, such as a timestamp threshold, and given a target proportion of edges that should survive an aging event, the X-Stream system administrator could use an automated process to trigger aging.

7 X-Stream edge processing specification

X-Stream connected components driver
 

1:procedure ProcessBundle(PrimaryEdge($e$), PayloadEdges($e_1, \ldots, e_{k-1}$))
2:     PackingSpaceAvailable $\gets k$ ▷ the output buffer has size $k$ and is initially empty
3:     if EmptyEdge($e$) then
4:         Pack(EmptyEdge) ▷ any call to Pack decrements PackingSpaceAvailable
5:     else ProcessEdge($e$)
6:     for $i = 1, \ldots, k-1$ do ▷ there are payload edges only during aging or non-constant query processing
7:         ProcessEdge($e_i$)
8:     if Not(AGING) then
9:         Emit(PackedBundle)
10:        return
11:     ▷ Aging-related logic
12:     if Loader then
13:         if Head then ▷ the Head's testing & resolution phase
14:             for $i = 1, \ldots, k$ do
15:                 $e' \gets$ PopEdge(UNTESTED)
16:                 if EmptyEdge($e'$) then
17:                     Pack(LoaderToken)
18:                     Break
19:                 if AgingPredicatePassed($e'$) then
20:                     if $i = k$ and Full then
21:                         Pack($e'$) ▷ jeopardy edge
22:                     else
23:                         ProcessEdge($e'$) ▷ Head immediately resolves surviving edge
24:                 else
25:                     Delete($e'$)
26:         else ▷ Downstream resolution phase (all testing has finished)
27:             while PackingSpaceAvailable $> 0$ do
28:                 $e' \gets$ PopEdge(UNRESOLVED)
29:                 if Null($e'$) then
30:                     Pack(LoaderToken)
31:                     Break
32:                 else
33:                     Pack($e'$)
34:     else ▷ Downstream testing phase
35:         if Not(Head) and NotEmpty(UNTESTED) then
36:             for $i = 1, \ldots, k$ do
37:                 $e' \gets$ PopEdge(UNTESTED)
38:                 if AgingPredicatePassed($e'$) then
39:                     PushEdge($e'$, UNRESOLVED)
40:                 else
41:                     Delete($e'$)
42:     Emit(PackedBundle)
Algorithm 2 This is the driver function for an X-Stream implementation that is compliant with Invariants 2, 3, and 4, and Property 1.

Algorithms 2 and 3 show the XS-CC driver and constituent functions, respectively, for processing edges. We do not show full detail for token passes, commands, and queries. These functions maintain the invariants and produce a compliant XS-CC implementation. We used this pseudocode as guidance for the code that produces our experimental results.

Each X-Stream processor executes ProcessBundle whenever it receives the next bundle of edge slots, regardless of its current execution mode (normal or aging). It will process each slot in turn, and the constituent functions ProcessEdge, ProcessPotentialTreeEdge, and StoreOrForward determine what to pack into an output bundle destined to flow downstream.
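To make the driver's input concrete, the following C++ sketch shows one plausible bundle layout, assuming the 5-tuple edge representation described in Section 9; all field and constant names are our own illustrations, not the prototype's.

#include <array>
#include <cstdint>

// Illustrative bundle layout: one primary slot plus b-1 payload slots,
// each slot a 5-tuple (u, v, lu, lv, t). Names are assumptions.
struct EdgeSlot {
  uint64_t u = 0, v = 0;    // endpoints; (0, 0) encodes the empty edge
  uint64_t lu = 0, lv = 0;  // local component labels
  uint64_t t = 0;           // timestamp
  bool empty() const { return u == 0 && v == 0; }
};

constexpr int kExpansion = 5;  // bandwidth expansion factor b (illustrative)

struct Bundle {
  EdgeSlot primary;                              // drives the XS tick
  std::array<EdgeSlot, kExpansion - 1> payload;  // used during aging and
                                                 // non-constant queries
};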

Note that the top-level logic of processing the primary and payload edges of a bundle is the same in Algorithm 1, regardless of execution mode. When a new edge arrives from the stream, processors upstream of (and including) the building processor will classify it as tree or non-tree using the relabeling logic of Section 3.2 (Lines 15-18 of ProcessEdge and Lines 2-4 of ProcessPotentialTreeEdge). The builder stores any new tree edge. We ensure that this is possible via logic to jettison an unresolved edge if one exists (only during aging; Lines 9 and 16 of StoreOrForward), or else to jettison a non-tree edge (Line 15 of StoreOrForward). This progression of jettison logic maintains Invariants 2 and 3.

Suppose that the head processor receives notification of an aging event at X-Stream tick t. Ticks t+1 and t+2 are especially interesting. If a new edge arrives in the input stream at t+2, it must be stored in the head (which is now acting as both the builder and the loader) in order to maintain Invariant 4. However, the head has had only one tick to initiate the process of testing its edges against the aging predicate: it tested its first batch of edges in tick t+1. Suppose all of these edges survived the predicate and therefore couldn’t be deleted. This is a jeopardy condition, and it was handled during tick t+1 by Lines 20-21 of ProcessBundle. Favoring the new edge, the head jettisoned, in the primary slot of its output bundle, the last of the unresolved edges it created in that tick. Therefore, at tick t+2 we are assured that the head can store a new stream edge.

During aging, the loader packs unresolved edges into the empty payload slots in incoming bundles to be sent around the ring. When these edges arrive at the builder, they are processed as if they were new stream edges, classified as tree or non-tree, and incorporated into the builder’s data structures by the same invariant-maintaining constituent functions that handle new edges. One optimization we include is that the head need not actually pack and send its unresolved edges around the ring. Rather, in Lines 13-23 of ProcessBundle, the head simply tests against the aging predicate and immediately processes its tested edges rather than calling them unresolved.

As aging proceeds, the Loader token is passed downstream whenever a processor exhausts its list of unresolved edges (Lines 28-31 of ProcessBundle). Once the Loader token exits the tail processor, Property 1 is established.

X-Stream constituent functions
 

1: ▷ processor p receives an edge e
2:procedure ProcessEdge(e)
3:     if Duplicate(e) then ▷ regardless of p’s position in the chain, duplicate edges don’t propagate downstream
4:         SetNewestTimestamp(e) ▷ during aging, either e or its stored duplicate could be the newest
5:         if Primary(e) then
6:             Pack(EmptyEdge) ▷ bundles drive XS ticks; ProcessBundle requires a primary edge
7:         Return
8:     if DownstreamOfBuilder then ▷ p stores only non-tree and/or unresolved edges
9:         if Primary(e) then ▷ need to store this edge if we can in order to ensure Invariant 3
10:             StoreOrForward(e) ▷ StoreOrForward accepts e or packs it for output
11:         else ▷ Payload(e), i.e., aging
12:             Pack(e) ▷ processors downstream of the Builder simply propagate payload edges
13:         Return
14:     ▷ processor p contains connected component information, i.e., p is upstream of (or is) the builder
15:     if NonTree(e) then ▷ previously-discovered non-tree edge
16:         StoreOrForward(e)
17:     else if Not(ProcessPotentialTreeEdge(e)) then ▷ newly-discovered non-tree edge
18:         StoreOrForward(e)
1:procedure ProcessPotentialTreeEdge(e=(u,v,…))
2:     e ← Relabel(e)
3:     if lu = lv then ▷ newly-discovered non-tree edge
4:         Return(FALSE)
5:     if Builder then ▷ builder must ingest tree edge
6:         Assert(StoreOrForward(e) = STORE) ▷ ensure Invariant 2
7:         if FullOfTreeEdges then
8:             Pack(BuilderToken) ▷ can be encoded with a bit; doesn’t take a whole slot
9:     else Pack(e) ▷ this processor has previously sealed, so it is already full of tree edges
10:     Return(TRUE) ▷ still a potential tree edge; downstream processors will determine that
1:procedure StoreOrForward(e=(u,v,…))
2:     ▷ Precondition: if e is a tree edge, this processor is not full of TREE edges
3:     if Full then
4:         if Tail then
5:             Fail ▷ the system is totally full
6:         if Unresolved(e) then
7:             Pack(e)
8:             Return(FORWARD)
9:         e′ ← PopEdge(UNRESOLVED) ▷ jettison an unresolved edge to keep a resolved one, if possible
10:         if EmptyEdge(e′) then ▷ no more edges to resolve
11:             if NonTree(e) then
12:                 Pack(e) ▷ no need to jettison a stored non-tree edge if e is non-tree
13:                 Return(FORWARD)
14:             else ▷ by precondition, there must be a non-tree edge to jettison
15:                 Pack(PopEdge(NONTREE)) ▷ jettison a non-tree edge to keep a tree edge
16:         else Pack(e′)
17:     if Primary(e) then Pack(EmptyEdge) ▷ every output bundle needs a primary edge
18:     Accept(e) ▷ perform UNION/FIND if e is a tree edge
19:     Return(STORE)
Algorithm 2 These three constituent functions comprise the X-Stream algorithm for maintaining connected components.

8 Related work

This paper is motivated by the observation that no other published work meets the needs of the cybersecurity use case we describe in Section 1. Most work on theoretical streaming problems, best surveyed in Muthukrishnan et al. (2005), is limited to the finite case. Some research in the past decade has addressed infinite graph streaming in a sliding-window model Crouch et al. (2013); McGregor (2014), but the work is quite abstract, and expiration must be addressed with every new stream observation. We were unable to apply this theory directly in a distributed system with bulk deletions, but our XS-CC algorithm could be thought of as a generalized, distributed implementation of these sliding-window ideas.

As far as we know, our X-Stream model and XS-CC graph-algorithmic use case comprise the first approach to infinite graph streaming that is both theoretically-justified and practical. We provided an initial view of the X-Stream model in a 2013 KDD workshop paper Berry et al. (2013), and provide full detail of a greatly-streamlined version in this paper.

As the previous sections have made clear, our focus is infinite streaming of graph edges with theoretical guarantees and a well-defined expiration strategy with a path to implementation in simple distributed systems. Thus, we have approached the problem from a theoretical streaming perspective, focusing primarily on related “per-edge arrival” streaming details. We have shown how to maintain connectivity and spanning tree information. We hope that others will expand the set of graph queries available in X-Stream, and/or propose new infinite streaming models.

The closest related work comes from the discipline of dynamic graph algorithms, which takes a different approach. Work in this area typically assumes that the graph in its entirety is stored in a shared address space or supercomputing resource. Updates to the graph come in batches and take the form of edge insertions and sometimes deletions too. After a batch of updates is received, incremental graph algorithms update attributes such as connected components or centrality values. During this algorithmic update the stream is typically stopped. There is no attempt to describe what running infinitely in a finite space would mean other than to rely on an implicit assumption that the batches will have as many deletions as insertions over time. We know that in the cybersecurity context, for example, this assumption will never be true. An impressive survey of dynamic graph processing systems is found in Besta et al. (2019).

We break down the area of dynamic graphs into data structure work that builds a graph without computing any algorithm (for example Ediger et al. (2012); Riedy et al. (2011); Iwabuchi et al. (2016)), work that “stops the world” at each algorithmic update (for example Riedy et al. (2011); Basak et al. (2020); Wheatman and Xu (2021)), and recent attempts to process graph updates and update algorithmic results concurrently Yin et al. (2018); Yin and Riedy (2019); Sallinen et al. (2019); Grossman et al. (2020).

The data structure group includes solutions such as Ediger et al. (2012), which achieves a rate of over 3 million edges per second on a Cray XMT2 supercomputer using a batch size of 1,000,000 edge updates, and Iwabuchi et al. (2016), which achieves a rate of more than two billion edges per second on a more modern supercomputer while maintaining some information about vertex degrees. While these rates are impressive, approaches such as these require a supercomputer and don’t specify how to continue running as their storage fills up.

For incremental computation of graph algorithms such as Breadth-First Search (BFS), connected components, PageRank, and others, SAGA-Bench Basak et al. (2020) can achieve latencies of a fraction of a second on conventional hardware using an update batch size of 500,000 edges. This translates to a few million updates per second, while also maintaining incremental graph algorithm solutions. Wheatman and Xu also exploit large batches of edge updates and advanced data structures (packed-memory arrays) to approach this problem Wheatman and Xu (2021). They achieve amortized update rates of up to 80 million updates per second while maintaining per-batch solutions to graph problems such as connected components, where update batches can be of size 10,000,000 or greater. Even if our analysts could tolerate such batch sizes, however, what prevents us from simply adopting their approach is our requirement for a methodology for running infinitely.

We conclude our discussion of the dynamic graph literature with recent results that process graph updates and update algorithmic results without interrupting the input stream. The HOOVER system can run vertex-centric graph computations on supercomputers that update connected components information at an ingestion rate of more than 600,000,000 edges per second Grossman et al. (2020). However, the update algorithm works only for edge insertions so our requirements are not met and the system would quickly fill up. Yin, et al. Yin et al. (2018) propose a concurrent streaming model, and Yin & Riedy Yin and Riedy (2019) instantiate this model with an experimental study on Katz centrality. However, overlapping graph update and graph computation still does not meet our need for a strategy to compute on infinite streams.

9 Experiments

Benchmark     Bundle Size   64-bit ints/s   X-Stream potential (b=2)   X-Stream potential (b=5)
Benchmark 1   5             1742160.27      174216.02                  69686.41
Benchmark 1   25            8680555.55      868055.55                  347222.22
Benchmark 1   250           54112554.11     5411255.41                 2164502.16
Benchmark 2   5             1344086.02      134408.60                  53763.44
Benchmark 2   25            6281407.03      628140.70                  251256.28
Benchmark 2   250           35063113.60     3506311.36                 1402524.54
Table 1: TBB benchmarks designed to produce bounds on X-Stream performance on Intel Sky Lake. Benchmark 1 propagates bundles downstream without any computation. Benchmark 2 hashes two of every five integers in the bundle to simulate XS-CC’s ProcessEdge computation. The rightmost two columns show upper bounds on XS-CC performance for bandwidth expansion factors b=2 and b=5. On this architecture, we must send bundles of size 250 to maximize performance. XS-CC with b=5 is bounded by 1.4 million edges per second.

The X-Stream model and the XS-CC algorithm are based on message passing. At each X-Stream tick, each processor performs only a constant number of operations. These are predominantly hashing operations, union-find operations, and simple array access. Therefore, performance of XS-CC is strongly tied to computer architecture. The faster a system can perform hashing and message passing, the faster XS-CC will run.

With a current Intel computer architecture (Sky Lake), we will show that our initial XS-CC implementation can almost match the peak performance of a simple Intel/Thread Building Blocks (TBB) benchmark that transfers data between cores of the processor. This translates to streaming rates of between half a million and one million edges per second, which is comparable to the low end of the performance spectrum for modern dynamic graph solutions (none of which handle infinite streams). The high end of that spectrum is not comparable to our context since we require no supercomputer and ingest data from only one processor. We have ideas to exploit properties of many graphs (such as the phenomenon of a giant connected component) for running many instances of XS-CC concurrently to boost our rates by orders of magnitude. However, that is beyond the scope of this paper.

9.1 Computing setup and benchmarking

All results in this paper were obtained using a computing cluster whose nodes have Intel Sky Lake Platinum 8160 processors: 2 sockets per node, 24 cores per socket, and 2 hardware threads per core (96 hardware threads total), with 192GB DDR memory. The memory bandwidth is 128GB/s, distributed over 6 DRAM channels. The interconnect is Intel OmniPath Gen-1 (100Gb/s). The operating system is CentOS 7.9, and our codes are compiled with Intel icpc 20.2.254 using the flags -O3 -xCORE-AVX512.

Our full implementation of XS-CC is single-threaded (the normal-mode computations and data structures are single-threaded; we use another thread for cleanup and reallocation at an aging transition) and written in PHISH Plimpton and Shead (2014), a streaming framework based on message passing. However, before presenting XS-CC results, we explore the expected peak performance of the algorithm on a single node of the Sky Lake cluster using the vendor’s own software library (Thread Building Blocks 2019, Update 8).

Mini benchmark

We implemented a simple ring of X-Stream-style processing modules in TBB. The head module accepts bundles of synthetic data from an I/O module and sends them down the ring toward the tail, which feeds back to the head. The latter merges this bundle with its next input bundle. We further distinguish two benchmarks:

  • Benchmark 1: Each processor simply copies input bundles to output.

  • Benchmark 2: Each processor hashes two of every five integers of the input bundle and copies the input to the output. This approximately reflects the main computation kernels of XS-CC: hashing the timestamp of each edge, and doing a union-find operation (see the sketch after this list).
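The sketch below gives a minimal TBB rendition of Benchmark 2 under stated assumptions: node count, bundle size, and the synthetic data are illustrative, and the tail-to-head feedback edge of the real benchmark is omitted for brevity.

#include <tbb/flow_graph.h>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// A chain of TBB flow-graph nodes passes bundles of 64-bit integers
// downstream, hashing two of every five integers to mimic XS-CC's
// per-edge work (one 5-tuple per edge).
using Bundle = std::vector<uint64_t>;
using Node = tbb::flow::function_node<Bundle, Bundle>;

int main() {
  const int kRing = 10;         // processors in the ring, as in Table 1
  const size_t kInts = 250;     // 64-bit integers per bundle
  const long kBundles = 100000; // bundles to push through

  tbb::flow::graph g;
  std::vector<std::unique_ptr<Node>> ring;
  for (int i = 0; i < kRing; ++i) {
    ring.push_back(std::make_unique<Node>(g, tbb::flow::serial,
      [](Bundle b) {
        std::hash<uint64_t> h;
        for (size_t j = 0; j + 4 < b.size(); j += 5) {
          b[j] = h(b[j]);       // hash two of every five integers
          b[j + 1] = h(b[j + 1]);
        }
        return b;               // the copy propagates downstream
      }));
  }
  for (int i = 0; i + 1 < kRing; ++i)
    tbb::flow::make_edge(*ring[i], *ring[i + 1]);
  // The real benchmark feeds the tail back to the head; we drop tail output.

  Bundle proto(kInts, 0x9e3779b97f4a7c15ull);
  for (long n = 0; n < kBundles; ++n) ring[0]->try_put(proto);
  g.wait_for_all();
  return 0;
}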

Table 1 shows the performance of our TBB benchmarks as the number of 64-bit integers in a bundle is varied. For these runs there are 10 processors in the ring. Recall that XS-CC edges circulate as 5-tuples of 64-bit integers (u, v, lu, lv, t), where u and v are vertex ids, lu and lv are local component labels, and t is a timestamp. Therefore the raw rates of the third column must be divided by 5 to count in units of X-Stream primary edges. Furthermore, to account for the payload slots in XS-CC bundles active during aging or non-constant query processing, the primary edge rate must be divided by the bandwidth expansion factor b. With optimal use of Intel’s TBB, we see that we should pass messages containing roughly 250 64-bit integers, and we expect XS-CC edge streaming rates with b=5 to be bounded by 1.4 million edges per second.
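As a worked instance of this conversion, take the Benchmark 2, bundle-size-250 row of Table 1:

    35063113.60 ints/s ÷ 5 ints per edge = 7012622.72 raw edges/s,
    7012622.72 ÷ (b = 5) = 1402524.54 ≈ 1.4 million primary edges/s.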

Since Benchmark 2 is equivalent to Benchmark 1 except for a larger compute load, we see clearly that Benchmark 2 is not bandwidth bound on this architecture. We believe that these benchmarks are bound by a combination of compute and memory latency. We experience a slow-down from 2.1 million edges per second to 1.4 million simply by adding two hashing operations per edge. As our experiments with XS-CC will show, XS-CC itself is likely even more compute bound. This is welcome since it admits the possibility that multithreading within TBB nodes and X-Stream processors could accelerate our single-threaded edge-processing results.

Furthermore, in a real deployment of XS-CC, we would assign a single X-Stream PE to a single compute node and communicate over the interconnect. In fact, that is the basis of the XS-CC results presented in Figures 9, 10, and 11. In this case, we can compute the approximate theoretical peak for an X-Stream-like computation as follows. The interconnect is 100Gb/s, or 12.5GB/s. That translates to roughly 1.5 billion 64-bit integers per second. Since XS-CC uses messages with five 64-bit ints to represent an edge, and a typical value of the X-Stream bandwidth expansion factor is 5, we are bounded by roughly 60 million XS-CC primary edges per second. The rates of our prototype implementation do not approach this number, so we believe that, like the benchmarks, we are bound by a combination of compute and memory latency. A multi-threaded production version of XS-CC would likely be necessary to better exploit a computing environment such as our Sky Lake cluster. With that said, the TBB benchmark itself falls far short of the possible performance suggested by Sky Lake’s theoretical peak memory bandwidth of 128GB/s. Significant algorithm engineering may be necessary to obtain a performant, production version of XS-CC.

Datasets

We present prototype XS-CC results on four datasets:

  1. An anonymized stream of 10 million real gateway network traffic edges from Sandia (the same stream used in  Berry et al. (2013)).

  2. A stream of edges from an R-MAT graph with 2097152 vertices, edge factor 8, and SSCA-2 parameters (0.45, 0.15, 0.15, 0.25) Chakrabarti et al. (2004).

  3. The Reddit reply network Kumar et al. (2018) from SNAP Leskovec and Krevl (2014), with 646,024,723 edges.

  4. A synthetic dataset with 100 contiguous observations of each edge in a stream of edges with new, unique endpoints.

For experiments below validating Theorem 6.1, we note that Dataset 3 has a uniqueness parameter (the proportion of non-duplicate edges) of roughly 0.67.

Figure 9: Experiment 1: Prototype XS-CC normal-mode streaming rates on the four datasets on Sky Lake. Table 1 places the expected peak performance of X-Stream computations on this architecture at between 1.4 million and 5 million edges per second. This indicates that our single-threaded XS-CC implementation is not bandwidth bound and a future version could benefit from multithreading. The decrease in performance visible in the plot occurs when the second X-Stream processor becomes the builder.

9.2 XS-CC implementation

We used PHISH with the MPI back end to implement the XS-CC algorithm. Stream processing modules in PHISH are called “minnows,” and we instantiated a minnow to serve as the X-Stream I/O processor and a group of minnows to form the X-Stream ring of processors (one per compute node in the Sky Lake cluster). We also ran with a single compute node hosting all XS-CC PEs. However, since our prototype is compute bound, the rates we achieved were comparable and are not presented.

Figure 10: Experiment 2: Empirical validation of Theorem 6.1 using Dataset 3. The three quantities shown are the fraction of edges that survive the aging predicate, the fraction of time that queries are enabled, and the bandwidth expansion factor (the number of slots in a bundle).

We now present results from Sky Lake runs of our PHISH-based, single-threaded implementation of XS-CC. Before collecting these results, we validated the correctness of XS-CC by designating every tenth stream edge to be a connectivity query, statically computing the correct connected components, and confirming that XS-CC’s query result matched that from the static computation (with and without aging events). We ran this validation on a prefix of approximately 800,000 edges from Dataset 3.

9.3 Experiment 1: XS-CC normal-mode streaming rate

Figure 9 shows XS-CC streaming rates for normal mode on the four datasets. We streamed the full Datasets 1 and 2 and a prefix of 30 million edges of Dataset 3. Our single-threaded prototype implementation is compute bound, as verified by comparing to the benchmark results of Table 1. Note that the performance of our prototype is heavily data-dependent. On the “easy” synthetic dataset (Dataset 4 in the figure), we match the rates of Table 1. When real datasets cause more work, the ingestion rate drops, again showing that we are compute bound. Our prototype achieves rates between 500,000 and 1,000,000 edges per second, depending on the dataset. We note that Dataset 1, which is a real dataset, has many repeat edges and admits an ingestion rate of one million edges per second.

Figure 11: Experiment 3: Automated aging with reservoir sampling on a prefix of 300 million edges of Dataset 3. The aging strategy honors Lemma 3. This demonstrates our strategy to run on an infinite graph stream through arbitrary numbers of aging events.
Figure 12: Experiment 3: The stream processing rate of our prototype over time during automated aging. Although the ingestion rate of new edges does slow during aging due to factors not modeled, such as the proportion of tree edges, we note that aging during periods of lesser stream intensity (such as nighttime for a worksite) should allow infinite streaming without dropping edges.

9.4 Experiment 2: XS-CC with a single aging event

Recall that Theorem 6.1 relates the X-Stream bandwidth expansion parameter b to parameters of the system and dataset. We validate that theorem empirically for a 30 million-edge prefix of Dataset 3 by analyzing a single aging event triggered partway through the stream. In Figure 10, the capacity of one X-Stream processor is fixed, for each parameter setting, such that the system will be completely full at the end of the 30 million-edge stream after processing one aging event (the aging tick and the fraction of stored edges that survive together determine the total storage needed). The number of X-Stream processors is fixed. Varying the target fraction of edges that survive the aging event and the fraction of X-Stream ticks in which queries are enabled, we show as a 3-D surface the bandwidth expansion factor predicted by Theorem 6.1. We overlay empirical results in the form of observed data points from XS-CC runs when the bandwidth expansion factor is set as predicted by the theorem. We claim that the prediction surface and observed data points corroborate the theorem in this experiment.

9.5 Experiment 3: XS-CC runs of arbitrary length

The most important contribution of our work is our set of ideas regarding infinite graph streams and running XS-CC for indefinite periods of time without filling up or failing. We corroborate these ideas empirically in this section for a simple use case: the aging predicate is a simple timestamp comparison, deleting all edges older than a given X-Stream tick. This strategy could be adapted to accommodate other aging predicates.

The primary challenge facing an X-Stream system administrator is deciding when to initiate an aging event and what threshold to use. In this section we present an automated solution. The system administrator initializes the system with a target value (the fraction of edges that should survive an aging event). Then we run XS-CC with the following aging-invocation protocol. This could be specified in detailed pseudocode, but in the interest of space we describe it informally below.

When the tail processor begins to fill, we begin an automated binary search to find a timestamp that will hit our fractional target of edges that survive the aging event. We augment the data structures of each X-Stream ring processor to include a small reservoir of 100 edges, and ensure that this is a representative sample by using the classical technique of reservoir sampling Vitter (1985).

The binary search proceeds by varying a candidate threshold between the oldest and newest timestamps in the system. At each candidate value, each ring processor estimates from its reservoir the number of its edges that would survive an aging event with that threshold. In one circuit of the ring in a payload bundle, the tail processor learns whether the threshold needs to be increased or decreased in order to hit the target value. In a logarithmic number of passes, the tail processor knows an accurate threshold and tells the head to initiate aging with it. In practice, this should give plenty of time to complete aging honoring Lemma 3.
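The sketch below illustrates this protocol in C++ under stated assumptions: the reservoir size, names, and the centralized search loop are ours for illustration; in the real system each probe of the search is one circuit of the ring rather than a function call.

#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Per-processor reservoir using classical reservoir sampling (Vitter 1985).
struct Reservoir {
  static constexpr size_t kSize = 100;
  std::vector<uint64_t> sample;  // timestamps of sampled edges
  uint64_t seen = 0;
  std::mt19937_64 rng{12345};

  void observe(uint64_t ts) {
    ++seen;
    if (sample.size() < kSize) { sample.push_back(ts); return; }
    std::uniform_int_distribution<uint64_t> d(0, seen - 1);
    uint64_t j = d(rng);
    if (j < kSize) sample[j] = ts;  // replace with decreasing probability
  }
  // Estimated fraction of this processor's edges that survive threshold t.
  double survivalFraction(uint64_t t) const {
    if (sample.empty()) return 0.0;
    size_t newer = std::count_if(sample.begin(), sample.end(),
                                 [t](uint64_t ts) { return ts >= t; });
    return double(newer) / sample.size();
  }
};

// Binary search for a timestamp threshold whose estimated survival
// fraction approximates the target (e.g., 0.5).
uint64_t findThreshold(const std::vector<Reservoir>& ring,
                       uint64_t oldest, uint64_t newest, double target) {
  uint64_t lo = oldest, hi = newest;
  while (lo < hi) {
    uint64_t mid = lo + (hi - lo) / 2;
    double est = 0.0, total = 0.0;
    for (const auto& r : ring) {  // weight each processor by edges seen
      est += r.survivalFraction(mid) * double(r.seen);
      total += double(r.seen);
    }
    double frac = total > 0 ? est / total : 0.0;
    if (frac > target) lo = mid + 1;  // too many survivors: raise threshold
    else hi = mid;
  }
  return lo;
}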

Figure 11 depicts the result of running XS-CC on a 300 million-edge prefix of Dataset 3 using this automated aging strategy with a target value of 0.5. We see that the binary search succeeds in finding aging thresholds that reliably reestablish the target storage level over an arbitrary number of aging events (we depict the first 27).

10 Non-constant queries and commands

As we have shown, connectivity queries propagate through the X-Stream ring of processors at one processor per tick, and the query answer is sent from the tail processor to the head, then back to the I/O processor. Another potentially useful query that finishes with the same latency is “How many edges are in the system?”

X-Stream also supports queries with non-constant-sized output. At most one such query can be active at a time. The answer to the query is output in constant-sized pieces using the payload slots. The canonical non-constant query is a request to output all vertices in small connected components. Specifically, the answer is the names of all components with at most k vertices and the list of vertices within each. This query makes practical sense only in graphs that have a giant connected component, but most real graphs have one. We describe how X-Stream executes this specific query.

For a local component with name c, let s(c) be the number of vertices in c. A processor can compute the size of a local component as the sum of the number of vertices in each of its building blocks. This count is 1 for a primitive building block. For this discussion, we assume processors keep track of the number of primitive building blocks for each local component while building these components. This adds only constant work per union-find operation. However, it is also possible to initialize local-component sizes to zero and compute them on the fly for this query. In that case the processor does at most k−1 work counting primitive building blocks or outputting the messages below, which will further delay the query response. Processors receive the size of non-primitive building blocks from upstream processors.
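For concreteness, a minimal union-find sketch that tracks component sizes at constant extra work per union, as assumed above, might look as follows (illustrative names, not the prototype's code):

#include <cstdint>
#include <numeric>
#include <vector>

// Union-find over local labels, counting primitive building blocks
// (vertices) per component root.
struct LocalComponents {
  std::vector<uint32_t> parent;
  std::vector<uint32_t> count;  // primitive building blocks under each root

  explicit LocalComponents(uint32_t n) : parent(n), count(n, 1) {
    std::iota(parent.begin(), parent.end(), 0);
  }
  uint32_t find(uint32_t x) {
    while (parent[x] != x) x = parent[x] = parent[parent[x]];  // path halving
    return x;
  }
  void unite(uint32_t a, uint32_t b) {
    a = find(a); b = find(b);
    if (a == b) return;    // non-tree edge: sizes unchanged
    parent[b] = a;
    count[a] += count[b];  // constant extra work per UNION
  }
  uint32_t size(uint32_t x) { return count[find(x)]; }  // s(c)
};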

When the head processor receives the query “Output the vertices in components that have at most k vertices” in the primary slot of a bundle, it passes the query downstream in the primary slot. This allows all processors to learn the type of query and the parameter k. The head then uses the payload slots to start answering the query. The query is answered in two phases. In the first phase, processors compute component sizes. For each local component with name c such that s(c) ≤ k, the head processor (eventually) sends a message (c, s(c)) in a payload slot. The head outputs b−1 of these messages per bundle if it already knows its component sizes. After the last message, it outputs a “query phase done” token.

Each downstream processor passes the initial query downstream. Then, for each message (c, s(c)), the processor checks to see if c is a building block for one of its local components c′. If it is, then it increments the size of c′ by s(c). If c is not a local building block, the processor sends the message downstream. When the processor receives the “query phase done” token, it knows the size of all its non-primitive building blocks, and hence knows the size of all of its local components. It sends its own (c′, s(c′)) messages for each local component c′ such that s(c′) ≤ k. When it has sent all its messages, it passes the “query phase done” token downstream. If the current graph has a connected component of size at most k, the message with its final size is passed through the tail and out to the analyst. The tail also passes the “query phase done” token to the head.
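A minimal sketch of this phase-one accumulation logic, with illustrative message and map names (not the prototype's), follows:

#include <cstdint>
#include <unordered_map>

// Phase one of the small-components query: fold sizes received for
// upstream building blocks into this processor's local component sizes.
struct SizeMsg { uint64_t block; uint64_t size; };  // (c, s(c))

struct Phase1State {
  // maps an upstream building-block name to the local component it joined
  std::unordered_map<uint64_t, uint64_t> blockToLocal;
  std::unordered_map<uint64_t, uint64_t> localSize;  // accumulated sizes

  // Returns true if the message was consumed; false means forward it.
  bool onMessage(const SizeMsg& m) {
    auto it = blockToLocal.find(m.block);
    if (it == blockToLocal.end()) return false;  // not ours: pass downstream
    localSize[it->second] += m.size;             // increment s(c') by s(c)
    return true;
  }
};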

Sealed processors (full of tree edges) can set a flag indicating they have computed their component sizes. If there is another such query before an aging event, a sealed processor removes messages associated with its local building blocks without incrementing any size counters.

In the second phase, the head processor (eventually) sends a message (c, v) for each primitive vertex v in each local component c reported in the first phase. For the head, all building blocks are primitive vertices. It is possible to put more than one vertex in this kind of message (e.g., (c, v1, v2, …)), depending upon the size of a slot. After the last such message, the head passes a “query done” token downstream.

When a downstream processor receives a message (c, v) from upstream in the second phase, it checks to see if c is a building block for one of its local components c′. If not, then it passes the message downstream. If so, and s(c′) ≤ k (i.e., the processor reported local component c′ in the first phase), it relabels the message, sending (c′, v) downstream. If c′ is too large, it simply removes the message from the system.

When a downstream processor receives the “query done” token, it outputs messages (c′, v), where c′ is a local component with s(c′) ≤ k and v is a primitive building block (vertex) in local component c′.

A somewhat easier non-constant query is the spanning tree: starting with the head, each processor outputs its tree edges.

Some queries can be either constant-size (latency proportional to the ring length) or non-constant, depending upon what additional data structures the processors maintain. One example is “What is the degree of vertex v?” Suppose each processor maintains adjacency lists for the subgraph it holds. Then the processor can find the number of edges adjacent to a vertex in constant time, given a hash table to access the adjacency list for each vertex. In this case, the vertex-degree query has constant latency per processor: it makes one pass around the ring, with the answer progressing one processor per tick. Otherwise, without this data structure, each processor needs non-constant time to compute the number of its edges adjacent to vertex v, making this a non-constant query. The message still touches each processor once, but a processor may require multiple ticks to compute the number to add to the accumulating degree value.
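A minimal sketch of the constant-time local lookup assumed here (illustrative names only):

#include <cstdint>
#include <unordered_map>
#include <vector>

// If each processor keeps a hash table from vertex id to its local
// adjacency list, its contribution to a degree query is a constant-time
// lookup added to the accumulating degree as the query passes through.
struct LocalGraph {
  std::unordered_map<uint64_t, std::vector<uint64_t>> adj;

  size_t localDegree(uint64_t v) const {
    auto it = adj.find(v);
    return it == adj.end() ? 0 : it->second.size();
  }
};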

Linear algebraic computations typically involve a matrix-vector product, which would be unwieldy to compute directly in the X-Stream model. However, the emerging field of randomized linear algebra Drineas and Mahoney (2018) offers a path forward. If we devote some space in the tail processor to accommodate a sample of edges (adjusting Lemma 3 accordingly), payload slots can be used to accumulate a random sample of the graph. Techniques such as randomized PageRank Gasnikov and Dmitriev (2015) or others might then be applied in a separate thread in the tail processor, still with minimal interruption to the input stream.

11 Conclusion and Future work

We have provided the first comprehensive set of ideas to handle infinite graph streaming with bulk expiration events, including theory and a prototype implementation. Despite its use of the network interconnect and MPI message passing rather than memory access, the prototype sometimes matches the ingestion rate of a purely on-node Intel/TBB benchmark. Slowdowns are data-dependent and might be mitigated by a multithreaded implementation and algorithm engineering. Furthermore, performance of a single XS-CC ring will benefit from advances in computer architecture. Although our prototype operates correctly, future work would be necessary to engineer a production version.

To close, we consider the possibility of improving X-Stream’s ingestion rate by orders of magnitude. It is possible to do this if we leverage a key property of most real graphs: the giant component. Suppose that we must ingest such a graph via hundreds or thousands of disjoint streams, and suppose that we instantiate an independent XS-CC instance for each. We note that with overwhelming likelihood, each XS-CC instance will ingest a portion of the global giant component. Using ideas from Section 10, each XS-CC instance can stream its small components out to a “small-component server” (and notify that server of vertices in components that have joined the giant component). The small-component server would handle any connectivity query not involving the giant component (of which there are relatively few). Full detail is beyond the scope of this paper, and we leave it for future work.

Acknowledgements.
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This research was funded through the Laboratory Directed Research and Development (LDRD) program at Sandia. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. We thank Siva Rajamanickam, Cannada Lewis, and Si Hammond for useful discussions and baseline code for the TBB benchmark.

References

  • [1] G. Aggarwal, M. Datar, S. Rajagopalan, and M. Ruhl (2004) On the streaming model augmented with a sorting primitive. In Foundations of Computer Science, 2004. Proceedings. 45th Annual IEEE Symposium on, pp. 540–549. Cited by: §2.2.
  • [2] W. Aiello, F. Chung, and L. Lu (2000) A random graph model for massive graphs. In Proceedings of the thirty-second annual ACM symposium on Theory of computing, pp. 171–180. Cited by: §1.
  • [3] A. Basak, J. Lin, R. Lorica, X. Xie, Z. Chishti, A. Alameldeen, and Y. Xie (2020) SAGA-bench: software and hardware characterization of streaming graph analytics workloads. In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 12–23. Cited by: §8, §8.
  • [4] J. Berry, M. Oster, C. A. Phillips, S. Plimpton, and T. M. Shead (2013) Maintaining connected components for infinite graph streams. In Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pp. 95–102. Cited by: §2.2, §8, item 1.
  • [5] M. Besta, M. Fischer, V. Kalavri, M. Kapralov, and T. Hoefler (2019) Practice of streaming processing of dynamic graphs: concepts, models, and systems. arXiv preprint arXiv:1912.12740. Cited by: §8.
  • [6] D. Chakrabarti, Y. Zhan, and C. Faloutsos (2004) R-MAT: a recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 442–446. Cited by: item 2.
  • [7] M. S. Crouch, A. McGregor, and D. Stubbs (2013) Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pp. 337–348. Cited by: §1, §8.
  • [8] C. Demetrescu, I. Finocchi, and A. Ribichini (2009) Trading off space for passes in graph streaming problems. ACM Transactions on Algorithms (TALG) 6 (1), pp. 6. Cited by: §2.2, §2.3, §3.1, Invariant 1.
  • [9] P. Drineas and M. W. Mahoney (2018) Lectures on randomized numerical linear algebra. The Mathematics of Data 25 (1). Cited by: §10.
  • [10] D. Ediger, R. McColl, J. Riedy, and D. A. Bader (2012) Stinger: high performance data structure for streaming graphs. In 2012 IEEE Conference on High Performance Extreme Computing, pp. 1–5. Cited by: §8, §8.
  • [11] A. V. Gasnikov and D. Y. Dmitriev (2015) On efficient randomized algorithms for finding the pagerank vector. Computational Mathematics and Mathematical Physics 55 (3), pp. 349–365. Cited by: §10.
  • [12] M. Grossman, H. Pritchard, S. Poole, and V. Sarkar (2020) HOOVER: leveraging openshmem for high performance, flexible streaming graph applications. In 2020 IEEE/ACM 3rd Annual Parallel Applications Workshop: Alternatives To MPI+ X (PAW-ATM), pp. 55–65. Cited by: §8, §8.
  • [13] K. Iwabuchi, S. Sallinen, R. Pearce, B. Van Essen, M. Gokhale, and S. Matsuoka (2016) Towards a distributed large-scale dynamic graph data store. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 892–901. Cited by: §8, §8.
  • [14] S. Kumar, W. L. Hamilton, J. Leskovec, and D. Jurafsky (2018) Community interaction and conflict on the web. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 933–943. Cited by: item 3.
  • [15] H. Kung (1980) Algorithms for VLSI processor arrays. Introduction to VLSI systems, pp. 271–292. Cited by: §3.
  • [16] J. Leskovec and A. Krevl (2014-06) SNAP Datasets: Stanford large network dataset collection. Note: http://snap.stanford.edu/data Cited by: item 3.
  • [17] A. McGregor (2014) Graph stream algorithms: a survey. ACM SIGMOD Record 43 (1), pp. 9–20. Cited by: §1, §8.
  • [18] J. I. Munro and M. S. Paterson (1980) Selection and sorting with limited storage. Theoretical computer science 12 (3), pp. 315–323. Cited by: §2.2.
  • [19] S. Muthukrishnan et al. (2005) Data streams: algorithms and applications. Foundations and Trends in Theoretical Computer Science 1 (2), pp. 117–236. Cited by: §2.2, §8.
  • [20] S. J. Plimpton and T. Shead (2014) Streaming data analytics via message passing with application to graph algorithms. Journal of Parallel and Distributed Computing 74 (8), pp. 2687–2698. Cited by: §9.1.
  • [21] P. Raghavan and M. Henzinger (1999) Computing on data streams. In Proc. DIMACS Workshop External Memory and Visualization, Vol. 50, pp. 107. Cited by: §2.2.
  • [22] J. Riedy, D. Ediger, D. A. Bader, and H. Meyerhenke (2011) Tracking structure of streaming social networks. In 2011 Graph Exploitation Symposium hosted by MIT Lincoln Labs, Cited by: §8.
  • [23] S. Sallinen, R. Pearce, and M. Ripeanu (2019) Incremental graph processing for on-line analytics. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1007–1018. Cited by: §8.
  • [24] J. S. Vitter (1985) Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS) 11 (1), pp. 37–57. Cited by: §9.5.
  • [25] B. Wheatman and H. Xu (2021) A parallel packed memory array to store dynamic graphs. In 2021 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 31–45. Cited by: §8, §8.
  • [26] Wolfram Alpha LLC (2021) Wolfram|Alpha. Note: https://www.wolframalpha.com/ Cited by: §6.
  • [27] C. Yin, J. Riedy, and D. A. Bader (2018) A new algorithmic model for graph analysis of streaming data. In International Workshop on Mining and Learning with Graphs, Vol. 10. Cited by: §8, §8.
  • [28] C. Yin and J. Riedy (2019) Concurrent katz centrality for streaming graphs. In 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. Cited by: §8, §8.