1 Introduction
We assume that the graph-edge stream is effectively infinite. This means that as long as the algorithm is running, it must always be prepared for the arrival of another edge. At any time, the system has seen only a finite set of edges and need store only a finite graph representation. However, this graph can be arbitrarily large and may eventually exceed any particular finite storage. To handle this case, we define a streaming model that exploits distributed systems with huge aggregate memory and handles bulk deletions customized by a user-provided deletion or “aging” predicate. The most straightforward predicate would discard edges older than a certain timestamp, but users may require bulk deletions guided by different predicates. For example, they may wish to preserve some old edges of high value.
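To make the notion of a user-provided deletion predicate concrete, here is a minimal Python sketch. The record fields `ts` and `value` and the helper names are our illustrative assumptions, not part of the system's API:

```python
# Hypothetical edge records; "ts" (timestamp) and "value" are illustrative.
def make_age_predicate(cutoff_ts):
    """Survival predicate: keep only edges with timestamp >= cutoff_ts."""
    return lambda edge: edge["ts"] >= cutoff_ts

def make_value_aware_predicate(cutoff_ts, min_value):
    """Keep recent edges, but also preserve old edges marked high-value."""
    return lambda edge: edge["ts"] >= cutoff_ts or edge["value"] >= min_value

edges = [
    {"u": "a", "v": "b", "ts": 10, "value": 0},
    {"u": "b", "v": "c", "ts": 90, "value": 0},
    {"u": "c", "v": "d", "ts": 5,  "value": 9},  # old, but high-value
]
survives = make_value_aware_predicate(50, 5)
kept = [e for e in edges if survives(e)]
```

Under the value-aware predicate, the old high-value edge survives the bulk deletion even though a pure timestamp threshold would discard it.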
Previous theoretical work on infinite graph streams in a sliding-window model does allow for automatic time-based expiration while maintaining connected components Crouch et al. (2013); McGregor (2014). We adapt and generalize these theoretical ideas by allowing user-defined deletion predicates and distributed computation. Furthermore, in Section 8 we briefly survey previous work on dynamic graph processing. For now, we simply note that while some of this literature achieves impressive edge-ingestion rates, none of it explains how to continue ingestion indefinitely as the storage capacity fills up. These dynamic methods accept a stream of insertions and deletions, and if the former dominates the latter, the system will eventually fill and fail. In this paper, we spend most of our effort providing a theoretical basis for graph stream computations of arbitrary duration. We ensure that each edge is stored only once in our distributed model during normal conditions, and that we recover to that steady state in a predictable way after an aging event.
Many dynamic graph processing systems ingest edges concurrently in large blocks, making it potentially impossible to detect the emergence and disappearance of fine-grained detail, such as small components that merge into a giant component, as predicted by Aiello et al. (2000). We model ingestion at a single-edge granularity to ensure that phenomena such as this will be observable.
Contributions
We give a distributed algorithm for maintaining streaming connected components. Processors are connected in a one-way ring, with only one processor connected to the outside. The algorithm is designed for cybersecurity monitoring and has the following set of features:

The system works with an arbitrary number of processors, allowing it to store an arbitrary number of edges.

Each edge is stored on only one processor, so the asymptotic space complexity is optimal: linear in the number of active edges.

The system fails because of space only if all processors are holding edges at their maximum capacity.

Processing an edge or query requires almost-constant time per system “tick”: the time to run union-find, governed by the inverse Ackermann function.

Connectivity-related queries, spanning-tree queries, etc., are answered at perfect granularity. Though there is some latency, the answer is exact with respect to the graph in the system at the time the query was asked. This is in contrast to systems that process edge changes in batches of up to millions of edges, allowing no finer granularity for queries.

Because some cyber phenomena do not have explicit edge deletions, the system removes edges only when required.

This edge deletion is done in bulk. Though querying is disabled during data-structure repair, the system continues to ingest incoming edges. There is no need to buffer or drop edges if deletion happens during a period of lower use, such as at night.

The analyst can select any (constant-time) edge predicate to determine which edges survive a bulk deletion. This allows analysts to keep edges they feel are of high value, regardless of their age.

For age-based deletion, the system can trigger bulk deletion and select correct parameters for it automatically.


If the analyst selects legal values (depending on properties of the hardware and input stream) for how many edges survive a bulk deletion and what fraction of the time the system must answer queries, the system will run indefinitely.
2 Preliminaries
2.1 Modeling the graph through time
We mark time based upon the arrival of any input stream element. Time starts at zero and increments whenever a stream element arrives. The input stream at time t is the ordered stream of elements that have arrived between time 0 and time t. A stream element is an input edge (u, v), a query (e.g., a connectivity query), or a command (e.g., an aging command). In Section 3 we formalize the operation of our new model, XStream, at each of these ticks.
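The three element types and the tick-per-element clock can be sketched as follows; the class names are ours, purely illustrative:

```python
from dataclasses import dataclass
from typing import Union

# Illustrative stream-element types; the names are ours, not the paper's.
@dataclass
class Edge:
    u: str
    v: str

@dataclass
class Query:
    u: str
    v: str

@dataclass
class AgeCommand:
    predicate: object   # e.g., a timestamp threshold

StreamElement = Union[Edge, Query, AgeCommand]

# Time starts at zero and advances by one per arriving element of ANY kind.
stream = [Edge("a", "b"), Edge("b", "c"), Query("a", "c"), AgeCommand(50)]
t = 0
arrival_times = {}
for elt in stream:
    t += 1
    arrival_times[t] = type(elt).__name__
```

Note that queries and commands advance the clock exactly as edges do, which is what lets later sections speak of "the graph at time t" uniformly.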
Definition 1
 •
 Active Edge:

At any time t, we say an edge is active if it has entered the system and no subsequent aging command has removed it.
 •
 Active Graph:

At any time t, the active graph is G_t = (V_t, E_t). The edge set E_t is the set of active edges at time t, and the vertex set V_t is the set of endpoints of E_t.
 •
 Active Stream:

At any time t, the active stream is the subsequence of the input stream consisting only of active edges.
Note that G_t differs from G_{t-1} iff the stream element arriving at time t is an edge not already in E_{t-1}, or an aging command. In the latter case, E_t is the set of edges that survive the aging.
2.2 Streaming models
In classic streaming models, computing systems receive data in a sequence of pieces and do not have space to store the entire data set. As in online algorithms, systems must make irreversible decisions without knowledge of future data arrivals. Graph streams are a type of data stream in which a sequence of edges arrives at the computing system, which may assemble some of the edges into a graph data structure. Applications include modeling the spread of diseases in healthcare networks, analysis of social networks, and security and anomaly detection in computer-network data. We focus on cybersecurity applications, in which analysts can infer interesting information from graphs that model relationships between entities. As the scale of such graphs increases, analysts will need algorithms to at least partially automate stream analysis.
We present detailed algorithms and a complete implementation of the real-time graph-mining methodology introduced in Berry et al. (2013). In this streaming model, the full graph is stored in a distributed system. The model is also capable of bulk edge deletion while continuing to accept new edges. The algorithm continuously maintains connected-component information. It can answer queries about components and connectivity, except during a period of data-structure repair immediately following a bulk delete.
In classic graph streaming models, such as in Munro and Paterson (1980); Muthukrishnan and others (2005); Raghavan and Henzinger (1999), the input is a finite sequence of edges, each consisting of a pair of vertices. The edge sequence is an arbitrary permutation of graph edges, which may include duplicates. The output consists of each vertex along with a label, such that two vertices have the same label if and only if they belong to the same component. Algorithms for the classic streaming model have two parameters: the number of times the algorithm can see the input stream (the number of passes), and the storage space available to the algorithm.
The WStream model, developed in Demetrescu et al. (2009), uses the concept of multiple passes. Each pass can emit a customized intermediate stream. WStream can support several graph algorithms. However, our work is specifically based upon the connected-components algorithm of Demetrescu et al. (2009), which we will call DFR to recognize the authors: Demetrescu, Finocchi, and Ribichini.
In the stream-sort model, introduced in Aggarwal et al. (2004), algorithms can create and process intermediate temporary streams, and reorder the contents of intermediate streams at no cost.
2.3 WStream
Because our work is an extension of the DFR finite-stream connected-components algorithm based on the WStream model, we describe that algorithm and model in detail now. DFR uses graph contraction. In contraction, one or more connected subgraphs are each replaced by a supernode. Edges within a contracted subgraph disappear, and edges with one endpoint inside a supernode now connect to the supernode. For example, in Figure 1, a connected subgraph (a triangle) is contracted into a single supernode.
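A minimal sketch of contraction, assuming edges are represented as simple vertex-name pairs:

```python
def contract(edges, subgraph_nodes, supernode):
    """Replace a connected subgraph with a single supernode: internal
    edges disappear; boundary edges are reattached to the supernode."""
    out = []
    for u, v in edges:
        u2 = supernode if u in subgraph_nodes else u
        v2 = supernode if v in subgraph_nodes else v
        if u2 != v2:              # edge fully inside the subgraph: drop it
            out.append((u2, v2))
    return out

# Triangle {a, b, c} contracted into supernode S; edge (c, d) becomes (S, d).
contracted = contract([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")],
                      {"a", "b", "c"}, "S")
```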
In each pass, the WStream connected-components algorithm ingests a finite stream and outputs a modified stream to read for the next pass. Each stream represents a progressively more contracted graph, where connected subgraphs contract to a node, until a single node represents each connected component. The stream at each pass comes in two parts. The first (A) part is the current, partially contracted graph, and the second (B) part lists the graph nodes buried inside supernodes. The initial input stream has all the graph edges in part A and an empty part B. The final output stream has an empty part A, and its part B gives a component label for each vertex. Figure 1 illustrates the input and output of the first pass of the WStream connected-components algorithm.
During pass i, the algorithm ingests streams A_i and B_i, in that order. First, it computes connected components using union-find data structures until it runs out of memory. More formally, the capacity of the WStream processor is measured in union operations. As shown in Figure 1, the set representative is one of the set elements, for reasons given later. This union-find stage ingests a prefix of stream A_i. Because its memory is now full, the processor must now emit information about the remaining stream rather than ingesting it. The algorithm incorporates what it has learned about the graph’s connected components into the input for the next pass. Specifically, in the contracted graph corresponding to stream A_{i+1}, each set in the union-find data structure is represented as a supernode.
The DFR algorithm now generates the next stream A_{i+1} from the remainder of stream A_i. For each remaining edge in stream A_i, if an endpoint is buried within a supernode, we relabel that endpoint to the supernode’s name. For example, in Figure 1, an edge with one endpoint buried in a supernode is relabeled so that it connects to that supernode instead. If both endpoints are relabeled to the same supernode, DFR drops the edge according to contraction rules.
We now describe how to process stream B_i and emit stream B_{i+1}. In the first pass, stream B_1 is empty. Stream B_{i+1} tells which nodes are “buried” in the newly created supernodes. Specifically, stream B_{i+1} is a set of pairs (x, s), where node x is buried inside supernode s. Node x will never appear in any future A stream. To process a nonempty stream B_i, we use the same relabeling strategy we used while emitting stream A_{i+1}. However, the interpretation is different. If (x, s) is in stream B_i, and supernode s is now part of a union-find set with representative s′, we emit (x, s′) to stream B_{i+1}. This means that node x is part of supernode s′ in stream A_{i+1}.
This process repeats until a pass k where A_k is the empty stream. Stream B_k can be interpreted as connected-component labels. Two nodes have the same supernode label if and only if they are in the same connected component.
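Under our assumptions (capacity counted in union operations, and a smaller-name-wins representative rule), the A-stream portion of a single DFR pass might be sketched as follows; B-stream handling is omitted for brevity:

```python
class CappedUnionFind:
    """Union-find limited to `capacity` union operations (the processor's
    memory budget, measured in unions as in the text)."""
    def __init__(self, capacity):
        self.parent = {}
        self.capacity = capacity
        self.unions = 0

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True                  # duplicate/cycle edge: no union spent
        if self.unions >= self.capacity:
            return False                 # memory full: stop ingesting
        self.parent[max(ra, rb)] = min(ra, rb)  # smaller-name-wins rule
        self.unions += 1
        return True

def dfr_pass(stream_a, capacity):
    """One pass: union-find a prefix of stream A, then relabel the
    remainder by supernode and re-emit surviving edges as the next A."""
    uf = CappedUnionFind(capacity)
    i = 0
    while i < len(stream_a) and uf.union(*stream_a[i]):
        i += 1
    next_a = []
    for u, v in stream_a[i:]:
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:                     # drop edges inside one supernode
            next_a.append((ru, rv))
    # B stream: which nodes are buried inside which supernode.
    next_b = [(x, uf.find(x)) for x in list(uf.parent) if uf.find(x) != x]
    return next_a, next_b

next_a, next_b = dfr_pass([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")],
                          capacity=2)
```

With capacity 2, the pass absorbs the path a–b–c into a supernode named "a", then relabels and re-emits the remaining edges for the next pass while reporting that b and c are buried in "a".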
We now summarize the argument from Demetrescu et al. (2009) that the DFR algorithm is correct. The B streams are pairs of vertices from the original graph that are in the same connected component, by correctness of the union-find algorithm. We can therefore interpret each pair as an edge in a star-graph representation of a (partial) connected component (in their lingo, a “collection of stars”).
Correctness of DFR follows from maintaining the following invariant at each pass.
Invariant 1
(Demetrescu et al. (2009) Invariant 2.2) For each pass i, stream B_i is a collection of stars, and the graph represented by streams A_i and B_i together has the same connected components as the input graph.
Observation 1
DFR computes the same set of connected components for any permutation of the input stream, and any arbitrary duplication of stream elements.
3 XStream model
To motivate our XStream model, we first consider how the theoretical WStream model might be implemented. We show a plausible solution in Figure 2. A single processor reads from and writes to files that store intermediate streams. As these files may be of arbitrary size, a direct implementation of WStream is only a notional idea. XStream is a theoretical variant of WStream that can be implemented efficiently.
In graph terms, WStream stores only spanning-tree edges. It may drop any non-tree edge, since there is no concept of deletion in that model. Our XStream model must accommodate bulk deletion by design, since its input stream is infinite, and this means that non-tree edges must be retained. If a spanning-tree edge is deleted, then some non-tree edge might reconnect the graph.
Our XStream model supports WStream-like computations on infinite streams of graph edges. We present XSCC, a connected-components algorithm analogous to DFR but implemented in the XStream model. Concurrent, distributed processing allows the XSCC algorithm to handle streams without end markers, which are necessary in the DFR algorithm to distinguish between stream passes. Aging also allows the XStream model to manage unending edge streams in finite space. XStream has a ring of processors, plus an I/O processor, shown in Figure 3. The I/O processor passes the input stream of edges, as well as queries and other external commands, to the first processor of the ring. The I/O processor also outputs information received from the ring, such as query responses.
Let P_1, …, P_p be the non-I/O processors defining the current state of the system. Processor P_1 is the head and processor P_p is the tail, and we define successor and predecessor functions as usual: succ(P_i) = P_{i+1} for i < p, and pred(P_i) = P_{i-1} for i > 1. Each processor passes data to its successor, including the tail, which passes data back to the head.
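The ring topology amounts to simple modular successor/predecessor arithmetic; a sketch with 0-based processor indices (our convention here):

```python
def succ(i, p):
    """Successor on the one-way ring of p processors; the tail
    (index p-1) wraps around to the head (index 0)."""
    return (i + 1) % p

def pred(i, p):
    """Predecessor on the ring; the head's predecessor is the tail."""
    return (i - 1) % p

p = 4
ring_order = [succ(i, p) for i in range(p)]
```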
Each processor P_i has an edge storage capacity; for simplicity, we assume all processors have the same capacity C, for a total memory capacity of pC edges. We also assume C is much larger than p, since processors generally have at least megabytes of memory, even when there are enough processors for a relatively large p.
We next define a notion of time for the XStream model. For this paper, we assume that all hashing and unionfind operations are effectively constanttime.
Definition 2 (XStream Step)
The XStream clock marks units of time, or ticks. At each tick, every processor is activated: it receives a fixed-size bundle of slots from its predecessor, does a constant amount of computation, and sends a fixed-size output bundle to its successor. The head processor also receives a single slot of information from the I/O processor at each tick.
XStream steps are thus conceptually systolic Kung (1980), though real implementations, such as the one we present in Section 7, can be asynchronous.
Using the notion of the XStream clock, we can define the logical graph:
Definition 3
The logical graph at time t, G_t = (V_t, E_t), is defined as follows:

the edge set E_t is the set of active edges at time t, and

the vertex set V_t is the set of endpoints of E_t.
Each message between ring processors is a bundle of b constant-sized slots. The constant b is the bandwidth expansion factor mentioned in the abstract. One of our key modeling assumptions is that the bandwidth within the ring of processors is at least b times greater than the stream arrival rate. We give theory governing this factor in Section 5 and corroborate that theory via experiment in Section 9.
Each slot can hold one edge and is designated either primary or payload by its position in the bundle. By convention, the first slot in a bundle is primary and all others are payload. Primary slots generally contain information from outside the system, such as input stream edges or queries, while payload slots are used during the aging process and during queries with non-constant output size (for example, enumerating all vertices in small components).
Once a processor receives a bundle of slots, it processes the contents of each in turn. The processor can modify or replace the slot contents, as we will describe below. When it has finished with all occupied slots, it emits the bundle downstream.
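A toy sketch of one tick's bundle handling, assuming a bundle of four slots and a processor represented as a function on slot contents (all names here are illustrative):

```python
BUNDLE_SIZE = 4   # illustrative bandwidth expansion factor: slots per bundle

def make_bundle(primary, payload=()):
    """First slot is primary; remaining slots are payload (None = empty)."""
    slots = [primary] + list(payload)
    return slots + [None] * (BUNDLE_SIZE - len(slots))

def tick(process_slot, bundle):
    """One tick for one processor: examine each occupied slot in turn,
    possibly rewriting its contents, then forward the bundle downstream."""
    return [process_slot(s) if s is not None else None for s in bundle]

# Toy processor that relabels endpoint "a" to supernode "S".
rename = {"a": "S"}.get
processor = lambda s: tuple(rename(x, x) for x in s)
forwarded = tick(processor, make_bundle(("a", "b")))
```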
In order to formally define the graph stored in the XStream model we consider two edge states: A settled edge is incorporated into the data structures of exactly one processor. A transit edge is in the system, but is not settled. For example, it may still be moving from processor to processor in a bundle.
3.1 Distributed Data Structures
The XStream data structure is a natural distributed version of the WStream data structure, except that we must store the entire graph to allow bulk deletions. WStream pass i identifies a set of connected subgraphs in the input stream. XStream processor P_i stores a spanning forest of these subgraphs. By construction, the connected components of this forest are the same as those of the corresponding WStream connectivity structure in Demetrescu et al. (2009).
To describe how XStream’s distributed data structures implement the WStream nesting strategy, we define the concepts of local components and building blocks.
Definition 4
The connected components identified by the union-find structure on a processor P_i are called the local components (LCs) of P_i.
A processor downstream of P_i will see a local component of P_i contracted into a single node, which it might incorporate into one of its own local components. Figure 4 shows an example of three processors and their union-find structures. As depicted in the first box of the figure, local components formed on an upstream processor are incorporated into an LC on a downstream processor.
Definition 5
Building blocks (BBs) for processor P_i are the elements over which P_i does union-find. A primitive building block contains exactly one vertex of the input graph; the set of all primitive building blocks thus corresponds to the vertex set of the input graph. A non-primitive building block corresponds to a local component from a processor upstream of P_i.
We say a processor consumes its building blocks because notification of their existence arrives from upstream and does not propagate downstream. A local component corresponding to a set in a union-find structure encapsulates all the building blocks in that set. We now formalize the relationship between XStream distributed data structures and XStream processors. Figure 4 illustrates these concepts.
Definition 6
 •
 creator:

The unique processor that creates local component ℓ, denoted creator(ℓ).
 •
 LC:

The local component that contains building block b, denoted LC(b).
 •
 consumer:

The unique processor that consumes building block b, denoted consumer(b).
Definition 7 (XStream nesting identity)
Let b be a building block. Then
(1)   creator(LC(b)) = consumer(b)
The XStream Nesting Identity, which is true by construction, says that the processor storing the local component that encapsulates building block b is the processor that consumed b.
We base our algorithm description and correctness arguments on (1). In the XStream nesting structure, a building block b is consumed by processor consumer(b) and incorporated into local component LC(b). The latter is a creation event, and the creating processor is creator(LC(b)).
3.2 Relabeling
Suppose that V is the domain of possible vertex names from the edge stream, and S is the set of possible names of building blocks and local components. The XStream nesting identity allows us to define a simple recursive relabeling function r_i for processor P_i, which returns the name of the local component that most recently encapsulated a primitive building block, if such a local component exists on or upstream of P_i. This function underlies correctness arguments for XStream data structures and connectivity queries.
Definition 8
Let the relabeling function r_i be defined as follows. For a vertex v, let r_0(v) = b_v, where b_v is the primitive building block corresponding to vertex v. For i ≥ 1:
(2)   r_i(b) = r_i(LC(b)) if consumer(b) is P_i or a processor upstream of P_i; otherwise r_i(b) = b.
In the example shown in Figure 4, a primitive building block relabels first to the local component that encapsulates it on the processor that consumed it, and then to any downstream local component that encapsulates that LC in turn. A vertex not yet consumed by any processor simply relabels to itself. Primitive building blocks are named using their vertex names.
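One way to realize the recursive relabeling is to walk per-processor encapsulation tables in ring order; this sketch assumes each processor exposes a map from consumed building block to encapsulating LC name (our simplification):

```python
def relabel(block, encaps):
    """encaps is a list of per-processor tables, in ring order; table i
    maps a building block consumed by processor i+1 to the name of the
    local component (supernode) that encapsulates it there.  A block
    with no entry anywhere relabels to itself."""
    name = block
    for table in encaps:
        while name in table:   # follow any nesting recorded on this processor
            name = table[name]
    return name

# v is buried in LC1 on the first processor; LC1 is buried in LC2 downstream.
encaps = [{"v": "LC1"}, {"LC1": "LC2"}]
```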
An edge is received by the system as two vertices with their primitive building blocks and a timestamp t: (b_u, b_v, t). Since edges are undirected, an edge (b_v, b_u, t′) will be referred to as having the same endpoints as (b_u, b_v, t). Processor P_i, for i ≥ 1, then receives edges in the form (r_{i-1}(b_u), r_{i-1}(b_v), t). For all edges, the most recent timestamp is stored.
3.3 Operation Modes
The XSCC algorithm operates in two major modes. The aging mode is active when a bulk deletion is occurring, as we will describe in Section 5. Otherwise the system is in normal mode.
XSCC uses the concept of relabeling, the XStream ring of processors, and periodic bulk-deletion events to handle infinite streams of input edges. Each processor plays the role of an intermediate stream in WStream, borrowing the concept of loop unrolling from compiler optimization. As with loop unrolling, we can store information concerning only a finite number of loop iterations (WStream passes) at one time. However, the periodic bulk deletions allow XSCC to run indefinitely.
4 Normal mode
During XSCC normal mode, at any XStream tick exactly one processor is in a state of building. This building processor accepts new edges, maintains connected components with a union-find data structure, and stores spanning-tree edges. XSCC normal mode maintains two key invariants, stated below and illustrated in Figure 5.
Invariant 2
Let P_j be the building processor. Then, at all times, P_i is completely full of spanning-tree edges for i < j, and P_i has no spanning-tree edges for i > j.
When the building processor fills with tree edges, a building “token” is passed downstream to its successor, which assumes building responsibilities. Thus, XSCC maintains a spanning forest of the input graph, packed into a prefix of the ring's processors. The XSCC normal mode protocols maintain Invariant 2 and one other:
Invariant 3
Let P_k be the first processor with any empty space. Then, at all times, P_i is completely full of edges for i < k, and P_i has no tree or non-tree edges for i > k.
Invariants 2 and 3 are illustrated in Figure 5, with sets of spanning-tree edges represented in red and sets of non-tree edges represented in blue. In normal mode operation, single edges arrive at each XStream tick and propagate downstream to the builder, being relabeled along the way. They settle into the builder if they are found to be tree edges, and into the first processor with empty space otherwise. The figure shows the system at a particular XStream tick. In this notional example, the edge that arrived in the previous tick has passed through the head processor but has not yet been resolved as “tree” or “non-tree.” An earlier edge has passed through two processors, and relabeling of its endpoints has identified it as a non-tree edge. The basic protocol is thus quite simple; the subtleties of XSCC normal operation arise in maintaining the invariants. For example, the builder may need to jettison non-tree edges downstream to make room for new tree edges. We provide full detail in Section 7.
4.1 XSCC normal mode correctness
We now show that Invariant 2 and XSCC relabeling imply an exact correspondence between the connectivity structures computed by XSCC and DFR.
Algorithm 1 is a diagnostic routine intended to test implementations of XSCC. At XStream time t, a call to this routine streams out the connected components of the active graph as a stream of (vertex, label) pairs. Although we could correctly stream these vertex pairs out even as new edges change the connected components (see Section 10 for additional algorithm steps), this version is more illustrative. For simplicity, we assume that the input stream pauses at time t.
When the head processor receives the “dump components” command in a primary slot, it copies the command to the primary slot of the bundle it will emit. Then, in Lines 8–9 of Algorithm 1, the processor fills the remaining payload slots with relationships from its union-find structure, the way DFR outputs union-find information into the B streams. Specifically, if b is the name of a building block encapsulated by a local component on the head processor, then the processor outputs a pair (b, s), where s is the encapsulating supernode, the result of the processor relabeling block b. The processor then fills the payload slots of subsequent bundles until it has output all of its union-find relationships.
For a downstream processor P_i (i > 1), the bundle that has the “dump components” command has such pairs in the payload slots. In Lines 5–6 of Algorithm 1, processor P_i relabels any block names that have been encapsulated by a new supernode in P_i; otherwise, the pair passes through unchanged. After all relationships from upstream have arrived, there is empty payload space for P_i to output its union-find information as described above.
We now argue the correctness of XSCC based on the correctness of WStream. Let the stream emitted by processor P_i during this diagnostic play the role of DFR’s stream B_{i+1}. For the formal arguments, we require the following definitions:
Definition 9
During a union operation joining sets with representatives a and b, the supernode naming function f is such that f(a, b) decides whether a or b becomes the new set representative.
For example, we might choose the supernode naming function f(a, b) = min(a, b). This is the style of function used in Figure 1.
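As a sketch, a naming function in this style is just a deterministic choice between the two old representatives; we use min here as an assumed example:

```python
def naming(a, b):
    """One possible supernode naming function: the smaller of the two old
    representatives becomes the new one.  Because the choice depends only
    on the pair, the final representative of a set is independent of the
    order in which unions occur."""
    return min(a, b)

# The same representative emerges regardless of union order.
rep = naming(naming("c", "a"), "b")
```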
Definition 10
DFR(σ, M, f) is an implementation of DFR with per-pass union-find capacity M and supernode naming function f, run on input stream σ. XSCC(σ, M, f) is defined similarly for XSCC, with each processor’s union-find capacity set to M.
A resolved edge is one that has been classified as “tree” or “non-tree.” Stream edges arrive in an unresolved state. In DFR, the stream written from pass i contains only those edges that resolve to “tree” edges (they connect supernodes in the current version of the contracted graph). DFR deletes as “non-tree” any edge that it determines to be contained inside a supernode. In contrast, XSCC must retain all non-duplicate edges, even after resolution. In particular, non-tree edges must be retained in case they are needed to reconnect pieces of the graph after bulk deletion. XSCC removes duplicate edges from the stream after updating their timestamps.
By Invariant 2, all unique non-tree edges (those contracted inside a supernode) are stored at the end of the XSCC data structure: in spare space in the builder, or in processors downstream of the builder. These downstream processors have no union-find structure. The following lemma ignores known non-tree edges.
Lemma 1
The stream of unresolved edges sent from processor P_i to P_{i+1} in XSCC(σ, M, f) is exactly stream A_{i+1} from DFR(σ, M, f).
Proof
We prove this lemma by induction. For the base case, the first pass of DFR and the first processor of XSCC receive the same finite stream of unresolved edges from the outside (logically, processor P_0), namely the input stream of edges σ. Suppose that the stream of unresolved edges sent from processor P_{i-1} to processor P_i is the same as stream A_i from DFR. We show that the stream of unresolved edges processor P_i sends to P_{i+1} is exactly DFR stream A_{i+1}.
Processor P_i of XSCC and pass i of DFR begin by computing connected components via union-find. Every edge that changes connectivity (starts a new component or joins two components) uses one of the M possible union operations for this processor/pass. When they have both done M union operations (their capacity), they have computed identical union-find data structures, since they have done the same computations on the same input stream. At this point, DFR has not yet emitted any edges and XSCC has emitted only resolved non-tree edges. Now DFR processes the remaining edges of A_i, relabeling the endpoints, deleting edges whose endpoints are contained in the same supernode, and emitting the others to stream A_{i+1}. XSCC relabels these remaining edges the same way, and emits the same stream of unresolved edges (among resolved non-tree edges).∎
Since XSCC runs on unending streams, there is no “end of stream” marker to trigger the creation and processing of a DFR B stream. However, the “dump components” diagnostic creates these streams.
Lemma 2
For XSCC(σ, M, f) followed by a call to DumpComponentLabels, the stream B_i it produces is identical to stream B_i from DFR(σ, M, f).
Proof
The “dump components” command after ingestion of a finite stream serves as an end-of-stream marker for XSCC(σ, M, f). We prove the lemma by induction. For the base case, the streams of component information input to pass 1 of DFR and processor P_1 of XSCC are both empty. Suppose that stream B_i from DFR(σ, M, f) is the same as stream B_i from XSCC(σ, M, f) followed by a call to DumpComponentLabels. We show that stream B_{i+1} is the same in both runs. From the proof of Lemma 1, the runs of XSCC and DFR compute the same connected components in processor P_i and pass i, respectively. Because XSCC and DFR are using the same supernode naming function and have the same capacity, the union-find data structures (names of representatives and names of set elements) are identical. As described above, processor P_i relabels and emits all elements of B_i the same way that DFR pass i relabels elements of stream B_i to stream B_{i+1}. Then processor P_i and DFR pass i output the information in their identical union-find structures in identical ways, completing streams B_{i+1} in both runs. ∎
4.2 XSCC queries
The most basic XSCC query is a connectivity query: are nodes u and v in the same connected component? A query that arrives at XStream tick t will be answered with respect to the graph G_t. The query enters the system from the I/O processor and propagates through the processors just as new stream edges do. Each processor relabels the endpoints, and the tail processor returns “:yes” if the labels are the same and “:no” otherwise. This holds even if one or both of the endpoints have never been seen before. The following theorem shows that connectivity queries are correct at single-edge granularity, and therefore that XSCC in normal mode correctly computes the connected components of an edge stream.
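The endpoint-relabeling view of a connectivity query can be sketched with per-processor encapsulation tables (a simplification that ignores bundles and timing):

```python
def connectivity_query(u, v, processors):
    """Relabel both endpoints through every ring processor in order;
    the tail answers :yes iff the final labels coincide.  Each table
    maps a consumed building block to its encapsulating LC name."""
    lu, lv = u, v
    for table in processors:
        while lu in table:
            lu = table[lu]
        while lv in table:
            lv = table[lv]
    return ":yes" if lu == lv else ":no"

# u and v were merged into LC1 upstream; LC1 is buried in LC2 downstream.
processors = [{"u": "LC1", "v": "LC1"}, {"LC1": "LC2"}]
```

Note that an endpoint never seen by any processor keeps its own name as its label, so a query on two unseen distinct vertices correctly answers ":no".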
Theorem 4.1
Suppose that the connectivity query (u, v) arrives at the head processor of an XStream system with p processors at XStream tick t. Then the I/O processor will receive the boolean query answer at time t + p. The answer will be True iff u was connected to v in G_t, the logical graph that existed at time t.
Proof
Recall that P_j is the building processor. The query answer will be determined by XStream tick t + j at the latest since, by Invariant 2, P_j is the last processor to store any tree edges and hence any union-find information. Thus it is the last processor that can change a label. The query travels processor-to-processor in a primary slot, just as the dump-components command does. If there are any transit edges in the ring when the query arrives, they travel in slots of bundles strictly ahead of the query. Thus transit edges will settle into a processor before the query arrives. Similarly, any edges that arrive after the query travel in bundles strictly behind the query and cannot affect the query relabeling. Thus when the bundle with the query arrives at processor P_i, the union-find data structure, and the processor’s status as the builder or not, are set exactly according to the graph in the system when the query arrived.
Processing query (u, v) is closely related to processing DumpComponentLabels. Instead of dumping information for every vertex, starting at the point where a vertex is first encapsulated in a supernode, simple queries consider only two vertices. The label for u will change from u to a supernode label only at the processor that first incorporates u into a local component. In DumpComponentLabels, that processor is the first that outputs any pair with first component u into the stream. Thus, after the query has passed the building processor, the labels for vertices u and v are identical to their output values, which exit the system at time t + p. By Lemma 2, these are the same labels they would have if DFR were run on graph G_t. Because DFR is a correct connected-components algorithm, vertices u and v will have the same label if and only if they are in the same connected component. ∎
We call queries that XSCC answers with latency proportional to p constant queries. See Section 10 for examples of non-constant queries.
The next theorem shows that XSCC is spaceefficient, storing the current graph in asymptotically optimal space.
Theorem 4.2
In normal operation of XSCC, each edge is stored in exactly one processor, requiring space linear in the number of active edges.
Proof
In normal operation, when a new edge e arrives at a processor that already stores a copy of e, the processor removes e from the stream and updates the timestamp of its stored copy. Invariant 2 ensures that an incoming tree edge encounters any previously stored copies of itself before it reaches the building processor, which recognizes it as a tree edge. Invariant 3 ensures that an incoming non-tree edge encounters any previously stored copies of itself before it reaches the first processor with any empty space. Furthermore, Invariant 3 also ensures that there are no edges stored downstream of that processor. ∎
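The duplicate-suppression step used in this proof can be sketched as follows; the record layout is our illustrative assumption:

```python
def settle(edge, store):
    """If an arriving edge matches a stored copy, drop it from the stream
    and refresh the stored timestamp; otherwise keep forwarding it."""
    key = frozenset(edge["uv"])          # undirected: (a,b) == (b,a)
    if key in store:
        store[key] = max(store[key], edge["ts"])   # keep newest timestamp
        return None                      # removed: stored exactly once
    return edge                          # pass downstream unchanged

store = {frozenset(("a", "b")): 10}
dup = settle({"uv": ("b", "a"), "ts": 42}, store)   # duplicate, reversed
new = settle({"uv": ("b", "c"), "ts": 43}, store)   # unseen edge
```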
Theorem 4.1 shows that basic connectivity queries are answered correctly by XSCC. In Section 10 we informally discuss three additional types of feasible queries: complex queries such as finding all vertices not in the giant component of a social network, vertex-based queries like finding the degree or neighborhood of a vertex, and diagnostic queries regarding the system capacity used. Also, by Invariant 2, XStream always knows a spanning tree of the streaming graph by construction. This tree could be checkpointed, for example, if processors share a filesystem.
5 Aging mode
XSCC handles infinite streams via a bulk deletion operation we call an aging event. Our model is thus unlike most previous work, in that we do not expect or support individual edge deletions embedded within the stream. Rather, we expect the system administrator to schedule bulk deletions to ensure that the oldest and/or least useful data are deleted in a timely manner.
To begin aging, the system administrator introduces an aging predicate (for example, a timestamp threshold) into the input stream. The predicate propagates through the system, and each processor suspends query processing upon receipt. However, a new stream edge might arrive in the XStream tick immediately after the aging predicate arrives from the I/O processor. This and all other new edges must be ingested and processed without exception. Thus, the connectivity data structures must be rebuilt concurrently with normal stream edge processing. When this rebuild is complete, queries are accepted once again.
We now describe how XSCC processes the aging predicate and prove correctness. In Section 6 we provide theoretical guarantees relating the fraction of system capacity used after the deletion predicate has been applied, the bandwidth expansion factor, the proportion of query downtime that is tolerable, and the expected stream edge duplication rate.
5.1 Aging process
Figure 6 illustrates the aging process. An aging token arrives with an edgedeletion predicate. As the token propagates downstream, all edges are reclassified to be untested. If an edge later passes the aging predicate it becomes unresolved since the old connectivity structure is no longer valid. Immediately after the aging token is received by the head processor, new stream edges may continue to arrive. These are processed as normal, starting from empty data structures, so we maintain Invariants 2 and 3 even during aging.
Conceptually, upon receipt of aging notification the deletion of all edges that fail the aging predicate and reclassification of all surviving edges to unresolved is instantaneous. However, in practice each processor takes XStream ticks to execute a “testing phase” that applies the aging predicate to each stored edge. Without careful attention to detail, implementers could allow a case in which there is no space yet for a new stream edge. In Section 7 we give exact specifications for a correct procedure that ensures no stream edge is dropped, even in the XStream tick immediately after aging notification. If the testing phase has not yet identified empty space for a new stream edge, then one of the unresolved edges can be sent downstream in a primary slot. This is an example of the jeopardy condition described later in this section, corresponding to Line 21 in Algorithm 1.
In addition to normal processing of new stream edges, XSCC recycles all unresolved edges that survive the aging predicate. As depicted in Figure 6, we introduce a new designation for a loading processor or “loader.” Upon each activation to process a stream edge, the loader packs unresolved edges into any available payload slots in the output bundle. Such bundles propagate around the ring. After a bundle reaches the head processor , its payload edges are processed as if they were new edges. When the loader has emitted all of its unresolved edges, it passes the loader token downstream to its successor. Aging is complete when the last processor with any unresolved edges has completed its loader duties.
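The loader's recycling behavior can be sketched as follows (hypothetical names; a `None` entry stands for an empty slot): on each activation, the loader fills the empty payload slots of its outgoing bundle from its list of unresolved edges, and the head later reprocesses those edges as if they were newly arrived.

```python
# Illustrative sketch of loader recycling, not the paper's exact protocol.
# A bundle is 1 primary slot plus (k - 1) payload slots; the loader packs
# unresolved edges into empty payload slots until its list is drained.

def pack_bundle(primary_edge, payload_slots, unresolved):
    """Fill a bundle of 1 primary + payload_slots slots from the unresolved list."""
    bundle = [primary_edge]
    for _ in range(payload_slots):
        bundle.append(unresolved.pop() if unresolved else None)
    return bundle

def drain_loader(stream_edges, unresolved, k):
    """Yield bundles until the loader has emitted all of its unresolved edges."""
    stream_iter = iter(stream_edges)
    while unresolved:
        primary = next(stream_iter, None)  # a new stream edge, if any arrived
        yield pack_bundle(primary, k - 1, unresolved)
```

When `unresolved` is exhausted, the real loader would pass the loader token downstream; that hand-off is omitted here.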
The complete XSCC protocols defined in Section 7 enforce the previous invariants at all times, as well as the following invariant during aging.
Invariant 4
During aging, let be the loading processor and be the building processor. Then . Also, processor has no unresolved edges for and has no resolved edges for .
The combination of all invariants ensures that all processors from the head to the builder are running XSCC in normal mode on all incoming (and recycled) edges. All resolved edges are packed to the front (upstream). When all edges have been recycled and aging ends, the layout of edges returns to normal mode.
Figure 7 puts the nomenclature of our arguments into context. An edge becomes resolved when an XSCC processor determines that it is a tree or nontree edge, regardless of whether it is a new stream edge in a primary slot or an unresolved edge being recycled as payload. Processors ingest and emit bundles of edges. With one exception we will discuss presently, the complexity of processing input bundles and packing edges into output bundles prior to emission is relegated to Section 7.
Aging is generally a straightforward process in which the loader token steadily advances from to , unresolved edges are recycled and resolved, and the XSCC connectivity structure is rebuilt. When builder and loader designations coincide in the same processor, that processor packs unresolved edges for emission first, then nontree edges. Edge bundles containing transit edges have one primary slot and payload slots, where is the bandwidth expansion factor. New stream edges reside in primary slots, and unresolved edges circulate in payload slots until they are resolved. Payload edges continue in their assigned slots until allowed to settle, per the invariants.
There is a single exception to this last point, illustrated in Figure 8. We call this the jeopardy condition and use it to specify exactly when the system fills to capacity during aging (indicating that the aging command was too late or did not remove enough edges). In the jeopardy condition, processor is also the loader, is already storing edges to its capacity , and must ingest an edge bundle with no empty slots. It ingests slots, finds no duplicates, and by conservation of space, must emit slots. Therefore an unresolved edge must reside in the primary slot. If cannot be offloaded before exiting the tail, the system is completely full and raises a FAIL condition.
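The jeopardy test itself reduces to a few predicates, sketched below with hypothetical names. The conjunction of conditions (loader at capacity, a full incoming bundle, no duplicates to absorb) is exactly what forces an unresolved edge into the primary slot; if that edge cannot settle before exiting the tail, the system raises FAIL.

```python
# Hypothetical sketch of jeopardy detection (Figure 8), not production logic.

def check_jeopardy(is_loader, stored_count, capacity, bundle,
                   has_duplicate, tail_has_space):
    """Return True when the loading processor is at capacity and ingests a
    bundle with no empty slots and no duplicates, so that by conservation
    of space an unresolved edge must travel in the primary slot. Raise a
    FAIL condition if that edge cannot settle before exiting the tail."""
    no_empty_slots = all(slot is not None for slot in bundle)
    jeopardy = (is_loader and stored_count >= capacity
                and no_empty_slots and not has_duplicate)
    if jeopardy and not tail_has_space:
        raise RuntimeError("FAIL: system filled to capacity during aging")
    return jeopardy
```

Note that a single duplicate in the incoming bundle defuses the condition, since absorbing it frees a slot.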
The above discussion, together with the more detailed discussion in Section 7, establishes the following property, which is necessary for proving that aging is correct:
Property 1
During aging, every surviving edge is incorporated into the new connected components data structure either directly or by traveling back to as a payload edge.
5.2 Aging correctness
We now argue correctness of the aging process. We say that any implementation of XSCC aging that maintains Invariants 2, 3, and 4 and Property 1 is compliant. A compliant aging process ensures that during aging there is a monotonic ordering of edges in the system, with tree (red) edges never allowed downstream of non-tree (blue) edges, and unresolved (gray) edges never allowed upstream of non-tree edges. In the argument below, we slightly abuse notation by using the graph in place of its edge set .
Theorem 5.1
Suppose a compliant XSCC implementation receives an aging command at tick and reauthorizes queries at tick . Let be the set of edges in that fail the aging predicate and let be the set of edges that arrive between time and . Then at tick , the XStream system stores graph , can properly answer queries, and stores each edge in exactly once.
Proof
As the aging command that arrived at time propagates through the processors, they reclassify all current edges to “untested” as described in Section 5.1, forgetting the current union-find structure. Thus the system starts processing a new graph from an empty state at time . As described in Section 5.1, processors delete all edges in , those that fail the predicate. Each remaining edge in is eventually loaded into a payload slot by Property 1 and processed at the head as an arriving edge. Invariants 2 and 3 hold with the newly-created data structures throughout aging. Invariant 4 ensures that all unresolved edges are in the builder processor or downstream. Those in the builder do not affect the connectivity computation and are eventually moved downstream. Thus, all edges arriving from outside the system are processed as in normal mode, and all edges arriving in the payload slots are processed as in normal mode (other than traveling in a payload slot). Thus at time , when the tail processor passes the loading token out of the system and enables queries, the XStream system stores exactly the edges in , with duplicates appropriately removed. This is the graph the system is required to hold by the definition of aging and the requirement that it drop no incoming edges during aging. The edges are processed into the data structures with arbitrary mixing of new edges and recycled (surviving) edges. By Observation 1, and the equivalence of DFR and XSCC in normal mode, the ordering of the input edges does not matter for future query correctness. By Theorem 4.1, the XStream system will now correctly answer queries on the graph starting at time .
During aging, some edges may be stored up to twice. If a duplicate of a surviving edge enters the system before edge circulates back to the head processor, then edge is stored both in the new data structure as a tree or non-tree edge and as an unresolved edge. However, when edge is eventually recycled, it will be recognized as a duplicate and not stored again. By Theorem 4.2, any edge that enters from outside the system during aging will be stored at most once in the new data structure. ∎
6 Conditions for successful aging
In this section, we define the conditions under which a compliant aging process completes before the system fails for lack of space. We consider properties of the system, properties of the input stream, and user preferences.
Definition 11
We define the following as tradeoff parameters associated with infinite runs of XSCC.
c: fraction of the total system storage occupied by edges that survive the aging predicate
d: percentage of XStream ticks that the system is unavailable for queries due to aging
u: estimate of the percentage of incoming stream edges that will be unique
k: the bandwidth expansion factor: the size of an XStream bundle (a set of edge-sized slots that circulates in the ring)
p: number of XStream processors
S: aggregate storage available in the system
s: storage per processor in a homogeneous system (s = S/p)
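For reference, the parameters of Definition 11 can be collected in a small container (a hypothetical illustration, not part of the paper's implementation), with the homogeneous-system relation s = S/p expressed as a derived property:

```python
# Hypothetical container for the tradeoff parameters of Definition 11.
# c, d, and u are expressed here as fractions in [0, 1].

from dataclasses import dataclass

@dataclass
class XSCCParams:
    c: float   # fraction of storage occupied by edges surviving the aging predicate
    d: float   # fraction of XStream ticks unavailable for queries due to aging
    u: float   # estimated fraction of incoming stream edges that are unique
    k: int     # bandwidth expansion factor (bundle size in edge-sized slots)
    p: int     # number of XStream processors
    S: int     # aggregate storage available, in edges

    @property
    def s(self) -> int:
        """Per-processor storage in a homogeneous system (s = S / p)."""
        return self.S // self.p
```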
Aging must be initiated before the system becomes too full, or else jeopardy edges will lead to a FAIL condition. We quantify this decision point as follows.
Lemma 3
In the worst case, there must be at least
open space in the system when an aging command is issued to be guaranteed sufficient space for aging, where , , and are given in Definition 11.
Proof
When the aging command arrives, there could be edges in transit that all must be stored. Because iteration over the untested list does not imply any specific ordering, in the worst case, when processors test the edges against the predicate, all surviving edges are tested before an edge fails the predicate. This gives the latest time when space becomes free for new edges. When a processor receives the aging command, it processes untested edges each tick until it has tested all its edges. In the ticks required for the aging command to reach the tail, the head tests edges, the second processor tests edges, and so on, while the tail tests edges. Thus in the first ticks after aging starts, the system tests edges. After that, the system tests edges per tick. If the system tested edges every tick, it would require ticks. But the first ticks are only half as efficient, so we require an extra ticks. Thus the total number of ticks before the system is guaranteed to remove an edge that fails the predicate is at most . ∎
If the system is homogeneous, the empty space expression in Lemma 3 becomes . For example, for a homogeneous system, assuming that , if and , then one should start aging while of the last processor is still empty. The last processor can issue a warning when it starts to fill and again closer to the deadline given and .
Theorem 6.1
Proof
After the aging token arrives, the head processor must apply the aging predicate to its edges. It processes per tick, as described in Section 5.1. Thus, after ticks, the head processor passes the loader token to the second processor. By that time, all other processors have applied the predicate to all of their edges and have a list of surviving edges. Once unresolved edges begin circulating from the loader (ignoring additive latencies such as the time until the first payload reaches the head processor since ), edges reenter the system to be resolved at each tick. Since unresolved edges survived the aging predicate, in the worst case (when they are all in the second processor or later) it will take ticks to complete aging. During this time, every ticks yields a new, nonduplicate stream edge. Thus, the system will fill to capacity in ticks. The proportion constrains these two tick counts as follows:
Simplifying this inequality and solving for (with Wolfram Alpha Wolfram—Alpha (2021), for example) yields the result. ∎
The parameters and are user preferences, but is dictated by computer architecture. Reasonable values of for current architectures are , but emerging data flow architectures may provide upward flexibility. The parameter must be estimated by the user based on knowledge of the input streams that they will feed to XSCC.
We can now state the central result of this paper.
Theorem 6.2
Proof
Assuming that the proportion of XStream ticks that yield a new, non-duplicate stream edge is , an empty system will fill and fail in ticks. Compliant aging in accordance with Lemma 3 ensures that aging will always complete before the system fills. During normal mode operation, Lemma 2 and Theorem 4.1 ensure, respectively, that accurate connected component information is stored and that connectivity queries are answered correctly. As long as the system administrator adheres to such a schedule, XSCC operation can continue through an arbitrary number of aging events. ∎
We note that queries yielding system capacity usage are constant queries. In the case of a simple aging predicate such as a timestamp threshold, given a target proportion of edges that survive an aging event, the XStream system administrator could use an automated process to trigger the aging process.
7 XStream edge processing specification
Algorithms 1 and 2 show the XSCC driver and constituent functions, respectively, for processing edges. We do not show full detail for token passes, commands, and queries. These functions maintain the invariants and produce a compliant XSCC implementation. We used this pseudocode as guidance for the code that produces our experimental results.
Each XStream processor executes ProcessBundle whenever it receives the next bundle of edge slots, regardless of its current execution mode (normal or aging). It will process each slot in turn, and the constituent functions ProcessEdge, ProcessPotentialTreeEdge, and StoreOrForward determine what to pack into an output bundle destined to flow downstream.
Note that the top-level logic of processing the primary and payload edges of a bundle is the same in Algorithm 1, regardless of execution mode. When a new edge arrives from the stream, processors upstream of (and including) the building processor will classify it as tree or non-tree using the relabeling logic of Section 3.2 (Lines 15–18 of ProcessEdge and Lines 2–4 of ProcessPotentialTreeEdge). The builder stores any new tree edge. We ensure that this is possible via logic to jettison an unresolved edge if one exists (only during aging; Lines 9 and 16 of StoreOrForward), or else to jettison a non-tree edge (Line 15 of StoreOrForward). This progression of jettison logic maintains Invariants 2 and 3.
Suppose that the head processor receives notification of an aging event at XStream tick . XStream ticks and are especially interesting. If a new edge arrives in the input stream at , it must be stored in (which is now acting as both the builder and the loader ) in order to maintain Invariant 4. However, has had only one tick to initiate the process of testing its edges against the aging predicate. That means that it tested edges in tick . Suppose all of these edges survived the predicate and therefore couldn’t be deleted. This is a jeopardy condition, and it was handled during tick by Lines 20–21 of ProcessBundle. Favoring the new edge, jettisoned in the primary slot of its output bundle the last of the unresolved edges it created in that tick. Therefore, at tick we are assured that can store a new stream edge.
During aging, the loader packs unresolved edges into the empty payload slots in incoming bundles to be sent around the ring. When these edges arrive at , they are processed as if they were new stream edges, classified as tree or non-tree, and incorporated into the data structures in by the same invariant-maintaining constituent functions that handle new edges. One optimization we include is that need not actually pack and send its unresolved edges around the ring. Rather, in Lines 13–23 of PackBundle, simply tests against the aging predicate and immediately processes its tested edges rather than calling them unresolved.
As aging proceeds, the Loader token is passed downstream whenever a processor exhausts its list of unresolved edges (Lines 28–31 of ProcessBundle). Once the Loader token exits the tail processor, Property 1 is established.
8 Related work
This paper is motivated by the observation that no other published work meets the needs of the cybersecurity use case we describe in Section 1. Most work on theoretical streaming problems, best surveyed in Muthukrishnan and others (2005), is limited to the finite case. Some research in the past decade has addressed infinite graph streaming in a sliding-window model Crouch et al. (2013); McGregor (2014), but the work is quite abstract, and expiration must be addressed with every new stream observation. We were unable to apply this theory directly in a distributed system with bulk deletions, but our XSCC algorithm could be thought of as a generalized, distributed implementation of these sliding-window ideas.
As far as we know, our XStream model and XSCC graph-algorithmic use case comprise the first approach to infinite graph streaming that is both theoretically justified and practical. We provided an initial view of the XStream model in a 2013 KDD workshop paper Berry et al. (2013), and provide full detail of a greatly streamlined version in this paper.
As the previous sections have made clear, our focus is infinite streaming of graph edges with theoretical guarantees and a well-defined expiration strategy with a path to implementation in simple distributed systems. Thus, we have approached the problem from a theoretical streaming perspective, focusing primarily on related “per-edge arrival” streaming details. We have shown how to maintain connectivity and spanning tree information. We hope that others will expand the set of graph queries available in XStream, and/or propose new infinite streaming models.
The closest related work comes from the discipline of dynamic graph algorithms, which takes a different approach. Work in this area typically assumes that the graph in its entirety is stored in a shared address space or supercomputing resource. Updates to the graph come in batches and take the form of edge insertions and sometimes deletions too. After a batch of updates is received, incremental graph algorithms update attributes such as connected components or centrality values. During this algorithmic update the stream is typically stopped. There is no attempt to describe what running infinitely in a finite space would mean other than to rely on an implicit assumption that the batches will have as many deletions as insertions over time. We know that in the cybersecurity context, for example, this assumption will never be true. An impressive survey of dynamic graph processing systems is found in Besta et al. (2019).
We break down the area of dynamic graphs into data structure work that builds a graph without computing any algorithm (for example Ediger et al. (2012); Riedy et al. (2011); Iwabuchi et al. (2016)), work that “stops the world” at each algorithmic update (for example Riedy et al. (2011); Basak et al. (2020); Wheatman and Xu (2021)), and recent attempts to process graph updates and update algorithmic results concurrently Yin et al. (2018); Yin and Riedy (2019); Sallinen et al. (2019); Grossman et al. (2020).
The data structure group includes solutions such as Ediger et al. (2012), which achieves a rate of over 3 million edges per second on a Cray XMT2 supercomputer using a batch size of 1,000,000 edge updates, and Iwabuchi et al. (2016), which achieves a rate of more than two billion edges per second on a more modern supercomputer while maintaining some information about vertex degrees. While these rates are impressive, approaches such as these require a supercomputer and don’t specify how to continue running as their storage fills up.
For incremental computation of graph algorithms such as Breadth-First Search (BFS), connected components, PageRank, and others, SAGABench Basak et al. (2020) can achieve latencies of a fraction of a second on conventional hardware using an update batch size of 500,000 edges. This translates to a few million updates per second, while also maintaining incremental graph algorithm solutions. Wheatman and Xu also exploit large batches of edge updates and advanced data structures (packed-memory arrays) to approach this problem Wheatman and Xu (2021). They achieve amortized update rates of up to 80 million updates per second while maintaining per-batch solutions to graph problems such as connected components, where update batches can be of size 10,000,000 or greater. Even if our analysts could tolerate such batch sizes, however, what prevents us from simply adopting their approach is our requirement for a methodology for running infinitely.
We conclude our discussion of the dynamic graph literature with recent results that process graph updates and update algorithmic results without interrupting the input stream. The HOOVER system can run vertex-centric graph computations on supercomputers that update connected components information at an ingestion rate of more than 600,000,000 edges per second Grossman et al. (2020). However, the update algorithm works only for edge insertions, so our requirements are not met and the system would quickly fill up. Yin et al. (2018) propose a concurrent streaming model, and Yin and Riedy (2019) instantiate this model with an experimental study on Katz centrality. However, overlapping graph update and graph computation still does not meet our need for a strategy to compute on infinite streams.
9 Experiments
Benchmark     Bundle size   64-bit ints/s   XStream potential (k=2)   XStream potential (k=5)
Benchmark 1   5             1742160.27      174216.02                 69686.41
Benchmark 1   25            8680555.55      868055.55                 347222.22
Benchmark 1   250           54112554.11     5411255.41                2164502.16
Benchmark 2   5             1344086.02      134408.60                 53763.44
Benchmark 2   25            6281407.03      628140.70                 251256.28
Benchmark 2   250           35063113.60     3506311.36                1402524.54
The XStream model and the XSCC algorithm are based on message passing. At each XStream tick, each processor performs only a constant number of operations. These are predominantly hashing operations, unionfind operations, and simple array access. Therefore, performance of XSCC is strongly tied to computer architecture. The faster a system can perform hashing and message passing, the faster XSCC will run.
With a current Intel computer architecture (Sky Lake), we will show that our initial XSCC implementation can almost match the peak performance of a simple Intel Thread Building Blocks (TBB) benchmark that transfers data between cores of the processor. This translates to streaming rates of between half a million and one million edges per second, which is comparable to the low end of the performance spectrum for modern dynamic graph solutions (none of which handle infinite streams). The high end of that spectrum is not comparable to our context, since we require no supercomputer and ingest data from only one processor. We have ideas to exploit properties of many graphs (such as the phenomenon of a giant connected component) by running many instances of XSCC concurrently to boost our rates by orders of magnitude. However, that is beyond the scope of this paper.
9.1 Computing setup and benchmarking
All results in this paper were obtained using a computing cluster with
Intel Sky Lake Platinum 8160 processors, each with 2 sockets,
24 cores/socket, 2 HW threads/core (96 total), and 192GB DDR memory. The
memory bandwidth is 128GB/s, distributed over 6 DRAM channels. The
interconnect is Intel OmniPath Gen1 (100Gb/s).
The operating system is CentOS 7.9, and our codes are compiled with Intel icpc 20.2.254 using the flags -O3 -xCORE-AVX512.
Our full implementation of XSCC is single-threaded (the normal-mode computations and data structures are single-threaded; we use another thread for cleanup and reallocation at an aging transition) and written in PHISH Plimpton and Shead (2014), a streaming framework based on message passing. However, before presenting XSCC results, we explore the expected peak performance of the algorithm on a single node of the Sky Lake cluster using the vendor’s own software library (Thread Building Blocks 2019, Update 8).
Mini benchmark
We implemented a simple ring of XStreamstyle processing modules in TBB. The head module accepts bundles of synthetic data from an I/O module and sends them down the ring toward the tail, which feeds back to the head. The latter merges this bundle with its next input bundle. We further distinguish two benchmarks:

Benchmark 1: Each processor simply copies input bundles to output.

Benchmark 2: Each processor hashes two of every five integers of the input bundle and copies the input to the output. This approximately reflects the main computation kernels of XSCC: hashing the timestamp of each edge, and doing a union-find operation.
Table 1 shows the performance of our TBB benchmarks as the number of 64-bit integers in a bundle is varied. For these runs there are 10 processors in the ring. Recall that XSCC edges circulate as 5-tuples of 64-bit integers: where are vertex ids, are local component labels, and is a timestamp. Therefore the raw rates of the third column must be divided by 5 to count in units of XStream primary edges. Furthermore, to account for the payload slots in XSCC bundles active during aging or non-constant query processing, the primary edge rate must be divided by the bandwidth expansion factor . With optimal use of Intel’s TBB, we see that we should pass messages containing roughly 250 64-bit integers, and we expect XSCC edge streaming rates to be bounded by 1.4 million edges per second.
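The conversion described above is simple enough to state as a one-line formula; the worked example below reproduces Table 1 entries from Benchmark 2's raw rate at bundle size 250, assuming a bandwidth expansion factor of 5 for the quoted 1.4 million edges per second bound (the function name is ours, not the paper's).

```python
# Worked arithmetic for Table 1: raw 64-bit-int throughput divided by the
# 5 ints per edge tuple gives the primary-edge rate, which is then divided
# by the bandwidth expansion factor k to get the effective XSCC rate.

def xscc_edge_rate(raw_ints_per_sec, ints_per_edge=5, k=1):
    return raw_ints_per_sec / ints_per_edge / k

raw = 35_063_113.60                    # Benchmark 2, bundle size 250 (Table 1)
primary = xscc_edge_rate(raw)          # ~7.0 million primary edges/s
effective = xscc_edge_rate(raw, k=5)   # ~1.4 million edges/s, the bound quoted above
```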
Since Benchmark 2 is equivalent to Benchmark 1 except for a larger compute load, we see clearly that Benchmark 2 is not bandwidth bound on this architecture. We believe that these benchmarks are bound by a combination of compute and memory latency. We experience a slowdown from 2.1 million edges per second to 1.4 million simply by adding two hashing operations per bundle. As our experiments with XSCC will show, the latter is likely even more compute bound. This is welcome, since it admits the possibility that multithreading within TBB nodes and XStream processors could accelerate our single-threaded edge-processing results.
Furthermore, in a real deployment of XSCC, we would assign a single XStream PE to a single compute node and communicate over the interconnect. In fact, that is the basis of the XSCC results presented in Figures 9, 10, and 11. In this case, we can compute the approximate theoretical peak for an XStream-like computation as follows. The interconnect is 100Gb/s, or 12.5GB/s. That translates to roughly 1.5 billion 64-bit integers per second. Since XSCC uses messages with 5 64-bit ints to represent an edge, and a typical value of the XStream bandwidth expansion factor is 5, we are bounded by million XSCC primary edges per second. The rates of our prototype implementation do not approach this number, so we believe that, like the benchmarks, we are bound by a combination of compute and memory latency. A multithreaded production version of XSCC would likely be necessary to better exploit a computing environment such as our Sky Lake cluster. With that said: the TBB benchmark itself falls far short of the possible performance suggested by Sky Lake’s theoretical peak memory bandwidth of 128GB/s. Significant algorithm engineering may be necessary to obtain a performant, production version of XSCC.
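The back-of-envelope peak above can be reproduced numerically under the stated assumptions (8 bytes per 64-bit integer, 5 ints per edge tuple, k = 5); the variable names are ours:

```python
# Interconnect-bound peak for an XStream-like computation, following the
# derivation in the text: 100 Gb/s -> 64-bit ints/s -> edges/s -> primary
# rate after dividing by the bandwidth expansion factor k.

link_bits_per_sec = 100e9                   # Omni-Path Gen1
ints_per_sec = link_bits_per_sec / (8 * 8)  # 8 bits/byte, 8 bytes per 64-bit int
edges_per_sec = ints_per_sec / 5            # 5 ints per edge tuple
k = 5                                       # typical bandwidth expansion factor
primary_edges_per_sec = edges_per_sec / k   # peak primary-edge ingestion rate
```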
Datasets
We present prototype XSCC results on three datasets:

An anonymized stream of 10 million real gateway network traffic edges from Sandia (the same stream used in Berry et al. (2013)).

A stream of edges from an RMAT graph with 2097152 vertices, edge factor 8, and SSCA2 parameters (0.45, 0.15, 0.15, 0.25) Chakrabarti et al. (2004).

A synthetic dataset with 100 contiguous observations of each of a stream of edges with new, unique endpoints.
For experiments below validating Theorem 6.1, we note that Dataset 3 has a uniqueness parameter (u) of roughly 0.67.
9.2 XSCC implementation
We used PHISH with the MPI back end to implement the XSCC algorithm. Stream processing modules in PHISH are called “minnows,” and we instantiated a minnow to serve as the XStream I/O processor and a group of minnows to form the XStream ring of processors (one per compute node in the Sky Lake cluster). We also ran with a single compute node hosting all XSCC PEs. However, since our prototype is compute bound, the rates we achieved were comparable and are not presented.
We now present results from Sky Lake runs of our PHISH-based, single-threaded implementation of XSCC. Before collecting these results, we validated the correctness of XSCC by designating every tenth stream edge to be a connectivity query, statically computing the correct connected components, and confirming that XSCC’s query result matched that from the static computation (with and without aging events). We ran this validation on a prefix of approximately 800,000 edges from Dataset 3.
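The validation strategy just described can be sketched in a few lines of Python (a simplified, single-process stand-in, not our PHISH harness): a streaming union-find is maintained edge by edge, and every tenth edge triggers a check of its component structure against a static recomputation over all edges seen so far.

```python
# Hypothetical single-process sketch of the validation methodology.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        root = self.parent.setdefault(x, x)
        while root != self.parent[root]:
            root = self.parent[root]
        while self.parent[x] != root:          # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def components(uf, vertices):
    """Partition of vertices into connected components, as a set of frozensets."""
    comps = {}
    for x in vertices:
        comps.setdefault(uf.find(x), set()).add(x)
    return {frozenset(c) for c in comps.values()}

def validate(edges):
    streaming = UnionFind()
    seen = []
    for i, (u, v) in enumerate(edges):
        streaming.union(u, v)
        seen.append((u, v))
        if i % 10 == 9:                        # every tenth edge doubles as a query point
            static = UnionFind()
            for a, b in seen:
                static.union(a, b)
            verts = {x for e in seen for x in e}
            assert components(streaming, verts) == components(static, verts)
    return True
```

Because union-find connectivity is order-independent, the streaming and static partitions must agree at every checkpoint.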
9.3 Experiment 1: XSCC normalmode streaming rate
Figure 9 shows XSCC streaming rates for normal mode on the three datasets. We streamed the full Datasets 1 and 2 and a prefix of 30 million edges of Dataset 3. Our single-threaded prototype implementation is compute bound, as verified by comparing to the benchmark results of Table 1. Note that the performance of our prototype is heavily data-dependent. On the “easy” synthetic dataset (Dataset 5 in the figure), note that we match rates with Table 1. When real datasets cause more work, the ingestion rate drops, again showing that we are compute bound. Our prototype achieves rates between 500,000 and 1,000,000 edges per second, depending on the dataset. We note that Dataset 1, which is a real dataset, has many repeat edges and admits an ingestion rate of one million edges per second.
9.4 Experiment 2: XSCC with a single aging event
Recall that Theorem 6.1 relates the XStream bandwidth expansion parameter to parameters of the system and dataset. We validate that theorem empirically for a 30 million-edge prefix of Dataset 3 by analyzing a single aging event that is triggered at XStream tick . In Figure 10, the capacity of one XStream processor () is fixed for each value of such that the system will be completely full at the end of the 30 million-edge stream, after processing one aging event (in particular, if we age at XStream tick such that fraction of the edges stored so far survives, then our total storage needed is and ). The number of XStream processors () is fixed at . Varying over a range of values of (the target fraction of edges that survive the aging event) and (the fraction of XStream ticks in which queries are enabled), we show, using a 3D surface, the predicted by Theorem 6.1. We overlay empirical results in the form of observed data points from XSCC runs when the bandwidth expansion factor is set as predicted by the theorem. We claim that the prediction surface and observed data points corroborate the theorem in this experiment.
9.5 Experiment 3: XSCC runs of arbitrary length
The most important contribution of our work is our set of ideas for handling infinite graph streams, running XSCC for indefinite periods of time without filling up or failing. We corroborate these ideas empirically in this section for a simple use case in which the aging predicate is a timestamp comparison: delete all edges older than a threshold XStream tick. This strategy could be adapted to accommodate other aging predicates.
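To make the predicate interface concrete, the following is a minimal sketch of what a user-provided deletion predicate might look like. The names (`Edge`, `AgingPredicate`, `older_than`) are illustrative inventions, not part of the XStream prototype; the second variant reflects the motivation, mentioned in the introduction, of preserving some old edges of high value.

```python
from typing import Callable, NamedTuple

class Edge(NamedTuple):
    u: int
    v: int
    tick: int  # XStream tick at which the edge arrived

# An aging predicate returns True for edges that should be deleted.
AgingPredicate = Callable[[Edge], bool]

def older_than(threshold_tick: int) -> AgingPredicate:
    """Timestamp predicate: delete all edges older than the threshold tick."""
    return lambda e: e.tick < threshold_tick

def older_than_unless_kept(threshold_tick: int, keep: set) -> AgingPredicate:
    """Variant that preserves designated high-value edges regardless of age."""
    return lambda e: e.tick < threshold_tick and (e.u, e.v) not in keep
```

In a bulk deletion, each ring processor would simply drop every stored edge for which the predicate returns True.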
The primary challenge facing an XStream system administrator is deciding when to initiate an aging event and what threshold to use. In this section we present an automated solution. The system administrator initializes the system with a target value: the fraction of edges that should survive an aging event. We then run XSCC with the following aging-invocation protocol. It could be specified in detailed pseudocode, but in the interest of space we describe it informally below.
When the tail processor begins to fill, we begin an automated binary search for a timestamp threshold that will hit our target fraction of edges surviving the aging event. We augment the data structures of each XStream ring processor with a small reservoir of 100 edges, and keep this sample representative by using the classical technique of reservoir sampling Vitter (1985).
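The per-processor reservoir can be sketched as follows. The `EdgeReservoir` class is a hypothetical reconstruction that holds only edge timestamps; the replace-with-probability rule is the classical one of Vitter (1985), and the 100-edge capacity matches the text. Everything else is illustrative.

```python
import random

class EdgeReservoir:
    """Fixed-size uniform sample of the edge timestamps offered to one
    ring processor, via classical reservoir sampling (Vitter 1985)."""

    def __init__(self, capacity=100, seed=None):
        self.capacity = capacity
        self.sample = []        # timestamps of sampled edges
        self.seen = 0           # total edges offered so far
        self.rng = random.Random(seed)

    def offer(self, timestamp):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(timestamp)
        else:
            # Replace a random slot with probability capacity/seen,
            # which keeps the sample uniform over all edges offered.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = timestamp

    def surviving_fraction(self, threshold):
        """Estimated fraction of this processor's edges that would
        survive an aging event with the given timestamp threshold."""
        if not self.sample:
            return 0.0
        return sum(1 for t in self.sample if t >= threshold) / len(self.sample)
```

Each processor would scale this fraction by its held-edge count to estimate its number of survivors.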
The binary search proceeds by varying a candidate threshold between the oldest and newest timestamps in the system. At each candidate threshold, each ring processor estimates the number of its edges that would survive an aging event with that threshold. After one circuit of the ring in a payload bundle, the tail processor knows whether the threshold must be increased or decreased to hit the target value. In a logarithmic number of passes, the tail processor has an accurate threshold and tells the head to initiate aging with it. In practice, this leaves plenty of time to complete aging while honoring Theorem 3.
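The tail's search can be sketched as a standard bisection. Here `fraction_surviving(t)` stands in for one circuit of the ring that aggregates the per-processor reservoir estimates for candidate threshold `t`; the function name and the `rounds` parameter are illustrative, not part of the prototype. The surviving fraction is monotonically non-increasing in the threshold, which is what makes bisection applicable.

```python
def find_aging_threshold(fraction_surviving, target, oldest, newest, rounds=20):
    """Binary-search a timestamp threshold t so that roughly `target`
    fraction of stored edges survive 'delete everything older than t'.

    fraction_surviving(t): estimated fraction of all stored edges with
    timestamp >= t, aggregated from the per-processor reservoirs (one
    ring circuit per probe in the real protocol).
    """
    lo, hi = oldest, newest
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if fraction_surviving(mid) > target:
            lo = mid   # too many survivors: age more aggressively
        else:
            hi = mid   # too few survivors: keep a lower threshold
    return (lo + hi) / 2
```

With 100-edge reservoirs the estimates are coarse, but Figure 11 suggests they are accurate enough to hold storage near the target.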
Figure 11 depicts the result of running XSCC on a 300 million-edge prefix of Dataset 3 using this automated aging strategy with a target value of 0.5. The binary search succeeds in finding aging thresholds that reliably reestablish the target storage level over an arbitrary number of aging events (we depict the first 27).
10 Non-constant queries and commands
As we have shown, a connectivity query propagates through the XStream ring processors in one circuit of the ring, and the query answer is sent from the tail processor to the head, then back to the I/O processor. Another potentially useful query that finishes in a single circuit is “How many edges are in the system?”
XStream also supports queries with non-constant-sized output. At most one such query can be active at a time. The answer to the query is output in constant-sized pieces using the payload slots. The canonical non-constant query is a request to output all vertices in small connected components. Specifically, the answer is the names of all components with at most k vertices, together with the list of vertices within them. This query makes practical sense only in graphs that have a giant connected component, but most real graphs have one. We describe how XStream executes this specific query.
For a local component, a processor can compute its size (the number of vertices it contains) as the sum of the sizes of its building blocks; the size of a primitive building block is 1. For this discussion, we assume processors keep track of the number of primitive building blocks in each local component while building these components. This adds only constant work per union-find operation. It is also possible to initialize local-component sizes to zero and compute them on the fly for this query, but then a processor does at most k - 1 extra work counting primitive building blocks or outputting the messages below, which further delays the query response. Processors receive the sizes of non-primitive building blocks from upstream processors.
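Maintaining component sizes at constant extra cost per union can be sketched with a standard union-find in which each root carries a count of primitive building blocks. This is an illustrative reconstruction, not the prototype's data structure; `add_block` takes the block's primitive-vertex count, which is 1 for a primitive block and, for a non-primitive block, the size received from an upstream processor.

```python
class LocalComponents:
    """Union-find over building blocks, tracking the number of primitive
    building blocks (vertices) per local component."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add_block(self, block, primitive_count):
        # primitive_count is 1 for a primitive building block (a vertex);
        # non-primitive blocks carry sizes reported from upstream.
        if block not in self.parent:
            self.parent[block] = block
            self.size[block] = primitive_count

    def find(self, block):
        root = block
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[block] != root:          # path compression
            self.parent[block], block = root, self.parent[block]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            if self.size[ra] < self.size[rb]:      # union by size
                ra, rb = rb, ra
            self.parent[rb] = ra
            self.size[ra] += self.size[rb]         # constant extra work
    def component_size(self, block):
        return self.size[self.find(block)]
```

The single size addition inside `union` is the "only constant work per union-find operation" the text refers to.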
When the head processor receives the query “Output the vertices in components that have at most k vertices” in the primary slot of a bundle, it passes the query downstream in the primary slot. This allows all processors to learn the type of query and the parameter k. The head then uses the payload slots to start answering the query, which proceeds in two phases. In the first phase, processors compute component sizes. For each local component with name c, the head processor (eventually) sends a message “(c, size(c))” in a payload basket. The head outputs a bounded number of these messages per bundle if it already knows its component sizes. After the last message, it outputs a “query phase done” token.
Each downstream processor passes the initial query downstream. Then, for each size message “(c, size(c))”, the processor checks whether c is a building block of one of its local components. If it is, the processor adds size(c) to the size of that local component. If c is not a local building block, the processor sends the message downstream. When the processor receives the “query phase done” token, it knows the sizes of all its non-primitive building blocks, and hence the sizes of all of its local components. It then sends its own “(c, size(c))” messages for its local components. When it has sent all its messages, it passes the “query phase done” token downstream. If the current graph has a connected component of size at most k, the message with its final size is passed through the tail and out to the analyst. The tail also passes the “query phase done” token back to the head.
Sealed processors (those full of tree edges) can set a flag indicating that they have already computed their component sizes. If another such query arrives before an aging event, a sealed processor removes messages associated with its local building blocks without incrementing any size counters.
In the second phase, the head processor (eventually) sends a message “(c, v)” for each primitive vertex v in each local component c reported in the first phase. For the head, all building blocks are primitive vertices. It is possible to pack more than one vertex into such a message (e.g., “(c, v1, v2, …)”), depending upon the size of a slot. After the last such message, the head passes a “query done” token downstream.
When a downstream processor receives a message “(c, v)” from upstream in the second phase, it checks whether c is a building block of one of its local components. If not, it passes the message downstream. If c is a building block of a local component C that the processor reported in the first phase (i.e., C has at most k vertices), it relabels the message, sending “(C, v)” downstream. If C is too large, it simply removes the message from the system.
When a downstream processor receives the “query done” token, it outputs messages “(C, v)” for each local component C with at most k vertices and each primitive building block (vertex) v in C.
A somewhat easier non-constant query is the spanning tree: starting with the head, each processor outputs its tree edges.
Some queries can be either constant-size or non-constant, depending upon what additional data structures the processors maintain. One example is “What is the degree of node v?” Suppose each processor maintains adjacency lists for the subgraph it holds. Then the processor can find the number of edges adjacent to a vertex in constant time, given a hash table to access the adjacency list for each vertex. In this case, the vertex-degree query completes in one pass around the ring, with the answer progressing one processor per tick. Without this data structure, each processor needs more than constant time to compute the number of edges it holds that are adjacent to vertex v; in this case, it is a non-constant query. The message still touches each processor once, but a processor may require multiple ticks to compute the number to add to the accumulating degree value.
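The constant-time variant can be sketched as follows, with a hypothetical per-processor adjacency table backed by a hash map; the names `Subgraph` and `ring_degree` are illustrative. With hash-backed adjacency sets, each processor contributes its local count in expected constant time, and the query accumulates its answer in one pass around the ring.

```python
from collections import defaultdict

class Subgraph:
    """Per-processor edge store with a hash table of adjacency sets,
    making the local degree lookup expected constant time."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def local_degree(self, v):
        # O(1) expected: one hash lookup plus a stored set's size.
        return len(self.adj.get(v, ()))

def ring_degree(processors, v):
    """The degree query circles the ring once; each processor adds the
    number of locally held edges adjacent to v to the running answer."""
    return sum(p.local_degree(v) for p in processors)
```

Without the hash table, `local_degree` would instead scan the processor's edge store, which is the non-constant case described above.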
Linear algebraic computations typically involve a matrix-vector product, which would be unwieldy to compute directly in the XStream model. However, the emerging field of randomized linear algebra Drineas and Mahoney (2018) offers a path forward. If we devote some space in the tail processor to a sample of edges (adjusting Lemma 3 accordingly), payload slots can be used to accumulate a random sample of the graph. Techniques such as randomized PageRank Gasnikov and Dmitriev (2015) might then be applied in a separate thread in the tail processor, still with minimal interruption to the input stream.

11 Conclusion and future work
We have provided the first comprehensive set of ideas for handling infinite graph streams with bulk expiration events, including theory and a prototype implementation. Despite its use of the network interconnect and MPI message passing rather than memory access, the prototype sometimes matches the ingestion rate of a purely on-node Intel TBB benchmark. Slowdowns are data-dependent and might be mitigated by a multithreaded implementation and algorithm engineering. Furthermore, the performance of a single XSCC ring will benefit from advances in computer architecture. Although our prototype operates correctly, future work would be necessary to engineer a production version.
To close, we consider the possibility of improving XStream’s ingestion rate by orders of magnitude. This is possible if we leverage a key property of most real graphs: the giant component. Suppose that we must ingest such a graph via hundreds or thousands of disjoint streams, and that we instantiate an independent XSCC instance for each. With overwhelming likelihood, each XSCC instance will ingest a portion of the global giant component. Using ideas from Section 10, each XSCC instance can stream its small components out to a “small-component server” (and notify that server of vertices in components that have joined the giant component). The small-component server would handle any connectivity query not involving the giant component, of which there are relatively few. Full detail is beyond the scope of this paper and is left for future work.
Acknowledgements.
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This research was funded through the Laboratory Directed Research and Development (LDRD) program at Sandia. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. We thank Siva Rajamanickam, Cannada Lewis, and Si Hammond for useful discussions and baseline code for the TBB benchmark.

References
 [1] (2004) On the streaming model augmented with a sorting primitive. In Foundations of Computer Science, 2004. Proceedings. 45th Annual IEEE Symposium on, pp. 540–549. Cited by: §2.2.
 [2] (2000) A random graph model for massive graphs. In Proceedings of the thirty-second annual ACM symposium on Theory of Computing, pp. 171–180. Cited by: §1.
 [3] (2020) SAGA-bench: software and hardware characterization of streaming graph analytics workloads. In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 12–23. Cited by: §8, §8.
 [4] (2013) Maintaining connected components for infinite graph streams. In Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pp. 95–102. Cited by: §2.2, §8, item 1.
 [5] (2019) Practice of streaming processing of dynamic graphs: concepts, models, and systems. arXiv preprint arXiv:1912.12740. Cited by: §8.
 [6] (2004) R-MAT: a recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 442–446. Cited by: item 2.
 [7] (2013) Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pp. 337–348. Cited by: §1, §8.
 [8] (2009) Trading off space for passes in graph streaming problems. ACM Transactions on Algorithms (TALG) 6 (1), pp. 6. Cited by: §2.2, §2.3, §3.1, Invariant 1.
 [9] (2018) Lectures on randomized numerical linear algebra. The Mathematics of Data 25 (1). Cited by: §10.
 [10] (2012) STINGER: high performance data structure for streaming graphs. In 2012 IEEE Conference on High Performance Extreme Computing, pp. 1–5. Cited by: §8, §8.
 [11] (2015) On efficient randomized algorithms for finding the pagerank vector. Computational Mathematics and Mathematical Physics 55 (3), pp. 349–365. Cited by: §10.
 [12] (2020) HOOVER: leveraging openshmem for high performance, flexible streaming graph applications. In 2020 IEEE/ACM 3rd Annual Parallel Applications Workshop: Alternatives To MPI+ X (PAWATM), pp. 55–65. Cited by: §8, §8.
 [13] (2016) Towards a distributed large-scale dynamic graph data store. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 892–901. Cited by: §8, §8.
 [14] (2018) Community interaction and conflict on the web. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 933–943. Cited by: item 3.
 [15] (1980) Algorithms for VLSI processor arrays. Introduction to VLSI Systems, pp. 271–292. Cited by: §3.
 [16] (2014) SNAP Datasets: Stanford large network dataset collection. Note: http://snap.stanford.edu/data Cited by: item 3.
 [17] (2014) Graph stream algorithms: a survey. ACM SIGMOD Record 43 (1), pp. 9–20. Cited by: §1, §8.
 [18] (1980) Selection and sorting with limited storage. Theoretical Computer Science 12 (3), pp. 315–323. Cited by: §2.2.
 [19] (2005) Data streams: algorithms and applications. Foundations and Trends in Theoretical Computer Science 1 (2), pp. 117–236. Cited by: §2.2, §8.
 [20] (2014) Streaming data analytics via message passing with application to graph algorithms. Journal of Parallel and Distributed Computing 74 (8), pp. 2687–2698. Cited by: §9.1.
 [21] (1999) Computing on data streams. In Proc. DIMACS Workshop External Memory and Visualization, Vol. 50, pp. 107. Cited by: §2.2.
 [22] (2011) Tracking structure of streaming social networks. In 2011 Graph Exploitation Symposium hosted by MIT Lincoln Labs, Cited by: §8.
 [23] (2019) Incremental graph processing for online analytics. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1007–1018. Cited by: §8.
 [24] (1985) Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS) 11 (1), pp. 37–57. Cited by: §9.5.
 [25] (2021) A parallel packed memory array to store dynamic graphs. In 2021 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 31–45. Cited by: §8, §8.
 [26] (2021) Wolfram Alpha LLC. Note: https://www.wolframalpha.com/ Cited by: §6.
 [27] (2018) A new algorithmic model for graph analysis of streaming data. In International Workshop on Mining and Learning with Graphs, Vol. 10. Cited by: §8, §8.
 [28] (2019) Concurrent katz centrality for streaming graphs. In 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. Cited by: §8, §8.