1. Introduction
Stream processing is now in widespread production use in domains as varied as telecommunications, personalized advertising, medicine, transportation, and finance. It is generally the paradigm of choice for applications that expect high throughput and low latency. Regardless of domain, nearly every stream processing application involves some form of aggregation, one of the most common being sliding-window aggregation.
Sliding-window aggregation derives a summary statistic over a user-specified amount of recent streaming data. Users also define how that summary statistic is computed, usually in the form of an associative binary operator (Boykin et al., 2014), as that is the most general known form for which computation can be effectively incrementalized to avoid naïvely rescanning every window. While some associative aggregation operators, such as sum, are also invertible, many, such as maximum or Bloom filters, are associative but not invertible.
Recent algorithmic research on sliding-window aggregation has given much attention to streams with strictly in-order arrivals. The standard interface for sliding-window aggregation supports insert, evict, and query. In the in-order setting, there are algorithms (Shein et al., 2017; Tangwongsan et al., 2017) for associative operators that take only $O(1)$ time per window change, without requiring the operator to be either invertible or commutative.
In reality, however, out-of-order streams are the norm (Akidau et al., 2013). Clock drift and disparate latency in computation and communication, for example, can cause values in a stream to arrive in a different order than their timestamps. Processing out-of-order streams is already supported in many stream processing platforms (e.g., Akidau et al., 2013; Zaharia et al., 2013; Carbone et al., 2015; Akidau et al., 2015). Still, in terms of performance, users who want the full generality of associative operators have to resort to latency-prone buffering or, alternatively, use an augmented balanced tree, such as a B-tree, at a cost of $O(\log n)$ time per insert or evict, where $n$ is the window size. This stands in stark contrast with the in-order setting, especially when the streams are nearly in order. Thus, we ask whether there exists a sub-$O(\log n)$ algorithm for out-of-order streams; this paper is our affirmative answer.
This paper introduces the finger B-tree aggregator (FiBA), a novel algorithm that efficiently aggregates sliding windows on out-of-order streams and in-order streams alike. Each insert or evict takes amortized $O(\log d)$ time (see Theorem 3.4 for a more formal statement), where the out-of-order distance $d$ is the distance from the inserted or evicted value to the closer end of the window. This complexity means $O(1)$ for in-order streams, nearly $O(1)$ for slightly out-of-order streams, and never more than $O(\log n)$ even for severely out-of-order streams. The worst-case time for any one particular insert or evict is $O(\log n)$, which only happens in the rare case of rebalancing all the way up the tree. FiBA requires $O(n)$ space and takes $O(1)$ time for a whole-window query. Furthermore, it is as general as the prior state of the art, supporting variable-sized windows and only requiring associativity from the operator.
Our solution can be summarized as finger B-trees (Guibas et al., 1977) with position-aware partial aggregates. Starting with classic B-trees, we first add pointers, or fingers, to the start and end of the tree. These fingers make it possible to perform the search for the value to insert or evict in worst-case $O(\log d)$ time. Second, we adopt a specific variant of B-trees where the rebalance to fix the size invariants takes amortized $O(1)$ time; specifically, we use B-trees with MAX_ARITY $= 2\cdot$MIN_ARITY where rebalancing happens after the fact (Huddleston and Mehlhorn, 1982). Third, and most importantly, we develop novel position-aware partial aggregates and a corresponding algorithm that bounds the cost of aggregate repairs by the cost of search plus rebalance.
The running time of FiBA is asymptotically the best possible in general. We prove a lower bound showing that for insert and evict operations with out-of-order distance up to $d$, the amortized cost of an operation in the worst case must be at least $\Omega(\log d)$.
Furthermore, we show how FiBA can support window sharing with query time logarithmic in the subwindow size and in the distance from the largest window's boundaries. Here, the space complexity is $O(n_{\max})$, where $n_{\max}$ is the size of the largest window.
Our experiments confirm the theoretical findings and show that FiBA performs well in practice. For out-of-order streams, it is a substantial improvement over existing algorithms in terms of both latency and throughput. For strictly in-order streams (i.e., FIFO), it demonstrates constant-time performance and remains competitive with specialized algorithms for in-order streams.
We hope FiBA will be used to make streaming applications less resource-hungry and more responsive for out-of-order streams.
2. Problem Statement: OoO SWAG
This section states the problem addressed in this paper more formally. Consider a data stream where each value carries a logical time in the form of a timestamp. Throughout, we denote a timestamped value as $t{:}v$, that is, the value $v$ at logical time $t$. The examples in this paper use natural numbers for timestamps, but our algorithms do not depend on any properties of the natural numbers besides being totally ordered. For instance, our algorithms work just as well with date/time representations or with real numbers.
It is intuitive to assume that values in such a stream arrive in non-decreasing order of time (in order). However, due to clock drift and disparate latency in computation and communication, among other factors, values in a stream often arrive in a different order than their timestamps. Such a stream is said to have out-of-order (OoO) arrivals: there exists a later-arriving value that has an earlier logical time than a previously arrived value.
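To make the out-of-order distance concrete, the following C++ sketch replays arrival timestamps into a time-ordered window (no evictions) and reports, for each arrival, how many already-present timestamps lie between it and the closer end of the window. The counting convention (distinct timestamps, counting strictly the values on the nearer side) and the function name outOfOrderDistances are illustrative assumptions, not the paper's definitions; std::distance over a std::set is linear and is used only for clarity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Illustrative only: for each arriving timestamp, count the already-arrived
// timestamps on the nearer side of it in the time-ordered window.
// An in-order arrival (appended at the end) therefore has distance 0.
// Assumes timestamps are distinct.
std::vector<std::size_t> outOfOrderDistances(const std::vector<long>& arrivals) {
  std::set<long> window;  // time-ordered window contents
  std::vector<std::size_t> dists;
  for (long t : arrivals) {
    auto pos = window.lower_bound(t);
    auto before = static_cast<std::size_t>(std::distance(window.begin(), pos));  // times < t
    auto after = static_cast<std::size_t>(std::distance(pos, window.end()));     // times > t
    dists.push_back(std::min(before, after));
    window.insert(t);
  }
  return dists;
}
```

For arrivals 10, 20, 30, 40, 50, 25, every arrival except the last is in order (distance 0), while 25 lands two values away from the start of the window.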
Our goal in this paper is to maintain the aggregate value of a time-ordered sliding window in the face of out-of-order arrivals. To motivate the formulation below, consider the following example, which maintains the max and the maxcount, i.e., the number of times the max occurs in the sliding window.
Initially, the values arrive in the same order as their associated timestamps . The maximum value is , and maxcount is because occurs twice. When stream values arrive in order, they are simply appended. For instance, when arrives, it is inserted at the end:
However, when values arrive out of order, they must be inserted into the appropriate spots to keep the sliding window time-ordered. For instance, when arrives, it is inserted between timestamps and :
As for eviction, stream values are usually removed from a window in order, for instance, evicting from the front:
Notice that, in general, eviction cannot always be accomplished by simply inverting the aggregation value. For instance, evicting cannot be done by “subtracting off” the value from the current aggregation value. The algorithm needs to efficiently discover the new max 4 and maxcount 2:
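Monoids. There are other streaming aggregations besides max and maxcount. Monoids capture a large class of commonly used aggregations (Boykin et al., 2014; Tangwongsan et al., 2015). A monoid is a triple $(S, \otimes, \mathbf{1})$, where $\otimes : S \times S \to S$ is a binary associative operator on $S$, with $\mathbf{1}$ being its identity element. Notice that $\otimes$ only needs to be associative; it need not be commutative or invertible. For example, to express max and maxcount as a monoid, if $m_1, m_2$ are maxima and $c_1, c_2$ the corresponding maxcounts, then, with identity $\langle -\infty, 0\rangle$,

$$\langle m_1, c_1\rangle \otimes \langle m_2, c_2\rangle = \begin{cases} \langle m_1, c_1\rangle & \text{if } m_1 > m_2,\\ \langle m_2, c_2\rangle & \text{if } m_1 < m_2,\\ \langle m_1, c_1 + c_2\rangle & \text{if } m_1 = m_2. \end{cases}$$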
Since $\otimes$ is associative, no parentheses are needed for repeated application. When the context is clear, we even omit $\otimes$, for example, writing qstu for $q \otimes s \otimes t \otimes u$. This concise notation is borrowed from the mathematicians' convention of omitting explicit multiplication operators.
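The max/maxcount monoid translates directly into C++. In the sketch below, the names MaxCount, identity, combine, lift, and fold are illustrative, and the lowest representable int with count 0 stands in for the identity $\langle -\infty, 0\rangle$. fold combines values in time order, the way a whole-window query would. Note that combine never subtracts: the operator is associative (and, as it happens, commutative) but not invertible.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// An aggregate pairs the maximum seen so far with how often it occurs.
struct MaxCount {
  int max;
  long count;
  bool operator==(const MaxCount& o) const { return max == o.max && count == o.count; }
};

// Stand-in for <-infinity, 0>: loses to (or ties with count 0 against) every value.
MaxCount identity() { return {std::numeric_limits<int>::lowest(), 0}; }

// The monoid operator: keep the larger max; on a tie, add the counts.
MaxCount combine(const MaxCount& a, const MaxCount& b) {
  if (a.max > b.max) return a;
  if (a.max < b.max) return b;
  return {a.max, a.count + b.count};
}

// Lifts a single stream value into the monoid.
MaxCount lift(int v) { return {v, 1}; }

// Folds values in time order, starting from the identity.
MaxCount fold(const std::vector<int>& vs) {
  MaxCount acc = identity();
  for (int v : vs) acc = combine(acc, lift(v));
  return acc;
}
```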
OoO SWAG. This paper is concerned with maintaining an aggregation on a time-ordered sliding window where the aggregation operator can be expressed as a monoid. This can be formulated as an abstract data type (ADT) as follows:
Definition 2.1.
Let $\otimes$ be a binary operator from a monoid and $\mathbf{1}$ its identity. The out-of-order sliding-window aggregation (OoO SWAG) ADT is to maintain a time-ordered sliding window $\langle t_1{:}v_1, \ldots, t_n{:}v_n\rangle$, $t_1 < \cdots < t_n$, supporting the following operations:


insert($t$ : Time, $v$ : Agg) checks whether $t$ is already in the window, i.e., whether there is an $i$ such that $t = t_i$. If so, it replaces $t_i{:}v_i$ by $t_i{:}(v_i \otimes v)$. Otherwise, it inserts $t{:}v$ into the window at the appropriate location.

evict($t$ : Time) checks whether $t$ is in the window, i.e., whether there is an $i$ such that $t = t_i$. If so, it removes $t_i{:}v_i$ from the window. Otherwise, it does nothing.

query() : Agg combines the values in time order using the $\otimes$ operator. In other words, it returns $v_1 \otimes \cdots \otimes v_n$ if the window is nonempty, or $\mathbf{1}$ if empty.
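As an executable reference for Definition 2.1, the C++ sketch below implements the OoO SWAG ADT naively on top of std::map, assuming that insert() combines a value arriving at an existing time $t_i$ into $v_i \otimes v$. Insert and evict are $O(\log n)$ map operations, but query() rescans the whole window in $O(n)$, which is the cost the tree-based algorithms avoid. The class name NaiveOoOSwag and the Monoid interface (a nested Agg type plus static identity() and combine()) are illustrative assumptions; string concatenation serves as a non-commutative test monoid, so ordering mistakes would be visible.

```cpp
#include <cassert>
#include <map>
#include <string>

// Naive reference OoO SWAG (illustrative sketch, not FiBA): std::map keeps the
// window time-ordered; insert/evict cost O(log n), query rescans in O(n).
// Assumed Monoid interface: nested type Agg, static identity(), static combine().
template <typename Time, typename Monoid>
class NaiveOoOSwag {
 public:
  using Agg = typename Monoid::Agg;

  // If t is already present, replace v_i by combine(v_i, v);
  // otherwise insert t:v at its time-ordered position.
  void insert(const Time& t, const Agg& v) {
    auto it = window_.find(t);
    if (it != window_.end()) {
      it->second = Monoid::combine(it->second, v);
    } else {
      window_.emplace(t, v);
    }
  }

  // Remove the entry at time t if present; do nothing otherwise.
  void evict(const Time& t) { window_.erase(t); }

  // Fold all values in time order; the identity for an empty window.
  Agg query() const {
    Agg acc = Monoid::identity();
    for (const auto& entry : window_) acc = Monoid::combine(acc, entry.second);
    return acc;
  }

 private:
  std::map<Time, Agg> window_;
};

// Non-commutative test monoid: string concatenation with identity "".
struct Concat {
  using Agg = std::string;
  static Agg identity() { return ""; }
  static Agg combine(const Agg& a, const Agg& b) { return a + b; }
};
```

For example, inserting 1:a, 3:c, and then the out-of-order 2:b yields the query result abc; a second insert at time 2 with value x combines in place to give abxc.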
Lower Bound. How fast can OoO SWAG operations be supported? For in-order streams, the SWAG operations can be handled in $O(1)$ time per operation (Tangwongsan et al., 2017; Shein et al., 2017). But the problem becomes more difficult when the stream has out-of-order arrivals. We prove in this paper that to handle out-of-order distance up to $d$, the amortized cost of an OoO SWAG operation in the worst case must be at least $\Omega(\log d)$.
Theorem 2.2.
Let be given such that and . For any OoO SWAG algorithm, there exists a sequence of operations, each with out-of-order distance at most $d$, for which the algorithm requires a total of at least time.
The proof, which appears in Appendix A, proceeds in two steps. First, it establishes a sorting lower bound for permutations with out-of-order distance at most $d$. Second, it gives a reduction proving that maintaining an OoO SWAG is no easier than sorting such permutations.
Orthogonal Techniques. OoO SWAG operations are designed to work well with other stream aggregation techniques.
The insert() operation supports the case where $t$ is already in the window, so it works with pre-aggregation schemes such as window panes (Li et al., 2005), paired windows (Krishnamurthy et al., 2006), cutty windows (Carbone et al., 2016), or Scotty (Traub et al., 2018). For instance, for a 5-hour sliding window that advances in 1-minute increments, the logical times can be rounded to minutes, leading to more cases where $t$ is already in the window.
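The minute-rounding example can be sketched in C++ as follows, assuming logical times are non-negative integer seconds and the monoid is integer sum; roundToMinute and insertRounded are hypothetical helpers. Values that fall into the same minute hit the case where $t$ is already in the window and are combined rather than creating a new entry.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Assumes non-negative times in seconds: maps a time to the start of its minute.
std::int64_t roundToMinute(std::int64_t seconds) { return seconds - seconds % 60; }

// Pre-aggregating insert: values in the same minute are combined with the
// monoid operator (integer sum here). operator[] default-initializes a missing
// entry to 0, which is the identity of sum.
void insertRounded(std::map<std::int64_t, std::int64_t>& window,
                   std::int64_t seconds, std::int64_t value) {
  window[roundToMinute(seconds)] += value;
}
```

Under these assumptions, a 5-hour window holds on the order of 300 minute keys, independent of the arrival rate.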
The evict() operation accommodates the case where $t$ is not the oldest time in the window, so it works with streaming systems that use retractions (Abadi et al., 2005; Akidau et al., 2013, 2015; Barga et al., 2007; Brito et al., 2008; Chandramouli et al., 2010; Li et al., 2008; Zaharia et al., 2013).
Neither insert() nor evict() is limited to values of $t$ near either end of the window, so they work in the general case, not just when the out-of-order distance is bounded by buffer sizes or low watermarks.
Query Sharing. As defined above, OoO SWAG does not support query sharing. However, query sharing for different window sizes can be accommodated via a range query:


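query($t_{\mathrm{from}}$ : Time, $t_{\mathrm{to}}$ : Time) : Agg aggregates exactly the values from the window whose times fall between $t_{\mathrm{from}}$ and $t_{\mathrm{to}}$. That is, it returns $v_{i_{\mathrm{from}}} \otimes \cdots \otimes v_{i_{\mathrm{to}}}$, where $i_{\mathrm{to}}$ is the largest $i$ such that $t_i \le t_{\mathrm{to}}$ and $i_{\mathrm{from}}$ is the smallest $i$ such that $t_{\mathrm{from}} \le t_i$. If the subrange contains no values, the operation returns $\mathbf{1}$.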
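A naive counterpart of the range query is sketched below in C++ over a std::map from time to value, with string concatenation as an assumed non-commutative monoid (identity ""). It selects exactly the times $t_i$ with $t_{\mathrm{from}} \le t_i \le t_{\mathrm{to}}$ via lower_bound and upper_bound and folds them in time order. The function name rangeQuery is hypothetical, and its linear scan over the subrange is what a tree with stored partial aggregates can avoid.

```cpp
#include <cassert>
#include <map>
#include <string>

// Combines, in time order, exactly the values whose times t satisfy from <= t <= to.
// Returns the identity "" for an empty subrange.
std::string rangeQuery(const std::map<int, std::string>& window, int from, int to) {
  std::string acc;  // identity element
  if (from > to) return acc;               // guard: otherwise first could pass last
  auto first = window.lower_bound(from);   // smallest t_i >= from
  auto last = window.upper_bound(to);      // one past the largest t_i <= to
  for (auto it = first; it != last; ++it) acc += it->second;
  return acc;
}
```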
In these terms, the problem statement of this paper is to design and implement efficient OoO SWAG operations as well as range-query support for arbitrary monoids $(S, \otimes, \mathbf{1})$.
3. Finger B-Tree Aggregator (FiBA)
This section introduces our algorithm gradually, giving intuition along the way. It begins by describing a basic algorithm (Section 3.1) that uses a B-tree augmented with aggregates. This algorithm takes $O(\log n)$ time for each insert or evict operation. Reducing the time complexity below $O(\log n)$ requires further observations and ideas, which Section 3.2 explores intuitively and Section 3.3 fleshes out in detail.
3.1. Basic Algorithm: Augmented B-Tree
One way to implement the OoO SWAG is to start with a classic B-tree with timestamps as keys and augment that tree with aggregates. This is a baseline implementation, which we will build upon. Even though any balanced tree could be used, we chose the B-tree because it is well studied and has a customizable fan-out degree, providing opportunities for experimentation.
There are many B-tree variations. The range of permissible arity, or fan-out degree of a node, is controlled by two parameters, MIN_ARITY and MAX_ARITY. While MIN_ARITY can be any integer greater than or equal to 2, most B-tree variations require that MAX_ARITY be at least $2\cdot$MIN_ARITY $-1$. Hence, if $a(y)$, or simply $a$ when the context is clear, denotes the arity of a node $y$, then a B-tree obeys the following size invariants:


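For a non-root node $y$, MIN_ARITY $\le a(y)$.

For all nodes, $a(y) \le$ MAX_ARITY.

All nodes have $a(y)-1$ timestamps and values $t_0{:}v_0, \ldots, t_{a(y)-2}{:}v_{a(y)-2}$.

All non-leaf nodes have $a(y)$ child pointers $z_0, \ldots, z_{a(y)-1}$.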
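The size invariants are easy to check mechanically. The following C++ sketch uses a hypothetical minimal Node type that stores only timestamps and children; checkSizes and checkTree are illustrative names. Besides the bullets above, it checks that all leaves sit at the same depth, as described for Figure 1.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal node: a(y) - 1 timestamps and, unless a leaf, a(y) children.
struct Node {
  std::vector<long> times;
  std::vector<std::unique_ptr<Node>> kids;  // empty for leaves
  std::size_t arity() const { return times.size() + 1; }
  bool isLeaf() const { return kids.empty(); }
};

// Recursively checks the size invariants; leafDepth records the depth of the
// first leaf seen so that all other leaves can be compared against it.
bool checkSizes(const Node& y, bool isRoot, std::size_t minArity, std::size_t maxArity,
                int depth, int& leafDepth) {
  if (!isRoot && y.arity() < minArity) return false;  // non-root lower bound
  if (y.arity() > maxArity) return false;             // upper bound for all nodes
  if (y.isLeaf()) {
    if (leafDepth < 0) leafDepth = depth;
    return leafDepth == depth;                        // all leaves at the same depth
  }
  if (y.kids.size() != y.arity()) return false;       // exactly a(y) children
  for (const auto& z : y.kids) {
    if (!z || !checkSizes(*z, false, minArity, maxArity, depth + 1, leafDepth)) return false;
  }
  return true;
}

// Convenience wrapper: checks a whole tree given its root.
bool checkTree(const Node& root, std::size_t minArity, std::size_t maxArity) {
  int leafDepth = -1;
  return checkSizes(root, true, minArity, maxArity, 0, leafDepth);
}
```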
Figure 1 illustrates a B-tree augmented with aggregates. In this example, MIN_ARITY is 2 and MAX_ARITY is 4. Consequently, all nodes have 1–3 timestamps and values, and non-leaf nodes have 2–4 children. Each node in the tree contains an aggregate, an array of timestamps and values, and optionally pointers to the children. For instance, the root node contains the aggregate ab..u, two timed values, and pointers to three children. Because we use timestamps as keys, the entries are time-ordered, both within a node and across nodes, with timestamps stored in a parent node separating and limiting the times in the subtrees it points to. The tree is always height-balanced; in fact, all leaves are at the same depth.
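What aggregate is kept in a node? For each node $y$, the aggregate $y.\mathit{agg}$ stored at that node obeys the up-aggregation invariant:

$$y.\mathit{agg} = \begin{cases} v_0 \otimes \cdots \otimes v_{a(y)-2} & \text{if } y \text{ is a leaf,}\\ z_0.\mathit{agg} \otimes v_0 \otimes z_1.\mathit{agg} \otimes \cdots \otimes v_{a(y)-2} \otimes z_{a(y)-1}.\mathit{agg} & \text{otherwise.} \end{cases}$$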
By a standard inductive argument, $y.\mathit{agg}$ is the aggregation of the values inside the subtree rooted at $y$. This means the query() operation can simply return the aggregation value at the root (root.agg).
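A recursive C++ sketch of the up-aggregation invariant follows: repairUp recomputes every node's agg bottom-up in the order $z_0, v_0, z_1, \ldots$, so afterwards root.agg is the in-order fold of the whole tree. The Node layout is a hypothetical simplification (timestamps omitted), and string concatenation stands in for a non-commutative monoid so that ordering mistakes would show up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical augmented node; timestamps are omitted for brevity.
struct Node {
  std::vector<std::string> vals;            // a(y) - 1 values
  std::vector<std::unique_ptr<Node>> kids;  // a(y) children, or none for a leaf
  std::string agg;                          // up-aggregate of the subtree
};

// Establishes the up-aggregation invariant bottom-up:
//   leaf:     agg = v_0 ... v_{a-2}
//   non-leaf: agg = z_0.agg v_0 z_1.agg ... v_{a-2} z_{a-1}.agg
void repairUp(Node& y) {
  std::string acc;  // identity of concatenation
  for (std::size_t i = 0; i <= y.vals.size(); ++i) {
    if (!y.kids.empty()) {
      repairUp(*y.kids[i]);
      acc += y.kids[i]->agg;
    }
    if (i < y.vals.size()) acc += y.vals[i];
  }
  y.agg = std::move(acc);
}
```

An incremental implementation would instead repair only the changed node and its ancestors, which is exactly the root-bound path that the finger B-tree later avoids.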
The operations insert() and evict() first search for the node where $t$ belongs. Second, they locally insert or evict at that node, updating the aggregate stored there. Third, they rebalance the tree starting at that node and going up towards the root as necessary to fix any size invariant violations, while also repairing aggregate values along the way. Finally, they repair any remaining aggregate values not repaired during rebalancing, starting above the node where rebalancing topped out and visiting all ancestors up to the root.
Theorem 3.1.
In a classic B-tree augmented with aggregates, if it stores $\langle t_1{:}v_1, \ldots, t_n{:}v_n\rangle$, the operation query() returns $v_1 \otimes \cdots \otimes v_n$.
Proof.
After each operation, all nodes obey the up-aggregation invariant, and therefore root.agg $= v_1 \otimes \cdots \otimes v_n$. ∎
Theorem 3.2.
In a classic B-tree augmented with aggregates, the operation query() costs at most $O(1)$ time and operations insert() and evict() take at most $O(\log n)$ time.
Proof.
As is standard, we treat the arity of a node as bounded by a constant. The query operation and the local insert or evict visit only a single node. The search, rebalance, and repair visit at most two nodes per tree level. The work is thus bounded by the tree height, which is $O(\log n)$ since the tree is height-balanced (Bayer and McCreight, 1972; Cormen et al., 1990; Huddleston and Mehlhorn, 1982). Hence, the total cost per operation is $O(\log n)$. ∎
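The $O(\log n)$ height bound used in this proof can be made explicit with the standard B-tree counting argument (as in Cormen et al., 1990), sketched here assuming $m =$ MIN_ARITY $\ge 2$, a height $h$ that counts edges from the root to a leaf, and a root holding at least one timed value. Every non-root node holds at least $m-1$ timed values, and depth $i \ge 1$ contains at least $2m^{i-1}$ nodes, so

$$n \;\ge\; 1 + (m-1)\sum_{i=1}^{h} 2m^{i-1} \;=\; 1 + 2\,(m^{h}-1) \;=\; 2m^{h}-1 \qquad\Longrightarrow\qquad h \;\le\; \log_{m}\frac{n+1}{2} \;=\; O(\log n).$$

Treating MIN_ARITY as a constant, this yields the logarithmic height the proof relies on.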
3.2. Breaking the $O(\log n)$ Barrier
The basic algorithm just described supports OoO SWAG operations in $O(\log n)$ time using an augmented classic B-tree. To improve upon this time complexity, we now discuss the bottlenecks in the basic algorithm and outline a plan to resolve them.
In the basic algorithm, the insert() and evict() operations involve four steps: (1) search for the node where $t$ belongs; (2) locally insert or evict; (3) rebalance to repair size invariants; and (4) repair the remaining aggregation invariants. If one treats arity as constant, the local insertion or eviction takes constant time, as does the query() operation. But each of the steps for search, rebalance, and repair takes up to $O(\log n)$ time. Hence, these are the bottleneck steps, and we improve upon them as follows:


By maintaining “fingers” to the leftmost and rightmost leaves, we reduce the search complexity to $O(\log d)$, where $d$ is the distance to the closer end of the sliding window. This means that in the FIFO or near-FIFO case, the search complexity is constant.

By using an appropriate MAX_ARITY and a somewhat lazy strategy for rebalancing, we make sure that rebalancing takes no more than amortized constant time. This means that for any operation that affects the tree structure, the cost to restore the proper tree structure amounts to a constant per operation, regardless of out-of-order distance.

By introducing position-dependent aggregates, we ensure that repairs to the aggregate values are made only to nodes along the search path or involved in restructuring. This means that the repairs cost no more than the search and rebalance.
We combine the above ideas into a novel sub-$O(\log n)$ algorithm for OoO SWAG. Below, we describe intuitively how these ideas are implemented, leaving detailed algorithms and proofs to Section 3.3.
Sub-$O(\log n)$ Search. In classic B-trees, a search starts at the root and ends at the node being searched, henceforth called $y$. Often, $y$ is a leaf, so the search visits $O(\log n)$ nodes. However, instead of starting at the root, one can start at the leftmost or rightmost leaf in the tree. This requires pointers to the leftmost and rightmost leaves, henceforth called the left and right fingers (Guibas et al., 1977). In addition, we keep a parent pointer at each node. Hence, the search can start at the nearest finger, walk up to the nearest common ancestor of the finger and $y$, and walk down from there to $y$. The resulting algorithm runs in $O(\log d)$ time, where $d$ is the distance from the nearest end of the window, or more precisely, the number of timed values from $t$ to the nearest end of the window.
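The finger search walks a tree, but its cost profile can be illustrated on a sorted array. The C++ sketch below is an array analogue, not FiBA's tree search: it compares the target with the middle element to pick the nearer end, gallops inward with doubling steps (exponential search), and finishes with std::lower_bound inside the bracket it found, so a target $d$ positions from that end costs $O(\log d)$ probes. The function name fingerLowerBound is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Array analogue of finger search: galloping (exponential) search from the
// nearer end of a sorted array. Returns the same index as std::lower_bound.
std::size_t fingerLowerBound(const std::vector<long>& ts, long t) {
  const std::size_t n = ts.size();
  if (n == 0) return 0;
  std::size_t lo = 0, hi = n;  // invariant: the answer lies in [lo, hi]
  if (t > ts[n / 2]) {
    // Closer to the right end: double the step backwards while ts[n - step] >= t.
    std::size_t step = 1;
    while (step <= n && ts[n - step] >= t) {
      hi = n - step;
      step *= 2;
    }
    lo = step <= n ? n - step : 0;
  } else {
    // Closer to the left end: double the step forwards while ts[step - 1] < t.
    std::size_t step = 1;
    while (step <= n && ts[step - 1] < t) {
      lo = step;
      step *= 2;
    }
    hi = std::min(step, n);
  }
  // Binary search confined to the bracket found above: O(log d) probes.
  return static_cast<std::size_t>(
      std::lower_bound(ts.begin() + lo, ts.begin() + hi, t) - ts.begin());
}
```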
Sub-$O(\log n)$ Rebalance. Insertions and evictions can cause nodes to overflow or underflow, thus violating the size invariants. There are two popular strategies that address this: rebalancing either before or after the fact. The before-the-fact strategy ensures that ancestors of the affected node are not at risk of overflow or underflow by preventive rebalancing, so that their arity is at least one away from the thresholds required by the size invariants (e.g., Cormen et al., 1990). The after-the-fact strategy first performs the local insert or evict step, then repairs any resulting overflow or underflow so that the size invariants hold again by the end of the entire insert or evict operation. We adopt the after-the-fact strategy, which has been shown to take amortized constant time (Huddleston and Mehlhorn, 1982) as long as MAX_ARITY $\ge 2\cdot$MIN_ARITY. For simplicity, we use MAX_ARITY $= 2\cdot$MIN_ARITY. The amortized cost is $O(1)$ because rebalancing rarely goes all the way up the tree. The worst-case cost is $O(\log n)$, bounded by the tree height.
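The arity arithmetic behind after-the-fact rebalancing can be checked with a few lines of C++. The hypothetical function rebalanceKeepsArities enumerates the arities produced by splitting an overflowing node (MAX_ARITY + 1 children), merging an underflowing node (MIN_ARITY − 1) with a sibling of arity MIN_ARITY, and moving one child from a larger sibling, and confirms that every result stays within [MIN_ARITY, MAX_ARITY]. It only checks this arithmetic; it does not model the amortized cost.

```cpp
#include <cassert>

// Checks, for MIN_ARITY m and MAX_ARITY M, that split, merge, and move leave
// every touched node within [m, M]. Arity arithmetic only; no cost model.
bool rebalanceKeepsArities(int m, int M) {
  auto ok = [&](int a) { return m <= a && a <= M; };
  // Split: an insert overflows a node to M + 1 children; one timed value moves
  // up to the parent and the children are divided between two nodes.
  const int over = M + 1;
  if (!ok(over / 2) || !ok(over - over / 2)) return false;
  // Merge: a node with m - 1 children joins a sibling with m children (a
  // non-root sibling has at least m, and merging is chosen when it has <= m);
  // the separating timed value comes down, so the children counts add up.
  if (!ok((m - 1) + m)) return false;
  // Move: a sibling with s > m children donates one child; the underflowing
  // node reaches exactly m, and the sibling keeps s - 1.
  for (int s = m + 1; s <= M; ++s) {
    if (!ok(s - 1)) return false;
  }
  return true;
}
```

With MAX_ARITY $= 2\cdot$MIN_ARITY $-1$ the check already passes and with anything smaller it fails, matching the size requirement stated in Section 3.1; the extra slack of MAX_ARITY $\ge 2\cdot$MIN_ARITY is what the amortized bound of Huddleston and Mehlhorn relies on, which this arithmetic alone does not capture.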
Sub-$O(\log n)$ Repair. The basic algorithm stores at each node $y$ the up-aggregate, i.e., the partial aggregate of the subtree under $y$. This is problematic, because it means that an insertion or eviction at a node $y$, usually a leaf, affects the partial aggregates stored in all ancestors of $y$, that is, the entire path up to the root. To circumvent this issue, we need an arrangement of aggregates that can be repaired by traversing to a finger, without always traversing to the root. For this, we make each node store the kind of partial aggregate suitable for its position in the tree. Furthermore, because the root no longer contains the aggregate of the whole tree, we ensure that query() can be answered by combining partial aggregates at the left finger, the root, and the right finger.
To meet these requirements, we define four kinds of partial aggregates in Figure 2. As illustrated in Figure 3, they are used in a B-tree according to the following aggregation invariants:


Non-spine nodes store the up-aggregate. Such a node is neither a finger nor an ancestor of a finger. This aggregate must be repaired whenever the subtree below it changes. Figure 3(A) shows nodes with up-aggregates in white, light blue, or light green. For example, the center child of the root contains the aggregate hijklmn, comprising its entire subtree.

The root stores the inner aggregate. This aggregate is only affected by changes to the inner part of the tree, not by changes below the leftmost or rightmost child of the root. Figure 3(A) shows the inner parts of the tree in white and the root in gray; the root stores the aggregate ghijklmno.

Non-root nodes on the left spine store the left aggregate. For a given node $y$, the left aggregate encompasses all nodes under the leftmost child of the root except for $y$'s leftmost child $z_0$. When a change occurs below the leftmost child of the root, the only aggregates that need to be repaired are those on a traversal up to the left spine and then down to the left finger. Figure 3(A) shows the left spine in dark blue and nodes affecting it in light blue. For example, the node in the middle of the left spine contains the aggregate cdef, comprising the left subtree of the root except for the left finger.

Non-root nodes on the right spine store the right aggregate. This is symmetric to the left aggregate. When a change occurs below the rightmost child of the root, only aggregates on a traversal to the right finger are repaired. Figure 3(A) shows the right spine in dark green and nodes affecting it in light green. For example, the node in the middle of the right spine contains the aggregate qst of the right subtree of the root except for the right finger.
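To see how the partial aggregates fit together at query time, the C++ sketch below computes a root's inner aggregate from a hypothetical flattened view of the root (its children's up-aggregates and its own values) and assembles the whole-window result as leftFinger.agg ⊗ root.agg ⊗ rightFinger.agg. It assumes, as the definitions imply, that the left finger's left aggregate spans the root's entire leftmost subtree and the right finger's right aggregate spans the entire rightmost subtree. With the letters of Figure 3(A), the inner aggregate comes out as ghijklmno, matching the root described above; splitting the remaining letters into abcdef and pqrstu is an illustrative assumption. String concatenation again stands in for a non-commutative monoid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flattened view of a non-leaf root, in time order.
struct RootView {
  std::vector<std::string> childAggs;  // up-aggregate of each child subtree: a(root) entries
  std::vector<std::string> vals;       // the root's own values: a(root) - 1 entries
};

// Inner aggregate: the root's values and all children except the leftmost and rightmost.
std::string innerAgg(const RootView& r) {
  std::string acc;
  for (std::size_t i = 0; i < r.vals.size(); ++i) {
    if (i > 0) acc += r.childAggs[i];  // child 0 is covered by the left finger
    acc += r.vals[i];
  }
  return acc;  // the last child is never added: it is covered by the right finger
}

// Whole-window query for a non-leaf root under the assumptions stated above.
std::string wholeQuery(const std::string& leftFingerAgg, const RootView& r,
                       const std::string& rightFingerAgg) {
  return leftFingerAgg + innerAgg(r) + rightFingerAgg;
}
```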
3.3. Using Finger B-Trees
This section describes an algorithm that implements the OoO SWAG using a finger B-tree augmented with aggregates. It achieves sub-$O(\log n)$ time complexity by maintaining the size invariants from Section 3.1 and the aggregation invariants from Section 3.2.
The algorithmic complexity analysis accounts for the cost of split, merge, and move operations by counting coins. Specifically, the analysis counts the number of split, merge, or move steps of an insert or evict operation as spent coins. Coins can be imagined as being stored at tree nodes, so they can be used to pay for split, merge, or move operations later. Throughout this paper, coins are visualized as little golden circles next to tree nodes. Sometimes, coins must be added to or removed from the tree from the outside to make up the difference between spent coins and coins in the tree before and after each step. We refer to these coins as being billed or refunded. The key result of the proof is that billed coins never exceed 2 for any insert() or evict(); hence, rebalancing has amortized constant time complexity.
Figures 3–6 show concrete examples covering all the interesting cases of the algorithm. Each state, for instance (A), shows a tree with aggregates and coins. Each step, for instance A→B, shows an insert or evict, illustrating how it affects the tree, its partial aggregates, and its coins.


In Figure 3, Step A→B is an in-order insert without rebalance, which only affects the aggregate at a single node, the right finger.

Step B→C is an out-of-order insert without rebalance, affecting aggregates on a walk to the right finger.

Step C→D is an in-order evict without rebalance, affecting the aggregate at a single node, the left finger.

Step D→E is an out-of-order insert into a node with arity MAX_ARITY, causing an overflow; rebalancing splits it.

Step E→F is an evict from a node with arity MIN_ARITY, causing the node to underflow; rebalancing merges it with its neighbor.

In Figure 6, Step G→H is an insert that causes nodes to overflow all the way up to the root, causing a height increase followed by a split of the old root. This affects aggregates on all split nodes and on both spines.

In Figure 6, Step I→J is an evict that first causes an underflow that is fixed by a merge, and then an underflow at the next level where the neighbor node is too big to merge. The algorithm repairs the size invariant by moving a child and a timed value from the neighbor. This step affects aggregates on all nodes affected by rebalancing plus a walk to the left finger.

In Figure 6, Step K→L is an evict that causes nodes to underflow all the way up to the root, causing a height decrease that eliminates the old, empty root. This affects aggregates on all merged nodes and on both spines.
```
fun query() : Agg
  if root.isLeaf()
    return root.agg
  return leftFinger.agg ⊗ root.agg ⊗ rightFinger.agg

fun insert(t : Time, v : Agg)
  node ← searchNode(t)
  node.localInsertTimeAndValue(t, v)
  top, hit_left, hit_right ← rebalanceForInsert(node)
  repairAggs(top, hit_left, hit_right)

fun evict(t : Time)
  node ← searchNode(t)
  found, idx ← node.localSearch(t)
  if found
    if node.isLeaf()
      node.localEvictTimeAndValue(t)
      top, hit_left, hit_right ← rebalanceForEvict(node, null)
    else
      top, hit_left, hit_right ← evictInner(node, idx)
    repairAggs(top, hit_left, hit_right)

fun repairAggs(top : Node, hit_left : Bool, hit_right : Bool)
  if top stores an up-aggregate
    while top stores an up-aggregate
      top ← top.parent
      top.localRepairAgg()
  else
    top.localRepairAgg()
  if top.leftSpine or (top.isRoot() and hit_left)
    left ← top
    while not left.isLeaf()
      left ← left.getChild(0)
      left.localRepairAgg()
  if top.rightSpine or (top.isRoot() and hit_right)
    right ← top
    while not right.isLeaf()
      right ← right.getChild(right.arity − 1)
      right.localRepairAgg()

fun rebalanceForInsert(node : Node) : Node × Bool × Bool
  hit_left, hit_right ← node.leftSpine, node.rightSpine
  while node.arity > MAX_ARITY
    if node.isRoot()
      heightIncrease()
      hit_left, hit_right ← true, true
    split(node)
    node ← node.parent
    hit_left ← hit_left or node.leftSpine
    hit_right ← hit_right or node.rightSpine
  return node, hit_left, hit_right

fun rebalanceForEvict(node : Node, toRepair : Node) : Node × Bool × Bool
  hit_left, hit_right ← node.leftSpine, node.rightSpine
  if node = toRepair
    node.localRepairAggIfUp()
  while not node.isRoot() and node.arity < MIN_ARITY
    parent ← node.parent
    nodeIdx, siblingIdx ← pickEvictionSibling(node)
    sibling ← parent.getChild(siblingIdx)
    hit_right ← hit_right or sibling.rightSpine
    if sibling.arity ≤ MIN_ARITY
      node ← merge(parent, nodeIdx, siblingIdx)
      if parent.isRoot() and parent.arity = 1
        heightDecrease()
      else
        node ← parent
    else
      move(parent, nodeIdx, siblingIdx)
      node ← parent
    if node = toRepair
      node.localRepairAggIfUp()
    hit_left ← hit_left or node.leftSpine
    hit_right ← hit_right or node.rightSpine
  return node, hit_left, hit_right
```

Figure 7. Finger B-tree with aggregates: algorithm.

```
fun evictInner(node : Node, idx : Int) : Node × Bool × Bool
  left, right ← node.getChild(idx), node.getChild(idx + 1)
  if right.arity > MIN_ARITY
    leaf, t′, v′ ← oldest(right)
  else
    leaf, t′, v′ ← youngest(left)
  leaf.localEvictTimeAndValue(t′)
  node.setTimeAndValue(idx, t′, v′)
  top, hit_left, hit_right ← rebalanceForEvict(leaf, node)
  if top.isDescendent(node)
    while top ≠ node
      top ← top.parent
      hit_left ← hit_left or top.leftSpine
      hit_right ← hit_right or top.rightSpine
  return top, hit_left, hit_right
```

Figure 8. Finger B-tree evict inner: algorithm.

Figure 9. Finger B-tree evict inner: example (Step M→N, out-of-order evict 9:i; spent 0, billed 1).

Figure 7 shows most of the algorithm, excluding only evictInner, which is presented in Figure 8. While rebalancing always works bottom-up, aggregate repair works in the direction of the partial aggregates: either up, for up-aggregates and the inner aggregate, or down, for left and right aggregates. Our algorithm piggybacks the repair of up-aggregates onto the local insert or evict and onto rebalancing, and then repairs the remaining aggregates separately. To facilitate the handover from the piggybacked phase to the dedicated phase of aggregate repair, the rebalancing routines return a triple ⟨top, hit_left, hit_right⟩, for instance, in the call to rebalanceForInsert within insert. Node top is where rebalancing topped out, and if it has an up-aggregate, it is the last node whose aggregate has already been repaired. Booleans hit_left and hit_right indicate whether rebalancing affected the left or right spine, determining whether aggregates on the respective spine have to be repaired.
To keep the algorithm more readable, we factored out the case of evicting from a non-leaf node into function evictInner in Figure 8. To evict a timed value from an inner node, evictInner instead evicts a substitute from a leaf (leaf.localEvictTimeAndValue) and then writes that substitute over the evicted slot (node.setTimeAndValue). This creates an obligation to repair an extra node during rebalancing, which is handled by the toRepair parameter of rebalanceForEvict in Figure 7. Function evictInner can only be triggered by out-of-order evictions, because in-order evictions always happen at the left finger, which is a leaf.
The following theorems state our correctness guarantees and the time complexity; their proofs appear in Appendix B.
Theorem 3.3.
In a finger B-tree with aggregates that contains $\langle t_1{:}v_1, \ldots, t_n{:}v_n\rangle$, operation query() returns $v_1 \otimes \cdots \otimes v_n$.
Theorem 3.4.
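In a finger B-tree with aggregates, query() costs at most $O(1)$ time, and insert() and evict() take time $T_{\mathrm{search}} + T_{\mathrm{rebalance}} + T_{\mathrm{repair}}$, where

$T_{\mathrm{search}}$ is $O(\log d)$, with $d$ being the distance to the start or end of the window, whichever is closer;

$T_{\mathrm{rebalance}}$ is amortized $O(1)$ and worst-case $O(\log n)$; and

$T_{\mathrm{repair}}$ is $O(T_{\mathrm{search}} + T_{\mathrm{rebalance}})$.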
4. Window Sharing
This section explains how to use a single finger B-tree to efficiently answer aggregations on subwindows of different sizes on the fly. Applications are numerous. A common basic example is a simple anomaly detection workflow that compares two related aggregations: one on a large window representing the normal “stable” behavior and the other on a smaller window representing the most recent behavior. An alert is then triggered when the aggregates differ substantially. Whereas in this example the sizes of the windows are known ahead of query time, in many other applications, such as interactive data exploration, queries are ad hoc.
We propose to implement window sharing via range queries, as defined at the end of Section 2. This has many benefits. The window contents need to be saved only once, regardless of how many subwindows are involved. Thus, each insert or evict needs to be performed only once, on the largest window. This approach can accommodate an arbitrary number of shared window sizes; for instance, many users can register queries over different window sizes. Importantly, queries can be ad hoc and interactive, which would otherwise not be possible to support using multiple fixed instances. Furthermore, the range-query formulation also accommodates the case where the window boundary is not the current time, i.e., where $t_{\mathrm{to}}$ precedes the newest timestamp. For instance, it can report results with some time lag dictated by punctuation or low watermarks.
To answer the range query query($t_{\mathrm{from}}$, $t_{\mathrm{to}}$), the algorithm, shown in Figure 10, uses recursion starting from the least-common ancestor node whose subtree encompasses the queried range. The main technical challenge is to avoid making spurious recursive calls. Because the nodes already store partial aggregates, the algorithm should only recurse into a node's children if the partial aggregates cannot be used directly. Specifically, we aim for the algorithm to invoke at most two chains of recursive calls, one visiting ancestors of node$_{\mathrm{from}}$ and the other visiting ancestors of node$_{\mathrm{to}}$. The insight for preventing spurious recursive calls is that one needs information about neighboring timestamps in a node's parent to determine whether the node itself is subsumed by the range. We encode whether the neighboring timestamp in the parent is included in the range with one flag for the left side and one for the right side.
This strategy alone would be similar to a range query in an interval tree (Cormen et al., 1990), albeit without explicitly storing the ranges. However, our specially designed partial aggregates add another layer of detail: not all nodes store up-aggregates. But any node that lacks an up-aggregate is guaranteed to be on one of the two recursion chains, because if a query involves spines of the entire window, then those spines coincide with edges of the intersection between the window and the range.
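The flag-based recursion can be sketched in C++ on a simplified tree in which every node stores its up-aggregate (the basic tree of Section 3.1, not FiBA's position-dependent aggregates, so the spine special cases are omitted). Each call receives two flags stating whether the parent timestamps bounding the node already guarantee that its whole subtree is at or after $t_{\mathrm{from}}$ and at or before $t_{\mathrm{to}}$; when both hold, the stored aggregate is reused without recursing, and children lying entirely outside the range are skipped. The Node layout and the names rangeRec and rangeQuery are illustrative; this is not the paper's Figure 10.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified node: every node stores its up-aggregate; values are strings
// combined by concatenation (identity "").
struct Node {
  std::vector<long> times;
  std::vector<std::string> vals;
  std::vector<std::unique_ptr<Node>> kids;  // empty for leaves
  std::string agg;                          // up-aggregate of the subtree
  bool isLeaf() const { return kids.empty(); }
};

// Establishes agg = z_0.agg v_0 z_1.agg ... bottom-up.
void repairUp(Node& y) {
  std::string acc;
  for (std::size_t i = 0; i <= y.vals.size(); ++i) {
    if (!y.isLeaf()) {
      repairUp(*y.kids[i]);
      acc += y.kids[i]->agg;
    }
    if (i < y.vals.size()) acc += y.vals[i];
  }
  y.agg = std::move(acc);
}

// geFrom / leTo: whether the bounding timestamps in y's ancestors already
// guarantee that every time in y's subtree is >= from / <= to.
std::string rangeRec(const Node& y, long from, long to, bool geFrom, bool leTo) {
  if (geFrom && leTo) return y.agg;  // subtree subsumed by the range
  std::string acc;
  const std::size_t k = y.times.size();
  for (std::size_t i = 0; i <= k; ++i) {
    if (!y.isLeaf()) {
      // Child i holds times strictly between times[i - 1] and times[i].
      const bool entirelyAfter = i > 0 && y.times[i - 1] >= to;
      const bool entirelyBefore = i < k && y.times[i] <= from;
      if (!entirelyAfter && !entirelyBefore) {
        const bool childGe = (i == 0) ? geFrom : y.times[i - 1] >= from;
        const bool childLe = (i == k) ? leTo : y.times[i] <= to;
        acc += rangeRec(*y.kids[i], from, to, childGe, childLe);
      }
    }
    if (i < k && from <= y.times[i] && y.times[i] <= to) acc += y.vals[i];
  }
  return acc;
}

// Entry point: the identity "" for an empty or inverted range.
std::string rangeQuery(const Node& root, long from, long to) {
  if (from > to) return "";
  return rangeRec(root, from, to, false, false);
}
```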
Theorem 4.1.
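In a finger B-tree with aggregates that contains $\langle t_1{:}v_1, \ldots, t_n{:}v_n\rangle$, the operation query($t_{\mathrm{from}}$, $t_{\mathrm{to}}$) returns the aggregate $v_{i_{\mathrm{from}}} \otimes \cdots \otimes v_{i_{\mathrm{to}}}$, where $i_{\mathrm{to}}$ is the largest $i$ such that $t_i \le t_{\mathrm{to}}$ and $i_{\mathrm{from}}$ is the smallest $i$ such that $t_{\mathrm{from}} \le t_i$.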
Proof.
By induction. Each recursive call returns the aggregate of the intersection between its subtree and the queried range. ∎
Theorem 4.2.
In a finger B-tree with aggregates that contains , the operation query() takes time , where
is the largest index such that
is the smallest index such that
and are the distances to the window boundary
is the size of the subwindow being queried.
Proof.
Using finger searches, Line 2 takes . Now the distance from either node or node to the least-common ancestor (LCA) is at most . Therefore, locating the LCA takes at most , and so do subsequent recursive calls in queryRec that traverse the same paths. ∎
In particular, when a query ends at the current time, the theorem says that the query takes O(log m) time, where m is the size of the subwindow being queried.
5. Results
Figure 11. Out-of-order distance experiments.
We implemented both OoO SWAG variants in C++: the baseline classic B-tree augmented with aggregates and the finger B-tree aggregator (FiBA). We present experiments with three competitive min-arity values; higher values for min-arity were never competitive in our experiments. Our experiments run outside of any particular streaming framework so that we can focus on the aggregation algorithms themselves. Our load generator produces synthetic data items with random integers. The experiments perform rounds of evict, insert, and query to maintain a sliding window that accepts a new data item, evicts an old one, and produces a result each round.
We present results with three aggregation operators and their corresponding monoids, each representing a different category of computational cost. The operator sum performs an integer sum over the window; its computational cost is less than that of tree traversals and manipulations. The operator geomean performs a geometric mean over the window. For numerical stability, this requires a floating-point log on insertion and floating-point additions during data structure operations; it represents a middle ground in computational cost. The most expensive operator, bloom, is a Bloom filter (Bloom, 1970) whose partial aggregations maintain a fixed-size bitset. It represents aggregation operators where the computational cost of performing an aggregation easily dominates the cost of maintaining the SWAG data structure.
We ran all experiments on a machine with an Intel Xeon E5-2697 at 2.7 GHz running Red Hat Enterprise Linux Server 7.5 with a 3.10.0 kernel. We compiled all experiments with g++ 4.8.5 at optimization level -O3.
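The three operators can be phrased as monoids in a lift/combine/lower style: lift is applied to each inserted value, combine is the associative operator applied during tree maintenance, and lower turns a query result into an answer. The sketch below is our own hedged illustration, not the benchmark code; the struct names, the 1024-bit bitset, and the two ad hoc hash functions are assumptions. It shows why geomean sits in the middle: each value pays one `std::log` when lifted, after which combining is just floating-point addition, whereas bloom combines whole bitsets.

```cpp
#include <bitset>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical monoids: identity, lift (per inserted item),
// combine (associative), lower (per query result).
struct SumMonoid {
  using Agg = std::int64_t;
  static Agg identity() { return 0; }
  static Agg lift(std::int64_t x) { return x; }
  static Agg combine(Agg a, Agg b) { return a + b; }
  static std::int64_t lower(Agg a) { return a; }
};

struct GeomeanMonoid {
  // Partial aggregate: (sum of logs, count). Assumes positive inputs.
  struct Agg { double log_sum; std::int64_t count; };
  static Agg identity() { return {0.0, 0}; }
  static Agg lift(double x) { return {std::log(x), 1}; }  // one log per insert
  static Agg combine(Agg a, Agg b) {
    return {a.log_sum + b.log_sum, a.count + b.count};   // additions only
  }
  // Geometric mean; returns 0.0 for an empty window by convention.
  static double lower(Agg a) {
    return a.count == 0 ? 0.0 : std::exp(a.log_sum / static_cast<double>(a.count));
  }
};

struct BloomMonoid {
  static constexpr std::size_t kBits = 1024;  // assumed filter size
  using Agg = std::bitset<kBits>;
  static Agg identity() { return Agg{}; }
  // Sets two bits chosen by two ad hoc hash functions (an assumption).
  static Agg lift(std::uint64_t x) {
    Agg bits;
    std::size_t h = std::hash<std::uint64_t>{}(x);
    bits.set(h % kBits);
    bits.set(static_cast<std::size_t>(((x * 0x9E3779B97F4A7C15ULL) >> 32) % kBits));
    return bits;
  }
  static Agg combine(const Agg& a, const Agg& b) { return a | b; }
  // "May contain": every bit of the item's lifted pattern is set.
  static bool may_contain(const Agg& a, std::uint64_t x) {
    Agg pattern = lift(x);
    return (a & pattern) == pattern;
  }
};
```

Because bitwise OR is associative with the empty bitset as identity, the Bloom filter fits the monoid interface, but every combine touches the whole bitset, which is why its aggregation cost dominates.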
5.1. Varying Distance
We begin by investigating how the out-of-order distance of insert affects throughput. The distance-varying experiments (Figure 11) maintain a window of constant size. The x-axis is the out-of-order distance d between the newest timestamp already in the window and the timestamp created by our load generator. Our adversarial load generator prepopulates the window with high timestamps and then spends the measured portion of the experiment producing low timestamps. This regime ensures that, after the prepopulation with high timestamps, the out-of-order distance of each subsequent insertion is precisely d.
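One way to obtain a fixed out-of-order distance is sketched below; it is our reading of the description above, not the authors' load generator, and `simulate_distances` is a hypothetical name. The window of size n starts with n - d low, in-order timestamps plus d high timestamps. Each round evicts the smallest timestamp (always a low one) and inserts the next low timestamp, which is larger than every low timestamp but smaller than all d high ones, so exactly d timestamps already in the window exceed it. The simulation measures that distance with a `std::multiset` instead of a SWAG.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical simulation of an adversarial generator for window size n and
// target out-of-order distance d (requires 0 <= d < n, and n + rounds must
// stay below high_base). Returns the observed distance of each insertion.
std::vector<std::size_t> simulate_distances(std::size_t n, std::size_t d,
                                            std::size_t rounds) {
  std::multiset<std::int64_t> window;
  const std::int64_t high_base = 1'000'000'000;  // "high" timestamps start here
  for (std::size_t i = 0; i < d; ++i)
    window.insert(high_base + static_cast<std::int64_t>(i));
  std::int64_t next_low = 0;
  for (std::size_t i = 0; i < n - d; ++i) window.insert(next_low++);

  std::vector<std::size_t> distances;
  for (std::size_t r = 0; r < rounds; ++r) {
    window.erase(window.begin());  // evict the smallest (oldest) timestamp
    std::int64_t t = next_low++;   // next low timestamp
    // Out-of-order distance: how many timestamps in the window exceed t.
    distances.push_back(static_cast<std::size_t>(
        std::distance(window.upper_bound(t), window.end())));
    window.insert(t);
  }
  return distances;
}
```

With n = 10 and d = 3, every measured insertion reports a distance of exactly 3; with d = 0 the stream is simply in order.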
This experiment confirms the prediction of the theory. The classic B-tree's throughput is mostly unaffected by the change in distance, but the finger B-tree's throughput starts out significantly higher and smoothly degrades, following an O(log d) trend. All variants see an uptick in performance when d = n, that is, when the distance equals the window size n. This is a degenerate special case: when d = n, the lowest timestamp to evict is always in the leftmost node in the tree, so the tree behaves like a last-in first-out (LIFO) stack, and inserting and evicting require no tree restructuring.
The min-arity that yields the best-performing B-tree varies with the aggregation operator. For expensive operators, such as bloom, trees with smaller min-arity perform better. The reason is that as the min-arity grows, the number of partial aggregations the algorithm needs to perform inside a node also increases. When the aggregation cost dominates all others, trees that require fewer total aggregations perform better. On the flip side, for cheap operators, such as sum, trees that require fewer rebalance and repair operations perform better.
The step-like throughput curves of the finger B-trees are a function of their min-arity: a larger min-arity means longer stretches where the increased out-of-order distance still affects only a subtree of the same height. When the throughput suddenly drops, the increase in d has increased the height of the affected subtree, causing more rebalances and updates.
5.2. Latency
Figure 12. Latency experiments.
The worst-case latency for both classic and finger B-trees is O(log n), where n is the window size, but we expect the finger variants to significantly reduce average latency. The experiments in Figure 12 confirm this expectation. All latency experiments use a fixed window size. The top set of experiments uses an out-of-order distance of d = 0, and the bottom set uses a larger out-of-order distance. (We chose the latter distance because it is among the worst-performing in the throughput experiments.) The experimental setup is the same as for the throughput experiments, and the latency is for an entire round of evict, insert, and query. The latency axis shows the number of processor cycles per round, in log scale. Since we used a 2.7 GHz machine, 10^3 cycles take 370 nanoseconds and 10^6 cycles take 370 microseconds. The blue bars represent the median latency, the shaded blue regions represent the distribution of latencies, and the black bar marks a tail percentile. Each plot's full extent spans the minimum to the maximum latency.
When the out-of-order distance is d = 0 and the aggregation operator is cheap or only moderately expensive, the worst-case latency in practice for the classic and finger B-trees is similar. This is expected, as the time is dominated by tree operations, which are worst-case O(log n). However, the minimum and median latencies are orders of magnitude better for the finger B-trees. This is also expected, since for d = 0 the fingers enable amortized constant-time updates. When the aggregation operator is expensive, the finger B-trees have significantly lower latency, because they have to repair fewer partial aggregates.
With the larger out-of-order distance and cheap or moderately expensive operators, the classic and finger B-trees have similar latency. This is expected: as d approaches n, the worst-case latency for finger B-trees approaches O(log n). Again, with expensive operators, the minimum, median, and tail-percentile latencies of at least one finger B-tree variant are orders of magnitude lower than those of classic B-trees. There is, however, a curious effect clearly present in the bloom experiments with finger B-trees, but still observable in the others: latency is not monotonic in min-arity. It is lowest for one min-arity, gets significantly worse for another, then improves again for a third. Recall that the root is not subject to the min-arity constraint; in other words, it may be slimmer. Depending on the arity of the root, some aggregation repairs walk almost to the root and then back down a spine, while others walk to the root and no further. The former case, which traverses a spine twice, is generally more expensive than the latter, which is usually a shorter path. The frequency of the expensive case is a function of the window size, tree arity, and out-of-order distance, and these factors do not interact linearly.
Figure 13. FIFO experiments.
5.3. FIFO
A special case for FiBA is d = 0: with in-order data, our finger B-tree aggregator (FiBA) enjoys amortized constant-time performance. Figure 13 compares the B-tree-based SWAGs against the state-of-the-art SWAGs optimized for first-in, first-out, completely in-order data. Two-stacks works only on in-order data and is amortized O(1) with worst-case O(n) (adamax, 2011). The De-Amortized Banker's Aggregator (DABA) also works only on in-order data and is worst-case O(1) (Tangwongsan et al., 2017). The Reactive Aggregator supports out-of-order evict but requires in-order insert and is amortized O(log n) with worst-case O(n) (Tangwongsan et al., 2015). The x-axis represents increasing window size n.
Two-stacks and DABA perform as seen in prior work: for most window sizes, two-stacks, with its amortized O(1) time bound, has the best throughput. DABA is generally second best, as it does a little more work on each operation to maintain worst-case constant performance.
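For reference, the two-stacks idea credited above to (adamax, 2011) can be sketched as follows. The back stack keeps only a running aggregate of newly inserted values; the front stack stores each value together with the aggregate of it and everything newer in that stack, so the oldest value can be evicted in O(1). When the front stack is empty, evict flips the whole back stack over, which is the occasional O(n) step behind the amortized O(1) bound. This is a hedged illustration with our own hypothetical names (`TwoStacks`, `push`, `pop`, `query`); it assumes in-order data, an identity element, and a non-empty window on `pop`.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical two-stacks SWAG for in-order (FIFO) windows.
template <typename T>
class TwoStacks {
 public:
  TwoStacks(T identity, std::function<T(const T&, const T&)> op)
      : id_(identity), op_(std::move(op)), back_agg_(identity) {}

  // Insert the newest value.
  void push(const T& v) {
    back_.push_back(v);
    back_agg_ = op_(back_agg_, v);
  }

  // Evict the oldest value; precondition: the window is non-empty.
  void pop() {
    if (front_.empty()) flip();  // occasional O(n) step
    front_.pop_back();
  }

  // Aggregate of the whole window, oldest to newest.
  T query() const {
    T front_agg = front_.empty() ? id_ : front_.back().second;
    return op_(front_agg, back_agg_);
  }

 private:
  // Move all back values to the front stack, newest first, so the oldest
  // value ends on top with the aggregate of the entire former back stack.
  void flip() {
    T acc = id_;
    for (auto it = back_.rbegin(); it != back_.rend(); ++it) {
      acc = op_(*it, acc);
      front_.emplace_back(*it, acc);
    }
    back_.clear();
    back_agg_ = id_;
  }

  T id_;
  std::function<T(const T&, const T&)> op_;
  std::vector<std::pair<T, T>> front_;  // (value, aggregate up to newest in front)
  std::vector<T> back_;
  T back_agg_;
};
```

Using string concatenation, pushing `"a"`, `"b"`, `"c"`, popping once, and pushing `"d"` gives a query result of `"bcd"`, so the order of a non-commutative operator is preserved.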
The finger B-tree variants demonstrate constant performance as the window size increases. The best finger B-tree variants stay close to DABA for sum and geomean, but fall further behind DABA with a more expensive operator like bloom. In general, finger B-trees maintain constant performance with completely in-order data, but the extra work of maintaining a tree means that SWAGs specialized for in-order data consistently outperform them.
The classic B-trees clearly demonstrate O(log n) behavior as the window size increases. Reactive also demonstrates O(log n) behavior, but it is only obvious with bloom; for sum and geomean, the fixed costs dominate. Reactive was designed to avoid pointer-based data structures under the premise that the extra memory accesses would harm performance. To our surprise, this premise does not hold: on our hardware, the extra computation required to avoid pointers ends up costing more. For bloom, Reactive outperforms all of the B-tree-based SWAGs because it is essentially a min-arity 1, max-arity 2 tree. As seen in other results, for the most expensive aggregation operators, reducing the total number of aggregation operations matters more to performance than data structure updates.
Figure 14. Window sharing experiments. The out-of-order distance also varies as d = m/2, where m is the small window size.
5.4. Window Sharing
One of the benefits of finger B-trees is that they can support a range-query interface while maintaining logarithmic performance for queries over that range. A range-query interface enables window sharing: the same window can be used for multiple queries over different ranges. An obvious benefit of window sharing is reduced space usage, but we also wanted to investigate whether it could improve runtime performance. As Figure 14 shows, window sharing did not consistently improve runtime performance.
The experiments maintain two queries: a big window of fixed size, and a small window whose size varies, shown on the x-axis. The workload consists of out-of-order data items where the out-of-order distance is half of the small window size, i.e., d = m/2, where m is the small window size. The _twin experiments maintain two separate trees, one for each window size. The _range experiments maintain a single tree, using a standard query for the big window and a range query for the small window.
Let m denote the small window size. Our experiment performs out-of-order insert and in-order evict, so insert costs O(log d) and evict costs O(1). Hence, on average, each round of the _range experiment costs O(log d) for insert, O(1) for evict, and O(1) and O(log m) for the queries on the big window and the small window, respectively. On average, each round of the _twin experiment costs 2·O(log d) for insert, 2·O(1) for evict, and O(1) for each of the two whole-window queries. Since we chose d = m/2, this works out to a total of O(log m) per round in both the _range and the _twin experiments. There is no fundamental reason why window sharing is slightly more expensive in practice. A more optimized code path might make range queries slightly less expensive, but we would still expect them to remain in the same ballpark.
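The per-round tally can be written out explicitly. This is a back-of-the-envelope sketch under stated assumptions: out-of-order insert costs O(log d), in-order evict costs O(1), a whole-window query costs O(1), a range query over the most recent m items costs O(log m), and the out-of-order distance is d = m/2. Constant factors, which decide the measured winner, are ignored.

```latex
\begin{aligned}
\texttt{\_range}: &\quad \underbrace{O(\log d)}_{\text{insert}}
  + \underbrace{O(1)}_{\text{evict}}
  + \underbrace{O(1)}_{\text{big-window query}}
  + \underbrace{O(\log m)}_{\text{small-window range query}} \\
\texttt{\_twin}: &\quad 2\,O(\log d) + 2\,O(1) + 2\,O(1) \\
d = m/2: &\quad \log_2 d = \log_2 m - 1
  \;\Rightarrow\; \text{both totals are } O(\log m).
\end{aligned}
```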
By picking the out-of-order distance to be half the small window size, our experiments demonstrate the case where window sharing is most likely to outperform the twin experiment. Since it did not outperform the twin experiment, we conclude that window sharing is unlikely to have a consistent performance benefit. We could have increased the number of shared windows to the point where maintaining multiple non-shared windows performed worse because of the memory hierarchy, but that is the same benefit as reduced space usage. We conclude that the primary benefits of window sharing in this context are reduced space usage and the ability to construct queries against arbitrarily sized windows on the fly.
6. Related Work
This section describes work related to outoforder sliding window aggregation, slidingwindow aggregation with window sharing, and finger trees.
Out-of-Order Stream Processing. Processing out-of-order (OoO) streams is a popular research topic with a variety of approaches, but there are surprisingly few incremental algorithms for OoO stream processing. Truviso (Krishnamurthy et al., 2010) handles stream data sources that are out-of-order with respect to each other but where input values are in-order with respect to the stream they arrive on. The algorithm runs separate stream queries on each source followed by consolidation. In contrast, with FiBA, each individual stream input value can have its own independent OoO behavior. Chandramouli et al. (Chandramouli et al., 2010) describe how to perform pattern matching on out-of-order streams but do not tackle sliding-window aggregation. Finally, the Reactive Aggregator (Tangwongsan et al., 2015) performs incremental sliding-window aggregation and can handle OoO evict in O(log n) time. In contrast, FiBA can handle both OoO insert and OoO evict, and takes sub-O(log n) time.
One approach to OoO streaming is buffering: hold input stream values in a buffer until it is safe to release them to the rest of the stream query (Srivastava and Widom, 2004). Buffering has the advantage of not requiring incremental operators in the query, since the query only sees in-order data. Unfortunately, buffering increases latency (since values endure nonzero delay) and reduces quality (since bounded buffer sizes lead to outputs computed on incomplete data). One can reduce the delay by optimistically performing computation over transactional memory (Brito et al., 2008) and performing commits in order. Finally, one can tune the trade-off between quality and latency by adaptively adjusting buffer sizes (Ji et al., 2015). In contrast to buffering approaches, FiBA can handle arbitrary lateness without sacrificing quality or incurring significant latency.
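A minimal sketch of the buffering approach, under our own assumptions: values are held in a min-heap keyed by timestamp and released in timestamp order once a low watermark guarantees that no earlier timestamp can still arrive. The class `ReorderBuffer` and its methods are hypothetical and do not correspond to any cited system. The latency cost is visible directly: nothing leaves the buffer until the watermark passes it, and a real system would also cap the buffer size, which is where the quality loss described above comes from.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical reorder buffer: holds (timestamp, value) pairs until a
// watermark guarantees no smaller timestamp will arrive, then releases
// them in timestamp order.
template <typename V>
class ReorderBuffer {
 public:
  using Item = std::pair<std::int64_t, V>;

  void add(std::int64_t ts, V v) { heap_.emplace(ts, std::move(v)); }

  // Release every buffered item with timestamp <= watermark, in order.
  std::vector<Item> advance_watermark(std::int64_t watermark) {
    std::vector<Item> out;
    while (!heap_.empty() && heap_.top().first <= watermark) {
      out.push_back(heap_.top());
      heap_.pop();
    }
    return out;
  }

  std::size_t buffered() const { return heap_.size(); }

 private:
  // Orders the priority queue as a min-heap on timestamps.
  struct Later {
    bool operator()(const Item& a, const Item& b) const { return a.first > b.first; }
  };
  std::priority_queue<Item, std::vector<Item>, Later> heap_;
};
```

Downstream operators then only ever see in-order data, which is the advantage of buffering; FiBA instead accepts the late value immediately and repairs the affected partial aggregates.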
Another approach to OoO streaming is retraction: report outputs quickly but revise them if they are affected by late-arriving inputs. At any point, results are accurate with respect to the stream input values that have arrived so far. An early streaming system that embraced this approach was Borealis (Abadi et al., 2005), where stateful operators used stored state for retraction. Spark Streaming also takes this approach: it externalizes state from operators and handles stragglers like failures, invalidating parts of the query (Zaharia et al., 2013). Pure retraction requires OoO algorithms such as OoO sliding-window aggregation, but the retraction literature does not show how to do that efficiently, as the naïve approach of recomputing from scratch would be inefficient for large windows. Our paper is complementary, describing an efficient OoO sliding-window aggregation algorithm that could be used with systems like Borealis or Spark Streaming.
Using a low watermark (lwm) is an approach to OoO streaming that combines buffering with retraction. The lwm approach allows OoO values to flow through the query but limits state requirements at individual operators by limiting the OoO distance. CEDR proposed 8 timestamp-like fields to support a spectrum of blocking, buffering, and retraction (Barga et al., 2007). Li et al. (Li et al., 2008) formalized the notion of a lwm based on the related notion of punctuation (Tucker et al., 2003). StreamInsight, which was inspired by CEDR, offered a state-management interface to operator developers that could be used for sliding-window aggregation. Subsequently, MillWheel (Akidau et al., 2013), Flink (Carbone et al., 2015), and Beam (Akidau et al., 2015) also adopted the lwm concept. The lwm provides some guarantees but leaves it to the operator developer to handle OoO values. Our paper describes an efficient algorithm for an OoO aggregation operator, which could be used with systems like the ones listed above.
Sliding Window Aggregation with Sharing. All of the following papers focus on sharing over streams with the same aggregation operator (i.e., the same monoid). The Scotty algorithm supports sliding-window aggregation over out-of-order streams while sharing windows with both different sizes and different slice granularities (Traub et al., 2018). For instance, Scotty might share a window of size 60 minutes and granularity 3 minutes with a session window whose gap timeout is set to 5 minutes. When a tuple arrives out of order, older slices may need to be updated, fused, or created. Scotty relies on an aggregate store (e.g., based on a balanced tree) to maintain slice aggregates. One caveat is that the aggregation operator must be commutative; otherwise, one needs to keep around the tuples from which a slice is pre-aggregated. Our FiBA algorithm does not make any commutativity assumption. For commutative operators, FiBA could serve as a more efficient aggregate store for Scotty, thus combining the benefits of Scotty's stream slicing with asymptotically faster final aggregation.
Other prior work on window sharing requires in-order streams. The B-Int algorithm uses base intervals, which can be viewed as a tree structure over ordered data, and supports sharing of windows with different sizes (Arasu and Widom, 2004). Krishnamurthy et al. (Krishnamurthy et al., 2006) show how to share windows that differ not just in size but also in granularity. Cutty windows are a more efficient approach to sharing windows with different sizes and granularities (Carbone et al., 2016), and their paper explains how to extend the Reactive Aggregator (Tangwongsan et al., 2015) for sharing. The FlatFIT algorithm performs sliding-window aggregation in amortized constant time and supports window sharing, addressing different granularities with the same technique as Cutty windows (Shein et al., 2017). Finally, the SlickDeque algorithm focuses on the special case where x ⊗ y always returns either x or y, and offers window sharing with O(1) time complexity assuming friendly input data distributions (Shein et al., 2018). In contrast to the above work, FiBA combines window sharing with out-of-order processing. It directly supports sliding-window aggregation over windows of different sizes.
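The special case that SlickDeque targets, where x ⊗ y always returns either x or y (as with max), admits the classic monotonic-deque technique for in-order windows. The sketch below shows that textbook technique for a sliding maximum; it is not SlickDeque's algorithm, and `SlidingMax` with its methods is a hypothetical name. A new value discards every older value it dominates, because those can never again be the answer; each value enters and leaves the deque at most once, so the operations are amortized O(1).

```cpp
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical sliding-window maximum over an in-order stream.
class SlidingMax {
 public:
  // Insert a value with a strictly increasing sequence number.
  void insert(std::int64_t seq, std::int64_t v) {
    // Older values <= v can never be the maximum again: drop them.
    while (!dq_.empty() && dq_.back().second <= v) dq_.pop_back();
    dq_.emplace_back(seq, v);
  }
  // Evict every value whose sequence number is < first_kept.
  void evict_before(std::int64_t first_kept) {
    while (!dq_.empty() && dq_.front().first < first_kept) dq_.pop_front();
  }
  // Maximum of the window; precondition: the window is non-empty.
  std::int64_t query() const { return dq_.front().second; }

 private:
  std::deque<std::pair<std::int64_t, std::int64_t>> dq_;  // (seq, value), values decreasing
};
```

Over the stream 3, 1, 4, 1, 5, 9, 2, 6 with a window of three items, the successive maxima are 4, 4, 5, 9, 9, 9. The trick relies on the operator being selective and on in-order arrival, which is why it does not carry over to general associative operators or out-of-order streams.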
Finger Trees. Our FiBA algorithm uses techniques from the literature on finger trees, combining and extending them to work with sliding-window aggregation. Guibas et al. (Guibas et al., 1977) introduced finger trees in 1977. A finger can be viewed as a pointer to some position in a tree that makes tree operations (usually search, insert, or evict) near that position less expensive. Guibas et al. used fingers on B-trees, but without aggregation. Huddleston and Mehlhorn (Huddleston and Mehlhorn, 1982) offer a proof that the amortized cost of insertion or eviction at distance d from a finger is O(log d). Our proof is inspired by Huddleston and Mehlhorn, but simplified and addressing a different data organization: we allow values to be stored at interior nodes, whereas Huddleston and Mehlhorn's trees store values only in leaves. Kaplan and Tarjan (Kaplan and Tarjan, 1996) present a purely functional variant of finger trees. The hands data structure is an implementation of fingers that is external to the tree, thus saving space, e.g., for parent pointers (Blelloch et al., 2003). We did not adopt this technique, because in a B-tree, nodes are wider, so there are fewer nodes and consequently fewer parent pointers in total. Finally, Hinze and Paterson (Hinze and Paterson, 2006) present purely functional finger trees with amortized O(1) time complexity at distance 1 from a finger. They describe caching a monoid-based measure at tree nodes, but this cannot be directly used for sliding-window aggregation. Our paper is the first to use finger trees for fast out-of-order sliding-window aggregation. The main novelty is to use and maintain position-aware partial sums.
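The cost profile of a finger can be illustrated on a plain sorted array, a deliberately simplified stand-in for a finger B-tree. Starting at the right end (the "finger") and probing with doubling steps, then binary-searching only the last step, locates a key at distance d from the end with O(log d) comparisons rather than O(log n). The function `finger_search_from_right` is our own hypothetical helper; it mirrors the walk-up-then-walk-down shape of a finger search, not FiBA's tree code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the index of the first element >= key in sorted `a`,
// searching from the right end with exponentially growing steps.
std::size_t finger_search_from_right(const std::vector<int>& a, int key) {
  const std::size_t n = a.size();
  if (n == 0 || a[n - 1] < key) return n;  // key belongs after everything
  std::size_t hi = n - 1;                  // invariant: a[hi] >= key
  std::size_t lo = 0;                      // answer lies in [lo, hi]
  std::size_t step = 1;
  while (true) {
    if (step > hi) { lo = 0; break; }      // ran past the left end
    std::size_t probe = hi - step;
    if (a[probe] < key) { lo = probe + 1; break; }
    hi = probe;                            // still >= key: keep moving left
    step *= 2;                             // O(log d) doublings in total
  }
  // Binary search the final, O(d)-sized range.
  auto first = a.begin() + static_cast<std::ptrdiff_t>(lo);
  auto last = a.begin() + static_cast<std::ptrdiff_t>(hi) + 1;
  return static_cast<std::size_t>(std::lower_bound(first, last, key) - a.begin());
}
```

The result always agrees with `std::lower_bound` over the whole array; only the number of comparisons changes, shrinking when the target is near the finger.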
7. Conclusion
FiBA is a novel algorithm for sliding-window aggregation over out-of-order streams. The algorithm is based on finger B-trees with position-aware partial aggregates. It works with any associative aggregation operator, does not restrict the kinds of out-of-order behavior, and also supports window sharing. This paper includes proofs of correctness and algorithmic complexity bounds for our new algorithm. The proofs demonstrate that FiBA strictly outperforms the prior state of the art in theory and that it matches the lower bound on algorithmic complexity for this problem. In addition, experimental results demonstrate that FiBA yields excellent throughput and latency in practice. Whereas in the past, streaming applications that required out-of-order sliding-window aggregation had to make undesirable trade-offs to reach their performance requirements, our new algorithm enables them to work out of the box in a broad range of circumstances.
References
 Abadi et al. (2005) Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag S. Maskey, Alexander Rasin, Esther Ryvkina, Nesime Tatbul, Ying Xing, and Stan Zdonik. 2005. The Design of the Borealis Stream Processing Engine. In Conference on Innovative Data Systems Research (CIDR). 277–289.
 adamax (2011) adamax. 2011. Re: Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations. http://stackoverflow.com/questions/4802038/. Retrieved Oct., 2018.
 Akidau et al. (2013) Tyler Akidau, Alex Balikov, Kaya Bekiroglu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle. 2013. MillWheel: Fault-Tolerant Stream Processing at Internet Scale. In Very Large Data Bases (VLDB) Industrial Track. 734–746.
 Akidau et al. (2015) Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J. Fernandez-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, and Sam Whittle. 2015. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. In Conference on Very Large Data Bases (VLDB). 1792–1803.
 Arasu and Widom (2004) Arvind Arasu and Jennifer Widom. 2004. Resource sharing in continuous sliding window aggregates. In Conference on Very Large Data Bases (VLDB). 336–347.
 Barga et al. (2007) Roger S. Barga, Jonathan Goldstein, Mohamed Ali, and Mingsheng Hong. 2007. Consistent Streaming Through Time: A Vision for Event Stream Processing. In Conference on Innovative Data Systems Research (CIDR). 363–373.
 Bayer and McCreight (1972) Rudolf Bayer and Edward M. McCreight. 1972. Organization and Maintenance of Large Ordered Indices. Acta Informatica 1 (1972), 173–189.
 Blelloch et al. (2003) Guy E. Blelloch, Bruce M. Maggs, and Shan Leung Maverick Woo. 2003. Space-efficient Finger Search on Degree-balanced Search Trees. In Symposium on Discrete Algorithms (SODA). 374–383.
 Bloom (1970) Burton H. Bloom. 1970. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM (CACM) 13, 7 (1970), 422–426.
 Boykin et al. (2014) Oscar Boykin, Sam Ritchie, Ian O’Connell, and Jimmy Lin. 2014. Summingbird: A Framework for Integrating Batch and Online MapReduce Computations. In Conference on Very Large Data Bases (VLDB). 1441–1451.
 Brito et al. (2008) Andrey Brito, Christof Fetzer, Heiko Sturzrehm, and Pascal Felber. 2008. Speculative out-of-order event processing with software transaction memory. In Conference on Distributed Event-Based Systems (DEBS). 265–275.
 Carbone et al. (2015) Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink: Stream and Batch Processing in a Single Engine. IEEE Data Engineering Bulletin 38, 4 (2015), 28–38.
 Carbone et al. (2016) Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, and Volker Markl. 2016. Cutty: Aggregate Sharing for User-Defined Windows. In Conference on Information and Knowledge Management (CIKM). 1201–1210.
 Chandramouli et al. (2010) Badrish Chandramouli, Jonathan Goldstein, and David Maier. 2010. High-Performance Dynamic Pattern Matching over Disordered Streams. In Conference on Very Large Data Bases (VLDB). 220–231.
 Cormen et al. (1990) Thomas Cormen, Charles Leiserson, and Ronald Rivest. 1990. Introduction to Algorithms. MIT Press.

 Guibas et al. (1977) Leo J. Guibas, Edward M. McCreight, Michael F. Plass, and Janet R. Roberts. 1977. A New Representation for Linear Lists. In Symposium on the Theory of Computing (STOC). 49–60.
 Hinze and Paterson (2006) Ralf Hinze and Ross Paterson. 2006. Finger Trees: A Simple General-purpose Data Structure. Journal of Functional Programming (JFP) 16, 2 (2006), 197–217.
 Huddleston and Mehlhorn (1982) Scott Huddleston and Kurt Mehlhorn. 1982. A new data structure for representing sorted lists. Acta Informatica 17, 2 (1982), 157–184.
 Ji et al. (2015) Yuanzhen Ji, Hongjin Zhou, Zbigniew Jerzak, Anisoara Nica, Gregor Hackenbroich, and Christof Fetzer. 2015. Quality-driven Processing of Sliding Window Aggregates over Out-of-order Data Streams. In Conference on Distributed Event-Based Systems (DEBS). 68–79.
 Kaplan and Tarjan (1996) Haim Kaplan and Robert E. Tarjan. 1996. Purely Functional Representations of Catenable Sorted Lists. In Symposium on the Theory of Computing (STOC). 202–211.
 Krishnamurthy et al. (2010) Sailesh Krishnamurthy, Michael J. Franklin, Jeffrey Davis, Daniel Farina, Pasha Golovko, Alan Li, and Neil Thombre. 2010. Continuous Analytics over Discontinuous Streams. In International Conference on Management of Data (SIGMOD). 1081–1092.
 Krishnamurthy et al. (2006) Sailesh Krishnamurthy, Chung Wu, and Michael Franklin. 2006. On-the-fly sharing for streamed aggregation. In International Conference on Management of Data (SIGMOD). 623–634.
 Li et al. (2005) Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A. Tucker. 2005. No Pane, No Gain: Efficient Evaluation of Sliding-window Aggregates over Data Streams. ACM SIGMOD Record 34, 1 (2005), 39–44.
 Li et al. (2008) Jin Li, Kristin Tufte, Vladislav Shkapenyuk, Vassilis Papadimos, Theodore Johnson, and David Maier. 2008. Out-of-order Processing: A New Architecture for High-performance Stream Systems. In Conference on Very Large Data Bases (VLDB). 274–288.
 Shein et al. (2017) Anatoli U. Shein, Panos K. Chrysanthis, and Alexandros Labrinidis. 2017. FlatFIT: Accelerated Incremental Sliding-Window Aggregation for Real-Time Analytics. In Conference on Scientific and Statistical Database Management (SSDBM). 5.1–5.12.
 Shein et al. (2018) Anatoli U. Shein, Panos K. Chrysanthis, and Alexandros Labrinidis. 2018. SlickDeque: High Throughput and Low Latency Incremental Sliding-Window Aggregation. In Conference on Extending Database Technology (EDBT). 397–408.
 Srivastava and Widom (2004) Utkarsh Srivastava and Jennifer Widom. 2004. Flexible time management in data stream systems. In Symposium on Principles of Database Systems (PODS). 263–274.
 Tangwongsan et al. (2017) Kanat Tangwongsan, Martin Hirzel, and Scott Schneider. 2017. Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time. In Conference on Distributed Event-Based Systems (DEBS). 66–77.
 Tangwongsan et al. (2015) Kanat Tangwongsan, Martin Hirzel, Scott Schneider, and Kun-Lung Wu. 2015. General Incremental Sliding-Window Aggregation. In Conference on Very Large Data Bases (VLDB). 702–713.
 Traub et al. (2018) Jonas Traub, Philipp Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl. 2018. Scotty: Efficient Window Aggregation for out-of-order Stream Processing. In Poster at the International Conference on Data Engineering (ICDE-Poster).
 Tucker et al. (2003) Peter A. Tucker, David Maier, Tim Sheard, and Leonidas Fegaras. 2003. Exploiting punctuation semantics in continuous data streams. Transactions on Knowledge and Data Engineering (TKDE) 15, 3 (2003), 555–568.
 Zaharia et al. (2013) Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized Streams: Fault-tolerant Streaming Computation at Scale. In Symposium on Operating Systems Principles (SOSP). 423–438.
Appendix A Running Time Lower Bound
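This appendix proves Theorem 2.2, establishing a lower bound on any OoO SWAG implementation. For a permutation $\pi$ on an ordered set $S$, denote by $\pi_i$, $1 \le i \le |S|$, the $i$-th element of the permutation. Let $\mathrm{dist}_i(\pi)$ be the number of elements among $\pi_1, \ldots, \pi_{i-1}$ that are greater in value than $\pi_i$, that is, $\mathrm{dist}_i(\pi) = |\{\, j < i : \pi_j > \pi_i \,\}|$. This measure coincides with our notion of out-of-order distance: if elements with timestamps $\pi_1, \pi_2, \ldots, \pi_{|S|}$ are inserted into OoO SWAG in that order, the $i$-th element has out-of-order distance $\mathrm{dist}_i(\pi)$.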
For an ordered set $S$ and $d \ge 0$, let $\Pi_d(S)$ denote the set of permutations $\pi$ on $S$ such that $\max_i \mathrm{dist}_i(\pi) \le d$, i.e., every element is out of order by at most $d$. We begin the proof by bounding the size of such a permutation set.
Lemma A.1.
For an ordered set $S$ and $d \ge 0$,
Proof.
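The base case is $|S| = 0$: the empty permutation. For nonempty $S$, let $s$ be the smallest element in $S$. Then every $\pi \in \Pi_d(S)$ can be obtained by inserting $s$ into one of the first $\min(d+1, |S|)$ indices of a suitable $\pi' \in \Pi_d(S \setminus \{s\})$. In particular, each $\pi'$ gives rise to exactly $\min(d+1, |S|)$ unique permutations in $\Pi_d(S)$. Hence, $|\Pi_d(S)| = \min(d+1, |S|) \cdot |\Pi_d(S \setminus \{s\})|$. This expands to
$$|\Pi_d(S)| = \prod_{k=1}^{|S|} \min(d+1, k),$$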
which means , completing the proof. ∎
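The insertion argument in the proof can be checked by brute force on small sets. The sketch below (our own, with hypothetical function names) enumerates all permutations of {1, ..., n}, computes each element's out-of-order distance as defined above (the number of earlier, larger elements), counts the permutations in which no element is out of order by more than d, and compares the count with the product of min(d+1, k) over k = 1, ..., n: the smallest remaining element can be placed in any of the first min(d+1, k) positions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Out-of-order distance of each element: how many earlier elements are larger.
std::vector<std::size_t> ooo_distances(const std::vector<int>& perm) {
  std::vector<std::size_t> dist(perm.size(), 0);
  for (std::size_t i = 0; i < perm.size(); ++i)
    for (std::size_t j = 0; j < i; ++j)
      if (perm[j] > perm[i]) ++dist[i];
  return dist;
}

// Number of permutations of {1..n} in which every distance is <= d.
std::uint64_t count_bounded(int n, std::size_t d) {
  std::vector<int> perm(static_cast<std::size_t>(n));
  std::iota(perm.begin(), perm.end(), 1);
  std::uint64_t count = 0;
  do {
    auto dist = ooo_distances(perm);
    if (std::all_of(dist.begin(), dist.end(),
                    [d](std::size_t x) { return x <= d; }))
      ++count;
  } while (std::next_permutation(perm.begin(), perm.end()));
  return count;
}

// Count predicted by the insertion argument: prod_{k=1..n} min(d+1, k).
std::uint64_t predicted(int n, std::size_t d) {
  std::uint64_t p = 1;
  for (int k = 1; k <= n; ++k)
    p *= std::min<std::uint64_t>(d + 1, static_cast<std::uint64_t>(k));
  return p;
}
```

For d at least n - 1 the product is n!, i.e., every permutation qualifies; for d = 0 it is 1, i.e., only the sorted order qualifies.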
We will now prove Theorem 2.2 by providing a reduction that sorts any permutation using OoO SWAG.
Proof of Theorem 2.2.
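Fix $n$ and $d$. Let $\mathcal{A}$ be an OoO SWAG implementation instantiated with the operator $x \otimes y = x$. When queried, this aggregation produces the first element in the sliding window. Now let $\pi$ be any permutation in $\Pi_d(S)$, where $S$ is an ordered set with $|S| = n$. We will sort $\pi$ using $\mathcal{A}$. First, insert the elements $\pi_1, \ldots, \pi_n$ into $\mathcal{A}$ in that order, using each element as its own timestamp. By construction, each insertion has out-of-order distance at most $d$. Then, query and evict $n$ times, reminiscent of heap sort. At this point, $\pi$ has been sorted using a total of $3n$ OoO SWAG operations.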
By a standard information-theoretic argument (see, e.g., (Cormen et al., 1990)), sorting a permutation in $\Pi_d(S)$ requires, in the worst case, $\Omega(\log |\Pi_d(S)|)$ time. There are two cases to consider: If , we have , so . Otherwise, we have and . Using Stirling's approximation, we know , which is since . In either case, . ∎
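The sorting reduction can be run against any OoO SWAG. The sketch below uses a deliberately naive stand-in, `NaiveOooSwag` (a hypothetical class backed by `std::map`, whose query simply folds the window in timestamp order), so that the example is self-contained; any correct OoO SWAG with the same insert/evict/query interface would do. With the operator that keeps its left argument, each query returns the value with the smallest timestamp, so alternating query and evict emits the input in sorted order using n inserts, n queries, and n evicts.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, deliberately simple OoO SWAG: O(log n) insert/evict,
// O(n) query. Only its interface matters for the reduction.
template <typename V>
class NaiveOooSwag {
 public:
  explicit NaiveOooSwag(std::function<V(const V&, const V&)> op) : op_(std::move(op)) {}
  void insert(std::int64_t ts, V v) { window_.emplace(ts, std::move(v)); }
  void evict() { window_.erase(window_.begin()); }  // evict the oldest timestamp
  // Aggregates in timestamp order; precondition: the window is non-empty.
  V query() const {
    auto it = window_.begin();
    V acc = it->second;
    for (++it; it != window_.end(); ++it) acc = op_(acc, it->second);
    return acc;
  }

 private:
  std::function<V(const V&, const V&)> op_;
  std::map<std::int64_t, V> window_;
};

// Sorts distinct integers with n inserts, n queries, and n evicts.
std::vector<std::int64_t> sort_via_swag(const std::vector<std::int64_t>& perm) {
  NaiveOooSwag<std::int64_t> swag(
      [](const std::int64_t& x, const std::int64_t&) { return x; });  // x (op) y = x
  for (std::int64_t x : perm) swag.insert(x, x);  // each element is its own timestamp
  std::vector<std::int64_t> out;
  for (std::size_t i = 0; i < perm.size(); ++i) {
    out.push_back(swag.query());  // smallest remaining element
    swag.evict();
  }
  return out;
}
```

Because any OoO SWAG can stand in for `NaiveOooSwag`, a SWAG that beat the lower bound per operation would yield a sort that beats the information-theoretic bound on these permutations.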
Appendix B FiBA Correctness & Complexity
Proof of Theorem 3.3.
There are two cases. If the root has no children (i.e., it is a leaf), the inner aggregate stored at the root represents the aggregation of all the values inside the root node. Otherwise, by the aggregation invariants, we have the following observations: (1) the aggregate at the right (left) finger is the aggregation of all values in the subtree that is the rightmost (leftmost) child of the root; and (2) the aggregate at the root, represented by an inner aggregate, is the aggregation of all values in the tree excluding those covered by (1). Therefore, query(), which returns leftFinger.agg ⊗ root.agg ⊗ rightFinger.agg, returns the aggregation of the values in the entire tree, in time order. ∎
Proof of Theorem 3.4.
The query() operation performs at most two ⊗ operations; it clearly runs in O(1) time.
The search cost is bounded as follows. Let $y_0$ be the node at the finger where searching begins, and recursively define $y_{i+1}$ as the parent of $y_i$. This forms a sequence of nodes on the spine along which searching takes place. Recall that MIN_ARITY is a constant. Because the subtree rooted at $y_i$ has $\Omega(\mathrm{MIN\_ARITY}^i)$ keys and the key we are searching for is at distance $d$, we know the key belongs in the subtree rooted at some $y_{i^*}$, where $i^* = O(\log d)$. Thus, it takes $O(\log d)$ steps to walk up the spine and at most another $O(\log d)$ to locate the spot in the subtree, as all leaves are at the same depth, bounding the search cost by $O(\log d)$. The rebalance cost is given by Lemma C.1 in the following section. Finally, following the aggregation invariants, a partial aggregate is affected only if it is along the search path or involved in rebalancing. Therefore, the number of affected nodes that require repairs is bounded by the search cost plus the rebalance cost. Treating as bounded by a constant, is
