# Sub-O(log n) Out-of-Order Sliding-Window Aggregation

Sliding-window aggregation summarizes the most recent information in a data stream. Users specify how that summary is computed, usually as an associative binary operator, because this is the most general known form for which it is possible to avoid naively scanning every window. For strictly in-order arrivals, there are algorithms with O(1) time per window change assuming associative operators. Meanwhile, it is common in practice for streams to have data arriving slightly out of order, for instance, due to clock drifts or communication delays. Unfortunately, for out-of-order streams, one has to resort to latency-prone buffering or pay O(log n) time per insert or evict, where n is the window size. This paper presents the design, analysis, and implementation of FiBA, a novel sliding-window aggregation algorithm with an amortized upper bound of O(log d) time per insert or evict, where d is the distance of the inserted or evicted value to the closer end of the window. This means O(1) time for in-order arrivals and nearly O(1) time for slightly out-of-order arrivals, with a smooth transition towards O(log n) as d approaches n. We also prove a matching lower bound on running time, showing optimality. Our algorithm is as general as the prior state-of-the-art: it requires associativity, but not invertibility nor commutativity. At the heart of the algorithm is a careful combination of finger-searching techniques, lazy rebalancing, and position-aware partial aggregates. We further show how to answer range queries that aggregate subwindows for window sharing. Finally, our experimental evaluation shows that FiBA performs well in practice and supports the theoretical findings.

## 1. Introduction

Stream processing is now in widespread production use in domains as varied as telecommunication, personalized advertisement, medicine, transportation, and finance. It is generally the paradigm of choice for applications that expect high throughput and low latency. Regardless of domain, nearly every stream processing application involves some form of aggregation or another, with one of the most common being sliding-window aggregation.

Sliding-window aggregation derives a summary statistic over a user-specified amount of recent streaming data. Users also define how that summary statistic is computed, usually in the form of an associative binary operator (Boykin et al., 2014), as that is the most general known form for which computation can be effectively incrementalized to avoid naïvely scanning every window. While some associative aggregation operators, such as sum, are also invertible, many, such as maximum or Bloom filters, are merely associative but not invertible.

Recent algorithmic research on sliding-window aggregation has given much attention to streams with strictly in-order arrivals. The standard interface for sliding-window aggregation supports insert, evict, and query. In the in-order setting, there are algorithms (Shein et al., 2017; Tangwongsan et al., 2017) for associative operators that take only O(1) time per window change, without requiring the operator to be invertible or commutative.

In reality, however, out-of-order streams are the norm (Akidau et al., 2013). Clock drift and disparate latency in computation and communication, for example, can cause values in a stream to arrive in a different order than their timestamps. Processing out-of-order streams is already supported in many stream processing platforms (e.g., (Akidau et al., 2013; Zaharia et al., 2013; Carbone et al., 2015; Akidau et al., 2015)). Still, in terms of performance, users who want the full generality of associative operators have to resort to latency-prone buffering or, alternatively, use an augmented balanced tree, such as a B-tree, at a cost of O(log n) time per insert or evict, where n is the window size. This stands in stark contrast with the in-order setting, especially when the streams are nearly in order. Thus, we ask whether there exists a sub-O(log n) algorithm for out-of-order streams; this paper is our affirmative answer.

This paper introduces the finger B-tree aggregator (FiBA), a novel algorithm that efficiently aggregates sliding windows on out-of-order streams and in-order streams alike. Each insert or evict takes amortized O(log d) time (see Theorem 3.4 for a more formal statement), where the out-of-order distance d is the distance from the inserted or evicted value to the closer end of the window. This complexity means O(1) time for in-order streams, nearly O(1) time for slightly out-of-order streams, and never more than O(log n) time even for severely out-of-order streams. The worst-case time for any one particular insert or evict is O(log n), which only happens in the rare case of rebalancing all the way up the tree. FiBA requires O(n) space and takes O(1) time for a whole-window query. Furthermore, it is as general as the prior state-of-the-art, supporting variable-sized windows and only requiring associativity from the operator.

Our solution can be summarized as finger B-trees (Guibas et al., 1977) with position-aware partial aggregates. Starting with the classic B-trees, we first add pointers, or fingers, to the start and end of the tree. These fingers make it possible to perform the search for the value to insert or evict in O(log d) worst-case time. Second, we adapt a specific variant of B-trees where the rebalance to fix the size invariants takes amortized O(1) time; specifically, we use B-trees with MAX_ARITY = 2·MIN_ARITY and where rebalancing happens after-the-fact (Huddleston and Mehlhorn, 1982). Third and most importantly, we develop novel position-aware partial aggregates and a corresponding algorithm to bound the cost of aggregate repairs to the cost of search plus rebalance.

The running time of FiBA is asymptotically the best possible in general. We prove a lower bound showing that for insert and evict operations with out-of-order distance up to d, the amortized cost of an operation in the worst case must be at least Ω(log d).

Furthermore, we show how FiBA can support window sharing with query time logarithmic in the subwindow size and the distance from the largest window's boundaries. Here, the space complexity is O(n), where n is the size of the largest window.

Our experiments confirm the theoretical findings and show that FiBA performs well in practice. For out-of-order streams, it is a substantial improvement over existing algorithms in terms of both latency and throughput. For strictly in-order streams (i.e., FIFO), it demonstrates constant time performance and remains competitive with specialized algorithms for in-order streams.

We hope FiBA will be used to make streaming applications less resource-hungry and more responsive for out-of-order streams.

## 2. Problem Statement: OoO SWAG

This section states the problem addressed in this paper more formally. Consider a data stream where each value carries a logical time in the form of a timestamp. Throughout, we denote a timestamped value as ⟨t, v⟩, the value v at logical time t. The examples in this paper use natural numbers for timestamps, but our algorithms do not depend on any properties of the natural numbers besides being totally ordered. For instance, our algorithms work just as well with date/time representations or with real numbers.

It is intuitive to assume that values in such a stream arrive in nondecreasing order of time (in order). However, due to clock drift and disparate latency in computation and communication, among other factors, values in a stream often arrive in a different order than their timestamps. Such a stream is said to have out-of-order (OoO) arrivals—there exists a later-arriving value that has an earlier logical time than a previously-arrived value.

Our goal in this paper is to maintain the aggregate value of a time-ordered sliding window in the face of out-of-order arrivals. To motivate our formulation below, consider the following example, which maintains the max and the maxcount, i.e., the number of times the max occurs in the sliding window.

Initially, the values arrive in the same order as their associated timestamps. The window's maximum value occurs twice, so maxcount is 2. When stream values arrive in order, they are simply appended; for instance, a newly arriving value with the latest timestamp is inserted at the end of the window.

However, when values arrive out of order, they must be inserted into the appropriate spots to keep the sliding window time-ordered. For instance, an out-of-order arrival is inserted between the two existing entries whose timestamps bracket its own.

As for eviction, stream values are usually removed from a window in order, for instance, by evicting the oldest value from the front.

Notice that, in general, eviction cannot always be accomplished by simply inverting the aggregation value. For instance, when the evicted value is the current max, the new aggregate cannot be obtained by "subtracting off" the evicted value from the current aggregation value. Instead, the algorithm needs to efficiently discover the new max, here 4, and the new maxcount, here 2.

Monoids. There are other streaming aggregations besides max and maxcount. Monoids capture a large class of commonly used aggregations (Boykin et al., 2014; Tangwongsan et al., 2015). A monoid is a triple (S, ⊗, e), where ⊗ is a binary associative operator on the set S, with e being its identity element. Notice that ⊗ only needs to be associative; it need not be commutative or invertible. For example, to express max and maxcount as a monoid over pairs ⟨m, c⟩, where m is the max and c is the maxcount, then

$$\langle m_1,c_1\rangle \otimes_{\max,\mathrm{maxcount}} \langle m_2,c_2\rangle = \begin{cases} \langle m_1,c_1\rangle & \text{if } m_1 > m_2 \\ \langle m_2,c_2\rangle & \text{if } m_1 < m_2 \\ \langle m_1,c_1+c_2\rangle & \text{if } m_1 = m_2 \end{cases}$$

Since ⊗ is associative, no parentheses are needed for repeated application. When the context is clear, we even omit ⊗, for example, writing qstu for q⊗s⊗t⊗u. This concise notation is borrowed from the mathematicians' convention of omitting explicit multiplication operators.
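As an illustration, the max/maxcount monoid can be sketched in a few lines of Python (a hypothetical `combine` helper, representing the pair ⟨m, c⟩ as a tuple; not from the paper's own code):

```python
# Identity element of the max/maxcount monoid: the aggregate of an
# empty window. Negative infinity with count 0 acts as the identity.
IDENTITY = (float("-inf"), 0)

def combine(a, b):
    """The max/maxcount monoid operator: associative, not invertible."""
    m1, c1 = a
    m2, c2 = b
    if m1 > m2:
        return (m1, c1)
    if m1 < m2:
        return (m2, c2)
    return (m1, c1 + c2)  # equal maxima: counts accumulate
```

Associativity lets repeated applications be parenthesized freely, but there is no inverse: given only the result `combine((5, 2), (4, 1)) == (5, 2)`, the operand `(4, 1)` cannot be recovered, which is why eviction cannot simply "subtract off" a value.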

OoO SWAG. This paper is concerned with maintaining an aggregation on a time-ordered sliding window where the aggregation operator can be expressed as a monoid. This can be formulated as an abstract data type (ADT) as follows:

###### Definition 2.1.

Let ⊗ be a binary operator from a monoid and e its identity. The out-of-order sliding-window aggregation (OoO SWAG) ADT maintains a time-ordered sliding window ⟨t1, v1⟩, …, ⟨tn, vn⟩, where t1 < … < tn, supporting the following operations:


• insert(t : Time, v : Agg) checks whether t is already in the window, i.e., whether there is an i such that t = ti. If so, it replaces vi by vi ⊗ v. Otherwise, it inserts ⟨t, v⟩ into the window at the appropriate location.

• evict(t : Time) checks whether t is in the window, i.e., whether there is an i such that t = ti. If so, it removes ⟨ti, vi⟩ from the window. Otherwise, it does nothing.

• query() : Agg combines the values in time order using the ⊗ operator. In other words, it returns v1 ⊗ … ⊗ vn if the window is non-empty, or e if it is empty.
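For reference, these semantics can be captured by a deliberately naive sorted-list implementation (illustrative only; it takes O(n) time per insert, evict, and query, far from FiBA's bounds, but pins down the intended behavior):

```python
import bisect

class NaiveOoOSwag:
    """Reference OoO SWAG over a monoid given by (combine, identity)."""

    def __init__(self, combine, identity):
        self.combine = combine
        self.identity = identity
        self.times = []   # sorted timestamps
        self.values = []  # values, aligned with self.times

    def insert(self, t, v):
        i = bisect.bisect_left(self.times, t)
        if i < len(self.times) and self.times[i] == t:
            # Timestamp already present: merge with the monoid operator.
            self.values[i] = self.combine(self.values[i], v)
        else:
            self.times.insert(i, t)
            self.values.insert(i, v)

    def evict(self, t):
        i = bisect.bisect_left(self.times, t)
        if i < len(self.times) and self.times[i] == t:
            del self.times[i]
            del self.values[i]
        # Otherwise: t not in the window, do nothing.

    def query(self):
        agg = self.identity
        for v in self.values:  # fold strictly in time order
            agg = self.combine(agg, v)
        return agg
```

Because the fold in `query` proceeds strictly in time order, this sketch is correct even for non-commutative operators, matching the generality the ADT demands.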

Lower Bound. How fast can OoO SWAG operations be supported? For in-order streams, the SWAG operations can be handled in O(1) time per operation (Tangwongsan et al., 2017; Shein et al., 2017). But the problem becomes more difficult when the stream has out-of-order arrivals. We prove in this paper that to handle out-of-order distance up to d, the amortized cost of an OoO SWAG operation in the worst case must be at least Ω(log d).

###### Theorem 2.2.

Let n and d be given such that n ≥ 1 and 1 ≤ d ≤ n. For any OoO SWAG algorithm, there exists a sequence of n operations, each with out-of-order distance at most d, for which the algorithm requires a total of at least Ω(n log d) time.

The proof, which appears in Appendix A, shows this in two steps. First, it establishes a sorting lower bound for permutations on n elements with out-of-order distance at most d. Second, it gives a reduction proving that maintaining the OoO SWAG is no easier than sorting such permutations.

Orthogonal Techniques. OoO SWAG operations are designed to work well with other stream aggregation techniques.

The insert(t, v) operation supports the case where t is already in the window, so it works with pre-aggregation schemes such as window panes (Li et al., 2005), paired windows (Krishnamurthy et al., 2006), cutty windows (Carbone et al., 2016), or Scotty (Traub et al., 2018). For instance, for a 5-hour sliding window that advances in 1-minute increments, the logical times can be rounded to minutes, leading to more cases where t is already in the window.
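Such rounding might look like the following (a hypothetical helper, assuming timestamps measured in seconds; not part of the paper's interface):

```python
def round_to_minute(t_seconds: int) -> int:
    # Coarsen the logical time to 1-minute granularity, so that many
    # arrivals map to the same timestamp and insert(t, v) merges them
    # via the monoid operator instead of growing the tree.
    return t_seconds - t_seconds % 60
```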

The evict(t) operation accommodates the case where t is not the oldest time in the window, so it works with streaming systems that use retractions (Abadi et al., 2005; Akidau et al., 2013, 2015; Barga et al., 2007; Brito et al., 2008; Chandramouli et al., 2010; Li et al., 2008; Zaharia et al., 2013).

Neither insert(t, v) nor evict(t) is limited to values of t that are near either end of the window, so they work in the general case, not just in cases where the out-of-order distance is bounded by buffer sizes or low watermarks.

Query Sharing. As defined above, OoO SWAG does not support query sharing. However, query sharing for different window sizes can be accommodated via a range query:


• query(t_from : Time, t_to : Time) : Agg aggregates exactly the values from the window whose times fall between t_from and t_to. That is, it returns vi ⊗ … ⊗ vj, where j is the largest index such that tj ≤ t_to and i is the smallest index such that t_from ≤ ti. If the subrange contains no values, the operation returns e.
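Against a time-ordered window stored as parallel sorted lists, this range-query semantics can be sketched as follows (a naive O(n) fold, for illustration only; the hypothetical helper names are not from the paper):

```python
import bisect

def query_range(times, values, t_from, t_to, combine, identity):
    """Aggregate, in time order, the values with timestamps in [t_from, t_to]."""
    i = bisect.bisect_left(times, t_from)   # smallest i with times[i] >= t_from
    j = bisect.bisect_right(times, t_to)    # one past largest j with times[j] <= t_to
    agg = identity
    for v in values[i:j]:
        agg = combine(agg, v)
    return agg                               # identity if the subrange is empty
```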

In these terms, the problem statement of this paper is

to design and implement efficient OoO SWAG operations as well as range-query support for arbitrary monoids.

## 3. Finger B-Tree Aggregator (FiBA)

This section introduces our algorithm gradually, giving intuition along the way. It begins by describing a basic algorithm (Section 3.1) that utilizes a B-tree augmented with aggregates. This algorithm takes O(log n) time for each insert or evict operation. Reducing the time complexity below O(log n) requires further observations and ideas. This is explored intuitively in Section 3.2 with details fleshed out in Section 3.3.

### 3.1. Basic Algorithm: Augmented B-Tree

One way to implement the OoO SWAG is to start with a classic B-tree with timestamps as keys and augment that tree with aggregates. This is a baseline implementation, which will be built upon. Even though any balanced tree could be used, we chose the B-tree because it is well-studied and has a customizable fan-out degree, providing opportunities for experimentation.
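The idea of augmenting a search tree with subtree aggregates can be sketched with an unbalanced binary search tree keyed by timestamp (illustrative only; the baseline uses a B-tree, but the aggregate-repair logic is analogous, and `max` stands in for any associative operator):

```python
class Node:
    __slots__ = ("t", "v", "agg", "left", "right")

    def __init__(self, t, v):
        self.t, self.v = t, v
        self.agg = v                     # aggregate of this node's subtree
        self.left = self.right = None

def combine(a, b):                       # any associative operator; here: max
    return a if a >= b else b

def repair(node):
    # Recompute the subtree aggregate in time order: left, node, right.
    agg = node.v
    if node.left:
        agg = combine(node.left.agg, agg)
    if node.right:
        agg = combine(agg, node.right.agg)
    node.agg = agg

def insert(root, t, v):
    """Insert ⟨t, v⟩, repairing aggregates on the entire search path --
    the per-level repair cost that makes the baseline O(log n)."""
    if root is None:
        return Node(t, v)
    if t < root.t:
        root.left = insert(root.left, t, v)
    elif t > root.t:
        root.right = insert(root.right, t, v)
    else:
        root.v = combine(root.v, v)      # equal timestamp: merge via operator
    repair(root)
    return root
```

With this arrangement, `query()` is simply `root.agg`, but every insert or evict must repair aggregates along the whole path to the root; FiBA's position-aware aggregates exist precisely to avoid that traversal.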

There are many B-tree variations. The range of permissible arity, or fan-out degree of a node, is controlled by two parameters MIN_ARITY and MAX_ARITY. While MIN_ARITY can be any integer greater than or equal to 2, most B-tree variations require that MAX_ARITY be at least 2·MIN_ARITY − 1. Hence, if arity(y), or simply a when the context is clear, denotes the arity of a node y, then a B-tree obeys the following size invariants:

• The root has arity 2 ≤ arity(root) ≤ MAX_ARITY, while every non-root node y has MIN_ARITY ≤ arity(y) ≤ MAX_ARITY.

• A node of arity a contains a − 1 timed values, and a non-leaf node of arity a has a children.

• All leaves are at the same depth.

###### Theorem 3.2.

In a classic B-tree augmented with aggregates, the operation query() costs at most O(1) time, and the operations insert(t, v) and evict(t) take at most O(log n) time.

###### Proof.

As is standard, we treat the arity of a node as bounded by a constant. The query operation and the local insert or evict visit only a single node. The search, rebalance, and repair visit at most two nodes per tree level. The work is thus bounded by the tree height, which is O(log n) since the tree is height-balanced (Bayer and McCreight, 1972; Cormen et al., 1990; Huddleston and Mehlhorn, 1982). Hence, the total cost per operation is O(log n). ∎

### 3.2. Breaking the O(log n) Barrier

The basic algorithm just described supports OoO SWAG operations in O(log n) time using an augmented classic B-tree. To improve upon this time complexity, we now discuss the bottlenecks in the basic algorithm and outline a plan to resolve them.

In the basic algorithm, the insert(t, v) and evict(t) operations involve four steps: (1) search for the node where t belongs; (2) locally insert or evict; (3) rebalance to repair size invariants; and (4) repair remaining aggregation invariants. If one treats arity as constant, the local insertion or eviction step takes constant time, as does the query() operation. But each of the steps for search, rebalance, and repair takes up to O(log n) time. Hence, these are the bottleneck steps and will be improved upon as follows:

1. By maintaining “fingers” to the left-most and right-most leaves, we will reduce the search complexity to O(log d), where d is the distance to the closer end of the sliding-window boundary. This means that in the FIFO or near-FIFO case, the search complexity will be constant.

2. By using an appropriate MAX_ARITY and a somewhat lazy strategy for rebalancing, we will make sure that rebalancing takes no more than constant time in the amortized sense. This means that for any operation that affects the tree structure, the cost to restore the proper tree structure amounts to a constant per operation, regardless of out-of-order distance.

3. By introducing position-dependent aggregates, we will ensure that repairs to the aggregate values are made only to nodes along the search path or involved in restructuring. This means that the repairs cost no more than the cost of search and rebalance.

We combine the above ideas into a novel sub-O(log n) algorithm for the OoO SWAG. Below, we describe intuitively how these ideas are implemented, leaving detailed algorithms and proofs to Section 3.3.

Sub-O(log n) Search. In classic B-trees, a search starts at the root and ends at the node being searched, henceforth called y. Often, y is a leaf, so the search visits O(log n) nodes. However, instead of starting at the root, one can start at the left-most or right-most leaf in the tree. This requires pointers to the left-most and right-most leaves, henceforth called the left and right fingers (Guibas et al., 1977). In addition, we keep a parent pointer at each node. Hence, the search can start at the nearest finger, walk up to the nearest common ancestor of the finger and y, and walk down from there to y. The resulting algorithm runs in O(log d) time, where d is the distance from the nearest end of the window, or more precisely, the number of timed values from y to the nearest end of the window.
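The finger-search idea can be illustrated on a flat sorted array of timestamps: gallop from the nearest end, doubling the probe span until it covers the target, then binary-search within that span. This sketch (a hypothetical helper handling only the right finger; the left-finger case is symmetric) finds the insertion position in O(log d) comparisons, where d is the target's distance from that end:

```python
from bisect import bisect_left

def finger_search_from_right(ts, t):
    """Insertion index for timestamp t in the sorted list ts, searching
    from the right end in O(log d) time."""
    n = len(ts)
    span = 1
    # Gallop: double the span until ts[n - span] <= t or the span
    # covers the whole list. This takes O(log d) doublings.
    while span < n and ts[n - span] > t:
        span *= 2
    lo = max(0, n - span)
    # Binary search within the span just located, also O(log d).
    return bisect_left(ts, t, lo, n)
```

A tree-shaped analogue of the same walk, up from the finger to the nearest common ancestor and back down, is what FiBA performs on its finger B-tree.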

Sub-O(log n) Rebalance. Insertions and evictions can cause nodes to overflow or underflow, thus violating the size invariants. There are two popular strategies that address this: rebalancing either before or after the fact. The before-the-fact strategy ensures that ancestors of the affected node are not at risk of overflow or underflow by preventive rebalancing, so that the arity a is at least one further away from the threshold required by the size invariants (e.g., (Cormen et al., 1990)). The after-the-fact strategy first performs the local insert or evict step, then repairs any resulting overflow or underflow to ensure the size invariants hold again by the end of the entire insert or evict operation. We adopt the after-the-fact strategy, which has been shown to take amortized constant time (Huddleston and Mehlhorn, 1982) as long as MAX_ARITY ≥ 2·MIN_ARITY. For simplicity, we use MAX_ARITY = 2·MIN_ARITY. The amortized cost is O(1) because rebalancing rarely goes all the way up the tree. The worst-case cost is O(log n), bounded by the tree height.

Sub-O(log n) Repair. The basic algorithm stores at each node y the up-aggregate, i.e., the partial aggregate of the subtree rooted at y. This is problematic, because it means that an insertion or eviction at a node y, usually a leaf, affects the partial aggregates stored in all ancestors of y, that is, the entire path up to the root. To circumvent this issue, we need an arrangement of aggregates that can be repaired by traversing to a finger, without always traversing to the root. For this, we make each node store the kind of partial aggregate suitable for its position in the tree. Furthermore, because the root no longer contains the aggregate of the whole tree, we will ensure that query() can be answered by combining partial aggregates at the left finger, the root, and the right finger.

To meet these requirements, we define four kinds of partial aggregates in Figure 2. As illustrated in Figure 3, they are used in a B-tree according to the following aggregation invariants:


• Non-spine nodes store the up-aggregate, the partial aggregate of the node's entire subtree. Such a node is neither a finger nor an ancestor of a finger. This aggregate must be repaired whenever the subtree below it changes. Figure 3(A) shows nodes with up-aggregates in white, light blue, or light green. For example, the center child of the root contains the aggregate hijklmn, comprising its entire subtree.

• The root stores the inner aggregate. This aggregate is only affected by changes to the inner part of the tree, and not by changes below the left-most or right-most child of the root. Figure 3(A) shows the inner parts of the tree in white; the root, shown in gray, stores the aggregate ghijklmno.

• Non-root nodes on the left spine store the left aggregate. For a given spine node y, the left aggregate encompasses all nodes under y except for those under y's left-most child. When a change occurs below the left-most child of the root, the only aggregates that need to be repaired are those on a traversal up the left spine and then down to the left finger. Figure 3(A) shows the left spine in dark blue and nodes affecting it in light blue. For example, the node in the middle of the left spine contains the aggregate cdef, comprising the left subtree of the root except for the left finger.

• Non-root nodes on the right spine store the right aggregate, which is symmetric to the left aggregate: for a spine node y, it encompasses all nodes under y except for those under y's right-most child. When a change occurs below the right-most child of the root, only aggregates on a traversal to the right finger are repaired. Figure 3(A) shows the right spine in dark green and nodes affecting it in light green. For example, the node in the middle of the right spine contains the aggregate qst, comprising the right subtree of the root except for the right finger.

### 3.3. Using Finger B-Trees

This section describes an algorithm that implements the OoO SWAG using a finger B-tree augmented with aggregates. It achieves sub-O(log n) time complexity by maintaining the size invariants from Section 3.1 and the aggregation invariants from Section 3.2.

The algorithmic complexity analysis will account for the cost of split, merge, or move operations by counting coins. Specifically, the analysis counts the number of split, merge, or move steps of an insert or evict operation as spent coins. Coins can be imagined as being stored at tree nodes, so they can be used to pay for split, merge, or move operations later. Throughout this paper, coins are visualized as little golden circles next to tree nodes. Sometimes, coins must be added or removed from the outside to make up the difference between spent coins and coins in the tree before and after each step. We refer to these coins as being billed or refunded. The key result of the proof will be that the number of billed coins never exceeds 2 for any insert(t, v) or evict(t); hence rebalancing has amortized constant time complexity.

Figures 3–6 show concrete examples covering all the interesting cases of the algorithm. Each state, for instance (A), shows a tree with aggregates and coins. Each step, for instance A→B, shows an insert or evict, illustrating how it affects the tree, its partial aggregates, and the coins.

• In Figure 3, Step A→B is an in-order insert without rebalancing, which only affects the aggregate at a single node, the right finger.

• Step B→C is an out-of-order insert without rebalancing, affecting aggregates on a walk to the right finger.

• Step C→D is an in-order evict without rebalancing, affecting the aggregate at a single node, the left finger.

• Step D→E is an out-of-order insert into a node that is already at MAX_ARITY, causing an overflow; rebalancing splits the node.

• Step E→F is an evict from a node with arity MIN_ARITY, causing the node to underflow; rebalancing merges it with its neighbor.

• In Figure 6, Step G→H is an insert that causes nodes to overflow all the way up to the root, causing a height increase followed by a split of the old root. This affects aggregates on all split nodes and on both spines.

• In Figure 6, Step I→J is an evict that first causes an underflow that is fixed by a merge, and then an underflow at the next level, where the neighbor node is too big to merge. The algorithm repairs the size invariant by moving a child and a timed value from the neighbor. This step affects aggregates on all nodes involved in rebalancing plus a walk to the left finger.

• In Figure 6, Step K→L is an evict that causes nodes to underflow all the way up to the root, causing a height decrease that eliminates the old, empty root. This affects aggregates on all merged nodes and on both spines.