Cloud operators require fast and accurate single-device and network-wide detection of Heavy Hitters (HH) (the most frequent flows) and of Hierarchical Heavy Hitters (HHH) (the most frequent subnets) to attain real-time visibility of their traffic. These capabilities are essential building blocks in network functions such as load balancing (LBSigComm; LoadBalancing; SilkRoad; LBConext), traffic engineering (TrafficEngeneering; TrafficEngneering2), and attack mitigation (DDoSwithHHH; DDOSwithHHH3; HHHSwitch; DDoSwithHHH2; PseudoWindowHHH).
Quickly identifying changes in the HH and HHH is a key challenge (Change1) and can have a dramatic impact on the performance of such applications. For example, faster detection of HH flows allows load-balancing and traffic-engineering solutions to respond to traffic spikes swiftly. For attack mitigation systems, quicker and more accurate detection of HHH subnets means that less attack traffic reaches the victim. This is particularly important for combating Distributed Denial of Service (DDoS) attacks on cloud services, as they become a growing concern with the increasing number of connected devices (i.e., the Internet of Things) (DynAttack; DDoSReport).
In this work, we show that sliding windows are faster than interval-based measurements in detecting new (hierarchical) heavy hitters. Unfortunately, the idea of using a sliding window for HHH was previously dismissed, as the existing sliding-window algorithms were “markedly slower and less space-efficient in practice”, to quote (HHHMitzenmacherArXiv). Intuitively, this is because the sampling methods used for accelerating interval methods do not naturally extend to sliding windows, even for the simpler HH problem. As a result, the merits of sliding windows have not been properly evaluated. Moreover, sliding windows do not provide network-wide measurement capabilities, as opposed to interval approaches (HHTagging; Sigcomm2018Networkwide; RexfordNetworkwide; Everflow). Consequently, most applications that use HH or HHH rely on interval-based measurements (DDoSwithHHH2; DDOSwithHHH3; DDOSwithHHH4; DDoSwithHHH).
Contributions. Our goal is to make sliding windows practical for network applications. Accordingly, we focus on the fundamental HH and HHH problems in both single-device and network-wide measurement scenarios. We then introduce the Memento family of four algorithms: one for each problem (i.e., HH and HHH) in each measurement scenario (i.e., single-device and network-wide). These algorithms are rigorously analyzed and provide worst-case accuracy guarantees. Moreover, in the network-wide setting, we maximize the accuracy guarantee given a per-packet control bandwidth budget.
Using extensive evaluations on real packet traces, we show that the Memento algorithms achieve speedups of up to 14× in HH and up to 273× in HHH when compared to existing sliding-window solutions. We also show that they match the speed of the fastest interval-based algorithm (RHHH). Our algorithms detect emerging (hierarchical) heavy hitters consistently faster than interval-based approaches, and their accuracy is similar to that of slower sliding-window solutions.
Next, we implement a proof-of-concept network-wide HH and HHH measurement system. The controller uses our network-wide algorithms, and the measurement points are implemented on top of the popular HAProxy cloud load-balancer, which we extended with capabilities to rate-limit traffic from entire subnets. We evaluate the achievable accuracy given a per-packet bandwidth budget for reporting measurement data to the controller. We introduce new communication methods and compare them with a traditional approach. We create an HTTP flood attack from 50 subnets and show that the detection time is near-optimal while using a bandwidth budget of 1 byte per packet. For the same budget, our methods exhibit a reduction of up to 37× in the number of undetected flood requests compared to the alternative. Finally, we open-source the Memento algorithms and the HAProxy cloud load-balancer extension (TRCode).
Streaming algorithms (muthukrishnan2005data) are designed to process a stream (sequence) of elements (in our case, packets) while analyzing the underlying data. The main challenge of these algorithms is the sheer volume of the data that they are required to process, motivating space-efficient solutions that process elements extremely fast.
One of the most studied streaming problems is that of identifying the Heavy Hitters (HH) (i.e., elephant flows) – the elements that frequently appear in the data. For instance, Space Saving (SS) (SpaceSavings) is a popular HH algorithm. SS utilizes a set of counters, each associated with a key (flow identifier) and a value. Whenever a packet arrives, SS increments the value of its flow’s counter if such a counter exists. Otherwise, if there is a free counter, it is allocated to the flow with a value of 1. Finally, if no available counter exists, SS replaces the identifier of the flow with the smallest value with that of the current flow and increments its value. For example, assume that the minimal counter is associated with flow x and has a value of v, while flow y does not have a counter. If a packet of flow y arrives, we reallocate x’s counter to y and set its value to v + 1, leaving x without a counter. When queried for the value of flow x, SS returns the value of the counter associated with x, or the value of the minimal counter if there is no counter for x.
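The replacement logic above can be illustrated with a short sketch; this is a minimal dictionary-based rendition for clarity, whereas production implementations use an O(1) "stream summary" structure:

```python
class SpaceSaving:
    """Minimal Space Saving sketch: c counters, each a (flow -> value) entry.
    Estimates never undercount the true frequency (one-sided error)."""

    def __init__(self, c):
        self.c = c
        self.counters = {}  # flow id -> estimated count

    def add(self, flow):
        if flow in self.counters:
            self.counters[flow] += 1      # existing counter: increment
        elif len(self.counters) < self.c:
            self.counters[flow] = 1       # free counter: allocate with value 1
        else:
            # no free counter: take over the minimal counter and increment it
            victim = min(self.counters, key=self.counters.get)
            self.counters[flow] = self.counters.pop(victim) + 1

    def query(self, flow):
        if flow in self.counters:
            return self.counters[flow]
        # flows without a counter are bounded by the minimal counter value
        return min(self.counters.values(), default=0)
```

With two counters and the stream a, a, b, c: the arrival of c evicts b (the minimal counter, value 1) and continues from its value, so c is estimated as 2.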
SS runs on intervals, i.e., it estimates the flow sizes from the beginning of the measurement, and it is often reset to keep its data fresh (DDOSwithHHH3). Another way to analyze only the recent data is to use a sliding-window algorithm (DatarGIM02), in which an answer to a query only reflects the last W packets. WCSS (WCSS) extends Space Saving to sliding windows and achieves constant update and query time. Unfortunately, WCSS is too slow to keep up with line rates. We expand on WCSS and how it is generalized by Memento in Section 4.1.
Hierarchical Heavy Hitters (HHH) are a generalization of the HH problem in which we identify frequent IP networks. That is, rather than looking for the large flows, we look for the networks whose aggregated volume exceeds a predetermined threshold. To that end, MST (HHHMitzenmacher) proposed to utilize SS for tracking all networks. Specifically, it uses one SS instance for each network size, and whenever a packet arrives, it computes all its prefixes and updates the corresponding instances (H updates per packet, where H is the size of the hierarchy). MST has two main drawbacks: first, it makes multiple SS updates, while even a single update may be too slow for keeping up with line rates; second, it solves the problem on intervals rather than on sliding windows. To alleviate the first problem, RHHH (RHHH) proposes to draw a single random integer i uniformly between 1 and V (for a parameter V ≥ H). If i ≤ H, then RHHH makes a single SS update to the i’th prefix, and otherwise it ignores the packet. For example, i > H means the packet is ignored, while i = 2 leads the algorithm to feed the packet’s second prefix into the relevant SS instance. While RHHH is fast enough to keep up with the line rate, its approach does not seem to extend to sliding windows easily, a gap we close in this paper.
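RHHH's randomized update can be sketched as follows for a byte-granularity source hierarchy; the prefix layout and the use of plain counters as stand-ins for the SS instances are our assumptions for illustration:

```python
import random
from collections import Counter

H = 5   # hierarchy size for a byte-granularity source-IP hierarchy
V = 10  # sampling parameter, V >= H; expected H/V updates per packet

def prefixes(src_ip):
    """The H generalizations of a source IP, most specific first."""
    o = src_ip.split(".")
    return [".".join(o[:kept] + ["*"] * (4 - kept)) for kept in range(4, -1, -1)]

# One HH instance per hierarchy level; plain Counters stand in for the
# Space Saving instances of the real algorithm.
instances = [Counter() for _ in range(H)]

def rhhh_update(src_ip):
    i = random.randint(1, V)   # a single uniform draw per packet
    if i <= H:                 # otherwise the packet is ignored
        instances[i - 1][prefixes(src_ip)[i - 1]] += 1
```

Each packet thus costs one random draw and at most one counter update, regardless of the hierarchy's size.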
3. Why sliding windows?
In this paper, we argue that once a new heavy hitter emerges, the sliding window method identifies it most quickly and accurately. Therefore, network applications that capitalize on sliding windows can potentially react faster to changes in the traffic. For simplicity, we consider accurate measurements, but the results are also valid for approximate measurements.
Window vs. interval. We start by comparing sliding windows to the Interval method that is commonly used in HHH-based DDoS mitigation systems (DDoSwithHHH2; DDOSwithHHH3; DDoSwithHHH; PseudoWindowHHH). As depicted in Figure (a), the Interval method relies on sequential interval measurements. Usually, the measurement data is available only at the end of each measurement interval, whereas in the improved Interval method it is accessible throughout each measurement period. There are two possible failure modes, namely: failing to detect a new heavy hitter (false negative), or falsely declaring a heavy hitter (false positive). Algorithms that follow the (improved) Interval method would either have false positives or false negatives. In contrast, sliding windows can avoid both errors. To show this, we start with the following definitions:
Definition 3.1 (Window Frequency).
We denote by f_x the window frequency of flow x, i.e., the number of packets transmitted by x among the last W packets.
Definition 3.2 (Normalized Window Frequency).
We denote by f_x/W the normalized window frequency of flow x, i.e., the fraction of x’s packets among the last W packets.
Next, window heavy hitters are flows whose normalized window frequency is larger than a user-defined threshold:
Definition 3.3 (Window Heavy Hitter).
Flow x is a window heavy hitter if its normalized window frequency is larger than θ, where 0 < θ ≤ 1 is a user-defined threshold.
Window optimality. The optimal detection point for new window heavy hitters is simply once their normalized window frequency is above a user-defined threshold. Reporting a flow earlier is wrong (false positive), and reporting it afterwards is (too) late. This means that sliding window measurements, by definition, have an optimal detection time.
Motivation. We motivate the definition for window heavy hitters with an experimental scenario where a new flow appears during the measurement and consumes, at a constant rate, a larger-than-the-threshold portion of the traffic after its initial appearance. We measure how long it takes for each measurement method to identify the new heavy hitter and evaluate the following measurement methods:
(i) Interval. The window frequency of each flow is estimated at the end of every measurement. This method represents limitations of sampling techniques (e.g., (BUS; RHHH)) that require time to converge and thus cannot provide estimates during the measurement. (ii) Improved interval. Same as Interval, but flow frequencies are estimated upon the arrival of each packet. This represents the best-case scenario for the Interval method. (iii) Window. Sliding window, where frequencies are estimated upon packet arrivals.
Figure (b) plots the detection time for each method as a function of the normalized frequency of the new heavy hitter. Intuitively, larger heavy hitters are detected faster, because less time passes before their normalized window frequency reaches the threshold. Indeed, the sliding window approach is always faster than the Interval and improved Interval methods: the gap is largest when the frequency is close to the detection threshold, and sliding windows remain quicker throughout the tested range. The Interval method is the slowest, as it estimates frequencies only at the end of the measurement. Thus, such a usage pattern is undesired for systems such as load balancing and attack mitigation.
4. Sliding window algorithms
Our next step is to make sliding windows accessible to cloud operators. We do so by first introducing new single-device algorithms that are significantly faster than existing techniques, and then extending them to efficient network-wide algorithms that combine information from many measurement points to obtain a global perspective.
4.1. Heavy Hitters on Sliding Windows
Our goal is to produce faster sliding-window algorithms. Intuitively, one can accelerate a heavy hitter algorithm by sub-sampling the packets. That is, we would like to sample packets with probability τ, use an HH algorithm with a window size of Wτ packets, and then multiply its estimations by a factor of τ⁻¹. Unfortunately, this does not yield the desired outcome, as the number of samples from the window varies, whereas sliding-window HH algorithms are designed for fixed-sized windows. Specifically, since the actual number of samples from the W-sized window is distributed Bin(W, τ), this approach results in an additional error of order √(W/τ) packets in the size of the reference window. Since we are interested in small values of τ to achieve a speedup (see Section 6.3), this approach results in a considerable error.
Memento overview. The key idea behind Memento is to decouple the computationally expensive operation of updating a packet (Full update) from the lightweight operation of sliding the window (Window update). Specifically, for each packet, Memento performs the Full update operation with probability τ; otherwise, it makes the quicker Window update.
Therefore, Memento alternates between the fast Window updates and the slower Full updates. Full updates include (1) forgetting outdated data and (2) adding a new item to the measurement data structure. On the other hand, Window updates only involve (1) forgetting outdated data. That is, Memento maintains a W-sized window, but most of the packets within that window are missing. Thus, it attains a speedup but avoids the additional error that is caused by uniform samples. The concept is exemplified in Figure (a).
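A minimal sketch of this decoupling follows; it stores sampled packets explicitly for clarity, whereas the real Memento keeps compact frame/block state (described below) instead of individual packets:

```python
import random
from collections import deque

class MementoSketch:
    """Illustrative sketch of Memento's key idea: with probability tau,
    perform a Full update (forget outdated data AND insert the packet);
    otherwise perform a cheap Window update (forget outdated data only)."""

    def __init__(self, window, tau):
        self.W, self.tau = window, tau
        self.samples = deque()   # (arrival index, flow id) pairs in the window
        self.counts = {}         # flow id -> sampled count within the window
        self.n = 0               # total packets processed so far

    def update(self, flow):
        self.n += 1
        # Window update: drop samples that slid out of the last W packets
        while self.samples and self.samples[0][0] <= self.n - self.W:
            _, old = self.samples.popleft()
            self.counts[old] -= 1
            if not self.counts[old]:
                del self.counts[old]
        if random.random() < self.tau:   # Full update
            self.samples.append((self.n, flow))
            self.counts[flow] = self.counts.get(flow, 0) + 1

    def estimate(self, flow):
        # scale the in-window sampled count back by tau^-1
        return self.counts.get(flow, 0) / self.tau
```

Note that the window always spans the last W packets of the stream, not the last W samples, which is exactly what avoids the variable-window-size error discussed above.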
Implementation. For simplicity, we built Memento on top of an existing sliding-window HH algorithm. This makes it easier to implement, verify, and then compare with the current approaches. We picked WCSS as the underlying algorithm (WCSS), but our approach is general and works with other window algorithms as well (e.g., (FAST; HungAndTing)). Intuitively, when τ = 1, Memento becomes identical to WCSS, as it performs a Full update for each packet.
As detailed in Algorithm 1, given an error parameter ε such that 0 < ε ≤ 1, Memento divides the stream into frames of size W, where each frame is then further partitioned into 4/ε equal-sized blocks. Intuitively, Memento keeps count of how many times each item arrived during the last frame, and each time this counter reaches a multiple of the block size, it records this event as an overflow. Memento uses a queue of queues b, which contains one queue for each block that overlaps with the current window. Each queue in b contains an ordered list of items that overflowed in the corresponding block. When a block ends, we remove the oldest queue from b, as it no longer overlaps with the window. Additionally, we append a new empty queue to b. Note that Memento does not count accurately, but instead uses a Space Saving (SpaceSavings) instance to approximately count the in-frame frequency. Space Saving (SS) is an algorithm that uses a fixed set of counters to provide frequency estimations and to find the heavy hitters over a stream or interval. Allocated with C counters, it guarantees that the additive error is bounded by W/C (when the number of packets is W, as in Memento). We show that despite the approximate count within frames, Memento keeps the overall error bounded as guaranteed. Intuitively, Memento provides the machinery to both speed up Space Saving and extend it to sliding windows. Finally, the SS instance is cleared at the end of each frame.
The frequency of an item is estimated by multiplying its number of overflows by the block size and adding the remainder of its appearance count as reported by the in-frame Space Saving instance. In (HHHMitzenmacher), MST has a one-sided error, and thus we choose to keep the error one-sided as well for comparison purposes. To do so, Memento conservatively adds the maximal in-frame error to each item’s estimation. It then multiplies the result by τ⁻¹, as a Full update is performed on average once per τ⁻¹ packets (where τ is the sampling probability). A table counts the number of overflows of each item for quick frequency queries. Memento de-amortizes the update of this table, achieving a constant worst-case time. To that end, when processing a packet, Memento pops (at most) one flow from the queue of the oldest block (see lines 8-11). This ensures that the worst-case update time is constant, as we are guaranteed that by the end of the block its queue will be empty and the table will be fully updated. Finally, for finding the heavy hitters themselves (rather than just estimating flow sizes), Memento iterates over the flows with an entry in the overflow table and estimates their sizes. Since every heavy hitter must overflow in the window, we are guaranteed that it will have such an entry.
4.2. Extending to Hierarchical Heavy Hitters
Hierarchical heavy hitters monitor subnets and flow aggregates in addition to individual flows. We start by introducing existing approaches for HHH measurements on sliding windows.
Existing approaches. In MST (HHHMitzenmacher), multiple HH instances are used to solve the HHH problem. This design trivially extends to sliding windows by replacing the HH building blocks with window algorithms (e.g., WCSS (WCSS)). This was proposed by (HHHMitzenmacher) but dismissed as impractical. Replacing the underlying algorithms with Memento is slightly better, as we can perform Window updates on most instances. Unfortunately, the update complexity remains linear in the size of the hierarchy, which may still be too slow. In contrast, H-Memento achieves constant-time updates, matching the complexity of interval algorithms (RHHH). Another natural approach comes from the RHHH (RHHH) algorithm. RHHH shares the same data structure as MST but randomly updates at most a single HH instance, which allows for constant-time updates. Additionally, it makes small changes to the query procedure to compensate for the sampling error and guarantees that (with high probability) it has no false negatives. This method does not work for sliding windows, as each HH instance is updated a varying number of times and thus monitors a possibly different window.
H-Memento’s overview. In H-Memento, we differ from the lattice structure of RHHH and MST. That is, we maintain a single large Memento instance and use it to monitor all the sampled prefixes. Therefore, we use just one sliding window to measure all subnets, which the underlying Memento slides in constant time. This approach also has engineering benefits such as code reuse, simplicity, and maintainability. The update procedure of H-Memento is illustrated in Figure (b). Next, we proceed with notations and definitions for the HHH problem, which we later use to detail H-Memento.
Table 1. Notations.
S | The packet stream.
N | Current number of packets (the stream length).
W | The window size.
H | Size of the hierarchy.
τ | Sampling ratio for HHH, τ ≤ 1.
X_i^q | Variable for the i’th appearance of a prefix q.
Ŝ_i | Sampled prefixes with id i.
Ŝ | Sampled prefixes from all ids.
U | Domain of the fully specified items.
ε, ε_s, ε_a | Overall, sample, and algorithm’s error guarantee.
δ, δ_s, δ_a | Overall, sample, and algorithm’s confidence.
f_{q|P} | Conditioned frequency of q with respect to P.
G(q|P) | Subset of P with the closest prefixes to q.
f_q | Frequency of prefix q.
f_q^+, f_q^- | Upper and lower bounds for f_q.
Z_{1-δ} | Inverse CDF of the normal distribution N(0, 1) at 1 - δ.
B | Per-packet control bandwidth budget.
I_H | The minimal header size (in bytes).
I_P | Bytes required to report a packet.
k | Number of measurement points.
m | Number of samples in each report.
ε_D | Overall error in network-wide settings.
HHH notations and definitions. For brevity, Table 1 summarizes the notations used in this work. We consider IP prefixes (e.g., 181.7.20.*). A prefix without a wildcard (e.g., 181.7.20.6) is called fully specified, and U denotes the domain of the fully specified items. A prefix p generalizes another prefix q if p, with its wildcards removed, is a prefix of q. For example, 181.7.*.* and 181.7.20.* generalize the (fully specified) 181.7.20.6. The parent of a prefix is its longest generalizing prefix; e.g., 181.7.20.* is 181.7.20.6’s parent. Definition 4.1 formalizes this concept.
Definition 4.1 (Generalization).
Let p, q be prefixes. We say that p generalizes q, and denote p ⪯ q, if in each dimension i, p_i = q_i or p_i = ∗. We denote by H_p the set of fully specified items generalized by p. Similarly, the set of every fully specified item that is generalized by a set of prefixes P is denoted by H_P ≜ ∪_{p∈P} H_p. Moreover, we denote p ≺ q if p ⪯ q and p ≠ q.
Definition 4.1 also covers the more general multidimensional case. For example, we can consider tuples of the form (source IP, destination IP). In that case, fully specified “prefixes” are fully determined in both dimensions. Also, observe that “prefixes” now have two parents; e.g., ⟨181.7.20.*, 208.67.222.222⟩ and ⟨181.7.20.6, 208.67.222.*⟩ are both parents of ⟨181.7.20.6, 208.67.222.222⟩.
The size of the hierarchy (H) is the number of different prefixes that generalize a fully specified prefix. Next, given a set of prefixes P, we denote by G(q|P) the set of prefixes in P that are most closely generalized by the prefix q. That is, G(q|P) ≜ {p ∈ P : q ≺ p ∧ ¬∃p′ ∈ P : q ≺ p′ ≺ p}.
For example, consider the prefix q = 181.7.*.* and the set P = {181.7.20.*, 181.7.20.6, 208.*.*.*}; then G(q|P) = {181.7.20.*}, as 181.7.20.6 is separated from q by 181.7.20.* ∈ P, and 208.*.*.* is not generalized by q. The window frequency f_q of a prefix q is the total number of packets within the window that are generalized by q, i.e., f_q ≜ Σ_{x∈H_q} f_x. Note that each packet is generalized by H different prefixes. This motivates us to look at the conditioned (residual) frequency that a prefix q adds to a set of already selected prefixes P. The conditioned frequency is defined as f_{q|P} ≜ Σ_{x ∈ H_q \ H_{G(q|P)}} f_x.
We denote by X_q the number of times prefix q is sampled in the window; X_q^+ is an upper bound on X_q and X_q^- is a lower bound. The notation τ_q stands for the sampling rate of each specific prefix. From these we derive: f̂_q, an estimator for q’s frequency; f_q^+, an upper bound for q’s frequency; and f_q^-, a lower bound for q’s frequency.
We now define the depth of a prefix (or a prefix tuple). Fully specified items are of depth 0, their parents are of depth 1, and more generally, the parent of an item with depth d is of depth d + 1. L denotes the maximal depth; observe that the number of distinct depths may be lower than the hierarchy’s size H (e.g., in 2D byte-hierarchies, L = 8 while H = 25). Hierarchical heavy hitters are calculated by iterating over all fully specified items (depth 0). If their frequency is larger than a threshold of θW, we add them to the set HHH_0. Then, we go over all the items with depth 1 and, if their conditioned frequency with regard to HHH_0 is above θW, we add them to the set. We name the resulting set HHH_1 and repeat the process L times, until the set HHH_L contains the (exact) hierarchical heavy hitters. Unfortunately, we need space that is linear in the stream size to calculate exact HHH (and even plain heavy hitters) (TCS-002). Hence, as done by previous work (RHHH; HHHMitzenmacher; Cormode2003; Cormode2004; CormodeHHH), we solve approximate HHH.
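The bottom-up computation above can be sketched for a one-dimensional byte hierarchy; the helper names are ours:

```python
def generalizes(p, q):
    """True iff prefix p generalizes prefix q (1D, byte granularity)."""
    return all(a == b or a == "*" for a, b in zip(p.split("."), q.split(".")))

def closest(q, P):
    """G(q|P): the prefixes in P that q generalizes, with no closer
    element of P in between."""
    below = [p for p in P if p != q and generalizes(q, p)]
    return {p for p in below
            if not any(r != p and generalizes(r, p) for r in below)}

def conditioned_freq(q, P, flow_freq):
    """f_{q|P}: packets generalized by q but by no prefix in G(q|P)."""
    G = closest(q, P)
    return sum(f for x, f in flow_freq.items()
               if generalizes(q, x) and not any(generalizes(p, x) for p in G))

def exact_hhh(flow_freq, theta, window):
    """Bottom-up exact HHH: from depth 0 (fully specified) up to the root."""
    hhh = set()
    for kept in range(4, -1, -1):  # 4 kept octets (depth 0) ... 0 (the root)
        level = {".".join(x.split(".")[:kept] + ["*"] * (4 - kept))
                 for x in flow_freq}
        for q in level:
            if conditioned_freq(q, hhh, flow_freq) > theta * window:
                hhh.add(q)
    return hhh
```

For instance, with flows 10.0.0.1 (60 packets) and 10.0.0.2 (50 packets) in a window of 120 and θ = 0.4, both flows are HHH, but their shared prefixes are not: their conditioned frequencies are 0 once the two flows are selected.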
A solution to the approximate HHH problem is a set of prefixes that satisfies the Accuracy and Coverage conditions (Definition 4.2). Here, Accuracy means that the estimated frequency of each prefix is within acceptable error bounds and Coverage means that the conditioned frequency of prefixes not included in the set is below the threshold. This does not mean that the conditioned frequency of prefixes that are included in the set is above the threshold. Thus, the set may contain a small number of subnets misidentified as HHH (false positives).
Definition 4.2 (Approximate HHHs).
An algorithm solves (ε, δ)-Approximate Window Hierarchical Heavy Hitters if it returns a set of prefixes P that, for an arbitrary run of the algorithm, satisfies:
Accuracy: If q ∈ P, then Pr[|f̂_q − f_q| ≤ εW] ≥ 1 − δ.
Coverage: If q ∉ P, then Pr[f_{q|P} < θW] ≥ 1 − δ.
H-Memento’s full description. A pseudo-code for H-Memento is given in Algorithm 2. The output method performs the HHH set calculation as explained for exact HHH. The calculation yields an approximate result as we only have an approximation for the frequency of each prefix. Thus, we conservatively estimate conditioned frequencies.
For two dimensions, we use the inclusion-exclusion principle (Definition 4.3) to avoid double counting.
Definition 4.3 (Greatest lower bound).
Denote by glb(h, h′) the greatest lower bound of h and h′: glb(h, h′) is the unique common descendant of h and h′ such that every common descendant of h and h′ is also generalized by it. If h and h′ have no common descendants, glb(h, h′) = ⊥.
A pseudo-code for the update method is given in Algorithm 2, which is the same for one and two dimensions. The difference between them is encapsulated in the calcPred method, which uses Algorithm 3 for one dimension and Algorithm 4 for two. In two dimensions, the conditioned-frequency estimate is first set in Line 8 of Algorithm 2. Then, we remove previously selected descendant heavy hitters (Line 3, Algorithm 4), and finally we add back the common descendants (Line 6, Algorithm 4). The sampling error is accounted for in Line 9. Intuitively, our analysis shows which parameter values guarantee that H-Memento solves the approximate HHH problem. A formal proof of the algorithm’s correctness appears in Section A.
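The core of H-Memento's update, a single draw that either feeds one random prefix level into the shared window or merely slides it, can be sketched as follows; the explicit sample list and the byte-granularity prefix layout are simplifications for illustration:

```python
import random
from collections import deque

H, W, TAU = 5, 100_000, 0.1  # hierarchy size, window size, sampling probability

def prefixes(src_ip):
    """The H byte-granularity generalizations of a source IP."""
    o = src_ip.split(".")
    return [".".join(o[:kept] + ["*"] * (4 - kept)) for kept in range(4, -1, -1)]

class HMementoSketch:
    """One sliding window monitors all prefixes: with probability TAU, a
    random prefix level of the packet receives a Full update; all other
    packets only trigger a Window update."""

    def __init__(self):
        self.samples, self.counts, self.n = deque(), {}, 0

    def update(self, src_ip):
        self.n += 1
        while self.samples and self.samples[0][0] <= self.n - W:  # Window update
            _, old = self.samples.popleft()
            self.counts[old] -= 1
        if random.random() < TAU:                                 # Full update
            p = random.choice(prefixes(src_ip))  # one uniformly random level
            self.samples.append((self.n, p))
            self.counts[p] = self.counts.get(p, 0) + 1

    def estimate(self, prefix):
        # each prefix level is sampled with probability TAU / H
        return self.counts.get(prefix, 0) * H / TAU
```

Every packet costs a constant amount of work regardless of H, which is what lets H-Memento match the speed of interval algorithms such as RHHH.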
4.3. Network-Wide Measurements
As Figure 7 illustrates, we now discuss a centralized controller that receives data from multiple clients and forms a network-wide view of the traffic (e.g., network-wide HH or HHH). Similarly to (RexfordNetworkwide; HHTagging), we assume that there are several measurement points and that each packet is measured once. Our design focuses on two critical aspects of this system: (1) a communication method between the clients and the controller that conveys the information in a timely and bandwidth-efficient manner, and (2) a fast controller algorithm.
Formal model. First, we define a sliding window in the network-wide model as the last W packets that were measured somewhere in the network. Intuitively, we want the controller to analyze the traffic of the most recent W packets in the entire network, as observed by the measurement points. For example, we may want to monitor the last million packets in the entire network.
(1) Communication method. We now suggest three methods to communicate with the controller. For each method, the frequency of messaging with the controller is set according to the bandwidth budget (B). That is, smaller reports can be sent more frequently but also deliver less information.
Aggregation. Existing HH algorithms are often mergeable, i.e., the content of two HH instances can be efficiently merged (AndersonIMSUM). We are unaware of previous work that targets HHH, but since MST (HHHMitzenmacher) and RHHH (RHHH) use HH algorithms as building blocks, they can be merged as well. This capability motivates the Aggregation communication method, in which each client periodically transmits all the entries of its HH algorithm to the controller. Given enough bandwidth, this method is intuitively the most communication-efficient, as all data is transmitted. However, as each message is large, we send messages infrequently to meet the bandwidth budget, which creates inaccuracies.
Sample. Most network devices are capable of transmitting uniform packet samples to the controller. Motivated by this capability, the Sample method samples packets with a fixed probability τ and sends a report to the controller once per τ⁻¹ packets. Thus, on average, each message contains a single sample. This information is enveloped by the usual packet headers that are required to deliver the packet in the network. We observe that this uses a significant portion of the bandwidth for the header fields of the transmitted packet. Yet, the Sample method is considerably easier to deploy than the Aggregation option, as the nodes only sample packets and do not run the measurement algorithms. The communication pattern is also network-friendly, as we get a stable flow of traffic from the clients to the controller.
Batch. The Batch approach is designed to utilize bandwidth more efficiently than Sample. The idea is simple: instead of transmitting, on average, a single sample per message, we send on average m samples (e.g., m = 100) per report. That is, we send a report once per m/τ packets, containing all the sampled packets within this period. This pattern utilizes the bandwidth more efficiently, as the payload ratio of each message is considerably higher. However, it also creates delays in reporting new information to the controller. Our analysis is used to find the optimal batch size and minimize the total error.
(2) Controller algorithm. The controller maintains an instance of Memento or H-Memento where we term the respective algorithms D-Memento and D-H-Memento. The controller behaves slightly differently in each option.
Aggregation. Aggregation is used in this study only as a baseline. Thus, instead of implementing a specific algorithm, we simulate an idealized aggregation technique with an unlimited space at the controller and no accuracy losses upon merging. As we later show, the Sample and Batch approaches outperform this Aggregation method; thus, we conclusively demonstrate that they are superior to any aggregation technique.
Sample and Batch. In the Sample and Batch schemes, the controller maintains a Memento or H-Memento instance. When receiving a report, it first performs a Full update for each sampled packet and then makes Window updates for the unsampled ones. In total, per report, the Sample method performs τ⁻¹ updates and the Batch method performs m/τ updates (where τ is the sampling probability and m the batch size).
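The controller's handling of a Batch report can be sketched as follows, assuming an explicit-sample window structure for illustration (the real D-Memento uses Memento's compact state):

```python
from collections import deque

W = 1_000    # network-wide window size (in packets)
TAU = 0.01   # sampling probability at the measurement points
M = 10       # batch size: samples per report

class Controller:
    """Sketch: a Batch report of M samples stands for (on average) M / TAU
    packets, so the controller performs one Full update per sample and
    Window updates for the remaining, unsampled packets."""

    def __init__(self):
        self.samples, self.counts, self.n = deque(), {}, 0

    def _window_update(self):
        self.n += 1   # one more network-wide packet accounted for
        while self.samples and self.samples[0][0] <= self.n - W:
            _, old = self.samples.popleft()
            self.counts[old] -= 1

    def _full_update(self, flow):
        self._window_update()
        self.samples.append((self.n, flow))
        self.counts[flow] = self.counts.get(flow, 0) + 1

    def handle_report(self, batch):
        for flow in batch:                            # M Full updates
            self._full_update(flow)
        for _ in range(round(M / TAU) - len(batch)):  # the unsampled rest
            self._window_update()

    def estimate(self, flow):
        return self.counts.get(flow, 0) / TAU
```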
This section is divided into two parts; first, Section 5.1 analyzes our single-device Memento and H-Memento algorithms and shows accuracy guarantees. Next, Section 5.2 analyzes our network-wide D-Memento and D-H-Memento algorithms, and explains how to find the optimal batch size (in terms of guaranteed error) given a certain (per-packet) bandwidth budget.
5.1. Memento and H-Memento Analysis
This section surveys the main theoretical results for Memento and for H-Memento. These assure correctness as long as the sampling probability is large enough.
Formal model. Our traffic is modeled as a stream S. It is initially empty, and a packet is added at each step. A sliding-window algorithm considers only the last W packets, denoted S_W. The notation f_x denotes the frequency of flow x in S_W. Given a flow x, a heavy hitters algorithm provides an estimator f̂_x for f_x. We formalize the problem as follows:
Definition 5.1 (Window Frequency Estimation).
An algorithm solves (ε, δ)-Window Frequency Estimation if, given a query for a flow x (x ∈ U), it provides an estimate f̂_x such that Pr[|f̂_x − f_x| ≤ εW] ≥ 1 − δ.
Memento. Theorem 5.2 is the main theoretical result for Memento. It states that Memento solves the (ε, δ)-Window Frequency Estimation problem for ε = ε_a + ε_s whenever it is allocated O(ε_a⁻¹) counters and has a sampling probability that satisfies τ ≥ Z² / (W ε_s²), where Z ≜ Φ⁻¹(1 − δ) and Φ is the cumulative density function of the normal distribution with mean 0 and standard deviation 1. Note that this condition is satisfiable (i.e., admits τ ≤ 1) for any W ≥ Z² ε_s⁻². In other words, the theorem emphasizes the trade-off between the amount of space allocated and the sampling rate for achieving a target error bound ε. Specifically, if the algorithm has many counters (i.e., a low ε_a), then we can afford a higher ε_s (i.e., the sampling rate can be low).
Theorem 5.2.
Memento solves (ε, δ)-Windowed Frequency Estimation for ε = ε_a + ε_s and τ ≥ Z² / (W ε_s²).
H-Memento. Theorem 5.3 is our main result for H-Memento. It states that H-Memento is correct for any τ ≥ Z² H / (W ε_s²), where H is the size of the hierarchic domain (H = 5 for source hierarchies and H = 25 for (source, destination) hierarchies). The complete analysis for Memento and H-Memento is presented in Appendix A.
Theorem 5.3.
H-Memento solves (ε, δ)-Approximate Windowed Hierarchical Heavy Hitters for ε = ε_a + ε_s and τ ≥ Z² H / (W ε_s²).
5.2. D-Memento and D-H-Memento Analysis
We now provide an analysis for our network-wide D-Memento and D-H-Memento algorithms. Intuitively, the error in these algorithms comes from two origins. First, an error due to sampling, which is quantified by Theorems 5.2 and 5.3. Second, an additional error that is caused by the delay in transmission, as the measurement points only send the sampled packets once per batch. If a measurement point has a low traffic rate, it may take a long time for it to gather a full batch; in this case, all of its samples may be obsolete and may not belong to the most recent window. Therefore, our first step is to reason about the accuracy impact of the Batch and Sample methods.
Notations and definitions. We denote the bandwidth budget by B bytes/packet. That is, B determines how much traffic is used for communication between the measurement points and the controller. This communication is done using standard packets, which have header-field overheads. We denote by I_H the minimal header size (in bytes) of the chosen transmission protocol (e.g., 64 bytes for TCP). Next, reporting a sampled packet requires I_P bytes (e.g., 4 bytes for a source IP, or 8 bytes for a (source IP, destination IP) pair). We also denote by k the total number of measurement points.
Model. Intuitively, we can choose two (dependent) parameters: the sampling rate τ and the batch size m. That is, each measurement point samples packets with probability τ until it gathers m of them. At this point, it assembles an (I_H + m·I_P)-sized packet that encodes the sampled packets and sends it to the controller. As the expected number of packets required to gather an m-sized batch is m/τ, the bandwidth constraint can be written as (I_H + m·I_P)·τ/m ≤ B. Specifically, this allows us to express the maximum allowed sampling probability as τ = m·B / (I_H + m·I_P), since sampling at a lower rate would not utilize the entire bandwidth and would result in sub-optimal accuracy.
Accuracy of the Batch and Sample methods. We can now quantify the error of the Batch and Sample methods. Intuitively, we have to factor in the delays in communication (as we only report once per a fixed number of sampled packets to stay within the bandwidth budget). For example, if there are two measurement points of which one processes a million requests per second while the other only a thousand, the batches of the second point would include many obsolete packets that are not within the current window. However, recall that each report reflects only B/τ packets (in expectation) at its measurement point. Therefore, we conclude that:
Theorem 5.4.
The error created by the delayed reporting in the Batch method is bounded by kB/τ packets, where k is the number of measurement points.
Next, Theorems 5.2 and 5.3 enable us to bound the sampling error as a function of τ, while Theorem 5.4 bounds the delayed-reporting error. The following theorem applies to D-Memento (using H = 1) and to D-H-Memento (using the appropriate value of H). As the round-trip time inside the data center is small compared to the window sizes of interest, the error caused by the delay of packet transmissions is negligible, and thus we do not factor it in here. Theorem 5.5 quantifies the overall error of the Batch method; the error of the Sample method is derived by setting B = 1.
Theorem 5.5.
Given a header overhead h, batch size B, bandwidth budget M, sample payload size p, window size W, and confidence δ, the overall error (in packets) is at most the sum of the delayed-reporting error of Theorem 5.4 and the sampling error of Theorems 5.2 and 5.3, with τ = MB/(h + pB).
According to Theorem 5.3, the sampling error decreases as τ increases, which yields a bound on the overall error as a function of B. The guarantees for the Sample method are given by fixing B = 1. The next step is to use Theorem 5.5 to calculate the optimal batch size for a given bandwidth budget M. Thus, we get the best achievable accuracy for the Batch method within the bandwidth limitation.
We then differentiate this expression with respect to B and equate the derivative to zero to compute the optimal batch size. This is easily done with numerical methods.
For example, for a TCP connection (h = 64); ten measurement points (k = 10); a source IP hierarchy (H = 5, p = 4); and a bandwidth quota of one byte per packet, the optimal batch size yields an (overall) error guarantee of 13K packets. Increasing the bandwidth budget decreases the absolute error to 5.3K packets while increasing the optimal batch size. When increasing the window size W, the absolute error increases, but the error as a fraction of W decreases. Alternatively, 2D source/destination hierarchies (increasing H from 5 to 25) result in a slightly larger error and a higher optimal batch size.
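The numerical search itself can be sketched as follows. The delayed-reporting term k·B/τ follows Theorem 5.4, while the sampling term c/√τ is only a placeholder standing in for the bound of Theorems 5.2 and 5.3, with an illustrative constant c; this is a sketch of the optimization procedure, not the exact expression of Theorem 5.5.

```python
import math

def total_error(batch, budget=1.0, header=64, payload=4, points=10, c=2000.0):
    """Hypothetical overall error: delayed-reporting term
    points * batch / rate (per Theorem 5.4) plus a placeholder
    sampling term c / sqrt(rate); `c` is an illustrative constant."""
    rate = min(1.0, budget * batch / (header + payload * batch))
    return points * batch / rate + c / math.sqrt(rate)

# One-dimensional numerical search over integer batch sizes: the
# delay term grows with the batch size while the sampling term
# shrinks, so the bound has an interior minimum.
best = min(range(1, 5000), key=total_error)
```

In practice any one-dimensional minimizer (golden-section search, for instance) works equally well, since the bound is a smooth function of a single variable.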
Figure 8 illustrates the accuracy guarantee provided by each method. We compare three synchronization variants – Sample, Batch with B = 100, and Batch with an optimal B (which varies with the bandwidth budget), as explained above. As depicted, Sample has the smallest delay error and yet provides the worst guarantees, as it conveys little information within the bandwidth budget. The 100-Batch method has a lower sampling error (as its sampling rate is higher), but its reporting delay makes the overall error larger. For larger bandwidth budgets, the optimal batch size grows, and the accuracy gap between the variants narrows.
Server. Our evaluation was performed on a Dell 730 server running Ubuntu 16.04.01 release. The server has 128GB of RAM and an Intel Xeon CPU E5-2667 v4@ 3.20GHz.
Algorithms and implementation. For the HH problem, we compare Memento and WCSS (WCSS, ). For WCSS, we use our Memento implementation without sampling (τ = 1). For the HHH problem, we compare H-Memento to the interval algorithms MST (HHHMitzenmacher, ) and RHHH (RHHH, ), using the code released by the original authors. We also form the Baseline sliding-window algorithm by replacing the underlying algorithm in MST (HHHMitzenmacher, ) with WCSS. Specifically, MST proposed to use Lee and Ting’s algorithm (Lee:2006:SME:1142351.1142393, ), as WCSS was not known at the time. By replacing it with WCSS, a state-of-the-art window algorithm, we compare against the best variant known today.
Yardsticks. We consider source IP hierarchies in byte granularity (H = 5) and two-dimensional source/destination hierarchies (H = 25). Such hierarchies are also used in (RHHH, ; HHHMitzenmacher, ; CormodeHHH, ). We run each data point multiple times and use two-sided Student’s t-tests to determine the confidence intervals.
6.1. Heavy Hitters Evaluation
We evaluate the effect of the sampling probability on the operation speed and empirical accuracy of Memento, and use the speed and accuracy of WCSS as a reference point. The notation X-WCSS stands for WCSS that is allocated X counters; similarly, X-Memento is Memento with X counters. The window size and the interval length are set to the same number (millions) of packets.
The results show that the accuracy is almost unaffected by the use of sampling, except when the sampling rate is low and the number of counters is high. Even then, the effect is mainly evident in the skewed Datacenter trace.
As depicted in Figure 15, the update speed is determined by the sampling probability and is almost indifferent to changes in the number of counters. Memento achieves a speedup of up to 14× compared to WCSS. As expected, allocating more counters also improves the accuracy. It is also evident that the error of Memento is almost identical to that of WCSS for most of the evaluated range of sampling probabilities. The smallest evaluated sampling probability already exhibits slight accuracy degradation, which shows the limit of our approach. It appears that a larger number of counters, or heavy-tailed workloads (such as the Backbone trace), allow for even smaller sampling probabilities without a noticeable impact on the attained accuracy.
6.2. Hierarchical Heavy Hitters Evaluation
H-Memento vs. existing window algorithm. Next, we evaluate H-Memento and compare it to the Baseline algorithm. We consider two common types of hierarchies, namely a one-dimensional source hierarchy (H = 5) and two-dimensional source/destination hierarchies (H = 25). Note that H-Memento performs updates in constant time, while the Baseline requires O(H) time. Following the insights of Figure 15, we evaluate H-Memento with a sampling probability τ chosen so that each of the H prefixes has an effective sampling rate within the range in which Memento was shown to be accurate.
We evaluate three configurations for each algorithm, with a varying number of counters. The notation 64H denotes the use of 320 counters when H = 5, and 1,600 counters when H = 25. The notations 512H and 4096H follow the same rule. In the Baseline algorithm, the counters are split among H equally-sized WCSS instances, while H-Memento has a single Memento instance with that many counters.
Figure 18 shows the evaluation results. We can see that the hierarchy size H is the dominating performance parameter. H-Memento achieves a considerable speedup in source hierarchies and an even larger one in source/destination hierarchies. This difference is explained by the fact that the Baseline algorithm performs H expensive Full updates for each packet, while H-Memento usually performs a single Window update.
H-Memento vs. interval algorithm. Next, we compare the throughput of H-Memento to the previously suggested RHHH (RHHH, ). H-Memento and RHHH are similar in their use of samples to increase performance. Moreover, RHHH is the fastest known interval algorithm for the HHH problem. Our results, presented in Figure 21, show that H-Memento is faster than RHHH except for very small sampling ratios. The reason lies in the implementation of the sampling: in RHHH, sampling is implemented as a geometric random variable, which is inefficient unless the sampling probability is small, whereas in H-Memento it is performed using a random number table. Still, as the sampling probability gets lower, the geometric calculation becomes more efficient, and eventually RHHH becomes faster than H-Memento. This is because H-Memento performs a Window update for most packets, while RHHH only decrements a counter.
Looking at both performance figures, we conclude that H-Memento achieves very high performance and is likely to incur little overhead in a virtual switch implementation, in a similar manner to RHHH.
6.3. Network-Wide Evaluation
This section describes our proof-of-concept system. We incorporated H-Memento into HAProxy, which gives it the capability to monitor traffic from subnets, an ability that we used to implement rate limiting for subnets. Our controller periodically receives information from the load-balancers (via the Batch, Sample, or Aggregate method) and uses it to perform the HHH measurement (with the D-H-Memento algorithm). The HHH output then drives a simple threshold-based attack-mitigation application in which a subnet is rate-limited if its window frequency exceeds the threshold.
HAProxy. We implemented and integrated our algorithms into the open-source HAProxy load-balancer. Specifically, we leveraged and extended HAProxy’s Access Control List (ACL) capabilities to update our algorithms with newly arriving data, as well as to perform mitigation (i.e., Deny or Tarpit) when an attacker is identified.
Traffic generation. Our goal is to obtain realistic measurements involving multiple simultaneous stateful connections such as HTTP GET and POST requests from multiple clients towards the load-balancers. To that end, we developed a tool that enables a single commodity desktop to maintain and initiate stateful HTTP GET and POST requests sourcing from multiple IP addresses. Our solution requires the cooperation of both ends (i.e., the traffic generators and the load-balancer servers) for an arbitrarily large IP pool.
Our tool is based on the NFQUEUE and libnetfilter-queue Linux targets, which enable delegating the decision on packets to userspace software. As reported by the Apache ab load-testing tool, using a single commodity computer, we can initiate and maintain up to 30,000 stateful HTTP requests per second from different IPs without using the HTTP keep-alive feature. We are only limited by the pace at which the Linux kernel opens and closes sockets (e.g., TCP timeout).
Controller. We implemented in C a test controller that communicates with the load-balancers via sockets. It holds a local HHH algorithm implementation and exchanges information with the load-balancers (e.g., receives aggregations, samples, or batches). The controller then generates a global and coherent window view of the ingress traffic.
Testbed. We built a testbed involving three physical servers. The first is used for traffic generation towards the load-balancers. Specifically, we used several Apache ab instances, augmented with our tool, to generate realistic stateful traffic from multiple IP addresses with delays and races among different clients. The second server holds ten autonomous instances (i.e., separate processes) of HAProxy load-balancers listening on different ports for incoming requests. Finally, on the third server, we used Docker to deploy Apache server instances listening on different sockets.
6.3.1. H-Memento’s Accuracy
In this experiment, we evaluate MST (denoted as Interval), the Baseline algorithm, and H-Memento with a single load-balancer client. Our goal is to monitor the last 1,000,000 HTTP requests that have entered the load-balancer. The Baseline algorithm and H-Memento are configured with a window size of 1,000,000 requests. The MST Interval instance uses a measurement period of 1,000,000 requests and is configured with comparable memory usage. For each new incoming HTTP request, each algorithm estimates the frequency of each of its IP prefixes.
The results are depicted in Figure 25. In all the traces, the Interval approach is the least accurate, while as expected, H-Memento is slightly less accurate than the Baseline algorithm due to its use of sampling. These conclusions hold for every prefix length and testbed workload.
6.3.2. Accuracy and Traffic Budget
In this experiment, we generate traffic towards ten load-balancers communicating with a centralized controller that maintains a global window view of the last 1,000,000 requests that entered the system. We evaluate the three different transmission methods (Aggregation, Sample, and Batch) with the same 1-byte per packet control traffic budget.
Results. Figure 29 depicts the results. As indicated, the best accuracy is achieved by the Batch approach, while Sample significantly outperforms Aggregation. Intuitively, the Aggregation method sends the largest messages, each of which contains the full information known to the measurement point. Its drawback is a long delay between controller updates. The Sample method has a smaller delay but utilizes the bandwidth inefficiently due to the packet header overheads. Finally, Batch has a slightly higher delay but delivers more data within the bandwidth budget, which improves the controller’s accuracy.
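The bandwidth-efficiency intuition can be quantified by the fraction of each report that carries sample data rather than header overhead. A small sketch, assuming (as an illustration) a 64-byte header and 4-byte samples:

```python
def payload_fraction(header, payload, batch):
    # Share of each report that is sample data rather than header overhead.
    return payload * batch / (header + payload * batch)

# Sample sends one sample per report; Batch amortizes the header
# over many samples within the same per-packet budget.
sample = payload_fraction(header=64, payload=4, batch=1)
batch100 = payload_fraction(header=64, payload=4, batch=100)
```

Under these illustrative numbers, a single-sample report is mostly header (under 6% payload), while a 100-sample batch is over 85% payload, which matches the observed accuracy ordering.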
6.4. HTTP Flood Evaluation
We now evaluate our detection system under an HTTP flood. Our deployment consists of ten HAProxy load-balancers that serve as the entry point and direct requests to the Apache servers. The HAProxy load-balancers also report to the centralized controller, which discovers subnets that exceed the user-defined threshold. The bandwidth budget is set to 1 byte per packet and the window size is 1 million packets.
Traffic. We inject flood traffic on top of the Backbone packet trace. Specifically, we generate a new trace as follows. (1) We select 50 subnets by randomly choosing 8 bits for each, and (2) select a random trace line as the start of the flood. Until that line, the trace is unmodified. (3) From that line on, at each line, with probability 0.7 we add a flood line from a uniformly picked flooding sub-network, and with probability 0.3 we skip to the next line of the original trace. As a result, the 50 flooding subnets account for 70% of the total traffic once the flood begins.
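The trace-generation steps above can be sketched in Python; the flood-line format and the seed are illustrative, not taken from our tooling.

```python
import random

def inject_flood(lines, n_subnets=50, flood_prob=0.7, seed=0):
    """Sketch of the trace-generation procedure: pick random 8-bit
    subnets and a random start line; afterwards, each emitted line is
    a flood packet with probability flood_prob, otherwise the next
    original trace line."""
    rng = random.Random(seed)
    subnets = [rng.randrange(256) for _ in range(n_subnets)]  # random /8 prefixes
    start = rng.randrange(len(lines))
    out, i = list(lines[:start]), start
    while i < len(lines):
        if rng.random() < flood_prob:
            out.append(f"{rng.choice(subnets)}.0.0.0")  # flood line, random subnet
        else:
            out.append(lines[i])  # keep the next original line
            i += 1
    return out
```

Note that the original trace survives intact as a subsequence of the output, with roughly 70% flood lines interleaved after the start point.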
Results. Figure 33 depicts the results. Figures 33(a) and 33(b) show the detection speed of the flooding subnets by the three different approaches at the controller. We compare the three approaches and additionally outline an optimal algorithm (OPT) that uses an accurate window and “knows” exactly what traffic enters the load-balancers without delay. Notably, the Batch approach achieves near-optimal performance and outperforms Sample and Aggregation. Figure 33(c) shows that the Batch method identifies almost all of the attack messages, as expected from our theoretical analysis. Further, its miss rate is 37× smaller than that of the Aggregation method under the 1-byte per packet bandwidth budget.
7. Related Work
Heavy hitters are an active research area for both intervals (FAST, ; HashPipe, ; SpaceSavings, ; RAP, ; DIMSUM, ; AndersonIMSUM, ; SketchVisor, ) and sliding windows (WCSS, ; HungLT10, ; SWHH, ; FAST, ; SWAMP, ). HH integration in the single-device mode is an active research challenge. For example, SketchVisor (SketchVisor, ) suggests using a lightweight fast-path measurement when the line is busy; this increases the throughput but reduces accuracy. Alternatively, HashPipe (HashPipe, ) adapts the interval-based Space Saving (SpaceSavings, ) to programmable switches. NetQRE (netqre, ) allows the network administrator to write a measurement program that can describe HH and HHH as well as sliding windows. However, its algorithm is exact rather than approximate and requires space that is linear in the window size, which is expensive for large windows.
Hierarchical heavy hitters. The HHH problem was first defined (in the Interval model) by (Cormode2003, ), which also introduced the first algorithm. The problem then attracted a large body of follow-up work, as well as an extension to multiple dimensions (Cormode2004, ; CormodeHHH, ; Hershberger2005, ; HHHMitzenmacher, ; Zhang:2004:OIH:1028788.1028802, ; MASCOTS, ; HHHIMC17, ). MST (HHHMitzenmacher, ) is a conceptually simple multidimensional HHH algorithm that uses multiple independent HH instances; one instance is used for each prefix pattern. Upon a packet arrival, all H instances are updated with their corresponding prefixes. The set of hierarchical heavy hitters is then calculated from the set of (plain) heavy hitters of each prefix type. The algorithm thus requires H times the space and update time of a single HH instance. MST can also operate in the sliding-window model, by replacing the underlying HH algorithm with a sliding-window solution (WCSS, ; FAST, ; HungAndTing, ). Randomized HHH (RHHH) (RHHH, ) is similar to MST but only updates a single HH instance per packet. This reduces the update complexity to a constant but requires a large amount of traffic to converge. RHHH does not naturally extend to sliding windows, since each HH instance receives a slightly varying number of updates and thus considers a different window.
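To illustrate the MST structure, the following sketch keeps one HH instance per prefix pattern for a byte-granularity source hierarchy (H = 5); for brevity it uses exact counters where MST would use a bounded-space HH algorithm such as Space Saving. The class and function names are ours, not from the released code.

```python
from collections import Counter

# Byte-granularity source-IP prefix patterns (H = 5 levels).
PATTERNS = [0, 8, 16, 24, 32]  # prefix lengths in bits

def generalize(ip_int, bits):
    """Keep only the top `bits` bits of a 32-bit address."""
    return ip_int >> (32 - bits) if bits else 0

class MSTSketch:
    """MST-style HHH: one HH instance per prefix pattern; every
    packet updates all H instances with its generalized prefixes."""
    def __init__(self):
        self.instances = {b: Counter() for b in PATTERNS}

    def update(self, ip_int):
        for b in PATTERNS:  # H updates per packet
            self.instances[b][generalize(ip_int, b)] += 1
```

This per-packet loop over all H patterns is exactly the O(H) update cost that RHHH (by updating one randomly chosen instance) and H-Memento (by sampling the prefix type) reduce to a constant.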
Network-wide measurement. The problem of network-wide measurement is becoming increasingly popular (RexfordNetworkwide, ; Sigcomm2018Networkwide, ; SketchVisor, ; FlowRadar, ). A centralized controller collects data from all measurement points to form a network-wide perspective. Measurement points are placed in the network so that each packet is measured only once. The work of (HHTagging, ) suggests marking monitored packets which allows for more flexible measurement point placement.
In (RexfordNetworkwide, ), the controller determines a dynamic reporting threshold that reduces communication overheads. It is unclear how to utilize this method in the sliding-window model; yet, its optimization goal is very similar in essence to ours: maximize accuracy and minimize traffic overheads. Stroboscope (Stroboscope, ) is another network-wide measurement system that guarantees that the overheads adhere to a strict budget. FlowRadar (FlowRadar, ) avoids communication during the measurement period; instead, the state of each measurement point is shared at the end of the measurement. Thus, FlowRadar follows the Interval pattern, which we showed to be slow in detecting new heavy hitters.
Our work highlights the potential benefits of sliding-window measurements to cloud operators and makes them practical for network applications. Specifically, we showed in this work that window-based measurements detect traffic changes faster, and thus enable more agile applications. Despite these benefits, sliding windows have not been used extensively, since existing window algorithms are too slow to cope with the line speed and do not provide a network-wide view. Accordingly, we introduced the Memento family of HH and HHH algorithms for both single-device and network-wide measurements. We analyzed the algorithms and extensively evaluated them on real traffic traces. Our evaluations indicate that the Memento algorithms provide the necessary speed and efficiency for network-wide visibility. Therefore, our work turns sliding-window HH and HHH measurements into a practical option for the next generation of network applications.
A remaining limitation of existing HHH solutions, ours included, is the lack of real-time queries. That is, while RHHH provides line-rate packet processing on streams and H-Memento provides it for sliding windows, neither allows sufficiently fast queries. Therefore, we believe that a mechanism that allows constant-time detection of changes in the hierarchical heavy hitters set would be a promising direction for future work.
We open-sourced the Memento algorithms and the HAProxy load-balancer extension that provides capabilities to block and rate-limit traffic from entire sub-networks (rather than from individual flows) (TRCode, ). We hope that our open-source code will further facilitate sliding-window measurements in network applications.
We thank the anonymous reviewers and our shepherd, Kenjiro Cho, for their helpful comments and suggestions. This work was partly supported by the Hasso Plattner Institute Research School; the Zuckerman Institute; the Technion Hiroshi Fujiwara Cyber Security Research Center; and the Israel Cyber Bureau.
Appendix A H-Memento Analysis
Notations. For brevity, notations are summarized in Table 1.
Main result. The main result for H-Memento is given by Theorem 5.3. Given the window size (W), the desired accuracy (ε), the desired confidence (δ), and the hierarchy size (H), Theorem 5.3 provides a lower bound on the sampling probability τ that still ensures correctness. Specifically, the theorem says that H-Memento is correct for any τ above this bound; the bound also involves a normalization parameter that depends on δ.
That is, we prove that the HHH set returned by H-Memento satisfies the accuracy and coverage properties. Section A.1 shows the correctness of Memento and the accuracy property of H-Memento. We then show Coverage in Section A.2. Finally, Section A.3 shows that H-Memento solves approximate windowed HHH.
We model the update procedure of H-Memento as a balls-and-bins experiment. For simplicity, we assume that H/τ is an integer, and set V = H/τ. Upon a packet arrival, we place a ball uniformly in one of the V bins; if the ball lands in one of the first H bins, we perform a full update for the corresponding prefix type, and otherwise we perform a window update. Definition A.1 formulates this model. Memento is modeled as the degenerate case where H = 1 and thus we update the fully specified prefix.
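The model can be sanity-checked with a short simulation: with V = H/τ bins, the fraction of balls that land in the first H bins (and thus trigger a full update) should concentrate around τ. The parameter values below are illustrative.

```python
import random

def simulate(packets, H, tau, seed=1):
    """Balls-and-bins model of the update procedure: each ball lands
    uniformly in one of V = H / tau bins; bins 0..H-1 trigger a full
    update of the matching prefix type, the rest a window update.
    Returns the empirical fraction of full updates."""
    rng = random.Random(seed)
    V = round(H / tau)  # assume H / tau is an integer
    full = sum(1 for _ in range(packets) if rng.randrange(V) < H)
    return full / packets

# H = 5 prefix types, sampling probability tau = 0.05 (so V = 100).
rate = simulate(packets=200_000, H=5, tau=0.05)
```

The empirical full-update fraction matches τ, and each individual prefix type is updated with probability τ/H, as the analysis assumes.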
Definition A.1.
For each bin i and set of packets S, denote by X_i^S the number of balls (from S) in bin i. When the set contains all packets, we use the notation X_i.
We require confidence intervals for the variables X_i^S. However, the X_i’s are correlated, as they must sum to the total number of balls, and therefore we use the technique of Poisson approximation. It enables us to compute confidence intervals for independent Poisson variables and convert them back to the balls-and-bins case. Formally, let Y_1, …, Y_V be independent Poisson random variables representing the number of balls in each bin. We now use Lemma A.2 to transfer confidence intervals from the Y_i’s back to the X_i’s.
Lemma A.2 (Corollary 5.11, page 103 of (Mitzenmacher:2005:PCR:1076315, )).
Let E be an event whose probability monotonically increases with the number of balls. If the probability of E is p in the Poisson case, then it is at most 2p in the exact case.
A.1. Accuracy Analysis
To prove accuracy, we bound the estimation error of every prefix. We have multiple sources of error, and thus we first quantify the sampling error. Let Y_p be the Poisson variable corresponding to a prefix p; that is, its underlying set contains all the packets that are generalized by p. Therefore:
Lemma A.3.
Let Y be a Poisson random variable with mean λ. Then the probability that Y deviates from λ by more than z standard deviations (√λ) is at most δ; here, z is the value for which the normal distribution with mean λ and standard deviation √λ leaves probability mass δ outside the interval.
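The quality of this normal approximation can be checked numerically: the exact two-sided Poisson tail beyond z standard deviations is close to the corresponding normal-tail mass δ. A small sketch (the values of λ and z are illustrative):

```python
import math

def poisson_two_sided_tail(lam, z):
    """Exact P(|X - lam| > z * sqrt(lam)) for X ~ Poisson(lam),
    summing the pmf inside the window in log space to avoid
    underflow of exp(-lam) for large lam."""
    half = z * math.sqrt(lam)
    lo, hi = math.ceil(lam - half), math.floor(lam + half)
    inside = sum(
        math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        for k in range(max(lo, 0), hi + 1)
    )
    return 1.0 - inside

# z = 1.96 corresponds to delta = 0.05 under the normal approximation.
tail = poisson_two_sided_tail(lam=1000, z=1.96)
```

For λ = 1000 and z = 1.96 the exact tail mass is close to 0.05, confirming that the normal approximation is adequate at the scales used in the analysis.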
Theorem A.4.
We use Lemma A.3 for the prefix’s Poisson variable. Since we do not know its exact mean, we upper-bound it. We need an error of the desired form and thus set z accordingly; extracting τ gives the condition of the theorem, and multiplying by the normalization factor yields the claimed bound.
Finally, since the event is monotonically increasing with the number of balls, we use Lemma A.2 to translate the bound to the exact case, which concludes the proof.
To reduce clutter, we introduce a shorthand for this bound. Theorem A.4 shows when the sample is accurate enough. The error of the underlying Memento algorithm is proportional to the number of sampled packets. Thus, if we oversample, we get a slightly worse accuracy guarantee. We compensate by allocating (slightly) more counters, as explained in Corollary A.5.
Corollary A.5.
Consider the number of updates (from the last W items) to the underlying algorithm. If τ satisfies the condition of Theorem A.4, then the number of updates is bounded as follows.
Theorem A.4 yields: Thus: ∎
Corollary A.5 means that, by allocating slightly more space to the underlying algorithm, we can compensate for possible oversampling. Generally, we configure an algorithm that solves Windowed Frequency Estimation with a slightly tighter error parameter. Applying Corollary A.5, the number of sampled packets is bounded with high probability, and by the union bound both the sampling and estimation guarantees hold simultaneously. For example, WCSS would simply be allocated slightly more counters.
Hereafter, we assume that the algorithm is already configured to accommodate this problem.
Theorem A.6.
Consider an algorithm that solves the Windowed Frequency Estimation problem. If τ satisfies the bound of Theorem A.4, then the sampled algorithm solves Approximate Windowed Frequency Estimation with appropriately scaled error and confidence parameters.
We use Theorem A.4. The underlying algorithm solves Windowed Frequency Estimation and provides us with an estimator for the number of updates of a prefix. According to Corollary A.5, this count is close to its expectation; multiplying by the normalization factor yields an estimate of the prefix frequency. It remains to show that the overall error is within the bound; this follows from the observation that if the error exceeds the bound, then at least one of the above events occurs, and we bound their total probability with the union bound.
Theorem A.6 implies accuracy, as it guarantees that, with high probability, the estimated frequency of any prefix is within the allowed error of its real frequency. In particular, this means that the HHH prefix estimations are within the bound, as shown by Corollary A.7. Furthermore, by considering the degenerate case where we always select fully specified items (i.e., H = 1), we conclude the correctness of Memento, as stated in Corollary A.8.
Corollary A.7.
If τ satisfies the condition of Theorem A.6, then Algorithm 2 satisfies the accuracy constraint.
Corollary A.8.
If τ is large enough, then Memento solves the Approximate Windowed Frequency Estimation problem.
A.2. Coverage Analysis
We now show that H-Memento satisfies the coverage property (Definition 4.2). Conditioned frequencies are calculated differently for one and two dimensions; therefore, Section A.2.1 shows coverage for one dimension and Section A.2.2 for two.
A.2.1. One Dimension
Lemma A.9.
In one dimension:
Lemma A.10.
The conditioned frequency estimation of Algorithm 2 is:
Theorem A.11.
Using Lemma A.10, it is enough to show that the randomness-induced error is bounded with the required probability. We denote by S the set of packets that affect the calculation of the conditioned frequency. We split S into two subsets: one contains the packets that increase the value of the estimate, and the other contains those that decrease it. We estimate the sampling error of each subset separately.
We denote the number of balls in the positive sum and use Lemma A.3; this quantity is non-negative. Thus,