CLEF: Limiting the Damage Caused by Large Flows in the Internet Core (Technical Report)

07/16/2018 ∙ by Hao Wu, et al.

The detection of network flows that send excessive amounts of traffic is of increasing importance to enforce QoS and to counter DDoS attacks. Large-flow detection has been previously explored, but the proposed approaches can be used on high-capacity core routers only at the cost of significantly reduced accuracy, due to their otherwise too high memory and processing overhead. We propose CLEF, a new large-flow detection scheme with low memory requirements, which maintains high accuracy under the strict conditions of high-capacity core routers. We compare our scheme with previous proposals through extensive theoretical analysis, and with an evaluation based on worst-case-scenario attack traffic. We show that CLEF outperforms previously proposed systems in settings with limited memory.




1 Introduction

Detecting misbehaving large network flows that use more than their allocated resources is an important mechanism not only for Quality of Service (QoS) [35] schemes such as IntServ [6], but also for DDoS defense mechanisms that allocate bandwidth to network flows [4, 27, 23]. (As in prior literature [15, 42], the term large flow denotes a flow that sends more than its allocated bandwidth.) With the recent resurgence of volumetric DDoS attacks [3], DDoS defense and QoS are gaining importance; thus, the need for efficient in-network accounting is increasing.

Unfortunately, per-flow resource accounting is too expensive to perform in the core of the network [15], since large-scale Internet core routers have an aggregate capacity of several Terabits per second (Tbps). Instead, to detect misbehaving flows, core routers need to employ highly efficient schemes which do not require them to keep per-flow state. Several approaches for large-flow detection have been proposed in this context; they can be categorized into probabilistic (i.e., relying on random sampling or random binning) and deterministic algorithms. Examples of probabilistic algorithms are Sampled Netflow [11] and Multistage Filters [15, 14], while EARDet [42] and Space Saving [29] are examples of deterministic approaches.

However, previously proposed algorithms can satisfy the requirements of core router environments only by significantly sacrificing their accuracy. In particular, given the limited amount of high-speed memory on core routers, these algorithms can either detect only flows that exceed their assigned bandwidth by very large amounts, or they suffer from high false-positive rates. Consequently, these systems cannot protect regular, well-behaved flows from performance degradation: either large flows manage to stay “under the radar” of the detection algorithm, or the detection algorithm itself erroneously flags and punishes well-behaved flows.

As a numeric example, for EARDet to accurately detect misbehaving flows exceeding a threshold of 1 Mbps on a 100 Gbps link, it would require about 100,000 counters for that link. Maintaining these counters, together with the necessary associated metadata, requires between 1.6 MB and 4 MB of state (the IP metadata consists of source and destination addresses, protocol number, and ports, and thus requires about 16 bytes and 40 bytes per counter for IPv4 and IPv6, respectively). This exceeds typical high-speed memory provisioning for core routers and would come at a high cost: for comparison, only the most high-end commodity CPUs approach the 1–4 MB range with their per-core L1/L2 memory, and the price tag for such processors surpasses USD 4000 [18].
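The numbers in this example can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming that EARDet's high-rate threshold has the form $\gamma_h = \rho/(m+1)$ (link capacity $\rho$, $m$ counters) and using the approximate per-counter metadata sizes quoted above; megabytes are decimal, and the function name is hypothetical.

```python
def eardet_counters(link_bps: int, threshold_bps: int) -> int:
    """Smallest m such that gamma_h = rho / (m + 1) does not exceed the threshold."""
    return -(-link_bps // threshold_bps) - 1      # ceil(rho / threshold) - 1


m = eardet_counters(100_000_000_000, 1_000_000)   # 100 Gbps link, 1 Mbps threshold
ipv4_state_bytes = m * 16                          # ~16 bytes of metadata per counter
ipv6_state_bytes = m * 40                          # ~40 bytes of metadata per counter
print(f"{m} counters, {ipv4_state_bytes / 1e6:.1f} MB (IPv4), "
      f"{ipv6_state_bytes / 1e6:.1f} MB (IPv6)")
# prints: 99999 counters, 1.6 MB (IPv4), 4.0 MB (IPv6)
```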

In this paper we propose a novel randomized algorithm for large-flow detection called Recursive Large-Flow Detection (RLFD). RLFD works by considering a set of potential large flows, dividing this set into multiple subsets, and then recursively narrowing its focus to the most promising subset. This algorithm is highly memory-efficient and is designed to have no false positives. To achieve these properties, RLFD sacrifices some detection speed, in particular when multiple large flows are present concurrently. We address these limitations by combining RLFD with the deterministic EARDet, yielding a hybrid scheme called CLEF, short for in-Core Limiting of Egregious Flows. We show how this scheme inherits the strengths of both algorithms: EARDet's ability to quickly detect very large flows (which it can do in a memory-efficient way), and RLFD's ability to detect low-rate large flows with a minimal memory footprint.

To enable a meaningful comparison with related work, we define a damage metric that estimates the impact of failed, delayed, and incorrect detection on well-behaved flows. We use this metric to compare RLFD and CLEF with previous proposals, both on a theoretical level and by evaluating the amount of damage caused by (worst-case) attacks. Our evaluation shows that CLEF performs better than previous work under realistic memory constraints, both in terms of our damage metric and in terms of false negatives and false positives.

To summarize, this paper’s main contributions are the following:

  • a novel, randomized algorithm, RLFD, that provides eventual detection of persistently large flows with very little memory cost;

  • a hybrid detection scheme, CLEF, which offers excellent large-flow detection properties with low resource requirements;

  • the analysis of worst-case attacks against the proposed large-flow detectors, using a damage metric that allows a realistic comparison with related work.

2 Problem Definition

This paper aims to design an efficient large-flow detection algorithm that minimizes the damage caused by misbehaving flows. This section introduces the challenges of large-flow detection and defines a damage metric to compare different large-flow detectors. We then define an adversary model in which the adversary adapts its behavior to the detection algorithm in use.

2.1 Large-Flow Detection

A flow is a collection of related traffic; for example, Internet flows are commonly characterized by a 5-tuple (source / destination IP / port, transport protocol). A large flow is one that exceeds a flow specification during a period of length $t$. A flow specification can be defined using a leaky bucket descriptor $TH(t) = \gamma t + \beta$, where $\gamma$ and $\beta$ are the maximum legitimate rate and burstiness allowance, respectively. Flow specifications can be enforced in two ways: arbitrary-window, in which the flow specification is enforced over every possible starting time, or landmark-window, in which the flow specification is enforced over a limited set of starting times.
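To make the difference between the two enforcement modes concrete, the following Python sketch checks a packet trace against a flow specification $TH(t) = \gamma t + \beta$ in both ways. It is an illustration under simplifying assumptions, not code from this paper: packets are `(timestamp, size)` pairs sorted by time, the arbitrary-window check is implemented as a leaky bucket that drains at rate $\gamma$ and signals a violation once its level exceeds $\beta$, and the landmark-window check uses hypothetical fixed windows $[k\,t, (k+1)\,t)$.

```python
def violates_arbitrary(packets, gamma, beta):
    """Arbitrary windows via a leaky bucket: True if some interval [t1, t2]
    carries more than gamma * (t2 - t1) + beta bytes."""
    level, last = 0.0, None
    for ts, size in packets:
        if last is not None:
            level = max(0.0, level - gamma * (ts - last))   # drain at rate gamma
        level += size
        last = ts
        if level > beta:
            return True
    return False


def violates_landmark(packets, gamma, beta, window):
    """Landmark windows: only the fixed intervals [k*window, (k+1)*window)."""
    totals = {}
    for ts, size in packets:
        k = int(ts // window)
        totals[k] = totals.get(k, 0) + size
    return any(total > gamma * window + beta for total in totals.values())


# Four 100-byte packets within 0.2 s, straddling the landmark boundary at t = 1 s.
burst = [(0.9, 100), (0.95, 100), (1.05, 100), (1.1, 100)]
arbitrary_flag = violates_arbitrary(burst, gamma=100.0, beta=100.0)
landmark_flag = violates_landmark(burst, gamma=100.0, beta=100.0, window=1.0)
```

With $\gamma = 100$ bytes/s, $\beta = 100$ bytes, and 1 s landmark windows, the trace sends 400 bytes within 0.2 s: the arbitrary-window check flags it, whereas each landmark window carries exactly $TH(1) = 200$ bytes and therefore does not.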

Detecting every large flow exactly when it exceeds the flow specification, with no false positives, requires per-flow state (as can be shown by the pigeonhole principle [38]), which is expensive on core routers. In this paper, we develop and evaluate schemes that trade timely detection for space efficiency.

As in prior work in flow monitoring, we assume each flow has a unique and unforgeable flow ID, e.g., using source authentication techniques such as accountable IPs [1], ICING [32], IPA [24], OPT [20], or Passport [25]. Such techniques can be deployed in the current Internet or in a future Internet architecture, e.g., Nebula [2], SCION [44], or XIA [17].

Large-flow detection by core routers.

In this work, we aim to design a large-flow detection algorithm that is viable to run on Internet core routers. The algorithm needs to limit damage caused by large flows even when handling worst-case background traffic. Such an algorithm must satisfy these three requirements:

  • Line rate: An in-core large-flow detection algorithm must operate at the line rate of core routers, which can process several hundreds of gigabits of traffic per second.

  • Low memory: Large-flow detection algorithms will typically access one or more memory locations for each traversing packet; such memory must be high-speed (such as on-chip L1 cache). Additionally, such memory is expensive and usually limited in size, and existing large-flow detectors are inadequate for high-bandwidth, low-memory environments. An in-core large-flow detection algorithm should thus be highly space-efficient. Although perfect detection requires as many counters as the maximum number of simultaneous large flows (by the pigeonhole principle [38]), our goal is to perform effective detection with far fewer counters.

  • Low damage: With the performance constraints of the previous two points, the large-flow detection algorithm should also minimize the damage to honest flows, which can be caused either by the excessive bandwidth usage by large flows, or by the erroneous classification of legitimate flows as large flows (false positives). Section 2.2 introduces our damage metric, which takes both these aspects into account.

2.2 Damage Metric

We consider misbehaving large flows to be a problem mainly because they have an adverse impact on honest flows. To measure the performance of large-flow detection algorithms, we therefore adopt a simple and effective damage metric that captures the packet loss suffered by honest flows. This metric considers both (1) the direct impact of excessive bandwidth usage by large flows, and (2) the potential adverse effect of the detection algorithm itself, which may be prone to false positives that result in the blacklisting of honest flows. Specifically, we define our damage metric as $D = D_O + D_{FP}$, where $D_O$ (overuse damage) is the total amount of traffic by which all large flows exceed the flow specification, and $D_{FP}$ (false positive damage) is the amount of legitimate traffic incorrectly blocked by the detection algorithm. The definition of the overuse damage assumes a link at full capacity, so when this is not the case, the damage metric over-approximates the actual traffic loss suffered by honest flows. We note that the metrics commonly used by previous work, i.e., false positives, false negatives, and detection delay, are all reflected by our metric.
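A simplified way to evaluate $D = D_O + D_{FP}$ is sketched below. It assumes (our simplification, not the paper's exact accounting) that each large flow is summarized by the bytes it sent before being blocked and the duration over which it sent them, so that its overuse is the excess over $TH(t) = \gamma t + \beta$; the function names are hypothetical.

```python
def overuse_damage(sent_bytes, duration, gamma, beta):
    """Bytes by which one large flow exceeds TH(t) = gamma * t + beta."""
    return max(0.0, sent_bytes - (gamma * duration + beta))


def damage(large_flows, blocked_legit_bytes, gamma, beta):
    """D = D_O + D_FP.

    large_flows: (bytes sent before blocking, sending duration) per large flow.
    blocked_legit_bytes: legitimate traffic dropped because of false positives.
    """
    d_o = sum(overuse_damage(b, t, gamma, beta) for b, t in large_flows)
    d_fp = sum(blocked_legit_bytes)
    return d_o + d_fp


# One large flow sends 5000 bytes in 10 s under (gamma, beta) = (100 B/s, 500 B),
# and 200 bytes of an honest flow are wrongly blocked.
example_damage = damage([(5000, 10.0)], [200], gamma=100.0, beta=500.0)
```

Here the overuse term contributes $5000 - (100 \cdot 10 + 500) = 3500$ bytes and the false-positive term 200 bytes, so a detector is penalized both for letting overuse through and for blocking honest traffic.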

2.3 Attacker Model

In our attacker model, we consider an adversary that aims to maximize damage. Our attacker responds to the detection algorithm and tries to exploit its transient behavior to avoid detection or to cause false detection of legitimate flows.

Like Estan and Varghese’s work [15], we assume that attackers know the large-flow detection algorithm running in the router and its settings, but have no knowledge of the secret seeds used to generate random variables, such as the detection intervals for landmark-window-based algorithms [30, 13, 19, 28, 29, 15, 16, 12] and the random numbers used for packet/flow sampling [15]. This assumption prevents the attacker from performing optimal attacks against randomized algorithms.

We assume the attacker can interleave packets, but is unable to spoof legitimate packets (as discussed in Section 2.1) or create pre-router losses in legitimate flows. Figure 1 shows the network model, in which the attacker arbitrarily interleaves attack traffic between idle intervals of legitimate traffic, and the router processes the interleaved traffic to generate output traffic and perform large-flow detection. Our model does not limit input traffic, allowing for arbitrary volumes of attack traffic.

In our model, whenever a packet traverses a router, the large-flow detector receives the flow ID (for example, the source and destination IP and port and transport protocol), the packet size, and the timestamp at which the packet arrived.

Figure 1: Adversary Model.

3 Background and Challenges

In this section we briefly review some existing large flow detection algorithms, and discuss the motivations and challenges of combining multiple algorithms into a hybrid scheme.

3.1 Existing Detection Algorithms

We review the three most relevant large-flow detection algorithms, summarized in Table 1. We divide large flows into low-rate large flows and high-rate large flows, depending on the amount by which they exceed the flow specification.


EARDet.

EARDet [42] guarantees exact and instant detection of all flows exceeding a high-rate threshold $\gamma_h = \rho/(m+1)$, where $\rho$ is the link capacity and $m$ is the number of counters. However, EARDet may fail to identify a large flow whose rate stays below $\gamma_h$.

Multistage Filters.

Multistage filters [15, 14] consist of multiple parallel stages, each of which is an array of counters. Specifically, the arbitrary-window-based Multistage Filter (AMF), as classified by Wu et al. [42], uses leaky buckets as counters. AMF guarantees the absence of false negatives (no-FN) and immediate detection for any flow specification; however, AMF has false positives (FPs), which increase as the link becomes congested (as shown in Appendix 0.B.2).
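The following Python sketch illustrates the AMF structure described above: several stages, each an array of leaky-bucket counters indexed by a per-stage keyed hash, with a flow flagged only when its bucket exceeds $\beta$ in every stage. The class name, the use of BLAKE2b as the keyed hash, and the fixed per-stage keys are assumptions for illustration, not details of the original design.

```python
import hashlib


class AMF:
    """Sketch of an arbitrary-window multistage filter with leaky-bucket counters."""

    def __init__(self, stages, width, gamma, beta):
        self.width, self.gamma, self.beta = width, gamma, beta
        # One keyed hash per stage (illustrative constant keys).
        self.keys = [f"stage-{i}".encode() for i in range(stages)]
        # Each counter is a leaky bucket stored as [level, time of last update].
        self.buckets = [[[0.0, 0.0] for _ in range(width)] for _ in range(stages)]

    def _index(self, key, flow_id):
        digest = hashlib.blake2b(flow_id.encode(), key=key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def process(self, flow_id, size, now):
        """Adds the packet to one bucket per stage; True if every bucket exceeds beta."""
        flagged = True
        for key, stage in zip(self.keys, self.buckets):
            bucket = stage[self._index(key, flow_id)]
            bucket[0] = max(0.0, bucket[0] - self.gamma * (now - bucket[1])) + size
            bucket[1] = now
            flagged = flagged and bucket[0] > self.beta   # all stages are still updated
        return flagged


heavy = AMF(stages=3, width=64, gamma=100.0, beta=500.0)
heavy_flags = [heavy.process("heavy", 100, 0.5 * i) for i in range(20)]   # 200 B/s
light = AMF(stages=3, width=64, gamma=100.0, beta=500.0)
light_flags = [light.process("light", 100, 2.0 * i) for i in range(20)]    # 50 B/s
```

Because unrelated flows can share a bucket in every stage, their combined traffic can push an honest flow over $\beta$ in all stages at once; this is the source of AMF's false positives, and it becomes more likely as the counter arrays get smaller relative to the number of flows.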

Flow Memory.

Flow Memory (FM) [15] refers to per-flow monitoring of select flows. FM is often used in conjunction with another system that specifies which flows to monitor; when a new flow is to be monitored but the flow memory is full, FM evicts an old flow. We follow the random eviction policy of Estan and Varghese [15]. If the flow memory is large enough to avoid eviction, it provides exact detection. In practice, however, Flow Memory is unable to handle a large number of flows, resulting in frequent flow eviction and potentially many FNs. The analysis in Appendix 0.B.1 shows that FM’s real-world performance depends on the amount by which a large flow exceeds the flow specification: high-rate flows are detected more quickly, which improves their chance of being detected before eviction.
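A minimal sketch of Flow Memory with random eviction follows. It assumes, for simplicity, that every new flow is admitted (rather than being selected by a companion system) and that each monitored flow is tracked with its own $(\gamma, \beta)$ leaky bucket; the class and parameter names are hypothetical. An evicted flow loses its accumulated history, which is why frequent eviction leads to false negatives.

```python
import random


class FlowMemory:
    """Per-flow leaky buckets for a bounded set of flows, with random eviction."""

    def __init__(self, capacity, gamma, beta, seed=0):
        self.capacity, self.gamma, self.beta = capacity, gamma, beta
        self.table = {}                     # flow_id -> [level, last update time]
        self.rng = random.Random(seed)

    def process(self, flow_id, size, now):
        """Returns True if flow_id is detected as exceeding (gamma, beta)."""
        if flow_id not in self.table:
            if len(self.table) >= self.capacity:
                victim = self.rng.choice(list(self.table))
                del self.table[victim]      # random eviction: the victim's history is lost
            self.table[flow_id] = [0.0, now]
        entry = self.table[flow_id]
        entry[0] = max(0.0, entry[0] - self.gamma * (now - entry[1])) + size
        entry[1] = now
        return entry[0] > self.beta


# A 200 B/s flow is detected when it keeps its entry ...
alone = FlowMemory(capacity=2, gamma=100.0, beta=500.0)
detected_alone = any(alone.process("heavy", 100, 0.5 * i) for i in range(20))

# ... but never when a second flow keeps evicting it from a one-entry table.
churn = FlowMemory(capacity=1, gamma=100.0, beta=500.0)
detected_with_churn = False
for i in range(20):
    detected_with_churn |= churn.process("heavy", 100, 0.5 * i)
    churn.process("other", 10, 0.5 * i + 0.25)   # table is full, so "heavy" is evicted
```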

Algorithm                 EARDet   AMF   FM
No-FP                     yes      no    yes
No-FN (low-rate)          no       yes   no
No-FN (high-rate)         yes      yes   yes
Instant detection         yes      yes   yes

Notes: Appendix 0.B.1 and 0.B.2 show that Flow Memory has high FN and AMF has high FP for low-rate large flows when memory is limited. EARDet cannot provide no-FN for low-rate large flows when memory is limited. Flow Memory has nearly zero FN when the large-flow rate is high.

Table 1: Comparison of three existing detection algorithms. None of them achieve all desired properties.

3.2 Advantages of Hybrid Schemes

As Table 1 shows, none of the detectors we examined can efficiently achieve no-FN and no-FP across various types of large flows. However, different detectors exhibit different strengths, so combining them could result in improved performance.

One approach is to run detectors sequentially; in this composition, the first detector monitors all traffic and sends any large flows it detects to a second detector. However, this approach allows an attacker controlling multiple flows to rotate overuse among many flows: each flow overuses only for as long as it takes the first detector to react, then sends at the normal rate until the remaining detectors remove it from their watch lists, and then restarts the attack.

Alternatively, we can run detectors in parallel: the hybrid detects a flow whenever it is identified by either detector. (Another configuration is that a flow is only detected if both detectors identify it, but such a configuration would have a high FN rate compared to the detectors used in this paper.) The hybrid inherits the FPs of both schemes, but features the minimum detection delay of the two schemes and has a FN only when both schemes have a FN. The remainder of this paper considers the parallel approach that identifies a flow whenever it is detected by either detector.
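The parallel composition rule can be summarized with two small helper functions. The names are hypothetical, and each detector is modeled only by the time at which it flags a given flow and by the set of flows it flags.

```python
from typing import Optional, Sequence, Set


def hybrid_detection_time(times: Sequence[Optional[float]]) -> Optional[float]:
    """Detection time of a parallel OR-composition for one flow.

    `times` holds each detector's detection time, or None if it never
    detects the flow. The hybrid has an FN only if every entry is None.
    """
    fired = [t for t in times if t is not None]
    return min(fired) if fired else None


def hybrid_flagged(*flagged: Set[str]) -> Set[str]:
    """Flows flagged by the hybrid: the union, so the FPs of all detectors add up."""
    return set().union(*flagged)
```

For example, if one detector never catches a flow and the other catches it after 3 s, the hybrid catches it after 3 s; if both miss it, the hybrid misses it too.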

The EARDet and Flow Memory schemes have no FPs and are able to quickly detect high-rate flows; because high-rate flows cause damage much more quickly, rapid detection of high-rate flows is important to achieving low damage. Combining EARDet or Flow Memory with a scheme capable of detecting low-rate flows as a hybrid detection scheme can retain rapid detection of high-rate flows while eventually catching (and thus limiting the damage of) low-rate flows. In this paper, we aim to construct such a scheme. Specifically, our scheme will selectively monitor one small set at a time, ensuring that a consistently-overusing flow is eventually detected.

4 RLFD and CLEF Hybrid Schemes

In this section, we present our new large-flow detectors. First, we describe the Recursive Large-Flow Detection (RLFD) algorithm, a novel approach which is designed to use very little memory but provide eventual detection for large flows. We then present the data structures, runtime analysis, and advantages and disadvantages of RLFD. Next, we develop a hybrid detector, CLEF, that addresses the disadvantages of RLFD by combining it with the previously proposed EARDet [42]. CLEF uses EARDet to rapidly detect high-rate flows and RLFD to detect low-rate flows, thus limiting the damage caused by large flows, even with a very limited amount of memory.

4.1 RLFD Algorithm

RLFD is a randomized algorithm designed to perform memory-efficient detection of low-rate large flows; it is designed to scale to the large number of flows encountered by an Internet core router, and to limit the damage inflicted by low-rate large flows while using very little memory. The intuition behind RLFD is to monitor subsets of flows, recursively subdividing the subset deemed most likely to contain a large flow. By dividing subsets in this way, RLFD exponentially reduces memory requirements: with only $m$ counters in memory, a tree of depth $L$ distinguishes among $m^L$ groups of flows.

The main challenges addressed by RLFD include efficiently mapping flows into recursively divided groups, choosing the correct subdivision to reduce detection delay and FNs, and configuring RLFD to guarantee the absence of FPs.

Recursive subdivision.

To operate with limited memory, RLFD recursively subdivides monitored flows into groups, and subdivides only the one group most likely to contain a large flow.

We can depict an RLFD as a virtual counter tree (Figure 2(a)) of depth $L$. Every non-leaf node in this tree has $m$ children, each of which corresponds to a virtual counter. The tree is a full $m$-ary tree of depth $L$, though at any moment only one node ($m$ counters) is kept in memory; the rest of the tree exists only virtually. (The terms “counter tree” and “virtual counter” are also used by Chen et al. [9], but our technique differs in both approach and goal: Chen et al. efficiently manage a sufficient number of counters for per-flow accounting, while RLFD manages an insufficient number of counters to detect consistent overuse.)

Each flow is randomly assigned to a path of counters on the virtual tree, as illustrated by the highlighted counters in Figure 2(b). This mapping is determined by hashing a flow ID with a keyed hash function, where the key is randomly generated by each router. Section 4.2 explains how RLFD efficiently implements this random mapping.

Figure 2: RLFD Structure and Example. Panels: (a) Virtual Counter Tree (full $m$-branch tree); (b) A Tree; (c) Example.

Since there are $L$ levels, each leaf node at level $L$ will contain an average of $n/m^L$ flows, where $n$ is the total number of flows on the link. A flow is identified as a large flow if it is the only flow associated with its counter at level $L$ and the counter value exceeds a threshold $TH_{RL}$. To reflect the flow specification from Section 2.1, we set $TH_{RL} = TH(t_l) = \gamma t_l + \beta$, where $t_l$ is the duration of the period during which detection is performed at the bottom level $L$. Any flow sending more traffic than $TH_{RL}$ during any duration of time $t_l$ must violate the large-flow threshold $TH(t_l)$, so RLFD has no FPs. We provide more details about how we balance detection rate and the no-FP guarantee in Appendix 0.A.1.

RLFD considers only one node in the virtual counter tree at a time, so it requires only $m$ counters. To enable exploration of the entire tree, RLFD divides the process into $L$ periods; in period $l$, it loads one tree node from level $l$. Though these periods need not be of equal length, in this paper we consider periods of equal length $t_l$, which results in an RLFD detection cycle $T = L \cdot t_l$.

RLFD always chooses the root node to monitor at level 1; after monitoring at level $l$, RLFD identifies the largest counter among the $m$ counters at level $l$, and uses the node corresponding to that counter for level $l+1$. Section 5.3 shows that choosing the largest counter detects large flows with high probability.

Figure 2(c) shows a two-level example in which one flow is a low-rate large flow. In level 1, the largest counter is the one associated with the large flow and two legitimate flows. At level 2, this flow set is selected and subdivided. After the second round, the large flow is detected because it violates the counter value threshold $TH_{RL}$.

Algorithm description.

As shown in Figure 3(a), the algorithm starts at the top level, where each counter represents a child of the root node. At the beginning of each period, all counters are reset to zero. At the end of each period, the algorithm finds the counter holding the maximum value and moves to the corresponding node, so that each counter in the next period is a child of that node. Once the algorithm has processed level $L$, it repeats from the first level.

Figure 3(b) describes how RLFD processes each incoming packet. When RLFD receives a packet from flow $f$, the packet is dropped if $f$ is in the blacklist (a table that stores previously found large flows). If $f$ is not in the blacklist, RLFD hashes $f$ to the corresponding $L$ counters in the virtual counter tree (one counter per level of the tree). If one of these counters is currently loaded in memory, its value is increased by the size of the packet. At the bottom level $L$, a large flow is identified when there is only one flow in the counter and the counter value exceeds the threshold $TH_{RL}$. To increase the probability that a large flow is in a counter by itself, we choose the parameters so that the average number of flows per bottom-level counter, $n/m^L$, is small, and we use Cuckoo hashing [33] at the bottom level to reduce collisions. Once a large flow is identified, it is blacklisted; in our evaluation, we calculate the damage under the assumption that large flows are blocked immediately after being added to the blacklist.
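The level-change and packet-processing logic of Figure 3 can be sketched in Python as follows. This is a simplified, single-threaded illustration, not the authors' implementation: the class and method names are hypothetical, keyed BLAKE2b (re-keyed every detection cycle) stands in for the router's keyed hash, the flow-to-path mapping uses base-$m$ digits of a single hash value, exact per-counter flow sets replace the Cuckoo-hashed bottom level, and periods are advanced lazily when packets arrive.

```python
import hashlib
import random


class RLFD:
    """Simplified RLFD sketch (hypothetical API)."""

    def __init__(self, m, levels, period, gamma, beta, seed=0):
        self.m, self.levels, self.period = m, levels, period
        self.threshold = gamma * period + beta      # TH_RL = gamma * t_l + beta
        self.blacklist = set()
        self.rng = random.Random(seed)
        self._new_cycle(0.0)

    def _new_cycle(self, now):
        # Fresh key per detection cycle -> independent flow-to-path mapping.
        self.key = self.rng.getrandbits(128).to_bytes(16, "big")
        self.level = 0          # 0-based index of the level being monitored
        self.prefix = ()        # indices of the counters chosen at upper levels
        self._start_period(now)

    def _start_period(self, now):
        self.period_start = now
        self.counters = [0] * self.m                # all counters reset to zero
        self.flows = [set() for _ in range(self.m)]

    def _path(self, flow_id):
        # One keyed hash yields the flow's counter index at every level.
        digest = hashlib.blake2b(flow_id.encode(), key=self.key, digest_size=16).digest()
        h = int.from_bytes(digest, "big")
        path = []
        for _ in range(self.levels):
            h, idx = divmod(h, self.m)
            path.append(idx)
        return path

    def _end_period(self, boundary):
        if self.level == self.levels - 1:           # bottom level done: new cycle
            self._new_cycle(boundary)
        else:                                       # descend into the largest counter
            best = max(range(self.m), key=self.counters.__getitem__)
            self.prefix += (best,)
            self.level += 1
            self._start_period(boundary)

    def process(self, flow_id, size, now):
        """Handles one packet; returns True if the packet is dropped."""
        while now >= self.period_start + self.period:
            self._end_period(self.period_start + self.period)
        if flow_id in self.blacklist:
            return True
        path = self._path(flow_id)
        if tuple(path[:self.level]) != self.prefix:
            return False                            # counter not loaded in memory
        idx = path[self.level]
        self.counters[idx] += size
        if self.level == self.levels - 1:
            self.flows[idx].add(flow_id)
            # Large flow: alone in its bottom-level counter and above TH_RL.
            if len(self.flows[idx]) == 1 and self.counters[idx] > self.threshold:
                self.blacklist.add(flow_id)
        return False


# Example: one flow at 400 bytes/s among 20 flows at 20 bytes/s.
det = RLFD(m=4, levels=2, period=1.0, gamma=100.0, beta=50.0, seed=1)
legit = [f"legit-{i}" for i in range(20)]
t = 0.0
while t < 400.0 and "attacker" not in det.blacklist:
    det.process("attacker", 100, t)
    if int(t * 4) % 2 == 0:                         # legitimate flows send every 0.5 s
        for f in legit:
            det.process(f, 10, t)
    t += 0.25
```

In this toy setting a legitimate flow alone in a counter accumulates only 20 bytes per period, far below $TH_{RL} = 150$ bytes, so it can never be blacklisted; the attacker is caught in any cycle in which it ends up alone in its bottom-level counter.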

(a) Level Change Diagram.
(b) Packet Processing Diagram.
Figure 3: RLFD Decision Diagrams. “V.C.” stands for virtual counter.

4.2 Implementation of Hashing and Counter Checking

Hashing each flow into a path of virtual counters and checking whether any of these counters are loaded in the memory are two performance-critical operations of RLFD.

For each packet, our implementation requires only three basic operations (a hash operation, a bitwise AND operation, and a comparison over $L \log_2 m$ bits), thus requiring only $O(1)$ time and $O(1)$ space on a modern 64-bit CPU. (This is not entirely exact, as the length of the hash output has to grow with $L \log_2 m$. However, in any realistic scenario $L \log_2 m$ is small enough to be considered constant.)

A naive implementation of hashing could introduce unnecessary computation and space costs. For example, a naive implementation may maintain one hash function per virtual counter array. To check whether an incoming flow needs to be monitored, it would have to check whether the flow is hashed into the maximum-value counter at every level above the current level. This would take $O(L)$ time for checking level by level, plus space for storing all the hash functions, where $L$ is the depth of the virtual counter tree.

Figure 4: RLFD Counter Hash and In-memory Check. $H(f)$ reflects the hash-generated bin number for all levels, $M_l$ reflects a mask that includes the first $l-1$ levels, and $A$ reflects the bins selected in each of the first $l-1$ levels. Flow $f$ is in the loaded counter array exactly when $H(f) \wedge M_l = A$.

Inspired by how a network router finds the subnet of an IP address, and as Figure 4 illustrates, we map a flow to one virtual counter per level based on a single hash value. Specifically, given an incoming flow $f$, we compute $H(f)$ and then compute the bitwise AND of $H(f)$ and the mask value $M_l$ of the current level $l$. We then check whether the result equals the identifier $A$ of the currently loaded counter array (the $j$th counter array in the $l$th level). If $H(f) \wedge M_l = A$, then the virtual counter of $f$ in level $l$ is in the currently loaded counter array.

Assuming RLFD has $L$ levels and $m$ counters in each counter array, we hash a flow ID $f$ into a value $H(f)$ of $b$ bits, where $b = L \log_2 m$. We require the system designer to choose $m$ as a power of two, so that $\log_2 m$ is an integer.

For each level $i$, the bits $H(f)[(i-1)\log_2 m + 1 : i \log_2 m]$ of $H(f)$ (where $x[a:b]$ denotes the block of bits $a$ through $b$ of $x$) are the index of $f$'s virtual counter in its counter array at level $i$. As each counter array is determined by its ancestor counters, as Figure 2(b) describes, the bits $H(f)[1 : (l-1)\log_2 m]$ uniquely determine the counter array of flow $f$ in level $l$. Thus, to check whether the virtual counter of a flow is in memory, we only need to track the ancestor counters of the currently loaded counter array. We track these ancestor counters in a value $A$, which also has $b$ bits: the bits $A[1 : (l-1)\log_2 m]$ record the indices of the ancestor counters of the loaded array, and the remaining bits are all 0s. To maintain $A$, at the end of the period of level $l$ we simply set the bits $A[(l-1)\log_2 m + 1 : l \log_2 m]$ to the index of the selected counter. The mask value $M_l$ for level $l$ is also a $b$-bit value, whose first $(l-1)\log_2 m$ bits are 1s and whose remaining bits are 0s. By computing $H(f) \wedge M_l$, we extract the ancestor bits of flow $f$ and compare them with the ancestor bits $A$ of the loaded counter array. If they match, then $f$'s counter is in memory, and we increase the counter with index $H(f)[(l-1)\log_2 m + 1 : l \log_2 m]$ by the size of $f$'s packet.
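The mask-and-compare check can be sketched as follows. The helper names are hypothetical, and we assume that the "first" bits of a $b$-bit value are its most significant bits, so that the level-1 counter index occupies the top $\log_2 m$ bits of $H(f)$; the hash value itself is a hard-coded stand-in.

```python
def bits_per_level(m):
    assert m > 0 and m & (m - 1) == 0, "m must be a power of two"
    return m.bit_length() - 1                 # log2(m)


def level_mask(m, levels, level):
    """M_l: the first (l-1)*log2(m) bits are 1s, the remaining bits are 0s."""
    k = bits_per_level(m)
    b = levels * k
    ones = (level - 1) * k
    return ((1 << ones) - 1) << (b - ones)


def counter_index(h, m, levels, level):
    """Index of the flow's counter inside its level-`level` counter array."""
    k = bits_per_level(m)
    return (h >> (levels * k - level * k)) & (m - 1)


def record_choice(ancestors, selected, m, levels, level):
    """End of level `level`: write the selected counter index into A."""
    k = bits_per_level(m)
    return ancestors | (selected << (levels * k - level * k))


def in_memory(h, mask, ancestors):
    """Flow f is monitored iff (H(f) AND M_l) == A."""
    return (h & mask) == ancestors


# Example with m = 4 counters and L = 3 levels (b = 6 bits).
m, L = 4, 3
h_f = 0b10_01_11          # hypothetical H(f): indices 2, 1, 3 at levels 1, 2, 3
h_g = 0b01_01_11          # a flow whose level-1 index differs
A = record_choice(0, 2, m, L, 1)          # counter 2 was the largest at level 1
M2 = level_mask(m, L, 2)
f_monitored = in_memory(h_f, M2, A)       # True: same level-1 ancestor
g_monitored = in_memory(h_g, M2, A)       # False: different ancestor
f_index = counter_index(h_f, m, L, 2)     # 1: f's counter in the loaded array
```

At level 1 the mask is all zeros and $A = 0$, so every flow is monitored, which matches the root node covering all flows.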

For each packet, our implementation thus needs only three basic operations: a hash operation, an AND operation, and a comparison over $b$ bits. Although the number of bits used in this implementation depends on $L$ and $m$, a 64-bit integer is enough in most cases, so these operations take only O(1) CPU cycles on a modern 64-bit CPU.

4.3 RLFD Details and Optimization

We describe some of the details of RLFD and propose additional optimizations to the basic RLFD described in Section 4.1.

Hash function update.

We update the keyed hash function by choosing a new key at the beginning of every detection cycle (i.e., whenever RLFD returns to the first level), to guarantee that the assignment of flows to counters in different top-to-bottom detection cycles is independent and pseudo-random. For simplicity, in this paper we analyze RLFD assuming the random oracle model. Picking a new key is computationally inexpensive and needs to be performed only once per cycle.


Blacklist.

When RLFD identifies a large flow, the flow’s ID should be added to the blacklist as quickly as possible. Thus, we implement the blacklist with a small amount of L1 cache backed by larger, slower storage, e.g., main memory. Because blacklist writes happen only during the bottom-level period, and at most one large flow is detected in one iteration of the algorithm, we first write newly detected large flows to the L1 cache and move them from the L1 cache to the backing storage at a slower rate. Managing the blacklist in this way provides high bandwidth for blacklist writes, defending against attacks that attempt to overflow the blacklist.

Using multiple RLFDs.

If a link handles too much traffic to use a single RLFD, we can use multiple RLFDs in parallel. Each flow is hashed to a specific RLFD so that the load on each detector meets performance requirements. The memory requirements scale linearly in the number of RLFDs required to process the traffic.

4.4 RLFD Runtime Analysis

We analyze the runtime using the same CPU considered in EARDet [42]. An OC-768 (40 Gbps) high-speed link can accommodate  million mid-size ( bit) packets per second. To operate at the line rate, a modern  GHz CPU must process each packet within  CPU cycles. A modern CPU might contain  KB L1 cache,  KB L2 cache, and  MB L3 cache. It takes , , and  CPU cycles to access L1, L2, and L3 CPU cache, respectively; accessing main memory is as slow as  cycles.

If, over a -Gbps link, we conservatively pick a large-flow threshold rate  kbps, a maximum of flows can be supported. An RLFD with flows and counters per level only needs levels to get an average of flows at the bottom level, causing only a few collisions for the counters at the bottom level which will be handled by the Cuckoo hashing approach. Even if we consider a much larger number of flows, such as  million, levels results in around flows at the bottom level. In such a -level RLFD, a flow’s path through the tree will require only bits, so a -bit integer is large enough for the hash value. In practice, the threshold rate is higher than  kbps, and the number of flows is likely to be under  million.
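The sizing argument above (choose $L$ so that the average number of flows per bottom-level counter, $n/m^L$, is small, then size the hash at $L\log_2 m$ bits) can be checked with a short calculation. The parameter values in this sketch are hypothetical and are not the values used in the paper's analysis.

```python
def levels_needed(n_flows, m, target_per_leaf):
    """Smallest L such that n / m**L <= target_per_leaf."""
    L = 1
    while n_flows / m ** L > target_per_leaf:
        L += 1
    return L


def path_bits(m, L):
    """Hash bits b = L * log2(m); m must be a power of two."""
    assert m > 0 and m & (m - 1) == 0
    return L * (m.bit_length() - 1)


# Hypothetical parameters, chosen only for illustration.
n, m = 1_000_000, 256
L = levels_needed(n, m, target_per_leaf=1)
avg_per_leaf = n / m ** L
b = path_bits(m, L)
```

With these illustrative values, three levels bring the average below one flow per bottom-level counter, and the whole path fits in 24 bits, well within a single machine word.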

Computational complexity.

Based on the implementation and optimizations in Sections 4.2 and 4.3, RLFD performs the following steps on each packet: (1) a hash computation to find the flow’s path in the tree, (2) a bitwise AND operation to find the subpath down to the depth of the current period, (3) an integer comparison to determine whether the flow is part of an active counter, and (4) a counter value update if the flow is hashed into the loaded counter array. Each of these operations has $O(1)$ complexity and is fast enough to compute within the per-packet CPU cycle budget.

At the bottom level, after operations (1) to (3), RLFD performs the following steps: (5) a Cuckoo lookup/insert to find the appropriate counter, (6) a counter value update to represent the usage of a flow, (7) a large-flow check that compares the counter value with a threshold, and (8) an on-chip blacklist write if the counter has exceeded the threshold. Steps (5) to (7) are performed only on packets from the small fraction of flows that are loaded in the bottom-level array; step (8) is performed only for packets of the flows identified as large flows in step (7), and only once per flow (if large flows in the blacklist are blocked). Thus, steps (5) to (8) are executed much less frequently than steps (1) to (4). Moreover, steps (5) to (8) take constant expected time, so their cost is negligible in comparison with steps (1) to (4).

Storage complexity.

RLFD only keeps a small array of counters and a few additional variables: the hash function key, the -bit mask value for the current level, and the -bit identifier of the currently loaded counter array. Because we use Cuckoo hashing at the bottom level, besides a -bit field for the counter value, each counter entry needs to have a field for the associated flow ID key, which is bits in IPv4 and bits in IPv6. An array of  counters requires  KB in IPv4 and  KB for IPv6, which readily fits within the L1 cache. As discussed in Appendix 0.A.2, we can further shrink the flow ID field size to bits (with FP probability for each flow); if deployed, a  counter array is  KB and a  counter array is  KB for both IPv4 and IPv6, which can fit into the L1 cache ( KB).

4.5 RLFD’s Advantages and Disadvantages


Advantages.

With recursive subdivision and additional optimization techniques, RLFD is able to (1) identify low-rate large flows with non-zero probability, with probability close to 100% for flows that cause extensive damage (Section 5.3 analyzes RLFD’s detection probability); and (2) guarantee no-FP, eliminating damage due to FP.


Disadvantages.

First, a landmark-window-based algorithm such as RLFD cannot guarantee exact detection for flow specifications based on arbitrary time windows [42] (landmark windows and arbitrary windows are introduced in Section 2.1). However, this approximation results in limited damage, as mentioned in Section 3. Second, recursive subdivision based on landmark time windows requires at least one detection cycle to catch a large flow; thus, RLFD cannot guarantee low damage for flows with very high rates. Third, RLFD works most effectively when the large flow exceeds the flow specification at all levels, so bursty flows with a burst duration shorter than the RLFD detection cycle are likely to escape detection (where burst duration refers to the amount of time during which the bursty flow sends in excess of the flow specification).

4.6 CLEF Hybrid Scheme

We propose a hybrid scheme, CLEF, which is a parallel composition of one EARDet and two RLFDs (Twin-RLFD). This hybrid can detect both high-rate and low-rate large flows without producing FPs, while requiring only a limited amount of memory. We use EARDet instead of Flow Memory in this hybrid scheme because EARDet’s detection is deterministic and thus has a shorter detection delay.

Parallel composition of EARDet and RLFD.

As described in Section 3.2, we combine EARDet and RLFD in parallel so that RLFD can help EARDet detect low-rate flat flows, and EARDet can help RLFD quickly catch high-rate flat and bursty flows.
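Parallel composition means that every component detector observes the same packet stream and a flow is blacklisted as soon as any component flags it. The sketch below shows that wiring under a hypothetical detector interface (a callable taking flow ID, packet size, and timestamp); it also assumes that traffic from already-blacklisted flows is filtered before reaching the detectors.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical detector interface: consume one packet, return True to flag its flow.
Detector = Callable[[str, int, float], bool]


def parallel_compose(detectors: List[Detector],
                     packets: Iterable[Tuple[str, int, float]]) -> Set[str]:
    """Feed every packet to every detector and blacklist a flow as soon as any
    detector flags it (a logical OR of the component detectors)."""
    blacklist: Set[str] = set()
    for flow, size, ts in packets:
        if flow in blacklist:
            continue  # assumption: blacklisted traffic is filtered before detection
        # Build the full list first so that every detector observes the packet;
        # a bare generator inside any() would short-circuit and starve later detectors.
        alarms = [detect(flow, size, ts) for detect in detectors]
        if any(alarms):
            blacklist.add(flow)
    return blacklist
```

The design choice that matters is that no component is skipped: RLFD's counters and EARDet's state must both be updated by every packet, even when one of them has already raised an alarm for a different flow.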

Twin-RLFD parallel composition.

RLFD is most effective at catching flows that violate the flow specification across an entire detection cycle. An attacker can reduce the probability of being caught by RLFD by choosing a burst duration shorter than the detection cycle and an inter-burst duration longer than it (thus reducing the probability that the attacker will advance to the next round during its inter-burst period). We therefore introduce a second RLFD with a longer detection cycle, so that to avoid detection by the Twin-RLFD, a flow must have a burst duration shorter than the detection cycle of the first RLFD and a burst period longer than the detection cycle of the second RLFD. For a given average rate, flows that evade Twin-RLFD therefore have a higher burst rate than flows that evade a single RLFD. By properly setting the two detection cycles, Twin-RLFD can complement EARDet, ensuring that a flow undetectable by Twin-RLFD must burst at a rate higher than EARDet's high-rate threshold.
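The burst-rate argument can be made concrete. Write T1 and T2 for the detection cycles of the first and second RLFD (our labels). Reading the evasion conditions literally, an evading flow has burst duration d < T1 and burst period P > T2, so its duty cycle d/P is below T1/T2 and its burst rate R/(d/P) exceeds R·T2/T1 for average rate R. A flow evading a single RLFD only needs d < T and an inter-burst gap above T, i.e., a duty cycle below 1/2, so its burst rate merely exceeds 2R. The sketch below checks whether, under these simplified conditions and ignoring EARDet's burst allowance, every evader above the threshold rate must burst faster than EARDet's high-rate threshold; all names are hypothetical.

```python
def min_evading_burst_rate(avg_rate: float, cycle1: float, cycle2: float) -> float:
    """Exclusive lower bound on the burst rate of a flow that evades Twin-RLFD.

    An evading flow has burst duration < cycle1 and burst period > cycle2, so its
    duty cycle is below cycle1 / cycle2 and its burst rate avg_rate / duty
    exceeds avg_rate * cycle2 / cycle1.
    """
    if not 0 < cycle1 < cycle2:
        raise ValueError("expected 0 < cycle1 < cycle2")
    return avg_rate * cycle2 / cycle1


def eardet_covers_evaders(gamma: float, gamma_high: float,
                          cycle1: float, cycle2: float) -> bool:
    """True if every Twin-RLFD evader whose average rate exceeds gamma bursts
    above EARDet's high-rate threshold gamma_high (ignoring burst allowances)."""
    return min_evading_burst_rate(gamma, cycle1, cycle2) >= gamma_high


# Hypothetical numbers: the cycle ratio must be at least gamma_high / gamma.
print(eardet_covers_evaders(1e3, 1e5, cycle1=1.0, cycle2=100.0))
print(eardet_covers_evaders(1e3, 1e5, cycle1=1.0, cycle2=50.0))
```

In this simplified view, "properly setting" the two cycles amounts to choosing their ratio to be at least the ratio between EARDet's high-rate threshold and the flow-specification rate.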

Timing randomization.

An attacker can strategically send traffic with burst durations shorter than the detection cycle of the first RLFD, but choose low duty cycles to avoid detection by both the first RLFD and EARDet. Such an attacker can only be detected by the second RLFD, which has a longer detection delay, allowing the attacker to maximize damage before being blacklisted. To prevent attackers from deterministically maximizing damage, we randomize the lengths of both detection cycles.
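The section does not specify how the cycle lengths are randomized. As one possible sketch, each cycle's length can be scaled by a uniform random factor, so that an attacker cannot predict cycle boundaries and align its bursts to them. The uniform multiplicative jitter and the parameter names are assumptions for illustration.

```python
import random
from typing import List


def randomized_cycle_lengths(level_period: float, levels: int, jitter: float,
                             rng: random.Random, n_cycles: int) -> List[float]:
    """Draw detection-cycle lengths levels * level_period * U(1 - jitter, 1 + jitter).

    The uniform multiplicative jitter is a hypothetical choice: an attacker who
    cannot predict cycle boundaries cannot time its bursts against them.
    """
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1)")
    base = levels * level_period
    return [base * rng.uniform(1 - jitter, 1 + jitter) for _ in range(n_cycles)]


print(randomized_cycle_lengths(2.0, 3, 0.25, random.Random(7), 3))
```

A secret per-router random seed would keep the sequence unpredictable to outsiders while still allowing reproducible experiments.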

5 Theoretical Analysis

Generic notations:
Rate of (outbound) link
Rate and burst thresholds of the flow specification
Duty cycle of bursty flows
Burst period
Average large-flow rate, and
Number of legitimate flows
Maximum number of legitimate flows at rate
Number of counters available in a detector
EARDet high-rate threshold rate
Expected overuse damage
RLFD notations:
Number of levels
Number of legitimate flows in the level
Time period of a detection level
Detection cycle
Detection prob. for flows with
When , approximately
When , approximately
Table 2: Table of Notations.

In this section, we discuss RLFD’s performance and its large-flow detection probability. We then compare CLEF with state-of-the-art schemes, considering various types of large flows under CLEF’s worst-case background traffic. Due to limited space, some derivations are in the appendix.

Detection probability.

Single-level detection probability is the probability that an RLFD selects a correct counter (one containing at least one large flow) for the next level. Total detection probability is the probability that one copy of RLFD catches a large flow in a detection cycle; it is the product of the single-level detection probabilities across all levels in the cycle, minus the probability that two large flows are assigned to the highest counter at the last level. This subtracted term is small and negligible when the number of levels is large enough.
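The definition of total detection probability translates directly into a product. The sketch below computes it and, assuming that detection in different cycles is independent (plausible if the hash key is refreshed every cycle, but an assumption here), the resulting mean detection delay, since the number of cycles until detection is then geometric with mean 1/p. The per-level probabilities in the example are made up.

```python
import math
from typing import Sequence


def total_detection_probability(level_probs: Sequence[float],
                                two_flow_collision: float = 0.0) -> float:
    """Per-cycle detection probability: the product of the single-level selection
    probabilities, minus the (usually negligible) probability that two large
    flows share the highest counter at the last level."""
    return math.prod(level_probs) - two_flow_collision


def expected_detection_delay(cycle_length: float, p_cycle: float) -> float:
    """Mean time to detection if each cycle detects independently with p_cycle
    (geometric number of cycles) and detection happens at the end of a cycle."""
    if not 0 < p_cycle <= 1:
        raise ValueError("p_cycle must be in (0, 1]")
    return cycle_length / p_cycle


p = total_detection_probability([0.9, 0.8, 0.95])  # hypothetical per-level values
print(round(p, 3), round(expected_detection_delay(3.0, p), 2))
```

Because the per-level probabilities multiply, even moderately reliable levels compound into a noticeably lower per-cycle probability, which is why the analysis focuses on lower-bounding each single-level probability.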

5.1 RLFD Worst-case Background Traffic

Since our goal is to minimize worst-case damage, we assume worst-case background traffic against RLFD in the rest of the analysis. Given a large flow, the worst-case background traffic is the legitimate traffic pattern that maximizes the damage caused by the large flow. Since damage in RLFD increases with the expected detection delay (and thus decreases with the single-level detection probability), we derive the worst-case background traffic by finding the minimum single-level detection probability for each level of RLFD. Theorem 5.1 states that the worst-case background traffic consists of threshold-rate legitimate flows fully utilizing the outbound link. The proof and discussion of Theorem 5.1 are presented in Appendix 0.B.3.

Theorem 5.1

On a link with a threshold rate and an outbound link capacity , given an attack large flow , RLFD has the lowest probability of selecting the counter containing for the next level when there are legitimate flows, each at the rate of .

Figure 11 in Appendix 0.B.3 presents single-level detection probabilities for several different background traffic patterns, which empirically validate our theorem.

5.2 Characterizing Large Flows

To systematically compare CLEF with other detectors under various types of attack flows, we categorize large flows based on three characteristics, as Figure 5 illustrates:

  1. Burst Period (). A large flow sends a burst of traffic in a period of .

  2. Duty Cycle (). In each period of length , a large flow only sends packets during a continuous time period of and remains silent during the rest of the period.

  3. Average Rate (). This is the average volume of traffic sent by a large flow per second over a time interval much longer than the burst period . The instantaneous rate during a burst is .

Figure 5: Flow with average rate , burst period , duty cycle .
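The three characteristics above fully determine a simple traffic model. The sketch below computes the cumulative bytes a flow has sent by time t, given its average rate, burst period, and duty cycle; a flat flow is the special case of duty cycle 1. It assumes each burst starts at the beginning of its period and is sent at a constant instantaneous rate, which is our simplification.

```python
def bytes_sent(avg_rate: float, period: float, duty: float, t: float) -> float:
    """Cumulative bytes sent in [0, t] by a flow with average rate avg_rate,
    burst period `period`, and duty cycle `duty` (0 < duty <= 1).

    Assumes each burst starts at the beginning of its period and is sent at the
    constant instantaneous rate avg_rate / duty for duty * period seconds.
    """
    if not 0 < duty <= 1 or period <= 0 or t < 0:
        raise ValueError("invalid flow parameters")
    burst_rate = avg_rate / duty
    burst_len = duty * period
    full_periods, offset = divmod(t, period)
    return full_periods * burst_rate * burst_len + burst_rate * min(offset, burst_len)


# A flat flow is the special case duty == 1.
print(bytes_sent(100.0, 1.0, 1.0, 2.5))   # 250.0
# A bursty flow with the same average rate bursts at 4x that rate.
print(bytes_sent(100.0, 2.0, 0.25, 0.5))  # 200.0, its whole per-period budget
```

The second example shows why a Shrew-style flow is hard for landmark windows: over long windows it averages 100 bytes/s, yet within each short burst it sends at 400 bytes/s.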

By remaining silent between bursts, attacks such as the Shrew attack [22] keep the average rate lower than the detection threshold to evade the detection algorithms based on landmark windows [30, 13, 19, 28, 29, 15, 16, 12].

A large flow may switch between different characteristic patterns over time, including patterns that comply with the flow specification. The total damage in this case can be computed by adding up the damage inflicted by the large flow under each pattern it exhibits. Hence, for the purpose of the analysis, we focus our discussion on large flows with fixed characteristic patterns.

5.3 RLFD Detection Probability for Flat Flows

To detect a flat () large flow, RLFD must be able to observe the flow's traffic at every detection level.

The probability that RLFD catches one large flow in a detection cycle increases with the number of large flows passing through RLFD. Because a greater number of large flows implies that more counters may contain large flows in each level, RLFD has a higher chance of correctly selecting counters with large flows in the recursive subdivision. We therefore discuss the worst-case scenario for RLFD where only one large flow is present.

Because all levels of RLFD except the bottom one operate identically and differ only in the set of flows hashed to the counter array, we first analyze detection in a single level and then extend the analysis to the whole detection cycle. Additional numeric examples are provided in Appendix 0.B.4.

Single-level detection probability.

Given that the total number of flows traversing the link is , we can estimate the expected number of flows in the th level as , where is the number of counters. Since depends only on the total number of flows and not on the traffic distribution, we analyze single-level detection with legitimate flows, counters, and a large flow at the rate of , where is the threshold rate and . When the context is clear, we use to stand for in the discussion of single-level detection.
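As a rough illustration of how the number of candidate flows shrinks, the sketch below assumes that each level keeps the flows of one of m equally likely counters, so level k is expected to hold n/m^(k-1) flows, and picks the smallest number of levels whose bottom level is expected to hold fewer flows than counters. This formula is our assumption for illustration; the example link (1 Gbps, i.e., 125 MB/s) and the threshold rate of 12.5 KB/s are hypothetical, with n approximated by the link rate divided by the threshold rate as in the worst case of Theorem 5.1.

```python
from typing import List


def expected_flows_per_level(n: int, m: int, levels: int) -> List[float]:
    """Expected number of flows hashed into level k = 1..levels, assuming each
    level keeps the flows of one of m equally likely counters: n / m**(k-1)."""
    return [n / m ** (k - 1) for k in range(1, levels + 1)]


def levels_needed(n: int, m: int) -> int:
    """Smallest L whose bottom level is expected to hold fewer than m flows,
    i.e. the smallest L with n / m**(L-1) < m (equivalently n < m**L)."""
    if m < 2 or n < 0:
        raise ValueError("need m >= 2 and n >= 0")
    levels = 1
    while m ** levels <= n:
        levels += 1
    return levels


# Hypothetical worst case: a 1 Gbps link filled with flows at 12.5 KB/s.
n = int(125e6 / 12.5e3)                     # 10000 flows
print(levels_needed(n, 100))                # 3
print(expected_flows_per_level(n, 100, 3))  # [10000.0, 100.0, 1.0]
```

Adding one level multiplies the number of flows RLFD can handle by m, which is why RLFD can trade levels (and thus detection delay) for memory.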

According to Theorem 5.1, the worst-case background traffic is that all legitimate flows are at the threshold rate ; Theorem 5.2 shows an approximate lower bound of the single-level detection probability in such worst-case background traffic. The proof of Theorem 5.2 and its Corollaries 1 and 2 are presented in Appendix 0.C.1.

Theorem 5.2

Given counters in a level, legitimate flows at full rate , and a large flow with an average rate of , the probability that RLFD will correctly select the counter containing the large flow has an approximate lower bound of , where ; is the cumulative distribution function (CDF) of the Poisson distribution .


Corollary 1

For a detection level with legitimate flows, counters, and a large flow at the average rate of , the probability that RLFD will correctly select the counter containing has an approximate lower bound of , where .

Corollary 2

For a detection level with legitimate flows, counters, and a large flow at the average rate of , the probability that RLFD will correctly select the counter containing has an approximate lower bound of , where .
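Theorem 5.2 and its corollaries give analytical lower bounds. As an independent sanity check, the same single-level probability can be estimated by simulation under the worst-case background of Theorem 5.1: legitimate flows all at the threshold rate are hashed uniformly into the counters, one large flow is added, and we count how often its counter has the maximum value. This Monte Carlo estimator is our own illustration, not the paper's derivation; it normalizes the level period to 1 so that each flow contributes its rate to its counter.

```python
import random


def single_level_detection_prob(n_legit: int, m: int, gamma: float,
                                large_rate: float, trials: int,
                                seed: int = 0) -> float:
    """Monte Carlo estimate of the probability that the counter holding one
    large flow is selected (it has the maximum value; ties broken uniformly).

    Worst-case background per Theorem 5.1: n_legit flows, all at rate gamma.
    The level period is normalized to 1, so each flow adds its rate.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counters = [0.0] * m
        for _ in range(n_legit):
            counters[rng.randrange(m)] += gamma  # uniform hashing of legit flows
        target = rng.randrange(m)               # counter of the large flow
        counters[target] += large_rate
        top = max(counters)
        winners = [i for i, c in enumerate(counters) if c == top]
        if rng.choice(winners) == target:
            hits += 1
    return hits / trials


for ratio in (1.5, 2, 4, 8):
    p = single_level_detection_prob(1000, 100, 1.0, ratio * 1.0, trials=500)
    print(f"large flow at {ratio}x threshold: p ~ {p:.3f}")
```

Comparing such estimates against the closed-form bound for the same parameters is a cheap way to catch mistakes when instantiating the theorem numerically.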

Total detection probability.

Theorem 5.3 describes the total probability of detecting a large flow in one detection cycle. Detailed proof is provided in Appendix 0.C.2.

Theorem 5.3

When there are legitimate flows and a flat large flow at the rate of , the total detection probability of an RLFD with counters has an approximate lower bound:


where , , and is the CDF of the Poisson distribution .

5.4 Twin-RLFD Theoretical Overuse Damage

To evaluate RLFD's performance, we derive a theoretical bound on the damage caused by large flows under RLFD. Recall that there are two sources of damage: FP damage and overuse damage . Because RLFD has no FPs, its FP damage is zero, so we only analyze overuse damage theoretically.

Theorem 5.4 shows the expected overuse damage for flat flows and bursty flows against a Twin-RLFD. The proof is presented in Appendix 0.C.3. Additional numeric examples are in Appendix 0.B.5.

Theorem 5.4

A Twin-RLFD whose two RLFDs have detection cycles and , respectively, can detect bursty flows at an average rate , where is the high-rate threshold of EARDet. The expected overuse damage caused by such flows has the following upper bound:




and , ( when , and when ). Here, is the number of levels in RLFD, and is the CDF of the Poisson distribution . The damage of a flat flow is given by the special case of and .

We can see that a properly configured Twin-RLFD can detect bursty flows that EARDet cannot detect (i.e., flows at average rate ).

Algorithm FP Overuse Damage (MB)
Damage Low-rate Large Flow High-rate Large Flow
Individual Twin-RLFD 0
Hybrid CLEF 0

Comparison on a Gbps link with threshold rate Kbps. Each of Twin-RLFD, EARDet, FM, and AMF has counters (each single RLFD has counters), and thus each of CLEF and AMF-FM has counters. In Twin-RLFD and CLEF, the detection cycles are sec and sec, and the number of levels is . Attack flows are bursty flows with a duty cycle of . The reasons for this Twin-RLFD configuration are given in Appendix 0.B.5.

The overuse damage for FM is treated as infinity, due to the extremely low detection probability.

Table 3: Theoretical Comparison. CLEF outperforms other detectors with lower large flow damage. Damage in megabyte (MB).

5.5 Theoretical Comparison

We compare the CLEF hybrid scheme with its most relevant competitor, the AMF-FM hybrid scheme [15], which runs an AMF and an FM sequentially: all traffic is first sent to the AMF, and the AMF sends detected large flows (including FPs) to the FM to eliminate FPs. For completeness, we also present the results of individual detectors, including Twin-RLFD, EARDet, AMF, and Flow Memory (FM). Table 3 summarizes the damage inflicted by different large-flow patterns when different detectors are deployed. The damage is calculated according to the analyses of AMF (Appendix 0.B.2), FM (Appendix 0.B.1), EARDet [42], and Twin-RLFD (Section 5.4). Figures 13(c) and 13(e) in Appendix 0.B.5 provide more details about Twin-RLFD's overuse damage presented in Table 3.

Comparison setting.

To compare detectors in an in-core router setting, we allocate only counters for each detector, and we allocate counters for each RLFD in the Twin-RLFD for a fair comparison. Each hybrid scheme has counters in total to ensure a fair comparison between hybrid schemes.

We consider both high-rate large flows () and low-rate large flows (). is the minimum rate at which detection is guaranteed by EARDet, FM, and AMF-FM: . Low-rate large flows are further divided into three rate intervals for thorough comparison. For each rate interval, we consider the worst-case () and non-worst-case () burst length. The duty cycle of the bursty flow is set to , which is challenging for CLEF. Given an average rate , if is close to (close to ), a bursty flow is easily detected by EARDet (Twin-RLFD) in CLEF.

CLEF ensures lower damage.

As shown in Table 3, Twin-RLFD and CLEF outperform other detectors for identifying a wide range of low-rate flows. However, due to limited memory, it remains challenging for Twin-RLFD and CLEF to effectively detect large flows that are extremely close to the threshold.

We can see that Twin-RLFD fails to limit the damage caused by high-rate large flows, because the overuse damage of high-rate flows is linear in (due to the minimum detection delay of one cycle). Thus, CLEF uses EARDet to limit the damage caused by high-rate flows. CLEF also outperforms the AMF-FM hybrid scheme: with limited memory, AMF produces too many FPs to narrow down the traffic passed to the downstream FM, so FM's performance does not improve.
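The linear growth for high-rate flows can be seen from a much cruder model than Theorem 5.4. Assume a flat flow overuses its excess rate over the threshold until it is blacklisted at the end of a cycle, and that cycles detect independently with some probability, so the expected delay is the cycle length divided by that probability. Even when the probability is close to 1, the delay is at least one cycle, so the damage grows linearly with the flow's rate. This simplified model is our illustration, not the paper's bound; names and numbers are hypothetical.

```python
def approx_overuse_damage(avg_rate: float, gamma: float, cycle: float,
                          p_cycle: float) -> float:
    """Rough expected overuse damage (bytes) of a flat large flow.

    Simplified model (not Theorem 5.4): the flow overuses avg_rate - gamma
    bytes/s until it is blacklisted at the end of a cycle; cycles detect
    independently with probability p_cycle, so the expected delay is cycle / p_cycle.
    """
    if not 0 < p_cycle <= 1:
        raise ValueError("p_cycle must be in (0, 1]")
    return max(0.0, avg_rate - gamma) * cycle / p_cycle


# With p_cycle ~ 1 for very fast flows, damage still grows linearly with the rate.
for rate in (1e5, 2e5, 4e5):
    print(rate, approx_overuse_damage(rate, 1e3, cycle=4.0, p_cycle=1.0))
```

EARDet removes this one-cycle floor for high-rate flows, which is the motivation for placing it alongside Twin-RLFD.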

Figure 6: Minimum Rate of Guaranteed Detection (shown as in figures) for flat large flows (), when link capacity and , where is the threshold rate; panels (a) and (b) show the two link-capacity settings. Twin-RLFD and CLEF have a much lower rate of guaranteed detection than other schemes when memory is limited.

CLEF is memory-efficient.

We now consider the minimum rate of guaranteed detection () for flat flows, i.e., the rate above which a detector is guaranteed to catch a flat large flow (). The of Twin-RLFD and CLEF is bounded from above by (derived from Corollary 2), which is much lower than the for EARDet and the for FM and AMF-FM. This is especially true when memory is extremely limited (i.e., ), where is the maximum number of legitimate flows at the threshold rate , and is the number of counters for each individual detector (each RLFD in Twin-RLFD has counters).

Figures 6(a) and 6(b) compare the amongst these three detectors given two link capacities: 1) (i.e., ), and 2) (i.e., ). The results suggest that Twin-RLFD and CLEF have a much lower than that of other detectors when memory is limited, and the is insensitive to memory size because RLFD can add levels to overcome memory shortage.

For bursty flows, CLEF's is competitive with that of AMF-FM, thanks to EARDet.

6 Evaluation

We experimentally evaluate CLEF, RLFD, EARDet, and AMF-FM with respect to worst-case damage [41, Sec. 5.1]. We consider various large-flow patterns and memory limits and assume background traffic that is challenging for CLEF and RLFD. The experimental results confirm that CLEF outperforms other schemes, especially when memory is extremely limited.

6.1 Experiment Settings

Link settings.

Since the memory required by a large-flow detector is sublinear in link capacity, we set the link capacity to Gbps, which is high enough to accommodate the realistic background traffic dataset while ensuring that the simulation finishes in reasonable time. We choose a very low threshold rate KB/s, so that the number of full-use legitimate flows is , making the link as challenging as a backbone link (as analyzed in Section 4.4). The flow specification is set to , where is set to 3028 bytes (the size of just two maximum-sized packets, which makes bursty flows easier to catch).
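A flow specification of this kind, "at most gamma * t + beta bytes in any window of length t" (gamma and beta are our labels for the stripped rate and burst symbols), can be checked exactly with a leaky bucket that drains at the rate threshold and holds at most the burst threshold; the AMF-FM counters described below are leaky buckets of the same form. The sketch returns the first packet at which some window violates the specification. The 3028-byte burst threshold comes from the text; the packet traces are made up.

```python
from typing import Iterable, Optional, Tuple


def first_violation(packets: Iterable[Tuple[float, int]], gamma: float,
                    beta: float) -> Optional[int]:
    """Index of the first packet at which the flow has sent more than
    gamma * t + beta bytes in some window of length t, else None.

    Uses a leaky bucket that drains at gamma bytes/s with capacity beta;
    `packets` holds (timestamp_seconds, size_bytes) pairs in time order.
    """
    level = 0.0
    last_t: Optional[float] = None
    for i, (t, size) in enumerate(packets):
        if last_t is not None:
            level = max(0.0, level - gamma * (t - last_t))  # drain since last packet
        level += size
        last_t = t
        if level > beta:
            return i
    return None


BETA = 3028  # two maximum-sized (1514-byte) Ethernet frames, as in the text
print(first_violation([(0.0, 1514)] * 2, gamma=1000.0, beta=BETA))  # None
print(first_violation([(0.0, 1514)] * 3, gamma=1000.0, beta=BETA))  # 2
```

The bucket level after each packet equals the largest excess over gamma * t of any window ending at that packet, so the check covers arbitrary windows rather than only landmark windows.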

The results on this 1 Gbps link allow us to extrapolate detector performance to high-capacity core routers, e.g., a 100 Gbps link with MB/s. Because CLEF's performance with a given number of counters depends mainly on the ratio between link capacity and threshold rate (as discussed in Section 5.3), CLEF's worst-case damage will scale linearly with link capacity when the number of counters and the ratio between link capacity and threshold rate are held constant. AMF-FM, on the other hand, performs worse as the number of flows increases (according to Appendices 0.B.2 and 0.B.1). Thus, with increasing link capacity, AMF-FM may face more actual flows, resulting in worse performance; in other words, AMF-FM's worst-case damage may be superlinear in link capacity. As a result, if CLEF outperforms AMF-FM on small links, CLEF will outperform AMF-FM by at least as large a ratio on larger links.

Background traffic.

We consider the worst-case background traffic for RLFD and CLEF, as determined by Theorem 5.1. Aside from attack traffic, the rest of the link capacity is completely filled with full-use legitimate flows running at the threshold rate KB/s. The total number of attack flows and full-use legitimate flows is . Once a flow has been blacklisted by the large-flow detectors, we fill the freed bandwidth with a new full-use legitimate flow, so that the link always carries the worst-case background traffic.

Attack traffic.

We evaluate each detector against large flows with various average rates and duty cycles . Their burst period is set to s. To evaluate RLFD and CLEF against their worst-case bursty flows (), large flows are given a relatively short burst period of s, where s is the period of each detection level in the single RLFD. In CLEF, the first RLFD uses the same detection level period of s. Since RLFD usually has levels and , it is easy for attack flows to meet .

In each experiment, we have artificial large flows whose rates range from KB/s to MB/s (namely, to times the threshold rate ). The fewer large flows on the link, the longer the delay required for RLFD and CLEF to catch them; however, the easier it is for AMF-FM to detect them, because AMF produces fewer FPs and FM evicts flows more frequently. Thus, we use attack flows to challenge CLEF, so the results are generalizable.

Detector settings

We evaluate detectors with different numbers of counters () to understand their performance under different memory limits. Although a few thousand counters are available in a typical CPU, not all can be used by one detector scheme. CLEF works reasonably well with such a small number of counters and can perform better when more counters are available.

  • EARDet. We set the low-bandwidth threshold to be the flow specification , and compute the corresponding high-rate threshold, , for counters as in [42].

  • RLFD. An RLFD has levels and counters. We set the period of a detection level to seconds. (If , it is hard for a large flow to reach the burst threshold in such a short time; if , the detection delay is too long, resulting in excessive damage.) The number of levels is chosen so that the bottom level has fewer flows than counters. The counter threshold of the bottom level is = = Bytes.

  • CLEF. We allocate counters to EARDet and counters to each RLFD. The first RLFD and EARDet are configured like the single RLFD and the single EARDet above. For the second RLFD, we set its detection level period to guarantee detection of most bursty flows with low damage. The details of the single RLFD and CLEF are in Table 4 (Appendix 0.D).

  • AMF-FM. We allocate half of the counters to AMF and the rest to FM. AMF has four stages (a typical setting in [15]), each of which contains counters. All counters are leaky buckets with a drain rate of and a bucket size .

Figure 7: Damage (in bytes) caused by -second large flows at different average flow rates (in bytes/s) and duty cycles under different detection schemes with different numbers of counters ; panel (a) shows flat large flows and panels (b) to (e) show bursty large flows. The larger the dark area, the lower the damage guaranteed by a scheme. White areas indicate damage equal to or exceeding . CLEF outperforms other schemes in detecting flat flows, and its performance is competitive with AMF-FM and EARDet against bursty flows.

Figure 8: Damage (in bytes) caused by -second large flows at different average rates (in bytes/s) and duty cycles ; panel (a) shows flat flows and panels (b) to (e) show bursty flows. Each detection scheme uses counters in total. The comparison shows that CLEF outperforms the other schemes, with low damage against various large flows.

Figure 9: FN ratio in a -second detection for large flows at different average rates (in bytes/s) and duty cycles ; panel (a) shows flat flows and panels (b) to (e) show bursty flows. Each detection scheme uses counters in total. CLEF is able to detect (FN) low-rate flows undetectable (FN) by AMF-FM or EARDet.

6.2 Experiment Results

For each experiment setting (i.e., attack flow configurations and detector settings), we performed repeated runs and present the averaged results.

Figures 7(a) to 7(e) show the damage caused by large flows at different average rates, duty cycles, and numbers of detector counters during -second experiments; the lighter the color, the higher the damage. Damage  bytes is shown in white. Figures 8(a) to 8(e) compare the damage under different detectors with counters. Figures 9(a) to 9(e) show the percentage of FNs produced by each detection scheme with counters within seconds. Because we cannot run infinitely long experiments to show the damage that low-rate flows cause under detectors like EARDet and AMF-FM, we use the FN ratio as an indicator. An FN ratio of means that the detector fails to identify any large flow in seconds and is likely to miss large flows in the future, so infinite damage is assigned. Conversely, if a detector has an FN ratio , it is able to detect the remaining large flows at some point in the future.

CLEF ensures low damage against flat flows.

Figures 7(a), 8(a), and 9(a) support our theoretical analysis (in Section 5) that RLFD and CLEF are effective at detecting low-rate flat large flows and guaranteeing low damage. In contrast, such flows cause much higher damage under EARDet and AMF-FM. The nearly black plot for CLEF in Figure 7(a) shows that CLEF is effective for both high-rate and low-rate flat flows under different memory limits. Figure 8(a) shows a clear damage comparison among detector schemes. CLEF, EARDet, and AMF-FM all limit the damage to nearly zero for high-rate flat flows. However, for low-rate flat flows, the damage under CLEF is much lower than under AMF-FM and EARDet. The EARDet and AMF-FM results show a sharp top boundary that reflects the damage dropping to zero at their guaranteed-detection rates.

The damage under an individual RLFD is proportional to the large-flow rate when the flow rate is high. Figure 9(a) suggests that AMF-FM and EARDet are unable to catch most low-rate flat flows ( Byte/sec), which explains the high damage caused by low-rate flat flows under these two schemes. This supports our theoretical analysis of AMF-FM and EARDet in Table 3, which predicts infinite damage from low-rate flows under AMF-FM and EARDet.

CLEF ensures low damage against various bursty flows.

Figures 8(b) to 8(e) demonstrate the damage caused by bursty flows with different duty cycles . The smaller is, the burstier the flow. As the large flows become burstier, EARDet and AMF-FM become better at detecting flows whose average rate is low: the burst rate, , increases as decreases, so EARDet and AMF-FM are able to detect these flows even though their average rates are low. For a single RLFD, the burstier the flows are, the harder it becomes to detect them and limit the damage. As discussed in Section 4.6, when the burst duration of flows is shorter than the RLFD detection cycle , a single RLFD has nearly zero probability of detecting such attack flows. Thus, CLEF needs Twin-RLFD to detect the bursty flows missed by its EARDet, which keeps CLEF's damage low, as the figures show. When the flow is very bursty (e.g., ), CLEF's damage limitation is dominated by EARDet.

Figures 8(b) to 8(e) present a clear comparison among different schemes against bursty flows. The damage under CLEF is lower than under AMF-FM and EARDet when is not too small (e.g., ). Although AMF-FM and EARDet incur lower damage than CLEF for very bursty flows (e.g., ), the results are close because CLEF is assisted by an EARDet with counters. Thus, CLEF guarantees a low damage limit for a wider range of large flows than the other schemes.

CLEF outperforms others in terms of FN and FP.

To make our comparison more convincing, we also examine the schemes with classic metrics: FN and FP. Since all four schemes have no FPs, we only check the FN ratios in Figures 9(a) to 9(e). Generally, CLEF has a lower FN ratio than AMF-FM and EARDet. CLEF can detect large flows at a much lower rate with a zero FN ratio, and is competitive with AMF-FM and EARDet against very bursty flows (e.g., Figures 9(b) and 9(e)).

CLEF is memory-efficient.

Figure 7(a) shows that the damage limited by RLFD is relatively insensitive to the number of counters. This suggests that RLFD can work with limited memory and is scalable to larger links without requiring a large amount of high-speed memory. This can be explained by RLFD’s recursive subdivision, by which we simply add one or more levels when the memory limit is low. Thus, we choose RLFD to complement EARDet in CLEF.

In Figure 7(a), CLEF ensures a low damage (shown in black) with tens of counters, while AMF-FM suffers from a high damage (shown in light colors), even with counters. This supports our theoretical results in Figures 6(a) and 6(b).

Figure 10: (a) Damage and (b) FN ratio for large flows at different average rates (in bytes/s) and duty cycles under detection by CLEF with counters. CLEF's performance is insensitive to the duty cycle of bursty flows: 1) the damage stays on roughly the same scale (it does not keep increasing as the duty cycle decreases, thanks to EARDet), and 2) the FN ratios are stable and similar.

CLEF is effective against various types of bursty flows.

Figures 10(a) and 10(b) show how the damage and FN ratio change with the duty cycle when CLEF is used to detect bursty flows. In the -second evaluation, as decreases, the maximum damage across different average flow rates first increases () and then decreases (). The damage increases when because Twin-RLFD (in CLEF) gradually loses its ability to detect bursty flows, which increases the detection delay.

However, the maximum damage does not increase monotonically as decreases, because as becomes smaller, EARDet is able to catch bursty flows with lower average rates. This explains the lower damage from large flows in the -second timeframe. Figure 10(b) shows that the FN ratio changes only within a small range as decreases, which also indicates the stable performance of CLEF against various bursty flows. Moreover, the FN ratios are all below , which means that CLEF can eventually catch large flows, whereas EARDet and AMF-FM cannot.

CLEF operates at high speed.

We also evaluated the performance of a Golang-based implementation on a real-world traffic trace from the CAIDA [7] dataset. The implementation is able to process 11.8M packets per second, which is sufficient for a 10 Gbps Ethernet link carrying such traffic; the absolute worst case for 10 Gbps Ethernet, a stream of minimum-size 64-byte frames, corresponds to about 14.88M packets per second.
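For reference, the packet rate a link must sustain depends on frame size. On Ethernet, every frame additionally occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap on the wire, so the maximum packet rate is the bit rate divided by (frame size + 20 bytes) × 8. The helper below is only that arithmetic; it does not model the paper's implementation, and the 1000-byte frame is an arbitrary example size.

```python
def ethernet_line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum packets/s on an Ethernet link for a given frame size (incl. FCS);
    each frame also costs 8 bytes of preamble/SFD and a 12-byte inter-frame gap."""
    return link_bps / ((frame_bytes + 8 + 12) * 8)


print(f"{ethernet_line_rate_pps(10e9, 64) / 1e6:.2f} Mpps")    # 14.88 Mpps
print(f"{ethernet_line_rate_pps(10e9, 1000) / 1e6:.2f} Mpps")  # 1.23 Mpps
```

Since the packet rate falls quickly as the average packet size grows, throughput claims should state the packet-size mix of the trace used.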

7 Related Work

The most closely related large-flow detection algorithms are described in Section 3.1 and compared in Sections 5 and 6. This section discusses other related schemes.

Frequent-item finding.

Algorithms that find frequent items in a stream can be applied to large-flow detection. For example, Lossy Counting [28] maintains a lower bound and an upper bound of each item's count. It saves memory by periodically removing items whose upper bound is below a threshold, but thereby loses the ability to catch items close to the threshold. More fundamentally, the theoretical memory lower bound of one-pass exact detection is linear in the number of large flows, which in-core routers cannot afford. By combining a frequent-item finding scheme with RLFD, CLEF can rapidly detect high-rate large flows and confine low-rate large flows using limited memory.
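To show the lower-bound/upper-bound bookkeeping and the periodic pruning described above, here is a minimal Lossy Counting sketch in the standard formulation over unit-weight items (one count per packet). It is a generic illustration of the algorithm, not the configuration used in any of the compared detectors.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Lossy Counting over a stream of unit-weight items.

    Each entry keeps (f, delta): f is a lower bound on the item's true count and
    f + delta an upper bound. Every `width` items, entries whose upper bound does
    not exceed the current bucket id are pruned.
    """

    def __init__(self, epsilon: float):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        # A new item may have been pruned before, so it may have missed up to bucket - 1 counts.
        f, delta = self.entries.get(item, (0, bucket - 1))
        self.entries[item] = (f + 1, delta)
        if self.n % self.width == 0:
            self.entries = {k: (c, d) for k, (c, d) in self.entries.items()
                            if c + d > bucket}

    def frequent(self, support: float) -> List[Hashable]:
        """Items whose lower-bound count is at least (support - epsilon) * n."""
        return [k for k, (f, _) in self.entries.items()
                if f >= (support - self.epsilon) * self.n]


lc = LossyCounter(epsilon=0.1)
for i in range(100):
    lc.add("elephant" if i % 2 == 0 else f"mouse{i}")
print(lc.frequent(0.3), len(lc.entries))  # ['elephant'] 1
```

The pruning is what bounds memory, and it is also why an item hovering just below the pruning bound is repeatedly forgotten, the weakness noted in the paragraph above.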

Collision-rich schemes.

To reduce the memory required for large-flow detection, a common technique is to hash flows into a small number of bins. However, hash collisions may cause FPs, and FPs increase as the available memory shrinks. For example, both multistage filters [15, 14] and space-code Bloom filters [21] suffer from high FP rates when memory is limited.
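The collision problem is easy to reproduce with a simplified multistage filter: each stage hashes a flow to one counter, and a flow is flagged once all of its counters exceed a threshold. The sketch omits conservative update and per-interval resets, and uses SHA-256-derived stage hashes for reproducibility; these are our simplifications. With only one counter per stage, every flow collides, so an innocent small flow is flagged once aggregate traffic crosses the threshold.

```python
import hashlib
from typing import List


class MultistageFilter:
    """Simplified parallel multistage filter (no conservative update, no
    interval reset): a flow is flagged once all of its counters exceed `threshold`."""

    def __init__(self, stages: int, counters_per_stage: int, threshold: int):
        self.threshold = threshold
        self.counters: List[List[int]] = [[0] * counters_per_stage for _ in range(stages)]

    def _index(self, stage: int, flow: str) -> int:
        # Independent hash per stage, derived from SHA-256 for reproducibility.
        digest = hashlib.sha256(f"{stage}:{flow}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.counters[stage])

    def add(self, flow: str, size: int) -> bool:
        flagged = True
        for stage, row in enumerate(self.counters):
            idx = self._index(stage, flow)
            row[idx] += size
            flagged = flagged and row[idx] > self.threshold
        return flagged


# One counter per stage: after 1000 bytes of aggregate traffic,
# an innocent 100-byte flow is flagged (a false positive).
mf = MultistageFilter(stages=2, counters_per_stage=1, threshold=1000)
flags = [mf.add(f"mouse{i}", 100) for i in range(12)]
print(flags.index(True))  # 10
```

With more counters per stage the collision probability drops, which is exactly the memory that an in-core router cannot spare.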

Sampling-based schemes.

Sampling-based schemes estimate the size of a flow based on sampled packets. However, with extremely limited memory and thus a low sampling rate, neither packet sampling (e.g., Sampled Netflow [11]) nor flow sampling (e.g., Sample and Hold [15] and Sticky Sampling [28]) can robustly identify large flows due to insufficient information. In contrast, RLFD in CLEF progressively narrows down the candidate set of large flows, thereby effectively confining the damage caused by large flows.

Top-k detection.

Top-k heavy-hitter algorithms can be used to identify flows that use more than of bandwidth. Space Saving [29] finds the top-k frequent items by evicting the item with the lowest counter value. HashPipe [36] improves upon Space Saving so that it can be practically implemented on switching hardware. However, HashPipe still requires 80 KB of memory to detect large flows that use more than 0.3% of link capacity, whereas CLEF can enforce flow specifications as low as of the link capacity using only 10 KB of memory. Tong et al. [37] propose an efficient heavy-hitter detector implemented on an FPGA, but its enforceable flow specifications are several orders of magnitude looser than CLEF's. Moreover, misbehaving flows close to the flow specification can easily bypass such heavy-hitter detectors, and the FPs produced by heavy-hitter detectors prevent network operators from applying strong punishment to the detected flows.
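The eviction rule of Space Saving, and the overestimation it introduces, can be seen in a few lines. This is a generic sketch of the algorithm with a linear scan for the minimum (real implementations use a stream-summary structure); it is not HashPipe's hardware pipeline.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Space Saving with `capacity` counters.

    When a new item arrives and all counters are in use, the item with the
    smallest count is evicted and the newcomer inherits that count, which is
    recorded as its maximum overestimation error.
    """

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        self.errors: Dict[Hashable, int] = {}

    def add(self, item: Hashable, weight: int = 1) -> None:
        if item in self.counts:
            self.counts[item] += weight
        elif len(self.counts) < self.capacity:
            self.counts[item] = weight
            self.errors[item] = 0
        else:
            # Linear scan for clarity; real implementations use a stream-summary structure.
            victim = min(self.counts, key=self.counts.__getitem__)
            inherited = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = inherited + weight
            self.errors[item] = inherited

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


ss = SpaceSaving(capacity=2)
for pkt in ["a", "b", "a", "c", "a", "d"]:
    ss.add(pkt)
# "d" appeared once but is reported with count 3 (error bound 2): an overestimate.
print(ss.counts, ss.errors)
```

An item whose reported count exceeds a threshold only because of inherited error is a false positive, which is why such detectors cannot safely trigger strong punishment on their own.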

Chen et al. [9] and Xiao et al. [43] propose memory-efficient algorithms for estimating per-flow cardinality (e.g., the number of packets). These algorithms, however, cannot guarantee large-flow detection in adversarial environments due to under- or over-estimation of the flow size.

Liu et al. [26] propose a generic network monitoring framework called UniMon that allows extraction of various flow statistics. It creates flow statistics for all flows, but has high FP and FN when used to detect large flows.

8 Conclusion

In this paper, we propose new efficient large-flow detection algorithms. First, we develop a randomized Recursive Large-Flow Detection (RLFD) scheme, which uses very little memory yet provides eventual detection of persistently large flows. Second, we develop CLEF, which scales to Internet core routers and is resilient against worst-case traffic. None of the prior approaches can achieve the same level of resilience with the same memory limitations. To compare attack resilience among various detectors, we define a damage metric that summarizes the impact of attack traffic on legitimate traffic. CLEF can confine damage even when faced with worst-case background traffic because it combines a deterministic EARDet for the rapid detection of very large flows and two RLFDs to detect near-threshold large flows. We proved that CLEF is able to guarantee low-damage large-flow detection against various attack flows with limited memory, outperforming other schemes even with CLEF's worst-case background traffic. Further experimental evaluation confirms the findings of our theoretical analysis and shows that CLEF has the lowest worst-case damage among all detectors and consistently low damage over a wide range of attack flows.

9 Acknowledgments

We thank Pratyaksh Sharma and Prateesh Goyal for early work on this project as part of their summer internship at ETH in Summer 2015. We also thank the anonymous reviewers, whose feedback helped to improve the paper.

The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013), ERC grant agreement 617605, the Ministry of Science and Technology of Taiwan under grant number MOST 107-2636-E-002-005, and the US National Science Foundation under grant numbers CNS-1717313 and CNS-0953600. We also gratefully acknowledge support from ETH Zurich and from the Zurich Information Security and Privacy Center (ZISC).


  • [1] Andersen, D.G., Balakrishnan, H., Feamster, N., Koponen, T., Moon, D., Shenker, S.: Accountable internet protocol (AIP). In: Proceedings of ACM SIGCOMM (2008)
  • [2] Anderson, T., Birman, K., Broberg, R., Caesar, M., Comer, D., Cotton, C., Freedman, M.J., Haeberlen, A., Ives, Z.G., Krishnamurthy, A., et al.: The nebula future internet architecture. In: The Future Internet Assembly. pp. 16–26. Springer (2013)
  • [3] Antonakakis, M., April, T., Bailey, M., Bernhard, M., Bursztein, E., Cochran, J., Durumeric, Z., Halderman, A.J., Invernizzi, L., Kallitsis, M., Kumar, D., Lever, C., Ma, Z., Mason, J., Menscher, D., Seaman, C., Sullivan, N., Thomas, K., Zhou, Y.: Understanding the Mirai botnet. In: USENIX Security Symposium (2017)
  • [4] Basescu, C., Reischuk, R.M., Szalachowski, P., Perrig, A., Zhang, Y., Hsiao, H.C., Kubota, A., Urakawa, J.: SIBRA: Scalable internet bandwidth reservation architecture. In: Proceedings of Network and Distributed System Security Symposium (NDSS) (Feb 2016)
  • [5] Berenbrink, P., Friedetzky, T., Hu, Z., Martin, R.: On weighted balls-into-bins games. Theoretical Computer Science 409(3), 511–520 (2008)
  • [6] Braden, R., Clark, D., Shenker, S.: Integrated Services in the Internet Architecture: an Overview. RFC 1633 (Informational) (Jun 1994),
  • [7] CAIDA: Caida Anonymized Internet Traces 2016. (2016)
  • [8]

    Cameron, A.C., Trivedi, P.K.: Regression analysis of count data, vol. 53. Cambridge university press (2013)

  • [9] Chen, M., Chen, S., Cai, Z.: Counter tree: a scalable counter architecture for per-flow traffic measurement. IEEE/ACM Transactions on Networking (2016)
  • [10]

    Choi, K.P.: On the medians of gamma distributions and an equation of ramanujan. Proceedings of the American Mathematical Society

    121(1), 245–251 (1994)
  • [11] Claise, B.: Cisco Systems NetFlow Services Export Version 9. RFC 3954 (Informational) (Oct 2004),
  • [12] Cormode, G., Muthukrishnan, S.: An Improved Data Stream Summary: The Count-Min Sketch and its Applications. Journal of Algorithms 55(1), 58–75 (2005).,
  • [13] Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency Estimation of Internet Packet Streams with Limited Space. In: Proceedings of ESA (2002),
  • [14] Estan, C.: Internet Traffic Measurement: What’s Going on in my Network? Ph.D. thesis (2003)
  • [15] Estan, C., Varghese, G.: New Directions in Traffic Measurement and Accounting: Focusing on the Elephants, Ignoring the Mice. ACM Transactions on Computer Systems (TOCS) 21(3), 270–313 (2003),
  • [16] Fang, M., Shivakumar, N.: Computing Iceberg Queries Efficiently. In: Proceedings of VLDB (1999),
  • [17] Han, D., Anand, A., Dogar, F., Li, B., Lim, H., Machado, M., Mukundan, A., Wu, W., Akella, A., Andersen, D.G., Byers, J.W., Seshan, S., Steenkiste, P.: XIA: Efficient support for evolvable internetworking. In: Proc. 9th USENIX NSDI. San Jose, CA (Apr 2012)
  • [18] Intel: Intel Xeon Processor E7 v4 Family. (2016)
  • [19] Karp, R.M., Shenker, S., Papadimitriou, C.H.: A Simple Algorithm for Finding Frequent Elements in Streams and Bags. ACM Transactions on Database Systems 28(1), 51–55 (2003).,
  • [20] Kim, T.H.J., Basescu, C., Jia, L., Lee, S.B., Hu, Y.C., Perrig, A.: Lightweight source authentication and path validation. In: ACM SIGCOMM Computer Communication Review. vol. 44, pp. 271–282. ACM (2014)
  • [21] Kumar, A., Xu, J., Wang, J.: Space-code bloom filter for efficient per-flow traffic measurement. IEEE Journal on Selected Areas in Communications 24(12), 2327–2339 (2006)
  • [22] Kuzmanovic, A., Knightly, E.: Low-rate TCP-targeted denial of service attacks: the shrew vs. the mice and elephants. In: Proceedings of ACM SIGCOMM. pp. 75–86 (2003),
  • [23] Lee, S.B., Kang, M.S., Gligor, V.D.: CoDef: Collaborative defense against large-scale link-flooding attacks. In: Proceedings of CoNext (2013)
  • [24] Li, A., Liu, X., Yang, X.: Bootstrapping accountability in the Internet we have. In: Proceedings of USENIX/ACM NSDI (Mar 2011)
  • [25] Liu, X., Li, A., Yang, X., Wetherall, D.: Passport: Secure and adoptable source authentication. In: Proceedings of USENIX/ACM NSDI (2008),
  • [26] Liu, Z., Manousis, A., Vorsanger, G., Sekar, V., Braverman, V.: One Sketch to Rule Them All: Rethinking Network Flow Monitoring with UnivMon. In: ACM SIGCOMM (2016).
  • [27] Liu, Z., Jin, H., Hu, Y.C., Bailey, M.: MiddlePolice: Toward enforcing destination-defined policies in the middle of the internet. In: Proceedings of ACM CCS (Oct 2016)
  • [28] Manku, G., Motwani, R.: Approximate Frequency Counts over Data Streams. In: Proceedings of VLDB (2002),
  • [29] Metwally, A., Agrawal, D., El Abbadi, A.: Efficient computation of frequent and top-k elements in data streams. In: International Conference on Database Theory. pp. 398–412. Springer (2005)
  • [30] Misra, J., Gries, D.: Finding Repeated Elements. Science of Computer Programming 2(2), 143–152 (1982)
  • [31] Mitzenmacher, M.: Some open questions related to cuckoo hashing. In: European Symposium on Algorithms. pp. 1–10. Springer (2009)
  • [32] Naous, J., Walfish, M., Nicolosi, A., Mazières, D., Miller, M., Seehra, A.: Verifying and enforcing network paths with ICING. In: Proceedings of ACM CoNEXT (2011).
  • [33] Pagh, R., Rodler, F.F.: Cuckoo hashing. In: European Symposium on Algorithms. pp. 121–133. Springer (2001)
  • [34] Raab, M., Steger, A.: ”balls into bins” - a simple and tight analysis. In: International Workshop on Randomization and Approximation Techniques in Computer Science. pp. 159–170. Springer (1998)
  • [35] Shenker, S., Partridge, C., Guerin, R.: Specification of Guaranteed Quality of Service. RFC 2212 (Proposed Standard) (Sep 1997),
  • [36] Sivaraman, V., Narayana, S., Rottenstreich, O., Muthukrishnan, S., Rexford, J.: Heavy-hitter detection entirely in the data plane. In: Proceedings of the Symposium on SDN Research. pp. 164–176. ACM (2017)
  • [37] Tong, D., Prasanna, V.: High throughput sketch based online heavy hitter detection on fpga. ACM SIGARCH Computer Architecture News 43(4), 70–75 (2016)
  • [38] Trybulec, W.A.: Pigeon hole principle. Journal of Formalized Mathematics 2(199),  0 (1990)
  • [39]

    Weisstein, E.W.: Pearson’s Skewness Coefficients. From MathWorld–A Wolfram Web Resource (2017),
  • [40] Weisstein, E.W.: Poisson Distribution. From MathWorld–A Wolfram Web Resource (2017),
  • [41] Wu, H., Hsiao, H.C., Asoni, D.E., Scherrer, S., Perrig, A., Hu, Y.C.: CLEF: Limiting the Damage Caused by Large Flows in the Internet Core (Technical Report). Tech. Rep. arXiv:xxxx.yyyy [cs.NI], ArXiv (2018), TBD
  • [42] Wu, H., Hsiao, H.C., Hu, Y.C.: Efficient large flow detection over arbitrary windows: An algorithm exact outside an ambiguity region. In: Proceedings of the 2014 Conference on Internet Measurement Conference. pp. 209–222. ACM (2014)
  • [43] Xiao, Q., Chen, S., Chen, M., Ling, Y.: Hyper-compact virtual estimators for big network data based on register sharing. In: ACM SIGMETRICS Performance Evaluation Review. vol. 43, pp. 417–428. ACM (2015)
  • [44] Zhang, X., Hsiao, H.C., Hasker, G., Chan, H., Perrig, A., Andersen, D.G.: SCION: Scalability, control, and isolation on next-generation networks. In: IEEE Symposium on Security and Privacy. pp. 212–227 (2011)

Appendix 0.A Additional Details for RLFD Data Structure and Optimization

0.A.1 Analysis for No-FP Guarantee

To guarantee no FP, we only identify large flows whose counter holds no second flow, i.e., no flow hash collision occurs. If we randomly hash flows into counters at the bottom level , the no-collision probability for a counter is , where is the number of flows selected into . Because we want to keep as small as possible, we usually choose , where n is the total number of flows in the link. Thus, on average. Thus,


When , the no-collision probability , which gives a collision probability for each flow of .
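As a numerical illustration of the collision argument above, the following Python sketch evaluates the no-collision probability under uniform random hashing. The names `num_flows` (flows reaching the bottom level) and `num_counters` are illustrative and need not match the paper's notation. The sketch assumes that each flow is hashed independently and uniformly.

```python
import math


def no_collision_prob(num_flows: int, num_counters: int) -> float:
    """Probability that a given flow's counter receives no other flow when
    num_flows flows are hashed independently and uniformly into num_counters
    counters: each of the other num_flows - 1 flows misses it w.p. 1 - 1/num_counters."""
    return (1.0 - 1.0 / num_counters) ** (num_flows - 1)


if __name__ == "__main__":
    for m in (10, 100, 1000):
        p = no_collision_prob(m, m)  # as many flows as counters
        print(f"m={m}: no-collision {p:.4f}, collision {1 - p:.4f}, 1/e={1 / math.e:.4f}")
```

With as many flows as counters, the no-collision probability tends to 1/e ≈ 0.368, so roughly 63% of flows share their counter with another flow.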

To avoid the high collision probability of the regular hashing above, we instead randomly pick flows (out of flows). Each of the flows is monitored by a dedicated counter (which does not introduce additional FNs, because ). To implement this counter assignment efficiently, we can use Cuckoo hashing [33], which achieves constant expected flow insertion time and worst-case constant lookup and update time. Cuckoo hashing resolves collisions by using two hash functions instead of the single one used in regular hashing. In [31], Mitzenmacher shows that, with three hash functions, Cuckoo hashing achieves expected constant insertion and lookup time with a load factor of . Thus, when , Cuckoo hashing can achieve , which is still much larger than in regular hashing. As is usually less than (because we set to be the ceiling of ), it is reasonable to treat the in our later analysis. Cuckoo hashing must store both the key ( bits for IPv4, bits for IPv6) and the value ( bits) of each entry; thus, for each counter, we need space for the flow ID and the counter value.
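The dedicated-counter assignment can be sketched with d-ary Cuckoo hashing. This is a minimal Python sketch, not the authors' implementation. The class name `CuckooCounterTable`, the BLAKE2b-derived hash functions, the kick limit, and the rollback on a failed insertion are our own illustrative choices. The sketch uses three hash functions, following the observation from [31] cited above, and stores each flow ID together with its counter value.

```python
import hashlib
import random
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (flow_id, counter_value)


class CuckooCounterTable:
    """Hypothetical flow-ID -> counter table using d-ary cuckoo hashing.

    A flow can only live in one of `num_hashes` candidate slots, so a lookup
    or counter update probes at most `num_hashes` slots.
    """

    def __init__(self, num_slots: int, num_hashes: int = 3,
                 max_kicks: int = 500, seed: int = 0) -> None:
        self.slots: List[Optional[Entry]] = [None] * num_slots
        self.num_hashes = num_hashes
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)

    def _candidates(self, flow_id: int) -> List[int]:
        # Hash function i is BLAKE2b over "i:flow_id", reduced modulo the table size.
        return [int.from_bytes(hashlib.blake2b(f"{i}:{flow_id}".encode(),
                                               digest_size=8).digest(), "big")
                % len(self.slots) for i in range(self.num_hashes)]

    def _find(self, flow_id: int) -> Optional[int]:
        for s in self._candidates(flow_id):
            entry = self.slots[s]
            if entry is not None and entry[0] == flow_id:
                return s
        return None

    def insert(self, flow_id: int) -> bool:
        """Start tracking `flow_id` with a zero counter; False if no slot is found."""
        if self._find(flow_id) is not None:
            return True
        item: Entry = (flow_id, 0)
        undo: List[Tuple[int, Optional[Entry]]] = []
        for _ in range(self.max_kicks):
            cands = self._candidates(item[0])
            for s in cands:
                if self.slots[s] is None:
                    self.slots[s] = item
                    return True
            # All candidates are occupied: displace a random occupant and re-home it.
            s = self.rng.choice(cands)
            undo.append((s, self.slots[s]))
            item, self.slots[s] = self.slots[s], item
        # Give up and undo the displacements so that no tracked flow is lost.
        for s, prev in reversed(undo):
            self.slots[s] = prev
        return False

    def add(self, flow_id: int, volume: int) -> bool:
        """Add `volume` bytes to a tracked flow's counter; False if untracked."""
        s = self._find(flow_id)
        if s is None:
            return False
        fid, value = self.slots[s]
        self.slots[s] = (fid, value + volume)
        return True

    def get(self, flow_id: int) -> Optional[int]:
        s = self._find(flow_id)
        return None if s is None else self.slots[s][1]
```

A `False` return from `insert` means the flow is not monitored, and in practice the table would be rebuilt with new hash functions. The rollback ensures that a failed insertion never drops a flow that is already tracked.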

0.A.2 Shrinking Counter Entry Size

As discussed above, the number of flows hashed into the bottom level is much less than (e.g., at most ). A key space of bits ( for IPv6) is too large for fewer than keys. We can hash the flow IDs into a smaller key space, e.g., bits, to save memory. Although a hash collision could occur for a flow and result in an FP at the bottom level, its probability is less than , which is very small. We recommend this optimization for systems that can tolerate such an extremely low FP probability.
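The fingerprinting step and its collision bound can be sketched as follows. The 32-bit example width, the BLAKE2b hash, and the function names are illustrative assumptions. The bound treats fingerprints as independent and uniformly distributed, so by the union bound a given flow collides with one of the other flows with probability at most (number of other keys) / 2^bits.

```python
import hashlib
import math


def fingerprint(flow_id: bytes, bits: int) -> int:
    """Map a flow ID (e.g., packed header fields) to a `bits`-bit key, 1 <= bits <= 64."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be in [1, 64]")
    digest = hashlib.blake2b(flow_id, digest_size=8).digest()
    return int.from_bytes(digest, "big") >> (64 - bits)  # keep the top `bits` bits


def collision_prob(num_keys: int, bits: int) -> float:
    """P[a given flow's fingerprint equals that of at least one of the other
    num_keys - 1 flows], assuming independent uniform fingerprints.
    Computes 1 - (1 - 2**-bits)**(num_keys - 1) in a numerically stable way."""
    return -math.expm1((num_keys - 1) * math.log1p(-(2.0 ** -bits)))


def union_bound(num_keys: int, bits: int) -> float:
    """Upper bound (num_keys - 1) / 2**bits on collision_prob."""
    return (num_keys - 1) / 2.0 ** bits


if __name__ == "__main__":
    key = fingerprint(b"\x0a\x00\x00\x01\xc0\xa8\x00\x01", 32)  # hypothetical src/dst pair
    print(hex(key), collision_prob(100, 32), union_bound(100, 32))
```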

Appendix 0.B Additional Analysis

0.B.1 Flow Memory Analysis

We analyze a Flow Memory (FM) with a random flow-eviction mechanism, which is applied with multistage filters in [15]. For each incoming packet whose flow is not tracked, the FM randomly picks one flow to evict from among the tracked flows and the new flow. Thus, for each packet of an untracked flow, an existing tracked flow has a probability of being evicted, where is the number of counters in the FM.
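The eviction process described above can be checked with a small simulation. As the text states, this sketch draws the victim uniformly from the tracked flows and the new flow. With m counters (an illustrative name), each eviction event therefore removes a given tracked flow with probability 1/(m+1), and the flow survives m+1 eviction events on average, following a geometric distribution. The function names are hypothetical.

```python
import random


def eviction_events_until_evicted(num_counters: int, rng: random.Random) -> int:
    """Simulate a Flow Memory with random eviction for one tagged flow.

    The tagged flow sits in slot 0 of `num_counters` tracked slots. Every packet
    of an untracked flow triggers an eviction event: a victim is drawn uniformly
    from the tracked flows and the new flow. Returns the index k of the event
    that evicts the tagged flow.
    """
    tracked = list(range(num_counters))  # flow 0 is the tagged flow
    next_flow = num_counters
    k = 0
    while True:
        k += 1
        victim = rng.randrange(num_counters + 1)  # index num_counters = the new flow
        if victim == num_counters:
            continue  # the new flow was picked, so it simply stays untracked
        if tracked[victim] == 0:
            return k
        tracked[victim] = next_flow  # another tracked flow is replaced by the new one
        next_flow += 1


def mean_eviction_events(num_counters: int, trials: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    total = sum(eviction_events_until_evicted(num_counters, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for m in (10, 50, 100):
        print(m, mean_eviction_events(m, trials=2000), "expected about", m + 1)
```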

Theorem 0.B.1

In a link with a total traffic rate of (), a packet size of , and a large-flow threshold , a Flow Memory with counters is able to detect, with high probability, large flows at rates around or higher than .

Proof sketch:

We assume that packets arrive at the FM at a packet rate of , so the time gap between two incoming packets is . For a flow newly tracked at time stamp , the th eviction happens at , and is the probability that the flow is evicted at the th eviction. Evictions are not triggered by packets of tracked flows; however, since the number of untracked flows is far larger than the number of tracked flows, we can approximately treat the time gap between evictions as . Thus, the expected length of time for which the flow is tracked is


As the FM uses leaky-bucket counters to enforce the large-flow threshold (defined in Section 2.1), the counter threshold is the burst threshold . Thus, to detect a large flow at a traffic rate of , the FM must track the flow for at least a time of ; otherwise, its counter value cannot reach the threshold. Therefore,


Thus, large flows at rates far smaller than are likely to be evicted before violating the threshold .
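The derivation above can be turned into a back-of-the-envelope calculator. The sketch assumes fixed-size packets, a leaky-bucket counter with drain rate `gamma` and threshold `beta`, and the mean of m+1 eviction events from the random-eviction model. All function names and the sample numbers are hypothetical, and units are bytes and bytes per second.

```python
def expected_tracking_time(num_counters: int, traffic_rate: float, pkt_size: float) -> float:
    """Expected time (s) a newly tracked flow stays in the FM.

    Assumes fixed-size packets arriving every pkt_size / traffic_rate seconds,
    each triggering an eviction event, and a flow that survives
    num_counters + 1 eviction events on average.
    """
    return (num_counters + 1) * pkt_size / traffic_rate


def required_tracking_time(rate: float, gamma: float, beta: float) -> float:
    """Time a leaky-bucket counter (drain `gamma` B/s, threshold `beta` B)
    needs to reach `beta` for a flow sending at `rate` B/s."""
    if rate <= gamma:
        return float("inf")  # the bucket never fills
    return beta / (rate - gamma)


def least_detectable_rate(num_counters: int, traffic_rate: float, pkt_size: float,
                          gamma: float, beta: float) -> float:
    """Smallest rate whose counter can reach beta within the expected tracking time."""
    t_track = expected_tracking_time(num_counters, traffic_rate, pkt_size)
    return gamma + beta / t_track


if __name__ == "__main__":
    # Hypothetical numbers: 10 Gb/s (1.25e9 B/s), 500-byte packets, 100 counters,
    # gamma = 1e5 B/s, beta = 15000 B.
    r_min = least_detectable_rate(100, 1.25e9, 500, 1e5, 15000)
    print(f"least detectable rate ~ {r_min:.3e} B/s")
```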

In practice, the packet size is not fixed, but we treat it as fixed to analyze how the least changes with . Because real packet sizes are also bounded within Bytes, is a bounded factor. As is usually larger than the maximum packet size, certainly holds.

We can see that the scale of large-flow rates detectable by the FM is similar to that detectable by EARDet (i.e., , where is the link capacity); both increase as increases. In the worst case for the FM, when the traffic rate equals the link capacity (), the least detectable average rates of the FM and EARDet are of the same scale. One difference is that EARDet guarantees deterministic detection, while the FM detects flows probabilistically. Our simulations in Section 6 support the analysis above.
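For a concrete view of this comparison, the least detectable rates can be written out as follows. The symbols are illustrative rather than the paper's: traffic rate ρ′, link capacity ρ, m counters, fixed packet size s, rate threshold γ, and burst threshold β. The FM expression assumes that a flow survives m + 1 random evictions on average and must fill a leaky bucket of size β. The EARDet expression ρ/(m+1) and the 40 to 1500 byte packet-size range are assumptions standing in for the elided formulas.

```latex
\[
  T_{\mathrm{track}} \approx (m+1)\,\frac{s}{\rho'}, \qquad
  \frac{\beta}{R-\gamma} \le T_{\mathrm{track}}
  \;\Longrightarrow\;
  R_{\mathrm{FM}} \gtrsim \gamma + \frac{\rho'}{m+1}\cdot\frac{\beta}{s}.
\]
\[
  \rho' = \rho:\qquad
  \frac{R_{\mathrm{FM}}-\gamma}{R_{\mathrm{EARDet}}}
  = \frac{\rho\beta/\big((m+1)s\big)}{\rho/(m+1)}
  = \frac{\beta}{s} \in \Big[\frac{\beta}{1500}, \frac{\beta}{40}\Big],
  \qquad \beta > s_{\max} \;\Rightarrow\; \frac{\beta}{s} > 1.
\]
```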

0.B.2 Multistage Filter Analysis

According to the theoretical analysis in [15], for a -counter multistage filter with stages, each of which has counters, the probability that a flow is hashed into a counter without colliding with other flows in each stage () is as follows. Let , and assume there are flows in total; then


where we assume and . These assumptions are reasonable: 1) the number of counters is usually in the hundreds, and is typically chosen as in [15], so ; 2) we aim to use very few counters to detect large flows among a large number of legitimate flows, so .

In the case that every legitimate flow sends at a rate higher than half of the threshold rate , the false-positive rate is almost , because is close to . Any collision in a counter causes the counter value to violate the counter threshold and thus yields a false positive on the legitimate flows involved.
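The collision argument can be made concrete with the sketch below. It assumes that each stage hashes flows independently and uniformly, and that every flow sends more than half the threshold rate. Under these assumptions, a legitimate flow is falsely flagged exactly when it collides with at least one other flow in every stage. `counters_per_stage`, `stages`, and the sample sizes are illustrative. The closed form is our restatement, not the expression from [15].

```python
import random


def stage_no_collision(num_flows: int, counters_per_stage: int) -> float:
    """P[a given flow shares its counter with no other flow in one stage]."""
    return (1.0 - 1.0 / counters_per_stage) ** (num_flows - 1)


def false_positive_prob(num_flows: int, counters_per_stage: int, stages: int) -> float:
    """P[a legitimate flow collides in every stage], i.e., every one of its
    counters holds two flows above half the threshold and so exceeds it."""
    q = stage_no_collision(num_flows, counters_per_stage)
    return (1.0 - q) ** stages


def false_positive_sim(num_flows: int, counters_per_stage: int, stages: int,
                       trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of false_positive_prob for one tagged flow."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        flagged_here = True
        for _ in range(stages):
            mine = rng.randrange(counters_per_stage)
            hit = any(rng.randrange(counters_per_stage) == mine
                      for _ in range(num_flows - 1))
            if not hit:  # one collision-free stage keeps the flow below threshold
                flagged_here = False
                break
        flagged += flagged_here
    return flagged / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, false_positive_prob(n, counters_per_stage=25, stages=4))
```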

0.B.3 RLFD Worst-case Background Traffic

General case: weighted balls-into-bins problem.

In the well-known balls-into-bins problem, there are bins and balls, and each ball is thrown into one of the bins uniformly at random.

We treat the flows in the network as the balls and the counter array as the bins. Hashing flows into counters is like randomly throwing balls into bins, where each flow is a weighted ball whose weight is the traffic volume it sends during the period of each level ().

Worst case: single-weight balls-into-bins problem

We assume that the rate threshold of our flow specification, , is , where is the outbound link capacity. In general, legitimate flows have average rates less than or equal to the threshold rate ; however, we show that the worst-case background traffic for RLFD to detect a large flow is when all legitimate flows send traffic at the threshold rate (Theorem 5.1). As the inbound link capacity can be larger than the outbound one, attack flows can still be present in this case. We prove Theorem 5.1 using Theorem 0.B.2, a result of Berenbrink et al. [5] on weighted balls-into-bins games.
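To illustrate the direction of this claim (and of Lemma 1 below), the following Monte Carlo sketch simulates a single RLFD level as a weighted balls-into-bins game. It assumes uniform random hashing, places the attack flow in counter 0 without loss of generality, breaks ties uniformly at random, and uses unit-volume legitimate flows. The function names and parameter values are hypothetical, and the simulation is an illustration, not a proof.

```python
import random
from typing import List


def attack_counter_selected(num_counters: int, attack_volume: int,
                            legit_volumes: List[int], rng: random.Random) -> bool:
    """One level: hash flows into counters and keep the largest counter.
    Returns True if the attack flow's counter is the one kept."""
    counters = [0] * num_counters
    counters[0] += attack_volume  # attack flow's counter, w.l.o.g. index 0
    for v in legit_volumes:
        counters[rng.randrange(num_counters)] += v  # weighted ball into a random bin
    top = max(counters)
    winners = [i for i, c in enumerate(counters) if c == top]
    return rng.choice(winners) == 0  # ties broken uniformly at random


def selection_prob(num_counters: int, attack_volume: int, legit_volumes: List[int],
                   trials: int = 5000, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = sum(attack_counter_selected(num_counters, attack_volume, legit_volumes, rng)
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Hypothetical level: 10 counters, attack volume 3; 40 unit flows stand for
    # legitimate traffic using all legitimate bandwidth, fewer flows for less.
    for n_legit in (10, 20, 40):
        print(n_legit, selection_prob(10, 3, [1] * n_legit))
```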

Theorem 0.B.2

Berenbrink et al.’s Theorem 3.1. For two weighted balls-into-bins games and of balls and bins, the vectors and represent the weights of the balls in and , respectively. If and for all , then for all , where is the total load of the highest bins, and denotes the expectation over all possible balls-into-bins outcomes.
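The majorization condition in Theorem 0.B.2 and its effect on the fullest bins can be illustrated as follows. The naming is our own. `majorizes` checks equal totals and dominating prefix sums of the non-increasingly sorted weights, which is how we read the theorem's premises. `mean_top_load` estimates the expected total load of the k fullest bins by Monte Carlo.

```python
import random
from itertools import accumulate
from typing import Sequence


def majorizes(w: Sequence[float], w2: Sequence[float], tol: float = 1e-9) -> bool:
    """True if w and w2 have equal totals and every prefix sum of sorted(w,
    non-increasing) is at least the matching prefix sum of sorted(w2)."""
    a = sorted(w, reverse=True)
    b = sorted(w2, reverse=True)
    if len(a) != len(b) or abs(sum(a) - sum(b)) > tol:
        return False
    return all(x >= y - tol for x, y in zip(accumulate(a), accumulate(b)))


def mean_top_load(weights: Sequence[float], num_bins: int, k: int,
                  trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[S_k]: total load of the k fullest bins."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        loads = [0.0] * num_bins
        for w in weights:
            loads[rng.randrange(num_bins)] += w  # each ball lands in a uniform bin
        total += sum(sorted(loads, reverse=True)[:k])
    return total / trials


if __name__ == "__main__":
    uneven, even = [7, 1, 1, 1], [2.5, 2.5, 2.5, 2.5]
    print(majorizes(uneven, even), mean_top_load(uneven, 4, 1), mean_top_load(even, 4, 1))
```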

Lemma 1 and Proof sketch

Lemma 1

RLFD has the lowest probability of correctly selecting the counter of a large flow for the next level when the legitimate flows use up all legitimate bandwidth.

We assume that and are two different counter states after adding the attack traffic and the traffic of some legitimate flows, and that the other legitimate flows can still send more volume of traffic before the total volume of legitimate flows reaches the outbound link capacity. Let be the value of the counter assigned to , and let be the maximum value of the other counters. In , we let ; in , we let . Hence, and cover all possible counter states. Up to volume of legitimate traffic can still be added into the counters. We use and to represent the final values of and . Thus, the probability of selecting the counter of is


Because in , and cannot exceed , always holds. Then,