Efficient Loop Detection in Forwarding Networks and Representing Atoms in a Field of Sets

09/06/2018 ∙ by Laurent Viennot, et al.

The problem of detecting loops in a forwarding network is known to be NP-complete when general rules such as wildcard expressions are used. Yet, network analyzer tools such as NetPlumber (Kazemian et al., NSDI'13) or Veriflow (Khurshid et al., NSDI'13) efficiently solve this problem in networks with thousands of forwarding rules. In this paper, we complement such experimental validation of practical heuristics with the first provably efficient algorithm in the context of general rules. Our main tool is a canonical representation of the atoms (i.e., the minimal non-empty sets) of the field of sets generated by a collection of sets. This tool is particularly suited to cases where the intersection of two sets can be efficiently computed and represented. In a forwarding network, each forwarding rule is associated with the set of packet headers it matches. The atoms then correspond to classes of headers with the same behavior in the network. We propose an algorithm for atom computation and provide the first loop-detection algorithm that runs in time polynomial in the number of classes (which can be exponential in general). This contrasts with previous methods that can be exponential even in simple cases with a linear number of classes. Second, we introduce a notion of network dimension captured by the overlapping degree of forwarding rules. The values of this measure appear to be very low in practice, and a constant overlapping degree ensures a polynomial number of header classes. Forwarding loop detection is thus polynomial in forwarding networks with constant overlapping degree.


1 Introduction

With the multiplication of network protocols, network analysis has become an important and challenging task. We focus on a key diagnosis task: detecting possible forwarding loops. Given a network and its nodes' forwarding tables, the problem consists in testing whether there exist a packet header h and a directed cycle in the network topology such that a packet with header h will loop indefinitely along the cycle. This problem is indeed NP-complete, as noted in [19]. Its hardness comes from the use of compact representations for predicate filters: the set of headers that match a rule is classically represented by a prefix in IP forwarding, a general wildcard expression in Software-Defined Networking (SDN), value ranges in firewall rules, or even a mix of such representations if several header fields are considered.

We first give a toy example of the forwarding loop problem where the predicate filter of each rule is given by a wildcard expression, that is, an n-letter string in {0,1,*}^n. Such an expression represents the set of all n-bit headers obtained by replacing each * of the expression by either 0 or 1. A packet whose header is in that set is said to match the rule. Figure 1 illustrates a one-node network whose rules use short wildcard expressions. Rules are tested from top to bottom. All rules indicate to drop packets except the last one, which forwards packets to the node itself. This network contains a forwarding loop if and only if there exists a header that matches no rule except the last one. Not matching a given drop rule corresponds to a disjunctive constraint on the header bits (at least one fixed letter of the rule must be flipped in the header). This one-node network thus has a forwarding loop iff the conjunction of these constraints is satisfiable, which is not the case here. This simple example can easily be generalized to reduce SAT to forwarding loop detection in networks with wildcard rules. It also points out a key problem: testing the emptiness of expressions of the form S ∖ (S_1 ∪ ⋯ ∪ S_k), where the S_i are the sets associated to rules.

Figure 1: Does this one-node network have a forwarding loop?
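The brute-force view of this toy example can be sketched in Python; the concrete rules below are illustrative placeholders, not the actual rules of Figure 1:

```python
from itertools import product

def matches(header, expr):
    """A header matches a wildcard expression when they agree on every non-'*' letter."""
    return all(e == '*' or e == h for h, e in zip(header, expr))

def has_forwarding_loop(drop_rules, forward_rule, n):
    """Brute force over all 2**n headers: the one-node network loops iff some
    header avoids every drop rule but matches the final self-forwarding rule."""
    for bits in product('01', repeat=n):
        h = ''.join(bits)
        if all(not matches(h, r) for r in drop_rules) and matches(h, forward_rule):
            return True
    return False

# With drop rules 0**, *0*, **0, only header 111 reaches the last rule.
```

This exhaustive search is exactly what becomes out of reach when n grows to hundreds of bits, which motivates the class-based approach below.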

As packet headers in practical networks such as the Internet typically have hundreds of bits, exhaustive search of the header space is completely out of reach. The main challenge for solving such a problem thus resides in limiting the number of tests to perform. For that purpose, previous works [17, 15] propose to consider sets of headers that match some predicate filters and do not match some others. Defining two headers as equivalent when they match exactly the same predicate filters, it then suffices to perform one test per equivalence class. These classes are indeed the atoms (the minimal non-empty sets) of the field of sets (the finite σ-algebra) generated by the sets associated to the rules.

A first challenge lies in efficiently identifying and representing these atoms. This would be fairly easy if both intersection and complement could be represented efficiently. In practice, most classical compact data-structures for sets of bit strings are closed under intersection but not under complement. For example, the intersection of two wildcard expressions, if not empty, can obviously be represented by a wildcard expression, but the complement of a wildcard expression is more problematic. Previous works overcome this difficulty by representing the complement of an n-letter wildcard expression as the union of several wildcard expressions (up to n of them). However, this can result in exponential blow-up, and the tractability of these methods relies on various heuristics that do not offer rigorously proven guarantees.
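A minimal sketch of this complement representation (one wildcard per fixed letter, with that letter flipped) illustrates both the construction and why repeated complements blow up; the helper names are ours:

```python
def matches(header, expr):
    """Agreement on every non-'*' letter."""
    return all(e == '*' or e == h for h, e in zip(header, expr))

def complement(expr):
    """Complement of an n-letter wildcard expression as a union (list) of up
    to n wildcard expressions: one per fixed position, with the bit flipped."""
    out = []
    for i, c in enumerate(expr):
        if c != '*':
            flipped = '1' if c == '0' else '0'
            out.append('*' * i + flipped + '*' * (len(expr) - i - 1))
    return out

# complement("0*1") is the union of "1**" and "**0": subtracting k rules this
# way can multiply the number of terms by up to n at each step.
```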

A second challenge lies in understanding the tractability of practical networks. One can easily design a collection of wildcard expressions that generates all possible singletons as atoms (for instance, the collection of all n-letter expressions with exactly one non-* letter). What prevents such a phenomenon in practice? Can we provide a property that intuitively fits practical networks and guarantees that the number of atoms does not blow up? This paper aims at addressing both challenges with provable guarantees.
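The blow-up can be checked by brute force for small n: grouping headers by the exact subset of expressions they match yields the atoms, and the 2n single-bit expressions below generate all 2^n singletons (a sketch with our own helper names):

```python
from itertools import product

def atoms_by_signature(sets, universe):
    """Group elements by the exact subset of sets containing them: each
    non-empty group is an atom of the generated field of sets."""
    classes = {}
    for x in universe:
        sig = frozenset(i for i, s in enumerate(sets) if x in s)
        classes.setdefault(sig, set()).add(x)
    return list(classes.values())

n = 4
universe = [''.join(b) for b in product('01', repeat=n)]
# One set per (position, bit): headers whose i-th letter equals b.
single_bit = [{h for h in universe if h[i] == b} for i in range(n) for b in '01']
```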

Related work

Interest in network diagnosis has grown with the advent of Software-Defined Networking (SDN) [23, 18, 21, 12]. SDN offers the opportunity to manage the forwarding tables of a network with a centralized controller where full knowledge is available for analysis. Previous works have led to a series of methods for network analysis, resulting in several tools [17, 13, 19, 27]. The main approaches rely on computing classes of headers by combining rule predicate filters using intersection and set difference (that is, intersection with complement). The idea of considering all header classes generated by the global collection of the sets associated to all forwarding rules in the network is due to Veriflow [17]. However, the use of set differences results in computing a refined partition of the atoms of the field of sets generated by this collection, which can be much larger than an exact representation. NetPlumber [13], which relies on the header space analysis introduced in [15], refines this approach by considering the set of headers that can follow a given path of the network topology. This set is represented as a union of classes that match some rules (those that indicate to forward along the path) and not some others (those that have higher priority and deviate from the path): a similar problem of atom representation thus arises. The idea of avoiding complement operations is partially present in the optimization called "lazy subtraction", which consists in delaying as much as possible the computation of set differences. However, when a loop is detected, formal expressions with set differences have to be tested for emptiness. They are then actually expanded, possibly resulting in the manipulation of expressions with exponentially many terms.

Concerning the tractability of the problem, the authors of NetPlumber observe a phenomenon called "linear fragmentation" [15, 14] that allows them to argue for the efficiency of the method. They introduce a parameter measuring this linear fragmentation and claim a polynomial time bound for loop detection when this parameter is low [15] (when emptiness tests are not included in the analysis). However, the rigorous analysis provided in [14] includes a factor exponential in the diameter of the network graph. While this factor appears to be largely overestimated in practice, the sole hypothesis of linear fragmentation does not suffice to explain tractability and prove polynomial time guarantees. The alternative approach of Veriflow is specifically optimized for rules resulting from range matching within each field of the header. When the number of fields is constant, polynomial time execution can be guaranteed, but this result does not extend to general wildcard matching.

A similar problem consists in conflict detection between rules and their resolution [1, 9, 5]. It has mainly been studied in the context of multi-range rules [1, 9], which can benefit from computational geometry algorithms. (A multi-range can be seen as a hyperrectangle in a d-dimensional Euclidean space, where d is the number of fields composing headers.) Another similar problem, determining efficiently the rule that applies to a given packet, has been studied for multi-ranges [9, 10, 11]. In the case of wildcard matching, such problems are related to the old problem of partial matching [24]. It is believed to suffer from the "curse of dimensionality" [3, 22], and no method significantly faster than exhaustive search is expected to be found with near-linear memory (although some tradeoffs are known for a small number of letters [7]). However, efficient hardware-based implementations exist, based on Ternary Content Addressable Memories (TCAMs) [4] or Graphics Processing Units (GPUs) [26].

Regarding the manipulation of set collections, recent work [20] shows how to enumerate all sets obtained by closure under given set operations with polynomial delay. In particular, this allows one to produce the field of sets generated by the collection. However, this setting requires a set representation that explicitly lists all elements, and thus does not apply here. Another issue comes from the fact that the field of sets can be exponentially larger than its number of atoms.

Our contributions

First, we make a key algorithmic step by providing an efficient algorithm for computing an exact representation of the atoms of the field of sets generated by a collection of sets. The representation obtained is linear in the number of atoms and allows one to test efficiently whether an atom is included in a given set of the collection. The main idea is to represent an atom by the intersection of the sets that contain it. We avoid complement computations by using cardinality computations for testing emptiness. Our algorithm is generic and supports any data-structure for representing sets of n-bit strings that supports intersection and cardinality computation in time bounded by some value T. It runs in polynomial time with respect to the number of sets and the number of atoms. Beyond combinations of wildcard and range expressions, we believe that it could be extended to support expressions on hashed values (for a fixed hash function) or Bloom filters by defining auxiliary cardinality measures.

Rule repr. Trivial NetPlumber [13] Veriflow [17] This paper
T-bounded
” ov. deg. Δ
” ov. deg. Δ
n-wildcard
” ov. deg. Δ
d-multi-rng.
” ov. deg. Δ
Table 1: Worst-case complexity of forwarding loop detection with rules generating header classes in a network, for various rule-set representations: T-bounded for intersection and cardinality computations in O(T) time; n-wildcard for wildcard expressions with n letters; d-multi-rng. for multi-ranges in dimension d (with d ≤ n). The additional hypothesis "ov. deg. Δ" stands for an overlapping degree of rule sets bounded by Δ.

Second, we provide a dimension parameter, the overlapping degree Δ, that captures the complexity of the collection of rule sets considered in a forwarding network. It is defined as the maximum number of distinct rules (i.e., with pairwise distinct associated sets) that match a given header. This parameter constitutes a measure of complexity for the field of sets generated by a given collection of sets. In the context of practical hierarchical networks, we have the following intuitive reason to believe that this parameter is low: in such networks, more specific rules are used at lower levels of the hierarchy. We can thus expect the overlapping degree to be bounded by the number of layers of the hierarchy. Empirically, we observed low values for datasets with hundreds to thousands of distinct multi-field rules, and for the collection of IPv4 prefixes advertised in BGP. A constant overlapping degree implies that the number of header classes is polynomially bounded, hinting at why practical networks are tractable despite the NP-completeness of the problem. In addition, the algorithm we propose is tailored to take advantage of a low overlapping degree Δ, even without prior knowledge of Δ. Table 1 provides a summary of the complexity results obtained for loop detection depending on how the sets associated to rules are represented. Our techniques can be extended to handle write actions (partial writes of fixed values such as field replacement) while maintaining polynomial guarantees.

Our algorithm for atom computation solves two difficult technical issues. First, it manages to remain polynomial in the number of atoms even though the number of sets generated by intersection alone can be exponential with general rules. Second, the use of cardinality computations allows us to avoid exponential blow-up (in contrast with previous work) but naturally induces a quadratic term in the complexity. However, we manage to reduce this term in the case of collections with logarithmic overlapping degree. Indeed, our algorithm then becomes linear in the number of atoms and polynomial in the number of sets, providing an algorithmic breakthrough towards efficient atom enumeration.

Roadmap: Section 2 introduces the model. Section 3 describes how to represent the atoms of a field of sets. We state in Section 4 our main result concerning atom computation and its implications for forwarding loop detection, which give the upper bounds listed in Table 1. Section 5 gives more insight into the comparison of our results with previous works and justifies the lower bounds presented in Table 1. Finally, Section 6 discusses some perspectives.

2 Model

2.1 Problem

We consider a general model of network where a network instance is characterized by:

  • a graph G where each node is a router and has a forwarding table;

  • a natural number n representing the (fixed) bit-length of packet headers.

Let H denote the set of all possible headers (all n-bit strings). Each forwarding table is an ordered list of forwarding rules. Each forwarding rule R is made of a predicate filter and an action to apply to any packet whose header matches the predicate. We say that a header h matches rule R when it matches its predicate (we may equivalently say that h matches R). The predicate can be viewed as a compact data-structure encoding the set of headers that match it; this set is called the rule set associated to R. For ease of notation, we thus let R denote both the predicate filter of a rule and its associated set.

We consider three possible actions for a packet: forward to a neighbor, drop, or deliver (when the packet has reached its destination). The priority of rules is given by their ordering: when a packet with header h arrives at a node with rules R_1, …, R_k, the first rule R_i matched by h is applied. Equivalently, rule R_i is applied when h ∈ R_i ∩ R_1^c ∩ ⋯ ∩ R_{i−1}^c, where R^c denotes the complement of R. When no match is found (i.e., h matches no rule of the node), the packet is dropped.

Given a header h, the forwarding graph G_h represents the forwarding actions taken on a packet with header h: it contains the edge (u, v) when the first rule that matches h in the table of u indicates to forward to v. The forwarding loop detection problem consists in deciding whether there exists a header h such that G_h has a directed cycle.
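These definitions translate directly into a small sketch (first-match semantics per node, then cycle detection on the out-degree-one graph G_h; the table encoding below is our own):

```python
def matches(header, expr):
    """Wildcard matching: agreement on every non-'*' letter."""
    return all(e == '*' or e == h for h, e in zip(header, expr))

def forwarding_graph(tables, h):
    """First-match semantics: for each node, the first rule matched by h
    fixes the action; only 'fwd' actions contribute an edge to G_h."""
    edges = {}
    for node, rules in tables.items():
        for expr, action, nxt in rules:
            if matches(h, expr):
                if action == 'fwd':
                    edges[node] = nxt
                break  # drop/deliver: no outgoing edge
    return edges

def has_cycle(edges):
    """Out-degree is at most one, so following successors with a visited set suffices."""
    for start in edges:
        cur, seen = start, set()
        while cur in edges:
            if cur in seen:
                return True
            seen.add(cur)
            cur = edges[cur]
    return False
```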

Note that we make the simplifying assumption that the input port of an incoming packet is not taken into account in the forwarding decision of a node. In a more general setting, a node has a forwarding table for each incoming link. This is essentially the same model, except that we consider the line-graph of the network instead of the graph itself.

2.2 Header Classes

A natural equivalence relation exists between headers with respect to rules: two headers are equivalent if they match exactly the same rules, that is, if they belong to the same rule sets. Trivially, two equivalent headers let the corresponding packets have exactly the same behavior in the network. The resulting equivalence classes partition the header set into nonempty disjoint subsets called header classes. To check any property of the network, it suffices to do it on a class-by-class basis instead of a header-by-header basis. The number of header classes is thus a natural parameter when considering the difficulty of forwarding loop detection (or other similar network analysis problems). A more accurate definition of classes could take into account the order of rules and the topology (see Appendix E), but this does not allow us to obtain complexity gains in general.

The header classes can be defined according to the collection R of all rule sets of the network. If R_h denotes the set of all rule sets associated to the rules matched by a given header h, then its header class is clearly equal to the intersection of the sets in R_h minus the union of the sets in R ∖ R_h (with the convention that an empty intersection equals the full header space H). Such sets are the atoms of the field of sets generated by R. Their computation is the main topic of this paper and is detailed in the next section.

2.3 Set representation

As we focus on the collection of rule sets, we now detail our hypothesis on their representation. We assume that a data-structure D allows representation of some of the subsets of a space H. For ease of notation, D also denotes the collection of subsets that can be represented with it. We assume that D is closed under intersection: if A and B are in D, so is A ∩ B. We say that such a data-structure for subsets of H is T-bounded when intersection and cardinality can be computed in time at most T: given the representations of A and B, the representation of A ∩ B and the size |A ∩ B| (as a binary big integer) can be computed within time T. As big integers computed within time T have O(T) bits, this implies that T is at least logarithmic in |H|: the bound T obviously depends on H. Intersection, inclusion test (A ⊆ B), cardinality computation (|A|) and cardinality operations (addition, subtraction and comparison) are called elementary set operations. Under the T-bounded hypothesis, all these operations can be performed in O(T) time (A ⊆ B is equivalent to |A ∩ B| = |A|).

Two typical examples of data-structures meeting the above requirements are wildcard expressions and multi-ranges. In a forwarding network, we consider the header space of all n-bit strings, which may be decomposed into several fields. A rule set is typically represented by a wildcard expression or a range of integers. In both cases, they can be represented within O(n) bits and both representations are O(n)-bounded. We call n-wildcard a string w ∈ {0,1,*}^n. It represents the set of n-bit strings obtained by replacing each * of w by 0 or 1. If headers are decomposed into fields, any combination of wildcard expressions and ranges can be used (one per field) and represented within O(n) bits. However, cardinality computations can then take more than linear time, as multiplications of big integers are required; such a representation is thus T-bounded for a correspondingly larger bound T. Given field lengths n_1, …, n_d with sum n, we call d-multi-range a cartesian product of integer ranges [a_i, b_i] with 0 ≤ a_i ≤ b_i < 2^{n_i} for i in 1..d. It represents the set of concatenations x_1 ⋯ x_d where each x_i is the binary representation, within n_i bits, of an integer in [a_i, b_i].
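For n-wildcards, intersection and cardinality take O(n) letter operations, and inclusion reduces to cardinality via |A ∩ B| = |A|; a minimal sketch:

```python
def intersect(a, b):
    """Intersection of two n-wildcard expressions (None if empty)."""
    out = []
    for x, y in zip(a, b):
        if x == '*':
            out.append(y)
        elif y == '*' or x == y:
            out.append(x)
        else:
            return None  # the fixed letters disagree: empty set
    return ''.join(out)

def cardinality(expr):
    """|A| = 2**(number of '*' letters); None stands for the empty set."""
    return 0 if expr is None else 2 ** expr.count('*')

def included(a, b):
    """A ⊆ B iff |A ∩ B| = |A|: elementary set operations only."""
    return cardinality(intersect(a, b)) == cardinality(a)
```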

When manipulating a collection of sets in D, we assume that their representations are stored in a balanced binary search tree, allowing one to dynamically add, remove, or test membership of a set within a logarithmic number of comparisons. More efficient data-structures (tries and segment trees) can be used for wildcard expressions and multi-ranges, as detailed in Appendix A.1.

3 Atoms and combinations generated by a collection of sets

Given a space of elements H, a collection C is a finite set of subsets of H. The field of sets F(C) generated by a collection C is the (finite) σ-algebra generated by C, that is, the smallest collection of sets closed under intersection, union and complement that contains C.

The atoms of F(C) are classically defined as its non-empty elements that are minimal for inclusion. For brevity, we call them the atoms generated by C and let A(C) denote their collection. Note that for an atom A and a set S ∈ C, we have either A ⊆ S or A ∩ S = ∅ (otherwise, A ∩ S and A ∖ S would be non-empty elements of F(C) strictly included in A). This gives a characterization of the atoms that matches our definition of header classes when C is the collection of rule sets of a network (see Section 2.2):

A(C) = { ⋂_{S ∈ X} S ∖ ⋃_{S ∈ C∖X} S : X ⊆ C } ∖ {∅}.   (1)

Figure 2: Toy example of an 8-element space H = {0, 1, …, 7} with a collection C of subsets.

For example, the collection pictured in Figure 2 generates 7 atoms.

Due to the complement operations, the atoms can be harder to represent than the rules they are generated from. In Figure 2, all rules are ranges, but not all atoms are. When the intersection operation can be computed and represented efficiently (see Section 2.3), it is natural to consider the collection Comb(C) of combinations, defined as the sets that can be obtained by intersection from sets in C:

Comb(C) = { ⋂_{S ∈ X} S : X ⊆ C, X ≠ ∅ }.   (2)

In Figure 2, there are 8 combinations.

Given a collection C and a subset A ⊆ H, we let C(A) denote the containers of A, that is, the sets in C that contain A. We associate each combination c with the set atom(c) = c ∖ ⋃_{S ∈ C ∖ C(c)} S. The function atom(·) should not be confused with an atom of A(C): atom(c) may be empty. A combination c is said to be covered if atom(c) = ∅ (the union of its non-containers covers it); otherwise it is uncovered. Similarly, we associate each atom A with the combination comb(A) = ⋂_{S ∈ C(A)} S (this corresponds to the "positive" part of the characterization from Equation 1). The following proposition states that atoms can be represented by uncovered combinations.

Proposition 1.

The collection of uncovered combinations is in one-to-one correspondence with the atom collection A(C): each uncovered combination c corresponds to the atom atom(c). Reciprocally, each atom A is mapped to the combination comb(A).

Based on Proposition 1, we say that an uncovered combination c represents the atom atom(c). The collection of uncovered combinations can thus be seen as a canonical representation of A(C) by a collection of combinations. In Figure 2, each of the 7 atoms is represented by an uncovered combination, while the remaining eighth combination is covered.

The proof of Proposition 1 is straightforward. First, we verify that if c is an uncovered combination, then atom(c) is an atom: it suffices to observe that atom(c) ≠ ∅ and to match atom(c) with the atom characterization given by Equation 1, taking the containers of c as the positive part. Similarly, if A is an atom, the combination c = comb(A) satisfies A ⊆ atom(c); since atom(c) is itself an atom by the first part and contains A, we get atom(c) = A. In particular, as A ≠ ∅, c is uncovered.

A nice property of the characterization of Proposition 1 is that it allows one to efficiently test whether a set S ∈ C contains an atom A: given the combination c that represents A, A ⊆ S is equivalent to c ⊆ S. This comes from the fact that every uncovered combination c has the same containers as atom(c). (If it were not the case, atom(c) would be included in a non-container of c, implying atom(c) = ∅, i.e., that c is covered.) This explains the importance of determining the uncovered combinations, and in particular of separating covered combinations from uncovered ones. This is the subject of the next section, but we first formally introduce the notion of overlapping degree.
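On explicit small sets, the whole representation can be prototyped: close the collection under intersection, then keep the combinations whose atom (the combination minus its non-containers) is non-empty, directly mirroring Proposition 1. This is a naive sketch, not the incremental algorithm of Section 4:

```python
def combinations_of(sets):
    """Comb(C): closure of C under non-empty intersections."""
    comb, frontier = set(sets), set(sets)
    while frontier:
        new = {c & s for c in frontier for s in sets} - comb - {frozenset()}
        comb |= new
        frontier = new
    return comb

def uncovered_combinations(sets):
    """Map each uncovered combination c to atom(c) = c minus its non-containers."""
    out = {}
    for c in combinations_of(sets):
        atom = set(c)
        for s in sets:
            if not c <= s:  # s is a non-container of c
                atom -= s
        if atom:  # non-empty: c is uncovered and represents this atom
            out[c] = frozenset(atom)
    return out
```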

Overlapping degree of a collection

Our representation is naturally associated with the following measure of complexity of a collection C. We define the overlapping degree Δ(C) of C as the maximum number of containers of an element, that is, Δ(C) = max_{x ∈ H} |C({x})|. Note that all elements within an atom have the same containers in C and that a set cannot have more containers than any of its elements. We thus have:

Δ(C) = max_{A ∈ A(C)} |C(A)|.

As any atom A can be expressed as ⋂_{S ∈ C(A)} S ∖ ⋃_{S ∈ C ∖ C(A)} S, where C(A) is the set of containers of A, the number of atoms is obviously bounded by the number of subsets of C with at most Δ(C) sets, that is O(m^{Δ(C)}), where m denotes the number of sets in C. The overlapping degree of a collection thus measures its complexity in terms of the number of atoms it may generate.

We similarly define the average overlapping degree δ(C) as the average number of containers of an atom: δ(C) = (1/|A(C)|) Σ_{A ∈ A(C)} |C(A)|. We obviously have δ(C) ≤ Δ(C). Given a collection C, we will also consider the average overlapping degree of combinations, that is, δ(Comb(C)). Note that Comb(C) has the same collection of atoms as C and that a combination ⋂_{S ∈ X} S contains an atom A if and only if X ⊆ C(A). The overlapping degree of Comb(C) is thus at most 2^{Δ(C)} − 1, and the same bound holds for δ(Comb(C)).
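These quantities are straightforward to compute on explicit sets (a sketch; the average below is taken over the classes of elements having the same containers, a stand-in for the atoms):

```python
def overlapping_degree(sets, universe):
    """Δ(C): maximum number of sets of the collection containing one element."""
    return max(sum(1 for s in sets if x in s) for x in universe)

def average_overlapping_degree(sets, universe):
    """Average number of containers over the classes of elements with the same
    containers (elements outside every set are ignored in this sketch)."""
    sigs = {frozenset(i for i, s in enumerate(sets) if x in s) for x in universe}
    sigs.discard(frozenset())
    return sum(len(sig) for sig in sigs) / len(sigs)
```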

These quantities can be verified directly on the example from Figure 2.

In real datasets, we observe that both Δ(C) and δ(C) remain very small, while δ(Comb(C)) remains moderate (these datasets include Inria firewall rules, Stanford forwarding tables provided by Kazemian et al. [16], and IPv4 prefixes announced at BGP level from Route Views [25]).

4 Incremental computation of atoms

We can now state our main result concerning the computation of the atoms generated by a collection of sets.

Theorem 1.

Given a space H and a collection C of subsets of H, the collection of combinations that canonically represent the atoms can be incrementally computed with a number of elementary set operations that is polynomial in the number of sets of C and in the following parameters: the number of atoms generated by C, the overlapping degree Δ(C), the average overlapping degree δ(C), and the average overlapping degree δ(Comb(C)). Within this computation, each combination c can be associated with the list of sets in C that contain c. If sets are represented by n-wildcard expressions (resp. d-multi-ranges), the representation can be computed within correspondingly bounded time.

Application to forwarding loop detection

Theorem 1 has the following consequences for forwarding loop detection.

Corollary 1.

Given a network with a collection R of rule sets with T-bounded representation, forwarding loop detection can be performed in time polynomial in T, the number of atoms of R, the overlapping degree Δ(R), and the average overlapping degrees δ(R) and δ(Comb(R)). If sets are represented by n-wildcard expressions (resp. d-multi-ranges), corresponding time bounds follow.

This result can be extended to handle write actions; due to lack of space, this is explained in Appendix F. The upper bounds for forwarding loop detection listed in Table 1 follow from Corollary 1, which is proved in Appendix A.3. A key ingredient consists in maintaining, for each rule set, a list that describes its presence, priority and action at each node. Detecting a loop for a header class then consists in merging the lists associated to the rule sets containing the class (as provided by our atom representation) to obtain the forwarding graph G_h shared by all headers h of the class. Directed cycle detection is finally performed on each such graph.
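End to end, the class-by-class scheme looks as follows. This is an explicit-enumeration sketch: headers are grouped into classes by the exact rules they match and one representative per class is tested, whereas the paper's algorithm builds the classes symbolically:

```python
from itertools import product

def loop_detection(tables, n):
    """Group the 2**n headers into classes by the exact rules they match,
    then run cycle detection on the forwarding graph of one representative
    per class (for illustration only)."""
    def matches(h, e):
        return all(x == '*' or x == b for b, x in zip(h, e))
    classes = {}
    for bits in product('01', repeat=n):
        h = ''.join(bits)
        sig = tuple(tuple(matches(h, e) for e, _, _ in rules)
                    for rules in tables.values())
        classes.setdefault(sig, h)  # keep one representative per class
    for h in classes.values():
        edges = {}
        for node, rules in tables.items():
            for e, action, nxt in rules:  # first-match semantics
                if matches(h, e):
                    if action == 'fwd':
                        edges[node] = nxt
                    break
        for start in edges:  # out-degree <= 1: follow successors
            cur, seen = start, set()
            while cur in edges:
                if cur in seen:
                    return True
                seen.add(cur)
                cur = edges[cur]
    return False
```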

In the rest of this section, we prove Theorem 1 by introducing two incremental algorithms for atom computation and analyzing their performance. The incremental approach allows us to avoid exponential blow-up even in cases where the number of combinations can be exponential in the number of atoms (an example is given in Appendix B.2).

Algorithms for updating atoms

We first propose a basic algorithm for updating the collection of uncovered combinations of a collection C when a set S is added to C. The main idea is that after adding S to C, the only new uncovered combinations that can be created are intersections of pre-existing uncovered combinations with S (see Lemma 1 in Appendix A.2). We thus first add to the collection the combinations c ∩ S for each pre-existing uncovered combination c. As this may introduce covered combinations, we then compute the atom size |atom(c)| for each combination c. It is then sufficient to remove any combination c with |atom(c)| = 0 to finally obtain the uncovered combinations. This atom size computation is possible because each combination c is the disjoint union of the sets atom(c′) over the combinations c′ ⊆ c (see Lemma 2 in Appendix A.2). As this union is disjoint, we have |atom(c)| = |c| − Σ_{c′ ⊊ c} |atom(c′)|. We thus compute the inclusion relation between combinations and store, for each combination c, the combinations that strictly contain c. Initializing size(c) to |c| for all c, we then scan all combinations by non-decreasing cardinality (or any topological order for inclusion) and subtract size(c) from size(c′) for each c′ ⊋ c. A simple proof by induction shows that size(c) = |atom(c)| when c is scanned. The whole process is summarized in Algorithm 1.

Procedure AddSet(S)
       For each uncovered combination c of the current collection do
              add c ∩ S (if non-empty) to the collection Comb of combinations
       Add S to Comb.
       For each c ∈ Comb do
              size(c) ← |c|
       For each c ∈ Comb in non-decreasing cardinality order do
              For each c′ ∈ Comb with c ⊊ c′ do
                     size(c′) ← size(c′) − size(c)
       Remove from Comb every c with size(c) = 0.
Algorithm 1 Add a set S to a collection C and update the collection of its uncovered combinations accordingly.

The correctness of Algorithm 1 follows from the two remarks above (that is, Lemma 1 and Lemma 2 in Appendix A.2). Its main complexity cost comes from intersecting S with each combination and from computing the inclusion relation between combinations, requiring respectively a linear and a quadratic number of elementary set operations in the number of combinations. Starting from the empty collection and incrementally applying Algorithm 1 to each set in C thus allows one to obtain the uncovered combinations of C, yielding the general bound of Theorem 1.
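A plain-Python sketch of this incremental step, on explicit frozensets with len playing the role of the elementary cardinality operation (this is the basic Algorithm 1, not the optimized Algorithm 2):

```python
def add_set(uncovered, s):
    """Add s to the collection: intersect it with the existing uncovered
    combinations, then recover atom sizes by subtracting, in non-decreasing
    cardinality order, the size of each combination from its strict supersets;
    combinations whose atom size drops to zero are covered and removed."""
    comb = set(uncovered) | {s}
    for c in uncovered:
        if c & s:
            comb.add(c & s)
    size = {c: len(c) for c in comb}
    for c in sorted(comb, key=len):  # any topological order for inclusion works
        for c2 in comb:
            if c < c2:  # strict inclusion: propagate the finalized size of c
                size[c2] -= size[c]
    return {c for c in comb if size[c] > 0}
```

Starting from the empty collection and calling `add_set` once per rule set reproduces the incremental computation described above.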

To derive better bounds for a low overlapping degree, we propose a more involved algorithm that maintains the atom sizes and container lists from one iteration to the next and makes only the necessary updates. This requires handling several subtleties to enable lower complexity.

We similarly start by computing the collection Inter of combinations intersecting S. A first subtlety comes from the fact that several combinations may result in the same intersection with S. However, we are only interested in the combination that is minimal for inclusion, which we call the parent of the produced combination: the container list of the produced combination can then be computed from that of its parent. The parent is unique unless the produced combination is covered, in which case it is marked as covered and discarded (see the argument for list computation later on). To obtain the right parent information, we thus process all combinations of Inter by non-decreasing cardinality. The produced combinations that were not already present are called new combinations; their atom size is initialized to their cardinality. See the "Parent computation" part of Algorithm 2.

We then remark that we only need to compute (or update) atom sizes for combinations that are included in S, which we store in a set Incl. We also note that a container list needs to be computed when a combination is new and updated when it is the parent of a new combination. A second subtlety resides in computing (or updating) a list only when the combination is not covered, that is, when its atom size (after computation) appears to be non-zero. As the computation of lists is the heaviest part of the computation, this is necessary to enable our complexity analysis. For that purpose, we scan Incl by non-decreasing cardinality so that the correct atom size of a combination is known when it is scanned, similarly as in Algorithm 1. However, we avoid any useless computation when this size is zero. Otherwise, we compute (or update) the list and decrease the atom sizes of the adequate combinations: if both combinations involved were already present before adding S, this subtraction has already been made in a previous iteration; it is only necessary when one of them is new or when one is the parent of the other. We optionally maintain, for each combination c, a list L(c) of the sets of C that contain c (such lists are not necessary for the computation, but they are useful for loop detection, as detailed in Appendix A.3). See the "Atom size computation" part of Algorithm 2.

Procedure AddSet(S)
       Inter ← {c : c uncovered combination with c ∩ S ≠ ∅}; New ← ∅; Incl ← ∅ /* ---------------------- Parent computation ---------------------- */
       Sort Inter by non-decreasing cardinality. For each c ∈ Inter do
              c′ ← c ∩ S
              If c′ is not a pre-existing combination then
                     If c′ ∉ New then
                            New ← New ∪ {c′}; parent(c′) ← c; size(c′) ← |c′|; /* Updated later. */
                     else if parent(c′) ⊄ c then
                            mark c′ as covered /* c′ has two minimal generators */
              else
                     If c′ = c then Incl ← Incl ∪ {c} /* c ⊆ S */
       Incl ← Incl ∪ New. Remove from Incl and New every c′ marked as covered. /* ---------------------- Atom size computation ---------------------- */
       Sort Incl by non-decreasing cardinality. For each c ∈ Incl do
              If size(c) ≠ 0 then
                     /* Compute or update L(c), and update size(c′) for impacted c′: */
                     If c ∈ New then compute L(c) from L(parent(c))
                     else update L(c) by intersecting its elements with S
                     For each c′ ∈ Incl s.t. c ⊊ c′ and (c′ ∈ New or c′ = parent(c)) do
                            size(c′) ← size(c′) − size(c)
       /* ---------------------- Remove covered combinations ---------------------- */
       For each c ∈ New do
              If size(c) = 0 then discard c and remove c from the lists L(·) of the remaining combinations
              else add c to the collection of uncovered combinations
Algorithm 2 Add a set S to a collection C and update the collection of its uncovered combinations accordingly.

A last critical point resides in the computation of the list L(c′) for each new combination c′. The list can be obtained from L(c), where c is the parent of c′, by copying it and additionally intersecting its elements with S: every container of c′ in the updated collection arises either as a container of c or as the intersection of such a container with S. This relies on the minimality of the parent: if c′ could be generated by intersection with S from two different combinations that are both minimal for inclusion, then c′ was covered in the previous collection and would remain covered in the updated one; such combinations are precisely the ones discarded during parent computation. On the other hand, the list of a pre-existing combination c included in S can be updated by intersecting the elements of L(c) with S: when e ∈ L(c) and c ⊆ S, we have e ∩ S ⊇ c.

Finally, combinations whose atom size is zero are discarded and removed from the lists of the remaining combinations, as detailed in the “Remove covered combinations” part of Algorithm 2.

The main argument in the complexity analysis of Algorithm 2 comes from bounding the size of each list (this is where the overlapping degree of the collection is used). The number of elementary set operations performed is bounded in terms of the number of atoms of the collection, the atoms included in the new set, and the uncovered combinations that contain it. A key step consists in proving a bound on the latter quantity. An extra factor accounts for membership tests and add/remove operations on collections of combinations; in the case of wildcard expressions and multi-ranges, it can be saved by storing collections of sets in tries rather than balanced binary search trees. With multi-ranges, a segment tree allows retrieving Inter with few elementary set operations [2, 8].

The various bounds for low overlapping degree in Theorem 1 result from iteratively applying Algorithm 2 to each set of the collection and from carefully bounding the sums of the terms involved. Intuitively, each atom generated by the collection of the first sets can be associated with an atom of the full collection that it contains. As each atom is included in a bounded number of sets, this allows bounding the overall sum of terms. Details are given in Appendix A.2.
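As a concrete (if naive) reference point for the objects being counted, the atoms generated by a collection over an explicit finite universe are simply the classes of elements sharing the same membership pattern. The following sketch is ours and is not the incremental combination-based algorithm above (which never enumerates individual headers); it only makes the semantics of atoms executable:

```python
def atoms(universe, sets):
    """Group elements of `universe` into atoms: maximal classes of
    elements that belong to exactly the same sets of the collection."""
    classes = {}
    for x in universe:
        signature = tuple(x in s for s in sets)
        classes.setdefault(signature, set()).add(x)
    # The all-False signature is the region outside every set; it is an
    # atom of the generated field of sets whenever it is nonempty.
    return list(classes.values())

universe = set(range(8))
sets = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7}]
for a in sorted(atoms(universe, sets), key=min):
    print(sorted(a))
```

In a forwarding network, each atom corresponds to one class of headers with identical behavior; the point of the combination-based representation is to obtain these classes without iterating over the (huge) header space as this sketch does.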

5 Comparison with previous works

We give in Appendix C more details about “linear fragmentation”, a phenomenon observed by Kazemian et al. [14], and propose low overlapping degree as a plausible cause. The notion of uncovered combination is linked to that of weak completion, introduced in [5] in the context of rule-conflict resolution, as detailed in Appendix D. We now provide examples where the use of set-complement computation can lead to an exponential blow-up in previous work.

5.1 Veriflow

Veriflow [17] incrementally computes a partition into sub-classes that forms a refinement of the header classes: when a rule is added, each sub-class intersecting it is replaced by the intersection and a partition of the remainder. Veriflow benefits from the hypothesis that headers can be decomposed into fixed fields and that each rule set can be represented by a multi-range. The intersection of two multi-ranges is obviously a multi-range. However, set difference is obtained by intersecting with the complement, which is in general only representable as a union of multi-ranges. We prove in Appendix B.1 that successive differences with suitably chosen rules can generate exponentially many multi-ranges in the computations of Veriflow, as indicated in Table 1.
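The blow-up can be reproduced with a generic box-subtraction routine (a sketch of ours, not Veriflow’s actual code; all names are illustrative). Subtracting a thin slab along each of the d dimensions leaves a region whose minimal multi-range representation has 2^d boxes:

```python
def subtract(box, cut):
    """Subtract multi-range `cut` from multi-range `box`.
    Boxes are lists of (lo, hi) half-open intervals, one per dimension.
    Returns a list of up to 2*d disjoint boxes covering their difference."""
    pieces = []
    rest = list(box)                # progressively narrowed to the overlap
    for i, ((lo, hi), (clo, chi)) in enumerate(zip(box, cut)):
        ilo, ihi = max(lo, clo), min(hi, chi)
        if ilo >= ihi:              # no overlap in dimension i: nothing removed
            return [box]
        if lo < ilo:                # slab of `box` below the cut in dimension i
            pieces.append(rest[:i] + [(lo, ilo)] + rest[i + 1:])
        if ihi < hi:                # slab above the cut
            pieces.append(rest[:i] + [(ihi, hi)] + rest[i + 1:])
        rest[i] = (ilo, ihi)        # keep carving inside the overlap
    return pieces

# Remove a thin slab along each of the 3 dimensions of (0,10)^3:
space = [(0, 10)] * 3
cuts = [[(2, 3) if i == j else (0, 10) for i in range(3)] for j in range(3)]
regions = [space]
for c in cuts:
    regions = [p for r in regions for p in subtract(r, c)]
print(len(regions))  # → 8: each cut doubles the number of boxes
```

The resulting set is a product of two intervals per dimension, so no pair of the 2^3 boxes can be merged; with d cuts the count is 2^d, matching the exponential behavior proved in Appendix B.1.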

5.2 HSA / NetPlumber

HSA/NetPlumber [15, 13] use clever heuristics to efficiently compute the set of headers that can traverse a given path. An important one is lazy subtraction: set-difference computations are postponed until the end of the path. For that purpose, this set is represented as a union of terms combining intersections and complements of elementary sets represented with wildcard expressions. The emptiness of such terms is regularly tested. A simple heuristic is used during the construction of the path: a term is obviously empty if its positive part is included in one of the subtracted sets. But if the path loops, HSA has to develop the corresponding terms into a union of wildcard expressions to determine whether one of them may produce a forwarding loop.

We now provide an example where this emptiness test can take exponential time. Consider a node whose forwarding table consists of rules with the following rule sets:

All rules are associated with the drop action except the last rule, whose action is to forward to the node itself. Such a forwarding table is depicted in Figure 1 for . Starting a loop detection from that node, HSA detects a loop for headers matching the last rule but none of the preceding ones, and the emptiness of the corresponding term is thus tested. For that purpose, HSA represents the complement of each subtracted rule set as a union of wildcard expressions, each having only one non-wildcard letter. Distributivity is then used to expand the intersection of these complements. After expanding the first intersections, HSA thus obtains a union of wildcard expressions that has to be intersected with the last rule set. In particular, all strings over the alphabet with the prescribed pattern of fixed letters are produced during the computation, which thus requires exponential time. For testing a network with similar nodes, HSA thus requires exponential time overall. As all rule sets are pairwise disjoint, the overlapping degree of the collection is constant, and this justifies the two lower bounds indicated for NetPlumber in Table 1.
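Since the exact rule sets of the example are not reproduced in this extract, the following sketch uses an analogous family of pairwise-disjoint wildcard rules (an assumption of ours): rule r_i matches headers whose first 1 is at position i. Fully distributing the intersection of their complements, as a lazy-subtraction expansion must, enumerates factorially many candidate terms even though a single term survives:

```python
from itertools import product

def inter(a, b):
    """Letter-wise intersection of two wildcard expressions; None if empty."""
    out = []
    for ca, cb in zip(a, b):
        if ca == 'x':
            out.append(cb)
        elif cb == 'x' or cb == ca:
            out.append(ca)
        else:
            return None                 # 0 meets 1: empty set
    return ''.join(out)

def complement(w):
    """Complement of wildcard w as a union of wildcards, one per fixed letter."""
    flip = {'0': '1', '1': '0'}
    return ['x' * i + flip[c] + 'x' * (len(w) - i - 1)
            for i, c in enumerate(w) if c != 'x']

n = 6
# Pairwise-disjoint rules: r_i matches headers whose first 1 is at bit i.
rules = ['0' * i + '1' + 'x' * (n - i - 1) for i in range(n)]
comps = [complement(r) for r in rules]

# Naive distribution examines prod(len(c) for c in comps) = n! candidates:
candidates = list(product(*comps))
survivors = set()
for combo in candidates:
    t = 'x' * n
    for w in combo:
        t = inter(t, w)
        if t is None:
            break
    if t:
        survivors.add(t)
print(len(candidates), survivors)  # → 720 {'000000'}
```

The intersection of the complements is the single header 0…0, yet the naive expansion touches n! candidate terms; this is the kind of growth the example above forces on HSA’s final emptiness test.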

6 Future work

Our approach could naturally be integrated in the Veriflow [17] framework, both for speed-up and for performance-guarantee considerations. Our ideas can also be integrated in the NetPlumber [13] framework (see the similar approach proposed in Appendix E). This would make it possible to enhance the emptiness tests performed within NetPlumber so as to guarantee polynomial-time execution when the number of header classes is polynomially bounded. In the context of multi-ranges, the emptiness test of such expressions is equivalent to the Klee measure problem, which consists in computing the volume of a union of boxes: an expression is empty when the volume of the union of the subtracted boxes equals that of the enclosing box. Recent work [6] indicates that this problem can be solved with surprisingly low complexity. It would be interesting to determine whether such low complexity bounds extend to atom computation in the case of multi-range rules.
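To make the reduction concrete, here is a naive Klee-style volume computation by coordinate compression (a sketch of ours, exponential in the dimension, unlike the algorithms of [6]): the difference of a box B and a union of boxes is empty exactly when the union of the boxes clipped to B has the same volume as B.

```python
from itertools import product

def union_volume(boxes):
    """Volume of a union of axis-aligned boxes ((lo, hi) per dimension),
    by coordinate compression: every elementary grid cell is either fully
    inside or fully outside each box."""
    d = len(boxes[0])
    coords = [sorted({b[i][0] for b in boxes} | {b[i][1] for b in boxes})
              for i in range(d)]
    vol = 0
    for cell in product(*[range(len(c) - 1) for c in coords]):
        lo = [coords[i][cell[i]] for i in range(d)]
        hi = [coords[i][cell[i] + 1] for i in range(d)]
        if any(all(b[i][0] <= lo[i] and hi[i] <= b[i][1] for i in range(d))
               for b in boxes):        # cell covered by some box
            v = 1
            for l, h in zip(lo, hi):
                v *= h - l
            vol += v
    return vol

# B \ (A1 ∪ A2 ∪ A3) is empty iff the union's volume equals vol(B) = 16:
B = [(0, 4), (0, 4)]
cover = [[(0, 2), (0, 4)], [(2, 4), (0, 3)], [(1, 4), (2, 4)]]
print(union_volume(cover) == 16)  # → True: the three boxes cover B
```

The boxes here are already subsets of B; in general each box must first be intersected with B before summing, which is a single multi-range intersection per box.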

References

  • [1] Hari Adiseshu, Subhash Suri, and Guru M. Parulkar. Detecting and resolving packet filter conflicts. In Proceedings IEEE INFOCOM 2000, The Conference on Computer Communications, Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Reaching the Promised Land of Communications, Tel Aviv, Israel, March 26-30, 2000, pages 1203–1212. IEEE, 2000.
  • [2] Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag TELOS, Santa Clara, CA, USA, 3rd ed. edition, 2008.
  • [3] Allan Borodin, Rafail Ostrovsky, and Yuval Rabani. Lower bounds for high dimensional nearest neighbor search and related problems. In Jeffrey Scott Vitter, Lawrence L. Larmore, and Frank Thomson Leighton, editors,

    Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, May 1-4, 1999, Atlanta, Georgia, USA

    , pages 312–321. ACM, 1999.
  • [4] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, SIGCOMM ’13, pages 99–110, New York, NY, USA, 2013. ACM.
  • [5] Matthieu Boutier and Juliusz Chroboczek. Source-specific routing. In IFIP Networking, 2015.
  • [6] Timothy M. Chan. Klee’s measure problem made easy. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 410–419. IEEE Computer Society, 2013.
  • [7] Moses Charikar, Piotr Indyk, and Rina Panigrahy. New algorithms for subset query, partial match, orthogonal range searching, and related problems. In Peter Widmayer, Francisco Triguero Ruiz, Rafael Morales Bueno, Matthew Hennessy, Stephan Eidenbenz, and Ricardo Conejo, editors, Automata, Languages and Programming, 29th International Colloquium, ICALP 2002, Malaga, Spain, July 8-13, 2002, Proceedings, volume 2380 of Lecture Notes in Computer Science, pages 451–462. Springer, 2002.
  • [8] Herbert Edelsbrunner and Hermann A. Maurer. On the intersection of orthogonal objects. Information Processing Letters, 13(4):177–181, 1981.
  • [9] David Eppstein and S. Muthukrishnan. Internet packet filter management and rectangle geometry. In S. Rao Kosaraju, editor, Proceedings of the Twelfth Annual Symposium on Discrete Algorithms, January 7-9, 2001, Washington, DC, USA., pages 827–835. ACM/SIAM, 2001.
  • [10] Anja Feldmann and S. Muthukrishnan. Tradeoffs for packet classification. In Proceedings IEEE INFOCOM 2000, The Conference on Computer Communications, Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Reaching the Promised Land of Communications, Tel Aviv, Israel, March 26-30, 2000, pages 1193–1202. IEEE, 2000.
  • [11] Pankaj Gupta and Nick McKeown. Algorithms for packet classification. IEEE Network: The Magazine of Global Internetworking, 15(2):24–32, March 2001.
  • [12] Naga Praveen Katta, Jennifer Rexford, and David Walker. Incremental consistent updates. In Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, HotSDN ’13, pages 49–54, New York, NY, USA, 2013. ACM.
  • [13] Peyman Kazemian, Michael Chang, Hongyi Zeng, George Varghese, Nick McKeown, and Scott Whyte. Real time network policy checking using header space analysis. In NSDI, 2013.
  • [14] Peyman Kazemian, George Varghese, and Nick McKeown. Header space analysis: Static checking for networks. Technical report, Stanford, 2011. http://yuba.stanford.edu/~peyman/docs/headerspace_tech_report.pdf.
  • [15] Peyman Kazemian, George Varghese, and Nick McKeown. Header space analysis: Static checking for networks. In Presented as part of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pages 113–126, San Jose, CA, 2012. USENIX.
  • [16] Peyman Kazemian. HSA/NetPlumber source code repository. https://bitbucket.org/peymank/hassel-public/.
  • [17] Ahmed Khurshid, Xuan Zou, Wenxuan Zhou, Matthew Caesar, and P Brighten Godfrey. VeriFlow : Verifying Network-Wide Invariants in Real Time. In NSDI, 2013.
  • [18] Vasileios Kotronis, Xenofontas Dimitropoulos, and Bernhard Ager. Outsourcing the routing control logic: Better internet routing based on SDN principles. In Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pages 55–60, New York, NY, USA, 2012. ACM.
  • [19] Haohui Mai, Ahmed Khurshid, P Brighten Godfrey, and Samuel T King. Debugging the Data Plane with Anteater. In SIGCOMM, 2011.
  • [20] Arnaud Mary and Yann Strozecki. Efficient enumeration of solutions produced by closure operations. In Nicolas Ollinger and Heribert Vollmer, editors, 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, February 17-20, 2016, Orléans, France, volume 47 of LIPIcs, pages 52:1–52:13. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.
  • [21] Christopher Monsanto, Joshua Reich, Nate Foster, Jennifer Rexford, and David Walker. Composing software-defined networks. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, nsdi’13, pages 1–14, Berkeley, CA, USA, 2013. USENIX Association.
  • [22] Mihai Patrascu. Unifying the landscape of cell-probe lower bounds. SIAM J. Comput., 40(3):827–847, 2011.
  • [23] Peter Perešíni, Maciej Kuzniar, Nedeljko Vasić, Marco Canini, and Dejan Kostic. OF.CPP: Consistent Packet Processing for Openflow. In Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, HotSDN ’13, pages 97–102, New York, NY, USA, 2013. ACM.
  • [24] Ronald L. Rivest. Partial-match retrieval algorithms. SIAM J. Comput., 5(1):19–50, 1976.
  • [25] Route Views Project. BGP traces. http://routeviews.org/.
  • [26] Matteo Varvello, Rafael Laufer, Feixiong Zhang, and T.V. Lakshman. Multi-layer packet classification with graphics processing units. In CoNEXT, 2014.
  • [27] Hongyi Zeng, Shidong Zhang, Fei Ye, Vimalkumar Jeyakumar, Mickey Ju, Junda Liu, Nick McKeown, and Amin Vahdat. Libra: Divide and conquer to verify forwarding tables in huge networks. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI’14, pages 87–99, Berkeley, CA, USA, 2014. USENIX Association.

Appendix A : Proof details of Theorem 1 and Corollary 1

We present here a refined version of Theorem 1 with more hypotheses about the data structures used to store a collection of sets. We first review these hypotheses and then give the proof details of both versions.

A.1 Collection of sets representation

When manipulating a collection of sets, we assume that their representations are stored in a collection data structure that allows dynamically adding, removing, or testing membership of a set. Such operations are called collection operations. We can use a balanced binary search tree when comparisons according to a total order can be performed. Such a comparison can usually be obtained by directly comparing the binary representations of the sets themselves in linear time; it is considered as an elementary set operation. In the case of wildcard expressions, the complexity of these operations can be reduced by using a trie or a Patricia tree. Our algorithms also make use of an operation similar to a stabbing query that we call an intersection query: it consists in producing the list of sets in a collection that intersect a given query set. We additionally require that this list is topologically sorted according to inclusion. We say that intersection queries can be answered with a given overhead when dynamically adding or removing a set from the collection, as well as answering the intersection query for any set, takes time at most that overhead. In the case of multi-ranges, a segment tree allows answering intersection queries with low overhead [2, 8]. In the case of wildcard expressions, a trie or a Patricia tree can be used (the whole tree has to be traversed in the worst case, but no sorting is necessary, as the result is naturally obtained in lexicographic order).
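As an illustration of the wildcard case, the following trie sketch (ours; class and method names are illustrative) stores wildcard expressions and answers an intersection query by descending only the branches compatible with the query, letter by letter:

```python
class WildcardTrie:
    """Trie over the alphabet {'0', '1', 'x'} storing wildcard expressions."""

    def __init__(self):
        self.children = {}
        self.word = None                 # expression ending at this node

    def add(self, w):
        node = self
        for c in w:
            node = node.children.setdefault(c, WildcardTrie())
        node.word = w

    def intersecting(self, q):
        """All stored expressions whose header set intersects that of q."""
        out, stack = [], [(self, 0)]
        while stack:
            node, i = stack.pop()
            if i == len(q):
                if node.word is not None:
                    out.append(node.word)
                continue
            for c, child in node.children.items():
                # two letters are compatible unless one is 0 and the other 1
                if q[i] == 'x' or c == 'x' or c == q[i]:
                    stack.append((child, i + 1))
        return out

t = WildcardTrie()
for w in ['00x', '01x', '1xx', 'x10']:
    t.add(w)
print(sorted(t.intersecting('0x1')))  # → ['00x', '01x']
```

Only branches compatible with the query are explored, although in the worst case (a query of all wildcards) the whole tree is traversed, matching the overhead discussed above.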

A.2 Incremental computation of atoms (Theorem 1)

The following is a refinement of Theorem 1 with respect to data-structures that provide time bounds on -collection operations and -intersection queries. For the sake of simplicity of asymptotic expressions, we make the very loose assumption that and . (We are mainly interested in the case where is large. Note also that examples with would be very peculiar.)

Theorem 2.

Given a space set and a collection of subsets of , the collection of combinations that canonically represent the atoms generated by can be incrementally computed with elementary set operations where denotes the number of atoms generated by , denotes the overlapping degree of , denotes the average overlapping degree of and denotes the average overlapping degree of .

More precisely, if the data-structures used for representing sets and collections of sets enable elementary set operations within time , -collection operations within time and -intersection queries with overhead , then the representation of the atoms generated by can be computed in