1 Introduction
To provide reachability services to Internet users, conventional Internet routers classify packets based only on the destination address. Although one dimensional routers are adequate for destination-based routing, there are increasing demands for higher dimensional routers
[15], for security, traffic engineering, quality of service, etc. Among the higher dimensional routers, two dimensional routers (TwoD routers), which classify packets based on both destination and source addresses, have gained considerable attention [2][34][20], due to the important semantics of destination and source addresses [36]. For example, TwoD routers can easily express policies between host and host, or network and network. China Education and Research Network 2 (CERNET2), the largest native IPv6 network in the world, is now deploying Two Dimensional-IP (TwoD-IP) routing [39]. More specifically, routing decisions will be based not only on the destination address, but also on the source address. Such an extension provides room to solve problems of the past and to foster innovations in the future. The TwoD router is a key element in TwoD-IP routing.
There have been many research efforts on TwoD routers. Most of them focus on software-based solutions [37][35][1]; however, software-based solutions need many memory accesses, and cause non-deterministic lookup times. The problem gets worse after deploying IPv6, where more bits must be matched. Hardware-based, especially TCAM-based, solutions are the de facto standard for core routers, due to their constant lookup time and high speed. Despite its high speed, TCAM is limited by its low capacity, large power consumption and high cost [23]. The largest TCAM chip currently available can only accommodate 1 million IPv4 prefixes [24].
TCAM resources are further limited in TwoD routers. Two dimensional classifiers widely adopt the traditional Cisco Access Control List (ACL) structure (we call it the ACL-like structure hereafter); e.g., CERNET2 is using this structure. In Figure 1, we show a typical table within the ACL-like structure, where destination and source prefixes (of 4 bits for brevity) are concatenated as an entry in TCAM. For example, on receiving a packet with destination address 1011 and source address 1111, the router will forward the packet to 1.0.0.2, after matching destination prefix 101* and source prefix 11** according to the longest match first (LMF) rule. This 'fat' TCAM structure provides fast lookup speeds; however, it greatly increases the TCAM resources needed in TwoD routers, because 1) it doubles the width of a TCAM entry, e.g., 288 bits (a typical TCAM width) are needed for IPv6; 2) in the worst case, the number of TCAM entries can be $n \times m$, where $n$ and $m$ are the numbers of destination and source prefixes. The ACL-like structure works well with a few entries, but it becomes inefficient when the number of entries increases. If TwoD-IP routing is deployed, the number of entries will predictably increase even more rapidly; e.g., CERNET2 wants to carry out policy routing between about 6,000 destination prefixes and 100 source prefixes, resulting in 600,000 entries in TCAM.
Destination prefix  Source prefix  Action
111*                111*           Forward to 1.0.0.0
111*                100*           Forward to 1.0.0.1
100*                111*           Forward to 1.0.0.2
101*                11**           Forward to 1.0.0.2
10**                11**           Forward to 1.0.0.3
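The ACL-like matching described above can be sketched in software as follows. This is a simplified emulation of the TCAM behavior, not the hardware itself; the table contents follow Figure 1, prefixes are 4-bit strings padded with `*`, and the helper names are ours:

```python
def matches(prefix, addr):
    """True if addr (a bit string) falls within prefix, e.g. '101*' covers '1011'."""
    return addr.startswith(prefix.rstrip('*'))

def acl_lookup(table, dst, src):
    """Emulate an ACL-like TCAM: among entries whose concatenated
    (destination, source) prefixes both match, pick the longest
    destination match, breaking ties by the longest source match."""
    best = None
    for dp, sp, action in table:
        if matches(dp, dst) and matches(sp, src):
            key = (len(dp.rstrip('*')), len(sp.rstrip('*')))
            if best is None or key > best[0]:
                best = (key, action)
    return best[1] if best else None

# The Figure 1 table, one concatenated TCAM entry per row.
TABLE = [
    ("111*", "111*", "1.0.0.0"),
    ("111*", "100*", "1.0.0.1"),
    ("100*", "111*", "1.0.0.2"),
    ("101*", "11**", "1.0.0.2"),
    ("10**", "11**", "1.0.0.3"),
]
```

With this table, a packet with destination 1011 and source 1111 matches both the (101*, 11**) and (10**, 11**) entries, and the longer destination match wins, yielding 1.0.0.2 as in the text.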
In this paper, to address these limitations, we put forward a new forwarding table structure called FIST (FIB Structure for TwoD-IP). The key idea of FIST is to store destination and source prefixes in two separate TCAM tables, and to store the other information in SRAM, which is much cheaper and consumes much less power than TCAM. For the table in Figure 1, we store destination prefixes 111*, 100*, 101* and 10** in one TCAM table, and source prefixes 111*, 100* and 11** in another TCAM table. By moving the redundancy from TCAM to SRAM, we can reduce the TCAM storage space, because 1) the TCAM width can be halved, e.g., 144 bits are enough for IPv6; 2) the number of TCAM entries is reduced, i.e., each prefix appears only once. In the worst case, there are $n+m$ TCAM entries, where $n$ and $m$ are the numbers of destination and source prefixes. A naive FIST may increase the SRAM storage space, so we develop a set of techniques for better scalability. We show that the redundancy in SRAM can be largely removed, thanks to the flexibility of SRAM.
Within FIST, each destination prefix points to a row, each source prefix points to a column, and together they point to a cell of a two dimensional array in SRAM, from which we can obtain the action (or next hop) information. When a packet arrives, we can match its destination and source addresses in parallel in TCAM, and then find the next hop information in SRAM. The lookup process can be pipelined, and the lookup time is comparable with that of current Internet routers.
However, within FIST, conflicts may arise, i.e., a wrong prefix may be matched, after removing the binding relation between destination and source prefixes in TCAM. For example, if a packet with destination address 1011 and source address 1111 arrives, destination prefix 101* and source prefix 111* will be matched by applying the LMF rule in each separate TCAM table; however, there does not exist any entry with destination prefix 101* and source prefix 111*. To resolve such conflicts, we precompute the right actions for all conflicted cases. Such precomputation guarantees correctness, but it becomes impractical when updates happen frequently, because it requires recomputation for all conflicted cases, and causes a large number of SRAM accesses on each update. To support incremental updates, we propose a new data structure called the colored tree, through which we minimize the computation cost and the number of memory accesses.
We implement FIST on a commercial router, the BitEngine 12004. By redesigning the hardware logic, we need no new devices. We carry out comprehensive evaluations on the real implementation, using real topology, FIB, prefix and traffic data from CERNET2. The results show that a FIST-based TwoD router can achieve line-card speed, saves TCAM and SRAM storage space, and brings acceptable update burden.
2 Overview of TwoD Router Design
We want the performance (i.e., packet processing) of the TwoD router to be comparable with that of current Internet routers. We choose TCAM as our baseline design, as TCAM is the key factor behind the fast speed of current routers.
The immediate change that TwoD-IP routing brings is the forwarding table size. More specifically, the Forwarding Information Base (FIB) will increase tremendously. Note that at first sight one might think the routing table only doubles. This is not true: each destination address may correspond to many different source addresses. A straightforward implementation, i.e., the ACL-like structure, changes the FIB table from {destination} → {action} to {(destination, source)} → {action}. This increases the FIB size by an order of magnitude, and a practical consequence is that TCAM cannot hold entries of such scale. Current TCAM capacity is 1 million entries, and the current number of destination prefixes is 400,000 [4]. If TwoD-IP is implemented in this straightforward way, even with 100 source prefixes, the table is already far beyond the TCAM capacity.
We solve this problem by proposing a novel forwarding table structure, FIST (see Fig. 1). The key of FIST is a novel separation of TCAM and SRAM: TCAM contributes fast lookup and SRAM contributes a larger memory space. Overall, FIST consumes only $n+m$ TCAM entries, where $n$ and $m$ are the numbers of destination and source prefixes.
Another difficulty is the update action. In principle, an update of a destination prefix in a TwoD router may incur an update for each source prefix associated with this destination prefix, and vice versa. This indicates that, under a straightforward design, the update of a single entry in a TwoD router costs as much as updating a full table of a current router.
Suppose that there are 10,000 source prefixes, and 500 updates on destination prefixes per second. In the worst case, there are 5,000,000 updates on SRAM per second, which almost exceeds the capability of the hardware (in the BitEngine 12004, line cards work at 100MHz and need 20 clock cycles for a read/write operation, i.e., at most 5,000,000 operations per second).
We try every means to reduce the update complexity. We formulate an optimal transformation problem where we want to minimize the total number of reads/writes for each update. To solve this problem we propose a colored tree to organize the entries, and we prove that we can minimize the computation complexity and the number of memory accesses during update actions.
The rest of the paper is organized as follows. Section 3 introduces the FIST structure, proves its correctness during packet forwarding, and presents the lookup process on FIST. Section 4 describes forwarding table compression. We discuss the incremental update action in Section 5. In Section 6, we take practical issues into consideration and improve the basic FIST structure. Section 7 presents the implementation of FIST on a commercial router. Section 8 provides evaluation details and results. In Sections 9 and 10, we discuss the scalability of FIST and introduce related works. Finally, we present our conclusions in Section 11.
3 FIST Structure and Lookup
3.1 The TwoD matching rule
We first present the definition of the forwarding rules used in two dimensional routing. Let $da$ and $sa$ denote the destination and source addresses, and $dp$ and $sp$ denote the destination and source prefixes. Let $a$ denote an action, more specifically, the next hop. The storage structure should have entries of the 3-tuple form $(dp, sp, a)$.
Definition 1.
TwoD matching rule: Assume a packet with destination address $da$ and source address $sa$ arrives at a router. The destination address should first match $dp$ according to the LMF rule. The source address should then match $sp$ according to the LMF rule among all the 3-tuples $(dp, sp, a)$, given that $dp$ is matched. The packet is then forwarded to next hop $a$.
Our rule is defined based on the following principles: 1) Avoiding conflict: it has been shown [20] that if the source and the destination address are matched with the same priority, the LMF rule cannot decide the priority. Even using a first-matching-rule-in-table tie breaker may result in loops, and resolving the conflict is NP-hard. 2) Compatibility: matching destination prefixes first emphasizes connectivity and is compatible with the previous destination-based architecture. More specifically, if no source prefix is involved, our rule naturally degenerates to the traditional forwarding rule. Note that our router design is symmetric if the source prefix is matched first.
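The TwoD matching rule of Definition 1 can be sketched in software as follows. This is a simplified emulation for illustration only; prefixes are bit strings padded with `*`, and all names are ours:

```python
def lmf(prefixes, addr):
    """Longest-match-first: the longest prefix that covers addr, or None."""
    hits = [p for p in prefixes if addr.startswith(p.rstrip('*'))]
    return max(hits, key=lambda p: len(p.rstrip('*')), default=None)

def twod_match(rules, dst, src):
    """Definition 1: match the destination by LMF first, then match the
    source by LMF among the rules that carry the chosen destination prefix."""
    dp = lmf({d for d, _, _ in rules}, dst)
    cands = [(s, a) for d, s, a in rules if d == dp]
    sp = lmf({s for s, _ in cands}, src)
    return next((a for s, a in cands if s == sp), None)

# Rules taken from the Figure 1 example.
RULES = [
    ("111*", "111*", "1.0.0.0"),
    ("111*", "100*", "1.0.0.1"),
    ("100*", "111*", "1.0.0.2"),
    ("101*", "11**", "1.0.0.2"),
    ("10**", "11**", "1.0.0.3"),
]
```

Because the destination is matched first, a packet with destination 1011 and source 1111 commits to destination prefix 101* before considering the source, and is forwarded to 1.0.0.2.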
3.2 FIST Design Details
3.2.1 FIST basics
The new structure, FIST, is made up of two tables stored in TCAM and two tables stored in SRAM (see Fig. 1). One table in TCAM stores the destination prefixes (we call it the destination table hereafter), and the other stores the source prefixes (the source table hereafter). One table in SRAM is a two dimensional array that stores the indexed next hop of each rule in TwoD-IP (the TD-table hereafter); we call each cell in the array a TD-cell (or cell for short if there is no ambiguity). The other table in SRAM stores the mapping between index values and next hops (the mapping table hereafter).
For each rule $(dp, sp, a)$, $dp$ is stored in the destination table and $sp$ is stored in the source table. The cell in the TD-table determined by $dp$'s row and $sp$'s column stores an index value, and $a$ is stored at the corresponding position of the mapping table. We store the index value rather than the next hop itself in the TD-table, because the next hop information is much longer.
As an example, in Fig. 1, for the rule (101*, 11**, Forward to 1.0.0.2), 101* is stored in the destination table and is associated with a row, and 11** is stored in the source table and is associated with a column. In the TD-table, the cell corresponding to this row and column has index value 2. In the mapping table, the next hop related with index value 2 is 1.0.0.2.
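The four tables and the indirection between them can be illustrated with a miniature example modeled on Figure 1. The particular row, column, and index assignments below are our own hypothetical choices, not the paper's exact figure:

```python
# Two TCAM tables: prefix -> row (destination) / column (source).
DST_TABLE = {"111*": 0, "100*": 1, "101*": 2, "10**": 3}
SRC_TABLE = {"111*": 0, "100*": 1, "11**": 2}

# SRAM mapping table: index value -> next hop.
MAPPING = ["1.0.0.0", "1.0.0.1", "1.0.0.2", "1.0.0.3"]

# SRAM TD-table: cells hold index values into MAPPING.
# None marks a cell that no rule sets directly; Section 3.2.2
# explains how such conflicted cells are filled in.
TD = [[None] * 3 for _ in range(4)]
TD[0][0], TD[0][1] = 0, 1   # (111*, 111*) and (111*, 100*)
TD[1][0] = 2                # (100*, 111*)
TD[2][2] = 2                # (101*, 11**)
TD[3][2] = 3                # (10**, 11**)

def fist_lookup(dst_prefix, src_prefix):
    """The row and column come from the two TCAM tables; the TD-cell
    yields an index that the mapping table turns into a next hop."""
    i, j = DST_TABLE[dst_prefix], SRC_TABLE[src_prefix]
    idx = TD[i][j]
    return MAPPING[idx] if idx is not None else None
```

Note that each prefix is stored once, regardless of how many rules it participates in; the cross-product lives in the cheap SRAM array rather than in TCAM.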
Theorem 1.
The TCAM storage space of FIST is $(n+m)w$ bits, where $w$ is the width of a TCAM entry. The SRAM storage space of FIST is $nmb + c$ bits, where $b$ is the width of an index value and $c$ is the size of the mapping table.
Proof.
Because the destination table has $n$ entries and the source table has $m$ entries, the TCAM space is $(n+m)w$ bits, where $w$ is the width of an entry. The mapping table stores the mapping between index values and the corresponding next hop interfaces. Its size indeed has an upper bound, because each router has a bound on the number of next hop interfaces. Let $k$ denote the number of interfaces of a router and $e$ the size of an entry associated with an interface; then the size of the mapping table is at most $c = ke$. The size of the mapping table can be treated as a constant compared with the SRAM storage space. Therefore, we mainly consider the TD-table in calculating the SRAM storage space: the TD-table dominates the SRAM space, and has $nm$ cells of $b$ bits each. ∎
From Theorem 1, we can see that FIST moves the 'multiplication' to SRAM, rather than eliminating it. Such a move is worthwhile considering the following facts: 1) the capacity of TCAM is much smaller than that of SRAM; 2) TCAM is 10 to 100 times more expensive than SRAM; 3) TCAM consumes much more power than SRAM [19][8][31]. Besides, SRAM is more flexible than TCAM, so reducing redundancy is easier.
3.2.2 TD-cell Saturation
For the example in Fig. 1, if a packet with destination address 1011 and source address 1111 arrives at the router, the rule (101*, 11**, Forward to 1.0.0.2) should be matched. This is because, according to the LMF rule, destination prefix 101* is matched first; among the rules associated with destination prefix 101* (including the default rule), source prefix 11** is then matched. With the new structure, destination prefix 101* will be matched in the destination table and source prefix 111* will be matched in the source table. However, the corresponding cell (101*'s row and 111*'s column) in the TD-table does not have any index value. Intrinsically, consider a packet that should match the destination and source prefix pair $(dp, sp)$. If there exists a source prefix $sp'$ in the source table that is longer than $sp$ and also matches the source address, cell $(dp, sp')$ rather than $(dp, sp)$ will be matched.
To address this problem, we precompute and fill the conflicted cells, e.g., cell (101*, 111*), with the appropriate index values, using algorithm TDSaturation().
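The paper's own pseudocode for TDSaturation() is not reproduced here, so the following is our reconstruction from the surrounding text: every empty cell $(dp, sp)$ is filled with the index value of the longest source prefix that is set in $dp$'s row and is an ancestor (prefix) of $sp$, so that the separate LMF matches can no longer land on an unset cell (cells with no such ancestor fall through to the default rule):

```python
def td_saturate(src_prefixes, td):
    """Sketch of TDSaturation(): fill each empty cell in a row with the
    index of the longest already-set source prefix that is a prefix of
    the cell's own source prefix. src_prefixes[j] labels column j."""
    bits = lambda p: p.rstrip('*')
    for row in td:
        # Columns that are directly set by a rule in this row.
        filled = {sp: row[j] for j, sp in enumerate(src_prefixes)
                  if row[j] is not None}
        for j, sp in enumerate(src_prefixes):
            if row[j] is None:
                # Ancestors of sp (their bits prefix sp's bits) that are set.
                anc = [p for p in filled
                       if bits(sp).startswith(bits(p))]
                if anc:
                    row[j] = filled[max(anc, key=lambda p: len(bits(p)))]
```

For the row of 101* in Figure 1, only the 11** column is set; after saturation the 111* column inherits its index, which is exactly the conflicted case discussed above.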
We show the TD-table after filling up all the conflicted cells in Figure 9.
Theorem 2.
FIST (with TDSaturation()) correctly handles the rule defined in Definition 1.
Proof.
When a packet arrives, $dp$ and $sp$ are matched according to FIST. If $(dp, sp, a)$ is a rule in the table, then cell $(dp, sp)$ stores the index value of $a$, which is the right one.
Otherwise, according to algorithm TDSaturation(), the cell was filled by considering all rules given that $dp$ is matched. The filling rule $(dp, sp', a')$ is such that $sp'$ is a prefix of $sp$, so the packet also matches the rule $(dp, sp', a')$; and there does not exist a matched source prefix longer than $sp'$, so $sp'$ is the longest match among all the rules given that $dp$ is matched. Therefore cell $(dp, sp)$ should indeed be set to the index value of $a'$, according to Definition 1. ∎
3.2.3 A Non-Homogeneous FIST Structure
We expect that in practice, many destination prefixes only have default next hops. It is thus wasteful to dedicate a whole TD-table row to such a prefix. To be more compatible with the current router structure and to further reduce the SRAM space, we divide the forwarding table into two parts. In the first part, each destination prefix points to a row in the TD-table; in the second part, each destination prefix points directly to an index value. For example, in Fig. 1, destination prefix 11** does not need any specific source prefix, so it is stored in the second part.
In our implementation, we logically divide the table into two parts using an indicator bit to separate them. We give more details in Section 7.
3.3 FIST Lookup
The lookup action is shown in Fig. 2. When a packet arrives, the router first extracts the source address and destination address . Using the LMF rule, the router finds the matched source and destination prefixes in both source and destination tables that reside in TCAMs. According to the matched entry, the source table will output a column address and the destination table will output a row address. Combined with the row and column addresses, the router can find a cell in the TDtable, and return an index value. Using the index value, the router looks up the mapping table, and returns the next hop information.
Theorem 3.
The lookup time of FIST is one TCAM clock cycle plus three SRAM clock cycles.
Proof.
The source and destination tables can be accessed in parallel, thus one TCAM clock cycle is enough. Getting the row and column addresses costs one SRAM clock cycle. Then the router accesses the TD-table and the mapping table, each costing one SRAM clock cycle. ∎
As a comparison, conventional destination-based routing usually stores destination prefixes in one TCAM, and accesses TCAM and SRAM once each during a lookup. Note that the SRAM clock cycle is much shorter than the TCAM clock cycle [16], and the bottleneck of a router normally lies in delivering packets through the FIFO, thus two more SRAM accesses will not have a significant impact on throughput.
To minimize the additional impact on throughput, we develop a pipelined lookup process (see the model in Fig. 3). When a packet arrives, the router first extracts the source and destination addresses and hands them to the search engine. The router then looks up the source and destination addresses in parallel in the source and destination tables. Note that we can perform such parallel processing because we have saturated the TD-table. After the router obtains the SRAM addresses that point to the row and column values, the addresses are passed to a FIFO buffer, which resolves the mismatched clock rates between TCAM and SRAM. Using these addresses, the router looks up the SRAM that is used in conjunction with TCAM to get the row and column, then uses the row and column values to look up the TD-table and obtain the index value, which is finally used to look up the mapping table and obtain the next hop information.
Pipelining itself is not new, and almost all routers implement it today. Using the pipeline, the lookup speed of FIST can achieve one packet per TCAM clock cycle.
Observation 1.
When both are implemented with pipelining, the lookup process of FIST routers is the same as that of conventional routers.
4 Forwarding Table Compression
Let $S$ be the set of source prefixes and $D$ the set of destination prefixes. Let $f_D$ (or $f_S$) be a mapping function that maps a destination (or source) prefix $dp$ (or $sp$) to the $i$th row (or $j$th column). Let $C(i, j)$ denote the cell in the $i$th row and $j$th column. We use the 5-tuple $(D, S, f_D, f_S, C)$ to denote a forwarding table.
Definition 2.
$(D, S, f_D, f_S, C)$ is equivalent to $(D', S', f_{D'}, f_{S'}, C')$ if, for any source address $sa$ and destination address $da$, where $da$ matches $dp$ in $D$ and $dp'$ in $D'$, and $sa$ matches $sp$ in $S$ and $sp'$ in $S'$ according to the LMF rule, $C(f_D(dp), f_S(sp)) = C'(f_{D'}(dp'), f_{S'}(sp'))$ is satisfied.
For a given forwarding table, our objective is to find an equivalent forwarding table that occupies minimum storage space, including both TCAM and SRAM.
4.1 Compression in TCAM Space
We first compress the storage space in TCAM, i.e., the destination and source tables. The sizes of the destination and source tables can be measured by the numbers of destination and source prefixes in them.
Problem 1.
Optimal TCAM Compression: for $(D, S, f_D, f_S, C)$, find an equivalent forwarding table $(D', S', f_{D'}, f_{S'}, C')$ such that the storage space in TCAM, i.e., $|D'| + |S'|$, is minimized.
We develop algorithm CompTCAM() to find the optimal TCAM compression. Our algorithm is based on the ORTC (Optimal Routing Table Constructor) algorithm [10], which computes the minimal equivalent one dimensional forwarding table in TCAM.
Intrinsically, our basic idea is to transform the two dimensional table into two conventional one dimensional tables, one destination-based and the other source-based. The action of each prefix in the destination-based (or source-based) table is the corresponding row (or column) vector of the TD-table. After this transformation, the ORTC algorithm can be applied directly to compress these two tables.
Let $r(dp)$ be the row vector related with $dp$, and $l(sp)$ the column vector related with $sp$. Let $f_r$ be the mapping function that maps destination addresses (prefixes) to row vectors, and $f_l$ the mapping function that maps source addresses (prefixes) to column vectors. Let ORTC($P$, $A$) be the function that applies the ORTC algorithm to the forwarding table with prefix set $P$, where the action related with prefix $p$ is $A(p)$. The input of CompTCAM() is the original forwarding table and the output is the new forwarding table after compression.
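The transformation step inside CompTCAM() can be sketched as follows. ORTC itself is not reimplemented here; the sketch only shows how each destination prefix's "action" becomes its whole TD-table row vector, after which a one dimensional compressor such as ORTC [10] can be run on the result unchanged (the function name is ours):

```python
def to_one_dimensional(dst_table, td):
    """Turn the two dimensional table into a conventional destination-based
    table whose 'action' for each prefix is its TD-table row vector.
    dst_table maps a destination prefix to its row number in td."""
    return {dp: tuple(td[row]) for dp, row in dst_table.items()}
```

Two destination prefixes with identical row vectors receive identical "actions", which is exactly what allows ORTC to merge them; the source-side transformation is symmetric, using column vectors.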
Theorem 4.
Algorithm CompTCAM() computes the optimal compression; the complexity of CompTCAM() is $O(nm)$.
Proof.
For the first part of the theorem, according to the ORTC algorithm, the row vector matched by any destination address and the column vector matched by any source address are unchanged; thus the new TwoD-IP forwarding table is equivalent to the original one. We next prove, by contradiction, that CompTCAM() minimizes both the destination and source tables. We only give the proof for destination table minimization; the proof for source table minimization is similar.
In CompTCAM(), according to the ORTC algorithm in [10], suppose there exists another compression that produces a smaller destination table than the computed one. Then there must be a destination address that matches different row vectors in the two compressions, and hence a source address such that the address pair matches a different index value in the new TD-table, contradicting equivalence.
The complexity of the ORTC algorithm is linear in the number of rules. In CompTCAM(), there are $n$ (or $m$) rules, and the complexity of the basic comparison operation, comparing two row (or column) vectors, is $O(m)$ (or $O(n)$). Thus, the complexity of CompTCAM() is $O(nm)$. ∎
CompTCAM() needs byte-by-byte comparisons between rows and columns. To avoid these wasteful comparisons, we can use fingerprints, i.e., collision-resistant hash values computed over the rows/columns [41]. We use SHA-1 as the collision-resistant hash function; the collision probability is proved to be much smaller than the hardware error rate [30]. With fingerprints, each comparison costs $O(1)$, and the complexity of CompTCAM() is reduced accordingly. Note that CompTCAM() minimizes the destination and source tables in TCAM. At the same time, it reduces the size of the TD-table in SRAM, i.e., the rows (or columns) corresponding to the eliminated destination (or source) prefixes are also eliminated. Next, we try to optimize the storage space in SRAM.
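A fingerprint helper along these lines is easy to sketch: SHA-1 over a serialized row or column vector, so that an equality test costs one hash comparison instead of a cell-by-cell scan. The serialization format below is our own choice:

```python
import hashlib

def fingerprint(vector):
    """Collision-resistant fingerprint of a row/column vector of index
    values (None marks an unset cell). Equal vectors hash equally;
    distinct vectors collide with negligible probability."""
    data = ",".join("-" if v is None else str(v) for v in vector)
    return hashlib.sha1(data.encode()).digest()
```

In a deduplication pass, the fingerprints are computed once per row/column and then compared (or used as dictionary keys), turning each vector comparison into a constant-time operation.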
4.2 Compression in SRAM Space
In the FIST structure, the TD-table and the mapping table reside in SRAM. Compared with the TD-table, the mapping table commonly occupies a fixed and small storage space. Thus, we take the TD-table as the dominant factor of SRAM storage.
To be storage efficient, we try to minimize the TD-table. We formulate the problem as follows.
Problem 2.
Optimal TD-table Compression: for $(D, S, f_D, f_S, C)$, find an equivalent forwarding table $(D', S', f_{D'}, f_{S'}, C')$ such that the storage space of the TD-table, i.e., its number of cells, is minimized.
Theorem 5.
Finding the optimal compressed TD-table is NP-complete.
Proof.
It is obvious that the decision problem of validating a given TD-table is solvable in polynomial time; therefore, the optimal TD-table compression problem is in NP. To show the problem is NP-hard, we reduce the lossless data compression problem, which is known to be NP-complete [33], to it.
The lossless data compression problem is: given a string, find the minimal-length compressed form of the string. Here we extend the original problem to the two dimensional case (we call it the two dimensional data compression problem), such that the input string can be a two dimensional string. The two dimensional data compression problem is also NP-complete, as one dimensional data compression is a special case of it. Note that the two dimensional data compression problem is not equal to the optimal TD-table compression problem, as rows (columns) can be reordered in a TD-table.
Let $OPT(T)$ be the storage size of the optimal compressed form of a TD-table $T$. Given a two dimensional string $A$, we construct a TD-table as shown in Figure 4. The TD-table is composed of four sub-TD-tables, $A$, $B$, $C$ and $D$. Each sub-TD-table is independent from the others, i.e., each is expressed in a separate symbol system. $A$ represents the two dimensional string. In $B$, each column is independent from the other columns, and $B$ itself is its only optimal column arrangement, i.e., any column permutation strictly increases its compressed size. In $C$, each row is independent from the other rows, and $C$ itself is its only optimal row arrangement. $D$ is an optimally compressed sub-TD-table.
Then, we show that by finding an optimal compressed TD-table, we can find a lossless compressed two dimensional string. This is because any permutation of the rows of sub-TD-table $A$ forces the same permutation on the rows of $B$, and any permutation of the columns of $A$ forces the same permutation on the columns of $C$, both of which strictly increase the total size. So permutation of $A$'s rows or columns will never lead to the optimal compressed TD-table. Thus, if we find the optimal compressed TD-table, we can find the lossless compressed data by picking up the sub-table in the top left corner of the optimal compressed TD-table.
∎
4.2.1 Eliminating Duplicated Rows/Columns
Observation 2.
If row $C(i, \cdot) = C(i', \cdot)$ (or column $C(\cdot, j) = C(\cdot, j')$), we can merge rows $i$ and $i'$ (or columns $j$ and $j'$) by re-pointing every destination (or source) prefix mapped to $i'$ (or $j'$) to $i$ (or $j$).
The observation is true because FIST points to the index values indirectly, through row (or column) numbers. If the two rows (or columns) pointed to by two destination (or source) prefixes are the same, we can eliminate one of them by making both prefixes point to the same row (or column).
Based on this observation, we can eliminate the duplicated rows and columns within the FIST structure. The complexity of this process is $O(n^2 m + m^2 n)$, due to the pairwise comparison of rows and columns. With fingerprints, the complexity can be reduced to $O(nm)$. In practice, this process can be combined with CompTCAM() to reduce computation time.
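Duplicate-row elimination per Observation 2 might look like the following sketch (our own helper, using tuples as dictionary keys in place of fingerprints; columns are handled symmetrically):

```python
def dedup_rows(dst_table, td):
    """Merge identical TD-table rows: destination prefixes whose rows
    have the same contents are re-pointed to a single shared row.
    Returns the new prefix->row mapping and the compacted TD-table."""
    canonical = {}            # row contents -> kept row number
    new_map, new_td = {}, []
    for dp, row in dst_table.items():
        key = tuple(td[row])
        if key not in canonical:
            canonical[key] = len(new_td)
            new_td.append(list(key))
        new_map[dp] = canonical[key]
    return new_map, new_td
```

Every lookup still reaches the same cell contents, because only the indirection (prefix to row number) changes, not the cells themselves.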
Theorem 6.
Eliminating the duplicated rows and columns computes the optimal TD-table compression.
Proof.
We first prove equivalence. Without loss of generality, we eliminate rows first. Merging two identical rows does not change the cell contents reached by any (destination, source) prefix pair, so the table after eliminating duplicated rows is equivalent to the original; the same argument holds for the subsequent elimination of duplicated columns. Thus the resulting table is equivalent to the original one.
Then we prove, by contradiction, that the resulting table is minimal. Assume there exists an equivalent table with fewer rows. Because the resulting table has no duplicated rows, by the pigeonhole principle, two destination prefixes pointing to two distinct rows in the resulting table must point to the same row in the smaller table. Since the two distinct rows differ in at least one cell, there exists a source prefix such that the corresponding (destination, source) pair matches a different index value, contradicting equivalence. The function of this step is the same as computing the number of distinct rows and columns of a matrix. ∎
4.2.2 Fixed Block Deduplication
After eliminating the duplicated rows/columns, there still exists duplicated data in the TD-table; for example, part of a row may be the same as part of another row. To further compress the TD-table, we apply fixed block deduplication, a common technique for data deduplication [26].
Fixed block deduplication was previously used to eliminate redundancies in data storage (e.g., file systems). It breaks a file into fixed-length chunks, identifies redundant chunks, eliminates all but one copy, and creates logical pointers to these chunks so that users can access them as needed [12].
Our basic idea is to cut the rows of the TD-table into fixed-width chunks, called narrow rows, i.e., rows that are shorter than the original rows of the TD-table. We then eliminate all duplicated narrow rows, so that only one copy of each narrow row is preserved.
As shown in Figure 5, after deduplicating, we store the narrow rows in a narrow TD-table. Each entry of the narrow TD-table is an indexed next hop, the same as a cell of the TD-table. We also set up a catalog table, each entry of which points to a row number in the narrow TD-table; the catalog table maps the narrow rows of the TD-table to the narrow TD-table. For example, in Figure 6, the TD-table derived from the example in Figure 1 can be deduplicated into a narrow TD-table combined with a catalog table: the narrow row in the solid circle and the narrow row in the dashed circle are each transformed into a row of the narrow TD-table.
We also show the deduplication process in Figure 5. We scan the TD-table and extract all narrow rows from it. For each narrow row, we first compute its fingerprint using the SHA-1 function. With a bloom filter [6], we can judge whether the narrow row is a duplicate. If it is not, we insert the narrow row into the narrow TD-table and fill the entry in the catalog table. If it is, we search a data structure called the narrow row index, which organizes all detected ⟨fingerprint, narrow row number⟩ pairs, and fill the found narrow row number into the corresponding position of the catalog table.
After adopting the deduplicated storage structure, the lookup process has to be updated. Let $w$ be the width (number of cells) of a narrow row. We show the changed part of the new lookup process in Figure 7, which replaces the TD-table lookup step in Figure 2 to form the whole picture of the new lookup process. After obtaining the row and column addresses, the router finds an entry in the catalog table, which returns a new row address in the narrow TD-table. Using the new row address and the original column address, the router looks up the narrow TD-table and returns an index value. Because of the random access property of the fixed block deduplication method, the new lookup process still achieves constant lookup time. Although it adds one more SRAM lookup, the influence on lookup speed is trivial, especially within the pipelined lookup model.
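The narrow TD-table, catalog table, and modified lookup can be sketched as follows. The bloom filter and fingerprint machinery are elided here; a plain dictionary stands in for the narrow row index, and all names are ours:

```python
def deduplicate(td, w):
    """Fixed block deduplication: cut each TD-table row into narrow rows
    of width w, keep one copy of each distinct narrow row, and record
    its position in the catalog table."""
    narrow_td, index, catalog = [], {}, []
    for row in td:
        cat_row = []
        for start in range(0, len(row), w):
            chunk = tuple(row[start:start + w])
            if chunk not in index:               # first occurrence
                index[chunk] = len(narrow_td)
                narrow_td.append(list(chunk))
            cat_row.append(index[chunk])         # pointer to shared copy
        catalog.append(cat_row)
    return narrow_td, catalog

def lookup(narrow_td, catalog, i, j, w):
    """New lookup step: the catalog maps (row, chunk) to a narrow row,
    and the within-chunk offset selects the cell."""
    return narrow_td[catalog[i][j // w]][j % w]
```

The extra indirection is a single additional SRAM read per lookup, and every cell remains reachable by direct indexing, so lookup time stays constant.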
5 FIST Update
Although TDSaturation() guarantees the correctness of FIST, it requires recomputing all conflicted cells, and rewriting them in SRAM, whenever an update happens. Note that although the update is necessary, not all cells need to be cleared and rewritten in this process. In this section, our objective is to minimize the number of cell updates. We use a function $T$ to denote the TD-table, and let $T(i, j)$ be the index value of cell $(i, j)$.
Problem 3.
Optimal transformation: given a TD-table $T$ and an update, find a new TD-table $T'$ such that the number of cells with $T'(i, j) \neq T(i, j)$ is minimized.
To achieve this, we first build a data structure called the colored tree to organize the cells. With this colored tree, we develop algorithms for insertion and deletion in which only part of the cells are updated. We then prove that our algorithms indeed minimize the computation cost and the number of cell rewrites.
5.1 A Colored Tree Structure
Each destination prefix has a colored tree. The tree is constructed from all source prefixes in the source table. There are black nodes and white nodes. Intrinsically, black nodes represent the cells that are directly set, and white nodes represent the conflicted cells, i.e., the cells that are not directly set, but are filled up by algorithms such as TDSaturation().
Let $T_{dp}$ denote the colored tree for destination prefix $dp$, $B(T_{dp})$ the set of its black nodes, and $W(T_{dp})$ the set of its white nodes. For example, Figure 9 shows the colored tree for destination prefix 101*, together with its black and white node sets.
To compute the optimal transformation upon an update, we first define the domain of a black node in a colored tree. Formally,
Definition 3.
In a colored tree, the domain of a black node $sp$ is the set of nodes $\{sp'\}$, where $sp'$ satisfies: 1) $sp$ is a prefix of $sp'$; 2) there does not exist another black node $sp''$ such that $sp$ is a prefix of $sp''$ and $sp''$ is a prefix of $sp'$.
For example, in Figure 9, the domain of the black node **** consists of **** itself and the white nodes below it, down to (but excluding) the first black node on each branch. Intuitively, the domain of a black node is the largest subtree that is rooted at the node itself and contains no other black node.
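Under an explicit tree representation, the domain of Definition 3 can be computed with a simple traversal. This is our own sketch: `tree` maps each node to its children, `black` is the set of black nodes, and node labels are arbitrary:

```python
def domain(tree, black, node):
    """Domain of a black node: the node itself plus all descendants
    reachable without passing through another black node."""
    assert node in black
    dom, stack = set(), [node]
    while stack:
        n = stack.pop()
        dom.add(n)
        # Descend only into white children; a black child starts
        # its own domain and is excluded together with its subtree.
        stack += [c for c in tree.get(n, []) if c not in black]
    return dom
```

The traversal touches each node of the answer once, so computing a domain is linear in the domain's size.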
Theorem 7.
When updating a rule $(dp, sp, a)$, the cells corresponding to the domain of $sp$ in the colored tree of $dp$ form the minimum cell set that should be changed, and all cells in it should be set to the index value of $a$.
Proof.
We prove the theorem by contradiction. Assume there exists a smaller cell set, meaning that the index value of some cell $(dp, sp')$, where $sp'$ is in the domain of $sp$, is not set to the index value of $a$. Consider a packet that matches $dp$ and $sp'$ within FIST: by the definition of the domain, it should match the rule $(dp, sp, a)$, yet the cell is set with a wrong index value. ∎
From Theorem 7, we can see that, through computing the domain, we can compute the optimal transformation when an update arrives. In the next subsection, we show two update algorithms that compute the optimal transformation.
5.2 FIST Update Algorithms
Here, we define the insertion action Insert() and the deletion action Delete(). The update action Update() can be seen as a deletion followed by an insertion.
Before illustrating these algorithms, we introduce a lemma that simplifies the updating process.
Lemma 1.
If $sp'$ is the parent of $sp$ in the colored tree of $dp$, and $sp$ is a white node, then cells $(dp, sp')$ and $(dp, sp)$ have the same index value.
Proof.
If $sp'$ is white, then $sp$ and $sp'$ belong to the domain of the same black node. If $sp'$ is black, then $sp$ belongs to the domain of $sp'$. Thus, according to Theorem 7, the lemma is proved. ∎
Algorithm Insert(d, s, nh) inserts a rule given the TDtable. If d (or s) is not in the destination (or source) table, the router should assign an unused row (or column). When a column is assigned, we first find p, the parent of s in the source tree. According to Lemma 1, C(d', s) and C(d', p) have the same index value for every destination prefix d'. Thus, we copy the column corresponding to p to the column corresponding to s. After this initialization, by computing the domain D(s), we find the cells that should be changed. Finally, d (or s) is inserted into the destination (or source) table if it does not already exist.
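The core of the insertion can be sketched as follows. This is a simplified model with hypothetical names: the TDtable is a dict of rows, row/column allocation and the column-copy initialization of Lemma 1 are omitted, and prefixes are bit strings with '' as the full wildcard.

```python
def insert(table, black, nodes, d, s, idx):
    """Sketch of Insert(d, s, nh): per Theorem 7, set cell C(d, s') to the
    index value idx of nh for every s' in the domain D(s) of s in d's
    colored tree."""
    row = table.setdefault(d, {})
    bl = black.setdefault(d, set())
    bl.add(s)  # s becomes a black node in the colored tree of d
    for n in nodes:
        if not n.startswith(s):
            continue  # n is not in the subtree of s
        # skip n if another black node lies between s and n (it shadows n)
        if any(x != s and x.startswith(s) and n.startswith(x) for x in bl):
            continue
        row[n] = idx

# Source prefixes as bit strings; '' is the full wildcard
nodes = {'', '1', '10', '11', '111'}
table, black = {}, {}
insert(table, black, nodes, 'd1', '', 0)    # default rule fills the whole row
insert(table, black, nodes, 'd1', '11', 1)  # overrides only the domain of 11
print(table['d1'])
```

The second insertion touches only the two cells in the domain of 11, which is exactly the minimality claim of Theorem 7.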
Algorithm Delete(d, s) deletes the rule associated with d and s given the TDtable. First, the black node s in the colored tree T_d is recolored white, so the nodes in D(s) now belong to a new domain, namely that of the nearest black ancestor of s. For example, in Fig. 10, after deleting a rule, the corresponding node is recolored white in the colored tree, and the nodes that belonged to its domain before the deletion now belong to the domain of its nearest black ancestor; their cells are reset to the index value of that ancestor. After the deletion, if no rule is associated with d (or s) any more, we delete it from the destination (or source) table, and then reclaim the row (or column) resources.
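The deletion step can be sketched in the same simplified model (hypothetical names; the TDtable is a dict of rows, prefixes are bit strings, '' is the full wildcard):

```python
def delete(table, black, nodes, d, s):
    """Sketch of Delete(d, s): recolor s white; cells in its former domain
    inherit the index value of s's nearest remaining black ancestor, or
    become invalid if none exists (the colored-forest case of Section 6.1)."""
    bl = black[d]
    bl.discard(s)
    # nearest black ancestor = longest remaining black prefix of s
    anc = max((x for x in bl if s.startswith(x)), key=len, default=None)
    row = table[d]
    for n in nodes:
        if not n.startswith(s):
            continue
        if any(x.startswith(s) and n.startswith(x) for x in bl):
            continue  # n is still covered by a black node at or below s
        if anc is None:
            row.pop(n, None)  # invalid cell: no index value
        else:
            row[n] = row[anc]

nodes = {'', '1', '10', '11', '111'}
table = {'d1': {'': 0, '1': 0, '10': 0, '11': 1, '111': 1}}
black = {'d1': {'', '11'}}
delete(table, black, nodes, 'd1', '11')
print(table['d1'])  # the former domain of 11 falls back to the root's index
```

Only the cells of the former domain of the deleted node are rewritten, matching the optimality argument of Theorem 8.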
Theorem 8.
Insert(d, s, nh) and Delete(d, s) compute the optimal transformation.
Proof.
The theorem is an immediate result of Theorem 7. ∎
Thus, the update action causes minimum computation cost, and brings the fewest accesses to the TDtable in SRAM. Besides, with the prevalence of dualport SRAM, by reading through one port and writing through the other (current dualport SRAM can resolve the readwrite collision, i.e., a read during a write operation at the same cell [9]), updates of the TDtable do not have to lock the lookup process. We can also prove that the update action of FIST is consistent, i.e., for each rule insertion or deletion, a packet can only match the rule that would be matched before or after the insertion or deletion [38]. Due to the page limit, we omit the proof here.
6 Practical Considerations
6.1 Reducing Update Burden on TDtable
Although a TDtable update will not influence the lookup process, a single rule insertion/deletion may still cause many write operations to SRAM. An update, in the worst case, requires updating all cells in a row of the TDtable. For example, if we update the full-wildcard rule of a destination prefix in Figure 1, then all cells in the corresponding row must be updated with the new index value.
When the source table is very large, this may exceed the ability of SRAM to handle these updates. For example, if there are 10,000 source prefixes, and the network produces over 500 updates per second on the default next hops of different destination prefixes, then in the worst case there will be over 5 million write operations per second into the TDtable, which exceeds the maximum clock rate of SRAM.
The main reason for the large number of update operations on the TDtable is that the default next hop of each destination prefix is stored as the full wildcard in the source table. First, the full wildcard resides at the root node of the source tree; once updated, it causes a large number of subsequent cell updates. Second, the default next hop changes frequently, because it has to change whenever the connectivity information of the corresponding destination prefix changes.
Thus, we propose to isolate the default next hop from the source table, i.e., it is not stored in the source table. Rather than being matched when the full wildcard is hit in the source table, the default next hop is matched when no entry in the source table is matched. In Section 7, we will illustrate this in detail.
After removing the full wildcard from the source table, we expect the update frequency of the TDtable to be low, based on the following two facts: 1) the update of nonconnectivity rules is slow, i.e., they do not have to respond instantly to changes of the network topology; 2) most prefixes in current forwarding tables are near the leaf nodes of prefix trees [3], indicating that we only need to update a few cells during most rule updates.
After removing the full wildcard, the source tree may be divided into a source forest, which has a similar definition to the source tree except for its forest structure. For example, in Fig. 12, we show the source forest after removing the full wildcard. We can also define the colored forest and redefine the domain in a similar way.
However, unlike in the colored tree, where each node has at least one black ancestor because the root node is black, in the colored forest a node may have no black ancestor. Thus, in the colored forest, a white node may not belong to the domain of any black node. For example, in Fig. 12, the shaded white nodes 100* and 101* do not belong to the domain of any black node. For a white node s that does not belong to the domain of any black node, the cell C(d, s) is invalid, i.e., the cell does not have any index value and should not be matched. Fig. 12 shows the TDtable after removing the full wildcard from the source table.
After that, we can revise the update actions, including insertion and deletion, by replacing "tree" with "forest".
7 Implementation
As a proof of concept, we implement the FIST forwarding table structure on a commercial router, BitEngine 12004, which supports four linecards. Each linecard carries a CPU board (BitWay CPU8240, working at 100MHz), two TCAM chips (IDT 74K62100, accommodating at most 512K IPv4 entries), an FPGA chip (Altera EP1S25780), and several cascaded SRAM chips (IDT 71T75602) associated with the TCAM chips. The FPGA chip also contains internal SRAM memory.
Our implementation is based on existing hardware, and does not need any new hardware. We redesigned the hardware logic by rewriting about 1,500 lines of VHDL code (not including C code) of the original destinationbased version.
7.1 Router Framework
In Fig. 13, we show the framework of our router design. The major changes are in the data plane. In the data plane, the FPGA receives packets from the interface module, extracts the packet header, and queries the TCAM module. Due to resource limits, we place the destination and source tables in different blocks of one TCAM, and the FPGA queries the TCAM module twice (first the destination table, then the source table) to access the two tables. Although this increases the delay per lookup in our implementation, many processors (e.g., NetLogic NL10K) now support two lookups in parallel, so this will not become the bottleneck of the lookup process in the future.
The TCAM module outputs the matched prefix, and through the TCAM-associated SRAM, the FPGA gets the matched result, i.e., the row or column address of the matched prefix. The FPGA then computes the address of the cell in the TDtable, which resides in a block of internal SRAM of the FPGA. After getting the index, the FPGA accesses the mapping table, which resides in another block of the FPGA's SRAM. The FPGA then obtains the next hop information, and delivers the packet to the next processing module, the switch coprocess module, which switches the packet to the right interface.
We also design control interfaces for the control plane to access and update the forwarding table. In the control plane, we store destination prefixes in one patricia trie [17] and source prefixes in another. We store the row and column addresses in the nodes of each patricia trie, and also store each rule in a two dimensional array.
7.2 A Scalable FIST Design
We implemented the FIST structure, as shown in Figure 1, on the linecard. Besides, for better scalability, we incorporated the improvements described in Sections 3.2.3 and 6.1, such that FIST can accommodate more destination/source prefixes and allow more frequent updates. With these improvements, the format of the SRAM units pointed to by the source table remains the same, i.e., they store only the column address. However, the format of the SRAM units pointed to by the destination table changes: 1) each unit has an indicator bit, which is set only if there is a row in the TDtable for the corresponding destination prefix, so that we can reduce the SRAM space of the TDtable (see Section 3.2.3); 2) each unit stores the index value of the default next hop for the corresponding destination prefix, so that the update burden on the TDtable can be reduced (see Section 6.1).
With the new structure, the lookup process also changes. After TCAM matching in the destination and source tables and obtaining the SRAM units corresponding to the matched prefixes, the router checks the indicator bit. If the indicator bit is unset, the router gets the index value of the default next hop directly. Else, if no source prefix is matched, the router also gets the index value of the default next hop. Else, if a source prefix is matched, the router accesses the cell C(d, s) in the TDtable, where d and s are the matched destination and source prefixes. If the cell is invalid, the router gets the index value of the default next hop; otherwise, the router gets the index value of the cell C(d, s). Using the obtained index value, the router looks up the mapping table and gets the next hop information. We show the new lookup process in Figure 16. Note that, compared to the original lookup process in Figure 2, all new steps are processed in the CPU, indicating that there are no additional accesses to TCAM or SRAM.
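The decision chain above can be sketched in a few lines. This is a model with hypothetical encodings, not the actual VHDL: the destination-table SRAM unit is a tuple (indicator_bit, default_idx, row), the source match is a column number or None, and the TDtable is a dict in which a missing key stands for an invalid cell.

```python
def lookup(dst_entry, src_col, td_table, mapping):
    """Sketch of the revised FIST lookup (Figure 16): fall back to the
    default next hop when the indicator bit is unset, no source prefix
    matches, or the TDtable cell is invalid."""
    indicator, default_idx, row = dst_entry
    if not indicator or src_col is None:
        idx = default_idx  # no TDtable row, or no source prefix matched
    else:
        idx = td_table.get((row, src_col), default_idx)  # invalid cell -> default
    return mapping[idx]

mapping = {0: 'nh_default', 1: 'nh_policy'}
td = {(0, 2): 1}
print(lookup((1, 0, 0), 2, td, mapping))     # 'nh_policy'
print(lookup((1, 0, 0), 5, td, mapping))     # invalid cell -> 'nh_default'
print(lookup((0, 0, None), 7, td, mapping))  # indicator unset -> 'nh_default'
```

All three fallback branches return the per-destination default index stored in the destination-table unit, so no extra TCAM or SRAM access is needed.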
7.3 Fixed Block Deduplication
We use a Bloom filter to accelerate the deduplication process. The Bloom filter maintains a summary vector [41], which is a vector of bits.
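A minimal summary-vector sketch, under assumed parameters (the vector size m, hash count k, and the hashing scheme are illustrative, not taken from the paper):

```python
import hashlib

class SummaryVector:
    """Minimal Bloom-filter summary vector: m bits, k hash positions per item.
    maybe_contains() has no false negatives, only rare false positives."""
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, 0
    def _positions(self, item):
        # derive k bit positions from salted SHA-256 digests
        for i in range(self.k):
            h = hashlib.sha256(f'{i}:{item}'.encode()).digest()
            yield int.from_bytes(h[:8], 'big') % self.m
    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p
    def maybe_contains(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

sv = SummaryVector()
sv.add('block-A')
print(sv.maybe_contains('block-A'))  # True
print(sv.maybe_contains('block-B'))  # almost surely False
```

Before writing a block, the deduplication process can first query the summary vector; only on a (possible) hit does it need to consult the full index.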
8 Evaluation
8.1 Evaluation Setup
In Figure 17, we show the connection diagram of the evaluation environment. There are three components: a PC host (in this paper, the CPU of the PC host is an Intel Core2 Duo T6570) that acts as the control plane of a router; a 4GE linecard that is equipped with both the ACLlike and the FIST forwarding table structures; and a traffic generator (IXIA 1600). The linecard is connected to the traffic generator with optical fibers, and to the PC host with a serial cable. The traffic generator sends minimum-size packets of 64 bytes (including the 18-byte Ethernet header) at full speed, i.e., 4Gbps. The linecard receives the packets, looks them up in the forwarding table, and sends them back to the traffic generator. The traffic generator summarizes the sending and receiving rates.
We control the forwarding table from the PC host through the serial cable. We update the forwarding table using the update algorithms of Section 5.2, through the predefined interfaces for hardware access on the PC host. We test updates at different frequencies, i.e., 100, 1,000, and 10,000 updates per second. The TCAM memory is organized according to the Lalgorithm [32], i.e., prefixes of the same length are clustered together and there is free space between different clusters, to guarantee fast updates in TCAM. We initially preallocate 1,000 positions for each prefix cluster of each length.
8.2 Data Sets
To evaluate our FIST structure, we consider two scenarios, and generate forwarding table data sets and update sequence data sets for these scenarios.
8.2.1 Policy Routing in CERNET2
CERNET2 has two international exchange centers connecting to the Internet: Beijing (CNGI6IX) and Shanghai (CNGISHIX). However, during operation, we found that CNGI6IX was heavily congested, with an average throughput of 1.18Gbps in February 2011, while CNGISHIX was much more lightly loaded, with a maximum throughput of 8.3Mbps at the same time. We want to move the outgoing international traffic of three universities, i.e., THU (in Beijing, with 38 prefixes), HUST (in Wuhan, with 18 prefixes) and SCUT (in Guangzhou, with 28 prefixes), to CNGISHIX (the Shanghai portal).
In this scenario, we collect the prefix and FIB information from CERNET2. There are 6,973 prefixes in the FIB of CERNET2, among which 6,406 are foreign prefixes. We construct three policy forwarding tables on three routers, i.e., Beijing, Wuhan and Guangzhou (we call these forwarding tables PRBJ, PRWH and PRGZ).
To obtain the update sequence, we set the initial two dimensional rule set to be empty, and add all rules into the forwarding table at some time point. We generate the update sequence on the router of Wuhan in this way, to simulate a common scenario where an ISP decides to carry out a policy at some time point. We show the number of rules in each forwarding table, and the number of updates in each update sequence, in Table 1(a).
8.2.2 Load Balancing in CERNET2
To further balance the load between CNGI6IX and CNGISHIX, we need a more dynamic load balancing mechanism in the future. We collected about one terabyte of traffic data during one month (January 2012) from three routers (i.e., Beijing, Shanghai and Wuhan) using NetFlow. In Figure 19 (the Yaxis has been anonymized), we show the bandwidth utilization of both CNGI6IX and CNGISHIX during the month. We can see that CNGI6IX is much more congested than CNGISHIX.
We first process the traffic data, such that each outgoing international microflow, identified by its source and destination addresses, is aggregated into a macroflow, identified by a source prefix and a destination prefix (here, we use the LMF rule for aggregation). Then we redistribute each macroflow to a different exchange center, such that the load is optimally balanced. The problem can be reduced to the multiprocessor scheduling problem [5] and is NPhard. To solve it, we use the greedy firstfit algorithm, which assigns each macroflow to the exchange center with the least utilization, and achieves an approximation factor of 2.
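The greedy first-fit assignment can be sketched as follows; the macroflow bandwidths are hypothetical, and only the assignment rule matches the description above.

```python
def first_fit(flow_sizes, n_centers=2):
    """Greedy first-fit: assign each macroflow, in arrival order, to the
    exchange center with the least current utilization (a 2-approximation
    of multiprocessor scheduling)."""
    loads = [0.0] * n_centers
    assignment = []
    for size in flow_sizes:
        c = min(range(n_centers), key=loads.__getitem__)  # least-loaded center
        loads[c] += size
        assignment.append(c)
    return loads, assignment

# Hypothetical macroflow bandwidths (e.g., in Mbps)
loads, assignment = first_fit([7, 5, 4, 3, 1])
print(loads)  # [10.0, 10.0]
```

In this toy case the two centers end up perfectly balanced; in general, the heaviest flow bounds how far the result can be from the optimum.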
We construct three load balancing forwarding tables, each at a different time point, i.e., 6:00 in the morning, 2:00 in the afternoon and 10:00 in the evening of January 15, 2012, on the router of Wuhan (we call these forwarding tables LBMO, LBAF and LBEV). We also show the size of each forwarding table in Table 1(b), in which we can see that LBEV is the largest one, because more traffic has to be moved to CNGISHIX at 10:00 at night, the peak traffic point of the day. We also generate the update sequence by computing a new load balancing scheme every hour.
8.3 Evaluation Results
8.3.1 Forwarding Table Size
We evaluate the storage space that FIST consumes for all forwarding tables, as well as the storage space after compression and after adopting the nonhomogeneous structure. As a comparison, we also use the ACLlike structure as a benchmark. In Figure 20, we show the size of each forwarding table, separated into TCAM and SRAM storage, within the different storage structures.
Trivial FIST and ACLlike Structure: In Figure 20(a), we can see that for all forwarding tables, FIST consumes only half of the TCAM space that the ACLlike structure consumes. For the data sets in the policy routing scenario, FIST costs much less TCAM storage, e.g., for PRWH, FIST consumes about 1Mb, while the ACLlike structure consumes more than 72Mb of TCAM storage space. This is because in our policy routing scenario, the forwarding table is very dense, i.e., many rules share the same destination or source prefix. FIST stores each destination or source prefix only once, while the ACLlike structure may store the same destination (or source) prefix multiple times if it is associated with multiple source (or destination) prefixes.
In Figure 20(b), we can see that for PRBJ, PRGZ and PRWH, FIST consumes less SRAM space than the ACLlike structure. However, for LBMO, LBAF and LBEV, FIST consumes more SRAM space than the ACLlike structure. This is because in the policy routing scenario, the forwarding table is much denser, and the rules are concentrated in a few source and destination prefixes, i.e., the prefixes of THU, HUST and SCUT. In the load balancing scenario, however, the rules span many destination and source prefixes, and thus the FIST structure consumes much more SRAM space.
Compression: In Figures 20(c) and 20(d), we show the TCAM and SRAM storage space consumed by FIST after compression, first by CompressDS() and then by CompressTD(). We also compress the forwarding tables in the ACLlike structure, i.e., minimize the number of rules. Note that in the ACLlike structure, we cannot further reduce the SRAM storage space after minimizing the TCAM storage space. In Figure 20(c), we can see that about 20%-30% of the TCAM storage space can be saved through compression. After CompressDS(), the TCAM storage space consumed by FIST is still much smaller than that consumed by the ACLlike structure. CompressTD() has no effect on TCAM storage, as it only modifies the row (or column) numbers of destination (or source) prefixes. In Figure 20(d), we show the SRAM storage space consumed after compression. We can see that the percentage of SRAM that can be saved by CompressDS() and by compressing the ACLlike forwarding table is similar to the percentage of TCAM that can be saved, i.e., 20%-30%. However, CompressTD() has a considerable effect on the SRAM storage space of FIST, because there is high redundancy in the TDtable. For example, on PRWH, we carry out the same policy on all source prefixes in the source table, so their corresponding columns in the TDtable can be merged.
NonHomogeneous Structure: In Figure 20(e), we show the TCAM storage space consumed by the nonhomogeneous structure. The nonhomogeneous FIST structure does not save TCAM storage space, because the nonhomogeneous structure only separates the destination table into two parts. The nonhomogeneous ACLlike structure does save TCAM storage space, especially in the load balancing scenario, because the width of a TCAM entry can be reduced after storing destination-only rules separately. However, because the width of a TCAM entry is fixed, we can only physically (instead of logically) divide the table into two parts in the ACLlike structure. In contrast, within FIST, we can flexibly and logically divide the table into two parts.
In Figure 20(f), we show the SRAM storage space consumed by the nonhomogeneous structure. With the nonhomogeneous FIST structure, SRAM storage space can be saved. For PRBJ, PRGZ and PRWH, about 7% of the SRAM space can be saved after adopting the nonhomogeneous structure, because about 7% of the destination prefixes are not foreign prefixes, and do not have to be moved. However, for LBMO, LBAF and LBEV, the SRAM space can be reduced to 3% of the SRAM space consumed by the homogeneous structure. This is because in the load balancing scenario, only the traffic of a small number of destination prefixes has to be diverted to another path. For example, in LBEV, only the traffic towards 59 destination prefixes has to be diverted. With the nonhomogeneous ACLlike structure, no SRAM storage space is saved. After adopting the nonhomogeneous structure, FIST costs less SRAM storage than the ACLlike structure.
Combining the NonHomogeneous Structure with Compression: In Figures 20(g) and 20(h), we apply both the nonhomogeneous structure and the compression techniques to all forwarding tables. The resulting tables are smaller than all previous ones. Here, we focus on the SRAM storage space, because the nonhomogeneous structure has no effect on TCAM storage space. We can see that the improvement over compression alone is small, because the TDtable is already very small and negligible compared to the other consumed SRAM storage space. However, the improvement over the nonhomogeneous structure alone is quite large, because high redundancy still exists after adopting the nonhomogeneous structure.
8.3.2 Lookup Speed and Update
Lookup Speed: In Figure 21, we show the lookup speed without updates. We can see that without updates, both the sending and receiving rates reach line speed (note that an Ethernet frame carries 8 bytes of preamble and 12 bytes of inter-frame gap, so the maximum sending rate is 4 × 64/84 ≈ 3.05 Gbps). We also examined the data traces, and found no packet loss.
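The effective line rate for minimum-size frames can be checked with a one-line calculation of the per-frame wire overhead:

```python
# Each 64-byte frame occupies 64 + 8 (preamble) + 12 (inter-frame gap)
# = 84 bytes on the wire, so only 64/84 of the 4Gbps link carries frames.
effective_gbps = 4.0 * 64 / (64 + 8 + 12)
print(round(effective_gbps, 2))  # 3.05
```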
Number of Accesses to TCAM During Update: To evaluate the update burden, i.e., the influence of updates on the lookup speed for each update sequence, we first evaluate the number of accesses to TCAM, as TCAM accesses dominate the interruption period during updates. We also compare FIST with the ACLlike structure. In Figure 22(a), we show the number of accesses to TCAM of FIST per 100 updates. We can see that PR brings only a few accesses to TCAM, because in our policy routing case, all destination prefixes already exist in TCAM, and carrying out the policy routing only requires assigning rows to the destination prefixes in the destination table, and initially inserting the source prefixes into the source table. After 243,500 updates, PR does not need any access to TCAM, because all destination (or source) prefixes already exist in the destination (or source) table. LB also introduces only a few accesses to TCAM, because many destination and source prefixes overlap at different time points, and thus do not have to be updated in the destination and source tables each time. In contrast with FIST, the ACLlike structure introduces many more accesses to TCAM, e.g., at most 15,596 accesses to TCAM per 100 updates. This is because 1) there are more rules in the forwarding table within the ACLlike structure; 2) within FIST, we only have to guarantee the order of destination/source prefixes of the same length in their respective destination/source tables, whereas within the ACLlike structure, we have to guarantee the order of (destination, source) prefix pairs of the same length (for both destination and source prefixes) in a common table.
In Figures 22(b) and 22(c), we show the lookup speed, i.e., the receiving rate at the traffic generator, of FIST at different update frequencies during 5 minutes (when the update frequency is 5,000 or 50,000 updates/sec, the update process terminates earlier). In Figure 22(b), we can see that with the FIST structure, no matter at which frequency, updates have almost no influence on lookup in the policy routing scenario. This is because PR causes only a few accesses to TCAM with each update, and brings little interruption to lookup. In Figure 22(b), we also compare the results of the FIST and ACLlike structures; we can see that with the ACLlike structure, updates have a greater influence on lookup, e.g., the receiving rate is degraded by up to about 7% when there are 50,000 updates per second.
In Figure 22(c), we can see that with the FIST structure, the receiving rate is not degraded when the update frequency is low, i.e., 500 updates per second. However, when the update frequency is high, e.g., 50,000 updates per second, the receiving rate is degraded by about 2%. This is because in our load balancing scenario, each update causes more accesses to TCAM. Even when the update frequency is 5,000 updates per second, there still exist some time points when the lookup speed is degraded. In Figure 22(c), we can see that with the ACLlike structure, even at the lowest frequency, i.e., 500 updates per second, the performance is still degraded by about 0.1%.
We conclude that our FIST structure does not introduce a high update burden on the lookup speed. In the policy routing scenario, although there may be millions of updates when ISP operators decide to carry out some policies, the updates can be completed in a short time, e.g., less than 20 seconds for 1 million updates, without influencing lookup. Besides, in most cases, policy routing does not have to be implemented in real time. In the load balancing scenario, we perform updates every hour; we show the number of updates needed per hour in Figure 23. The trend of updates per hour in Figure 23 is similar to the trend of traffic at CNGI6IX in Figure 19, because we need to move more traffic to CNGISHIX when CNGI6IX is more congested. We can see that the maximum number of updates per hour is about 1,300, which can be completed within one second without influencing lookup.
Number of Accesses to SRAM During Update: In Figure 25, we show the number of accesses to SRAM for incremental update and for TDSaturation(). We can see that in both the policy routing and load balancing scenarios, incremental update causes far fewer accesses to SRAM. This is because during each update, TDSaturation() has to reset all conflicted cells, while incremental update only has to reset the dependent cells that must be changed, which are a subset of all conflicted cells. For example, in the load balancing scenario, incremental update causes at most 600 accesses to SRAM per 100 updates, while TDSaturation() causes at most 10,814. In the policy routing scenario, incremental update causes only 100 accesses to SRAM per 100 updates, because in the forwarding table of the policy routing scenario, the source prefixes come from two universities, i.e., THU and HUST. The prefixes from THU are totally disjoint, i.e., no source prefix is a prefix of another, and the prefixes from HUST are disjoint except for two prefixes (240c::/28 and 240c:3::/32). Thus updating a cell in the TDtable causes almost no conflicted cells.
In Figure 25(b), we also show the computation time per 100 updates for both incremental update and TDSaturation(). The result is similar to that in Figure 25, because more accesses to SRAM indicate more cells that have to be computed. Thus incremental update costs much less time per update than TDSaturation().
In Figure 25, we show the number of accesses to SRAM with and without isolating the default next hop. We only consider the load balancing scenario, because policy routing is a special case where all nodes in the colored tree of any destination prefix are black, and thus isolating the default next hop has no effect. In the load balancing scenario, we randomly insert 100 updates on the default next hops of destination prefixes after each hour when load balancing is carried out. We can see that with isolation, updates on the default next hop bring no accesses to SRAM, because we only have to perform the update in the TCAM. However, without isolation, every 100 updates bring about 10,000 accesses to SRAM, because we also have to update the dependent cells in the TDtable.
9 Discussion about Scalability
We admit that the trivial FIST brings scalability issues in SRAM if both the destination and source tables are very large. The largest SRAM chip currently on the market is 144Mb (288Mb SRAM is on the roadmap of major vendors) [13]; other memory products such as RLDRAM can provide similar performance (allowing 16-byte reads with a random access time of 15ns, at densities of 576 Mbit/chip) [14]. Suppose multiple chips are used (the linecards of BitEngine 12004 support four SRAM chips), and 576Mb of storage space is available for the TDtable; if there are 10,000 destination prefixes, then the TDtable can accommodate at most 7,550 source prefixes. This is obviously impractical given the current 400,000 destination prefixes.
However, the situation can be improved, because 1) using the nonhomogeneous structure can exclude most destination prefixes from the destination table; 2) in the real world, different prefixes usually share the same policy, e.g., prefixes belonging to the same university in CERNET2 should be treated equally, and thus they can be aggregated at a coarser granularity than individual prefixes; 3) we can enforce restrictions when adding a row or column to the TDtable. Besides, we are making continuous efforts to eliminate the redundancies in the TDtable.
10 Related Work
Packet classification has been an important topic throughout the history of the Internet. With increasing demands from users and ISPs for better and more flexible services, more research works focus on higher dimensional classification [7][29]. At layer 4, multidimensional classification is a familiar topic due to security and other reasons [15][18]. At layer 3, more and more routing schemes make routing decisions based on both source and destination addresses, such as NIRA [40] and customerspecific routing [11]. In this paper, our focus is on two dimensional classification at layer 3, i.e., designing a TwoD router.
Hardwarebased, especially TCAMbased, solutions are the de facto standard for Internet routers [23]. TCAMbased solutions are limited by the capacity of TCAM [24], despite their constant lookup time. To reduce the TCAM storage space, various compression schemes have been studied [19][22]. In [34], optimal two dimensional routing table compression is studied. Most works on hardwarebased multidimensional classifiers are based on the traditional Cisco ACL structure, which is 'fat' in TCAM and 'thin' in SRAM.
In [25], a novel TCAM structure is proposed for firewalls; it moves the majority of the information from expensive TCAM to cheaper SRAM. However, it needs multiple sequential lookups in TCAM, and extends the width of TCAM entries, while TCAM chips storing forwarding tables have limited spare bits (in CERNET2, the TCAM width is set to 144 bits, and only 16 bits are spare). Thus it is not suitable for our TwoD router design.
TCAMbased solutions need multiple accesses to memory during an update [21]. Nowadays, the update frequency can reach tens of thousands per second [28], which seriously impedes lookup. To solve this problem, [38][27] propose to keep the classification table lockfree, i.e., lookups are not interrupted by updates. In this paper, we borrow their ideas in designing our update scheme.
11 Conclusion
In this paper, we put forward a new forwarding table structure called FIST for TwoD routers, where forwarding decisions are based on both destination and source addresses. Our focus is to accommodate the increasing number of rules in TwoD routers, which is also a practical concern of CERNET2 in deploying TwoDIP routing. By making a novel separation between TCAM and SRAM, FIST significantly reduces the scarce TCAM storage space while keeping fast lookup speed.
FIST stores destination and source prefixes in two separate TCAM tables. By combining the matching results of the two tables, we can find the next hop information for an arriving packet. Through precomputation, we can resolve potential conflicts. By proposing a new data structure called the colored tree, we designed an incremental update algorithm that minimizes the computation complexity and the number of accesses to memory.
We implemented the TwoD router with FIST on the linecard of a commercial router. Our design is incremental, and does not need any new devices. We also made comprehensive evaluations with the real design and data sets from CERNET2. The results show that FIST greatly reduces the TCAM storage space, and does not increase the SRAM storage space in our scenarios.
References
 [1] (2005) Scalable packet classification. IEEE/ACM Trans. Netw. 13 (1), pp. 2–14. Cited by: §1.
 [2] (2006) Fast packet classification for twodimensional conflictfree filters. Comput. Netw. 50 (11), pp. 1831–1842. Cited by: §1.
 [3] (2005) Fast incremental updates for pipelined forwarding engines. IEEE/ACM Trans. Netw. 13, pp. 690–703. Cited by: §6.1.
 [4] BGP routing table analysis reports. Note: http://bgp.potaroo.net Cited by: §2.
 [5] (1986) Scheduling multiprocessor tasks to minimize schedule length. IEEE Trans. Comput. 35 (5), pp. 389–393. Cited by: §8.2.2.
 [6] (2003) Network applications of bloom filters: a survey. Internet Mathematics, pp. 636–646. Cited by: §4.2.2.
 [7] (2009) Efficient multidimensional packet classification with fast updates. Computers, IEEE Transactions on 58 (4), pp. 463 –479. Cited by: §10.
 [8] (201009) Source flow: handling millions of flows on flowbased nodes. In Proc. ACM SIGCOMM’10, New Delhi, India. Cited by: §3.2.1.
 [9] Cyclone handbook. Note: www.altera.com/literature/hb/cyc/cyc_c51007.pdf Cited by: footnote 1.
 [10] (1999-03) Constructing optimal IP routing tables. In Proc. IEEE INFOCOM’99, New York, NY. Cited by: §4.1.
 [11] (2008-12) Efficient IP-address lookup with a shared forwarding table for multiple virtual routers. In Proc. ACM CoNEXT’08, Madrid, Spain. Cited by: §10.
 [12] (2008) Reducing the storage burden via data deduplication. Computer 41 (12), pp. 15–17. Cited by: §4.2.2.
 [13] (2009) Towards 100G packet processing: challenges and technologies. Bell Lab. Tech. J. 14 (2), pp. 57–79. Cited by: §9.
 [14] (2006) Routing tables: is smaller really much better?. BT Technology Journal 24, pp. 119–129. Cited by: §9.
 [15] (2008-12) Internet traffic classification demystified: myths, caveats, and the best practices. In Proc. ACM CoNEXT’08, Madrid, Spain. Cited by: §1, §10.
 [16] (2009-06) A hybrid IP forwarding engine with high performance and low power. In Proc. ICCSA’09, Seoul, Korea. Cited by: §3.3.
 [17] (1998) The art of computer programming, volume 3: sorting and searching (2nd ed.). Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA. External Links: ISBN 0201896850 Cited by: §7.1.
 [18] (2011) NeTraMark: a network traffic classification benchmark. SIGCOMM Comput. Commun. Rev. 41 (1), pp. 22–30. Cited by: §10.
 [19] (2010) TCAM razor: a systematic approach towards minimizing packet classifiers in TCAMs. IEEE/ACM Trans. Netw. 18 (2), pp. 490–500. Cited by: §10, §3.2.1.
 [20] (2005) Conflict detection and resolution in two-dimensional prefix router tables. IEEE/ACM Trans. Netw. 13 (6), pp. 1353–1363. Cited by: §1, §3.1.
 [21] (2012-03) A hybrid IP lookup architecture with fast updates. In Proc. IEEE INFOCOM’12, Orlando, FL. Cited by: §10.
 [22] (2009-10) Bit weaving: a non-prefix approach to compressing packet classifiers in TCAMs. In Proc. IEEE ICNP’09, Orlando, Florida. Cited by: §10.
 [23] (2010) Hardware based packet classification for high speed internet routers. Springer. Cited by: §1, §10.
 [24] (2011-10) Split: optimizing space, power, and throughput for TCAM-based classification. In Proc. ACM/IEEE ANCS’11, Brooklyn, NY. Cited by: §1, §10.
 [25] (2010-08) Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems. In Proc. USENIX Security’10, Washington, DC. Cited by: §10.
 [26] (2012) A study of practical deduplication. Trans. Storage 7 (4), pp. 14:1–14:20. Cited by: §4.2.2.
 [27] (2011-07) PC-DUOS: fast TCAM lookup and update for packet classifiers. In Proc. IEEE ISCC’11, Kerkyra, Greece. Cited by: §10.
 [28] (2010-06) DUOS - simple dual TCAM architecture for routing tables with incremental update. In Proc. IEEE ISCC’10, Riccione, Italy. Cited by: §10.
 [29] (2009-04) Packet classification algorithms: from theory to practice. In Proc. IEEE INFOCOM’09, Rio de Janeiro, Brazil. Cited by: §10.
 [30] (2002-01) Venti: a new approach to archival data storage. In Proc. USENIX FAST’02, Monterey, CA. Cited by: §4.1.
 [31] Router FIB technology. Note: http://www.firstpr.com.au/ip/sramipforwarding/routerfib/ Cited by: §3.2.1.
 [32] (2001) Fast updating algorithms for TCAMs. IEEE Micro 21 (1), pp. 36–47. Cited by: §8.1.
 [33] (1982) Data compression via textual substitution. Journal of the ACM 29 (4), pp. 928–951. Cited by: §4.2.
 [34] (2003) Compressing two-dimensional routing tables. Algorithmica 35, pp. 287–300. Cited by: §1, §10.
 [35] (2010-08) EffiCuts: optimizing packet classification for memory and throughput. In Proc. ACM SIGCOMM’10, New Delhi, India. Cited by: §1.
 [36] (2005) Network algorithmics: an interdisciplinary approach to designing fast networked devices. Morgan Kaufmann, Waltham, MA. Cited by: §1.
 [37] (2009) Scalable packet classification with controlled crossproducting. Computer Networks 53 (6), pp. 821–834. Cited by: §1.
 [38] (2004) CoPTUA: consistent policy table update algorithm for TCAM without locking. IEEE Trans. Comput. 53 (12), pp. 1602–1614. Cited by: §10, §5.2.
 [39] (2012-03) Two dimensional IP routing architecture. Note: Internet Draft draft-xu-rtgwg-twod-ip-routing-00.txt Cited by: §1.
 [40] (2007) NIRA: a new interdomain routing architecture. IEEE/ACM Trans. Netw. Cited by: §10.
 [41] (2008-02) Avoiding the disk bottleneck in the data domain deduplication file system. In Proc. USENIX FAST’08, San Jose, California. Cited by: §4.1, §7.3.