Performance evaluation of an NDN forwarder using statistical model checking

05/05/2019 ∙ by Siham Khoussi, et al.

Named Data Networking (NDN) is an emerging technology for a future Internet architecture that addresses weaknesses of the Internet Protocol (IP). Since Internet users and applications have demonstrated an ever-increasing need for high-speed packet forwarding, research groups have investigated different designs and implementations of fast NDN data-plane forwarders and claimed they were capable of achieving high throughput rates. However, the correctness of these claims is not supported by any verification technique or formal proof. In this paper, we propose a formal model-based approach to overcome this issue. We consider the NDN-DPDK prototype forwarder implemented at NIST, which leverages concurrency to enhance overall quality of service. We use our approach to improve its design and to formally show that it can achieve high throughput rates.


1 Introduction

With the ever-growing number of communicating devices, their information-intensive usage and increasingly sophisticated security issues, research groups have recognized the limitations of the current network architecture based on IP [10]. Information-Centric Networking (ICN) is a new paradigm, initially proposed by V. Jacobson in 2006, that transforms the concept of the Internet from the host-centric paradigm we know today to a content-centric one, which is more appropriate to modern communication practices. It promises better security, mobility, scalability, distributed in-network caching and many other features.

Several branches grew out of the ICN concept. Examples include the content-centric architecture (CCNx), the Data Oriented Network Architecture (DONA) and many others [15], but the project that stood out most, sponsored by the National Science Foundation (NSF), is Named Data Networking (NDN) [17]. NDN is rapidly gaining popularity and has even started being advertised by major networking companies such as Cisco (https://www.networkworld.com/article/2602109/lan-wan/ucla-cisco-more-join-forces-to-replace-tcpip.html).

IP was designed during the 1960s and 70s to answer a different challenge: creating a communication network whose packets named only communication endpoints. The NDN project proposes to generalize this setting so that packets can name other objects, i.e. "NDN changes the semantics of network service from delivering the packet to a given destination address to fetching data identified by a given name. The name in an NDN packet can name anything – an endpoint, a data chunk in a movie or a book, a command to turn on some lights, etc." [17]. This simple change has deep implications for router forwarding performance, since data needs to be fetched from an initially unknown location.

Being a new concept, Named Data Networking has not yet undergone any formal verification work. The initial phase of the project was meant to produce proof-of-concept prototypes to validate the proposed architecture. This led to a plethora of implementations with very poor performance, to the point that performance became a hindrance to validation. A lot of effort was then directed at optimization, unfortunately most of the time following an ad hoc process: trying different implementations of data structures, e.g. hash maps and tries [xx], and targeting various hardware platforms, e.g. GP-GPU [xx]. Furthermore, since these efforts rely on code, validation was carried out using pure simulation and testing techniques.

In this work, we take a step back and tackle the performance problem at a higher level of abstraction. We consider a model-based approach that additionally allows for rigorous reasoning and formal verification. In particular, we rely on the SBIP framework [9, 14], which enables building formal stochastic component-based models and performing Statistical Model Checking (SMC) analysis. The framework is used alongside an iterative process that we propose as a step towards a more systematic and automated design. It consists of four phases: (1) build a parameterized functional BIP model, which does not include performance information; (2) run a corresponding implementation (obtained automatically or built manually) to collect context information and performance measurements, which are characterized in the form of probability distribution functions; (3) use these distribution functions to create a stochastic, and potentially timed, performance BIP model; and (4) use the SMC engine to verify that the obtained model satisfies the given requirements. These steps can be repeated for different instances of the model.

We apply our approach to verify that the NDN Data Plane Development Kit (NDN-DPDK) forwarder design can sustain high-speed packet forwarding rates. We investigate different design alternatives regarding concurrency (number of threads needed to achieve a goal throughput), system dimensioning (queue sizes) and deployment (mapping threads onto a multi-core platform). Using our approach, we were able to determine the best design parameters for optimal performance. These were taken into account by the NIST development team to enhance the NDN-DPDK implementation. To the best of our knowledge, this is the first time formal methods have been used in the context of the NDN project.

Outline. Section 2 introduces the NDN architecture and describes NDN-DPDK, an effort at NIST to develop a high-performance forwarder for NDN networks. Section 3 presents our approach and the framework used throughout the analysis. In Section 4, we present the design of the NDN-DPDK forwarder and the models used for analysis. The verification performed and the results obtained are presented in Section 5. The remainder summarizes the contributions, presents lessons learned and draws some future research directions.

2 Named Data Networking

This section describes the NDN protocol and introduces the NDN-DPDK forwarder, which is the subject of this study.

2.1 Overview

NDN is a candidate next-generation Internet architecture. Its core design is based exclusively on naming content rather than endpoints (IP addresses in the case of IP), and its routing is based on name prefix lookups [8].

The protocol supports three types of packets, namely Interest, Data and Nack. Only the first two are strictly necessary; the Nack lets the forwarder signal the network's inability to forward an Interest further. Interests are consumers' requests sent to the network, and Data packets are content producers' replies. One of NDN's advantages is its ability to cache content (Data) everywhere a Data packet propagates, making NDN routers stateful. Future Interests are thus no longer required to fetch the content from the source; it can instead be retrieved directly from a closer node holding the cached Data.

Packets progress through an NDN network as follows. First, a client application sends an Interest whose meaningful name prefix identifies the requested content. Names in NDN are hierarchical; for example, to stream a YouTube video called video1.mpg by a YouTuber Alex, one would request the name /YouTube/Alex/video1.mpg. The packet is then forwarded by the network nodes based on its name prefix. Finally, the Interest is satisfied by a Data packet, either from the original source that produced the content or from an intermediate router that cached it due to previous requests. It is also crucial to note that consecutive transmissions of Interests with the same name prefix do not necessarily follow the same path; depending on the forwarding strategy in place, each request may be forwarded along a different path, so the same Data can originate from different sources (producers or caches).

The NDN Forwarding Daemon (NFD) [1] maintains three data structures: the Pending Interest Table (PIT), the Content Store (CS) and the Forwarding Information Base (FIB). Packet processing is as follows (a simplified sketch follows the list below):

  1. For Interests, upon receiving an Interest the forwarder starts by querying the CS for a cached copy of the Data. If a CS match is found, the cached Data is returned downstream towards the client. Otherwise, an entry is created in the PIT recording the Interest's incoming (source) and outgoing (destination) faces. Using the PIT, the forwarder determines whether the Interest is looped in the network by checking a globally unique number carried in the Interest, called a Nonce, against previous PIT entries. If a duplicate Nonce is found, the Interest is dropped and a Nack of reason Duplicate is sent towards the requester. Otherwise, the FIB is queried for a possible next hop towards an upstream node; if there is no FIB match, the Interest is immediately dropped and answered with a Nack of reason No Route.

  2. For Data, the forwarder starts by querying the PIT. If a PIT entry is found, the Data is sent to the downstream nodes listed in the entry, a timer is armed to signal the deletion of the entry, and a copy of the Data is immediately stored in the CS for future queries. If no PIT record is found, the Data is considered malicious and is discarded.
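To make these two rules concrete, here is a minimal, illustrative Python sketch of the CS/PIT/FIB logic described above. It is our own simplification, not NFD code: longest-prefix matching, PIT timers and faces are reduced to bare stand-ins.

```python
# Illustrative sketch of the packet-processing rules above; names and
# table layouts are simplified stand-ins, not NFD's actual structures.
from dataclasses import dataclass, field

@dataclass
class PitEntry:
    nonces: set = field(default_factory=set)    # seen Nonces (loop detection)
    in_faces: set = field(default_factory=set)  # downstream faces to satisfy

class Forwarder:
    def __init__(self):
        self.cs = {}    # Content Store: name -> Data payload
        self.pit = {}   # Pending Interest Table: name -> PitEntry
        self.fib = {}   # Forwarding Information Base: prefix -> next-hop face

    def on_interest(self, name, nonce, in_face):
        if name in self.cs:                        # CS hit: answer from cache
            return ("Data", name, self.cs[name], in_face)
        entry = self.pit.setdefault(name, PitEntry())
        if nonce in entry.nonces:                  # duplicate Nonce: looped
            return ("Nack", "Duplicate", in_face)
        entry.nonces.add(nonce)
        entry.in_faces.add(in_face)
        for prefix, next_hop in self.fib.items():  # longest-prefix match omitted
            if name.startswith(prefix):
                return ("Interest", name, next_hop)
        del self.pit[name]                         # no route: drop and Nack
        return ("Nack", "NoRoute", in_face)

    def on_data(self, name, payload):
        entry = self.pit.pop(name, None)           # PIT timer deletion elided
        if entry is None:                          # unsolicited Data: discard
            return None
        self.cs[name] = payload                    # cache for future Interests
        return [("Data", name, payload, f) for f in entry.in_faces]
```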

In the next section, we explore NDN-DPDK, an ongoing effort at NIST to develop an NDN forwarder with high-speed packet processing. This forwarder follows the NDN protocol described above, but leverages concurrency and includes slightly modified data structures.

2.2 The NDN-DPDK Forwarder

The NDN-DPDK forwarder’s data plane has three stages: input, forwarding, and output (Fig. 1). Each stage is implemented as one or more threads pinned to CPU cores, allocated during initialization. An input thread receives packets from a Network Interface Card (NIC), decodes them, and dispatches them to a forwarding thread. The forwarding thread processes Interest, Data, or Nack packets according to the NDN protocol. An output thread queues packets for transmission on their respective NIC.

Figure 1: Diagram of the NDN-DPDK forwarder

During forwarder initialization, each hardware NIC is provided with a large memory pool in which to place incoming packets. The input thread continuously polls the NIC to obtain a burst of received packets. It decodes received packets, reassembles fragmented packets, and drops malformed ones. Then, it dispatches each packet to the responsible forwarding thread, determined as follows: (a) for an Interest, the input thread computes the SipHash of its first two name components and looks up the last 16 bits of the hash value in the Name Dispatch Table (NDT), a 65536-entry lookup table configured by the operator, to select the forwarding thread; (b) Data and Nack packets carry a 1-byte field in the packet header indicating which forwarding thread handled the corresponding Interest, and they are dispatched to that same forwarding thread.
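The dispatch rule can be sketched in a few lines. The fragment below is a hedged illustration, not NDN-DPDK code: Python's standard library lacks SipHash, so a truncated BLAKE2 digest stands in for it, and the toy NDT merely spreads the 2^16 hash values over four forwarding threads.

```python
# Sketch of the input thread's Interest dispatch: hash the first two name
# components, keep the low 16 bits, and index the NDT. BLAKE2 is a
# stand-in here; the real forwarder computes SipHash in its C data plane.
import hashlib

NDT_SIZE = 1 << 16                       # one entry per 16-bit hash value
ndt = [i % 4 for i in range(NDT_SIZE)]   # toy table over 4 forwarding threads

def dispatch(name_components):
    prefix = "/".join(name_components[:2]).encode()
    h = int.from_bytes(hashlib.blake2b(prefix, digest_size=8).digest(), "little")
    return ndt[h & 0xFFFF]               # index of the selected forwarding thread

print(dispatch(["YouTube", "Alex", "video1.mpg"]))
```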

The forwarding thread receives packets dispatched by input threads through a queue. It processes each packet according to the NDN protocol, using two data structures both implemented as hash tables: (a) The Forwarding Information Base (FIB) records where the content might be available and which forwarding strategy is responsible for the name prefix. (b) The PIT-CS Composite Table (PCCT) records which downstream node requested a piece of content, and also serves as a content cache; it combines the Pending Interest Table (PIT) and Content Store (CS) found in a traditional NDN forwarder.

The output thread retrieves outgoing packets from forwarding threads through a queue. Packets are fragmented if necessary and queued for transmission on a NIC. The NIC driver automatically frees packet buffers back into their original memory pools after transmission, making them ready for newly arrived packets.

3 Formal Model-based Approach

In this section, we describe the methodology used in this study, which includes the underlying modeling formalism as well as the associated analysis technique.

3.1 Overview

Our methodology is a formal model-based approach. To evaluate a system's performance, its model must be faithful, i.e. reflect the real characteristics and behaviors of the system of interest. Moreover, to allow for well-founded analysis, the model needs to be formally defined, and the analysis technique needs to be trustworthy and scalable. Our approach adheres to these principles in two ways. First, it relies on the SBIP formal framework (introduced below), which encompasses a stochastic component-based modeling formalism and an SMC engine for analysis. Second, it provides a method for systematically building formal stochastic models that combine accurate performance information with the functional behavior of the system.

Figure 2: Performance evaluation approach for NDN data plane

This approach is depicted in Fig. 2. It takes a functional model of the system and a set of requirements to verify. The functional model can be obtained from a high-level specification or an existing implementation. The system's implementation, which can also be obtained by automatic code generation, is instrumented and used to collect performance measurements regarding the requirements of interest, e.g. throughput. These measurements are analyzed and characterized in the form of probability density functions with the help of statistical techniques such as sensitivity analysis and distribution fitting. The obtained probability functions are then introduced into the functional model using a well-defined calibration procedure [13]. The latter produces a stochastic timed model (if the measurements concern time), which is analyzed using the SMC engine.

Note that the models considered in this workflow can be parameterized with respect to the different aspects we want to analyze and explore. The defined component types are designed to be instantiated in different contexts, e.g. with different probability density functions, thus exhibiting different performance behaviors. While the model considered for SMC analysis is a specific instance in which all parameters are fixed, some degree of parameterization is still allowed in the verified requirements.

3.2 Stochastic Component-based Modeling in BIP

BIP (Behavior, Interaction, Priority) is a highly expressive component-based framework for rigorous system design [4]. It allows the construction of complex, hierarchically structured models from atomic components characterized by their behavior and their interfaces. Such components are transition systems enriched with variables. Transitions move the component from a source to a destination location; each time a transition is taken, component variables may be assigned new values computed by user-defined C/C++ functions. Composition of BIP components is expressed by the layered application of interactions and priorities: interactions express synchronization constraints between the actions of the composed components, while priorities filter among possible interactions, e.g. to express scheduling policies.

The stochastic semantics of BIP was initially introduced in [5, 12] and recently extended to real-time systems in [14]. It enables the definition of stochastic components encompassing probabilistic variables that are updated according to user-defined probability distributions; the underlying mathematical model is a Discrete-Time Markov Chain.

Figure 3: A stochastic BIP component; client behavior issuing a request every p time units.

Such components are modeled as classical BIP components augmented with probabilistic variables, as shown in Figure 3. The figure depicts the client behavior in a client-server setting: the client issues a request (snd) every p time units, where the period p is set probabilistically by sampling a distribution function given as a parameter of the model. Time is introduced by explicit tick transitions, and waiting is modeled by exclusive guards on the tick and snd transitions with respect to elapsed time.
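The tick/snd semantics of this component can be mimicked in a few lines of Python. The sketch below is our own rendering, with a placeholder distribution standing in for the model's parameter.

```python
# Minimal simulation of the stochastic client of Fig. 3: the period p is
# re-sampled from a user-supplied distribution after every request, and
# time advances only through explicit tick transitions.
import random

def client(distribution, horizon):
    t, p = 0, distribution()      # sample the first period
    trace = []
    while t < horizon:
        if p == 0:                # guard on snd: the period has elapsed
            trace.append(("snd", t))
            p = distribution()    # re-sample the next period
        else:                     # guard on tick: still waiting
            t, p = t + 1, p - 1
    return trace

random.seed(0)
print(client(lambda: random.randint(1, 5), horizon=20))
```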

3.3 SMC in a Nutshell

Statistical Model Checking (SMC) [7, 16] is a formal verification method that combines simulation with statistical reasoning to provide quantitative answers on whether a stochastic system satisfies given requirements. It has been used successfully in various domains such as biology [6], communication [2] and avionics [3]. It has the advantage of being applicable to both models and implementations (provided they meet specific assumptions), and of capturing rare events. The SBIP SMC engine implements well-known statistical algorithms for the verification of stochastic systems, namely Hypothesis Testing (HT) [16], Probability Estimation (PE) [7] and Rare Events (IP). In addition, it provides an automated parameter exploration procedure (PX). The tool takes as inputs a stochastic BIP model, a Linear Temporal Logic (LTL) property to check, and a set of confidence parameters required by the statistical test.
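As an illustration of the probability-estimation flavor of SMC, the self-contained sketch below sizes the number of simulation traces with the Chernoff-Hoeffding bound and estimates the probability that a property holds on a trace. The toy system and property are ours for illustration; this is not SBIP's API.

```python
# SMC probability estimation in miniature: run n i.i.d. traces, check the
# property on each, and pick n so the estimate is within +/-delta of the
# true probability with confidence 1 - alpha (Chernoff-Hoeffding bound).
import math
import random

def estimate(run_trace, holds, delta=0.01, alpha=0.05):
    n = math.ceil(math.log(2 / alpha) / (2 * delta ** 2))  # required traces
    hits = sum(holds(run_trace()) for _ in range(n))
    return hits / n, n

random.seed(1)
# Toy stochastic system: a trace is one sampled latency; the "property"
# is that the latency stays below 10 microseconds.
p, n = estimate(lambda: random.expovariate(1 / 6.0), lambda x: x < 10.0)
print(f"P(latency < 10us) ~= {p:.3f} using {n} traces")
```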

4 NDN-DPDK Modeling

In this section, we present the modeling process of the NDN-DPDK forwarder. We first present the functional model of the forwarder, then show how we refine it to obtain a stochastic timed model for performance evaluation.

4.1 A Parameterized Functional BIP Model

Fig. 4 depicts the BIP model of the NDN-DPDK forwarder introduced in Section 2. It shows the NDN-DPDK architecture in terms of interacting BIP components, which can easily be matched to the stages in Fig. 1. The model is parameterized with respect to the number of components, their mapping onto specific CPU cores, FIFO sizes, etc. Due to space limitations, we only present the behavior of the forwarding thread component (Fig. 7) and provide more details in the tech report [xx]. It is worth mentioning that the model is initially purely functional and untimed.

Figure 4: A functional BIP model of the NDN-DPDK forwarder

4.2 Building the Performance Model

To build a performance model for our analysis, we consider the network topology in Fig 5. The latter has a traffic generator client (consumer), a forwarder (NDN-DPDK) and a traffic generator server (producer), arranged linearly.

Figure 5: BIP model of the considered network topology

The green line shows the Interest packet path from the client to the producer through the forwarder, and the red line indicates the Data path back towards the client. The structure of our model (Fig. 4) calls for four distribution functions to characterize performance: (a) Interest dispatching latency in input threads; (b) Data dispatching latency in input threads; (c) Interest forwarding latency in forwarding threads; (d) Data forwarding latency in forwarding threads. Note that Nack packets are out of the scope of these experiments. We identified the following factors that can potentially affect the system's performance:

  1. Number of forwarding threads. Having more forwarding threads distributes forwarding workload onto more CPU cores. However, these cores usually compete for the shared L3 cache, which potentially increases forwarding latency of individual packets.

  2. Placement of forwarding threads onto Non-Uniform Memory Access (NUMA) nodes. Input threads and their memory pools are always placed on the same NUMA node as the Ethernet adapter, whereas the output threads and the forwarding threads can be moved across the two nodes. If a packet is dispatched to a forwarding thread on a different node, the forwarding latency is generally higher because memory accesses cross the NUMA boundary.

  3. Number of name components. A longer name requires more iterations during table lookups, potentially increasing Interest forwarding latency.

  4. Data payload length. Although the Data payloads are never copied, a higher payload length increases demand for memory bandwidth, thus potentially increasing latencies.

  5. Interest sending rate from the client. A higher sending rate requires more memory bandwidth, thus potentially increasing latencies. It may also lead to packet loss if the queues between input and forwarding threads overflow.

  6. Number of PIT entries. Although the forwarder's PIT is a hash table that normally offers O(1) lookup complexity, a large number of PIT entries inevitably leads to hash collisions, which could increase forwarding latency.

  7. Forwarding thread's queue capacity. The queues are suspected to impact the overall throughput of the router through packet overflow and loss rates. However, queue capacity does not influence packet latencies inside each forwarding thread.

After identifying the factors with potential influence on packet latency, we instrument the real forwarder and collect latency measurements. We then perform sensitivity and correlation analysis to identify which factors are more significant than others. This narrows down the number of factors used and the number of associated distribution functions.

4.2.1 Forwarder Instrumentation.

Among the six factors that can potentially affect packet dispatching and forwarding latencies, factors 1, 2, 3, 4 and 5 can be controlled by adjusting the forwarder and traffic generator configuration, while factor 6 results from network traffic and is out of our control. To collect the required measurements, we modified the forwarder so that it logs the latency of each packet, as well as the PIT size after each burst of packets. We minimized the extra work that input and forwarding threads have to perform for instrumentation, leaving measurement collection to a separate logging thread and post-processing scripts. It is important to mention that logging packet latencies throughout the experiments does introduce timing overhead; the latency values obtained from measurements therefore carry a bias (an overestimate) that translates into additional latency, but the observed trends remain valid.

We conducted the experiments on a Supermicro server equipped with two Intel E5-2680V2 processors, DDR4 memory in two channels, and four Mellanox ConnectX-5 or ConnectX-4 100 Gbit/s Ethernet adapters. The hardware resources are evenly divided into two NUMA nodes. To create the topology in Figure 5, we used two QSFP28 passive copper cables to connect the four Ethernet adapters and form two point-to-point links. The forwarder and traffic generator processes were allocated separate hardware resources and could only communicate over the Ethernet adapters.

In each experiment, the consumer transmitted at a sending interval of either one Interest per 500 ns or one Interest per 700 ns, under 255 different name prefixes. There were 255 FIB entries registered in the NDN-DPDK forwarder at runtime (one for each name prefix used by the consumer), all pointing to the producer node. The producer replied to every Interest with a Data packet of the same name. The forwarder's logging thread was configured to discard samples collected during the warm-up period, collect samples during steady state, and ignore the cool-down session. Each experiment represents about 4 million Interest-Data exchanges.

Factor                       | Minimum          | Step   | Maximum
Number of forwarding threads | 1                | 1      | 8
Name components (length)     | 3                | 7      | 13
Payload length (bytes)       | 0                | 300    | 1200
Interest sending interval    | 500 ns (2 Mpps)  | –      | 700 ns (1.42 Mpps)
Table 1: Factors used. NUMA mapping is described below.

We repeated the experiment using different combinations of the factors in Table 1. NUMA placement has one of the following arrangements:

  • (P1) Client and server faces and forwarding threads are all on the same NUMA,

  • (P2) Client face and forwarding threads on one NUMA, server face on the other,

  • (P3) Client face on one NUMA, forwarding threads and server face on the other,

  • (P4) Client face and server face on one NUMA, forwarding threads on the other.

In arrangement P1, the forwarding latency is expected to be the smallest of all, because all processes are placed on the same NUMA node; there is no inter-socket communication and no overhead introduced. In arrangement P4, both Interests and Data packets cross NUMA boundaries twice, since the forwarding threads are pinned to one NUMA node whereas the client and server faces, connected to the Ethernet adapters, reside on the other; this increases packet latency tremendously compared to P1, P2 and P3. These two observations are sufficient to predict that placement P1 is the best-case scenario and placement P4 the worst. However, we aim to get more insight and confidence through quantitative formal analysis, which provides a recommendation as to which placement is better suited for given combinations of the remaining parameters.

4.2.2 Model Fitting.

Before calibrating our untimed BIP model with distinct probability distribution functions representing each combination of the factors above, it is reasonable to reduce the number of distributions by performing sensitivity analysis. Sensitivity analysis predicts the outcome of a decision under the effect of several factors: for a given set of variables, a statistician can understand how a change in one factor influences the outcome of an analysis, and rank the factors by importance.
Sensitivity analysis includes a variety of techniques and plots. In this paper, we restrict ourselves to the DEX mean plot on four of the factors previously mentioned: (1) number of forwarding threads, (2) thread mapping to NUMA nodes, (3) name size, (4) Data payload size, plus the packet type (the latter is not a factor per se, as the forwarder processes both packet types simultaneously; we include it to show how Interest versus Data influences latency).

Figure 6: Main Effects Plot for Interest and Data packets

The sensitivity analysis on the collected data (Fig. 6) indicates that: (a) Interest processing is slightly slower than Data processing; (b) the number of name components (packet name length) slightly impacts the latency of packets in forwarding threads; (c) the Data payload has no measurable impact on packet latency, which makes sense since the forwarder does not decode the whole packet during processing but only checks its name (packet header); (d) the number of forwarding threads has a significant impact, which is reasonable as the load is split equally among them.
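For readers unfamiliar with DEX mean (main-effects) plots, the computation behind Fig. 6 reduces to averaging the response per factor level, as in the sketch below; the column names and sample values are illustrative, not our measurement data.

```python
# Main-effects computation: for each factor, average the latency per
# level; a large spread between level means flags an influential factor.
from collections import defaultdict

def main_effects(samples, factor):
    by_level = defaultdict(list)
    for row in samples:
        by_level[row[factor]].append(row["latency_ns"])
    return {lvl: sum(v) / len(v) for lvl, v in sorted(by_level.items())}

samples = [  # synthetic rows standing in for instrumented measurements
    {"threads": 1, "name_len": 3, "latency_ns": 1500},
    {"threads": 1, "name_len": 13, "latency_ns": 1650},
    {"threads": 8, "name_len": 3, "latency_ns": 900},
    {"threads": 8, "name_len": 13, "latency_ns": 980},
]
print(main_effects(samples, "threads"))   # large spread: influential
print(main_effects(samples, "name_len"))  # small spread: minor effect
```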

Each forwarding thread has its own instance of the PIT. The PIT sets a timer for each entry; when the timer expires, the entry is removed from the table. Generally, the PIT size would be expected to influence packet latency once it reaches a threshold. In our case, however, a simple correlation analysis confirms that it does not, which makes sense as the data structure used in this implementation is optimized for high performance.
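The correlation check itself is equally simple: a Pearson coefficient near zero between PIT occupancy and packet latency justifies dropping the PIT size from the calibrated model. Below is a sketch with synthetic, independent-by-construction data.

```python
# Correlation screening of PIT size vs. latency on synthetic data.
import random
import statistics

random.seed(2)
pit_sizes = [random.randint(1_000, 200_000) for _ in range(1_000)]
latencies = [random.gauss(1200, 100) for _ in pit_sizes]  # independent by design

r = statistics.correlation(pit_sizes, latencies)  # Pearson r, Python 3.10+
print(f"Pearson r = {r:.3f}")  # ~0: PIT size does not track latency
```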

To sum up, the factors of interest are: 1) number of forwarding threads; 2) name components; 3) NUMA placement; 4) sending rate.

4.2.3 Model Calibration.

Based on the analyses above, we derived a set of probability distributions characterizing the combinations of factors with the highest impact on forwarder performance. These probability distributions are used to calibrate the functional BIP model in Fig. 4. Because of space limitations, we only show in Fig. 7 the calibrated behavior of the forwarding thread. Calibration is a well-defined model transformation that turns functional components into stochastic timed ones [11].

Figure 7: Forwarding thread; the forwarding times of Interest and Data packets are modeled using two distributions, one per packet type.
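The following sketch illustrates the calibration idea: measured latencies become a sampleable distribution from which the stochastic forwarding-thread component draws a delay on each transition. An empirical resampling distribution is used as a stand-in for the fitted parametric distributions of the actual models.

```python
# Calibration in miniature: attach latency distributions to the untimed
# Interest/Data transitions of the forwarding-thread component.
import random

def empirical(measurements):
    """Resampling distribution built from measured latencies (ns)."""
    return lambda: random.choice(measurements)

interest_latency = empirical([1510, 1490, 1620, 1475, 1550])  # synthetic
data_latency     = empirical([1320, 1295, 1400, 1310])        # synthetic

def forwarding_thread(packets, clock=0):
    """Process (kind, name) packets, advancing the clock per packet kind."""
    for kind, name in packets:
        clock += interest_latency() if kind == "I" else data_latency()
        yield kind, name, clock

random.seed(3)
for event in forwarding_thread([("I", "/a/b"), ("D", "/a/b")]):
    print(event)
```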

5 Performance Analysis using SMC

In this section, we explain the SMC analysis we performed and discuss the obtained results.

5.1 Experiments Setting

Throughout the study, SMC used 41 traces per run and spent several minutes on average to verify the performance property in each. The forwarder performance as a whole is measured by the Interest satisfaction rate: each Interest the consumer sends should bring back a Data packet, either from the forwarder's Content Store (CS) or from the producer. Interests that are lost along the way, or that time out inside the PIT of a forwarding thread, count towards the loss rate.
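In our notation (not the paper's formalism), the checked metric is:

\[
\text{satisfaction rate} = \frac{\#\,\text{Data received by the consumer}}{\#\,\text{Interests sent by the consumer}},
\qquad
\text{loss rate} = 1 - \text{satisfaction rate}.
\]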

5.2 Results

5.2.1 Queues Dimensioning.

To start off, we explore the impact of sizing the forwarding thread's queue. Each forwarding thread has an input queue. In this first experiment, the considered BIP model had only one forwarding thread, and we varied its queue capacity over three values while the client sent packets at three different sending rates. The results are shown in Figure 8(a). The Y-axis represents the Interest satisfaction rate, where a value of 1 indicates no loss; the X-axis represents the queue capacity for the three sending rates.

(a) One forwarding thread (different sending rates).
(b) Multiple forwarding threads (fixed sending rate).
Figure 8: Queue sizing exploration.

The analysis indicates that at the slowest sending rate (pink), the Interest satisfaction rate is the highest. This result is expected, since a single forwarding thread alone can handle all packets at this slow rate, under any queue capacity. However, at a faster sending rate, where a single forwarding thread shows signs of packet loss, we unexpectedly observed a better Interest satisfaction rate with the smallest queue capacity.

After a thorough investigation of the existing NDN-DPDK implementation, we found that the forwarding thread's queue lacks proper queue management, i.e. insertion and eviction policies that would give Data packets priority over Interests. In the absence of such management, more Data packets are dropped, resulting in unsatisfied Interests and thus lower performance. For the final implementation of the NDN-DPDK forwarder, it is therefore advisable to use a small queue capacity when the forwarder has only one forwarding thread and packets are sent at a fast rate.

We then explore whether this observation remains true with more forwarding threads. To do so, we ran the SMC analysis again on eight different models, with the number of forwarding threads ranging from 1 to 8, under a sending rate at which a loss rate was observed in the single-thread experiment. For each model we experimented with two different queue capacities. The results are reported in Figure 8(b), where the X-axis represents the number of forwarding threads and the Y-axis the Interest satisfaction rate.

We observe that the queue size matters mainly for models with one or two forwarding threads. For the two-thread model, a bigger queue is preferred to maximize performance, unlike the single-thread case. For the other six models, the forwarder's performance is not influenced by the queue capacity at all under the current sending rate: three or more forwarding threads are capable of splitting the workload at this rate and can pull enough packets from each queue, with a minimum loss rate of 0.12%. This result stresses that the faster the sending rate, the more forwarding threads are needed to keep the loss rate minimal without having to worry about the proper queue size.
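A toy queue model helps illustrate why capacity alone cannot eliminate losses under sustained overload, echoing the queue-management finding above. The arrival and service probabilities below are illustrative, and the model deliberately uses plain tail drop with no Data-over-Interest priority.

```python
# Fixed-capacity queue under sustained overload: with arrivals outpacing
# service, the long-run drop rate barely depends on capacity; smarter
# insertion/eviction policies, not size, decide which packets survive.
import random

def simulate(capacity, arrival_p=0.9, service_p=0.7, ticks=100_000, seed=4):
    rng, queue, arrived, dropped = random.Random(seed), 0, 0, 0
    for _ in range(ticks):
        if rng.random() < arrival_p:      # a packet arrives this tick
            arrived += 1
            if queue < capacity:
                queue += 1
            else:
                dropped += 1              # tail drop: no priority for Data
        if queue and rng.random() < service_p:
            queue -= 1                    # one packet forwarded
    return dropped / arrived

for cap in (64, 256, 4096):
    print(f"capacity {cap:5d}: drop rate {simulate(cap):.3f}")
```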

5.2.2 NUMA placement, number of forwarding threads and packet name length.

Another aspect worth exploring is the impact of assigning the NDN-DPDK forwarder's forwarding threads across the two NUMA nodes (0, 1) under different sending rates. This experiment aims to determine, using SMC analysis, on which NUMA node the forwarder should run its forwarding threads and position its faces (each face is connected to an Ethernet adapter; Face 0 communicates with the client and Face 1 with the server), exploring the four possible arrangements P1, P2, P3 and P4 described in Section 4.2.1. We ran the SMC analysis on eight different BIP models with the number of threads ranging from 1 to 8. On each of these models we ran 24 experiments, corresponding to combinations of the factors depicted in Table 1 and the NUMA mapping.

In Figures 9-14, each row presents experiments with one of the packet name lengths {3, 7, 13} and a queue capacity of 4096. The right column gives results for the faster sending rate of 2 Mpps, the left column for the slower rate of 1.42 Mpps. Each figure includes four curves corresponding to the four placement options P1, P2, P3 and P4.

Figure 9: 3 name comp., 1.42Mpps
Figure 10: 3 name comp., 2Mpps
Figure 11: 7 name comp., 1.42Mpps
Figure 12: 7 name comp., 2Mpps
Figure 13: 13 name comp., 1.42Mpps
Figure 14: 13 name comp., 2Mpps

Figures 9-14 show that Interest satisfaction rates scale up with the number of forwarding threads and then reach a saturation plateau, where adding more threads no longer improves performance. Moreover, with few forwarding threads, loss is unavoidable and can exceed 90%. This is because the sending rate exceeds the forwarding threads' processing rate, causing their FIFO queues to saturate and drop packets frequently.

Under the slower sending rate and packets with small and medium name lengths (3 and 7), Figures 9 and 11 indicate that performance reaches a maximum satisfaction rate of over 90% with only five forwarding threads, on all NUMA placements. In Figures 10 and 12, where the client generates packets faster at 2 Mpps, the saturation plateau is reached at six threads. Moreover, Figures 9 and 11 also show that placing all processes (threads and faces) on a single NUMA node outperforms the other three options. This is explained by the absence of inter-socket communication and thus less timing overhead, in contrast to the purple plot, where packets cross NUMA boundaries twice: from Face 0 to the forwarding threads, then through Face 1 and back. With a larger name, however (Figure 13, where packets have 13 name components), an unexpected behavior appears when using three threads or fewer: placing the forwarding threads on the same NUMA node as Face 1 (the Ethernet adapter connected to the server, which receives Data packets) surpasses the other three options. The intuition is that forwarding threads take longer to process incoming packets due to the longer name lookup, particularly for Interests, which are searched by name in the two tables (PCCT and FIB) rather than by a token as is the case for Data. Placing the forwarding threads with the Data-receiving Ethernet adapter thus yields better results, since Data packets are processed quickly after a fast token search, especially when the workload exceeds the threads' capacity. When the sending rate is increased, the same pattern is observed in Figure 14 for the same name length: for one-, two- or three-threaded models, the best placement combines Face 1 and the threads on one NUMA node, although the increased sending rate causes a packet loss of about 20%.

Figures 10 and 12 likewise show the impact of increasing the sending rate for packets with smaller names. In this case, it is again preferable to position all processes on one NUMA node (the yellow plot), because crossing the NUMA boundary degrades performance; the difference between no crossing and double crossing (yellow and purple plots, respectively) is approximately 30% with more than five threads. The second-best option is placing the forwarding threads with Face 0 on the Interest-receiving NUMA node (the node hosting the Ethernet adapter that receives Interests from the client). However, when the number of threads is below the saturation plateau, so that the threads are overworked and start to lose packets, it is recommended to place the threads on the Data-receiving NUMA node (the node hosting the Ethernet adapter connected to the server).

6 Lessons learned and future work

This study shed light on a new networking technology called Named Data Networking (NDN) and its forwarding daemon. Ongoing NDN research includes the development of high-speed data-plane forwarders that can operate at hundreds of gigabits per second on modern multi-processor server hardware with kernel-bypass libraries. In this paper, we discussed the results of a performance evaluation effort undertaken to understand how the NDN forwarder prototype developed by NIST (NDN-DPDK) behaves in a network in terms of achievable throughput. We conducted an extensive analysis under different factors, such as the number of threads carrying out tasks and the mapping of functions to CPUs, using a model-based approach and statistical model checking. Given the wide array of design parameters involved, this effort contributes valuable insights into protocol operation and guides the choice of such parameters. The use of statistical model checking for performance analysis allowed us to discover potentially sub-optimal operation and to propose an appropriate enhancement to the queue management solution. Moreover, our extensive analysis provides a characterization of the achievable forwarding throughput for a given forwarder design and available hardware resources, which would not have been possible to obtain with such controllable accuracy using traditional measurement and statistical methods. Furthermore, these results were shared with members of the NDN community at a conference through a poster presentation, and drew the attention of researchers interested in the methodology and its applications. In addition, a BIP model refined to the right level of granularity and simulated using the BIP engine allows the generation of executable code that could be used in place of the actual forwarder. It is important to note, however, that our analysis depends largely on a stochastic model obtained from data samples collected on the actual implementation of the forwarder, which introduces timing overhead; the trends observed throughout the study nonetheless remain accurate. In the future, this analysis will be extended to answer the reverse question: given a desired throughput, what are the best hardware setup and forwarder design to use? rather than the question investigated in this paper: given a hardware setup and a forwarder design, what is the maximum achievable throughput?

The identification of any commercial product or trade name does not imply endorsement or recommendation by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

References

  • [1] NFD Developer’s Guide. Tech. rep., http://named-data.net/techreports.html
  • [2] Basu, A., Bensalem, S., Bozga, M., Caillaud, B., Delahaye, B., Legay, A.: Statistical Abstraction and Model-Checking of Large Heterogeneous Systems. In: Formal Techniques for Distributed Systems, FORTE'10. LNCS, vol. 6117, pp. 32–46. Springer (2010)
  • [3] Basu, A., Bensalem, S., Bozga, M., Delahaye, B., Legay, A., Sifakis, E.: Verification of an AFDX infrastructure using simulations and probabilities. In: Runtime Verification, RV'10. LNCS, vol. 6418. Springer (2010)
  • [4] Basu, A., Bozga, M., Sifakis, J.: Modeling heterogeneous real-time components in BIP. In: Proceedings of the Fourth IEEE International Conference on Software Engineering and Formal Methods. pp. 3–12. SEFM'06, IEEE Computer Society, Washington, DC, USA (2006)
  • [5] Bensalem, S., Bozga, M., Delahaye, B., Jégourel, C., Legay, A., Nouri, A.: Statistical Model Checking QoS Properties of Systems with SBIP. In: International Symposium On Leveraging Applications of Formal Methods, Verification and Validation, ISOLA’12. pp. 327–341 (2012)
  • [6] David, A., Larsen, K.G., Legay, A., Mikucionis, M., Poulsen, D.B., Sedwards, S.: Statistical model checking for biological systems. Int. J. Softw. Tools Technol. Transf. 17(3), 351–367 (Jun 2015)
  • [7] Hérault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate Probabilistic Model Checking. In: International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI’04. pp. 73–84 (January 2004)
  • [8] Jacobson, V., Smetters, D.K., Thornton, J.D., Plass, M.F., Briggs, N.H., Braynard, R.L.: Networking Named Content (2009), https://named-data.net/wp-content/uploads/Jacob.pdf
  • [9] Mediouni, B.L., Nouri, A., Bozga, M., Dellabani, M., Legay, A., Bensalem, S.: SBIP 2.0: Statistical model checking stochastic real-time systems. In: Automated Technology for Verification and Analysis - 16th International Symposium, ATVA 2018, Los Angeles, CA, USA, October 7-10, 2018, Proceedings. pp. 536–542 (2018)
  • [10] Named Data Networking (NDN) project. Tech. rep., USA (Oct 2010), http://named-data.net/techreport/TR001ndn-proj.pdf
  • [11] Nouri, A.: Rigorous System-level Modeling and Performance Evaluation for Embedded System Design. Ph.D. thesis, Grenoble Alpes University, France (2015), https://tel.archives-ouvertes.fr/tel-01148690
  • [12] Nouri, A., Bensalem, S., Bozga, M., Delahaye, B., Jegourel, C., Legay, A.: Statistical model checking QoS properties of systems with SBIP. Int. J. Softw. Tools Technol. Transf. (STTT) 17(2), 171–185 (April 2015)
  • [13] Nouri, A., Bozga, M., Molnos, A., Legay, A., Bensalem, S.: ASTROLABE: A rigorous approach for system-level performance modeling and analysis. ACM Trans. Embedded Comput. Syst. 15(2), 31:1–31:26 (2016)
  • [14] Nouri, A., Mediouni, B.L., Bozga, M., Combaz, J., Bensalem, S., Legay, A.: Performance evaluation of stochastic real-time systems with the SBIP framework. International Journal of Critical Computer-Based Systems 8(3-4), 340–370 (2018)
  • [15] Xylomenos, G., Ververidis, C.N., Siris, V.A., Fotiou, N., Tsilopoulos, C., Vasilakos, X., Katsaros, K.V., Polyzos, G.C.: A survey of information-centric networking research. IEEE Communications Surveys & Tutorials 16(2), 1024–1049 (2014)
  • [16] Younes, H.L.S.: Verification and Planning for Stochastic Processes with Asynchronous Events. Ph.D. thesis, Carnegie Mellon (2005)
  • [17] Zhang, L., Afanasyev, A., Burke, J., Jacobson, V., claffy, k., Crowley, P., Papadopoulos, C., Wang, L., Zhang, B.: Named data networking. SIGCOMM Comput. Commun. Rev. 44(3), 66–73 (Jul 2014)