LOcAl DEcisions on Replicated States (LOADER) in programmable data planes: programming abstraction and experimental evaluation

01/21/2020 ∙ by German Sviridov et al.

Programmable data planes recently emerged as a prominent innovation in Software Defined Networking (SDN), permitting the support of stateful flow-processing functions over hardware network switches specifically designed for network processing. Unlike early SDN solutions such as OpenFlow, modern stateful data planes permit keeping (and dynamically updating) local per-flow states inside network switches, thus dramatically improving the reactiveness of network applications to state changes. Still, also in stateful data planes, the control and update of non-local states is assumed to be completely delegated to a centralized controller, and such states can thus be accessed only at the price of extra delay. Our LOADER proposal aims at overcoming the apparent dichotomy between local states and global states. We do so by introducing a new possibility: permitting localized (in-switch) decisions not only on local states but also on replicated global states, thus providing support for network-wide applications without incurring the drawbacks of classical approaches. To this purpose, i) we provide high-level programming abstractions devised to define the states and the update logic of a generic network-wide application, and ii) we detail the underlying low-level state management and replication mechanisms. We then show LOADER's independence of the stateful data plane technology employed, by implementing it over two distinct stateful data planes (P4 switches and OPP - Open Packet Processor - switches), and by experimentally validating both implementations in an emulated testbed using a simple distributed Denial-of-Service (DDoS) detection application.

I Introduction

Future networks are called to efficiently and flexibly support an ever-growing variety of heterogeneous network functions, such as network address translation, tunneling, load balancing, traffic engineering, monitoring, intrusion detection, and so on. Software-based programmability of such functions was first pioneered by early Software Defined Networking (SDN) proposals, and then by the more recent trend of Network Function Virtualization (NFV). However, both approaches have shown shortcomings. Original SDN approaches (and, more specifically, the OpenFlow-based ones) relied on stateless switching architectures, and thus suffered from the need to delegate any state update and maintenance to a centralized controller, paying a significant toll in terms of latency and communication overhead. On the other side, NFV has addressed the design of middlebox functionalities in software, typically using commodity CPUs. However, early NFV implementations appeared to be performance-limited: there exists a substantial gap between the speed attainable in software as opposed to dedicated hardware devices, and such a gap is not going to decrease in the future, with hardware switches capable of attaining many terabits per second, as opposed to the tens of gigabits per second attainable by their software counterparts.

In order to overcome such limitations, starting from 2014 with OpenState [5] and P4 [8], a new innovation trend emerged with the introduction of programmable/stateful data planes. Stateful data planes offer an additional level of programmability with respect to the traditional stateless SDN paradigm, by introducing the possibility of keeping and manipulating persistent states locally at the network device. As opposed to stateless switches, persistent states can now be directly deployed and managed inside network devices in the form of simple user-defined memory elements. Furthermore, arbitrary algorithms for packet/flow processing, e.g., described in terms of simple Mealy Finite State Machines [5] or more sophisticated Extended Finite State Machines [6, 20], can be directly loaded and run inside the processing pipeline of individual network devices, thus providing the opportunity of implementing network applications directly within the network device at line rate.

The crucial advantage of stateful data plane technologies consists in the possibility to significantly reduce the interaction between switches and the controller. As opposed to a stateless data plane, in which any change of the forwarding decision requires the intervention of the controller, a stateful data plane permits taking localized decisions, i.e., adapting the forwarding behavior to network events and handling changing states locally inside the switch. This approach significantly reduces the reliance upon a centralized controller, and mitigates the related severe penalties in terms of latency and signaling overhead [25], hence greatly improving the reactivity of network control applications.

Unfortunately, the benefits of distributing network applications on stateful switches cannot be achieved in cases where non-local states need to be considered. For example, an application that identifies the occurrence of a particular event based on multiple statistics gathered from different switches operates on a global state that is the combination of different local statistics of different switches. Even in the case of stateful data planes, the control and update of the global state is still delegated to a centralized entity, either a controller or a single switch [2]. The traditional approach of employing a centralized controller for global state management greatly simplifies the implementation, but non-local states can be accessed and updated only at the price of extra delay, thus affecting the overall reactivity. On the other hand, solutions centralizing global states in a single stateful switch lead to performance impairments. Indeed, all flows affected by/affecting a global state must traverse the switch storing it. This ultimately leads to an overall higher network utilization and traffic concentration, thus affecting network congestion and available capacity. Furthermore, any failure of that switch can jeopardize the state integrity, due to the presence of a single replica of the global state.

In this work we propose a novel framework, namely LOADER (LOcAl DEcisions on Replicated states), which enables a new possibility for stateful data planes: the states and the corresponding control logic are distributed across the switches and the controller, while permitting multiple replicas of the same state/control logic to be present in the network. This permits running network applications operating on global states without a unique central entity. Switches can take instantaneous decisions based on local replicas of non-local states, without any controller intervention, thus re-establishing the beneficial effects of stateful data planes also for non-local states. LOADER provides:

  • the programming abstractions to define generic (either local or non-local) states and the control logic of any network application;

  • the engine to optimally embed the states and the control logic into the network devices and the controller, to optimize performance while taking into account the available resources in terms of processing and state storage capabilities;

  • the mechanism to transparently replicate non-local states across multiple network devices.

The rest of the paper is organized as follows. In Sec. II we discuss the related work. In Sec. III we discuss the issues and possible solutions to offload network applications into the data plane. In Sec. IV we first provide a high-level abstraction of the LOADER framework by defining its core modules, and later delve into the details of each module and the way the LOADER abstraction is exposed to the network programmer. In Sec. V we analyze consistency-related issues when dealing with replicated states and how to overcome them. In Sec. VI we describe how we implemented a lightweight version of the LOADER framework in ONOS [4], with major emphasis on the data plane implementation in P4 [8] and Open Packet Processor (OPP) [6]. In Sec. VII we show how to program a distributed Denial-of-Service (DDoS) detection application in LOADER and experimentally assess the performance of both the P4- and OPP-based implementations. We also show how to program other applications in Sec. VIII. Finally, we draw our conclusions in Sec. IX.

II Related work

Many recent works [13, 26, 3] have proposed abstraction models for the definition of network applications. However, they do not consider stateful data planes, since the states are assumed to be kept at the controller. On the contrary, SNAP [2] proposes a novel network programming abstraction which permits defining complex network applications for stateful SDN. It addresses the problem of how to perform optimal embedding of states across the network switches, taking into account the dependency between states and traffic flows. By design, SNAP is limited to just one replica of each state within the network. LOADER, instead, enables multiple replicas of each state, extending the single-replica approach of SNAP. The optimal replication problem for multiple replicas has been defined and investigated in [23]. Given a network application and the corresponding states, the problem considers all the traffic flows that are affected by/affect such states and, based on a generic cost function, computes (i) the optimal number of replicas, (ii) their placement within the network and (iii) the corresponding optimal traffic routing. The work in [23] can be used as a building block for LOADER (i.e., as the optimization engine), while LOADER provides the programming framework and the implementation for replicated states.

Stateful NetKAT [17] is a programming abstraction for the development of network applications. Differently from SNAP, NetKAT provides native support for replicated states, yet by design the actual state replication can be performed only at the edges of the network. Moreover, differently from LOADER, the traffic affected by/affecting the replicated states is constrained to traverse all replicas, thus precluding a wide range of applications.

Swing State [16] introduces a mechanism which provides state migration entirely in the data plane but, as in the case of SNAP, assumes only one replica of each state, which is migrated across the network.

In [28] the authors focus on providing state redundancy and traffic load balancing by employing independent copies of the same application. LOADER is able to achieve the same goal by employing state replication instead of performing full application copy, leading to better hardware resource utilization.

Moreover, none of the previously mentioned works has addressed the definition of a high-level framework providing a programming abstraction useful to the network application developer, as LOADER instead does.

In general, the problem of maintaining consistency across replicated states has been deeply investigated in the past in the field of distributed systems [18], and many solutions have been proposed, depending on the nature of the states, the desired properties and the available resources. There have been, however, few works concerning replication in stateful data planes. NetPaxos [10] provides application-layer acceleration for the Paxos [14] consensus protocol by offloading parts of the algorithm to the switches. Differently from NetPaxos, LOADER provides state replication directly in the data plane.

A preliminary version of this work was presented in [24], focusing on some implementation issues of LOADER and providing some experimental results. Furthermore, [24] did not consider the abstraction model required to develop network applications based on replicated states.

III Offloading network applications

Network applications are composed of a set of operations performed over a set of states related to some network condition. A state is defined as a generic data structure holding a variable or a compound of variables, associated with one or more network applications. In stateless data planes, all states related to a network application are gathered in a logically centralized entity, namely the controller (possibly a cluster of controllers). Depending on the state values, the controller performs the actions specified by the user-defined network applications.

Offloading network applications implies embedding some or all of the application elements into the network devices. In particular, application states are embedded into the network devices in the form of stateful primitives natively supported by those devices.

The type and corresponding amount of available resources at each network device pose hard constraints on the embedding of an application. Dedicated hardware devices, such as switches and routers, typically have limited resources in terms of processing capabilities, memory and bandwidth, but incur almost zero latency when executing local processing. On the other hand, general-purpose network devices such as SDN controllers provide resource flexibility at the cost of a large processing latency. To minimize the application execution latency, during the embedding phase network applications exceeding the resource constraints of a single network device may be split across multiple devices. If application splitting still does not satisfy the resource constraints, the application is fully delegated to the controller, as in the case of traditional stateless SDN.

In addition to the resource constraints, most of the applications present a dependency among states and actions, i.e., states are accessed/modified and actions are executed according to a well-defined order which is tightly bound to the definition of the application.

Given a generic state $s$ stored in a given network device $n$, $s$ is said to be local if it is accessed (read/write) only by $n$ itself. In such a scenario, $s$ can be internally embedded in $n$ (provided that $n$ supports it). On the contrary, when $s$ is accessed (read/write) by multiple network devices that share the state, $s$ is said to be non-local. If all states related to a network application are local, offloading does not present any considerable challenge, as the states can be embedded into a single network device, assuming no violation of the capacity constraints. However, when a state is non-local, multiple network applications or multiple parts of the same application must be able to access it.

In classical stateless SDN, non-local states are managed by states polling and aggregation at the controller. Instead, in stateful SDN, non-local states can be supported with one of the two approaches:


  • Single replica. As proposed in SNAP [2], the state is embedded in just a single network device, thus a unique replica is available in the whole network. In SNAP, the choice of the network device is optimized according to some optimization criteria, e.g., distance among dependent states, load balancing across the network devices, etc. All the traffic affected by/affecting the state is then routed to traverse the network device storing the unique replica of the state. This may lead to major scalability and performance impairments, especially when a state is affected by/affects a large amount of traffic.

  • Multiple replicas. A single state is made available in multiple copies inside the network, by being replicated across multiple devices. This approach permits distributing the traffic across multiple network devices while also providing robustness to failures. However, although this approach provides more embedding flexibility, it requires a replication protocol to keep all replicas consistent.

Fig. 1: Example routing without replicated states (left) and with replicated states (right), as enabled by LOADER.

An example of the two approaches is depicted in Fig. 1. Assume a network application composed of two states, and two flows originating from H1 and H2 and directed towards H3. With a single replica embedded in SW1, the green flow is forced to make a detour from its shortest path to traverse SW1, which stores the state, thus introducing additional load. On the contrary, with multiple replicas the green flow can reach its destination following its shortest path, thanks to the presence of two replicas of the state, embedded inside SW1 and SW2 respectively.

One of the most challenging aspects of the state replication approach is that it requires the definition of a replication scheme operating among all the replicas of a state to keep them consistent with each other. When developing the application, the programmer must be able to take into account the possible presence of transient inconsistencies among state replicas. Thus, an abstraction layer is necessary for network applications based on replicated states.

In addition to defining a set of guidelines to implement a suitable replication mechanism, LOADER provides a general abstraction model and a framework for developing network applications based on state replication. In the following, we identify a common abstraction for network applications permitting LOADER to be target-independent and completely agnostic to the underlying network hardware. The abstraction is made generic by: (i) supporting network applications operating only on local states, as they fall into the special-case category of single-replica states, (ii) supporting the absence of stateful switches, and (iii) being independent of the technologies implementing the data plane.

IV LOADER abstraction model and framework

LOADER naturally extends functionalities of previously proposed frameworks based on single-replica states. It is based on three main blocks, as shown in Fig. 2:

  1. application definition by means of predefined application elements;

  2. compilation phase by means of a compiler;

  3. embedding phase.

In the next section we define an abstraction model for LOADER programming that permits the decomposition of a network application in basic elements that can be directly embedded into network devices.

Fig. 2: Main building blocks of LOADER framework.

IV-A Application definition

Fig. 3: DAG representation of a LOADER network application and its mapping to primitive elements.

At the top layer, the user defines network applications, as in classical SDN architectures. Applications are designed in an agnostic way with respect to the core components of the framework while having the only constraint of employing a set of predefined application elements. The application elements supported by LOADER are the only part of the framework exposed to the programmer in the form of APIs or generic language libraries, as shown in Fig. 2. These elements permit an efficient decomposition of user-defined applications during the compilation phase and provide a comprehensible abstraction for the compiler during their translation to network-supported primitives.

For the sake of the presentation, we consider a reference data center load-balancing application that works as follows: (1) whenever the load on the data center is medium-low, the application distributes the user’s request among the available servers in a load-balancing fashion, i.e., an arriving request is forwarded to the least loaded server, in terms of CPU utilization; (2) otherwise, when the data center is highly loaded, the user’s request is sent to the controller for further processing.

Fig. 3 depicts an example of a generic network application employing the LOADER abstraction. Each application element is defined as follows:

IV-A1 States

Let $S_a = \{s_1, \ldots, s_n\}$ be the set of states associated with a network application $a$, and let $s_i^{(j)}$ be the $j$-th replica of state $s_i$, with $j = 1, \ldots, r_i$. For the reference load-balancing application, state $s_i$ represents the current CPU load of a generic server $i$, where $i \in \{1, \ldots, N\}$ and $N$ is the number of available servers.

IV-A2 Reduction function

The reduction function is a generic multivariate function that maps the states in $S_a$ to a reduced version of the input states. It is obtained by combining a set of primitive reduction actions natively available in the network device. In the reference application, two reduction functions are employed, $R_1 = \operatorname{argmin}$ and $R_2 = \operatorname{avg}$, which compute the index corresponding to the minimum and the average of an array of values, respectively. Consequently, the reduced versions are just two scalars: the index of the least loaded server and the average CPU load of the data center.

IV-A3 Trigger function

Based on the reduced states, the trigger function evaluates the presence of a particular event and decides whether a reaction is required or not. The reference application operates concurrently with two trigger functions, leading to different activity functions. The first trigger function checks whether the data center load (i.e., the average CPU load computed by the reduction function) is below a given threshold (corresponding to a low-load scenario). The second trigger function instead always returns true, passing the index of the least loaded server to the activity function.

IV-A4 Activity function

The activity function is a sequence of actions that are executed when the events associated with a trigger function occur. In the reference application, two actions are defined. Action 1 sends the request to the controller. Action 2 sends the request to a specified server. If the high-load condition (i.e., average load larger than the threshold) is satisfied, then Action 1 is executed, otherwise Action 2 is; in both cases the action is executed at the switch where the request has been received.
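For concreteness, the reference load-balancing application can be sketched with the same Python-like API used in the listings of Sec. V and Sec. VII. The element classes (State, ReductionFunction, TriggerFunction, ActivityFunction) are those appearing in the later listings, while the primitive actions ArgMin and Avg, the CpuLoad scope, the ForwardTo action and the getServers() helper are illustrative assumptions rather than documented LOADER primitives.

from Controller import TopologyManager
from LOADER.PrimitiveActions import ArgMin, Avg  # assumed primitive reduction actions

T = 0.8  # assumed high-load threshold (fraction of total CPU capacity)

servers = TopologyManager.getServers()  # assumed topology helper
# One state per server, tracking its current CPU load
states = [State(target=srv, scope=CpuLoad(srv)) for srv in servers]

# Two reduced views of the same set of states
leastLoaded = ReductionFunction(states=states, operation=ArgMin)
avgLoad = ReductionFunction(states=states, operation=Avg)

# High load: send the request to the controller (Action 1)
a1 = ActivityFunction(target=servers, action=Controller.Notify("high load"))
tr1 = TriggerFunction(s0=avgLoad.Result(), trigger=(avgLoad.Result() > T), activity=a1)

# Medium-low load: forward the request to the least loaded server (Action 2)
a2 = ActivityFunction(target=servers, action=ForwardTo(leastLoaded.Result()))
tr2 = TriggerFunction(s0=avgLoad.Result(), trigger=(avgLoad.Result() <= T), activity=a2)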

IV-B Compilation phase

Network applications are compiled through the LOADER compiler, as shown in Fig. 2. The compiler decomposes a network application into a set of basic primitives supported by the network devices. The catalog of available primitives depends on the specific network devices operating in the network; it is stored in the resource management module of the network controller and is updated through the network management plane, e.g., at device installation time.

The compiler takes as input the network capabilities, in the form of available basic primitives, and the user-defined application, in the form of LOADER application elements. The application is then represented by the compiler as a DAG (Directed Acyclic Graph) composed of its basic elements, as shown in Fig. 3. The compiler reconstructs the dependencies among the application elements and maps them to the basic primitives supported by the network devices, so that:

  • states are mapped into primitive data structures, such as counters, registers, hash tables, etc., to store application states;

  • reduction, trigger and activity functions are mapped into primitive actions, i.e. basic processing/decision capabilities offered by network devices.

IV-C Optimal embedding and application reaction latency

The embedding consists of mapping the primitive elements provided by the compiler into a set of physical network devices. This is performed by exploiting the target-specific drivers and southbound APIs (e.g., P4Runtime, gRPC, OpenFlow, etc…) offered by the embedding engine of the controller.

Fig. 2 shows the interaction of the embedder with the rest of the framework. The embedder takes as input: (i) the set of primitive elements provided by the compiler, (ii) the resource availability inside the network provided by the controller resource manager and (iii) the actual location of the resources inside the network provided by the controller topology manager. Given this information, it is possible to find a set of feasible embeddings of the decomposed application inside the network devices supporting the required primitives. Notably, each element of the network application is not required to be embedded in a single network device. Instead, individual primitives composing the network application can be embedded in different network devices, based on the types of primitives supported by the network devices and their corresponding amount and location inside the network. The adopted algorithm to optimize the embedding (i.e., computing the optimal number of replicas and their placement within the network) is outside the scope of our work and the problem can be solved using the scheme proposed in [23].

IV-C1 Constraints on primitives location

In the absence of co-location between primitive actions and the primitive data structures they directly operate on, state replication is mandatory. Indeed, to perform the reduction of a given set of states, the states must be locally available at the network device operating the reduction function. This requires either co-locating the states and the reduction functions, or replicating the states at the network device storing the corresponding reduction primitive.

IV-C2 Intra-application state sharing

States may be shared among different network applications. Fig. 4 shows an example of two network applications $a_1$ and $a_2$ sharing a common state $s$. With single-replica approaches, $s$ must be embedded in only one network device. As a consequence, the device storing $s$ must serve both $a_1$ and $a_2$, which may lead to scalability issues when the number of applications employing $s$ grows large. Instead, with state replication, the two applications can be made independent by replicating $s$, with one replica dedicated to each application.

IV-C3 Application reaction latency

Given an application embedding, it is possible to evaluate the corresponding reaction latency by considering the position of the primitives in the network, the propagation delays between the involved network devices and the replication delay. For a single-replica state, the replication delay is null. In the case of multiple replicas, instead, the replication delay models the latency required to commit a new value of the state to all the replicas, and will be explained in detail in Sec. V-A. Interestingly, as investigated in [23], an optimal embedding might lead to multiple replicas. Although multiple replicas imply non-null replication delays, this delay can be compensated by a much smaller application execution delay. The distributed DDoS detection application, considered later in Sec. VII, is an example of such a scenario, clearly showing the advantage of keeping multiple replicas for some network-wide applications.

IV-C4 Objective-based embeddings

The optimal embedding is chosen by minimizing a particular cost function. The definition of the cost function highly influences the way the embedding is performed, as shown in [23]. As an example, a cost function aiming at reducing the network energy consumption may lead to consolidating the application into a few network devices. On the other hand, a cost function modeling the network congestion may lead to replicating the application across multiple network devices to balance the traffic across the network.

IV-D LOADER in stateless SDN

In the case of classical stateless SDN networks, with network devices able to perform only basic forwarding/routing operations, the LOADER approach can still be adopted. Indeed, LOADER provides only an abstraction layer between the actual application and its mapping to the network devices. In LOADER the controller is seen as a network device with (almost) unlimited resources, in which the embedder can concentrate the states and the algorithm logic.

Fig. 4: Reduction function decomposition in case of two network applications sharing a state without replicated states (left) and with replicated states (right).

V Consistency among states

To provide correct functionality of the application, all the replicas of a state must be consistent. Consequently, a read operation on any replica at any given time should eventually return the same result. The CAP theorem [9] states that, for a replication scheme, only two properties out of Consistency, Availability and Partition tolerance can be guaranteed at the same time. Considering that network failures may occur, partition tolerance cannot be left out of the design of the replication algorithm, leaving us with two main reference models:


  • Strong consistency. This model privileges consistency over availability, meaning that a read operation on any non-faulty replica will return the most recent committed value (same for all replicas) or an error. This property is achieved at the cost of reduced availability due to the requirement of multiple interactions between replicas and is based on complex consensus protocols [11].

  • Eventual consistency. This model privileges availability and results in instantaneous operations on all replicas with a considerably reduced protocol complexity. Although it introduces transient inconsistency, the latter can be seen as an error in the value of a local replica.

The choice between the two models depends on how tolerant the considered network application is to temporary inconsistencies between replicas of the same state. The majority of network applications require small packet processing latencies. Indeed, excessive latencies may lead to noticeable performance degradation in the case of real-time traffic and of applications performing per-packet processing. This leads to the necessity of privileging high availability when state changes occur.

For highly mutable states, replication schemes based on strong consistency may introduce excessive latency due to the complex protocol needed to reach consensus, ultimately leading to commit delays that would preclude correct application functionality. However, the vast majority of network applications operate on statistical network measurements and remain robust even in the presence of small errors in the value of the global state, making strong consistency less essential.

V-A Replication delays and state inconsistency

LOADER does not impose any constraint on the adopted replication scheme. Depending on the requirements of the state to be replicated, the most suitable scheme must be adopted. It is generally true that replication schemes based on strong consistency are more complex and introduce a larger latency to commit a value than schemes based on eventual consistency. Thus, in the following, we assume the case of eventual consistency, for which the precise sequence of concurrent writes on the different replicas does not affect the application correctness. We support a basic gossiping scheme to propagate updates. Notably, the LOADER framework is compatible with other consistency schemes, but this requires implementing the corresponding replication protocol and a suitable reconciliation scheme. This extension is outside the scope of the current work and is left for future work.
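As a rough illustration of the eventual-consistency behavior assumed here, the gossip-based update handling can be sketched as follows (the class and method names are our own, not part of the LOADER API): a local write overwrites the replica and pushes the new value to the peer devices, while a received update simply overwrites the local copy.

class ReplicatedState:
  def __init__(self, state_id, peers):
    self.state_id = state_id
    self.value = 0
    self.peers = peers  # devices holding the other replicas of this state

  def local_write(self, value):
    self.value = value
    for peer in self.peers:  # gossip the new value to all replicas
      peer.send_update(self.state_id, value)

  def on_update(self, state_id, value):
    if state_id == self.state_id:
      self.value = value  # last-writer-wins; transient inconsistency is tolerated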

In an eventual consistency scheme, each state $s_i$ is associated with a certain replication delay $\Delta_i$, i.e., the maximum amount of time required to convey a state update to all its replicas. Note that $\Delta_i$ also corresponds to the worst-case inconsistency time. Assume now that state $s_i$ is replicated with period $d_i^R$ (i.e., the inverse of the replication frequency). Let $l_{u,v}$ be the communication latency between network devices $u$ and $v$, taking into account the propagation delay (we assume isolation of replication traffic from data traffic, thus negligible queueing delays). If $R_i$ is defined as the set of nodes storing the replicas of $s_i$, we can claim:

$\Delta_i \le d_i^R + \max_{u,v \in R_i} l_{u,v}$   (1)

The programmer is required to develop network applications keeping in mind that different state replicas may suffer from inconsistency intervals during which their values may differ. To cope with this, LOADER exposes to the programmer the possibility of defining an explicit inconsistency level for the replicated states. This is made possible by defining a level of state inconsistency inside the trigger function. The output of a network application is driven by the outcome of the trigger function, and for this reason specifying the inconsistency level at the trigger function is sufficient to determine also the overall state inconsistency of the application.

We foresee two main inconsistency metrics which can be defined by the programmer: (1) time obsolescence $T_i$ and (2) update error $E_i$. The former provides a means of defining an upper bound on the time freshness of the state replicas and guarantees that at any given time any replica will contain a value that is not older than $T_i$. The latter instead specifies the maximum admissible inconsistency in terms of uncommitted writes for any state variable, thus ensuring that the difference between all the replicated states does not exceed $E_i$ state writes. The actual choice of the adopted inconsistency metric and of the corresponding value is left to the programmer, and largely depends on the particular network application.

LOADER guarantees that the constraints specified by the programmer in terms of inconsistency metrics are satisfied. During the embedding phase, LOADER first assigns the replica positions in the network so as to minimize the maximum communication latency between any pair of replicas, i.e., to minimize the second term of (1). Now we have two cases. If a time obsolescence $T_i$ is specified, then the replication period must be set such that:

$d_i^R \le T_i - \max_{u,v \in R_i} l_{u,v}$   (2)

If instead an update error $E_i$ is specified, $d_i^R$ must be related to the rate of write operations on the state over time. To satisfy this constraint for a generic state $s_i$, it is sufficient to evaluate the maximum write rate $W_i$, i.e., the maximum number of write operations performed on $s_i$ per unit of time. Note that $W_i$ depends on the specific meaning of the considered state and should be evaluated a priori; e.g., for a packet counter at an interface it corresponds to the maximum packet arrival rate, obtained from the interface data rate and the transmission time of a minimum-size packet. Let $w_i(t)$ denote the number of writes for state $s_i$ up to time $t$, and let $e_i(t)$ denote the number of uncommitted writes (i.e., the inconsistency) at time $t$. By construction, it holds:

$e_i(t) \le w_i(t) - w_i(t - \Delta_i)$   (3)

By definition of $W_i$, we can bound (3) and obtain:

$e_i(t) \le W_i \, \Delta_i$   (4)

Imposing that (4) does not exceed $E_i$, the update error translates into an equivalent time obsolescence $E_i / W_i$; hence, based on (2), $d_i^R$ is chosen such that:

$d_i^R \le \dfrac{E_i}{W_i} - \max_{u,v \in R_i} l_{u,v}$   (5)

Note that, in the case of states which allow a definition of absolute state error based on some norm (e.g., scalars, arrays, graphs), knowing the nature of the write operations permits translating the update error into an absolute value error. Assuming that a single write operation can alter the state by at most a given amount, it is possible to rewrite $E_i$ in terms of absolute state variation and derive the temporal constraints following the same formulation above.
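As an illustrative numeric instance (the figures below are our own assumptions, not taken from the evaluation), consider a per-packet counter on a 10 Gb/s interface, with minimum frames of 84 bytes on the wire and a tolerated update error $E_i = 10^4$ writes. Then $W_i \le 10^{10} / (84 \cdot 8) \approx 1.5 \times 10^7$ writes/s, hence $E_i / W_i \approx 0.67$ ms and, from (5), $d_i^R \le 0.67\ \text{ms} - \max_{u,v \in R_i} l_{u,v}$.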

Listings 1, 2 and 3 provide examples of the definition of a trigger function in LOADER for a toy example (i.e., the summation of two states). The listings show, respectively, a trigger function with a given time obsolescence (2 ms), a trigger function with a given update error (10 writes), and a trigger function which does not tolerate any state inconsistency.

r = ReductionFunction(states=[s1, s2], operation=stateSum)
tr = TriggerFunction(s0=r.Result(), trigger=(r.Result() > 0),
          inconsistencyLevel=TimeObsolescence(2, "ms"))
Listing 1: Example of trigger function with time obsolescence equal to 2ms.
r = ReductionFunction(states=[s1, s2], operation=stateSum)
tr = TriggerFunction(s0=r.Result(), trigger=(r.Result() > 0),
          inconsistencyLevel= UpdateError(10))
Listing 2: Example of trigger function with update error equal to 10 writes.
r = ReductionFunction(states=[s1, s2], operation=stateSum)
tr = TriggerFunction(s0=r.Result(),trigger=(r.Result() > 0))
Listing 3: Example of trigger function without inconsistency (i.e. replication is not allowed).

V-B Replication traffic generation

To replicate a state, network devices generate update packets by themselves, based on the required replication period $d_i^R$. This generation is not currently supported as a fundamental primitive in off-the-shelf hardware for stateful switches since, for performance reasons, packet generation events are triggered only by packet arrivals. Depending on the actual hardware, we foresee the following solutions, which provide a way of generating new packets without any hardware modification of current off-the-shelf chipsets:

V-B1 Controller-triggered updates

The generation is triggered by the controller. In the case of periodic updates, the controller sends periodic trigger messages to the network devices, which generate the update packets upon their reception. Despite its simplicity, this approach has many limitations. First, the required control bandwidth from the controller to each switch can become significant for small update periods. Second, the controller is loaded with an additional task, impairing its scalability.

V-B2 Traffic-triggered updates

The generation is triggered directly by the reception of data packets at any interface of the network device. This permits the amount of replication traffic to self-adapt to the dynamicity of the states, whenever these depend on the arriving traffic. In terms of implementation, the update message is generated by cloning a data packet and then modifying it to carry the update value. For stateful SDN switches, we consider two possible approaches to regulate the replication traffic rate based on native internal primitives:


  • Packet period. By keeping a packet counter, a new update packet is generated every $n$ received packets, i.e., $d_i^R = n / L$, where $L$ is the minimum packet arrival rate over the whole switch. This can be used in (1) to choose $n$ and satisfy the given inconsistency metrics. Intuitively, the update rate is proportional to the arrival rate of data packets, which may suit well particular traffic-monitoring applications. On the other hand, for other applications this approach may lead to shortcomings, since in the absence of transit traffic no updates are generated.

  • Time period. An update packet is generated at the first packet arrival after a time $d_i^R$ has elapsed since the last update; the replication period is thus approximately $d_i^R$. This can be used in (1) to satisfy the given inconsistency metrics. Intuitively, this case results in periodic updates, i.e., a fixed replication rate approximately independent of the traffic (see the sketch after this list).
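The time-period variant can be summarized by the following sketch (plain Python with assumed helper names; in the P4 implementation of Sec. VI-B the same check is performed against the switch internal clock):

import time

class ReplicationTrigger:
  """Traffic-triggered update generation with (approximate) period d_R."""
  def __init__(self, d_R):
    self.d_R = d_R  # target replication period, in seconds
    self.last = time.monotonic()  # time of the last generated update

  def on_packet(self, pkt):
    now = time.monotonic()
    if now - self.last >= self.d_R:  # first packet after d_R since the last update
      self.last = now
      return True  # emit a state-update packet (e.g., by cloning pkt)
    return False  # regular forwarding, no update generated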

In terms of message format, the replication packet must carry the state identifier, the state value and the identifier of the switch originating the update. All identifiers can be predetermined by the controller at the time of application instantiation. This mechanism guarantees state uniqueness while providing flexibility in terms of state format encoding.

Finally, to properly route the replication traffic, the position of each application primitive in the network is considered. LOADER exploits the network knowledge available at the controller to install the forwarding rules for the update packets along a Steiner tree, either shared across all the states or specific to each state.
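On the controller side, the computation of such a shared distribution tree can be sketched as follows (using the Steiner-tree approximation available in networkx; the topology edge attributes and the resulting (Switch, PortList) mapping are assumptions consistent with the description above):

import networkx as nx
from networkx.algorithms.approximation import steiner_tree

def replication_tree_ports(topology, replica_switches):
  # Approximate Steiner tree spanning all switches holding a replica
  tree = steiner_tree(topology, replica_switches, weight="latency")
  ports = {}
  for u, v in tree.edges():
    # port numbers are assumed to be stored as edge attributes of the topology graph
    ports.setdefault(u, []).append(topology[u][v]["port_u"])
    ports.setdefault(v, []).append(topology[u][v]["port_v"])
  return ports  # (Switch, PortList) mapping used to install the forwarding rules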

VI LOADER implementation

To prove the feasibility of LOADER, we developed a lightweight implementation of the framework. We integrated LOADER into ONOS v1.14, while using P4 [8] and Open Packet Processor (OPP) [6] switches for the data plane. The choice of these two distinct data plane architectures aims at showing the generality of the proposed approach, which turns out to be independent of the specific type of devices adopted in the network.

VI-A Control plane implementation

LOADER has been integrated inside the ONOS controller in the form of an ONOS application with custom control logic overriding the default controller behavior.

VI-A1 Application definition

We consider a set of predefined application elements supported by the switches. This assumption permits drastically simplifying the implementation of the application definition phase inside ONOS. In particular, we specify each application element by means of predefined ad-hoc classes for each type of application element, based on the primitives supported by the switches. Thus, no interaction with the resource manager of ONOS is performed.

VI-A2 Application elements embedding

For the purpose of this work, we consider a homogeneous network composed of programmable switches having the same type and amount of resources. Since the algorithm to solve the optimal embedding problem is out of the scope of this work, we consider the following simple embedding scheme, inspired by the one proposed in [23]. The position of each replicated primitive inside the network is determined by considering the betweenness centrality of each network device, weighted by the amount of traffic flowing through it. The main idea is to privilege the devices that are traversed by most of the traffic. Furthermore, the number of replicas of each primitive is fixed a priori and not optimally chosen.
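A minimal sketch of this placement heuristic is reported below (our own rendering with networkx; routing each flow on its shortest path and the fixed replica count k follow the description above, while the data structures are assumptions):

import networkx as nx

def place_replicas(topology, flows, k):
  # flows is a list of (src, dst, rate) tuples; traffic is assumed to follow shortest paths
  load = {n: 0.0 for n in topology.nodes()}
  for src, dst, rate in flows:
    for node in nx.shortest_path(topology, src, dst):
      load[node] += rate  # traffic-weighted centrality of each device
  # the k most traversed switches host the replicated primitives
  return sorted(load, key=load.get, reverse=True)[:k]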

VI-A3 Replication traffic routing

The replication traffic between the different replicas is routed on a single Steiner tree shared across all the replicas. This permits to reduce both the amount of replication traffic and the amount of flow table entries.

VI-A4 State identification

LOADER requires a unique identifier for each state to guarantee correct processing of update packets. LOADER assigns a unique progressive identifier to each state during the application compilation phase. For replicated states, an additional identifier is assigned to distinguish between different replicas of the same state.
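The identifier assignment can be sketched as follows (the data structures and attribute names are assumptions; the two identifiers correspond to the stateID and replicaID fields of the LOADER header described in Sec. VI-B1):

def assign_identifiers(applications):
  next_state_id = 0
  for app in applications:
    for state in app.states:
      state.state_id = next_state_id  # unique progressive identifier across all states
      next_state_id += 1
      for idx, replica in enumerate(state.replicas):
        replica.replica_id = idx  # distinguishes replicas of the same state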

VI-B P4 implementation

P4 [7] is a novel data plane programming language which aims to achieve target and protocol independence as well as in-field reprogrammability, while also providing stateful operations thanks to the presence of persistent memories. Similarly to OpenFlow, P4-enabled switches exploit a reconfigurable match-action pipeline, thus permitting the definition of multiple packet processing stages. P4 is protocol-independent thanks to the presence of a programmable parser and deparser placed at the two extremes of the packet processing pipeline. Thanks to the parser programmability, it is possible to define custom protocol headers or even extend the parsing/deparsing actions to the packet payload.

To provide connectivity between ONOS and P4 switches (version 1.1), we exploited P4Runtime. At the time of this work, the P4Runtime implementation in ONOS v1.14 performs only basic flow table manipulations, without providing support for features such as runtime pipeline modification and manipulation of extern objects such as registers and counters. Due to these limitations, we implemented the required primitive data structures and the replication control logic directly in P4, instead of letting the controller push them to each switch at application creation time. However, the controller retains the possibility of activating or deactivating application elements inside a switch, which is equivalent to pushing new logic into the switches.

VI-B1 Replication traffic format

Replication traffic is transported through packets that are formatted with a custom header carried by Ethernet packets, identified by an unused protocol type (LOADER_ETHTYPE) in the Ethernet header. We leverage P4 to define custom packet formats and we implemented LOADER header format directly inside the programmable parser.

Listing 4 shows the full header format of LOADER packets. As previously mentioned, all identifiers are assigned by the controller during application initialization. The fields srcSwID, stateID and replicaID are, respectively, the source switch, state and replica identifiers; they are required to correctly interpret and process the update packets at the destination switches. The dstSwID field, on the other hand, permits implementing more sophisticated replication schemes instead of the one we employ, based on shared spanning trees; in our experiments we implemented a broadcast transmission among all switches holding the replicas, and for this reason the dstSwID field remained unused. The stateValue field carries the actual value of the replicated state, and its length is upper-bounded by a constant number of bits, i.e., STATE_MAX_WIDTH. Finally, the L3ProtocolType field permits attaching LOADER packets to transit packets, i.e., piggybacking replication information on data traffic.

We generate nested LOADER headers to carry multiple state updates in a single packet. This functionality is depicted in Listing 5, which shows the implementation of the LOADER protocol parser. Although in this work we opted to define a custom LOADER header, replication traffic transport can also be implemented by employing the Inband Network Telemetry (INT) format [12] defined by the P4 Language Consortium.

header LOADER_t {
  bit<32> srcSwID;
  bit<32> dstSwID;
  bit<32> stateID;
  bit<32> replicaID;
  bit<STATE_MAX_WIDTH> stateValue;
  bit<16> L3ProtocolType;
}
Listing 4: LOADER header definition in P4

state parse_LOADER {
  packet.extract(hdr.LOADER);
  transition select(hdr.LOADER.L3ProtocolType){
    LOADER_ETHTYPE : parse_LOADER;
    IP_ETHTYPE : parse_IP;
    default : accept;
  }
}
Listing 5: LOADER parser implementation in P4

VI-B2 Generation of periodic update packets

Commercial implementations of stateful switches generally do not support the generation of self-triggered events, precluding the possibility of employing periodic updates. However, in conformity with their purpose, switches are able to execute routines upon packet reception and departure. Such routines may range from simple packet processing up to more complicated user-defined routines in programmable switches. This behavior can be exploited to provide a simple mechanism that approximates periodic traffic generation without hardware modifications.

We exploit traffic-triggered updates, as described in Sec. V-B2, in which the temporal periodicity $d_i^R$ is obtained as follows. During the execution of a replication routine, the current timestamp $t_\text{clk}$ is saved as $t'$. For each subsequent incoming packet, the internal clock $t_\text{clk}$ is compared against the expected execution time of the routine, i.e., against $t' + d_i^R$. If $t_\text{clk} \ge t' + d_i^R$, a new replication routine is executed, generating an update packet, and $t'$ is updated. Consequently, the first packet arriving after a time $d_i^R$ triggers the generation of the update packet.

The replication routine generates and transmits a state-update packet filled with the state-related information. To generate these packets we employ the packet-cloning extern provided by the P4 v1 model [p4-repo]. Once the update has been triggered by an arriving packet, such packet is cloned to the egress port that has been assigned to it by its prior processing. Subsequently, the original packet undergoes a transformation which substitutes its original header with the LOADER header, filled with all the information related to the state that needs to be updated, while the payload of the triggering packet is dropped. Following this operation, the newly created LOADER packet is transferred to the corresponding output queue without undergoing further processing. Since the triggering packet needs to be fully processed at the time of cloning, this functionality, illustrated in Listing 6, resides at the very end of the ingress processing pipeline. In this way the replication traffic generation routine does not impact the transit packets in any way.

if( meta.LOADER_meta.state == UPDATE_NEEDED ){
  clone_pkt_to_egress(sm.egress_spec);
  fillLOADERHeaderTable.apply(meta.LOADER_meta.state_id);
  set_state_update_time(meta.LOADER_meta.state_id);
}
Listing 6: Generation of replication packet in P4

VI-B3 Replication traffic routing

The generated replication packets are transmitted on one or more egress ports following a Steiner tree shared among all replicas. The distribution tree consists of a mapping (Switch, PortList) which assigns to each switch of the Steiner tree the set of ports connected to the corresponding links. All newly generated or transit LOADER packets match against a specific match-action table which sends a copy of the packet on each port specified in PortList. To avoid loops for transit LOADER packets, at the egress stage the original ingress port of each packet is compared against the current egress port: if the two ports are the same, the packet is dropped. This mechanism keeps the number of flow entries related to LOADER routing as low as one entry per state per switch. Both the P4 switch and the LOADER framework implementations are publicly available in the LOADER repository [loader-repo].

VII Distributed detection of DDoS attacks

As a proof of concept, we developed with LOADER a simple yet significant application for the distributed detection of Distributed Denial of Service (DDoS) attacks, denoted as DDoSD. The main idea of the distributed detection is to exploit the typical temporal correlation of the traffic increase across all the network devices at the border of the network, due to the distributed nature of the attack. Clearly, a correlated traffic increase across the edge routers is a much more reliable way to detect an attack than monitoring the traffic on a single network device only. Consequently, a network application performing DDoSD must be able to capture this sudden increase in the network traffic.

With traditional SDN approaches, the controller is involved in the detection process by being notified about the transit packets by switches. This leads to large overhead in terms of traffic and of detection latency. Instead, LOADER enables a distributed detection process operating directly at the switches, without any controller involvement. Furthermore, the actions to counter the attack are executed in a distributed way, by each network device involved in the detection.

As shown in Fig. 5, we consider a large network (e.g., an Autonomous System - AS) connected to other networks (e.g., other ASs) through different edge routers and the attack targets a set of internal servers. Since the definition of a realistic DDoSD algorithm is a well-known problem in the literature [27] and it is completely out of the scope of this work, we employ a simple proof-of-concept threshold-based detection scheme, which can be used as a foundation for more sophisticated DDoSD mechanisms.

VII-A Network application definition

The total traffic entering the whole network and directed toward the targeted servers is defined as the sum of the inbound traffic over each edge router (SW1-SW4 in our reference topology). Based on the value of the inbound traffic the network application must perform some retaliation to counteract the DDoS attack. Consequently it is straightforward to map this kind of application to a LOADER application as described in the following.

VII-A1 States

Given $N$ edge routers, we define $s_i$ as the average rate of inbound traffic traversing the border router $i$, with $i = 1, \ldots, N$. As monitoring target, we employ the rate of incoming SYN packets directed towards the internal servers.

VII-A2 Reduction function

The reduction function employed by the application is composed of a single primitive action, namely the sum. Consequently, the output of the reduction function is defined as $R = \sum_{i=1}^{N} s_i$, i.e., the total rate of inbound SYN packets entering the network.

VII-A3 Trigger function

Following the previous discussion, we define the trigger function as a simple comparison of the reduced state against a predefined threshold: a DDoS attack is considered as detected, locally at each switch, whenever the total inbound SYN rate exceeds the threshold. The threshold is determined with standard test-based statistical methods.

VII-A4 Activity function

We employ a simple activity function which notifies the controller once the application has been triggered.

Listing 7 shows how such an application is described in LOADER.

from Controller import TopologyManager
from LOADER.PrimitiveActions import StateSum, Rate
from LOADER.Scope import Pkt
def extPortFilter(devices):
  extPorts = []
  for d in devices:
    extPorts += [p for p in d.getPorts() if p.Type==EXTERNAL]
  return (Pkt.ingressPort in extPorts) and (Pkt.TCP.Flag.SYN == 1)
R = 1000 # DDoS threshold in SYNs / s
# List of all edge routers
devices = TopologyManager.getEdgeRouters()
applicationStates = []
# Iterate over all edge routers
for d in devices:
  # Create a state for each edge router
  s = State(target=d,
      scope=Rate(filter=Pkt(filter=extPortFilter([d]))))
  applicationStates.append(s)
# Define the reduction function as the sum of all application states
r = ReductionFunction(states=applicationStates,
    operation=StateSum)
# Define the activity function notifying the controller
a = ActivityFunction(target=devices,
    scope=Pkt(filter=extPortFilter(devices)),
    action=Controller.Notify("DDoS detected"))
# Define the trigger function comparing the total SYN rate against the threshold
tr = TriggerFunction(s0=r.Result(),
      trigger=r.Result()>R,
      inconsistencyLevel=TimeObsolescence(0.2, "ms"),
      activity=a)
Listing 7: DDoS detection with LOADER

VII-B Benefit of replicated states

In a single-replica approach (i.e., in the absence of LOADER), the DDoSD application would require all the traffic entering the network to traverse the single switch holding the state that monitors the incoming traffic. Thus the network congestion would grow, and the resulting routing might not be compatible with traffic management schemes (e.g., load balancing) that require arbitrary control of the routing within the network.

LOADER instead permits to replicate the entire DDoSD application over multiple switches, thus minimizing the data overhead over the whole network. At the same time, LOADER introduces an overhead in terms of replication traffic, whose amount depends on the allowed inconsistency level. The replication traffic will be evaluated experimentally for the DDoSD application in Sec. VII-D.

Notably, DDoSD is robust to possible transient inconsistencies between the values of total traffic estimated at each switch, thus employing an eventual consistency replication scheme will not create noticeable degradation due to replicated states estimation errors.

Vii-C Implementation

The considered DDoSD scheme has been implemented on top of two different programmable data plane platforms: (1) P4; (2) OPP. Furthermore, the DDoSD application was defined inside ONOS with the LOADER abstraction, which permits automatically offloading and configuring the developed network application.

Vii-C1 Control plane implementation

We implemented the basic LOADER functionalities related to this particular use case inside ONOS. To provide support for this application, we considered a simple embedding algorithm. The algorithm receives as input the network topology, the set of flows to monitor (defined as source-destination pairs), and the maximum number $K$ of admissible state replicas. It then assigns the position of each application state by selecting the $K$ nodes with the highest betweenness centrality. We assume a sufficiently large amount of resources inside the switches, thus permitting function co-location with consequent replication of all application elements. The distribution of updates among the chosen nodes is performed through a shared spanning tree, whose routing is set up during the application initialization.
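To make the embedding step concrete, the following Python sketch illustrates one possible realization of the placement logic described above using networkx; the function name pickReplicaNodes, the parameter K, and the use of a Steiner tree as the shared update-distribution tree are our own illustrative assumptions, not part of the actual ONOS/LOADER code.

import networkx as nx
from networkx.algorithms.approximation import steiner_tree

def pickReplicaNodes(topology, K):
  # Rank nodes by betweenness centrality and keep the K highest ones
  centrality = nx.betweenness_centrality(topology)
  replicas = sorted(centrality, key=centrality.get, reverse=True)[:K]
  # Shared tree connecting the chosen nodes, used to distribute state updates
  # (approximated here with a Steiner tree over the physical topology)
  updateTree = steiner_tree(topology, replicas)
  return replicas, updateTree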

Vii-C2 Data plane implementation with P4

Our prototype is developed and tested in a virtual environment using Mininet [15] and P4-enabled virtual switches based on the v1model architecture and the simple switch target [19].

We estimate the rate of incoming TCP SYN packets by employing a sampling window of duration $T$. Let $r_k$ be the estimated rate in the $k$-th time interval $[kT, (k+1)T)$. The average rate is estimated at each switch as

$\hat{r}_k = \frac{1}{W} \sum_{j=k-W+1}^{k} r_j$   (6)

and represents the local state to be shared across all the other border routers, coherently with the description of Sec. VII-A. In particular, the number of samples $W$ is chosen as a power of 2 due to the hardware limits that P4 switches impose on the supported operations, i.e., shift operations are available whereas divisions are not [22]. The $W$ most recent samples of the estimated rate are stored in a circular buffer, whereas the replicated states are saved in dedicated registers.
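As an illustration of why a power-of-2 window matters, the following Python sketch mimics the register-based computation performed by the P4 pipeline: the last W per-window SYN counts are kept in a circular buffer and averaged with a right shift instead of a division. The variable names and the value of W are illustrative assumptions.

LOG2_W = 3                  # W = 8 samples, so the division becomes ">> 3"
W = 1 << LOG2_W
buf = [0] * W               # circular buffer of the last W per-window counts
head = 0
runningSum = 0

def newSample(synCount):
  # Replace the oldest sample, update the running sum, return the average
  global head, runningSum
  runningSum += synCount - buf[head]
  buf[head] = synCount
  head = (head + 1) % W
  return runningSum >> LOG2_W   # average computed with a shift, not a division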

Vii-C3 Data plane implementation with OPP

The OPP implementation requires a sequence of three stages: stage 0 extracts the state from update messages; stage 1 stores the state received through the metadata passed by the previous table, performs monitoring and detection, and generates update messages; stage 2 performs simple L3 forwarding. Stage 0 represents the stateful processing core for the replicated states. The processed flows are identified by the IPv4 destination addresses of the target servers. Stage 0 also keeps one flow data variable containing the switch-local state and the variables storing the replicated states. The switch-local state is computed by a hardware-implemented Exponentially Weighted Moving Average (EWMA) counting the number of TCP SYN packets in a preconfigured time window.
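For completeness, a shift-friendly EWMA of the per-window SYN count can be sketched in Python as follows; the weight alpha = 1/2**K and the variable names are illustrative assumptions, since the exact update rule of the OPP hardware block is not detailed here.

K = 2          # smoothing factor alpha = 1/2**K = 0.25
ewma = 0

def updateEwma(synCountInWindow):
  # ewma += alpha * (sample - ewma), with the multiplication reduced to a shift
  global ewma
  ewma += (synCountInWindow - ewma) >> K
  return ewma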

Vii-D Experimental evaluation and validation

We configure a Mininet-based emulation environment deploying the topology shown in Fig. 5, where, for the sake of simplicity, each cluster and each AS is represented by a Mininet host. To emulate the DDoS attack, we use the hping3 tool to send TCP SYN requests from all ASs to all internal servers. In each experiment, during the first 20 seconds we send the requests at a slow rate, and then we increase the rate of all senders in such a way to trigger the execution of the activity function. We consider experiments with a varying number of replicas: (i) a single replica embedded in SW1, (ii) 2 replicas embedded in SW1 and SW3, and (iii) 4 replicas embedded in SW1, SW2, SW3 and SW4. We repeated the experiments to achieve negligible 95% confidence intervals, when shown in the plots.

Fig. 5: Reference topology for DDoS Detection use case.
Fig. 6: Temporal evolution of the local, remote and global states for the stateful switches in case of 2 replicas for the global state in P4 implementation.

Fig. 6 shows the evolution of the application states alongside the evolution of the global state for the case of 2 replicas, implemented in P4. Identical results are obtained with OPP and thus are not reported for the sake of space. As expected, the values of the global state evaluated at SW1 and SW3 are coherent, and permit a simultaneous detection of the DDoS attack in the two switches, without any interaction with the controller. This experimental result validates our proposed implementation for both P4 and OPP.

In Figs. 7-8 we show the utilization of the links of the ring topology connecting all switches, for different numbers of replicas, with both the P4 and the OPP implementations. Clearly, with one replica (i.e., the single-replica approach) the load is greatly unbalanced and in general higher on all links. By increasing the number of replicas to 2, the data traffic decreases by a factor of 1.6 both in P4 and OPP and is much better balanced across the links. The slightly different values depend on the different mechanisms adopted for triggering the update event by the incoming traffic: in P4 the update rate depends on the traffic, whereas in OPP it is independent of it. Adding two more replicas reduces the data traffic by around 20% in both implementations, but the replication traffic now becomes more relevant due to the higher number of replicas. Indeed, the fraction of update packets increases from 14% (for 2 replicas) to 24% (for 4 replicas) in P4 and from 11% (for 2 replicas) to 23% (for 4 replicas) in OPP. Thus, the two implementations behave very similarly and show the beneficial effect of multiple replicas on the overall traffic in the network.

Fig. 7: Link occupancy for data and replication traffic for a varying number of replicas of the global state (P4 implementation).
Fig. 8: Link occupancy for data and replication traffic for a varying number of replicas of the global state (OPP implementation).

Viii Other applications enabled by LOADER

In this section we describe some examples of network applications that benefit from state replication. We show how those applications can be implemented with LOADER by providing, for each of them, the mapping of its elements and a code example.

Viii-a Distributed rate limiting

In [21] the authors propose a network-wide global token bucket. Similarly to a local token bucket, a global one allows rate limiting all the traffic entering a given network, thanks to a network application performing probabilistic dropping at the edge routers of the network. However, differently from a local one, a global token bucket involves an instance of the same token bucket running independently at each border router and using a single shared state that accounts for the total inbound traffic.
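For reference, a minimal (purely local) token-bucket limiter is sketched below in Python; in the distributed variant of [21], each edge router runs such a limiter, but the drop decision is driven by the shared state accounting for the total inbound traffic rather than by the locally observed rate. Class and parameter names are our own illustrative choices.

import time

class TokenBucket:
  def __init__(self, rateBps, burstBits):
    self.rate = rateBps            # token refill rate (bits/s)
    self.capacity = burstBits      # bucket depth (bits)
    self.tokens = burstBits
    self.last = time.monotonic()

  def allow(self, pktSizeBits):
    # Refill tokens according to the elapsed time, then try to consume
    now = time.monotonic()
    self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
    self.last = now
    if self.tokens >= pktSizeBits:
      self.tokens -= pktSizeBits
      return True
    return False                   # no tokens left: drop the packet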

This kind of application can be easily mapped to LOADER by considering the DDoSD scheme and by changing only the trigger and the activity functions as follows:

Viii-A1 States

Given $N$ edge routers, we define state $s_i$ as the average rate of inbound traffic traversing edge router $i$, with $i \in \{1, \ldots, N\}$.

Viii-A2 Reduction function

The reduction function performs a sum over all local states $s_i$, with $i \in \{1, \ldots, N\}$, yielding the aggregate inbound rate $S = \sum_{i=1}^{N} s_i$.

Viii-A3 Trigger function

In order to perform probabilistic dropping, the trigger function must invoke the activity function with a probability that depends on the aggregate incoming traffic rate and on the desired rate, i.e., with probability $(S-R)/S$ whenever $S$ exceeds the desired rate $R$.
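In plain Python, the dropping decision corresponding to the trigger condition of Listing 6 can be sketched as follows, where S is the reduced aggregate inbound rate and R the desired rate:

import random

def shouldDrop(S, R):
  # Drop with probability (S - R)/S when the aggregate rate S exceeds R,
  # which on average limits the admitted traffic to R
  if S <= R:
    return False
  return random.random() < (S - R) / S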

Viii-A4 Activity function

The activity function performs dropping of the incoming packets whenever invoked.

The code for this network application, with the mapping of each individual element, is shown in Listing 6.

from Controller import TopologyManager
from LOADER.PrimitiveActions import Drop, StateSum, Rate
from LOADER.Scope import Pkt
def extPortFilter(devices):
  extPorts = []
  for d in devices:
    extPorts += [p for p in d.getPorts() if p.Type==EXTERNAL]
  return Pkt.ingressPort in extPorts
R = 100 * 10**6 # Desired rate in bps (100 Mbps)
# List of all edge routers
devices = TopologyManager.getEdgeRouters()
applicationStates = []
# Iterate over all edge routers
for d in devices:
  # Create a state for each edge router
  s = State(target=d,
      scope=Rate(filter=Pkt(filter=extPortFilter([d]))))
  applicationStates.append(s)
# Define the reduction function as the sum of application states
r = ReductionFunction(states=applicationStates,
      operation=StateSum)
# Define the activity function to drop all incoming packets
a = ActivityFunction(target=devices,
      scope=Pkt(filter=extPortFilter(devices)),
      action=Drop)
# Define trigger function to perform probabilistic dropping
tr = TriggerFunction(s0=r.Result(),
      trigger=(rand()<(r.Result()-R)/r.Result()),
      inconsistencyLevel=UpdateError(10),
      activity = a)
Listing 6: Distributed rate limiting with LOADER

In Fig. 9 we show an example of the distributed rate limiting application in action. We create two flows: Flow 1 from AS 1 directed towards server cluster 1 and Flow 2 from AS 3 directed towards server cluster 3. We consider shortest-path routing and place the state replicas in SW1 and SW3. Flow 1 starts at time 0 with a rate of 5 Mbps, while Flow 2 starts with an offset of 20 s and with the same rate. Although the flows do not cross each other at any point in the network, when Flow 2 starts both of them are rate limited so that their aggregate respects the predefined 8 Mbps threshold. Note that the oscillations in the throughput are due to the adopted probabilistic dropping scheme.

Fig. 9: Distributed rate limiter with two flows at different edges of the network.

Viii-B Link-aware load balancing

In [1] the authors propose a load balancing scheme for data center networks, based on the congestion level of the individual links from the source leaf switch to the destination leaf switch. Source leaf switches keep track of the local uplink congestion and of the downlink congestion from each spine switch to the destination leaf. When a new flow starts, the source leaf switch selects the path that minimizes the maximum congestion along the whole path, i.e., over the local uplink and the corresponding downlink at the spine.

For the sake of simplicity, we present a reduced version of the application, with some details omitted, assuming that the application targets a single leaf switch connected to $M$ spine switches. The application can easily be extended to many leaf switches by instantiating multiple instances of the same application, in which case the states related to downlink congestion must be shared across multiple leaf switches.

This network application can be mapped to LOADER as follows:

Viii-B1 States

Given a leaf switch, we define state $u_i$ as the average load on the uplink port toward spine switch $i$, with $i \in \{1, \ldots, M\}$. Additionally, we define state $d_{ij}$ as the average downlink load on the port of spine switch $i$ leading to destination leaf switch $j$, with $i \in \{1, \ldots, M\}$.

Viii-B2 Reduction function

The reduction function is composed of two primitive actions, namely min and max. Consequently, for a flow directed toward destination leaf switch $j$, the reduced version of the states is obtained as $S_j = \min_{i} \max(u_i, d_{ij})$, i.e., the congestion of the least congested path toward $j$.
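As a plain-Python illustration of this min-max selection (outside the LOADER DSL), the snippet below returns the spine whose path toward the destination leaf has the least congested bottleneck; the numeric loads are made-up example values.

def bestSpine(ulCong, dlCong, dstLeaf):
  # Pick the spine i minimizing max(ulCong[i], dlCong[i][dstLeaf])
  costs = [max(u, d[dstLeaf]) for u, d in zip(ulCong, dlCong)]
  return min(range(len(costs)), key=costs.__getitem__)

ulCong = [0.2, 0.7]                  # uplink load toward spine 0 and spine 1
dlCong = [[0.9, 0.3], [0.1, 0.4]]    # dlCong[spine][destination leaf]
print(bestSpine(ulCong, dlCong, dstLeaf=1))   # -> 0 (bottleneck 0.3 vs 0.7)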

Viii-B3 Trigger function

Differently from the previous use cases, the trigger function in this network application triggers the activity function each time a new value of the reduced state is obtained, without requiring any additional check.

Viii-B4 Activity function

The activity function simply inserts a new per-flow forwarding rule for each new flow, based on the outcome of the reduction function.

The code for this network application, with the mapping of each individual element, is shown in Listing 7.

from Controller import TopologyManager
from LOADER.PrimitiveActions import SetEgress, Rate, min, max
from LOADER.Scope import Pkt
# Filter for downlink ports (i.e. from spine to leaf)
def dlPortFilter(device):
  return Pkt.getEgressPort() in [p for p in device.getPorts() if p.Type == DOWNLINK]
# Filter for uplink ports (i.e. from leaf to spine)
def ulPortFilter(device):
  return Pkt.getEgressPort() in [p for p in device.getPorts() if p.Type == UPLINK]
# Reduction function for minimum path congestion
def minMaxCong(ulCong, dlCong):
  dstLeaf = TopologyManager.getSpineID(Pkt.getDst())
  return min([max(ulCong[i], dlCong[i][dstLeaf]) for i in range(len(TopologyManager.getSpines()))])
l = TopologyManager.getLeafSwitches()[0]
spines = TopologyManager.getSpines()
dlCong = []
ulCong = []
for p in l.getPorts(filter = ulPortFilter):
  s = State(target=l, scope=Rate(filter = Port(p)))
  ulCong.append(s)
for sp in spines:
  spineLoad = []
  for p in sp.getPorts(filter = dlPortFilter):
    s = State(target=sp, scope=Rate(filter = p))
    spineLoad.append(s)
  dlCong.append(spineLoad)
r = ReductionFunction(states=[ulCong, dlCong],
      operation=minMaxCong)
a = ActivityFunction(
      target = l,
      scope = Pkt(filter = (Pkt.TCP.Flag.SYN == 1)),
      action = insertRule(
          match = Pkt.getTuple(),
          action = SetEgress,
          args = r.Result()))
tr = TriggerFunction(
      s0=r.Result(),
      inconsistencyLevel=UpdateError(10),
      activity = a)
Listing 7: Link-aware load balancing with LOADER

Viii-C Resource-aware load balancing

A resource-aware load balancing application has been introduced in Sec. IV-A. The application balances the user requests among the available servers based on the amount of available resources (e.g., average CPU utilization) at each server. As we already defined the mapping of its elements, in the following we present solely the code of the application. For simplicity we do not define the states, as they are not directly related to the network conditions but rather to the servers' status. The code for this network application, with the mapping of each individual element, is shown in Listing 8.

from Controller import TopologyManager
from LOADER.PrimitiveActions import SetEgress, Rate, min, mean
from LOADER.Scope import Pkt, ExtScopeHelper
THR = 0.8  # threshold CPU load percentage
# Get the average CPU load of servers in the form of a list of states. We omit the details.
loads = ExtScopeHelper(scope="ServerLoad")
r1 = ReductionFunction(
      states = [loads],
      operation=min([i.Value() for i in loads]))
r2 = ReductionFunction(
      states = [loads],
      operation=mean([i.Value() for i in loads]))
a1 = ActivityFunction(
      scope = Pkt(filter = (Pkt.TCP.Flag.SYN == 1)),
      action = SetEgress,
      args = r1.Result())
a2 = ActivityFunction(
      scope = Pkt(filter = (Pkt.TCP.Flag.SYN == 1)),
      action = SetEgress,
      args = CONTROLLER_PORT)
tr1 = TriggerFunction(
      s0=r2.Result(),
      trigger=(r2.Result() <= THR),
      inconsistencyLevel=UpdateError(15),
      activity=a1)
tr2 = TriggerFunction(
      s0=r2.Result(),
      trigger=(r2.Result() > THR),
      inconsistencyLevel=UpdateError(15),
      activity=a2)
Listing 8: Resource-aware load balancing with LOADER.

Ix Conclusions

We propose a novel framework, namely LOADER, to address the limitation of stateful data planes when the definition of network applications requires non-local states at the switches. LOADER enables stateful switches to take decisions based on information that is not locally available. This is achieved by introducing a state replication mechanism among the switches. We discuss the main practical design challenges to support state replication, and we validate its implementation using both P4 and OPP stateful data planes.

Furthermore, we provide a high-level programming abstraction for the development of distributed network applications based on replicated states. Our programming model offers the expressiveness of a high-level programming language while remaining aware of the underlying hardware architecture of programmable switches. Thus, it is easy to understand for the programmer and provides a comprehensible abstraction for the embedding of network applications.

By combining the proposed abstraction model with the implementation of the replication mechanism, LOADER effectively supports distributed network-wide applications without involving any central entity. As our results show, distributed network applications can be beneficial for network performance and can be efficiently implemented in high-performance programmable stateful switches.

References

  • [1] M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu, A. Fingerhut, F. Matus, R. Pan, N. Yadav, G. Varghese, et al. (2014) CONGA: distributed congestion-aware load balancing for datacenters. In ACM SIGCOMM Computer Communication Review, Vol. 44, pp. 503–514. Cited by: §VIII-B.
  • [2] M. T. Arashloo, Y. Koral, M. Greenberg, J. Rexford, and D. Walker (2016) SNAP: stateful network-wide abstractions for packet processing. In ACM SIGCOMM, External Links: ISBN 978-1-4503-4193-6 Cited by: §I, §II, 1st item.
  • [3] R. Beckett, M. Greenberg, and D. Walker (2016) Temporal netkat. ACM SIGPLAN Notices 51 (6), pp. 386–401. Cited by: §II.
  • [4] P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O’Connor, P. Radoslavov, W. Snow, et al. (2014) ONOS: towards an open, distributed SDN OS. In ACM SIGCOMM HotNets, pp. 1–6. Cited by: §I.
  • [5] G. Bianchi, M. Bonola, A. Capone, and C. Cascone (2014-04) OpenState: programming platform-independent stateful Openflow applications inside the switch. ACM SIGCOMM CCR. External Links: ISSN 0146-4833 Cited by: §I.
  • [6] G. Bianchi, M. Bonola, S. Pontarelli, D. Sanvito, A. Capone, and C. Cascone (2016) Open Packet Processor: a programmable architecture for wire speed platform-independent stateful in-network processing. arXiv:1605.01977. Cited by: §I, §I, §VI.
  • [7] P. Bosshart et al. (2013) Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN. In ACM SIGCOMM CCR, Cited by: §VI-B.
  • [8] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, et al. (2014) P4: programming protocol-independent packet processors. ACM SIGCOMM Computer Communication Review 44 (3), pp. 87–95. Cited by: §I, §I, §VI.
  • [9] E. Brewer (2012-02) CAP twelve years later: how the “rules” have changed. Computer. External Links: ISSN 0018-9162 Cited by: §V.
  • [10] H. T. Dang, D. Sciascia, M. Canini, F. Pedone, and R. Soulé (2015) NetPaxos: consensus at network speed. In ACM SIGCOMM SOSR, External Links: ISBN 978-1-4503-3451-8 Cited by: §II.
  • [11] H. Howard and R. Mortier (2019) A generalised solution to distributed consensus. External Links: 1902.06776, Link Cited by: 1st item.
  • [12] C. Kim, A. Sivaraman, N. Katta, A. Bas, A. Dixit, and L. J. Wobker (2015) In-band network telemetry via programmable dataplanes. In ACM SIGCOMM, Cited by: §VI-B1.
  • [13] H. Kim, J. Reich, A. Gupta, M. Shahbaz, N. Feamster, and R. Clark (2015) Kinetic: verifiable dynamic network control. In USENIX NSDI 15, pp. 59–72. Cited by: §II.
  • [14] L. Lamport et al. (2001) Paxos made simple. ACM Sigact News. Cited by: §II.
  • [15] B. Lantz, B. Heller, and N. McKeown (2010) A network in a laptop: rapid prototyping for software-defined networks. In ACM SIGCOMM HotNets, pp. 19. Cited by: §VII-C2.
  • [16] S. Luo, H. Yu, and L. Vanbever (2017) Swing State: consistent updates for stateful and programmable data planes. In ACM SOSR, Cited by: §II.
  • [17] J. McClurg, H. Hojjat, N. Foster, and P. Černỳ (2016) Event-driven network programming. In ACM SIGPLAN Notices, Vol. 51, pp. 369–385. Cited by: §II.
  • [18] M. T. Özsu and P. Valduriez (2011) Principles of distributed database systems. Springer Science & Business Media. Cited by: §II.
  • [19] P4 language repository. External Links: Link Cited by: §VII-C2.
  • [20] S. Pontarelli, R. Bifulco, M. Bonola, C. Cascone, M. Spaziani, V. Bruschi, D. Sanvito, G. Siracusano, A. Capone, M. Honda, and F. Huici (2019) FlowBlaze: stateful packet processing in hardware. In USENIX NSDI 19, pp. 531–548. Cited by: §I.
  • [21] B. Raghavan, K. Vishwanath, S. Ramabhadran, K. Yocum, and A. C. Snoeren (2007) Cloud control with distributed rate limiting. In ACM SIGCOMM Computer Communication Review, Vol. 37, pp. 337–348. Cited by: §VIII-A.
  • [22] N. K. Sharma, A. Kaufmann, T. Anderson, A. Krishnamurthy, J. Nelson, and S. Peter (2017) Evaluating the power of flexible packet processing for network resource allocation. In USENIX NSDI, Cited by: §VII-C2.
  • [23] A. Siddique Muqaddas, G. Sviridov, P. Giaccone, and A. Bianco (2019) Optimal state replication in stateful data planes. arXiv:1912.03025. Cited by: §II, §IV-C3, §IV-C4, §IV-C, §VI-A2.
  • [24] G. Sviridov, M. Bonola, A. Tulumello, P. Giaccone, A. Bianco, and G. Bianchi (2018) LODGE: LOcal Decisions on Global statEs in programmable data planes. In IEEE NetSoft, pp. 257–261. Cited by: §II.
  • [25] S. H. Yeganeh, A. Tootoonchian, and Y. Ganjali (2013) On scalability of software-defined networking. IEEE Communications Magazine 51 (2), pp. 136–141. Cited by: §I.
  • [26] Y. Yuan, R. Alur, and B. T. Loo (2014) NetEgg: programming network policies by examples. In ACM SIGCOMM HotNets, pp. 20. Cited by: §II.
  • [27] S. T. Zargar, J. Joshi, and D. Tipper (2013) A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks. IEEE communications surveys & tutorials 15 (4), pp. 2046–2069. Cited by: §VII.
  • [28] A. Zeineddine and W. El-Hajj (2018) Stateful distributed firewall as a service in SDN. In IEEE NetSoft, pp. 212–216. Cited by: §II.

German Sviridov received his BSc in Computer Engineering and MSc in Telecommunication Engineering, both from Politecnico di Torino, Italy. In late 2017 he joined the Telecommunication Networks Group (TNG) at the Dipartimento di Elettronica e Telecomunicazioni of Politecnico di Torino as a Ph.D. student. His current research interests involve programmable data planes for SDN and scheduling mechanisms for data center networks.

Marco Bonola received the Ph.D. degrees in telecommunications engineering from the University of Rome Tor Vergata in 2007 and is currently a senior researcher at CNIT (Consorzio Nazionale Interuniversitario per le Telecomunicazioni). He is also a contract professor of Network Labs and Enterprise Networks at the University of Rome Tor Vergata. He has participated to several EU research projects and coordinated the technical aspects of the H2020 project BEBA.

Angelo Tulumello is an undergraduate student (with bachelor degree) at University of Rome Tor Vergata, working on stateful dataplanes and in-switch telemetry. He won the third place at the SIGCOMM Student Research Competition with the work “A Fully Portable TCP Implementation Using XFSMs”. He has participated to the H2020 SUPERFLUIDITY project. He is the principal maintainer of the DPDK based FlowBlaze SW implementation.

Paolo Giaccone received the Dr.Ing. and Ph.D. degrees in telecommunications engineering from the Politecnico di Torino, Italy, in 1998 and 2001, respectively. He is currently an Associate Professor in the Department of Electronics, Politecnico di Torino. During 2000-2001 and in 2002 he was with the Information Systems Networking Lab, Electrical Engineering Dept., Stanford University, Stanford, CA. His main area of interest is the design of network control and optimization algorithms.

Andrea Bianco is Full Professor and Department Head of the Dipartimento di Elettronica e Telecomunicazioni of Politecnico di Torino, Italy. He has co-authored over 200 papers published in international journals and presented in leading international conferences in the area of telecommunication networks. His current research interests are in the fields of protocols and architectures of all-optical networks, switch architectures for high-speed networks, SDN networks and software routers.

Giuseppe Bianchi is Full Professor of Networking and Network Security at the School of Engineering of the University of Roma Tor Vergata since 2007. He has carried out pioneering research work on WLAN performance modeling, and is currently interested in network programmability, privacy and security, and performance evaluation. He has chaired more than 10 international conferences (e.g., IEEE Infocom, ACM CoNext, ITC, WoWMoM, LANMAN, etc.), and has coordinated to date six large scale European Projects.

The total traffic entering the whole network and directed toward the targeted servers is defined as the sum of the inbound traffic over each edge router (SW1-SW4 in our reference topology). Based on the value of the inbound traffic the network application must perform some retaliation to counteract the DDoS attack. Consequently it is straightforward to map this kind of application to a LOADER application as described in the following.

Vii-A1 States

Given edge routers, we define as the average rate of inbound traffic traversing the border router , with . As monitoring target, we employ the rate of incoming SYN packets directed towards the internal servers.

Vii-A2 Reduction function

The reduction function employed by the application is composed of a single primitive action, namely . Consequently, the output of the reduction function is defined as .

Vii-A3 Trigger function

Following the previous discussion, we define the threshold function simply as a simple comparison of against a predefined threshold. Thus, a DDoS attack is detected locally at each switch if is larger than a given threshold, above which the attack is considered as detected. The threshold is determined with standard test-based statistical methods.

Vii-A4 Activity function

We employ a simple activity function which notifies the controller once the application has been triggered.

Listing LABEL:code:ddos shows how such application is described in LOADER.

from Controller import TopologyManager
from LOADER.PrimitiveActions import Drop, StateSum, Rate
from LOADER.Scope import Pkt
def extPortFilter(devices):
  extPorts = []
  for d in devices:
    extPorts += [p for p in d.getPorts() if p.Type==EXTERNAL]
  return (Pkt.ingressPort in extPorts) and (Pkt.TCP.Flag.SYN == 1)
R = 1000 # DDoS threshold in SYNs / s
# List of all edge routers
devices = TopologyManager.getEdgeRouters()
applicationStates = []
# Iterate over all edge routers
for i in range(devices):
  # Create a state for each edge router
  s = State(target=d,
      scope=Rate(filter=Pkt(filter = extPortFilter([d]))))
  applicationStates.append(s)
# Define the reduction function as the sum of application states
  r = ReductionFunction(states=applicationStates,
      operation=StateSum)
# Define the activity function to drop all incoming packets
a = ActivityFunction(target=devices,
scope=Pkt(filter=extPortFilter(devices)), action=Controller.Notify("DDoS detected"))
# Define triffer function to perform probabilistic dropping
tr = TriggerFunction(s0=r.Result(),
      trigger=r.Result()>R,
      inconsistencyLevel=TimeObsolescence(0.2, "ms")
      activity = a)
Listing 5: DDoS detection with LOADER

Vii-B Benefit of replicated states

In a single replica approach (i.e., in the absence of LOADER) the DDoSD application would require all the traffic entering the network to traverse a single switch holding the state monitoring the incoming traffic. Thus the network congestion would grow and could not be compatible with some traffic management schemes (e.g., load balancing) that require to control the routing arbitrary within the network.

LOADER instead permits to replicate the entire DDoSD application over multiple switches, thus minimizing the data overhead over the whole network. At the same time, LOADER introduces an overhead in terms of replication traffic, whose amount depends on the allowed inconsistency level. The replication traffic will be evaluated experimentally for the DDoSD application in Sec. VII-D.

Notably, DDoSD is robust to possible transient inconsistencies between the values of total traffic estimated at each switch, thus employing an eventual consistency replication scheme will not create noticeable degradation due to replicated states estimation errors.

Vii-C Implementation

The considered DDoSD scheme has been implemented on top of two different programmable data plane platforms: (1) /; (2) OPP. Furthermore, the definition of the DDoSD application was performed inside ONOS with LOADER abstraction which permits to automatically offload and configure the developed network application.

Vii-C1 Control plane implementation

We implemented basic LOADER functionalities related to this particular use case inside ONOS. To provide support for this application we considered a simple embedding algorithm. The algorithm receives as an input the network topology, the set of flows to monitor defined as source-destination pairs and the maximum amount of admissible replicated states . It then assigns positions of each application state by considering the first nodes with highest betweenness centrality. We assume a sufficiently large amount of resources inside switches, thus permitting function co-location with consequent replication of all application elements. The update distribution among the chosen nodes is performed through a shared spanning tree whose routing is setup during the application initialization.

Vii-C2 Data plane implementation with P4

Our prototype is developed and tested in a virtual environment using Mininet [15] and P4-enabled virtual switches targeting using the v1 Model and using the Simple Switch Architecture [19].

We estimate the rate of incoming TCP SYN packets by employing a sampling window equal to . Let be the estimated rate in the time interval with . The average rate is estimated at each switch as

(6)

and represents the local state to be shared across all the other border routers, coherently with the description of Sec. VII-A. In particular, is chosen as a power of 2 due to the hardware limits in P4 switches imposed to the types of operations that can be implemented, i.e., shift operations are supported, divisions are not [22]. Notably, The most recent samples of the estimated rate are stored in a circular buffer. Replicated states are instead saved in dedicated registers.

Vii-C3 Data plane implementation with OPP

The OPP implementation requires a sequence of three stages: stage 0 extracts the state from update messages; stage 1 stores the state from the metadata notified by the previous table, performs monitoring and detection and generates update messages; stage 2 performs simple L3-forwarding. Stage 0 represents the stateful processing core of replicated states. The processed flows are identified by the IPv4 destination addresses of the target servers. Stage 0 also considers one flow data variable containing the switch-local state and the variables storing the replicated states. Switch-local state is computed by employing a hardware-implemented Exponential Weighted Moving Average (EWMA) counting the number of TCP SYN packets in a given preconfigured time window.

Vii-D Experimental evaluation and validation

We configure a Mininet-based emulation environment deploying the topology shown in Fig. 5, where, for the sake of simplicity, each cluster and each AS is represented by a Mininet host. To simulate the DDoS attack, we use hping3 tool to send TCP SYN requests from all ASs to all internal servers. In each experiment, during the first 20 seconds, we send the request at a slow rate, and then we increase the rate of all senders in a such a way to trigger the execution of the activity function. We consider experiments with varying : (i) single replica embedded in SW1 (), (ii) 2 replicas () embedded in SW1 and SW3, and (iii) 4 replicas (

) embedded in SW1, SW2, SW3, SW4. We repeated the experiments to achieve negligible 95% confidence intervals if shown in the plots.

Fig. 5: Reference topology for DDoS Detection use case.
Fig. 6: Temporal evolution of the local, remote and global states for the stateful switches in case of 2 replicas for the global state in P4 implementation.

Fig. 6 shows the evolution of application states alongside with the evolution of for the case of 2 replicas, implemented in P4. Identical results are obtained with OPP and thus are not reported for the sake of space. As expected, the values of evaluated at SW1 and SW3 are coherent, and permit a contemporary detection of the DDoS attack in the two switches, without any interaction with the controller. This experimental result validates our proposed implementation for both P4 and OPP.

In Figs. 7-8 we show the utilization of the links present in the ring topology connecting all switches, for different values of , with both P4 and OPP implementations. Clearly, for one replica (i.e, single replica approach) the load on the link is greatly unbalanced and in general higher for all the links. By increasing the number of replicas to 2, the load of the data traffic decreases by a factor of 1.6 both in P4 and OPP and is much better balanced across the links. The slightly different values depends on the different mechanisms adopted for triggering the update event by the incoming traffic: in P4 the update rate depends on the traffic, whereas in OPP it is independent. Adding two other replicas reduces the data traffic by around 20% in both implementations, but now the replication traffic becomes more relevant due to the higher number of replicas. Indeed, the fraction of update packets increases from 14% (for 2 replicas) to 24% (for 4 replicas) in P4 and from 11% (for 2 replicas) to 23% (for 4 replicas) in OPP. Thus, the two implementations behave very similarly and show the beneficial effect on the overall traffic in the network due to multiple replicas.

Fig. 7: Link occupancy for data and replication traffic in case of replicas for global state in P4 implementation.
Fig. 8: Link occupation for data and replication traffic in case of replicas for global state in OPP implementation.

Viii Other applications enabled by LOADER

In the following section we describe some examples of network applications which are shown to benefit from state replication. We show how those applications can be implemented with LOADER by providing their elements mapping and a code example for each of them.

Viii-a Distributed rate limiting

In [21] the authors propose a network-wide global token bucket. Similarly to a local token bucket, a global one allows to rate limit all the incoming traffic in a given network thanks to a network application performing probabilistic dropping at the edge routers of the network. However, differently from a local one, a global token bucket involves an instance of the same token bucket run independently at each border router and using a single shared state accounting for the total inbound traffic.

This kind of application can be easily mapped to LOADER by considering the DDoSD scheme and by changing only the trigger and the activity functions as follows:

Viii-A1 States

Given edge routers, we define state as the average rate of inbound traffic traversing edge router , with .

Viii-A2 Reduction function

The reduction function performs a sum operation among all local state with .

Viii-A3 Trigger function

In order to perform probabilistic dropping the trigger function must invoke the activity function proportionally to the rate of the incoming traffic and the desired rate.

Viii-A4 Activity function

Identically to the DDoSD case, the activity function must perform dropping of incoming packets whenever invoked.

The code related to this network application with the mapping of each individual element is shown in Listing LABEL:code:drl.

from Controller import TopologyManager
from LOADER.PrimitiveActions import Drop, StateSum, Rate
from LOADER.Scope import Pkt
def extPortFilter(devices):
  extPorts = []
  for d in devices:
    extPorts += [p for p in d.getPorts() if p.Type==EXTERNAL]
  return Pkt.ingressPort in extPorts
R = 100**6 # Desired rate in bps
# List of all edge routers
devices = TopologyManager.getEdgeRouters()
applicationStates = []
# Iterate over all edge routers
for d in devices:
  # Create a state for each edge router
  s = State(target=d,
      scope=Rate(filter=Pkt(filter=extPortFilter([d]))))
  applicationStates.append(s)
# Define the reduction function as the sum of application states
r = ReductionFunction(states=applicationStates,
      operation=StateSum)
# Define the activity function to drop all incoming packets
a = ActivityFunction(target=devices,
      scope=Pkt(filter=extPortFilter(devices)),
      action=Drop)
# Define trigger function to perform probabilistic dropping
tr = TriggerFunction(s0=r.Result(),
      trigger=(rand()<(r.Result()-R)/r.Result()),
      inconsistencyLevel=UpdateError(10),
      activity = a)
Listing 6: Distributed rate limiting with LOADER

In Fig. 9 we show an example of the distributed rate limiting application in action. We create two flows: Flow 1 from AS 1 directed towards server cluster 1 and another flow from AS 3 directed towards server cluster 3. We consider shortest path routing and place state replicas in SW1 and SW3. Flow 1 starts at time 0 with a rate of 5 Mbps while flow 2 starts with an offset of 20 s and with the same rate. Although the flows do not cross each other at any point in the network, when the flow 2 starts both of them are rate limited to a predefined aggregate 8 Mbps threshold. Note that oscillations in throughput are due to the adopted probabilistic dropping scheme.

Fig. 9: Distributed rate limiter with two flows at different edges of the network.

Viii-B Link-aware load balancing

In [1] the authors propose a load balancing scheme for data center networks, based on the congestion level of individual links from the source leaf switch to the destination leaf switch. Source leaf switches keep track of local uplink congestion and of the downlink congestion from each spine switch to the destination leaf. When a new flow starts, the source leaf switch selects a path to the destination by considering the one that minimizes the maximum congestion on the whole path, i.e., local uplink congestion and the downlink congestion on the spine.

For the sake of simplicity, we present a reduced version of the application with some omitted details and by assuming that the application targets a single leaf switch with spine switches. The application can be easily extended to many leaf switches by simply instantiating multiple instances of the same application and the states related to downlink congestion must be shared across multiple leaf switches.

This network application can be mapped to LOADER as follows:

Viii-B1 States

Given a leaf switch, we define state as the average load on the uplink ports, with . Additionally, we define state as the average downlink load on the port leading to the destination leaf switch of spine switch , with .

Viii-B2 Reduction function

The reduction function is composed of two primitive actions, namely and . Consequently, the reduced version of the states is obtained as:

Viii-B3 Trigger function

Differently from previous use cases, the trigger function in this network application triggers the activity function each time a new is obtained and does not require any additional checks.

Viii-B4 Activity function

The activity function involves simple insertion of a new per-flow forwarding rule for each new flow based on the outcome of the reduction function.

The code related to this network application with the mapping of each individual element is shown in Listing LABEL:code:link-lb.

from Controller import TopologyManager
from LOADER.PrimitiveActions import SetEgress, Rate, min, max
from LOADER.Scope import Pkt
# Filter for downlink ports (i.e. from spine to leaf)
def dlPortFilter(device):
  return Pkt.getEgressPort() in [p for p in device.getPorts() if p.Type == DOWNLINK]
# Filter for uplink ports (i.e. from leaf to spine)
def ulPortFilter(device):
  return Pkt.getEgressPort() in [p for p in device.getPorts() if p.Type == UPLINK]
# Reduction function for minimum path congestion
def minMaxCong(ulCong, dlCong):
  dstLeaf = TopologyManager.getSpineID(Pkt.getDst())
  return min([max(ulCong[i], dlCong[i][dstLeaf]) for i in range(len(TopologyManager.getSpines()))])
l = TopologyManager.getLeafSwitches()[0]
spines = TopologyManager.getSpines()
dlCong = []
ulCong = []
for p in l.getPorts(filter = ulPortFilter):
  s = State(target=l, scope=Rate(filter = Port(p)))
  ulCong.append(s)
for sp in spines:
  spineLoad = []
  for p in sp.getPorts(filter = dlPortFilter):
    s = State(target=sp, scope=Rate(filter = p))
    spineLoad.append(s)
  dlCong.append(spineLoad)
r = ReductionFunction(states=[ulCong, dlCong],
      operation=minMaxCong)
a = ActivityFunction(
      target = l,
      scope = Pkt(filter = (Pkt.TCP.Flag.SYN == 1)),
      action = insertRule(
      match = Pkt.getTuple(),
      action = SetEgress,
      args = r.Result()))
tr = TriggerFunction(
      s0=r.Result(),
      inconsistencyLevel=UpdateError(10),
      activity = a)
Listing 7: Link-aware load balancing with LOADER

Viii-C Resource-aware load balancing

A resource-aware load balancing application has been introduced in Sec. IV-A. The application performs load balancing of the user requests among the available servers based on the amount of available resources (e.g., average CPU utilization) at each server. As we already defined the function mapping, in the following we present solely the code of the application. For simplicity we do not define the states as they are not directly related to the network conditions, but instead to servers status. The code related to this network application with the mapping of each individual element is shown in Listing LABEL:code:resource-lb.

from Controller import TopologyManager
from LOADER.PrimitiveActions import SetEgress, Rate
from LOADER.Scope import Pkt, ExtScopeHelper
THR = 0.8  # threshold CPU load percentage
# Get the average CPU load of servers in the form of a list of states. We omit the details.
loads = ExtScopeHelper(scope="ServerLoad")
r1 = ReductionFunction(
      states = [loads]
      operation=min([i.Value() for i in loads]))
r2 = ReductionFunction(
      states = [loads]
      operation=mean([i.Value() for i in loads]))
a1 = ActivityFunction(
      scope = Pkt(filter = (TCP.Flag.SYN == 1)),
      action = SetEgress,
      args = r1.Result())
a2 = ActivityFunction(
      scope = Pkt(filter = (TCP.Flag.SYN == 1)),
      action = SetEgress,
      args = CONTROLLER_PORT)
tr1 = TriggerFunction(
      s0=r2.Result(),
      trigger=(r2.Result() <= THR),
      inconsistencyLevel=UpdateError(15),
      activity=a1)
tr2 = TriggerFunction(
      s0=r2.Result(),
      trigger=(r2.Result() > THR),
      inconsistencyLevel=UpdateError(15),
      activity=a2)
Listing 8: Resource-aware load balancing with LOADER.

Ix Conclusions

We propose a novel framework, namely LOADER, to address the limitation of stateful data planes in the presence of non-local states at the switches in the definition of the network applications. LOADER enables stateful switches to take decisions based on information which is not locally available. This is achieved by introducing a state replication mechanism among the switches. We discuss the main practical design challenges to support state replication, whose implementation is validated using both P4 and OPP stateful data planes.

Furthermore, we provide a high-level programming abstraction for the development of distributed network applications based on replicated states. Our programming model combines the expressiveness of a high-level programming model without ignoring the underlying hardware architecture of programmable switches. Thus, it is both of easy understanding for the programmer and can provide a comprehensible abstraction for the embedding of network applications.

By combining the proposed abstraction model with the implementation of the replication mechanism, LOADER effectively permits to support distributed network-wide applications without involving any central entity. As our results show, distributed network applications can be beneficial for the network performance and can be efficiently implemented in high-performance programmable stateful switches.

References

  • [1] M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu, A. Fingerhut, F. Matus, R. Pan, N. Yadav, G. Varghese, et al. (2014) CONGA: distributed congestion-aware load balancing for datacenters. In ACM SIGCOMM Computer Communication Review, Vol. 44, pp. 503–514. Cited by: §VIII-B.
  • [2] M. T. Arashloo, Y. Koral, M. Greenberg, J. Rexford, and D. Walker (2016) SNAP: stateful network-wide abstractions for packet processing. In ACM SIGCOMM, External Links: ISBN 978-1-4503-4193-6 Cited by: §I, §II, 1st item.
  • [3] R. Beckett, M. Greenberg, and D. Walker (2016) Temporal netkat. ACM SIGPLAN Notices 51 (6), pp. 386–401. Cited by: §II.
  • [4] P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O’Connor, P. Radoslavov, W. Snow, et al. (2014) ONOS: towards an open, distributed SDN OS. In ACM SIGCOMM HotNets, pp. 1–6. Cited by: §I.
  • [5] G. Bianchi, M. Bonola, A. Capone, and C. Cascone (2014-04) OpenState: programming platform-independent stateful Openflow applications inside the switch. ACM SIGCOMM CCR. External Links: ISSN 0146-4833 Cited by: §I.
  • [6] G. Bianchi, M. Bonola, S. Pontarelli, D. Sanvito, A. Capone, and C. Cascone (2016) Open Packet Processor: a programmable architecture for wire speed platform-independent stateful in-network processing. arXiv:1605.01977. Cited by: §I, §I, §VI.
  • [7] P. Bosshart and al. (2013) Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN. In ACM SIGCOMM CCR, Cited by: §VI-B.
  • [8] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, et al. (2014) P4: programming protocol-independent packet processors. ACM SIGCOMM Computer Communication Review 44 (3), pp. 87–95. Cited by: §I, §I, §VI.
  • [9] E. Brewer (2012-02) CAP twelve years later: how the “rules” have changed. Computer. External Links: ISSN 0018-9162 Cited by: §V.
  • [10] H. T. Dang, D. Sciascia, M. Canini, F. Pedone, and R. Soulé (2015) NetPaxos: consensus at network speed. In ACM SIGCOMM SOSR, External Links: ISBN 978-1-4503-3451-8 Cited by: §II.
  • [11] H. Howard and R. Mortier (2019) A generalised solution to distributed consensus. External Links: 1902.06776, Link Cited by: 1st item.
  • [12] C. Kim, A. Sivaraman, N. Katta, A. Bas, A. Dixit, and L. J. Wobker (2015) In-band network telemetry via programmable dataplanes. In ACM SIGCOMM, Cited by: §VI-B1.
  • [13] H. Kim, J. Reich, A. Gupta, M. Shahbaz, N. Feamster, and R. Clark (2015) Kinetic: verifiable dynamic network control. In USENIX NSDI 15, pp. 59–72. Cited by: §II.
  • [14] L. Lamport et al. (2001) Paxos made simple. ACM Sigact News. Cited by: §II.
  • [15] B. Lantz, B. Heller, and N. McKeown (2010) A network in a laptop: rapid prototyping for software-defined networks. In ACM SIGCOMM HotNets, pp. 19. Cited by: §VII-C2.
  • [16] S. Luo, H. Yu, and L. Vanbever (2017) Swing State: consistent updates for stateful and programmable data planes. In ACM SOSR, Cited by: §II.
  • [17] J. McClurg, H. Hojjat, N. Foster, and P. Černỳ (2016) Event-driven network programming. In ACM SIGPLAN Notices, Vol. 51, pp. 369–385. Cited by: §II.
  • [18] M. T. Özsu and P. Valduriez (2011) Principles of distributed database systems. Springer Science & Business Media. Cited by: §II.
  • [19] P4 language repository. External Links: Link Cited by: §VII-C2.
  • [20] S. Pontarelli, R. Bifulco, M. Bonola, C. Cascone, M. Spaziani, V. Bruschi, D. Sanvito, G. Siracusano, A. Capone, M. Honda, and F. Huici (2019) FlowBlaze: stateful packet processing in hardware. In USENIX NSDI 19, pp. 531–548. Cited by: §I.
  • [21] B. Raghavan, K. Vishwanath, S. Ramabhadran, K. Yocum, and A. C. Snoeren (2007) Cloud control with distributed rate limiting. In ACM SIGCOMM Computer Communication Review, Vol. 37, pp. 337–348. Cited by: §VIII-A.
  • [22] N. K. Sharma, A. Kaufmann, T. Anderson, A. Krishnamurthy, J. Nelson, and S. Peter (2017) Evaluating the power of flexible packet processing for network resource allocation. In USENIX NSDI, Cited by: §VII-C2.
  • [23] A. Siddique Muqaddas, G. Sviridov, P. Giaccone, and A. Bianco (2019) Optimal state replication in stateful data planes. arXiv:1912.03025. Cited by: §II, §IV-C3, §IV-C4, §IV-C, §VI-A2.
  • [24] G. Sviridov, M. Bonola, A. Tulumello, P. Giaccone, A. Bianco, and G. Bianchi (2018) LODGE: LOcal Decisions on Global statEs in programmable data planes. In IEEE NetSoft, pp. 257–261. Cited by: §II.
  • [25] S. H. Yeganeh, A. Tootoonchian, and Y. Ganjali (2013) On scalability of software-defined networking. IEEE Communications Magazine 51 (2), pp. 136–141. Cited by: §I.
  • [26] Y. Yuan, R. Alur, and B. T. Loo (2014) NetEgg: programming network policies by examples. In ACM SIGCOMM HotNets, pp. 20. Cited by: §II.
  • [27] S. T. Zargar, J. Joshi, and D. Tipper (2013) A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks. IEEE communications surveys & tutorials 15 (4), pp. 2046–2069. Cited by: §VII.
  • [28] A. Zeineddine and W. El-Hajj (2018) Stateful distributed firewall as a service in SDN. In IEEE NetSoft, pp. 212–216. Cited by: §II.

German Sviridov received his BSc in Computer Engineering and MSc in Telecommunication Engineering, both from Politecnico di Torino, Italy. In late 2017 he joined the Telecommunication Networks Group (TNG) at the Dipartimento di Elettronica e Telecomunicazioni of Politecnico di Torino as a Ph.D. student. His current research interests involve programmable data planes for SDN and scheduling mechanisms for data center networks.

Marco Bonola received the Ph.D. degrees in telecommunications engineering from the University of Rome Tor Vergata in 2007 and is currently a senior researcher at CNIT (Consorzio Nazionale Interuniversitario per le Telecomunicazioni). He is also a contract professor of Network Labs and Enterprise Networks at the University of Rome Tor Vergata. He has participated to several EU research projects and coordinated the technical aspects of the H2020 project BEBA.

Angelo Tulumello is an undergraduate student (with bachelor degree) at University of Rome Tor Vergata, working on stateful dataplanes and in-switch telemetry. He won the third place at the SIGCOMM Student Research Competition with the work “A Fully Portable TCP Implementation Using XFSMs”. He has participated to the H2020 SUPERFLUIDITY project. He is the principal maintainer of the DPDK based FlowBlaze SW implementation.

Paolo Giaccone received the Dr.Ing. and Ph.D. degrees in telecommunications engineering from the Politecnico di Torino, Italy, in 1998 and 2001, respectively. He is currently an Associate Professor in the Department of Electronics, Politecnico di Torino. During 2000-2001 and in 2002 he was with the Information Systems Networking Lab, Electrical Engineering Dept., Stanford University, Stanford, CA. His main area of interest is the design of network control and optimization algorithms.

Andrea Bianco is Full Professor and Department Head of the Dipartimento di Elettronica e Telecomunicazioni of Politecnico di Torino, Italy. He has co-authored over 200 papers published in international journals and presented in leading international conferences in the area of telecommunication networks. His current research interests are in the fields of protocols and architectures of all-optical networks, switch architectures for high-speed networks, SDN networks and software routers.

Giuseppe Bianchi is Full Professor of Networking and Network Security at the School of Engineering of the University of Roma Tor Vergata since 2007. He has carried out pioneering research work on WLAN performance modeling, and is currently interested in network programmability, privacy and security, and performance evaluation. He has chaired more than 10 international conferences (e.g., IEEE Infocom, ACM CoNext, ITC, WoWMoM, LANMAN, etc.), and has coordinated to date six large scale European Projects.

Viii Other applications enabled by LOADER

In the following section we describe some examples of network applications which are shown to benefit from state replication. We show how those applications can be implemented with LOADER by providing their elements mapping and a code example for each of them.

Viii-a Distributed rate limiting

In [21] the authors propose a network-wide global token bucket. Similarly to a local token bucket, a global one allows to rate limit all the incoming traffic in a given network thanks to a network application performing probabilistic dropping at the edge routers of the network. However, differently from a local one, a global token bucket involves an instance of the same token bucket run independently at each border router and using a single shared state accounting for the total inbound traffic.

This kind of application can be easily mapped to LOADER by considering the DDoSD scheme and by changing only the trigger and the activity functions as follows:

Viii-A1 States

Given edge routers, we define state as the average rate of inbound traffic traversing edge router , with .

Viii-A2 Reduction function

The reduction function performs a sum operation among all local state with .

Viii-A3 Trigger function

In order to perform probabilistic dropping the trigger function must invoke the activity function proportionally to the rate of the incoming traffic and the desired rate.

Viii-A4 Activity function

Identically to the DDoSD case, the activity function must perform dropping of incoming packets whenever invoked.

The code related to this network application with the mapping of each individual element is shown in Listing LABEL:code:drl.

from Controller import TopologyManager
from LOADER.PrimitiveActions import Drop, StateSum, Rate
from LOADER.Scope import Pkt
def extPortFilter(devices):
  extPorts = []
  for d in devices:
    extPorts += [p for p in d.getPorts() if p.Type==EXTERNAL]
  return Pkt.ingressPort in extPorts
R = 100**6 # Desired rate in bps
# List of all edge routers
devices = TopologyManager.getEdgeRouters()
applicationStates = []
# Iterate over all edge routers
for d in devices:
  # Create a state for each edge router
  s = State(target=d,
      scope=Rate(filter=Pkt(filter=extPortFilter([d]))))
  applicationStates.append(s)
# Define the reduction function as the sum of application states
r = ReductionFunction(states=applicationStates,
      operation=StateSum)
# Define the activity function to drop all incoming packets
a = ActivityFunction(target=devices,
      scope=Pkt(filter=extPortFilter(devices)),
      action=Drop)
# Define trigger function to perform probabilistic dropping
tr = TriggerFunction(s0=r.Result(),
      trigger=(rand()<(r.Result()-R)/r.Result()),
      inconsistencyLevel=UpdateError(10),
      activity = a)
Listing 6: Distributed rate limiting with LOADER

In Fig. 9 we show an example of the distributed rate limiting application in action. We create two flows: Flow 1 from AS 1 directed towards server cluster 1 and another flow from AS 3 directed towards server cluster 3. We consider shortest path routing and place state replicas in SW1 and SW3. Flow 1 starts at time 0 with a rate of 5 Mbps while flow 2 starts with an offset of 20 s and with the same rate. Although the flows do not cross each other at any point in the network, when the flow 2 starts both of them are rate limited to a predefined aggregate 8 Mbps threshold. Note that oscillations in throughput are due to the adopted probabilistic dropping scheme.

Fig. 9: Distributed rate limiter with two flows at different edges of the network.
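
As a back-of-the-envelope check (our own arithmetic, not a measurement taken from the testbed), the drop probability implied by the trigger condition of Listing 6 in this scenario can be computed as follows.

offered = 5e6 + 5e6                   # two flows at 5 Mbps -> 10 Mbps aggregate offered rate
R = 8e6                               # desired aggregate rate
drop_prob = (offered - R) / offered   # = 0.2
admitted = offered * (1 - drop_prob)  # = 8 Mbps, matching the aggregate threshold of Fig. 9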

VIII-B Link-aware load balancing

In [1] the authors propose a load balancing scheme for data center networks based on the congestion level of the individual links from the source leaf switch to the destination leaf switch. Each source leaf switch keeps track of the congestion of its local uplinks and of the downlinks from each spine switch to the destination leaf. When a new flow starts, the source leaf switch selects the path that minimizes the maximum congestion along the whole path, i.e., the maximum between the local uplink congestion and the downlink congestion at the spine.
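
The path-selection criterion can be sketched in a few lines of plain Python (an illustration of the rule, not LOADER code); uplink_cong and downlink_cong_to_dst are assumed to hold per-spine congestion estimates.

def pick_spine(uplink_cong, downlink_cong_to_dst):
  """Return the index of the spine minimizing the maximum congestion on the
  two-hop path leaf -> spine -> destination leaf.
  uplink_cong[i]         : congestion of the local uplink towards spine i
  downlink_cong_to_dst[i]: congestion of spine i's downlink towards the destination leaf"""
  path_cost = [max(u, d) for u, d in zip(uplink_cong, downlink_cong_to_dst)]
  return path_cost.index(min(path_cost))

# Example: spine 1 offers the least-congested path, since max(0.2, 0.3) = 0.3
best = pick_spine([0.7, 0.2, 0.5], [0.1, 0.3, 0.9])  # -> 1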

For the sake of simplicity, we present a reduced version of the application, omitting some details and assuming that the application targets a single source leaf switch connected to $M$ spine switches. The application can be easily extended to many leaf switches by instantiating multiple instances of the same application, in which case the states related to downlink congestion must be shared across the leaf switches.

This network application can be mapped to LOADER as follows:

VIII-B1 States

Given the leaf switch, we define state $u_i$ as the average load on the uplink port towards spine switch $i$, with $i \in \{1, \ldots, M\}$. Additionally, we define state $d_i$ as the average load on the downlink port of spine switch $i$ leading to the destination leaf switch, with $i \in \{1, \ldots, M\}$.

VIII-B2 Reduction function

The reduction function is composed of two primitive actions, namely $\max$ and $\min$. Consequently, the reduced version of the states is obtained as $S = \min_{i \in \{1,\ldots,M\}} \max(u_i, d_i)$.

VIII-B3 Trigger function

Differently from the previous use cases, the trigger function in this network application invokes the activity function each time a new reduced state $S$ is obtained, without any additional check.

VIII-B4 Activity function

The activity function simply inserts a new per-flow forwarding rule for each new flow, based on the outcome of the reduction function.

The code related to this network application, with the mapping of each individual element, is shown in Listing 7.

from Controller import TopologyManager
from LOADER.PrimitiveActions import SetEgress, Rate, min, max
from LOADER.Scope import Pkt
# Filter for downlink ports (i.e. from spine to leaf)
def dlPortFilter(port):
  return port.Type == DOWNLINK
# Filter for uplink ports (i.e. from leaf to spine)
def ulPortFilter(port):
  return port.Type == UPLINK
# Reduction function for minimum path congestion
def minMaxCong(ulCong, dlCong):
  dstLeaf = TopologyManager.getLeafID(Pkt.getDst())
  return min([max(ulCong[i], dlCong[i][dstLeaf]) for i in range(len(TopologyManager.getSpines()))])
l = TopologyManager.getLeafSwitches()[0]
spines = TopologyManager.getSpines()
dlCong = []
ulCong = []
for p in l.getPorts(filter = ulPortFilter):
  s = State(target=l, scope=Rate(filter = Port(p)))
  ulCong.append(s)
for sp in spines:
  spineLoad = []
  for p in sp.getPorts(filter = dlPortFilter):
    s = State(target=sp, scope=Rate(filter = Port(p)))
    spineLoad.append(s)
  dlCong.append(spineLoad)
r = ReductionFunction(states=[ulCong, dlCong],
      operation=minMaxCong)
a = ActivityFunction(
      target = l,
      scope = Pkt(filter = (Pkt.TCP.Flag.SYN == 1)),
      action = insertRule(
        match = Pkt.getTuple(),
        action = SetEgress,
        args = r.Result()))
tr = TriggerFunction(
      s0=r.Result(),
      inconsistencyLevel=UpdateError(10),
      activity = a)
Listing 7: Link-aware load balancing with LOADER

VIII-C Resource-aware load balancing

A resource-aware load balancing application has been introduced in Sec. IV-A. The application load balances user requests among the available servers based on the amount of available resources (e.g., average CPU utilization) at each server. Since the element mapping has already been defined, in the following we present solely the code of the application. For simplicity we do not detail the states, as they are not directly related to the network conditions but rather to the status of the servers. The code of this network application, with the mapping of each individual element, is shown in Listing 8.

from Controller import TopologyManager
from LOADER.PrimitiveActions import SetEgress, Rate, min, mean
from LOADER.Scope import Pkt, ExtScopeHelper
THR = 0.8  # threshold on the average CPU load (fraction of full utilization)
# Get the average CPU load of servers in the form of a list of states. We omit the details.
loads = ExtScopeHelper(scope="ServerLoad")
r1 = ReductionFunction(
      states = [loads],
      operation=min([i.Value() for i in loads]))
r2 = ReductionFunction(
      states = [loads],
      operation=mean([i.Value() for i in loads]))
a1 = ActivityFunction(
      scope = Pkt(filter = (Pkt.TCP.Flag.SYN == 1)),
      action = SetEgress,
      args = r1.Result())
a2 = ActivityFunction(
      scope = Pkt(filter = (Pkt.TCP.Flag.SYN == 1)),
      action = SetEgress,
      args = CONTROLLER_PORT)
tr1 = TriggerFunction(
      s0=r2.Result(),
      trigger=(r2.Result() <= THR),
      inconsistencyLevel=UpdateError(15),
      activity=a1)
tr2 = TriggerFunction(
      s0=r2.Result(),
      trigger=(r2.Result() > THR),
      inconsistencyLevel=UpdateError(15),
      activity=a2)
Listing 8: Resource-aware load balancing with LOADER.
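
For readers less familiar with the DSL, the decision logic encoded by Listing 8 can be sketched in plain Python as follows; on_new_flow and server_loads are illustrative placeholders, and the 0.8 threshold mirrors the THR constant above.

THR = 0.8  # threshold on the mean CPU load

def on_new_flow(server_loads):
  """Decide where to steer a new flow (first SYN packet),
  given the replicated per-server CPU loads."""
  mean_load = sum(server_loads) / len(server_loads)
  if mean_load <= THR:
    # Enough spare capacity: pick the least loaded server (r1/a1/tr1 above).
    return ("server", server_loads.index(min(server_loads)))
  # Servers overloaded on average: defer the decision to the controller (a2/tr2 above).
  return ("controller", None)

# Example: loads of 0.5, 0.9, 0.4 -> mean 0.6 <= 0.8, so server 2 is chosen.
print(on_new_flow([0.5, 0.9, 0.4]))  # ('server', 2)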

IX Conclusions

We propose a novel framework, namely LOADER, to address the limitations of stateful data planes when the definition of a network application requires non-local states at the switches. LOADER enables stateful switches to take decisions based on information that is not locally available, by introducing a state replication mechanism among the switches. We discuss the main practical design challenges of supporting state replication, and validate our implementation using both P4 and OPP stateful data planes.

Furthermore, we provide a high-level programming abstraction for the development of distributed network applications based on replicated states. Our programming model offers the expressiveness of a high-level language while remaining aware of the underlying hardware architecture of programmable switches. Thus, it is easy to understand for the programmer and provides a comprehensible abstraction for embedding network applications.

By combining the proposed abstraction model with the implementation of the replication mechanism, LOADER supports distributed network-wide applications without involving any central entity. As our results show, distributed network applications can benefit network performance and can be efficiently implemented in high-performance programmable stateful switches.

References

  • [1] M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu, A. Fingerhut, F. Matus, R. Pan, N. Yadav, G. Varghese, et al. (2014) CONGA: distributed congestion-aware load balancing for datacenters. In ACM SIGCOMM Computer Communication Review, Vol. 44, pp. 503–514.
  • [2] M. T. Arashloo, Y. Koral, M. Greenberg, J. Rexford, and D. Walker (2016) SNAP: stateful network-wide abstractions for packet processing. In ACM SIGCOMM.
  • [3] R. Beckett, M. Greenberg, and D. Walker (2016) Temporal NetKAT. ACM SIGPLAN Notices 51 (6), pp. 386–401.
  • [4] P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O’Connor, P. Radoslavov, W. Snow, et al. (2014) ONOS: towards an open, distributed SDN OS. In ACM SIGCOMM HotSDN, pp. 1–6.
  • [5] G. Bianchi, M. Bonola, A. Capone, and C. Cascone (2014-04) OpenState: programming platform-independent stateful OpenFlow applications inside the switch. ACM SIGCOMM CCR.
  • [6] G. Bianchi, M. Bonola, S. Pontarelli, D. Sanvito, A. Capone, and C. Cascone (2016) Open Packet Processor: a programmable architecture for wire speed platform-independent stateful in-network processing. arXiv:1605.01977.
  • [7] P. Bosshart et al. (2013) Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN. In ACM SIGCOMM CCR.
  • [8] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, et al. (2014) P4: programming protocol-independent packet processors. ACM SIGCOMM Computer Communication Review 44 (3), pp. 87–95.
  • [9] E. Brewer (2012-02) CAP twelve years later: how the “rules” have changed. Computer.
  • [10] H. T. Dang, D. Sciascia, M. Canini, F. Pedone, and R. Soulé (2015) NetPaxos: consensus at network speed. In ACM SIGCOMM SOSR.
  • [11] H. Howard and R. Mortier (2019) A generalised solution to distributed consensus. arXiv:1902.06776.
  • [12] C. Kim, A. Sivaraman, N. Katta, A. Bas, A. Dixit, and L. J. Wobker (2015) In-band network telemetry via programmable dataplanes. In ACM SIGCOMM.
  • [13] H. Kim, J. Reich, A. Gupta, M. Shahbaz, N. Feamster, and R. Clark (2015) Kinetic: verifiable dynamic network control. In USENIX NSDI 15, pp. 59–72.
  • [14] L. Lamport (2001) Paxos made simple. ACM SIGACT News.
  • [15] B. Lantz, B. Heller, and N. McKeown (2010) A network in a laptop: rapid prototyping for software-defined networks. In ACM SIGCOMM HotNets, pp. 19.
  • [16] S. Luo, H. Yu, and L. Vanbever (2017) Swing State: consistent updates for stateful and programmable data planes. In ACM SOSR.
  • [17] J. McClurg, H. Hojjat, N. Foster, and P. Černỳ (2016) Event-driven network programming. In ACM SIGPLAN Notices, Vol. 51, pp. 369–385.
  • [18] M. T. Özsu and P. Valduriez (2011) Principles of distributed database systems. Springer Science & Business Media.
  • [19] P4 language repository.
  • [20] S. Pontarelli, R. Bifulco, M. Bonola, C. Cascone, M. Spaziani, V. Bruschi, D. Sanvito, G. Siracusano, A. Capone, M. Honda, and F. Huici (2019) FlowBlaze: stateful packet processing in hardware. In USENIX NSDI 19, pp. 531–548.
  • [21] B. Raghavan, K. Vishwanath, S. Ramabhadran, K. Yocum, and A. C. Snoeren (2007) Cloud control with distributed rate limiting. In ACM SIGCOMM Computer Communication Review, Vol. 37, pp. 337–348.
  • [22] N. K. Sharma, A. Kaufmann, T. Anderson, A. Krishnamurthy, J. Nelson, and S. Peter (2017) Evaluating the power of flexible packet processing for network resource allocation. In USENIX NSDI.
  • [23] A. Siddique Muqaddas, G. Sviridov, P. Giaccone, and A. Bianco (2019) Optimal state replication in stateful data planes. arXiv:1912.03025.
  • [24] G. Sviridov, M. Bonola, A. Tulumello, P. Giaccone, A. Bianco, and G. Bianchi (2018) LODGE: LOcal Decisions on Global statEs in programmable data planes. In IEEE NetSoft, pp. 257–261.
  • [25] S. H. Yeganeh, A. Tootoonchian, and Y. Ganjali (2013) On scalability of software-defined networking. IEEE Communications Magazine 51 (2), pp. 136–141.
  • [26] Y. Yuan, R. Alur, and B. T. Loo (2014) NetEgg: programming network policies by examples. In ACM SIGCOMM HotNets, pp. 20.
  • [27] S. T. Zargar, J. Joshi, and D. Tipper (2013) A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks. IEEE Communications Surveys & Tutorials 15 (4), pp. 2046–2069.
  • [28] A. Zeineddine and W. El-Hajj (2018) Stateful distributed firewall as a service in SDN. In IEEE NetSoft, pp. 212–216.

German Sviridov received his BSc in Computer Engineering and MSc in Telecommunication Engineering, both from Politecnico di Torino, Italy. In late 2017 he joined the Telecommunication Networks Group (TNG) at the Dipartimento di Elettronica e Telecomunicazioni of Politecnico di Torino as a Ph.D. student. His current research interests involve programmable data planes for SDN and scheduling mechanisms for data center networks.

Marco Bonola received the Ph.D. degree in telecommunications engineering from the University of Rome Tor Vergata in 2007 and is currently a senior researcher at CNIT (Consorzio Nazionale Interuniversitario per le Telecomunicazioni). He is also a contract professor of Network Labs and Enterprise Networks at the University of Rome Tor Vergata. He has participated in several EU research projects and coordinated the technical aspects of the H2020 project BEBA.

Angelo Tulumello is a student (holding a bachelor's degree) at the University of Rome Tor Vergata, working on stateful data planes and in-switch telemetry. He won third place at the SIGCOMM Student Research Competition with the work “A Fully Portable TCP Implementation Using XFSMs”. He has participated in the H2020 SUPERFLUIDITY project and is the principal maintainer of the DPDK-based FlowBlaze software implementation.

Paolo Giaccone received the Dr.Ing. and Ph.D. degrees in telecommunications engineering from the Politecnico di Torino, Italy, in 1998 and 2001, respectively. He is currently an Associate Professor in the Department of Electronics, Politecnico di Torino. During 2000-2001 and in 2002 he was with the Information Systems Networking Lab, Electrical Engineering Dept., Stanford University, Stanford, CA. His main area of interest is the design of network control and optimization algorithms.

Andrea Bianco is Full Professor and Department Head of the Dipartimento di Elettronica e Telecomunicazioni of Politecnico di Torino, Italy. He has co-authored over 200 papers published in international journals and presented in leading international conferences in the area of telecommunication networks. His current research interests are in the fields of protocols and architectures of all-optical networks, switch architectures for high-speed networks, SDN networks and software routers.

Giuseppe Bianchi is Full Professor of Networking and Network Security at the School of Engineering of the University of Roma Tor Vergata since 2007. He has carried out pioneering research work on WLAN performance modeling, and is currently interested in network programmability, privacy and security, and performance evaluation. He has chaired more than 10 international conferences (e.g., IEEE Infocom, ACM CoNext, ITC, WoWMoM, LANMAN, etc.), and has coordinated to date six large scale European Projects.
