I Introduction
In recent years a major paradigm shift has been observed in the field of SDN with the introduction of stateful data planes, which address the performance limitations of the complete centralization of the control plane in a canonical SDN architecture, as highlighted in [14, 25]. Indeed, stateful switches, as described in [6, 5], can be programmed to execute user-defined code during packet processing, operating on local state variables stored in persistent memories. Thus stateful data planes provide an additional level of programmability with respect to canonical SDN, whose data plane is instead stateless, according to the original OpenFlow paradigm. Stateful switches can take local decisions without relying on the intervention of an SDN controller [3]. This has several beneficial effects. First, it greatly improves the reactivity of network applications by reducing the communication and latency overhead due to the interaction with the controller. Second, it reduces the computational burden on the controller to sustain the correct network behavior [11]. Finally, the availability of state variables enables new fine-grained networking applications [12], as decisions can now be taken on a per-packet basis, contrary to the per-flow basis of canonical SDN.
The availability of local state variables (denoted simply as “states” in the following) and the capability to run local programs (i.e., finite state machines) based on such states open a new perspective, since distributed algorithms can now be devised to run in the switches across the network. This extends the scalability of many network applications, thanks to their distributed nature.
In particular, differently from previous works, we focus on the specific scenario in which the network application runs locally in stateful switches based on some non-local states. Indeed, for applications implementing network-wide policies, the value of a state may be “global” across multiple switches, with each switch holding only a local replica of the state. Recent works, such as [23], have shown the practical feasibility of this approach by leveraging available programmable data planes, such as P4 [6] and Open Packet Processor (OPP) [5].
Whenever a given state is replicated across multiple switches, two fundamental and coupled questions must be addressed: How many replicas are needed? In which switches should such replicas be placed? The optimal answer must consider many issues. First, all the traffic flows must traverse at least one switch that holds a state affecting (or affected by) the flow. However, routing a flow away from its shortest path increases the data traffic on the network, impairing the overall network performance. Thus, from the point of view of the data traffic, it would be convenient to increase the number of replicas until at least one replica is present along the shortest path of each flow. At the same time, adopting replicas comes at the cost of keeping them synchronized. This requires interaction between the switches holding the replicas and thus introduces synchronization traffic, which increases with the number of replicas and consequently adds to the overall offered load on the network, impairing the network performance. Thus, from the point of view of the synchronization traffic, it would be convenient to reduce the number of replicas as much as possible. In summary, the optimal selection of the number of replicas and their location depends on the trade-off between data traffic and synchronization traffic.
In our work we address all the above-mentioned questions and provide the following contributions:

we propose the optimal state replication problem and formalize it as an ILP problem that minimizes the overall (i.e., data plus synchronization) traffic;

to cope with the limited scalability of the ILP solver, we propose an approximation algorithm, denoted as PlaceMultiReplicas (PMR), able to solve large instances of the problem;

we numerically evaluate the performance of PMR and show that it approximates the optimal solution very well, at least for small instances of the problem; furthermore, we show that adding even a few replicas in a network can greatly improve performance with respect to the single-replica scenario;

we analytically find the optimal number of replicas for unwrapped Manhattan network topologies and characterize its asymptotic behavior; we show that the formula obtained for large networks can also be applied to small instances of the network.
The remainder of the paper is organized as follows. In Sec. II, we describe the state replication problem. In Sec. III, we present the ILP formalization of the optimal state replication problem. In Sec. IV, we propose the PMR algorithm. In Sec. V we show the numerical results for the state placement problem. In Sec. VI, we present the asymptotic analysis of the optimal number of replicas in a network. In Sec. VII we discuss the related works. Finally, we draw our conclusions in Sec. VIII.
II State replication in stateful SDN
Following the increasing need for highly dynamic network services and policies, programmable data planes have enabled numerous traffic processing policies to be offloaded directly into the switches. New frameworks to embed user-defined network policies into stateful switches have been proposed [1, 17]. In our work we consider SNAP [1] as a reference framework, although our proposed approach is general and applies to any programming abstraction for stateful data planes.
SNAP introduces a one-big-switch (OBS) model as a network abstraction, according to which the whole network of switches is seen as a single “big” switch with a given set of input and output ports, corresponding to the end hosts, and an aggregate list of available resources for traffic processing. Due to the way the OBS abstraction is defined, flow routing between hosts is described on the basis of I/O port pairs.
When defining a network application, the programmer is exposed to the OBS abstraction without any knowledge of the actual underlying composition of the network. SNAP decomposes the network applications into an extended forwarding decision diagram (xFDD) that also incorporates the stateful processing elements available at the switches. The placement of the single-replica state affects the overall application and network performance. Indeed, the xFDD, together with the traffic matrix between the OBS ports, is fed into the SNAP ILP (Integer Linear Programming) optimizer, which selects the switches where to place each state and the corresponding processing logic of the decomposed application. The order in which traffic traverses the switches that store the states plays a fundamental role, as state dependencies must be preserved in order to correctly execute the xFDD of the original application. To guarantee the correct execution of an application, all the flows affecting or affected by a state must be routed across the switch holding it. Thus the routing does not generally follow the shortest path between the input and output ports in the OBS, and the SNAP solver jointly optimizes the placement of the states and the routing to minimize the total data traffic in the network.
The main limitation of SNAP stems from the fact that it allows only one replica of each state inside the network. This considerably constrains the routing of flows and consequently precludes a wide range of optimization techniques, such as load balancing and traffic engineering.
II-A State replication
To cope with the above-mentioned limitations of SNAP, we consider a scenario in which states are replicated across the stateful switches. We address the optimal placement of the replicas of each state across the network, given the knowledge of the traffic demands and of the xFDD defining the network application.
As a toy example, consider a network-wide application that acts on a global counter (e.g., the total traffic entering/leaving the network) affected by all flows in the network. SNAP will place a single replica of the state associated with the global counter in a single switch in the topology, likely the switch in the most “central” position (i.e., with the highest betweenness centrality) with respect to the network topology, as shown in Fig. 1(a). As a consequence, all flows will be forced to be routed through the single switch storing the state and, from there, routed to their destinations. Due to this “hot-spot” routing, the set of feasible solutions for the capacitated routing problem is reduced or becomes empty. Instead, replicating the global state on multiple switches leads to better network utilization, as shown in Fig. 1(b), and to a much larger set of feasible routing solutions, with a beneficial effect on the maximum amount of traffic that can be sustained in the network and/or on the experienced delays.
The choice of an appropriate synchronization mechanism is crucial for the network performance and for the implementation complexity of the replication scheme. Notably, the CAP theorem [7] states that, out of Consistency, Availability and Partition tolerance, a replication scheme can guarantee only two properties at the same time. Considering that network failures may occur, partition tolerance cannot be left out of the design of our replication algorithm, leaving us with the following two reference models:
Strong consistency
A replication algorithm based on strong consistency privileges consistency over availability. This translates into strong guarantees that the same value of a state will be read across all replicas, at the cost of higher delays to access and update the states. The delay penalty is caused by the adopted protocol (e.g., Paxos [15], Raft [18]) requiring intensive interaction among the replicas whenever a read or write transaction is executed. Side effects of the replication protocol are a high overhead in terms of synchronization traffic and a high algorithmic complexity, typically incompatible with the limited hardware resources available at the switches. Furthermore, the latency due to the communication between replicas requires buffering packets at each switch while waiting for the outcome of the replication transaction. This makes the scheme too complex to be adopted in practice in high-speed networks.
Eventual consistency
Replication schemes based on eventual consistency prioritize the availability of the replicas over their consistency. This translates into low latencies during the execution of transactions, at the cost of no guarantees on the consistency of the actual values of each replica. Most eventual-consistency algorithms are based on gossip protocols [4, 22, 19], which incur a small overhead in terms of synchronization traffic. At the same time, due to the simplicity of the adopted communication protocols, these algorithms can be implemented in programmable switches.
Due to the implementation and performance issues highlighted for strong consistency schemes, in the following we assume a replication scheme based on eventual consistency, according to which each replica generates a fixed amount of synchronization traffic towards all the other replicas. As shown in [23], this scheme can be implemented in current state-of-the-art programmable data planes and, in practice, maintains small errors among the values of the replicas.
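As a rough illustration of this reference model, the sketch below shows replicas of a global counter that take local decisions on packets and periodically push their local count to every other replica. All names are hypothetical; this is a minimal toy, not the actual protocol of [23].

```python
import random

class CounterReplica:
    """One replica of a global counter, synchronized via periodic push gossip."""

    def __init__(self, name):
        self.name = name
        self.local = 0      # increments applied at this switch
        self.remote = {}    # last known local counts of the other replicas

    def on_packet(self):
        # Local decision: update the state without contacting any controller.
        self.local += 1

    def sync_round(self, peers):
        # Push the local count to every other replica: a fixed amount of
        # synchronization traffic per replica pair, as assumed in the text.
        for peer in peers:
            if peer is not self:
                peer.remote[self.name] = self.local

    def value(self):
        # Eventually-consistent estimate of the global counter.
        return self.local + sum(self.remote.values())

replicas = [CounterReplica(f"sw{i}") for i in range(3)]
for _ in range(100):
    random.choice(replicas).on_packet()   # packets hit arbitrary replicas
for r in replicas:
    r.sync_round(replicas)
# After a full gossip round every replica agrees on the global count.
assert all(r.value() == 100 for r in replicas)
```

Between gossip rounds the replicas may disagree (availability over consistency); a full round of pairwise pushes reconciles them.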
III Optimal state replication problem
Given a network graph, the objective of the state replication problem is to identify the best set of nodes (i.e., switches) where to place the replicas of each state and to compute the optimal routing. Coherently with [1], the nodes are selected so as to minimize the overall traffic in the network while guaranteeing that all flows affecting (or affected by) a given state traverse at least one replica of that state. Differently from [1], the traffic present in the network is composed not just of data traffic, but also of the traffic introduced by the synchronization protocol required to keep replicas of the same state consistent.
We propose an integer linear program (ILP) formalization, as in the original SNAP model [1]. The relevant notation is reported in Table LABEL:tab:milpNotationInput. Our formalization takes the following input parameters:

Network. Let be the network graph with nodes. Let be the capacity of edge .

Traffic flows. Let be the set of all flows. The traffic demands are assumed to be known in advance; in particular, let be the demand of traffic flow , where and are the source and destination nodes of the flow, respectively.

State variables. Let be the set of all state variables. Let be the ordered sequence of state variables for flow , obtained from the xFDD of the corresponding application.

Maximum number of replicas. Let be a given upper bound on the number of replicas for a state variable , chosen by the network designer. Note that the optimal number of replicas for state , denoted by , will be computed while satisfying the constraint .
Let be the set of all possible sequences of state replicas for a flow . Consider a toy example in which a flow requires 3 state variables , , , i.e., . Each state has 2 replicas (denoted as “1” and “2”). Now , and, as an example, the sequence implies that traverses replica of state , then replica of state , and finally replica of state . Let be the replica of state variable in sequence . For the above example with , , and .
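The set of replica sequences for a flow can be sketched as the Cartesian product of the per-state replica sets, taken in the order dictated by the xFDD. The state names and replica counts below are hypothetical, mirroring the toy example above:

```python
from itertools import product

# Toy example: flow f must traverse states s1, s2, s3 in this order,
# and each state has 2 replicas, labeled 1 and 2.
states = ["s1", "s2", "s3"]
replicas_per_state = {"s1": 2, "s2": 2, "s3": 2}

# All possible sequences of state replicas for the flow: the Cartesian
# product of the replica sets, preserving the state order.
sequences = list(product(*[range(1, replicas_per_state[s] + 1)
                           for s in states]))

print(len(sequences))   # 2 * 2 * 2 = 8 candidate sequences
print(sequences[0])     # (1, 1, 1): replica 1 of s1, of s2, and of s3
print(sequences[-1])    # (2, 2, 2)
```

Each tuple is one admissible traversal order; the ILP picks, per flow, exactly one such sequence together with the edges used to realize it.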
The output of the solver is described as follows, and the relevant notation is reported in Table LABEL:tab:milpNotationOutput:

Placement of the replicas of each state. Let be a binary variable equal to 1 iff replica of state is stored at node . Note that the optimization problem might place multiple replicas on the same node, but this would correspond to a single instance of the state. Thus, the optimal number of distinct replicas of state across the whole network can be computed as follows¹:

¹Let be the indicator function of , equal to 1 iff condition is true.
Data traffic routing. Let be a binary variable equal to 1 iff flow traverses the sequence of state replicas on edge . The set of such variables describes the complete routing of all flows in the network, also taking into account the constraint on the required sequence of traversed replicas. To avoid out-of-sequence problems, we do not allow flow splitting between different sequences of replicas.

Synchronization traffic routing. Let be a binary variable equal to 1 iff there are replicas of state variable on nodes and and the flow from node to node traverses edge . This set of variables describes the routing of the synchronization traffic between different replicas of the same state. Let be the traffic generated by each state replica to update every other replica of the same state.
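The distinct-replica count defined for the placement variables above can be sketched by counting the nodes hosting at least one replica (the indicator-function sum). The placement below is hypothetical:

```python
# Hypothetical placement of 3 replicas of one state over nodes 0..4:
# replicas 0 and 1 land on the same node, so only 2 distinct instances exist.
placement = {(0, 2): 1, (1, 2): 1, (2, 4): 1}   # (replica, node) -> binary var

# Number of distinct replicas: nodes for which the indicator
# "some replica r is placed on node v" is true.
nodes_used = {v for (r, v), x in placement.items() if x == 1}
print(len(nodes_used))   # 2
```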
IV Approximation algorithm for single state replication
We specifically address the problem of state replication for a single state variable. To cope with the limited scalability of the ILP solver, we propose the PlaceMultiReplicas (PMR) algorithm, which is computationally scalable and, as shown in Sec. V, approximates well the optimal solution obtained by the ILP solver for small problem instances.
The pseudocode of PMR is given in Algorithm 1. It takes as input the network graph , the state variable , the maximum number of replicas of , and the set of flows requiring . As output, the algorithm returns the routing variables of the data flows and of the state synchronization flows, and the replica placement variables . The algorithm works in 3 phases:

Phase 1. The network graph is partitioned into clusters, in order to minimize the maximum distance among the elements within a cluster. This distributes the replicas across the whole network in a balanced way, exploiting the spatial diversity offered by each cluster.

Phase 2. In each cluster, a replica is placed in the “most central” node, i.e., the one with the highest betweenness centrality, in order to minimize the data traffic for each flow.

Phase 3. The position of each replica is perturbed at random using a local search, to improve the solution with respect to the one obtained in the previous two phases.
Algorithm 1 comprises all the mentioned phases. After initializing the routing and replica placement variables (lines 2–4), Phase 1 is executed in line 5 by calling ComputePartitions. This method solves the -means clustering problem [21] using Lloyd’s algorithm [20], in which the node with the highest betweenness centrality is chosen as the center of each partition.
As part of Phase 2 (lines 6–9), within each subgraph the node with the highest betweenness centrality is assigned a state variable replica through NodeWithHighestBC. As a reminder, the betweenness centrality of a node is proportional to the number of shortest paths crossing it.
Lines 11 to 18 refer to a local search procedure with iterations. Within each iteration, RouteFlows is used to route flows through the replica locations identified in Phase 2, following two sub-paths: one from the flow source node to the closest replica and one from this replica to the destination node. The procedure works on the set of flows and the location of state variables, and returns the routing variables for data flows and for state synchronization , together with the corresponding total traffic in the network. Lines 24 to 40 route the data flows from their source to the destination while traversing the replica with the minimum path length among all replicas. For each flow, in lines 26 and 27, the replica and the path traversing it are initialized. Then, for each replica (lines 28–35), the shortest path is first computed. is the vertex for which . If the path length is less than the previous minimum minDist (line 30), then the current path is stored as the best path and the current replica as the best replica . In lines 36–39, for each edge in , the routing as well as the traffic value are updated. Lines 41 to 49 generate flows from each state replica to all the other state replicas for state synchronization, using the shortest path. The synchronization flows are updated in line 45 for each edge in the path, before updating the total traffic in line 46. If is less than the previous minimum, then the minimum traffic value and all the decision variables are updated (lines 14–15). In Phase 3 (line 17), a local search procedure perturbs the existing replica locations. It proceeds by randomly selecting one node where a replica is located and moving the replica to one of its neighbor nodes. This new solution is then compared with the current one (line 13), after having evaluated the corresponding routing and total traffic.
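The RouteFlows step can be sketched with the standard library alone, assuming unit-weight links, a toy grid topology, and hypothetical replica locations (the helper names are ours, not those of Algorithm 1):

```python
from collections import deque

def bfs_paths(adj, src):
    """Shortest path (as a node list) from src to every reachable node."""
    paths = {src: [src]}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in paths:
                paths[v] = paths[u] + [v]
                queue.append(v)
    return paths

def route_flow(adj, src, dst, replicas):
    """Route src -> dst via the replica minimizing total hop count: one
    sub-path from the source to the replica, one from there to the
    destination, as in the RouteFlows step of PMR."""
    from_src = bfs_paths(adj, src)
    best_rep, best_path = None, None
    for rep in replicas:
        path = from_src[rep] + bfs_paths(adj, rep)[dst][1:]
        if best_path is None or len(path) < len(best_path):
            best_rep, best_path = rep, path
    return best_rep, best_path

# Toy 3x3 grid (nodes 0..8, row-major), hypothetical replicas on nodes 1 and 7.
adj = {n: [] for n in range(9)}
for row in range(3):
    for col in range(3):
        n = 3 * row + col
        if col < 2:
            adj[n].append(n + 1); adj[n + 1].append(n)
        if row < 2:
            adj[n].append(n + 3); adj[n + 3].append(n)

rep, path = route_flow(adj, 0, 8, replicas=[1, 7])
print(path)   # a 4-hop path from node 0 to node 8 through the chosen replica
```

Accumulating the per-edge traffic along `path`, and generating shortest-path flows between every replica pair, would complete the routing and synchronization bookkeeping of lines 36–49.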
V Performance comparison
We evaluate the performance of the PMR algorithm presented in Sec. IV. The local search in PMR runs with iterations. For small instances of the problem, we also run an ILP solver, coded using the IBM CPLEX optimizer [8], implementing the optimization model in Sec. III. We compute the approximation ratio, i.e., the ratio between the total traffic obtained by PMR and the optimal traffic obtained by the ILP solver. We consider two standard topologies for the network graph:

Unwrapped Manhattan is a grid.

Watts–Strogatz [24] adds a few long-range links to regular graph topologies to reduce the distances between pairs of nodes and emulate a small-world model. It is generated by taking a ring of nodes, where each node is connected to its nearest neighbors. In each node, the edge connected to its nearest clockwise neighbor is disconnected with probability and connected to another node chosen uniformly at random over the entire ring. Thus, the final topology maintains the original average degree while remaining connected. In the following, we will use and .
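The generation procedure above can be sketched in plain Python as follows. The function name and parameter values are ours; also, unlike the connected variant assumed in the text, this sketch does not enforce connectivity of the result:

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring of n nodes, each connected to its k nearest neighbors (k even);
    each node's nearest-clockwise edge is rewired with probability p to a
    uniformly chosen non-neighbor, preserving the total edge count."""
    rng = random.Random(seed)
    edges = set()
    # Regular ring lattice: connect each node to its k/2 clockwise neighbors.
    for u in range(n):
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((u, (u + j) % n)))
    # Rewiring pass over the nearest-clockwise edges.
    for u in range(n):
        if rng.random() < p:
            candidates = [w for w in range(n)
                          if w != u and frozenset((u, w)) not in edges]
            if candidates:
                edges.remove(frozenset((u, (u + 1) % n)))
                edges.add(frozenset((u, rng.choice(candidates))))
    return edges

g = watts_strogatz(n=25, k=4, p=0.2, seed=1)
print(len(g))   # 25 * 4 / 2 = 50 edges: rewiring preserves the edge count
```

Each rewiring removes exactly one existing edge and adds exactly one new edge, so the average degree stays at its original value.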
VI Asymptotic analysis of the number of replicas
We now present an asymptotic analysis, i.e., for very large network graphs, to estimate the optimal number of replicas. We specifically consider an unwrapped Manhattan topology, since it is amenable to analytical modeling. Furthermore, for simplicity we assume a single state.
VI-A Methodology
We consider a unit square as shown in Fig. 2, representing the boundary of an unwrapped Manhattan topology containing nodes, with . Thus, any position within the unit square is associated with a network node, and any line within the unit square represents a routing path across a sequence of nodes in the original topology.
We now assume that the number of replicas is a perfect square, i.e. . The unit square is divided into individual squares, each of them of size and with a center point , where is an index identifying the square, as shown in Fig. 2. Here, denotes the location of the th state replica in the network. We now evaluate the optimal number of replicas that minimizes the total traffic in the topology.
The total traffic is composed of the data traffic and the synchronization traffic, coherently with the cost function in (LABEL:eq:objTotalTraffic). Consider now a given flow . We assume that the traffic demand is routed along a straight line between two points in the square, since this approximates well the stepwise, stair-like routing in the original Manhattan topology, for . The total traffic generated by the flow is where is the corresponding distance of the routing path in terms of hops in the Manhattan topology. The following bound, relating the distance between two points in the unit square and the corresponding routing distance in terms of hops, can be easily shown:
(1) Now recall that a flow from a source node to a destination node must traverse at least one replica , as shown in Fig. 2, in order to affect (or being affected by) the state replica.
We start by evaluating the overall data traffic. We assume uniform traffic between any pair of nodes in the original topology, with a total number of flows equal to and all flows with rate , coherently with Sec. V. Based on (1), we can define the average routing distance as:
(2) where is a constant value less than . Thus, the overall data traffic generated in the network can be computed as the total generated data traffic times the average distance :
(3) where is the average total distance between two randomly generated points in the unit graph passing through the closest replica.
To evaluate , we use a Monte Carlo method. We generate pairs of points with uniform random coordinates in the unit square, namely and for the source and destination nodes respectively, as in Fig. 2. Assume now that the following case holds: the distance between and its closest replica is smaller than the distance between and its closest replica. Then the total distance between and is computed by summing two terms: the distance from to its closest replica , and the distance from that replica to . If the considered case does not hold, the result is identical by symmetry. Fig. 3 shows the average total distance obtained by randomly generating pairs of nodes. When the number of replicas is large, asymptotically approaches 0.5412, coherently with well-known theoretical results [10].
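The Monte Carlo estimation can be sketched as follows, with the replicas placed at the sub-square centers as in the methodology above. Function names, sample sizes, and the choice of Euclidean distance are our own assumptions for illustration:

```python
import math
import random

def avg_detour_distance(k, samples=20_000, seed=0):
    """Monte Carlo estimate of the mean source -> replica -> destination
    distance in the unit square, with k*k replicas at sub-square centers."""
    rng = random.Random(seed)
    centers = [((2 * i + 1) / (2 * k), (2 * j + 1) / (2 * k))
               for i in range(k) for j in range(k)]

    def closest(p):
        return min(centers, key=lambda c: math.dist(p, c))

    total = 0.0
    for _ in range(samples):
        s = (rng.random(), rng.random())
        d = (rng.random(), rng.random())
        cs, cd = closest(s), closest(d)
        # Detour through whichever endpoint lies nearer to its own closest
        # replica (the opposite case is identical by symmetry).
        rep = cs if math.dist(s, cs) <= math.dist(d, cd) else cd
        total += math.dist(s, rep) + math.dist(rep, d)
    return total / samples

# With one replica, every flow detours through the center of the square;
# as k grows, the detour shrinks toward the plain source-destination distance.
print(round(avg_detour_distance(k=1), 3), round(avg_detour_distance(k=5), 3))
```

The estimate decreases monotonically in k and flattens once at least one replica lies near the straight line of almost every flow.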
We now evaluate the overall synchronization traffic between the replicas, by knowing the predefined positions of the replicas in the unit square. The average distance between any two replicas asymptotically approaches 0.5221 as shown in Fig. 3. Thanks to (1), the synchronization traffic between the replicas can be computed as follows:
(4) where the last term accounts for the pairwise synchronization between replicas. Note that is independent of the data traffic.
Property 1
The total traffic for an unwrapped Manhattan topology of size is given by:
(5) where , and both and depend on as shown in Fig. 3.
VI-B Optimal number of replicas and its approximation
We now evaluate numerically (5) and, through a dichotomic search, we find the optimal number of replicas that minimizes . Fig. 4 shows the optimal number of replicas for different values of and .
Note that for higher values of , more replicas are required to cover the network. For higher values of , the number of replicas decreases because of the higher cost in terms of synchronization traffic.
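The dichotomic search over a unimodal total-traffic function can be sketched as follows. The cost function below is an illustrative stand-in with made-up constants, not the actual expression in (5): its data term falls with the number of replicas (shorter detours) while its synchronization term grows with the pairwise factor, which is the trade-off described above.

```python
def total_traffic(k, n_flows=10_000, rate=1.0, sync_rate=5.0):
    """Illustrative stand-in for the total-traffic cost (made-up constants):
    data traffic decreases with the replica count k, synchronization traffic
    grows with the pairwise term k * (k - 1)."""
    data = n_flows * rate * (0.52 + 0.4 / k ** 0.5)  # hypothetical avg distance
    sync = sync_rate * k * (k - 1)
    return data + sync

def argmin_unimodal(f, lo, hi):
    """Dichotomic search for the integer minimizer of a unimodal f on [lo, hi]."""
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) < f(mid + 1):
            hi = mid          # at or past the minimum: search left half
        else:
            lo = mid + 1      # still descending: search right half
    return lo

k_opt = argmin_unimodal(total_traffic, 1, 400)
print(k_opt)   # replica count balancing data and synchronization traffic
```

The search needs only O(log K) cost evaluations, so it remains cheap even when each evaluation of the true cost in (5) involves the Monte Carlo constants.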
The curves in Fig. 4 can be fitted by a function of the following form:
(6) with the fitting parameters. Using a standard least-squares fitting procedure, we numerically evaluated the best fitting parameters and obtained the following claim:
Property 2
The optimal number of replicas in an unwrapped Manhattan topology of size can be approximated as follows
(7) which implies that grows as .
Fig. 5 shows the optimal number of replicas obtained according to (7). As expected, if is small, then the number of replicas is large, and for small networks it corresponds almost to one replica per node. For large values of synchronization traffic (), the number of replicas is kept at the minimum, and 8 replicas are enough for networks with switches. We now evaluate the error introduced by Property 2. We evaluated (i) by solving the optimization problem described in Sec. III, (ii) by computing (7), and (iii) the optimal number of replicas obtained by running PMR. We considered the same uniform traffic pattern described in Sec. V for the unwrapped Manhattan topology. All the results were obtained with 1000 different runs.
Fig. 6 shows the maximum error between and for varying between and . In all cases, the maximum error is bounded by one, i.e., overestimates the optimal number of replicas by at most one. This result shows that the formula in (7) is also a good approximation for small Manhattan networks.
Due to scalability constraints, we could not run the optimal solver to evaluate the error for larger networks. For this reason, we referred to the optimal number of replicas obtained by PMR. Fig. 7 shows the error between and for varying between 9 and 121. Also in this case, the maximum error is bounded by one. Thus, the expression in (7) appears to be a reliable approximation even for larger unwrapped Manhattan topologies.
VII Related works
The Virtual Network Embedding (VNE) problem finds the optimal placement of chains of VNFs under various optimization metrics. VNE can be closely mapped to the problem addressed in this paper, if we consider network functions to be states and chains to be dependency graphs as computed by SNAP. There exist multiple ILP formulations and heuristics for VNE (an extensive survey is available in [9]), some of which are similar to the one proposed in our work. However, to the best of our knowledge, none of them consider the possibility of having replicated virtual functions. As mentioned before, SNAP [1] solves the problem of how to optimally place the states across the network switches, also taking into account the dependencies between states and the traffic flows. However, by design, SNAP is limited to just one replica of each state within the network. Our work extends SNAP by enabling multiple replicas of the same state.
There exist multiple other network programming abstractions [13, 26, 2]. However, most of them limit themselves to keeping the states at the controller, with little existing work exploiting stateful data planes to store states. Instead, NetKAT [17] focuses on stateful data planes and provides native support for replicated states, but, by design, the replicas are placed at the edges (i.e., entry and exit switches) of the network for all in-transit flows. Thus, the placement is not optimized based on the traffic matrix, and our methodology could be directly applied to NetKAT. Moreover, in [17] the synchronization traffic is piggybacked on the data traffic. Thus, not only the synchronization traffic but also the data traffic is required to traverse all state replicas. Our proposal instead decouples data traffic and synchronization traffic, thus allowing more flexibility in the routing.
Swing State [16] introduces a mechanism for state migrations entirely in the data plane but, similarly to SNAP, assumes only a single replica of a state which can be migrated across the network on demand.
VIII Conclusions
We consider stateful data planes in which states can be replicated across multiple switches. We propose an ILP formalization of the optimal placement problem that identifies the optimal placement of the state replicas and the optimal routing for the data and synchronization traffic. To cope with the limited scalability of the ILP solver, we propose the PMR algorithm and show that it approximates well the optimal solution. We also numerically show the beneficial effect of state replication on reducing the overall traffic in the network. Finally, we provide an asymptotic analysis to compute the optimal number of state replicas in unwrapped Manhattan topologies and show its applicability also to small graphs. Our results advocate the adoption of replicated states when the network application is distributed and the states are “global” across multiple switches. Notably, our work is complementary to the works showing the feasibility of implementing replicated states in state-of-the-art programmable data planes.
References
 [1] (2016) SNAP: stateful network-wide abstractions for packet processing. In ACM SIGCOMM.
 [2] (2016) Temporal NetKAT. ACM SIGPLAN Notices 51 (6), pp. 386–401.
 [3] (2017) On-the-fly traffic classification and control with a stateful SDN approach. In IEEE ICC, pp. 1–6.
 [4] (2007) The promise, and limitations, of gossip protocols. ACM SIGOPS Operating Systems Review 41 (5), pp. 8–13.
 [5] (2017) Implementing advanced network functions for datacenters with stateful programmable data planes. In IEEE LANMAN, pp. 1–6.
 [6] (2013) Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN. In ACM SIGCOMM CCR.
 [7] (2012) CAP twelve years later: how the “rules” have changed. Computer 45 (2), pp. 23–29.
 [8] CPLEX Optimizer.
 [9] (2013) Virtual network embedding: a survey. IEEE Communications Surveys & Tutorials 15 (4), pp. 1888–1906.
 [10] (1993) Expected distances between two uniformly distributed random points in rectangles and rectangular parallelepipeds. Journal of the Operational Research Society 44 (5), pp. 513–519.
 [11] (2015) Measuring control plane latency in SDN-enabled switches. In ACM SIGCOMM SOSR.
 [12] (2016) In-band Network Telemetry (INT).
 [13] (2015) Kinetic: verifiable dynamic network control. In USENIX NSDI, pp. 59–72.
 [14] (2015) Software-Defined Networking: a comprehensive survey. Proceedings of the IEEE 103 (1), pp. 14–76.
 [15] (2001) Paxos made simple. ACM SIGACT News.
 [16] (2017) Swing State: consistent updates for stateful and programmable data planes. In ACM SIGCOMM SOSR.
 [17] (2016) Event-driven network programming. In ACM SIGPLAN Notices, Vol. 51, pp. 369–385.
 [18] (2014) In search of an understandable consensus algorithm. In USENIX Annual Technical Conference.
 [19] (1996) Bayou: replicated database services for world-wide applications. In ACM SIGOPS European Workshop, pp. 275–280.
 [20] (2013) Graph partitioning for network problems. In Joint NZSA ORSNZ Conference, pp. 1–10.
 [21] (2007) Survey: graph clustering. Computer Science Review 1 (1), pp. 27–64.
 [22] (2011) Conflict-free replicated data types. In Symposium on Self-Stabilizing Systems, pp. 386–400.
 [23] (2018) LODGE: LOcal Decisions on Global statEs in programmable data planes. In IEEE NetSoft, pp. 257–261.
 [24] (1998) Collective dynamics of ‘small-world’ networks. Nature 393 (6684), p. 440.
 [25] (2013) On scalability of software-defined networking. IEEE Communications Magazine 51 (2), pp. 136–141.
 [26] (2014) NetEgg: programming network policies by examples. In ACM SIGCOMM HotNets, p. 20.

IV Approximation algorithm for single state replication
We address specifically the problem of state replication for a single state variable. To cope with the limited scalability of the ILP solver, we propose the PlaceMultiReplicas (PMR) algorithm, which is computationally scalable and, as shown in Sec. V, approximates well the optimal solution obtained by the ILP solver for small problem instances.
The pseudocode of PMR is given in Algorithm 1. It takes as input the network graph, the state variable, the maximum number of replicas of that state, and the set of flows requiring it. As output, the algorithm returns the routing variables of the data flows and of the state synchronization flows, together with the replica placement variables. The algorithm works in three phases:

Phase 1. The network graph is partitioned into clusters so as to minimize the maximum distance among the elements within each cluster. This makes it possible to distribute the replicas across the whole network in a balanced way, exploiting the spatial diversity offered by each cluster.

Phase 2. In each cluster, a replica is placed in the “most central” node, i.e., the one with the highest betweenness centrality, in order to minimize the data traffic for each flow.

Phase 3. The position of each replica is perturbed at random using a local search to improve the solution obtained in the previous two phases.
Algorithm 1 comprises all the mentioned phases. After the routing and replica placement variables have been initialized (lines 2–4), Phase 1 is executed in line 5 by calling ComputePartitions. This method solves the k-means clustering problem [21] using Lloyd's algorithm [20], in which the node with the highest betweenness centrality is chosen as the center of each partition.
As part of Phase 2 (lines 6–9), within each subgraph the node with the highest betweenness centrality is assigned a state variable replica through NodeWithHighestBC. As a reminder, the betweenness centrality of a node is proportional to the number of shortest paths crossing it.
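For unweighted graphs, betweenness centrality can be computed with Brandes' algorithm. The following Python sketch illustrates this Phase 2 step; the function names, the adjacency-list representation, and the per-cluster selection helper are illustrative assumptions, not the paper's actual code:

```python
from collections import deque, defaultdict

def betweenness(adj):
    """Brandes' algorithm for unweighted graphs. adj: {node: [neighbors]}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate path dependencies.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def replica_per_cluster(adj, clusters):
    """Pick the highest-betweenness node of each cluster as replica host."""
    bc = betweenness(adj)
    return [max(cluster, key=bc.get) for cluster in clusters]
```

On a path graph, for instance, the central node carries the most shortest paths and is therefore selected.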
Lines 11 to 18 implement a local search procedure over a given number of iterations. Within each iteration, RouteFlows is used to route flows through the replica locations identified in Phase 2, following two subpaths: one from the flow source node to the closest replica and one from this replica to the destination node. The procedure works on the set of flows and the locations of the state variable replicas, and returns the routing variables for data flows and for state synchronization, together with the corresponding total traffic in the network. Lines 24 to 40 route the data flows from their source to the destination while traversing the replica yielding the minimum path length among all replicas. For each flow, in lines 26 and 27, the best replica and the path traversing it are initialized. Then, for each replica (lines 28–35), the shortest path through that replica is computed. If its length is less than the previous minimum minDist (line 30), then the current path is stored as the best path and the current replica as the best replica. In lines 36–39, for each edge in the best path, the routing as well as the traffic value is updated. Lines 41 to 49 generate flows from each state replica to all the other state replicas for state synchronization using the shortest path; the synchronization routing variables are updated in line 45 for each edge in the path before the total traffic is updated in line 46. If the resulting traffic is less than the previous minimum, then the minimum traffic value and all the decision variables are updated (lines 14–15). In Phase 3 (line 17), a local search procedure perturbs the existing state replica locations by randomly selecting one node hosting a replica and moving the replica to one of that node's neighbors. This new solution is then compared with the current one (line 13) after the corresponding routing and total traffic have been evaluated.
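The core of RouteFlows can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions (unweighted hop counts, traffic measured as rate times hops, and made-up function names), not the paper's actual implementation: each flow is routed via the replica minimizing the source-replica-destination hop count, and pairwise synchronization flows are added between replicas:

```python
from collections import deque
from itertools import combinations

def sp_tree(adj, src):
    """BFS predecessor tree of shortest paths from src (unweighted graph)."""
    prev = {src: None}
    q = deque([src])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in prev:
                prev[w] = v
                q.append(w)
    return prev

def path(prev, dst):
    """Recover the path root -> dst from a predecessor tree."""
    p = []
    while dst is not None:
        p.append(dst)
        dst = prev[dst]
    return p[::-1]

def route_flows(adj, flows, replicas, sync_rate=1.0):
    """Total traffic when each (src, dst, rate) flow detours through the
    replica minimizing src -> replica -> dst hops, plus pairwise
    replica-to-replica synchronization traffic in both directions."""
    trees = {r: sp_tree(adj, r) for r in replicas}
    total = 0.0
    for src, dst, rate in flows:
        best = min(replicas,
                   key=lambda r: len(path(trees[r], src)) + len(path(trees[r], dst)))
        hops = (len(path(trees[best], src)) - 1) + (len(path(trees[best], dst)) - 1)
        total += rate * hops
    for a, b in combinations(replicas, 2):
        total += 2 * sync_rate * (len(path(trees[a], b)) - 1)
    return total
```

On a 5-node path graph with one central replica, a flow between the endpoints costs 4 hop-units and no synchronization traffic; with two replicas, the pairwise synchronization term is added.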
V Performance comparison
We evaluate the performance of PMR presented in Sec. IV. The local search in PMR runs for a fixed number of iterations. For small problem instances, we also run an ILP solver, coded using the IBM CPLEX optimizer [8], implementing the optimization model in Sec. III. We compute the approximation ratio, i.e., the ratio between the total traffic obtained by PMR and the optimal traffic obtained by the ILP solver. We consider two standard topologies for the network graph:

Unwrapped Manhattan is a square grid.

Watts-Strogatz [24] adds a few long-range links to regular graph topologies to reduce the distances between pairs of nodes and emulate a small-world model. It is generated by taking a ring of nodes, where each node is connected to its nearest neighbors. For each node, the edge connected to its nearest clockwise neighbor is disconnected with a given probability and reconnected to another node chosen uniformly at random over the entire ring. Thus, the final topology maintains the original average degree while remaining connected. In the following, we use fixed values for the neighborhood size and the rewiring probability.
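The ring-rewiring construction described above can be sketched in Python as follows. The function name and parameters are placeholders, and unlike the final topologies used in the evaluation, this sketch does not enforce connectivity; it only preserves the edge count (and hence the average degree) under rewiring:

```python
import random

def watts_strogatz_ring(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbors (k/2 per
    side); each node's edge to its nearest clockwise neighbor is rewired
    with probability p to a uniformly random non-neighbor."""
    rng = random.Random(seed)
    edges = set()
    # Regular ring lattice.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + j) % n)))
    # Rewire each node's nearest clockwise edge with probability p.
    for i in range(n):
        e = frozenset((i, (i + 1) % n))
        if e in edges and rng.random() < p:
            candidates = [j for j in range(n)
                          if j != i and frozenset((i, j)) not in edges]
            if candidates:
                edges.remove(e)
                edges.add(frozenset((i, rng.choice(candidates))))
    return edges
```

Each rewiring removes one edge and adds one, so the total number of edges (n·k/2) is unchanged for any rewiring probability.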
VI Asymptotic analysis for number of replicas
We now present an asymptotic analysis, i.e., for very large network graphs, to estimate the optimal number of replicas. We specifically consider an unwrapped Manhattan topology, since it is amenable to analytical modeling. Furthermore, for simplicity we assume a single state.
VI-A Methodology
We consider a unit square as shown in Fig. 2, representing the boundary of an unwrapped Manhattan topology containing a large number of nodes. Thus, any position within the unit square is associated with a network node, and any line within the unit square represents a routing path across a sequence of nodes in the original topology.
We now assume that the number of replicas is a perfect square. The unit square is divided into equal sub-squares, each with an indexed center point, as shown in Fig. 2. Each center point denotes the location of one state replica in the network. We now evaluate the optimal number of replicas that minimizes the total traffic in the topology.
The total traffic is composed of the data traffic and the synchronization traffic, coherently with the cost function of the optimization model in Sec. III. Consider now a given flow. We assume that its traffic demand is routed in a straight line between two points in the square, since this approximates well the stepwise, stair-like routing in the original Manhattan topology for large networks. The total traffic generated by the flow is proportional to the distance of its routing path in terms of hops in the Manhattan topology. The following bound, relating the distance between two points in the unit square and the corresponding routing distance in terms of hops, can be easily shown:
(1) 
Now recall that a flow from a source node to a destination node must traverse at least one replica, as shown in Fig. 2, in order to affect (or be affected by) the state replica.
We start by evaluating the overall data traffic. We assume uniform traffic between any pair of nodes in the original topology, with the total number of flows and the common flow rate chosen coherently with Sec. V. Based on (1), we can define the average routing distance as:
(2) 
where the proportionality factor is a bounded constant. Thus, the overall data traffic generated in the network can be computed as the total generated data traffic times the average distance:
(3) 
where the average total distance is taken between two randomly generated points in the unit square, passing through the closest replica.
To evaluate this average distance, we use a Monte Carlo method. We generate pairs of points with uniform random coordinates in the unit square, representing the source and destination nodes, as in Fig. 2. Assume first that the distance between the source and its closest replica is smaller than the distance between the destination and its closest replica. The total distance between source and destination is then computed by summing two terms: the distance from the source to its closest replica, and the distance from that replica to the destination. If instead the destination is closer to its own replica, the computation is identical by symmetry. Fig. 3 shows the average total distance obtained by randomly generating pairs of nodes. When the number of replicas is large, it asymptotically approaches 0.5412, coherently with well-known theoretical results [10].
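The Monte Carlo procedure above can be sketched as follows, with replicas placed at the centers of a k x k grid of sub-squares and Euclidean distances used per the straight-line routing approximation. The sample count, grid size, and function names are illustrative choices, not the paper's setup:

```python
import math
import random

def _dist(a, b):
    """Euclidean distance between two points in the unit square."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def detour(s, t, replicas):
    """Length of the route s -> replica -> t, entering at the replica
    closest to whichever endpoint is nearer to its own closest replica
    (the symmetric case analysis described in the text)."""
    cs = min(replicas, key=lambda c: _dist(s, c))
    ct = min(replicas, key=lambda c: _dist(t, c))
    if _dist(s, cs) <= _dist(t, ct):
        return _dist(s, cs) + _dist(cs, t)
    return _dist(t, ct) + _dist(ct, s)

def avg_detour_distance(replicas, samples=20000, seed=1):
    """Monte Carlo estimate of the mean source->replica->destination
    distance for uniform random pairs in the unit square."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        s = (rng.random(), rng.random())
        t = (rng.random(), rng.random())
        acc += detour(s, t, replicas)
    return acc / samples

# Example: replicas at the centers of a k x k grid of sub-squares.
k = 4
centers = [((i + 0.5) / k, (j + 0.5) / k) for i in range(k) for j in range(k)]
estimate = avg_detour_distance(centers)
```

With a moderate number of replicas the estimate already lies in the ballpark of the asymptotic value reported in the text.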
We now evaluate the overall synchronization traffic between the replicas, given the predefined positions of the replicas in the unit square. The average distance between any two replicas asymptotically approaches 0.5221, as shown in Fig. 3. Thanks to (1), the synchronization traffic between the replicas can be computed as follows:
(4) 
where the last term accounts for the pairwise synchronization between replicas. Note that the synchronization traffic is independent of the data traffic.
Property 1
The total traffic for an unwrapped Manhattan topology is given by:
(5) 
where both average distances depend on the number of replicas, as shown in Fig. 3.
VI-B Optimal number of replicas and its approximation
We now evaluate (5) numerically and, through a dichotomic search, find the optimal number of replicas that minimizes the total traffic. Fig. 4 shows the optimal number of replicas for different values of the network size and the synchronization rate.
Note that for larger networks, more replicas are required to cover the network, whereas for higher synchronization rates the number of replicas decreases because of the higher cost in terms of synchronization traffic.
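The dichotomic search over the total-traffic curve can be sketched as below. The cost model is purely illustrative (constants A and B are hypothetical), mirroring a data term that shrinks as more replicas are added and a synchronization term that grows with the number of replica pairs; the search itself assumes the curve is unimodal, as convexity of both terms guarantees:

```python
import math

def argmin_unimodal(f, lo, hi):
    """Dichotomic (ternary) search for the integer minimizer of a
    unimodal function f on the interval [lo, hi]."""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            hi = m2  # the minimum lies in [lo, m2]
        else:
            lo = m1  # the minimum lies in [m1, hi]
    return min(range(lo, hi + 1), key=f)

def total_traffic(r, A=100.0, B=0.05):
    """Illustrative total-traffic model: data traffic decreasing in the
    number of replicas r, synchronization traffic growing as r*(r-1)."""
    return A / math.sqrt(r) + B * r * (r - 1)

r_opt = argmin_unimodal(total_traffic, 1, 200)
```

Increasing B (a more expensive synchronization) shifts the minimizer toward fewer replicas, matching the trade-off described above.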
The curves in Fig. 4 can be fitted by a function of the following form:
(6) 
with the corresponding fitting parameters. Using a standard least-squares fitting procedure, we numerically evaluated the best fitting parameters and obtained the following claim:
Property 2
The optimal number of replicas in an unwrapped Manhattan topology can be approximated as follows:
(7) 
which characterizes how the optimal number of replicas grows with the network size.
Fig. 5 shows the optimal number of replicas obtained according to (7). As expected, if the synchronization rate is small, then the number of replicas is large and, for small networks, corresponds almost to one replica per node. For large values of synchronization traffic, the number of replicas is kept at the minimum, and 8 replicas are enough for the largest considered networks. We now evaluate the error introduced by Property 2. We computed the optimal number of replicas (i) by solving the optimization problem described in Sec. III, (ii) by evaluating (7), and (iii) by running PMR. We considered the same uniform traffic pattern described in Sec. V for the unwrapped Manhattan topology. All the results were obtained with 1000 different runs.
Fig. 6 shows the maximum error between the approximation in (7) and the optimal number of replicas for small network sizes. In all cases, the maximum error is bounded by one, i.e., (7) overestimates the optimal number of replicas by at most one. This result shows that the formula in (7) is a good approximation also for small Manhattan networks.
Due to scalability constraints, we could not run the optimal solver to evaluate the error for larger networks. For this reason, we compared against the optimal number of replicas obtained by PMR. Fig. 7 shows the error between the approximation in (7) and the PMR result for network sizes varying between 9 and 121 nodes. Also in this case, the maximum error is bounded by one. Thus, the expression in (7) appears to be a reliable approximation even for larger unwrapped Manhattan topologies.
VII Related works
The Virtual Network Embedding (VNE) problem finds the optimal placement of chains of VNFs while optimizing various metrics. VNE can be closely mapped to the problem addressed in this paper if we consider network functions to be states and chains to be dependency graphs as computed by SNAP. There exist multiple ILP formulations and heuristics for VNE (an extensive survey is available in [9]), some of which are similar to the one proposed in our work. However, to the best of our knowledge, none of them consider the possibility of having replicated virtual functions. As mentioned before, SNAP [1] solves the problem of how to optimally place the states across the network switches, taking into account also the dependency between states and the traffic flows. However, by design, SNAP is limited to just one replica of each state within the network. Our work extends SNAP by enabling multiple replicas of the same state.
There exist multiple other network programming abstractions [13, 26, 2]. However, most of them keep the states at the controller, with little existing work exploiting stateful data planes to store states. Instead, NetKAT [17] focuses on stateful data planes and provides native support for replicated states but, by design, the replicas are placed at the edges (i.e., entry and exit switches) of the network for all in-transit flows. Thus, the placement is not optimized based on the traffic matrix, and our methodology could be directly applied to NetKAT. Moreover, in [17] the synchronization traffic is piggybacked on the data traffic, so that not only the synchronization traffic but also the data traffic is required to traverse all state replicas. Our proposal instead decouples data traffic from synchronization traffic, thus allowing more flexibility in the routing.
Swing State [16] introduces a mechanism for state migrations entirely in the data plane but, similarly to SNAP, assumes only a single replica of a state which can be migrated across the network on demand.
VIII Conclusions
We consider stateful data planes in which states can be replicated across multiple switches. We propose an ILP formalization of the optimal placement problem that identifies the optimal placement for the state replicas and the optimal routing for the data and synchronization traffic. To cope with the limited scalability of the ILP solver, we propose the PMR algorithm and show that it approximates well the optimal solution. We also show numerically the beneficial effect of state replication on reducing the overall traffic in the network. Finally, we provide an asymptotic analysis to compute the optimal number of state replicas in an unwrapped Manhattan topology and show its applicability also to small graphs. Our results advocate the adoption of replicated states when the network application is distributed and the states are “global” across multiple switches. Notably, our work is complementary to the works showing the feasibility of implementing replicated states in state-of-the-art programmable data planes.
References
 [1] (2016) SNAP: stateful network-wide abstractions for packet processing. In ACM SIGCOMM. ISBN 9781450341936. Cited by: §II, §III, §VII.
 [2] (2016) Temporal NetKAT. ACM SIGPLAN Notices 51 (6), pp. 386–401. Cited by: §VII.
 [3] (2017) On-the-fly traffic classification and control with a stateful SDN approach. In IEEE ICC, pp. 1–6. ISSN 1938-1883. Cited by: §I.
 [4] (2007) The promise, and limitations, of gossip protocols. ACM SIGOPS Operating Systems Review 41 (5), pp. 8–13. Cited by: §II-A.
 [5] (2017) Implementing advanced network functions for datacenters with stateful programmable data planes. In LANMAN, pp. 1–6. Cited by: §I.
 [6] (2013) Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN. In ACM SIGCOMM CCR. Cited by: §I.
 [7] (2012) CAP twelve years later: How the “rules” have changed. Computer 45 (2), pp. 23–29. Cited by: §II-A.
 [8] CPLEX Optimizer. Cited by: §V.
 [9] (2013) Virtual network embedding: a survey. IEEE Communications Surveys & Tutorials 15 (4), pp. 1888–1906. Cited by: §VII.

 [10] (1993) Expected distances between two uniformly distributed random points in rectangles and rectangular parallelepipeds. Journal of the Operational Research Society 44 (5), pp. 513–519. Cited by: §VI-A.
 [11] (2015) Measuring control plane latency in SDN-enabled switches. In ACM SIGCOMM SOSR. Cited by: §I.
 [12] (2016) In-band Network Telemetry (INT). Cited by: §I.
 [13] (2015) Kinetic: verifiable dynamic network control. In USENIX NSDI 15, pp. 59–72. Cited by: §VII.
 [14] (2015) Software-Defined Networking: a comprehensive survey. Proceedings of the IEEE 103 (1), pp. 14–76. Cited by: §I.
 [15] (2001) Paxos made simple. ACM SIGACT News. Cited by: §II-A.
 [16] (2017) Swing State: consistent updates for stateful and programmable data planes. In ACM SIGCOMM SOSR. Cited by: §VII.
 [17] (2016) Event-driven network programming. In ACM SIGPLAN Notices, Vol. 51, pp. 369–385. Cited by: §II, §VII.
 [18] (2014) In search of an understandable consensus algorithm. In USENIX Annual Technical Conference. Cited by: §II-A.
 [19] (1996) Bayou: replicated database services for world-wide applications. In ACM SIGOPS European workshop, pp. 275–280. Cited by: §II-A.
 [20] (2013) Graph partitioning for network problems. In Joint NZSA ORSNZ Conference, pp. 1–10. Cited by: §IV.
 [21] (2007) Survey: graph clustering. Computer Science Review 1 (1), pp. 27–64. ISSN 1574-0137. Cited by: §IV.
 [22] (2011) Conflict-free replicated data types. In Symposium on Self-Stabilizing Systems, pp. 386–400. Cited by: §II-A.
 [23] (2018) LODGE: LOcal Decisions on Global statEs in programmable data planes. In IEEE NetSoft, pp. 257–261. Cited by: §I, §II-A.
 [24] (1998) Collective dynamics of ‘small-world’ networks. Nature 393 (6684), pp. 440. Cited by: 2nd item.
 [25] (2013) On scalability of software-defined networking. IEEE Communications Magazine 51 (2), pp. 136–141. Cited by: §I.
 [26] (2014) NetEgg: programming network policies by examples. In ACM SIGCOMM HotNets, p. 20. Cited by: §VII.