Exploiting Parallelism in Optical Network Systems: A Case Study of Random Linear Network Coding (RLNC) in Ethernet-over-Optical Networks

07/10/2017
by   Anna Engelmann, et al.

As parallelism becomes critically important in semiconductor technology, high-performance computing, and cloud applications, parallel network systems will increasingly follow suit. Today, parallelism is an essential architectural feature of 40/100/400 Gigabit Ethernet standards, whereby high speed Ethernet systems are equipped with multiple parallel network interfaces. This creates new network topology abstractions and new technology requirements: instead of a single high capacity network link, multiple Ethernet end-points and interfaces need to be considered together with multiple links in the form of discrete parallel paths. This new paradigm enables implementations of various new features to improve overall system performance. In this paper, we analyze the performance of parallel network systems with network coding. In particular, by using random linear network coding (RLNC), a code that allows recoding without prior decoding, we can make use of the fact that we have codes that are both distributed (removing the need for coordination or optimization of resources) and composable (without the need to exchange code information), leading to a fully stateless operation. We propose a novel theoretical modeling framework, including derivation of the upper and lower bounds as well as an expected value of the differential delay of parallel paths, and the resulting queue size at the receiver. The results show great promise for network system parallelism in combination with RLNC: with a proper set of design parameters, the differential delay and the buffer size at the Ethernet receiver can be reduced significantly, while the cross-layer design and routing can be greatly simplified.


I Introduction

The high-speed Ethernet standard IEEE802.3 specifies that the 40/100/400Gb/s Ethernet traffic can be packetized and distributed over multiple parallel lanes (e.g., 10 lanes x 10Gb/s, for 100Gb/s), also referred to as Multi-Lane Distribution (MLD) [1]. Each lane can then be mapped onto parallel optical channels for transmission in Optical Transport Networks (OTN). Thus, each Ethernet end point is attached to the network through multiple parallel logical points of attachment, which can be dynamically configured, enabling a functional decomposition of the overall system and new topological abstractions where multiple end points need to be mapped to multiple paths in the network. Other network systems already support parallelism as well: as a concept, parallelism extends from networking to both commodity hardware and modern cloud computing, such as multi-core parallel architectures and Hadoop MapReduce. As the foundational capabilities for parallel network systems mature, they carry the potential to become the driving engine for multiple facets of today's information technology infrastructure, including physical layer security, scalable data center architectures, and network load balancing.

Parallel network systems are, however, more complex than corresponding systems with serial connections and single end-system interfaces. The complexity needs to be evaluated against the benefits of parallelization. Another issue is performance, including packet skew and delay. The skew of data packets occurs due to the diversity of parallel links, which in Ethernet receivers requires the so-called de-skewing via buffering. As the standard IEEE802.3 defines a maximum skew of 180ns per Ethernet lane to eliminate or reduce retransmissions or dropping of unrecoverable Ethernet frames, keeping the skew within bounds is critical. The challenge of delay and packet processing requires methods for reducing the data coding overhead in different ISO/OSI layers. Today, each layer has its own representation of coding, e.g., source coding or coded storage. A simple distributed coding over the Ethernet layer and the optical layer, for instance, can eliminate the delay and complexity caused by mapping different coding schemes for different purposes and at different layers. Hence, parallelism, coding and routing require different thinking about cross-layer system engineering.

This paper sets out to explore the potential of parallelism in future network systems in combination with simple and unified coding in multiple layers, in order to address system performance, cross-layer design and network resource utilization problems. We choose random linear network coding (RLNC) as the unified coding scheme between the Ethernet and optical (physical) layers. RLNC is known for allowing recoding without prior decoding, which results in two main advantages: it eliminates the need i) for cross-layer coordination or optimization of resources and ii) for the exchange of code information. We take the following approach to system analysis and modeling. A serial data traffic is split into discrete parallel parts (frames) in the electronic end-system (Ethernet), which are assigned to several parallel optical interfaces and paths in the optical network. A distributed and composable cross-layer coordination is based on RLNC, which can be employed in both layers. In this system, we analytically derive the expected values of the differential delay in a generic parallel network system, whereby the number of parallel paths in the network may be equal to or larger than the number of parallel Ethernet lanes in the end-system. We furthermore derive the upper and lower bounds on the resulting queue size at the receiver, with and without RLNC. We analyze and compare networks with optimal routing (without RLNC) and random routing (with RLNC), and show that RLNC can significantly reduce, and even eliminate, the path computation complexity and the need for optimal routing. The theoretical results are validated by simulations, and show that the required storage at the receiver in the parallel Ethernet system with RLNC is always smaller than in an Ethernet-over-optical system without RLNC, irrespective of the routing scheme. We also show that by carefully selecting the level of parallelism in the Ethernet and optical network systems, the cross-layer interactions can be designed in a simple and practical fashion.

The rest of the paper is organized as follows. Section II discusses prior art and summarizes our contribution. Section III presents the parallel network system model. Section IV focuses on modeling of the expected differential delay and derives lower and upper bounds on skew queues at the receiver. Section V shows analytical and simulation results. Conclusions are drawn in Section VI.

II Related work and our contribution

From the functional perspective, some aspects of previous work bear resemblance to our concept of multichannel transmission and multi-interface nodes, but none of the previous work is directly applicable without critical consideration. Various signal multiplexing technologies, for example, Wavelength Division Multiplexing (WDM), Polarization Division Multiplexing (PDM), Space Division Multiplexing (SDM) or elastic optical Orthogonal Frequency Division Multiplexing (OFDM), differ in the ways they realize physical links and impact the system design differently. Given the broad range of related topics, this section reviews those aspects of the state of the art that we find relevant to the idea of parallelism in optical network systems, whereby specific parts of it, like RLNC, are to be seen as tools used in combination with, and not as solutions for, network system parallelism.

II-A Multilane Distribution in High-Speed Ethernet

The 100/400 GE standards in IEEE 802.3 [1, 2] define Multiple Lane Distribution (MLD) systems with parallel interfaces. In these systems, high-speed Ethernet signals are distributed onto multiple lanes in a round robin fashion, with data rates at 40/100 Gbps perfectly compatible with optical channel rates in Optical Transport Networks (OTNs) [3, 4]. It should be noted that MLD in high-speed Ethernet defines cross-layer system requirements different from inverse multiplexing. The latter technique is standardized in IEEE802.3 as the so-called Link Aggregation, supported by the Link Aggregation Control Protocol (LACP). In optical transport network (OTN) and synchronous optical network (SONET) standards, inverse multiplexing is also defined, as Virtual Concatenation (VCAT). In fact, MLD does not require inverse multiplexing techniques, despite their proven ability to implement dynamic skew compensation mechanisms, as studied in [5] and [6]. Instead, MLD enables parallel lanes to be mapped to parallel interfaces and channels in the optical layer, allowing the implementation and management of cross-layer interactions, similar to what has been shown for Layers 3 and above [7]. Past work also used Layer 2 switching concepts, particularly OpenFlow, in conjunction with multipath TCP (MPTCP) [8].

II-B Parallelism vs. multipath routing

Our approach utilizes concepts similar to those known from multipath routing in Layer 3, but extends them to networks with multiple end-system interfaces. The number of network end-points can be dynamically configured, which creates not only new network abstractions, but also new routing optimization problems, since the number of end-points is usually matched to the number of routes in the network. In high-speed Ethernet systems, frames are distributed onto multiple lanes (4 or 10 lanes) in a round robin fashion, with data rates perfectly compatible with the corresponding number of optical channel rates in Optical Transport Networks (OTNs) [3, 4].

In the optical layer, in elastic (cognitive, flexible) optical networks, optical transponders (OTP) have evolved from fixed or mixed-line-rate (MLR) to bandwidth-variable (BV) to sliceable-BV [9, 10] to serve various low-capacity (e.g., 40G) and high-capacity (e.g., 400G) demands. Sliceable bandwidth-variable transponders (SBVTs), also called multiflow optical transponders (MF-OTP), map traffic flows coming from the upper layers (e.g., Internet flows) to multiple optical flows. Moreover, recent work showed that the parameters of multiflow optical transponders can be software programmed [11]. These systems are different from parallel network systems, since the Internet traffic flows cannot be dynamically configured to map onto the optical routes, for instance with software-defined routers.

It should be noted that prior art proposed optimizations to compute multiple paths. In contrast, our approach abandons the idea of routing optimization, for reasons of complexity and also because current multipath routing algorithms cannot be used in network topologies with multiple links between nodes. The latter requires a different approach to path analysis, for which we use a combinatorial path analysis. We also show that parallelism can simplify or eliminate routing when used in combination with random linear network coding, which is a significant result.

II-C Random Linear Network Coding (RLNC)

Previous work on linear network coding focused in general on improving network throughput and reliability. However, a significant body of work in the last decade (e.g., [12, 13, 14, 15, 16]) used network coding to improve end-to-end delay in delay-constrained networks in broadcast and unicast scenarios. In [12], for instance, the delay performance of network coding was studied and compared to scheduling methods. Lucani et al. [13] tailored coding and feedback to reduce the expected delay. Paper [14] studied the problem of minimizing the mean completion delay for instantly decodable network coding. In [15, 16], the authors showed that network coding can outperform optimal routing in a single unicast setting. More recent works, like [17], presented a streaming code that uses forward error correction to reduce in-order delivery delay over multiple parallel wireless networks. However, none of these works addresses delay in parallel network systems.

Network-coded multipath routing has been applied for erasure correction [18], where the combined information from multiple paths is transferred on a few additional (parallel) paths. The additional information was used to recover the missing information during decoding.

In optical networks, our previous work [19] proposed for the first time a network coded parallel transmission scheme for high-speed Ethernet using multipath routing. Paper [20] focused on enabling parallel transmission by linear network coding without consideration of data link layer technology. In [21], we presented a preliminary theoretical model to achieve fault tolerance by using 2-parallel transmission and RLNC to achieve better spectral efficiency in the optical layer. Finally, in [22], we showed that utilizing RLNC significantly improves reliability and security in parallel optical transmission systems.

Our cross-layer approach can generally be based on any symbol-based MDS (Maximum Distance Separable) code; we decided to use random LNC as a tool owing to the fact that it allows decoupling between code selection and transmission architecture. RLNC encoding and decoding can be performed in a parallel fashion [23], whereas structured MDS codes are generally difficult to encode or decode in a multithreaded fashion. The distributed nature of the RLNC code construction removes the need for cross-layer coordination when combined with parallelization. Different parallel channels may construct their own codes, and further parallelization takes place with the use of a single coding approach, without the need for state awareness across parallel paths. Our choice of RLNC was moreover motivated by the potential use of its recoding capability and the design of a unified code for a network system in a cross-layer fashion. The composability feature underlies the ability to have cross-layer operation without the need to exchange state or other code-related information across layers. Each layer may, or may not, introduce its own coding, composing upon the coding of other layers. Even after repeated coding (recoding), the effect upon the data is that of a single linear transformation, requiring no decoding in the network, but only a single linear inversion. This makes the combination of RLNC and parallelism especially promising. Note also that RLNC may lend itself to a hybrid use where some of the data may be transmitted uncoded. We do not present that scenario explicitly.

II-D Our contribution

This paper builds on our preliminary works [19, 20]. In extension of the preliminary work, this paper provides

  • Derivation of the expected value of the differential delay in arbitrary networks, and between any pair of arbitrary nodes connected with multiple parallel links, enabling routingless (or randomly routed) network operation, in cases 1) without coding, 2) with RLNC and 3) with coding redundancy;

  • Derivation of the occurrence probability of the maximum possible differential delay, including cases where the network contains multiple links and paths with maximal and/or minimal possible delay, which is a case study of practical relevance;

  • A new theoretical framework for queue analysis in end-systems, including the derivation of a closed form of the expected buffer size at the receiver, with and without RLNC, and for an arbitrary distribution of path delays;

  • Analysis of the impact of coding redundancy and the level of parallelism on the network performance and on buffer sizing at the receiver.

III System Model

Fig. 1: A parallel network system architecture. (The figure labels the number of parallel flows (sub-flows, or lanes) and the number of utilized optical paths in the network; each pair of nodes is connected with 10 parallel interfaces and links in the network.)

III-A Background

Fig. 2: Multi-lane Ethernet-over-optical network system. (a) Traffic distribution in source; (b) Deskew buffer from [1].

Fig. 1 illustrates the envisioned parallel network system architecture. At the sender, a serial flow of high-speed data units is split into a number of parallel flows (sub-flows, or lanes), whereby each parallel flow is then independently routed over separate optical channels. Depending on the application, the number of electronic lanes can be perfectly matched to the number of optical channels, whereby the aggregated capacity in the optical layer is always greater than or equal to that in the electronic layer. Fig. 1 also illustrates how special functions can be added to each parallel channel, such as adaptive coding features. The input data units (e.g., packets, frames) are encoded using an RLNC encoder. The number of parallel lanes and RLNC are related as follows: data units from parallel flows encoded with the same set of coding coefficients are called a generation, while the number of resulting parallel flows after decoding is defined as the generation size. We refer to the number of resulting parallel lanes as the level of parallelism in the end-system. The network topology is such that every node is connected with other nodes over multiple parallel links (here: 10 parallel links between any pair of nodes). The number of encoded data units is generally equal to the number of parallel paths and interfaces allocated in the network, whereby in our example the source node uses 6 out of 10 parallel interfaces to set up 6 parallel paths. If the number of allocated paths exceeds the number of lanes, as we illustrate here, we refer to the surplus as redundancy. The decoder starts decoding as soon as enough data units from one generation have arrived, i.e., as many as there are lanes. We envision that forwarding nodes in the middle of the network can perform additional recoding of optical flows to the same destination, as shown in Fig. 1 (dashed line). With recoding, it is possible to insert additional redundancy and so increase fault tolerance, without decoding.

Fig. 2 shows a typical multi-lane 40Gb/s Ethernet-over-optical network system [1]. In the transmitter, a high speed stream of serial Ethernet frames is split into data blocks of 64b, encoded with the 64b/66b line code in the Physical Coding Sublayer (PCS), and distributed over virtual Ethernet lanes. For identification of the lane ordering at the receiver, specific alignment markers are periodically inserted in each lane. After that, the 10GE Ethernet lanes are mapped to four optical data units (ODU), here of type 2. The ODU2e method enables the transparent 10GE mapping using an over-clocking approach, whereas extended Generic Framing Procedure (GFP) principles are applied. The ODU signals are then modulated on four optical carriers and transmitted over four optical channels (Path 1, 2, 3, 4). In general, the number of Ethernet virtual lanes and the number of allocated optical paths do not need to be equal. However, in our model, the OTN concept is assumed to generally map data streams from Ethernet lanes into ODU2-v or ODU2e-v containers.

A simplified architecture of the receiver, also according to IEEE802.3ba, is shown in Fig. 2. Here, the PCS layer processes the received 66b data blocks to retrieve the original Ethernet frame, which requires multiple processing entities, including lane block synchronization, lane deskew and reorder, alignment removal, etc. (not all shown here). To illustrate this, let us assume that path 3 is the shortest path in terms of delay. To compensate for the resulting inter-lane skew, the data blocks from path 3 must be buffered in the receiver until data from the longer paths arrive, i.e., paths 1, 2, and 4. For this purpose, the receiver implements the so-called deskewing and reordering. The deskew function of the PCS is implemented with input FIFO queues that store data blocks until the alignment markers of all lanes are received and synchronized. This allows the scheduler to start the lane identification, alignment removal and the reordering to form the original data stream.
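The de-skewing step can be illustrated with a minimal sketch. It assumes hypothetical path delays expressed in block transmission times, one block arriving per lane per time unit, and a helper name (deskew_backlog) that is ours and not part of IEEE802.3. Under these assumptions, each lane has to hold as many blocks as its path is faster than the slowest path.

```python
def deskew_backlog(path_delays_tu):
    """Blocks each lane must hold until the slowest lane delivers its first block.

    Assumes one block per lane per time unit (tu) and integer delays in tu.
    """
    slowest = max(path_delays_tu)
    return [slowest - delay for delay in path_delays_tu]


# Hypothetical delays for paths 1..4; path 3 is the shortest, as in the example in the text.
delays = [5, 7, 3, 9]
backlog = deskew_backlog(delays)
print(backlog)       # [4, 2, 6, 0]
print(sum(backlog))  # 12 blocks buffered in total across the four lane FIFOs
```

The lane on the shortest path carries the largest backlog, which is why the differential delay between the fastest and slowest path drives the buffer size.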

Let us now focus on the receiver design with RLNC. A nice feature of RLNC is that it can be implemented without altering the system architecture presented, see the dashed box in Fig. 2. The coding process is illustrated in Fig. 3 in more detail. Let us assume an Ethernet frame that is split into data blocks and then encoded with the 64b/66b line code (Footnote 1: The line code is not to be mistaken for RLNC. The latter is performed over the last 64b after the sync header.), where we introduce the notation 64b/64b+2b to differentiate between the 64 data bits and the 2 bits of the sync header according to [1]. The data blocks are then distributed over the virtual PCS lanes so that each sub-flow on a defined virtual (parallel) lane contains the same number of data blocks (Footnote 2: In our model of traffic splitting, we assume bit padding in case a data block, or symbol, is incomplete.).

Assuming the RLNC coding process is based on a fixed symbol size, and since blocks contain 64 data bits excluding the 2 synchronization bits, each coded data block contains a fixed number of symbols. All symbols from each parallel lane that belong to parallel data blocks are simultaneously encoded with the same set of RLNC coefficients (a generation), while the number of resulting encoded data blocks is defined as the generation size. In Fig. 3, each Ethernet frame is thus encoded into several generations, while each generation is extended by two redundant blocks, resulting in a correspondingly larger generation size. The 2 sync header bits of each data block bypass the RLNC encoder and are added after coding as a header in the form 64b/64b+2b+Cb, where C is an additional ID-header.

At the receiver, the reference deskew and reorder model with RLNC is shown in Fig. 4, where transmission takes place over parallel paths as in the traditional Ethernet system (Fig. 2), i.e., without redundancy. Later (Fig. 5), we discuss the system implemented with coding redundancy. The distributed line buffer of the Ethernet system is now organized as a centralized decoding buffer consisting of multiple virtual output queues (VOQ), whereby a new VOQ is created for each generation, each time the first data block of a new generation arrives. The decoder checks the existing VOQs for complete generations, and starts decoding as soon as one generation is complete, whereby all data blocks of a complete generation are decoded in parallel by running Gaussian elimination. Thus, the parallel decoding replaces the lane-specific deskewing approach of the Ethernet system by taking advantage of the multiplexing gain due to the centralized buffer. After decoding, the data blocks are sent in the correct order, thus eliminating the need for reordering. This is due to the fact that data blocks are decoded in parallel, while a correct assignment of decoding coefficients to encoded data blocks assures the right order of data blocks. As a result, decoded data blocks only need to be serialized.
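The encode and decode cycle described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the paper: symbols are assumed to be 8 bits wide (field GF(2^8) built with the primitive polynomial 0x11d), a generation has 4 source blocks and 6 coded blocks (two redundant), and all function names are hypothetical. The decoder runs Gauss-Jordan elimination on any 4 of the 6 coded blocks; if the drawn coefficients happen to be linearly dependent, the sender simply draws a new set.

```python
import random

# GF(2^8) antilog/log tables; primitive polynomial 0x11d is an assumed choice.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]  # a must be non-zero


def encode(blocks, n_coded, rng):
    """Return n_coded (coefficients, payload) pairs for one generation of source blocks."""
    coded = []
    for _ in range(n_coded):
        coeffs = [rng.randrange(256) for _ in blocks]
        payload = [0] * len(blocks[0])
        for c, block in zip(coeffs, blocks):
            for j, symbol in enumerate(block):
                payload[j] ^= gf_mul(c, symbol)  # addition in GF(2^8) is XOR
        coded.append((coeffs, payload))
    return coded


def decode(received, k):
    """Gauss-Jordan elimination on [coefficients | payload]; None if rank < k."""
    rows = [list(c) + list(p) for c, p in received]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, w) for v, w in zip(rows[r], rows[col])]
    return [row[k:] for row in rows[:k]]


rng = random.Random(7)
source = [[rng.randrange(256) for _ in range(8)] for _ in range(4)]
while True:
    coded = encode(source, 6, rng)
    recovered = decode(coded[2:], 4)  # pretend only the last 4 coded blocks arrived
    if recovered is not None:         # otherwise redraw the coefficients
        break
print(recovered == source)
```

Because the rows of the reduced matrix come out in the order of the source blocks, the decoded blocks only need to be serialized, which matches the observation that no separate reordering step is required.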

For successful decoding, all data blocks from the same generation need to be uniquely identified. This can be implemented using additional bits in the header to form 72b coded blocks. Each data block of a generation is identified by the same number. At the receiver, the identifier is processed by the scheduler and addresses the correct VOQ. Since the ODU2 payload includes 15232 bytes, after a certain number of sequentially arrived data blocks per lane, the identifier wraps around and the same VOQ is overwritten with a new generation. This bounds the maximum delay difference between lanes that can be compensated, measured in time units (tu), where one tu is the transmission time of a data block. For instance, at 10Gbit/s the resulting maximum delay difference is far larger than the skew bound required for Ethernet systems, and thus would be unacceptable. (Footnote 3: Note that the 72b=64b/64b+2b+6b block increases the line rate, which requires the development of an efficient GFP method in the OTN layer.) A sensible approach to address this is to reuse the Ethernet inherent alignment marker process without adding additional ID-header bits, which is also in line with the existing standards. After receiving the first alignment marker, the corresponding lane is marked as reference, the scheduler initializes the VOQ for the first generation, and for each following block of the same lane the next VOQ is generated. In fact, this allows a large number of different VOQs to be addressed, and correspondingly large delay differences to be compensated. To limit the buffer space, however, similar to the method with ID-header, we could cyclically overwrite the VOQs after receiving a fixed number of data blocks on the initializing lane. (Footnote 4: The data blocks extracted from an ODU container or its sync header may be erroneous, despite the existence of FEC in the optical layer. In case of a single block error, the RLNC decoding of one generation will fail. If an alignment marker is erroneous, all data blocks received on the associated lane may be sorted to the wrong VOQ, resulting in decoding errors. This error process will be stopped with the arrival of the next marker and, thus, the re-initialization of the VOQ mapping. At the same time, this issue is not different from erroneous packet handling in the conventional Ethernet system.)
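The identifier-based VOQ addressing can be sketched as a small data structure. This is a simplified illustration: the class and method names are hypothetical, the identifier is taken as given per block, and identifier wrap-around as well as late redundant blocks of an already released generation are not handled.

```python
from collections import defaultdict


class DecodingBuffer:
    """Centralized buffer with one virtual output queue (VOQ) per generation."""

    def __init__(self, blocks_needed):
        self.blocks_needed = blocks_needed  # blocks required before decoding can start
        self.voqs = defaultdict(list)       # generation identifier -> queued blocks

    def push(self, generation_id, block):
        """Queue a block; return the generation once it is complete, else None."""
        queue = self.voqs[generation_id]    # creates the VOQ on first arrival
        queue.append(block)
        if len(queue) >= self.blocks_needed:
            return self.voqs.pop(generation_id)
        return None

    def occupancy(self):
        """Number of blocks currently waiting in all VOQs."""
        return sum(len(queue) for queue in self.voqs.values())


buf = DecodingBuffer(blocks_needed=2)
# Two lanes with one tu of skew: lane "a" delivers generations 0 and 1 before lane "b".
arrivals = [(0, "a0"), (1, "a1"), (0, "b0"), (1, "b1")]
released = [buf.push(gen, block) for gen, block in arrivals]
print(released)         # [None, None, ['a0', 'b0'], ['a1', 'b1']]
print(buf.occupancy())  # 0
```

A generation leaves the buffer as soon as enough of its blocks have arrived, independently of which lanes delivered them, which is the multiplexing gain of the centralized buffer mentioned earlier.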

The coding coefficients required for encoding and decoding can be selected and distributed by the control plane using an out-of-band signaling channel. Alternatively, an in-band signaling method can be used by applying transcoding, as specified in IEEE802.3bj. For example, with 4 PCS lanes, the 256b/257b transcoding enables us to transmit 35 additional bits after sending 5 blocks of 256b/257b (or after 20 blocks of 64b/66b) per lane (in total, 140 additional bits serially) without increasing the 10GE line rate, which is in line with the Ethernet and OTN standards. Thus, coding vectors each of length 8b can be sent on each lane every 20 64b/66b blocks to the destination, i.e., 20 successive generations are coded with the same coding vector.
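The bit budget quoted for in-band signaling follows from simple arithmetic, reproduced here as a short check. The variable names are ours; the only inputs are the block sizes and the 4 PCS lanes given in the text.

```python
# Bit budget per lane for in-band signaling via 256b/257b transcoding.
BLOCKS_64B66B = 20                      # 64b/66b blocks per lane in one signaling period
LANES = 4                               # PCS lanes, as in the example in the text

bits_before = BLOCKS_64B66B * 66        # 20 blocks of 66 bits each
transcoded_blocks = BLOCKS_64B66B // 4  # one 256b/257b block replaces four 64b/66b blocks
bits_after = transcoded_blocks * 257

spare_bits_per_lane = bits_before - bits_after
spare_bits_total = LANES * spare_bits_per_lane
print(transcoded_blocks, spare_bits_per_lane, spare_bits_total)  # 5 35 140
```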

In the model that follows, encoding and decoding are applied in the end-systems, i.e., on the Ethernet layer, while optical nodes in the core network simply forward the incoming coded data over reserved outgoing interfaces.

Fig. 3: Ethernet traffic parallelization with LNC.

Fig. 4: Decoding buffer at the receiver.

III-B RLNC-based end-system model

To model the RLNC-based end-to-end system, we adopt the network model from [24], which represents a network as a directed and acyclic graph consisting of a vertex set and an edge set, with a designated source node and destination node. A distinction is made between the sets of incoming and outgoing links of an arbitrary node: a link is an incoming link of a node if the node is the head of the link, and an outgoing link of a node if the node is the tail of the link.

As illustrated in Fig. 3, the traffic sequence is decomposed into data blocks of the same length, i.e., with the same number of symbols each. The linear coding process is performed over a finite field, whereby each symbol has the same length in bits.

We define the time unit (tu) as a discrete time based on the link capacity of the physical link, which can be interpreted as the transmission delay of one data block. Thus, the parallelization, reordering and de-skewing at the end systems can be modeled as a discrete time process. At each time instant, the incoming symbols of the parallel data blocks at the source node are generated by one process per virtual lane. The incoming symbols are encoded and sent out on the outgoing lanes. In this model, the RLNC encoder buffers incoming symbols from all lanes in parallel, and encodes them with simple linear coding. Thus, the signal carried on an outgoing link of the source at a given time is:

(1)

Here, the encoding is performed over an encoding interval, and the encoding coefficients, which are collected in a matrix, are drawn from the finite field.

At the receiver, the decoded information at a given time on each parallel lane is modeled as:

(2)

Here, the decoding is performed over a decoding interval, and the decoding coefficients, which are collected in a matrix, are drawn from the finite field [24]. Generally, RLNC can result in linearly dependent combinations, whereby the probability of such combinations is related to the finite field size. The probability of selecting coefficients that do not correspond to a decodable code instance is of the order of the inverse of the field size [25, 26, 27, 28, 29, 30]. Thus, with high probability, regardless of the number of paths selected, RLNC will lead to a decodable LNC instance. The decodability can be verified at the transmitter or receiver. In case an instance of coding coefficients is not decodable, the encoder can readily select another instance, with coefficients selected uniformly at random from the field in which we operate.
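To give a feeling for how the field size controls decodability, the following sketch evaluates the standard closed form for the probability that a square matrix with entries drawn uniformly at random from a finite field is invertible. The function name and the example parameters (4 lanes, 8-bit symbols) are illustrative assumptions; the closed form itself is a textbook result and is not taken from the paper.

```python
def full_rank_probability(k, q):
    """Probability that a uniformly random k x k matrix over GF(q) is invertible.

    Closed form: product over i = 1..k of (1 - q**-i).
    """
    p = 1.0
    for i in range(1, k + 1):
        p *= 1.0 - q ** (-i)
    return p


for q in (2, 16, 256):
    print(q, round(1.0 - full_rank_probability(4, q), 6))  # failure probability for 4 lanes
```

For a field of size 256, the failure probability is slightly above 1/256, i.e., of the order of the inverse field size, so redrawing the coefficients is rarely needed.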

III-C Network model

The network is modeled as a directed graph with a set of nodes and links, whereby each pair of nodes is connected with multiple parallel links. In this topology, it is expected that at least as many of the existing parallel links and paths are available between source and destination as are needed, whereby only the needed number of paths is selected for parallel transmission. To derive the likelihood that the network can provide these paths, we use the following model applicable to connection-oriented networks. A network can provide at most a given number of available parallel paths between source and destination, and each path over a defined link has a setup probability. (Footnote 5: Since our motivation is to analyze the differential delay in the network, and the resulting buffer size at the receiver, we do not consider in this paper how various traffic load patterns impact the path setup probability.) We approximate the model for the blocking probability of the connection request by assuming that the network load is distributed so that each of the possible paths can be set up with an equal probability. As a result, the probability that a path between source and destination cannot be set up, and is blocked, is defined as

(3)

Thus, the number of available parallel paths out of all existing paths follows the Binomial distribution, i.e.,

(4)

Finally, the transmission request is blocked when the number of available paths is lower than the number of outgoing interfaces to be used, which occurs with probability

(5)

The mean number of available paths is determined as

(6)

On the other hand, the mean number of paths available for optimization or random selection, which ensures the successful parallel transmission and is relevant for the buffer analysis in the next Section, can be derived as

(7)
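The quantities that Eqs. (3) to (7) refer to can be evaluated numerically with a short sketch. All names and the example values (10 existing paths, setup probability 0.5, 6 paths needed) are assumptions for illustration. In particular, the last function reads the mean number of paths available for selection as the mean conditioned on the request not being blocked, which is one plausible interpretation and not necessarily the exact form used in the paper.

```python
from math import comb


def availability_pmf(n_paths, p_setup):
    """Binomial probabilities that exactly i of n_paths existing paths are available."""
    return [comb(n_paths, i) * p_setup**i * (1 - p_setup) ** (n_paths - i)
            for i in range(n_paths + 1)]


def request_blocking(n_paths, p_setup, n_needed):
    """Probability that fewer than n_needed paths are available."""
    return sum(availability_pmf(n_paths, p_setup)[:n_needed])


def mean_available(n_paths, p_setup):
    """Mean of the Binomial distribution."""
    return n_paths * p_setup


def mean_available_given_success(n_paths, p_setup, n_needed):
    """Mean number of available paths, conditioned on at least n_needed being available.

    Assumed reading of the conditional mean; requires a non-zero success probability.
    """
    pmf = availability_pmf(n_paths, p_setup)
    success = sum(pmf[n_needed:])
    return sum(i * pmf[i] for i in range(n_needed, n_paths + 1)) / success


print(request_blocking(10, 0.5, 6))              # 0.623046875
print(mean_available(10, 0.5))                   # 5.0
print(mean_available_given_success(10, 0.5, 6))
```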

As previously mentioned, RLNC can extend the generation size by including redundant data blocks, resulting in redundant data flows from the source. The redundant data blocks are transmitted in parallel with the other data blocks from the same generation. In case of data block loss or network failures, a data block coming from a redundant parallel path can replace any data block from the same generation. We show that this feature is useful not only for fault tolerance but also for reducing the expected value of the differential delay, and thus the buffer size.

IV Analysis

The analysis includes three parts: (i) analysis of the expected differential delay in a generic network, (ii) analysis of the impact of coding redundancy on differential delay, and, (iii) derivation of the expected value of the queue (buffer) size at the receiver, including the upper and lower bounds.

For the presented analysis, we make the following underlying assumptions:

  • The network does not exhibit any failures or losses;

  • In a transmission system with RLNC, a set of parallel paths is chosen randomly at the source, with equal probability among all available paths;

  • Let us assume that the available parallel paths , between source and destination in are collected in a set and are sorted in ascending order so that the increasing index of each path corresponds to an increasing path delay , i.e., , which are arranged in a vector of length , . Since we next consider only one source-destination pair, the notation can be simplified as ;

  • Since any fiber path provides multiple wavelength paths, we assume that multiple available paths can have the same end-to-end delay;

  • For a fair comparison, all data blocks in the system without network coding have the same size as in the case with RLNC;

  • Link capacity is defined as a data block per time unit (tu).

  • To simplify the analysis, we assume integer values for delays, i.e. , which can be realized without loss of accuracy by choosing a sufficiently small value of tu;

  • We assume an idealized scheduler for both architectures, i.e., with and without RLNC;

  • For the scheduler, we do not assume a specific polling strategy, which may have an impact on the queue size.

  • In steady state, we assume the full traffic load and a deterministically distributed arrival process, where on each lane one data block arrives per tu.

  • We denote the binomial coefficient as , whereby for .

IV-A Expected value of differential delay

The differential delay is typically defined as the difference in delay between the longest and the shortest paths, and , respectively. We next derive the expected value of the differential delay given a number of available paths in arbitrary networks, whereby a set of parallel paths is chosen randomly.

Let us assume that parallel paths, chosen randomly with the same probability among available paths, form a subset , and let us denote as the set of all possible subsets. There are possible path combinations, where each combination is collected in a subset , and appears with the same probability . However, all paths in each subset are sorted so that their corresponding path delays appear in ascending order. This requires deriving an index mapping , which maps an index , , used to specify a path out of the subset to a path , . This also maps the delay of path to the corresponding component of the delay vector . This ensures that the increasing index of each path corresponds to an increasing path delay , .

To define such a mapping, let us introduce and , i.e., the set of all subsets of with cardinality . Let us use , , to index each of these subsets. Thus, based on paths , the set is given as and the final mapping between path and is defined by the index function

(8)

where and . Furthermore, Eq.(8) enables the mapping of paths by the relation and has the property . This ensures that each path can occur once in the path set . For example, assume parallel paths are randomly chosen from the set , i.e., and . The sorting due to increasing delays defines the mapping to the selected subset shown by the equivalence .

The longest and the shortest paths, and , respectively, within chosen set define the differential delay of the path set chosen, i.e.,

(9)

As we consider networks with a large number of path sets available, the differential delays depend on the path set chosen, i.e., , . The expected value of differential delay takes into account all possible paths combinations, as indexed by , i.e.,

(10)

where we use to simplify notation. The expected value can be derived knowing the expected delay of the path over all path sets randomly chosen with probability . Using the mapping as specified by Eq. (8), the expected delay of path can be derived from

(11)

where is the probability that the path with delay from is selected as path with delay , , in a randomly chosen path set . Using combinatorial theory, in the range this probability is given by

(12)

The detailed derivation of Eq. (12) is given in the Appendix. Using Eqs. (11)-(12), the expected value of the path delay of the path of a randomly chosen path set yields

(13)

where . Thus, the expected maximal and minimal path delays over all path combinations are defined as and , respectively. Finally, using Eq. (10), the expected value of differential delay can be derived as

(14)
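The expected differential delay of a randomly chosen path set can be checked numerically. The sketch below uses the standard counting argument for order statistics of a uniformly random subset (the path ranked `j` among all available paths becomes rank `i` in the chosen set when `i - 1` of the shorter paths and `n_sel - i` of the longer paths are also chosen) and compares it with brute-force enumeration. It is assumed, not confirmed, that this matches the form of Eq. (12); all function and parameter names are hypothetical.

```python
from itertools import combinations
from math import comb


def rank_probability(j, i, n_avail, n_sel):
    """P(the j-th shortest available path is the i-th shortest selected path).

    Ranks are 1-based; the n_sel paths are drawn uniformly at random
    without replacement from the n_avail available paths.
    """
    return comb(j - 1, i - 1) * comb(n_avail - j, n_sel - i) / comb(n_avail, n_sel)


def expected_rank_delay(delays, i, n_sel):
    """Expected delay of the i-th shortest path in a random selection."""
    d = sorted(delays)
    n_avail = len(d)
    return sum(
        d[j - 1] * rank_probability(j, i, n_avail, n_sel)
        for j in range(1, n_avail + 1)
    )


def expected_differential_delay(delays, n_sel):
    """Expected (longest - shortest) delay over all random selections."""
    return expected_rank_delay(delays, n_sel, n_sel) - expected_rank_delay(delays, 1, n_sel)


def brute_force_differential_delay(delays, n_sel):
    """Same quantity by enumerating every equally likely path set."""
    subsets = list(combinations(delays, n_sel))
    return sum(max(s) - min(s) for s in subsets) / len(subsets)
```

Because ranks are assigned by position in the sorted delay vector, the sketch also handles several paths with identical delays, which the assumptions above explicitly allow.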

To derive further relations, let us define the likelihood that a specific subset is chosen and indicate its dependence on the number of parallel paths. Since only one out of path combinations is randomly selected for transmission, all path sets and all paths of a set occur with the same probability. Let us denote the probability of an arbitrary path combination as , whereby

(15)

where is the probability that an arbitrary path is selected for transmission and collected in .

Generally, each element from can have a value equal to the values of its direct neighbors, i.e., and . Thus, vector can contain and elements with minimal and maximal path delays, i.e., and , respectively. Therefore, there may be many more path combinations yielding a maximal differential delay, defined by Eq. (16)

(16)

The occurrence probability of is

(17)

where for a path combination we assume that the path with maximal delay is mapped to one of the paths , i.e., for . But if , the mapping of the path with maximal delay has to be limited to . Furthermore, its path with minimal delay, , is mapped to one out of paths for . Otherwise, for a selected longest path , the first path of a set of paths is restricted to paths with index , thus the mapping of is restricted to the range . Finally, there are possible path combinations, whose probability follows from Eq. (15) and is independent of . Furthermore, the shortest and longest paths in each combination , i.e., one or more paths with delays and from , respectively, occur with the same probability in all combinations. Thus, Eq. (17) can be simplified as follows

(18)

When delay vector contains only two elements with maximal and minimal delay, i.e., and , the summation is omitted in Eq. (18), which yields

(19)
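The occurrence probability of the maximal differential delay can be cross-checked with a short sketch. A randomly chosen path set attains the maximal differential delay exactly when it contains at least one path with the minimal delay and at least one path with the maximal delay. The code below counts such sets by inclusion-exclusion, which is an equivalent counting argument and not necessarily the summation form of Eq. (18), and verifies it by enumeration. Function names are hypothetical.

```python
from itertools import combinations
from math import comb


def prob_max_differential_delay(delays, n_sel):
    """P(a uniformly random n_sel-subset spans both the min and the max delay)."""
    d_min, d_max = min(delays), max(delays)
    if d_min == d_max:
        return 1.0  # every path set has the same (zero) differential delay
    n_avail = len(delays)
    n_min = sum(1 for d in delays if d == d_min)  # paths with minimal delay
    n_max = sum(1 for d in delays if d == d_max)  # paths with maximal delay
    total = comb(n_avail, n_sel)
    # Subtract sets missing every minimal path and sets missing every maximal
    # path, then add back the sets that miss both.
    missing_min = comb(n_avail - n_min, n_sel)
    missing_max = comb(n_avail - n_max, n_sel)
    missing_both = comb(n_avail - n_min - n_max, n_sel)
    return (total - missing_min - missing_max + missing_both) / total


def prob_max_differential_delay_brute_force(delays, n_sel):
    """Same probability by enumerating every equally likely path set."""
    target = max(delays) - min(delays)
    subsets = list(combinations(delays, n_sel))
    return sum(1 for s in subsets if max(s) - min(s) == target) / len(subsets)
```

With exactly one shortest and one longest path, the count reduces to the number of ways to fill the remaining positions of the set from the other paths, which corresponds to the special case discussed for Eq. (19).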

Similar to the derivation of Eq. (18), we can additionally derive an occurrence probability of path combinations, where predefined paths and are assumed as longest and shortest paths, respectively, resulting in differential delay . That is equivalent to the special case, where and are irrelevant for probability calculation. Thus, the occurrence probability of the combinations, which contain certain paths and as the longest and the shortest paths, respectively, is

(20)

However, when, in a network with parallel paths, the paths and have delays and , , , , the combinations with and as the longest and shortest paths have an occurrence probability smaller than or equal to the occurrence probability of a path combination with maximal differential delay, i.e., . On the other hand, there can be other path combinations, which do not contain paths and as the longest and shortest paths, but result in the same differential delay value . Thus, based on Eq. (18), we can claim that the maximal differential delay has a larger occurrence probability than any lower value of differential delay only if , which is valid for large values of and .

IV-B Impact of coding redundancy on path analysis

As previously mentioned, parallel lanes can be coded into data blocks, and thereafter transmitted over paths. Thus, network routing redundancy can be directly related to the coding redundancy. However, because the destination needs to buffer only out of data blocks from the same generation to start decoding, while any data blocks arriving later can be ignored, the previous analysis cannot be applied directly and requires modified expressions for the expected values of the differential delay.

Let us assume that all parallel paths are selected randomly with the same probability. For each path set the corresponding delays are sorted as , , from , . For successful decoding, the receiver needs to buffer the data blocks arriving from any paths with the least delays, i.e., . Thus, in a parallel network system with RLNC, the sender can in fact arbitrarily assign any of the paths, irrespective of their related path delays, which eliminates the need for routing and complex cross-layer design.

In a parallel network system with linear network coding and redundant paths, where the path set of paths is randomly chosen, a maximal path delay which has to be taken into account at the decoder is bounded as follows

(21)

The detailed derivation of Eq. (21) is given in the Appendix. With Eqs. (21), (12) and (13), the expected maximal path delay follows as

(22)

Similarly, the expected minimal path delay can be calculated as . Finally, with Eq. (22) and using the fact that the number of paths is now given by , the expected value of differential delay follows as

(23)

where simplifies to the result given by Eq. (14).
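The effect of redundancy on the expected differential delay can be illustrated by enumeration. The sketch below assumes, as described above, that the sender picks the required number of paths plus the redundant paths uniformly at random and that the receiver only waits for the blocks from the paths with the least delays, so the relevant longest delay is that of the last block needed for decoding. The names `n_needed` and `n_redundant` are illustrative assumptions, not the paper's symbols.

```python
from itertools import combinations


def expected_differential_delay_with_redundancy(delays, n_needed, n_redundant):
    """Expected delay gap between the n_needed-th fastest and the fastest
    selected path, averaged over all equally likely path sets of size
    n_needed + n_redundant."""
    n_sel = n_needed + n_redundant
    total = 0
    count = 0
    # combinations() preserves input order, so every subset stays sorted.
    for subset in combinations(sorted(delays), n_sel):
        total += subset[n_needed - 1] - subset[0]
        count += 1
    return total / count


if __name__ == "__main__":
    delays = [1, 2, 3, 4, 5]  # hypothetical path delays in time units
    for r in range(0, 4):
        print(r, expected_differential_delay_with_redundancy(delays, 2, r))
```

With zero redundant paths the function returns the expected differential delay of the uncoded selection, and in this small example the value decreases as redundant paths are added.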

As a result, the maximal differential delay is

(24)

Because the delay vector can contain and elements , which are equal to and , respectively, it is advantageous to consider a special case covered by Eq. (24). If the maximal relevant delay is given by , and is independent of . Thus, an equivalent form of Eq. (24) is given by . Using this result and similar to Eq. (17), the occurrence probability of maximal differential delays is given by Eq. (25).

(25)

Due to the redundant paths, only paths are effectively used and the maximal used path delay is determined by , as discussed above. Thus, in contrast to Eq. (17), the path with maximal delay is mapped to one path within the range given by for . Similarly, the path with minimal delay, , is mapped to one of the paths out of the range . In accordance with our assumptions, the redundant paths have a delay larger than or equal to that of the path. Now, depending on the mapping of the path to one path , the redundant paths will be mapped to paths with delays out of the range . Thus, there are path combinations to achieve this mapping, which are combined in subset . Due to Eq. (15), the probability for the set of redundant paths can be written as . Similar to Eq. (17), for the remaining paths there are combinations whose paths are combined in subset , where the probability for a set of paths can be written as . Thus, the probability simplifies to

(26)

For the case where there is only one path with maximal delay and only one path with minimal delay, , i.e., , the probability of occurrence of a maximal differential delay can be derived from Eq. (26) as

(27)

Compared to the system without redundancy, the receiver experiences a reduction of a maximum differential delay in the network , expressed as follows

(28)

Additionally, the expected value of differential delay can be reduced by the ratio determined by Eq. (29).

(29)

On the other hand, sending of redundant data flows over additional paths presents a capacity and transmission overhead, which we define and later numerically evaluate as

(30)
Fig. 5: Buffer models at the receiver. (a) Deskew buffer per lane; (b) Decoding buffer.

IV-B1 Discussion on path failures and packet loss

In contrast to coded parallel transmission without redundancies, the utilization of redundant paths not only reduces the differential delay, but can also increase the robustness of the system with respect to packet loss (i.e., loss of coded data blocks) and path failures, when the number of lost packets per generation and the number of failed paths are less than or equal to the number of coding and path redundancies , respectively. However, we need to consider the possible impact of any fault on the resulting differential delay.

With path failures, while available paths in the network fail, the source can utilize only of the remaining paths. Thus, the expected differential delay can be calculated by Eq. (23) under consideration of the reduced vector , which now contains elements.

In case of packet loss, while , we consider a worst case, whereby packet loss can occur on any utilized path. In that worst-case scenario, the maximal possible differential delay needs to be considered for system design to prevent possible retransmission. Thus, the mean differential delay in a system with packet loss can be calculated with Eq. (13) as

(31)

IV-C Analysis of receiver queue size

In a parallel network system without RLNC, we refer to the required queue size, as the deskew buffer (Fig. 5). In the system with RLNC, on the other hand, the buffer architecture is based on the virtual output queue (VOQ), referred to as the decoding buffer (Fig. 5). Let us denote the delay difference between the longest and an arbitrary path in a path set as , , where refers to the largest differential delay within the path set and can be determined with Eq. (9).

To analyze the effect of differential delay for input lanes, in this section we first assume a fixed optimal path pattern with delays ordered as . Here, the largest differential delay within the optimal path set is denoted as . Generally, the expected value of the buffer size strongly depends on the paths chosen from vector .

However, due to the definition of , the accumulated differential delay between all selected paths is given by . In contrast to randomly selected paths, the optimal path set will minimize the differential delay between selected paths and, thus, the buffer size. As a result, the optimal path combination is given by , when .

We assume an idealized scheduler and, in steady state, a deterministically distributed arrival process. Arrivals are assumed to be deterministic because paths are selected in the optical layer to transport a single Ethernet frame. For instance, for a 1500 byte frame and parallel lanes, 47 blocks will arrive in succession on each of the parallel lanes at the receiver, which makes the arrival process deterministic.

For input lanes, the idealized scheduler runs times faster, such that during a full cycle time a total of data blocks are forwarded and processed every tu, as defined next

(32)

where is the mean forwarding time of a data block and is the polling and processing time.

IV-C1 Deskew Buffer Size

This architecture reflects the multi-lane Ethernet technology, where the number of parallel paths in is equal to the original number of virtual lanes used in the sender, i.e. . In terms of buffer sizing, the worst case scenario occurs when the receiver needs to deskew the differential delay between the lanes with largest and smallest delay given by Eq. (9), where the path set index is neglected to simplify the notation. This delay, if measured in multiples of time units, requires a buffer size of for the shortest lane. The reordering and re-serialization process is undertaken by the scheduler, which requires an additional buffer place per lane to enable the processing after the cycle delay . The latter is a simple model for Ethernet’s multi-lane deskewing mechanism using data block markers. Thus, the queue size required for the lane related to the shortest path is . To allow for any arbitrary pattern of differential delay, the input buffer size is the same for each lane, which corresponds to a classical design principle used in parallel hardware architectures. Consequently, the total buffer size can be expressed as

(33)
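The deskew buffer dimensioning described above can be sketched as follows. The sketch is read from the prose only, since the exact form of Eq. (33) is not reproduced here: every lane gets the same buffer, sized for the differential delay between the longest and the shortest path plus one additional place for the scheduler, with one data block arriving per lane and time unit. The function name and the example delays are hypothetical.

```python
def deskew_buffer_size(path_delays_tu):
    """Total deskew buffer size in data blocks for one set of lanes.

    Assumptions (taken from the prose, not from Eq. (33) itself):
    - delays are integers in time units (tu), one block per lane per tu;
    - the shortest lane must hold (d_max - d_min) blocks, plus one extra
      place per lane for the scheduler cycle;
    - every lane is given the same buffer size.
    """
    lanes = len(path_delays_tu)
    differential_delay = max(path_delays_tu) - min(path_delays_tu)
    per_lane = differential_delay + 1
    return lanes * per_lane


if __name__ == "__main__":
    # Hypothetical delays of four parallel lanes, in tu.
    print(deskew_buffer_size([3, 5, 9, 10]))
```

Sizing every lane for the worst case makes the buffer independent of which lane ends up on the shortest path, at the cost of memory that lanes on longer paths never use.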

IV-C2 Decoding queue size

The first data blocks from any lanes, originating from the first generation created at the RLNC sender, typically arrive at the receiver at different times. To analyze the effect of differential delay for input lanes, we first assume a fixed path pattern with delays ordered as .

The scheduler has to poll input lanes. As soon as the first data block of a new generation arrives, which happens on the lane corresponding to the shortest path, the scheduler forwards the data block to a newly created virtual output queue (VOQ). In the initial phase, the decoding commences when data blocks of the first generation have arrived. Consequently, the data blocks from the same generation arriving from the shorter paths have to be buffered in the decoding buffer until the data block from the same generation is received. Thus, before the last packet from the first generation arrives, packets from other lanes must be buffered. The queue size during the initial phase is thus

(34)

where the explicit dependence on the path set index is omitted. Note that when there is no differential delay, i.e., , the initial size of the decoding queue in Eq. (34) becomes zero. This is due to the fact that data blocks are immediately transferred by the scheduler during the subsequent cycle.
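The initial backlog of the decoding buffer can be illustrated with a small sketch. It assumes no redundant paths, integer delays, and one data block per lane and time unit, and it counts the blocks that arrive on the shorter paths strictly before the first block on the longest path. Under these assumptions the backlog equals the accumulated differential delay of the path set; whether this matches Eq. (34) term by term is an assumption. Function names are hypothetical.

```python
def initial_decoding_queue(path_delays_tu):
    """Accumulated differential delay: sum over lanes of (d_max - d_i)."""
    d_max = max(path_delays_tu)
    return sum(d_max - d for d in path_delays_tu)


def simulate_initial_backlog(path_delays_tu):
    """Count block arrivals before the longest path delivers its first block.

    Lane i delivers its first block at time d_i and one more block every
    time unit afterwards; arrivals at times d_i .. d_max - 1 must wait.
    """
    d_max = max(path_delays_tu)
    backlog = 0
    for d in path_delays_tu:
        for _arrival_time in range(d, d_max):
            backlog += 1
    return backlog


if __name__ == "__main__":
    delays = [3, 5, 9, 10]  # hypothetical lane delays in tu
    print(initial_decoding_queue(delays), simulate_initial_backlog(delays))
```

When all lanes have the same delay, both functions return zero, consistent with the observation that the initial decoding queue vanishes without differential delay.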

For the deterministic arrival process, steady state is reached after the first generation completes. Here, the amount of data blocks forwarded in every decoding interval is

(35)

In steady state, the decoding process finishes after the decoding interval , and all data blocks from one complete generation leave the decoding buffer. To avoid idle periods, a new decoding process should commence immediately after the previous decoding cycle is finished. In the best case, the scheduler transfers a new generation to the decoder every