Mobile video is expected to contribute of all the mobile traffic by 2020 . Wireless multicast, by leveraging the shared nature of the wireless medium, could be an efficient way to distribute popular videos to many users. However, wireless multicast has not been widely deployed in practical systems, because it has been difficult to simultaneously achieve high throughput, low delay, and low feedback overhead for serving a large number of users.
Throughput challenge: In a wireless communication system, the channel conditions of different receivers are heterogeneous due to multipath, shadowing, mobility, etc. Since the throughput of multicast is bottlenecked by the receiver with the worst channel condition, as the number of receivers grows, the achievable throughput in multicast vanishes, offsetting the multicast gain.
Delay challenge: Existing multicast coding schemes incur large delay at the receivers. There are two categories of multicast coding schemes in the literature. The first category employs a block coding strategy, e.g., random linear network coding (RLNC), LT codes, and Raptor codes. However, to maintain a non-diminishing throughput, the block size has to grow with the number of receivers (here and below, scaling statements use the standard order notation for real-valued sequences). Since the decoding delay increases with the block length, such schemes are known to have poor delay performance. The second category achieves a lower delay through an incremental network coding design, e.g., [6, 7, 8, 9, 10]. The data packets participate in the coding procedure progressively, and hence the receivers are able to decode packets progressively, leading to a lower decoding delay. However, when the traffic load is high, the average delay of these schemes increases dramatically [6, 8].
Feedback challenge: To achieve reliable multicast or to improve the delay performance (e.g., [2, 3, 4, 6, 7, 8, 9, 10]), the transmitter needs to collect channel state information or reception status reports (such as ACKs/NAKs) from the receivers through feedback. Conventionally, the feedback overhead increases linearly with the number of receivers, e.g., [2, 3, 4, 7, 8, 9, 10], which becomes a system bottleneck when the multicast sessions have a large number of receivers.
In modern wireless networks, multi-channel communications have become commonplace. For instance, 4G/LTE mobile wireless networks are based on OFDM and OFDMA, where a wide spectrum band is divided into many resource blocks, each 180 kHz wide. The 802.11a standard likewise provides multiple orthogonal channels in the 5 GHz band. The availability of multiple channels provides significant flexibility in designing wireless resource allocation algorithms.
In order to address the above three fundamental challenges, we develop a multi-channel multicast code design in this paper, which can simultaneously achieve high throughput, low delay, and low feedback overhead. Here, delay refers to the end-to-end delay (including queueing delay and transmission delay), measured from the time a packet arrives at the transmitter to the time that the packet is decoded at the receiver; in comparison, the delay metric considered in, e.g., [5, 6, 9, 10] only accounts for the transmission delay. The contributions of this paper are summarized as follows.
We propose Multi-channel Moving Window Codes (MC-MWC). The key idea behind MC-MWC is a simple strategy, through which multiple multicast sessions can be jointly served by the shared multi-channel resources.
High Throughput: We focus on a many-user many-channel asymptotic regime. First, we derive an algorithm independent lower bound on the number of channels needed for achieving any non-vanishing throughput as the number of receivers increases. Then, we show that MC-MWC achieves the lower bound in an order sense. Hence, MC-MWC achieves order-optimal throughput in the many-user many-channel asymptotic regime. Furthermore, we prove that the number of channels required by a conventional scheme based on the optimal static channel allocation and capacity-achieving codes is doubly-exponentially larger than that required by MC-MWC.
Low Delay: Using large deviations theory, we show that the delay of MC-MWC decreases linearly as the number of channels grows, while the delay reduction of conventional channel-allocation based schemes (even when combined with any coding scheme) is no more than a finite constant. To the best of our knowledge, MC-MWC is the first wireless multicast scheme in which the delay decreases linearly as the number of channels grows, with no loss in the per-channel throughput.
Trace-driven and numerical results are provided to validate the analytical results and show that 1) MC-MWC achieves significant throughput and delay improvements in practical scenarios even when the number of users (channels) is not large, and 2) the implementation complexity of MC-MWC is low in practice.
The rest of this paper is organized as follows. In Section II, we introduce some related works. In Section III, the system model is described. In Section IV, we introduce Multi-channel Moving Window Codes (MC-MWC). The throughput and delay performance of MC-MWC are analyzed in Section V and Section VI, respectively. In Section VII, we evaluate the performance of our method through simulations. Finally, we conclude the paper in Section VIII.
II Related Work
In wireless unicast, exploiting the multi-channel resources has been extensively studied. For example, in one line of work, e.g., [13, 14], the multi-channel resources are allocated based on the queue lengths of the users such that throughput optimality is achieved. However, throughput optimality does not necessarily imply good delay performance. Thus, another line of work, e.g., [15, 16] proposed delay-based scheduling policies, which achieve optimal throughput and provably good delay performance for multi-channel wireless unicast.
In wireless multicast, multiple receivers may receive the same transmitted packet, and as a result, the work-conservation principle assumed in wireless unicast is violated. In addition, rateless codes or network codes, e.g., [2, 3, 4, 6, 7, 8, 9, 10], are typically used to achieve good throughput performance. With coding, the receivers need additional time to decode the packets, and thus the queueing delay considered in wireless unicast does not fully capture the overall delay. For the above two reasons, the methodologies developed for multi-channel wireless unicast are not applicable to the multicast scenario.
Existing studies of resource allocation in wireless multicast are either channel-statistics based or channel-state based. In the first category, e.g., [18, 19], it is assumed that the transmitter has access to the channel statistics of the receivers. One such work realizes cooperative multicast through a network-coding based multicast scheme in which the scheduling policy depends on the channel statistics of the receivers; another allocates the channels statically according to the channel statistics in order to maximize the long-term throughput. The second category, e.g., [17, 9], assumes that the transmitter has perfect knowledge of the channel state information of the receivers and exploits the opportunistic gain in throughput or delay. Works in this category include a throughput-optimal policy for wireless multicast with dynamic topology and an instantly decodable network coding scheme that achieves zero decoding delay at the cost of throughput loss. However, collecting channel state information from all receivers incurs a prohibitively large feedback overhead, rendering these schemes impractical when multicast sessions have a large number of receivers. An open question is how to obtain a delay reduction that is linear in the number of channels without loss in the per-channel throughput.
III System Model
We consider a multi-channel, multi-session wireless multicast network with orthogonal channels and multicast sessions, as shown in Figure 1. As in [13, 15, 16], for ease of presentation, we assume that the number of channels is equal to the number of sessions. Our throughput and delay analysis can be readily generalized to the scenario when the number of channels scales linearly with the number of sessions.
III-A Multicast Sessions
For a multicast session , the transmitter needs to send a stream of data packets to a set of receivers, denoted as . For any two different sessions and , their sets of receivers and are allowed to have arbitrary intersections, i.e., a receiver may be interested in multiple sessions. Let denote the number of receivers in the session . The number of receivers in different sessions may be different and we assume that are i.i.d. across different sessions. (This assumption can be relaxed such that have heterogeneous distributions; the details are omitted due to space limitations.) Let
denote a random variable which has the same distribution as, where represents the expected number of receivers in one session, i.e., .
In our model, time is slotted. The packets which need to be served in each session arrive stochastically. Let denote the number of packet arrivals from session at the beginning of time-slot . We assume that are stationary and bounded random variables, independent across time-slots, independent of the number of receivers . We use to denote the expected packet arrival rate in session .
III-B Channel Model
The channel between the transmitter and the receivers is assumed to follow the standard broadcast erasure channel model. Let denote the channel state of receiver on channel in time-slot , given by
denote the probability of successful packet reception at receiveron channel . We make the following assumption on .
Assumption (Heterogeneous Channel Conditions): The channel statistics are random variables that have heterogeneous values for different receivers and different channels. Specifically, are i.i.d. across receivers and across channels. (Note that while are i.i.d., they correspond to the probability of successful packet reception of receiver on channel . So, these probabilities can be different for different realizations of , and hence the channel conditions are heterogeneous.) Let and denote the CDF and the expectation of , respectively. Moreover, there exists such that .
The above assumptions are more general than the homogeneous network assumptions in [5, 7, 8, 10], where all are assumed to be equal and thus fail to reflect the heterogeneous channel conditions in reality. In addition, the existence of is fairly weak, since it basically says that there is a non-zero probability that is close to . In Section VII, we will provide simulation results that are obtained using data traces collected from a software defined radio platform. We will see that the analytical results obtained under the above assumptions are also valid in practice.
IV Multi-Channel Moving Window Codes
In this section, we propose Multi-Channel Moving Window Codes (MC-MWC), an approach built on the recent work on Moving Window Codes (MWC). Compared with MWC, MC-MWC has two key novelties. First, MWC is designed for single-channel single-session multicast, and we can show that a straightforward generalization of MWC to the multi-channel multi-session multicast setting leads to poor throughput and poor delay performance. Second, owing to a new merging strategy, MC-MWC simultaneously achieves good throughput and delay performance by exploiting the multi-channel resources. In the following, we first introduce the strategy, and then propose MC-MWC.
IV-A A Simple Merging Strategy
The key idea behind MC-MWC is a simple strategy.
(Merging) The multicast sessions are jointly served by the multi-channel resources as follows:
At the transmitter, the streams of packets from all sessions are merged to form one large multicast session. Then, the transmitter uses the channels to send the merged multicast session to all receivers.
In each time-slot, a receiver listens on all the channels to receive as many packets as it can. Then, the received packets are used to decode the merged multicast session, including the session(s) receiver is interested in.
Notice that the merging strategy only describes some high-level multicast transmission rules. The coding, transmission, and feedback procedures of a specific instance of the merging strategy are explained next in detail.
IV-B Multi-Channel Moving Window Codes
As shown in Figure 2, MC-MWC is comprised of three modules, i.e., the encoder at the transmitter side, the decoder at the receiver side, and the feedback modules.
At the transmitter, all the packets from all sessions are grouped together to form one large multicast session. The packets in the merged multicast session are indexed as , where the packets are indexed according to the order of their arrivals and the tie breaking rule for the packets arriving at the same time-slot is arbitrary. All the newly-arrived packets at the beginning of time-slot are instantly injected into an encoder window. (The injection rate has to be within the capacity region of MC-MWC, which is discussed and analyzed in Section V.) At time-slot , the total number of packet arrivals from all the sessions is denoted as . Then, the number of packets that the encoder has received up to the beginning of time-slot is , i.e.,
To prevent the encoder window from growing indefinitely over time, at the beginning of time-slot , packets which are “determinable” at all receivers ( can be determined via a low overhead feedback mechanism described later) are removed from the encoder window. The term “determinable” is formally defined below, in the description of the decoder module.
At time-slot , the encoder generates coded packets through linear combinations of the data packets in the encoder window, which are transmitted over the channels to all receivers . Specifically, the coded packet which is transmitted on the channel in time-slot is generated by
where denotes the packet of the merged session, “” is the product operator on a Galois field , and
are independently drawn according to a uniform distribution on. The values of and are embedded in the packet header of . Moreover, can be known at each receiver by feeding the same seed to the random number generators of the transmitter and all the receivers.
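The encoding step can be illustrated with a minimal sketch. It assumes the field GF(2^8) with the reduction polynomial 0x11D and coefficients drawn uniformly from all 256 field elements; the field size, the polynomial, and the names `gf_mul` and `encode` are illustrative assumptions rather than details taken from the scheme itself. Seeding the generator identically at the transmitter and at a receiver reproduces the same coefficients on both sides, which mirrors the shared-seed idea described above.

```python
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (assumed reduction polynomial 0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D           # reduce back to 8 bits
        b >>= 1
    return result


def encode(window: list[bytes], rng: random.Random) -> tuple[list[int], bytes]:
    """Build one coded packet as a random linear combination of the window.

    `window` holds the equal-length data packets currently in the encoder
    window; one call corresponds to one channel in one time-slot.
    """
    coeffs = [rng.randrange(256) for _ in window]   # assumed uniform over GF(2^8)
    coded = bytearray(len(window[0]))
    for coeff, packet in zip(coeffs, window):
        for i, byte in enumerate(packet):
            coded[i] ^= gf_mul(coeff, byte)
    return coeffs, bytes(coded)


if __name__ == "__main__":
    window = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
    tx_coeffs, packet = encode(window, random.Random(2024))
    # A receiver that knows the seed regenerates the same coefficients.
    rx_coeffs, _ = encode(window, random.Random(2024))
    print(tx_coeffs == rx_coeffs, packet.hex())
```

The bitwise multiplication is chosen here for readability; a table-based multiplication would typically be faster in practice.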
With the merging strategy, a receiver receives all information sent on the channels and decodes all the sessions, including the session(s) it is interested in.
To facilitate a clear understanding of the decoding procedure, we restate the definition of “determinable” packet that was originally defined in .
(“Determinable” packet) A packet is said to be “determinable” at a receiver if the receiver has enough information to express as a linear combination of some packets with greater indices, and “indeterminable” otherwise.
Let be the number of “determinable” packets at receiver by the end of time-slot . Define a virtual decoder queue
for each receiver . Then, is the number of packets which have arrived from one of the sessions but are “indeterminable” at receiver at the end of time-slot .
Suppose at the beginning of time-slot , the packets are “determinable” at receiver and the total number of packets that the encoder received is . Similar to [7, 6], when the field size is sufficiently large, for every successfully received packet over the channels at time-slot , with a high probability, the next “indeterminable” packet becomes “determinable” until all the packets are “determinable”. To better see this, consider the following simple example. The receiver receives 3 coded packets from channels in time-slot , i.e., , and . From the first packet, , thus, by definition, becomes “determinable”. Similarly, from the second and third packets, we have and , by which and are “determinable” in turn. Therefore, the evolution of is given by
Recall that to prevent the encoder window from growing indefinitely over time, packets are removed from the encoder window at the beginning of time-slot . To achieve reliable multicast, these packets must be “determinable” at all receivers at the beginning of time-slot , and hence must be controlled based on the status feedback from all the receivers. To avoid the prohibitively large overhead caused by traditional per-receiver feedback mechanisms, we modify the anonymous feedback mechanism proposed in the recent work [6, 12] to ensure reliable multicast reception at all receivers with a negligible feedback overhead.
Anonymous feedback was devised for a single multicast session [6, 12], where it guarantees reliable multicast with constant feedback overhead, regardless of the number of receivers. However, without the merging strategy, applying anonymous feedback to multi-session multicast would incur an overhead that increases with the number of sessions.
In MC-MWC, anonymous feedback can be applied to the merged session. The key idea is to let the receiver(s), for which the oldest packets in the encoder are not all “determinable”, send a NAK in a shared feedback channel at the end of each time slot. This feedback is “anonymous” in the sense that the transmitter does not differentiate the actual ID(s) of the receiver(s) sending NAK. As long as the transmitter can detect the existence of NAK signal from feedback, it will keep the oldest packet in the encoder buffer; otherwise, the oldest packets are removed from the buffer. This mechanism ensures that
which guarantees reliable multicast. The anonymous feedback in MC-MWC only requires a short sub-slot to detect the existence of NAK in the shared feedback channel. Thus, the total feedback overhead of MC-MWC is a constant, not only independent of the number of receivers in each session, but also independent of the number of sessions in the network. Compared with conventional schemes, e.g., [2, 3, 4, 7, 8, 9, 10], the feedback overhead of MC-MWC is times smaller. Compared with MWC , the feedback overhead of MC-MWC is times smaller. Therefore, MC-MWC effectively reduces the feedback overhead when there are a large number of sessions/receivers. In , it is shown that anonymous feedback can be easily implemented and incurs a low overhead in practice.
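The interplay between the decoder state and the anonymous feedback can be sketched in a short simulation. The sketch below is a simplified model under stated assumptions: each successfully received coded packet makes exactly one more packet “determinable” (the large-field idealization), every receiver sees the same success probability on every channel, and the transmitter removes a fixed number `batch` of oldest packets whenever no NAK is detected. The function name, the parameter `batch`, and the Bernoulli arrival model are hypothetical choices made for illustration.

```python
import random


def simulate_feedback(num_receivers, num_channels, arrival_rate, p_success,
                      slots, batch, seed=0):
    """Simulate encoder-window removal driven by a one-bit anonymous NAK.

    Returns a per-slot trace of (arrived, removed, min_determinable).
    """
    rng = random.Random(seed)
    arrived = 0                          # packets injected into the encoder so far
    removed = 0                          # oldest packets dropped from the window
    determinable = [0] * num_receivers   # per-receiver "determinable" counts
    trace = []
    for _ in range(slots):
        # One Bernoulli arrival source per merged session (assumed model).
        arrived += sum(rng.random() < arrival_rate for _ in range(num_channels))
        for n in range(num_receivers):
            received = sum(rng.random() < p_success for _ in range(num_channels))
            # Each reception resolves one more packet, up to what has arrived.
            determinable[n] = min(determinable[n] + received, arrived)
        # A receiver sends a NAK if the oldest `batch` packets in the window
        # are not all determinable; the transmitter only sees "some NAK or none".
        nak = any(count < removed + batch for count in determinable)
        if not nak:
            removed += batch
        trace.append((arrived, removed, min(determinable)))
    return trace


if __name__ == "__main__":
    trace = simulate_feedback(num_receivers=20, num_channels=4,
                              arrival_rate=0.3, p_success=0.6,
                              slots=500, batch=2)
    arrived, removed, slowest = trace[-1]
    print(f"arrived={arrived} removed={removed} slowest receiver={slowest}")
```

In this model the transmitter never needs to know which receiver lags behind: the single shared NAK bit is enough to keep the number of removed packets at or below the determinable count of the slowest receiver in every slot.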
V High Throughput of MC-MWC
In this section, we analyze the throughput performance of MC-MWC. Noting that multicast is most beneficial when there are a large number of receivers, we focus on a scenario when both the number of receivers and the number of channels scale to infinity. We refer to this setting as the many-user many-channel asymptotic regime. We emphasize that although our analysis is in the asymptotic regime, the results provide important insights on the practical scenario when the number of users (channels) is not large (e.g., only 10-20 users per session), as illustrated in Section VII. To facilitate the analysis in this regime, we make the following assumption on the distribution of the number of receivers, i.e., . There is a function such that for any and for any ,
Note that this is a mild assumption. For instance, the assumption is satisfied when
V-A Algorithm Independent Lower Bound
In the single-channel single-session multicast case, the achievable throughput is bottlenecked by the receiver with the worst channel condition. As a result, with the number of receivers increasing, it is more and more likely that there exists one receiver whose channel condition happens to be poor, leading to a vanishing multicast throughput. We are interested in understanding the following fundamental question in a multi-channel, multi-session multicast: how many multi-channel resources are required to achieve a non-vanishing per-session throughput? We answer the question by deriving an algorithm independent scaling law on the number of channels required to achieve a non-vanishing throughput as the number of receivers increases.
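The vanishing single-channel throughput can be checked numerically with a short Monte Carlo sketch. It assumes, purely for illustration, that the reception probabilities are i.i.d. uniform on (0, 1); under that assumption the expected worst-receiver probability among n receivers is 1/(n+1). The function name and the choice of distribution are not taken from the analysis above.

```python
import random


def worst_receiver_rate(num_receivers: int, trials: int = 2000, seed: int = 0) -> float:
    """Average, over random scenarios, of the worst reception probability.

    On a single channel this worst-case probability caps the multicast rate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.random() for _ in range(num_receivers))
    return total / trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        estimate = worst_receiver_rate(n)
        print(f"n={n:4d}  simulated={estimate:.4f}  1/(n+1)={1 / (n + 1):.4f}")
```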
First, given and the channel statistics
, we define the full capacity region as the set of arrival rate vectors that can be stabilized by some multicast scheme that has the instantaneous channel state information (CSI) of all receivers at the transmitter.
Note that it is well known that without coding, can be expressed as the convex hull of all feasible schedules. Nevertheless, with the possibility of inter-session coding, it is generally very difficult to explicitly characterize even when the channel statistics is given. Instead, we derive a fundamental property of in the many-user many-channel asymptotic regime.
For any session , given the channel statistics , define the maximum achievable throughput of session in the full capacity region as
Then we have the following theorem regarding .
Theorem 1. If the number of channels scales slower than logarithmically (with respect to the natural logarithm, which is used throughout this paper) with the expected number of receivers, i.e., for some , then the achievable throughput of any multicast scheme vanishes as :
in which denotes “convergence in probability”.
[Proof sketch of Theorem 1] The key idea is to define the set of “bottleneck” receivers for a session as
If , for any receiver , with any possible scheduling and coding scheme, we could upper bound the achievable throughput of receiver by allocating all channels to serve receiver at all time-slots, in which case its throughput is upper bounded by the sum of the capacity of all channels,
Thus, when , the achievable throughput of session is also upper bounded by Equation (14). Then, the focus is to show that for any and any , we have
We provide the proof details in Appendix A.
Theorem 1 suggests that, to achieve any non-vanishing throughput, the number of channels must scale at least logarithmically with the expected number of receivers, i.e., .
V-B Order Throughput Optimality of MC-MWC
Now we analyze the achievable throughput of MC-MWC. Given the number of receivers in each session as well as the channel statistics , the maximum achievable throughput of MC-MWC can be easily derived as follows. For a receiver , it receives information on all the channels, the sum capacity of which is . Recall the capacity of multicast is limited by the worst receiver, the maximum sum throughput of all sessions is then given by
Similar to the definition of , we define the capacity region of MC-MWC.
To characterize , let us define a convex polytope for a given and any given as
The polytope contains all rate vectors such that the average rate is no greater than .
Then we derive the following theorem regarding .
Theorem 2. If the number of channels scales logarithmically with the expected number of receivers, i.e., for any , then the achievable per-session throughput of MC-MWC is lower bounded by in the following sense:
[Proof sketch of Theorem 2] Let us prove that, with MC-MWC, for any , if , then with probability no less than , as .
Note that for any , it can be shown that
Then, using Hoeffding’s inequality, we derive a lower bound of Equation (18).
Noting that and can be arbitrarily close to , the proof is complete.
We provide the proof details in Appendix B.
Equation (17) suggests that if the number of channels scales logarithmically with the number of receivers, with high probability MC-MWC can stabilize all the rate vectors such that the average rate is less than .
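A small numerical sketch can make this scaling concrete. It evaluates the per-session rate of the merged session as the worst receiver's sum of reception probabilities over all channels, divided by the number of sessions. The sketch assumes i.i.d. uniform(0, 1) reception probabilities, disjoint sessions of equal size, and a number of channels equal to the ceiling of 4 ln(n); the constant 4, the distribution, and the function name are arbitrary illustrative assumptions.

```python
import math
import random


def mcmwc_per_session_rate(num_sessions: int, receivers_per_session: int,
                           rng: random.Random) -> float:
    """Per-session rate supported by the merged session in one random scenario.

    Each receiver listens on all `num_sessions` channels, so its total capacity
    is the sum of its per-channel reception probabilities; the merged session
    is limited by the receiver with the smallest sum.
    """
    total_receivers = num_sessions * receivers_per_session  # disjoint sessions
    worst_sum = min(
        sum(rng.random() for _ in range(num_sessions))
        for _ in range(total_receivers)
    )
    return worst_sum / num_sessions


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (10, 100, 1000):
        channels = math.ceil(4 * math.log(n))   # assumed logarithmic scaling
        rate = mcmwc_per_session_rate(channels, n, rng)
        print(f"n={n:5d}  channels={channels:3d}  per-session rate={rate:.3f}")
```

Under these assumptions the per-session rate stays bounded away from zero as n grows, in contrast to the single-channel worst-receiver rate.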
V-C Throughput Gain over a Conventional Scheme
Finally, we compare MC-MWC with a conventional scheme, which can be considered as a straightforward extension of [18, 19] to the multi-channel, multi-session multicast setting. First, let us define capacity-achieving code.
(Capacity-achieving code) A coding scheme is said to be capacity-achieving if it can achieve the capacity of the broadcast erasure channel.
The conventional scheme is based on the following static channel allocation.
Static Channel Allocation: In the static channel allocation, the transmitter allocates the channels according to the channel statistics . Let : represent a one-to-one mapping from the sessions to the channels . The static channel allocation is to find a such that the long-term sum throughput of all sessions is maximized. Note that given a static channel allocation , the maximum achievable throughput of any session is bottlenecked by the receiver with the worst channel condition on the channel , i.e., . Then, the optimal static channel allocation is formulated by
Given the optimal channel allocation to Equation (21), the transmitter encodes the packets from session using a capacity-achieving coding scheme, such as rateless codes (e.g., [3, 4]) and some network coding schemes (e.g., [2, 7]), and sends the coded packets over the channel .
With the conventional scheme, given and the channel statistics , there is a maximum achievable throughput for each session , denoted as . We derive the following theorem regarding .
Theorem 3. If the number of channels scales slower than exponentially with the expected number of receivers, i.e., for all , then the achievable throughput of the optimal static channel allocation with capacity-achieving codes vanishes as , that is,
See Appendix C.
Theorem 3 suggests that, to achieve a non-vanishing throughput, the number of channels must scale at least exponentially with the expected number of receivers, i.e., there exists some such that .
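For a tiny network, the optimal static channel allocation can be computed by brute force and compared with the sum rate of the merged session. The sketch below assumes disjoint sessions and a hypothetical nested-list layout `p[i][n][j]` for the reception probability of receiver n of session i on channel j; the function names and the example values are illustrative only. Enumerating all one-to-one mappings costs N! evaluations, so this is practical only for very small N.

```python
import itertools


def static_allocation_rate(p):
    """Best sum rate over all one-to-one session-to-channel mappings.

    A session assigned to channel j is limited by its worst receiver on j.
    """
    num = len(p)
    bottleneck = [
        [min(receiver[j] for receiver in p[i]) for j in range(num)]
        for i in range(num)
    ]
    return max(
        sum(bottleneck[i][perm[i]] for i in range(num))
        for perm in itertools.permutations(range(num))
    )


def mcmwc_rate(p):
    """Sum rate of the merged session: the worst receiver's sum over channels."""
    return min(sum(receiver) for session in p for receiver in session)


if __name__ == "__main__":
    # Two sessions, two receivers each, two channels (made-up probabilities).
    p = [
        [[0.9, 0.2], [0.3, 0.8]],
        [[0.5, 0.6], [0.7, 0.4]],
    ]
    print("static allocation:", static_allocation_rate(p))  # about 0.7
    print("merged session:   ", mcmwc_rate(p))              # about 1.1
```

In this example no receiver is good on every channel, so any fixed assignment leaves each session limited by a poor receiver, whereas every receiver's sum over both channels is about 1.1. The merged session is not guaranteed to win for every choice of probabilities; the theorems above concern the asymptotic regime.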
VI Low Delay of MC-MWC
In this section, we show that MC-MWC achieves significant delay reduction by exploiting the multi-channel resources. We focus on the uniform traffic scenario when the arrival rates for all sessions are equal, i.e., .
Without loss of generality, we focus on analyzing the delay performance of an arbitrary session . Let be the packet in session . The delay of the packet with respect to a receiver is defined as the time between the arrival of the packet at the transmitter to the decoding of and all the packets in session with smaller indices, denoted as .
Then, assuming the system is stationary and ergodic, for receiver , the delay violation probability that the delay of a packet exceeds a threshold is given by
We first analyze the delay performance of MC-MWC.
Let the time-slots
be the decoding moments of receiver satisfying (7). Suppose that packet arrives at time-slot , which is between two successive decoding moments . Then, packet and all packets with smaller indices in session will be decoded in time-slot . The delay of packet at the receiver is
The following theorem characterizes the delay experienced at a receiver , which is shown to be independent of the channel statistics of the other receivers and .
Theorem 4. For a receiver such that , the asymptotic decay rate of the delay violation probability of MC-MWC is
In addition, for any , with probability we have
[Proof sketch of Theorem 4] We focus on receiver . Define as the interval between the decoding moment and the decoding moment, which can be expressed as
indicating that the decoding process is a renewal process. Noting that analyzing directly is very difficult, in the first step, we connect with .
is upper and lower bounded by
Then, using large deviations theory, we derive the decay rate for .
The decay rate of the decoding interval in the steady state is given by
where is the rate function defined in Equation (25).
We provide the proof details in Appendix D.
Next, we show a lower bound on the delay performance of a general class of conventional schemes defined as follows.
(Channel-allocation based schemes) A multicast scheme is said to be channel-allocation based if different sessions are allocated to and served by different channels. The channel allocation can be either static over time (as formulated by Equation (21)), or dynamically adapted across time-slots based on the instantaneous channel state information of all the receivers in all time-slots. To reduce the technical difficulties, it is assumed that a session can be allocated at most one channel in each time-slot, which is reasonable considering that there are channels and sessions with . Given the channel allocation, the transmitter may serve the multicast sessions by employing any possible coding scheme, including but not limited to rateless codes and network codes, e.g., [2, 3, 4, 6, 7, 8, 9, 10].
Theorem 5. When , for any receiver , the decay rate of delay in any channel-allocation based scheme is upper bounded by
is a constant independent of the number of channels .
See Appendix E.
Theorem 5 suggests that conventional channel-allocation based schemes (even when combined with any coding scheme) can achieve at most a constant delay improvement (not a function of the number of channels) by exploiting the multi-channel resources.
The intuition behind Theorem 5 is as follows. For a queueing system, the delay of packets typically originates from two sources: 1) the stochastic and bursty arrival process; 2) the stochastic service process. Note that with a channel-allocation based scheme, a receiver can receive at most one packet from any given session in one time-slot. Even when the receiver could always receive one packet in each time-slot, there is a lower bound on the delay violation probability which originates from the stochastic and bursty arrival process. The lower bound is independent of the number of channels. However, owing to the merging strategy, MC-MWC is not subject to the restriction that a receiver can receive at most one packet from any given session in one time-slot.
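The effect of merging on delay can be explored with a rough single-receiver simulation. The sketch assumes a homogeneous success probability on every channel, Bernoulli arrivals in every session, and the idealization that each received coded packet resolves one pending packet; a packet is counted as decoded at the first slot in which no pending packets remain. With one channel the model reduces to a single-session moving-window scheme. The function name, the parameter values, and the convention of counting the decoding slot in the delay are assumptions made for illustration.

```python
import random


def mean_delay(num_channels: int, arrival_rate: float, p_success: float,
               slots: int = 20000, seed: int = 0) -> float:
    """Average packet delay (in slots) at one receiver of the merged session."""
    rng = random.Random(seed)
    pending = []        # arrival slots of packets not yet decoded
    queue = 0           # packets that are still "indeterminable"
    total_delay = 0
    decoded = 0
    for t in range(slots):
        arrivals = sum(rng.random() < arrival_rate for _ in range(num_channels))
        queue += arrivals
        pending.extend([t] * arrivals)
        received = sum(rng.random() < p_success for _ in range(num_channels))
        queue = max(queue - received, 0)
        if queue == 0 and pending:
            # Decoding moment: everything that has arrived so far is decoded.
            total_delay += sum(t - arrival + 1 for arrival in pending)
            decoded += len(pending)
            pending.clear()
    return total_delay / decoded if decoded else float("nan")


if __name__ == "__main__":
    for channels in (1, 4, 16):
        delay = mean_delay(channels, arrival_rate=0.4, p_success=0.5)
        print(f"channels={channels:2d}  mean delay={delay:.2f} slots")
```

Because the per-channel load is held fixed at 0.4/0.5, any reduction in delay across rows comes from pooling the channels rather than from a lighter load.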
In this section, we provide trace-driven and numerical results to investigate the throughput, delay and implementation complexity performance of MC-MWC.
VII-A Throughput Performance
To validate our analytical results, we consider a network with heterogeneous channel conditions, where is uniformly distributed in the interval . Each of the multicast sessions has receivers. We consider the case which is most unfavorable in terms of throughput for the merging strategy: for any two different sessions and , the intersection of the sets of receivers and is empty.
We compare the achievable throughput of MC-MWC with that of the optimal static channel allocation with capacity-achieving codes. Figure 3 depicts the average achievable throughput of both schemes in randomly generated network scenarios, under different types of scaling of with respect to . It can be observed that the results match well with Theorem 1 and Theorem 3. When and thus increases logarithmically with , MC-MWC achieves a non-diminishing throughput while the throughput of the optimal static channel allocation vanishes as increases. When and therefore scales exponentially with , the throughput of MC-MWC grows and converges to , and the throughput of the optimal static channel allocation still decreases with , which verifies the prediction in Theorem 3 that, to achieve a non-diminishing throughput with the optimal static channel allocation, has to scale at least exponentially with . When and thus scales linearly with , the throughput of MC-MWC increases while the throughput of the optimal static channel allocation decreases with .
To evaluate the throughput performance of MC-MWC in practice, we collect traces from experiments on a software defined radio platform. We use an NI PXIe-1082 platform with an NI-5791 RF front end for the experiments. The carrier frequency is set to 2.5 GHz and the bandwidth used is 40 MHz. Experiments are performed in an indoor lab environment to obtain the CSI measurements. We note that the effect of multipath is pronounced: the SNR difference across this 40 MHz bandwidth can be more than 20 dB. Hence, we collect the CSI traces at 100 different client locations and use these traces to emulate multicast receivers. Groups are formed randomly among these receivers. Packet reception probability is calculated from the subband CSI traces using the method in .
Figure 4 shows the CDF of throughput under different schemes, where groups are formed with on average receivers in each group and random group formations are performed to get the figure. Here we consider the optimal static channel allocation, as well as a random channel allocation strategy, which assigns channels to sessions randomly. Both the optimal static channel allocation and the random channel allocation schemes incorporate random linear network codes (RLNC), which are capacity-achieving. We can see that MC-MWC shows a 2.7x gain over the optimal static channel allocation scheme. This indicates that in the real world, even when the channel conditions are correlated across receivers and channels and the number of receivers/channels is not large, the proposed scheme can still have a significant throughput gain.
VII-B Delay Performance
Recall that in the heterogeneous network, MC-MWC achieves a larger throughput than the optimal static channel allocation. To make the delay comparison fair, we consider a homogeneous network scenario, i.e., for all receivers and for all channels, in which the maximum achievable throughput of MC-MWC and the optimal static channel allocation are the same. There are receivers in each session. In each session , the packets arrive according to a Bernoulli process with rate .
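As a minimal sketch of the arrival model described above, the following generates a Bernoulli arrival sequence for one session, with at most one packet arriving per time slot. The function name, the seed handling, and the example rate of 0.3 are illustrative assumptions, not values taken from the paper.

```python
import random


def bernoulli_arrivals(rate, num_slots, seed=0):
    """Return a list of 0/1 indicators, one per time slot.
    A 1 means that a packet arrives in that slot, which happens
    independently with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    rng = random.Random(seed)
    return [1 if rng.random() < rate else 0 for _ in range(num_slots)]


if __name__ == "__main__":
    arrivals = bernoulli_arrivals(rate=0.3, num_slots=20000)
    # The empirical rate should be close to the configured rate.
    print(sum(arrivals) / len(arrivals))
```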
Figure 5 plots the delay violation probability of one receiver under different schemes, i.e., MC-MWC and moving window codes (MWC) . It can be observed that of MC-MWC decays exponentially with and matches the predicted asymptotic decay rate in Equation (25). Furthermore, the decay rate is linear in the number of channels , which shows that MC-MWC achieves a significant delay gain over MWC  by exploiting the multi-channel resources.
Then, we consider the average delay performance of the same network scenario under different arrival rates . Figure 6 shows the average delay of different schemes with respect to the traffic load . From Figure 6, we have the following observations. First, the average delay of MC-MWC is much lower than that of RLNC , which is one of the rateless codes. It is worth noting that the average delay of other rateless codes, such as LT codes  and Raptor codes , is close to that of RLNC. Second, compared with MWC , MC-MWC achieves a delay reduction which is roughly linear in the number of channels . Third, the delay gain holds for any load .
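One way to check an exponential decay of this kind against measured data is to fit a straight line to the negative logarithm of the violation probability as a function of the delay threshold. The sketch below does this with an ordinary least-squares fit. It is an illustrative post-processing step, not the procedure used in the paper, and the function name and sample values are hypothetical.

```python
import math


def decay_rate(thresholds, probabilities):
    """Least-squares slope of -log(P) versus the delay threshold.
    Points with zero probability are skipped because their logarithm
    is undefined."""
    points = [(d, -math.log(p))
              for d, p in zip(thresholds, probabilities) if p > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive probabilities")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("thresholds must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic example: the probability halves with every unit of delay.
    print(decay_rate([0, 1, 2, 3], [1.0, 0.5, 0.25, 0.125]))
```

For the synthetic input the fitted slope equals the natural logarithm of 2 up to rounding; with measured curves, the fitted slopes for different numbers of channels could then be compared with the predicted linear growth.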
VII-C Low Implementation Complexity
With MC-MWC, a receiver needs to decode sessions that it may not be interested in. Hence, it is important to understand the decoding complexity of MC-MWC. To this end, we emulate the decoding procedure of MC-MWC on a commercial CPU and GPU, and show that the decoding complexity of MC-MWC is affordable in practice for moderate .
In the emulation, the finite field size in MC-MWC is set to be , and we use a simple table look-up approach  to do the operations on the finite field. Each packet has bytes. We emulate the decoding process at a receiver with for all channels, where packets arrive according to a Bernoulli process with rate such that . We implement two versions of the decoding algorithm: a serial version on a single-core CPU (Intel Core i7-2600 CPU clocked at 3.4 GHz), and a parallel version on a GPU (NVIDIA GeForce GTX 860M). Note that the GPU provides 640 shader units clocked at 1029 MHz, and we use 16 blocks and 64 threads per block in CUDA. For comparison, we also emulate RLNC  with a block length of packets, which does not incorporate with .
In Figure 7, we plot the average decoding complexity (in terms of cycles) of different schemes to decode one packet in a session. From Figure 7, we have the following observations. First, as grows, the decoding complexity of MC-MWC increases on both CPU and GPU, while the decoding complexity of RLNC remains unchanged. This is consistent with our expectation that the decoding complexity with increases with . Second, for both MC-MWC and RLNC, the GPU leverages its ability to decode packets in parallel, and thus dramatically reduces the decoding complexity. Third, although MC-MWC requires the receiver to decode sessions, the decoding complexity of MC-MWC is smaller than that of RLNC when is small. This is because the moving window coding strategy in general results in a sparser decoding matrix compared with the dense decoding matrix in RLNC, as shown in [6, 22]. Finally, MC-MWC can support sessions, each with a throughput of more than , on our GPU. Therefore, the decoding complexity of MC-MWC is low in practice for moderate , in which case MC-MWC already exhibits significant throughput and delay gains as shown previously.
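The table look-up idea for finite-field arithmetic can be sketched as follows. The sketch assumes the field GF(2^8) with the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D); both are assumptions made for illustration, since the field size used in the emulation is not recoverable from the text here. The table layout and function names are likewise hypothetical and are not taken from the cited look-up method.

```python
PRIMITIVE_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed, see lead-in)
FIELD_SIZE = 256        # GF(2^8) (assumed)
ORDER = FIELD_SIZE - 1  # size of the multiplicative group

# EXP is stored twice over so that gf_mul needs no modulo reduction.
EXP = [0] * (2 * ORDER)
LOG = [0] * FIELD_SIZE


def _build_tables():
    """Fill EXP[i] = 2^i and LOG[2^i] = i using the generator 2."""
    value = 1
    for power in range(ORDER):
        EXP[power] = value
        LOG[value] = power
        value <<= 1
        if value & 0x100:          # degree-8 overflow: reduce
            value ^= PRIMITIVE_POLY
    for power in range(ORDER, 2 * ORDER):
        EXP[power] = EXP[power - ORDER]


_build_tables()


def gf_add(a, b):
    """Addition (and subtraction) in GF(2^8) is bitwise XOR."""
    return a ^ b


def gf_mul(a, b):
    """Multiply by adding discrete logarithms and looking up the result."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse via the logarithm table."""
    if a == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    return EXP[ORDER - LOG[a]]
```

With such tables, each multiplication in the decoding procedure costs two look-ups in LOG, one integer addition, and one look-up in EXP, which is why the approach suits both serial and parallel decoders.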
In this paper, we develop Multi-Channel Moving Window Codes (MC-MWC) and prove that they achieve high throughput and low delay while requiring very limited feedback. We verify our theoretical results using trace-driven simulations, and show that the complexity of implementing MC-MWC is low in practice. This new approach, which exploits multi-channel capability, moving window coding, and anonymous feedback, has the potential to finally realize in practice the significant promise of wireless multicast.
-  “Cisco visual networking index: Forecast and methodology, 2015-2020.” http://www.cisco.com.
-  T. Ho, M. Médard, R. Koetter, D. R. Karger, M. Effros, J. Shi, and B. Leong, “A random linear network coding approach to multicast,” IEEE Trans. Inf. Theory, vol. 52, pp. 4413–4430, Oct. 2006.
-  M. Luby, “LT codes,” in IEEE FOCS 2002, pp. 271–280, 2002.
-  A. Shokrollahi, “Raptor codes,” IEEE Trans. Inf. Theory, vol. 52, pp. 2551–2567, Jun. 2006.
-  Y. Yang and N. Shroff, “Throughput of rateless codes over broadcast erasure channels,” IEEE/ACM Trans. Netw., vol. 23, no. 1, pp. 126–137, 2015.
-  F. Wu, Y. Sun, Y. Yang, K. Srinivasan, and N. Shroff, “Constant-delay and constant-feedback moving window network coding for wireless multicast: Design and asymptotic analysis,” IEEE J. Select. Areas Commun., vol. 33, pp. 127–140, Feb. 2015.
-  J. K. Sundararajan, D. Shah, and M. Médard, “ARQ for network coding,” in IEEE ISIT 2008, pp. 1651–1655, Jul. 2008.
-  J. K. Sundararajan, D. Shah, and M. Médard, “Feedback-based online network coding,” CoRR, vol. abs/0904.1730, 2009.
-  S. Parastoo, S. Ramtin, and T. Danail, “An optimal adaptive network coding scheme for minimizing decoding delay in broadcast erasure channels,” EURASIP J. Wirel. Commun. Netw., vol. 2010, Jan. 2010.
-  A. Fu, P. Sadeghi, and M. Medard, “Dynamic rate adaptation for improved throughput and delay in wireless network coded broadcast,” IEEE/ACM Trans. Netw., in press.
-  H. Holma and A. Toskala, LTE for UMTS-OFDMA and SC-FDMA based radio access. John Wiley & Sons, 2009.
-  F. Wu, Y. Yang, O. Zhang, K. Srinivasan, and N. B. Shroff, “Anonymous-query based rate control for wireless multicast: Approaching optimality with constant feedback,” ACM MobiHoc, pp. 191–200, 2016.
-  S. Bodas, S. Shakkottai, L. Ying, and R. Srikant, “Low-complexity scheduling algorithms for multichannel downlink wireless networks,” IEEE/ACM Trans. Netw., vol. 20, no. 5, pp. 1608–1621, 2012.
-  M. Ouyang and L. Ying, “Approaching throughput optimality with limited feedback in multichannel wireless downlink networks,” IEEE/ACM Trans. Netw., vol. 21, pp. 1827–1838, Dec. 2013.
-  B. Ji, G. R. Gupta, X. Lin, and N. B. Shroff, “Low-complexity scheduling policies for achieving throughput and asymptotic delay optimality in multichannel wireless networks,” IEEE/ACM Trans. Netw., vol. 22, no. 6, pp. 1911–1924, 2014.
-  B. Ji, G. R. Gupta, M. Sharma, X. Lin, and N. B. Shroff, “Achieving optimal throughput and near-optimal asymptotic delay performance in multichannel wireless networks with low complexity: a practical greedy scheduling policy,” IEEE/ACM Trans. Netw., vol. 23, no. 3, pp. 880–893, 2015.
-  A. Sinha, L. Tassiulas, and E. Modiano, “Throughput-optimal broadcast in wireless networks with dynamic topology,” ACM MobiHoc ’16, pp. 21–30, 2016.
-  D. Koutsonikolas, Y. C. Hu, and C.-C. Wang, “Pacifier: High-throughput, reliable multicast without "crying babies" in wireless mesh networks,” IEEE/ACM Trans. Netw., vol. 20, pp. 1375–1388, Oct. 2012.
-  K. C.-J. Lin and D.-N. Yang, “Multicast with intraflow network coding in multirate multichannel wireless mesh networks,” IEEE Trans. on Veh. Technol., vol. 62, no. 8, pp. 3913–3927, 2013.
-  W. Zhou, T. Das, L. Chen, K. Srinivasan, and P. Sinha, “Basic: backbone-assisted successive interference cancellation,” ACM MobiCom, pp. 149–161, 2016.
-  M. A. Hasan, “Look-up table-based large finite field multiplication in memory constrained cryptosystems,” IEEE Trans. on Comput., vol. 49, pp. 749–758, Jul 2000.
-  A. Tassi, I. Chatzigeorgiou, and D. E. Lucani, “Analysis and optimization of sparse random linear network coding for reliable multicast services,” IEEE Trans. on Commun., vol. 64, no. 1, pp. 285–299, 2016.
-  S. I. Resnick, A probability path. Springer Science & Business Media, 2013.
-  R. Srikant and L. Ying, Communication networks: an optimization, control, and stochastic networks perspective. Cambridge University Press, 2013.
-  E. Çinlar, “Markov renewal theory: A survey,” INFORMS Management Science, vol. 21, no. 7, pp. 727–752, 1975.
-  A. Dembo and O. Zeitouni, Large deviations techniques and applications. Springer-Verlag New York, Inc., 2 ed., 2010.
-  A. Weiss, Large Deviations for Performance Analysis: Queues, Communications, and Computing. CRC Press, 1995.
Appendix A Proof of Theorem 1
Without loss of generality, let us focus on analyzing the achievable throughput of session .
We want to show that if the number of channels scales slightly slower than logarithmically with the expected number of receivers, i.e., for some , the achievable throughput of session diminishes to for any possible scheme as . It suffices to show that for any and any , the probability that session could achieve throughput is less than as .
Define the set of “bottleneck” receivers for session as
Notice that if , for any receiver , with any possible scheme, we could upper bound the achievable throughput of receiver by allocating all channels to serve receiver at all time-slots, in which case its throughput is upper bounded by the sum of the capacity of all channels,
Recall that the throughput of multicast is bottlenecked by its worst receiver. Thus, when , the achievable throughput of session is also upper bounded by Equation (29).
By the definition in Equation (28), the probability that a receiver is not in the “bottleneck” set can be given by
where is the CDF of .
Next, based on Equation (30), we show that the probability that is high when for any .
where step (a) is based on Equations (9) and (10) and is picked such that , in step (b) Equation (30) is applied, step (c) utilizes the condition that , and in step (d), we apply the inequality that for and .
By the assumption that , when is large enough, we have . Then it is easy to verify that with for any given , we have
where step (a) is because is picked such that .
Recall that we have shown that when , the achievable throughput of session is less than . Hence, according to Equation (33), it is impossible for session to achieve throughput with probability . Since and can be chosen arbitrarily close to , by the Cauchy criterion (see Theorem 6.3.1 in ), Equation (13) holds, which completes the proof.
Appendix B Proof of Theorem 2
Let us prove that, with MC-MWC, for any , to guarantee that with probability no less than , it is sufficient to let the number of channels scale logarithmically with the expected number of receivers, i.e., , as .
Recall that with MC-MWC, the maximum sum throughput of all sessions is given by Equation (15). On the other hand, according to Equation (6), is a random walk on , and has a steady state distribution if Equation (15) holds, which suggests that Equation (15) is also achievable by MC-MWC. Hence, we have
Notice that for any , we have
where step (a) uses the fact that