Collaborative uploading describes a type of crowdsourcing scenario in networked environments where a device utilizes multiple paths over neighboring devices to upload content to a centralized processing entity such as a cloud service. Intermediate devices may aggregate and preprocess this data stream. Such scenarios arise in the composition and aggregation of information, e.g., from smartphones or sensors. We use a queuing theoretic description of the collaborative uploading scenario, capturing the ability to split data into chunks that are then transmitted over multiple paths, and finally merged at the destination. We analyze replication and allocation strategies that control the mapping of data to paths and provide closed-form expressions that pinpoint the optimal strategy given a description of the paths' service distributions. Finally, we provide an online path-aware adaptation of the allocation strategy that uses statistical inference to sequentially minimize the expected waiting time for the uploaded data. Numerical results show the effectiveness of the adaptive approach compared to the proportional allocation and a variant of the join-the-shortest-queue allocation, especially for bursty path conditions.

## I Introduction

Internet of Things (IoT) describes a world of heterogeneous devices, such as sensors and actuators that are connected through various communication technologies while carrying out everyday tasks. Crowdsourcing in the context of IoT often refers to interconnected devices that ubiquitously exchange and aggregate information to achieve complex goals. For example, live events can be covered by composing many information streams originating from various mobile and fixed sources such as smart phones, audio/visual, and ambient sensors. Live events include not only entertainment events but also emergency situations, such as security breaches and attacks on civilians.

A common feature of many of these crowdsourcing devices is the ability to simultaneously utilize different sets of wireless and wired communication technologies, such as WiFi, cellular, Ethernet and powerline communication, and to further recognize and simultaneously interact with surrounding devices. Fig. 1 shows different examples of collaborative uploading scenarios. As depicted, it is crucial to understand how a primary device can best utilize the parallel paths provided by secondary devices for uploading its data.

Modeling the collaborative uploading problem intrinsically includes scenario-specific challenges as shown by the heterogeneous examples in Fig. 1. Nevertheless, we are interested in the uploading performance, e.g., the time required to transfer a piece of data from a source device to a processing unit in an edge-cloud. An abstraction that enables powerful results on such performance measures is provided through queuing theory, as sketched in the bottom of Fig. 1. This connection-layer abstraction enables the source to make intelligent decisions as to how to utilize the available and possibly heterogeneous paths by only considering their latencies.

Our goal in this work is to find optimal collaborative uploading strategies in crowdsourcing scenarios. We differentiate between intermittent (devices such as sensors sending data on a coarse time scale) and continuous collaborative uploading (devices continuously streaming video footage, e.g., using Facebook Live). In optimizing performance metrics such as the uploading time, we also make a distinction between the cases when devices possess full knowledge of the different path characteristics, and when they perform statistical inference.

In this paper, we analyze replication and allocation strategies for collaborative uploading scenarios. We use a Fork-Join (FJ) queuing system (see Fig. 1) that captures the ability to split data into chunks that are transmitted over multiple paths, and finally merged when all chunks are received. Our contributions are summarized as: 1) Closed-form expressions for the mean upload latency in the intermittent uploading case, allowing a comparison between a replication and an allocation (splitting) strategy. We find optimal strategies for given path latencies. In doing so, we also show numerical results suggesting near-optimality of the proportional allocation. 2) Online path-aware adaptation of the allocation strategy based on statistical inference and stochastic gradient descent to sequentially minimize the expected waiting times in the continuous uploading case. We evaluate and compare the performance of our proposed adaptive strategy under various levels of path service burstiness.

The rest of this paper is organized as follows: in Sec. II, we outline our modeling approach for the intermittent as well as the continuous uploading case. The intermittent case is then considered in detail in Sec. III. In Sec. IV, we pose the continuous uploading system as a queuing theoretic one and use stochastic gradient methods to address the optimization of the allocation. In Sec. V, we furnish an evaluation study of our proposed online algorithm. Finally, we discuss related work in Sec. VI and summarize our findings in Sec. VII.

## II Modeling approach

Here, we present an overview of our approach, which consists of (i) defining an appropriate performance metric and (ii) framing an appropriate optimization problem thereafter.

We characterize the intermittent case as one where the time intervals between two successive uploads are so large that there is no self-induced queuing. Then, aspects such as cross-traffic can be described by means of the statistical properties of the path latencies alone. A primary device uploading data intermittently aims to minimize the upload latency, i.e., the time until the data reaches the cloud. Given multiple paths over secondary devices, the primary device may split the data into chunks that are transmitted or replicated over the available paths. The upload latency being a stochastic quantity, it is natural to consider its mean as a performance metric and optimize it over all possible splitting/replication configurations. In Sec. III, we express the upload latency as an order statistic of the individual upload times over the different paths, making the theory of order statistics a useful tool in our analysis.

In the case of continuous upload of a data stream, e.g., a primary device uploading a live video to the cloud, there is a notion of waiting before each data chunk can be uploaded and hence, that of queuing. We call the event of new data generation and passing by the application to the lower layers on the primary device, an arrival of a new data batch. Each data batch is split into chunks of various sizes that are transported over several paths. Paths are characterized by a random service time required to transport the assigned chunks. Finally, the data batch reaches the cloud when all of its chunks are received. Such systems are known as FJ queuing systems [1, 2, 3].

In the following, we consider the intermittent uploading of data of size $K$ over $N$ possibly heterogeneous paths (e.g., sensor or monitoring devices uploading data on a coarse time scale). Assume that the data can be divided into smaller chunks consisting of packets. Then, every $k = (k_1, k_2, \ldots, k_N) \in \Lambda(N,K)$ is a valid allocation vector, where $k_i$ denotes the number of packets to be sent via path $i$ and $\Lambda(N,K)$ denotes the set of all non-negative integer solutions of the Diophantine equation $k_1 + k_2 + \cdots + k_N = K$. The time taken to transport the $j$-th packet out of the $k_i$ packets allocated to path $i$ is a random variable that may capture different phenomena that impact the transmission time over a path, such as resource allocation, transmission collisions, and retransmissions. Assume that, for each path $i \in [N]$ with $[N] := \{1, 2, \ldots, N\}$, these per-packet transport times are mutually independent. (Footnote 1: Mutual independence, although not necessary for the subsequent analysis, is assumed for the sake of simplicity. In order to account for possible dependencies observed in real-world applications, one needs to additionally specify a correlation structure for these variables. This step is application-specific and is not easy in general. We do not attempt that in this paper.)

Recall that the data consisting of $K$ packets can be reconstructed only after all the packets have arrived. Therefore, the upload latency can be expressed as $D = \max(D_1^{(k_1)}, D_2^{(k_2)}, \ldots, D_N^{(k_N)})$, where $D_i^{(k_i)}$ for $i \in [N]$ denotes the amount of time taken by path $i$ to transport $k_i$ packets, and by convention, $D_i^{(0)} = 0$. The random variable $D$ measures the total amount of time taken to transport all the packets over different paths. In this work, we consider $\psi(k) := \mathsf{E}[D]$, the expected upload time given an allocation $k$, as our performance metric. The density function of $D_i^{(k_i)}$ is given by the $k_i$-fold self-convolution of the density function of the per-packet transport time on path $i$, due to independence. Stacking the cumulative distribution functions (CDFs) of $D_1^{(k_1)}, \ldots, D_N^{(k_N)}$ into a column vector $F(k)$, we express the expected values of the order statistics of $D_1^{(k_1)}, \ldots, D_N^{(k_N)}$ as operators on $F(k)$ (see Remark 1 in Appendix A-A). Since $\psi(k)$ is the first moment of the $N$-th order statistic, we get

$$\psi(k) = \mu_N F(k) = \sum_{j \in [N]} (-1)^{j+1} M_j F(k), \tag{1}$$

where $\mu_N$ and $M_j$ are operators defined in Appendix A-A. The optimal allocation is found by minimizing $\psi(k)$, i.e.,

$$k_{\mathrm{opt}} = \operatorname*{arg\,min}_{k \in \Lambda(N,K)} \psi(k). \tag{2}$$

Note that when the path characteristics are unknown, we can perform statistical inference. (Footnote 2: This is particularly important from an engineering perspective. The issue of statistical inference becomes more interesting in the context of continuous upload. We show examples in Sec. V.) In the following, we show some illustrative examples with computable $\psi(k)$ before generalizing this allocation scheme to include replication strategies.
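To make the search space concrete, the following minimal sketch enumerates the valid allocations, i.e., all non-negative integer solutions of $k_1 + \cdots + k_N = K$. The function name `allocations` and its arguments are illustrative assumptions, not notation from the paper.

```python
from math import comb


def allocations(num_paths, num_packets):
    """Yield every tuple (k_1, ..., k_N) of non-negative integers summing to K."""
    if num_paths == 1:
        # Only one path left: it must carry all remaining packets.
        yield (num_packets,)
        return
    for first in range(num_packets + 1):
        for rest in allocations(num_paths - 1, num_packets - first):
            yield (first,) + rest


# The number of allocations follows the stars-and-bars count C(K+N-1, N-1).
space = list(allocations(3, 4))
print(len(space), comb(4 + 3 - 1, 3 - 1))
```

The count grows quickly with the number of paths and packets, which is one reason closed-form expressions for the objective are useful when searching for the minimizer.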

### III-A The canonical two-path case

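We consider the problem of finding the optimal allocation over two heterogeneous paths. Let $k = (k_1, k_2)$ denote our allocation. The corresponding upload latency is given by $D = \max(D_1^{(k_1)}, D_2^{(k_2)})$ and its mean is $\psi(k) = \mathsf{E}[D]$. Suppose the packet latencies on the two paths are exponentially distributed with rates $\lambda_1$ and $\lambda_2$. Then, setting $r = 1/(\lambda_1 + \lambda_2)$, $p = \lambda_1 r$, and $q = \lambda_2 r = 1 - p$, the expected upload time is

$$\psi(k) = \frac{k_1}{\lambda_1} + \frac{k_2}{\lambda_2} - r \sum_{n_1=0}^{k_1-1} \sum_{n_2=0}^{k_2-1} \binom{n_1+n_2}{n_1} p^{n_1} q^{n_2}.$$

To minimize the above, we derive the following relation through algebraic manipulation in Appendix B:

$$\psi(k_1, k_2) \gtreqless \psi(k_1+1, k_2-1) \iff \frac{I_p(k_1, k_2)}{I_{1-p}(k_2-1, k_1+1)} \gtreqless \frac{\lambda_2}{\lambda_1},$$

where $I_p(\cdot,\cdot)$ is the regularized beta function. This allows finding the optimal allocation (see Appendix B).

When $K$ is large, the optimal strategy can be found by numerically solving the following nonlinear equation:

$$\frac{I_p(x, K-x)}{I_{1-p}(K-x-1, x+1)} - \left(\frac{1}{p} - 1\right) = 0.$$

In this case, the optimal allocation on path 1 is whichever of the two integers nearest to the solution $x$ produces the lower mean upload latency.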
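The closed-form mean upload latency for two exponential paths can be evaluated directly. The sketch below assumes $r = 1/(\lambda_1+\lambda_2)$, $p = \lambda_1 r$, and $q = \lambda_2 r$, which is the reading consistent with the three-path expression in Example 1. The function names are hypothetical, and the search is a plain exhaustive scan rather than the beta-function criterion.

```python
from math import comb


def psi_two_path(k1, k2, lam1, lam2):
    """Mean of max(Erlang(k1, lam1), Erlang(k2, lam2)) from the closed form."""
    r = 1.0 / (lam1 + lam2)
    p = lam1 * r
    q = lam2 * r
    # r * double sum equals E[min] of the two path completion times;
    # it is zero when either path carries no packets (empty range).
    mean_min = r * sum(
        comb(n1 + n2, n1) * p**n1 * q**n2
        for n1 in range(k1)
        for n2 in range(k2)
    )
    return k1 / lam1 + k2 / lam2 - mean_min


def best_two_path_allocation(K, lam1, lam2):
    """Exhaustive search over all (k1, k2) with k1 + k2 = K."""
    candidates = ((k1, K - k1) for k1 in range(K + 1))
    return min(candidates, key=lambda k: psi_two_path(k[0], k[1], lam1, lam2))


print(psi_two_path(1, 1, 1.0, 1.0))           # mean of the max of two Exp(1) variables
print(best_two_path_allocation(10, 1.0, 1.0))  # balanced split for identical paths
```

For identical paths the scan returns the balanced split, which coincides with the proportional allocation in that special case.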

In Fig. 2, we consider the canonical two-path scenario for different choices of path-specific delay distributions and show the mean upload latency as a function of the number of packets on path 1. For distributions not admitting a closed-form expression for the mean upload latency, e.g., Weibull and lognormal, we performed numerical integration.

Near-optimality of proportional allocation: A comparison with [4, 5]: The two-path scenario has been studied in [4, 5] for the exponential delay model. The authors, however, do not compute a closed-form expression for the mean upload latency and only provide the following upper bound, based on a Chernoff technique (due to [4]):

$$\psi(k_1, k_2) \le \max\left\{\frac{k_1}{\lambda_1}, \frac{k_2}{\lambda_2}\right\} + \sqrt{2\pi}\left(\sqrt{\frac{k_1}{\lambda_1^2}} + \sqrt{\frac{k_2}{\lambda_2^2}}\right).$$

Based on the above bound, the authors characterize the optimal allocation as being either the proportional allocation, i.e., packets are allocated in proportion to the path rates, or the winner-takes-it-all allocation, i.e., all packets are sent over a single path. In contrast, we provide an exact closed-form expression for the mean upload delay and find the optimal allocation. Interestingly, we observe near-optimality of the proportional allocation, e.g., as shown in Fig. 4 (left) for exponential path delays. In Fig. 2, we see that similar conclusions hold for Weibull and lognormal delays as well.

### III-B The N-path case with exponential delays

We next consider the general case of $N$ paths available for uploading $K$ packets of data as sketched in Fig. 1. Suppose the $i$-th path has exponential delay with rate $\lambda_i$. The mean upload latency of the allocation $k$ is given by

$$\psi(k) = \sum_{S \in \{A \subseteq [N] : A \neq \emptyset\}} (-1)^{|S|+1} \sum_{0 \le n_i \le k_i - 1 \,:\, i \in S} \left(\prod_{i \in S} \frac{\lambda_i^{n_i}}{n_i!}\right) \frac{\Gamma\left(\sum_{i \in S} n_i + 1\right)}{\left(\sum_{i \in S} \lambda_i\right)^{\sum_{i \in S} n_i + 1}}.$$

The outer summation is carried out over all non-empty subsets of $[N]$. The derivation is provided in Appendix B. The closed-form expression of $\psi(k)$ for more than two paths has not been provided before, to the best of our knowledge.
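The inclusion-exclusion structure of the $N$-path expression translates into a short routine. This is a minimal sketch, assuming exponential per-packet delays with rate `lam[i]` on path `i`. The name `psi_n_path` is hypothetical, and the cost grows exponentially in the number of paths, so it is meant for small examples only.

```python
from itertools import combinations, product
from math import factorial


def psi_n_path(k, lam):
    """Mean upload latency E[max_i Erlang(k[i], lam[i])] by inclusion-exclusion."""
    num_paths = len(k)
    total = 0.0
    for size in range(1, num_paths + 1):
        for subset in combinations(range(num_paths), size):
            rate_sum = sum(lam[i] for i in subset)
            inner = 0.0
            # Sum over all index tuples with 0 <= n_i <= k_i - 1 for i in subset.
            for n in product(*(range(k[i]) for i in subset)):
                m = sum(n)
                coeff = 1.0
                for i, n_i in zip(subset, n):
                    coeff *= lam[i] ** n_i / factorial(n_i)
                # Gamma(m + 1) equals m! for integer m.
                inner += coeff * factorial(m) / rate_sum ** (m + 1)
            total += (-1) ** (size + 1) * inner
    return total


# Three identical Exp(1) paths with one packet each: 1 + 1/2 + 1/3.
print(psi_n_path((1, 1, 1), (1.0, 1.0, 1.0)))
```

For a single path the expression reduces to $k_1/\lambda_1$, and for two paths it agrees with the two-path formula of Sec. III-A.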

###### Example 1.

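In particular, when there are three paths admitting exponential delays with parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$, respectively, the expression for the mean delay corresponding to an allocation $k = (k_1, k_2, k_3)$ simplifies to

$$
\begin{aligned}
\psi(k) ={}& \frac{k_1}{\lambda_1} + \frac{k_2}{\lambda_2} + \frac{k_3}{\lambda_3} \\
&- \frac{1}{\lambda_1+\lambda_2} \sum_{n_1=0}^{k_1-1} \sum_{n_2=0}^{k_2-1} \frac{(n_1+n_2)!}{n_1!\,n_2!} \left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^{n_1} \left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{n_2} \\
&- \frac{1}{\lambda_2+\lambda_3} \sum_{n_2=0}^{k_2-1} \sum_{n_3=0}^{k_3-1} \frac{(n_2+n_3)!}{n_2!\,n_3!} \left(\frac{\lambda_2}{\lambda_2+\lambda_3}\right)^{n_2} \left(\frac{\lambda_3}{\lambda_2+\lambda_3}\right)^{n_3} \\
&- \frac{1}{\lambda_3+\lambda_1} \sum_{n_3=0}^{k_3-1} \sum_{n_1=0}^{k_1-1} \frac{(n_3+n_1)!}{n_3!\,n_1!} \left(\frac{\lambda_3}{\lambda_3+\lambda_1}\right)^{n_3} \left(\frac{\lambda_1}{\lambda_3+\lambda_1}\right)^{n_1} \\
&+ \frac{1}{\lambda_1+\lambda_2+\lambda_3} \sum_{n_1=0}^{k_1-1} \sum_{n_2=0}^{k_2-1} \sum_{n_3=0}^{k_3-1} \frac{(n_1+n_2+n_3)!}{n_1!\,n_2!\,n_3!} \left(\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3}\right)^{n_1} \\
&\qquad \times \left(\frac{\lambda_2}{\lambda_1+\lambda_2+\lambda_3}\right)^{n_2} \left(\frac{\lambda_3}{\lambda_1+\lambda_2+\lambda_3}\right)^{n_3}.
\end{aligned}
\tag{3}
$$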

The exact expression of the mean delay above can be minimized to find the optimal allocation.

In Fig. 3, we consider three heterogeneous paths with exponential delays. The near-optimality of the proportional allocation is observed here too (centered at the innermost contour).

The allocation strategies discussed so far inherently impose a synchronization constraint at the destination. At a certain overhead, one way to circumvent this synchronization constraint is replication, which we consider next.

### III-C Replication strategies

A basic replication strategy is to send the entire data over all available paths and take the first chunk that arrives at the destination. Replication strategies are known to reduce latency in some regimes [6]. However, an apparent drawback is their overuse of resources, e.g., higher energy consumption. Roughly put, replication replaces the max operation (requiring the last chunk to arrive to complete the data at the receiver) with a min operation (taking the first to arrive at the receiver). However, the min operation is taken over elements that stochastically dominate the elements over which the max operation is taken. This poses an interesting trade-off: When should we replicate, and not allocate?

In the basic replication case, the upload latency is $D = \min(D_1^{(K)}, D_2^{(K)}, \ldots, D_N^{(K)})$, where $D_i^{(K)}$ denotes the amount of time taken by path $i$ to transport all $K$ packets, as before. Our objective remains minimizing the mean upload latency

$$\phi(N,K) := \mathsf{E}[D] = \mathsf{E}\left[\min\left(D_1^{(K)}, D_2^{(K)}, \ldots, D_N^{(K)}\right)\right] = \mu_1 F(K\upsilon),$$

where $\upsilon$ is an $N$-dimensional vector of all ones. We favor the replication strategy if $\phi(N,K)$ is smaller than the mean upload latency of any allocation $k \in \Lambda(N,K)$, i.e., if $\phi(N,K) < \min_{k \in \Lambda(N,K)} \psi(k)$. In relation to the question of replication versus allocation, we introduce next the notion of synchronization cost.

Synchronization cost: Suppose all available paths are used for transmission and let $\Lambda^*(N,K)$ denote the reduced set of valid allocations. Within $\Lambda^*(N,K)$, an allocation can be worse than a replication essentially because of the synchronization at the destination, i.e., because of some paths being much slower than others. To compare with a replication strategy, we define the synchronization cost given $N$ paths and data size $K$ as

$$\chi(N,K) := \min_{k \in \Lambda^*(N,K)} \psi(k) - \phi(N,K) = \min_{k \in \Lambda^*(N,K)} \mu_N F(k) - \mu_1 F(K\upsilon). \tag{4}$$

If $\chi(N,K)$ is positive, replication yields a smaller mean upload latency and hence is preferred. If $\chi(N,K)$ is negative, we prefer allocation over replication. Intuitively, if the data size $K$ is large, we expect the cost of redundancy to be high and $\chi(N,K)$ to be negative.

Consider the canonical two-path example with exponential delays from Sec. III-A. A straightforward computation of $\phi(2,K)$ yields the following closed-form expression of the synchronization cost defined in (4):

$$\chi(2,K) = \min_{(k_1,k_2) \in \Lambda^*(2,K)} \psi(k_1,k_2) - r \sum_{n_1=0}^{K-1} \sum_{n_2=0}^{K-1} \binom{n_1+n_2}{n_1} p^{n_1} q^{n_2}.$$

In Fig. 4, we show the synchronization cost as a function of the data size $K$. As the data size increases, the cost of redundancy worsens the performance of replication. Consequently, an allocation strategy is preferred for large data. However, the zero-crossing data size seen in Fig. 4, which separates the regimes where replication and allocation are more beneficial, shifts depending on path heterogeneity.
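The sign change of the synchronization cost can be reproduced numerically for two exponential paths. The sketch below is illustrative only: it assumes that the reduced set of allocations consists of those placing at least one packet on every path, and the function names are hypothetical.

```python
from math import comb


def mean_min_erlang(k1, k2, lam1, lam2):
    """E[min(Erlang(k1, lam1), Erlang(k2, lam2))] for independent paths."""
    r = 1.0 / (lam1 + lam2)
    p, q = lam1 * r, lam2 * r
    return r * sum(
        comb(n1 + n2, n1) * p**n1 * q**n2
        for n1 in range(k1)
        for n2 in range(k2)
    )


def sync_cost_two_path(K, lam1, lam2):
    """chi(2, K): best allocation using both paths minus full replication."""
    # Replication sends all K packets over both paths and keeps the first copy.
    phi = mean_min_erlang(K, K, lam1, lam2)
    # Assumption: the reduced set requires k1 >= 1 and k2 >= 1, so K >= 2.
    psi_opt = min(
        k1 / lam1 + (K - k1) / lam2 - mean_min_erlang(k1, K - k1, lam1, lam2)
        for k1 in range(1, K)
    )
    return psi_opt - phi


for size in (2, 4, 10):
    print(size, sync_cost_two_path(size, 1.0, 1.0))
```

With identical unit-rate paths the cost is positive for two packets and negative for ten, in line with the observation that allocation is preferred for large data.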

### III-D Combined Allocation and Replication: An (N,r)-strategy

Here, we present a variant of the replication strategy, called the $(N,r)$-strategy. An $(N,r)$-strategy splits data of size $K$ into $N$ smaller chunks so that the data batch can be reconstructed from any $r$ out of the $N$ chunks. One of the ways to achieve such a splitting is to use erasure codes, e.g., maximum distance separable (MDS) codes [7]. Note that an $(N,N)$-strategy corresponds to allocation and an $(N,1)$-strategy to replication. To formulate an $(N,r)$-strategy, we define

$$\Upsilon(N,r,K) := \left\{ k \in [K]^N \;\middle|\; \sum_{i \in S} k_i \ge K \quad \forall S \subseteq [N],\ |S| = r \right\}.$$

We call a $k \in \Upsilon(N,r,K)$ an $(N,r)$-allocation for data of size $K$. The data is received as soon as the first $r$ out of the $N$ chunks arrive at the destination. Let the order statistics corresponding to $D_1^{(k_1)}, \ldots, D_N^{(k_N)}$ be denoted by $D_{(1)} \le D_{(2)} \le \cdots \le D_{(N)}$. The mean upload latency for $k$ is $\eta_r(k) := \mathsf{E}[D_{(r)}]$. In Appendix A, we provide an example of $(N,r)$-allocations given three heterogeneous paths with exponential delays. For a fixed $r$, the optimal $(N,r)$-allocation is given by $k^{(r)}_{\mathrm{opt}} := \operatorname*{arg\,min}_{k \in \Upsilon(N,r,K)} \eta_r(k)$. We can, however, further improve the performance by optimizing over $r$. To measure the performance of an allocation compared to the optimal one, we define the regret of an $(N,r)$-allocation as

$$\gamma(k) := \eta_r(k) - \min_{r \in [N]} \eta_r\left(k^{(r)}_{\mathrm{opt}}\right). \tag{5}$$

In Fig. 5, we consider three heterogeneous paths with exponential delays. We find the optimal allocation by minimizing the regret. Interestingly, the optimal allocation is neither a replication nor a $(3,3)$-allocation, but rather a $(3,2)$-allocation.
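The constraint defining an $(N,r)$-allocation and its mean latency can be checked with a few lines of code. This is a rough sketch under the assumption of exponential per-packet delays, so that each path completion time is Erlang distributed. The mean of the $r$-th order statistic is estimated by plain Monte Carlo rather than by the closed-form operators, and all names are hypothetical.

```python
import random
from itertools import combinations


def is_nr_allocation(k, r, K):
    """Check that every choice of r chunks carries at least K packets in total."""
    if any(k_i < 1 or k_i > K for k_i in k):
        return False
    return all(
        sum(k[i] for i in subset) >= K
        for subset in combinations(range(len(k)), r)
    )


def mc_mean_latency(k, lam, r, samples=20000, seed=0):
    """Monte Carlo estimate of the mean of the r-th smallest path completion time."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        times = sorted(rng.gammavariate(k_i, 1.0 / l_i) for k_i, l_i in zip(k, lam))
        total += times[r - 1]
    return total / samples


print(is_nr_allocation((3, 3, 3), 2, 6))   # any two chunks cover six packets
print(is_nr_allocation((2, 2, 2), 2, 6))   # two chunks of size two are not enough
print(mc_mean_latency((3, 3, 3), (1.0, 1.0, 1.0), 2))
```

Setting `r` to the number of paths recovers plain allocation, and `r = 1` with every chunk of size `K` recovers replication.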

Now, we analyze collaborative uploading for continuous data streams using an FJ queuing model. An example scenario is the continuous upload of video data using multiple paths, as depicted in Fig. 1. We first consider a rigid allocation strategy based on known probabilistic bounds on the steady-state waiting times before proposing an adaptive allocation scheme based on stochastic gradient descent.

### IV-A Rigid allocation based on steady-state bounds

Following [1], we define the waiting time of an incoming data batch as the amount of time it waits until the last of its chunks starts getting uploaded. Consider the steady-state waiting time $W$ (a precise definition is given in [1] and, for the sake of completeness, also in Appendix A). It is hard to find the distribution of the steady-state waiting times in closed form (see [1, 8]). One approach is to compute tight upper bounds on the tail probabilities. Following [2], for a given allocation $k$ and independent service times, we get

 P(W≥σ)≤ (6)

Here, the decay rate appearing in the bound is given by a condition involving the Laplace transforms of the inter-arrival times and of the service times for the packets allocated to each path. It is the effective decay rate of the tail probability in the sense of the large deviations principle, and it assesses the quality of a given allocation (the higher the decay rate, the better).

Reducing the waiting times is equivalent to maximizing the effective decay rate. Treating the decay rate as a function of the allocation, the optimal allocation is given by

 (7)

In Fig. 6, we revisit the canonical two-path scenario with exponential delays (derivation in Example 3 in Appendix A). Plotting the effective decay rate as a function of the number of packets on path 1, we find the optimal allocation (yielding the largest decay rate). We also observe the near-optimality of the proportional allocation.

The approach in (7) is convenient because of its simplicity. However, it has a number of drawbacks. Apart from the exponentially growing search space for the optimal allocation, the approach is valid for the steady-state waiting time only. In many applications, the transient behavior is important. The approach in (7) does not allow for adaptation as it ignores the current state of the system (the number of chunks already on each path). In a realistic setup with changing environment (e.g., Markov-modulated paths’ services), the ability to adapt is crucial. Keeping this in mind, we propose an adaptive allocation scheme in the next section.

We consider the problem of sequentially optimizing allocations for collaborative uploading of incoming data batches. The procedure is sketched in Fig. 8. Our adaptive allocation strategy seeks to minimize a sequence of cost functions $c_j$ by choosing a sequence of optimal allocation vectors $k_j$, with $j \in \mathbb{N}$. An allocation vector $k_j \in \Lambda(N, K_j)$ is a vector of integers, where the $n$-th entry corresponds to the number of packets (chunk size) transported over path $n$ and $K_j$ is the size of the $j$-th data batch. Since optimization over integers is hard, we adopt the following standard relaxation: instead of an allocation vector, we optimize a proportion vector $x_j \in \mathcal{C}$ corresponding to the $j$-th data batch, where $\mathcal{C}$ is the set of valid proportions. The allocation vector is found by taking the floor of the first $N-1$ entries of $K_j x_j$ and subtracting their sum from $K_j$ to get the $N$-th entry, such that the entries sum to $K_j$. Adhering to our modeling approach in Sec. II, we choose the mean waiting time as our cost function.
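The rounding step that maps a proportion vector back to an integer allocation can be sketched as follows. The function name `proportions_to_allocation` is a hypothetical choice for illustration.

```python
from math import floor


def proportions_to_allocation(x, batch_size):
    """Floor the first N-1 scaled proportions; the last path takes the remainder."""
    head = [floor(batch_size * x_n) for x_n in x[:-1]]
    return head + [batch_size - sum(head)]


# The entries always sum to the batch size, whatever the rounding of the head.
print(proportions_to_allocation([0.25, 0.25, 0.5], 10))
```

Because the last entry absorbs the rounding remainder, the relative rounding error per path shrinks as the batch size grows.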

Denote the service times of data batch $j$ on path $n$ and the inter-arrival times between data batch $j$ and $j+1$ by $S_{n,j}$ and $t_j$, respectively. From a control theoretic perspective, we treat $x_j$ as our control variable to optimize the mean of the following output:

$$W_j := \max\left\{0, \max_{n \in [N],\, k \in [j-1]} \left\{ \sum_{i \in [k]} \left( x_{n,j-i} S_{n,j-i} - t_{j-i} \right) \right\} \right\}. \tag{8}$$

The quantity in (8) mimics the waiting time of the $j$-th data batch in a Fork-Join system [2, 1] with a diminishing rounding error for increasing batch sizes. The $j$-th cost function is

$$c_j(x_j) := \mathsf{E}[W_{j+1}]. \tag{9}$$

We minimize the cost functions sequentially in an online fashion using gradient descent methods as they achieve a bounded regret in an online convex programming scenario [9].
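The waiting time in (8) is a maximum over partial sums that run backwards in time, which is equivalent to running a Lindley-type recursion on each path and taking the maximum over paths. The sketch below evaluates both forms on a toy history. The data layout (`work[b][n]` holding the product of proportion and service time for batch `b` on path `n`) and the function names are assumptions made for illustration.

```python
def waiting_time(work, inter_arrival):
    """Direct evaluation of the max over paths and over backward partial sums."""
    num_paths = len(work[0]) if work else 0
    best = 0.0
    for n in range(num_paths):
        partial = 0.0
        # Walk from the most recent previous batch to the oldest one.
        for b in reversed(range(len(work))):
            partial += work[b][n] - inter_arrival[b]
            best = max(best, partial)
    return best


def waiting_time_lindley(work, inter_arrival):
    """Same quantity via a per-path recursion W <- max(0, W + work - t)."""
    num_paths = len(work[0]) if work else 0
    waits = [0.0] * num_paths
    for batch, gap in zip(work, inter_arrival):
        waits = [max(0.0, w + load - gap) for w, load in zip(waits, batch)]
    return max(waits, default=0.0)


history = [[3.0, 0.5], [1.0, 4.0]]   # two previous batches on two paths
gaps = [1.0, 2.0]                     # inter-arrival times after each batch
print(waiting_time(history, gaps), waiting_time_lindley(history, gaps))
```

The recursive form only needs the current per-path backlog, which is convenient for online use.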

As data batches are passed from the application on the primary device to be split over multiple secondary devices (Fig. 1), we assume the $j$-th inter-arrival time $t_j$ is known to the scheduler before employing the next proportion. Using Monte Carlo (MC) methods, we can calculate an unbiased estimate of the cost function for each data batch. Since (8) is a piecewise linear function, which is non-smooth, we use an unbiased estimate of a subgradient of the $j$-th cost function in (9) to perform gradient descent. The definition of a subgradient is given in Appendix C and [10].

#### Gradient descent for data allocation

As shown in Fig. 8, we update the proportion using gradient descent by

$$x_{j+1} = P_{\mathcal{C}}\left(x_j + \eta\, \hat{g}_j(x_j)\right), \tag{10}$$

where $\hat{g}_j(x_j)$ is an unbiased estimate of the subgradient of $c_j$ in (9) evaluated at the current $x_j$. Here, $\eta$ is the static learning rate controlling the step size of the subgradient, and $P_{\mathcal{C}}$ denotes the Euclidean projection operator [11], projecting the gradient update onto the set of feasible proportions $\mathcal{C}$.
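The projection step in the update can be illustrated with a standard sort-based Euclidean projection. The sketch below assumes that the set of feasible proportions is the probability simplex (non-negative entries summing to one), which the text does not state explicitly. The function names are hypothetical, and the update direction is passed in by the caller, so a descent step on the cost corresponds to supplying the negative subgradient estimate.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_n >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        threshold = (cumulative - 1.0) / i
        # Keep the threshold of the largest index whose entry stays positive.
        if u_i - threshold > 0:
            tau = threshold
    return [max(v_n - tau, 0.0) for v_n in v]


def projected_update(x, direction, eta):
    """One step x <- P_C(x + eta * direction) with a static learning rate eta."""
    return project_to_simplex([x_n + eta * d_n for x_n, d_n in zip(x, direction)])


x_next = projected_update([0.5, 0.5], [-1.0, 1.0], 0.1)
print(x_next, sum(x_next))
```

The projection keeps every iterate a valid proportion vector, so it can always be rounded to an integer allocation afterwards.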

The update equation (10) ensures bounded regret with an unbiased estimate of the subgradient [12], which we obtain using Monte Carlo methods. We resample service times $M$ times to estimate the subgradient. The $m$-th sample for the service times up to data batch $j$ at each path $n$ is denoted by $s^{(m)}_{n,i}$, $i \le j$. Then, the MC estimate of the subgradient is

$$\hat{g}_j(x_j) = \frac{1}{M} \sum_{m \in [M]} \left( s^{(m)}_{n,j}\, e_n\, \mathbb{1}\!\left( s^{(m)}_{n^*_m, k^*_m}\, e^{T}_{n^*_m} x_j + \sum_{i=1}^{k^*_m - 1} x_{n^*_m, j-i}\, s^{(m)}_{n^*_m, j-i} - \sum_{i=0}^{k^*_m - 1} t_{j-i} > 0 \right) \right), \tag{11}$$

where $e_n$ is the $n$-th unit vector, and $\mathbb{1}(\cdot)$ is the indicator function. The maximizers $n^*_m$ and $k^*_m$ can be found by

$$(n^*_m, k^*_m) = \operatorname*{arg\,max}_{n \in [N],\, k \in [j]} \left\{ s^{(m)}_{n,j}\, e^{T}_{n} x_j + \sum_{i=1}^{k-1} x_{n,j-i}\, s^{(m)}_{n,j-i} - \sum_{i=0}^{k-1} t_{j-i} \right\}. \tag{12}$$

Note that due to ergodicity, queuing systems such as the one described in (8) possess regeneration points at the beginning of every busy period, i.e., at the last time point when the queue was empty. This substantially reduces the history, i.e., the number of samples, required for calculating the subgradient. Algorithm 1 describes the adaptive allocation method. In the next section, we describe the inference procedure required for our adaptive allocation.

### IV-C Inference for service time processes

Since the adaptation of allocations requires samples of the service times (Step 2 of Algorithm 1), we provide here illustrative resampling schemes for independent and identically distributed (i.i.d.) and Markov-modulated service times.

#### Exponential i.i.d. service times

Assume i.i.d. service times at the different paths. Suppose the per-packet service times at one of the paths are exponentially distributed with rate parameter $\lambda$, so that the service time of a chunk is gamma distributed with shape parameter equal to the number of packets in the chunk and rate $\lambda$. The predictive distribution of a new sample of the service time based on a set of observed data is found by integrating out the parameter $\lambda$ with respect to its posterior distribution. If we assume a conjugate prior on $\lambda$, namely, a gamma distribution with shape and rate (hyper)-parameters, the posterior is again a gamma distribution with updated (posterior) shape and rate parameters. The derivation steps are provided in Appendix C. In order to sample from the predictive distribution, we first sample $\lambda$ from the posterior distribution and then sample the service time conditioned on $\lambda$ (step 2 in Algorithm 1). Repeating this procedure iteratively yields sequential update equations for the posterior parameters, starting from the prior hyperparameters.
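The conjugate update and the two-stage predictive sampling can be sketched with the standard gamma-exponential model. The hyperparameter names `alpha` (shape) and `beta` (rate) and the function names are assumptions for illustration. The update used here is the textbook one for a gamma likelihood with known shape: the posterior shape grows by the number of packets and the posterior rate grows by the observed chunk service time.

```python
import random


def update_posterior(alpha, beta, service_time, packets=1):
    """Gamma(alpha, beta) prior on the rate, one chunk time ~ Gamma(packets, rate)."""
    return alpha + packets, beta + service_time


def sample_predictive(alpha, beta, packets, rng):
    """Draw a rate from the posterior, then a chunk service time given that rate."""
    # random.gammavariate takes (shape, scale), hence scale = 1 / rate.
    rate = rng.gammavariate(alpha, 1.0 / beta)
    return rng.gammavariate(packets, 1.0 / rate)


rng = random.Random(1)
alpha, beta = 1.0, 1.0                      # assumed prior hyperparameters
for _ in range(2000):
    observed = rng.expovariate(2.0)         # synthetic data with true rate 2
    alpha, beta = update_posterior(alpha, beta, observed)

print(alpha / beta)                         # posterior mean of the rate
print(sample_predictive(alpha, beta, 3, rng))
```

Sampling the rate first and the service time second integrates out the rate implicitly, so the resampled service times reflect the remaining parameter uncertainty.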

#### Markov modulated service times

In this case, we assume that the service times are instances of a Markov modulated exponentially distributed variable. For a sequence of service times, we first find maximum-likelihood estimates (MLE) of the underlying parameters, e.g., the initial distribution, transition matrix and the rates. Since the MLE computation is expensive, this estimation process is executed offline. The derivation of the MLE is provided in Appendix C. Next, we condition on these parameters to calculate the maximum a posteriori (MAP) estimate of the current hidden state of the Markov chain. We do this online using the Viterbi algorithm [13]. Having inferred the hidden state, we resample the service times (step 2 in Algorithm 1) online by conditioning on the hidden state and the parameters estimated in the first step.
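The online state inference can be illustrated with a small log-domain Viterbi decoder for a hidden Markov chain with exponential emissions. This is a minimal sketch, assuming the initial distribution, transition matrix, and rates have already been estimated offline and that all probabilities are strictly positive. The function name and the toy parameters are hypothetical.

```python
from math import log


def viterbi_exponential(obs, init, trans, rates):
    """Most likely hidden state sequence for exponentially distributed observations."""
    num_states = len(init)

    def log_emit(state, y):
        # Log-density of an Exp(rates[state]) observation y.
        return log(rates[state]) - rates[state] * y

    score = [log(init[s]) + log_emit(s, obs[0]) for s in range(num_states)]
    backpointers = []
    for y in obs[1:]:
        prev = score
        # Best predecessor for each state at this step.
        ptr = [
            max(range(num_states), key=lambda a: prev[a] + log(trans[a][s]))
            for s in range(num_states)
        ]
        score = [
            prev[ptr[s]] + log(trans[ptr[s]][s]) + log_emit(s, y)
            for s in range(num_states)
        ]
        backpointers.append(ptr)

    state = max(range(num_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


service_times = [0.05, 0.1, 0.02, 3.0, 2.5, 4.0]
states = viterbi_exponential(
    service_times,
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    rates=[10.0, 0.5],
)
print(states, "current state:", states[-1])
```

The last entry of the decoded path serves as the estimate of the current hidden state, which then selects the rate used for resampling service times.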

## V Numerical Evaluation

We consider the setup in Fig. 1 with arrivals at the primary device being modeled as a Markov modulated Poisson process (MMPP) to allow for burstiness in batch arrivals. This model was shown to be a good candidate for some network traffic [14]. We assume inter-arrival times are modulated by a three-state Markov chain. Further, the batch sizes are sampled from a Poisson distribution to account for varying data sizes to be uploaded. We assume five heterogeneous paths are available. For the service process, we observe just one sample of the service times for each path and each data batch.

We evaluate the performance of our adaptive allocation using two experiments. In the first experiment, we vary the service time distributions to reflect different regimes of stress on the paths, assuming full knowledge of the model parameters. In the second experiment, we do not assume knowledge of the model parameters, but instead infer them. The MC estimate of the subgradient is always based on samples, except for the one sample estimate (OSE) method where only the observed sample is used. We use and simulations for the first and the second experiment, respectively.

We compare our adaptive allocation method with the proportional allocation because of its observed near-optimality in the intermittent case and in the bound approach for continuous stream uploading. Note that finding the optimal allocation is computationally expensive, while the proportional allocation is readily found. We also consider a variant of a queue-aware “join the shortest queue” (JSQ) schedule where the batch is assigned to the path with the shortest queue. Such schedules are known to have good performance. However, for many applications, obtaining the queue length information is hard.

Experiment 1: Here, we assume Markov modulated exponentially distributed service times. This captures the service time correlations, e.g., in time varying wireless channels [15]. We use five independent three-state Markov chains to modulate the means of the service times for each chunk on each path. For the proportional allocation, we calculate the mean service rate weighted by the stationary probabilities.

We consider two complementary situations: one, called the low stress regime, where the service rates are high such that one path is sufficient to serve incoming batches (ensuring stability), and the other, called the high stress regime, where the service rates are so low that utilizing all five paths is necessary to ensure stability. In Fig. 10, we show the complementary cumulative distribution function (CCDF) of the waiting times comparing different allocation strategies. This figure shows the benefit of adaptive allocation. Note that the proportional allocation is not adaptive. On the other hand, the batch JSQ under-utilizes parallelization. Our allocation scheme described in Sec. IV-B ensures adaptiveness while being allocative. This benefit of adaptation is prominent, especially in the high stress regime as seen in Fig. 10 (Middle). Fig. 10 (Right) also shows how our allocation adapts to the service rate changes. Here, the latent state of the Markov chain and the parameters for the service times are assumed to be known.

Experiment 2: In this experiment, we infer the model parameters. In keeping with the setup in Fig. 8, we first assume i.i.d. exponentially distributed service times. Fig. 11 (Left) shows the CCDFs of the waiting times for different allocation strategies. To estimate the subgradient, the oracle uses the true parameters to draw samples, while the Bayesian inference draws samples from the predictive distribution.

We further consider the setup in Fig. 8 with Markov modulated exponentially distributed service times as described in Experiment 1. The inference method draws samples from the emission distribution using a MAP estimate of the current latent state of the Markov chain for each allocation. The parameters of the Markov modulation of the service times are learned offline using a training sequence. In Fig. 11 (Right), we compare our inference-based method with the OSE, and an oracle that draws samples from the emission distribution with the true parameters and the true latent states.

From Fig. 11, we see that increasing the number of samples for the subgradient estimation leads to smaller tail probabilities, since the subgradient noise decreases. Interestingly, the lack of knowledge of the model parameters does not affect the performance of our adaptive allocation, as the inference method achieves results comparable to that of the oracle.

## VI Related Work

The work in [16] anticipated the emergence of crowdsourcing systems. A recent discussion introduced crowdsourced live event coverage in [17], where the authors propose adaptive strategies for collaboratively uploading the most relevant streams. Our analytical treatment is complementary to [17].

The intermittent uploading scenario is analyzed in [4], where the authors provide an upper bound on the mean delay for the canonical two-path scenario with exponential path delays. In contrast, we obtain closed-form expressions for the mean delay for $N$ paths. We also provide tools and examples of the optimization thereof.

A segment of related work is concerned with the analysis of controlled and uncontrolled Fork-Join (FJ) systems. For uncontrolled, i.e., non-adaptive FJ systems, it is known that exact results are hard to obtain [8]. Exact results for the joint workload distribution are known only for two parallel queues with Poisson arrivals and i.i.d. exponential service times [18]. For more general scenarios one resorts, e.g., to bounds on the tail probabilities of the steady-state waiting times for single-stage systems [2, 1, 8] or multistage systems [19]. These works also highlight the benefits of parallelization under high utilization regimes, in agreement with what we observe in Experiment 1 in Sec. V. The work in [20] considers controlling FJ systems using a gradient descent approach to minimize queue lengths. They use changes in queue lengths to find an unbiased estimate of the gradient. Note that obtaining the queue lengths is not straightforward in many applications. Therefore, we instead infer the service time distributions and adapt our allocation using gradient descent to minimize the expected waiting time. In [21], the authors investigate scheduling of batch jobs in systems with multiple servers. As opposed to our setup, they assume a queuing model without a synchronization constraint and only consider a special class of i.i.d. service times. For our adaptive allocation method, we do not make any independence assumption.

Redundancy techniques have grown in popularity over the years as a means to decrease latency. In [6], the authors study the trade-off between the latency reduction attained by redundancy and the corresponding overhead. Based on empirical results, they argue that redundancy can be effective in a large class of applications. The work in [7] is close to ours. The authors model a cloud computing scenario as a Fork-Join system with identical servers and analyze different redundancy techniques, which are akin to our -allocations in the intermittent uploading case, with a view to reducing latency in a cost-efficient manner. The authors find that the log-concavity of the task service times decides the success of redundancy techniques. Their approach is complementary to our adaptive allocation with heterogeneous Markov modulated servers.

The allocation problem in the continuous stream uploading case can be seen as a type of load-balancing problem, however, with the mean waiting time as the objective function. A programming model for the allocation of continuous streams over multiple paths was introduced in [22]. In a recent work [23], the authors consider storage and delivery of large files in data centers, where files are first erasure-coded and then stored in a subset of the available servers. They compare the performance of water-filling and batch sampling as dynamic load-balancing policies and provide computable performance bounds. In contrast, we do not restrict ourselves to a rigid allocation strategy and adapt our allocation dynamically.

## VII Conclusion

In this work, we optimize allocation and replication strategies in collaborative uploading scenarios. We differentiate between intermittent and continuous stream uploading, based on the system’s queuing behavior. In the first case (no queuing), we unify the notions of allocation and replication, and provide closed-form expressions for the mean upload latency. We use our exact formulation for the intermittent uploading case to derive optimal allocation and replication strategies.

We pose the continuous stream uploading case as a Fork-Join queuing model with varying burstiness of the data traffic to be uploaded, and of the paths’ service. For this case, we propose an adaptive allocation scheme based on statistical inference of the properties of the paths’ latencies. We sequentially minimize a notion of the expected waiting time, ensuring a bounded regret. We show the effectiveness of our adaptive approach compared to proportional allocation and batch JSQ allocation. The lack of knowledge of the model parameters does not affect the performance of our adaptive allocation, as the inference methods are able to achieve results comparable to those of an oracle with full system knowledge.

## Appendix A

### A-A Moments of order statistics

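Let $X_1, \dots, X_N$ be independent positive-valued random variables with absolutely continuous CDFs $F_1, \dots, F_N$. Let the corresponding order statistics be $Y_1 \le Y_2 \le \dots \le Y_N$. Write $\mathbf{F}(y) \coloneqq (F_1(y), \dots, F_N(y))^{\top}$ and $\mathbf{1} - \mathbf{F}(y) \coloneqq (1 - F_1(y), \dots, 1 - F_N(y))^{\top}$. The distribution of the $r$-th order statistic can be written in terms of permanents as [24, Theorem 4.1],

$$
P(Y_r \le y) = \sum_{i=r}^{N} \frac{1}{i!\,(N-i)!} \operatorname{per}\big[\underbrace{\mathbf{F}(y)}_{i} \;\; \underbrace{\mathbf{1} - \mathbf{F}(y)}_{N-i}\big], \tag{13}
$$

where the bracketed term denotes the $N \times N$ matrix whose first $i$ columns are $\mathbf{F}(y)$ and whose last $N-i$ columns are $\mathbf{1} - \mathbf{F}(y)$, $\operatorname{per} A$ denotes the permanent of an $N \times N$ real matrix $A$, and $\Theta(N)$ denotes the class of all permutations of $[N] \coloneqq \{1, \dots, N\}$. Using (13), we derive the expected values of the order statistics [24, 25].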
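As a small numerical illustration of (13), the following sketch evaluates the permanent by brute force over all permutations and assembles the CDF of the $r$-th order statistic from the values $F_1(y), \dots, F_N(y)$ at a fixed $y$. The function names `permanent` and `order_stat_cdf` are hypothetical, and the brute-force permanent is only practical for small $N$.

```python
from itertools import permutations
from math import factorial, prod


def permanent(a):
    """Permanent of a square matrix via the permutation expansion (O(N * N!))."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


def order_stat_cdf(f_vals, r):
    """P(Y_r <= y) from (13), where f_vals[k] = F_{k+1}(y) at a fixed y."""
    n = len(f_vals)
    total = 0.0
    for i in range(r, n + 1):
        # Row k holds F_k(y) in the first i columns and 1 - F_k(y) in the
        # remaining n - i columns.
        mat = [[f] * i + [1.0 - f] * (n - i) for f in f_vals]
        total += permanent(mat) / (factorial(i) * factorial(n - i))
    return total
```

Two sanity checks follow from elementary probability: for $r = N$ the result should equal $\prod_i F_i(y)$ (all variables are below $y$), and for $r = 1$ it should equal $1 - \prod_i (1 - F_i(y))$ (at least one variable is below $y$).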

###### Remark 1.

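For $r \in [N]$, the mean of $Y_r$ can be conveniently written in terms of $M$-operators as

$$
E[Y_r] = \mu_r^F \coloneqq \sum_{j=N-r+1}^{N} (-1)^{j-(N-r-1)} \binom{j-1}{N-r} M_j^F,
$$

where the $M$-operators, for $j \in [N]$, are defined as

$$
M_j^F \coloneqq \sum_{S \in \{A \subseteq [N] : |A| = j\}} \int_0^{\infty} \Big( \prod_{i \in S} \big(1 - F_i(x)\big) \Big)\, dx. \tag{14}
$$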
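To make Remark 1 concrete, the sketch below evaluates the formula for exponential path delays, an assumption made here only because the $M$-operators then have a closed form: with rates $\lambda_i$, the integral of $\prod_{i \in S} e^{-\lambda_i x}$ equals $1 / \sum_{i \in S} \lambda_i$. The function names `m_operator` and `order_stat_mean` are hypothetical.

```python
from itertools import combinations
from math import comb


def m_operator(rates, j):
    """M_j^F from (14) for exponential CDFs F_i(x) = 1 - exp(-rates[i] * x)."""
    # Each size-j subset S contributes 1 / (sum of the rates in S).
    return sum(1.0 / sum(subset) for subset in combinations(rates, j))


def order_stat_mean(rates, r):
    """E[Y_r] via the alternating sum of M-operators in Remark 1."""
    n = len(rates)
    return sum(
        (-1) ** (j - (n - r - 1)) * comb(j - 1, n - r) * m_operator(rates, j)
        for j in range(n - r + 1, n + 1)
    )
```

For three i.i.d. unit-rate paths, the values agree with the classical harmonic-sum result for exponential order statistics: `order_stat_mean([1.0, 1.0, 1.0], 1)` gives $1/3$ and `order_stat_mean([1.0, 1.0, 1.0], 3)` gives $1/3 + 1/2 + 1 = 11/6$, up to floating-point error.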
###### Proof of Remark 1.

The proof follows from [24, 25]. However, for the sake of completeness, we furnish a brief sketch here. Define, for $r \in [N]$,

$$
H_r(y) \coloneqq P(Y_r \le y), \tag{15}
$$

where $P(Y_r \le y)$ is given in (13). Since $Y_r$ is positive-valued, its mean can be obtained by evaluating the integral

$$
E[Y_r] = \int_0^{\infty} \big(1 - H_r(y)\big)\, dy.
$$

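Observe that we can derive the following recursion relation from (13), for $r = 2, \dots, N$,

$$
H_{r-1}(y) = H_r(y) + \frac{1}{(r-1)!\,(N-r+1)!} \operatorname{per}\big[\underbrace{\mathbf{F}(y)}_{r-1} \;\; \underbrace{\mathbf{1} - \mathbf{F}(y)}_{N-r+1}\big], \tag{16}
$$

where the permanent of an $N \times N$ real matrix $A = (a_{i,j})$ is given by

$$
\operatorname{per} A \coloneqq \sum_{\sigma \in \Theta(N)} \prod_{i=1}^{N} a_{i,\sigma(i)},
$$

and $\Theta(N)$ denotes the class of all permutations of $[N]$. Plugging in the definition of the permanent, we rewrite (16) as

$$
H_{r-1}(y) = H_r(y) + \frac{1}{(r-1)!\,(N-r+1)!} \sum_{\sigma \in \Theta(N)} \prod_{i=1}^{N} a_{i,\sigma(i)}(y),
$$

where

$$
a_{i,\sigma(i)}(y) =
\begin{cases}
F_i(y) & \text{if } 1 \le \sigma(i) \le r-1, \\
1 - F_i(y) & \text{if } r \le \sigma(i) \le N.
\end{cases} \tag{17}
$$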

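Rearranging the terms in the recurrence relation, we get

$$
1 - H_r(y) = 1 - H_{r-1}(y) + \frac{1}{(r-1)!\,(N-r+1)!} \sum_{\sigma \in \Theta(N)} \prod_{i=1}^{N} a_{i,\sigma(i)}(y).
$$

Integrating both sides over $[0, \infty)$ and using $\mu_r^F = \int_0^{\infty} (1 - H_r(y))\, dy$, we get

$$
\mu_r^F = \mu_{r-1}^F + \frac{1}{(r-1)!\,(N-r+1)!} \sum_{\sigma \in \Theta(N)} \int_0^{\infty} \prod_{i=1}^{N} a_{i,\sigma(i)}(y)\, dy = \mu_{r-1}^F + K_r^F,
$$

where the operator $K_r^F$ is given by

$$
K_r^F \coloneqq \frac{1}{(r-1)!\,(N-r+1)!} \sum_{\sigma \in \Theta(N)} \int_0^{\infty} \prod_{i=1}^{N} a_{i,\sigma(i)}(y)\, dy.
$$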

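Note that, for each permutation $\sigma$, there are $r-1$ terms involving $F_i(y)$ and $N-r+1$ terms involving $1 - F_i(y)$ in the product. Moreover, each index set $S = \{i : \sigma(i) \le r-1\}$ of size $r-1$ arises from exactly $(r-1)!\,(N-r+1)!$ permutations, which cancels the prefactor. Therefore, we have

$$
K_r^F = \sum_{S \in \{A \subseteq [N] : |A| = r-1\}} \int_0^{\infty} \Big( \prod_{j \in S} F_j(y) \Big) \Big( \prod_{j \in S^c} \big(1 - F_j(y)\big) \Big)\, dy.
$$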

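Let us rewrite the $K$-operators in terms of the $M$-operators to get the identity

$$
K_r^F \equiv \sum_{j=1}^{r} (-1)^{j-1} c(j, r, N)\, M_{N-r+j}^F, \tag{18}
$$

where the $c(j, r, N)$'s are suitable counting coefficients so that the above identity holds true with the $M$-operators defined by

$$
M_j^F \coloneqq \sum_{S \in \{A \subseteq [N] : |A| = j\}} \int_0^{\infty} \Big( \prod_{i \in S} \big(1 - F_i(x)\big) \Big)\, dx.
$$

Notice that the number of terms under the summation over $S$ with $|S| = r-1$ is $\binom{N}{r-1}$, and expanding $\prod_{j \in S} F_j(y) = \prod_{j \in S} \big(1 - (1 - F_j(y))\big)$ yields $\binom{r-1}{j-1}$ terms with $N-r+j$ factors of the form $1 - F_i(y)$, while the number of terms under the summation over $S$ with $|S| = N-r+j$ appearing in the computation of $M_{N-r+j}^F$ is $\binom{N}{N-r+j}$. Therefore, by applying the multiplication principle of combinatorial analysis, the counting coefficients must satisfy

$$
\binom{N}{r-1} \binom{r-1}{j-1} = c(j, r, N) \binom{N}{N-r+j},
$$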

in order for the above identity in (18) to hold true (see [25]). Therefore, we get

$$
c(j, r, N) = \binom{N-r+j}{j-1}, \tag{19}
$$

and we get the following recursion relation, for $r = 2, \dots, N$,

$$
\mu_r^F = \mu_{r-1}^F + \sum_{j=1}^{r} (-1)^{j-1} \binom{N-r+j}{j-1} M_{N-r+j}^F. \tag{20}
$$

Observe that and . Thereby from (20), the claim

$$
\mu_r^F = \sum_{j=N-r+1}^{N} (-1)^{j-(N-r-1)} \binom{j-1}{N-r} M_j^F \tag{21}
$$

follows by induction on $r$. The induction is proved in [25] and we do not repeat it here. This completes the proof.
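The two combinatorial steps of the proof can be checked numerically. The sketch below is only an illustration under the assumption of exponential path delays (so that the $M$-operators reduce to sums of reciprocal rate sums). It verifies the counting identity behind (19) for small $N$, and compares the increment $K_r^F$ from (18) and (19) with the difference of consecutive closed-form means from (21). All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def m_op(rates, j):
    """M_j^F for exponential CDFs: sum over size-j subsets of 1 / (rate sum)."""
    return sum(1.0 / sum(subset) for subset in combinations(rates, j))


def closed_form_mean(rates, r):
    """mu_r^F from the closed form (21)."""
    n = len(rates)
    return sum(
        (-1) ** (j - (n - r - 1)) * comb(j - 1, n - r) * m_op(rates, j)
        for j in range(n - r + 1, n + 1)
    )


def k_operator(rates, r):
    """K_r^F from (18) with the coefficients c(j, r, N) of (19)."""
    n = len(rates)
    return sum(
        (-1) ** (j - 1) * comb(n - r + j, j - 1) * m_op(rates, n - r + j)
        for j in range(1, r + 1)
    )


def counting_identity_holds(n_max):
    """Check C(N, r-1) C(r-1, j-1) == c(j, r, N) C(N, N-r+j) for all N <= n_max."""
    return all(
        comb(n, r - 1) * comb(r - 1, j - 1)
        == comb(n - r + j, j - 1) * comb(n, n - r + j)
        for n in range(1, n_max + 1)
        for r in range(1, n + 1)
        for j in range(1, r + 1)
    )
```

For example, with the hypothetical rates `[0.5, 1.0, 2.0, 3.0]`, the difference `closed_form_mean(rates, r) - closed_form_mean(rates, r - 1)` should agree with `k_operator(rates, r)` for $r = 2, 3, 4$ up to floating-point error, which is what the recursion (20) asserts.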

### A-B General (N,r)-strategy

###### Example 2 (Example of (N,r)-strategy).

Suppose we have three paths with exponential delays with parameters  and . Define, for , and . The mean upload latency corresponding to a -allocation (replication) is