I Introduction
Recent years have witnessed the rapid popularization of unmanned aerial vehicles (UAVs) in both academia and industry [7, 3, 6]. UAVs are often equipped with sensing, communication, and computing capabilities and can generate massive amounts of data, which contain valuable information that can be used for improving decisions, making scientific discoveries, or supporting new artificial intelligence (AI) applications. To effectively utilize the information in these data, e.g., by using deep learning algorithms, a considerable amount of computing resources is often needed. However, due to its small payload, a single UAV often has limited computing capability and battery capacity for carrying out computation-intensive tasks.
To address the above issues, the existing solution is to offload data from the UAV to a ground station or remote cloud for processing. However, this solution suffers from issues such as long transmission latency and data loss, and is thus not suitable for delay-sensitive applications. A better solution is to offload the data to nearby UAVs and leverage their computing resources to perform data processing and analysis. Such a computing system, formed by UAVs connected via UAV-to-UAV communication links, is often known as UAV-based networked airborne computing (NAC) [12]. Compared with traditional cloud or static server-based computing systems, UAV-based NAC systems feature 1) high node mobility; 2) heterogeneous nodes with different computing, communication, and sensing capabilities; and 3) dynamic computing and communication resources. Because of these unique features, many existing distributed computing techniques that assume homogeneous and static computing nodes perform poorly in UAV-based NAC systems.
The time-varying communication and computing properties of UAV-based networked airborne computing systems can be modeled as uncertain stragglers that are slow in generating results or take a long time to transmit data. Topology changes or link/node failures can also be modeled as uncertain stragglers that fail to generate or return any results. To alleviate the effects of stragglers, coded distributed computing (CDC) [13] is a promising technique, which introduces computation redundancy into the system by exploiting coding theory. Currently, most works on CDC focus on the matrix multiplication problem or assume homogeneous distributed systems with static computing nodes [9, 18, 16, 14]. However, many data analysis algorithms, especially filtering or feature extraction techniques such as convolutional neural networks (CNNs), involve convolution operations. How to perform resilient distributed convolution over UAV-based NAC systems formed by heterogeneous moving UAVs has not been investigated, to the best of our knowledge. The state-of-the-art coded convolution strategy introduced in [5] was designed for homogeneous systems with static computing nodes, and it performs poorly over the UAV-based NAC system, as we will show in the simulation studies. Although some works have considered heterogeneous systems [14, 18, 16] and moving computing nodes [8, 17], these works are centered on the matrix multiplication problem, whose solution procedure differs considerably from that of the convolution problem. In this paper, we aim to fill the aforementioned research gap by making the following main contributions:

Dynamic coded distributed convolution strategy. We propose a dynamic coded distributed convolution strategy with privacy awareness for UAV-based NAC. It integrates coding theory with a novel task decomposition and allocation mechanism to dynamically assign tasks to the worker nodes based on their communication and computing performance. Unlike most existing CDC algorithms, which must fix the amount of computation redundancy before performing the task, our strategy introduces redundancy dynamically and only when needed. It can thus achieve high resilience with minimal redundancy. Furthermore, as our strategy encodes the input data, data privacy is protected to some extent.

Comprehensive simulation studies. We conducted comprehensive simulation studies to evaluate the performance of the proposed strategy, in comparison with the uncoded distributed convolution strategy and the state-of-the-art coded distributed convolution strategy. The results demonstrate the high efficiency and resilience of the proposed strategy in the face of uncertain stragglers.
In the rest of the paper, we first describe the problem to be solved in Sec. II, and then review the two existing distributed convolution strategies in Sec. III. The proposed dynamic coded distributed convolution strategy is then introduced in Sec. IV. In Sec. V, we present the simulation results on the performance of the proposed strategy, compared to existing distributed convolution strategies. Section VI finally concludes the paper.
II Problem Description
Consider a UAV-based NAC system formed by multiple UAVs with different computing and/or communication capabilities. Suppose one of the UAVs needs to perform a vector convolution task, i.e., to convolve a prestored vector with an input vector. To save energy and reduce computation time, it decides to offload the task to its neighbors within its communication range.
The problem considered in this paper is how the master node (the UAV that offloads the task) should decompose the task and distribute subtasks to surrounding worker nodes (the UAVs that execute the offloaded task collaboratively), such that the task completion time is minimized. The key technical challenges in solving this problem include: 1) As all worker nodes in the UAV-based NAC system can move, the network topology may change frequently due to nodes leaving and joining, and the communication quality of UAV-to-UAV links varies over time; 2) The computing resources available at a worker node are also time-varying, due to the completion of old tasks or the receipt of new tasks; 3) The input data may contain sensitive information, and directly sending the data to worker nodes may raise privacy concerns. The desired distributed computing scheme should thus 1) be resilient to network topology and resource changes, 2) be efficient in computing the task, and 3) protect data privacy to a certain extent.
III Review of Existing Solutions
In this section, we review two state-of-the-art distributed convolution strategies.
III-A Uncoded Convolution Strategy
In the uncoded convolution strategy introduced in [5], the master node first partitions both vectors and evenly into a set of subvectors of length , i.e., and , where is the total number of worker nodes. It then sends each pair of subvectors, and , to a different worker node for further processing, where and . Each worker node computes and returns the result back to the master node. After receiving results from all worker nodes, the master node finally aggregates the results with proper shifts to obtain the value of .
As this strategy requires the results from all worker nodes to obtain the final value, any delay will significantly degrade its performance, and any node/link failure will cause the whole task to fail. In addition, this strategy simply decomposes the workload evenly, and thus cannot address the node heterogeneity and dynamic features of UAV-based NAC. Moreover, it directly sends the input data to the worker nodes and hence may cause information leakage.
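The partition-and-shift aggregation used by the uncoded strategy can be illustrated with a minimal Python sketch. The function names and the variables `a`, `x`, and `s` (subvector length) are illustrative assumptions rather than the notation of [5], and the per-pair convolutions that would run on separate worker nodes are simply executed in a loop here.

```python
def convolve(u, v):
    """Direct (full) linear convolution of two sequences."""
    out = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def split(vec, s):
    """Split vec into consecutive subvectors of length s (s must divide len(vec))."""
    assert len(vec) % s == 0
    return [vec[i:i + s] for i in range(0, len(vec), s)]


def partitioned_convolve(a, x, s):
    """Convolve every pair of subvectors and shift-add the partial results."""
    out = [0] * (len(a) + len(x) - 1)
    for i, a_i in enumerate(split(a, s)):
        for j, x_j in enumerate(split(x, s)):
            partial = convolve(a_i, x_j)   # the subtask one worker would execute
            shift = (i + j) * s            # offset of this pair in the full result
            for k, val in enumerate(partial):
                out[shift + k] += val
    return out


a = [1, 2, 3, 4]
x = [5, 6, 7, 8, 9, 10]
assert partitioned_convolve(a, x, 2) == convolve(a, x)
```

Because every partial result contributes to the final sum, losing any single one of them makes the aggregation impossible, which is the fragility discussed above.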
III-B Traditional Coded Convolution Strategy
To improve the resilience of the uncoded strategy to the straggler effects, a coded strategy was developed in [5]. The key idea is to introduce redundancy into the computation by using the coding theory. In particular, similar to the uncoded strategy, the coded strategy first partitions both vectors and into small subvectors of equal length . The difference is that the subvector length can be any value larger than , and the subvectors of are encoded into subvectors with each having a length of , by using a (, ) MDS code. The following equation shows how a Vandermonde matrix, denoted as , can be used to encode the set of subvectors, , into a larger set of subvectors, :
With the encoded subvectors , the master node then sends each pair to a different worker node, where and . The worker nodes then convolve the received two subvectors and return the result back to the master node after the task is completed. The master node can decode to reconstruct after receiving any of the set by using the following equation:
where represent any distinct indices of , and is a submatrix of . Finally, the master node can reconstruct after obtaining .
It should be noted that although this strategy can effectively reduce the straggler effect, it has the following limitations: 1) It cannot address the node heterogeneity and dynamic features of UAV-based NAC; 2) It is only resilient to up to node failures; 3) In order to achieve high resilience, the introduced computation redundancy, indicated by , should be large; 4) It also directly sends the input data to the worker nodes and thus may cause information leakage.
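The MDS encoding and decoding steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [5]: the Vandermonde evaluation points 1..n, the exact rational arithmetic via `fractions.Fraction`, and all function and variable names are choices made for this example. It relies on the linearity of convolution: convolving a fixed vector with a coded subvector yields the same linear combination of the uncoded convolution results, so any k of the n returned results suffice.

```python
from fractions import Fraction


def convolve(u, v):
    """Direct (full) linear convolution of two sequences."""
    out = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def vandermonde(n, k):
    """n-by-k Vandermonde matrix with distinct evaluation points 1..n (assumed)."""
    return [[Fraction(p) ** j for j in range(k)] for p in range(1, n + 1)]


def encode(subvectors, G):
    """Each coded subvector is a linear combination of the k original subvectors."""
    length = len(subvectors[0])
    return [[sum(row[j] * subvectors[j][t] for j in range(len(subvectors)))
             for t in range(length)] for row in G]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination (A is k-by-k and invertible)."""
    k = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[k:] for row in M]


def decode(received, G):
    """received maps a coded index to its result; any k entries are enough."""
    k = len(G[0])
    idx = sorted(received)[:k]
    return solve([G[i] for i in idx], [received[i] for i in idx])


a = [1, -1, 2]                       # fixed vector held by every worker
parts = [[1, 2], [3, 4], [5, 6]]     # k = 3 uncoded subvectors
G = vandermonde(5, 3)                # (n, k) = (5, 3) code
coded = encode(parts, G)
# Workers 1 and 2 straggle; only the results for coded items 0, 3, 4 arrive.
results = {i: convolve(a, coded[i]) for i in (0, 3, 4)}
assert decode(results, G) == [convolve(a, p) for p in parts]
```

With n fixed in advance, the code tolerates at most n - k missing results, which is the resilience limit noted in the list above.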
IV Dynamic Coded Convolution Strategy with Privacy Awareness
In this section, we introduce a privacy-aware dynamic coded convolution strategy that addresses the unique features of UAV-based NAC systems. We first explain how to decompose and encode the task, and then describe how to allocate and distribute the decomposed subtasks.
IV-A Task Decomposing and Encoding
Instead of partitioning both vectors, we split only the input vector evenly into subvectors , where is the length of each subvector and can be any integer between 1 and . Then, instead of encoding , we encode the input subvectors into a larger set by applying a (, ) MDS code, where specifies the computation redundancy. This not only enhances the system resilience to uncertain stragglers, but also protects data privacy to a certain extent, as the original input data is not sent. These subvectors are then pushed into a stack, denoted as , at the master node. Whenever a worker node becomes available, we pop a subvector from the top of the stack and send it to this worker node to compute , where vector is prestored in all worker nodes. Once the master node receives convolution results from the worker nodes, it can decode , using a decoding procedure similar to the one described in Section III-B, and thereby reconstruct .
Unlike in the traditional coded convolution strategy, where the amount of computation redundancy is fixed once the length of the subvectors is specified, we introduce redundancy dynamically, based on the network condition and only when needed. In particular, we first set to a small value, e.g., , so that the initial stack contains only as many encoded input subvectors as are needed to obtain the final result. During task execution, whenever the stack becomes empty (or falls below a certain threshold) and the master node still has not received sufficient results for computing the final value, the master node pushes a new , generated by encoding , into the stack. Note that we can prestore a sufficiently large encoding matrix at the master node, take the first rows to initialize the stack, and take a new row whenever needed to generate a new during task execution. With this scheme, we can minimize the amount of introduced computation redundancy and maximize the system resilience to uncertain stragglers simultaneously.
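The on-demand redundancy logic can be sketched as a small control loop. This is a simplified illustration under stated assumptions: subtasks are dispatched one at a time, each subtask is represented only by its row index in the prestored encoding matrix, the refill rule is the "stack is empty" variant (not the threshold variant), and the outcome of each dispatch is supplied as a hypothetical boolean sequence instead of being produced by real worker nodes.

```python
def run_dynamic(k, worker_outcomes):
    """Simulate on-demand redundancy at the master node.

    k: number of results needed to decode the final value.
    worker_outcomes: booleans in dispatch order; True means the result of
        that subtask came back, False means it was lost or never returned.
    Returns (rows_generated, received_rows). If the outcome sequence ends
    before k results arrive, received_rows has fewer than k entries.
    """
    stack = list(range(k - 1, -1, -1))   # rows 0..k-1, with row 0 on top
    next_row = k                          # next unused row of the encoding matrix
    received = []
    for ok in worker_outcomes:
        if not stack:                     # empty stack and still short of k results
            stack.append(next_row)        # encode one extra subvector on demand
            next_row += 1
        row = stack.pop()                 # subtask sent to the available worker
        if ok:
            received.append(row)
            if len(received) == k:        # enough results to decode
                break
    return next_row, received


# No stragglers: exactly k coded subvectors are ever generated.
assert run_dynamic(3, [True, True, True]) == (3, [0, 1, 2])
# Two lost results: two extra rows are generated, and only then.
assert run_dynamic(3, [True, False, True, False, True]) == (5, [0, 2, 4])
```

In this sketch the number of generated rows minus k equals the number of lost results, so redundancy grows only as fast as failures occur.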
IV-B Task Allocation
To determine which worker node the master node should send the next subvector (popped from the stack ) to and when to send this subvector, we borrow the idea introduced in [8]. The key idea is to send a subvector to each worker node at the beginning. The master node then determines the best time to send the next subvector to a worker node based on an estimate of the time required for this worker node to complete its current task and send back the result.
In particular, let be the time interval between sending two consecutive subvectors, and , to the worker node from the master node. As illustrated in Fig. 1, in order to maximally reduce the computation delay, the desired will minimize the idle time at the worker node while not overloading it. That is, ideally, the worker node should receive immediately after it completes the previous task, i.e., computing . To determine , the key is thus to estimate the time required for the worker node to compute , denoted as . Here, we apply the method introduced in [8] to estimate the expected time required for the worker node to compute . In particular, the expected computation time can be estimated by the following equations:
(1) 
(2) 
(3) 
where is the time when the worker node finishes computing , is the time when the master node receives the computation result of from the worker node , and is the time when the master node sends subvector to the worker node . is the accumulated idle time of worker node . is the number of results sent back from the worker node , (bytes) is the size of the vector and (bytes) is the size of the result of . Lastly, is the round-trip time of sending to the worker node and receiving the computed result, which can be estimated at the master node by exchanging acknowledgement (ACK) packets [15] or based on the timestamps returned by the worker node .
Given , we then determine using the following equation:
(4) 
Algorithm 1 summarizes the complete procedure of the proposed dynamic coded convolution strategy.
V Simulation Studies
In this section, we conduct simulations to evaluate the performance of the proposed strategy, in comparison with the uncoded convolution and traditional coded convolution schemes. All simulations are performed on a PC with 16 GB of RAM and an Intel Core i5-4590 processor.
V-A System Models
We use the following system models to simulate the movement of UAVs, as well as how they compute and how they communicate with each other.
V-A1 Mobility Model
A simple 2-dimensional (2D) point-mass mobility model is adopted to simulate the movement of each UAV. In particular, let denote the location of UAV at time . Then its location at the next time point is given by the following equation:
where is the velocity of UAV at time , where .
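A point-mass update of this kind can be sketched in a few lines of Python. This is a hedged illustration of the standard kinematic update (new position equals old position plus velocity times the time step); the function names, the time step `dt`, the speed bound `v_max`, and the uniform per-axis velocity sampling are assumptions made for this example and are not taken from the paper.

```python
import random


def step(position, velocity, dt=1.0):
    """Advance a 2D point mass by one time step of length dt."""
    x, y = position
    vx, vy = velocity
    return (x + vx * dt, y + vy * dt)


def simulate(start, v_max, steps, dt=1.0, seed=0):
    """Trajectory of one UAV whose velocity is resampled at every step.

    Each velocity component is drawn uniformly from [-v_max, v_max]
    (an assumed sampling rule for illustration).
    """
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        velocity = (rng.uniform(-v_max, v_max), rng.uniform(-v_max, v_max))
        path.append(step(path[-1], velocity, dt))
    return path


assert step((0.0, 0.0), (1.0, -2.0), dt=0.5) == (0.5, -1.0)
trajectory = simulate(start=(0.0, 0.0), v_max=5.0, steps=10)
assert len(trajectory) == 11
```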
V-A2 Computing Model
To simulate the time required by each UAV to convolve two vectors, say and , we adopt the following shifted exponential distribution model commonly used in the literature [18]:
where is the time taken by UAV to compute . and are shift and straggling parameters, respectively, which characterize the computing power of UAV . represents the computation load required for computing . To derive this value, we assume that the Fast Fourier Transform (FFT) is used by each UAV to calculate the convolution of two vectors with arbitrary lengths. Hence, we have [4, 1]:
where is a constant independent of the lengths of the vectors.
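One common way to sample from a shifted exponential runtime model is sketched below. It is an illustration under stated assumptions: the runtime is taken as a deterministic shift proportional to the load plus an exponential tail whose rate is inversely proportional to the load, and the FFT load is approximated as proportional to n log2 n with n the output length. The exact expressions used in this paper are not reproduced here, and the names `alpha`, `mu`, `c`, and `fft_load` are hypothetical.

```python
import math
import random


def fft_load(len_a, len_x, c=1.0):
    """Assumed FFT-based convolution cost: c * n * log2(n), n = output length."""
    n = len_a + len_x - 1
    if n < 2:
        raise ValueError("vectors are too short for this cost model")
    return c * n * math.log2(n)


def sample_compute_time(load, alpha, mu, rng=random):
    """Draw one runtime from a shifted exponential distribution.

    alpha: shift parameter (minimum time per unit of load).
    mu: straggling parameter (larger values mean a lighter tail).
    """
    shift = alpha * load
    tail = rng.expovariate(mu / load)   # exponential with rate mu / load
    return shift + tail


load = fft_load(4, 5)                   # n = 8, so load = 8 * 3 = 24
assert load == 24.0
rng = random.Random(0)
samples = [sample_compute_time(load, alpha=0.01, mu=2.0, rng=rng) for _ in range(100)]
assert all(t >= 0.01 * load for t in samples)
```

Under this form the runtime never falls below the shift, and a smaller straggling parameter produces longer and more variable tails.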
V-A3 Communication Model
We assume that the communication between any two UAV nodes is achieved through a directional antenna, and the antennas are always aligned during the movement [2, 10]. The communication time required for the master node to transmit to (or receive from) a worker node a dataset containing numbers at time can be approximated by the following equation [11]:
where is the average size of the numbers in the dataset. is the data rate (bits/sec) given by:
where (Hz) is the communication bandwidth between the master node and worker node and is the noise power (W), both of which are assumed to be constant. is the signal power (W) that depends on the distance between the master node and the worker node at time . (dBm) is the transmitting power, is the sum of the transmitting and receiving gains, is the Gaussian noise and is the wavelength.
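A distance-dependent link model of this type can be sketched as follows. This is an assumption-laden illustration, not the exact model of [11]: it uses the Shannon capacity formula for the data rate and the free-space (Friis) formula for the received power, omits the random Gaussian term mentioned above, treats the antenna gain as a value in dB, and measures the size of each number in bits. All function and parameter names are hypothetical.

```python
import math


def received_power_w(pt_dbm, gain_db, wavelength_m, distance_m):
    """Free-space (Friis) received power, converted from dBm to watts."""
    path_loss_db = 20 * math.log10(4 * math.pi * distance_m / wavelength_m)
    pr_dbm = pt_dbm + gain_db - path_loss_db
    return 10 ** ((pr_dbm - 30) / 10)


def data_rate_bps(bandwidth_hz, signal_w, noise_w):
    """Shannon capacity in bits per second."""
    return bandwidth_hz * math.log2(1 + signal_w / noise_w)


def comm_time_s(count, bits_per_number, rate_bps):
    """Time to transfer `count` numbers of the given average size."""
    return count * bits_per_number / rate_bps


# A unit signal-to-noise ratio gives a rate equal to the bandwidth.
assert data_rate_bps(1e6, 1.0, 1.0) == 1e6

# The received power drops, and the transfer time grows, with distance.
near = received_power_w(pt_dbm=20, gain_db=10, wavelength_m=0.06, distance_m=100)
far = received_power_w(pt_dbm=20, gain_db=10, wavelength_m=0.06, distance_m=1000)
assert near > far
t_near = comm_time_s(1000, 32, data_rate_bps(1e6, near, 1e-12))
t_far = comm_time_s(1000, 32, data_rate_bps(1e6, far, 1e-12))
assert t_near < t_far
```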
V-B Experiment Setup
We consider the following four computation scenarios:

Scenario 1: , , .

Scenario 2: , , .

Scenario 3: , , .

Scenario 4: , , .
In all scenarios, the straggling parameter in the computation model is randomly sampled from the range , and the shift parameter is set to . To simulate the straggler effect, we consider two cases: 1) stragglers caused by long communication latency and/or computation delay; and 2) stragglers caused by system failures or moving out of the master node’s communication range. To model the first case, we manually make the runtime of the stragglers to be times the simulated runtime returned from its computation model. To model the second case, we make the stragglers stop returning any results to the master node.
For the configuration of the mobility model, the initial position of each UAV is randomly sampled from the range . The velocity of each UAV is randomly sampled from the range m/s once every second. Lastly, the parameters in the communication model are configured as , , and . It is worth noting that our method does not require any knowledge of the mobility, computation, or communication models.
V-C Simulation Results
V-C1 Impact of Parameter
We first study the impact of the key parameter in our method, i.e., the length of the input subvectors , by evaluating the performance of our method at different values of . To reduce uncertainty, each experiment in our simulation study is repeated for times and the mean execution times are recorded. As shown in Fig. 2, in all four scenarios, the execution time of our method first decreases as increases, and then increases after reaches a certain value. The best is thus the one that leads to the minimum mean execution time, which varies in different scenarios. Fig. 2 also reveals that with the increase of the problem size (characterized by and ) or the decrease of the amount of computing resources available (indicated by ), the time required for conducting the computation task increases.
In subsequent experiments, we configure our method in each scenario with the best-performing , which is marked with a big dot in Fig. 2. For the choice of , the length of the subvectors in the traditional coded convolution strategy, we follow the selection guideline provided in [5] and set as the integer from range that maximizes , where is given by:
V-C2 Comparison Studies
We first study the case when stragglers with long communication/computation delay are present. Fig. 3 compares the performance of different strategies when 50% of the worker nodes randomly selected are such stragglers. It shows that our method achieves the highest efficiency in all scenarios and the uncoded convolution strategy is the least efficient.
To better understand the three strategies, we further conduct a stress test by varying the percentage of stragglers with long delays. Fig. 4 shows the results of the stress test for different strategies in Scenario 4. As we can see, the performance of all three strategies degrades as the straggler ratio increases, and our method achieves the best performance in all cases. It can also be observed that the uncoded strategy is the most sensitive to stragglers, as indicated by the immediate increase of its execution time when the straggler ratio becomes nonzero. In contrast, the execution time of our method does not increase much until the straggler ratio exceeds around 83%, demonstrating its high resilience to uncertain stragglers.
Lastly, we investigate the case when node failures or node departures can happen. As the master node cannot receive any results from such stragglers, the results received from other worker nodes may not be sufficient for the master node to reconstruct the convolution , leading to task failures. Therefore, in this study, we measure the task success rate (ratio of successful runs) of each strategy in the presence of this type of straggler. Fig. 5 shows the success rates of different strategies, where each strategy runs 2000 times in each scenario. In each simulation run, up to worker nodes can fail. The result demonstrates that our method is highly resilient to node failures. It is worth noting that our method can successfully complete the task as long as at least one worker node is alive, which can be the master node itself. Moreover, even if the master node loses connection with all worker nodes, the task will resume as long as a new node joins later.
VI Conclusion
This paper introduces an efficient, resilient, and privacy-aware distributed computing strategy for vector convolution tasks in heterogeneous and mobile UAV-based NAC systems. It combines coding theory with a novel task decomposition and allocation mechanism to achieve high resilience to uncertain stragglers with minimal computation redundancy. As the input data is encoded, it also provides some protection for data privacy. The simulation results show that the proposed strategy outperforms existing solutions in both efficiency and resilience, especially when a large number of high-latency computing nodes are present or when node departures/failures happen frequently. Moreover, our method adapts to the dynamic network changes in UAV-based NAC systems and can complete the task as long as at least one worker node is alive, which can be the master node itself.
In the future, we will design intelligent strategies to automate the configuration of the key parameter , and extend the proposed strategy to real CNN-based applications. We will also develop a hardware testbed for UAV-based NAC and conduct flight tests to evaluate the performance of the proposed strategy.
Acknowledgment
We would like to thank the National Science Foundation (NSF) for supporting this work under Grants CI-1953048 and CAREER-2048266.
References
 [1] (1999) The Fourier transform and its applications, 3rd ed. McGraw-Hill, New York.
 [2] (2017) Long-range and broadband aerial communication using directional antennas (ACDA): design and implementation. IEEE Transactions on Vehicular Technology 66 (12), pp. 10793–10805.
 [3] (2009) Developing a UAV-based rapid mapping system for emergency response. In Unmanned Systems Technology XI, Vol. 7332, pp. 75–86.
 [4] (1969) The fast Fourier transform and its applications. IEEE Transactions on Education 12 (1), pp. 27–34.
 [5] (2017) Coded convolution for parallel and distributed computing within a deadline. In 2017 IEEE International Symposium on Information Theory (ISIT), pp. 2403–2407.
 [6] (2016) Visual monitoring of civil infrastructure systems via camera-equipped unmanned aerial vehicles (UAVs): a review of related works. Visualization in Engineering 4 (1), pp. 1–8.
 [7] (2013) Processing and assessment of spectrometric, stereoscopic imagery collected using a lightweight UAV spectral camera for precision agriculture. Remote Sensing 5 (10), pp. 5006–5039.
 [8] (2018-09) Dynamic heterogeneity-aware coded cooperative computation at the edge. In Proc. of ICNP 2018, pp. 23–33.
 [9] (2017) Coded computation for multicore setups. In Proc. of IEEE ISIT 2017, pp. 2413–2417.
 [10] (2019) Design and implementation of aerial communication using directional antennas: learning control in unknown communication environments. IET Control Theory & Applications 13 (17), pp. 2906–2916.
 [11] (2019) Learning and uncertainty-exploited directional antenna control for robust long-distance and broadband aerial communication. IEEE Transactions on Vehicular Technology 69 (1), pp. 593–606.
 [12] (2019) Toward UAV-based airborne computing. IEEE Wireless Communications 26 (6), pp. 172–179.
 [13] (2021) A comprehensive survey on coded distributed computing: fundamentals, challenges, and networking applications. IEEE Communications Surveys & Tutorials.
 [14] (2017) Coded computation over heterogeneous clusters. In Proc. of IEEE ISIT 2017.
 [15] (2006) Observations on round-trip times of TCP connections. Simulation Series 38 (3), pp. 347.
 [16] (2019) Coding for heterogeneous UAV-based networked airborne computing. In 2019 IEEE Globecom Workshops (GC Wkshps), pp. 1–6.
 [17] (2021) Multi-agent reinforcement learning based coded computation for mobile ad hoc computing. arXiv preprint arXiv:2104.07539.
 [18] (2021) On batch-processing based coded computing for heterogeneous distributed computing systems. IEEE Transactions on Network Science and Engineering 8 (3), pp. 2438–2454.