Coded Distributed Computing Schemes with Smaller Numbers of Input Files and Output Functions
Li et al. (IEEE Transactions on Information Theory, 64, 109-128, 2018) introduced coded distributed computing (CDC) to reduce the communication load in general distributed computing frameworks such as MapReduce, Hadoop and Spark. They also proposed CDC schemes achieving the minimal communication load for a fixed computation load. However, these schemes require exponentially large numbers of input files and output functions when the number of computing nodes gets large. In this paper, we give a construction of CDC schemes based on placement delivery arrays (PDAs), which were introduced to study coded caching schemes. Consequently, based on known results on PDAs, several CDC schemes can be obtained. Most importantly, the minimal number of output functions in all the new schemes is only a factor of the number of computing nodes, and the minimal number of input files in our new schemes is much smaller than that of the CDC schemes derived by Li et al.