Optimal Data Placement for Data-Sharing Scientific Workflows in Heterogeneous Edge-Cloud Computing Environments
The heterogeneous edge-cloud computing paradigm offers a better basis for deploying scientific workflows than traditional distributed computing or cloud computing environments. Because scientific datasets differ in size and some of them must be kept private, finding a data placement strategy that minimizes both data transmission time and placement cost remains a difficult problem. To address this issue, this paper combines the advantages of edge and cloud computing to construct a data placement model that balances data transfer time and data placement cost using intelligent computation. The most difficult research challenge the model addresses is accounting for the many constraints of this hybrid computing environment, including datasets shared within individual workflows and among multiple workflows across various geographical regions. Based on the constructed model, the study proposes a new data placement strategy named DE-DPSO-DPS, which uses a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO-DPA) to distribute these scientific datasets. The strategy not only considers characteristics such as the number and storage capacity of edge micro-datacenters, the bandwidth between different datacenters, and the proportion of private datasets, but also analyzes the performance of the algorithm during workflow execution. Comprehensive experiments in simulated heterogeneous edge-cloud computing environments demonstrate that the data placement strategy effectively reduces data transmission time and placement cost compared with traditional strategies for data-sharing scientific workflows.
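To make the trade-off concrete, the following is a minimal sketch of how a candidate placement might be scored. It is not the paper's model: the names `Datacenter`, `evaluate_placement`, and `weight`, the per-GB storage price, and the simple weighted sum of transfer time and placement cost are all assumptions introduced for illustration. The sketch only reflects the constraints the abstract names: private datasets pinned to a datacenter, limited storage capacity, and bandwidth between datacenters.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    capacity: float   # storage capacity in GB (assumed unit); float("inf") for a cloud
    unit_cost: float  # hypothetical storage price per GB


def evaluate_placement(placement, sizes, pinned, datacenters, bandwidth, tasks, weight=0.5):
    """Score one candidate placement; return None if it violates a constraint.

    placement: dataset name -> datacenter index
    sizes:     dataset name -> size in GB
    pinned:    private dataset name -> datacenter index it must stay in
    bandwidth: bandwidth[i][j] in GB/s between datacenters i and j
    tasks:     list of (datacenter index running the task, input dataset names)
    weight:    assumed trade-off between transfer time and placement cost
    """
    # Private datasets may not leave their designated datacenter.
    for dataset, dc in pinned.items():
        if placement[dataset] != dc:
            return None

    # No datacenter may store more than its capacity.
    used = [0.0] * len(datacenters)
    for dataset, dc in placement.items():
        used[dc] += sizes[dataset]
    if any(u > dc.capacity for u, dc in zip(used, datacenters)):
        return None

    # Transfer time: every input stored elsewhere is moved to the task's datacenter.
    transfer_time = 0.0
    for task_dc, inputs in tasks:
        for dataset in inputs:
            src = placement[dataset]
            if src != task_dc:
                transfer_time += sizes[dataset] / bandwidth[src][task_dc]

    # Placement cost: size times the storage price of the hosting datacenter.
    placement_cost = sum(
        sizes[dataset] * datacenters[dc].unit_cost for dataset, dc in placement.items()
    )
    return weight * transfer_time + (1 - weight) * placement_cost
```

An optimizer such as a discrete particle swarm could call a function like this as its fitness measure, treating each particle as one `placement` mapping and discarding or penalizing candidates that return `None`.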