Wide-Area Data Analytics
We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations. In some cases, the datasets are collected from multiple locations, such as sensors (e.g., mobile phones and street cameras) spread throughout a geographic region. The data may need to be analyzed close to where they are produced, particularly when the applications require low latency, high, low cost, user privacy, and regulatory constraints. In other cases, large datasets are distributed across public clouds, private clouds, or edge-cloud computing sites with more plentiful computation, storage, bandwidth, and energy resources. Often, some portion of the analysis may take place on the end-host or edge cloud (to respect user privacy and reduce the volume of data) while relying on remote clouds to complete the analysis (to leverage greater computation and storage resources). Wide-area data analytics is any analysis of data that is generated by, or stored at, geographically dispersed entities. Over the past few years, several parts of the computer science research community have started to explore effective ways to analyze data spread over multiple locations. In particular, several areas of "systems" research - including databases, distributed systems, computer networking, and security and privacy - have delved into these topics. These research subcommunities often focus on different aspects of the problem, consider different motivating applications and use cases, and design and evaluate their solutions differently. To address these challenges the Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019. This report summarizes the challenges discussed and the conclusions generated at the workshop.
READ FULL TEXT