Making a Case for MLNs for Data-Driven Analysis: Modeling, Efficiency, and Versatility
Data sets from real-world applications are characterized by entities of different types, which are defined by multiple features and connected via varied types of relationships. A critical challenge for these data sets is developing models and computations that support flexible analysis, i.e., the ability to compute varied analysis objectives efficiently. To address this problem, we make a case for modeling such complex data sets as multilayer networks (MLNs), and argue that MLNs provide a more informative model than the currently popular simple and attribute graphs. By analyzing communities and hubs on homogeneous and heterogeneous MLNs, we demonstrate the flexibility of the chosen model. We also show that, compared to current analysis approaches, a network decoupling-based analysis of MLNs is more efficient and preserves both structure and result semantics. We use three diverse data sets to showcase the effectiveness of modeling them as MLNs and analyzing them using the decoupling-based approach. We use both homogeneous and heterogeneous MLNs for modeling, and community and hub computations for analysis. The data sets are from US commercial airlines and IMDb, a large international movie data set. Our experimental analysis validates the modeling, the efficiency of computation, and the versatility of the approach. Correctness of results is verified using independently available ground truth. For the data sets used, efficiency improvement is in the range of 64
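To make the decoupling idea concrete, the following is a minimal sketch in Python, not the method used in the paper. It assumes a homogeneous MLN stored as a dictionary that maps a layer name to an undirected edge list over a shared node set. The hub rule (degree above the layer average), the intersection-based composition, the function names, and the toy airline layers are all illustrative assumptions introduced here.

```python
from collections import defaultdict


def layer_degrees(edges):
    """Return a node -> degree map for one undirected layer."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def layer_hubs(edges):
    """Return nodes whose degree exceeds the layer's average degree.

    This threshold is an illustrative hub definition, assumed for the sketch.
    """
    degree = layer_degrees(edges)
    if not degree:
        return set()
    average = sum(degree.values()) / len(degree)
    return {node for node, d in degree.items() if d > average}


def decoupled_hubs(layers):
    """Analyze each layer independently, then compose the partial results.

    Composition here is a plain set intersection (a node must be a hub in
    every layer); this is an assumed composition rule, chosen for simplicity.
    """
    per_layer = {name: layer_hubs(edges) for name, edges in layers.items()}
    combined = set.intersection(*per_layer.values()) if per_layer else set()
    return per_layer, combined


# Hypothetical two-layer MLN: the same airports, one layer per airline.
layers = {
    "airline_a": [("DFW", "ORD"), ("DFW", "LAX"), ("DFW", "JFK"), ("ORD", "JFK")],
    "airline_b": [("DFW", "ATL"), ("DFW", "LAX"), ("ATL", "JFK"), ("ATL", "ORD")],
}

per_layer, combined = decoupled_hubs(layers)
print(per_layer)
print(combined)
```

Because each layer is processed on its own, the per-layer results can be computed once (and in parallel) and then reused for any combination of layers, which is the kind of saving a decoupling-based approach aims for.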