Graph Summarization

04/30/2020 · by Angela Bonifati et al. · Foundation for Research & Technology-Hellas (FORTH)

The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a series of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. Our chapter pinpoints the main graph summarization methods, with particular emphasis on the most recent approaches and novel research trends on this topic, not yet covered by previous surveys.


1 Preliminaries

As the works we will discuss operate with different graph models, we start with a brief overview of useful notions.

Let $O$ be a set of objects and $V \subseteq O$, a finite set of vertices. An undirected graph (UG) is a structure $G = (V, E)$, s.t. $E \subseteq \{\{v_1, v_2\} \mid v_1, v_2 \in V\}$ is a finite set of edges. If $E \subseteq V \times V$, $G$ is a directed graph (DG). We call both base graphs. Base graphs s.t. multiple edges can connect the same two vertices are called multi-graphs. Base graphs s.t. an attribute list can be attached to each node/edge are called attributed graphs (AG).

Definition 1 (Knowledge Graphs)

Given a set of RDF triples $T \subseteq O \times O \times O$, a knowledge graph (KG) is a multi-graph $G = (V, E)$ s.t. $V = \{s \mid (s, p, o) \in T\} \cup \{o \mid (s, p, o) \in T\}$ and $E = \{(s, o) \mid (s, p, o) \in T\}$. A special case is that of geographical knowledge graphs (GG), where vertices can be mapped to meaningful geographic identifiers Yan (2019).

Let $L$ be a finite set of labels and $\lambda : O \rightarrow 2^{L}$, a function assigning a finite set of labels to each object. A labeled graph (LG) is a structure $G = (V, E, \lambda)$. Further extending this class to directed, attributed multi-graphs, we obtain the most expressive type of static graphs, i.e., property graphs, defined below (see also Bonifati et al. (2018)).

Definition 2 (Property Graphs)

Let $K$ be a set of property keys and $N$, a set of values. A property graph (PG) is a structure $G = (V, E, \eta, \nu)$, where $V$ is a finite set of vertices, $E$ is a finite set of edges, $\eta : E \rightarrow V \times V$ is a function assigning a pair of vertices to each edge, and $\nu : (V \cup E) \times K \rightharpoonup N$ is a partial function assigning property values to objects.
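To make the formalism concrete, here is a minimal Python sketch of a property graph store following Definition 2; all names (PropertyGraph, endpoints, props) are illustrative and not part of any cited system.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple

# Minimal sketch of a property graph per Definition 2 (names are illustrative).
@dataclass
class PropertyGraph:
    vertices: Set[str] = field(default_factory=set)                          # V
    edges: Set[str] = field(default_factory=set)                             # E
    endpoints: Dict[str, Tuple[str, str]] = field(default_factory=dict)      # eta: E -> V x V
    props: Dict[Tuple[str, str], Any] = field(default_factory=dict)          # nu: (V u E) x K -> N (partial)

    def add_edge(self, e: str, src: str, dst: str) -> None:
        self.vertices.update({src, dst})
        self.edges.add(e)
        self.endpoints[e] = (src, dst)

    def set_prop(self, obj: str, key: str, value: Any) -> None:
        self.props[(obj, key)] = value

# Usage: a two-node graph with one edge and a few properties.
g = PropertyGraph()
g.add_edge("e1", "alice", "bob")
g.set_prop("alice", "age", 34)
g.set_prop("e1", "since", 2019)
```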

In the dynamic setting, the corresponding graph type for representing data streams Henzinger et al. (1998) is given by streaming graphs (SG) Latapy et al. (2018).

Definition 3 (Streaming Graphs)

Let $T$ be a set of timestamps. A streaming graph is a structure $G = (T, V, W, E)$, where $W \subseteq T \times V$ is a set of temporal nodes and $E \subseteq T \times V \times V$ is a set of links, s.t. $(t, u, v) \in E$ implies $(t, u) \in W$ and $(t, v) \in W$.
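The following toy snippet, with illustrative names only, mirrors Definition 3: a link can only be added at a timestamp at which both of its endpoints exist as temporal nodes.

```python
from typing import Set, Tuple

Timestamp, Node = int, str
W: Set[Tuple[Timestamp, Node]] = set()           # temporal nodes
E: Set[Tuple[Timestamp, Node, Node]] = set()     # links

def add_link(t: Timestamp, u: Node, v: Node) -> None:
    # A link (t, u, v) requires both endpoints to be present at time t.
    W.add((t, u))
    W.add((t, v))
    E.add((t, u, v))

add_link(1, "a", "b")
add_link(2, "b", "c")
assert all((t, u) in W and (t, v) in W for (t, u, v) in E)
```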

2 Recent Summarization Techniques

We outline novel summarization techniques proposed in the literature. In Section 2.1, we discuss graph clustering methods, in Section 2.2, we present recent methods for statistical summarization, whereas in Section 2.3, we discuss goal-driven summarization approaches for streaming and property graphs.

2.1 Graph Clustering

Works in this category target graph clustering-based approaches. Graph clustering is one of the key techniques used in exploratory data analysis, as it allows one to identify components that exhibit similar properties. In general, a graph cluster consists of nodes that are densely connected within the group and sparsely connected with outside ones. In order to understand the structure of large-scale graphs, it is important not only to compute such clusters, but also to identify the roles that the various nodes play within the graph. As such, nodes that bridge different clusters are distinguished as hubs and are considered to correspond to highly influential entities, while those that neither belong to a cluster nor act as hubs are called outliers and are treated as noise. Such a differentiation is important when mining complex networks; for example, in web graphs, hubs link related pages, while outliers can correspond to spam.
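As an illustration of this role assignment (not any specific paper's algorithm), the sketch below takes a precomputed clustering and labels the remaining nodes as hubs or outliers depending on how many clusters they touch; the names and the threshold are assumptions.

```python
import networkx as nx

# Illustrative sketch: given a clustering (node -> cluster id; noise nodes absent),
# mark hubs (nodes bridging >= 2 clusters) and outliers (nodes touching at most one).
def assign_roles(G: nx.Graph, clustering: dict) -> dict:
    roles = {}
    for v in G.nodes():
        if v in clustering:
            roles[v] = f"cluster-{clustering[v]}"
            continue
        neighbour_clusters = {clustering[u] for u in G.neighbors(v) if u in clustering}
        roles[v] = "hub" if len(neighbour_clusters) >= 2 else "outlier"
    return roles

G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("x", "y"), ("y", "z"), ("z", "x"),
              ("h", "a"), ("h", "x"),   # h bridges the two triangles -> hub
              ("o", "a")])              # o touches only one cluster -> outlier
clusters = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
print(assign_roles(G, clusters))
```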

State of the art approaches to graph summarization through clustering roughly fall into two categories: structural and attribute-based. The following techniques operate on simple, undirected graphs, with the exception of the attribute-based ones, in which a list of feature attributes is also associated with each node.

Structural Clustering.

Structural Clustering takes into account the graph’s connectivity and uses standard algorithms based on partitioning Wang et al. (2014) and on computing modularity, density, or custom measures, such as the reliable structural similarity introduced in Qiu et al. (2019) for clustering probabilistic graphs. Other approaches rely on identifying sets of k-median (respectively, k-center) nodes that maximize the average (respectively, minimum) connection probability between each node and its cluster’s center Williamson and Shmoys (2011).

Structural clustering also often employs spectral methods, such as Laplacian eigenmaps, which map nodes with higher similarity closer together, based on a given symmetric, non-negative metric. However, a drawback of such techniques is that they are vulnerable to noise and outliers. To address this, recent techniques based on removing low-density nodes have been proposed. For instance, the work in Kim et al. (2020) shows how to use a sparse regularization model, which reconstructs node density from a similarity matrix, to prune out the noise and detect clusters in the process.
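A minimal sketch of the Laplacian-eigenmap pipeline described above, assuming a dense similarity matrix and running k-means on the bottom eigenvectors; noise-pruning steps such as the density reconstruction of Kim et al. (2020) are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of spectral clustering via Laplacian eigenmaps: embed nodes using the
# bottom eigenvectors of the normalized Laplacian, then run k-means on the embedding.
def spectral_clusters(A: np.ndarray, k: int) -> np.ndarray:
    d = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L_sym)                        # eigenvectors, ascending eigenvalues
    embedding = vecs[:, :k]                                # Laplacian eigenmap
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)

# Two obvious blocks in a block-diagonal similarity matrix.
A = np.block([[np.ones((3, 3)), np.zeros((3, 3))],
              [np.zeros((3, 3)), np.ones((3, 3))]])
np.fill_diagonal(A, 0)
print(spectral_clusters(A, k=2))
```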

Other structural methods factorize the node adjacency matrix to compute clusters (Cao et al. (2015),Nikolentzos et al. (2017)), low-dimensional node embeddings (Cao et al. (2016), Wang et al. (2016), Ye et al. (2018)), or run random walks to learn such embeddings by maximizing neighbourhood probabilities (Perozzi et al. (2014), Grover and Leskovec (2016)). In Yan et al. (2019), a color-based random walk mechanism is presented, which allows identifying interactions between the seed nodes of local clusters.
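The walk-generation step behind such random-walk embeddings can be sketched as follows; the skip-gram training that maximizes neighbourhood probabilities is left out, and the parameters are illustrative defaults.

```python
import random
import networkx as nx

# Sketch of the walk-generation step in DeepWalk/node2vec-style embedding:
# sample truncated random walks per node; the walks would then be fed to a
# skip-gram model (omitted here) to learn node embeddings.
def random_walks(G: nx.Graph, walks_per_node: int = 5, walk_length: int = 10, seed: int = 0):
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes():
            walk = [start]
            while len(walk) < walk_length:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

G = nx.karate_club_graph()
print(random_walks(G)[0])
```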

The recent work in Han et al. (2019) extends k-median/k-center techniques to uncertain graphs and proposes several novel algorithms with provable performance bounds. In Wen et al. (2019), an index-based algorithm is introduced for the structural clustering of undirected, unweighted graphs. The proposed methodology is based on the maintenance of structural similarity for each pair of adjacent vertices and is capable of handling updates. The work in Liu and Barahona (2020) addresses the problem of structural clustering by using multi-scale community detection techniques based on continuous k-nearest neighbours (CkNN) similarity graphs and Markov stability quality measures. Community detection techniques are used in Kuo et al. (2017) to compute graph clusters, while also taking into account node relevance with respect to given queries. To this end, the authors introduce the query-oriented normalized cut and cluster balance metrics and combine these to compute the output clustering.

The work of Zhao et al. (2019) frames graph clustering as an unconstrained convex optimization problem and proposes a technique to reorganize datasets into so-called triangle lassos, connecting similar nodes. An optimized, iterative version of the SCAN algorithm, anySCAN, is presented in Mai et al. (2019), with the purpose of performing parallel clustering computations on large, static and dynamic graph datasets. In Zhan et al. (2019), the machine learning technique of multi-view clustering is used to combine feature information from different graph views. These are then integrated into a global graph, whose structure is tuned through a specialized objective function ensuring that the number of components corresponds to that of clusters. In order to refine clustering results, game-theoretical methods, based on consensus computation, have also been proposed. For example, in Hamidi et al. (2019), multiple graph clusterings are integrated and outlier nodes are obtained through majority voting.

Within the structural clustering category, one can distinguish between quotient and non-quotient approaches. On the one hand, quotient methods are based on the notion of graph node "equivalence" and produce summaries by assigning a representative to each such equivalence class. A recent work in this area is given by Goasdoué et al. (2019), in which compact summaries of heterogeneous RDF graphs are built for visualization purposes. The approach ignores the schema triples, considering only type and data ones, and relies on the concept of property cliques, which encode transitive relations of edge co-occurrence on graph nodes. The proposed algorithms run in time linear in the size of the input graph and are incremental. In Shin et al. (2019), the authors present a fast summarization algorithm for graphs that are too large to fit in main memory, based on dividing these into smaller subgraphs, to be processed in parallel. Apart from the compact representation, this summarization also produces edge corrections, allowing one to restore the original graph, exactly or within given error bounds.
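The core quotient idea, grouping "equivalent" nodes into supernodes and aggregating their edges into counted superedges, can be sketched as below; this is the generic construction only, not the property-clique or SWeG algorithms cited above.

```python
from collections import defaultdict
import networkx as nx

# Illustrative quotient summary: nodes sharing the same label set form one
# supernode; superedges carry the number of original edges they represent.
def quotient_summary(G: nx.Graph, labels: dict) -> nx.Graph:
    rep = {v: tuple(sorted(labels.get(v, ()))) for v in G.nodes()}   # equivalence class per node
    counts = defaultdict(int)
    for u, v in G.edges():
        counts[tuple(sorted((rep[u], rep[v])))] += 1
    S = nx.Graph()
    for (cu, cv), c in counts.items():
        S.add_edge(cu, cv, count=c)
    return S

G = nx.Graph([(1, 2), (2, 3), (3, 4), (1, 4)])
labels = {1: ["Person"], 2: ["Paper"], 3: ["Person"], 4: ["Paper"]}
print(list(quotient_summary(G, labels).edges(data=True)))
```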

On the other hand, non-quotient methods are usually based on centrality measures, selecting only specific graph subsets, as in Ding et al. (2019). This recent work proposes an algorithm enhancing topological summarization with semantic information. Thus, embeddings are generated to measure the extent to which concepts produce compact summaries, while similarity is captured by the distance between these embeddings. Next, k-means is used to select the important concepts and their similarity is further taken into account, in order to avoid redundancy.

Attributed Clustering.

Attributed clustering considers both the topology of the graph and a set of feature attributes attached to each node. To obtain consistent clusters in this setting, nodes and features are either taken into account together, by matrix factorization and spectral clustering algorithms, or are integrated in graph convolutional networks (GCN) Kipf and Welling (2017). In the latter case, a wide variety of graph auto-encoders (variational, marginal, adversarial, regularized) are then used to learn node representations and to reconstruct the adjacency matrix, as well as the different node features. Recently, the work of Zhang et al. (2019) has proposed to combine a high-order graph convolution method (for smooth feature representation) with spectral clustering on the learned features, to capture global structures and to adapt the convolution order to each dataset. Flow-based techniques for local clustering are introduced in Veldt et al. (2019), whereby semi-supervised information about target clusters is exploited to place constraints or penalties on excluding specific seed nodes from the output set. The underlying method in Chen et al. (2019) is based on a star-schema graph representation, in which attributes are modeled as different node types. DBSCAN clustering is then performed, using personalized PageRank as a unified distance measure for structural and attribute similarity.
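A rough sketch of the feature-smoothing idea behind high-order graph convolution for attributed clustering: propagate node features over the self-loop-augmented adjacency matrix for a few steps, then cluster the smoothed features. Filter design and the adaptive order selection of Zhang et al. (2019) are omitted, so this is only the general principle.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch: k propagation steps act as a k-order low-pass graph filter on the
# node features; clustering the smoothed features combines topology and attributes.
def smooth_then_cluster(A: np.ndarray, X: np.ndarray, order: int, k: int) -> np.ndarray:
    A_hat = A + np.eye(len(A))                   # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    P = D_inv @ A_hat                            # row-normalized propagation matrix
    H = X.copy()
    for _ in range(order):                       # order-k convolution = k propagation steps
        H = P @ H
    return KMeans(n_clusters=k, n_init=10).fit_predict(H)

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.0, 1.0]])
print(smooth_then_cluster(A, X, order=2, k=2))
```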

2.2 Statistical Summarization

Statistical summarization mostly relies on occurrence counting and quantitative measures. Underlying approaches are based on either pattern mining or sampling. Works in the former category aim to reveal patterns in the data and use these to summarize, while those in the latter focus on selecting graph subsets.

The approach in Yan (2019) focuses on summarizing geographical knowledge graphs and introduces the concept of geo-spatial inductive bias (knowledge patterns hidden within geographic components). It deals with the summarization of both hierarchical and multimedia information related to the geographic nodes.

2.3 Goal-Driven Summarization

In the above sections, we have discussed various methods for summarizing static graphs. However, many of the works focused on generating summaries are goal-driven and seek to optimize the memory footprint or some other type of utility.

Following this direction, a key problem to tackle is that of summarizing dynamically changing graphs. These are graphs whose content (either edge labels, weights, or entire nodes and edges) is evolving over a sliding window of predefined size and are also known as graph streams under the window-based model Pacaci et al. (2020). Such continuously changing graphs need to be summarized in a way that ensures the scalability and efficiency of the queries formulated on the obtained summary.

Streaming graph summarization approaches have recently appeared and leverage a common principle: the production of a concise representation that fits in memory. Tsalouchidou et al. (2020) focus on the design of an online clustering algorithm that overcomes the stringent memory requirements of a baseline based on k-means clustering Riondato et al. (2014). They build on the micro-clusters concept from Aggarwal et al. (2003), in order to provide a memory-efficient algorithm for continuously changing graphs. The idea is to leverage a time series of adjacency matrices, each of which represents a static graph. The latter can also be seen as an order-3 tensor. The problem is then formulated in terms of tensor summarization, where a tensor summary is obtained for the most recent window of timestamps. Their distributed implementation allows dealing with large-scale graphs on which temporal and probabilistic queries can be issued. The second approach Gou et al. (2019) also considers weighted graphs, where the weight is given by the timestamp, and strives to find an alternative data structure to the adjacency matrix, based on hash-based compression. In particular, a graph sketch, designed for sparse graphs, is created to store different source/destination nodes in the same row/column and to distinguish them with fingerprints. The method outperforms state of the art graph summarization algorithms, such as Zhao et al. (2011), for most queries, including topological ones (such as reachability and successor queries).
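A toy version of the hash-compressed adjacency idea is sketched below: node identifiers are hashed into a fixed-size matrix in which edge weights are accumulated, trading exactness for constant memory. The fingerprints that Gou et al. (2019) use to disambiguate colliding nodes are intentionally left out, so this is an assumption-laden simplification.

```python
import numpy as np

# Toy graph sketch: hash source/destination ids to rows/columns of a fixed-size
# matrix and accumulate edge weights there; colliding edges share a cell.
class GraphSketch:
    def __init__(self, width: int = 64):
        self.width = width
        self.M = np.zeros((width, width))

    def _h(self, node: str) -> int:
        return hash(node) % self.width

    def add_edge(self, src: str, dst: str, weight: float = 1.0) -> None:
        self.M[self._h(src), self._h(dst)] += weight

    def edge_weight(self, src: str, dst: str) -> float:
        # Over-estimates when distinct edges collide in the same cell.
        return self.M[self._h(src), self._h(dst)]

sk = GraphSketch(width=16)
sk.add_edge("u1", "u2", 3.0)
sk.add_edge("u2", "u3")
print(sk.edge_weight("u1", "u2"))
```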

Other goal-driven summaries have addressed the problem of creating query-aware, compact graph representations, starting from a weighted or a labeled graph instance. GRASP summaries Dumbrava et al. (2019) have been defined for multi-labeled graphs that also possess node and edge properties; these knowledge-driven semantic graphs are also known as property graphs (PGs). In GRASP, supernodes (superedges, resp.) are created to group together label-compatible graph nodes (edges, resp.), while also storing relevant statistical information. By incorporating this information, the obtained graph summaries are tailored to provide highly accurate approximations of basic analytical queries. The target fragment is that of counting regular path queries, which allows one to estimate, for example, the number of connections established in a social network within a given period.
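The grouping-plus-statistics principle can be illustrated as follows: nodes are merged by label and per-label edge counts are kept on superedges, so a simple counting query can be answered from the summary alone. GRASP's actual construction and its estimators for counting regular path queries are considerably richer; everything below is an illustrative simplification.

```python
from collections import defaultdict

# Sketch: group nodes by label and store per-label edge counts between groups,
# so that simple counting queries are answered directly from the summary.
def summarize(edges, node_label):
    counts = defaultdict(int)            # (src label, edge label, dst label) -> count
    for src, lbl, dst in edges:
        counts[(node_label[src], lbl, node_label[dst])] += 1
    return counts

edges = [("a", "follows", "b"), ("b", "follows", "c"), ("a", "likes", "p1")]
node_label = {"a": "Person", "b": "Person", "c": "Person", "p1": "Post"}
summary = summarize(edges, node_label)
# How many "follows" edges go from Person to Person? (exact here: 2)
print(summary[("Person", "follows", "Person")])
```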

The second kind of summaries Kumar and Efstathopoulos (2018) differs from the above in that it aims to maximize a utility function. While this approach also applies group-based iterative graph summarization, as GRASP does, it is not tailored to a specific query fragment. Contrary to GRASP, it also allows one to instantiate several utility functions, such as edge importance, edge submodularity, etc. In a sense, application-specific utility functions could thus be encoded.

Furthermore, depending on the chosen utility function, its definition of error differs from that of GRASP and builds upon the reconstruction error incurred when the summarization step is reverted. The high utility and scalability of the method are shown through a wide range of experiments. In addition, in Safavi et al. (2019), the authors propose a personalized graph summarization method. The idea is to construct custom knowledge graph summaries, which only contain the most relevant information and which respect storage limitations. The problem is formalized as one of constructing a sparse graph that maximizes the inferred individual utility, subject to user- and device-specific constraints on the summary size.
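A minimal sketch of utility-driven summarization under a size budget, with a placeholder utility function: edges are kept greedily by decreasing utility until the budget is met. The objectives and reconstruction-error analysis of the cited works are not reproduced here.

```python
# Sketch: keep the highest-utility edges within a fixed budget. Both the utility
# function and the greedy rule are placeholders for the richer objectives in
# Kumar and Efstathopoulos (2018) and Safavi et al. (2019).
def greedy_summary(edges, utility, budget):
    ranked = sorted(edges, key=utility, reverse=True)
    return ranked[:budget]

edges = [("a", "knows", "b"), ("a", "cites", "c"), ("b", "knows", "c"), ("c", "cites", "d")]
importance = {"knows": 2.0, "cites": 1.0}                  # assumed per-label importance
print(greedy_summary(edges, utility=lambda e: importance[e[1]], budget=2))
```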

3 Key Research Findings

The summarization approaches discussed previously can be structured into the taxonomy depicted in Fig. 1. We first notice that these methods apply to both static and dynamic graphs. Also, depending on their scope, the techniques used can be roughly divided into three categories. First, those that rely on the underlying graph topology mainly perform clustering, preserving structural or semantic (attribute-based) properties. Next, statistical means, ranging from sampling to complex pattern mining, are used to discover hidden information. Finally, goal-driven approaches consider the relevance with respect to given queries or to pre-defined utility functions when summarizing.

Figure 1: A taxonomy of the included works.

We consider each of these directions and distil the topics currently in the limelight.

Regarding graph clustering, recent efforts focus on locality and efficiency. As such, flow-based algorithms are adapted and improved, to render local clustering amenable to real-world, semi-supervised problems. Other methods target local clustering under constraints and employ colored random walks, to account for prior knowledge. For efficiency purposes, index-based approaches are used in structural clustering and tailored to efficient graph querying and index maintenance. Also, the challenging problem of uncertain graph summarization has been recently tackled, by designing approximation algorithms with improved accuracy and performance.

While most graph summaries are built through clustering techniques, we have seen that other approaches are also being successfully employed. For example, when considering quantitative criteria, statistical means can be used to extract relevant patterns. One recent application area is that of domain knowledge graphs, where geographic information can thus be compactly represented. Finally, utility functions, such as query relevance or memory footprints, can be taken into account when constructing summaries. This is especially relevant when dealing with expensive analytical queries, such as counting RPQs Bonifati and Dumbrava (2018), or with large volumes of dynamic data, such as streaming graphs.

To better grasp the scope and purpose of the summarization approaches from Sec. 2, we provide a classification in Fig. 2. Note that the corresponding graph types are abbreviated, cf. Sec. 1, as follows: undirected (UG), labeled (LG), attributed (AG), as well as knowledge graphs (KG), geographical graphs (GG), property graphs (PG), and streaming graphs (SG).

Inspecting the above table, we notice that most recent works have focused on structural clustering. While attributed approaches (Zhang et al. (2019), Chen et al. (2019)) also take into account richer graph models, typically considering feature vectors associated with nodes, the full expressiveness of property graphs is only tackled in Dumbrava et al. (2019), for approximate query processing (AQP) summarization.

Work | Method (Graph Type) | Keywords | Purpose
Qiu et al. (2019) | Similarity-based clustering (LG) | Probabilistic graphs; Dynamic programming | Data mining
Kim et al. (2020) | Spectral clustering (UG) | Non-linear patterns; Density reconstruction; Node cutting | Noise elimination
Yan et al. (2019) | Constrained local clustering (LG) | Color-based random walk; Seed nodes | Community detection
Wen et al. (2019) | Index-based clustering (UG) | SCAN; Index maintenance; Core & neighbour orders | Querying
Liu and Barahona (2020) | Geometric-based clustering (LG) | Markov Stability; Similarity Graphs | Community detection
Kuo et al. (2017) | Query-oriented clustering (LG) | Laplacian eigenmaps | Community detection
Zhao et al. (2019) | Convex clustering (UG) | Triangle lasso; Unconstrained optimization; Regularization | Data analysis
Mai et al. (2019) | Anytime clustering (LG) | SCAN; Parallelization; Dynamic Graphs; Multicore CPU | Application-specific
Zhan et al. (2019) | Adaptive clustering (UG) | Multiview clustering and learning; Feature extraction | Unsupervised learning
Hamidi et al. (2019) | Consensus clustering (UG) | Similarity graphs; Automatic partitioning | Application-specific
Kipf and Welling (2017) | Attributed clustering (LG) | Multi-layer graph convolutional network | Semi-supervised learning
Zhang et al. (2019) | Attributed clustering (AG) | Adaptive high-order convolution | Application-specific
Veldt et al. (2019) | Attributed clustering (UG) | Flow-based local graph clustering | Community detection
Chen et al. (2019) | Attributed clustering (AG) | DBSCAN; Incrementality; Game theory | Data Mining
Goasdoué et al. (2019) | Structural quotient (KG) | Incremental; Property cliques | Visualization
Shin et al. (2019) | Structural quotient (UG) | Partitioning and Parallelization; Compression | Querying
Ding et al. (2019) | Structural non-quotient (KG) | Concept Vectors; Structural and semantic embeddings | Visualization
Safavi et al. (2019) | Structural non-quotient (KG) | Personalization; Utility optimization | Visualization
Yan (2019) | Structural non-quotient (GG) | Geospatial inductive bias; Hierarchical; Multimedia | Visualization
Tsalouchidou et al. (2020) | Tensor summaries (SG) | Streaming graphs; Micro-clusters | Querying
Zhao et al. (2011) | Hash-based compression (LG) | Timestamped weighted graphs | Querying
Dumbrava et al. (2019) | Quotient summaries (PG) | Property Graphs; Complex Path Queries | Approx. Querying
Kumar and Efstathopoulos (2018) | Utility-driven summaries (LG) | Trade-off between error and utility | Application-specific
Figure 2: Classifying Novel Summarization Approaches

4 Applications

In this section, we elaborate on potential use-cases for graph summaries.

Query Efficiency. As summaries are often compact representations of the original input graphs, they can be used as indexes on the latter Konrath et al. (2012). Consequently, for efficiency purposes, queries could first be formulated on the summaries. The obtained summary nodes could then further be matched with the nodes they represent.

Query Size Estimation. Summaries often include statistics about the original graph, which could be exploited to estimate the size of query results Le et al. (2014).

Query Disambiguation. Queries that contain path expressions with wildcards are difficult to evaluate, despite being common in practice. A summary can easily provide information on the connectivity of the initial nodes and, as such, enable queries to be more efficiently evaluated via rewriting Goldman and Widom (1997).

Source Selection. Another interesting application is the use of summaries to detect whether a graph is likely to have specific information of potential interest for the user, without actually having to inspect the real data source Li and Wang (2017).

Graph Visualization. An obvious application for summaries is to enable the exploration of the original data source, effectively reducing the number of nodes/edges to be perceived by the user (Dunne and Shneiderman (2013), Koutra et al. (2014), Troullinou et al. (2018a), Troullinou et al. (2018b), Pappas et al. (2017)).

Schema Discovery. When no schema is present in the initial graph, a summary can be used instead to help users understand the original content, as shown in Bouhamoum et al. (2018).

Pattern Extraction. Summarization also enables pattern identification and extraction Koutra et al. (2014), by abstracting away irrelevant graph portions. An interesting use-case is given by blockchain-based crypto-currencies. In this setting, transactions correspond to openly-accessible graphs, whose topological features can shed light on the role and interactions of the participants. Graph analysis techniques can thus be applied to identify salient structural patterns.

Knowledge Graph Search. Specialized summaries Song et al. (2018) can drive the search strategy in knowledge graphs. These represent lossy replacements of complex graph patterns and can be directly queried as approximate, materialized graph views.

5 Future Research Directions

In this section, we discuss future directions for graph summarization, as inspired by the existing literature.

In the area of graph clustering, further improvements are needed to cope with mixed datasets, in which data points comprise both numerical and categorical attributes. For such datasets, one has to design custom models, capable of handling missing or uncertain feature values, as well as explainable and interpretable clustering algorithms. Explainability of clustering results would also be beneficial for graph summarization, in order to tune results to particular use cases.

The problem of building overlapping graph clusters, as addressed by fuzzy clustering algorithms, is also interesting to consider and its implications for graph summarization are tangible.

Moreover, we note that most existing approaches build static summaries. However, the underlying input graphs are constantly evolving and being updated. To address this, new research is tackling the problem of dynamicity. As summary recomputation is often costly, novel insights are needed on how to efficiently achieve incrementality. On a related note, recent works also focus on streaming graphs, as summarization techniques are required to handle the constantly arriving flow of data that cannot actually be stored. In this setting, ensuring that streaming summaries are updatable, for example using a sliding-window approach, is essential for efficient processing.

Furthermore, another interesting future direction would be to investigate quality metrics for summaries and evaluation benchmarks. However, as graph summarization employs numerous techniques, different outputs might be produced, depending on the purpose, rendering the task difficult.

Finally, the problem of graph summarization has been extensively addressed for existing graph data models, such as RDF, labeled, and weighted graphs. However, principled approaches would be desirable for more expressive graph data models, such as property graphs. On these graphs, clustering (in particular, attributed methods that also use edge features), dynamicity, and benchmarking are all viable future research directions to be pursued.

Bibliography

  • C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu (2003) A framework for clustering evolving data streams. In VLDB, pp. 81–92. Cited by: §2.3.
  • P. Boldi and S. Vigna (2019) (Web/social) graph compression. In Encyclopedia of Big Data Technologies, Cited by: Graph Summarization.
  • A. Bonifati and S. Dumbrava (2018) Graph queries: from theory to practice. SIGMOD Record 47 (4), pp. 5–16. Cited by: §3.
  • A. Bonifati, G. H. L. Fletcher, H. Voigt, and N. Yakovets (2018) Querying graphs. Synthesis Lectures on Data Management, Morgan & Claypool Publishers. Cited by: §1.
  • R. Bouhamoum, K. Kellou-Menouer, S. Lopes, and Z. Kedad (2018) Scaling up schema discovery for RDF datasets. In ICDE Workshops, pp. 84–89. Cited by: §4.
  • S. Cao, W. Lu, and Q. Xu (2015) GraRep: learning graph representations with global structural information. In CIKM, pp. 891–900. Cited by: §2.1.
  • S. Cao, W. Lu, and Q. Xu (2016) Deep neural networks for learning graph representations. In AAAI, pp. 1145–1152. Cited by: §2.1.
  • S. Cebiric, F. Goasdoué, H. Kondylakis, D. Kotzinos, I. Manolescu, G. Troullinou, and M. Zneika (2019) Summarizing semantic graphs: a survey. VLDB J. 28 (3), pp. 295–327. Cited by: Graph Summarization.
  • L. Chen, Y. Gao, Y. Zhang, C. S. Jensen, and B. Zheng (2019) Efficient and incremental clustering algorithms on star-schema heterogeneous graphs. In ICDE, pp. 256–267. Cited by: Figure 2, §2.1, §3.
  • Y. Ding, H. Yu, J. Zhang, H. Li, and Y. Gu (2019) A knowledge representation based user-driven ontology summarization method. IEICE Trans. 102-D (9), pp. 1870–1873. Cited by: Figure 2, §2.1.
  • S. Dumbrava, A. Bonifati, A. N. R. Diaz, and R. Vuillemot (2019) Approximate querying on property graphs. In SUM, LNCS, Vol. 11940, pp. 250–265. Cited by: Figure 2, §2.3, §3.
  • C. Dunne and B. Shneiderman (2013) Motif simplification: improving network visualization readability with fan, connector, and clique glyphs. In CHI, pp. 3247–3256. Cited by: §4.
  • F. Goasdoué, P. Guzewicz, and I. Manolescu (2019) Incremental structural summarization of RDF graphs. In EDBT, pp. 566–569. Cited by: Figure 2, §2.1.
  • R. Goldman and J. Widom (1997) DataGuides: enabling query formulation and optimization in semistructured databases. In VLDB, pp. 436–445. Cited by: §4.
  • X. Gou, L. Zou, C. Zhao, and T. Yang (2019) Fast and accurate graph stream summarization. In ICDE, pp. 1118–1129. Cited by: §2.3.
  • A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In KDD, pp. 855–864. Cited by: §2.1.
  • S. S. Hamidi, E. Akbari, and H. Motameni (2019) Consensus clustering algorithm based on the automatic partitioning similarity graph. Data Knowl. Eng. 124. Cited by: Figure 2, §2.1.
  • K. Han, F. Gui, X. Xiao, J. Tang, Y. He, Z. Cao, and H. Huang (2019) Efficient and effective algorithms for clustering uncertain graphs. PVLDB 12 (6), pp. 667–680. Cited by: §2.1.
  • M. R. Henzinger, P. Raghavan, and S. Rajagopalan (1998) Computing on data streams. In External Memory Algorithms, DIMACS, Vol. 50, pp. 107–118. Cited by: §1.
  • Y. Kim, H. Do, and S. B. Kim (2020) Outer-points shaver: robust graph-based clustering via node cutting. Pattern Recognit. 97. Cited by: Figure 2, §2.1.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In ICLR (Poster), Cited by: Figure 2, §2.1.
  • H. Kondylakis, D. Kotzinos, and I. Manolescu (2019) RDF graph summarization: principles, techniques and applications. In EDBT, pp. 433–436. Cited by: Graph Summarization.
  • M. Konrath, T. Gottron, S. Staab, and A. Scherp (2012) SchemEX - efficient construction of a data catalogue by stream-based indexing of linked data. J. Web Semant. 16, pp. 52–58. Cited by: §4.
  • D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos (2014) VOG: summarizing and understanding large graphs. In SDM, pp. 91–99. Cited by: §4, §4.
  • K. A. Kumar and P. Efstathopoulos (2018) Utility-driven graph summarization. PVLDB 12 (4), pp. 335–347. Cited by: Figure 2, §2.3.
  • L. Kuo, C. Chou, and M. Chen (2017) Query-oriented graph clustering. In PAKDD (2), LNCS, Vol. 10235, pp. 749–761. Cited by: Figure 2, §2.1.
  • M. Latapy, T. Viard, and C. Magnien (2018) Stream graphs and link streams for the modeling of interactions over time. Social Netw. Analys. Mining 8 (1), pp. 61:1–61:29. Cited by: §1.
  • W. Le, F. Li, A. Kementsietsidis, and S. Duan (2014) Scalable keyword search on large RDF data. IEEE Trans. Knowl. Data Eng. 26 (11), pp. 2774–2788. Cited by: §4.
  • J. Li and W. Wang (2017) Graph summarization for source selection of querying over Linked Open Data. In ITNEC, pp. 357–362. Cited by: §4.
  • Y. Liu, T. Safavi, A. Dighe, and D. Koutra (2018) Graph summarization methods and applications: A survey. ACM Comput. Surv. 51 (3), pp. 62:1–62:34. Cited by: Graph Summarization.
  • Z. Liu and M. Barahona (2020) Graph-based data clustering via multiscale community detection. Applied Network Science 5 (1), pp. 3. Cited by: Figure 2, §2.1.
  • S. T. Mai, S. Amer-Yahia, I. Assent, M. S. Birk, M. S. Dieu, J. Jacobsen, and J. Kristensen (2019) Scalable interactive dynamic graph clustering on multicore cpus. IEEE Trans. Knowl. Data Eng. 31 (7), pp. 1239–1252. Cited by: Figure 2, §2.1.
  • G. Nikolentzos, P. Meladianos, and M. Vazirgiannis (2017) Matching node embeddings for graph similarity. In AAAI, pp. 2429–2435. Cited by: §2.1.
  • A. Pacaci, A. Bonifati, and M. T. Özsu (2020) Regular path query evaluation in streaming graphs. In SIGMOD, Cited by: §2.3.
  • A. Pappas, G. Troullinou, G. Roussakis, H. Kondylakis, and D. Plexousakis (2017) Exploring importance measures for summarizing RDF/S KBs. In ESWC, pp. 387–403. Cited by: §4.
  • B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In KDD, pp. 701–710. Cited by: §2.1.
  • S. Pouriyeh, M. Allahyari, Q. Liu, G. Cheng, H. R. Arabnia, M. Atzori, F. G. Mohammadi, and K. Kochut (2019) Ontology summarization: graph-based methods and beyond. Int. J. Semantic Computing 13 (2), pp. 259–283. Cited by: Graph Summarization.
  • Y. Qiu, R. Li, J. Li, S. Qiao, G. Wang, J. X. Yu, and R. Mao (2019) Efficient structural clustering on probabilistic graphs. IEEE Trans. Knowl. Data Eng. 31 (10), pp. 1954–1968. Cited by: Figure 2, §2.1.
  • M. Riondato, D. García-Soriano, and F. Bonchi (2014) Graph summarization with quality guarantees. In ICDM, pp. 947–952. Cited by: §2.3.
  • T. Safavi, C. Belth, L. Faber, D. Mottin, E. Müller, and D. Koutra (2019) Personalized knowledge graph summarization: from the cloud to your pocket. In ICDM, pp. 528–537. Cited by: Figure 2, §2.3.
  • C. Schulz and D. Strash (2019) Graph partitioning: formulations and applications to big data. In Encyclopedia of Big Data Technologies, Cited by: Graph Summarization.
  • K. Shin, A. Ghoting, M. Kim, and H. Raghavan (2019) SWeG: lossless and lossy summarization of web-scale graphs. In WWW, pp. 1679–1690. Cited by: Figure 2, §2.1.
  • Q. Song, Y. Wu, P. Lin, X. Dong, and H. Sun (2018) Mining summaries for knowledge graph search. IEEE Trans. Knowl. Data Eng. 30 (10), pp. 1887–1900. Cited by: §4.
  • G. Troullinou, H. Kondylakis, K. Stefanidis, and D. Plexousakis (2018a) Exploring RDFS kbs using summaries. In ISWC, pp. 268–284. Cited by: §4.
  • G. Troullinou, H. Kondylakis, K. Stefanidis, and D. Plexousakis (2018b) RDFDigest+: A summary-driven system for KBs exploration. In ISWC (Poster), Cited by: §4.
  • I. Tsalouchidou, F. Bonchi, G. D. F. Morales, and R. Baeza-Yates (2020) Scalable dynamic graph summarization. IEEE Trans. Knowl. Data Eng. 32 (2), pp. 360–373. Cited by: Figure 2, §2.3.
  • N. Veldt, C. Klymko, and D. F. Gleich (2019) Flow-based local graph clustering with better seed set inclusion. In SDM, pp. 378–386. Cited by: Figure 2, §2.1.
  • D. Wang, P. Cui, and W. Zhu (2016) Structural deep network embedding. In KDD, pp. 1225–1234. Cited by: §2.1.
  • L. Wang, Y. Xiao, B. Shao, and H. Wang (2014) How to partition a billion-node graph. In ICDE, pp. 568–579. Cited by: §2.1.
  • D. Wen, L. Qin, Y. Zhang, L. Chang, and X. Lin (2019) Efficient structural graph clustering: an index-based approach. VLDB J. 28 (3), pp. 377–399. Cited by: Figure 2, §2.1.
  • D. P. Williamson and D. B. Shmoys (2011) The design of approximation algorithms. Cambridge University Press. Cited by: §2.1.
  • B. Yan (2019) Geographic knowledge graph summarization. Ph.D. Thesis, University of California, Santa Barbara, United States of America. Cited by: Figure 2, §2.2, Definition 1.
  • Y. Yan, Y. Bian, D. Luo, D. Lee, and X. Zhang (2019) Constrained local graph clustering by colored random walk. In WWW, pp. 2137–2146. Cited by: Figure 2, §2.1.
  • F. Ye, C. Chen, and Z. Zheng (2018) Deep autoencoder-like nonnegative matrix factorization for community detection. In CIKM, pp. 1393–1402. Cited by: §2.1.
  • K. Zhan, C. Niu, C. Chen, F. Nie, C. Zhang, and Y. Yang (2019) Graph structure fusion for multiview clustering. IEEE Trans. Knowl. Data Eng. 31 (10), pp. 1984–1993. Cited by: Figure 2, §2.1.
  • X. Zhang, H. Liu, Q. Li, and X. Wu (2019) Attributed graph clustering via adaptive graph convolution. In IJCAI, pp. 4327–4333. Cited by: Figure 2, §2.1, §3.
  • P. Zhao, C. C. Aggarwal, and M. Wang (2011) GSketch: on query estimation in graph streams. PVLDB 5 (3), pp. 193–204. Cited by: Figure 2, §2.3.
  • Y. Zhao, K. Xu, E. Zhu, X. Liu, X. Zhu, and J. Yin (2019) Triangle lasso for simultaneous clustering and optimization in graph datasets. IEEE Trans. Knowl. Data Eng. 31 (8), pp. 1610–1623. Cited by: Figure 2, §2.1.