Hypergraph Modeling and Visualisation of Complex Co-occurence Networks

09/01/2018 ∙ by Xavier Ouvrard, et al. ∙ CERN

Finding inherent or processed links within a dataset makes it possible to discover potential knowledge. The main contribution of this article is to define a global framework that enables optimal knowledge discovery by visually rendering, as facets, the co-occurrences of a dataset (i.e. groups of linked data instances attached to a metadata reference), whether inherently present or processed. Hypergraphs are well suited to modeling co-occurrences since they support multi-adicity, whereas graphs only support pairwise relationships. This article introduces an efficient navigation between the different facets of an information space, based on hypergraph modeling and visualisation.




1 Introduction

Gaining insight into non-numerical data calls for gathering instances: classically, (multi-entry) frequency arrays of occurrences are used. To gain further insight into data instances of a given type, one can regroup them using their links to instances of another type, used as a reference. This generates a family of co-occurrence sets that can be viewed as a facet of the information space. Navigating across the different facets is achieved by iterating this process over different types of interest while keeping the same reference type; any of these types can be used as a reference. We use a publication dataset as a running example throughout.

Previous approaches using a reference to articulate the different facets of an information space exist [1, 2, 3]. [4] proposes a graph-based framework that provides insights into the different facets of an information space based on user-selected perspectives, combining a reference type and a co-occurrence type. [5] shows how keeping $n$-adic relationships can help in understanding network evolution.

This article provides a hypergraph-based framework that supports interactions between the different facets of an information space for optimal knowledge discovery. The dataset, mostly textual, refers to physical entities with unique individual references. Data instances are attached to metadata instances. We assume that every metadata instance has at least one data instance attached to it.

2 Modeling co-occurrences in datasets

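Hypergraphs are well suited to storing co-occurrence information with references. A hypergraph $\mathcal{H} = (V, E)$ is a family of hyperedges $E = (e_i)_{i \in \llbracket p \rrbracket}$, with $e_i \subseteq V$, over the vertex set $V = \{v_j\}_{j \in \llbracket n \rrbracket}$ [6] (here $\llbracket n \rrbracket$ corresponds to $\{1, \ldots, n\}$ and $\llbracket p \rrbracket$ to $\{1, \ldots, p\}$). A hypergraph whose hyperedges are pairwise distinct is said to have no repeated hyperedge. In [7], a hypergraph is a triple $(V, E, \phi)$ with $V$ a vertex set, $E$ a hyperedge set and $\phi: E \to \mathcal{P}(V)$ an incidence function, $\mathcal{P}(V)$ being the power set of $V$. Given a map $w$ defined on $E$, the hypergraph $(V, E, \phi, w)$, or $(V, E, w)$ for short, is said to be weighted.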
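To make the two definitions concrete, here is a minimal Python sketch, assuming vertices are hashable labels and hyperedges are `frozenset`s. In the family view of [6], a hyperedge family is a list indexed by position, so a repeated hyperedge is simply the same subset appearing at two indices; in the triple view of [7], each hyperedge gets a name and an incidence function `phi` maps the name to its vertex subset. The helper names `has_repeated_hyperedge` and `as_triple` are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, List, Tuple

Vertex = Hashable
# Hyperedge family (e_i): a list of vertex subsets indexed by position
Family = List[FrozenSet[Vertex]]

def has_repeated_hyperedge(family: Family) -> bool:
    """True if two different indices carry the same vertex subset."""
    return any(count > 1 for count in Counter(family).values())

def as_triple(vertices: FrozenSet[Vertex], family: Family
              ) -> Tuple[FrozenSet[Vertex], List[int], Dict[int, FrozenSet[Vertex]]]:
    """Triple (V, E, phi): hyperedges are named (here by their index) and
    the incidence function phi maps each name to the vertex subset it covers."""
    for edge in family:
        if not edge <= vertices:
            raise ValueError(f"hyperedge {sorted(map(str, edge))} is not a subset of V")
    names = list(range(len(family)))
    phi = {i: family[i] for i in names}
    return vertices, names, phi
```

In the triple view, repeated hyperedges are simply two names with the same image under `phi`; a weighted hypergraph would add a dictionary from hyperedge names to weights alongside `phi`.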

2.1 Allowing navigation

Relational database schemas are hypergraphs of metadata instances whose hyperedges gather the metadata of each table: normal forms are linked to properties of the hypergraphs modeling them [8]. In graph databases, the schema (although not required [9]) represents the relationships between vertex types. The schema hypergraph $\mathcal{H}_{\text{Schema}} = (V_{\text{Schema}}, E_{\text{Schema}})$ represents these relationships as hyperedges.

Each data instance stored in the dataset is labeled, with its type, by a labeling function $l_V$ over the vertices of the schema hypergraph $\mathcal{H}_{\text{Schema}}$. Hyperedges of the schema itself can be labeled by another labeling function $l_E$ over another label set.

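Types of visualisation or referencing interest are selected as a subset $V_{\text{x-Schema}}$ of the vertex set $V_{\text{Schema}}$ of the schema hypergraph, to generate the extracted schema hypergraph $\mathcal{H}_{\text{x-Schema}} = (V_{\text{x-Schema}}, E_{\text{x-Schema}})$, where $V_{\text{x-Schema}} \subseteq V_{\text{Schema}}$ and $E_{\text{x-Schema}} = \{ e \cap V_{\text{x-Schema}} : e \in E_{\text{Schema}},\ e \cap V_{\text{x-Schema}} \neq \emptyset \}$.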
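One plausible reading of the extraction step, sketched in Python: each schema hyperedge is restricted to the selected types, and hyperedges left empty are dropped. The restriction rule, the choice to skip repeated restrictions and the function name `extract_schema` are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

def extract_schema(schema_edges: Iterable[Iterable[str]], selected: Iterable[str]
                   ) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Restrict each schema hyperedge to the selected types.

    `selected` is assumed to be a subset of the schema vertices (the types).
    """
    v_x = frozenset(selected)
    e_x: List[FrozenSet[str]] = []
    for edge in schema_edges:
        kept = frozenset(edge) & v_x
        # Drop hyperedges with no selected type, and skip repeated restrictions
        if kept and kept not in e_x:
            e_x.append(kept)
    return v_x, e_x
```

Dropping repeated restrictions is a design choice: the schema only records which types are related, so a duplicate adds no information.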

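From the extracted schema hypergraph $\mathcal{H}_{\text{x-Schema}}$, we build the reachability hypergraph $\mathcal{H}_{\text{R-Schema}} = (V_{\text{x-Schema}}, E_{\text{R-Schema}})$, with $V_{\text{x-Schema}}$ as vertex set. The hyperedges of $\mathcal{H}_{\text{R-Schema}}$ are the connected components of $\mathcal{H}_{\text{x-Schema}}$, gathered in $\mathcal{K}(\mathcal{H}_{\text{x-Schema}})$, the set of connected components of $\mathcal{H}_{\text{x-Schema}}$: $E_{\text{R-Schema}} = \mathcal{K}(\mathcal{H}_{\text{x-Schema}})$.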
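The reachability hypergraph only needs the connected components of the extracted schema hypergraph. A minimal sketch using a union-find structure (a standard technique chosen here for illustration, not one prescribed by the framework): two types end up in the same component whenever a chain of hyperedges links them, and each component becomes one hyperedge. Function names are hypothetical, and every vertex of a hyperedge is assumed to belong to the vertex set.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple

def connected_components(vertices: Iterable[Hashable],
                         edges: Iterable[Iterable[Hashable]]) -> List[FrozenSet[Hashable]]:
    """Connected components of a hypergraph, computed with union-find."""
    parent: Dict[Hashable, Hashable] = {v: v for v in vertices}

    def find(v: Hashable) -> Hashable:
        # Walk up to the root, halving the path on the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for edge in edges:
        members = list(edge)
        # All vertices of one hyperedge belong to the same component
        for v in members[1:]:
            root_a, root_b = find(members[0]), find(v)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: Dict[Hashable, set] = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return [frozenset(g) for g in groups.values()]

def reachability_hypergraph(vertices: Iterable[Hashable],
                            edges: Iterable[Iterable[Hashable]]
                            ) -> Tuple[FrozenSet[Hashable], List[FrozenSet[Hashable]]]:
    """Same vertex set; one hyperedge per connected component."""
    vertex_set = frozenset(vertices)
    return vertex_set, connected_components(vertex_set, edges)
```

A selected type that appears in no hyperedge forms its own singleton component, which matches the usual notion of connectivity in a hypergraph.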

Last, at the metadata level, the navigation hypergraph $\mathcal{H}_{\text{Nav}}$ is built by choosing, in a hyperedge $e$ of the reachability hypergraph, a nonempty subset $R_e \subseteq e$ of possible reference vertices. Nonempty subsets of $R_e$ generate the possible hyperedges of the navigation hypergraph. Inside a hyperedge of $\mathcal{H}_{\text{Nav}}$, navigation between facets is possible without changing the reference.

In a publication dataset, typical metadata are: publication id, title, abstract, authors, affiliations, addresses, author keywords, subject categories, countries, organisations, … (metadata of interest for visualisation or referencing are in italics). A possible navigation hyperedge is {author keywords, organisations, country, subject category}, with publication id as reference.
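To illustrate the navigation hyperedges, here is a hedged sketch that enumerates the nonempty subsets of the chosen reference types inside one reachability hyperedge. Following the publication example, we assume that a navigation hyperedge gathers the remaining types of the component and is attached to its reference subset; this is an interpretation, and `navigation_hyperedges` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

def navigation_hyperedges(component: FrozenSet[str],
                          reference_types: Iterable[str]) -> Dict[FrozenSet[str], FrozenSet[str]]:
    """Map each nonempty subset of reference types to the types navigable with it.

    Assumption: the navigable types are the component minus the chosen references.
    """
    refs = frozenset(reference_types)
    if not refs or not refs <= component:
        raise ValueError("references must be a nonempty subset of the component")
    result: Dict[FrozenSet[str], FrozenSet[str]] = {}
    candidates = list(refs)
    # Every nonempty subset of the candidate references yields one hyperedge
    for k in range(1, len(candidates) + 1):
        for chosen in combinations(candidates, k):
            key = frozenset(chosen)
            result[key] = component - key
    return result
```

With publication id as the only reference, the single resulting hyperedge is {author keywords, organisations, countries, subject categories}; with k candidate references there are 2^k - 1 such hyperedges.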

2.2 Facet visualisation hypergraphs

Each physical entity $\varphi$ in a dataset is described by a unique physical reference $r_\varphi$ and a set of data instances of different types $t_1, \ldots, t_n$. The types are obtained from the metadata, for instance in publications: organisation, author keywords, country. We write $A_{\varphi,i}$ for the set of values of type $t_i$ attached to $\varphi$; $A_{\varphi,i}$ is possibly the empty set if no value of type $t_i$ is attached to $\varphi$. Hence $\varphi$ is fully described by:

$$\varphi = \left(r_\varphi, A_{\varphi,1}, \ldots, A_{\varphi,n}\right)$$
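A physical entity can be stored as a record holding its physical reference and a map from type names to the values attached to it; a type with no attached value naturally yields the empty set. The class `PhysicalEntity` and its field and method names are illustrative assumptions, not part of the article.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Tuple

@dataclass(frozen=True)
class PhysicalEntity:
    """A physical entity: a unique reference plus the values attached per type."""
    ref: str
    values: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def attached(self, type_name: str) -> FrozenSet[str]:
        # A type with no attached value yields the empty set
        return self.values.get(type_name, frozenset())

    def description(self, types: Iterable[str]) -> Tuple:
        """Tuple (reference, values of type 1, ..., values of type n)."""
        return (self.ref,) + tuple(self.attached(t) for t in types)
```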

In the navigation hypergraph, each hyperedge describes the facets accessible relative to a reference type. A facet shows co-occurrences of a chosen type $t_v$ built relative to reference instances of type $t_r$ ($t_v|t_r$ for short). For example, in a publication dataset, with organisation as reference, one can retrieve all the subject categories associated with a given organisation.

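Performing a search on the dataset retrieves a set of physical references $R_S$. A facet $t_v|t_r$ is represented by the visualisation hypergraph of co-occurrences of type $t_v$. The set of all values of type $t_r$ in the search is defined by $A_r(R_S) = \bigcup_{\varphi:\, r_\varphi \in R_S} A_{\varphi,r}$. Each value $\alpha \in A_r(R_S)$ is mapped to the set of physical references in which it appears, using $\rho: A_r(R_S) \to \mathcal{P}(R_S)$ where $\rho(\alpha) = \{ r_\varphi \in R_S : \alpha \in A_{\varphi,r} \}$. The set of values of type $t_v$ relative to the reference $\alpha$ is

$$e_\alpha = \bigcup_{\varphi:\, r_\varphi \in \rho(\alpha)} A_{\varphi,v}$$

Hence the raw visualisation hypergraph for the facet $t_v|t_r$ attached to the search $R_S$ is $\mathcal{H}_{v|r}(R_S) = \left(A_v(R_S), (e_\alpha)_{\alpha \in A_r(R_S)}\right)$, where $A_v(R_S) = \bigcup_{\varphi:\, r_\varphi \in R_S} A_{\varphi,v}$ is the set of all values of type $t_v$ in the search.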
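The construction of the raw visualisation hypergraph can be sketched in Python, assuming a hypothetical dataset layout that maps each physical reference to a dictionary from type names to sets of values. `references_of` plays the role of the map from a reference value to the physical references in which it appears, and each hyperedge collects the values of the visualised type over those physical references. All names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical layout: physical reference -> type name -> set of attached values
Dataset = Dict[str, Dict[str, Set[str]]]

def values_of_type(data: Dataset, refs: Iterable[str], t: str) -> FrozenSet[str]:
    """Union of the values of type t attached to the given physical references."""
    out: Set[str] = set()
    for r in refs:
        out |= data[r].get(t, set())
    return frozenset(out)

def references_of(data: Dataset, search: Iterable[str], t_r: str) -> Dict[str, FrozenSet[str]]:
    """For each reference value of type t_r, the physical references where it appears."""
    refs: Dict[str, Set[str]] = {}
    for r in search:
        for alpha in data[r].get(t_r, set()):
            refs.setdefault(alpha, set()).add(r)
    return {alpha: frozenset(rs) for alpha, rs in refs.items()}

def raw_visualisation_hypergraph(data: Dataset, search: Iterable[str], t_v: str, t_r: str
                                 ) -> Tuple[FrozenSet[str], Dict[str, FrozenSet[str]]]:
    """Vertices: values of type t_v; one hyperedge per reference value of type t_r."""
    search = list(search)  # iterated twice below
    rho = references_of(data, search, t_r)
    vertices = values_of_type(data, search, t_v)
    edges = {alpha: values_of_type(data, refs, t_v) for alpha, refs in rho.items()}
    return vertices, edges
```

Keying the hyperedges by reference value keeps the family indexed, so two organisations with the same subject categories still give two hyperedges at this stage.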

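Several hyperedges may point to the same subset of vertices. In this case, we build a reduced weighted visualisation hypergraph from the raw visualisation hypergraph. We define the hyperedge family $E_{v|r}(R_S) = (e_\alpha)_{\alpha \in A_r(R_S)}$ and the equivalence relation $\sim$ on the set $A_r(R_S)$ of reference values such that, for all $\alpha, \alpha' \in A_r(R_S)$:

$$\alpha \sim \alpha' \iff e_\alpha = e_{\alpha'}$$

Considering the quotient set $A_r(R_S)/{\sim}$ of $A_r(R_S)$ by $\sim$, we write $\tilde{E}_{v|r}(R_S) = \{ e_c : c \in A_r(R_S)/{\sim} \}$, where $e_c = e_\alpha$ for any $\alpha \in c$; this is well defined since all elements of a class share the same hyperedge.

$\tilde{E}_{v|r}(R_S)$ is the support set of the multiset $\{\!\{ e_\alpha : \alpha \in A_r(R_S) \}\!\}$ (in a multiset, repetitions of elements are allowed; for further details, see [10]): $e_c$ has multiplicity $m(e_c) = |c|$ in this multiset. Since the classes partition $A_r(R_S)$, it yields:

$$\sum_{c \in A_r(R_S)/{\sim}} m(e_c) = |A_r(R_S)|$$

Let $f: A_r(R_S)/{\sim} \to \tilde{E}_{v|r}(R_S)$, $c \mapsto e_c$; then $f$ is bijective. $f^{-1}$ retrieves the class associated with a given hyperedge, and hence the values of type $t_r$ associated with this class, which will be important for navigation. The references associated with $c$ are $R_c = \bigcup_{\alpha \in c} \rho(\alpha)$. With $A_v(R_S)$ the vertex set of the raw visualisation hypergraph, the reduced weighted visualisation hypergraph for the search $R_S$ is defined as

$$\tilde{\mathcal{H}}_{v|r}(R_S) = \left(A_v(R_S), \tilde{E}_{v|r}(R_S), w\right) \quad \text{with} \quad w(e_c) = m(e_c) = |c|$$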
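The reduction can be sketched by grouping the reference values that share exactly the same hyperedge: the resulting dictionary, from each distinct hyperedge to its class of reference values, plays the role of the inverse of the class-to-hyperedge bijection, and the class size gives the weight. The input format (reference value mapped to its hyperedge) and the function names are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

Edge = FrozenSet[Hashable]

def reduce_hypergraph(raw_edges: Dict[Hashable, Edge]
                      ) -> Tuple[Dict[Edge, FrozenSet[Hashable]], Dict[Edge, int]]:
    """Group reference values whose hyperedges are equal.

    Returns (classes, weights): classes maps each distinct hyperedge to its
    equivalence class of reference values, and weights maps it to the class
    size, i.e. its multiplicity in the raw family.
    """
    grouped: Dict[Edge, set] = {}
    for alpha, edge in raw_edges.items():
        grouped.setdefault(edge, set()).add(alpha)
    classes = {edge: frozenset(cls) for edge, cls in grouped.items()}
    weights = {edge: len(cls) for edge, cls in classes.items()}
    return classes, weights

def class_references(cls: FrozenSet[Hashable],
                     rho: Dict[Hashable, FrozenSet[Hashable]]) -> FrozenSet[Hashable]:
    """Physical references associated with a class: union of rho over its members."""
    return frozenset().union(*(rho[alpha] for alpha in cls))
```

Using the hyperedge itself as a dictionary key is what makes the equivalence test cheap: equal `frozenset`s hash identically, so each class is built in one pass.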

2.3 Navigability through facets

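Keeping the same search $R_S$ and reference type $t_r$, the set $A_r(R_S)$ of reference values and the sets $\rho(\alpha)$ of physical references remain the same across the different facets: considering another type $t_{v'}$ with the same reference type $t_r$, another visualisation hypergraph $\tilde{\mathcal{H}}_{v'|r}(R_S)$ is built.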

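Let $t_v$ be the current type and $\tilde{\mathcal{H}}_{v|r}(R_S)$ the current reduced visualisation hypergraph. Focusing on a subset of vertices $W \subseteq A_v(R_S)$, we retrieve the subset of hyperedges that contain at least one element of $W$: $\tilde{E}_W = \{ e \in \tilde{E}_{v|r}(R_S) : e \cap W \neq \emptyset \}$. Using $f^{-1}$, the inverse of the bijection between classes and hyperedges, we get for each $e \in \tilde{E}_W$ the class $f^{-1}(e)$ associated with the hyperedge $e$, building the set $C_W = \{ f^{-1}(e) : e \in \tilde{E}_W \}$. The references of type $t_r$ used to build these co-occurrences are $A_{r,W} = \bigcup_{c \in C_W} c$. From each element of $A_{r,W}$, the set of physical references is retrieved, considering $\rho_W$ as the restriction of $\rho$ to $A_{r,W}$. This yields the physical reference set:

$$R_W = \bigcup_{\alpha \in A_{r,W}} \rho_W(\alpha)$$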
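The selection step can be sketched as follows, assuming the reduced hypergraph is stored as a dictionary from each hyperedge to its class of reference values, and the reference map as a dictionary from each reference value to its physical references. Function and parameter names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

def references_for_selection(classes: Dict[FrozenSet[Hashable], FrozenSet[Hashable]],
                             rho: Dict[Hashable, FrozenSet[Hashable]],
                             selection: Iterable[Hashable]
                             ) -> Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]:
    """From selected vertices, return (reference values, physical references)."""
    W = frozenset(selection)
    # Hyperedges containing at least one selected vertex
    touched = [edge for edge in classes if edge & W]
    # Their classes of reference values, merged into one set
    ref_values = frozenset().union(*(classes[edge] for edge in touched))
    # rho restricted to those reference values gives the physical references
    physical = frozenset().union(*(rho[alpha] for alpha in ref_values))
    return ref_values, physical
```

Selecting "Physics" in a subject-category facet built on organisations would, for instance, return every organisation whose hyperedge contains Physics, together with the publications of those organisations.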

From these physical references, one can switch to another facet of the same search with the same reference type $t_r$. Let $t_{v'}$ be the targeted type. Then only the physical references in $R_W$ are processed: the raw visualisation hypergraph $\mathcal{H}_{v'|r}(R_W)$ is built as in Section 2.2, using $R_W$ in place of the search set $R_S$. The related reduced weighted version is obtained with the same approach as above. The co-occurrences retrieved include all occurrences that have co-occurred with one of the elements selected in the first facet.
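Putting the switch together, here is a compact sketch that rebuilds a facet for a target type on the physical references retrieved from a selection, then counts identical hyperedges to obtain the weights. It assumes a hypothetical dataset layout (physical reference mapped to a dictionary from type names to value sets); `switch_facet` is an illustrative name, not an API from the article.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Dataset = Dict[str, Dict[str, Set[str]]]  # hypothetical: physical ref -> type -> values

def switch_facet(data: Dataset, selected_refs: Iterable[str], t_target: str, t_r: str
                 ) -> Tuple[FrozenSet[str], Dict[FrozenSet[str], int]]:
    """Build the reduced facet t_target|t_r restricted to the selected physical references."""
    refs = list(selected_refs)
    # Reference values of type t_r -> physical references (within the selection only)
    rho: Dict[str, Set[str]] = {}
    for r in refs:
        for alpha in data[r].get(t_r, set()):
            rho.setdefault(alpha, set()).add(r)
    vertices = frozenset(v for r in refs for v in data[r].get(t_target, set()))
    # Raw hyperedge per reference value; weight = number of values sharing it
    weights: Dict[FrozenSet[str], int] = {}
    for alpha_refs in rho.values():
        edge = frozenset(v for r in alpha_refs for v in data[r].get(t_target, set()))
        weights[edge] = weights.get(edge, 0) + 1
    return vertices, weights
```

Because the reference map is recomputed on the selected physical references only, the switched facet reflects co-occurrences inside the selection rather than across the whole search.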

Of course, if $W = A_v(R_S)$, the reduced visualisation hypergraph will contain all the instances of type $t_{v'}$ attached to physical entities of the search $R_S$.

Ultimately, by building a multi-dimensional network organised around types, one can retrieve valuable information from combined data sources. This process extends to any number of data sources as long as they share one or more types; otherwise the reachability hypergraph is not connected and only separate navigations are possible.

3 Conclusion

Using the connected components of the extracted schema hypergraph, we have enabled navigation through the dataset. An application of this hypergraph modeling framework is the DataHedron, shown in Figure 1: it enables easy navigation between the facets of the information space. It is a 2.5D representation of the information space in which each DataHedron face embeds a visualisation hypergraph. Navigation through facets is articulated via the references that link one facet with another, and this link by references is realised on one face of the DataHedron. Combining this framework with search tools gives deep insight into a dataset.

Figure 1: The DataHedron.


  • [1] M. Dörk, N. H. Riche, G. Ramos, S. Dumais, Pivotpaths: Strolling through faceted information spaces, IEEE Transactions on Visualization and Computer Graphics 18 (12) (2012) 2709–2718.
  • [2] J. Zhao, C. Collins, F. Chevalier, R. Balakrishnan, Interactive exploration of implicit and explicit relations in faceted datasets, IEEE Transactions on Visualization and Computer Graphics 19 (12) (2013) 2080–2089.
  • [3] S. Hadlak, H. Schumann, H.-J. Schulz, A survey of multi-faceted graph visualization, in: EuroVis, 2015, pp. 1–20.
  • [4] A. Agocs, D. Dardanis, J.-M. Le Goff, D. Proios, Interactive graph query language for multidimensional data in collaboration spotting visual analytics framework, ArXiv e-prints arXiv:1712.04202 (2017).
  • [5] C. Taramasco, J.-P. Cointet, C. Roth, Academic team formation as evolving hypergraphs, Scientometrics 85 (3) (2010) 721–740.
  • [6] A. Bretto, Hypergraph Theory: An Introduction, Mathematical Engineering, Springer, Cham, 2013.
  • [7] J. Stell, Relations on hypergraphs, in: Relational and Algebraic Methods in Computer Science, 2012, pp. 326–341.
  • [8] R. Fagin, Degrees of acyclicity for hypergraphs and relational database schemes, Journal of the ACM 30 (3) (1983) 514–550.
  • [9] R. C. McColl, D. Ediger, J. Poovey, D. Campbell, D. A. Bader, A performance evaluation of open source graph databases, PPAA ’14, ACM, 2014, pp. 11–18. doi:10.1145/2567634.2567638.
  • [10] A. Radoaca, Properties of multisets compared to sets, in: Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2015 17th International Symposium on, IEEE, 2015, pp. 187–188.