1 Related Work
The related work is twofold, since the approach combines multiple visual analytics techniques with the power of graph query languages. Over the last 15 years, many visual analytics articles have described how multidimensional data can be transformed into node-link diagrams [38, 24].
Many articles have been published on coordinated multiple views in particular, a topic that introduces a visual analytics paradigm supported by an interactive query language or by a set of operations. These articles can be divided into four different groups:
The literature on graph query languages is extensive [22, 17, 21, 16, 26, 2, 32, 44]. It covers different graph models, reflecting the variety of requirements for applications and languages.
The visual analytics model introduced in this paper promotes a different approach to graph query languages. The language operates on a directed, labelled graph that is managed via user interactions treated as query inputs, and it follows the concept of the semantic web query language SPARQL [18]. This approach allows users to generate graph patterns and evaluate them directly on the graph.
2 Basic graph and views
Let graph be a directed, labelled graph defined as a four-element tuple where represents a set of vertices and , a set of edges defined as a subset of the Cartesian product of these vertices. is a set of vertex labels and is a mapping function from vertices to the corresponding labels. Figure 1 shows an example of such a graph.
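As an illustration only, the four-element tuple described above can be sketched as a small Python data structure. The class name `LabelledGraph`, its field names and the toy data are assumptions made for this sketch; they are not taken from the paper or from the Collaboration Spotting implementation.

```python
from dataclasses import dataclass


@dataclass
class LabelledGraph:
    """Directed, labelled graph: vertices, edges, labels and a labelling map."""
    vertices: set    # vertex identifiers
    edges: set       # (source, target) pairs, a subset of vertices x vertices
    labels: set      # vertex labels
    label_of: dict   # mapping function: vertex -> label

    def is_well_formed(self) -> bool:
        # Every edge must connect known vertices.
        edges_ok = all(s in self.vertices and t in self.vertices
                       for s, t in self.edges)
        # Every vertex must be mapped to a known label.
        labels_ok = all(self.label_of.get(v) in self.labels
                        for v in self.vertices)
        return edges_ok and labels_ok


# Hypothetical toy data: two publications, one organisation, one city.
toy = LabelledGraph(
    vertices={"p1", "p2", "o1", "c1"},
    edges={("p1", "o1"), ("p2", "o1"), ("o1", "c1")},
    labels={"Publication", "Organisation", "City"},
    label_of={"p1": "Publication", "p2": "Publication",
              "o1": "Organisation", "c1": "City"},
)
```

The `is_well_formed` check simply verifies that the edge set and the labelling map are consistent with the vertex and label sets.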
We define the reachability graph over graph as where vertices are labels of graph , is defined as the Cartesian product of the labels where any two vertices of are connected if and only if there exist two connected vertices in graph and their respective labels correspond to the two vertices of graph . Graph is a description of graph ; it is also called the graph schema of graph . The graph schema helps users view graph via different subgraphs of lesser dimensionality, using labels of as dimensions, and facilitates the generation of approximately optimal user-defined graph queries.
Let graph be a graph pattern where and . To process the answer to a graph query, one needs to find all possible isomorphic subgraphs of that are homomorphic to a graph pattern corresponding to the query. This is a graph pattern matching problem, a well-known problem in mathematics [13]. In this case, one defines graph , a subgraph of graph , as a sample matching the graph pattern if and only if:
,

.
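The reachability graph (graph schema) introduced earlier in this section can be derived mechanically from the data graph: its vertices are the labels, and two labels are connected whenever at least one pair of connected vertices carries them. The following is a minimal sketch under that reading; the function name `reachability_graph` and the toy data are assumptions for illustration.

```python
def reachability_graph(edges, label_of):
    """Return (schema_vertices, schema_edges) for a directed, labelled graph.

    edges    -- iterable of (source, target) vertex pairs
    label_of -- dict mapping each vertex to its label
    """
    schema_vertices = set(label_of.values())
    # Two labels are connected iff some edge links vertices carrying them.
    schema_edges = {(label_of[s], label_of[t]) for s, t in edges}
    return schema_vertices, schema_edges


# Hypothetical toy data.
label_of = {"p1": "Publication", "p2": "Publication",
            "o1": "Organisation", "c1": "City"}
edges = [("p1", "o1"), ("p2", "o1"), ("o1", "c1")]

schema_vertices, schema_edges = reachability_graph(edges, label_of)
```

In this toy case the three data edges collapse into two schema edges, which illustrates why the schema is a compact description of the data graph.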
The answer to a graph query is a view containing the set of subgraphs of matching . To build such a view, one needs first to introduce the graph pairing function and the set . Let and be two graph patterns. These graph patterns are paired iff

and

.
where a path is an alternating non-empty sequence of vertices and edges, starting and ending with vertices, in which all edges and vertices are distinct from one another. indicates that all edges of this path are in set . The function is defined as
The set of these pairs, , is defined as .
A view of graph is defined as a six-element tuple where

,

,

and ,

,

,

.
The use of multiple graph patterns for the construction of graph is required since the cardinality of set and set is not necessarily equal to 1 (see details in Section 3.1). To ease the reading, graph is noted to refer directly to the set of labels used in the construction of the view. Also, in practice, we use an aggregation function on the edges and on the vertices of graph to determine their respective weights, instead of the elements in set (for instance, the number of elements). Figure 2 shows an example of a view.
3 Graph creation from user interactions
In this section, we introduce how the graph patterns and views can be created as a result of the following user interactions:

Selection of different nodes in the current view,

Removal of all vertices with the same label selected in one of the previous views,

Navigation from one view to another.
Users can modify set and set when performing any of the above interactions. Let be the set of vertices corresponding to a user selection. From , we define:

which contains the labels of nodes in set and,

with , a subset of set , restricted to vertices having their respective labels in set .
In order for set to operate as a filter, the matched sample definition of Section 2 has to be restricted by requiring that . Example 1 below shows the content of for user selection from the graph depicted in Figure 1.
Example 1
(1)  
(2) 
3.1 Graph pattern construction
This section shows how to construct a graph pattern with set containing all the labels of vertices in set . We exploit the fact that graph patterns are actually only needed when constructing edges in and their respective weights. A pair of graph patterns is required for each combination of labels in set and set , since the path directions between vertices from set and set differ owing to the construction of edges between vertices of and vertices of . Each pattern has to satisfy the following criteria:

It must be a connected and directed graph,

It must be minimal,

Labels from set can be used as intermediate vertices in the pattern.
These requirements exactly fit a Steiner Minimal Tree problem [23], which is known to be NP-complete [14] and for which we use a minimum spanning tree solver as an approximation algorithm. Algorithm 1 describes the full process of pair generation. Figure 3 shows the graph schema of graph depicted in Figure 1 and the generated patterns.
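To make the approximation idea concrete, here is a minimal sketch of one common way to approximate a Steiner tree with a minimum spanning tree: compute shortest paths between the required labels, run Kruskal's algorithm on those path lengths, and expand the chosen paths back into schema edges. This is not a reproduction of Algorithm 1. The function names, the toy schema, the treatment of schema edges as undirected and of unit weight, and the omission of a final pruning step are all assumptions of this sketch.

```python
from collections import deque
from itertools import combinations


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the vertex list of a shortest path or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


def approximate_steiner_tree(schema_edges, terminals):
    """Connect the terminal labels using a spanning tree over shortest paths."""
    # Assumption: directions are ignored when searching for connecting paths.
    adjacency = {}
    for a, b in schema_edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # One shortest path per pair of terminals (metric closure).
    paths = {}
    for a, b in combinations(sorted(terminals), 2):
        path = shortest_path(adjacency, a, b)
        if path is None:
            raise ValueError(f"labels {a!r} and {b!r} are not connected")
        paths[(a, b)] = path

    # Kruskal's algorithm over the closure, shortest paths first.
    component = {t: t for t in terminals}

    def find(x):
        while component[x] != x:
            x = component[x]
        return x

    tree_edges = set()
    for (a, b), path in sorted(paths.items(),
                               key=lambda item: (len(item[1]), item[0])):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            component[root_a] = root_b
            # Expand the closure edge back into edges of the schema.
            tree_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return tree_edges


# Hypothetical schema and required labels.
schema = [("Publication", "Organisation"), ("Organisation", "City"),
          ("City", "Country"), ("Publication", "Keyword")]
pattern = approximate_steiner_tree(schema, {"Keyword", "City"})
```

In the example, "Publication" and "Organisation" end up in the pattern as intermediate labels even though they were not requested, which corresponds to the third criterion listed above.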
3.2 Connecting user interactions and views
Now that graph patterns () have been created using set , set and set , one can introduce the function
that generates views from user interactions. ( and ) as
where
are the vertices of graph and
are the “interconnection” vertices. The other members of the six-tuple are unchanged since

labels (set ) are not modified and since

edge definition (set ) and weighting functions ( and ) only depend on set and set .
4 Operations on graphs
User interactions will result in the following graph operations:

Selection: The user selects nodes on the view,

Expansion: The user expands a view by removing, from a previous selection, vertices having the same label,

Navigation: The user navigates from one view to another.
To define these operations, one first needs to introduce the concepts of visual equivalence and minimal views, since there can be views with vertices of null weight that are hidden from the user and hence non-selectable. Let and be two different filters on the same view complying with . In essence, this means that there is no difference in the sets of vertices with labels contained in , which technically should be empty. View and view generated using F1 and F2 are said to be visually equivalent if and only if
Definition 2 (Vis-equivalent)
where () represents the vertices of view (). Intuitively, visual equivalence guarantees that vertices that are not common to two views have empty weights. It provides an equivalence classification of views. It is easy to prove that each class of views contains exactly one view that does not have vertices with empty weights. This view is called the minimal view.
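The intuition behind visual equivalence and minimal views can be sketched as follows. This is only an encoding of the informal statement above (vertices not shared by two views must have empty weight), not of the formal definition. Representing a view as a dictionary from vertex to weight, and the function names, are assumptions of this sketch.

```python
def visually_equivalent(view_a, view_b):
    """Views are dicts vertex -> weight; unshared vertices must weigh zero."""
    only_in_one = set(view_a) ^ set(view_b)
    return all(view_a.get(v, 0) == 0 and view_b.get(v, 0) == 0
               for v in only_in_one)


def minimal_view(view):
    """Drop the zero-weight vertices, which are hidden from the user anyway."""
    return {vertex: weight for vertex, weight in view.items() if weight != 0}


# Hypothetical views: "y" is present in the first view only, with zero weight.
first = {"x": 3, "y": 0}
second = {"x": 3}
third = {"x": 3, "z": 1}
```

Here `first` and `second` show the same picture to the user, and `minimal_view(first)` yields the representative without hidden vertices, whereas `third` contains a visible vertex that `second` lacks.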
4.1 Selection on graphs
Let be the set of user-selected nodes within a view. and where is a set of vertices from the minimal view which is visually equivalent to graph . The selection operator is defined as
Definition 3
(Selection)
where . Note that at view creation the selection operator has been used with a more general definition of the function.
4.2 Expansion on graphs
The expansion operator is in some sense the “inverse” of the selection operator. It is defined as
Definition 4
(Expansion)
The expansion operator changes view when and removes all vertices in set that are labelled with labels in .
4.3 Navigation through graphs
By selecting a subset of labels from , one can build views of graph with reduced dimensional complexity. Navigation across views is required to enable users to apprehend the full graph . The navigation function therefore goes from view to a view labelled as and , and is defined as:
Definition 5 (Navigation)
4.4 Navigation history
The navigation history can be represented as a navigation graph where vertices represent navigation states and edges represent navigation steps between states. complies with

.

,
where there is a navigation step from node to node if and only if one of the following statements is true:

, and ;

, and ;

and .
In , the third component of an edge is always one of the operations or . It indicates how the step was processed.
The proper size of is where and .
A particular navigation history corresponds to a walk in . An example of such a walk is given below
Example 6 (Walk on graph)
In practice, a particular set of labels is used to create an entry view from which all the above-mentioned operations can then be performed by users.
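The navigation history described in this section can be pictured as a list of labelled steps, where each step records the state it starts from, the state it reaches and the operation that produced it. The sketch below is illustrative only: the class name `NavigationHistory`, the operation names as strings and the encoding of a state as a (view label, selected vertices) pair are assumptions, not the paper's definitions.

```python
from dataclasses import dataclass, field

# The three operations of Section 4, named here as plain strings (assumption).
OPERATIONS = {"selection", "expansion", "navigation"}


@dataclass
class NavigationHistory:
    current: tuple                              # current navigation state
    steps: list = field(default_factory=list)   # (from_state, to_state, operation)

    def apply(self, operation, new_state):
        """Record a step; its third component is the operation that was used."""
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation!r}")
        self.steps.append((self.current, new_state, operation))
        self.current = new_state

    def walk(self):
        """Return the sequence of visited states, starting at the entry view."""
        if not self.steps:
            return [self.current]
        return [self.steps[0][0]] + [target for _, target, _ in self.steps]


# Hypothetical session: enter on a technology view, select a vertex, navigate.
history = NavigationHistory(current=("Technology", frozenset()))
history.apply("selection", ("Technology", frozenset({"t1"})))
history.apply("navigation", ("Organisation", frozenset({"t1"})))
```

Calling `history.walk()` returns the three visited states in order, which corresponds to one particular walk in the navigation graph.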
5 Use case
In the framework of AIDA [6], an FP7 project on Advanced European Infrastructures for Detectors at Accelerators, researchers needed to identify key players from academia and industry for technologies considered strategic for the particle physics programme. To this end, the Collaboration Spotting project was launched in 2012 to enable users to search for technologies in the titles and abstracts of publications and patents, and to view the organisation, journal category, keyword, city and country landscapes for each of these technologies individually. Individual technology searches are represented as vertices in a view named Technogram, which serves as the user entry view and in which edges represent publications and/or patents common to searches.
5.1 Data
Two different sources are used for searching: the metadata records of publications from the Web of Science™ Core Collection [7], developed by Clarivate Analytics (formerly Thomson Reuters), and the metadata records of patents from PATSTAT, developed by the European Patent Office [12]. Although the two sources have a number of labels in common, such as organisation, city and country, others such as journal category and keyword only belong to publications. The subset of data from the two sources corresponding to the labels of interest for users was used to construct graph and its schema .
5.2 Storing data in a graph database (Neo4j)
Graph is stored in a Neo4j graph database [30], in which individual metadata records are stored as subgraphs of labelled vertices using Published item, Organisation, Journal Category, Author Keyword, City, Region and Country as labels. Figure 4 represents the reachability graph (graph schema) of this network. Besides these labels, additional labels have been introduced to support user authentication and authorisation (User) and searches (Graph and Technology). Searches use full text indices of the Apache Lucene project [29] that have been integrated into the Neo4j database as legacy indices [30].
5.2.1 Statistics of our graph data
Searches can be performed on publication and patent metadata records from the 2000–2014 period. The resulting data network contains roughly 46 million vertices and 157 million edges. Its breakdown is given in Table 1 and Table 2.
Table 1: Number of nodes per node type.
Type of nodes        Number of nodes
Patents              15.000.442
Publications         20.087.904
Organisations        2.918.060
Author Keywords      8.193.604
Subject Categories   230
Cities               7.741
Regions              946
Countries            128
Total                46.209.055

Table 2: Number of edges per node type, for patents and publications.
Type of nodes        Patents       Publications   Total
Organisations        12.440.903    36.672.677     49.113.580
Author Keywords      -             48.941.098     48.941.098
Subject Categories   -             32.566.806     32.566.806
Cities               3.193.709     8.826.222      12.019.931
Regions              265.421       2.504.441      2.769.862
Countries            3.156.449     8.020.648      11.177.097
Total                19.056.482    137.531.892    156.588.374
As can be seen, the number of region edges is smaller than the number of country edges due to the use of the level of Nomenclature of Territorial Units for Statistics [11] created by the European Commission.
5.3 Navigation
The entry point for this use case is individual users. Using the terminology introduced above, the initial user interaction set contains user IDs.
5.3.1 Limitations
In the current implementation, the size of and is restricted to a single label, Published Item, and the visualization system only supports undirected edges. This calls for the generation of only one graph pattern instead of two, which makes the system faster.
In Figure 5, a short series of pictures illustrates how the operations work. The user enters the system with a technology view (vertices are labelled with the Technology label and are connected to the other views via vertices labelled with the Published Item label).
6 Conclusion and Future Work
The current version of Collaboration Spotting running at CERN [8] implements the concepts using patent and publication metadata records. It is a new experimental service that aims to provide the High Energy Physics community (such as HEPTech [20]) with information on the main players in academia and industry active around key technologies, with a view to fostering more interdisciplinary and intersectoral R&D collaborations, and giving the procurement service the opportunity to reach a wider selection of high-tech companies for bidding purposes. Collaboration Spotting is generic in its concepts and implementation. It can support visual analytics of any kind of data, and its backend is implemented using the Neo4j graph database [30]. Conference papers, technical & business news, trademarks & designs and financial data are amongst the data targeted to enrich the information on technologies that one can obtain from publications and patents. The choice of data sources will depend on users’ priorities. The tool can be of use to other communities, in particular in dentistry [27], but also to policy makers and investors if data in the labelled graph is enriched with technical & business news and financial data. Collaboration Spotting also addresses other types of data, such as compatibility and dependency relationships in software and metadata [5, 35] of the LHCb experiment at CERN.
As an interactive graph query language, Collaboration Spotting is intended to provide a fully customisable visual analytics environment. In the current version data processing supports searches and contextual queries. In the future, labelled & directed relationships and attributes on nodes will be included in the labelled property graph representation of the data network and the processing will be extended to more complex operations directly on the graph resulting from searches and queries with a view to enhancing the visual perception of users.
Acknowledgements.
References
 [1] B. Bach, E. Pietriga, and J.-D. Fekete. Visualizing dynamic networks with matrix cubes. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’14, pp. 877–886. ACM, New York, NY, USA, 2014. doi: 10.1145/2556288.2557010
 [2] P. Barceló Baeza. Querying graph databases. In Proceedings of the 32nd Symposium on Principles of Database Systems, PODS ’13, pp. 175–188. ACM, New York, NY, USA, 2013. doi: 10.1145/2463664.2465216
 [3] M. Bastian, S. Heymann, M. Jacomy, et al. Gephi: an open source software for exploring and manipulating networks. ICWSM, 8:361–362, 2009.
 [4] A. Bezerianos, F. Chevalier, P. Dragicevic, N. Elmqvist, and J.-D. Fekete. GraphDice: A system for exploring multivariate social networks. In Proceedings of the 12th Eurographics / IEEE-VGTC Conference on Visualization, EuroVis’10, pp. 863–872. The Eurographics Association and John Wiley & Sons, Ltd., Chichester, UK, 2010. doi: 10.1111/j.1467-8659.2009.01687.x
 [5] M. Cattaneo, M. Clemencic, and I. Shapoval. LHCb software and Conditions Database cross-compatibility tracking system: A graph-theory approach. In Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2012 IEEE, pp. 990–996. IEEE, 2012.
 [6] CERN  AIDA team. Advanced European Infrastructures for Detectors at Accelerators, December 2017.
 [7] Clarivate Analytics (in the past, Thomson Reuters). Web of Science™, December 2017.
 [8] Collspotting Developer Team. Collspotting, December 2017.
 [9] T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Softw., 38(1):1:1–1:25, Dec. 2011.
 [10] N. Elmqvist, P. Dragicevic, and J.-D. Fekete. Rolling the dice: Multidimensional visual exploration using scatterplot matrix navigation. IEEE Transactions on Visualization and Computer Graphics, 14(6):1141–1148, Nov 2008. doi: 10.1109/TVCG.2008.153
 [11] European Commission. NUTS - Nomenclature Of Territorial Units For Statistics, December 2017.
 [12] European Patent Office. PATSTAT - Worldwide Patent Statistical Database, December 2017.
 [13] B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. AAAI FS, 6:45–53, 2006.
 [14] M. R. Garey, R. L. Graham, and D. S. Johnson. The complexity of computing Steiner minimal trees. SIAM Journal on Applied Mathematics, 32(4):835–859, 1977.
 [15] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1(1):29–53, 1997.
 [16] R. H. Güting. GraphDB: Modeling and querying graphs in databases. In VLDB, vol. 94, pp. 12–15. Citeseer, 1994.
 [17] M. Gyssens, J. Paredaens, J. V. D. Bussche, and D. V. Gucht. A graph-oriented object database model, 1990.
 [18] S. Harris, A. Seaborne, and E. Prud’hommeaux. SPARQL 1.1 query language. W3C Recommendation, 21, 2013.
 [19] J. Heer and A. Perer. Orion: A system for modeling, transformation and visualization of multidimensional heterogeneous networks. Information Visualization, 13(2):111–133, 2014.
 [20] HEPTech Team. HEPTech - website, December 2017.
 [21] J. Hidders. Typing graph-manipulation operations. In Database Theory - ICDT 2003, pp. 394–409. Springer, 2003.
 [22] J. Hidders and J. Paredaens. GOAL, A Graph-based Object and Association Language.
 [23] F. K. Hwang, D. S. Richards, and P. Winter. The Steiner tree problem, vol. 53 of Annals of Discrete Mathematics. Elsevier, 1992.
 [24] J. Kehrer and H. Hauser. Visualization and visual analysis of multifaceted scientific data: A survey. IEEE Transactions on Visualization and Computer Graphics, 19(3):495–513, March 2013. doi: 10.1109/TVCG.2012.110
 [25] J. B. Kollat, P. M. Reed, and R. M. Maxwell. Many-objective groundwater monitoring network design using bias-aware ensemble Kalman filtering, evolutionary optimization, and visual analytics. Water Resources Research, 47(2), 2011. W02529.
 [26] H. S. Kunii. DBMS with Graph Data Model for Knowledge Handling. In Proceedings of the 1987 Fall Joint Computer Conference on Exploring Technology: Today and Tomorrow, ACM ’87, pp. 138–142. IEEE Computer Society Press, Los Alamitos, CA, USA, 1987.
 [27] E. Leonardi, A. Agocs, S. Fragkiskos, N. Kasfikis, J. Le Goff, M. Cristalli, V. Luzzi, and A. Polimeni. Collaboration spotting for dental science. Minerva Stomatologica, 63(9):295–306, sep 2014.
 [28] Z. Liu, S. B. Navathe, and J. T. Stasko. Network-based visual analysis of tabular data. In 2011 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 41–50, Oct 2011. doi: 10.1109/VAST.2011.6102440
 [29] Lucene™/Solr™ Committers. Apache Lucene™ Documentation, December 2017.
 [30] Neo4j. The Neo4j Manual v2.3.3, December 2017.
 [31] J. O’Madadhain, D. Fisher, S. White, and Y. Boey. The JUNG (Java Universal Network/Graph) Framework. University of California, Irvine, California, 2003.
 [32] J. Paredaens, P. Peelman, and L. Tanca. G-Log: A graph-based query language. Knowledge and Data Engineering, IEEE Transactions on, 7(3):436–453, 1995.
 [33] A. Scharl, A. Hubmann-Haidvogel, A. Weichselbraun, H. P. Lang, and M. Sabou. Media watch on climate change – visual analytics for aggregating and managing environmental knowledge from online sources. In 2013 46th Hawaii International Conference on System Sciences, pp. 955–964, Jan 2013. doi: 10.1109/HICSS.2013.398
 [34] R. Shadoan and C. Weaver. Visual analysis of higher-order conjunctive relationships in multidimensional data using a hypergraph query system. IEEE Transactions on Visualization and Computer Graphics, 19(12):2070–2079, Dec 2013. doi: 10.1109/TVCG.2013.220
 [35] I. Shapoval, M. Clemencic, and M. Cattaneo. ARIADNE: a Tracking System for Relationships in LHCb Metadata. In Journal of Physics: Conference Series, vol. 513, p. 042039. IOP Publishing, 2014.
 [36] Z. Shen, K.-L. Ma, and T. Eliassi-Rad. Visual analysis of large heterogeneous social networks by semantic and structural abstraction. IEEE Transactions on Visualization and Computer Graphics, 12(6):1427–1439, 2006.
 [37] C. Stolte, D. Tang, and P. Hanrahan. Polaris: a system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics, 8(1):52–65, Jan 2002. doi: 10.1109/2945.981851
 [38] T. von Landesberger, A. Kuijper, T. Schreck, J. Kohlhammer, J. van Wijk, J.-D. Fekete, and D. Fellner. Visual analysis of large graphs: State-of-the-art and future research challenges. Computer Graphics Forum, 30(6):1719–1749, 2011.
 [39] M. Wattenberg. Visual exploration of multivariate graphs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, pp. 811–819. ACM, New York, NY, USA, 2006. doi: 10.1145/1124772.1124891
 [40] C. Weaver. Cross-filtered views for multidimensional visual analysis. IEEE Transactions on Visualization and Computer Graphics, 16(2):192–204, March 2010. doi: 10.1109/TVCG.2009.94
 [41] P. C. Wong, H.-W. Shen, C. R. Johnson, C. Chen, and R. B. Ross. The top 10 challenges in extreme-scale visual analytics. IEEE Computer Graphics and Applications, 32(4):63, 2012.
 [42] P. C. Wong and J. Thomas. Visual analytics. IEEE Computer Graphics and Applications, 24(5):20–21, Sept 2004. doi: 10.1109/MCG.2004.39
 [43] P. T. Wood. Query languages for graph databases. SIGMOD Rec., 41(1):50–60, Apr. 2012.
 [44] J. Yang, S. Zhang, and W. Jin. DELTA: Indexing and Querying Multi-labeled Graphs. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pp. 1765–1774. ACM, New York, NY, USA, 2011. doi: 10.1145/2063576.2063832