Visual Similarity Perception of Directed Acyclic Graphs: A Study on Influencing Factors

While visual comparison of directed acyclic graphs (DAGs) is commonly encountered in various disciplines (e.g., finance, biology), knowledge about humans' perception of graph similarity is currently quite limited. By graph similarity perception we mean how humans perceive commonalities and differences in graphs and herewith come to a similarity judgment. As a step toward filling this gap the study reported in this paper strives to identify factors which influence the similarity perception of DAGs. In particular, we conducted a card-sorting study employing a qualitative and quantitative analysis approach to identify 1) groups of DAGs that are perceived as similar by the participants and 2) the reasons behind their choice of groups. Our results suggest that similarity is mainly influenced by the number of levels, the number of nodes on a level, and the overall shape of the graph.




1 Introduction

The visual comparison of directed acyclic graphs (DAGs) is a task encountered in various disciplines, e.g., in finance, biology, natural language processing, or social network analysis. The task is strongly influenced by the human perception of similarity since comparison builds upon making similarity judgments. In spite of the numerous occurrences of this task and recent papers surveying visual graph comparison techniques and visualizations [4, 12], knowledge about the human perception of graph similarity – especially for DAGs – is quite limited.

Only a few investigations address the comparison of graphs. Gleicher et al. [12] identify the basic types of techniques for visual comparison (juxtaposition, superposition, and explicit encoding). Tominski et al. [43] explicitly deal with the comparison of large node-link diagrams in superposition and argue that interaction is essential for this process. Some interesting insights can be gained from the literature on dynamic graphs showing the evolution of node-link diagrams over time. The survey of Beck et al. [4] about visualizations of dynamic graphs provides an overview of visualization options transferable to general visual graph comparison since dynamic graph visualization has an inherent comparison component. Others discuss the extension of these visualizations and techniques with features like highlighting of commonalities and differences or the effectiveness of difference maps [1, 2, 3, 5, 16]. However, none of these papers deals with the issue of similarity perception within the context of graph comparison. There is also a large amount of research on graph readability, which is partially relevant since the DAGs need to be read in order to compare them. Examples include studies on edge crossings and mental map preservation (e.g., [23, 39, 35, 37]). These aspects are, however, not the focus of our attention. Research investigating the comparison of visualizations in general is also of interest: Pandey et al. [33] conducted an experiment to study the similarity perception of scatterplots, and their work inspired our methodology.

To the best of our knowledge, there is no research focusing on how humans perceive the similarity of DAGs. We are especially interested in the factors which influence the perception of similarity (possibly, number of nodes/edges, edge crossings, etc.). We deem the knowledge about the influencing factors important for the generation of future actionable guidelines for comparative visualizations. Towards this end, we conducted a study with small, unlabeled synthetic DAGs and used card sorting as our methodology. We chose these DAGs in order to keep the number of factors to be tested manageable. However, because of our systematic procedure, the study scope can easily be extended in the future. The DAGs are represented as node-link diagrams. We address two research questions: (1) Which groups do the participants form?, and (2) Which factors did the participants consider to judge the similarity?

Our results indicate that similarity is mainly influenced by the number of levels and the number of nodes on a level as well as the overall shape. We provide additional material, including our study material, our collected data, and our analysis results, on our website.

2 Related Work

While there exists an extensive body of research in perceptual psychology and pattern recognition on similarity judgments and dissimilarity measures (cf., e.g., [13, 34] for an overview), we will concentrate on work dealing with graphs and other types of plots.

Starting with graph visualization techniques and visual comparison techniques there exist several recent surveys of these areas (e.g., [4, 12, 15, 44, 25]). The basic techniques, that is, juxtaposition, superposition, and explicit encoding – following Gleicher et al.’s [12] classification – are sometimes enriched by emphasizing the correspondences between graphs [7, 16], e.g., by highlighting similar parts [3, 5, 16], or by emphasizing differences by collapsing the identical parts [1]. The enrichment, that is, emphasizing the commonalities or differences, usually relies on a similarity function respecting specific criteria. In this respect, Gao et al. [10] provide an overview of research done on graph edit distances, a mathematical way to measure the similarity between pairwise graphs. However, it is still unknown whether the criteria on which existing similarity functions are based correspond to the criteria used by humans when visually comparing two or more node-link diagrams. Tominski et al. [43] proposed interaction techniques which aid users in doing comparison tasks and which were inspired by the real-world behavior of people when comparing information printed on paper. Getting a better understanding of the perceived differences and commonalities is likely to result in better visualization and interaction techniques.

Moreover, the existing body of work dealing with perceptual and cognitive aspects focuses primarily on the readability of single graphs. Several factors, including graph aesthetics (edge crossings [23, 37, 38], layout [30, 8, 20, 29, 32], graph design [17, 40], and graph semantics or content knowledge [39, 32, 24]) have been identified to be important for graph readability. Huang et al. [18] – concerned with sociograms – note that good readability is not enough to effectively communicate network structures, emphasizing that the spatial arrangement of the nodes also influences viewers in perceiving the structure of social networks.

While perceptual aspects of single graphs have been thoroughly investigated, literature dealing with perceptual aspects when comparing node-link visualizations is considerably more scarce. Notable papers in this space are the work of Bach et al. [3] and Ghani et al. [11] who are both concerned with dynamic graphs (cf. Beck et al. [4] for an overview). Noteworthy is also the work of Archambault et al. [2] who evaluated the effectiveness of difference maps which show changes between time slices of dynamic graphs. While we are not necessarily concerned with dynamic graphs these works are nonetheless relevant in our context as dynamic graphs are often analyzed by using discrete time-slices. In our own previous work [27] we provided an overview of methodological challenges when dealing with the investigation of graph comparison and described a first preliminary study targeted towards identifying factors which influence the recognition of graph differences in very small star-shaped node-link diagrams. The work presented in this paper can be viewed as a continuation of these efforts.

Looking beyond the perception of node-link diagrams, literature is currently also quite limited when it comes to similarity perception of other types of plots, a sentiment shared by Pandey et al. [33] who investigated how human observers judge the similarity of scatterplots. Our quantitative analysis as presented in this paper is partly based on the methodology put forward by Pandey et al. [33]. Fuchs et al. [9] looked into how contours affect the recognition of data similarity in star glyphs. Likewise, Klippel et al. [22] investigated the similarity judgments of star glyphs using a methodology comparable to the one used by Pandey et al. [33] and us: Participants were shown various plots which they then had to group according to their perceived similarity.

3 Study Methodology

In this section, we present our study methodology. As noted above, Pandey et al.’s [33] work about the human similarity perception of large sets of scatterplots strongly inspired our basic methodology, since we share the research questions for different data types. Furthermore, Pandey et al. substantiate that the methodological principle of card sorting produces valuable results for this type of research questions. For advantages, drawbacks, and the suitability of card sorting for research questions like ours see Section 3.4.

3.1 Research Questions

Our superordinate research question (RQ) is: What factors influence human similarity perception of DAGs? We first have to identify the factors influencing the similarity judgment. Once we know the influencing factors, we can, for instance, research the specific degree of influence of a single factor as well as the interplay between the factors. In order to analyze our superordinate RQ, we formulate two subordinate ones:

  • RQ1: Which groups do the participants form?

  • RQ2: Which factors did the participants consider to judge the similarity?

3.2 Dataset

Creating an appropriate study dataset is challenging due to the large number of possible variations [27]. Therefore, we were forced to limit the number of DAGs. Our objects of study were 69 small (6-9 nodes), unlabeled, synthetic DAGs visualized as node-link diagrams. In the following we will use the term DAG to also refer to its embedding. When creating the DAGs we considered known factors influencing graph readability (e.g., edge crossings) and characteristics of DAGs from real-world datasets (e.g., a node may be the child of more than one parent node). We chose synthetic and small DAGs in order to keep the number of factors to be tested manageable and to evaluate them systematically. Because of our study and data creation methodology, our results can easily be extended by further studies considering further factors. Especially since knowledge about human graph similarity perception is currently quite limited, we consider the manageability of the problem as crucial. The size of our DAGs is also realistic. They are comparable to cascades in finance and biology (e.g., [26, 28]) and directed acyclic word graphs [41].
We deem factors of graph readability as potentially important for visual graph comparison since in order to be able to visually compare DAGs it is necessary to read them. We consider properties of real DAGs as important for our studies since they influence the visual appearance of the DAGs. More importantly, considering properties of real DAGs strengthens the realism of our synthetic data and, consequently, the transferability of our results to real world use cases.

To create our dataset, we started with an initial graph, as depicted in Figure 1. The initial graph is symmetric since it is easier to break symmetry than to introduce it. We herewith cover symmetric and asymmetric DAGs on the basis of the initial graph. This is important since humans are prone to symmetry [45]. The initial graph is single-rooted since this is typical for various real-world DAG datasets, e.g., cascades. To test node and edge changes (addition of node(s) and edge(s)) we used a two-stage DAG creation process. First, we created the base graphs and their reflections by adding one, two, and three nodes. We ensured that the node(s) are added in the inner as well as the outer areas of the initial graph (cf. Figure 1 – Base graphs). Second, we created all possible DAGs resulting from adding one and two edges (cf. Figure 1 – Alternatives) using our custom-made GraphCreator, a tool to create all possible DAGs resulting from a specific change of a DAG, e.g., adding one edge to the base graphs and their reflections. Herewith, we ensure that we have the maximal possible variation from which we then sample our study dataset.
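The enumeration step performed by GraphCreator can be illustrated with a minimal sketch. The code below is not the GraphCreator tool itself; it assumes a DAG is given as a node list and an edge list, and it enumerates every single-edge addition that keeps the graph acyclic. The function names and the toy graph are hypothetical, and the sketch ignores layout constraints (e.g., whether an edge must point downwards in the drawing).

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph contains no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
    return visited == len(nodes)


def single_edge_alternatives(nodes, edges):
    """All edge lists obtained by adding exactly one new edge without creating a cycle."""
    existing = set(edges)
    alternatives = []
    for parent, child in permutations(nodes, 2):
        if (parent, child) in existing:
            continue
        candidate = existing | {(parent, child)}
        if is_acyclic(nodes, candidate):
            alternatives.append(sorted(candidate))
    return alternatives


# Hypothetical toy graph, not one of the study's base graphs.
toy_nodes = ["r", "a", "b", "c"]
toy_edges = [("r", "a"), ("r", "b"), ("a", "c")]
alternatives = single_edge_alternatives(toy_nodes, toy_edges)
```

Applying the function again to each alternative would give the two-edge variants, under the same assumptions.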
Down-sampling is necessary since the visual comparison of DAGs is a cognitively demanding task for the participants. For the down-sampling we considered the following factors: edge crossing, visual layout, more than one parent node having the same child node, and long connections, typically across more than one level. We considered edge crossing since it is a prominent factor in graph readability [23, 37], for which reason we assume that it also plays a role in visual graph comparison. Furthermore, we considered the visual layout, another known factor from graph readability. The visual layout of DAGs does not contain any analytically relevant information about the DAGs’ data structure and properties. However, it still has a significant impact on graph readability, which is why we deem it important to test its influence on visual graph comparison [8, 18]. We test this factor by horizontally reflecting the three base graphs (cf. Figure 1). The ability to test the impact of isomorphism is a beneficial byproduct of this decision. We chose a traditional hierarchical node-link diagram layout with the root placed on top since Burch et al. found that this layout type outperforms other types such as upward layouts. In order to avoid confounding effects caused by destroying the mental map, we did not optimize the layout after a DAG change, e.g., by resolving an avoidable edge crossing. By visually inspecting DAGs from real-world datasets we found two frequently occurring properties: more than one parent node has the same child node (e.g., cascades in finance and biology), and long connections, typically across more than one level (e.g., directed acyclic word graphs in natural language processing). Consequently, we considered them as factors for our final dataset. We did the down-sampling under the constraint of preserving the systematic variation (cf. Figure 1 – Final dataset) and by ensuring that changes take place at the inner as well as the outer areas of the respective base graph. Our dataset is available on our website.

Figure 1: Dataset creation: (I) Base graph creation by adding 1, 2, and 3 nodes to the initial graph, ensuring that the added node is placed at the inner as well as the outer areas of the initial graph, (II) creation of all possible alternatives by adding one and two edges to the base graphs, (III) down-sampling of the alternatives considering the factors described in Section 3.2.

3.3 Participants

We recruited 20 volunteers (13 male, 7 female, between 20 and 60 years). We had no prerequisite of having experience with DAGs. This way our results are not limited to experienced users. In our opinion, it is more likely that experienced users know which factors really bear relevant information for the comparison task whereas for inexperienced users misconceptions are more likely. We are convinced that if we want to understand the human similarity perception and as a consequence improve comparative visualizations, we need a varying range of expertise with DAGs. Our participants had a diverse educational level (vocational training, undergraduate, graduate, post-graduate) and came from various disciplines. Two of our participants had basic knowledge in information visualization and five had advanced knowledge. In spite of this, the participants’ experience with DAGs varied vastly.

3.4 Study Procedure

Every participant was welcomed and the experimenter handed over the study material and explained the task. Each session took approximately one hour.


We asked the participants to group 69 DAGs with respect to their perceived similarity; multiple occurrences of a single DAG in different groups were allowed. Furthermore, we asked them to tag each group with the factors they used to build it. Finally, participants had to judge the easiness of forming the respective group (“How difficult or easy was it for you to create this group?”) and their confidence in the group’s consistency (“How doubtful or confident are you about the consistency of the DAGs in the group, i.e., would you create the same group again if you did this task again?”). The questions regarding easiness and confidence were answered on a five-point Likert scale (“1 = very difficult/doubtful, 2 = difficult/doubtful, 3 = neutral, 4 = easy/confident, 5 = very easy/confident”). The formed groups provide the data needed to answer RQ1 while the participants’ group tags provide the data to answer RQ2. We adopted the task formulation from Pandey et al. [33] since it captured exactly what we wanted to ask our participants. Moreover, the formulation had already been pretested and used successfully in Pandey et al.’s study.

Card Sorting Methodology.

Card sorting is a well-known methodology in psychology and human-computer interaction for externalizing mental models humans have about the environment they live in. Wood and Wood [46] define card sorting as follows: “As the name implies, the method originally consisted of researchers writing labels representing concepts (either abstract or concrete) on cards, and then asking participants to sort (categorize) the cards into piles that were similar in some ways.” Humans group objects according to their perceived similarity into different categories. In this way, card sorting helps to uncover the structure of mental models. There are different methods to conduct card sorting. Researchers generally distinguish between open vs. closed sorting tasks and between paper-based and computer-supported card sorting [14]. In closed card sorting, participants have to sort the cards according to a given scheme; in open card sorting, the participants develop the scheme themselves. The procedures for card sorting tasks sometimes differ considerably. Sometimes, the cards that have been assigned to a category are placed in a pile [46], so that participants do not shuffle them around on a canvas. Especially in computerized card sorting, it is often not possible to see all cards from which to choose at the same time [6, 33], which forces the study participants to compare the cards in memory. We used open, paper-based card sorting since literature indicates that the paper-based approach yields more consistent results than the computerized one [14].

Study Setup and Materials.

We used an empty meeting room with good lighting for conducting the study. Each participant received the task sheet, the data sheet, sheets for building the groups, and sheets for tagging each group with the group-building factors as well as for judging the easiness and the confidence. The task sheet contained the task explained above. The data sheet consisted of the 69 randomly positioned DAGs. We decided to present our dataset on paper so that the participants could see all data items at the same time. The order of the data items was kept the same for all participants to exclude order as a possible confounding variable. The participants had to write down the IDs of the DAGs belonging to a group and give each group a unique identifier. Furthermore, they had to write down the tags as well as their easiness and confidence judgments together with the unique group identifier. The material is available on our website.

4 Analysis and Results

To analyze the collected data with respect to our research questions we used a mixed approach involving a quantitative (RQ1, RQ2) and a qualitative (RQ2) analysis. The qualitative analysis provided the factors with which the participants tagged their groups. The quantitative analysis yielded the perceptual consensus over all participants and also served as a check of the participants’ self-reported factors extracted in the qualitative analysis.

4.1 Quantitative Analysis (RQ1)

We calculated the perceptual consensus over all participants with complete data, i.e., participants who assigned each DAG to at least one group. The perceptual consensus of the perceived similarity served as a basis to find out (1) whether the similarity perception of humans is consistent across individual people and (2) whether it is objectifiable with graph theoretical or visual properties.

To gain insights into the consistency and objectifiability of the similarity perception and to mitigate the potential bias – saying one thing and doing another – of self-reporting questions such as tagging we analyzed the perceived similarity consensus regarding which of the known graph theoretical and visual properties explain the clusters best. The mitigation potential resides in the perceived similarity consensus encapsulating what the participants really did.

The consensus easiness and confidence scores for each cluster provide information about the similarity consensus’ perceived solidness and robustness. A high average easiness score means that the grouping is solid, thus, due to an easy assignment, it is less likely that a participant assigned a DAG randomly. The average confidence score reflects the participants’ opinion whether they would form the same group again. A high score means that the grouping is robust since it is highly probable that it would look similar if the task were repeated.


To build the perceptual consensus for the participants’ similarity judgments, we calculated a pairwise perceptual distance between each pair of DAGs, based on the number of occurrences of each DAG pair in the same group and on the number of individual occurrences (for details cf. [33]). The perceptual distance calculation resulted in a perceptual distance matrix (PDM). Like Pandey et al. [33] we did a hierarchical clustering, in our case with average linkage. We estimated the appropriate number of clusters with the mean/median of the number of groups and with the gap statistic. The mean/median indicate the average number of participant-built groups and thus served as a reasonable estimator for the number of clusters. The gap statistic respects, like the individual groupings and similarity per se, the cluster similarity, which made it a further reasonable estimator. The hierarchical clustering result is the consensus grouping of all DAGs based on the similarity consensus contained in the PDM.
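To make the construction of the PDM more concrete, the following sketch derives a pairwise distance from co-occurrence counts. The exact formula is given in [33] and is not reproduced here; as an explicit assumption, the sketch uses a Jaccard-style distance, i.e., one minus the number of groups containing both DAGs divided by the number of groups containing at least one of them. The function name and the toy groupings are hypothetical.

```python
from collections import Counter
from itertools import combinations


def perceptual_distance_matrix(groupings):
    """groupings: one entry per participant, each a list of groups (sets of DAG ids)."""
    occurrences = Counter()      # number of groups in which a DAG occurs
    co_occurrences = Counter()   # number of groups in which a DAG pair occurs together
    for participant_groups in groupings:
        for group in participant_groups:
            members = sorted(set(group))
            occurrences.update(members)
            co_occurrences.update(combinations(members, 2))

    items = sorted(occurrences)
    pdm = {a: {b: 0.0 for b in items} for a in items}
    for a, b in combinations(items, 2):
        together = co_occurrences[(a, b)]
        # Assumed Jaccard-style normalization; see [33] for the original definition.
        either = occurrences[a] + occurrences[b] - together
        pdm[a][b] = pdm[b][a] = 1.0 - together / either
    return pdm


# Hypothetical toy data: two participants, three DAGs.
toy_groupings = [
    [{1, 2}, {3}],   # participant 1 builds two groups
    [{1, 2, 3}],     # participant 2 builds one group
]
pdm = perceptual_distance_matrix(toy_groupings)
```

In this toy example DAGs 1 and 2 are always grouped together and thus receive distance 0, while DAG 3 shares only one of its groups with each of them.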

For the clusters’ property analysis we determined various properties for each graph. Based on this we determined the dominating properties of the clusters as well as the cluster separating properties. Examples of the employed properties are: depth, visual symmetry, visual leaning, edge crossing (number and existence), edge length, number of nodes on a specific level, and the existence and the number of nodes having more than one parent node.
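Some of these properties can be derived directly from the edge list. The sketch below computes depth, the number of nodes per level, and the number of nodes with more than one parent. It assumes that a node's level is one plus the length of the longest path from the root; the levels drawn in the study's layouts may have been assigned differently. The function name and the toy graph are hypothetical.

```python
from collections import Counter, defaultdict


def dag_properties(edges, root):
    """Return depth, nodes per level, and the number of multi-parent nodes."""
    children = defaultdict(list)
    parents = defaultdict(set)
    for parent, child in edges:
        children[parent].append(child)
        parents[child].add(parent)

    # Assumption: level = 1 + longest path length from the root.
    level = {root: 1}
    stack = [root]
    while stack:  # terminates because the graph is acyclic
        node = stack.pop()
        for child in children[node]:
            if level.get(child, 0) < level[node] + 1:
                level[child] = level[node] + 1
                stack.append(child)

    return {
        "depth": max(level.values()),
        "nodes_per_level": dict(Counter(level.values())),
        "multi_parent_nodes": sum(1 for p in parents.values() if len(p) > 1),
    }


# Hypothetical toy DAG: one root, three nodes on level 2, two on level 3.
toy_edges = [("r", "a"), ("r", "b"), ("r", "c"), ("a", "d"), ("b", "d"), ("c", "e")]
props = dag_properties(toy_edges, "r")
```

Visual properties such as leaning or edge crossings additionally require node coordinates and are not covered by this sketch.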

For the consensus of the easiness and confidence score we calculated an easiness and confidence value for each plot on the basis of the assumption that each plot inherits the easiness and confidence score of the participant-built groups it belongs to. Then we calculated an average easiness and confidence score for the hierarchical clustering result. For a detailed explanation please refer to [33].
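The inheritance idea can be sketched as follows. As an assumption (the detailed procedure is described in [33]), each DAG's value is the plain mean of the scores of all participant-built groups containing it, and a cluster's score is the plain mean over its DAGs. All names and numbers below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def item_scores(rated_groups):
    """rated_groups: iterable of (members, score) pairs from all participants."""
    inherited = defaultdict(list)
    for members, score in rated_groups:
        for item in set(members):
            inherited[item].append(score)  # the DAG inherits the group's score
    return {item: mean(scores) for item, scores in inherited.items()}


def cluster_score(cluster, scores):
    """Average the inherited per-DAG values over one consensus cluster."""
    return mean(scores[item] for item in cluster)


# Hypothetical easiness ratings on the five-point scale.
toy_rated_groups = [({1, 2}, 5), ({2, 3}, 3), ({1, 2, 3}, 4)]
easiness = item_scores(toy_rated_groups)
c_example = cluster_score({1, 2}, easiness)
```

The same two functions can be applied to the confidence ratings.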


The gap statistic indicated that the data supports eight clusters. Both the mean and the median of the number of built groups supported the indicated eight clusters. As all three estimates were similar, we decided to cut the tree into eight clusters. Figure 2 shows the resulting dendrogram and the resulting clusters, marked using colored boxes. Excerpts of the hierarchical clusters are shown in Figure 3. The entire clusters can be found on our website. The average easiness and confidence scores of the hierarchical clusters lie between 3.6 and 4.4 (cf. Table 1). This means that the participants on average found their groups easy to build and were confident that the groups would look similar if they repeated the task. Consequently, the consensus grouping can be considered solid and robust.
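For illustration, average-linkage clustering with a fixed number of clusters can be sketched in a few lines. This naive implementation repeatedly merges the two clusters with the smallest mean pairwise distance until k clusters remain, which is equivalent to cutting the dendrogram at k clusters; it is not the software used in the study, and the function name and toy distances are hypothetical.

```python
def average_linkage_clusters(distance, k):
    """distance: dict of dicts with pairwise distances; returns k clusters."""
    clusters = [[item] for item in sorted(distance)]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(distance[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical toy distances: A/B and C/D are close, all other pairs are far apart.
labels = ["A", "B", "C", "D"]
toy_distance = {a: {b: (0.0 if a == b else 0.9) for b in labels} for a in labels}
toy_distance["A"]["B"] = toy_distance["B"]["A"] = 0.1
toy_distance["C"]["D"] = toy_distance["D"]["C"] = 0.2
consensus = average_linkage_clusters(toy_distance, k=2)
```

For larger inputs an established library routine would be preferable; the quadratic search above is only meant to show the merging rule.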

Figure 2: Dendrogram resulting from hierarchical clustering with average linkage. The resulting eight clusters (C1–C8) are marked using colored boxes.
Figure 3: Excerpts of the hierarchical clusters (cf. our website for the complete clusters).

The properties which distinguish the clusters best are the depth of the DAGs, the number of nodes on a specific level of the DAGs, and the visual leaning of the DAGs. Table 1 summarizes the properties of the clusters. Clusters C1 and C2 are identical in depth and number of nodes on each of their four levels. However, they are separated by the leaning. While the DAGs of C1 are left-skewed, those of C2 are right-skewed. The leaning separating the clusters C1 and C2 suggests that it was not the reflection of the respective base graph (cf. Figure 1) itself that was apparent to the participants but rather a property which changed with it, namely the leaning (cf. Section 3.2).
Clusters C3, C4, and C5 have identical depth (3) as well as three nodes on the second level. The number of nodes on the third level separates these clusters. The depth separates the clusters C3, C4, C5 from C1, C2. Cluster C5 clearly shows that neither the reflection of the respective base graph (cf. Figure 1) itself nor a changed property mattered. It seems that the pure number of nodes dominates significantly over, e.g., node position (2 left, 1 right vs. reflected: 1 left, 2 right).
Clusters C6, C7, and C8 have identical depth (3) and four nodes on the second level. The number of nodes on the third level separates them. The number of nodes on the second level separates C6, C7, C8 from C3, C4, C5. C6, C7, C8 and C1, C2 are separated by depth. C7 shows that here, too, neither the reflection of the respective base graph itself (cf. Figure 1) nor a changed property, e.g., node position, mattered.
Interestingly, edges and edge crossings – important factors of graph theory and graph aesthetics – seem not to matter to the participants. The excerpts of C3 and C5 in Figure 3 clearly show: The edges had no influence on the similarity judgment of the participants. Otherwise DAGs with such different topology would not have been grouped together. The excerpt of C7 shows that the participants also did not really care about edge crossings. To conclude, we consider the consistency of the hierarchical clusters as high regarding graph theoretical and visual DAG properties. They are also well objectifiable with these properties.

Cluster  Easiness  Confidence  Depth  Nodes on level 2  Nodes on level 3  Nodes on level 4  Leaning
C1       4.3       4.2         4      4                 3                 1                 left
C2       4.4       4.3         4      4                 3                 1                 right
C3       4.1       4.0         3      3                 4                 -                 -
C4       4.1       4.1         3      3                 2                 -                 -
C5       3.6       3.8         3      3                 3                 -                 -
C6       3.7       3.7         3      4                 4                 -                 -
C7       3.6       3.8         3      4                 3                 -                 -
C8       3.8       3.9         3      4                 2                 -                 -
Table 1: Properties of the DAGs in the clusters C1-C8 along with average easiness and confidence values for each cluster.

4.2 Qualitative Analysis (RQ2)

We performed a thematic analysis of the participants’ tags to reveal the factors they considered. We also analyzed the factors’ importance based on the number of mentions of a specific factor. For this analysis we used the data of all 20 participants since it does not depend on whether a DAG was grouped or not.


First, we transcribed the participants’ tags by noting each tag together with how the participant used it, e.g., in a hierarchical manner. Additionally, we collected the following data for the tags (henceforth called factors) of each participant: factor type (visual, graph theoretical), combined vs. single factors (e.g., number of levels vs. number of levels and number of nodes), number of considered factors, and number of values per factor (e.g., number of edge crossings = 1, 2, and 3, i.e., number of values = 3). We deemed the factor type important since the graph theoretical properties are those which contain the information relevant for comparison insights. However, we already know from graph readability research that visual factors (e.g., edge crossing) have a significant influence. Knowing those for visual comparison is beneficial for controlling their influence. We collected the other data as meta-information in order to learn more about the participants’ usage of the factors.


The individual transcriptions can be found on our website. Figure 4 shows the factors considered by the participants together with how often a factor was named. Multiple mentions of one and the same factor by one participant were not considered. In total, our participants used 27 distinct factors (cf. Figure 4). Ten of these can be considered to be graph theoretical factors (yellow) and 15 to be visual factors (blue). Two of the used factors are neither graph theoretical nor visual (gray). Just five out of 20 participants used a combined factor and only two of these five used more than one combined factor. The most frequently combined factor was number of nodes on a specific level (five times); e.g., number of nodes on the second level = 3 and number of nodes on the third level = 4. Eight of the 27 factors were used by at least of the participants (cf. Figure 4, left). We will focus on these eight; for the other 19 factors please refer to Figure 4, right.
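The counting rule described above, where repeated mentions by the same participant count only once, can be sketched as follows. The participant identifiers and tags are hypothetical and do not reproduce the study data.

```python
from collections import Counter


def factor_mentions(tags_by_participant):
    """Count, per factor, how many participants mentioned it at least once."""
    counts = Counter()
    for tags in tags_by_participant.values():
        counts.update(set(tags))  # deduplicate within one participant
    return counts


# Hypothetical transcribed tags.
toy_tags = {
    "P1": ["number of levels", "shape", "number of levels"],
    "P2": ["number of levels", "edge crossing"],
    "P3": ["shape"],
}
mentions = factor_mentions(toy_tags)
```

Sorting the resulting counter by count gives the kind of frequency ranking shown in Figure 4.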

The most important factors according to usage frequency were: number of levels (i.e., depth of the DAG), number of nodes on a specific level, shape, arm/branch (DAG sub-shape), one parent node, edge crossing, child node(s) with parent node, and visual leaning (left or right). The factor shape is basically the convex hull of the DAG. Regarding shape it is interesting to note that we could observe a coherence of shape with the number of nodes on a specific level. Participants, for instance, denoted some DAGs as “narrow/small pyramid” and others as “wide/large pyramid”. However, it is clear that this coherence is also influenced by the DAGs’ layout. Arm/branch refers to the shape of a DAG’s sub-graph. Edge crossing deals with crossings of the visualized edges. The participants considered different types of edge crossings, e.g., presence of edge crossing or (un)resolvable edge crossings. The factors one parent node and child node(s) with parent node relate to the number of nodes which are parent to another node. Again, we could observe that participants used different types of these factors.

Interestingly, the extracted factors also substantiate that edges and edge crossings did not really matter (cf. Section 4.1). The factor edge crossing is one of the least used of the most important factors. Other edge-related factors were used just once (cf. Figure 4, right). Various individual groupings also support this, e.g., a group tagged with the factor one parent left.

Figure 4: Factors used by the participants (yellow: graph theoretical, blue: visual, gray: no type). Multiple mentions of the same factor by the same participant were excluded.

5 Discussion and Conclusion

We conducted a card-sorting study to identify the factors influencing the similarity perception of DAGs. The study addresses the current knowledge gap on this topic, which persists even though visual comparison tasks are common in various disciplines.

Both the results of our quantitative analysis and those of our qualitative analysis point to similar factors which seem to dominantly influence similarity perception of DAGs, namely the number of levels (depth), the number of nodes on specific levels, as well as shape-related aspects such as the visual leaning of a DAG. Herewith, we can be certain that the self-reported factors of the participants were not biased. The strong influence of shape is remarkable because in our case the spatial arrangement did not convey any additional information. This resulted in cases where structurally identical DAGs were assigned to different groups because one was left-skewed and the other right-skewed. Being skewed to the left or right mainly played a role for the 4-level DAGs (cf. C1 and C2), most likely because it had a stronger influence on the overall shape than in the 3-level cases. Nevertheless, this observation supports previous results which found evidence that the perception of graphs is sensitive to their spatial layout (cf. e.g., [18, 30]). Surprisingly, and contrary to our expectations, edge crossings, an important factor concerning the readability of graphs [36], did not seem to have a strong impact on perceived DAG similarity. This is, for example, evident in the clusters C5 and C6, where no distinction between DAGs with and without edge crossings has been made (cf. Figure 3). The participants’ statements offer soft evidence that this was not because they subconsciously resolved the edge crossings and therefore left edge factors unmentioned; rather, the edges were simply not in the participants’ focus.
The fixed order of our data items did not lead to arbitrary groupings. The individual groupings and the consensus grouping are well objectifiable with DAG properties. We analyzed the individual groupings by checking the objectifiability of grouped consecutive data items (cf. website for details). The quantitative analysis shows the objectifiability of the consensus grouping (cf. Section 4.1).

In future work, it will be necessary to investigate how the identified factors and their importance vary across different graph sizes. It is, for instance, reasonable to assume that, for larger graphs, factors concerning details of a graph (e.g., number of parent nodes, number of nodes on a specific layer) decrease in importance while factors concerning the overall appearance (e.g., shape) increase. Regardless of that, our study provides first results which can contribute to the design of comparative visualizations. Moreover, a better understanding of the factors which drive humans’ similarity judgment may also be used towards developing perception-based graph similarity measures. Current notions of graph similarity such as graph isomorphism and edit distance (cf. [10]), descriptive statistics of graph structure measures such as degree distribution or diameter, or iterative approaches which assess the similarity of the neighborhood of nodes (e.g., [19, 21, 31]) rely purely on graph theoretical properties.
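To illustrate what relying purely on graph theoretical properties means, the following Python sketch compares two graphs by their degree distributions, one of the descriptive statistics mentioned above. It is a simplified illustration and not a measure used or proposed in this study; the function names and the L1 histogram distance are assumptions chosen for brevity.

```python
from collections import Counter

def degree_histogram(edges):
    """Map each degree value to the number of nodes having that degree.

    Degrees count incoming plus outgoing edges. Isolated nodes do not
    appear in `edges` and are therefore ignored in this sketch.
    """
    degree = Counter()
    for parent, child in edges:
        degree[parent] += 1
        degree[child] += 1
    return Counter(degree.values())

def histogram_distance(edges_a, edges_b):
    """L1 distance between the degree histograms of two graphs."""
    hist_a = degree_histogram(edges_a)
    hist_b = degree_histogram(edges_b)
    return sum(abs(hist_a[d] - hist_b[d]) for d in set(hist_a) | set(hist_b))

# Hypothetical DAGs: two mirror images and a star.
left_leaning = [("a", "b"), ("a", "c"), ("b", "d")]
right_leaning = [("a", "b"), ("a", "c"), ("c", "d")]
star = [("a", "b"), ("a", "c"), ("a", "d")]

print(histogram_distance(left_leaning, right_leaning))  # 0
print(histogram_distance(left_leaning, star))           # 4
```

The two mirror-image DAGs receive a distance of 0 because such a measure has no notion of layout, whereas our participants sometimes separated structurally identical DAGs that leaned in different directions.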

Besides understanding the individual factors we also deem it important to understand the strategies that participants employ while judging the similarity of data items. This will help to offer useful interactions with comparative visualizations. While our study was not specifically designed for this we could observe circumstantial evidence, as a byproduct from our transcription, that the participants used three distinctive strategies: Eleven participants chose a factor, grouped the entire dataset according to it, and then grouped the resulting groups into further sub-groups (divide-and-conquer). There were also seven participants who always respected the entire dataset considering the factors one after the other. Some of the participants chose all their factors in advance. Still others chose their factors in an ad hoc fashion; that means, after having grouped the dataset according to a factor they thought about the next. Finally, there were two participants who did their grouping by considering just one single factor. More thorough investigations will be necessary to verify these observations.

To conclude, we consider the similarity perception of DAGs in visual comparison across people as consistent and well objectifiable with graph theoretical or visual properties. We find this substantiated by our quantitative and qualitative analysis. An in-depth analysis is subject to future research.

6 Acknowledgments

This work was financially supported by the Deutsche Forschungsgemeinschaft e.V. (DFG, LA 3001/2-1) and the Austrian Science Fund (FWF, I 2703-N31).


References

  • [1] Archambault, D.: Structural differences between two graphs through hierarchies. In: Proc. GI. pp. 87–94. Canadian Information Processing Society (2009)
  • [2] Archambault, D., Purchase, H.C., Pinaud, B.: Difference map readability for dynamic graphs. In: Brandes, U., Cornelsen, S. (eds.) Graph Drawing: 18th International Symposium, GD 2010, pp. 50–61 (2011)
  • [3] Bach, B., Pietriga, E., Fekete, J.D.: Graphdiaries: Animated transitions and temporal navigation for dynamic networks. IEEE Trans. Vis. Comput. Graphics 20(5), 740–754 (2014)
  • [4] Beck, F., Burch, M., Diehl, S., Weiskopf, D.: The state of the art in visualizing dynamic graphs. In: Proc. EuroVis - STARs (2014)
  • [5] Bremm, S., Von Landesberger, T., Heß, M., Schreck, T., Weil, P., Hamacher, K.: Interactive visual comparison of multiple trees. In: Proc. IEEE VAST. pp. 31–40 (2011)
  • [6] Chaparro, B.S., Hinkle, V.D., Riley, S.K.: The usability of computerized card sorting: A comparison of three applications by researchers and end users. J. Usability Stud. 4(1), 31–48 (2008)
  • [7] Collins, C.M., Carpendale, S.: VisLink: Revealing relationships amongst visualizations. IEEE Trans. Vis. Comput. Graphics 13(6), 1192–1199 (2007)
  • [8] Dwyer, T., Lee, B., Fisher, D., Quinn, K.I., Isenberg, P., Robertson, G., North, C.: A comparison of user-generated and automatic graph layouts. IEEE Trans. Vis. Comput. Graphics 15(6), 961–968 (2009)
  • [9] Fuchs, J., Isenberg, P., Bezerianos, A., Fischer, F., Bertini, E.: The influence of contour on similarity perception of star glyphs. IEEE Trans. Vis. Comput. Graphics 20(12), 2251–2260 (2014)
  • [10] Gao, X., Xiao, B., Tao, D., Li, X.: A survey of graph edit distance. Pattern Anal. Appl. 13(1), 113–129 (2010)
  • [11] Ghani, S., Elmqvist, N., Yi, J.S.: Perception of animated node-link diagrams for dynamic graphs. Comput. Graph. Forum 31(3pt3), 1205–1214 (2012)
  • [12] Gleicher, M., Albers, D., Walker, R., Jusufi, I., Hansen, C.D., Roberts, J.C.: Visual comparison for information visualization. Inf. Vis. 10(4), 289–309 (2011)
  • [13] Goldstone, R.L., Son, J.Y.: Similarity. In: Holyoak, K.J., Morrison, R.G. (eds.) The Cambridge Handbook of Thinking and Reasoning (2005)
  • [14] Greve, G.: Different or alike? Comparing computer-based and paper-based card sorting. International J. of Strategic Innovative Marketing 1(1), 27–36 (2014)
  • [15] Hadlak, S., Schumann, H., Schulz, H.J.: A survey of multi-faceted graph visualization. In: Proc. EuroVis - STARs (2015)
  • [16] Holten, D., Van Wijk, J.J.: Visual comparison of hierarchically organized data. Comput. Graph. Forum 27(3), 759–766 (2008)
  • [17] Holten, D., van Wijk, J.J.: A user study on visualizing directed edges in graphs. In: Proc. CHI. pp. 2299–2308 (2009)
  • [18] Huang, W., Hong, S.H., Eades, P.: Layout effects on sociogram perception. In: Healy, P., Nikolov, N.S. (eds.) Graph Drawing: 13th International Symposium, GD 2005, pp. 262–273 (2006)
  • [19] Jeh, G., Widom, J.: SimRank: A measure of structural-context similarity. In: Proc. KDD. pp. 538–543 (2002)
  • [20] Kieffer, S., Dwyer, T., Marriott, K., Wybrow, M.: HOLA: Human-like orthogonal network layout. IEEE Trans. Vis. Comput. Graphics 22(1), 349–358 (2016)
  • [21] Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
  • [22] Klippel, A., Hardisty, F., Weaver, C.: Star plots: How shape characteristics influence classification tasks. Cartogr. Geogr. Inf. Sci. 36(2), 149–163 (2009)
  • [23] Kobourov, S.G., Pupyrev, S., Saket, B.: Are crossings important for drawing large graphs? In: Duncan, C., Symvonis, A. (eds.) Graph Drawing: 22nd International Symposium, GD 2014, pp. 234–245 (2014)
  • [24] Körner, C.: Concepts and misconceptions in comprehension of hierarchical graphs. Learn. Instr. 15(4), 281–296 (2005)
  • [25] von Landesberger, T., Kuijper, A., Schreck, T., Kohlhammer, J., van Wijk, J., Fekete, J.D., Fellner, D.: Visual analysis of large graphs: State-of-the-art and future research challenges. Comput. Graph. Forum 30(6), 1719–1749 (2011)
  • [26] von Landesberger, T., Diel, S., Bremm, S., Fellner, D.W.: Visual analysis of contagion in networks. Inf. Vis. 14(2), 93–110 (2015)
  • [27] von Landesberger, T., Pohl, M., Wallner, G., Distler, M., Ballweg, K.: Investigating graph similarity perception: A preliminary study and methodological challenges. In: Proc. VISIGRAPP. pp. 241–250 (2017)
  • [28] Lenz, O., Keul, F., Bremm, S., Hamacher, K., von Landesberger, T.: Visual analysis of patterns in multiple amino acid mutation graphs. In: Proc. IEEE VAST. pp. 93–102 (2014)
  • [29] McGee, F., Dingliana, J.: An empirical study on the impact of edge bundling on user comprehension of graphs. In: Proc. AVI. pp. 620–627 (2012)
  • [30] McGrath, C., Blythe, J., Krackhardt, D.: The effect of spatial arrangement on judgments and errors in interpreting graphs. Soc. Networks 19(3), 223–242 (1997)
  • [31] Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity flooding: A versatile graph matching algorithm and its application to schema matching. In: Proc. ICDE. pp. 117–128 (2002)
  • [32] Novick, L.R.: The importance of both diagrammatic conventions and domain-specific knowledge for diagram literacy in science: The hierarchy as an illustrative case. In: Barker-Plummer, D., Cox, R., Swoboda, N. (eds.) Diagrammatic Representation and Inference, pp. 1–11 (2006)
  • [33] Pandey, A.V., Krause, J., Felix, C., Boy, J., Bertini, E.: Towards understanding human similarity perception in the analysis of large sets of scatter plots. In: Proc. CHI. pp. 3659–3669 (2016)
  • [34] Pekalska, E., Duin, R.P.W.: The dissimilarity representation for pattern recognition: Foundations and applications (2005)
  • [35] Purchase, H.C., Pilcher, C., Plimmer, B.: Graph drawing aesthetics – created by users, not algorithms. IEEE Trans. Vis. Comput. Graphics 18(1), 81–92 (2012)
  • [36] Purchase, H.: Which aesthetic has the greatest effect on human understanding? In: DiBattista, G. (ed.) Graph Drawing: 5th International Symposium, GD 1997, pp. 248–261 (1997)
  • [37] Purchase, H.C.: Metrics for graph drawing aesthetics. Vis. Languages & Computing 13(5), 501–516 (2002)
  • [38] Purchase, H.C., Hoggan, E., Görg, C.: How important is the “mental map”? – an empirical investigation of a dynamic graph layout algorithm. In: Kaufmann, M., Wagner, D. (eds.) Graph Drawing: 14th International Symposium, GD 2006, pp. 184–195 (2007)
  • [39] Purchase, H.C., McGill, M., Colpoys, L., Carrington, D.: Graph drawing aesthetics and the comprehension of UML class diagrams: An empirical study. In: Proc. pp. 129–137 (2001)
  • [40] Tennekes, M., de Jonge, E.: Tree colors: color schemes for tree-structured data. IEEE Trans. Vis. Comput. Graphics 20(12), 2072–2081 (2014)
  • [41] Thornley, S., Marshall, R., Wells, S., Jackson, R.: Using directed acyclic graphs for investigating causal paths for cardiovascular disease. J. Biometrics Biostatistics 4, 182 (2013)
  • [42] Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Series B. Stat. Methodol. 63(2), 411–423 (2001)
  • [43] Tominski, C., Forsell, C., Johansson, J.: Interaction support for visual comparison inspired by natural behavior. IEEE Trans. Vis. Comput. Graphics 18(12), 2719–2728 (2012)
  • [44] Vehlow, C., Beck, F., Weiskopf, D.: The state of the art in visualizing group structures in graphs. In: Proc. EuroVis - STARs (2015)
  • [45] Welch, E., Kobourov, S.: Measuring symmetry in drawings of graphs. Comput. Graph. Forum 36(3), 341–351 (2017)
  • [46] Wood, J.R., Wood, L.E.: Card sorting: Current practices and beyond. J. Usability Stud. 4(1), 1–6 (2008)