
Automatic Polygon Layout for Primal-Dual Visualization of Hypergraphs

N-ary relationships, which relate N entities where N is not necessarily two, can be visually represented as polygons whose vertices are the entities of the relationships. Manually generating a high-quality layout using this representation is labor-intensive. In this paper, we provide an automatic polygon layout generation algorithm for the visualization of N-ary relationships. At the core of our algorithm is a set of objective functions motivated by a number of design principles that we have identified. These objective functions are then used in an optimization framework that we develop to achieve high-quality layouts. Recognizing the duality between entities and relationships in the data, we provide a second visualization in which the roles of entities and relationships in the original data are reversed. This can lead to additional insight about the data. Furthermore, we enhance our framework for a joint optimization on the primal layout (original data) and the dual layout (where the roles of entities and relationships are reversed). This allows users to inspect their data using two complementary views. We apply our visualization approach to a number of datasets that include co-authorship data and social contact pattern data.


1 Related Work

A hypergraph is an extension of a graph. Many graph drawing algorithms exist [Gibson:13], and they usually represent the vertices and edges in the data with geometric objects. In the visual metaphor of Qu et al. [Qu:2017], polygons are used to represent hyperedges in the data.

There has been much work in hypergraph visualization [Alsallakh:16, Vehlow2017graph]. Many existing techniques are based on Euler and Venn diagrams [Rogers:08, Stapleton:12, Micallef:14], which focus on showing whether and by how much two hyperedges in the data intersect. For N-ary relationship visualization, it is important to show not only whether two relationships (hyperedges) share common entities (vertices), but also which entities are part of a given relationship.

Matrix-based techniques [Kim:2007, Sadana:2014, Lex:2014] model a hypergraph using a table. The hyperedges and/or vertices are mapped to a row, a column, or an entry in the matrix. When the number of rows and/or the number of columns is relatively large, it can be difficult to count the number of non-empty entries in a row or column or to decide whether two entries are in the same row or column. The effectiveness of this approach depends on whether the rows and columns can be sorted in a certain way [Alsallakh:16]. In addition, some matrix-based techniques such as the UpSet method [Lex:2014] do not explicitly model the vertices in the data. Instead, they are inferred from the intersections of hyperedges.

Another approach treats the hypergraph as a bipartite graph [Stasko:07, Dork:12, Alsallakh:2013], in which both the vertices and the hyperedges are the nodes in the graph, though of different colors. Thus, the visualization of hypergraphs is converted to the visualization of bipartite graphs. Addressing the often large number of edge crossings in the visualization is the main challenge one faces using such an approach.

As an extension to the Euler and Venn diagrams, the subset-based approach [Santamara:2010, Riche:2011, Alsallakh:2013, Arafat:17] visualizes each hyperedge as a simple loop that defines a region. The vertices incident to a hyperedge are visualized as points enclosed by the loop. In some of the techniques [Riche:2011, Alsallakh:2013], a vertex belonging to multiple hyperedges can be duplicated as multiple points. The copies of the same vertex are connected using curves. The subset-based approach is designed to handle hyperedges with overlaps [Alsallakh:16]. However, certain properties of individual hyperedges such as their cardinalities are not explicitly represented in some of these visualization methods [Santamara:2010, Riche:2011]. In our work, we reuse the two-dimensional CW-complex approach of Qu et al. [Qu:2017] in which each relationship (hyperedge) is mapped to a polygon whose number of edges encodes the cardinality of the hyperedge. To alleviate the intensive labor associated with manual editing during layout generation, we aim to provide an automatic layout algorithm.

A more recent approach models a hypergraph as a metro map (transit map) in which each hyperedge is modeled as a metro line with its vertices being the metro stations on the line. This approach allows the rich techniques of transit map generation [Wu:2020] to be applied to hypergraph visualization. Frank et al. [Frank:2021] investigate the theoretically minimum number of crossings between different metro lines (hyperedges) while Jacobsen et al. [Jacobsen:2021] provide practical optimization algorithms for the fast generation of high-quality metro maps for given hypergraphs. While this approach can make it easier to see the overlaps between different hyperedges, it can be challenging to quickly see the cardinality of a hyperedge, especially when its corresponding metro line partially overlaps with other metro lines.

Evans et al. [Evans:2019] propose to represent a hypergraph using a set of 3D polygons such that each polygon corresponds to a vertex in the hypergraph. They further investigate the theoretical feasibility of such a representation. However, because the work is theoretical in nature, the technique is not demonstrated on any data, nor is an algorithm provided for generating a visualization based on this approach.

There has also been work on cluster visualization and community visualization [Fagnan:12, Dogrusoz:13, Vehlow2013fuzzy]. In these setups, the nodes in the data are partitioned so that each node belongs to exactly one cluster or a pre-dominant community [Dogrusoz:96] while having connections to other nodes in the data, including those in other clusters or pre-dominant communities. The visualization then focuses on placing each cluster or pre-dominant community in such a way that there is a clear spatial separation of the clusters or the communities [Fagnan:12]. The positions of the nodes in each cluster or pre-dominant community can be further improved through local operations [Dogrusoz:13]. To increase readability, glyphs are used to replace clusters and communities in order to provide a more abstract visualization of the relationships between clusters and communities [Dunne2013, Vehlow2013fuzzy, archambault2008grouseflocks]. In hypergraphs, each vertex can belong to multiple hyperedges. Spatially grouping the vertices based on their clusters and pre-dominant communities can have the unintended side effect of downplaying community memberships that are not chosen as the pre-dominant ones. In our work, we consider all hyperedges important and visualize them as such. A number of interactive tools are available [Sun:2016, Zhao:2018, Sun:2019] that support the exploration of bicluster data. As the data is visually represented using graphs, cluttering can occur as a result of excessive edge crossings [Alsallakh:16].

2 Data Representation

A polyadic dataset consists of a set of entities $V$ and a multiset of relationships $R$. Each relationship is a subset of $V$, and each entity must belong to at least one relationship. An entity can have attributes such as importance. Note that we allow two relationships to have an identical set of entities. We partition $R$ as $R = \bigcup_{k=1}^{|V|} R_k$ where $|V|$ is the number of elements in $V$. An element $r \in R_k$ ($1 \le k \le |V|$) is a subset of $V$ with $k$ elements. For example, $R_1$ consists of all unary relationships in the data and $R_2$ of all binary relationships. Note that we require a relationship to have at least one entity and at most $|V|$ entities. Furthermore, the order of the entities in an N-ary relationship is of little significance in our applications.

Polyadic data can be modeled as a hypergraph, with vertices (entities) and hyperedges (relationships). The polygonal representation of such data [Qu:2017] represents a hyperedge as a polygon, leading to a two-dimensional CW-complex [Hatcher:2002] for the hypergraph. In the remainder of the paper, we will not differentiate between entities (semantics) and vertices (both theoretical and visual representations). Similarly, we will not differentiate among relationships (semantics), hyperedges (theoretical representation), and polygons (visual representation). More specifically, we will use vertices and polygons in the next sections.

The degree of a vertex is the number of polygons that contain this vertex. The cardinality of a polygon is the number of vertices in the polygon. Furthermore, the degree of a polygon is the number of polygons that share at least one vertex with the polygon. When a vertex is part of a polygon, they are incident to each other.
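The three quantities defined above can be computed directly from the incidence data. The following is a minimal sketch, assuming each polygon is given as a collection of vertex identifiers; the function name `incidence_stats` and the sample data are hypothetical and not part of the original system.

```python
from collections import defaultdict


def incidence_stats(polygons):
    """Return (vertex degree, polygon cardinality, polygon degree).

    polygons: list of vertex-id collections, one per relationship.
    Identical vertex sets may appear more than once (multiset).
    """
    vertex_degree = defaultdict(int)
    for poly in polygons:
        for v in set(poly):
            vertex_degree[v] += 1  # number of polygons containing v

    cardinality = [len(set(poly)) for poly in polygons]

    polygon_degree = []
    for i, p in enumerate(polygons):
        sp = set(p)
        # count the other polygons that share at least one vertex with p
        polygon_degree.append(
            sum(1 for j, q in enumerate(polygons) if j != i and sp & set(q))
        )
    return dict(vertex_degree), cardinality, polygon_degree


# Hypothetical co-authorship data: each set is one paper's author list.
papers = [{"ann", "bob", "cy"}, {"bob", "dee"}, {"eve"}]
deg, card, pdeg = incidence_stats(papers)
```

In this example the third paper is a monogon (cardinality one) that shares no vertex with any other polygon, so its polygon degree is zero.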

3 Automatic CW-Complex Layout Optimization

We aim to provide an algorithm for the generation of a high-quality CW-complex layout for a given polyadic dataset. As with any optimization problem, we need the following components: (1) an objective function, and (2) an optimization framework. We describe both in this section.

3.1 CW-Complex Layout Principles

Before describing our objective functions, we state a number of design principles for CW-complex layouts that can produce clarity in the final layout.

  1. Every polygon should be regular.

  2. All polygons of the same cardinality should have the same area.

  3. Polygons with larger cardinalities should have larger areas.

  4. No polygon should contain self-overlaps and flips.

  5. Unnecessary overlaps between polygons should be avoided.

  6. When two polygons share at least three vertices, the intersection polygon should also be regular.

  7. Two vertices should not overlap each other.

  8. A vertex should not appear on the border or in the interior of a polygon when the vertex is not part of the polygon.

These principles are motivated by the following observations.

Figure 3: Self-overlaps (a) and flips (b) in a polygon can make it difficult to recognize the cardinality of the polygon.

Figure 4: This figure illustrates the motivations behind Principle 5. The two relationships share (a) zero, (b) one, and (c) two vertices, respectively. The interiors of the polygons partially overlap, which can distort the interpretation of the amount of sharing.

Figure 7: When two convex polygons share at least three vertices, overlaps of the polygons are unavoidable. When the intersection polygon (shaded) is not regular (a), it can be more difficult to recognize the cardinality of the intersection polygon than when the intersection is regular (b).

First, the number of edges in a polygon can be best perceived when the polygon is regular (Principle 1) and is sufficiently large. In addition to the number of sides in the polygon, we also wish to use the area of the polygon to encode the cardinality of the underlying N-ary relationship (Principles 2 and 3).

Second, a polygon with self-overlaps and flips (Figure 3) can make it more difficult to recognize the cardinality of the polygon (Principle 4).

Third, when two polygons with at most two shared vertices overlap in the layout (Figure 4), it can lead to the false impression that the polygons share more common vertices (Principle 5).

Fourth, when two polygons share at least three vertices and the polygon formed by the shared vertices is not regular, it can be more difficult to see the cardinality of the intersection polygon, i.e. the number of shared vertices (Principle 6). Figure 7 contrasts two examples where the polygons’ shared vertices form a regular polygon (b) and a highly non-regular one (a).

In addition, when two vertices overlap, they can appear as one (Principle 7). Similarly, if a vertex appears on the boundary or interior of a polygon not incident to it, the user can be given the false impression that the vertex is part of the polygon (Principle 8).

Principles 3, 5, and 8 have been applied in existing subset-based hypergraph visualization, while the other principles that we have identified are rather specific to the polygon metaphor.

3.2 Objective Function

Based on the aforementioned design principles for hypergraph visualization, we formulate the following energy terms which are combined into the objective function used during the optimization.

Polygon Regularity (PR) Energy:

The isoperimetric ratio of a polygon [gage1984curve], defined as $P^2/A$ where $P$ and $A$ are respectively the perimeter and area of the polygon, is a measure for the regularity of the polygon. Given an $n$-sided polygon, this measure is minimized when the polygon is regular (Principle 1), in which case it equals $4n\tan(\pi/n)$. The isoperimetric ratio also favors convex polygons over non-convex polygons and over polygons with self-overlaps or flips.

Since the area of a polygon is a quadratic polynomial with respect to the $x$- and $y$-coordinates of its vertices, the isoperimetric ratio is a radical (with square root terms). Using it directly as an energy can make the optimization process more challenging. Instead, we make use of the following energy term:

(1)

for all polygons $p$ in the data, where $A_p$, $P_p$, and $n_p$ are the area, perimeter, and cardinality of $p$, respectively. This energy is non-negative and is zero only when $p$ is regular.
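To make the regularity measure concrete, the sketch below computes the isoperimetric ratio of a polygon and compares it against the value attained by a regular polygon with the same number of sides. The function `regularity_penalty` is a hypothetical stand-in that merely shares the stated properties (non-negative, zero only for a regular polygon); it is not the energy of Equation 1, and all names are assumptions.

```python
import math


def perimeter_and_area(pts):
    """Perimeter and absolute shoelace area of a polygon given as (x, y) pairs."""
    n = len(pts)
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0
    return perimeter, abs(twice_area) / 2.0


def isoperimetric_ratio(pts):
    """Perimeter squared over area; undefined for zero-area polygons."""
    perimeter, area = perimeter_and_area(pts)
    return perimeter * perimeter / area


def regular_ratio(n):
    """Isoperimetric ratio of a regular n-gon (n >= 3): 4 n tan(pi / n)."""
    return 4.0 * n * math.tan(math.pi / n)


def regularity_penalty(pts):
    """Hypothetical penalty: relative excess over the regular n-gon ratio."""
    return isoperimetric_ratio(pts) / regular_ratio(len(pts)) - 1.0


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
print(regularity_penalty(square), regularity_penalty(rectangle))
```

The unit square yields a penalty of (numerically) zero, while the 2-by-1 rectangle has ratio 18 against the regular value 16, giving a penalty of 0.125.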

Polygon Area (PA) Energy:

To satisfy Principles 2 and 3, we observe that for regular polygons whose edge lengths are $1$, the area of the polygons is a monotonically increasing function of the cardinality of the polygons. That is, when combined with Principle 1, whose corresponding energy term is $E_{PR}$, we can formulate a polygon area (PA) energy $E_{PA}$ by requiring all edges in the layout to have unit length as follows:

(2)

where $e$ is any edge in any of the polygons in the data and $l_e$ is the length of $e$. We will discuss how edges are generated in a polygon in Section 3.3. Note that while $E_{PA}$ itself does not ensure appropriate areas for polygons, it can do so when the polygon is regular. Therefore, combining the PR and PA energies can address Principles 1, 2, and 3.

Polygon Separation (PS) Energy:

Principle 5 suggests that when two relationships share at most two entities, it is desired to arrange the corresponding polygons in such a way to avoid having any intersection in the interior of the polygons. However, this is not always possible given the complexity of the data. To address this, we define the polygon separation (PS) energy for the case when the two relationships share zero, one, or two entities.

We approximate each polygon $p$ by a circle whose center is the centroid of $p$ and whose radius is the circumradius of a regular $n$-gon whose edge lengths are $1$, where $n$ is the cardinality of $p$. Note that in this case the circumradius is $\frac{1}{2\sin(\pi/n)}$.

When the two regular polygons $p_i$ and $p_j$ share zero vertices, as illustrated in Figure 11 (a), the minimal distance between their circumcenters is the sum of their circumradii. To prevent the two polygons from touching, we add a buffer distance $d_b$. Therefore, the minimal distance between the centroids of the polygons is $\frac{1}{2\sin(\pi/n_i)} + \frac{1}{2\sin(\pi/n_j)} + d_b$, where $n_i$ and $n_j$ are the cardinalities of $p_i$ and $p_j$, respectively. Based on this analysis, we define the polygon separation energy for this pair of polygons as $E_{PS}(p_i, p_j)$:

(3)

where $d$ is the distance between the centroids of the polygons $p_i$ and $p_j$.

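When the two polygons share one vertex (the pivot), the distance between the polygons is measured in terms of the angle between the line segments formed by the pivot and the centroids of the two polygons (Figure 11 (b)). When the two polygons are regular, the minimal angle between the two line segments is $\left(\frac{\pi}{2} - \frac{\pi}{n_i}\right) + \left(\frac{\pi}{2} - \frac{\pi}{n_j}\right)$, i.e. the sum of the half interior angles of the two polygons at the pivot, where $n_i$ and $n_j$ are the cardinalities of the two polygons. Similar to the previous case, we add a buffer angle $\alpha_b$ such that the minimum angular distance between the two polygons is $\pi - \frac{\pi}{n_i} - \frac{\pi}{n_j} + \alpha_b$.

Let $\alpha$ be the angle between the two line segments in the current configuration. Then the polygon separation energy for this pair of polygons is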

(4)

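When the two polygons share two vertices (Figure 11 (c)), it is desirable that the shared vertices form an edge for both polygons. In this case, the ideal distance between the circumcenters of the polygons is the sum of their apothems, $\frac{1}{2\tan(\pi/n_i)} + \frac{1}{2\tan(\pi/n_j)}$, where $n_i$ and $n_j$ are the cardinalities of the two polygons. Therefore, we define the polygon separation energy in this case as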

(5)

Figure 11: The illustration of polygon separation energy when two polygons share zero (a), one (b), and two (c) vertices, respectively.

While it is possible that the two shared vertices do not form an edge in one of the polygons, in Section 3.3 we introduce operations that can change the order of the vertices in the polygons that may lead to the desirable vertex orders for the polygons.

Finally, when two polygons share at least three vertices, their interiors inevitably overlap. In this case, we define the PS energy to be zero.
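The geometric targets used in the four cases above follow from elementary properties of regular polygons with unit edges. The sketch below computes only these targets (minimal centroid distance, minimal pivot angle, ideal centroid distance) for polygons with at least three sides; the penalty forms of Equations 3 to 5 are not reproduced here, and the function names and the `buffer` parameter are assumptions for illustration.

```python
import math


def circumradius(n, edge=1.0):
    """Circumradius of a regular n-gon with the given edge length."""
    return edge / (2.0 * math.sin(math.pi / n))


def apothem(n, edge=1.0):
    """Distance from the center of a regular n-gon to the midpoint of an edge."""
    return edge / (2.0 * math.tan(math.pi / n))


def half_interior_angle(n):
    """Angle between an edge and the vertex-to-center segment of a regular n-gon."""
    return math.pi / 2.0 - math.pi / n


def separation_target(n_i, n_j, shared, buffer=0.0):
    """Return (kind, value) for two polygons sharing `shared` vertices."""
    if shared == 0:
        # centroids at least the two circumradii (plus a buffer) apart
        return "min_distance", circumradius(n_i) + circumradius(n_j) + buffer
    if shared == 1:
        # angle at the pivot between the two pivot-to-centroid segments
        return "min_angle", half_interior_angle(n_i) + half_interior_angle(n_j) + buffer
    if shared == 2:
        # the shared vertices form a common edge: centers are two apothems apart
        return "ideal_distance", apothem(n_i) + apothem(n_j)
    # three or more shared vertices: interiors must overlap, no separation target
    return "none", 0.0


print(separation_target(4, 4, 0))  # two squares with no shared vertex
print(separation_target(3, 3, 1))  # two triangles sharing a pivot
print(separation_target(4, 4, 2))  # two squares sharing an edge
```

For example, two unit squares that share an edge have centers exactly one unit apart, which is the sum of their apothems.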

The total polygon separation energy is thus defined as

(6)

for all pairs of polygons in the data. Note that besides Principle 5, our PS energy also aims to address Principles 7 and 8. That is, when two polygons are properly positioned, their vertices cannot overlap, nor can a vertex of one polygon appear in the interior of another polygon.

The separation among monogons is different since a monogon shares at most one vertex with other polygons. Besides, in contrast to other polygons, the location of the incident vertex is not the only factor that decides the drawing of a monogon.

Each monogon is drawn as the shape of a waterdrop which comprises a semicircle tip and two intersecting line segments tangent to the semicircle (Figure 14 (a)). The intersection point of the line segments is the vertex of the monogon, while the center of the semicircle is the center of the monogon. In our system, all monogons have the same size, which is controlled by the radius of the semicircle in each monogon as well as the distance between the center and vertex of the monogon.

There are two degrees of freedom when drawing a monogon: the orientation angle $\theta$ and its incident vertex’s location. After optimizing the locations of vertices, one degree of freedom is fixed. However, a good orientation angle still needs to be decided for each monogon to reduce the overlap between the monogon and its incident polygons.

We formulate the polygon separation energy for a monogon and a polygon as follows:

(7)

where $w$ is a weight to control the separation force between different polygons. If both the monogon and its incident polygon are monogons, then $w$ is set to 0.1 to make monogons cluster together, since we observe that this facilitates the counting of the monogons when needed. The weight is 1 otherwise.

Figure 14: (a) The orientations of monogons are optimized to decrease the repulsion between their centers and other polygons’ centers. The orientation of a monogon is measured by the angle $\theta$, which is the angle between the $x$-axis and the line segment that connects the monogon center and the only incident vertex. The monogon center is the center of the semicircle tip of the waterdrop. (b) A hexagon (brown) and an enneagon (pink: nine-sided) share 3 vertices (filled circles with red boundary). The three shared vertices evenly separate the edges of the enneagon into three segments (colored differently).

This separation energy is calculated for every monogon and all of its incident polygons, so the total polygon separation energy for monogons is

(8)

Polygon Intersection (PI) Energy:

When two relationships share at least three entities, their polygons must overlap with an interior. Principle 6 states that the polygon formed by the shared vertices should be regular.

Let $p_{ij}$ be the intersection polygon of $p_i$ and $p_j$. The vertices of $p_{ij}$ divide the boundary of $p_i$ into segments, each of which is a collection of consecutive edges of $p_i$ (Figure 14 (b)). For $p_{ij}$ to stay regular, we need each segment to have the same number of edges of $p_i$. Clearly, this is only possible when the number of edges of $p_i$ is divisible by the number of edges of $p_{ij}$. Still, we strive to distribute the edges of $p_i$ as evenly as possible into the above segments.

Let $l_k$ be the total length of segment $k$ where $1 \le k \le n_{ij}$ and $n_{ij}$ is the number of vertices of $p_{ij}$. We define the division energy of $p_i$ with respect to $p_{ij}$ as

(9)

where $n_i / n_{ij}$ represents the ideal average length of each segment. This is under the assumptions that each edge in the polygon has a length of one and the perimeter of the polygon is $n_i$. We can similarly define the division energy of $p_j$ with respect to $p_{ij}$.

The polygon intersection (PI) energy is then defined as

(10)

over all pairs of polygons $p_i$ and $p_j$ whose intersection polygon contains at least three vertices.
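The idea of distributing a polygon's edges evenly among the segments cut out by the shared vertices can be illustrated as follows. This is a minimal sketch under the unit-edge assumption stated above, so that the length of a segment equals its number of edges; the squared-deviation penalty is a hypothetical choice, not necessarily the form of Equation 9, and the function names are assumptions.

```python
def segment_edge_counts(polygon_order, shared):
    """Number of polygon edges between consecutive shared vertices.

    polygon_order: vertex ids in boundary order.
    shared: set of vertex ids that also belong to the other polygon.
    """
    n = len(polygon_order)
    idx = [i for i, v in enumerate(polygon_order) if v in shared]
    # close the cycle by wrapping the first shared index around
    return [b - a for a, b in zip(idx, idx[1:] + [idx[0] + n])]


def division_penalty(polygon_order, shared):
    """Hypothetical penalty: squared deviation from the ideal segment length."""
    counts = segment_edge_counts(polygon_order, shared)
    ideal = len(polygon_order) / len(counts)  # perimeter n_i over n_ij segments
    return sum((c - ideal) ** 2 for c in counts)


enneagon = list(range(9))
print(segment_edge_counts(enneagon, {0, 3, 6}), division_penalty(enneagon, {0, 3, 6}))
print(segment_edge_counts(enneagon, {0, 1, 2}), division_penalty(enneagon, {0, 1, 2}))
```

With the shared vertices evenly spaced on the enneagon (as in Figure 14 (b)), every segment has three edges and the penalty vanishes; with three consecutive shared vertices the segments have 1, 1, and 7 edges and the penalty is large.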

Our final objective function is thus the weighted sum $E = w_{PR}E_{PR} + w_{PA}E_{PA} + w_{PS}E_{PS} + w_{PI}E_{PI}$. To identify good values of the weights $w_{PR}$, $w_{PA}$, $w_{PS}$, and $w_{PI}$, we collect a set of datasets with varying numbers of hyperedges and entities. We sample weight sets by bilinearly interpolating among four weight sets, which we consider the corners of a square. The square is then covered by a grid, and each grid point represents a sampled weight set. After conducting the experiments, we identify the five weight sets that give the lowest average sum of the unweighted energies. We obtain the final weight set by normalizing the average of these top five weight sets, and we choose our weight values based on this experiment.
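The weight sampling procedure can be sketched as below. The corner weight sets, the grid resolution, and the function names are hypothetical placeholders chosen for illustration; the values actually used in the experiments are not assumed here.

```python
def bilinear_weights(corners, u, v):
    """Bilinearly interpolate four weight tuples at (u, v) in the unit square.

    corners: (w00, w10, w01, w11), the weight sets at the square's corners.
    """
    w00, w10, w01, w11 = corners
    return tuple(
        (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
        for a, b, c, d in zip(w00, w10, w01, w11)
    )


def weight_grid(corners, steps):
    """Sample a steps-by-steps grid of weight sets over the square."""
    return [
        bilinear_weights(corners, i / (steps - 1), j / (steps - 1))
        for j in range(steps)
        for i in range(steps)
    ]


def normalized_average(weight_sets):
    """Average several weight sets and rescale so the weights sum to one."""
    k = len(weight_sets)
    avg = [sum(ws[t] for ws in weight_sets) / k for t in range(len(weight_sets[0]))]
    total = sum(avg)
    return tuple(a / total for a in avg)


# Hypothetical corners: each one puts all weight on a single energy term
# in the order (PR, PA, PS, PI).
corners = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))
grid = weight_grid(corners, steps=3)
```

In the procedure described above, each sampled weight set would drive one optimization run per dataset, and `normalized_average` would be applied to the five best-performing sets.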

Principle 4 is automatically fulfilled since we do not allow self-overlaps and flips in any polygon. We describe how to achieve this in Section 3.3.

3.3 Polygon Layout Optimization

Our optimization framework consists of the following stages. First, we generate an initial layout. Next, the layout is iteratively improved to achieve lower values with respect to the objective function. The process stops when some termination criteria are met. We now describe each of these steps in detail.

Initial Layout and Starrization:

Our system can generate different initial layouts, such as placing all the vertices in the data on a circle (circular initial layout) or randomly (random initial layout). We can also convert the hypergraph data into a graph by treating each hyperedge as a clique in the graph. The vertices are then placed by using a force-directed algorithm [hu2005efficient].

Once all the vertices have been positioned, we need to construct the polygon for each relationship in the data. Note that the order of the vertices in the polygon is not meaningful with respect to the underlying relationship. However, randomly selecting the order of entities in a relationship can lead to a polygon with self-overlap or flip (Figure 3), thus violating Principle 4.

Figure 17: The starrization operation changes the order of vertices of the polygon to eliminate self-overlaps and flips. Compare the examples in this figure to those shown in Figure 3. The centroids, O, required by the starrization operation, are shown in both examples.

To address this issue, we employ a procedure which we refer to as the starrization. The input to this procedure is the positions of the vertices in a polygon without the connectivity information. The starrization then computes the centroid of the convex hull for the vertices of the polygon (Figure 17). The centroid is then used as a reference point, with the vertices in the polygon sorted by their angular coordinates. This gives rise to an order of the vertices in the polygon that is guaranteed to be free of self-overlaps and flips (compare the polygons before starrization in Figure 3 and the corresponding polygons after starrization in Figure 17). Note that our problem is related to but different from the problem of creating a polygonal region based on a set of points inside the region and a set of points outside the region [Reinbacher:2005].
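The starrization procedure described above can be sketched as follows. This is an illustrative implementation under two assumptions: the convex hull is computed with the monotone chain method, and "centroid of the convex hull" is taken to mean the area centroid of the hull polygon. The function names are hypothetical.

```python
import math


def convex_hull(points):
    """Monotone chain convex hull; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_centroid(poly):
    """Area centroid of a simple polygon; falls back to the vertex mean if degenerate."""
    n = len(poly)
    a = cx = cy = 0.0
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        c = x0 * y1 - x1 * y0
        a += c
        cx += (x0 + x1) * c
        cy += (y0 + y1) * c
    if abs(a) < 1e-12:
        return sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n
    return cx / (3.0 * a), cy / (3.0 * a)


def starrize(positions):
    """Order vertex ids by angle around the centroid of their convex hull.

    positions: dict mapping vertex id -> (x, y).
    """
    ox, oy = polygon_centroid(convex_hull(list(positions.values())))
    return sorted(
        positions,
        key=lambda v: math.atan2(positions[v][1] - oy, positions[v][0] - ox),
    )


# Visiting a, b, c, d in this order would trace a self-overlapping bow tie.
bow_tie = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}
print(starrize(bow_tie))  # ['a', 'c', 'b', 'd']
```

Sorting by angle around an interior reference point yields a star-shaped polygon, which is why the resulting vertex order contains no self-overlaps or flips.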

Energy Minimization:

Given the initial layout, our automatic layout algorithm iteratively improves the layout by finding new locations of the vertices in the data and the order of the vertices in the polygons. We treat the locations of the vertices as a variable vector in a $2|V|$-dimensional space where $|V|$ is the number of entities (vertices) in the data. The 2D coordinates of the vertices are consecutively encoded into the variable vector.

There are two operations performed. The first operation is to find a layout that yields the local minimum of energy. Because our objective functions are all arithmetic, we use the automatic differentiation library Adept [hogan2014fast] to monitor the evaluation of the objective functions so that the gradient can be calculated automatically. With the capability to calculate the gradient, we choose to use the quasi-Newton optimization method L-BFGS [liu1989limited] due to the speed and memory efficiency of this solver. During the optimization, the L-BFGS algorithm uses the gradient information to update the search direction and performs a one-dimensional search for the minimum of the energy on the line in the search direction [more1994line]. The solver has been shown to be able to handle non-linear optimization problems very well [liu2009centroidal]. Note that starrization is performed when evaluating energies for potential new locations of vertices. This ensures that no self-overlaps and flips can occur after the optimization.

The second operation evaluates, for each pair of vertices in the same polygon, the objective function before and after the two vertices’ locations are swapped. Note that the orders of the vertices in the polygon are also swapped. While a pair swap does not impact the shape of the polygon, it can impact other polygons that are incident to the vertices. Consequently, we perform starrization to the adjacent polygons to ensure no self-overlaps and flips after the swap. If a swap improves the objective function, the swap is accepted. Otherwise, it is rejected. Note that each pair of vertices inside each polygon is evaluated until no swap improves the energy.

The optimization alternates between the iterative line searches and the pair swaps. This process continues until there is no further gain in the objective function by the line searches and the pair swaps. Note that since our optimization problem is multi-objective and non-convex, there is no guarantee that a global minimum can be found.
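The pair swap stage can be sketched as a greedy loop. This is a simplified illustration: the `energy` callable is a hypothetical placeholder that is assumed to handle starrization internally, and the L-BFGS line-search stage that alternates with the swaps is omitted.

```python
import math
from itertools import combinations


def pair_swap_pass(positions, polygons, energy):
    """Greedily swap vertex locations within polygons until no swap helps.

    positions: dict vertex id -> (x, y), modified in place.
    polygons: list of vertex-id lists.
    energy: callable(positions, polygons) -> float (assumed, user supplied).
    """
    best = energy(positions, polygons)
    improved = True
    while improved:
        improved = False
        for poly in polygons:
            for u, v in combinations(poly, 2):
                positions[u], positions[v] = positions[v], positions[u]
                trial = energy(positions, polygons)
                if trial < best:
                    best = trial  # accept the swap
                    improved = True
                else:
                    # reject: restore the previous locations
                    positions[u], positions[v] = positions[v], positions[u]
    return best


# Toy energy (hypothetical): keep the edge between c and d at unit length.
def toy_energy(positions, polygons):
    return (math.dist(positions["c"], positions["d"]) - 1.0) ** 2


positions = {"a": (0, 0), "b": (5, 0), "c": (0, 1), "d": (5, 1)}
polygons = [["a", "b", "c"], ["c", "d"]]
final_energy = pair_swap_pass(positions, polygons, toy_energy)
```

In the toy example, swapping b and c inside the triangle does not change the triangle's vertex set but brings c next to d, which is the kind of improvement to incident polygons that the pair swaps are meant to find.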

Note that translating or rotating a layout does not change its evaluation with respect to the objective function. Consequently, we fix the locations of two vertices while optimizing for the positions of the other vertices during a line search.

Our default initial layout is based on the force-directed algorithm of Hu [hu2005efficient]. However, while different initial layouts can lead to different optimized layouts, it is not clear which initial layout consistently outperforms the others. Consequently, we provide different initial layout options in our system. The figures in Section A (Appendix) compare various optimized layouts (right) based on the three initial layout schemes (left). Given that the problem of finding a polygonal layout with minimal polygon overlaps is similar to the crossing number problem [Garey:83], which is known to be NP-hard, it is unlikely that any of the initial layouts is guaranteed to lead to a globally optimal layout. We observe that the force-directed scheme tends to lead to fewer overlaps in the polygons. On the other hand, the circular and random schemes tend to generate layouts that are space-filling. There is a trade-off between space utilization and the amount of unnecessary overlap. We have made all three options available in our system, and used the force-directed method to generate the layouts for our user study.

4 Dual-Views and Joint Optimization

While the polyadic data pre-determines the roles of the entities and relationships, such a role assignment can be arbitrary. For example, in a paper–author dataset, a reader may consider each author as an entity and each paper as a relationship over its authors. On the other hand, a researcher might consider his/her papers as the entities while the authors as relationships over the papers they have authored. The paper-centric view and the author-centric view can be seen as “the two sides of the same coin”. In general, any data can have two views: (1) the primal view based on the input data model, and (2) the dual view in which the roles of the entities and relationships in the data are reversed.
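Reversing the roles of entities and relationships amounts to transposing the incidence structure. The sketch below assumes each relationship carries a unique name, so that two relationships with identical entity sets remain distinct; the function name `dual_hypergraph` and the sample data are hypothetical.

```python
def dual_hypergraph(relationships):
    """Swap the roles of entities and relationships.

    relationships: dict mapping relationship name -> iterable of entities.
    Returns a dict mapping each entity -> list of relationship names containing it.
    """
    dual = {}
    for rel, members in relationships.items():
        for entity in members:
            dual.setdefault(entity, []).append(rel)
    return dual


# Paper-centric (primal) view: each paper is a relationship over its authors.
papers = {"P1": ["ann", "bob"], "P2": ["bob"]}
# Author-centric (dual) view: each author is a relationship over their papers.
authors = dual_hypergraph(papers)
print(authors)  # {'ann': ['P1'], 'bob': ['P1', 'P2']}
```

Applying the function twice recovers the primal data, which reflects the symmetry between the two views. Note that a degree-one vertex such as "ann" becomes a monogon in the dual view.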

Given these observations, we provide a synchronized optimization and visualization framework in which both viewpoints are not only optimized and displayed individually (e.g. panels (a) and (b) of the teaser figure), but also optimized jointly and displayed side-by-side (e.g. panels (c) and (d) of the teaser figure).

When generating the layout for the dual view, we wish to position each polygon’s dual element, which is a vertex in the dual view, as close as possible to this polygon’s centroid in the primal view. This allows the correspondence between the polygon and its dual vertex to be relatively easily perceived. To achieve this goal, we define the following energy.

Dual Distance Energy (DD).

Let $v^*$ be the polygon dual to a vertex $v$. For a polygon $p$, let $c(p)$ be its centroid. The dual distance (DD) energy for the whole layout is then defined as

(11)

With the dual distance energy, the objective function becomes where by applying equal weights to scalarize the multi-objective function into one scalar function. Note that our system allows users to change the weights to prioritize different design principles during the optimization.

When conducting the joint optimization, we optimize the two views simultaneously. That is, the locations of the vertices in the primal view and the locations of the vertices in the dual view are formed as one high-dimensional vector and used to evaluate the objective function that includes all the energy terms for the primal view and the dual view as well as the DD energy. A new configuration (a layout for the primal and a layout for the dual) is accepted only if the total energy decreases. This applies to both the line search and the pair swap operation.

In both the primal and dual views, a monogon corresponds to a degree-one vertex in the other view. To assist the mapping between monogons and their dual elements, the objective function (Equation 11) can be applied again locally.

We still use the same optimization solver to minimize the energy for monogons. However, for monogons, the input becomes a vector of $m$ variables where $m$ is the number of monogons in the hypergraph. Each variable corresponds to the orientation angle of a monogon. Note that when optimizing the monogons, all other terms of our energy are no longer useful since a monogon is always regular, has the ideal edge length, and shares only one vertex with any other polygon.

5 Visualization and Functionalities

Our visualization system allows the user to load a hypergraph and visualize it. To increase perceptibility, each vertex is rendered as a sphere with reflective material. Each edge is rendered as a cylinder that is also reflective. Each polygon is rendered with reflective and translucent material. This leads to rendering effects that are similar to the Cushioned Treemaps [vanWijk:1999].

When multiple polygons intersect, polygons with larger cardinalities are behind those with smaller cardinalities. This choice is similar to that of Kelp Diagrams [Dinkla:2012].

The colors of the polygons can be based on the properties of the underlying relationships. To increase the readability of overlapping polygons, we use colors suggested by ColorBrewer [harrower2003colorbrewer].

We provide two views, one for the primal representation and the other for the dual representation. The user can choose to see either view only or both. When both views are shown, the user can also select a vertex or a polygon in one view to inspect its properties. The corresponding polygon or vertex in the other view is highlighted automatically.

Figure 23 (panels (a) to (e)): This figure shows the resulting statistics of our survey with participants. For Q1, Q4, and Q5, the yellow bars represent the accuracy rate for each question and each method. The green bars represent the timing results (in seconds). The scale of accuracy is always shown on the left of the figure, and the scale of the timing is shown on the right. The standard error of the accuracy and timing results (yellow and green bars) are included as black line segments. Note that the standard error is zero for the polygon-based method in Q1. Consequently, the standard error in this case is not shown by the graphing software. The user preferences over the four methods inquired in Q2 and Q3 are shown in pie charts.

Figure 28 (panels (a) to (d)): Visualization results used for question Q2 in our user study with a dataset that contains hyperedges and vertices.

6 Performance and Evaluation

Our optimization framework requires repeated evaluation of the objective function given the current layout configuration (after starrization). Recall that and are the numbers of vertices and hyperedges in the data, respectively. The computation of the PR and PA energies has a complexity of . On the other hand, the computation of the PS and PI energies is more costly as it requires processing each pair of hyperedges in the data. The complexity for the PS and PI energies is . Similarly, each starrization operation has a complexity of . This means that each evaluation of the objective function is of the complexity . During each line search, there is a constant number of objective function evaluations. On the other hand, each sequence of pair swaps incurs number of objective function evaluations, one per potential pair swap. Thus, the complexity of each such sequence is . Since the number of consecutive line searches and pair swaps depends on both the size of the data and the convergence criteria, our optimization framework has an overall complexity of .

We have tested our optimization framework on eight datasets. Six of the datasets were collected from the DBLP database [author_data] with different search criteria and two were from an infectious disease dataset [isella2011s]. The smallest dataset has entities and relationships, while the largest dataset has entities and relationships. The time to perform the automatic optimization (either primal or dual) ranges from seconds for the smallest dataset to seconds for the largest dataset. When performing joint optimization, the time ranges from seconds for the smallest data to seconds for the largest data. The timing results were taken from a computer with an Intel(R) Xeon(R) E-G  CPU @ GHz and GB RAM.

We conducted a user study with participants including five undergraduate students and graduate students to understand how the polygon-based layout compares to existing subset-based hypergraph visualization techniques and whether the incorporation of the dual view helps with data analysis.

Due to the ongoing COVID-19 pandemic and the resulting university closure, the user study was conducted remotely. The participants of our study consisted of students in Computer Science, student in Physics, and student in Mechanical Engineering. To our knowledge, the participants were not familiar with hypergraphs and their visualization. They were given five questions in the form of an online survey. Their answers and the time it took a participant to answer each question were recorded. The questions were as follows:

  1. How many authors are part of the paper with the most co-authors?

  2. Which layout helps you most effectively determine whether the two papers with the most co-authors share an author?

  3. Which layout helps you most effectively find the paper with the greatest number of authors?

  4. How many papers does the most productive author have?

  5. How many authors does the most authored paper of the most productive author have?

The first three questions were designed to compare our technique with three recent hypergraph visualization techniques that are region-based and whose software was available: EulerView [simonetto2009fully], HyperVis [Arafat:17], and the Zykov representation [zykov1974hypergraphs, ouvrard2020hypergraphs] with the layout generation method based on the wheel graph introduced in [Arafat:17]. The results of these techniques were generated using the released code of these methods so that the colors and layouts were consistent with the respective published work. To provide some context to the questions, we used the data based on the author-paper interpretation. That is, each polygon was a paper and the vertices of the polygon represented its authors. In our survey, the colors of the polygons were based on the qualitative color schemes of ColorBrewer [harrower2003colorbrewer], which are designed for categorical attributes. Thus, the participants still needed to extract the cardinality of a polygon based on the number of edges in the polygon.

Question Q1 consisted of four individual questions. For each sub-question, the participants were given one of the four region-based hypergraph layouts: (a) EulerView, (b) HyperVis, (c) Zykov, and (d) N-ary (ours). Note that the layouts were based on the same dataset created synthetically, with papers (polygons) over authors (vertices). The order of the sub-questions was randomized and differed from user to user. As shown in Figure 23 (a), the accuracy rate was the highest for our method. While the accuracy rate for Zykov’s approach was also high, the average time to answer the question was seconds versus seconds for our approach. The standard errors of the data are also shown, which are comparable for the timing of all four techniques. On the other hand, the standard error is zero for our technique, indicating that all participants answered Q1 correctly.
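The link between a zero standard error and unanimous answers can be checked with a short sketch. It assumes the usual standard error of the mean (sample standard deviation divided by the square root of the sample size); the text does not state which estimator was used, and the response lists below are made up for illustration.

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(n)

# Hypothetical per-participant outcomes, 1 = correct and 0 = incorrect.
all_correct = [1, 1, 1, 1, 1, 1]
mixed = [1, 0, 1, 1, 0, 1]

se_all_correct = standard_error(all_correct)  # no spread, so no error bar
se_mixed = standard_error(mixed)              # strictly positive
```

When every participant gives the same answer, the sample has no spread, so the error bar collapses to a point.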

Questions Q2 and Q3 were designed to understand the effectiveness of the four hypergraph visualization methods in conveying the relationships among hyperedges (N-ary relationships) in the data and the distribution of the cardinalities of N-ary relationships, respectively. Question Q2 used a synthetic dataset with papers from authors, which was created from a larger actual dataset. Question Q3 used a more complex dataset with papers from authors which was also created from a larger actual dataset. For both questions, the users were given all four layouts simultaneously, such as the one shown in Figure 28 for Question Q2. As shown in Figure 23 (b-c), for both questions of the users favored our visualization over the other techniques.

Overall, our user study suggests that our visualization technique leads to higher accuracy and less time to finish a task than the other techniques. In addition, the participants appeared to prefer the polygon-based visualization over the other layouts.

The last two questions were designed to understand the potential benefits of including a dual view in the visualization. In our context, a polygon in the dual view is an author, and the vertices in the polygon are papers of the author.
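The role reversal between the two views can be sketched as a transpose of the incidence relation. The function name and the toy author-paper data below are illustrative assumptions, not part of the system described here.

```python
from collections import defaultdict

def dual_hypergraph(hyperedges):
    """Swap the roles of vertices and hyperedges.

    hyperedges maps a hyperedge id to the set of its vertices. The result
    maps each vertex to the set of hyperedges that contain it.
    """
    dual = defaultdict(set)
    for edge_id, vertices in hyperedges.items():
        for v in vertices:
            dual[v].add(edge_id)
    return dict(dual)

# Primal view: each paper (polygon) lists its authors (vertices).
papers = {
    "paper1": {"alice", "bob", "carol"},
    "paper2": {"alice", "dave"},
    "paper3": {"alice"},
}

# Dual view: each author (polygon) lists their papers (vertices).
authors = dual_hypergraph(papers)
```

In this toy data, `bob` appears in a single paper, so he is a degree-one vertex in the primal view and a monogon in the dual view. Applying the function twice recovers the original data as long as every vertex belongs to at least one hyperedge.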

Question Q4 consisted of two sub-questions. In the first sub-question, the participants were given the primal view, while in the second they were given the dual view. To avoid bias, the users were not notified that these images were based on the same dataset. As shown in Figure 23 (d), while achieving the same accuracy of , the dual view required an average of seconds to complete the task while the primal view required an average of seconds to complete the same task. The dataset used in this question was the same as that for Question Q3.

Question Q5 consisted of three sub-questions: (1) the primal view only, (2) the dual view only, and (3) simultaneous display of both the primal and the dual that were jointly optimized. The dataset for this question was the same as that for Question Q2. Again, the users were not notified that these visualizations were based on the same dataset. As shown in Figure 23 (e), for this question the dual view and the combined view led to the same accuracy (), which is better than the primal view (). With the combined view the participants needed an average of seconds to finish the task, while with the dual view the participants needed on average seconds.

These results indicate potential benefits of the primal-dual view over using only the primal view. However, the pandemic-related university closure placed a number of constraints on our user study, such as the relatively small number of participants, the lack of control over the devices used in the survey (and thus the display sizes), and the limited attention spans of the participants, which led us to keep the survey short to avoid incomplete answers. Consequently, we consider the findings of our user study preliminary. A more thorough, in-person user study in a controlled environment is needed to validate or invalidate our preliminary findings.

Figure 31 (panels (a) and (b)): Paper and authorship data from the online database DBLP [author_data] for publications from 2013 to 2015 in IEEE Transactions on Pattern Analysis and Machine Intelligence. Each N-ary relationship is either a paper with authors (left: the primal view) or an author with papers (right: the dual view).

7 Case Studies

We have applied our visualization technique to two applications: an author collaboration network and an infectious social contact network [isella2011s].

7.1 Authorship Collaboration Network

In Figure 31, we show the largest connected subset of publications for IEEE Transactions on Pattern Analysis and Machine Intelligence from 2013 to 2015. We make a number of observations.

From the primal view ((a): polygons represent papers), we can see that there is a variety in the number of co-authors in a paper, ranging from two-author papers to eight-author papers. Combined with the lack of single-author papers (monogons), this highlights the collaborative nature of the field. In addition, it seems that the majority of the papers have three or four authors, indicating that this is the range of the number of people in a team that balances between productivity and management.

From the dual view ((b): polygons represent authors), we observe that there are many authors with only one publication. We speculate that they were the students on the team who, after graduation, left the academic world of publications. On the other hand, polygons adjacent to many monogons are likely senior researchers and Ph.D. advisors. There are two highly productive authors (indicated by the grey polygons), who appear to be well-connected in the network but at a large distance from each other. This suggests that the research areas that the two authors have worked on are relatively unrelated.

The Erdős number measures the collaborative distance between a researcher and the Hungarian mathematician Paul Erdős. In the research community in our dataset, we can similarly define such a distance between any researcher to the most productive author in the data. This is achieved by first identifying the largest polygon in the dual view (b), then finding the corresponding vertex in the primal view, and finally measuring the polygon distance between this vertex and the vertex representing the researcher whose Erdős number is being computed.
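The three steps above can be sketched as a breadth-first search over the hypergraph. This is an illustration under stated assumptions: the "polygon distance" is taken to be the fewest polygons (papers) that must be traversed between two vertices, which may differ from the measure used in the system, and the function name and toy data are hypothetical.

```python
from collections import deque

def polygon_distance(hyperedges, source, target):
    """Fewest hyperedges to traverse from source to target, or None."""
    incident = {}
    for edge_id, vertices in hyperedges.items():
        for v in vertices:
            incident.setdefault(v, []).append(edge_id)
    if source not in incident or target not in incident:
        return None
    dist = {source: 0}
    queue = deque([source])
    used_edges = set()
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for edge_id in incident[v]:
            if edge_id in used_edges:
                continue  # every vertex of this edge already has a distance
            used_edges.add(edge_id)
            for w in hyperedges[edge_id]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
    return None

# Toy primal data: paper -> authors.
papers = {
    "p1": {"ann", "ben"},
    "p2": {"ben", "cat"},
    "p3": {"cat", "dan"},
    "p4": {"ann", "eve"},
    "p5": {"ann", "fay"},
}

# The largest polygon in the dual view is the author with the most papers.
paper_count = {}
for author_set in papers.values():
    for a in author_set:
        paper_count[a] = paper_count.get(a, 0) + 1
hub = max(paper_count, key=paper_count.get)

distance_to_dan = polygon_distance(papers, hub, "dan")
```

Here `ann` has the most papers, and reaching `dan` requires passing through three papers (p1, p2, p3).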

Such insight is facilitated by the layouts generated by our automatic framework, and the primal-dual approach.

Figure 36 (panels (a) to (d)): The visualization of a social contact pattern data [isella2011s] for May , (top) and June , 2009 (bottom). The left images show the primal views and the right images show the dual views. Notice the difference in the patterns of the two days (top vs. bottom).

7.2 Social Contact Patterns

We also apply our framework to a social contact pattern dataset that aims to track the spread of infectious diseases. Isella et al. [isella2011s] conducted an experiment by tracking visitors to a science gallery. The visitors were asked to wear an electronic badge that detects face-to-face contact. Unfortunately, the published data contains only the time durations of the contacts but not the physical distances of the contacts.

In the primal view, each visitor is an entity (vertex). A relationship (polygon) involving visitors represents a maximal set of visitors who, as pairs, have spent more than seconds within a distance deemed close enough in this study. We select two days (Sunday, May , 2009 and Saturday, June , 2009) from this dataset as our test cases, and visualize the data for both days in Figure 36.
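Read literally, a maximal set of visitors who are pairwise in contact is a maximal clique of the pairwise contact graph. The sketch below builds hyperedges that way with a basic Bron-Kerbosch enumeration. It is one possible reading, not necessarily how the dataset was processed; the function names, the contact durations, and the 60-second threshold in the usage are all hypothetical.

```python
def maximal_cliques(adjacency):
    """Bron-Kerbosch enumeration (without pivoting) of maximal cliques."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques

def contact_hyperedges(durations, threshold_seconds):
    """Turn pairwise contact durations into hyperedges (maximal cliques)."""
    adjacency = {}
    for (a, b), seconds in durations.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if seconds > threshold_seconds:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return maximal_cliques(adjacency)

# Hypothetical accumulated contact time (seconds) per pair of visitors.
durations = {
    ("u", "v"): 90,
    ("v", "w"): 80,
    ("u", "w"): 70,
    ("w", "z"): 65,
    ("u", "z"): 10,
}
hyperedges = contact_hyperedges(durations, threshold_seconds=60)
```

With these values, `u`, `v`, and `w` form one triangle-shaped relationship, and `w` and `z` form a second relationship that shares the vertex `w` with the first.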

From the primal views ((a) and (c)), we observe that there were more visitors on May than on June . We speculate that this is partially because more visitors visit the gallery on Sundays than on Saturdays. It is also possible that more people choose to be outdoors in June, when it is more likely to be sunny, than in May, when it can be rainy.

In addition, a large polygon corresponds to a group of people who had pairwise close contact. While they could be in close contact at different places and times, the fact that many visitors do not stay in the gallery for a long period of time indicates that the group either knew each other (e.g. a family or a group of friends) or corresponds to a time of the day during which the members of the group visited the gallery simultaneously (though uncoordinated).

The relatively large number of overlapping polygons is an indication that the groups of visitors may belong to a larger group. Recall that whether two visitors are considered in close contact depends on whether they were within a close distance for over seconds. Both the distance threshold and the time threshold ( seconds) were arbitrarily chosen. Varying the values of the thresholds and observing their impacts on the data can lead to additional insight for the application.

Finally, the large polygons in the dual views ((b) and (d)) correspond to people who had been in contact with the largest number of other visitors. This can be caused by them being in the gallery longer than other visitors, such as the employees of the gallery or the tour guides working there. Regardless, should an infectious disease break out, such people can contribute to the faster spread of the disease. Tracing their activities before and during the outbreak can be more urgent than tracking the visitors represented by the smaller polygons.

The above observations and hypotheses are enabled by our automatic layout optimization technique and the primal-dual approach.

8 Conclusion and Future Work

The main contribution of our work is the introduction of an automatic polygon layout optimization framework that enhances the quality of the layout for hypergraphs. At the core of our technique is a set of design principles for polygon layout that we have identified and the objective function based on these principles. To our knowledge, it is the first time that the order of vertices in a polygon is explicitly addressed during layout generation (the pair swap operation and the polygon intersection energy). To avoid self-overlaps and flips in the layout, we develop a procedure called starrization which guarantees that the layout is free of such artifacts. We also handle datasets with monogons.

Recognizing the duality between entities (vertices) and relationships (hyperedges), our system enables simultaneous display of both the primal view and the dual view. To correlate the two layouts (primal and dual), we enable an automatic joint layout optimization framework based on an augmented energy term that encourages the spatial correlations between the two views.

Through a user study, we show that the polygon-based layouts generated by our automatic framework compare favorably with a number of recent subset-based hypergraph approaches for a number of tasks. In addition, the user study confirms the benefits of the primal-dual visualization framework and the joint optimization.

There are limitations to both the polygon metaphor and our optimization approach. As our optimization algorithm has a complexity of where is the sum of the number of vertices and the number of hyperedges, it can be difficult to scale up to much larger datasets. We plan to explore hierarchical optimization and visualization to alleviate this problem. In addition, we will investigate the use of GPUs as the evaluation of the objective function is highly parallelizable. Another issue with our optimization is that our polygon area (PA) energy and polygon separation (PS) energy are formulated assuming the underlying polygon has a low polygon regularity (PR) energy, i.e. it is nearly regular. Our choice of such formulations is motivated by a number of factors such as reducing the computational cost. For example, finding the exact distance between two polygons not sharing a vertex is a classical computational geometry problem. Employing a formulation that requires exact computation can further increase the computational complexity of our optimization framework. However, when the polygons are not close to being regular, our current formulations of the PA and PS energies are no longer as effective. We will investigate other formulations that are less dependent on the regularity of the polygons.

The polygon metaphor also has its limitations. It performs best when the underlying hypergraph has approximately a tree-like structure. When there is an excessive number of polygons adjacent to a vertex, overlaps among these polygons are unavoidable. Similarly, a cluster of polygons can have unavoidable overlaps (e.g. Figure 36). When these types of overlaps occur, which can become more prominent as the tree-like structure in the data disappears, we observe that the effectiveness of the polygon metaphor decreases. Tasks such as recognizing a hyperedge and its cardinality, deciding whether a vertex belongs to a hyperedge, and perceiving whether two hyperedges intersect become more difficult. We plan to explore a multi-scale representation of hypergraphs to address this challenge. In addition, we plan to investigate the adaptation of graph sparsification [Lai:2020] to hypergraph sparsification, in which less important data is filtered out from the visualization.

For visualization, we will explore better layouts to reduce the amount of unused space. Incorporating label placement for vertices and hyperedges in both the primal view and the dual view can strengthen the link between the two views. This is a potentially fruitful research direction. Addressing the uncertainty in the data is another future research avenue.

Acknowledgements.
The authors wish to thank our anonymous reviewers for their constructive feedback. We appreciate the help from Avery Stauber during video production. We also would like to thank Dr. Markus Wallinger and Dr. Danial Archambault for sharing the code of EulerView. This work was supported in part by the NSF award (# 1619383).

References

Appendix A Initial Layout and Its Impact on Final Layout

Different initial layouts of the same dataset can lead to different final optimized layouts: force-directed [hu2005efficient] (Figure 37), circular (Figure 38), and random (Figure 39). Our system provides all three initial layout schemes for the user. The results in the paper were based on the force-directed layout similar to Figure 37.
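For concreteness, the two simpler initial layouts can be sketched in a few lines. The function names, the unit radius, the sampling square, and the fixed seed are assumptions made for illustration; the force-directed scheme is omitted because it is considerably longer.

```python
import math
import random

def circular_layout(vertices, radius=1.0):
    """Place the vertices evenly on a circle centered at the origin."""
    n = len(vertices)
    return {
        v: (radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n))
        for i, v in enumerate(vertices)
    }

def random_layout(vertices, seed=0):
    """Place the vertices uniformly at random in the square [-1, 1]^2."""
    rng = random.Random(seed)
    return {v: (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for v in vertices}

names = ["a", "b", "c", "d"]
on_circle = circular_layout(names)
scattered = random_layout(names)
```

Either dictionary can then serve as the starting configuration that the optimization refines.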

Figure 37: The optimized layout for a dataset (b) given an initial layout where the vertices are placed based on the force-directed layout algorithm of Hu [hu2005efficient] (a).
Figure 38: The optimized layout for a dataset (b) given an initial layout where the vertices are placed on a circle (a).
Figure 39: The optimized layout for a dataset (b) given an initial layout where the vertices are randomly placed (a).

Appendix B Time Comparison between Manual Layout Design and Automatic Optimization

Table 1 compares the time of manually designing a polygon layout to that of using our automatic optimization-based algorithm. Note that for all three test datasets, the automatic algorithm is about three orders of magnitude faster than manual design. Both the manual design and the automatic algorithm start with the same initial layout, which, for the three test datasets, is based on the force-directed method [hu2005efficient].

Data | No. Vertices | No. Hyperedges | Design time (seconds) | Optimization time (seconds)
No. 1
No. 2
No. 3
Table 1: This table compares the times of manually designing a polygon layout for three hypergraphs to those of using our automatic optimization framework.