1 Introduction
Treemaps are one of the best-known methods for visualizing large hierarchical datasets. Given an input tree whose leaves have several attributes, treemaps recursively partition a 2D spatial region into cells whose visual attributes (area, color, shading, or annotation) encode the tree’s data attributes. Compared to other methods such as node-link techniques, treemaps effectively use all available screen pixels to show data, and thus can display trees of tens of thousands of nodes on a single screen. Most treemapping algorithms use rectangles, although there are alternative models such as Voronoi treemaps [1], orthoconvex and L-shaped treemaps [9], and Jigsaw treemaps [39]. In this paper, we focus exclusively on rectangular treemaps.
The input of a rectangular treemapping algorithm is a rectangle $R$ and a set of $n$ nonnegative values $w_1, \dots, w_n$ together with a hierarchy on these values (represented by a tree). The output is a treemap $\mathcal{T}$, which is a recursive partition of $R$ into a set of $n$ interior-disjoint rectangles $R_1, \dots, R_n$, where each rectangle $R_i$ has area $w_i$, and the regions of the children of an interior node of the hierarchy form a rectangle (associated with their parent). Such a partition of a rectangle into a set of disjoint rectangles is also called a rectangular layout, or layout for short. We typically assume that the input values are normalized, that is, the sum $\sum_{i=1}^{n} w_i$ corresponds to the area of $R$.
Nowadays, large hierarchical datasets are also available over time. Hence, there is a need for time-dependent treemaps which display changing trees and data values. Ideally, such time-dependent treemaps enable the user to easily follow structural changes in the tree and in the data. In a time-dependent setting, the input values become functions $w_i : \{1, \dots, k\} \to \mathbb{R}_{\geq 0}$ for each $i$, where the discrete domain $\{1, \dots, k\}$ represents the different time steps in the data. We assume that the hierarchy on the values and the number of values $n$ are not time-dependent, and that the values are properly normalized for each time step separately. Furthermore, we use the special value $w_i(t) = 0$ to represent that data element $i$ is not present at time $t$; we speak of insertions or deletions if $w_i$ starts or stops being nonzero, respectively.
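To make the insertion/deletion convention concrete, the following minimal Python sketch normalizes each time step separately and detects, for each pair of consecutive time steps, which data elements are inserted or deleted. It assumes a hypothetical encoding of the input as a dictionary mapping each element id to its weights $w_i(1), \dots, w_i(k)$ (stored 0-based); the names `normalize_per_step` and `presence_events` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: element id -> [w_i(1), ..., w_i(k)], stored 0-based.
Series = Dict[str, List[float]]


def normalize_per_step(series: Series, total_area: float) -> Series:
    """Scale the weights of each time step separately so they sum to total_area."""
    k = len(next(iter(series.values())))
    out: Series = {i: [0.0] * k for i in series}
    for t in range(k):
        step_sum = sum(w[t] for w in series.values())
        for i, w in series.items():
            out[i][t] = w[t] * total_area / step_sum if step_sum > 0 else 0.0
    return out


def presence_events(series: Series) -> List[Tuple[int, Set[str], Set[str]]]:
    """Return (t, inserted, deleted) for each pair of consecutive steps (t, t+1).

    An element is present at step t iff its weight is nonzero; the special
    value 0 marks absence, so an insertion is 0 -> nonzero and a deletion
    is nonzero -> 0.
    """
    k = len(next(iter(series.values())))
    events = []
    for t in range(k - 1):
        now = {i for i, w in series.items() if w[t] != 0}
        nxt = {i for i, w in series.items() if w[t + 1] != 0}
        events.append((t, nxt - now, now - nxt))
    return events
```

For example, with `{"a": [1, 1, 0], "b": [0, 2, 2], "c": [3, 0, 1]}`, element `b` is inserted and `c` deleted between the first two steps.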
The visual quality of rectangular treemaps is usually measured via the aspect ratios of their rectangles. This indicator can become arbitrarily bad: consider a treemap which consists of only two rectangles. If the area of one of these rectangles tends towards zero, then its aspect ratio tends towards infinity. Nagamochi and Abe [23] describe an algorithm (APP) which computes, for a given set of values and a hierarchy, a treemap which provably approximates the optimal aspect ratio. De Berg et al. [9] prove that minimizing the aspect ratio for rectangular treemaps is strongly NP-complete. Kong et al. [17] propose perceptual guidelines to improve treemap design, and Zhou et al. [40] perform user studies to test the effectiveness of different rectangular treemapping algorithms. Recently, Lu et al. [20] argued that the optimal aspect ratio for treemaps should, in fact, be the golden ratio.
Rectangular treemaps were pioneered by Shneiderman [25]. His Slice-and-Dice (SND) algorithm generally produces rectangles of high aspect ratio and hence of poor visual quality. Squarified treemaps (SQR) by Bruls et al. [6] are based on a slicing heuristic which tends to perform very well in practice, generally producing rectangles of near-optimal aspect ratio.
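For readers unfamiliar with slicing layouts, the following is a minimal Python sketch of the Slice-and-Dice idea; it is not Shneiderman's original implementation. It assumes a hypothetical tree encoding in which a leaf is a number (its weight) and an interior node is a list of subtrees. Each node's rectangle is split among its children in proportion to their total weights, alternating the split axis per level, so every leaf rectangle receives an area proportional to its weight.

```python
from typing import List, Optional, Tuple, Union

Tree = Union[float, List["Tree"]]          # leaf weight, or list of subtrees
Rect = Tuple[float, float, float, float]   # (x, y, width, height)


def total_weight(tree: Tree) -> float:
    if isinstance(tree, (int, float)):
        return float(tree)
    return sum(total_weight(child) for child in tree)


def slice_and_dice(tree: Tree, rect: Rect, depth: int = 0,
                   out: Optional[List[Rect]] = None) -> List[Rect]:
    """Return the leaf rectangles in depth-first order."""
    if out is None:
        out = []
    if isinstance(tree, (int, float)):
        out.append(rect)
        return out
    x, y, w, h = rect
    node_sum = total_weight(tree)
    offset = 0.0
    for child in tree:
        frac = total_weight(child) / node_sum if node_sum > 0 else 0.0
        if depth % 2 == 0:   # even levels: split the width
            sub = (x + offset * w, y, frac * w, h)
        else:                # odd levels: split the height
            sub = (x, y + offset * h, w, frac * h)
        offset += frac
        slice_and_dice(child, sub, depth + 1, out)
    return out
```

For instance, `slice_and_dice([[1.0, 1.0], 2.0], (0.0, 0.0, 4.0, 1.0))` yields two stacked 2 × 0.5 cells next to a 2 × 1 cell, i.e., areas 1, 1, and 2. Because the axis alternates strictly with depth, shallow trees with many siblings produce thin slivers, which is the weakness of SND discussed above.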
For time-dependent treemaps, a second quality criterion is stability. Ideally, small data changes should yield only small changes in the treemap. However, Squarified treemaps (which have good aspect ratios) are not particularly stable. Shneiderman and Wattenberg [26] develop the first treemap algorithms which consider stability. Their ordered treemap algorithms (Pivot-by-Middle (PBM), Pivot-by-Size (PBZ), and Pivot-by-Split (PBS)) use a specified order on the input data and aim to lay out rectangles in this order. Yet, there is no guarantee on how close two rectangles that are consecutive in the order stay in the treemap as the input data varies over time. Related approaches are the Strip algorithm (STR) by Bederson et al. [2] and the Split algorithm (SPL) by Engdahl [10]. Other methods, e.g., the Spiral algorithm (SPI) by Tu and Shen [33], and the Hilbert (HIL) and Moore (MOO) algorithms by Tak and Cockburn [31], lay out rectangles following a space-filling curve. All of the above treemapping algorithms fully recompute the treemap when the data changes.
A few recent treemapping algorithms were specifically developed for time-dependent data. Hahn et al. [15] and Hees and Hage [34] describe stable versions of Voronoi treemaps. Chen et al. [8] propose a small-multiple metaphor to visualize time-dependent hierarchies. Their algorithm computes a global layout for all time steps simultaneously, but does not handle insertions or deletions. Scheibel et al. [24] give an algorithm that maps changes in the data onto an initial layout. However, the “treemaps” of subsequent time steps are not proper rectangular layouts, as white space is introduced when resolving overlaps between rectangles. Lukasczyk et al. [21] and Köpp and Weinkauf [18] show how to compute static overviews of the whole evolution of time-varying hierarchical datasets. Guerra-Gómez et al. [12] and Card et al. [7] present interactive visualization tools for the same kind of data.
Sondag et al. [29] propose the first stable rectangular treemapping algorithm. Their Local Moves algorithm (LM) changes the layout of the treemap between time steps using only local modifications (local moves). The initial layout is computed using APP [23]. The running time and the visual quality of the time-dependent treemaps produced by LM vary with the number of local moves used: more local moves require a longer running time, but generally result in higher visual quality. LM with four local moves (LM4) delivers a reasonable trade-off between running time and quality. In contrast, LM with zero local moves (LM0) focuses solely on stability but is much more efficient than LM4. Recent implementations of the LM algorithm use a special rule to handle insertions of large numbers of rectangles: if the number of rectangles to be inserted in a subtree $S$ exceeds the number of rectangles currently present in $S$ (after any deletions), a new layout for $S$ is computed using APP. A method closely related to LM has recently been used to visualize large time-dependent treemaps in the context of software evolution understanding [36].
Contribution. While, as outlined above, advances have been made in the design of treemapping algorithms, comprehensive evaluations thereof are lacking, even more so for the time-dependent case. There is currently no evaluation that includes more than a few treemapping algorithms, more than a few datasets, and a principled discussion of quality metrics. Our paper contributes to such an evaluation along three lines:
(1) We introduce a new method to measure the stability of time-dependent treemaps which explicitly considers the input data in a principled way (Section 2). An algorithm is stable if layout change and data change correlate positively. To design such a metric, we overcome the difficulty that the data and layout spaces are a priori incomparable by introducing the concept of a baseline treemap, which represents the minimum amount of change that any time-dependent treemapping algorithm must incur when moving from a treemap $\mathcal{T}$ to the next treemap $\mathcal{T}'$.
(2) The performance of treemapping algorithms heavily depends on the characteristics of the datasets used. Ideally, we want to measure the performance of treemapping algorithms across all possible datasets. We approximate the infinite problem space of all possible datasets by a low-dimensional feature space spanned by a small set of measurable characteristics (features) of the problem instances (Section 3). Our feature space is a meaningful tool to analyze how well the choice of datasets covers the whole spectrum of the problem space.
(3) We perform a quantitative evaluation of 13 rectangular treemapping algorithms on 46 characteristic datasets. We analyze and visually summarize the results with respect to visual quality and stability, which provides insights to both researchers and practitioners when selecting treemapping algorithms for specific applications and datasets. To date, this is by far the most comprehensive evaluation of (dynamic) treemapping algorithms. Section 4 describes our experimental pipeline. Section 5 shows the results of our experiments. Section 6 concludes with a discussion of our results.
2 Metrics
There are two important criteria to evaluate treemapping algorithms: visual quality and stability. A variety of metrics have been proposed for both criteria. We next outline the visual quality and stability metrics we use in our evaluation and discuss why these metrics are a representative subset of existing metrics for treemapping algorithms. We measure our metrics for each leaf rectangle separately. Rather than summarizing these values by an average, a maximum, or some other statistic over all rectangles, we capture the complete distribution. This allows us to later analyze and visualize the metric values at different levels of detail (Section 5).
Please note that we do not compute metrics for interior nodes and hence these values are not included in our analysis. In particular, based on our results, we cannot assess the quality of treemapping algorithms for tasks that involve internal nodes.
2.1 Visual quality
The per-rectangle weight information in a treemap is conveyed by the areas of the rectangles. Since the areas of rectangles close to squares are easier to estimate than the areas of elongated rectangles, the visual quality of a treemap is commonly measured by the aspect ratios of its rectangles. For a rectangle $R_i$ of width $a$ and height $b$ we compute the aspect ratio $AR(R_i)$ as

$$AR(R_i) = \frac{\min(a, b)}{\max(a, b)}. \qquad (1)$$

Values of $AR$ close to 0 are considered “bad”; values close to 1 are considered “good”. Some authors propose that the aspect ratio should be close to the golden ratio $\varphi$ [20] (in terms of Equation 1, $AR \approx 1/\varphi \approx 0.618$). However, as little evidence is provided for this target value, we adopt the commonly accepted optimum of 1. Large rectangles have more visual impact, so one can weight aspect ratios by their respective areas [36]. However, this affects only statistics over the distribution of aspect ratios, not the individual aspect ratios. Hence, to allow the computation of, e.g., the weighted mean or median aspect ratio, we store the corresponding weight $w_i$ with each computed aspect ratio. We denote the corresponding weighted metric by $AR_w$.
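The storage scheme described above (each aspect ratio kept together with its weight, so that weighted statistics can be computed afterwards) can be sketched in Python as follows. This is a minimal illustration that assumes rectangles are given as (width, height) pairs with positive sides and that the weight is the rectangle's area; the function names are hypothetical.

```python
from typing import List, Tuple


def aspect_ratio(width: float, height: float) -> float:
    """Equation 1: min/max of the sides; 1 for a square, near 0 for slivers."""
    return min(width, height) / max(width, height)


def ar_with_weights(rects: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Store each rectangle's AR together with its weight (its area)."""
    return [(aspect_ratio(a, b), a * b) for a, b in rects]


def weighted_mean(pairs: List[Tuple[float, float]]) -> float:
    total = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total


def weighted_median(pairs: List[Tuple[float, float]]) -> float:
    """Smallest value at which the cumulative weight reaches half the total."""
    ordered = sorted(pairs)
    half = sum(w for _, w in ordered) / 2
    acc = 0.0
    for v, w in ordered:
        acc += w
        if acc >= half:
            return v
    return ordered[-1][0]
```

With a unit square and a 4 × 1 rectangle, the unweighted mean $AR$ is 0.625, whereas the area-weighted mean drops to 0.4 because the elongated rectangle dominates the area.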
2.2 Stability
Evaluating the stability of a treemapping algorithm is more involved. Consider treemaps at two consecutive time steps $t$ and $t+1$. Since stability does not explicitly depend on the value of $t$, we denote the former and the new treemap by $\mathcal{T}$ and $\mathcal{T}'$, respectively, to simplify notation. We also denote the rectangle areas in $\mathcal{T}$ and $\mathcal{T}'$ by $w_i$ and $w'_i$, respectively. For a stable treemapping algorithm, the (visual) difference between $\mathcal{T}$ and $\mathcal{T}'$ should roughly correspond to the difference between the areas $w_i$ and $w'_i$. Note that the combination of large changes in data values and small changes in the layouts is very unlikely, since rectangle areas in treemaps must exactly match the data values. Hence, what we actually want to measure is instability, i.e., large layout changes that are not caused by large data changes.
Most existing metrics for treemap stability only consider the visual change in the treemap’s layout. This layout change is usually computed by evaluating the change $\delta(R_i)$ for each rectangle $R_i$ separately and aggregating it into a value $\Delta$ over all rectangles. Shneiderman and Wattenberg [26] define $\delta(R_i)$ as the Euclidean distance between the vectors $(x_i, y_i, a_i, b_i)$ and $(x'_i, y'_i, a'_i, b'_i)$, where $(x_i, y_i)$, $a_i$, and $b_i$ are the coordinates of the top-left corner, the width, and the height of rectangle $R_i$, respectively (primed values refer to $\mathcal{T}'$). They then define $\Delta$ as the average over all rectangles. Hahn et al. [15, 13] simplify this metric by defining $\delta(R_i)$ as the distance moved by the centroid of a rectangle, again defining $\Delta$ as the average. Tak and Cockburn [31] use the same $\delta$ as [26], but define $\Delta$ as the variance over all values computed by $\delta$. They also propose a drift metric, which measures how much a rectangle moves away from its average position over a long time period. Very recently, Scheibel et al. [24] introduced two new layout change metrics. The average aspect ratio change defines $\delta(R_i)$ as the relative change between the aspect ratios of $R_i$ and $R'_i$, and defines $\Delta$ as the average. The relative parent change defines $\delta(R_i)$ as the relative change of the distance between the center of a rectangle and the center of its parent, again defining $\Delta$ as the average. Chen et al. [8] propose a metric to quantify the ability of users to track time-dependent data in treemaps, which is closely related to the drift metric [31]. A different approach measures layout change using pairs of rectangles. Hahn et al. [14] introduced the relative direction change, which, for every pair of rectangles $R_i$ and $R_j$, measures how much the angle from the center of $R_i$ to the center of $R_j$ changes. Recently, Sondag et al. [29] proposed the relative position change, which, for every rectangle pair $(R_i, R_j)$, measures how much the relative position of $R_i$ with respect to $R_j$ changes. Then $\Delta$ is defined as the average over all rectangle pairs. We discuss this metric in more detail below.

Summarizing the above, we distinguish two types of layout change metrics: (1) absolute metrics measure how much individual rectangles move or change, and (2) relative metrics measure how much the relative positions of pairs of rectangles change. As these two types of metrics capture different aspects, we include one metric of each type in our evaluation, as follows.
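The aggregation scheme shared by the absolute metrics above can be sketched as follows: a per-rectangle change $\delta$ is computed for every element present in both layouts and then aggregated into $\Delta$ (the mean for [26] and for Hahn et al., the variance for Tak and Cockburn). This is a minimal Python illustration under assumed conventions (rectangles as `(x, y, width, height)` tuples keyed by hypothetical element ids, population variance), not the original authors' code.

```python
import math
from statistics import mean, pvariance
from typing import Callable, Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height), (x, y) = top-left corner
Layout = Dict[str, Rect]                  # element id -> rectangle


def sw_change(r: Rect, r2: Rect) -> float:
    """Euclidean distance between the (x, y, width, height) vectors."""
    return math.dist(r, r2)


def centroid_change(r: Rect, r2: Rect) -> float:
    """Distance moved by the rectangle's centroid."""
    c = (r[0] + r[2] / 2, r[1] + r[3] / 2)
    c2 = (r2[0] + r2[2] / 2, r2[1] + r2[3] / 2)
    return math.dist(c, c2)


def layout_change(before: Layout, after: Layout,
                  delta: Callable[[Rect, Rect], float],
                  aggregate: Callable[[List[float]], float] = mean) -> float:
    """Aggregate per-rectangle changes over elements present in both layouts."""
    common = sorted(before.keys() & after.keys())
    return aggregate([delta(before[i], after[i]) for i in common])
```

Passing `aggregate=pvariance` instead of the default mean gives the variance-based variant; the per-rectangle function is interchangeable, which is exactly the two-stage structure (per-rectangle $\delta$, aggregated $\Delta$) described above.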
Corner-travel distance. As absolute metric, we use a metric inspired by existing absolute metrics [36, 35]. Let $W$ and $H$ be the width and height of the input rectangle $R$, respectively. Let $c_1, c_2, c_3, c_4$ ($c'_1, c'_2, c'_3, c'_4$) be the positions of the corners of a rectangle $R_i$ ($R'_i$). We define the corner-travel (CT) distance for a rectangle $R_i$ as

$$\delta_{CT}(R_i) = \frac{1}{4\sqrt{W^2 + H^2}} \sum_{k=1}^{4} \lVert c_k - c'_k \rVert_2, \qquad (2)$$

where $\lVert \cdot \rVert_2$ denotes the $L_2$ norm. Simply put, $\delta_{CT}$ is the corner-to-corner correspondence distance between $R_i$ and $R'_i$, which is a well-known metric used in computer vision to quantify change between two shapes using feature points [30]. This metric is very similar to the original change metric introduced by Shneiderman and Wattenberg [26], up to a small bounded factor. Note that $0 \le \delta_{CT}(R_i) \le 1$, since a rectangle corner can travel by at most the length of the diagonal of $R$.

Relative position change. As relative metric, we use the relative position change by Sondag et al. [29]. We choose this metric over the metric by Hahn et al. [14] since the latter is designed for general shapes and thus captures the relative position of rectangles less well. To compute the relative position of $R_i$ with respect to $R_j$, we partition the space surrounding $R_j$ into 8 sectors by extending the sides of $R_j$ to infinity (Figure 1). The relative position is defined as a vector $v_{ij} = (v_{ij}^{1}, \dots, v_{ij}^{8})$, where $v_{ij}^{k}$ is the fraction of $R_i$ that falls in sector $k$. We obtain two different vectors $v_{ij}$ and $v'_{ij}$ for treemaps $\mathcal{T}$ and $\mathcal{T}'$. We then define the relative position (RP) change of $R_i$ with respect to $R_j$ as

$$\delta_{RP}(R_i, R_j) = \frac{1}{2} \sum_{k=1}^{8} \left| v_{ij}^{k} - v'^{\,k}_{ij} \right|. \qquad (3)$$

To make this metric consistent with other metrics, we average all relative position changes of $R_i$ with respect to the other rectangles to obtain the relative position change for a single rectangle:

$$\delta_{RP}(R_i) = \frac{1}{|S| - 1} \sum_{j \in S,\ j \neq i} \delta_{RP}(R_i, R_j), \qquad (4)$$

where $S$ is the set of data elements present in both $\mathcal{T}$ and $\mathcal{T}'$.
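Both layout change metrics can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code: it assumes rectangles are `(x, y, width, height)` tuples keyed by hypothetical element ids, that the input rectangle has size $W \times H$, and that the pairwise RP change is one half of the $L_1$ distance between the two sector vectors (which keeps it in $[0, 1]$). The eight sectors are the cells of the $3 \times 3$ grid obtained by extending the sides of $R_j$, excluding the middle cell occupied by $R_j$ itself.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)
Layout = Dict[str, Rect]
INF = float("inf")


def corners(r: Rect) -> List[Tuple[float, float]]:
    x, y, w, h = r
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]


def delta_ct(r: Rect, r2: Rect, W: float, H: float) -> float:
    """Summed corner travel, normalized by 4 times the diagonal of the W x H region."""
    travel = sum(math.dist(c, c2) for c, c2 in zip(corners(r), corners(r2)))
    return travel / (4 * math.hypot(W, H))


def _overlap(lo1: float, hi1: float, lo2: float, hi2: float) -> float:
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))


def sector_vector(ri: Rect, rj: Rect) -> List[float]:
    """Fraction of ri's area in each of the 8 sectors around rj."""
    xi, yi, wi, hi = ri
    xj, yj, wj, hj = rj
    cols = [(-INF, xj), (xj, xj + wj), (xj + wj, INF)]
    rows = [(-INF, yj), (yj, yj + hj), (yj + hj, INF)]
    vec = []
    for r_idx, (y0, y1) in enumerate(rows):
        for c_idx, (x0, x1) in enumerate(cols):
            if r_idx == 1 and c_idx == 1:
                continue  # the middle cell is rj itself, not a sector
            ox = _overlap(xi, xi + wi, x0, x1)
            oy = _overlap(yi, yi + hi, y0, y1)
            vec.append(ox * oy / (wi * hi))
    return vec


def delta_rp_pair(i: str, j: str, T: Layout, T2: Layout) -> float:
    """Pairwise RP change (assumed form): half the L1 distance of the sector vectors."""
    v, v2 = sector_vector(T[i], T[j]), sector_vector(T2[i], T2[j])
    return 0.5 * sum(abs(a - b) for a, b in zip(v, v2))


def delta_rp(i: str, T: Layout, T2: Layout) -> float:
    """Average pairwise RP change of i over all other elements present in both layouts."""
    others = [j for j in T.keys() & T2.keys() if j != i]
    return mean(delta_rp_pair(i, j, T, T2) for j in others)
```

As a usage example, if rectangle `b` moves from the right of `a` to directly above it, `a` itself does not move (its CT distance is 0), yet its RP change with respect to `b` is 1, the maximum; this illustrates why the evaluation includes one absolute and one relative metric.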
Data change. The stability metrics discussed above do not take data change into account. If the data changes by a large amount, then the layouts should be allowed to change significantly without this being considered an instability. To add data change to a stability metric, one can consider the difference or ratio between the layout change and the data change [36, 35]. However, this raises two problems: (1) we need a way to measure data change such that (2) the metric spaces for data and for layouts are comparable. For example, data change can be measured in terms of changes of rectangle areas (since these map the data). However, layout changes such as the corner-travel distance measure lengths, not areas. Such measures are not directly comparable, and thus their ratios or differences may not be meaningful. Although such metrics could be made comparable by suitable normalizations, such adaptations are necessarily metric-specific and do not provide a generic solution.
Baseline treemap. We overcome the above issues with a new method that captures data change in the layout space. For this, we define a baseline treemap $\mathcal{B}$ with respect to $\mathcal{T}$ and the new areas $w'_i$. The layout of $\mathcal{B}$ (that is, the combinatorial structure of the rectangular subdivision which constitutes $\mathcal{B}$) is based on the layout of $\mathcal{T}$. However, the areas of the rectangles in $\mathcal{B}$ are the new areas $w'_i$. The idea is that $\mathcal{B}$ minimizes the layout distance to $\mathcal{T}$ among all treemaps with the areas $w'_i$. Put differently, $\mathcal{B}$ represents the minimum amount of change that any time-dependent treemapping algorithm must incur when moving from $\mathcal{T}$ and its associated area values $w_i$ to the next treemap $\mathcal{T}'$ and its area values $w'_i$. As a result, the layout change between $\mathcal{T}$ and $\mathcal{B}$ is a good measure of data change in the layout space.
We construct $\mathcal{B}$ using an algorithm from [11]. For a treemap $\mathcal{T}$, a maximal segment is a maximal contiguous horizontal or vertical line segment contained in the union of the boundaries of all rectangles in $\mathcal{T}$ (e.g., the green segments in Figure 2). For two horizontal maximal segments $s$ and $s'$, we say that $s \prec s'$ if there is a rectangle in $\mathcal{T}$ whose bottom side lies on $s$ and whose top side lies on $s'$. The transitive closure of this relation defines a partial order on the horizontal maximal segments. We define a partial order on the vertical maximal segments analogously (Figure 2). We say that two treemaps $\mathcal{T}_1$ and $\mathcal{T}_2$ are order-equivalent if their corresponding partial orders on maximal segments are isomorphic. As shown by Eppstein et al. [11], for every possible set of areas, there exists a treemap order-equivalent to $\mathcal{T}$ that correctly represents those areas. In particular, we can initially define $\mathcal{B}$ as the treemap order-equivalent to $\mathcal{T}$ with the areas $w'_i$.
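The horizontal part of this construction can be illustrated with a short Python sketch that extracts the horizontal maximal segments of a layout and the relation $s \prec s'$ between them. It is a hypothetical helper, not the algorithm of [11]: it assumes integer coordinates (to avoid floating-point equality issues), rectangles given as `(x, y, width, height)` with $y$ growing upward, and it does not test order-equivalence (isomorphism of the resulting partial orders). Vertical segments are handled symmetrically.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Rect = Tuple[int, int, int, int]     # (x, y, width, height); y grows upward
Segment = Tuple[int, int, int]       # horizontal maximal segment (y, x_start, x_end)


def horizontal_maximal_segments(rects: List[Rect]) -> List[Segment]:
    """Merge collinear bottom/top sides that overlap or touch into maximal segments."""
    by_y: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for x, y, w, h in rects:
        by_y[y].append((x, x + w))       # bottom side
        by_y[y + h].append((x, x + w))   # top side
    segments: List[Segment] = []
    for y in sorted(by_y):
        intervals = sorted(by_y[y])
        start, end = intervals[0]
        for s, e in intervals[1:]:
            if s <= end:                 # contiguous: extend the current segment
                end = max(end, e)
            else:
                segments.append((y, start, end))
                start, end = s, e
        segments.append((y, start, end))
    return segments


def _segment_containing(segments: List[Segment], y: int, x0: int, x1: int) -> Segment:
    for seg in segments:
        if seg[0] == y and seg[1] <= x0 and x1 <= seg[2]:
            return seg
    raise ValueError("side does not lie on a maximal segment")


def horizontal_order(rects: List[Rect]) -> Set[Tuple[Segment, Segment]]:
    """Pairs (s, s') such that some rectangle has its bottom on s and its top on s'.

    The transitive closure of this relation is the partial order on the
    horizontal maximal segments.
    """
    segs = horizontal_maximal_segments(rects)
    return {(_segment_containing(segs, y, x, x + w),
             _segment_containing(segs, y + h, x, x + w))
            for x, y, w, h in rects}
```

For a 2 × 2 square split into a bottom half and two top quarters, this yields three horizontal maximal segments (at heights 0, 1, and 2) ordered bottom to top; any layout order-equivalent to it keeps this order even when the areas change.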
If rectangles are inserted or deleted, the baseline treemap cannot be order-equivalent to $\mathcal{T}$, so we handle insertions and deletions separately. Dealing with deletions is easy: we simply let the corresponding areas go to zero. For insertions, we must be more careful. Indeed, while we consider only rectangles present in both $\mathcal{T}$ and $\mathcal{T}'$ when measuring stability ($R_i$ and $R_j$ in Equations 2 and 4), inserted rectangles can strongly impact the positions of the other rectangles in $\mathcal{T}'$. To deal with insertions as stably as possible, we observe that the baseline treemap does not strictly need to be a proper treemap, but only needs to capture how much rectangles must minimally move to update $\mathcal{T}$ to the new data. To minimize the movement of the rectangles due to insertions, we distribute the cumulative area of the inserted rectangles over the “walls” of treemap $\mathcal{T}$ as evenly as possible. To do this, we replace every maximal segment $s$ in $\mathcal{T}$ by a rectangle, and assign every such rectangle a portion of the inserted area proportional to the length of $s$ (Figure 3). Hence, the walls become equally thick everywhere, and the original rectangles of $\mathcal{T}$ need to move as little as possible to yield $\mathcal{B}$.
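The proportional distribution of inserted area can be made concrete with a small Python sketch. Because each wall's area share is proportional to its length, dividing a share by its length gives the same thickness for every wall, namely the total inserted area divided by the total wall length. The segment ids are hypothetical; which segments count as walls (for example, whether the outer boundary of $R$ is included) and how the thickened walls are then laid out are left to the caller and are not shown here.

```python
from typing import Dict, Tuple


def distribute_inserted_area(segment_lengths: Dict[str, float],
                             inserted_area: float) -> Tuple[Dict[str, float], float]:
    """Give each wall (maximal segment) a share of the inserted area
    proportional to its length; return (area per wall, common wall thickness)."""
    total_length = sum(segment_lengths.values())
    shares = {s: inserted_area * length / total_length
              for s, length in segment_lengths.items()}
    # area = thickness * length and shares are proportional to length,
    # so every wall ends up with the same thickness.
    thickness = inserted_area / total_length
    return shares, thickness
```

For walls of lengths 2, 1, and 1 and an inserted area of 0.4, the walls receive areas 0.2, 0.1, and 0.1, i.e., a uniform thickness of 0.1.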
The baseline treemap does not always minimize the movement of every rectangle. Still, our experiments show that in the vast majority of cases, the layout change between $\mathcal{T}$ and $\mathcal{B}$ is a good estimate of the minimum layout change between $\mathcal{T}$ and any treemap with the new areas $w'_i$, and thus a good measure of data change. Also, note that $\mathcal{B}$ is not an actual treemap that represents the input data. Rather, it is an artificially created treemap (hence the name ‘baseline’) which has many additional (gray) rectangles that represent the data change between time steps.
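Stability metric. We can now define a stability metric that takes data change into account. Consider a rectangle $R_i$ in $\mathcal{T}$ and the corresponding rectangles $R'_i$ and $R^{\mathcal{B}}_i$ in $\mathcal{T}'$ and $\mathcal{B}$, respectively, and let $\delta^{\mathcal{X} \to \mathcal{Y}}(R_i)$ denote the layout change of rectangle $R_i$ from a treemap $\mathcal{X}$ to a treemap $\mathcal{Y}$. Two natural choices for spatial stability are the difference or the ratio between $\delta^{\mathcal{T} \to \mathcal{T}'}(R_i)$ and $\delta^{\mathcal{T} \to \mathcal{B}}(R_i)$. Our experiments showed that the difference is typically more informative, that is, it typically exhibits clearer, more pronounced patterns, so we define the stability of a single rectangle for the CT and RP layout change metrics as

$$s_{CT}(R_i) = \delta_{CT}^{\mathcal{T} \to \mathcal{T}'}(R_i) - \delta_{CT}^{\mathcal{T} \to \mathcal{B}}(R_i), \qquad (5)$$

$$s_{RP}(R_i) = \delta_{RP}^{\mathcal{T} \to \mathcal{T}'}(R_i) - \delta_{RP}^{\mathcal{T} \to \mathcal{B}}(R_i). \qquad (6)$$

Note that $s_{CT}(R_i) < 0$ (respectively $s_{RP}(R_i) < 0$) if $R_i$ changes less from $\mathcal{T}$ to $\mathcal{T}'$ than from $\mathcal{T}$ to $\mathcal{B}$, which is possible since the baseline does not always minimize the movement of every rectangle. Indeed, a value of $s \le 0$ for $R_i$ represents “very stable”, and $R'_i$ is then considered to be (roughly) as stable as possible. As with visual quality, we also considered assigning higher weights to larger rectangles. However, our experiments showed only a minimal difference between the unweighted and weighted versions of the stability metrics. Hence, we omitted the weighted versions in our experiments.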
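Given the three layouts $\mathcal{T}$, $\mathcal{T}'$, and $\mathcal{B}$, the per-rectangle CT stability reduces to a difference of two corner-travel distances. The following minimal Python sketch assumes the baseline $\mathcal{B}$ has already been built, that rectangles are `(x, y, width, height)` tuples keyed by hypothetical element ids, and that the input rectangle is $W \times H$; the RP variant is analogous with the RP change substituted for `delta_ct`.

```python
import math
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)
Layout = Dict[str, Rect]


def delta_ct(r: Rect, r2: Rect, W: float, H: float) -> float:
    """Corner-travel distance within a W x H input rectangle."""
    def corners(q: Rect) -> List[Tuple[float, float]]:
        x, y, w, h = q
        return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]
    travel = sum(math.dist(c, c2) for c, c2 in zip(corners(r), corners(r2)))
    return travel / (4 * math.hypot(W, H))


def stability_ct(T: Layout, T_new: Layout, B: Layout,
                 W: float, H: float) -> Dict[str, float]:
    """s_CT for every element present in both T and T'.

    Positive values: the layout moved more than the baseline says the data
    change requires; values <= 0: (roughly) as stable as possible.
    """
    common = T.keys() & T_new.keys()
    return {i: delta_ct(T[i], T_new[i], W, H) - delta_ct(T[i], B[i], W, H)
            for i in sorted(common)}
```

For instance, if a unit square moves by 1 in $\mathcal{T}'$ but only by 0.5 in $\mathcal{B}$ inside a $3 \times 4$ region, its $s_{CT}$ is $0.2 - 0.1 = 0.1$: half of its movement is not explained by the data change.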
Limitations. The stability metrics we use focus only on consecutive time steps. The stability of time-varying treemaps could conceivably be influenced by effects that span multiple time steps, which our metrics do not capture directly. However, we believe that the most salient events influencing stability occur between consecutive time steps, and hence we limit the metrics accordingly.
3 Data
The visual quality and/or stability of treemapping algorithms heavily depends on the characteristics of the treated datasets. For example, Slice-and-Dice (SND) changes the slicing direction from horizontal to vertical on each level of the treemap. As a result, SND has low visual quality for shallow trees. A typical way to study the performance of several algorithms is to compare their worst-case or average-case behavior. However, this would not tell us anything about how actual performance correlates with the input data. We aim to provide sufficient insight so that both practitioners and researchers can make informed choices about which algorithm to use for their data. To do this, we study the performance of treemapping algorithms as a function of the input data characteristics. For this, we approximate the space of input trees by explicit features (Sec. 3.1) and then sample this space to create suitable datasets on which to evaluate the quality metrics proposed in Sec. 2.
3.1 Data space description
Our methodology is inspired by the framework proposed by Smith-Miles et al. [28] to objectively measure the performance of algorithms across an instance space. We approximate the infinite problem space of all possible time-dependent weighted trees by a low-dimensional feature space spanned by a small set of measurable characteristics (features) of the problem instances. Similar feature-based approaches are used to represent other data spaces in machine learning [4]. We propose to use the following four features:

Levels of hierarchy. Some treemapping algorithms (such as SND) use the hierarchy directly to compute the layout of the treemap, while other algorithms use additional information, such as an order on the input or the layout of the previous time step. This feature hence classifies treemapping algorithms based on how well they handle (deep) hierarchies and how much they depend on a certain depth of the hierarchy.
Variance of node weights. This feature strongly influences the visual quality of treemaps. Low-variance datasets are more regular and typically easier to lay out with high visual quality than high-variance datasets. This feature thus directly tests how effectively a treemapping algorithm optimizes visual quality.
Speed of weight change. The time-dependent nature of the data forces the treemaps to change over time. When data values change only slightly, not many structural changes are needed to maintain a high-quality treemap. If data values change rapidly, a treemapping algorithm must make a trade-off between stability and quality. This feature captures how well treemapping algorithms can handle fast-changing data. Here, we also distinguish between data that changes smoothly and data that changes in bursts (spikes).
Insertions and deletions. Frequent insertions and deletions negatively impact the stability of a treemap. This feature distinguishes treemapping algorithms that perform well only in the presence of very few insertions and deletions from those that effectively utilize insertions and deletions instead. We also distinguish between continuous insertions and deletions and spikes of large numbers of insertions and/or deletions.
Obviously, other features can be used to characterize (time-dependent) trees, e.g., the total node count; the minimum, maximum, and average node degrees; and the (im)balance of the tree structure, to mention just a few [5, 19]. While such features will influence performance aspects of treemapping algorithms such as running time, we believe they are less discriminative for treemapping quality. Separately, we must limit the number of features used to describe the data space so that sampling it remains practical in terms of the number of resulting datasets and of the quality computations performed on them, as outlined next.
3.2 Data space sampling
To evaluate the performance of treemapping algorithms over the feature space, we must sample this space with datasets that cover its four dimensions well. For this, we use a four-dimensional grid defined by carefully chosen feature values or value ranges, which we determined by analyzing the distribution of feature values over thousands of real-world tree datasets (see Section 4), as follows.
Levels of hierarchy (3 samples). We use three ranges: 1 level, 2 or 3 levels, and more than 3 levels. Most hierarchical datasets we have analyzed have 2 or 3 levels. This number of levels is quite common for datasets that are visualized via treemaps, mainly because visually understanding the node nesting in deeper treemaps becomes hard [37, 6]. This is also recognized by Tableau (Tableau visualization software, www.tableau.com), where treemaps having more than a few levels are not explicitly supported by visual cues. A special case is a dataset with only a single level, i.e., a set of weight values. Such datasets are also often visualized by treemaps, as these are more space-filling than alternatives such as bar charts [37]. Such single-level treemaps are challenging for treemapping algorithms that implicitly use the depth of the hierarchy. Finally, we consider datasets with more than 3 levels, which correspond to deep hierarchies such as file systems or software architectures [15, 14, 35].
Variance of node weights (2 samples). We use two ranges: low variance and high variance. To ensure that our classification is not strongly influenced by the total number of tree nodes, we use the coefficient of variation $c_v = \sigma / \mu$ to classify the datasets, with the standard deviation $\sigma$ and the mean $\mu$ computed over all leaf values over all time steps. We say that there is low variance if $c_v$ lies below a fixed threshold, and high variance otherwise.

Speed of weight change (3 samples). Let $S_t$ denote the set of data elements with nonzero weights at both of two consecutive time steps $t$ and $t+1$, and let $\Sigma_t$ and $\Sigma_{t+1}$ be the sums of $w_i(t)$ and $w_i(t+1)$, respectively, over all $i \in S_t$. We measure the weight change $\Delta w_t$ between $t$ and $t+1$ on the weights of the elements in $S_t$, normalized by $\Sigma_t$ and $\Sigma_{t+1}$, respectively. We define the speed of weight change as small if both the mean and the standard deviation of $\Delta w_t$ over all time steps are below a small threshold. In this case, the dataset’s weight changes are small, with few outliers. We define the speed of weight change as regular if the coefficient of variation of $\Delta w_t$ is at most 1, its mean is below a second threshold, and the speed of weight change is not small. In this case, significant changes happen, but the number of outliers is small. Finally, we define the speed of weight change as spiky if it is neither small nor regular. In this case, either very large changes continuously occur, or the coefficient of variation is large ($> 1$) and the changes are somewhat substantial.

Insertions and deletions (3 samples). For two consecutive time steps $t$ and $t+1$, let $N_t$ and $N_{t+1}$ denote the sets of data elements with nonzero weights at $t$ and $t+1$, respectively. We measure the impact of insertions and deletions as the cardinality of the symmetric difference between $N_t$ and $N_{t+1}$, relative to the cardinality of $N_t$, that is, $I_t = |N_t \,\triangle\, N_{t+1}| \,/\, |N_t|$. As above, and with the same reasoning, we classify the impact of insertions and deletions as small if both the mean and the standard deviation of $I_t$ over all time steps are below a threshold; as regular if the coefficient of variation of $I_t$ is at most 1, its mean is below a second threshold, and the data is not classified as small; and as spiky if it is classified as neither small nor regular.
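The classification rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: population statistics are used, the thresholds `small_thr` and `regular_thr` are hypothetical parameters (the concrete threshold values are not reproduced here), and the weight-change measure $\Delta w_t$ itself is not implemented, only the insertion/deletion impact $I_t$ and the shared small/regular/spiky rule. The last lines enumerate the resulting sampling grid.

```python
from itertools import product
from statistics import mean, pstdev
from typing import List, Set


def coefficient_of_variation(values: List[float]) -> float:
    """c_v = sigma / mu (population standard deviation; mu assumed > 0)."""
    return pstdev(values) / mean(values)


def insertion_deletion_impact(present_t: Set[str], present_next: Set[str]) -> float:
    """I_t = |N_t symmetric-difference N_{t+1}| / |N_t|."""
    return len(present_t ^ present_next) / len(present_t)


def classify(per_step: List[float], small_thr: float, regular_thr: float) -> str:
    """Label a per-step change series as 'small', 'regular', or 'spiky'.

    small_thr and regular_thr are hypothetical, caller-chosen thresholds.
    """
    mu, sd = mean(per_step), pstdev(per_step)
    if mu < small_thr and sd < small_thr:
        return "small"
    if sd <= mu and mu < regular_thr:   # coefficient of variation <= 1
        return "regular"
    return "spiky"


# The 3 x 2 x 3 x 3 sampling grid of the feature space.
CATEGORIES = list(product(["1 level", "2-3 levels", ">3 levels"],
                          ["low variance", "high variance"],
                          ["small", "regular", "spiky"],   # speed of weight change
                          ["small", "regular", "spiky"]))  # insertions/deletions
```

Testing `sd <= mu` instead of dividing avoids a division by zero when all changes are zero (such a series is already classified as small). With illustrative thresholds 0.05 and 0.3, a series such as `[0.0, 0.0, 0.0, 0.9]` is spiky: its mean is moderate, but its coefficient of variation exceeds 1.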
This sampling of the feature space yields $3 \times 2 \times 3 \times 3 = 54$ dataset categories. We next describe how we sampled real-world data along this grid to evaluate treemapping algorithm performance.
4 Experimental Pipeline
To assess the quality of dynamic treemapping, we designed and executed the following pipeline (Figure 4).
Datasets. We consider dynamic trees from several types of sources. We found 46 (out of 54) instances of our dataset categories.

Worldbank (https://data.worldbank.org/indicator/, accessed 04-07-2018): Time-dependent world development indicators of countries on topics such as agriculture, rural and urban development, education, trade, and health.

GitHub (https://github.com, accessed 16-07-2018): Hierarchies of folders, files, and classes, weighted by the number of code lines, extracted from all revisions of several popular GitHub repositories using Scitools (https://scitools.com).

Movies: Movies from MovieLens [16] and TMDB (The Movie Database, www.themoviedb.org, accessed 10-02-2018). We construct a time-dependent hierarchy from the movie data in these databases by the group-rows-by-attribute-value partitioning method presented in [32, 37]. The hierarchy groups movies based on their genres, actors, release date, and keywords. Each leaf is a movie, whose weight stores a statistical measure (sum, mean, standard deviation, or count) of that movie’s ratings over a given period of time.

Custom: We complement the above selection with hand-picked datasets, as follows: Dutch Names contains the frequency of popular baby names in the Netherlands per year (Meertens Instituut, KNAW, Nederlandse voornamenbank, https://www.meertens.knaw.nl/nvb, accessed 30-05-2016); UN Comtrade Coffee contains the amount of coffee each country imported per year (https://comtrade.un.org, accessed 15-02-2017); ATP contains personal information, historical rankings, and match results from 1968 to 2018 for ATP tennis players (https://github.com/JeffSackmann/tennis_atp, accessed 03-07-2018); and Earthquakes contains the time, location, depth, and intensity of seismic phenomena provided by the USGS Earthquake Hazards Program (https://earthquake.usgs.gov/earthquakes/browse/stats.php, accessed 03-07-2018). We manually selected these datasets to cover categories of the feature space that were not already sampled by the automatically mined datasets described earlier, and also to widen the provenance areas of our data.
Importantly, note that the above selection of dataset sources is orthogonal to the description of the feature space (Sec. 3). The former covers the origin of the data (which may include application-specific aspects not captured by our four-dimensional feature space); the latter covers application-independent data aspects, as captured by the 46 sampled points of our grid over the feature space.
To collect data from Worldbank, GitHub, MovieLens, and TMDB, we wrote several automation scripts to download the raw data, filter it to eliminate unusable (corrupted) datasets, create the hierarchies by grouping and binning data entries as needed, and finally compute the data features (Sec. 3).
We collected 2720 datasets in this way (summarized per category type in Tab. I). For full details on the naming convention of all the collected datasets, we refer to our online benchmark (https://github.com/EduardoVernier/treemapcomparisonresources). As visible in Tab. I, the number of datasets per category varies greatly. We randomly select one dataset from each category (cell in Tab. I) to include in our benchmark, i.e., 46 datasets in total. For eight categories (Tab. II, gray rows), we could not find a real-world dataset having the respective feature value combination, so we leave these out of our evaluation.
Our benchmark is not intended to provide reliable results for any single category; for that, a single dataset is simply not representative enough. However, the categorization of datasets allows us to reliably analyze the results along one (or perhaps two) of the dimensions of the feature space (see Section 5.4). By choosing one dataset per category, we obtain a good (roughly uniform) spread along any of the dimensions we are interested in, and a good variation in the other dimensions, avoiding hidden correlations. As a result, the 46 datasets chosen in this manner are much more informative than an ordinary sample of roughly as many datasets, which is generally biased by the origin of the data.
Algorithms. We evaluated 13 stateoftheart and/or wellknown algorithms for rectangular treemaps: Approximate (APP), Hilbert (HIL), Local Moves using zero (LM0) or four (LM4) local moves, Moore (MOO), Ordered with PivotbyMiddle (PBM), PivotbySize (PBZ), and PivotbySplit (PBS), SliceandDice (SND), Spiral (SPI), Split (SPL), Squarified (SQR), and Strip (STR).
Metric extraction. For each time step of each dataset, we run all considered algorithms, generate the baselines (Section 2), and record the layouts, i.e., the positions of all rectangles in all time steps. We use this data to compute our four quality metrics for each rectangle: the basic and area-weighted aspect ratios, the corner-travel instability, and the relative-position-change instability (Section 2). This yields a high-dimensional data collection: 46 tree datasets, tens to hundreds of time steps per dataset, 13 algorithms, and 4 quality metrics.
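The exact metric definitions (Section 2) are not reproduced in this part of the paper, so the following is only a rough illustration under stated assumptions: a rectangle is stored as (x, y, w, h); the aspect ratio is taken as min(w, h)/max(w, h), so 1 is a square and higher is better; the area-weighted variant is an area-weighted mean; and corner travel is the summed Euclidean distance moved by a cell's four corners between two time steps, without the normalization or baseline comparison of Equations 2, 5, and 6.

```python
import math

# Assumed encoding: a rectangle is (x, y, w, h) with w > 0 and h > 0
# (lower-left corner plus width and height). Deleted, zero-area cells
# must be filtered out before calling these helpers.


def aspect_ratio(rect):
    """min(w, h) / max(w, h): 1.0 for a square, close to 0 for a thin sliver."""
    _, _, w, h = rect
    return min(w, h) / max(w, h)


def weighted_aspect_ratio(rects):
    """Area-weighted mean aspect ratio: large cells contribute more than small ones."""
    total_area = sum(w * h for _, _, w, h in rects)
    return sum(w * h * aspect_ratio((x, y, w, h)) for x, y, w, h in rects) / total_area


def corners(rect):
    """The four corners of a rectangle, always listed in the same order."""
    x, y, w, h = rect
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]


def corner_travel(before, after):
    """Summed Euclidean distance moved by the four corners of one cell (unnormalized)."""
    return sum(math.dist(p, q) for p, q in zip(corners(before), corners(after)))
```

For a 2×2 square next to a 2×1 cell, the unweighted mean aspect ratio is 0.75, while the area-weighted mean is 5/6 (about 0.83), because the larger, square cell counts more.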
Analysis. To get insights into how algorithms, quality metrics, and datasets relate, we explore all measurements using several visualizations. Section 5 details this topic, including our findings.
Replication. The full set of materials – data collection scripts, all 2720 datasets, algorithms’ code, extracted metrics, visualization code, visualization snapshots, and videos showing the animated treemaps over time – is available online, for replicability. To our knowledge, this is the first such benchmark ever constructed (and made public) in treemapping research.
5 Result Exploration
Exploring the measured data (Section 4) is a challenge in itself. To go beyond per-algorithm or per-algorithm-and-dataset averages, we designed several visualizations that answer specific questions about the algorithms in a bottom-up way. First, we explore the data at the finest level of detail (Sec. 5.1). This yields insights at the rectangle level (all table rows), but cannot show all quality metrics (table columns). We then use increasingly aggregated views to compare methods from more viewpoints (table columns) but at coarser levels (fewer rows) (Secs. 5.2–5.4). We detail these next.
5.1 How to compare the algorithms’ induced changes? (Q1)
We want to see how the visual changes produced by each algorithm, at rectangle level, correlate with the baseline, i.e., the minimal changes needed to lay out the treemap for a given dataset. This helps us understand the behavior of our proposed stability metric (Equations 5 & 6), and how the baseline layout compares to concrete, existing treemap layouts. To show this, we plot, for each algorithm, the change of every rectangle in the algorithm’s layout against its change in the baseline layout, both for the corner-travel distance (Equation 2) and for the relative position change (Equation 4), for a selected dataset, all time steps, and all rectangles. Figure 5 shows this for the WorldBankExports dataset, which has average characteristics in our taxonomy (Table II). To better show the local sample density in such plots, which contain thousands of samples, we visualize a kernel density estimation [27] of the scatterplot points using a perceptually uniform heat colormap.
We see that most points are under the main diagonal in Fig. 5, so actual rectangle changes are (much) larger than the conservative baseline. However, variations between algorithms are significant: LM0, LM4, and SND have much smaller rectangle changes than all other algorithms (points are clustered close to the plot origin) for both absolute and relative motion. Separately, for all algorithms, the spread of the relative position change is larger than the spread of the corner-travel distance. Hence, all algorithms are better at keeping rectangles in (roughly) the same place in the treemap than at keeping them in the same relative position.
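The density coloring used in Fig. 5 can be illustrated by evaluating a 2D kernel density estimate [27] at every scatterplot point, so that each point is colored by how crowded its neighborhood is. The kernel and bandwidth used for the figure are not stated here; this sketch assumes an isotropic Gaussian kernel with a user-chosen bandwidth h, i.e., f(x) = 1/(n·2πh²) Σᵢ exp(−‖x − xᵢ‖²/(2h²)), evaluated directly in O(n²), which is adequate for a few thousand points but not optimized.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return f(x, y): isotropic Gaussian KDE of 2D `points` with bandwidth h."""
    n = len(points)
    two_h2 = 2.0 * bandwidth ** 2
    norm = 1.0 / (n * math.pi * two_h2)  # equals 1 / (n * 2 * pi * h^2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / two_h2)
        return norm * total

    return density


def density_per_point(points, bandwidth):
    """Density estimate at every sample, e.g. to drive a heat colormap."""
    f = gaussian_kde_2d(points, bandwidth)
    return [f(x, y) for x, y in points]
```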
5.2 How does the quality of a treemap evolve over time? (Q2)
Figure 6 shows, for all algorithms, how the quality metrics are distributed over the time steps of the same dataset used in Section 5.1. For each algorithm, each of the four quality metrics (basic and area-weighted aspect ratio, corner-travel and relative-position-change instability), and each time step, we draw a box plot showing low and high percentile values (gray bars), the median (black line), and the interquartile range (green bars) of the metric over all rectangles in that time step. We order the box plots left-to-right by decreasing median value of the metric. This shows the distribution of metric values over an entire sequence, i.e., how (un)stable an algorithm is and for how many time steps. Note that using the natural (time-step) order is not useful, since all algorithms except LM0 and LM4 are stateless.
We see that the spread of visual quality of the studied algorithms varies greatly over time. Consider first the unweighted aspect ratio (Figure 6a): The black curve shows the median aspect ratio per time step. The lower this curve is, and the faster it drops, the worse the median aspect ratios an algorithm yields per time step. The maximal median aspect ratio is similar for all algorithms except SND, which scores poorly (Figure 6a, red dots). The minimal aspect ratio varies more (Figure 6a, yellow dots), and so does the evolution of the median aspect ratio over time steps (shape of the black curve). SQR and STR give very good aspect ratios, followed closely by APP, LM0, and LM4. The size of the green bars shows the aspect-ratio spread over all rectangles of a treemap (time step). SQR and STR show a narrow spread around good aspect-ratio values. At the other extreme, SND has the (consistently) poorest aspect ratios. In contrast, SPI and PBZ have far larger spreads, meaning that the same treemap can contain both very good and very poor aspect-ratio rectangles. Finally, the gray bars show the absolute range of aspect ratios over all time steps. Most methods behave similarly, but APP and SPL have consistently better worst-case aspect ratios; and SQR and STR, while achieving very good average aspect ratios overall, also have significantly poorer worst-case values than other methods with good average aspect ratios, e.g., APP, LM0, and LM4.
The rectangle-area-weighted aspect ratio (Figure 6b) has a distribution quite similar to that of the unweighted aspect ratio, since the considered dataset has both large (more penalized) and small (less penalized) rectangles. Yet, for most algorithms, the box plots are slightly higher than the corresponding unweighted plots (Figure 6a). This indicates that the tested algorithms deliver better aspect ratios for large cells than for small ones. To our knowledge, this insight has not been reported so far in treemap research.
Algorithms also differ with regard to instability. The relative-position-change instability (Figure 6d) is higher than the corner-travel instability (Figure 6c) for all methods, which reinforces our earlier finding that preserving relative positions is harder than preserving absolute ones (Section 5.1). In Figure 6c,d, the shape of the black median curve shows the spread of median instability over all time steps, and the green bars show the spread within a single time step. We see that LM0, LM4, and SND are by far the most stable algorithms. This confirms earlier findings obtained on a much smaller benchmark [29] and is in line with the design of these methods: LM0 and LM4 explicitly aim to maximize stability, while SND’s design achieves this implicitly. APP, HIL, PBM, and SPL are among the least stable methods. The gray bars show the range of instabilities over all time steps. Overall, there is far less deviation between average and worst-case behavior for instability than for visual quality.
Combining these findings with those in Sec. 5.1, we find that visual quality and stability are, in general, competing goals. For instance, SND has poor visual quality but high stability, whereas SQR behaves conversely. However, LM0 and LM4 score quite well on both visual quality and stability.
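The per-time-step box plots and their ordering can be sketched as follows. The text does not state which percentiles the gray bars use, so the 5th/95th defaults below are hypothetical placeholders; quartiles come from Python's `statistics.quantiles` with the `inclusive` method, which interpolates linearly between data points. Each inner list is assumed to hold one metric's values for all rectangles of one time step (at least two values per step).

```python
import statistics


def boxplot_stats(values, low_pct=5, high_pct=95):
    """Box-plot summary of one metric over all rectangles of one time step."""
    # With n=100, quantiles() returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "low": cuts[low_pct - 1],    # lower gray bar (hypothetical percentile)
        "q1": cuts[24],              # 25th percentile: bottom of the green bar
        "median": statistics.median(values),  # black line
        "q3": cuts[74],              # 75th percentile: top of the green bar
        "high": cuts[high_pct - 1],  # upper gray bar (hypothetical percentile)
    }


def ordered_boxplots(metric_per_step):
    """One box plot per time step, ordered left-to-right by decreasing median."""
    stats = [boxplot_stats(values) for values in metric_per_step]
    return sorted(stats, key=lambda s: s["median"], reverse=True)
```

Because time steps are sorted rather than kept in natural order, the black median line is non-increasing by construction; what distinguishes algorithms is its height and how steeply it drops.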
5.3 How does quality vary over different datasets? (Q3)
So far, we studied the quality of algorithms at rectangle level (Section 5.1) and time-step level (Section 5.2) for single datasets. To generalize our findings, we next examine how methods perform over different datasets. For this, we compute, for every dataset in Table II, the averages of the four quality metrics over all rectangles and all time steps. Figure 7 shows these averages (rows are methods, columns are datasets). The color coding (a luminance colormap) is consistent over the four tables: bright cells signify high quality, dark cells low quality.
Several observations follow. First, no dataset column is mostly dark, or mostly bright, in all four tables. Hence, no dataset is consistently very hard, or very easy, to lay out when considering both visual quality and stability. However, differences between datasets exist: overall, columns that are bright in the top two tables (good aspect ratios) tend to be dark in the bottom two tables (poor stability), see e.g. dataset DutchNames (Fig. 7, marker B), and conversely, see e.g. dataset fwbSE.PRM.PRS5.ZS (Fig. 7, marker A) or dataset atpmatchesallplayersheight (Fig. 7, marker C). Scanning the tables row-wise confirms the earlier findings (obtained so far on a single dataset) that SND indeed has the poorest visual quality but one of the highest stabilities, and that LM0 and LM4 are very stable. Interestingly, no algorithm consistently achieves high visual quality on all datasets. Separately, the corner-travel and relative-position-change instabilities are strongly correlated for most datasets and algorithms, but the latter is overall larger than the former, which generalizes the earlier findings obtained on a single dataset. As a general observation, the relative-position-change instability is stricter than, and, since the two are correlated, seems to subsume, the corner-travel instability. This suggests which of the two metrics to use in practice, depending on how demanding the stability evaluation needs to be.
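The aggregation behind Fig. 7 can be sketched as a group-by over (algorithm, dataset). This sketch assumes a flat mean over all rectangle/time-step samples of a cell (a mean of per-time-step means would weight time steps differently) and a min-max normalization per table for the luminance mapping; both choices and the function names are illustrative assumptions. Instability tables are inverted so that bright still means high quality.

```python
from collections import defaultdict
from statistics import mean


def average_table(records):
    """records: (algorithm, dataset, value) samples of ONE metric, one per rectangle per time step."""
    cells = defaultdict(list)
    for algorithm, dataset, value in records:
        cells[(algorithm, dataset)].append(value)
    return {key: mean(values) for key, values in cells.items()}


def to_luminance(table, higher_is_better=True):
    """Min-max normalize cell averages to [0, 1]; 1.0 (bright) always means high quality."""
    lo, hi = min(table.values()), max(table.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all cells are equal
    lum = {key: (v - lo) / span for key, v in table.items()}
    if not higher_is_better:  # instability: low values are good, so invert
        lum = {key: 1.0 - v for key, v in lum.items()}
    return lum
```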
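The two observations about the instability metrics (strong correlation, and the relative-position change being the larger of the two) can be checked per table with a Pearson correlation and a simple count. A minimal sketch, assuming both instabilities are available as per-cell averages on comparable scales; the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def stricter_share(corner_travel, relative_position):
    """Fraction of cells where relative-position-change instability exceeds corner travel."""
    pairs = list(zip(corner_travel, relative_position))
    return sum(rpc > ct for ct, rpc in pairs) / len(pairs)
```

A correlation close to 1 together with a share close to 1 would match the reading that the relative-position change is the stricter of the two metrics.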
5.4 How do methods perform and compare as a function of the data? (Q4)
Figure 7 shows that visual quality and, to a lesser degree, stability depend not only on the algorithm but also on the dataset being considered. Understanding this dependency should help practitioners choose the best algorithm more precisely, based on the characteristics of the data they aim to visualize. To study this, we consider the classification of datasets implied by the data characteristics introduced in Sec. 4 (see also Tab. II), and analyze how visual quality and stability depend on these characteristics.
Figure 8 shows several scatterplots of the average visual quality vs the average instability of all algorithms run on specific groups of datasets. To show the per-dataset metrics of each algorithm, we add a star glyph [38] to each algorithm – each branch points to the scores of one dataset. Algorithms and star glyphs are categorically colored. Small star glyphs indicate algorithms with consistent quality over the considered datasets; long glyph branches indicate datasets for which the respective algorithm scores very differently from its average.
The first plot (Fig. 8a) shows all algorithms run on all 46 benchmark datasets. Overall, we see three groups: LM0 and LM4 score above-average and highly consistent visual quality, as well as excellent and highly consistent stability. SND scores consistent, and the highest, stability (though only marginally higher than LM0 and LM4), but consistently the poorest visual quality. All other 10 algorithms form a separate ‘cluster’ of similar average visual quality and stability. Overall, these 10 algorithms are less stable than LM0, LM4, and SND and, more interestingly, they also show a much higher variation of both stability and visual quality over the considered datasets (their star glyphs are quite large), and thus offer weaker guarantees of consistent behavior. This also shows that average quality metrics alone should not be used to compare treemapping algorithms, as these averages can be misleading. Among these 10 algorithms, SQR consistently obtains high visual quality, but is also one of the least stable. At the other extreme, MOO is the most stable in this group, but also scores second-poorest on visual quality, after SPI.
We further analyze how visual quality and stability depend on the data type. Considering weight variance (Fig. 8b), low-weight-variance datasets create a larger spread of both visual quality and stability than high-weight-variance datasets, while the relative positions of the algorithms (dots in the scatterplot) remain the same. In other words, algorithms behave more similarly for datasets with high weight variance than for datasets with similar-weight cells. Considering how weights change over time (Fig. 8c), a similar pattern emerges: there is more spread in instability for low weight changes than for regular and spiky weight changes. Still, the relative positions of the algorithms in the scatterplot stay roughly the same. Considering the number and dynamics of insertions and deletions during the tree sequence (Fig. 8d), the pattern changes: stability does not seem to be visibly influenced by variations in the insertion/deletion pattern, while visual quality drops slightly when more such events take place, which is expected. Finally, tree depth (Fig. 8e) has the highest influence on visual quality and stability: as trees get deeper, stability increases significantly. This is expected, as the motion of a rectangle is constrained within its parent. Interestingly, the overall visual quality of most algorithms also drops as trees get deeper. From this perspective, LM0 and LM4 again score very well, as they deliver consistent visual quality and very high stability for all considered tree depths.
In summary, we characterize the algorithms as follows:
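The geometry of one star glyph can be sketched as follows, assuming each dataset contributes one (visual quality, instability) point for the algorithm, the glyph center is the algorithm's average over those points (its dot in Fig. 8), and each branch runs from the center to one dataset's point. The longest branch is used here as a simple, hypothetical consistency indicator: small values correspond to small glyphs.

```python
import math


def star_glyph(per_dataset_scores):
    """per_dataset_scores: list of (visual_quality, instability) pairs for ONE algorithm.

    Returns the glyph center (the algorithm's average), the branches as
    (center, end point) segments, and the length of the longest branch.
    """
    n = len(per_dataset_scores)
    center = (
        sum(q for q, _ in per_dataset_scores) / n,
        sum(i for _, i in per_dataset_scores) / n,
    )
    branches = [(center, point) for point in per_dataset_scores]
    longest = max(math.dist(center, point) for point in per_dataset_scores)
    return center, branches, longest
```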
 G1: High stability, average visual quality. LM0 and LM4 exhibit very high stability and average or above-average visual quality for all considered datasets. Both metrics are consistent over the studied datasets and do not seem to depend on the dataset characteristics.
 G2: High visual quality, no stability promises. SQR, STR, and APP score high on visual quality regardless of the dataset characteristics, except for tree depth: their visual quality degrades for deeper trees. However, these methods give no stability guarantee; they are all quite unstable for shallow trees.
 G3: Compromise methods. MOO, HIL, and SPI (interestingly, all algorithmically related) are on average more stable than G2, but of lower visual quality; and they are less stable than G1, with comparable visual quality.
While the star plots in Fig. 8 show these insights, they do not let us easily pick the best algorithm(s) for a given type of dataset. To help with this selection, which is arguably of high value for practitioners, we next rank algorithms per dataset category, as follows (see Fig. 9): For each combination of the weight variance, insertions-and-deletions, weight change, and tree depth characteristics, we show a two-column table with the average visual quality (left column) and average instability (right column) of all algorithms over all datasets having the respective characteristics. The two columns are sorted separately to show the best-ranking algorithms at the top. Cells show the algorithm names and obtained scores, and are categorically color-coded by algorithm name, following the same color scheme as in Fig. 8. Empty cells in this table matrix indicate characteristic combinations that our benchmark does not cover (as explained in Sec. 4).
We can read the table matrix in Fig. 9 in several ways, to answer different questions. Concerning visual quality, STR and SQR are best for low-weight-variance data, but have difficulties with trees of 4 or more levels. For high weight variance, SQR, SPL, APP, PBS, and PBZ give the highest visual quality, but there is no clear overall winner (visual quality for high weight variance depends strongly on the other data characteristics). For all dataset types, SND scores poorest on visual quality. Concerning stability, SND, LM0, and LM4 consistently score highest for all dataset types, except for trees deeper than 4 levels with low weight variance and low weight change, where LM0 and LM4 still lead but SND is second-poorest. HIL, MOO, and SPI have very similar, but in general average, performance for all types of datasets. Separately, APP consistently delivers better visual quality than stability, regardless of dataset type, whereas PBM’s visual quality is lower than, similar to, or better than its stability, depending strongly on the dataset type.
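One cell of the table matrix in Fig. 9 can be sketched as follows: filter the per-dataset scores to one characteristic combination, average per algorithm, and sort the two columns independently (higher visual quality first, lower instability first). The tuple layout and the name `rank_cell` are illustrative assumptions; a combination not represented in the data yields two empty columns, matching the empty cells described above.

```python
from collections import defaultdict
from statistics import mean


def rank_cell(rows, category):
    """rows: (algorithm, category, visual_quality, instability) tuples, one per dataset."""
    quality, instability = defaultdict(list), defaultdict(list)
    for algorithm, cat, q, inst in rows:
        if cat == category:
            quality[algorithm].append(q)
            instability[algorithm].append(inst)
    # Left column: best (highest) average visual quality on top.
    left = sorted(((a, mean(v)) for a, v in quality.items()),
                  key=lambda t: t[1], reverse=True)
    # Right column: best (lowest) average instability on top, sorted independently.
    right = sorted(((a, mean(v)) for a, v in instability.items()),
                   key=lambda t: t[1])
    return left, right
```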
The table matrix can answer the following practical questions:

Which method is best for my data? Given a family of datasets with known characteristics, we look up the corresponding cell and pick the top-ranked algorithm(s) in visual quality, stability, or a combination of both, depending on the application requirements. When doing this, we should also examine the actual metric values, since in several cases algorithms score quite close to each other;

How does a given algorithm perform in general? To answer this, we scan the table for the color of the respective algorithm and note its rank (with respect to visual quality and/or stability) over all dataset types. This way, we can find algorithms that are consistently at the top (e.g., LM0 and LM4 for stability), but also outlier situations, e.g., STR, which generally scores very well for visual quality, but is second-poorest for 2–3-level-deep trees with low weight change, high weight variance, and regular insertions/deletions;

Which algorithms perform similarly? To answer this, we locate similar color patterns (groups of neighboring rows) in all tables. These indicate algorithms that score similarly regardless of the data type.
6 Discussion and Conclusion
We presented a methodology and results for evaluating the quality of time-dependent treemapping algorithms. For this, we modeled the problem space of all possible datasets via a four-dimensional feature space (tree depth, weight variance, weight change, and pattern of insertions and deletions), sampled this feature space with 46 real-world datasets to cover most feature-value combinations, proposed four visual quality and stability metrics, and compared 13 well-known algorithms on these metrics and datasets. All our material (datasets, methods, metrics, visualizations) is publicly available. We hope that it will serve as a starting point for an increasingly generic, and accepted, benchmark for time-dependent treemapping.
Problem space. Our sampling of the problem space (Table II) is, obviously, sparse. A complete, or even dense, coverage of the problem space would be infeasible, requiring hundreds if not thousands of datasets. Instead, we proposed a systematic approach to describe this space using four characteristic feature dimensions and a grid based on carefully chosen values. We attempted to cover different parts of this grid with real-world datasets. This is in contrast to the majority of existing treemapping evaluations, which do not explicitly consider the characteristics of the datasets used in experiments, such as shallow vs deep trees or slowly vs rapidly changing trees. Our evaluation shows that the tested algorithms behave quite differently depending on data characteristics. We believe that this is a key result of our paper, which should be taken into account in the design of any future comparative study of time-dependent treemapping algorithms; otherwise, findings can be strongly biased by the choice of samples.
Metrics. We evaluated the quality of the tested algorithms with regard to both visual quality and stability. We introduced a new stability metric which, for the first time, takes both layout change and data change into account. We believe this is a necessary change of paradigm: a treemap can show massive changes, but that does not mean that the treemapping algorithm is unstable if such changes are caused by large data changes. In other words, stability should be measured by studying the relation between layout change and data change. Doing this is non-trivial, since layout change and data change are generally incomparable. Therefore, we model the data change in the layout space by using a special baseline layout that represents the minimum amount of layout change necessary given the change in data.
The measured average per-algorithm stability metrics (Figure 7) support the rationale behind our stability model, as they show a consistent separation between stable and unstable algorithms for all types of considered datasets. We found no “global” winner, i.e., an algorithm scoring best in both visual quality and stability for all considered dataset types (see Section 5.4). We conclude:

To choose the optimal algorithm, and more broadly, to compare algorithms, one needs to take the specific class(es) of the input datasets into account. Our proposed taxonomy (Table II) and results presented for each class are a good starting point for this.

Understanding precisely how the performance of existing algorithms relates to the features of the datasets we identified (and possibly additional features) is key to designing improved treemapping algorithms.
Our metrics take into account only the ‘raw’ data output by treemapping algorithms, i.e., the positions and sizes of the cells at different time steps. In this sense, following the well-known model of the visualization pipeline (data import, filter, enrich, map, and render), our metrics assess the quality of the output of the mapping stage. This stage can be further divided into a layout operation (the treemapping algorithm proper) and an encoding stage, where decisions are taken on how to map additional data to, e.g., color, shading, texture, annotation, animation, or a third dimension (3D). Obviously, the end-to-end quality of a visualization is influenced by all such stages, so the suitability of the visualization for specific data analysis tasks depends on all design decisions taken in all these stages. Moreover, multiple visual variables affect each other in how they are perceived in such end-to-end visualizations [3, 22]. To limit such effects, and the difficulty of evaluating them, we focus here on the quality of the layout stage, which we quantify and measure as described. Studying how the compared treemapping algorithms fare perceptually against each other in an end-to-end context, when faced with specific tasks, is an interesting but different problem, which would require a different setup and methodology.
Limitations and future work.
As mentioned, our modeling and sampling of the problem space is quite coarse. Currently, we cover only 46 of the 54 relevant classes in our feature space. Ideally, we should find more datasets to cover the missing classes. However, finding real-world datasets of such types has proven to be challenging. Alternatively, it could be of interest to construct, and evaluate on, synthetic datasets. Doing this is, however, not trivial: creating datasets that avoid sampling biases and are representative of what can be encountered in practice is a challenging (but important) question on its own, in information visualization in particular and in data science in general. Separately, the features we proposed may not fully cover the variability of the problem space with regard to the quality of treemapping algorithms. We plan to carefully evaluate the feature selection by analyzing the results of multiple datasets that fall in the same class. These results will guide the addition of more features where and when necessary. Finally, we did not consider the computational scalability of the tested algorithms, since the implementations we currently have are not (uniformly) optimized. We plan to cover this by providing a baseline of similarly implemented and optimized algorithms, and to add this practically relevant metric to our benchmark. Given the open and extensible nature of our benchmark, adding this or other metrics is easy, and we hope that other researchers interested in treemapping will contribute to it.
Acknowledgments. The Netherlands Organisation for Scientific Research (NWO) is supporting M. Sondag and B. Speckmann under project no. 639.023.208, and K. Verbeek under project no. 639.021.541. This study was also financed in part by CAPES (Finance Code 001) and CNPq (Process 308851/2015-3).
References
 [1] M. Balzer, O. Deussen, and C. Lewerentz. Voronoi treemaps for the visualization of software metrics. In Proc. ACM Symp. on Software Visualization, pp. 165–172, 2005.
 [2] B. B. Bederson, B. Shneiderman, and M. Wattenberg. Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies. ACM Trans. on Graphics, 21(4):833–854, Oct. 2002.
 [3] J. Bertin. Sémiologie Graphique. Les diagrammes, les réseaux, les cartes. Gauthier-Villars, 1967.
 [4] C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
 [5] S. A. Boorman and D. C. Oliviera. Metrics on spaces of finite trees. J. of Mathematical Psychology, 10(1):26–59, 1973.
 [6] M. Bruls, K. Huizing, and J. J. van Wijk. Squarified treemaps. In Proc. Data Visualization, pp. 33–42, 2000.
 [7] S. Card, B. Suh, B. A. Pendleton, J. Heer, and J. W. Bodnar. Time tree: Exploring time changing hierarchies. In Proc. IEEE Symp. on Visual Analytics Science and Technology, pp. 3–10, Oct. 2006.
 [8] Y. Chen, X. Du, and X. Yuan. Ordered small multiple treemaps for visualizing time-varying hierarchical pesticide residue data. Visual Computer, 33(6):1073–1084, 2017.
 [9] M. de Berg, B. Speckmann, and V. van der Weele. Treemaps with bounded aspect ratio. Computational Geometry, 47(6):683–693, 2014.
 [10] B. Engdahl. Ordered and unordered treemap algorithms and their applications on handheld devices. MSc thesis, TRITANAE05033, Dept. of Comp. Sci., Stockholm Royal Inst. of Technology, Sweden, 2005.
 [11] D. Eppstein, E. Mumford, B. Speckmann, and K. Verbeek. Area-universal and constrained rectangular layouts. SIAM J. on Computing, 41(3):537–564, 2012.
 [12] J. Guerra-Gómez, M. Pack, C. Plaisant, and B. Shneiderman. Visualizing change over time using dynamic hierarchies: TreeVersity2 and the StemView. IEEE Trans. on Visualization and Computer Graphics, 19(12):2566–2575, Dec. 2013.
 [13] S. Hahn. Comparing the layout stability of treemap algorithms. Proc. HPI Research School on Service-Oriented Systems Eng., 95:71–79, 2015.
 [14] S. Hahn, J. Bethge, and J. Döllner. Relative direction change – a topology-based metric for layout stability in treemaps. In Proc. Int’l Conf. on Information Visualization Theory and Applications, pp. 88–95, 2017.
 [15] S. Hahn, J. Trümper, D. Moritz, and J. Döllner. Visualization of varying hierarchies by stable layout of Voronoi treemaps. In Proc. Int’l Conf. on Information Visualization Theory and Applications, pp. 50–58, 2014.
 [16] F. M. Harper and J. A. Konstan. The MovieLens datasets: History and context. ACM Trans. on Interactive Intelligent Systems, 5(4):19, 2016.
 [17] N. Kong, J. Heer, and M. Agrawala. Perceptual guidelines for creating rectangular treemaps. IEEE Trans. on Visualization and Computer Graphics, 16(6):990–998, 2010.
 [18] W. Köpp and T. Weinkauf. Temporal treemaps: Static visualization of evolving trees. IEEE Trans. on Visualization and Computer Graphics, 25(1):534–543, 2019.
 [19] M. K. Kuhner and J. Yamato. Practical performance of tree comparison metrics. Systematic Biology, 64(2):205–214, 2015.
 [20] L. Lu, S. Fan, M. Huang, W. Huang, and R. Yang. Golden rectangle treemap. J. of Physics: Conference Series, 787(1):012007, 2017.
 [21] J. Lukasczyk, G. Weber, R. Maciejewski, C. Garth, and H. Leitte. Nested tracking graphs. Computer Graphics Forum, 36:12–22, 2017.
 [22] J. D. Mackinlay, S. K. Card, and B. Shneiderman. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann, 1999.
 [23] H. Nagamochi and Y. Abe. An approximation algorithm for dissecting a rectangle into rectangles with specified areas. Discrete Applied Math., 155(4):523–537, 2007.
 [24] W. Scheibel, C. Weyand, and J. Döllner. EvoCells – a treemap layout algorithm for evolving tree data. In Proc. Int’l Conf. on Information Visualization Theory and Applications, pp. 273–280, 2018.
 [25] B. Shneiderman. Tree visualization with treemaps: a 2D space-filling approach. ACM Trans. on Graphics, 11(1):92–99, 1992.
 [26] B. Shneiderman and M. Wattenberg. Ordered treemap layouts. In Proc. IEEE Symp. on Information Visualization, pp. 73–78, 2001.
 [27] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall, 1986.
 [28] K. Smith-Miles, D. Baatar, B. Wreford, and R. Lewis. Towards objective measures of algorithm performance across instance space. Comput. Oper. Res., 45:12–24, May 2014.
 [29] M. Sondag, B. Speckmann, and K. Verbeek. Stable treemaps via local moves. IEEE Trans. on Visualization and Computer Graphics, 24(1):729–738, Jan. 2018.
 [30] R. Szeliski. Computer Vision: Algorithms and Applications. Springer, 2010.
 [31] S. Tak and A. Cockburn. Enhanced spatial stability with Hilbert and Moore treemaps. IEEE Trans. on Visualization and Computer Graphics, 19(1):141–148, 2013.
 [32] A. Telea. Combining extended table lens and treemap techniques for visualizing tabular data. In Proc. EuroVis, pp. 120–127, 2006.
 [33] Y. Tu and H.-W. Shen. Visualizing changes of hierarchical data using treemaps. IEEE Trans. on Visualization and Computer Graphics, 13(6):1286–1293, 2007.
 [34] R. van Hees and J. Hage. Stable and predictable Voronoi treemaps for software quality monitoring. Information and Software Technology, 87:242–258, 2017.
 [35] E. Vernier, J. Comba, and A. Telea. Quantitative comparison of dynamic treemaps for software evolution visualization. In IEEE Conf. on Software Visualization, 2018.
 [36] E. Vernier, J. Comba, and A. Telea. A stable greedy insertion treemap algorithm for software evolution visualization. In IEEE Conf. on Graphics, Patterns and Images, 2018.
 [37] R. Vliegen, J. J. van Wijk, and E. J. van der Linden. Visualizing business data with generalized treemaps. IEEE Trans. on Visualization and Computer Graphics, 12(5):789–796, 2006.
 [38] M. O. Ward. A taxonomy of glyph placement strategies for multidimensional data visualization. Information Visualization, 1(3/4):194–210, 2002.
 [39] M. Wattenberg. A note on space-filling visualizations and space-filling curves. In Proc. IEEE Symp. on Information Visualization, pp. 181–186, 2005.
 [40] M. Zhou, Y. Cheng, N. Ye, and J. Tian. Effectiveness and efficiency of using different types of rectangular treemap as diagrams in cartography. In Int’l Cartographic Conf., pp. 187–206, 2017.