Picturing Bivariate Separable-Features for Univariate Vector Magnitudes in Large-Magnitude-Range Quantum Physics Data

We present study results from two experiments to empirically validate that separable bivariate pairs for univariate representations of large-magnitude-range vectors are more efficient than integral pairs. The first experiment with 20 participants compared: one integral pair, three separable pairs, and one redundant pair, which is a mix of the integral and separable features. Participants performed three local tasks requiring reading numerical values, estimating ratio, and comparing two points. The second 18-participant study compared three separable pairs using three global tasks when participants must look at the entire field to get an answer: find a specific target in 20 seconds, find the maximum magnitude in 20 seconds, and estimate the total number of vector exponents within 2 seconds. Our results also reveal the following: separable pairs led to the most accurate answers and the shortest task execution time, while integral dimensions were among the least accurate; it achieved high performance only when a pop-out separable feature (here color) was added. To reconcile this finding with the existing literature, our second experiment suggests that the higher the separability, the higher the accuracy; the reason is probably that the emergent global scene created by the separable pairs reduces the subsequent search space.



1 Introduction

Fig. 1: (a) Five bivariate configurations of univariate vector magnitude using scientific notation. This example shows vector magnitude 440 (4.4 × 10^2), depicted using two values: digit 4.4 and power 2. (b) Contours shown using the length-color (LC) pair. This work demonstrates that more separable pairs lead to efficient local comparisons of a couple of vectors. Global scene structures guided by more separable dimensions also led to more accurate and highly efficient strategies without the need to synthesize univariate magnitudes.

Bivariate glyph visualization is a common form of visual design in which a dataset is depicted by two visual variables, often chosen from a set of perceptually independent graphical dimensions of shape, color, texture, size, orientation, curvature, and so on [1, 2]. A bivariate glyph design has been used to show univariates for quantum physicists at the National Institute of Standards and Technology (NIST) to examine simulation results; thanks to their team’s Nobel-prize-winning scalable simulations, quantum physicists world-wide can now simulate at any scale. A critical quantum-physics analysis task is to understand spin (often depicted as a vector) magnitude variations, because these magnitudes, which show atom behaviors, are large in range and often discontinuous: they can vary greatly in local regions. While a multitude of glyph techniques and design guidelines have been developed and compared in two dimensions (2D) [1] [34] [ward2008multivariate], a dearth of three-dimensional (3D) glyph design principles exists. One reason is that 3D glyph design is exceptionally challenging because human judgments of metric 3D shapes and relationships contain large errors relative to the actual structure of the observed scene [todd1995distortions] [todd2003visual]. Often only structural properties in 3D left invariant by affine mappings are reliably perceived, such as the parallelism of lines and planes and relative distances in parallel directions. As a result, 3D glyphs must be designed with great care to convey relationships and patterns, as 2D principles often do not apply [36].

Imagine visual search in a 3D large-magnitude-range vector field, where the smallest and the largest vector magnitudes differ by several orders of magnitude. On the visualization side, the initial design and evaluation of large-magnitude-range vector visualizations use scientific notation to depict digit and power as two concentric cylinders [3]: inside and outside tube lengths are mapped to digit and power, respectively (aka splitVectors, Figure 1e). A three-dimensional (3D) bivariate glyph scene of this splitVectors design (Figure 2e) achieved up to ten times greater accuracy than the traditional direct linear univariate mapping (linear) (Figure 2f) for reading the ratio between two vector magnitudes. However, this bivariate splitVectors glyph also increases task completion time for an apparently simple comparison task between two vectors in 3D. For that task, linear is a significantly more time-efficient approach than splitVectors.
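The digit-and-power decomposition that splitVectors relies on can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name `split_magnitude` is hypothetical, and the sketch assumes strictly positive magnitudes.

```python
import math


def split_magnitude(value: float) -> tuple[float, int]:
    """Split a positive magnitude into (digit, exponent) such that
    value == digit * 10**exponent and 1 <= digit < 10.

    Hypothetical helper for illustration only.
    """
    if value <= 0:
        raise ValueError("magnitude must be positive")
    exponent = math.floor(math.log10(value))
    digit = value / 10 ** exponent
    # Guard against floating-point rounding near exact powers of ten.
    if digit >= 10:
        digit, exponent = digit / 10, exponent + 1
    elif digit < 1:
        digit, exponent = digit * 10, exponent - 1
    return digit, exponent


# The Figure 1 example: magnitude 440 is shown as digit 4.4 and power 2.
digit, exponent = split_magnitude(440)
print(digit, exponent)
```

Each of the bivariate feature-pairs discussed in this paper then maps `digit` and `exponent` to two visual features (for example, length and color).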

(a) LL (integral)
(b) LCL (redundant encoding)
(c) LC (separable)
(d) LT (separable)
(e) splitVectors [3]
(f) Linear
Fig. 2: Large-magnitude-range contours computed from a simulation result are shown using five bivariate feature-pairs and the linear representation.

One may frame this large-magnitude-range issue as a visual design problem: how can we depict a univariate quantity using bivariate visual features or glyphs to help quantum physicists examine complex spatial data? Intuitively, the last empirical study result on vector magnitude comparisons agrees well with a design consensus: to obtain a single magnitude at each location, the human visual system integrates these two component parts (digit and exponent terms) into one gestalt. This integration is referred to as holistic processing by Ware [4] in visualization and as  feature binding by Treisman [5] in vision science. Both study how our visual system combines separate object features such as shapes, color, motion trajectories, sizes, and distances into the whole object. Ware [4] (G5.14) further recommends that: “If it is important for people to respond holistically to a combination of two variables in a set of glyphs, map the variables to integral glyph properties.” Since comparison is a holistic recognition task, to represent a univariate vector magnitude we should always use integral properties (visual properties perceived together as a unit) or linear visualizations, instead of the separable (features manipulated and perceived independently) length pair in splitVectors.

Essentially, the current bivariate glyph design treats visual design as a bottom-up stimulus-driven composition in which the visual properties of object features (e.g., orientations and colors) are combined into single objects (here vectors). In this work, we challenge this consensus and argue, first, that feature binding need not occur at the object (vector) level and, second, that the bivariate splitVectors glyph gives viewers a correspondence challenge that does not arise when integral dimensions or direct linear encoding is used: the need to relate the two quantitative variables to their visual features hampers its efficiency. There are two ways to describe human experiences. If the visual field has only one or two objects at a time and if splitVectors of length-pairs are used, a viewer would take longer to process information in order to determine which length is the exponent and which is the mantissa. We suggest that in this case, if correspondence errors account for the temporal costs with bivariate feature pairs, then techniques preventing this type of error can be as effective as a direct linear encoding, without time-consuming correspondence search.

Our method utilizes the fact that binding between separable variables is not always successful, so a viewer can adopt a sequential task-driven viewing strategy based on visual hierarchy theory [16] to obtain the gross regional distribution of larger exponents. After this, a lower-order visual comparison within the same exponent can be achieved, and no binding is needed as long as the correspondence between the two visual features can be easily understood. With these two steps, judging large or small or perceiving quantities accurately from separable variables may be no more time-consuming than with single linear glyphs.

Now, for example, if we increase the feature separability by replacing the exponent-to-length mapping in Figures 1e and 2e to exponent-to-color mapping in Figures 1c and 2c for comparison tasks, it would be counterproductive for our attention first to visit each glyph to compute the magnitude (driven by bottom-up process). Instead, the global categorical color (hue) can guide our attention to first compare and categorize the colors, prior to a visual comparison when the colors are the same (within the same exponent) to compare vector lengths. In this case, no object-level binding is needed as long as the correspondence between the two visual features can be easily understood.
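The sequential viewing strategy described above can be expressed as a two-stage comparison. The sketch below is only an illustration of the reasoning, not a model of perception; the `Glyph` type and the `larger` function are hypothetical names, and the mapping of exponent to color and digit to length follows the LC pair.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    """Hypothetical LC glyph: exponent shown as color, digit as length."""
    exponent: int
    digit: float  # assumed to satisfy 1 <= digit < 10


def larger(a: Glyph, b: Glyph) -> str:
    """Return "A", "B", or "equal" using the two-stage strategy."""
    # Stage 1: compare the categorical colors (exponents) first.
    if a.exponent != b.exponent:
        return "A" if a.exponent > b.exponent else "B"
    # Stage 2: only when the colors match, compare the lengths (digits).
    if a.digit == b.digit:
        return "equal"
    return "A" if a.digit > b.digit else "B"


print(larger(Glyph(2, 4.4), Glyph(1, 9.9)))  # decided by color alone
print(larger(Glyph(2, 4.4), Glyph(2, 6.3)))  # decided by length
```

Note that the full magnitude `digit * 10**exponent` is never computed, which mirrors the claim that no object-level binding is required.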

Further considering the viewers’ task relevant to multiple vector objects (e.g., find maximum), the same sequential viewing looking for the subregions followed by length-inspection works equally well. The reason is that feature binding need not occur at the object level, but can be done first at the scene level, and scene context benefits the reduction of search regions when there is no ambiguity in finding the correspondences. Coincidentally, this first impression of the data to drive statistical information is also called holistic or global pattern processing [6]; Wolfe called features guiding this top-down task-driven attention behaviors as scene features [7]. Here, we may refer to Ware’s holistic features [4] as object-level holistic and Biederman’s [6] and Wolfe’s [7] as scene-level holistic design thinking.

Selecting more separable visual variables (i.e., those manipulated and perceived independently) to represent a unitary data value would initially be considered problematic. Often, at the object level, the two features (here showing the mantissa and exponent terms) must be combined into objects (here the magnitudes). In our study, choosing separable pairs utilizes the fact that binding between separable variables to form a univariate value is not always successful: a viewer can thus view the salient (exponent) terms to obtain the gross (regional) distribution, after which visual comparison of mantissas within the same-exponent regions can be achieved. No object-level binding of bivariate features for the univariate value is needed as long as the global-scene structures formed by the two visual features can be easily understood.

There is compelling evidence that separable dimensions are not processed together but are broken down into their component parts and processed independently. Correspondence error is influenced by the choice of separable dimensions. According to Treisman [8] and Wolfe [9], the initial preattentive phase is the major step towards improved comprehension, more important than the attentive phase. We select the “most recognizable” dimensions: color (Figures 1b, 1c, 2b, 2c), texture (Figures 1d, 2d), and size (Figures 1a, 2a). Size and color are preattentive and permit visual selection at a glance, at least in two dimensions (2D). We purposefully select texture patterns by varying the amount of dark on white, thus introducing luminance variations when many vectors are examined together (Figure 2d). Our results support that the more separable pairs achieve the same temporal cost as the linear method. Compared to the continuous random noise in Urness et al. [10], our texture is for discrete quantities and thus uses regular scale variations. When comparing integral and separable dimensions, we anticipated that preattentive pop-out features in separable dimension pairs might reduce correspondence errors compared to integral dimensions. Following this logic, we hypothesize that highly distinguishable separable dimension pairs might erase the costs associated with correspondence errors, reducing task completion time and improving accuracy.

We tested this hypothesis in two experiments with six tasks using four feature-pairs to compare against splitVectors (separable) in Zhao et al. [3]: LL (integral), LC (separable), LT (separable), and LCL (redundant and separable). Since we predict that separable dimensions with more preattentive features would reduce the task completion time, the most separable pairs might achieve more efficiency than other bivariate feature-pairs without hampering accuracy.

This work makes the following contributions:

  • Empirically validates that bivariate glyphs encoded by highly distinguishable separable dimensions improve comparison task completion time (Exp 1).

  • Is the first to explain the benefits of the global scene-guidance which expands the widely accepted object-level bivariate glyph design in visualization (Exp 2).

  • Offers a rank order of separable variables for 3D glyph design and shows that the separable pairs LC and LT are among the most effective and efficient feature pairs.

2 Theoretical Foundations in Perception and Vision Sciences

At least three perceptual and vision science theories have inspired our work: integral and separable dimensions [11], preattentive features ranking [12, 13, 14, 15], and monotonicity [2].

Terminology. To avoid confusion, we adapt terms from Treisman and Gelade [5] in vision science to visualization. We use “visual dimension” to refer to the complete range of variation that is separately analyzed by some functionally independent perceptual subsystem, and “visual feature” to refer to a particular value on a dimension. Thus color, texture, and size are visual dimensions; gray-scale, spatial-frequency, and length are the features on those dimensions. Our “visual dimension” is thus most similar to Bertin’s “visual variables” [28] in the visualization domain. Differentiating the terms dimension and features is necessary for us in the long-term to compare design both within (as features) and between the dimensions.

Integral and Separable Dimensions. Garner and Felfoldy’s seminal work on integral and separable dimensions [11] has inspired many visualization design guidelines. Ware [4] suggests a continuum from more integral to more separable pairs: (red-green)-(yellow-blue), (x-size)-(y-size), color-shape/size/orientation, motion-shape/size/orientation, motion-color, and group position-color. His subsequent award-winning bivariate study [2] using hue-size, hue-luminance, and hue-texton (texture) supports the idea that the more separable dimensions of hue-texton lead to higher accuracy. Our work follows the same ideas of applying integral and separable dimensions but differs from Ware’s texton selection in two important aspects: the dependency between the two variables to be represented, and whether or not the texture encodes continuous and ordered data. First, the Ware study focuses on finding relationships between two independent data variables, and thus his tasks are analytical; in contrast, ours demands two component parts from a unitary variable represented in two parts. Second, our texture uses the amount of black and white to show continuous local spatial frequency (luminance variations), in contrast to the discrete shape variation in textons. We anticipate that ours will be more suitable to continuous quantitative values [16]. No existing work we know of has studied whether or not separable features can facilitate global comparisons and can be scaled to 3D vector field analysis.

Feature-Binding and Scene-Guidance Theories. Treisman and Gelade’s feature-integration theory of attention [8] showed that the extent of difference between target and distractors for a given feature affects search time. This theory may explain why splitVectors was time consuming: the similarity of the two lengths may make them interfere with each other in the comparison, thus introducing temporal cost. What we “see” depends on our goals and expectations. Wolfe et al. propose the theory of “guided search” [7, 17], a first attempt to incorporate users’ goals into viewing, suggesting that what users see is based on users’ goals. Wolfe et al. further suggest that color, texture, size, and spatial frequency are among the most effective features in attracting the user’s attention.

Building on this research, our current study shows that viewers can be task-driven and adopt optimal viewing strategies to be more efficient. No existing visualization work to our knowledge has studied how viewers’ strategies in visual search influence bivariate visualization of two dependent variables. While Ware has recommended holistic representations for holistic attributes, our empirical study results suggest the opposite: that separable pairs can be as efficient as holistic representations.

Preattentive and Attentive Feature Ranking. Human visual processing can be faster when it is preattentive, i.e., perceived before it is given focused attention [8]. The idea of pop-out highlighting of an object is compelling because it captures the user’s attention against a background of other objects (e.g., in showing spatial highlights [18]). Visual features such as orientation and color (hue, saturation, lightness) can generate pop-out effects [8] [19]. Healey and Enns [20] in their comprehensive review further remark that these visual features are also not popped-out at the same speed: hue has higher priority than shape and texture [21].

Visual features can also be responsible for different attention speeds, and color (hue) and size (length and spatial frequency) are among those that guide attention [16]. For visualizing quantitative data, MacKinlay [14] and Cleveland and McGill [15] leverage the ranking of visual dimensions and suggest that position and size are quantitative and can be compared in 2D. Casner [22] extends MacKinlay’s APT by incorporating user tasks to guide visualization generation. Demiralp et al. [23] evaluate a crowdsourcing method to study subjective perceptual distances of 2D bivariate pairs of shape-color, shape-size, and size-color. When adopted in 3D glyph design, these studies further suggest that the most important data attributes should be displayed with the most salient visual features, to avoid situations in which secondary data values mask the information the viewer wants to see.

Monotonicity. Quantitative data encoding must normally be monotonic, and various researchers have recommended a coloring sequence that increases monotonically in luminance [24]. In addition, the visual system mostly uses luminance variation to determine shape information [25]. There has been much debate about the proper design of a color sequence for displaying quantitative data, mostly in 2D [26] and in 3D shape volume variations [27]. Our primary requirement is that users be able to read large or small exponents at a glance. We chose four color steps in the first study and up to seven steps in the second study for showing areas of large and small exponents, mapped to a hue-varying sequence with monotonic luminance: the higher the luminance, the higher the exponent. We do not claim that these color sequences are optimal, only that they are reasonable solutions to the design problem [26].

3 Experiment I: Local Discrimination and Comparisons

The goal in this first experiment is to quantify the benefits of separable pairs for visual processing of a few items. This section discusses the experiment, the design knowledge we can gain from it, and the factors that influence our design.

3.1 Methods

3.1.1 Bivariate Feature-Pairs

We choose five bivariate feature-pairs to examine the comparison task efficiency of separable-integral pairs.

LL (integral) (Figure 1a). Lengths encode digits and exponents, shown as the diagonal and height of the cylinder glyphs.

LCL (redundant and separable) (Figure 1b). Compared to LL, this pair adds a redundant color (luminance and hue variations) dimension to the exponent; the four sequential colors are chosen from Colorbrewer [26].

LC (separable) (Figure 1c). This pair maps four exponents to color. Pilot testing shows that correspondence errors in this case would be the lowest among these five feature-pairs.

LT (separable) (Figure 1d). Texture represents exponents. The percentage of black color (Bertin [28]) is used to represent the exponential terms 0, 1, 2, and 3, wrapped around the cylinders in five segments to make them visible from any viewpoint.

splitVectors [3] (separable) (Figure 1e). This glyph uses splitVectors [3] as the baseline and maps both digit and exponent to lengths. The glyphs are semitransparent so that the inner cylinders showing the digit terms are legible.

Feather-like fishbone legends are added at each location when the visual variable length is used. The tick-mark band is depicted as subtle light-gray lines around each cylinder. Distances between neighboring lines show a unit length legible at certain distance (Figure 1, rows 2 and 3).

3.1.2 Hypotheses

Given the analysis above and recommendations in the literature, we arrived at the following working hypotheses:

  • Exp I.H1. (Overall). The LC feature-pair can lead to the most accurate answers.

    Several reasons lead to this conjecture. Color and length are separable dimensions. Colors can be detected quickly, so length and color are highly distinguishable. Compared to the redundant LCL, LC reduces density since its feature-pairs are generally smaller than those in LCL.

  • Exp I.H2. (Integral-separable). Among the three separable dimensions, LC may lead to the greatest speed and accuracy and LT would be more effective than splitVectors.

    The hypothesis could be supported because color and length are highly separable.

  • Exp I.H3. (Redundant hypothesis). The redundant pair LCL will reduce time compared to splitVectors.

    This hypothesis could be supported because redundancy increases information processing capacity.

3.1.3 Tasks

Participants perform the following three task types as in Zhao et al. [3] so that results are comparable. They had unlimited time to perform these three tasks.

Exp1.Task 1 (MAG): magnitude reading (Figure 3(a)). What is the magnitude at point A? One vector is marked by a red triangle labeled “A”, and participants should report the magnitude of that vector. This task requires precise numerical input.

Exp1.Task 2 (RATIO): ratio estimation (Figure 3(b)). What is the ratio of magnitudes of points A and B? Two vectors are marked with two red triangles labeled “A” and “B”, and participants should estimate the ratio of magnitudes of these two vectors. The ratio judgment is the most challenging quantitative task [14]. Participants can either compare the glyph shapes or decipher each vector magnitude and compute the ratio mentally.

Exp1.Task 3 (COMP): comparison (Figure 3(c)). Which magnitude is larger, point A or B? Two vectors are marked with red triangles and labeled “A” and “B”. Participants select their answer by directly clicking the “A” or “B” answer buttons. This task is a simple comparison between two values and offers a binary choice of large or small.

Exp1.Task 4 (MAX): identifying the extreme value within 30 seconds. Which point has maximum magnitude when the exponent is X? X in the study was a number from 0 to 3. Participants need first to locate points with exponent X and then select the largest one of that group. Compared to Task 3, this is a global task requiring participants to find the extreme among many vectors.

(a) Exp1 MAG task: What is the magnitude of the vector at point A? (answer: 636.30)
(b) Exp1 RATIO task: What is the ratio of the magnitude between the vectors at points A and B? (answer: 3.60)
(c) Exp1 COMP task: Which magnitude is larger, point A or point B? (answer: A on the right.)
Fig. 3: Experiment 1: Local discrimination and comparison tasks.

3.1.4 Data Selection

Because we are interested in comparing our results to those in Zhao et al. [3], we replicate their data selection method: we generate the data by randomly sampling quantum physics simulation results to produce samples within 3D boxes. There are 445 to 455 sampling locations in each selected data region.

We select data satisfying the following conditions: (1) the answers must be at locations where some context information is available, i.e., not too close to the boundary of the testing data; (2) no data sample is repeated for the same participant; (3) since the data must cover a broad range of measurements, we select the task-relevant data from each exponential term of 0 to 3.

For task 1 (MAG, What is the magnitude at point A?), point A was near the center of the bounding box in each data sample. In addition, the experiment had four trials for each variable pair, with one instance of each of the exponent values 0, 1, 2, and 3. For task 2 (RATIO, What is the ratio of the magnitudes of points A and B?), points A and B are again randomly selected; the choice of exponents is the same as in task 1. The ratios were always larger than 1. For task 3 (COMP, Which magnitude is larger, point A or point B?), the points again must be near the center of the bounding box. The magnitude of one point is around 0.2 of, and the magnitude of the other point around 0.5 of, the maximum magnitude in the data sample used for the corresponding trial. For task 4 (MAX, Which point has maximum magnitude when the exponent is X?), X was an instance of the exponent values in the exponent range. In the first study, the range was fixed to 4.

3.1.5 Empirical Study Design

Design and Order of Trials. We use a within-subject design with one independent variable, the bivariate feature-pair (five types). Dependent variables are relative error (for MAG and RATIO) or accuracy (for COMP) and task completion time. We also collect participants’ confidence levels. The accuracy measure follows Zhao et al. [3] to study how sensitive a method is to error uncertainty based on the relative error (RE) or fractional uncertainty, calculated as RE = |correct answer - participant answer| / (correct answer). This measure is used for the MAG and RATIO tasks. The benefit of this approach is that it takes into account the value of the quantity being compared and thus provides an accurate view of the errors.
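As a concrete reading of the relative error measure, the following sketch computes RE for a single response. It assumes the absolute difference is intended (the usual definition of fractional uncertainty); the function name `relative_error` is hypothetical.

```python
def relative_error(correct: float, answer: float) -> float:
    """Relative error (fractional uncertainty) of one response.

    Assumes RE = |correct - answer| / |correct|; the absolute value
    is an assumption based on the standard definition.
    """
    if correct == 0:
        raise ValueError("relative error is undefined for a zero reference")
    return abs(correct - answer) / abs(correct)


# A MAG-style trial with the Figure 3(a) answer, 636.30, as reference.
print(relative_error(636.30, 600.0))   # a small misreading of the digit
print(relative_error(636.30, 6363.0))  # an exponent misread by one
```

Because the error is normalized by the correct value, a misread exponent dominates any misread digit, which is why exponent mismatches are handled separately in the analysis.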

Block Participant Feature-pair
1 P1, P6, P11, P16 splitVectors, , , ,
2 P2, P7, P12, P17 , , , splitVectors,
3 P3, P8, P13, P18 , , , splitVectors,
4 P4, P9, P14, P19 , , splitVectors, ,
5 P5, P10, P15, P20 , , , , splitVectors
TABLE I: Experiment I design: 20 participants are assigned to one of the five blocks and use all five bivariate pairs.

Table I shows that participants are assigned to five blocks in a Latin-square order, and within one block the order of the five feature-pair types is the same. Participants perform tasks with randomly selected datasets for each encoding on each task type. Each participant performed one trial for every combination of task, random dataset, and feature-pair. We ran four trials for each encoding method, one per exponent. These four random data are from four exponent ranges.
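The block assignment can be sketched as follows. This is a generic cyclic Latin square, used here only as an assumption to illustrate the counterbalancing idea; it is not the exact ordering of Table I. The participant-to-block rule (P1, P6, P11, P16 in block 1, and so on) does follow the table. All function names are hypothetical.

```python
def cyclic_latin_square(conditions):
    """Build an n x n Latin square by cyclically shifting the conditions."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


def block_of(participant_id, n_blocks=5):
    """Map participant P1..P20 to blocks 1..5 (P1, P6, P11, P16 -> block 1)."""
    return (participant_id - 1) % n_blocks + 1


# Abbreviations as used in Figure 2 and Table II.
PAIRS = ["splitVectors", "LL", "LCL", "LC", "LT"]
ORDERS = cyclic_latin_square(PAIRS)


def order_for(participant_id):
    """Presentation order of the feature-pairs for one participant."""
    return ORDERS[block_of(participant_id, len(PAIRS)) - 1]


for pid in (1, 6, 2, 20):
    print(f"P{pid}: block {block_of(pid)} -> {order_for(pid)}")
```

In a Latin square every feature-pair appears once in each row and once in each ordinal position, so order effects are balanced across the five blocks.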


We diversify the participant pool as much as possible, since all tasks can be carried out by those with only some science background. Twenty participants (15 male and 5 female, mean age = 23.3, and standard deviation = 4.02) participated in the study, with ten in computer science, three in engineering, two in chemistry, one in physics, one in linguistics, one in business administration, one double-major in computer science and math, and one double-major in biology and psychology. The five female participants are distributed one per block (Table I). On average, participants spent about 40 minutes on the computer-based tasks.

Procedure, Environment, and Interaction. Participants are greeted and complete an Institutional Review Board (IRB) consent form. All participants had normal or corrected-to-normal vision and passed the Ishihara color-blindness test. They filled in the informed consent form (which described the procedure, risks, and benefits of the study) and the demographic survey. We showed feature-pair examples and trained the participants with one trial for every feature-pair per task. They were told to be as accurate and as quick as possible, and that accuracy was more important than time. They could ask questions during the training but were told they could not do so during the formal study. Participants practiced until they fully understood the feature-pairs and tasks. After the formal study, participants filled in a post-questionnaire asking how these feature-pairs supported their tasks and were interviewed for their comments.

Participants sat at a BenQ GTG XL 2720Z gamma-corrected display with resolution 1920 × 1080. The distance between the participants and the display was about . The minimum visual angle of task-associated glyphs was in the default view where all data points were visible and filled the screen.

Participants could rotate the data and zoom in and out. Lighting placement and intensity were chosen to produce visualizations with contrast and lighting properties appropriate for human assumptions and the spatial data. The screen background was a neutral, stimulus-free gray to minimize its influence on the discriminability and appearance of colors [4]. Using a black or white background would make the black and white texture stimuli disappear and thus bias the results.

3.2 Experiment I: Results and Discussion

Task   Variable        Significance           ES
MAG    time            F = 6.8, p < 0.0001    0.07
                       (LC, LT, LCL, splitVectors) > LL
       relative error  F = 0.9, p = 0.46      0.01
RATIO  time            F = 6.2, p < 0.0001    0.06
                       Three groups: A: LC, splitVectors, LT
                                     B: splitVectors, LT, LCL
                                     C: LT, LCL, LL
       relative error  F = 0.8, p = 0.50      0.01
COMP   time            F = 10.4, p < 0.0001   0.09
                       Three groups: A: LCL, LC, LT
                                     B: LC, splitVectors
                                     C: splitVectors, LL
       accuracy        χ² = 0.4, p = 0.98     0.03

TABLE II: Summary statistics by tasks. The significant main effects and the high effect sizes (ES) are in bold (none in these observations) and the medium effect sizes are in italic. Effect size is eta-squared, labeled “small” (0.01), “medium” (0.06), and “large” (0.14) following Cohen [29]. Post-hoc Tukey grouping results are reported for significant main effects, where > means statistically significantly better and enclosing parentheses mean the pairs belong to the same Tukey group.
(a) Task 1 (MAG)
(b) Task 2 (RATIO)
(c) Task 3 (COMP)
Fig. 4: Task completion time and relative error or accuracy by tasks. The horizontal axis represents the mean task completion time, while the vertical axis shows the accuracy or relative error. Same letters represent the same post-hoc analysis group. Colors label the feature-pair types. All error bars represent confidence intervals.

3.2.1 Analysis Approaches

We collected 400 data points for each task. In preparing the accuracy and task completion time for analysis, a trial was considered to have the first type of correspondence error if the response’s exponent value did not match the correct one for the MAG task. These correspondence errors occurred when participants had trouble differentiating the levels within an encoding.
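The exponent-mismatch criterion can be sketched as a simple check on each MAG response. This is an illustrative reading of the rule stated above, with hypothetical function names; treating a non-positive response as an error is an added assumption, since its exponent is undefined.

```python
import math


def exponent_of(value: float) -> int:
    """Power of ten of a positive magnitude in scientific notation."""
    return math.floor(math.log10(value))


def is_correspondence_error(correct: float, answer: float) -> bool:
    """True if the response's exponent differs from the correct exponent.

    Assumption: non-positive responses are flagged as errors.
    """
    if answer <= 0:
        return True
    return exponent_of(correct) != exponent_of(answer)


print(is_correspondence_error(636.30, 600.0))  # same exponent (2)
print(is_correspondence_error(636.30, 63.63))  # exponent 1 instead of 2
```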

We detected 11 instances of the first type of correspondence error in MAG (three splitVectors, five -, one -, and two -). This correspondence error also appeared to be influenced by the integral-separable dimension: the integral pair - had the most instances (5) and - had none. We used only the remaining correct ones in the statistical analysis because these errors would mask all other data by being at least one order of magnitude larger. For the remaining data in MAG and all data in the RATIO and COMP tasks, we used standard outlier detection by first calculating the mean and standard deviation across all trials for each participant and pruning any trials that were more than two standard deviations from that participant’s mean. With this approach, no outlier was detected in the MAG, RATIO, and COMP tasks.
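The per-participant pruning rule can be illustrated as follows. This is a minimal sketch, assuming the sample standard deviation is used (the text does not say whether the sample or population form was applied); `prune_outliers` is a hypothetical helper name.

```python
from statistics import mean, stdev


def prune_outliers(values, k=2.0):
    """Keep one participant's trials that lie within k standard deviations of that participant's mean."""
    if len(values) < 2:
        return list(values)  # a spread cannot be estimated from a single trial
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption)
    return [v for v in values if abs(v - m) <= k * s]
```

The function would be applied separately to each participant's trials, so a slow participant is compared only against their own mean.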

Table II and Figure 4 show the F and p values computed with SAS one-way repeated-measures analysis of variance for task completion time (log-transformed to obtain a normal distribution), the Friedman test of accuracy, and repeated-measures logistic regression on confidence levels. Post-hoc analyses are adjusted by Bonferroni correction. A post-hoc analysis using Tukey’s Studentized Range test (HSD) was performed when we observed a significant main effect. When the dependent variable was binary (i.e., answer correct or wrong), we used a logistic regression and reported the p value from the Wald test. When the p value was less than 0.05, variable levels whose odds-ratio confidence intervals did not overlap were considered significantly different. All error bars represent confidence intervals. We also evaluated effect sizes, using Cohen’s d for continuous data (e.g., time) and Cramer’s V for binary choices (e.g., accuracy), to understand practical significance [29]. Effect sizes reported as eta-squared are labeled “small”, “medium”, and “large” following Cohen’s benchmarks [29].
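To make the effect-size labels concrete, the sketch below computes a plain between-groups eta-squared (between-group sum of squares over total sum of squares) and maps it to a label. It is an illustration only: the thresholds 0.01, 0.06, and 0.14 are the commonly cited Cohen benchmarks for eta-squared and are an assumption here, since the values did not survive in the text, and the simple formula ignores the repeated-measures structure of the actual analysis. Both function names are hypothetical.

```python
def eta_squared(groups):
    """Between-groups eta-squared: SS_between / SS_total for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total if ss_total > 0 else 0.0


def label_effect(eta2, small=0.01, medium=0.06, large=0.14):
    """Label an eta-squared value using assumed Cohen-style thresholds."""
    if eta2 >= large:
        return "large"
    if eta2 >= medium:
        return "medium"
    if eta2 >= small:
        return "small"
    return "negligible"
```

With these assumed thresholds, the time effect sizes reported in Table II (0.07, 0.06, and 0.09) all fall in the medium band.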

3.2.2 Overview of Study Results

All hypotheses but H2 are supported. Our results clearly demonstrated the benefits of separable dimensions for comparison in terms of task completion time. We observed a significant main effect of feature-pair type on task completion time for all three tasks (MAG, RATIO, and COMP), and the effect sizes were in the medium range (Table II, Figure 4). - was the most efficient approach. and had the least error. For COMP, -, -, and - were most efficient for simple two-point comparison (Figure 4(c)). and were most accurate for group comparisons (Figure LABEL:fig:task4timeError). A most surprising result was that both and were highly accurate and efficient.

3.2.3 Separable Dimensions Are Better Than Integral Dimensions for Local Comparisons

Our separable-integral hypothesis (H1) was supported. In the MAG tasks, the integral - was least efficient and all the separable pairs were in a separate group, the most efficient one (Figure 4(a)). In the RATIO tasks, -, -, and were the most efficient group (Figure 4(b)); in the COMP tasks, the redundant -, -, and - were in the most efficient group (Figure 4(c)).

SplitVectors was not as bad as we originally thought in handling correspondence errors, especially for the quantitative reading tasks of MAG and RATIO. SplitVectors belonged to the same efficient post-hoc group as - and - for the RATIO tasks, and these three were also most efficient for MAG. MAG and RATIO are the only two quantitative tasks. In contrast, the - pairs did lengthen the task completion time.

We speculate that this result may indicate that when the comparison set size was small, participants did not need scene-level information to achieve accuracy. We anticipate that when the search space set-size increases, the search will become time-consuming and the lack of scene-level features would increase correspondence error and thus reduce effectiveness. We observe this in Experiment II.

The general order of these three separable visual variable pairs was that more separable pairs were more efficient, and the efficiency and effectiveness of these feature-pairs were very much task dependent. One of the most interesting results is that the separable - and - resulted in high efficiency in nearly all tasks: - functioned just as well as - with comparable subjective confidence levels. One explanation is that the black/white texture scales on a regular grid may lead to luminance variation, which attracts attention [16] and thus directly contributes to discrimination of the global and spatial pattern differences.

3.2.4 Separable Pairs of - and - Achieved Comparable Efficiency to Direct Linear Glyphs

Critical for motivating this experiment was whether the separable pairs supported COMP and how the separable pairs compared in efficiency to the direct mapping. Since our study had the same number of sample data as Zhao et al. [3], we performed a one-way t-test to compare against the direct linear encoding in Zhao et al. [3]. Our separable-hypothesis (H2) was supported and our results indicated that COMP (judging large or small) from separable variables was no more time-consuming than direct linear glyphs. Our post-hoc analysis showed that -, -, and were in the same post-hoc group, i.e., there were no significant differences between these features. We also observed that splitVectors dropped to the least efficient or most error-prone post-hoc groups (Fig. 4(c)). This result replicated the former study results in Zhao et al. [3] by showing that splitVectors impaired comparison efficiency or effectiveness.

This result may be explained by the idea that the highly separable pairs may turn the comparison into a single-dimension digit comparison task, since a viewer could quickly resolve the two exponents, thus reducing the correspondence error introduced by the splitVectors design.

Relative Error or Accuracy. We adopted the error metric for quantitative data of Cleveland and McGill [15] for task types MAG, RATIO, and MAX. This metric calculates the absolute difference between the user’s judged value and the true value using the formula $\log_2(|\text{judged} - \text{true}| + 1/8)$, where the base-2 logarithm was appropriate for relative error judgments and the added $1/8$ prevented distortion of the results towards the lower end of the error scale, since some of the absolute errors were close to 0.
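For readers who want to reproduce the error metric, here is a minimal sketch. It assumes the metric is the log absolute error published by Cleveland and McGill, log2(|judged − true| + 1/8); whether the study used exactly this form is an assumption, and `log_abs_error` is a hypothetical name.

```python
import math


def log_abs_error(judged: float, true: float, offset: float = 1 / 8) -> float:
    """Log absolute error: log2(|judged - true| + offset); the offset keeps log2 finite at zero error."""
    return math.log2(abs(judged - true) + offset)
```

A perfect answer yields log2(1/8) = -3, which becomes the floor of the error scale instead of negative infinity.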

Participants rated their confidence level after each trial during the computer-based study. Preferences were collected in the post-questionnaire. Both were on a scale of 1 (least confident or preferred) to 7 (most confident or preferred). Significant effects of the glyph type on confidence were observed only in the holistic comparison task of MAX, not in the local tasks of MAG, RATIO, and COMP. was the top preferred glyph for all tasks followed by and then . The two length-based pairs, whether orthogonal or parallel, were least preferred. The confidence levels followed a similar trend as the preferences.

3.2.5 Redundant Feature-Pairs Were Efficient

We might also compare the coloring effect with that of Healey, Booth, and Enns [19]. Their single-variate study showed that color was strongly influenced by the surroundings of the stimulus glyph, causing a significant interference effect when participants had to judge heights of glyphs or density patterns. We did not observe such effects here because the colors are discrete and can be easily distinguished.

We tested this conjecture through our observations with some new quantum physics simulation datasets from our collaborators as shown in Figure 2. We can easily discriminate the boundaries between the adjacent magnitude variations in the - (Figure 2e) and - (Figure 2d) feature boundaries and these two share a similar effect.

We think that - was an effective and efficient feature-pair for quantitative tasks because using the same feature type within the glyph perhaps reduced the cognitive load and also because scales of parallel lines are preserved in 3D.

It is worth noting that the only difference among these tasks was that the first two (MAG and RATIO) involved visual discrimination (knowing precise values or how much larger) and COMP involved visual detection (larger or higher). For MAG and RATIO, a long time may have been spent on mentally synthesizing the numerical values at individual locations. Our results further confirmed that visual discrimination and visual detection were fundamentally different comparison tasks, as shown in Borgo et al. [borgo2014order].

The errors or accuracy were task-dependent and perhaps depended on set-size. No task (MAG, RATIO, or COMP) showed a significant main effect on errors or accuracy. Note that none of these three tasks required initial visual search, and target answers were labeled. Wolfe called this type of task-driven search with a known target a guided task [17]. - was most accurate in all task types. We thought at first that error might be related to so-called proximity, i.e., the perceptual closeness of visual variable choices to the tasks. The coloring was perhaps more direct. However, since participants commented that they read those quantities, we think the reason for not observing differences could well be the similar mental computation cost. When the search-space set-size increases for the MAX tasks, the search becomes time-consuming and none of the length pairs (- and -) is effective.

We also confirmed hypothesis H3. We were surprised by the large performance gain with the redundant encoding - of mapping and to the exponents in splitVectors. With redundant encoding, the relative error was significantly reduced and task completion time was much shorter (significantly shorter for MAG and COMP tasks). While Ware [4] confirmed that redundancy encoding was integrated into the encoded dimension, in our case, where color and size were separable, we suggest that the redundancy worked because participants could use either length or color in different task conditions. Since we could also consider - a redundant encoding with -, and it did better than - in some tasks (MAG and COMP), we may arrive at a design recommendation: when the integral dimensions of - are less accurate, adding a more separable dimension such as color to the integral encoding may help improve task completion time and accuracy.

3.3 Summary

All tasks (MAG, RATIO, and COMP) lacked a significant main effect on relative error (in MAG and RATIO) or accuracy (in COMP). Note that none of these three tasks required initial visual search, and target answers were labeled. Wolfe called this type of task-driven search with a known target a guided task [7]. - was most accurate in all task types. We thought at first that relative error might be related to so-called proximity [30], i.e., the perceptual closeness of visual variable choices to the tasks. The coloring was perhaps more direct. However, since participants commented that they read those quantities, we suspect that the reason for not observing differences could well be the similar mental computation load. Since the search-space set-size was small, participants found that utilizing the object-level features was sufficient. When the search-space set-size increases, as in the MAX tasks, the search becomes time-consuming and the length pairs would not be effective. We subsequently carried out the second experiment to increase the set size to the entire scene to study scene guidance and correspondence errors.

4 Empirical Study II: Global Scene-Features

So far, we have validated the efficiency and effectiveness of the separable pairs only for simple tasks with a couple of items. The goal of the second experiment is to address the high-level hypothesis and quantify the benefits of separable feature-pairs for tasks in search spaces as large as the entire dataset of several hundred items.

4.1 Overview

We had two considerations in setting up this experiment. The first was a statement about feature design relevant to the global holistic experience. If the vector field contains one object at a time, then the integral and separable dimensions and associated correspondence error may explain our object experience as we have shown in Experiment I. However, when the binding problem is raised by looking at multiple vectors, it is possible that object binding does not occur at the individual object-level but rather at the scene-level first, perhaps governed by global gestalt features.

The second consideration is relevant to the correspondence errors when managing holistic global viewing experiences, which may not be significant when only a few items are compared. Generally, subjective reports from the first study indicate that - and - show similar perceptual speed. For a feature to actually guide attention, Wolfe [9] suggests that a just-noticeable difference for that feature is not sufficient: one must also look at the feature distractors and whether or not they are heterogeneous, and the efficiency of scene guidance will decline as a function of the degree of distractor variation. Efficiency will be achieved if the target and distractors are “linearly separable”, meaning that a line can be drawn separating the target from the distractors in the feature space. This is similar to the studies of Acevedo et al. [31] on saliency measures and Urness et al. [10] on “texture stitching”. Acevedo et al. attempted to show that features can be segmented, and Urness et al. showed boundaries from continuous flow fields using spatial frequency. Chung et al. showed the order and just-noticeable differences of visual stimuli and suggested that size and hue had the highest accuracy in 2D and that texture value introduces greater ordering effects [32]. In our study, we believe that colors (especially in multiple hues) are categorical and thus should produce better perceptual speed than texture and length. Performance of texture may decline faster than color as the data range increases because our vision is not as sensitive to luminance variation as to hues. The efficiency of color in Experiment I could well arise because the range (of 4) was not large enough. The current study expands the data range from the single level in the first study to five ranges to understand feature-pair scalability. SplitVectors produces the second type of correspondence error between two lengths, which can challenge human eyes to see the two component parts.

4.2 Method

4.2.1 Feature-Pairs

Fig. 5: Experiment II: Example data using a categorical colormap and a segmented continuous colormap at two data densities: (a) continuous colormap and high-density data, (b) continuous colormap and low-density data, (c) categorical colormap and high-density data, (d) categorical colormap and low-density data. The boundaries between the data categories are more recognizable when the data are dense, as in (a) and (c). The boundaries are more difficult to recognize in (b) than in (d). We use the categorical colormaps in Experiment II.

We used -, -, and baseline splitVectors in Experiment II. These three visualizations were chosen because - and - were among the best feature-pairs in Experiment I and because color and texture are among the most separable features according to Ware [4]. To introduce a correspondence error or “distractor” experience, we varied the data range from the 4 levels in Experiment I to 3-7 levels in Experiment II.

We had two reasons to use categorical hue instead of quantitative colormaps. The first was based on a subjective comparison of a categorical colormap from Colorbrewer [26] and a continuous colormap segmented by the number of exponents, generated from the extended blackbody colormap (Figure 5). As we can see, boundary detection with these colormaps may be associated with data density. We found that unless the data density was reasonably high, detecting the boundaries using continuous colormaps (Figures 5(a) and 5(b)) was harder than with the Colorbrewer colormaps (Figures 5(c) and 5(d)). The second reason is that the initial at-a-glance global statistical summary of the scene depends on categorical information [9], so a categorical visual encoding may be more suitable. An informal observation on texture choices was that detecting maximum and minimum regions was easier than detecting any intermediate regions.

4.2.2 Hypotheses

We had the following hypotheses:

  • Exp II.H1. (Accuracy). More separable pairs will be more effective. We thus anticipate a rank order of effectiveness from high to low: -, -, and splitVectors.

    While we did not see a significant main effect in Experiment I, we believed that for tasks related to multiple objects (vectors), pop-out color features would reduce the correspondence error and facilitate scene-level feature binding.

  • Exp II.H2. (Correspondence error). Less separable pairs would lead to more first-type correspondence errors, in which participants choose the wrong exponent level.

  • Exp II.H3. (User behavior). More separable feature-pairs would lead to optimal users’ behaviors: i.e., participants can quickly locate task-related regions for tasks that demand looking among many vectors.

4.2.3 Tasks

(a) SEARCH: Find the vector with magnitude X. (X: 731, answer: the point marked by two yellow triangles. No answer or feedback was provided during the study.)

(b) MAX: Which point has the maximum magnitude when the exponent is X? (X=1, answer: the point marked by two yellow triangles. No answer or feedback was provided during the study.)

(c) NUMEROSITY (NUM): Estimate the total number of vector exponents of the entire vector field within 2 seconds. (answer: 7)
Fig. 6: Experiment II three task types. The callouts show the task-relevant feature-pair(s).

Participants performed three tasks in which they had to compare all vectors to obtain an answer.

Exp II.Task 1 (SEARCH): Find the vector with magnitude X within 20 seconds (Figure 6(a)). The target vector was shown at the bottom-right corner of the screen, and participants were asked to find this vector.

Exp II.Task 2 (MAX): Within 20 seconds, locate the point that has the maximum magnitude when the exponent is X (Figure 6(b)). X in the study was a number from 0 to the maximum exponent. This was a global task requiring participants to find the extremum among many vectors.

Exp II.Task 3 (NUMEROSITY): Estimate the total number of vector exponents in the entire vector field within 2 seconds (Figure 6(c)). Data are randomly chosen and modified to produce the 3 to 7 range. No data is used repeatedly in this experiment.

4.2.4 Data Choices

Data were first sampled using the same approach as Experiment I. We then varied the exponent range from 3 to 7 levels for the three tasks by normalizing the data to the desired new data range. Doing this let us preserve the critical domain-specific attributes of the data, namely their spatial structures, and alter only the magnitude range, to improve the applicability and reuse of our study results.
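One possible reading of this normalization step is a linear rescaling in log space, which changes the number of exponent levels while keeping the rank order (and hence the spatial structure) of the magnitudes. The sketch below illustrates that reading only; the actual procedure is not specified in the text, and `rescale_exponent_levels` and `exponent_levels` are hypothetical names.

```python
import math


def rescale_exponent_levels(magnitudes, n_levels):
    """Linearly rescale log10(magnitude) so exponents run from 0 to n_levels - 1.

    Rank order is preserved; only the magnitude range changes (assumed method).
    """
    if n_levels < 1:
        raise ValueError("n_levels must be at least 1")
    logs = [math.log10(m) for m in magnitudes]  # requires positive magnitudes
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [1.0 for _ in magnitudes]  # degenerate field with a single magnitude
    # Stop just short of n_levels so the largest exponent is n_levels - 1.
    top = n_levels - 1e-6
    return [10 ** ((x - lo) / (hi - lo) * top) for x in logs]


def exponent_levels(magnitudes):
    """Sorted list of the distinct base-10 exponents present in the data."""
    return sorted({math.floor(math.log10(m)) for m in magnitudes})
```

The smallest and largest exponents are guaranteed to be 0 and `n_levels - 1`; whether every intermediate level is populated depends on how the data are distributed.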

Prior literature used both synthetic data and real-world data to construct the data visualization as test scenarios, enabling tight control over the stimulus parameters. Most of the synthetic data in these studies were generated to replicate real-world data characteristics and others were explained in fictitious use scenarios. The goal was primarily to prevent preconceived user knowledge about the domain-specific attributes. As a result, the synthetic data strike the right balance between real-world uses and the data characteristics. In our case, replicating the characteristics of quantum physics data was challenging and indeed impossible, since atom behaviors in high-dimensional space were largely unknown and thus were not easily simulated. Our approach was therefore to randomly sample quantum physics simulation results to capture domain-specific attributes and then modify the data to suit evaluation purposes. We showed our data to our physicist collaborators to ensure their validity.

4.2.5 Empirical Study Design

Dependent and Independent Variables. We used a within-subject design with two independent variables of feature-pair (three levels: baseline splitVectors, -, and -) and exponent range (five levels: 3-7). The dependent variable was relative error. We did not measure time since all tasks were time-constrained.

Participants performed 3 (feature-pairs) × 5 (magnitude-ranges) = 15 trial conditions for each of the first two tasks. Three repetitions were used to give participants enough time to develop strategies, resulting in 45 trials per task. For NUMEROSITY tasks, the design ran 4 repetitions, resulting in 3 (feature-pair) × 5 (exponent-range) × 4 (repetition) = 60 trials. Each participant thus executed 45 + 45 + 60 = 150 trials. Completing all tasks took about 32 minutes.
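The trial arithmetic of this design can be cross-checked against the data-point totals reported in Section 4.3. The short sketch below uses only numbers stated in the text (3 feature-pairs, 5 exponent ranges, 3 or 4 repetitions, 18 participants); the variable names are illustrative.

```python
FEATURE_PAIRS = 3
EXPONENT_RANGES = 5
REPETITIONS = {"SEARCH": 3, "MAX": 3, "NUMEROSITY": 4}
PARTICIPANTS = 18

# Trials each participant performs per task in the within-subject design.
trials_per_task = {
    task: FEATURE_PAIRS * EXPONENT_RANGES * reps for task, reps in REPETITIONS.items()
}
trials_per_participant = sum(trials_per_task.values())

# Data points collected per task across all participants.
data_points_per_task = {task: n * PARTICIPANTS for task, n in trials_per_task.items()}

print(trials_per_task)
print(trials_per_participant)
print(data_points_per_task)
```

The per-task totals agree with the 810 data points for SEARCH and MAX and the 1080 data points for NUMEROSITY reported with the results.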

Self-Reporting Strategies. Several human-computer interaction (HCI) approaches can help observe users’ behaviors. Answering questions can help us determine not just which technique is better but also the strategies humans adopt. For example, a cognitive walkthrough (CTW) measures whether or not the users’ actions match the designers’ pre-designed steps. Here we predicted that participants would use the global scene-features as guidance to accomplish tasks. We interviewed participants and asked them to verbalize their visual observations in accomplishing tasks.

4.2.6 Participants

Eighteen new participants (12 male and 6 female, mean age = 23.8, standard deviation = 4.94) of diverse backgrounds participated in the study (seven in computer science, four in computer engineering, two in information systems, three in engineering, one in business school, and one in physics). Procedure, interaction, and environment were the same as those in Experiment I.

4.3 Experiment II: Results and Discussion

We collected 810 data points per task for the first two tasks of SEARCH and MAX and 1080 points for the third NUMEROSITY task.

4.3.1 Summary Statistics

Task     Variable       Significance                 ES
SEARCH   feature-pair   F = 18.4, p < 0.0001         0.46
                        (LC, LT) > splitVectors
         power-range    F = 1.5, p = 0.20            0.86
MAX      feature-pair   F = 15.8, p < 0.0001         0.47
                        (LC, LT) > splitVectors
         power-range    F = 0.3, p = 0.87            0.11
NUM      feature-pair   = 63.2, p < 0.0001           0.25
                        LC > splitVectors > LT
         power-range    = 47.4, p < 0.0001           0.35
                        (3, 4) > 5 > (6, 7)

TABLE III: Experiment II: Summary statistics by task. Significant main effects and high effect sizes are in bold, and medium effect sizes are in italic. Effect size is Cohen’s d for tasks SEARCH and MAX, and Cramer’s V for task NUMEROSITY (NUM). Post-hoc Tukey grouping results are reported for significant main effects, where “>” means statistically significantly better and enclosing parentheses mean the pairs belong to the same Tukey group. Here, LC: - and LT: -.
Fig. 7: Experiment II: Relative error in SEARCH and MAX and accuracy in NUM. Same letters represent the same post-hoc analysis group. All error bars represent confidence intervals.

For SEARCH and MAX tasks, we measured relative error (the percentage by which the reported value deviated from the ground truth) and analyzed it with SAS repeated measures. The NUMEROSITY tasks used accuracy, which was the percentage of correct answers over all trials for each participant. Table III and Figure 7 show the summary statistics. All error bars again represent confidence intervals.

We observed a significant main effect of feature-pair type on all three tasks. For the first two tasks, post-hoc analysis revealed that - and - were in the same group, the most efficient one, and that their relative errors were statistically significantly lower than those of splitVectors. - remained the most accurate pair for the NUMEROSITY tasks. Exponent-range was a significant main effect only for NUMEROSITY, with power ranges 3 and 4 significantly better than 5, which was better than 6 and 7.

4.3.2 More Separable Dimensions Improved Accuracy

We were interested to see if we could observe significant main effects for features which showed none in local comparison tasks in Experiment I. Here we did observe a significant main effect and confirmed our first hypothesis: - and - were also in the same, more efficient Tukey group for both SEARCH and MAX. All participants reported that they searched the exponent terms first and then the digits, consistent with our hypothesis that there may exist a global pop-out of the color and texture features. Participants preferred colors and could easily differentiate the powers.

- also led to the most accurate answers, and now splitVectors was better than - for NUMEROSITY tasks. This result can be explained by participants’ behaviors: more than half the participants reported that they simply looked for the longest cylinder from the splitVectors since they knew the numerical values in the test were continuous. This behavior deviated from our original purpose of testing the global estimate but did show two perspectives in favor of this work: (1) participants developed task-specific strategies during the experiment for efficiency; (2) 3D length still supported a reasonable pop-out feature, even though it was not as effective as color.

These self-reported behaviors suggested that participants adopted a sequential task-driven viewing strategy to first obtain the gross regional distribution of task-relevant exponents. After this, a visual comparison within the same exponent region was achieved, and no binding at the object level was performed, especially when the features popped out globally as scene features. With these two steps, judging large or small or perceiving quantities accurately from separable variables would not use object-binding. Participants in our study used a top-down control process to utilize these spatial constraints regardless of feature types (for NUMEROSITY) and to modify how global structures are used to see the features between tasks.

4.3.3 The Cost of Correspondence Errors

Reducing correspondence error was influenced by the choices of separable dimensions. Our second hypothesis H2 (correspondence error) was also supported. We first tested the first type of correspondence error (those answers with different exponent values) in MAX and SEARCH in the same way as in Experiment I. We saw 36 instances from SEARCH (1 -, 20 -, and 15 splitVectors) and 59 instances from MAX (6 -, 38 -, and 15 splitVectors). These results, when combined with those in Experiment I, confirmed that - had worse first-type correspondence error. However, when viewers were in the correct data sub-categories, they could obtain answers as accurate as with -.
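As a rough cross-check of these counts, the sketch below sums the per-condition instances and expresses them as a share of each task's data points. It assumes the denominator is the 810 data points per task reported in Section 4.3; that choice is an assumption, and the counts are kept in the order listed in the text because the condition names are not all recoverable.

```python
SAMPLES_PER_TASK = 810  # assumed denominator: data points per task for SEARCH and MAX

# First-type correspondence error counts per condition, in the order listed in the text.
error_counts = {"SEARCH": [1, 20, 15], "MAX": [6, 38, 15]}

totals = {task: sum(counts) for task, counts in error_counts.items()}
rates_percent = {task: round(100 * n / SAMPLES_PER_TASK, 1) for task, n in totals.items()}

print(totals)
print(rates_percent)
```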

All participants commented on how the number of powers in the data affected their effectiveness. For -, 10 participants remarked that it was difficult to differentiate adjacent powers when the total number of power levels was around 4-5. The white and black textures were very easy to perceive. All but two participants agreed that - could perhaps support up to 6 powers. Chung et al. [32] studied ordering effects, but it would be challenging to compare our results to theirs because their visual stimuli were shown in isolation rather than as scene features. More than half of the participants felt that the effectiveness of splitVectors was not affected by changing the number of powers, since they looked for the longest outer cylinder to help find the answer. These results may suggest that subregion selection with - can perhaps be better supported by interfaces in which users can interactively select a texture level.

5 General Discussion

We discuss the results from both experiments and suggest future directions.

5.1 Separable Dimensions for Univariate Data Visualization for Large-Range Quantum Physics Data

The results of Experiment I showed that separable dimensions could achieve the same efficiency as direct linear visualizations for the COMP task and were always more efficient than integral pairs. For these local tasks, we did not observe significant error reduction. Experiment II studied the rank order of the separable pairs and found that more separable pairs also improved accuracy for global tasks. - and splitVectors in both experiments led to higher correspondence errors than -.

Visual variables that are separable (i.e., manipulated and perceived independently) would initially be considered problematic for encoding univariate data because of the known object-level feature-binding challenges involved in achieving integrated numerical readings by combining two visual features. Our experiment showed that binding does not have to be successful at the object level. A viewer can adopt a sequential task-driven viewing strategy based on a view hierarchy: viewers first obtain global distributions of the scene. Then, visual scrutiny is possible within a subregion. In other words, binding occurred at the scene level rather than the object level.

The separable-dimension pairs of - and - worked because, according to participants’ self-reports, they supported scene-centered structural perception, in which the processing of global structure and the spatial relationships among components precedes analysis of local details. Another possible reason for texture to be effective is ordering: participants could see large and small [32]. From a practical perspective, our results may suggest that it was easiest for viewers to interpret a scene in which features are scene features showing ordering and global structures. Scientific data are rarely unstructured. Using coloring to provide some initial regional division may always be better than not. Texture (luminance) could achieve similar accuracy and efficiency as long as the first type of correspondence error was removed.

5.2 Feature Guidance vs. Scene Guidance

Fig. 8: Contours of simulation data shown with (a) splitVectors, (b) -, (c) -, and (d) -. Size from this viewpoint can guide visual grouping, and size in 3D must take advantage of knowledge of the layout of the scene [33].

Taking into account both study results, we think an important part of the answer to correspondence error is guidance of attention. In most task-driven viewing, attention is not deployed randomly to objects. It is guided to some objects/locations over others by two broad methods: feature guidance and scene guidance.

Feature guidance refers to guidance by properties of the task target as well as the distractors (leading to correspondence errors). These features are limited to a relatively small subset of visual dimensions: color, size, texture, orientation, shape, blur or shininess, and so on. These features have been broadly studied in 3D glyph design (see reviews by Healey and Enns [20], Borgo et al. [34], Lie et al. [35], Ropinski et al. [36], and McNabb and Laramee [37]). Take one more example from quantum physics simulation results, but with a different task: searching for the structural distributions in the power of 3 in Figure 8 will guide attention to either the fat cylinders (Figure 8(a)), the bright yellow color (Figures 8(d) and 8(b)), or the very dark texture (Figure 8(c)), depending on the feature-pair type.

Working with quantum physicists, we have noticed that the structure and content of the scene strongly constrain the possible location of meaningful structures, acting as “scene guidance” constraints [6, 7]. Scientific data are not random and are typically structured. Contextual and global structural influences can arise from different sources of visual information. If we return to the MAX search task in Figure 8, we note that the chunks of darker or lighter texture patterns and colors on these regular contour structures strongly influence our quick detection. This is a structural and physical constraint that can be utilized effectively by viewers. This observation, coupled with the empirical study results, may suggest an interesting hypothesis for future work: adding scene structure guidance would speed up quantitative discrimination, improve the accuracy of comparison tasks, and reduce the perceived data complexity.

Another structure acting as guidance is size itself. Participants resolving the NUMEROSITY tasks used it to look for the longest outside cylinders. We showed several examples like Figure 8, and our collaborator suggested that cylinder bases of the same size with the redundant encoding (Figure 7(b)) also helped locate and group glyphs belonging to the same magnitude. This observation agrees with the most recent literature, which holds that guidance by size in 3D must take advantage of knowledge of the layout of the scene [33].

Though feature guidance can be preattentive and features are detected within a fraction of a second, scene guidance is probably just about as fast (though precise experiments have not been done and our Experiment II merely suggests this effect). Scene ‘gist’ can be extracted from complex images after very brief exposures [6, 38]. This does not mean that a viewer instantly knows, say, where the answer is located. However, with a fraction of a second’s exposure, a viewer will know enough about the spatial layout of the scene to guide his or her attention towards vector groups in the regions of interest.

A future direction, and also an approach to understanding the efficiency and effectiveness of scene guidance, is to conduct an eye-tracking study that gives viewers a flash view of our spatial structures and then lets them see the display only in a narrow range around the point of fixation: does this brief preview guide attention and gaze effectively? Recent work in information visualization [39] [40] [41] has measured and correlated performance with glance-level or global structure formation. Vision science discovered long ago that seeing global scene structures guides experts’ attention in medical-imaging decision making (experts always know where to look) [42] [43].

In The Grammar of Graphics, Wilkinson puts forward some plausible properties that ‘nice’ scales should possess and suggests a possible algorithm. The properties (simplicity, granularity, and coverage, with the bonus of being called ‘really nice’ if zero is included) are good, but the algorithm is easy to outwit. Difficult cases for scaling algorithms arise when data cross natural boundaries; e.g., data with a range of 4 to 95 are easier to scale than data with a range of 4 to 101.
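The boundary effect mentioned above can be illustrated with a minimal sketch. This is not Wilkinson's algorithm; it is a simplified "nice number" heuristic that picks steps of the form 1, 2, or 5 times a power of ten. The function names and the `max_intervals` parameter are assumptions introduced only for this illustration.

```python
import math


def nice_step(span, max_intervals=6):
    """Smallest step of the form {1, 2, 5, 10} x 10^k that splits
    `span` into at most `max_intervals` intervals (hypothetical helper)."""
    raw = span / max_intervals
    power = 10 ** math.floor(math.log10(raw))
    for multiplier in (1, 2, 5, 10):
        if multiplier * power >= raw:
            return multiplier * power


def nice_axis(lo, hi, max_intervals=6):
    """Return (axis_min, axis_max, step) with both ends snapped to multiples of step."""
    step = nice_step(hi - lo, max_intervals)
    axis_min = math.floor(lo / step) * step
    axis_max = math.ceil(hi / step) * step
    return axis_min, axis_max, step


def coverage(lo, hi, axis):
    """Fraction of the axis extent that the data actually occupy."""
    axis_min, axis_max, _ = axis
    return (hi - lo) / (axis_max - axis_min)


if __name__ == "__main__":
    for lo, hi in [(4, 95), (4, 101)]:
        axis = nice_axis(lo, hi)
        print(lo, hi, axis, round(coverage(lo, hi, axis), 2))
```

Under this heuristic, the range 4 to 95 fits an axis from 0 to 100 in steps of 20, while 4 to 101 crosses the boundary at 100 and is pushed to an axis from 0 to 120, so a larger share of the axis is left empty.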

5.2.1 Using Our Results in Visualization Tools and Limitations of Our Work

Visualization is used when the goal is to augment human capabilities in situations where the problems might not be sufficiently defined for a computer to handle algorithmically, or to communicate certain information. One such area is quantum physics: simulation results lie in a high-dimensional space and thus cannot be interpreted through computational solutions. As a result, quantum physicists count on visualization to detect patterns and trends. Our collaborators were amazed, though not surprised, by the many design possibilities and the performance differences among them.

Our current study concerns bivariate data visualization in which the bivariate variables are component parts of a univariate variable. The first variable is always an integer and the second variable is bounded to a real number in the range [1, 10). Application domains carrying similar data attributes could reuse our work. The design principle of prompting scene-level guidance would be broadly applicable to 3D visualizations. Our design is somewhat limited to preliminary pop-out stimuli. It could have been improved by following advanced glyph design methods, especially those in tensor field visualization. Both generic [44] and domain-specific requirements for glyph designs [27] [45] [46] have led to summaries of glyph properties (e.g., invariance, uniqueness, continuity) that guide design and the rendering of 2D and 3D tensors. A logical step for us is to truly understand the quantum physics principles so as to combine data attributes and human perception and arrive at domain-specific solutions.
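The data decomposition described above, an integer part plus a real part in [1, 10), can be sketched as follows. The sketch assumes that the two parts are the base-10 exponent and the mantissa of scientific notation; the function name `split_magnitude` is hypothetical and is not taken from the study software.

```python
import math


def split_magnitude(value):
    """Split a positive magnitude into (exponent, mantissa) such that
    value == mantissa * 10**exponent, with an integer exponent and
    1 <= mantissa < 10. Assumes scientific-notation semantics."""
    if value <= 0:
        raise ValueError("magnitude must be positive")
    exponent = math.floor(math.log10(value))
    mantissa = value / 10 ** exponent
    # Guard against floating-point rounding at decade boundaries.
    if mantissa >= 10:
        mantissa /= 10
        exponent += 1
    elif mantissa < 1:
        mantissa *= 10
        exponent -= 1
    return exponent, mantissa


if __name__ == "__main__":
    for v in (3500.0, 0.042, 1000.0):
        print(v, split_magnitude(v))
```

Each of the two returned values could then be mapped to its own visual channel, for example one to length and the other to color or texture, which is the kind of bivariate pairing the study compares.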

One limitation of this work is that we measured only a subset of tasks crucial to showing structures and omitted all tasks relevant to orientation. However, one may argue that the vectors naturally encode orientation. When orientation is considered, we could address the multiple-channel mappings in two ways. The first solution is to use the - to encode the quantitative glyphs and color to encode the orientations if we cluster the vectors by orientation. The second solution is to treat magnitude and orientation as two data facets and use multiple views to display them separately, with one view showing magnitude and the other showing orientation (following Munzner’s multiform design recommendations [47]). The second limitation is that our experiments were restricted to a relatively small subset of visual dimensions: color, texture, and size. A future direction would be to try shapes and glyphs to produce novel and useful designs.

6 Conclusion

This work shows that correspondence computation is necessary for retrieving information visually and that viewers’ strategies can play an important role. Our results showed that -, with the separable pairs falling into the same group as the linear ones, was most efficient and effective for both local and global tasks. Our findings in general suggest that, as we hypothesized, distinguishable separable dimensions perform better. Our empirical study results provide the following recommendations for designing 3D bivariate glyphs for representing univariate variables.

  • Highly separable pairs can be used for quantitative holistic data comparisons as long as these glyphs are scene-structure forming. We recommend using -. and .

  • Texture-based glyphs () that introduce luminance variation can cause correspondence errors and are recommended only when task-relevant structures can be constrained.

  • Integral and separable bivariate feature-pairs have similar accuracy when the tasks are local (i.e., the target location is known). They influence accuracy only when the target is unknown and when the search space increases.

  • A 3D glyph scene shortens task completion time when the scene supports structural feature guidance.

  • The redundant encoding (-) greatly improved on the task completion time of integral dimensions (splitVectors) by adding separable and preattentive color features.

Empirical study data and results can be found online at /integral-and-separable-dimension-pairs.


This work is supported in part by NSF IIS-1302755, NSF CNS-1531491, and NIST-70NANB13H181. The user study was funded by NSF grants with the OSU IRB approval number 2018B0080. Non-user-study design work was supported by a grant from NIST-70NANB13H181. The authors would like to thank Katrina Avery for her excellent editorial support and all participants for their time and contributions.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Certain commercial products are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the products identified are necessarily the best available for the purpose.


  • [1] J. Fuchs, P. Isenberg, A. Bezerianos, and D. Keim, “A systematic review of experimental studies on data glyphs,” IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 7, pp. 1863–1879, 2017.
  • [2] C. Ware, “Quantitative texton sequences for legible bivariate maps,” IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 1523–1529, 2009.
  • [3] H. Zhao, G. W. Bryant, W. Griffin, J. E. Terrill, and J. Chen, “Validation of SplitVectors encoding for quantitative visualization of large-magnitude-range vector fields,” IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 6, pp. 1691–1705, 2017.
  • [4] C. Ware, Information Visualization: Perception for Design.   Elsevier, 2012.
  • [5] A. M. Treisman and G. Gelade, “A feature-integration theory of attention,” Cognitive Psychology, vol. 12, no. 1, pp. 97–136, 1980.
  • [6] I. Biederman, “On processing information from a glance at a scene,” ACM SIGGRAPH Workshop on User-oriented Design of Interactive Graphics Systems, 1977.
  • [7] J. Wolfe, M. Cain, K. Ehinger, and T. Drew, “Guided search 5.0: Meeting the challenge of hybrid search and multiple-target foraging,” Journal of Vision, vol. 15, no. 12, p. 1106, 2015.
  • [8] A. Treisman and S. Gormican, “Feature analysis in early vision: evidence from search asymmetries,” Psychological Review, vol. 95, no. 1, pp. 15–48, 1988.
  • [9] J. M. Wolfe and I. S. Utochkin, “What is a preattentive feature?” Current Opinion in Psychology, 2018.
  • [10] T. Urness, V. Interrante, I. Marusic, E. Longmire, and B. Ganapathisubramani, “Effectively visualizing multi-valued flow data using color and texture,” IEEE Visualization, pp. 115–121, 2003.
  • [11] W. R. Garner and G. L. Felfoldy, “Integrality of stimulus dimensions in various types of information processing,” Cognitive Psychology, vol. 1, no. 3, pp. 225–241, 1970.
  • [12] C. G. Healey and J. T. Enns, “Large datasets at a glance: Combining textures and colors in scientific visualization,” IEEE Transactions on Visualization and Computer Graphics, vol. 5, no. 2, pp. 145–167, 1999.
  • [13] C. G. Healey, K. S. Booth, and J. T. Enns, “Visualizing real-time multivariate data using preattentive processing,” ACM Transactions on Modeling and Computer Simulation, vol. 5, no. 3, pp. 190–221, 1995.
  • [14] J. Mackinlay, “Automating the design of graphical presentations of relational information,” ACM Transactions on Graphics, vol. 5, no. 2, pp. 110–141, 1986.
  • [15] W. S. Cleveland and R. McGill, “Graphical perception: Theory, experimentation, and application to the development of graphical methods,” Journal of the American Statistical Association, vol. 79, no. 387, pp. 531–554, 1984.
  • [16] J. M. Wolfe and T. S. Horowitz, “What attributes guide the deployment of visual attention and how do they do it?” Nature Reviews Neuroscience, vol. 5, no. 6, pp. 1–7, 2004.
  • [17] J. M. Wolfe, “Guided search 4.0,” Integrated Models of Cognitive Systems, pp. 99–119, 2007.
  • [18] H. Strobelt, D. Oelke, B. C. Kwon, T. Schreck, and H. Pfister, “Guidelines for effective usage of text highlighting techniques,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 1, pp. 489–498, 2016.
  • [19] C. G. Healey, K. S. Booth, and J. T. Enns, “High-speed visual estimation using preattentive processing,” ACM Transactions on Computer-Human Interaction, vol. 3, no. 2, pp. 107–135, 1996.
  • [20] C. Healey and J. Enns, “Attention and visual memory in visualization and computer graphics,” IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 7, pp. 1170–1188, 2012.
  • [21] T. C. Callaghan, “Interference and dominance in texture segregation: Hue, geometric form, and line orientation,” Perception & Psychophysics, vol. 46, no. 4, pp. 299–311, 1989.
  • [22] S. M. Casner, “Task-analytic approach to the automated design of graphic presentations,” ACM Transactions on Graphics, vol. 10, no. 2, pp. 111–151, 1991.
  • [23] Ç. Demiralp, M. S. Bernstein, and J. Heer, “Learning perceptual kernels for visualization design,” IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 12, pp. 1933–1942, 2014.
  • [24] B. E. Rogowitz and A. D. Kalvin, “The ‘Which Blair Project’: A quick visual method for evaluating perceptual color maps,” IEEE Visualization, pp. 183–191, 2001.
  • [25] J. P. O’Shea, M. Agrawala, and M. S. Banks, “The influence of shape cues on the perception of lighting direction,” Journal of Vision, vol. 10, no. 12, pp. 1–21, 2010.
  • [26] M. Harrower and C. A. Brewer, “An online tool for selecting colour schemes for maps,” The Cartographic Journal, vol. 40, no. 1, pp. 27–37, 2003.
  • [27] C. Zhang, T. Schultz, K. Lawonn, E. Eisemann, and A. Vilanova, “Glyph-based comparative visualization for diffusion tensor fields,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 1, pp. 797–806, 2016.
  • [28] J. Bertin, Semiology of Graphics: Diagrams, Networks, Maps.   University of Wisconsin Press, 1967.
  • [29] J. Cohen, Statistical power analysis for the behavioral sciences.   New York: Academic Press, 1988.
  • [30] C. D. Wickens and C. M. Carswell, “The proximity compatibility principle: Its psychological foundation and relevance to display design,” Human Factors, vol. 37, no. 3, pp. 473–494, 1995.
  • [31] D. Acevedo, J. Chen, and D. H. Laidlaw, “Modeling perceptual dominance among visual cues in multilayered icon-based scientific visualizations,” IEEE Visualization Posters, 2007.
  • [32] D. H. Chung, D. Archambault, R. Borgo, D. J. Edwards, R. S. Laramee, and M. Chen, “How ordered is it? on the perceptual orderability of visual channels,” Computer Graphics Forum, vol. 35, no. 3, pp. 131–140, 2016.
  • [33] M. P. Eckstein, K. Koehler, L. E. Welbourne, and E. Akbas, “Humans, but not deep neural networks, often miss giant targets in scenes,” Current Biology, vol. 27, 2017.
  • [34] R. Borgo, J. Kehrer, D. H. Chung, E. Maguire, R. S. Laramee, H. Hauser, M. Ward, and M. Chen, “Glyph-based visualization: Foundations, design guidelines, techniques and applications,” Eurographics State of the Art Reports, pp. 39–63, 2013.
  • [35] A. E. Lie, J. Kehrer, and H. Hauser, “Critical design and realization aspects of glyph-based 3D data visualization,” Proceedings of the Spring Conference on Computer Graphics, pp. 19–26, 2009.
  • [36] T. Ropinski, S. Oeltze, and B. Preim, “Survey of glyph-based visualization techniques for spatial multivariate medical data,” Computers & Graphics, vol. 35, no. 2, pp. 392–401, 2011.
  • [37] L. McNabb and R. S. Laramee, “Survey of surveys (SoS)-mapping the landscape of survey papers in information visualization,” Computer Graphics Forum, vol. 36, no. 3, pp. 589–617, 2017.
  • [38] A. Oliva, “Gist of the scene,” Neurobiology of Attention, vol. 696, no. 64, pp. 251–258, 2005.
  • [39] G. Ryan, A. Mosca, R. Chang, and E. Wu, “At a glance: Pixel approximate entropy as a measure of line chart complexity,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 1, pp. 872–881, 2019.
  • [40] Z. Bylinskii, P. Isola, C. Bainbridge, A. Torralba, and A. Oliva, “Intrinsic and extrinsic effects on image memorability,” Vision Research, vol. 116, pp. 165–178, 2015.
  • [41] M. A. Borkin, A. A. Vo, Z. Bylinskii, P. Isola, S. Sunkavalli, A. Oliva, and H. Pfister, “What makes a visualization memorable?” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2306–2315, 2013.
  • [42] H. L. Kundel, C. F. Nodine, E. F. Conant, and S. P. Weinstein, “Holistic component of image perception in mammogram interpretation: gaze-tracking study,” Radiology, vol. 242, no. 2, pp. 396–402, 2007.
  • [43] T. Drew, M. L.-H. Võ, and J. M. Wolfe, “The invisible gorilla strikes again: Sustained inattentional blindness in expert observers,” Psychological Science, vol. 24, no. 9, pp. 1848–1853, 2013.
  • [44] T. Gerrits, C. Rössl, and H. Theisel, “Glyphs for general second-order 2D and 3D tensors,” IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 980–989, 2017.
  • [45] H.-J. Schulz, T. Nocke, M. Heitzler, and H. Schumann, “A design space of visualization tasks,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2366–2375, 2013.
  • [46] G. Kindlmann and C.-F. Westin, “Diffusion tensor visualization with glyph packing,” IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 1329–1335, 2006.
  • [47] T. Munzner, Visualization Analysis and Design.   A K Peters Visualization Series. CRC Press, 2014.