GestureMap: Supporting Visual Analytics and Quantitative Analysis of Motion Elicitation Data by Learning 2D Embeddings

by Hai Dang, et al.

This paper presents GestureMap, a visual analytics tool for gesture elicitation which directly visualises the space of gestures. Concretely, a Variational Autoencoder embeds gestures recorded as 3D skeletons on an interactive 2D map. GestureMap further integrates three computational capabilities to connect exploration to quantitative measures: Leveraging DTW Barycenter Averaging (DBA), we compute average gestures to 1) represent gesture groups at a glance; 2) compute a new consensus measure (variance around average gesture); and 3) cluster gestures with k-means. We evaluate GestureMap and its concepts with eight experts and an in-depth analysis of published data. Our findings show how GestureMap facilitates exploring large datasets and helps researchers to gain a visual understanding of elicited gesture spaces. It further opens new directions, such as comparing elicitations across studies. We discuss implications for elicitation studies and research, and opportunities to extend our approach to additional tasks in gesture elicitation.



1. Introduction

Designing effective interactions and user interfaces often involves exploring two potentially high-dimensional spaces (Williamson and Murray-Smith, 2012): 1) The space of human behaviour (e.g. comfortable motion ranges of arm and hand), and 2) the space of senseable input in a system or context (e.g. tracking of up to X body joints in 3D).

Although central to HCI, the field has developed few dedicated methods and tools for supporting the (joint) exploration of such user-sensor spaces (cf. (Williamson and Murray-Smith, 2012)). One successful method that has seen widespread use is the elicitation study paradigm (Wobbrock et al., 2005), which helps HCI researchers and practitioners to explore the space of possible and “intuitive” or “guessable” (gesture) commands: Participants are shown a “referent” (often a system action, e.g. volume up) and are asked to propose and perform a gesture they would use for it (e.g. turn wrist right). This is repeated for several referents.

Researchers then analyse these gesture proposals, compute measures to identify common proposals (e.g. (Vatavu, 2019; Wobbrock et al., 2005)), and decide on a set of gestures to be used in an interactive system, typically composed of the gestures with high agreement among participants (e.g. (Vatavu, 2012; Wobbrock et al., 2009b)).

In this way, elicitation studies inform gestural interaction with user-driven exploration: Most studies focus on the human behaviour space and thus do not rely on a specific sensor; they typically video-record participants for manual gesture analysis (e.g. (E et al., 2017; Kim et al., 2020)). Some additionally employ a sensor in elicitation (e.g. Leap (Vatavu and Zaiti, 2014), Kinect (Vatavu, 2012)), thus also potentially considering the senseable space.

While elicitation studies have become a widely used staple in the HCI toolbox, they still present challenges (cf. (Tsandilas, 2018; Villarreal-Narvaez et al., 2020)), including the need for manual data analysis. This limits elicitation studies, as well as the general endeavor of systematically exploring behaviour-sensor spaces in HCI, as characterised in the following paragraphs:

  • Workload: Watching videos to manually classify gestures (e.g. (E et al., 2017; Kim et al., 2020; Vatavu, 2012; Vatavu and Zaiti, 2014; Wobbrock et al., 2009b)) is tedious work (Tsandilas, 2018). It may thus also hinder the use of elicitation in user-centred processes that require repeating such work (e.g. iteration).

  • Subjectivity: As critically pointed out (Tsandilas, 2018), manual interpretation at best requires further efforts (e.g. multiple coders); at worst, it leads to subjectively biased results.

  • Limited scale: Without quantitative analysis tools, large-scale elicitation remains scarce (survey: mean=25 participants (Villarreal-Narvaez et al., 2020)). This stands in contrast to motivations for diverse samples and for training recognisers on elicited gestures (E et al., 2017; Kane et al., 2011; Smith and Gilbert, 2018; Villarreal-Narvaez et al., 2020).

  • Isolated results: The lack of quantitative data analysis methods and tools hinders replication, reuse, extensions, and comparisons across elicitations, even if the same sensor was used (e.g. Kinect). Most work collects and analyses new data (Villarreal-Narvaez et al., 2020), isolated from previously published datasets.

These challenges motivate our work on new quantitative methods and tools for analyzing elicitation data. Fittingly, recent related work highlighted the need and feasibility of more objective, computational measures (Vatavu, 2019), and called for further computational models and measures, based on a survey of 216 elicitation studies (Villarreal-Narvaez et al., 2020).

Addressing this, we extend the computational toolbox for analyzing gesture elicitation data with these contributions:

  • GestureMap, a method and tool for visualizing and exploring motion data from elicitation studies on an interactive, learned 2D map, inspired by concepts from visual analytics.

  • New computational capabilities for gesture representation, consensus and clustering, based on average gestures computed with DTW Barycenter Averaging (DBA) (Petitjean et al., 2011), to connect exploration to quantitative measures, extending the computational approach motivated in recent work (Vatavu, 2019; Villarreal-Narvaez et al., 2020).

  • Insights from using GestureMap in a detailed case study on datasets from the literature, plus a qualitative expert evaluation with eight researchers.

2. Related Work

The analysis concepts introduced in this work are built on previous work spanning HCI, machine learning and visual analytics. This section briefly describes gesture elicitation studies, followed by an overview of tools that support researchers across different tasks involved in analyzing elicitation data. In particular, we outline existing analysis concepts for high-dimensional data.

2.1. Gesture Elicitation Studies

The gesture elicitation paradigm was first introduced by Wobbrock et al. (Wobbrock et al., 2005) to elicit users’ interaction preferences for new systems. This method was then specifically adapted to include gesture proposals to control surface tabletop computers (Wobbrock et al., 2009b). In a subsequent study, Morris et al. (Wobbrock et al., 2009a) confirmed that new users do prefer the user-defined gesture set over the one created by experts. Since then, this method has become a standard tool for the design of gesture input mappings for new interactive systems, for example to control a swarm of robots (Kim et al., 2020), smart-home appliances (Vatavu, 2012; Kühnel et al., 2011), or AR/VR applications (Piumsomboon et al., 2013).

Central to gesture elicitation studies is an in-depth analysis of the proposed data to find common behavior. Researchers have therefore developed various measures to formalize the consensus among participants (Wobbrock et al., 2009b; Vatavu, 2012; Vatavu and Zaiti, 2014; Vatavu and Wobbrock, 2016; Vatavu, 2013).

However, these measures rely on subjectively assessing the similarity of the observed gestures: They require researchers to group proposals into subgroups that they consider identical, which is usually done by manual annotation based on watching videos of the participants in the study (Morris, 2012; Kim et al., 2020). Thus, while these measures set standards on how to compute consensus from gesture proposals, they cannot avoid subjectivity per se.

To address this, Vatavu (2019) has recently proposed a new, data-driven approach: It employs a distance measure as an objective basis for assessing consensus in elicitation studies. Our work builds on this idea, extends its data-driven perspective with a visual analytics tool, and introduces a new measure fitting this visualization.

2.2. Gesture Analysis Tools

Several tools have been created for more effective and objective analyses. Video analysis has been the preferred evaluation method, but the annotation of individual video sequences can be time-consuming (Tsandilas, 2018). An efficient analysis becomes even more important as large-scale gesture data sets can be collected online, for example, through cloud elicitation tools (Ali et al., 2019). Thus, researchers devised different ways to distribute the work among people (Magrofuoco and Vanderdonckt, 2019; Ali et al., 2018).

While the concepts introduced in this paper also enable researchers to better annotate sequences, our focus lies in particular on the exploration of elicited gesture data.

Nebeling et al. (2015) created a tool to analyze recordings created by a Kinect camera sensor. They included three visualizations. First, they used a 3D animation of a Kinect skeleton. Second, they provided a visualization where only the moving joint is drawn on the canvas. The third visualization is similar to the second, but additionally employs a heat-map to emphasize the time domain.

The most similar work to ours is GestureAnalyzer by Jang et al. (2014), which also focuses on the analysis of gesture elicitation studies. To find behavioral patterns, it employs a variation of the small-multiples plot (Tufte, 1986) and an interactive hierarchical clustering interface visualized in a tree layout. Their calculations and analyses are based on hand-engineered features. The gesture map which we propose in this work facilitates a richer exploration of the behavior space using machine-learned features for the gesture poses. It provides an overview of the gesture data and introduces continuous, traceable 2D paths which represent gesture sequences. For further discussions on the comparison of these two systems we refer to section


2.3. Visualization of High Dimensional Data

A key challenge in visual analytics is the effective visualization of high-dimensional data. This typically involves two steps: 1) Projecting the data to 2D for display on a screen. 2) Suitably visualising the projected data, considering the analysts’ tasks and goals. While there exist many dimension reduction techniques (Van Der Maaten et al., 2009; Van Hulle, 2012; McInnes et al., 2018; Goodfellow et al., 2014; Lawrence, 2003), we use a Variational Autoencoder (Kingma and Welling, 2013) to reduce the dimensions of the raw sensor data. To visualize temporal data, a common representation is a line plot, horizon plot (Few and Edge, 2008), or a small-multiples plot (Tufte, 1986). However, these highly abstract visualizations may occlude the nature of the underlying data. For example, these plots may hide the structure of a 3D skeleton recording. We therefore combine an abstract 2D mapping with a grid of representative 3D skeletons to give analysts a visual overview of the proposed gestures.

Also related to our work are tools to analyze and visualize machine-learned representations of complex data. Deep learning models are capable of learning human-understandable features of high-dimensional data: For example, Kingma and Welling (2013) and Lawrence (2004) sample multiple points from the learned space and visualize them to demonstrate that the learned space is continuous and smooth, but without providing interaction functionalities. Smilkov et al. (2016) filled this gap by providing a generic tool to visualize these embeddings.

Some researchers created specific visualizations to facilitate interpretation of the axes of a (2D) projection, to judge the variation of the data (Kim et al., 2016; Vatavu et al., 2014) or the relative importance of the data attributes along an axis (Kwon et al., 2016). Liu et al. (2019) used a cartographic approach to compare and analyze learned embedding spaces. In this work, we adapt similar visualization concepts with the goal to create an interpretable gesture space.

To the best of our knowledge, GestureMap is the first tool to use a latent variable model to analyze sensor-based motion data in the context of gesture elicitation studies. We combine interactive k-means clustering, automatic metric computation, a new visualization, and analysis concepts to provide an integrated platform.

3. GestureMap Concept

We introduce a structured analysis approach based on a learned 2D gesture map, as realised in GestureMap. We motivate the conceptual features via related work as summarized in Table 1 and elaborate on them in the following sections.

3.1. Feature Requirements and Overview

| Challenge / Motivation in the Literature | Visual Analytics Actions | Feature in GestureMap |
|---|---|---|
| Call for more computational support (Vatavu and Wobbrock, 2016; Villarreal-Narvaez et al., 2020; Tsandilas, 2018) | Model Building, Model Usage | Average Gesture Sequence; Statistical Plot Overlay; Variance Computation |
| Multiple representation for gesture sequences (Villarreal-Narvaez et al., 2020; Jang et al., 2014) | Visual Mapping | 2D Path; 3D Skeleton |
| Comparisons across participants, sessions and trials (Vatavu and Wobbrock, 2016; Jang et al., 2014) | Visualization Manipulation | Selective Filtering; Gesture Highlighting |
| Visual support for temporal dimension (Jang et al., 2014; Nebeling et al., 2015) | Visual Mapping | 2D/3D Animation |
| Unfamiliarity with Gesture Design Space (Chen et al., 2018; Dim et al., 2016) | Visual Mapping, Model Usage, Model-vis | Interactive Gesture Map |
| Processing large data sets (Jang et al., 2014; Nebeling et al., 2015; Ali et al., 2018, 2019) | Model Building, Model Usage | Interactive Clustering; Cluster Reassignment |
| Share and Save Analysis (Magrofuoco and Vanderdonckt, 2019; Nebeling et al., 2015; Jang et al., 2014) | N/A | Export Analysis |

Table 1. Main analysis components in GestureMap with the challenge and related work that motivated this feature and a reference to the supported action within the Knowledge Generation Model for Visual Analytics (Sacha et al., 2014).

The features in GestureMap were informed by close examination of the literature on gesture elicitation and related concepts and tools: We collected features 1) proposed in related work, 2) motivated in calls for further improvements, and 3) explicitly requested from future work. In addition, we included further ideas. Table 1 shows an overview of the relation to related work. The following paragraphs further introduce and motivate the features.

3.1.1. 3D Skeleton View (Figure 1 ②)

Related tools (Nebeling et al., 2015; Jang et al., 2014) show a 3D skeleton view with animation. GestureMap also offers this, to afford easy examination of a recorded gesture.

3.1.2. 2D Map View (Figure 1 ①)

GestureMap is fundamentally motivated by providing researchers with a visual overview of the elicited gesture space.

Furthermore, some researchers indicated that participants may struggle to propose gestures, if they are unfamiliar with the gesture design space (Dim et al., 2016; Chen et al., 2018; Silpasuwanchai and Ren, 2015). They therefore modified elicitation such that people could choose from a predefined list of gesture proposals.

GestureMap addresses these needs as its 2D map shows observed gesture proposals and gives an idea of past behavior. While we focus on researchers as users of this map in this paper, it could also be shown to participants, as we describe in Section 8.3.

3.1.3. 2D Map Overlays (Figure 1b ①, Figure 1c)

Prior work has extensively used scatter plots to analyze machine learned representations (Smilkov et al., 2016; Liu et al., 2019). Our map view affords different plots on top of it, such as:

  • Scatter plots (point = body posture; Figure 1b ①)

  • Drawing paths (path = gesture; Figure 4)

  • Densities (e.g. where in the space are postures and gestures located? Figure 1c)

3.1.4. Linked Views of Postures

Villarreal-Narvaez et al. (2020) called for future work to include multiple representations of gestures. GestureMap realises this by linking the 2D map and the 3D skeleton. Concretely, the 3D skeleton view updates while the user moves the cursor over the 2D map to present the posture at that point in the gesture space.

3.1.5. Linked Animations of Gestures

Complementary to the feature for postures, GestureMap accounts for the temporal nature of gesture data (Jang et al., 2014; Nebeling et al., 2015) by offering linked animations of gesture paths (point moving on the path) and 3D skeletons (skeleton moving).

3.1.6. Gesture Clustering

As larger data sets are expected in the future (Jang et al., 2014; Nebeling et al., 2015; Ali et al., 2018), we also provide an interactive clustering method to reduce manual workload for identifying similar gesture (sub)groups.

3.1.7. Sharing Results

Motivated by such interests in related work (Magrofuoco and Vanderdonckt, 2019; Nebeling et al., 2015; Jang et al., 2014), we include an export functionality to easily share analyses with other researchers.

3.2. The Learned 2D Gesture Map

Here, we describe the map concept in more detail.

3.2.1. Core Visualization Concept

Following a cartographic approach (Skupin, 2002), and in line with 2D projections in visual analytics (e.g. (Wenskovitch et al., 2020; Kim et al., 2016)), we use a map metaphor to visually guide analysts through the elicited gesture space. This gesture map is a 2D plot with a grid of representative body poses shown as small human skeletons. These “pose landmarks” give an overview of the poses in the corresponding rectangular map region (Figure 1 ①). The map itself is continuous, that is, each 2D point represents a pose. Thus, since gestures are sequences of poses, they are paths connecting multiple points on the map. In this way, the gesture map combines a line plot’s simplicity with the structural expressiveness of a small-multiples visualization (Jang et al., 2014).

3.2.2. Learning a Gesture Map

The two dimensions of the map do not have a direct predefined meaning yet emerge from elicited data. Formally, let the set of all individual gesture poses in the dataset be denoted by $X = \{x_1, \dots, x_N\}$ with $x_i \in \mathbb{R}^D$, where $D$ is the dimensionality of the raw sensor data (in our case D=20). A gesture sequence $g$ which consists of $T$ gesture poses can be viewed as an ordered tuple of size $T$, i.e., $g = (x_1, \dots, x_T)$.

  1. To reduce the dimensions of the raw sensor data, we use an encoder to embed every gesture pose $x \in X$ into a latent space code $z \in \mathbb{R}^2$. These latent codes represent a pose using only two learned features.

  2. The raw and high-dimensional gesture sequence $g$ is then embedded as a two-dimensional path $(z_1, \dots, z_T)$ in the latent space.

  3. To create the grid of gesture poses in the background, we compute an evenly spaced grid of $n$ rows and $m$ columns over a visible region in the latent space. For example, if the embedded gesture poses (latent codes) range from -4 to 4 in both x and y dimension, we would linearly sample $n \times m$ points within this square region.

  4. Using the decoder model we can decode arbitrary 2D map points $z$ into a full pose, i.e. $\hat{x} = \mathrm{dec}(z) \in \mathbb{R}^D$.

In this paper we use a Variational Autoencoder (VAE). In general, layout and quality of the space (e.g. smoothness), and of pose decoding, depend on the model, and we reflect on this in our discussion.
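The landmark-grid steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `landmark_grid`, `decode_landmarks`, and `toy_decoder` are hypothetical names, and `toy_decoder` is only a stand-in for a trained VAE decoder that maps a 2D latent code to a pose with D=20 values.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Return `num` evenly spaced values from start to stop (inclusive)."""
    if num == 1:
        return [(start + stop) / 2.0]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def landmark_grid(x_range: Point, y_range: Point,
                  rows: int, cols: int) -> List[List[Point]]:
    """Evenly spaced 2D latent points covering the visible map region."""
    xs = linspace(x_range[0], x_range[1], cols)
    ys = linspace(y_range[0], y_range[1], rows)
    return [[(x, y) for x in xs] for y in ys]


def decode_landmarks(grid: List[List[Point]],
                     decode: Callable[[Point], Sequence[float]]):
    """Decode every grid point into a full pose with the given decoder."""
    return [[decode(p) for p in row] for row in grid]


def toy_decoder(z: Point) -> List[float]:
    """Hypothetical stand-in for a trained decoder (2D code -> 20 values)."""
    return [z[0] * (i + 1) + z[1] for i in range(20)]


# Latent codes are assumed to range from -4 to 4 in both dimensions.
grid = landmark_grid((-4.0, 4.0), (-4.0, 4.0), rows=5, cols=5)
poses = decode_landmarks(grid, toy_decoder)
print(grid[0][0], grid[4][4], len(poses[2][2]))
```

Pan and zoom would then amount to calling `landmark_grid` again with the new visible region and decoding the fresh grid points.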

3.3. Map Interaction Concepts

Here we describe how users can interact with the map.

3.3.1. Pan and Zoom

The map supports pan and zoom and accordingly recomputes the grid of landmarks (small skeletons). This feature helps to adjust the viewport to support exploration of data-dense areas, and deal with the fact that landmark representations are discrete indicators for the continuous space.

3.3.2. Examining Poses

Scatter or density plots can be projected onto the map (e.g. Figure 1 ① and Figure 1c). Using “details on demand”, users can hover over points to see the corresponding pose skeleton (Figure 1 ②), and referent, participant and trial number in the detail view (Figure 1 ⑥). The scatterplot may help researchers to detect outlier body poses, while the density plot reveals regions with recorded data.

3.3.3. Examining Gestures

For further inspection, one or more gestures can be selected (e.g. Figure 4) from a referent’s list of gesture proposals (Figure 1 ③). This allows researchers to view details on demand, e.g. to reduce the risk of information overload.

3.3.4. Examining Unseen Poses

A fundamentally new capability of GestureMap is that unseen poses or gestures (i.e. not proposed by participants) can be simulated by decoding arbitrary 2D points in the learned space. In our prototype users can thus hover over the map to visualize 3D skeletons for any cursor location. Analysts can examine if empty regions are anatomically not feasible (cf. 8.3.3) or if people did not show such behaviour. This might be useful to adjust elicitation setup/instructions, for example to prompt people to also cover a previously empty part of the map.

3.4. Analysis Concepts Using the Gesture Map

Exploratory analysis seeks to uncover structural patterns in the dataset, identify anomalies, and single-out outliers (Tukey, 1977). We thus conceptualized the gesture map to enable researchers to seamlessly cycle between the detection of new observations and the assessment of supporting evidence. The analysis concept is structured further by differentiating between global observations and local observations. The former targets questions that may span multiple referents or the entire dataset, while the latter focuses on a few gestures to identify specific behavioral idiosyncrasies.

3.4.1. Global Observations

The first of many analysis steps often involves developing an overview of the data to understand its underlying properties: Researchers here often use statistical plots to summarize the data and to identify broad patterns.

Developing an Overview of the Gesture Space

GestureMap supports this as well: For instance, Figure 3 depicts a scatter plot projected on the gesture map. Scatter points on top of the pose grid enable researchers to quickly identify which general poses were observed in the data. Each scatter point corresponds to a pose from the dataset, whereas empty patches in the gesture map may indicate behavior that has not been observed (e.g. poses/gestures not proposed by participants during elicitation).

Spotting Clusters and Outliers

Scatter points may visually cluster near gesture poses that are characteristic for a particular referent. These clusters can help researchers to form a mental model of the main poses that are characteristic for a group of gesture sequences. It might also be interesting to analyze outlier behavior which can be detected by examining scatter points that lie far from these clusters.

Comparing Referents and Regions

Additionally, color codes facilitate the comparison of behavior across different referents. For example, it might be interesting to identify which referents share behavior and which are distinctive. Regions in the gesture map that contain multiple embedded data points from different referents may indicate that this region encodes shared generic behavior.

Judging Densities and Overlap of Referents

Scatter plots may contain too much detail and clutter the visualization. Density plots then offer a visualization of the most frequent gesture poses. Researchers can use it to detect overlapping or distinctive behavior across different referents. For example, these observations can inform researchers interested in building gesture recognizers in judging the difficulty of separating gestures for the various referents.
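One simple way to approximate such a density view is to bin the embedded poses into a regular grid and compare the normalised bin counts of two referents. The sketch below assumes this histogram-based approach, which the paper does not specify; `density_grid` and `histogram_overlap` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def density_grid(points: Sequence[Point], x_range: Point, y_range: Point,
                 bins: int) -> List[List[int]]:
    """Count latent points per cell of a bins x bins grid over the region."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the visible region
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[row][col] += 1
    return counts


def histogram_overlap(a: List[List[int]], b: List[List[int]]) -> float:
    """Histogram intersection of two normalised grids (0 = disjoint, 1 = same)."""
    total_a = sum(map(sum, a)) or 1
    total_b = sum(map(sum, b)) or 1
    return sum(min(ca / total_a, cb / total_b)
               for row_a, row_b in zip(a, b)
               for ca, cb in zip(row_a, row_b))


# Toy latent codes for two referents (made-up values for illustration).
referent_a = density_grid([(-3.5, -3.5), (-3.0, -3.9), (0.1, 0.2), (3.9, 3.9)],
                          (-4.0, 4.0), (-4.0, 4.0), bins=4)
referent_b = density_grid([(3.0, -3.0)], (-4.0, 4.0), (-4.0, 4.0), bins=4)
print(histogram_overlap(referent_a, referent_a))
print(histogram_overlap(referent_a, referent_b))
```

A high overlap between two referents would hint that their poses occupy the same map regions and may be harder for a recogniser to separate.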

3.4.2. Local Observations

The key local observation in elicitation data is to examine individual gesture proposals. GestureMap also supports such analysis, as outlined here:

After the initial data exploration it is often necessary to find concrete examples for detected patterns. For example, in the elicitation context, we might be interested in comparing the behavior across different participants and experimental trials.

The gesture map can serve as a common visual basis for such investigations: By projecting multiple gestures onto the map, researchers can evaluate each participant’s behavior individually. The trajectory of the embedded gesture paths can inform them on specific behavioral characteristics. For example, a participant’s movement can be subtle, in which case the embedded gesture path is simple in shape and typically spans a small region in the gesture map. In contrast, a complex gesture may be represented as an intricate path that may meander across the map.
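The visual impression of a "subtle" versus a "complex" path could be backed by simple descriptors of the embedded path. The following sketch assumes two such descriptors, total path length and bounding-box area; neither is a measure proposed in the paper, and both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def path_length(path: Sequence[Point]) -> float:
    """Sum of Euclidean distances between consecutive 2D path points."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def bounding_box_area(path: Sequence[Point]) -> float:
    """Area of the axis-aligned rectangle that the path spans on the map."""
    xs = [p[0] for p in path]
    ys = [p[1] for p in path]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Made-up embedded paths: a subtle movement and a more expansive one.
subtle = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1)]
expansive = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]

for name, path in (("subtle", subtle), ("expansive", expansive)):
    print(name, path_length(path), bounding_box_area(path))
```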

Thus, by comparing multiple embedded gesture paths researchers can visually assess gestures as similar or not. Considering research interests in the elicitation context from the literature, for example, this might support researchers to examine if a participant can remember and repeat the same gesture proposal across multiple trials (Nacenta et al., 2013), or if behavior was influenced by a priming effect (Cafaro et al., 2018).

4. Consensus and Clustering with DBA

We introduce the concept of an average gesture sequence as a new computational capability in the context of gesture elicitation. This has three practical values, which complement our tool:

  1. Descriptive: The average gesture can serve as a single, visual proxy for a group of gestures, which opens up new visualization opportunities (e.g. showing and comparing referents as average paths on our gesture map).

  2. Evaluative: It enables a new measure of consensus/variability among gesture proposals for a referent. This measure aligns well with other statistical notions of variability: Consensus is assessed via the variance of actual gestures around the average gesture.

  3. Explorative: The average gesture enables clustering methods that require averaging (e.g. here: k-means), supporting the automated detection of groups of gestures in a dataset (e.g. in “open elicitation” without referents, cf. (Villarreal-Narvaez et al., 2020)).

We next describe the technical approach in more detail.

4.1. Computing an Average Gesture with DBA

We employ the DTW Barycenter Averaging (DBA) algorithm by Petitjean et al. (2011) to compute an average gesture: Intuitively, this algorithm first aligns an initial sequence with every sequence in the set of gesture proposals, before computing a centroid (barycenter) for each aligned coordinate. For further technical details we refer the reader to the related work (Petitjean et al., 2011).
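To make the align-then-average idea concrete, here is a compact pure-Python sketch of DTW with backtracking and a basic DBA loop. It is a simplified illustration under stated assumptions (squared Euclidean frame cost, a fixed number of iterations, an initial sequence supplied by the caller) and not the implementation used in GestureMap; `dtw_path` and `dba` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Pose = Sequence[float]
Gesture = Sequence[Pose]


def sq_dist(a: Pose, b: Pose) -> float:
    """Squared Euclidean distance between two poses."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_path(s: Gesture, t: Gesture) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the DTW distance and the warping path between two sequences."""
    n, m = len(s), len(t)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sq_dist(s[i - 1], t[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were aligned.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return math.sqrt(cost[n][m]), path


def dba(gestures: Sequence[Gesture], initial: Gesture,
        iterations: int = 10) -> List[List[float]]:
    """Iteratively refine an average sequence (barycenter) of the gestures."""
    average = [list(p) for p in initial]
    for _ in range(iterations):
        # Collect, for each frame of the average, all frames aligned to it.
        buckets: List[List[Pose]] = [[] for _ in average]
        for g in gestures:
            _, path = dtw_path(average, g)
            for i, j in path:
                buckets[i].append(g[j])
        # Replace each frame by the centroid of the frames aligned to it.
        average = [[sum(c) / len(b) for c in zip(*b)] for b in buckets]
    return average


# Two time-shifted versions of the same 1D movement (toy data).
g1 = [(0.0,), (0.0,), (1.0,), (2.0,)]
g2 = [(0.0,), (1.0,), (2.0,), (2.0,)]
average = dba([g1, g2], initial=[(0.0,), (1.0,), (2.0,)])
print(average)
```

Because DTW absorbs the time shift, both toy gestures align perfectly with the three-frame initial sequence, so the barycenter keeps the shared shape rather than a blurred frame-by-frame mean.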

4.2. Consensus as Variation Around Barycenter

Vatavu (2019) was the first to propose a data-driven consensus measure that does not rely on human judgement of gesture similarity. To achieve this, they employed Dynamic Time Warping (DTW) distance computations to define a consensus measure: They considered two gesture sequences $g_i$ and $g_j$ as similar if the DTW distance was below a threshold $\epsilon$. To determine consensus for a referent they calculated the pairwise distances across all gesture proposals for this referent. Finally, to report a measure independent of the threshold value $\epsilon$, they used a logistic regression model to determine the consensus for a range of normalized threshold values and reported the growth rate as an indication of the overall consensus.
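The threshold-based idea can be illustrated with a small sketch that takes precomputed pairwise DTW distances and reports the share of pairs counted as similar at each threshold. This is only a simplified reading of the description above, not Vatavu's exact formulation: the logistic regression step is omitted, and `agreement_rate` and `agreement_curve` are hypothetical names.

```python
from typing import List, Sequence


def agreement_rate(pairwise_distances: Sequence[float],
                   threshold: float) -> float:
    """Share of gesture pairs whose distance does not exceed the threshold."""
    if not pairwise_distances:
        raise ValueError("need at least one pair of gestures")
    similar = sum(1 for d in pairwise_distances if d <= threshold)
    return similar / len(pairwise_distances)


def agreement_curve(pairwise_distances: Sequence[float],
                    thresholds: Sequence[float]) -> List[float]:
    """Agreement rate for each threshold in a range of threshold values."""
    return [agreement_rate(pairwise_distances, t) for t in thresholds]


# Made-up pairwise DTW distances for the proposals of one referent.
distances = [0.1, 0.2, 0.9, 1.5]
curve = agreement_curve(distances, thresholds=[0.25, 1.0, 2.0])
print(curve)
```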

This work motivates us to further explore data-driven measures of consensus: We follow a similar approach, but instead of regressing on the DTW distance values, and relying on pairwise comparisons, we directly compute an average sequence from all gesture proposals in a referent group, using DBA.

We then measure the DTW distance of every gesture proposal $g$ for a referent $r$ to the computed average gesture (i.e. barycenter) $\bar{g}_r$ for $r$. Finally, we report the variance of these DTW distances as a measure of consensus. Formally, this is noted as:

$$\mathrm{Var}(r) = \frac{1}{|G_r|} \sum_{g \in G_r} \mathrm{DTW}(g, \bar{g}_r)^2$$

$G_r$ denotes the set of all gestures elicited for referent $r$. Intuitively, for example, a high value may inform an analyst that referent $r$ contains quite varied gesture proposals (i.e. low consensus).

The gesture variance integrates well with GestureMap’s visualization concept because this already displays the involved average gestures as visual elements. Moreover, this approach yields a one-number summary without a logistic regression model on top. Overall, we see this approach as an additional measure, not a replacement of others: As a flexible tool, GestureMap can be extended to additionally display further such measures (e.g. the one by Vatavu (2019)) to support researchers with the analysis.
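As a worked illustration, the sketch below computes such a one-number summary from the DTW distances of each proposal to its referent's barycenter. It assumes that the variance is taken as the mean of the squared distances to the barycenter; the function name and the distance values are made up for the example.

```python
from typing import Sequence


def gesture_variance(distances_to_barycenter: Sequence[float]) -> float:
    """Mean squared DTW distance of the proposals to their average gesture."""
    if not distances_to_barycenter:
        raise ValueError("need at least one gesture proposal")
    squared = [d * d for d in distances_to_barycenter]
    return sum(squared) / len(squared)


# Made-up distances for two referents: proposals for the first stay close
# to their barycenter, proposals for the second are spread out.
tight_referent = [0.5, 0.4, 0.6, 0.5]
varied_referent = [1.0, 2.0, 3.0]

print(gesture_variance(tight_referent))
print(gesture_variance(varied_referent))
```

Under this reading, the second referent would be reported as having lower consensus because its proposals lie further from their average gesture.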

4.3. Clustering Gestures with DBA & K-Means

Being able to compute an average gesture enables the use of clustering methods that require average computations. Here, we use k-means in particular. The idea of clustering gesture elicitation data is motivated by two aspects:

  1. Exploration: For example, in “open elicitation” (Villarreal-Narvaez et al., 2020) or settings where referents are not predefined, such as in the work by Williamson and Murray-Smith (2012), a clustering may prove a valuable reference point to identify novel behavior.

  2. Annotation: Clustering may also be used to help kickstart (manual) annotation in cases where explicit groupings of proposals are desired (e.g. for agreement measures (Wobbrock et al., 2009b)).

Considering the literature, Jang et al. (2014) used an interactive hierarchical clustering approach with complete-linkage. In contrast, we experimented with the k-means algorithm, using DBA to calculate the centroids. We motivate this choice by interpretability of the resulting centroids, versus the abstract representations in the hierarchical approach: In particular, the centroids (i.e. average/barycenter gestures) are more compatible with our 2D gesture map, on which they could be displayed as paths. In contrast, a hierarchical treemap does not directly fit the map metaphor well.
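The k-means loop itself is independent of how distances and averages are computed, which is what makes the combination with DBA possible. The sketch below shows a generic k-means skeleton with pluggable `distance` and `average` functions; for gestures one would pass a DTW distance and a DBA average. The demo uses simple stand-ins (frame-wise Euclidean distance and point-wise mean on equal-length sequences), and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Seq = Sequence[float]


def kmeans(items: Sequence[Seq], initial_centroids: Sequence[Seq],
           distance: Callable[[Seq, Seq], float],
           average: Callable[[Sequence[Seq], Seq], Seq],
           iterations: int = 20) -> Tuple[List[int], List[Seq]]:
    """Generic k-means: assign items to nearest centroid, then re-average."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(iterations):
        # Assignment step: pick the closest centroid for every item.
        new_labels = [min(range(k), key=lambda c: distance(x, centroids[c]))
                      for x in items]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: recompute each centroid from its members. The old
        # centroid is passed along because DBA needs an initial sequence.
        for c in range(k):
            members = [x for x, lab in zip(items, labels) if lab == c]
            if members:
                centroids[c] = average(members, centroids[c])
    return labels, centroids


def euclidean(a: Seq, b: Seq) -> float:
    """Stand-in distance for equal-length sequences (DTW in practice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pointwise_mean(members: Sequence[Seq], initial: Seq) -> List[float]:
    """Stand-in average that ignores `initial` (DBA in practice)."""
    return [sum(col) / len(members) for col in zip(*members)]


items = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels, centroids = kmeans(items, [items[0], items[2]],
                           euclidean, pointwise_mean)
print(labels, centroids)
```

Passing the initial centroids explicitly keeps the example deterministic; as with any k-means variant, a poor initialisation can end in a local optimum.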

5. Implementation of GestureMap

We implemented GestureMap as an analysis tool that integrates the described concepts of both the interactive gesture map (Section 3) and the DBA-based computations (Section 4). Here we describe the key implementation aspects.

5.1. User Interface and Functionality

Figure 1 shows the UI; the following sections refer to the numbers in the figure. Overall, we implemented all UI views and interactions conceptually described in Section 3.

5.1.1. Gesture Map (Figure 1b ①)

Researchers can zoom, pan, and hover over the gesture map, and overlay a scatter plot or a density plot (e.g. Figure 1c) to explore individual or multiple gesture poses.

5.1.2. Experiment View (Figure 1 ③)

This view lists all referents and gesture proposals in a compact way as numbers for quick reference and selection. When hovering over an element, the corresponding gesture path is shown on the map for a moment.

5.1.3. 3D Skeleton View (Figure 1 ②, Figure 1 ④)

This view either shows the raw skeleton recording or a reconstructed skeleton. If researchers animate a gesture, it is simultaneously animated in this view and on the map. The progress of the animation can be controlled via a play/pause button and slider.

5.1.4. Statistics View (Figure 1 ⑤)

This view shows different metrics, namely variances around the average gesture sequence per selected referent (Section 4.2), the distributions of DTW distances of proposals to their average gesture sequence, and nearest neighbor distances for a selected gesture.

5.1.5. Cluster View (Figure 1 ⑥)

This dialog is opened with a button in Figure 1 ③ and lets users interactively cluster gesture proposals for a referent. Centroids can be animated, and once the clusters have been computed, users can toggle all gesture proposals that were assigned to a centroid.

5.2. Architecture

We used a server-client architecture. The frontend and backend modules communicate through a REST API, which transmits the data as JSON-formatted strings. The frontend was implemented with NodeJS (Foundation, 2020) and React (Facebook, 2020). For plotting, we use the PlotlyJs library (Inc., 2020b). For the backend, we used the Flask framework (Community, 2020a) and Pandas (Community, 2020b) to handle the data transformations and queries. We cached expensive computations, such as the computed average sequences and distance matrices, in MongoDB (Inc., 2020a). PyTorch (Paszke et al., 2017) was used to develop the embedding model.

6. Experiments

Ledo et al. (2018) identified four evaluation strategies for toolkit contributions. We follow their perspective to evaluate GestureMap, combining two such strategies: First, here we follow the Demonstration strategy and provide a detailed analysis of examples on elicitation data from related work. Second, Section 7 follows the Usage strategy and reports on a user study with HCI researchers.

6.1. Datasets

We consider four existing datasets: One explicit gesture elicitation study by Vatavu (2019), plus three datasets collected for gesture recognition systems (Aloba et al., 2018; Fothergill et al., 2012; Chen et al., 2015a). We first focus on the dataset by Vatavu (2019), which consists of 1312 full-body gestures elicited from children aged 3-6, recorded with a Kinect sensor. For preprocessing, we followed the original authors (Vatavu, 2019) but left out the resampling step.

6.2. Model Training

We used a Variational Autoencoder (VAE) (Doersch, 2016) to embed the data as a 2D gesture map. The VAE here serves as an exemplar of a model with both powerful (non-linear) encoding and decoding capabilities. We reflect on other possible choices in our discussion.

We trained the VAE on the poses (frames) of the mentioned dataset (Vatavu, 2019), each of which has 60 dimensions (20 body joints × 3 coordinates). We adapted the architecture from Spurr et al. (2018) (i.e. 4 hidden layers for both encoder and decoder) and used a 2D bottleneck layer. In line with Fu et al. (2019), we used a weight term to modulate the mix of KL-loss and reconstruction loss in early training. We trained for 2000 epochs with Adam (Kingma and Ba, 2014) (lr=).
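As an illustration of such a weight term, the following sketch shows a cyclical annealing schedule in the style of Fu et al. (2019) together with the weighted loss. It is a sketch under assumptions rather than the training code used here: the function names, the number of cycles (`n_cycles`), and the ramp proportion (`ramp`) are hypothetical choices, and setting `n_cycles=1` reduces the schedule to a single warm-up in early training.

```python
def kl_weight(step, total_steps, n_cycles=4, ramp=0.5):
    """Weight in [0, 1] for the KL term (assumed cyclical schedule).

    Within each cycle the weight rises linearly from 0 to 1 during the
    first `ramp` fraction of the cycle and then stays at 1.
    """
    cycle_len = total_steps / n_cycles
    pos = (step % cycle_len) / cycle_len  # position in the current cycle, in [0, 1)
    return min(1.0, pos / ramp)

def weighted_vae_loss(recon_loss, kl_loss, beta):
    """Total loss: reconstruction term plus the weighted KL term."""
    return recon_loss + beta * kl_loss

if __name__ == "__main__":
    # Print the weight at a few steps of a hypothetical 100-step run.
    for step in (0, 5, 20, 25, 30):
        print(step, kl_weight(step, total_steps=100))
```

A small weight early on lets the model first learn to reconstruct poses before the KL term pulls the latent codes towards the prior.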

We experimented with different numbers of hidden neurons: Overall, reconstruction loss decreases for larger models, regularized by the KL-loss, leading to diminishing returns and a decision for here. For full details, we provide the training scripts and model comparisons on the project website.

6.3. Global Observations

Here we demonstrate the use of GestureMap in a walkthrough of an explorative analysis: Examining the gesture map, the center (Figure 2C) reveals start/end poses (standing upright, arms at rest). We further see, for example, sitting (Figure 2B), clapping (Figure 2D), and raising an arm (Figure 2A). Thus, the map reveals the space of poses elicited by Vatavu (2019) at a glance: For example, their referents included crouch, draw a flower, draw a circle, draw a square, applaud or raise your hands, which all match the poses in our map.

Figure 2. Gesture map for the dataset by Vatavu (2019). Pose landmarks represent poses in that part of the learned gesture space. Marked areas are referenced in Section 6.3.

A grid of body poses representing the space of gesture poses. Four boxes highlight different areas in this visualization and are named A through D. The concrete description is given in the text.

Using overlays in GestureMap, we can identify similarities and differences between gestures across referents: For example, Figure 3 (left) shows that draw circle, draw flower, and draw square share common behavior; their scatter points largely overlap in the region that encodes “raised arm” behavior. In contrast, gestures proposed for crouch cover a different region (pink).

The variance plot in GestureMap (Figure 3 right) indicates that proposals for crouch and draw flower vary more than for draw circle and draw square. Potentially, for the children the basic shapes afforded less flexible interpretation than a flower or crouching.

We defined a consensus measure on this variability (Section 4.2): Comparing this variability between all referents, our results largely agree with Vatavu (2019): In particular, applaud, fly like a bird and hands up show high consensus while climb ladder, crouch, turn around have low consensus.

Figure 3. Left: Scatter plots show gesture poses for four referents elicited by Vatavu (2019) (crouch, draw circle, draw flower, draw square). Right: Variances of the gestures’ DTW distances to their average gesture sequence.

Two plots are shown side-by-side. The plots highlight that the variance of the gestures around their average sequence differs between referent groups. In descending order, the variances are largest for crouch, draw flower, draw square, and finally draw circle.

6.4. Local Observations

Figure 4. Gesture proposals for throw ball from four people (different colors). Trials per person are not discernible (same color), yet the colored paths distinctly cover different regions, revealing high consistency per person.

Multiple embedded 2D gesture paths are projected on the gesture map. The gesture sequences correspond to different study participants. Each participant’s gestures cover a region of the gesture map that is different from the other participants.

Proposals for crouch form two main clusters (pink points in Figure 3 left), one in the region of starting poses, another in sitting/crouching regions. Thus, GestureMap visually reveals that people interpreted crouch in different ways, matching the high variance (Figure 3 right). Examining the map locally, in combination with gesture animations, reveals that some children sat on the floor, some on their heels, some crawled on hands/knees, and others stood with a stooped body posture. Some additionally jumped at the end of their gesture proposals to get back onto their feet.

As another such example, for throw ball, behavior can be categorized into four clusters: Most children used their right hand, others used two hands, and some kicked the ball. Only a few used the left hand. As Figure 4 shows, the children mostly stuck to their interpretation across multiple repeated trials for that referent, revealing consistency (cf. Anthony et al., 2013). This is an example of using GestureMap’s spatial visualisation of gestures as paths for visual comparison via shape.

6.5. Interactive Clustering

For a typical elicitation study, such as this one by Vatavu (2019), it is reasonable to expect clusters induced by the referents. Therefore, to demonstrate our proposed clustering analysis, we removed the referent labels and then evaluated whether k-means finds clusters that match the original referents.

Concretely, we ran the clustering with 15 sequences chosen randomly. We then inspected the mix of original referents present in the gestures assigned to each found cluster. We repeated this ten times and made these observations:

  • Our k-means clustering identified those referents with high agreement (e.g. hands up, crouch, applaud, fly like a bird).

  • Gestures for referents with much common behavior appeared as one cluster (e.g. draw circle, square, flower). Note that this is not necessarily “wrong”, since a behaviour “draw something” would also have been a plausible referent.

  • The resting pose was detected as a separate cluster.

  • Other referents were (clearly) present only in some of the clustering repetitions.

Overall, this indicates the potential of automated clustering, for example, when examining data from open elicitation with no given referents. We return to ideas for improvements in our discussion.
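The inspection step described above (checking which original referents end up in each found cluster) can be sketched as follows. This is an illustrative helper, not part of GestureMap: the function names are hypothetical, and the purity score is a standard summary added here as an assumption, not a measure reported in this work.

```python
from collections import Counter, defaultdict

def referent_mix(cluster_labels, referents):
    """Count, per cluster, how often each original referent occurs."""
    mix = defaultdict(Counter)
    for cluster, referent in zip(cluster_labels, referents):
        mix[cluster][referent] += 1
    return dict(mix)

def purity(mix):
    """Fraction of gestures that belong to their cluster's majority referent."""
    total = sum(sum(counts.values()) for counts in mix.values())
    majority = sum(max(counts.values()) for counts in mix.values())
    return majority / total

if __name__ == "__main__":
    # Toy example: cluster 1 merges the three "draw" referents.
    labels = [0, 0, 1, 1, 1, 2]
    referents = ["hands up", "hands up", "draw circle",
                 "draw square", "draw flower", "applaud"]
    mix = referent_mix(labels, referents)
    for cluster, counts in sorted(mix.items()):
        print(cluster, dict(counts))
    print("purity:", purity(mix))
```

A cluster dominated by one referent corresponds to the first observation in the list above, while a cluster that mixes several "draw" referents corresponds to the second.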

In another experiment, we applied clustering to look for patterns within a referent: As mentioned, referents such as throw ball and crouch contained distinct patterns, revealed on the map. Indeed, running k-means revealed some of them: For example, for throw ball k-means also detected throwing with the right hand vs using both hands. In contrast, it did not separately find left hand and kicking, presumably since those were proposed only a few times.

6.6. Comparison Between Datasets

Other researchers noted that elicitation findings are spread across multiple venues and need to be consolidated (Villarreal-Narvaez et al., 2020). GestureMap supports this as it offers a platform to visualize and analyze multiple studies. We demonstrate this by creating a gesture map using four datasets (Aloba et al., 2018; Fothergill et al., 2012; Vatavu, 2019; Chen et al., 2015b).

To motivate a concrete example, Jain et al. (2016) showed that observers can distinguish behavior of children and adults. Figure 5 shows all 20 proposals for jump from the data by Aloba et al. (2018), next to the children’s proposals from Vatavu (2019). The gesture paths visit roughly similar main parts of the gesture space, yet the children do not find consensus. Our variance measure also reflects this (Aloba - adults , children ; Vatavu - children ).

Figure 5. Gesture paths for adults and children for “jumping” referents from two studies (Aloba et al., 2018; Jain et al., 2016).

Three plots are shown side-by-side. Each plot shows multiple gesture proposals as embedded 2D paths on top of the gesture map. The gesture sequences are more aligned for adults than for children.

As a second example, we compared behavior diversity across datasets. Without knowing anything about the referents, Figure 6 already reveals that one dataset (Aloba et al., 2018) (blue) covers a larger region than the other (Vatavu, 2019) (orange). Thus, it seems to contain a more diverse set of body poses. Indeed, this observation can be explained by the longer referent list (58 referents in (Aloba et al., 2018) vs 15 in (Vatavu, 2019)).

Figure 6. Combined gesture space from (Aloba et al., 2018; Fothergill et al., 2012; Vatavu, 2019; Chen et al., 2015b). The density plots projected on this gesture map refer to (Aloba et al., 2018) (blue) and (Vatavu, 2019) (orange).

Two density plots show the distribution of body poses from two datasets. One density plot covers a larger area of the gesture map than the other density plot. The larger plot therefore indicates that the corresponding dataset involves many diverse body poses.

7. User Study

To further evaluate GestureMap, we recruited eight HCI researchers (7 male, 1 female) from three universities via e-mail for remote think-aloud and interview sessions. Six were familiar with gesture elicitation studies, the other two were interested in analysing gesture sensor data. Five were familiar with machine learning.

7.1. Procedure

The interviews lasted 80 minutes and were conducted via screen-sharing using Skype/Zoom, with GestureMap hosted online such that people could use it on their own computer. We again used the dataset by Vatavu (2019). With people’s consent we recorded the interviews. We encouraged them to think out loud and occasionally asked questions to better understand actions. We took notes and compiled a report from this material. Given the exploratory nature of the interactions and the diversity in people’s approaches this was done in an inductive approach, leading to the themes in Section 7.2.

The interviews had four parts: 1) We introduced GestureMap (20 minutes), with a concept presentation, a guided walk through the tool and UI, and opportunities for questions. 2) In an exploratory, manual analysis task people were prompted to use GestureMap to identify groups of behaviors in the gesture proposals for two referents. In real use, researchers would conduct such analyses to better understand elicited data. 3) In a more confirmatory, automatic analysis task we asked them to build on their gained insights to initialize the clustering algorithm and refine the automatic clustering results. In real use, researchers might export this result, for example, for a report, calculations of agreement, etc. 4) The session concluded with a semi-structured interview of at least ten minutes. Here, we inquired into what people liked/disliked about GestureMap, and asked for ideas for improvements and additional features.

7.2. Findings

7.2.1. Initial Use

Upon first use, most people immediately animated a few gestures, saying that this was the most natural and familiar way to view the data. Since the map visualization was unfamiliar to them, some had initial difficulties understanding the distinction between single poses (points) and entire gestures (paths). These people found the animation particularly important: Seeing the 3D skeleton and the 2D path animated in sync highlighted that a gesture was a path on the map and thus helped them to get familiar with the map concept. Summarising their initial experience, one person said: “Although, the learning curve […] is steep, once you understand the core concepts, this tool offers a great overview of the entire behavior that is captured in the dataset.”

7.2.2. Statistical Plot Overlays

We asked the researchers to analyze the proposals for crouch and throw ball. Throughout the interview we noticed that all participants preferred the scatter plot over the density plot. When asked why they kept returning to the scatter plot, they said that it provided more detail and that density can also be estimated from scatter points. They also said that points were visually closer to the data (point=pose).

7.2.3. Details of the Gesture Map View

When study participants paused their exploration for a longer period, we inquired why that was the case. Some people noted that they struggled to find a specific pose on the map. They suggested increasing the visibility of the poses by showing fewer and larger landmarks. Another researcher felt that the map should show more detail so it would be easier to judge differences and transitions of poses. Together, this feedback motivates a changeable grid size (our zoom was implemented to always keep an 11 × 11 grid).

Some found similar poses encoded in different map regions and noted that these should ideally reside in one area. This is an artefact of dimensionality reduction, as we discuss further in Section 8.2.

7.2.4. Exploration Strategies

When we asked the participants what the main aspect was that they used to determine interesting behavioral patterns, we observed diverse analysis strategies, but we broadly highlight two main ones:

1) Shape driven analysis: Some started by skimming through gestures to get an overview of their different path shapes on the map. They stopped to examine gestures in more detail that differed largely from the shapes seen so far. In a sense, they searched for outlier behavior based on the path shapes. These participants noted that the 2D gesture path visualization offers a quick way to spot irregular behavior and that their analysis becomes an active search versus passively watching every gesture individually.

2) Position driven analysis: In contrast, other participants focused entirely on the scatter points as template poses. Using expectations about possible behavior for a gesture proposal (e.g. left vs right hand throwing), they examined scatter points in those map regions that based on the landmark skeletons encoded related poses.

7.2.5. Manually Forming Clusters

Regardless of their initial analysis strategy, when asked which feature they would use to group the gestures, people agreed on the path shapes as the primary discerning feature (strategy 1). For the crouch referent, everyone distinguished two to three groups of behaviors. For throw ball, everyone found at least three (left/right/both handed throwing). Some also found the kicking behavior as described in Section 6. Overall, the researchers felt comfortable with grouping the proposals based on the path shapes. However, there were some complex paths (e.g. crossing over many poses on the map) that people were unable to assign to a group. One person suggested creating an outlier group for these.

7.2.6. Interactive Clustering

We asked people to use the interactive clustering tool based on their observations in the first task.

Next, they were asked to initialize the clustering algorithm using their knowledge from the previous task. Now, all participants specifically searched for individual gesture proposals as templates (strategy 2) and used those to initialize the algorithm.

However, the resulting computed centroids often deviated from people’s expectations, and thus did not immediately make sense to them. One user noted that one still has to inspect all gesture proposals in order to choose suitable initialisations for the k-means algorithm. On the positive side, the researchers liked the refinement step, where they could reassign proposals to another cluster. These reassignments, however, were not yet considered when rerunning the clustering algorithm in the current implementation.

Overall, after being asked to give a final verdict over the interactive clustering feature, all deemed it important. However, they noted that it should be more accurate and manually refined assignments need to be respected when rerunning the clustering algorithm, thus enabling iterative, interactive use. Technically, this can be readily implemented by initialising k-means with the current (refined) assignments.
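The suggested fix can be sketched generically: instead of starting from seed gestures, k-means starts from the current (manually refined) assignments and computes the first centroids from them. The sketch below is a hypothetical illustration, not GestureMap code. The `average` and `distance` arguments are placeholders for DBA and DTW; the toy usage substitutes the arithmetic mean and absolute difference on plain numbers to stay short.

```python
from statistics import mean

def kmeans_from_assignments(items, assignments, average, distance, iters=10):
    """Run k-means starting from given cluster assignments instead of seeds."""
    labels = list(assignments)
    clusters = sorted(set(labels))
    centroids = {}
    for _ in range(iters):
        # Update step: recompute each centroid from its current members.
        for c in clusters:
            members = [x for x, l in zip(items, labels) if l == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = average(members)
        # Assignment step: move every item to its nearest centroid.
        new_labels = [min(clusters, key=lambda c: distance(x, centroids[c]))
                      for x in items]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

if __name__ == "__main__":
    # Toy data; the last item was manually moved into cluster 0.
    items = [1.0, 1.2, 0.9, 5.0, 5.3, 3.0]
    refined = [0, 0, 0, 1, 1, 0]
    labels, centroids = kmeans_from_assignments(
        items, refined, average=mean, distance=lambda a, b: abs(a - b))
    print(labels, centroids)
```

Note that this only uses the refined assignments as a starting point: later iterations may still move a manually assigned gesture, so strictly pinning manual choices would need an additional constraint.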

8. Discussion

8.1. Extending the Gesture Elicitation Toolbox

GestureMap builds on and extends functionalities of previous tools for gesture elicitation: It combines 1) gesture modeling and visualization, 2) automatic computation of elicitation metrics, and 3) interactive clustering to provide an integrated analysis platform.

Seeing this and related work as a “toolbox”, researchers may now consider various options: For example, AGATE 2.0 (Vatavu and Wobbrock, 2016) is a highly specialized tool to compute agreement, which assumes a given labeled dataset. GestureMap could be used to label data and export it for analysis in tools like this.

Alternatively, Ali et al. (2018) proposed a crowd platform for annotation, yet without computational support for the workers, such as alternative gesture representations or similarity measures. Such support as shown in GestureMap could be combined with a crowd approach in the future. GestureMap is already implemented as a web-based tool, rendering it flexible and open to such integration.

Looking ahead, new cloud elicitation tools (Magrofuoco and Vanderdonckt, 2019; Ali et al., 2019) yield large datasets. GestureMap’s concepts support handling large data, visually summarised and explored via our map view.

Finally, the “toolbox” in the literature includes several formalized agreement measures (Wobbrock et al., 2009b; Vatavu, 2012). These could be used also with our interactive clustering, for example, by plugging in the cluster cardinalities instead of subjective gesture group counts.

8.2. Reflection on Model & Clustering Choices

Here, we highlight model and clustering aspects to consider.

8.2.1. Smoothness of the Latent Space

A smooth latent space facilitates suitable visualization by reducing “jumps” in gesture paths. These occur due to recording issues (e.g. sensor occlusion in some frames) or when subsequent poses are embedded far apart in the 2D space. While some models address this (e.g. we used a VAE instead of AE), there is no universal “natural” 2D layout of body poses and some artifacts are likely to exist for most models and datasets. Besides technical model improvements, visualization concepts could be explored to address this as well (e.g. visually mark “jumps” along the gesture path).

8.2.2. Cluster Approaches

A difficulty with k-means is setting the number of clusters k. As an example strategy, to detect the subgroup behavior for the throw ball referent, we quickly skimmed through the gestures using the map and visually identified rough patterns. We then chose k correspondingly. We chose k-means because it readily integrates with the gesture map and the “variance around mean gesture” that we introduced in Section 4.2. Color coding the cluster results can be done quickly. Jang et al. (2014) proposed to use interactive hierarchical clustering. Integrating such a tree-like layout into the gesture map adds complexity and might be material for future endeavours. We can imagine that average gestures calculated with the DBA algorithm can be used to visualize the non-leaf nodes in the hierarchical tree. In addition, interactive hierarchical clustering would eliminate the need for choosing the number of clusters beforehand.

8.2.3. Feature Representation

Hand-engineered features (Jang et al., 2014; Aloba et al., 2020; Vatavu, 2017) may help with the interpretation; however, they may be specific to a sensor and interaction setup. As an exploratory tool, GestureMap’s learned space is applicable to new and changing setups, without developing hand-engineered features first. Furthermore, our learned representation supports gesture simulation, which is useful to examine regions of the behavior space that were not covered by participants.

8.3. Opportunities for Research & Applications

Here we outline further ideas enabled or supported by GestureMap.

8.3.1. Supporting Meta-Analysis and Consolidation

GestureMap empowers researchers to compare data across studies (cf. Section 6). As a community, we could consolidate our findings in a meta-map of many studies, as a sensor data-driven complement to literature surveys (Villarreal-Narvaez et al., 2020). For instance, such a map might reveal which gestures and poses are most common or intensely studied. Separate maps could also compare gesture spaces for different contexts, devices, etc., for example, to better understand the influences of such factors.

8.3.2. Enabling Map-based Gesture Authoring

GestureMap could be extended to define new gestures: For example, users could draw a gesture as a path on the map. Since the underlying latent variable model can simulate new behavior (decoding), such a drawn path implicitly defines a pose sequence that could be exported as a template-based gesture recogniser. As an alternative to drawing, users could demonstrate the gesture in front of the sensor, with a “cursor” moving on the map live. Users could also select recorded gestures on the map, labelled manually or with help from our clustering tool, to train a classifier. Such a recognizer then also could be used in other tools to support sensor feed annotation (e.g. (Nebeling et al., 2015)).
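The idea that a drawn path implicitly defines a pose sequence can be sketched as follows: the drawn polyline is resampled to equally spaced map points, and each point is passed through the model's decoder. This is a sketch under assumptions. The names `resample_path` and `path_to_poses` are hypothetical, `decode` stands for any function mapping a 2D map point to a pose (e.g. a trained VAE decoder), and the toy decoder in the usage is only a placeholder.

```python
import math

def resample_path(points, n):
    """Resample a 2D polyline (>= 2 points) to n >= 2 points equally spaced along it."""
    seg = [math.dist(p, q) for p, q in zip(points, points[1:])]
    total = sum(seg)
    out = []
    for k in range(n):
        target = total * k / (n - 1)  # arc length at which to place point k
        i, walked = 0, 0.0
        # Skip whole segments until the target lies within segment i.
        while i < len(seg) - 1 and walked + seg[i] < target:
            walked += seg[i]
            i += 1
        t = (target - walked) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def path_to_poses(points, decode, n_frames=30):
    """Turn a drawn map path into a pose sequence by decoding resampled points."""
    return [decode(z) for z in resample_path(points, n_frames)]

if __name__ == "__main__":
    drawn = [(0, 0), (2, 0), (2, 2)]  # an L-shaped stroke on the map

    def toy_decode(z):
        # Placeholder decoder: a real one would return a full skeleton pose.
        return (z[0], z[1], z[0] + z[1])

    for pose in path_to_poses(drawn, toy_decode, n_frames=5):
        print(pose)
```

Equal spacing along the path is a design choice that makes drawing speed irrelevant; a tool could instead keep the timing of the stroke to control gesture speed.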

8.3.3. Enabling Analysis of Unseen Behavior

So far, elicitation has focused on observed gestures, yet it might also be relevant to examine why behavior was not observed. GestureMap enables this: Researchers can explore map areas without data, which may reveal unlikely behavior, or indicate issues with interaction (e.g. anatomically difficult or tiring gestures) or the sensor (e.g. gestures leading to self-occlusion of body parts). In this way, GestureMap supports the diagnosis of challenges and limitations in the joint user-sensor space of an interactive system (cf. (Williamson and Murray-Smith, 2012)).
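One simple way to locate map areas without data is to bin the embedded poses into a grid and list the cells that stay empty. The sketch below is illustrative only: the function name and the `bounds` argument are hypothetical, and the default of 11 cells per axis merely mirrors the 11 × 11 landmark grid mentioned in Section 7.2.3.

```python
def empty_cells(points, bounds, grid=11):
    """Return the (row, col) grid cells of the 2D map that contain no pose."""
    (xmin, xmax), (ymin, ymax) = bounds
    occupied = set()
    for x, y in points:
        # Map coordinates to cell indices, clamped to the grid.
        col = int((x - xmin) / (xmax - xmin) * grid)
        row = int((y - ymin) / (ymax - ymin) * grid)
        occupied.add((max(0, min(grid - 1, row)), max(0, min(grid - 1, col))))
    return [(r, c) for r in range(grid) for c in range(grid)
            if (r, c) not in occupied]

if __name__ == "__main__":
    embedded = [(0.1, 0.1), (0.9, 0.9)]  # toy 2D embeddings of two poses
    print(empty_cells(embedded, bounds=((0.0, 1.0), (0.0, 1.0)), grid=2))
```

Decoding the centers of such empty cells would show which poses the model places there, which a researcher could then judge as unlikely, tiring, or hard to sense.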

8.3.4. Supporting Live Exploration and Monitoring

GestureMap could be extended to more than post-hoc analysis: For example, we could embed live sensor data and continuously update the underlying model. This live embedding provides a monitoring tool, for example, for participants to see their currently performed gesture (e.g. shown as a “cursor”/point on the map), possibly to nudge them towards exploring new regions of the behavior space (cf. (Williamson and Murray-Smith, 2012)). One could also predefine a gesture path to monitor live performances and to judge deviation from this “template”, possibly to learn/teach a movement sequence. Relatedly, gesture sets are mostly presented as drawings and videos today (McAweeney et al., 2018). Instead, GestureMap could be used to show gestures to users, allowing them to reenact and explore them with live monitoring via the map.

9. Conclusion

As our key contribution, we presented a set of visualization and analysis concepts for gesture elicitation data and a tool that implements them: GestureMap is the first visual analytics tool for gesture elicitation which directly visualises the space of gestures, using a learned 2D embedding. It further leverages the computation of average gestures to enable researchers to 1) represent gesture groups with one gesture; 2) assess consensus as variance around this average gesture; and 3) cluster gestures automatically.

Expert users especially liked the visual expressiveness of GestureMap, as it quickly summarizes the underlying dataset. The extensibility of GestureMap further encourages future work to employ machine learning as a tool for analysis of human behavior. With this work, we contribute to the vision of more widespread use of applicable computational methods in HCI, also to support more extensive and cost-efficient large-scale, data-driven HCI work. Given the proliferation of crowd platforms to collect large datasets, we expect computational methods and visual analytics as proposed here to become indispensable tools for many future HCI studies.

GestureMap and further materials are available on the project website:

This project is funded by the Bavarian State Ministry of Science and the Arts and coordinated by the Bavarian Research Institute for Digital Transformation (bidt).


  • A. X. Ali, M. R. Morris, and J. O. Wobbrock (2018) Crowdsourcing similarity judgments for agreement analysis in end-user elicitation studies. In Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology, UIST ’18, New York, NY, USA, pp. 177–188. External Links: ISBN 9781450359481, Link, Document Cited by: §2.2, §3.1.6, Table 1, §8.1.
  • A. X. Ali, M. R. Morris, and J. O. Wobbrock (2019) Crowdlicit: a system for conducting distributed end-user elicitation and identification studies. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, New York, NY, USA, pp. 1–12. External Links: ISBN 9781450359702, Link, Document Cited by: §2.2, Table 1, §8.1.
  • A. Aloba, G. Flores, J. Woodward, A. Shaw, A. Castonguay, I. Cuba, Y. Dong, E. Jain, and L. Anthony (2018) Kinder-gator: the uf kinect database of child and adult motion. Goslar, DEU, pp. 13–16. Cited by: Figure 5, Figure 6, §6.1, §6.6, §6.6, §6.6.
  • A. Aloba, J. Woodward, and L. Anthony (2020) FilterJoint: toward an understanding of whole-body gesture articulation. New York, NY, USA, pp. 213–221. External Links: ISBN 9781450375818, Link, Document Cited by: §8.2.3.
  • L. Anthony, R. Vatavu, and J. O. Wobbrock (2013) Understanding the consistency of users’ pen and finger stroke gesture articulation. CAN, pp. 87–94. External Links: ISBN 9781482216806 Cited by: §6.4.
  • F. Cafaro, L. Lyons, and A. N. Antle (2018) Framed guessability: improving the discoverability of gestures and body movements for full-body interaction. New York, NY, USA, pp. 1–12. External Links: ISBN 9781450356206, Link, Document Cited by: §3.4.2.
  • C. Chen, R. Jafari, and N. Kehtarnavaz (2015a) UTD-mhad: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. pp. 168–172. Cited by: §6.1.
  • C. Chen, R. Jafari, and N. Kehtarnavaz (2015b) UTD-mhad: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. pp. 168–172. Cited by: Figure 6, §6.6.
  • Z. Chen, X. Ma, Z. Peng, Y. Zhou, M. Yao, Z. Ma, C. Wang, Z. Gao, and M. Shen (2018) User-defined gestures for gestural interaction: extending from hands to other body parts. International Journal of Human–Computer Interaction 34 (3), pp. 238–250. Cited by: §3.1.2, Table 1.
  • Community (2020a) Note: 2020-09-15 Cited by: §5.2.
  • Community (2020b) Note: Accessed: 2020-09-15 External Links: Link Cited by: §5.2.
  • N. K. Dim, C. Silpasuwanchai, S. Sarcar, and X. Ren (2016) Designing mid-air tv gestures for blind people using user- and choice-based elicitation approaches. In Proceedings of the 2016 ACM Conference on Designing Interactive Systems, DIS ’16, New York, NY, USA, pp. 204–214. External Links: ISBN 9781450340311, Link, Document Cited by: §3.1.2, Table 1.
  • C. Doersch (2016) Tutorial on variational autoencoders. External Links: 1606.05908 Cited by: §6.2.
  • J. L. E, I. L. E, J. A. Landay, and J. R. Cauchard (2017) Drone & wo: cultural influences on human-drone interaction techniques. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI ’17, New York, NY, USA, pp. 6794–6799. External Links: ISBN 9781450346559, Link, Document Cited by: 1st item, 3rd item, §1.
  • Facebook (2020) Note: Accessed: 2020-09-15 External Links: Link Cited by: §5.2.
  • S. Few and P. Edge (2008) What ordinary people need most from information visualization today. Cited by: §2.3.
  • S. Fothergill, H. Mentis, P. Kohli, and S. Nowozin (2012) Instructing people for training gestural interactive systems. New York, NY, USA, pp. 1737–1746. External Links: ISBN 9781450310154, Link, Document Cited by: Figure 6, §6.1, §6.6.
  • O. Foundation (2020) Note: Accessed: 2020-09-15 External Links: Link Cited by: §5.2.
  • H. Fu, C. Li, X. Liu, J. Gao, A. Celikyilmaz, and L. Carin (2019) Cyclical annealing schedule: a simple approach to mitigating kl vanishing. External Links: 1903.10145 Cited by: §6.2.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial networks. External Links: 1406.2661 Cited by: §2.3.
  • M. Inc. (2020a) Note: Accessed: 2020-09-15 External Links: Link Cited by: §5.2.
  • P. T. Inc. (2020b) Note: Accessed: 2020-09-15 External Links: Link Cited by: §5.2.
  • E. Jain, L. Anthony, A. Aloba, A. Castonguay, I. Cuba, A. Shaw, and J. Woodward (2016) Is the motion of a child perceivably different from the motion of an adult?. ACM Trans. Appl. Percept. 13 (4). External Links: ISSN 1544-3558, Link, Document Cited by: Figure 5.
  • S. Jang, N. Elmqvist, and K. Ramani (2014) GestureAnalyzer: visual analytics for pattern analysis of mid-air hand gestures. In Proceedings of the 2nd ACM Symposium on Spatial User Interaction, SUI ’14, New York, NY, USA, pp. 30–39. External Links: ISBN 9781450328203, Link, Document Cited by: §2.2, §3.1.1, §3.1.5, §3.1.6, §3.1.7, §3.2.1, Table 1, §4.3, §8.2.2, §8.2.3.
  • S. K. Kane, J. O. Wobbrock, and R. E. Ladner (2011) Usable gestures for blind people: understanding preference and performance. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, New York, NY, USA, pp. 413–422. External Links: ISBN 9781450302289, Link, Document Cited by: 3rd item.
  • H. Kim, J. Choo, H. Park, and A. Endert (2016) InterAxis: steering scatterplot axes via observation-level interaction. IEEE Transactions on Visualization and Computer Graphics 22 (1), pp. 131–140. Cited by: §2.3, §3.2.1.
  • L. H. Kim, D. S. Drew, V. Domova, and S. Follmer (2020) User-defined swarm robot control. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20, New York, NY, USA, pp. 1–13. External Links: ISBN 9781450367080, Link, Document Cited by: 1st item, §1, §2.1, §2.1.
  • D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. External Links: 1312.6114 Cited by: §2.3, §2.3.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. External Links: 1412.6980 Cited by: §6.2.
  • C. Kühnel, T. Westermann, F. Hemmert, S. Kratz, A. Müller, and S. Möller (2011) I’m home: defining and evaluating a gesture set for smart-home control. International Journal of Human-Computer Studies 69 (11), pp. 693–704. External Links: ISSN 1071-5819, Document, Link Cited by: §2.1.
  • B. C. Kwon, H. Kim, E. Wall, J. Choo, H. Park, and A. Endert (2016) AxiSketcher: interactive nonlinear axis mapping of visualizations through user drawings. IEEE Transactions on Visualization and Computer Graphics 23 (1), pp. 221–230. Cited by: §2.3.
  • N. D. Lawrence (2003) Gaussian process latent variable models for visualisation of high dimensional data. Cambridge, MA, USA, pp. 329–336. Cited by: §2.3.
  • N. D. Lawrence (2004) Gaussian process latent variable models for visualisation of high dimensional data. In Advances in neural information processing systems, pp. 329–336. Cited by: §2.3.
  • D. Ledo, S. Houben, J. Vermeulen, N. Marquardt, L. Oehlberg, and S. Greenberg (2018) Evaluation strategies for HCI toolkit research. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, New York, NY, USA, pp. 1–17. External Links: ISBN 9781450356206, Link, Document Cited by: §6.
  • Y. Liu, E. Jun, Q. Li, and J. Heer (2019) Latent space cartography: visual analysis of vector space embeddings. In Computer Graphics Forum, Vol. 38, pp. 67–78. Cited by: §2.3, §3.1.3.
  • N. Magrofuoco and J. Vanderdonckt (2019) Gelicit: a cloud platform for distributed gesture elicitation studies. Proc. ACM Hum.-Comput. Interact. 3 (EICS). External Links: Link, Document Cited by: §2.2, §3.1.7, Table 1, §8.1.
  • E. McAweeney, H. Zhang, and M. Nebeling (2018) User-driven design principles for gesture representations. New York, NY, USA, pp. 1–13. External Links: ISBN 9781450356206, Link, Document Cited by: §8.3.4.
  • L. McInnes, J. Healy, and J. Melville (2018) UMAP: uniform manifold approximation and projection for dimension reduction. External Links: 1802.03426 Cited by: §2.3.
  • M. R. Morris (2012) Web on the wall: insights from a multimodal interaction elicitation study. In Proceedings of the 2012 ACM International Conference on Interactive Tabletops and Surfaces, ITS ’12, New York, NY, USA, pp. 95–104. External Links: ISBN 9781450312097, Link, Document Cited by: §2.1.
  • M. A. Nacenta, Y. Kamber, Y. Qiang, and P. O. Kristensson (2013) Memorability of pre-designed and user-defined gesture sets. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, Vol. 1, New York, NY, USA. External Links: ISBN 9781450318990, Link, Document Cited by: §3.4.2.
  • M. Nebeling, D. Ott, and M. C. Norrie (2015) Kinect analysis: a system for recording, analysing and sharing multimodal interaction elicitation studies. In Proceedings of the 7th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS ’15, New York, NY, USA, pp. 142–151. External Links: ISBN 9781450336468, Link, Document Cited by: §2.2, §3.1.1, §3.1.5, §3.1.6, §3.1.7, Table 1, §8.3.2.
  • A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. Note: Accessed: 2020-09-15 External Links: Link Cited by: §5.2.
  • F. Petitjean, A. Ketterlin, and P. Gançarski (2011) A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition 44 (3), pp. 678–693. Cited by: 2nd item, §4.1.
  • T. Piumsomboon, A. Clark, M. Billinghurst, and A. Cockburn (2013) User-defined gestures for augmented reality. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’13, New York, NY, USA, pp. 955–960. External Links: ISBN 9781450319522, Link, Document Cited by: §2.1.
  • D. Sacha, A. Stoffel, F. Stoffel, B. C. Kwon, G. Ellis, and D. A. Keim (2014) Knowledge generation model for visual analytics. IEEE Transactions on Visualization and Computer Graphics 20 (12), pp. 1604–1613. Cited by: Table 1.
  • C. Silpasuwanchai and X. Ren (2015) Designing concurrent full-body gestures for intense gameplay. International Journal of Human-Computer Studies 80, pp. 1–13. Cited by: §3.1.2.
  • A. Skupin (2002) A cartographic approach to visualizing conference abstracts. IEEE Comput. Graph. Appl. 22 (1), pp. 50–58. External Links: ISSN 0272-1716, Link, Document Cited by: §3.2.1.
  • D. Smilkov, N. Thorat, C. Nicholson, E. Reif, F. B. Viégas, and M. Wattenberg (2016) Embedding projector: interactive visualization and interpretation of embeddings. External Links: 1611.05469 Cited by: §2.3, §3.1.3.
  • T. R. Smith and J. E. Gilbert (2018) Dancing to design: a gesture elicitation study. In Proceedings of the 17th ACM Conference on Interaction Design and Children, IDC ’18, New York, NY, USA, pp. 638–643. External Links: ISBN 9781450351522, Link, Document Cited by: 3rd item.
  • A. Spurr, J. Song, S. Park, and O. Hilliges (2018) Cross-modal deep variational hand pose estimation. External Links: 1803.11404 Cited by: §6.2.
  • T. Tsandilas (2018) Fallacies of agreement: a critical review of consensus assessment methods for gesture elicitation. ACM Trans. Comput.-Hum. Interact. 25 (3). External Links: ISSN 1073-0516, Link, Document Cited by: 1st item, 2nd item, §1, §2.2, Table 1.
  • E. R. Tufte (1986) The visual display of quantitative information. Graphics Press, USA. External Links: ISBN 096139210X Cited by: §2.2, §2.3.
  • J. W. Tukey (1977) Exploratory data analysis. Vol. 2, Reading, MA. Cited by: §3.4.
  • L. Van Der Maaten, E. Postma, and J. Van den Herik (2009) Dimensionality reduction: a comparative. Cited by: §2.3.
  • M. M. Van Hulle (2012) Self-organizing maps.. Cited by: §2.3.
  • R. Vatavu, L. Anthony, and J. O. Wobbrock (2014) Gesture heatmaps: understanding gesture performance with colorful visualizations. New York, NY, USA, pp. 172–179. External Links: ISBN 9781450328852, Link, Document Cited by: §2.3.
  • R. Vatavu and J. O. Wobbrock (2016) Between-subjects elicitation studies: formalization and tool support. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, New York, NY, USA, pp. 3390–3402. External Links: ISBN 9781450333627, Link, Document Cited by: §2.1, Table 1, §8.1.
  • R. Vatavu and I. Zaiti (2014) Leap gestures for TV: insights from an elicitation study. In Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video, TVX ’14, New York, NY, USA, pp. 131–138. External Links: ISBN 9781450328388, Link, Document Cited by: 1st item, §1, §2.1.
  • R. Vatavu (2012) User-defined gestures for free-hand TV control. In Proceedings of the 10th European Conference on Interactive TV and Video, EuroITV ’12, New York, NY, USA, pp. 45–48. External Links: ISBN 9781450311076, Link, Document Cited by: 1st item, §1, §1, §2.1, §2.1, §8.1.
  • R. Vatavu (2013) The impact of motion dimensionality and bit cardinality on the design of 3D gesture recognizers. Int. J. Hum.-Comput. Stud. 71 (4), pp. 387–409. External Links: ISSN 1071-5819, Link, Document Cited by: §2.1.
  • R. Vatavu (2017) Beyond features for recognition: human-readable measures to understand users’ whole-body gesture performance. International Journal of Human–Computer Interaction 33 (9), pp. 713–730. External Links: Document, Link Cited by: §8.2.3.
  • R. Vatavu (2019) The dissimilarity-consensus approach to agreement analysis in gesture elicitation studies. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, New York, NY, USA, pp. 1–13. External Links: ISBN 9781450359702, Link, Document Cited by: 2nd item, §1, §1, §2.1, §4.2, §4.2, Figure 2, Figure 3, Figure 6, §6.1, §6.2, §6.3, §6.3, §6.5, §6.6, §6.6, §6.6, §7.1.
  • S. Villarreal-Narvaez, J. Vanderdonckt, R. Vatavu, and J. O. Wobbrock (2020) A systematic review of gesture elicitation studies: what can we learn from 216 studies?. In Proceedings of the 2020 ACM Designing Interactive Systems Conference, DIS ’20, New York, NY, USA, pp. 855–872. External Links: ISBN 9781450369749, Link, Document Cited by: 3rd item, 4th item, 2nd item, §1, §1, §3.1.4, Table 1, item 3, item 1, §6.6, §8.3.1.
  • J. Wenskovitch, M. Dowling, and C. North (2020) With respect to what? simultaneous interaction with dimension reduction and clustering projections. New York, NY, USA, pp. 177–188. External Links: ISBN 9781450371186, Link, Document Cited by: §3.2.1.
  • J. Williamson and R. Murray-Smith (2012) Rewarding the original: explorations in joint user-sensor motion spaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, New York, NY, USA, pp. 1717–1726. External Links: ISBN 9781450310154, Link, Document Cited by: §1, §1, item 1, §8.3.3, §8.3.4.
  • J. O. Wobbrock, H. H. Aung, B. Rothrock, and B. A. Myers (2005) Maximizing the guessability of symbolic input. In CHI ’05 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’05, New York, NY, USA, pp. 1869–1872. External Links: ISBN 1595930027, Link, Document Cited by: §1, §1, §2.1.
  • J. O. Wobbrock, M. R. Morris, and A. D. Wilson (2009a) User-defined gestures for surface computing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, New York, NY, USA, pp. 1083–1092. External Links: ISBN 9781605582467, Link, Document Cited by: §2.1.
  • J. O. Wobbrock, M. R. Morris, and A. D. Wilson (2009b) User-defined gestures for surface computing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, New York, NY, USA, pp. 1083–1092. External Links: ISBN 9781605582467, Link, Document Cited by: 1st item, §1, §2.1, §2.1, item 2, §8.1.