Which One to Go: Security and Usability Evaluation of Mid-Air Gestures

11/26/2018 ∙ by Wenyuan Xu, et al. ∙ 0

With the emergence of touch-less human-computer interaction techniques and gadgets, mid-air hand gestures have been widely used for authentication. Much of the literature has examined either the usability or the security of a handful of gestures. This paper aims at quantifying the usability and security of gestures as well as understanding their relationship across multiple gestures. To study gesture-based authentication, we design an authentication method that combines Dynamic Time Warping (DTW) and Support Vector Machine (SVM), and conduct a user study with 42 participants over a period of 6 weeks. We objectively quantify the usability of a gesture by the number of corners and the frame length of all gesture samples, the security by the equal error rate (EER), and the consistency by the EER over a period of time. Meanwhile, we obtain subjective evaluations of usability and security by conducting a survey. By examining the responses, we find that the subjective evaluation agrees with the objective one, and that usability is in an inverse relationship with security. We studied the consistency of gestures and found that most participants forgot gestures to some degree, so reinforcing the memorization of gestures is necessary to improve authentication performance. Finally, we performed a study with another 17 participants on shoulder surfing attacks, where attackers can observe the victims multiple times. The results show that shoulder surfing does not help to boost the attacks.




I Introduction

The proliferation of various gesture capturing devices (e.g., touch screens and depth sensors) has enabled user-friendly ways to operate computers as well as to authenticate users. Essentially, such gesture-based authentication is a form of behavioral biometrics. Compared with traditional methods (e.g., passwords, tokens, or physiological biometrics), gesture-based authentication has several advantages and is believed to be resistant to shoulder surfing, password theft, and token loss. Not surprisingly, much work has been devoted to gesture-based authentication, and researchers have studied both contact-based and mid-air gestures. Contact-based gestures are captured while users physically touch I/O devices. In comparison, mid-air gestures require no physical contact with devices, and thus can eliminate smudge attacks [1], avoid bacteria propagation, and allow scenarios where touch is impossible (e.g., in a clean room [2]). In light of these advantages, in this paper, we investigate the security and usability of mid-air gestures.

Fig. 1: Illustration of using various gestures as authentication inputs. Left picture shows waving with multiple fingers and right picture shows writing a You with one finger.

Researchers have already proposed and studied various mid-air gestures for authentication. To name a few, these include signature gestures captured by a Leap Motion controller [3], two ‘upward’ hand movements [4], and simple gestures, such as drawing shapes, symbols, and digits, captured by a web camera with a short-range depth sensor [5]. These works provide insights for designing gestures for authentication. However, they either focused only on the security of mid-air gestures or performed a preliminary user study on a limited number of gestures over a short period.

Several questions remain open. Does a complicated gesture always map to a higher level of security? Will a complicated gesture encompass larger variance and cause inconsistency in identifying a user? Does a complicated gesture imply poor usability, e.g., does it take a long time to perform and is it difficult to remember? Does gesture-based authentication share the same dilemma as passwords: what is secure is difficult to remember? Given a gesture, can we provide quick feedback on its security level and thus assist in choosing a better gesture? This paper aims at answering these questions. In particular, we first selected a collection of representative mid-air gestures and user-defined gestures with the goal of exploring the trade-off between usability and security. We quantify the security and usability of each gesture by using both objective metrics that are calculated based on gesture samples and subjective metrics derived from a survey. Using the gesture samples collected from users over 6 weeks and their survey responses, we show that the quantitative metrics match the subjective perception of users and thus can be used to quantify the security and usability of gestures. Since we discovered that usability and security are in an inverse relationship, we can use the quantitative metrics of usability, i.e., the number of corners and the length of a gesture, to quantify the security of a gesture and provide quick feedback on it. We summarize our contributions below.

  • We proposed a set of metrics to quantify the usability and security of a gesture, which include objective metrics that are calculated based on gesture samples and subjective metrics derived from a survey.

  • We proposed an authentication method that combines a template-based method (DTW) and a machine-learning based classifier (SVM). The combined method can handle large spatial-temporal variations of a gesture by using a small number of training samples.

  • We conducted two studies to quantify the gesture’s security and usability: an objective evaluation by authenticating gesture samples collected over 6 weeks and a subjective evaluation by gathering well-designed questionnaires from users.

  • Our studies indicate that usability and security are in an inverse relationship, and thus we can use simple metrics (the number of corners and frames) to quantify the security of a gesture for quick feedback. In addition, our study suggests that repeatedly performing a gesture can improve users’ perception of usability and help improve the consistency of gestures.

  • Our study on shoulder surfing shows that hand gestures are hard to mimic and that the shoulder surfing attack is not a main threat to our authentication system.

II Overview

In this section, we overview our problem definition and define metrics to quantify security, usability, as well as consistency.

II-A Problem

Numerous gestures have been proposed to authenticate users, yet little has been done to compare their performance in terms of security and usability. This paper aims at filling in the blank by quantifying the security and usability of different mid-air gestures. We quantify security and usability of each gesture by using both objective metrics that are calculated based on gesture samples and subjective metrics derived from a survey. In particular, this paper tries to answer the following questions.

  • Security question: Given a set of gestures, which gesture maps to the best security level, i.e., it yields the best accuracy of authentication?

  • Usability question: Given a set of gestures, which one is the easiest to use and the most acceptable to users? How to quantify the usability purely using the statistics of gestures?

  • Security vs. Usability: What is the relationship between security and usability when using mid-air gestures for authentication? Does gesture-based authentication share the same dilemma as passwords: secure gestures are more difficult to remember?

II-B Security

In the context of gesture-based authentication, we define security in terms of distinctness and resilience to attacks, in particular the shoulder surfing attack. A secure gesture should contain distinct biometric information that suffices for user authentication, i.e., even if two users perform the same gesture, their gesture samples should be distinguishable. In addition, a secure gesture should be resilient to attacks. Since much work claims without validation that mid-air gestures are robust against shoulder surfing attacks, we focus on this attack.

Metrics. We use the Equal Error Rate (EER), which is the error rate at the operating point where the false rejection rate equals the false acceptance rate, to quantify distinctness. In addition, we obtain users’ subjective perception of security by conducting a user survey. Details are discussed in Section V-A.
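As an illustration of the EER definition above, the following minimal sketch sweeps a decision threshold over similarity scores and reports the error rate where the false acceptance and false rejection rates are closest. The function name, the score convention (higher means more likely genuine), and the sample scores are assumptions made for this example, not details taken from the authentication system.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER from similarity scores (higher = more likely genuine).

    Hypothetical helper: sweeps every observed score as a threshold and
    returns the average of FAR and FRR where the two are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a genuine sample scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor sample scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine, impostor))
```

With a finite set of scores, the two rates rarely cross exactly, which is why the sketch averages them at the closest point.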

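We use precision and recall to analyze the performance of each gesture password in defending against the shoulder surfing attack. Precision is the percentage of honest users out of all the users that have passed verification, and it describes how cautious the system is in accepting a user. Formally, $\text{Precision} = TP / (TP + FP)$, where $TP$ is the number of honest users that are accepted and $FP$ is the number of attackers that are accepted. Recall is the percentage of the honest users that have been granted access out of all honest users, and it affects the user experience. Formally, $\text{Recall} = TP / (TP + FN)$, where $FN$ is the number of honest users that are rejected.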
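The two definitions can be made concrete with a short sketch. It assumes each authentication attempt is recorded as a pair of booleans (whether the person was an honest user, and whether the system accepted the attempt); the function name and the counts in the example are invented for illustration.

```python
def precision_recall(outcomes):
    """Compute (precision, recall) from (is_honest, accepted) boolean pairs."""
    tp = sum(1 for honest, accepted in outcomes if honest and accepted)
    fp = sum(1 for honest, accepted in outcomes if not honest and accepted)
    fn = sum(1 for honest, accepted in outcomes if honest and not accepted)
    # Guard against empty denominators (no accepted attempts / no honest users).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up attempts: 8 honest accepted, 2 honest rejected,
# 1 attacker accepted, 9 attackers rejected.
attempts = (
    [(True, True)] * 8
    + [(True, False)] * 2
    + [(False, True)] * 1
    + [(False, False)] * 9
)
precision, recall = precision_recall(attempts)
print(precision, recall)
```

In this toy case a single accepted attacker lowers precision, while the two rejected honest users lower recall.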

II-C Usability

Motivated by the standard ISO 9241-11 [6], we define the usability of a gesture by considering its efficiency, satisfaction, and learnability. Efficiency describes the resources required from users for successful authentication. Satisfaction reflects the comfort and acceptability of using the gesture, and learnability is defined by the “time of learning”, i.e., how easy is it for users to pass the gesture-based authentication at their first attempt [7]?

Metrics. To objectively quantify the efficiency and satisfaction of a gesture, we calculate the average length of the gesture samples (i.e., how long a user takes to perform the gesture) and the average number of corners in the gesture samples (i.e., the number of sharp turning points in the gesture). Intuitively, the longer it takes to perform a gesture or the more corners a gesture has, the less convenient it feels to the user and the poorer the usability. In addition, usability depends on how users perceive a gesture. Thus, we also conducted a comprehensive user survey on the usability of each gesture. Details are discussed in Section V-B.

II-D Consistency

Consistency (aka. memorability) can affect both security and usability, and thus we study consistency by itself. An ideal gesture should be consistent over time with little memorization requirements: when users return for authentication after a period of time since the last try, they can still provide gestures that contain the same biometric information as the ones that were initially enrolled for authenticating them. The more consistency, the better security performance and the less effort to pass authentication.

Metrics. We quantified consistency from three aspects:

  • the variance of the frame number of each gesture over time,

  • the variance of the corner number of each gesture over time, and

  • the EER of the gesture samples over a period of time (in our case, 6 weeks).

III System Design

We design an authentication system based on a Leap Motion controller, which is a 3D motion sensor that can track the motion of human hands as well as all ten fingers in 3D space. We define a gesture sample as one measurement that contains a complete gesture, i.e., the sequence of frames reported by Leap Motion. We developed a Java program that integrates Leap Motion’s SDK v2.0 [8] to collect gestures. After collecting the gesture data, we build a classifier that combines two algorithms (DTW and SVM) to distinguish users.

In this section, we introduce the candidate gestures, feature selection and the classifier of our authentication systems.

III-A Gesture Selection

Many types of gestures have been studied in prior work, either in the context of gesture recognition or user authentication. These gestures include, but are not limited to, swipe, zoom in/out, pan and scroll, point, and rotate, either on touch screens or in the air [9, 10, 11]; mid-air wave [12]; and mid-air signatures [13]. Covering every possible type of gesture is difficult, and thus we select a few popular gestures that are used for operating computing systems and controlling home appliances (e.g., smart TVs) and/or have been studied specifically for authentication, i.e., Swipe, Wave, Zoom, and Grab. We also choose the drawing gesture Circle, the writing gesture ‘abc’, and user-defined signatures (Sig), as they have been studied for authentication purposes. Finally, we let each user define a gesture that reflects his/her preferences and is not included in the aforementioned gestures; we call it the User-defined gesture.

TABLE I: Illustration of the selected gestures (Swipe, Wave, Circle, Zoom, Grab, ‘abc’, User-defined, and Sig). Each color corresponds to the trajectory of one finger.

In total, we select six pre-defined gestures and two user-defined gestures. We illustrate all gestures in Table I.

Pre-Defined Gestures

1) Swipe. Users intentionally swipe their hand from one position to another, and we define Swipe as a one-way movement. Nowadays, swiping on touch screens is a popular way to turn pages.

2) Wave. Users naturally wave their hand. We choose this gesture because we believe it is easy to perform and might lead to promising usability. Table I illustrates two ways to wave: move up and down (i.e., mimicking ocean waves), or wave back and forth between left and right (i.e., waving hands to represent Hello or goodbye).

3) Zoom. The Zoom in and Zoom out gestures require engaging at least two fingers: Zoom either gathers the finger-tips toward the palm center or spreads them out. Both gestures are commonly used on touch screens for changing the font size, showing/hiding a window, etc. We study Zoom in the context of 3D space.

4) Circle. Gesture Circle maps to the hand movement of drawing a circle when all five fingers are stretched out and pointing towards the computer screen. The movement can be performed clockwise or counterclockwise, and consists of one circle or multiple circles.

5) Grab. Gesture Grab is a quick, sudden clutch, starting with all fingers spread and ending at a fist.

6) ‘abc’. Gesture ‘abc’ is the hand movement of writing a string ‘abc’ with five fingers. It is chosen to test the gestures of writing letters/word.

User-Defined Gestures

1) Sig. We let users sign their initials/first/full names that might be used for signing documents. Some of our participants used all five fingers to sign and some of them used only one finger.

2) User-defined. We let each user freely make one gesture that he/she believes to be secure and convenient.

III-B Authentication Algorithms

Similar to most biometrics based authentication systems, we utilize supervised classifiers that require training to distinguish users. To effectively authenticate a user based on their mid-air gestures, we design an authentication algorithm that combines Dynamic Time Warping (DTW) [10] and Support Vector Machine (SVM) [14] methods. As a template-based method, DTW is widely used to quantify similarity between samples and only requires a small number of templates. It allows nonrigid warping along the temporal axis and thus can tolerate differences in timing between gestures, i.e., a user may perform the same gestures at slower or faster speeds among trials. SVM is a popular machine learning algorithm that can handle more complex spatial-temporal variations of the same gestures at the cost of a large number of training samples. By combining the DTW and SVM methods, the proposed method can handle large spatial-temporal variations of a gesture by using a small number of training samples.
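To illustrate why DTW tolerates differences in timing, the sketch below implements the textbook dynamic-programming form of DTW with a Euclidean per-frame cost. It is a generic illustration, not the exact variant or feature weighting used in the system; the function name and the toy one-dimensional sequences are assumptions for this example.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of feature vectors.

    a, b: lists of equal-dimension tuples (one tuple per frame).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # per-frame Euclidean cost
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of a repeated against b
                cost[i][j - 1],      # frame of b repeated against a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# The same movement performed slowly and quickly, plus a reversed movement.
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)]
fast = [(0.0,), (1.0,), (2.0,)]
reversed_move = [(2.0,), (1.0,), (0.0,)]
print(dtw_distance(slow, fast))           # timing differs, shape matches
print(dtw_distance(fast, reversed_move))  # shape differs
```

The slow and fast versions align perfectly after warping, whereas the reversed movement keeps a positive distance regardless of how the time axis is stretched.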

SVM Training and Classification. We train a binary SVM classifier for each user with template samples. Specifically, to train the SVM classifier for a given user, we take that user’s template samples as the positive training samples and the template samples from all other users as the negative training samples. For each training sample, we extract its DTW-based feature as the input to the SVM classifier; this feature has one dimension per template sample in the system. With the trained SVM classifiers, we can verify whether a new gesture sample is indeed performed by the claimed user by (1) computing the DTW distance between the new sample and all template samples to obtain the DTW-based feature of the sample, and (2) feeding this DTW-based feature to the claimed user’s SVM classifier. If the output of the SVM classifier is positive, the authentication succeeds. Otherwise, the authentication fails.

To enroll a new user, we collect his/her template samples, add them to the existing template samples, and retrain the SVM classifiers for all the users. In particular, the dimension of the sample feature increases by the number of the new user’s template samples. Note that users may have SVM classifiers with different feature dimensions, depending on the order of their enrollment. In the verification stage, given a new gesture sample, we calculate its features, which the trained SVM classifier then uses to verify it.

Frame-based features (for DTW):
  Hand: grab strength and pinch strength; pitch, yaw, and roll; palm width; x, y, z coordinates of the palm; x, y, z coordinates of the arm and wrist.
  Finger-tips: x, y, and z positions; x, y, and z velocities; x, y, and z directions; angles between consecutive frames.
DTW-based features (for SVM):
  DTW distances from a gesture sample to the template samples.
TABLE II: Frame features from Leap Motion and DTW-based gesture-sample features.

III-C Feature Selection

Based on our authentication algorithms, we extract two levels of features for DTW and SVM, respectively: frame features and DTW-based features.

Frame Features. A raw gesture sample from Leap Motion contains a sequence of frames, with each frame containing 20 features for a hand and 11 features for each finger. Frame features consist of features taken directly from the raw data and derived ones. The hand features include the following: grab strength and pinch strength, which describe the posture of the hand; pitch, yaw, and roll, which describe the angles of the hand around the x-, y-, and z-axes; palm width; coordinates of the palm, arm, and wrist, respectively; hand type, which indicates whether it is a left or right hand; and 4 flags of gesture types, i.e., whether it is a circle, a swipe, a key tap, or a screen tap. The finger features include the coordinates, the 3-dimensional velocity, and the moving directions of each finger tip, as well as finger length and finger width. Combining the features of the hand and its five fingers, we obtain 75 features on each frame from the raw data. In addition to these 75 features, on each frame we generate five new features for each finger: the distance between finger-tip positions in consecutive frames, two angles of finger-tip positions between consecutive frames in two coordinate planes, one angle in 3D space, and one curvature in 3D space [8]. This way, we have 25 new features over five fingers, and in total we obtain 100 features on each frame.

DTW-Based Features. In the enrollment stage, we collect template gesture samples from each of the enrolled anchor users. Each template sample is indexed by its user and by its position among that user’s templates. Given a gesture sample, we extract a sample feature vector by computing and concatenating its DTW distances against all the template samples in a fixed order, so the vector has one dimension per template sample. This sample feature is then used as the input to train and test the SVM classifiers.
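The construction of the DTW-based feature can be sketched as follows. The sketch assumes the templates are stored per user in a dictionary and takes the distance function as a parameter; `toy_distance` is only a stand-in so that the example runs on its own, and all names and data here are hypothetical.

```python
def dtw_feature_vector(sample, templates, distance):
    """Concatenate distances from `sample` to every template in a fixed order.

    templates: dict mapping a user id to that user's list of template samples.
    distance:  callable(sample_a, sample_b) -> float (DTW in the paper).
    """
    feature = []
    for user in sorted(templates):         # fixed order over users
        for template in templates[user]:   # fixed order within a user
            feature.append(distance(sample, template))
    return feature


def toy_distance(a, b):
    # Stand-in only: sum of absolute differences of equal-length sequences.
    return sum(abs(x - y) for x, y in zip(a, b))


templates = {
    "alice": [[0, 1, 2], [0, 1, 3]],
    "bob": [[2, 1, 0], [3, 1, 0]],
}
print(dtw_feature_vector([0, 1, 2], templates, toy_distance))

# Enrolling another user appends that user's templates to the feature.
templates["carol"] = [[1, 1, 1], [1, 2, 1]]
print(len(dtw_feature_vector([0, 1, 2], templates, toy_distance)))
```

Because the iteration order is fixed, every sample is mapped to a vector whose positions always refer to the same template, which is what lets a classifier learn from them.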

Feature Reduction. Since the features of each frame do not contribute equally towards verification, we select a subset of them to compute the DTW distance with the goal of maximizing the verification performance. To evaluate each feature, we use each of the 100 frame features on its own to compute the DTW-based feature as described above, train the SVM classifier, and evaluate the average EER over all the users (to be discussed later). We discard the frame features that produce an EER less than 50%. Eventually, we kept only a subset of the frame features. To further boost the verification performance, we calculate a weight for each retained frame feature and use these feature weights when computing the DTW distance between gesture samples.

IV Data Acquisition

To quantify the gestures’ security and usability, we recruited 42 volunteers (32 males and 10 females) at two universities. Of the 42 participants, 40 are younger adults, the majority of whom are college students, and 2 are older. They were asked to complete gesture collection over 6 weeks and to finish a survey at the end. Among the 42 participants, 32 performed all types of gestures. To mimic real scenarios where a user may only need to remember a few User-defined (UD) gestures, the other 10 participants only contributed User-defined gestures. When participants performed gestures, we encouraged them to do so in the way most comfortable to them. Each participant was compensated with a gift card after completing the whole experiment.

Data batch   No. of participants (except UD only)   Samples   Ave. days after 1st collection
1st          32                                     3400      0
2nd          32                                     2724      2
3rd          32                                     2567      5
4th          32                                     2519      8
5th          31                                     2352      10
6th          29                                     1904      12
7th          28                                     1732      15
8th          28                                     1693      17
9th          27                                     1490      24
10th         24                                     1297      27
11th         14                                     954       32
12th         13                                     885       37
13th         13                                     874       43
TABLE III: Basic information on data collection.

Table III summarizes the information for each round of data collection. Each batch corresponds to one session in which we collected the participants’ gesture data. In the first two weeks, the participants came to our lab three times per week, and in the third and fourth weeks, twice per week. In the last two weeks, the participants came to our lab three times in total. The time elapsed between two consecutive data collections is more than 24 hours. In total, we collected 13 batches of data over about 6 weeks.

Pre-defined Gestures.

Most of the pre-defined gestures are commonly used on touch screens or touch pads, and users usually know how to perform them. Nevertheless, users have their own preferences in performing gestures. For instance, one user may wave from left to right, and another may wave from top to bottom.

User-defined Gestures. 1) Gesture User-defined. Participants were encouraged to select one gesture that is secure for authentication and convenient to use. Among the 42 participants, 25 chose letter(s) and number(s). 17 participants chose to draw simple shapes, such as mathematical symbols, stars, and the combination of the above shapes or some other strange shapes.

2) Gesture Sig. For convenience, most participants did not sign their whole name, but just their initials, first names, or family names. Among these, initials are the most popular choice. The collected signatures range in length (i.e., the number of English letters) from a minimum of 1 to a maximum of 8.

Survey. After finishing the gesture collection over six weeks, the participants were asked to answer a survey. The question answers use a 5-point Likert scale, i.e., 5 choices ranging from “Strongly agree” to “Strongly disagree”. The survey consists of three parts: (1) Part one asks for participants’ background information, e.g., gender and age and their preferences on authentication mechanisms. (2) The second part of the survey consists of a set of questions (in Table IV) for each gesture. We modified standard System Usability Scale (SUS) [15] questions to measure the usability of each gesture, and added more questions to measure the memorability. (3) In the end, we asked participants to rate the security level of each gesture if they are going to use these gestures as passwords.

1 I would like to use this gesture frequently.
2 I found it unnecessarily complex.
3 I thought it was easy to use.
4 I would need training to be able to use it.
5 I found it would be performed smoothly.
6 I think I cannot perform the same gesture every time.
7 I would imagine that most people would learn to use it very quickly.
8 I found it very cumbersome to use.
9 I felt very confident using it.
10 I needed to learn a lot of things before I could get going with this
11 I can easily remember how to perform this gesture.
12 It is hard for me to recall this gesture after one week.
TABLE IV: Questions for each gesture in the survey.

V Evaluation

We objectively evaluated the security and usability of all the gestures by analyzing the collected samples, e.g., computing the number of corners, the number of frames, and the EER of each gesture. We also subjectively evaluated the security and usability of all the gestures by surveying all the users. Finally, we explore the relationship between usability and security from both objective and subjective perspectives.

V-A Security Evaluation

We discuss distinctness in this section and consistency in Section VI and shoulder surfing attack in Section VII. In this section, we first summarize the subjective security results reported by participants and then quantify security objectively by EER.

V-A1 Results from Survey Responses

The third part of the survey evaluates each gesture’s security level if it is used as a password, and the question uses a 5-point Likert scale from “Least secure” to “Most secure”. In Fig. 2, the pink bars show the security score derived from the survey responses. We first count the percentage of participants who chose “Second secure” and “Most secure”. We then divide the percentage values by 20 to fit the scale of the EER, which is shown in sky blue bars.

From the survey results in Fig. 2, we have the following observations:

  • Pre-defined simple gestures Wave, Swipe, Grab, Zoom, and Circle are considered insecure.

  • Gesture ‘abc’ is considered to have a medium security level.

  • Gestures Sig and User-defined are considered the most secure authentication gestures.

Fig. 2: The security performance evaluated by survey responses from all participants and the EER from the authentication system.

V-A2 Results from Equal Error Rate

To verify the overall performance of the system, we tested all the data with five-fold experiments. For each gesture type and each user, we randomly divided the data into a training set and a test set, and the overall performance is the average result over the multiple rounds of experiments. In this experiment, we use the gestures collected in the first round, with a fixed number of training samples per user for each gesture; together with the number of users, this determines the dimension of the DTW-based feature. The time needed to compute the DTW-based feature of a single test gesture for the SVM classifiers is measured in milliseconds on a laptop with an Intel i7 2.8 GHz CPU and 8 GB of memory.

The sky blue bars in Fig. 2 show the average EER for each gesture. A smaller EER maps to a higher verification performance. The results show that:

  • The user-defined gesture Sig has the lowest error rate. Even though participants may choose the same signature, the accuracy is still high because participants have different writing styles.

  • Although we observed that most participants chose simple movements, the User-defined (UD) gestures also have a low error rate.

  • The gesture ‘abc’ has a comparably low error rate. Although all the participants wrote the same content, we can identify the owner of each sample. Compared with other pre-defined gestures, ‘abc’ is more complex and contains distinct biometric information of participants.

  • Other pre-defined gestures have higher error rates.

  • The subjective security evaluation matches with the objective EER. The smaller the EER is, the more secure the authentication is.

Fig. 3: The results of participants’ evaluation on both security and usability on each gesture. ‘100’- best usability/security. ‘0’- worst usability/security.

V-B Evaluation on Usability

In this section, we first evaluate the usability of the authentication system from the survey responses. Then we evaluate the usability of each gesture from both the subjective aspect reported by the participants and the objective aspect quantified by the two metrics.

The background information collected from the survey shows that the majority (90.5%) of participants, including the older ones, would like to use mid-air gestures for authentication if available. The participants’ responses further show that the reasons for this acceptance include convenience, ease of remembering, and security. Only a few of them have concerns about security, worrying that mid-air gestures are not as accurate as typing a text password.

V-B1 Results from Survey Responses

To evaluate the usability of each designed gesture, we calculate SUS scores using the responses to the first 10 questions for each gesture. Note that the SUS scores serve as references for comparing participants’ opinions among different gestures, i.e., a gesture with a higher score has higher usability than a gesture with a lower score. The sky blue bars in Fig. 3 show the average score of participants for each gesture. We find that the SUS scores for the simple pre-defined gestures Swipe, Wave, Circle, Zoom and Grab are close to or greater than 68 (above average), indicating that participants are more willing than unwilling to use these gestures [16]. To analyze whether different gestures have a significant impact on perceived usability, a Repeated Measures Analysis of Variance (ANOVA) test was conducted using the gesture type as the independent variable and usability as the dependent variable. According to the result, there was no significant effect of the gesture type on the perceived usability for the seven different gestures, which suggests that the participants consider all gestures to have similar usability.
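For reference, the standard SUS scoring rule can be sketched as below. The sketch assumes the survey follows standard SUS scoring and that each response is coded from 1 (“Strongly disagree”) to 5 (“Strongly agree”); the coding and the function name are assumptions for this example, since the modified questionnaire may have been scored differently.

```python
def sus_score(responses):
    """Standard SUS score (0 to 100) from ten responses coded 1..5.

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten responses")
    total = 0
    for index, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must be coded 1..5")
        total += (response - 1) if index % 2 == 1 else (5 - response)
    return total * 2.5


# A participant who mostly agrees with positive items and
# mostly disagrees with negative ones.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

Averaging these per-participant scores for a gesture gives a value that can be compared against the commonly cited benchmark of 68.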

Fig. 4: Illustration of corner detection results for (a) Circle, (b) Zoom, (c) Grab, (d) ‘abc’, (e) UD, and (f) Sig.

Fig. 5: The usability evaluated based on SUS scores from all participants, the number of corners and the number of frames from G-Sample+.

V-B2 Results from Gesture Samples

To examine the usability of gestures from an objective perspective, we inspect the gesture samples, and calculate the number of corners and the number of frames.

Number of Corners. We consider gestures with a larger number of sharp changes to be more complex. Thus, we use the number of corners of gesture samples to evaluate their complexity. We define a corner of a gesture as a turning point with a large curvature value. We use a corner detection algorithm based on the curvature scale space (CSS) detector [17]. We evaluate the gesture trajectory of the index finger, because the trajectories of the other fingers are similar. Fig. 4 illustrates example results of the corner detection. The slate bars in Fig. 5 illustrate the average number of corners detected for each gesture type. The gestures Swipe, Wave, Circle, Zoom and Grab have few corners and are simple, while the gestures ‘abc’, User-defined and Sig have more corners and are complex.
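The idea of counting sharp turning points can be illustrated with a much simpler stand-in than the CSS detector: the sketch below marks a point as a corner when the direction of travel changes by at least a threshold angle. This is not the CSS algorithm used in the study; the function name, the 2D projection, and the 60-degree threshold are assumptions chosen only for illustration.

```python
import math


def count_corners(points, angle_threshold_deg=60.0):
    """Count interior points where a 2D trajectory turns sharply.

    Simplified turning-angle heuristic, not the CSS corner detector.
    points: list of (x, y) positions, e.g. of the index finger-tip.
    """
    corners = 0
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        v1 = (cur[0] - prev[0], cur[1] - prev[1])
        v2 = (nxt[0] - cur[0], nxt[1] - cur[1])
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue  # skip repeated points with no direction
        cos_turn = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
        # Clamp to [-1, 1] to protect acos from rounding errors.
        turn = math.degrees(math.acos(max(-1.0, min(1.0, cos_turn))))
        if turn >= angle_threshold_deg:
            corners += 1
    return corners


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
print(count_corners(straight), count_corners(zigzag))
```

On real sensor data, a heuristic like this would need smoothing first, which is one reason a scale-space method such as CSS is preferable in practice.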

Number of Frames. We consider the duration of performing a gesture as a factor in determining whether the gesture is easy to perform. A shorter duration indicates that a gesture is easier to perform. We define the enrolling time as the number of gesture frames, because the device’s sample rate is stable. The average numbers of frames show similar trends to the number of corners, as shown in Fig. 5 with pink bars. The simple gestures (Swipe, Wave, Circle, Zoom and Grab) have fewer frames than the rest of the gestures.

From Fig. 5, we can see that the number of corners and the number of frames exhibit a consistent trend. Moreover, they are in an inverse relationship with the SUS scores. The only exception is User-defined, which has a similar number of corners and frames to Sig, yet has higher usability. We believe this is caused by user preferences: users have full control in choosing a relatively complicated gesture that they nevertheless find easy to perform, unlike all other gestures, which are imposed on them.

V-C Security vs. Usability

In this section, we explore the relationship between security and usability based on the survey responses and quantitative metrics.

Fig. 6: The correlation between EER and the number of corners [left] and the number of frames [right].

V-C1 Results from Survey Responses

To compare the subjective evaluations of usability and security, we use the SUS scores from the survey questionnaires as the usability metric. As the security score, we use the percentage of participants who rated a gesture as “Most secure” or “Second secure” out of five options (i.e., from least secure to most secure). After scaling all results to the range of 0 to 100, we show the average scores of each gesture over all participants in Fig. 3. Higher bars indicate better performance for both usability and security. We observed that participants consider the gestures that are easier to perform (e.g., Swipe, Wave, Grab, Circle, and Zoom) to be less secure, while the gestures Sig, User-defined and ‘abc’ are considered more secure but sacrifice some usability.
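The two subjective scores can be sketched as follows. The SUS computation uses the standard scoring rule for the ten-item questionnaire [15]; whether the study applied exactly this rule is an assumption, as are the function names and the encoding of the five security options as integers 1 (least secure) to 5 (most secure).

```python
def sus_score(responses):
    """Standard SUS score (0 to 100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 item responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd (positively worded) items
        else:
            total += 5 - response   # even (negatively worded) items
    return total * 2.5

def security_score(ratings, top_levels=(4, 5)):
    """Percentage of participants who chose one of the two highest options."""
    chosen_top = sum(1 for rating in ratings if rating in top_levels)
    return 100.0 * chosen_top / len(ratings)

# Hypothetical responses, for illustration only.
best_case = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
neutral = sus_score([3] * 10)
example_security = security_score([5, 4, 3, 2, 1])
```

Both scores already lie in the range of 0 to 100, so they can be plotted on a common axis without further scaling.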

V-C2 Results from Quantitative Metrics

We used the average number of corners and the average number of frames as evaluation metrics for usability, and EER as the metric for security. To analyze the trade-off between security and usability, we chose linear least-squares fitting to model the relationship between them. As a result, we obtained fitting lines with coefficient values (number of corners vs. EER) and (number of frames vs. EER), shown in Fig. 6. All gestures roughly follow the inverse relationship between usability and security. That is, as the number of corners and the gesture length increase, the security performance improves.
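For reference, a linear least-squares fit of one metric against another can be computed in closed form. The sketch below returns the slope, the intercept, and the Pearson correlation coefficient; which of these the reported coefficient values refer to is not assumed here, and the data points are made up purely to exercise the code.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical (number of corners, EER in %) pairs, NOT values from the study.
corners = [1, 2, 3, 4]
eer_percent = [10, 8, 6, 4]
slope, intercept, r = linear_fit(corners, eer_percent)
```

A negative slope (and a negative r) corresponds to the trend described above: more corners or frames go together with a lower EER.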

In addition, the two plots show that the gesture set can be clustered into two subsets. One subset consists of the gestures ‘abc’, User-defined, and Sig (i.e., the 3 dots in the bottom-right area of both subplots of Fig. 6), which have larger numbers of corners and frames but lower EER values. The other subset consists of the gestures Swipe, Wave, Circle, Zoom, and Grab, which have fewer corners and frames but higher EER values. The existence of the two subsets indicates that the more secure gestures generally do not have better usability in terms of the number of corners and gesture length.

VI Consistency Study

To evaluate the consistency of each gesture, we collected samples over 6 weeks. Participants came and contributed data three times per week for the first two weeks, and twice per week for the next two weeks, and once per week for the remaining weeks. Most participants finished the entire experiment. Only two users quit before the end of 6 weeks due to personal reasons. Table III summarizes the information of each round of data collection.

VI-A Consistency from Survey Responses

Before each batch of gesture collection except the first, we asked each participant how likely they were to remember each gesture. Note that all participants could remember how they performed each gesture without any hint at each batch of gesture collection. For each gesture, the memorability responses from the survey also indicate that more than 80% of participants agree that they can recall it. There was no significant difference among the gestures.

Fig. 7: [Top] The average number of frames and [Bottom] the average number of corners of each batch of data.

VI-B Consistency on Gesture Samples

We examine the consistency of all types of gestures using the two metrics for quantifying usability: the number of frames and the number of corners. Fig. 7 shows the average number of frames and corners by gesture type over 10 batches of collection. Over time, we observed that participants became increasingly proficient with each gesture and thus tended to perform gestures faster and more smoothly.

VI-C Consistency on EER

In this section, we use EER to quantify consistency. As the baseline, we take the EER obtained when the training samples and testing samples belong to the same batch of data. We quantify the change between two batches of gesture samples by the increase in EER, i.e., the difference between the baseline EER and the EER obtained when training on samples from one batch and testing on samples from the other. A smaller difference indicates better consistency. In particular, we ask two questions: (a) Will gestures performed over time change? (b) Will the change of gestures converge over time? To answer these questions, we designed two experiments.
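The consistency measure can be illustrated with a small sketch that computes an EER from two lists of match scores and then takes the difference against a baseline. It assumes that each test sample receives a scalar score where a higher value means "more likely the legitimate user"; the function names and the threshold sweep are illustrative choices, not the exact procedure used in the study.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER: error rate at the threshold where FAR and FRR are closest."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    thresholds.append(thresholds[-1] + 1.0)  # a threshold that rejects everyone
    best_gap, best_eer = None, None
    for t in thresholds:
        # False accept rate: impostors scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False reject rate: genuine samples scoring below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

def eer_increase(baseline_eer, cross_batch_eer):
    """Increase of EER when training and testing batches differ."""
    return cross_batch_eer - baseline_eer

# Hypothetical scores, for illustration only.
separated = equal_error_rate([0.8, 0.9], [0.1, 0.2])
overlapping = equal_error_rate([0.9, 0.8, 0.4, 0.7], [0.1, 0.2, 0.5, 0.3])
```

With a finite number of samples, FAR and FRR rarely coincide exactly, so the sketch reports their average at the closest crossing point.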

Consistency over Time. To understand whether the change of gestures is proportional to the gap between performances, we select batches of samples that were collected on two consecutive days and on every other day. As we could not force participants to perform gestures on a tightly controlled schedule, we only managed to find 25 participants who had contributed data on two consecutive days and 23 participants who had contributed data every other day. From the results shown in Fig. 8, we observe that 1) for almost all gestures, the EER increases as the days go by; 2) the gestures User-defined, Sig and ‘abc’ have relatively smaller increases in EER than the simple gestures.

Fig. 8: The results of security performance tested by 25 participants one day later and 23 participants two days later.

Convergence of Gesture Changes. To understand whether the changes across multiple batches are reduced over time, we trained SVM classifiers with the samples from the 1st batch and tested them with the samples from the 1st (excluding the training samples), 2nd, 3rd, 4th, and 5th batches, etc.

The results are shown in Figure 9. We have the following observations.

  • The EER results are low when tested on the same day, increase quickly at the first gap (two days on average), and then show convergence around the 5th batch (10 days on average).

  • Sig and User-defined exhibit the best security performance ( for all batches). The reason could be that they are complex gestures, and the relative changes in terms of the number of corners and frames are smaller than the other gestures.

  • ‘abc’ presents relatively better performance than the rest of the simple pre-defined gestures. It is a complex gesture like Sig, but every participant performed the same gesture, leaving little room to tolerate changes.

  • Circle has medium performance. Circle does show fewer changes in the number of corners than Swipe or Zoom.

  • Swipe, Grab, Zoom, and Wave have the worst performance.

Fig. 9: The EER results averaged over each type of gesture. The experiments are based on the training templates from the first batch of data.

The Number of Gestures to be Remembered. Our experiments involved two groups of users: 32 participants who were required to remember and perform all gestures and 10 participants who only needed to perform User-defined gestures. The latter group mimics reality, where users choose a few gestures as passwords. To understand the difference between the two groups, we trained SVM classifiers with the User-defined samples from the 1st batch and tested them with the samples from the 1st (excluding the training samples) and the 2nd to 9th batches of data. The results are shown in Fig. 10, from which we observed that, without the burden of remembering other gestures, the 10 participants can remember their gestures better, and their gestures exhibit much better distinctness and consistency over time.

Fig. 10: The average EER results of User-defined gesture using only the first batch of data for training. ‘UD only’ shows the average results of 10 participants. ‘UD+others’ shows the average results of the other 32 participants.

VII Shoulder Surfing Attacks

It is unclear whether mid-air gestures are resilient to shoulder surfing attacks. To gain insight into shoulder surfing, we recruited 13 subjects as victims and another 4 subjects as attackers, who mimic each type of gesture performed by the victims. Each victim enrolled 12 samples for each type of gesture studied in this paper, and thus victim samples are collected. To mimic shoulder surfing attacks, we recorded short videos (e.g., one or two gesture instances) while the victims were performing the gestures.

Fig. 11: Precision-recall curves under normal (blue) and attack scenarios for (a) Swipe, (b) Wave, (c) Circle, (d) Zoom, (e) Grab, (f) ‘abc’, (g) Sig, and (h) User-defined. In general, attacks with multiple observations perform better than attacks with a single observation.

With the videos, we consider two attack scenarios. In the first scenario, the attackers are allowed to watch the videos only once, representing the case in which they happen to see the victim entering gestures once. The attackers then mimicked what they saw, trying 5 times for each shown gesture. In total, attack samples are collected. In the second scenario, the attackers can watch the videos as many times as they want before or during the attack. Each attacker attacks 10 to 15 times while learning from a recorded video. In total, we have attack samples.

To evaluate both attack scenarios, we tested the attack samples with the classifiers trained with victim samples (using 4 samples for each class). For comparison, we also tested the classifiers with victim samples that were not used for training. We use precision-recall curves to illustrate the results. By varying the threshold for rejecting possible impostors, we obtain precision-recall curves that indicate the trade-off between security and usability. A higher precision indicates a stricter threshold (i.e., better security), at the cost of making legitimate users try more times. A higher recall indicates a less strict threshold, which may let some attackers pass authentication but lets legitimate users pass authentication with fewer attempts.

The upper-right corner of a curve is the ideal point (i.e., 100% precision and 100% recall: all legitimate users are authenticated with one attempt, and all the attackers are rejected). Fig. 11 shows the precision-recall curves of each gesture type with different types of test sets: normal samples from victims (depicted in blue), attack samples with multiple observations (black), and attack samples with a single observation (red). For all types of gestures, attack samples with one or multiple observations both have low precision, although multiple observations did slightly improve the chances of attacks. Nevertheless, the precision and recall are still relatively low. Thus, mid-air gestures are difficult to mimic, and shoulder surfing is not a main threat to our authentication system.
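To make the threshold sweep concrete, the following sketch computes precision-recall points from two score lists. It assumes that every test sample has a scalar acceptance score (higher means more likely to be accepted), that victim samples are the positives, and that accepted attack samples count as false positives; the function name and the scores are hypothetical.

```python
def precision_recall_points(victim_scores, attack_scores):
    """Return (threshold, precision, recall) for each candidate threshold."""
    points = []
    for t in sorted(set(victim_scores) | set(attack_scores)):
        true_pos = sum(s >= t for s in victim_scores)    # victims accepted
        false_pos = sum(s >= t for s in attack_scores)   # attackers accepted
        false_neg = len(victim_scores) - true_pos        # victims rejected
        if true_pos + false_pos == 0:
            continue  # nobody accepted: precision is undefined
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / (true_pos + false_neg)
        points.append((t, precision, recall))
    return points

# Hypothetical scores, for illustration only.
points = precision_recall_points([0.9, 0.8, 0.6], [0.7, 0.3, 0.2])
```

Raising the threshold moves along the curve toward higher precision and lower recall, which is the security versus usability trade-off described above.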

VIII Related Work

Gestures, as a new way of human-computer interaction, have shown great promise, and an extensive literature on gesture recognition exists, including multi-touch pinch gestures [18], 3D gesture recognition using accelerometers and gyroscopes [19], multi-layer gesture recognition with Kinect [20], and air gesture identification [21]. These gestures have been applied to a wide variety of fields, ranging from controlling robots [22], computer commands [23], authentication [24, 25, 26], and game control [27] to VR commands [28].

An increasing number of studies focus on user authentication based on behavioral biometrics. Such gestures are embedded in the usage patterns of traditional I/O devices, such as keystroke dynamics and mouse movement patterns [29, 30]. With the emergence of new technology (e.g., sensors or touch screens), new gestures were discovered. Lower leg gaits [31] and hand gesture patterns [31] captured by accelerometers have been shown to achieve high accuracy in user authentication. The operations on the touch screen of a smartphone or tablet (e.g., writing a word or using an unlock pattern) can be used to authenticate users either once during login [25] or continuously throughout the operation [32]. The security and memorability of multi-touch gestures for mobile authentication have been studied [33]. Unfortunately, touch gestures can be vulnerable to shoulder surfing or smudge attacks [1].

Mid-air gestures have become a hot topic recently. 3D hand gestures have been studied for touch-less interactions, such as augmented reality applications and game-based virtual environments [34, 35, 36, 37, 38]. For authentication purposes, Nigam et al. combined signature gestures captured by Leap Motion with facial information captured by an RGB camera to authenticate a user [3]. Aslan et al. explored two mid-air gestures for authentication in different situations [4]. The AirAuth system evaluated the security performance of a set of simple hand gestures captured by a Creative Senz3D depth sensor [5]. Using Microsoft Kinect, Hayashi et al. fused data from a hand-waving gesture and the user’s body lengths for authentication [12], and Tian et al. used the gesture of writing signatures in the air [13] for user identification. This paper also investigates mid-air hand gestures, but focuses on quantifying the usability and security of various gestures and exploring their relationship.

Usability evaluation of authentication schemes for other purposes, e.g., password usage in daily life, Touch ID on iPhone, and biometric authentication on smartphones, is a well-researched area [39, 40, 41]. Usability is crucial for an authentication system to be adopted by users [42]. Although there are many papers on mid-air gesture-based authentication, they mostly focus on improving accuracy. Only a few works have explored one or two aspects of usability surrounding gesture-based authentication. BroAuth [43] presents an authentication mechanism based on body gestures; its authors evaluated the usability and security of three types of visual feedback and found that an abstract representation is the best trade-off between security and usability. Aslan et al. [4] studied 13 participants’ perceptions of two authentication gestures from the perspective of their emotions. AirAuth [5] compared participants’ pleasantness and excitement levels across a set of predefined gestures and found a positive correlation between the authentication accuracy and participants’ excitement and pleasantness. This paper is along the same line, but aims at discovering general metrics that can quantify both usability and security, and at understanding the relationship between usability and security to guide gesture evaluation.

IX Conclusion

This paper studied the usability and security of a collection of mid-air gestures as biometrics for authentication. Through a survey and a user study that engaged 42 participants to collect gesture samples 13 times over a 6-week period, we validated that the quantitative metrics (i.e., the number of corners, the sample frame length, and the equal error rate (EER)) agree with the subjective scores from the user survey. Further, we find a correlation between the security and usability metrics, which shows that an easy-to-use gesture generally has worse security. Thus, we can utilize the number of corners and the sample frame length to quickly quantify the security of a gesture. Finally, our consistency study shows that participants tend to forget gestures between experiment rounds.

We note that our experiment results may be different from reality because samples were collected in a lab environment, where no serious consequence will be incurred if a user cannot pass the authentication in our study. We envision that after gesture-based authentication is widely used, the inconsistency of gestures over time will become smaller because we have observed in our study that repeatedly performing gestures will help to provide consistent gestures.


  • [1] A. J. Aviv, K. Gibson, E. Mossop, M. Blaze, and J. M. Smith, “Smudge attacks on smartphone touch screens,” in Proc. of WOOT ’10, 2010.
  • [2] U.S. Food and Drug Administration, “Part 211 current good manufacturing practice for finished pharmaceuticals and part 210 current good manufacturing practice in manufacturing, processing, packing, or holding of drugs; general,” Code of Federal Regulations, Title 21, 2015.
  • [3] I. Nigam, M. Vatsa, and R. Singh, “Leap signature recognition using hoof and hot features,” in Proc. of IEEE ICIP ’14, Oct 2014.
  • [4] I. Aslan, A. Uhl, A. Meschtscherjakov, and M. Tscheligi, “Mid-air authentication gestures: An exploration of authentication based on palm and finger motions,” in Proc. of ICMI ’14.   ACM, 2014, pp. 311–318.
  • [5] M. T. I. Aumi and S. Kratz, “Airauth: Towards attack-resilient biometric authentication using in-air gestures,” in Extended Abstract CHI ’14, 2014.
  • [6] ISO 9241-14, “Ergonomic requirements for office work with visual display terminals (VDTs) - part 14 menu dialogues,” 1998.
  • [7] J. Nielsen, Usability Engineering.   Academic Press Inc., 1993.
  • [8] “Leap Motion SDK and Plugin Documentation,” https://developer.leapmotion.com/documentation/index.html.
  • [9] D. Freeman, R. Vennelakanti, and S. Madhvanath, “Freehand pose-based gestural interaction: Studies and implications for interface design,” in Proc. of IHCI ’12, Dec 2012, pp. 1–6.
  • [10] N. Sae-Bae, K. Ahmed, K. Isbister, and N. Memon, “Biometric-rich gestures: A novel approach to authentication on multi-touch devices,” in Proc. of SIGCHI’12, 2012, pp. 977–986.
  • [11] H. Xu, Y. Zhou, and M. R. Lyu, “Towards continuous and passive authentication via touch biometrics: An experimental study on smartphones,” in Proc. of SOUPS ’14.   USENIX Association, Jul. 2014, pp. 187–198.
  • [12] E. Hayashi, M. Maas, and J. I. Hong, “Wave to me: User identification using body lengths and natural gestures,” in Proc. of SIGCHI’14, 2014, pp. 3453–3462.
  • [13] J. Tian, C. Qu, W. Xu, and S. Wang, “Kinwrite: Handwriting-based authentication using kinect,” in NDSS’13, 2013.
  • [14] T. Joachims, “Making large-scale svm learning practical. advances in kernel methods-support vector learning,” 1999.
  • [15] J. Brooke, “Sus: A quick and dirty usability scale,” 1996.
  • [16] J. Sauro, “Measuring usability with the system usability scale (sus),” Jan 2013.
  • [17] X. C. He and N. H. C. Yung, “Curvature scale space corner detector with adaptive threshold and dynamic region of support,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 2, Aug 2004, pp. 791–794.
  • [18] E. E. Hoggan, M. A. Nacenta, P. O. Kristensson, J. Williamson, A. Oulasvirta, and A. Lehtiö, “Multi-touch pinch gestures: performance and ergonomics,” in ITS, 2013.
  • [19] G. Lefebvre, S. Berlemont, F. Mamalet, and C. Garcia, “Inertial gesture recognition with blstm-rnn,” vol. 4, pp. 393–410, 2015.
  • [20] F. Jiang, S. Zhang, S. Wu, Y. Gao, and D. Zhao, “Multi-layered gesture recognition with kinect,” Journal of Machine Learning Research, pp. 227–254, 2015.
  • [21] M. Ghobadi and E. T. Esfahani, “Adaptive segmentation for air gesture identification,” 2014.
  • [22] H. K. Kaura, V. Honrao, S. Patil, and P. Shetty, “Gesture controlled robot using image processing,” International Journal of Advanced Research in Artificial Intelligence, vol. 2, 2013.
  • [23] K.-Y. Chen, D. Ashbrook, M. Goel, S.-H. Lee, and S. Patel, “Airlink: sharing files between multiple devices using in-air gestures,” in Proc. of UbiComp’14.   ACM, 2014, pp. 565–569.
  • [24] I. Ahmed, Y. Ye, S. Bhattacharya, N. Asokan, G. Jacucci, P. Nurmi, and S. Tarkoma, “Checksum gestures: continuous gestures as an out-of-band channel for secure pairing,” in Proc. of UbiComp’15.   ACM, 2015, pp. 391–401.
  • [25] S. Uellenbeck, M. Durmuth, C. Wolf, and T. Holz, “Quantifying the security of graphical passwords: The case of android unlock patterns,” in Proc. of ACM CCS, 2013, pp. 161–172.
  • [26] B. Shrestha, M. Mohamed, A. Borg, N. Saxena, and S. Tamrakar, “Curbing mobile malware based on user-transparent hand movements,” pp. 221–229, Mar 2015.
  • [27] “Xbox 360 healthy gaming guide,” http://support.xbox.com/en-US/xbox-360/system/healthy-gaming-guide.
  • [28] A. Kulshreshth and J. J. LaViola, Jr., “Exploring the usefulness of finger-based 3d gesture menu selection,” in Proc. of SIGCHI’14.   ACM, 2014, pp. 1093–1102.
  • [29] F. Monrose, M. K. Reiter, and S. Wetzel, “Password hardening based on keystroke dynamics,” in Proc. of ACM CCS’99, 1999, pp. 73–82.
  • [30] Z. Jorgensen and T. Yu, “On mouse dynamics as a behavioral biometric for authentication,” in Proc. of the ASIA CCS ’11, 2011.
  • [31] D. Gafurov, K. Helkala, and T. Søndrol, “Biometric gait authentication using accelerometer sensor,” Journal of Computers, vol. 1, no. 7, 2006.
  • [32] L. Li, X. Zhao, and G. Xue, “Unobservable re-authentication for smartphones,” in NDSS’13, 2013.
  • [33] M. Sherman, G. Clark, Y. Yang, S. Sugrim, A. Modig, J. Lindqvist, A. Oulasvirta, and T. Roos, “User-generated free-form gestures for authentication: Security and memorability,” in Proc. of MobiSys ’14.   ACM, 2014, pp. 176–189.
  • [34] E. M. Taranta II, T. K. Simons, R. Sukthankar, and J. J. Laviola Jr., “Exploring the benefits of context in 3d gesture recognition for game-based virtual environments,” ACM Trans. Interact. Intell. Syst., vol. 5, no. 1, pp. 1:1–1:34, Mar. 2015.
  • [35] K. Kim, J. Kim, J. Choi, J. Kim, and S. Lee, “Depth camera-based 3d hand gesture controls with immersive tactile feedback for natural mid-air gesture interactions,” Sensors, vol. 15, no. 1, pp. 1022–1046, Jan. 2015.
  • [36] S. S. Rautaray and A. Agrawal, “Vision based hand gesture recognition for human computer interaction: a survey,” Artificial Intelligence Review, vol. 13, pp. 1–54, Jan. 2015.
  • [37] Z. Lv, S. Feng, L. Feng, and H. Li, “Extending touch-less interaction on vision based wearable device,” in Proc. of IEEE VR ’15.   IEEE, Mar. 2015, pp. 231–232.
  • [38] T. V. Thanh, D. Kim, and Y.-S. Jeong, “Real-time virtual lego brick manipulation based on hand gesture recognition,” Advanced Multimedia and Ubiquitous Engineering, vol. 352, pp. 231–238, 2015.
  • [39] C. Bhagavatula, B. Ur, K. Iacovino, S. M. Kywe, L. F. Cranor, and M. Savvides, “Biometric authentication on iphone and android: Usability, perceptions, and influences on adoption,” in Proc. of USEC, 2015.
  • [40] I. Cherapau, I. Muslukhov, N. Asanka, and K. Beznosov, “On the impact of touch id on iphone passcodes,” in Proc. of SOUPS, 2015.
  • [41] E. Hayashi and J. I. Hong, “A diary study of password usage in daily life,” in Proc. of ACM CHI, 2011.
  • [42] H. Sasamoto, N. Christin, and E. Hayashi, “Undercover: Authentication usable in front of prying eyes,” in Proc. of ACM CHI, 2008.
  • [43] M.-E. Maurer, R. Waxenberger, and D. Hausen, “Broauth: Evaluating different levels of visual feedback for 3d gesture-based authentication,” in Proc. of ACM AVI’12, 2012, pp. 737–740.