Human Gait Database for Normal Walk Collected by Smart Phone Accelerometer

05/04/2019 ∙ by Amir Vajdi, et al. ∙ University of Massachusetts-Boston

This study introduces a comprehensive gait database of 93 human subjects who walked between two end points during two different sessions. Their gait data were recorded using two smartphones, one attached to the right thigh and the other to the left side of the waist. The data were collected with the intention of being used by deep learning-based methods, which require a sufficient number of time points. Metadata including age, gender, smoking status, daily exercise time, height, and weight of each individual were recorded. The data set is publicly available.







Identification of individuals through biometric measures refers to methods of recognizing an individual through physiological (e.g., face, hand) or behavioral (e.g., gait) characteristics. Gait, as a behavioral biometric modality, is a useful and reliable approach that identifies individuals based on their walking pattern. Human authentication and identification through gait has become an appealing research area because of its applications in forensic science, surveillance, and security. Gait recognition is preferred over other biometric identification and authentication techniques for the following reasons:

  • Unlike techniques using fingerprint or retinal scans, gait can be measured continuously and remotely over time.

  • It is a non-invasive measure, as acquiring gait data does not intrude on individuals’ comfort zone.

  • Since it is a behavioral biometric authentication technique, it is harder to steal and cannot be forgotten.

  • Compared to other biometrics such as fingerprints, gait is a more secure modality because the gait of an individual is difficult to mimic [9].

  • A person’s gait can be recognized even under adverse conditions [8].

On the other hand, identification and authentication of individuals through gait patterns has its own limitations. The gait of an individual may be affected by factors such as drunkenness, clothing variation, and fatigue. These limitations could be addressed by combining the gait modality with other biometric modalities to obtain a more reliable authentication or identification system [6]. According to a survey study by [6], gait recognition approaches can be categorized into three types: machine vision-based, floor sensor-based, and wearable sensor-based. Vision-based gait recognition systems can be affected by weather conditions, lighting, and noise in the data. Floor-based systems need expensive floor sensors, and they limit continuous monitoring of individuals. Wearable sensor-based systems, however, allow individuals’ walking patterns to be monitored continuously and remotely over time. Moreover, wearable sensor-based approaches are less expensive and less affected by noise. Smartphones can provide high-quality inertial measurements and are widely used, which makes them invaluable for gait applications. In this paper, we are interested in human gait identification through smartphone inertial sensor data.

Many databases have been developed and used for authentication and identification purposes [4, 3, 7, 11, 6, 1, 2, 5, 8, 9]. However, they are either not publicly available [3, 7, 11, 1, 5] or, if they are, have their own limitations [4, 10, 2].

Designing automated human identification or authentication systems requires a proper dataset. Among the publicly available datasets, the largest one was acquired by researchers at Osaka University (OU) [10]. Although they constructed several datasets using three inertial sensors and a smartphone around the waist of subjects, and included a high number of participants (744), they recorded only two very short data sequences per subject, and the data collection was done in a controlled environment. The short data sequences in this dataset limit the application of deep neural networks. Other publicly available datasets include data from a much smaller number of participants. In the study by [3], gait data was collected from 20 subjects using a mobile phone in their pocket. Subjects performed two 15-minute walk trials on two different days [3]. The most recent gait database was introduced by [4]. In their study, motion data was acquired from 50 subjects in several acquisition sessions, with five minutes of data collection per session. Android smartphones were used for data collection and worn in the right front pocket of the individuals’ trousers. All of the above-mentioned databases come with their own limitations and do not meet our requirements. We therefore collected gait data from 93 individuals using iPhone 6S devices attached to the left waist and right thigh of the participants. Two acquisition sessions of about 320 meters (roughly 0.2 miles) each were performed per subject. Participants were asked to walk at a comfortable speed. The sampling frequency for the raw inertial signals was 100 Hz. Compared with the existing databases, the advantages of this database are as follows:

  • A large number of subjects, which can significantly improve the performance and reliability of gait recognition algorithms.

  • The male-to-female ratio is close to 1. This property helps prevent a biased gait recognition system.

  • Our 6D gait signal includes 3D acceleration and 3D angular velocity data captured at a high frame rate, which is useful not only for gait recognition but also for understanding the walking motion.

  • Individuals’ metadata, including exercise level and smoking status, could be used for further gait analysis of different populations with regard to these aspects. Specifically, our dataset could be very useful for variability analysis, because taking more than 30 steps is recommended to sustain the reliability of gait variability.

  • Variation of sensor locations (left waist and right thigh) could be useful in comparing the performance of gait recognition systems and their dependence on, or independence of, the body location where the sensor is attached.

  • This dataset can be integrated with other future datasets for identifying walking among other activities.

The limitation of our database is that different conditions (e.g., clothing, various ground slope conditions) were not considered during the data collection. We also did not collect data at various walking speeds.

Materials and Methods

Data Collection

This study was conducted by recruiting 93 individuals who walked at a comfortable pace during two different sessions. In each session, two smartphones (iPhone 6S) were attached to record the data, one on the left waist and the other on the right thigh, as shown in Figure 4. During each session, every subject walked the distance between two endpoints forward and backward. The location of the experiment, and therefore its elevation above sea level, was the same for all subjects. For each individual, one smartphone was first attached near the left waist, and then the other was attached to the right thigh. To remove setup noise, the subject waited for 5 seconds after the smartphones were attached and then started to walk from end point A to end point B. On arriving at point B, the subject turned around, waited for 5 seconds, and then walked back toward point A. After arriving at point A, the subject waited for 5 seconds, and then the smartphones were detached and data collection stopped. These 5-second pauses are used to identify the walking directions. The exact same procedure was repeated for the second session. To capture the data we used the SensorLog application (version 1.9.7), which is developed and tested for iOS. Additionally, each subject’s metadata were collected, comprising age, weight, height, average daily amount of exercise, gender, and smoker/non-smoker status.
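The 5-second pauses described above suggest a simple way to separate the two walking directions in a recording. The following is a minimal sketch, not the authors' procedure: it assumes a 100 Hz acceleration-magnitude signal, and the function name `walking_segments`, the 1-second block size, and the stillness threshold are all hypothetical choices made for illustration.

```python
import math
from statistics import pstdev


def walking_segments(magnitude, rate_hz=100, block_s=1.0, still_std=0.05):
    """Return (start, end) sample indices of walking bouts.

    The signal is cut into non-overlapping blocks; a block counts as
    "still" when the standard deviation of its values is below still_std
    (a hypothetical threshold). Any still block ends the current bout; a
    stricter version could require about 5 s of stillness.
    """
    block = int(rate_hz * block_s)
    segments, start = [], None
    for i in range(0, len(magnitude) - block + 1, block):
        moving = pstdev(magnitude[i:i + block]) >= still_std
        if moving and start is None:
            start = i                      # a walking bout begins
        elif not moving and start is not None:
            segments.append((start, i))    # a pause ends the bout
            start = None
    if start is not None:                  # recording ended while walking
        segments.append((start, (len(magnitude) // block) * block))
    return segments


def synthetic_recording(rate_hz=100):
    """Toy signal: pause, walk A to B, pause, walk B to A, pause."""
    signal = []
    for phase, seconds in [("still", 5), ("walk", 10), ("still", 5),
                           ("walk", 10), ("still", 5)]:
        for i in range(seconds * rate_hz):
            t = i / rate_hz
            swing = 0.5 * math.sin(2 * math.pi * 2 * t) if phase == "walk" else 0.0
            signal.append(1.0 + swing)
    return signal


segments = walking_segments(synthetic_recording())
```

On the toy signal, the first segment would correspond to the walk from A to B and the second to the walk back. Real recordings are noisier, so the threshold would need tuning.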

Data Records

Every subject is associated with 4 different log files (two per session), except for 19 subjects who did not attend the second session. Every file name follows one of these patterns:

  • sub0-lw-s1.csv: subject number 0, left waist, session 1

  • sub0-rp-s1.csv: subject number 0, right thigh, session 1

  • sub0-lw-s2.csv: subject number 0, left waist, session 2

  • sub0-rp-s2.csv: subject number 0, right thigh, session 2
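The naming patterns above can be parsed mechanically when loading the files. This is a small sketch assuming the names follow exactly the four patterns listed. The names `GaitLog` and `parse_log_name` are hypothetical helpers, not part of the dataset.

```python
import re
from typing import NamedTuple

# Matches names such as "sub0-lw-s1.csv" (pattern taken from the list above).
FILENAME_RE = re.compile(
    r"^sub(?P<subject>\d+)-(?P<location>lw|rp)-s(?P<session>[12])\.csv$"
)
LOCATIONS = {"lw": "left waist", "rp": "right thigh"}


class GaitLog(NamedTuple):
    subject: int
    location: str
    session: int


def parse_log_name(name: str) -> GaitLog:
    """Split a log file name into subject number, sensor location, session."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    return GaitLog(
        subject=int(match["subject"]),
        location=LOCATIONS[match["location"]],
        session=int(match["session"]),
    )
```

For example, `parse_log_name("sub0-rp-s2.csv")` yields subject 0, right thigh, session 2.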

Every log file contains 58 features that are internally captured and calculated by the SensorLog app. These are the raw data. Figure 1 shows all 58 features grouped by category. Additionally, an Excel file containing the metadata for each subject is provided.

Figure 1: The 58 features that are logged and calculated by the SensorLog app. These features are divided into 7 categories, each represented by a color.

Usage Notes

As an example application of the data, we present a general recurrent neural network (RNN) framework for gait biometric recognition. Accelerometer signals are sequential data, which makes them well suited to analysis by an RNN. Traditional gait biometrics rely on handcrafted features, which lead to complex computation and heavy reliance on experimental design. In contrast, the proposed model can automatically learn the dynamic features and temporal dependencies from a short data window extracted from a sequence of raw accelerometer signals. We evaluated the model on the data set, and the results show that the proposed model significantly outperforms previous studies in this field, which in turn indicates the quality of the data set.

Data preprocessing

  1. Cleaning data: Remove subjects whose left waist data and right leg pocket data cannot be aligned according to the timestamps.

  2. Determining axes: Compute the absolute mean value of each axis. The axis with the maximum value on each phone is the one that carries the gravity component. Remove the gravity offset from these axes.

  3. Data combination: Align the left waist data with the right leg pocket data based on timestamps to form a 6-channel space.

  4. Splitting data: Use a sliding window of fixed length to segment the data without overlapping. The window length corresponds to the number of time steps of our model. Each resulting sequence is one input to our model.

  5. Standardization: Apply a non-linear transformation (QuantileTransformer) such that the probability density function of each feature is mapped to a normal distribution. The transformation smooths out unusual distributions and is robust to outliers.
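Steps 2 and 4 above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' code: samples are taken to be rows of per-axis accelerometer values, the helper names (`gravity_axis`, `remove_gravity`, `split_windows`) are hypothetical, and the window length is left as a parameter because the source does not give it here.

```python
def gravity_axis(samples):
    """Return (index, mean) of the axis with the largest absolute mean.

    Assumption: that axis is the one carrying the gravity offset.
    """
    count = len(samples)
    means = [sum(row[k] for row in samples) / count
             for k in range(len(samples[0]))]
    axis = max(range(len(means)), key=lambda k: abs(means[k]))
    return axis, means[axis]


def remove_gravity(samples):
    """Subtract the mean offset from the gravity axis only."""
    axis, offset = gravity_axis(samples)
    return [[value - offset if k == axis else value
             for k, value in enumerate(row)]
            for row in samples]


def split_windows(samples, length):
    """Cut a sequence into non-overlapping windows, dropping a short tail."""
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, length)]


# Toy data: five samples with a constant -1.0 offset on the second axis.
toy = [[0.1, -1.0, 0.0], [-0.1, -1.0, 0.2], [0.1, -1.0, -0.2],
       [-0.1, -1.0, 0.0], [0.0, -1.0, 0.0]]
windows = split_windows(remove_gravity(toy), length=2)
```

For step 5, scikit-learn's `QuantileTransformer(output_distribution="normal")` is one way to map each channel to a normal distribution. Fitting it on the training split only would avoid leaking test statistics, though the source does not say how it was fitted.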

Model implementation

Our model, referred to as GaitNet, is a 3-layer Gated Recurrent Unit (GRU) neural network with 64 units in each layer, which takes as input a fixed-size sequence of raw gait points extracted by a sliding window approach. Each gait point is a 6-dimensional feature vector consisting of the X, Y, and Z coordinates collected from the smartphones at both locations. The model can automatically learn the dynamic features and temporal dependencies from the input gait data, and it outputs a probability distribution over all classes. As the number of points seen by the model increases, the model’s hidden state becomes progressively more informed. Therefore, we are only interested in the prediction output at the last time step, when the full sequence has been observed. Fig. 2 shows the architecture of GaitNet.

Figure 2: The Architecture of GaitNet
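For readers unfamiliar with GRUs, the standard update equations for one layer are reproduced here as background. The symbols are conventional notation and not taken from the source, and some references swap the roles of z_t and 1 - z_t. With input x_t (6-dimensional in the first layer) and hidden state h_t (64-dimensional):

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)} \\
\hat{y} &= \operatorname{softmax}\left(W_o h_T + b_o\right) && \text{(prediction at the last step } T\text{)}
\end{aligned}
```

In a stacked network, each layer's h_t serves as the next layer's x_t. The last line is an assumed form of the output layer, applied to the top layer's final hidden state, which matches the statement that only the last time step's prediction is used.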


All examples were randomly divided into training, validation, and test sets. We trained GaitNet on fixed-length data windows using mini-batch gradient descent. A softmax classifier was used to calculate the probability distribution over the classes, and the model was trained by minimizing the cross-entropy loss between the predicted probabilities and the one-hot encoded targets. The weights were updated by an Adam (Adaptive Moment Estimation) optimizer with gradient clipping to control the exploding gradient problem. Dropout regularization was applied to each RNN cell to avoid overfitting. During training, the model was periodically evaluated on the validation set to monitor convergence and overfitting. The validation results were also used to pick the best model after training. The model was trained on an NVIDIA GTX-series Ti GPU running TensorFlow.
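The loss and clipping steps mentioned above can be illustrated with a small, dependency-free sketch. It is not the TensorFlow training code. The function names are hypothetical, and clipping by L2 norm is an assumption, since the source does not say which clipping variant was used.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Cross-entropy against a one-hot target reduces to -log p[target]."""
    return -math.log(probs[target_index])


def clip_by_norm(gradient, max_norm):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    return [g * max_norm / norm for g in gradient]


probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, target_index=0)
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
```

With a one-hot target, only the probability assigned to the true subject contributes to the loss, which is why the loss falls as the model grows more confident in the correct class.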



A biometric recognition system can run in two different modes: identification or verification. We use the Rank-1 identification rate (Rank-1 IR) and the Equal Error Rate (EER) to evaluate identification and verification performance, respectively. GaitNet works as a multiclass classifier that yields a class probability distribution for the input gait pattern of the person to be identified. The decision to accept or reject a person is made by comparing the prediction probability against a decision threshold. Rank-1 IR measures identification performance as the percentage of correct identifications returned in first place of a ranked list. In the verification scenario, the decision threshold must be adjusted according to the desired characteristics of the application considered. The EER, the operating point at which the false acceptance rate equals the false rejection rate, gives a threshold-independent measure of a biometric system’s performance. In Fig. 3, we also present the ROC curve to provide a more global overview of the biometric recognition performance. We plot the ROC curve on a semilogarithmic scale (i.e., x axis in logarithmic scale), since low false acceptance values are of more interest and the logarithmic scale better distinguishes values in this range.
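Both metrics can be computed from model outputs with a short script. The sketch below is illustrative only: the function names are hypothetical, the toy scores are made up, and the EER is approximated by sweeping the observed scores as thresholds and taking the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest.

```python
def rank1_rate(prob_rows, labels):
    """Fraction of samples whose top-ranked class is the true class."""
    hits = 0
    for probs, label in zip(prob_rows, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        hits += predicted == label
    return hits / len(labels)


def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) where FAR and FRR are closest.

    genuine: scores for claims by the true identity
    impostor: scores for claims by someone else
    A claim is accepted when its score is >= threshold.
    """
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Toy inputs (not results from the paper).
rank1 = rank1_rate([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.5, 0.3, 0.2]],
                   labels=[0, 1, 2])
eer, threshold = equal_error_rate(genuine=[0.9, 0.8, 0.7, 0.4],
                                  impostor=[0.6, 0.3, 0.2, 0.1])
```

Sweeping the same thresholds and recording each (FAR, 1 - FRR) pair would give the points of an ROC curve like the one in Fig. 3.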

Figure 3: The constructed ROC curve demonstrating the overall verification performance of GaitNet.

Figure 4: Placement of the two smartphones: one on the left waist and the other on the right thigh.

Figure 5: The top and bottom histograms show the distributions of subjects’ weight (kg) and age, respectively.

Figure 6: Histogram of the amount of exercise (minutes) that subjects perform per week.

Figure 7: A sample scatter plot of accelerometer values consisting of X, Y and Z coordinates before removing noises such as standing timepoints.

Figure 8: A sample scatter plot of accelerometer values consisting of X, Y and Z coordinates before removing noises such as standing timepoints.

Supporting Information

The trimmed accelerometer data set can be downloaded here.
The meta-data of subjects can be downloaded here.


We thank everyone who contributed to this study as a subject.


  •  1. S. K. Al Kork, I. Gowthami, X. Savatier, T. Beyrouthy, J. A. Korbane, and S. Roshdi. Biometric database for human gait recognition using wearable sensors and a smartphone. In 2017 2nd International Conference on Bio-engineering for Smart Technologies (BioSMART), pages 1–4. IEEE, 2017.
  •  2. P. Casale, O. Pujol, and P. Radeva. Personalization and user verification in wearable systems using biometric walking patterns. Personal and Ubiquitous Computing, 16(5):563–580, 2012.
  •  3. J. Frank, S. Mannor, and D. Precup. A novel similarity measure for time series data with applications to gait and activity recognition. In Proceedings of the 12th ACM international conference adjunct papers on Ubiquitous computing-Adjunct, pages 407–408. ACM, 2010.
  •  4. M. Gadaleta and M. Rossi. Idnet: Smartphone-based gait recognition with convolutional neural networks. Pattern Recognition, 74:25–37, 2018.
  •  5. D. Gafurov. A survey of biometric gait recognition: Approaches, security and challenges. In Annual Norwegian computer science conference, pages 19–21. Annual Norwegian Computer Science Conference Norway, 2007.
  •  6. D. Gafurov, E. Snekkenes, and P. Bours. Gait authentication and identification using wearable accelerometer sensor. In 2007 IEEE workshop on automatic identification advanced technologies, pages 220–225. IEEE, 2007.
  •  7. T. Kobayashi, K. Hasida, and N. Otsu. Rotation invariant feature extraction from 3-d acceleration signals. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3684–3687. IEEE, 2011.
  •  8. R. C. Lindsay, D. F. Ross, J. D. Read, and M. P. Toglia. The handbook of eyewitness psychology: volume ii: memory for people, volume 2. Psychology Press, 2013.
  •  9. B. B. Mjaaland, P. Bours, and D. Gligoroski. Walk the walk: attacking gait biometrics by imitation. In International Conference on Information Security, pages 361–380. Springer, 2010.
  •  10. T. T. Ngo, Y. Makihara, H. Nagahara, Y. Mukaigawa, and Y. Yagi. The largest inertial sensor-based gait database and performance evaluation of gait-based personal authentication. Pattern Recognition, 47(1):228–237, 2014.
  •  11. H. M. Thang, V. Q. Viet, N. D. Thuc, and D. Choi. Gait identification using accelerometer on mobile phone. In 2012 International Conference on Control, Automation and Information Sciences (ICCAIS), pages 344–348. IEEE, 2012.