GlidarCo: gait recognition by 3D skeleton estimation and biometric feature correction of flash lidar data

05/16/2019 ∙ by Nasrin Sadeghzadehyazdi, et al. ∙ University of Virginia, Old Dominion University

Gait recognition using noninvasively acquired data has attracted increasing interest in the last decade. Among various modalities of data sources, it is experimentally found that data involving a skeletal representation are amenable to reliable feature compaction and fast processing. Model-based gait recognition methods that exploit features from a fitted model, like a skeleton, are recognized for their view- and scale-invariant properties. We propose a model-based gait recognition method using sequences recorded by a single flash lidar. Existing state-of-the-art model-based approaches that exploit features from high-quality skeletal data collected by Kinect and Mocap are limited to controlled laboratory environments, and their performance is negatively affected by poor data quality. We address the problem of gait recognition under challenging scenarios, such as the lower-quality and noisy imaging process of lidar, which degrades the performance of state-of-the-art skeleton-based systems. We present GlidarCo to attain high accuracy on gait recognition under the described conditions. A filtering mechanism corrects faulty skeleton joint measurements, and robust statistics are integrated with conventional feature moments to encode the dynamics of the motion. As a comparison, length-based and vector-based features extracted from the noisy skeletons are investigated for outlier removal. Experimental results illustrate the efficacy of the proposed methodology in improving gait recognition given noisy, low-resolution lidar data.





I Introduction

Gait identification has received increasing interest in the last decade due to its applications in areas ranging from intelligent security surveillance and identifying persons of interest in criminal cases to designated smart environments [1, 2]. In addition, gait analysis plays an important role in quantifying the severity of certain motion-related diseases such as Parkinson's disease [3]. Gait recognition aims to tackle the identification problem based on the way people walk, and early findings in medicine and psychology have shown the uniqueness of gait to individuals [4, 5]. While the iris [6], face [7, 8], and fingerprint [9] provide some of the most efficacious biometrics for person identification with high recognition accuracy, they require the cooperation of subjects as well as the availability of high-quality data. In real life, however, there are many scenarios in which the subjects cannot be controlled, there is no contact between subjects and sensors, or access to high-quality data is not possible. Under such circumstances, biometrics that can be extracted from gait have shown promising results in several studies [10, 11, 12]. Features extracted from gait are more resilient to changes in clothing or lighting conditions than color or texture, which are among the prevalent features for person identification. While patterns of walking may not necessarily be unique to individuals in practice, a combination of biometric-based static attributes along with the motion analysis of certain joints can create an effective set of features to distinguish one individual from another.

Fig. 1: Sample frames of lidar data. The top and bottom rows show range and intensity data, respectively.

In recent years, depth cameras have become popular for gait analysis, mainly due to their ability to provide a three-dimensional depiction of the scene [13, 14, 15]. Unlike their optical counterparts, depth cameras like lidar and Kinect provide depth information that is not sensitive to illumination or to changes in lighting conditions, which are among the major issues in uncontrolled environments. This makes a depth camera an ideal candidate for long-term person identification that relies on features such as time-invariant biometrics. In this work, we utilize flash lidar technology to collect data. A flash lidar camera uses a pulsed laser to illuminate the whole scene and simultaneously record range (depth) and intensity information. Figure 1 shows sample frames of the intensity and range data collected by the flash lidar camera. Since the laser beams can be focused to suit the objects of interest, a flash lidar camera can provide detailed depth imaging of the scene. This property of flash lidar has led to extensive applications in areas such as autonomous vehicles, atmospheric physics, archaeology, forestry, geology, geography, seismology, space missions, and transportation.

Video-based gait recognition approaches are generally divided into two main categories: model-based and model-free methods. Model-free methods rely on features that can be obtained from clean silhouettes [16, 17]. They are easy to implement and computationally less expensive than their model-based counterparts. However, model-free methods are not view and scale invariant, and they require recordings from multiple angles, which is not always feasible in practice. Using a skeleton for gait recognition is categorized as a model-based approach for person identification. Most of the existing methods rely on fitting a model, usually a skeleton, to human silhouettes [18, 19]. The main issue with model-based methods is that, in general, model fitting is a computationally expensive process. While such difficulties are not an issue with structured lighting approaches such as Kinect, due to the direct estimation of joint coordinates, the working range of Kinect is limited. Furthermore, the range information of Kinect is not reliable in outdoor environments, because it is not easy to differentiate the infrared light of the sensor from the high-intensity infrared of the environment [20, 21]. To curb the computational complexity of model-based methods, several studies rely on high-quality real-time skeleton joint data generated by Mocap [22, 23]. However, in terms of applicability, Mocap is limited to a laboratory environment, which is a major drawback. Unlike Mocap, flash lidar has been extensively used for outdoor applications. Compared with Kinect, a flash lidar camera has a drastically extended range ( meters), and its performance is not degraded in outdoor environments due to the high irradiance power of the pulsed laser generated by lidar compared with the background [24].

With a limited number of studies, the only existing lidar-based person identification works in the literature are model-free and rely on background subtraction to extract human silhouette from the point cloud data provided by Velodyne’s Rotating Multi-Beam (RMB) lidar system [25, 26, 17]. In this work however, we take a model-based approach, leveraging OpenPose, a pre-trained deep network [27] to extract a skeleton model from the intensity information. Using camera properties and the depth data, the provided skeleton joint coordinates are transferred into real-world coordinates. The work presented here can be employed for gait analysis in different applications; for instance to improve the classification results by including gait information in our previous work [28].

This shift of the modality from the structured (image/video) to unstructured (skeleton) data type provides benefits in terms of data compaction, computation, storage, scalability, and recognition accuracy. Furthermore, the skeleton-related attributes mimic actual physical traits in human body and can be utilized as a soft biometric ID for the individuals. Our visual system does not extricate details of clothing texture, or the skin tone of the person who is walking. Rather, it focuses on certain body parts (joints, limbs), and tries to reconstruct the anatomy and locomotion. Such biological cues are exploited in  [29, 30]. In addition, there has been a surge of studies to find a suitable model for such purpose [31, 32, 33].

Existing successful model-based methods take advantage of high-quality skeleton data provided by Kinect or Mocap and avoid the challenge of erroneous features. However, as we mentioned earlier, these modalities are not a proper choice for real-world applications. In contrast with Kinect and Mocap, the data collected by a flash lidar camera are noisy and have low resolution, which degrades the performance of skeleton extraction systems. Features that are computed from the faulty skeleton models are plagued with erroneous measurements, which in turn present a major challenge for successful gait recognition. In this paper, our main goal is to answer the following question: "When the collected data is noisy to a level that a considerable number of fitted skeleton models contain missing or erroneous joints, is it still possible to identify gaits with high accuracy and precision?" In particular, under the described condition, "Can we avoid the common approach of removing noisy data, and correct the faulty skeletons instead?" To address these questions, we present GlidarCo, a methodology to correct for the faulty and missing measurements of joint coordinates and to integrate robust statistics to improve gait recognition using the noisy, low-resolution flash lidar data.

Our contributions are fourfold. First, we present a model-based approach for gait recognition using flash lidar data that is close to real-time. Second, we present a filtering mechanism that exploits robust statistics and shape-preserving interpolation to correct for faulty and missing measurements of joint coordinates. Third, we integrate robust statistics with the traditional feature moments to incorporate the motion dynamics over the gait cycles. Fourth, as an alternative method for applications where data elimination is not an issue, we investigate features extracted from noisy skeletons for outliers, and present a modification of the Tukey’s method for vector-based feature vectors. The latter contribution is an effort to follow the traditional practice of removing noisy data and perform classification on the remaining clean data. In particular, we aim to compare the results from outlier removal method, with an unorthodox effort that seeks to correct the erroneous data. We must emphasize the importance of the latter, as it preserves the original data, that is costly to collect in many applications. An extensive experimental investigation demonstrates the efficacy of the proposed methodology in improving the performance of both length-based and vector-based features for gait identification using the flash lidar data.

The rest of this paper is organized as follows. In the next section, we outline the related work. Next, we describe the proposed methodology in detail in the Methods section. The results and discussion section describes a thorough experimental investigation into the efficacy of the proposed method and compares the performance of multiple sets of features, including state-of-the-art methods in the context of gait recognition, before and after data correction. Finally, we summarize in the conclusion section.

II Related work

Model-based methods fit a model, like a skeleton, to the human body and use the features extracted from the fitted model to identify gaits. Model fitting is generally a complex and computationally expensive process. To avoid such difficulties, many studies leverage Kinect as a marker-less motion capture tool that generates real-time, high-quality intensity and depth data, along with skeleton joint information. In general, these methods are based on identifying the gait cycle and calculating static anthropometric features like bone lengths and height, and gait features like step length, gait cycle, or the angle between selected body joints over each gait cycle. Statistics like the mean, maximum, and standard deviation of the collected attributes are computed over each gait cycle and utilized as feature identifiers.

Using the maximum, mean, and standard deviation of a set of lower body angles over a half gait cycle as features and the K-Means clustering algorithm, Ball et al. [34] acquired an accuracy of 43.6% on a dataset collected from four subjects. In [11], the authors used a set of static features plus two gait features and achieved an accuracy of 90% on a dataset collected from nine subjects walking from right to left in front of a Kinect camera. Araujo et al. [35] used eleven static anthropometric features and investigated the effect of different subsets of features in gait recognition. They also compared the performance of four different classifiers on a dataset collected from eight different subjects. An average accuracy of 98% was obtained only when the training and test samples contained the same type of walking pattern. Sinha et al. [12] proposed a set of area-based features plus the distance between different body segment centroids, combined these attributes with the features in [34] and [11], and obtained a higher accuracy compared with the work of Ball and Preis on a dataset of ten subjects. Kumar and Babu [36] proposed a set of covariance-based measures on the trajectory of skeleton joints, and acquired an accuracy of 90% on a dataset of 20 subjects. Dikovski [37] evaluates the performance of different features like angles of lower body joints, distance between adjacent joints, height, and step length over one gait cycle. Relative distances and relative angles are computed between selected body joints and compared utilizing the Dynamic Time Warping (DTW) algorithm by Ahmed et al. [38]. Ali et al. [39] compute triangles formed by lower body joints during motion and utilize the mean of the areas during one gait cycle. In [40], Yang et al. use a set of anthropometric and relative distance-based features for identification.

Fig. 2: Pipeline for gait recognition using joint correction criterion of GlidarCo
Fig. 3: Pipeline for outlier removal. Inputs to "3D Joint location estimator" remain the same as in Figure 2

The majority of previous model-based studies exploit sequences that were recorded with a limited set of walking patterns. Subjects walk in straight lines, and training and test sequences include the same patterns of walking. In this work, we consider a case that involves different patterns of walking. In particular, we select disparate walking patterns for training and test sequences.

This paper is an extension of our previous work [41], with an improvement on joint coordinate filtering and per-frame identification. Furthermore, we propose a new method to integrate the dynamics of the motion. We also present an outlier removal method for vector-based feature vectors that can be employed in applications where data removal is not an issue. In addition, we evaluate the proposed methodologies on two different feature vectors, and compare them with more state-of-the-art relevant methods.

Fig. 4: Top row: sample frames with correctly detected skeletons, bottom row: frames with faulty skeletons

In our dataset, recorded by a single flash lidar camera, several factors diminish the quality of features that are computed from the resulting joints. As the subjects proceed toward the camera, range data are affected by noise. The lack of color in the intensity data and the similarity between human clothing, background, and skin are some of the other elements that can negatively affect the quality of detected poses and consequently the feature vectors. A common approach in the existing studies involves the removal of noisy outlier data that are generated as a result of faulty measurements. Further processing is applied to the remaining higher-quality collection of joint data. In this paper, we propose an automated outlier removal procedure. However, with the shortage of data being a major challenge in many real-world surveillance scenarios, data removal will only exacerbate the data scarcity problem. In other words, while outlier removal can be a proper solution to gather higher-quality data to begin with, it is not the best choice when data elimination can raise issues. We aim to address this problem by proposing a filtering mechanism that corrects erroneous joint data instead of eradicating them. Figures 2 and 3 present the pipelines of the joint correction and outlier removal methodologies, respectively.

To prove the efficacy of the proposed methodology, we will compare the performance of two different sets of features, length-based and vector-based features, and four state-of-the-art works, before and after joint correction. Furthermore, as an alternative for applications in which data elimination is not an issue, we also consider automatic outlier removal and compare it with the proposed joint correction on improving the gait identification accuracy.

III Methods

III-A Overview of method

Figure 2 describes the workflow of the proposed gait recognition methodology using flash lidar data. For a lidar sequence with $T$ frames, there exist $I = \{I_1, \dots, I_T\}$ and $R = \{R_1, \dots, R_T\}$, where $I_t$ and $R_t$ represent intensity and range data at frame $t$. Images are preprocessed to reduce noise and are fed into a 2D skeleton detector. We leverage OpenPose, a state-of-the-art real-time pose detector, to fit a skeleton model and extract the location of body joints. In Figure 4, the top row shows examples of correctly detected skeleton joints. As we can see in this figure, OpenPose provides a skeleton model of 18 joints, where 5 of the joints represent the nose, eyes, and ears. However, the model that we adopt in this paper only considers 13 joints. The reason for this choice is that face joints are missing from a large majority of our samples. Furthermore, the facial joints do not convey useful information for gait recognition. Figure 5 illustrates the skeleton model that we use in this work. Given $I_t$ as the input to the skeleton detector, the output is the joint location coordinates, which can be represented in the following vectorized form

$$\mathbf{J}_t = [x_1, y_1, x_2, y_2, \dots, x_N, y_N]^{\top}$$

where $(x_i, y_i)$ are the coordinates of joint $i$ in the image frame of reference, and $N$ represents the number of joints. Using the range data and the properties of the lidar camera, we can project the 2-dimensional coordinates of joints into real-world coordinates. The real-world location of joint $i$ in the $x$ or $y$ direction is calculated from the number of pixels in that direction, the angle of view, the range value of joint $i$, and the location of joint $i$ in that direction in the image coordinate system. The real-world location in the $z$ direction equals the depth value at the location of joint $i$.
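The projection from image coordinates to real-world coordinates can be sketched as follows. This is a minimal illustration under a simple field-of-view model, not the exact equation of the paper: it assumes that the scene extent at a given range is `2 * range * tan(fov / 2)`, that this extent is spread evenly over the pixels, and that the origin sits at the image centre. The function and parameter names are hypothetical.

```python
import math


def pixel_to_world(p_img, n_pixels, fov_deg, rng):
    """Map one image coordinate to a real-world offset (same unit as rng).

    Assumption (not from the paper): the field of view covers
    2 * rng * tan(fov / 2) at distance rng, divided evenly among
    n_pixels, with the origin at the image centre.
    """
    extent = 2.0 * rng * math.tan(math.radians(fov_deg) / 2.0)
    return (p_img - n_pixels / 2.0) * extent / n_pixels


def joint_to_3d(x_img, y_img, depth, width, height, fov_x_deg, fov_y_deg):
    """Return (X, Y, Z) for one joint; Z is taken as the depth value."""
    x_world = pixel_to_world(x_img, width, fov_x_deg, depth)
    y_world = pixel_to_world(y_img, height, fov_y_deg, depth)
    return (x_world, y_world, depth)


# A joint at the centre of a 128 x 128 frame, 10 m away.
print(joint_to_3d(64, 64, 10.0, 128, 128, 30.0, 30.0))
```

Whether the image origin is the centre or a corner only shifts all joints by a constant, so limb lengths and limb vectors computed between joints are unaffected by that choice.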

As we discussed earlier, the quality of the resulting skeleton and the joint localization are negatively affected by several factors. The features that are computed using the acquired skeletons are plagued with erroneous measurements. Therefore, gait recognition based on the computed faulty skeletons results in a high rate of false positives. To resolve this problem, we present a filtering mechanism that employs robust statistics and shape-preserving interpolation to correct for faulty measurements in time sequences of joint coordinate values. This filter improves the quality of the joint localization and ultimately enhances the gait recognition accuracy. As an alternative to joint location correction, we employ the Tukey method to detect and remove outlier length-based and vector-based feature vectors. In particular, we present a modification for vector-based outlier detection using the Tukey method. The following subsection describes the filtering mechanism, and is followed by the outlier removal subsection.

Fig. 5: The skeleton model we use in this work. Left: index of each joint in the skeleton model. Right: skeleton model in a sample frame.

III-B Filtering of joint location

Let $\mathbf{J}$ be a matrix of size $2N \times T$, where each row represents the time sequence of one joint in one of the directions $x$ and $y$, extended over $T$ frames. Since each skeleton consists of $N$ joints, there are in total $2N$ joint coordinate time sequences. In order to correct for missing joint location values and noisy outliers in a given video, we perform filtering of joint location on each row of the corresponding matrix $\mathbf{J}$. Let $\mathbf{r}_k$ represent the $k$-th row of $\mathbf{J}$

$$\mathbf{r}_k = [r_k(1), r_k(2), \dots, r_k(T)], \qquad k = 1, \dots, 2N$$

Given the joint location sequence $\mathbf{r}_k$, first we use Tukey's test to detect any value in $\mathbf{r}_k$ that is below $Q_1 - 1.5\,\mathrm{IQR}$ or above $Q_3 + 1.5\,\mathrm{IQR}$, where $\mathrm{IQR} = Q_3 - Q_1$ stands for the interquartile range, and $Q_1$ and $Q_3$ are the lower and upper quartiles, respectively. If $\mathcal{O}_k$ is the set of all the detected outlier indices in $\mathbf{r}_k$ (each index corresponds with one time instant $t$), defined as

$$\mathcal{O}_k = \{t_1, t_2, \dots, t_{n_k}\}$$

where $n_k$ is the number of outliers in $\mathbf{r}_k$ detected by Tukey's method, then $\mathbf{r}_k$ will be corrected according to the following

$$\hat{r}_k(t) = r_k\big(t^{\mathrm{no}}_{\mathrm{1NN}}\big), \qquad t \in \mathcal{O}_k \qquad (5)$$

where $\hat{r}_k(t)$ is the corrected value of $\mathbf{r}_k$ at $t$. The labels $\mathrm{no}$ and $\mathrm{1NN}$ stand for non-outlier and the one nearest neighbor, respectively. $r_k\big(t^{\mathrm{no}}_{\mathrm{1NN}}\big)$ is the value of the nearest neighbor of $r_k(t)$ that is not an outlier. In those cases with two nearest neighbors, one is selected randomly. After the detected outlier values of $\mathbf{r}_k$ are corrected according to Equation 5, piece-wise cubic Hermite polynomials [42] are utilized to interpolate the missing values in $\mathbf{r}_k$. We use piece-wise Hermite polynomials to preserve the shape of $\mathbf{r}_k$. Meanwhile, by applying outlier correction before missing value interpolation, the shape of the curves will be less affected by outliers. Finally, we employ the RLowess (robust locally weighted scatterplot smoothing) filter [43] to smooth the resulting joint location sequence and alleviate the effect of the remaining smaller spikes in $\mathbf{r}_k$. RLowess assigns a value to each point by locally fitting a first-order polynomial, utilizing weighted least squares. Weights are computed using the median absolute deviation (MAD), which is a robust measure of variability in the data in the presence of outliers. The robustness of the weights is critical due to the existence of smaller-amplitude spikes that act as outliers.
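The first stage of this filter, Tukey-based detection followed by replacement with the nearest non-outlier neighbor, can be sketched as below. This is an illustrative sketch using only the Python standard library, not the authors' implementation. The 1.5 multiplier is the conventional Tukey setting and is an assumption here, ties between two equally near neighbors are resolved toward the earlier frame rather than randomly, and missing values (`None`) are left in place for the later interpolation and smoothing steps, which are not shown.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def correct_outliers(seq, k=1.5):
    """Replace Tukey outliers in one joint-coordinate time sequence.

    seq holds one coordinate of one joint per frame; None marks a frame
    with no detected joint. Each outlier takes the value of the nearest
    frame that is observed and not an outlier.
    """
    observed = [v for v in seq if v is not None]
    lo, hi = tukey_fences(observed, k)
    good = [t for t, v in enumerate(seq) if v is not None and lo <= v <= hi]
    corrected = list(seq)
    for t, v in enumerate(seq):
        if v is not None and not (lo <= v <= hi):
            # min() keeps the first of two equally near frames (the earlier one).
            nearest = min(good, key=lambda g: abs(g - t))
            corrected[t] = seq[nearest]
    return corrected


x_coords = [10.0, 10.5, 11.0, 95.0, 11.5, None, 12.0, 12.5]
print(correct_outliers(x_coords))
```

In the example the spike of 95.0 is replaced by 11.0, the value of the neighboring non-outlier frame, while the missing frame stays `None`.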

Fig. 6: Effect of joint location sequence filtering. From top: sample joint location sequences before (first row) and after (second row) joint location sequence filtering. Samples of faulty and missing skeleton joints before (third row) and after (bottom row) joint location sequence filtering.

The described filtering procedure effectively corrects joint location time sequences. Furthermore, when the pose detector fails to detect a skeleton model, the joint location filtering can interpolate the missing skeleton joint locations. Figure 6 illustrates the result of filtering on samples of joint location time sequences. As we can see in this figure, the original joint location sequences are noisy, containing many missing values and outliers. We can also see the results in the image reference frame, where missing joints are interpolated successfully through the filtering mechanism. While in the majority of cases the interpolation of missing or noisy joints follows the correct joint locations, there exist cases where the obtained localization results are not accurate. Figure 7 shows some failure examples in joint localization correction. However, even for failure cases, at least half of the joints are predicted correctly. This can enhance the likelihood of correct identification compared to the original localization of the joints.

Fig. 7: Failure examples of joint sequence filtering. Sample frames of skeleton joints, before (top) and after (bottom) joint sequence filtering.

III-C Incorporating the dynamics

As humans, we recognize a familiar person not just by looking at their body measurements like height; we also incorporate the way that a person walks or moves their body in distinguishing one subject from another. In the gait recognition language, the first set of features, which are computed from body measurements like limb lengths or height, are called static features. Attributes like step length or speed, which capture the motion of gait from one posture to another, are dynamic features. When individuals with approximately the same body measurements are considered, dynamic features are critical for successful gait recognition. Speed, step length, and stride length are among the widely used features to incorporate the dynamics of the motion [11, 44]. Another common practice in the majority of model-based methods involves computing moments like the mean, maximum, and variance of selected features over the length of each gait cycle [12, 40, 45]. The time sequence of the distance between the two ankle joints is a commonly employed attribute to compute the gait cycle. This practice has repeatedly proven to be successful in encoding the dynamics of the motion, achieving high accuracy in gait recognition. However, this analysis is commonly performed on a clean dataset that is recorded under controlled conditions, like limited directions of motion in front of the camera.
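As a rough illustration of how a gait cycle can be read off the ankle-to-ankle distance sequence, the sketch below marks local maxima (one per step) and pairs every second maximum into a full cycle of two steps. The peak rule and the function names are assumptions for illustration; the text does not specify the segmentation procedure, and a practical detector would first smooth the signal.

```python
def local_maxima(signal):
    """Indices where a sample exceeds its predecessor and is not below its successor."""
    return [t for t in range(1, len(signal) - 1)
            if signal[t - 1] < signal[t] >= signal[t + 1]]


def gait_cycles(ankle_dist):
    """Split a sequence into (start, end) frame ranges, each covering two steps.

    The ankle-to-ankle distance peaks once per step, so one full gait
    cycle (left step plus right step) spans every second peak.
    """
    peaks = local_maxima(ankle_dist)
    return [(peaks[i], peaks[i + 2]) for i in range(0, len(peaks) - 2, 2)]


# Idealized periodic distance signal with a peak at every odd frame.
signal = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(gait_cycles(signal))
```

A sequence without a clear periodic pattern yields few or irregular peaks under this rule, which is one way the difficulty with noisy data shows up in practice.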

Fig. 8: Examples of time sequence of ankle to ankle distance of lidar data after joint correction. While the plot on the left presents a clear periodic pattern, the sequence on the right lacks such a pattern.

Figure 8 shows examples of ankle-to-ankle distance time sequences for lidar data after joint location filtering. The sequence on the left shows a periodic pattern; however, like the plot on the right side, there are many examples of such sequences that lack a clear cyclic pattern. In contrast with the lidar data, we generally observe a periodic pattern with the Kinect measurements. To resolve this issue, we incorporate statistics that are robust to noisy data. Joint sequence filtering improves the quality of gait features and therefore, as we will show later, gait recognition accuracy. However, there is a considerable number of consecutive frames with a missing skeleton in each sequence. This makes the result of joint sequence correction prone to noisy measurements. To compensate for this shortcoming, in addition to the mean, standard deviation, and maximum, we include the median and the upper and lower quartiles, which are robust to noisy data. This property is particularly beneficial for gait cycles that are corrupted with outlier features. We build feature vectors that comprise the mean, standard deviation, maximum, median, lower quartile, and upper quartile of each feature over each gait cycle. Later, we will show that the resulting feature vectors can improve the classification scores over the feature vectors that only incorporate non-robust moments.
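The per-cycle descriptor described above can be sketched in a few lines. This is a minimal example with the standard library; the choice of population standard deviation and of the "inclusive" quartile method are assumptions, since the text does not state which estimators are used, and the function names are hypothetical.

```python
import statistics


def cycle_descriptor(values):
    """[mean, std, max, median, lower quartile, upper quartile] of one feature."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [statistics.fmean(values), statistics.pstdev(values),
            max(values), med, q1, q3]


def cycle_feature_vector(cycle_frames):
    """Concatenate the six statistics of every feature over one gait cycle.

    cycle_frames is a list of per-frame feature lists, all of equal length.
    """
    vector = []
    for series in zip(*cycle_frames):
        vector.extend(cycle_descriptor(list(series)))
    return vector


# One limb length over a five-frame cycle with a single corrupted frame.
print(cycle_descriptor([1, 2, 3, 4, 100]))
```

In this example the corrupted frame pulls the mean to 22.0 and sets the maximum to 100, while the median and quartiles stay at 3.0, 2.0, and 4.0, which is the behavior the robust statistics are included for.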

III-D Outlier removal

Outliers are a set of observations that cannot be described by the underlying model of a process. While in some applications, e.g. surveillance and abnormal behavior detection, outlier observations can be of interest and are kept for further investigation, there are situations in which outliers are the result of faulty measurements or are caused by noise. The latter type of outliers has to be detected and removed before model estimation, because models that are estimated from data contaminated by such outliers are not accurate and generate many false predictions. For gait recognition, one common approach is to remove outlier measurements from the collected data by setting some measurement thresholds [40, 46, 47, 45]. For comparison, and as an alternative approach to deal with the noisy and missing joint location measurements in our dataset that result in outlier features, we employed the Tukey method to detect outliers in the feature vectors that are computed from faulty and missing joint locations. The second row in Figure 4 presents some examples of faulty skeletons that are the result of erroneous joint localization. Furthermore, there are frames with missing skeletons. Figure 9 shows selected limb lengths of one subject computed from joint coordinates extracted from flash lidar data. The joint data are not treated for correction, and by looking at the scale and distribution of each limb length, we can clearly see that the features are highly contaminated by outlier values. We use Tukey's test for outlier detection and apply it to every feature in a feature vector. We choose Tukey's test in particular to avoid making any assumption about the underlying distribution of the features.

We define $\mathbf{f} = [f_1, f_2, \dots, f_n]$ as a given feature vector, where $n$ is the number of features in $\mathbf{f}$ and $f_i$ is the Euclidean distance between two skeleton joints. Before applying Tukey's test, first we remove all the frames with missing skeletons. Next, we filter the remaining features by setting an upper threshold $\tau$ that will be applied to all the features. To determine $\tau$, we investigate the distribution of $f^{*}$

$$f^{*} = \underset{f_i,\; i = 1, \dots, n}{\arg\max}\; \sigma(f_i)$$

where $f^{*}$ is the feature with the maximum standard deviation $\sigma$. The histogram of $f^{*}$ is computed, and the maximum value of a histogram bin interval is selected as $\tau$, determined by the histogram bin with the highest frequency and by a hyperparameter that is set according to the distribution of $f^{*}$. A feature vector with a feature that is beyond $\tau$ will be removed. Next, Tukey's test is employed on each feature. $\mathbf{f}$ is not an outlier if

$$[\mathcal{T}(f_1), \mathcal{T}(f_2), \dots, \mathcal{T}(f_n)] = \mathbf{0}_n \qquad (8)$$

where $\mathbf{0}_n$ is the zero vector of length $n$. For feature $f_i$, $\mathcal{T}(f_i) = 0$ means that $f_i$ passed Tukey's test, or is not an outlier. Based on Equation 8, for feature vector $\mathbf{f}$ to be a non-outlier, all of its feature components have to be non-outliers. This means that $\mathbf{f}$ is an outlier if there exists an $i$ such that $\mathcal{T}(f_i) \neq 0$. Figure 10 presents the same features as in Figure 9 after outlier removal. By comparing the scale and values of features between the two figures, we observe a considerable reduction in the range of each feature as a result of outlier removal. This, however, comes at the cost of eliminating a large portion of the data.
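The rule that a feature vector is kept only if every one of its features passes Tukey's test can be sketched as follows. This is a simplified illustration, not the authors' code: it omits the preliminary global threshold step, applies Tukey's test to each feature across all samples, and assumes the conventional 1.5 multiplier. The function names are hypothetical.

```python
import statistics


def tukey_flags(column, k=1.5):
    """Return 0 for values inside the Tukey fences and 1 for outliers."""
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [0 if lo <= v <= hi else 1 for v in column]


def remove_outlier_vectors(vectors, k=1.5):
    """Keep only the feature vectors whose features all pass Tukey's test."""
    columns = list(zip(*vectors))                       # one column per feature
    flags = [tukey_flags(list(col), k) for col in columns]
    return [vec for s, vec in enumerate(vectors)
            if all(feature_flags[s] == 0 for feature_flags in flags)]


# Two limb lengths (in metres) per frame; the last two frames are faulty.
frames = [[0.30, 0.45], [0.31, 0.44], [0.29, 0.46], [0.30, 0.45],
          [0.32, 0.47], [0.30, 2.50], [1.90, 0.45]]
print(remove_outlier_vectors(frames))
```

Note that a single faulty limb length is enough to discard the whole frame, which is why this approach can eliminate a large portion of the data.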

Fig. 9: Sample limb lengths for one subject from lidar data that shows abundance of outliers. Each graph represents the distribution of one limb length.
Fig. 10: Same limb lengths as in Figure 9 after outlier removal. Compare the distribution and range of each limb length between the two figures.

III-E Outlier removal for vector-based features

There are cases when the components of a feature vector are vectors. This happens if we compute the 3-dimensional vectors between skeleton joints. In other words, we have a vectorized matrix $\mathbf{V}$ of the joint coordinates. $m$ is the number of 3-dimensional vectors in $\mathbf{V}$, and $\mathbf{v}_i$ represents the $i$-th column, which is the 3-dimensional vector between two skeleton joints

$$\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m], \qquad \mathbf{v}_i \in \mathbb{R}^{3}$$

We need to treat each of the 3-dimensional vectors as one entity, rather than treating each dimension separately. In order to detect outliers for this set of features, we use the concept of the marginal median. The marginal median of a set of vectors is a vector where each of its components is the median of all the vector components in that direction. We then use cosine distance to calculate the vector similarity between each set of 3-dimensional vectors and their corresponding median vector. Defining $\tilde{\mathbf{V}} = [\tilde{\mathbf{v}}_1, \tilde{\mathbf{v}}_2, \dots, \tilde{\mathbf{v}}_m]$ as the marginal median over all given feature vectors,

$$c_i = \frac{\mathbf{v}_i \cdot \tilde{\mathbf{v}}_i}{\lVert \mathbf{v}_i \rVert \, \lVert \tilde{\mathbf{v}}_i \rVert}$$

is the cosine similarity between the $i$-th element of feature vector $\mathbf{V}$ and $\tilde{\mathbf{V}}$. This procedure creates the cosine similarity measure between each $\mathbf{v}_i$ and the median vector $\tilde{\mathbf{v}}_i$. Then Tukey's test is employed on the cosine similarity measures, and a feature vector is labeled as an outlier if at least one of its features is an outlier. The algorithm below describes outlier detection on the feature vectors built from 3-dimensional vectors using the concept of the marginal median, cosine similarity measures between vectors, and Tukey's test.
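Algorithm: Outlier detection for 39-D feature vectors
1. Over all the given feature vectors, calculate the marginal median vector. Let $\tilde{\mathbf{V}}$ be the resulting median feature vector.
2. For each 3D vector $\mathbf{v}_i$ in each feature vector $\mathbf{V}$, calculate the cosine similarity $c_i$ between $\mathbf{v}_i$ and the corresponding median vector $\tilde{\mathbf{v}}_i$; save the results in one row of a matrix $\mathbf{C}$.
3. Employ Tukey's test on each row of $\mathbf{C}$.
4. A given feature vector will pass Tukey's test if its corresponding row in $\mathbf{C}$ passes Tukey's test.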
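The steps of this algorithm can be sketched as below, using only the standard library. This is an illustrative reading rather than the authors' implementation: Tukey's test is applied to the cosine similarities of each 3D vector position across all samples, the 1.5 multiplier is assumed, and every vector is assumed to be non-zero. The function names are hypothetical.

```python
import math
import statistics


def marginal_median(vectors):
    """Component-wise median of a list of 3D vectors."""
    return tuple(statistics.median(component) for component in zip(*vectors))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_outlier_mask(samples, k=1.5):
    """samples[s][i] is the i-th 3D vector of sample s; True marks an outlier sample."""
    mask = [False] * len(samples)
    for i in range(len(samples[0])):
        column = [sample[i] for sample in samples]
        median_vec = marginal_median(column)
        sims = [cosine_similarity(v, median_vec) for v in column]
        q1, _, q3 = statistics.quantiles(sims, n=4, method="inclusive")
        lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        for s, c in enumerate(sims):
            if not lo <= c <= hi:
                mask[s] = True   # one failing 3D vector rejects the whole sample
    return mask


# One limb vector per sample: five near-vertical vectors and one near-horizontal.
angles = [-0.10, -0.05, 0.0, 0.05, 0.10, 1.5]
samples = [[(math.sin(a), 0.0, math.cos(a))] for a in angles]
print(vector_outlier_mask(samples))
```

Comparing each vector with the marginal median through a single similarity score is what lets the one-dimensional Tukey test be reused while still treating each 3D vector as one entity.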

III-F Feature vectors

To evaluate the performance of the proposed method, we use two different sets of feature vectors: length-based feature vectors and vector-based feature vectors. The length-based feature vector consists of a set of limb lengths and distances between selected joints in the skeleton that are not directly connected. This feature vector has the same form as the feature vector described in the “Outlier removal” section. Table I lists the components of the length-based feature vector. This set includes static limb length features as well as other distance attributes that change during motion and encode information about posture. The left side of Figure 11 illustrates the length-based feature vector.

Feature               Feature
R and L shoulder      Elbow to elbow
R and L upper arm     Wrist to wrist
R and L lower arm     Hip to hip
Spine                 Knee to knee
R and L upper leg     Ankle to ankle
R and L lower leg     R shoulder to L ankle
Shoulder to shoulder  L shoulder to R ankle
TABLE I: List of features in the length-based feature vector (L refers to the left joints and R refers to the right joints)

The second set of feature vectors is vector-based: each feature is a 3-dimensional vector computed between two skeleton joints. In contrast to the distance-based features of [40] and the angle-based attributes of [34], vector-based features encode both the angle and the distance between selected joints of the skeleton. Table II lists the joints that form each of the 3-dimensional vectors in the vector-based feature vector. This feature vector has the same form as the one described in the previous section. Unlike the features in [36], which are computed with respect to a reference joint, the vectors in the vector-based feature vector are formed between different joints, mimicking the limb vectors in the skeleton model. The vector-based features are illustrated on the right side of Figure 11.

3D vector                 3D vector
Neck to R Shoulder        R Hip to R Knee
Neck to L Shoulder        L Hip to L Knee
Neck to R Hip             R Elbow to R Wrist
Neck to L Hip             L Elbow to L Wrist
R Shoulder to R Elbow     R Knee to R Ankle
L Shoulder to L Elbow     L Knee to L Ankle
TABLE II: List of three-dimensional vectors in the feature vector (L refers to the left joints and R refers to the right joints)
Fig. 11: Illustration of two types of feature vectors: length-based feature vector (left), vector-based feature vector (right). All the features are depicted in red.
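
As a rough illustration of the two feature types, the sketch below computes both from a single skeleton frame. It assumes that the joints of a frame are available as a dictionary mapping a joint name to an (x, y, z) tuple. The joint names and function names are hypothetical, and the joint pairs follow Table II.

```python
import math

# Joint pairs from Table II; the joint names themselves are hypothetical.
LIMB_PAIRS = [
    ("neck", "r_shoulder"), ("neck", "l_shoulder"),
    ("neck", "r_hip"), ("neck", "l_hip"),
    ("r_shoulder", "r_elbow"), ("l_shoulder", "l_elbow"),
    ("r_hip", "r_knee"), ("l_hip", "l_knee"),
    ("r_elbow", "r_wrist"), ("l_elbow", "l_wrist"),
    ("r_knee", "r_ankle"), ("l_knee", "l_ankle"),
]


def vector_based_features(joints, pairs=LIMB_PAIRS):
    """3D vector from the first to the second joint of every pair."""
    return [
        tuple(e - s for s, e in zip(joints[a], joints[b]))
        for a, b in pairs
    ]


def length_based_features(joints, pairs):
    """Euclidean distance between the two joints of every pair."""
    return [math.dist(joints[a], joints[b]) for a, b in pairs]
```

A length-based feature keeps only the norm of the joint-to-joint vector, whereas a vector-based feature also keeps its direction, which is why the latter carries both length and angle information.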

IV Results and discussion

IV-A TigerCub 3D Flash lidar

The TigerCub is a lightweight 3D flash lidar camera that provides real-time range and intensity data using an eye-safe Zephyr laser [24]. The performance of the camera is not affected by the lack of light at night, or by fog or dust. Like other lidar cameras, it can provide a detailed 3D mapping of the scene, in which objects that are close together can be distinguished from each other. These properties make flash lidar a suitable candidate for real-time data acquisition and autonomous operations.

TigerCub 3D Flash lidar has a focal plane of , and can stream up to 20 frames per second.

IV-B Dataset

The dataset in this work was recorded using a single TigerCub 3D Flash lidar camera, which remained in a fixed location during all the actions. There are in total 34 sequences of walking actions performed by 10 subjects. The recordings cover three main categories of walking: walking toward and away from the camera, walking along a diamond-shaped path, and walking along a diamond-shaped path while holding a yardstick in one hand. In the frames where subjects walk toward and away from the camera, the views are from the front and back of the person, plus some side views when the subjects turn around. The sequences with walking along a diamond-shaped path offer more frames with side views of the subjects. The data is captured at the rate of fps with frame resolution. The number of frames per video is different, with frames for the shortest video to frames for the video with the highest number of frames. Table III shows the number of frames per subject for each category of the walking action. Each frame has two sets of data, intensity and range, both with the same number of pixels. The intensity data is in gray-scale, and the range data gives the distance of each point in the field of view from the camera sensor.

Subject FB Walk D Walk DS Walk Total
subject 1 130 215 463 808
subject 2 248 462 451 1161
subject 3 199 398 391 988
subject 4 224 377 405 1006
subject 5 257 459 486 1202
subject 6 226 483 881 1590
subject 7 204 429 394 1027
subject 8 249 474 445 1168
subject 9 203 897 375 1475
subject 10 216 441 385 1042
TABLE III: Number of frames per type of walking action for each subject. FB Walk: front back walk, D Walk: diamond walk, DS Walk: diamond walk holding stick
Method Average Accuracy(%) Average F-score(%)
[11] 27.90 25.36
[34] 25.34 23.24
[12] 61.81 54.61
[40] 63.82 58.64
GlidarCo, LB 54.96 51.58
GlidarCo, VB 67.16 63.47
TABLE IV: Correct identification scores (average accuracy and F-score) for the proposed features and the other methods. LB stands for length-based feature vector, and VB stands for vector-based feature vector. Features are computed without joint correction.

IV-C Performance comparison

To evaluate the performance of the proposed method, we carry out a comparison with four relevant state-of-the-art gait recognition methods: the work of Preis et al. [11], Ball et al. [34], Sinha et al. [12], and Yang et al. [40]. Preis et al. use a set of static features, plus step length and speed as dynamic features. In [34], the authors use the moments of six lower body angles. Sinha et al. combine the features in [11] and [34] with their own area-based features and features based on the distance between body segments. Yang et al. utilize selected relative distances along different motion directions. The performance comparison includes the average accuracy and F-score as measures of the effectiveness of each method. In our experiments, we also consider outlier removal as an alternative approach and compare its performance with the other methods. Furthermore, to investigate the effectiveness of joint correction filtering, we compare the performance of all the methods after joint correction. We also evaluate the effect of joint correction on length-based and vector-based feature vectors. We use

of the sequences for training and the rest for testing. To ensure the generalization of the proposed method, the classifier is tested on a type of walking that it was not trained on. A support vector machine (SVM) with the radial basis function (RBF) kernel is adopted as our classifier. Our vector-based and length-based features are computed per frame, and no moments are computed over the gait cycle. Therefore, in this experiment, we do not incorporate motion dynamics in our features.
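
For concreteness, the two reported scores can be computed from true and predicted subject labels as sketched below. The sketch assumes that the per-class accuracy is the fraction of a subject's samples that are identified correctly, that the per-class F-score is the usual harmonic mean of precision and recall, and that both are averaged with equal weight per subject. These conventions are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (accuracy, f_score)} for every label in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # a sample of class t was missed
            fp[p] += 1  # class p received a wrong sample
    scores = {}
    for c in sorted(set(y_true)):
        # Assumed definition: per-class accuracy = correctly identified
        # samples of class c divided by all samples of class c.
        accuracy = tp[c] / (tp[c] + fn[c])
        f_score = 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c])
        scores[c] = (accuracy, f_score)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class accuracy and F-score."""
    n = len(scores)
    mean_accuracy = sum(a for a, _ in scores.values()) / n
    mean_f_score = sum(f for _, f in scores.values()) / n
    return mean_accuracy, mean_f_score
```

Under these definitions a subject can reach 100% accuracy while having a lower F-score, which happens when samples of other subjects are wrongly assigned to that subject.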

Method Average Accuracy(%) Average F-score(%)
[11] 41.07 38.59
[34] 28.33 26.25
[12] 80.84 78.96
[40] 75.19 70.50
Outlier removal, LB 76.60 68.89
Outlier removal, VB 80.70 75.22
GlidarCo, LB 76.37 70.19
GlidarCo, VB 84.88 78.98
TABLE V: Correct identification scores (average accuracy and F-score) for the proposed features and the other methods. Features are computed from the joint locations corrected by the proposed joint correction filtering.

Table IV shows the correct identification scores without joint correction. The identification scores are generally low when features are computed from the skeleton data without correction filtering. This indicates that the joint location coordinates are noisy, and that the resulting erroneous features undermine gait identification. Table V reports the identification scores with the proposed joint correction, together with the scores obtained when outlier removal is applied to the features. While outlier removal can improve the identification scores, it is not as effective as joint correction. This might be caused by the noisy features that still exist after outlier removal, which can be observed in the range of selected limb lengths after outlier removal in Figure 10. Furthermore, outlier removal results in elimination of more than of the data, which can be problematic when data is limited. The results in Table V demonstrate the effectiveness of joint correction, which improves the gait identification scores in all of the cases. Among the evaluated methods, the performance of [34] does not improve as much as that of the other approaches. In [34], the authors use six angles between lower body joints as features and compute three moments of each angle over every gait cycle. Figure 5 shows that the skeleton model adopted in our work lacks the foot joints that are essential to estimate two of the angles in [34]. To calculate these angles, we estimate the floor plane and use the normal vector to the plane. We speculate that the error in this estimation might also contribute to the lower performance of this method compared to the others. Furthermore, it has been reported that distance-based features might work better than angle-based features, in particular when the number of subjects is relatively low [37]. Joint angles are also sensitive to changes in walking speed [48, 49].
We also observe that both length-based and vector-based features perform better after joint correction filtering. Comparing the results in Tables IV and V, we also see that vector-based features outperform length-based features. Furthermore, although our features do not contain the dynamics of the motion, vector-based features still outperform methods that incorporate temporal information by computing moments of features over the gait cycle.

Method Average Accuracy(%) Average F-score(%)
LB (3 statistics) 70.50 66.75
LB (6 statistics) 75.22 73.22
VB (3 statistics) 76.28 74.01
VB (6 statistics) 84.65 80.38
TABLE VI: Correct identification scores (average accuracy and F-score) with statistics of features computed over the gait cycle. LB refers to length-based features, and VB refers to vector-based features. The 3 statistics case refers to computing only the mean, maximum, and standard deviation of each feature over every gait cycle. The 6 statistics case adds the median and the lower and upper quartiles to the initial 3 statistics.

IV-D Evaluating features over gait cycle

As discussed earlier, we also compute six statistics of our features over each gait cycle to incorporate the motion dynamics. Table VI presents the identification scores when the statistics of length-based and vector-based features are computed over each gait cycle. Comparing the classification scores, we observe that adding the median and the upper and lower quartiles to the mean, maximum, and standard deviation, which are the statistics commonly employed in model-based methods, improves the identification results. Comparing the results in Tables V and VI, we see that the identification accuracy using the statistics of features over each gait cycle (Table VI) is almost the same as that of the per-frame method (Table V). However, the F-score improves with the former method. The per-class accuracy and F-score for the per-frame method are summarized in Table VII, and the per-class accuracy and F-score for the gait cycle statistics are presented in Table VIII. Comparing the per-class scores of the two approaches, we also see that the minimum per-class accuracy and F-score improve from 51.79% to 62.50% and from 61.76% to 63.16%, respectively, as a result of employing gait cycle statistics. This suggests that including the motion dynamics through the feature statistics improves the performance of our model in general, and that features encoding the motion dynamics yield a more reliable model than features that include only static attributes. Last, the results in Tables V, VI, VII, and VIII suggest that as we increase the number of subjects for the identification task, the gait statistics that include static features through a dynamic criterion become superior to the per-frame case, where only static attributes are considered.
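
The six statistics can be computed per gait cycle as in the following sketch. It assumes that a gait cycle is given as a list of per-frame feature lists with scalar entries (for vector-based features, each coordinate of each 3D vector would be one entry), that the standard deviation is the population standard deviation, and that the quartiles use inclusive linear interpolation. These choices and the function names are illustrative assumptions, not details specified above.

```python
from statistics import mean, median, pstdev, quantiles


def cycle_statistics(values):
    """Six statistics of one scalar feature over one gait cycle."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "max": max(values),
        "std": pstdev(values),  # population standard deviation (assumed)
        "median": median(values),
        "lower_quartile": q1,
        "upper_quartile": q3,
    }


def cycle_feature_vector(cycle_frames):
    """Concatenate the six statistics of every feature in a gait cycle.

    cycle_frames holds one list of scalar features per frame.
    """
    out = []
    for series in zip(*cycle_frames):  # one time series per feature
        out.extend(cycle_statistics(list(series)).values())
    return out
```

Each gait cycle thus yields a single feature vector that is six times as long as the per-frame feature vector. A cycle needs at least two frames for the quartiles to be defined.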

Subject Average Accuracy(%) Average F-score(%)
subject 1 93.08 92.02
subject 2 91.54 73.46
subject 3 73.08 64.63
subject 4 83.85 61.76
subject 5 96.15 84.75
subject 6 67.69 69.02
subject 7 100 84.69
subject 8 75.77 82.95
subject 9 51.79 67.90
subject 10 81.92 88.94
TABLE VII: Correct identification scores (average accuracy and F-score) for each class of subject for the per-frame scenario of vector-based features. The minimum, and the next-to-lowest accuracy and F-score are presented in underlined type.
Subject Average Accuracy(%) Average F-score(%)
subject 1 87.50 93.33
subject 2 75 63.16
subject 3 75 75
subject 4 75 70.59
subject 5 100 88.89
subject 6 100 69.57
subject 7 87.50 82.35
subject 8 87.50 90.32
subject 9 70.83 80.95
subject 10 62.50 76.92
TABLE VIII: Correct identification scores (average accuracy and F-score) for each class of subject for the statistics of vector-based features over gait cycle. The minimum, and the next-to-lowest accuracy and F-score are presented in underlined type.

IV-E Effect of the number of training samples

In real-world scenarios, limited data is a recurring issue for the task of gait recognition. Therefore, it is essential to investigate how the designed model and the selected features perform under limited data availability. For this experiment, we examine the effect of the number of training samples on the performance of the vector-based features computed from the corrected data, both for the per-frame approach and for the statistics over gait cycle scenario.

The left panel of Figure 12 presents the identification accuracy as a function of the number of training examples, for several numbers of test feature vectors in the range. For a given number of test samples, the identification accuracy improves as we increase the number of training samples. When the number of test samples is small, accuracy increases at a higher rate as the number of training samples grows. A test set of 200 frames or more appears to be a proper choice empirically, as the accuracy trend is more stable. We also observe that the best performance is obtained with a training set of 1000 samples, irrespective of the number of test samples.

The right panel of Figure 12 illustrates the same experiment for the number of gait cycles, when the statistics of features over the gait cycle are used as the feature vectors. The number of training cycles changes over the range of , and the average classification accuracy is computed when different numbers of gait cycles are employed for testing. A comparison between the four graphs in this figure illustrates that, regardless of the number of test samples, using a training sample of at least gait cycles, we can acquire the highest classification accuracy with this feature vector. It should be noted that while for test number we can achieve a higher accuracy for training samples of size 200 and higher, this only occurs due to a limited number of test examples.

V Conclusion

In this work, we presented a model-based gait recognition method using data collected by a flash lidar camera. The dataset contains 10 subjects, walking in three different manners in different directions. The skeletons detected in the collected sequences contain a considerable number of erroneous joint location measurements. Furthermore, all or part of the skeleton joints are missing in many frames. To improve the quality of the joint localization and to enhance gait recognition accuracy, we presented GlidarCo. Unlike the common practice of removing noisy data in such a scenario, GlidarCo takes an unorthodox approach: a filtering mechanism corrects faulty skeleton joint positions to improve the quality of joint localization and gait recognition. We also proposed a new and effective set of vector-based features that encode both the length and the angle of the limbs. Through the correction mechanism and the proposed vector-based features, GlidarCo obtained higher classification scores than state-of-the-art methods. Furthermore, to incorporate motion dynamics, we integrated robust statistics that improve on the performance of features that employ only traditional feature moments over the gait cycles. Future work will focus on anomaly detection in gait studies using lidar.

Fig. 12: Average classification accuracy for different sizes of training sample sets given a number of test examples for the frame-based (left), and statistics over gait cycle-based (right).


This work is funded in part by the U.S. Army DEVCOM, C5ISR Center NVESD.


  • [1] A. K. Jain, R. Bolle, and S. Pankanti, Biometrics: personal identification in networked society.   Springer Science & Business Media, 2006, vol. 479.
  • [2] N. V. Boulgouris, D. Hatzinakos, and K. N. Plataniotis, “Gait recognition: a challenging signal processing technology for biometric identification,” IEEE signal processing magazine, vol. 22, no. 6, pp. 78–90, 2005.
  • [3] S. Del Din, A. Godfrey, and L. Rochester, “Validation of an accelerometer to quantify a comprehensive battery of gait characteristics in healthy older adults and parkinson’s disease: toward clinical and at home use,” IEEE journal of biomedical and health informatics, vol. 20, no. 3, pp. 838–847, 2016.
  • [4] J. E. Cutting and L. T. Kozlowski, “Recognizing friends by their walk: Gait perception without familiarity cues,” Bulletin of the psychonomic society, vol. 9, no. 5, pp. 353–356, 1977.
  • [5] C. P. Charalambous, “Walking patterns of normal men,” in Classic Papers in Orthopaedics.   Springer, 2014, pp. 393–395.
  • [6] J. Daugman, “How iris recognition works,” in The essential guide to image processing.   Elsevier, 2009, pp. 715–739.
  • [7] M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces,” in Computer Vision and Pattern Recognition, 1991. Proceedings CVPR’91., IEEE Computer Society Conference on.   IEEE, 1991, pp. 586–591.
  • [8] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 815–823.
  • [9] D. Maltoni, D. Maio, A. K. Jain, and S. Prabhakar, Handbook of fingerprint recognition.   Springer Science & Business Media, 2009.
  • [10] L. Lee and W. E. L. Grimson, “Gait analysis for recognition and classification,” in Automatic Face and Gesture Recognition, 2002. Proceedings. Fifth IEEE International Conference on.   IEEE, 2002, pp. 155–162.
  • [11] J. Preis, M. Kessel, M. Werner, and C. Linnhoff-Popien, “Gait recognition with kinect,” in 1st international workshop on kinect in pervasive computing.   New Castle, UK, 2012, pp. 1–4.
  • [12] A. Sinha, K. Chakravarty, and B. Bhowmick, “Person identification using skeleton information from kinect,” in Proc. Intl. Conf. on Advances in Computer-Human Interactions, 2013, pp. 101–108.
  • [13] T. Batabyal, S. T. Acton, and A. Vaccari, “Ugrad: A graph-theoretic framework for classification of activity with complementary graph boundary detection,” in Image Processing (ICIP), 2016 IEEE International Conference on.   IEEE, 2016, pp. 1339–1343.
  • [14] T. Batabyal, A. Vaccari, and S. T. Acton, “Ugrasp: A unified framework for activity recognition and person identification using graph signal processing,” in Image Processing (ICIP), 2015 IEEE International Conference on.   IEEE, 2015, pp. 3270–3274.
  • [15] R. A. Clark, K. J. Bower, B. F. Mentiplay, K. Paterson, and Y.-H. Pua, “Concurrent validity of the microsoft kinect for assessment of spatiotemporal gait variables,” Journal of biomechanics, vol. 46, no. 15, pp. 2722–2725, 2013.
  • [16] A. Kale, A. Sundaresan, A. Rajagopalan, N. P. Cuntoor, A. K. Roy-Chowdhury, V. Kruger, and R. Chellappa, “Identification of humans using gait,” IEEE Transactions on image processing, vol. 13, no. 9, pp. 1163–1173, 2004.
  • [17] C. Benedek, “3d people surveillance on range data sequences of a rotating lidar,” Pattern Recognition Letters, vol. 50, pp. 149–158, 2014.
  • [18] H. Fujiyoshi, A. J. Lipton, and T. Kanade, “Real-time human motion analysis by image skeletonization,” IEICE TRANSACTIONS on Information and Systems, vol. 87, no. 1, pp. 113–120, 2004.
  • [19] A. F. Bobick and A. Y. Johnson, “Gait recognition using static, activity-specific parameters,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1.   IEEE, 2001, pp. I–I.
  • [20] P. Fankhauser, M. Bloesch, D. Rodriguez, R. Kaestner, M. Hutter, and R. Y. Siegwart, “Kinect v2 for mobile robot navigation: Evaluation and modeling,” in 2015 International Conference on Advanced Robotics (ICAR).   IEEE, 2015, pp. 388–394.
  • [21] S. Zennaro, “Evaluation of microsoft kinect 360 and microsoft kinect one for robotics and computer vision applications,” 2014.
  • [22] T. Krzeszowski, A. Switonski, B. Kwolek, H. Josinski, and K. Wojciechowski, “Dtw-based gait recognition from recovered 3-d joint angles and inter-ankle distance,” in International Conference on Computer Vision and Graphics.   Springer, 2014, pp. 356–363.
  • [23] M. Balazia and K. N. Plataniotis, “Human gait recognition from motion capture data in signature poses,” IET Biometrics, vol. 6, no. 2, pp. 129–137, 2017.
  • [24] R. Horaud, M. Hansard, G. Evangelidis, and C. Ménier, “An overview of depth cameras and range scanners based on time-of-flight technologies,” Machine vision and applications, vol. 27, no. 7, pp. 1005–1020, 2016.
  • [25] C. Benedek, B. Gálai, B. Nagy, and Z. Jankó, “Lidar-based gait analysis and activity recognition in a 4d surveillance system,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 101–113, 2018.
  • [26] B. Gálai and C. Benedek, “Feature selection for lidar-based gait recognition,” in Computational Intelligence for Multimedia Understanding (IWCIM), 2015 International Workshop on.   IEEE, 2015, pp. 1–5.
  • [27] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2d pose estimation using part affinity fields,” arXiv preprint arXiv:1611.08050, 2016.
  • [28] N. Sadeghzadehyazdi, T. Batabyal, L. E. Barnes, and S. T. Acton, “Graph-based classification of healthcare provider activity,” in 2016 50th Asilomar Conference on Signals, Systems and Computers.   IEEE, 2016, pp. 1268–1272.
  • [29] A. Kovashka and K. Grauman, “Learning a hierarchy of discriminative space-time neighborhood features for human action recognition,” in 2010 IEEE computer society conference on computer vision and pattern recognition.   IEEE, 2010, pp. 2046–2053.
  • [30] F. Ofli, R. Chaudhry, G. Kurillo, R. Vidal, and R. Bajcsy, “Sequence of the most informative joints (smij): A new representation for human skeletal action recognition,” Journal of Visual Communication and Image Representation, vol. 25, no. 1, pp. 24–38, 2014.
  • [31] R. Vemulapalli, F. Arrate, and R. Chellappa, “Human action recognition by representing 3d skeletons as points in a lie group,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 588–595.
  • [32] T. Batabyal, T. Chattopadhyay, and D. P. Mukherjee, “Action recognition using joint coordinates of 3d skeleton data,” in 2015 IEEE International Conference on Image Processing (ICIP).   IEEE, 2015, pp. 4107–4111.
  • [33] G. Evangelidis, G. Singh, and R. Horaud, “Skeletal quads: Human action recognition using joint quadruples,” in 2014 22nd International Conference on Pattern Recognition.   IEEE, 2014, pp. 4513–4518.
  • [34] A. Ball, D. Rye, F. Ramos, and M. Velonaki, “Unsupervised clustering of people from ‘skeleton’data,” in 2012 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI).   IEEE, 2012, pp. 225–226.
  • [35] R. M. Araujo, G. Graña, and V. Andersson, “Towards skeleton biometric identification using the microsoft kinect sensor,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing.   ACM, 2013, pp. 21–26.
  • [36] M. Kumar and R. V. Babu, “Human gait recognition using depth camera: a covariance based approach,” in Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing.   ACM, 2012, p. 20.
  • [37] B. Dikovski, G. Madjarov, and D. Gjorgjevikj, “Evaluation of different feature sets for gait recognition using skeletal data from kinect,” in 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO).   IEEE, 2014, pp. 1304–1308.
  • [38] F. Ahmed, P. P. Paul, and M. L. Gavrilova, “Dtw-based kernel and rank-level fusion for 3d gait recognition using kinect,” The visual computer, vol. 31, no. 6-8, pp. 915–924, 2015.
  • [39] S. Ali, Z. Wu, X. Li, N. Saeed, D. Wang, and M. Zhou, “Applying geometric function on sensors 3d gait data for human identification,” in Transactions on Computational Science XXVI.   Springer, 2016, pp. 125–141.
  • [40] K. Yang, Y. Dou, S. Lv, F. Zhang, and Q. Lv, “Relative distance features for gait recognition with kinect,” Journal of Visual Communication and Image Representation, vol. 39, pp. 209–217, 2016.
  • [41] N. Sadeghzadehyazdi, T. Batabyal, A. Glandon, N. K. Dhar, B. Familoni, K. Iftekharuddin, and S. T. Acton, “Glidar3dj: A view-invariant gait identification via flash lidar data correction,” arXiv preprint arXiv:1905.00943, 2019.
  • [42] F. N. Fritsch and R. E. Carlson, “Monotone piecewise cubic interpolation,” SIAM Journal on Numerical Analysis, vol. 17, no. 2, pp. 238–246, 1980.
  • [43] W. S. Cleveland, “Robust locally weighted regression and smoothing scatterplots,” Journal of the American statistical association, vol. 74, no. 368, pp. 829–836, 1979.
  • [44] K. Koide and J. Miura, “Identification of a specific person using color, height, and gait features for a person following robot,” Robotics and Autonomous Systems, vol. 84, pp. 76–87, 2016.
  • [45] W. Chi, J. Wang, and M. Q.-H. Meng, “A gait recognition method for human following in service robots,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 48, no. 9, pp. 1429–1440, 2018.
  • [46] J. Liu, A. Shahroudy, D. Xu, and G. Wang, “Spatio-temporal lstm with trust gates for 3d human action recognition,” in European Conference on Computer Vision.   Springer, 2016, pp. 816–833.
  • [47] V. B. Semwal, J. Singha, P. K. Sharma, A. Chauhan, and B. Behera, “An optimized feature selection technique based on incremental feature analysis for bio-metric gait data classification,” Multimedia tools and applications, vol. 76, no. 22, pp. 24 457–24 475, 2017.
  • [48] S. Han, “The influence of walking speed on gait patterns during upslope walking,” Journal of Medical Imaging and Health Informatics, vol. 5, no. 1, pp. 89–92, 2015.
  • [49] J. Kovač and P. Peer, “Human skeleton model based dynamic features for walking speed invariant gait recognition,” Mathematical Problems in Engineering, vol. 2014, 2014.