DeepWalking: Enabling Smartphone-based Walking Speed Estimation Using Deep Learning

05/09/2018 ∙ by Aawesh Shrestha, et al. ∙ South Dakota State University

Walking speed estimation is an essential component of mobile apps in various fields such as fitness, transportation, navigation, and health care. Most existing solutions focus on specialized medical applications that utilize body-worn motion sensors. These approaches do not effectively serve the general use case of numerous apps where a user holding a smartphone tries to find his or her walking speed based solely on smartphone sensors. Existing smartphone-based approaches, however, fail to provide acceptable precision for walking speed estimation. This leads to a question: is it possible to achieve, using only a smartphone, speed estimation accuracy comparable to that of obtrusive solutions based on wearable sensors? We find the answer in advanced neural networks. In this paper, we present DeepWalking, the first deep learning-based walking speed estimation scheme for smartphones. A deep convolutional neural network (DCNN) is applied to automatically identify and extract the most effective features from the accelerometer and gyroscope data of a smartphone and to train the network model for accurate speed estimation. Experiments are performed with 10 participants using a treadmill. The average root-mean-squared-error (RMSE) of estimated walking speed is 0.16 m/s, which is comparable to the results obtained by state-of-the-art approaches based on a number of body-worn sensors (i.e., RMSE of 0.11 m/s). The results indicate that a smartphone can be a strong tool for walking speed estimation if the sensor data are effectively calibrated and supported by advanced deep learning techniques.




I Introduction

Walking speed estimation is an essential component of numerous smartphone apps in various domains. Accurate walking speed estimation is increasingly important for many health and fitness apps, as usage of these apps has grown by over 300% in just three years [1]. Research shows that walking speed information offers more than just the speed. In fact, walking speed can be used as a “human odometer” that allows us to derive valuable knowledge about a person's health status [2]. For example, changes in walking speed (monitored over a long-term period) can be used as a vital clue to detect knee disabilities [3]. Recent studies suggest that walking speed is a useful indicator of a person's current lifestyle and a predictor of future health [4]. Other kinds of apps may also benefit significantly from precise walking speed estimation. For instance, accurate walking speed estimation is important for navigation apps such as Google Maps to provide reliable navigation results; transportation apps such as the ones designed for pedestrian safety [5][6][7] rely on walking speed estimation to reduce the number of pedestrian-involved accidents.

Recent advances in mobile computing technologies [8] have enabled numerous solutions that aim to predict walking speed effectively. A majority of these solutions are based on specialized body-mounted motion sensors [9][10][11]. However, these systems are not adequate to support the general use cases of many mobile apps. There are smartphone-based solutions that utilize the embedded sensors of a smartphone such as the accelerometer [12] and GPS [13]. However, the imprecision of these embedded sensors results in unreliable walking speed estimation. Machine learning techniques have been applied to enhance the speed estimation accuracy [14]. However, a simple regression model and out-of-date machine learning techniques do not achieve walking speed estimation precision comparable to that of wearable-sensor-based approaches, especially because it is challenging to identify and extract effective features.

With recent advances in deep learning techniques [15], the parametrization of input data is no longer needed, allowing us to perform highly accurate prediction at much faster speed with fully automated feature identification and extraction. In this paper, we present DeepWalking, a deep learning based walking speed estimation framework for smartphones that can be easily integrated with various kinds of mobile apps. While deep learning has been used mostly for gait pattern recognition [16][17][18], to our knowledge, DeepWalking is the first smartphone-based mobile system that maximizes the precision of walking speed estimation using deep learning.

In particular, the deep convolutional neural network (DCNN) is known as a powerful deep learning technique that has seen successful applications in diverse areas such as human activity detection [19][20] and image recognition [21]. We employ the DCNN to train the network model for walking speed estimation. Accelerometer and gyroscope sensor data collected with a smartphone are fed into the DCNN, and appropriate features are automatically identified and extracted to build the network model. More precisely, noise in the sensor data is removed using a low-pass filter designed based on a power spectrum analysis of the sensor data. Smartphone-orientation-independent vertical and horizontal components are then extracted from the noise-filtered accelerometer and gyroscope data. These vertical and horizontal components are used as primary data sources to construct the images that are provided as input to the DCNN for training the network model. The architecture of the DCNN is designed such that it minimizes the root mean squared error (RMSE) of walking speed estimation.

A proof-of-concept is implemented using an off-the-shelf smartphone. It was used to collect sensor data, build the DCNN model, and predict walking speed. Experiments were conducted with 10 participants using a treadmill to obtain the smartphone sensor data as well as the ground truth information. The results show that the RMSE of walking speed estimation is 0.16 m/s. It is interesting to find that the speed estimation accuracy is quite comparable with the best result obtained with several skin-mounted accelerometer sensors. These findings indicate that a smartphone itself can be a powerful tool for walking speed estimation if sensor data are appropriately calibrated and supported by advanced machine learning techniques like the DCNN.

The contributions of this paper are summarized as follows.

  • To our knowledge, DeepWalking is the first deep learning based walking speed estimation framework specifically designed for smartphone apps.

  • DeepWalking is designed to serve general use cases and thus can be integrated with any app that requires accurate walking speed estimation.

  • A prototype of DeepWalking is implemented. Experiments were conducted with 10 participants to validate the effectiveness of DeepWalking.

This paper is organized as follows. In Section II, we review the literature on walking speed estimation. We then present an overview of the proposed walking speed estimation framework followed by the details of each system component in Section III. Experimental results are discussed in Section V after describing the experimental set up in Section IV. We then conclude in Section VI.

II Related Work

The majority of existing approaches for walking speed estimation are based on body-worn motion sensors (e.g., mostly accelerometer and gyroscope sensors secured to body) [22][23][10][24][2][25]. Mannini and Sabatini used foot mounted sensors [9]; McGinnis et al. utilized wearable accelerometer arrays mounted on several parts of body like shank, sacrum, and thigh [10]; and Zihajehzadeh and Park exploited wrist-worn sensors for walking speed estimation [11]. These approaches, however, do not serve effectively the general use case of numerous apps where the user holding a smartphone tries to find his or her walking speed solely based on smartphone sensors. Thus, a research question that we ask here is if it would be possible to achieve comparable precision of walking speed estimation using only the inertial sensors of smartphone, and what would be the methodologies to accomplish this goal.

A number of smartphone-based systems have been developed [12][14][13]. Cox et al. proposed a simple solution that estimates the walking speed based on the integration of acceleration [12]. Cho et al. proposed to calibrate opportunistically the inertial sensor-based speed estimation using GPS of smartphone when the user is walking outdoors [13]. Park et al. applied the regularized kernel methods on the collected accelerometer data to achieve higher accuracy of walking speed estimation [14]. Although there have been efforts to utilize machine learning techniques to improve the walking speed estimation accuracy, identification of effective features and manual extraction of those features are very challenging. In contrast to other smartphone based approaches, we leverage automated extraction of the most effective features using the DCNN to maximize the walking speed estimation accuracy.

Deep learning has been increasingly used in many recent works [16][17][18]. However, these deep learning based systems are focused on recognition of gait patterns rather than estimation of walking speed. Gong et al. proposed a DCNN to perform gait assessment for multiple sclerosis patients based on the spectral and temporal associations among sensor data collected with a number of inertial body sensors [16]. Gadaleta and Rossi adopted the DCNN to recognize a target user based on his or her way of walking, utilizing the accelerometer and gyroscope data of a smartphone [17]. Hannink et al. used the DCNN to estimate the stride length [18]. In line with this research direction built upon deep learning technology, we propose the first general-purpose deep learning based walking speed estimation framework for smartphones.

III System Design

In this section, an overview of DeepWalking is presented, followed by the details of each system component.

III-A System Overview

Fig. 1: System architecture of DeepWalking consisting of 5 key system components. The network model is trained in the offline mode, while prediction of walking speed is performed in the online mode.

Figure 1 displays the system architecture of DeepWalking. DeepWalking consists of five key components: Data Collection, Filtering, Coordinate System Alignment, DCNN Training, and DCNN Prediction. The data collection module simply collects data from the accelerometer and gyroscope sensors of the smartphone. The filtering module is designed to remove the noise of the collected sensor data. The coordinate system alignment module then extracts phone-orientation-independent data from the filtered sensor data. As shown in Figure 1, these three system components are executed in both the online and offline modes. The network model for speed estimation is constructed in the offline mode by the DCNN Training module. This network model is used in the online mode by the DCNN Prediction module to perform walking speed estimation. More details on each system module are described in the following sections.

III-B Filtering

Fig. 2: Power spectral density of accelerometer and gyroscope data. Most of the power of accelerometer and gyroscope signals is concentrated at low frequencies.

To design a filter that removes noise from the sensor data, we first investigate the power spectral density of the full accelerometer and gyroscope data that are used to train the DCNN network model. Figure 2 plots the power spectral density at different frequencies using Welch's method with a Hanning window of 1 second and half-window overlap [26]. The graph shows that most of the power of the accelerometer and gyroscope data is concentrated at low frequencies (i.e., between 0 and 15 Hz). Accordingly, we designed a simple low-pass Finite Impulse Response (FIR) filter with a cutoff frequency of 15 Hz. This filter is applied to the sensor data to remove the noise.
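As an illustration of the filtering step, the following is a minimal sketch of a 15 Hz low-pass FIR filter for data sampled at 100 Hz. The text does not state how the filter was designed, so the windowed-sinc method, the Hamming window, the 51-tap length, and all function names here are assumptions chosen for illustration rather than the authors' implementation.

```python
import math

def design_lowpass_fir(cutoff_hz, fs_hz, num_taps=51):
    """Windowed-sinc low-pass FIR design (Hamming window; an assumed method)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric filter")
    fc = cutoff_hz / fs_hz  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response (sinc), handled separately at k = 0.
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at 0 Hz

def apply_fir(signal, taps):
    """Causal FIR filtering; samples before the start of the signal count as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out

def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)

FS_HZ = 100.0     # sampling frequency reported in the experimental setup
CUTOFF_HZ = 15.0  # cutoff frequency reported in the text
taps = design_lowpass_fir(CUTOFF_HZ, FS_HZ)
```

Calling `apply_fir(samples, taps)` on one sensor axis returns the filtered sequence. A longer filter would give a sharper transition around 15 Hz at the cost of more delay, which is one reason the tap count is left as a parameter.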

Fig. 3: Results of filtering for accelerometer data.
Fig. 4: Results of filtering for gyroscope data.

Figures 3 and 4 display the results of removing noise from accelerometer and gyroscope data, respectively, using the low-pass filter. As observed in these two figures, noise has been effectively reduced for both accelerometer and gyroscope data.

III-C Coordinate System Alignment

Fig. 5: The coordinate systems of smartphone and walking human. The varying orientation of the smartphone is transformed into the fixed coordinate system of the walking user.

With the unknown orientation of the smartphone, it is impossible to make reliable inference of walking speed given only accelerometer and gyroscope data. We therefore adopt an algorithm that transforms accelerometer and gyroscope data into an orientation-independent counterpart [27]. Basically, the algorithm expresses accelerometer and gyroscope data in a fixed coordinate system. Specifically, Figure 5 illustrates the three accelerometer/gyroscope axes denoted by $x$, $y$, and $z$. It also displays the fixed coordinate system defined for the user, which is expressed with three axes: one along the direction of walking motion, one vertical to the ground, and one orthogonal to the other two axes.

The coordinate system alignment module starts by deriving the gravity component on each axis (i.e., $x$, $y$, and $z$) of the accelerometer. With the estimated gravity component, the magnitudes of the vertical and horizontal components of accelerometer and gyroscope data are calculated. More specifically, let a vector $\mathbf{a} = (a_x, a_y, a_z)$ be a one-point acceleration measurement, where $a_x$, $a_y$, and $a_z$ represent acceleration measurements on the respective axes. Assume that there are $n$ such acceleration vectors $\mathbf{a}^{(1)}, \dots, \mathbf{a}^{(n)}$ collected during a sampling interval. The gravity component, denoted by a vector $\mathbf{v} = (v_x, v_y, v_z)$, is estimated by taking averages of all the measurements on each axis collected during the sampling interval, i.e., $v_x = \frac{1}{n}\sum_{i=1}^{n} a_x^{(i)}$, $v_y = \frac{1}{n}\sum_{i=1}^{n} a_y^{(i)}$, and $v_z = \frac{1}{n}\sum_{i=1}^{n} a_z^{(i)}$. Note that we used 2 seconds as the sampling interval.

Fig. 6: The DCNN architecture. The architecture consists of four cascaded groups of layers (convolutional, batch normalization, ReLU, maxpooling) followed by the dropout layer, fully connected layer, and regression layer.

The dynamic component of $\mathbf{a}$, denoted by $\mathbf{d}$, which represents walking motion excluding the gravity component, is $\mathbf{d} = (a_x - v_x, a_y - v_y, a_z - v_z)$. The vertical component $\mathbf{p}$ is calculated by projecting $\mathbf{d}$ onto the vertical axis $\mathbf{v}$ as follows.

$$\mathbf{p} = \frac{\mathbf{d} \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}} \, \mathbf{v}$$

The horizontal component $\mathbf{h}$ can then be computed as $\mathbf{h} = \mathbf{d} - \mathbf{p}$. The vertical and horizontal components of a gyroscope vector $\mathbf{g}$ are calculated similarly, i.e., the vertical component $\mathbf{p}_g$ is calculated based on projection of $\mathbf{g}$ onto $\mathbf{v}$, and the horizontal component is computed by subtracting the vertical component from the gyroscope vector, i.e., $\mathbf{h}_g = \mathbf{g} - \mathbf{p}_g$. These vertical and horizontal components of accelerometer and gyroscope data are used as the main sources to construct the images that are provided as input to the DCNN.
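The alignment step described above can be sketched in a few lines. The example below estimates gravity as the per-axis mean of one sampling interval, removes it from each acceleration sample, and splits both the acceleration and gyroscope vectors into vertical and horizontal parts by projection onto the gravity estimate. The function names and the output row layout are hypothetical, and the sketch assumes the gravity estimate is non-zero.

```python
import math

def mean_vector(vectors):
    """Per-axis average of a list of 3-D samples (the gravity estimate)."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(3))

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

def norm(u):
    return math.sqrt(dot(u, u))

def split_vertical_horizontal(vec, gravity):
    """Project vec onto the gravity direction; the remainder is horizontal."""
    scale = dot(vec, gravity) / dot(gravity, gravity)  # assumes gravity != 0
    vertical = tuple(scale * g for g in gravity)
    horizontal = tuple(x - p for x, p in zip(vec, vertical))
    return vertical, horizontal

def align_window(acc_samples, gyro_samples):
    """Return one row per sample with four magnitudes:
    [acc vertical, acc horizontal, gyro vertical, gyro horizontal]."""
    gravity = mean_vector(acc_samples)
    rows = []
    for a, g in zip(acc_samples, gyro_samples):
        # Dynamic component: acceleration with the gravity estimate removed.
        d = tuple(ai - vi for ai, vi in zip(a, gravity))
        p_acc, h_acc = split_vertical_horizontal(d, gravity)
        p_gyr, h_gyr = split_vertical_horizontal(g, gravity)
        rows.append([norm(p_acc), norm(h_acc), norm(p_gyr), norm(h_gyr)])
    return rows
```

Because only magnitudes along and across the gravity direction are kept, the resulting rows do not depend on how the phone happens to be oriented in the pocket, which is the property the section relies on.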

III-D Training and Prediction

Figure 6 illustrates the architecture of the DCNN. The input to the DCNN is images with the size of . Each row of the image consists of 4 values that represent the magnitude of vertical and horizontal components of both accelerometer and gyroscope. We employ (16, 32, 48, 64) convolutional filters with the size of and a stride of 1. This convolutional layer is a core building block of DCNN. It is applied to each input image by sliding the filters across the input vertically and horizontally and calculating the dot product of the weights of the filter and the input to optimize the weights.
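To make the input format concrete, the sketch below cuts a stream of 4-value rows into non-overlapping images, each paired with a ground-truth speed. The image height is not recoverable from the text here, so `rows_per_image=200` is purely an assumption (100 Hz sampling times a 2-second interval), and the function name and the (image, label) pairing are hypothetical as well.

```python
def build_images(rows, speed_mps, rows_per_image=200):
    """Cut a sequence of 4-value rows into non-overlapping images.

    rows:           list of [acc_vertical, acc_horizontal, gyro_vertical, gyro_horizontal]
    speed_mps:      ground-truth walking speed for this recording (m/s)
    rows_per_image: ASSUMED image height; not stated in the text at hand
    """
    if any(len(r) != 4 for r in rows):
        raise ValueError("each row must hold exactly 4 values")
    images = []
    # Trailing rows that do not fill a whole image are dropped.
    for start in range(0, len(rows) - rows_per_image + 1, rows_per_image):
        image = rows[start:start + rows_per_image]
        images.append((image, speed_mps))
    return images
```

Under these assumptions, a 5-minute recording at one treadmill speed would yield 150 labeled images per participant.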

We employ a batch normalization layer between the convolutional layer and the ReLU layer to lower the sensitivity to network initialization. The ReLU layer applies the non-saturating activation function $f(x) = \max(0, x)$ to increase the nonlinear properties of the network. Next, a maxpooling layer with a stride of 1 is inserted after the ReLU layer to reduce computation in the network and control overfitting. Four such groups of layers (i.e., convolutional layer, batch normalization layer, ReLU layer, and maxpooling layer) are cascaded in a row; note that the maxpooling layer is excluded from the 4th group. We then add a dropout layer to alleviate the overfitting problem by randomly setting input elements to zero with a given probability; we found that a dropout rate of 0.2 gave good results. Finally, the fully connected layer is used to compute the class scores, which are then fed to the regression layer. More specifically, since our objective is to predict continuous data (i.e., walking speed), we add the regression layer at the end of the DCNN. In this process, the root-mean-squared-error (RMSE) is used as the loss function.
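The layer ordering and the loss described above can be summarized with a small framework-free sketch. It only enumerates the layer sequence and computes the RMSE; it is not a trainable network. The single-output fully connected layer and all identifiers are assumptions made for illustration, while the filter counts (16, 32, 48, 64) and the dropout rate of 0.2 come from the text.

```python
import math

FILTERS_PER_GROUP = (16, 32, 48, 64)  # filters in the four convolutional layers
DROPOUT_RATE = 0.2                    # dropout rate reported in the text

def describe_architecture():
    """List the layers in order: four conv groups, then dropout, FC, regression."""
    layers = ["input"]
    last_group = len(FILTERS_PER_GROUP) - 1
    for index, num_filters in enumerate(FILTERS_PER_GROUP):
        layers.append(f"conv({num_filters} filters, stride 1)")
        layers.append("batch_norm")
        layers.append("relu")
        if index != last_group:  # the 4th group has no maxpooling layer
            layers.append("maxpool(stride 1)")
    layers.append(f"dropout({DROPOUT_RATE})")
    layers.append("fully_connected(1 output)")  # assumed: one value, the speed
    layers.append("regression(rmse loss)")
    return layers

def rmse(predicted, actual):
    """Root-mean-squared error between predicted and ground-truth speeds."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))

for layer in describe_architecture():
    print(layer)
```

The same `rmse` function matches the evaluation metric used in the experimental results, so one definition can serve both as the training loss and as the reported error.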

IV Experimental Setup

We used a Samsung Galaxy S6 to collect accelerometer and gyroscope data for both training the network model and prediction of walking speed. Ten volunteers participated in this data collection process. Each participant was asked to walk on a treadmill for 25 minutes, i.e., 5 minutes for each walking speed of 1 mph, 1.5 mph, 2 mph, 2.5 mph, and 3 mph. Consequently, we obtained a total of 250 minutes of accelerometer and gyroscope data from 10 participants. The sampling frequency for both accelerometer and gyroscope sensors was set to 100 Hz. During the process of data collection, the smartphone was placed in the participant’s pocket.
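Since the treadmill speeds are given in mph while the RMSE is reported in m/s, a short conversion helps put the error in context. The snippet below is a small worked calculation using the exact definition 1 mph = 0.44704 m/s; the variable names are illustrative only.

```python
MPS_PER_MPH = 0.44704  # exact: 1609.344 m per mile / 3600 s per hour

TREADMILL_SPEEDS_MPH = (1.0, 1.5, 2.0, 2.5, 3.0)  # speeds used in the experiments
PARTICIPANTS = 10
MINUTES_PER_SPEED = 5

def mph_to_mps(mph):
    """Convert miles per hour to meters per second."""
    return mph * MPS_PER_MPH

speeds_mps = [mph_to_mps(s) for s in TREADMILL_SPEEDS_MPH]
total_minutes = PARTICIPANTS * len(TREADMILL_SPEEDS_MPH) * MINUTES_PER_SPEED

for mph, mps in zip(TREADMILL_SPEEDS_MPH, speeds_mps):
    print(f"{mph} mph = {mps:.3f} m/s")
print(f"total recording time: {total_minutes} minutes")
```

The tested speeds therefore span roughly 0.45 m/s to 1.34 m/s, which is the range against which an RMSE of 0.16 m/s should be read.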

To build the DCNN network model, 70% of collected data (i.e., 17 minutes of sensor data for each participant) were used to create the images that were fed into the DCNN. The remaining 30% of data (i.e., 8 minutes of sensor data for each participant) were used for prediction of walking speed. The prediction interval was 2 seconds, meaning that walking speed was estimated every 2 seconds.

The DCNN network model was constructed offline using a PC equipped with a 6th-generation Intel Core i5 CPU and 8 GB of RAM, running Ubuntu 16.04 LTS. We used the neural network toolbox of MATLAB to create the network model. In addition, for proof-of-concept validation and more effective data analysis, prediction of walking speed was also performed under this offline configuration. However, it should be noted that the implemented system can be easily ported to any mobile platform such as Android and iOS.

V Experimental Results

In this section, we evaluate the performance of DeepWalking, focusing on the RMSE of walking speed estimation. The measured RMSE is compared with that obtained by state-of-the-art approaches based on body-worn motion sensors.

V-A Speed Estimation Accuracy

Fig. 7: RMSE of estimated walking speed for different participants. The RMSEs are comparable to state-of-the-art approaches based on wearable motion sensors.

We measured RMSE for 10 participants. Each participant was asked to walk on a treadmill for 5 minutes at each walking speed. The walking speed was estimated every 2 seconds (i.e., a prediction interval of 2 seconds). The results are displayed in Figure 7 as a box plot (the central mark: the median; the bottom and top edges of the box: the 25th and 75th percentiles, respectively; ‘+’ symbol: outliers). As shown, the RMSE of all participants was comparable to the results of state-of-the-art work [10]. The average RMSE of all participants was 0.16 m/s, and Table I shows the average RMSE for McGinnis et al. [10] with varying sensor locations and numbers. The results indicate that a smartphone-based solution can be quite competitive with approaches based on several skin-mounted sensors. More specifically, McGinnis et al. achieved the best RMSE of 0.11 m/s when the BioStampRC (a skin-mounted accelerometer sensor) was secured to the user’s thigh and shank [10], while the RMSE was higher (0.13 to 0.15 m/s) when a single accelerometer was used.

Device Locations         McGinnis et al. [10] RMSE (m/s)    DeepWalking RMSE (m/s)
Sacrum                   0.15                               0.16 (Trouser Pocket)
Thigh                    0.15
Shank                    0.13
Sacrum, Thigh            0.16
Sacrum, Shank            0.13
Thigh, Shank             0.11
Sacrum, Thigh, Shank     0.12

TABLE I: Comparison of RMSE of estimated walking speed under a treadmill setting (for healthy subjects).

It is also observed that the largest RMSE gap between any two participants was only 0.04 m/s. We interpret this result as evidence that DeepWalking predicts walking speed reliably regardless of the varying walking styles of different participants.

V-B Effect of Sensor Types

Accelerometer and gyroscope are the most widely used motion sensors for walking speed estimation in the literature. In this section, we aim to understand the effect of varying combinations of these motion sensors on the walking speed estimation accuracy. More precisely, we generated different sets of images with varying types of sensors, i.e., ACC only, Gyro only, and ACC+Gyro. These images were individually fed into the DCNN to construct the network model and to predict walking speed.

Fig. 8: Effect of sensor types on RMSE of estimated walking speed. When accelerometer and gyroscope sensor data are fused, RMSE is significantly reduced.

The results are shown in Figure 8. As expected, we obtained the best accuracy when both accelerometer and gyroscope sensors were used together. Specifically, the RMSE of ACC+Gyro was 40.1% and 37.5% smaller than that of ACC-only and Gyro-only, respectively. An interesting observation was that the RMSE of ACC-only and Gyro-only was quite similar. This result indicates that both sensors are competitive in estimating walking speed, and that when the two sensors are fused, the walking speed estimation accuracy can be improved significantly. Analyzing the effect of other embedded smartphone sensors such as the barometer and magnetometer is left as future work.

V-C Effect of Number of Images

Recall that images are fed into the DCNN in order to construct the network model. In this section, we analyze the impact of the number of images on the walking speed estimation accuracy. For this experiment, we measured RMSE while varying the number of images from 1,000 to 12,500.

Fig. 9: Effect of number of images on RMSE of estimated walking speed. The larger the number of images, the smaller the RMSE becomes. Yet, the performance gain also decreases as the number of images gets larger.

The results are depicted in Figure 9. It was observed that as the number of images increased, the RMSE of estimated walking speed gradually decreased. More specifically, when we increased the number of images from 1,000 to 12,500, the RMSE decreased by 29.4%. The reason is that a larger number of images allows the DCNN to construct a network model that more effectively reflects the correlation between walking motion and walking speed.

Another interesting observation was that the performance gain (i.e., how much the RMSE decreased) diminished as the number of images increased. These results indicate that while providing more input images to the DCNN is certainly beneficial, it does not necessarily lead to substantially improved performance once the number of images is sufficiently large. Considering the overhead of training the network model with a large number of images, it is important to choose a number of images that balances the tradeoff between the delay/overhead of training the network model and the expected performance gain. We leave the task of tuning this system parameter as future work.

VI Conclusion

We have presented DeepWalking, a deep learning based walking speed estimation framework for smartphones. To our knowledge, DeepWalking is the first smartphone-based walking speed estimation system built on deep learning. It is expected to benefit diverse smartphone apps that rely on walking speed estimation as a system component.

The development of DeepWalking opens a number of interesting future research directions. First, the prototype of DeepWalking will be fully implemented and tested with a more diverse range of participants, including different genders, ages, and physical conditions, and with different locations of the phone (e.g., in hand, on arm, etc.). Second, case studies with fitness, transportation, and navigation apps will be performed to analyze the effect of the accurate walking speed estimation that DeepWalking offers. Third, several system parameters (e.g., the number of images, different sensor types, hyperparameters for deep learning) will be tuned for better performance.

VII Acknowledgement

This research was supported in part by the Competitive Research Grant Program (CRGP) of South Dakota Board of Regents (SDBoR).


  • [1] “Health and fitness app usage,”
    health-fitness-app-usage-grew-330-just-3-years/, accessed: 2018-03-30.
  • [2] J.-S. Hu, K.-C. Sun, and C.-Y. Cheng, “A model-based human walking speed estimation using body acceleration data,” in Proc. of ROBIO, 2012.
  • [3] T. Andriacchi, J. Ogle, and J. Galante, “Walking speed as a basis for normal and abnormal gait measurements,” Journal of biomechanics, vol. 10, no. 4, pp. 261–268, 1977.
  • [4] S. Fritz and M. Lusardi, “White paper:“walking speed: the sixth vital sign”,” Journal of geriatric physical therapy, vol. 32, no. 2, pp. 2–5, 2009.
  • [5] T. Wang, G. Cardone, A. Corradi, L. Torresani, and A. T. Campbell, “Walksafe: a pedestrian safety app for mobile phone users who walk and talk while crossing roads,” in Proc. of HotMobile, 2012.
  • [6] S. Jain, C. Borgiattino, Y. Ren, M. Gruteser, Y. Chen, and C. F. Chiasserini, “Lookup: Enabling pedestrian safety services via shoe sensing,” in Proc. of MobiSys, 2015.
  • [7] M. Won, A. Shrestha, and Y. Eun, “Enabling wifi p2p-based pedestrian safety app,” arXiv preprint arXiv:1805.00442, 2018.
  • [8] M. Won, A. Mishra, and S. H. Son, “Hybridbaro: Mining driving routes using barometer sensor of smartphone,” IEEE Sensors Journal, vol. 17, no. 19, pp. 6397–6408, 2017.
  • [9] A. Mannini and A. M. Sabatini, “Walking speed estimation using foot-mounted inertial sensors: Comparing machine learning and strap-down integration methods,” Medical engineering and physics, vol. 36, no. 10, pp. 1312–1321, 2014.
  • [10] R. S. McGinnis, N. Mahadevan, Y. Moon, K. Seagers, N. Sheth, J. A. Wright Jr, S. DiCristofaro, I. Silva, E. Jortberg, M. Ceruolo et al., “A machine learning approach for gait speed estimation using skin-mounted wearable sensors: From healthy controls to individuals with multiple sclerosis,” PloS one, vol. 12, no. 6, p. e0178366, 2017.
  • [11] S. Zihajehzadeh and E. J. Park, “Regression model-based walking speed estimation using wrist-worn inertial sensor,” PloS one, vol. 11, no. 10, p. e0165211, 2016.
  • [12] J. Cox, Y. Cao, G. Chen, J. He, and D. Xiao, “Smartphone-based walking speed estimation for stroke mitigation,” in Proc. of ISM, 2014.
  • [13] D.-K. Cho, M. Mun, U. Lee, W. J. Kaiser, and M. Gerla, “Autogait: A mobile platform that accurately estimates the distance walked,” in Proc. of PerCom, 2010.
  • [14] J.-g. Park, A. Patel, D. Curtis, S. Teller, and J. Ledlie, “Online pose classification and walking speed estimation using handheld devices,” in Proc. of UbiComp, 2012.
  • [15] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, p. 436, 2015.
  • [16] J. Gong, M. D. Goldman, and J. Lach, “Deepmotion: a deep convolutional neural network on inertial body sensors for gait assessment in multiple sclerosis.” in Proc of Wireless Health, 2016.
  • [17] M. Gadaleta and M. Rossi, “Idnet: Smartphone-based gait recognition with convolutional neural networks,” Pattern Recognition, vol. 74, pp. 25–37, 2018.
  • [18] J. Hannink, T. Kautz, C. F. Pasluosta, J. Barth, S. Schülein, K.-G. Gaßmann, J. Klucken, and B. M. Eskofier, “Stride length estimation with deep learning,” arXiv preprint arXiv:1609.03321, 2016.
  • [19] F. J. Ordóñez and D. Roggen, “Deep convolutional and lstm recurrent neural networks for multimodal wearable activity recognition,” Sensors, vol. 16, no. 1, p. 115, 2016.
  • [20] X. Li, Y. Zhang, I. Marsic, A. Sarcevic, and R. S. Burd, “Deep learning for rfid-based activity recognition,” in Proc. of SenSys, 2016.
  • [21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [22] R. Herren, A. Sparti, K. Aminian, and Y. Schutz, “The prediction of speed and incline in outdoor running in humans using accelerometry.” Medicine and science in sports and exercise, vol. 31, no. 7, pp. 1053–1059, 1999.
  • [23] H. Vathsangam, A. Emken, D. Spruijt-Metz, and G. S. Sukhatme, “Toward free-living walking speed estimation using gaussian process-based regression with on-body accelerometers and gyroscopes,” in Proc. of PervasiveHealth, 2010.
  • [24] A. M. Sabatini and A. Mannini, “Ambulatory assessment of instantaneous velocity during walking using inertial sensor measurements,” Sensors, vol. 16, no. 12, p. 2206, 2016.
  • [25] Q. Li, M. Young, V. Naing, and J. Donelan, “Walking speed estimation using a shank-mounted inertial measurement unit,” Journal of biomechanics, vol. 43, no. 8, pp. 1640–1643, 2010.
  • [26] P. Welch, “The use of fast fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms,” IEEE Transactions on audio and electroacoustics, vol. 15, no. 2, pp. 70–73, 1967.
  • [27] D. Mizell, “Using gravity to estimate accelerometer orientation.”   Citeseer, 2003, p. 252.