Activity Classification Using Smartphone Gyroscope and Accelerometer Data

03/20/2019 ∙ by Emily Huang, et al. ∙ Harvard University

Activities, such as walking and sitting, are commonly used in biomedical settings either as an outcome or covariate of interest. Researchers have traditionally relied on surveys to quantify activity levels of subjects in both research and clinical settings, but surveys are not objective in nature and have many known limitations, such as recall bias. Smartphones provide an opportunity for unobtrusive objective measurement of various activities in naturalistic settings, but their data tends to be noisy and needs to be analyzed with care. We explored the potential of smartphone accelerometer and gyroscope data to distinguish between five different types of activity: walking, sitting, standing, ascending stairs, and descending stairs. We conducted a study in which four participants followed a study protocol and performed a sequence of various activities with one phone in their front pocket and another phone in their back pocket. The subjects were filmed throughout, and the obtained footage was annotated to establish ground truth activity. We applied the so-called movelet method to classify their activity. Our results demonstrate the promise of smartphones for activity detection in naturalistic settings, but they also highlight common challenges in this field of research.




1 Introduction

Many researchers have recently advocated a more substantial role for large-scale phenotyping as a route to advances in the biomedical sciences. Of the many different phenotype classes, precise capture of social, behavioral, and cognitive markers in naturalistic settings has traditionally presented special challenges to phenomics because of their temporal nature, contextual dependence, and lack of tools for measuring them objectively. The ubiquity of smartphones presents an opportunity to capture these markers in free-living settings, offering a scalable solution to the phenotyping problem. Smartphones have at least three distinct advantages compared to other approaches to social, behavioral, and cognitive phenotyping: (1) the availability of these devices makes it possible to implement large studies without requiring additional subject instrumentation; (2) reliance on passive data makes the process unobtrusive and poses no burden on the subject, making long-term follow-up possible; and (3) the combination of the previous two factors makes it possible, at least in principle, to obtain these markers prospectively from a cohort of interest at a low cost.

To classify different types of activity, we consider data from the tri-axial accelerometer and tri-axial gyroscope in a smartphone. These sensors output data for three orthogonal axes in the frame of reference of the phone (x = from left to right, y = from top to bottom, z = through the phone). The gyroscope measures the angular velocity (units of radians/second) about each axis, which indicates the direction and speed that the phone is spinning about the axis. The accelerometer measures the acceleration along each axis in units of g (1 g = 9.81 m/s²). The sampling rates of these sensors can vary based on phone types and the mode the phone is in. In iPhones, for example, accelerometer and gyroscope data are collected at a frequency of 10 Hz, i.e., 10 samples per second.

There has been extensive work on using accelerometer data from wearable devices to determine a person’s activity level. Bai et al. (2014) proposed interpretable metrics to quantify how much a person moves, using accelerometer data. Bai et al. (2012) proposed the so-called movelet method to determine the activity that a person is performing based on raw data from a single, tri-axial accelerometer worn at the hip. He et al. (2014) applied the movelet method using data from three tri-axial accelerometers fixed at multiple points of the body, including the right hip, left wrist, and right wrist. Xiao et al. (2016) considered activity classification for a person using another person’s training data. Urbanek et al. (2018) used the Fourier transform to detect sustained harmonic walking, defined as bouts of walking that are 10 seconds or more with low variability in step frequency, using body-worn accelerometers. Alhassan et al. (2008) analyzed ActiGraph accelerometer data collected during a field study, in which there was a missing data problem due to subjects sometimes not wearing the accelerometer. The authors proposed a method to estimate a subject’s physical activity level using all of their available accelerometer data. There is also existing work on using both the accelerometer and gyroscope in wearable devices. With gyroscope and accelerometer data from wearable sensors, Tahavori et al. (2017) applied machine learning methods, including random forest, support vector machine, LogitBoost, and Naive Bayes, to differentiate between six activities (tandem walk, stand to sit, sit to stand, stand, backwards walking, 3 meter walk), in healthy elderly and Parkinson’s patients.

The contributions of this paper include applying the movelet method proposed by Bai et al. (2012) to smartphone data, while past work on this method focused on wearable device data. Smartphone data poses new challenges compared to wearable device data, such as having a lower frequency of data collection. For example, the ActiGraph wearable device can collect data at up to 100 Hz. The wearable sensors used in Tahavori et al. (2017) collected data at 50 Hz. Another contribution of this paper is that while previous work on the movelet method has focused solely on accelerometer data, we demonstrate the potential of using both accelerometer and gyroscope data. The data used in this paper was collected in an experiment where four healthy participants performed various activities while wearing two smartphones and three wearable devices. The present paper focuses on accelerometer and gyroscope data from the two smartphones.

The paper is organized as follows. Section 2 presents some background on the study that we conducted. In Section 3, we provide an overview of the movelet method developed by Bai et al. (2012). We apply the movelet method to our study data set and present the results in Section 4. Future work is discussed in Section 5.

2 Data

2.1 Study Procedure

Our study was approved by the Harvard T.H. Chan School of Public Health Institutional Review Board in February 2018 and the data collection took place in the summer of 2018. The eligibility requirements included being at least 18 years old, and able to walk, stand, and ascend and descend stairs without assistance from a person or device. Four healthy subjects, two females and two males, enrolled in the study. We collected data on age, weight, height, gender, dominant hand, and preferred method of carrying their personal phone (e.g., pocket, hand, bag). Table 2 lists the participants’ demographic characteristics. The participants ranged in age from 27 to 54.

Each participant completed an approximately one-hour study visit, during which she/he was outfitted with smartphones and wearable devices. The smartphones included an iPhone 5S in the front right pants pocket and an iPhone 7 in the back right pants pocket. The participant also wore ActiGraph GT9X Link wearable devices on the left wrist, right wrist, and right ankle. The smartphones collected gyroscope and accelerometer data continuously at a frequency of 10 Hz. The ActiGraph devices collected gyroscope and accelerometer data continuously at 100 Hz. All study visits were videotaped, and we used the video footage to manually annotate each participant’s accelerometer and gyroscope data sets with ground truth activity labels. We emphasize that this paper focuses on the smartphone data only, and a joint analysis of smartphone and wearable data will be presented elsewhere.

The participants performed a series of prescribed activities. We asked each participant to perform a series of activities to generate either training data for building a subject’s dictionary (discussed in Section 3) or test data for evaluating the accuracy of the dictionary-based classification model. During the collection of training data, the participant performed some routine activities for a short period of time. These included standing (for 10 seconds), walking on a flat surface ( 15 meters), ascending a single flight of stairs, descending the flight of stairs, and performing two repetitions of chair stands (i.e., sitting down from standing, staying seated for 10 seconds, and standing up from sitting). During the collection of test data, the participant followed various routes on the Harvard Longwood campus that included walking, standing, sitting on benches, ascending stairs, and descending stairs. We asked the participant to complete one specific route four times, each time with the front pocket phone in a different orientation. We also collected data on the participant walking at different speeds, including “normal” (their normal speed), “slow” (slower than normal), and “fast” (faster than normal). To simulate a real life setting rather than a controlled lab environment, all data were collected in public places. In addition to training and test data, we collected data from an iPhone as the participant made a call and browsed the Internet using the phone. An outline of the complete study protocol is given in Table 1.

3 Methods

The movelet method was proposed by Bai et al. (2012) for activity classification using accelerometer data collected by wearable devices. This method only requires a small amount of training data and can be used to detect activities of the user’s choice, even those that occur only for a brief moment, such as the transition from sitting to standing. In the method, the user first makes a list of common activities that occur in the subject’s daily life, such as walking, ascending and descending stairs, standing, sitting, and running. The subject is then asked to perform each activity, and the resulting data is gathered. This data is referred to as training data, and only 2-3 seconds of training data are required per activity. The training data are then used to build a dictionary for the subject, whose entries are the different activities. Each entry consists of “movelets,” which are defined as 1-second windows of data. The movelets for a specific activity entry are obtained by sliding a 1-second window along the training data for the activity, starting with the left edge of the window at the first data point and sliding one data point at a time until the right edge of the window is at the last data point. The method assumes that the data is collected at a constant frequency.
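The dictionary-building step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names `extract_movelets` and `build_dictionary` are hypothetical, a 1-second movelet is assumed to hold exactly `hz` samples, and the toy readings are made up.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one tri-axial reading (x, y, z)
Movelet = List[Sample]               # a window of consecutive readings


def extract_movelets(series: Sequence[Sample], window: int) -> List[Movelet]:
    """Slide a window one sample at a time and return every full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    # The last window starts where its right edge meets the final sample.
    return [list(series[i:i + window]) for i in range(len(series) - window + 1)]


def build_dictionary(training: Dict[str, Sequence[Sample]],
                     hz: int = 10) -> Dict[str, List[Movelet]]:
    """Map each activity label to the movelets cut from its training data."""
    window = hz  # assumption: a 1-second movelet holds `hz` samples
    return {label: extract_movelets(data, window) for label, data in training.items()}


# Toy training data at 10 Hz: 3 seconds of standing, 2 seconds of walking.
training = {
    "stand": [(0.0, 0.0, 0.0)] * 30,
    "walk": [(float(i % 5), 0.0, 1.0) for i in range(20)],
}
dictionary = build_dictionary(training)
print({label: len(movelets) for label, movelets in dictionary.items()})
```

With n samples and a window of w samples, each entry holds n - w + 1 overlapping movelets, so even a few seconds of training data yield a few dozen dictionary movelets per activity.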

After the training data is collected, the subject goes about their daily life and new data without activity labels is collected. For the subject’s new data, their dictionary is used to classify the activity occurring at any given time point. First, each movelet in the new data set is compared to the dictionary movelets and the closest match is identified based on a distance metric, such as Euclidean distance. For example, if the closest match is to a dictionary movelet in the walk entry, the movelet in the new data set is classified as walking. Second, for a given time point, a majority vote is taken among the neighbor movelets, including the movelet beginning at the time point and the movelets in the next second. The majority vote determines the predicted activity label at the time point.
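The two classification steps, nearest-movelet matching followed by a majority vote, can be sketched as follows. Several details here are assumptions made for illustration: the helper names are hypothetical, the vote covers the movelet starting at the time point plus the following ones up to `size` movelets in total, ties go to the label seen first in the voting window, and the toy example uses a 3-sample window instead of a full second of data.

```python
import math
from collections import Counter


def windows(series, size):
    """Every run of `size` consecutive samples, advancing one sample at a time."""
    return [series[i:i + size] for i in range(len(series) - size + 1)]


def distance(a, b):
    """Euclidean distance between two equal-length windows of (x, y, z) samples."""
    return math.sqrt(sum((p - q) ** 2 for s, t in zip(a, b) for p, q in zip(s, t)))


def nearest_label(movelet, dictionary):
    """Label of the dictionary movelet closest to `movelet`."""
    best_label, best_dist = None, math.inf
    for label, entries in dictionary.items():
        for entry in entries:
            d = distance(movelet, entry)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


def classify(series, dictionary, size):
    """Predicted label for each time point that starts a full movelet."""
    # Step 1: label every movelet in the new data by its closest dictionary match.
    raw = [nearest_label(w, dictionary) for w in windows(series, size)]
    # Step 2: majority vote over the movelet at t and the ones that follow it.
    smoothed = []
    for t in range(len(raw)):
        votes = Counter(raw[t:t + size])
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed


# Toy dictionary and unlabeled series (made-up values, 3-sample movelets).
dictionary = {
    "stand": [[(0.0, 0.0, 0.0)] * 3],
    "walk": [
        [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    ],
}
series = [(0.0, 0.0, 0.0)] * 5 + [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0),
]
labels = classify(series, dictionary, size=3)
print(labels)
```

The brute-force search compares every new movelet with every dictionary movelet, which is affordable here because each dictionary is built from only a few seconds of data per activity.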

4 Results

4.1 Training Data

We first present the participants’ training data. Training data was collected for standing, walking, ascending stairs, descending stairs, and chair-stands. Figure 1 presents the raw tri-axial (i.e., x, y, z) data from the front pocket smartphone gyroscope during training data collection. The four columns correspond to Participants 1-4, respectively. Each row corresponds to a specific activity. A complete chair stand was broken into three separate activities: (i) stand-to-sit, (ii) sit, and (iii) sit-to-stand. The plots of these three activities are taken from the first of the two chair stands that we asked the participant to perform. Figures 2, 3, and 4 have the same formatting as Figure 1. They present the raw tri-axial data for the other sensors, including the back pocket smartphone gyroscope, front pocket smartphone accelerometer, and back pocket smartphone accelerometer, respectively.

In general, we observe considerable variability across the participants. For example, in Figure 1, there are clear differences in the walking data across the subjects: the data for Participant 3 has a smaller amplitude than that for Participant 1, and for Participant 4 we see a large amplitude on one axis (shown in red) that is not present in the data for the other participants. There is also variability between the front and back pocket gyroscope data, as is evident for Participant 1. For all four participants, the gyroscope data from the back pocket appears more jittery than the gyroscope data from the front pocket during walking and stairs.

For front and back pocket gyroscope data, the output during sitting and standing is roughly 0 radians per second because the phone is not rotating during either activity. Using the front pocket accelerometer, we can differentiate between sitting and standing because the phone is vertical during standing (so that gravity falls on the y axis, which runs from top to bottom of the phone) while the phone is horizontal during sitting (so that gravity falls on the z axis, which runs through the phone). For the back pocket accelerometer, the phone does not come to be horizontal when the participant is sitting, so here it is more difficult to differentiate between sitting and standing compared to the front pocket accelerometer.

At any given time point $t$, let $(x_t, y_t, z_t)$ denote the tri-axial accelerometer or gyroscope measurement at time $t$. Then the magnitude is equal to $\sqrt{x_t^2 + y_t^2 + z_t^2}$. In the Supplementary Materials, we present plots of the magnitude data for the front gyroscope, back gyroscope, front accelerometer, and back accelerometer.
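As a minimal sketch, the magnitude series is the Euclidean norm of each tri-axial sample, which collapses three axes into one value per time point. The function names and the sample values below are hypothetical.

```python
import math


def magnitude(sample):
    """Euclidean norm of one tri-axial reading (x, y, z)."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def to_magnitude(series):
    """Convert a tri-axial series into a one-dimensional magnitude series."""
    return [magnitude(sample) for sample in series]


# Made-up readings: three samples from one sensor.
series = [(3.0, 4.0, 0.0), (0.0, 0.0, 1.0), (1.0, 2.0, 2.0)]
print(to_magnitude(series))
```

The movelet method can then be applied to the magnitude series in the same way as to tri-axial data, with each movelet holding one value per time point instead of three.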

4.2 Application of Movelet Method

We applied the movelet method to the data we collected in this study. The method was applied to accelerometer and gyroscope data separately. For each participant, we built his/her dictionary using four seconds of training data per activity. If there were more than four seconds available, we used the middle four seconds. The list of activities includes those along the right-hand margin of Figure 1. To classify test data from the front gyroscope, we used the dictionary corresponding to the front gyroscope. The handling for the back gyroscope, front accelerometer, and back accelerometer was analogous. When comparing movelets in the test data to movelets in the dictionary, we used Euclidean distance as the distance metric. As a sensitivity analysis, we show the results if the raw tri-axial data is used, as well as the results if only the magnitude data is used.
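Selecting the middle four seconds of a training recording amounts to a centered slice. The sketch below assumes a 10 Hz sampling rate (so four seconds is 40 samples) and rounds the start index down when the surplus is odd; the function name `middle_segment` is hypothetical.

```python
def middle_segment(series, seconds=4, hz=10):
    """Return the centered `seconds`-long slice, or everything if the series is shorter."""
    n = seconds * hz  # number of samples to keep
    if len(series) <= n:
        return list(series)
    start = (len(series) - n) // 2  # assumption: round down when the surplus is odd
    return list(series[start:start + n])


# A 10-second recording at 10 Hz, represented by its sample indices.
recording = list(range(100))
segment = middle_segment(recording)
print(segment[0], segment[-1], len(segment))
```

Taking the middle of the recording avoids the first and last moments of each training bout, where the participant may still be starting or finishing the activity.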

Tables 3, 4, 5, and 6 compare the predicted activity label to the true activity label for each participant. The separate tables show results for the front gyroscope, back gyroscope, front accelerometer, and back accelerometer, respectively. The results consider all portions of the test data collection, except for when the participant walked at different speeds (discussed at the end of this subsection) and when the participant performed the same course multiple times with the front pocket phone in a different orientation each time (discussed in Section 4.3). The table rows give the predicted activity labels, while the columns give the true activity labels. For any given row and column, there are two values that each represent the proportion of times that the activity given by the column is predicted to be the activity given by the row. The value on the left is for using tri-axial data, and the value on the right is for using magnitude.
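A table of this form is a confusion matrix normalized within each true-activity column, so that every column sums to 1. The sketch below shows one way to compute it; the function name and the toy labels are hypothetical and are not taken from the study data.

```python
from collections import Counter


def confusion_proportions(true_labels, predicted_labels):
    """table[predicted][true] = share of time points with that true label given that prediction."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    pair_counts = Counter(zip(predicted_labels, true_labels))
    true_totals = Counter(true_labels)
    rows = sorted(set(true_labels) | set(predicted_labels))
    return {
        pred: {true: pair_counts[(pred, true)] / true_totals[true] for true in true_totals}
        for pred in rows
    }


# Toy labels: four walking time points and two standing time points.
truth = ["walk", "walk", "walk", "walk", "stand", "stand"]
predicted = ["walk", "walk", "walk", "ascend", "stand", "sit"]
table = confusion_proportions(truth, predicted)
for pred, row in table.items():
    print(pred, row)
```

The diagonal entries, such as `table["walk"]["walk"]`, are the per-activity classification accuracies of the kind reported in the text.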

Using the movelet method, the classification results for walking, ascending stairs, and descending stairs were strong for the gyroscope (Tables 3 and 4). For the front pocket gyroscope (Table 3), the classification accuracies when tri-axial data was used ranged from 0.72 to 0.95 for walking, 0.76 to 1 for ascending stairs, and 0.7 to 0.88 for descending stairs. The classification accuracies when magnitude data was used ranged from 0.65 to 0.91 for walking, 0.73 to 0.98 for ascending stairs, and 0.15* to 0.87 for descending stairs. (*For Participant 4, using tri-axial data yielded a better accuracy of 0.81 compared to 0.15 using magnitude data.) For the back pocket gyroscope (Table 4), the classification accuracies when tri-axial data was used ranged from 0.82 to 0.92 for walking, 0.66 to 1 for ascending stairs, and 0.46 to 0.85 for descending stairs. The classification accuracies when magnitude data was used ranged from 0.81 to 0.94 for walking, 0.78 to 0.99 for ascending stairs, and 0.32 to 0.67 for descending stairs. As a comparison, the null rate, which is based on random guessing, is 1/(number of training activities) = 1/7 ≈ 0.14, since each dictionary contains seven activities (standing, walking, ascending stairs, descending stairs, stand-to-sit, sit, and sit-to-stand).

The gyroscope data outperformed its accelerometer counterpart in correctly predicting walking, ascending stairs, and descending stairs. For the front pocket accelerometer (Table 5), the classification accuracies when tri-axial data was used ranged from 0.57 to 0.86 for walking, 0.28 to 0.86 for ascending stairs, and 0.36 to 0.56 for descending stairs. For the accelerometer, using magnitude data improved the results compared to using tri-axial data. The classification accuracies when magnitude data was used ranged from 0.62 to 0.92 for walking, 0.38 to 0.97 for ascending stairs, and 0.38 to 0.78 for descending stairs. For the back pocket accelerometer (Table 6), the classification accuracies when tri-axial data was used ranged from 0.36 to 0.71 for walking, 0 to 0.74 for ascending stairs, and 0.06 to 0.69 for descending stairs. The classification accuracies when magnitude data was used ranged from 0.65 to 0.93 for walking, 0.41 to 0.95 for ascending stairs, and 0.46 to 0.65 for descending stairs.

We asked the participant to walk at various speeds, including “normal,” “fast,” and “slow.” Tables 7 and 8 present the distribution of the predicted activity label for each data type and each participant, under the three walking speeds. Table 7 shows the results when raw tri-axial data is used. At the normal pace, the front and back gyroscopes both performed well. For participants 2 and 4, the gyroscopes predicted walking 100% of the time. Slow walking was sometimes mistaken for stairs, in particular ascending stairs. The gyroscope data was able to recognize fast walking as walking, especially the front gyroscope for participants 2 and 4 and the back gyroscope for participants 1 and 3. In general, the gyroscopes outperformed their accelerometer counterparts at recognizing walking at different speeds. Table 8 shows the results when only magnitude data is used. Using magnitude helped the back accelerometer predict walking. For the gyroscope, the classification accuracy is worse for slow walking but better for fast walking compared to using tri-axial data.

4.3 Phone Orientation

A phone can be placed in one of four possible orientations inside the pants pocket, given by whether the phone screen is facing the leg or not, and whether or not the phone is upside down. During the data collection, we asked the participant to repeat four times a course that included standing, walking, and stairs. Each time, the phone in the front pocket was re-oriented, so that data from each possible orientation was observed. For the test data collected during this segment, we implemented the movelet method under two scenarios: (1) using the raw tri-axial data without any adjustment, even though the training data was collected under a single orientation, (2) using magnitude data. Tables 9 and 10 show the distribution of the predicted activity label for each activity during this segment of the test data collection. The separate tables are for the front pocket gyroscope and the front pocket accelerometer, respectively. We focus on the front pocket phone because the back pocket phone was not re-oriented. The table rows show the predicted activity labels, and the columns give the true activity labels. For any given row and column, the two values represent the proportion of times that the activity given by the column is classified to be the activity given by the row. The first value is the proportion if raw tri-axial data is used; the second value is the proportion if only magnitude data is used.
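The contrast between the two scenarios can be illustrated with an idealized model. The sketch assumes, purely for illustration, that each of the four pocket orientations differs from the training orientation by an exact 180-degree rotation of the phone's frame, which flips the sign of two axes; real pocket placements will only approximate this. All names and sample values are hypothetical.

```python
import math

# Assumed sign changes on (x, y, z) for each idealized orientation.
ORIENTATIONS = {
    "reference": (1, 1, 1),
    "upside_down": (-1, -1, 1),      # 180-degree rotation about the z axis
    "screen_flipped": (-1, 1, -1),   # 180-degree rotation about the y axis
    "both": (1, -1, -1),             # 180-degree rotation about the x axis
}


def reorient(series, signs):
    """Apply per-axis sign flips to every (x, y, z) sample."""
    sx, sy, sz = signs
    return [(sx * x, sy * y, sz * z) for x, y, z in series]


def magnitude_series(series):
    """Euclidean norm of each sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in series]


def triaxial_distance(a, b):
    """Euclidean distance between two windows using all three axes."""
    return math.sqrt(sum((p - q) ** 2 for s, t in zip(a, b) for p, q in zip(s, t)))


def magnitude_distance(a, b):
    """Euclidean distance between two windows after reducing each to magnitudes."""
    ma, mb = magnitude_series(a), magnitude_series(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ma, mb)))


# A made-up training movelet recorded in the reference orientation.
movelet = [(0.1, 0.9, 0.2), (0.3, 1.1, -0.1), (0.0, 1.0, 0.4)]
for name, signs in ORIENTATIONS.items():
    rotated = reorient(movelet, signs)
    print(name, triaxial_distance(movelet, rotated), magnitude_distance(movelet, rotated))
```

Under this model the same motion recorded in a different orientation lies far from its training movelet in tri-axial space but at distance zero in magnitude space. That is the motivation for scenario (2), although magnitude also discards directional information that can help separate activities.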

For the front pocket gyroscope (Table 9), the classification accuracy for walking ranges from 0.74 to 0.90 when magnitude data is used, and is not as high when tri-axial data is used (0.38 to 0.87). The classification accuracy for ascending stairs is high in either case, 0.70 to 0.94 for tri-axial data and 0.61 to 1 for magnitude data. For descending stairs, the classification accuracy of using tri-axial data versus using magnitude data depends on the participant, e.g., 0.43 versus 0.80 for Participant 1 whereas 0.91 versus 0.51 for Participant 3. Standing is sometimes confused for sitting, which is expected since the phone is not rotating in either case, so it is difficult for the gyroscope to differentiate between these two stationary activities.

The front pocket accelerometer (Table 10) can differentiate standing from sitting. However, the relative performance of using tri-axial compared to magnitude data differs by participant. For example, the classification accuracy during standing is 0 (tri-axial) versus 0.92 (magnitude) for Participant 4, while it is 0.94 (tri-axial) versus 0.55 (magnitude) for Participant 1. For walking, using magnitude data works better on average across the participants, yielding classification accuracies between 0.52 and 0.95. Using magnitude data also helps for recognizing climbing stairs, yielding classification accuracies between 0.34 and 0.98. The same holds for descending stairs, where the classification accuracies when magnitude was used ranged from 0.53 to 0.86.

5 Discussion

We applied the movelet method to smartphone gyroscope and accelerometer data from our study of healthy volunteers. In the study, data was collected in public places rather than a tightly controlled lab environment, to mimic data collection in the wild. Ground truth activity labels based on video footage were used to validate the activity predictions. Using the gyroscope data, the method generally predicted walking, ascending stairs, and descending stairs with a high sensitivity. Also, we could classify fast walking correctly as walking with high sensitivity, even though the training data collection did not include fast walking. The prediction results for the gyroscope were generally better than those for the accelerometer. However, the accelerometer was more reliable for differentiating between stationary activities, such as standing versus sitting. For the accelerometer, the classification accuracies were generally improved by applying the movelet method to magnitude data compared to tri-axial data.

An advantage of the movelet method is that it requires only a small amount of training data to build a person’s dictionary. In our analyses, we used just four seconds of training data per activity for each participant’s dictionary. However, in large studies it may be challenging to collect labeled training data on every participant. One option is to match each participant without labeled training data to another person for whom labeled training data is available, and this matching can be based on variables such as age, height, weight, gender, and preferred phone carrying position. An area of future research is to evaluate the accuracy of this approach. Also, one way to streamline the collection of training data is to incorporate the data collection into routine tests that are already conducted during clinic visits, such as the six-minute walk test.

In this paper, we focused on the case that the phone is in a pants pocket. Another challenge is to consider other possible placements of the phone. The study data showed that the placement of the phone in the front pocket compared to the back pocket affected the data and subsequent activity classification. The data also will look different if the phone is in the hand, a backpack, or a purse. A potential area of future research is to extend the movelet method to handle unknown and changing placements of the phone.

6 Supplementary Material

The Supplementary Materials are available upon request. Please contact Emily Huang at


References

  • Alhassan et al. (2008) Alhassan, S., Sirard, J. R., Spencer, T. R., Varady, A., and Robinson, T. N. (2008). Estimating physical activity from incomplete accelerometer data in field studies. Journal of Physical Activity and Health 5, S112–S125.
  • Bai et al. (2012) Bai, J., Goldsmith, J., Caffo, B., Glass, T. A., and Crainiceanu, C. M. (2012). Movelets: A dictionary of movement. Electronic Journal of Statistics 6, 559–578.
  • Bai et al. (2014) Bai, J., He, B., Shou, H., Zipunnikov, V., Glass, T. A., and Crainiceanu, C. M. (2014). Normalization and extraction of interpretable metrics from raw accelerometry data. Biostatistics 15, 102–116.
  • He et al. (2014) He, B., Bai, J., Zipunnikov, V. V., Koster, A., Caserotti, P., Lange-Maia, B., Glynn, N. W., Harris, T. B., and Crainiceanu, C. M. (2014). Predicting human movement with multiple accelerometers using movelets. Medicine and Science in Sports and Exercise 46, 1859–1866.
  • Tahavori et al. (2017) Tahavori, F., Stack, E., Agarwal, V., Burnett, M., Ashburn, A., Hoseinitabatabaei, S. A., and Harwin, W. (2017). Physical activity recognition of elderly people and people with Parkinson’s (PwP) during standard mobility tests using wearable sensors. In Smart Cities Conference (ISC2), 2017 International, pages 1–4. IEEE.
  • Urbanek et al. (2018) Urbanek, J. K., Zipunnikov, V., Harris, T. B., Fadel, W., Glynn, N. W., Koster, A., Caserotti, P., Crainiceanu, C. M., and Harezlak, J. (2018). Prediction of sustained harmonic walking in the free-living environment using raw accelerometry data. Physiological Measurement.
  • Xiao et al. (2016) Xiao, L., He, B., Koster, A., Caserotti, P., Lange-Maia, B., Glynn, N. W., Harris, T. B., and Crainiceanu, C. M. (2016). Movement prediction using accelerometers in a human population. Biometrics 72, 513–524.