Surface Type Estimation from GPS Tracked Bicycle Activities

by Nitish Nag, et al.

Road conditions affect both machine-powered and human-powered modes of transportation. For human-powered transportation, poor road conditions increase the work an individual must do to travel. Previous estimates of road conditions have relied on computationally expensive analysis of satellite images. In this work, we use a simple and computationally inexpensive method that needs only GPS data from a cyclist. By estimating whether the path taken by the user has high or low variation in its directional vector, we classify whether the user is on a paved road or an unpaved trail. To do this, we adopt three methods: counting how frequently the direction of slope changes in a given path segment, fitting a linear model to segments of the path, and counting the zero crossings of the first derivative of each segment. Machine learning models, namely support vector machines, K-nearest neighbors, and decision trees, are used to classify the path. Among our methods, decision trees performed best, with an accuracy of 86%. Surface type estimates can support applications such as understanding rolling resistance for power estimation or building exercise recommendation systems through user profiling, as described in detail in the paper.


I Introduction

Terrain data is usually obtained by analyzing satellite images. Unfortunately, doing this at scale is not feasible because image processing is computationally heavy. A bicycle and a mobile phone are the most commonly owned transportation and electronic devices in the world, including among people who live in remote places [25][26]. The data collected from a mobile phone GPS can provide useful information about the surrounding terrain features and the type of surface. Determining the type of surface is useful for many applications, such as estimating the rolling resistance and friction coefficient, which are otherwise difficult and complex to estimate. These road features can also be included in mapping applications so that users can understand road conditions before travel.

In this paper, we explore how the GPS coordinates (including altitude) of the path taken by a bicycle rider can be sufficient to classify the surface as a paved road or a dirt road. This relies on simple, inexpensive concepts, unlike the image processing techniques applied to satellite images. The path is then classified using supervised machine learning techniques.

II Related Work

Computer vision provides solutions to various image understanding tasks, such as distinguishing urban from rural areas, estimating population density, and finding land cover. In [15], over 6 million geotagged images were used with a data-driven scene matching approach. An automated method for determining a terrain model that best approximates the true data points of the surface models has been proposed in [1].

Landsat data and 251 sampling sites from a mountain rangeland in central Argentina were used for surface analysis in [16]. There, 8 land cover units were defined in terms of spectral information, and ecologically meaningful units in terms of structural types were also considered. Classification was done using discriminant functions and maximum likelihood functions, and the results were compared against field validations. Ground survey methods such as electronic tachymetry, GPS, and terrestrial laser scanning have been used for terrain modeling as well [17].

To identify a road in a given image, Gabor filters have been used to obtain the texture orientation of every pixel, and edge detection with a vanishing point constraint has been used to find the road boundaries [18]. There, 1003 images were used for identifying road regions. A road area detection algorithm for color images has been proposed in [19]; it involves edge detection using the intensity of the image and analysis of the full color image to determine the road area.

Estimating the nature of the road surface is an important factor in improving road infrastructure. Real-time smartphone data has been used to estimate the nature of the road surface by detecting potholes [20]. Mobile phones have also been used to collect data such as spectral and temporal features of the road, along with the vehicle speed, to classify road anomalies using support vector classifiers [21].

Wavelet analysis has been used for road profiling and in identifying roughness of road surface, detecting potholes and cracks [22]. Evaluation of road quality has also been demonstrated through GPS data and accelerometer data collected from anonymous drivers [23].

In this paper we have experimented with using only GPS data and elevation data from cyclists in evaluating the type of surface of the path. We test three classification techniques: support vector machines, K nearest neighbors, and decision trees. Our aim is to distinguish a dirt/mud path from a developed paved road.

III Methodology

The flowchart in Fig. 1 shows the methodology. For each method, we use the following classification models.

  • Support Vector Machines [3]

  • K nearest neighbors [4]

  • Decision Trees [5]

Fig. 1: Flowchart

III-A Data Collection

Strava is the main source of activities. The app uses GPS to track physical activities such as cycling, running, and swimming, and lets users record their activities and view their statistics. Data from 44 athletes was collected after authorization. This data includes personal information such as age, weight, height, and waist circumference. Activity data streams were collected using the Strava application. The streams include power, cadence, temperature, altitude, GPS coordinates during the ride, velocity, and distance [2]. Of these, we use only GPS coordinates and altitude to classify the road.

III-B Data Preprocessing

The collected data includes all the activities recorded by the user. From these, only bike rides are selected for estimating the road surface. In total, 115 rides were selected and manually labeled as squiggly or straight to obtain the ground truth. Of these, 50% were used for training and the rest for testing.

III-C Method

  • Assumptions: The path taken by a bicycle rider on a dirt path is less uniform and more squiggly than a bike road, which is relatively uniform and straight. The same was observed from the maps in Golden Cheetah [24] for the data collected through the Strava app. Figure 2 shows a dirt path and a straight path, plotted using the X-Y coordinates from the GPS. It can be seen from Figure 2 that the dirt path taken by the rider is more squiggly.

Fig. 2: Paths.

To identify if a path is squiggly or straight, the following methods are adopted:

III-C1 Using frequency of change in direction of slope

  • The path taken by the rider is divided into segments. The length of each segment is 1 percent of the total distance taken by the rider.

  • In each of these segments, the points where the slope switches between increasing and decreasing are noted. This gives the peaks in the path taken.

  • The direction of the path varies more for a squiggly path than for a straight path, so the frequency of change in slope direction is considered. Figures 3 and 4 show the variation in direction of slope for a squiggly and a straight path, respectively.

    Fig. 3: Frequency of variation in direction of slope in segments in a squiggly path

    Fig. 4: Frequency of variation in direction of slope in segments in a straight path

  • A threshold on the frequency of change in slope direction must be set to distinguish a squiggly path from a straight path. Setting this threshold manually can be a tedious task, so we instead use supervised classification techniques.

    For each of the three models, Table I shows the summary statistics when the ratio of train to test is 50:50.

    Model            Accuracy   Precision   Recall
    SVM              80.70%     81%         81%
    KNN with K=3     69%        76%         68%
    Decision Trees   74%        79%         74%
    TABLE I: Classification Report for method 1

A comparison of the models is shown in Figure 5. ROC curve analysis is used to evaluate the classifiers [14]. Figure 5 shows that the area under the ROC curve was highest for the SVM model, so for this method SVM performed best among the three models.

Fig. 5: ROC Curve
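The slope-direction feature described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `split_into_segments` and `slope_direction_changes` are hypothetical, the inputs are assumed to be planar X-Y coordinates already projected from GPS, and a "change in direction of slope" is interpreted as a sign flip of the point-to-point slope.

```python
import numpy as np


def split_into_segments(x, y, n_segments=100):
    """Split a path into pieces that each cover about 1/n_segments of the distance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    step = np.hypot(np.diff(x), np.diff(y))          # distance between neighbours
    travelled = np.concatenate([[0.0], np.cumsum(step)])
    total = travelled[-1]
    if total == 0:                                   # rider never moved
        return [(x, y)]
    # Segment index of every point, clipped so the last point stays in range.
    seg_id = np.minimum((travelled / total * n_segments).astype(int),
                        n_segments - 1)
    return [(x[seg_id == k], y[seg_id == k]) for k in range(n_segments)]


def slope_direction_changes(x, y):
    """Count sign flips of the point-to-point slope dy/dx within one segment."""
    dx, dy = np.diff(x), np.diff(y)
    moving = dx != 0                                 # skip vertical steps
    sign = np.sign(dy[moving] / dx[moving])
    sign = sign[sign != 0]                           # ignore flat steps
    return int(np.count_nonzero(sign[1:] != sign[:-1]))
```

With the default of 100 segments, each segment covers roughly 1 percent of the ride, and the per-segment counts can be collected into one feature vector per ride for the classifiers.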

III-C2 Using error in fitting the curve with Linear models for the segments

  • The path taken by the rider is divided into segments. The length of each segment is 1 percent of the total distance taken by the rider.

  • For each of these segments, a linear regression model [6] is built.

  • In each segment, 50 percent of the data is used for training and the rest for testing.

  • The root mean square error obtained by testing the regression model is considered. A squiggly segment, being more non-uniform, has a larger root mean square error than a straight segment. Figures 6 and 7 show a linear regression model fitted to a segment of a squiggly and a straight path, respectively. As can be seen, the line fits the actual data points much better for a straight path than for a squiggly path.

    Fig. 6: Fitting Linear model in a segment of a squiggly path
    Fig. 7: Fitting Linear model in a segment of a straight path

For each of these models, Table II shows the summary statistics when the ratio of train to test is 50:50.

A comparison of the models is shown in Figure 8. This figure shows that the area under the ROC curve was highest for the decision trees model. Therefore, for this method, decision trees performed best among the three models.

Model            Accuracy   Precision   Recall
SVM              54%        29%         54%
KNN with K=3     86.54%     87%         87%
Decision Trees   86.53%     87%         87%
TABLE II: Classification Report using method 2

Fig. 8: ROC Curve
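The per-segment fitting error used in this method can be sketched with sklearn.linear_model.LinearRegression, which the paper names. The helper name `segment_fit_rmse` is hypothetical, and the way the 50 percent split is drawn is an assumption: the paper does not say how points are assigned, so this sketch simply alternates points between training and testing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def segment_fit_rmse(x, y):
    """RMSE of a line fitted on half of a segment and tested on the other half."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)    # sklearn expects 2-D features
    y = np.asarray(y, dtype=float)
    # Assumption: even-indexed points train the model, odd-indexed points test it.
    x_train, y_train = x[0::2], y[0::2]
    x_test, y_test = x[1::2], y[1::2]
    model = LinearRegression().fit(x_train, y_train)
    residual = model.predict(x_test) - y_test
    return float(np.sqrt(np.mean(residual ** 2)))
```

A segment needs at least a few points for this to be meaningful; a straight segment should give an error close to zero, while a wiggly one gives a clearly larger value.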

III-C3 Finding zero crossings in the first derivative of each segment

  • The path taken by the rider is divided into segments. The length of each segment is 1 percent of the total distance taken by the rider.

  • In each of these segments, the first derivative is found.

  • The zero crossings of the first derivative are identified. A squiggly path has more zero crossings than a straight path because its segments contain more local maxima and minima. Figures 9 and 10 show the number of zero crossings for segments in a squiggly and a straight path, respectively. These figures show that segments in a straight path have zero or only 2 zero crossings, whereas segments in a squiggly path have 5 or more.

    Fig. 9: Number of Zero Crossings in segments of a squiggly path
    Fig. 10: Number of Zero Crossings in segments of a straight path

For each of the three models, Table III shows the summary statistics when the ratio of train to test is 55:45.

Model            Accuracy   Precision   Recall
SVM              62.74%     63%         63%
KNN with K=3     56.73%     57%         57%
Decision Trees   71%        74%         71%
TABLE III: Classification Report for method 3

The ROC curve analysis in Figure 11 shows that decision trees performed better than the other classification models.

Fig. 11: ROC Curve
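Counting zero crossings of the first derivative can be sketched as below. The function name `derivative_zero_crossings` is hypothetical, and the derivative is approximated numerically with `numpy.gradient`, which is an assumption since the paper does not state how the derivative is computed. The sketch also assumes the X coordinate is strictly increasing within a segment.

```python
import numpy as np


def derivative_zero_crossings(x, y):
    """Count sign changes of the numerical derivative dy/dx along one segment."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dydx = np.gradient(y, x)         # finite-difference estimate of the derivative
    sign = np.sign(dydx)
    sign = sign[sign != 0]           # an exact zero does not count as a side
    return int(np.count_nonzero(np.diff(sign) != 0))
```

Each zero crossing corresponds to a local maximum or minimum of the path within the segment, so a straight segment yields a count near zero.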

The machine learning models used for the above estimation are implemented using sklearn.svm.SVC [8] for support vector machines, sklearn.neighbors.KNeighborsClassifier [9] for K nearest neighbors, sklearn.tree.DecisionTreeClassifier [10] for decision trees, and sklearn.linear_model.LinearRegression [11] for the linear regression model.
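A comparison of the three classifiers could be set up roughly as follows. This is a sketch under assumptions rather than the authors' pipeline: the function name `compare_classifiers` is hypothetical, each row of `features` is assumed to be one ride's feature vector with a 0/1 label (straight/squiggly), hyperparameters other than K=3 are scikit-learn defaults, and weighted averaging of precision and recall is an assumption chosen because it makes recall equal accuracy, as in the tables.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(features, labels, seed=0):
    """Train the three models on a 50:50 split and report test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.5, stratify=labels, random_state=seed)
    models = {
        "SVM": SVC(random_state=seed),
        "KNN with K=3": KNeighborsClassifier(n_neighbors=3),
        "Decision Trees": DecisionTreeClassifier(random_state=seed),
    }
    report = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        # Continuous score for the ROC curve: margin for SVM, class-1
        # probability for the other two models.
        if hasattr(model, "decision_function"):
            score = model.decision_function(X_test)
        else:
            score = model.predict_proba(X_test)[:, 1]
        report[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, average="weighted",
                                         zero_division=0),
            "recall": recall_score(y_test, pred, average="weighted",
                                   zero_division=0),
            "auc": roc_auc_score(y_test, score),
        }
    return report
```

The returned dictionary has one entry per model, so the rows of a classification report and the area under each ROC curve can be read off directly.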

IV Results

As the ROC plots above show, decision trees combined with the method of fitting a linear regression model to each segment performed best among the three methods, with the largest area under the ROC curve. This method was then used to identify which segments of a ride are more squiggly than the others. Figure 12 shows the squiggly segments marked in red and the straight parts of the road marked in blue.

Fig. 12: Segmented path

The above information can also be used to profile users based on whether they prefer to ride on a dirt path or a bike road. Figure 13 shows the percentage distribution of squiggly (dirt) roads and straight roads taken by a specific athlete who donated data from Strava.

Fig. 13: Distribution of type of path during rides.
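The per-athlete distribution can be derived from the segment labels with a few lines. The sketch below is illustrative only: the function name `surface_distribution` and the label strings are hypothetical, and it assumes every segment covers the same share of distance, so counting segments is equivalent to measuring distance.

```python
from collections import Counter


def surface_distribution(segment_labels):
    """Percentage of segments carrying each label, e.g. squiggly vs straight."""
    counts = Counter(segment_labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Pooling the labels from all rides of one athlete gives a simple profile of how much of their riding is on dirt versus paved surfaces.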

V Conclusions and Future Work

The paper illustrates a simple method to estimate the type of road surface. The data used for the experiments can be obtained easily and economically. The type of road surface can be used in many applications. One is the estimation of the coefficient of friction and rolling resistance. The coefficient of rolling resistance can be used to estimate the power spent against rolling resistance, which in turn feeds into the total power estimate of the athlete during the ride, as given by equation 1.
\[ P_{\text{total}} = P_{\text{rolling resistance}} + P_{\text{wind}} + P_{\text{gravity}} + P_{\text{acceleration}} \tag{1} \]

This decomposition follows [7].

The estimated power output of an athlete can in turn be used to estimate the VO2 max of the individual, as given by the formula below:

\[ \mathrm{VO_2max} = 10.8 \cdot \frac{\text{power}}{\text{weight}} + 7 \tag{2} \]
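As a worked example of equations (1) and (2), the sketch below sums the power components and converts the result to a VO2 max estimate. The function names are hypothetical and the numbers are invented for illustration. The units are an assumption, since the paper does not state them: power in watts, weight in kilograms, and the result in mL/kg/min.

```python
def total_power(p_rolling, p_wind, p_gravity, p_acceleration):
    """Equation (1): total power is the sum of the four components (watts)."""
    return p_rolling + p_wind + p_gravity + p_acceleration


def estimate_vo2max(power_watts, weight_kg):
    """Equation (2): VO2 max = 10.8 * power / weight + 7."""
    return 10.8 * power_watts / weight_kg + 7.0


# Illustrative values only: 40 W rolling, 150 W wind, 50 W gravity, 10 W acceleration.
power = total_power(40.0, 150.0, 50.0, 10.0)         # 250 W in total
vo2 = estimate_vo2max(power, weight_kg=70.0)          # about 45.57
```

In this example a 70 kg rider producing 250 W gets an estimate of about 45.6, which shows how a change in the rolling resistance term propagates directly into the VO2 max estimate.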

Commonly, VO2 estimation is done using data from various sensors on wearable fitness trackers. Heart rate is one of the important attributes for determining VO2 max. A neural network can be built to estimate VO2 from heart rate variability, as done in [12]. Accelerometer readings can also be combined with heart rate readings for VO2 estimation [13]. However, these wearables are expensive and not every user owns one. It is also inconvenient for the user to wear them continuously. In the future, we aim to use freely available data, such as wind speed, humidity, oxygen level, and the coefficient of friction derived from the type of surface, to see how the power output can be modeled with respect to these attributes. By estimating power output, we can estimate the VO2 max of the athlete during that ride as given in equation (2).

Another avenue of future work is building recommendation systems that suggest activities to users. Because some cyclists prefer smooth roads while mountain bike riders prefer dirt roads, we can begin to understand user preferences for bicycle travel paths. Other relevant insights can be drawn from the data we collect through Strava, weather, and social media platforms. At the end of the analysis, we will know the user’s weather preference, the kind of roads they prefer to cycle on, the time at which they regularly perform the activity, and much more. These can be combined to recommend suitable activities that improve the user’s health condition.


References

  • [1] Champion, Nicolas, Didier Boldo, Marc Pierrot-Deseilligny, and Georges Stamon. "Automatic estimation of fine terrain models from multiple high-resolution satellite images." In Image Processing (ICIP), 2009 16th IEEE International Conference on, pp. 577-580. IEEE, 2009.
  • [2] Strava Developers, Strava API v3,
  • [3] Hearst, Marti A., Susan T. Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf. "Support vector machines." IEEE Intelligent Systems and their applications 13, no. 4 (1998): 18-28.
  • [4] Zhang, Min-Ling, and Zhi-Hua Zhou. "ML-KNN: A lazy learning approach to multi-label learning." Pattern recognition 40, no. 7 (2007): 2038-2048.
  • [5] Quinlan, J. Ross. "Induction of decision trees." Machine learning 1, no. 1 (1986): 81-106.
  • [6] Montgomery, Douglas C., Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to linear regression analysis. Vol. 821. John Wiley & Sons, 2012.
  • [7] Strava Support, How Strava Calculates Power,
  • [8] 1.4. Support Vector Machines - scikit-learn 0.19.1, sklearn.svm.SVC,
  • [9] 1.4. Support Vector Machines - scikit-learn 0.19.1, sklearn.neighbors.KNeighborsClassifier, ghborsClassifier.html
  • [10] 1.4. Support Vector Machines - scikit-learn 0.19.1, sklearn.tree.DecisionTreeClassifier, TreeClassifier.html
  • [11] 1.4. Support Vector Machines - scikit-learn 0.19.1, sklearn.linear_model.LinearRegression, .LinearRegression.html
  • [12] Smolander, Juhani, Tanja Juuti, Marja-Liisa Kinnunen, Kari Laine, Veikko Louhevaara, Kaisa Männikkö, and Heikki Rusko. "A new heart rate variability-based method for the estimation of oxygen consumption without individual laboratory calibration: application example on postal workers." Applied ergonomics 39, no. 3 (2008): 325-331.
  • [13] Fudge, Barry W., John Wilson, Chris Easton, Laura Irwin, Jonathan Clark, Olivia Haddow, Bengt Kayser, and Yannis P. Pitsiladis. "Estimation of oxygen uptake during fast running using accelerometry and heart rate." Medicine and science in sports and exercise 39, no. 1 (2007): 192-198.
  • [14] Fawcett, Tom. "An introduction to ROC analysis." Pattern recognition letters 27, no. 8 (2006): 861-874.
  • [15] Hays, James, and Alexei A. Efros. "IM2GPS: Estimating Geographic Information from a Single Image." In 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2008. doi:10.1109/CVPR.2008.4587784.
  • [16] Cingolani, Ana M., Daniel Renison, Marcelo R. Zak, and Marcelo R. Cabido. "Mapping vegetation in a heterogeneous mountain rangeland using Landsat data: an alternative method to define and classify land-cover units." Remote sensing of environment 92, no. 1 (2004): 84-97.
  • [17] Gallay, Michal, Christopher D. Lloyd, Jennifer McKinley, and Lorraine Barry. "Assessing modern ground survey methods and airborne laser scanning for digital terrain modelling: A case study from the Lake District, England." Computers & geosciences 51 (2013): 216-227.
  • [18] Kong, Hui, Jean-Yves Audibert, and Jean Ponce. "Vanishing point detection for road detection." In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 96-103. IEEE, 2009.
  • [19] He, Yinghua, Hong Wang, and Bo Zhang. "Color-based road detection in urban traffic scenes." IEEE Transactions on intelligent transportation systems 5, no. 4 (2004): 309-318.
  • [20] Mednis, Artis, Girts Strazdins, Reinholds Zviedris, Georgijs Kanonirs, and Leo Selavo. "Real time pothole detection using android smartphones with accelerometers." In Distributed Computing in Sensor Systems and Workshops (DCOSS), 2011 International Conference on, pp. 1-6. IEEE, 2011.
  • [21] Gueta, Lounell B., and Akiko Sato. "Classifying road surface conditions using vibration signals." In Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 039-043. IEEE, 2017.
  • [22] Hesami, Reyhaneh, and Kerry J. McManus. "Signal processing approach to road roughness analysis and measurement." In TENCON 2009-2009 IEEE Region 10 Conference, pp. 1-6. IEEE, 2009.
  • [23] Aksamit, Paweł, and Marek Szmechta. "Distributed, mobile, social system for road surface defects detection." In Computational Intelligence and Intelligent Informatics (ISCIII), 2011 5th International Symposium on, pp. 37-40. IEEE, 2011.
  • [24] GoldenCheetah,
  • [25] Mobile phone ranked most used electronic device: Forrester, Mobile Marketer, /research/7473.html
  • [26] Fortune,