Impact of Driving Behavior on Commuter's Comfort during Cab Rides: Towards a New Perspective of Driver Rating

08/24/2021 · Rohit Verma, et al. · University of Cambridge, IIT Kharagpur

Commuter comfort in cab rides affects driver rating as well as the reputation of ride-hailing firms like Uber/Lyft. Existing research has revealed that commuter comfort not only varies at a personalized level but also is perceived differently on different trips by the same commuter. Furthermore, several factors, including driving behavior and the driving environment, affect the perception of comfort. Automatically extracting the perceived comfort level of a commuter due to the impact of the driving behavior is crucial for timely feedback to the drivers, which can help them meet the commuter's expectations. In light of this, we surveyed around 200 commuters who usually take such cab rides and obtained a set of features that impact comfort during cab rides. Following this, we develop a system, Ridergo, which collects smartphone sensor data from a commuter, extracts spatial time series features from the data, and then computes the commuter's comfort level on a five-point scale with respect to the driving. Ridergo uses a Hierarchical Temporal Memory model-based approach to observe anomalies in the feature distribution and then trains a Multi-task learning-based neural network model to obtain the comfort level of the commuter at a personalized level. The model also intelligently queries the commuter to add new data points to the available dataset and, in turn, improves itself over periodic retraining. Evaluation of Ridergo on 30 participants shows that the system provides comfort scores with high accuracy when the driving impacts the perceived comfort.


1. Introduction

The growing success of ride-hailing services (Uber/Lyft) has increased city dwellers' reliance on these firms across the globe, both for daily commuting and intercity travel. While these app-based cab services have grown rapidly, a growing concern is the driver quality of such on-demand cab services (Rogers, 2015; Liu et al., 2018). The app-based cab companies typically employ an open business model, where both the drivers and the riders register themselves by authenticating and validating their details (75). Typically, these cab companies continuously monitor the drivers' performance through the smartphone app's data and, more importantly, through the feedback or driver rating given by the riders at the end of each trip. Such performance metrics are, in general, used for incentivizing the drivers and are therefore extremely important for the operational efficiency of the system.

Although driving performance monitoring through the app-sensed data (primarily the GPS) and the riders' feedback at the end of the trip is crucial to monitor and maintain the service quality and resolve customer grievances, the current approaches have many limitations. First, the feedback or driver rating provided by the rider at the end of the trip gives only a consolidated view of their experience during the ride. It does not capture (a) the instantaneous driving behavior and its impact on the rider's comfort throughout the trip on a temporal scale, or (b) the specific events during the trip which affected the riding experience. For example, a sudden jerk near the end of the trip may significantly affect the driver rating, although the rest of the trip was smooth. Indeed, various recent analyses of Uber driver rating data have indicated that such a rating system is not accurate and also introduces multiple biases depending on the age, gender, demography, and various other factors associated with the rider as well as the riding environment (Hanrahan et al., 2017; Jiang et al., 2018). Second, the feedback or rating is a complex consolidated parameter that combines multiple different factors; for example, the driver's micro-behavior towards the rider impacts the rating significantly (Hanrahan et al., 2017). Therefore, it lacks transparency, where the drivers and the cab companies remain unaware of the low-level factors that affected the rating for a particular ride. Although the riders may provide the reason for a low rating, doing so is, in general, optional. As analyzed from Uber data, most riders either refrain from giving detailed feedback or share biased or random feedback (Hanrahan et al., 2017). Third, the impact of driving on the riding experience is highly personalized, depending on the rider's age, gender, demography, health, mental condition, etc. (Rubira Freixas, 2016; Verma et al., 2018). For example, even within the speed limit, fast driving may cause discomfort to a commuter who is old or physically weak but may make an office-goer happy.

Therefore, understanding the impact of driving behavior on a personal scale is essential for both the drivers and the app-cab companies. Considering a ride-hailing service like Uber, the smartphones of the drivers and the cab riders are typically connected through the ride-hailing service, like the Uber app. Incidentally, a cab rider's smartphone can capture her personal traits, which can also signify her comfort parameters (Chittaranjan et al., 2011). In a collaborative environment, the rider's smartphone can continuously sense the driving data to derive the driving behavior and then correlate it with the commuter's comfort parameters. An application that understands commuter comfort could open doors for other applications, such as (a) a live feedback system for the driver, which provides commuter profile information and suggests which driving actions could make the commuter uncomfortable; the driver can tune or control their driving behavior based on the commuter's personal preference, making the ride more interactive and earning a better rating (Chan, 2019; Raval and Dourish, 2016); (b) the app-cab companies can also match the drivers with the riders based on the driving profile of the driver and the riding preference of the rider.

Technology requirements and associated challenges: Automated systems for generating ratings from behavioral observations can play an essential role in addressing such issues, as the works by Thebault-Spieker et al. (Thebault-Spieker et al., 2017) and Liang et al. (Liang, 2017) have shown, either by utilizing surveys or simulations. However, to address the issues at a practical level and build various other value-added services based on the impact of driving behavior on a rider's riding experience, we need an end-to-end driving profiling toolbox. This mechanism should continuously assess the driving behavior's influence on the rider's comfort and provide critical feedback, recommendations, or alerts to the driver and the cab companies. As mentioned earlier, such a model should capture the riders' personality traits, as different factors have quite distinct impacts on different riders. However, these factors may not carry a direct signature to understand the impact of driving behavior on the commuter's perceived comfort. For example, on a bumpy road, even a good driver may not avoid the jerkiness altogether; therefore, the commuter's discomfort, in this case, is linked to the driving environment and not to the driving behavior. In contrast, the driving behavior can be alarming if a sudden jerk is felt on a smooth road. Therefore, even a personalized learning model alone is not sufficient to capture the commuter's comfort, as comfort also varies widely across different driving environments.

Effectively, the need here is for a model which can (a) not only take decisions at a personalized level but also take into account the differences in road conditions or driving environments, (b) understand the different baseline signatures of factors like acceleration, jerk, congestion, etc. that are associated with the comfort level of the commuter under various driving environments, and (c) estimate the deviation of these signatures from their typical pattern (corresponding to commuter comfort), indicating possible discomfort for the commuter. We also aim to make the profiling online, based on the streaming sensor data (accelerometer, GPS) captured from the commuter's smartphone running the riding app. This ensures that our framework can be used for developing online services, such as alerting the driver during the trip itself if the passenger is likely to feel uncomfortable due to some driving actions.

In this paper, we first develop an application to collect various sensor data from the rider's smartphone while on a cab ride to link the driving style with her comfort perception (§3). The application helps us generate a rich dataset of driving and commuter comfort labels. Our primary contributions relying on the collected dataset are as follows:

  • Based on an online survey and a user study, we define what features affect a rider’s comfort while on a ride (§4).

  • We model the spatio-temporal self-exciting (the value at the current time instance influences the value at the next time instance) features, viz. the speed of the vehicle, jerkiness, and congestion, by analyzing their spatial time series distribution (§5).

  • We develop a Hierarchical Temporal Memory (HTM) (Hawkins and Blakeslee, 2007; Ahmad and Hawkins, 2015) based approach to detect anomalies in the distribution of these spatio-temporal features, which indicate the rider's discomfort (§6).

  • As HTM only detects anomalies for a single feature at a time, we develop a neural network model that maps the discomfort likelihoods of all three features, along with other static features, to the rider's comfort level. Keeping personalization of comfort in mind, we opt for Multi-task Learning (Caruana, 1998) (§7). The model also has a feedback mechanism to improve itself with time (§7.2).

The developed model continuously predicts the rider's comfort level based on the driving behavior and her personality traits, and such information can be used for developing multiple applications in a driver-rider collaborative environment, as stated earlier. As a proof of concept, we implement an automated driving rating system which provides continuous feedback to the driver over the ride-hailing app.

We perform experiments with users to evaluate each block of the system. Finally, we develop a rating application based on the overall comfort felt during the trip, which uses Ridergo as a framework (§8). Following this, we discuss the limitations and possible future directions for this work (§9). Before proceeding to the system's details, we first give a brief survey of the related literature in the next section (§2).

2. Related Work

In this section, we discuss related literature that has built ways to define commuter comfort, a crucial unit of the transport system, and developed systems to compute comfort in different scenarios.

2.1. Commuter Comfort: How is it perceived?

Understanding commuter comfort can be dated back to Mayr (Mayr, 1959), who coined the term traveling comfort, composed of riding, local, and organizational comfort (Oborne, 1978). Local comfort is the comfort felt at stations or airports and takes into consideration factors like comfortable transfers or the condition of waiting rooms. Organizational comfort considers the comfort linked to an organizational origin, like the availability of transport or the reliability of a service. Riding comfort, which is the comfort inside the vehicle, was later quantified by Kottenhoff (Kottenhoff, 2016) based on the experience of vehicle movements like accelerations, shaking, vibration, or jerks. Effectively, it can be linked to the driving style of the driver, which includes instances like uneven driving, heavy braking, sharp acceleration, and jerkiness, as observed by Kottenhoff et al. (Kottenhoff and Sundström, 2011). In the transport research literature, such as (Florio, 2013; Shen et al., 2016; Tirachini et al., 2013) and the references therein, personal interviews are used to measure comfort, which, being time-consuming and labor-intensive, lacks scalability. Furthermore, several works have shown that comfort is a personalized concept. For instance, Clear et al. (Clear et al., 2017) report that the perception of comfort in a building might vary between its occupants. ComfRide (Verma et al., 2018) shows that multiple factors could affect a commuter's perception of comfort in public buses, and every commuter could give preference to a different set of features; this varies with age, sex, occupation, etc. Similarly, works like (64; P. Goel (2016)) have shown similar results for commuters using taxis or ride-sharing options like Uber/Lyft.

2.2. Participatory Sensing as a Cooperative Solution

The advent of several participatory sensing works (Quattrone et al., 2015; Martelaro and Ju, 2017; Cheng et al., 2017; Hossain et al., 2018; Mihoub and Lefebvre, 2019) opened grounds for approaches to understanding commuter comfort from data obtained from multiple commuters. For instance, Cyclopath (Priedhorsky et al., 2012) obtains bikeability ratings from multiple cyclists in a city to recommend the best route for a user. Similarly, PASSAGE (Garvey et al., 2016) recommends safe paths for pedestrians. SmartTransfer (Du et al., 2018) provides a crowd-aware route recommendation system for public transit commuters. Works like CMS (Li and Hensher, 2013), RESen (Song et al., 2012), CommuniSense (Santani et al., 2015), and UrbanEye (Verma et al., 2016b) use the commuter's smartphone sensor information to obtain trip-related features. These works make use of multiple smartphone sensors like GPS, accelerometer, gyroscope, gravity sensor, etc. to obtain such information. Several recent research works have tried to understand commuter comfort in public transport (Eboli et al., 2016; Zhao et al., 2016; Azzoug and Kaewunruen, 2017; Verma et al., 2018; Dunlop et al., 2015). RideComfort (Azzoug and Kaewunruen, 2017) utilizes smartphone sensors to obtain vibration-based ride comfort in train rides. Dunlop et al. (Dunlop et al., 2015) used a smartphone-based survey to observe the comfort perception of a commuter on a transit ride. Other works utilize smartphone sensors to get a perception of commuter comfort on buses (Eboli et al., 2016; Chin et al., ).

2.3. Commuter Comfort in Cabs

Public transport has the privilege of fixed routes and scheduled times, the absence of which adds uncertainty to computing comfort in private cabs. There have been works that compute related factors like driving behavior and driver stress (Muñoz-Organero and Corcoba-Magaña, 2017; Qi et al., 2016; Li et al., 2018; Zhang et al., 2015; Nanni et al., 2016) or the relationship between the driver-commuter pair (Perterer et al., 2013), which could indirectly impact the comfort of a commuter. Eren et al. (Eren et al., 2012) utilize accelerometer, gyroscope, and magnetometer data obtained from a driver's smartphone to compute the driving behavior and estimate the commuting safety of the ride. Verma et al. (Verma et al., 2019) utilize roster information collected from multiple drivers to compute stress and relate that to predicting driving behavior that could cause possible accidents. However, these works utilize the data obtained from the driver and hence cannot be personalized for the commuter. Works which directly target the comfort of a commuter also rely on data either from the car or the driver (Ruzic, 2011; Park et al., 1998). Join Driving (Zhao et al., 2013) performs commuter comfort calculation using accelerometer data obtained from the driver's smartphone. A similar approach is followed by Machaj et al. (Machaj et al., 2020) utilizing smartphone sensors. Park et al. (Park et al., 1998) utilize vibrations observed from the commuter's body using sensors mounted on the seat to perceive comfort. On the other hand, Ruzic et al. (Ruzic, 2011) utilize thermal sensors in the car to compute the comfort of the passenger. Elbanhawi et al. (Elbanhawi et al., 2015) do look into personalized comfort for a passenger, but in the context of autonomous cars.

Work | Sensing Method | Transport Mode | Online/Offline | Comfort Computation | Personalized for Commuter
SmartTransfer (Du et al., 2018) | Public transit transaction records | Public Buses | Offline | Yes (crowd-aware only) | No
ComfRide (Verma et al., 2018) | Commuter's smartphone sensors | Public Buses | Offline | Yes | Yes
Join Driving (Zhao et al., 2013) | Driver's smartphone accelerometer | Cabs | Online | Yes | No
Ruzic et al. (Ruzic, 2011) | Cab mounted sensors | Cabs | Online | Yes | No
Elbanhawi et al. (Elbanhawi et al., 2015) | Cab mounted sensors | Autonomous Cars | Online | Yes | Yes
Ridergo | Commuter's smartphone sensors | Cabs | Online | Yes | Yes
Table 1. Comparing Ridergo with existing works

2.4. Limitation of the Existing Works

In a nutshell, although several works exist on understanding the impact of driving behavior on commuter comfort or the overall riding experience, they have the following limitations. (1) The majority of the works use offline information to understand the driving behavior and its impact on the commute experience. They cannot capture the online and instantaneous impact of the driving behavior on the commute experience, and are therefore limited to applications for offline analysis. (2) The existing approaches fail to separate the impact of environmental factors from driving behavior. For example, Join Driving (Zhao et al., 2013) looks into jerkiness by measuring the acceleration but does not consider whether the jerkiness is due to a bumpy road or due to poor driving behavior. (3) The personalized preferences of the commuters based on age, gender, demography, occupation, etc. have not been captured in the existing models; therefore, the models are not suitable for providing fine-grained recommendations or alerts to the drivers. As Table 1 shows, Ridergo addresses these limitations by utilizing the data from the commuter's smartphone to assess her comfort at a personal level and to understand when a driving style is causing discomfort.

3. Data Collection

We developed an in-house data collection app in order to (a) conduct the pilot experiments and (b) pre-train the models present in the developed Ridergo system. The developed Android application is equipped to collect driving data from smartphone sensors and to label the data based on the perceived comfort on a 5-point scale. The app records the inertial sensor data (accelerometer, gyroscope, magnetometer) and GPS information along with the vehicle speed. Additionally, the app takes comfort rating input from the commuters using a 5-point slider scale (1 being the least discomfort and 5 being the most discomfort). The default value of this scale is set to 1. Whenever a commuter feels discomfort, she can set a new value, which remains the value of the comfort label until updated again by the commuter. Moreover, the app periodically probes the commuter after the last input to check whether the label has changed; if the commuter does not respond, the previous label is retained. The collected data is continuously streamed to a server and stored uniquely for each commuter.
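To make the labeling scheme concrete, the sketch below shows one way the most recent rating could be carried forward over the streamed sensor samples; the pandas-based implementation and the column names are our illustrative assumptions, not the app's actual code.

```python
# Sketch (illustrative, not the app's implementation): attach the most recent
# comfort rating to each streamed sensor sample, defaulting to 1 until the
# commuter provides the first update.
import pandas as pd

def attach_labels(sensor_df: pd.DataFrame, ratings_df: pd.DataFrame) -> pd.DataFrame:
    """sensor_df: time-indexed sensor samples; ratings_df: time-indexed rating
    updates with a 'rating' column. Both indices are datetime-like."""
    merged = pd.merge_asof(
        sensor_df.sort_index(), ratings_df.sort_index(),
        left_index=True, right_index=True, direction="backward",
    )
    # Samples recorded before the first rating update keep the default label of 1
    merged["rating"] = merged["rating"].fillna(1)
    return merged
```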

Participants: 20
Age Group: 20 - 50
Android Version: 6.0 - 8.1
Total trips: 100
Cities: 5
Days: 15
Max-Min Trip Length: 2 - 56 km
Max-Min Trip Time: 5 - 120 mins
Table 2. Data Collection Details

We distributed the developed application among 20 participants, who frequently take cabs for their commute, to collect data in a natural and uncontrolled environment. The participants were asked to start the application when boarding a cab and to stop it on alighting. They were also asked to rate the driving anytime they felt discomfort and to update their rating anytime they felt a change in their perception of comfort. The participating commuters belonged to different age groups and used different models of smartphones, such as the Lenovo K6 Power, Moto G5, Redmi 5, and Redmi Note 5 Pro, with Android versions ranging from 6.0 to 8.1. A brief summary of the data collection experiment is provided in Table 2. This data is used to carry out a set of pilot experiments and to extract essential insights that helped us develop the basic building blocks of Ridergo. The details follow.

4. User Study: Identification of indicators of commuter discomfort

Figure 1. Online survey results. (a) Which features affect you when on a trip? (b) Which part of the trip have you usually felt discomfort? (c) Does the time of the day impact your discomfort?

First, we conduct an online survey on a set of commuters to discover the sources of discomfort experienced by them in their daily commute. Next, we demonstrate the potential of those indicative features for the identification of commuter discomfort and subsequently highlight the challenges in developing a system that can assess the comfort of the commuter leveraging those features.

4.1. Commuter Survey

The objective of this survey is to identify the factors that play a major role in commuter discomfort. The survey was designed as an online Google form (https://tinyurl.com/t93npg5) and was circulated through multiple channels like Facebook, Twitter, and WhatsApp. Additionally, it was shared through email to different mailing groups with which the authors were associated. The survey questionnaire is composed of multiple components. (a) First, the survey collected general information regarding the respondent, like the demography and cab usage frequency of the commuter. (b) Next, it asked the commuters about the factors that affect their comfort during a cab ride. Six options were provided (speed, jerkiness, congestion, weather condition, driver behavior, cab condition). These options were selected as a set of common features from existing works on riding comfort (Verma et al., 2018; Kottenhoff, 2016; Oborne, 1978). The commuters had the flexibility to choose multiple options. Additionally, a text box was provided in case the commuter felt any other factor should be included. (c) Furthermore, the survey queried whether the discomfort was usually felt at the beginning, at the end, or throughout the trip. (d) Finally, the commuter had to report whether the time of the day affected the discomfort she felt on a trip; for a positive response, the commuter was asked to explain the reason in a text box.

4.1.1. Survey Responses

We obtained responses from respondents who avail cab services in different cities in India, the USA, Germany, Denmark, and the Netherlands. More than of these respondents avail cabs regularly (and commuters use them quite frequently). The outcome of the survey is summarized in Fig. 1. The majority of the respondents feel discomfort due to congestion (Fig. 1(a)), followed by jerkiness and vehicle speed. All the other factors, including user-suggested factors like cyclist/pedestrian behavior and honking by other vehicles, collectively received far fewer responses. The responses also showed that most of the commuters face discomfort either at the start or the end of the trip (Fig. 1(b)). Furthermore, the time of the day also affects the discomfort of a majority of the commuters (Fig. 1(c)). This discomfort induced by the trip time is reflected in their illustrative responses, such as "poor driving at late night is more dangerous and hence uncomfortable than in the day" or "it's possible to miss potholes or bumps at night which causes more discomfort".

4.1.2. Lessons learnt

Our survey study reveals that (i) speed of the vehicle, (ii) jerkiness, and (iii) road congestion are the key indicators for assessing the commuter discomfort. Additionally, the segment of a trip, which causes discomfort, can be characterized as (iv) time spent on the trip (travel time) and (v) distance covered on the trip (distance travelled). Moreover, (vi) time of the day, when the commuter is taking the trip would also be an important feature to predict commuter discomfort.

Figure 2. Kernel Density Estimate plot (with Gaussian kernel) of all the three features for samples of comfort and discomfort instances on a single trip. (a) Speed (b) Jerkiness (c) Congestion

4.2. Opportunities and Challenges

We conduct a pilot study to show the potential of the aforesaid indicators in discriminating between comfortable and uncomfortable rides of a commuter. We take the recorded data obtained from the pilot data collection experiment on 20 participants (see Sec 3) and extract the key features: (i) speed of the vehicle, (ii) jerkiness from acceleration data (Nygård and others, 1999), and (iii) road congestion from the inertial sensors (accelerometer, gyroscope, magnetometer), following the standard techniques in (Verma et al., 2016a). We consider the comfort labels 1-3 as comfortable and 4-5 as uncomfortable for this experiment. In Fig. 2, we plot the kernel density estimate, with a Gaussian kernel, of these three features for instances when the commuter is in a comfortable and an uncomfortable state on the same trip. It is interesting to observe that the distribution varies considerably between the discomfort and comfort states for all three features. This points to the fact that just by observing variation in the distribution of the features (speed of the vehicle, jerkiness, and road congestion), one can automatically detect when the commuter starts feeling uncomfortable in a ride.
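The per-trip comparison of Fig. 2 can be reproduced with a few lines of code; the sketch below is a minimal illustration assuming the feature samples and their comfort labels are available as Python arrays (all names are ours, not from the paper).

```python
# Sketch: Gaussian-kernel density estimates of one feature (e.g., jerkiness)
# for comfortable (labels 1-3) vs uncomfortable (labels 4-5) samples of a trip.
import numpy as np
from scipy.stats import gaussian_kde

def kde_curves(values, labels, grid_points=200):
    values, labels = np.asarray(values), np.asarray(labels)
    comfort = values[labels <= 3]
    discomfort = values[labels >= 4]
    grid = np.linspace(values.min(), values.max(), grid_points)
    # Two density curves over a common grid, as plotted in Fig. 2
    return grid, gaussian_kde(comfort)(grid), gaussian_kde(discomfort)(grid)
```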

The above observation provides us with two possible approaches. The first is to develop a model solely dependent on machine learning, which would generate a trained model learned from a large dataset of commuter data. The second approach is to estimate the features at the commuter's comfort perception; any variation (or anomaly) observed over this estimate can then be perceived as commuter discomfort. However, any purely learning-based approach would require a large volume of data, and the learning would be historical. On the contrary, the anomaly detection approach, as we show later, has two advantages. First, it can work with sparse data, removing the need for a large dataset. Second, it performs learning after a trip starts, with a small bootstrapping duration at the beginning of the trip, which provides an option for online learning.

However, to leverage the aforementioned opportunity, we need to address the following challenges. (a) Notably, unlike travel time, distance, and time of the day, which can be directly calculated at any time instance, the other three features (speed of the vehicle, jerkiness, and congestion) vary spatially as well as temporally. Hence, it is non-trivial to estimate these features directly at any time instance; rather, their values can be estimated from the modeled distribution of the features. The first challenge stems from this need to develop suitable spatio-temporal baseline models which can represent those features at the comfort state of the commuter in a ride. (b) Subsequently, any deviation (or anomaly) from the modeled baseline distribution of features (termed the comfort distribution) can be identified as discomfort. Hence, in every new trip, such anomaly likelihood needs to be learned for each feature, starting at the beginning of the trip. Moreover, since the learning starts at the beginning of the trip, the available data is quite sparse. Hence, the second challenge is to develop a model for detecting anomalies from the comfort distribution that can learn well on sparse data. (c) The third challenge arises from the understanding that each commuter is different, and her personality traits should be addressed while designing the models. (d) The performance of the pre-trained model starts deteriorating once (i) the commuter's personal preferences change over time, or (ii) a new commuter starts using the system. In both cases, the pre-trained model fails to capture the comfort distribution and anomaly likelihood. Hence, the fourth challenge is to update and adapt the system with a suitable model retraining mechanism.

Figure 3. Block diagram of the developed system

Keeping the above challenges in mind, we develop Ridergo, which is composed of three broad modules, as shown in Fig. 3: (a) Feature Extractor, which takes care of sensing data and extracting the required features; (b) Discomfort Likelihood Estimator, which estimates the likelihood that the driving could cause discomfort; and (c) Comfort Level Predictor, which, based on the discomfort likelihood, predicts the comfort level of the commuter. Ridergo runs a smartphone app which captures the data and displays the feedback, while the overall processing of the system (primarily the above three modules) runs on a server. The smartphone app periodically sends the collected data to the server and fetches the feedback to display it over the app.

In the following sections we describe each module in detail.

5. Feature Extraction

Taking clues from the user study (Section 4), in this section we introduce the features, which carry the signature of commuter comfort. We rely on the sensor data streams collected from the smartphones to extract those features.

As smartphone sensor data are usually noisy, we perform pre-processing using standard techniques for axis reorientation and data smoothing (Verma et al., 2016b). Following this, we concentrate on the extraction of features, which can be broadly categorized into two classes (a) instantaneous and (b) spatio-temporal features.

5.1. Instantaneous Features

These features can be calculated directly from the sensor data at any time instance. From the commuter survey, we identify three instantaneous features, namely (a) travel time, (b) distance traveled, and (c) time of the day, which may directly impact the discomfort of a commuter. The time of the day is divided into 4 time zones (6 AM - 10 AM (0), 10 AM - 4 PM (1), 4 PM - 10 PM (2), 10 PM - 6 AM (3)) in this paper; however, this division is configurable and would change based on the city's characteristics.
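A minimal sketch of this time-of-day binning is given below; the function name is illustrative.

```python
# Sketch: map a timestamp to the four time-of-day zones used in this paper
# (6 AM-10 AM -> 0, 10 AM-4 PM -> 1, 4 PM-10 PM -> 2, 10 PM-6 AM -> 3).
from datetime import datetime

def time_of_day_zone(ts: datetime) -> int:
    h = ts.hour
    if 6 <= h < 10:
        return 0
    if 10 <= h < 16:
        return 1
    if 16 <= h < 22:
        return 2
    return 3  # 10 PM - 6 AM
```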

5.2. Spatio-Temporal Features

Unlike instantaneous features, these features vary both spatially and temporally and hence are difficult to compute at any time instance. For instance, determining the exact speed of a vehicle at any point is difficult, as it depends on both time and the spatial characteristics of the road the vehicle is driving on. In our survey, we identify three spatio-temporal features: (a) speed, (b) jerkiness, and (c) congestion. The exact values of these features depend on the actual time and location of the vehicle as well as the behavior of neighboring vehicles at the time of computation. Instantaneous values of these spatio-temporal features do not directly indicate commuter discomfort. Rather, at any point in the journey, we may estimate the instantaneous value of the spatio-temporal feature at that time and location and then compute the discomfort likelihood based on the deviation of the feature values from their baseline (comfort) distribution as perceived by the commuter in previous trips (details in Sec 7).

The instantaneous speed can be obtained from the GPS sensor. The instantaneous value of jerk is computed as the rate of change of acceleration, j(t) = (a_y(t) - a_y(t - Δt)) / Δt, within a sampling window Δt (Nygård and others, 1999), where a_y(t) is the acceleration along the y-axis at time t. On the other hand, the instantaneous value of congestion (c) can be calculated by observing the stop-move-stop-move pattern of the acceleration along the y-axis (Verma et al., 2017). Let the time period for the stop-move pattern be τ; we then mark medium congestion (1) when τ exceeds a lower threshold and high congestion (2) when τ exceeds a higher threshold. Otherwise, the congestion value is set to zero.
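The sketch below illustrates the two computations; the congestion thresholds are placeholders of our own, since the paper's exact values are not reproduced here.

```python
# Sketch: instantaneous jerk from the y-axis acceleration stream and a coarse
# congestion level from the observed stop-move period. Threshold values are
# illustrative assumptions, not the authors' settings.
import numpy as np

def jerk(a_y: np.ndarray, dt: float) -> np.ndarray:
    """Rate of change of y-axis acceleration within a sampling window dt."""
    return np.diff(a_y) / dt

def congestion_level(stop_move_period: float,
                     medium_thresh: float = 30.0,   # seconds, assumed
                     high_thresh: float = 90.0) -> int:
    """0: no congestion, 1: medium congestion, 2: high congestion."""
    if stop_move_period >= high_thresh:
        return 2
    if stop_move_period >= medium_thresh:
        return 1
    return 0
```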

It is interesting to note that although features like congestion and the time of the day cannot be explicitly controlled by the cab driver, it is nevertheless important to observe how the cab driver deals with such scenarios; this discriminates efficient driving from poor driving and impacts the (relative) comfort of the commuter. By modeling the comfort distribution for congestion (as a spatio-temporal feature), we infer the driving behavior which provides (relative) comfort to the commuter in congestion.

6. Discomfort signature from the spatio-temporal features

Notably, detecting commuter comfort from spatio-temporal features is not trivial; the instantaneous values of the spatio-temporal features would not directly provide a measure of comfort. For example, on a bumpy road, the jerkiness is likely to be higher – even an expert driver cannot avoid that completely. However, in this case, although the commuter may feel discomfort, it is not due to the driving behavior, rather due to the driving environment. Even a personalized model does not help, as the trip environment, like road condition, congestion, etc. may vary for each trip, which may affect the commuter comfort.

In order to address this issue, we develop a model which can identify the commuter's discomfort at a personalized level on any trip. This model has two important steps. First, we model the baseline distribution of the spatio-temporal features perceived at the comfort state of the commuter; we call these distributions the comfort distribution. Importantly, the comfort distribution would exhibit different behavior on different trips. For instance, the distribution of jerkiness on a bumpy road would be different from that on a smooth highway. In the second step, we aim to estimate the spatio-temporal features from the extracted sensor data at any point of time of the trip, observe their deviation from the baseline comfort distribution, and compute the likelihood of commuter discomfort. Hence, we train a learning model on the comfort distribution, which can then compute the deviation of the estimated distribution of the spatio-temporal features from the comfort distribution for the commuter on that trip. We designate this deviation as the discomfort likelihood. The details follow.

6.1. Step 1: Modeling Comfort Distribution of Spatio-Temporal Features

We now focus on modeling the distribution of speed, jerkiness, and congestion at the comfort state of the commuter, which are represented as spatial time series. We start with the speed, which at any time instance could take any value in a metric space; however, it is always dependent on the time instance and occurs over the period $[0, T]$, where $T$ is the total trip time of the commuter. Moreover, past speed history impacts the current speed of the vehicle. Consequently, we model the speed as a self-exciting temporal point process (Reinhart and others, 2018). Hawkes proposed the concept of a self-exciting temporal point process (Hawkes, 1971) based on the notion of causality, i.e., if an event occurs, another event becomes more likely to occur locally in time. If $\mathcal{H}_t$ is the history of all speed events in a trip for which the commuter felt comfortable up to time $t$, then the conditional intensity (Rasmussen, 2011), which characterizes the speed process, is represented as

$$\lambda(t \mid \mathcal{H}_t) = \mu(t) + \sum_{t_i < t} g(t - t_i) \qquad (1)$$

Here, $\mu(t)$ is the background rate of the speed events, describing how the likelihood of the speed values evolves in time. $g(\cdot)$ is called the triggering kernel, which regulates the influence of recent history vs. older history on the current value of speed (Hawkes, 1971).

Next, we turn toward the other two features, jerkiness and congestion. Unlike speed, both of these features are affected by the time as well as the spatial information. For instance, the congestion observed by a vehicle at some location $s$ is obviously regulated by the current time $t$. Nevertheless, the congestion felt by that vehicle at time $t$ also gets affected by the actions of nearby vehicles present at that location, attributing the role of the spatial factor. Similarly, the jerkiness of the vehicle is impacted by the spatial characteristics of the road. We suitably extend the temporal point process of Eq. 1 to model the jerkiness and congestion as a self-exciting spatio-temporal point process (Reinhart and others, 2018). The conditional intensity function which characterizes a spatio-temporal self-exciting process for a feature $f$ (where $f$ could be jerkiness or congestion), with events at times $t_i$ and locations $s_i$, can be expressed as

$$\lambda_f(s, t \mid \mathcal{H}_t) = \mu(s, t) + \sum_{i:\, t_i < t} g(s - s_i, t - t_i) \qquad (2)$$
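To make the point-process notation concrete, the sketch below evaluates the conditional intensity of Eq. 1 for a constant background rate and an exponential triggering kernel; both choices are illustrative assumptions, as the paper does not prescribe specific functional forms here.

```python
# Sketch: conditional intensity lambda(t | H_t) = mu(t) + sum_{t_i < t} g(t - t_i)
# of a self-exciting (Hawkes) temporal point process, with mu constant and an
# exponential kernel g(dt) = alpha * beta * exp(-beta * dt) (illustrative choices).
import numpy as np

def hawkes_intensity(t: float, history: np.ndarray,
                     mu: float = 0.1, alpha: float = 0.5, beta: float = 1.0) -> float:
    past = history[history < t]                    # events observed before time t
    return mu + np.sum(alpha * beta * np.exp(-beta * (t - past)))
```

The spatio-temporal variant of Eq. 2 follows the same structure, with the kernel additionally decaying in the spatial distance between events.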

6.2. Step 2: Discomfort Estimation from the Spatio-Temporal Features

In this section, we compute the discomfort likelihood as the deviation of the observed distribution of the spatio-temporal features during a trip with respect to the modeled comfort distribution. During a trip, we estimate the observed spatio-temporal features, speed, jerkiness, and congestion, from the recorded smartphone sensor stream (Sec 5.2). We develop an HTM-based model which we first train on the comfort distribution. Hence, this HTM model allows us to predict the spatio-temporal features under the assumption of a comfortable state of the commuter. At run time, during a trip, the model computes the deviation between the predicted and observed features as the discomfort likelihood, indicating commuter discomfort. Fig 4 summarises the procedure.

6.2.1. Predicting spatio-temporal features from comfort distribution

We develop a Hierarchical Temporal Memory (HTM) (Hawkins and Blakeslee, 2007; Ahmad and Hawkins, 2015) model (see Fig 4) to predict the spatio-temporal features at time t, assuming the comfort state of the commuter. First, we semantically encode the instantaneous value of a spatio-temporal feature as a sparse array called the Sparse Distributed Representation (SDR) through a spatial pooler. Then, using the comfort distribution for each of the three spatio-temporal features, obtained from Eq. 1 and 2, we train the HTM model in the temporal pooler such that the predicted representation matches the encoded input. In this way, the HTM model gets trained to predict any spatio-temporal feature (from the comfort distribution) at a given time t.

Figure 4. HTM Architecture and Anomaly Detection

6.2.2. Estimating anomaly: deviation of observed and predicted features:

The Hierarchical Temporal Memory (HTM) model is equipped to detect anomalies in the sparse data obtained from the commuter's smartphone. At run time during a trip, the instantaneous value of the observed spatio-temporal feature is fed as input to the trained HTM model and is again represented as an SDR using the encoder. The temporal pooler, on the other hand, predicts the expected value at the comfortable state of the commuter. Given the observed representation and the predicted representation of the current feature input, the prediction error is computed as the fraction of active bits in the observed representation that were not predicted; it is 0 for an accurate prediction and 1 for a completely orthogonal prediction.
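A minimal sketch of this score, with SDRs represented as Python sets of active bit indices (a simplification of the actual HTM representation), is given below.

```python
# Sketch: HTM-style prediction error between the SDR predicted from the comfort
# distribution and the SDR of the observed feature value.
def prediction_error(predicted_sdr: set, observed_sdr: set) -> float:
    if not observed_sdr:
        return 0.0
    overlap = len(predicted_sdr & observed_sdr)
    # 0 for a perfect prediction, 1 when the representations are orthogonal
    return 1.0 - overlap / len(observed_sdr)
```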

6.2.3. Discomfort Likelihood Calculation:

Notably, the prediction error only shows the instantaneous predictability of the system. For instance, a sudden brake may or may not indicate poor driving. Thus, a threshold on the prediction error would not be a proper measure of commuter discomfort. Rather, the HTM model relies on the distribution of errors as a discomfort metric. It stores a window W of the last prediction errors as raw anomaly scores and models their distribution as a rolling normal distribution. Given the sample mean $\mu_t$ and variance $\sigma_t^2$ in W, HTM then calculates a recent short-term average of the raw anomaly scores and computes the discomfort likelihood $L_t$ based on the Gaussian tail probability (Q-function) (Karagiannidis and Lioumpas, 2007):

$$L_t = 1 - Q\!\left(\frac{\tilde{\mu}_t - \mu_t}{\sigma_t}\right) \qquad (3)$$

where $\tilde{\mu}_t$ is the sample mean over the short-term moving average window W', with W' < W.

We calculate this likelihood score for all three spatio-temporal features, (i) the speed of the vehicle, (ii) the jerkiness observed, and (iii) the congestion on the road, from the recorded smartphone sensor data during the trip of a commuter.
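Under the assumption that the raw anomaly scores are kept in a simple list, Eq. 3 can be computed as in the sketch below; the default window sizes are illustrative placeholders, since the paper tunes W and W' empirically (Table 5).

```python
# Sketch: discomfort likelihood L_t from a long window W of raw anomaly scores
# and a short-term window W' (Eq. 3). Window sizes are illustrative defaults.
import numpy as np
from scipy.stats import norm

def discomfort_likelihood(raw_scores, W: int = 4000, W_short: int = 10) -> float:
    window = np.asarray(raw_scores[-W:])
    mu, sigma = window.mean(), window.std()
    if sigma == 0:
        return 0.0
    mu_short = np.asarray(raw_scores[-W_short:]).mean()
    # Q(x) is the Gaussian tail probability; norm.sf is the survival function
    return 1.0 - norm.sf((mu_short - mu) / sigma)
```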

7. Development of Ridergo 

In this section, we develop Ridergo, which infers the comfort level of a commuter based on the driving quality during a trip. The core of Ridergo is a Multi-task Learning (MTL) model which leverages the discomfort likelihoods of the spatio-temporal features, along with the instantaneous features, to indicate the commuter comfort on a 5-point scale. Ridergo captures the personality traits of the individual commuter and also adapts and retrains itself for a newly joined commuter or when an existing commuter changes her preferences. It is interesting to note that, in principle, the proposed MTL model for commuter comfort detection may indeed work on the raw data. However, the MTL model would require a massive volume of raw data to automatically learn the complex spatio-temporal features, as stated before, and it would take a long time for the loss to converge. As we wish to develop a personalized MTL model equipped to predict the rider's comfort in real time on a trip, the availability of a sufficient volume of data on a trip is a significant challenge. This challenge is compounded if we allow the MTL model to automatically learn those complex features from raw data; fast convergence of the loss is another issue. Hence, in our data-constrained environment, we handcraft those complex features (as the discomfort likelihoods of the spatio-temporal features) and feed them to the MTL model, which allows us to 'quickly' train the model with a 'reasonably sparse volume' of data.

7.1. MTL driven comfort detection

We develop the model to identify the commuter's perception of driving using the Multi-task Learning technique (Caruana, 1998). The perception of each commuter is taken as a separate task, thus taking into consideration the personality trait of the commuter regarding the driver's driving style. Additionally, the model ensures robust learning by sharing the data across multiple tasks to learn the features of one commuter (one task) from related commuters. The model provides an indicator vector of dimension 5 (for the 5-point comfort scale), designating a probability for each comfort level (ranging from completely comfortable (1) to highly uncomfortable (5)). The comfort level with the highest probability is inferred as the perceived comfort of the commuter.

Figure 5. Architecture of an MTL-NN

Effectively, as shown in Fig. 5, the Multi-task Learning Neural Network (MTL-NN) model learns the features at two levels, the shared and the task-specific levels. The input feature vector, obtained from the spatio-temporal and instantaneous features of the previous module, is fed into the model. The next layer is the shared layer, which contains a set of hidden nodes; the parameters of these nodes are shared across all the tasks. This shared layer enables inductive transfer, which improves learning for one task (say, the impact of congestion on one commuter) by using the information contained in the training signals of other related tasks (say, the impact of congestion on other commuters who are similar to the commuter in some way). This improves the overall model performance, since some features may be easy to learn for Commuter 1 while being difficult to learn for Commuter 2; this might occur because Commuter 2 interacts with those features in a more complex way than Commuter 1. The shared layer allows the model to eavesdrop on Commuter 1 and learn the features for Commuter 2.

Simultaneously, MTL-NN allows a few hidden nodes to become specialized in capturing the comfort perception of just one commuter (i.e., specialized in one task); this personalized computation, capturing the characteristics of the specific commuter, is carried out in the final task-specific layer. In this layer, the computation for one specific commuter can ignore the hidden nodes connected to other commuters by keeping the weights connected to them small, as they do not appear useful. In this layer, the learning mechanism maps the generalized information learned at the shared layer to a final prediction personalized by the characteristics of the specific commuter (task).
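A compact sketch of such an architecture, written in PyTorch with illustrative layer sizes (the paper does not specify them at this point), is shown below.

```python
# Sketch: MTL-NN with a shared trunk and one task-specific head per commuter,
# mirroring Fig. 5. Layer sizes and the single shared layer are illustrative.
import torch
import torch.nn as nn

class MTLComfortNet(nn.Module):
    def __init__(self, n_features: int, n_commuters: int, hidden: int = 32):
        super().__init__()
        # shared layer: parameters reused across all commuters (tasks)
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        # task-specific heads: one 5-way output per commuter
        self.heads = nn.ModuleList([nn.Linear(hidden, 5) for _ in range(n_commuters)])

    def forward(self, x: torch.Tensor, commuter_id: int) -> torch.Tensor:
        # returns the 5-dimensional indicator vector (comfort levels 1 to 5)
        return torch.softmax(self.heads[commuter_id](self.shared(x)), dim=-1)
```

During training, each commuter's mini-batches would update the shared layer and only that commuter's head, typically with a cross-entropy loss on the head's logits.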

7.2. Model adaptation and retraining

We initially train Ridergo on the pilot data collected in Section 3. However, Ridergo is equipped to adapt itself once the performance of the model drops significantly. Precisely, as the confidence of the comfort prediction deteriorates, the model occasionally probes the commuter for ground-truth comfort levels (without resulting in survey fatigue). The drop in prediction confidence is determined from the probabilities in the indicator vector; comparable probability values across different comfort levels in the vector indicate a compromise in the prediction quality (in our implementation, we set a threshold on the difference between the highest and next-highest probability in the indicator vector for probing). Subsequently, the commuter responses are uploaded to the server to retrain the MTL-NN model with the newly collected labeled data. This allows Ridergo to enrich the dataset with more data points, both from existing and newly joined commuters, and, in turn, to improve the model by training on a higher volume of data.
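The probing trigger can be implemented as a simple check on the indicator vector; the threshold value below is an illustrative placeholder for the configurable difference the paper mentions.

```python
# Sketch: probe the commuter for a ground-truth label when the gap between the
# two highest probabilities in the indicator vector falls below a threshold.
import numpy as np

def should_probe(indicator: np.ndarray, threshold: float = 0.1) -> bool:
    top_two = np.sort(indicator)[-2:]          # two largest probabilities
    return (top_two[1] - top_two[0]) < threshold
```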

8. Evaluation

We followed a client-server model for implementing Ridergo. The server takes care of the major computation tasks like feature extraction, discomfort likelihood computation, and comfort level calculation, while the client handles the data collection, shows the computed driving feedback to the commuter, and logs the commuter's responses about their feedback on the driving. The Discomfort Likelihood Computation Model and the Comfort Level Predictor are both written in Python and run on a Debian 9.3 server with an Intel(R) Xeon(R) E5-2620 v3 @ 2.40GHz CPU, 32GB memory, and a 16GB GPU. The client is built for Android and was published on the Google Play Store. We performed a measurement study of the app using the Android Profiler Toolkit on three devices: Lenovo K6 Power (Android v6.0.1, API 23, sampling rate of 3s), Moto G5 (Android v7, API 25, sampling rate of 3s), and Samsung J8 (Android v8, API 26, sampling rate of 3s). The measurement study shows that the application's average CPU and memory utilization over an hour, as well as its share of the hourly battery consumption, remained small, and the energy consumption was rated light overall.

Figure 6. App UI (a) App interface showing comfort levels at intervals and impact of speed, congestion and jerkiness over the comfort. (b) The overall rating unit of the app. (1 - Most Comfortable, 5 - Most Uncomfortable)

Fig 6(a) shows the UI of the app, indicating the projected comfort level of the commuter and the percentage impact of speed, congestion, and jerkiness on the comfort level. The app is available on the Google Play Store (https://play.google.com/store/apps/details?id=com.rohit.ridecomfort), with some of its users also having taken part in the data collection experiment for the pilot study and model training (as mentioned in Section 3). The data from the remaining users has not been used for model training and has been used entirely to test the performance of Ridergo. The users were advised to install the application on their smartphones and start the application every time they took a cab ride; once started, the app could be sent to the background. The commuters were also requested to provide proper feedback whenever the system queried them. The smartphone sampled data in a fixed window, set as per the literature threshold for jerkiness calculation (Nygård and others, 1999), and the same window was used for all the other features. If not connected to the Internet, the information was stored in a temporary file and uploaded to the server when the smartphone reconnected. For the experiment phase, the users were also asked to run the data collection application to obtain the ground-truth labels. The dataset details are given in Table 3.

Count: 7431376
Percentage of Count for each rating - 1: 2, 2: 32, 3: 33, 4: 32, 5: 1
Mean: 3
Table 3. Dataset Statistics

In this section, we first provide the evaluation of the complete system compared to other existing systems. Following this, we look into the performance of the Discomfort Likelihood Estimator sub-module, followed by the Comfort Level Predictor. Finally, we provide a use case of the complete system by showing its usage in a driver-rating application.

8.1. Competing Systems

We compare Ridergo with two competing systems which also provide commuter comfort levels using smartphone data. As there are not many such systems available, we compare with one system developed for buses (Chin et al., ) (which could easily be extended to cabs) and another developed for cars (Zhao et al., 2013). Both of these models could be used for online comfort level computation. Additionally, we also develop another model which is similar to Ridergo but is trained over each commuter in isolation.

8.1.1. Chin et al. (Chin et al., )

This work provides a method which utilizes statistical analysis using the classification and regression tree method to compute commuter comfort. They utilize kinematic data collected from the commuter's smartphone and label the comfort into three levels (No discomfort (1), Noticeable discomfort (2), and Annoying discomfort (3)). We implemented the model on the available data and generated a pruned tree; the terminal nodes were then labeled with the three comfort labels. As Ridergo is on a 5-point scale, we use the standard Likert scale relabeling strategy (67) with integral labels to map between the two scales.

8.1.2. Join Driving (Zhao et al., 2013)

Join Driving obtains commuter comfort from acceleration data only, on a 6-point scale, utilizing the International Standard 2631-1-1997 (ISO, 1997). They compute the vibration felt by the commuter using the total value of the weighted root mean squared acceleration, combining the vibration along all three axes into a single weighted RMS value. Again using the relabeling strategy, we map these labels onto a 3-point scale.
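A sketch of the weighted RMS combination is given below; the axis multiplying factors are illustrative placeholders, and the exact weighting in ISO 2631-1 should be consulted for a faithful implementation.

```python
# Sketch: total value of the weighted RMS acceleration combining the three axes,
# in the spirit of ISO 2631-1. The multiplying factors kx, ky, kz are assumed.
import numpy as np

def weighted_rms_total(ax, ay, az, kx: float = 1.4, ky: float = 1.4, kz: float = 1.0) -> float:
    rms = lambda a: float(np.sqrt(np.mean(np.square(a))))
    awx, awy, awz = rms(ax), rms(ay), rms(az)
    return float(np.sqrt((kx * awx) ** 2 + (ky * awy) ** 2 + (kz * awz) ** 2))
```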

8.1.3. Single Task Learning

In the Single Task Learning (STL) approach, we use the same architecture as Ridergo, while replacing the MTL-NN with an STL-NN. Thus, the model has to learn over each commuter in isolation whenever a new commuter installs the application. Here also, we perform the mapping to a 3-point scale as in (Chin et al., ).

Comfort Level | 1 | 2 | 3 | Avg
Ridergo | 0.87 | 0.86 | 0.889 | 0.873
STL | 0.73 | 0.753 | 0.726 | 0.712
(Chin et al., ) | 0.71 | 0.721 | 0.68 | 0.704
(Zhao et al., 2013) | 0.53 | 0.658 | 0.547 | 0.578
Table 4. AUC values for the Competing Systems

We ran the three models along with Ridergo over multiple trips. In each of these trips, the comfort levels provided by each of the models were recorded simultaneously. However, as (Chin et al., ) has only three levels, we map the comfort levels onto a 3-point scale. Table 4 shows the AUC values for the individual comfort levels as well as the average AUC. We take into account factors like congestion and the time of the day in addition to the kinematic data to compute comfort, which helps improve the result when compared to (Chin et al., ) and (Zhao et al., 2013). The shared learning and personalization aspects of the model give Ridergo an edge over the STL model, as STL-based models cannot capture personality traits the way MTL can (Caruana, 1998). This lack of personalization is also a shortcoming of the other two approaches.

8.2. Performance Evaluation: Discomfort Likelihood Estimator

Ridergo doesn’t exactly give an explicit result showing if a distribution is anomalous. Instead, it uses the discomfort likelihood () to analyze the driving behavior. In order to evaluate the HTM module, we threshold this over a configurable parameter . We say an anomaly is detected if  (Ahmad et al., 2017). Usually, the standard value of is used as , which we used for our model also. We set and (described in Section 6.2.3) as and , respectively, which we obtained empirically as shown in Table 6. We compare the HTM model with two competing models – (a) Multinomial Relative Entropy (Wang et al., 2011) and (b) EXPected Similarity Estimation (EXPoSE) (Schneider et al., 2016)

. These models which are state-of-the-art anomaly detection models were selected keeping in mind that the algorithms should; (a) make online predictions, (b) learn continuously and in an unsupervised fashion, (c) adapt to dynamic environment changes and (d) should make anomaly detection as early as possible. Both these algorithms have open-source implementation 

444https://github.com/numenta/NAB/tree/master/nab/detectors (Access: August 26, 2021). We performed parameter tuning empirically and set the thresholds at our end, as mentioned below. These were kept fixed across all streams of data.

8.2.1. Multinomial Relative Entropy (RE) (Wang et al., 2011)

This algorithm compares the observed data against multiple null hypotheses while representing frequencies of quantized data over certain window sizes. In the implementation, we empirically tuned the window size and the bin count. The chi-square threshold, which is used to determine whether a hypothesis has occurred frequently, was also set empirically.

8.2.2. EXPoSE (Schneider et al., 2016)

The EXPected Similarity Estimation (EXPoSE) approach is based on the likelihood of the current data point being normal, computed from the inner product of its feature map with the kernel Hilbert space embedding of the older data points, with no assumption about the underlying data distribution. We use the decay variant of EXPoSE, which provides better results compared to windowing (Schneider et al., 2016). Here, we tuned the decay factor empirically.

Figure 7. Change in AUC with percentage of data used for training for all the three features.
Figure 8. Change in AUC with percentage of data used for training for all the competing models.

8.2.3. Impact of data stream size

The prime reason behind using HTM, as discussed before, was its ability to work well with sparse data, which could be utilized to detect anomalies early in the trip. In order to check how early the system can catch such anomalies, we performed another experiment. We trained the model with the data available from only an initial fraction of the trip and then tested over the incoming stream of data for the remainder of the trip. We measure the Area under the Receiver Operating Characteristics (ROC) Curve (AUC), which indicates how well the model can separate anomalies in the given input data stream. We use AUC to measure the performance of the HTM model as the number of anomalous cases is much smaller than the number of non-anomalous cases. The AUC results averaged over all the trips for the three features (jerkiness, speed, and congestion) are given in Fig. 7. As is evident, we get a good AUC score even early in the trip, after which it almost stabilizes, reaching high values for speed and jerkiness. Thus, for all our experiments, we start predicting after an initial fraction of the trip is completed.

Fig. 8 shows the AUC values averaged over all three features for the three models. Compared to the competing models, the HTM-based model provides better accuracy quite early in a trip. Relative Entropy also provides acceptable accuracy over the trip. However, EXPoSE improves only after it receives considerable data for online training and eventually becomes almost equal to Relative Entropy and comparable to HTM. The figure indicates that HTM converges much faster than Relative Entropy and EXPoSE and is therefore much more suitable for a real-time prediction problem.

W W’ Speed Jerkiness Congestion
3000 5 0.81 0.73 0.71
4000 10 0.83 0.85 0.76
4000 15 0.57 0.56 0.8
5000 5 0.61 0.84 0.53
6000 10 0.78 0.51 0.66
Table 6. Comparison of AUC results with existing models. HTM performs better than RE and EXPoSE.
Model Speed Jerkiness Congestion
RE 0.65 0.74 0.54
EXPoSE 0.4 0.47 0.21
HTM 0.83 0.86 0.78
Table 5. Change in AUC on varying and . Here we have only shown the best 5 combinations.

8.2.4. Anomaly detection performance

Table 6 gives the mean AUC results of the anomaly detection module for all three features, compared to the existing models. The online training is done for the initial part of the trip, after which the simultaneous learning and prediction phase starts. EXPoSE, being highly dependent on the size of the dataset, provides inferior results, as the data it receives in the first minutes of the trip is not sufficient for its convergence, as we have seen earlier. The entropy-based approach also performs poorly, as it is known to provide comparatively poor results when the features show both spatial and temporal variation at the same time.

8.3. Performance Evaluation: Comfort Level Predictor

The model was trained using the data collected during the data collection phase (Section 3). The Feature Extractor calculated all the features required by the model, and the discomfort likelihoods were then obtained from the HTM model. Following this, the discomfort likelihood scores for the three spatio-temporal features (speed, congestion, and jerkiness), along with the three instantaneous features (time of the day, distance traveled, and time traveled), were fed into the model together with the labels obtained from the commuters. The model was then trained using a loss function for softmax regression (Heckerman and Meek, 1997), with separate portions of the data used for training and validation; the remaining data was used for testing.
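As an illustration of the training objective only (not the full MTL-NN architecture described earlier), the following hedged PyTorch sketch maps the six input features to the five comfort labels with a softmax-regression (cross-entropy) loss; the hidden-layer size and optimizer settings are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the comfort-level predictor: six input features
# (three HTM discomfort likelihoods + three instantaneous features) mapped to
# the five comfort labels, trained with a softmax-regression loss.
model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 5))
criterion = nn.CrossEntropyLoss()            # cross-entropy over softmax outputs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(features, labels):
    """features: (batch, 6) float tensor; labels: (batch,) ints in 0..4."""
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```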

We evaluate the trained MTL-NN model over the data collected from the ten volunteers who took part in the experiments in Section 3. The discomfort likelihood scores are obtained from the HTM module for all three spatio-temporal features, and the remaining three instantaneous features are obtained directly from the Feature Extractor. As discussed earlier, Ridergo labels the data points on a scale from 1 (highly comfortable) to 5 (highly uncomfortable); however, as most of the data points are labeled between 1-3, considering this unbalanced dataset, we compute the AUC (Huang and Ling, 2005; Galar et al., 2011). Moreover, in light of the multi-class classification, we utilize a forced binary classification using the one-versus-all approach: each label in turn is considered the success class and all the other labels combined form the failure class. We then plot the ROC for each of these instances and report the AUC aggregated over the number of classes. The AUC results for the five instances, along with the aggregate, are given in Table 7, where we obtained an average AUC score of 0.876. It can be observed that the AUC for label 5 (highly uncomfortable) is the highest, which can mostly be linked to extreme scenarios that cause high discomfort for a commuter at a personal level and would have quite distinctive characteristics compared to the other labels. In Fig. 9, we plot the variation of speed and jerkiness with respect to comfort level for two users; one of them is highly affected by speed variations while the other by jerkiness. As can be seen, the characteristic for level 5 is quite extreme and easily distinguishable from the other labels in both scenarios.
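A minimal sketch of this one-versus-all AUC computation, assuming scikit-learn and class probabilities from the predictor, is given below; function and variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def per_label_and_average_auc(y_true, y_prob, labels=(1, 2, 3, 4, 5)):
    """Sketch of the one-versus-all AUC evaluation for the five comfort labels.

    `y_true` holds the commuter-provided labels and `y_prob` the
    (n_samples, 5) class probabilities from the comfort-level predictor.
    Each label in turn is treated as the success class, the rest as failure.
    """
    y_bin = label_binarize(y_true, classes=list(labels))   # (n, 5) indicator matrix
    per_label = {
        label: roc_auc_score(y_bin[:, i], y_prob[:, i])
        for i, label in enumerate(labels)
    }
    per_label["average"] = float(np.mean(list(per_label.values())))
    return per_label
```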

Figure 9. Variation of a feature with respect to comfort level of a user who is primarily affected by the same. (1 - Most Comfortable, 5 - Most Uncomfortable)
Table 7. AUC scores for all labels.
Label 1 2 3 4 5 Average
AUC 0.876 0.864 0.861 0.884 0.893 0.876

Table 8. Sobol Total Order Indices (TOI) for the six features.
Feature
TOI 0.89 0.85 0.80 0.75 0.71 0.66

In order to obtain the classification importance of each of the six features, we performed sensitivity analysis (Saltelli et al., 2000). Sensitivity analysis is the study of the relative influence of different input factors on the model output. We used Sobol Total Order Indices (Sobol, 1993) for the sensitivity analysis, as they converge to the exact relative contributions and interactions of the input factors with respect to the variability in the output. The results are given in Table 8, and we observe that the total order confidence is sufficiently small for each feature, confirming that the sample size provided is sufficient for the analysis and that the measured indices are significant. We observe that congestion, followed by the time of the day, has the highest impact on the discomfort a commuter feels, which also seems intuitive: congestion is associated with long waiting times, and taking a trip at night or early morning usually makes a commuter more uncomfortable with even small variations in the driving. However, this need not be true for all commuters, as is evident from Fig. 10, where we show the impact of different features on 10 randomly chosen users. This brings out the personalization aspect clearly, as we can see that each commuter is affected by different features differently.
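The following sketch outlines how such Sobol Total Order Indices can be estimated. It uses the SALib library and a placeholder predictor with illustrative feature bounds; the paper does not prescribe a specific tool, so these are assumptions.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Illustrative problem definition: six model inputs assumed normalized to [0, 1].
problem = {
    "num_vars": 6,
    "names": ["speed", "jerkiness", "congestion",
              "time_of_day", "distance", "travel_time"],
    "bounds": [[0, 1]] * 6,
}

def predict_discomfort(x):
    """Placeholder for the trained predictor; returns a scalar discomfort score."""
    return float(np.dot(x, np.linspace(1, 2, 6)))

param_values = saltelli.sample(problem, 1024)           # Saltelli sampling scheme
y = np.array([predict_discomfort(row) for row in param_values])
indices = sobol.analyze(problem, y)
total_order = dict(zip(problem["names"], indices["ST"]))  # total order index per feature
```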

Figure 10. Impact of features on 10 users. Darker value implies higher impact of the feature on a user.
Figure 11. Impact of data augmentation. Scenarios: (1) Data from any one new commuter (2) Data from all new commuters (3) New data from all commuters

8.3.1. Impact of data augmentation

We also observed the impact of data augmentation on the model, where we retrained the model over the data collected during the testing phase of the experiments. Here, the commuter was asked to provide labels whenever the model was nearly ambiguous about predicting the label (decided based on comparable classification probability between two or more classes, as discussed earlier). In order to perform this experiment, we observed the results when we received new data in three scenarios: (1) One new app user: As mentioned earlier, only five of the users were involved in the data collection experiment described in Section 3. Whenever one of the new users was polled for feedback, the data was tagged as obtained from a new user. In this scenario, we only considered the impact on the model when we added data from one of these new commuters to the existing dataset. (2) All new app users: In this scenario, data from any new user was added to the dataset, and the corresponding impact on the model was observed. (3) All app users: In this experiment, we collected feedback from all the users who had used the application.

Following this, we trained and tested the model with separate splits for training, validation, and testing. It should be noted that for the first scenario, the test was done for all new app users, and the final AUC was calculated as an average over the results of adding the data of any one new user to the existing data. As observed from Fig. 11, data from one new user improves the results, but not by much. However, adding data from all the new users improves the model considerably. Nevertheless, once the model has learned over all 30 users, adding further data improves the performance only marginally.

We also noted the instances when the model requested user feedback owing to a nearly ambiguous prediction; a minimal sketch of this ambiguity check follows the two scenarios below. There were two scenarios where such a drop in confidence could occur:

Existing Commuter: In this scenario, we consider any existing user who has used the app for at least a week and observe that, on average, a drop in confidence occurs in only 5% of the cases. This is mostly attributed to (rare) changes in commuter preference within a trip under almost similar conditions (say, in a trip, she initially preferred moderate speed but, at the last leg of the trip, preferred high speed to reach the destination quickly).

New Commuter: In this scenario, we consider the case where a new user joined the experiment. This new commuter’s comfort labels are initially predicted from the existing model, trained on the few existing commuters exhibiting similarity with the new commuter (the similarity is handled by the MTL-NN). Expectedly, the proposed model makes mistakes for these new commuters, exhibiting a drop in confidence in the indicator vectors. Consequently, frequent retraining was required initially for these new commuters (on average, 35% of labels required commuter feedback), but the need gradually decreased with time.
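A minimal sketch of the ambiguity check that triggers these feedback requests is given below; the margin threshold is an illustrative assumption.

```python
import numpy as np

def needs_feedback(class_probs, margin=0.1):
    """Sketch of the ambiguity check that triggers a feedback request.

    `class_probs` is the softmax output over the five comfort labels. If the
    top two probabilities are within `margin` of each other, the prediction is
    treated as nearly ambiguous and the commuter is polled for a label.
    """
    top_two = np.sort(np.asarray(class_probs))[-2:]
    return (top_two[1] - top_two[0]) < margin
```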

8.4. Application: Driver Rating System based on Ridergo 

In this subsection, we provide a prototype application where we use Ridergo to assist the commuters. It should be noted, though, that the following application is just a proof-of-concept to show the utility of Ridergo and can be further modified and looked into as a separate research problem.

Ratings are an essential aspect of companies like Uber/Lyft; they affect the driver’s commission, the number of rides, and, in pressing cases, can lead to the loss of their job (68). However, commuters are usually conflicted when giving a rating to a driver unless the ride has been poor (60; 63). This, in turn, profoundly affects the driver as well as the company’s reputation. An application that takes cues from the driving and provides a suitable rating to the driver would thus be quite useful, and Ridergo could serve as a framework for such an application. We added a module to our application (Fig. 6(b)) which averages the comfort rating throughout the trip to rate the driver. It also shows the impact of speed, congestion, and jerkiness over the complete trip comfort, averaged over the individual values. As can be observed in the figure, we also asked the commuters to provide a comfort rating for the ride, which was stored on our server as the ground-truth value.

In order to calculate the agreement between the calculated and user ratings, we use Kendall’s coefficient of concordance $W$ (Kendall and Smith, 1939), which is a good metric for such 5-point rating scales (27). It is calculated as $W = \frac{12\sum_{i}(R_i - \bar{R})^2}{m^2(n^3 - n)}$, where $R_i$ is the total rank given to rating $i$, $\bar{R}$ is the mean of the $R_i$, and $m$ and $n$ are the number of competing rating systems and the number of ratings, respectively; here $m = 2$. Calculating $W$ over all the responses, we observed a value that is considered a good agreement as per the existing literature (39).
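For reference, a small sketch of how Kendall’s W can be computed from the two sets of ratings is given below; ties are ignored in this minimal version, and the input layout is an assumption.

```python
import numpy as np

def kendalls_w(ratings):
    """Sketch of Kendall's coefficient of concordance.

    `ratings` is an (m, n) array: m competing rating systems (here the
    Ridergo-computed rating and the commuter-provided rating, so m = 2)
    and n rated items.
    """
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape
    # Rank the n items within each rating system, then total the ranks per item.
    ranks = np.argsort(np.argsort(ratings, axis=1), axis=1) + 1
    R = ranks.sum(axis=0)                       # total rank per item
    S = np.sum((R - R.mean()) ** 2)
    return 12.0 * S / (m ** 2 * (n ** 3 - n))
```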

9. Discussion

Although Ridergo shows considerable promise as a system to assess commuter comfort at a personalized level, which could be utilized by many other services, in this section we discuss some limitations and future directions to improve the overall system.

9.1. Incorporating Additional Features for Model Improvement

Ridergo focuses on two generic feature classes – (i) instantaneous features (time of the day, distance traveled, and time traveled) and (ii) spatio-temporal features (speed, congestion, and jerkiness) – rather than specific feature variations, and uses directly available quantitative features to develop the model. However, there could be several non-quantifiable features that also impact the commuter’s comfort. For instance, personalized features such as whether the commuter is in a hurry, the weather condition, or laptop or phone usage are more qualitative in nature. A possible direction could be to take such information as input from the user. For instance, the expected travel time could be an input from the user, which could be normalized by the average travel time on the route to measure urgency; the OpenWeatherMap Weather API could be used to get the weather condition on a three-point scale (Verma et al., 2019); and binary inputs could be taken from the user for laptop or phone usage. The MTL model that we have used for comfort level prediction is generic and is expected to provide proper predictions when such features are included with a suitable quantitative mapping.
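The following sketch illustrates one possible quantitative mapping for such qualitative inputs; the urgency normalization, the weather grouping, and the function names are assumptions rather than a tested design.

```python
def urgency_score(expected_minutes, route_average_minutes):
    """Normalize the commuter-supplied expected travel time by the route's
    average travel time; values below 1.0 suggest the commuter is in a hurry."""
    return expected_minutes / max(route_average_minutes, 1e-6)

def weather_to_scale(condition):
    """Map a weather condition string (e.g. as reported by a weather API)
    onto a three-point scale. The grouping shown is an illustrative assumption."""
    good = {"clear", "clouds"}
    moderate = {"mist", "drizzle", "haze"}
    return 1 if condition in good else 2 if condition in moderate else 3

def device_usage_flag(using_laptop_or_phone):
    """Binary self-reported input for laptop or phone usage."""
    return 1 if using_laptop_or_phone else 0
```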

9.2. Improving the Rating System

The rating system discussed in this paper is a simple proof-of-concept to show the utility of Ridergo. The main goal of this work is to develop a methodology that connects the commuter’s comfort with the driver rating system while computing the comfort solely from the travel parameters, without explicitly asking the commuter, and thus eliminating rating bias. However, any rating system is primarily linked with the business policy of the cab companies; therefore, the cab companies can adopt a more sophisticated rating system, which might even vary across companies, while considering the commuter’s comfort as one of the important parameters.

9.3. Other Applications of Ridergo 

There are several other directions we could look into as potential applications of Ridergo. For the commuter, an application could provide more information from historical ratings or route-based comfort information. Another application could make the commuter’s comfort state, along with the reason behind it, available to the driver; that way, the driver could take the necessary steps, where possible, to improve the commuter’s comfort. Moreover, if both are using the application, profiles could be shared so that the driver can make better driving decisions and the commuter is prepared for the ride. This could be further extended to a driver recommendation system based on profile matching.

10. Conclusion

In recent times, there has been an increasing demand for comfortable ride-sharing options like Uber, Lyft, etc., in contrast to public transport. As these ride-sharing companies rely heavily on the ratings the drivers receive from the commuters, it has become imperative to maintain the comfort level of a commuter taking the ride. In light of this, we develop a system, Ridergo, which understands the comfort needs of a commuter at a personalized level and computes whether a specific driving style at a given time on the trip is causing discomfort to the commuter. Based on an online survey and a pilot study, we identify the features that could affect the comfort of a commuter. We then use a Hierarchical Temporal Memory and Multi-task learning-based model to compute whether any change in the distribution of the three spatial time-series features – speed, jerkiness, and congestion – along with other static trip information, is causing discomfort to a commuter, and to what level.

Furthermore, we add another feature in Ridergo, which checks whether the current computation of the comfort level is nearly ambiguous and, if so, requests the commuter for feedback; this enriches the dataset on which further training makes the model robust and scalable to both new and existing users. Thorough experiments with Ridergo show that it not only computes the comfort levels effectively but also understands to what extent a feature affects a particular commuter’s comfort, thus efficiently capturing the personal comfort needs of the commuter. Such a system, which computes commuter discomfort at a personalized level, could be utilized for several applications, like driver rating, alerting a driver of a commuter’s discomfort, or assigning drivers to a commuter based on their comfort profile. We have built a comfort rating application to show the utility of the comfort calculation framework. Further detailed research in this line could help build much more efficient applications utilizing the perception of commuter comfort during a cab ride.

References

  • S. Ahmad and J. Hawkins (2015) Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469. Cited by: 3rd item, §6.2.1.
  • S. Ahmad, A. Lavin, S. Purdy, and Z. Agha (2017) Unsupervised real-time anomaly detection for streaming data. Neurocomputing 262, pp. 134–147. Cited by: §8.2.
  • A. Azzoug and S. Kaewunruen (2017) RideComfort: a development of crowdsourcing smartphones in measuring train ride quality. Frontiers in Built Environment 3, pp. 3. Cited by: §2.2.
  • R. Caruana (1998) Multitask learning. In Learning to learn, pp. 95–133. Cited by: 4th item, §7.1, §8.1.3.
  • N. K. Chan (2019) The rating game: the discipline of uber’s user-generated ratings. Surveillance and Society 17 (1/2), pp. 183–190. Cited by: §1.
  • S. Cheng, C. Chen, T. Kandappu, H. C. Lau, A. Misra, N. Jaiman, R. Tandriansyah, and D. Koh (2017) Scalable urban mobile crowdsourcing: handling uncertainty in worker movement. ACM Transactions on Intelligent Systems and Technology (TIST) 9 (3), pp. 1–24. Cited by: §2.2.
  • [7] H. Chin, X. Pang, and Z. Wang Analysis of bus ride comfort using smartphone sensor data. Cited by: §2.2, §8.1.1, §8.1.3, §8.1.3, §8.1, Table 4.
  • G. Chittaranjan, J. Blom, and D. Gatica-Perez (2011) Who’s who with big-five: analyzing and classifying personality traits with smartphones. In 2011 15th Annual International Symposium on Wearable Computers, pp. 29–36. Cited by: §1.
  • A. K. Clear, S. M. Finnigan, P. Olivier, and R. Comber (2017) “I’d want to burn the data or at least nobble the numbers”: towards data-mediated building management for comfort and energy use. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, pp. 2448–2461. Cited by: §2.1.
  • B. Du, Y. Cui, Y. Fu, R. Zhong, and H. Xiong (2018) SmartTransfer: modeling the spatiotemporal dynamics of passenger transfers for crowdedness-aware route recommendations. ACM Transactions on Intelligent Systems and Technology (TIST) 9 (6), pp. 1–26. Cited by: §2.2, Table 1.
  • I. N. Dunlop, J. M. Casello, and S. T. Doherty (2015) Tracking the transit rider experience: using smartphones to measure comfort and well-being throughout the trip. Technical report Cited by: §2.2.
  • L. Eboli, G. Mazzulla, and G. Pungillo (2016) Measuring bus comfort levels by using acceleration instantaneous values. Transportation research procedia 18, pp. 27–34. Cited by: §2.2.
  • M. Elbanhawi, M. Simic, and R. Jazar (2015) In the passenger seat: investigating ride comfort measures in autonomous cars. IEEE Intelligent Transportation Systems Magazine 7 (3), pp. 4–17. Cited by: §2.3, Table 1.
  • H. Eren, S. Makinist, E. Akin, and A. Yilmaz (2012) Estimating driving behavior by a smartphone. In 2012 IEEE Intelligent Vehicles Symposium, pp. 234–239. Cited by: §2.3.
  • M. Florio (2013) Network industries and social welfare: the experiment that reshuffled european utilities. OUP Oxford. Cited by: §2.1.
  • M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera (2011) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42 (4), pp. 463–484. Cited by: §8.3.
  • M. Garvey, N. Das, J. Su, M. Natraj, and B. Verma (2016) Passage: a travel safety assistant with safe path recommendations for pedestrians. In Companion Publication of the 21st International Conference on Intelligent User Interfaces, pp. 84–87. Cited by: §2.2.
  • P. Goel (2016) Private personalized dynamic ride sharing. Ph.D. Thesis. Cited by: §2.1.
  • B. Hanrahan, M. Ning, and Y. Chien Wen (2017) The roots of bias on uber. In Proceedings of 15th European Conference on Computer-Supported Cooperative Work-Exploratory Papers, Cited by: §1.
  • A. G. Hawkes (1971) Spectra of some self-exciting and mutually exciting point processes. Biometrika 58 (1), pp. 83–90. Cited by: §6.1.
  • J. Hawkins and S. Blakeslee (2007) On intelligence: how a new understanding of the brain will lead to the creation of truly intelligent machines. Macmillan. Cited by: 3rd item, §6.2.1.
  • D. Heckerman and C. Meek (1997) Models and selection criteria for regression and classification. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, pp. 223–228. Cited by: §8.3.
  • H. S. Hossain, S. R. Ramamurthy, M. A. A. H. Khan, and N. Roy (2018) An active sleep monitoring framework using wearables. ACM Transactions on Interactive Intelligent Systems (TiiS) 8 (3), pp. 1–30. Cited by: §2.2.
  • J. Huang and C. X. Ling (2005) Using auc and accuracy in evaluating learning algorithms. IEEE Transactions on knowledge and Data Engineering 17 (3), pp. 299–310. Cited by: §8.3.
  • ISO (1997) Mechanical vibration and shock: evaluation of human exposure to whole-body vibration. part 1, general requirements: international standard iso 2631-1: 1997 (e). ISO. Cited by: §8.1.2.
  • S. Jiang, L. Chen, A. Mislove, and C. Wilson (2018) On ridesharing competition and accessibility: evidence from uber, lyft, and taxi. In Proceedings of the 2018 World Wide Web Conference, pp. 863–872. Cited by: §1.
  • [27] (2019) Kappa statistics for Attribute Agreement Analysis (available online). Note: https://support.minitab.com/en-us/minitab/18/help-and-how-to/quality-and-process-improvement/measurement-system-analysis/how-to/attribute-agreement-analysis/attribute-agreement-analysis/interpret-the-results/all-statistics-and-graphs/kappa-statistics/ Cited by: §8.4.
  • G. K. Karagiannidis and A. S. Lioumpas (2007) An improved approximation for the gaussian q-function. IEEE Communications Letters 11 (8), pp. 644–646. Cited by: §6.2.3.
  • M. G. Kendall and B. B. Smith (1939) The problem of m rankings.. Annals of mathematical statistics. Cited by: §8.4.
  • K. Kottenhoff and J. Sundström (2011) Samband mellan körstil och åkkomfort: förbättringspotentialen inom kollektivtrafiken. In Transportforum 2011, Cited by: §2.1.
  • K. Kottenhoff (2016) Driving styles and the effect on passengers:-developing ride comfort indicators. Cited by: §2.1, §4.1.
  • H. Li, H. Wang, L. Liu, and M. Gruteser (2018) Automatic unusual driving event identification for dependable self-driving. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems, pp. 15–27. Cited by: §2.3.
  • Z. Li and D. A. Hensher (2013) Crowding in public transport: a review of objective and subjective measures. Journal of Public Transportation 16 (2), pp. 6. Cited by: §2.2.
  • Y. Liang (2017) Knowledge sharing in online discussion threads: what predicts the ratings?. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW ’17, New York, NY, USA, pp. 146–154. External Links: ISBN 9781450343350, Link, Document Cited by: §1.
  • M. Liu, E. Brynjolfsson, and J. Dowlatabadi (2018) Do digital platforms reduce moral hazard? the case of uber and taxis. Technical report National Bureau of Economic Research. Cited by: §1.
  • J. Machaj, P. Brida, O. Krejcar, M. Petkovic, and Q. Shi (2020) Development of smartphone application for evaluation of passenger comfort. Intelligent Information and Database Systems, pp. 249–259. External Links: ISBN 9789811533808, ISSN 1865-0937, Link, Document Cited by: §2.3.
  • N. Martelaro and W. Ju (2017) WoZ way: enabling real-time remote interaction prototyping & observation in on-road vehicles. In Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing, pp. 169–182. Cited by: §2.2.
  • R. Mayr (1959) Comfort in railway travel. The Railway Gazette, pp. 266–269. Cited by: §2.1.
  • [39] (2019) Measurement Systems Analysis Reference Manual, 4th edition (available online). Note: http://www.rubymetrology.com/add_help_doc/MSA_Reference_Manual_4th_Edition.pdf Cited by: §8.4.
  • A. Mihoub and G. Lefebvre (2019) Wearables and social signal processing for smarter public presentations. ACM Transactions on Interactive Intelligent Systems (TiiS) 9 (2-3), pp. 1–24. Cited by: §2.2.
  • M. Muñoz-Organero and V. Corcoba-Magaña (2017) Predicting upcoming values of stress while driving. IEEE Transactions on Intelligent Transportation Systems 18 (7), pp. 1802–1811. Cited by: §2.3.
  • M. Nanni, R. Trasarti, A. Monreale, V. Grossi, and D. Pedreschi (2016) Driving profiles computation and monitoring for car insurance crm. ACM Transactions on Intelligent Systems and Technology (TIST) 8 (1), pp. 1–26. Cited by: §2.3.
  • M. Nygård et al. (1999) A method for analysing traffic safety with help of speed profiles. Cited by: §4.2, §5.2, §8.
  • D. Oborne (1978) Passenger comfort—an overview. Applied Ergonomics 9 (3), pp. 131–136. Cited by: §2.1, §4.1.
  • S. Park, W. Cheung, Y. Cho, and Y. Yoon (1998) Dynamic ride quality investigation for passenger car. SAE transactions, pp. 1198–1204. Cited by: §2.3.
  • N. Perterer, P. Sundström, A. Meschtscherjakov, D. Wilfinger, and M. Tscheligi (2013) Come drive with me: an ethnographic study of driver-passenger pairs to inform future in-car assistance. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW), pp. 1539–1548. Cited by: §2.3.
  • R. Priedhorsky, D. Pitchford, S. Sen, and L. Terveen (2012) Recommending routes in the context of bicycling: algorithms, evaluation, and the value of personalization. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (CSCW), pp. 979–988. Cited by: §2.2.
  • L. Qi, M. Zhou, and W. Luan (2016) Impact of driving behavior on traffic delay at a congested signalized intersection. IEEE Transactions on Intelligent Transportation Systems 18 (7), pp. 1882–1893. Cited by: §2.3.
  • G. Quattrone, L. Capra, and P. De Meo (2015) There’s no such thing as the perfect map: quantifying bias in spatial crowd-sourcing datasets. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW), pp. 1021–1032. Cited by: §2.2.
  • J. G. Rasmussen (2011) Temporal point processes: the conditional intensity function. Lecture Notes, Jan. Cited by: §6.1.
  • N. Raval and P. Dourish (2016) Standing out from the crowd: emotional labor, body labor, and temporal labor in ridesharing. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, pp. 97–107. Cited by: §1.
  • A. Reinhart et al. (2018) A review of self-exciting spatio-temporal point processes and their applications. Statistical Science 33 (3), pp. 299–318. Cited by: §6.1, §6.1.
  • B. Rogers (2015) The social costs of uber. U. Chi. L. Rev. Dialogue 82, pp. 85. Cited by: §1.
  • M. Rubira Freixas (2016) Effects of driving style on passengers comfort: a research paper about the influence of the bus driver´ s driving style on public transport users. Cited by: §1.
  • D. Ruzic (2011) Improvement of thermal comfort in a passenger car by localized air distribution. ACTA Technica Corviniensis-Bulletin of Engineering 4 (1), pp. 63. Cited by: §2.3, Table 1.
  • A. Saltelli, K. Chan, E. M. Scott, et al. (2000) Sensitivity analysis. Vol. 1, Wiley New York. Cited by: §8.3.
  • D. Santani, J. Njuguna, T. Bills, A. W. Bryant, R. Bryant, J. Ledgard, and D. Gatica-Perez (2015) Communisense: crowdsourcing road hazards in nairobi. In Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services, pp. 445–456. Cited by: §2.2.
  • M. Schneider, W. Ertel, and F. Ramos (2016) Expected similarity estimation for large-scale batch and streaming anomaly detection. Machine Learning 105 (3), pp. 305–333. Cited by: §8.2.2, §8.2.2, §8.2.
  • X. Shen, S. Feng, Z. Li, and B. Hu (2016) Analysis of bus passenger comfort perception based on passenger load factor and in-vehicle time. SpringerPlus 5 (1), pp. 1–10. Cited by: §2.1.
  • [60] (2019) Should I Give This Perfectly Good Uber Driver A 1-Star Rating? (available online). Note: https://onemileatatime.com/what-to-rate-uber-driver/ Cited by: §8.4.
  • I. M. Sobol (1993) Sensitivity estimates for nonlinear mathematical models. Mathematical modelling and computational experiments 1 (4), pp. 407–414. Cited by: §8.3.
  • C. Song, J. Wu, M. Liu, H. Gong, and B. Gou (2012) Resen: sensing and evaluating the riding experience based on crowdsourcing by smart phones. In 2012 Eighth International Conference on Mobile Ad-hoc and Sensor Networks (MSN), pp. 147–152. Cited by: §2.2.
  • [63] (2019) The rating game (available online). Note: https://www.theverge.com/2015/10/28/9625968/rating-system-on-demand-economy-uber-olive-garden Cited by: §8.4.
  • [64] (2019) The Ride-Hailing Mobile Application for Personalized Travelling (available online). Note: http://www.ccsenet.org/journal/index.php/mas/article/view/0/37286 Cited by: §2.1.
  • J. Thebault-Spieker, D. Kluver, M. A. Klein, A. Halfaker, B. Hecht, L. Terveen, and J. A. Konstan (2017) Simulation experiments on (the absence of) ratings bias in reputation systems. Proc. ACM Hum.-Comput. Interact. 1 (CSCW). External Links: Link, Document Cited by: §1.
  • A. Tirachini, D. A. Hensher, and J. M. Rose (2013) Crowding in public transport systems: effects on users, operation and implications for the estimation of demand. Transportation research part A: policy and practice 53, pp. 36–52. Cited by: §2.1.
  • [67] (2019) Transforming different Likert scales to a common scale (available online). Note: ibm.com/support/pages/transforming-different-likert-scales-common-scale Cited by: §8.1.1.
  • [68] (2019) Uber will now deactivate riders with below average ratings (available online). Note: https://www.theverge.com/2019/5/29/18644143/uber-deactivate-rider-below-average-rating Cited by: §8.4.
  • R. Verma, S. Ghosh, N. Ganguly, B. Mitra, and S. Chakraborty (2017) Smart-phone based spatio-temporal sensing for annotated transit map generation. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 16. Cited by: §5.2.
  • R. Verma, S. Ghosh, M. Saketh, N. Ganguly, B. Mitra, and S. Chakraborty (2018) Comfride: a smartphone based system for comfortable public transport recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems, pp. 181–189. Cited by: §1, §2.1, §2.2, Table 1, §4.1.
  • R. Verma, S. Ghosh, A. Shrivastava, N. Ganguly, B. Mitra, and S. Chakraborty (2016a) Unsupervised annotated city traffic map generation. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 59. Cited by: §4.2.
  • R. Verma, B. Mitra, and S. Chakraborty (2019) Avoiding stress driving: online trip recommendation from driving behavior prediction. In 2019 IEEE International Conference on Pervasive Computing and Communications (PerCom), pp. 1–10. Cited by: §2.3, §9.1.
  • R. Verma, A. Shrivastava, B. Mitra, S. Saha, N. Ganguly, S. Nandi, and S. Chakraborty (2016b) UrbanEye: an outdoor localization system for public transport. In Proceedings of the 35th Annual IEEE International Conference on Computer Communications, Cited by: §2.2, §5.
  • C. Wang, K. Viswanathan, L. Choudur, V. Talwar, W. Satterfield, and K. Schwan (2011) Statistical techniques for online anomaly detection in data centers. In 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops, pp. 385–392. Cited by: §8.2.1, §8.2.
  • [75] (2019) WHAT IS THE BUSINESS MODEL OF UBER? (available online). Note: https://www.proschoolonline.com/blog/what-is-the-business-model-of-uber Cited by: §1.
  • L. Zhang, F. Liu, and J. Tang (2015) Real-time system for driver fatigue detection by rgb-d camera. ACM Transactions on Intelligent Systems and Technology (TIST) 6 (2), pp. 1–17. Cited by: §2.3.
  • H. Zhao, L. Guo, and X. Zeng (2016) Evaluation of bus vibration comfort based on passenger crowdsourcing mode. Mathematical Problems in Engineering 2016. Cited by: §2.2.
  • H. Zhao, H. Zhou, C. Chen, and J. Chen (2013) Join driving: a smart phone-based driving behavior evaluation system. In 2013 IEEE Global Communications Conference (GLOBECOM), pp. 48–53. Cited by: §2.3, §2.4, Table 1, §8.1.2, §8.1.3, §8.1, Table 4.