Road Network Metric Learning for Estimated Time of Arrival

Recently, deep learning has achieved promising results in Estimated Time of Arrival (ETA), the task of predicting the travel time from the origin to the destination along a given path. One of the key techniques is to use embedding vectors to represent the elements of the road network, such as the links (road segments). However, the embedding suffers from the data sparsity problem: many links in the road network are traversed by too few floating cars, even on large ride-hailing platforms like Uber and DiDi. Insufficient data leaves the embedding vectors under-fitted, which undermines the accuracy of ETA prediction. To address the data sparsity problem, we propose the Road Network Metric Learning framework for ETA (RNML-ETA). It consists of two components: (1) a main regression task to predict the travel time, and (2) an auxiliary metric learning task to improve the quality of link embedding vectors. We further propose the triangle loss, a novel loss function that improves the efficiency of metric learning. We validated the effectiveness of RNML-ETA on large scale real-world datasets, showing that our method outperforms the state-of-the-art model and that the improvement concentrates on the cold links with few data.




I Introduction

Intelligent Transportation System (ITS) aims to explore better transportation options for human beings and better relationships among users, vehicles and transportation infrastructures [6, 7]. Nowadays, with massive spatio-temporal data, artificial intelligence plays an increasingly important role in ITS by leveraging data-driven methods to analyze traffic patterns, and has obtained promising results in many ITS tasks [31, 26, 9].

Estimated Time of Arrival (ETA) is one of the most fundamental and challenging problems in ITS. It is the task of predicting the travel time from an origin location to a destination location along a given route. An ETA model enables the transportation system to efficiently schedule vehicles to control the increasing urban traffic congestion [5]. Due to the rapid growth of ride-hailing apps such as Uber and DiDi, ETA has attracted more and more attention in recent years. An accurate ETA system can significantly improve the operating efficiency of ride-hailing platforms by influencing route planning, navigation, carpooling, vehicle dispatching and scheduling. The left part of Fig. 1 shows a real case of ETA.

Existing ETA methods can be divided into two categories. The first one is the additive methods, which explicitly predict the travel time for each road segment and give the total travel time of a route by summing the segments' travel times. These methods have intuitive interpretability, but the prediction may be inaccurate when local errors accumulate. The other one is the overall methods, which directly predict the overall travel time of the route by formulating ETA as a regression problem. For example, the Wide-Deep-Recurrent (WDR) model [26] uses a neural network to predict the travel time based on a rich set of input features. This kind of method avoids local error accumulation but has relatively weak interpretability because it uses a black-box model.

Fig. 1: The conceptual demonstration of RNML-ETA. The left part shows a real case in which the ETA system predicts the travel time along the route starting from the green pin to the red pin. The route consists of a sequence of links. To alleviate the data sparsity problem, we propose to transfer the knowledge of hot links to the cold links by metric learning. The links' similarity is measured using their speed distributions.

We refer to the road segments as links in the remainder of this paper. The technique of embedding [1, 18, 19] is widely used, especially in deep learning ETA models, to capture the spatio-temporal patterns of links, as they are among the most fundamental elements of the road network. Each link is represented by an embedding vector which encodes the link's semantic information through sufficient iterations during the training process. Though ride-hailing platforms collect millions of trajectories per day, the embedding vectors still suffer from the data sparsity problem of the road network: many links are traversed by too few floating cars. For cold links, which are covered by few trajectories, the training of their embedding vectors may end in an under-fitting status. Thus, the travel time estimate may have a large error if a route goes through cold links.

To alleviate the data sparsity problem, we propose a novel ETA model named RNML-ETA. The model leverages multi-task learning [2] and consists of a main task predicting the travel time and an auxiliary task performing metric learning, in which the similarity between links is measured by their speed distributions. Via metric learning, similar links move close together and dissimilar links move apart in the embedded space. Thus, the embedding vectors of cold links receive sufficient training, which significantly improves the ETA accuracy. Moreover, we propose a novel loss function for metric learning, the triangle loss, which takes more interactions into consideration in one update. To achieve this, we switch the roles of links among the anchor, positive and negative samples. A conceptual demonstration of RNML-ETA is given in Fig. 1.

The main contributions of this paper are three-fold:

  • To the best of our knowledge, RNML-ETA is the first deep learning method that effectively addresses the data sparsity problem of road networks.

  • We propose a novel metric learning framework to improve the quality of link embedding vectors. The similarity of links can be measured using the speed distribution of links which can be computed from existing ETA data, requiring no extra information. We also propose the novel triangle loss to improve the learning efficiency of metric learning.

  • We conducted a comprehensive evaluation of our method on large scale real-world datasets containing over 100 million trajectories. The experimental results validate that RNML-ETA significantly improves the performance compared to a state-of-the-art deep learning method.

The rest of this paper is organized as follows. Section II reviews the related works. Section III introduces our method RNML-ETA in detail. Section IV gives the experimental results on the large-scale real-world datasets. Section V is a conclusion of this paper.

II Related Work

Estimated Time of Arrival.

As one of the fundamental problems in intelligent transportation systems, ETA has attracted extensive study in both the academic and industrial communities. ETA models can be divided into two categories. The first category is the additive methods that explicitly estimate the travel time for each link and give the prediction of a route by assembling the ingredients' travel time. Rule-based methods can be used to estimate link travel time; for example, a simple rule dividing the link length by the link travel speed is widely used in industry. Learning-based methods, such as the dynamic Bayesian network [11], gradient boosted regression trees [29], least-square minimization [28] and pattern matching [3], are also used to mine traffic patterns and predict a link's travel time. The data sparsity problem of road networks is discussed in [25]: a portion of links are traversed by too few trajectories. To alleviate the data sparseness, the authors of [25] propose to represent the trips as a tensor and utilize tensor decomposition to complete the missing values. However, dealing with data sparsity is still a challenging problem for ETA.

The second category is the overall methods that directly predict the overall travel time of the given route. Early methods such as TEMP [24] and the time-dependent landmark graph [27] use traditional machine learning to predict the travel time. Recently, due to the bloom of deep learning [16, 13, 15], neural network models for ETA have developed rapidly. MURAT [17] uses feed-forward neural networks to predict the travel time from the origin to the destination without a given path; multi-task learning and graph embedding are used in MURAT to narrow the accuracy gap to path-based methods. DeepTTE [23] proposes a geo-convolution operation to encode coordinate information and uses a recurrent neural network to learn the travel time along a GPS sequence. Since the GPS sequence cannot be acquired until the trip is finished, DeepTTE resamples the GPS points by uniform distance at training stage and generates pseudo points according to a planned route at inference stage. The WDR model [26] uses a wide linear part and a deep neural network to learn trip-level information, and a recurrent neural network to learn the fine-grained sequential information in the route. The authors of [8, 14] transform the map information into an image sequence and adopt convolutional neural networks to mine spatial correlations for ETA. In these deep learning methods, the embedding of geographical elements, such as the link embedding in [17, 26] and the grid embedding in [30], plays an important role. The embedding technique suffers from the data sparsity problem as well, because insufficient data leaves the embedding vectors under-fitted.

Metric learning. The goal of metric learning is to learn a representation function that maps objects into an embedded space. The distance in the embedded space should preserve the objects’ similarity — similar objects get close and dissimilar objects get far away. Various loss functions have been developed for metric learning. For example, the contrastive loss [4] guides the objects from the same class to be mapped to the same point and those from different classes to be mapped to different points whose distances are larger than a margin. Triplet loss [21] is also popular, which requires the distance between the anchor sample and the positive sample to be smaller than the distance between the anchor sample and the negative sample. The case with one positive sample and multiple negative samples is extended in [22]. Metric learning often suffers from slow convergence, partially because the loss only captures limited interaction in one update.

III Methodology

We describe the road network as a set of links {l_1, l_2, ..., l_M}, where M is the total number of links in the map and each link has an ID ranging from 1 to M. We then give the definition of the ETA learning problem, which is essentially a regression task:

Definition III.1

ETA Learning. Suppose we have a collection of historical trips {(t_i, a_i, r_i, p_i)}_{i=1}^{P}, where P stands for the total trip number, t_i is the departure time, a_i is the arrival time, r_i is the driver ID and p_i is the travel path of the i-th trip. Our goal is to fit a model that can predict the travel time given the departure time, the driver ID and the travel path. The ground-truth travel time can be computed as y_i = a_i − t_i. The travel path is represented as a sequence of links p_i = (c_{i,1}, c_{i,2}, ..., c_{i,T_i}), where c_{i,j} is the ID of the j-th link in the i-th sequence and T_i is the length of p_i.
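As a minimal sketch of this data layout (the field names are ours, not from the paper), each trip record and its ground-truth label, arrival time minus departure time, could look like:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Trip:
    """One historical trip record, following Definition III.1.

    Field names are illustrative, not from the paper.
    """
    departure_time: float   # Unix timestamp of departure
    arrival_time: float     # Unix timestamp of arrival
    driver_id: int
    path: List[int]         # sequence of link IDs along the route

    @property
    def travel_time(self) -> float:
        # Ground-truth label: arrival time minus departure time.
        return self.arrival_time - self.departure_time

trip = Trip(departure_time=1_600_000_000.0,
            arrival_time=1_600_000_900.0,
            driver_id=42,
            path=[3, 17, 17, 8])
print(trip.travel_time)  # 900.0 seconds
```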

We introduce the overall framework of the proposed method in Section III-A, define the measurement of link similarity in Section III-B and introduce the details of our metric learning loss in Section III-C.

III-A Overall Framework

We first construct a rich feature set from the raw information of the trips. For example, according to the departure time, we can obtain the time slice in a day (every 5 minutes) and the day of the week. The features can be categorized into two types: (1) the sequential features, which are extracted from the travel path p_i. For a link c_{i,j}, we denote its feature vector as x_{i,j}, and get a feature matrix X_i for the i-th trip. Note that the sequential feature has variable size; in other words, the column number of X_i is decided by the path length; and (2) the non-sequential features, which are independent of the travel path, e.g., the day of the week. They are represented as a feature vector u_i with a fixed size.

The link embedding vector is an important component of the link feature vector x_{i,j}. For a link with ID = k, we look up an embedding table E and use its k-th column e_k as a distributed representation of the link [1]. The table E is randomly initialized and is updated during training by gradient descent to encode the semantic information of links. The link feature vector is a concatenation of the embedding, the link length and the link's travel speed:

x_{i,j} = [e_k ; len_k ; v_{i,j}]

The link's length len_k is obtained by geographical survey, and the travel speed v_{i,j} is the average speed of the floating cars that traversed the link within the latest time window (e.g., 10 minutes).
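As a sketch of this feature construction (the dimensions and helper name are illustrative, not the paper's), the embedding-table lookup and concatenation could be implemented as:

```python
import numpy as np

rng = np.random.default_rng(0)

M, d = 1000, 16               # number of links, embedding size (illustrative)
E = rng.normal(size=(d, M))   # embedding table, one column per link

def link_feature(link_id, link_length_m, recent_speed_kmh):
    """Concatenate a link's embedding column, its length and its latest
    average travel speed into one feature vector."""
    return np.concatenate([E[:, link_id],
                           [link_length_m, recent_speed_kmh]])

x = link_feature(link_id=7, link_length_m=230.0, recent_speed_kmh=34.5)
print(x.shape)  # (18,): embedding size plus two scalar features
```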

The amount of data significantly affects the quality of embedding vectors. For example, in the natural language processing field, Word2vec [19] cannot generate meaningful embedding vectors for rare words that occur in very few sentences. On ride-hailing platforms, the data coverage of the road network is still not satisfactory even though there are already millions of floating cars. We refer to links traversed by plenty of trips as hot links, and those traversed by only a few or even zero trips as cold links. The hot links' embedding vectors can be well trained with sufficient iterations. However, the training of cold links' embedding vectors often ends in an under-fitting status, which undermines the accuracy of ETA prediction.

To improve the embedding quality of cold links, we propose the Road Network Metric Learning ETA (RNML-ETA), whose training process consists of two tasks. The main task is to predict the travel time, while the auxiliary task is to regularize the link embedding vectors by transferring the knowledge of road network patterns from hot links to cold links. The metric learning in the auxiliary task helps to place the embedding vector of a cold link in a proper position in the embedded space, by reducing its distance to similar hot links. The loss function of RNML-ETA is:

L = L_ETA + α · L_metric

where L_ETA is the loss of the main travel time regression task, L_metric is the loss of the auxiliary metric learning task, and α is a hyper-parameter to balance the trade-off between the main task and the auxiliary task.

We choose the Wide-Deep-Recurrent (WDR) model [26], a state-of-the-art ETA model, to accomplish the main task. The three components of the WDR model are: (1) a wide module that memorizes historical patterns in the data by constructing second-order cross products and an affine transformation of the non-sequential feature u_i; (2) a deep module that improves the generalization ability by feeding u_i into a Multi-Layer Perceptron (MLP), which is a stack of fully-connected layers with ReLU [13] activation functions; and (3) a recurrent module that provides fine-grained modeling of the sequential feature X_i via a Long Short-Term Memory network (LSTM) [10], which can capture the spatial and temporal dependencies between links.

We denote the output of the wide module as h_w, the output of the deep module as h_d, and the last hidden state of the LSTM as h_r. The travel time prediction is given by a regressor, which is also an MLP, applied to the concatenation of the outputs:

ŷ_i = MLP([h_w ; h_d ; h_r])

Fig. 2: The overall architecture of RNML-ETA. The loss function consists of two aspects: (1) the main task uses a Wide-Deep-Recurrent model to learn the travel time prediction, and (2) the auxiliary task uses metric learning to improve the quality of link embedding vectors.

The hidden state sizes in the deep module, the LSTM and the regressor MLP are all set to 128. The hidden state and memory cell of the LSTM are initialized as zeros. We choose the Mean Absolute Percentage Error (MAPE) as the loss function of the main task:

L_ETA = (1/P) Σ_{i=1}^{P} |y_i − ŷ_i| / y_i

where y_i is the ground-truth travel time and ŷ_i is the prediction. The overall architecture of RNML-ETA and the main task workflow are visualized in Fig. 2. The details of the auxiliary task will be introduced in the following sections.
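A minimal sketch of the main-task MAPE objective and the combined two-task loss follows (the function names are ours; the auxiliary metric loss is passed in as a precomputed scalar):

```python
import numpy as np

def mape_loss(y_true, y_pred):
    """Mean Absolute Percentage Error, the main-task objective."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / y_true)

def total_loss(y_true, y_pred, metric_loss, alpha):
    """Main MAPE loss plus the alpha-weighted auxiliary metric loss."""
    return mape_loss(y_true, y_pred) + alpha * metric_loss

print(mape_loss([100.0, 200.0], [110.0, 180.0]))  # 0.1
```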

III-B Link Similarity

To apply metric learning to the link embedding vectors, a similarity measurement for links must be defined. Since a link's travel speed essentially reflects how long a car is expected to take to pass through the link, the speed distribution across different times of day can be used to depict the traffic characteristics of the link. We construct a series of time bins B_1, ..., B_K for a day. These time bins are ensured to be non-overlapping, B_j ∩ B_k = ∅ for j ≠ k, and their union covers the whole day. We then compute the average travel speed for link k and time bin B_b:

v̄_{k,b} = Σ_{(i,j): c_{i,j}=k} v_{i,j} · 1(t_i ∈ B_b) / Σ_{(i,j): c_{i,j}=k} 1(t_i ∈ B_b)

where v_{i,j} is the travel speed feature of the j-th link in the i-th trip, and 1(·) is an indicator that equals 1 if the condition is satisfied and 0 otherwise. Intuitively, we find the subset of link k's travel speed features whose departure time belongs to the time bin B_b, and then compute the average over that subset. In practice, we use a configuration of K = 3 time bins: B_1 from 5 a.m. to 11 a.m. representing the morning peak, B_2 from 4 p.m. to 10 p.m. representing the evening peak, and B_3 taking the remaining hours, representing off-peak time.
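The per-bin averaging above can be sketched as follows, assuming the three-bin configuration (morning peak, evening peak, off-peak) and toy speed samples for a single link:

```python
import numpy as np

# (departure_hour, observed_speed_kmh) samples for one link -- toy data
samples = [(7, 20.0), (8, 24.0), (17, 18.0), (13, 55.0), (2, 60.0)]

def bin_of(hour):
    """Map an hour of day to a bin: 0 = morning peak (5-11),
    1 = evening peak (16-22), 2 = off-peak."""
    if 5 <= hour < 11:
        return 0
    if 16 <= hour < 22:
        return 1
    return 2

def mean_speed_per_bin(samples, n_bins=3):
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for hour, speed in samples:
        k = bin_of(hour)
        sums[k] += speed
        counts[k] += 1
    return sums / np.maximum(counts, 1)  # guard against empty bins

print(mean_speed_per_bin(samples))  # averages: 22.0, 18.0, 57.5
```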

We further scale the speeds to lie within [0, 1] by applying ṽ_{k,b} = (v̄_{k,b} − v_min) / (v_max − v_min), where v_min and v_max are the minimum and maximum of the average speeds. We finally get a normalized speed histogram for link k:

h_k = (ṽ_{k,1}, ṽ_{k,2}, ..., ṽ_{k,K})

A difference matrix D can then be computed as follows:

d_{jk} = ‖h_j − h_k‖_2

where d_{jk} is the element of D measuring the difference between the links with ID = j and ID = k. A smaller difference means a larger similarity. The similarity based on the speed histogram has advantages in two aspects. Firstly, the ETA is mostly determined by the traffic condition and only partially influenced by personalized factors such as driving habits. The latest average speed is a good reflection of the traffic condition: if two links have similar speed distributions, they should also have a similar impact on the ETA prediction. Secondly, the speed histogram does not rely on any extra information and can be computed directly from the data used in the main task, which facilitates the implementation of the method.
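A sketch of the min-max scaling and the pairwise difference matrix, assuming a single global min-max over all per-bin averages (function names are ours):

```python
import numpy as np

def speed_histograms(V):
    """Min-max scale the per-bin average speeds V (links x bins) to [0, 1].

    Assumes one global min/max over all entries of V.
    """
    vmin, vmax = V.min(), V.max()
    return (V - vmin) / (vmax - vmin)

def difference_matrix(H):
    """Pairwise Euclidean distances between the links' speed histograms."""
    diff = H[:, None, :] - H[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

V = np.array([[22.0, 18.0, 57.5],    # link 0: morning, evening, off-peak
              [23.0, 19.0, 56.0],    # link 1: similar to link 0
              [60.0, 55.0, 70.0]])   # link 2: much faster roads
D = difference_matrix(speed_histograms(V))
print(D[0, 1] < D[0, 2])  # True: links 0 and 1 are more alike
```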

III-C Triangle Loss

Links with similar characteristics are expected to be closer in the embedded space and those with dissimilar characteristics farther apart. To this end, we propose a novel metric learning loss function, named the triangle loss. Suppose we have three links with IDs a, b and c, and the corresponding differences d_{ab}, d_{ac} and d_{bc}. Without loss of generality, we assume:

d_{ab} ≤ d_{ac} ≤ d_{bc}     (8)
Fig. 3: The distances form a triangle, and the order of their edge lengths should satisfy the relation in Eq. 8.

We then compute the Euclidean distances between the embedding vectors of links a, b and c. For example:

D_{ab} = ‖ê_a − ê_b‖_2

where ê denotes the L2-normalized embedding vector. The three distances D_{ab}, D_{ac} and D_{bc} form a triangle. We aim to restrict the lengths of the triangle edges to be in the same order as in Eq. 8, which derives three inequalities:

D_{ab} + m_1 ≤ D_{ac},   D_{ac} + m_2 ≤ D_{bc},   D_{ab} + m_3 ≤ D_{bc}

where m_1, m_2 and m_3 are the required margins. Unlike the triplet loss [21], which imposes only one restriction (the distance between the anchor and the positive sample should be smaller than the distance between the anchor and the negative sample), the links in our method take turns acting as the anchor. This enables more efficient metric learning in one update and thus accelerates convergence. Fig. 3 gives a visualized demonstration. The triangle loss is of the form:

L_metric = (1/N) Σ ( λ_1 [D_{ab} − D_{ac} + m_1]_+ + λ_2 [D_{ac} − D_{bc} + m_2]_+ + λ_3 [D_{ab} − D_{bc} + m_3]_+ )

where the operator [x]_+ = max(x, 0), N is the number of possible triangles in the training set, and λ_1, λ_2 and λ_3 are hyper-parameters that adjust the weights of the three distance restrictions. The auxiliary task and the main task are optimized simultaneously via gradient descent. For a mini-batch of trips, we first compute the loss of the main task, and then compute the auxiliary loss by randomly composing triangles from the links in the trips.
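A sketch of the triangle loss for a single triple of embedding vectors. The margins and weights here are illustrative values, not the paper's tuned ones, and the three links are assumed pre-ordered by their histogram differences:

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

def triangle_loss(e_a, e_b, e_c,
                  margins=(0.1, 0.1, 0.2), weights=(1.0, 1.0, 1.0)):
    """Hinge penalties pushing D_ab <= D_ac <= D_bc for one triple.

    Assumes the three links were pre-ordered so that their histogram
    differences satisfy d_ab <= d_ac <= d_bc; margins and weights are
    illustrative, not the paper's tuned values.
    """
    e_a, e_b, e_c = map(l2_normalize, (e_a, e_b, e_c))
    d_ab = np.linalg.norm(e_a - e_b)
    d_ac = np.linalg.norm(e_a - e_c)
    d_bc = np.linalg.norm(e_b - e_c)
    m1, m2, m3 = margins
    w1, w2, w3 = weights
    return (w1 * max(d_ab - d_ac + m1, 0.0)     # D_ab + m1 <= D_ac
            + w2 * max(d_ac - d_bc + m2, 0.0)   # D_ac + m2 <= D_bc
            + w3 * max(d_ab - d_bc + m3, 0.0))  # D_ab + m3 <= D_bc

loss = triangle_loss(np.array([1.0, 0.0]),
                     np.array([0.9, 0.1]),
                     np.array([0.0, 1.0]))
print(loss >= 0.0)  # True; the loss is zero only when all margins are met
```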

IV Experiment

The evaluation is conducted on large scale real-world datasets collected on the DiDi platform. We introduce the datasets, the competing methods, the implementation details and the experimental results in sequence.

IV-A Dataset

We collected massive floating car trajectories of Beijing in 2018 on the DiDi platform. The trajectories are split into pickup and trip datasets according to the driver's working status. A pickup trajectory starts when a driver responds to a passenger's request and ends when he/she picks up the passenger. A trip trajectory starts when the passenger gets on board and ends upon arriving at the destination. For each dataset, we use 25 weeks of data as the training set and the following 2 weeks as the validation set and test set, respectively. We remove outliers with extremely short travel time (under 60 s) or extremely high average speed (over 120 km/h). The data statistics are summarized in Table I.
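The outlier filter can be sketched as a simple predicate (the helper itself is ours; the 60 s and 120 km/h thresholds follow the paper):

```python
def keep_trip(travel_time_s, distance_m):
    """Outlier filter: drop trips shorter than 60 s or with an average
    speed above 120 km/h (thresholds follow the paper; helper is ours)."""
    avg_speed_kmh = (distance_m / 1000.0) / (travel_time_s / 3600.0)
    return travel_time_s >= 60 and avg_speed_kmh <= 120

print(keep_trip(travel_time_s=900, distance_m=6000))  # True  (24 km/h)
print(keep_trip(travel_time_s=30, distance_m=500))    # False (too short)
```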

                    size       pickup   trip
training set        25 weeks   111.0M   105.5M
validation set      1 week     4.0M     4.5M
test set            1 week     4.1M     3.9M
# traversed links   -          1.2M     1.3M
TABLE I: Statistics of datasets

The links come from a wide range of roads, such as private community roads, local streets and urban freeways. As shown in Table I, the trip dataset covers more links than the pickup dataset. However, both datasets suffer from the road network sparsity problem: most of the links are short of data. To demonstrate this, we plot the histogram of link coverage frequency in Fig. 4. Even with over 0.1 billion trajectories, a significant number of cold links are traversed only a few times in about half a year (25 weeks). The median link coverage frequencies are 42 on pickup and 69 on trip.

Fig. 4: Statistics of link coverage frequency. For both the pickup and trip datasets, the links concentrate in the bands with a small number of traversing trajectories.

IV-B Competing Methods

We compare the proposed RNML-ETA with the following competitors.

(1) Route-ETA: a representative method in industrial application. In this solution, the travel time estimation for each link is made by dividing the link length by the link travel speed. The waiting time at each intersection is mined from the historical data. Given a route, the total travel time is predicted as the sum of each link’s travel time and each intersection’s waiting time. Route-ETA has very fast inference speed but its accuracy is often far from satisfactory compared to deep learning methods.

(2) WDR [26]: a deep learning method achieving the state-of-the-art performance in ETA problem. Since it is the model used in our main task, the comparison between WDR and RNML-ETA evaluates the benefit of the auxiliary task.

(3) WDR-no-link-emb: a variant of WDR that removes the link embedding technique. The main purpose of using this model is to quantify the contribution of the link embedding vectors, whose quality RNML-ETA aims to improve.

Besides the Mean Absolute Percentage Error (MAPE), which is used as the objective function of the main task, we also adopt the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) as evaluation metrics. The computations are:

MAE = (1/P) Σ_{i=1}^{P} |y_i − ŷ_i|     (12)
RMSE = sqrt( (1/P) Σ_{i=1}^{P} (y_i − ŷ_i)² )

IV-C Implementation Details

The neural networks in WDR, WDR-no-link-emb and RNML-ETA are implemented in PyTorch [20], and the training is accelerated on a single NVIDIA P40 GPU. We use a mini-batch size of 256 and set the maximal number of iterations to 7 million. The hyper-parameters of RNML-ETA are selected based on the results on the validation set. We use the same margins m_1, m_2, m_3 and weights λ_1, λ_2, λ_3 of the triangle loss for both the pickup and trip datasets. The task weight α is 0.52 for pickup and 0.35 for trip. All the parameters, such as the MLP weights and the embedding vectors, are jointly trained using the Adam [12] optimizer, a stochastic gradient descent method. Adam adaptively adjusts the step size according to the historical gradients and thus accelerates convergence. The learning rate is set to 0.0002.

IV-D Experimental Results

We list the results on the pickup data in Table II and on the trip data in Table III, and mark the best scores in bold font. The proposed RNML-ETA outperforms all the competitors on both datasets: the metric learning component significantly improves the main task model's accuracy in predicting travel time. For example, RNML-ETA reduces the RMSE on pickup data and the MAPE on trip data compared to WDR. The importance of the link embedding technique is also validated, as it brings clear MAPE reductions on the pickup and trip data, respectively (WDR-no-link-emb vs. WDR). Moreover, it can be observed that there is a large performance gap between the simple rule-based model Route-ETA and the deep learning models.

                  MAPE (%)   MAE (sec)   RMSE (sec)
Route-ETA         -          69.008      106.966
WDR-no-link-emb   -          59.018      95.876
WDR               -          54.686      89.976
RNML-ETA          19.215     53.546      87.617
TABLE II: Results of the pickup dataset

                  MAPE (%)   MAE (sec)   RMSE (sec)
Route-ETA         -          150.560     248.736
WDR-no-link-emb   -          117.337     197.652
WDR               -          108.919     186.083
RNML-ETA          11.597     108.519     185.897
TABLE III: Results of the trip dataset
Fig. 5: Results of the finer evaluation on subsets with different link coverage levels. For a threshold f, we keep the trajectories in which at least a fixed proportion of the contained links have coverage frequencies less than f. The 6 subfigures show (a) MAPE on pickup data, (b) MAPE on trip data, (c) MAE on pickup data, (d) MAE on trip data, (e) RMSE on pickup data and (f) RMSE on trip data.

The results in Table II and Table III show the overall accuracy over all links. Since RNML-ETA mainly aims to improve the embedding quality of cold links, its contribution deserves a finer evaluation that reports the metrics at different link coverage levels. Thus, we select a series of subsets from each dataset by restricting the link coverage frequency in the trajectories. Specifically, we keep a trajectory if at least a fixed proportion of its links have coverage frequencies less than a threshold f, and drop the trajectory otherwise. By varying f from 50 to 500 on pickup data and from 300 to 750 on trip data in steps of 50, we obtain 10 subsets for each dataset. In a subset with a lower f, the trajectories contain more cold links. We then compute the metrics on these subsets and plot the curves in Fig. 5.
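The subset selection can be sketched as follows; since the exact proportion of cold links required per trajectory is a detail of the evaluation protocol, it is left as a parameter here:

```python
def cold_subset(trajectories, coverage, threshold, min_fraction):
    """Keep trajectories in which at least `min_fraction` of the links
    have coverage frequency below `threshold` (the exact fraction used
    in the paper's evaluation is a parameter here)."""
    kept = []
    for path in trajectories:
        n_cold = sum(1 for link in path if coverage.get(link, 0) < threshold)
        if n_cold >= min_fraction * len(path):
            kept.append(path)
    return kept

coverage = {1: 10, 2: 500, 3: 30, 4: 900}   # link ID -> traversal count
trajs = [[1, 3], [2, 4], [1, 2, 3]]
print(cold_subset(trajs, coverage, threshold=50, min_fraction=0.5))
# [[1, 3], [1, 2, 3]]
```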

We take Fig. 5 (a) as an example (the trends in the other subfigures are similar). As the threshold f increases, the subsets include more hot links and the MAPE of WDR gradually decreases by a large amount for the ETA problem. This phenomenon shows that links covered by more trajectories do have better prediction accuracy, and it supports the existence of the road network data sparsity problem. On the subset with the smallest f, our method RNML-ETA outperforms WDR by more than 2 percentage points in terms of MAPE, whereas the gain in overall MAPE (Table II) is less than 0.2 percentage points. Such a comparison validates the effectiveness of RNML-ETA: it mainly improves the performance on cold links. Overall, RNML-ETA achieves its largest MAPE improvements on the coldest subsets of both the pickup and trip data.

IV-E Influence of Hyper-parameters

To explore the influence of the hyper-parameters, we plot the performance curves on the pickup data in Fig. 6 by varying the margin m_3 and the task weight α, two representative hyper-parameters. The basic configuration is the same as in Section IV-C.

The margin m_3 is a bit more special than m_1 and m_2, because it controls the gap between the longest edge and the shortest edge in the triangle loss. If this restriction is broken, the model is far from our expected status and needs a stronger gradient to update the parameters. According to the curve in Fig. 6 (a), an intermediate value of m_3 achieves the best performance. Moreover, RNML-ETA achieves better performance than WDR across the whole tested range of m_3, which demonstrates that the superiority of RNML-ETA is not sensitive to the margin hyper-parameter.

The task weight α balances the trade-off between the main task and the auxiliary task. In the extreme cases, RNML-ETA degenerates to WDR if α = 0 and to a pure metric learning model as α grows very large. Fig. 6 (b) shows that the advantage of RNML-ETA over WDR is robust across a wide range of α, with the best performance achieved at an intermediate value.

Fig. 6: The influence of hyper-parameters: (a) the margin m_3 in the triangle loss, and (b) the weight α balancing the main task and the auxiliary task. Though MAPE varies under different hyper-parameters, RNML-ETA generally outperforms the competitor WDR, which demonstrates the robustness of our method.

V Conclusion

In this paper, we propose a novel metric learning framework for ETA, named RNML-ETA, to address the data sparsity problem of road networks. In the main task, we use the WDR model to predict the travel time. In the auxiliary task, we first construct a difference matrix by computing the Euclidean distances between the links' speed distributions, and then use metric learning to pull similar links close and push dissimilar links apart in the embedded space. The auxiliary task aims to improve the quality of the link embedding vectors. We conducted experiments on two large scale real-world datasets collected on the DiDi platform. The results validate the effectiveness of RNML-ETA by showing that it outperforms the state-of-the-art WDR model on all evaluation metrics. A further experiment examines the gains for different types of links and finds that RNML-ETA significantly improves the accuracy for routes containing cold links.


  • [1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin (2003) A neural probabilistic language model. Journal of machine learning research 3 (Feb), pp. 1137–1155. Cited by: §I, §III-A.
  • [2] R. Caruana (1997) Multitask learning. Machine learning 28 (1), pp. 41–75. Cited by: §I.
  • [3] H. Chen, H. A. Rakha, and C. C. McGhee (2013) Dynamic travel time prediction using pattern recognition. In 20th World Congress on Intelligent Transportation Systems. Cited by: §II.
  • [4] S. Chopra, R. Hadsell, and Y. LeCun (2005) Learning a similarity metric discriminatively, with application to face verification. In CVPR, Vol. 1, pp. 539–546. Cited by: §II.
  • [5] S. Çolak, A. Lima, and M. C. González (2016) Understanding congested travel in urban areas. Nature communications 7 (1), pp. 1–8. Cited by: §I.
  • [6] G. Dimitrakopoulos and P. Demestichas (2010) Intelligent transportation systems. IEEE Vehicular Technology Magazine 5 (1), pp. 77–84. Cited by: §I.
  • [7] L. Figueiredo, I. Jesus, J. T. Machado, J. R. Ferreira, and J. M. De Carvalho (2001) Towards the development of intelligent transportation systems. In ITSC (Cat. No. 01TH8585), pp. 1206–1211. Cited by: §I.
  • [8] T. Fu and W. Lee (2019) DeepIST: deep image-based spatio-temporal network for travel time estimation. In ACM CIKM, pp. 69–78. Cited by: §II.
  • [9] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan (2019) Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In AAAI, Vol. 33, pp. 922–929. Cited by: §I.
  • [10] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §III-A.
  • [11] A. Hofleitner, R. Herring, P. Abbeel, and A. Bayen (2012) Learning the dynamics of arterial traffic from probe data using a dynamic bayesian network. IEEE Transactions on Intelligent Transportation Systems 13 (4), pp. 1679–1693. Cited by: §II.
  • [12] D. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. ICLR, San Diego. Cited by: §IV-C.
  • [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In NeurIPS, pp. 1097–1105. Cited by: §II, §III-A.
  • [14] W. Lan, Y. Xu, and B. Zhao (2019) Travel time estimation without road networks: an urban morphological layout representation approach. In IJCAI, pp. 1772–1778. Cited by: §II.
  • [15] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin (2009) Exploring strategies for training deep neural networks. Journal of machine learning research 10 (Jan), pp. 1–40. Cited by: §II.
  • [16] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. nature 521 (7553), pp. 436. Cited by: §II.
  • [17] Y. Li, K. Fu, Z. Wang, C. Shahabi, J. Ye, and Y. Liu (2018) Multi-task representation learning for travel time estimation. In SIGKDD, pp. 1695–1704. Cited by: §II.
  • [18] G. Mesnil, X. He, L. Deng, and Y. Bengio (2013) Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding.. In Interspeech, pp. 3771–3775. Cited by: §I.
  • [19] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In NeurIPS, pp. 3111–3119. Cited by: §I, §III-A.
  • [20] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) PyTorch: an imperative style, high-performance deep learning library. In NeurIPS, pp. 8024–8035. Cited by: §IV-C.
  • [21] F. Schroff, D. Kalenichenko, and J. Philbin (2015) FaceNet: a unified embedding for face recognition and clustering. In CVPR, pp. 815–823. Cited by: §II, §III-C.
  • [22] K. Sohn (2016) Improved deep metric learning with multi-class n-pair loss objective. In NeurIPS, pp. 1857–1865. Cited by: §II.
  • [23] D. Wang, J. Zhang, W. Cao, J. Li, and Y. Zheng (2018) When will you arrive? estimating travel time based on deep neural networks. In AAAI, Cited by: §II.
  • [24] H. Wang, Y. H. Kuo, D. Kifer, and Z. Li (2016) A simple baseline for travel time estimation using large-scale trip data. In SIGSPATIAL GIS, pp. 61. Cited by: §II.
  • [25] Y. Wang, Y. Zheng, and Y. Xue (2014) Travel time estimation of a path using sparse trajectories. In SIGKDD, pp. 25–34. Cited by: §II.
  • [26] Z. Wang, K. Fu, and J. Ye (2018) Learning to estimate the travel time. In SIGKDD, pp. 858–866. Cited by: §I, §I, §II, §III-A, §IV-B.
  • [27] J. Yuan, Y. Zheng, X. Xie, and G. Sun (2011) T-drive: enhancing driving directions with taxi drivers’ intelligence. IEEE Transactions on Knowledge and Data Engineering 25 (1), pp. 220–232. Cited by: §II.
  • [28] X. Zhan, S. Hasan, S. V. Ukkusuri, and C. Kamga (2013) Urban link travel time estimation using large-scale taxi data with partial information. Transportation Research Part C: Emerging Technologies 33, pp. 37–49. Cited by: §II.
  • [29] F. Zhang, X. Zhu, T. Hu, W. Guo, C. Chen, and L. Liu (2016) Urban link travel time prediction based on a gradient boosting method considering spatiotemporal correlations. ISPRS International Journal of Geo-Information 5 (11), pp. 201. Cited by: §II.
  • [30] H. Zhang, H. Wu, W. Sun, and B. Zheng (2018) Deeptravel: a neural network based travel time estimation model with auxiliary supervision. In IJCAI, pp. 3655–3661. Cited by: §II.
  • [31] J. Zhang, F. Wang, K. Wang, W. Lin, X. Xu, and C. Chen (2011) Data-driven intelligent transportation systems: a survey. IEEE Transactions on Intelligent Transportation Systems 12 (4), pp. 1624–1639. Cited by: §I.