1 Introduction
Estimated time of arrival (ETA), or travel time prediction, is commonly defined as estimating the travel time for a given pair of origin and destination locations along a route Wang et al. (2018b). As an essential component of artificial intelligence for transportation, ETA underpins route planning, navigation, and vehicle dispatching, which are fundamental for ride-hailing platforms such as DiDi and Uber Wang et al. (2018b, a). ETA is a representative and challenging sequence learning and data mining task that has attracted considerable attention Wang et al. (2014); Hofleitner et al. (2012); Chen et al. (2013); Zhang et al. (2016); Wang et al. (2018b, a); Li et al. (2018).
Since 2018, deep learning LeCun et al. (2015) based methods Wang et al. (2018a); Li et al. (2018); Wang et al. (2018b), which significantly outperform non-deep-learning methods Wang et al. (2014); Hofleitner et al. (2012); Chen et al. (2013); Zhang et al. (2016), mine spatial-temporal correlations concurrently and effectively from large-scale data and have become the state of the art. The main sequential semantic information extractor in these state-of-the-art methods, such as WDR Wang et al. (2018b), DeepTTE Wang et al. (2018a), and DeepTravel Zhang et al. (2018), is one Recurrent Neural Network (RNN) Hopfield (1982); Jordan (1997); Elman (1990) variant, the Long Short-Term Memory network (LSTM) Hochreiter and Schmidhuber (1997). RNNs adopt a recurrent structure to model sequences and extract semantic information, which also restricts their inference speed because the computation cannot be parallelized.
In this paper, we discuss the possibility of mainly adopting feed-forward networks (FFNs) to mine spatial-temporal information from massive sequential data for ETA, as illustrated in Fig. 1. FFNs are parallelizable and thus naturally suited to fast ETA inference while maintaining accuracy, which is an industry pain point for ride-hailing platforms. However, a model that depends entirely on FFNs can hardly capture the dependencies between links.
Can a novel structure help FFNs analyze sequence semantic information effectively while preserving their clear advantage in inference speed for ETA? Following this line, we present a novel Multi-factor Attention mechanism specially designed for ETA, a sequence learning task affected by various factors. We propose FMA-ETA, a model mainly based on FFNs with Multi-factor Attention, as a better sequence feature extractor than the RNNs that have been the state of the art since 2018.
The main contributions in this work are as follows:

We propose FMA-ETA, a novel deep-learning-based ETA framework that is, to the best of our knowledge, the first entirely based on FFNs with attention.

We propose a novel Multi-factor Attention mechanism for effectively learning the time dependencies and semantic information between time steps of a sequence. Through extensive experiments, we find that for ETA, Multi-factor Attention outperforms the Multi-head attention Vaswani et al. (2017) that is well known in natural language processing. Moreover, Multi-factor Attention can be adopted for, and may also be promising in, other sequence learning tasks affected by various factors.

We evaluate FMA-ETA on a massive real-world dataset containing over 500 million trajectories from a famous ride-sharing platform. The extensive experimental results demonstrate that FMA-ETA's estimation precision is comparable with that of the state-of-the-art RNN-based method, WDR. Moreover, FMA-ETA significantly improves inference speed over WDR.
We organize the paper as follows. Section 2 briefly summarizes the background of ETA, sequence learning, and attention mechanisms. Section 3 introduces the overall framework of FMA-ETA, followed by a detailed description of the general Multi-factor Attention. Section 4 explains why we propose Multi-factor Attention. In Section 5, experimental comparisons on a large-scale real-world dataset demonstrate the excellent accuracy and inference speed of FMA-ETA. Finally, Section 6 concludes the paper and discusses future work.
2 Background
In this section, we briefly review the background of our work, including estimated time of arrival and the attention mechanism.
2.1 Estimated time of arrival
Estimated time of arrival (ETA) is a challenging problem in the field of intelligent transportation systems. There are two representative approaches to ETA: route-based methods and data-driven methods. Route-based methods formulate the travel time of a given route as the sum of the time spent on each road segment and at each intersection. Traditional machine learning techniques such as dynamic Bayesian networks Hofleitner et al. (2012), least-square minimization Zhan et al. (2013), and pattern matching Chen et al. (2013) are typical ways to capture spatial-temporal features in route-based methods. However, dividing the original trajectory leads to the accumulation of local errors. Data-driven methods have shifted from traditional approaches such as TEMP Wang et al. (2019) and the time-dependent landmark graph Yuan et al. (2011) to deep-learning-based methods Wang et al. (2018b); Fu et al. (2020). MURAT Li et al. (2018) uses multi-task learning and graph convolutional networks to assist a residual block in predicting the travel time from departure to destination without a given trajectory. In recent years, researchers have further explored deep learning methods for ETA, such as DeepTravel Zhang et al. (2018), DeepTTE Wang et al. (2018a), Deepi2t Lan et al. (2019), and WDR Wang et al. (2018b). These methods model spatial information in different ways, but they all use an LSTM Hochreiter and Schmidhuber (1997) to extract features from time series. However, the inference speed of LSTM-based models is too slow for practical scenarios. In this work, we propose FMA-ETA, which can adequately handle this problem.
2.2 Attention mechanism
Attention is a very effective mechanism in natural language processing Bahdanau et al. (2014), image captioning Xu et al. (2015), and other research areas Ren et al. (2019). The attention mechanism has an outstanding ability to capture semantic dependencies. Common attention mechanisms include local attention and global attention Luong et al. (2015), self-attention Vaswani et al. (2017), etc. The Transformer Vaswani et al. (2017) is a novel sequence-to-sequence network entirely based on FFNs with Multi-head self-attention. It achieves promising results in translation with a faster speed than RNN-based models, and self-attention has since become a hot topic in neural attention research. Self-attention is calculated by:
$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \qquad (1)$
where $Q$, $K$, and $V$ are the query, key, and value matrices, $d_k$ is the dimension of the key and query matrices, and the key and query matrices are usually the same. Self-attention has proved useful in a wide variety of tasks including sequential recommendation Kang and McAuley (2018), reading comprehension Yu et al. (2018), speech recognition Salazar et al. (2019), and traffic flow prediction Zhu et al. (2018).
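As a minimal sketch of this computation (NumPy, with random illustrative projection matrices, not tied to any particular model), scaled dot-product self-attention over a sequence can be written as:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence x of shape (n, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # (n, n) pairwise affinities between steps
    return softmax(scores, axis=-1) @ v  # (n, d_v) context-mixed representations

# Toy usage: a sequence of 5 steps with 8-dim features and 8-dim projections.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (5, 8)
```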
Deep-learning-based ETA models are mostly RNN-based. RNNs have difficulty handling long-range dependencies; LSTM alleviates this problem to some extent, but in practice it still struggles on long sequences. Moreover, the inference time of LSTM-based models is too long for practical application, so it is worthwhile to introduce the attention mechanism to the ETA problem.
3 Model Architecture
We first give the formal mathematical definition of estimated time of arrival, with reference to Wang et al. (2018b).
Definition 3.1 (Estimated time of arrival).
For a collection of historical trips $\{(d_i, a_i, u_i, L_i)\}_{i=1}^{N}$, where $d_i$ is the departure time of the $i$th trajectory, $a_i$ is the arrival time of the $i$th trajectory, $u_i$ is the driver ID, $L_i$ is the link sequence of the trajectory, and $N$ is the total number of samples, the ground-truth travel time is computed by $y_i = a_i - d_i$. Here the link sequence $L_i$ can be represented as $L_i = \{l_{i1}, l_{i2}, \dots, l_{iT_i}\}$, where $l_{ij}$ represents the $j$th link in the $i$th trajectory and $T_i$ is the length of the link sequence.
Since 2018, the main workhorse of most state-of-the-art methods for capturing spatial-temporal patterns for ETA has shifted to the RNN (specifically, the LSTM). The RNN is a well-known general sequence feature extractor across sequence learning subfields such as speech signal processing and natural language processing. In this paper, we break this stereotype and present an ETA framework entirely based on FFNs and a novel Multi-factor Attention, FMA-ETA. We introduce the overall structure of FMA-ETA as well as the proposed Multi-factor Attention in the next two subsections.
3.1 Overall framework
The first main step is the sophisticated feature engineering, where we follow Wang et al. (2018b). Rich features extracted from massive raw data are the key input to the deep learning model. The features can be divided into the following two categories.
(1) Global features are sparse, and one trajectory corresponds to one global representation, such as the driver ID, day of the week, and departure time slice. Embedding Bengio et al. (2003) is adopted for the dimensionality reduction of these sparse features.
(2) Sequential features are related to each link of the trajectory, for instance, the length of the link (road segment), speed (road condition), link time (related to road condition), and the embedding of the link ID. These four factors influence ETA from different perspectives.
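As a minimal sketch of the embedding lookup used for such sparse features (vocabulary size and embedding dimension are hypothetical; in a real model the table entries would be learned jointly with the network):

```python
import numpy as np

# Hypothetical cardinalities: the real numbers of drivers or links come from
# the platform's data and are not given in the paper.
n_drivers, emb_dim = 10_000, 16
rng = np.random.default_rng(2)
driver_table = rng.normal(size=(n_drivers, emb_dim))  # stand-in for a learned lookup table

driver_ids = np.array([42, 7, 42])  # sparse categorical IDs, one per trip
dense = driver_table[driver_ids]    # dense 16-dim vector per trip
print(dense.shape)  # (3, 16)
```

The same lookup pattern applies to the link-ID embedding of the sequential features, one vector per link instead of per trip.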
We then describe the overall framework of FMA-ETA, as shown in Fig. 2. Its two main components are Multi-factor Attention and the FFN.
(1) Sequential features are processed by our Multi-factor Attention, which will be discussed in detail in the next subsection. This component fully explores the relationships between the different links of each trajectory.
(2) The parallelizable FFN is the main source of the simplicity and fast inference that are our greatest advantages over RNNs. The front FFNs are applied to each sequential factor to mine the spatial-temporal patterns of that single aspect, as well as to the concatenated factors. The last FFN aggregates information from the separate and combined sequential representations together with the embeddings of the global features.
The regressor is one linear layer with ReLU Krizhevsky et al. (2012) as the activation function. The objective function of the overall deep learning model is the mean absolute percentage error (MAPE), a common relative loss function for ETA. FMA-ETA's parameters are trained through:

$\min_{\theta}\ \frac{1}{N}\sum_{i=1}^{N}\frac{\left|y_i-\hat{y}_i\right|}{y_i} \qquad (2)$

where $\hat{y}_i$ is the estimated travel time of the $i$th query, $y_i$ is the ground-truth travel time, and $\theta$ denotes all parameters of FMA-ETA.
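A minimal sketch of this MAPE objective as a plain function (illustrative values; in training the model minimizes it over the network parameters via back-propagation):

```python
import numpy as np

def mape_loss(y_true, y_pred):
    """Mean absolute percentage error, the training objective described above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / y_true)

# Ground-truth travel times (seconds) vs. hypothetical model estimates.
loss = mape_loss([300.0, 600.0, 900.0], [330.0, 570.0, 900.0])
print(round(loss, 4))  # 0.05
```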
3.2 Multi-factor Attention
ETA is a challenging and complex problem because various factors affect prediction accuracy, such as the link length and its road condition. Therefore, unlike in natural language processing, where one word can be represented by a single embedding vector, different sequential features ought to be treated more specifically for ETA. Our Multi-factor Attention mechanism lets the different sequential factors mine their own patterns, and their impact on ETA, in different subspaces, as shown in the upper left corner of Fig. 2. The self-attention follows Vaswani et al. (2017), and we add position encoding, residual connections, layer normalization, and dropout after self-attention, also following Vaswani et al. (2017). The combined sequential features additionally capture the spatial-temporal patterns as a whole through an FFN with self-attention.
Fig. 2 shows the Multi-factor Attention mechanism with three factors. For an arbitrary number of factors $m$, the general Multi-factor Attention can be expressed as:
$\mathrm{MFA}(x_1,\dots,x_m)=\mathrm{Concat}\big(\mathrm{SA}(\mathrm{FFN}(x_1;W_1)),\;\dots,\;\mathrm{SA}(\mathrm{FFN}(x_m;W_m)),\;\mathrm{SA}(\mathrm{FFN}([x_1,\dots,x_m];W_c))\big) \qquad (3)$
where $x_i$ is the $i$th factor of the ETA problem, $W_i$ denotes the learned parameters of the FFN layers for the $i$th factor, $W_c$ denotes the learned parameters of the FFN layers for the combined features, and $\mathrm{SA}(\cdot)$ denotes self-attention. For our FMA-ETA, the sequential factors comprise (1) the length of the link, (2) the road condition speed, (3) the corresponding link time, and (4) the embedding of the link ID. Hence, we adopt the four-factor version of Multi-factor Attention, i.e., $m=4$. By concatenating the separate and combined sequence representations, our Multi-factor Attention completes a multi-level, detailed extraction of the spatial-temporal dependencies in the sequence data.
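A simplified forward-pass sketch of the mechanism described above (all shapes are hypothetical; for brevity the per-branch self-attention uses its input as query, key, and value, and position encoding, residuals, normalization, and dropout are omitted):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ffn(x, w, b):
    return np.maximum(x @ w + b, 0.0)  # one ReLU feed-forward layer

def self_attn(h):
    # Simplified self-attention: query, key, and value are all h itself.
    scores = h @ h.T / np.sqrt(h.shape[-1])
    return softmax(scores) @ h

def multi_factor_attention(factors, params, params_c):
    # Per-factor branch: FFN then self-attention, each in its own subspace.
    branches = [self_attn(ffn(f, w, b)) for f, (w, b) in zip(factors, params)]
    # Combined branch: concatenate the raw factors, then FFN + self-attention.
    combined = self_attn(ffn(np.concatenate(factors, axis=-1), *params_c))
    return np.concatenate(branches + [combined], axis=-1)

# Toy trajectory of 6 links with m = 3 factors (e.g. length, speed, link time),
# each factor embedded in 4 dims; hidden width 8. All values are illustrative.
rng = np.random.default_rng(1)
factors = [rng.normal(size=(6, 4)) for _ in range(3)]
params = [(rng.normal(size=(4, 8)), np.zeros(8)) for _ in range(3)]
params_c = (rng.normal(size=(12, 8)), np.zeros(8))
out = multi_factor_attention(factors, params, params_c)
print(out.shape)  # (6, 32): three 8-dim branches plus the 8-dim combined branch
```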
4 Why Multi-factor Attention
In this section, we discuss the motivation for proposing Multi-factor Attention. RNN-based models perform well on the ETA problem in terms of evaluation metrics. However, their training and inference speed is slow, making them difficult to apply to practical problems. The FFN is a promising way to accelerate the model, but it performs poorly in sequence learning and struggles with long-range dependencies. Our proposed Multi-factor Attention addresses both problems. Below we analyze and compare the total computational complexity and the number of sequential operations of RNNs and of FFNs with Multi-factor Attention.
As shown in Table 1, Multi-factor Attention needs only $O(1)$ sequential operations, while an RNN requires $O(n)$. As for computational complexity, when the length of the sequence $n$ is smaller than the dimension of the features $d$, our Multi-factor Attention is faster than an RNN.
Self-attention, and especially Multi-head attention, has achieved good results in sequence learning. Why not use Multi-head attention? For the ETA problem, many different factors affect the prediction, and the traffic state is complex and dynamic. Experiments in Section 5 show that Multi-head attention does not perform well on complex transportation problems like ETA. Our Multi-factor Attention focuses on both the separate features and the combined features; only in this way can we encourage different subspaces to analyze the effect of a certain factor's pattern on ETA. The evaluation metrics show that the model with Multi-factor Attention performs better than the model with Multi-head attention on the ETA problem.
Hence, Multi-factor Attention is more effective than Multi-head attention for extracting systematic and comprehensive spatial-temporal patterns. Considering the speed-up brought by the FFN, the FFN with Multi-factor Attention has a great advantage in intelligent transportation system (ITS) tasks. Multi-factor Attention is a general method and may also be promising for other time series forecasting tasks.
|                        | Complexity per Layer | Sequential Operations |
|------------------------|----------------------|-----------------------|
| RNN                    | $O(n \cdot d^2)$     | $O(n)$                |
| Multi-factor Attention | $O(n^2 \cdot d)$     | $O(1)$                |

$n$ is the length of the sequence, $d$ is the dimension of the features.
5 Results
5.1 Dataset
We evaluate our model on Beijing 2018, a large-scale real-world floating-car trajectory dataset collected by a ride-hailing platform. It contains the desensitized trajectory data of Beijing taxi drivers over more than 4 months in 2018, amounting to hundreds of millions of trajectories. The dataset covers different types of roads in Beijing urban areas, including local streets and freeways. We filter out abnormal samples whose driving time is less than 60 s or whose speed exceeds 120 km/h. We divide the dataset into a training set (the first 16 weeks of data), a validation set (the middle 2 weeks of data), and a test set (the last 2 weeks of data).
5.2 Compared methods
We compared the proposed FMAETA with the following competitors:
(1) RouteETA: a representative traditional non-deep-learning method. It divides the trajectory into several links and intersections. The travel time in the ith link of the trajectory is calculated by dividing the link's length by the estimated speed in that link; the delay time at the jth intersection is provided by a real-time traffic monitoring system. The final arrival time is the sum of the estimated time spent in each subsection.
(2) WDR(RNN): a deep learning method achieving state-of-the-art performance on the ETA problem. WDR is a joint model comprising a wide module, a deep module, and a recurrent module. It can effectively use the dense features, high-dimensional sparse features, and local features of the road sequence in traffic information. Here we use an RNN in the recurrent module.
(3) WDR(LSTM): a variant of WDR(RNN). Here we use an LSTM in the recurrent module of WDR and make no changes to the other parts of WDR.
(4) WD-FFN: a variant of WDR. It replaces the recurrent module with a deep module; here we use a multi-layer perceptron for comparison.
(5) WD-Resnet: a variant of WDR. It replaces the recurrent module with a deep module; here we use a residual structure to extract features.
(6) Multi-head attention: a variant of WDR. We use an FFN with the Multi-head attention mechanism instead of an RNN to extract features from the sequential data.
If this work is accepted, we will open-source the code of the proposed deep-learning-based model, FMA-ETA.
5.3 Experimental Settings
In our experiments, all models are implemented in PyTorch and are trained and evaluated on a single NVIDIA Tesla P40 GPU. The number of training iterations of the deep-learning-based methods is 3.5 million. We train the deep-learning-based methods with back-propagation, using the MAPE loss as the loss function, and choose Adam as the optimizer due to its good performance. The batch size is 256 and the initial learning rate is 0.0002.
5.4 Evaluation Metrics
To evaluate and compare the performance of FMA-ETA and the other baselines, we use three evaluation metrics: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE):

$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right| \qquad (4)$

$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^{2}} \qquad (5)$

where $\hat{y}_i$ is the predicted travel time and $y_i$ is the ground-truth travel time. The calculation of MAPE is given in Section 3.
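The three metrics can be sketched together as follows (illustrative values only):

```python
import numpy as np

def eta_metrics(y_true, y_pred):
    """Return (MAE, RMSE, MAPE in %) for travel times in seconds."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                  # Eq. (4)
    rmse = np.sqrt(np.mean(err ** 2))           # Eq. (5)
    mape = np.mean(np.abs(err) / y_true) * 100  # relative error, in percent
    return mae, rmse, mape

# Two hypothetical trips: 300 s and 600 s ground truth.
mae, rmse, mape = eta_metrics([300.0, 600.0], [330.0, 540.0])
print(mae, round(rmse, 3), round(mape, 3))  # 45.0 47.434 10.0
```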
5.5 Experimental Results and Analysis
|                      | MAE (sec) | RMSE (sec) | MAPE (%) | Latency (ms)* |
|----------------------|-----------|------------|----------|---------------|
| RouteETA             | 69.008    | 106.966    | 25.010   | 0.179         |
| WD-FFN               | 57.797    | 93.588     | 21.106   | 0.344         |
| WD-Resnet            | 57.064    | 92.241     | 21.015   | 0.454         |
| WDR(RNN)             | 55.284    | 90.836     | 19.677   | 1.107         |
| WDR(LSTM)            | 55.227    | 90.480     | 19.598   | 1.109         |
| Multi-head attention | 55.145    | 90.101     | 19.678   | 0.635         |
| FMA-ETA (ours)       | 54.642    | 88.794     | 19.618   | 0.866         |

* Latency is the average inference time of the models.
Table 2 shows the three general evaluation metrics for the ETA problem. Our FMA-ETA outperforms all competitors in terms of the MAE and RMSE metrics, and achieves results similar to the state-of-the-art method WDR(LSTM) in terms of the MAPE metric. The detailed analysis of the experimental results is as follows.
(1) The representative non-deep-learning method, RouteETA, performs worse than the deep-learning-based methods. This shows that the data-driven approach is more effective than the route-based approach: deep methods are well suited to modeling a complex transportation system given massive spatio-temporal data.
(2) Models with a recurrent module perform better than models that use only deep modules without an attention mechanism: WDR(RNN) and WDR(LSTM) achieve better results than WD-FFN and WD-Resnet. WDR(LSTM) performs best on the MAPE metric because its gated units alleviate the problem of long-term dependencies to a certain extent. The deep modules with attention achieve better results than WDR on the MAE and RMSE metrics, which means the attention mechanism helps extract features and resolve long-range dependencies in long sequences.
(3) Our FMA-ETA performs best on the MAE and RMSE metrics, which means our method is well suited to the ETA problem. FMA-ETA outperforms WDR(LSTM) by 1.05% in terms of MAE and 1.86% in terms of RMSE, while being only about 0.1% worse in terms of MAPE. Considering the three evaluation metrics together, FMA-ETA performs best on the ETA problem.
(4) As can be seen in the "Latency" column of Table 2, our FMA-ETA speeds up inference by about 22% (0.866 ms vs. 1.109 ms) compared with WDR(LSTM). RouteETA has the shortest latency at 0.179 ms, but its accuracy is poor. FFN-based methods without an attention mechanism are fast, but they suffer a large loss on the evaluation metrics. The model with Multi-head attention is faster than FMA-ETA but performs worse. If an ETA model is not accurate enough, many downstream tasks in intelligent transportation systems, such as route planning, navigation, and vehicle dispatching, will be greatly affected. Therefore, we should increase the inference speed of the model while maintaining its accuracy, and currently only our FMA-ETA reaches this goal.
Our FMA-ETA performs well on the ETA problem and greatly improves inference speed compared with the state-of-the-art method WDR(LSTM). FMA-ETA achieves clear improvements over WDR(LSTM) on the MAE and RMSE metrics. Taking both evaluation metrics and speed into account, our method is the most suitable for the ETA problem.
5.6 Speed comparison of different methods
As analyzed above, WDR(LSTM), the state-of-the-art method for the ETA problem in previous literature, takes a long time for inference, which makes it hard to apply in a real-time traffic system. The FFN can greatly accelerate inference for the ETA problem, as in WD-FFN and WD-Resnet, but it causes a large decrease in accuracy, as seen in Table 2. The attention mechanism can help the FFN effectively extract sequence features; the existing Multi-head attention improves inference speed, but it still brings a considerable loss of accuracy. Our goal is to increase the inference speed of the ETA model while ensuring that the evaluation metrics do not degrade, and the average inference times in Table 2 show that only FMA-ETA achieves this goal. We further explore the inference speed of WDR(LSTM) and FMA-ETA with different sequence lengths. We randomly sample 50 samples at each sequence length for WDR(LSTM) and FMA-ETA and plot the scatters in Figure 3; the curves in Figure 3 are obtained by logarithmic fitting of the sampled points.
As illustrated by the figure, our FMA-ETA is clearly faster than WDR(LSTM) when the sequence length is larger than 180. The LSTM-based model is fast on short sequences, but its time consumption increases rapidly as the sequence becomes longer. In real ride data, long sequences are common, so FMA-ETA is more applicable to practical problems.
6 Conclusion and Future Work
In this paper, to the best of our knowledge, we are the first to estimate travel time entirely based on FFNs with attention by presenting FMA-ETA. This idea is novel and quite different from the RNN-based methods that have been the state of the art since 2018. Furthermore, we propose a novel Multi-factor Attention mechanism that helps the FFN better mine sequence semantic information for ETA, which is affected by various factors. Through extensive experiments on a massive real-world dataset from a famous intelligent travel platform, we conclude that FMA-ETA achieves slight improvements over other state-of-the-art methods in prediction precision. Most importantly, our method significantly speeds up inference compared with RNN-based methods. Experiments also verify that Multi-factor Attention is superior to the popular Multi-head self-attention proposed for natural language processing. In future work, we will adapt our Multi-factor Attention to other sequence learning tasks that are also affected by many complex factors. Besides, we plan to conduct a series of online tests of FMA-ETA to decide whether this promising deep learning framework can be adopted for large-scale practical application.
Broader Impact
We present the statement of the broader impact of our paper as follows:
a) This research is beneficial for many other tasks in ITS, such as route planning, navigation, and vehicle dispatching. FMA-ETA, a significantly faster and more accurate framework for ETA, benefits ride-hailing platforms such as DiDi and Uber by enabling a better user experience. Furthermore, our method promotes the long-term development of ITS and spatial-temporal sequence prediction;
b) We are convinced that nobody will be put at a disadvantage by our work. On the contrary, our research indirectly makes travel more convenient for many people and helps environmental protection;
c) Our framework is a potential framework for online application, which reflects its practical value. If our model is selected for practical application, the ride-hailing platform will also conduct multi-directional tests in order to avoid economic loss should our method fail;
d) We ensure that the method does not leverage any biases in the data. The experiments are carried out on a large-scale real-world vehicle travel dataset that includes more than 500 million trajectories and covers almost all road types; therefore, its distribution approximates that of the real world.
References
[1] (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[2] (2003) A neural probabilistic language model. Journal of Machine Learning Research 3 (Feb), pp. 1137–1155.
[3] (2013) Dynamic travel time prediction using pattern recognition. In 20th World Congress on Intelligent Transportation Systems.
[4] (1990) Finding structure in time. Cognitive Science 14 (2), pp. 179–211.
[5] (2020) CompactETA: a fast inference system for travel time prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
[6] (1997) LSTM can solve hard long time lag problems. In Advances in Neural Information Processing Systems, pp. 473–479.
[7] (2012) Learning the dynamics of arterial traffic from probe data using a dynamic Bayesian network. IEEE Transactions on Intelligent Transportation Systems 13 (4), pp. 1679–1693.
[8] (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79 (8), pp. 2554–2558.
[9] (1997) Serial order: a parallel distributed processing approach. In Advances in Psychology, Vol. 121, pp. 471–495.
[10] (2018) Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM), pp. 197–206.
[11] (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
[12] (2019) Travel time estimation without road networks: an urban morphological layout representation approach. arXiv preprint arXiv:1907.03381.
[13] (2015) Deep learning. Nature 521 (7553), pp. 436–444.
[14] (2018) Multi-task representation learning for travel time estimation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1695–1704.
[15] (2015) Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
[16] (2019) FastSpeech: fast, robust and controllable text to speech. In Advances in Neural Information Processing Systems, pp. 3165–3174.
[17] (2019) Self-attention networks for connectionist temporal classification in speech recognition. In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7115–7119.
[18] (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
[19] (2018) When will you arrive? Estimating travel time based on deep neural networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
[20] (2019) A simple baseline for travel time estimation using large-scale trip data. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 1–22.
[21] (2014) Travel time estimation of a path using sparse trajectories. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 25–34.
[22] (2018) Learning to estimate the travel time. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 858–866.
[23] (2015) Show, attend and tell: neural image caption generation with visual attention. In International Conference on Machine Learning, pp. 2048–2057.
[24] (2018) QANet: combining local convolution with global self-attention for reading comprehension. arXiv preprint arXiv:1804.09541.
[25] (2011) T-Drive: enhancing driving directions with taxi drivers' intelligence. IEEE Transactions on Knowledge and Data Engineering 25 (1), pp. 220–232.
[26] (2013) Urban link travel time estimation using large-scale taxi data with partial information. Transportation Research Part C: Emerging Technologies 33, pp. 37–49.
[27] (2016) Urban link travel time prediction based on a gradient boosting method considering spatiotemporal correlations. ISPRS International Journal of Geo-Information 5 (11), pp. 201.
[28] (2018) DeepTravel: a neural network based travel time estimation model with auxiliary supervision. arXiv preprint arXiv:1802.02147.
[29] (2018) End-to-end flow correlation tracking with spatial-temporal attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 548–557.