The data was provided by a major telecom operator and consists of an anonymised sample of seven months of roamers’ CDRs in a European country. The data covers the period from the beginning of May to the end of November 2013. Each CDR contains the principal antenna that a mobile device is connected to during a phone call, SMS communication or data connection. The time-stamped connection event is interpreted as a location measurement, positioning the device inside the approximate coverage area of the principal antenna. The size of this coverage area can range from a few tens of meters in a city to a few kilometers in remote areas. We do not consider the actual geographical positions of the antennas at all; instead, we take this correspondence as a given and represent each antenna as a Unicode (UTF-8) character (the antennas in our sample are too numerous for a smaller character set, hence the use of the Unicode set for the character assignment). The series of connections of a roaming user is transformed into a time-stamped character sequence, which is the object passed to the prediction algorithm.
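The antenna-to-character encoding described above could be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the antenna identifiers and the starting code point are all assumptions made for the example, and the sketch does not guard against running past the chosen Unicode block.

```python
def build_alphabet(antenna_ids, first_code_point=0x4E00):
    """Assign one Unicode character to each distinct antenna identifier.

    first_code_point is an arbitrary (assumed) choice: the start of the CJK
    Unified Ideographs block, which offers a long run of printable characters.
    """
    alphabet = {}
    for antenna in antenna_ids:
        if antenna not in alphabet:
            alphabet[antenna] = chr(first_code_point + len(alphabet))
    return alphabet


def encode_trace(events, alphabet):
    """Turn (timestamp, antenna_id) events into (timestamp, character) pairs."""
    return [(ts, alphabet[antenna]) for ts, antenna in events]


# Hypothetical connection events: (unix timestamp, antenna identifier).
events = [(0, "A-17"), (3600, "B-02"), (7200, "A-17")]
alphabet = build_alphabet(antenna for _, antenna in events)
encoded = encode_trace(events, alphabet)
```

Once encoded, a trace is just a string, so any sequence prediction algorithm that works on characters can be applied without reference to geography.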
We take the time step of the sequence of locations to be 1 hour. If more than one event falls within a single time step, one of them is chosen at random to represent the location of the user. In this manner, the mobility trace of a user is converted to an abstract sequence of symbols that unfolds in one-hour steps, and predictions are given for the next-hour location, i.e. the next symbol in the sequence. The length of the time step is chosen to balance precision with completeness of the sequences. The sequences of antenna connection events for most users are discontinuous and sparse, having the usual erratic profile of mobile activity patterns. A shorter time unit would increase trace fragmentation, while a longer unit would reduce the accuracy of the representation of the actual mobility trace, and consequently the value of the prediction.
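The hourly discretisation can be illustrated with a short sketch. It assumes events arrive as (unix timestamp, symbol) pairs and that hours without events simply break the trace into separate fragments; both helper names are hypothetical.

```python
import random
from collections import defaultdict


def hourly_sequence(events, step_seconds=3600, seed=None):
    """Collapse (unix_timestamp, symbol) events into one symbol per time step.

    Returns {step_index: symbol}. Steps without events are absent, which is
    how gaps in the trace show up. When several events share a step, one of
    them is picked at random.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for ts, symbol in events:
        buckets[ts // step_seconds].append(symbol)
    return {step: rng.choice(symbols) for step, symbols in sorted(buckets.items())}


def continuous_fragments(steps):
    """Split a {step_index: symbol} mapping into strings of consecutive steps."""
    fragments, current, previous = [], [], None
    for step in sorted(steps):
        if previous is not None and step != previous + 1:
            fragments.append("".join(current))
            current = []
        current.append(steps[step])
        previous = step
    if current:
        fragments.append("".join(current))
    return fragments
```

Changing `step_seconds` makes the trade-off visible: a smaller value yields more and shorter fragments, a larger one merges more events into a single symbol.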
The test set of sequences used for benchmarking the algorithm is selected by length, so that we can gather enough statistical data on the behaviour of the algorithm. Specifically, the 1000 longest continuous character strings, i.e. mobility trace fragments, were selected. To make the comparison with individual sequence prediction sharper, we remove from the dataset the rest of the data of the users that have one or more sequences in the set of 1000. In this way it is as if the users in the test set are observed for the first time. The remaining data is used to construct the expert ensemble. Each user’s trace is turned into a Markov model, which then plays the role of an expert providing predictions to the forecaster whenever possible. Our sample comprises more than 10 million users, each of which contributes a prediction algorithm to the expert ensemble. This is an unusually large number of experts. We could find a comparable expert ensemble used only in Cohen & Singer (1999), in the context of document classification. This unusual abundance of experts is not incidental; it defines our approach. We exploit the redundancy of mobility patterns in our dataset to predict traces effectively.
We present the results of the empirical tests of the forecasters from two complementary aspects. We focus first on measuring the absolute performance of the EW forecaster on our 1000 test sequences. We then present aspects of the internal dynamics of the forecaster, and examine how the performance is affected by varying parameters that modify the content of the expert ensemble.
Human mobility is characterised by strong regularities that make it possible in principle to predict it with high precision Song et al. (2010); Lu et al. (2012, 2013); González et al. (2008). It does not, however, always exhibit these regularities. There are contexts in which human mobility patterns either change abruptly or are by their nature non-repetitive. Our test data of mobile network roamers exemplifies this. Tourists and foreign visitors in general are naturally expected to be less regular in their patterns than local residents, in some cases merely crossing through a country without any repetitive patterns at all. In a different sort of situation, when facing an emergency, mobility behaviours can change in a short time, usually while transitioning between more regular regimes Lu et al. (2012). Transient mobility behaviour is also expected at the individual level as a result of changes of daily routines. Sequential learning algorithms are designed precisely to address problems where the nature of the predicted sequence cannot be accounted for a priori, or even assumed to be persistent in time. A diversified approach using many experts overseen by a forecaster is then a better strategy for accurately predicting diverse types of sequences, in our case mobility traces.
The predictive power of the forecaster/expert combination we study is drawn from the redundancies that characterise large mobility datasets. When a single user follows a new, perhaps non-repetitive pattern, e.g. while vacationing, their past position data cannot inform short-term predictions. From the point of view of a dataset containing the traces of a sufficiently large and diverse set of users, however, patterns that are new or irregular for the individual can be found to have already been traced by other users. In the case of tourists in particular, itineraries that are a first for a user have most often already been explored by others in part or in whole. These past patterns are encoded in individual prediction algorithms and, combined by a forecaster, can provide accurate predictions during a transient phase without depending on the regularities of a single individual. In addition, the expert ensemble can be dynamic, with experts added or removed on the fly, e.g. by choosing the experts in a moving time window.
We measure the performance of the EW sleeping experts forecaster on our set of test sequences. We compare its performance with an important benchmark for human mobility prediction at this resolution and duration, the Markov model. The expert ensemble is defined (per sequence) by admitting only expert sequences that ended within a fixed time window before the predicted sequence starts. For the comparative testing we took this window to be 3 months. We also admit all experts in our sample, without any filters. This setup provides optimal performance.
The performance of the EW forecaster can be seen in Fig. 1, in comparison with Markov model individual sequence predictors of order . The best settings for the EW forecaster turn out to be at a value , but the accuracy is only slightly higher than in the adaptive version (Supplementary Information S2), where the learning rate is free to vary over a grid of values, and at each step is given the median of the grid values that have had the best performance so far. In the diagrams we always use . In addition, the exclusion of the user’s own location sequence (as it unfolds in time) from the expert ensemble gives a slightly better average prediction accuracy than one gets when admitting it as an additional expert. As we can see, the EW forecaster performs significantly better than Markov models, with the order-1 model being the most accurate among the latter. This is consistent with the results seen in previous studies of human mobility prediction. Our choice of order 1 for the Markov models of the experts is based on this ranking. The distribution of the differences in accuracy for individual test sequences shows that EW gives a 5% average advantage over the Markov model. When predicting a new sequence, EW overtakes the Markov models in accuracy after an average of 14 hours [Fig. 1(a)]. The quasi-periodic pattern in Fig. 1(a) is explained by the day-night variation in prediction accuracy [Fig. 1(b)], combined with the fact that most foreign visitors arrive during the day.
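The adaptive choice of learning rate mentioned above (the median of the grid values that have performed best so far) might look like the following sketch. The function name, the grid and the use of the lower median (so that the result is always a grid value) are assumptions; the exact procedure is the one given in Supplementary Information S2, which is not reproduced here.

```python
import statistics


def pick_learning_rate(grid, cumulative_losses):
    """Return the median of the grid values with the lowest cumulative loss.

    grid: candidate learning rates.
    cumulative_losses: loss accumulated so far by the forecaster run with
    each candidate, in the same order as `grid`.
    median_low is used (an assumption) so the result is always on the grid.
    """
    best = min(cumulative_losses)
    candidates = sorted(
        rate for rate, loss in zip(grid, cumulative_losses) if loss == best
    )
    return statistics.median_low(candidates)
```

At each step, one would update the cumulative loss of every candidate and call this function again to obtain the learning rate for the next round.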
Internal dynamics of the forecaster
To understand better when and why the sequential learning algorithm works or not, we examine the internal dynamics of our forecaster/expert ensemble combinations. The sequential learning algorithm, considered as a dynamical system evolving in time, contains millions of degrees of freedom, namely the experts’ weights, and is statistical in nature. Its dynamics are described by an equal number of difference equations that depend on the sequence under prediction, and do not admit any simple treatment. Nevertheless, empirical metrics can shed some light on the factors crucial for prediction accuracy.
In Fig. 2(a), we see the dependence of the EW forecaster’s prediction accuracy on two parameters that modify the expert ensemble: the sampling rate at which the ensemble is randomly sampled, and the time span before the start of the predicted sequence from which prediction data is admitted to the ensemble. In both cases, performance is stable over a wide range of the parameter and degrades quickly outside it. This demonstrates, on the one hand, the robustness of the performance to changes in the expert ensemble and, on the other, the quick failure when the expert ensemble starts becoming incomplete. As we see, below a threshold of a few percent sampling rate the algorithm’s performance drops significantly. The fast drop in performance is due to the decimation of the transitions between locations available in the Markov models of the expert ensemble when experts are filtered out. In Fig. 2(b) we plot the average percentage of unique antenna-to-antenna transitions in the test sequences which are also contained in the expert ensemble, as a function of the sampling rate. The performance of the forecaster depends most crucially on the quality of the ensemble. When the ensemble is diverse, shifting mobility patterns are quickly picked up and correctly predicted by the relevant experts. The EW forecaster significantly promotes their weights relative to other experts over just a few time steps. While the Markov model (or any individual sequence prediction algorithm) has to gather enough statistical information about the new behaviour before producing correct predictions, the forecaster has most of this information already available in the experts.
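The transition-coverage metric plotted in Fig. 2(b) can be approximated by the sketch below. It assumes traces are plain strings, that every consecutive symbol pair (including a repeat of the same antenna) counts as a transition, and that sampling keeps each expert independently with a given probability; the function names are illustrative.

```python
import random


def transitions(sequence):
    """Set of distinct (from, to) symbol pairs in a sequence."""
    return set(zip(sequence, sequence[1:]))


def coverage(test_sequence, expert_traces):
    """Percentage of the test sequence's unique transitions found in the ensemble."""
    wanted = transitions(test_sequence)
    if not wanted:
        return 100.0  # nothing to cover
    available = set()
    for trace in expert_traces:
        available |= transitions(trace)
    return 100.0 * len(wanted & available) / len(wanted)


def sample_ensemble(expert_traces, rate, seed=None):
    """Keep each expert independently with probability `rate`."""
    rng = random.Random(seed)
    return [trace for trace in expert_traces if rng.random() < rate]
```

Evaluating `coverage` on `sample_ensemble(traces, rate)` for decreasing values of `rate` gives a curve of the kind described above: flat while redundancy absorbs the loss of experts, then falling once transitions start to disappear from the ensemble.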
This overall dependence on the availability of transitions in the expert ensemble can also be seen when zooming in on single sequences. The three sequences shown in Fig. 4 are coloured in three different scales, showing the qualitative correlations between the numbers of best and awake experts and success or failure in prediction. They have been picked to represent three types of prediction dynamics typically seen in our test set. It is clear that the probability of success correlates strongly with the number of best experts at any given time step. This number can stay relatively stable over a segment and then change up or down abruptly as the user moves to a new antenna for which few experts are available. When many experts are available for the current mobility patch of the user, the forecaster quickly starts producing correct predictions.
The EW forecaster’s internal benchmark is regret, the difference in cumulative loss (here the number of erroneous predictions) between itself and the best expert, i.e. the expert with the minimum loss, in those rounds where the expert was awake Freund et al. (1997); Blum & Mansour (2005). Here we use a different measure, comparing the forecaster’s predictive accuracy with the accuracy obtained by each expert when predicting the sequence alone. For the comparison we declare as best expert the one that attains the best prediction accuracy over the whole sequence. In our test set, the forecaster is often significantly more accurate than the best expert, while never performing much worse [Fig. 4(a)]. The Markov model constructed sequentially from the predicted user’s data, in contrast, performs more poorly, with the best expert holding a significant advantage in the majority of cases [Fig. 4(b)]. In effect, the mobility trace of a different user is often a better predictor of a user’s trace than that user’s own mobility data. Of course, this is due partly to the dynamical construction of the user’s own Markov model, which is populated with new transitions as they happen, while the experts’ Markov models are derived from past data in a fixed time window.
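The accuracy-based comparison with the best expert can be sketched as follows. The sketch assumes that each expert's predictions over the test sequence are available as a list aligned with the outcomes, with `None` marking steps where the expert abstained, and that abstentions count as misses; this last point is an assumption, not something stated in the text.

```python
def accuracy(predictions, outcomes):
    """Fraction of steps predicted correctly; abstentions (None) count as misses."""
    hits = sum(p is not None and p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


def best_expert(expert_predictions, outcomes):
    """Return (index, accuracy) of the expert most accurate over the whole sequence."""
    scores = [accuracy(preds, outcomes) for preds in expert_predictions]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

The quantity compared in Fig. 4(a) would then be the difference between `accuracy(forecaster_predictions, outcomes)` and the second element returned by `best_expert`.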
We have shown that a large number of individual sequence prediction algorithms derived from the mobility traces of mobile phone users can be combined by an Exponential Weights forecaster to provide accurate next-hour location predictions for individual users. Using a dataset of mobility traces, we have demonstrated the potential of the method to predict short trips of transient populations such as tourists, which are in general non-stationary. The method can easily be implemented for CDRs, even when the data is highly incomplete, with many gaps in time. It outperforms the Markov model standard for individual sequence human mobility prediction while requiring only time-stamped locations as input.
The proposed method is domain-agnostic, and can in principle be applied to the prediction of any dataset of time series that can be encoded into character sequences. The only essential requirement is that the sequence dataset contains enough observations so that the phase space of the dynamical system under prediction is already covered, many times over if possible. This is in contrast to individual sequence prediction algorithms, where solely the data of a single sequence is needed to make predictions. In exchange, the method enjoys the advantage of fast adaptation to new mobility patterns for a newly observed agent, or in cases where the event sequence is transient.
Sequential prediction: experts and forecasters
The problem of human mobility prediction is naturally formulated as a sequential prediction problem. The sequence of positions unfolds in time, and the data available for predicting the next position lies in the past. Sequential prediction methods have been developed as a means to provide guarantees on the quality of predictions without making any a priori assumptions on the nature of the sequence. In sequential prediction with experts, the unknown character of the unfolding sequence is anticipated by combining a collection of prediction algorithms instead of a single one. The goal is to make the algorithm able to adapt to non-quasi-stationary transient patterns that may be encountered in the sequence. The benchmark for the quality of prediction is the performance of the best expert, or of an optimal combination of experts. Instead of depending only on the sequence, as in individual sequence prediction when a universal algorithm is used, the best possible prediction rate depends both on the sequence and on the ensemble of experts.
The absolute performance of an expert is measured by a loss function $\ell(f_{i,t}, y_t)$ that quantifies the difference between the $i$th expert’s prediction $f_{i,t}$ and the actual outcome $y_t$ (in our case, the user’s position in the next hour, indexed by the positive integer $t$). Among experts in a finite set, there are always one or more that will suffer the minimum loss over a given sequence. One of these experts can be chosen as representative of this class, as the best expert. The individual predictions are combined by a forecaster, an algorithm that assigns to each expert a weight and makes a prediction $\hat{p}_t$ for the actual outcome based on these weights. The forecaster’s goal is to minimise its own loss:
$$\hat{L}_n = \sum_{t=1}^{n} \ell(\hat{p}_t, y_t)$$
compared to the loss of the best expert in the ensemble, for a sequence of length $n$. This relative loss is called regret and is always measured in hindsight:
$$R_n = \hat{L}_n - L_{i^*,n} = \sum_{t=1}^{n} \ell(\hat{p}_t, y_t) - \sum_{t=1}^{n} \ell(f_{i^*,t}, y_t)$$
where $i^*$ is the index of the best expert, i.e. the expert with the minimum cumulative loss at step $n$. Following the standard in the human mobility literature, we use the simple binary loss function
$$\ell(f_{i,t}, y_t) = \begin{cases} 0 & \text{if } f_{i,t} = y_t, \\ 1 & \text{otherwise.} \end{cases}$$
The forecaster loss function is also taken to be $\ell$. An additional complication arising in mobility prediction with trace-derived experts is that at every round only a fraction of the experts can provide predictions. The Markov model for an expert is of fixed order (see Supplementary Information S1 for a discussion on the choice of algorithms). It is constructed by counting the relative frequency of transitions between antennas in a user’s history. Users generally explore a very small subset of the possible transitions, and so only users that have in the past connected to a given antenna or a sequence of antennas can provide predictions for users currently in the same location. As a result, most experts will abstain from prediction at any given round. When experts cannot provide a prediction at every step, they are called sleeping or specialised. The specific versions of the EW forecaster that we use can be found in Supplementary Information S2. For sleeping experts forecasters, theoretical bounds on regret based on varying definitions have been derived in Blum (1997); Freund et al. (1997); Blum & Mansour (2005); Devaine et al. (2009); Kleinberg et al. (2010). The bounds compare the forecaster to the best expert or convex combination of experts in those instances where the expert was awake. We do not attempt here to prove a bound for the variant we study.
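A trace-derived sleeping expert of the kind described above can be sketched as a first-order Markov model that abstains when it has never seen the current antenna. The class name is hypothetical, and the rule of predicting the most frequent successor is an assumption about how the transition frequencies are turned into a prediction.

```python
from collections import Counter, defaultdict


class MarkovExpert:
    """First-order Markov model built from one user's symbol sequence."""

    def __init__(self, trace):
        # counts[a][b] = number of times symbol b followed symbol a.
        self.counts = defaultdict(Counter)
        for current, nxt in zip(trace, trace[1:]):
            self.counts[current][nxt] += 1

    def predict(self, current):
        """Return the most frequent successor of `current`, or None if asleep."""
        successors = self.counts.get(current)
        if not successors:
            return None  # this user never left from this antenna: abstain
        return successors.most_common(1)[0][0]
```

For example, `MarkovExpert("abababc").predict("a")` gives `"b"`, while `predict("z")` returns `None`, which is the sleeping case: the expert has no information about that location.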
The two main ingredients of a sequential learning algorithm are the mechanism for updating the weights after each round, and the rule for combining the experts. Some version of a majority vote is usually chosen for the latter. In our case the prediction outcomes are discrete characters corresponding to individual antennas. At each round the forecaster randomly picks a single prediction out of those provided by the experts, with probability proportional to the weight of each expert. The weight update mechanism is a central factor behind the quality of the predictions, together with the quality of the expert ensemble. Exponential Weights forecasters penalise experts that make wrong predictions by multiplying their weights by a factor less than unity, depending on the loss and a learning rate $\eta$. A higher learning rate accelerates the process of weight update:
$$w_{i,t+1} = w_{i,t}\, e^{-\eta\, \ell(f_{i,t}, y_t)}$$
Here $w_{i,t}$ is the weight of expert $i$ at round $t$ and $\ell(f_{i,t}, y_t)$ is the loss of its prediction $f_{i,t}$ against the outcome $y_t$. Experts with lowered weight will contribute less in the next round. Those that are often correct will see their predictions selected with higher probability. We test the EW forecaster both with a fixed learning rate and in an adaptive version where the value of $\eta$ is tuned sequentially as in Devaine et al. (2009). In addition, we tested versions where the user’s own position data, encoded in an individual prediction algorithm of the same type as for the other experts, is added to the expert ensemble. These versions guarantee that in the infinite time limit the expert ensemble contains at least one expert that can reach the performance expected from an individual sequence prediction algorithm, but they consistently performed slightly worse than the corresponding variants where the user’s own Markov model was not included.
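The forecaster loop described above (a random pick among awake experts in proportion to their weights, followed by a multiplicative penalty for wrong predictions) can be sketched as below. This is a simplified illustration under stated assumptions, not the variant specified in Supplementary Information S2: experts are modelled as callables that return a symbol or `None` when asleep, the learning rate is called `eta`, only awake experts are updated, and weights are rescaled after each round purely to avoid numerical underflow.

```python
import math
import random


def ew_sleeping_forecast(experts, sequence, eta=0.5, seed=None):
    """Run a simple Exponential Weights forecaster with sleeping experts.

    experts: callables mapping the current symbol to a predicted next symbol,
             or to None when the expert is asleep.
    Returns (forecasts, weights); a forecast is None when no expert is awake.
    """
    rng = random.Random(seed)
    weights = [1.0] * len(experts)
    forecasts = []
    for current, outcome in zip(sequence, sequence[1:]):
        advice = [expert(current) for expert in experts]
        awake = [i for i, a in enumerate(advice) if a is not None]
        if not awake:
            forecasts.append(None)  # nobody can predict this step
            continue
        awake_weights = [weights[i] for i in awake]
        if sum(awake_weights) == 0.0:
            awake_weights = None  # all underflowed: fall back to a uniform pick
        # Pick one awake expert with probability proportional to its weight.
        chosen = rng.choices(awake, weights=awake_weights)[0]
        forecasts.append(advice[chosen])
        # Binary loss: awake experts that erred are multiplied by exp(-eta).
        for i in awake:
            loss = 0.0 if advice[i] == outcome else 1.0
            weights[i] *= math.exp(-eta * loss)
        # Rescale so the largest weight is 1; relative weights are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return forecasts, weights
```

With the binary loss, an expert that has been wrong k times while awake ends up with relative weight exp(-eta * k), so a larger `eta` shifts the random pick towards the currently best experts more quickly.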
The central idea of our approach is that with enough users in the dataset, the space of mobility patterns will be densely covered. Users will show similarities in the type and frequency of transitions between antennas. When these transitions are encoded in an individual sequence prediction algorithm, e.g. a Markov model, one user’s past data can be useful in predicting another user’s future mobility. The bootstrapped predictions draw exclusively from the position data of users without making a priori assumptions about the sequence, such as stationarity (Supplementary Information S3), or requiring additional data sources. The forecaster learns and adapts its choices based only on the success of each expert in providing correct predictions.
This research was funded by the CS Research Foundation, Amsterdam, The Netherlands (www.collectivesensing.org) and by the Austrian Science Fund (FWF) through the Doctoral College GIScience (DK W 1237-N23), Department of Geoinformatics - Z_GIS, University of Salzburg, Austria. P.K. would like to thank Panayotis Mertikopoulos for useful discussions.
- Feder et al. (1992) Feder, M., Merhav, N., and Gutman, M. Universal prediction of individual sequences. IEEE Transactions on Information Theory, 38(4):1258–1270, 1992.
- Rissanen (1984) Rissanen, J. Universal coding, information, prediction, and estimation. IEEE Transactions on Information Theory, 30(4):629–636, 1984.
- Song et al. (2010) Song, C., Qu, Z., Blumm, N., and Barabási, A-L. Limits of predictability in human mobility. Science, 327(5968):1018–1021, 2010.
- Domenico et al. (2013) De Domenico, Manlio, Lima, Antonio, and Musolesi, Mirco. Interdependence and predictability of human mobility and social interactions. Pervasive and Mobile Computing, 9(6):798–807, 2013. ISSN 1574-1192. Mobile Data Challenge.
- Cesa-Bianchi & Lugosi (2006) Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning and Games. Cambridge University Press, New York, NY, USA, 2006. ISBN 0521841089.
- Cutland et al. (1991) Cutland, Nigel, Kopp, Ekkehard, and Willinger, Walter. Universal portfolios. Mathematical finance, 1(1):1–29, 1991.
- Cohen & Singer (1999) Cohen, William W. and Singer, Yoram. Context-sensitive learning methods for text categorization. ACM Trans. Inf. Syst., 17(2):141–173, April 1999. ISSN 1046-8188.
- Borodin et al. (2000) Borodin, Allan, El-Yaniv, Ran, and Gogan, Vincent. On the competitive theory and practice of portfolio selection. In LATIN 2000: Theoretical Informatics, pp. 173–196. Springer, 2000.
- Monteleoni & Jaakkola (2003) Monteleoni, Claire and Jaakkola, Tommi S. Online learning of non-stationary sequences. In Advances in Neural Information Processing Systems, 2003.
- Stoltz & Lugosi (2005) Stoltz, Gilles and Lugosi, Gábor. Internal regret in on-line portfolio selection. Machine Learning, 59(1-2):125–159, 2005.
- Devaine et al. (2009) Devaine, M., Goude, Y., and Stoltz, G. Aggregation of sleeping predictors to forecast electricity consumption. Rapport technique, EDF R&D et Ecole normale superieure, Paris, 2009.
- Mallet et al. (2009) Mallet, Vivien, Stoltz, Gilles, and Mauricette, Boris. Ozone ensemble forecast with machine learning algorithms. Journal of Geophysical Research: Atmospheres (1984–2012), 114(D5), 2009.
- Vovk & Zhdanov (2009) Vovk, Vladimir and Zhdanov, Fedor. Prediction with expert advice for the Brier game. The Journal of Machine Learning Research, 10:2445–2471, 2009.
- Dashevskiy & Luo (2011) Dashevskiy, Mikhail and Luo, Zhengqian. Time series prediction with performance guarantee. IET Communications, 5(8):1044–1051, 2011.
- Jacobs & Tamer (2011) Jacobs, Abigail Z and Tamer, Elie. Adapting to non-stationarity with growing predictor ensembles. Master’s thesis, Northwestern University, 2011.
- Mallet (2010) Mallet, Vivien. Ensemble forecast of analyses: Coupling data assimilation and sequential aggregation. Journal of Geophysical Research: Atmospheres (1984–2012), 115(D24), 2010.
- Monteleoni et al. (2011) Monteleoni, Claire, Schmidt, Gavin A, Saroha, Shailesh, and Asplund, Eva. Tracking climate models. Statistical Analysis and Data Mining: The ASA Data Science Journal, 4(4):372–392, 2011.
- Li & Hoi (2014) Li, Bin and Hoi, Steven C. H. Online portfolio selection: A survey. ACM Comput. Surv., 46(3):35:1–35:36, January 2014. ISSN 0360-0300.
- Blum (1997) Blum, Avrim. Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain. Machine Learning, 26(1):5–23, 1997.
- Freund et al. (1997) Freund, Yoav, Schapire, Robert E., Singer, Yoram, and Warmuth, Manfred K. Using and combining predictors that specialize. In Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing, STOC ’97, pp. 334–343, New York, NY, USA, 1997. ACM. ISBN 0-89791-888-6.
- Blum & Mansour (2005) Blum, Avrim and Mansour, Yishay. From external to internal regret. In Auer, Peter and Meir, Ron (eds.), Learning Theory, volume 3559 of Lecture Notes in Computer Science, pp. 621–636. Springer Berlin Heidelberg, 2005. ISBN 978-3-540-26556-6.
- Kleinberg et al. (2010) Kleinberg, Robert, Niculescu-Mizil, Alexandru, and Sharma, Yogeshwer. Regret bounds for sleeping experts and bandits. Machine learning, 80(2-3):245–272, 2010.
- de Montjoye et al. (2013) de Montjoye, Yves-Alexandre, Quoidbach, Jordi, Robic, Florent, and Pentland, Alex (Sandy). Predicting personality using novel mobile phone-based metrics. In Greenberg, Ariel M., Kennedy, William G., and Bos, Nathan D. (eds.), Social Computing, Behavioral-Cultural Modeling and Prediction, volume 7812 of Lecture Notes in Computer Science, pp. 48–55. Springer Berlin Heidelberg, 2013. ISBN 978-3-642-37209-4.
- Lu et al. (2012) Lu, X., Bengtsson, L., and Holme, P. Predictability of population displacement after the 2010 Haiti earthquake. Proceedings of the National Academy of Sciences, 109(29):11576–11581, 2012.
- Lu et al. (2013) Lu, X., Wetter, E., Bharti, N., Tatem, A.J., and Bengtsson, L. Approaching the limit of predictability in human mobility. Scientific Reports, 3:1–9, 2013.
- Song et al. (2006) Song, L., Kotz, D., Jain, R., and Xiaoning, He. Evaluating next-cell predictors with extensive Wi-Fi mobility data. IEEE Transactions on Mobile Computing, (12), pp. 1633–1649, 2006.
- Eubank et al. (2004) Eubank, Stephen, Guclu, Hasan, Kumar, VS Anil, Marathe, Madhav V, Srinivasan, Aravind, Toroczkai, Zoltan, and Wang, Nan. Modelling disease outbreaks in realistic urban social networks. Nature, 429(6988):180–184, 2004.
- Tatem et al. (2009) Tatem, Andrew J, Qiu, Youliang, Smith, David L, Sabot, Oliver, Ali, Abdullah S, Moonen, Bruno, et al. The use of mobile phone data for the estimation of the travel patterns and imported Plasmodium falciparum rates among Zanzibar residents. Malaria Journal, 8:287, 2009.
- Tizzoni et al. (2014) Tizzoni, Michele, Bajardi, Paolo, Decuyper, Adeline, Kon Kam King, Guillaume, Schneider, Christian M., Blondel, Vincent, Smoreda, Zbigniew, González, Marta C., and Colizza, Vittoria. On the use of human mobility proxies for modeling epidemics. PLoS Comput Biol, 10(7):e1003716, 07 2014.
- Wesolowski et al. (2012) Wesolowski, Amy, Eagle, Nathan, Tatem, Andrew J, Smith, David L, Noor, Abdisalan M, Snow, Robert W, and Buckee, Caroline O. Quantifying the impact of human mobility on malaria. Science, 338(6104):267–270, 2012.
- González et al. (2008) González, M.C., Hidalgo, C.A., and Barabási, A.-L. Understanding individual human mobility patterns. Nature, 453:779–782, 2008.