Mobility has always been part of human history. In 2017, there were about 258 million international migrants worldwide, of which 150.3 million were migrant workers (UN DESA, 2019). Modeling and forecasting human mobility is therefore important, both to help formulate effective governance strategies and to deliver insight at scale to humanitarian responders and policymakers. At the same time, developing reliable forecasting methods able to predict $T_{o,d}^{t+1}$, the number of people moving at the next time step from an origin region $o$ to a destination region $d$, is extremely challenging due to the lag, or even absence, of recent migration data, especially for developing countries (Askitas and Zimmermann, 2015; Böhme et al., 2020).
One way to mitigate this lag or lack of data is to use real-time geo-referenced data from the internet, like the Global Database of Events, Language, and Tone (GDELT Project) or Google Trends. Both have been successfully used for forecasting in various fields (Choi and Varian, 2012; Ahmed et al., 2016; Ginsberg et al., 2009). Recently, Böhme et al. (2020) demonstrated that adding geo-referenced online search data to predict migration flows yields better performance than only using common economic and demographic indices, e.g. the gross domestic product (GDP) and the population size. The authors propose to predict the bilateral migration flows of the next year with a linear model relying on the Google Trends data captured during the previous year.
In this work we use the exact same data, but we replace the linear model by a recurrent neural network (an LSTM (Hochreiter and Schmidhuber, 1997)) that is able to consider the whole history when making predictions. We demonstrate that the prediction quality can be drastically improved by better capturing the complex migration dynamics (Masucci et al., 2013) and the complex interactions between the many features.
The outline of our work is the following. We first introduce the related work in section 2. In order to make this article more self-contained, we explain in section 3 how the Google Trends features are extracted in Böhme et al. (2020) and also briefly introduce recurrent neural networks. We then describe our recurrent neural network approach in section 4. Finally, our approach is evaluated and compared with the previous approach in section 5.
2. Related Work
In traditional models, the problem of predicting $T_{o,d}^{t+1}$ (or an estimation of it) is usually divided into two sub-problems: (a) predict the number of people $P_o$ leaving a region $o$ (a.k.a. the production function); and (b) predict the probability $p_{o,d}$ of a movement from $o$ to $d$. Thus we get that $T_{o,d} = P_o \cdot p_{o,d}$. With ML models, the problem is quite different, as the goal is to directly predict $T_{o,d}$ from a set of features (notice that one could approach the problem the same way as with traditional models, but it is not common practice).
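As an illustration, the production-times-probability decomposition described above can be sketched with a toy example (all numbers are hypothetical, not taken from the paper's data):

```python
import numpy as np

# Traditional decomposition: T[o, d] = P[o] * p[o, d]
P = np.array([1000.0, 500.0])           # production: people leaving each origin region
p = np.array([[0.0, 0.7, 0.3],          # p[o, d]: probability that a move from o
              [0.4, 0.0, 0.6]])         # is absorbed by destination d (rows sum to 1)
T = P[:, None] * p                      # resulting origin-destination flow matrix

print(T)
```

An ML model would instead regress the entries of `T` directly from the features, skipping the intermediate production function.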
| Feature | Description |
| --- | --- |
| $GDP_o^t$ | Gross Domestic Product of the origin country $o$ during year $t$ |
| $GDP_d^t$ | Gross Domestic Product of the destination country $d$ during year $t$ |
| $N_o^t$ | Population size of the origin country $o$ during year $t$ |
| $N_d^t$ | Population size of the destination country $d$ during year $t$ |
| $\mathbb{1}_o$ | Origin country fixed effects, encoded as a one-hot vector |
| $\mathbb{1}_d$ | Destination country fixed effects, encoded as a one-hot vector |
| $\mathbb{1}_t$ | Year fixed effects, encoded as a one-hot vector |
| $GTI_{o,d}^t$ | Bilateral GTI for a pair of origin country $o$ and destination country $d$ during year $t$ |
| $GTI_o^t$, $GTI_d^t$ | Unilateral and destination GTI for an origin country $o$ and a destination country $d$ during year $t$ |
| $T_{o,d}^t$ | Current-year migration flow from country $o$ to country $d$ |
There are basically two conventional models: (a) the gravity model; and (b) the radiation model. Gravity models, inspired by Newton's law, evaluate the probability of a movement between two regions as proportional to the population sizes of the two regions $o$ and $d$, and inversely proportional to the distance between them (Anderson, 2011; Letouzé et al., 2009; Poot et al., 2016). In radiation models, inspired by diffusion dynamics, a movement is emitted from a region $o$ and has a certain probability of being absorbed by a neighboring region $d$. The subtlety here is that this probability depends on the population of the origin, the population of the destination, and the population inside a circle centered at $o$ with a radius equal to the distance from $o$ to $d$ (Simini et al., 2012). Gravity models are usually better at capturing short-distance mobility behaviors, while radiation models are usually better at capturing long-distance mobility behaviors (Masucci et al., 2013).
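The two conventional models can be sketched as follows (a minimal sketch: the gravity exponents and all the example values are illustrative, not estimated from data; real studies fit them empirically):

```python
def gravity_flow(pop_o, pop_d, dist, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Basic gravity model: flow proportional to the two population sizes
    and inversely proportional to a power of the distance."""
    return k * pop_o**alpha * pop_d**beta / dist**gamma

def radiation_flow(total_out, pop_o, pop_d, s):
    """Radiation model (Simini et al., 2012): `s` is the population inside
    the circle centered at the origin with radius equal to the
    origin-destination distance (excluding origin and destination)."""
    return total_out * (pop_o * pop_d) / ((pop_o + s) * (pop_o + pop_d + s))

# Hypothetical example: origin of 1M people, destination of 500k at 100 km,
# with 200k people living closer to the origin than the destination is.
print(gravity_flow(1e6, 5e5, 100.0))
print(radiation_flow(total_out=1e4, pop_o=1e6, pop_d=5e5, s=2e5))
```

Note that the radiation model is parameter-free once the total outflow is known, while the gravity model's exponents must be calibrated.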
To the best of our knowledge, (Robinson and Dilkina, 2018) is the first attempt to use ML in order to predict human migration. The authors use two ML techniques: (a) an extreme gradient boosting (XGBoost) model; and (b) a deep learning based artificial neural network (ANN) model. Similarly to us, this approach also attempts to directly predict $T_{o,d}$ from the set of features without requiring any production function. But this approach also exhibits two important differences with ours: 1) it uses traditional features for the prediction model, composed of geographical and econometric properties such as the inter-country distance, median household income, etc.; 2) it does not capture the dynamic aspect, since the prediction only relies on the previous time step's set of features.
More recently, Böhme et al. (2020) proposed to use the Google Trends Index (GTI) of a set of keywords related to migration (examples: visa, migrant, work, etc.) as a new feature set for migration prediction. Böhme et al. (2020) rely on a bilateral gravity model to predict the total number of migrants leaving a country of origin towards any of the OECD's destination countries during a specific year. The gravity models are estimated by a linear regression. Our approach uses the exact same input data and thus also relies on the Google Trends Index (GTI) data. But instead of a linear least squares estimation model, we use a recurrent neural network (LSTM) that is fed with the complete set of historical features rather than only those coming from the previous time step.
We start by describing the data used for learning and predicting migration, giving more details about the new set of Google Trends features from (Böhme et al., 2020). We then describe the performance metrics used to compare the prediction models, also used in (Robinson and Dilkina, 2018). Finally, we briefly introduce recurrent neural networks.
3.1. Data and features sets
Table 1 gives an overview of the features used from the data provided by Böhme et al. (2020). More specifically, we use the following features: the Gross Domestic Product (GDP) of the origin and destination countries, the population size of the origin and destination countries (World Bank, 2020), the bilateral Google Trends Index (GTI), the migration numbers from the previous year (OECD, 2020), as well as three one-hot vectors encoding the origin country, the destination country, and the year.
Google Trends Index features
The Google Trends Index (GTI) is based on the Google Trends data freely accessible at (Google, 2020). The Google Trends tool allows collecting a measure of the relative quantity of web searches for a given keyword in a particular region of the world over a specified span of time; the data can be downloaded from the website, or through an unofficial API (General Mills, 2019; Google, 2020). To best represent the migration intentions of Internet users via online searches, a set of terms related to the theme of migration is selected. It is composed of the 67 terms most semantically related to "immigration" and the 67 terms most semantically related to "economics" according to the website "Semantic Link" (https://semantic-link.com/). Every term is translated into three languages with Latin roots (English, Spanish, and French) in order to keep the extraction manageable while covering a maximum of people, i.e. about 841 million native speakers (Eberhard et al., 2020). Tables 4 and 5 in the appendix contain the set of main keywords (Böhme et al., 2020, Table 1).
The GTI of a given keyword for a particular country is then calculated from the measures provided by Google Trends for that keyword in the geographical area corresponding to the country in question, for the time period spanning from 2004 to 2014 (Google Trends data only starts in 2004, and the migration data stops after 2015). Since the values provided by Google come in intervals of one month (this is specific to requests spanning from 2004 to the present) and are normalized to a range between 0 and 100, the GTI is computed by taking the average of the values for each year, in order to match the annual resolution of the migration data. The index therefore reflects the variation of the quantity of searches for the keyword over the years.
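The yearly averaging step can be sketched with pandas (a minimal sketch with made-up monthly values; real values would come from the Google Trends website or the unofficial pytrends API):

```python
import pandas as pd

# Hypothetical monthly Google Trends values (normalized 0-100) for one
# keyword in one country during 2004.
monthly = pd.Series(
    [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 0, 10],
    index=pd.date_range("2004-01-01", periods=12, freq="MS"),
)

# The yearly GTI is the average of the twelve monthly values, matching the
# annual resolution of the migration data.
gti = monthly.groupby(monthly.index.year).mean()
print(gti)
```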
Three different forms of GTI values are then defined:

- The vector of unilateral GTI, $GTI_o^t$, contains the GTI values of the set of keywords for the country of origin $o$ during the year $t$.
- The vector of bilateral GTI, $GTI_{o,d}^t$, contains GTI values also specific to the country of destination $d$. The values are still captured in the country of origin $o$ during the year $t$, but the related keywords correspond to the combination of the terms with the name of the destination country (e.g. visa Spain, migrant Spain, work Spain).
- The destination GTI, $GTI_d^t$, contains only the GTI value of the keyword corresponding to the destination country's name (e.g. Spain), captured in the country $o$ during the year $t$.
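The way the three GTI feature forms are assembled for one origin-destination pair can be sketched as follows (`fetch_gti` is a hypothetical placeholder standing in for the actual Google Trends retrieval and yearly-averaging step; the keyword subset is illustrative):

```python
keywords = ["visa", "migrant", "work"]     # subset of the 134 selected terms
origin, destination, year = "Morocco", "Spain", 2010

def fetch_gti(keyword, country, year):
    """Hypothetical placeholder: returns a deterministic dummy GTI value
    instead of querying Google Trends."""
    return float(len(keyword) + year % 10)

# Unilateral GTI: the plain keywords, searched from the origin country.
unilateral = [fetch_gti(k, origin, year) for k in keywords]
# Bilateral GTI: keywords combined with the destination country's name.
bilateral = [fetch_gti(f"{k} {destination}", origin, year) for k in keywords]
# Destination GTI: the destination country's name alone.
destination_gti = fetch_gti(destination, origin, year)

print(unilateral, bilateral, destination_gti)
```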
The OECD (2020) database provides yearly incoming migration flows from 101 countries of origin to the 35 OECD member countries from the early 1980s until 2015. Demographic and economic data about each destination and origin country have been gathered from the World Development Indicators (World Bank, 2020).
3.2. Evaluating prediction models
- Common Part of Commuters ($CPC$):

$$CPC(T, \hat{T}) = \frac{2 \sum_{o,d} \min(T_{o,d}, \hat{T}_{o,d})}{\sum_{o,d} T_{o,d} + \sum_{o,d} \hat{T}_{o,d}} \quad (1)$$

Its value is 0 when the ground-truth matrix $T$ and the prediction matrix $\hat{T}$ have no entries in common, and 1 when they are identical.

- Mean Absolute Error ($MAE$):

$$MAE(T, \hat{T}) = \frac{1}{n} \sum_{o,d} \left| T_{o,d} - \hat{T}_{o,d} \right|$$

Its value is 0 when the values of both matrices are identical, and arbitrarily positive the worse the prediction gets.

- Root Mean Square Error ($RMSE$):

$$RMSE(T, \hat{T}) = \sqrt{\frac{1}{n} \sum_{o,d} \left( T_{o,d} - \hat{T}_{o,d} \right)^2}$$

Its value is 0 when the values of both matrices are identical, and arbitrarily positive the worse the prediction gets. The main difference with the MAE is that the RMSE penalizes large errors more strongly.

- Coefficient of determination ($r^2$):

$$r^2(T, \hat{T}) = 1 - \frac{\sum_{o,d} (T_{o,d} - \hat{T}_{o,d})^2}{\sum_{o,d} (T_{o,d} - \bar{T})^2}$$

Its value is 1 when the predictions perfectly fit the ground truth values, 0 when the predictions are identical to the expectation of the ground truth values, and arbitrarily negative the worse the fit gets.

- Mean Absolute Error In ($MAE_{in}$):

$$MAE_{in}(T, \hat{T}) = \frac{1}{|D|} \sum_{d \in D} \left| \sum_{o} T_{o,d} - \sum_{o} \hat{T}_{o,d} \right|$$

That is, the MAE on the total incoming migrants by destination country $d$.
| Metric | Notation |
| --- | --- |
| Common Part of Commuters | $CPC$ |
| Mean Absolute Error | $MAE$ |
| Root Mean Square Error | $RMSE$ |
| Coefficient of determination | $r^2$ |
| Mean Absolute Error In | $MAE_{in}$ |
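The five metrics can be sketched in a few lines of NumPy (a minimal sketch: `t` is the ground-truth flow matrix, `t_hat` the prediction, with destination countries as columns; the example matrices are made up):

```python
import numpy as np

def cpc(t, t_hat):
    """Common Part of Commuters: 0 = no overlap, 1 = identical matrices."""
    return 2.0 * np.minimum(t, t_hat).sum() / (t.sum() + t_hat.sum())

def mae(t, t_hat):
    return np.abs(t - t_hat).mean()

def rmse(t, t_hat):
    return np.sqrt(((t - t_hat) ** 2).mean())

def r2(t, t_hat):
    """Coefficient of determination: 1 = perfect fit, 0 = predicting the mean."""
    return 1.0 - ((t - t_hat) ** 2).sum() / ((t - t.mean()) ** 2).sum()

def mae_in(t, t_hat):
    """MAE on the total incoming migrants per destination country (columns)."""
    return np.abs(t.sum(axis=0) - t_hat.sum(axis=0)).mean()

truth = np.array([[0.0, 100.0], [50.0, 0.0]])
pred = np.array([[10.0, 80.0], [60.0, 0.0]])
print(cpc(truth, pred), mae(truth, pred), rmse(truth, pred))
```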
3.3. Recurrent Neural Networks and Long Short-Term Memory (LSTM)
Recurrent neural networks (RNN) (Graves, 2012; Goodfellow et al., 2016) and LSTMs are artificial neural network (ANN) architectures particularly well suited to predicting time series or sequential data. They allow features learned across different parts of the sequential data to persist through the network, and they do not require a fixed number of input vectors. Long short-term memory (LSTM) networks (Hochreiter, 1991; Doya, 1993; Bengio et al., 1994; Graves, 2012) are special RNN architectures that improve the ability to properly learn long-term dependencies by limiting the risk of vanishing and exploding gradients.
Since its first publication, the LSTM has gained momentum in several application areas, including forecasting, and has been shown to yield better performance in the prediction of time series compared to other ML techniques (Schmidhuber et al., 2005; Greff et al., 2017; Gers et al., 2000; Yao et al., 2015; Tax et al., 2017; Liang et al., 2019; Giles, 2001; Jozefowicz et al., 2015).
4. Our LSTM approach
As described in figure 1, we use a single RNN in charge of predicting the bilateral flows $T_{o,d}^{t+1}$, with the origin and destination countries one-hot encoded, for all pairs. The RNN has a unique LSTM layer. Another approach would have been to use a different network to estimate the flow of each pair of countries; the amount of data to train each one would have been very limited, though.
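A minimal Keras sketch of such a network is shown below. The feature dimension of 300 is hypothetical; the layer width of 50 and the dropout of 0.15 follow the hyperparameters reported in section 4.1, and `return_sequences=True` lets the single LSTM layer emit a next-year flow prediction at every time step of one origin-destination series:

```python
import numpy as np
from tensorflow import keras

n_features = 300  # GDP, populations, one-hot encodings, GTI values, ... (hypothetical size)

model = keras.Sequential([
    keras.Input(shape=(None, n_features)),           # variable-length yearly series
    keras.layers.LSTM(50, return_sequences=True,
                      unit_forget_bias=True),        # forget-gate bias of 1
    keras.layers.Dropout(0.15),
    keras.layers.Dense(1),                           # predicted next-year flow per step
])
model.compile(optimizer="adam", loss="mae")

# One series of 10 years of features -> 10 next-year flow predictions.
x = np.random.rand(1, 10, n_features).astype("float32")
print(model.predict(x, verbose=0).shape)
```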
4.1. Learning models and hyper-parameter optimization
To train the ML models we proceed using three sets (Goodfellow et al., 2016): (a) a training set, gathering the input features from 2004 to 2012 and the observed migration flows from 2005 to 2013 as output (since we predict next-year migration); (b) a validation set, containing the input features of 2013 and the migration flows of 2014; and (c) a test set, with the input features of 2014 and the migration flows of 2015. The hyperparameters are optimized accordingly for each model.
A simplified version of our LSTM training is presented in Algorithm 1, while our LSTM evaluation is presented in Algorithm 2. Notice that the span of years presented in the algorithms corresponds to the one used once the validation is completed, i.e., we fit our model on both the training and validation set.
Due to the specificity of LSTM, we fit our LSTM time series by time series (by time series, we mean the sequence of annual migration flows between a pair of origin and destination countries). Therefore we use a batch size corresponding to the number of years present in the series. This implies that gradient descent is applied and the LSTM's parameters are updated after each propagation of a time series through the LSTM cells (as presented in figure 1). Furthermore, the features have been normalized per origin-destination time series using a min-max scaler (Goodfellow et al., 2016). Our LSTM model uses a bias of 1 for the LSTM forget gate, since it has been shown to improve performance drastically (Gers et al., 2000; Jozefowicz et al., 2015).
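The per-series min-max normalization can be sketched as follows (a minimal sketch; the example feature values are made up, and the zero-span guard is our own addition for constant features):

```python
import numpy as np

def minmax_scale_series(series):
    """Min-max scale one origin-destination time series of features to [0, 1].

    `series` has shape (n_years, n_features); each feature is scaled with
    the min and max observed within this series only, since the model is
    fitted time series by time series."""
    lo = series.min(axis=0)
    hi = series.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (series - lo) / span

# Hypothetical 3-year series with two features.
features = np.array([[742.0, 1.0],
                     [800.0, 3.0],
                     [900.0, 5.0]])
print(minmax_scale_series(features))
```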
For the experiments, we have used three different loss functions: the Mean Absolute Error ($MAE$), the Mean Square Error ($MSE$), and the Common Part of Commuters loss (as described in (Robinson and Dilkina, 2018), equivalent to $1 - CPC$, with $CPC$ given by equation (1)), and adapt them to handle time series.
We optimize the following hyperparameters and present them along with their optimal values: loss function ($MAE$ with the Adam optimizer), number and width of hidden layers (1 layer of width 50), number of epochs (50), and dropout (0.15).
5. Results and discussion
| Models | $CPC$ (train) | $CPC$ (test) | $MAE$ (train) | $MAE$ (test) | $RMSE$ (train) | $RMSE$ (test) | $r^2$ (train) | $r^2$ (test) | $MAE_{in}$ (train) | $MAE_{in}$ (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linear Regression | 0.871 | 0.866 | 819 | 877 | 6 100 | 5 239 | 0.800 | 0.773 | 24 128 | 28 737 |
| ANN | 0.931 | 0.834 | 119 | 306 | 818 | 1 553 | 0.975 | 0.921 | 3 257 | 9 664 |
| LSTM | 0.945 | 0.892 | 96 | 225 | 639 | 1 028 | 0.985 | 0.967 | 2 261 | 4 827 |
We carry out experiments comparing the performance of our LSTM approach with two other models:
- The bilateral gravity model estimated through an OLS regression, as presented in (Böhme et al., 2020). Its gravity equation regresses the next-year migration flow $T_{o,d}^{t+1}$ on the features of Table 1 (GDP, population sizes, GTI values, and the origin, destination, and year fixed effects), with $\epsilon_{o,d}^{t}$ representing the robust error term.
- A deep learning based artificial neural network model (ANN model), as proposed in (Robinson and Dilkina, 2018). Our ANN is composed of densely connected layers with rectified linear unit (ReLU) activations. We use the same model for all the predictions with a time step of 1 year. This means that the ANN receives as input the set of features described in Table 1 and outputs the predicted next-year migration flow $T_{o,d}^{t+1}$. We optimize the following hyperparameters and present them along with their optimal values: loss function ($MAE$ with the Adam optimizer), number and width of hidden layers (2 layers of width 200), training batch size (32), number of epochs (170), and dropout (0.1).
Our source code is available in the following git repository: https://github.com/aia-uclouvain/gti-mig-paper. It contains the script to extract the Google Trends Index, the Google Colab notebook to build the different models, as well as the data we used. The code is written in Python and uses the Keras library, which runs on top of TensorFlow.
In order to assess the predictive power of each model, we use a test set consisting of every migration flow taking place in 2015, which represents a bit less than 10% of the whole data.
We can observe that our ML models perform much better than the linear model of Böhme et al. (2020). Indeed, with the same data, the ANN beats the linear model on almost every metric, while the LSTM model outperforms it on all measures. The ANN model fits the training data very well but does not seem to generalize as well as the LSTM model, as shown by their performance on the test set. We can conclude from these first results that the LSTM is the best predictive model among the three.
Since the RMSE values are always much higher than the MAE (between 5 and 7 times larger), we can conclude that the models tend to make a few very large errors. This can be explained by analyzing the data. In the dataset, the mean value of migration flows between two countries during a year is 742, but the median value is only 17, while the maximum is about 190 000. This indicates that our dataset is very sparse: there are many near-zero observations (40% are below 10) and very few extremely large ones (less than 2% reach 10 000). One can notice that the mean absolute errors of the different models are large compared to the mean annual migration flows (742 and 46 119, see table 3 caption), but these values are heavily biased by the sparsity of the data and by the large errors made on the very large migration flows, e.g. towards the USA and Spain. In the case of Spain, notice that there was an important drop in incoming migration flows in 2008 due to the 2007–2008 financial crisis (Domingo, 2017).
In order to better visualize the predictive power of the models, we represent in figure 2 the scatter plots of the three models for the test set only. The graph reflects the sparse nature of the data, as shown by the density of points along the x-axis. As expected from the first results, we can observe that the linear model does not provide very accurate predictions. The ANN model, for its part, shows a strong tendency to underestimate the ground truth values. Ultimately, the LSTM's estimations are the closest to the actual migration flows, which confirms our first assumption.
Finally, figure 3 shows the error on the total number of incoming migrants per destination country per year for each model. We can observe that, whatever the model, the estimation error is close to zero for the majority of countries and years, and that the large errors often appear in the same destination countries. Knowing that, we can see that the heatmaps of the ANN model and of the linear regression in figure 3 highlight their tendency to underestimate the migration flows, especially for the last year (the test year).
To compare these errors with the actual migration flows, we represent in the rightmost heatmap in figure 3 the ground truth values of the total number of incoming migrants per destination country and per year, in descending order. With this figure we can clearly see that the errors we make mostly concern the countries with large incoming migration flows.
6. Conclusion

Böhme et al. (2020) have recently demonstrated that including Google Trends data in the set of standard features could improve migration prediction models. In this work, relying on exactly the same data, we improved the quality of the prediction significantly by replacing the linear model used in (Böhme et al., 2020) by a long short-term memory (LSTM) recurrent neural network (RNN) architecture. Our experiments also demonstrated that the LSTM outperforms a standard ANN on this task.
One drawback of our machine learning approach is that we lose the interpretability of the model and of its predictions, despite the high interpretability potential of Google search keywords. As future work, we would like to apply the latest interpretability techniques (see (Molnar, 2019)) to better identify the most important features for making high-quality migration predictions. This would equip economists and migration experts with new tools to shed light on migration mechanisms.
Acknowledgements. The authors acknowledge financial support from the UCLouvain ARC convention on "New approaches to understanding and modelling global migration trends" (convention 18/23-091).
References

- A Multi-Scale Approach to Data-Driven Mass Migration Analysis. SoGood@ECML-PKDD, pp. 17.
- The Gravity Model. Annual Review of Economics 3 (1), pp. 133–160.
- A Closer Look at Memorization in Deep Networks. arXiv:1706.05394 [cs, stat].
- The internet as a data source for advancement in social sciences. International Journal of Manpower 36 (1), pp. 2–12.
- Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5 (2), pp. 157–166.
- Searching for a better life: Predicting international migration with online search keywords. Journal of Development Economics 142, pp. 102347.
- Predicting the Present with Google Trends. Economic Record 88 (s1), pp. 2–9.
- El Sistema Migratorio Hispano-Americano del Siglo XXI: México y España. Revista de Ciencias y Humanidades - Fundación Ramón Areces.
- Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks 1 (75), pp. 218.
- Ethnologue: Languages of the World. https://www.ethnologue.com/
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. arXiv:1512.05287 [stat].
- GeneralMills/pytrends. General Mills.
- Learning to Forget: Continual Prediction with LSTM. Neural Computation 12 (10), pp. 2451–2471.
- Noisy Time Series Prediction using Recurrent Neural Networks and Grammatical Inference. Machine Learning 44 (1-2), pp. 161–183.
- Detecting influenza epidemics using search engine query data. Nature 457 (7232), pp. 1012–1014.
- Deep learning. MIT Press.
- Google Trends. https://www.google.com/trends
- Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence, Vol. 385, Springer, Berlin, Heidelberg.
- LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems 28 (10), pp. 2222–2232.
- Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
- Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München 91 (1).
- An Empirical Exploration of Recurrent Network Architectures. In International Conference on Machine Learning, pp. 2342–2350.
- Revisiting the migration-development nexus: A gravity model approach. Human Development Research Paper 44.
- A Neural Network Model for Wildfire Scale Prediction Using Meteorological Factors. IEEE Access 7, pp. 176746–176755.
- Gravity versus radiation models: On the importance of scale and heterogeneity in commuting flows. Physical Review E 88 (2), pp. 022812.
- Interpretable machine learning. Lulu.com.
- International migration database. OECD. https://doi.org/10.1787/data-00342-en
- The Gravity Model of Migration: The Successful Comeback of an Ageing Superstar in Regional Science. pp. 27.
- A machine learning approach to modeling human migration. In Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies, pp. 1–8.
- Evolino: Hybrid neuroevolution/optimal linear search for sequence prediction. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI).
- A universal model for mobility and migration patterns. Nature 484 (7392), pp. 96–100.
- Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958.
- Predictive Business Process Monitoring with LSTM Neural Networks. arXiv:1612.02130 [cs, stat] 10253, pp. 477–492.
- International Migrant Stock 2019. United Nations Database. https://www.un.org/en/development/desa/population/index.asp
- World Development Indicators. Washington, D.C.: The World Bank. https://datacatalog.worldbank.org/dataset/world-development-indicators
- Depth-Gated LSTM. arXiv:1508.03790 [cs].
Appendix A. Used Keywords

Tables 4 and 5 contain the set of main keywords: "For GTI data retrieval, both singular and plural as well as male and female forms of these keywords are used where applicable. In the English language, both British and American English spelling is used. All French and Spanish keywords were included with and without accents" (Böhme et al., 2020, Table 1).
| English | French | Spanish |
| --- | --- | --- |
| border control | controle frontiere | control frontera |
| labor | travail | mano de obra |
| required documents | documents requis | documentos requisito |
| unauthorized | non autorisee | no autorizado |
| unskilled | non qualifies | no capacitado |
| welfare | aide sociale | asistencia social |