Human mobility traces are critically important to many disciplines beyond computer networking, ranging from epidemiology to urban planning. Unfortunately, existing traces of human mobility are flawed: collecting data with traditional social science methods has proven difficult, and traces collected with technological methods have suffered from a variety of limitations. These include small size (the largest has 100 nodes), short duration (the longest spans 9 months), and high locality (many of the scenarios are limited to campus and conference environments). Such datasets may not be sufficient for evaluating large mobile systems, and are certainly insufficient for epidemiology, where planet-wide measurements are needed to track the spread of disease.
As members of the networking community, we have the tools and methods (e.g., hardware and software expertise) to conduct large-scale data collection. Furthermore, our contributions would not only benefit the wireless and mobile networking research communities, but would also impact fundamental research in other areas, allowing more features of human behaviour to be uncovered. We believe the situation is analogous to that of complex networks research, which has flourished since the late 1990s, when the first large datasets from the Internet (and subsequently the World Wide Web) became available. To achieve similar progress in mobile networking and related fields, relevant large-scale datasets must be made available.
In this paper we challenge the community to collect large-scale human mobility traces. We highlight some of the issues in the hope that the community can help find good solutions. In the meantime, we propose some solutions intended to form the basis of initial efforts; our main aim, however, is to raise these issues and build community support for meeting this challenge, making it a priority for the networking community.
2 Why are large-scale human mobility traces important?
As mentioned above, large-scale datasets are useful for many aspects of research. In this paper we focus only on two aspects: system design and validation, and epidemiological studies.
2.1 System design and validation
After its first use in the evaluation of Dynamic Source Routing, the random waypoint model became the de facto standard mobility model in the mobile networking community. For example, of the ten papers in ACM MobiHoc 2002 that considered node mobility, nine used the random waypoint model. This trend has changed dramatically in recent years following the introduction of real mobility traces: of the 10 papers considering node mobility in MobiHoc 2008, 7 used real mobility traces for evaluation.
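Part of the appeal of the random waypoint model is how little it takes to implement. The following is a minimal single-node sketch in Python, assuming a rectangular area, uniformly chosen destinations and speeds, and a fixed pause time; the function name and parameters are illustrative and not taken from the DSR evaluation. The minimum speed is required to be strictly positive because, with a minimum speed of zero, the average node speed decays over the course of a simulation, one of the problems pointed out in "Random waypoint considered harmful".

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def random_waypoint(n_legs: int, width: float, height: float,
                    v_min: float, v_max: float, pause: float,
                    seed: int = 0) -> List[Tuple[float, Point]]:
    """Return (time, position) waypoints for one node under the random waypoint model.

    Each leg: pick a uniform destination in the area and a uniform speed in
    [v_min, v_max], travel there in a straight line, then pause.
    """
    if v_min <= 0 or v_max < v_min:
        raise ValueError("need 0 < v_min <= v_max")
    rng = random.Random(seed)
    t = 0.0
    pos = (rng.uniform(0, width), rng.uniform(0, height))
    trace = [(t, pos)]
    for _ in range(n_legs):
        dest = (rng.uniform(0, width), rng.uniform(0, height))
        speed = rng.uniform(v_min, v_max)
        dist = ((dest[0] - pos[0]) ** 2 + (dest[1] - pos[1]) ** 2) ** 0.5
        t += dist / speed              # arrival time at the waypoint
        trace.append((t, dest))
        t += pause                     # leave the waypoint after pausing
        trace.append((t, dest))
        pos = dest
    return trace
```

Positions between waypoints can be obtained by linear interpolation; a trace-driven simulator would instead replay recorded positions, which is exactly what real traces replaced.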
The community has realised that unrealistic models are harmful for scientific research. Although real traces may suffer from limited numbers of participants, coarse granularity, and short experimental duration, they at least reflect some aspects of real life. Thanks to the popularity of Online Social Networks (OSNs), we can now gather large-scale data about the topology and membership of millions of OSN users and use it to study these social networks [22, 20]. But where is the large-scale dataset for evaluating, for instance, inter-city ad hoc communication using mobile computing? Or even a single city-wide mobile communication system (e.g., a delay-tolerant network, or city-wide gaming)? We have very few empirical hints for this. Without the help of real data, we cannot even know whether this kind of system is feasible. Even if we extrapolate large-scale mobility traces from small-scale traces, the problem of validating the extrapolation remains.
Instead of using mobility traces directly to run trace-driven simulations, a possible approach is to extract characteristics from the data and build more realistic mobility models. Much work has been done on modeling human mobility for mobile ad hoc network simulation. Researchers have proposed more realistic models by incorporating obstacles, social information, and clustering features observed in mobility datasets. Analysis of real traces has demonstrated power-law inter-contact time distributions with an exponential cut-off [6, 19], Lévy-flight patterns consisting of many short moves interspersed with occasional long jumps, heterogeneous centralities (i.e., popularity), and clustering structure. But again, these results come from small-scale datasets and are limited to specific scenarios of limited duration. Some researchers have extrapolated from these by assuming, for instance, that the way people move in a city is correlated with the centrality distribution of the city graph, but this has yet to be verified empirically. Gonzalez et al. extracted coarse-grained Lévy-walk properties from large-scale mobile phone usage data. The limitation is that the dataset comes from cellular base stations, which log a user's location only when that user makes a call or sends a text message. This is very coarse in both spatial and temporal granularity. One may argue that human behaviour should be scale-free across different dimensions, but we need data to verify this. Moreover, since the data from this study have not been released, it is impossible for others to verify or build on its findings.
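Inter-contact time statistics such as those in [6, 19] are computed from a contact log. A minimal sketch in Python follows, assuming each contact is recorded as a hypothetical `(node_a, node_b, start, end)` tuple in seconds; an inter-contact time is the gap between the end of one contact and the start of the next contact for the same pair. Plotting the resulting empirical CCDF on log-log axes is the usual way to look for a power-law body followed by an exponential tail.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Contact = Tuple[str, str, float, float]   # (node_a, node_b, start, end), hypothetical format


def inter_contact_times(contacts: List[Contact]) -> List[float]:
    """Gaps between consecutive contacts of the same unordered pair of nodes."""
    by_pair: Dict[FrozenSet[str], List[Tuple[float, float]]] = defaultdict(list)
    for a, b, start, end in contacts:
        by_pair[frozenset((a, b))].append((start, end))
    gaps: List[float] = []
    for intervals in by_pair.values():
        intervals.sort()
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start > prev_end:      # skip overlapping or back-to-back records
                gaps.append(next_start - prev_end)
    return gaps


def ccdf(samples: List[float]) -> List[Tuple[float, float]]:
    """Empirical P[X > x] at each distinct sample value x, in increasing order of x."""
    xs = sorted(samples)
    n = len(xs)
    out: List[Tuple[float, float]] = []
    for i, x in enumerate(xs):
        if i + 1 < n and xs[i + 1] == x:
            continue                       # emit each distinct value once, at its last occurrence
        out.append((x, (n - i - 1) / n))
    return out
```

For example, a pair that meets during [0, 10], [30, 40], and [100, 110] contributes inter-contact times of 20 and 60.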
We need large-scale human mobility datasets with better space and time granularity to verify the properties we mention above. Following analogous progress in related fields, it seems likely that we will uncover many more features from such data which will help us to build good models. We believe that this is crucially important for the mobile computing community.
2.2 Epidemiological studies
Beyond networking itself, the communication networks community has also aided research in many other academic disciplines. For instance, our methodology and data have made the modeling of human dynamics possible, and, more significantly, our data made possible the development of the field of complex network research. Large-scale mobile data can further enable the study of epidemic disease spreading. The current state of the art in epidemic modeling uses the International Air Transport Association (IATA) commercial airline traffic database to determine travel between airports and to provide coarse-grained estimates of global spreading patterns, together with data on transportation and commuting patterns in urban areas, which can be used to model a metapopulation mechanism of spreading. Researchers cannot develop more microscopic models of epidemic spreading because of the lack of large-scale, fine-grained empirical data.
To take a topical example, consider the current swine flu outbreak. Scientists have urged governments to map the spread of swine flu more accurately in order to predict the number of people who may die from it. Current predictions indicate that one in 200 people who get swine flu badly enough to need medical help could go on to die; given that vaccines may not be ready as soon as hoped, accurate predictions are crucial. Any estimates about swine flu are subject to a wide margin of error, not least because not everyone who catches it develops symptoms. The spread of the virus must be mapped more accurately if it is to be managed effectively, and monitoring doctors and hospitals is insufficient, since not everyone infected with swine flu becomes ill enough to report their case to a doctor.
Figure 1 shows how an epidemic spreads through human mobility from one subpopulation (e.g., a city) to another. When a susceptible individual (S) is in contact with an infectious individual (either symptomatic or asymptomatic), it becomes infected at a certain rate and enters the latent class. When the latent period ends, the individual becomes infectious (i.e., able to transmit the infection). After the infectious period, infectious individuals enter the recovered class. If an infectious individual moves to another city, the infection can spread to the subpopulation of the new city. Using the IATA data, scientists can roughly model the migration of population across countries, but this assumes homogeneous mixing within each subpopulation (city); we need data of much finer granularity to go beyond that assumption.
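The compartmental process described for Figure 1 can be sketched as a deterministic SEIR model on a small metapopulation. The Python below is a minimal illustration, not the model used in the cited epidemic studies: it assumes homogeneous mixing inside each city (the very assumption criticised above), a one-day Euler time step, and a hypothetical `rate[i][j]` matrix giving the fraction of city i's population that travels to city j each day. The parameters `beta` (transmission rate), `sigma` (1 / latent period), and `gamma` (1 / infectious period) are illustrative values, not estimates for swine flu.

```python
from typing import Dict, List

Compartments = Dict[str, float]   # keys "S", "E", "I", "R": population counts in one city


def seir_step(city: Compartments, beta: float, sigma: float, gamma: float) -> Compartments:
    """One day of homogeneous-mixing SEIR dynamics within a single city."""
    n = sum(city.values())
    new_exposed = beta * city["S"] * city["I"] / n   # S -> E through contact with I
    new_infectious = sigma * city["E"]               # E -> I when the latent period ends
    new_recovered = gamma * city["I"]                # I -> R after the infectious period
    return {
        "S": city["S"] - new_exposed,
        "E": city["E"] + new_exposed - new_infectious,
        "I": city["I"] + new_infectious - new_recovered,
        "R": city["R"] + new_recovered,
    }


def travel(cities: List[Compartments], rate: List[List[float]]) -> List[Compartments]:
    """Move a fraction rate[i][j] of every compartment from city i to city j."""
    out = [dict.fromkeys(c, 0.0) for c in cities]
    for i, src in enumerate(cities):
        stay = 1.0 - sum(r for j, r in enumerate(rate[i]) if j != i)
        for k, v in src.items():
            out[i][k] += stay * v
            for j, r in enumerate(rate[i]):
                if j != i:
                    out[j][k] += r * v
    return out


def simulate(cities: List[Compartments], rate: List[List[float]],
             beta: float, sigma: float, gamma: float, days: int) -> List[Compartments]:
    """Alternate local SEIR dynamics and travel between cities, one day at a time."""
    for _ in range(days):
        cities = travel([seir_step(c, beta, sigma, gamma) for c in cities], rate)
    return cities
```

With all travel rates set to zero, an uninfected city stays uninfected; any positive rate lets infectious travellers seed it. Replacing the single homogeneous-mixing term per city with empirically measured contact structure is what finer-grained mobility data would make possible.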
Figure 2 shows the number of confirmed swine flu cases worldwide on May 19, 2009. The outbreak started in Mexico and then spread to other countries through human mobility. The figure shows that, besides Mexico, the worst-affected countries were nearby countries such as the USA and Canada. Spain was the worst affected in Europe because of the many connections between Spain and Mexico, but we need better data to build models that can predict such behaviour.
Mobile computing can help to fight epidemics in at least two ways:
Case 1: If we can track human health status in real time or near real time, we can provide advice and precautions for each user, accurately estimate the number of asymptomatic infectious individuals, predict the spreading process, identify the hotspots of the pandemic, and effectively isolate infectious individuals. This may be possible using personalised epidemic-monitoring software. Users can self-report their health status (e.g., cough, cold) and embed this status in a Bluetooth service. Each device periodically runs Bluetooth service discovery and logs the devices discovered, the health status of each encountered user, and, if possible, their geographical locations. Users can upload their log files to a server, which analyses the results and provides effective feedback.
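A minimal sketch of the server-side analysis in Case 1, in Python, is shown below. The `Sighting` record format, the status strings, and the grid-cell size are all hypothetical choices for illustration; the sketch only shows how uploaded discovery logs could be turned into per-user exposure lists and a coarse map of symptomatic-sighting hotspots.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Sighting:
    """One Bluetooth discovery result logged on a phone (hypothetical record format)."""
    time: float                    # seconds since the Unix epoch
    observer: str                  # Bluetooth ID of the device that ran the discovery
    seen: str                      # Bluetooth ID of the device that was discovered
    seen_status: str               # self-reported status advertised by `seen`, e.g. "cough"
    location: Optional[Tuple[float, float]] = None   # (lat, lon) if the phone had a fix


def exposure_report(logs: List[Sighting], symptomatic: Set[str]) -> Dict[str, Set[str]]:
    """Map each observer to the devices advertising a symptomatic status that it met."""
    exposed: Dict[str, Set[str]] = {}
    for s in logs:
        if s.seen_status in symptomatic:
            exposed.setdefault(s.observer, set()).add(s.seen)
    return exposed


def hotspots(logs: List[Sighting], symptomatic: Set[str],
             cell_deg: float = 0.01) -> Dict[Tuple[int, int], int]:
    """Count symptomatic sightings per lat/lon grid cell of `cell_deg` degrees.

    This counts sightings, not distinct people: a device seen many times in
    one place contributes once per sighting.
    """
    counts: Dict[Tuple[int, int], int] = {}
    for s in logs:
        if s.location is None or s.seen_status not in symptomatic:
            continue
        lat, lon = s.location
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] = counts.get(cell, 0) + 1
    return counts
```

Since statuses are self-reported, the exposure report reflects only what users chose to advertise; it cannot by itself detect asymptomatic carriers.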
Case 2: If we do not have the health status of each user but only the contact log and the geographical locations of certain encounters, we can still understand the mixing properties of each subpopulation, model contact and mobility processes, and identify social hotspots. With this understanding, we can predict and emulate the spreading of diseases more accurately.
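As an illustration of the "mixing properties" in Case 2, the Python sketch below summarises a contact log in two ways: how many distinct contact pairs fall within and between subpopulations, and the average number of distinct contacts per person in each subpopulation. It assumes contacts are given as pairs of user IDs and that each user has a known subpopulation label (the hypothetical `home` mapping, which in practice might be inferred from location logs). Measured quantities like these could replace the homogeneous-mixing assumption inside each city.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def mixing_matrix(contacts: List[Tuple[str, str]],
                  home: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count distinct contact pairs within and between subpopulations.

    Keys are (group, group) tuples in sorted order, so ("X", "Y") also
    covers contacts recorded as ("Y", "X").
    """
    pairs = {frozenset(p) for p in contacts if p[0] != p[1]}
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for pair in pairs:
        a, b = tuple(pair)
        g1, g2 = sorted((home[a], home[b]))
        counts[(g1, g2)] += 1
    return dict(counts)


def mean_degree(contacts: List[Tuple[str, str]], home: Dict[str, str]) -> Dict[str, float]:
    """Average number of distinct contacts per person in each subpopulation."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in contacts:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    totals: Dict[str, int] = defaultdict(int)
    sizes: Dict[str, int] = defaultdict(int)
    for person, group in home.items():
        sizes[group] += 1
        totals[group] += len(neighbours[person])
    return {g: totals[g] / sizes[g] for g in sizes}
```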
3 Challenges in collecting data
3.1 High experimental cost
In general it is expensive to conduct large-scale mobility experiments. Costs include equipment, software, human resources, and incentives for people to participate. For example, in the iMote experiments carried out by the Haggle Project, the cost of the iMotes, packaging, batteries, participation incentives, and the human resources spent assembling and distributing devices and monitoring the experiments added up to $12,000 (including development) for a small-scale experiment with just 50 participants. This is clearly not scalable to experiments involving billions of people.
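A naive linear extrapolation of the Haggle figure makes the scaling problem concrete. This is back-of-the-envelope arithmetic only: it assumes the per-participant cost stays constant, which overstates the share of one-off development cost and ignores any economies of scale.

```latex
\frac{\$12{,}000}{50 \text{ participants}} = \$240 \text{ per participant},
\qquad
\$240 \times 10^{9} \text{ participants} = \$2.4 \times 10^{11}.
```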
3.2 Privacy and government regulations
The law in many jurisdictions strictly regulates privacy and thus data collection, making large-scale data collection even more challenging. Before data collection can begin, the consent of participants is required, substantially increasing the administrative burden. Further, telephone operators are restricted in what customer data they can store, for how long, and for what purpose, and the dissemination of such data is even more tightly controlled. This dramatically increases the difficulty of obtaining data from operators, which would otherwise be a good way to reduce collection costs and increase dataset size.
3.3 Lack of motivating applications
We can see from the discussion above that it is not scalable to give out hardware for large-scale experiments. Instead we must rely on useful or interesting software applications to motivate participation by users who already own suitable hardware. For example, many applications have been developed for the iPhone, but no killer application yet exists that enables large-scale data collection. An application able to scale up to millions of users while collecting data would be incredibly valuable to the research community (as well as economically!). Equal value might be obtained through many applications with smaller (but still large) user communities: there is no strict requirement that such a large dataset come from a single community, and indeed it might help avoid bias if the overall billion-sized dataset were composed of numerous smaller (multi-million-sized) components.
3.4 Lack of business models
To motivate a large amount of participation, we may need good business models. Such business models can motivate operators to share their data, and users to participate in experiments. If all parties — the operators, the users and the researchers — can benefit from participating in a system, it is more likely to succeed.
3.5 Lack of organisation
CAIDA (caida.org) exists to aid Internet traffic data collection, but there is no such organisation or group for data collection in mobile or wireless networks. The closest is CRAWDAD (crawdad.org), but that was established only to archive wireless data and, though it has performed this role well, it does not currently coordinate or lead data collection. An organisation for initiating, motivating, and coordinating mobile data collection would be extremely valuable. If such an organisation cannot be founded then, given the distributed and large-scale nature of the problem, crowd-sourcing might be utilised to achieve the same goal.
4 What can we do?
It is impractical to provide experimental devices to billions of participants. Our strategy is to develop novel application software that allows us to utilise crowd-sourcing. It is also impossible for one group alone to collect data from billions of people: we need the combined support of the research and industrial communities to achieve active participation by enough individual users. The key problem is to motivate the community and users to participate by providing mutual benefit.
4.1 New communication and networking applications
Novel and useful communication and networking applications can be one efficient way to motivate participation. For example, the company Sense Networks (www.sensenetworks.com) provides an innovative mobile application for real-time nightlife discovery and social navigation, answering the question, “Where is everybody going right now?” They found that this application attracted around 100,000 users in North America. Unfortunately, as with other companies, the data are not available to the public, but it seems that developing useful applications might be a viable way to collect large-scale datasets for research purposes.
4.2 A common research platform for mobility and social network study
Currently several research groups are involved in human mobility measurement [6, 26, 19, 27, 4], and we expect more researchers to move into the area in the near future. Social network research has also recently become popular, and is often integrated with mobility research. To motivate researchers and create a crowd-sourcing effect, we propose the development of an open platform for social network and mobility experiments. Researchers could create their own online social networks for their research projects by defining the fields of users' profiles according to the needs of their experiments, e.g., name, email address, and Bluetooth ID. Separate projects can have different users, but the platform itself maintains a merged database spanning all projects. When a new project starts, the central server informs all users about the project and invites them to participate. The user interface and data format of each project are similar, so projects can be merged on the platform; the difference is that each project has its own database and manages its own data independently. This would save a great deal of effort and administrative hassle in conducting experiments and in collecting and interpreting data.
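One possible shape for such a platform is sketched below in Python as an in-memory stand-in for real databases. All class, method, and field names are hypothetical. The sketch reads the proposal as a user base shared by all projects (so a new project can invite everyone) combined with per-project profile schemas and per-project data stores, so that each project still manages its own data independently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Project:
    name: str
    profile_fields: List[str]                                  # fields this experiment collects
    profiles: Dict[str, dict] = field(default_factory=dict)    # participant -> profile
    records: List[dict] = field(default_factory=list)          # this project's own data only


@dataclass
class Platform:
    users: Set[str] = field(default_factory=set)               # user base shared by all projects
    projects: Dict[str, Project] = field(default_factory=dict)
    invitations: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, user: str) -> None:
        self.users.add(user)
        self.invitations.setdefault(user, [])

    def create_project(self, name: str, profile_fields: List[str]) -> Project:
        project = Project(name, profile_fields)
        self.projects[name] = project
        for user in self.users:                                # announce to every registered user
            self.invitations[user].append(name)
        return project

    def join(self, user: str, name: str, profile: dict) -> None:
        project = self.projects[name]
        missing = [f for f in project.profile_fields if f not in profile]
        if missing:
            raise ValueError(f"missing profile fields: {missing}")
        # Keep only the fields this project asked for.
        project.profiles[user] = {f: profile[f] for f in project.profile_fields}

    def log(self, name: str, user: str, record: dict) -> None:
        project = self.projects[name]
        if user not in project.profiles:
            raise PermissionError(f"{user} has not joined {name}")
        project.records.append({"user": user, **record})
```

Keeping records inside each `Project` means one project's consent terms and retention policy can differ from another's, while the shared user registry is what makes recruiting for a new experiment cheap.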
4.3 A social proximity application
Social isolation is often a problem in metropolitan cities. Mobile devices can detect other devices in proximity and help people notice the “familiar strangers” around them.
Mobile phones can sense the people we meet every day within radio range and also measure how long each encounter lasts. Here we suggest a platform comprising both software running on the mobile client and a web-based application, allowing users to build up a social network based on the detected proximity information. Mobile users can create a profile page on the web server by registering their Bluetooth ID. The profile page can be similar to a Facebook page, but with additional features that allow users to view statistics about the people they met in any period and suggest related strategies for subsequent encounters. A user can request to add the owner of a particular Bluetooth ID to their friend list, as on Facebook. We believe this opens up a completely new way of socialising.
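The "familiar strangers" statistic could be computed from a phone's own scan log roughly as follows. This Python sketch assumes the client records hypothetical `(unix_time, bluetooth_id)` pairs from periodic scans and treats a device as a familiar stranger if it has been seen on at least `min_days` distinct days and is not already on the user's friend list; the threshold and the use of UTC days are illustrative choices.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Dict, Iterable, List, Set, Tuple


def familiar_strangers(sightings: Iterable[Tuple[float, str]], friends: Set[str],
                       min_days: int = 5) -> List[Tuple[str, int]]:
    """Return (bluetooth_id, days_seen) for non-friends seen on >= min_days distinct days.

    `sightings` holds (unix_time, bluetooth_id) pairs from periodic scans.
    Days are counted in UTC; results are sorted by days seen, most first.
    """
    days_seen: Dict[str, Set[date]] = defaultdict(set)
    for timestamp, device in sightings:
        days_seen[device].add(datetime.fromtimestamp(timestamp, tz=timezone.utc).date())
    candidates = [(device, len(days)) for device, days in days_seen.items()
                  if device not in friends and len(days) >= min_days]
    return sorted(candidates, key=lambda c: (-c[1], c[0]))
```

Counting distinct days rather than raw sightings keeps a single long encounter (many scans in one afternoon) from outranking someone seen briefly on the same commute every day.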
For example, a user could use his mobile phone to detect the Bluetooth ID of someone whom he sees on the subway every day, but to whom he is too shy to talk. This could enable him to initiate contact, while leaving the other party in control of any communication. This application scenario may seem socially unlikely in the Western world, but it is a common pattern in Asia. Note, however, that a single Asian population, however large, is also unrepresentative: many suitable applications, encouraging participation from different continents, countries, and cultures, may be necessary.
4.4 Request data from the operators
We have two ways to request data from the operators: either access to anonymised data, e.g., via collaborative research projects, or full access to data as a commercial partner, e.g., by providing commercial value to the operator through data analysis. An example of the former is access to the Google metropolitan Wi-Fi dataset. This might be possible if the data can help operators improve their services or increase their revenue, e.g., if understanding human mobility can help in Wi-Fi hotspot deployment and placement. For the latter approach, good examples are applications such as Qiro (qiro.net) or Sense Networks, both of which collaborate with operators to access location information in order to provide additional services to users. Qiro uses information from T-Mobile, E-Plus, Vodafone, and O2 to help users locate nearby friends and facilities such as bicycle rental.
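What "anonymised data" means in practice is up to the operator; one common first step is pseudonymisation plus coarsening, sketched below in Python. The record layout and parameter values are assumptions for illustration. Subscriber IDs are replaced by a keyed hash (HMAC-SHA256) whose key stays with the operator, and timestamps and coordinates are snapped to coarse bins. Pseudonymisation alone is generally not sufficient for mobility data, because a few spatio-temporal points can often be enough to re-identify a person, so the regulatory constraints discussed earlier still apply.

```python
import hashlib
import hmac
from typing import List, Tuple

Record = Tuple[str, float, float, float]   # (subscriber_id, unix_time, lat, lon), hypothetical layout


def pseudonymise(records: List[Record], secret_key: bytes,
                 time_bin_s: int = 3600,
                 grid_deg: float = 0.01) -> List[Tuple[str, int, float, float]]:
    """Replace subscriber IDs with keyed-hash tokens and coarsen time and space.

    The operator keeps `secret_key`; without it, a recipient cannot recompute
    a token from a known subscriber ID. The same subscriber always maps to the
    same token, so individual trajectories remain linkable for analysis.
    """
    out = []
    for sub_id, t, lat, lon in records:
        token = hmac.new(secret_key, sub_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
        coarse_t = int(t // time_bin_s) * time_bin_s        # start of the time bin
        coarse_lat = round(lat / grid_deg) * grid_deg       # snap to the nearest grid line
        coarse_lon = round(lon / grid_deg) * grid_deg
        out.append((token, coarse_t, coarse_lat, coarse_lon))
    return out
```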
4.5 Collaboration with local government and media
Local governments are powerful allies for data collection and can help turn applications into reality. Some governments seek to develop infrastructure and facilities to improve people's lives in metropolitan areas; by collaborating with them, we can quickly access resources and deploy facilities. The local media can also be a good route to gathering mobility information, as they are often interested in new technologies for use in future campaigns. For example, to market the movie A.I. Artificial Intelligence, an alternate reality game based on the movie, known as The Beast, was created. The game was conceived as an elaborate murder mystery played out across hundreds of websites, email messages, faxes, fake advertisements, and voicemail messages, and involved over three million active participants. Collaborating in such activities could yield datasets covering millions of people. In the swine flu case, the UK government could also be a good collaborator for data collection.
4.6 New sources of data
The popularity of Web 2.0 and user-generated content means that more human mobility datasets may well be available on the Internet, if we know where to look. For example, Piorkowski was able to extract 125,000 short-term mobility traces from a publicly available web-based repository of GPS tracks, the Nokia Sports Tracker service, which covers many urban areas. Another example is photo-sharing sites such as Flickr, which contain billions of publicly accessible images taken virtually everywhere on earth and annotated with various forms of information, including geolocation, time, photographer, and a wide variety of textual tags. Researchers have analysed a global collection of nearly 35 million geo-referenced photographs from Flickr.
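Turning geotagged photos into mobility traces is conceptually simple: group photos by photographer, order them by time, and measure the distance between consecutive photo locations. The Python sketch below assumes photos have already been exported as hypothetical `(owner, unix_time, lat, lon)` tuples (it does not use any Flickr API) and uses the haversine great-circle distance on a spherical Earth of radius 6371 km. Like call records, such traces are sparse samples of where a person was, so the jump lengths are an upper-level view of movement rather than a continuous trajectory.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Photo = Tuple[str, float, float, float]   # (owner, unix_time, lat, lon), hypothetical export format
TracePoint = Tuple[float, float, float]   # (unix_time, lat, lon)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def photo_traces(photos: List[Photo]) -> Dict[str, List[TracePoint]]:
    """Group photos by owner and sort each owner's photos by time."""
    traces: Dict[str, List[TracePoint]] = defaultdict(list)
    for owner, t, lat, lon in photos:
        traces[owner].append((t, lat, lon))
    for points in traces.values():
        points.sort()
    return dict(traces)


def jump_lengths(trace: List[TracePoint]) -> List[float]:
    """Distances (km) between consecutive photo locations in one trace."""
    return [haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(trace, trace[1:])]
```

The distribution of jump lengths pooled across many users is the kind of quantity one would compare against the Lévy-flight patterns reported for smaller traces.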
We believe that, to achieve the goal of planet-scale mobility measurement, we need to be more creative in collecting and merging information from different sources and sensing methods, and in collaborating with different organisations.
In this paper we challenge the networking community to collect planet-scale human mobility traces. We have explained why large-scale mobility datasets are important for networking research and how they could impact fundamental research in many other academic disciplines. We identified the challenges and difficulties, and proposed potential methods to achieve this goal.
We in no way claim that we have the ideal strategies for collecting and managing such datasets: we would go so far as to say that this is an impossible mission for a single research group. Our intent with this paper is to draw the attention of the community to this problem, enabling the collective intelligence of the whole community to be brought to bear on these crucial problems.
With such datasets, we believe that we will completely change our understanding of human dynamics, potentially opening many new fields of academic study, just as the availability of Internet and Web data allowed the study of complex networks and systems to flourish, in turn impacting the understanding of biological structures such as DNA and proteins. We urge the community to address these challenges to make this possible; in doing so, perhaps we can help to save the world from epidemics such as SARS and swine flu.
[1] M. Afanasyev, T. Chen, G. M. Voelker, and A. C. Snoeren. Analysis of a mixed-use urban wifi network: when metropolitan becomes neapolitan. In Proc. of IMC ’08, pages 85–98, Oct. 2008.
[2] R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47–97, Jan. 2002.
[3] R. Albert, H. Jeong, and A.-L. Barabasi. The diameter of the world wide web. Nature, 401(6749):130–131, Sept. 1999.
[4] G. Bigwood, D. Rehunathan, M. Bateman, T. Henderson, and S. Bhatti. Exploiting self-reported social networks for routing in ubiquitous computing environments. In Proc. of SAUCE 2008, pages 484–489, Oct. 2008.
[5] T. Camp, J. Boleng, and V. Davies. A survey of mobility models for ad hoc network research. Wireless Communications and Mobile Computing, 2(5):483–502, Aug. 2002.
[6] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott. Impact of human mobility on opportunistic forwarding algorithms. IEEE Transactions on Mobile Computing, 6(6):606–620, June 2007.
[7] V. Colizza, A. Barrat, M. Barthelemy, and A. Vespignani. Predictability and epidemic pathways in global outbreaks of infectious diseases: the SARS case study. BMC Medicine, 5:34, 2007.
[8] V. Colizza and A. Vespignani. Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: theory and simulations. Journal of Theoretical Biology, 251:450, 2008.
[9] D. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Mapping the world’s photos. In Proc. of WWW, 2009.
[10] N. Eagle and A. Pentland. Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 10(4):255–268, May 2006.
[11] L. C. Freeman. A set of measures of centrality based on betweenness. Sociometry, 40(1):35–41, Mar. 1977.
[12] T. Garske, J. Legrand, C. A. Donnelly, H. Ward, S. Cauchemez, C. Fraser, N. M. Ferguson, and A. C. Ghani. Assessing the severity of the novel influenza A/H1N1 pandemic. BMJ, 339(b2840), July 2009.
[13] M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779–782, June 2008.
[14] T. Henderson and F. ben Abdesslem. Scaling measurement experiments to planet-scale: Ethical, regulatory and cultural considerations. In Proc. of ACM HotPlanet ’09, June 2009.
[15] P. Hui, J. Crowcroft, and E. Yoneki. Bubble rap: Social-based forwarding in delay tolerant networks. In Proc. of MobiHoc ’08, May 2008.
[16] P. Hui, R. Mortier, K. Xu, J. Crowcroft, and V. O. Li. Sharing airtime with Shair avoids wasting time and money. In Proc. of HotMobile, Feb. 2009.
[17] A. Jardosh, E. M. Belding-Royer, K. C. Almeroth, and S. Suri. Towards realistic mobility models for mobile ad hoc networks. In Proc. of MobiCom 2003, pages 217–229, Sept. 2003.
[18] D. B. Johnson and D. A. Maltz. Dynamic source routing in ad hoc wireless networks. In Mobile Computing, pages 153–181. Kluwer Academic Publishers, 1996.
[19] T. Karagiannis, J.-Y. Le Boudec, and M. Vojnović. Power law and exponential decay of inter contact times between mobile devices. In Proc. of MobiCom 2007, pages 183–194, 2007.
[20] K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, and N. Christakis. Tastes, ties, and time: A new social network dataset using facebook.com. Social Networks, 30(4):330–342, Oct. 2008.
[21] Y. Liu, A. Rahmati, Y. Huang, H. Jang, L. Zhong, and Y. Zhang. xShare: enabling impromptu sharing of mobile phones. In Proc. of MobiSys 2009, June 2009.
[22] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proc. of IMC ’07, Oct. 2007.
[23] M. Musolesi and C. Mascolo. Designing mobility models based on social network theory. Mobile Computing and Communications Review, 11(3):59–70, July 2007.
[24] M. Piorkowski. Sampling urban mobility through on-line repositories of GPS tracks. In Proc. of ACM HotPlanet ’09, June 2009.
[25] M. Piorkowski, N. Sarafijanovic-Djukic, and M. Grossglauser. A Parsimonious Model of Mobile Partitioned Networks with Clustering. In Proc. of COMSNETS, Jan. 2009.
[26] I. Rhee, M. Shin, S. Hong, K. Lee, and S. Chong. On the levy-walk nature of human mobility. In Proc. of INFOCOM, Phoenix, USA, Apr. 2008.
[27] V. Srinivasan, M. Motani, and W. T. Ooi. Analysis and implications of student contact patterns derived from campus schedules. In Proc. of MobiCom 2006, pages 86–97, 2006.
[28] E. Strano, A. Cardillo, V. Iacoviello, V. Latora, R. Messora, S. Porta, and S. Scellato. Street centrality vs. commerce and service locations in cities: a Kernel Density Correlation case study in Bologna, Italy, Jan. 2007. arXiv:physics/0701111v1.
[29] A. Vazquez, J. G. Oliveira, Z. Dezso, K. I. Goh, I. Kondor, and A. L. Barabasi. Modeling bursts and heavy tails in human dynamics. Physical Review E, 73:036127, 2006.
[30] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442, June 1998.
[31] J. Yoon, M. Liu, and B. Noble. Random waypoint considered harmful. In Proc. of INFOCOM, pages 1312–1321, Apr. 2003.