
A Study of Tourist Sequential Activity Patterns through Location Based Social Network (LBSN)

by Anmoila Talpur, et al.
Victoria University

Sequential Pattern Mining is an important technique for establishing patterns and mining trends in activities. Insights into tourist movement and activity patterns benefit the tourism sector in many ways, such as designing better travel packages, maximizing tourist participation in activities and meeting tourist demand. This research proposes using mobile social media data to capture tourist activity information in Singapore effectively, and applies advanced data mining techniques to extract valuable insights into tourist behavior. The proposed methods and findings have the potential to support tourism managers and policy makers in making better decisions about tourism destination management.



I Introduction

With rapid developments in information and communication technology (ICT) and the advance of globalization, the inception of the Internet and the World Wide Web has largely removed the physical barriers of space and time. This has allowed business organizations to exploit the advantages offered by the new technology, connecting with their consumers and offering them a wide range of products online. Further developments in ICT tools have also led to social media networks, blogs, RSS feeds, micro-blogs and wikis, which allow business organizations to engage directly with their consumers. Through these communication channels, many brands have connected with their consumers directly to improve their productivity, performance and profitability. The tourism industry has adopted the same approach. In recent times, the global tourism and hospitality sector has become highly competitive, and tourism firms have therefore adopted ICT tools to engage with their consumers, design products according to their needs and requirements, and provide high-quality products at affordable prices. Developments in ICT have also produced Location Based Social Networks (LBSNs), which are now commonly used by travelers. An LBSN is a social network that allows individuals to connect and share their location along with content such as photographs, text messages and videos. An LBSN can track check-ins in real time, based on the location of the user. A fundamental benefit of using LBSNs to research tourist behavior is the accuracy and quality of the data, which include texts, photos, videos and physical location coordinates. Accordingly, this study uses LBSN data and data mining techniques to investigate tourist activity patterns.
It also proposes a new method for analyzing LBSN data using sequential activity analysis and examines its practical relevance for understanding tourist behavior, using Singapore as the case study. Finally, it analyzes tourists' sequential activity patterns, which industry professionals and policy makers can use to predict tourist behavior in terms of destination selection and to support strategic planning, tourism marketing strategy, and the development of new products and services.

Tourism is one of the leading contributors to economic growth and development, not only in Singapore but across the world [1]. Singapore is known for its rich array of attractions, coupled with a multicultural population and a tropical climate. As a result, tourism has been the most active sector, receiving millions of tourists annually. According to statistics from the Singapore Tourism Board, the tourism sector has seen steady growth in international arrivals since September 2016 [1]. Visitation increased by 4% in 2017, bringing the total number of tourists who visited Singapore that year to 8.5 million [2].

Different factors influence tourists' behavior and their choice of destinations [1]. In Southeast Asia, and especially in Singapore, culture, including dress, language, art, music, religion and food, is considered the most significant factor affecting tourism activities. The more cultural practices a destination offers, the shorter travel periods tend to be, which translates into more destination visits and more purchases of package tours by inbound tourists. According to Reference [1], the greater the cultural distance between a tourist's home country and a destination, the fewer tourists are likely to visit that destination. In terms of convenience, variety and safety, Singapore ranked as the most preferred destination among tourists, which has been attributed to government initiatives to enact laws and regulations that ensure the security of visitors [3, 4, 5].

To study tourist activity patterns, we need to determine how close or similar one tourist's activity pattern is to that of another. An effective measure of the distance or similarity between tourist activities should take into account many characteristics of those activities apart from their spatial and temporal dimensions. Research has shown that tourists conduct different activities at certain times; therefore, when comparing individual activity patterns, differences in the attributes of these activities (e.g., type and purpose) should also be captured [6]. The interdependency among these dimensions needs to be preserved in the distance or similarity measure (e.g., certain activities can take place only at certain places and/or at certain times). When comparing activity patterns, the distance measure should also be able to capture structural differences in tourist activities and their contextual variables (e.g., certain activities have to be performed before other specific activities) [6]. Tourist activity patterns therefore unfold over time. Previous research shows that tourist behavior can be volatile and easily affected by a series of intertwined factors [7, 8, 9, 10]. We therefore attempt to find regularities in the extracted patterns, thereby contributing to knowledge about the particular order of activities tourists prefer, which may also depend on extrinsic as well as intrinsic factors. For example, a tourist visiting a place might first take part in recreational activities such as sightseeing, then go for dining, and afterwards shop around the area. Studying such series of activities on a large scale can provide insight into tourist trends, choices and preferences, and can assist in understanding tourist behavior.
The knowledge gained can be applied to the strategic development of tour management and can lead to more sophisticated tour packages and travel itineraries.
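As an illustration of what a distance between two activity patterns could look like, the sketch below computes a plain edit (Levenshtein) distance between two tourists' activity sequences, where inserting, deleting or substituting one activity each costs 1. This is an assumption for illustration only, not the measure used in this study: it captures the order and type of activities but ignores the temporal, spatial and contextual attributes that a full measure should include. The function name `activity_edit_distance` is hypothetical.

```python
from typing import Sequence


def activity_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two activity sequences with unit costs."""
    # prev[j] = distance between the activities of `a` processed so far and the first j of `b`
    prev = list(range(len(b) + 1))
    for i, act_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, act_b in enumerate(b, start=1):
            cost = 0 if act_a == act_b else 1
            curr[j] = min(prev[j] + 1,         # delete act_a
                          curr[j - 1] + 1,     # insert act_b
                          prev[j - 1] + cost)  # substitute, or match when cost == 0
        prev = curr
    return prev[-1]


# Hypothetical sequences based on the sightseeing > dining > shopping example
tourist_1 = ["Sightseeing", "Dining", "Shopping"]
tourist_2 = ["Sightseeing", "Shopping"]
print(activity_edit_distance(tourist_1, tourist_2))  # 1
```

A multidimensional measure would replace the 0/1 substitution cost with one that also compares activity attributes such as time, place and purpose.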

In tourism research, Sequential Pattern Mining (SPM) has been applied extensively to understand tourist behavior at destinations across the world. Various methods have been used in the past to understand tourist behavior [11]. The most commonly used is the vertical-format method, which uses the Apriori algorithm to mine tourist sequential activity patterns. This study could reveal many interesting patterns related to tourist activities; however, given the limited scope of this research, only the sequential activity patterns of tourists at diverse locations are extracted and interpreted. The study further seeks to establish the most influential patterns of tourist activities at different venues in Singapore and their preferred timings. The analysis is conducted with the Pattern-Growth SPM method on data collected from Foursquare check-ins through the Twitter Streaming API between October 2016 and October 2017.
Research Questions:

  1. What are the most common tourist sequential activity patterns in Singapore?

  2. What are the effective methods of sequential activity analysis using the LBSN data in Singapore?

  3. What are the most interesting insights resulting from the tourist sequential activity pattern analysis in Singapore?

II Literature Review

Over the past three decades, tourist mobility patterns have been spatially uneven and inconsistent across the world, especially in the Asia-Pacific region [2]. In the last decade, however, the Pacific Asia Travel Association noted steady and rapid growth in both inbound and outbound travel [12]. Assessments of tourist spatial patterns and flows in the early 1980s established that these flows resulted from political and economic prosperity in the Asia-Pacific region. Owing to the rapid growth of the tourism sector, researchers have since become interested in the nature, patterns and intensity of tourism flows. Two approaches explain the state of tourism patterns and activities in a given region. The political-economic approach is often unidirectional and relies on economic conditions to explain the patterns [13]. The supply-demand interaction approach, on the other hand, explains tourism movement and consumption based on individual and collective preferences within a given tourist generation. For example, places with more tourism resources are poised to attract more tourists, increasing the frequency of the corresponding patterns. Time and cost constraints create intervening destinations, giving tourists opportunities to compare and select suitable destinations. Moreover, the flows and movement of tourists in Asia-Pacific countries depend on other factors such as marketing effectiveness, promotional offers, destination attributes and demographic characteristics [13].

Several countries have adopted various tools and methods to conduct initial surveys and research to understand their tourists better by establishing tourist mobility patterns and flows. Commonly used tools include the Country Potential Generation Index (CPGI) and the Gross Travel Propensity (GTP). The latter is used to evaluate the capability of a region or country to generate trips relative to its population; in essence, it provides an estimate of the number of trips generated in a given region or country. The Asia-Pacific region has gained worldwide attention in tourism owing to its economic and demographic development, including fewer inbound travel restrictions [12]. Given the great potential for tourism growth and development, Asia-Pacific countries need to better understand tourist activity patterns and flows in order to manage them. Statistical reports of the Pacific Asia Travel Association (PATA) show different travel flows among member countries, suggesting that tourist activity patterns are unique to each region. In 1995, the United States was the leading tourist destination, followed by Canada, Hong Kong, Singapore, China and Australia. However, the travel flows changed between 1995 and 2004, with China and Hong Kong topping the list of leading destinations [13].
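The two indices can be written as simple ratios. The sketch below assumes the commonly cited textbook definitions: GTP = (trips taken by a population / size of that population) × 100, and CPGI = (country's share of world trips) / (country's share of world population), where a CPGI above 1.0 means a country generates more trips than its population share would suggest. The function names and the figures in the example are hypothetical.

```python
def gross_travel_propensity(trips: float, population: float) -> float:
    """GTP: trips taken by a population per 100 inhabitants."""
    return trips * 100 / population


def country_potential_generation_index(country_trips: float, world_trips: float,
                                        country_pop: float, world_pop: float) -> float:
    """CPGI: (country_trips / world_trips) / (country_pop / world_pop), rearranged
    as a single division to limit rounding error."""
    return (country_trips * world_pop) / (world_trips * country_pop)


# Hypothetical figures, for illustration only
print(gross_travel_propensity(2_000_000, 5_000_000))  # 40.0
print(country_potential_generation_index(2_000_000, 100_000_000,
                                         5_000_000, 500_000_000))  # 2.0
```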

There is a substantial need to understand tourists' movement behavior with respect to their destination choices, since it has a deep impact on the tourism development process and marketing strategy. Many previous studies have investigated tourist behavior with respect to destinations, emphasizing mainly the spatial and temporal patterns of tourists [14, 7, 15]. The most popular tourist trends and choices have also been evaluated [16, 17, 18]. The literature suggests that tourist consumer behavior is a topic of discussion among researchers, academics and industry professionals because of its substantial importance. According to Reference [1], consumer behavior is a widely researched area in tourism and marketing and is therefore frequently associated with tourist behavior or travel behavior. Understanding individual tourist behavior is necessary to identify which factors influence purchase intention and destination choice, since these factors can be used to influence tourism demand. Focusing on the tourist decision process offers a comprehensive, accurate and detailed analysis of tourist demand. Many researchers and industry professionals have therefore focused on understanding tourist behavior to improve marketing strategy, product development and the overall quality of services offered. Travel agencies need to understand tourists' destination preferences, since this allows them to develop new products and services supported by appropriate marketing strategies. Understanding tourists' travel patterns also helps identify their behavior and purchase intent, which in turn helps agencies identify their competitors and develop products and services for different consumer segments according to their needs and requirements.

In the past, several design methods have been used to capture and analyze data on the travel patterns and behaviors of tourists [17, 19, 20]. Social media platforms such as Flickr have provided rich data sources on tourists' historical activity and individual preferences. With millions of tourists visiting places across the globe, information from such platforms helps travelers plan trips, especially to unfamiliar cities. Geo-tagged photos hosted on Flickr have been used to identify tourists' trajectories, exploring topological spaces and adopting the concept of motifs to uncover tourist mobility patterns. Modern tourists travel to different cities or places for their holidays and therefore need adequate information about tourist trajectories and past trends in order to make sound decisions. Flickr and Twitter are contemporary social media platforms known for providing convenient tourist and travel recommendations to users [21]. However, privacy and scalability issues have reduced the effectiveness of Flickr [22, 23, 24]. Travel recommendations can be either generic or personalized [17]. Personalized recommendations match locations to individual preferences, whereas generic recommendations follow a fixed order: trajectory identification > interesting locations > travel sequences > planning > activity recommendation.

Tourist trajectories consist of sequences of landmarks with semantic, temporal and spatial information [25]. Using trajectory methods to understand travel activity patterns requires separating domestic tourists from international tourists to better understand the flows. Flickr accumulates a large collection of photos and stores their metadata, namely size, time and location; the location is crucial for analyzing tourist activity patterns in a given region or place [26]. These methods collect the metadata, analyze it and provide recommendations to users accordingly. Geo-tagged photos partially capture travel information, which can be used to construct travel trajectories and eventually uncover the tourist activity patterns of a given location. A similarity matrix computed from the constructed trajectories then helps group tourists accordingly. An analysis of tourists' travel semantic motifs established that more tourists included natural parks in their sequences than state buildings, e.g., Central Park > Brooklyn Bridge > Rockefeller Center. Applying the trajectory framework to Flickr metadata yields sequences that are unique to the tourist activity of each location [17]. Tourists with similar travel preferences are typically grouped or clustered together, which makes it possible to develop behavioral patterns and recommendations.
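The trajectory and similarity-matrix steps can be illustrated with a small sketch. It assumes each photo's metadata has already been reduced to a (user, timestamp, landmark) record, orders each user's photos by time to form a trajectory of landmarks (collapsing consecutive photos at the same landmark), and fills a pairwise similarity matrix with the Jaccard index over visited landmarks. The record layout, the Jaccard measure and all function names are assumptions for illustration; an order-aware measure would be needed to capture motifs such as Central Park > Brooklyn Bridge > Rockefeller Center.

```python
from itertools import groupby


def build_trajectories(photos):
    """photos: iterable of (user, timestamp, landmark) -> {user: [landmarks in time order]}"""
    trajectories = {}
    # sorting the tuples orders records by user, then by timestamp
    for user, records in groupby(sorted(photos), key=lambda p: p[0]):
        landmarks = [landmark for _, _, landmark in records]
        # collapse consecutive photos taken at the same landmark into one visit
        trajectories[user] = [lm for lm, _ in groupby(landmarks)]
    return trajectories


def jaccard(a, b):
    """Share of landmarks two trajectories have in common (order is ignored)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def similarity_matrix(trajectories):
    users = sorted(trajectories)
    return users, [[jaccard(trajectories[u], trajectories[v]) for v in users] for u in users]


# Hypothetical photo records
photos = [
    ("u1", 1, "Central Park"), ("u1", 2, "Central Park"), ("u1", 3, "Brooklyn Bridge"),
    ("u2", 1, "Central Park"), ("u2", 5, "Rockefeller Center"),
]
trajectories = build_trajectories(photos)
users, matrix = similarity_matrix(trajectories)
print([[round(x, 2) for x in row] for row in matrix])  # [[1.0, 0.33], [0.33, 1.0]]
```

The resulting matrix can then be passed to any clustering method that accepts precomputed similarities to group tourists.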

Proper understanding of tourists' activities is important for tourism management to provide the best service, meet tourist expectations and gain repeat visitation. There have been some past attempts to study tourist flows and predict tourists' future destinations. One of the early contributions in this respect was Reference [16], which focused on the decision-making process of tourists and modeled their next destination using a nested logit model. This model assumed utility maximization as the basis of tourist demand and considered only one subsequent destination, whereas tourists can visit multiple subsequent destinations. Another study, Reference [14], modeled tourists' next destination using survey data on tourists' intra-attraction spatial-temporal behavior and demographic characteristics, collected with handheld GPS tracking devices and activity diary questionnaires. One of the most recent studies in this category is Reference [27], which explored the travel behavior of tourists in Hong Kong using geo-tagged photos uploaded to Flickr. The tourists' movement trajectories were highlighted and patterns were drawn to indicate the most popular tourist destinations, and location preferences were compared between two main groups, Asian and Western tourists. Similar studies explored visitors' activities in Hong Kong parks [28] and temples [29]. Reference [30] used the Twitter Streaming API to collect geo-tagged social media data and characterize the spatial, temporal and demographic features of tourist flows in Cilento, Southern Italy. That study focused on three main areas: tourist profiles, tourist travel patterns in the region, and tourist attractions in the region and their popularity. Geo-tagged photos on Flickr have been used in many other studies [15]. However, Flickr provides only the geo-coordinates of photos, without any information about the actual place, its type or category, or the user comments associated with it. What tourists actually did at a particular location, which activities they were involved in, and how much time they spent on each activity cannot be assessed from Flickr data. Past research has been beneficial in understanding tourist behavior; however, it offers little information about tourist activity at a particular destination, which is very important for tourism management and can assist it in many ways.
This research aims to fill this gap in the tourism literature by studying tourist activities in sequential order, thereby providing rich information about tourist choices, preferences and decisions while visiting a particular destination.

III Methodology

The Singapore dataset meets the requirements of a sequence database: each tourist is involved in a series of activities, each with an associated time field. The Prefix-Span algorithm has been shown to outperform Apriori-based algorithms as well as other emerging SPM algorithms [11]. Hence, this study uses the Pattern-Growth method, which is based on the Prefix-Span algorithm.

In terms of execution time, however, the SPAM algorithm performed better than Prefix-Span in a performance evaluation on the 'BMS Webview1' dataset, which contains approximately 30,000 sequences with an average of 2.3 itemsets per sequence. The results of the evaluation are shown in Fig. 1.

Fig. 1: Evaluation of SPAM and Prefix-Span algorithm using the BMS dataset.

Foursquare check-ins formed the main dataset. They were collected via Twitter streaming, using the Twitter developer API to extract all tweets of tourists visiting various destinations in Singapore. A custom program then filtered the raw dataset to retain only tweets containing Foursquare check-ins by tourists. The data were collected over a period of seven to nine months and included the venue category, which helps identify which activities tourists engaged in at a particular time. Each check-in had the following attributes: check-in ID, user ID, time, geographic coordinates (latitude and longitude), and the category and subcategory of the check-in's location, i.e., the type of place where it occurred. The final sample covered a period of 4-6 months and comprised 10,000 selected check-ins generated by 1,057 tourists visiting Singapore; on average, each tourist generated 8-10 check-ins on Foursquare, and these were included in this study.
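A minimal sketch of how check-ins with the attributes listed above could be turned into a sequence database: records are grouped by user ID, and each user's venue categories are ordered by check-in time. The `CheckIn` record layout, its field names and the function `to_sequence_database` are hypothetical; the study's own filtering program is not described in enough detail to reproduce.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class CheckIn(NamedTuple):
    # hypothetical record layout mirroring the attributes described in the text
    checkin_id: str
    user_id: str
    time: datetime
    lat: float
    lon: float
    category: str
    subcategory: str


def to_sequence_database(checkins):
    """Group check-ins by user and order each user's venue categories by time."""
    by_user = defaultdict(list)
    for c in checkins:
        by_user[c.user_id].append(c)
    return {
        user: [c.category for c in sorted(items, key=lambda c: c.time)]
        for user, items in by_user.items()
    }


# Hypothetical check-ins (coordinates and categories are illustrative)
checkins = [
    CheckIn("c2", "u1", datetime(2017, 3, 1, 12, 0), 1.28, 103.86, "Dining", "Asian Restaurant"),
    CheckIn("c1", "u1", datetime(2017, 3, 1, 9, 0), 1.28, 103.85, "Nature", "Park"),
    CheckIn("c3", "u2", datetime(2017, 3, 2, 15, 0), 1.30, 103.83, "Shopping", "Mall"),
]
print(to_sequence_database(checkins))  # {'u1': ['Nature', 'Dining'], 'u2': ['Shopping']}
```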

III-A Prefix-Span Algorithm Development

If the items within each element are listed in alphabetical order, the representation of a sequence is unique. Next, we examine whether the order of item projection can be fixed when generating a projected database. Intuitively, if one follows the order of the prefix of a sequence and projects only its suffix, one can examine all possible subsequences and their associated projected databases in an orderly manner. We therefore first introduce the concepts of prefix and suffix.
Definition 1 (Prefix): Suppose all the items within an element are listed alphabetically. Given a sequence $\alpha = \langle e_1 e_2 \cdots e_n \rangle$ (where each $e_i$ corresponds to a frequent element in $S$), a sequence $\beta = \langle e'_1 e'_2 \cdots e'_m \rangle$ ($m \le n$) is called a prefix of $\alpha$ if and only if 1) $e'_i = e_i$ for $i \le m-1$; 2) $e'_m \subseteq e_m$; and 3) all the frequent items in $(e_m - e'_m)$ are alphabetically after those in $e'_m$.
Definition 2 (Suffix): Given a sequence $\alpha = \langle e_1 e_2 \cdots e_n \rangle$ (where each $e_i$ corresponds to a frequent element in $S$), let $\beta = \langle e_1 e_2 \cdots e_{m-1} e'_m \rangle$ ($m \le n$) be a prefix of $\alpha$. The sequence $\gamma = \langle e''_m e_{m+1} \cdots e_n \rangle$ is called the suffix of $\alpha$ with regard to prefix $\beta$, denoted $\gamma = \alpha / \beta$, where $e''_m = (e_m - e'_m)$. We also write $\alpha = \beta \cdot \gamma$. Note that if $\beta$ is not a subsequence of $\alpha$, the suffix of $\alpha$ with regard to $\beta$ is empty.
Definition 3 (Projected database): Let $\alpha$ be a sequential pattern in a sequence database $S$. The $\alpha$-projected database, denoted $S|_\alpha$, is the collection of suffixes of sequences in $S$ with regard to prefix $\alpha$. To collect counts in projected databases, we use the following definition.
Definition 4 (Support count in projected database): Let $\alpha$ be a sequential pattern in sequence database $S$, and $\beta$ be a sequence with prefix $\alpha$. The support count of $\beta$ in the $\alpha$-projected database $S|_\alpha$, denoted $\mathrm{support}_{S|_\alpha}(\beta)$, is the number of sequences $\gamma$ in $S|_\alpha$ such that $\beta \sqsubseteq \alpha \cdot \gamma$.
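To make Definitions 2 to 4 concrete, the following sketch projects the sequences of Table II on a length-1 prefix and counts support in the resulting projected database. It is restricted to prefixes consisting of a single item, and it only counts items appended as a new element, not items assembled into the partial element; the function names `project`, `projected_database` and `support_of_append` are hypothetical.

```python
# Each sequence is a list of elements; each element is a tuple of items in sorted order.
# A projected suffix is stored as (partial_element, remaining_elements), where
# partial_element holds the items left in the element that contained the prefix item
# (written "(_x)" in Table I).

def project(sequence, item):
    """Suffix of `sequence` w.r.t. the length-1 prefix <(item)>, or None if `item` is absent."""
    for i, element in enumerate(sequence):
        if item in element:
            # keep only items alphabetically after `item` (Definition 1, condition 3)
            partial = tuple(x for x in element if x > item)
            return partial, sequence[i + 1:]
    return None


def projected_database(db, item):
    """The <(item)>-projected database: suffixes of all sequences that contain `item`."""
    return [suffix for suffix in (project(seq, item) for seq in db) if suffix is not None]


def support_of_append(projected_db, item):
    """Support count of <(prefix)(item)>: suffixes in which `item` occurs in a complete element."""
    return sum(any(item in element for element in rest) for _, rest in projected_db)


# Table II, with each itemset written as a sorted tuple
table2 = [
    [(1,), (2,), (1, 2), (3,), (1, 3), (4, 5), (6,)],
    [(3, 4), (3,), (2, 3), (1, 4)],
    [(4, 5), (2,), (2, 3, 4), (3,), (1,)],
    [(4,), (5,), (1, 6), (3,), (2,), (7,), (1,)],
]
proj_1 = projected_database(table2, 1)
print(proj_1[1])                    # ((4,), [])
print(support_of_append(proj_1, 2))  # 2
```

Here the second sequence of Table II projects to the partial element (_4) with nothing after it, so it cannot support the pattern <(1)(4)>, even though it contains both items.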

III-B The Pattern-Growth Method: Prefix-Span Algorithm

The Pattern-Growth method of mining sequential patterns uses the Prefix-Span algorithm, which proceeds in the following steps:

  1. Find length-1 sequential patterns: the database is scanned once to find all frequent items; each frequent item is a sequential pattern of length 1.

  2. Partition the search space into subsets: the complete set of sequential patterns is partitioned into subsets according to their prefixes, one subset per length-1 pattern.

  3. Find the subsets of sequential patterns: each subset is mined by constructing the corresponding projected database and mining it recursively.

The subroutine is called as Prefix-Span($\alpha$, $l$, $S|_\alpha$), where $S$ is the sequence database, $\alpha$ is a sequential pattern, $l$ is the length of $\alpha$, and $S|_\alpha$ is the $\alpha$-projected database (or $S$ itself when $\alpha$ is the empty sequence; the initial call is Prefix-Span($\langle\rangle$, 0, $S$)). The major cost of Prefix-Span is the construction of projected databases. The input is the sequence database and a minimum support threshold, and the output is the complete set of sequential patterns. Confidence levels are not needed to mine the patterns themselves; they apply only when sequential rules are derived from them. The method consists of the following steps:


  • Scan $S|_\alpha$ once to find each frequent item $b$ such that either $b$ can be assembled into the last element of $\alpha$ to form a sequential pattern, or $\langle b \rangle$ can be appended to $\alpha$ to form a sequential pattern.

  • For each frequent item $b$, append it to $\alpha$ to form a sequential pattern $\alpha'$, and output $\alpha'$.

  • For each $\alpha'$, construct the $\alpha'$-projected database $S|_{\alpha'}$ and call Prefix-Span($\alpha'$, $l+1$, $S|_{\alpha'}$).
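The recursive steps above can be sketched in Python under one simplifying assumption: every element of a sequence holds a single item, as with one venue category per check-in, so the case of assembling $b$ into the last element of $\alpha$ never arises and is omitted. Suffixes are kept as (sequence, start index) pairs, a pseudo-projection that avoids copying, and `min_support` is an absolute count. The function name `prefixspan` and its parameters are this sketch's own, not a library API.

```python
from collections import defaultdict


def prefixspan(db, min_support, max_len=None):
    """Mine frequent sequential patterns from sequences of single items.

    db: list of sequences, each a list of hashable items (e.g. venue categories)
    min_support: absolute minimum number of sequences that must contain a pattern
    max_len: optional maximum pattern length
    Returns a list of (pattern, support_count) pairs.
    """
    results = []

    def mine(prefix, projected):
        # Step 1: scan the projected database once; count each item at most once per suffix
        counts = defaultdict(int)
        for seq, start in projected:
            for item in set(seq[start:]):
                counts[item] += 1
        for item, count in counts.items():
            if count < min_support:
                continue
            # Step 2: append the frequent item to the prefix and output the new pattern
            new_prefix = prefix + [item]
            results.append((new_prefix, count))
            if max_len is not None and len(new_prefix) >= max_len:
                continue
            # Step 3: project on the first occurrence of `item` in each suffix and recurse
            new_projected = []
            for seq, start in projected:
                for i in range(start, len(seq)):
                    if seq[i] == item:
                        new_projected.append((seq, i + 1))
                        break
            mine(new_prefix, new_projected)

    mine([], [(seq, 0) for seq in db])
    return results


# Hypothetical activity sequences, one per tourist
db = [
    ["Nature", "Dining", "Shopping"],
    ["Nature", "Shopping", "Dining"],
    ["Dining", "Nature", "Shopping"],
]
for pattern, count in sorted(prefixspan(db, min_support=2)):
    print(" > ".join(pattern), count)
```

A relative threshold such as 0.5 can be converted to the absolute count this sketch expects with `math.ceil(0.5 * len(db))`.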

The resulting projected databases and the sequential patterns mined from each subset are shown in Table I.

Prefix | Projected (postfix) database | Sequential patterns
<a> | <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>, <(_b)(df)cb>, <(_f)cbc> | <a>, <aa>, <ab>, <a(bc)>, <a(bc)a>, <aba>, <abc>, <(ab)>, <(ab)c>, <(ab)d>, <(ab)f>, <(ab)dc>, <ac>, <aca>, <acb>, <acc>, <ad>, <adc>, <af>
<b> | <(_c)(ac)d(cf)>, <(_c)(ae)>, <(df)cb>, <c> | <b>, <ba>, <bc>, <(bc)>, <(bc)a>, <bd>, <bdc>, <bf>
<c> | <(ac)d(cf)>, <(bc)(ae)>, <b>, <bc> | <c>, <ca>, <cb>, <cc>
<d> | <(cf)>, <c(bc)(ae)>, <(_f)cb> | <d>, <db>, <dc>, <dcb>
<e> | <(_f)(ab)(df)cb>, <(af)cbc> | <e>, <ea>, <eab>, <eac>, <eacb>, <eb>, <ebc>, <ec>, <ecb>, <ef>, <efb>, <efc>, <efcb>
<f> | <(ab)(df)cb>, <cbc> | <f>, <fb>, <fbc>, <fc>, <fcb>
TABLE I: Projected database and Sequential Patterns

In a given sequence dataset, in this case the Singapore tourist database, the Pattern-Growth method using the Prefix-Span algorithm identifies all the frequent sequential patterns in the dataset. Two parameters are set at the start of the algorithm: the minimum support and the maximum prefix length [31]. The latter limits the length of the patterns, which is crucial when analyzing large databases. The support of a pattern is obtained by dividing the number of sequences that contain the pattern by the total number of sequences in the dataset, and a pattern is frequent if its support is at least the minimum support. Using projected databases, the Prefix-Span algorithm first finds the length-1 sequential patterns and then recursively mines the projected databases to yield longer patterns.

ID Sequence
S1 (1),(2),(1 2),(3),(1 3),(4 5),(6)
S2 (3 4),(3),(2 3),(1 4)
S3 (4 5),(2),(2 3 4),(3),(1)
S4 (4),(5),(1 6),(3),(2),(7),(1)
TABLE II: An Example of a sequence database

Table II shows an example of a sequence database containing four sequences of itemsets. For instance, the first sequence, S1, contains seven itemsets [31, 32]. Similarly, analyzing the venue category and check-in time of the selected sequences in the Singapore database yields the corresponding sequential patterns. Applying Prefix-Span entails scanning the sequence database to obtain the frequent items [33, 34, 35, 36]. The frequent items are then appended to form sequential patterns, which are used to construct the projected databases.
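The support computation described above can be checked on Table II. The sketch below assumes each itemset is written as a tuple and treats a pattern as contained in a sequence when each of its itemsets is a subset of a distinct, later itemset of the sequence; greedy earliest matching is sufficient for this test. The function names `contains` and `relative_support` are hypothetical.

```python
def contains(sequence, pattern):
    """True if every itemset of `pattern` is a subset of a distinct itemset of
    `sequence`, in the same order (greedy earliest matching)."""
    i = 0
    for element in sequence:
        if i < len(pattern) and set(pattern[i]) <= set(element):
            i += 1
    return i == len(pattern)


def relative_support(db, pattern):
    """Number of sequences containing `pattern` divided by the number of sequences."""
    return sum(contains(seq, pattern) for seq in db) / len(db)


# Table II, with each itemset written as a tuple
table2 = [
    [(1,), (2,), (1, 2), (3,), (1, 3), (4, 5), (6,)],
    [(3, 4), (3,), (2, 3), (1, 4)],
    [(4, 5), (2,), (2, 3, 4), (3,), (1,)],
    [(4,), (5,), (1, 6), (3,), (2,), (7,), (1,)],
]
print(relative_support(table2, [(4,)]))        # 1.0
print(relative_support(table2, [(1,), (2,)]))  # 0.5
print(relative_support(table2, [(1, 2)]))      # 0.25
```

With a minimum support of 0.5, for example, <(1)(2)> would be frequent while <(1 2)> would not.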

The Prefix-Span algorithm was implemented in Java, although it can also be implemented in Python or R.

IV Results and Findings

The findings of SPM using the Pattern-Growth method are summarized in five primary cases, each exhibiting different activity patterns. This indicates that tourists have different tastes and preferences when visiting a particular region. The results provide useful information to tourism management authorities in Singapore to help them devise better strategies for tourist visits.

Fig. 2: Frequency of the Tourist Location

Fig. 2 shows that most of the tourists came from Thailand, Kuwait, Jakarta, Malaysia and Indonesia.

Exploratory data analysis revealed that more female than male tourists visited venues in the country: there were 3,830 female and 3,577 male tourists, and 207 tourists did not declare their gender. Most female tourists came from Malaysia, an unspecified location (NaN), Thailand, Peru and BBK, whereas most male tourists came from Japan, an unspecified location (NaN), Kuwait, Selangor, Thailand and Indiana.

Fig. 3: Frequency of the Venue Category

The most visited venue categories included Airport, Park, Airport-Gate, Asian Restaurant, Pier and Stadium. Scenic lookouts and the airport were visited more frequently than other venues. Because airport-related check-ins mainly signal a tourist's international travel status rather than a leisure activity, Airport and all airport-related activities were excluded when generating activity patterns in the SPM phase.
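A minimal sketch of the exclusion step, assuming each tourist's activities are already stored as a time-ordered list of venue categories. Treating every category that starts with "Airport" as airport-related is an assumption, as is dropping tourists left with no activities; `drop_airport_activities` is a hypothetical name.

```python
# Sketch: drop airport-related venue categories before mining.
# Treating every category that starts with "Airport" as airport-related is an assumption.
def drop_airport_activities(sequences, prefix="Airport"):
    """sequences: {tourist_id: [venue categories in time order]}"""
    cleaned = {}
    for tourist, categories in sequences.items():
        kept = [c for c in categories if not c.startswith(prefix)]
        if kept:  # tourists with only airport check-ins contribute no activity sequence
            cleaned[tourist] = kept
    return cleaned


# Hypothetical input
sequences = {
    "u1": ["Airport", "Park", "Asian Restaurant", "Airport-Gate"],
    "u2": ["Airport", "Airport-Gate"],
}
print(drop_airport_activities(sequences))  # {'u1': ['Park', 'Asian Restaurant']}
```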

The following sequential patterns were obtained using the Pattern-Growth method and the Prefix-Span algorithm. The algorithm was implemented with Java-based Apache software, following the systematic steps described above to yield the tourists' sequential activity patterns.

Activity Sequence 1 > 2 > 3 Frequency Support Confidence
Travelling > Religious > Dining 620 0.015713 0.078125
Shopping > Recreation > Entertainment 620 0.136509 0.030576
Hiking > Outdoor > Refreshments 385 0.042229 0.098837
Entertainment > Religious > Shopping 289 0.283575 0.030303
Shopping > Hiking > Dining 365 0.044685 0.192308
Nature > Refreshments > Dining 265 0.136509 0.008993
Archives > Shopping > Nature 423 0.006629 0.185185
Dining > Travelling > Nature 313 0.005892 0.208333
Hiking > Nature > Shopping 313 0.136509 0.008993
Dining > Nature > Nature 287 0.083231 0.014749
Sport > Dining > Nature 465 0.012521 0.098039
Nature-walk > Archives > Nature 550 0.03781 0.032468
Religious > Nature > Shopping 425 0.009084 0.135135
Entertainment > Nature > Refreshments 414 0.098453 0.014963
Dining > Hiking > Shopping 611 0.017677 0.083333
TABLE III: Case 1: Tourists’ Sequential Activity Patterns in the Morning (7am-2pm)

Although tourists varied in when they took part in different activities, some activities, such as visiting parks, scenic lookouts, outdoor sculptures and gardens, and crossing the border, were commonly done in the morning. Based on the patterns in Case 1, hiking, nature walks and religious activities dominate tourists' morning schedules. Compared with the afternoon and evening, tourists preferred to visit different restaurants to eat in the morning hours. As Table III shows, many tourists share most of these sequential activity patterns.
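The text does not state how the Support and Confidence columns in Tables III to V were computed. Under commonly used sequential-rule definitions, which are an assumption here, the support of a pattern is the fraction of tourists' sequences that contain it in order, and the confidence of a rule such as Hiking > Outdoor => Refreshments is support(Hiking > Outdoor > Refreshments) / support(Hiking > Outdoor). The sketch below implements these definitions on hypothetical data; it is not intended to reproduce the table values, and its function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the activities in `pattern` occur in `sequence` in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # `x in iterator` advances the iterator past the first match, which enforces ordering
    return all(activity in remaining for activity in pattern)


def support(db, pattern):
    """Fraction of tourists' sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in db) / len(db)


def confidence(db, antecedent, consequent):
    """Confidence of the sequential rule antecedent => consequent."""
    base = support(db, antecedent)
    return support(db, antecedent + consequent) / base if base else 0.0


# Hypothetical activity sequences, one per tourist
db = [
    ["Hiking", "Outdoor", "Refreshments"],
    ["Hiking", "Outdoor", "Dining"],
    ["Shopping", "Hiking", "Outdoor", "Refreshments"],
    ["Dining", "Shopping"],
]
print(support(db, ["Hiking", "Outdoor", "Refreshments"]))       # 0.5
print(confidence(db, ["Hiking", "Outdoor"], ["Refreshments"]))  # 0.6666666666666666
```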

| Activity sequence (step 1 > step 2 > step 3) | Frequency | Support | Confidence |
| --- | --- | --- | --- |
| Dining > Walking > Shopping | 715 | 0.050011 | 0.088235 |
| Field > Exhibition > Shopping | 715 | 0.126917 | 0.008278 |
| Field > Dining > Shopping | 385 | 0.007354 | 0.142857 |
| Shopping > Gaming > Walking | 270 | 0.075856 | 0.033241 |
| Dining > Walking > Shopping | 365 | 0.054003 | 0.046693 |
| Nature > Nature > Walking | 444 | 0.126917 | 0.013245 |
| Travelling > Dining > Walking | 323 | 0.007144 | 0.235294 |
| Shopping > Refreshments > Refreshments | 313 | 0.024795 | 0.042373 |
| Nature > Dining > Walking | 513 | 0.050011 | 0.021008 |
| Shopping > Field > Nature | 587 | 0.093717 | 0.013453 |
| Dining > Field > Hiking | 265 | 0.014079 | 0.089552 |
| Archives > Travelling > Resting | 240 | 0.075856 | 0.01662 |
| Scenery > Dining > Entertainment | 125 | 0.037403 | 0.033708 |
| Dining > Archives > Walking | 114 | 0.007354 | 0.171429 |
| Archives > Dining > Field | 211 | 0.126917 | 0.034768 |

TABLE IV: Case 2: Tourists’ Sequential Activity Patterns in the Afternoon (2pm-12am)

Case 2 contains the sequences generated between 2 pm and midnight (12 am). Its activity patterns highlight tourists' behavior during the later part of the day. Many tourists took nature and field walks, but dining was the main activity in this period. Tourists have plenty of options to choose from when dining out, which reflects the rich Asian culinary culture and its authentic and aromatic variety of food.
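Cases 1 and 2 partition the same check-ins by time of day before mining. The Python sketch below shows one way to do this, assuming each check-in is a hypothetical `(tourist_id, timestamp, category)` record. The window bounds follow the Table III and Table IV captions (7 am to 2 pm, and 2 pm to midnight), and check-ins before 07:00 are dropped. The function names and record layout are illustrative assumptions, not the study's data schema.

```python
from collections import defaultdict
from datetime import datetime, time


def window_of(ts):
    """Map a check-in timestamp to a hypothetical time-window label."""
    t = ts.time()
    if time(7, 0) <= t < time(14, 0):
        return "case1"  # morning, 7am-2pm (Table III)
    if t >= time(14, 0):
        return "case2"  # 2pm-midnight (Table IV)
    return None  # before 07:00: outside both cases


def split_by_window(checkins):
    """Build one sequence database per window from (tourist_id, timestamp, category)."""
    visits = defaultdict(list)  # (window, tourist_id, date) -> [(timestamp, category)]
    for tourist_id, ts, category in checkins:
        window = window_of(ts)
        if window is not None:
            visits[(window, tourist_id, ts.date())].append((ts, category))
    dbs = defaultdict(list)
    for (window, _, _), items in visits.items():
        items.sort()  # chronological order within the window
        dbs[window].append([category for _, category in items])
    return dict(dbs)


# Toy check-ins (invented for illustration, not study data).
checkins = [
    ("u1", datetime(2018, 3, 2, 9, 30), "Hiking"),
    ("u1", datetime(2018, 3, 2, 8, 0), "Religious"),
    ("u1", datetime(2018, 3, 2, 16, 0), "Dining"),
    ("u1", datetime(2018, 3, 2, 18, 45), "Shopping"),
    ("u2", datetime(2018, 3, 2, 6, 15), "Nature"),
]
dbs = split_by_window(checkins)
```

Each window's sequence database can then be mined separately, for example with a PrefixSpan routine, to produce per-case tables such as Tables III and IV.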

| Activity sequence (step 1 > step 2 > step 3) | Frequency | Support | Confidence |
| --- | --- | --- | --- |
| Hiking > Dining > Shopping | 1200 | 0.01458 | 0.235294 |
| Shopping > Field > Nature | 1300 | 0.14494 | 0.031558 |
| Field > Dining > Shopping | 385 | 0.007354 | 0.142857 |
| Shopping > Nature > Dining | 785 | 0.075856 | 0.033241 |
| Dining > Walking > Shopping | 365 | 0.042882 | 0.106667 |
| Dining > Shopping > Outdoor | 870 | 0.311607 | 0.03211 |
| Heritage Trail > Nature > Outdoor | 965 | 0.047742 | 0.209581 |
| Dining > Nature > Religious | 344 | 0.14494 | 0.009862 |
| Shopping > Religious > Heritage Trail | 523 | 0.007433 | 0.192308 |
| Religious > Hiking > Shopping | 613 | 0.091481 | 0.015625 |
| Entertainment > Walking > Dining | 723 | 0.014294 | 0.1 |
| Shopping > Refreshments > Hiking | 287 | 0.043453 | 0.026316 |
| Dining > Archives > Outdoor | 965 | 0.011435 | 0.1 |
| Shopping > Nature > Dining | 965 | 0.311607 | 0.011009 |

TABLE V: Case 3: Most Predominant Sequential Activity Patterns among Tourists

Both male and female tourists were largely involved in shopping, nature walks and dining. Tourists preferred to visit a variety of hotels and restaurants to explore different types of food, owing to the rich Asian culture. The activity pattern Shopping > Field > Nature had the highest frequency (1300). This indicates that many tourists spent substantial time during the day on recreational and other outdoor activities.

| Activity sequence (step 1 > step 2 > step 3) | Frequency | Support | Confidence |
| --- | --- | --- | --- |
| Shopping > Travel > Nature | 1700 | 0.010649 | 0.21875 |
| Dining > Travel > Religious | 1480 | 0.115474 | 0.027378 |
| Dining > Refreshment > Archives | 1650 | 0.034276 | 0.092233 |
| Hiking > Nature > Leisure | 1400 | 0.255907 | 0.004551 |
| Refreshments > Dining > Arcade | 1365 | 0.004659 | 0.25 |
| Dining > Shopping > Nature | 1278 | 0.255907 | 0.030559 |
| Dining > Archives > Nature | 1415 | 0.037604 | 0.207965 |
| Hiking > Nature > Entertainment | 1498 | 0.255907 | 0.011704 |
| Field > Nature > Outdoor | 1370 | 0.009151 | 0.327273 |
| Travel > Dining > Nature | 1589 | 0.115474 | 0.063401 |
| Nature > Shopping > Outdoor | 1220 | 0.083195 | 0.088 |
| Archives > Outdoor > Sporting | 1120 | 0.255907 | 0.005202 |

TABLE VI: Case 4: Sequential Activity Patterns of Female Tourists

The Case 4 patterns suggest that many female tourists preferred to go shopping followed by a nature walk, while others engaged in outdoor and sporting activities. This variation may be attributable to the different age groups of the female tourists included in the dataset.

| Activity sequence (step 1 > step 2 > step 3) | Frequency | Support | Confidence |
| --- | --- | --- | --- |
| Shopping > Travel > Nature | 1200 | 0.010318 | 0.209677 |
| Travel > Archives > Entertainment | 1500 | 0.115327 | 0.027417 |
| Hiking > Entertainment > Gaming | 1550 | 0.034948 | 0.090476 |
| Dining > Religious > Nature | 1400 | 0.25778 | 0.027114 |
| Archives > Refreshments > Nature | 1265 | 0.03578 | 0.195349 |
| Dining > Recreation > Nature | 1178 | 0.25778 | 0.01162 |
| Nature > Dining > Market | 1315 | 0.008654 | 0.346154 |
| Dining > Travel > Field | 1498 | 0.009819 | 0.135593 |
| Nature > Outdoor > Religious | 1370 | 0.08221 | 0.016194 |
| Field > Travel > Nature | 1189 | 0.115327 | 0.063492 |
| Nature > Sporting > Gaming | 1120 | 0.08221 | 0.089069 |
| Nature > Outdoor > Dining | 1120 | 0.25778 | 0.008393 |

TABLE VII: Case 5: Sequential Activity Patterns of Male Tourists

The cases above indicate that the choice of support threshold and confidence strongly influenced which sequential patterns were generated. In Case 5, Hiking > Entertainment > Gaming (frequency 1550) and Travel > Archives > Entertainment (frequency 1500) had the highest frequencies. This suggests that male tourists favored entertainment activities during their trips. The study also examined the times at which tourists checked in at their destinations. Notably, tourists checked in more than once at some places, such as shopping malls, metro stations, hotels, scenic lookouts and food courts. For this research, a typical day began at 07:00 and ended at 23:00. Activities done at the beginning of the day were therefore placed first in each sequence pattern, followed by later activities in chronological order.
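The ordering rule described above can be sketched in Python. The sketch keeps only check-ins between 07:00 and 23:00 and sorts them by time, so that the earliest activity of the day becomes the first step of the sequence. The text notes repeated check-ins at places such as shopping malls but does not say how they were treated. The `collapse_repeats` option, which merges consecutive check-ins with the same category, is therefore a hypothetical choice, and the function name and `(timestamp, category)` input format are also assumptions.

```python
from datetime import datetime, time
from itertools import groupby

DAY_START, DAY_END = time(7, 0), time(23, 0)  # typical day used in the study


def daily_sequence(visits, collapse_repeats=False):
    """Order one tourist's check-ins for one day into an activity sequence.

    visits: iterable of (timestamp, category) pairs for a single tourist and date.
    collapse_repeats: hypothetical option that merges consecutive check-ins with
    the same category (e.g. repeated check-ins at a shopping mall) into one step.
    """
    in_day = sorted(
        (ts, cat) for ts, cat in visits if DAY_START <= ts.time() <= DAY_END
    )
    seq = [cat for _, cat in in_day]  # earliest activity first
    if collapse_repeats:
        seq = [cat for cat, _ in groupby(seq)]  # drop consecutive duplicates
    return seq


# Toy check-ins for one tourist on one day (invented, not study data).
visits = [
    (datetime(2018, 3, 2, 12, 10), "Shopping"),
    (datetime(2018, 3, 2, 7, 40), "Hiking"),
    (datetime(2018, 3, 2, 12, 55), "Shopping"),
    (datetime(2018, 3, 2, 23, 30), "Dining"),  # after 23:00, so excluded
    (datetime(2018, 3, 2, 19, 5), "Dining"),
]
```

With the default setting, the repeated shopping check-ins remain as separate steps (`Hiking, Shopping, Shopping, Dining`). Collapsing them gives `Hiking, Shopping, Dining`. This choice affects which three-step patterns appear, such as the repeated-item patterns in Tables III and IV.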

V Conclusion

Tourism is a major contributor to the growth and development of many economies. It is therefore important to better understand tourist behavior in order to improve service delivery to both international and domestic tourists. Sequential pattern mining (SPM) is a widely used data mining technique that can be applied in the tourism sector to understand visitors' activity patterns at various destinations. This study used the pattern-growth method to carry out SPM and derive tourists' activity patterns. Tourists of both genders shared some activity patterns, but other patterns were specific to one gender. Singapore's tourism managers could use the most interesting patterns to identify appropriate check-in times for different activities. This would help them prepare more tailored tour packages and improve and customize the delivery of their travel services.


The authors gratefully acknowledge the valuable direction and support of Professor Hua Wang, Centre for Applied Informatics, College of Engineering & Science, and Research Fellow Huy Quan in the research and findings of this paper. Their many insightful comments on several drafts of the manuscript, and their help in identifying gaps in the knowledge, enabled the authors to complete this paper.


  • [1] Metin Kozak and Nazmi Kozak. Tourist Behaviour: An International Perspective. CABI, 2016.
  • [2] Harald Bathelt and Gang Zeng. Temporary Knowledge Ecologies: The Rise of Trade Fairs in the Asia-Pacific Region. Edward Elgar Publishing, 2015.
  • [3] Min Li et al. Privacy-aware access control with trust management in web service. World Wide Web, 14(4):407–430, Jul 2011.
  • [4] Hua Wang, Zonghua Zhang, and Tarek Taleb. Editorial: Special issue on security and privacy of IoT. World Wide Web, 21(1):1–6, Jan 2018.
  • [5] Hua Wang, Xiaohong Jiang, and Georgios Kambourakis. Special issue on security, privacy and trust in network-based big data. Inf. Sci., 318(C):48–50, October 2015.
  • [6] Eunju Kim, Sumi Helal, and Diane Cook. Human activity recognition and pattern discovery. IEEE Pervasive Computing, 9(1):48, 2010.
  • [7] Michael Bauder and Tim Freytag. Visitor mobility in the city and the effects of travel preparation. Tourism Geographies, 17(5):682–700, 2015.
  • [8] J Adam Beeco, Wei-Jue Huang, Jeffrey C Hallo, William C Norman, Nancy G McGehee, John McGee, and Cari Goetcheus. GPS tracking of travel routes of wanderers and planners. Tourism Geographies, 15(3):551–573, 2013.
  • [9] Sabereh Dejbakhsh, Colin Arrowsmith, and Merv Jackson. Cultural influence on spatial behaviour. Tourism Geographies, 13(1):91–111, 2011.
  • [10] Diem-Trinh Le-Klähn, Jutta Roosen, Regine Gerike, and C Michael Hall. Factors affecting tourists’ public transport use and areas visited at destinations. Tourism Geographies, 17(5):738–757, 2015.
  • [11] Jiawei Han and Charu Aggarwal. Frequent Pattern Mining. Springer Media, 2013.
  • [12] C. Michael Hall and Stephen Page. Tourism in South and Southeast Asia. Routledge, 2012.
  • [13] Xiangping Li, Fang Meng, and Muzaffer Uysal. Spatial pattern of tourist flows among the Asia-Pacific countries: An examination over a decade. pages 229–243, 2008.
  • [14] Weimin Zheng, Xiaoting Huang, and Yuan Li. Understanding the tourist mobility using GPS: Where is the next place? Tourism Management, 59:267–280, 2017.
  • [15] Bálint Kádár. Measuring tourist activities in cities using geotagged photography. Tourism Geographies, 16(1):88–104, 2014.
  • [16] Yang Yang, Timothy Fik, and Jie Zhang. Modeling sequential tourist flows: Where is the next destination? Annals of Tourism Research, 43:297–320, 2013.
  • [17] Liu Yang, Lun Wu, Yu Liu, and Chaogui Kang. Quantifying tourist behavior patterns by travel. pages 2–18, 2017.
  • [18] Da Ruan, Guoqing Chen, Etienne E. Kerre, and Geert Wets. Intelligent Data Mining: Techniques and Applications. Springer Science & Business Media, 2005.
  • [19] Hua Wang, Jinli Cao, and Yanchuan Zhang. Ticket-based service access scheme for mobile users. In Proceedings of the Twenty-fifth Australasian Conference on Computer Science - Volume 4, ACSC ’02, pages 285–292. Australian Computer Society, Inc., 2002.
  • [20] Ji Zhang, Xiaohui Tao, and Hua Wang. Outlier detection from large distributed databases. World Wide Web, 17(4):539–568, Jul 2014.
  • [21] Esteban Zimanyi. Business Intelligence: Third European Summer School, eBISS 2013, Dagstuhl Castle, Germany, July 7-12, 2013, Tutorial Lectures. Springer, 2017.
  • [22] Xiaoxun Sun, Hua Wang, Jiuyong Li, and Yanchun Zhang. Satisfying privacy requirements before data anonymization. The Computer Journal, 55(4):422–437, 2012.
  • [23] Xiaoxun Sun, Hua Wang, Jiuyong Li, and Yanchun Zhang. Injecting purpose and trust into data anonymisation. Computers & Security, 30(5):332–345, 2011. Advances in network and system security.
  • [24] Xiaoxun Sun, Min Li, Hua Wang, and Ashley Plank. An efficient hash-based algorithm for minimal k-anonymity. In Proceedings of the Thirty-first Australasian Conference on Computer Science - Volume 74, ACSC ’08, pages 101–107. Australian Computer Society, Inc., 2008.
  • [25] Sangkyun Kim and Stijn Reijnders. Film Tourism in Asia: Evolution, Transformation, and Trajectory. Springer, 2017.
  • [26] Charu C. Aggarwal. Managing and Mining Sensor Data. Springer Science & Business Media, 2013.
  • [27] Huy Quan Vu, Gang Li, Rob Law, and Ben Haobin Ye. Exploring the travel behaviors of inbound tourists to Hong Kong using geotagged photos. Tourism Management, 46:222–232, 2015.
  • [28] Huy Quan Vu, Rosanna Leung, Jia Rong, and Yuan Miao. Exploring park visitors’ activities in Hong Kong using geotagged photos. In Information and Communication Technologies in Tourism 2016, pages 183–196. Springer, 2016.
  • [29] Man Wah Yeung, Seongseop Kim, and Markus Schuckert. Japanese tourists to Hong Kong: Their preferences, behavior, and image perception. Journal of Travel & Tourism Marketing, 33(5):730–741, 2016.
  • [30] Alvin Chua, Loris Servillo, Ernesto Marcheggiani, and Andrew Vande Moere. Mapping Cilento: Using geotagged social media data to characterize tourist flows in southern Italy. Tourism Management, 57:295–310, 2016.
  • [31] Pratik Saraf, R. R. Sedamkar, and Sheetal Rathi. PrefixSpan algorithm for finding sequential pattern with. pages 37–41, 2015.
  • [32] Xiaoxun Sun, Hua Wang, Jiuyong Li, and Jian Pei. Publishing anonymous survey rating data. Data Mining and Knowledge Discovery, 23(3):379–406, Nov 2011.
  • [33] Jia-Wei Han, Jian Pei, and Xi-Feng Yan. From sequential pattern mining to structured pattern mining: a pattern-growth approach. Journal of Computer Science and Technology, 19(3):257–279, 2004.
  • [34] Hong Hu, Jiuyong Li, Hua Wang, and Grant Daggard. Combined gene selection methods for microarray data analysis. In Knowledge-Based Intelligent Information and Engineering Systems, pages 976–983, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
  • [35] Hu Li, Ye Wang, Hua Wang, and Bin Zhou. Multi-window based ensemble learning for classification of imbalanced streaming data. World Wide Web, 20(6):1507–1525, Nov 2017.
  • [36] Min Peng et al. Personalized app recommendation based on app permissions. World Wide Web, 21(1):89–104, Jan 2018.