MobInsight: A Framework Using Semantic Neighborhood Features for Localized Interpretations of Urban Mobility

09/29/2017 ∙ by Souneil Park, et al. ∙ 0

Collective urban mobility embodies the residents' local insights on the city. Mobility practices of the residents are produced from their spatial choices, which involve various considerations such as the atmosphere of destinations, distance, past experiences, and preferences. The advances in mobile computing and the rise of geo-social platforms have provided the means for capturing the mobility practices; however, interpreting the residents' insights is challenging due to the scale and complexity of an urban environment, and its unique context. In this paper, we present MobInsight, a framework for making localized interpretations of urban mobility that reflect various aspects of the urbanism. MobInsight extracts a rich set of neighborhood features through holistic semantic aggregation, and models the mobility between all-pairs of neighborhoods. We evaluate MobInsight with the mobility data of Barcelona and demonstrate diverse localized and semantically-rich interpretations.

1. Introduction

The mobility practices in urban spaces reflect diverse spatial choices stemming from the different lives and experiences of residents. For every spatial choice, consciously or unconsciously, people go through a decision-making process with their insights. They make sense of the context of their travel, consider the spatial and time constraints, project a mental map of relevant areas, and search for a place or a route that best suits their needs. However, the insights involved in this process are not explicitly revealed in the mobility itself. The decisions also come more often from intuition than thoughtful reasoning, making it difficult for people to recall and elaborate on them. While such properties hinder the interpretation of the insights, recent advances in mobile computing and geo-crowdsourcing systems are opening new opportunities to approach it at scale, by exposing the movement patterns, detailed information about places, and aggregated preferences and experiences of people.

In this paper, we present MobInsight, a framework for making localized interpretations of urban mobility. MobInsight develops an extensive set of local features, and comprehensively explains a fine-grained segmentation of the mobility using the local features. The framework first points out the key features of each neighborhood that affect the mobility of the area. It further expands the view to the relation between all neighborhoods, enabling analyses about how different neighborhood features interact in determining the mobility between them. MobInsight is designed to facilitate site-specific interpretations that are difficult to make with general theories or models of human mobility (Erlander and Stewart, 1990; Simini et al., 2012; Stouffer, 1940). Although those general models capture commonly important factors, such as distance and population, they are detached from the unique urban context of the area, which limits the scope of possible interpretations.

Enabling localized interpretations involves challenging problems for research. It requires extensive exploration of numerous possible factors that shape the characteristics of an area, and sophisticated modeling techniques to explain the complex relations between those factors and mobility. MobInsight employs two main approaches to facilitate highly localized mobility interpretations. Firstly, we develop holistic semantic aggregation for comprehensive neighborhood feature analysis. It thoroughly identifies the existing places and their function by analyzing diverse online sources, including local guides, geo-crowdsourcing services, and open directory data from the city government. Having such heterogeneous sources gives a more complete picture of the neighborhoods and mitigates possible selection biases. The method also takes advantage of the semantic annotations left on the places, which capture the meanings given in-situ by actual visitors. Using semantic analysis techniques, the method fuses the structural and linguistic heterogeneity of the annotations across the sources and produces a unified neighborhood profiling scheme.

Secondly, MobInsight performs all-pairs inter-neighborhood mobility modeling, to comprehensively explore the associations between the neighborhood features and the mobility of the people. Having all possible neighborhood pairs in the analysis not only expands the range of potential interpretations, but also mitigates socio-economic, demographic, or regional biases. For this, we take advantage of the real telecommunication logs from the largest operator of the target city, i.e., Barcelona, which enables the construction of a full inter-neighborhood mobility matrix. The logs include 35 million samples of call data records (CDRs) of the residents collected during a month. The data set includes the logs of all mobile phones, not just smartphones or devices with a certain mobile app. The framework employs a multi-layer neural network to learn the complex relationships between the mobility and the features. It also performs model auditing for intuitive explanation of feature importances.

The evaluation is composed of two parts. First, through a mobility estimation task, we verify if the neighborhood features contribute to explaining the mobility. Second, we elaborate on the different types of interpretations enabled by MobInsight, and discuss how they are aligned with the descriptions about the urbanism of Barcelona.

2. Related Work

2.1. Reflections on Urban Mobility

The rich meanings behind urban mobility have been explored in many studies by reflecting on the psychological, historical, social, and cultural aspects it embodies. As the manifestation of the meanings is inherently implicit, the studies take a larger perspective and consider diverse aspects rather than simply viewing mobility as ‘changes in locations.’ The larger perspective allows a deeper interpretation of the mobility with respect to many related themes, such as how people perceive spaces, and build spatial relations and practices.

Lynch’s work (Lynch, 1960) explores the mental models of urban spaces and investigates the quality of easily recognizable spaces (“legibility” to use the author’s term). While the work implies the importance of the mental representation of the space, recent works have further explored additional factors that affect the perception of the space and mobility such as demographics, technology use (Bentley et al., 2012), and emotional pleasantness (Quercia et al., 2014). Dourish (Dourish et al., 2007) emphasizes the diversity in the mobility experience, and challenges the narrow interpretations of mobility that are often found in the mobile computing research of that period. His discussion elaborates on the role of historical and cultural context in how people identify and develop relationships with particular spaces. He also emphasizes the differences in the observations made depending on social groups, even from very similar mobility patterns. The concept ‘place-identity’ (Proshansky et al., 1983) of the urbanism literature looks into such particularities in the meanings of places further at the level of individuals, and studies how they are related to self-identities.

We believe De Certeau’s concept of “tactics” (De Certeau, 1984) can be read as an explanation of why such diverse meanings are involved in the mobility. The concept pays attention to the routines of ordinary people who are positioned as ‘users’ of the physical, socio-cultural, and institutional basis, distinguished from those who have the power to shape and control the basis (those who practice “strategy” in the author’s term). It highlights the creative ways people individualize the basis by altering, adapting, and appropriating it. In the context of urban mobility, he views everyday movements of people as appropriation of the physical space. The space obtains individualized meanings through the spatial tactics of people, which often go beyond rules or expected uses of the space.

We share the view rooted in the above works that rich meanings are involved behind the exhibited mobility, which emphasizes the importance of the mobility studies that are specific to the site and context of the target area. We believe the recent growth of geo-social web services and open data initiatives is creating new opportunities for such studies. Our work aims to provide the tools and techniques for these studies, empowering them to deal with the diversity and complexity of the information involved.

2.2. Computational Approaches to Mobility Interpretation

A large body of work exists on data-driven analysis of human mobility due to the pervasive use of mobile devices and the growing adoption of geo-crowdsourcing applications. Focusing on the works that attempt to explain human mobility, we put the relevant works broadly into two classes. One class of works explores common factors behind human mobility. A frequent topic of these works is to develop estimation models for a mobility application, such as commute patterns (Lenormand et al., 2015), international trade (Koo and Karemera, 1991), and virus spreading (Frias-Martinez et al., 2011). As the works develop generalized models, many of them build upon the laws of physics, for example, the gravity model (Erlander and Stewart, 1990), which uses the distance between two points and the ‘mass’ of those points (e.g., a property such as population). There are also works that use models that consider the availability of opportunities, such as the radiation model (Simini et al., 2012) or Stouffer’s law of intervening opportunities (Stouffer, 1940). The availability of opportunity is often approximated with the number of jobs or existing places in an area (Noulas et al., 2012).

The other class of works looks for local, site-specific interpretations of the mobility. However, there are relatively few works along this line despite the diversity of cities with their own urban context. Although there are works that use a city-specific data set (e.g., social media check-ins or mobile communication logs of a city), the findings are often made over a common interpretation frame which does not sufficiently capture the local uniqueness of the city; for example, many works (Lenormand et al., 2015; Yuan et al., 2012) have studied the functional areas of a city according to the general land use classes, such as residential, business, entertainment, etc. Similarly, the works on spatio-temporal patterns (Kling and Pozdnoukhov, 2012; Long et al., 2012) often observe diurnal cycles or major activities that can be found similarly in different cities. Our view is that it is important to make more localized interpretations that reflect on the historic, economic, and cultural context of the site, and it is necessary to have tailored techniques or analysis methods for the purpose.

We believe that Cranshaw et al.’s work (Cranshaw et al., 2012) shares our view since it explores the space of interpretations specific to the selected site. The work takes Foursquare check-ins in Pittsburgh and identifies geographical clusters of Foursquare venues based on the check-in patterns. The validation explores the relation between the identified clusters and various factors that shape the dynamics of the city, such as the economic background of areas, demographics, administrative boundaries, and geography. Our work shows that local insights can also be obtained through a very different analysis. In addition, the task and the data set we develop enable a quantitative assessment for certain aspects of the framework, whereas the cluster analysis of (Cranshaw et al., 2012) had to be fully qualitative.

2.3. Data-driven Analysis of Urban Spaces

In a larger context, our work is related to the emerging area of urban informatics. New types of digital data on urban spaces have encouraged many works to study various aspects of an urban area. The aspects explored include the ones typically studied in offline surveys and also those that became newly measurable through the new data (e.g., walkability (Quercia et al., 2015)). For example, Smith et al. (Smith et al., 2013) studied the deprivation status of the areas in London, and analyzed if associations can be found with the usage of public transportation. Call logs data was also used to estimate the liveliness of areas in De Nadai et al.’s work (De Nadai et al., 2016). They explored associations between the liveliness of neighborhoods and a number of properties related to diversity (types of buildings, streets, density, etc.) to quantitatively evaluate the associations suggested in an earlier study of Jane Jacobs (Jacobs, 1961). In addition, the photos shared in Flickr were used together with the Foursquare venues to understand various aspects of walkability of the streets in London (Quercia et al., 2015).

While geo-tagged social media is frequently used in many works, a few recent works take a step back and evaluate the validity of the data sources. Johnson et al. (Johnson et al., 2016b) assess the assumption that geo-tagged social media data are made by local people. They observe that such an assumption does not hold for a significant amount of the data and further find socio-demographic biases. Another work of Johnson et al. (Johnson et al., 2016a) looks into the urban/rural divide in OpenStreetMap and Wikipedia places, and observes similar socio-demographic biases and quality differences. These findings resonate with the concerns expressed in an earlier work (Dourish et al., 2007), which pointed out the limited view of expected users found in much mobile computing research (often young, affluent and familiar with technology). Though it is difficult to fully address such concerns when using newly emerging social platforms, we acknowledge the possible limitations in our study and try to mitigate them by expanding the collection of data to a broad range of sources of different characteristics.

3. MobInsight Framework

Figure 1 shows the architecture of the MobInsight framework. The framework runs two main data processing flows; first, the one that implements our holistic semantic aggregation approach, collecting places of all neighborhoods and computing their profiles through semantic analysis; second, the flow which uses the neighborhood profiles for mobility modeling and feature analysis. The results of the two data processing flows are merged into a visual interface, which provides an integrated view of the features and mobility, and supports interactive exploration. In this section, we describe the main techniques of the two data processing flows. The visual interface is explained later in the evaluation section together with example mobility interpretations.

Figure 1. MobInsight Architecture.

3.1. Holistic Semantic Aggregation

Although advances in mobile computing and geo-social platforms have produced useful tools for understanding urban spaces, many challenges and issues arise if an analysis has to comprehensively look at the details. While diverse potential data sources are available, they are extremely heterogeneous. They focus on different aspects of a city as they have different goals; for example, Wikipedia and traveling guides cover different places and offer different types of information. In addition, the structure and the language that describe the places vary among the sources.

Our basic intuition is that the different character of neighborhoods can be captured if the existing places and their function can be comprehensively identified and aggregated. For example, a neighborhood with many handicraft shops and art galleries is likely to have a different character than a neighborhood with department stores. The intuition is connected to a number of urbanism theories that also use the places and their meanings as essential elements for characterizing an area. Place-identity and placemaking studies (Proshansky et al., 1983) put the perception and experience of people in the places at the center of understanding an area. These theories not only highlight the importance of the places but also imply the potential value of crowd-sourced descriptions. The places are also basic elements in urban morphology (Moudon, 1997) and space syntax (Hillier, 2007) theories, though they put more emphasis on the layout and connections between them as they focus on the urban structure and its development. Empirical qualitative studies (Næss and Jensen, 2002) that observed the influence of urban facilities on mobility more directly also motivate a large-scale data-driven analysis.

To achieve the goal, holistic semantic aggregation covers diverse sources for place identification, fuses the heterogeneous descriptions, and creates a unified aggregation scheme for neighborhood profiling. Due to the extensive coverage of existing places, the method produces a neighborhood profiling scheme that is much more detailed than conventional schemes used in urban planning (e.g., residential, business, commercial, entertainment, etc.). As the profiling scheme is built by taking rich semantic metadata of the places and applying linguistic techniques to it, the scheme also allows intuitive interpretations.

We now describe the two steps of the method in detail: place identification, and semantic aggregation.

Place Identification

We explore various available online resources for data collection and choose 15 different sources, including geo-social services, traveling/local guides, mapping services, and open directory data of the local government. Table 1 lists all the sources used for the collection and the number of places collected from each source. If available, the data collection was conducted through the REST API offered by the source. As for the others, we built a scraper customized for each source. The final collection has more than 128,000 places. The collected information includes the name and the location (address and coordinates), source specific meta-data (e.g., number of check-ins, stars), and reviews left on those places.

Since many sources have tourists as their target audience, a possible bias that we considered in the collection is the tourist bias, especially given the importance and the size of the tourism sector in Barcelona. Another possible limitation is the bias to commercial places, since the sources are mostly connected directly or indirectly to businesses. Regarding the geo-social services, it is possible that they might not include enough routine places (e.g., groceries, work, school) as the users have less motivation to share the locations in such places (Lindqvist et al., 2011). These biases can result in the exclusion of important places that actually have a strong relation to mobility.

The open directory data of the local government (Ope, 2017) was added especially to mitigate the aforementioned biases. The data is produced through a manual survey of all the addresses registered to the city except private homes. As shown in Table 1, a relatively small portion of places of this source overlaps with those of other sources, which suggests that the source is covering different aspects of the city. We also conducted a quick analysis of the category tags and observed that public sector venues and many small businesses (e.g., retail, repair shops, pharmacies) are listed only in the government data.

Table 1. Data sources and the number of collected places.

A challenge that arises from using multiple sources is the existence of duplicates across the sources, especially for famous places. Resolving duplicates is important as they distort the neighborhood profiles. However, identifying duplicates is not a trivial task, since the properties of places that hint at the resolution (e.g., the name and coordinates) vary slightly across the sources for the same places. For example, some sources list the same place ‘Mobile World Center’ with a slight variation, e.g. ‘Mobile World Center, Barcelona.’ In addition, popular names of an area, such as the metro station name, appear in many different places in the area, which confuses the identification of the name variations.

In principle, we resolve duplicates by merging the places that have overlapping tokens in their names and are located in close proximity. However, in order to avoid merging different places that simply share a common token in their names, we set a threshold for the candidate places that share the token and do not perform the merging if the number is greater than the threshold. Since an identical place cannot appear more times than the number of data sources, we set the threshold to the number of sources used. The places whose coordinates are located within 50 meters are finally merged.
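The duplicate-resolution rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the record layout (`name`, `coord`), the haversine distance, and the use of union-find to group matches are all hypothetical choices; only the token-overlap test, the threshold equal to the number of sources (15), and the 50-meter radius come from the text.

```python
import math
from collections import defaultdict

NUM_SOURCES = 15        # threshold from the text: the number of data sources
MAX_DISTANCE_M = 50.0   # merge radius from the text


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def tokens(name):
    """Hypothetical tokenizer: lowercase and split on non-alphanumerics."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return set(cleaned.split())


def resolve_duplicates(places):
    """places: list of dicts with 'name' and 'coord' (lat, lon).
    Returns groups of indices that are treated as one place."""
    by_token = defaultdict(list)
    for idx, place in enumerate(places):
        for tok in tokens(place["name"]):
            by_token[tok].append(idx)

    parent = list(range(len(places)))  # union-find forest (assumed grouping)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for candidates in by_token.values():
        if len(candidates) > NUM_SOURCES:
            continue  # token is too common, e.g. a metro station name
        for pos, i in enumerate(candidates):
            for j in candidates[pos + 1:]:
                dist = haversine_m(places[i]["coord"], places[j]["coord"])
                if dist <= MAX_DISTANCE_M:
                    parent[find(i)] = find(j)

    groups = defaultdict(list)
    for idx in range(len(places)):
        groups[find(idx)].append(idx)
    return list(groups.values())
```

Note that grouping with union-find makes merging transitive, so a chain of nearby name variants ends up in one group; whether the original method behaves this way is not stated in the text.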

Semantic aggregation

This step creates a categorization scheme of places specific to the city of interest, and completes the profiling by aggregating the places of each category. The ultimate profile allows computing the difference between neighborhoods by comparing the number of places under each category. The core of this step is creating a categorization scheme that reflects the diversity of the collected places, while keeping the number of categories to a reasonable amount for intuitive interpretation. In order to extract a diverse and inclusive set of categories, we take all the places in the entire city into account, not just those of a specific neighborhood.

The categories are extracted by applying semantic analysis techniques to the meta-data of the places including the tags (crowd-sourced keywords) and the classification taxonomy of the sources. This creates the challenge of dealing with the wide variety of vocabularies used in the meta-data. The crowd-sourced tags are less structured and the taxonomies are different across the sources.

We apply a combination of dimensionality reduction and clustering to obtain a set of categories that preserve the diversity and semantic relatedness of the meta-data. First, the meta-data of each place is mapped to a binary word vector by taking all possible n-grams up to the number of tokens, and applying lemmatization to them. Considering the type of data (i.e., set of keywords) we chose latent semantic analysis (Deerwester et al., 1990) for dimensionality reduction. The number of dimensions is reduced to 100, which explains 77% of the variance.
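As an illustration of the first step, the sketch below maps place meta-data to binary n-gram vectors. It is a simplified example under assumptions: tokens are split on whitespace, lemmatization is omitted (a real pipeline would apply a lemmatizer before building the vocabulary), and the function names are hypothetical. The subsequent latent semantic analysis, typically a truncated singular value decomposition of this matrix, is not shown.

```python
def ngrams(phrase):
    """All contiguous n-grams of a phrase, for n = 1 .. number of tokens."""
    toks = phrase.lower().split()
    return {" ".join(toks[i:i + n])
            for n in range(1, len(toks) + 1)
            for i in range(len(toks) - n + 1)}


def binary_vectors(place_metadata):
    """place_metadata: one list of tag/taxonomy strings per place.
    Returns (vocabulary, matrix); matrix[p][v] is 1 if term v occurs for place p."""
    term_sets = [set().union(*(ngrams(s) for s in strings))
                 for strings in place_metadata]
    vocabulary = sorted(set().union(*term_sets))
    index = {term: k for k, term in enumerate(vocabulary)}
    matrix = []
    for terms in term_sets:
        row = [0] * len(vocabulary)
        for term in terms:
            row[index[term]] = 1
        matrix.append(row)
    return vocabulary, matrix
```

For example, a place tagged "art gallery" yields the terms "art", "gallery", and "art gallery", so it shares the "gallery" dimension with a place tagged only "gallery".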

K-means clustering (Xu and Wunsch, 2008) is applied over the reduced dimensions to categorize the places. The number K is chosen empirically based on the silhouette score, which measures the consistency of clusters by computing the distance of elements to their own cluster compared to the other clusters (Rousseeuw, 1987). The score increased with K and saturated around 0.7 for K larger than 85, thus we take K=85.
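The silhouette score used for choosing K can be sketched in plain Python as below. This is a minimal reference implementation of the standard definition (Rousseeuw, 1987), assuming Euclidean distance and giving singleton clusters a score of 0; the clustering itself is not shown, and the function name is hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; values close to 1 mean compact,
    well-separated clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes 0
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

In a selection loop, one would run the clustering for a range of K values, compute this score for each labeling, and keep the K at which the score stops improving.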

Instead of taking the 85 clusters as the final categories of places, we tried to further reduce the categories to a reasonable number which allows intuitive interpretation of the neighborhood profiles. We went through the most frequent terms of the clusters manually, and merged the clusters that seem redundant or those that can be combined under a higher level of abstraction (e.g., merging the clusters ‘financial services’ and ‘advertisement agency’ under the abstraction ‘professional service’).

Table 2. Categories from the clustering result.

We believe there is a possible trade-off between having many fine-grained categories and interpretability, and identifying the optimal granularity is an unclear problem which may depend on many factors such as the goal of an application, local context of the city, etc. As the focus of our current work is on the development of the overall framework and its evaluation, we first use the 17 categories shown in Table 2, which were produced through the above process. We add the total place count as an additional category, thus, 18 features are ultimately used to profile the neighborhoods.
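A neighborhood profile of this form can be sketched as a vector of per-category place counts with the total count appended. The input layout and the function name are assumptions made for illustration; with the 17 categories of Table 2 the resulting vector has 18 entries.

```python
from collections import Counter


def neighborhood_profiles(places, categories):
    """places: iterable of (neighborhood, category) pairs.
    Returns neighborhood -> [count per category ..., total place count]."""
    counts = {}
    for hood, category in places:
        counts.setdefault(hood, Counter())[category] += 1
    profiles = {}
    for hood, counter in counts.items():
        row = [counter[c] for c in categories]
        profiles[hood] = row + [sum(row)]  # total count as the extra feature
    return profiles
```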

Figure 2 depicts a visual example contrasting the places of two different categories, Daily Purchases (shown in red) and Attractions (shown in blue). It shows that the Daily Purchases places are more numerous and more dispersed, whereas the Attractions are more concentrated in the downtown area.

Figure 2. ‘Daily Purchases’ (red) vs. ‘Attraction’ (blue).

3.2. All-Pair Mobility Modeling & Model Auditing

Associating neighborhood features with mobility is the key function of the framework that enables interpretations. However, understanding the complicated relations between the features and the mobility is a non-trivial task. Prior works on mobility models (Erlander and Stewart, 1990; Simini et al., 2012) suggest that generalized rules that assume a certain relation (e.g., effect of distance on mobility) are limited in terms of explaining the complexity of intra-urban mobility. Even if diverse features are available, the complexity of mobility is less likely to be explained in simple terms, for example, assuming higher mobility from an area with few schools to an area that has many schools. The features could interact in unexpected ways. In addition, mobility between a pair of areas could be influenced not only by the features of their own but also by the features of other areas surrounding the two.

We model the relations between the features and mobility using a multi-layer neural network. Instead of relying on pre-defined assumptions about the effect of the features, the method can perform tailored estimation of the features’ importance as it learns from the actual mobility data of the target city. Furthermore, the approach can learn possible nonlinear interactions between the features by having multiple layers.

Another strength of our approach is that it is designed to consider the effect of all other neighborhoods together when estimating the mobility flow between a pair of neighborhoods. Given a neighborhood, the model learns the mobility flow from the neighborhood to all others as a whole, instead of learning the flow to individual destinations independently. The prediction of the model for a neighborhood is a probability distribution, which represents the mobility flow between the neighborhood and all others.

We now describe the two steps of the approach, mobility modeling and model auditing, in detail.

Mobility Modeling

Given a set of features $x_i$ for a neighborhood $i$, we want a model to predict the flow probability distribution $y_i$, i.e., $y_i = \mathrm{Model}(x_i)$, where the $j$-th component of $y_i$ represents the probability that a citizen from neighborhood $i$ moves to/from neighborhood $j$ in a given time frame. The direction of movement and the time frame are determined by the data set we develop (refer to the section Evaluation Design).

Since $y_i$ must contain real values, we could think of performing a regression for each of them. However, as $y_i$ represents a probability distribution, we want (i) to normalize the output to sum to 1 and (ii) to train the model considering the whole distribution and all the interactions between its components (as opposed to separately training one model for each component). The simplest model that fulfills these characteristics is the multivariate linear model with softmax output (Hastie et al.). Dropping the neighborhood index $i$ for brevity, this corresponds to the combination of a multivariate linear regression,

$$z = W x + b, \tag{1}$$

with a softmax function,

$$y_j = \frac{e^{z_j}}{\sum_{k} e^{z_k}}, \tag{2}$$

where $W$ is a weight matrix, $b$ corresponds to the bias vector, and the sum in the denominator of Eq. (2) is taken over all the components of the vector $z$.
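The linear model with softmax output in Eqs. (1) and (2) can be written out as a short sketch. The function names are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result; it is not mentioned in the text.

```python
import math


def linear(W, x, b):
    """Eq. (1): one linear output per destination neighborhood."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj
            for row, bj in zip(W, b)]


def softmax(z):
    """Eq. (2): turn the linear outputs into a probability distribution."""
    m = max(z)  # shift for numerical stability; the ratios are unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_flow(W, x, b):
    """Predicted flow distribution of one neighborhood over all destinations."""
    return softmax(linear(W, x, b))
```

With all weights and biases at zero, the prediction is the uniform distribution over destinations, which is a convenient sanity check before training.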

More advanced models can be built upon this one, since the simple model corresponds to a one-layer feed-forward neural network with a softmax activation (Goodfellow et al., 2016). Thus, we can stack up several layers to obtain a (potentially more accurate) nonlinear model. We also explore this possibility by considering up to 4 layers with 100 rectified linear units (Glorot and Bengio, 2010) each.

To train the models we use gradient descent and adapt the learning rate per dimension using ADADELTA (Zeiler, 2012). We train for 3000 epochs using batches of 10 instances and shuffling. In order to avoid overfitting, we employ dropout (Srivastava et al., 2014) with a probability of 0.5. In addition, we perform data augmentation (Goodfellow et al., 2016) by adding 5% Gaussian noise to the input x, which we previously normalize to have zero mean and unit standard deviation. The models’ weights are initialized with the so-called Glorot initialization (Glorot and Bengio, 2010).
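The input preparation described above (standardization followed by noise augmentation) might look like the following sketch. It assumes that "5% Gaussian noise" means zero-mean noise with a standard deviation of 0.05 on the standardized inputs, which is one plausible reading and not confirmed by the text; the function names are hypothetical, and the network training itself is not shown.

```python
import random
import statistics


def standardize(rows):
    """Scale every feature column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def augment(rows, noise_std=0.05, rng=None):
    """Return a noisy copy of the inputs (assumed: additive N(0, noise_std^2))."""
    rng = rng or random.Random()
    return [[v + rng.gauss(0.0, noise_std) for v in row] for row in rows]
```

Drawing a fresh noisy copy at every epoch, rather than once before training, is a common way to apply this kind of augmentation.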

Model Auditing

For the estimation result of each neighborhood, the framework measures the importance of individual features through mean decrease accuracy (also known as permutation importance or direct influence) (Breiman, 2001). The general idea is to permute the values of each feature randomly, one at a time, and measure how much the permutation increases the error of the pre-trained model. Intuitively, the permutation of important variables should have a strong effect on the model’s accuracy, while permuting non-important variables should have little or no effect. For each feature, we measure the relative improvement (%) of the estimation performance compared to when it is randomized.
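A minimal sketch of permutation importance is given below. It assumes that the "relative improvement (%)" is computed as the error reduction of the intact model relative to the error with the feature permuted; that formula, the averaging over repeats, and the function signature are illustrative assumptions. `predict` stands for any pre-trained model and `error` for any per-instance error measure.

```python
import random


def permutation_importance(predict, error, X, Y, n_repeats=10, seed=0):
    """Relative improvement (%) of the intact model over the model evaluated
    with one feature column shuffled, for each feature in turn.
    X is a list of feature lists, Y the matching list of targets."""
    rng = random.Random(seed)

    def mean_error(rows):
        return sum(error(predict(x), y) for x, y in zip(rows, Y)) / len(rows)

    base = mean_error(X)
    importances = []
    for f in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and target
            permuted = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            total += mean_error(permuted)
        perm = total / n_repeats
        importances.append(100.0 * (perm - base) / perm if perm > 0 else 0.0)
    return importances
```

The model is never retrained here: only its inputs are perturbed, which is what makes the procedure an audit of an already trained model.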

4. Evaluation Design

We focus our evaluation on the two primary functions of MobInsight: first, if the neighborhood features contribute to mobility estimation; second, if the feature analysis leads to sound interpretations specific to the target city.

4.1. Mobility Data of Barcelona

We sample the mobility of Barcelona through a cell-phone network infrastructure. Cell phone networks are built using a set of base transceiver stations (BTS) that connect cell phones to the network. Each BTS has a latitude and a longitude, and gives coverage to an area called a cell. We follow the common practice (Frias-Martinez et al., 2011; Lenormand et al., 2015) that assumes the cell of each BTS can be approximated by a two-dimensional non-overlapping polygon, and we use a Voronoi tessellation for the approximation. The location of the cell-phone user is assumed to be somewhere inside the cell. Note that no information about the exact position of users is known.

The call data records (CDR) dataset used in this study contains all the phone calls, SMS, and MMS recorded by a major operator. The main fields of each CDR entry are: (1) a hashed ID of the originating cellphone number, (2) that of the receiver, (3) a time-stamp (when a call starts), (4) the duration of the call, and (5) the BTS tower used.

The data was collected from the BTS towers located in Barcelona. The period of data collection was from Feb. 1 to Feb. 28, 2014. There were more than 700 active BTSs during the whole period, offering a sufficient level of segmentation of the city, much finer grained than the neighborhood-based division. In order to focus on real residents of the city and avoid tourist effects, we disregarded the records of roaming phones and those of pre-paid SIM cards. The data set had CDRs collected during one month, which roughly account for 2.5M unique phones and around 35M interactions. To preserve privacy, all the information is aggregated and encrypted. No contract or demographic data was considered, requested, or available for this study. Data collection and anonymization were done by a third party that was not involved in the analysis.

4.2. Mobility Matrix of All-Pairs of Neighborhoods

A mobility matrix, commonly called the O-D matrix (Frias-Martinez et al., 2011), characterizes the transitions of a population between different geographical regions representing the origin (O) and destination (D) of a route. Typically, O and D are the same set and represent the towns or neighborhoods of the geographical area under study. Each element $(i, j)$ of the matrix defines the percentage or the total number of travels made to $D_j$ by individuals who live in $O_i$.

We construct the matrix at the neighborhood level from the CDR data. For this, we first apply a home detection algorithm that infers the users’ home at a BTS level, and then group all the individuals whose home neighborhood is the same (group of BTSs within the region that defines the neighborhood). After that, the aggregated mobility of all the individuals of each neighborhood to other neighborhoods is estimated. The neighborhoods are defined by following the definition of the city municipality. To prevent noise, we only considered the users with at least 10 records in the data set.

Home Detection

We used a simplified version of the algorithm presented in (Kung et al., 2014). For each user, we take the home to be the neighborhood in which the most frequently used BTS cell is located, considering the records made (1) Monday through Thursday from 20:00 to 08:00 and (2) Saturday and Sunday at any moment during the day. If the second most used BTS has less than 80% of the usage of the first one, we assume the first BTS to be the home location. Otherwise, we check the physical distance between the first and the second most used BTSs. If they are within 100 meters, we consider the first BTS as the home location. If the distance is larger, we assume that we cannot reliably identify either of them as the home and discard the user from the data. As a quick validation, we computed the Pearson correlation coefficient between our estimation of neighborhood population and that of the neighborhood census of Barcelona (Ope, 2017), and observed 0.73 as the result.
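The decision rule above can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation: the function names, the record layout, and the planar BTS coordinates are hypothetical, and the night-time window is assumed to be tested against the calendar day of each timestamp.

```python
import math
from collections import Counter
from datetime import datetime

def in_home_window(ts):
    """Mon-Thu between 20:00 and 08:00, or any time on Sat/Sun.

    Assumption: the weekday test uses the timestamp's own calendar day.
    """
    weekday = ts.weekday()  # Monday == 0 ... Sunday == 6
    if weekday >= 5:
        return True
    return weekday <= 3 and (ts.hour >= 20 or ts.hour < 8)

def detect_home_bts(records, bts_xy, ratio=0.8, max_dist_m=100.0):
    """records: iterable of (timestamp, bts_id); bts_xy: bts_id -> (x, y) in metres.

    Returns the home BTS, or None when the user should be discarded.
    """
    counts = Counter(bts for ts, bts in records if in_home_window(ts))
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (first, n_first), (second, n_second) = ranked
    if n_second < ratio * n_first:
        return first  # clear winner
    if math.dist(bts_xy[first], bts_xy[second]) <= max_dist_m:
        return first  # ambiguous, but the two towers are adjacent
    return None       # ambiguous and far apart: discard the user

# Hypothetical example: Feb. 3, 2014 was a Monday, Feb. 8 a Saturday.
bts_xy = {"a": (0.0, 0.0), "b": (500.0, 0.0)}
records = [
    (datetime(2014, 2, 3, 21, 0), "a"),
    (datetime(2014, 2, 4, 7, 0), "a"),
    (datetime(2014, 2, 8, 12, 0), "a"),
    (datetime(2014, 2, 5, 22, 0), "b"),
    (datetime(2014, 2, 5, 12, 0), "b"),  # Wednesday midday: ignored
]
home = detect_home_bts(records, bts_xy)
print(home)
```

The home neighborhood is then the neighborhood that contains the returned BTS.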

Computation of Mobility Matrix

Once the home neighborhood of people is identified, the CDR entries from outside of it are considered as travelling samples. For each neighborhood, we count the residents’ records found in all other neighborhoods. We only count the visits per day, so that if a user visits the same neighborhood several times during a day, it is counted as one visit. The final result is a 70x70 matrix that represents the frequency of visits to other neighborhoods during the considered time period. In the estimation experiment, the matrix is normalized per row to produce a normalized frequency distribution of travels to other neighborhoods.
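A minimal sketch of this counting step, under assumed data structures: the record layout `(user_id, day, bts_id)`, the lookup tables, and the neighborhood labels are hypothetical. Repeated visits to the same neighborhood on the same day collapse to one because the visits are stored in a set.

```python
def mobility_matrix(records, home_of, bts_to_nbhd, neighborhoods):
    """Count distinct (user, day, visited neighborhood) triples outside home.

    records: iterable of (user_id, day, bts_id)
    home_of: user_id -> home neighborhood (users without a home are skipped)
    Returns a dict of dicts: counts[origin][destination].
    """
    visits = set()
    for user, day, bts in records:
        home = home_of.get(user)
        dest = bts_to_nbhd[bts]
        if home is None or dest == home:
            continue
        visits.add((user, day, dest))  # the set collapses same-day repeats

    counts = {o: {d: 0 for d in neighborhoods} for o in neighborhoods}
    for user, _day, dest in visits:
        counts[home_of[user]][dest] += 1
    return counts

def normalize_rows(counts):
    """Turn each row of counts into a relative-frequency distribution."""
    normalized = {}
    for origin, row in counts.items():
        total = sum(row.values())
        normalized[origin] = {d: (v / total if total else 0.0) for d, v in row.items()}
    return normalized

# Hypothetical toy data with three neighborhoods and one user.
neighborhoods = ["N1", "N2", "N3"]
bts_to_nbhd = {"b1": "N1", "b2": "N2", "b3": "N3"}
home_of = {"u1": "N1"}
records = [
    ("u1", "2014-02-03", "b2"),
    ("u1", "2014-02-03", "b2"),  # same day, same neighborhood: one visit
    ("u1", "2014-02-04", "b2"),
    ("u1", "2014-02-03", "b3"),
    ("u1", "2014-02-03", "b1"),  # record at home: not a trip
]
counts = mobility_matrix(records, home_of, bts_to_nbhd, neighborhoods)
freq = normalize_rows(counts)
print(counts["N1"], freq["N1"])
```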

4.3. Mobility Estimation Setup

We derive two estimation tasks from the mobility matrix: estimation of ‘To’ and ‘From’. As the name indicates, the task ‘To’ is the estimation of the relative frequency of travels made to other neighborhoods from the home neighborhood. On the other hand, ‘From’ estimates the relative frequency of visits from all other neighborhoods to the home, which can be obtained by transposing the mobility matrix.
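The relation between the two tasks can be sketched as below. The toy counts are hypothetical, and the sketch assumes that the count matrix is transposed first and then normalized per row, which the text does not state explicitly.

```python
def to_distribution(counts):
    """Row-normalize counts: row i is the 'To' distribution of neighborhood i."""
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([v / total if total else 0.0 for v in row])
    return rows

def from_distribution(counts):
    """Transpose, then row-normalize: row i is the 'From' distribution of i."""
    transposed = [list(col) for col in zip(*counts)]
    return to_distribution(transposed)

# Hypothetical visit counts; counts[i][j] = visits to j by residents of i.
counts = [
    [0, 8, 2],
    [3, 0, 1],
    [1, 4, 0],
]
to_p = to_distribution(counts)
from_p = from_distribution(counts)
print(to_p[0], from_p[0])
```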

4.3.1. Estimation Metric

Due to the limited number of data points (# of neighborhoods), we train and evaluate the model using leave-one-out cross-validation. Each iteration of the validation takes a particular neighborhood, which was left out in the training, and estimates the probability of travelling to all other neighborhoods (or, for the ‘From’ task, the probability of travels made to that neighborhood from all others). The quality of the estimation is measured using the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) between the ground-truth probability q, taken from the mobility matrix, and the predicted probability p.

$$D_{\mathrm{KL}}(q \,\|\, p) = \sum_{j} q_j \log \frac{q_j}{p_j} \qquad (3)$$

The KL divergence measures the entropy increase caused by the estimated distribution relative to the ground-truth distribution. In other words, it is the amount of information lost when p is used to approximate q. For a concise presentation, we report the average KL divergence across all the neighborhoods.
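The metric and the leave-one-out loop can be sketched as follows. The helper names are hypothetical, the logarithm is assumed to be natural, and the small `eps` floor on predicted probabilities is an assumption added to avoid division by zero. The `fit_predict` callback stands in for any model and is not the proposed network.

```python
import math

def kl_divergence(q, p, eps=1e-12):
    """D_KL(q || p) in nats; q is the ground truth, p is the prediction.

    Terms with q_j == 0 contribute nothing; p_j is floored at eps.
    """
    total = 0.0
    for qj, pj in zip(q, p):
        if qj > 0.0:
            total += qj * math.log(qj / max(pj, eps))
    return total

def loo_average_kl(rows, fit_predict):
    """Hold out each row in turn, predict it from the rest, and average the KL."""
    scores = []
    for i, held_out in enumerate(rows):
        training = rows[:i] + rows[i + 1:]
        scores.append(kl_divergence(held_out, fit_predict(training, i)))
    return sum(scores) / len(scores)

def mean_predictor(training, held_out_index):
    """Toy stand-in model: predict the column-wise mean of the training rows."""
    n = len(training)
    return [sum(col) / n for col in zip(*training)]

one_bit = kl_divergence([1.0, 0.0], [0.5, 0.5])
rows = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
avg = loo_average_kl(rows, mean_predictor)
print(one_bit, avg)
```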

4.3.2. Comparison Methods

We briefly describe the comparison methods below.

  • Random Model: This model produces a random probability distribution at all iterations of the cross-validation. It gives us an indication of the chance level in our tasks.

  • Average-based model: At each iteration, this model takes the average mobility of the other neighborhoods. More specifically, to estimate the mobility matrix M, the row of the neighborhood i is estimated at each iteration. Each element of the row is calculated as $\hat{M}_{ij} = \frac{1}{|N|} \sum_{k \in N} M_{kj}$, where N is the set of the neighborhoods in the training set.

  • Gravity model (Erlander and Stewart, 1990): This model uses the population of neighborhoods, H, and the distance information between them, d.

    (4)

    We acquired the population data from the open government data of Barcelona (Ope, 2017) and approximated the distance with the transportation time between all pairs of neighborhoods returned from Google Maps. The scaling constant g is optimized to the value which produces the minimum KL-divergence (a grid search over the full data set is performed, hence it is an optimistic performance estimate (Hastie et al.)).

  • Proposed Model (‘NF_Dist’): It uses neighborhood features (NF) and distance information (Dist).

  • Proposed Model without open government data (‘NF_woPub_Dist’): In order to understand the importance of having a less-skewed data set covering the places of the public sector, we also create a comparison method employing the same methodology, except that we exclude the open government data set.

  • Pairwise Model: Through this model, we intend to see the performance when individual mobility flows are learned and predicted independently, not as a whole. While all configurations are kept the same as in the proposed model, a separate model is trained for each element of the probability distribution vector explained in section 3.2.

  • Vector Distance Model (‘VecDist’): This method uses a simple vector-based distance measure for the prediction, which does not learn the effect of features from data. The prediction is based on the neighborhood features (NF) and the distance (d), where cosine similarity is used for measuring the difference of the neighborhood features.

    (5)
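The two simplest baselines in the list above can be sketched as follows. The function names and the toy matrix are hypothetical. How the random distribution is drawn is not specified in the text, so normalizing uniform random weights is an assumption. The sketch also does not treat the diagonal entry specially.

```python
import random

def random_model(n, rng):
    """Chance-level baseline: a random probability vector of length n.

    Assumption: uniform random weights normalized to sum to one.
    """
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def average_model(matrix, i):
    """Estimate row i as the element-wise mean of all other rows.

    Each element is (1 / |N|) * sum over k in N of M[k][j], where N is
    the set of training rows (every row except i).
    """
    training = [row for k, row in enumerate(matrix) if k != i]
    return [sum(col) / len(training) for col in zip(*training)]

# Hypothetical row-normalized mobility matrix for three neighborhoods.
matrix = [
    [0.0, 0.6, 0.4],
    [0.5, 0.0, 0.5],
    [0.3, 0.7, 0.0],
]
estimate = average_model(matrix, 0)
chance = random_model(3, random.Random(0))
print(estimate, chance)
```

Because every training row sums to one, the averaged row also sums to one and can be compared to the held-out row with the KL divergence directly.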

4.4. Approach to Evaluation of Mobility Interpretations

In contrast to the mobility estimation performance, the interpretations are inherently qualitative and subtle. The correctness of an interpretation is not always straightforward and it is hard to clearly define the scope of right answers. In addition, the interpretations of our interest deal with the mobility patterns at a city-scale, making it difficult to conduct surveys or find experts who are familiar with such large-scale patterns.

While admitting the difficulties of conducting a thorough evaluation, we focus on delineating the types of interpretations that are enabled by MobInsight, and discuss why they are difficult to make with conventional approaches. While we use empirical examples to elaborate on each type of interpretation, instead of making arbitrary choices of the examples, the choices are made based on the estimation performance and the clarity of feature analysis. For example, we look into the neighborhoods where the estimation performance of the framework is much higher than the baseline, and those that have distinguished features contributing to the estimation performance.

We also discuss the validity of the individual examples using various resources about the urbanism of Barcelona including the census, socio-economic data, urban development projects, related articles in the encyclopedia of Barcelona, etc.

5. Results and Discussion

5.1. Mobility Estimation Performance

Table 3 provides an overview of the performance of different models for the two tasks, i.e., To/From. The proposed model using only one layer (NF_Dist) achieves around 35% relative improvement with respect to the Average model, and 30% with respect to the Gravity model. To elaborate further on the improvement in an intuitive way, we use an example estimation for a neighborhood that showed a KL-divergence improvement of 0.08 (absolute) over the Gravity model. For this neighborhood, we had 150k travels to other neighborhoods and our model reduced the total estimation error from 66k to 42k travels.

Table 3. Performance comparison.

The improvement is promising, especially considering that we used a lower resolution for the distance information than the one used for the Gravity model. The full distance information between all neighborhood pairs produces too many additional features (70) with respect to the number of neighborhood features (18) and the amount of available training data. Thus, we lowered the number of features by only using the (latitude, longitude) coordinates of the neighborhood center points. The coordinates approximate the distance since the model is able to compute the differences in longitude and latitude and combine them.

Apart from the main implication that the neighborhood features contribute to explaining the mobility, this result offers multiple implications for mobility estimation applications. First, the use of neighborhood features instead of population can greatly reduce the time required for an accurate estimation. While urban spaces evolve over time, the consequent changes of population can happen slowly. On the other hand, the changes in the neighborhoods are likely to be updated to the web much before the population adapts. This opens the opportunity to obtain dynamic estimations in a timely manner.

Second, it enables the prediction of mobility while making hypothetical assumptions. The neighborhood features support hypothesizing changes of places in different areas and obtaining a corresponding estimation of mobility. Developing hypotheses in terms of places would be easier and more realistic than those assuming a change of population.

Third, the comparison between NF_Dist and NF_woPub_Dist supports our intuition that it is important to include the places that are neglected in commercial sources in understanding mobility. The proposed model shows around 15% of improvement over NF_woPub_Dist. As many recent works in urban informatics consider social media as a main data source, our result offers a useful implication to such work about the possible limitation of the source, especially if the sources are used in the context of understanding mobility.

The comparison of the proposed model to VecDist shows the importance of tailoring the model based on the actual data, and the comparison to Pairwise reveals the importance of considering the effect of all neighborhoods together. We further elaborate on these points through example interpretations in the next section.

Table 4. Effect of the number of layers.

In addition to the above comparisons, we observed additional improvement of our model when a few more layers were added to the neural network (Table 4; notice that the 1-layer model corresponds to NF_Dist in Table 3). Performance improved until the third layer was added and started degrading when the fourth layer was added, possibly because of the limited training data.

5.2. Interpretations of Barcelona’s Mobility

5.2.1. Larger Space of Interpretation

An obvious benefit of the framework is the availability of diverse features that offer a plausible interpretation for the cases that simpler models could not explain. We elaborate on this point using a visualization that illustrates an example.

Figure 3 depicts the mobility towards a selected neighborhood (Raval, highlighted in white) over a map, and a bubble chart that shows the neighborhood features of Raval and Barri Gótic, which is selected for feature comparison. The color-coding of the map is based on the frequency of mobility towards Raval, where darker colors represent more frequent visits.

Figure 3. Example interpretation through visual exploration of the mobility and neighborhood features.

An interesting point is the significant difference between the number of visits made from Sant Antoni and from Barri Gótic. A simple mobility model that does not consider local features would not explain this drastic difference, given that both neighborhoods are close to Raval.

The diverse set of features helps make sense of the different relationships between the neighborhoods and obtain a plausible explanation. In accordance with the fact that Raval is one of the city center areas that people often visit for nightlife and shopping, the white bubbles that represent the features of Raval have a greater size for the features Bar, Special purchase (shopping), and Eating.

On the other hand, the blue bubbles that represent the features of Gótic reveal that Gótic is another neighborhood of the city center with many places for nightlife and attractions. For the three main features of Raval mentioned above, the corresponding bubbles of the two neighborhoods are of comparable size. This supports the interpretation that the people of Gótic might find Raval less attractive despite its proximity, and that factors other than nightlife or attractions would be important for them when visiting other neighborhoods.

5.2.2. Interpretations Tailored to Target Areas

As mentioned, the framework identifies the important features for explaining the mobility of each neighborhood. We frequently observed that the identified features are different from those that would have been identified by other simple approaches, such as generalized models emphasizing the role of work and home places, or vector similarity measures (e.g., cosine similarity). In order to save space, we use a number of examples instead of reporting the feature analysis of all the neighborhoods.

The result for the neighborhood Pedralbes provides a typical example. Pedralbes is a well-known wealthy uptown residential area, described as a neighborhood that ‘stands out from others in terms of socio-economic class’ (Wikipedia, 2017c). It is also described as having ‘the service sector as its main economic activity and hosts financial institutions and office centers’. Indeed, the feature analysis showed high importance for ‘Professional services’ and ‘Offices’, implying that they are critical for estimating the mobility of Pedralbes. In addition, the feature ‘Education’ also showed high importance, as Pedralbes hosts many private and international schools and university campuses.

While the identified features match the urbanism of Pedralbes, another important aspect of the result is that the framework did not assign high importance to some features that would have been identified as important by vector similarity measures. Although the features ‘administrative offices’, ‘special purchases’, and ‘leisure’ were distant from the average of other neighborhoods, MobInsight assessed the effect of these features to be negligible.

As another example, in Barceloneta, the importance values were exceptionally skewed to ‘Club’ and ‘Eating’. This seemed trivial as the area is described to be “famous for its beach, restaurants, and nightclubs along the boardwalk” (Wikipedia, 2017b). However, we found this example interesting since there are many areas with diverse nightlife and restaurant options (e.g., other neighborhoods in the city center), and the exceptional skew of Barceloneta indicates that, in contrast, its other features do not contribute much. A plausible explanation for the skew could be that the area went through a profound transformation during the urban project around the 1992 Olympics, which aimed to strengthen its recreational function. The area is known to be struggling with gentrification after the project (Wikipedia, 2017a). Although there were a number of features whose values were significantly distant from the average of other neighborhoods, the framework assigned much less importance to them than to the two key features.

Building upon this point, we examine whether the skew of feature importance is an indicator of a certain residential quality. Jane Jacobs, in her famous book “The Death and Life of Great American Cities” (Jacobs, 1961), argues about essential conditions for a lively neighborhood and emphasizes the importance of having a mix of diverse functions in a neighborhood. Inspired by her argument and its recent evaluation by De Nadai et al. on a few Italian cities (De Nadai et al., 2016), we conducted a correlation analysis between the degree of skew and population. We first measured the variance of feature importance for each neighborhood, assuming that neighborhoods with many highly weighted features will show low variance. Then, we analyzed the correlation between the computed variance and the neighborhood’s population. Interestingly, we observed a significant inverse correlation between the variables (Pearson coefficient = -0.3). Such a result could be capturing a preference for neighborhoods with diverse functions, which supports Jane Jacobs’ original argument.
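The correlation analysis can be sketched as below. The importance vectors and populations are invented toy values, not the paper's data, and the use of the population variance (rather than the sample variance) is an assumption.

```python
import math
from statistics import pvariance

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Hypothetical feature-importance vectors (one per neighborhood) and populations.
importance = {
    "A": [0.90, 0.05, 0.05],  # importance concentrated on one feature
    "B": [0.40, 0.30, 0.30],
    "C": [0.34, 0.33, 0.33],  # importance spread across features
}
population = {"A": 10000, "B": 30000, "C": 45000}

names = sorted(importance)
skew = [pvariance(importance[name]) for name in names]
r = pearson(skew, [population[name] for name in names])
print(r)
```

In this toy example the most skewed neighborhood is also the least populated, so the coefficient is negative, which is the direction of the relationship reported above.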

5.2.3. Insights beyond Individual Neighborhood Pairs

As mentioned, the framework views the mobility flow from one neighborhood to all others as a whole rather than looking at individual flows separately. While this leads to better overall performance than the baselines that model the mobility flows independently (e.g., the gravity model), we elaborate on two simple examples that demonstrate the importance of this point.

Figure 4. Mobility from Sarriá to other Neighborhoods.

Figure 4 shows the ground truth mobility flow from one neighborhood, Sarriá, to all other neighborhoods. It shows that the mobility is strongly clustered to a number of neighborhoods within the marked area. Each of the 10 neighborhoods in the marked area absorbed 6% of the mobility from Sarriá on average, whereas all the others absorbed less than 2%. This contrast implies that, for example, the infrequent mobility from Sarriá to a neighborhood outside the marked area cannot be understood by only looking into the relationship between the two. Rather, it is important to take into account the strong ties that Sarriá has with the neighborhoods in the marked area.

In addition to achieving better estimation of such contrasting mobility flows, MobInsight’s feature analysis suggests a possible interpretation. For Sarriá and the surrounding neighborhoods, MobInsight frequently assigned high importance to the features ‘Education’, ‘Health’, ‘Offices’, and ‘Leisure’, whereas those features had much less importance for the neighborhoods outside the marked border. The contrast invites speculation about the possible effect on mobility of socio-economic differences between the two parts of the city. According to Barcelona’s official statistics (Department of Statistics of Barcelona City Council, 2015), the neighborhood cluster around Sarriá is the richest in the city, with family incomes between 1.8 and 2.5 times higher than those of the neighborhoods outside. The clustered mobility could imply socio-economic homophily between the affluent neighborhoods, reinforced by the fact that many of the education and health facilities in those neighborhoods are private. It also extends the findings of prior work on the relation between mobility and social deprivation (Lathia et al., 2012; Smith et al., 2013).

Figure 5. Mobility from Poblenou to other Neighborhoods.

A similar contrast is found between the neighborhoods around Poblenou and the rest, especially the ones to the north (Figure 5). The feature analysis of the neighborhoods around Poblenou commonly showed an exceptionally high importance for ‘Office’. We believe the result is capturing the specialized characteristic of the area resulting from an urban project conducted in the early 2000s. The area was suffering from deindustrialization, and the project transformed it from an industrial zone of factories to a highly specialized zone for emerging industries, which now hosts numerous technology companies (over 7000 businesses) and foreign employees (Fugueras, 2005).

6. Conclusions and Future Work

In this paper, we have presented MobInsight, a framework that supports deeper interpretations of urban mobility specific to the target city. It takes advantage of the interpretable features produced by holistic semantic aggregation, the method we created for neighborhood profiling. The method thoroughly identifies existing places in the neighborhoods and extracts the semantic meanings from the annotations left on them. The framework comprehensively analyzes how the features affect the mobility by building mobility models and performing model auditing. We evaluate the framework with the mobility data of Barcelona and elaborate on three types of interpretations that touch the urbanism of Barcelona.

Our ongoing work includes creating and testing new semantic features for neighborhood profiling. There are additional data that are already collected but not used currently, e.g., scores given to places, full-text reviews, etc. We believe further improvement can be made by developing new features from them, such as the popularity/quality of places, different preferences between age groups or cultural backgrounds, etc. Another future direction is to expand the analysis to other cities and make more site-specific interpretations through comparison.

As mentioned, developing a methodology for a thorough evaluation of the interpretations is challenging future work. We believe it is a large topic which should be covered in a separate study, as it requires thoughtful design in terms of developing ground truth and evaluation tasks, choosing comparison methods, and finding qualified experts as evaluators. We will explore various ideas in qualitative methodologies and also the recent advances in crowd-sourced evaluation methods.

References

  • Ope (2017) 2017. Open Data BCN. (2017). http://opendata.bcn.cat/opendata/ [Online; accessed 14-May-2017].
  • Bentley et al. (2012) Frank Bentley, Henriette Cramer, William Hamilton, and Santosh Basapur. 2012. Drawing the city: differing perceptions of the urban environment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1603–1606.
  • Breiman (2001) Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32.
  • Cranshaw et al. (2012) Justin Cranshaw, Raz Schwartz, Jason Hong, and Norman Sadeh. 2012. The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City. In Sixth International AAAI Conference on Weblogs and Social Media.
  • De Certeau (1984) Michel De Certeau. 1984. The Practice of Everyday Life [1980], translated by Steven Rendell. Berkeley, CA. (1984).
  • De Nadai et al. (2016) Marco De Nadai, Jacopo Staiano, Roberto Larcher, Nicu Sebe, Daniele Quercia, and Bruno Lepri. 2016. The death and life of great Italian cities: a mobile phone data perspective. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 413–423.
  • Deerwester et al. (1990) Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American society for information science 41, 6 (1990), 391.
  • Dourish et al. (2007) Paul Dourish, Ken Anderson, and Dawn Nafus. 2007. Cultural mobilities: diversity and agency in urban computing. Human-Computer Interaction–INTERACT 2007 (2007), 100–113.
  • Erlander and Stewart (1990) Sven Erlander and Neil F Stewart. 1990. The gravity model in transportation analysis: theory and extensions. Vol. 3. Vsp.
  • Frias-Martinez et al. (2011) Enrique Frias-Martinez, Graham Williamson, and Vanessa Frias-Martinez. 2011. An agent-based model of epidemic spread using human mobility and social network information. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom). IEEE, 57–64.
  • Fugueras (2005) Ramon Fugueras. 2005. Enciclopèdia de Barcelona. Enciclopèdia Catalana Ajuntament de Barcelona, Barcelona.
  • Glorot and Bengio (2010) Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks.. In Aistats, Vol. 9. 249–256.
  • Goodfellow et al. (2016) Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. MIT Press.
  • Hastie et al. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning.
  • Hillier (2007) Bill Hillier. 2007. Space is the machine: a configurational theory of architecture. Space Syntax.
  • Jacobs (1961) Jane Jacobs. 1961. The death and life of great American cities.
  • Johnson et al. (2016a) Isaac L Johnson, Yilun Lin, Toby Jia-Jun Li, Andrew Hall, Aaron Halfaker, Johannes Schöning, and Brent Hecht. 2016a. Not at home on the range: Peer production and the urban/rural divide. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 13–25.
  • Johnson et al. (2016b) Isaac L Johnson, Subhasree Sengupta, Johannes Schöning, and Brent Hecht. 2016b. The Geography and Importance of Localness in Geotagged Social Media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 515–526.
  • Kling and Pozdnoukhov (2012) Felix Kling and Alexei Pozdnoukhov. 2012. When a city tells a story: urban topic analysis. In Proceedings of the 20th international conference on advances in geographic information systems. ACM, 482–485.
  • Koo and Karemera (1991) Won W Koo and David Karemera. 1991. Determinants of world wheat trade flows and policy analysis. Canadian Journal of Agricultural Economics/Revue canadienne d’agroeconomie 39, 3 (1991), 439–455.
  • Kullback and Leibler (1951) Solomon Kullback and Richard A Leibler. 1951. On information and sufficiency. The annals of mathematical statistics 22, 1 (1951), 79–86.
  • Kung et al. (2014) Kevin S Kung, Kael Greco, Stanislav Sobolevsky, and Carlo Ratti. 2014. Exploring universal patterns in human home-work commuting from mobile phone data. PloS one 9, 6 (2014), e96180.
  • Lathia et al. (2012) Neal Lathia, Daniele Quercia, and Jon Crowcroft. 2012. The hidden image of the city: sensing community well-being from urban mobility. In International Conference on Pervasive Computing. Springer, 91–98.
  • Lenormand et al. (2015) Maxime Lenormand, Miguel Picornell, Oliva G Cantú-Ros, Thomas Louail, Ricardo Herranz, Marc Barthelemy, Enrique Frías-Martínez, Maxi San Miguel, and José J Ramasco. 2015. Comparing and modelling land use organization in cities. Royal Society open science 2, 12 (2015), 150449.
  • Lindqvist et al. (2011) Janne Lindqvist, Justin Cranshaw, Jason Wiese, Jason Hong, and John Zimmerman. 2011. I’m the mayor of my house: examining why people use foursquare-a social-driven location sharing application. In Proceedings of the SIGCHI conference on human factors in computing systems. ACM, 2409–2418.
  • Long et al. (2012) Xuelian Long, Lei Jin, and James Joshi. 2012. Exploring trajectory-driven local geographic topics in foursquare. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, 927–934.
  • Lynch (1960) Kevin Lynch. 1960. The image of the city. Vol. 11. MIT press.
  • Moudon (1997) Anne Vernez Moudon. 1997. Urban morphology as an emerging interdisciplinary field. Urban morphology 1, 1 (1997), 3–10.
  • Næss and Jensen (2002) Petter Næss and Ole B Jensen. 2002. Urban land use, mobility and theory of science: Exploring the potential for critical realism in empirical research. Journal of environmental policy & planning 4, 4 (2002), 295–311.
  • Noulas et al. (2012) Anastasios Noulas, Salvatore Scellato, Renaud Lambiotte, Massimiliano Pontil, and Cecilia Mascolo. 2012. A tale of many cities: universal patterns in human urban mobility. PloS one 7, 5 (2012), e37027.
  • Department of Statistics of Barcelona City Council (2015) Department of Statistics of Barcelona City Council. 2015. 2015 Income by Neighborhood. (2015). http://www.bcn.cat/estadistica/angles/dades/barris/economia/renda/rdfamiliar/a2015.htm [Online; accessed 14-May-2017].
  • Proshansky et al. (1983) Harold M Proshansky, Abbe K Fabian, and Robert Kaminoff. 1983. Place-identity: Physical world socialization of the self. Journal of environmental psychology 3, 1 (1983), 57–83.
  • Quercia et al. (2015) Daniele Quercia, Luca Maria Aiello, Rossano Schifanella, and Adam Davies. 2015. The digital life of walkable streets. In Proceedings of the 24th International Conference on World Wide Web. ACM, 875–884.
  • Quercia et al. (2014) Daniele Quercia, Rossano Schifanella, and Luca Maria Aiello. 2014. The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. In Proceedings of the 25th ACM conference on Hypertext and social media. ACM, 116–125.
  • Rousseeuw (1987) Peter J Rousseeuw. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics 20 (1987), 53–65.
  • Simini et al. (2012) Filippo Simini, Marta C González, Amos Maritan, and Albert-László Barabási. 2012. A universal model for mobility and migration patterns. Nature 484, 7392 (2012), 96–100.
  • Smith et al. (2013) Chris Smith, Daniele Quercia, and Licia Capra. 2013. Finger on the pulse: identifying deprivation using transit flow analysis. In Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 683–692.
  • Srivastava et al. (2014) Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
  • Stouffer (1940) Samuel A Stouffer. 1940. Intervening opportunities: a theory relating mobility and distance. American sociological review 5, 6 (1940), 845–867.
  • Wikipedia (2017a) Wikipedia. 2017a. La Barceloneta — Wikipedia, The Free Encyclopedia. (2017). https://es.wikipedia.org/wiki/La_Barceloneta [Online; accessed 14-May-2017].
  • Wikipedia (2017b) Wikipedia. 2017b. La Barceloneta, Barcelona — Wikipedia, The Free Encyclopedia. (2017). https://en.wikipedia.org/wiki/La_Barceloneta,_Barcelona [Online; accessed 14-May-2017].
  • Wikipedia (2017c) Wikipedia. 2017c. Urbanismo de Barcelona — Wikipedia, The Free Encyclopedia. (2017). https://es.wikipedia.org/wiki/Urbanismo_de_Barcelona [Online; accessed 14-May-2017].
  • Xu and Wunsch (2008) Rui Xu and Don Wunsch. 2008. Clustering. Vol. 10. John Wiley & Sons.
  • Yuan et al. (2012) Jing Yuan, Yu Zheng, and Xing Xie. 2012. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 186–194.
  • Zeiler (2012) Matthew D Zeiler. 2012. ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012).