1 Empirical Model and Estimation
We model the consumer’s choice of restaurant conditional on deciding to go out to lunch. We assume that the consumer selects the restaurant that maximizes utility, where the utility of user $u$ for restaurant $i$ on her $t$-th visit is

$$U_{uit} = \lambda_i + \theta_u^\top \alpha_i + \mu_i^\top \delta_{w_{ut}} - \gamma_u^\top \beta_i \cdot \log(d_{ui}) + \epsilon_{uit},$$

where $w_{ut}$ denotes the week in which trip $t$ happens, and $d_{ui}$ is the distance from $u$ to $i$. This gives a parameterized expression for the utility: $\lambda_i$ is an intercept term that captures a restaurant’s popularity; $\theta_u$ and $\alpha_i$ are latent vectors that model a user’s latent preferences and a restaurant’s latent attributes; $\beta_i$ is a vector that captures a restaurant’s latent factors for travel distance and $\gamma_u$ is a user’s latent preferences of willingness to travel to restaurants with those factors; $\mu_i$ and $\delta_w$ are latent vectors of week/restaurant time effects (this allows us to capture varying effects for different parts of the year); and $\epsilon_{uit}$ are error terms, which we assume to be independent and identically Gumbel distributed.

We specify a hierarchical model where observable characteristics of restaurants, denoted by $x_i$, affect the mean of the distribution of latent restaurant characteristics $\alpha_i$ and $\beta_i$. This hierarchy allows restaurants to share statistical strength, which helps to infer the latent variables of low-frequency restaurants. We estimate the posterior over the latent model parameters using variational inference. Our approach is similar to Ruiz, Athey and Blei (2017), but differs in a few respects. First, we assume that each consumer chooses only one restaurant on a purchase occasion, so interactions among products are not important. Second, TTFM is hierarchical, allowing observed restaurant characteristics to affect the prior distribution of latent variables. (See Appendix A.3 for details.)

For comparison, we also consider a simpler model, a standard multinomial logit model (MNL), which is a restricted version of our proposed model: the intercept term $\lambda_i$ is constant across restaurants, $\alpha_i$ is set to be equal to the observable characteristics of items, $\theta_u$ is constant across users, the time-effect term $\mu_i^\top \delta_{w_{ut}}$ is omitted (including it created problems with convergence of the estimation), and the distance coefficient $\gamma_u^\top \beta_i$ is restricted to be constant across users and restaurants.
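The link between the utility specification and choice probabilities can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes that distance enters utility through its logarithm, and all function and argument names are hypothetical. With independent Gumbel errors, the probability of choosing each restaurant is the softmax of the deterministic utilities.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_utility(popularity, user_pref, rest_attr, user_travel, rest_travel,
                 rest_week, week_effect, distance_miles):
    """Deterministic part of utility (everything except the Gumbel error).

    Assumption: distance enters through its logarithm.
    """
    return (popularity
            + dot(user_pref, rest_attr)          # taste match
            + dot(rest_week, week_effect)        # week/restaurant time effect
            - dot(user_travel, rest_travel) * math.log(distance_miles))

def choice_probabilities(utilities):
    """Logit probabilities implied by i.i.d. Gumbel errors (a softmax)."""
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

In the restricted MNL, the travel coefficient `dot(user_travel, rest_travel)` would be a single number shared by all users and restaurants.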
2 The Data and Summary Statistics
The dataset is from SafeGraph, a company that aggregates anonymous locational information from consumers who have opted into sharing their location through mobile applications. The data consist of “pings” from consumer phones; each observation includes a unique device identifier that we associate with a single anonymous consumer, the time and date of the ping, and the latitude, longitude, and accuracy of the ping. The sample period runs from January through October 2017.
From these data, we construct the key variables for our analysis. First, we construct the approximate “typical” morning location of the consumer, defined as the most common place the consumer is found from 9:00 to 11:15 a.m. on weekdays. We restrict attention to consumers whose morning locations are consistent over the sample period and lie in the Peninsula of the San Francisco Bay Area (roughly, South San Francisco to San José, excluding the mountains and coast). We determine that the consumer visited a restaurant for lunch if we observe at least two pings more than 3 minutes apart during the hours of 11:30 a.m. to 1:30 p.m. in a location that we identify as a restaurant. Restaurants are identified using data from Yelp that include geocoordinates, star ratings, price range, and restaurant categories (e.g., Pizza or Chinese); we also use Yelp to infer approximate dates of restaurant openings and closings. Last, we narrow the dataset to consumer choices over a subset of restaurants that appear sufficiently often in the data, and to consumers who visit a sufficient number of restaurants. This process results in a final dataset of 106,889 lunch visits by 9,188 users to 4,924 locations. Table 1 provides summary statistics on the users and restaurants included in the dataset. (Appendix A.2 gives all details about the dataset processing pipeline.)
Table 1: Summary statistics on users and restaurants

User-Level Statistics

| Variable (Per User) | Mean | 25% | 50% | 75% | % Missing |
|---|---|---|---|---|---|
| Total Visits | 11.63 | 4.00 | 7.00 | 13.00 | – |
| Distinct Visited Rest. | 7.25 | 3.00 | 5.00 | 9.00 | – |
| Distinct Visited Categories | 11.60 | 6.00 | 10.00 | 15.00 | – |
| Median Distance (mi.) | 3.06 | 0.89 | 1.86 | 3.79 | – |
| Weekly Visits | 0.39 | 0.15 | 0.25 | 0.47 | – |
| Weeks Active | 31.14 | 22.00 | 33.00 | 41.00 | – |
| Mean Rating of Visited Rest. | 3.29 | 3.00 | 3.33 | 3.61 | 1 |
| Mean Price Range of Visited Rest. | 1.55 | 1.33 | 1.53 | 1.75 | 0.6 |

Restaurant-Level Statistics

| Variable (Per Restaurant) | Mean | 25% | 50% | 75% | % Missing |
|---|---|---|---|---|---|
| Distinct Visitors | 13.53 | 5.00 | 10.00 | 19.00 | – |
| Median Distance (mi.) | 2.39 | 0.93 | 1.72 | 2.94 | – |
| Weeks Open | 42.17 | 44.00 | 44.00 | 44.00 | – |
| Weekly Visits (Opens) | 0.54 | 0.17 | 0.37 | 0.72 | – |
| Weekly Visits (Always Open) | 0.52 | 0.16 | 0.34 | 0.68 | – |
| Weekly Visits (Closes) | 0.53 | 0.15 | 0.34 | 0.67 | – |
| Price Range | 1.56 | 1.00 | 2.00 | 2.00 | 10.66 |
| Rating | 3.38 | 2.89 | 3.53 | 4.00 | 14.52 |
3 Estimation and Model Fit
We divide the dataset into three parts, 70.6 percent training, 5.0 percent validation, and 24.4 percent testing. We use the validation dataset to select parameters such as the length of the latent vectors and ( and , respectively), while we compare models and evaluate performance in the test dataset. (See Section A.4 for details.) We select and . In the hierarchical prior, the distribution of a restaurant’s components depends on price range, star ratings, and restaurant category.
Across several measures evaluated on the test set, TTFM is a better model than MNL. For example, precision@5 is the percentage of times that a user’s chosen restaurant is in the set of the top five predicted restaurants. It is 35% for TTFM and 11% for MNL. Further, as shown in Figures 10 and 10, TTFM predictions improve significantly for high-frequency users and restaurants, while MNL does not exhibit that improvement. This highlights the benefits of personalization: when given enough data, TTFM learns user-specific preferences.
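Table 2: Goodness of fit of TTFM and MNL

| Model | MSE | Log Likelihood | Precision@1 | Precision@5 | Precision@10 |
|---|---|---|---|---|---|
| Training Sample | | | | | |
| TTFM | 0.00025 | -3.59 | 31.8% | 59.4% | 70.3% |
| MNL | 0.00031 | -6.58 | 2.8% | 10.7% | 16.7% |
| Held-out Test Sample | | | | | |
| TTFM | 0.00028 | -5.19 | 20.5% | 35.5% | 42.2% |
| MNL | 0.00031 | -6.55 | 3.1% | 11.4% | 17.5% |

Note: Precision@k measures the share of visits for which the chosen restaurant is in the set of the top k restaurants predicted by the model, for k in {1, 5, 10}.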
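The precision measure reported above can be computed as in the following sketch. It is an illustration only: the function name and the toy restaurant names are made up, and the rankings would in practice come from sorting each visit's predicted choice probabilities.

```python
def precision_at_k(ranked_predictions, chosen, k):
    """Share of visits whose chosen restaurant is among the top-k predictions.

    ranked_predictions: one list per visit, restaurants ordered from most to
        least likely under the model.
    chosen: the restaurant actually visited on each of those visits.
    """
    if len(ranked_predictions) != len(chosen):
        raise ValueError("need one ranking per observed visit")
    hits = sum(1 for ranking, actual in zip(ranked_predictions, chosen)
               if actual in ranking[:k])
    return hits / len(chosen)

# Toy example with three visits (restaurant names are hypothetical).
rankings = [["taqueria", "deli", "ramen"],
            ["deli", "ramen", "taqueria"],
            ["ramen", "taqueria", "deli"]]
visited = ["taqueria", "taqueria", "deli"]
```

By construction the measure is non-decreasing in k, which matches the pattern across the Precision@1, Precision@5, and Precision@10 columns.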
Figure 2 illustrates that both TTFM and MNL fit well the empirical probability of visiting restaurants at varying distances from the consumer’s morning location. But Figure 2 shows that TTFM outperforms MNL at fitting the actual visit rates of different restaurants; here restaurants are grouped by their visit-frequency deciles. The rich heterogeneity of TTFM allows personalized predictions for restaurants.
4 Parameter Estimates
The distributions of estimated elasticities from TTFM are summarized in Table 8 and Figure 11. Note that the elasticities in the MNL vary only because the baseline visit probabilities vary across consumers and restaurants. TTFM elasticities are more dispersed, reflecting the personalization capabilities of the TTFM model. The average elasticity across consumers and restaurants (weighted by trip frequency) is -1.41. Thus, distance matters substantially for lunch, which is consistent with the fact that roughly 60 percent of visits are within two miles of the consumer’s morning location. Furthermore, there is substantial heterogeneity in that willingness to travel. Across users and restaurants, the standard deviation of elasticities in the TTFM model is 0.68, while the average within-user standard deviation of elasticities is 0.30 and the average within-restaurant standard deviation of elasticities is 0.60. Elasticities are substantially less dispersed in the MNL model.
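To see why MNL elasticities move only with baseline visit probabilities, consider the following sketch. It assumes that distance enters utility in logs; the symbols $P_{ui}$ (choice probability), $v_{ui}$ (all utility terms other than distance), $b_{ui}$ (distance coefficient), and $d_{ui}$ (distance) are notation introduced here for illustration.

```latex
P_{ui} = \frac{\exp\left(v_{ui} - b_{ui}\log d_{ui}\right)}
              {\sum_{j}\exp\left(v_{uj} - b_{uj}\log d_{uj}\right)},
\qquad
\frac{\partial \log P_{ui}}{\partial \log d_{ui}} = -\,b_{ui}\,\left(1 - P_{ui}\right).
```

When the distance coefficient is a single constant, as in the MNL, the elasticity differs across user-restaurant pairs only through $P_{ui}$. When the coefficient itself varies by user and restaurant, as in TTFM, elasticities can be far more dispersed.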
Table 3: Distance elasticities by restaurant characteristics

| Characteristic | Mean | se | 25% | 50% | 75% | N |
|---|---|---|---|---|---|---|
| All restaurants | -1.411 | 0.0001 | -1.585 | -1.408 | -1.203 | 4924 |
| Most popular category: Mexican | -1.499 | 0.0004 | -1.664 | -1.491 | -1.285 | 694 |
| Most popular category: Sandwiches | -1.435 | 0.0006 | -1.602 | -1.441 | -1.235 | 522 |
| Most popular category: Hotdog | -1.403 | 0.0007 | -1.570 | -1.390 | -1.216 | 377 |
| Most popular category: Coffee | -1.390 | 0.0008 | -1.563 | -1.404 | -1.178 | 365 |
| Most popular category: Bars | -1.370 | 0.0009 | -1.546 | -1.362 | -1.161 | 352 |
| Most popular category: Chinese | -1.353 | 0.0009 | -1.517 | -1.378 | -1.176 | 350 |
| Most popular category: Japanese | -1.320 | 0.0011 | -1.472 | -1.336 | -1.140 | 276 |
| Most popular category: Pizza | -1.497 | 0.0010 | -1.649 | -1.481 | -1.307 | 260 |
| Most popular category: Newamerican | -1.323 | 0.0019 | -1.540 | -1.351 | -1.117 | 181 |
| Most popular category: Vietnamese | -1.328 | 0.0020 | -1.541 | -1.327 | -1.155 | 156 |
| Most popular category: Other | -1.411 | 0.0002 | -1.582 | -1.406 | -1.189 | 1391 |
| Price range: 1 | -1.446 | 0.0001 | -1.607 | -1.435 | -1.245 | 2091 |
| Price range: 2 | -1.368 | 0.0001 | -1.542 | -1.371 | -1.162 | 2165 |
| Price range: 3 | -1.320 | 0.0026 | -1.506 | -1.353 | -1.108 | 122 |
| Price range: 4 | -1.449 | 0.0178 | -1.664 | -1.496 | -1.289 | 21 |
| Price range: missing | -1.474 | 0.0006 | -1.648 | -1.455 | -1.225 | 525 |
| Rating, quintile: 1 | -1.427 | 0.0003 | -1.605 | -1.414 | -1.209 | 842 |
| Rating, quintile: 2 | -1.392 | 0.0003 | -1.557 | -1.397 | -1.187 | 842 |
| Rating, quintile: 3 | -1.364 | 0.0003 | -1.532 | -1.366 | -1.169 | 842 |
| Rating, quintile: 4 | -1.385 | 0.0004 | -1.571 | -1.370 | -1.180 | 842 |
| Rating, quintile: 5 | -1.438 | 0.0003 | -1.603 | -1.438 | -1.250 | 841 |
| Rating, quintile: missing | -1.475 | 0.0004 | -1.653 | -1.464 | -1.232 | 715 |
Table 4: Distance elasticities by city

| Characteristic | Mean | se | 25% | 50% | 75% | N |
|---|---|---|---|---|---|---|
| All restaurants | -1.411 | 0.0001 | -1.585 | -1.408 | -1.203 | 4924 |
| City: Daly City | -1.105 | 0.0019 | -1.331 | -1.150 | -0.959 | 165 |
| City: Burlingame | -1.119 | 0.0030 | -1.327 | -1.194 | -1.018 | 110 |
| City: Millbrae | -1.130 | 0.0049 | -1.418 | -1.240 | -0.954 | 80 |
| City: San Bruno | -1.132 | 0.0035 | -1.398 | -1.216 | -0.987 | 101 |
| City: South San Francisco | -1.187 | 0.0021 | -1.413 | -1.232 | -0.999 | 135 |
| City: San Mateo | -1.243 | 0.0012 | -1.454 | -1.284 | -1.101 | 268 |
| City: Foster City | -1.318 | 0.0070 | -1.506 | -1.397 | -1.163 | 44 |
| City: San Carlos | -1.321 | 0.0026 | -1.479 | -1.350 | -1.195 | 95 |
| City: Palo Alto | -1.330 | 0.0013 | -1.519 | -1.342 | -1.171 | 234 |
| City: Brisbane | -1.332 | 0.0139 | -1.455 | -1.344 | -1.181 | 15 |
| City: Belmont | -1.334 | 0.0047 | -1.500 | -1.374 | -1.212 | 58 |
| City: Redwood City | -1.362 | 0.0012 | -1.530 | -1.389 | -1.217 | 214 |
| City: Cupertino | -1.365 | 0.0018 | -1.532 | -1.386 | -1.174 | 169 |
| City: East Palo Alto | -1.374 | 0.0142 | -1.521 | -1.393 | -1.229 | 13 |
| City: Los Gatos | -1.391 | 0.0026 | -1.583 | -1.437 | -1.219 | 106 |
| City: Los Altos | -1.406 | 0.0043 | -1.564 | -1.394 | -1.236 | 60 |
| City: Menlo Park | -1.407 | 0.0031 | -1.570 | -1.428 | -1.287 | 87 |
| City: Mountain View | -1.422 | 0.0013 | -1.592 | -1.429 | -1.233 | 213 |
| City: Santa Clara | -1.442 | 0.0009 | -1.681 | -1.456 | -1.238 | 355 |
| City: San Jose | -1.451 | 0.0002 | -1.635 | -1.464 | -1.278 | 1858 |
| City: Campbell | -1.482 | 0.0015 | -1.640 | -1.493 | -1.317 | 144 |
| City: Saratoga | -1.497 | 0.0059 | -1.628 | -1.481 | -1.394 | 40 |
| City: Sunnyvale | -1.501 | 0.0008 | -1.659 | -1.513 | -1.325 | 302 |
| City: Stanford | -1.607 | 0.0062 | -1.760 | -1.605 | -1.482 | 39 |
Tables 3 and 4 and Figure 3 illustrate how elasticities vary across restaurant types and cities. Willingness to travel is lower for low-priced restaurants (elasticity of -1.446 for price range $ (under $10) versus -1.368 for price range $$ ($11–$30)); it is also lower for Mexican restaurants and pizza places than for Chinese and Japanese restaurants (elasticities of -1.499 and -1.497 versus -1.353 and -1.320, respectively). Cities with many work locations near retail districts, including San José, Sunnyvale, and Mountain View, have a lower willingness to travel than cities that are more spread out, like Daly City, Burlingame, San Bruno, and San Mateo. Appendix Section A.5 provides further descriptive statistics about latent factors and model results, illustrating, for example, how the model can be used to find restaurants that are intrinsically similar (without regard to location) as well as which restaurants are similar in terms of user utilities.
5 Analyzing Restaurant Opening and Closing
The TTFM model can make predictions about how market share will be redistributed among restaurants when restaurants open or close, and these predictions can be compared to the actual changes that occur in practice. For this exercise, we focus on 221 openings and 190 closings where, both before and after the change, there were at least 500 restaurant visits by users with morning locations within a 3 mile radius of the relevant restaurant. Figure 7 illustrates that restaurant openings and closings are fairly evenly distributed over the time period.
One challenge of analyzing market share redistribution is that for any given target restaurant that opens or closes, we would expect some baseline level of market share changes of competing restaurants due to changes in the open status of neighboring restaurants. We address this in an initial exercise where we hold the environment fixed in the following way. For each target restaurant that changed status, we first construct the predicted difference in market shares for each other restaurant between the “closed” and “open” regime (irrespective of which came first in time), and then subtract out the predicted change in market share that would have occurred for each restaurant if the target restaurant had been closed in both periods. We then sum the changes across restaurants in different groups defined by their distance from the target restaurant. Table 5 shows TTFM model predictions for how the opening/closing restaurant’s market share is redistributed over other restaurants within certain distances after the restaurant becomes unavailable (i.e., before the opening or after the closing). The TTFM model estimates imply that just over 50 percent of the market share impact of a closure accrues to restaurants within 2 miles of the target restaurant.
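Table 5: Predicted redistribution of the target restaurant’s market share, by distance

| Distance from opening/closing restaurant (mi.) | < 2 | 2–4 | 4–6 | 6–8 | 8–10 | > 10 |
|---|---|---|---|---|---|---|
| share | 51% | 23% | 10% | 6% | 3% | 6% |
| cum. share | 51% | 74% | 84% | 90% | 94% | 100% |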
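The mechanics behind such a redistribution table can be illustrated for a single consumer with a plain logit. This is a simplified sketch rather than the authors' procedure: the function names are hypothetical, and the aggregation over heterogeneous users (which is what produces the spatial pattern in TTFM) is omitted. Closing the target removes it from the choice set, the remaining shares are renormalized, and each rival's gain is assigned to a distance bin.

```python
import math

def market_shares(utilities):
    """Logit shares for a dict {restaurant: mean utility} of open restaurants."""
    top = max(utilities.values())
    weights = {r: math.exp(u - top) for r, u in utilities.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

def redistribution_by_distance(utilities, distance_to_target, target, bin_edges):
    """Fraction of the target's lost share gained by rivals in each distance bin.

    bin_edges such as [2, 4, 6, 8, 10] define the bins
    <2, 2-4, 4-6, 6-8, 8-10, and >10 miles from the target.
    """
    open_shares = market_shares(utilities)
    closed = {r: u for r, u in utilities.items() if r != target}
    closed_shares = market_shares(closed)
    gains = [0.0] * (len(bin_edges) + 1)
    for r, share in closed_shares.items():
        # Index of the bin containing this rival's distance to the target.
        b = sum(1 for edge in bin_edges if distance_to_target[r] >= edge)
        gains[b] += share - open_shares[r]
    lost = open_shares[target]
    return [g / lost for g in gains]
```

Because the target is held closed in one regime and open in the other with everything else fixed, the fractions sum to one, like the "share" row of the table.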
Figure 4 compares the actual changes in market share that occurred against the predictions of the TTFM model. It should be noted that baseline changes unrelated to the opening and closing of the target restaurants seem to dominate both the actual and predicted market share changes in the figure. The figure shows that our model’s predictions match well the actual changes that occurred, but there is substantial variation in the changes that occurred in the actual data, making it difficult to evaluate model performance using this exercise.
Our final exercise considers the best choice of restaurant type for a location. For the set of restaurants that open or close, we look at how the demand for the restaurant that changed status (the “target restaurant”) compares to the counterfactual demand the model predicts in the scenario where a different restaurant in our sample (as described by its mean latent characteristics) is placed in the location of the target restaurant. For each target, we consider a set of 200 alternative restaurants, 100 from the same category as the target restaurant and 100 from a different category. (These alternatives are sampled with equal probabilities from the set of restaurants in our sample.) We then compare the target restaurant’s estimated market share to the mean demand across the set of alternatives. In Table 6, we see that both the restaurants that opened and those that closed on average have higher predicted demand than either group of alternatives. However, the restaurants that opened appear to be in more valuable locations, since for the 200 alternative restaurants, we predict higher average demand if they were (counterfactually) placed at the opening locations than at the locations of closing restaurants. As a further comparison, we split the set of alternatives into groups based on whether or not they are in the same broad category as the restaurant that opened or closed. We find that alternative restaurants from the same category as the target would perform better on average than alternatives from a different category.
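Table 6: Mean predicted demand for opening/closing restaurants and alternatives

| Mean Predicted Demand | Closing | Opening |
|---|---|---|
| Actual Opening/Closing Restaurant | 10.33 (0.83) | 12.10 (1.14) |
| Alternative from Same Category | 10.08 (0.12) | 10.53 (0.11) |
| Alternative from Different Category | 9.09 (0.08) | 9.71 (0.08) |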
6 Ideal Locations and Ideal Restaurant Types
In this section, we consider the match between restaurant characteristics and locations. In each geohash6, we select one restaurant location at random and use the TTFM model to predict what the total demand would have been if a different restaurant had been located in its place. The set of alternative restaurants was chosen to include one restaurant from each of the major categories in the sample. (From each category, we randomly selected one restaurant whose market share is within standard deviation of the mean market share in the full sample.)
In Figure 13, we examine which locations are predicted to provide the largest demand in the lunch market for each restaurant category. We can see for example that Vietnamese restaurants are predicted to have the highest demand in a dense region in the southeastern portion of the map. The demand for Filipino restaurants is relatively diffuse, whereas the demand for sandwiches is characterized by small but dense pockets of relatively high demand.
In Figure 14, we group the restaurant categories into coarse groups based on the price range and the type of cuisine. We examine within each group which category would have the highest total demand in each location. There is considerable spatial heterogeneity in which restaurant category is predicted to perform best in each location.
7 Conclusions
This paper makes use of a novel dataset to analyze consumer choice: mobile location data. We propose the TTFM model, a rich model that allows heterogeneity in user preferences for restaurant characteristics as well as for travel time, where preferences for travel time vary across restaurants as well. We show that this model fits the data substantially better than traditional alternatives, and by incorporating recent advances in Bayesian inference, the estimation becomes tractable. We use the model to conduct counterfactual analysis about the impact of restaurants opening and closing, as well as to evaluate how the choice of restaurant characteristics affects market share. More broadly, we believe that with the advent of digitization, panel datasets about consumer location can be combined with rich structural models to answer questions about firm strategy as well as urban policy, and models such as TTFM can be used to accomplish these goals.
References
 Athey et al. (2017) Athey, Susan, David M. Blei, Robert Donnelly, and Francisco J. R. Ruiz. 2017. “Counterfactual Inference for Consumer Choice Across Many Product Categories.” Unpublished.
 Blei, Kucukelbir and McAuliffe (2017) Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association, 112(518): 859–877.
 Blum (1954) Blum, Julius R. 1954. “Approximation methods which converge with probability one.” The Annals of Mathematical Statistics, 25(2): 382–386.
 Bottou, Curtis and Nocedal (2016) Bottou, L., F. E. Curtis, and J. Nocedal. 2016. “Optimization Methods for Large-Scale Machine Learning.” arXiv:1606.04838.
 Elrod (1988) Elrod, Terry. 1988. “Choice map: Inferring a product-market map from panel data.” Marketing Science, 7(1): 21–40.
 Hoffman et al. (2013) Hoffman, M. D., David M. Blei, C. Wang, and J. Paisley. 2013. “Stochastic Variational Inference.” Journal of Machine Learning Research, 14: 1303–1347.
 Jordan (1999) Jordan, Michael I., ed. 1999. Learning in Graphical Models. Cambridge, MA, USA: The MIT Press.
 Keane (2015) Keane, Michael P. 2015. “Panel Data Discrete Choice Models of Consumer Demand.” In The Oxford Handbook of Panel Data, ed. B. H. Baltagi, Chapter 18, 549–583. Oxford University Press.
 Kingma and Welling (2014) Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” arXiv:1312.6114.
 Neilson (2013) Neilson, C. 2013. “Targeted vouchers, competition among schools, and the academic achievement of poor students.” Yale University Working Paper.
 Rezende, Mohamed and Wierstra (2014) Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. 2014. “Stochastic backpropagation and approximate inference in deep generative models.” Vol. 32 of Proceedings of Machine Learning Research, 1278–1286. PMLR.
 Robbins and Monro (1951) Robbins, H., and S. Monro. 1951. “A stochastic approximation method.” The Annals of Mathematical Statistics, 22(3): 400–407.
 Ruiz, Athey and Blei (2017) Ruiz, Francisco J. R., Susan Athey, and David M. Blei. 2017. “SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements.” arXiv:1711.03560.
 Titsias and Lázaro-Gredilla (2014) Titsias, M. K., and M. Lázaro-Gredilla. 2014. “Doubly stochastic variational Bayes for non-conjugate inference.” Vol. 32 of Proceedings of Machine Learning Research, 1971–1979. PMLR.
 Wainwright and Jordan (2008) Wainwright, M. J., and M. I. Jordan. 2008. “Graphical Models, Exponential Families, and Variational Inference.” Foundations and Trends in Machine Learning, 1(1–2): 1–305.
 Wan et al. (2017) Wan, Mengting, Di Wang, Matt Goldman, Matt Taddy, Justin Rao, Jie Liu, Dimitrios Lymberopoulos, and Julian McAuley. 2017. “Modeling Consumer Preferences and Price Sensitivities from Large-Scale Grocery Shopping Transaction Logs.” 1103–1112, International World Wide Web Conferences Steering Committee.
 Zhao, Du and Buntine (2017) Zhao, He, Lan Du, and Wray Buntine. 2017. “Leveraging Node Attributes for Incomplete Relational Data.” Vol. 70 of Proceedings of Machine Learning Research, 4072–4081. PMLR.
Appendix A Appendix
This Appendix begins by providing details of the data and dataset creation. Next we provide estimation details. Then, we provide a variety of results about goodness of fit and our model estimates, including summaries of estimated sensitivity to distance broken out by restaurant category and other characteristics. Next, we provide details of our analyses of restaurant openings and closings, as well as counterfactual analyses about the ideal locations of restaurants of different categories.
A.1 Data Description
Our dataset is constructed using data from SafeGraph, a company which aggregates locational information from anonymous consumers who have opted in to sharing their location through mobile applications. The data consists of “pings” from consumer phones; each observation includes a unique device id that we associate with a single consumer; the time and date of the ping; and the latitude and longitude and horizontal accuracy of the ping, all for smartphones in use during the sample period from January through October 2017.
Our second data source is Yelp. From Yelp, we obtained a list of restaurants, locations, ratings, price ranges, and categories, and we infer dates of openings and closings from the dates on which consumers created a listing on Yelp or marked a location as closed, respectively.
A.2 Dataset Creation and Sample Selection
Our area of interest is the corridor from South San Francisco to South San José around I-101 and I-280. We start with a rough bounding box around the area, find all incorporated cities whose area intersects the bounding box, and then remove Fremont, Milpitas, Hayward, Pescadero, Loma Mar, La Honda, Pacifica, Montara, Moss Beach, El Granada, Half Moon Bay, Lexington Hills and Colma from the set because they are too far from the corridor.
This leaves us with the following 41 cities: Los Gatos, Saratoga, Campbell, Cupertino, Los Altos Hills, Monte Sereno, Palo Alto, San José, San Bruno, Atherton, Brisbane, East Palo Alto, Foster City, Hillsborough, Millbrae, Menlo Park, San Mateo, Portola Valley, Sunnyvale, Mountain View, Los Altos, Santa Clara, Belmont, Burlingame, Daly City, San Carlos, South San Francisco, Woodside, Redwood City, Alum Rock, Burbank, Cambrian Park, East Foothills, Emerald Lake Hills, Fruitdale, Highlands-Baywood Park, Ladera, Loyola, North Fair Oaks, Stanford and West Menlo Park.
We then take the shapefiles for these cities as provided by the Census Bureau and find the set of rectangular regions known as geohash5s that cover their union. This is our area of interest and is shown in Figure 5. (Geohashes are a system in which the earth is gridded into successively finer sets of rectangles, which are then labelled with alphanumeric strings. These strings can then be used to describe geographic information in databases in a form that is easier to work with than latitudes and longitudes. At its coarsest, the geohash1 level, the earth is divided into 32 rectangles whose edges are roughly 3000 miles long. Each geohash1 is then in turn divided into 32 rectangles that are about 800 miles across. The finest geohash resolution used in this paper, geohash8, corresponds to rectangles of size 125 × 60 feet. See http://www.geohash.org/ for further details.)
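For readers unfamiliar with geohashes, the following sketch shows the standard public encoding: longitude and latitude intervals are halved alternately, and every five bits are mapped to one base-32 character. The function name is ours; nothing here is specific to the paper's pipeline.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision):
    """Return the geohash of (lat, lon) with `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)
```

A useful property for the sample construction is nesting: the geohash5 of a point is a prefix of its geohash6, geohash7, and geohash8, so coarser cells can be obtained by truncating finer ones.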
To construct our user base we only consider movement pings emitted on weekdays. We define an active week to be one during which a user emits at least one such ping. The user base includes users who meet the following criteria during our sample period, January to October 2017:

Have an approximate inferred home location as provided by SafeGraph

Are “active” (defined as having at least 12 active weeks, not necessarily consecutive)

Have at least 10 pings in the area of interest on average in active weeks

At least 80 percent of pings during the hours of 9:00 to 11:15 a.m. are in the area of interest

At least 60 percent of pings during the hours of 9:00 to 11:15 a.m. are in their “broad morning location,” where “broad morning location” is at the geohash6 level (a rectangle of roughly 0.75 miles × 0.4 miles)

At least 40 percent of pings during the hours of 9:00 to 11:15 a.m. are in their “narrow morning location,” where “narrow morning location” is at the geohash7 level (a square with edge length of roughly 500 feet)

Have their “broad morning location” in the area of interest
These restrictions give us 32,581 users, which we refer to as our “user base.” We then consider the set of restaurants. We begin with the set of restaurants known to Yelp in the San Francisco Bay Area, which we reduce through the following restrictions:

Locations are in the area of interest

Locations belong not just to the category “food” but also belong to certain subcategories (manually) selected from Yelp’s list (https://www.yelp.com/developers/documentation/v2/category_list): thai, soup, sandwiches, juicebars, chinese, tradamerican, newamerican, bars, breweries, korean, mexican, pizza, coffee, asianfusion, indpak, delis, japanese, pubs, italian, greek, sportsbars, hotdog, burgers, donuts, bagels, spanish, basque, chicken_wings, seafood, mediterranean, portuguese, breakfast_brunch, sushi, taiwanese, hotdogs, mideastern, moroccan, pakistani, vegetarian, vietnamese, kosher, diners, cheese, cuban, latin, french, irish, steak, bbq, vegan, caribbean, brazilian, dimsum, soulfood, cheesesteaks, tapas, german, buffets, fishnchips, delicatessen, texmex, wine_bars, african, gastropubs, ethiopian, peruvian, singaporean, malaysian, cajun, cambodian, cafes, halal, raw_food, foodstands, filipino, british, southern, turkish, hungarian, creperies, tapasmallplates, russian, polish, afghani, argentine, belgian, fondue, brasseries, himalayan, persian, indonesian, modern_european, kebab, irish_pubs, mongolian, burmese, hawaiian, cocktailbars, bistros, scandinavian, ukrainian, lebanese, canteen, austrian, scottish, beergarden, arabian, sicilian, comfortfood, beergardens, poutineries, wraps, salad, cantonese, chickenshop, szechuan, puertorican, teppanyaki, dancerestaurants, tuscan, senegalese, rotisserie_chicken, salvadoran, izakaya, czechslovakian, colombian, laos, coffeeshops, beerbar, arroceria_paella, hotpot, catalan, laotian, food_court, trinidadian, sardinian, cafeteria, bangladeshi, venezuelan, haitian, dominican, streetvendors, shanghainese, iberian, gelato, ramen, meatballs, armenian, slovakian, czech, falafel, japacurry, tacos, donburi, easternmexican, pueblan, uzbek, sakebars, srilankan, empanadas, syrian, cideries, waffles, nicaraguan, poke, noodles, newmexican, panasian, acaibowls, honduran, guamanian, brewpubs. (Locations can belong to several categories. A location is included if any of its categories match.)
This yields a list of locations far too broad. We thus refine the resulting set of locations by removing:

The coffee and tea chains Starbucks, Peet’s and Philz Coffee

All locations whose name matches the regular expression (coffee|tea) but whose name does not start with “coffee”

All locations whose name matches the regular expression (donut|doughnut) but does not contain “bagel”

All locations whose name matches the regular expression food court

All locations whose name matches the regular expression mall

All locations whose name matches the regular expression market

All locations whose name matches the regular expression supermarket

All locations whose name matches the regular expression shopping center

All locations whose name matches the regular expression (yogurt|ice cream|dessert)

All locations whose name matches the regular expression cater but does not match the regular expression (and|&) (this is to keep places like “Catering and Cafe” in the sample)

All locations whose name matches the regular expression truck and who do not have a street address (these are likely to be food trucks that move around)

A number of “false positives” manually by name (commonly these are grocery stores, festivals or farmers’ markets)

A number of cafeterias at prominent Bay Area tech companies like Google, VMWare and Oracle
Finally, we review the list of locations that would be removed under these rules and manually save a few handfuls of locations from removal.
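The name-based rules above could be implemented roughly as follows. This is a sketch under stated assumptions: matching is done case-insensitively on the lowercased name (the text does not say), the function and parameter names are hypothetical, and the chain, cafeteria, and manual exclusions are left out.

```python
import re

def is_excluded(name, has_street_address=True):
    """Apply the name-based removal rules to one location name."""
    n = name.lower()  # assumption: rules are case-insensitive
    if re.search(r"(coffee|tea)", n) and not n.startswith("coffee"):
        return True
    if re.search(r"(donut|doughnut)", n) and "bagel" not in n:
        return True
    if re.search(r"(food court|mall|market|supermarket|shopping center)", n):
        return True
    if re.search(r"(yogurt|ice cream|dessert)", n):
        return True
    # Keep places like "Catering and Cafe"; drop pure caterers.
    if re.search(r"cater", n) and not re.search(r"(and|&)", n):
        return True
    # Trucks without a street address are likely to move around.
    if re.search(r"truck", n) and not has_street_address:
        return True
    return False
```

Substring rules of this kind produce false positives: for instance, "tea" also matches inside "steakhouse". That is one reason a manual review of the removal list, as described above, is needed.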
Applying these restrictions leaves us with 6,819 locations. As a last step we deduplicate on geohash8. Some locations are so close together that given our matching method we cannot tell them apart and need to decide which of potentially several locations in a geohash8 we want to assign a visit to. In 4,577 cases there is a unique restaurant in the geohash8, while 687 have two, with the remainder having three or more. We deduplicate using the first restaurant in alphabetical order, leaving us with 5,555 locations. (One reason to remove San Francisco from the sample is that higher density areas have more duplication.) The resulting restaurants are visualized in Figure 6.
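The deduplication step can be sketched as follows. The geohash strings and restaurant names are made up, and comparing names case-insensitively is our assumption about what "alphabetical order" means here.

```python
def deduplicate_by_geohash(locations):
    """Keep one restaurant per geohash8: the first name in alphabetical order.

    locations: iterable of (name, geohash8) pairs.
    Returns a dict {geohash8: kept name}.
    """
    kept = {}
    for name, cell in locations:
        # Assumption: alphabetical order ignores letter case.
        if cell not in kept or name.casefold() < kept[cell].casefold():
            kept[cell] = name
    return kept

# Hypothetical example: two restaurants share one geohash8 cell.
example = [("Zed's Diner", "9q9hvumn"),
           ("Alpha Cafe", "9q9hvumn"),
           ("Pho Place", "9q9hvumq")]
```

Visits matched to a shared cell are then attributed to the single kept restaurant, which is why dense areas with many co-located restaurants are harder to handle.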
Next, we define a “visit” to a restaurant. For each user, each restaurant, and each day, we count the number of pings in the restaurant’s geohash8 and in its immediately adjacent geohash8s, and we compute the dwell time, defined as the difference between the earliest and the latest ping seen at the location during lunch hour. Call any such match a “visit candidate”. To get from visit candidates to visits, we impose the requirement that there be at least 2 pings in one of the location’s geohash8s and that the dwell time be at least 3 minutes. We also require that the visit be to a location that has no overlap with either the person’s home geohash7 or the geohash7 we have identified as the person’s narrow morning location, so as to reduce the possibility of misidentifying people who live near a location or work at the location as visiting it. In cases where a sequence of pings satisfying these criteria falls into the geohash8s of multiple locations, we attribute the visit to the location for which the dwell time is longest.
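The core of the visit rule (at least 2 pings and a dwell time of at least 3 minutes during lunch hours) can be sketched as below. This is a simplification with hypothetical names: it takes as given that the pings have already been matched to one location's geohash8s, uses the 11:30 a.m. to 1:30 p.m. lunch window, and omits the home and morning-location exclusions and the tie-breaking across locations.

```python
from datetime import datetime, time, timedelta

LUNCH_START = time(11, 30)
LUNCH_END = time(13, 30)
MIN_PINGS = 2
MIN_DWELL = timedelta(minutes=3)

def is_visit(ping_times):
    """Decide whether one user's pings at one location on one day form a visit.

    ping_times: datetimes of the pings already matched to the location.
    """
    lunch = sorted(t for t in ping_times
                   if LUNCH_START <= t.time() <= LUNCH_END)
    if len(lunch) < MIN_PINGS:
        return False
    dwell = lunch[-1] - lunch[0]  # latest minus earliest lunch-hour ping
    return dwell >= MIN_DWELL
```

The dwell-time requirement is what separates a lunch stop from someone merely passing the restaurant on the street.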
To put together our estimation dataset, we restrict the above visits to a set of users and restaurants we see sufficiently often. We require that each user have at least 3 visits during the sample period, and that each location have at least one visit per week on average by someone in the user base, or at least five visits overall (from users overall, not just those in our user base). This leaves us with 106,889 lunch visits by 9,188 users to 4,924 locations.
We also use data from Yelp to infer the dates of restaurant openings and closings. We use the following heuristic: the opening is the date on which a listing was added to the Yelp database, while the closing date is the date on which a restaurant is marked by a member as closed. Figure 7 shows the openings and closings throughout the sample period. We focus on openings and closings of restaurants that are considered by users whose morning location is within 3 miles of the opening/closing restaurant and who collectively take at least 500 lunch visits both before and after the change in status.
Distance
As our measure of distance between a user’s narrow morning location and each of the items in her choice set, we use the simple straight-line distance (taking into account the earth’s curvature). After calculating these distances, we cull from the choice set all alternatives that are further than 20 miles away.
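A minimal sketch of this step follows. The text does not say which great-circle formula is used, so the haversine formula, the Earth radius constant, and the function names are assumptions made for illustration; only the 20-mile cutoff comes from the text.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius (assumed constant)
MAX_MILES = 20.0              # cutoff stated in the text

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def choice_set(origin, restaurants, max_miles=MAX_MILES):
    """Keep restaurants within `max_miles` of `origin`; returns {id: miles}."""
    kept = {}
    for rid, (lat, lon) in restaurants.items():
        d = great_circle_miles(origin[0], origin[1], lat, lon)
        if d <= max_miles:
            kept[rid] = d
    return kept
```

Here `origin` stands for the user's narrow morning location and `restaurants` maps a restaurant id to its coordinates.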
Item covariates
The following restaurant covariates (or subsets thereof) are used in the estimation of both the MNL and the TTFM:

rating_in_sample: the average rating awarded during the sample period Jan – Oct 2017. If missing, the value is replaced by the rating_in_sample average, and another variable, rating_in_sample_missing, indicates that this replacement has been made

N_ratings_in_sample: the number of ratings that entered the computation of rating_in_sample

rating_overall: the average all–time rating. If missing, the value is replaced by the rating_overall average, and another variable, rating_overall_missing, indicates that this replacement has been made

N_ratings_overall: the number of ratings that entered the computation of rating_overall

category_mexican – category_dancerestaurants: A number of 0/1 indicator variables for whether an item has the corresponding category associated with it on Yelp
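The mean replacement with a missingness flag described for rating_in_sample and rating_overall can be illustrated as follows. The function name and the use of `None` for a missing rating are assumptions of this sketch, not details taken from the paper.

```python
def impute_with_indicator(values):
    """Replace missing entries (None) by the mean of the observed entries.

    Returns the filled list and a parallel 0/1 list marking which entries
    were replaced, playing the role of the *_missing variables.
    Assumes at least one value is observed.
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return filled, missing
```

Keeping the indicator alongside the filled value lets a model learn a separate effect for restaurants without ratings instead of treating them as exactly average.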
A.3 Estimation Details
To estimate the TTFM model, we build on the approach outlined in the appendix of Ruiz, Athey and Blei (2017), and indeed we use the same code base, since when we ignore the observable attributes of items, our model is a special case of theirs. Ruiz, Athey and Blei consider a more complex setting where shoppers consider bundles of items. When restricted to the choice of a single item, their model is identical to TTFM, with distance in TTFM taking the place of price. However, we treat observable characteristics differently in TTFM than Ruiz, Athey and Blei do. In their model, observables enter the consumer’s mean utility directly, while in TTFM we incorporate observables by allowing them to shift the mean of the prior distribution of latent restaurant characteristics in a hierarchical model.
We assume that one quarter of latent variables are affected by restaurant price range, one quarter are affected by restaurant categories, one quarter are affected by star ratings, and for one quarter of the latent variables there are no observables shifting the prior.
The TTFM model defines a parameterized utility for each customer and restaurant,
where denotes the utility for the th visit of customer to restaurant . This expression defines the utility as a function of latent variables which capture restaurant popularity, customer preferences, distance sensitivity, and time-varying effects (e.g., for holidays). All these factors are important because they shape the probabilities for each choice. Below we describe the latent variables in detail.
Restaurant popularity. The term is an intercept that captures overall (time-invariant) popularity for each restaurant . Popular restaurants will have higher values of , which increases their choice probabilities.
Customer preferences. Each customer has her own preferences, which we wish to infer from the data. We represent the customer preferences with a vector for each customer. Similarly, we represent the restaurant latent attributes with a vector of the same length. For each choice, the inner product represents how aligned the preferences of customer and the attributes of restaurant are. This term increases the utility (and consequently, the probability) of the types of restaurants that the customer tends to prefer.
Distance effects. We next describe how we model the effect of the distance from the customer’s morning location to each restaurant. We posit that each customer has an individualized distance sensitivity for each restaurant , which is factorized as , where latent vectors and have length . Using a matrix factorization approach allows us to decompose the customer/restaurant distance sensitivity matrix into per-customer latent vectors and per-restaurant latent vectors , both of length , therefore reducing the number of latent variables in the model. Thus, the inner product indicates the distance sensitivity, which affects the utility through the term . We place a minus sign in front of the distance effect terms to indicate that the utility decreases with distance.
Time-varying effects. Taking into account time-varying effects allows us to explicitly model how the utilities of restaurants vary with the seasons or as a consequence of holidays. Towards that end we introduce the latent vectors and of length . For each restaurant and calendar week , the inner product captures the variation of the utility for that restaurant in that specific week. Note that each trip of customer is associated with its corresponding calendar week, .
Noise terms. We place a Gumbel prior over the error (or noise) terms , which leads to a softmax model. That is, the probability that customer chooses restaurant in the th visit is
where denotes the choice.
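To make the structure of the utility and the resulting softmax concrete, here is a small sketch. All argument names are hypothetical stand-ins for the latent variables described above, and `dist_term` denotes whatever function of distance enters the utility; the exact transform is not recoverable from this text.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_utility(popularity, user_pref, rest_attr,
                 user_dist, rest_dist, dist_term,
                 rest_time, week_effect):
    """Utility without the Gumbel noise term (names are illustrative only)."""
    return (popularity
            + dot(user_pref, rest_attr)                # customer preferences
            - dot(user_dist, rest_dist) * dist_term    # distance effect, with minus sign
            + dot(rest_time, week_effect))             # time-varying effect

def choice_probabilities(utilities):
    """Softmax over one choice set, shifted by the maximum for numerical stability."""
    m = max(utilities)
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

Subtracting the maximum utility before exponentiating leaves the probabilities unchanged and avoids overflow when utilities are large.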
Hierarchical prior. The resulting TTFM model is similar to the Shopper model (Ruiz, Athey and Blei, 2017), which is a model of market basket data. The TTFM is simpler because it does not consider bundles of products, i.e., we restrict the choices to one restaurant at a time, and thus we do not need to include additional restaurant interaction effects.
A key difference between Shopper and the TTFM is how we deal with low-frequency restaurants. To better capture the latent properties of low-frequency restaurants, we make use of observed restaurant attributes. In particular, we develop a hierarchical model to share statistical strength among the latent attribute vectors and . (Footnote 5: We could also consider a hierarchical model over the time effect vectors , but these are low-dimensional and factorize a smaller restaurant/week matrix, so for simplicity we assume independent priors over .) Inspired by Zhao, Du and Buntine (2017), we place a prior that relates the latent attributes with the observed ones. In more detail, let be the vector of observed attributes for restaurant , which has length . We consider a hierarchical Gaussian prior over the latent attributes and distance coefficients ,
Here, we have introduced the latent matrices and , of sizes and respectively, which weigh the contribution of each observed attribute on the latent attributes. In this way, the (weighted) observed attributes of restaurant can shift the prior mean of the latent attributes. By learning the weighting matrices from the data, we can leverage the information from the observed attributes of high-frequency restaurants to estimate the latent attributes of low-frequency restaurants.
To reduce the number of entries of the weighting matrices, we set some blocks of these matrices to zero. In particular, we assume that one quarter of the latent variables is affected by restaurant price range only, one quarter is affected by restaurant categories, one quarter is affected by star ratings, and for the remaining quarter we assume that there are no observables shifting the prior (which is equivalent to independent priors). We found that this combination of independent and hierarchical priors over the latent variables works well in practice.
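The block structure can be pictured as follows: the prior mean of a restaurant's latent vector is a weighted combination of its observables, with each quarter of the latent dimensions connected to one group of observables only. This is a sketch under assumed names and shapes, not the authors' implementation.

```python
def block_prior_mean(x_price, x_category, x_rating,
                     H_price, H_category, H_rating, n_free):
    """Prior mean of one restaurant's latent vector under block-zero weights.

    Each H_* is a list of rows; row k holds the weights mapping one group of
    observables to the k-th latent dimension of that group's block. The last
    `n_free` dimensions have no observables, so their prior mean is zero.
    """
    def matvec(H, x):
        return [sum(h * v for h, v in zip(row, x)) for row in H]

    return (matvec(H_price, x_price)
            + matvec(H_category, x_category)
            + matvec(H_rating, x_rating)
            + [0.0] * n_free)
```

Concatenating the per-block products is equivalent to multiplying by one large weighting matrix whose off-block entries are fixed at zero, which is why the restriction reduces the number of weights to learn.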
To complete the model specification, we place an independent Gaussian prior with zero mean over each latent variable in the model, including the weighting matrices and . We set the prior variance to one for most variables, except for and , for which the prior variance is , and for and , for which the prior variance is . We also set the variance hyperparameters .
Inference. As in most Bayesian models, the exact posterior over the latent variables is not available in closed form. Thus, we must use approximate Bayesian inference. In this work, we approximate the posterior over the latent variables using variational inference.
Variational inference approximates the posterior with a simpler and tractable distribution (Jordan, 1999; Wainwright and Jordan, 2008). Let be the vector of all hidden variables in the model, and the variational distribution that approximates the posterior over . In variational inference, we specify a parameterized family of distributions , and then we choose the member of this family that is closest to the exact posterior, where closeness is measured in terms of the Kullback-Leibler (KL) divergence. Thus, variational inference casts inference as an optimization problem. Minimizing the KL divergence is equivalent to maximizing the evidence lower bound (ELBO),
where denotes the observed data and . Thus, in variational inference we first find the parameters of the approximating distribution that bring it closest to the exact posterior, and then we use the resulting distribution as a proxy for the exact posterior, e.g., to approximate the posterior predictive distribution. For a review of variational inference, see Blei, Kucukelbir and McAuliffe (2017).
Following other successful applications of variational inference, we consider mean-field variational inference, in which the variational distribution factorizes across all latent variables. We use Gaussian variational factors for all the latent variables in the TTFM model, and therefore, we need to maximize the ELBO with respect to the mean and variance parameters of these Gaussian distributions. We use gradient-based stochastic optimization (Robbins and Monro, 1951; Blum, 1954; Bottou, Curtis and Nocedal, 2016) to find these parameters. The stochasticity allows us to overcome two issues: the intractability of the expectations and the large size of the dataset.
The first issue is that the expectations that define the ELBO are intractable. To address that, we take advantage of the fact that the gradient itself can be expressed as an expectation, and we form and follow Monte Carlo estimators of the gradient in the optimization procedure. In particular, we use the reparameterization gradient (Kingma and Welling, 2014; Titsias and Lázaro-Gredilla, 2014; Rezende, Mohamed and Wierstra, 2014). The second issue is that the dataset is large. For that, we introduce a second layer of stochasticity in the optimization procedure by subsampling datapoints at each iteration and scaling the gradient estimate accordingly (Hoffman et al., 2013). Both approaches maintain the unbiasedness of the gradient estimator.
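The two layers of stochasticity can be seen in a toy problem that is far simpler than the TTFM: a single latent mean z with a standard normal prior and unit-variance Gaussian observations, approximated by a Gaussian q with mean `m` and standard deviation `s`. Everything in this sketch (the toy model, the step-size schedule, the initial values) is an assumption chosen for illustration and is not taken from the paper's code.

```python
import math
import random

def fit_gaussian_mean(y, n_iters=2000, batch_size=10, seed=0):
    """Stochastic variational inference for z ~ N(0, 1), y_n | z ~ N(z, 1).

    Returns the mean and standard deviation of the Gaussian approximation q.
    """
    rng = random.Random(seed)
    n = len(y)
    m, log_s = 0.0, math.log(0.1)   # small initial spread keeps early steps stable
    for t in range(n_iters):
        step = 0.01 / (1.0 + 0.01 * t)        # decreasing step size, tuned to this toy
        s = math.exp(log_s)
        eps = rng.gauss(0.0, 1.0)
        z = m + s * eps                       # reparameterized draw from q
        batch = rng.sample(y, batch_size)     # subsample the data
        # d/dz log p(y, z): prior term plus likelihood term rescaled by n / batch_size
        dlogp_dz = -z + (n / batch_size) * sum(yn - z for yn in batch)
        m += step * dlogp_dz                            # chain rule: dz/dm = 1
        log_s += step * (dlogp_dz * s * eps + 1.0)      # dz/dlog_s = s * eps; +1 from the entropy of q
    return m, math.exp(log_s)
```

In this conjugate toy the exact posterior is Gaussian with mean sum(y) / (n + 1) and variance 1 / (n + 1), so the output can be compared against a known answer, which is not possible for the full model.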
A.4 Model Tuning and Goodness of Fit
A.5 Additional Results
Table 7 illustrates how much of the variation in mean item utility (excluding distance) is explained by observable characteristics. All observables combined explain 14 percent of the variation. City and categories each explain 6 – 7 percent and lose only a little explanatory power once other variables are accounted for. Star ratings and price range account for 2.8 and 2.3 percent of the variation respectively when considered alone, but only 0.6 percent and 0.4 percent once the other variables are taken into account.
Predictors  Variance contribution  Marginal variance contribution 

Rating  0.028  0.006 
Price range  0.023  0.004 
City  0.062  0.053 
Categories  0.067  0.046 
All  0.140 
Model  Overall Mean  Overall SD  Within-User SD(Mean)  Within-User Mean(SD)  Within-Item SD(Mean)  Within-Item Mean(SD)

TTFM  1.4114  0.6810  0.5992  0.3005  0.2977  0.6003 
MNL  1.4291  0.0033  0.0001  0.0023  0.0002  0.0022 
Table 8 gives the means and standard deviations of elasticities in the MNL and TTFM models. Figure 11 plots the distribution of elasticities where the unit of analysis is the restaurant-user pair.
Tables 9, 10 and 11 illustrate how the model can be used to discover restaurants that are similar in terms of latent characteristics to a target restaurant. Distance between two restaurants, and , is calculated as the Euclidean distance between the vectors of latent factors affecting mean utility, and . Note that because distance is explicitly accounted for at the user level, we do not expect restaurants with similar latent characteristics to be near one another; rather, similarity in latent characteristics identifies restaurants that would tend to be visited by the same consumers if they were (counterfactually) in the same location. We see that indeed, the most similar restaurants to our target restaurants are in quite different geographic locations. Perhaps surprisingly, the category of the similar restaurants is generally different from that of the target restaurant, suggesting that other factors are important to individuals selecting lunch restaurants.
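The lookup behind these tables amounts to a nearest-neighbor search in latent space. A minimal sketch, assuming a hypothetical dictionary `latent` that maps each restaurant to its fitted vector of latent factors:

```python
import math

def most_similar(target, latent, k=10):
    """Return the k restaurants closest to `target` in Euclidean latent distance.

    The result is a list of (distance, name) pairs in increasing order of distance.
    """
    t = latent[target]
    scored = [(math.dist(t, vec), name)
              for name, vec in latent.items() if name != target]
    return sorted(scored)[:k]
```

Geographic distance plays no role in this ranking, which is why the neighbors it returns can be far apart on the map.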
Location  City  Category  Distance (Miles)  Latent Distance 

Zarzour Kabob & Deli  San Jose  Mideastern  17.2  1.58 
Tava Kitchen  Palo Alto  Asian Fusion  0.5  1.62 
Pizza Hut  Menlo Park  Pizza  2.8  1.62 
Subway  Santa Clara  Sandwiches  11.7  1.62 
Rack & Roll BBQ Shack  Redwood City  Seafood  3.8  1.62 
Burger King  Redwood City  Burgers  5.2  1.63 
Subway  Los Gatos  Sandwiches  19.8  1.64 
Pita Salt  Campbell  Street Food  17.0  1.64 
Papa John’s Pizza  San Jose  Pizza  19.5  1.64 
Cutesy Cupcakes  San Jose  Coffee  14.0  1.65 
Location  City  Category  Distance (Miles)  Latent Distance 

The Van’s Restaurant  Belmont  Sandwiches  8.5  1.28 
La Viga Seafood Cocina Mexicana  Redwood City  Mexican  4.1  1.31 
Three Seasons  Palo Alto  Japanese  0.4  1.32 
Cali Spartan Mexican Kitchen  San Jose  Mexican  17.9  1.34 
Poor House Bistro  San Jose  Southern  16.8  1.37 
McCormick Schmick’s Seafood  San Jose  Trad American  17.3  1.38 
Taqueria 3 Hermanos  Mountain View  Mexican  6.1  1.38 
Peanuts Deluxe Cafe  San Jose  Breakfast  17.4  1.38 
Izzy’s San Carlos  San Carlos  New American  6.6  1.38 
Bibo’s Ny Pizza  San Jose  Pizza  18.0  1.39 
Location  City  Category  Distance (Miles)  Latent Distance 

Gourmet Franks  Palo Alto  Hotdog  0.03  3.07 
Lobster ShackXpress  Palo Alto  Seafood  0.01  3.31 
Mayfield Bakery & Cafe  Palo Alto  New American  0.72  3.44 
Shalala  Mountain View  Japanese  6.15  3.46 
Tin Pot Creamery  Palo Alto  Coffee  0.70  3.47 
Mexican Fruit Stand  San Jose  Street Food  18.63  3.60 
Leonardo’s Italian Deli & Cafe  Millbrae  Coffee  16.50  3.62 
Villa Del Sol Argentinian Restaurant  South San Francisco  Latin  19.84  3.63 
Bobo Drinks Express  San Jose  Coffee  19.34  3.63 
Merlion Restaurant & Bar  Cupertino  Bars  11.81  3.64 
Tables 12, 13 and 14 examine restaurants that are similar accounting for all components of utility. Let be the average over dates that user visited restaurants of . Distance between two restaurants, and , is calculated as the Euclidean distance between the mean utility vectors, and , where is the number of users. Relative to the previous exercise, we see that similar locations are very close geographically, but still similar in other respects. There are many restaurants in close proximity to the selected restaurants, so the list displayed is not simply the set of closest restaurants.
Location  City  Category  Distance (Miles)  Latent Distance 

Coupa Café  Palo Alto  Coffee  0.09  6.69 
Cafe Venetia  Palo Alto  Coffee  0.14  7.54 
Jamba Juice  Palo Alto  Juice  0.46  7.72 
LYFE Kitchen  Palo Alto  New American  0.17  7.74 
Sancho’s Taqueria  Palo Alto  Mexican  0.25  7.81 
T4  Palo Alto  Coffee  0.18  7.89 
Lemonade  Palo Alto  New American  0.19  7.99 
Coupa Café  Palo Alto  Coffee  0.28  8.17 
Darbar Indian Cuisine  Palo Alto  Indpak  0.27  8.21 
Gelataio  Palo Alto  Gelato  0.27  8.23 
Location  City  Category  Distance (Miles)  Latent Distance 

Bare Bowls  Palo Alto  Juicebars  0.44  6.41 
Coconuts Caribbean Restaurant  Palo Alto  Caribbean  0.56  6.63 
The Oasis  Menlo Park  Bars  0.36  6.66 
Coupa Café  Palo Alto  Coffee  0.48  6.86 
Pizza My Heart  Palo Alto  Pizza  0.44  7.07 
Fraiche  Palo Alto  Coffee  0.48  7.21 
Cafe Del Sol Restaurant  Menlo Park  Mexican  0.86  7.23 
MP Mongolian BBQ  Menlo Park  BBQ  0.68  7.34 
Bistro Maxine  Palo Alto  Breakfast  0.49  7.85 
Koma Sushi Restaurant  Menlo Park  Japanese  0.35  7.88 
Location  City  Category  Distance (Miles)  Latent Distance 

Crepevine Restaurant  Palo Alto  New American  0.61  17.96 
California Pizza Kitchen  Palo Alto  New American  0.10  18.03 
True Food Kitchen  Palo Alto  New American  0.08  19.67 
Joya Restaurant  Palo Alto  Mexican  0.58  19.85 
Gott’s Roadside  Palo Alto  Bars  0.68  20.18 
Pressed Juicery  Palo Alto  Juice  0.03  20.37 
American Girl  Palo Alto  Trad American  0.10  20.37 
Dashi Japanese Restaurant  Menlo Park  Japanese  2.75  20.78 
Cafe Bistro  Palo Alto  New American  0.30  20.84 
NOLA Restaurant  Palo Alto  Bars  0.54  21.01 
A.6 Counterfactual Calculations
Figure 12 illustrates the model’s predicted impact of restaurant openings and closings on different groups of neighboring restaurants.
Section 6 and the counterfactual exercise in Section 5 rely on a similar form of calculation: how many visits would we predict restaurant would receive if it were located in the location currently occupied by restaurant ? When we do this, we assume that all characteristics of , both observed and latent, stay the same, except that when we calculate the utility for each consumer for , we use the location of when calculating distances. In principle, we can predict the demand would receive at any location in the region; however, it is easier to have replace an existing location , since this ensures that the chosen location is reasonable (e.g. not in the middle of a forest or a highway).
To calculate demand for replacing restaurant , we calculate new values of the utilities for for each user and session , which change only because the new distances are used instead of the real distances .
Then we recalculate each user’s new choice probabilities in each session, and take the sum across all users and sessions in order to get the new predicted total demand for each restaurant under the counterfactual that is located in the location of restaurant .
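The relocation exercise can be sketched as follows. The data layout is hypothetical: each session carries the non-distance part of utility (`base`), the user's distance sensitivity per restaurant (`sens`), and a distance term per location (`dist_term`). The sketch also assumes that the replaced restaurant drops out of the choice set, which the text does not state explicitly.

```python
import math

def choice_probs(utils):
    """Softmax over a {restaurant: utility} dictionary."""
    m = max(utils.values())
    w = {r: math.exp(u - m) for r, u in utils.items()}
    total = sum(w.values())
    return {r: v / total for r, v in w.items()}

def demand(sessions, location_of):
    """Predicted visits per restaurant: choice probabilities summed over sessions."""
    totals = {r: 0.0 for r in location_of}
    for s in sessions:
        utils = {r: s["base"][r] - s["sens"][r] * s["dist_term"][loc]
                 for r, loc in location_of.items()}
        for r, p in choice_probs(utils).items():
            totals[r] += p
    return totals

def counterfactual_demand(sessions, location_of, i, j):
    """Demand for restaurant i if it occupied the location of restaurant j."""
    moved = {r: loc for r, loc in location_of.items() if r != j}  # j leaves (assumption)
    moved[i] = location_of[j]        # only i's distances change
    return demand(sessions, moved)[i]
```

Because only the distance term of the moved restaurant changes, all of its other observed and latent characteristics carry over to the new location unchanged.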
In Section 5, we repeat this calculation for each restaurant that either opens or closes. We draw from two distinct sets: is 100 restaurants chosen at random from the same category as , and is 100 restaurants chosen at random from restaurants that are not in the same category as . In Table 6 we compare the predicted demand for the place that opens or closes, , to the mean counterfactual predictions for in and , i.e.,
In Section 6, the set of target restaurants includes one location selected at random from each geohash6. The set is one restaurant from each major category (the variable category_most_common) with the constraint that each restaurant chosen is within standard deviation of the population mean for total demand. This constraint is intended to make the set of comparison restaurants relatively similar in popularity. In the “best location for each category” in Figure 13 we plot for a single category the predicted demand for each in the set of target locations.
In Figure 14, we selected subsets of 4 or 5 categories of restaurants from that have the same price range and illustrate for each target location the category of restaurant that is