Nowadays, the prevalent method of finding information is to "Google it". Should we want to find locations, Google Maps is our go-to place. However, not all locations are indexed in Google Maps. Indeed, why would Google index all the street lamps in New York, all the park benches in Paris, or all the bridges in Amsterdam? Nonetheless, it is conceivable that a photographer is looking for lamps that offer a desired color composition and background, or that a film crew is looking for the ideal bridge to perform a stunt. In this paper, we provide a method for finding places that are not indexed in Google Maps.
When locations have distinctive spatial features, a straightforward approach to classifying them is to use images. Such images may come from a multitude of sources, as long as they can be associated with a GPS location. Google APIs allow precisely that: through the Google Maps Static API and the Google Street View Static API, we can gather satellite and street view images, respectively, to aid classification.
We consider the task of finding outdoor places to practice Parkour and FreeRunning (detailed in Section II). Parkour involves jumping, climbing, running, or any other form of movement, typically in urban environments. As a community that is only now producing its second generation of practitioners, the Parkour community is new and fast-growing. As it settles down and organizes, tools and common shared resources emerge, and the information most commonly shared between communities is training locations. One of the aims of this work is to help members of the Parkour community systematically find and share training spots within their region of interest.
I-B Prior Work
Geolocation-aided machine vision has been studied previously. Several studies [21, 8] were able to place confidence regions on the surface of the earth based on the pixels of a single image. The authors of  further leverage a hierarchical database to improve geolocation. However, all these approaches use an image to find a location. In contrast, we aim to find locations based not on an image but on a generic set of spatial features, with the system returning possible locations within a region of interest.
This work adds to the extensive literature on machine vision applications. Its novelty lies not in the methods it employs but in how known methods are used to help a growing community. More specifically, in this manuscript, we present:
A scalable method for feature matching in Google Maps;
A real-world-tested and systematic approach towards finding Parkour spots in a region of interest.
Furthermore, we test the system on the Arizona State University (ASU) campus, one of the largest in the United States, and verify that the proposed method for finding Parkour spots can methodically populate a database. Figure 1 shows the logo of the method and the database.
reviews the current literature and presents the proposed solution, detailing the machine learning models employed. In Section IV we present the results, both for the individual parts of the system and for the system as a whole. Lastly, Section V summarizes the conclusions, and Section VI offers remarks on how the system can be improved and on possible ways of building upon this project.
II Problem Formulation
II-A The geo-classification problem
The problem we aim to solve is a classification task. Given a location $x = (\phi, \lambda)$ in geographical coordinates (latitude $\phi$ and longitude $\lambda$) such that
$$x \in [-90^\circ, 90^\circ] \times [-180^\circ, 180^\circ),$$
output a probability $p \in [0, 1]$ that expresses the likelihood of $x$ being a Parkour spot. For the sake of simplicity, we assume the only accessible information about $x$ is satellite images $I_{\text{sat}}$ and street view images $I_{\text{sv}}$. Each image belongs to $[0, 1]^{H \times W \times 3}$, i.e., it contains three values, the red, green, and blue (RGB) components, each between 0 and 1, for every pixel. If we denote by $\mathcal{S}_x$ and $\mathcal{V}_x$ the sets of all satellite and street view images of location $x$, then we may further define the functions $f_{\text{sat}}$ and $f_{\text{sv}}$ that extract knowledge from each set, respectively. Thus, we may write:
$$p = g\big(f_{\text{sat}}(\mathcal{S}_x),\, f_{\text{sv}}(\mathcal{V}_x)\big),$$
where $g$ is the function that weighs and combines the outputs of $f_{\text{sat}}$ and $f_{\text{sv}}$, resulting in the probability $p$.
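The single-location formulation can be sketched as a function that applies a satellite model and a street view model to their respective image sets and then combines the two scores. In this sketch, `f_sat` and `f_street` are placeholders for the trained models, and the convex combination controlled by `weight` is a hypothetical choice of combining function; the paper does not specify its form (and the campus study in Section IV-C ultimately relies on street view alone).

```python
from typing import Any, Callable, Sequence

# Each image is assumed to be an H x W x 3 array of RGB values in [0, 1].
ImageSet = Sequence[Any]


def classify_location(
    satellite_images: ImageSet,
    street_images: ImageSet,
    f_sat: Callable[[ImageSet], float],
    f_street: Callable[[ImageSet], float],
    weight: float = 0.5,
) -> float:
    """Return a probability in [0, 1] that one location is a Parkour spot.

    f_sat and f_street stand in for the satellite and street view models.
    The combiner is a hypothetical convex combination; the paper does not
    specify the form of the combining function.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    p_sat = f_sat(satellite_images)
    p_street = f_street(street_images)
    # A convex combination of two probabilities is itself in [0, 1].
    return weight * p_sat + (1.0 - weight) * p_street


# Usage with dummy models that ignore their inputs:
p = classify_location([], [], lambda imgs: 0.8, lambda imgs: 0.4, weight=0.5)
```

A convex combination keeps the output a valid probability without extra normalization; a learned combiner (for example, logistic regression on the two scores) would be a natural alternative.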
II-B Features of Parkour Spots
Specific to our application, we must define what features constitute a Parkour spot, because the capability of identifying such features needs to be encoded in the knowledge-extraction functions. Parkour, or l’art du déplacement as the first practitioners call it, is quite loosely defined. While this contributes to its glamour, it complicates objective definitions. Thus, although the notion is highly subjective, we attempt to define the quality of a Parkour spot. Definition: The suitability of a location for Parkour is proportional to how easily the practitioner can come up with ideas for Parkour moves and sequences in that location. From this definition, it follows that the more architectural features exist to jump, climb, roll, crawl, or otherwise interact with the environment, the higher the likelihood of the location being suitable for Parkour.
III Proposed Solution
Solving the single-coordinate classification problem presented in Section II enables us to check individual coordinates for Parkour spots. Therefore, to find multiple Parkour spots, one simply needs to run the same algorithm systematically over other coordinates. In this section, we present our proposed solution to the classification of single coordinates. Our solution consists of two computer vision tasks, one for top-view and one for street view images, and both tasks rely on object detection methods.
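Because the method reduces to a single-coordinate classifier, the multi-coordinate search is a loop over candidate coordinates. A minimal sketch, in which `classify` is a hypothetical stand-in for the full per-coordinate pipeline and `threshold` is an assumed decision threshold:

```python
from typing import Callable, Iterable, List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def find_spots(
    coordinates: Iterable[Coordinate],
    classify: Callable[[Coordinate], float],
    threshold: float = 0.5,
) -> List[Tuple[Coordinate, float]]:
    """Run a single-coordinate classifier over many coordinates.

    Returns the (coordinate, probability) pairs at or above `threshold`,
    sorted from most to least likely.
    """
    hits = []
    for coord in coordinates:
        p = classify(coord)
        if p >= threshold:
            hits.append((coord, p))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```

Since each coordinate is classified independently, the loop parallelizes trivially; in practice, the cost is dominated by the image requests behind `classify`.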
III-A Object Detection
Typical object detection tasks have two phases: i) object localization and ii) image classification. While object localization involves using a bounding box to locate the exact position of the object in the image, image classification is the process of correctly classifying the object within the bounding box. Figure 2 illustrates the difference. Instance and semantic segmentation go a level deeper. In semantic segmentation, each pixel is assigned a particular class label, so it is a pixel-level classification. Instance segmentation is similar, except that multiple objects of the same class are treated separately as individual entities.
For the top-view model, we opt for an image classification method. For humans, it is hard to delineate and annotate features useful for Parkour in satellite images; therefore, we leave this complexity for the network to learn. We opt for the instance segmentation approach for the street view model because it provides a more detailed classification, whereas object detection would only provide bounding boxes. Bounding boxes become less practical and robust when the shape of the object varies considerably and is random in nature. We choose instance over semantic segmentation because the number of objects matters for the quality of a Parkour spot.
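The difference between semantic and instance segmentation for counting objects can be illustrated on a toy mask. In the sketch below (pure NumPy; the class id and the scene are hypothetical), two touching walls merge into one region in a semantic label map, so even a connected-component count finds a single object. An instance-style output, with one mask per object, counts two:

```python
from collections import deque

import numpy as np

WALL = 1  # hypothetical class id for "wall"

# Toy 4x8 scene containing two walls that touch each other.
wall_a = np.zeros((4, 8), dtype=bool)
wall_a[1:3, 1:4] = True
wall_b = np.zeros((4, 8), dtype=bool)
wall_b[1:3, 4:7] = True

# Semantic segmentation: a single label map with one class id per pixel.
semantic = np.zeros((4, 8), dtype=int)
semantic[wall_a | wall_b] = WALL

# Instance segmentation: one (class id, mask) pair per detected object.
instances = [(WALL, wall_a), (WALL, wall_b)]


def count_semantic(label_map: np.ndarray, class_id: int) -> int:
    """Best effort from a label map: count 4-connected regions of a class."""
    mask = label_map == class_id
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not seen[r, c]:
                regions += 1
                seen[r, c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return regions


def count_instances(objs, class_id: int) -> int:
    """Instance output lets us count objects of a class directly."""
    return sum(1 for cid, _mask in objs if cid == class_id)


print(count_semantic(semantic, WALL))    # 1: the touching walls merge
print(count_instances(instances, WALL))  # 2: each wall counted separately
```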
III-B Satellite Imaging model
The model that processes satellite images performs binary classification. In a satellite image, Parkour locations may contain visible features such as stairs, railings, walls, and other elevations that resemble obstacle courses or may be suitable for Parkour. By providing only images labeled 0 or 1, we aim for the model to learn these patterns through convolutions.
The coordinates of known Parkour locations were crowdsourced from Parkour communities worldwide. We gathered over 1300 coordinates from cities such as Paris (France), London (United Kingdom), Lisbon (Portugal), and Phoenix (Arizona, United States). For the top view, the coordinates were queried from the Google Maps Static API with a fixed zoom level (magnification) of 21, resulting in high-definition images of 640 by 640 pixels. For the negative examples required to train the top-view model, we uniformly sampled cities, gathering 400 random coordinates from each of 6 random locations, resulting in 2400 negative samples. Figure 3 shows some positive and negative examples.
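The satellite image acquisition described above can be sketched as building a Google Maps Static API request URL. `center`, `zoom`, `size`, `maptype`, and `key` are documented parameters of that API; the helper name `satellite_url` and the placeholder key are hypothetical. The sketch only builds the URL and does not send the request, since fetching requires a valid API key and is billed.

```python
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def satellite_url(lat: float, lng: float, api_key: str,
                  zoom: int = 21, size: int = 640) -> str:
    """Build a Static Maps request for a square satellite image."""
    params = {
        "center": f"{lat},{lng}",
        "zoom": zoom,              # zoom level ("magnification") 21
        "size": f"{size}x{size}",  # 640 x 640 pixels
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


url = satellite_url(33.4184, -111.9328, "YOUR_API_KEY")  # placeholder key
```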
Since our classification problem requires all the detail that satellite images can provide, it is essential to maintain a relatively high resolution. However, larger images imply larger memory requirements during training. We therefore downscaled the images from 640 by 640 pixels to 512 by 512. Then, we divided each satellite image into four to reduce the chance of false positives by limiting the information in each input; this approach is represented in Figure 4. After filtering, the positive training set had 3117 samples, and the negative set had 13,231 samples. For training, 3117 random negative samples were selected to maintain class balance.
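The tiling and class-balancing steps can be sketched with NumPy. This assumes the images have already been downscaled to 512 by 512 pixels with an image library of choice; the helper names are hypothetical, and the fixed `seed` only makes the random selection reproducible.

```python
import numpy as np


def split_into_quadrants(image: np.ndarray) -> list:
    """Split an H x W x C image into four equal (H/2) x (W/2) x C tiles."""
    h, w = image.shape[:2]
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    hh, hw = h // 2, w // 2
    return [
        image[:hh, :hw],  # top-left
        image[:hh, hw:],  # top-right
        image[hh:, :hw],  # bottom-left
        image[hh:, hw:],  # bottom-right
    ]


def balance_negatives(negatives: list, n_positives: int, seed: int = 0) -> list:
    """Randomly keep as many negatives as there are positives, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(negatives), size=n_positives, replace=False)
    return [negatives[i] for i in idx]


# A 512 x 512 RGB image yields four 256 x 256 tiles.
tiles = split_into_quadrants(np.zeros((512, 512, 3)))
```

Undersampling the negatives, as done here, discards data; class weights in the loss would be an alternative way to reach the same balance.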
Training partial sections of the model, including (but not limited to) exclusively convolution layers;
Forcing data imbalance towards the positive samples space so the model can learn positive features better;
Ultimately, we designed our own model based on the above-mentioned architectures. The hyperparameters used to train the CNN are listed in Table I.
III-C Street view model
The data collected from the community included street view images as well as other photos taken by community members. For the 1300 coordinates verified to contain Parkour spots, each image was manually evaluated, and the dataset was narrowed down to 249 images for training and 51 images for validation. The images were filtered on the basis of the following criteria:
How clear and understandable the images were to the naked eye.
Only daytime images were selected: nighttime images were very few, and they could act as noise that hampers the prediction capability of the model. Moreover, Google Street View images are all captured in daytime.
Images that were blurry or very complicated to annotate were discarded.
The Mask R-CNN implementation accepts inputs of up to 1024 by 1024 pixels, but we used inputs of 640 by 640 pixels because that is the maximum image size of the Google Street View API.
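Street view images can be requested in a similar way from the Street View Static API, whose documented parameters include `size`, `location`, `heading`, `fov`, and `key`. The sketch below, with the hypothetical helper `street_view_urls`, builds the four 90-degree requests per coordinate used in the ASU campus study (Section IV-C) at the 640 by 640 maximum size. The headings 0, 90, 180, and 270 degrees are an assumed choice (0 is north). As before, it only constructs the URLs.

```python
from urllib.parse import urlencode

STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def street_view_urls(lat: float, lng: float, api_key: str,
                     size: int = 640, fov: int = 90) -> list:
    """Build one request per heading so the views cover the full 360 degrees."""
    if size > 640:
        raise ValueError("the Street View Static API caps size at 640x640")
    if 360 % fov:
        raise ValueError("fov must divide 360 evenly")
    urls = []
    for heading in range(0, 360, fov):  # 0, 90, 180, 270 for fov=90
        params = {
            "size": f"{size}x{size}",
            "location": f"{lat},{lng}",
            "heading": heading,
            "fov": fov,
            "key": api_key,
        }
        urls.append(f"{STREET_VIEW_ENDPOINT}?{urlencode(params)}")
    return urls


urls = street_view_urls(33.4184, -111.9328, "YOUR_API_KEY")  # placeholder key
```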
Contrary to typical model training, we did not aim solely to minimize the training loss, since mathematically defining a loss for finding a Parkour spot is hard. Instead, we manually tuned parameters and assessed the output images of the test set. When the loss was sufficiently low and the output matched our intuition for Parkour spots, neither over- nor under-identifying objects, we stopped training. Model hyperparameters and implementation-specific parameters are listed in Table II; refer to [9, 4] for the meaning of the implementation-specific parameters.
In this section, we first analyze the performance of each system component, i.e., the satellite and street view models. Subsequently, both models are integrated as described in Section III. The performance is assessed using real, unlabeled data.
IV-A Satellite Model
The satellite (top-view) model was trained on thousands of labeled positive and negative examples, and the resulting binary classifier achieved an accuracy of 80% on our test set. The confusion matrix in Figure 6 details this performance.
Looking now at unlabeled data, Figure 7 shows the output of the model on a grid of 196 images spanning 100 meters from the central coordinate. The model detects most of the small elevations, walls, railings, stairs, and similar features that are suitable for Parkour. However, the model also classifies pointed roofs, solar panel arrays, and HVAC (heating, ventilation, and air conditioning) arrays as positives. A possible remedy is to include similar samples in the negative training set.
IV-B Street view Model
To assess the street view model in realistic conditions, we used many unlabeled examples. Overall, the model performs consistently well; Figure 8 shows an example. Although the model at times mistakes railings for walls, it still identifies those elements as useful for Parkour, which is what matters most for our application.
IV-C ASU Campus Results
To test the end-to-end framework, we used the proposed system to identify spots on the ASU campus. We chose a center coordinate and an area of interest that is a square inscribed in a circle with a radius of 650 meters. We then sampled the region uniformly so that the satellite images do not overlap (roughly 40 meters apart), and acquired four 90-degree street view images at each sampled coordinate.
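The roughly 40-meter spacing is consistent with the ground footprint of a 640-pixel satellite image at zoom level 21. A minimal sketch, assuming the standard Web Mercator ground-resolution formula (156543.03392 × cos(latitude) / 2^zoom meters per pixel, for 256-pixel tiles) and images requested at the default scale of 1:

```python
import math

# Web Mercator ground resolution at zoom 0 on the equator (meters per pixel),
# for the standard 256-pixel tile size.
M_PER_PX_AT_ZOOM0 = 156543.03392


def meters_per_pixel(lat_deg: float, zoom: int) -> float:
    """Approximate ground distance covered by one map pixel."""
    return M_PER_PX_AT_ZOOM0 * math.cos(math.radians(lat_deg)) / 2 ** zoom


def footprint_m(lat_deg: float, zoom: int = 21, size_px: int = 640) -> float:
    """Approximate side length, in meters, of a square static map image."""
    return size_px * meters_per_pixel(lat_deg, zoom)


side = footprint_m(33.4184)  # center coordinate of Table III
print(f"{side:.1f} m")       # 39.9 m
```

Spacing sample points by this footprint tiles the region without overlap; at higher latitudes the footprint shrinks with cos(latitude), so the spacing would need to shrink accordingly.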
To determine the quality of a Parkour spot, we count the number of class hits in each of the four street view directions. If there are more than $\tau$ Parkour-usable objects (i.e., short walls, stairs, or rails), we mark the coordinate as containing a Parkour spot; the number of positives can thus be controlled with the threshold $\tau$. Relying solely on street view provided the highest reliability and interpretability. Table III shows some statistics from this study.
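The thresholding rule can be sketched as follows. The class names are hypothetical labels for the short walls, stairs, and rails mentioned above. The sketch assumes that hits are summed over the four directions; the text leaves open whether the threshold (`tau` here) applies per direction or in total.

```python
from typing import List

# Hypothetical class names for the Parkour-usable objects named in the text.
PARKOUR_CLASSES = {"short_wall", "stairs", "rail"}


def count_hits(detections_per_view: List[List[str]]) -> int:
    """Total Parkour-usable detections over the street view directions."""
    return sum(
        1
        for view in detections_per_view
        for label in view
        if label in PARKOUR_CLASSES
    )


def is_parkour_spot(detections_per_view: List[List[str]], tau: int) -> bool:
    """Mark a coordinate as a spot when it has more than tau usable objects."""
    return count_hits(detections_per_view) > tau


# One list of detected labels per heading (four headings per coordinate).
views = [["stairs", "bench"], ["rail"], [], ["short_wall", "short_wall"]]
```

Raising `tau` trades recall for precision, which is how the number of positives can be tuned.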
| Statistic | Value |
|---|---|
| Center coordinate | (33.4184, -111.9328) |
| Radius of interest | 650 meters |
| Number of coordinates | 1155 |
| Number of API requests | 5775 |
| Total cost of API requests | USD 34.65 |
| Number of positives | 46 |
| Number of true positives | 28 |
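The request count in Table III follows from one satellite image plus four street view images per coordinate. The per-request prices below are an assumption not stated in this paper (USD 2 per 1000 Static Maps requests and USD 7 per 1000 Street View Static requests, taken as Google's list prices at the time). Under that assumption, the arithmetic reproduces the reported total.

```python
N_COORDS = 1155
SAT_PER_COORD = 1     # one non-overlapping satellite image per coordinate
STREET_PER_COORD = 4  # four 90-degree street view images per coordinate

# Assumed prices in USD per 1000 requests (not stated in the paper).
SAT_PRICE_PER_1000 = 2
STREET_PRICE_PER_1000 = 7

n_sat = N_COORDS * SAT_PER_COORD
n_street = N_COORDS * STREET_PER_COORD
n_requests = n_sat + n_street

# Work in thousandths of a dollar to avoid floating-point rounding.
cost_milli_usd = n_sat * SAT_PRICE_PER_1000 + n_street * STREET_PRICE_PER_1000

print(n_requests)             # 5775
print(cost_milli_usd / 1000)  # 34.65
```

Under these prices, street view requests account for over 90% of the cost, so reducing the number of street view calls is the main lever for cheaper surveys.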
Almost 40% of the positive results (18 of 46) are false positives. Figure 9 shows four cases where the system was fooled. First, unbeknownst to us, the Google Street View API sometimes returns indoor images. Since tables, benches, counters, and walls are identified as useful for Parkour, indoor locations are wrongly ranked high. Outdoor locations with furniture are also classified as positive, and so are pools, since they have a fair share of sun loungers and railings. Lastly, several street view requests returned a view considerably above street level, leading the system to identify spots for five-story-tall giants instead of humans. We estimate that filtering problematic inputs from the Google API can reduce the percentage of false positives to below 20%.
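The precision and the false-positive share follow directly from the last two rows of Table III:

```python
positives = 46
true_positives = 28
false_positives = positives - true_positives

precision = true_positives / positives
fp_share = false_positives / positives

print(f"precision = {precision:.1%}")        # precision = 60.9%
print(f"false positives = {fp_share:.1%}")  # false positives = 39.1%
```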
In this work, we presented a systematic method for finding Parkour spots, the first of its kind in the Parkour community. We defined the general feature-matching problem. Using binary classification of satellite images and instance segmentation of street view images, and connecting both approaches to maximize the information derived for each coordinate, we estimated the likelihood of a location being a Parkour spot. We then executed our methodology on our campus and personally verified the quality of the results, achieving a precision of over 60%. Finally, we analyzed the most prominent false positives and identified fixes to further improve the system performance.
VI Future Work
The framework we presented is scalable in the sense that its overall performance improves as its individual parts are enhanced. One future work direction is to evolve the proposed system by improving the accuracy, robustness, or inference speed of the satellite image (top-view) model or the street view model. In terms of inference speed, the bottleneck is the street view model. High-speed object detection approaches, like Single Shot Detection and YOLO, can improve classification speed while possibly improving performance. The integration of both models can also be improved to reduce the number of required API requests, thereby lowering the cost and increasing the speed of operation in unknown terrain. Furthermore, another interesting approach is to exploit feature extraction explicitly, e.g., by engineering a solution with edge detection. Finally, it would be interesting to study faster ways of encoding feature knowledge, since annotating all the data took several days.
[1] (2013) Graph-based discriminative learning for location recognition. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 700–707.
[2] (2019) The VGG image annotator (VIA). CoRR abs/1904.10699.
[3] (2013) Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR abs/1311.2524.
[4] Mask R-CNN for Object Detection and Segmentation. https://github.com/matterport/Mask_RCNN [Online; Dec 2021].
[5] Parkour Spot ID. https://github.com/jmoraispk/ParkourSpotID [Online; Dec 2021].
[6] Google Maps Static API. https://developers.google.com/maps/documentation/maps-static/overview [Online; Dec 2021].
[7] Google Maps Street View Static API. https://developers.google.com/maps/documentation/streetview/overview [Online; Dec 2021].
[8] (2008) IM2GPS: estimating geographic information from a single image. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
[9] (2017) Mask R-CNN. CoRR abs/1703.06870.
[10] (2015) Deep residual learning for image recognition. CoRR abs/1512.03385.
[11] (2017) Adam: a method for stochastic optimization.
[12] (2014) Microsoft COCO: common objects in context. CoRR abs/1405.0312.
[13] (2015) SSD: single shot multibox detector. CoRR abs/1512.02325.
[14] (2009) Parkour, the city, the event. Environment and Planning D: Society and Space 27 (4), pp. 738–750.
[15] (2015) You only look once: unified, real-time object detection. CoRR abs/1506.02640.
[16] (2008) Playing with fear: parkour and the mobility of emotion. Social & Cultural Geography 9 (8), pp. 891–914.
[17] (2013-12) OverFeat: integrated recognition, localization and detection using convolutional networks. International Conference on Learning Representations (ICLR), Banff.
[18] (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[19] (2017) CS231n: Convolutional Neural Networks for Visual Recognition. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf [Online; Dec 2021].
[20] (2015) Rethinking the inception architecture for computer vision. CoRR abs/1512.00567.
[21] (2016) PlaNet - photo geolocation with convolutional neural networks. CoRR abs/1602.05314.
[22] Parkour. https://en.wikipedia.org/wiki/Parkour [Online; Dec 2021].