
Aerial image geolocalization from recognition and matching of roads and intersections

by   Dragos Costea, et al.

Aerial image analysis at a semantic level is important in many applications with strong potential impact in industry and consumer use, such as automated mapping, urban planning, real estate and environment monitoring, or disaster relief. The problem is attracting great interest in computer vision and remote sensing, due to increased computing power and improvements in automated image understanding algorithms. In this paper we address the task of automatic geolocalization of aerial images from recognition and matching of roads and intersections. Our proposed method is a novel contribution to the literature that could enable many applications of aerial image analysis when GPS data is not available. We offer a complete pipeline for geolocalization, from the detection of roads and intersections, to the identification of the enclosing geographic region by matching detected intersections to previously learned manually labeled ones, followed by accurate geometric alignment between the detected roads and the manually labeled maps. We test on a novel dataset with aerial images of two European cities and use the publicly available OpenStreetMap project for collecting ground truth road annotations. We show in extensive experiments that our approach produces highly accurate localizations in the challenging case when we train on images from one city and test on the other and the quality of the aerial images is relatively poor. We also show that the alignment between detected roads and pre-stored manual annotations can be effectively used for improving the quality of the road detection results.





1 Introduction

The ability to accurately recognize different categories of objects from aerial imagery, such as roads and buildings, is of great importance in understanding the world from above, with many useful applications ranging from mapping and urban planning to environment monitoring. This domain is entering a flourishing period, as the technological and computational aspects involved, both at the hardware and algorithm levels, combine into very powerful systems that are suitable for practical, real-world tasks. In this paper we address two important problems that are not sufficiently studied in the literature. We are among the first, to the best of our knowledge, to propose a method for automatic geo-localization in aerial images without GPS information, by putting in correspondence the real world images with the publicly available, manually labeled maps from the OpenStreetMap (OSM) project. We solve the task by first learning to detect roads and intersections in aerial images, and then learning to identify specific intersections based on a high level descriptor that puts in correspondence the detected intersections from real world images to intersections detected in the manually labeled OSM maps. Accurate localization is then obtained by the geometric alignment of the two road maps - the detected ones and the OSM annotations - at the final step. We present how the alignment to the OSM maps could be used to improve the quality of the detected roads and intersections. We also show that the accurate geometric registration of roads and intersections can improve both recognition of the roads and the initial localization. A key insight of our approach is the observation that intersections tend to have a unique road pattern surrounding them and thus can play a key role in localization, by reducing this difficult task to a sparse feature matching problem followed by a locally refined roadmap alignment.
For the accurate detection of roads we use a recent state of the art method [1] that is based on a dual stream local-global deep CNN, which takes advantage of both the local appearance of an object as well as the larger contextual region around the object of interest, in order to augment its local appearance and thus improve recognition performance.

2 Related work on road detection and localization

Road detection in aerial imagery has been traditionally addressed by detection methods that use manually designed features [21, 19, 15, 13, 9]. The recent success of convolutional neural networks [14, 25] has led to greatly improved accuracy and robust road detection [22, 24]. As shown in [1], the lack of good quality aerial images, as well as clutter and occlusion can greatly affect and significantly degrade the learning and performance even for top, state-of-the-art architectures. Post-processing is often required in aerial image analysis [21], but it is not expected to solve the most difficult cases. There are many approaches proposed for road detection, such as following road tracks [11], local context modeling with CRFs [23], minimum path methods [26] or using neural networks [22].

Arguably, free road vectors are widely available for most of the planet. However, they are sometimes misaligned and have a poor level of detail. Therefore some methods attempt to correct these road vectors by aligning them to real rectified aerial images [20]. Topological road improvement methods trace back to [8]. A more recent approach [23] uses Conditional Random Fields in conjunction with a minimum cost path algorithm for improving topology. The authors take into account various cues, such as context, cars, smoothness between road widths in order to offset road vertices to their real location. The same authors previously proposed a metric for topology measurement [27].

There are several methods related to automatic geolocalization from aerial images, but the tasks they address differ from ours. Some use known landmarks, others ground-level images or extra GPS or IMU measurements. Most employ sparse, manually designed features - ours being the first, to the best of our knowledge, to automatically localize aerial images from recognition and matching of semantic categories, such as roads and intersections, in the context of deep neural networks. More specifically, related to our work, geolocalization for unmanned aerial vehicles (UAVs) using sparse manually designed features has been proposed in [4], while accurate, sub-pixel manhole localization has been proposed using known landmarks [6]. A road following strategy for UAVs with lost GPS signal is described in [7]. Other authors augment a feature-based approach by fusing camera input with GPS and inertial measurement unit (IMU) outputs. They propose a monocular SLAM approach without visual beacons [12, 5], which yields an error of about 5m. Given the global coverage of aerial images, there has been interest in geolocalizing a ground image using aerial images at training time [18, 29, 17]. Geolocalizing single ground images has also been recently explored in [28]. An approach loosely related to geolocalization proposed the study of street patterns in order to identify the city class [2].

Figure 1: Framework overview: ground truth road maps and intersections are extracted from OSM, with known locations (left pathway). High-level descriptors are extracted from each intersection and stored offline. At test time, roads and intersections are automatically detected in aerial images using the dual stream CNN model [1]. The same type of intersections descriptors are extracted and matched against the OSM set of descriptors in order to localize a given detected intersection in the aerial image. This provides an initial localization that is further improved by geometric alignment. This also helps in intersection identification - only pairs of intersections with high alignment score are put in correspondence.

3 Our approach

Our method has several stages: 1) pixelwise road classification in a given aerial image; 2) detection of intersections based on the detected roads; 3) identification of a given intersection by matching its surrounding region to regions from a stored dataset of OpenStreetMap (OSM) road and intersection maps. At this stage we keep, for each test intersection, a list of the closest OSM intersections in the intersection descriptor space; 4) accurate geometric alignment for improved localization and road detection enhancement. At this stage we keep, from the list of candidate intersection matches, the one with minimum geometric alignment error. In this work we focus on the recognition and localization of a given detected intersection. We use intersections as anchors for localization for three reasons. First, once intersections are found and images are aligned to known roadmaps, the location of any given point in the image follows immediately. Second, intersections are sparse and require very little computational and storage cost for recognition and matching. Third, they are also sufficiently discriminative for localization when their surrounding area is taken into account. They tend to have a unique pattern of roads in the neighborhood region, which acts as a unique fingerprint that is useful for location recognition. We present an overview of our approach in Figure 1. Note that while we did not use any GPS information for localization, we assumed that we know the orientation of the image with respect to the cardinal points - information that is easily obtained with a compass in a real world situation. To account for small errors in orientation estimation we added random Gaussian noise to the test image rotation angle, with 0 mean and a standard deviation of 5 degrees. While the added noise slightly affected the performance of intersection recognition, it did not influence the final geometric alignment stage, which is affine invariant. We detail the stages of our pipeline next.

3.1 Finding roads and intersections

Figure 2: Our system for detection of roads and intersections. We first detect roads in the image by classifying each individual pixel using the recently proposed LG-Net model, that processes information along two pathways - a local one for reasoning based on local appearance, and a global pathway for image interpretation at the level of the larger contextual region. The detected roadmap is then passed to an adjusted AlexNet model trained for the task of intersection recognition. Intersections are detected by a scanning window approach followed by non-maxima suppression.

Detection of roads:

We train a state-of-the-art dual stream local-global Convolutional Neural Network [1] (LG-Net) on the task of road detection (Figure 2). The network combines two pathways, one based on an adjusted VGG-Net [25] that uses local appearance information (a local 64x64 patch surrounding the road region) and the other, based on an adjusted AlexNet [14], which takes as input a significantly larger neighborhood (256x256) for contextual reasoning. The two pathways are joined in the last FC layers and the output is a small 16x16 center patch having 1’s for road pixels and zeros otherwise. The final road map is obtained by dividing the larger aerial images into disjoint 16x16 patches, which are classified independently. In the experiments presented in [1] the local-global network achieves an F-measure that is consistently superior to a network that has only the local pathway. Also, compared to previous contextual approaches to road detection, ours avoids hand crafted cues, such as the nearby cars and consistent road width [20] or nearby lines [30], and effectively learns to reason about context by considering the larger area containing the road.
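The tiling described above can be sketched as follows. This is a minimal illustration, not the LG-Net implementation: the function name `patch_windows`, the box layout and the choice to leave padding to the caller are assumptions; only the 16x16, 64x64 and 256x256 sizes come from the text, and the 600x600 image size is taken from the dataset description.

```python
def patch_windows(height, width, out=16, local=64, context=256):
    """Yield, for each disjoint out x out tile, the tile box together with the
    local and context boxes centred on it.

    Boxes are (top, left, bottom, right) in pixels and may extend past the
    image border; how to pad there is left to the caller (an assumption).
    """
    def centred(cy, cx, size):
        half = size // 2
        return (cy - half, cx - half, cy + half, cx + half)

    for top in range(0, height - out + 1, out):
        for left in range(0, width - out + 1, out):
            cy, cx = top + out // 2, left + out // 2
            yield {
                "tile": (top, left, top + out, left + out),  # predicted output patch
                "local": centred(cy, cx, local),             # local pathway input
                "context": centred(cy, cx, context),         # global pathway input
            }


# A 600x600 image yields 37 x 37 full tiles; the last 8 pixels on each axis
# are not covered by a full 16x16 tile in this sketch.
windows = list(patch_windows(600, 600))
```

Each tile is classified independently, so the three boxes are all that a (hypothetical) inference loop would need to crop the two network inputs and write the prediction back into the road map.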

Detection of intersections:

For the detection of intersections we trained an adjusted AlexNet architecture, modified to output a single class to signal the presence or absence of an intersection at a given point in the image. We considered as input several channels containing the original RGB image as well as the estimated roadmap provided by the LG-Net. Including the channels with the original low level RGB signal improved the maximum detection F-measure in our experiments, using a scanning window approach with non-maxima suppression. The more relevant of the two types of input is the estimated roadmap, which represents signal at a higher, semantic level of image interpretation. Note that intersections, by definition, are directly related to the existence of at least two roads that intersect. In order to speed up the detection of intersections we classified pixels on a grid (with steps of 10 pixels) and obtained the final dense intersection map by interpolation. This resulted in a speedup of two orders of magnitude at the cost of a relatively small decrease in detection quality. In Figure 2, we also present the system for intersection detection with an example estimated map of intersections. We notice that most intersections are detected, while, in some cases, intersections seem to be correctly detected in the image but are not present in the OSM, which we considered as ground truth. Note that such inconsistencies between images and manually labeled roads are not uncommon in OSM.
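The grid-based speedup can be illustrated with a short sketch. Everything here is an assumption made for illustration: `toy_score` stands in for the trained classifier, bilinear interpolation is one possible choice for densifying the grid, and the greedy suppression is applied to the grid nodes rather than to the dense map to keep the example small. Only the 10-pixel step comes from the text.

```python
import math


def score_on_grid(score_fn, height, width, step=10):
    """Evaluate the (hypothetical) intersection classifier only at grid nodes."""
    return [[score_fn(y, x) for x in range(0, width, step)]
            for y in range(0, height, step)]


def interpolate(grid, y, x, step=10):
    """Bilinear interpolation of the grid scores at pixel (y, x)."""
    gy, gx = y / step, x / step
    y0 = min(int(gy), len(grid) - 1)
    x0 = min(int(gx), len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    fy, fx = gy - y0, gx - x0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def non_max_suppression(grid, step=10, threshold=0.5, min_dist=30):
    """Greedy NMS: keep the strongest nodes and drop any node that lies
    closer than min_dist pixels to an already kept one."""
    candidates = sorted(
        ((s, r * step, c * step)
         for r, row in enumerate(grid)
         for c, s in enumerate(row) if s >= threshold),
        reverse=True)
    kept = []
    for s, y, x in candidates:
        if all(math.hypot(y - ky, x - kx) >= min_dist for _, ky, kx in kept):
            kept.append((s, y, x))
    return [(y, x) for _, y, x in kept]


def toy_score(y, x):
    """Stand-in classifier: a single smooth peak at pixel (100, 200)."""
    return math.exp(-((y - 100) ** 2 + (x - 200) ** 2) / (2 * 15 ** 2))


grid = score_on_grid(toy_score, 300, 300)
peaks = non_max_suppression(grid)
```

With a 10-pixel step the classifier is evaluated on roughly 1% of the pixels, which is consistent with the two orders of magnitude speedup reported above.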

3.2 Automatic geolocalization

We represent each intersection by a descriptor which is learned such that identical intersections from detected roads and OSM roads should have similar descriptors, while descriptors for different intersections should be as far separated as possible. For extracting the intersection descriptors we start from the modified AlexNet trained for intersection detection, such that the last FC layer of 4096 elements is used as a descriptor. Intersections from the detected road maps will be matched against a database from OSM using Euclidean distances in descriptor space. While this approach proves to be very effective, we further improve the performance by fine-tuning the network using backpropagation for adjusting distances in descriptor space in order to improve matching performance (Figure 3). Localization is further refined by the geometric alignment between the estimated roads and the OSM roads in the regions centered at the intersections that have been put in correspondence. We detail next the algorithms for matching and localization.

Descriptor extraction and learning:

We extract descriptors for intersection images in a way that is similar to [16]. Moreover, we fine-tune the descriptor extracted for intersections from the neural network, so as to minimize the distance between identical intersections and maximize the distances between dissimilar ones. First, we train the modified AlexNet for intersection detection. Second, we fine-tune the network weights in a Siamese-like fashion, with corresponding intersection pairs from estimated roadmaps and OSM, respectively, marked as positive and different intersection pairs marked as negative. See [10] for details on this type of training. The robust loss we use takes into consideration the ground truth label $y$, which is $1$ if the intersections are the same and $0$ otherwise, the squared Euclidean distance $d^2 = \|x_1 - x_2\|_2^2$ between pairs of intersection descriptors, and a margin $m$, which gives zero penalty to descriptors $x_1$ and $x_2$ from different intersections that are at a distance of at least $m$ in descriptor space:

$$L(y, x_1, x_2) = y\, d^2 + (1 - y)\, \max(0,\; m - d)^2$$
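A contrastive loss in the style of [10] can be sketched for a single pair of descriptors as below. This is an illustrative per-pair version under stated assumptions, not the training code: the function name, the default margin of 1.0 and the convention that the label is true for matching intersections are all assumptions, and any normalization over a batch is omitted.

```python
import math


def contrastive_loss(desc_a, desc_b, same, margin=1.0):
    """Per-pair contrastive loss.

    same=True  : penalise the squared distance between the descriptors.
    same=False : penalise only pairs closer than `margin`; pairs at a
                 distance of at least `margin` contribute zero.
    """
    d = math.dist(desc_a, desc_b)  # Euclidean distance in descriptor space
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Matching pair far apart: large penalty.
loss_pos = contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True)
# Non-matching pair beyond the margin: no penalty.
loss_neg_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False)
# Non-matching pair inside the margin: penalised.
loss_neg_near = contrastive_loss([0.0, 0.0], [0.5, 0.0], same=False)
```

In a Siamese setup both descriptors come from the same network with shared weights, so minimizing this quantity over many pairs pulls matching detected/OSM intersections together and pushes different ones apart up to the margin.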


Intersection identification:

The learning phase creates a descriptor for each intersection image. Similar images will correspond to descriptors that are close in Euclidean space. When matching two regions centered at two candidate intersection matches, we also consider the descriptors of the nearby intersections. This results in a bipartite graph matching problem for matching two sets of descriptors. Because nearby intersections usually have similar regions, it is possible to wrongly match detected intersections to neighbors of their true OSM intersections, but such local misplacements are most often fixed at the final geometric alignment step, when all the road details in a region are taken into account. Next we present our method for finding correspondences between detected intersections and the ones from OSM, by matching sets of intersections from their corresponding regions. These regions are neighborhoods of a certain radius centered at the intersections of interest. As our experiments show, the larger this radius, the more accurate the intersection identification. This is expected, as larger regions include more road structures that are unique to a specific urban area.

for each detected test intersection i and given radius r do
     Gather roads and intersections from the region R_i of radius r centered at i.
     for each labeled OSM intersection j do
         Gather roads and intersections from region R_j.
         Compute matching distance between regions:
          1) Get nearest neighbor distance from each intersection in R_i to the intersections from R_j.
          2) Compute sum of 1NN distances d(R_i, R_j).
          3) Get nearest neighbor distance from each intersection in R_j to the intersections from R_i.
          4) Compute sum of 1NN reverse distances d(R_j, R_i).
          5) Set distance between intersections D(i, j) from d(R_i, R_j) and d(R_j, R_i).
     end for
     return list L_i of the closest OSM intersections to i using distance D.
end for
Algorithm 1 Intersection identification by matching regions
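Algorithm 1 can be sketched in a few lines. The sketch assumes that each region is given as a non-empty list of intersection descriptors, that the forward and reverse 1NN sums are combined by simple addition, and that the candidate list has a fixed length `k`; the function names, the toy 2D descriptors and the OSM ids are hypothetical.

```python
import math


def one_way_distance(src, dst):
    """Sum, over the descriptors in src, of the distance to the nearest
    descriptor in dst (the 1NN sum of steps 1-2 and 3-4)."""
    return sum(min(math.dist(a, b) for b in dst) for a in src)


def region_distance(region_test, region_osm):
    """Symmetric region distance: forward plus reverse 1NN sums
    (adding the two is an assumption of this sketch)."""
    return (one_way_distance(region_test, region_osm)
            + one_way_distance(region_osm, region_test))


def candidate_list(region_test, osm_regions, k=5):
    """Return the ids of the k OSM intersections whose regions are closest
    to the test region."""
    ranked = sorted(osm_regions,
                    key=lambda j: region_distance(region_test, osm_regions[j]))
    return ranked[:k]


# Toy 2D descriptors for one test region and three OSM regions.
test_region = [(0.0, 0.0), (1.0, 0.0)]
osm_regions = {
    "a": [(0.0, 0.1), (1.0, 0.1)],              # near-identical region
    "b": [(5.0, 5.0), (6.0, 5.0)],              # unrelated region
    "c": [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)],  # superset with one extra intersection
}
candidates = candidate_list(test_region, osm_regions, k=2)
```

The reverse term is what separates region "c" from an exact match: every test descriptor has a perfect neighbor in "c", but the extra intersection in "c" has none in the test region.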

Geometric alignment:

Although a location can be theoretically determined by a single correctly identified intersection and a correct rotation with respect to the cardinal points, in order to have a robust match and further improve the initial localization (which could be off due to intersection detection misalignments), we also estimate, for a given pair of candidate intersection matches (i, j), a geometric affine transformation between the roads in their regions R_i and R_j. Then, a misalignment measure is computed such that most outlier candidates in the list L_i of a given test intersection i (found using Algorithm 1) are removed. The 2D registration procedure is performed by sampling road points from the test and query images and computing Shape Context descriptors [3] at the sampled locations. Using kNN with Shape Context descriptors, a list of candidate correspondences is found and an affine transform is robustly estimated using RANSAC. Then, the Euclidean distance transform (a Matlab function) is used in order to compute the symmetrized Chamfer distance between the two registered roadmaps, as a measure of misalignment, which in practice yields significantly better results. Other approaches (such as [20]) also proposed road alignment. Ours is fast and very effective for rejection of outlier intersection matches, improving localization and road enhancement (next Section). A more detailed overview of our localization algorithm is presented below:

for each road intersection i do
     1) Find list L_i of candidate matches from OSM using Algorithm 1.
     2) Compute symmetric Chamfer distances between region R_i and the corresponding aligned regions R_j of each candidate j in L_i.
     3) Return the aligned candidate j from L_i with minimum distance.
end for
Algorithm 2 Geolocalization algorithm
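The selection step of Algorithm 2 can be illustrated with a point-set version of the symmetric Chamfer distance. This is a simplification made for the sketch: the text computes the distance through a Euclidean distance transform on registered roadmap images, whereas here roads are lists of sampled 2D points, nearest neighbors are found by brute force, and the candidates are assumed to be already affinely aligned to the test region. All names and the toy data are hypothetical.

```python
import math


def chamfer(points_a, points_b):
    """Mean distance from each point in points_a to its nearest point in points_b."""
    return sum(min(math.dist(p, q) for q in points_b)
               for p in points_a) / len(points_a)


def symmetric_chamfer(points_a, points_b):
    """Average of the two directed Chamfer distances."""
    return 0.5 * (chamfer(points_a, points_b) + chamfer(points_b, points_a))


def best_candidate(test_roads, aligned_candidates):
    """Return the candidate id whose aligned road points are closest to the
    test roads under the symmetric Chamfer distance."""
    return min(aligned_candidates,
               key=lambda j: symmetric_chamfer(test_roads, aligned_candidates[j]))


# Toy example: a horizontal test road, one candidate that is the same road
# shifted by one pixel, and one candidate that is a vertical road.
test_roads = [(float(x), 0.0) for x in range(10)]
aligned_candidates = {
    "osm_17": [(float(x), 1.0) for x in range(10)],
    "osm_42": [(0.0, float(y)) for y in range(10)],
}
winner = best_candidate(test_roads, aligned_candidates)
```

Because the distance is averaged over all road points in the region, a candidate that matched well in descriptor space but has a different road layout receives a large value and is rejected.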

3.3 Enhancing the road map

We can use the aligned OSM roadmaps to improve the detected roads and vice-versa, since OSM roadmaps sometimes contain wrongly labeled roads, or do not reflect recent road changes. Here we present a simple but effective method: 1) we apply a soft dilation procedure on the estimated roadmap and multiply it, pixel by pixel, with the aligned OSM map; 2) the resulting soft output is then smoothed with a Gaussian filter and the result is thinned using a standard non-maximum suppression method for boundary detection; 3) after thinning, the roads are dilated back to their initial thickness. The results are substantially better, as expected, greatly improving the similarity between the roads found and the OSM roads, as measured by the F-measure in road detection. Important note: this procedure does not use ground truth localization, but only the entire OSM dataset, and relies on the accuracy of the automatic matching and alignment algorithms. It has proved generally effective even when the localization was wrong but the road structure between the matched OSM region and the test image was similar. We present qualitative results in Figure 4.
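Step 1 of this procedure (soft dilation followed by pixel-wise multiplication) can be sketched as below. The text does not define the soft dilation, so the Gaussian-weighted maximum used here is an assumption, as are the function names, the `radius` and `sigma` values and the toy 5x5 maps; the smoothing, thinning and re-dilation of steps 2 and 3 are not covered.

```python
import math


def soft_dilate(road_map, radius=2, sigma=1.0):
    """Soft dilation (assumed form): each pixel takes the maximum of its
    neighbours' values, weighted by a Gaussian of the offset distance."""
    h, w = len(road_map), len(road_map[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        weight = math.exp(-(dy * dy + dx * dx) / (2 * sigma ** 2))
                        best = max(best, weight * road_map[yy][xx])
            out[y][x] = best
    return out


def fuse_with_osm(road_map, osm_map, radius=2, sigma=1.0):
    """Multiply the softly dilated detection, pixel by pixel, with the aligned OSM map."""
    dilated = soft_dilate(road_map, radius, sigma)
    return [[d * o for d, o in zip(d_row, o_row)]
            for d_row, o_row in zip(dilated, osm_map)]


# Toy maps: the detected road lies on row 2, the aligned OSM road on row 1.
detected = [[1.0 if y == 2 else 0.0 for _ in range(5)] for y in range(5)]
osm = [[1.0 if y == 1 else 0.0 for _ in range(5)] for y in range(5)]
fused = fuse_with_osm(detected, osm)
```

The dilation lets a detection that is off by a pixel or two still support the OSM road, while detections with no nearby OSM road are zeroed out by the multiplication.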

4 Experimental analysis

Two Cities Dataset:

We collected aerial images of two European cities (termed A and B) and automatically aligned them with the OSM road maps for training and evaluation. We plan to make the dataset public. The images are 600x600px, have a spatial resolution of 1m/pixel and cover an area of about 70 sq. km each. We use city A for training and validation and images from city B for testing. The quality of the images is fairly low, which makes the task of road detection and localization very challenging, even for the human eye (see example images in Figure 4).

Figure 3 presents the average performance measures after geolocalizing all 3177 intersections from city B. We present intersection identification (recognition) rates versus the region radius (top left plot). As expected, performance increases as the region radius increases, at the cost of more computation and data being required. We also demonstrate that the geometric alignment phase significantly increases performance, bringing recognition rates to a high level even when the region radius is small. The plot also presents the consistent improvement brought by fine-tuning the descriptors to optimize intersection matching. The other three plots present the distribution of localization errors in meters. We notice that most errors are below 2.5 meters, that is, below 3 pixels for the image resolution available in our experiments. This error is very small considering the poor image quality and the errors present in the OSM itself, which was considered as ground truth. For these reasons we believe that our results demonstrate a high level of localization accuracy for our system, which could be very effective in most cases when the GPS signal is lost.

Figure 3: Performance evaluation. Top left plot: performance increases with the region radius. Note that intersection descriptor learning as well as the final geometric alignment method significantly improve localization accuracy. The other three plots, showing the distribution of errors per distance in meters, show that our approach is able to correctly localize an intersection with an error of at most 2.5 meters in most cases.

Computational details:

Training for road detection and intersection descriptor learning took between 3 and 5 days on a GeForce GTX 970 GPU with 4 GB of memory and 1664 CUDA cores. At test time, road extraction speed is 5 km²/s, at a spatial resolution of 1 m/pixel, and represents the most expensive task for geolocalization. Intersection detection runs at 0.7 km²/s, while localization by means of kNN in intersection descriptor space and geometric alignment is an order of magnitude faster in the context of searching within the limits of a city-sized area (about 70 sq. km in our dataset).

Figure 4: Enhancing the road detection by region recognition and geometric alignment to OSM roads. Our simple procedure, described in the text, could be useful for both improving the detected roadmap in the test image and correcting the OSM manually labeled maps. Note that for road enhancement we used the automatically matched and aligned regions from OSM using the initial estimated roadmaps, and NOT the ground truth matches.

5 Discussion and Conclusions

We have presented a complete system for geo-localization from aerial images in the absence of GPS information. Our proposed pipeline includes several contributions, with efficient methods for road and intersection detection, intersection recognition with geometric alignment for accurate localization, followed by road detection enhancement. There are many potential applications for our approach in areas such as urban planning, tracking structural changes, updating of existing maps and environment monitoring. Our system could also be used in the context of unmanned aerial vehicles, in order to correct their GPS localization or to make their flight possible even when the GPS signal is lost. We estimate that if the search area were smaller than in our experiments, the automatic localization would be tractable for onboard processing, in near real-time, on the current generation of NVIDIA’s embedded GPUs (Jetson TX1). For nighttime use, for example, the roads are generally ’extracted’ by means of street lighting, which makes the problem of road and intersection detection easier and thus even more accessible for on-board processing. We have shown that geolocalization from images alone, using learned high level features, is feasible and can achieve a high level of accuracy. It can be used as a GPS alternative or in conjunction with GPS, bringing valuable contributions to the literature and also to many applications that require offline or online, real-time processing.

The authors would like to thank Alina Marcu for his dedicated assistance with some of our experiments. Marius Leordeanu was supported in part by CNCS-UEFISCDI, under project PNII PCE-2012-4-0581.


  • [1] Anonymous, ‘Object contra context: Dual local-global semantic segmentation in aerial imagery’, in submitted to ECAI, (2016).
  • [2] Marc Barthélemy and Alessandro Flammini, ‘Modeling urban street patterns’, Physical review letters, 100(13), 138702, (2008).
  • [3] Serge Belongie, Jitendra Malik, and Jan Puzicha, ‘Shape context: A new descriptor for shape matching and object recognition’, in NIPS, volume 2, p. 3, (2000).
  • [4] Fernando Caballero, Luis Merino, Joaquín Ferruz, and Aníbal Ollero, ‘Unmanned aerial vehicle localization based on monocular vision and online mosaicking’, Journal of Intelligent and Robotic Systems, 55(4-5), 323–343, (2009).
  • [5] Fernando Caballero, Luis Merino, Joaquin Ferruz, and Aníbal Ollero, ‘Vision-based odometry and slam for medium and high altitude flying uavs’, Journal of Intelligent and Robotic Systems, 54(1-3), 137–161, (2009).
  • [6] Christian Drewniok and Karl Rohr, ‘High-precision localization of circular landmarks in aerial images’, in Mustererkennung 1995, 594–601, Springer, (1995).
  • [7] Eric Frew, Tim McGee, ZuWhan Kim, Xiao Xiao, Stephen Jackson, Michael Morimoto, Sivakumar Rathinam, Jose Padial, and Raja Sengupta, ‘Vision-based road-following using a small autonomous aircraft’, in Aerospace Conference, 2004. Proceedings. 2004 IEEE, volume 5, pp. 3006–3015. IEEE, (2004).
  • [8] Paolo Gamba, Fabio Dell’Acqua, and Gianni Lisini, ‘Improving urban road extraction in high-resolution images exploiting directional filtering, perceptual grouping, and simple topological concepts’, Geoscience and Remote Sensing Letters, IEEE, 3(3), 387–391, (2006).
  • [9] Armin Gruen and Haihong Li, ‘Road extraction from aerial and satellite images by dynamic programming’, ISPRS Journal of Photogrammetry and Remote Sensing, 50(4), 11–20, (1995).
  • [10] Raia Hadsell, Sumit Chopra, and Yann LeCun, ‘Dimensionality reduction by learning an invariant mapping’, in Computer vision and pattern recognition, 2006 IEEE computer society conference on, volume 2, pp. 1735–1742. IEEE, (2006).
  • [11] Jiuxiang Hu, Anshuman Razdan, John C Femiani, Ming Cui, and Peter Wonka, ‘Road network extraction and intersection detection from aerial images by tracking road footprints’, Geoscience and Remote Sensing, IEEE Transactions on, 45(12), 4144–4157, (2007).
  • [12] Jonghyuk Kim and Salah Sukkarieh, ‘Real-time implementation of airborne inertial-slam’, Robotics and Autonomous Systems, 55(1), 62–71, (2007).
  • [13] Dan Klang, ‘Automatic detection of changes in road data bases using satellite imagery’, International Archives of Photogrammetry and Remote Sensing, 32, 293–298, (1998).
  • [14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, ‘Imagenet classification with deep convolutional neural networks’, in Advances in neural information processing systems, pp. 1097–1105, (2012).
  • [15] Ivan Laptev, Helmut Mayer, Tony Lindeberg, Wolfgang Eckstein, Carsten Steger, and Albert Baumgartner, ‘Automatic extraction of roads from aerial images based on scale space and snakes’, Machine Vision and Applications, 12(1), 23–31, (2000).
  • [16] Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, and Chu-Song Chen, ‘Deep learning of binary hash codes for fast image retrieval’, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 27–35, (2015).
  • [17] Tsung-Yi Lin, Serge Belongie, and James Hays, ‘Cross-view image geolocalization’, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 891–898, (2013).
  • [18] Tsung-Yi Lin, Yin Cui, Serge Belongie, and James Hays, ‘Learning deep representations for ground-to-aerial geolocalization’, in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pp. 5007–5015. IEEE, (2015).
  • [19] Yucong Lin and Srikanth Saripalli, ‘Road detection and tracking from aerial desert imagery’, Journal of Intelligent & Robotic Systems, 65(1-4), 345–359, (2012).
  • [20] Gellert Mattyus, Shenlong Wang, Sanja Fidler, and Raquel Urtasun, ‘Enhancing road maps by parsing aerial images around the world’, in The IEEE International Conference on Computer Vision (ICCV), (December 2015).
  • [21] Helmut Mayer, Stefan Hinz, Uwe Bacher, and Emmanuel Baltsavias, ‘A test of automatic road extraction approaches’, International Archives of Photogrammetry, Remote Sensing, and Spatial Information Sciences, 36(3), 209–214, (2006).
  • [22] Volodymyr Mnih and Geoffrey E Hinton, ‘Learning to detect roads in high-resolution aerial images’, in Computer Vision–ECCV 2010, 210–223, Springer, (2010).
  • [23] Javier A Montoya-Zegarra, Jan D Wegner, L’ubor Ladickỳ, and Konrad Schindler, ‘Mind the gap: modeling local and global context in (road) networks’, in Pattern Recognition, 212–223, Springer, (2014).
  • [24] Shunta Saito and Yoshimitsu Aoki, ‘Building and road detection from large aerial imagery’, in IS&T/SPIE Electronic Imaging, pp. 94050K–94050K. International Society for Optics and Photonics, (2015).
  • [25] Karen Simonyan and Andrew Zisserman, ‘Very deep convolutional networks for large-scale image recognition’, arXiv preprint arXiv:1409.1556, (2014).
  • [26] Engin Türetken, Fethallah Benmansour, and Pascal Fua, ‘Automated reconstruction of tree structures using path classifiers and mixed integer programming’, in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 566–573. IEEE, (2012).
  • [27] Jan Wegner, Javier Montoya-Zegarra, and Konrad Schindler, ‘A higher-order crf model for road network extraction’, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1698–1705, (2013).
  • [28] Tobias Weyand, Ilya Kostrikov, and James Philbin, ‘Planet-photo geolocation with convolutional neural networks’, arXiv preprint arXiv:1602.05314, (2016).
  • [29] Scott Workman, Richard Souvenir, and Nathan Jacobs, ‘Wide-area image geolocalization with aerial reference imagery’, in Proceedings of the IEEE International Conference on Computer Vision, pp. 3961–3969, (2015).
  • [30] Jiangye Yuan and Anil M Cheriyadat, ‘Road segmentation in aerial images by exploiting road vector data’, in Computing for Geospatial Research and Application (COM. Geo), 2013 Fourth International Conference on, pp. 16–23. IEEE, (2013).