1 Introduction
Creating and updating road maps is a tedious, expensive, and often manual process today [11]. Accurate and up-to-date maps are especially important given the popularity of location-based mobile services and the impending arrival of autonomous vehicles. Several companies are investing hundreds of millions of dollars on mapping the world, but despite this investment, error rates are not small in practice, with map providers receiving many tens of thousands of error reports per day (see, e.g., https://productforums.google.com/forum/#!topic/maps/dwtCso9owlU for an example of a city, Doha, Qatar, where maps have been missing entire subdivisions for years). In fact, even obtaining “ground truth” maps in well-traveled areas may be difficult; recent work [10] reported a 14% discrepancy between OpenStreetMap (OSM) and the TorontoCity dataset (the recall according to a certain metric for OSM was 0.86).
Aerial imagery provides a promising avenue to automatically infer the road network graph. In practice, however, extracting maps from aerial images is difficult because of occlusion by trees, buildings, and shadows (see Figure 1). Prior approaches do not handle these problems well. Almost universally, they begin by segmenting the image, classifying each pixel in the input as either road or non-road [5, 10]. They then implement a complex post-processing pipeline to interpret the segmentation output and extract topological structure to construct a map. As we will demonstrate, noise frequently appears in the segmentation output, making it hard for the post-processing steps to produce an accurate result.

The fundamental problem with a segmentation-based approach is that the CNN is trained only to provide local information about the presence of roads. Key decisions on how road segments are interconnected are delegated to an error-prone post-processing stage that relies on heuristics instead of machine learning or principled algorithms. Rather than rely on an intermediate image representation, we seek an approach that produces the road network directly from the CNN. However, it is not obvious how to train a CNN to learn to produce a graph from images.
We propose RoadTracer, an approach that uses an iterative graph construction process to extract graph structures from images. Our approach constructs the road network by adding individual road segments one at a time, using a novel CNN architecture to decide on the next segment to add given as input the portion of the network constructed so far. In this way, we eliminate the intermediate image representation of the road network, and avoid the need for the extensive post-processing that limits the accuracy of prior methods.
Training the CNN decision function is challenging because the input to the CNN at each step of the search depends on the partial road network generated using the CNN up to that step. We find that standard approaches that use a static set of labeled training examples are inadequate. Instead, we develop a dynamic labeling approach to produce training examples on the fly as the CNN evolves during training. This procedure resembles reinforcement learning, but we use it in an efficient supervised training procedure.
We evaluate our approach using aerial images covering 24 square km areas of 15 cities, after training the model on 25 other cities. We make our code and a demonstration of RoadTracer in action available at https://roadmaps.csail.mit.edu/roadtracer. We implement two baselines, DeepRoadMapper [10] and our own segmentation approach. Across the 15 cities, our main experimental finding is that, at a 5% average error rate on a junction-by-junction matching metric, RoadTracer correctly captures 45% more junctions than our segmentation approach (0.58 vs. 0.40). DeepRoadMapper fails to produce maps with better than a 19% average error rate. Because accurately capturing the local topology around junctions is crucial for applications like navigation, these results suggest that RoadTracer is an important step forward in fully automating map construction from aerial images.
2 Related Work
Classifying pixels in an aerial image as “road” or “non-road” is a well-studied problem, with solutions generally using probabilistic models. Barzohar et al. build geometric-probabilistic models of road images based on assumptions about local road-like features, such as road geometry and color intensity, and draw inferences with MAP estimation [2]. Wegner et al. use higher-order conditional random fields (CRFs) to model the structure of the road network by first segmenting aerial images into superpixels, and then adding paths to connect these superpixels [17]. More recently, CNNs have been applied to road segmentation [12, 6]. However, the output of road segmentation, consisting of a probability of each pixel being part of a road, cannot be directly used as a road network graph.
To extract a road network graph from the segmentation output, Cheng et al. apply binary thresholding and morphological thinning to produce single-pixel-width road centerlines [5]. A graph can then be obtained by tracing these centerlines. Máttyus et al. propose a similar approach called DeepRoadMapper, but add post-processing stages that enhance the graph by reasoning about missing connections and applying heuristics [10]. This solution yields promising results when the road segmentation has modest error. However, as we will show in Section 3.1, heuristics do not perform well when there is uncertainty in segmentation, which can arise from occlusion, ambiguous topology, or complex topology such as parallel roads and multi-layer roads.
Rather than extract the road graph from the result of segmentation, some solutions directly extract a graph from images. Hinz et al. produce a road network using a complex road model that is built using detailed knowledge about roads and their context, such as nearby buildings and vehicles [8]. Hu et al. introduce road footprints, which are detected based on shape classification of the homogeneous region around a pixel [9]. A road tree is then grown by tracking these road footprints. Although these approaches do not use segmentation, they involve numerous heuristics and assumptions that resemble those in the post-processing pipeline of segmentation-based approaches, and thus are susceptible to similar issues.
Inferring road maps from GPS trajectories has also been studied [4, 14, 13]. However, collecting enough GPS data that can cover the entire map in both space and time is challenging, especially when the region of the map is large and far from the city core. Nevertheless, GPS trajectories may be useful to improve accuracy in areas where roads are not visible from the imagery, to infer road connectivity at complex interchanges where roads are layered, and to enable more frequent map updates.
3 Automatic Map Inference
The goal of automatic map inference is to produce a road network map, i.e., a graph where vertices are annotated with spatial coordinates (latitude and longitude), and edges correspond to straightline road segments. Vertices with three or more incident edges correspond to road junctions (e.g. intersections or forks). Like prior methods, we focus on inferring undirected road network maps, since the directionality of roads is generally not visible from aerial imagery.
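The road network representation described above can be sketched as a small data structure (an illustrative example, not the paper's code; the vertex ids and coordinate units are arbitrary):

```python
from collections import defaultdict

class RoadGraph:
    """Minimal undirected road network map: vertices carry spatial
    coordinates, and edges correspond to straight-line road segments."""
    def __init__(self):
        self.coords = {}               # vertex id -> (x, y)
        self.adj = defaultdict(set)    # vertex id -> set of neighbor ids

    def add_vertex(self, v, xy):
        self.coords[v] = xy

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def junctions(self):
        # Junctions are vertices with three or more incident edges.
        return [v for v in self.coords if len(self.adj[v]) >= 3]
```

For example, a T-intersection built from one center vertex and three neighbors yields exactly one junction.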
In Section 3.1, we present an overview of the segmentation-based map-inference methods used by current state-of-the-art techniques [5, 10] to construct a road network map from aerial images. We describe problems in the maps inferred by the segmentation approach to motivate our alternative solution. Then, in Section 3.2, we introduce our iterative map construction method. In Section 4, we discuss the procedure used to train the CNN used in our solution.
3.1 Prior Work: Segmentation Approaches
Segmentation-based approaches have two steps. First, each pixel is labeled as either “road” or “non-road”. Then, a post-processing step applies a set of heuristics to convert the segmentation output to a road network graph.
State-of-the-art techniques share a similar post-processing pipeline to extract an initial graph from the segmentation output. The segmentation output is first thresholded to obtain a binary mask. Then, they apply morphological thinning [18] to produce a mask where roads are represented as one-pixel-wide centerlines. This mask is interpreted as a graph, where set pixels are vertices and edges connect adjacent set pixels. The graph is simplified with the Douglas-Peucker method [7].
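The graph-extraction step of this pipeline can be sketched as follows, assuming the thresholding and thinning have already produced a one-pixel-wide binary mask (a simplified illustration; real pipelines also simplify the result, e.g. with Douglas-Peucker):

```python
import numpy as np

def mask_to_graph(skeleton):
    """Interpret a one-pixel-wide binary road mask as a graph: set pixels
    become vertices, and edges connect 8-adjacent set pixels."""
    h, w = skeleton.shape
    vertices = [(i, j) for i in range(h) for j in range(w) if skeleton[i, j]]
    vset = set(vertices)
    edges = set()
    for (i, j) in vertices:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if (di, dj) == (0, 0):
                    continue
                n = (i + di, j + dj)
                if n in vset:
                    # Sort the endpoint pair so each undirected edge
                    # is recorded only once.
                    edges.add(tuple(sorted([(i, j), n])))
    return vertices, sorted(edges)
```

A three-pixel horizontal centerline, for instance, yields three vertices connected by two edges.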
Because the CNN is trained with a loss function evaluated independently on each pixel, it will yield a noisy output in regions where it is unsure about the presence of a road. As shown in Figure 2(a) and (b), noise in the segmentation output will be reflected in the extracted graph. Thus, several methods have been proposed to refine the initial extracted graph. Figure 2(c) shows the graph after applying three refinement heuristics: pruning short dangling segments, extending dead-end segments, and merging nearby junctions.

Although refinement is sufficient to remove basic types of noise, as in Figure 2, we find that many forms of noise are too extensive to compensate for. In Figure 3, we show an example where the segmentation output contains many gaps, leading to a disconnected graph with poor coverage. Given this segmentation output, even a human would find it difficult to accurately map the road network. Because the CNN is trained only to classify individual pixels in an image as roads, it leaves us with an untenable jigsaw puzzle of deciding which pixels form the road centerlines, and where these centerlines should be connected.
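The first of these refinement heuristics, pruning short dangling segments, might look like the following (a minimal sketch with an assumed length threshold; the actual thresholds and the other two heuristics are omitted):

```python
def prune_short_dangling(adj, edge_len, min_len):
    """Refinement heuristic: iteratively remove degree-1 (dead-end)
    vertices whose single incident edge is shorter than min_len.
    adj maps vertex -> set of neighbors; edge_len maps
    frozenset({u, v}) -> segment length."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v in adj and len(adj[v]) == 1:
                (u,) = adj[v]
                if edge_len[frozenset((u, v))] < min_len:
                    adj[u].discard(v)   # detach the dangling vertex
                    del adj[v]
                    changed = True
    return adj
```

Note that because the pass repeats until no change occurs, a chain of several short edges can be pruned one vertex at a time.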
These findings convinced us that we need a different approach that can produce a road network directly, without going through the noisy intermediate image representation of the road network. We propose an iterative graph construction architecture to do this. By breaking down the mapping process into a series of steps that build a road network graph iteratively, we will show that we can derive a road network from the CNN, thereby eliminating the need for a complex post-processing pipeline and yielding more accurate maps.
3.2 RoadTracer: Iterative Graph Construction
In contrast to the segmentation approach, our approach consists of a search algorithm, guided by a decision function implemented via a CNN, to compute the graph iteratively. The search walks along roads starting from a single location known to be on the road network. Vertices and edges are added in the path that the search follows. The decision function is invoked at each step to determine the best action to take: either add an edge to the road network, or step back to the previous vertex in the search tree. Algorithm 1 shows the pseudocode for the search procedure.
Search algorithm. We input a region (v0, B), where v0 is the known starting location, and B is a bounding box defining the area in which we want to infer the road network. The search algorithm maintains a graph G and a stack S of vertices that both initially contain only the single vertex v0. S_top, the vertex at the top of S, represents the current location of the search.
At each step, the decision function is presented with G, S, and an aerial image centered at S_top's location. It can decide either to walk a fixed distance D (we use D = 12 meters) forward from S_top along a certain direction, or to stop and return to the vertex preceding S_top in S. When walking, the decision function selects the direction from a set of a angles that are uniformly distributed in [0, 2π). Then, the search algorithm adds a vertex u at the new location (i.e., D away from S_top along the selected angle), along with an edge (S_top, u), and pushes u onto S (in effect moving the search to u).

If the decision process decides to “stop” at any step, we pop S_top from S. Stopping indicates that there are no more unexplored roads (directions) adjacent to S_top. Note that because only new vertices are ever pushed onto S, a “stop” means that the search will never visit that vertex again.
Figure 4 shows an example of how the search proceeds at an intersection. When we reach the intersection, we first follow the upper branch, and once we reach the end of this branch, the decision function selects the “stop” action. Then, the search returns to each vertex previously explored along the upper branch. Because there are no other roads adjacent to the upper branch, the decision function continues to select the stop action until we come back to the intersection. At the intersection, the decision function leads the search down the lower branch. Once we reach the end of this branch, the decision function repeatedly selects the stop action until we come back to v0 and S becomes empty. When S is empty, the construction of the road network is complete.
Since road networks contain cycles, it is also possible that we will turn back onto an earlier explored path. The search algorithm includes a simple merging step to handle this: when processing a walk action, if the new vertex u is within a small merging distance of an existing vertex v in G, but the shortest distance in G from S_top to v exceeds a larger threshold, then we add an edge (S_top, v) and don't push a new vertex onto S. This heuristic prevents small loops from being created, e.g., if a road forks into two at a small angle.
Lastly, we may walk out of our bounding box B. To avoid this, when processing a walk action, if the new vertex would not be contained in B, then we treat the action as a stop.
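The search loop above can be sketched in a few lines (a simplified rendering of Algorithm 1; `decide` stands in for the CNN decision function, and the vertex-merging step is omitted):

```python
import math

def iterative_search(v0, in_bounds, decide, step=12.0):
    """Sketch of the iterative graph construction search. decide(V, E, top)
    returns None to stop, or an angle in radians to walk in; in_bounds(p)
    tests membership in the bounding box. step is the walk distance D."""
    V = [v0]                 # vertex coordinates
    E = []                   # edges as (index, index) pairs
    S = [0]                  # stack of vertex indices; S[-1] is S_top
    while S:
        top = S[-1]
        angle = decide(V, E, top)
        if angle is None:
            S.pop()          # stop: return to the preceding vertex
            continue
        x, y = V[top]
        p = (x + step * math.cos(angle), y + step * math.sin(angle))
        if not in_bounds(p):
            S.pop()          # walking out of bounds is treated as a stop
            continue
        V.append(p)
        E.append((top, len(V) - 1))
        S.append(len(V) - 1) # move the search to the new vertex
    return V, E
```

With a toy decision function that walks east twice and then always stops, the search traces a straight two-segment road and terminates once the stack unwinds.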
CNN decision function. A crucial component of our algorithm is the decision function, which we implement with a CNN. The input layer consists of a square window centered on S_top. This window has four channels. The first three channels are the RGB values of the portion of aerial imagery around S_top. The fourth channel is the graph constructed so far, G. We render G by drawing anti-aliased lines along the edges of G that fall inside the window. Including G in the input to the CNN is a noteworthy aspect of our method. First, it allows the CNN to understand which roads in the aerial imagery have been explored earlier in the search, in effect moving the problem of excluding these roads from post-processing to the CNN. Second, it provides the CNN with useful context; e.g., when encountering a portion of aerial imagery occluded by a tall building, the CNN can use the presence or absence of edges on either side of the building to help determine whether the building occludes a road.
The output layer consists of two components: an action component that decides between walking and stopping, and an angle component that decides which angle to walk in. The action component is a softmax layer with two outputs, O_action = (O_walk, O_stop). The angle component is a sigmoid layer with a neurons, O_angle,1, …, O_angle,a, where each O_angle,i corresponds to an angle to walk in. We use a threshold T to decide between walking and stopping: if O_walk ≥ T, then walk in the angle corresponding to argmax_i O_angle,i; otherwise, stop.

We noted earlier that our solution does not require complex post-processing heuristics, unlike segmentation-based methods where CNN outputs are noisy. The only post-processing required in our decision function is to check a threshold on the CNN outputs and select the maximum index of the output vector. Thus, our method enables the CNN to directly produce a road network graph.
4 Iterative Graph Construction CNN Training
We now discuss the training procedure for the decision function. We assume we have a ground truth map G* (e.g., from OpenStreetMap). Training the CNN is non-trivial: the CNN takes as input a partial graph G (generated by the search algorithm) and outputs the desirability of walking at various angles, but we only have this ground truth map. How might we use G* to generate training examples?
4.1 Static Training Dataset
We initially attempted to generate a static set of training examples. For each training example, we sample a region (v0, B) and a step count n, and initialize a search. We run n steps of the search using an “oracle” decision function that uses G* to always make optimal decisions. The state of the search algorithm immediately preceding the nth step is the input for the training example, while the action taken by the oracle on the nth step is used to create a target output o* = (o*_action, o*_angle). We can then train a CNN using gradient descent by backpropagating a cross-entropy loss between O_action and o*_action, and, if the oracle decides to walk, a mean-squared error loss between O_angle and o*_angle.

However, we found that although the CNN can achieve high performance in terms of the loss function on the training examples, it performs poorly during inference. This is because G is essentially perfect in every example that the CNN sees during training, as it is constructed by the oracle based on the ground truth map. During inference, however, the CNN may choose angles that are slightly off from the ones predicted by the oracle, resulting in small errors in G. Then, because the CNN has not been trained on imperfect inputs, these small errors lead to larger prediction errors, which in turn result in even larger errors.
Figure 5 shows a typical example of this snowball effect. The CNN does not output the ideal angle at the turn; this causes it to quickly veer off the actual road because it never saw such deviations from the road during training, and hence it cannot correct course. We tried to mitigate this problem by using various methods to introduce noise on G in the training examples. Although this reduces the scale of the problem, the CNN still yields low performance at inference time, because the noise that we introduce does not match the characteristics of the noise introduced inherently by the CNN during inference. Thus, we conclude that a static training dataset is not suitable.
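The per-example loss used in the static-training setup can be sketched as follows (an illustrative reading of the text: cross-entropy on the action softmax, plus mean-squared error on the angle outputs only when the oracle action is a walk):

```python
import numpy as np

def static_training_loss(o_action, o_angle, t_action, t_angle):
    """Loss for one oracle-labeled example. o_action/o_angle are the CNN
    outputs; t_action is the one-hot oracle action ([walk, stop]) and
    t_angle the oracle angle targets."""
    eps = 1e-9  # avoid log(0)
    ce = -np.sum(t_action * np.log(np.asarray(o_action) + eps))
    mse = 0.0
    if t_action[0] == 1:  # index 0 = walk: angle loss applies
        mse = np.mean((np.asarray(o_angle) - np.asarray(t_angle)) ** 2)
    return ce + mse
```

A perfect walk prediction incurs (near-)zero loss, while an uncertain stop prediction incurs the usual log(2) cross-entropy.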
4.2 Dynamic Labels
We instead generate training examples dynamically by running the search algorithm with the CNN as the decision function during training. As the CNN model evolves, we generate new training examples as well.
Given a region (v0, B), training begins by initializing an instance of the search algorithm with a partial graph G (initially containing only v0) and a vertex stack S. On each training step, as during inference, we feed forward the CNN to decide on an action based on the output layer, and update G and S based on that action.
In addition to deciding on the action, we also determine the action that an oracle would take, and train the CNN to learn that action. The key difference from the static dataset approach is that, here, G and S are updated based on the CNN output and not the oracle output; the oracle is only used to compute a label for backpropagation.
The basic strategy is similar to before. On each training step, based on G*, we first identify the set R of angles along which there are unexplored roads from S_top. Next, we convert R into a target output vector o*. If R is empty, then o*_action = stop. Otherwise, o*_action = walk, and for each angle a ∈ R, we set o*_angle,i = 1, where i is the closest walkable angle to a. Lastly, we compute a loss between the CNN output and o*, and apply backpropagation to update the CNN parameters.
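The target-vector construction can be sketched as follows (a simplified illustration; the output width of 64 angle buckets is an assumption):

```python
import numpy as np

def make_target(unexplored_angles, n_angles=64):
    """Build the dynamic-label target o*: if no unexplored roads remain,
    the target action is 'stop'; otherwise 'walk', with a 1 at the
    closest walkable angle bucket for each unexplored road direction
    (angles given in radians in [0, 2*pi))."""
    t_angle = np.zeros(n_angles)
    if not unexplored_angles:
        return np.array([0.0, 1.0]), t_angle   # [walk, stop]: stop
    for a in unexplored_angles:
        i = int(round(a / (2 * np.pi) * n_angles)) % n_angles
        t_angle[i] = 1.0                       # closest walkable angle
    return np.array([1.0, 0.0]), t_angle       # walk
```

For example, unexplored roads due east (angle 0) and due west (angle π) set the buckets at indices 0 and n_angles/2.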
A key challenge is deciding where to start the walk in G* when picking the next vertex. The naive approach is to start the walk from the closest location in G* to S_top. However, as the example in Figure 6 illustrates, this approach can direct the system towards the wrong road when G differs from G*.
To solve this problem, we apply a map-matching algorithm to find a path in G* that is most similar to a path p in G ending at S_top. To obtain p, we perform a random walk in G starting from S_top. We stop the random walk when we have traversed a configurable number of vertices, or when there are no vertices adjacent to the current vertex that haven't already been traversed earlier in the walk. Then, we match p to the path in G* to which it is most similar, using a standard map-matching method based on the Viterbi algorithm [15]. If v* is the endpoint of the last edge in the matched path, we start our walk in G* at v*.
Finally, we maintain a set E of edges of G* that have already been explored during the walk; E is initially empty. On each training step, after deriving the matched path by map-matching, we add each of its edges to E. Then, when performing the walk in G*, we avoid traversing edges in E again.
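The random walk used to obtain the path in G might be sketched like this (an illustrative version; the vertex budget and seeding are assumptions, and the walk direction is reversed relative to "ending at S_top" for simplicity):

```python
import random

def random_walk(adj, start, max_vertices=10, seed=0):
    """Walk randomly through graph adj (vertex -> iterable of neighbors)
    from start, stopping after max_vertices or when every neighbor of
    the current vertex has already been traversed in this walk."""
    rng = random.Random(seed)  # seeded for reproducibility
    path = [start]
    visited = {start}
    while len(path) < max_vertices:
        options = [u for u in adj[path[-1]] if u not in visited]
        if not options:
            break
        nxt = rng.choice(options)
        path.append(nxt)
        visited.add(nxt)
    return path
```

On a simple chain graph there is only one untraversed neighbor at each step, so the walk deterministically traces the chain until the vertex budget runs out.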
5 Evaluation
Dataset. To evaluate our approach, we assemble a large corpus of high-resolution satellite imagery and ground truth road network graphs covering the urban core of forty cities across six countries. For each city, our dataset covers a region of approximately 24 sq km around the city center. We obtain satellite imagery from Google at 60 cm/pixel resolution, and the road network from OSM (we exclude certain non-roads that appear in OSM, such as pedestrian paths and parking lots). We convert the coordinate system of the road network so that the vertex spatial coordinate annotations correspond to pixels in the satellite images.
We split our dataset into a training set with 25 cities and a test set with the 15 other cities. To our knowledge, we conduct the first evaluation of automatic mapping approaches in which systems are trained and evaluated on entirely separate cities, rather than merely different regions of one city, and also the first large-scale evaluation over aerial images from several cities. Because many properties of roads vary greatly from city to city, the ability of an automatic mapping approach to perform well even on cities not seen during training is crucial; the regions where automatic mapping holds the most potential are precisely those where existing maps are nonexistent or inaccurate.
Baselines. We compare RoadTracer with two baselines: DeepRoadMapper [10] and our own segmentation-based approach. Because the authors were unable to release their software to us, we implemented DeepRoadMapper, which trains a residual network with a soft intersection-over-union (IoU) loss function, extracts a graph using thresholding and thinning, and refines the graph with a set of heuristics and a missing-connection classifier.
However, we find that the IoU loss results in many gaps in the segmentation output, yielding poor performance. Thus, we also implement our own segmentation approach, which outperforms DeepRoadMapper on our dataset: we train with cross-entropy loss, and refine the graph using a four-stage, purely heuristic cleaning process that prunes short segments, removes small connected components, extends dead-end segments, and merges nearby junctions.
Metrics. We evaluate RoadTracer and the segmentation schemes on TOPO [3], SP [16], and a new junction metric defined below. TOPO and SP are commonly used in the automatic road map inference literature [4, 14, 17, 1]. TOPO simulates a car driving a certain distance from several seed locations, and compares the destinations that can be reached in the inferred map with those that can be reached in the ground truth map in terms of precision and recall. SP generates a large number of origin-destination pairs, computes the shortest path between the origin and the destination in both the inferred and ground truth maps for each pair, and outputs the fraction of pairs where the shortest paths are similar (distances within 5%).

However, we find that both TOPO and SP tend to assign higher scores to noisier maps, and thus don't correlate well with the usability of an inferred map. Additionally, these metrics make it difficult to reason about the cause of a low or high score.
Thus, we propose a new evaluation metric with two goals: (a) to give a score that is representative of the inferred map's practical usability, and (b) to be interpretable. Our metric compares the ground truth and inferred maps junction by junction, where a junction is any vertex with three or more edges. We first identify pairs of corresponding junctions (u, v), where u is in the ground truth map and v is in the inferred map. Then, f_u is the fraction of incident edges of u that are captured around v, and f_v is the fraction of incident edges of v that appear around u. For each unpaired ground truth junction u, f_u = 0, and for each unpaired inferred map junction v, f_v = 0. Finally, averaging f_u over the ground truth junctions gives the correct junction fraction f_correct, and averaging 1 − f_v over the inferred junctions gives the error rate e_rate. TOPO and our junction metric yield a precision-recall curve, while SP produces a single similar-path count.
Table 1: SP results at the best-performing thresholds, showing the fraction of origin-destination pairs where the shortest path in the inferred map is correct, too long, too short, or nonexistent.

Scheme          Correct  Long  Short  No Path
DeepRoadMapper  0.21     0.29  0.03   0.47
Seg. (Ours)     0.58     0.14  0.27   0.01
RoadTracer      0.72     0.16  0.10   0.02
Quantitative Results. We evaluate the performance of the three methods on the 15 cities in the test set. We supply starting locations for RoadTracer by identifying peaks in the output of our segmentation-based approach. All three approaches are fully automated.
Both RoadTracer and the segmentation approaches have parameters that offer a tradeoff between recall and error rate (e_rate). We vary these parameters and plot results for our junction metric and TOPO on a scatterplot where one axis corresponds to recall and the other corresponds to error rate. For DeepRoadMapper and our segmentation approach, we vary the threshold used to produce a binary mask. We find that the threshold does not impact the graph produced by DeepRoadMapper, as the IoU loss pushes most outputs to the extremes, and thus plot only one point. For RoadTracer, we vary the walk-stop action threshold T.
We report performance in terms of average f_correct and e_rate across the test cities in Figure 7, and in terms of average TOPO precision and recall in Figure 8.
On the junction metric, RoadTracer has a better f_correct for a given e_rate. The performance improvement is most significant when error rates are between 5% and 10%, which is the range that offers the best tradeoff between recall and error rate for most applications: when error rates are over 10%, the amount of noise is too high for the map to be usable, and when error rates are less than 5%, too few roads are recovered (see Figure 9). When the error rate is 5%, the maps inferred by RoadTracer have 45% better average recall (f_correct) than those inferred by the segmentation approach (0.58 vs. 0.40).
On TOPO, RoadTracer has a lower error rate than the segmentation approaches when the recall is less than 0.43. Above 0.43 recall, where the curves cross, further lowering the threshold T in RoadTracer yields only a marginal improvement in recall, but a significant increase in the error rate. However, the segmentation approach outperforms RoadTracer only at error rates larger than 0.14; we show in Figure 9 that inferred maps with such high error rates are not usable.
We report SP results for the thresholds that yield the highest number of correct shortest paths in Table 1. RoadTracer outperforms the segmentation approach because noise in the output of the segmentation approach causes many instances where the shortest path in the inferred graph is much shorter than the path in the ground truth graph.
Our DeepRoadMapper implementation performs poorly on our dataset. We believe that the soft IoU loss is not well-suited to the frequency of occlusion and complex topology found in the city regions in our dataset.
Qualitative Results. In Figure 10, we show qualitative results on crops from four cities in the test set: Chicago, Boston, Salt Lake City, and Toronto. For RoadTracer and our segmentation approach, we show inferred maps at the threshold that yields 5% average e_rate. DeepRoadMapper only produces one map.
RoadTracer performs much better under the frequent occlusion by buildings and shadows in the Chicago and Boston regions. Although the segmentation approach is able to achieve similar recall in Boston at the lowest threshold (not shown), several incorrect segments are added to the map. In the Salt Lake City and Toronto regions, performance is comparable. DeepRoadMapper's soft IoU loss introduces many disconnections in all four regions, and the missing-connection classifier in its post-processing stage can only correct some of these.
We include more outputs in the supplementary material, and make our code, full-resolution outputs, and videos showing RoadTracer in action available at https://roadmaps.csail.mit.edu/roadtracer.
6 Conclusion
On the face of it, using deep learning to infer a road network graph seems straightforward: train a CNN to recognize which pixels belong to a road, produce the polylines, and then connect them. But occlusions and lighting conditions pose challenges, and such a segmentation-based approach requires complex post-processing heuristics. By contrast, our iterative graph construction method uses a CNN-guided search to directly output a graph. We showed how to construct training examples dynamically for this method, and evaluated it on 15 cities, having trained on aerial imagery from 25 entirely different cities. To our knowledge, this is the largest map-inference evaluation to date, and the first that fully separates the training and test cities. Our principal experimental finding is that, at a 5% error rate, RoadTracer correctly captures 45% more junctions than our segmentation approach (0.58 vs. 0.40). Hence, we believe that our work presents an important step forward in fully automating map construction from aerial images.
7 Acknowledgements
This research was supported in part by the Qatar Computing Research Institute (QCRI).
References
 [1] M. Ahmed, S. Karagiorgou, D. Pfoser, and C. Wenk. A comparison and evaluation of map construction algorithms using vehicle tracking data. GeoInformatica, 19(3):601–632, 2015.
 [2] M. Barzohar and D. B. Cooper. Automatic finding of main roads in aerial images by using geometric-stochastic models and estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7):707–721, 1996.
 [3] J. Biagioni and J. Eriksson. Inferring road maps from global positioning system traces. Transportation Research Record: Journal of the Transportation Research Board, 2291(1):61–71, 2012.
 [4] J. Biagioni and J. Eriksson. Map inference in the face of noise and disparity. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pages 79–88. ACM, 2012.
 [5] G. Cheng, Y. Wang, S. Xu, H. Wang, S. Xiang, and C. Pan. Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing, 55(6):3322–3337, 2017.
 [6] D. Costea and M. Leordeanu. Aerial image geolocalization from recognition and matching of roads and intersections. arXiv preprint arXiv:1605.08323, 2016.
 [7] D. H. Douglas and T. K. Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10(2):112–122, 1973.
 [8] S. Hinz and A. Baumgartner. Automatic extraction of urban road networks from multi-view aerial imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 58(1):83–98, 2003.
 [9] J. Hu, A. Razdan, J. C. Femiani, M. Cui, and P. Wonka. Road network extraction and intersection detection from aerial images by tracking road footprints. IEEE Transactions on Geoscience and Remote Sensing, 45(12):4144–4157, 2007.

 [10] G. Máttyus, W. Luo, and R. Urtasun. DeepRoadMapper: Extracting road topology from aerial images. In Proceedings of the IEEE International Conference on Computer Vision, pages 3438–3446, 2017.
 [11] G. Miller. The Huge, Unseen Operation Behind the Accuracy of Google Maps. https://www.wired.com/2014/12/google-maps-ground-truth/, Dec. 2014.
 [12] V. Mnih and G. E. Hinton. Learning to detect roads in high-resolution aerial images. In European Conference on Computer Vision, pages 210–223. Springer, 2010.
 [13] Z. Shan, H. Wu, W. Sun, and B. Zheng. COBWEB: A robust map update system using GPS trajectories. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 927–937. ACM, 2015.
 [14] R. Stanojevic, S. Abbar, S. Thirumuruganathan, S. Chawla, F. Filali, and A. Aleimat. Kharita: Robust map inference using graph spanners. arXiv preprint arXiv:1702.06025, 2017.
 [15] A. Thiagarajan, L. Ravindranath, K. LaCurts, S. Madden, H. Balakrishnan, S. Toledo, and J. Eriksson. VTrack: Accurate, energyaware road traffic delay estimation using mobile phones. In Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems, pages 85–98. ACM, 2009.
 [16] J. D. Wegner, J. A. MontoyaZegarra, and K. Schindler. A higherorder CRF model for road network extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1698–1705, 2013.
 [17] J. D. Wegner, J. A. MontoyaZegarra, and K. Schindler. Road networks as collections of minimum cost paths. ISPRS Journal of Photogrammetry and Remote Sensing, 108:128–137, 2015.
 [18] T. Zhang and C. Y. Suen. A fast parallel algorithm for thinning digital patterns. Communications of the ACM, 27(3):236–239, 1984.