Road Network Reconstruction from Satellite Images with Machine Learning Supported by Topological Methods

09/15/2019 ∙ by Tamal K. Dey, et al. ∙ The Ohio State University

Automatic extraction of road networks from satellite images is a goal that can benefit and even enable new technologies. Methods that combine machine learning (ML) and computer vision have been proposed in recent years; they make the task semi-automatic by requiring the user to provide curated training samples. The process can be fully automated if training samples can be produced algorithmically. This requires a robust algorithm that reconstructs road networks from satellite images reliably enough for its output to serve as training samples. In this work, we develop such a technique by infusing a persistence-guided, discrete-Morse based graph reconstruction algorithm into an ML framework. We present our contributions in two phases. First, in a semi-automatic framework, we combine a discrete-Morse based graph reconstruction algorithm with an existing CNN framework to segment input satellite images. We show that this leads to reconstructions with better connectivity and less noise. Next, in a fully automatic framework, we leverage the power of the discrete-Morse based graph reconstruction algorithm to train a CNN from a collection of images without labelled data, and use the same algorithm to produce the final output from the segmented images created by the trained CNN. We apply the discrete-Morse based graph reconstruction algorithm iteratively to improve the accuracy of the CNN. We show promising experimental results of this new framework on datasets from the SpaceNet Challenge.




1 Introduction

The layout of road networks is essential for diverse applications in geographic information systems. Efficient reconstruction from images and timely updates of road networks are important both for map design and for handling events such as natural disasters. The availability of high-resolution satellite images has enabled such technology in recent years, though the process is not fully automatic. Currently, road extraction from satellite images is mainly completed manually [20]. Doing so automatically, or even semi-automatically, in a reliable manner is challenging, as there are many different types of roads whose images are cluttered with noise and occlusions (by cars, trees, etc.).

Extracting lane-related information from high resolution satellite images has been addressed in recent years [11, 19, 24]. Specifically for road extraction, a range of methods that combine machine learning and computer vision have been proposed to reconstruct roads using labelled data. These are semi-automatic in the sense that they use manually curated samples to train the classifier. These methods often consist of two main stages: background segmentation, followed by centerline extraction. The background segmentation is usually done via machine learning methods such as feature extraction and pixel-wise label prediction with SVM [4, 16] or CNN [3]. More recently, a CNN framework called U-Net [15] was proposed that outputs the segmentations directly, improving the predictions and the running time significantly. The baseline algorithms for the SpaceNet Challenge [20] use architectures such as U-Net and PSPNet [25]. For the second stage, methods like skeleton or medial axis extraction with pre- and post-processing are often used to obtain the final road networks. However, recovering the correct connections and junctions of roads still remains challenging. This problem is critical since the road network is often used in routing, and false breaks in the extraction lead to unacceptable results. This two-stage approach can potentially be fully automated if training samples can be produced algorithmically.

To achieve full automation, one needs a direct reconstruction from images that may not be completely faithful but is reliable enough to serve as the generator of good training samples. Then, one can iteratively use the technique to improve upon the training samples. This is what we achieve in this work for automatic road reconstruction. It turns out that our direct reconstruction method even improves over the state-of-the-art techniques for semi-automatic reconstruction by providing a more robust algorithm for the second stage. Our direct reconstruction method is a topology-based graph reconstruction algorithm. It uses the recent techniques of topological persistence [8] and discrete Morse theory [10] from topological data analysis. This topology-based approach for recovering hidden structures has been proposed and studied recently [5, 12, 14, 22]. It has been applied to extracting graph-like structures from simulated dark matter density fields [18] and to reconstructing road networks from GPS traces [21, 6]. This discrete-Morse based graph reconstruction framework is clean both conceptually and implementation-wise. Most importantly, as it uses a global topological structure to make decisions (instead of using purely local information to decide whether a point is on or off the road), the algorithm is robust to noise and to non-uniform sampling of the data, and is reliable at recovering junctions. Very recently, this graph reconstruction algorithm has been further simplified, and theoretical guarantees have been provided for the case when the signal prevails over the noise [7].

Specific contributions:

Our contribution is twofold.

(1) First, in a semi-automatic framework, we apply the discrete-Morse based graph reconstruction algorithm to the segmented satellite images obtained by a CNN. This, of course, requires user-provided training samples to train the CNN. We show that this leads to reconstructions with better network connectivity and less noise compared to an existing state-of-the-art technique.

(2) More importantly, in a fully automatic framework, we develop a novel method that leverages the discrete-Morse based graph reconstruction algorithm to train a CNN from a collection of images without labelled data, so that it can produce segmentations for new images. To elaborate, we start by running the graph reconstruction algorithm on the raw satellite images to obtain some initial reconstructions. We then mark the pixels from reliable branches of the output graph as positive and all others as negative to create the labels for training, and produce an intermediate CNN classifier. We predict the segmented images for the training set using this intermediate CNN and then repeat the same process on the output to gradually improve the CNN. Our experiments show that after several iterations of training, the labels computed from the graph reconstruction algorithm become less noisy and the performance of the classifier improves significantly. If we relax the condition slightly and assume that we know the labels for only 10% of the training set, we can incorporate this partially labelled data into our framework, and the performance of the classifier becomes even better.

We experiment on datasets from SpaceNet Challenge [20] which consists of high resolution images for four cities. For the semi-automatic framework, we compare our results with the results of the winner’s algorithm, using the APLS score (defined in [20]) as well as another metric which we call Average Hausdorff distance, to evaluate the quality of the reconstructed networks compared to the ground-truth (provided by SpaceNet Challenge). Overall, our reconstructions tend to have better connectivity and are less noisy. For the fully automatic framework, we show that the reconstruction quality is significantly improved through our iterative training process. Furthermore, our framework can be modified to include a small set of labelled data and the accuracy improves as we use more and more labelled data.

This paper is organized as follows. Section 2 briefly describes the idea of the discrete-Morse based graph reconstruction algorithm. Section 3 introduces our semi-automatic and fully automatic frameworks. Section 4 provides experimental results for both frameworks and discusses limitations and future work.

2 Discrete-Morse based graph reconstruction

Figure 1: High level road network reconstruction from satellite images framework pipeline.

At a high level, the framework for reconstructing road networks from satellite images has two stages; see Figure 1. In Stage 1, we use machine learning techniques to convert a given satellite image into a segmented image where, roughly speaking, the value at each pixel represents the likelihood of this pixel being on or around a road. In Stage 2, we extract the hidden road network (graph) from this segmented image.

We apply the simplified version of our discrete-Morse based graph reconstruction algorithm [7] to extract road networks from the segmented images (sometimes called “road masking” in the literature). Given that our approach mostly uses this reconstruction algorithm as a black box, we only provide a high-level description of the main ideas here. Interested readers should see [7] for more details.

Figure 2: (a) An input density field on the plane (top corner) and its terrain view. (b) Mountain ridges of the terrain (black lines) capture the road network.

Given a segmented image, we view it as a density field defined on a 2D grid, where the function value at each vertex reflects the likelihood of the corresponding pixel belonging to the road class. The goal is to extract a graph that represents the hidden road network.

In particular, for simplicity, assume we have a triangulation $K$ of the grid (image), so that the segmented image can be viewed as a “density function” $\rho$ on $K$, with $\rho(x)$ reflecting how likely $x$ is in the road class. The graph of this density function can be viewed as a terrain with the height of a point $x$ being its function value $\rho(x)$; see Figure 2(a). Algorithm MorseGraphRecon() of [7] (developed based on earlier work, e.g., [5, 12, 18, 21]) proposes to use the “mountain ridge” of this terrain to describe the hidden graph. Intuitively, the mountain ridge structures are formed by those flow lines (following the steepest descending direction) that connect maxima and saddles of this terrain. Curves in the mountain ridges connect mountain peaks and saddles, and separate different “valleys”. Locally, a point on such a curve has a higher function value than points off the curve in the direction orthogonal to the curve. This is consistent with what a “road” should be: points on a road have higher “density” than points off the road in the orthogonal direction, though such a point may not have the highest density value along the road itself. See Figure 2(b).

Algorithm MorseGraphRecon() extracts the “mountain ridges” from the input density function (terrain) via the so-called 1-stable manifolds from Morse theory. For the sake of efficient and numerically stable computation, it uses the discrete Morse theory [10] to implement it. Very importantly, the algorithm also uses the concept of persistent homology [9] to capture “importance” of different pieces of 1-stable manifolds (more precisely, important max-saddle pairs) in a meaningful manner. This allows the algorithm to remove noise and simplify the output graph (road network) systematically.

Notice that, since this algorithm uses the global “mountain ridges” to infer the hidden networks, it does not need to identify the junction nodes separately, and it can also bridge small gaps in the density field. The algorithm is clean (it uses only one parameter) and efficient. It takes $O(n \log n)$ time for a planar triangulation with $n$ vertices.
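To make the role of persistence more concrete, the following is a minimal sketch, not the MorseGraphRecon() algorithm of [7]. It only computes the persistence of local maxima of a density grid by sweeping pixels from high to low values with a union-find structure, which is a simplified stand-in for the max-saddle pairing mentioned above. The function names `maxima_persistence` and `significant_maxima`, the 4-neighbour grid connectivity, and the threshold name `delta` are assumptions made for illustration.

```python
def maxima_persistence(density):
    """Return (birth, death, peak_pixel) triples for local maxima of a 2D grid.

    Pixels are swept from high to low value. A component is born at a local
    maximum and dies when it merges into a component with a higher peak.
    The global maximum never dies, so it does not appear in the output.
    """
    rows, cols = len(density), len(density[0])
    order = sorted(
        ((density[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    parent = {}  # union-find forest over the pixels visited so far
    peak = {}    # component root -> (peak value, peak pixel)

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pairs = []
    for value, r, c in order:
        p = (r, c)
        parent[p] = p
        peak[p] = (value, p)
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # neighbour outside the grid or not visited yet
            a, b = find(p), find(q)
            if a == b:
                continue
            if peak[a][0] < peak[b][0]:
                a, b = b, a  # keep `a` as the component with the higher peak
            if peak[b][0] > value:  # skip pairs with zero persistence
                pairs.append((peak[b][0], value, peak[b][1]))
            parent[b] = a
    return pairs


def significant_maxima(density, delta):
    """Keep only maxima whose persistence (birth - death) is at least delta."""
    return [pk for birth, death, pk in maxima_persistence(density)
            if birth - death >= delta]
```

With a single threshold `delta`, low-persistence bumps are discarded while prominent peaks survive, which mirrors how one parameter controls the simplification of the output graph.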

3 Approaches

3.1 Semi-automatic framework

The semi-automatic framework follows the high-level two-stage approach outlined in Figure 1. In the first stage, we train a CNN using training images labeled with ground-truth roads. Given a raw satellite image, we feed it to this trained CNN to obtain a segmented image. In the second stage, we apply the discrete-Morse based graph reconstruction algorithm to extract the road network from the segmented image. For the second stage to work more accurately, we need to detect road ends, called “tips”, in the segmented images obtained in the first stage. We take advantage of the CNN to add a simple “tip-detection” stage that enhances the segmented images. The overall pipeline for Stage 1 of the semi-automatic framework is shown in Figure 3. The inputs for the framework are high resolution satellite images, which are split into a test set and a training set. The training set has ground truth graphs (obtained manually) that represent the centerlines of the road networks. The road labels for training are created by thickening the ground truth graph and labeling pixels inside the thickened graph as positive and all others as negative.
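The label-creation step (thickening the ground truth graph) can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function name `thicken_centerlines`, the representation of the graph as a list of straight segments in (row, column) pixel coordinates, and the parameter `half_width` are all hypothetical.

```python
import numpy as np


def thicken_centerlines(segments, shape, half_width):
    """Boolean mask of pixels within `half_width` pixels of any segment.

    `segments` is a list of ((r0, c0), (r1, c1)) centerline pieces.
    Pixels in the mask are the positive labels; all others are negative.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    pts = np.stack([rows, cols], axis=-1).astype(float)  # shape (H, W, 2)
    mask = np.zeros(shape, dtype=bool)
    for start, end in segments:
        a = np.asarray(start, dtype=float)
        b = np.asarray(end, dtype=float)
        ab = b - a
        denom = float(ab @ ab)
        if denom == 0.0:
            t = np.zeros(shape)  # degenerate segment: a single point
        else:
            # Parameter of the orthogonal projection, clamped to the segment.
            t = np.clip(((pts - a) @ ab) / denom, 0.0, 1.0)
        closest = a + t[..., None] * ab
        dist = np.linalg.norm(pts - closest, axis=-1)
        mask |= dist <= half_width
    return mask
```

As a rough guide, if an image covers 400 m with 1300 px and the labelled road width is 4 m, a half width of about 6.5 px would correspond to that road width.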

Figure 3: The pipeline for Stage 1 (CNN training) for our semi-automatic framework.

CNN architecture

We use the architecture from the winner’s approach of the SpaceNet Challenge [1]. It uses ResNet34 [13] as the encoder and a U-Net-like [15] decoder.

Reconstructing tips.

The graph reconstruction algorithm MorseGraphRecon() may sometimes miss hanging branches. To remedy this, we propose a novel way to enhance the segmented images. In particular, following the edit strategy of [6], we raise the density values (i.e., the pixel values of the segmented images) at the tips to high values, which turns them into local maxima and in turn forces the reconstructed roads to connect to them. We develop two techniques to detect the tips: (1) learn the locations of the tips with the same CNN architecture; (2) detect the tips from the segmented images by checking the windows around points with high densities. As shown in Figure 3, we add up the segmented image and the two tip enhancements to obtain the final segmented image that is fed to Stage 2. Figure 4 compares reconstructions without and with tip enhancements.
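The enhancement step itself (not the tip detection) can be illustrated with a short sketch. It assumes the tip locations are already known as pixel coordinates; the function name `enhance_tips` and the parameters `radius` and `tip_value` are hypothetical choices, and the square window is only one possible way to raise the density around a tip.

```python
import numpy as np


def enhance_tips(segmented, tips, radius=2, tip_value=1.0):
    """Add a tip-enhancement map to a segmented image.

    Each tip raises the density in a small square window around it, so that
    the tip region stands out as a high-density area in the summed image.
    """
    segmented = np.asarray(segmented, dtype=float)
    height, width = segmented.shape
    tip_map = np.zeros_like(segmented)
    for r, c in tips:
        r0, r1 = max(r - radius, 0), min(r + radius + 1, height)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, width)
        tip_map[r0:r1, c0:c1] = tip_value
    # The segmented image and the enhancement are added up, as in Figure 3.
    return segmented + tip_map
```

In the pipeline described above there are two such enhancement maps (one from each detection technique); both would be added to the segmented image in the same way.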

Figure 4: Comparison of results without and with tip enhancements. Left: raw satellite images (yellow graphs are ground-truth road networks). Middle/Right: red-graphs are reconstructions without/with tip enhancement respectively, overlaid on top of the ground-truth graph (yellow). Dark colors are the learned density field.

3.2 Fully Automatic Framework

The ground truth labeling used in the semi-automatic framework is itself a graph-like structure. In this section, we propose to create the labels using the discrete-Morse based graph reconstruction algorithm without knowledge of the ground truth. These labels are used to train a CNN for image segmentation. The segmented images are again labeled by the output of the graph reconstruction algorithm and fed to the CNN for training. As our experiments show, a few iterations of training and labeling improve the quality of the image segmentation significantly. This framework is particularly useful when there is little or no labelled data to begin with.

Figure 5: Pipeline for Stage 1 (CNN training) for our fully automatic framework. Note that no input satellite image has labels for roads!

Our framework can deal with the following two scenarios. We perform label-free learning when we do not have ground truth roads for any input satellite image to begin with. We perform partially-labeled learning when we have a small fraction of images (say 10% of the training set) with road labels.

Label-free case.

We describe the framework for the label-free case; the partially-labeled case can be handled by a slight modification of it. The high-level pipeline of Stage 1 (training a CNN for segmenting an input image) is shown in Figure 5. Given an input set of raw satellite images (with no labels), we split it into a training set $\mathcal{I}_{train}$ and a testing set $\mathcal{I}_{test}$. We run algorithm MorseGraphRecon() on each image $I$ from $\mathcal{I}_{train}$, and let $G_I$ be its corresponding output. We use a large threshold $\delta$ for simplification in MorseGraphRecon() so as to generate a reconstruction of the more reliable part of the input. Then we label pixels on $G_I$ as positive and pixels on the complement of $G_I$ as negative. Next we train the CNN classifier with those labeled pixels, and this is our first classifier $C_1$ (shown in Algorithm 1).

Algorithm 1: MorseLabelTrain()
Data: Images $\mathcal{I}$
Result: Classifier $C$
begin
    for each image $I$ in $\mathcal{I}$ do
        Compute the triangulation $K$ of $I$ and take pixel values as density function $\rho$
        $G_I$ = MorseGraphRecon($K$, $\rho$, $\delta$)
        Label pixels on $G_I$ as positive, pixels on the complement of $G_I$ as negative
    Train a CNN classifier $C$ with the above labels
    return $C$

Now feeding each original training image from $\mathcal{I}_{train}$ to $C_1$ returns a collection of segmented images $\mathcal{S}_1$, where in each image, every pixel has a value reflecting the likelihood of it being positive (on the road). We repeat the steps with the images in $\mathcal{S}_1$ and obtain a new CNN classifier $C_2$. In a generic $k$-th iteration of this process, feeding the training images to $C_k$ returns segmented images $\mathcal{S}_k$, which we use to train a new CNN classifier $C_{k+1}$. The process terminates when the segmented images change little between iterations.
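The iterative scheme above can be summarized as a short control loop. The sketch below is only a skeleton under stated assumptions: `reconstruct`, `train` and `predict` are hypothetical callables standing in for the graph reconstruction (with labeling), the CNN training, and the CNN inference; images are represented as flat lists of floats; and the stopping rule based on a mean absolute change below `tol` is one possible reading of "change little between iterations".

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def iterative_label_training(images, reconstruct, train, predict,
                             max_iters=10, tol=1e-3):
    """Alternate between labeling by reconstruction and CNN training.

    Returns the last classifier and the number of iterations performed.
    """
    inputs, previous, classifier = images, None, None
    for iteration in range(1, max_iters + 1):
        # Label the current inputs (raw images first, segmented images later).
        labels = [reconstruct(x) for x in inputs]
        # Train on the original images with the freshly computed labels.
        classifier = train(images, labels)
        # Segment the original training images with the new classifier.
        segmented = [predict(classifier, img) for img in images]
        if previous is not None:
            change = max(mean_abs_diff(s, p)
                         for s, p in zip(segmented, previous))
            if change < tol:
                return classifier, iteration
        inputs = previous = segmented
    return classifier, max_iters
```

In the partially-labelled setting, the `train` callable would additionally receive the ground-truth labels of the labelled subset at every iteration.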

Partially-labelled case.

For this scenario, we start by training the CNN classifier using only the labelled training images to obtain the first classifier $C_1$. In each subsequent iteration $k$, we use both the labels computed from the segmented images at this iteration and the original labels from the ground truth.

4 Experiments


We consider data from the SpaceNet Challenge 3 [20]. It includes four cities: Las Vegas, Paris, Shanghai and Khartoum and consists of the original panchromatic band, the 1.24m resolution 8-band multi-spectral 11-bit geotiff, and a 30 cm resolution Pan-Sharpened 3-band and 8-band 16-bit geotiff. We only use the 30 cm resolution Pan-Sharpened 3-band (RGB) 16-bit geotiff in our experiments. Each image from the dataset covers 400m by 400m with a size of 1300px by 1300px. The ground truth for each image is a graph representing the centerline of the roads. The width of the roads in the masks is 4 meters. To evaluate the results, we need to compare the proposed graphs with the ground truth. So we only take the train set from this challenge (since ground truth is only known for this set).


The first metric we use to evaluate the results is the Average Path Length Similarity (APLS) [20]. This is the metric used for evaluation in SpaceNet Challenge 3.

Definition 4.1

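Let $G$ and $G'$ be two input graphs. For nodes $a, b \in G$ where a path from $a$ to $b$ exists in $G$, let $a'$ (resp. $b'$) denote the closest node to $a$ (resp. to $b$) in $G'$, and let $L(\cdot, \cdot)$ denote the length of the shortest path between two nodes. First we define the cost of the path $(a, b)$:

$$C(a, b) = \min\left\{1, \frac{|L(a, b) - L(a', b')|}{L(a, b)}\right\},$$

with $C(a, b) = 1$ when no path from $a'$ to $b'$ exists in $G'$.

We next define

$$S_{G \to G'} = 1 - \frac{1}{N} \sum_{(a, b)} C(a, b),$$

where $N$ = # unique paths in $G$, and we take the sum over all unique paths. Finally, the APLS score of $G$ and $G'$ is defined to be the harmonic mean of $S_{G \to G'}$ and $S_{G' \to G}$:

$$\mathrm{APLS}(G, G') = \frac{2}{\frac{1}{S_{G \to G'}} + \frac{1}{S_{G' \to G}}}.$$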
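A simplified version of this score can be sketched in plain Python. This is not the official SpaceNet implementation, which includes further steps (such as node snapping tolerances) that are omitted here. The graph representation (a dictionary of node coordinates plus an edge list), the use of Euclidean edge lengths, and the function names are assumptions made for illustration.

```python
import heapq
import math
from itertools import combinations


def shortest_lengths(nodes, edges, source):
    """Dijkstra: shortest path length from `source` to every reachable node."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        w = math.dist(nodes[u], nodes[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def one_sided_score(g, h):
    """1 minus the mean path cost, over node pairs connected in g."""
    (nodes_g, edges_g), (nodes_h, edges_h) = g, h
    if not nodes_h:
        return 0.0
    nearest = {a: min(nodes_h, key=lambda n: math.dist(nodes_h[n], nodes_g[a]))
               for a in nodes_g}
    dist_g = {a: shortest_lengths(nodes_g, edges_g, a) for a in nodes_g}
    dist_h = {a: shortest_lengths(nodes_h, edges_h, a) for a in nodes_h}
    costs = []
    for a, b in combinations(nodes_g, 2):
        length_g = dist_g[a].get(b)
        if not length_g:
            continue  # no path in g (or zero length): pair is not counted
        length_h = dist_h[nearest[a]].get(nearest[b])
        if length_h is None:
            costs.append(1.0)  # missing path gets the maximum cost
        else:
            costs.append(min(1.0, abs(length_g - length_h) / length_g))
    return 1.0 - sum(costs) / len(costs) if costs else 0.0


def apls(g, h):
    """Harmonic mean of the two one-sided scores."""
    s1, s2 = one_sided_score(g, h), one_sided_score(h, g)
    return 0.0 if s1 == 0.0 or s2 == 0.0 else 2.0 * s1 * s2 / (s1 + s2)
```

For a three-node path compared against a copy with one edge removed, the broken connection is penalized in one direction only, and the harmonic mean pulls the final score down accordingly.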

This metric sums the differences in optimal paths between nodes in the ground truth graph $G$ and the reconstructed graph $G'$. It consists of two parts: part 1 considers optimal paths in the ground truth graph, finds the corresponding paths in the reconstructed graph, and measures the differences; part 2, in the opposite direction, considers paths in the reconstructed graph, finds their correspondences in the ground truth, and compares them. The final score is the harmonic mean of these two parts. It rewards correct connections between nodes and punishes breaks in the roads. However, this metric may not be accurate when the graph is small and the total number of paths is small, since it evaluates the portion (ratio) of paths that match well. In this case, a small difference in the graphs could result in a relatively large difference in the score (see Figure 6). To obtain a more comprehensive picture, we also use the following Average Hausdorff distance:

Definition 4.2

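Suppose $G$ and $G'$ are two graphs; $P$ is the point set sampled from $G$; $P'$ is the point set sampled from $G'$, and $d(\cdot, \cdot)$ denotes the Euclidean distance. Then, the one-directional Hausdorff distance is:

$$d_H(G, G') = \begin{cases} \dfrac{1}{|P|} \sum_{p \in P} \min_{p' \in P'} d(p, p') & \text{if } P' \neq \emptyset, \\ \mathrm{MAX} & \text{otherwise.} \end{cases}$$

Here, MAX is a specific maximum value. We combine the two one-directional distances $d_H(G, G')$ and $d_H(G', G)$ into the final Average Hausdorff distance between $G$ and $G'$.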

Note that for APLS score, the higher the score is, the more similar the two graphs are. But for Average Hausdorff distance, the lower the distance is, the more similar the two graphs are.
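The one-directional distance can be sketched with NumPy as below. Several details are assumptions rather than facts from the text: the function name, the default `max_value` of 500 (the value quoted later for the experiments), the handling of empty point sets, and the decision to return both directions separately instead of a single combined number.

```python
import numpy as np


def one_directional_distance(p, q, max_value=500.0):
    """Average distance from each point of `p` to its nearest point of `q`.

    Assumption: if exactly one of the two point sets is empty, the distance
    is set to `max_value`; if both are empty, it is 0.
    """
    p = np.asarray(p, dtype=float).reshape(-1, 2)
    q = np.asarray(q, dtype=float).reshape(-1, 2)
    if len(p) == 0 and len(q) == 0:
        return 0.0
    if len(p) == 0 or len(q) == 0:
        return float(max_value)
    # Pairwise Euclidean distances, shape (len(p), len(q)).
    pairwise = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return float(pairwise.min(axis=1).mean())


def both_directions(p, q, max_value=500.0):
    """Return the pair (p to q, q to p) of one-directional distances."""
    return (one_directional_distance(p, q, max_value),
            one_directional_distance(q, p, max_value))
```

Because an empty reconstruction is assigned the maximum value, datasets with many empty outputs would show a large average, in line with the remark on AOI_3 in Table 2.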

Figure 6 (AOI_2, Id 1634): Left: input satellite image. Middle/Right: reconstruction of Buslaev’s method and of our method. The APLS-score / Average Hausdorff distance are 0.8560 / 14.0662 for Buslaev’s method and 0.4684 / 15.3238 for ours. Note that even though our reconstruction is very similar, the APLS-score is significantly lower due to the sparsity of the signal. Average Hausdorff distance is more accurate for this case.


There are several parameters in the entire pipeline, among which the persistence threshold (for the discrete-Morse based reconstruction algorithm) and the arc-intensity threshold (used to further remove noisy arcs during post-processing) affect the results most. To tune these two hyperparameters, we experiment on the validation set with a range of empirically chosen parameter values, then take the combination that gives the highest score. We use the APLS score as the reference for tuning.

Furthermore, for datasets AOI_3 (Paris) and AOI_4 (Shanghai), there are many images with extremely sparse signal, while many others have much denser signal. We thus use a two-threshold system for the arc-intensity threshold, since the sparse images need a low arc-intensity threshold: we sort the images by the sum of their intensities, and apply a lower threshold to those images with low total intensity. We use a higher threshold for the remaining images.

For example, see the figure on the right for dataset AOI_4, where the x-axis is the percentage of images (sorted by increasing total intensity), and the y-axis is their total intensity. Given that there is a sharp transition around 40%, we apply a lower threshold to the 40% of images with the lowest total intensity. We use the same strategy for AOI_3, and choose 30% as the cut-off between the two threshold values.
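The two-threshold assignment can be written in a few lines. This is an illustrative sketch: the function name `arc_thresholds` and its parameters are hypothetical, and the rounding of the cut-off to a whole number of images is an assumption.

```python
import numpy as np


def arc_thresholds(total_intensities, low_fraction,
                   low_threshold, high_threshold):
    """Assign an arc-intensity threshold to each image.

    The `low_fraction` of images with the lowest total intensity receive
    `low_threshold`; all remaining images receive `high_threshold`.
    """
    totals = np.asarray(total_intensities, dtype=float)
    order = np.argsort(totals, kind="stable")  # indices by increasing total
    n_low = int(round(low_fraction * len(totals)))
    thresholds = np.full(len(totals), high_threshold, dtype=float)
    thresholds[order[:n_low]] = low_threshold
    return thresholds
```

For instance, with a fraction of 0.4 and thresholds 0.3 and 0.4, the two sparsest of five images would be processed with 0.3 and the other three with 0.4.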

4.1 Semi-automatic reconstruction results

Dataset          train   validation   test   total
AOI_2_Vegas      659     165          165    989
AOI_3_Paris      206     52           52     310
AOI_4_Shanghai   798     200          200    1198
AOI_5_Khartoum   189     47           47     283
Table 1: The split of the dataset. The ratio for train/validation/test data is approximately 4:1:1.

Dataset   APLS, Buslaev [2]   APLS, ours   Avg. Hausdorff, Buslaev [2]   Avg. Hausdorff, ours
AOI_2     0.8211              0.8278       18.3539                       17.7841
AOI_3     0.5848              0.6324       291.0188                      289.9532
AOI_4     0.6630              0.6632       69.5775                       68.9596
AOI_5     0.6069              0.6477       44.4201                       41.6037
Table 2: APLS score and Average Hausdorff distance for the test set. The MAX value used for the Average Hausdorff distance is 500 pixels; the size of each image is 1300px by 1300px. The Average Hausdorff distance for AOI_3 is high because there are more cases where the proposed graph is empty while the ground truth graph is not.

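Dataset   Persistence threshold   Arc-intensity threshold
AOI_2     0.12                    0.4
AOI_3     0.1                     0.3(30%)/0.4
AOI_4     0.1                     0.3(40%)/0.4
AOI_5     0.07                    0.3
Table 3: Chosen parameters. For AOI_3, the entry 0.3(30%)/0.4 in the arc-intensity column means that for the 30% of images with the lowest total intensity we take an arc-intensity threshold of 0.3, and for the rest of the images we take 0.4. The entry for AOI_4 is read the same way, with 40%.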

Compared method: Buslaev’s method [2]

We compare our framework with the method of the winner of SpaceNet Challenge 3 [2]. It uses the same CNN architecture to train and then predict the segmented images. In Stage 2, Buslaev’s method first extracts the skeleton from the thresholded segmented images. Then, it transforms the skeleton to a multi-graph using library “sknw” [17]. Finally, it translates the multi-graph to a graph with straight edges. Buslaev’s method outperforms other methods in SpaceNet Challenge 3, so we only compare ours with this one.


As mentioned before, we tested on four datasets. The split of train-validation-test is 4:1:1 for each data set, and the precise numbers are listed in Table 1.

Table 2 shows the scores under the two metrics for Buslaev’s framework and ours on the test datasets (on the validation datasets our scores are consistently better). Each score is an average over all test images (recall that the train-validation-test split is shown in Table 1). For the APLS score, larger values are better. For the Average Hausdorff distance, smaller values are better. Note that AOI_4 and AOI_5 contain rather noisy images (especially AOI_5) and are the most challenging among all datasets. Our method significantly outperforms Buslaev’s method on AOI_5. We also observe that, in general, our output tends to have better connectivity. Figure 7 shows a few examples. Buslaev’s algorithm tends to produce more extra branches and worse connectivity. We note that the final average APLS score reported here for Buslaev’s method is different from the posted one, 0.6663, in [23]. This is because the posted score is computed for the original test set from the SpaceNet challenge, while our test set is a subset of the original train set; we cannot compare on the original test set from the SpaceNet challenge, as its ground truth is not publicly available. Table 3 shows the finally chosen parameters, for reproducibility of the experiment.

Running time.

For each of the two larger datasets (AOI_2_Vegas and AOI_4_Shanghai): Training takes around 500 minutes to learn both the lanes and the tip-marks. Testing (to obtain the segmentation on all images from the test set) takes around 18 minutes. The final road network extraction stage takes 20 minutes for each choice of the two hyperparameters (persistence threshold and intensity threshold), and 20 * 9 = 180 minutes in total to tune the two hyperparameters on the validation sets and then run on the final test sets. We note that the graph reconstruction code is not optimized and we believe it can be improved for the 2D setting, which would reduce the time for the final road network extraction stage.

For each of the two smaller datasets (AOI_3_Paris and AOI_5_Khartoum): Training takes around 140 minutes to learn both the lanes and the tip-marks. Testing (to obtain the segmentation on all images from the test set) takes around 6 minutes. The final road network extraction stage takes 6 minutes for each choice of parameter set, and 54 minutes in total to tune the two hyperparameters on the validation sets and then run on the final test sets.

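Datasets                       Train        Test        Road extraction
AOI_2_Vegas, AOI_4_Shanghai    about 500m   about 18m   20m × 9 = 180m
AOI_3_Paris, AOI_5_Khartoum    about 140m   about 6m    6m × 9 = 54m
Table 4: The running time, using the figures quoted in the text: m stands for minutes; training covers both road and tip detection; the factor ×9 comes from the tuning of the parameters.
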
Figure 7: Left: raw satellite images (yellow graphs are ground-truth road networks). Middle/right: red graphs are reconstructions of Buslaev’s method and of our semi-automatic framework, respectively, overlaid on top of the ground-truth graph (yellow). Dark colors are the learned density field. The numbers given are APLS-score / Average Hausdorff distance:
AOI_2, Id 1462: Buslaev 0.5700 / 18.6828; ours 0.6655 / 14.2246
AOI_3, Id 217: Buslaev 0.6194 / 20.7057; ours 0.8193 / 16.8086
AOI_4, Id 267: Buslaev 0.4721 / 29.5151; ours 0.5693 / 21.6930
AOI_5, Id 207: Buslaev 0.6334 / 30.0484; ours 0.7287 / 24.9911

4.2 Fully automatic reconstruction results

In the following experiments, we randomly select 200 images as the training set and 50 images as the test set for each dataset. We evaluate the method by computing the APLS scores on the test set after each iteration. We initialize our fully automatic approach by converting each RGB image to grayscale and then applying a Gaussian filter. One could potentially use other image processing methods to further pre-process it. When applying the graph reconstruction algorithm, we use the same parameters as in Section 4.1 (Table 3).
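The initialization step (grayscale conversion followed by a Gaussian filter) can be sketched with NumPy alone. The text does not state which grayscale conversion or filter width is used, so the luminance weights (0.299, 0.587, 0.114), the default `sigma`, the kernel radius of three standard deviations, and the edge padding are all assumptions.

```python
import numpy as np


def initial_density(rgb, sigma=2.0):
    """Convert an RGB image to grayscale and smooth it with a Gaussian filter.

    The result has the same height and width as the input and can be used
    as the initial density field for the graph reconstruction.
    """
    rgb = np.asarray(rgb, dtype=float)
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # assumed luminance weights

    # Normalized 1D Gaussian kernel; the 2D filter is applied separably.
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()

    padded = np.pad(gray, radius, mode="edge")

    def smooth(v):
        return np.convolve(v, kernel, mode="valid")

    along_rows = np.apply_along_axis(smooth, 1, padded)
    return np.apply_along_axis(smooth, 0, along_rows)
```

A separable filter keeps the sketch dependency-free; a library routine for Gaussian filtering could be substituted where one is available.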

Alternative method for centerline detection.

To show that the discrete-Morse based graph reconstruction algorithm is important for our fully-automatic training framework, we develop the following alternative scheme SkeletonLabelTrain() as a baseline to compare: the graph reconstruction algorithm is replaced with the Buslaev’s [2] skeleton extraction algorithm (as described in Section 4.1).

Note that the skeleton extraction used in [2] is not designed to work directly on raw satellite images; see Figure 8, where (a) shows an output of this skeleton extraction algorithm applied directly to a raw satellite image (yellow curves are ground truth), while (b) shows the output of the discrete-Morse based algorithm on the same input, which is much better. Hence, to improve the performance of the baseline method SkeletonLabelTrain(), we still use the discrete-Morse graph reconstruction algorithm (or, if there is partially-labelled data, that data) at the beginning of the training process, and switch to the skeleton extraction algorithm only after a few iterations.

Figure 8: Left: Skeleton by [2]. Right: Skeleton by discrete-Morse algorithm.
Iteration              1        2        3        4        5        6        7        8        9        10       11
MorseLabelTrain()      0.2523   0.3340   0.3886   0.4173   0.4332   0.4655   0.4829   0.5252   0.5497   0.5813   0.5922
SkeletonLabelTrain()   -        -        0.2677   0.2643   0.2763   0.2753   -        -        -        -        -
Table 5: APLS-score for the reconstructed road networks for test images, based on our label-free framework (MorseLabelTrain()), compared to the alternative method SkeletonLabelTrain(). The first two iterations for SkeletonLabelTrain() are done by MorseLabelTrain() and thus are not shown. After 6 iterations, the score for SkeletonLabelTrain() no longer improves.
Figure 9 (AOI_2, Id 323): Left: raw satellite image (ground truth in yellow, even though it is not used!). Middle/right: the reconstructed graph using the CNN after the first iteration (0.0653 / 57.0400) and after the 11-th iteration (0.5564 / 30.3855).

Results for label-free case.

We show results here for dataset AOI_2_Vegas, which is a cleaner dataset from SpaceNet Challenge. Our new fully-automatic framework is less effective on AOI_5_Khartoum, which is much more noisy; however, we will show later that, with 10% labelled images, it can obtain reasonable results on the challenging AOI_5_Khartoum dataset as well.

For test images, we always apply tip detection and arc removal when running the graph reconstruction algorithm. These two procedures are not applied to the segmented images during the first three iterations of the training process of the pipeline in Figure 5, as removing arcs results in loss of signal and tip detection tends to introduce noise when the segmented images are not yet reliable. From the fourth iteration onward, we start to apply tip detection since the segmented images are now less noisy. We also decrease the threshold for persistence simplification in the discrete-Morse based graph reconstruction in later iterations, as the quality of the segmented images keeps improving.

In Table 5, we show the APLS-score for test images using the CNN learned at the $k$-th iteration, as $k$ increases (the Average Hausdorff distance shows a similar trend). In particular, the entry for iteration $k$ is the score of the output reconstructed from the segmented images of the test set produced by the CNN trained in that iteration. We compare the output of our framework for the label-free case, denoted by MorseLabelTrain(), with the output of the baseline method SkeletonLabelTrain(). Note that, as explained earlier, the first two iterations for SkeletonLabelTrain() are done by MorseLabelTrain() (using discrete Morse graph reconstruction), and thus no APLS-scores are given for those two iterations for SkeletonLabelTrain(). Also, no APLS-scores are shown for SkeletonLabelTrain() after the 6th iteration, as the score does not improve further. In contrast, the APLS-score of MorseLabelTrain() continues to improve (for test images) during the iterative process. In Figure 9, we show an example of the reconstructed graph using the CNN from different iterations of our fully automatic training process: observe that at the beginning, only part of the signal is captured. Subsequently, the classifier becomes better and more and more of the signal is captured.

For this set of (200 + 50) images sampled from the AOI_2 dataset, our semi-automatic framework achieves a higher APLS-score than the 0.5922 that the fully automatic framework reaches in the end (Table 5). However, keep in mind that the fully automatic framework uses no labels at all.

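Dataset   Method                        APLS score (successive iterations, left to right)
AOI_2     ours (MorseLabelTrain())      0.6521   0.6918   0.7305   0.7210
AOI_2     Skel. (SkeletonLabelTrain())  0.4860   0.5137   0.5214   0.5252
AOI_5     ours (MorseLabelTrain())      0.5351   0.5787   0.5893   0.6077
AOI_5     Skel. (SkeletonLabelTrain())  0.5247   0.5091   0.4884   0.4543
Table 6: APLS score for the partially-labeled case, where a random 10% of the images have road labels.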
Figure 10: Using 10% labelled images. Reconstructed graphs by our MorseLabelTrain() after 1 and 4 iterations. Panels, with the pair of values printed under each of the two reconstructions in the order shown: AOI_2, Id 429 (0.3926 / 18.3656, then 0.6374 / 15.6549); AOI_5, Id 50 (0.6449 / 68.8017, then 0.6648 / 50.7188); AOI_5, Id 150 (0.3758 / 28.1091, then 0.7356 / 23.2734).

With 10% ground truth

Now we use a small set of labelled data. Specifically, we assume that only 10% of the images (i.e., 20 images) have labels (i.e., ground-truth roads are given). Table 6 shows the APLS-score for test images after different iterations of MorseLabelTrain() and SkeletonLabelTrain(). For MorseLabelTrain(), the scores improve over the iterations on both datasets, apart from a slight drop at the last iteration on AOI_2. It is important to note that with only 10% labeled images, we can now also handle the challenging AOI_5 dataset, and achieve an APLS-score of . (For the case of AOI_2, compared to the label-free case, the score of our new MorseLabelTrain() improves to from ).

It is interesting to note that this iterative procedure does not seem to help SkeletonLabelTrain() much, with scores even getting worse for the noisy dataset AOI_5. We show some examples of reconstructed graphs at different iterations for our algorithm (Figure 10) and for the alternative SkeletonLabelTrain() method (Figure 11).
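The trend in Table 6 can be checked with a few lines of arithmetic. The sketch below copies the four scores per row from the table and assumes that they are listed in iteration order; the helper name `net_change` is hypothetical.

```python
# APLS scores from Table 6, assumed to be in iteration order.
table6 = {
    ("AOI_2", "MorseLabelTrain"): [0.6521, 0.6918, 0.7305, 0.7210],
    ("AOI_2", "SkeletonLabelTrain"): [0.4860, 0.5137, 0.5214, 0.5252],
    ("AOI_5", "MorseLabelTrain"): [0.5351, 0.5787, 0.5893, 0.6077],
    ("AOI_5", "SkeletonLabelTrain"): [0.5247, 0.5091, 0.4884, 0.4543],
}


def net_change(scores):
    """Difference between the last and the first listed score."""
    return round(scores[-1] - scores[0], 4)


for (dataset, method), scores in table6.items():
    print(f"{dataset:6s} {method:20s} net change = {net_change(scores):+.4f}")
```

Under this reading, MorseLabelTrain() gains roughly 0.07 on both datasets, SkeletonLabelTrain() gains about 0.04 on AOI_2, and SkeletonLabelTrain() loses about 0.07 on AOI_5.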

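Figure 11: Using 10% labelled images. Reconstruction of the alternative SkeletonLabelTrain() method after 1 and 4 iterations. Panels, with the pair of values printed under each of the two reconstructions in the order shown: AOI_2, Id 429 (0.2430 / 16.7267, then 0.2547 / 11.8099); AOI_5, Id 50 (0.4533 / 74.7638, then 0.1940 / 94.0461).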

4.3 Limitations and future work

First, currently we choose the parameter globally. Figure 12 shows the effect of the persistence threshold . The example demonstrates that there is no single parameter value that works for all cases. As for the arc-intensity threshold , we choose it adaptively for AOI_3_Shanghai and AOI_4_Paris, based on the intensity of the images, to deal with extremely sparse images. In general, it is hard to make this choice based on the image intensities alone; see Figure 13. An interesting future research direction would be to investigate how to choose these parameters adaptively, yet (semi-)automatically. Second, we recover the tips by locating their positions and modifying the density values. It will be interesting to see if we can recover the tips from the graph reconstruction algorithm directly. Third, we observe that the fully automatic framework is sometimes not effective for a noisy dataset such as AOI_5_Khartoum. It would be good to improve the performance of this approach for noisy datasets.
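The gap between a global and a per-image parameter choice can be made concrete with the scores reported in Figures 12 and 13. The sketch below is illustrative only: the helper names are hypothetical, and the per-image choice is an oracle that picks the threshold using the ground-truth APLS score, so it is an upper bound rather than a usable selection rule.

```python
from statistics import mean

# image id -> {threshold value: APLS score}, copied from Figures 12 and 13
persistence_sweep = {
    333: {0.10: 0.8162, 0.15: 0.6372},
    1107: {0.10: 0.8412, 0.15: 0.9035},
}
arc_intensity_sweep = {
    964: {0.4: 0.5827, 0.5: 0.4598},
    750: {0.4: 0.9255, 0.5: 0.9753},
}


def best_per_image(sweep):
    """Oracle choice: the best-scoring threshold for each image separately."""
    return {img: max(scores, key=scores.get) for img, scores in sweep.items()}


def best_global(sweep):
    """The single threshold with the highest mean score over all images."""
    thresholds = next(iter(sweep.values())).keys()
    return max(thresholds, key=lambda t: mean(s[t] for s in sweep.values()))


def mean_global(sweep):
    t = best_global(sweep)
    return mean(s[t] for s in sweep.values())


def mean_oracle(sweep):
    return mean(max(s.values()) for s in sweep.values())


for name, sweep in [("persistence", persistence_sweep),
                    ("arc intensity", arc_intensity_sweep)]:
    print(name, best_global(sweep), best_per_image(sweep),
          round(mean_global(sweep), 4), round(mean_oracle(sweep), 4))
```

On these four images the per-image choice beats the best single value for both parameters, which is the motivation for choosing them adaptively.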

Figure 12: Effects of the persistence threshold on the results. Left: raw satellite images (yellow graphs are ground-truth road networks). Middle/right: results for two threshold values, each with its APLS score. First row (AOI_2, Id 333): threshold 0.1, APLS 0.8162; threshold 0.15, APLS 0.6372; here the lower threshold leads to the better result. Second row (AOI_2, Id 1107): threshold 0.1, APLS 0.8412; threshold 0.15, APLS 0.9035; here the higher threshold leads to the better result.
Figure 13: Effects of the arc intensity threshold. Left: raw satellite images (yellow graphs are ground-truth road networks). Middle/right: results for two threshold values, each with its APLS score. First row (AOI_2, Id 964): threshold 0.4, APLS 0.5827; threshold 0.5, APLS 0.4598; here the lower threshold leads to the better result. Second row (AOI_2, Id 750): threshold 0.4, APLS 0.9255; threshold 0.5, APLS 0.9753; here the higher threshold leads to the better result.

Acknowledgment: We acknowledge the NSF grants CCF-1740761, RI-1815697, CCF-1733798 and CCF-1618247 for partially supporting this research.


  • [1] A. Buslaev, S. Seferbekov, V. Iglovikov, and A. Shvets (2018) Fully convolutional network for automatic road extraction from satellite imagery. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. Cited by: §3.1.
  • [2] A. Buslaev (2018) Spacenet round 3 winner. Cited by: Figure 8, §4.1, §4.1, §4.2, §4.2, Table 2.
  • [3] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber (2012) Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in neural information processing systems, pp. 2843–2851. Cited by: §1.
  • [4] S. Das, T. Mirnalinee, and K. Varghese (2011) Use of salient features for the design of a multistage framework to extract roads from high-resolution multispectral satellite images. IEEE transactions on Geoscience and Remote sensing 49 (10), pp. 3906–3931. Cited by: §1.
  • [5] O. Delgado-Friedrichs, V. Robins, and A. Sheppard (2015-03) Skeletonization and partitioning of digital images using discrete morse theory. IEEE Trans. Pattern Anal. Machine Intelligence 37 (3), pp. 654–666. External Links: ISSN 0162-8828 Cited by: §1, §2.
  • [6] T. K. Dey, J. Wang, and Y. Wang (2017) Improved road network reconstruction using discrete morse theory. In Proc. 25th ACM SIGSPATIAL, pp. 58. Cited by: §1, §3.1.
  • [7] T. K. Dey, J. Wang, and Y. Wang (2018) Graph Reconstruction by Discrete Morse Theory. In Proc. 34th Sympos. Comput. Geom., pp. 31:1–31:15. Cited by: §1, §2, §2.
  • [8] H. Edelsbrunner and J. Harer (2010) Computational topology : an introduction. American Mathematical Society. Cited by: §1.
  • [9] H. Edelsbrunner, D. Letscher, and A. Zomorodian (2002) Topological persistence and simplification. Discr. Comput. Geom. 28, pp. 511–533. Cited by: §2.
  • [10] R. Forman (1998) Morse theory for cell complexes. Advances in mathematics 134 (1), pp. 90–145. Cited by: §1, §2.
  • [11] X. Gu, A. Zang, X. Huang, A. Tokuta, and X. Chen (2015) Fusion of color images and lidar data for lane classification. In Proc. 23rd ACM SIGSPATIAL, pp. 69. Cited by: §1.
  • [12] A. Gyulassy, M. Duchaineau, V. Natarajan, V. Pascucci, E. Bringa, A. Higginbotham, and B. Hamann (2007-11) Topologically clean distance fields. IEEE Trans. Visualization Computer Graphics 13 (6), pp. 1432–1439. External Links: ISSN 1077-2626 Cited by: §1, §2.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proc. of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §3.1.
  • [14] V. Robins, P. J. Wood, and A. P. Sheppard (2011-08) Theory and algorithms for constructing discrete morse complexes from grayscale digital images. IEEE Trans. Pattern Anal. Machine Intelligence 33 (8), pp. 1646–1658. External Links: ISSN 0162-8828 Cited by: §1.
  • [15] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §1, §3.1.
  • [16] W. Shi, Z. Miao, and J. Debayle (2014) An integrated method for urban main-road centerline extraction from optical remotely sensed imagery. IEEE Transactions on Geoscience and Remote Sensing 52 (6), pp. 3359–3372. Cited by: §1.
  • [17] sknw (2017) Note: Cited by: §4.1.
  • [18] T. Sousbie (2011-06) The persistent cosmic web and its filamentary structure - I. Theory and implementation. 414, pp. 350–383. External Links: 1009.4015 Cited by: §1, §2.
  • [19] T. Sun, Z. Di, and Y. Wang (2018) Combining satellite imagery and gps data for road extraction. In Proc. of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, pp. 29–32. Cited by: §1.
  • [20] A. Van Etten, D. Lindenbaum, and T. M. Bacastow (2018) SpaceNet: a remote sensing dataset and challenge series. arXiv preprint arXiv:1807.01232. Cited by: §1, §1, §1, §4, §4.
  • [21] S. Wang, Y. Wang, and Y. Li (2015) Efficient map reconstruction and augmentation via topological methods. In Proc. 23rd ACM SIGSPATIAL, pp. 25. Cited by: §1, §2.
  • [22] K. Weiss, F. Iuricich, R. Fellegara, and L. De Floriani (2013) A primal/dual representation for discrete morse complexes on tetrahedral meshes. In Computer Graphics Forum, Vol. 32, pp. 361–370. Cited by: §1.
  • [23] (2018) Winning solutions from spacenet road detection and routing challenge. Note: Cited by: §4.1.
  • [24] A. Zang, R. Xu, Z. Li, and D. Doria (2017) Lane boundary extraction from satellite imagery. In Proc. of the 1st ACM SIGSPATIAL Workshop on High-Precision Maps and Intelligent Applications for Autonomous Vehicles, pp. 1. Cited by: §1.
  • [25] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia (2017) Pyramid scene parsing network. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2881–2890. Cited by: §1.