Change detection is a long-standing research problem, with various applications in the general vision field [18, 3, 33, 17, 24] and in aerial imagery [19, 30, 13, 28]. As a popular application domain, change detection on aerial images can be used to analyze how the land surface changes over a period, and it forms the basis of automatic aerial image analysis. Change detection is generally defined as identifying the areas where changes happen by jointly analyzing two registered images . The output is a binary image in which ones denote pixels where changes happen and zeros denote pixels that remain unchanged. Hence it can be regarded as a binary dense classification problem. Fig. 1(a) and Fig. 1(b) show a pair of co-registered aerial images, and Fig. 1(c) shows the result of the traditional binary change detection task for this pair.
However, the task has some drawbacks. First, traditional change detection methods only indicate whether a region has changed, without indicating what kind of change it is. As a result, they cannot meet the need to understand the change type, which limits their application to a large extent. Second, the definition of change is loose and vague. For instance, the appearance of grassland changes from summer to winter, and it is unclear whether such areas should be labeled as changed. This ambiguity can severely affect subsequent processing.
To deal with the problems above, an effective solution is to introduce semantic information. To the best of our knowledge,  first proposed the concept of semantic change detection for street images. It consists of two steps. The first step is the same as traditional change detection and outputs a binary image showing the changed areas. The second step simply labels the newly added objects. Therefore, it cannot decide the type of change, because semantic information about the image before the change is missing. Following this definition,  proposed a new deep neural network model that performs well on public street view datasets.  tried to deal with semantic segmentation and change detection simultaneously through multitask learning. Although this work involves three kinds of change (city expansion, soil change, and water change), it does not state which category an area changed from and which category it changed to.
In our work, we propose a new task, semantic change pattern analysis (SCPA), to utilize semantic information and analyze change types comprehensively, especially for aerial images. Specifically, for a pair of co-registered images, the output of the task is a pixel-level multi-class classification result. We refer to the image before the change as the source image and the image after the change as the destination image. Each pixel of the output image is assigned a class label that denotes the change type of the corresponding pair of pixels. The change type is jointly defined by the pixel’s semantic labels in the source image and the destination image, i.e., what the pixel changes from and what it changes to. For instance, label ‘1’ denotes that the pixel has changed from bare land in the source image to building in the destination image. Note that if the source image and destination image are switched, i.e., the pixel changes from building to bare land, the change type is different and a label other than ‘1’ should be assigned. Fig. 1(d) shows the result of the semantic change pattern analysis task.
To evaluate performance on the SCPA task, we need a simple and interpretable metric. Since SCPA can be regarded as a multi-class pixel-level classification problem in which each class is a type of change, we adopt the mean Intersection over Union (mIoU) over all classes as the main metric. This metric is mature and easy to calculate. However, it does not directly show a method's ability to determine whether an area has changed, so we use binary accuracy as an auxiliary metric to capture that ability.
Another important aspect of the new task is the dataset, which must provide semantic labels. Most existing datasets are built for traditional binary change detection and are not suitable for the SCPA task. The most closely related one is the dataset proposed in . Yet its images and labels were acquired at different times from distinct sources, so the images and labels do not match well and there are obvious errors in the labels. This would severely affect model performance on the pixel-level task.
Therefore, we constructed a dataset for the proposed SCPA task by manually labeling images pixel by pixel. The dataset is based on a pair of large co-registered aerial images of Wuhan City, China, and is named the SCPA-Wuhan City (SCPA-WC) dataset. SCPA-WC is the first well-annotated aerial image dataset containing semantic labels for this task. We believe this publicly available dataset will facilitate the development of the newly proposed task.
Overall, our contributions are summarized below:
We propose a new task, semantic change pattern analysis, mainly for aerial images, and give a clean and interpretable metric for it. This is a higher-level task than traditional binary change detection or semantic change detection. Given a pair of images, it decides not only where changes happen but also what types of change they are. It removes the ambiguity of change in traditional change detection and provides richer and more useful information for subsequent automatic image analysis.
We construct a dataset, the SCPA-Wuhan City dataset, for the task. It contains a pair of large co-registered aerial images of Wuhan City, China, labeled manually pixel by pixel with semantic information. This is the first well-annotated aerial image dataset containing semantic labels for this task. We believe the publicly available dataset will make it convenient for others to try new ideas in this field.
We conduct extensive baseline experiments on the dataset and demonstrate the ability of current methods on the proposed task. These results can serve as a basis for future work and encourage the design of more targeted methods. We hope they facilitate the development of the task and draw more attention from the community.
2 Related Work
2.1 Binary Change Detection
Binary change detection in this paper refers to the traditional change detection task, which aims to identify the changed areas in a pair of co-registered images. This is a fundamental topic in automatic image analysis, especially for aerial images, since it provides information about land surface that has changed. The field has been studied extensively.
Methods for binary change detection usually consist of two stages. The first computes a difference map between corresponding pixels, and the second separates the pixels into “change” or “no change” based on a threshold . These works focus either on the image differencing method [5, 15, 16] or on the decision function [7, 8]. More recent works [35, 10] make use of convolutional neural networks to perform binary change detection. This task can only determine whether a region has changed, without telling the type of change.
2.2 Semantic Change Detection
The concept of semantic change detection was first proposed in .  proposed a network called CDnet to find structural changes in street view video.  presented a network named ChangeNet, based on a parallel deep convolutional neural network architecture, for the task. This task builds on traditional change detection and mainly targets street view scenes. The process contains two steps. The first step decides whether an area has changed, which is the same as binary change detection. The additional step gives newly added objects a class label.
To be specific, if a car appears in the destination image, the car area is labeled as changed, and the class label of car is assigned to the area. In other words, the task only considers what the newly added object is, without considering the change type, i.e., what the area was before it became a car. Although a later work  involves three kinds of change (city expansion, soil change, and water change), it focuses on objects and does not conform to the change type definition used here, because the semantic labels of the changed areas in the source image and the destination image are not specified at all. Taking soil change as an example, it cannot answer what an area used to be before it changed to soil, or what the soil changed to in the destination image.
2.3 Change Detection Datasets
Various datasets have been constructed in the field of change detection. For binary change detection,  constructed a dataset named the air change dataset with several aerial image pairs.  presented the ONERA satellite change detection dataset, which is composed of multispectral satellite image pairs.  introduced the aerial imagery change detection dataset, which consists of synthetic aerial images with artificial changes generated by a rendering engine.
For semantic change detection,  adapted the TSUNAMI dataset  to the task by adding semantic labels to the destination image.  proposed the VL-CMU-CD dataset, which uses simultaneous localization and mapping to obtain nearly registered images.  built a dataset named HRSCD, whose images and labels come from different sources. The only guarantee is that the images and labels were acquired in the same year, which means the time difference between them can be as large as one full year. This would severely affect a model's performance on the pixel-level task and make the results unreliable if we adopted the dataset. Therefore, a well-annotated dataset containing registered image pairs and the corresponding semantic labels is needed for the SCPA task.
3 Semantic Change Pattern Analysis Task
3.1 Task Definition
Task format. The format of the semantic change pattern analysis task is intuitive. If there are $N$ land classes, encoded by $\{0, 1, \dots, N-1\}$ in both the source image and the destination image, there are at most $N^2 - N + 1$ change types, where the final $1$ accounts for the no-change type. The change types are encoded by $\{0, 1, \dots, N^2 - N\}$. For a pair of pixels $(p_s, p_d)$ belonging to the source image and the registered destination image respectively, the task requires mapping the pair to a change type $c$. The change type $c$ is jointly defined by the land classes $l_s$ and $l_d$ of the two pixels, so a difference in either $l_s$ or $l_d$ results in a different change type $c$. We present a brief proof below.
Given $N$ land classes in the source image and the destination image, the maximal number of corresponding change types is $N^2 - N + 1$.
For $N$ land classes in the source image and the destination image, we first take an arbitrary pixel from the source image, with land class $l_s$, and the corresponding pixel from the destination image, with land class $l_d$. The number of possible combinations $(l_s, l_d)$ is $N \times N = N^2$. We then treat the $N$ no-change combinations, i.e., those with $l_s = l_d$, as a single type, which is reasonable for our task. We therefore remove $N - 1$ from $N^2$ and obtain $N^2 - N + 1$.
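The counting argument above can be checked with a short sketch. It enumerates every (source class, destination class) pair and merges all equal-class pairs into one no-change type. The function name `build_change_type_table` and the index convention (0 for no change, changed pairs numbered in row-major order) are assumptions made for illustration, not the encoding used by the authors.

```python
from itertools import product
from typing import Dict, Tuple


def build_change_type_table(num_classes: int) -> Dict[Tuple[int, int], int]:
    """Map each (source_class, destination_class) pair to a change-type index.

    Assumed convention (hypothetical): index 0 is the single "no change"
    type shared by all pairs with equal classes; changed pairs receive
    1, 2, ... in row-major order.
    """
    table: Dict[Tuple[int, int], int] = {}
    next_index = 1
    for src, dst in product(range(num_classes), repeat=2):
        if src == dst:
            table[(src, dst)] = 0          # all no-change pairs share one type
        else:
            table[(src, dst)] = next_index  # order matters: (a, b) != (b, a)
            next_index += 1
    return table


# With the 7 land classes of the dataset described later in the paper:
table = build_change_type_table(7)
num_change_types = len(set(table.values()))  # 7 * 7 - 7 + 1
```

Under this convention, swapping the source and destination classes always yields a different index, which matches the requirement that a change from bare land to building differs from a change from building to bare land.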
Relation to change detection. Semantic change pattern analysis is a higher-level task than traditional binary change detection and semantic change detection. Compared to them, SCPA requires not only whether an area has changed but also what type of change the area has experienced. In other words, a solution to the SCPA problem contains the solutions to the binary change detection problem and the semantic change detection problem.
Relation to semantic segmentation. Semantic segmentation performs pixel-level classification on a single input image. SCPA can also be regarded as a multi-class pixel-level classification task. The difference is that the pixel label in SCPA is the change type rather than the land category, and that SCPA requires a pair of images as input instead of a single one. The connection is nevertheless close. In fact, an intuitive approach to the SCPA task is based on semantic segmentation, which we explain in Section 3.3.
Relation to status transition process. A status transition process is a kind of random process in which the system status changes. The SCPA task is closely related to the status transition problem, except that SCPA requires both the source image and the destination image as input, while the status transition problem needs only the source image as input and outputs the transition probability matrix for each land class.
3.2 Task Metric
Mean IoU. Since SCPA is in effect a multi-class pixel-level classification problem in which each class is a type of change, we adopt the mean Intersection over Union (mIoU) over all classes as the main metric. Averaging over classes makes the metric less sensitive to class imbalance. This metric is mature and easy to calculate, and it assesses a model's performance on each change type comprehensively. Formally,
$$\mathrm{mIoU} = \frac{1}{n_{cl}} \sum_{i} \frac{n_{ii}}{\sum_{j} n_{ij} + \sum_{j} n_{ji} - n_{ii}}$$
where $n_{ij}$ is the number of pixels of change type $i$ predicted to belong to change type $j$, and $n_{cl}$ is the total number of change types that actually appear.
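As a minimal sketch of how this metric can be computed, the code below builds a confusion matrix whose entry `n[i][j]` counts pixels of true change type i predicted as type j, then averages the per-type IoU. Treating "change types that actually appear" as the types present in the ground truth is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_types: int) -> List[List[int]]:
    """n[i][j] = number of pixels of true change type i predicted as type j."""
    n = [[0] * num_types for _ in range(num_types)]
    for t, p in zip(truth, pred):
        n[t][p] += 1
    return n


def mean_iou(n: List[List[int]]) -> float:
    """Average the per-type IoU over change types present in the ground truth."""
    ious = []
    for i in range(len(n)):
        true_i = sum(n[i])                  # pixels whose true type is i
        if true_i == 0:
            continue                        # assumption: skip types that never appear
        pred_i = sum(row[i] for row in n)   # pixels predicted as type i
        ious.append(n[i][i] / (true_i + pred_i - n[i][i]))
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps with four possible change types; type 3 never appears.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(truth, pred, num_types=4))
```

In this toy example the per-type IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5; the absent type 3 does not enter the average.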
Binary Accuracy. Although mean IoU evaluates a model's performance on each change type, it cannot directly show a method's ability to decide whether an area has changed, which is also an important aspect of the task. To capture this, we ignore the differences among change types and only consider whether a pixel has changed or not. In other words, SCPA is reduced here to a binary change detection problem. We then take binary accuracy (BAcc), a metric widely used in binary change detection, as the auxiliary metric. Specifically,
$$\mathrm{BAcc} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP denotes the number of pixels that the method predicts as changed and whose ground truth is also labeled as changed, regardless of the specific change type, and TN denotes the number of pixels that both the method and the ground truth mark as unchanged. FP and FN denote the remaining pixels: those predicted as changed but actually unchanged, and those predicted as unchanged but actually changed, respectively.
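The reduction to a binary problem can be sketched as follows: every change-type label is collapsed to "changed" or "unchanged" before the predictions are compared with the ground truth. The function name and the use of label 0 for no change are assumptions for illustration only.

```python
from typing import Sequence


def binary_accuracy(truth: Sequence[int], pred: Sequence[int],
                    no_change: int = 0) -> float:
    """Fraction of pixels whose changed/unchanged status is predicted correctly.

    A pixel counts as correct when prediction and ground truth agree on
    whether it changed, even if the specific change types differ.
    """
    correct = sum((t != no_change) == (p != no_change)
                  for t, p in zip(truth, pred))
    return correct / len(truth)


# Toy flattened change-type maps; label 0 is assumed to mean "no change".
truth = [0, 0, 3, 5, 0, 7]
pred = [0, 2, 4, 5, 0, 0]
bacc = binary_accuracy(truth, pred)
exact = sum(t == p for t, p in zip(truth, pred)) / len(truth)
```

Here four of the six pixels agree on changed versus unchanged, while only three carry exactly the right change type, which illustrates why a method can score well on binary accuracy and still do poorly on the full task.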
3.3 Task Pipeline
In this part, we discuss two pipelines for the SCPA task. One is a two-step method based on semantic segmentation, which lets us make full use of existing methods. The other is a unified one-step method designed specifically for the SCPA task.
Two-step method. Since SCPA aims to determine the change type for a pair of images, a natural approach has two steps. The first step obtains the land class labels of both the source image and the destination image through a semantic segmentation method. The second step compares the predicted source labels with the predicted destination labels to obtain the final result. The pipeline is shown in Fig. 2. It lets us use existing methods to tackle the problem, which establishes a meaningful foundation for future work. We take this approach as the baseline in the following experiments.
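The two-step pipeline can be sketched in a few lines. The `segment` argument stands in for any semantic segmentation model and is hypothetical, as is the index convention (0 for no change, changed pairs numbered in row-major order over source and destination classes); neither is taken from the paper.

```python
from typing import Any, Callable, List

LabelMap = List[List[int]]


def two_step_scpa(source_image: Any, destination_image: Any,
                  segment: Callable[[Any], LabelMap],
                  num_classes: int) -> LabelMap:
    """Segment both images, then compare land classes pixel by pixel."""
    src_labels = segment(source_image)        # step 1: land classes, source
    dst_labels = segment(destination_image)   # step 1: land classes, destination
    change_map: LabelMap = []
    for src_row, dst_row in zip(src_labels, dst_labels):
        row = []
        for s, d in zip(src_row, dst_row):
            if s == d:
                row.append(0)                 # step 2: equal classes -> no change
            else:
                # Row-major index over ordered pairs with s != d, starting at 1.
                row.append(1 + s * (num_classes - 1) + (d if d < s else d - 1))
        change_map.append(row)
    return change_map


# Stand-in segmenter: the toy "images" are already label maps.
identity = lambda image: image
src = [[0, 0], [1, 2]]
dst = [[0, 1], [0, 2]]
result = two_step_scpa(src, dst, identity, num_classes=3)
```

Because the two images are segmented independently, any segmentation error in either image propagates directly into the change map, which is one reason to expect this baseline to struggle.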
One-step method. Although the two-step method above can be used for the SCPA task, it is not specifically designed for it. For unchanged pixels, the land class information output by the two-step method is unnecessary, since we only need to know the change type of pixels that actually changed. Hence we expect a more unified, one-step method targeting the SCPA task in the future. Ideally, given a pair of images, the one-step method directly gives the output of the task without extra steps. Fig. 3 presents the pipeline. Some related works [12, 10] have been proposed for binary and semantic change detection, yet more effort is required to construct a unified, one-step method for the SCPA task.
4 SCPA-Wuhan City Dataset
4.1 Dataset Overview
To facilitate the development of the proposed task, we construct the first well-annotated dataset containing both semantic labels and change information. The dataset is based on a large pair of aerial images of Wuhan City, China, and is named the SCPA-Wuhan City (SCPA-WC) dataset. The pair of aerial images is first registered at the pixel level to meet the requirement of the task. The images were acquired by the IKONOS sensor and have a spatial resolution of 1 m. The aerial images come from .
In practice, it is difficult to label the change type of each pixel directly. Therefore, we first label each pixel of the source image and the destination image with its land class, i.e., the label used in semantic segmentation, then compare the pixel labels in the two images to obtain the final change type label. This procedure is efficient and accurate. To make sure the labels of unchanged areas stay the same, we label the source image first, map its labels to the destination image, and then adjust the labels of the changed areas. In addition, since this labeling approach provides land class labels for unchanged areas as well, it gives the two-step method described above more training samples.
As for the data split, we obtain 1,706 image patches (853 pairs of images) of size 512 × 512 by cropping the large pair of images. We then randomly assign 1/2 of them to the training set, 1/6 to the validation set, and 1/3 to the test set.
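A random split in these proportions could look like the sketch below. It splits at the level of image pairs so that a source patch and its destination patch stay together. The seed, the function name, and the rounding rule (floor for training and validation, remainder to test) are assumptions; the paper does not state the exact set sizes.

```python
import random
from typing import List, Tuple


def split_pairs(num_pairs: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly split pair indices into train (1/2), val (1/6) and test (rest)."""
    indices = list(range(num_pairs))
    random.Random(seed).shuffle(indices)   # fixed seed for a reproducible split
    n_train = num_pairs // 2               # assumed rounding: floor
    n_val = num_pairs // 6                 # assumed rounding: floor
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # remainder, roughly 1/3
    return train, val, test


train, val, test = split_pairs(853)
```

Since 853 is not divisible by 6, some rounding is unavoidable; a different rounding rule would shift a pair or two between the sets.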
4.2 Dataset Property
There are 7 land classes in total in this dataset: background, farmland, bare land, industrial area, parking area, residential area, and water body, which are common categories in urban areas. Changes among these categories also receive much attention in practice. Fig. 5 presents the detailed distribution of these categories. As we can see, farmland takes up the largest proportion in both the source image and the destination image. After several years, a large proportion of farmland and bare land has disappeared, whereas industrial, parking, and residential areas have increased considerably.
We further calculate the distribution of all change types, as illustrated in Fig. 6. No change is the most common case in the dataset, which is reasonable for a very large area. It is also interesting to compare elements that are symmetric about the diagonal. For instance, 1,498,180 pixels of farmland in the source image have changed to industrial area in the destination image, and 321,957 pixels of bare land have changed to parking area, whereas the opposite changes did not occur. This characterizes the development of the city, i.e., the city is in a period of fast development rather than decay. The current change detection task does not present this information, which demonstrates the significance of the proposed SCPA task in practical use.
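A change-type distribution of this kind can be tabulated directly from the two land class label maps. The sketch below counts, for every ordered pair of classes, how many pixels moved from one to the other; the class indices and toy maps are made up for illustration and do not reproduce the dataset's numbers.

```python
from typing import List

LabelMap = List[List[int]]


def transition_counts(src_labels: LabelMap, dst_labels: LabelMap,
                      num_classes: int) -> List[List[int]]:
    """counts[s][d] = number of pixels with class s in source and d in destination."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for src_row, dst_row in zip(src_labels, dst_labels):
        for s, d in zip(src_row, dst_row):
            counts[s][d] += 1
    return counts


# Toy label maps with 5 hypothetical classes.
src = [[1, 1, 2], [1, 2, 2]]
dst = [[3, 1, 4], [3, 2, 4]]
counts = transition_counts(src, dst, num_classes=5)

# Diagonal entries are unchanged pixels; comparing counts[s][d] with
# counts[d][s] reveals one-directional changes.
one_way = [(s, d) for s in range(5) for d in range(5)
           if s != d and counts[s][d] > 0 and counts[d][s] == 0]
```

Normalizing each row of `counts` by its sum would give an empirical transition probability for each source class, which connects this table to the status transition view discussed earlier.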
5.1 Experimental Settings
We have conducted extensive baseline experiments on the SCPA-WC dataset for the task. To make full use of the labeled data, we train the models on the training set and validation set together, and evaluate their performance on the test set.
As for training details, all models are trained on 4 Titan XP GPUs with 4 × 12 GB of memory in total. Stochastic gradient descent (SGD) with momentum is adopted to train the networks, with the momentum set to 0.9. We apply the poly learning rate policy, which reduces the learning rate at every iteration. This can be expressed as:
$$\mathit{lr} = \mathit{lr}_{0} \times \left(1 - \frac{\mathit{iter}}{\mathit{max\_iter}}\right)^{\mathit{power}}$$
where $\mathit{lr}$ is the current learning rate, $\mathit{lr}_{0}$ is the initial learning rate, $\mathit{iter}$ is the current iteration step, and $\mathit{max\_iter}$ is the maximal iteration step. $\mathit{lr}_{0}$ is set to 0.01 and $\mathit{power}$ is set to 0.9. $\mathit{max\_iter}$ depends on the batch size and the number of epochs. For all models, the batch size is 12 and the number of epochs is set to 100 to make sure the networks converge.
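The poly policy is simple to write down. The sketch below uses the stated initial learning rate of 0.01 and power of 0.9; the number of training samples used to derive the maximal iteration step is a hypothetical placeholder, since the paper does not list it.

```python
import math


def poly_learning_rate(iteration: int, max_iteration: int,
                       initial_lr: float = 0.01, power: float = 0.9) -> float:
    """Poly policy: initial_lr * (1 - iteration / max_iteration) ** power."""
    progress = min(iteration, max_iteration) / max_iteration  # clamp to [0, 1]
    return initial_lr * (1.0 - progress) ** power


# Hypothetical sample count; batch size 12 and 100 epochs as stated in the text.
num_samples = 1000
steps_per_epoch = math.ceil(num_samples / 12)
max_iteration = steps_per_epoch * 100

schedule = [poly_learning_rate(i, max_iteration)
            for i in range(0, max_iteration + 1, steps_per_epoch)]
```

The schedule starts at the initial learning rate and decays to zero at the final iteration. Clamping the progress term avoids raising a negative base to a fractional power if training runs past the planned number of steps.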
5.2 Baseline Methods
We adopt the two-step method described above as the baseline in our experiments. As explained before, it consists of two stages. The first stage obtains the land class labels of both the source image and the destination image, which is equivalent to a semantic segmentation task. The second stage compares the land class labels to get the corresponding change type. For a comprehensive comparison, we benchmark several typical semantic segmentation methods on the SCPA-WC dataset.
The benchmarked methods are FCN, FRRN, GCN, DeepLabv3, and RefineNet. FCN is the first method to address semantic segmentation with a fully convolutional network. FRRN designs two information streams to combine high-resolution feature maps with low-resolution ones. GCN demonstrates the importance of large convolutional kernels and uses them to improve segmentation performance. DeepLabv3 adopts an atrous spatial pyramid pooling module with global pooling to extract high-resolution feature maps without adding extra parameters. RefineNet is a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process, and it enables high-resolution prediction with long-range residual connections.
5.3 Experimental Analysis
Tab. 1 presents quantitative results, and Fig. 7 shows visualization results of these methods. As Tab. 1 shows, these methods cannot handle the SCPA task well. The mean IoU is as low as 8.31%. For change types (0,5) and (0,6), all methods fail completely. This indicates that the simple two-step pipeline based on semantic segmentation alone is far from satisfactory, and that there is much room for new methods for the challenging SCPA task.
In addition, as Tab. 1 shows, the binary accuracy is high. This means these methods can handle the traditional binary change detection task well, while the SCPA task remains very challenging for them. This supports the view that SCPA is a higher-level and more complex task than traditional change detection, and that it is worth exploring.
6 Future of Semantic Change Pattern Analysis
Semantic change pattern analysis is a meaningful and challenging task that current methods cannot handle well. To facilitate the development of the field, we conclude by discussing some possibilities.
Dataset. Although this work provides the first well-annotated dataset for the SCPA task, it only covers aerial images. Datasets in other domains, such as street view, are also expected. They would broaden the usage of the SCPA task and benefit the evaluation of SCPA models in those domains.
Method. As the experiments show, the two-step method based on semantic segmentation alone performs poorly on the SCPA task, which encourages us to explore other approaches. On one hand, for the two-step method, we think the main reason these naive approaches fail is that they do not take the relation between the source image and the destination image into consideration. Hence we could try to introduce the spatial-temporal relation into the problem. For instance, the status transition probability could be used as auxiliary information to decide the change type. A similar idea has been studied for the binary change detection task in . On the other hand, we also expect a more unified method that handles the source image and the destination image simultaneously and gives the final SCPA result directly. We think this is interesting though challenging. Siamese networks might be useful.
Application. Though we mainly focus on aerial images in this work, the application of SCPA should not be limited to this domain. It could be applied to general scenes including street view, just like existing uses of change detection in the general vision field [14, 26, 31, 29, 1]. In addition, we think visual tracking and SCPA might benefit each other, because the spatial-temporal relation is an important part of both tasks. We hope the semantic change pattern analysis task will draw more attention from the community and invigorate change detection and related fields.
References
[1] (2018) Street-view change detection with deconvolutional networks. Autonomous Robots 42 (7), pp. 1301–1322. Cited by: §2.2, §2.3, §6.
[2] Change detection in optical aerial images by a multilayer conditional mixed Markov model. IEEE Transactions on Geoscience and Remote Sensing 47 (10), pp. 3416–3430. Cited by: §2.3.
[3] (2013) Change detection in feature space using local binary similarity patterns. In 2013 International Conference on Computer and Robot Vision, pp. 106–112. Cited by: §1.
[4] (2011) Constrained optical flow for aerial image change detection. In 2011 IEEE International Geoscience and Remote Sensing Symposium, pp. 4176–4179. Cited by: §2.3.
[5] (2005) A wavelet-based change-detection technique for multitemporal SAR images. In International Workshop on the Analysis of Multi-Temporal Remote Sensing Images, 2005, pp. 85–89. Cited by: §2.1.
[6] (2012) A novel framework for the design of change-detection systems for very-high-resolution remote sensing images. Proceedings of the IEEE 101 (3), pp. 609–630. Cited by: §1.
[7] (2000) Automatic analysis of the difference image for unsupervised change detection. IEEE Transactions on Geoscience and Remote Sensing 38 (3), pp. 1171–1182. Cited by: §2.1.
[8] Unsupervised change detection in satellite images using principal component analysis and k-means clustering. IEEE Geoscience and Remote Sensing Letters 6 (4), pp. 772–776. Cited by: §2.1.
[9] (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587. Cited by: §5.2, Table 1.
[10] (2018) MFCNET: end-to-end approach for change detection in images. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 4008–4012. Cited by: §2.1, §3.3.
[11] (2018) Urban change detection for multispectral earth observation using convolutional neural networks. In 2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 2115–2118. Cited by: §2.3.
[12] (2018) Fully convolutional siamese networks for change detection. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 4063–4067. Cited by: §3.3.
[13] (2019) Multitask learning for large-scale semantic change detection. Computer Vision and Image Understanding 187, pp. 102783. Cited by: §1, §1, §1, §2.1, §2.2, §2.3.
[14] (2008) Using 3D line segments for robust and efficient change detection from multiple noisy images. In European Conference on Computer Vision (ECCV), pp. 172–185. Cited by: §6.
[15] Convolutional neural network features based change detection in satellite images. First International Workshop on Pattern Recognition, Vol. 10011, pp. 100110W. Cited by: §2.1.
[16] (2017) Zoom out CNNs features for optical remote sensing change detection. In 2017 2nd International Conference on Image, Vision and Computing (ICIVC), pp. 812–817. Cited by: §2.1.
[17] (2015) Fine-grained change detection of misaligned scenes with varied illuminations. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1260–1268. Cited by: §1.
[18] (2012) Changedetection.net: a new change detection benchmark dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 1–8. Cited by: §1.
[19] (2013) Semantic approach in image change detection. In International Conference on Advanced Concepts for Intelligent Vision Systems, pp. 450–459. Cited by: §1.
[20] (2016) Semantic change detection with hypermaps. arXiv preprint arXiv:1604.07513 2 (4). Cited by: §1, §2.2, §2.3.
[21] (2017) RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1925–1934. Cited by: §5.2, Table 1.
[22] (2008) Using local transition probability models in Markov random fields for forest change detection. Remote Sensing of Environment 112 (5), pp. 2222–2231. Cited by: §6.
[23] (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440. Cited by: §5.2, Table 1.
[24] (2019) Robust change captioning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4624–4633. Cited by: §1.
[25] (2017) Large kernel matters–improve semantic segmentation by global convolutional network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4353–4361. Cited by: §5.2, Table 1.
[26] (2014) Change detection in the presence of motion blur and rolling shutter effect. In European Conference on Computer Vision (ECCV), pp. 123–137. Cited by: §6.
[27] (2017) Full-resolution residual networks for semantic segmentation in street scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4151–4160. Cited by: §5.2, Table 1.
[28] (2019) Did it change? Learning to detect point-of-interest changes for proactive map updates. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4086–4095. Cited by: §1.
[29] (2015) Change detection from a street image pair using CNN features and superpixel segmentation. In BMVC, pp. 61–1. Cited by: §2.3, §6.
[30] (2013) City-scale change detection in cadastral 3D models using images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 113–120. Cited by: §1.
[31] (2014) Image-based 4-D reconstruction using 3-D change detection. In European Conference on Computer Vision (ECCV), pp. 31–45. Cited by: §6.
[32] ChangeNet: a deep learning architecture for visual change detection. In Proceedings of the European Conference on Computer Vision (ECCV) Workshop. Cited by: §1, §2.2.
[33] (2014) CDnet 2014: an expanded change detection benchmark dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 387–394. Cited by: §1.
[34] (2017) Kernel slow feature analysis for scene change detection. IEEE Transactions on Geoscience and Remote Sensing 55 (4), pp. 2367–2384. Cited by: §4.1.
[35] (2017) Change detection based on deep siamese convolutional network for optical aerial images. IEEE Geoscience and Remote Sensing Letters 14 (10), pp. 1845–1849. Cited by: §2.1.