While climate change has become one of the greatest threats to our world, renewable energy sources such as solar power are critical in the fight against climate change (Chu and Majumdar, 2012; Agnew and Dargusch, 2015). China, the world's leading installer of solar photovoltaics (PV), is the world's largest producer of solar PV power, and its massive solar farms were built not only to produce clean energy but also to reduce poverty.
However, one question remains to be answered: where are those solar farms located? Mapping the locations of solar farms and tracking their installation progress is particularly important for the following reasons: first, it allows the government to gauge the development of the solar power industry and formulate strategies; second, it helps solar power companies quantify and optimize the efficiency of their solar panels; third, it is useful for investors to evaluate the operations of solar power companies. Obviously, it is impractical to locate solar farms on maps manually. What if we could trace them systematically from the sky? Recently, more and more companies have launched satellites into space and produced massive amounts of satellite imagery data, which has accelerated its commercialization in various fields.
In this paper, we propose a deep learning framework named SolarNet, which analyzes large-scale high-resolution satellite imagery data and is able to accurately identify hundreds of visible large solar farms in China, many of which are built in deserts, mountains and even lakes. To the best of our knowledge, this is the first time that the locations and sizes of solar farms in China have been tracked by mining satellite imagery data with deep learning algorithms.
2 Related Works
In this section, we give a brief review of related works. Semantic segmentation (Long et al., 2015) is an important computer vision technique that has been widely applied to detect objects in remote sensing imagery data, such as urban architectural segmentation (Wei et al., 2004; Bischke et al., 2019), road extraction (Mokhtarzade and Zoej, 2007), crop segmentation (Rydberg and Borgefors, 2001), etc. However, compared with natural images, segmentation of satellite imagery is much more challenging because: 1) the resolution of different satellites may not be consistent; 2) satellite images are huge, which may lead to high computational cost; 3) the background, clouds, reflection of sunlight, etc. can complicate the segmentation; 4) the texture of solar panels may also vary due to differing sensor specs. Our framework SolarNet, which detects solar farms from satellite imagery data, is designed based on semantic segmentation.
Semantic Segmentation: Deep learning has achieved great success in semantic segmentation tasks (Krizhevsky et al., 2012). In 2014, the Fully Convolutional Network (FCN) (Long et al., 2015), which replaced the network's fully connected layers with convolutions, was proposed and achieved much higher accuracy than patch classification methods (Varma and Zisserman, 2008). More recently, Li et al. (2019) presented at ICCV 2019 a state-of-the-art segmentation algorithm named EMANet.
Solar Panel Detection: Most recently, Yu et al. (2018) proposed a framework called DeepSolar, which successfully located civil solar panels in the United States and published a public data set. Their data set mainly covers household solar panels in the US; by contrast, most of the large solar power plants in China were built in the field against complex backgrounds such as deserts, mountains and even lakes, as shown in Figure 1, which poses more challenges to the detection task. Our algorithm addresses those difficulties by combining the advantages of FCN and EMANet. In order to fully evaluate the proposed segmentation method, we also created a satellite imagery data set of the solar plants in China to train our model.
SolarNet is based on Expectation-Maximization Attention Networks (EMANet). To compare performance, we used UNet, one of the most popular deep learning based semantic segmentation methods, as a baseline algorithm.
3.1 UNet: the baseline
The network architecture is described in detail in Table 1. It has two parts: a contracting path and an expansive path. The contracting path follows the typical architecture of a convolutional network: each step applies two repeated 3x3 convolutions followed by a 2x2 max pooling operation with stride 2 for downsampling, and at each downsampling step we double the number of feature channels. In the expansive path, every step consists of upsampling the feature map, followed by a 2x2 convolution that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a BN layer and a ReLU layer. In the final layer, a 1x1 convolution maps each 2-component feature vector to the desired number of classes, i.e., whether the pixel belongs to a solar panel or not. The network has 17 convolutional layers in total.
Table 1: The UNet architecture used as the baseline.
|3x3 conv, 64 | 3x3 conv, 64 | pooling | BN & ReLU|
|3x3 conv, 128 | 3x3 conv, 128 | pooling | BN & ReLU|
|3x3 conv, 256 | 3x3 conv, 256 | pooling | BN & ReLU|
|3x3 conv, 512 | 3x3 conv, 512 | pooling | BN & ReLU|
|3x3 conv, 512 | 3x3 conv, 512 | upsampling | BN & ReLU|
|3x3 conv, 256 | 3x3 conv, 256 | upsampling | BN & ReLU|
|3x3 conv, 128 | 3x3 conv, 128 | upsampling | BN & ReLU|
|3x3 conv, 64 | 3x3 conv, 64 | upsampling | BN & ReLU|
|1x1 conv, 2 | SoftMax|
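The channel and spatial bookkeeping above can be sanity-checked with a short shape trace. This is a sketch for illustration, not the authors' code; the 256x256 input size and 3 input channels are assumptions.

```python
# Trace (channels, height, width) through the UNet-style encoder/decoder of
# Table 1. Padded 3x3 convs keep the spatial size; 2x2 max pooling with
# stride 2 halves it; each upsampling step doubles it back.

def unet_shapes(h, w, widths=(64, 128, 256, 512)):
    """Return encoder shapes, bottleneck shape, decoder shapes, and head shape."""
    enc = []
    for c in widths:
        enc.append((c, h, w))        # after the two padded 3x3 convs
        h, w = h // 2, w // 2        # 2x2 max pool, stride 2
    bottleneck = (widths[-1], h, w)  # two 3x3 convs at the deepest width
    dec = []
    for c in reversed(widths):
        h, w = h * 2, w * 2          # upsampling doubles the spatial size
        dec.append((c, h, w))        # concat + two 3x3 convs -> c channels
    head = (2, h, w)                 # final 1x1 conv to 2 classes
    return enc, bottleneck, dec, head

enc, mid, dec, head = unet_shapes(256, 256)
```

With four pooling and four upsampling steps the output mask has the same spatial size as the input, which is what allows per-pixel classification.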
3.2 SolarNet: a Multitask Expectation-Maximization Attention Network
Attention mechanisms have been widely used for various tasks. The Expectation-Maximization Attention (EMA) module, built on the EM algorithm (Moon, 1996), is robust with regard to the variance of the input and is also efficient in terms of memory and computational power (Wang et al., 2018). For a brief introduction, consider an intermediate activated feature map $X$ of size $C \times H \times W$ produced by a CNN from a single image. We reshape $X$ into $\mathbb{R}^{N \times C}$, where $N = H \times W$. Given the input $X$, the bases $\mu \in \mathbb{R}^{K \times C}$ and the responsibilities $Z \in \mathbb{R}^{N \times K}$ are the latent variables. The E-step estimates the latent variables,
$$z_{nk} = \frac{\kappa(x_n, \mu_k)}{\sum_{j=1}^{K} \kappa(x_n, \mu_j)},$$
and the M-step then updates the bases,
$$\mu_k = \frac{\sum_{n=1}^{N} z_{nk} x_n}{\sum_{n=1}^{N} z_{nk}},$$
where $\kappa$ represents a general kernel function; we simply take the exponential inner dot $\kappa(a, b) = \exp(a^\top b)$ in our implementation. After $T$ iterations we reconstruct $\tilde{X} = Z\mu$; since $\tilde{X}$ lies in a subspace spanned by $\mu$, this removes much unnecessary noise and makes the final classification of each pixel more separable. Moreover, this operation reduces the complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(NK)$ in the pixel segmentation process.
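The E-step/M-step iteration above can be sketched in a few lines of numpy. This is an illustration under stated assumptions, not the paper's implementation: the exponential-inner-dot kernel reduces the E-step to a softmax over bases, and we L2-normalize the bases after each M-step for numerical stability.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def ema_attention(X, mu, T=3):
    """X: N x C reshaped feature map; mu: K x C bases; T: EM rounds."""
    for _ in range(T):
        Z = softmax(X @ mu.T, axis=1)            # E-step: N x K responsibilities
        mu = (Z.T @ X) / Z.sum(axis=0)[:, None]  # M-step: weighted mean of features
        mu /= np.linalg.norm(mu, axis=1, keepdims=True)  # keep bases on unit sphere
    return Z @ mu, mu                            # X_tilde = Z @ mu lies in span(mu)

rng = np.random.default_rng(0)
N, C, K = 64, 16, 4                              # N = H*W pixels, C channels, K bases
X = rng.normal(size=(N, C))
mu0 = rng.normal(size=(K, C))
X_tilde, mu = ema_attention(X, mu0)
```

Because the reconstruction is a product of an N x K and a K x C matrix, the denoised feature map has rank at most K, which is exactly the low-rank structure that makes per-pixel classification easier.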
One shortcoming of the FCN segmentation structure is that its multiple local convolution operations cannot capture sufficient global information, which harms its performance on discontinuous object segmentation. The EMAU structure, based on the EM algorithm, performs unsupervised clustering without convolution operations and thus effectively captures global information. In our case, solar power plants usually scatter across discontinuous areas, as shown in Figure 4, and EMANet is able to handle such cases, as shown in the results section.
Inspired by the work of Zhou and Le (Zhou et al., 2016; Le et al., 2019), we propose an optimized multitask EMANet that combines local pixel-level segmentation with global image-level classification. Many existing studies show that the feature maps of a classification network usually highlight the region of the object to be segmented, which can improve segmentation performance.
Moreover, DeepSolar (Yu et al., 2018) did not use a segmentation network but instead leveraged intermediate results from the classification branch and generated Class Activation Maps (CAMs) by aggregating the feature maps learned by the convolutional layers. This method does not require segmentation ground truth to train the model, but it does require ground-truth class labels to minimize the classification error.
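The CAM aggregation mentioned above can be sketched as follows. This is our illustration of the standard CAM recipe (Zhou et al., 2016), not DeepSolar's code: the final feature maps are combined with the fully connected weights of the target class, which localizes the object without any segmentation labels.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """features: H x W x C final conv feature maps; fc_weights: num_classes x C."""
    cam = features @ fc_weights[class_idx]  # weighted sum over channels -> H x W
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()                    # normalize to [0, 1] for visualization
    return cam

rng = np.random.default_rng(1)
feats = rng.random((8, 8, 32))              # toy 8x8 spatial grid, 32 channels
w = rng.normal(size=(2, 32))                # 2 classes: background / solar panel
cam = class_activation_map(feats, w, class_idx=1)
```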
The proposed SolarNet architecture uses a pretrained ResNet-101 (He et al., 2016) as the backbone and the EMAU module to extract features. After reconfiguring the features of the EMAU module, they are summed with the features of ResNet-101, and the summed feature is used for the final segmentation task. SolarNet adopts a classification network to further enhance the segmentation results. The classification network shares its weights with the segmentation network, and its final layer is a fully connected layer that classifies whether the image contains solar panels or not. With a single forward pass we compute the segmentation loss and classification loss simultaneously. The network architecture is shown in Figure 4.
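The joint objective computed in that single forward pass can be sketched as below. The exact loss form and weighting are not given in the text, so this is an assumed standard combination: a mean per-pixel cross-entropy for segmentation plus an image-level cross-entropy for classification, balanced by an illustrative weight `lam`.

```python
import numpy as np

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy given softmax outputs p (..., 2) and labels y (0/1)."""
    return -np.log(p[..., 0] * (1 - y) + p[..., 1] * y + eps)

def multitask_loss(seg_probs, seg_mask, cls_prob, cls_label, lam=1.0):
    """seg_probs: H x W x 2 softmax map; seg_mask: H x W binary ground truth;
    cls_prob: length-2 softmax; cls_label: 1 if the image contains panels."""
    seg_loss = cross_entropy(seg_probs, seg_mask).mean()  # pixel-level branch
    cls_loss = cross_entropy(cls_prob, cls_label)         # image-level branch
    return seg_loss + lam * cls_loss

# Toy example: a confident, correct prediction gives a small total loss.
probs = np.zeros((4, 4, 2))
probs[..., 0], probs[..., 1] = 0.9, 0.1          # predicts "background" everywhere
mask = np.zeros((4, 4))                          # ground truth: no panels
loss = multitask_loss(probs, mask, np.array([0.9, 0.1]), 0)
```

Because both branches share weights, a single backward pass through this sum trains segmentation and classification jointly.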
When training the model, we adopted the Adam gradient descent method (Burges et al., 2005; Dozat, 2016). In order to fully incorporate the EMAU module into deep neural networks, we describe here how it is trained in each iteration. As each image has a different pixel feature distribution from the others, using the bases $\mu$ estimated on one image to reconstruct the feature maps of a new image is not suitable, so we run the EMAU module on each image separately. For the first mini-batch, Kaiming initialization (He et al., 2015) is used to initialize $\mu$, where the matrix multiplication can be treated as a convolution. For the following batches, one could simply use standard back propagation to update $\mu$. However, since the iterations of the E-step and M-step can be expanded as a recurrent neural network (RNN) (Mikolov et al., 2010), the gradients propagating through them suffer from vanishing or explosion problems, making the updating of $\mu$ unstable. We therefore use moving averaging (Dandawate and Giannakis, 1995) to update $\mu$ during training. After several iterations over an image, the generated $\mu^{(T)}$ can be regarded as a biased update of $\mu$, where the bias comes from the image sampling process.
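The moving-average update can be written in one line. This is a sketch under assumptions: the momentum value 0.9 is illustrative (the text does not specify it), and we re-normalize the bases after each update, as is common for EMANet-style modules.

```python
import numpy as np

def update_bases(mu_running, mu_batch, momentum=0.9):
    """Blend the running bases with the bases produced by the T EM rounds
    on the current mini-batch, instead of back-propagating into mu."""
    mu_new = momentum * mu_running + (1 - momentum) * mu_batch
    # project back onto the unit sphere so basis magnitudes stay comparable
    return mu_new / np.linalg.norm(mu_new, axis=1, keepdims=True)

rng = np.random.default_rng(2)
mu_run = rng.normal(size=(4, 16))
mu_run /= np.linalg.norm(mu_run, axis=1, keepdims=True)
mu_new = update_bases(mu_run, rng.normal(size=(4, 16)))
```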
The pseudo code of the training process of SolarNet is shown in Algorithm 1. It is important to note that each training iteration requires a $T$-round clustering process of the EMAU module, and at test time each image likewise goes through a clustering process with $T$ rounds of iteration.
In this section, we elaborate on the implementation details of SolarNet and present the results of mapping all the solar farms in China. We first compare the performance of SolarNet against two other baseline methods on three datasets. We then visualize the locations and distribution of all solar power plants in China detected by SolarNet. Finally, we show several failure cases and discuss how to further improve our algorithm.
819 images were used to train the model while 119 images were used to test it. The size of all the images ranges from to . In order to create more data to train the model, we adopted the following data augmentation methods:
- Crop: choose a random ROI area from the original image.
- Scale: choose a random scale size and rescale the original image.
- Rotation: choose a random angle and rotate the original image.
- Reflection: flip the original image horizontally or vertically.
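The four augmentations can be sketched as below. This is a simplified, dependency-free illustration rather than the pipeline used in the paper: rotation is restricted to 90-degree multiples and scaling to nearest-neighbor resampling, whereas the paper draws arbitrary angles and scales.

```python
import numpy as np

def random_crop(img, size, rng):
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size]

def rescale_nn(img, factor):
    """Nearest-neighbor rescale by repeating/skipping source rows and columns."""
    h, w = img.shape[:2]
    ys = (np.arange(int(h * factor)) / factor).astype(int)
    xs = (np.arange(int(w * factor)) / factor).astype(int)
    return img[ys][:, xs]

def augment(img, rng):
    img = random_crop(img, size=img.shape[0] // 2, rng=rng)   # Crop
    img = rescale_nn(img, factor=rng.choice([0.5, 1.0, 2.0]))  # Scale
    img = np.rot90(img, k=rng.integers(0, 4))                  # Rotation
    if rng.random() < 0.5:
        img = img[:, ::-1]                                     # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1]                                        # vertical flip
    return img

rng = np.random.default_rng(0)
out = augment(np.zeros((64, 64, 3)), rng)
```

In practice the crop size, scale set, and flip probabilities would be tuned to the satellite tile resolution; segmentation masks must of course be transformed identically.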
|Parameter | Learning Rate | Iteration | Training Set | Testing Set|
|Parameter | EM Iteration | EM Latent Variables Size|
We used mean Intersection over Union (mIoU) as the criterion to evaluate segmentation performance and compared SolarNet with two other methods. The results in Table 3 show that SolarNet outperformed the other two. Figure 5(d) shows several solar farms detected by all three methods; one can see that SolarNet is able to accurately detect solar farms against very complex backgrounds. Figure shows two sizeable solar farms we detected, which are shaped like a horse and a panda, respectively.
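For reference, the mIoU metric used above can be computed as follows: IoU is evaluated per class (background and solar panel) and then averaged. This is a standard-definition sketch, not the paper's evaluation script.

```python
import numpy as np

def mean_iou(pred, gt, num_classes=2, eps=1e-12):
    """pred, gt: integer label maps of the same shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / (union + eps))  # eps guards against an absent class
    return float(np.mean(ious))

pred = np.array([[1, 1],
                 [0, 0]])
gt   = np.array([[1, 0],
                 [0, 0]])
# class 0: intersection 2, union 3 -> 2/3; class 1: intersection 1, union 2 -> 1/2
score = mean_iou(pred, gt)
```

Averaging over both classes prevents the dominant background class from masking poor panel segmentation, which matters when panels cover only a small fraction of each tile.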
|our dataset | deepsolar dataset | our+deepsolar dataset|
We then used the trained SolarNet framework to map all the solar farms in China by mining large-scale satellite imagery covering the whole country. We successfully detected about 500 solar farms covering a total area of 2,000 square kilometers (770 square miles), equivalent to the size of the whole city of Shenzhen, or two and a half New York Cities. Figure 8 visualizes the locations of all detected solar farms in China, marked by blue dots. One can see that most of the solar farms were built in the northwestern part of China, where sunlight is abundant and thus ideal for solar power. Among all the provinces in China, Qinghai has installed the most solar farms, with a total area of nearly 400 square kilometers, as shown in Figure 7.
5 Discussion and future work
In this paper, we proposed a deep learning framework named SolarNet to map solar farms from massive satellite imagery data. The method was evaluated against two other image segmentation algorithms, and the results demonstrated the accuracy of SolarNet. We then used SolarNet to successfully detect nearly 500 large solar farms in China, covering nearly 2,000 square kilometers, equivalent to the size of the whole city of Shenzhen. To the best of our knowledge, this is the first time that the locations and sizes of solar farms have been identified on satellite imagery through deep learning in China, the largest producer of solar power in the world.
SolarNet may fail to detect solar farms when they resemble their surrounding background, as shown in Figure 9. In the future, we plan to improve our method in the following ways:
- Labeling more solar panels from satellite imagery data in various circumstances, such as solar panels on rooftops in residential areas.
Mapping and tracking the installation of solar panels from satellite imagery data is very helpful in the following ways: 1) it can help solar PV power companies optimize the location and orientation of solar panels so that they can maximize their renewable energy production; 2) it can help investors and market researchers track the latest trends in the solar power industry; 3) the government can evaluate the efficiency of its policies based on our results, for example, how subsidy policies are impacting the development of the solar power industry. Therefore, we plan to build a Solar Power Index for China by analyzing longer historical satellite imagery with SolarNet so that we can track long-term trends. We also plan to apply the proposed framework to map the locations and develop indices of other types of renewable energy, such as wind turbines.
References
- Effect of residential solar and storage on centralized electricity supply systems. Nature Climate Change 5 (4), pp. 315–318.
- Detection of neolithic settlements in Thessaly (Greece) through multispectral and hyperspectral satellite imagery. Sensors 9 (2), pp. 1167–1187.
- Detecting the effects of hydrocarbon pollution in the Amazon forest using hyperspectral satellite images. Environmental Pollution 205, pp. 225–239.
- Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS Journal of Photogrammetry and Remote Sensing 58 (3–4), pp. 239–258.
- Multi-task learning for segmentation of building footprints with deep neural networks. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 1480–1484.
- Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning (ICML-05), pp. 89–96.
- Opportunities and challenges for a sustainable energy future. Nature 488 (7411), pp. 294.
- R-FCN: object detection via region-based fully convolutional networks. In Advances in Neural Information Processing Systems, pp. 379–387.
- Asymptotic theory of mixed time averages and kth-order cyclic-moment and cumulant statistics. IEEE Transactions on Information Theory 41 (1), pp. 216–232.
- Incorporating Nesterov momentum into Adam.
- Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.
- Identity mappings in deep residual networks. In European Conference on Computer Vision, pp. 630–645.
- ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
- Multitask classification and segmentation for cancer diagnosis in mammography. arXiv preprint arXiv:1909.05397.
- Expectation-maximization attention networks for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 9167–9176.
- Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
- Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
- Road detection from high-resolution satellite images using artificial neural networks. International Journal of Applied Earth Observation and Geoinformation 9 (1), pp. 32–40.
- The expectation-maximization algorithm. IEEE Signal Processing Magazine 13 (6), pp. 47–60.
- Integrated method for boundary delineation of agricultural fields in multispectral satellite images. IEEE Transactions on Geoscience and Remote Sensing 39 (11), pp. 2514–2520.
- Deep high-resolution representation learning for human pose estimation. arXiv preprint arXiv:1902.09212.
- A statistical approach to material classification using image patch exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (11), pp. 2032–2047.
- Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803.
- Urban building extraction from high-resolution satellite panchromatic image using clustering and edge detection. In IGARSS 2004, IEEE International Geoscience and Remote Sensing Symposium, Vol. 3, pp. 2008–2010.
- DeepSolar: a machine learning framework to efficiently construct a solar deployment database in the United States. Joule 2 (12), pp. 2605–2617.
- Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929.