SolarNet: A Deep Learning Framework to Map Solar Power Plants In China From Satellite Imagery

12/08/2019 ∙ by Xin Hou, et al.

Renewable energy such as solar power is critical in the fight against increasingly serious climate change. China is the world's leading installer of solar panels, and numerous solar power plants have been built there. In this paper, we propose a deep learning framework named SolarNet, which performs semantic segmentation on large-scale satellite imagery to detect solar farms. SolarNet has successfully mapped 439 solar farms in China, covering nearly 2000 square kilometers, equivalent to the size of the whole of Shenzhen or two and a half times the size of New York City. To the best of our knowledge, this is the first time that deep learning has been used to reveal the locations and sizes of solar farms in China, which could provide insights for solar power companies, market analysts and the government.




1 Introduction

While climate change has become one of the greatest threats to our world, renewable energy such as solar power is critical to fighting it Chu and Majumdar (2012); Agnew and Dargusch (2015). China, the world’s leading installer of solar photovoltaics (PV), is also the world’s largest producer of solar PV power, and massive solar farms have been built there not only to produce clean energy but also to reduce poverty.

However, one question remains to be answered: where are those solar farms located? Mapping the locations of solar farms and tracking their installation progress is important for several reasons: first, it allows the government to gauge the development of the solar power industry and formulate strategies; second, it helps solar power companies quantify and optimize the efficiency of solar panels; third, it is useful for investors evaluating the operation of solar power companies. It is impractical to locate solar farms on maps manually. What if we could trace them systematically from the sky? Recently, more and more companies have launched satellites into space, producing massive amounts of satellite imagery and thereby accelerating its commercialization in various fields.

In this paper, we propose a deep learning framework named SolarNet, which analyzes large-scale high-resolution satellite imagery and accurately identifies hundreds of visible large solar farms in China, many of which are built in deserts, mountains and even lakes. To the best of our knowledge, this is the first time that the locations and sizes of solar farms in China have been tracked by mining satellite imagery through deep learning algorithms.

2 Related Works

In this section, we give a brief review of related work. Semantic segmentation Long et al. (2015) is an important computer vision technique that has been widely applied to detect objects in remote sensing imagery, such as urban architectural segmentation Wei et al. (2004); Bischke et al. (2019), road extraction Mokhtarzade and Zoej (2007), crop segmentation Rydberg and Borgefors (2001), etc. However, compared with natural images, segmentation of satellite imagery is much more challenging because: 1) the resolution of different satellites may not be consistent; 2) satellite images are huge, which may lead to high computational cost; 3) the background, clouds, reflected sunlight, etc. can complicate the segmentation; 4) the texture of solar panels may vary with sensor specifications. Our framework SolarNet, which detects solar farms from satellite imagery, is built on semantic segmentation.

Semantic Segmentation: Deep learning has achieved great success in semantic segmentation tasks Krizhevsky et al. (2012). In 2014, the Fully Convolutional Network (FCN) Long et al. (2015), which replaced the network’s fully connected layers with convolutions, was proposed and achieved much higher accuracy than the patch classification method Varma and Zisserman (2008). Recently, Li et al. (2019) presented a state-of-the-art segmentation algorithm named EMANet at ICCV 2019.

Solar Panel Detection: Recently, Yu et al. (2018) proposed a framework called DeepSolar, which successfully located civil solar panels in the United States and produced a public data set. Their data set mainly focuses on household solar panels in the US. By contrast, most of the large solar power plants in China were built in the field against complex backgrounds such as deserts, mountains and even lakes, as shown in Figure 1, which poses more challenges for the detection task. Our algorithm addresses those difficulties by combining the advantages of FCN and EMANet. To fully evaluate the proposed segmentation method, we also created a satellite imagery data set of solar plants in China to train our model.

Figure 1: A selection of solar farms in China. The first row shows solar power plants in deserts, the second row shows solar power plants in mountains, and the last row shows solar power plants on lakes. One can see the complex backgrounds in those images.

3 Method

SolarNet is based on Expectation-Maximization Attention Networks (EMANet). To compare performance, we used UNet, one of the most popular deep learning based semantic segmentation methods, as a baseline algorithm.

3.1 UNet

Unlike the classic Convolutional Neural Network (CNN), which uses fully connected layers to obtain fixed-length feature vectors for classification Dai et al. (2016), FCN replaces them with convolutional layers and is thus able to handle input images of any size. The deconvolution layer of FCN upsamples the feature map of the last convolutional layer. This architecture can produce a prediction for each pixel while retaining the spatial information of the original input image. The UNet architecture, which stems from FCN and was first proposed by [], is used as a baseline model; the network architecture is illustrated in Figure 2.

Figure 2: UNet Architecture.

The network architecture is described in detail in Table 1. It has two parts: a contracting path and an expansive path. The contracting path follows the typical architecture of a convolutional network. It uses two repeated convolutions with a 3×3 kernel, each followed by a batch normalization (BN) layer and a rectified linear unit (ReLU), and then a 2×2 max pooling operation with stride 2 for downsampling. At each downsampling step we double the number of feature channels. Each step of the expansive path consists of an upsampling of the feature map followed by a 2×2 convolution that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3×3 convolutions, each followed by a BN layer and a ReLU layer. In the final layer, a 1×1 convolution maps each feature vector to the two output classes, indicating whether the pixel belongs to a solar panel or not. The network has 17 convolutional layers in total.

First convolution    Second convolution   Resampling    Normalization & activation
3x3 conv, 64 dim     3x3 conv, 64 dim     pooling       BN & ReLU
3x3 conv, 128 dim    3x3 conv, 128 dim    pooling       BN & ReLU
3x3 conv, 256 dim    3x3 conv, 256 dim    pooling       BN & ReLU
3x3 conv, 512 dim    3x3 conv, 512 dim    pooling       BN & ReLU
3x3 conv, 512 dim    3x3 conv, 512 dim    upsampling    BN & ReLU
3x3 conv, 256 dim    3x3 conv, 256 dim    upsampling    BN & ReLU
3x3 conv, 128 dim    3x3 conv, 128 dim    upsampling    BN & ReLU
3x3 conv, 64 dim     3x3 conv, 64 dim     upsampling    BN & ReLU
1x1 conv, 2 dim                                         SoftMax
Table 1: UNet architecture detail
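The layer count stated above can be checked against Table 1 with a small sketch. It only encodes the rows of the table; the input size of 512 pixels and the assumption that convolutions preserve spatial size are illustrative choices, not values from the paper.

```python
# Rows of Table 1: (channels, resampling step applied after the two convolutions).
TABLE_1 = [
    (64, "pooling"), (128, "pooling"), (256, "pooling"), (512, "pooling"),
    (512, "upsampling"), (256, "upsampling"), (128, "upsampling"), (64, "upsampling"),
]


def count_conv_layers(rows):
    """Two 3x3 convolutions per row, plus the final 1x1 classifier."""
    return 2 * len(rows) + 1


def trace_resolution(size, rows):
    """Follow the spatial size through 2x2 pooling (halve) and upsampling (double).

    Assumes the convolutions themselves keep the spatial size unchanged.
    """
    sizes = []
    for _, step in rows:
        size = size // 2 if step == "pooling" else size * 2
        sizes.append(size)
    return sizes


print(count_conv_layers(TABLE_1))      # 17
print(trace_resolution(512, TABLE_1))  # [256, 128, 64, 32, 64, 128, 256, 512]
```

Under these assumptions the output map has the same spatial size as the input, which is what a per-pixel prediction requires.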

3.2 SolarNet: a multitask Expectation-Maximization Attention Network

Attention mechanisms have been widely used for various tasks. The Expectation-Maximization Attention (EMA) module Moon (1996) is robust to the variance of the input and is also efficient in terms of memory and computation Wang et al. (2018). As a brief introduction, we consider an input feature map $X$ of size $C \times H \times W$ from a single image, where $X$ is the intermediate activated feature map of a CNN. We reshape $X$ into $N \times C$, where $N = H \times W$. Briefly, given the input $X \in \mathbb{R}^{N \times C}$, the initial bases are $\mu \in \mathbb{R}^{K \times C}$ and $Z \in \mathbb{R}^{N \times K}$ are the latent variables. The E-step estimates the latent variables $Z$, and the M-step then updates the bases $\mu$. After $T$ iterations, we reconstruct $X$ from $Z$ and $\mu$, since the reconstruction lies in a subspace of $X$. This removes much unnecessary noise and makes the final classification of each pixel easier. Moreover, this operation reduces the complexity of the pixel segmentation process from $O(N^2)$ to $O(NKT)$.
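For reference, the two steps can be written out as a short sketch following the standard EMANet formulation of Li et al. (2019). The symbols are assumptions chosen here for illustration: $x_n$ is the feature of pixel $n$, $\mu_k$ the $k$-th basis, $z_{nk}$ the responsibility of basis $k$ for pixel $n$, $\tilde{x}_n$ the reconstructed feature, and $\mathcal{K}$ a kernel function.

```latex
\begin{align}
\text{E-step:}\quad & z_{nk} = \frac{\mathcal{K}(x_n, \mu_k)}{\sum_{j=1}^{K} \mathcal{K}(x_n, \mu_j)} \\
\text{M-step:}\quad & \mu_k = \frac{\sum_{n=1}^{N} z_{nk}\, x_n}{\sum_{n=1}^{N} z_{nk}} \\
\text{Reconstruction:}\quad & \tilde{x}_n = \sum_{k=1}^{K} z_{nk}\, \mu_k
\end{align}
```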



where $\mathcal{K}(a, b)$ represents the general kernel function; in our implementation we simply take the exponential inner product, $\exp(a^\top b)$.
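The E-step and M-step described above can be sketched in plain Python. This is a minimal illustration that assumes the exponential inner-product kernel and uses nested lists instead of tensors; the function names `e_step`, `m_step` and `ema` are hypothetical and do not come from the SolarNet code.

```python
import math


def e_step(X, mu):
    """Responsibilities z[n][k]: softmax over k of the inner product <x_n, mu_k>."""
    Z = []
    for x in X:
        scores = [sum(a * b for a, b in zip(x, m)) for m in mu]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        Z.append([w / total for w in weights])
    return Z


def m_step(X, Z):
    """Each basis becomes the responsibility-weighted mean of the pixel features."""
    K, C = len(Z[0]), len(X[0])
    mu = []
    for k in range(K):
        mass = sum(z[k] for z in Z)
        mu.append([sum(z[k] * x[c] for z, x in zip(Z, X)) / mass for c in range(C)])
    return mu


def ema(X, mu, T=3):
    """Run T >= 1 rounds of E-step/M-step, then rebuild every pixel from the K bases."""
    for _ in range(T):
        Z = e_step(X, mu)
        mu = m_step(X, Z)
    C = len(X[0])
    X_hat = [[sum(z[k] * mu[k][c] for k in range(len(mu))) for c in range(C)] for z in Z]
    return X_hat, Z, mu


# Four "pixels" with two channels and K = 2 bases.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
X_hat, Z, mu = ema(X, [[1.0, 0.0], [0.0, 1.0]], T=3)
```

Each round touches every pixel-basis pair once, so the cost grows with the number of pixels times the number of bases times the number of rounds, rather than with the square of the number of pixels.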



One shortcoming of the FCN segmentation structure is that its multiple local convolution operations cannot capture sufficient global information, which harms performance when segmenting discontinuous objects. The EMAU, based on the EM algorithm, is an unsupervised clustering procedure without convolution operations and can therefore effectively capture global information. In our case, solar power plants are usually scattered over various discontinuous areas as shown in Figure 4, and EMANet is able to deal with such cases, as shown in the results section.

Figure 3: When performing a convolution operation, each convolution operator only extracts local spatial features. As a result, after multi-level convolution operations, the continuous spatial information of the feature map is split up by the convolution operators. By contrast, the EMAU module performs element-wise clustering and can capture more of the global spatial information.

Inspired by the work of Zhou et al. (2016); Le et al. (2019), we propose an optimized multitask EMANet, which combines local pixel-level segmentation and global image-level classification. Many existing studies show that the feature map of a classification network usually corresponds to the area of the object to be segmented, which could improve the segmentation performance.


Moreover, DeepSolar Yu et al. (2018) did not use a segmentation network but leveraged the intermediate results from the classification branch and generated Class Activation Maps (CAMs) by aggregating feature maps learned through the convolutional layers. This method does not require segmentation ground truth to train the model, but it does require ground-truth class labels to minimize the classification error.

The proposed SolarNet architecture uses a pretrained ResNet-101 He et al. (2016) as the backbone and the EMAU module to extract features. After the features are reconstructed by the EMAU module, they are summed with the ResNet-101 features, and the summed features are used for the final segmentation task. SolarNet adopts a classification network to further enhance the segmentation results. The classification network shares its weights with the segmentation network, and its final layer is a fully connected layer that classifies whether the image contains solar panels or not. With a single forward pass, we compute the segmentation loss and the classification loss simultaneously. The network architecture is shown in Figure 4.


Figure 4: SolarNet architecture. In addition to the EMA operator, two convolutions are placed at the beginning and the end of the EMA, and the output is summed with the original input to form a residual-like block.
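One way to combine the two losses from a single forward pass is sketched below. The paper does not state the loss functions or their weighting, so the use of cross-entropy for both branches and the `cls_weight` parameter are assumptions made purely for illustration.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target])


def multitask_loss(pixel_probs, pixel_labels, image_probs, image_label, cls_weight=1.0):
    """Mean per-pixel segmentation loss plus a weighted image-level classification loss.

    cls_weight is a hypothetical balancing factor, not a value from the paper.
    """
    seg = sum(cross_entropy(p, y) for p, y in zip(pixel_probs, pixel_labels)) / len(pixel_labels)
    cls = cross_entropy(image_probs, image_label)
    return seg + cls_weight * cls, seg, cls


# Two pixels (background, solar panel) and one image-level label ("contains solar panels").
total, seg, cls = multitask_loss([[0.9, 0.1], [0.2, 0.8]], [0, 1], [0.3, 0.7], 1)
```

Because both terms come from the same shared features, a gradient step on the summed loss updates the backbone for segmentation and classification at once.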

When training the model, we adopted the Adam gradient descent method Burges et al. (2005); Dozat (2016). To fully incorporate the EMAU into deep neural networks, we describe here how the EMAU is trained in each iteration. As each image has a different pixel feature distribution, using the bases $\mu$ computed on one image to reconstruct the feature maps of a new image is not suitable, so we need to run the EMAU module on each image. For the first mini-batch, Kaiming’s initialization He et al. (2015) is used to initialize $\mu^{(0)}$, where the matrix multiplication can be treated as a convolution. For the following batches, we could simply update $\mu^{(0)}$ by standard back propagation. However, since the iterations of the E-step and M-step can be unrolled as a recurrent neural network (RNN) Mikolov et al. (2010), the gradients propagating through them suffer from the vanishing or exploding gradient problem. The update of $\mu^{(0)}$ is therefore unstable, so a moving average Dandawate and Giannakis (1995) is used to update $\mu^{(0)}$ during training. After several iterations over an image, the generated $\mu^{(T)}$ can be considered a biased update of $\mu^{(0)}$, where the bias comes from the image sampling process.
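The moving-average update of the bases can be sketched as follows. The form "stored bases times momentum plus new bases times one minus momentum" follows the EMANet paper; the function name `update_bases` and the momentum value of 0.9 are assumptions for illustration only.

```python
def update_bases(mu_init, mu_batch, momentum=0.9):
    """Blend the stored initial bases with the bases produced by the latest EM rounds.

    mu_init and mu_batch are K x C nested lists; momentum is a hypothetical setting.
    """
    return [
        [momentum * a + (1.0 - momentum) * b for a, b in zip(row_init, row_batch)]
        for row_init, row_batch in zip(mu_init, mu_batch)
    ]


# One basis with two channels: the stored value moves a small step toward the new one.
new_mu = update_bases([[1.0, 0.0]], [[0.0, 1.0]], momentum=0.9)
```

Since no gradient flows through this update, the stored bases change smoothly from batch to batch and are not affected by the vanishing or exploding gradients of the unrolled EM iterations.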


The pseudo code of the SolarNet training process is shown in Algorithm 1. It is important to note that each iteration requires a semi-supervised clustering process consisting of T rounds of the EMAU module. During testing, a clustering process with T rounds of iteration is likewise performed on each image.

Initialize: random initial network weights
Input: original satellite imagery; semantic segmentation imagery; whether the image contains solar panels
function EStep()
     return the estimated latent variables
end function
function MStep()
     return the updated bases
end function
for each training iteration do
     random initialization of the bases
     for each of the T EM rounds do
          run EStep, then MStep
     end for
end for
Algorithm 1 SolarNet Training Procedure

4 Results

In this section, we describe the implementation details of SolarNet and present the results for all the solar farms in China that we have mapped. First, we compare the performance of SolarNet and two baseline methods on three datasets. Then we visualize the locations and distributions of all solar power plants in China detected by SolarNet. Finally, we show several failure cases and discuss how to improve our algorithm in the future.

819 images were used to train the model and 119 images were used to test it. The images vary in size. To create more training data, we adopted the following data augmentation methods:

  • Crop: chose a random region of interest (ROI) from the original image.

  • Scale: chose a random scale factor and rescaled the original image.

  • Rotation: chose a random angle and rotated the original image.

  • Reflection: flipped the original image horizontally or vertically.
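Some of the augmentations listed above can be sketched on small nested-list images. The point of the sketch is that every geometric transform has to be applied identically to the image and to its segmentation mask. The function names are hypothetical, scaling is omitted, and the rotation shown is the 90-degree special case rather than the random-angle rotation used in the paper.

```python
import random


def random_crop(image, mask, height, width, rng=random):
    """Cut the same random window out of the image and its label mask."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    rows = slice(top, top + height)
    return ([r[left:left + width] for r in image[rows]],
            [r[left:left + width] for r in mask[rows]])


def flip_horizontal(grid):
    """Mirror every row (left-right reflection)."""
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    """Reverse the row order (top-bottom reflection)."""
    return grid[::-1]


def rotate_90(grid):
    """Rotate clockwise by 90 degrees."""
    return [list(col) for col in zip(*grid[::-1])]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
crop_img, crop_mask = random_crop(image, mask, 2, 2, random.Random(0))
aug_img, aug_mask = flip_horizontal(image), flip_horizontal(mask)
```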

Parameter Learning Rate Iteration Training Set Testing Set
Parameter EM Iteration EM Latent Variables Size
Table 2: Parameters of SolarNet to train the model.

We used mean Intersection over Union (mIoU) as the criterion to evaluate segmentation performance and compared SolarNet with two other methods. The results in Table 3 show that SolarNet outperformed the other two methods on our dataset and on the combined dataset, while the single-task EMANet scored slightly higher on the DeepSolar dataset. Figure 5(d) shows several solar farms detected by all three methods, and one can see that SolarNet is able to accurately detect solar farms against very complex backgrounds. Figure 6 shows two sizeable solar farms we detected, which are shaped like a horse and a panda, respectively.

Model                      mIoU (our dataset)   mIoU (DeepSolar dataset)   mIoU (our + DeepSolar dataset)
Resnet101-Unet             84.65%               84.22%                     86.54%
Resnet101-EMANet-single    94.00%               90.98%                     93.79%
SolarNet-Multitask-1.0     94.21%               90.39%                     93.94%
Table 3: With the multi-task embedding, SolarNet beats the original EMANet and UNet in the evaluation on our dataset.
Figure 5: Solar farms located by SolarNet. The first column is the original satellite imagery. The blue area indicates the solar farms detected by UNet (second column) and EMANet (third column), and the red area in the fourth column indicates the manually labeled ground truth. One can see how SolarNet was able to accurately detect solar farms against very complicated backgrounds.
Figure 6: Two massive animal-shaped (horse and panda) solar farms detected by SolarNet.
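The mIoU criterion used in Table 3 can be computed as in the following sketch, which assumes flattened lists of per-pixel class labels (0 for background, 1 for solar panel). The function name `mean_iou` is chosen here for illustration.

```python
def mean_iou(pred, truth, num_classes=2):
    """Average the per-class intersection-over-union over the classes that occur."""
    ious = []
    for cls in range(num_classes):
        inter = sum(p == cls and t == cls for p, t in zip(pred, truth))
        union = sum(p == cls or t == cls for p, t in zip(pred, truth))
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)


# One of four pixels is wrongly predicted as solar panel.
score = mean_iou([1, 1, 0, 0], [1, 0, 0, 0])
print(round(score, 4))  # 0.5833
```

In this toy case the background class has IoU 2/3 and the solar panel class has IoU 1/2, so a single wrong pixel already costs a large share of the score on a small object.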

We then used the trained SolarNet framework to map all the solar farms in China by mining large-scale satellite imagery covering the whole of China. We detected about 500 solar farms covering a total area of 2000 square kilometers (770 square miles), equivalent to the size of the whole of Shenzhen or two and a half times the size of New York City. Figure 8 visualizes the locations of all detected solar farms in China, marked by blue dots. Most of the solar farms were built in the northwestern part of China, where sunlight is abundant and conditions are thus ideal for solar power. Among all the provinces in China, Qinghai has installed the most solar farms, with a total area of nearly 400 square kilometers, as shown in Figure 7.
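The area figures above follow from simple arithmetic on the segmentation masks. The sketch below shows how a pixel count could be turned into an area and converted between units. The ground resolution of 2 metres per pixel is a hypothetical value, as the resolution of the imagery is not stated here.

```python
SQ_KM_PER_SQ_MILE = 2.589988  # one square mile expressed in square kilometers


def mask_area_sq_km(num_pixels, metres_per_pixel):
    """Area covered by num_pixels square pixels of the given ground resolution."""
    return num_pixels * metres_per_pixel ** 2 / 1e6


def sq_km_to_sq_miles(area_sq_km):
    """Convert an area from square kilometers to square miles."""
    return area_sq_km / SQ_KM_PER_SQ_MILE


print(mask_area_sq_km(1_000_000, 2.0))  # 4.0
print(round(sq_km_to_sq_miles(2000)))   # 772
```

The converted value of roughly 772 square miles agrees with the rounded figure of 770 square miles quoted above.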

Figure 7: The area of detected solar farms in various provinces in China (unit: square kilometers)
Figure 8: Solar farm map in China. Each blue dot indicates a detected solar farm from satellite imagery. We colored each province according to the area of solar farms (darker color indicates larger areas). A heat map of solar farm density was also overlaid. Ten representative solar farms built on deserts, mountains, lakes or the fields were also displayed.

5 Discussion and future work

In this paper, we proposed a deep learning framework named SolarNet to map solar farms from massive satellite imagery data. The method was evaluated against two other image segmentation algorithms, and the results demonstrated the accuracy of SolarNet. We then used SolarNet to detect nearly 500 large solar farms in China, covering nearly 2000 square kilometers, equivalent to the size of the whole of Shenzhen. To the best of our knowledge, this is the first time that the locations and sizes of solar farms have been identified from satellite imagery through deep learning in China, the largest producer of solar power in the world.

SolarNet may fail to detect a solar farm when it resembles its surrounding background, as shown in Figure 9. In the future, we plan to improve our method in the following ways:

  • Labeling more solar panels from the satellite imagery data in various circumstances, such as the solar panels on the roof in residential areas.

  • Adapting SolarNet to handle satellite imagery with various resolutions Benz et al. (2004). For example, HRNet, proposed by Sun et al. (2019), maintains high-resolution representations and could help in dealing with images of various resolutions.

  • Using hyperspectral imagery to enhance the segmentation performance. As shown in Alexakis et al. (2009); Arellano et al. (2015), hyperspectral imagery could provide more information when detecting objects from satellites.

Figure 9: SolarNet may fail to detect the solar farm when it resembles its surrounding environment.

Mapping and tracking the installation of solar panels from satellite imagery is helpful in the following ways: 1) it could help solar PV power companies optimize the location and orientation of solar panels so that they can maximize their renewable energy production; 2) it could help investors and market researchers track the latest trends in the solar power industry; 3) the government could evaluate the effectiveness of its policies based on our results, for example, how subsidy policy is affecting the development of the solar power industry. Therefore, we plan to build a Solar Power Index for China by analyzing longer historical satellite imagery with SolarNet so that we can track long-term trends. We also plan to apply the proposed framework to map the locations and develop indices of other types of renewable energy facilities, such as wind turbines.


  • S. Agnew and P. Dargusch (2015) Effect of residential solar and storage on centralized electricity supply systems. Nature Climate Change 5 (4), pp. 315–318. Cited by: §1.
  • D. Alexakis, A. Sarris, T. Astaras, and K. Albanakis (2009) Detection of neolithic settlements in thessaly (greece) through multispectral and hyperspectral satellite imagery. Sensors 9 (2), pp. 1167–1187. Cited by: item 3).
  • P. Arellano, K. Tansey, H. Balzter, and D. S. Boyd (2015) Detecting the effects of hydrocarbon pollution in the amazon forest using hyperspectral satellite images. Environmental Pollution 205, pp. 225–239. Cited by: item 3).
  • U. C. Benz, P. Hofmann, G. Willhauck, I. Lingenfelder, and M. Heynen (2004) Multi-resolution, object-oriented fuzzy analysis of remote sensing data for gis-ready information. ISPRS Journal of photogrammetry and remote sensing 58 (3-4), pp. 239–258. Cited by: item 2).
  • B. Bischke, P. Helber, J. Folz, D. Borth, and A. Dengel (2019) Multi-task learning for segmentation of building footprints with deep neural networks. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 1480–1484. Cited by: §2.
  • C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. N. Hullender (2005) Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine learning (ICML-05), pp. 89–96. Cited by: §3.2.
  • S. Chu and A. Majumdar (2012) Opportunities and challenges for a sustainable energy future. nature 488 (7411), pp. 294. Cited by: §1.
  • J. Dai, Y. Li, K. He, and J. Sun (2016) R-fcn: object detection via region-based fully convolutional networks. In Advances in neural information processing systems, pp. 379–387. Cited by: §3.1.
  • A. V. Dandawate and G. B. Giannakis (1995) Asymptotic theory of mixed time averages and kth-order cyclic-moment and cumulant statistics. IEEE Transactions on Information Theory 41 (1), pp. 216–232. Cited by: §3.2.
  • T. Dozat (2016) Incorporating nesterov momentum into adam. Cited by: §3.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034. Cited by: §3.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In European conference on computer vision, pp. 630–645. Cited by: §3.2.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §2.
  • T. Le, N. Thome, S. Bernard, V. Bismuth, and F. Patoureaux (2019) Multitask classification and segmentation for cancer diagnosis in mammography. arXiv preprint arXiv:1909.05397. Cited by: §3.2.
  • X. Li, Z. Zhong, J. Wu, Y. Yang, Z. Lin, and H. Liu (2019) Expectation-maximization attention networks for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 9167–9176. Cited by: §2.
  • J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §2, §2.
  • T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, and S. Khudanpur (2010) Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association, Cited by: §3.2.
  • M. Mokhtarzade and M. V. Zoej (2007) Road detection from high-resolution satellite images using artificial neural networks. International journal of applied earth observation and geoinformation 9 (1), pp. 32–40. Cited by: §2.
  • T. K. Moon (1996) The expectation-maximization algorithm. IEEE Signal processing magazine 13 (6), pp. 47–60. Cited by: §3.2.
  • A. Rydberg and G. Borgefors (2001) Integrated method for boundary delineation of agricultural fields in multispectral satellite images. IEEE Transactions on Geoscience and Remote Sensing 39 (11), pp. 2514–2520. Cited by: §2.
  • K. Sun, B. Xiao, D. Liu, and J. Wang (2019) Deep high-resolution representation learning for human pose estimation. arXiv preprint arXiv:1902.09212. Cited by: item 2).
  • M. Varma and A. Zisserman (2008) A statistical approach to material classification using image patch exemplars. IEEE transactions on pattern analysis and machine intelligence 31 (11), pp. 2032–2047. Cited by: §2.
  • X. Wang, R. Girshick, A. Gupta, and K. He (2018) Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803. Cited by: §3.2.
  • Y. Wei, Z. Zhao, and J. Song (2004) Urban building extraction from high-resolution satellite panchromatic image using clustering and edge detection. In IGARSS 2004. 2004 IEEE International Geoscience and Remote Sensing Symposium, Vol. 3, pp. 2008–2010. Cited by: §2.
  • J. Yu, Z. Wang, A. Majumdar, and R. Rajagopal (2018) DeepSolar: a machine learning framework to efficiently construct a solar deployment database in the united states. Joule 2 (12), pp. 2605–2617. Cited by: §2, §3.2.
  • B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2921–2929. Cited by: §3.2.