Automated pipeline for large scale detection of solar arrays in France
Photovoltaic (PV) energy is key to mitigating the current energy crisis. However, distributed PV generation, which amounts to half of the PV energy generation, makes it increasingly difficult for transmission system operators (TSOs) to balance load and supply and to avoid grid congestion. Indeed, in the absence of measurements, distributed PV generation is hard to estimate. In recent years, many remote sensing-based approaches have been proposed to map distributed PV installations. However, to be applicable in industrial settings, the accuracy of the mapping must be assessed over the whole deployment area. We build on existing work to propose an automated PV registry pipeline. This pipeline automatically generates a dataset recording the location, area, installed capacity, and tilt angle of all distributed PV installations. It only requires aerial orthoimagery and topographic data, both of which are freely accessible online. To assess the accuracy of the registry, we propose an unsupervised method based on the Registre national d'installations (RNI), which centralizes all individual PV systems aggregated at the communal level, enabling practitioners to assess the accuracy of the registry and eventually remove outliers. We deploy our model on 9 French départements covering more than 50 000 square kilometers, providing the largest mapping of distributed PV panels with this level of detail to date. We then demonstrate how practitioners can use our unsupervised accuracy assessment method to evaluate the outputs. In particular, we show how it can easily identify outliers in the detections. Overall, our approach paves the way for a safer integration of deep learning-based pipelines for remote PV mapping. Code is available at https://github.com/gabrielkasmi/dsfrance.
In 2021, photovoltaic (PV) generation amounted to 821 TWh worldwide and 14.3 TWh in France. The installed capacity is estimated to be about 633 GWp worldwide [iea2021solar]. In France, the installed capacity amounts to 13.66 GWp [rte2021bilan]. PV energy generation is rapidly growing and is key to fulfilling the goals of the sustainable development scenario (SDS) [haegel2017terawatt]. However, public authorities and industry stakeholders often lack knowledge about the installed PV capacity, especially regarding distributed PV installations [erdener2022review], which, in the context of growing installed capacity, could result in increased congestion, overgeneration, and supplementary reserve request [pierro2022progress].
In this context, transmission system operators (TSOs) need information regarding the PV production stemming from distributed or behind-the-meter installations to balance load and supply and avoid congestion on the transmission grid. Indeed, utility-scale PV outputs can be directly measured or estimated with regional PV models. However, the lack of precise knowledge regarding distributed PV installations’ characteristics makes it impossible to fit regional PV models. In order to quickly acquire information regarding the distributed PV fleet, remote sensing methods are appealing. Recent works [yu2018deepsolar, mayer20223d] have shown that convolutional neural networks (CNNs) can be leveraged to quickly and accurately map solar installations over large areas.
However, it remains hard to assess the accuracy of these mapping pipelines during their large-scale deployment. As pointed out by [de2020monitoring, kausika2021geoai], this lack of accountability limits the ability to use such pipelines to produce official statistics regarding PV installations. [kausika2021geoai] relied on manual verification to monitor the precision and recall of their model over large areas. We propose using open data statistics to automatically perform this monitoring, thus enabling fast deployment and quality control of automated PV mapping models. Our main contribution is to provide an unsupervised, simple, and scalable way of measuring the accuracy of our automated detection pipeline regarding the estimated installed capacity and number of installations. To summarize, our contributions are the following:
We extend state-of-the-art methods to map the characteristics of distributed PV installations over France using a deep learning-based pipeline. We provide the tilt, surface, and installed capacity for each installation. We cover a total area of more than 50 000 square kilometers, currently the largest registry with this level of detail.
Using open-source data, we design an unsupervised method that can assess the accuracy of the registry over the whole deployment area.
We demonstrate that our accuracy assessment method enables practitioners to quickly identify aberrant data generated by the automated pipeline, thus improving the safety of deep learning-based PV panel detection.
Remote detection of solar arrays on overhead imagery is now a well-established field of research. [de2020using] provides a complete overview of the works in this field. Early works [malof2015automatic, malof2016automatic] relied on hand-crafted features and classification algorithms to identify PV panels on aerial images. The advent of deep learning [lecun2015deep] enabled large-scale mapping of PV panels using semantic segmentation [yuan2016large].
The DeepSolar project [yu2018deepsolar] was a significant milestone, as a deep learning-based pipeline was used to detect PV installations and estimate their surface area for the first time. The method relies on a two-step pipeline: images are classified, and if an image contains an array, it is passed to a segmentation model to identify the polygons corresponding to the PV installation. With this method, the authors achieved a precision of 93.1% (recall: 88.5%) in residential areas and a precision of 93.7% (recall: 90.5%) in non-residential areas. This work triggered efforts to construct CNN-based pipelines aiming to map installations over large areas [mayer20223d, kausika2021geoai, kruitwagen2021global]. As pointed out by [de2020monitoring], pipelines developed over one territory cannot be straightforwardly applied to another. Besides, as it remains difficult to verify that the accuracy stays the same over the whole deployment territory, automated PV registry pipelines remain scarcely used for official statistics construction [de2020monitoring]. As a first step towards a better assessment of the accuracy of deep learning-based PV mapping pipelines, [kausika2021geoai] leveraged manual annotators to compute the precision and recall in areas that were not in the training dataset. However, their method is very labor-intensive and does not address the needs of practitioners. Besides, as pointed out by [stowell2020harmonised], France’s distributed PV generation has not yet been mapped.
We train our classification and segmentation models on a new dataset called BDAPPV (Base de données d’apprentissage profond photovoltaïque). This dataset contains labeled thumbnails of PV panels. These PV panels come from a PV database maintained by the non-profit association of small owners of PV panels "asso BDPV" (Base de données photovoltaïque). Table 1 summarizes the characteristics of our training dataset. A crowdsourcing campaign enabled us to gather this training data, which we will release in a forthcoming publication.
|Dataset||Total number of samples||Positive samples (share in %)|
We perform image classification and segmentation on RGB orthoimagery. These images are provided by the Institut Géographique National (IGN) and are freely accessible online. This dataset is called BD ORTHO. The ground sampling distance of these images is 20 cm/pixel. These images cover all French départements. For our study, we downloaded the image bundles of 9 French départements, located in the North, West, South and East of France, covering approximately 10% of the French metropolitan territory.
We use topographic data, also provided by the IGN under an open license. This dataset is called BD TOPO. It contains information on all the buildings registered in France. We mainly use this dataset to filter the detections.
At the characteristics extraction stage of our PV registry pipeline (see figure 1), we use the PV characteristics gathered in the BDPV database to calibrate the characteristics extraction module of our pipeline. These characteristics include the tilt, azimuth, surface, and installed capacity of PV installations.
The final data we use is the Registre national d’installations (RNI), which aggregates the number of PV installations and the aggregated installed capacity per city. This database is updated every year and is provided under open access.
We build on [mayer20223d] to construct an automated PV registry pipeline. This pipeline takes as input orthoimages and topographic data and outputs PV panels’ characteristics and localization. Figure 1 summarizes our approach.
We fine-tuned our classification and segmentation models on our training dataset. During inference, we classify all images and segment only the images on which a PV panel is detected. Segmentation masks are then converted into polygons. We use the weights of [mayer20223d] as an initialization for our fine-tuned models. Our classification model is based on the Inception-v3 architecture [szegedy2016rethinking] and our segmentation model is based on the DeepLab-v3 architecture [chen2017rethinking].
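The classify-then-segment logic described above can be sketched as follows. This is only an illustration under stated assumptions: `two_step_inference`, `toy_classify`, `toy_segment` and the 0.5 threshold are hypothetical names and values, and the toy functions stand in for the Inception-v3 and DeepLab-v3 models, which are not reproduced here.

```python
from typing import Any, Callable, Iterable, List, Tuple


def two_step_inference(
    tiles: Iterable[Tuple[str, Any]],
    classify: Callable[[Any], float],
    segment: Callable[[Any], Any],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> List[Tuple[str, Any]]:
    """Classify every tile, then segment only the tiles flagged as positive."""
    masks = []
    for tile_id, tile in tiles:
        if classify(tile) >= threshold:
            masks.append((tile_id, segment(tile)))
    return masks


# Toy stand-ins for the two networks: a "tile" is a flat list of pixel scores.
def toy_classify(tile):
    return max(tile)


def toy_segment(tile):
    return [int(value >= 0.5) for value in tile]


tiles = [("tile_a", [0.1, 0.2]), ("tile_b", [0.9, 0.3])]
masks = two_step_inference(tiles, toy_classify, toy_segment)
```

Only `tile_b` reaches the segmentation step here, which is the point of the design: the more expensive segmentation model runs on a small fraction of the tiles.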
We convert the polygons into PV panels’ characteristics. We estimate the tilt angle based on the tilt angles reported in the BDPV database, whose installations are spread across France. We first cluster our detections by their surface. Then, for each surface cluster, we impute as the tilt the average tilt computed from nearby installations recorded in the BDPV database. The main objective is to capture the geographical variability of the tilt angle. The advantage of this method is that it is fast to compute and does not require surface models. Using the tilt angle and projected polygon area, we infer the real surface of the PV installation. Finally, we use the real surface to infer the installed capacity. Like previous works [so2017estimating, mayer20223d], we assume a linear relationship between surface and installed capacity. The coefficient captures the efficiency of the PV panels.
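A minimal sketch of this characteristics extraction step is given below, under several assumptions that the text does not specify: "nearby" is taken to mean the k nearest reference installations in planar coordinates, the real surface is obtained by dividing the projected area by the cosine of the tilt, and the coefficient `kwp_per_m2 = 0.17` is a purely illustrative value. All function and field names are hypothetical.

```python
import math


def impute_tilt(xy, cluster, references, k=3):
    """Average tilt of the k nearest reference installations in the same surface cluster."""
    candidates = [r for r in references if r["cluster"] == cluster]
    if not candidates:
        raise ValueError("no reference installation in this surface cluster")
    nearest = sorted(candidates, key=lambda r: math.dist(xy, r["xy"]))[:k]
    return sum(r["tilt_deg"] for r in nearest) / len(nearest)


def real_surface_m2(projected_area_m2, tilt_deg):
    """Undo the orthographic projection: a tilted panel appears smaller from above."""
    return projected_area_m2 / math.cos(math.radians(tilt_deg))


def installed_capacity_kwp(surface_m2, kwp_per_m2):
    """Linear surface-to-capacity relationship; the coefficient is an assumption."""
    return kwp_per_m2 * surface_m2


# Hypothetical reference installations (coordinates in meters).
references = [
    {"xy": (0.0, 0.0), "cluster": "small", "tilt_deg": 30.0},
    {"xy": (100.0, 0.0), "cluster": "small", "tilt_deg": 20.0},
    {"xy": (5000.0, 0.0), "cluster": "small", "tilt_deg": 45.0},
    {"xy": (10.0, 10.0), "cluster": "large", "tilt_deg": 10.0},
]
tilt = impute_tilt((50.0, 0.0), "small", references, k=2)  # mean of 30 and 20 degrees
surface = real_surface_m2(20.0, tilt)
capacity = installed_capacity_kwp(surface, kwp_per_m2=0.17)
```

With a tilt of 60 degrees, a projected area of 20 square meters would correspond to a real surface of about 40 square meters, which shows how sensitive the capacity estimate is to the imputed tilt.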
We use the BD TOPO to filter all polygons that are not on a building. We also filter installations that are either too small (the threshold is set at 1.7 square meters, the typical area of a single PV module) or too large. The upper threshold is 36 kWp as the RNI focuses on installations with an installed capacity lower than 36 kWp.
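The filtering rules above could be expressed as a simple predicate. In this sketch, the `on_building` flag is assumed to have been computed beforehand (for instance by a spatial join against BD TOPO building footprints), the field names are hypothetical, and treating both thresholds as inclusive is an assumption.

```python
MIN_SURFACE_M2 = 1.7     # lower threshold stated in the text (one PV module)
MAX_CAPACITY_KWP = 36.0  # upper threshold stated in the text (RNI scope)


def keep_detection(detection):
    """Keep a detection only if it lies on a building and falls within both thresholds."""
    return (
        detection["on_building"]
        and detection["surface_m2"] >= MIN_SURFACE_M2
        and detection["capacity_kwp"] <= MAX_CAPACITY_KWP
    )


# Hypothetical detections for illustration.
detections = [
    {"id": 1, "on_building": True, "surface_m2": 18.0, "capacity_kwp": 3.0},
    {"id": 2, "on_building": False, "surface_m2": 18.0, "capacity_kwp": 3.0},
    {"id": 3, "on_building": True, "surface_m2": 1.0, "capacity_kwp": 0.2},
    {"id": 4, "on_building": True, "surface_m2": 1800.0, "capacity_kwp": 304.0},
]
kept = [d for d in detections if keep_detection(d)]
```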
The RNI provides the total number of installations and the aggregated installed capacity city-wise. To use it as a reference, we aggregate our detections city-wise.
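As an illustration of this aggregation, the sketch below sums detections per city. The `city_code` and `capacity_kwp` field names are hypothetical; how each detection is assigned to a city is not covered here.

```python
from collections import defaultdict


def aggregate_by_city(detections):
    """Return, for each city, the number of detections and their total capacity."""
    totals = defaultdict(lambda: {"count": 0, "capacity_kwp": 0.0})
    for detection in detections:
        city = totals[detection["city_code"]]
        city["count"] += 1
        city["capacity_kwp"] += detection["capacity_kwp"]
    return dict(totals)


# Hypothetical detections for illustration.
detections = [
    {"city_code": "A", "capacity_kwp": 3.0},
    {"city_code": "A", "capacity_kwp": 6.0},
    {"city_code": "B", "capacity_kwp": 9.0},
]
by_city = aggregate_by_city(detections)
```

The resulting per-city counts and capacities have the same shape as the RNI aggregates, so the two can be compared directly.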
Using the RNI and the aggregations coming from our pipeline, we compute the following accuracy metrics:
The average percentage error (APE) and the mean APE (MAPE) computed over the whole département. The APE and MAPE are computed with respect to the installed capacity recorded by the RNI in each city.
The detection ratio, based on [mayer20223d], computed at the city level and averaged over the départements. This ratio is computed with respect to the number of installations recorded by the RNI in each city.
The average installation percentage error (AIPE), which is the APE computed for the average installation. By construction, a negative AIPE indicates that we underestimate the installations’ size, whereas a positive AIPE indicates that we overestimate them.
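The metrics listed above might be computed along the following lines. The exact formulas are not given in the text, so the definitions below are assumptions: the APE is taken as the absolute relative error on installed capacity, the detection ratio as detected over reference installation counts, and the AIPE as the signed relative error on the average installation size. Function names are hypothetical.

```python
from statistics import mean, median


def ape(est_kwp, ref_kwp):
    """Assumed definition: absolute error on installed capacity, in percent."""
    return 100.0 * abs(est_kwp - ref_kwp) / ref_kwp


def detection_ratio(est_count, ref_count):
    """Assumed definition: detected installations over reference installations."""
    return est_count / ref_count


def aipe(est_kwp, est_count, ref_kwp, ref_count):
    """Assumed definition: signed error on the average installation size, in percent."""
    est_avg = est_kwp / est_count
    ref_avg = ref_kwp / ref_count
    return 100.0 * (est_avg - ref_avg) / ref_avg


# Hypothetical cities: (estimated kWp, estimated count, reference kWp, reference count).
cities = [(12.0, 5, 10.0, 4), (30.0, 10, 40.0, 10)]
apes = [ape(e, r) for e, _, r, _ in cities]
mape, median_ape = mean(apes), median(apes)
ratios = [detection_ratio(n, m) for _, n, _, m in cities]
aipes = [aipe(e, n, r, m) for e, n, r, m in cities]
```

Cities with no reference installation or no detection would need to be handled separately, since the ratios above are undefined in those cases.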
Our fine-tuned model achieves competitive results compared to state-of-the-art models (see table 2). For the classification branch, we reach an F1-score of 0.84. For the segmentation branch, we reach an Intersection-over-Union (IoU) of 0.86.
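For reference, the two scores quoted above follow standard definitions, sketched here on toy inputs. The function names and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of raw counts."""
    return 2 * tp / (2 * tp + fp + fn)


def iou(pred_mask, true_mask):
    """Intersection-over-Union of two binary masks given as nested lists of 0/1."""
    pred = [p for row in pred_mask for p in row]
    true = [t for row in true_mask for t in row]
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    union = sum(1 for p, t in zip(pred, true) if p or t)
    return intersection / union if union else 1.0  # assumed convention for empty masks


toy_f1 = f1_score(tp=8, fp=2, fn=2)
toy_iou = iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```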
We conduct our analysis on 9 French départements: Nord (north), Loire-Atlantique (west), Hérault (south), and 6 out of 8 départements of the Rhône-Alpes region (southeast). This way, we ensure sufficient geographical diversity in landscapes, population densities, and architectural styles. The total area covered is 51 858 square kilometers. It is currently the largest area mapped with this level of detail, as [mayer20223d] covered an area of 34 000 square kilometers. It roughly corresponds to 10% of the French metropolitan territory. We can then compute the accuracy in terms of APE, detection ratio, and AIPE over the whole area.
Table 3 reports the MAPE, median APE, detection ratio and mean AIPE across départements and overall. We can see that the MAPE is 47%, and the mean detection ratio is slightly above 1. In detail, we can see that we tend to slightly overestimate (by 16%) the size of the installations compared to the baseline. Performance for all metrics is relatively constant across all départements except the Nord.
|Département||MAPE (%)||median APE (%)||mean detection ratio||mean AIPE (%)|
|44 (Loire-Atlantique, west)||39.09||38.05||0.67||22.83|
|69 (Rhône, east, urban)||31.99||28.91||0.83||12.18|
|59 (Nord, north, urban)||130.06||88.13||2.23||61.16|
|34 (Hérault, south)||26.80||17.78||1.01||6.82|
|01 (Ain, east)||35.90||35.35||0.77||6.18|
|38 (Isère, east)||33.41||31.15||0.81||9.18|
|42 (Loire, east)||29.12||23.81||1.00||15.42|
|26 (Drôme, south east)||30.46||23.51||0.92||3.74|
|74 (Haute-Savoie, east)||44.08||41.18||0.67||-4.61|
Then, by displaying the scatter plot of the estimated installed capacity against the target installed capacity, we can identify cities where the estimation is very inaccurate. In the example depicted in figure 2, we can see that a few outlier cities contribute to the degradation of the overall accuracy of the model. These cities are located in the Nord département.
This example shows how it is possible to monitor the accuracy of the registry pipeline over the whole area of interest. We can identify areas where the model performs well and where on the other hand, one should be cautious about the quality of the data produced. Moreover, we enable the practitioner to inspect outliers to understand why the model is wrong.
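One simple way to turn this monitoring into a screening rule is to flag cities whose error exceeds a chosen threshold. The sketch below is an illustration only: the 100% threshold, the function name and the city labels are hypothetical, and the error is assumed to be the absolute relative error on installed capacity.

```python
def flag_outlier_cities(estimated_kwp, reference_kwp, ape_threshold=100.0):
    """Return (city, APE) pairs above the threshold, worst first."""
    flagged = []
    for city, ref in reference_kwp.items():
        if ref <= 0:
            continue  # the relative error is undefined without a reference capacity
        est = estimated_kwp.get(city, 0.0)
        error = 100.0 * abs(est - ref) / ref
        if error > ape_threshold:
            flagged.append((city, error))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical city-level aggregates, in kWp.
estimated = {"city_A": 388.0, "city_B": 70.0, "city_C": 10.0}
reference = {"city_A": 66.0, "city_B": 66.0, "city_C": 0.0}
flagged = flag_outlier_cities(estimated, reference)
```

Flagged cities can then be inspected manually, so the threshold mainly controls how much manual review the practitioner is willing to do.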
We focus on our worst city to understand why its accuracy metrics are so poor. Reading the registry for this city, we see that the count of detections is accurate (21 detections for 22 targets). However, the estimated installed capacity is far too high (388 kWp for a target of 66 kWp). Looking at the registry, we can see that one installation has an installed capacity of 304 kWp. Visualizing the results as displayed in figure 3, we can see that the model mistook a large factory window for a panel. With these tools, practitioners can decide what is best to do given their application.
The French TSO's need for detailed information on distributed PV motivates this work. However, this need is shared by many European countries [killinger2018search]. Our accuracy assessment can be applied in these countries: for instance, the RNI can be substituted by the Marktstammdatenregister (MaStR) for Germany [bundes2022markt] or the Stamdataregister for Denmark [energi2022stamdata].
We built on existing literature to construct an automated pipeline for large-scale distributed PV mapping and characterization. It addresses the burning need to map and characterize distributed PV installations. Existing works have not covered France, yet the growing PV installed capacity poses increasing challenges to stakeholders such as the TSO.
Crucially, we introduced an unsupervised method that enables stakeholders to assess the accuracy of the model’s outputs. We demonstrated that we could derive simple and interpretable metrics with this method. We also showed that this method could help identify outliers, thus paving the way for a safer integration of deep learning-based pipelines for remote PV mapping.
In future work, we plan to add an unsupervised estimation of the azimuth angle of the PV installations into our registry. Using BDPV as a baseline, we intend to assess the accuracy of estimating the tilt and azimuth angles by comparing the tilt and azimuth distributions generated by our detection model to the distributions coming from the BDPV database.
We also intend to pursue the model’s deployment until eventually mapping the whole of metropolitan France.