
DeepSolar tracker: towards unsupervised assessment with open-source data of the accuracy of deep learning-based distributed PV mapping

by Gabriel Kasmi, et al.

Photovoltaic (PV) energy is key to mitigating the current energy crisis. However, distributed PV generation, which amounts to half of the PV energy generation, makes it increasingly difficult for transmission system operators (TSOs) to balance load and supply and avoid grid congestion. Indeed, in the absence of measurements, estimating distributed PV generation is difficult. In recent years, many remote sensing-based approaches have been proposed to map distributed PV installations. However, to be applicable in industrial settings, one needs to assess the accuracy of the mapping over the whole deployment area. We build on existing work to propose an automated PV registry pipeline. This pipeline automatically generates a dataset recording the location, area, installed capacity, and tilt angle of all distributed PV installations. It only requires aerial orthoimagery and topographic data, both of which are freely accessible online. To assess the accuracy of the registry, we propose an unsupervised method based on the Registre national d'installations (RNI), which centralizes all individual PV systems aggregated at the communal level. This enables practitioners to assess the accuracy of the registry and, if needed, remove outliers. We deploy our model on 9 French départements covering more than 50 000 square kilometers, providing the largest mapping of distributed PV panels with this level of detail to date. We then demonstrate how practitioners can use our unsupervised method to evaluate the outputs of the pipeline. In particular, we show how it can easily identify outliers in the detections. Overall, our approach paves the way for a safer integration of deep learning-based pipelines for remote PV mapping. Code is available at




1 Introduction

In 2021, photovoltaic (PV) generation amounted to 821 TWh worldwide and 14.3 TWh in France. The installed capacity is estimated at about 633 GWp worldwide [iea2021solar] and 13.66 GWp in France [rte2021bilan]. PV energy generation is growing rapidly and is key to fulfilling the goals of the sustainable development scenario (SDS) [haegel2017terawatt]. However, public authorities and industry stakeholders often lack knowledge about the installed PV capacity, especially regarding distributed PV installations [erdener2022review]. As installed capacity grows, this lack of knowledge could result in increased congestion, overgeneration, and additional reserve requirements [pierro2022progress].

In this context, transmission system operators (TSOs) need information about the PV production stemming from distributed or behind-the-meter installations to balance load and supply and avoid congestion on the transmission grid. Utility-scale PV outputs can be directly measured or estimated with regional PV models. However, the lack of precise knowledge about the characteristics of distributed PV installations makes it impossible to fit regional PV models. Remote sensing methods are therefore an appealing way to quickly acquire information about the distributed PV fleet. Recent works [yu2018deepsolar, mayer20223d] have shown that convolutional neural networks (CNNs) can be leveraged to quickly and accurately map solar installations over large areas.

However, it remains hard to assess the accuracy of these mapping pipelines during their large-scale deployment. As pointed out by [de2020monitoring, kausika2021geoai], this lack of accountability limits the use of such pipelines to produce official statistics on PV installations. [kausika2021geoai] relied on manual verification to monitor the precision and recall of their model over large areas. We propose instead to use open data statistics to perform this monitoring automatically, thus enabling fast deployment and quality control of automated PV mapping models. Our main contribution is an unsupervised, simple, and scalable way of measuring the accuracy of our automated detection pipeline in terms of estimated installed capacity and number of installations. To summarize, our contributions are the following:

  • We extend state-of-the-art methods to map the characteristics of distributed PV installations in France using a deep learning-based pipeline. We provide the tilt, surface, and installed capacity of each installation. We cover a total area of more than 50 000 square kilometers, which makes this currently the largest registry with this level of detail.

  • Using open-source data, we design an unsupervised method that can assess the accuracy of the registry over the whole deployment area.

  • We demonstrate that our accuracy assessment method enables practitioners to quickly identify aberrant data generated by the automated pipeline, thus improving the safety of deep learning-based PV panel detection.

2 Related work

Remote detection of solar arrays on overhead imagery is now a well-established field of research; [de2020using] provides a complete overview of the works in this field. Early works [malof2015automatic, malof2016automatic] relied on hand-crafted features and classification algorithms to identify PV panels on aerial images. The advent of deep learning [lecun2015deep] enabled large-scale mapping of PV panels using semantic segmentation [yuan2016large].

The DeepSolar project [yu2018deepsolar] was a significant milestone, as it was the first to use a deep learning-based pipeline to detect PV installations and estimate their surface area. The method relies on a two-step pipeline: images are first classified, and if an image contains an array, it is passed to a segmentation model that identifies the polygons corresponding to the PV installation. With this method, the authors achieved a precision of 93.1% (recall: 88.5%) in residential areas and a precision of 93.7% (recall: 90.5%) in non-residential areas. This work triggered efforts to construct CNN-based pipelines aiming to map installations over large areas [mayer20223d, kausika2021geoai, kruitwagen2021global]. As pointed out by [de2020monitoring], pipelines developed over one territory cannot be straightforwardly applied to another. Moreover, because it remains difficult to verify that the accuracy holds over the whole deployment territory, automated PV registry pipelines are rarely used to construct official statistics [de2020monitoring]. As a first step towards a better assessment of the accuracy of deep learning-based PV mapping pipelines, [kausika2021geoai] relied on manual annotators to compute the precision and recall in areas that were not in the training dataset. However, their method is very labor-intensive and does not address the needs of practitioners. Besides, as pointed out by [stowell2020harmonised], France's distributed PV generation has not yet been mapped.

3 Data

3.1 Training data

We train our classification and segmentation models on a new dataset called BDAPPV (Base de données d'apprentissage profond photovoltaïque), which contains labeled thumbnails of PV panels. These panels come from a PV database maintained by "asso BDPV" (Base de données photovoltaïque), a non-profit association of small-scale PV owners. Table 1 summarizes the characteristics of our training dataset. We gathered this training data through a crowdsourcing campaign and will release it in a forthcoming publication.

Split | Total number of samples | Positive samples (share in %)
Train | 12127 | 5445 (44.90)
Validation | 1732 | 755 (43.59)
Test | 3466 | 1485 (42.84)
Total | 17325 | 7685 (44.36)
Table 1: Training dataset characteristics

3.2 Geographical data

3.2.1 Orthoimagery

We perform image classification and segmentation on RGB orthoimagery provided by the Institut Géographique National (IGN) and freely accessible online. This dataset is called BD ORTHO. The ground sampling distance of these images is 20 cm/pixel, and they cover all French départements. For our study, we downloaded the image bundles of 9 French départements located in the north, west, south, and east of France, covering approximately 10% of the French metropolitan territory.

3.2.2 Topographic data

We also use topographic data provided by the IGN under an open license, in a dataset called BD TOPO. This dataset contains information on all the buildings registered in France. We mainly use it to filter the detections.

3.2.3 Distributed PV characteristics

At the characteristics extraction stage of our PV registry pipeline (see figure 1), we use the PV characteristics gathered in the BDPV database to calibrate the characteristics extraction module of our pipeline. These characteristics include the tilt, azimuth, surface, and installed capacity of PV installations.

3.3 Monitoring data

3.3.1 PV registries

The final data we use is the Registre national d'installations (RNI), which reports, for each city, the number of PV installations and their aggregated installed capacity. This database is updated every year and is provided under open access.

4 Methods

4.1 Automated registry pipeline

We build on [mayer20223d] to construct an automated PV registry pipeline. This pipeline takes orthoimages and topographic data as input and outputs the location and characteristics of PV panels. Figure 1 summarizes our approach.

4.1.1 Classification and segmentation

We fine-tuned our classification and segmentation models on our training dataset, using the weights of [mayer20223d] as initialization. Our classification model is based on the Inception-v3 architecture [szegedy2016rethinking] and our segmentation model on the DeepLab-v3 architecture [chen2017rethinking]. During inference, we classify all images and segment only those in which a PV panel is detected. The segmentation masks are then converted into polygons.
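The classify-then-segment logic can be sketched as follows. This is a minimal illustration, not the authors' code. `classify` and `segment` are hypothetical stand-ins for the fine-tuned Inception-v3 and DeepLab-v3 models, and the 0.5 decision thresholds are assumptions. Polygon extraction is replaced here by a pixel count, which converts a mask into a projected area using the 20 cm/pixel ground sampling distance of BD ORTHO (one pixel covers 0.04 m²).

```python
from typing import Callable, List, Optional, Sequence

import numpy as np

# BD ORTHO ground sampling distance: 20 cm/pixel, so one pixel covers 0.04 m².
PIXEL_AREA_M2 = 0.2 * 0.2


def run_two_stage(
    tiles: Sequence[np.ndarray],
    classify: Callable[[np.ndarray], float],
    segment: Callable[[np.ndarray], np.ndarray],
    cls_threshold: float = 0.5,  # assumed decision threshold
    seg_threshold: float = 0.5,  # assumed per-pixel threshold
) -> List[Optional[np.ndarray]]:
    """Classify every tile and segment only those predicted to contain PV.

    Returns one entry per tile: a boolean mask, or None when the
    classifier rejected the tile (segmentation is skipped for it).
    """
    masks: List[Optional[np.ndarray]] = []
    for tile in tiles:
        if classify(tile) >= cls_threshold:
            masks.append(segment(tile) >= seg_threshold)
        else:
            masks.append(None)
    return masks


def projected_area_m2(mask: np.ndarray) -> float:
    """Projected (horizontal) area covered by a boolean mask, in m²."""
    return float(mask.sum()) * PIXEL_AREA_M2


# Toy usage with stand-in models (hypothetical, for illustration only).
tiles = [np.zeros((8, 8)), np.ones((8, 8))]
masks = run_two_stage(tiles, classify=lambda t: float(t.mean()), segment=lambda t: t)
print(masks[0] is None, projected_area_m2(masks[1]))  # True, about 2.56 m²
```

Skipping segmentation for rejected tiles is what makes the two-stage design cheap at département scale, since most tiles contain no PV panel.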

4.1.2 Characteristics extraction

We convert the polygons into PV panel characteristics. We estimate the tilt angle from the tilt angles reported in the BDPV database, whose records span the whole of France. We first cluster our detections by surface. Then, for each surface cluster, we impute as the tilt the average tilt computed from nearby installations recorded in the BDPV database. The main objective is to capture the geographical variability of the tilt angle. This method is fast to compute and does not require surface models. Using the tilt angle and the projected polygon area, we infer the real surface of the PV installation. Finally, we use the real surface to infer the installed capacity. Like previous works [so2017estimating, mayer20223d], we assume a linear relationship between surface and installed capacity, where the coefficient captures the efficiency of the PV panels.
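A minimal sketch of this characteristics-extraction step, under stated assumptions. The surface-cluster edges, the search radius defining "nearby", the fallback tilt and the kWp-per-m² coefficient are all hypothetical placeholders, since the text does not give them. Distances are taken as Euclidean in a projected metric coordinate system. The real surface follows from the projected area as A_real = A_proj / cos(tilt).

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

# All constants below are hypothetical placeholders, not values from the paper.
SURFACE_BIN_EDGES_M2 = (10.0, 20.0, 40.0)  # surface-cluster boundaries
RADIUS_M = 5_000.0  # what counts as a "nearby" BDPV installation
DEFAULT_TILT_DEG = 30.0  # fallback when no neighbour is found
KWP_PER_M2 = 0.17  # linear surface-to-capacity coefficient (panel efficiency)


@dataclass
class BdpvRecord:
    x_m: float  # projected coordinates in metres (assumed, e.g. Lambert-93)
    y_m: float
    surface_m2: float
    tilt_deg: float


@dataclass
class Detection:
    x_m: float
    y_m: float
    projected_area_m2: float


def surface_cluster(area_m2: float) -> int:
    """Index of the surface cluster (bin) an area falls into."""
    return sum(area_m2 >= edge for edge in SURFACE_BIN_EDGES_M2)


def impute_tilt(det: Detection, records: Sequence[BdpvRecord]) -> float:
    """Mean tilt of nearby BDPV installations in the same surface cluster."""
    cluster = surface_cluster(det.projected_area_m2)
    tilts = [
        r.tilt_deg
        for r in records
        if surface_cluster(r.surface_m2) == cluster
        and math.hypot(r.x_m - det.x_m, r.y_m - det.y_m) <= RADIUS_M
    ]
    return sum(tilts) / len(tilts) if tilts else DEFAULT_TILT_DEG


def characterize(det: Detection, records: Sequence[BdpvRecord]) -> Tuple[float, float, float]:
    """Return (tilt in degrees, real surface in m², installed capacity in kWp)."""
    tilt = impute_tilt(det, records)
    # A panel tilted by `tilt` projects onto the ground with a factor cos(tilt).
    real_surface = det.projected_area_m2 / math.cos(math.radians(tilt))
    return tilt, real_surface, KWP_PER_M2 * real_surface
```

A linear scan over all records is enough to show the logic. Over a full BDPV extract, a spatial index would be the natural replacement for the distance filter.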

4.1.3 Postprocessing

We use the BD TOPO to filter out all polygons that are not on a building. We also filter out installations that are either too small (the threshold is set at 1.7 square meters, the typical area of a single PV module) or too large. The upper threshold is 36 kWp, since the RNI focuses on installations with an installed capacity lower than 36 kWp.
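These filters amount to simple predicates on each candidate installation. The sketch assumes that each candidate already carries the identifier of the BD TOPO building it overlaps; the spatial join itself is not shown. Following the wording "lower than 36 kWp", it treats the upper bound as strict. The `Candidate` type is a hypothetical illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_SURFACE_M2 = 1.7  # typical area of a single PV module
MAX_CAPACITY_KWP = 36.0  # RNI scope: installations below 36 kWp


@dataclass
class Candidate:
    # BD TOPO building the polygon lies on, or None; assumed to come from a
    # prior spatial join with building footprints (not shown here).
    building_id: Optional[str]
    surface_m2: float
    capacity_kwp: float


def postprocess(candidates: List[Candidate]) -> List[Candidate]:
    """Keep detections on a building, at least one module large, below 36 kWp."""
    return [
        c
        for c in candidates
        if c.building_id is not None
        and c.surface_m2 >= MIN_SURFACE_M2
        and c.capacity_kwp < MAX_CAPACITY_KWP
    ]
```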

Figure 1: Automated PV characteristics extraction pipeline

4.2 Unsupervised accuracy tracking

4.2.1 Matching

The RNI provides the total number of installations and the aggregated installed capacity for each city. To use it as a reference, we also aggregate our detections at the city level.
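A minimal sketch of this aggregation. It assumes that each detection has already been assigned the code of the city (commune) it lies in, for instance by a point-in-polygon test against commune boundaries (not shown). The `(city, kWp)` input format is hypothetical. The RNI figures can be put in the same `city -> (count, capacity)` shape, so the two tables can be compared key by key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_by_city(
    detections: Iterable[Tuple[str, float]],
) -> Dict[str, Tuple[int, float]]:
    """Map each city code to (number of installations, total capacity in kWp)."""
    counts: Dict[str, int] = defaultdict(int)
    capacity: Dict[str, float] = defaultdict(float)
    for city, kwp in detections:
        counts[city] += 1
        capacity[city] += kwp
    return {city: (counts[city], capacity[city]) for city in counts}
```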

4.2.2 Metrics definition

Using the RNI and the aggregations coming from our pipeline, we compute the following accuracy metrics:

  • The absolute percentage error (APE) on the installed capacity of each city $i$, $\mathrm{APE}_i = 100 \times |\hat{C}_i - C_i| / C_i$, where $\hat{C}_i$ is the installed capacity estimated by our pipeline and $C_i$ the installed capacity recorded in the RNI, and its mean (MAPE) computed over the whole département.

  • The detection ratio $\rho_i = \hat{N}_i / N_i$, based on [mayer20223d], where $\hat{N}_i$ and $N_i$ are the numbers of detected and recorded installations in city $i$. It is computed at the city level and averaged over each département.

  • The average installation percentage error (AIPE), i.e., the signed percentage error on the size of the average installation, $\mathrm{AIPE}_i = 100 \times (\hat{C}_i/\hat{N}_i - C_i/N_i) / (C_i/N_i)$. By construction, a negative AIPE indicates that we underestimate the size of the installations, and a positive AIPE indicates that we overestimate it.
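The three metrics above can be computed from the two city-level tables with a few lines of code. This is a sketch under our own conventions, not the authors' implementation. Inputs map a city code to `(number of installations, installed capacity in kWp)`, cities without a recorded installation are skipped, and the AIPE is left undefined (NaN) for cities with no detection.

```python
import math
from statistics import mean, median
from typing import Dict, Tuple

Aggregate = Dict[str, Tuple[int, float]]  # city -> (count, capacity in kWp)


def city_metrics(est: Aggregate, ref: Aggregate) -> Dict[str, Dict[str, float]]:
    """APE, detection ratio and AIPE for every city present in the reference."""
    out: Dict[str, Dict[str, float]] = {}
    for city, (n_ref, c_ref) in ref.items():
        if n_ref == 0 or c_ref == 0:
            continue  # metrics are undefined without a reference value
        n_est, c_est = est.get(city, (0, 0.0))
        ape = 100 * abs(c_est - c_ref) / c_ref
        ratio = n_est / n_ref
        ref_mean = c_ref / n_ref  # recorded average installation size
        aipe = 100 * (c_est / n_est - ref_mean) / ref_mean if n_est else math.nan
        out[city] = {"ape": ape, "ratio": ratio, "aipe": aipe}
    return out


def summarize(metrics: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Département-level summary: MAPE, median APE, mean ratio, mean AIPE."""
    apes = [m["ape"] for m in metrics.values()]
    aipes = [m["aipe"] for m in metrics.values() if not math.isnan(m["aipe"])]
    return {
        "mape": mean(apes),
        "median_ape": median(apes),
        "mean_ratio": mean(m["ratio"] for m in metrics.values()),
        "mean_aipe": mean(aipes) if aipes else math.nan,
    }
```

Reporting the median APE alongside the MAPE is useful because, as a mean, the MAPE is sensitive to a few cities with very large errors.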

5 Results

5.1 Classification and segmentation accuracy

Our fine-tuned models achieve results competitive with state-of-the-art models (see Table 2). The classification branch reaches an F1-score of 0.84, and the segmentation branch reaches an Intersection-over-Union (IoU) of 0.86.

Work | Classification F1-score | Segmentation IoU
[mayer20223d] | 0.87 | 0.74
[parhar2021hyperionsolarnet] | 0.96 | 0.82
Ours | 0.84 | 0.86
Table 2: Classification and segmentation accuracy

5.2 Large scale accuracy assessment

We conduct our analysis on 9 French départements: Nord (north), Loire-Atlantique (west), Hérault (south), and 6 of the 8 départements of the Rhône-Alpes region (southeast). This way, we ensure sufficient geographical diversity in landscapes, population densities, and architectural styles. The total area covered is 51858 square kilometers, which roughly corresponds to 10% of the French metropolitan territory. It is currently the largest area mapped with this level of detail, as [mayer20223d] covered an area of 34000 square kilometers. We can then compute the APE, detection ratio, and AIPE over the whole area.

5.2.1 Overall results

Table 3 reports the MAPE, median APE, mean detection ratio, and mean AIPE for each département and overall. The overall MAPE is 47%, and the mean detection ratio is slightly above 1. In detail, we tend to slightly overestimate the size of the installations compared to the RNI (mean AIPE of 16%). Performance is relatively consistent across départements for all metrics, except in the Nord.

Département | MAPE (%) | Median APE (%) | Mean detection ratio | Mean AIPE (%)
44 (Loire-Atlantique, west) | 39.09 | 38.05 | 0.67 | 22.83
69 (Rhône, east, urban) | 31.99 | 28.91 | 0.83 | 12.18
59 (Nord, north, urban) | 130.06 | 88.13 | 2.23 | 61.16
34 (Hérault, south) | 26.80 | 17.78 | 1.01 | 6.82
01 (Ain, east) | 35.90 | 35.35 | 0.77 | 6.18
38 (Isère, east) | 33.41 | 31.15 | 0.81 | 9.18
42 (Loire, east) | 29.12 | 23.81 | 1.00 | 15.42
26 (Drôme, southeast) | 30.46 | 23.51 | 0.92 | 3.74
74 (Haute-Savoie, east) | 44.08 | 41.18 | 0.67 | -4.61
Overall | 47.45 | 32.81 | 1.03 | 16.33
Table 3: Accuracy across the area of deployment

By plotting the estimated installed capacity against the target installed capacity, we can identify cities where the estimation is very inaccurate. The example depicted in Figure 2 shows that a few outlier cities drive the degradation of the overall accuracy of the model. These cities are located in the Nord département.

Figure 2: Scatter plot of the estimated installed capacity against the recorded installed capacity (both in kWp). Each dot represents a city; the lighter the dot, the higher the APE for that city.

This example shows how the accuracy of the registry pipeline can be monitored over the whole area of interest. We can identify areas where the model performs well and, conversely, areas where one should be cautious about the quality of the data produced. Moreover, practitioners can inspect outliers to understand why the model is wrong.
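The scatter-plot inspection can be complemented by a simple ranking that surfaces the cities to inspect first. In this sketch, the 100% APE cut-off is a hypothetical choice, not a value from the paper; in practice it would be tuned to the tolerance of the application.

```python
from typing import Dict, List, Tuple


def flag_outliers(
    est_kwp: Dict[str, float],
    ref_kwp: Dict[str, float],
    ape_threshold: float = 100.0,  # hypothetical cut-off, in %
) -> List[Tuple[str, float]]:
    """Cities whose capacity APE exceeds the threshold, worst first."""
    flagged = []
    for city, ref in ref_kwp.items():
        if ref <= 0:
            continue  # APE is undefined without a positive reference
        ape = 100 * abs(est_kwp.get(city, 0.0) - ref) / ref
        if ape > ape_threshold:
            flagged.append((city, ape))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```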

5.2.2 Qualitative inquiry

We focus on the city with the worst accuracy to understand why its metrics are so poor. Reading the registry for this city, we see that the count of detections is accurate (21 detections for 22 recorded installations). However, the estimated installed capacity is far too high (388 kWp for a target of 66 kWp). Looking at the registry, we see that one installation has an installed capacity of 304 kWp. Visualizing the results, as displayed in Figure 3, shows that the model detected a large factory window as a panel. With these tools, practitioners can decide how best to handle such errors given their application.
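As a worked example, we apply the city-level metrics to these figures. Here APE is the absolute percentage error on installed capacity, ρ is the ratio of detected to recorded installations, and AIPE is the signed percentage error on the average installation size. The result shows how a single spurious detection dominates the city's error. The second line is our own what-if computation that removes the 304 kWp detection; it is not a result reported above.

```latex
\begin{aligned}
&\mathrm{APE} = 100 \times \frac{|388 - 66|}{66} \approx 487.9\,\%, \quad
 \rho = \frac{21}{22} \approx 0.95, \quad
 \mathrm{AIPE} = 100 \times \frac{388/21 - 66/22}{66/22} \approx 515.9\,\% \\
&\text{without the 304 kWp detection:}\quad
 \mathrm{APE} = 100 \times \frac{|84 - 66|}{66} \approx 27.3\,\%, \quad
 \rho = \frac{20}{22} \approx 0.91, \quad
 \mathrm{AIPE} = 100 \times \frac{84/20 - 66/22}{66/22} = 40\,\%
\end{aligned}
```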

Figure 3: Qualitative analysis of the model’s outcomes visualized in QGIS

5.3 Broader reach

This work is motivated by the French TSO's need for detailed information on distributed PV. However, this need is shared by many European countries [killinger2018search]. Our accuracy assessment can be applied in these countries: for instance, the RNI can be substituted by the Marktstammdatenregister (MaStR) for Germany [bundes2022markt] or the Stamdataregister for Denmark [energi2022stamdata].

6 Conclusion and future work

We built on existing literature to construct an automated pipeline for large-scale distributed PV mapping and characterization. It addresses the pressing need to map and characterize distributed PV installations. Existing works have not covered France, yet the growing installed PV capacity poses increasing challenges to stakeholders such as the TSO.

Crucially, we introduced an unsupervised method that enables stakeholders to assess the accuracy of the model’s outputs. We demonstrated that we could derive simple and interpretable metrics with this method. We also showed that this method could help identify outliers, thus paving the way for a safer integration of deep learning-based pipelines for remote PV mapping.

In future work, we plan to add an unsupervised estimation of the azimuth angle of the PV installations into our registry. Using BDPV as a baseline, we intend to assess the accuracy of estimating the tilt and azimuth angles by comparing the tilt and azimuth distributions generated by our detection model to the distributions coming from the BDPV database.

We also intend to pursue the model’s deployment until eventually mapping the whole of metropolitan France.