A Multi-Stage model based on YOLOv3 for defect detection in PV panels based on IR and Visible Imaging by Unmanned Aerial Vehicle

by   Antonio Di Tommaso, et al.

As solar capacity installed worldwide continues to grow, there is an increasing awareness that advanced inspection systems are becoming of utmost importance to schedule smart interventions and minimize downtime likelihood. In this work we propose a novel automatic multi-stage model to detect panel defects on aerial images captured by unmanned aerial vehicle by using the YOLOv3 network and Computer Vision techniques. The model combines detections of panels and defects to refine its accuracy. The main novelties are represented by its versatility to process either thermographic or visible images and detect a large variety of defects and its portability to both rooftop and ground-mounted PV systems and different panel types. The proposed model has been validated on two big PV plants in the south of Italy with an outstanding AP@0.5 exceeding 98 roughly 88.3 mAP@0.5 of almost 70 including panel shading induced by soiling and bird dropping, delamination, presence of puddles and raised rooftop panels. An estimation of the soiling coverage is also predicted. Finally an analysis of the influence of the different YOLOv3's output scales on the detection is discussed.








1 Introduction

1.1 Motivation

Photovoltaic (PV) installations continue to grow worldwide thanks to the increasing competitiveness of solar resource, the growing energy demand of developing countries and the effectiveness in addressing climate change and reducing global warming. In the last decade it has been estimated that the overall installed PV capacity increased from about 40 GW in 2010 to 627 GW in 2019 Detollenaere 20 , and the investments in the sector are still growing.

However, because outdoor PV panels and other plant components are exposed to environmental factors, they may experience premature thermo-mechanical stresses, which in turn can lead to a decrease in PV plant efficiency and to unexpected downtime, depending on the panel type. Crystalline PV modules are usually protected by an aluminium frame and a glass lamination so that they are not directly exposed to environmental agents. They are typically mounted on fixed structures oriented towards the sun or on pan-tilt sun trackers to maximize their efficiency under direct solar radiation. On the other hand, thin-film PV modules, whose response depends more on diffuse solar radiation, are wrapped in a waterproof and flexible enclosure that is typically glued onto fixed sloped or flat warehouse rooftops. Module degradation is roughly estimated at 1% per year on average for crystalline PV modules Chandel 15 and at 3-4% for thin-film modules Suarez 19, due to intrinsic and extrinsic deficiencies.

The factors affecting defect occurrence are manifold Kontges 14 and include, for example, cracks (cell breakage, cracking of the back sheet), cell oxidation or delamination, faults or disconnection of electrical components (e.g. junction box, by-pass diode), shading due to neighbouring trees or buildings, soiling, bird droppings or snail tracks, and rooftop slope. A defect often appears in the form of a hotspot where the photovoltaic effect no longer occurs, causing local overheating, which in turn leads to destructive effects such as cell or glass cracking, melting of solder, degradation of the solar cell HSHP 21 or even fires with irreparable consequences Falvo 15.

Therefore, in order to keep high technical and economic performances of a PV system during its lifetime, high-quality and cost-effective Operation & Maintenance (O&M) activities are sought. Classical preventive or reactive maintenance strategies have been demonstrated to be sub-optimal due to the frequency of expensive visual on-site inspections or the cost induced by downtime and consequent component replacements, respectively. As a consequence, in order to optimise simultaneously O&M cost and plant efficiency, modern Data Analytics solutions based on predictive and early-detection strategies Betti 21 are nowadays taking hold and need to be integrated in the remote monitoring platforms.

Concerning panel inspection, traditional techniques include visual inspection and I-V curve tracing Kontges 14, which however are expensive and time-consuming, interrupt energy production and are unfeasible for large PV plants or plants located in remote areas. Alternative non-destructive techniques include Infrared (IR) and Electroluminescence (EL) Imaging Kontges 14: the former exploits the IR radiation emitted by the defect due to local overheating, whereas the latter exploits the disconnection of an abnormal cell from the electric circuit, which leads to a lack of emitted IR radiation with respect to functional cells when a current is applied to the PV module Petraglia 11. Unlike IR imaging, EL is able to recognize microcracks thanks to its high resolution. However, EL is an invasive technique conceived for the analysis of a single PV module at a time, usually disconnected from the PV system and analysed indoors in a laboratory. IR imaging, especially if combined with aerial inspection by Unmanned Aerial Vehicles (UAVs) Quater 14, is instead a contact-less method that can be applied directly on-site during the normal operation of the PV system. The latest aerial inspection best practices support the analyst's post-processing through the simultaneous acquisition of IR and VIS imagery Zefri 18.

All techniques, in particular those exploiting multiple sensors, produce a large number of images that cannot be inspected visually by human operators, because manual inspection is time-demanding, error-prone and expert-dependent: automated solutions for hotspot detection are therefore of utmost importance. Furthermore, while IR imaging makes it easy to detect a generic hotspot, further insights into the cause of the defect can be obtained more easily only by also monitoring the Visible (VIS) spectrum.

1.2 State of the art

In the last few years, different studies exploited IR imaging to detect hotspots in PV systems. One of the early works concerning the applicability of IR imaging of PV modules under outdoor conditions was proposed by Buerhop 12. Research works can be mainly grouped into three categories depending on the techniques involved, characterized by progressively higher accuracy and lower processing time: those based on (i) Image Processing, (ii) Machine Learning algorithms and (iii) deep Convolutional Neural Networks (CNNs).

Category (i) usually involves steps such as edge detection and the Hough transform to detect the linear border of the PV module, followed by pattern recognition methods based on the analysis of the intensity distribution of the emitted radiation to detect faulty areas Leotta 15 Aghaei 15. In Leotta 15 the model was first tested on images of indoor panels installed in a laboratory and then on outdoor panels installed on-site, verifying its unsuitability for real-time operations. Greco 17 proposed instead a double-stage procedure composed of a preliminary panel detection based on the Hough transform, followed by hotspot detection that combines a color-based analysis with a model-based one to rule out possible defect candidates due to heating localized at the junction box. They obtain a remarkable computing speed of 25 FPS at the price of an F1-score not exceeding 60%. In general, however, the accuracy and processing rate of category (i) methods are unsatisfactory and unsuitable for implementation in O&M services.

Category (ii) is instead usually based on two main steps: extraction of hand-crafted features and their use as inputs for training a machine learning classifier that discriminates modules as either nominal or defective. While such methods usually have higher accuracy than category (i), they require domain experience for feature extraction and may be unsuitable for real-time detection. In Kurukuru 19 first-order and second-order texture features were extracted and used to train a shallow NN to classify eight different fault classes, finally achieving a 91.7% testing accuracy. Salazar 16 implemented a semi-automated process based on k-means clustering to determine the hotspot area and the corresponding average temperature. Deitsch 19 process EL images of solar cells by extracting local descriptors at keypoints and feeding them into a Support Vector Machine (SVM) to classify the cells as either defective or functional. They also demonstrate that a more modern approach based on the VGG19 network outperforms the SVM-based method.

Category (iii) methods are now progressively replacing previous techniques thanks to the superior performance of CNNs in visual recognition tasks and the further advantage of automatic feature extraction, which also increases the computing speed. CNNs are often used in combination with IR images collected by UAVs, due to the recent diffusion of the latter. Pierdicca 18 uses a VGG16 network to classify PV cells as either damaged or functional, achieving an F1-score of up to roughly 70% and showing the impact of class unbalancing on performance. In Oliveira 19 a Gaussian filter is first applied to reduce noise and a Laplacian operator to highlight edges. Then the image is segmented into defective and normal areas by thresholding, and finally the binary mask is used to train a VGG16 base network to recognize disconnected substrings, hotspots and disconnected strings.

Other works focused on visible images to detect the cause of a hotspot, e.g. soiling Hwang 20 or dust Yap 15 Unluturk 19. The authors of Hwang 20 applied image processing techniques to UAV images to assess the soiling rate, while they postponed to future work the combination of artificial intelligence and statistical algorithms to evaluate the soiling distribution on the PV panel surface. Yap 15 and Unluturk 19 simulated instead different dust coverages and demonstrated that image matching, or a shallow neural network fed with texture features extracted from the gray level co-occurrence matrix, are effective solutions for evaluating the degree of pollution.

However, while the literature is quite rich, a limitation is that each panel is examined independently in a laboratory Oliveira 19 Yap 15 Unluturk 19 Mehta 18 Pierdicca 18, which requires disconnecting the panels from the field, stops energy generation and makes these methods unsuitable for large-scale PV systems. This is also a simplified scenario with respect to UAV images captured on-site, which differ in altitude, orientation, point of view and light intensity (due to the sun's movement across the sky or to cloud coverage), and which may be blurred by sudden camera movements caused by unexpected gusts of wind.

More critically, the interest is still in solving classification problems, whereas defect detection based on CNNs is still at an early stage. In fact, to the best of our knowledge, only a few works have been presented so far Mehta 18 Herraiz 20 Ashok 18 Pierdicca 20. In Mehta 18 a four-stage process is proposed which completely avoids the time-demanding activity of manually labelling training data. The model includes a Mask FCNN to predict simultaneously the soiling localization mask and the percentage power loss, as well as a webly supervised NN (WebNN) to predict the soiling category. It has been tested on a large dataset of more than 45k RGB images in the VIS spectrum of two different PV panels installed in a laboratory, achieving a remarkable frame rate of 22 FPS. Pierdicca 20 proposed an anomalous cell detection system based on instance segmentation of thermal images. They adopted the Mask R-CNN architecture and benchmarked it against other image segmentation networks such as U-Net, FPNet and LinkNet. The authors demonstrated that U-Net outperforms Mask R-CNN in terms of both accuracy and speed, although Mask R-CNN has the key advantage of directly returning the position of each single defective cell.

In Herraiz 20 a cascade three-stage detector is instead presented by using the double-stage region-based R-CNN. The IR image is initially rotated to align the panels to the image border by using the Sobel edge detector, then a panel detector is applied to detect panel areas and finally a hotspot detector identifies hotspots on such proposed regions. The model gains a sensitivity of almost 89% for both panel and hotspot detection, exploiting also telemetry data to deliver the hotspot location with a mean error of almost 0.86 m. Finally Ashok 18 discussed the applicability of YOLO Redmon 16 on thermal images acquired by UAV for an unspecified collection of PV plants in India. While they demonstrated the suitability of the network for detection of hotspots of varying size, results were at an embryonic stage since no discussion was presented about performances and drawbacks of the proposed model. Furthermore, YOLO did not allow detection at multiple scales or recognition of crowded objects, issues overcome in later versions of YOLO.

1.3 Paper contribution

Motivated by the open problems described above, in this work we propose a novel multi-stage architecture for the detection of anomalies in images of PV panels collected on-site by UAV. The model is composed of three main components: (i) a panel detector which detects the PV panel area, (ii) a defect detector which identifies the defects in the whole input image and (iii) a False Alarm Filter which removes false positives, i.e. defects detected outside the PV panel region proposals.

While this work shares some common points with that presented by Herraiz 20, it extends and differs from the latter in the following respects: (i) it operates on images of both the IR and visible spectrum, (ii) it is applied to two large PV plants with installed capacities of the order of ten MW or more, corresponding to different panel types (polycrystalline or thin-film) and installation types (ground or roof mounted), which is uncommon in the literature due to stringent data sharing policies, (iii) it simultaneously analyses a large variety of defects, such as hotspots, thermal stress induced by the junction box, bird dropping, delamination and soiling, including also issues not yet adequately discussed, such as raised rooftop panels and stagnant water (puddles), (iv) it estimates the panel area affected by soiling, which is important feedback for O&M operators to quantify the number of PV cells affected by power loss and schedule more effective maintenance operations, (v) it adopts the end-to-end single-stage detector YOLOv3 Redmon 18 as the core network of the multi-stage model and, finally, (vi) the model architecture is portable to plants of different size, location and panel type and to images collected in different wavelength bands, once sufficient training statistics are available.

2 Case study

Throughout the paper, we will refer to two PV plants, denoted below as Plant_Sicilia and Plant_Campania, which are located in southern Italy, as shown in Fig. 1.

Figure 1: Location of the two considered PV plants in Italy. The red circle size is proportional to the installed capacity. The insets show an example of images captured by UAV.

2.1 PV plants details

Plant_Sicilia is composed of polycrystalline modules installed on the ground and is situated in Sicily, a region in the South of Italy. The photovoltaic installation, consisting of over 20 thousand panels, has a nominal capacity of 9 MW. It generates over 7.2 million kilowatt-hours per year, enough to meet the energy needs of almost 2000 households, and avoids the atmospheric emission of over 3800 tonnes of CO2 per year.

Plant_Campania is instead composed of thin-film photovoltaic modules installed on the flat roofs of 56 commercial and logistics buildings in the Italian region of Campania. The plant has a nominal capacity of 21 MW and can produce approximately 33 million kilowatt-hours of energy each year, satisfying the consumption needs of 13,000 households and avoiding the atmospheric emission of more than 21 thousand tonnes of CO2 per year.

2.2 Datasets

The inspection was performed by means of a Sigma Ingegneria Efesto MKII drone equipped with a DJI A2 flight controller and a gimbal system designed by Sigma Ingegneria. Two cameras were mounted on the drone: a Workswell WIRIS 640 second Gen and a MAPIR Survey3N RGB. The MAPIR camera was used to acquire high-resolution images in the visible spectrum (VIS-HR), whereas the WIRIS camera took thermal IR images (LWIR) and aligned low-resolution visible images (VIS-LR). The inspections of each plant were conducted on different days and at different times in order to collect rich statistics in terms of illumination conditions, background and weather.

According to standard Machine Learning methodology, each dataset has been split randomly in training (70%), validation (15%) and test (15%) by implementing stratified sampling for each defect class. In particular, the validation set has been used for model optimization and performance evaluation, whereas the test set for performance evaluation only. More details are provided in the following subsections.
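The 70/15/15 stratified split described above can be illustrated with a minimal sketch. It assumes, for simplicity, that every sample carries a single class label; the function name `stratified_split` and the toy label list are hypothetical, and the actual procedure used for images containing several defect classes is not specified here.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/validation/test, class by class.

    `labels[i]` is the (assumed single) class label of sample i.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to the test set
    return train, validation, test


# Toy, strongly unbalanced label list (illustrative numbers only).
labels = ["hotspot"] * 20 + ["junction"] * 200
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))
```

Splitting each class separately keeps the minority class represented in all three sets, which a plain random split does not guarantee when one class is rare.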

2.2.1 Plant_Sicilia’s dataset

Six flights were executed, each collecting around 500 images. We selected and manually annotated 2038 radiometric LWIR images, all of size 640x512, for hotspot detection; they correspond to different UAV flights and include PV panels with different shapes, sizes and orientations. In such images hot areas often appear as a red region over a cold blue background (Fig. 2(a)).

Due to the frequent local overheating at junction boxes, as discussed for example also in Salazar 16, we annotated two different classes: (i) hotspot and (ii) thermal stress induced by junction box. Fig. 2(b) shows examples of Ground Truth Boxes (GTBs) for both classes: class (ii) appears as a pair of hot points which, without a proper annotation, may be misclassified as a hotspot by the detector. Concerning class (i), unlike Ashok 18 we did not distinguish between heating affecting one cell or a group of contiguous cells, since our statistics are mainly limited to single defective cells.

Figure 2: (a) Hotspot example highlighted by a black circle; (b): examples of hotspot (green rectangle) and junction box (blue rectangle) instances. The PV system under consideration is Plant_Sicilia.

Tab. 1 shows the statistics available for the training, validation and test sets. The dataset includes 1426 training images and 306 images each for validation and test. As may be seen in Tab. 1, there is a severe class unbalance: the minority class "hotspot" accounts for only 5.44% of the defect samples of the overall dataset, i.e. it is roughly 17 times smaller than the "junction" class. The statistics also include around 6k annotated panels, with 926 samples left out for test.

Dataset    | hotspot      | junctions     | panels
Train      | 341 (70.75%) | 5920 (70.62%) | 4152 (69.42%)
Validation | 70 (14.52%)  | 1168 (13.93%) | 903 (15.10%)
Test       | 71 (14.73%)  | 1295 (15.45%) | 926 (15.48%)
Overall    | 482          | 8383          | 5981

Table 1: Dataset statistics for Plant_Sicilia dataset. The number of GTBs for each class of defect and panel detectors is reported for the overall, training, validation and test sets. The percentage of the overall dataset is also shown in round brackets.

Figure 3: 2D distribution of GTBs of (a) hotspots (b) junction boxes and (c) PV panels in the plane (A, AR), where A is the GTB area normalized with respect to the image area and AR is the aspect ratio. The marginal distributions are also reported as shaded red areas. The plant under consideration is Plant_Sicilia.
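As a quick check of the class unbalance quoted for Plant_Sicilia, the short sketch below recomputes the share of each defect class and the junction-to-hotspot ratio from the overall counts of Tab. 1. The variable names are illustrative only.

```python
# Overall ground-truth box counts for the two defect classes, taken from Tab. 1.
overall = {"hotspot": 482, "junction": 8383}

total = sum(overall.values())
shares = {name: 100.0 * count / total for name, count in overall.items()}
imbalance_ratio = overall["junction"] / overall["hotspot"]

for name, share in shares.items():
    print(f"{name}: {share:.2f}% of defect boxes")
print(f"junction/hotspot ratio: {imbalance_ratio:.1f}")
```

The hotspot share evaluates to 5.44% and the ratio to about 17.4, consistent with the figures given in the text.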

Figure 3 gives an overview of the main features of the proposed dataset. Hotspots are challenging targets with a tiny average area of just 0.1% of the whole image area (Fig. 3(a)). In addition, the junction box area is comparable to that of hotspots (Fig. 3(b)), which increases the likelihood of false positives. The GTBs of PV panels mostly have an area smaller than 40% of the whole image area and an aspect ratio peaked around 0.5, i.e. it is more likely to encounter panels oriented vertically than horizontally (Fig. 3(c)). The Probability Density Function (PDF) of the panel area is peaked around 5% of the whole image area and almost flat between 20% and 30%, with a mean area of about 13.7% (Fig. 3(c)).
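The two quantities plotted in Fig. 3 can be computed from a box as in the following sketch. It assumes boxes given as pixel corners and an aspect ratio defined as width over height, which is consistent with the remark that a value near 0.5 corresponds to vertically oriented panels; the function name and the example box are hypothetical.

```python
def box_descriptors(box, image_width, image_height):
    """Return (A, AR) for a box given as (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    normalized_area = (width * height) / (image_width * image_height)
    aspect_ratio = width / height  # assumed convention: width over height
    return normalized_area, aspect_ratio


# Hypothetical panel box in a 640x512 LWIR frame.
area, ratio = box_descriptors((100, 50, 260, 370), 640, 512)
print(f"A = {area:.3f}, AR = {ratio:.2f}")
```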


2.2.2 Plant_Campania’s dataset

We prepared and manually annotated 1500 digital VIS-LR images of size 1600x1200 in order to identify defects and related cause, thanks also to a deeper inspection on VIS-HR images. The images were captured over the roof of five different buildings in order to increase statistics and model robustness. Six different defect classes were annotated: (i) puddle, (ii) soiling, (iii) strong soiling, (iv) delamination, (v) raised panel and (vi) bird dropping.

In more detail, "puddle" corresponds to the accumulation of water on panels installed on flat rooftops. For sloped rooftops or crystalline panels mounted on trackers it has been verified in Gaur 14 that water flow increases the module efficiency by decreasing the module temperature and acting as a cleaner. On flat rooftops, on the other hand, malfunctioning drainpipes or rooftop hollows produce puddles of stagnant water that persist until complete evaporation. In case of dirty rainfall, suspended dust leaves a soiling layer that grows over time. As a consequence, puddles covering PV panels are likely to turn into strong soiling, progressively altering the module operation and decreasing the irradiance absorbed by the module. Even more dangerously, if water accumulation persists over time, it may penetrate into the panel through possible micro-cracks or delamination points, causing corrosion, increased leakage current and possibly short-circuits. It can also be an early symptom of sinking for roof-mounted PV modules. Maintenance interventions are therefore recommended to identify and remove this potential issue.

Classes "soiling" and "strong soiling" are instead related to the deposition of dust, drifting sand, car and industrial exhaust fumes, etc. on the PV panels, causing partial shadowing and greatly reducing the absorption of solar irradiance. They usually appear as spots of different and irregular size. In particular, we label as strong_soiling the instances that show a dense pattern upon visual inspection, unlike class soiling, for which the deposition is sparser.

"Delamination" occurs when the adhesion between glass, encapsulant, solar cells and back layers is compromised because of contamination (e.g. improper cleaning of the glass) or environmental factors Kontges 14. It is usually followed by moisture ingress and corrosion (panel oxidation), causing optical reflection and a subsequent decrease in module efficiency. Delamination can affect a single cell or multiple cells. In either case, the whole PV panel must be replaced.

Solar cells in Plant_Campania are deposited as a thin layer on a flexible backing strip coated with an adhesive that adheres directly to the roof surface. Flexible solar strips are particularly suitable for large coverings with any tilt angle, such as industrial floors, business and sports centres. They typically have a lower efficiency than mono- or polycrystalline modules but a higher sensitivity to diffuse radiation, a lower weight and a simpler installation. However, one or more pieces of a panel may become unglued from the base: we call this a "raised panel". The issue typically occurs in the junction area between two consecutive module cells, where the panel thickness is lower due to the absence of amorphous silicon. The reason for this behaviour is not yet fully understood: possible causes include heat dissipation, the flexibility of the panel itself and erroneous glue application during installation. Raised panels need to be glued again to avoid damage to panel integrity and its consequences. The latter may consist of delamination, lower insulation resistance and a corresponding increase in leakage voltage values, which in turn may invalidate the wet insulation and lead to PV plant interruption.

Finally, class "bird dropping" corresponds to the accumulation of bird droppings on the panel. It is one of the most aggressive types of soiling, as it can burn into the glass and cause hotspots under intense sunlight. This defect usually occurs in multiple instances distributed over the same panel.

Fig. 4 shows some examples of GTBs for the mentioned classes. As can be seen, the defects were annotated either at instance (puddle, strong soiling) or panel (soiling, delamination, raised panel, bird dropping) level depending on the localization of their features over well-defined and delimited areas or their irregular distribution over the whole PV panel, respectively.

Figure 4: Examples of GTBs (red rectangles) for the defect classes of Plant_Campania: (a) puddle, (b) soiling, (c) strong soiling, (d) delamination, (e) raised panel and (f) bird dropping.

Tab. 2 reports the statistics available for the training, validation and test sets for defects and panels, as well as the statistics for each individual defect class. The dataset includes 1050 images for training and 225 each for validation and test. It contains, globally, 9984 defect instances grouped in 6 classes, as shown in Tab. 2. The majority (minority) class is bird dropping (strong soiling), with roughly 41.0% (4.3%) of the available statistics. As may be seen in Fig. 5, the defect size can be clustered in two main groups depending on whether the annotation was made at instance or panel level, with an average dimension of the order of 0.1-0.4% in the first case, i.e. for classes puddle and strong soiling, and roughly one order of magnitude higher for the remaining classes. The most challenging class to detect is puddle, with a 95% quantile of the GTB area of about 0.4% of the image dimension.

class          | Train         | Validation   | Test         | Overall
strong_soiling | 270 (3.96%)   | 72 (4.32%)   | 86 (5.74%)   | 428 (4.29%)
raised_panel   | 311 (4.56%)   | 72 (4.32%)   | 92 (6.14%)   | 475 (4.76%)
delamination   | 347 (5.09%)   | 103 (6.18%)  | 73 (4.87%)   | 525 (5.26%)
soiling        | 675 (9.90%)   | 197 (11.81%) | 188 (12.55%) | 1060 (10.61%)
puddle         | 2348 (34.44%) | 541 (32.43%) | 514 (34.31%) | 3403 (34.08%)
bird_dropping  | 2867 (42.05%) | 683 (40.95%) | 545 (36.38%) | 4095 (41.01%)
Defects        | 6818          | 1668         | 1498         | 9984
Panels         | 12718         | 2775         | 2687         | 18180

Table 2: Number of defect instances grouped by class for Plant_Campania. In the brackets the percentage with respect to the corresponding dataset is also shown. The last two rows indicate the overall number of defect and panel instances available.

Figure 5: 2D distribution of GTBs in the plane (width, height) for each defect class, where the GTB size is normalized with respect to the image dimension. The marginal distributions are also shown as shaded red areas. The PV system under consideration is Plant_Campania.

Figure 6: 2D distribution of GTBs of PV panels in the plane (a) (width, height) and (b) (A, AR), where A is the normalized GTB area and AR is the aspect ratio. The marginal distributions are also shown as shaded red areas. The PV system under consideration is Plant_Campania.

Concerning panels, almost 18k instances were annotated, with 2687 samples left out for test (Tab. 2). In addition, their side length usually does not exceed 40% of the image dimension (Fig. 6(a)), with a mean area of almost 3.6% and an aspect ratio close to 0.5 (Fig. 6(b)).

3 Methods

The proposed approach consists of a multi-stage architecture composed of three main processing modules and may be easily applied to aerial images in both the IR and VIS spectrum with modest customization: (i) a Panel Detector detecting the PV panel areas, (ii) a Defect Detector, which identifies the defect instances by processing the whole input image, and (iii) a False Alarm Filter, which finally post-processes the outcomes of the previous modules and filters out the defect proposals outside the predicted panel areas.

The model pipeline is depicted in Fig. 7: the input image captured by UAV is fed into the Panel and Defect detectors, both based on a sequence of Computer Vision techniques, geometrical transformations and Artificial Intelligence based on the YOLOv3 deep neural network (Fig. 8). Since YOLOv3 works with rectangular bounding boxes, when the ground truth delimits the whole panel area, performance is better if the edges of the panels are aligned with the edges of the image. Consequently, for panel detection, and also for defect detection in VIS images where the ground truths for soiling, delamination, raised panels and bird dropping extend over the whole panel rectangles (Fig. 4), the image is preliminarily rotated to improve localization: the linear edges of the panels are detected first, and the image is then rotated according to the identified predominant directions. Defects and panels are then detected on the resulting images by using YOLOv3.
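The rotation step can be illustrated with a minimal sketch that estimates the alignment angle from a set of detected line segments. It assumes the segments have already been extracted (for instance by a Hough transform, which is not implemented here), folds their orientations modulo 90 degrees because panel edges are mutually orthogonal, and weights each vote by segment length. The function name, the binning and the sign convention are illustrative assumptions, not the procedure actually used in the model.

```python
import math
from collections import Counter


def dominant_rotation(segments, bin_width=1.0):
    """Angle in degrees that brings the dominant edge direction onto an image axis.

    `segments` is a list of (x1, y1, x2, y2) line segments, assumed to come
    from an edge/line detection step computed elsewhere.
    """
    votes = Counter()
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        # Orthogonal edges collapse onto the same orientation modulo 90 degrees.
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 90.0
        bin_center = (round(angle / bin_width) * bin_width) % 90.0
        votes[bin_center] += length  # longer segments count more
    angle = max(votes, key=votes.get)
    # Choose the smaller of the two equivalent corrections (towards 0 or 90 degrees).
    return (90.0 - angle) if angle > 45.0 else (0.0 - angle)


t = math.tan(math.radians(10.0))
segments = [(0, 0, 100, 100 * t),  # long panel edge at about 10 degrees
            (0, 0, -50 * t, 50),   # perpendicular panel edge
            (0, 0, 5, 5)]          # short spurious segment at 45 degrees
print(dominant_rotation(segments))
```

The returned angle is expressed in the same convention as `atan2`; how it maps to an image rotation depends on the orientation of the image axes in the library used.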

Finally, the outcomes are passed to the False Alarm Filter, which operates as follows: (i) it expresses the defect proposals in the reference system of the panels (the rotation is actually necessary only for IR images), (ii) it discards the defect proposals (red rectangles) detected outside the proposed panel areas (green rectangles), which can be due to the sun's glare or other external agents, and (iii) it returns only the defects identified inside the panels (blue rectangles) by transforming them back into the original reference system of the input image. Details about the building blocks of the model are provided in the following sections.
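Step (ii) of the False Alarm Filter can be sketched as follows. The sketch assumes that defect and panel boxes are already expressed in the same reference system as axis-aligned rectangles, and it uses a simple membership criterion (the centre of the defect box must fall inside a panel box); this criterion and the function names are assumptions made for illustration.

```python
def center_inside(defect, panel):
    """True if the centre of `defect` lies inside `panel`.

    Boxes are (x_min, y_min, x_max, y_max) in the same reference system.
    """
    cx = (defect[0] + defect[2]) / 2.0
    cy = (defect[1] + defect[3]) / 2.0
    return panel[0] <= cx <= panel[2] and panel[1] <= cy <= panel[3]


def false_alarm_filter(defects, panels):
    """Keep only the defect proposals that fall on at least one detected panel."""
    return [d for d in defects if any(center_inside(d, p) for p in panels)]


panels = [(50, 40, 250, 400)]
defects = [(100, 100, 120, 118),  # on the panel: kept
           (400, 30, 420, 50)]    # e.g. sun glare on the ground: discarded
print(false_alarm_filter(defects, panels))
```

Other criteria, such as a minimum overlap fraction between defect and panel boxes, would fit the same structure by replacing `center_inside`.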

Figure 7: Pipeline of the proposed model. The input image (either IR or VIS) is fed into the Panel and Defect detectors based on YOLOv3 architecture and Computer Vision. The resulting outcomes are processed by the False Alarm Filter, which screens out all defect proposals (red rectangles) outside the detected panel areas (green rectangles), returning only defects inside (blue rectangles) expressed in the reference system of the input image. In case of VIS images, the strong_soiling coverage is also provided.

Figure 8: YOLOv3 architecture composed of the backbone Darknet-53 and the Feature Pyramid Network making multi-scale object predictions. The summation symbol denotes the element-wise summation operation, whereas "concat" corresponds to lateral connection along depth. In the figure we assume C = 80 classes (COCO dataset Lin 14).

3.1 YOLOv3 network

YOLOv3 Redmon 18 is a single-stage end-to-end network composed of a backbone and a head subnet (Fig. 8). The backbone is the Darknet-53 network and is responsible for feature extraction from the input image. In order to expand the receptive field and get more contextual information useful for detecting small targets, it applies downsampling based on strided convolutions. To overcome the problem of vanishing gradients as the network depth increases and to extract more robust features, it implements skip connections by using a sequence of residual units He 15 .

The head subnet is built on top of the backbone and makes predictions at three different scales, $13\times13$, $26\times26$ and $52\times52$, by means of a Feature Pyramid Network (FPN) Lin 16 . FPN alleviates the problem of detecting objects of different scales by implementing feature fusion via a top-down pathway and lateral connections between feature maps of different resolutions. Indeed deep low-resolution features carry a high semantic value, but the position information of the corresponding feature maps is weakened, which makes them more suitable for detecting large objects thanks to their large receptive field. On the other hand, earlier high-resolution features have a low semantic value and are better suited to detecting smaller objects because of their limited receptive field. FPN makes YOLOv3 more accurate than the Single Shot Detector Liu 16 , where detection is done separately on feature maps having different scales, and competitive with double-stage detectors such as Faster R-CNN Ren 15 , which are however much slower than YOLOv3.

YOLOv3 solves the detection task as a regression problem by resizing the input image to a default size ($416\times416$ in our case) and dividing it into a grid for each output scale, where each grid cell outputs an array of shape $B\times(5+C)$: $B$ is the number of rectangular bounding boxes a cell can predict, 5 accounts for the four bounding box attributes plus the object confidence, and $C$ is the number of classes. Non Maximal Suppression (NMS) is finally used to prune overlapping predictions, keeping only the bounding boxes with the highest confidence.
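As a quick check of the shapes involved, the following sketch computes the output tensor size of each scale. It assumes the standard YOLOv3 configuration, i.e. three boxes per cell and strides of 32, 16 and 8; these two values are not stated in the text above and the function name is purely illustrative.

```python
def yolo_output_shapes(input_size=416, num_classes=80, boxes_per_cell=3,
                       strides=(32, 16, 8)):
    """Return the (rows, cols, channels) shape of each YOLOv3 output scale.

    boxes_per_cell and strides are the usual YOLOv3 defaults (assumption).
    """
    # Each box carries 4 coordinates + 1 object confidence + C class scores.
    channels = boxes_per_cell * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


shapes = yolo_output_shapes()
for shape in shapes:
    print(shape)
```

With the COCO setting (C = 80) every cell therefore emits 255 values, on grids of 13, 26 and 52 cells per side.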

3.2 Edges detection and image rotation based on Computer Vision

YOLOv3’s localization performance is optimal if the target edges are parallel to the image edges, since this maximizes the intersection between ground truths and detected bounding boxes. We therefore apply a classical edge detection method to find the panel edges, get the predominant directions and rotate the image accordingly. In particular, the procedure consists of three main steps: (i) edge detection based on the Canny detector, (ii) line detection by means of the Hough Transform, (iii) image rotation.

3.2.1 Edges detection

We adopted the well-known Canny edge detector Canny 86 which represents a good compromise between accuracy, computational time and algorithm complexity. It is a multi-stage algorithm composed of the following steps:

  1. Noise reduction: a 5x5 Gaussian filter is used to smooth the input image.

  2. Computation of the image intensity gradient: a Sobel operator is applied to the smoothed image to get the first derivative of the image intensity in the horizontal and vertical directions, i.e. $G_x$ and $G_y$, respectively. From the two resulting images, the edge gradient magnitude $G$ and direction $\theta$ of each pixel are computed as follows:

     $$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(\frac{G_y}{G_x}\right)$$

  3. Non Maximum Suppression: candidate edges correspond to points where the gradient magnitude is a local maximum along the gradient direction.

  4. Hysteresis thresholding: a double-threshold image binarization method is applied with thresholds $T_{low}$ and $T_{high}$. All the candidate edges having $G > T_{high}$ ($G < T_{low}$) are retained (discarded). The remaining edges are kept only if connected to edges having $G > T_{high}$.

Figure 9: Image rotation methodology: (a) input image, (b) edge detection based on Canny detector, (c) predominant linear edge (red) identified by Hough transform, (d) rotated image.

Starting from input image shown in Fig. 9a, we obtain the binary image depicted in Fig. 9b, where the white pixels set to 1 correspond to detected edges.

The two thresholds above, $T_{low}$ and $T_{high}$, which mostly influence the number of detected edges, were set empirically by trial and error on the training set to default values of 450 and 550, respectively. Moreover, to address the issue of missing edges in some circumstances, we designed an in-house iterative algorithm that dynamically changes the thresholds depending on the input image. In brief, starting from the values above, the number of detected edge pixels $N_{edge}$ is counted and, if $N_{edge} < N_{min}$, then $T_{low}$ and $T_{high}$ are decreased by 50 and the edge detection is repeated. The procedure stops once a suitable number of pixels set to 1 is reached, i.e. $N_{edge} \geq N_{min}$. Finally, to set the optimal value of $N_{min}$, we applied the inflection point (Elbow) method to the distribution of the values of $N_{edge}$, sorted in increasing order, computed on the training images with the default values of $T_{low}$ and $T_{high}$.
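The iterative threshold adjustment can be sketched as follows. This is only an illustration under stated assumptions: `count_edge_pixels` is a simplified stand-in for the Canny detector (a one-pass hysteresis on a gradient-magnitude map), the minimum pixel count `n_min` and the toy gradient values are hypothetical, and only the default thresholds (450, 550) and the step of 50 come from the text.

```python
def count_edge_pixels(grad, t_low, t_high):
    """Simplified hysteresis on a gradient-magnitude map (not the full Canny
    detector): strong pixels exceed t_high; weak pixels lie in [t_low, t_high]
    and are kept only if they touch a strong pixel."""
    rows, cols = len(grad), len(grad[0])
    strong = {(r, c) for r in range(rows) for c in range(cols)
              if grad[r][c] > t_high}
    kept = len(strong)
    for r in range(rows):
        for c in range(cols):
            if t_low <= grad[r][c] <= t_high:
                neighbours = ((r + dr, c + dc)
                              for dr in (-1, 0, 1) for dc in (-1, 0, 1))
                if any(n in strong for n in neighbours):
                    kept += 1
    return kept


def adaptive_thresholds(grad, n_min, t_low=450, t_high=550, step=50):
    """Lower both thresholds by `step` until at least n_min edge pixels appear."""
    n_edge = count_edge_pixels(grad, t_low, t_high)
    while n_edge < n_min and t_low - step > 0:
        t_low -= step
        t_high -= step
        n_edge = count_edge_pixels(grad, t_low, t_high)
    return t_low, t_high, n_edge


# Toy gradient-magnitude map (hypothetical values).
GRAD = [[600, 520, 100],
        [480, 430,  90],
        [300,  50,  20]]
result = adaptive_thresholds(GRAD, n_min=5)
print(result)
```

In a real pipeline the counting function would run the actual edge detector on the image and count the white pixels of the binary output; the loop structure stays the same.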

3.2.2 Lines detection

In order to detect linear edges, we applied the Hough Transform Shehata 15 to the binary image presented in Fig. 9b. The Hough transform maps each edge point into a sinusoidal curve in the Hough space, and detected lines correspond to pairs $(\rho, \theta)$ whose number of intersecting curves is larger than a threshold. In order to detect only one predominant direction suitable to rotate the image, we implemented an iterative approach that starts from a high threshold and progressively decreases its value until one line, with the corresponding angle with respect to the horizontal axis, is returned. An example of the result achieved is shown in Fig. 9c.
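A minimal pure-Python sketch of this iterative search is given below. It is not the authors' implementation: the one-degree angular resolution, the unit step of the threshold and the function names are assumptions made for illustration, and in image coordinates (y pointing down) the sign of the returned angle is mirrored.

```python
import math


def hough_votes(edge_points, n_theta=180):
    """Accumulate votes for lines rho = x*cos(theta) + y*sin(theta)."""
    votes = {}
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    return votes


def predominant_angle(edge_points, n_theta=180):
    """Lower the vote threshold until a line emerges; return its angle in
    degrees with respect to the horizontal axis and the final threshold."""
    votes = hough_votes(edge_points, n_theta)
    threshold = len(edge_points)  # highest possible number of votes
    while threshold > 0:
        candidates = [key for key, v in votes.items() if v >= threshold]
        if candidates:
            _, t = max(candidates, key=votes.get)  # most voted line
            normal_deg = 180.0 * t / n_theta       # direction of the line normal
            return normal_deg - 90.0, threshold
        threshold -= 1
    return None, 0


# 100 collinear edge pixels on y = x plus two outliers (toy data).
points = [(i, i) for i in range(100)] + [(10, 50), (70, 3)]
angle, threshold = predominant_angle(points)
print(angle, threshold)
```

Starting from the maximum possible vote count guarantees that the first line to pass the threshold is the dominant one, which is the behaviour the text describes.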

3.2.3 Image rotation

Finally, we applied a geometrical rotation to the input image based on the computed angle. Fig. 9 summarizes from left to right the steps followed to rotate the input image.
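Since detections later have to be mapped between the rotated and the original reference systems, it may help to see the rotation applied to coordinates rather than pixels. The sketch below rotates points and boxes around a chosen centre; the helper names and the choice of returning the enclosing axis-aligned box are assumptions, not details given in the text.

```python
import math


def rotate_point(x, y, angle_deg, cx, cy):
    """Rotate (x, y) by angle_deg around (cx, cy) (counter-clockwise in a
    frame with y pointing up; the sense is mirrored in image coordinates)."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))


def rotate_box(box, angle_deg, cx, cy):
    """Rotate the four corners of (xmin, ymin, xmax, ymax) and return the
    axis-aligned box enclosing them."""
    xmin, ymin, xmax, ymax = box
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    rotated = [rotate_point(x, y, angle_deg, cx, cy) for x, y in corners]
    xs = [p[0] for p in rotated]
    ys = [p[1] for p in rotated]
    return (min(xs), min(ys), max(xs), max(ys))


# Forward rotation followed by the inverse one recovers the original point.
forward = rotate_point(120.0, 40.0, 17.5, 208.0, 208.0)
back = rotate_point(forward[0], forward[1], -17.5, 208.0, 208.0)
print(forward, back)
```

Applying the opposite angle around the same centre is what brings a detection back to the reference system of the input image.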

3.3 Defect Detector

YOLOv3 requires a list of images and the corresponding set of ground truth boxes and class labels to learn. Annotation has been realized by means of the open LabelImg graphical user interface LabelImg : it has been done at instance level in IR images and either at instance or panel level, depending on class, in VIS images (section 2.2). Consequently, the defect detector includes a preliminary VIS image rotation to better fit anchors on ground truths during YOLOv3’s training and inference (Fig. 7).

Since some classes exhibit only hundreds of instances (Tables 1 and 2), fine-tuning starting from weights pre-trained on the COCO dataset Lin 14 has been preferred to training from scratch. This allows the knowledge to be transferred effectively from the original detection task to the new domain while coping with a limited amount of data. It also speeds up learning.

To further increase the statistics available during fine-tuning, Data Augmentation (DA) based on geometrical transformations has been applied to the training data (Tab. 3). Optical distortion acting in the HSV spectrum was instead not modelled, as it could weaken the distinctive colour-based features of some defect classes. Moreover, since a preliminary rotation was applied to VIS images before training, rotation DA was applied only to IR images.

During inference, detections are returned in the VOC format in a csv file, i.e. for each input image an output file is produced containing in each row the class label, the confidence score of the prediction, as well as the top left and the bottom right corners of the bounding box expressed in pixel units.
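A possible layout of such an output file is sketched below with the standard `csv` module. The column order (label, score, xmin, ymin, xmax, ymax), the four-decimal score and the sample values are assumptions for illustration; the text only states which fields each row contains.

```python
import csv
import io


def write_detections(detections, stream):
    """Write one row per detection: label, score, top-left and bottom-right
    corners in pixel units (assumed column order)."""
    writer = csv.writer(stream, lineterminator="\n")
    for label, score, (xmin, ymin, xmax, ymax) in detections:
        writer.writerow([label, f"{score:.4f}", xmin, ymin, xmax, ymax])


def read_detections(stream):
    """Parse the rows back into (label, score, box) tuples."""
    rows = []
    for label, score, xmin, ymin, xmax, ymax in csv.reader(stream):
        rows.append((label, float(score),
                     (int(xmin), int(ymin), int(xmax), int(ymax))))
    return rows


# Hypothetical detections for one input image.
detections = [("hotspot", 0.9731, (120, 85, 138, 101)),
              ("junction", 0.8812, (40, 60, 72, 95))]
buffer = io.StringIO()
write_detections(detections, buffer)
print(buffer.getvalue())
```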

Data Augmentation IR images VIS images
horizontal flip ✓✗ ✓✗
vertical flip ✓✗ ✓✗
Table 3: Dataset Augmentation techniques implemented during training for IR images (Plant_Sicilia) and VIS images (Plant_Campania), and for defect (✓) and panel (✗) detectors, respectively.

3.4 Panel detector

To improve the YOLOv3 detection task, the panel detector first applies a preliminary rotation that orients the PV panels either vertically or horizontally (Fig. 7) before learning or inference. Consequently, the DA techniques used during training include neither rotation nor scaling, the latter because panel dimensions are roughly similar (Tab. 3).

In addition, at inference time, Test Time Augmentation (TTA), in a manner similar to Garcia 20 , is applied to the rotated image to enhance YOLOv3 performance: two further images are generated by horizontal and vertical flipping, and inference is executed on the three resulting images. Finally the detections are pruned by the NMS algorithm and the detected PV panels are returned. The TTA procedure is shown schematically in Fig. 10, whereas the post-processing NMS proceeds as follows: we consider as real bounding boxes the set $\mathcal{R}$ produced on the original rotated image and as candidate detections $\mathcal{C}$ all the others related to the augmented images. Then we sort $\mathcal{C}$ in decreasing order of box confidence score, we pick from $\mathcal{C}$ the box $c$ having the highest confidence and we compute its Intersection over Union (IoU)

$$IoU(c, r) = \frac{|c \cap r|}{|c \cup r|}$$

with each box $r \in \mathcal{R}$. If $IoU(c, r) < T_{IoU}$ for every $r$, then $\mathcal{R} = \mathcal{R} \cup \{c\}$ and $\mathcal{C} = \mathcal{C} \setminus \{c\}$. Otherwise we discard $c$ and we remove it from $\mathcal{C}$. NMS continues until $\mathcal{C}$ is empty. In this work, $T_{IoU}$ has been set empirically to 0.2 in order to allow a minimum overlap between panel proposals. The detections of the panel detector are finally delivered in the same format as those of the defect detector.
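The merging of the TTA outputs can be sketched as below. Boxes are assumed to be tuples `(xmin, ymin, xmax, ymax, score)`, the helper names are illustrative, and the rule that a candidate must stay below the 0.2 threshold against every box already kept is one reading of the procedure described above, not a confirmed detail.

```python
def iou(a, b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax, ...) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def unflip(box, width, height, horizontal):
    """Map a box detected on a flipped image back to the unflipped frame."""
    xmin, ymin, xmax, ymax, score = box
    if horizontal:
        return (width - xmax, ymin, width - xmin, ymax, score)
    return (xmin, height - ymax, xmax, height - ymin, score)


def merge_tta(real, candidates, iou_threshold=0.2):
    """Add to the real set every candidate (highest confidence first) that
    overlaps less than iou_threshold with all boxes kept so far."""
    merged = list(real)
    for cand in sorted(candidates, key=lambda b: b[4], reverse=True):
        if all(iou(cand, kept) < iou_threshold for kept in merged):
            merged.append(cand)
    return merged


# Toy example on a 416x416 image (hypothetical detections).
real = [(0, 0, 100, 50, 0.98)]
from_hflip = unflip((314, 1, 414, 50, 0.97), 416, 416, horizontal=True)
from_vflip = unflip((150, 316, 250, 366, 0.90), 416, 416, horizontal=False)
panels = merge_tta(real, [from_hflip, from_vflip])
print(panels)
```

Here the horizontally flipped image re-detects the panel already found on the original image and is discarded, whereas the vertically flipped image contributes a panel that was missed.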

Figure 10: Test Time Augmentation (TTA) workflow: the initial rotated image is augmented by horizontal and vertical flip, then inference based on YOLOv3 is applied on the three images. Finally detections are expressed in the same reference system, pruned by means of NMS and returned.

3.5 False Alarm Filter

The False Alarm Filter (Fig. 7) is fed with the following: (i) defect candidates from the defect detector and (ii) panel detections from the panel detector. Once the defect candidates have been expressed in the same reference system as the panel detections (this actually requires a rotation for IR images), the IoU between the defect and panel bounding boxes is computed. If the IoU is above a threshold, the defect is retained, otherwise it is discarded. Since the defect bounding box may exceed the border of the panel proposal, the threshold was set empirically. Finally the false alarm filter returns the defect detections expressed in the reference system of the input image. In case of VIS imaging (Plant_Campania), if strong soiling is detected, the soiling coverage is also returned by computing the intersection area between the defect bounding box $D$ and the panel box $P$ in which the defect is located, normalized by the panel area, i.e. $|D \cap P| / |P|$. An example of the text file delivered for each input VIS image is shown below:
Panel 5: strong soiling covers 27.22 % of the whole area
Panel 6: strong soiling covers 7.81 % of the whole area
Panel 12: strong soiling covers 10.68 % of the whole area
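The filtering and coverage computation can be illustrated with the sketch below. Several points are assumptions: the threshold `min_iou` is a placeholder (its empirical value is not available here), each defect is matched to the panel with the largest overlap, the coverage is taken as the intersection area divided by the panel area, and all box values are invented.

```python
def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def iou(a, b):
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def false_alarm_filter(defects, panels, min_iou=0.01):
    """Keep only defects overlapping a detected panel.

    defects: list of (label, box); panels: dict panel_id -> box.
    min_iou is a placeholder threshold (assumption).
    Returns (panel_id, label, box, coverage_percent) for each kept defect.
    """
    kept = []
    for label, box in defects:
        panel_id = max(panels, key=lambda pid: iou(box, panels[pid]))
        panel = panels[panel_id]
        if iou(box, panel) > min_iou:
            coverage = 100.0 * intersection_area(box, panel) / area(panel)
            kept.append((panel_id, label, box, coverage))
    return kept


panels = {5: (0, 0, 200, 100), 6: (220, 0, 420, 100)}
defects = [("strong_soiling", (20, 10, 120, 60)),     # inside panel 5
           ("strong_soiling", (500, 300, 520, 320))]  # on the background
kept = false_alarm_filter(defects, panels)
report = [f"Panel {pid}: strong soiling covers {cov:.2f} % of the whole area"
          for pid, label, box, cov in kept if label == "strong_soiling"]
print("\n".join(report))
```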

3.6 k-Means Clustering

YOLOv2 introduced the concept of anchors Redmon 16b , a set of initial box candidates which are adjusted by the network during learning. Box priors may be fitted on the training dataset to improve accuracy and speed, and unsupervised k-means clustering can be used to automate the procedure. However, clustering based on the classical Euclidean distance weights larger boxes more than smaller ones for comparable values of overlap. Therefore we adopt an IoU-based distance function:

$$d(g, c) = 1 - IoU(g, c)$$

where $g$ is the ground truth box and $c$ is the anchor box. To determine good priors, we applied k-means on the training set and monitored the average IoU, as well as the mean Silhouette and the total intra-cluster variation, as a function of the number of clusters $k$. The total intra-cluster variation measures the compactness of the clustering and is computed as the sum of squared errors (SSE) between each data point $x_i$ and the centroid $\mu_j$ of its cluster $C_j$ Nainggolan 19 , i.e.:

$$SSE = \sum_{j=1}^{k} \sum_{x_i \in C_j} \mathrm{dist}(x_i, \mu_j)^2$$

The silhouette coefficient represents instead the degree of inter-cluster separation and is evaluated for each instance $i$ as Sudheera 16 :

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $a_i$ ($b_i$) is the average intra-distance (inter-distance) of the $i$-th sample from all points in the same (closest) cluster $C_I$ ($C_J$). The better the clustering, the larger the value of the mean silhouette and the smaller the SSE over the samples of the training dataset.
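The anchor fitting can be sketched in a few lines of pure Python. This is an illustration under assumptions: boxes are reduced to (width, height) pairs aligned at a common corner, centroids are initialized deterministically and updated with the arithmetic mean, and the toy box sizes are invented.

```python
def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height) and sharing one corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    return inter / (box[0] * box[1] + centroid[0] * centroid[1] - inter)


def kmeans_iou(boxes, k, n_iter=100):
    """k-means with distance d = 1 - IoU between boxes and centroids."""
    # Deterministic initialization with evenly spaced samples (assumption).
    centroids = [boxes[i * len(boxes) // k] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = min(range(k), key=lambda j: 1.0 - iou_wh(box, centroids[j]))
            clusters[best].append(box)
        updated = [
            (sum(b[0] for b in members) / len(members),
             sum(b[1] for b in members) / len(members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return centroids, clusters


def average_iou(boxes, centroids):
    """Mean IoU between each box and its best-matching anchor."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)


# Toy ground-truth sizes in pixels (hypothetical).
boxes = [(10, 10), (12, 11), (11, 12),
         (50, 20), (52, 22), (48, 21),
         (100, 90), (95, 100), (105, 95)]
anchors, clusters = kmeans_iou(boxes, k=3)
print(sorted(anchors), round(average_iou(boxes, anchors), 3))
```

Repeating the call for increasing k and plotting the average IoU gives the kind of curve on which an elbow can be located.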

Fig. 11a-c shows these quantities for the defect detector of Plant_Campania: according to the Elbow method, we selected the number of clusters $k$ offering the best trade-off between accuracy and speed. Then we distributed the clusters over the three output scales. Fig. 11d shows the resulting anchor boxes, corresponding to the centroids of the clusters. The procedure has been repeated similarly in the other cases.

Figure 11: k-means clustering applied to training dataset of defect detector for Plant_Campania: (a) IoU, (b) silhouette and (c) SSE as a function of number of clusters. (d): resulting anchors.

4 Results

4.1 Metrics

In this work we used the PASCAL VOC metrics Everingham 15 to evaluate performance. Given an IoU threshold, we compute Recall as the proportion of ground truth boxes that are correctly detected,

$$REC = \frac{TP}{TP + FN},$$

Precision as the ratio between the correctly detected boxes and the overall number of detected boxes,

$$PREC = \frac{TP}{TP + FP},$$

and F1-score as their harmonic mean,

$$F1 = 2\,\frac{PREC \cdot REC}{PREC + REC}.$$

Here TP, FP and FN denote the number of True Positives, False Positives and False Negatives, respectively. Then for each class we draw the Precision-Recall curve and we evaluate the Average Precision (AP) as the area under the curve:

$$AP = \int_0^1 PREC(REC)\, dREC$$

Finally the mean Average Precision is computed as the arithmetic mean of the AP values of the different classes

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

where $N$ is the number of classes under consideration.
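These metrics can be reproduced with a short sketch. The counts used in the first example are those reported for the hotspot class @0.4 (TP = 63, FP = 3, FN = 8) and the AP values are those listed for Plant_Campania @0.5; the all-point interpolation of the Precision-Recall curve is an assumption about how the area is computed, and the ranked detections in the AP example are invented.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and F1-score from TP, FP and FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_precision(detections, n_ground_truth):
    """Area under the Precision-Recall curve (all-point interpolation).

    detections: list of (confidence, is_true_positive)."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing (envelope of the curve).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


def mean_average_precision(ap_values):
    return sum(ap_values) / len(ap_values)


prec, rec, f1 = precision_recall_f1(tp=63, fp=3, fn=8)
print(round(100 * prec, 2), round(100 * rec, 2), round(100 * f1, 2))

ap = average_precision([(0.9, True), (0.8, True), (0.7, False), (0.6, True)],
                       n_ground_truth=4)
m_ap = mean_average_precision([80.57, 66.72, 66.51, 57.91, 60.01, 78.99])
print(ap, round(m_ap, 2))
```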

4.2 Training

The proposed model has been implemented by means of the Keras library within the TensorFlow framework. The experiments were conducted using an Intel Xeon E5-2690 v3 processor with 56 GB RAM and an NVIDIA Tesla K80 GPU. To handle the problem of data scarcity, as well as to speed up learning and improve accuracy, a 2-stage training starting from weights pre-trained on the COCO dataset Lin 14 has been adopted: initially transfer learning has been executed for 3 epochs by unfreezing the last three layers of the network and using a learning rate of 0.001 and a batch size of 32. Then the network has been fine-tuned by training the whole architecture with a learning rate of 0.0001 and a batch size of 8. To avoid overfitting, we implemented early stopping by monitoring the validation loss and stopping fine-tuning once no further loss decrease was observed for ten consecutive epochs. As revealed in Fig. 12 for Plant_Sicilia, the convergence of the loss functions for the defect and panel detectors is fast and is reached after a few tens of epochs.
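The training schedule and the stopping rule can be summarized as follows. The hyperparameters are those quoted above; the dictionary layout, the function name and the sample loss curve are illustrative assumptions, and the stopping rule is written in plain Python rather than as a framework callback.

```python
# Two-stage schedule as described in the text (layout is illustrative).
TRAINING_STAGES = [
    {"name": "transfer learning", "trainable": "last three layers",
     "epochs": 3, "learning_rate": 1e-3, "batch_size": 32},
    {"name": "fine-tuning", "trainable": "all layers",
     "learning_rate": 1e-4, "batch_size": 8, "patience": 10},
]


def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch): training stops at the first epoch for
    which the validation loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improvement stops after epoch 2.
losses = [1.0, 0.8, 0.7] + [0.75] * 15
stop_epoch, best_epoch = early_stopping_epoch(
    losses, patience=TRAINING_STAGES[1]["patience"])
print(stop_epoch, best_epoch)
```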

Figure 12: Training (red) and validation (blue) losses for (a) defect detector and (b) panel detector. The PV system under consideration is Plant_Sicilia.

4.3 Results for Plant_Sicilia

4.3.1 Panel Detector

Tab. 4 shows the margin of improvement achieved for Plant_Sicilia @0.5 when applying TTA during inference. As expected, TTA increases the overall number of both correct detections and misdetections. The overall effect is an AP improvement of roughly 0.64%. Tab. 5 shows the evaluation metrics for different IoU on the test set together with the results from Herraiz 20 : as can be seen, our panel detector outperforms Herraiz 20 for all the considered localization thresholds. In particular, our REC@0.5 is more than 10% higher than in Herraiz 20 : this is a remarkable result since a high false negative rate may negatively influence the hotspot detector outcome, as hotspots identified outside the detected panel areas are removed in both models. Globally, our AP@0.5 is almost 98.5% and well above 90% also for higher IoU. Only 13 out of 926 panels are missed @0.5, usually those not appearing completely in the image, as also observed in Herraiz 20 .

Fig. 13 shows some detection examples. As may be seen in Fig. 13a-c, the model generalizes well for different panel orientation and size, ensuring also a classification with a very high level of confidence (usually not less than 0.95). FPs are very rare and usually correspond to the detection of a small part of a panel at the image border (Fig. 13d), thus not penalizing the outcome of the false alarm filter. FNs are equally rare and related to panel portions located at the boundary and with a significant slope with respect to the horizontal in the input image (Fig. 13e).

Metric without TTA with TTA
PREC [%] 98.44 97.49
REC [%] 98.33 99.00
AP [%] 98.27 98.91
Table 4: Precision, Recall and AP @0.5 of panel detector on validation set of Plant_Sicilia with and without the application of TTA during inference.
IoU TP FP FN PREC [%] REC [%] F1 [%] AP [%]
0.3 915 11 11 98.81 98.81 98.81 98.71
0.5 913 13 13 98.59 98.59 98.59 98.48
0.7 877 49 49 94.70 94.70 94.70 93.97
cascade Herraiz 20 NA NA NA 94.52 88.40 91.35 NA
Table 5: Evaluation metrics and absolute number of TP, FP and FN of panel detector for different IoU on test set of Plant_Sicilia. Results from Herraiz 20 are also reported.
Figure 13: Examples of detections (red boxes) made by the panel detector on test images of Plant_Sicilia. (a), (b), (c): TPs; (d): FP; (e): FN (white box).

4.3.2 Defect detector and multi-stage model

Fig. 14 depicts the relationship between recall and precision for classes hotspot and junction on the test set of the defect detector. As can be seen, IoU= 0.4 is a good balance point for the hotspot class, ensuring a REC (PREC) as high as 88.7% (95.5%), corresponding to only 8 FNs out of 71 hotspot instances and 3 FPs. Performance degrades progressively for higher localization thresholds and, as expected, the effect is more pronounced for the hotspot class than for the junction class, because the tiny size of hotspots may lead to weak features after the many convolution operations of the Darknet-53 backbone. Numerical results @0.4 are reported in Tab. 6 and, for a fair comparison with the literature, we also show metrics @0.5.

Figure 14: (a) Precision and (b) Recall as a function of IoU for hotspot and junction classes; (c) Precision-Recall curve for hotspot class @0.4. Data correspond to test set of defect detector for Plant_Sicilia.
Class IoU TP FP FN PREC [%] REC [%] F1 [%] AP [%] mAP [%]
hotspot 0.4 63 3 8 95.45 88.73 91.97 88.04 85.48
junction 0.4 1142 123 153 90.27 87.18 89.21 82.93
hotspot 0.5 53 13 18 80.30 74.64 77.37 66.68 71.86
junction 0.5 1097 168 198 86.72 84.71 85.70 77.03
Table 6: Evaluation metrics @0.4 and @0.5 on test set of defect detector for Plant_Sicilia. The absolute number of TP, FP and FN is also reported.

Fig. 15 shows instead some examples of detections. The defect detector is sensitive to small targets and is able to discriminate accurately between hotspots and overheating due to the junction box, in the presence of cluttered background and changeable panel temperature (Fig. 15a-c). The three misdetections @0.4 are due either to detections outside the panel area (Fig. 15d-e), thus removable by the false alarm filter, or to a wide hot area at the panel edge. FNs are instead caused either by unexpected hotspots with square shapes (Fig. 15f) or by hotspots with a smoother orange shade, both being inadequately represented in the training statistics.

Figure 15: Examples of hotspot detections (sky blue boxes) made by the defect detector on test images of Plant_Sicilia. (a), (b), (c): TPs; (d), (e): FPs, (f): FN (white box). In the same images the detections for junction box class are also superimposed (red boxes).

Tab. 7 benchmarks the defect detector, the overall multi-stage model and the results from Herraiz 20 , which presented a dataset containing similar statistics for hotspots. First, it is worth noting that @0.4 the false alarm filter removes 2 out of 3 FPs, increasing PREC (F1-score) by almost 3% (1.4%), and therefore improving the overall robustness of the proposed model. Second, @0.4 the proposed model outperforms Herraiz 20 with an F1-score almost 3% higher and a similar REC. On the other hand, since Herraiz 20 did not specify the IoU, if we assume IoU= 0.5, our model is significantly outperformed by Herraiz 20 , with an F1-score (REC) almost 12% (15%) lower. Since our panel detector has better performance, such degradation should be imputed to the YOLOv3 engine used by the hotspot detector rather than to the false positive pruning stage: indeed Herraiz 20 exploited the double-stage Faster R-CNN, which is slower than YOLOv3 but should also be more accurate in detecting small objects.

Model IoU TP FP FN PREC [%] REC [%] F1 [%] AP [%]
hotspot 0.4 63 3 8 95.45 88.73 91.97 88.04
multi-stage 0.4 63 1 8 98.43 88.73 93.33 88.31
hotspot 0.5 53 13 18 80.30 74.64 77.37 66.68
multi-stage 0.5 53 11 18 82.81 74.64 78.51 66.91
cascade Herraiz 20 NA 77 7 9 91.67 89.53 90.58 NA
Table 7: Evaluation metrics and absolute number of TP, FP and FN @0.4 and @0.5 on test set of Plant_Sicilia for defect detector and multi-stage model. Results from Herraiz 20 are also shown.

4.4 Results for Plant_Campania

4.4.1 Panel Detector

Tab. 8 shows the performance metrics for different localization thresholds on test set. Panel detector has an outstanding AP@0.5 of almost 97.9%, with only 42 FN out of 2687 panel instances, and an AP above 90% up to IoU of 0.7.

As revealed by Fig. 16a-c, the model is sensitive to panels of different shape and orientation. Unlike Plant_Sicilia, the confidence score may be as low as 0.45 due to the more significant variability in panel shape which makes detection harder. Misdetections are more likely to appear at the image border and for small stripes, resulting in multiple detection for the same panel (Fig. 16d-e). Missed detections are rarer than FPs (Tab. 8) and occur more often for elongated stripes which are detected as a whole (Fig. 16f), unlike annotation that segmented them into two parts due to a modest separation space. Actually both FPs and FNs depicted in Fig. 16 do not penalize the false alarm filter outcome as long as the panel area is detected and the distance between adjacent panels is negligible.

IoU TP FP FN PREC [%] REC [%] F1 [%] AP [%]
0.3 2665 81 22 97.05 99.18 98.10 98.96
0.5 2645 101 42 96.32 98.43 97.36 97.93
0.7 2473 273 214 90.05 92.03 91.03 89.56
Table 8: Precision, Recall, F1-score and AP of panel detector for different IoU on test set of Plant_Campania. The absolute number of TP, FP and FN is also reported.
Figure 16: Examples of detections (red boxes) made by the panel detector on test images of Plant_Campania. (a), (b), (c): TPs; (d), (e): FP; (f): FN and FP (in sky blue the GTBs).

4.4.2 Defect detector and multi-stage model

In Fig. 17(a)-(c) the PREC, REC and AP are shown as a function of IoU, whereas the numerical values @0.5 are reported in Tab. 9. Metrics for classes bird_dropping, raised_panel, delamination, and soiling are flat until IoU= 0.6, in contrast with puddle and strong_soiling, for which degradation starts @0.4 as a consequence of the modest size 0.4% (Fig. 5) and of the ground truths being annotated individually for each instance. The same reason, as well as the different statistics (Tab. 2), may explain the gap in REC@0.5 (PREC@0.5) of almost 12.8% (16.2%) between the similar classes soiling and strong_soiling. In general, the larger the statistics available, the higher the recall achievable. The detector is most sensitive to class puddle, with a REC@0.5 of almost 85.2%, and least sensitive (60.3%) but most precise (93.6%) for class delamination. Handling class imbalance may help to improve performance for minority classes. On average, an IoU of 0.5 is a satisfactory balance point with a mAP roughly equal to 68.5% and a mean F1-score of 77.5%.

We also found, as expected from the different defect sizes already discussed, that accumulation of bird droppings and soiling, as well as raised panels and delamination, are roughly equally predicted by the 13x13 and 26x26 output feature maps, whereas the contribution of the 52x52 output scale becomes relevant for strong soiling deposition and predominant for puddle instances (Fig. 17(d)). In view of a possible on-board integration of the architecture, and depending on which defects are of most interest, YOLOv3 can therefore be simplified by removing the output scales that do not contribute to their detection.

Figure 17: (a) Precision, (b) Recall and (c) AP of defect detector as a function of IoU; (d) AP@0.5 as a function of defect class grouped by YOLOv3 output scale. The PV system under consideration is Plant_Campania.
Class TP FP FN PREC [%] REC [%] F1 [%] AP [%] mAP [%]
puddle(p) 438 93 76 82.48 85.21 83.82 80.57 68.45
bird_dropping(bd) 382 84 163 81.97 70.09 75.56 66.72
raised_panel(rp) 63 12 29 84.00 68.47 75.44 66.51
delamination(d) 44 3 29 93.61 60.27 73.33 57.91
strong_soiling(ss) 59 21 27 73.75 68.60 71.08 60.01
soiling(s) 153 17 35 90.00 81.38 85.47 78.99
Table 9: Evaluation metrics @0.5 on test set of defect detector for Plant_Campania. The absolute number of TP, FP and FN is also reported.
Figure 18: (a) TP (sky blue boxes), (b) FP (highlighted in red) and (c) FN (highlighted in black) for class bird_dropping (bd) of Plant_Campania.
Figure 19: (a)-(c): some occurrences of delamination; (d)-(e) TP (yellow boxes), (f) FP (highlighted in red) and (g) FN (highlighted in black) for class delamination (d) of Plant_Campania. FP and FN of classes soiling, bird_dropping and raised_panel are also included.
Figure 20: (a) TP (blue boxes), (b) FP (highlighted in red) and (c) FN (highlighted in black) for class raised_panel of Plant_Campania. FP and FN of classes bird_dropping and delamination are also included.
Figure 21: (a)-(b) TP (red boxes), (c) FP (multiple detections of the same instance) and (d) FN (highlighted in black) for class strong_soiling of Plant_Campania. FP and FN of classes soiling, bird_dropping, raised_panel and puddle are also included. (e)-(g): PDF of the normalized area of TP (e), FN (f) and FP (g) instances. The mean area value is also superimposed for convenience as a red dashed line.
Figure 22: (a)-(b): PDF (a) and CDF (b) of the predicted (red) and ground truth (green) soiling coverage; (c)-(d): predicted (red) and ground truth (green) normalized soiling area (c) and panel area (d). The PV system under consideration is Plant_Campania.
Figure 23: (a) TP (green boxes), (b) FP (highlighted in red) and (c) FN (highlighted in black) for class soiling of Plant_Campania. FP and FN of classes soiling, bird_dropping and raised_panel are also shown.
Figure 24: (a)-(b) TP (violet boxes), (c) FP (highlighted in red) and (d) FN (highlighted in black) for class puddle of Plant_Campania. FP and FN of class bird_dropping are also included.

The model is effective in detecting sparse accumulation of bird droppings (Fig. 18a), whereas misdetections often appear on panels affected by early delamination on single PV cells (Fig. 18b) or by dust deposition (Figures 19g, 20a). Missed detections occur instead for very incipient deposition of bird droppings, whose impact on power loss should however be negligible (Figures 18c, 20b, 21b, 24d).

Class delamination has few instances corresponding roughly to 5.3% of the overall dataset (Tab. 2). More critically, such instances can differ in shape and colour. Indeed they can be in an advanced status, affecting the entire cell and being visible as a square spot (Fig. 19a), or also extending over multiple cells (Fig. 19c). Delamination may be also at an early stage, characterized by an irregular shape (Fig. 19b). The color may be variable as well, changing from grey (Fig. 19b) to rusted once the degradation becomes more severe (Fig. 19a,c). While the model can detect anomalies of all the mentioned types (Fig. 19d-e), such heterogeneity can justify the not excellent REC@0.5 of almost 60.3%, due in some circumstances to misdetections as strong_soiling because of similar colour and shape (Fig. 19g), or simultaneous occurrence of other anomalies (e.g. raised_panel, Fig. 20a) which degrades the quality of meaningful features extracted by YOLOv3. Nevertheless, the precision is close to 94% with very few FPs (Fig. 19f).

Concerning raised panels, the model more often misses panels affected by concurrent defects, including soiling (Fig. 20c), which the detector recognized as the main cause of degradation, or by incipient raising (Figures 19g, 21d). The model is also sometimes confused by the sun’s glare, which produces bright ripples (Figures 20b, 21b), causing a misdetection rate 1-PREC of almost 16% @0.5.

Class strong_soiling shows the lowest trade-off between PREC and REC, with an F1-score of roughly 71.1%. YOLOv3 is sensitive to such defects as long as the deposition of dust yields spots with a well delimited pattern (Fig. 21a-b), whereas it fails when the defect contour is smooth (Fig. 21d). We also found that false positives have a mean area of almost 2% of the whole image (Fig. 21g), i.e. roughly four times larger than true positives (Fig. 21e), and are more likely to appear as multiple detections of the same defect that are not suppressed by the NMS algorithm because of their modest overlap (Fig. 21c).

We also verified that the model still tends to overestimate the soiling deposition (Fig. 22a), causing a longer tail of the estimated PDF with respect to the ground truth one and a consequent underestimation of the Cumulative Distribution Function (CDF), quantifiable in a Kolmogorov-Smirnov index Espinar 09 exceeding 0.2 (Fig. 22b). This behaviour is mainly due to an over-forecast of the soiling area (Fig. 22c), rather than an under-forecast of the panel area (Fig. 22d).

Class soiling has the highest F1-score @0.5, almost 85.5%: misdetections usually correspond to dust deposition lower than expected (Figures 23b, 19e), whereas missed detections occur more often for elongated panel stripes (Figures 23c, 19g, 21b-d), for which the training statistics were poor because the drone flight over such installations was conducted after it had rained.

It is worth noting that, irrespective of their smallest size (Fig. 5), YOLOv3 also performs well for class puddle, with the best AP@0.5 of about 80.6%, detecting occurrences of different dimension and shape (Fig. 24a-b) as a consequence of the multi-scale detection by the FPN. Notably, YOLOv3 also correctly learns the context around the puddle thanks to the large receptive field of the network achieved by the downsampling layers. Indeed we found that only 4 misdetections out of 93 are located outside the panel area, whereas some others are produced by the shadows of neighbouring rooftop installations (Fig. 24c).

The false alarm filter further improves the precision of the overall model by removing the four misdetections of class puddle mentioned above, without affecting the performance of the other classes. The balanced precision (83.1%) and recall (85.2%) achieved show that the multi-stage detector can discriminate well between artifacts and real puddle instances (Tab. 10).

Model TP FP FN PREC [%] REC [%] F1 [%] AP [%]
defect 438 93 76 82.48 85.21 83.82 80.57
multi-stage 438 89 76 83.11 85.21 84.14 80.59
Table 10: Evaluation metrics @0.5 on test set for class puddle of Plant_Campania. The absolute number of TP, FP and FN is also reported. Results for both defect detector and multi-stage model are presented.

5 Conclusion

Driven by the demand to lower the cost of O&M in order to maximize plant revenue, and by rapidly evolving enabling technologies, most notably UAV devices and artificial intelligence, automatic detection of anomalies in PV panels based on deep learning algorithms is becoming a hot research topic. This work proposes one of the first UAV-based inspection systems based on a multi-stage architecture built on top of the YOLOv3 network. Its distinctive key features include its applicability to both thermal and visible images with very modest customization, and its "Plug and Play" nature, i.e. its fast portability to a fleet of PV systems of increasing size and different panel technologies.

To demonstrate its effectiveness, we have presented its performance on two large PV plants in southern Italy, one ground-mounted and one roof-mounted, with a nominal capacity of tens of MW each, for the detection of hotspots and other anomalies such as soiling, bird dropping and delamination, as well as potential issues not yet discussed in the literature, such as the presence of puddles or panels unglued from the base. In particular, unlike traditional on-site O&M inspection, drone surveys held just after rainfall made it possible to identify stagnant water (puddles), which may turn into severe soiling deposition once evaporated or cause moisture ingress into the PV module through possible micro-cracks or delamination points. In case of severe soiling, we also return the degree of pollution deposition, based on the predicted defect and panel areas, and its location.

Results are promising and encourage further research on this topic: panels are detected with an outstanding AP@0.5 of 98% or more for both PV systems, whereas for IR images collected on Plant_Sicilia (9 MW) hotspots are identified with an AP@0.4 (AP@0.5) of roughly 88.3% (66.9%). Comparable performance was reported in Herraiz 20, although for a much smaller PV system (100 kW). For visible images captured on Plant_Campania (21 MW), defects are detected with an AP@0.5 ranging from 57.9% for delamination up to 79.0% for soiling and 80.6% for puddle, with an average mAP@0.5 of 68.5%. We also found that the model still tends to overestimate the percentage of panel area affected by soiling, essentially because of a positive bias in the predicted soiling area. The multi-stage architecture is also effective in removing artifacts that the defect detector identifies on the background.
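The idea of combining panel and defect detections to discard artifacts on the background can be sketched as follows. The acceptance rule used here (keep a defect only if at least a given fraction of its box lies inside some detected panel box) and the threshold `min_inside` are assumptions made for illustration; they are not taken from this section.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def inside_fraction(defect: Box, panel: Box) -> float:
    """Fraction of the defect box area that lies inside the panel box."""
    width = min(defect[2], panel[2]) - max(defect[0], panel[0])
    height = min(defect[3], panel[3]) - max(defect[1], panel[1])
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not intersect
    defect_area = (defect[2] - defect[0]) * (defect[3] - defect[1])
    if defect_area <= 0:
        return 0.0
    return (width * height) / defect_area


def filter_false_alarms(
    defects: List[Box], panels: List[Box], min_inside: float = 0.5
) -> List[Box]:
    """Keep only defects that lie sufficiently inside a detected panel."""
    return [
        defect
        for defect in defects
        if any(inside_fraction(defect, panel) >= min_inside for panel in panels)
    ]


# Hypothetical detections: the second defect falls on the background.
panels = [(0.0, 0.0, 100.0, 50.0)]
defects = [(10.0, 10.0, 30.0, 30.0), (150.0, 10.0, 170.0, 30.0)]
kept = filter_false_alarms(defects, panels)
print(kept)
```

With this rule, a detection on the ground or on the rooftop surface is dropped because no panel box contains it, while detections on panels pass through unchanged.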

Nevertheless, further research is still needed. When providing O&M services over a very large PV plant extending over more than 100 ha, image acquisition must be planned cost-effectively. In this scenario, drone hovering for static image acquisition of a few PV modules at a time, as in Herraiz 20, is not sustainable, and a continuous-acquisition flight plan must be adopted. Under such conditions, the final defect localization error is influenced by the GNSS vertical and horizontal errors, the gimbal pointing error, and the time shift between metadata and image registration, which is related to flight speed. To reduce it, we propose as future work to replace the standard drone GPS with more accurate GNSS-RTK receivers (whose typical horizontal error is smaller than that of standard GPS), and to pre-process images with orthomosaic techniques, which would mitigate the time-shift error. The combination of both error-reduction strategies should allow defect localization at cell level. It would also bring an immediate benefit in terms of O&M cost reduction, sparing the O&M operator the time and specialized equipment needed to repeat the on-site inspection with handheld devices in order to identify the correct target among adjacent PV panels within the localization error radius. The adoption of orthomosaics, which requires handling high-resolution images, may also bring further benefits worth investigating: the same rotation is applied simultaneously to all the images of a given PV plant, which could reduce rotation discrepancies. Furthermore, RGB and IR mosaics can then be easily overlaid, potentially enabling new kinds of data fusion analysis.
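To give a feeling for how the error sources listed above might combine, the following sketch builds a simple horizontal error budget. It is a rough illustration under strong assumptions of our own: a nadir-pointing camera, statistically independent error sources combined in quadrature, and the GNSS vertical error neglected. All numerical values are arbitrary placeholders, not figures from this work.

```python
import math


def ground_error_from_gimbal(altitude_m: float, pointing_error_deg: float) -> float:
    """Ground displacement caused by a gimbal pointing error.

    Assumes a nadir-pointing camera at the given height above the panels.
    """
    return altitude_m * math.tan(math.radians(pointing_error_deg))


def ground_error_from_time_shift(speed_m_s: float, time_shift_s: float) -> float:
    """Distance flown during the delay between metadata and image capture."""
    return speed_m_s * time_shift_s


def combined_horizontal_error(*components_m: float) -> float:
    """Root-sum-square of error components assumed to be independent."""
    return math.sqrt(sum(component ** 2 for component in components_m))


# Placeholder values chosen only to exercise the functions.
gnss_m = 2.0
gimbal_m = ground_error_from_gimbal(altitude_m=30.0, pointing_error_deg=1.0)
time_shift_m = ground_error_from_time_shift(speed_m_s=5.0, time_shift_s=0.2)
total_m = combined_horizontal_error(gnss_m, gimbal_m, time_shift_m)

print(f"gimbal: {gimbal_m:.2f} m, time shift: {time_shift_m:.2f} m, "
      f"total: {total_m:.2f} m")
```

In a budget of this form the largest component dominates the total, which is why reducing only one of the error sources may not be enough to reach a given localization target.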

In addition, while the model identifies panels affected by soiling, in the case of sparse deposition it returns no information about the soiling distribution, which directly affects the power loss, nor about the impact of the identified anomalies on module efficiency. Such information could be exploited by the plant operator to schedule more intelligent maintenance operations such as module cleaning, repair or replacement.

Finally, the model can currently post-process automatically batches of images captured on site and transferred to a standalone computer, whereas real-time processing directly on board the drone needs further research to shrink the YOLO architecture so as to satisfy the constraints imposed by the embedded hardware in terms of size, speed and accuracy. Preliminary simulations aimed at verifying the impact of YOLOv3's output scales on accuracy reveal a strong coupling with the defect type to be detected, suggesting that the YOLOv3 architecture could be streamlined by removing the outputs that are less effective for defect detection.
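To illustrate what removing an output scale means in terms of workload, the sketch below counts the candidate boxes that a standard YOLOv3 head predicts at each of its three scales (strides 32, 16 and 8, with three anchors per grid cell). The input size of 416 pixels is an assumption made for the example, and the sketch says nothing about which scale matters for which defect type.

```python
# Standard YOLOv3 head: three output scales, three anchors per grid cell.
YOLOV3_STRIDES = (32, 16, 8)
ANCHORS_PER_SCALE = 3


def boxes_per_scale(input_size: int = 416) -> dict:
    """Number of candidate boxes predicted at each output stride.

    The input size is an assumed value and must be a multiple of 32.
    """
    if input_size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    return {
        stride: (input_size // stride) ** 2 * ANCHORS_PER_SCALE
        for stride in YOLOV3_STRIDES
    }


counts = boxes_per_scale(416)
total = sum(counts.values())
for stride, count in counts.items():
    grid = 416 // stride
    print(f"stride {stride:2d}: grid {grid}x{grid}, "
          f"{count} boxes ({100 * count / total:.1f}%)")
```

Most candidate boxes come from the finest grid, so the saving obtained by dropping an output depends strongly on which scale is removed.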

Author Contributions

Conceptualization, A.B.; methodology, A.B. and A.D.T.; software, A.D.T. and A.B.; validation, A.D.T. and A.B.; UAV-based data acquisition, G.F.; data curation, A.D.T., B.M. and G.F.; writing—original draft preparation, A.B. and A.D.T.; writing—review and editing, A.B. and G.F.; visualization, A.D.T. and A.B.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.


This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. We thank the company i-EM srl (https://www.i-em.eu/) for support in the data collection stage.


  • (1) A. Detollenaere, J. Van Wetter, G. Masson, I. Kaizuka, A. Jäger-Waldau and J. Donoso, Snapshot of Global PV Markets 2020, PVPS Task 1 Strategic PV Analysis and Outreach, IEA PVPS Task 1, doi: 10.13140/RG.2.2.24096.74248, 2020
  • (2) S. Chandel, M. N. Naik, V. Sharma and R. Chandel, Degradation analysis of 28 year field exposed mono-c-Si photovoltaic modules of a direct coupled solar water pumping system in western Himalayan region of India, Renew. Energy, Vol. 78, pp. 193–202, 2015
  • (3) A. Diez-Suárez, D. Calzada-Lorenzo, A. González-Martínez, L. Prado, A. Gil de la Puente, J. Blanes and M. Simón-Martín, Thin-film PV modules early degradation analysis: a case study on CIGS, Renewable Energy and Power Quality Journal, Vol. 17, pp. 320-326, 2019
  • (4) M. Köntges, S. Kurtz, C. Packard, U. Jahn, K. Berger, K. Kato, T. Friesen, H. Liu, M. Van Iseghem, J. Wohlgemuth, D. Miller, M. Kempe, P. Hacke, F. Reil, N. Bogdanski, W. Herrmann, C. Buerhop, G. Razongles and G. Friesen, Review of Failures of Photovoltaic Modules, Report IEA PVPS Task 13, 2014
  • (5) Hot Spot Heating — PVEducation, https://www.pveducation.org/
  • (6) M. C. Falvo and S. Capparella, Safety issues in PV systems: Design choices for a secure fault detection and for preventing fire risk, Case Studies in Fire Safety, Elsevier, Vol. 3, pp. 1-16, 2015
  • (7) A. Betti, M. Tucci, E. Crisostomi, A. Piazzi, S. Barmada and D. Thomopulos, Fault Prediction and Early-Detection in Large PV Power Plants Based on Self-Organizing Maps, Sensors, Vol. 21, pp. 1687, 2021
  • (8) A. Petraglia and V. Nardone, Electroluminescence in photovoltaic cell, Physics Education, IOP Publishing, 2011
  • (9) P. B. Quater, F. Grimaccia, S. Leva, M. Mussetta and M. Aghaei, Light Unmanned Aerial Vehicles (UAVs) for Cooperative Inspection of PV Plants, IEEE Journal of Photovoltaics, Vol. 4, No. 4, pp. 1107-1113, 2014
  • (10) Y. Zefri, A. ElKettani, I. Sebari and S. Ait Lamallam, Thermal Infrared and Visual Inspection of Photovoltaic Installations by UAV Photogrammetry—Application Case: Morocco, Drones, Vol. 2, No. 4, pp. 41, 2018
  • (11) C. Buerhop, D. Schlegel, M. Niess, C. Vodermayer, R. Weißmann and C. Brabec, Reliability of IR-imaging of PV-plants under operating conditions, Solar Energy Materials and Solar Cells, Vol. 107, pp. 154–164, 2012
  • (12) G. Leotta, P. M. Pugliatti, A. Di Stefano, F. Aleo and F. Bizzarri, Post Processing Technique for Thermo-Graphic Images Provided by Drone Inspections, EUPVSEC 2015, pp. 1799-1803, 2015
  • (13) M. Aghaei, F. Grimaccia, C. A. Gonano and S. Leva, Innovative Automated Control System for PV Fields Inspection and Remote Control, IEEE Transactions on Industrial Electronics, Vol. 62, No. 11, pp. 7287-7296, 2015
  • (14) A. Arenella, A. Greco, A. Saggese and M. Vento, Real Time Fault Detection in Photovoltaic Cells by Cameras on Drones, International Conference on Image Analysis and Recognition, pp. 617-625, 2017
  • (15) V. S. B. Kurukuru, A. Haque, M. A. Khan and A. K. Tripathy, Fault classification for Photovoltaic Modules Using Thermography and Machine Learning Techniques, 2019 International Conference on Computer and Information Sciences (ICCIS), pp. 1-6, 2019
  • (16) A. M. Salazar and E. Q. B. Macabebe, Hotspots Detection in Photovoltaic Modules Using Infrared Thermography, MATEC Web Conf., 70, 10015, 2016
  • (17) S. Deitsch, V. Christlein, S. Berger, C. Buerhop-Lutz, A. Maier, F. Gallwitz and C. Riess, Automatic Classification of Defective Photovoltaic Module Cells in Electroluminescence Images, Solar Energy, Vol. 185, pp. 455-468, 2019
  • (18) A.K. Vidal de Oliveira, M. Aghaei and R. Rüther, Automatic Fault Detection of Photovoltaic Arrays by Convolutional Neural Networks During Aerial Infrared Thermography, 36th European Photovoltaic Solar Energy Conference and Exhibition, pp. 1302 - 1307, 2019
  • (19) P. C. Hwang, C. C. -Y. Ku and J. C. -C. Chan, Soiling Detection for Photovoltaic Modules Based on an Intelligent Method with Image Processing, 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), pp. 1-2, 2020
  • (20) W. K. Yap, R. Galet and K. C. Yeo, Quantitative Analysis of Dust and Soiling on Solar PV Panels in the Tropics Utilizing Image-Processing Methods, Asia Pacific Solar Research Conference, ANU, Canberra, Australia, 2015
  • (21) M. Unluturk, A. A. Kulaksiz and A. Unluturk, Image Processing-based Assessment of Dust Accumulation on Photovoltaic Modules, 2019 1st Global Power, Energy and Communication Conference (GPECOM), pp. 308-311, 2019
  • (22) S. Mehta, A. P. Azad, S. A. Chemmengath, V. Raykar and S. Kalyanaraman, DeepSolarEye: Power Loss Prediction and Weakly Supervised Soiling Localization via Fully Convolutional Networks for Solar Panels, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 333-342, 2018
  • (23) R. Pierdicca, E. S. Malinverni, F. Piccinini, M. Paolanti, A. Felicetti and P. Zingaretti, Deep Convolutional neural Network for automatic detection of damaged photovoltaic cells, Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., Vol. XLII-2, pp. 893–900, 2018
  • (24) Á. H. Herraiz, A. P. Marugán and F. P. G. Márquez, Photovoltaic plant condition monitoring using thermal images analysis by convolutional neural network-based structure, Renewable Energy, Vol. 153, pp. 334-348, 2020
  • (25) A. K. Ashok, C. G, A. Bhat, K. Karnataki and G. Shankar, Automatic Inspection of Utility Scale Solar Power Plants using Deep Learning, https://arxiv.org/abs/1902.04132, 2018
  • (26) R. Pierdicca, M. Paolanti, A. Felicetti, F. Piccinini and P. Zingaretti, Automatic Faults Detection of Photovoltaic Farms: solAIr, a Deep Learning-Based System for Thermal Images, Energies (MDPI), Vol. 13, pp. 6496, 2020
  • (27) J. Redmon, S. Divvala, R. Girshick and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016
  • (28) J. Redmon and A. Farhadi, YOLOv3: An Incremental Improvement, https://arxiv.org/abs/1804.02767, 2018
  • (29) A. Gaur and G. Tiwari, Performance of a-Si thin film PV modules with and without water flow: An experimental validation, Applied Energy, Vol. 128, pp. 184–191, 2014
  • (30) K. He, X. Zhang, S. Ren and J. Sun, Deep Residual Learning for Image Recognition, https://arxiv.org/abs/1512.03385, 2015
  • (31) T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan and S. Belongie, Feature pyramid networks for object detection, IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, pp. 936-944, 2016
  • (32) W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu and A. C. Berg, SSD: Single Shot MultiBox Detector, Computer Vision – ECCV 2016, Lecture Notes in Computer Science, Springer International Publishing, Vol. 9905, 2016
  • (33) S. Ren, K. He, R. Girshick and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, https://arxiv.org/abs/1506.01497, 2015
  • (34) J. Canny, A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, pp. 679-698, 1986
  • (35) A. S. Hassanein, S. Mohammad, M. Sameer and M. E. Ragab, A Survey on Hough Transform, Theory, Techniques and Applications, IJCSI International Journal of Computer Science Issues, Vol. 12, No. 2, pp. 139-156, 2015
  • (36) https://github.com/tzutalin/labelImg
  • (37) T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar and C. L. Zitnick, Microsoft COCO: Common Objects in Context, Computer Vision - ECCV 2014. Lecture Notes In Computer Science (Springer), Vol. 8693, 2014
  • (38) A. Casado-Garcia and J. Heras, Ensemble Methods for Object Detection, ECAI 2020, Vol. 325, pp. 2688-2695, 2020
  • (39) J. Redmon and A. Farhadi, YOLO9000: Better, Faster, Stronger, https://arxiv.org/abs/1612.08242, 2016
  • (40) R. Nainggolan, R. Perangin-angin, E. Simarmata and F. A. Tarigan, Improve the Performance of the K-Means Cluster Using the Sum of Squared Error (SSE) optimized by using the Elbow Method, J. Phys: Conf. Ser., Vol. 1361, pp. 012015, 2019
  • (41) P. Sudheera, V. R. Sajja, S. D. Kumar and N. G. Rao, Detection of Dental Plaque using Enhanced K-Means and Silhouette Methods, International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), pp. 658-662, 2016
  • (42) M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, The PASCAL Visual Object Classes Challenge: A Retrospective, Int. J. Comput. Vis., Vol. 111, pp. 98–136, 2015
  • (43) B. Espinar, L. Ramirez, A. Drews, H. G. Beyer, L. F. Zarzalejo, J. Polo and L. Martin, Analysis of different comparison parameters applied to solar radiation data from satellite and German radiometric stations, Solar Energy, Vol. 83, No. 1, pp. 118-125, 2009