An Unsupervised Method for Estimating the Global Horizontal Irradiance from Photovoltaic Power Measurements

06/21/2017 ∙ by Lorenzo Nespoli, et al. ∙ EPFL 0

In this paper, we present a method to determine the global horizontal irradiance (GHI) from the power measurements of one or more PV systems, located in the same neighborhood. The method is completely unsupervised and is based on a physical model of a PV plant. The precise assessment of solar irradiance is pivotal for the forecast of the electric power generated by photovoltaic (PV) plants. However, on-ground measurements are expensive and are generally not performed for small and medium-sized PV plants. Satellite-based services represent a valid alternative to on site measurements, but their space-time resolution is limited. Results from two case studies located in Switzerland are presented. The performance of the proposed method at assessing GHI is compared with that of free and commercial satellite services. Our results show that the presented method is generally better than satellite-based services, especially at high temporal resolutions.



There are no comments yet.


page 5

page 7

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

1.1 Motivation

Solar power generation, both at utility and residential level, will play a central role in the future of the electric power industry, with a predicted installed power ranging from 4.3 to 14.8TW by 2050 mayer2015current . Although this trend is certainly to be welcomed, unless countermeasures are taken the intermittent nature of solar generation could lead to stability issues in the electrical grid eftekharnejad2013impact . In the distribution grid, these problems will be further emphasized by the increase of electricity consumption driven by the electrification of heat generation and mobility IEA2016 , which will further increase the amplitude of the power and voltage fluctuations richardson2010impact .
Fortunately, in the meanwhile smart grid solutions that help to overcome the above-mentioned issues by modulating generation and demand are becoming available and affordable. For example, distributed energy storage systems bahramipanah2016decentralized and demand side management gelazanskas2014demand , Kim2013 can be exploited for the active control of distribution networks.

To optimize the control and guarantee the quality-of-service in the electricity grid, it is important to predict the power flows accurately, as this favors a sensible management of the available flexibility. It has been shown that forecasting accuracy can be improved when the production and consumption in the grid are disaggregated and predicted separately Monforte , Shaker2014 . The disaggregation of solar generation from the total grid load can be achieved by using on ground irradiance measurements Sossanb . In general, global horizontal irradiance (GHI) measurements are often used as exogenous input when performing both long term and short term PV production forecasts lorenz2009irradiance , Inman2013 . Unfortunately, accurate on ground irradiance measurements are often not available. Although irradiance measurements can also be used for the online estimation of PV power production and for fault detection, sensors are usually not installed for small and medium-sized plants, due to their high cost. If the irradiance is not measured directly by means of on-ground measurements, satellite estimations can be exploited. Satellite-based radiation assessment services provide an estimate of the time course of GHI for a given location, but their spatio-temporal resolution is constrained by technical limitations. Most of these services are based on the images acquired by the Meteosat 2nd generation satellites, which have a spatial resolution of 3 km at the nadir and a temporal resolution of 15 minutes Mecikalski2007 . These coarse resolutions limit the performance of satellite-based nowcasting methods. Moreover, the limited spatial resolution has a smoothing effect that can result in reduced accuracy levels for GHI estimation at a specific location, especially in the presence of local clouds. The active control of distribution networks, of which some of the critical sections can take up a small area, requires a more accurate and fast estimate of GHI. Satellite-based methods could also profit from an increased availability of on-ground GHI measurements, as they could be used for calibration Vernay2013 , Mieslinger2014 , Lorenzo2017 , a technique also known as site adaptation.
In this study, we investigate the possibility of using local PV power measurements to estimate GHI with a high temporal and spatial accuracy. Being able to estimate GHI directly from PV power measurements will allow to better estimate and forecast the PV production of an entire neighborhood by monitoring the power output of only a small fraction of the PV plants, without the need to install additional irradiance sensors.
For the development of the proposed method, we focused on accessibility and simplicity. Indeed, the method is fully unsupervised and the only inputs it requires are the measurements of the AC power output of the monitored PV plants, the ambient temperature and the geographical coordinates of the neighborhood.

1.2 Previous work

The idea of using PV plants as surrogated irradiation sensors has already been researched in the past. In Laudani2016 , voltage and current measurements from a PV module were used to calculate the incoming radiation on the plane of array. In Kara2016 , it is suggested that a nearby PV plant can be used as a proxy to estimate the power generated by another PV installation, even if only a linear relationship is considered between the proxy and the estimated output. In Traxler , proximate PV plants are used to predict the PV outputs of other PV installations, for the purposes of automatic fault detection. In Engerer2014 , a clear sky index, , is introduced: this is the ratio between the AC power of a simulated PV plant under clear sky conditions and the actual power measurements. The authors use a clear-sky radiation model, a transposition model and an inverter and PV module model. This method relies on an accurate, technical description of the PV system, which includes the PV module orientation. In Killinger2016 , a methodology is proposed to project power generation between different PV systems. The PV system power output is modeled as a quadratic function of the solar irradiance on the plane of array (POA) and the ambient temperature. The five coefficients of the curve must be fitted for each type of PV module technology. The POA irradiance is obtained by inverting the quadratic expression. The GHI is then estimated from the POA irradiance, using an iterative procedure. In their discussion, the authors suggest that simultaneously considering PV systems with multiple orientations could increase the accuracy of the GHI estimation. In Marion2017 , a similar methodology is presented for obtaining GHI from PV power measurements. Building on the work of Killinger2016 , a correction for low angle of incidence and wind speed is considered. The AOI correction is based on 18 coefficients, specific to the type of PV module coating considered. In all the aforementioned studies, namely Engerer2014 , Killinger2016 , Marion2017 , it is assumed that the inverter type, PV module type and PV module orientation are known. Furthermore, it is assumed that all the modules of a given PV plant have the same orientation.

1.3 Outline and objective

Despite the increasing number of PV installations and the abundance of available monitoring data, it is difficult to use them to estimate the GHI signal. Considering the works previously cited in subsection 1.2, we can identify three main causes that make this task particularly challenging:

  1. Most of the estimation methods in the literature require a detailed description of the PV systems, including PV power plant nominal power, fields orientations, module and inverter types. Gathering all these metadata is difficult and time consuming. Moreover, when available, the data contained in databases could be imprecise or flawed Killinger2017 . This could lead to erroneous estimations.

  2. Occasionally, PV power plants can be composed of different PV fields, each field having different nominal power and orientations. A typical case is a PV power plant with an east-west configuration, but more complex configurations are possible, as presented later in the paper. In this cases, if we possess only a single power signal, we should be able to retrieve nominal powers and orientations of an arbitrary number of PV fields to correctly estimate GHI.

  3. If more than a power signal is available for a given geographical location, an automated procedure is needed to reconcile all the measurements and efficiently make use of all the signals.

We present here a fully unsupervised method for estimating the local GHI using only the power measurements from one or more PV plants, without the need to know their nominal power and module orientations. The problem of identifying PV plants with differently oriented modules from a single power signal is addressed by means of a robust regression. In order to increase the accuracy of the GHI estimation, the method can exploit multiple power signals from different PV plants. The different signals are reconciled by means of outlier detection and by determining the shading patterns of each PV plant. The code related to the GHI estimation method, including the PV system identification methodology, is freely available online (see Section 7).

The paper is structured as follows: Section 2 describes the methodology used for estimating the GHI. Section 3 discusses the issue of how to identify the PV plant orientations without knowing the actual GHI. Section 4 discusses the models used to obtain the PV power production proxies. Section 5 briefly discusses the numerical methods for solving the GHI estimation problem. In Section 6, the accuracy of the method is assessed for two case studies, and compared to satellite-based GHI estimations and secondary standard pyranometer measurements. Finally, conclusions are presented in Section 7.

2 Methodology

The combined effect of irradiance and ambient temperature on the PV power production is well-known. Accurate empirical models that assess the total incoming irradiation on an oriented surface, given the GHI, are also available yang2016 , Loutzenhiser2007 . We can therefore build a function that links the GHI to a given PV plant power output:


where is the estimate of the power generated by a given PV plant, where is the number of time steps in the data, and are the vectors of the observed GHI and temperatures at times , and are the vectors containing the tilts and azimuths of the modules, is a vector containing the geographical coordinates of the plant, namely latitude, longitude and elevation, and is the vector of the nominal powers of the modules. Function is described later by equations 7-16 and by the empirical disc model, as stated in Section 4. If the module orientations and nominal powers were known, could be inverted in order to estimate GHI. Unfortunately, is not always invertible, especially when comes from a PV plant with differently oriented PV modules. So, for different values of GHI, the function could return the same output . In this case, a method is required in order to choose the correct GHI value from a range of possible choices. This problem is solved by following two steps. First, we estimate the panel orientation from the measured power of the given PV plant. We then use the calculated orientations to build function and solve


without inverting . Here refers to the optimized values of .
Equation 2 can be solved by using one or more PV power signals and can therefore be easily reformulated as:


where is the total number of PV power signals and and are the observed and estimated power signals normalized with the estimated nominal power (see Section 3 for how this is estimated). The main drawback to this formula is that, when estimating , all the PV signals are equally weighted. This solution is not robust in the event of shadows or faulty signals. Equation 3 can be improved in two ways:

  1. Faulty signals can be statistically detected, to avoid using them for estimating

  2. When the modules are partially or completely shaded, the GHI estimation is not accurate. This effect can be mitigated by building a map of the error , as a function of the sun position (azimuth and elevation). This map can then be used to evaluate how much a certain PV measurement can be trusted as a function of sun position. For the rest of the paper, we will refer to this map as the trust function.

The above considerations lead to the more general formula:


where is an indicator function detecting the presence of outliers, is the estimation error matrix, is the trust function and and are, respectively, the azimuth and zenith of the sun. Here can be interpreted as a dynamic weight function, since and are function of time. The role of the trust function is to attach less importance to the calculation of the ith signal if this is believed to be affected by shadows with a particular position of the sun. The construction of the and functions is described in Section 5.

3 Orientation assessment

PV plant orientation could theoretically be estimated from the PV plant power measurements, and from the GHI measurements, by means of the equations 7, 8, 9 and 10. The orientation estimation can be formulated as the following optimization problem:


where is the PV production estimated from the GHI signal. The following aspects must be taken into account:

  1. We want to estimate PV plant orientation without knowing the actual GHI seen by the modules

  2. PV plants can consist of groups of modules with different orientations, e.g. plants with an east-west configuration

  3. The presence of shadows affects the relationships between the GHI projection on an oriented surface and the PV power output

  4. Problem 5 is non-linear and non-convex

If the GHI seen by the modules is unknown, estimating their orientation would result in a blind identification problem Wills2011 , Ohlsson2014 . We exploit the fact that we can obtain a good approximation of GHI for clear-sky periods, using a model for the extra -terrestrial irradiation and for the air mass index. In this paper, we used time series obtained from the Soda-pro CAMS McClear service111, which uses the McClear clear sky model Lef2013 .
We can thus identify the PV plant orientations if we can select clear-sky periods, using only the PV plant power output. Different methods can be used to exploit PV power signals in order to identify clear-sky radiation periods. In Lonij2012 , a period is considered to be clear if the measured PV power is higher than the 80 percentile of the set of measurements taken at the same time of day, during the previous 15 days. In Killinger2017 this method is combined with the clear-sky detection routine described in Reno2016 , which uses GHI observation as input and a set of 5 extraction parameters. In this paper, we first developed a selection based on the smoothed power signals: the power output of each PV plant is filtered using a second order low-pass Butterworth filter Butterworth1930 . We then considered a period to be clear if the root mean squared relative error between the original and filtered signal was lower than a threshold value.

The main drawback of this method is that its performance is influenced by three parameters, namely the threshold value, the low-cut frequency period and the length of the period, which have to be tuned. Moreover, in the event of PV curtailment, these curtailment periods can be identified as clear periods.

In order to overcome these issues, we developed a different method. PV power signal distribution as a function of the sun position is typically bimodal, due to the presence of clouds during data acquisition. On the other hand, a unimodal distribution could indicate a systematic shadow for the corresponding sun position. In order to select clear data periods, we fit a gaussian mixture probability density function with two components

for each sun position, with a discretization of 5

. Then, for each sun position we identify the observations lying in the one sigma interval of the gaussian distribution with the largest

as clear observations. We chose to discard other values since they could have been potentially caused by cloud enhancement events (higher power) or by the presence of haze or high clouds (lower power). An example of a PV power distribution for a particular sun position is depicted in figure 1.
Despite the second model being more robust in terms of selection of clear sky periods, the task of predicting the GHI seen by PV panels through a clear sky model presents some intrinsic errors. Clear sky models are not perfect and it is not possible to guarantee that the GHI seen by the PV panels is exactly the one predicted by the clear sky models. In order to overcome this problem and the others referred to above, instead of directly solving the optimization problem 5

, we reformulated it as a robust linear regression:

Figure 1: Distribution of the power signal of one particular PV power plant of the Biel-Benken case study, for and . The distribution presents a bimodal shape. We fit two gaussian distributions and considered the leftmost peak to be caused by clouds, while we select the data between one standard deviation from the mean of the rightmost gaussian distribution as belonging to clear sky periods. Values outside the one standard deviation interval were discarded, since they could have been potentially generated during cloud enhancement events (higher power) or in the presence of haze or high clouds (lower power).

where is a robust loss function Huber2009 , is the proxy matrix, each proxy being the estimated power produced by a panel with a given orientation, is the coefficient vector for the ith PV installation, and being the number of proxies and total number of temporal observations. The additional requirement forces vector to have all positive values. Thus, we can interpret the components of vector as coefficients describing the significance of each proxy in explaining the power output of the ith PV plant, rescaled for its nominal power. Another interpretation is that non-zero entries of are the estimates of the nominal power of the ith PV plant oriented as its corresponding proxies. Furthermore, problem 6 forces to be sparse and it is robust with respect to the presence of partial shadows.
If we use a loss function of an M-estimator for , we can solve problem 6 using an efficient iterative reweighted least square algorithm Holland1977 . As previously anticipated, we could use more than one PV signal to estimate . In this case, we need to identify a set of coefficients for each of the signals, and some methods to assign different weights to the estimation errors arising from the different PV signals, in order to calculate more accurately.

4 Proxy Model for PV Performance

The proxies are an estimate of the electrical power generated by a panel with a given orientation and . In order to effectively solve problem 5, we need to select the most representative proxy orientations. The tilts and azimuths of the proxies, respectively and , are obtained by generating a triangular mesh of an icosahedron on a unit sphere. This is later refined through subdivisions, in order to increase the number of points. In this paper, the most north-facing orientations are discarded, as shown in figure 6.
The sun azimuth and elevation are calculated based on the current time and the altitude, longitude and latitude of the given location. For this task we have used the pvl_ephemeris.m matlab function from the freely available Sandia National Laboratories PV Collaborative Toolbox Stein2012a , which is based on the 1985 Grover Hughes’ Engineering Astronomy course at Sandia National Laboratories. The direct normal irradiance (DNI) is then calculated by means of the empirical disc model Maxwell1987a . The diffuse horizontal irradiance at time t is then calculated as:


where is the zenith angle of the sun at time t. DHI is then used to estimate the projection of the diffuse radiation on the given surface , using the Hay and Davies’ model Loutzenhiser2007 . The overall radiation on the given surface is then given by the sum of the diffuse, direct and ground-reflected radiation.


where is the ground reflected component, calculated as:


where is the albedo, which was fixed to a typical value of 0.2. The direct irradiation on the oriented surface is obtained from the DNI:


where is the angle of incidence of the oriented surface . To calculate and we used the PV Performance Modeling Toolbox by Sandia National Laboratories Stein2012a . Since reflection losses can significantly increase at large Marion2017 , we applied an correction, independent from the module technology ValinetinSoftware2012 :


where is the angle of incidence modifier. We use the following ASHRAE approximation ASHRAE :


and is 0.05.

Finally, in order to obtain a proxy for the electrical power produced by a field with the ith orientation, we apply a correction taking into account the ambient temperature and the inverter and module efficiencies. The cell temperature is first estimated from the ambient temperature, then a linear correction is applied Skoplaki2009 :


a reference temperature, and two coefficients. In this study, and are not estimated and are set respectively to the values of 3.14e-2 and -4.3e-3 , which represent crystalline silicon framed PV modules. Finally, the proxies are corrected for the module and inverter efficiencies, using the following equation:


where is the combined module and inverter efficiency. In order to reduce the number of parameters, we modeled it as a function of the irradiance using the following equation:


where W/m is the reference irradiance and are free parameters. By fitting equation 16 to typical inverter and polycrystalline module data, we obtained the following values: , e-2, e-2.

5 Numerical optimization

5.1 Algorithm description

As stated in Section 2, function is not always invertible. For this reason, we solve 2, or 4 if more than one PV signal is available. These minimizations can be naturally decomposed in time: that is, the norm operator can be written as the sum of the objective functions related to a single observation in time. To keep the discussion general, we can restate the left-hand side of equations 2, 3 and 4 as:


where is a placeholder for one of the objective functions defined in equations 2, 3 or 4. The overall objective function can be restated as



is the total number of observations. Derivative-free optimization algorithms such as genetic algorithms, the Nelder Mead’s simplex method and particle swarm optimization algorithms are badly affected by increasing numbers of decision variables

Pham2011 , Rios2013 . General purpose nonlinear solvers usually rely on calculating the objective function derivatives for all the values of the decision function. This means that must be calculated at each step. Even if it is possible to specify a pattern for the Hessian matrix to the trust-region-reflective algorithm in Matlab, which would significantly speed up the optimization, this algorithm requires the analytical gradient for , which we do not possess Yuan2000 .
For this reason, we implemented a solver for our problem. A comparison between our solver and fmincon computational time, when fmincon solves 4 for each time step individually, is shown in table 1. The results are related to 1500 points. The relative difference in the solutions was below 1 with associated standard deviation of 2.02e-1 . Our algorithm took approximately 15.7 minutes to process one year of data with a temporal resolution of 10 minutes, on an Intel Xenon CPU E5-2697 v2 @ 2.70 GHz with 32.0 GB of RAM.

mean ssample std ssample
fmincon 2.32 3.1e-2
our solver 1.8e-2 8.5e-3
Table 1: Computational times comparison

Since our objective function is in the form described in 17, our solver simply minimizes with a steepest-descent solution strategy. Since could present local minima as previously stated in 2, we initialized the solution with a grid search over the possible values of , as shown in the pseudocode 1.
For each time step, we searched in a discrete space of possible values of , uniformly sampled from 0 to


where is the calculated from the clear sky model, and is a safety factor accounting for the fact that particular cloud configurations can increase the measured GHI to above the clear sky values Inman2016 . We use a 30-step discretization for the grid search. Considering a standard irradiation of 1000 W/m2, this would result in an approximate accuracy level of 33.3 W/m2. We obtain a set of guess vectors for , and for each of them we calculate the proxies and assess the hypothetic power produced by each PV plant (line 3 and 4 of algorithm 1), and the PV estimation error matrix (line 5). Then, for each time we find the best guess , which is the one that minimizes the average PV estimation error among the different PV plants (line 8-11 of algorithm 1). Note that the inner minimization of line 9 is inexpensive, since it consists of finding the position of the minimum element of the average of over g, at a given , where refers to the ith PV plant.
Once the initial guess for has been obtained, the solution is refined as shown in the pseudocode 2. Starting from the first guess solution, we once again determine the PV estimation error at iteration , and then calculate the gradient of the objective function introduced in 4 with respect to :




note again that, since the function is time-separable, the only non-zero elements of are those related to observations occurring at the same time-step


For this reason, the resulting tensor can be rewritten in matrix form such that

. Each element of is calculated as


where is the jth proxy at ith observation computed from .

1:for  do grid search initialization
2:      linear rescale
6:end for
7:Find that minimize the mean PV error at each t
8:for  do
11:end for
Algorithm 1 Initialize GHI
1:while  do
4:      PV estimation error
7:     for  do
9:         if  then
12:         end if
13:     end for
15:end while
Algorithm 2 Estimate GHI

Lines 7 to 13 in algorithm 2 describe the update strategy, where is a vector of coefficients describing how much GHI must be shifted in the direction of the objective function gradient, at the iteration. Instead of using a backtracking strategy, which performs a line search on parameter , we applied an exponential decay on , in the attempt to reduce the total number of function evaluations. Since the objective function is not monotone in GHI, at each iteration we check if the mean estimation error has decreased. In this case, is unchanged, otherwise is set to zero (which results in not updating ) and the new decreases by a factor .

5.2 Trust function and outlier detection

We now illustrate the method used to construct the trust function and the outlier detection function . As previously stated in Section 2, weights the PV estimation differently based on the sun position. Greater importance is attached to signals with lower relative estimation errors during clear sky conditions for a given sun azimuth and elevation. Since we do not possess the real , clear sky periods must be estimated. In order to use as much data as possible for shadow detection, a different method from the one introduced in 3 is used. We estimate the relative PV estimation error under clear sky conditions as:


where is the 1quantile, and are the sun azimuth and zenith angles, discretized with a 2 step.

Since the map is assumed to be affected by noise, in order to obtain a more significant representation of the shadow pattern, we fit a gaussian process on top of it. We then apply a threshold to eliminate the lowest values in the map, which could be due to the lack of observed clear sky conditions in the corresponding sun position. An example of the resulting thresholded map for a given PV plant is shown in figure 2.

Figure 2: Thresholded smoothed estimated error as a function of sun position, for a given PV plant. This particular PV plant presents significant shadowing during morning hours

Once the PV estimation error has been established, we use it to map those signals that are more accurately calculated in a given combination of and , through an inverse relation:


Finally, since we do not want the change in to be unbounded , we normalize the obtained distances:


where is the distance of signal at time step . The second strategy for improving the estimation accuracy is to detect outliers in different signals at each time-step . Intuitively, if we possess more than one power signal, we can study the distribution of the various estimation errors and exclude from the objective function those signals that at time-step are labeled as outliers, applying a standard outlier detection method. We used Tukey’s outlier detection method Tukey1977 , which is based on the interquartile range, and which can be applied to non-symmetric data distributions. We can now define the outliers detection function as:


where is the interquartile range: i.e. and is a parameter dependent on the assumed distribution of . We used , which corresponds to considering approximately 1 of the points as outliers, under normal data distribution conditions. When a point is identified as an outlier, it is not used to correct . This is done by setting to zero the corresponding elements of the objective function gradient. Formally:


where are the signal and timestep marked as outliers.

6 Evaluation on case studies

The proposed method has been tested on two case studies with multiple power signals. Both case studies are located in Switzerland and are particularly challenging for estimation since they present:

  • multiple PV orientations (even at single inverter level)

  • different nominal powers for each PV installation

  • significant shading, due to nearby objects and/or horizon profile

  • partial PV curtailment

The first case study is located in Biel-Benken, on the northern Swiss plateau close to the German and French borders. It consists of 4 residential rooftop PV installations with nominal powers ranging from 6.6 to 10.7 kWp. The mean distance between the PV plants and the pyranometer is approximately 150 meters. The PV plants are affected by local shading, due to the presence of chimneys and nearby buildings. One PV plant has two different module orientations (mounted on two folds of the same roof). The effect of the horizon in this region is negligible.
The second case study is located in Lugano, a hilly region in the alpine foothills. In this case the power signals are related to 5 different industrial PV plants, with nominal power ranging from 126 to 275 kWp. Several inverters are installed in each plant, making a total number of 50 inverters. The mean distance between the PV plants and the pyranometer is approximately 300 meters. In this case the horizon is non-negligible, due to the presence of significant topographical relief formations.
For each case study, both on-ground measurements and satellite data are used for performance assessment. At each location, an ISO 9060 secondary standard pyranometer (CMP10 and CMP21, Kipp&Zonen, Delft, The Netherlands) is used as ground-truth reference. In the first case-study, the pyranometer is mounted on one of the roofs hosting the PV installations, while in the second case the pyranometer is located in the SUPSI Trevano campus. The results were also compared against two different satellite-based irradiation models: MACC-RAD and SICCS. MACC-RAD uses the Heliosat-4 method Thomas2016 , while SICCS is based on a Cloud Physical Properties model Greuell2013 . Both models are based on Meteosat satellite images. The MACC-RAD data are freely accessible, while SICCS data are sold by 3E.

The data for the Biel-Benken case study refer to the period from August 1, 2015 to August 1, 2016, with a 1-minute sampling time. Since this case presents a great annual variation in terms of shadow pattern, in order to increase the method accuracy, we repeatedly identified for different time splits of the data. Then, we chose as the one that minimizes the total RMSE on the mean PV estimation error. In particular, table 2 shows the attempted split and the achieved RMSE for the PV estimation error and for the GHI estimation error, reported here only for comparison. The chosen split uses 3 months data folds .

split [days] 365 182 121 73
PV [-] 6.6e-2 5.7e-2 5.64e-2 5.6e-2 6.4e-2
GHI [W/m2] 41.7 34.8 34.4 35
Table 2: Data splits for Biel-Benken identification
Figure 3: Error distribution for the two case studies with sampling period of 10 minutes. Upper part: empirical probability density functions of the normalized error distributions for the estimated signal and for the signal from MACC-RAD and SICCS models. Lower part: empirical cumulated density functions for the absolute relative errors of the estimated signals. Colored lines refers to the signal estimated using single PV plants power signals, the black dashed line refers to the

signal estimated with power signals of all the PV plants. Pale red bands refers to the confidence interval of the pyranometer

The data for the Lugano case study refer to the period from January 1, 2016 to June 1, 2016, with a 10-minute sampling time. In both case studies, equation 2 is solved for each PV plant separately, obtaining 4 and 5 different GHI estimations, respectively. Equation 4 is then solved using all the signals from the different PV plants, in an attempt to improve the GHI estimation.
Figure 3 summarizes the main results of two case studies, for a sampling time of 10 minutes. The top graphs show the normalized error distributions between and the two satellite-based models and between and the solution of problem 4, where is the signal measured by the pyranometers. The normalization is obtained by dividing the error distribution by a constant value , defined as the mean value of non-zero observations:


where is the number of non-zero elements of vector . In both case studies, estimating GHI from on-ground measurements significantly narrows the error distribution. The lower part of figure 3 shows the estimated probability density function for the absolute relative errors. Each line represents the GHI calculated using one single PV plant, while the thick slashed line is related to the GHI calculated using all the PV signals. The red band represents the typical pyranometer level of accuracy. The results suggest that using more than one signal increases the robustness and accuracy of the method. This is partly due to the trust functions and outlier detection function, as better explained in figure 4. As we can see, the normalized root mean squared error decreases when we remove erratic observations from the objective function, and when we use the trust function to weight the signals.
Figure 3

shows that the proposed method has low bias with respect to the satellite-based methods. In order to gain additional insight into the error distributions, we performed a bias-variance error decomposition:

Figure 4: Normalized RMSE for the two test cases, for different aggregation times. The blue and light blue bars represent the RMSE of the MACC-RAD and SICCS models. The green, brown and yellow bars represent the RMSE of the estimated GHI using our method without corrections, with the outliers detection function and with both outliers detection and trust functions, respectively.
Figure 5: Normalized bias and standard deviation of the MACC-RAD and SICCS services and the proposed method (using outlier detection and trust function), for the two test sites. Colored lines are estimated regions containing 25, 50 and 75 of the observations, when considering daily subsets. Filled circles represent normalized squared bias and normalized standard deviation for the whole datasets, while diamonds represent the expected daily normalized squared bias and normalized standard deviation. Grey lines show contour lines for the normalized RMSE.

where is the estimation error, is a given dataset and is the expectation over the dataset . We calculated and using daily datasets. We performed the calculation using the maximum available time resolution, for our method with trust function and outlier detection and for the two satellite-based methods. This procedure generates bivariate distributions in terms of bias and standard deviation. For visualization reasons, instead of showing all the points generated by this daily decomposition, estimations of the regions containing 25, 50 and 75 of the points, respectively, are plotted in figure 5. These estimations were obtained using the kernel density smoother Matlab function ksdensity. Normalized bias and normalized standard deviation for all the observations in the datasets are shown (filled circles). We also show the daily expected values for the normalized bias and normalized standard deviation (diamonds). We can see from figure 5 that, in comparison with other methods, the proposed method has a narrower distribution in terms of bias and standard deviation.

For the Biel-Benken case study, the exact tilt, azimuth and nominal power of the installed PV plants are known. In order to check if these values were estimated correctly, we compared the ground truth with the identified , plotted as a function of azimuth and elevation in figure 6. It can be seen that the non-zero coefficients of are close to those of the real PV plants. In fact, except for the second PV plant, whose PV panels present more than one orientation, the real orientations lie in the convex-hull formed by the non-zero coefficients.

7 Conclusions

In this paper, we present an unsupervised method for estimating global horizontal irradiance from the AC measurements of one or more PV plants, consisting of PV modules of unknown nominal power and orientation. The only inputs to the method are the AC power signals from the PV plants, with corresponding timestamp, and their approximate location in terms of latitude, longitude and altitude.
An algorithm was developed to speed up the optimization of the underlying non-linear non-convex problem. In terms of computational time, this compares favorably with existing general-purpose solvers.

The method was tested in two different case-studies, both presenting shading and partial curtailment. With respect to other existing satellite-based methods, the results show a significant improvement in the GHI estimation, in terms of RMSE, as shown in figure 4. In both case studies, the relative calculation error is within the secondary standard pyranometer confidence interval for roughly 20-30 of the observations, as shown in 3. The method can correctly identify the orientation and nominal power of the PV modules, even when the PV plant presents PV fields with different orientations. See figure 6.
The method relies on constructing proxies for the electrical power output of the PV modules. This depends on a set of parameters, namely , , , , , , which in this study were kept fixed. Further work is required in order to determine the influence of these parameters on the performance of the algorithm in terms of estimation accuracy. In this work the Maxwell empirical disc model has been used to disaggregate the direct and diffuse irradiance. In recent years, other models have been developed, which have been shown to provide better results Gueymard2016 , as for example the ENGERER2 model Engerer2015 . In future work we will study the effect of different separation models on the algorithm performance.
The proposed method will be used in future studies, in order to disaggregate PV generation from electricity demand, in an attempt to increase the accuracy of aggregated net load short-term forecasts in a low voltage grid. The developed algorithm is freely available as open-source code at

Figure 6: Identified for the Biel-Benken case study. The grey dots indicate the orientations of the proxies, each dot referring to a different column in the matrix. The blue dots represent the identified coefficients of while red dots represent the ground truth. The size of the dots is proportional to the identified/nominal power for a given orientation.


The authors would like to thank CTI - Commission for Technology and Innovation (CH), and SCCER-FURIES - Swiss Competence Center for Energy Research - Future Swiss Electrical Infrastructure, for their financial and technical support to the research work presented in this paper.


  • [1] J. N. Mayer, P. Simon, N. S. H. Philipps, T. Schlegl, and C. Senkpiel, “Current and future cost of photovoltaics,” Long-term Scenarios for Market Development, System Prices and LCOE of Utility-Scale PV Systems (Fraunhofer ISE, Study on behalf of Agora Energiewende, Freiburg, 2015), 2015.
  • [2] S. Eftekharnejad, V. Vittal, G. T. Heydt, B. Keel, and J. Loehr, “Impact of increased penetration of photovoltaic generation on power systems,” IEEE transactions on power systems, vol. 28, no. 2, pp. 893–901, 2013.
  • [3] IEA, “Global EV Outlook 2016: Beyond one million electric cars,” 2016. [Online]. Available:
  • [4] P. Richardson, D. Flynn, and A. Keane, “Impact assessment of varying penetrations of electric vehicles on low voltage distribution systems,” in Power and Energy Society General Meeting, 2010 IEEE.   IEEE, 2010, pp. 1–6.
  • [5] M. Bahramipanah, R. Cherkaoui, and M. Paolone, “Decentralized voltage control of clustered active distribution network by means of energy storage systems,” Electric Power Systems Research, vol. 136, pp. 370–382, 2016.
  • [6] L. Gelazanskas and K. A. Gamage, “Demand side management in smart grid: A review and proposals for future direction,” Sustainable Cities and Society, vol. 11, pp. 22–30, 2014.
  • [7] B. G. Kim, S. Ren, M. Van Der Schaar, and J. W. Lee, “Bidirectional energy trading for residential load scheduling and electric vehicles,” Proceedings - IEEE INFOCOM, vol. 31, no. 7, pp. 595–599, 2013.
  • [8] D. F. A. Monforte, M. C. Fordham, M. J. Blanco, and …, “Improving Short- Term Load Forecasts by Incorporating Solar PV Generation,” California Energy Commission, Tech. Rep., 2017.
  • [9]

    H. Shaker, H. Chitsaz, H. Zareipour, and D. Wood, “On comparison of two strategies in net demand forecasting using Wavelet Neural Network,”

    2014 North American Power Symposium, NAPS 2014, 2014.
  • [10] F. Sossan, L. Nespoli, V. Medici, and M. Paolone, “Unsupervised Disaggregation of PhotoVoltaic Production from Aggregated Power Flow Measurements of Heterogeneous Prosumers,”IEEE Transactions on Industrial Informatics,2018.
  • [11] E. Lorenz, J. Hurka, D. Heinemann, and H. G. Beyer, “Irradiance forecasting for the power prediction of grid-connected photovoltaic systems,” IEEE Journal of selected topics in applied earth observations and remote sensing, vol. 2, no. 1, pp. 2–10, 2009.
  • [12] R. H. Inman, H. T. C. Pedro, and C. F. M. Coimbra, “Solar forecasting methods for renewable energy integration,” Progress in Energy and Combustion Science, vol. 39, no. 6, pp. 535–576, 2013. [Online]. Available:
  • [13] J. R. Mecikalski, K. M. Bedka, W. M. Mackenzie, and S. J. Paech, “Convective Initiation and Lightning Prediction: The potential of the MTG-FC imagery mission,” EUMETSAT 5th MTG MTM, no. EUMETSAT 5th MTG MTM, October 2007, pp. 1–39, 2007. [Online]. Available:
  • [14] C. Vernay, P. Blanc, and S. Pitaval, “Characterizing measurements campaigns for an innovative calibration approach of the global horizontal irradiation estimated by HelioClim-3,” Renewable Energy, vol. 57, pp. 339–347, 2013. [Online]. Available:
  • [15] T. Mieslinger, F. Ament, K. Chhatbar, and R. Meyer, “A new method for fusion of measured and model-derived solar radiation time-series,” Energy Procedia, vol. 48, pp. 1617–1626, 2014. [Online]. Available:
  • [16]

    A. T. Lorenzo, M. Morzfeld, W. F. Holmgren, and A. D. Cronin, “Optimal interpolation of satellite and ground data for irradiance nowcasting at city scales,”

    Solar Energy, vol. 144, pp. 466–474, 2017. [Online]. Available:
  • [17] A. Laudani, F. R. Fulginei, A. Salvini, M. Carrasco, and F. Mancilla-David, “A fast and effective procedure for sensing solar irradiance in photovoltaic arrays,” EEEIC 2016 - International Conference on Environment and Electrical Engineering, no. 3, pp. 0–3, 2016.
  • [18] E. C. Kara, C. M. Roberts, M. Tabone, L. Alvarez, D. S. Callaway, and E. M. Stewart, “Towards Real-Time Estimation of Solar Generation From Micro-Synchrophasor Measurements,” pp. 1–32, 2016. [Online]. Available:
  • [19] P. Traxler, P. Gómez, and T. Grill, “A Robust Alternative to Correlation Networks for Identifying Faulty Systems,” in 26th International Workshop on Principles of Diagnosis A, 2015, pp. 11–18.
  • [20] N. A. Engerer and F. P. Mills, “ScienceDirect K PV : A clear-sky index for photovoltaics,” Solar Energy, vol. 105, pp. 679–693, 2014.
  • [21] S. Killinger, F. Braam, B. Muller, B. Wille-Haussmann, and R. McKenna, “Projection of power generation between differently-oriented PV systems,” Solar Energy, vol. 136, pp. 153–165, 2016.
  • [22] B. Marion and B. Smith, “Photovoltaic system derived data for determining the solar resource and for modeling the performance of other photovoltaic systems,” Solar Energy, vol. 147, pp. 349–357, 2017.
  • [23] S. Killinger, N. Engerer, and B. Muller, “QCPV: A quality control algorithm for distributed photovoltaic array power output,” Solar Energy, vol. 143, pp. 120–131, 2017.
  • [24] D. Yang, “Solar radiation on inclined surfaces: Corrections and benchmarks,” Solar Energy, vol. 136, pp. 288–302, 2016. [Online]. Available:
  • [25] P. G. Loutzenhiser, H. Manz, C. Felsmann, P. A. Strachan, T. Frank, and G. M. Maxwell, “Empirical validation of models to compute solar irradiance on inclined surfaces for building energy simulation,” Solar Energy, vol. 81, no. 2, pp. 254–267, 2007.
  • [26] A. Wills, T. B. Schön, L. Ljung, and B. Ninness, “Blind identification of wiener models,” in IFAC Proceedings Volumes (IFAC-PapersOnline), vol. 18, no. PART 1, 2011, pp. 5597–5602.
  • [27] H. Ohlsson, L. Ratliff, R. Dong, and S. S. Sastry, “Blind identification via lifting,” in IFAC Proceedings Volumes (IFAC-PapersOnline), vol. 19, no. 3.   IFAC, 2014, pp. 10 367–10 372. [Online]. Available:
  • [28] M. Lef, A. Oumbe, P. Blanc, B. Espinar, Z. Qu, L. Wald, M. S. Homscheidt, A. Arola, M. Lef, A. Oumbe, P. Blanc, and B. Espinar, “McClear : a new model estimating downwelling solar radiation at ground level in clear-sky conditions,” Atmospheric Measurement Techniques, vol. 6, pp. 2403—-2418, 2013.
  • [29] V. P. Lonij, A. E. Brooks, K. Koch, and A. D. Cronin, “Analysis of 80 rooftop PV systems in the Tucson, AZ area,” Conference Record of the IEEE Photovoltaic Specialists Conference, pp. 549–553, 2012.
  • [30] M. J. Reno and C. W. Hansen, “Identification of periods of clear sky irradiance in time series of GHI measurements,” Renewable Energy, vol. 90, pp. 520–531, 2016. [Online]. Available:
  • [31] S. Butterworth, “On the theory of filter amplifiers,” pp. 536–541, 1930.
  • [32] P. J. Huber and E. M. Ronchetti, Robust Statistics 2e, 2009, vol. 523, no. 3. [Online]. Available:
  • [33] P. W. Holland and R. E. Welsch, “Robust regression using iteratively reweighted least-squares,” Communications in Statistics - Theory and Methods, vol. 6, no. 9, pp. 813–827, 1977. [Online]. Available:
  • [34] J. S. Stein, “The photovoltaic Performance Modeling Collaborative (PVPMC),” Conference Record of the IEEE Photovoltaic Specialists Conference, pp. 3048–3052, 2012.
  • [35] E. L. Maxwell, “A quasi-physical model for converting hourly global to direct normal insolation,” pp. 35–46, 1987. [Online]. Available:
  • [36] G. Valentin, “Design and Simulation of Photovoltaic Systems Manual,” pp. 1–82, 2012.
  • [37] ASHRAE, “Methods of Testing to Determine the Thermal Performance of Solar Collectors, ASHRAE Standard 93-77,” 1978.
  • [38] E. Skoplaki and J. A. Palyvos, “On the temperature dependence of photovoltaic module electrical performance: A review of efficiency/power correlations,” Solar Energy, vol. 83, no. 5, pp. 614–624, 2009. [Online]. Available:
  • [39] N. Pham, A. Malinowski, and T. Bartczak, “Comparative study of derivative free optimization algorithms,” IEEE Transactions on Industrial Informatics, vol. 7, no. 4, pp. 592–600, 2011.
  • [40] L. M. Rios and N. V. Sahinidis, “Derivative-free optimization: A review of algorithms and comparison of software implementations,” Journal of Global Optimization, vol. 56, no. 3, pp. 1247–1293, 2013.
  • [41] Y.-x. Yuan, “A Review of Trust Region Algorithms for Optimization,” ICIAM 99: Proceedings of the Fourth International Congress on Industrial & Applied Mathematics, Edinburgh, 2000 Oxford University Press, USA, vol. 99, pp. 271–282, 2000.
  • [42] R. H. Inman, Y. Chu, and C. F. M. Coimbra, “ScienceDirect Cloud enhancement of global horizontal irradiance in California and Hawaii,” SOLAR ENERGY, vol. 130, pp. 128–138, 2016. [Online]. Available:
  • [43] J. W. Tukey, “Exploratory Data Analysis,” Analysis, vol. 2, no. 1999, p. 688, 1977. [Online]. Available:
  • [44] C. Thomas, E. Wey, P. Blanc, L. Wald, and M. Lefèvre, “Validation of HelioClim-3 Version 4, HelioClim-3 Version 5 and MACC-RAD Using 14 BSRN Stations,” Energy Procedia, vol. 91, pp. 1059–1069, 2016. [Online]. Available:
  • [45] W. Greuell, J. F. Meirink, and P. Wang, “Retrieval and validation of global, direct, and diffuse irradiance derived from SEVIRI satellite observations,” Journal of Geophysical Research Atmospheres, vol. 118, no. 5, pp. 2340–2361, 2013.
  • [46] C. A. Gueymard and J. A. Ruiz-Arias, “Extensive worldwide validation and climate sensitivity analysis of direct irradiance predictions from 1-min global irradiance,” Solar Energy, vol. 128, pp. 1–30, 2016. [Online]. Available:
  • [47] N. A. Engerer, “Minute resolution estimates of the diffuse fraction of global irradiance for southeastern Australia,” Solar Energy, vol. 116, pp. 215–237, 2015. [Online]. Available: