Power quality (PQ) and reliability, and equipment diagnostics, control and protection are some of the application areas of signal processing in electric power systems. PQ studies evaluate the waveform of voltage and current signals with respect to pure sinusoidal waveforms with a single frequency component (e.g. 50 or 60 Hz) [2, 3]. Non-linear electronic devices such as cycloconverters, alternators, and current starters [3, 4] introduce undesirable harmonic distortions into the network. Other sources that distort pure sinusoidal electric signals include large electrical loads, electrical faults on the network, tap changers, and capacitor bank switching. Depending on the source of the PQ disturbances, we can find distortions such as swells/sags, harmonics, flickers, notching and transients [1, 3, 2].
The automatic recognition of PQ disturbances is an important research field in PQ analysis. It can be stated as a pattern recognition problem, in which the different types of waveform distortions are differentiated based on their features. For feature extraction, signal processing practitioners in PQ have employed several superpositions of basis functions, also known as dictionaries. Similar to other quasi-stationary signals, PQ disturbances can be decomposed into time-frequency dependent components by using time-frequency or time-scale dictionaries [1, 6]. The short time Fourier transform (STFT), the Wavelet transform (WT) [9, 10], and the S-transform (ST) [11, 12] are commonly used in PQ representation for feature extraction. After the feature extraction step, the next stage is automatic pattern recognition for the classification problem. Classifiers based on K-nearest neighbours (K-NN), Bayesian algorithms, support vector machines (SVM), and artificial neural networks (ANN) have been used for PQ pattern classification.
Several studies in PQ disturbance classification have used different combinations of dictionaries and classifiers to improve the classification accuracy. Gargoom et al. used a K-NN classifier with statistical features computed from either a WT, an ST or a Hilbert-Huang transform (HHT), obtaining a classification accuracy around . In  and , the authors employed an ST to identify short-duration disturbances, with an ANN classifier, obtaining a performance between for synthetically generated signals with a 30-dB noise level. Similarly, Jayasree et al. used the energy of the intrinsic mode function (IMF) components computed from an HHT as features for an ANN classifier, obtaining . More recently, in [10, 20, 16], the authors used the detail coefficients from a WT, or one of its variants such as Wavelet multi-resolution analysis, together with classifiers based on either an SVM or an ANN. They obtained a performance between .
To the best of our knowledge, previous works in PQ disturbance representation and classification have been limited to the use of a single dictionary for the feature extraction step. That is, they use only one of the Gabor transform (GT), HHT, WT or ST together with some classifier [8, 9, 10, 11, 12]. However, research in signal processing has shown that combining several dictionaries for signal analysis can improve the robustness and numerical stability of the feature extraction step [21, 7, 22]. Combinations of complete dictionaries for signal representations are also known as overcomplete representations (OR), and they are useful a) for increasing the richness of the representation by removing uncertainty when choosing the “proper” dictionary, b) for allowing a non-unique representation with the possibility of adaptation, c) for increasing the robustness in the presence of noise, and d) for increasing the flexibility for matching any structure present in the data. However, OR increase the length of the vector of representation coefficients extracted from the signals, and may include information that is irrelevant or confusing for the classification step.
In order to remove redundant information before the classification step, research communities in statistical signal processing, statistics and machine learning have proposed different methods for automatically performing basis selection in linear models. These methods are known as regularization methods or sparse linear models (SLM). SLM highlight the relevant features in the signal representation, driving the contribution of less relevant ones to zero (or approximately zero). This property is known as sparsity. Taking advantage of the sparsity induced by SLM, it is possible to build OR while avoiding redundant information among the dictionaries. This approach tends to increase the PQ representation accuracy thanks to OR, and to reduce the PQ classification complexity thanks to sparsity. Since our OR approach for PQ disturbances requires the analysis of variables (representation coefficients) that are grouped by dictionary (e.g. GT, WT or ST), in this paper we employ an SLM known as Group Lasso for PQ disturbance classification. We evaluate the performance of Group Lasso using different classifiers with either one dictionary at a time or with OR. We employ the discrete-time form of the GT, WT and ST dictionaries. Finally, for the PQ classification step, we use different statistical classifiers, namely, K-NN, SVM, ANN, and two Bayesian classifiers based on linear (LDC) and quadratic (QDC) discriminant functions.
2 Materials and Methods
2.1 Power quality disturbances
PQ determines the fitness of electric power for consumer devices, evaluating the quality of voltage and current signals. We use two concepts to describe PQ disturbances, namely, events and variations. Events are sudden distortions which occur in specific time intervals. Variations, on the other hand, are steady-state or quasi-steady-state disturbances which require continuous measurements. This paper focusses on four types of disturbances: voltage or current variations (e.g. swell, sag and flicker), harmonic distortions, notching and transient events (e.g. impulsive and oscillatory).
2.1.1 Swell and sag events
Swells describe an increase between and p.u. (per unit) over the RMS (Root Mean Square) value of an electric signal during an interval between ms and minute (short time), or over minute (long time). Opposite to swells, sags describe a decrease between and p.u. below the RMS value. These distortions are commonly produced by tap changers, large electrical loads and electrical faults on the network.
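As an illustration of the RMS-based description above, the following minimal sketch generates a synthetic swell or sag and labels it from its per-cycle RMS value in p.u. The sampling rate, the 50 Hz fundamental, the 30% depth and the 0.9/1.1 p.u. thresholds are assumptions chosen only for the example (they follow common practice, not values stated in this paper), and all function names are hypothetical.

```python
import math

def synth_voltage(kind, fs=3200, f0=50.0, cycles=10, depth=0.3):
    """Unit-amplitude sine whose middle cycles are scaled up (swell) or down (sag)."""
    n = int(fs * cycles / f0)
    start, stop = 3 * n // 10, 7 * n // 10  # disturbed interval (assumed)
    gain = 1.0 + depth if kind == "swell" else 1.0 - depth
    return [
        (gain if start <= k < stop else 1.0) * math.sin(2.0 * math.pi * f0 * k / fs)
        for k in range(n)
    ]

def cycle_rms_pu(signal, fs=3200, f0=50.0):
    """RMS of each full cycle, in per unit of the nominal RMS (1/sqrt(2))."""
    spc = int(fs / f0)  # samples per cycle
    nominal = 1.0 / math.sqrt(2.0)
    values = []
    for i in range(0, len(signal) - spc + 1, spc):
        window = signal[i:i + spc]
        values.append(math.sqrt(sum(v * v for v in window) / spc) / nominal)
    return values

def label_event(rms_pu, low=0.9, high=1.1):
    """Label a record from its per-cycle RMS; thresholds are assumed values."""
    if max(rms_pu) > high:
        return "swell"
    if min(rms_pu) < low:
        return "sag"
    return "normal"

if __name__ == "__main__":
    for kind in ("swell", "sag"):
        print(kind, label_event(cycle_rms_pu(synth_voltage(kind))))
```

The per-cycle RMS is used here because it matches the way swells and sags are defined; a sliding window would give finer time resolution at a higher cost.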
2.1.2 Flicker variations
2.1.3 Harmonic variations
2.1.4 Notching variations
2.1.5 Transient events
Transient events can be classified into two types, impulsive and oscillatory. Impulsive transients are unidirectional events with a short duration between ns and ms. They are produced by electric discharges or by the switching of inductive loads (motors) [3, 4]. Oscillatory transients are bidirectional events with a duration between ms and ms. The oscillations are commonly caused by capacitor bank switching used for power factor correction [3, 4].
2.2 Overcomplete representation
Let y be a vector that represents a discrete-time signal (e.g. a PQ disturbance) of length . A signal representation using a superposition of atoms (basis functions) is known as a complete representation. Here, the collection of parametrized atoms is known as a dictionary, where the parameter has different interpretations according to the indexing of the variables: frequency, time-frequency or time-scale values. In general, it is possible to assume that y can be expressed by the following linear matrix equation
where and are the vector of coefficients and the dictionary related to the representation, respectively. An overcomplete representation (OR) is obtained by combining several dictionaries in order to represent the signal y completely, obtaining a new dictionary given by . In this paper, we analyse the PQ disturbances in terms of the vector of coefficients (analysis step). These coefficients are subsequently used for feature extraction, to feed the PQ disturbance classifiers. We also use to reconstruct or synthesize (synthesis step) the PQ disturbance vector computed from the analysis step. We evaluate the quality of the reconstruction provided by the sparse vector obtained after applying Group Lasso.
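For reference, the linear model and the overcomplete dictionary described in this subsection can be written as follows. The symbols are assumptions introduced only for this illustration, since the original notation is not recoverable here: $\mathbf{w}$ for the coefficient vector, $\boldsymbol{\Phi}$ for the dictionary, and $M$ for the number of combined dictionaries.

```latex
\mathbf{y} = \boldsymbol{\Phi}\,\mathbf{w},
\qquad
\boldsymbol{\Phi} = \left[\boldsymbol{\Phi}_{1},\ \boldsymbol{\Phi}_{2},\ \ldots,\ \boldsymbol{\Phi}_{M}\right],
\qquad
\mathbf{w} = \left[\mathbf{w}_{1}^{\top},\ \mathbf{w}_{2}^{\top},\ \ldots,\ \mathbf{w}_{M}^{\top}\right]^{\top}
```

Under this notation, each block $\mathbf{w}_{m}$ holds the coefficients associated with the atoms of the $m$-th dictionary $\boldsymbol{\Phi}_{m}$.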
2.3 Sparse linear models for grouped variable selection: Group Lasso
Tibshirani [27] proposed a method for estimation in linear models known as Lasso (least absolute shrinkage and selection operator). To solve the inverse problem in Equation (1), he demonstrated that Lasso obtains a sparse representation through the minimization of the cost function given by
where is the vector of estimated coefficients, which depends on the regularization parameter , and and correspond to the -norm and -norm, respectively. Here, Lasso assumes that there is a direct and unique correspondence between parameters and variables (1:1 correspondence), performing the variable selection individually. However, the individual selection of variables produces an unsatisfactory solution for grouped variables (e.g. coefficients related to bases from the dictionary ). To deal with grouped variable estimation, an extension of Lasso was proposed in . This method is known as Group Lasso. Group Lasso assumes that the vector is partitioned into groups, where the penalty is intermediate between the and regularizations. Meier et al. showed that Group Lasso has the attractive property that it performs variable selection at the group level, promoting sparsity over for large values of . For linear regression, the cost function for Group Lasso is given by
an identity matrix which depends on the length of the -th group. According to this proposition, the necessary and sufficient condition for to be a solution of Equation (2) is given by
where , with . The solution of Equation (2) can therefore be obtained by iteratively applying Equation (3) for . In this paper, each group corresponds to a single dictionary, and are the representation coefficients related to the -th dictionary.
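As a point of reference, the Lasso and Group Lasso cost functions and the group-wise update are commonly written as below. This is a sketch in assumed notation ($\mathbf{y}$ the signal, $\boldsymbol{\Phi}_{j}$ and $\mathbf{w}_{j}$ the dictionary and coefficients of group $j$, $p_{j}$ the group length, $J$ the number of groups), following the standard grouped-variable formulation in the statistics literature; the scaling constants may differ from those used by the authors.

```latex
% Lasso
\hat{\mathbf{w}}(\lambda) = \arg\min_{\mathbf{w}} \;
  \tfrac{1}{2}\left\| \mathbf{y} - \boldsymbol{\Phi}\mathbf{w} \right\|_{2}^{2}
  + \lambda \left\| \mathbf{w} \right\|_{1}

% Group Lasso
\hat{\mathbf{w}}(\lambda) = \arg\min_{\mathbf{w}} \;
  \tfrac{1}{2}\Big\| \mathbf{y} - \sum_{j=1}^{J} \boldsymbol{\Phi}_{j}\mathbf{w}_{j} \Big\|_{2}^{2}
  + \lambda \sum_{j=1}^{J} \sqrt{p_{j}}\, \left\| \mathbf{w}_{j} \right\|_{2}

% Group-wise update, valid when \boldsymbol{\Phi}_{j}^{\top}\boldsymbol{\Phi}_{j} = \mathbf{I}
\mathbf{w}_{j} = \Big( 1 - \frac{\lambda \sqrt{p_{j}}}{\left\| \mathbf{s}_{j} \right\|_{2}} \Big)_{+} \mathbf{s}_{j},
\qquad
\mathbf{s}_{j} = \boldsymbol{\Phi}_{j}^{\top}\big( \mathbf{y} - \boldsymbol{\Phi}\,\mathbf{w}_{-j} \big)
```

Here $(\cdot)_{+}$ denotes the positive part and $\mathbf{w}_{-j}$ is $\mathbf{w}$ with the coefficients of group $j$ set to zero. A whole group is discarded whenever $\|\mathbf{s}_{j}\|_{2} \le \lambda\sqrt{p_{j}}$, which is the mechanism behind sparsity at the group level.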
2.4 Time-frequency and time-scale dictionaries
2.4.1 Short Time Fourier Transform (STFT)
STFT is the general expression of time-frequency dictionaries. The STFT uses the concept of a window function to compute the Fourier transform in a specific interval of the analysed signal. The STFT of a discrete-time series is given by the expression
where is the window function with location , and are the frequency terms which we use for the representation. In PQ disturbance analysis, the Gabor transform (GT) is the most common STFT, where is defined by a Gaussian function. Other types of window functions used in PQ analysis are the Hamming function and the Hann function. For real-valued representations, the atoms of the GT dictionary follow
where , (), and (). Here, (total frequencies), (total locations), (fundamental frequency) and (fundamental location) are constants. The main disadvantage of GT for PQ disturbances is related to its resolution for signal representation. A higher resolution in time leads to a lower resolution in frequency, and vice versa. In addition, we need to fix the length scale .
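To make the structure of a real-valued Gabor dictionary concrete, the sketch below builds Gaussian-windowed cosine and sine atoms at harmonics of a fundamental frequency and at evenly spaced locations. The exact parametrization of the authors' atoms is not recoverable from the text, so the window form, the location grid and all names (`gabor_atom`, `gabor_dictionary`, `sigma`) are assumptions made for illustration.

```python
import math

def gabor_atom(n, fs, freq, loc, sigma, phase="cos"):
    """Gaussian-windowed sinusoid sampled at n points with sampling rate fs (Hz)."""
    trig = math.cos if phase == "cos" else math.sin
    atom = []
    for k in range(n):
        t = k / fs
        window = math.exp(-((t - loc) ** 2) / (2.0 * sigma ** 2))
        atom.append(window * trig(2.0 * math.pi * freq * t))
    return atom

def gabor_dictionary(n, fs, f0, n_freqs, n_locs, sigma):
    """List of atoms: cos and sin pairs for each harmonic of f0 and each location."""
    duration = n / fs
    atoms = []
    for h in range(1, n_freqs + 1):
        for m in range(n_locs):
            loc = (m + 0.5) * duration / n_locs  # evenly spaced window centres
            for phase in ("cos", "sin"):
                atoms.append(gabor_atom(n, fs, h * f0, loc, sigma, phase))
    return atoms

if __name__ == "__main__":
    D = gabor_dictionary(n=64, fs=3200.0, f0=50.0, n_freqs=3, n_locs=4, sigma=0.005)
    print(len(D), "atoms of length", len(D[0]))
```

The fixed `sigma` in this sketch reflects the limitation noted above: the same window width is used for every frequency, so time and frequency resolution cannot both be improved.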
2.4.2 Wavelet Transform (WT)
WT is another example of a time-frequency transform applied to PQ disturbances. Similarly to the STFT, the WT uses a dictionary of atoms, in this case a time-scale dictionary characterized by irregular functions called wavelets. Contrary to the STFT, the WT makes it possible to vary the scale factor, obtaining higher resolutions in time when required. Here, we can fix the length of the wavelet to detect high frequencies. WT is described by
where is the wavelet function with locations and length-scale . The Mexican Hat Wavelet Transform (MHWT) and the Daubechies, Morlet and Symlet wavelets have been applied to PQ disturbance representation. In this paper, we use the MHWT. Its atoms can be expressed in the following form
where , (), and . Here, (total scales), , (principal scale) and are constants. The WT has been used to characterize PQ disturbances such as voltage variations (e.g. swells and sags) and transient distortions (e.g. impulsive and oscillatory), due to its variable length-scale.
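The following sketch shows how a Mexican hat atom changes with its location and scale. It uses the standard unnormalized form $(1-u^{2})\,e^{-u^{2}/2}$ with $u=(t-\text{loc})/\text{scale}$; the normalization constant, the scale grid and the function names are assumptions for illustration and may differ from the authors' definition.

```python
import math

def mexican_hat(t, loc, scale):
    """Unnormalized Mexican hat wavelet evaluated at time t."""
    u = (t - loc) / scale
    return (1.0 - u * u) * math.exp(-0.5 * u * u)

def mhwt_dictionary(n, fs, scales, n_locs):
    """List of atoms: one sampled wavelet per (scale, location) pair."""
    duration = n / fs
    atoms = []
    for scale in scales:
        for m in range(n_locs):
            loc = (m + 0.5) * duration / n_locs  # evenly spaced centres (assumed)
            atoms.append([mexican_hat(k / fs, loc, scale) for k in range(n)])
    return atoms

if __name__ == "__main__":
    # Smaller scales give narrower atoms, suited to short transients.
    D = mhwt_dictionary(n=64, fs=3200.0, scales=[0.001, 0.002, 0.004], n_locs=8)
    print(len(D), "atoms of length", len(D[0]))
```

Because the scale is a free parameter of each atom, narrow wavelets can localize impulsive events while wide ones capture slower variations such as swells and sags.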
2.4.3 Stockwell Transform (ST)
ST is a generalization of the STFT, extending the continuous WT and overcoming some of its disadvantages. ST includes a correction factor which gives information about the phase contribution in the frequency domain, improving the performance for PQ representations. The phase correction can characterize the real and the imaginary spectrum components independently. The ST combines the time-frequency and time-scale concepts, and is given by
where is a Gaussian window with location and length-scale . The atoms for ST are given by
In general, it is possible to express many dictionaries in matrix form. Also, we can combine several dictionaries to obtain a new and more robust dictionary by , where and are indexing parameters. It is noteworthy that these dictionaries do not need to be orthogonal to obtain an OR.
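The combination of dictionaries can be illustrated with a toy example. In the sketch below, each dictionary is stored as a list of atoms, the overcomplete dictionary is their concatenation, and the same signal is synthesized from two different coefficient vectors, which shows the non-uniqueness of an OR. The two tiny dictionaries and the helper names are hypothetical.

```python
def concatenate(*dictionaries):
    """Join several dictionaries (lists of atoms) into one overcomplete dictionary."""
    combined = []
    for atoms in dictionaries:
        combined.extend(atoms)
    return combined

def synthesize(atoms, coeffs):
    """Synthesis step: y[k] = sum_i coeffs[i] * atoms[i][k]."""
    n = len(atoms[0])
    return [sum(c * a[k] for a, c in zip(atoms, coeffs)) for k in range(n)]

if __name__ == "__main__":
    spikes = [[1.0, 0.0], [0.0, 1.0]]   # time-localized atoms
    waves = [[1.0, 1.0], [1.0, -1.0]]   # constant and alternating atoms
    overcomplete = concatenate(spikes, waves)

    # Two coefficient vectors that synthesize the same signal.
    print(synthesize(overcomplete, [1.0, 1.0, 0.0, 0.0]))
    print(synthesize(overcomplete, [0.0, 0.0, 1.0, 0.0]))
```

The second coefficient vector uses a single atom instead of two, which is the kind of representation a sparsity-promoting method would favour.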
2.5 Classifiers for automatic recognition
PQ disturbance classification can be seen as a pattern recognition problem, in which each class corresponds to a type of PQ disturbance. We start with a set , where is a feature vector extracted from the signal (we will see later how the elements in are computed from the vector of coefficients ), and is the corresponding label or class for signal . is the total number of samples. The automatic recognition of disturbances consists of two stages: training and testing. In the first stage, the classifiers use a set of samples to learn a mapping between the input space and the different PQ disturbances. In the second stage, the classifiers use the information obtained from the training set to predict the labels of a new set of samples . Notice that and correspond to the number of samples in the training and test datasets, respectively. In this subsection, we give a brief description of the classifiers we used to validate the performance of SLM in the PQ classification step. The classifiers that we use are well established, and the theory behind them can be found in depth in any pattern recognition or machine learning textbook [23, 24].
2.5.1 K-nearest neighbours (K-NN)
2.5.2 Bayesian classifiers
There are two types of parametric Bayesian classifiers commonly used for pattern recognition, namely, the linear (LDC) and the quadratic (QDC) discriminant classifiers. LDC finds a linear combination of features which is used as a linear classifier. This method assumes normal densities with equal covariance for each class, and the joint covariance is given by the weighted average of the class covariances. QDC also assumes that the measurements for each class are normally distributed, but it does not assume that the class covariances are identical. Both methods rely on computing the sample means and the sample covariances of each class [23, 24].
2.5.3 Support vector machines (SVM)
SVM are decision machines that maximize the separating margin between two classes given a set of training data. The risk function of SVM involves the use of Lagrange multipliers, a set of Karush-Kuhn-Tucker conditions, and variables that control the trade-off between the slack variable penalty and a boundary that separates both classes. The SVM maximizes the Lagrangian dual risk function, which involves inner products between training data that are later replaced by kernel functions. Some common kernels for PQ disturbance classification are the exponential and the Gaussian (or radial basis function, RBF) kernels.
2.5.4 Artificial neural networks (ANN)
Table 2 shows the parametric equations used to construct the synthetic dataset for different types of PQ disturbances, namely, swells, sags, flicker effects, notches, and impulsive and oscillatory transients. These equations are based on [31, 32, 20, 33]. We generated PQ disturbances with samples for each type. Each disturbance has discrete-time values for an interval between and seconds.
2.6.2 Group Lasso
We implement the shooting algorithm for Group Lasso described in section 2.3, using a penalty factor and a convergence tolerance equal to . These values were selected manually in order to obtain a signal reconstruction error lower than while preserving the relevant representation coefficients. Here, each group corresponds to a single dictionary, and are the representation coefficients related to the -th dictionary. The percentage of sparsity for the coefficients is computed by counting the coefficients lower than and dividing the result by the length of the vector .
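A minimal sketch of a group-wise coordinate descent in the spirit of the shooting algorithm, together with the sparsity percentage described above, is given below. It is not the authors' implementation: it assumes that the atoms inside each group are orthonormal (the condition under which the closed-form group update holds), and the penalty `lam`, the tolerance `tol` and the threshold `eps` are placeholder values.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def group_lasso(y, groups, lam, tol=1e-8, max_iter=500):
    """groups: list of groups, each a list of atoms (orthonormal within the group)."""
    coeffs = [[0.0] * len(atoms) for atoms in groups]
    residual = list(y)  # y minus the contribution of every group
    for _ in range(max_iter):
        change = 0.0
        for j, atoms in enumerate(groups):
            # Partial residual: add back the contribution of group j.
            partial = list(residual)
            for c, a in zip(coeffs[j], atoms):
                if c != 0.0:
                    partial = [r + c * v for r, v in zip(partial, a)]
            s = [dot(a, partial) for a in atoms]
            norm_s = math.sqrt(dot(s, s))
            threshold = lam * math.sqrt(len(atoms))
            # Group soft-thresholding: the whole group is zeroed if norm_s <= threshold.
            shrink = max(0.0, 1.0 - threshold / norm_s) if norm_s > 0.0 else 0.0
            new = [shrink * v for v in s]
            change = max(change, max(abs(p - q) for p, q in zip(new, coeffs[j])))
            coeffs[j] = new
            residual = partial
            for c, a in zip(new, atoms):
                if c != 0.0:
                    residual = [r - c * v for r, v in zip(residual, a)]
        if change < tol:
            break
    return coeffs

def sparsity_percentage(coeffs, eps=1e-6):
    """Percentage of coefficients whose magnitude is below eps (assumed threshold)."""
    flat = [c for group in coeffs for c in group]
    return 100.0 * sum(abs(c) < eps for c in flat) / len(flat)

if __name__ == "__main__":
    e = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
    w = group_lasso([3.0, 0.0, 0.0, 0.1], [e[:2], e[2:]], lam=0.5)
    print(w, sparsity_percentage(w))
```

In the toy run, the second group carries little energy and is removed as a whole, while the first group is kept and shrunk, which is the group-level selection behaviour described in section 2.3.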
For each time-frequency and time-scale dictionary, we use a location factor . For the GT, we focus on the first harmonics with a length-scale s. For the MHWT, we used as scale factor with . For the ST, we used the first harmonics, and with . In addition, we add a cosine/sine dictionary in all cases with the first harmonics, aiming to improve the PQ synthesis results. We denote it as the Harmonics dictionary. Finally, in order to evaluate the SLM performance using OR, we combine the GT, MHWT and ST dictionaries, which we denote as the GWST dictionary.
2.6.4 Feature extraction
After applying the dictionaries proposed in subsection 2.4, we compute the following features from to create the feature vectors for each signal : mean of the absolute values (), standard deviation (), kurtosis (), Shannon's energy () and RMS value (). We also include the mean of the absolute values of the derivative () (footnote 1: Let , the features were calculated as follows , , , where is the derivative of and .). For the PQ disturbance classification step, we normalize the features by subtracting their mean values and dividing the result by their standard deviation.
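The feature vector and the normalization step can be sketched as follows. The exact formulas of the footnote are not recoverable, so this example relies on common definitions that should be read as assumptions: population (not sample) standard deviation, kurtosis as the fourth standardized moment, Shannon energy as $-\sum_i w_i^2 \log w_i^2$, and the derivative approximated by first differences.

```python
import math

def extract_features(w):
    """Six features of a coefficient vector w (definitions assumed, see lead-in)."""
    n = len(w)
    mean_abs = sum(abs(v) for v in w) / n
    mean = sum(w) / n
    var = sum((v - mean) ** 2 for v in w) / n
    std = math.sqrt(var)
    kurtosis = (sum((v - mean) ** 4 for v in w) / n) / var ** 2 if var > 0.0 else 0.0
    shannon = -sum(v * v * math.log(v * v) for v in w if v != 0.0)
    rms = math.sqrt(sum(v * v for v in w) / n)
    diffs = [b - a for a, b in zip(w, w[1:])]
    mean_abs_diff = sum(abs(d) for d in diffs) / len(diffs)
    return [mean_abs, std, kurtosis, shannon, rms, mean_abs_diff]

def zscore(rows):
    """Normalize each feature column to zero mean and unit standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) for col, m in zip(columns, means)]
    return [
        [(v - m) / s if s > 0.0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

if __name__ == "__main__":
    print(extract_features([1.0, -1.0, 1.0, -1.0]))
    print(zscore([[1.0, 2.0], [3.0, 2.0]]))
```

In practice the normalization statistics would be estimated on the training set only and then applied to the test set, so that no test information leaks into training.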
For the classifier based on ANN, we use the Neural Network Toolbox provided for Matlab R2013a. We use an ANN made of three hidden layers with , and neurons in the first, second, and third layer, respectively, with sigmoid transfer functions. We employ the Levenberg-Marquardt scheme using the MSE criterion for the training stage. For the other classifiers described in 2.5, we use the Matlab Toolbox for Pattern Recognition (PRTools, available at http://prtools.org/). For the SVM, we use an RBF kernel , where is known as the bandwidth parameter. The bandwidth parameter and the regularization parameter for the SVM are tuned by cross-validation. We test the classifiers twenty times with different training sets. We randomly select 70% of the total samples for each type of PQ disturbance to train the classifiers (133 samples per class), and then we use the other 30% for the test stage (57 samples per class). The performance for each test experiment is computed as the number of successful classifications of PQ disturbances divided by the total number of test samples. Finally, we compute the mean and the standard deviation of the performance obtained in all the experiments.
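The evaluation protocol described above (a random per-class split, repeated twenty times, reporting the mean and standard deviation of the accuracy) can be sketched as follows. A 1-NN classifier stands in for the toolbox classifiers, the 0.7 training fraction is inferred from 133 of 190 samples per class, and the toy one-dimensional data and all function names are hypothetical.

```python
import math
import random

def nearest_neighbour(train, x):
    """Label of the training sample closest to x (1-NN, Euclidean distance)."""
    return min(train, key=lambda sample: math.dist(sample[0], x))[1]

def split_per_class(samples, train_fraction, rng):
    """Random split that keeps the same training fraction inside every class."""
    by_class = {}
    for feats, label in samples:
        by_class.setdefault(label, []).append((feats, label))
    train, test = [], []
    for items in by_class.values():
        items = items[:]
        rng.shuffle(items)
        cut = round(train_fraction * len(items))
        train += items[:cut]
        test += items[cut:]
    return train, test

def repeated_accuracy(samples, runs=20, train_fraction=0.7, seed=0):
    """Mean and standard deviation of the test accuracy over repeated splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train, test = split_per_class(samples, train_fraction, rng)
        hits = sum(nearest_neighbour(train, x) == label for x, label in test)
        scores.append(hits / len(test))
    mean = sum(scores) / runs
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / runs)
    return mean, std

if __name__ == "__main__":
    # Toy data: two well-separated classes with 190 samples each.
    data = [([0.01 * i], "sag") for i in range(190)]
    data += [([10.0 + 0.01 * i], "swell") for i in range(190)]
    print(repeated_accuracy(data))
```

Splitting inside each class keeps the class proportions identical in the training and test sets, which matches the per-class sample counts reported above.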
3 Results and Discussions
To observe the advantages of sparse linear models (SLM) for PQ representation, Figure 2 shows the synthesis and the analysis steps for a single example of a swell event, including the swell disturbance itself. The result of the synthesis step is shown for two cases: without sparsity, and using Group Lasso. For this result, we use the GWST dictionary, which we obtain by combining the GT, MHWT and ST. Both methods can synthesize the PQ distortion, ensuring a low reconstruction error. Figure 2 also shows the analysis step for the same example, again without and with Group Lasso. The first, second and third intervals separated by the dashed line correspond to the coefficients obtained by GT, WT and ST, respectively. To represent the swell example, the method without Group Lasso tends to use all the coefficients from . In contrast, Group Lasso achieves the synthesis of the PQ disturbance using slightly less than of the total coefficients available, selecting a small number of coefficients from each dictionary. Finally, we see that the magnitude of the coefficients obtained with Group Lasso is far smaller than the magnitudes obtained without SLM, due to the regularization parameter used in Group Lasso.
In order to quantify the level of sparsity produced by Group Lasso over the GWST representation, the synthesis step was performed over all the PQ disturbance examples from the synthetic dataset described in section 2.6. Tables 3 and 4 show the sparsity percentages produced by the method without sparsity, and using Group Lasso, respectively. Following section 2.6.2, we compute the sparsity percentages for each type of PQ disturbance (rows) and each dictionary (columns). Table 3 shows that methods without sparsity tend to use all the representation coefficients from GWST, making subsequent PQ studies (e.g. PQ analysis and PQ disturbance classification) more difficult. According to Table 4, Group Lasso automatically selects the most representative coefficients required to fully represent the PQ disturbances. Also, we notice that the Harmonics, the MHWT and the ST dictionaries contain more relevant components in terms of the disturbance representation than the GT dictionary.
To evaluate the performance of SLM for PQ classification, we first perform the synthesis step for each disturbance represented through GT, MHWT, ST, and GWST (combining GT, MHWT and ST), obtaining the set of coefficients . We then apply Group Lasso over the different representations, and compute the feature set described in 2.6.4. We train the different classifiers, and evaluate their performance over the test set as described in section 2.6.5.
Table 5 shows the PQ classification performance over the test set. When Group Lasso is not applied (first four rows), ST and GWST show better results than GT and WT, independently of the classifier employed. When Group Lasso is applied (last four rows), we notice that, for any particular representation and any classifier, the accuracy obtained by additionally applying Group Lasso is higher than with the same representation and the same classifier used without Group Lasso. For example, when the representation is ST and the classifier is ANN, applying Group Lasso increases the performance by almost . The improvement is even higher when the representation is GT and the classifier is QDC. In this case the improvement is close to .
For all the classifiers, the accuracies obtained with ST and GWST, both using Group Lasso, are similar. This is particularly true for QDC and SVM. Due to the similarity of these accuracies, we apply a Wilcoxon rank-sum (WRS) test to evaluate the statistical significance of the difference between both results for each classifier, concluding that the differences in performance are statistically significant.
Tables 6 and 7 show the confusion matrices for GWST using a 1-NN classifier without and with Group Lasso, respectively. From these tables we notice that, without Group Lasso, misclassifications occur in almost all the types of PQ disturbances, mainly in oscillatory distortions because they require a high resolution in the PQ representation both in time and frequency. On the other hand, the misclassifications with Group Lasso arise because similar frequency components appear in different types of PQ disturbances (e.g. swell and sag, notch and oscillatory).
As an additional result, we evaluate the robustness of the proposed framework in the presence of noise. In particular, we add white Gaussian noise to each PQ signal until we reach a specific signal-to-noise ratio (SNR). We then apply GWST plus Group Lasso to each signal, and run the classifiers twenty times, with different training and testing sets each time. Figure 3 shows the mean accuracy over the twenty repetitions obtained by each classifier for different SNR values. We notice that the classifiers show a reliable classification accuracy for SNR values greater than 30 dB. For values below 30 dB, the accuracy decreases because the noise makes the waveforms of the different PQ disturbances increasingly hard to distinguish. Notice also that for low SNR values, more sophisticated classifiers such as ANN, QDC, and SVM behave better than simpler classifiers like LDC or those based on nearest neighbours.
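The noise injection used in this experiment can be sketched as below: white Gaussian noise is drawn, then rescaled so that the ratio of signal power to noise power matches the requested SNR in dB. The function names and the test signal are hypothetical, and scaling to the exact empirical SNR (rather than the expected one) is a design choice made for this example.

```python
import math
import random

def add_noise(signal, target_snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the target SNR (dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(v * v for v in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Noise power required for the target SNR: P_signal / 10^(SNR/10).
    p_target = p_signal / (10.0 ** (target_snr_db / 10.0))
    scale = math.sqrt(p_target / p_noise)
    return [s + scale * e for s, e in zip(signal, noise)]

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    p_signal = sum(v * v for v in clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(p_signal / p_noise)

if __name__ == "__main__":
    clean = [math.sin(2.0 * math.pi * 50.0 * k / 3200.0) for k in range(640)]
    for snr in (40, 30, 20):
        noisy = add_noise(clean, snr)
        print(snr, round(measured_snr_db(clean, noisy), 6))
```

Fixing the seed makes each noisy dataset reproducible, which helps when comparing classifiers across SNR levels.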
In this paper, we introduced overcomplete representations and sparse linear models for power quality disturbance classification. As an example of overcomplete representations, we combine the Gabor transform, the Wavelet transform with Mexican hat function, and the Stockwell transform, which are well known in the literature for PQ analysis. As an example of a sparse linear model, we use the Group Lasso assuming that each dictionary is a group in the sparse model.
Group Lasso automatically selects the most representative coefficients required to fully represent PQ disturbances, outperforming overcomplete representations used without sparsity. As we showed experimentally, this approach improves the performance of PQ disturbance classification for both linear and non-linear classifiers compared to methods without Group Lasso.
We are thankful to the Department of Electrical Engineering, Universidad Tecnológica de Pereira, for the technical support provided to carry out this research.
-  M. Bollen, I.-H. Gu, S. Santoso, M. McGranaghan, P. Crossley, M. Ribeiro, and P. Ribeiro, “Bridging the gap between signal and power,” Signal Processing Magazine, IEEE, vol. 26, no. 4, pp. 12–31, 2009.
-  S. Chattopadhyay, M. Mitra, and S. Sengupta, Electric Power Quality, 1st ed., ser. Power Systems. Springer, 2011.
-  A. Kusko and M. Thompson, Power Quality in Electrical Systems, 1st ed. McGraw-Hill Professional, 2007.
-  G. Benysek and M. Pasko, Power theories for improved power quality, ser. Power systems. Springer, 2012.
-  A. Farazmand, F. De Leon, K. Zhang, and S. Jazebi, “Analysis, modeling, and simulation of the phase-hop condition in transformers: The largest inrush currents,” Power Delivery, IEEE Transactions on, vol. 29, no. 4, pp. 1918–1926, 2014.
-  D. Granados, R. Romero, R. Osornio, A. Garcia, and E. Cabal, “Techniques and methodologies for power quality analysis and disturbances classification in power systems: a review,” Generation, Transmission Distribution, IET, vol. 5, no. 4, pp. 519–529, 2011.
-  S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, pp. 33–61, 1998.
-  G. K. Sharma, A. Kumar, C. B. Rao, T. Jayakumar, and B. Raj, “Short time Fourier transform analysis for understanding frequency dependent attenuation in austenitic stainless steel,” NDT & E International, vol. 53, pp. 1–7, 2013.
-  T. Vega, V. Roig, and H. San Segundo, “Evolution of signal processing techniques in power quality,” in Electrical Power Quality and Utilisation, 9th International Conference on, 2007, pp. 1–5.
-  H. Eristi and Y. Demir, “Automatic classification of power quality events and disturbances using Wavelet transform and support vector machines,” Generation, Transmission Distribution, IET, vol. 6, pp. 968–976, 2012.
-  C. Naik and P. Kundu, “Identification of short duration power quality disturbances employing S-transform,” in Power and Energy Systems (ICPS), 2011 International Conference on, 2011, pp. 1–5.
-  N. Huang, L. Lin, W. Huang, and J. Qi, “Review of power-quality disturbance recognition using S-transform,” in Control, Automation and Systems Engineering (CASE), IITA International Conference on, 2009, pp. 438–441.
-  N. M. Khoa, D. T. Viet, and N. H. Hieu, “Classification of power quality disturbances using Wavelet transform and K-nearest neighbor classifier,” in Industrial Electronics (ISIE), IEEE International Symposium on, 2013, pp. 1–4.
-  J. Wang and C. Wang, “Bayes method of power quality disturbance classification,” in TENCON 2005 IEEE Region 10, 2005, pp. 1–4.
-  P. G. V. Axelberg, I. Y. H. Gu, and M. H. J. Bollen, “Support vector machine for classification of voltage disturbances,” Power Delivery, IEEE Transactions on, vol. 22, no. 3, pp. 1297–1303, 2007.
-  M. Reaz, F. Choong, M. Sulaiman, F. Mohd-Yasin, and M. Kamada, “Expert system for power quality disturbance classifier,” Power Delivery, IEEE Transactions on, vol. 22, no. 3, pp. 1979–1988, 2007.
-  A. M. Gargoom, N. Ertugrul, and W. L. Soong, “Investigation of effective automatic recognition systems of power-quality events,” Power Delivery, IEEE Transactions on, vol. 22, no. 4, pp. 2319–2326, 2007.
-  I. W. C. Lee and P. K. Dash, “S-transform-based intelligent system for classification of power quality disturbance signals,” Industrial Electronics, IEEE Transactions on, vol. 50, no. 4, pp. 800–805, 2003.
-  T. Jayasree, D. Sam Harrison, and T. Sree Rangaraja, “Automated classification of power quality disturbances using Hilbert Huang transform and RBF networks,” International Journal of Soft Computing and Engineering, vol. 1, no. 5, pp. 217–223, 2011.
-  D. Saxena and K. S. Verma, “Wavelet transform based power quality events classification using artificial neural network and SVM,” International Journal of Engineering, Science and Technology, vol. 4, no. 1, pp. 87–96, 2012.
-  J. Kovacevic and A. Chebira, An Introduction to Frames, ser. Foundations and Trends in Signal Processing: Issue 1. Now Publishers, 2008, vol. 2, no. 1.
-  M. S. Lewicki, T. J. Sejnowski, and H. Hughes, “Learning overcomplete representations,” Neural Computation, vol. 12, pp. 337–365, 1998.
-  C. M. Bishop, Pattern Recognition And Machine Learning (Information Science And Statistics). Springer, 2007.
-  K. P. Murphy, Machine Learning: A Probabilistic Perspective (Adaptive Computation And Machine Learning Series). The MIT Press, 2012.
-  M. H. Bollen and I. Gu, Signal processing of power quality disturbances. Wiley-IEEE Press, 2006, vol. 30.
-  S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” Signal Processing, IEEE Transactions on, vol. 41, no. 12, pp. 3397–3415, 1993.
-  R. Tibshirani, “Regression shrinkage and selection via the Lasso,” Journal of the Royal Statistical Society, vol. 58, pp. 267–288, 1996.
-  L. Meier, S. van de Geer, and P. Bühlmann, “The Group Lasso for logistic regression,” Journal of the Royal Statistical Society, vol. 70, no. 1, pp. 53–71, 2008.
-  M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society, vol. 68, pp. 49–67, 2006.
-  W. J. Fu, “Penalized Regressions: The Bridge versus the Lasso,” Journal of Computational and Graphical Statistics, vol. 7, no. 3, pp. 397–416, 1998.
-  P. Janik and T. Lobos, “Automated classification of power-quality disturbances using SVM and RBF networks,” Power Delivery, IEEE Transactions on, vol. 21, no. 3, pp. 1663–1669, 2006.
-  T. Abdel-Galil, M. Kamel, A. Youssef, E. El-Saadany, and M. Salama, “Power quality disturbance classification using the inductive inference approach,” Power Delivery, IEEE Transactions on, vol. 19, no. 4, pp. 1812–1818, 2004.
-  S. Roy and S. Nath, “Classification of power quality disturbances using features of signals,” International Journal of Scientific and Research Publications, vol. 2, no. 11, pp. 1–9, 2012.
-  I. Monedero, C. Leon, J. Ropero, A. Garcia, J. Elena, and J. C. Montano, “Classification of electrical disturbances in real time using neural networks,” Power Delivery, IEEE Transactions on, vol. 22, no. 3, pp. 1288–1296, 2007.