PALM: An Incremental Construction of Hyperplanes for Data Stream Regression

Data streams have been an underlying challenge in the age of big data because they call for real-time data processing in the absence of a retraining process and/or an iterative learning approach. In the fuzzy system community, data streams are handled by the algorithmic development of self-adaptive neuro-fuzzy systems (SANFS), characterized by a single-pass learning mode and an open structure property which enables effective handling of the fast and rapidly changing nature of data streams. The underlying bottleneck of SANFSs lies in their design principle, which involves a high number of free parameters (rule premise and rule consequent) to be adapted in the training process. This figure can even double in the case of a type-2 fuzzy system. In this work, a novel SANFS, namely the parsimonious learning machine (PALM), is proposed. PALM features a new type of fuzzy rule based on the concept of hyperplane clustering, which significantly reduces the number of network parameters because it has no rule premise parameters. PALM is proposed in both type-1 and type-2 fuzzy systems, both of which characterize a fully dynamic rule-based system. That is, it is capable of automatically generating, merging and tuning the hyperplane-based fuzzy rules in a single-pass manner. The efficacy of PALM has been evaluated through a numerical study with six real-world and synthetic data streams from public databases and our own real-world project on autonomous vehicles. The proposed model showcases significant improvements in terms of computational complexity and number of required parameters against several renowned SANFSs, while attaining comparable and often better predictive accuracy.






I Introduction

Advances in both hardware and software technologies have triggered the generation of a large quantity of data in an automated way. Such applications can be exemplified by space exploration, autonomous systems, aircraft, meteorological analysis, stock market analysis, sensor networks, internet usage, etc., where the generated data are not only massive and possibly unbounded but also produced at a rapid rate under complex environments. Such online data are known as data streams [1, 2]. A data stream can be expressed more formally [3] as an enormous, possibly unbounded, sequence of data objects, each of which is defined by a d-dimensional feature vector that may belong to a continuous, categorical, or mixed feature space. In the field of data stream mining, developing a learning algorithm as a universal approximator is challenging due to the following factors: 1) the whole dataset to train the learning algorithm is never available at once since the data arrive continuously; 2) the size of a data stream is not bounded; 3) the volume of data to be processed is huge; 4) the distribution of the incoming unseen data may slide over time slowly, rapidly, abruptly, gradually, locally, globally, cyclically or otherwise. Such variations in the data distribution of data streams over time are known as concept drift [4, 5]; 5) data are discarded after being processed to keep memory consumption at a practical level.

To cope with the above-stated challenges of data streams, a learning machine should be equipped with the following features: 1) the capability of working in a single-pass mode; 2) handling various concept drifts in data streams; 3) a low memory burden and computational complexity to enable real-time deployment under resource-constrained environments. In the realm of fuzzy systems, such learning aptitude is demonstrated by the Self-Adaptive Neuro-Fuzzy System (SANFS) [6].

Until now, existing SANFSs have usually been constructed via hypersphere-based or hyperellipsoid-based clustering techniques (HSSC or HESC) to automatically partition the input space into a number of fuzzy rules, and they rely on the assumption of a normal distribution due to the use of the Gaussian membership function [7, 8, 9, 10, 11, 12, 13, 14, 15]. As a result, they are always associated with rule premise parameters, the mean and width of the Gaussian function, which need to be continuously adjusted. This issue complicates their implementation in complex and deep structures. As a matter of fact, existing neuro-fuzzy systems can be seen as single hidden layer feedforward networks. Other than HSSC or HESC, the data cloud based clustering (DCBC) concept is utilized in [16, 17] to construct SANFSs. Unlike HSSC and HESC, data clouds do not have any specific shape; therefore, DCBC requires fewer parameters than HSSC and HESC. However, in DCBC, parameters such as the mean and the accumulated distance of a specific point to all other points still need to be calculated. In other words, it does not offer a significant reduction in the computational complexity and memory demand of SANFSs. Hyperplane-Based Clustering (HPBC) provides a promising avenue to overcome this drawback because it bridges the rule premise and the rule consequent by means of the hyperplane construction.

Although the concept of HPBC has existed for the last two decades [18, 19, 20], all such methods are characterized by a static structure and are not compatible with data stream analytics due to their offline characteristics. Besides, the majority of these algorithms still use the Gaussian or bell-shaped membership function [21] to create the rule premise and are not free of rule premise parameters. This problem is solved in [22], where a new function is proposed to accommodate the hyperplanes directly in the rule premise. Nevertheless, that model also exhibits a fixed structure and operates in the batch learning mode. Based on this research gap, a novel SANFS, namely the parsimonious learning machine (PALM), is proposed in this work. The novelty of this work can be summarized as follows:

  1. PALM is constructed using the HPBC technique and its fuzzy rule is fully characterized by a hyperplane which underpins both the rule consequent and the rule premise. This strategy reduces the rule base parameters to the level of R(n+1), where R and n are respectively the number of fuzzy rules and the input dimension.

  2. PALM is proposed in both type-1 and type-2 versions derived from the concepts of type-1 and type-2 fuzzy systems. The type-1 version incurs fewer network parameters and a faster training speed, whereas the type-2 version expands the degree of freedom of the type-1 version by applying the interval-valued concept, making it more robust against uncertainty.

  3. PALM features a fully open network structure where its rules can be automatically generated, merged and updated on demand in a one-pass learning fashion. The rule generation process is based on the self-constructive clustering approach [23, 24], checking the coherence of the input and output spaces. The rule merging scenario is driven by a similarity analysis via the distance and orientation of two hyperplanes. The online hyperplane tuning scenario is executed using the fuzzily weighted generalized recursive least squares (FWGRLS) method.

  4. An extension of PALM, namely recurrent PALM (rPALM), is put forward in this work. rPALM addresses the underlying bottleneck of the HPBC method: dependency on the target variable due to the definition of the point-to-hyperplane distance [25]. This concept is inspired by the teacher forcing mechanism in the deep learning literature, where the activation degree of a node is calculated with respect to the predictor’s previous output. The performance of rPALM has been numerically validated in our supplemental document, where it is slightly inferior to PALM but still highly competitive with the most prominent SANFSs in terms of accuracy.

  5. Two real-world problems from our own project, namely online identification of a quadcopter unmanned aerial vehicle (UAV) and a helicopter UAV, are presented in this paper and exemplify real-world streaming data problems. The two datasets were collected from indoor flight tests in the UAV lab of the University of New South Wales (UNSW), Canberra campus. These datasets and the PALM and rPALM codes are made publicly available in [26].

The efficacy of both type-1 and type-2 PALM has been numerically evaluated using six real-world and synthetic streaming data problems. Moreover, PALM is also compared against prominent SANFSs in the literature and demonstrates encouraging numerical results, generating a compact and parsimonious network structure while delivering comparable and even better accuracy than the other benchmarked algorithms.

The remainder of this paper is structured as follows: Section II discusses a literature survey of closely related works. In Section III, the network architecture of both type-1 and type-2 PALM is elaborated. Section IV describes the online learning policy of type-1 PALM, while Section V presents the online learning mechanism of type-2 PALM. In Section VI, the proposed PALM’s efficacy is evaluated through real-world and synthetic data streams. Finally, the paper ends by drawing concluding remarks in Section VII.

II Related Work and Research Gap with the State-of-the-Art Algorithms

SANFSs can be employed for data stream regression since they can learn from scratch with no base knowledge and are embedded with a self-organizing property to adapt to changing system dynamics [27]. They work fully in a single-pass learning scenario, which is efficient for online learning under limited computational resources. An early work in this domain is seen in [6], where an SANFS, namely SONFIN, was proposed. The evolving clustering method (ECM) is implemented in [28] to evolve fuzzy rules. Another pioneering work in this area is the development of the online evolving T-S fuzzy system eTS [7] by Angelov. eTS has been improved in several follow-up works: eTS+ [29], Simpl_eTS [8], AnYa [16]. However, eTS+ and Simpl_eTS generate axis-parallel ellipsoidal clusters, which cannot deal effectively with non-axis-parallel data distributions. To deal with such distributions, an evolving multivariable Gaussian (eMG) function was introduced into the fuzzy system in [30]. Another example of an SANFS exploiting the multivariable Gaussian function is found in [10], where the concept of statistical contribution is implemented to grow and prune the fuzzy rules on the fly. This work has been extended in [9], where the idea of statistical contribution is used as a basis of input contribution estimation for the online feature selection scenario.

The idea of SANFS was implemented in a type-2 fuzzy system in [31]. Afterwards, the concept was extended to a local recurrent architecture [32] and an interactive recurrent architecture [33]. These works utilize the Karnik-Mendel (KM) type reduction technique [34], which relies on an iterative approach to find the left-most and right-most points. To mitigate this shortcoming, the KM type reduction technique can be replaced with the design coefficient [35] introduced in [36]. SANFS is also introduced under the context of the metacognitive learning machine (McLM), which encompasses three fundamental pillars of human learning: what-to-learn, how-to-learn, and when-to-learn. The idea of McLM was introduced in [37]. McLM has been modified with the use of Scaffolding theory into McSLM, which aims to realize a plug-and-play learning fashion [38]. To handle uncertainty, temporal system dynamics and an unknown system order, McSLM was extended into the recurrent interval-valued metacognitive scaffolding fuzzy neural network (RIVMcSFNN) [11]. The vast majority of SANFSs are developed using the concepts of HSSC and HESC, which impose considerable memory demand and computational burden because both the rule premise and the rule consequent have to be stored and evolved during the training process.

III Network Architecture of PALM

In this section, the network architecture of PALM is presented in detail. The T-S fuzzy system is a commonly used technique to approximate complex nonlinear systems due to its universal approximation property. The rule base of the T-S fuzzy model of a multi-input single-output (MISO) system can be expressed in the following IF-THEN rule format:


R_i: IF x_1 is A_{i1} AND ... AND x_n is A_{in} THEN y_i = a_{i1} x_1 + ... + a_{in} x_n + b_i   (1)

where R_i stands for the i-th rule, i = 1, ..., R, and R indicates the number of rules; n denotes the dimension of the input feature space; x_j is the j-th input feature; a_{ij} and b_i are the consequent parameters of the sub-model belonging to the i-th rule; and y_i is the output of the i-th sub-model. The T-S fuzzy model can approximate a nonlinear system with a combination of several piecewise linear systems by partitioning the entire input space into several fuzzy regions. It expresses each input-output region with a linear equation as presented in (1). Identifying the T-S fuzzy model directly leads to a nonlinear programming problem, which hinders its practical use. A simple solution to the problem is the utilization of clustering techniques to identify the rule premise parameters. Because a linear equation is generated in the consequent part, HPBC can be applied to construct the T-S fuzzy system efficiently. The advantages of using HPBC in the T-S fuzzy model can be seen graphically in Fig. 1.

Figure 1: Clustering in T-S fuzzy model using hyperplanes

Some popular algorithms with HPBC are the fuzzy C-regression model (FCRM) [39], fuzzy C-quadratic shell (FCQS) [40], double FCM [18], and the interval type-2 fuzzy c-regression model (IT2-FCRM) [22]. A main limitation of these algorithms is their non-incremental nature, which is not suitable for data stream regression. Moreover, they still deploy the Gaussian function to represent the rule premise of the T-S fuzzy model, which does not exploit the parameter efficiency trait of HPBC. To fill this research gap, the membership function proposed in [22] is adopted to accommodate the use of hyperplanes in the rule premise part of the T-S fuzzy system. It can be expressed as:


where R is the number of rules and f is an adjustment parameter which controls the fuzziness of the membership grades. Its range is settled based on the observation in [22] and empirical analysis with a variety of data streams in our work. d_i denotes the distance from the present sample to the i-th hyperplane. In our work, d_i is defined following [22] as:


where x_n and w_i respectively stand for the input vector of the n-th observation and the output weight vector of the i-th rule. This membership function enables the incorporation of HPBC directly into the T-S fuzzy system with the absence of rule premise parameters except the first-order linear function or hyperplane. Because a point-to-plane distance is not unique, the compatibility measure is executed using the minimum point-to-plane distance. The following discusses the network structure of PALM, encompassing its type-1 and type-2 versions. PALM can be modeled as a four-layered network working in tandem, where the fuzzy rule triggers a hyperplane-shaped cluster and is induced by (3). Since T-S fuzzy rules can be developed solely using a hyperplane, PALM is free of antecedent parameters, which results in a dramatic reduction of network parameters. Furthermore, it operates in a one-pass learning fashion where it works point by point and a data point is discarded directly once learned.

III-A Structure of the Type-1 PALM Network

In the type-1 PALM network architecture, the membership function exposed in (2) is utilized to fit the hyperplane-shaped clusters in identifying the type-1 T-S fuzzy model. To understand the workflow, let us consider that a single data point is fed into PALM at the n-th observation. Appertaining to the concept of type-1 fuzzy systems, this crisp data point needs to be transformed into a fuzzy set. This fuzzification process is attained using a type-1 hyperplane-shaped membership function, which is framed through the concept of the point-to-plane distance. This hyperplane-shaped type-1 membership function can be expressed as:


where d_i in (4) denotes the distance between the current sample and the i-th hyperplane, as in (3). It is defined as per the definition of a point-to-plane distance [25] and is formally expressed as follows:


where w_i is the consequent parameter vector of the i-th rule, n is the input dimension, and y is the target variable. The exertion of y is an obstruction for PALM due to the target variable’s unavailability in the testing phase. This issue comes into the picture due to the definition of a point-to-hyperplane distance [25]. To eradicate this impediment, a recurrent PALM (rPALM) framework is developed; we refer curious readers to the supplementary document for details on rPALM. Considering a MISO system, the IF-THEN rule of type-1 PALM can be expressed as follows:


where x_e is the extended input vector, expressed by inserting the intercept 1 into the original input vector as x_e = [1, x_1, ..., x_n]; w_i is the weight vector of the i-th rule; and w_i x_e is the consequent part of the i-th rule. Since type-1 PALM has no premise parameters, the antecedent part is simply the hyperplane. It is observed from (6) that the drawback of the HPBC-based T-S fuzzy system lies in the high-level fuzzy inference scheme, which degrades the transparency of the fuzzy rule. The intercept of the extended input vector controls the slope of the hyperplane, which functions to prevent the untypical gradient problem.

The consequent part is akin to the rule consequent of the basic T-S fuzzy model. The consequent for the i-th hyperplane is calculated by weighting the extended input vector with its corresponding weight vector as follows:


The weight vector used in (7) is updated recursively by the FWGRLS method, which ensures a smooth change in the weight values. In the next step, the rule firing strength is normalized and combined with the rule consequent to produce the end output of type-1 PALM. The final crisp output of the type-1 PALM model can be expressed as follows:


The normalization term in (8) guarantees the partition of unity, where the sum of the normalized membership degrees is unity. The T-S fuzzy system is functionally equivalent to the radial basis function (RBF) network if the rule firing strength is directly connected to the output of the consequent layer [41]. It is also evident that the final crisp output is produced by the weighted average defuzzification scheme.
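To make the inference chain concrete, the sketch below implements a type-1 PALM forward pass under stated assumptions: since equations (2)-(8) are not reproduced in this copy, the exponential membership form and the fuzziness parameter `gamma` are placeholders for the paper's actual membership function, while the point-to-plane distance, the normalized firing strengths and the weighted-average defuzzification follow the text.

```python
import numpy as np

def point_to_hyperplane_distance(x_e, y, w):
    """Distance from the point (x, y) to the hyperplane y_hat = w . x_e.
    In implicit form the plane is w . x_e - y = 0 with normal vector
    [w_1, ..., w_n, -1]; the intercept w_0 does not enter the normal."""
    return abs(np.dot(w, x_e) - y) / np.sqrt(np.dot(w[1:], w[1:]) + 1.0)

def type1_palm_output(x, y, W, gamma=1.0):
    """Sketch of the type-1 PALM forward pass.

    x : input vector (n,);  y : target, required by the distance definition;
    W : (R, n+1) hyperplane weights, one row per rule (intercept first);
    gamma : assumed fuzziness parameter of the membership function.
    """
    x_e = np.concatenate(([1.0], x))          # extended input with intercept
    d = np.array([point_to_hyperplane_distance(x_e, y, w) for w in W])
    mu = np.exp(-gamma * d)                   # assumed distance-decaying membership
    lam = mu / mu.sum()                       # normalized firing strengths (partition of unity)
    y_rules = W @ x_e                         # per-rule consequent outputs
    return float(lam @ y_rules)               # weighted-average defuzzification
```

When every hyperplane passes through the observed point, all distances vanish and the output reduces to the plain average of the rule consequents, which matches the partition-of-unity property stated above.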

III-B Network Structure of the Type-2 PALM

Type-2 PALM differs from the type-1 variant in the use of an interval-valued hyperplane generating the type-2 fuzzy rule. Akin to its type-1 version, type-2 PALM starts operating by taking in the crisp input data stream to be fuzzified. Here, the fuzzification occurs with the help of an interval-valued hyperplane-based membership function, which can be expressed as:


where the interval is bounded by the upper and lower hyperplanes, and the interval-valued distance comprises the distance between the present input sample and the i-th upper hyperplane and that between the present input sample and the i-th lower hyperplane. In the type-2 architecture, the distances between the incoming input data and the upper and lower hyperplanes are calculated as follows:


where the upper and lower weight vectors are the interval-valued coefficients of the rule consequent of type-2 PALM. Like the type-1 variant, type-2 PALM has a dependency on the target value y. Therefore, it is also extended into a type-2 recurrent structure, elaborated in the supplementary document. The use of interval-valued coefficients results in an interval-valued firing strength, which forms the footprint of uncertainty (FoU). The FoU is the key component against the uncertainty of data streams and sets the degree of tolerance against uncertainty.

In a MISO system, the IF-THEN rule of type-2 PALM can be expressed as:


where x_e is the extended input vector, the interval-valued weight vector comprises the lower and upper weight vectors of the i-th rule, and the consequent part of the i-th rule is interval-valued as well, whereas the antecedent part is merely the interval-valued hyperplane. The type-2 fuzzy rule is similar to that of the type-1 variant except for the presence of the interval-valued firing strength and interval-valued weight vector. In type-2 PALM, the consequent part is calculated by weighting the extended input vector with the interval-valued output weight vectors as follows:


The lower weight vector for the i-th lower hyperplane and the upper weight vector for the i-th upper hyperplane are initialized by allocating a higher value to the upper weight vector than to the lower one. These vectors are updated recursively by the FWGRLS method, which ensures a smooth change in the weight values.

Before performing the defuzzification, a type reduction mechanism is carried out to craft the type-reduced set - the transformation from a type-2 fuzzy variable to a type-1 fuzzy variable. One of the commonly used type-reduction methods is the Karnik-Mendel (KM) procedure [34]. However, the KM method involves an iterative process: the rule consequents must first be reordered in ascending order before the cross-over points are found iteratively, incurring an expensive computational cost. Therefore, instead of the KM method, the design factor [35] is utilized to orchestrate the type reduction process. The final crisp output of the type-2 PALM can be expressed as follows:




where y_l and y_r are the left and right outputs resulting from the type reduction mechanism, and q_l and q_r, utilized in (14) and (15), are the design factors initialized in a way that satisfies the condition q_l < q_r. In our design, q_l and q_r steer the proportions of the upper and lower rules in the final crisp outputs y_l and y_r of PALM. The normalization process of the type-2 fuzzy inference scheme [36] was modified in [11] to prevent the generation of an invalid interval; the generation of this invalid interval under the normalization of [36] was also proved in [11]. Therefore, the normalization process adopted in [11] is applied in our work and advanced in terms of q_l and q_r. Besides, to improve the performance of the proposed PALM, q_l and q_r are not left constant but are continuously adapted using the gradient descent technique, as explained in Section IV. Notwithstanding that type-2 PALM is supposed to handle uncertainty better than its type-1 variant, it incurs a higher number of network parameters, in the level of 2R(n+1), as a result of the use of upper and lower weight vectors. In addition, the implementation of the q-design factor imposes extra computational cost because q_l and q_r call for a tuning procedure with the gradient descent method.
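As an illustration of the q-design-factor type reduction, the sketch below computes a type-2 PALM output from lower and upper hyperplanes. The exponential membership and the exact mixing of lower/upper contributions in (14)-(15) are assumptions, since those equations are not reproduced in this copy; only the structure (left/right outputs mixed by q_l < q_r, then averaged) follows the text.

```python
import numpy as np

def type2_palm_output(x, y, W_low, W_up, q_l=0.3, q_r=0.7, gamma=1.0):
    """Sketch of the type-2 PALM output with q-design-factor type reduction.

    W_low, W_up : (R, n+1) lower/upper hyperplane weights (intercept first);
    q_l < q_r steer the shares of the lower and upper rules in the left and
    right type-reduced outputs (assumed mixing form).
    """
    x_e = np.concatenate(([1.0], x))
    def contribution(W):
        # firing strengths from point-to-hyperplane distances, then weighted output
        d = np.abs(W @ x_e - y) / np.sqrt((W[:, 1:] ** 2).sum(axis=1) + 1.0)
        mu = np.exp(-gamma * d)
        return (mu / mu.sum()) @ (W @ x_e)
    out_low, out_up = contribution(W_low), contribution(W_up)
    y_left  = q_l * out_low + (1.0 - q_l) * out_up   # left end of type-reduced set
    y_right = q_r * out_low + (1.0 - q_r) * out_up   # right end
    return 0.5 * (y_left + y_right)                  # final crisp output
```

When the lower and upper hyperplanes coincide, the FoU collapses and the output degenerates to the type-1 result, as expected.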

IV Online Learning Policy in Type-1 PALM

This section describes the online learning policy of the proposed type-1 PALM. PALM is capable of starting its learning process from scratch with an empty rule base. Its fuzzy rules can be automatically generated on the fly using the self-constructive clustering (SSC) method, which checks the input and output coherence. The complexity reduction mechanism is implemented using a hyperplane merging module which vets the similarity of two hyperplanes using distance and angle concepts. The hyperplane-based fuzzy rules are adjusted using the FWGRLS method in a single-pass learning fashion.

IV-A Mechanism of Growing Rules

The rule growing mechanism of type-1 PALM is adopted from the self-constructive clustering (SSC) method developed in [23, 24] to adapt the number of rules. This method has been successfully applied to automatically generate interval-valued data clouds in [17], but its use for HPBC deserves an in-depth investigation. In this technique, the rule significance is measured by calculating the input and output coherence. The coherence is measured by analysing the correlation between the existing data samples and the target concept. Denoting the input vector as X, the target vector as T, and the hyperplane of the i-th local sub-model as H_i, the input and output coherence between X (or T) and each H_i are calculated as follows:


where χ(·, ·) expresses the correlation function. Various linear and nonlinear correlation methods could be applied. Among them, nonlinear methods for measuring the correlation between variables are hard to employ in the online environment since they commonly use discretization or the Parzen window method. On the other hand, Pearson correlation is a widely used method for measuring the correlation between two variables. However, it suffers from some limitations: its insensitivity to the scaling and translation of variables and its sensitivity to rotation [42]. To solve these problems, a method called the maximal information compression index (MCI) was proposed in [42], which has also been utilized in the SSC method to measure the correlation between variables as follows:



where var(x) and var(y) express the variance of x and y respectively, cov(x, y) presents the covariance between the two variables x and y, and ρ(x, y) stands for the Pearson correlation index of x and y. In a similar way, the correlations of (X, H_i) and (T, H_i) can be measured using (18) and (19). In addition, the MCI method measures the amount of information compressed when a newly observed sample is ignored. The properties of the MCI method in our work can be expressed as follows:

  1. the maximum possible correlation value is 0.5(var(x) + var(y));

  2. it expresses symmetric behavior: λ(x, y) = λ(y, x);

  3. it is invariant against translation of the dataset;

  4. it expresses robustness against rotation.
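The MCI has a closed form: it is the smaller eigenvalue of the 2x2 covariance matrix of the pair of variables [42], which directly yields the properties above. A minimal implementation:

```python
import numpy as np

def mci(u, v):
    """Maximal information compression index of two variables: the smallest
    eigenvalue of their 2x2 covariance matrix. It is 0 when u and v are
    perfectly linearly correlated and grows as linear dependence weakens."""
    var_u, var_v = np.var(u), np.var(v)
    rho = np.corrcoef(u, v)[0, 1]          # Pearson correlation index
    s = var_u + var_v
    disc = max(s * s - 4.0 * var_u * var_v * (1.0 - rho * rho), 0.0)
    return 0.5 * (s - np.sqrt(disc))
```

Because only variances and the Pearson index enter the formula, translating either variable leaves the value unchanged, and swapping the arguments gives the same result.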

The input coherence is projected to explore the similarity between X and H_i directly, while the output coherence is meant to examine the dissimilarity between X and H_i indirectly by utilizing the target vector T as a reference. In the present hypothesis, the input and output coherence need to satisfy the following conditions for a new rule or hyperplane to be added:


where the input and output coherence thresholds are predetermined. If the hypothesis satisfies both conditions of (20), a new rule is added with the highest input coherence. Besides, the number of data points accommodated by a rule is updated, and the correlation measures are updated with (18) and (19). Due to the utilization of the local learning scenario, each rule is adapted separately and therefore the covariance matrix is independent for each rule. When a new hyperplane is added by satisfying (20), the hyperplane parameters and the output covariance matrix of the FWGRLS method are crafted as follows:


Due to the utilization of the local learning scenario, the consequent of a newly added rule can be assigned as that of the closest rule, since the expected trend in the local region can be portrayed easily from the nearest rule. The initialization constant in (21) is set to a very large value; the reason for initializing the C matrix with a large value is to obtain fast convergence to the real solution [43]. The proof of this consequent parameter setting is detailed in [44]. In addition, the covariance matrices of the individual rules have no relationship with each other. Thus, when rules are pruned in the rule merging module, their covariance matrices and consequent parameters are deleted, as this does not affect the convergence characteristics of the C matrices and consequents of the remaining rules.

IV-B Mechanism of Merging Rules

In SANFSs, the rule evolution mechanism usually generates redundant rules. These unnecessary rules create complexity in the rule base, which hinders some desirable features of fuzzy rules: transparency and tractability in their operation. Notably, in handling data streams, two overlapping clusters or rules may easily be obtained when new samples occupy the gap between two existing clusters. Several useful methods have been employed to merge redundant rules or clusters in [29, 45, 9, 17]. However, all these techniques are appropriate mainly for hypersphere-based or ellipsoid-based clusters.

In the realm of hyperplane clusters, there is a possibility of generating a higher number of hyperplanes for the same dataset than spherical or ellipsoidal clusters because of the nature of HPBC, in which each hyperplane represents a specific operating region of the approximation curve. This opens a higher chance of generating redundant rules than HSSC and HESC. Therefore, an appropriate merging technique is vital and has to achieve a tradeoff between the diversity of fuzzy rules and the generalization power of the rule base. For clarity, the merging of two hyperplanes due to newly incoming training data samples is illustrated in Fig. 2.

Figure 2: Merging of redundant hyperplanes (rules) due to newly incoming training samples

In [46], to merge the hyperplanes, the similarity and dissimilarity between them are obtained by measuring only the angle between the hyperplanes. This strategy is, however, not conclusive for deciding the similarity between two hyperplanes because it solely considers the orientation of the hyperplanes without looking at their relationship in the target space.

In our work, to measure the similarity between the hyperplane-shaped fuzzy rules, the angle between them is estimated as follows [47, 9]:


where the angle ranges between 0 and π/2 radians. The angle between the hyperplanes is not sufficient to decide whether the rule merging scenario should take place because it does not inform the closeness of the two hyperplanes in the target space. Therefore, the spatial proximity between the two hyperplanes in the hyperspace is taken into account. If we consider two hyperplanes H_i and H_j, then the minimum distance between them can be projected as follows:


The rule merging condition is formulated as follows:


where the angle and distance thresholds are predefined. If (24) is satisfied, the fuzzy rules are merged. It is worth noting that the merging technique is only applicable in the local learning context because, in the case of global learning, the orientation and similarity of two hyperplanes have no direct correlation to their relationship.

In our merging mechanism, a dominant rule having higher support is retained, whereas a less dominant hyperplane (rule), resided in by fewer samples, is pruned to realize the structural simplification scenario of PALM. A dominant rule has a higher influence on the merged cluster because it represents the underlying data distribution. That is, the dominant rule is kept in the rule base so that a good partition of the data space is maintained and even improved. For simplicity, the weighted average strategy is adopted in merging two hyperplanes as follows:


where w_i is the output weight vector of the dominant i-th rule, w_j is the output weight vector of the j-th rule, w_m is the output weight vector of the merged rule, and N_i and N_j denote the populations (supports) of the fuzzy rules. Note that rule i is more influential than rule j, since N_i > N_j. The rule merging procedure is committed during the stable period where no addition of rules occurs. This strategy aims to attain a stable rule evolution and prevents new rules from being merged straightaway after being introduced into the rule base. As an alternative, Yager’s participatory learning-inspired merging scenario [45] can be used to merge the two hyperplanes.
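A minimal sketch of the two similarity measures and the support-weighted merge follows. Computing the angle between the hyperplane normals [w_1, ..., w_n, -1] (with the intercept excluded from the normal) is an implementation choice, not prescribed by the text, and the thresholds of (24) are left to the caller:

```python
import numpy as np

def hyperplane_angle(w_i, w_j):
    """Angle in [0, pi/2] between two hyperplanes y = w . x_e, computed from
    their normal vectors [w_1..w_n, -1]; parallel planes give angle 0."""
    n_i = np.concatenate((w_i[1:], [-1.0]))
    n_j = np.concatenate((w_j[1:], [-1.0]))
    c = abs(n_i @ n_j) / (np.linalg.norm(n_i) * np.linalg.norm(n_j))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def merge_rules(w_i, n_i, w_j, n_j):
    """Support-weighted average of two rules; the dominant rule (larger
    support n_i) pulls the merged hyperplane toward itself."""
    w_merged = (n_i * w_i + n_j * w_j) / (n_i + n_j)
    return w_merged, n_i + n_j
```

Two parallel hyperplanes that differ only in their intercepts yield an angle of zero, which is exactly why the distance check of (23)-(24) is needed in addition to the angle.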

IV-C Adaptation of Hyperplanes

In previous works on hyperplane-based T-S fuzzy systems [48], the recursive least squares (RLS) method is employed to calculate the parameters of the hyperplanes. As an advancement of the RLS method, a term decaying the consequent parameters is added to the cost function of the RLS method in [49], helping to obtain solid generalization performance - the generalized recursive least squares (GRLS) approach. However, that approach is formed in the context of global learning. A local learning method has some advantages over its global counterpart: interpretability and robustness against noise. The interpretability is supported by the fact that each hyperplane portrays a specific operating region of the approximation curve. Also, in local learning, the generation or deletion of a rule does not harm the convergence of the consequent parameters of the other rules, which results in a significantly more stable updating process [50].

Owing to the desired features of the local learning scenario, the GRLS method is extended in [9, 11] into the Fuzzily Weighted Generalised Recursive Least Square (FWGRLS) method. FWGRLS can also be seen as a variation of the Fuzzily Weighted Recursive Least Square (FWRLS) method [7] with the insertion of a weight decay term. The FWGRLS method is adopted in the proposed type-1 PALM, where the cost function can be expressed as:


where the cost function involves a diagonal matrix of firing strengths, a regularization parameter, a decaying factor, the extended input vector, the covariance matrix, and the local subsystem of each hyperplane. Following a similar approach to [9], the final expression of the FWGRLS approach is formed as follows:





with the initial conditions


where the update involves the Kalman gain, the number of rules, and a covariance matrix initialized with a large positive constant. In this work, the regularization parameter is assigned an extremely small value. It can be observed that the FWGRLS method coincides with the RLS method up to the weight decay term. This term nudges the output weights by a small amount in every update, minimizing the impact of inconsequential rules. The quadratic weight decay function chosen in PALM is written as follows:


Its gradient can be expressed as:


By utilizing this function, the adapted weight is shrunk by a factor proportional to its present value. This intensifies the generalization capability by keeping the dynamics of the output weights at small values [51].
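The effect of the quadratic decay can be sketched as below; the constant name and the standalone update line are illustrative only, since in FWGRLS the decay gradient appears inside the full recursive update rather than in isolation:

```python
import numpy as np

# Quadratic weight decay (sketch; LAMBDA is a hypothetical constant name).
LAMBDA = 1e-7   # extremely small regularization parameter, as in the text

def decay(w):
    """Quadratic decay function: 0.5 * ||w||^2 (up to the lambda factor)."""
    return 0.5 * np.dot(w, w)

def decay_grad(w):
    """Its gradient is w itself: the shrinkage applied to each weight is
    proportional to the present value of that weight."""
    return w

# Inside the consequent update, the decay term nudges weights toward zero:
w = np.array([0.5, -2.0])
w_updated = w - LAMBDA * decay_grad(w)   # each weight shrinks slightly
```

Because the gradient equals the weight itself, large consequent parameters are pulled toward zero more strongly than small ones, which is what keeps the output weights small.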

V Online Learning Policy in Type-2 PALM

The learning policy of the type-1 PALM is extended to the context of the type-2 fuzzy system, where the q design factors are utilized to carry out the type-reduction scenario. The learning mechanisms are detailed in the following subsections.

V-A Mechanism of Growing Rules

In the realm of the type-2 fuzzy system, the SSC method has been extended to the type-2 SSC (T2SSC) in [17]. It is adopted here and extended in terms of the left and right design factors, since the original work in [17] only deals with a single design factor. In the T2SSC method, the rule significance is measured by calculating the input and output coherence, as done in the type-1 system. Considering the interval-valued hyperplane of each local sub-model, the input and output coherence for our proposed type-2 system can be extended as follows:




Unlike the direct calculation of the input coherence in the type-1 system, in the type-2 system the input coherence is calculated using (37) based on the left and right input coherence. By using the MCI method in the T2SSC rule growing process, the correlation is measured using (18) and (19), with the type-1 quantities substituted by their interval-valued counterparts. The conditions for growing rules remain the same as expressed in (20), only modified to fit the type-2 fuzzy system platform. The parameter settings for the predefined thresholds are as in the type-1 fuzzy model.

V-B Mechanism of Merging Rules

The merging mechanism of the type-1 PALM is extended for the type-2 fuzzy model. To merge the rules, both the angle and distance between two interval-valued hyperplanes are measured as follows:


where the angle and distance are computed from the upper and lower hyperplanes. Both measures also need to satisfy the condition of (24) to merge the rules, where the same threshold ranges as in the type-1 PALM are applied. The formula of the merged weight in (25) is extended for the interval-valued merged weight as follows:


where the merged upper and lower weights are the support-weighted averages of the corresponding interval-valued weights. As with the type-1 PALM, the weighted average strategy is followed in the rule merging procedure of the type-2 PALM.

V-C Learning of the Hyperplane Submodels Parameters

The FWGRLS method [9] is extended to adjust the upper and lower hyperplanes of the interval type-2 PALM. The final expression of the FWGRLS method is shown as follows:





where the upper and lower output weight vectors, Kalman gains, and covariance matrices are defined analogously to the type-1 case. The quadratic weight decay function of the FWGRLS method is retained in the type-2 PALM to provide the weight decay effect in the rule merging scenario.

Figure 3: (a) Online identification of helicopter (in hovering condition); (b) rule evolution in that identification using type-2 PALM (L)

V-D Adaptation of Design Factors

The design factor used in [11] is extended into left and right design factors to actualize a higher degree of freedom of the type-2 fuzzy model. They are initialized in such a way that the required ordering between the left and right design factors is maintained. In this adaptation process, the gradients of the left and right design factors with respect to the error can be expressed as follows:


After obtaining the gradients, the left and right design factors are updated as follows:


where the learning rate determines the step size of the adjustment and is therefore key to the convergence of the design factors. An adaptive strategy, as done in [38], can be implemented to shorten the convergence time without compromising the stability of the adaptation process.
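The gradient-descent step above can be sketched as follows; the function name and the fixed learning rate are illustrative, and `grad_ql`, `grad_qr` stand for the error gradients whose derivation is given in the text:

```python
# Gradient-descent update of the left/right design factors (sketch).
def update_design_factors(q_l, q_r, grad_ql, grad_qr, lr=0.01):
    """One update step: each design factor moves against its error
    gradient, scaled by the learning rate (the step size)."""
    q_l_new = q_l - lr * grad_ql
    q_r_new = q_r - lr * grad_qr
    return q_l_new, q_r_new
```

A smaller learning rate gives slower but more stable convergence; an adaptive learning rate, as noted above, trades this off automatically.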

V-E Impediments of the Basic PALM structure

In PALM, the hyperplane-shaped membership function is formulated from the distance exposed in (5). This distance is calculated using the true output value, based on the theory of point-to-hyperplane distance [25]. Therefore, PALM has a dependency on the true output in the deployment phase, where true outputs are usually not known. To circumvent this structural shortcoming, the so-called "teacher forcing" mechanism [52] is employed in PALM. In the teacher forcing technique, the network has connections from its outputs to its hidden nodes at the next time step. Based on this concept, the output of PALM is connected with the input layer at the next step, which constructs a recurrent PALM (RPALM) architecture. The modified distance formula for the RPALM architecture is provided in the supplementary document, and the code of the proposed RPALM is made available in [53]. Our numerical results demonstrate that RPALM produces a minor decrease of predictive accuracy compared to PALM but still outperforms many of the benchmarked SANFSs. The downside of RPALM is that the rules are slightly less transparent, because it relies on its predicted output of the previous time instant rather than the incoming input.
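The recurrent evaluation loop can be sketched as follows; `model.predict` is a hypothetical single-step predictor standing in for the actual RPALM forward pass:

```python
# RPALM-style recurrent evaluation (sketch): in deployment the true output
# is unavailable, so the model's previous prediction is fed back in place
# of the true output used by the hyperplane distance computation.
def recurrent_predict(model, u_stream, y0):
    y_prev = y0                      # seed with the last known true output
    predictions = []
    for u in u_stream:
        y_hat = model.predict(u, y_prev)
        predictions.append(y_hat)
        y_prev = y_hat               # feedback replaces teacher forcing
    return predictions
```

Each prediction thus depends on the model's own previous output, which is exactly why the resulting rules are slightly less transparent than in the basic PALM.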

Model | Reference | RMSE (testing) | NDEI (testing) | Rules | Inputs | Network parameters | Training samples | Execution time (s)
DFNN | [41] | 0.7800 | 4.8619 | 1 | 2 | 6 | 200 | 0.0933
GDFNN | [54] | 0.0617 | 0.3843 | 1 | 2 | 7 | 200 | 0.0964
FAOSPFNN | [55] | 0.0716 | 0.4466 | 1 | 2 | 4 | 200 | 0.0897
eTS | [7] | 0.0604 | 0.3763 | 5 | 2 | 30 | 200 | 0.0635
simp_eTS | [8] | 0.0607 | 0.3782 | 3 | 2 | 18 | 200 | 1.5255
GENEFIS | [9] | 0.0479 | 0.2988 | 2 | 2 | 18 | 200 | 0.0925
PANFIS | [10] | 0.0672 | 0.4191 | 2 | 2 | 18 | 200 | 0.3162
pRVFLN | [17] | 0.0478 | 0.2984 | 2 | 2 | 10 | 200 | 0.0614
Type-1 PALM (L) | - | 0.0484 | 0.3019 | 8 | 2 | 24 | 200 | 0.1972
Type-1 PALM (G) | - | 0.0439 | 0.2739 | 8 | 2 | 24 | 200 | 0.1244
Type-2 PALM (L) | - | 0.0377 | 0.2355 | 2 | 2 | 12 | 200 | 0.2723
Type-2 PALM (G) | - | 0.0066 | 0.0410 | 14 | 2 | 84 | 200 | 0.3558
Table I: Modeling of the Box-Jenkins Time Series using various Self-Adaptive Neuro-Fuzzy Systems

VI Evaluation

PALM has been evaluated through numerical studies with the use of synthetic and real-world streaming datasets. The code of PALM and RPALM, along with these datasets, has been made publicly available in [26, 53].

VI-A Experimental Setup

VI-A1 Synthetic Streaming Datasets

Three synthetic streaming datasets are utilized in our work to evaluate the adaptive mechanism of the PALM: 1) Box-Jenkins Time Series dataset; 2) the Mackey-Glass Chaotic Time Series dataset; and 3) non-linear system identification dataset.

Box-Jenkins Gas Furnace Time Series Dataset

The Box–Jenkins (BJ) gas furnace dataset is a famous benchmark problem in the literature to verify the performance of SANFSs. The objective of the BJ gas furnace problem is to model the output, i.e., the CO2 concentration, from the time-delayed input methane flow rate and the previous output. The I/O configuration follows the standard setting in the literature as follows:


This problem consists of 290 data samples, where 200 samples are reserved for training while the remaining 90 samples are used to test the model's generalization.
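Building the lagged regressors and splitting the stream can be sketched as follows. The delay of four steps on the input is an assumption taken from the standard BJ setting in the literature, and the series here are placeholders, not the real furnace data:

```python
import numpy as np

# Box-Jenkins regressor construction (sketch): predict y(t) from the
# delayed methane flow u(t-4) and the previous output y(t-1).
def build_bj_regressors(u, y, delay=4):
    X, T = [], []
    for t in range(delay, len(y)):
        X.append([u[t - delay], y[t - 1]])
        T.append(y[t])
    return np.array(X), np.array(T)

u = np.arange(294, dtype=float)   # placeholder input series
y = np.arange(294, dtype=float)   # placeholder output series
X, T = build_bj_regressors(u, y)  # yields the 290 usable sample pairs
X_train, T_train = X[:200], T[:200]
X_test,  T_test  = X[200:], T[200:]
```

The first `delay` time steps are dropped because their lagged inputs are unavailable, which is why 294 raw points reduce to 290 sample pairs.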

Mackey-Glass Chaotic Time Series Dataset

The Mackey-Glass (MG) chaotic time series problem, having its root in [56], is a popular benchmark problem of forecasting the future value of a chaotic differential delay equation from its past values. Many researchers have used the MG dataset to evaluate their SANFSs' learning and generalization performance. The dataset is characterized by nonlinear and chaotic behavior whose oscillations replicate many physiological processes. The MG model was initially proposed as a control model of the generation of white blood cells. The mathematical model is expressed as:


where the delay parameter is primarily responsible for the chaotic behavior. Data samples are generated through the fourth-order Runge-Kutta method, and our goal is to predict the future system output using four past values as inputs. The series-parallel regression model can be expressed as follows:


For training purposes, a total of 3000 samples is generated with the help of the fourth-order Runge-Kutta method, whereas the predictive model is tested with 500 unseen samples to assess the generalization capability of PALM.
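Generating an MG series can be sketched as below. For brevity this sketch uses a simple Euler step rather than the fourth-order Runge-Kutta integrator used in the paper, and the parameter values (a=0.2, b=0.1, tau=17, x0=1.2) are the commonly used benchmark settings, not taken from the paper:

```python
import numpy as np

# Mackey-Glass series generation (sketch, Euler discretization):
# dx/dt = a*x(t-tau)/(1 + x(t-tau)^10) - b*x(t)
def mackey_glass(n, tau=17, a=0.2, b=0.1, x0=1.2, dt=1.0):
    x = np.zeros(n)
    x[0] = x0
    for t in range(n - 1):
        x_tau = x[t - tau] if t >= tau else 0.0   # zero history before t=0
        dx = a * x_tau / (1.0 + x_tau ** 10) - b * x[t]
        x[t + 1] = x[t] + dt * dx
    return x

series = mackey_glass(4000)   # enough for 3000 training + 500 testing points
```

A delay-aware RK4 integrator would interpolate the delayed state between grid points; the Euler step above keeps the sketch short while preserving the qualitative dynamics.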

Non-linear System Identification Dataset

A non-linear system identification problem is put forward to validate the efficacy of PALM; it has frequently been used by researchers to test their SANFSs. The nonlinear dynamics of the system are formulated by the following difference equation:


The predicted output of the system depends on the previous inputs and its own lagged outputs, which can be expressed as follows:


The first 50000 samples are employed to build our predictive model, and another 200 samples are fed to the model to test its generalization.
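Generating the identification stream can be sketched as follows. The concrete difference equation is assumed here to be the widely used benchmark y(t+1) = y(t)/(1 + y(t)^2) + u(t)^3 with u(t) = sin(2*pi*t/100); check it against the paper's equation before reuse:

```python
import numpy as np

# Non-linear system identification data (sketch; equation is an assumed
# standard benchmark, not confirmed from the paper's garbled formula).
def generate_system(n):
    y = np.zeros(n + 1)
    for t in range(n):
        u = np.sin(2.0 * np.pi * t / 100.0)
        y[t + 1] = y[t] / (1.0 + y[t] ** 2) + u ** 3
    return y

y = generate_system(50200)   # 50000 training + 200 testing samples
```

The recursive term is bounded by 0.5 and the input term by 1, so the generated stream stays bounded for any horizon, which makes it a safe long-run benchmark.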

VI-A2 Real-World Streaming Datasets

Three different real-world streaming datasets, from two rotary-wing unmanned aerial vehicle (RUAV) experimental flight tests and a time-varying stock index forecasting problem, are exploited to study the performance of PALM.

Quadcopter Unmanned Aerial Vehicle Streaming Data

A real-world streaming dataset is collected from experimental flight tests of a quadcopter RUAV based on the Pixhawk autopilot framework. All experiments are performed in the indoor UAV laboratory at the University of New South Wales, Canberra campus. To record the quadcopter flight data, the Robot Operating System (ROS), running under Ubuntu 16.04 Linux, is used. ROS introduces a well-structured communication layer into the quadcopter, reducing the burden of having to reinvent the necessary software.

During real-time flight testing, accurate vehicle position, velocity, and orientation are the information required to identify the quadcopter online. For system identification, flight data of the quadcopter's altitude containing approximately 9000 samples are recorded, with some noise, from a VICON optical motion capture system. Among them, 60% of the samples are used for training and the remaining 40% for testing. In this work, the model's output is estimated from the previous output and the system input, which is the required thrust to the rotors of the quadcopter. The regression model from the quadcopter data stream can be expressed as follows:

Helicopter Unmanned Aerial Vehicle Streaming Data

The chosen RUAV for gathering the streaming dataset is a Taiwanese-made Align Trex450 Pro Direct Flight Control (DFC), flybarless helicopter. The high degree of non-linearity associated with the Trex450 RUAV's vertical dynamics makes it challenging to build a regression model from experimental data streams. All experiments are conducted at the UAV laboratory of the UNSW Canberra campus. The flight data consist of 6000 samples collected in near-hover, heave, and in-ground-effect flight conditions to simulate non-stationary environments. The first 3600 samples are used as training data, and the rest of the data are used to test the model. The nonlinear dependence of the helicopter RUAV is governed by the regression model as follows:


where the estimated output of the helicopter system at the next time step is produced from its lagged inputs and outputs.

Time-Varying Stock Index Forecasting Data

Our proposed PALM has also been evaluated on a time-varying dataset, namely the prediction of the Standard and Poor's 500 (S&P-500 (^GSPC)) market index [57, 58]. The dataset consists of sixty years of daily index values ranging from 3 January 1950 to 12 March 2009, downloaded from [59], and comprises 14893 data samples. In our work, the reversed-order data points of the same sixty years of indexes are amalgamated with the original dataset, forming a new dataset with 29786 index values. Among them, 14893 samples are allocated to train the model and the remaining 14893 samples are used as validation data. The target variable is the next-day S&P-500 index, predicted using the indexes of the previous five consecutive days. The functional relationship of the predictive model is formalized as follows:


This dataset carries a sudden drift property around 2008, corresponding to the economic recession in the US triggered by the housing market crisis.
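The mirrored-stream construction can be sketched as follows; the index values below are placeholders, not the real S&P-500 data, and the helper name is hypothetical:

```python
import numpy as np

# Mirrored S&P-500 stream (sketch): the 14893 daily values are concatenated
# with their reversed copy (29786 samples total); the next-day index is
# predicted from the previous five consecutive days, as stated in the text.
def build_sp500_stream(index_values):
    data = np.concatenate([index_values, index_values[::-1]])
    X, T = [], []
    for t in range(5, len(data)):
        X.append(data[t - 5:t])      # previous five consecutive days
        T.append(data[t])            # next-day index (target)
    return np.array(X), np.array(T)

idx = np.linspace(16.66, 756.55, 14893)   # placeholder values, not real data
X, T = build_sp500_stream(idx)
```

Mirroring the series makes the second half retrace the first in reverse, so the drift around the 2008 recession is encountered twice, stressing the model's adaptation mechanism.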

Model | Reference | RMSE (testing) | NDEI (testing) | Rules | Inputs | Network parameters | Training samples | Execution time (s)
DFNN | [41] | 3.0531 | 12.0463 | 1 | 4 | 10 | 3000 | 11.1674
GDFNN | [54] | 0.1520 | 0.6030 | 1 | 4 | 13 | 3000 | 12.1076
FAOSPFNN | [55] | 0.2360 | 0.9314 | 1 | 4 | 6 | 3000 | 13.2213
eTS | [7] | 0.0734 | 0.2899 | 48 | 4 | 480 | 3000 | 8.6174
simp_eTS | [8] | 0.0623 | 0.2461 | 75 | 4 | 750 | 3000 | 20.9274
GENEFIS | [9] | 0.0303 | 0.1198 | 42 | 4 | 1050 | 3000 | 4.9694
PANFIS | [10] | 0.0721 | 0.2847 | 33 | 4 | 825 | 3000 | 4.8679
pRVFLN | [17] | 0.1168 | 0.4615 | 2 | 4 | 18 | 2993 | 0.9236
Type-1 PALM (L) | - | 0.0688 | 0.2718 | 5 | 4 | 25 | 3000 | 0.8316
Type-1 PALM (G) | - | 0.0349 | 0.1380 | 18 | 4 | 90 | 3000 | 0.7771
Type-2 PALM (L) | - | 0.0444 | 0.1755 | 11 | 4 | 110 | 3000 | 2.8138
Type-2 PALM (G) | - | 0.0159 | 0.0685 | 13 | 4 | 130 | 3000 | 2.4502
Table II: Modeling of the Mackey–Glass Chaotic Time Series using various Self-Adaptive Neuro-Fuzzy Systems

VI-B Results and Discussion

In this work, we have developed PALM by implementing type-1 and type-2 fuzzy concept, where both of them are simulated under two parameter optimization scenarios: 1) Type-1 PALM (L); 2) Type-1 PALM (G); 3) Type-2 PALM (L); 4) Type-2 PALM (G). denotes the update strategy while stands for the learning mechanism. Basic PALM models are tested with three synthetic and three real-world streaming datasets. Furthermore, the models are compared against eight prominent variants of SANFSs, namely DFNN [41], GDFNN [54], FAOSPFNN [55], eTS [7], simp_eTS [8], GENEFIS [9], PANFIS [10], and pRVFLN [17]. Experiments with real-world and synthesis data streams are repeated with recurrent PALM. All experimental results using the RPALM are also purveyed in the supplementary document. Proposed PALMs’ efficacy has been evaluated by measuring the root mean square error (RMSE), and nondimensional error index (NDEI) written as follows: