1 Introduction
Research in fuzzy inference systems (FIS) initiated by zadeh1988fuzzy has drawn the attention of many disciplines over the past three decades. The success of FIS is evident from its applicability and relevance in numerous research areas: control systems (lee1990fuzzy, wang1996approach), engineering (precup2011survey), medicine (jain2017fuzzy), chemistry (komiyama2017chemistry), computational biology (jin2008fuzzy), finance and business (bojadziev2007fuzzy), computer networks (elhag2015combination, gomez2002evolving), fault detection and diagnosis (lemos2013adaptive), and face recognition (melin2011face). These are just a few among FIS's numerous successful applications (liao2005expert, castillo2014review), which are mainly attributable to FIS's ability to manage uncertainty and to compute with noisy and imprecise data (zadeh1992fuzzy). The enormous amount of research and innovation in multiple dimensions of FIS propelled its success. These research dimensions realize the concepts of genetic fuzzy systems (GFS), neuro-fuzzy systems (NFS), hierarchical fuzzy systems (HFS), evolving fuzzy systems (EFS), and multiobjective fuzzy systems (MFS), which fundamentally rely on two basic fuzzy rule types: the Mamdani type (mamdani1974application) and the Takagi–Sugeno–Kang (TSK) type (takagi1985fuzzy). Both rule types have the "IF X is A THEN Y is B" rule structure, i.e., the rules are in antecedent–consequent form. However, the Mamdani type and the TSK type differ in their respective consequents: the Mamdani type takes an output action (a class), whereas the TSK type takes a polynomial function. Thus, they differ in their approximation ability: the Mamdani type has better interpretability, and the TSK type has better approximation accuracy. For the antecedent, both types take a similar form, i.e., a rule induction process partitions the input space to form the antecedent part of a rule. Therefore, the rule types, the rule induction process, and the interpretability–accuracy trade-off govern the FIS's dimensions.
In GFS, researchers investigate mechanisms to encode and optimize the FIS's components. The encoding takes place in the form of genetic vectors and genetic populations, and the optimization takes place in the form of the FIS's structure and parameter optimization.
herrera2008genetic, cordon2004ten, and castillo2012optimization summarized research in GFS with a taxonomy to explain both encoding and structure optimization using a genetic algorithm (GA). NFS research investigates network structure formation and parameter optimization (jang1993anfis) and addresses the variations in network formation methods and in parameter optimization techniques. buckley1995neural, andrews1995survey, and karaboga2018adaptive offer summaries of such variations. torra2002review and wang2006survey reviewed research in HFS, summarizing the variations in HFS design types and HFS parameter optimization techniques. EFS research enables incremental learning ability in FISs (kasabov1998evolving, angelov2008evolving), and MFS research enables FISs to deal with multiple objectives simultaneously (ishibuchi2007multiobjective, fazzolari2013review). This review paper offers a synthesized view of each dimension: GFS, NFS, HFS, EFS, and MFS. The synthesis recognizes that these dimensions are linked to each other, where the concept of one dimension applies to another. For example, NFS and EFS models can be optimized by a GA; hence, GFS lends its concepts to NFS and EFS. The complexity and concepts arising from this synthesis offer a potential to investigate deep fuzzy systems (DFS), which may take advantage of GFS, HFS, and NFS simultaneously in a hybrid manner: NFS may offer solutions to network structure formation, HFS may offer solutions to the hierarchical arrangement of multiple layers, and GFS may offer solutions to parameter optimization. Moreover, EFS and MFS also play a role in DFS if the goal is to construct a system for data streams and to optimize a system for the interpretability–accuracy trade-off.
This review walks through each dimension: GFS, NFS, HFS, EFS, and MFS, including a discussion of the standard FIS. First, the rule structure, rule types, and FIS types are discussed in Sec. 2. A discussion of FIS designs, describing how various FIS paradigms emerged through the interaction of FIS with neural networks (NN) and evolutionary algorithms (EA), is given in Sec. 2.3. Sec. 3 discusses the GFS paradigm, which emerged through FIS and EA combinations. Sec. 4 describes the NFS paradigm, including reference to self-adaptive and online system notions (Sec. 4.1), basic layers (Sec. 4.2), and feedforward and feedback architectures (Sec. 4.3). These are followed by discussions of the HFS's properties and implementations (Sec. 5). Sec. 6 summarizes EFS, which offers an incremental learning view of FIS. Sec. 7 discusses MFS, covering Pareto-based multiobjective optimization and implementations of the FIS's multiple-objective trade-offs. The challenges and future scope follow in Sec. 8, and conclusions in Sec. 9.
2 Fuzzy inference systems
A standard FIS (Fig. 1) is composed of the following components:

a fuzzifier unit that fuzzifies the input data;

a knowledge base (KB) unit, which contains fuzzy rules of the IF–THEN form, i.e.,
IF a set of conditions (antecedent) is satisfied
THEN a set of consequences (consequent) can be inferred;

an inference engine module that computes the rules' firing strengths to infer knowledge from the KB; and

a defuzzifier unit that translates inferred knowledge into a rule action (crisp output).
The KB of the FIS is composed of a database (DB) and a rule base (RB). The DB assigns fuzzy sets (FS) to the input variables, and the FSs transform the input variables into fuzzy membership values. For rule induction, the RB constructs a set of rules by fetching FSs from the DB.
In a FIS, an input can be a numeric variable or a linguistic variable. Moreover, an input variable can be singleton [Fig. 2(a)] or non-singleton [Fig. 2(b)]. Accordingly, a FIS is a singleton FIS if it uses singleton inputs, i.e., crisp and precise single-value measurements as the input variables, which is the most common practice. However, in real-world problems, especially in engineering, measurements are noisy, imprecise, and uncertain. A FIS that uses non-singleton inputs is therefore a non-singleton FIS. In principle, a non-singleton FIS differs from a singleton FIS in the input fuzzification process, where a "fuzzifier" transforms a non-singleton or singleton input into a fuzzy membership value.
A fuzzifier maps a singleton input (crisp input) x = x', for x' (a value in X) [Fig. 2(a)], to the following membership function for the input fuzzification:

\mu_X(x) = \begin{cases} 1, & x = x' \\ 0, & x \neq x' \end{cases} \qquad (1)
For non-singleton inputs, a fuzzifier maps an input x' (that is considered noisy, imprecise, and uncertain) onto a Gaussian function (a typical choice for numeric variables) as:

\mu_X(x) = \exp\left(-\frac{(x - x')^2}{2\sigma_X^2}\right) \qquad (2)
where x' is the input (considered as the mean, a value along line X) and \sigma_X is the standard deviation (std.) that defines the spread of the function \mu_X. The value of the fuzzy set at x = x' is 1 and decreases from unity as x moves away from x' (mouzouris1997nonsingleton). In general, for a singleton or non-singleton input x', the inference engine output is a combination of the fuzzified input with an antecedent FS A as per:

\mu_Q(x') = \sup_{x \in X} \mu_X(x) \star \mu_A(x) \qquad (3)
where \star is a t-norm operation that can be minimum or product, and sup indicates the supremum in Eq. (3). Fig. 3 is an example of the product operation in Eq. (3): it evaluates the product of an input FS \mu_X(x) and an antecedent fuzzy set \mu_A(x) (both Gaussian, with means x' and m_A and spreads \sigma_X and \sigma_A), which results in \mu_Q(x'). The product attains its maximum value at x_{\max} (in Fig. 3), which is calculated as:

x_{\max} = \frac{\sigma_X^2 m_A + \sigma_A^2 x'}{\sigma_X^2 + \sigma_A^2} \qquad (4)
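To make the fuzzification step concrete, the following sketch (function names and the Gaussian parameter values are illustrative, not from the paper) evaluates Eq. (3) for a singleton and a non-singleton input; for the non-singleton case the supremum is taken numerically over a dense grid rather than via the closed-form maximum of Eq. (4):

```python
import numpy as np

def gauss(x, mean, sigma):
    # Gaussian membership function of Eq. (2)
    return np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def singleton_fuzzify(x_prime, m_a, sigma_a):
    # Singleton input: the input FS of Eq. (1) is nonzero only at x',
    # so the sup-product of Eq. (3) reduces to evaluating mu_A at x'.
    return gauss(x_prime, m_a, sigma_a)

def nonsingleton_fuzzify(x_prime, sigma_x, m_a, sigma_a):
    # Non-singleton input: numerically take sup_x [ mu_X(x) * mu_A(x) ]
    # of Eq. (3) with the product t-norm over a dense grid.
    lo = min(x_prime, m_a) - 5 * max(sigma_x, sigma_a)
    hi = max(x_prime, m_a) + 5 * max(sigma_x, sigma_a)
    x = np.linspace(lo, hi, 10001)
    return float((gauss(x, x_prime, sigma_x) * gauss(x, m_a, sigma_a)).max())

mu_singleton = singleton_fuzzify(0.4, 0.0, 1.0)
mu_nonsingleton = nonsingleton_fuzzify(0.4, 0.5, 0.0, 1.0)
```

For Gaussian input and antecedent FSs, the numerical supremum agrees with the analytic maximum at x_max; note that the non-singleton membership exceeds the singleton one here, because input uncertainty widens the effective fuzzy set.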
The design of the RB further distinguishes the types of FISs: a Mamdani-type FIS (mamdani1974application) or a Takagi–Sugeno–Kang (TSK)-type FIS (takagi1985fuzzy). A TSK-type FIS differs from a Mamdani-type FIS only in the implementation of the fuzzy rule's consequent part: in a Mamdani-type FIS, the rule's consequent is an FS, whereas in a TSK-type FIS, it is a polynomial function.
The DB contains FSs that are either type-1 fuzzy sets (T1FS) or type-2 fuzzy sets (T2FS). The basic form of a fuzzy membership function (MF) is termed a T1FS, whereas a T2FS allows an MF to be fuzzy itself by extending the membership value into an additional membership dimension. Hence, the FS types also differentiate FIS types: the type-1 FIS (T1FIS) and the type-2 FIS (T2FIS).
For simplicity, this paper is singleton-FIS centric and refers the reader to appropriate research for non-singleton FISs. Moreover, since a Mamdani-type FIS differs from a TSK-type FIS only in its consequent part, this paper focuses on the TSK-type FIS.
2.1 Type-1 fuzzy inference systems
A TSK-type FIS is governed by an "IF–THEN" rule of the form (takagi1985fuzzy):

R_i: \text{IF } x_1 \text{ is } A_{i1} \text{ AND } \ldots \text{ AND } x_{p_i} \text{ is } A_{ip_i} \text{ THEN } y_i = f_i(x_1, \ldots, x_{p_i}) \qquad (5)

where R_i is the i-th rule in the FIS's RB. The rule has A_{ij} as the T1FSs, and f_i as a function of the inputs that returns a crisp output y_i. At the i-th rule, p_i inputs are selected from the n available inputs. Note that p_i varies from rule to rule, and thus the input dimension at a rule is denoted as p_i \le n. That is, the subset of inputs to a rule has p_i elements, which leads to an incomplete rule because all available inputs may not be present in the rule premises (antecedent part). Otherwise, a complete rule has all available inputs in its premises. The function f_i, for the TSK type, is commonly expressed as:
f_i = q_{i0} + q_{i1} x_1 + q_{i2} x_2 + \cdots + q_{ip_i} x_{p_i} \qquad (6)

where x_j are the inputs and q_{ij} for j = 0 to p_i are the free parameters at the consequent part of a rule. For the Mamdani type, f_i may be expressed as a "class." The basic building blocks of a FIS are shown in Fig. 1, whose defuzzified crisp output is computed as follows. First, the inference engine fires the RB's rules, each rule having a firing strength:
w_i = \prod_{j=1}^{p_i} \mu_{A_{ij}}(x_j) \qquad (7)

where \mu_{A_{ij}}(x_j) is the membership value of the T1FS MF A_{ij} (e.g., Fig. 4a) at the i-th rule. If the firing strength has to be computed for a non-singleton input x', then \mu_{A_{ij}}(x_j) in Eq. (7) is replaced by \mu_{Q_{ij}}(x'_j) as per Eq. (3). A detailed, generalized definition of the firing strength computation is given by mouzouris1997nonsingleton.
The defuzzified output of a T1FIS, as an example, is computed as:

y = \frac{\sum_{i=1}^{M} w_i f_i}{\sum_{i=1}^{M} w_i} \qquad (8)

where M is the total number of rules in the RB.
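As a worked instance of the inference chain of Eqs. (6)–(8), the sketch below (the rule parameters are made up for illustration) evaluates a two-rule singleton TSK T1FIS with Gaussian antecedent MFs:

```python
import numpy as np

def gauss(x, mean, sigma):
    # Gaussian T1FS membership function
    return np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def tsk_t1fis(x, rules):
    """Evaluate a singleton TSK type-1 FIS.

    Each rule is (means, sigmas, coeffs): Gaussian antecedent MF
    parameters per input and consequent coefficients [q0, q1, ..., qn].
    """
    num, den = 0.0, 0.0
    for means, sigmas, coeffs in rules:
        # Firing strength, Eq. (7): product t-norm over antecedent MFs.
        w = np.prod([gauss(xi, m, s) for xi, m, s in zip(x, means, sigmas)])
        # Consequent, Eq. (6): first-order polynomial of the inputs.
        f = coeffs[0] + np.dot(coeffs[1:], x)
        num += w * f
        den += w
    # Defuzzified output, Eq. (8): firing-strength-weighted average.
    return num / den

rules = [
    ([0.0, 0.0], [1.0, 1.0], [0.5, 1.0, -1.0]),  # rule 1
    ([1.0, 1.0], [1.0, 1.0], [2.0, 0.0, 0.0]),   # rule 2
]
y = tsk_t1fis(np.array([0.2, 0.8]), rules)
```

For this input both rules happen to fire with equal strength, so the output is the plain average of the two rule consequents.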
2.2 Type-2 fuzzy inference systems
A T2FS is characterized by a 3D MF (mendel2013km): the x-axis is the primary variable x, the y-axis is the secondary variable u (with primary MF denoted by J_x), and the z-axis is the MF value (secondary MF denoted by \mu_{\tilde{A}}(x, u)). Hence, for a singleton input x, a T2FS \tilde{A} is defined as:

\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\} \qquad (9)
The MF value has a 2D support, called the "footprint of uncertainty" of \tilde{A}, which is bounded by a lower membership function (LMF) \underline{\mu}_{\tilde{A}}(x) and an upper membership function (UMF) \overline{\mu}_{\tilde{A}}(x). A T2FS bounded by an LMF and a UMF is an interval type-2 fuzzy set (IT2FS); e.g., a Gaussian function [Eq. (10)] with uncertain mean m \in [m_1, m_2] and std. \sigma is an IT2FS (e.g., Fig. 4b):

\mu_{\tilde{A}}(x) = \exp\left(-\frac{(x - m)^2}{2\sigma^2}\right), \quad m \in [m_1, m_2] \qquad (10)
An LMF [Eq. (11)] and a UMF [Eq. (12)] of such an IT2FS can be defined as (karnik1999type):

\underline{\mu}_{\tilde{A}}(x) = \begin{cases} \exp\left(-\frac{(x - m_2)^2}{2\sigma^2}\right), & x \le \frac{m_1 + m_2}{2} \\ \exp\left(-\frac{(x - m_1)^2}{2\sigma^2}\right), & x > \frac{m_1 + m_2}{2} \end{cases} \qquad (11)

\overline{\mu}_{\tilde{A}}(x) = \begin{cases} \exp\left(-\frac{(x - m_1)^2}{2\sigma^2}\right), & x < m_1 \\ 1, & m_1 \le x \le m_2 \\ \exp\left(-\frac{(x - m_2)^2}{2\sigma^2}\right), & x > m_2 \end{cases} \qquad (12)
In Fig. 4b, a point along the x-axis of the 3D IT2FS cuts the UMF and LMF along the y-axis, and the value of the type-2 MF is taken along the z-axis [dotted line, which is an MF in the third dimension in Fig. 4b between the LMF and UMF]. Considering the IT2FS MF, the IF–THEN rule of a TSK-type T2FIS, for inputs x_1, \ldots, x_{p_i}, takes the form:

R_i: \text{IF } x_1 \text{ is } \tilde{A}_{i1} \text{ AND } \ldots \text{ AND } x_{p_i} \text{ is } \tilde{A}_{ip_i} \text{ THEN } y_i = f_i(x_1, \ldots, x_{p_i}) = [y_i^l, y_i^r] \qquad (13)
where \tilde{A}_{ij} is a T2FS, and f_i is a function of the inputs that returns a pair [y_i^l, y_i^r], called the left and right weights of the consequent part of a rule. In TSK, f_i is usually written as:

y_i^l = \sum_{j=1}^{p_i} (q_{ij} x_j - s_{ij} |x_j|) + q_{i0} - s_{i0}, \quad y_i^r = \sum_{j=1}^{p_i} (q_{ij} x_j + s_{ij} |x_j|) + q_{i0} + s_{i0} \qquad (14)

where x_j is the input, q_{ij} for j = 0 to p_i are a rule's consequent-part parameters, and s_{ij} for j = 0 to p_i are its deviation factors. The firing strength of an IT2FS rule is an interval computed as:

[\underline{w}_i, \overline{w}_i] = \left[\prod_{j=1}^{p_i} \underline{\mu}_{\tilde{A}_{ij}}(x_j),\ \prod_{j=1}^{p_i} \overline{\mu}_{\tilde{A}_{ij}}(x_j)\right] \qquad (15)
At this stage, the inference engine fires the rule, and the type-reducer, e.g., center-of-sets as per Eq. (16), reduces the T2FS to a T1FS (karnik1999type, wu2009enhanced):

Y_{\cos} = [y_l, y_r] \qquad (16)

where y_l and y_r are the left and right ends of the interval. For the ascending order of y_i^l and y_i^r, y_l and y_r are computed as:

y_l = \frac{\sum_{i=1}^{L} \overline{w}_i y_i^l + \sum_{i=L+1}^{M} \underline{w}_i y_i^l}{\sum_{i=1}^{L} \overline{w}_i + \sum_{i=L+1}^{M} \underline{w}_i}, \quad y_r = \frac{\sum_{i=1}^{R} \underline{w}_i y_i^r + \sum_{i=R+1}^{M} \overline{w}_i y_i^r}{\sum_{i=1}^{R} \underline{w}_i + \sum_{i=R+1}^{M} \overline{w}_i} \qquad (17)

where L and R are the switch points determined by the iterative Karnik–Mendel procedure, respectively. Subsequently, the defuzzified crisp output is computed, e.g., as y = (y_l + y_r)/2.
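For a small number of rules, the interval [y_l, y_r] of Eqs. (16)–(17) can be computed by an exhaustive search over the switch points instead of the iterative Karnik–Mendel procedure; the sketch below (rule values are illustrative) does exactly that:

```python
def km_type_reduce(wl, wu, yl, yr):
    """Center-of-sets type reduction for an IT2 TSK FIS.

    wl, wu: lower/upper firing strengths per rule; yl, yr: left/right
    consequent values.  Returns the interval [y_l, y_r]; exhaustive
    search over switch points stands in for the iterative KM procedure.
    """
    M = len(wl)
    # Left end: upper weights on the smaller consequents minimize y_l.
    order = sorted(range(M), key=lambda i: yl[i])
    y_left = min(
        sum((wu[order[i]] if i <= L else wl[order[i]]) * yl[order[i]] for i in range(M)) /
        sum((wu[order[i]] if i <= L else wl[order[i]]) for i in range(M))
        for L in range(M)
    )
    # Right end: upper weights on the larger consequents maximize y_r.
    order = sorted(range(M), key=lambda i: yr[i])
    y_right = max(
        sum((wl[order[i]] if i <= R else wu[order[i]]) * yr[order[i]] for i in range(M)) /
        sum((wl[order[i]] if i <= R else wu[order[i]]) for i in range(M))
        for R in range(M)
    )
    return y_left, y_right

y_l, y_r = km_type_reduce([0.3, 0.5], [0.6, 0.9], [1.0, 3.0], [2.0, 4.0])
crisp = 0.5 * (y_l + y_r)   # defuzzified crisp output
```

The exhaustive search is O(M^2) versus the super-exponential KM convergence on sorted consequents, so it is only a didactic stand-in; it yields the same interval because the optimum of Eq. (17) is always attained at some switch point of the sorted order.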
For a non-singleton interval type-2 FIS, lower and upper intervals of the non-singleton inputs are created. Additionally, similar to the non-singleton input fuzzification of the non-singleton type-1 FIS using the input FS and antecedent FS shown in Eq. (3), for a non-singleton type-2 FIS both lower and upper interval products are calculated using the lower and upper input FSs and the lower and upper antecedent FSs. sahab2011adaptive describe the computation of the non-singleton type-2 FIS in detail.
2.3 Heuristic designs of fuzzy systems
The FIS types, type-1 (Sec. 2.1) and type-2 (Sec. 2.2), follow a similar design procedure and differ only in the type of FSs being used. The heuristic design of FIS can be viewed from its hybridization with neural networks (NN), evolutionary algorithms (EA), and metaheuristics (MH) (Fig. 5). Such a confluence offers (herrera2008genetic):

genetic fuzzy systems (A);

neuro-fuzzy systems (B);

hybrid neuro-genetic fuzzy systems (C); and

heuristic design of NNs (D).
This paper discusses areas A, B, and C of Fig. 5; area D of Fig. 5 is discussed in detail by ojha2017metaheuristic. The heuristic design installs learning capabilities into a FIS, which come from the optimization of its components. FIS optimization/learning in a supervised environment is common practice.
Typically, in supervised learning, a FIS is trained/optimized by supplying training data T of N input–output pairs, i.e., T = \{(\mathbf{x}_t, \mathbf{d}_t)\} for t = 1, \ldots, N. Each input \mathbf{x}_t is an n-dimensional vector, and it has a corresponding m-dimensional desired output vector \mathbf{d}_t. For the training data T, a FIS model F produces output \mathbf{y}_t = F(\mathbf{x}_t), where F is a set of fuzzy rules and \mathbf{y}_t is the m-dimensional model output, which is compared with the desired output \mathbf{d}_t by using some error/distance/cost function E over the model F.
The cost function can be a mean squared error function or an accuracy measure, depending on whether the desired outputs are continuous (regression) or discrete (classification) (caruana2004data). Learning of a FIS therefore relies on reducing a cost function by employing strategies for designing and optimizing a FIS model F, where model design may refer to how the FIS's components interact with each other, and optimization may refer to RB design, RB parameter learning, and rule selection. In summary, FIS design, optimization, learning, and modeling is viewed as:

the selection of FSs via fuzzy partitioning of the input-space;

the design of FIS’s rules via an arrangement of rule and inputs;

the optimization of the rule’s parameters; and

the inference from the designed FIS.
Often a Gaussian function, a triangular function, or a trapezoidal function is selected as the MF of an FS (zadeh1999fuzzy). The input-space partition corresponding to the MF assignments is one of the most crucial aspects of FIS design. For example, the two-dimensional input-space in Fig. 6, having inputs x_1 and x_2, is partitioned using a grid-partitioning method (jin2000fuzzy, jang1993anfis) or a clustering-based partitioning method (juang1998online, kasabov2002denfis). Fig. 6 is an example of input-space partitioning for numerical variables; an example of partitioning for linguistic terms is explained by cord2001genetic. mao2005adaptive presented an example of input-space partitioning using a binary tree, where the root of the tree takes the whole input-space and partitions it into two children nodes. The partitioned subsets are assessed against a defined cost function. If the cost is lower than a defined threshold, then the input-space partitioning stops; otherwise, it continues.
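The grid-partitioning idea can be sketched in a few lines (the function and the overlap heuristic are ours, not from the cited works): each input axis receives k evenly spaced Gaussian MFs, and the Cartesian product of the per-axis MFs yields one candidate rule premise per grid cell:

```python
import numpy as np
from itertools import product

def grid_partition(bounds, k):
    """Grid-partition an input space into k Gaussian MFs per dimension.

    bounds: list of (low, high) per input.  Returns, for every grid
    cell, the (mean, sigma) pairs of the antecedent MFs, i.e., one
    candidate rule premise per cell (k^n cells for n inputs).
    """
    axes = []
    for low, high in bounds:
        means = np.linspace(low, high, k)
        sigma = (high - low) / (2.0 * (k - 1))  # neighbouring MFs overlap
        axes.append([(m, sigma) for m in means])
    return list(product(*axes))

# 2-D input space, 3 MFs per input -> 9 candidate rule premises
premises = grid_partition([(0.0, 1.0), (-1.0, 1.0)], 3)
```

The exponential growth of cells with the input dimension (the "curse of dimensionality") is exactly what motivates clustering-based partitioning and the hierarchical designs of Sec. 5.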
After the input-space partition, a FIS is designed via an arrangement of rules and optimization of the rules' parameters for inference from the FIS. As per Fig. 5, FIS design can be performed by combining the FIS concept with GA and NN. Such synergy between two or more methods improves the system's approximation capabilities (funabashi1995fuzzy). In this respect, let us revisit the four synergetic models (Fig. 5), which indicate four ways of hybridizing artificial intelligence (AI) techniques. Fuzzy system modeling combined with EA, MH, and NN falls within the synergetic models: (1) combination, when the produced rules are optimized by an EA or an MH algorithm, and (2) fusion, when an EA or an NN is used to design the FIS, i.e., to construct the RB.
3 Genetic fuzzy systems
EA (back1996evolutionary) and MH (talbi2009metaheuristics) have been effective in FIS optimization (cordon2004ten, herrera2008genetic, sahin2012hybrid). EA and MH are applied to design, optimize, and learn the fuzzy rules, which gives the notion of evolutionary/genetic fuzzy systems (GFS). The basic needs of a GFS are:

defining a population structure;

encoding FIS’s elements as the individuals in the population;

defining genetic/metaheuristic operators; and

defining fitness functions relevant to the problem.
3.1 Encoding of genetic fuzzy systems
The questions of how to define a population structure and how to encode the elements of a FIS as the individuals (called chromosomes) of the population open up diverse implementations of GFS. A FIS has the following elements: input–output variables; the rule's premise FSs; the rule's consequent FSs and parameters; and the rule set. These elements are combined (encoded to create a vector) in varied manners, which offers diversity in answering the mentioned questions.
Let an RB be a set of rules R_1, \ldots, R_M; then Fig. 8 represents two basic genetic population structures: a population of individual rules and a population of rule bases.
For a rule that has p_i FSs, A_{ij} for T1FS and \tilde{A}_{ij} for T2FS, for j = 1, \ldots, p_i, the rule parameter vector may be encoded as (herrera1995tuning, ishibuchi1997comparison, ojha2016metaheuristic):

v_i = (A_{i1}, \ldots, A_{ip_i}, q_{i0}, q_{i1}, \ldots, q_{ip_i}) \qquad (18)

where each T1FS A_{ij} has two parameters, c_{ij} and \sigma_{ij}, representing the center and width of the T1FS; and each T2FS \tilde{A}_{ij} has three parameters, c_{ij}, s_{ij}, and \sigma_{ij}, representing the center, deviation factor, and width, respectively. The variables q_{ij} for j = 0, \ldots, p_i are the type-1 rule's consequent weights (parameters), and the variables q_{ij} and s_{ij} for j = 0, \ldots, p_i are the type-2 rule's consequent weights and weight deviations, respectively.
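The flat, real-valued chromosome of Eq. (18) can be realized as a simple encode/decode pair; this sketch assumes one type-1 rule with (center, width) Gaussian FS parameters per input (the helper names are ours):

```python
def encode_t1_rule(mf_params, consequent):
    """Flatten a type-1 TSK rule into a real-valued chromosome.

    mf_params: list of (center, width) per antecedent T1FS;
    consequent: [q0, q1, ..., qp] polynomial weights, as in Eq. (18).
    """
    chromosome = []
    for center, width in mf_params:
        chromosome.extend([center, width])
    chromosome.extend(consequent)
    return chromosome

def decode_t1_rule(chromosome, n_inputs):
    # Inverse mapping: the first 2*n genes are (center, width) pairs,
    # the remaining genes are the consequent weights.
    mf_params = [(chromosome[2 * j], chromosome[2 * j + 1]) for j in range(n_inputs)]
    consequent = chromosome[2 * n_inputs:]
    return mf_params, consequent

chrom = encode_t1_rule([(0.0, 1.0), (2.0, 0.5)], [0.1, 1.5, -0.3])
mfs, q = decode_t1_rule(chrom, 2)
```

A type-2 rule would simply add one deviation gene per FS and per consequent weight, lengthening the chromosome without changing the scheme.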
For linguistic fuzzy terms, an FS takes a single integer (e.g., the integers 0, 1, and 2 may respectively indicate the linguistic terms "very small," "small," and "large"). For a Mamdani-type rule, thrift1991fuzzy and kim1995designing proposed a decision matrix (a rule table) for fuzzy rules. Such a decision table can be encoded as a genetic vector for FIS learning (hadavandi2010integration).
Considering the genetic fuzzy populations in Fig. 8, the Michigan approach (michigan1982) suggests encoding a rule's parameters as a chromosome, so that the population consists of individual rules. Hence, fuzzy system optimization is the reduction of the cost function over the entire population. In the Michigan approach, the optimization of the population is met through mutation and crossover of rules, and by discarding and adding new rules into the population (ishibuchi1997comparison).
The second genetic fuzzy population in Fig. 8 has each chromosome representing an RB:

B_k = \{R_1, R_2, \ldots, R_M\} \qquad (19)

a set of rules/chromosomes R_i for i = 1, \ldots, M. Thus, this population enables both "rule optimization" and "rule selection" opportunities. Rule selection using this population is known as the Pittsburgh approach (pittsburgh1980), which suggests encoding a fuzzy rule set into a single chromosome, a vectored representation of the RB. The Pittsburgh approach selects a subset of rules from a (sometimes randomly generated) set of rules. In the Pittsburgh approach, the optimization of the population is met through mutation and crossover of the RBs and by enabling and disabling the rules in an RB. Hence, the optimization of the FIS is the reduction of the cost function of the chromosomes within the population (ishibuchi1997comparison).
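A toy Pittsburgh-style rule selection can illustrate the enable/disable mechanism (entirely illustrative: the per-rule "errors," the penalized fitness, and the truncation-selection loop are our assumptions, not any cited algorithm): a chromosome is a binary mask over a fixed candidate rule set, and evolution toggles rules on and off:

```python
import random

def evaluate_rb(mask, rule_errors, penalty=0.05):
    """Fitness of one Pittsburgh chromosome (lower is better): mean
    error of the enabled rules plus a complexity penalty per rule."""
    active = [e for bit, e in zip(mask, rule_errors) if bit]
    if not active:
        return float("inf")  # an empty RB is invalid
    return sum(active) / len(active) + penalty * len(active)

def pittsburgh_search(rule_errors, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    n = len(rule_errors)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: evaluate_rb(m, rule_errors))
        survivors = pop[: pop_size // 2]       # elitist truncation selection
        children = []
        for m in survivors:
            child = m[:]
            child[rng.randrange(n)] ^= 1       # bit flip enables/disables a rule
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda m: evaluate_rb(m, rule_errors))

errors = [0.9, 0.1, 0.8, 0.2, 0.05]            # assumed per-rule errors
best = pittsburgh_search(errors)
```

The complexity penalty realizes, in miniature, the interpretability–accuracy trade-off: fewer active rules are preferred unless the extra rules buy enough accuracy.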
Relying on these two population structures, the literature offers numerous GFSs with varied encodings of the FIS's elements: lee1993integrating created a composite chromosome combining tuples of MF components and rule consequent parameters. Similar composite encoding was performed by papadakis2002ga for TSK-type rules. wu2006genetic put the MF parameters of a type-2 fuzzy rule on a genetic vector. Using the rule-base population structure, ishibuchi1995selecting created rules as per Eq. (19), where each rule took one of three statuses, the third marking it as a dummy rule.
hoffmann1997evolutionary presented a concept of messy encoding by assigning an integer value to the FIS's elements while encoding them as a chromosome: each input variable and each FS of a rule of the form "IF x is A AND y is B THEN z is C" is represented by its integer index. hoffmann1997evolutionary argued that such an encoding benefits from the GA since the sequence is messed up by GA operations, which creates diverse rules. melin2012genetic aimed at obtaining the best rule by assigning a status to T1FIS (0) and T2FIS (1), and to Mamdani-type (0) and TSK-type (1) rules, apart from assigning an integer value to an FS.
3.2 Training of genetic fuzzy systems
GFS training depends on the FIS's encoding, and GFS training should answer the following questions:

Which EA/MH algorithm should be used?

Is training only a few elements of the FIS sufficient?

How should EA/MH operators be defined for the encoded GFS?
The answer to the first question relies on how an individual chromosome was encoded; it is also a matter of choice from the range of optimization algorithms (back1996evolutionary, talbi2009metaheuristics). The answer to the second question was investigated by carse1996evolving with four GFS learning schemes: (1) learning MF parameters for fixed rules; (2) learning rules while keeping MF parameters fixed; (3) learning both MF parameters and rules in stages (one after another); and (4) learning both MF parameters and rules simultaneously. carse1996evolving concluded that learning both MFs and rules is necessary for solving a complex system, and that GFS benefits from the cooperation of rules. However, it was left to empirical evaluation to determine whether stage-wise or simultaneous learning performs best. The answer to the third question is subject to the population definition (Fig. 8) and the encoding mechanism (Section 3.1), since a chromosome (solution vector) can be coded in three ways: a binary-valued vector, an integer-valued vector, and a real-valued vector. Accordingly, an EA/MH optimization as per Algorithm 1 is employed, and the algorithm's operators are chosen and designed.
Optimization of a binary-valued or an integer-valued vector is a combinatorial optimization problem, whereas optimization of a real-valued vector is a continuous one; both follow the general procedure of Algorithm 1. It is a combinatorial optimization when the binary-vector and integer-vector encoding domain is discrete. That is, the encoding (assignment) of each FIS element takes either 0 or 1 (ishibuchi1995selecting) or takes an integer number (hoffmann1997evolutionary, tsang2007genetic), and the FIS's fitness depends on finding the best combination of the FIS's elements. Hence, a global search algorithm like the genetic algorithm (GA) (goldberg1988genetic), discrete particle swarm optimization (PSO) (kennedy1997discrete), or discrete ant algorithms (dorigo1999ant) can be employed to optimize binary-valued and integer-valued vectors. FIS optimization is a continuous optimization problem when the domain is continuous, and it then amounts to finding the best-performing real-valued vector representing the rules' parameters (herrera1995tuning). Hence, GA (wright1991genetic), PSO (kennedy2011particle), ACO (socha2008ant), or another search algorithm (yang2010nature) can be used for real-valued vector optimization as per Algorithm 1. Optimization of a binary or an integer vector invites crossover operators like single-point crossover, two-point crossover, and composite crossover, and mutation operators like bit flip and random bit resetting (goldberg1988genetic); whereas a real vector invites crossover operators like uniform crossover and arithmetic crossover (goldberg1991real, eshelman1993real). ishibuchi1999hybrid exploited both the Pittsburgh and Michigan approaches simultaneously, where, for the Pittsburgh approach, they designed the mutation operator as a Michigan-style rule-generation step.
Typically, as an example, for a one-point crossover with crossover point k, two selected chromosomes u = (u_1, \ldots, u_k, u_{k+1}, \ldots, u_n) and v = (v_1, \ldots, v_k, v_{k+1}, \ldots, v_n) (also called parents) produce two new chromosomes (also called offspring) by swapping elements of the parent chromosomes (a chromosome is a vector of elements) as follows:

u' = (u_1, \ldots, u_k, v_{k+1}, \ldots, v_n), \quad v' = (v_1, \ldots, v_k, u_{k+1}, \ldots, u_n) \qquad (20)

Similarly, as an example, for a one-point mutation, a chromosome u is selected and a new chromosome is produced by replacing an element u_j of the chromosome with a new or random element (e.g., flipping 0 to 1 in a binary chromosome, replacing an integer by another integer, or replacing a real value by another random real value) as follows:

u' = (u_1, \ldots, u_{j-1}, u'_j, u_{j+1}, \ldots, u_n) \qquad (21)
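Eqs. (20) and (21) translate directly into code (a minimal sketch; the Gaussian perturbation for real-valued genes is our choice):

```python
import random

def one_point_crossover(parent1, parent2, point):
    # Swap tails after the crossover point, as in Eq. (20).
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def one_point_mutation(chromosome, rng=random):
    # Replace one randomly chosen gene, as in Eq. (21): flip a bit for
    # binary chromosomes, perturb the value for real-valued ones.
    mutant = chromosome[:]
    j = rng.randrange(len(mutant))
    if isinstance(mutant[j], int):
        mutant[j] = 1 - mutant[j]            # binary bit flip
    else:
        mutant[j] += rng.gauss(0.0, 0.1)     # real-valued perturbation
    return mutant

c1, c2 = one_point_crossover([1, 1, 1, 1], [0, 0, 0, 0], point=2)
```

Applied to the chromosomes of Sec. 3.1, the same two operators serve both the Michigan population (operating on single rules) and the Pittsburgh population (operating on whole rule bases).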
The real-valued vector encoding of the FIS's elements allows varied FSs to lie on the same genetic vector. Hence, it is necessary to ensure that each gene (dimension) corresponding to a FIS element takes a value within a defined interval. For example, in Eq. (18), the variables c_{ij} and \sigma_{ij} are MF parameters, and they need defined intervals, like [c_{\min}, c_{\max}] and [\sigma_{\min}, \sigma_{\max}], to control the MF's shape. cordon1997three defined intervals of performance for assuring a boundary for each dimension of the vector.
martinez2010fuzzy employed PSO for finding the optimal MF parameters of an encoded GFS. shahzad2009hybrid combined PSO and GA in a hybrid approach where PSO and GA start with similar populations of rules and iteratively swap the best solution between the PSO and GA populations to enable communication between both optimizers. martinez2015hybrid extended shahzad2009hybrid's hybrid PSO–GA approach to optimize a T2FIS, and valdez2011improved proposed a hybrid approach of a PSO-based FIS and a GA-based FIS where, depending upon their errors, the two were activated and deactivated during the FIS optimization. An empirical evaluation of bio-inspired algorithms summarized by castillo2012comparative suggests that ACO outperformed PSO and GA for GFS optimization. Examples of MH-based GFS implementations include chemical optimization (melin2013optimal), harmony search (pandiarajan2016fuzzy), artificial bee colony optimization (habbi2015self), and bacteria foraging optimization (verma2017optimal).
3.3 Other forms of genetic fuzzy systems
Similar to the Michigan approach, in the iterative rule learning scheme (venturini1993sia, gonzalez1997multi, ahn2007iterative) and in cooperative–competitive rule learning (greene1993competition, whitehead1996cooperative), each rule of an RB is encoded into a separate genotype, and the population of such genotypes leads to the formation of an RB iteratively. The iterative learning scheme starts with an empty set and adds rules one by one to the set by finding an optimum rule through a genetic selection process. For this purpose, genetic operators such as mutation and crossover are applied over one or two rule(s) to make offspring rule(s), and the quality of the generated rule(s) is evaluated using a predefined rule quality measure. Therefore, iteratively selecting rules according to the rule quality measure forms an optimum RB (venturini1993sia).
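The iterative rule learning loop can be sketched as follows (the candidate set, the coverage-based quality measure, and the random inner search are illustrative stand-ins for the genetic selection process):

```python
import random

def iterative_rule_learning(candidate_rules, quality, n_rules, iters=100, seed=0):
    """Start with an empty RB and add rules one by one; each rule is the
    best found by a small stochastic search (standing in for the genetic
    selection of the iterative rule learning scheme)."""
    rng = random.Random(seed)
    rb = []
    for _ in range(n_rules):
        best, best_q = None, float("-inf")
        for _ in range(iters):
            rule = rng.choice(candidate_rules)
            q = quality(rule, rb)        # quality is judged against the
            if q > best_q:               # current RB, so rules cooperate
                best, best_q = rule, q
        rb.append(best)
    return rb

# Toy setting: each candidate rule "covers" a set of training samples,
# and quality rewards covering samples the current RB misses.
candidates = [{0, 1}, {1, 2}, {3, 4}, {0, 4}]
covered = lambda rb: set().union(*rb) if rb else set()
quality = lambda rule, rb: len(rule - covered(rb))
rb = iterative_rule_learning(candidates, quality, n_rules=3)
```

Because the quality measure depends on the rules already selected, later rules fill the coverage gaps left by earlier ones, which is the cooperative aspect of the scheme.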
The cooperative–competitive rule learning is also an RB learning method that determines an optimum RB from the competition and cooperation of rules in a genetic/metaheuristic population. GFS is also implemented as a reinforcement learning system.
juang2000genetic proposed symbiotic evolutionary learning of a fuzzy reinforcement learning system, which uses a cooperative coevolutionary GA for the evolution of fuzzy rules from a population of rules. A reinforcement T2FIS optimization was performed by ACO in (juang2009reinforcement). Aiming at cooperation among the FIS's components, delgado2004coevolutionary split the genetic population into four separate populations: RBs, individual rules, FSs, and FISs. They proposed a coevolutionary GFS relying on a hierarchical collaborative approach where each population cooperatively shared the application-domain fitness as well as the population's individuals. A fuzzy tree system, e.g., the TSK rule in (mao2005adaptive, chien2002learning), allows the rules to be implemented as a binary tree or an expression tree and the rules' tree structures to be optimized by genetic programming (GP) (koza1994genetic). hoffmann2001genetic implemented the TSK rule as a local linear incremental model tree, where the algorithm incrementally built the tree while partitioning the input-space using a binary tree formation. On the other hand, the expression-tree approach for fuzzy rule implementation and optimization using a rule-tree population was performed in (sanchez2001combining, cordon2002new). Their approach also included a mapping of rule-tree parameters (leaf nodes) onto a vector for optimization using simulated annealing (aarts1988simulated).

4 Neuro-fuzzy systems
Since the early 90s (jang1991fuzzy, jang1993anfis, buckley1994fuzzy, andrews1995survey, karaboga2018adaptive), neuro-fuzzy systems (NFS), which represent a fusion of FIS and NN, have been at the forefront of FIS research dimensions, especially attributed to their data-driven learning ability, which does not require prior knowledge of the problem. However, an NN needs sufficient training patterns to learn, and a trained NN model does not explain how to interpret its computational behavior, i.e., the NN's computational behavior is a "black box" that does not explain how the output was obtained from the input data. On the other hand, a FIS requires prior knowledge of the problem and does not have learning ability, but it tells how to interpret its computational behavior, i.e., it explains how the output was obtained from the input data.
The shortcomings of both NN and FIS can be eliminated by combining them into an NFS (feuring1999stability, ishibuchi2001numerical). Usually, for rule extraction from an NFS, two types of combinations are practiced (andrews1995survey): cooperative NFS and hybrid NFS. The cooperative NFS is the simplest approach, closer to the combination and association synergetic AI (Fig. 7). In a cooperative NFS, the NN and FIS work independently, and the NN determines the FIS's parameters from the training data (sahin2012hybrid); subsequently, the FIS performs the required interpretation of the data. A hybrid NFS is closer to the fusion synergetic AI (Fig. 7), in which both NN and FIS are fused to create one model. Working in synergy improves the learning ability of an NFS, since both NN and FIS are independently capable of approximating to any degree of accuracy (buckley1999equivalence, li2000equivalence).
NFSs are trained in two fundamental manners: supervised learning (see Section 2.3) and reinforcement learning (lin1994reinforcement, moriarty1996efficient). This paper's scope covers supervised learning extensively, whereas reinforcement learning for NFS is available in (berenji1992learning), through the implementation of generalized approximate reasoning-based intelligent control, and in (nauck1993fuzzy), through a model named NEFCON.
4.1 Notions of neuro-fuzzy systems
Self-adaptive/Self-organizing/Self-constructing systems
In NFS’s context, the adaptive systems or the selfadaptive systems may refer to the automatic tuning and adjustment of MF’s parameters (jang1993anfis, wang2002self). Whereas, a system is nonadaptive if human expert determines the MFs and their parameters. Similarly, selforganizing systems (juang1998online, wang1999self) and selfconstructing systems (lin2001self) refer to the creation of fuzzy rules and the adaptation of MF’s parameters without the intervention of human experts. The implementation of a selforganizing NFS and a selfconstructing NFS holds the key to formation appropriate RB (juang1998online, lin2001self).
There are two learning aspects of a self-adaptive NFS: structural learning and parameter learning (lin1995neural). An NFS, therefore, is self-adaptive if it performs either or both of these learning aspects during learning. In addition to learning without human intervention, when adaptive systems like self-adaptive and self-constructing systems strictly refer to online training and incremental learning on every piece of new training data, the system may be referred to as an evolving fuzzy system (EFS) (see Sec. 6).
Online learning system/Dynamic learning system
Online learning refers to sample-by-sample learning. A learning system is an online learning system if it adapts its structure and parameters each time it sees a training sample, rather than seeing the entire training-sample set (batch) at once (jang1993anfis). Similarly, a dynamic learning system or a dynamically changing system adapts its structure and parameters on receiving a new training sample (wu2000dynamic, wu2001fast). In a sense, systems that grow their structures by adding MF nodes and rule nodes are also referred to as dynamically growing systems and dynamic evolving systems (kasabov2002denfis, kasabov2001line). FIS's research dimension EFS encompasses online and dynamic learning systems (see Sec. 6).
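The sample-by-sample regime above can be sketched with a toy LMS-style update of a single consequent constant in a zero-order TSK rule (a hedged illustration; `online_fit`, the learning rate, and the one-rule model are assumptions, not a surveyed algorithm):

```python
# Minimal sketch of online (sample-by-sample) learning: the model is a single
# zero-order TSK consequent constant c, adapted immediately after each sample
# rather than after seeing the whole batch.
def online_fit(samples, lr=0.1):
    c = 0.0
    for t in samples:      # one pass, one sample at a time
        error = t - c      # the model's output is simply c here
        c += lr * error    # adapt before the next sample arrives
    return c
```

A batch learner would instead accumulate statistics over all samples and update once; the online form trades per-step optimality for the ability to track a stream.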
Another viewpoint refers to dynamic learning systems as recurrent fuzzy systems, i.e., systems that accommodate temporal dependency and whose next (one-step-ahead) adaptation (learning) is a function of the model's previous output (jang1992self, juang1999recurrent). In FIS research, these terms are used in diverging contexts.
4.2 Layers of neuro-fuzzy systems
An NFS architecture is typically composed of at most seven layers, as shown in Fig. 10, and these layers can be customized in various forms for both type-1 and type-2 FISs. Type-1 and type-2 FISs differ only in the type of FSs they use. Hence, the variation between type-1 and type-2 NFS architectures depends on the FS type used at the MF layer and on the methods used at the nodes to perform the computation for type-1 and type-2 FSs. Moreover, the type-reduction required for a type-2 FIS can be implemented at one of the layers available in the consequent part.
The implementation of an NFS architecture is categorized into two types of layers: the layers implementing the antecedent part and the layers implementing the consequent part of a rule. The number of layers in an NFS design may vary depending on how the antecedent and consequent parts are implemented. Whether or not a layer of Fig. 10 explicitly appears in an NFS architecture, its functionality is accommodated in one of its adjacent layers. Let us discuss the functionality of the typical NFS layers:
Input layer (I):
A node at the input layer holds an input x_i and primarily has the identity function f(x_i) = x_i, i.e., the raw input is fed to the next layer without any manipulation. To the best of our literature knowledge, all models agree on transferring inputs to the next layer without any modification. Hence, x_i represents the output of a node of the input layer, where i = 1, 2, ..., n and n is the dimension of the input space. However, models differ in whether they fuzzify inputs at the membership function layer (M) or fuzzify inputs by employing a fuzzy weight on the link connecting the input layer (I) directly to the rule layer (R).
The connections/links between I and M are, therefore, not fully connected. Rather, each input is connected only to its own partitioned FSs. Likewise, in the absence of layer M, the connections between I and R are not fully connected. Such partial connections between I and M, or between I and R, play an important role in obtaining diverse rules.
Membership function layer (M):
A node at the MF layer M, also called the fuzzifier layer, holds a fuzzy set and primarily computes a membership degree μ(x_i), i.e., an MF is applied to input x_i. MFs are often problem-specific. An MF can be a Gaussian, a triangular, or a trapezoidal function. The MF layer is often referred to as the fuzzification layer since it performs fuzzification of the inputs. The MF layer is also responsible for partitioning the input space (Fig. 6). The mapping of inputs to the MF layer also helps to overcome the curse of dimensionality (brown1995high).
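Since the text names Gaussian, triangular, and trapezoidal MFs, a minimal sketch of these three functions may help (function and parameter names are illustrative, not taken from any surveyed model):

```python
import math

# Common type-1 membership functions. Each maps a crisp input x to a
# membership degree in [0, 1].
def gaussian_mf(x, c, sigma):
    # c: center, sigma: width
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def triangular_mf(x, a, b, c):
    # a < b < c; the peak (degree 1) is at b
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoidal_mf(x, a, b, c, d):
    # a < b <= c < d; the plateau (degree 1) spans [b, c]
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)
```

The choice among these shapes is the "problem-specific" decision the paragraph refers to: Gaussian MFs are smooth and differentiable (convenient for gradient-based parameter learning), while triangular and trapezoidal MFs are cheaper to evaluate and easier to interpret.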
Additionally, whether the MF layer is a separate layer or acts as a fuzzy weight between layers I and R, the MF layer's operation remains the same. The input to the MF layer is x_i, which is partitioned into FSs. Traditionally, the input partition is kept fixed. However, automatically determining the input-space partition by using a clustering-based method gives flexibility to the NFS's structural adaptation, and such an act is often referred to as structural learning. It also reflects the notion of the self-constructing system (lin2001self). Examples of clustering for input-space partitioning are: K-nearest neighbor (wang1999self); mapping-constrained agglomerative clustering (wang2002self); evolving clustering (kasabov2001evolving); and the evolving self-organizing map (deng2003line).
Rule layer (R):
A node at the rule layer R primarily computes a T-norm of the previous layer's outputs, e.g., w_j = ∏_i μ_ij(x_i). Thus, a node at the rule layer represents the antecedent (premise) part of a rule that takes as inputs the membership degrees μ_ij(x_i), where μ_ij is the FS of input x_i fed to rule node j.
The number of inputs to a rule node may or may not be equal to the total number of partitions of an input x_i. This indicates that the connections between layer M and layer R, which are often only partially connected, govern the diversity of the rules being formed. It also gives flexibility for structural adaptation (structural learning) in the fuzzy system being realized. For example, an algorithm may start with no rules and recruit a rule node only when necessary during its online learning (tung2011safin, juang1998online). The output of a rule-layer node, in other words the firing strength of the rule's antecedent, is denoted w_j. Hence, the outputs of all rule nodes can be denoted as w_1, w_2, ..., w_R, where R is the number of rules.
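A hedged sketch of the rule node's T-norm computation follows; the algebraic product and the min are the two common T-norm choices, and the function names are assumptions:

```python
from functools import reduce

# A rule node aggregates the membership degrees routed to it from the MF
# layer into a single firing strength via a T-norm.
def rule_firing_product(membership_degrees):
    # algebraic-product T-norm
    return reduce(lambda a, b: a * b, membership_degrees, 1.0)

def rule_firing_min(membership_degrees):
    # min T-norm
    return min(membership_degrees)
```

Because the M-to-R connections are only partial, each rule node receives just the degrees of the fuzzy sets named in its own antecedent, which is what makes the resulting rules diverse.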
Normalization layer (N):
The normalization layer computes the normalized firing strength of the rules, w_j / Σ_k w_k. The number of nodes at layer N is thus equal to the number of nodes at layer R, and the connections between R and N are fully connected.
Term/Consequent layer (T):
The nodes at the term layer compute the consequent part of a rule. Thus, the number of nodes at the term layer is the same as the number of nodes at layers R and N. Each node at this layer has a function whose definition depends on the FIS type implemented, e.g., Mamdani or TSK; in other words, on what type of function is implemented at the nodes of layer T (horikawa1992fuzzy). Assuming the nodes at layer T are constants (zero-order TSK), the output of node j is its normalized firing strength multiplied by a constant c_j. Another type of consequent/term implementation is the TSK (first-order linear equation) node, whose consequent is a_j0 + a_j1 x_1 + ... + a_jn x_n.
Additional layer (X):
The additional layer X is infrequent in NFS architecture designs; it performs a specific operation producing its own output. In (park2002fuzzy), this operation is defined by a polynomial neural network. Whether the additional layer is present or absent, the signal fed to the output layer is the output of the last preceding layer (layer X when present, layer T otherwise).
Output layer (O):
For a single-output problem, the output layer holds a single node that usually computes the summation of its incoming inputs. Therefore, the output node acts as a defuzzifier. Hence, the operation at the output layer applies an aggregation function to the incoming signals to obtain the NFS's output y.
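Putting the layers together, here is a minimal sketch of a type-1, first-order TSK forward pass through the I, M, R, N, T, O pipeline. All parameter values, fuzzy-set placements, and the two-rule structure are illustrative assumptions, not drawn from any surveyed model:

```python
import math

def gaussian(x, c, sigma):
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def tsk_forward(x1, x2):
    # MF layer (M): two Gaussian fuzzy sets per input
    mu1 = [gaussian(x1, 0.0, 1.0), gaussian(x1, 1.0, 1.0)]
    mu2 = [gaussian(x2, 0.0, 1.0), gaussian(x2, 1.0, 1.0)]
    # Rule layer (R): product T-norm, one fuzzy set per input per rule
    w = [mu1[0] * mu2[0], mu1[1] * mu2[1]]
    # Normalization layer (N)
    wbar = [wi / sum(w) for wi in w]
    # Term layer (T): first-order TSK consequents a_j0 + a_j1*x1 + a_j2*x2
    y = [1.0 + 0.5 * x1 + 0.5 * x2,
         -1.0 + 2.0 * x1 + 0.0 * x2]
    # Output layer (O): weighted sum acts as the defuzzifier
    return sum(wb * yj for wb, yj in zip(wbar, y))
```

In an adaptive NFS, the Gaussian centers/widths (premise parameters) and the consequent coefficients would be the quantities tuned during parameter learning, while adding or pruning rules would constitute structural learning.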
4.3 Architectures of neuro-fuzzy systems
4.3.1 Feedforward designs
Feedforward NFS architectures have forward connections from one layer to the next and have at least three layers: input, rule, and output. Therefore, the simplest NFS architecture is IRO, i.e., the Input, Rule, and Output layer architecture.
IRO architecture:
masuoka1990neurofuzzy represented the IRO NFS architecture as a combination of the input-variable membership net, the rule net, and the output-variable membership net. Moreover, the fuzzy rules are directly translated into NNs, where the nodes at layer I realize the rules' antecedent MFs, the nodes at layer R represent the fuzzy operation (e.g., AND), and the nodes at layer O realize the rules' consequent parts. This type of representation can easily be translated back and forth between fuzzy rules and NNs. However, expert intervention is required in the NFS construction.
buckley1995neural presented a design of the three-layered IRO NFS architecture and implemented it for discrete and non-discrete fuzzy systems (Fig. 11a). Being a three-layered architecture, their discrete IRO NFS implemented fuzzy rules as the links between layers I and R, and layer O processed the incoming signals from the transfer functions (nodes at layer R) using an aggregation function. The rules in the discrete IRO NFS can therefore run in parallel. However, for a large input dimension, the rules can grow to an unmanageable size for a low discretization factor (buckley1995neural). On the other hand, in a non-discrete IRO fuzzy system, the hidden-layer nodes represent the rules and the weights of the links between I and R are set to 1. The nodes at the output layer represent an aggregation of the signals from R.
In the IRO NFS of (buckley1995neural), the fuzzy rules are implemented as a whole, either on the links between I and R or at the nodes of R. In contrast, nauck1997neuro proposed a three-layered IRO NFS architecture with the links between layers I and R and between layers R and O representing MFs, also called fuzzy weights. In other words, the links between I and R fuzzify the inputs before feeding them to the nodes at R, and the links between R and O defuzzify them before feeding them to the nodes at O.
The IRO NFS architecture shown in Fig. 11b was proposed for specific problems like classification and approximation, bearing the abbreviations NEFCLASS (nauck1997neuro) and NEFPROX (nauck1999neuro), respectively. The NFSs shown in Fig. 11 implement the links as fuzzy weights, which improves NFS interpretability since it avoids assigning more than one MF to similar linguistic terms (nauck1997neuro).
IMRO architecture:
The NFS design IMRO (input, membership, rule, and output layer architecture, Fig. 11c) directly computes the output of the FIS by assigning weights to the links between layers R and O (lin2001self, wu2001fast). The IMRO NFS architecture by wang1999self is a four-layered configuration in which layers I and M fuzzify the inputs. Layer O consists of two nodes. The first node computes a weighted sum of the incoming inputs from R, where the link weights between R and O represent the centers of the consequent FSs. The second node computes the plain sum of the incoming inputs from R, with the corresponding link weights set to 1. The output-layer node, therefore, realizes the ratio of the weighted sum to the plain sum.
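The two-node output computation just described can be sketched as a center-of-sets style ratio (a hedged illustration; `imro_output` and its arguments are assumed names):

```python
# Sketch of the IMRO output layer: node 1 forms a weighted sum of rule
# firings (weights = consequent-set centers), node 2 forms the plain sum,
# and the final output is their ratio.
def imro_output(firings, centers):
    weighted = sum(w * c for w, c in zip(firings, centers))  # first node
    total = sum(firings)                                     # second node
    return weighted / total                                  # output node
```

This ratio collapses normalization, consequent weighting, and defuzzification into the output layer, which is why the four-layer IMRO design needs no separate N and T layers.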
IMRNO/IMRTO architecture:
The five-layer NFS architecture (Fig. 11d) adds a layer N or T between layers R and O to perform fuzzy quantification via rule normalization or via fuzzy term nodes (kasabov1997funn, kim1999hyfis). An example of an IMRNO NFS architecture with a normalization layer between R and O is in (kasabov1997funn), whereas IMRTO NFS architectures with a term layer are the common practice. The nodes at layer T compute fuzzy outputs, and the links between R and T represent the firing strengths (confidence factors) of the rules at R (kasabov1997funn, kim1999hyfis, kasabov2001line, kasabov2002denfis).
Contrary to the IMRNO and IMRTO architectures, the five-layered NFS presented by leng2006design is an IRNTO architecture with layers I, R, N, T, and O. In the IRNTO model, the nodes at layer R combine both the MF layer and the rule layer, and the term layer T between N and O performs a TSK-type consequent operation for the rule.
In general, a five-layer NFS architecture implements its rule antecedents across layers I, M, and R, where the nodes at R implement the rule's T-norm or AND function. Layers T and O implement the rule's consequent part and perform defuzzification. However, apart from the AND operator at R and defuzzification at layers T and O, an example of alternative operators at the rule and output layers is available in (shann1995fuzzy).
IMRNTO architecture:
The IMRNTO NFS architecture is the most popular NFS architecture, which is attributed to its efficiency and the explicit presence of the FIS's components in the architecture (jang1991fuzzy, horikawa1992fuzzy). ANFIS is the most popular implementation of an IMRNTO NFS (jang1993anfis). IMRNTO NFSs are six-layered architectures with layers I, M, R, N, T, and O. The functioning of the nodes is described in Sec. 4.2.
IMRNTXO architecture:
Beyond the IMRNTO NFS architecture, the IMRNTXO NFS architecture includes an additional layer X that performs a certain computation, receiving inputs from layer T and feeding the computed output to the node(s) at layer O. The modified fuzzy polynomial neural network (park2002fuzzy) is an example of such a seven-layered architecture, where a polynomial NN implements a polynomial function (like bilinear or biquadratic) that resembles the consequent part of the TSK type.
Of the general NFS architecture in Fig. 10, five variations in NFS architecture formation are shown in Fig. 11: IRO (three layers), IMRO (four layers), IMRNO/IMRTO/IRNTO (five layers), IMRNTO (six layers), and IMRNTXO (seven layers). The choice of a particular variation has its advantages and disadvantages. For example, the IRO architecture limits itself to three layers, which restricts it to computing the entire FIS operation on a few nodes. When the IRO architecture computes input fuzzification at the input-layer nodes, an input is limited in how it can mix with multiple FSs; and when input fuzzification takes place on the links between the input and rule layers, an input mixes with all available FSs in a fully connected network, which limits proper fuzzy partitioning. However, IRO architectures are easy to implement, and they can be translated into fuzzy rules more easily than more complex architectures.
The four-layer IMRO architecture solves the fuzzy-partition issues that may appear in the three-layer IRO architecture, since it adds a membership layer between the input and rule layers. In the IMRO architecture, optimizing the weights between the input and membership layers may lead to direct optimization of the FS shapes, in addition to comparatively more variation in rule design (Fig. 11c) than the IRO architecture.
The five-layer and six-layer architectures IMRNO/IMRTO/IRNTO and IMRNTO add FIS components more explicitly than the three-layer and four-layer architectures. Thus, they offer more efficient ways to design an NFS as a FIS. In the five-layer architecture, the fourth layer is chosen as either a normalization layer or a term layer, whereas the six-layer architecture uses both normalization and term layers. Moreover, the seven-layer architecture IMRNTXO adds an extra layer for a special purpose, such as a polynomial network operation.
The difference among the various architectures is apparent in the increasingly explicit presence of the FIS components in architectures with more layers compared to those with fewer layers. This explicit presence also offers efficiency and the opportunity to optimize the NFS architecture down to individual FIS components.
4.3.2 Feedback/Recurrent designs
Unlike the feedforward architecture, which models a static system and can adapt to a dynamic system only through a prepared training set and incremental learning methods, the feedback/recurrent design accommodates a dynamic system directly in its structure (the model's learning), either via an external feedback mechanism or via an internal mechanism (mastorocostas2002recurrent). The recurrent/feedback NFS (RNFS) helps implement systems that require the output at one time step to be fed back as an input to the network at the next step. The external-feedback RNFS is the most straightforward implementation of the RNFS architecture, where the rules receive the network output directly as an input at the next time step. The internal-feedback NFS design fits when a system requires memory elements to be implemented as FIS components to define the temporal relations of a dynamic system. That is, at the next step, a particular layer of the RNFS (e.g., the membership, rule, or term layer) receives as input the output of the previous time step. Examples of both RNFS categories are as follows:
External feedback RNFS
In an external-feedback RNFS design, the NFS architecture may remain the same as a basic feedforward NFS, but the system incorporates feedback through one or multiple sources. Such feedback adaptation can be incorporated through a learning algorithm like temporal backpropagation, e.g., the recurrence in ANFIS (jang1992self).
Internal feedback RNFS
The internal-feedback NFS design IRO (lee2000identification) feeds each MF node its current input combined with the node's previous output. That is, the recurrence occurs at the MF nodes, which enables a membership-layer node to operate as a memory unit that extends the NFS's ability for temporal problems (Fig. 12a). Unlike the IRO design, the memory elements in the IMO design are added at the rule layer, and the nodes are called context elements (Fig. 12b); they accommodate both the spatial firing from MF nodes and the feedback (temporal) firing from term nodes (juang1999recurrent). The IMRO design is the third type of internal-feedback implementation, where term nodes act as the memory elements (mastorocostas2002recurrent).
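A hedged sketch of an internal-feedback MF node of the kind described above, where the membership degree at each step depends on the current input plus a weighted copy of the node's previous output (the feedback gain `lambda_`, the Gaussian parameters, and the function names are assumptions):

```python
import math

# One step of a recurrent MF node: the effective input is the current input
# plus a scaled copy of the node's previous output, giving one-step memory.
def recurrent_mf_step(x, prev_output, c=0.0, sigma=1.0, lambda_=0.5):
    z = x + lambda_ * prev_output
    return math.exp(-((z - c) ** 2) / (2 * sigma ** 2))

# Unroll the node over a short input sequence.
def run_sequence(xs):
    o, outs = 0.0, []
    for x in xs:
        o = recurrent_mf_step(x, o)
        outs.append(o)
    return outs
```

Because the membership degree now depends on history, two identical crisp inputs can yield different firings at different time steps, which is exactly the temporal behavior a feedforward MF node cannot express.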
4.3.3 Graph- and network-based architectures
Apart from these two classes of architecture, a general graphical model for information flow was proposed as the Fuzzy Petri net (looney1988fuzzy). A Fuzzy Petri net is a directed graph with nodes (neurons) and transition bars (links) that are enabled or disabled when neurons fire. The NFS feedforward and feedback architectures can therefore be thought of as special cases of this graphical representation. Additionally, an example of FISs combined with adaptive resonance theory (ART) to create the fuzzy ART architecture is available in (carpenter1991fuzzy). Similarly, FISs were fused with the min-max network to create a fuzzy min-max network architecture (simpson1992fuzzy), and fused with the radial basis function (RBF) network to create a fuzzy RBF architecture (cho1996radial).
5 Hierarchical fuzzy systems
GFS is a process of empowering FISs with automatic optimization and learning, which focuses on designing the FIS's components. NFS is NN-inspired and enables the arrangement of the FIS's components into a network-like structure. In contrast, the hierarchical fuzzy system (HFS) is a hierarchical arrangement of two or more small standard FISs (say, fuzzy logic units, FLUs, as denoted in Fig. 13) into a hierarchical structure. Hence, HFS invites the following questions:

What are the basic advantages of arranging small FLUs?

What are the possible ways to arrange FLUs?
5.1 Properties of hierarchical fuzzy systems
Let us take the example of Fig. 9, a standard practice of rule-set formation for FISs. Now assume the rule table in Fig. 9 has n inputs, and each input takes m FSs. Hence, the number of rules will be m^n, which means that the number of rules grows exponentially at the rate of m^n, and subsequently, the number of parameters to be optimized grows exponentially. This phenomenon is known as rule explosion and the curse of dimensionality. Rule explosion reduces the basic FIS property of interpretation, i.e., the reasoning as to how the output was obtained for the inputs becomes unknown. It also leads to infeasible computation in both space (rule storage) and time (torra2002review). Additionally, in both GFS and NFS, input-space partitioning plays a crucial role in the FIS's construction, and both GFS and NFS have to employ an external method like clustering to reduce the input-space dimensionality. hoffmann2001genetic illustrated a GP-based binary-tree-like input-space partition that hierarchically partitions the input space, but it forms a standalone FIS.
raju1991hierarchical initiated the design of the hierarchical FIS (HFS), composed of low-dimensional fuzzy subsystems called fuzzy logic units (FLUs). One of the arguments for HFS was to overcome the curse of dimensionality (brown1995high) and stop the rule explosion by combining several fuzzy subsystems, each receiving only a few inputs from the whole set of inputs (Fig. 13). This allows a reduction in fuzzy rules, total system parameters, and computation time. Also, the hierarchical design of fuzzy subsystems was found to have universal approximation ability (wang1999analysis, zeng2005approximation, wang1998universal).
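The rule-count contrast that motivates HFS can be checked directly: a standard FIS with n inputs and m FSs per input needs m^n rules, while a cascaded HFS built from two-input FLUs needs (n - 1) * m^2 rules, i.e., linear rather than exponential growth in n (a standard result; the function names below are illustrative):

```python
# Rule counts for a standard (single-stage) FIS versus a cascaded HFS of
# two-input FLUs, with n inputs and m fuzzy sets per input.
def standard_rules(n, m):
    return m ** n              # exponential in n

def cascaded_hfs_rules(n, m):
    return (n - 1) * m ** 2    # linear in n
```

For example, with m = 3 fuzzy sets per input, six inputs cost 729 rules in a standard FIS but only 45 in a cascaded HFS, and the gap widens rapidly as n grows.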
Moreover, HFS offers intelligent control over the system in a dynamically changing domain environment (karr2000synergistic). Such control may be implemented by allowing one of the FLUs in the HFS to act as a performance checker and to optimize the entire HFS with its feedback.
Torra et al. (torra2002review) reviewed HFS and presented the following observations for defining an HFS architecture: if some functions are not decomposable, then an HFS design may not be possible, but for some functions HFS is proved to be a universal approximator (wang1998universal). If the system's nonlinearities are independent, then separate FLUs can be constructed. If no preference is given to the order (importance) of variables, then a general HFS is trivial to design; otherwise, the preferred variables should go in the beginning stages of the hierarchy. If the MF for a variable is sharp, then more MFs should be defined for that variable (wang1999analysis). Finally, the interpretability of an HFS might become unknown when reasoning (defuzzification) is repeated at multiple stages (maeda1996investigation).
wang2006survey summarized the literature investigating the reasoning transparency of the intermediate variables generated by defuzzification at FLUs at different stages, and concluded that little work had been done to understand intermediate variables fully. However, in this view, the HFS's interpretability can be improved, provided there is sufficient monotonicity of the FLUs with respect to the inputs (magdalena2018hierarchical). kouikoglou2009monotonicity concluded that under certain conditions (won2002parameter), the single-stage HFS's output is monotonic, and hence this is sufficient for the monotonicity of a multistage HFS design.
5.2 Implementations of hierarchical fuzzy systems
The classification of HFS types is intuitive since the HFS design is a modular arrangement; thus, HFSs show variety in design and modeling (lee2003modeling). A general HFS design is any combination of FLUs in stages (Fig. 13). Special cases of the general arrangement are a cascaded (incremental) design of FLUs (chung2000multistage) and a chain-wise FLU arrangement (domingo1997knowledge).
Converting standard FISs to HFS
Standard FISs can be transformed into HFSs. joo2003method transformed a standard FIS which has rules for inputs and