1 Introduction
The expanding utilization of smart sensors, the increasing availability of data storage, and the emergence of big data have led to an increasing amount of data being produced very often in the form of a stream cohem2008real ; krawczyk2017ensemble ; maia2020evolving . In many realworld applications, this data is organized in the form of a time series. In a time series forecasting problem, the information available for the prediction is limited to the past values of the series cohem2008real . Hence, the temporal relationships which describe the evolution of the series must be deduced exclusively from these values.
Generally, the characteristics of the processes which generate a time series are unknown assaad2008timeseries . In several cases of practical interest, such as stock indices qiu2019fusion in finance, evapotranspiration in agriculture zhao2019ensemble , among others, the variable of interest in the time series is in fact the fusion of many other data sources. Not by accident these kinds of time series tend to show highly nonlinear and nonstationary patterns qiu2019fusion ; zhao2019ensemble .
Many forecasting methods assume that the data is generated from a fixed probability distribution. However, as mentioned before, many timeseries related applications deal with nonstationary and heteroskedastic stochastic processes which may arise from phenomena such as: seasonality, periodicity, hardware and machine faults, aging sensors and components, and unexpected events. These changes modify the properties of the data generating process, then changing its underlying probability distribution over time. In such nonstationary environments, any nonadaptive model trained under the false stationarity assumption is deemed to present a progressively increasing error or simply fail at some point
(Ditzler2015, ).Fuzzy Time Series (FTS) was introduced by Song and Chissom in 1993 (song1993fuzzy, ) to handle with vague and imprecise knowledge in time series data. In FTS, the domain of the variable of interest, called Universe of Discourse (UoD), is divided into subdomains, and each of them is linked to a fuzzy set. After the construction of these fuzzy sets, temporal patterns of the type IFTHEN are extracted from the training data in order to identify a rulebase able to represent the generating function of the time series.
FTS forecasting methods have become attractive due to their simplicity, model transparency, forecasting accuracy and computational performance. Some examples of successful applications are found in tourism demand forecasting (lee2011weighted, ), energy load (chen2016electric, ; sadaei2017short, ; severiano2017very, ), stock index price predictions (Chen2011, ; silva2016interval, ; talarposhti2016stock, ), and many more. However, when dealing with nonstationary stochastic processes, the values of the time series might go outside the UoD as defined from the training data. Furthermore, the initial setting of the fuzzy sets may become inadequate over time due to the lack of mechanisms that will allow the membership functions to adapt to the varying behavior of the time series (garibaldi2008nonstationary, ).
In (song1993fuzzy, ), Song and Chisom presented an approach to induce timevariant Fuzzy Time Series, by retraining the FTS model in a sliding window. Thus, every time a new data point is fed to the model, the FTS has to be retrained from scratch. In many cases, this requirement may render the methods impractical due to the related computational costs.
Recently, in (2018esann, ), a first attempt was made to avoid retraining in FTS when they are applied to nonstationary environments. In this approach, NonStationary Fuzzy Sets (NSFS) (garibaldi2008nonstationary, )
were employed to forecast heteroskedastic time series with unconditional variance, i.e., time series where the variance changes through time in a predictable way. Their approach, however, is not able to account for conditional variances and scenarios with conceptdrift.
Given the need to produce adaptive models for forecasting in nonstationary environments bose2019designing , in this paper, we introduce a NonStationary Fuzzy Times Series (NSFTS) method with time varying parameters adapted from the data distribution.
In the proposed approach, the FTS is also built on NonStationary Fuzzy Sets (garibaldi2008nonstationary, ). Based on the residual errors, different perturbation mechanisms adapt the membership functions in response to statistical changes in the time series. Thus, the fuzzy sets will reflect changes in the stochastic data generating process without model retraining. Differently from (2018esann, ), the proposed mechanisms give to the FTS the ability to handle nonstationary and heteroskedastic data as well as scenarios with conceptdrift.
To validate the proposal, different data sets consisting of market indices, FOREX pairs, cryptocurrency exchange rates and synthetic data were used. These data sets were selected because they are all nonstationary and present different types of concept drift. The forecast accuracy of the proposed method was also compared with other methods.
The remainder of this work is organized as follows: in Section 2, the main concepts of Fuzzy Time Series and nonstationary Fuzzy Sets are introduced; in Section 3, time variant methods are discussed; in Section 4, the NSFTS method is presented and Section 5 discusses the results of the computational experiments performed to compare the proposed method against the others methods. Finally, in Section 6, the main findings of the research are synthesized.
2 Preliminaries
2.1 Nonstationary Time Series
Time series data observed in different realworld applications are often nonstationary. Given that a stationary time series is defined in terms of its mean and variance, nonstationarity can be detected if any (or both) of these components vary over time. Thus, in a nonstationary context, if the chosen forecasting model relies on a false assumption of stationarity, there is a significant risk that it will present a degrading performance over time and eventually become obsolete (Ditzler2015, ).
Some nonstationary time series are consequence of hidden contexts, not given in the form of predictive features (Tsymbal2004, ). In such situations, inferring a forecasting model becomes more difficult, since changes in the hidden context can induce unpredictable changes in the target concept. These changes are known as concept drift and since they usually cannot be identified explicitly Ditzler2015 , an effective forecasting model should be able to quickly adapt to the resulting changes in the time series data.
A symptom of concept drift in a time series is the change of the parameters that define its distribution (gama2014survey, ). Following (gama2014survey, )
, the change in the distribution can be classified with respect to the rate at which the drift occurs. For instance, a political event that suddenly causes a strong effect on the stock market is a case of
abrupt concept drift observed in a time series. Another example is the aging effects of a sensor which gradually leads to lower performance in a device. Such case can be referred to as gradual concept drift. According to Ditzler et al. (Ditzler2015, ), the drifts, whether abrupt or gradual, can also be classified as permanent or transient. The former is related to the effect of variation, and is not limited in time, while the latter occurs in a limited time window followed by an effect of disappearance.Due to the practical importance of problems with varying statistical properties, the literature has presented some alternatives to handle the issue. Popular approaches commonly used in forecasting methods, including FTS, are: (i) detrending (kim1999fuzzy, ; huang2003applications, ), in which a data transformation is applied to the timeseries to remove the trend component and (ii) timevarying models derived from the distribution of the raw data (liu2017simulation, ).
2.2 Fuzzy Time Series
Symbol  Description 

Fuzzy time series order (lags)  
Number of fuzzy sets for partitioning the universe of discourse  
Time series model  
Time instant  
Time series value at time  
Estimated time series value at time  
Universe of discourse  
jth fuzzy set  
Membership function of fuzzy set  
Number of time steps  
Fuzzified value of the series at time  
Fuzzy set midpoint  
Window of observations  
Perturbation function for the nonstationary fuzzy sets  
Set of residuals  
Linguistic variable  
Refreshing interval  
Displacement applied to the fuzzy set  
Scaling factor  
Window size  
Sample size  
Set of fuzzy sets in the left hand side of a fuzzy logical relationship  
Set of fuzzy sets in the right hand side of a fuzzy logical relationship 
Since the introduction of the Fuzzy Time Series in (song1993fuzzy, ), several categories of FTS methods have been proposed, varying mainly by their order , number of fuzzy sets and timevariance (bose2019designing, ) – see Table 1 for the convention of symbols adopted here. The order is defined by the number of timedelays (lags) that are used in modeling the time series. The time variance defines whether the FTS model changes over time, with the Time Invariant models having the same parameters for all , and the Time Variant models having different parameters in different time instants .
Given an univariate time series , where are the instances of for , the UoD is limited by the known bounds of , such that . The training procedure of an FTS model consists of the following three steps:

Partitioning: split in overlapping intervals. For each interval a new fuzzy set is created, each one with its own membership function (MF) . A linguistic value is assigned to each fuzzy set and represents a region of . The computational cost of this step is .

Fuzzification: maps the crisp time series onto the fuzzified time series , by replacing each by the fuzzified value , , for . The computational cost of this step is .

Knowledge Extraction: creates a representation of the sequential patterns in the time series. In a rulebased FTS, as in (chen1996forecasting, ), the rules have a format , where is the precedent and is the consequent. The rule can be read as “IF is AND AND is THEN may be ”. The computational cost of this step is .
Once the model is trained it can be used to forecast values of given a sample with a three step procedure:

Fuzzification: maps the crisp sample onto the fuzzified values , where each , , for . The computational cost of this step is .

Rule Matching: find in the rules whose the precedent matches with the fuzzified values . The activation of each rule is the minimum Tnorm of the individual membership values of each fuzzified value. The computational cost of this step depends on the length of the sample () and the number of rules in (), then the complexity is .

Defuzzification: the estimated value of is calculated by finding the mean value of each matched rule , by averaging the midpoints of the consequent fuzzy sets, and then calculating the sum of the mean values of each rule weighted by its activation values
Many improvements were proposed in the FTS literature. The HighOrder FTS (HOFTS) (severiano2017very, ) extended the classical method in (chen1996forecasting, ) by using several lags in the forecast and it is able to recognize more complex patterns in the time series. In yu2005weighted , Cheng2008 and sadaei2014short weights are added in the consequent of the rules in order to give more importance for certain fuzzy sets. More recently, the Probabilistic Weighted FTS (PWFTS) was proposed in (Silva2019b, ), and made available in the pyFTS library pyFTS , including weights in both precedent and consequent of the FTS rules, achieving high accuracy and outperforming traditional forecasting approaches.
In order to represent forecasting uncertainty, (silva2016interval, ), (Silva2019b, ) and (Silva2017ensemble, ) proposed approaches for probabilistic forecasting. In (Silva2019, ) and (Silva2019a, ) distributed FTS variants are proposed for Big Data time series. Multivariate time series are explored in (Silva2019c, ), which uses Fuzzy Information Granules (FIG) to propose a multivariate forecasting method. In (bose2019designing, ) a survey on the design of FTS forecasting models was provided. A comprehensive review of those aforementioned models, focused on timeinvariant and rulebased approaches can be found in (Silva2019a, ).
The major hyperparameters of FTS methods are the number of fuzzy sets and the order of the model
. These hyperparameters conduce the model training and forecasting and are responsible for the accuracy and model parsimony (the number of rules). A method for hyperparameter tuning of FTS models is presented in
patricia2020 .The FTS approaches listed in this section are all timeinvariant approaches which means that the models are trained only once and then its internal rule base does not change. To keep their accuracy, these models must make strong assumptions about the stationarity and homoskedasticity of the time series. In nonstationary environments, this is a major drawback.
To better understand the behavior of the FTS methods in nonstationary scenarios, Figure 1 shows the performances of several methods in the literature when the test data falls out of the known UoD. For this example, we used the NASDAQ dataset. It can be seen that this situation makes most of the trained models useless in the long run.
2.3 Nonstationary Fuzzy Sets
Nonstationary fuzzy reasoning and nonstationary fuzzy sets (NSFS) were introduced by Garibaldi and Ozen and Garibaldi, Jaroszewski and Musikasuwan, respectivelly in (garibaldi2007uncertainmaking, ) and (garibaldi2008nonstationary, ). The main idea is to extend the traditional fuzzy set definition by introducing a dynamic component that changes the membership function over time. This change takes several forms: a) variation in location: by displacing the parameters of the function along the UoD without changing its shape; b) variation in width: changing the shape of by stretching or contracting its bounds; and c) noise variation: by adding random noise to the membership grade.
A NSFS is defined with two functions: the nonstationary membership function (NSMF), which considers time variations of the corresponding membership function (MF), and the perturbation function, which is the dynamic component responsible for altering the parameters of the membership function given some parameter set.
According to Garibaldi et al. (garibaldi2008nonstationary, ) a nonstationary fuzzy set can be formalized as follows:
(1) 
where is a fuzzy set over a UoD characterized by a NSMF at a set of time points .
It is worth noting that the NSMF varies throughout and the time interval . This means that the NSMF parameter set should vary over time. A regular MF, , can be expressed as , where denote the parameters of . Thus, a NSMF can be denoted in an analogous way as follows:
(2) 
where each parameter can be varied over time by a perturbation function multiplied by a constant.
One of the constraints existing in several FTS models is the lack of ability to deal with conditional variances and conceptdrift scenarios, in which the statistical characteristics of the time series change, sometimes drastically. Thus, through small variation in the MFs, we deploy NSFS in FTS models to deal with variability for decisions over time and contributing to the mitigation of this drawback.
3 Time Variant Approaches
Timevariant FTS models should be applied when the data is not compliant with the stationarity and homoskedasticity assumptions. They include incremental, flexible and evolving techniques for adapting the model to the input data (bose2019designing, ; song1994forecasting, ; liu2010improved, ).
In the seminal work of Song and Chissom on time variant FTS song1994forecasting , timevariance can be seen as a metamodeling technique. It is not a proper FTS model, it is a training policy for another FTS method which controls when this method will be retrained and how many lags will be used. More specifically, the time variant approach defines , the length of the memory window, and , the refreshing interval. Thus, the chosen FTS model is built from scratch every time instants using the most recent observations of the time series.
This first timevariant FTS approach (song1994forecasting, ) was followed by several other authors who have mixed its training policy with different knowledge models and weighting schemes (liu2010improved, ; singh2007simple, ; jilani2008refined, ; vovan2018improved, ).
In Figure 2 (left) it is easy to see that all FTS time invariant approaches can be combined with the time variant method and used as the internal model. However, two major drawbacks of the classical Song and Chissom’s method can be highlighted. The first one is its limited memory. Once a new data point arrives the previous knowledge base is completely discarded. This, in turn, may lead to catastrophic forgetting of frequent patterns if the parameters and are not tuned correctly. The second one is its high computational cost. Using a binary search tree structure to organize the fuzzy sets, the time complexity for a search among them decreases from to . Thus, for a given input of size , the complexity is , given that its internal model will be retrained times with a potential cost of .
The Incremental Ensemble, see Figure 2 (right), is an alternative to control the limited memory of the Song and Chissom method by balancing the learning of new behaviors with the memory of the old ones krawczyk2017ensemble ; junior2019iterative ; assaad2008timeseries ; gama2014survey . The Incremental Ensemble is, in fact, a metamodel containing internal models. It is also controlled by the and parameters. At each interval of observations a new model is built with the last observations. is then appended to the ensemble while the oldest model is discarded.
The generalization of Song and Chissom’s method as well as the Incremental Ensemble using FTS as internal models can be seen in Figure 2. As in the Song and Chissom’s case, the major drawback of the incremental ensemble techniques is its computational cost.
4 The NonStationary Fuzzy Time Series method
The proposed NonStationary Fuzzy Time Series method extends the concepts of the Conventional FTS method (chen1996forecasting, ) to incorporate NonStationary Fuzzy Sets presented by Garibaldi et al. (garibaldi2008nonstationary, ). In the proposed forecasting procedure, the mean and variance of the residuals are used to translate and scale the sets to adapt them to the changes occurred in the data after the training process. The parameters of the fuzzy sets change towards canceling the mean of the residuals. If the variance of residuals is high, the range of the fuzzy sets should be reduced to reduce granularity. Moreover, if the bounds of U change, the sets are adapted to respond to this change as well.
By convention, it is assumed that is a real valuated univariate time series and a single data point indexed by time index . It is also assumed that all fuzzy sets have a membership function, , with triangular shapes, as defined in equation (3), where is the input value and are, respectively, the lower, midpoint and the upper basis of the triangle.
(3) 
Furthermore, all NSFS have a perturbation function , defined in (5), over the parameter set of , where is the displacement and is the scale increment. shifts the triangular function across the domain of and scales it by moving the triangle bounds with respect to the center as shown in Figure 3. Thus, the NSMF function is given by:
(4) 
where,
(5) 
The proposed method, depicted in Figure 4, consists of training, parameter adaptation and forecasting procedures.
In the training procedure, the
partitioning creates static fuzzy sets. In a forecasting model, a normal distribution of the residuals suggests that its predictive ability is consistent over the entire data range. Therefore the objective of the training procedure is to generate a model that captures all the information in the data, leaving a residual
. This is done by defining the , and parameters for each one of the fuzzy sets and using them to extract temporal patterns from the data. The most recent residuals of the model are stored in the set which consists of the differences between the model forecasts and the new data collected in that window.Given a nonstationary scenario, it is unlikely that a model with predefined and fixed parameters could manage to keep its residuals normally distributed. In the parameter adaption procedure, the mean and variance of the residuals are monitored and used to adapt the MFs. The NSFS is perturbed in order to keep the residuals as close to as possible. The goal of this procedure, is to find the best values for the parameters and of each NSFS function in order to adapt them to the changes in the data.
The forecasting procedure finds the rules that match a given numerical input and use them to compute a numerical forecast using the nonstationary fuzzy sets perturbed by the parameters and .
It can be noticed that the rationale behind the proposed method is to keep the residuals normally distributed. In this context, the main technical contribution of this paper is to introduce the adaptation procedure which will keep . This procedure is detailed in section 4.2. For completeness of presentation, the training and forescasting procedures defined in (chen1996forecasting, ) are reproduced in subsections 4.1 and 4.3, respectively.
4.1 Training Procedure
Given the training data, , the number of partitions, , and the length of the residuals window, :

Defining the universe of discourse, :
(6) where, and .
Notice that the data bounds are extrapolated to compensate for a possible underestimation of the bounds in the training set. The value is a typical value, but can be modified according to the problem.

Partitioning: Define the midpoints of the initial fuzzy sets;
(7) 
Define the linguistic variable : Create overlapping fuzzy sets , with triangular MF as defined in equation (8).
(8) where and .
Each fuzzy set is a linguistic term of the linguistic variable . Once the fuzzy sets are created, a function , as defined in equation (5), will be associated with each fuzzy set in order to transform it into a NSFS, initialized with and .
The number of sets defines the number of states and consequently the number of state transitions that the model can represent. The more complex the time series the greater the number of should be. One should, however, be careful to not overestimate since it may cause overfitting and make the model unnecessarily big.

Fuzzification: Transform the original time series data into a fuzzy time series , where is defined as:
(9) 
Generate the temporal patterns: The fuzzy temporal patterns have format , where:
The precedent (or Left Hand Side  LHS) is:
(10) And the consequent (or Right Hand Side  RHS) is:
(11) 
Generate the rule base: Select all temporal patterns with the same precedent and group their consequent sets creating a rule with the format . Thus the rule can be understood as the set of possibilities which may happen on time (the consequent) when a certain set is identified on time (the precedent).

Compute the residuals : Invoke the Forecasting Procedure defined on Section 4.3 using the given training data, , as the input and forecast the last items to calculate the set of residuals defined as:
(12) where and is the forecast produced by the model.
4.2 Parameter Adaption Procedure
Given an input value , the last forecast value
, the length of the residuals vector
, the residuals set and the linguistic variable , do:
Find the displacements of on : If is below the lower bound of store the difference in , otherwise . If is above the upper bound of , store the difference in , otherwise . Then compute the displacement range as the sum of the UoD displacements and , according with (13), and the displacement midpoint as divided by 2, according with (14)
(13) (14) 
Compute the mean and variance of the set : The residual mean indicates a bias, a shift on the accuracy of the trained model that must be corrected. The variance, , represents the shift on the trained model variance, as a reflection to a change in the test data. These values will be used to adjust the position and length of the fuzzy sets.

Compute the displacements, : The displacement is computed for each fuzzy set in order to equally move the fuzzy sets by the mean shift , one fraction of the displacement range and proportionally to the expansion of the variance in the interval :
(15) The displacements , for , are equally distributed in the interval and indicate the new position of fuzzy set , given the deviations from the known UoD, whose range is and its midpoint , and the error signal with the mean and variance . Indeed, while the term is used to translate the midpoint of by a fraction of centered in , the term is used to offset the scaling of the fuzzy set bounds by a fraction of centered in .

Compute the scaling factor , Eq. (16): For each fuzzy set the scale increment is empirically calculated as the distance between the displacements , in order to adjust the fuzzy set lengths proportionally to their displacements, avoiding discontinuities between the sets (intervals on not covered by any fuzzy set) by setting and :
(16) 
Update : Get the last forecast value and the last known value . Compute the error term , push it on to the residuals set and delete the oldest value. That is:
(17)
4.3 Forecasting Procedure
Given an input value , the linguistic variable , the inferred rule set and the perturbation parameters and for each fuzzy set :

Fuzzification: Compute the membership grade for each nonstationary fuzzy set , such that ;

Rule matching: Build a set, with all the rules where , that is:
(18) 
Defuzzification: Compute the forecast according to Equation (19) as the weighted sum of the rule midpoints, , by their membership grades for each selected rule :
(19) where
(20)
4.4 Method Discussion
The NSFS has the ability to dynamically adapt membership functions as the statistical properties of the time series vary. The NSFTS method tries to detect these changes by using the displacements of the input data in relation to the known and the residuals statistics to identify bias and variance shifts.
The major assumption of the method is that adjusting the fuzzy sets, without modifying the model structure represented by the learned rules, is enough to adapt the model to the new behavior of the time series. Such procedures, described in subsection 4.2, given an input of size , memory window length , refreshing interval and fuzzy sets, have a time complexity of , while a retraining process would take , as discussed in Section 3. Hence, this approach becomes much cheaper computationally, when compared to retraining a new model from scratch.
The function entails an additive approach for the translation and scaling of the parameters based on grid partitioning of the UoD. The procedure calculates the translation increment proportionally to the displacement of the input value in relation to the known
and the variance of the residuals. The scaling increment works as a heuristic to keep the grid aspect of the fuzzy sets after the translation, connecting the lower and upper bounds with the centers of adjacent fuzzy sets.
Figure 9 depicts the forecasting method applied to a test sample with significant drift from the training data extracted from SP500 time series, later explained in Section 5.
In Figure (a)a the partitioning of the UoD is shown with 15 partitions. The generated forecasts are presented in Figure (b)b and their residuals in Figure (c)c. The perturbations on the NSFS, in response to the previous residuals, are represented in Figure (d)d, where the same fuzzy sets represented in Figure (a)a are now colored over the vertical axis. It is possible to see that these fuzzy sets move, sometimes expanding sometimes contracting, in order to fit to the data. As the drift increases, the displacement and scaling of the fuzzy sets also grow.
The proposed model is a case of active learning, in which the learning algorithm is able to interactively obtain, from different sources of information, the necessary inputs for the generation or improvement of its learning. Basically, it uses residual error values to fit a predefined model. The version here presented focuses on highlighting its adaptability to changes observed over time. It uses firstorder fuzzy rules, that is, considering only a previous observation, and is applied to forecasting problems in univariate time series. However, its time variant model characteristics can be expanded or adapted to other models for better prediction values. For instance, the model can be expanded to comprehend a highorder fuzzy rule generation. It also can be adapted to fit parameters of other FTS models, such as PWFTS, used as benchmark in this work.
5 Computational Experiments
5.1 Experimental Design
Different datasets were chosen for model validation. The datasets consisted of four stock market indices (Dow Jones, NASDAQ, SP500 and TAIEX), three FOREX pairs (EURUSD, EURGBP, GBPUSD), two cryptocoins exchange rates (BitcoinUSD and EthereumUSD) illustrated in A (Fig. 10) and eight synthetic time series with concept drifts, see B (Fig. 11).
The market indexes data sets contain the daily averaged index by business day, such that the Dow Jones Industrial Average (Dow Jones)^{1}^{1}1https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC. Accessed in 11/11/2019 is sampled from 1985 to 2017 time window, the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)^{2}^{2}2https://www.twse.com.tw/en/page/products/indices/series.html. Accessed in 11/11/2019 is sampled from 1995 to 2014, the National Association of Securities Dealers Automated Quotations  Composite Index (NASDAQ ÎXIC)^{3}^{3}3https://www.nasdaq.com/marketactivity/index/ixic. Accessed in 12/11/2019 is sampled from 2000 to 2016 and the SP500  Standard & Poor’s 500^{4}^{4}4https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC. Accessed in 12/11/2019 is sampled from 1950 to 2017. The FOREX data sets contain the daily averaged quotations, by business day, from 2016 to 2018, which pairs are the US Dollar to Euro (USDEUR), Euro to Great British Pound (EURGBP) and Great British Pound to US Dollar (GBPUSD). The cryptocoin datasets contain the daily quotations, in US Dollar, of the Bitcoin (BTCUSD) and Ethereum (ETCUSD). The synthetic data aims to represent different types of concept drifts.
The Augmented DickeyFuller (ADF) test was used to check which time series are nonstationary. The Levene’s test was used to test whether the series are heteroskedastic or not. The detailed results of these tests are shown in C and D, respectively. In summary, except for the data sets, “Stationary Signal”, “Stationary Signal with blip” and “Sudden variance” (see B (Fig. 11)) all the benchmarks were shown to be nonstationary and only the “Stationary Signal with blip” homogeneity of variances with a confidence level of 95%.
The standard accuracy metrics used to evaluate point forecasting methods are the Root Mean Squared Error (RMSE) (21), Mean Absolute Percentage Error (MAPE) (22) and Theil’s U statistic (U) (23) where are the target values, are the forecast values and the sample size. These metrics are commonly used in evaluating forecasting models bose2019designing .
(21) 
(22) 
(23) 
The proposed NSFTS was tested against the Time Variant (song1994forecasting, ) and the Incremental Ensemble approaches, both using the PWFTS method Silva2019b available in pyFTS as internal method. As can be seen in Silva2019a , the PWFTS outperformed a wide number of forecasting methods ranging from classic statistical ones such as ARIMA asteriou2011arima
to more recent ones such as the kNearest Neighbors with Kernel Density Estimation
zhang2016k . The parameters of all methods are presented in Table 2, and they were defined through grid search.Method  Parameter  Value  

All  k  35  
1  

W  100  
R  10  
Incremental Ensemble  M  2 
The grid partitioning scheme was used for the initial generation of fuzzy sets, where the best number of partitions in the range was selected for each data set. The high order methods were tested with orders 2 and 3.
5.2 Results
The average RMSE, MAPE and U statistic results by method and data set are presented in Table 3. It can be seen that the NSFTS is superior or, at least, not worse than the other methods in all the selected real world benchmarks.
From the results on the artificial data sets, it can be seen that the NSFTS has more difficulties than the other methods in handling sudden changes in the series distribution mean. This behavior is expected from the NSFTS once it is an adaptive method and therefore presents a delay to respond to changes. On the other hand, it handles with relative ease incremental changes in the mean. Observing the Theil’s U statistic which is insensitive to the range of values of the series, one can see that changes in the variance alone do not seem to affect the NSFTS performance by much.
These observations explain the good performance of the NSFTS on the economic series. As one can see in A, these series present a gradual, though significant, variation of the mean along with different types of changes in variance. Apparently, the superior performance of the NSFTS when the mean varies gradually gave it the edge over the other methods.
Dataset  RMSE  MAPE  U  

T. Variant  I. Ensemble  NSFTS  T. Variant  I. Ensemble  NSFTS  T. Variant  I. Ensemble  NSFTS  
Stationary Signal  0.07  0.07  0.07  1.13  1.16  1.13  0.99  1.01  1.01 
Stationary signal with blip  0.16  0.17  0.14  1.29  1.27  1.22  1.14  1.24  1.00 
Sudden variance  0.12  0.12  0.12  1.80  1.81  1.73  0.98  1.00  0.98 
Sudden mean  0.25  0.55  0.79  2.20  6.00  19.89  1.72  3.85  5.66 
Sudden mean and variance  0.59  0.89  0.89  3.82  7.52  19.58  1.22  1.90  1.96 
Incremental mean  0.39  7.81  0.11  1.02  14.92  0.69  3.45  71.08  1.04 
Incremental variance  51.89  50.51  60.59  151.43  149.56  241.69  0.78  0.78  0.96 
Incremental mean and variance  1.29  2.37  1.65  5.06  8.48  5.44  1.03  1.51  1.09 
TAIEX  145.82  1130.86  116.82  1.27  9.60  1.34  1.52  11.90  1.24 
SP500  9.29  61.22  7.86  0.64  2.66  0.57  1.16  7.73  1.01 
NASDAQ  35.09  214.74  36.75  0.90  4.49  1.06  1.25  7.71  1.32 
DowJones  71.88  519.55  63.26  0.64  2.80  0.60  1.13  8.25  1.02 
BTCUSD  465.72  1775.53  180.90  4.97  32.81  3.40  2.96  11.50  1.19 
ETHUSD  44.54  222.21  24.16  7.35  41.47  4.06  2.06  10.84  1.25 
EURUSD  0.01  0.04  0.01  0.43  0.98  0.41  1.22  6.75  1.13 
EURGBP  0.00  0.01  0.00  0.37  0.48  0.32  1.24  3.61  1.08 
GBPUSD  0.01  0.07  0.01  0.41  1.18  0.42  1.23  9.62  1.26 
To assess the overall performance of the methods with respect to the RMSE we ran a Friedman Aligned Ranks (Ftest), a nonparametric test for the equality of the means was used with , where indicates that there is no significant difference between the means and indicates that there is at least one significant difference between the means. The Fvalue result was with a pvalue of . Thus, was rejected at 5% confidence level.
A oneversusall Finner paired posthoc test was employed to check the method equivalence with NSFTS as control method, where indicates that there is no significant difference between the means and indicates that there is a significant difference between the means. The results are presented in Table 4 which shows that NSFTS is not equivalent to the Incremental Ensemble method and equivalent to the time variant FTS model, considering a 95% confidence level.
It is important to remind that the NSFTS is computationally cheaper than the Time Variant method since it does not require retraining. Therefore, even though they present equivalent RMSE performance, the NSFTS still has the computational edge.
Comparison  Zvalue  Pvalue 

Result  


3.680062  0.000233  0.000466  Rejected  

0.144203  0.885340  0.885340  Accepted 
6 Conclusion
This paper proposed a NonStationary Fuzzy Time Series (NSFTS) method that is able to dynamically adapt its fuzzy sets to reflect the changes in the underlying stochastic processes based on the residual errors. The proposed approach enables the model to be trained only once and remain useful long after.
The parameter adaptation procedure developed here can be integrated into other FTS methods, extending them to deal with forecasting in nonstationary environments.
The method was tested with several nonstationary and heteroskedastic data sets consisting of four market indices, three FOREX pairs, two cryptocoin exchange rates and eight synthetic timeseries that present different combinations of conceptdrifts.
The main feature delivered by the proposed method is the capability of capturing a wide spectrum of unconditional heteroskedastic effects and time series trends by the variation of several parameters of its internal model. This is done without data preprocessing, without need of retraining and without changing the symbolic structure in the learned rules. In order to contribute to the replication of all the results in the paper, we provide full results and all source codes for this article in Github and in the MINDS Laboratory website. The link is https://github.com/PYFTS/NSFTS.
Acknowledgements
This work was partially financed by the Foundation for Research of the State of Minas Gerais (Fundação de Amparo à Pesquisa do Estado de Minas Gerais  FAPEMIG) and by the National Council for Scientific and Technological Development (CNPq), Brazil, Grants no. 405911/20173; no. 169392/20171; and no. 306850/20168.
References
 (1) L. Cohen, G. AvrahamiBakish, M. Last, A. Kandel, O. Kipersztok, Realtime data mining of nonstationary data streams from sensor networks, Information Fusion 9 (3) (2008) 344–353.
 (2) B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, M. Woźniak, Ensemble learning for data stream analysis: A survey, Information Fusion 37 (2017) 132–156.
 (3) J. Maia, C. A. Severiano Junior, F. G. Guimarães, C. L. d. Castro, A. P. Lemos, J. C. F. Galindo, M. W. Cohen, Evolving clustering algorithm based on mixture of typicalities for stream data mining, Future Generation Computer Systems 106 (2020) 672–684.

(4)
M. Assaad, R. Boné, H. Cardot, A new boosting algorithm for improved timeseries forecasting with recurrent neural networks, Information Fusion 9 (1) (2008) 41 – 55, special Issue on Applications of Ensemble Methods.
doi:https://doi.org/10.1016/j.inffus.2006.10.009.  (5) X. Qiu, P. N. Suganthan, G. A. Amaratunga, Fusion of multiple indicators with ensemble incremental learning techniques for stock price forecasting, Journal of Banking and Financial Technology 3 (1) (2019) 33–42.
 (6) T. Zhao, Q. J. Wang, A. Schepen, M. Griffiths, Ensemble forecasting of monthly and seasonal reference crop evapotranspiration based on global climate model outputs, Agricultural and Forest Meteorology 264 (2019) 114 – 124. doi:https://doi.org/10.1016/j.agrformet.2018.10.001.
 (7) G. Ditzler, M. Roveri, C. Alippi, R. Polikar, Learning in Nonstationary Environments: A Survey, IEEE Computational Intelligence Magazine 10 (4) (2015) 12–25. doi:10.1109/MCI.2015.2471196.
 (8) Q. Song, B. S. Chissom, Fuzzy time series and its models, Fuzzy Sets and Systems 54 (3) (1993) 269–277.
 (9) M. H. Lee, H. Javedani, et al., A weighted fuzzy integrated time series for forecasting tourist arrivals, in: International Conference on Informatics Engineering and Information Science, Springer, 2011, pp. 206–217.

(10)
Y. H. Chen, W.C. Hong, W. Shen, N. N. Huang, Electric load forecasting based on a least squares support vector machine with fuzzy time series and global harmony search algorithm, Energies 9 (2) (2016) 70.
 (11) H. J. Sadaei, F. G. Guimarães, C. J. da Silva, M. H. Lee, T. Eslami, Shortterm load forecasting method based on fuzzy time series, seasonality and long memory process, International Journal of Approximate Reasoning 83 (2017) 196–217.
 (12) C. A. S. Junior, P. C. L. Silva, H. J. Sadaei, F. G. Guimarães, Very shortterm solar forecasting using fuzzy time series, in: 2017 IEEE International Conference on Fuzzy Systems (FUZZIEEE), 2017, pp. 1–6. doi:10.1109/FUZZIEEE.2017.8015732.
 (13) S. M. Chen, C. D. Chen, TAIEX forecasting based on fuzzy time series and fuzzy variation groups, IEEE Transactions on Fuzzy Systems 19 (1) (2011) 1–12. doi:10.1109/TFUZZ.2010.2073712.
 (14) P. C. L. Silva, H. J. Sadaei, F. G. Guimarães, Interval forecasting with fuzzy time series, in: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016, pp. 1–8. doi:10.1109/SSCI.2016.7850010.
 (15) F. M. Talarposhti, H. J. Sadaei, R. Enayatifar, F. G. Guimarães, M. Mahmud, T. Eslami, Stock market forecasting by using a hybrid model of exponential fuzzy time series, International Journal of Approximate Reasoning 70 (2016) 79–98.
 (16) J. M. Garibaldi, M. Jaroszewski, S. Musikasuwan, Nonstationary fuzzy sets, IEEE Transactions on Fuzzy Systems 16 (4) (2008) 1072–1086.

(17)
M. A. Alves, P. C. d. L. Silva, C. A. Severiano Junior, G. L. Vieira, F. G. Guimarães, H. J. Sadaei, An extension of nonstationary fuzzy sets to heteroskedastic fuzzy time series, in: 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Springer LNCS Series, 2018, pp. 591–596.
 (18) M. Bose, K. Mali, Designing fuzzy time series forecasting models: A survey, International Journal of Approximate Reasoning 111 (2019) 78–99.
 (19) A. Tsymbal, The problem of concept drift: definitions and related work, Computer Science Department, Trinity College Dublin 106 (2).
 (20) J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM computing surveys (CSUR) 46 (4) (2014) 44.
 (21) I. Kim, S.R. Lee, A fuzzy time series prediction method based on consecutive values, in: FUZZIEEE’99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315), Vol. 2, 1999, pp. 703–707 vol.2. doi:10.1109/FUZZY.1999.793034.
 (22) N. E. Huang, M.L. Wu, W. Qu, S. R. Long, S. S. Shen, Applications of hilbert–huang transform to nonstationary financial time series analysis, Applied stochastic models in business and industry 19 (3) (2003) 245–268.
 (23) Y. Liu, B. Wang, H. Zhan, Y. Fan, Y. Zha, Y. Hao, Simulation of nonstationary spring discharge using time series models, Water Resources Management 31 (15) (2017) 4875–4890.
 (24) S.M. Chen, Forecasting enrollments based on fuzzy time series, Fuzzy Sets and Systems 81 (3) (1996) 311–319.
 (25) H.K. Yu, Weighted fuzzy time series models for TAIEX forecasting, Physica A: Statistical Mechanics and its Applications 349 (3) (2005) 609–624. doi:10.1016/j.physa.2004.11.006.
 (26) C.H. Cheng, T. L. Chen, H. J. Teoh, C. H. Chiang, Fuzzy timeseries based on adaptive expectation model for TAIEX forecasting, Expert Systems with Applications 34 (2008) 1126–1132. doi:10.1016/j.eswa.2006.12.021.
 (27) H. J. Sadaei, R. Enayatifar, A. H. Abdullah, A. Gani, Shortterm load forecasting using a hybrid model with a refined exponentially weighted fuzzy time series and an improved harmony search, International Journal of Electrical Power & Energy Systems 62 (2014) 118–129.
 (28) P. C. L. Silva, H. J. Sadaei, R. Ballini, F. G. Guimaraes, Probabilistic Forecasting With Fuzzy Time Series, IEEE Transactions on Fuzzy Systems (2019) 1–1doi:10.1109/TFUZZ.2019.2922152.

(29)
P. C. d. L. Silva, P. O. Lucas, H. J. Sadaei, F. G. Guimarães,
pyFTS: Fuzzy Time Series for
Python (2018).
doi:10.5281/zenodo.597359.
URL https://doi.org/10.5281/zenodo.597359  (30) P. C. d. L. Silva, M. A. Alves, C. A. Severiano Jr, G. L. Vieira, F. G. Guimarães, H. J. Sadaei, Probabilistic Forecasting with Seasonal Ensemble Fuzzy TimeSeries, in: XIII Brazilian Congress on Computational Intelligence, Rio de Janeiro, 2017. doi:10.21528/CBIC201754.
 (31) P. C. L. Silva, P. O. Lucas, F. G. Guimarães, A Distributed Algorithm for Scalable Fuzzy Time Series, in: R. Miani, L. Camargos, B. Zarpelão, P. R. Rosas, E. (Eds.), Proceedings of 14th International Conference on Green, Pervasive, and Cloud Computing (GPC 2019), Springer Nature Switzerland, Uberlândia, 2019, pp. 1–15. doi:10.1007/9783030192235{_}4.

(32)
P. C. d. L. e. Silva, Scalable
Models For Probabilistic Forecasting With Fuzzy Time Series, Ph.D. thesis,
Universidade Federal de Minas Gerais (9 2019).
doi:10.5281/zenodo.3374641.
URL https://doi.org/10.5281/zenodo.3374641  (33) P. C. d. L. e Silva, C. A. Severiano Jr, M. A. Alves, M. W. Cohen, F. G. Guimarães, A new granular approach for multivariate forecasting, in: Latin American Workshop on Computational Neuroscience, Springer, 2019, pp. 41–58.
 (34) P. C. L. Silva, P. d. O. e Lucas, H. J. Sadaei, F. G. Guimarães, Distributed evolutionary hyperparameter optimization for fuzzy time series, IEEE Transactions on Network and Service Management (2020) 1–1.
 (35) R. Efendi, Z. Ismail, M. M. Deris, Improved weight fuzzy time series as used in the exchange rates forecasting of us dollar to ringgit malaysia, International Journal of Computational Intelligence and Applications 12 (01) (2013) 1350005.
 (36) C.H. Cheng, T.L. Chen, C.H. Chiang, Trendweighted fuzzy timeseries model for taiex forecasting, in: International Conference on Neural Information Processing, Springer, 2006, pp. 469–477.
 (37) J.R. Hwang, S.M. Chen, C.H. Lee, Handling forecasting problems using fuzzy time series, Fuzzy sets and systems 100 (13) (1998) 217–228.
 (38) J. M. Garibaldi, T. Ozen, Uncertain fuzzy reasoning: A case study in modelling expert decision making, IEEE Transactions on Fuzzy Systems 15 (1) (2007) 16–30.
 (39) Q. Song, B. S. Chissom, Forecasting enrollments with fuzzy time series—part ii, Fuzzy sets and systems 62 (1) (1994) 1–8.
 (40) H.T. Liu, M.L. Wei, An improved fuzzy forecasting method for seasonal time series, Expert Systems with Applications 37 (9) (2010) 6310–6318.
 (41) S. R. Singh, A simple time variant method for fuzzy time series forecasting, Cybernetics and Systems: An International Journal 38 (3) (2007) 305–321.
 (42) T. A. Jilani, S. M. A. Burney, A refined fuzzy time series model for stock market forecasting, Physica A: Statistical Mechanics and its Applications 387 (12) (2008) 2857–2862.
 (43) T. Vovan, An improved fuzzy time series forecasting model using variations of data, Fuzzy Optimization and Decision Making 18 (2) (2018) 151–173.
 (44) J. R. Bertini Junior, M. d. C. Nicoletti, An iterative boostingbased ensemble for streaming data classification, Information Fusion 45 (2019) 66–78.
 (45) D. Asteriou, S. G. Hall, Arima models and the box–jenkins methodology, Applied Econometrics 2 (2) (2011) 265–286.
 (46) Y. Zhang, J. Wang, Knearest neighbors and a kernel density estimator for gefcom2014 probabilistic wind power forecasting, International Journal of forecasting 32 (3) (2016) 1074–1080.
Appendix A Appendix
A: Stock market indices
Appendix B Appendix
B: Synthetic data sets
Appendix C Appendix
In order to evaluate the stationarity of the presented data sets, the Augmented DickeyFuller (ADF) test for unit root was employed considering a confidence level , where indicates that the time series have a unit root and it is nonstationary and indicates that time series does not have a unit root and it is stationary. The Augmented DickeyFuller results for unit root are shown below.
Dataset  Augmented DickeyFuller  

Statistic  pvalue  result  
Stationary Signal  7.427114  6.504708e11  Rejected 
Stationary signal with blip  7.497758  4.334045e11  Rejected 
Sudden variance  7.746345  1.029561e11  Rejected 
Sudden mean  2.112067  2.396902e01  Accepted 
Sudden mean and variance  1.165176  6.883938e01  Accepted 
Incremental mean  3.286850  1.000000e+00  Accepted 
Incremental variance  24.746787  0.000000e+00  Rejected 
Incremental mean and variance  2.217183  2.000681e01  Accepted 
TAIEX  2.491767  1.174904e01  Accepted 
SP500  0.943446  7.733287e01  Accepted 
NASDAQ  0.476224  9.841323e01  Accepted 
DowJones  0.800893  8.188597e01  Accepted 
BTCUSD  1.206100  6.709662e01  Accepted 
ETHUSD  1.852677  3.546403e01  Accepted 
EURUSD  1.845986  3.578816e01  Accepted 
EURGBP  1.845986  3.578816e01  Accepted 
GBPUSD  1.131750  7.022457e01  Accepted 
Appendix D Appendix
To check the homoskedasticity the Levene’s test was employed, which checks for homogeneity of variances, with confidence level , where indicates that the subsamples variances of the time series are all equal and indicates that at least one variance of the time series subsamples is different. the Levene’s results for homogeneity of variances are shown below.
Dataset  Levene’s test  

Statistic  pvalue  result  
Stationary Signal  10.148665  1.466392e03  Rejected 
Stationary signal with blip  1.935753  1.642857e01  Accepted 
Sudden variance  25.163718  5.733279e07  Rejected 
Sudden mean  6.989502  8.263198e03  Rejected 
Sudden mean and variance  173.802832  4.097776e38  Rejected 
Incremental mean  2954.661000  0.000000e+00  Rejected 
Incremental variance  520.174358  1.597361e102  Rejected 
Incremental mean and variance  521.736508  9.142922e103  Rejected 
TAIEX  62.530885  3.366934e15  Rejected 
SP500  64.954050  1.003282e15  Rejected 
NASDAQ  1851.123985  0.000000e+00  Rejected 
DowJones  163.413938  1.053157e36  Rejected 
BTCUSD  524.853179  4.352254e107  Rejected 
ETHUSD  666.975156  9.700699e116  Rejected 
EURUSD  401.104689  6.851098e86  Rejected 
EURGBP  401.104689  6.851098e86  Rejected 
GBPUSD  1340.896808  2.812862e260  Rejected 
Comments
There are no comments yet.