Real-Time Anomaly Detection in Data Centers for Log-based Predictive Maintenance using an Evolving Fuzzy-Rule-Based Approach

Detection of anomalous behaviors in data centers is crucial to predictive maintenance and data safety. By data center, we mean any computer network that allows users to transmit and exchange data and information. In particular, we focus on the Tier-1 data center of the Italian Institute for Nuclear Physics (INFN), which supports the high-energy physics experiments at the Large Hadron Collider (LHC) in Geneva. The center provides the resources and services needed for data processing, storage, analysis, and distribution. The generation of log records in the data center is a stochastic and non-stationary phenomenon. We propose a real-time approach to monitor and classify log records based on sliding time windows and a time-varying, evolving, fuzzy-rule-based classification model. The most frequent log pattern according to a control chart is taken as the normal system status. We extract attributes from time windows to gradually develop and update an evolving Gaussian Fuzzy Classifier (eGFC) on the fly. The real-time anomaly monitoring system provides encouraging results in terms of accuracy, compactness, and real-time operation.


I Introduction

A computing center (CC) is responsible for supporting a flexible, on-demand, dynamic, and scalable cloud computing infrastructure, in which resources are available directly or by means of services [8]. The complex CC infrastructure requires maintenance tools to keep it operative, efficient, and reliable.

The maintenance of a CC varies in operational complexity and idle time. It is usually classified as: (i) reactive; (ii) preventive; (iii) predictive; and (iv) advanced. Reactive maintenance refers to the set of procedures deployed after a fault occurs, which aims at restoring the pristine behavior. Preventive maintenance is the collection of procedures performed to lessen the likelihood of a system failure. Predictive maintenance is designed to determine the status of running services and predict events of interest. Advanced maintenance combines the other three paradigms in order to forecast and diagnose failures [29].

Usually, CC maintenance is based on offline statistical analysis of log records; in the preventive case, it is performed at fixed time intervals. Recently, online computational-intelligence-based systems, namely evolving fuzzy modelling frameworks [24] [3] [2] [9] [12] [23], supported by fast incremental machine-learning algorithms, have been employed for on-demand anomaly detection, forecasting, autonomous data classification, and predictive maintenance in a plethora of applications [30] [17] [1] [21] [5] [26].

Log records concern service-oriented unstructured data. Log data samples need to be processed ad hoc by learning and modelling algorithms. Devising general-purpose solutions based on the content of log files has been a challenge over the years. In a log-based system, the data may be highly verbose, so that it is hard to extract useful information from the raw data. The amount of data is huge, and a high percentage tends to be redundant. Any CC service run by a user generates log data across multiple files. After processing, a reasonable amount of data for analysis is obtained.

Since all CC activities are recorded in log files, algorithms can track event occurrences through the data extracted from log files in order to monitor and predict the system status. In the predictive case, identification of anomalous behavior as an intermediate step, using global attributes of log records, is possible [4]. To reduce log-content processing, a common characteristic of log files can be used: the timestamp of each written line. A reasonable assumption is that the system activity is proportional to the per-minute rate of lines written in a log file, so that overall system faults have a direct impact on this rate.
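As an illustration, the per-minute writing rate can be derived from timestamps alone, with no message parsing. The following minimal Python sketch assumes, for illustration only, log lines that begin with a 'YYYY-MM-DD HH:MM:SS' timestamp; the actual StoRM layout may differ.

from collections import Counter
from datetime import datetime

def per_minute_rate(log_lines):
    # Count log lines written per minute using only the timestamp prefix.
    # Assumes lines start with 'YYYY-MM-DD HH:MM:SS' (illustrative format).
    counts = Counter()
    for line in log_lines:
        t = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        counts[t.replace(second=0)] += 1  # bucket by minute
    return counts  # minute -> number of lines written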

The background scenario of this study is the Tier-1 Bologna, the main Italian tier of the WLCG (Worldwide LHC Computing Grid), hosted by the computing center of the Italian Institute of Nuclear Physics (INFN-CNAF). The WLCG involves about 170 computing centers in 42 countries, and is the grid system that supports the physics experiments performed at the world's biggest particle accelerator, at CERN. It is organised in 4 layers, the tiers, numbered from 0 (at CERN) to 3 in decreasing order of importance. The Tier-1 Bologna has approximately 40,000 CPU cores, 40 PB of disk storage, and 90 PB of tape storage. It is connected to the Italian (GARR) and European (GÉANT) research networks, with a data-transmission bandwidth of over 200 Gbps. Currently, the Tier-1 collects log data from 1,197 machines.

The INFN-CNAF provides the computing farm that accounts for all computing services of the Tier-1 Bologna. It acts as a service underlying the workload management system, allowing scheduled jobs to directly access the INFN-CNAF experiment data. On average, about 100 thousand batch jobs are executed per day at INFN-CNAF. The resources are continuously available, 24 hours a day, 7 days a week. The CC facility is based on a warehouse infrastructure for both storage and data transfer through a distributed system [18].

As the Tier-1 Bologna CC is a dedicated infrastructure supporting physics experiments [6], minimising the resources needed to keep the system operational is essential, since log-data handling is a highly time- and resource-consuming task. One approach to such computational cost minimisation is to identify which pieces of log data should be processed first, so as to maximise the likelihood of finding information useful for system maintenance. The present study addresses anomaly detection of system behavior as an optimisation approach to predictive maintenance.

The rest of this paper is structured as follows. Section II presents related literature on anomaly detection and system maintenance at computing centers. Section III describes an evolving fuzzy-rule-based classification framework that is able to learn from summaries of log records and keep an updated representation of the spatial-temporal patterns related to the generation of log files. Section IV shows the methodology to perform the computational experiments. Classification results are given in Section V. Conclusions are outlined in Section VI.

II Related Literature

Because of the High-Luminosity LHC (HL-LHC) project, the major programmed upgrade of the LHC at CERN, the luminosity will increase by a factor of 10 with respect to the original design. Luminosity is the rate of potential collisions per surface unit, which is proportional to the amount of generated experimental data [11]. Hence, the volume of experimental and Monte Carlo analysis data will grow by at least the same factor, intensifying the maintenance effort needed to keep the computing center's quality of service (QoS).

For that reason, many efforts are underway at the Tier-1 Bologna to create predictive maintenance tools that use log data. A first work, based on the Elastic Stack Suite, catalogues log records and anomalies using an embedded unsupervised ML tool [7]. Another initiative uses supervised ML approaches to predict anomalies of system behavior in an ad-hoc solution [10]. A further work, also focused on a content-processing strategy, provides a clustering method that characterizes log records using the Levenshtein distance [28]. In particular, a prototype was created to identify normal and anomalous system behavior, in a binary classification, considering the log data generation rate and a one-class Support Vector Machine approach [18].

II-A The StoRM Logs Use Case

StoRM is the Storage Resource Manager (SRM) service for generic disk-based storage systems adopted by the Tier-1 Bologna, providing high performance on parallel file systems.

StoRM has a modular architecture composed of two stateless components, the Front-end (FE) and the Back-end (BE), connected to database systems. The FE module manages user authentication and stores/retrieves requests to/from the database, interacting with the BE module [27].

The BE module, in turn, is the core of the StoRM service: it executes all synchronous and asynchronous SRM functionalities and manages the Grid interactions. A simple StoRM architecture schema is presented in Fig. 1, showing the main module interactions. Typically, BE log-file entries include the operator that requested the action (DN), the locations of the files involved (SURL), and the result of the operation. A sample of its log messages is shown in Fig. 2.

Fig. 1: A typical StoRM service architecture, with single back and front-end modules
Fig. 2: Example of content of a storm-backend.log file

In addition, the StoRM service at the Tier-1 Bologna is used by several high-energy physics experiments, each of which implements its own logging structures and rules. In this work, the BE module log files from the ATLAS experiment are arbitrarily chosen as input.

III eGFC: Evolving Gaussian Fuzzy Classifier

This section outlines eGFC, a semi-supervised evolving classifier derived from an online granular-computing framework [14] [15]. Although eGFC handles partially labeled data, we assume a fully-labeled log-file dataset in this paper. eGFC employs Gaussian membership functions to cover the data space with fuzzy granules and to associate new data samples with class labels. Granules are scattered in the data space wherever needed to represent local information. The eGFC global response comes from the aggregation of local models. A recursive algorithm constructs the rule base and updates the local models to deal with changes. eGFC addresses issues such as unlimited amounts of data and scalability [24] [13].

III-A Preliminaries

Local models are created if the newest data are sufficiently different from the current knowledge. The learning algorithm expands, reduces, deletes, and merges information granules. Rules are reviewed according to inter-granular relations. eGFC provides nonlinear, non-stationary, fuzzy discrimination boundaries between classes [24] [14]. This paper particularly addresses a 4-class log-file classification problem.

Formally, let an input-output pair, $(\mathbf{x}, y)$, be related through a function $f$. We seek an approximation $\hat{f}$ to $f$ to estimate the value of $y$ given $\mathbf{x}$. In classification, $y$ is a class label, i.e., a value in a discrete set $\{C_1, \ldots, C_m\}$, and $f$ specifies class boundaries. In the more general, semi-supervised case, $y$ may or may not be known when $\mathbf{x}$ arrives. Classification of never-ending data streams involves pairs of time-sequenced data, $(\mathbf{x}, y)^{[h]}$, indexed by $h = 1, 2, \ldots$. Non-stationarity requires evolving classifiers to identify time-varying relations $f^{[h]}$.

III-B Gaussian Functions and Rule Structure

Learning in eGFC does not require initial rules. Rules are created and dynamically updated depending on the behavior of a system over time. When a data sample is available, a decision procedure may add a rule to the model structure or update the parameters of a rule.

In eGFC models, a rule $R^i$ has the form

IF $x_1$ is $A_1^i$ AND $\ldots$ AND $x_n$ is $A_n^i$ THEN $y$ is $C^i$

in which $x_j$, $j = 1, \ldots, n$, are attributes, and $C^i$ is a class. The data stream is denoted $(\mathbf{x}, y)^{[h]}$, $h = 1, 2, \ldots$. Moreover, $A_j^i$, $i = 1, \ldots, c$, $j = 1, \ldots, n$, are Gaussian membership functions built from the available data, and $C^i$ is the class label of the $i$-th rule. Rules $R^i$, $i = 1, \ldots, c$, form the rule base. The number of rules, $c$, is variable, a notable characteristic of the approach, since guesses on how many data partitions exist are needless [24].

A normal Gaussian membership function, $A_j^i = G(\mu_j^i, \sigma_j^i)$, has height 1 [19]. It is characterized by the modal value $\mu_j^i$ and the dispersion $\sigma_j^i$. Characteristics that make Gaussians appropriate include: (i) ease of learning and changing, i.e., modal values and dispersions can be captured and updated straightforwardly from a data stream; (ii) infinite support, i.e., since the data are a priori unknown, the support of Gaussians extends over the whole domain; and (iii) smoothly-surfaced fuzzy granules, $\gamma^i$, in the $n$-dimensional Cartesian space, obtained by the cylindrical extension of uni-dimensional Gaussians and the use of the minimum T-norm aggregation [13] [19].
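For concreteness, a minimal Python sketch of a normal Gaussian membership function and of the minimum T-norm aggregation that yields a rule's activation degree; the dictionary-based rule representation is our own convention, not part of the original formulation.

import math

def gaussian(x, mu, sigma):
    # Normal Gaussian membership value (height 1), modal value mu, dispersion sigma.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def rule_activation(x, rule):
    # Minimum (Goedel) T-norm over the n uni-dimensional Gaussian memberships,
    # i.e., the cylindrical extension described above.
    return min(gaussian(xj, mu, sg)
               for xj, mu, sg in zip(x, rule["mu"], rule["sigma"]))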

III-C Adding Rules to the Evolving Fuzzy Classifier

Rules may not exist a priori. They are created and evolved as data become available. A new granule, and the rule that governs the granule, are created if none of the existing rules are sufficiently activated by a sample $\mathbf{x}^{[h]}$. The learning algorithm assumes that $\mathbf{x}^{[h]}$ brings new information in this case. Let $\rho^{[h]} \in [0, 1]$ be an adaptive threshold that determines whether a new rule is needed. If

$T\big(A_1^i(x_1^{[h]}), \ldots, A_n^i(x_n^{[h]})\big) \leq \rho^{[h]}, \quad \forall i, \; i = 1, \ldots, c$   (1)

in which $T$ is any triangular norm, then the eGFC structure is expanded. The minimum (Gödel) T-norm is used in this paper, but other choices are possible. If $\rho^{[h]}$ is equal to 0, the model is structurally stable and unable to capture concept shifts. In contrast, if $\rho^{[h]}$ is equal to 1, eGFC creates a rule for each new sample, which is not practical. Structural and parametric adaptability are balanced for intermediate values of $\rho^{[h]}$ (stability-plasticity trade-off) [16].

The value of $\rho^{[h]}$ is crucial to regulate how large granules can be. Different choices impact the accuracy and compactness of the model, resulting in different granular perspectives of the same problem. Section III-E gives a Gaussian-dispersion-based procedure to update $\rho^{[h]}$.

A new granule, $\gamma^{c+1}$, is initially represented by membership functions $A_j^{c+1}$, $j = 1, \ldots, n$, with

$\mu_j^{c+1} = x_j^{[h]}$   (2)

and

$\sigma_j^{c+1} = 1/(2\pi)$   (3)

We call (3) the Stigler approach to standard Gaussian functions, or maximum approach [13]. The intuition is to start big, and let the dispersions gradually shrink as new samples activate the same granule. This strategy favors a compact model structure.
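Continuing the sketch above, the structural-expansion step of Eqs. (1)-(3) can be written as follows; the rule dictionary and the value of the Stigler dispersion follow this section's reconstruction.

import math

STIGLER_SIGMA = 1.0 / (2.0 * math.pi)  # initial 'maximum' dispersion, Eq. (3)

def maybe_create_rule(x, rules, rho, label=None):
    # Expand the structure if no rule is activated above rho (Eq. (1));
    # an empty rule base trivially satisfies the condition.
    if all(rule_activation(x, r) <= rho for r in rules):
        rules.append({
            "mu": list(x),                      # modal values at the sample, Eq. (2)
            "sigma": [STIGLER_SIGMA] * len(x),  # start big, shrink later, Eq. (3)
            "label": label,                     # may be None until a label arrives, Eq. (4)
            "updates": 1,                       # omega: times the rule was updated
        })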

In general, the class of a new rule is initially undefined, i.e., the $(c+1)$-th rule remains unlabeled until a label is provided. If the corresponding output, $y^{[h]}$, associated with $\mathbf{x}^{[h]}$, becomes available, then

$C^{c+1} = y^{[h]}$   (4)

Otherwise, the first labeled sample of the data stream that arrives after the $h$-th time step and activates the rule according to (1) is used to define its class, $C^{c+1}$.

If a labeled sample activates a rule that is already labeled, but the sample's label differs from that of the rule, then a new (partially overlapped) granule and rule are created to represent the new information. Partially overlapped Gaussian granules tagged with different labels tend to have their dispersions reduced over time by the parameter-adaptation procedure described in Section III-D. The modal values of the Gaussian granules may also drift, if convenient, toward a more suitable decision boundary.

With this initial rule parameterization, preference is given to granules balanced along their dimensions, rather than granules with unbalanced geometry. eGFC realizes the principle of balanced information granularity [20], but allows the Gaussians to find more appropriate locations and dispersions through the adaptation mechanisms.

III-D Incremental Parameter Adaptation

Updating the eGFC model consists of: (i) contracting or expanding the Gaussians $A_j^{i^*}$, $j = 1, \ldots, n$, of the most active granule, $\gamma^{i^*}$, considering labeled and unlabeled samples; (ii) moving granules toward regions of relatively dense population; and (iii) tagging rules when labeled data are available. Adaptation aims to develop more specific local models, in the sense of Yager [31], and to provide covering (pavement) of the newest data.

A rule $R^i$ is a candidate to be updated if it is sufficiently activated by an unlabeled sample $\mathbf{x}^{[h]}$, according to

$T\big(A_1^i(x_1^{[h]}), \ldots, A_n^i(x_n^{[h]})\big) > \rho^{[h]}$   (5)

Geometrically, $\mathbf{x}^{[h]}$ belongs to a region highly influenced by the granule $\gamma^i$. Only the most active rule, $R^{i^*}$, is chosen for adaptation in case two or more rules reach the $\rho^{[h]}$ level for the unlabeled $\mathbf{x}^{[h]}$. For a labeled sample, i.e., for a pair $(\mathbf{x}, y)^{[h]}$, the class of the most active rule $R^{i^*}$, if defined, must match $y^{[h]}$. Otherwise, the second most active rule among those that reached the $\rho^{[h]}$ level is chosen for adaptation, and so on. If none of the rules are apt, then a new one is created (Section III-C).

To include $\mathbf{x}^{[h]}$ in $\gamma^{i^*}$, eGFC's learning algorithm updates the modal values and dispersions of the corresponding membership functions $A_j^{i^*}$, $j = 1, \ldots, n$, from

$\mu_j^{i^*}(\text{new}) = \dfrac{(\omega^{i^*} - 1)\,\mu_j^{i^*}(\text{old}) + x_j^{[h]}}{\omega^{i^*}}$   (6)

and

$\sigma_j^{i^*}(\text{new}) = \left( \dfrac{\omega^{i^*} - 1}{\omega^{i^*}} \big(\sigma_j^{i^*}(\text{old})\big)^2 + \dfrac{1}{\omega^{i^*}} \big(x_j^{[h]} - \mu_j^{i^*}(\text{new})\big)^2 \right)^{1/2}$   (7)

in which $\omega^{i^*}$ is the number of times the rule $R^{i^*}$ was chosen to be updated. Notice that (6)-(7) are recursive and, therefore, do not require data storage. As $\sigma_j^{i^*}$ defines a convex region of influence around $\mu_j^{i^*}$, very large and very small values may induce, respectively, a unique or too many information granules per class. An approach is to keep $\sigma_j^{i^*}$ between a lower limit and the Stigler limit, $1/(2\pi)$.
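A minimal sketch of the recursive update of Eqs. (6)-(7), continuing the rule representation used in the previous sketches; the value of the lower dispersion limit is an assumption, since only its existence is stated above.

import math

SIGMA_MIN = 1e-3  # assumed lower dispersion limit (the text fixes only its existence)

def update_rule(rule, x):
    # Recursive modal-value and dispersion update (Eqs. (6)-(7)); no data storage.
    rule["updates"] += 1
    w = rule["updates"]
    for j, xj in enumerate(x):
        mu_new = ((w - 1) * rule["mu"][j] + xj) / w                 # Eq. (6)
        var_new = ((w - 1) / w) * rule["sigma"][j] ** 2 \
                  + (xj - mu_new) ** 2 / w                          # Eq. (7)
        rule["mu"][j] = mu_new
        # Keep sigma between the lower limit and the Stigler limit.
        rule["sigma"][j] = min(max(var_new ** 0.5, SIGMA_MIN),
                               1.0 / (2.0 * math.pi))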

III-E Dispersion-Based Time-Varying $\rho$-Level

Let the activation threshold, $\rho^{[h]}$, be time-varying, similar to [14] [9]. The threshold assumes values in the unit interval according to the overall average dispersion,

$\sigma_{avg}^{[h]} = \dfrac{1}{cn} \sum_{i=1}^{c} \sum_{j=1}^{n} \sigma_j^{i[h]}$   (8)

in which $c$ and $n$ are the number of rules and attributes, so that

$\rho^{[h]} = \dfrac{\sigma_{avg}^{[h]}}{\sigma_{avg}^{[h-1]}} \, \rho^{[h-1]}$   (9)

As mentioned, the rules' activation levels for an input $\mathbf{x}^{[h]}$ are compared to $\rho^{[h]}$ to decide between parametric or structural changes of the eGFC model. In general, eGFC starts learning from an empty rule base and without knowledge about the properties of the data. Practice suggests a mid-range starting value for $\rho^{[0]}$. The threshold tends to converge to an appropriate value after some time steps, once the classifier structure and parameters achieve a level of maturity and stability. Non-stationarities and new classes guide $\rho^{[h]}$ to values that better reflect the needs of the current environment. A time-varying $\rho^{[h]}$ avoids assumptions about how often the data stream changes.
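A sketch of the dispersion-based threshold update of Eqs. (8)-(9); clamping rho to the unit interval reflects the statement that the threshold assumes values in [0, 1].

def update_rho(rules, rho_old, sigma_avg_old):
    # Time-varying rho-level: scale rho by the ratio of the current to the
    # previous overall average dispersion (Eqs. (8)-(9)).
    n = len(rules[0]["sigma"])
    sigma_avg = sum(s for r in rules for s in r["sigma"]) / (len(rules) * n)  # Eq. (8)
    rho_new = min(1.0, rho_old * sigma_avg / sigma_avg_old)                   # Eq. (9)
    return rho_new, sigma_avg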

III-F Merging Similar Granules

The similarity between two granules with the same class label may be high enough to justify forming a single granule that inherits the essential information conveyed by the merged pair. Analysis of inter-granular relations requires a distance measure between Gaussian objects. Let

$d(\gamma^{i_1}, \gamma^{i_2}) = \dfrac{1}{n} \sum_{j=1}^{n} \left( \big| \mu_j^{i_1} - \mu_j^{i_2} \big| + \Big( \sqrt{\sigma_j^{i_1}} - \sqrt{\sigma_j^{i_2}} \Big)^2 \right)$   (10)

be the distance between the granules $\gamma^{i_1}$ and $\gamma^{i_2}$. This measure considers both the Gaussians and the specificity of the information, which is, in turn, inversely related to the Gaussians' dispersions [25]. For example, if the dispersions $\sigma_j^{i_1}$ and $\sigma_j^{i_2}$ differ from one another, rather than being equal, the distance between the underlying Gaussians is larger.

eGFC may merge the pair of granules that presents the smallest value of $d(\cdot)$ among all pairs of granules. Both granules must be either unlabeled or tagged with the same class label. The merging decision is based on a threshold value, $\Delta$, or on expert judgement regarding the suitability of combining the granules into a more compact model. For data within the unit hyper-cube, we suggest a small default $\Delta$, meaning that the candidate granules should be quite similar and, in fact, carry essentially the same information.

A new granule, say $\gamma^{c+1}$, which results from merging $\gamma^{i_1}$ and $\gamma^{i_2}$, is built from Gaussians with modal values

$\mu_j^{c+1} = \dfrac{\sigma_j^{i_1} \mu_j^{i_1} + \sigma_j^{i_2} \mu_j^{i_2}}{\sigma_j^{i_1} + \sigma_j^{i_2}}, \quad j = 1, \ldots, n$   (11)

and dispersions

$\sigma_j^{c+1} = \sigma_j^{i_1} + \sigma_j^{i_2}$   (12)

These relations take into consideration the uncertainty ratio of the original granules to determine an appropriate location and size for the resulting granule. Merging granules reduces the number of rules and the redundancy of the model [14] [25].
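A sketch of the merging step, following relations (10)-(12) as reconstructed in this subsection; both granules are assumed to carry the same class label, as required above.

def granule_distance(r1, r2):
    # Specificity-aware distance (Eq. (10)): Gaussians whose dispersions
    # differ are farther apart than equally-dispersed ones.
    n = len(r1["mu"])
    return sum(abs(m1 - m2) + (s1 ** 0.5 - s2 ** 0.5) ** 2
               for m1, m2, s1, s2 in zip(r1["mu"], r2["mu"],
                                         r1["sigma"], r2["sigma"])) / n

def merge_granules(r1, r2):
    # Dispersion-weighted modal values (Eq. (11)) and summed dispersions (Eq. (12)).
    mu = [(s1 * m1 + s2 * m2) / (s1 + s2)
          for m1, m2, s1, s2 in zip(r1["mu"], r2["mu"], r1["sigma"], r2["sigma"])]
    sigma = [s1 + s2 for s1, s2 in zip(r1["sigma"], r2["sigma"])]
    return {"mu": mu, "sigma": sigma, "label": r1["label"],
            "updates": r1["updates"] + r2["updates"]}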

III-G Deleting Rules

A rule is removed from the eGFC model if it becomes inconsistent with the current environment. In other words, if a rule is not activated for a number of iterations, say $h_r$, then it is deleted from the rule base. However, if a class is rare, it may be appropriate to set $h_r$ to infinity and keep the inactive rules. Removing rules periodically helps keep the knowledge base up to date in some applications.

III-H Semi-Supervised Learning from Data Streams

The semi-supervised learning procedure used to construct and update eGFC models along their lifespan is summarized in the listing 'eGFC: Online Semi-Supervised Learning'.

IV Methodology

We describe the dynamic control-chart approach we propose for attribute extraction and log-data tagging, and give details about the dataset and the evaluation measures.

IV-A Control Chart: Tagging Log Data

A control chart is a time-series graph used to monitor the evolution of a process, phenomenon, or variable, based on the Central Limit Theorem [22]. The main idea is that the mean, $\bar{x}$, of an independent random variable, $x$, with unknown distribution, follows a normal distribution.

Let

$x_1, x_2, \ldots, x_\kappa$   (13)

be a sequence of values representing the number of log entries in a log file over a time window $\Delta$, with $\Delta = t_f - t_i$; the time interval from $t_i$ to $t_f$ coincides with the window boundaries. Additionally, let $\bar{x}$ be the mean of the sequence, thus

$\bar{x} = \dfrac{1}{\kappa} \sum_{k=1}^{\kappa} x_k$   (14)

A time series of means, with cardinality $\eta$, is

$\bar{X} = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_\eta\}$   (15)

As $\bar{x}$ follows a normal distribution, a sample $\bar{x}_l$ can be tagged by means of a control chart, see Fig. 3.

  eGFC: Online Semi-Supervised Learning  

1:  Set the initial number of rules, $c = 0$;
2:  Set the initial meta-parameters, $\rho^{[0]}$ and $h_r$;
3:  Read the input data sample $\mathbf{x}^{[1]}$;
4:  Create a granule $\gamma^{c+1}$ (Eqs. (2)-(3)) of unknown class $C^{c+1}$;
5:  FOR $h = 2, \ldots$ DO
6:     Read $\mathbf{x}^{[h]}$;
7:     Calculate the rules' activation degrees (Eq. (1));
8:     Determine the most active rule $R^{i^*}$;
9:     Provide the estimated class $\hat{C}^{i^*}$;
10:     // Model adaptation
11:     IF no rule is activated above $\rho^{[h]}$ (Eq. (1))
12:       IF the actual label $y^{[h]}$ is available
13:         Create a labeled granule $\gamma^{c+1}$ (Eqs. (2)-(4));
14:       ELSE
15:         Create an unlabeled granule $\gamma^{c+1}$ (Eqs. (2)-(3));
16:       END
17:     ELSE
18:       IF the actual label $y^{[h]}$ is available
19:         Update the most active granule whose class $C^{i^*}$ is equal to $y^{[h]}$ (Eqs. (6)-(7));
20:         Tag unlabeled active granules;
21:       ELSE
22:         Update the most active granule $\gamma^{i^*}$ (Eqs. (6)-(7));
23:       END
24:     END
25:     Update the $\rho$-level (Eqs. (8)-(9));
26:     Delete inactive rules based on $h_r$;
27:     Merge granules based on $d(\cdot)$ (Eqs. (10)-(12));
28:  END

The mean of the time series of means is

$\bar{\bar{x}} = \dfrac{1}{\eta} \sum_{l=1}^{\eta} \bar{x}_l$   (16)

The $m$-th upper and lower horizontal lines in relation to $\bar{\bar{x}}$ refer to the $m$-th standard deviation,

$\hat{\sigma} = \left( \dfrac{1}{\eta} \sum_{l=1}^{\eta} \big( \bar{x}_l - \bar{\bar{x}} \big)^2 \right)^{1/2}$   (17)

such that if a sample $\bar{x}_l$ satisfies

$\bar{\bar{x}} - m\hat{\sigma} \leq \bar{x}_l \leq \bar{\bar{x}} + m\hat{\sigma}$   (18)

for $m = 1$, it is tagged as 'Class 1' (normal system condition). Otherwise, if (18) first holds for $m = 2$, $m = 3$, or $m = 4$, then $\bar{x}_l$ is tagged as 'Class 2', 'Class 3', or 'Class 4', respectively, which mean low, medium, and high-severity anomaly. The greater the value of $m$, the greater the severity of the anomalous behavior.

Fig. 3: Control chart used to tag mean log data within a time window

Control charts are widely used in quality monitoring to identify anomalies according to control lines calculated from a stream of means. The probabilities that a sample $\bar{x}_l$ falls within the four class boundaries are 68.3%, 27.1%, 4.3%, and 0.3%, respectively. Therefore, the online data classification problem is unbalanced.
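A minimal sketch of the tagging rule of Eqs. (16)-(18): each window mean receives a severity class according to the standard-deviation band it falls into.

import statistics

def tag_window_means(window_means):
    # Tag each window mean with a class (1 = normal, 2-4 = increasing severity)
    # using control-chart bands around the mean of the time series of means.
    grand_mean = statistics.fmean(window_means)   # Eq. (16)
    sd = statistics.pstdev(window_means)          # Eq. (17)
    tags = []
    for xbar in window_means:
        dev = abs(xbar - grand_mean) / sd if sd > 0 else 0.0
        tags.append(1 if dev <= 1 else 2 if dev <= 2 else 3 if dev <= 3 else 4)  # Eq. (18)
    return tags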

IV-B About the Dataset

A stream of time-indexed log records is generated by the StoRM service. Each log entry is composed of the timestamp at which it was written and the message itself. Analysis of the message type and its content is beyond the scope of this paper.

We extract relevant attributes from the original log data stream by analysing constant-length sliding time windows. The transformed data are provided as 5-attribute vectors,

$\mathbf{x}^{[h]} = \big[ x_1 \; x_2 \; x_3 \; x_4 \; x_5 \big]^{[h]}$   (19)

whose elements are summary statistics of the log-entry counts evaluated in the time window $\Delta^{[h]}$; the last attribute is the maximum difference of amplitude between two consecutive counts belonging to the window $\Delta^{[h]}$.

A vector $\mathbf{x}^{[h]}$ is associated with a class label $y^{[h]}$ that, in turn, indicates the system behavior. The true label becomes available after the estimation provided by the eGFC model. The pair $(\mathbf{x}, y)^{[h]}$ is then used by the eGFC online learning algorithm for an updating step.
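A sketch of the window-to-vector transformation of Eq. (19). Only the fifth attribute (the maximum difference between consecutive counts) is spelled out in the text; the other four summary statistics chosen here, mean, standard deviation, minimum, and maximum, are illustrative assumptions.

import statistics

def window_attributes(counts):
    # Build a 5-attribute vector (Eq. (19)) from the per-minute log-entry
    # counts of one time window.
    max_jump = max((abs(b - a) for a, b in zip(counts, counts[1:])), default=0.0)
    return [statistics.fmean(counts),   # assumed attribute
            statistics.pstdev(counts),  # assumed attribute
            min(counts),                # assumed attribute
            max(counts),                # assumed attribute
            max_jump]                   # stated attribute: max consecutive difference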

IV-C Performance Measures

Classification accuracy, $Acc \in [0, 1]$, is computed recursively from

$Acc(\text{new}) = \dfrac{h-1}{h} \, Acc(\text{old}) + \dfrac{1}{h} \, \tau$   (20)

in which $\tau = 1$ if $\hat{y}^{[h]} = y^{[h]}$ (right estimation); otherwise, $\tau = 0$ (wrong class estimation).

The average number of granules or rules over time, $c_{avg}$, is a measure of model concision. Recursively,

$c_{avg}(\text{new}) = \dfrac{h-1}{h} \, c_{avg}(\text{old}) + \dfrac{1}{h} \, c^{[h]}$   (21)
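Both measures admit one-line recursive implementations, e.g.:

def update_accuracy(acc_old, h, correct):
    # Eq. (20): tau = 1 for a right estimation, tau = 0 otherwise.
    return ((h - 1) / h) * acc_old + (1.0 if correct else 0.0) / h

def update_avg_rules(c_avg_old, h, c_now):
    # Eq. (21): running average of the number of rules.
    return ((h - 1) / h) * c_avg_old + c_now / h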

V Results

We evaluate the evolving Gaussian fuzzy classification system. No prior knowledge about the data is assumed. Classification models are developed from scratch based on information extracted from an online log data stream.

V-A eGFC Results

We seek a classifier that evolves based on the newest input data. The default meta-parameters are used (see the listing 'eGFC: Online Semi-Supervised Learning'). Table I summarizes the results, averaged over 5 runs on shuffled datasets extracted from the log records. Four datasets were produced from the same data using different time-window lengths, namely 60, 30, 15, and 5 minutes. Larger time windows impose a higher-order low-pass filtering effect, which tends to isolate the trend component of the time series from its cyclical and random (stochastic) components. Each dataset consists of 1,436 samples with 5 attributes. Four classes are possible, namely 'normal operation', 'low severity', 'medium severity', and 'high severity'.

Length (min)   Acc (%)   # Rules   Time (s)
60             –         –         –
30             –         –         –
15             –         –         –
5              –         –         –
TABLE I: eGFC Performance in Multi-class Classification of System Anomalies (with confidence intervals)

Table I shows that the analysis of the larger, 60-minute windows makes it easier for the eGFC learning algorithm to detect and classify the spatio-temporal patterns that represent the anomaly classes. Notice that the eGFC model reached its best average accuracy using a compact model structure, i.e., a small number of fuzzy rules on average along the learning steps. The CPU time, on a quad-core i7-8550U at 1.80 GHz with 8 GB of RAM, is similar in all scenarios.

Figure 4 gives a typical example of the evolution of the $\rho$-level, the accuracy, and the number of eGFC rules. Four dimensions of the final Gaussian granules, at the last time step, are also shown. Notice that data from Class 2 and Class 3 (low and medium-severity anomalies) spread nonlinearly over the data space. These classes require more than one granule and rule to be represented, whereas the remaining classes are generally confined to a common region. Class-4 data (high-severity anomaly) occupy a more compact region than the data of the other classes and are therefore represented by a single granule. A higher number of granules per class generally yields more nonlinear decision boundaries, which improves classification accuracy.

Fig. 4: Time evolution of the $\rho$-level, the number of rules, and the model accuracy (first three plots), and the final eGFC Gaussian granules per class (last two plots)

Figure 5 emphasizes the multi-dimensional ellipsoidal geometry of the eGFC granules. This contour-line representation confirms the spreading characteristic of the Class-2 data in particular, showing large overlapping regions between Class 1 and Class 2. Figure 6 shows the confusion matrix of a representative run. Notice that confusion happens in the neighbourhood of a target class, which suggests that, if a larger number of streaming samples were available, the eGFC model could further improve its accuracy by fine-tuning its decision boundaries. Class 1 (normal operation) and Class 2 (low severity) are responsible for the largest share of the reduction in overall accuracy.

Fig. 5: Multi-dimensional ellipsoidal geometry of the eGFC granules using the first four attributes of the log stream. The colours of the centers refer to the control chart of Fig. 3, i.e., green: normal system condition; yellow, orange, and red: low, medium, and high anomaly severity
Fig. 6: Example of a confusion matrix provided by an eGFC model

To sum up, using the evolving fuzzy classification methodology and the sliding-window, control-chart-based approach, a CC maintenance system can accurately identify the time windows that require further analysis of their text content. The evolving methodology supports data and information mining to assist predictive maintenance. The overall system status can be modelled as Gaussian granules of the log activity rate, and status changes can be noticed visually on the control charts. In addition, the stream of system statuses can be used to diagnose the context of the current log status and to predict the next one. Since eGFC preserves its accuracy in non-stationary environments, the approach has shown itself to be a reliable solution to the predictive maintenance problem.

VI Conclusion

We described a real-time, evolving, general-purpose solution, namely eGFC, to the log-based anomaly detection problem, considering time-varying data from the Tier-1 Bologna computing center. eGFC models achieved their best average accuracy using a 60-minute sliding time window. Since the anomaly detection issue is context-sensitive, the eGFC approach provides a strategy to update, in real-time, the information granules and the parameters and structure of a fuzzy rule-based classifier. Multi-dimensional Gaussian granules are placed and sized autonomously in the data space, aiming at constantly improving the classification performance.

Fuzzy information granulation gives flexible and smooth boundaries to the classification model, such that a wide variety of computing-center behaviors related to the same class label, even those occurring in conflicting regions with overlapping classes, can be captured. Thus, the eGFC approach, as a data-stream-oriented method, has shown to be highly applicable to a wide range of classification problems involving large log records from computing centers such as the Tier-1 Bologna, which supports the high-energy physics experiments at the Large Hadron Collider. Additionally, the autonomous sliding-window tagging strategy based on control charts was successfully applied to the anomaly detection problem in question. Hand-labelling large volumes of online data, a key research issue in the machine learning community, is usually infeasible. Therefore, the chart-based approach seems quite promising to drive accuracy improvements in evolving classification frameworks.

The present study provides a basis for extracting information from log content and identifying the best components for text processing, thus minimising computational resource consumption. In the future, we shall identify the types of messages associated with anomalous time windows, and investigate autonomous feature-extraction procedures.

References

  • [1] P. Angelov and X. Gu (2018-10) Deep rule-based classifier with human-level performance and characteristics. Inf. Sci. 463-464, pp. 196–213. External Links: Document Cited by: §I.
  • [2] G. Casalino, G. Castellano, and C. Mencar (2019) Data Stream Classification by Dynamic Incremental Semi-Supervised Fuzzy Clustering. Int J Artif Intell T 28 (08), pp. 26p. External Links: Document Cited by: §I.
  • [3] L. A. Cordovil Júnior, P. H. Coutinho, I. Bessa, M. D’Angelo, and R. Palhares (2019) Uncertain data modeling based on evolving ellipsoidal fuzzy information granules. IEEE Transactions on Fuzzy Systems. Note: DOI: 10.1109/TFUZZ.2019.2937052 Cited by: §I.
  • [4] L. D. Sousa et al. (2019) Big data analysis for predictive maintenance at the INFN-CNAF data center using machine learning approaches. In Conf of Open Innovations Association (FRUCT), Helsinki, pp. 448–451. External Links: Document Cited by: §I.
  • [5] L. Decker, D. Leite, F. Viola, and D. Bonacorsi Comparison of evolving granular classifiers applied to anomaly detection for predictive maintenance in computing centers. In IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), Bari, pp. 8p. Cited by: §I.
  • [6] A. Di Girolamo et al. (2020) Operational intelligence for distributed computing systems for exascale science. In Int Conf on Computing in High Energy and Nuclear Physics (CHEP), AU, pp. 8p. Cited by: §I.
  • [7] T. Diotalevi et al. (2019) Collection and harmonization of system logs and prototypal analytics services with the elastic (ELK) suite at the INFN-CNAF computing centre. In Int Symposium on Grids & Clouds (ISGC), Taiwan: Proceedings of Science, pp. 15p. Cited by: §II.
  • [8] B. Furht and A. Escalante (2010) Handbook of cloud computing. 1st edition, Springer Publishing Company, Incorporated. External Links: ISBN 1441965238 Cited by: §I.
  • [9] C. Garcia, D. Leite, and I. Škrjanc (2019) Incremental missing-data imputation for evolving fuzzy granular prediction. IEEE T FUZZY SYST, pp. 15p. Note: DOI: 10.1109/TFUZZ.2019.2935688 Cited by: §I, §III-E.
  • [10] L. Giommi et al. (2019) Towards predictive maintenance with machine learning at the INFN-CNAF computing centre. In International Symposium on Grids & Clouds (ISGC). Taipei, Taiwan: Proceedings of Science, pp. 17p. Cited by: §II.
  • [11] W. Herr and B. Muratori (2006) Concept of luminosity. In CAS - CERN Accelerator School: Course on Accelerator Physics, pp. 361–378. External Links: Document Cited by: §II.
  • [12] R. Hyde, P. Angelov, and A. Mackenzie (2016) Fully online clustering of evolving data streams into arbitrarily shaped clusters. Inf. Sci. 382, pp. 41p. External Links: Document Cited by: §I.
  • [13] D. Leite, G. Andonovski, I. Skrjanc, and F. Gomide (2019) Optimal rule-based granular systems from data streams. IEEE Transactions on Fuzzy Systems. Note: DOI: 10.1109/TFUZZ.2019.2911493 Cited by: §III-B, §III-C, §III.
  • [14] D. Leite, R. Ballini, P. Costa Jr, and F. Gomide (2012-06) Evolving fuzzy granular modeling from nonstationary fuzzy data streams. Evolving Systems 3, pp. 65–79. External Links: Document Cited by: §III-A, §III-E, §III-F, §III.
  • [15] D. Leite, L. Decker, M. Santana, and P. Souza (2020) EGFC: Evolving Gaussian fuzzy classifier from never-ending semi-supervised data streams - with application to power quality disturbance detection and classification. In IEEE World Congress on Computational Intelligence (WCCI – FUZZ-IEEE), Glasgow, pp. 8p. Cited by: §III.
  • [16] D. Leite and I. Škrjanc (2019-07) Ensemble of evolving optimal granular experts, owa aggregation, and time series prediction. Inf. Sci. 504, pp. 95–112. External Links: Document Cited by: §III-C.
  • [17] J. L. Lobo, I. Lana, J. D. Ser, M. N. Bilbao, and N. Kasabov (2018) Evolving spiking neural networks for online learning over drifting data streams. Neural Networks 108, pp. 1–19. Cited by: §I.
  • [18] F. Minarini (2019) Anomaly detection prototype for log-based predictive maintenance at INFN-CNAF. Master’s Thesis, U. of Bologna. Cited by: §I, §II.
  • [19] W. Pedrycz and F. Gomide (2000-01) An Introduction to Fuzzy Sets: Analysis and Design. NetLibrary. External Links: ISBN 9780262281348, Document Cited by: §III-B.
  • [20] W. Pedrycz (2000-01) Granular computing : an introduction. Vol. 45, pp. 309–328. External Links: Document Cited by: §III-C.
  • [21] M. Pratama, E. Lughofer, C. Lim, W. Rahayu, T. Dillon, and A. Budiyono (2016-07) PClass+: a novel evolving semi-supervised classifier. INT J FUZZY SYST 19, pp. 863–880. External Links: Document Cited by: §I.
  • [22] P. Qiu (2014) Introduction to statistical process control. Wiley: India. Cited by: §IV-A.
  • [23] H. J. Sadaei, P. C. de Lima e Silva, F. G. Guimarães, and M. H. Lee (2019) Short-term load forecasting by using a combined method of convolutional neural networks and fuzzy time series. Energy 175, pp. 365–377. Cited by: §I.
  • [24] I. Škrjanc, J. Iglesias, A. Sanchis, D. Leite, E. Lughofer, and F. Gomide (2019) Evolving fuzzy and neuro-fuzzy approaches in clustering, regression, identification, and classification: a survey. Inf. Sci. 490, pp. 344–368. External Links: Document Cited by: §I, §III-A, §III-B, §III.
  • [25] E. A. Soares, H. A. Camargo, S. J. Camargo, and D. F. Leite (2018) Incremental gaussian granular fuzzy modeling applied to hurricane track forecasting. 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 8p. Cited by: §III-F, §III-F.
  • [26] L. D. Sousa et al. (2012) Event detection framework for wireless sensor networks considering data anomaly. In IEEE Symposium on Computers and Communications (ISCC), pp. 500–507. Cited by: §I.
  • [27] The StoRM project. Note: https://italiangrid.github.io/storm/index.html Cited by: §II-A.
  • [28] S. R. Tisbeni (2019) Big data analytics towards predictive maintenance at the INFN-CNAF computing centre. Master’s Thesis, U. of Bologna. Cited by: §II.
  • [29] F. Trojan and R. Marçal (2017-07) Proposal of maintenance-types classification to clarify maintenance concepts in production and operations management. Journal of Business Economics 8, pp. 562–574. External Links: Document Cited by: §I.
  • [30] R. Venkatesan, M. Er, M. Dave, M. Pratama, and S. Wu (2016-07) A novel online multi-label classifier for high-speed streaming data applications. Evolving Systems, pp. 303–315. External Links: Document Cited by: §I.
  • [31] R. R. Yager (1998) Measures of specificity. In Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications, O. Kaynak, L. A. Zadeh, B. Türkşen, and I. J. Rudas (Eds.), Berlin, Heidelberg, pp. 94–113. External Links: ISBN 978-3-642-58930-0 Cited by: §III-D.