Knowledge extraction, modeling and formalization: EEG case study

by   Dmitry Morozov, et al.
Université Lorraine

Formal Concept Analysis (FCA) is a well-established method of data analysis with many applications in data mining. Its extension to complex data representation formats has brought a wave of new applications to problems such as gene expression mining, prediction of the toxicity of chemical compounds, and clustering of sequences in process event logs. Inspired by this work, our research adopts the same model and designs an experiment for mining electroencephalographic (EEG) recordings for patterns of sleep spindles. The contribution of this paper lies in the specification of the discretization procedure and the architecture of the FCA experiment. We also reflect on related research papers.




1 Introduction

Neuroscience, although primarily a biological science, offers a wide range of problems to different scientific communities and stimulates interdisciplinary research across various domains. For instance, mathematical graph theory serves as an instrument to analyze and build models of the connectivity of brain regions (Chu et al. (2012); Giusti et al. (2015)). Advances in machine learning methods make it possible to construct accurate predictions of epileptic seizures (Mirowski et al. (2009)) or to gain insights into the nature of sleep (Acharya et al. (2005)).

Formal concept analysis (Ganter and Wille (1997)), which was born from algebraic lattice theory, has by now taken a rather applied direction. It provides methods to structure and explore the domain of interest and to discover new knowledge.

This study looks at the application of Formal Concept Analysis (FCA) to the separation and differentiation of sleep spindles, a kind of wave occurring in an EEG during sleep. They are characterized by a short burst of high-frequency brain activity and primarily serve as an indicator of stage 2 sleep.

The rest of the paper is organized as follows. The second section introduces the methods of FCA and its extensions that form the basis of our approach. The third section is dedicated to the details and the design of the conducted experiment. In the fourth section we discuss related research, and the final section concludes the paper with a summary.

2 Methods

2.1 Formal Concept Analysis (FCA)

FCA (Ganter and Wille (1997)) is at its core a mathematical formalism which over time has been developed and extended by many theoretical and applied studies. Starting with a set of objects and a set of attributes, FCA finds generalizations of the descriptions of arbitrary subsets of objects.

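Let $G$ and $M$ be sets, called the set of objects and the set of attributes, respectively, and let $I \subseteq G \times M$ be a relation: for $g \in G$, $m \in M$, $gIm$ holds iff the object $g$ has the attribute $m$. The triple $K = (G, M, I)$ is called a formal context.

If $A \subseteq G$, $B \subseteq M$ are arbitrary subsets, then the Galois connection is given by the following derivation operators:

$A' = \{ m \in M \mid gIm \text{ for all } g \in A \}$

$B' = \{ g \in G \mid gIm \text{ for all } m \in B \}$

The pair $(A, B)$, where $A \subseteq G$, $B \subseteq M$, $A' = B$, and $B' = A$, is called a formal concept of the context $K$. $A$ is called the extent and $B$ the intent of the formal concept $(A, B)$. From the properties of the derivation operators it follows that the conditions $A' = B$, $B' = A$ can be written more simply as $A'' = A$, or equivalently $B'' = B$. This reformulation means that a formal concept is a pair of sets each of which is closed under the double derivation operator $(\cdot)''$.

The concepts, ordered by $(A_1, B_1) \geq (A_2, B_2) \Leftrightarrow A_1 \supseteq A_2$, form a complete lattice, called the concept lattice $\mathfrak{B}(G, M, I)$.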
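The derivation operators and the closure condition can be illustrated on a toy example. The following is a minimal Python sketch, not the implementation used in this study: the three-object context and the names `CONTEXT`, `intent_of`, `extent_of` and `all_concepts` are hypothetical, and concepts are enumerated by brute force over all object subsets, which is only feasible for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context: each object is mapped to the attributes it has.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "c"},
}
ATTRIBUTES = {"a", "b", "c"}


def intent_of(objects):
    """A': attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return frozenset(result)


def extent_of(attributes):
    """B': objects that have every attribute in `attributes`."""
    wanted = set(attributes)
    return frozenset(g for g, attrs in CONTEXT.items() if wanted <= attrs)


def all_concepts():
    """Enumerate formal concepts (A, B) by closing every subset of objects."""
    concepts = set()
    objects = sorted(CONTEXT)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent_of(subset)   # B = subset'
            a = extent_of(b)        # A = B' = subset''
            concepts.add((a, b))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(all_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Every pair produced this way satisfies $A' = B$ and $B' = A$ by construction, since the extent is obtained by applying the derivation operator twice to a set of objects.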

2.2 Pattern Structures

Pattern structures are a generalization of FCA that extends its theory beyond binary formal contexts to other description formats (e.g. sequences or graphs).

A pattern structure is a triple $(G, (D, \sqcap), \delta)$, where $G$ and $D$ are sets, called the set of objects and the set of descriptions, and $\delta : G \to D$ maps an object to a description. Respectively, $(D, \sqcap)$ is a meet-semilattice on $D$ w.r.t. $\sqcap$, called the similarity operation, such that $\delta(G) := \{ \delta(g) \mid g \in G \}$ generates a complete subsemilattice $(D_\delta, \sqcap)$ of $(D, \sqcap)$.

Note that FCA fits the definition of pattern structures. The set of objects $G$ is preserved, the semilattice of descriptions is $(\wp(M), \cap)$ ($\wp(M)$ denotes the powerset of the set of attributes $M$), a description is a subset of attributes, and $\cap$ is the set-theoretic intersection. $\delta : G \to \wp(M)$ is given by $\delta(g) = \{ m \in M \mid (g, m) \in I \}$.

The Galois connection for a pattern structure $(G, (D, \sqcap), \delta)$, relating sets of objects and descriptions, is defined as follows:

$A^\diamond := \bigsqcap_{g \in A} \delta(g)$, for $A \subseteq G$

$d^\diamond := \{ g \in G \mid d \sqsubseteq \delta(g) \}$, for $d \in D$

Given a subset of objects $A$, $A^\diamond$ returns the description which is common to all objects in $A$. Given a description $d$, $d^\diamond$ is the set of all objects whose description subsumes $d$. The natural partial order (or subsumption order between descriptions) $\sqsubseteq$ on $D$ is defined w.r.t. the similarity operation $\sqcap$: $c \sqsubseteq d :\Leftrightarrow c \sqcap d = c$ (in this case we say that $c$ is subsumed by $d$). In the case of standard FCA the natural partial order corresponds to the set-theoretic inclusion order, i.e., for two sets of attributes $x$ and $y$, $x \sqsubseteq y \Leftrightarrow x \subseteq y$.

A pattern concept of a pattern structure $(G, (D, \sqcap), \delta)$ is a pair $(A, d)$, where $A \subseteq G$ and $d \in D$ such that $A^\diamond = d$ and $d^\diamond = A$; $A$ is called the pattern extent and $d$ is called the pattern intent.

As in standard FCA, a pattern concept corresponds to the maximal set of objects $A$ whose description subsumes the description $d$, where $d$ is the maximal common description of the objects in $A$. The set of all pattern concepts is partially ordered w.r.t. inclusion of extents or, dually, w.r.t. subsumption of pattern intents. These two anti-isomorphic orders form a lattice, called the pattern lattice.

2.3 Example of a Pattern Structure

The simplest extension of a binary context is the introduction of numbers. An interval pattern structure provides a natural way to organize descriptions of subsets of objects into a lattice in such a case. Interval pattern structures were introduced by Kaytoue et al. (2011), where they were used for gene expression analysis.

      m1      m2
g1  [1, 1]  [1, 1]
g2  [2, 2]  [2, 2]
g3  [3, 3]  [2, 2]
Table 1: Example: an interval context

Table 1 shows an example of an interval context where three objects are described by two attributes. In such a pattern structure the values of attributes are represented by intervals, but the meaning is not probabilistic. Intervals are a way to generalize the underlying values and to ascend a level of abstraction.

If we have two objects, then a numerical attribute can take any value from the interval of this attribute in the first object and from the interval of this attribute in the second object. Consequently, the similarity between two intervals can be defined as the convex hull of the intervals, i.e. $[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)]$. Then, given two tuples of intervals, the similarity between these tuples is computed as the component-wise similarity between intervals.

In this example, we have the pattern structure $(G, (D, \sqcap), \delta)$, where $G = \{ g_1, g_2, g_3 \}$, the set $D$ is the set of all possible interval pairs with the similarity operation $\sqcap$ described above, and $\delta$ is given by the rows of the context in Table 1, i.e., $\delta(g_1) = \langle [1, 1]; [1, 1] \rangle$, $\delta(g_2) = \langle [2, 2]; [2, 2] \rangle$, and $\delta(g_3) = \langle [3, 3]; [2, 2] \rangle$.

Figure 1: The diagram of the pattern lattice for the interval context given in Table 1

Figure 1 shows the resulting pattern lattice, which organizes all the pattern concepts of Table 1 into a hierarchy.
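The construction can be traced on the interval context of Table 1. The following Python sketch is an illustration under stated assumptions rather than the code used in the experiment: the object names `g1` to `g3`, the dictionary `DELTA`, and the helper names are hypothetical, and pattern concepts are found by brute force over non-empty object subsets.

```python
from itertools import combinations

# Table 1: each object is described by a tuple of intervals (low, high).
DELTA = {
    "g1": ((1, 1), (1, 1)),
    "g2": ((2, 2), (2, 2)),
    "g3": ((3, 3), (2, 2)),
}


def meet(d1, d2):
    """Similarity of two interval tuples: component-wise convex hull."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))


def subsumed(d, e):
    """d is subsumed by e iff meet(d, e) == d (each interval of e lies inside d's)."""
    return meet(d, e) == d


def common_description(objects):
    """Common description of a non-empty collection of objects."""
    descriptions = [DELTA[g] for g in objects]
    result = descriptions[0]
    for d in descriptions[1:]:
        result = meet(result, d)
    return result


def objects_matching(d):
    """All objects whose description subsumes d."""
    return frozenset(g for g, e in DELTA.items() if subsumed(d, e))


def pattern_concepts():
    """Map each non-empty pattern extent to its pattern intent."""
    found = {}
    names = sorted(DELTA)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            d = common_description(subset)
            found[objects_matching(d)] = d
    return found


if __name__ == "__main__":
    for extent, intent in sorted(pattern_concepts().items(), key=lambda c: len(c[0])):
        print(sorted(extent), intent)
```

Note that the subset {g1, g3} is not an extent: its common description ⟨[1, 3]; [1, 2]⟩ also covers g2, so closing it yields the top concept with all three objects.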

2.4 Index of stability

Stability indices were introduced by Kuznetsov (2007). One distinguishes intensional and extensional stability. The first one allows estimating the strength of the dependence of an intent on each object of the respective extent. Extensional stability is defined dually.

Stability of a formal concept may be interpreted as the probability of retaining its intent after removing some objects from the extent (assuming that all subsets of the extent are equally probable).
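Read literally, this interpretation gives stability as the fraction of subsets of the extent that still produce the same intent. The sketch below computes that fraction by exhaustive enumeration; it is a hedged illustration on a hypothetical toy context (`CONTEXT`, `intent_of` and `stability` are names introduced here), and its exponential cost is exactly why approximations are needed in practice.

```python
from itertools import combinations

# Hypothetical toy context (not the EEG data): object -> attributes.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"a", "b"},
    "g3": {"a", "c"},
}
ATTRIBUTES = frozenset({"a", "b", "c"})


def intent_of(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return frozenset(result)


def stability(extent):
    """Fraction of subsets of `extent` whose intent equals the intent of `extent`."""
    extent = sorted(extent)
    target = intent_of(extent)
    kept = 0
    for r in range(len(extent) + 1):
        for subset in combinations(extent, r):
            if intent_of(subset) == target:
                kept += 1
    return kept / 2 ** len(extent)


if __name__ == "__main__":
    print(stability({"g1", "g2"}))        # concept with intent {a, b}
    print(stability({"g1", "g2", "g3"}))  # concept with intent {a}
```

In this toy context the concept with extent {g1, g2} keeps its intent for 3 of its 4 subsets, whereas the top concept keeps its intent for only 3 of its 8 subsets, so the former is the more stable of the two.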

The computation of stability has been shown to be intractable (Kuznetsov (2007)), which implies high computational demands, although a number of methods have been suggested for its approximate calculation. According to Buzmakov et al. (2014), for large contexts the stability is close to 1, and it is more practical to use the logarithmic scale of stability, which induces the same ranking as stability:

$\mathrm{LStab}(c) = -\log_2 (1 - \mathrm{Stab}(c))$

In the same work they also suggest the following bounds for a fast approximate calculation of stability:

$-\log_2 \sum_{d \in DD(c)} 2^{-\Delta(c, d)} \leq \mathrm{LStab}(c) \leq \Delta_{\min}(c)$

where $\Delta_{\min}(c) = \min_{d \in DD(c)} \Delta(c, d)$, $DD(c)$ is the set of all direct descendants of $c$ in the lattice, and $\Delta(c, d)$ is the size of the set-difference between the extents of formal concepts $c$ and $d$.
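As a rough illustration of why such bounds are cheap, the sketch below evaluates them from the extent-size differences alone. It assumes the bounds take the form $-\log_2 \sum_{d \in DD(c)} 2^{-\Delta(c,d)} \leq \mathrm{LStab}(c) \leq \min_{d \in DD(c)} \Delta(c,d)$ with $\mathrm{LStab}(c) = -\log_2(1 - \mathrm{Stab}(c))$, following Buzmakov et al. (2014); the function names are hypothetical and the concept is assumed to have at least one direct descendant.

```python
import math


def lstab_from_stab(stab):
    """Logarithmic stability: LStab = -log2(1 - Stab), for Stab < 1."""
    return -math.log2(1.0 - stab)


def lstab_bounds(deltas):
    """Lower and upper bounds on LStab(c).

    `deltas` holds, for every direct descendant d of c, the number of
    objects that are in the extent of c but not in the extent of d.
    The list is assumed to be non-empty.
    """
    lower = -math.log2(sum(2.0 ** -delta for delta in deltas))
    upper = float(min(deltas))
    return lower, upper


if __name__ == "__main__":
    # A concept with two direct descendants that lose 1 and 2 objects.
    print(lstab_bounds([1, 2]))
```

For a concept with two direct descendants at differences 1 and 2, the bounds are roughly 0.415 and 1. Only the direct descendants are inspected, so the cost per concept is linear in their number instead of exponential in the extent size.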

3 Experiment

3.1 Data

The main goal of the experiment was to explore different characteristic patterns of sleep spindles and to select the most significant and descriptive ones. Although the ultimate significance of the patterns found can only be judged by medical experts, we use established data mining metrics, namely support and the stability index, as an intermediate evaluation.

For our experiment we use a data-set of EEG recordings provided by the Neurology ward of the Central Hospital of Nancy. The data-set consists of signals captured from 19 electrodes placed on a hospitalized patient's scalp according to the 10–20 system. The overall duration of the signal is roughly 6 minutes, and a meta-file provides the timings of the sleep spindles.

An example of a spindle from one of the patients' EEGs is shown in Figure 2. The curve enclosed between two red vertical lines stands out from the rest of the signal by its elevated frequency. What are the characteristic properties that the spindle possesses? Are there any other types of spindles? We aim to answer these questions by analyzing the EEG with an FCA approach.

Figure 2: Example: the curve bounded by red lines corresponds to a spindle from the electrode F4 of one of the EEG data-sets

3.2 Formation of formal context

As FCA requires a discrete description of the domain, a fundamental part of the experimental setup is the transition from continuous signals to particular descriptions of their properties. We achieve this by calculating derived characteristics of the signals of sleep spindles. The formal context thus has sleep spindles as objects, described by properties of their signals. The following is the list of standard properties used by EEG analysts for the detection of spindles, which we adopted for our experiment:

  • average amplitude

  • maximum of absolute value of amplitude

  • average frequency

  • dominant frequency (see Fig. 3)

  • ratio of average amplitude to dominant frequency

  • … average bandpower for frequency bands [0.5 – 2.5], [2.5 – 4.5], …, [28.5 – 30.5] Hz

Figure 3: Extraction of dominant frequency as the maximum of the Fourier transform of the spindle's signal in the characteristic interval of 6 – 14 Hz

14,76 43,39 7,06 11,56 1,28 111,74
13,57 49,64 7,11 7,69 1,76 162,25
16,61 41,03 7,28 9,52 1,74 122,01
14,66 54,78 5,07 8,24 1,78 270,57
15,95 46,12 4,95 11,34 1,41 142,78
16,58 54,96 3,82 11,67 1,42 290,43
13,16 45,85 5,36 11,48 1,15 134,28
15,62 51,31 4,89 10,11 1,54 238,48
12,21 31,85 4,86 11,81 1,03 86,07
12,55 41,17 7,79 8,00 1,57 129,01
14,35 65,78 4,25 10,92 1,31 385,95
17,66 48,67 4,87 9,13 1,93 449,57
14,94 39,12 5,47 9,73 1,53 281,13
20,77 74,99 3,66 9,66 2,15 623,86
17,83 64,39 5,64 12,11 1,47 128,97
16,00 43,11 3,89 11,89 1,35 329,20
15,40 60,87 6,83 12,14 1,27 234,18
9,47 29,63 7,81 7,56 1,25 39,45
16,11 42,39 3,46 10,45 1,54 271,81
18,42 67,76 3,47 11,76 1,57 147,11

Table 2: First 6 columns of the formal context. Each spindle (one per row) is represented by the properties of its signal. The columns follow the order of the property list above: average amplitude, maximum absolute amplitude, average frequency, dominant frequency, ratio of average amplitude to dominant frequency, and the first bandpower attribute

For the calculation of dominant and average frequency we used the Fourier transform and the MATLAB meanfreq function, respectively. The analysis of different frequency bands was conducted with the help of the MATLAB bandpower function. The result of calculating the specified properties is a formal context that serves as input for FCA algorithms.

Table 2 shows part of the final context that is the subject of our experiment. The whole context has additional columns for the bandpower attributes, which are not included here because of the size of the resulting table.
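To make the signal-to-attribute step concrete, the following Python sketch derives the first five properties from a single spindle signal. It is not the MATLAB code used in the experiment: the function names are hypothetical, a naive discrete Fourier transform stands in for an FFT routine, "average amplitude" is assumed to mean the mean absolute value, and the power-weighted mean frequency only approximates what `meanfreq` computes.

```python
import cmath
import math


def spectrum(signal, fs):
    """Naive DFT: frequencies in Hz and power for the non-negative half."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def spindle_features(signal, fs, band=(6.0, 14.0)):
    """Derive numerical attributes of one spindle signal sampled at `fs` Hz."""
    freqs, power = spectrum(signal, fs)
    # Dominant frequency: spectral peak inside the characteristic band.
    in_band = [(p, f) for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    dominant = max(in_band)[1]
    # Assumption: average amplitude is the mean of absolute sample values.
    avg_amp = sum(abs(x) for x in signal) / len(signal)
    return {
        "average_amplitude": avg_amp,
        "max_abs_amplitude": max(abs(x) for x in signal),
        # Power-weighted mean frequency (approximation of MATLAB meanfreq).
        "average_frequency": sum(f * p for f, p in zip(freqs, power)) / sum(power),
        "dominant_frequency": dominant,
        "amplitude_to_dominant_frequency": avg_amp / dominant,
    }


if __name__ == "__main__":
    fs = 100  # hypothetical sampling rate in Hz
    test_signal = [20.0 * math.sin(2 * math.pi * 12 * t / fs) for t in range(100)]
    print(spindle_features(test_signal, fs))
```

Each spindle then contributes one row of such values to the context, with the bandpower attributes appended in the same way for every frequency band.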

3.3 Scheme of the experiment

Table 3 lists the steps of the experiment in execution order. The first stage is preprocessing of the data-set. It includes extraction of the signals of the spindles from the entire input EEG data-set: each spindle present in the data-set is given a set of cut-out signals from all the electrodes, covering only the moments in time when that spindle occurred.

Then signal processing procedures are applied to form a context according to the details given in the previous section. Attribute selection, a standard data mining methodology, is used to reduce dimensionality and decrease the computational cost of further processing. It uses the information gain metric and the correlation between attributes to select a subset of the most relevant attributes of the data-set.

The second stage of the experiment, which marks the engagement of FCA methodology, is the construction of the formal concept diagram. Two approaches to producing and processing an FCA diagram are found in practice: one that requires keeping the whole diagram in memory, and another that processes the results on-line and discards concepts as it goes. We use the first, more demanding approach, which on the other hand enables us to calculate the stability index. For the construction of the lattice we use the algorithm suggested by Kourie et al. (2009).

The third stage is dedicated to the traversal of the lattice and the calculation of stability for each of its concepts. We use the bounds described in Section 2.4 for its fast approximation. After that, in stage 4, filtering is applied to the whole lattice, resulting in a list of patterns satisfying the criteria.

Implementation-wise, the preprocessing stage is carried out with the help of ready-to-use MATLAB procedures, with all the necessary data-format manipulation done by Python scripts. The FCA algorithms for construction of the concept lattice and calculation of stability use a C++ implementation.

Input: EEG recordings of brain activity,
meta file providing spindle start and end timings
1. Preprocessing of raw EEG data:
  Extraction of signals of the spindles
  Calculation of the characteristics
  Formation of a formal context
  Attribute selection
2. Construction of the Formal concept diagram
3. Calculation of stability index of the concepts
4. Filtering of patterns by stability and support

Output: patterns meeting specified constraints
Table 3: Mining of stable frequent patterns of sleep spindles in EEG recordings
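The final filtering step of Table 3 can be sketched as follows. This is a hypothetical illustration, not the C++ implementation mentioned above: the `Pattern` record, the use of relative support, and the threshold parameters are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    intent: tuple       # e.g. a tuple of intervals, one per attribute
    extent: frozenset   # identifiers of the spindles covered by the pattern
    stability: float    # stability estimate (exact value or a bound)


def filter_patterns(patterns, n_objects, min_support, min_stability):
    """Keep patterns whose relative support and stability reach the thresholds."""
    selected = [
        p for p in patterns
        if len(p.extent) / n_objects >= min_support and p.stability >= min_stability
    ]
    # Most stable first, ties broken by larger support.
    return sorted(selected, key=lambda p: (-p.stability, -len(p.extent)))


if __name__ == "__main__":
    # Hypothetical patterns over 10 spindles, for illustration only.
    candidates = [
        Pattern(((9.0, 12.0),), frozenset({1, 2, 3, 4}), 2.0),
        Pattern(((10.0, 11.0),), frozenset({1, 2}), 3.0),
        Pattern(((9.5, 12.0),), frozenset({1, 2, 3}), 0.2),
    ]
    for p in filter_patterns(candidates, 10, min_support=0.3, min_stability=1.0):
        print(p.intent, sorted(p.extent), p.stability)
```

Because filtering needs the stability of every concept, it runs only after the whole lattice has been built and traversed, which is consistent with the choice of keeping the diagram in memory.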

4 Related Studies

Our research falls in line with recent publications (Masyutin and Kuznetsov (2016); Korepanova and Kuznetsov (2016)) on practical applications of FCA to real data. Qualitative assessment of the results (Kuznetsov and Makhalova (2015); Kashnitsky and Ignatov (2014); Kashnitsky and Kuznetsov (2016)) shows that FCA methods are not weaker than other data mining techniques and represent an attractive alternative. Nevertheless, it can seem that FCA carries unnecessary power, which also affects the amount of resources it uses compared to lighter statistical methods.

Articles on the application of FCA to neuroimaging have already appeared in the literature. Endres et al. (2012) use FCA to discover a hierarchy in neural codes by analyzing a large set of fMRI data. However, the size of the impact of the data filtering and approximation that precede the use of FCA is not quite clear. There is no convenient method to assess the cost of putting the data on a coarse scale when the end result, and so the effect, cannot be measured in any way.

Another study by Yegenoglu et al. (2016) uses FCA methodology to separate neural patterns from white noise in spike train data, where the signal was captured directly from implanted electrodes. Although they confirm the hypothesis that the stability measure is suitable for filtering out noise, the design of the experiment does not clearly lead to that conclusion. In some settings it is admissible to train a classifier on negative samples, so that in the end they can be distinguished from positive examples. However, this approach is highly prone to overfitting.

But the main drawback is that stability is not well suited as a distinguishing criterion for that purpose, because it is not preserved when moving from the training to the test data-set. The same formal concepts will have different values of stability on the test and training data-sets, which calls into question the applicability of the traditional training and classification model in this case.

5 Conclusion

In this paper we presented a preliminary study of the application of FCA to EEG recordings. We specified a data transformation protocol and a method to filter patterns by their significance.

Current work is directed at resolving the issue of scalability of FCA techniques and at additional search criteria that could improve the interpretability of the patterns found.

From an interoperability perspective, future research will be dedicated to applying the FCA approach to study the interrelation between the insights provided by EEG and fMRI techniques.


  • Acharya et al. (2005) Acharya, R., Faust, O., Kannathal, N., Chua, T., and Laxminarayan, S. (2005). Non-linear analysis of EEG signals at various sleep stages. Computer Methods and Programs in Biomedicine, 80, 37–45.
  • Buzmakov et al. (2014) Buzmakov, A., Kuznetsov, S., and Napoli, A. (2014). Scalable estimates of concept stability. In Glodeanu, C., Kaytoue, M., Sacarea, C. (eds.) Formal Concept Analysis, volume 8478 of Lecture Notes in Computer Science, 157–172. Springer.
  • Chu et al. (2012) Chu, C.J., Kramer, M.A., Pathmanathan, J., Bianchi, M.T., Westover, M.B., Wizon, L., and Cash, S.S. (2012). Emergence of stable functional networks in long-term human electroencephalography. Journal of Neuroscience, 32(8), 2703–2713.
  • Endres et al. (2012) Endres, D., Adam, R., Giese, M.A., and Noppeney, U. (2012). Understanding the semantic structure of human fMRI brain recordings with formal concept analysis. In 10th International Conference, ICFCA 2012, Leuven, Belgium, May 7-10, 2012. Proceedings, volume 7278 of Lecture Notes in Computer Science, 96–111. Springer.
  • Ganter and Kuznetsov (2001) Ganter, B. and Kuznetsov, S. (2001). Pattern structures and their projections. In (ICCS 2001), volume 2120 of Lecture Notes in Artificial Intelligence, 129–142. Springer.
  • Ganter and Wille (1997) Ganter, B. and Wille, R. (1997). Formal Concept Analysis: Mathematical Foundations. Springer, NJ, USA.
  • Giusti et al. (2015) Giusti, C., Pastalkova, E., Curto, C., and Itskov, V. (2015). Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences of the United States of America, 112(44), 13455–13460.
  • Kashnitsky and Ignatov (2014) Kashnitsky, Y. and Ignatov, D.I. (2014). Can FCA-based recommender system suggest a proper classifier? In Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at ECAI 2014), volume 1257, 17–26. CEUR Workshop Proceedings.
  • Kashnitsky and Kuznetsov (2016) Kashnitsky, Y. and Kuznetsov, S. (2016). Interval concept lattice as a classifier ensemble. In Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at ECAI 2016), volume 1703. CEUR Workshop Proceedings.
  • Kaytoue et al. (2011) Kaytoue, M., Kuznetsov, S.O., Napoli, A., and Duplessis, S. (2011). Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. (Ny), 181(10), 1989–2001.
  • Korepanova and Kuznetsov (2016) Korepanova, N.V. and Kuznetsov, S.O. (2016). Pattern structures for treatment optimization. In CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications, volume 1624, 217–230. CEUR Workshop Proceedings.
  • Kourie et al. (2009) Kourie, D.G., Obiedkov, S., Watson, B.W., and van der Merwe, D. (2009). An incremental algorithm to construct a lattice of set intersections. Science of Computer Programming, 74(3), 128–142.
  • Kuznetsov (2007) Kuznetsov, S. (2007). On stability of a formal concept. Annals of Mathematics and Artificial Intelligence, 49, 101–115.
  • Kuznetsov and Makhalova (2015) Kuznetsov, S. and Makhalova, T. (2015). Concept interestingness measures: a comparative study. In Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Clermont-Ferrand, France, October 13-16, volume 1466, 59–72. CEUR Workshop Proceedings.
  • Kuznetsov and Samokhin (2005) Kuznetsov, S. and Samokhin, M. (2005). Learning closed sets of labeled graphs for chemical applications. In Proc. 15th Conference on Inductive Logic Programming (ILP 2005), volume 3625 of Lecture Notes in Artificial Intelligence, 190–208. Springer.
  • Masyutin and Kuznetsov (2016) Masyutin, A. and Kuznetsov, S. (2016). Continuous target variable prediction with augmented interval pattern structures: Lazy algorithm. In CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications, volume 1624, 273–284. CEUR Workshop Proceedings.
  • Mirowski et al. (2009) Mirowski, P., Madhavan, D., LeCun, Y., and Kuzniecky, R. (2009). Classification of patterns of EEG synchronization for seizure prediction. Clinical Neurophysiology, 120, 1927–1940.
  • Morozov et al. (2015) Morozov, D., Kuznetsov, S., Lezoche, M., and Panetto, H. (2015). Formal methods for process knowledge extraction.
  • Yegenoglu et al. (2016) Yegenoglu, A., Quaglio, P., Torre, E., Grün, S., and Endres, D. (2016). Exploring the usefulness of formal concept analysis for robust detection of spatio-temporal spike patterns in massively parallel spike trains. In 22nd International Conference on Conceptual Structures, ICCS 2016, Annecy, France, July 5-7, 2016, Proceedings, volume 9717 of Lecture Notes in Computer Science, 3–16. Springer.