A New Algorithm based on Extent Bit-array for Computing Formal Concepts

10/29/2021
by   Jianqin Zhou, et al.
SUN YAT-SEN UNIVERSITY

The emergence of Formal Concept Analysis (FCA) as a data analysis technique has increased the need for algorithms that compute formal concepts quickly. The most efficient current algorithms for FCA are variants of the Close-By-One (CbO) algorithm, such as In-Close2, In-Close3 and In-Close4, all of which are based on horizontal storage of contexts. In this paper, building on In-Close4, we propose a new algorithm called In-Close5, based on vertical storage of contexts, which can significantly reduce both the time complexity and the space complexity of In-Close4. Technically, the new algorithm stores both the context and the extent of a concept as vertical bit-arrays, whereas In-Close4 stores only the context as a horizontal bit-array, which makes finding the intersection of two extent sets very slow. Experimental results demonstrate that the proposed algorithm is much faster than In-Close4 and has a broader scope of applicability: it can solve problems that In-Close4 cannot.


1 Introduction

Among data analysis techniques, Formal Concept Analysis (FCA) is a useful knowledge representation framework for describing and summarizing data. As the core data structure of FCA, the concept lattice is an effective tool for knowledge discovery, depicting the generalization and specialization relations between formal concepts in a hierarchical structure. Concept lattices have been widely used in many areas, such as data mining, machine learning and information retrieval [4, 9, 17, 16, 23, 24]. The main research topics on concept lattices include lattice construction [14, 1, 2, 8, 15, 21, 19, 20], rule extraction [11, 12, 13, 18] and lattice reduction [13, 10, 22].

A challenging problem in computing formal concepts is that a typical data set may have a great number of them. It is well known that the number of formal concepts can grow exponentially with the size of the input context, and that the problem of determining this number is #P-complete [7].

In the FCbO algorithm, Outrata and Vychodil [14] introduced the idea of closing a concept before its descendants are computed, thus allowing the descendants to fully inherit the attributes of the parent. In the spirit of ‘best-of-breed’ research, this idea was integrated into the In-Close2 algorithm [1].

Considering the formal context as a matrix, a row holds all the attributes of an object and a column holds all the objects of an attribute. Further, the set of all objects of a formal concept is called its extent and the set of all its attributes is called its intent. Within the In-Close2, In-Close3 and In-Close4 algorithms [1, 2, 3], intents are stored in a linked-list tree structure and extents are stored in a linearised 2-dimensional array. The context is stored as a horizontal bit-array to optimise the use of RAM and cache memory, which also allows multiple context cells to be processed by a single 32-bit or 64-bit operation.

In In-Close2, In-Close3 and In-Close4, each row of the context is packed into a bit-array, where the first bit corresponds to the first attribute, the second bit to the second attribute, the third bit to the third attribute and so on. The main shortcoming of these algorithms is that the extent of a concept is not stored as a 32-bit (or 64-bit) array, so they compute the intersection of the extent of a concept and a column of the context only one object at a time.

A crucial improvement in our algorithm is that both the context and the extent of a concept are stored as vertical bit-arrays to optimise the use of RAM and cache memory, which can significantly reduce both the time complexity and the space complexity. Each column of the context, and likewise the extent of each concept, is packed into a bit-array with one bit per object. Thus multiple context cells are processed by a single 32-bit or 64-bit operation when finding the intersection of the extent of a concept and a column of the context.
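To make the benefit of bit-array storage concrete, the following is a minimal C sketch of a word-wise intersection. It is an illustration only, not the In-Close5 source: the function name `intersect_bits` and the convention that bit `i` of word `w` stands for object `32*w + i` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intersect two object sets stored as packed bit-arrays (one bit per object).
   Assumed layout: bit i of word w stands for object 32*w + i.
   Each 32-bit AND handles 32 objects at once.
   Writes the intersection to out and returns the number of objects in it. */
static unsigned intersect_bits(const uint32_t *extent, const uint32_t *column,
                               uint32_t *out, size_t nwords)
{
    unsigned count = 0;
    assert(extent != NULL && column != NULL && out != NULL);
    for (size_t w = 0; w < nwords; w++) {
        uint32_t v = extent[w] & column[w];   /* 32 cells in one operation */
        out[w] = v;
        for (; v != 0; v &= v - 1)            /* clear lowest set bit to count 1s */
            count++;
    }
    return count;
}
```

With a list-of-objects representation, the same intersection needs one membership test per object of the extent; here the loop runs once per 32 objects.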

The second important improvement is the following. The core procedure of the In-Close2 algorithm is ComputeConceptsFrom((A,B),y), which uses a queue held in local arrays [1, 2]. In most cases the local queues are empty, so the memory reserved for them is largely wasted. In our algorithm, the local queues are merged into one global queue, which greatly reduces the memory needed by the core procedure.

This paper illustrates, after a brief description of formal concepts, how formal concepts are computed via the In-Close2 algorithm. Using a simple example, the basic recursive process of In-Close2 is shown line by line. Using the same notation and style, we then present a new variant called In-Close5. The key differences between the algorithms are compared to highlight where the efficiency gains occur.

The paper is organized as follows. In Section 2, we review the necessary notions concerning formal concepts and their basic properties. In Section 3, we study how formal concepts are computed using the In-Close2 algorithm. In Section 4, we present the In-Close5 algorithm and give experimental results for In-Close2, In-Close4 and In-Close5. Finally, Section 5 concludes the paper.

2 Basic notions and properties

In this section, we review some basic notions and properties of FCA used in this paper. The definitions of a formal context and its operators are given first.

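Definition 1. [5] Let $(G, M, I)$ be a formal context, where $G$ is a finite set of objects, $M$ is a finite set of attributes, and $I \subseteq G \times M$ is a binary relation between $G$ and $M$. Here each $x \in G$ is called an object, and each $a \in M$ is called an attribute. If an object $x$ has an attribute $a$, we write $xIa$ or $(x, a) \in I$.

Definition 2. [5] Let $(G, M, I)$ be a formal context. For any $A \subseteq G$ and $B \subseteq M$, a pair of positive operators are defined by:

$A^{\uparrow} = \{a \in M \mid \forall x \in A,\ xIa\}$,

$B^{\downarrow} = \{x \in G \mid \forall a \in B,\ xIa\}$.

Based on the above operators, formal concepts and concept lattices are defined as follows.

Definition 3. [5] Let $(G, M, I)$ be a formal context. For any $A \subseteq G$, $B \subseteq M$, if $A^{\uparrow} = B$ and $B^{\downarrow} = A$, then $(A, B)$ is called a formal concept, where $A$ is called the extent of the formal concept, and $B$ is called the intent of the formal concept. For any two formal concepts $(A_1, B_1)$ and $(A_2, B_2)$, one can define the partial order as follows:

$(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \iff B_2 \subseteq B_1.$

The family of all formal concepts of $(G, M, I)$ is a complete lattice, and it is called a concept lattice and denoted by $L(G, M, I)$.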

Let be a formal context. For any , , the following properties hold:

(1) , ;

(2) , ;

(3) , ;

(4) ;

(5) ;

(6) ;

Typically a table of 0s and 1s is used to represent a formal context, with 1s indicating binary relations between objects (rows) and attributes (columns). The following is a simple example of a formal context:

Object   Attributes (columns 1 to 5)
1        0 1 1 0 0
2        1 1 0 0 0
3        1 0 0 0 0
4        0 0 0 0 1
5        0 0 0 1 1
6        0 0 1 1 1
Table 1: Formal context

The formal concepts in Table 1 can be calculated as given in the following Table 2:











Table 2: Formal concepts in Table 1

Formal concepts in a table of 0s and 1s can be visualised as closed rectangles of 1s, where the rows and columns in the rectangle are not necessarily contiguous. Let $(i, j)$ denote the cell in the $i$th row and $j$th column. In Table 1, for example, the cells $(4, 5)$, $(5, 5)$ and $(6, 5)$ form a concept with extent $\{4, 5, 6\}$, a rectangle of height 3 and width 1. Similarly, $(5, 4)$, $(5, 5)$, $(6, 4)$ and $(6, 5)$ form a concept with extent $\{5, 6\}$, a rectangle of height 2 and width 2. The cells $(1, 3)$ and $(6, 3)$ form a concept with extent $\{1, 6\}$, a rectangle of height 2 and width 1 in which rows 1 and 6 are not contiguous.

In fact, it is not easy to compute the formal concepts given a formal context. Next we will address this problem.

3 Computation of formal concepts

A formal concept can be obtained by applying the operator of Definition 2 for attribute sets to a set of attributes to get its extent, and then applying the operator for object sets to that extent to get the intent.

For example, from the context in Table 1, and . So is concept in Table 2 .

If this procedure is applied to every possible subset of the attribute set, then all the concepts in the context can be obtained. However, the number of formal concepts can be exponential in the size of the input context, and the problem of determining this number is #P-complete [7]. So an efficient algorithm is required to compute all the formal concepts in a formal context.
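The brute-force procedure described above can be sketched in C on the context of Table 1. This is only an illustration under stated assumptions, not part of any In-Close implementation: the columns are numbered 0 to 4 from left to right, bit `i` of an object mask stands for object `i + 1`, and the names `rows`, `extent_of`, `intent_of` and `count_concepts` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { N_OBJECTS = 6, N_ATTRS = 5 };

/* Table 1, one entry per object: bit j of rows[i] is 1 when object i+1
   has the attribute in column j (columns numbered 0..4, left to right). */
static const uint8_t rows[N_OBJECTS] = {0x06, 0x03, 0x01, 0x10, 0x18, 0x1C};

/* Attribute set -> extent: all objects that have every attribute in attrs. */
static uint8_t extent_of(uint8_t attrs)
{
    uint8_t ext = 0;
    assert(attrs < (1u << N_ATTRS));
    for (int i = 0; i < N_OBJECTS; i++)
        if ((rows[i] & attrs) == attrs)
            ext |= (uint8_t)(1u << i);
    return ext;
}

/* Object set -> intent: all attributes shared by every object in objs. */
static uint8_t intent_of(uint8_t objs)
{
    uint8_t in = (uint8_t)((1u << N_ATTRS) - 1);
    for (int i = 0; i < N_OBJECTS; i++)
        if (objs & (1u << i))
            in &= rows[i];
    return in;
}

/* Try every attribute subset; a subset is an intent exactly when applying
   both operators in turn returns the subset itself. */
static int count_concepts(void)
{
    int count = 0;
    for (unsigned s = 0; s < (1u << N_ATTRS); s++)
        if (intent_of(extent_of((uint8_t)s)) == s)
            count++;
    return count;
}
```

For Table 1 this loop tests only 32 subsets, but the number of subsets doubles with every added attribute, which is why the closure-based algorithms discussed next avoid enumerating them.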

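By combining the advantages of the In-Close and FCbO algorithms, In-Close2 is very efficient [1, 2]. The In-Close2 algorithm, described line by line below, is invoked with an initial concept $(A, B)$, in which the extent $A$ contains all objects and the intent $B$ is empty, and an initial attribute $y = n - 1$, where $n$ is the number of columns in the formal context.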

Line 1 – Iterate across the context, from the starting attribute $y$ down to attribute 0 (the first column).

Line 2 – Skip attributes already in $B$, as intents now inherit all of their parent’s attributes.

Line 3 – Form an extent $C$ by intersecting the current extent $A$ with the next column of objects in the context.

Line 4 and Line 5 – If the extent formed, $C$, equals the extent, $A$, of the concept whose intent is currently being processed, then add the current attribute $j$ to the intent being processed, $B$.

Line 7 – Otherwise, check whether $C$ is contained in any new concept in the queue.

Line 8 – If $C$ is not contained, place the new extent $C$ and the location where it was found, $j$, in a queue for later processing.

Line 13 – The queue is processed by obtaining each new extent $C$ and the associated location $j$ from the queue.

Line 14 – Each new partial intent inherits all the attributes from its completed parent intent, $B$, along with the attribute, $j$, where its extent was found.

Line 15 – Call ComputeConceptsFrom recursively to compute the child concepts and to complete the new intent.
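The steps above can be sketched as a small recursive C function on the context of Table 1. This is a simplified Close-by-One-style illustration written for this example, not Andrews's In-Close2 code: extents are held as 6-bit masks rather than arrays, the queue stores extents directly, and the names `cols`, `is_canonical`, `compute_concepts_from` and `enumerate_concepts` are assumptions. The test used here for "is this extent new" rejects an extent that is already covered by an attribute to the right of `j` that is not in the current intent.

```c
#include <assert.h>
#include <stdint.h>

enum { N_OBJECTS = 6, N_ATTRS = 5, MAX_CONCEPTS = 64 };

/* Columns of Table 1: bit i of cols[j] is 1 when object i+1 has attribute j
   (attributes numbered 0..4, left to right). */
static const uint8_t cols[N_ATTRS] = {0x06, 0x03, 0x21, 0x30, 0x38};

static uint8_t found_extent[MAX_CONCEPTS];
static uint8_t found_intent[MAX_CONCEPTS];
static int n_found = 0;

/* C is a new extent if no attribute k > j outside B contains all of C. */
static int is_canonical(uint8_t C, uint8_t B, int j)
{
    for (int k = j + 1; k < N_ATTRS; k++)
        if (!(B & (1u << k)) && (C & cols[k]) == C)
            return 0;
    return 1;
}

static void compute_concepts_from(uint8_t A, uint8_t B, int y)
{
    uint8_t queue_extent[N_ATTRS];   /* local queue of new extents ...      */
    int queue_attr[N_ATTRS];         /* ... and the attributes that gave them */
    int queued = 0;

    for (int j = y; j >= 0; j--) {               /* from y down to attribute 0 */
        if (B & (1u << j))
            continue;                            /* skip inherited attributes  */
        uint8_t C = A & cols[j];                 /* intersect with column j    */
        if (C == A) {
            B |= (uint8_t)(1u << j);             /* same extent: extend intent */
        } else if (is_canonical(C, B, j)) {
            queue_extent[queued] = C;            /* new extent: process later  */
            queue_attr[queued] = j;
            queued++;
        }
    }

    assert(n_found < MAX_CONCEPTS);
    found_extent[n_found] = A;                   /* intent B is now complete   */
    found_intent[n_found] = B;
    n_found++;

    for (int q = 0; q < queued; q++)             /* children inherit B plus j  */
        compute_concepts_from(queue_extent[q],
                              (uint8_t)(B | (1u << queue_attr[q])),
                              queue_attr[q] - 1);
}

/* Start from the extent of all objects, an empty intent and y = n - 1. */
static int enumerate_concepts(void)
{
    n_found = 0;
    compute_concepts_from((uint8_t)((1u << N_OBJECTS) - 1), 0, N_ATTRS - 1);
    return n_found;
}
```

Calling `enumerate_concepts()` on Table 1 records each concept once; in the first call, four of the five candidate extents pass the test and one is rejected.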

Because the extent of a concept is not stored as a 32-bit (or 64-bit) array, Line 3 of ComputeConceptsFrom processes the intersection of the extent of a concept and a column of the context only one object at a time, which greatly increases the running time of In-Close2. This is the main disadvantage of the In-Close2 algorithm.

For example, apply In-Close2 algorithm to the formal context in Table 1, we have results in Table 2. In the first call ComputeConceptsFrom, , , and passed through IsCannonical() test. As , where , so failed IsCannonical() test.

In the second call ComputeConceptsFrom, passed through IsCannonical() test, we got concept as the child concept of . Similarly, we got concept as the child concept of , as the child concept of and as the child concept of .

By swapping the fourth column and the fifth column in Table 1, we have the following Table 3.

Object   Attributes (columns 1 to 5)
1        0 1 1 0 0
2        1 1 0 0 0
3        1 0 0 0 0
4        0 0 0 1 0
5        0 0 0 1 1
6        0 0 1 1 1
Table 3: Formal context

Applying the In-Close2 algorithm to the formal context in Table 3 gives the results in Table 4.

In the first call ComputeConceptsFrom, all , , , and passed through IsCannonical() test.

When call ComputeConceptsFrom with and , we got , where , thus .

Similarly, we got concept as the child concept of , as the child concept of and as the child concept of .










Table 4: Formal concepts in Table 3

Figure 1 shows the call tree of ComputeConceptsFrom. In a tree, the number of vertices equals the number of edges plus one. Here an edge is a ComputeConceptsFrom call launched from the queue and a vertex is one execution of ComputeConceptsFrom. During the first execution of ComputeConceptsFrom, 5 function calls are launched from the queue. Since a tree with $N$ vertices has $N - 1$ edges, each execution of ComputeConceptsFrom launches on average slightly less than one function call from the queue.

Specifically, the local queue of ComputeConceptsFrom is implemented as follows. First, int Bchildren[MAX_COLS] stores the location of each attribute that will spawn a new concept. Second, int Cnums[MAX_COLS] stores the concept number of each spawned concept, where MAX_COLS is the maximum number of columns. Since each execution launches about one call on average, almost all entries of these arrays stay unused, so the local queue uses memory very inefficiently.

Figure 1. The call tree of ComputeConceptsFrom

In Line 3 of ComputeConceptsFrom, if the extent formed, $C$, is empty, then the current attribute $j$ is stored so that it can be ignored in concepts of subsequent levels. This is the main improvement from In-Close2 to In-Close3. Further, In-Close4 is a 64-bit version that can build and output concept trees in JSON (JavaScript Object Notation), a lightweight data representation format.

4 In-Close5 algorithm

Within the In-Close2, In-Close3 and In-Close4 algorithms [1, 2, 3], extents are stored in a linearised 2-dimensional array, in which each object of a concept occupies one integer. Furthermore, in ComputeConceptsFrom, the core procedure of In-Close2, intersecting the current extent with the next column of objects in the context is the most time-consuming operation. This inspires us to store both the context and the extents of concepts as vertical bit-arrays.

Technically, let rows in a context be divided into blocks, where is the largest number that is less than or equal to . We only store the rows with objects (or nonzero block value) by block number and block value.

For example, the column , , , , , , is divided into 2 blocks. The first block value is , where the first bit means , the second bit means , the third bit means and so on. Thus the column is stored as , namely block number is and block value is . We do not store the second block, as the second block value is .

In the case of In-Close4 algorithm, suppose that are the objects of a concept, then they will stored as , namely integers indicate the locations of all objects (or the locations of all s ).

In the best case of In-Close5, a concept of objects only occupy integers, and one bitwise logic and operation may process 32 objects when stored as 32-bit integers. In the worst case of In-Close5, the column is stored as , while In-Close4 the same column is stored as .
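The block-number and block-value storage described above can be sketched as follows. This is a hedged illustration rather than the In-Close5 source: the function name `pack_extent`, the 0-based object indices, and the choice that object `k` maps to bit `k % 32` of block `k / 32` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a strictly increasing list of object indices into consecutive
   (block number, block value) pairs, storing only non-empty blocks.
   Assumed layout: object k sets bit k % 32 of block k / 32.
   Returns the number of integers written to out (two per stored block). */
static size_t pack_extent(const uint32_t *objects, size_t n, uint32_t *out)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t block = objects[i] / 32;
        uint32_t bit = 1u << (objects[i] % 32);
        assert(i == 0 || objects[i] > objects[i - 1]);
        if (used >= 2 && out[used - 2] == block) {
            out[used - 1] |= bit;        /* same block: add one more bit  */
        } else {
            out[used++] = block;         /* new block: number, then value */
            out[used++] = bit;
        }
    }
    return used;
}
```

Under these assumptions, up to 32 objects that fall in one block cost 2 integers in total, whereas objects that all fall in different blocks cost 2 integers each, compared with 1 integer per object in a plain list of object locations.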

With In-Close2 or In-Close4 algorithm to process mushroom data [6], extents of all concepts will occupy bytes of memory. In contrast, In-Close5 algorithm only needs bytes of memory when stored as 32-bit integer, and bytes of memory when stored as 64-bit integer. From the results above, the space complexity of In-Close5 is much better than that of In-Close2 and In-Close4.

The core procedure of the In-Close2 algorithm is ComputeConceptsFrom((A,B),y), which uses a queue held in local arrays [1, 2]. Specifically, int Bchildren[MAX_COLS] stores the location of each attribute that will spawn a new concept, and int Cnums[MAX_COLS] stores the concept number of each spawned concept, where MAX_COLS is the maximum number of columns. In fact, the number of times that the function is called is equal to the number of concepts, so in most cases the queue is empty. This inspires us to link all the local queues together as one global queue and to use the concept number as the index into the queue. Thus we only need to store the location of the attribute where the new extent was found.

The In-Close5 algorithm, described line by line below, is invoked with an initial concept whose extent contains all objects, an initial attribute $y = n - 1$, where $n$ is the number of columns in the formal context, and an initially empty Bparent.

Line 1 – Bparent contains the parent intent and the attributes that can be ignored in concepts of subsequent levels. The child concept inherits these attributes from the parent.

Line 2 – Iterate across the context, from the starting attribute $y$ down to attribute 0 (the first column).

Line 3 – Skip attributes already in Bchild.

Line 4 – Form an extent $C$ by intersecting the current extent $A$ with the next column of objects in the context. It is implemented in C as follows.

unsigned int* Ac = startA[c]; //pointer to start of current extent
unsigned int* aptr = startA[highc]; //pointer to start of next extent to be created
int sizeAc = startA[c+1]-startA[c]; //size of current extent, in integers (two per block)
/* iterate across the blocks of the current extent to find their objects in the current column j */
for(int i = sizeAc/2; i > 0; i--){
    if(context0[*Ac][j] & *(Ac+1)){
        *aptr = *Ac; //add object block number to new extent (intersection)
        aptr++;
        *aptr = context0[*Ac][j] & *(Ac+1); //add object block value to new extent
        aptr++;
    }
    Ac+=2; //move to next object block
}

Line 5 and Line 6 – If the extent formed, $C$, is empty, then put $j$ in Bchild, so that it can be ignored in concepts of subsequent levels.

Line 8 and Line 9 – If the extent formed, $C$, equals the extent, $A$, of the concept whose intent is currently being processed, then add the current attribute $j$ to the intent being processed, and also put $j$ in Bchild.

Line 11 – Otherwise, check whether $C$ is contained in any new concept in the queue.

Line 12 – If $C$ is not contained, place the location $j$ in a global queue for later processing. It is implemented as follows.

Bchildren[highc-1] = j; //note where (attribute column) it was found,

nodeParent[highc] = c; //note the parent concept number and

startA[++highc] = aptr; //note the start of the new extent in A.

Line 18 – The queue is processed by obtaining each new extent and its associated location from the queue.

Line 19 – Each new partial intent inherits all the attributes from its completed parent intent, along with the attribute $j$ where its extent was found and the attributes that can be ignored in concepts of subsequent levels.

Line 20 – Call ComputeConceptsFrom recursively to compute the child concepts and to complete the new intent.

Lines 18, 19 and 20 are implemented in C as follows.

// here numchildrenStart is stored as highc-1 at the beginning of
// ComputeConceptsFrom
for(int i = highc-2; i >= numchildrenStart; i--){
    startB[i+1] = bptr; //set the start of the intent in B tree
    // note that i+1 is the number of the new extent
    ComputeConceptsFrom(i+1, Bchildren[i]-1, Bchild);
}

As both the context and the extent of a concept are stored as vertical bit-arrays, when an extent is formed in Line 4, up to 32 (or 64) context cells can be processed by a single 32-bit (or 64-bit) AND operation. In In-Close3, by contrast, the extent of a concept is not stored as a 32-bit (or 64-bit) array, so In-Close3 processes the context only one cell at a time. In Line 12, we only place the location $j$ in a global queue for later processing, while In-Close3 keeps many local queues that are mostly empty. The time and space costs are thereby greatly reduced. However, the logical structure of the In-Close5 algorithm is almost the same as that of In-Close3, so we refer the reader to [2] for the correctness of In-Close5.
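The block-wise intersection of Line 4 can also be written as a small self-contained function. The sketch below follows the indexing of the snippet above, where `context0[block][j]` holds the 32 cells of column `j` in one block, but the tiny two-block, three-column context, the function name `intersect_extent` and the explicit output buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N_BLOCKS = 2, N_COLS = 3 };

/* Hypothetical context: context0[b][j] packs the cells of column j for the
   32 objects of block b (one bit per object). */
static const uint32_t context0[N_BLOCKS][N_COLS] = {
    {0x0000000Fu, 0x000000F0u, 0xFFFFFFFFu},
    {0x00000001u, 0x00000000u, 0x000000FFu}
};

/* Intersect an extent, stored as (block number, block value) pairs, with
   column j. Blocks whose intersection is empty are dropped.
   Returns the number of integers written to out. */
static size_t intersect_extent(const uint32_t *extent, size_t n_ints, int j,
                               uint32_t *out)
{
    size_t used = 0;
    assert(n_ints % 2 == 0 && j >= 0 && j < N_COLS);
    for (size_t i = 0; i < n_ints; i += 2) {
        uint32_t v = context0[extent[i]][j] & extent[i + 1]; /* 32 cells at once */
        if (v != 0) {
            out[used++] = extent[i];   /* block number */
            out[used++] = v;           /* block value  */
        }
    }
    return used;
}
```

Because empty blocks are never written, the new extent can only be as long as its parent, and the loop runs once per stored block rather than once per object.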

Considering the formal context as a matrix, the concepts of the transposed matrix are symmetric to those of the original matrix. However, there are 8124 columns in the transposed mushroom data, so the depth of the recursive calls of ComputeConceptsFrom is greatly increased, and so is the running time.

With its single global queue, In-Close5 is able to process the transposed mushroom data, although it takes much longer. In contrast, because its local queues use too much memory, In-Close4 cannot process the transposed mushroom data.

Experiments were carried out to compare the running times of the In-Close2, In-Close4 and In-Close5 algorithms. The experimental results are given in Table 5. The mushroom and nursery data are from [6]. The experiments were run on a laptop computer with an Intel Core i5-2450M 2.50 GHz processor and 8GB of RAM.

              Mushroom   Nursery   Transposed Mushroom   Transposed Nursery
#concepts     233,101    154,055   233,101               154,055
In-Close2     0.424      0.123     -                     -
In-Close4     0.388      0.132     -                     -
In-Close5     0.195      0.073     102.536               105.531
Table 5: Comparison of In-Close2, In-Close4 and In-Close5

From Table 5, one can see that for the Mushroom data, In-Close4 is faster than In-Close2 and In-Close5 is the fastest. For the Nursery data, In-Close4 has no advantage over In-Close2, as 30 is much less than 64. Because their local queues use too much memory, neither In-Close2 nor In-Close4 can process the transposed mushroom data.

5 Conclusions and future work

Within the In-Close2, In-Close3 and In-Close4 algorithms, intents are stored in a linked-list tree structure and extents are stored in a linearised 2-dimensional array. This data structure is very simple and effective. A crucial improvement in our algorithm is that both the context and the extents of concepts are stored as vertical bit-arrays to optimise the use of RAM and cache memory, which also significantly reduces the time for processing extents of concepts.

The object-oriented concept lattice is a more extensive concept lattice [25], and it is more difficult to construct. In the future, we will apply the data structures and techniques of these algorithms to object-oriented concept lattices, attribute-oriented concept lattices and so on.

References

  • [1] S. Andrews, In-close2, a high performance formal concept miner. S. Andrews, S. Polovina, R. Hill, B. Akhgar (Eds.), Conceptual Structures for Discovering Knowledge – Proceedings of the 19th International Conference on Conceptual Structures (ICCS), Springer (2011), pp. 50-62
  • [2] S. Andrews, A ‘Best-of-Breed’ approach for designing a fast algorithm for computing fixpoints of Galois Connections, Information Sciences, 295 (20) (2015) 633–649.
  • [3] S. Andrews, In-Close4 Program, 2017, https://sourceforge.net/projects/inclose/files/In-Close/.
  • [4] V.G. Blinova, D.A. Dobrynin, V.K. Finn, S.O. Kuznetsov, E.S. Pankratova, Toxicology analysis by means of the jsm-method, Bioinformatics. 19(10) (2003) 1201–1207.
  • [5] B. Ganter, R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer-Verlag, New York, 1999.
  • [6] A. Frank, A. Asuncion, UCI Machine Learning Repository, 2010, http://archive.ics.uci.edu/ml.
  • [7] S.O. Kuznetsov, On computing the size of a lattice and related decision problems, Order, 18 (4) (2001) 313-321
  • [8] S.O. Kuznetsov, S.A. Obiedkov, Comparing performance of algorithms for generating concept lattices, J. Exp. Theor. Artif. Intell. 14(2–3) (2002) 189–216.
  • [9] S.O. Kuznetsov, Machine learning and formal concept analysis, in: Concept Lattices, Proceedings of the Second International Conference on Formal Concept Analysis, ICFCA 2004, Sydney, Australia, February 23-26, 2004, pp.287-312.
  • [10] S.O. Kuznetsov, S.A. Obiedkov, C. Roth, Reducing the representation complexity of lattice-based taxonomies, in: Conceptual Structures: Knowledge Architectures for Smart Applications, Proceedings of the 15th International Conference on Conceptual Structures, ICCS 2007, Sheffield, UK, July 22-27, 2007, pp.241–254.
  • [11] J. Li, C. Mei, Y. Lv, Incomplete decision contexts: approximate concept construction, rule acquisition and knowledge reduction, Int. J. Approx. Reason. 54(1) (2013) 149–165.
  • [12] J. Li, C. Mei, L. Wang, J. Wang, On inference rules in decision formal contexts, Int. J. Comput. Intell. Syst. 8(1) (2015) 175–186.
  • [13] J. Li, C. Mei, J. Wang, X. Zhang, Rule-preserved object compression in formal decision contexts using concept lattices, Knowl.-Based Syst. 71 (2014) 435–445.
  • [14] J. Outrata, V. Vychodil, Fast algorithm for computing fixpoints of Galois connections induced by object-attribute relational data, Information Sciences, 185 (1) (2012) 114-127
  • [15] P. Osicka, Algorithms for computation of concept trilattice of triadic fuzzy context, in: Advances in Computational Intelligence -Proceedings of the 14th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2012, Catania, Italy, July 9-13, 2012, pp.221-230 (Part III).
  • [16] N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Efficient mining of association rules using closed itemset lattices, Inf. Syst. 24(1) (1999) 25–46.
  • [17] J. Poelmans, D.I. Ignatov, S. Viaene, G. Dedene, S.O. Kuznetsov, Text mining scientific papers: a survey on FCA-based information retrieval research, in: Advances in Data Mining. Applications and Theoretical Aspects - Proceedings of the 12th Industrial Conference, ICDM 2012, Berlin, Germany, July 13-20, 2012, pp.273-287.
  • [18] Z. Pei, D. Ruan, D. Meng, Z. Liu, Formal concept analysis based on the topology for attributes of a formal context, Information Sciences, 236 (2013) 66–82.
  • [19] J. Qi, W. Liu, L. Wei, Computing the set of concepts through the composition and decomposition of formal contexts, in: International Conference on Machine Learning and Cybernetics, Proceedings, ICMLC 2012, Xian, Shaanxi, China, July 15–17, 2012, pp.1326–1332.
  • [20] J. Qi, T. Qian, L. Wei, The connections between three-way and classical concept lattices, Knowl.-Based Syst. 91 (2016) 143–151.
  • [21] J. Qi, L. Wei, Z. Li, A partitional view of concept lattice, in: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Proceedings of the 10th International Conference, RSFDGrC 2005, Regina, Canada, August 31–September 3, 2005, pp.74–83 (Part I).
  • [22] R. Ren, L. Wei, The attribute reductions of three-way concept lattices, Knowl.-Based Syst. 99 (2016) 92–102.
  • [23] M. Shao, H. Yang, W. Wu, Knowledge reduction in formal fuzzy contexts, Knowl.-Based Syst. 73 (2015) 265–275.
  • [24] Q. Wan, L. Wei, Approximate concepts acquisition based on formal contexts, Knowl.-Based Syst. 75 (2015) 78–86.
  • [25] Y. Y. Yao, Concept lattices in rough set theory, Processing Nafips 04 IEEE Meeting of the Fuzzy Information, Canada: IEEE, September 27, 2004: 796-801.