Formal Concept Analysis for Knowledge Discovery from Biological Data

06/01/2015 ∙ by Khalid Raza, et al. ∙ Jamia Millia Islamia University

Due to rapid advances in high-throughput techniques, such as microarrays and next-generation sequencing, biological data are increasing exponentially. The current challenge in computational biology and bioinformatics research is how to analyze these huge volumes of raw biological data to extract biologically meaningful knowledge. This review paper presents applications of formal concept analysis (FCA) to the analysis of, and knowledge discovery from, biological data, including gene expression discretization, gene co-expression mining, gene expression clustering, finding genes in gene regulatory networks, enzyme/protein classification, binding site classification, and so on. It also presents a list of FCA-based software tools applied in the biological domain and covers the challenges faced so far.


1 Introduction

Since the Human Genome Project, biological data have grown at an unprecedented rate. Advances in high-throughput technologies, such as microarrays and next-generation sequencing, make it possible to produce high-quality biological data rapidly. Biological data can be broadly classified as genomic, transcriptomic and proteomic. For example, gene expression data are transcriptomic data that quantify the state of genes in a cell. When gene expression data are analyzed properly, they may reveal many hidden cellular processes and much biological knowledge. Such knowledge discovery may lead to a better understanding of disease mechanisms and, in turn, guide better diagnosis and therapy.

Formal Concept Analysis (FCA), introduced by R. Wille in the early 1980s wille82 , is a method based on lattice theory for the analysis of binary relational data. Since its inception, FCA has found applications in many areas, including data mining, knowledge discovery and machine learning. Like other computational techniques, FCA has also been applied to microarray analysis, gene expression mining, gene expression clustering, finding genes in gene regulatory networks, enzyme/protein classification, binding site classification, and so on. In this chapter, we present the current status of FCA for the analysis of, and knowledge discovery from, biological data and also cover the challenges faced so far.

2 Biological Databases

Owing to the availability of high-throughput techniques, biological databases are growing exponentially and modern biology has turned into a data-rich science. Important kinds of biological data include nucleotide and protein sequences, protein 3D structures produced by X-ray crystallography and NMR, metabolic pathways, complete genomes and maps, gene expression data, protein-protein interactions, and so on.

Biological databases are broadly divided into sequence databases and structure databases. Sequence databases cover both DNA and proteins, whereas structure databases cover proteins only. Today, most biological databases are freely available to researchers. In general, biological databases can be classified as primary, secondary and composite databases. A primary database stores either sequence or structure information. Examples are UniProt and PIR for protein sequences, GenBank and DDBJ for genome sequences, and the Protein Data Bank for protein structures. A secondary database stores information derived from primary databases, such as conserved sequence information or the active-site residues of protein families obtained by multiple sequence alignment of a set of related proteins. SCOP, CATH and PROSITE are a few examples of secondary databases. A composite database collects a variety of primary database sources, which avoids the need to search multiple databases separately. The National Center for Biotechnology Information (NCBI) is the main central host that links multiple database sources and makes these resources freely available. For more information about biological databases, refer to the tutorials in babu97 and kraulis01 .

3 Microarray Analysis

3.1 Mining gene expression data

Owing to rapid advances in high-throughput technologies such as microarrays and next-generation sequencing, transcriptomic data are being produced at an unprecedented rate. However, the analysis and interpretation of these data remain a challenge because of the complexity of biological systems. The biological background that needs to be considered for gene expression mining is as follows: i) a single gene usually participates in many biological processes, i.e., it has several functions; ii) a biological process involves only a small subset of genes; iii) a biological process of interest may be active in many, all or none of the situations in a given dataset; and iv) genes that are differentially expressed across samples are not frequent.

In transcriptomics, researchers routinely analyze the expression levels of genes in different situations, such as tumor samples versus normal samples. Formal Concept Analysis (FCA) has been successfully applied in the field of transcriptomics. Some studies used FCA to identify sets of genes that share the same transcriptional behavior rioult03 ; rioult03a ; kaytoue-uberall09 . Owing to the availability of large gene expression datasets, it is possible to apply data mining tools to identify patterns of interest in gene expression data. One of the most widely used data mining techniques is association rule mining, which can be applied to the analysis of gene expression data. Association rules may uncover biologically relevant associations between genes, or between different environmental conditions and gene expression. An association rule can be written in the form $X \Rightarrow Y$, where $X$ and $Y$ are disjoint sets of data items. The set $Y$ is likely to occur whenever the set $X$ occurs. Here, the data items may include highly expressed or repressed genes, or other relevant facts describing the cellular environment of the genes, such as the diagnosis of a disease sample. Association rule mining has been applied to gene expression data by many researchers, including creighton03 .
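As a minimal sketch of how a rule of this kind can be scored, the Python snippet below computes the usual support and confidence measures over a handful of samples. The sample contents and item names (`geneA_up`, `tumor`, and so on) are hypothetical and only illustrate the idea; they are not taken from any of the cited studies.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): support(lhs and rhs together) / support(lhs)."""
    base = support(transactions, lhs)
    if base == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / base


# Hypothetical samples: each set lists the facts observed in one sample.
samples = [
    {"geneA_up", "geneB_up", "tumor"},
    {"geneA_up", "geneB_up", "tumor"},
    {"geneA_up", "normal"},
    {"geneC_down", "normal"},
]

# Score the rule {geneA_up} => {geneB_up}.
rule_support = support(samples, {"geneA_up", "geneB_up"})
rule_confidence = confidence(samples, {"geneA_up"}, {"geneB_up"})
print(rule_support, rule_confidence)
```

In practice, rules are usually kept only when both measures exceed user-chosen thresholds.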

In this section, we discuss only the application of different variants of FCA to gene expression data mining, especially the extraction of groups of co-expressed genes that share similar expression. Most methods for mining co-expressed genes are based on binary biclustering. Here, the data are scaled using a single threshold on the expression value. Expression values above this threshold are considered over-expressed and represented by 1; otherwise they are considered under-expressed and represented by 0. Once the gene expression values have been discretized into a binary table, strong relationships carrying biologically meaningful information can be extracted. Kaytoue-Uberall et al. (2008) kaytoue-uberall08 proposed interval-based FCA to extract groups of co-expressed genes. Given a set of genes $G$, a set of situations $S$ and a set of ordered intervals $T$, a binary relation $I \subseteq G \times (S \times T)$ is defined, where $(g, (s, t)) \in I$ means that the expression value of gene $g$ lies in the interval of index $t$ for situation $s$. Hence, a formal concept of the context $(G, S \times T, I)$ represents a group of genes whose expression values lie in the same intervals. However, determining these intervals a priori is difficult.
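The single-threshold scaling and the extraction of formal concepts described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions: the expression values, the threshold of 3.0 and all function names are hypothetical, and the enumeration is a naive exponential closure of every attribute subset rather than any of the published algorithms.

```python
from itertools import combinations


def binarize(expression, threshold):
    """Map each gene to the set of situations where it is over-expressed."""
    return {gene: {s for s, v in values.items() if v > threshold}
            for gene, values in expression.items()}


def genes_having(context, attributes):
    """Extent: genes that have every attribute in `attributes`."""
    return {g for g, attrs in context.items() if attributes <= attrs}


def common_attributes(context, genes, all_attributes):
    """Intent: attributes shared by all `genes` (all attributes if none)."""
    shared = set(all_attributes)
    for g in genes:
        shared &= context[g]
    return shared


def formal_concepts(context):
    """Naive enumeration by closing every attribute subset (toy sizes only)."""
    all_attributes = set().union(*context.values())
    concepts = set()
    for r in range(len(all_attributes) + 1):
        for subset in combinations(sorted(all_attributes), r):
            extent = genes_having(context, set(subset))
            intent = common_attributes(context, extent, all_attributes)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical expression values for three genes in three situations.
expression = {
    "g1": {"s1": 5.0, "s2": 7.0, "s3": 1.0},
    "g2": {"s1": 6.0, "s2": 8.0, "s3": 0.5},
    "g3": {"s1": 0.2, "s2": 9.0, "s3": 4.0},
}
context = binarize(expression, threshold=3.0)
concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a maximal group of genes together with the situations in which all of them are over-expressed.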

Messai et al. (2008) messai08 proposed an interval-free FCA-based method to cluster gene expression values. However, this algorithm does not scale to large datasets, and no link to interordinal scaling was established. To overcome these problems, Kaytoue-Uberall et al. (2009) kaytoue-uberall09 introduced two FCA-based methods for clustering gene expression data. The first method is based on interordinal scaling and the second on pattern structures, which require adapting the algorithms to compute with an interval algebra. Of the two, the second method was shown to be more computationally efficient and to provide more readable results. These algorithms were tested on microarray gene expression data of the fungus Laccaria bicolor, taken from the Gene Expression Omnibus database (GSE9784) and composed of 22,294 genes and five different conditions. For dimension reduction, the Cyber-T tool kayala12 was used to filter the dataset, which retained 10,225 genes.
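To give a feel for the pattern-structure idea, the sketch below treats each gene profile as a vector of degenerate intervals and defines the similarity of two descriptions as the component-wise convex hull, which is the usual choice for interval patterns. The profiles and function names are hypothetical, and this is only an assumed simplification, not the algorithm of kaytoue-uberall09 .

```python
def meet(d1, d2):
    """Similarity of two interval vectors: component-wise convex hull."""
    return tuple((min(a1, a2), max(b1, b2))
                 for (a1, b1), (a2, b2) in zip(d1, d2))


def describe(profiles, genes):
    """Smallest interval pattern that covers the profiles of all `genes`."""
    pattern = None
    for g in genes:
        point = tuple((v, v) for v in profiles[g])
        pattern = point if pattern is None else meet(pattern, point)
    return pattern


def matching_genes(profiles, pattern):
    """Genes whose every value lies inside the corresponding interval."""
    return {g for g, values in profiles.items()
            if all(lo <= v <= hi for v, (lo, hi) in zip(values, pattern))}


# Hypothetical expression profiles over three situations.
profiles = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (5, 8, 5),
    "g4": (9, 1, 2),
}
pattern = describe(profiles, ["g1", "g2"])
closure = matching_genes(profiles, pattern)
print(pattern, sorted(closure))
```

Starting from g1 and g2, the closure also picks up g3, because its values fall inside the same intervals; the pair (closure, pattern) then plays the role of a formal concept without any prior discretization.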

DNA methylation affects gene expression, and its dysregulation may contribute to several cancers. Many investigations have observed that hypomethylation of DNA is associated with many cancers, including breast cancer. Amin et al. (2012) amin12 applied FCA to mine hypomethylated genes among breast cancer tumors. They constructed formal concept lattices of the significantly hypomethylated genes for each breast cancer subtype. The constructed lattice reflects the biological relationships among breast cancer tumor subtypes. The proposed filter method has two stages: a non-specific filter and a specific filter. The non-specific filtering step determines the hypomethylated CpGs by computing the difference between the mean methylation level of the tumor samples and that of the corresponding adjacent normal tissue. The second stage (specific filtering) receives the output of the first stage as input and performs a one-sample Kolmogorov-Smirnov test to check the normality of each breast cancer subtype. If the dataset follows a normal distribution, a paired t-test is applied; otherwise the Wilcoxon signed-rank test is applied. Once the hypomethylated genes have been filtered, FCA is applied to characterize the breast cancer subtypes. Here, the Java-based FCA software tool ConExp serhiy00 was used to generate the lattice diagram.

3.2 Clustering gene expression data

Clustering algorithms are the most popular methods for grouping sets of genes and/or experimental conditions that have similar gene expression patterns. Some of the most widely used clustering algorithms are hierarchical clustering, k-means, self-organizing maps, fuzzy c-means, and so on raza14 . However, FCA has also been used for grouping genes, as an alternative to clustering. Choi et al. (2008) choi08 proposed an FCA-based approach for grouping genes based on their expression patterns. FCA builds a lattice from the gene expression data together with some additional biological information, where each vertex corresponds to a subset of genes that are grouped together based on their expression values and some other functional information. The lattice structure of the gene sets is assumed to reflect biological relationships in the gene expression dataset. Similarities and dissimilarities between different experiments are determined by comparing the corresponding lattices. The approach consists of three main steps: i) building a binary relation, ii) constructing a concept lattice, and iii) defining a distance measure and comparing the lattices. In the first step, the objects are genes and the attributes are their discretized gene expression values and biological attributes. In the second step, a concept lattice is constructed from the binary relation of each experiment. Finally, the third step calculates distances and compares the lattices. The work of Choi et al. (2008) choi08 is an attempt to apply FCA to gene clustering, but the distance measure employed was quite basic and did not fully exploit the properties of the lattice structure. Hence, other distance measures, such as spectral distance or a distance based on the maximal common sublattice, could also be investigated choi08 . In addition to global lattice comparison, local structures (sublattices) could be investigated, which may assist in identifying particular biological pathways.

Melo and collaborators melo13 proposed an FCA-based approach, combined with association rules and visual analytics, to find overlapping groups of genes in gene expression data, and analyzed it in an analytical tool called CUBIST. The CUBIST workflow involves querying a semantic database and transforming the result into a formal context, which is then visualized as a concept lattice with associated charts. The CUBIST tool addresses the challenges of gene expression analysis by filtering and grouping large datasets, supporting interactive exploration of the data and presenting various relevant statistics.

3.3 Clustering multi-experiment expression data

Owing to the availability of high-throughput techniques, a large number of gene expression datasets now exist. How to combine datasets taken from multiple microarray experiments is an open research question. Many recent studies suggest that the analysis and integration of multi-experiment datasets give more accurate, reliable and robust results. The reason is that integrated datasets are based on a larger number of gene expression samples, so the effects of individual study-specific biases are reduced. For the consensus integration of multi-experiment expression data, FCA has been successfully applied by Hristoskova and collaborators hristoskova14 . They proposed a generic consensus clustering approach that applies FCA to consolidate and analyze clustering solutions taken from multiple microarray experiments. Initially, the datasets are divided into multiple groups of related experiments based on some predefined criteria. In the next step, a consensus clustering technique is applied to each group, which results in one clustering solution per group. These solutions are then pooled together and analyzed by FCA, which enables valuable insights to be extracted from the data and generates a gene partition over all the experiments. The FCA-enhanced consensus clustering algorithm proposed by Hristoskova and collaborators hristoskova14 is depicted in Fig. 1. The algorithm is divided into three steps: initialization, clustering and FCA-based analysis. In the initialization step, the multi-experiment data are divided into groups of related datasets. The clustering step applies consensus clustering, which generates different solutions. The FCA-based analysis step constructs a concept lattice that partitions the genes into a set of disjoint clusters, as shown in Fig. 1.
The advantages of the FCA-enhanced clustering approach proposed in hristoskova14 are as follows: i) it uses all the data by allowing each group of related experiments to have a different set of genes, i.e., the total set of studied genes is not limited to those present in all the datasets; ii) it can be better tuned to the samples by identifying the initial number of clusters for each group of related experiments, depending on the number, composition and quality of the expression profiles; and iii) the problem of ties is avoided by applying FCA to analyze all the partitioning results together and find the final clustering solution that is representative of the entire experiment collection.
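The pooling step can be illustrated with a small Python sketch. It assumes, as a simplification, that each gene becomes an object whose attributes are its (group, cluster label) memberships, and that genes with identical attribute sets form one block of the final partition. The group names, labels and function names are hypothetical, and the sketch does not reproduce the full algorithm of hristoskova14 .

```python
from collections import defaultdict


def build_context(solutions):
    """Turn {group: {gene: cluster_label}} into {gene: {(group, label), ...}}."""
    context = defaultdict(set)
    for group, assignment in solutions.items():
        for gene, label in assignment.items():
            context[gene].add((group, label))
    return dict(context)


def partition_by_intent(context):
    """Genes with exactly the same attribute set fall into the same block."""
    blocks = defaultdict(set)
    for gene, attrs in context.items():
        blocks[frozenset(attrs)].add(gene)
    return sorted(sorted(genes) for genes in blocks.values())


# Hypothetical clustering solutions for two groups of related experiments.
# Gene g4 is measured only in group B, which the context handles naturally.
solutions = {
    "groupA": {"g1": 0, "g2": 0, "g3": 1},
    "groupB": {"g1": 0, "g2": 0, "g3": 0, "g4": 1},
}
context = build_context(solutions)
partition = partition_by_intent(context)
print(partition)
```

Because a gene simply lacks attributes for groups in which it was not measured, the gene set does not have to be restricted to genes present in every dataset.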

Figure 1: Schematic representation of the FCA-enhanced consensus clustering algorithm hristoskova14

Another attempt to apply FCA to knowledge discovery and knowledge integration from gene expression data was made by Benabderrahmane (2014) benabderrahmane14 , who introduced a symbolic data mining approach based on FCA that involves bi-clustering of genes. First, the datasets are represented as a formal context (objects × attributes), where the objects are genes and the attributes are their expression profiles plus additional information, such as the GO terms that annotate them, the pathways in which they are involved and their genetic interactions. The algorithm has eight steps; its outline is depicted in Fig. 2. The algorithm integrates different kinds of data, such as genes that have similar expression profiles and share similar biological functions (Gene Ontology), and knowledge bases of pathways and interactors (KEGG, BioGrid, STRING, etc.).

Figure 2: An overview of the framework proposed by Benabderrahmane (2014) benabderrahmane14

3.4 Gene expression data comparison

Finding and understanding the similarities among various diseases is an important research problem in translational bioinformatics. Understanding disease similarities may help in refining disease classification, identifying the common etiology of comorbidities in genetic studies, finding analogies between closely related diseases and, finally, identifying common treatments keller12 . Bhavnani and collaborators bhavnani09 applied a network analysis approach to find similarities among renal diseases using gene expression data.

In addition to many other computational techniques, FCA has been applied to finding disease similarities. The work of Keller et al. (2012) keller12 shows the application of FCA to the identification of disease similarity. They identified formal concepts from gene-disease associations; these concepts indicate hidden relationships among diseases that have the same set of associated genes, and among genes that are associated with the same set of diseases. The FCA approach has several advantages over the network analysis approach: i) FCA allows the representation of relationships among several diseases; ii) it provides results in algebraic form, which allows relationships among concepts to be considered; and iii) additional gene annotations can be added to refine concepts, which assists in identifying functional gene relationships within disease groups. FCA was applied to a renal disease dataset, where it found unexpected and promising relationships among diseases, but it also has a few disadvantages. The main difficulty is that many of the formal concepts may not be useful, because only a few of them indicate relationships.
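A minimal Python sketch of the two derivation operators on a gene-disease context is given below. Starting from a set of diseases, it finds the genes they all share and then every disease associated with all of those genes; the resulting pair is a formal concept. The disease and gene names are placeholders, not data from keller12 .

```python
def shared_genes(associations, diseases):
    """Genes associated with every disease in a non-empty set of diseases."""
    if not diseases:
        raise ValueError("at least one disease is required")
    return set.intersection(*(set(associations[d]) for d in diseases))


def diseases_with(associations, genes):
    """Diseases that are associated with every gene in `genes`."""
    return {d for d, gs in associations.items() if genes <= gs}


def concept_from(associations, diseases):
    """Close a set of diseases into a formal concept (diseases, genes)."""
    genes = shared_genes(associations, diseases)
    return diseases_with(associations, genes), genes


# Placeholder gene-disease associations.
associations = {
    "disease_A": {"GENE1", "GENE2", "GENE3"},
    "disease_B": {"GENE2", "GENE3"},
    "disease_C": {"GENE2", "GENE3", "GENE4"},
    "disease_D": {"GENE5"},
}
extent, intent = concept_from(associations, {"disease_A", "disease_B"})
print(sorted(extent), sorted(intent))
```

Here the closure adds disease_C to the starting pair, which is the kind of relationship among several diseases that a pairwise network edge would not show directly.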

3.5 Identifying genes of gene regulatory networks

Gene regulatory networks (GRNs) are biological networks that describe the interactions among a set of genes in the form of a graph, where nodes represent genes and edges define their regulatory interactions. Understanding GRNs helps in understanding the interactions among genes and the biological and environmental effects on them, and in identifying target genes for drugs against diseases. GRNs have proved to be a very useful tool for describing and explaining complex dependencies between key developmental transcription factors (TFs), their target genes and regulators raza12 ; raza13 . To understand a gene regulatory network better, it is necessary to know the set of genes that belong to it. Identifying this set of genes correctly is a challenging task, even for small subnetworks. In fact, only a few genes of a GRN are known and the rest are guessed based on experience or informed speculation gebert08 . Hence, it is better to rely on experimental data to support these guesses.

Gebert and collaborators gebert08 presented a new FCA-based method to detect unknown members of a GRN using time-series gene expression data. Suppose that $G$ is the set of all genes in an organism and $S \subseteq G$ is a set of seed genes. The goal is to find a subset of genes in $G \setminus S$ that interact strongly with the GRN defined by the set $S$. Let $R \subseteq G \times G$ be a relation containing the known interactions and $E$ a $|G| \times m$ matrix that consists of time-series gene expression profiles of length $m$. If a pair $(g_i, g_j) \in R$, then it is known that $g_i$ and $g_j$ interact with each other. The FCA-based approach proposed by Gebert et al. (2008) gebert08 has three main steps. The first is a preprocessing step that uses the relation $R$ to get an initial list of interesting genes. If interaction data are not available, this step is skipped and the entire gene set $G$ is taken as the initial list of genes. In the second step, a concept lattice is constructed from the gene expression data, which reduces the number of genes on the initial list. The last step computes probabilities for the correlation coefficients between the genes that result from the second step and the genes of $S$ in order to get a list of significant interactions.
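The final correlation step can be illustrated roughly as follows. The sketch computes the Pearson correlation between each candidate profile and each seed profile and keeps candidates whose strongest absolute correlation reaches a cut-off. The fixed cut-off is an assumption that stands in for the probability computation of gebert08 , and the profiles and names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def rank_candidates(profiles, seeds, candidates, min_abs_r):
    """Keep candidates strongly correlated with at least one seed gene."""
    hits = []
    for c in candidates:
        best = max(abs(pearson(profiles[c], profiles[s])) for s in seeds)
        if best >= min_abs_r:
            hits.append((c, best))
    return sorted(hits, key=lambda h: -h[1])


# Hypothetical time-series profiles of length 5.
profiles = {
    "seed1": [1, 2, 3, 4, 5],
    "cand1": [2, 4, 6, 8, 10],
    "cand2": [3, 1, 4, 1, 5],
}
ranked = rank_candidates(profiles, ["seed1"], ["cand1", "cand2"], min_abs_r=0.8)
print(ranked)
```

A significance test on the correlation coefficient would replace the fixed cut-off in a real analysis.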

4 Classification and prediction of enzymes, ligand and domain-domain identification

In FCA, classification and the study of relations are based on objects and various types of related attributes (binary, nominal, ordinal, etc.). FCA is therefore quite helpful for computational scientists who work on biological data and may wish to skip the internal details. With several advantages, including a strong mathematical basis, FCA serves several applications in the exploration of biological data: enzyme classification and the identification of important protein domains (including protein binding sites) and related drug molecules. FCA has also been reported to be useful in the integration of biological activity with chemical spaces. This list is not exhaustive; FCA has also been used to understand the structural classification of glasses bartel97 and in several other studies. In this section, we discuss some important applications of FCA to the classification of enzymes, the identification of binding sites, the discovery of ligands as drug molecules, and so on.

4.1 Enzyme classification

Enzymes are proteins that catalyse biological reactions, and they are named and classified according to the reaction they catalyse. For example, hydrolases are enzymes involved in reactions that add or remove water molecules. Although the sequences of most enzymes are available in numerous biological databases, predicting the function of an enzyme from its sequence is a tedious task, because small sequence differences can lead to different activities. Considering that new enzyme families may emerge, an effort was made to classify enzymes using FCA, including enzymes that do not belong to any known family coste14 . The authors comment that it is easier to predict the super-families of proteins than their families. In this study, the labelled and unlabelled enzyme sequences were the ‘objects’, whereas the ‘attributes’ represented enzyme blocks. Enzyme blocks are formed by sequential arrangements of amino acids that correspond to specific functions, such as a catalytic site or the lining residues of important pockets or binding sites. With this method of classification, more than half of the unlabelled sequences were found to be correctly classified.

Another attempt to classify proteins using FCA was made by Han and collaborators han07 . They proposed an FCA-based approach for protein classification that uses protein domain and Gene Ontology (GO) annotation information. Protein domains represent the evolutionary information that forms a protein, while Gene Ontology describes other properties of proteins, including protein structure, molecular interactions, etc. Han and collaborators han07 applied a tripartite lattice to capture the interrelations among proteins, domains and GO terms. With the help of the tripartite lattice, they classified proteins from their domain composition and the corresponding GO term descriptions. Using the tripartite lattice, they extracted concrete information about domains that co-occur in proteins, because such domains are more likely to exhibit common functions, as annotated by GO terms.

4.2 Binding site identification

The identification of protein binding sites (PBS) and ligand binding sites is vital to the study of protein-protein and protein-ligand interactions, respectively. This eventually helps medical science to identify better drugs or therapies for several important diseases. There are several ways to identify binding sites. Most commonly, a protein docking protocol identifies the binding site by forming a complex between one protein and another protein or a ligand (which is a drug in most cases).

Bresso et al. (2012) bresso12 highlight in their report that the majority of reported structure-based prediction methods for protein-protein interactions use physico-chemical properties, such as hydrophobicity and residue composition, as attributes, but lack a representation of the properties of the binding components (e.g., the accessible surface of a particular residue) or of the spatial relations between two components (residues). Considering these limitations and the flexibility of FCA, Bresso et al. (2012) bresso12 used available protein 3D structures to characterize PBS. In this approach, Inductive Logic Programming (ILP) was linked with FCA, which enabled the identification and discovery of distinct binding pockets of protein-protein interactions.

4.3 Discovering Ligand from database as a drug molecule

Several attempts have been made to use FCA to identify suitable ligands from chemical databases such as IUPHAR, ZINC and many more. The reports suggest that FCA helps in the identification of drug molecules. Drug molecules can be either agonists (activators) or antagonists (inhibitors); for a given protein, a drug molecule is likely to act as one or the other. Molecules that do not bind and do not produce any change are not considered drug molecules. In addition to their ADMETox properties, chemical molecules that follow Lipinski’s rule are considered suitable drugs.

A chemical compound has a number of physical properties (hydrogen bond donors and acceptors, rotatable bonds, topological surface area, molecular weight, XlogP) and chemical properties (absorption, distribution, metabolism, excretion, toxicity). Using these properties as attributes of the object ligand, which could possibly be a drug molecule, one can use FCA to identify and differentiate ligands from the bulk of chemical molecules in a database. For example, such an attempt was made by Sugiyama et al. in 2012 sugiyama12 . They used the physical features discussed above, including the number of Lipinski’s rules broken, as attributes in order to identify ligands from the IUPHAR database. They designed an algorithm, LIFT (LIgand Finding via Formal ConcepT Analysis), for semi-supervised multi-label classification from mixed-type data. The algorithm proved to be an effective and efficient classification system for identifying ligands from the training data. Fragment Formal Concept Analysis (FragFCA), introduced by Lounkine et al. (2008) lounkine08 , is able to identify selective hits in high-throughput screening datasets. In the design of FragFCA, combinations of molecular fragments are the ‘objects’ and their ‘attributes’ include compound activity and potency information.
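As a rough illustration of how such properties can be turned into binary attributes for a formal context, the Python sketch below counts violations of the four commonly stated rule-of-five criteria (molecular weight above 500, logP above 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The compounds, the attribute names and the rotatable-bond cut-off of 10 are hypothetical choices made for the example, and the code is not the LIFT algorithm.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four rule-of-five criteria a compound breaks."""
    checks = [mol_weight > 500, logp > 5, h_donors > 5, h_acceptors > 10]
    return sum(checks)


def ligand_attributes(compound):
    """Derive binary attributes for one compound (hypothetical scaling)."""
    violations = lipinski_violations(
        compound["mol_weight"], compound["logp"],
        compound["h_donors"], compound["h_acceptors"],
    )
    attributes = {"violations=%d" % violations}
    if violations == 0:
        attributes.add("lipinski_ok")
    if compound["rotatable_bonds"] > 10:  # assumed cut-off, for illustration
        attributes.add("flexible")
    return attributes


# Hypothetical compounds with made-up property values.
compounds = {
    "cmpd1": {"mol_weight": 350.0, "logp": 2.1, "h_donors": 2,
              "h_acceptors": 5, "rotatable_bonds": 4},
    "cmpd2": {"mol_weight": 720.0, "logp": 6.3, "h_donors": 6,
              "h_acceptors": 12, "rotatable_bonds": 14},
}
context = {name: ligand_attributes(c) for name, c in compounds.items()}
print(context)
```

The resulting dictionary is a small formal context (compounds × attributes) to which any concept enumeration method could then be applied.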

Drug identification can be made more effective if the attributes that classify the ligands are slightly updated so as to filter non-peptide molecules from the bulk of drug molecules. Peptide molecules have been found to have limited in vivo efficacy because of pharmacological constraints on solubility, stability and selectivity. Hence, for reliable and safer drug therapy, the discovery and optimisation of non-peptide inhibitors/drugs is necessary mugumbate13 . Moreover, a recent in silico identification of drug molecules for Cathepsin L (SmCL1) of Schistosoma mansoni, the organism responsible for the disease ‘schistosomiasis’, revealed that non-peptide molecules could be better drug molecules than peptide drug molecules zafar15 . A list of popular FCA-based software tools is shown in Table 1.

To conclude, FCA offers an excellent framework for dealing with a variety of problems. Before it is applied to biological data, however, minor optimisations and a thorough understanding of the domain are needed.

| S.No. | Tool Name | Description | Reference |
|---|---|---|---|
| 1 | ConExp | Java-based FCA analysis software tool used to generate the lattice diagram. | Serhiy (2000) serhiy00 |
| 2 | FcaStone | Tool for format conversion and command-line lattice generation. | Priss (2008) priss08 |
| 3 | Contextual Role Editor | FCA tool that works with the Eclipse modeling tool. | Mühle & Wende (2010) muhle10 |
| 4 | FcaBedrock | Tool for creating context files for Formal Concept Analysis. It can convert existing datasets in flat-file CSV or 3-column CSV to Burmeister (.cxt) or FIMI (.dat) context files. | Andrews & Orphanides (2010) andrews10 |
| 5 | Lattice Miner | FCA tool for the construction, manipulation and visualization of concept lattices. | Lahcen & Kwuida (2010) lahcen10 |
| 6 | CUBIST | Gene expression analysis tool that combines FCA with association rules and visual analytics. It provides filtering and grouping of large datasets, interactive exploration and various relevant statistics. | Melo et al. (2013) melo13 |
| 7 | Galicia | Galois lattice interactive constructor. | Galicia galicia |
| 8 | OpenFCA | Set of tools for performing FCA activities, including creation of contexts, visualization of lattices and attribute exploration. | Borza et al. (2010) borza10 |
| 9 | LIFT | LIgand Finding via Formal ConcepT Analysis (LIFT) for semi-supervised multi-label classification from mixed-type data. | Sugiyama et al. (2012) sugiyama12 |
| 10 | FragFCA | Fragment Formal Concept Analysis (FragFCA) identifies selective hits in high-throughput screening datasets. | Lounkine et al. (2008) lounkine08 |

Table 1: List of FCA-based software tools applied in the biological domain

5 Conclusions and Discussions

Biological data are growing at an unprecedented rate, fuelled by high-throughput technologies that produce high-quality data. When these data are analyzed properly, much useful knowledge hidden inside them can be discovered. Formal Concept Analysis (FCA) is a method based on lattice theory for the analysis of binary relational data, and it has found applications in many areas of bioinformatics and computational biology, besides other fields. In this chapter, we presented the current status of FCA for the analysis of, and knowledge discovery from, biological data, including gene expression discretization, gene co-expression mining, gene clustering, finding genes in gene regulatory networks, enzyme/protein classification, binding site classification, and so on. We also presented a brief list of FCA-based software tools applied in the biological domain and covered some of the challenges faced so far.

Bibliography

  • (1) Wille R (1982). Restructuring lattice theory: an approach based on hierarchies of concepts. In I. Rival (Ed.). Ordered sets. Reidel. Dordrecht-Boston, 445–470
  • (2) Babu MM (1997) Biological databases and protein sequence analysis. Center for Biotechnology, Anna University, Chennai
  • (3) Kraulis P (2001) Databases in bioinformatics. Available on:
    http://www.avatar.se/lectures/strbio2001/databases/index.html Accessed on May 20, 2015
  • (4) Rioult F, Boulicaut J-F, Cremilleux B, Bessoner J (2003) Using transposition for pattern discovery from microarray data. In: Proceedings of DMKD, 73–79.
  • (5) Rioult F, Robardet C, Blachon S, Cremilleux B, Olivier G, and Boulicaut J-F (2003) Mining concepts from large SAGE gene expression matrices. In: Proceedings of KDID03 co-located with ECML-PKDD 2003, Cavtat-Dubrovnik (Croatia), 107–118.
  • (6) Kaytoue-Uberall M, Duplessis S, Kuznetsov SO, Napoli A (2009) Two FCA-based methods for mining gene expression data. In Sébastien Ferré and Sebastian Rudolph, editors, ICFCA, Lecture Notes in Computer Science, Springer 5548:251–266.
  • (7) Kayala MA, Baldi P (2012) Cyber-T web server: differential analysis of high-throughput data. Nucleic acids research, 40(W1):W553–W559.
  • (8) Creighton C, Hanash S (2003) Mining gene expression databases for association rules. Bioinformatics 19(1):79-86.
  • (9) Kaytoue-Uberall M, Duplessis S, Napoli A (2008) Using formal concept analysis for the extraction of groups of co-expressed genes. In Modelling, Computation and Optimization in Information Systems and Management Sciences 439–449, Springer Berlin Heidelberg.
  • (10) Amin I, Kassim SK, Hassanien A, Hefny HA (2012) Using formal concept analysis for mining hyomethylated genes among breast cancer tumors subtypes. In IEEE 12th International Conference on Intelligent Systems Design and Applications (ISDA) 764–769.
  • (11) Yevtushenko SA (2000) System of data analysis Concept Explorer (In Russian). In Proceedings of the 7th national conference on Artificial Intelligence KII-2000 Russia, 127–134.
  • (12) Raza K (2014) Clustering analysis of cancerous microarray data. Journal of Chemical and Pharmaceutical Research, 6(9):488–493.
  • (13) Choi V, Huang Y, Lam V, Potter D, Laubenbacher R, Duca K (2008) Using formal concept analysis for microarray data comparison. Journal of bioinformatics and computational biology, 6(01):65–75
  • (14) Messai N, Devignes MD, Napoli A, Smail-Tabbone M (2008) Many-valued concept lattices for conceptual clustering and information retrieval. In ECAI 178:127–131
  • (15) Melo C, Aufaure MA, Orphanides C, Andrews S, McLeod K, Burger A (2013) A conceptual approach to gene expression analysis enhanced by visual analytics. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, ACM, 1314–1319
  • (16) Hristoskova A, Boeva V, Tsiporkova E (2014) A formal concept analysis approach to consensus clustering of multi-experiment expression data. BMC Bioinformatics 15(151):1–16
  • (17) Benabderrahmane S (2014) Formal concept analysis and knowledge integration for highlighting statistically enriched functions from microarrays data. International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2014, Granada, Spain, 1.
  • (18) Keller BJ, Eichinger F, Kretzler M (2012) Formal concept analysis of disease similarity. AMIA Summits on Translational Science Proceedings, 42–51
  • (19) Bhavnani SK, Eichinger F, Martini S, Saxman P, Jagadish HV, Kretzler M (2009) Network analysis of genes regulated in renal diseases: implications for a molecular-based classification. BMC Bioinformatics 10(Suppl 9):S3
  • (20) Raza K, Parveen R (2012) Soft computing approach for modeling genetic regulatory networks. Advances in Computing and Information Technology, Springer-Verlag, 178:1–12, doi: 10.1007/978-3-642-31600-5_1
  • (21) Raza K, Jaiswal R (2013) Reconstruction and analysis of cancer-specific gene regulatory networks from gene expression profiles. International Journal on Bioinformatics & Biosciences 3(2):25–34, doi: 10.5121/ijbb.2013.3203
  • (22) Gebert J, Motameny S, Faigle U, Forst CV, Schrader R (2008) Identifying genes of gene regulatory networks using formal concept analysis. Journal of Computational Biology 15(2):185–194
  • (23) Bartel HG, Nofz M (1997) Exploration of NMR data of glasses by means of formal concept analysis. Chemometrics and Intelligent Laboratory Systems, 36(1):53–63.
  • (24) Coste F, Garet G, Groisillier A, Nicolas J, Tonon T (2014) Automated enzyme classification by formal concept analysis. In Formal Concept Analysis, Springer International Publishing, 235–250.
  • (25) Han M-R, Chung H-J, Kim J, Noh D-Y, Kim JH (2007) Protein classification from protein-domain and gene-ontology annotation information using formal concept analysis. Y. Shi et al. (Eds.): ICCS 2007, Part II, Springer-Verlag Berlin Heidelberg, LNCS 4488:347–354.
  • (26) Bresso E, Grisoni R, Devignes MD, Napoli A, Tabbone M (2012) Formal concept analysis for the interpretation of relational learning applied on 3D protein-binding sites. In Proceedings of the 4th International Conference on Knowledge Discovery and Information Retrieval - KDIR 2012, Barcelona, Spain, 12.
  • (27) Sugiyama M, Imajo K, Otaki K, Yamamoto A (2012) Semi-supervised ligand finding using formal concept analysis. IPSJ Transactions on Mathematical Modeling and Its Applications 5(2):39–48, doi: 10.2197/ipsjtrans.5.114
  • (28) Mugumbate G, Newton AS, Rosenthal PJ, Gut J, Moreira R, Chibale K, et al. (2013) Novel anti-plasmodial hits identified by virtual screening of the ZINC database. J Comput Aided Mol Des. 27:859–71, doi: 10.1007/s10822-013-9685-z, PMID: 24158745
  • (29) Lounkine E, Auer J, Bajorath J (2008) Formal concept analysis for the identification of molecular fragment combinations specific for active and highly potent compounds. J. Med. Chem. 51:5342–5348, doi: 10.1021/jm800515r
  • (30) Zafar A, Ahmad S, Rizvi A, Ahmad M (2015) Novel non-peptide inhibitors against SmCL1 of schistosoma mansoni: In silico elucidation, implications and evaluation via knowledge based drug discovery. PLoS ONE 10(5):e0123996, doi:10.1371/journal.pone.0123996
  • (31) Priss U (2008) FcaStone - FCA file format conversion and interoperability software. Conceptual Structures Tool Interoperability Workshop (CS-TIW 2008).
  • (32) Mühle H, Wende C (2010) Describing role-oriented software models in terms of formal concept analysis. In Proceedings of the 8th International Conference on Formal Concept Analysis, 241–254. Springer.
  • (33) Galicia: Galois lattice interactive constructor. http://www.iro.umontreal.ca/galicia/features.html, Accessed on May 15, 2015.
  • (34) Lahcen B, Kwuida L (2010) Lattice Miner: A tool for concept lattice construction and exploration. In Supplementary Proceedings of the International Conference on Formal Concept Analysis (ICFCA’10).
  • (35) Andrews S, Orphanides C (2010) FcaBedrock, a formal context creator. In Conceptual Structures: From Information to Intelligence, Springer Berlin Heidelberg, 181–184.
  • (36) Borza PV, Sabou O, Sacarea C (2010) OpenFCA, an open source formal concept analysis toolbox. In Automation Quality and Testing Robotics (AQTR), 2010 IEEE International Conference on, 3:1–5.