In microarray research, clustering and biclustering techniques are commonly used to find groups of genes that exhibit similar expression. However, these techniques become inefficient when time is a factor that affects the behavior of the expression profiles. Such longitudinal experiments are gaining interest in areas of molecular biology where the evaluation of time is essential. For example, the cell cycle, the evolution of diseases and development at the molecular level are all time-dependent processes. Hence, triclustering is a valuable mechanism, as it allows expression profiles to be evaluated under a subset of conditions and a subset of time points simultaneously.
A coherent tricluster is defined as a set of genes that follows either coherent values or coherent behaviors. Such clusters may carry useful information for identifying significant phenotypes, or potential genes related to those phenotypes and their regulatory relations. Triclustering is computationally more expensive than biclustering (which is already NP-hard), so heuristic-based algorithms are a suitable choice for triclustering.
Genetic Algorithms (GAs) are search algorithms motivated by the principles of genetics and natural selection. A GA typically goes through phases such as reproduction, mutation, fitness evaluation and selection. Sequential GAs are effective in many applications and domains. However, they have drawbacks when applied to problems like triclustering. For example, fitness evaluation in sequential GAs is usually very time-consuming. Sequential GAs may also get trapped in a sub-optimal region of the search space and thus fail to find better-quality solutions. Parallel GAs (PGAs) therefore appear to be a better alternative to traditional sequential GAs. Parallel GAs with static subpopulations and migration are characterized by multiple demes together with a migration operator. Coarse-grained parallel genetic algorithms (CgPGA) follow this subpopulation model, with a fairly small number of demes that each contain many individuals. Coarse-grained parallel GAs are often treated as distributed GAs, since they are generally implemented on distributed-memory MIMD computers. This approach can also be configured well for heterogeneous networks.
In this paper, an algorithm based on the coarse-grained parallel genetic algorithm (CgPGA) approach is proposed. The algorithm finds groups of genes with similar patterns in a three-dimensional space, where genes, conditions and time are taken into consideration.
The rest of this paper is organized as follows. A review of the literature is presented in Section 2. The proposed methodology, along with the details of the fitness function and the genetic operators used, is described in Section 3. The simulation results and their GO term validation are discussed in Section 4. Finally, Section 5 presents the summary and the research findings of the proposed scheme and prospects for future work.
2 Related Work
Zhao and Zaki introduced the triCluster algorithm in 2005. In that work, patterns are discovered in three-dimensional (3D) gene expression data, and a set of metrics is provided for measuring their quality. A later approach finds coherent triclusters that capture the regulatory relationships among genes, and was subsequently extended to extract time-delayed clusters.
LagMiner introduced a new technique to detect time-lagged 3D clusters. Evolutionary computation, in the form of a multi-objective algorithm, has also been employed in the search for triclusters. Bhar et al. presented the δ-TRIMAX algorithm in 2012. In 2013, the same authors applied δ-TRIMAX to estrogen-induced breast cancer cell datasets, which provided insights into breast cancer prognosis. David et al. presented a novel triclustering algorithm called TriGen in 2013. The novelty of TriGen lies in its use of a genetic approach to mine three-dimensional gene expression microarray data. In 2015, Ayangleima et al. applied a coarse-grained parallel genetic algorithm (CgPGA) with migration to mine biclusters in gene expression microarray data. In 2016, Kakati et al. presented a fast gene expression analysis that uses distributed triclustering and parallel biclustering. In that work, the initial biclusters are found with a parallel (shared-memory) approach, and the triclusters are then extracted with a distributed (shared-nothing) approach. Premalatha et al. presented TrioCuckoo in 2016, which implements triclustering using the well-known cuckoo search technique.
3 Proposed Methodology
The proposed algorithm is evaluated on the standard yeast cell cycle dataset (Saccharomyces cerevisiae). The biological validation is then carried out with the GO Term Finder tool (Version 0.83) to obtain the functional annotations of the genes in the output triclusters.
3.1 Encoding of individuals
Every individual in the population encodes a tricluster. Triclusters are represented as binary strings of length G+C+T, where G is the number of genes (rows), C the number of conditions (columns) and T the number of time points (height) of the 3D expression matrix. A bit set to 1 indicates that the corresponding row, column or height belongs to the tricluster.
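The encoding can be illustrated with a minimal Python sketch. The function names `encode` and `decode` and the use of a character string for the bits are assumptions made for illustration; the text only fixes the G+C+T layout.

```python
from typing import List, Tuple


def decode(individual: str, n_genes: int, n_conds: int,
           n_times: int) -> Tuple[List[int], List[int], List[int]]:
    """Split a G+C+T bit string into selected gene, condition and time indices."""
    if len(individual) != n_genes + n_conds + n_times:
        raise ValueError("individual length must equal G + C + T")
    gene_bits = individual[:n_genes]
    cond_bits = individual[n_genes:n_genes + n_conds]
    time_bits = individual[n_genes + n_conds:]

    def selected(bits: str) -> List[int]:
        # A bit equal to '1' means the coordinate belongs to the tricluster.
        return [i for i, b in enumerate(bits) if b == "1"]

    return selected(gene_bits), selected(cond_bits), selected(time_bits)


def encode(genes: List[int], conds: List[int], times: List[int],
           n_genes: int, n_conds: int, n_times: int) -> str:
    """Build the G+C+T bit string from selected coordinate indices."""
    def bits(chosen: List[int], size: int) -> str:
        members = set(chosen)
        return "".join("1" if i in members else "0" for i in range(size))

    return bits(genes, n_genes) + bits(conds, n_conds) + bits(times, n_times)
```

For example, with G=5, C=3 and T=4, the string `101101010110` selects genes 0, 2 and 3, conditions 0 and 2, and time points 1 and 2.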
3.2 Fitness Function
A fitness function is used to select the best candidates. It is based on a three-dimensional extension of the mean square residue (MSR) measure, which has long been an effective biclustering measure for gene expression analysis. This three-dimensional measure is used from here onwards. It is a minimization function, so smaller values indicate better results.
The function is defined for every tricluster (TC) and involves a weights term and a distinction term.
The weights term is defined as:
Here, the three weights apply to the number of genes, conditions and times in a tricluster solution, respectively. High weight values are favorable.
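Since the formula itself is not reproduced here, the following is only a sketch of one plausible form of the weights term: a weighted sum of the three tricluster dimensions. The function name and the parameter names `w_g`, `w_c` and `w_t` are hypothetical.

```python
def weights_term(n_genes: int, n_conds: int, n_times: int,
                 w_g: float, w_c: float, w_t: float) -> float:
    """Assumed form: weighted sum of the tricluster's dimension sizes.

    w_g, w_c and w_t are illustrative names for the gene, condition
    and time weights; the exact formula may differ.
    """
    return n_genes * w_g + n_conds * w_c + n_times * w_t
```

Under this assumption, a larger weight on one dimension rewards triclusters that include more coordinates in that dimension.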
The distinction term is defined as:
Here, (Co-ord Distinction no. of g), (Co-ord Distinction no. of c) and (Co-ord Distinction no. of t) are, respectively, the numbers of gene, condition and time coordinates of the tricluster being evaluated that are absent from the triclusters found previously, and the three distinction weights apply to the genes, conditions and times, respectively. Distinction measures the uniqueness of the tricluster currently being evaluated. Higher distinction values favor solutions that do not overlap with previously found results.
The remaining quantities in these definitions are the subsets of gene, condition and time coordinates of the tricluster, the numbers of time, condition and gene coordinates of the tricluster, and the expression value of gene g under condition c at time t from the expression matrix.
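To make the residue idea concrete, the sketch below computes one common three-dimensional generalization of MSR: the mean of squared three-way interaction residues over the tricluster. This is an assumption about the form of the measure, not necessarily the exact formula used here. The function name `msr3d` and the `data[g][c][t]` nested-list layout are illustrative choices.

```python
from itertools import product
from statistics import fmean


def msr3d(data, genes, conds, times):
    """Mean squared three-way residue of the sub-cube genes x conds x times.

    data[g][c][t] is the expression value of gene g under condition c
    at time t. A perfectly additive pattern yields a residue of zero.
    """
    def mean_over(gs, cs, ts):
        return fmean(data[g][c][t] for g, c, t in product(gs, cs, ts))

    m_all = mean_over(genes, conds, times)
    # Means with one coordinate fixed.
    m_g = {g: mean_over([g], conds, times) for g in genes}
    m_c = {c: mean_over(genes, [c], times) for c in conds}
    m_t = {t: mean_over(genes, conds, [t]) for t in times}
    # Means with two coordinates fixed.
    m_gc = {(g, c): mean_over([g], [c], times) for g in genes for c in conds}
    m_gt = {(g, t): mean_over([g], conds, [t]) for g in genes for t in times}
    m_ct = {(c, t): mean_over(genes, [c], [t]) for c in conds for t in times}

    total = 0.0
    for g, c, t in product(genes, conds, times):
        residue = (data[g][c][t]
                   - m_gc[g, c] - m_gt[g, t] - m_ct[c, t]
                   + m_g[g] + m_c[c] + m_t[t]
                   - m_all)
        total += residue * residue
    return total / (len(genes) * len(conds) * len(times))
```

With this definition, a sub-cube whose values are the sum of a gene effect, a condition effect and a time effect scores zero, and any deviation from that additive pattern raises the score, which matches the "lower is better" reading of the fitness function.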
Tri-CgPGA is based on coarse-grained genetic algorithms, which belong to the parallel genetic algorithm family. Like other coarse-grained algorithms, it executes in several steps, which are illustrated in the flowchart and pseudo-code below.
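As a rough illustration of the coarse-grained model (not the authors' pseudo-code), the sketch below evolves several demes independently and periodically migrates the best individual of each deme to its neighbour in a ring. All names, the ring topology, the operator choices and the default parameter values are assumptions. For simplicity the demes are evolved one after another in a single process; in a real coarse-grained PGA each deme would run on its own processor.

```python
import random


def evolve_deme(deme, fitness, rng, p_cross=0.8, p_mut=0.05):
    """One generation in a single deme (fitness is minimized)."""
    children = [min(deme, key=fitness)[:]]  # elitism: keep the best
    while len(children) < len(deme):
        # Binary tournament selection of two parents.
        p1 = min(rng.sample(deme, 2), key=fitness)
        p2 = min(rng.sample(deme, 2), key=fitness)
        if rng.random() < p_cross:
            cut = rng.randrange(1, len(p1))  # one-point crossover
            child = p1[:cut] + p2[cut:]
        else:
            child = p1[:]
        # Bit-flip mutation.
        child = [b ^ 1 if rng.random() < p_mut else b for b in child]
        children.append(child)
    return children


def migrate(demes, fitness):
    """Ring migration: each deme's best replaces the next deme's worst."""
    bests = [min(deme, key=fitness) for deme in demes]
    for i, deme in enumerate(demes):
        worst = max(range(len(deme)), key=lambda j: fitness(deme[j]))
        deme[worst] = bests[i - 1][:]


def coarse_grained_ga(fitness, n_bits, n_demes=4, deme_size=20,
                      generations=50, migration_interval=5, seed=0):
    """Return the best bit list found across all demes."""
    rng = random.Random(seed)
    demes = [[[rng.randint(0, 1) for _ in range(n_bits)]
              for _ in range(deme_size)] for _ in range(n_demes)]
    for gen in range(1, generations + 1):
        demes = [evolve_deme(deme, fitness, rng) for deme in demes]
        if gen % migration_interval == 0:
            migrate(demes, fitness)
    return min((ind for deme in demes for ind in deme), key=fitness)
```

In a triclustering setting, `fitness` would score a G+C+T bit list as a tricluster; any function that maps a list of bits to a number to be minimized can be plugged in.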
4 Experimental results and discussions
All computational simulations are performed on a multiprocessor machine with 4 Intel Core i7 3.60 GHz processors, 4 GB of RAM and the 64-bit Windows 8.1 operating system. The yeast cell cycle dataset (Saccharomyces cerevisiae) is used to establish the efficacy of the proposed algorithm. This dataset contains 6179 genes, 4 conditions and 14 time points. The experiment is performed on this dataset and on two synthetic versions of it, but only the results for the real dataset are reported.
4.1 List of the Parameters
Several parameters are set before execution: the crossover probability, the mutation probability, the weights for genes, conditions and times, and the distinction weights for genes, conditions and times. Their values are listed in Table 1. As the algorithm is designed for gene filtration (to obtain the solution with a minimum number of genes), the weight for genes is set to 0.8 so that the maximum number of genes can participate in the solution. For the distinction term, a higher weight is assigned to the genes so as to cover as much of the space as possible in this dimension.
4.2 Results on Yeast Dataset
The simulation results are analyzed across different numbers of generations. The analysis indicates that as the number of generations increases, the values also increase. Thus, for larger numbers of generations, better homogeneity among the genes is obtained, as presented in the following graphs.
4.3 Comparative Study
The results are promising in terms of execution time and the volume of the output triclusters. As the fitness function is minimized, the lower the MSR value, the better the fitness of the tricluster. The results of the Tri-CgPGA algorithm are further compared with those obtained by the TriGen algorithm. The comparison is based on the computational time taken to execute the code and produce the output. Tri-CgPGA took approximately 30 seconds to run on 1000 genes for 50 generations, whereas TriGen requires 118 seconds for the same task, which makes Tri-CgPGA roughly four times faster. Hence, exploiting parallelism with the genetic approach for triclustering gene expression microarray data is preferable to traditional GAs, as it reduces the computation time. Other relevant results of the Tri-CgPGA algorithm are presented in Table 2.
|GENE SIZE|AVG. MSR|AVG. VOLUME|AVG. NO. OF GENES|AVG. NO. OF CONDITIONS|AVG. NO. OF TIME POINTS|
4.4 GO Term Analysis
The results are validated with the Gene Ontology (GO) project. GO provides an ontology of terms for describing gene product annotation data and its characteristics. The ontology covers attributes such as molecular function, cellular component and the relevant biological processes. The queries for the genes in the output triclusters are submitted to GO using the GO Term Finder (Version 0.83). The findings of the GO term analysis are presented in Table 3.
This work proposes Tri-CgPGA, a new framework based on the coarse-grained parallel genetic algorithm (CgPGA) approach, to generate triclusters from gene expression data. The results of the framework are compared with those of another state-of-the-art technique, the TriGen algorithm. The comparison shows that the proposed scheme is more efficient than the existing scheme in terms of computation time, so parallel GAs are preferable to traditional GAs for triclustering 3D gene expression microarray data. Several future directions might further improve this framework: (1) acquiring large-scale data from other standard datasets to measure the performance of the framework; (2) investigating other evaluation measures, with the suggested or other existing versions of PGAs, to further improve coherence and computation time and to obtain more meaningful triclusters.
- (2004) Analyzing time series gene expression data. Bioinformatics 20 (16), pp. 2493–2503.
- (2012) δ-TRIMAX: extracting triclusters and analysing coregulation in time series gene expression data. In International Workshop on Algorithms in Bioinformatics, pp. 165–177.
- (2013) Coexpression and coregulation analysis of time-series gene expression data in estrogen-induced breast cancer cell. Algorithms for Molecular Biology 8 (1), pp. 9.
- (2004) GO::TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20 (18), pp. 3710–3715.
- (2000) Biclustering of expression data. In ISMB, Vol. 8, pp. 93–103.
- (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research 32 (suppl_1), pp. D258–D261.
- Pattern recognition in biological time series. Conference of the Spanish Association for Artificial Intelligence, pp. 164–172.
- (2014) TriGen: a genetic algorithm to mine triclusters in temporal gene expression data. Neurocomputing 132, pp. 42–53.
- (1972) Direct clustering of a data matrix. Journal of the American Statistical Association 67 (337), pp. 123–129.
- Genetic algorithms in search, optimization and machine learning. Massachusetts: Addison-Wesley.
- (2016) A fast gene expression analysis using parallel biclustering and distributed triclustering approach. In Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, pp. 122.
- (2015) Bi-clustering of gene expression microarray using coarse grained parallel genetic algorithm (CgPGA) with migration. In India Conference (INDICON), 2015 Annual IEEE, pp. 1–6.
- (2008) Multi-objective evolutionary algorithm for mining 3D clusters in gene-sample-time microarray data. In Granular Computing, 2008. GrC 2008. IEEE International Conference on, pp. 442–447.
- (2008) In Eighth International Conference on Hybrid Intelligent Systems, pp. 831–836.
- (1998) Comprehensive identification of cell cycle–regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9 (12), pp. 3273–3297.
- (2018) TrioCuckoo: a multi objective cuckoo search algorithm for triclustering microarray gene expression data. Journal of Information Science and Engineering 34 (6), pp. 1617–1631.
- (2012) Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm. BMC Bioinformatics 13 (1), pp. 54.
- (2010) Efficiently mining time-delayed gene expression patterns. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 40 (2), pp. 400–411.
- (2009) Finding time-lagged 3D clusters. In Data Engineering, 2009. ICDE’09. IEEE 25th International Conference on, pp. 445–456.
- (2007) Mining time-shifting co-regulation patterns from gene expression data. In Advances in Data and Web Management, pp. 62–73.
- (2005) Tricluster: an effective algorithm for mining coherent clusters in 3D microarray data. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 694–705.