Metamorphic Testing for Quality Assurance of Protein Function Prediction Tools

04/16/2019 ∙ by Morteza Pourreza Shahri, et al. ∙ Montana State University

Proteins are the workhorses of life, and gaining insight into their functions is of paramount importance for applications such as drug design. However, the experimental validation of protein functions is highly resource-consuming. Therefore, automated protein function prediction (AFP) using machine learning has recently gained significant interest. Many of these AFP tools are based on supervised learning models trained using existing gold-standard functional annotations, which are known to be incomplete. The main challenge associated with conducting systematic testing on AFP software is the lack of a test oracle, which determines whether a test case passes or fails; unfortunately, due to the incompleteness of gold-standard data, the exact expected outcomes are not well defined for the AFP task. Thus, AFP tools face the oracle problem. In this work, we use metamorphic testing (MT) to test nine state-of-the-art AFP tools by defining a set of metamorphic relations (MRs) that apply input transformations to protein sequences. Our results show that several AFP tools fail all the test cases, raising concerns over the quality of their predictions.




I Introduction

I-A Proteins and their functions

Proteins are one of the main components of a living body that are important due to various vital functions they perform in living cells. Basically, a cell is alive because of the functions of proteins. While our genes encode protein sequences, proteins determine all other aspects of cell function including metabolism, structure, transport, signaling, immune defense, cell division and cell death. Disease processes associated with hereditary genetic defects ultimately are due to dysfunctions in the proteins that the genes encode.

Various forms of Alzheimer's, Huntington's, Parkinson's, cystic fibrosis and hemophilia are all well-known examples of protein misproduction caused by errors in the underlying genetic code [1, 2, 3, 4, 5]. Lesser-known examples include errors in the BRCA1 and BRCA2 genes, which are known to increase a person's risk of developing breast cancer, and errors in the code of Msh2, which increase the risk of developing colon and endometrial cancers [6, 7]. Whilst all of the aforementioned diseases are vastly different in their epidemiological background, one element they all have in common is a disruption of a protein's ability to correctly perform its function.

Gene Ontology (GO) is a framework used for describing protein functions [8]. Gene Ontology is composed of different classes (or terms), each of which demonstrates a single function, and the hierarchical relations between the classes. Within the GO term hierarchy, child terms are more specialized than their parent terms, e.g., tyrosine metabolic process is a child term of metabolic process. In addition, GO relations can be is-a relations, part-of relations, etc. For example, the protein BRCA1_HUMAN has a list of functions such as androgen receptor binding (GO:0050681), damaged DNA binding (GO:0003684), etc.

Gene Ontology is composed of three sub-ontologies: the molecular function (MF) ontology, which describes various molecular activities; the biological process (BP) ontology, which describes various processes that a protein may be involved with; and the cellular component (CC) ontology, which describes the localization of proteins. The official Gene Ontology website maintains not only the ontology but also the annotations using the ontology (the gold-standard functional annotations) for a large collection of proteins from many different organisms [8]. Many of these annotations are experimentally validated through wet-lab assays. These annotations follow the “true path rule”, which means that annotations to a certain term imply annotations to all of its ancestors [9].
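The true path rule can be illustrated with a minimal sketch. The toy hierarchy below is an assumption made for illustration only: it reuses the parent-child example given earlier and adds a hypothetical root, and it is not the real GO graph. The function name `propagate` is likewise illustrative.

```python
# Toy fragment of a term hierarchy: child term -> set of parent terms.
# Illustrative only; the real GO graph is far larger and has several relation types.
PARENTS = {
    "tyrosine metabolic process": {"metabolic process"},
    "metabolic process": {"biological_process"},
    "biological_process": set(),
}


def propagate(terms, parents):
    """Return the given terms plus all of their ancestors (true path rule)."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            # Visit every parent so that all ancestors end up in the result.
            stack.extend(parents.get(term, ()))
    return closed


annotations = propagate({"tyrosine metabolic process"}, PARENTS)
```

Under this sketch, annotating a protein with the most specific term implies annotations to every term on the path up to the root.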

However, how biologists identify such functions has been drastically altered over the last decade, thanks to the next generation sequencing revolution. Following the completion of the human genome project, DNA sequencing technology has developed at such a rate that it has far surpassed Moore's law [10]. The most striking example of this is the cost of sequencing a human genome. Fifteen years ago, the completion of the human genome project was announced. This project was a large international collaboration which took 13 years and $2.7 billion to complete [11]. In a clinical setting today, the cost of whole genome sequencing has been reported to be between approximately $1,906 and $24,810, and a genome can be sequenced and assembled in a matter of weeks [12]. Based on the genetic code, protein sequences can be directly inferred from gene sequences. The exponential growth of available gene and protein sequence data presents a whole new suite of challenges to today’s biologists, rendering the gold-standard Gene Ontology annotations incomplete. It is reported that only a small percentage of known proteins have experimentally validated annotations, and many of those annotations are considered incomplete [13]. This has highlighted the need for high-throughput approaches for functional annotation and has consequently fostered collaborations between a range of disciplines, most notable of which is computer science.

I-B Automated Protein Function Prediction (AFP)

Through the development of numerous algorithms and tools, collaborations with bioinformaticians and computational biologists have altered the way and speed in which biologists can make sense of the deluge of genomic data. One area that has benefited significantly from such developments is automated protein function prediction (AFP). As mentioned above, previous routes of ascertaining protein function required extensive wet-lab investigations, often only focusing on one protein at a time and could be considered low throughput [14]. Whilst such experiments are still required for validation, computational protein function prediction tools have significantly changed the way biologists conduct protein function investigations. These high throughput approaches have been essential for modeling the impact that errors in the genetic code have upon the function of proteins and how this impacts the health of an organism.

Automated function prediction tools typically take a protein sequence as their input and output a set of predicted GO terms corresponding to their functional categories. These protein sequences are stored in the text-based FASTA format, where the protein sequence is preceded by a description line identified by the “>” symbol.
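As a minimal sketch of the FASTA layout described above, the following parser splits a text into (description, sequence) pairs. The record names and residues in `EXAMPLE` are made up for illustration and do not correspond to real proteins; `parse_fasta` is a hypothetical helper, not part of any AFP tool.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical records, for illustration only.
EXAMPLE = """>toy_protein_1 hypothetical example record
MKTAYIAKQR
QISFVKSHFS
>toy_protein_2 another hypothetical record
MLLAVLYCLL
"""

records = parse_fasta(EXAMPLE)
```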

These tools typically make their predictions using various techniques, such as sequence matching, which employs sequence alignment to extract the functions of similar proteins, as well as protein structure-based methods, genomic context-based methods, phylogenomics-based methods, protein-protein interaction-based methods, data integration methods, and text mining-based methods [14].

Sequence-based methods match a large collection of sequences with a target protein sequence, and using this comparison they determine whether the sequences under comparison share a common ancestor. A subgroup of sequence-based methods does not predict protein functions directly; instead, these methods extract features from protein sequences that other machine learning methods can use [15, 16, 17, 18].

Protein structure-based methods look for similarity between two given protein structures, which allows functional annotations to be transferred between proteins. The similarity can be detected using the entire structures or only parts of them [19, 20].

Genomic context-based methods rely on the knowledge that the location of the gene which is encoding a query protein is prominent information that can be used for function prediction [21, 22]. Evolutionary relationships are also exploited between organisms to find functional similarities between genes in phylogenomics-based methods [23, 24].

Interaction-based methods utilize protein-protein interaction (PPI) networks in which PPI data is represented as vertices (proteins) and edges (direct bindings). These interactions can be utilized to find functional relationships, and to achieve this goal, graph-theoretic methods and algorithms can be employed to predict functions of proteins [25, 26].

The data integration-based methods are mostly based on machine learning in which features generated from different biological sources are combined and used for training a machine learning model [27, 28].

Text mining-based methods have been employed for the analysis of biomedical literature for the problem of protein function prediction with the idea that the large amount of information in the literature can link proteins with each other. Therefore, they can be utilized to increase the size of labeled data for the task of training and evaluation [29, 30].

Critical Assessment of protein Function Annotation (CAFA) is a community-wide large-scale evaluation of AFP tools organized by the Function Special Interest Group. At the time of writing this paper, CAFA2 was the latest challenge whose evaluation results were publicly available [31]. Many tools were presented in CAFA2, and they were evaluated using different criteria such as macro-AUROC and F-max, among others [31].

I-C Quality Assurance of AFP tools

Despite the plethora of AFP tools and comprehensive CAFA evaluation results, selecting a tool from this list of top performing AFP tools to perform experiments or research would be very challenging, as described below. One way to select a tool is to randomly pick a few tools, feed well-known protein sequences into them, and compare the outputs with the experimentally validated GO terms, which are the results of a physical characterization of a gene product that has supported its association with the GO term.

However, using a few tools and sequences in the above specified manner, users would observe that each tool provides a different set of output GO terms, and only a few terms would be in common with the experimentally validated terms. Fig. 1 shows the distribution of predicted GO terms using three randomly selected top performing CAFA2 tools in comparison with the corresponding experimentally validated terms of the well-known protein Tyrosinase, an enzyme that hydroxylates tyrosine as the first step in melanin synthesis. The distribution of GO terms in both the Molecular Function and Biological Process ontologies shows that only one of the predicted GO terms is in common between the three tools and the experimentally validated terms. The next important observation on the Biological Process ontology is that one of the tools returns 1199 GO terms, whereas another tool outputs only four GO terms for the same protein. Moreover, as mentioned above, the set of experimentally validated terms is incomplete. These observations show why it would be challenging for a biologist to select a tool for their research, as well as for a developer to test the tools that they develop.

Yet, protein function prediction tools form an essential part of the vast majority of protein function investigations. Designed to complement rather than replace experimental analysis, these tools are often employed to direct the focus on experimental investigations. Failure of the tools to perform accurately could lead to lengthy, expensive and ultimately fruitless experimental investigations. Thus, it is essential to develop cost effective approaches for systematically testing AFP tools.

In this study, we apply metamorphic testing (MT) for the quality assurance of AFP tools. We develop novel metamorphic relations (MRs) using transformations to protein sequences that are typically used as inputs to AFP tools. We use these MRs to test nine top performing AFP tools from CAFA2. The results of this study show that several AFP tools fail all the test cases and at most two tools pass all the test cases. Therefore, this study has implications for both the developers and the users of these AFP tools.

Fig. 1: Distribution of output GO terms for the protein TYRO on Biological Process and Molecular Function ontologies

II Metamorphic Testing

In complex systems, such as AFP tools, it is practically difficult to determine whether the output provided by the system for a given input is correct. This is known as the oracle problem [32]. MT can be used to test programs that face the oracle problem [33]. The MT process involves deriving MRs and generating test cases based on those MRs. An MR is a relation derived from the specification of the program under test that specifies how the output should change according to a specific change made to the input. Source test cases are typically derived using a traditional test case generation approach such as random test generation. The follow-up test cases are typically derived by applying the transformations specified in the MR to the source test case and/or source outputs [34]. Then, the source and follow-up test cases are executed, and the outputs of these test cases are used to verify whether the MR was violated. The violation of an MR indicates faults in the program.

Fig. 2 shows how MT is applied to a sorting program. This sorting program arranges a random set of numbers provided as input in ascending order. An MR derived for the sorting program states that when the original set of numbers is shuffled and used as an input to the program, the output must be equal to the original output. In order to conduct MT on the sorting program using this MR, the source test case can be created by generating a set of random numbers, and the follow-up test case can be created by shuffling the source test case. A fault is detected in the sorting program if the outputs from the source and follow-up test cases are not equal as defined in the MR [34].

Fig. 2: MT example for sorting program
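The sorting example can be written down as a short sketch. The functions `sort_under_test` and `faulty_sort` are hypothetical stand-ins for a program under test (the second is deliberately broken), and the fixed seed is an assumption made only to keep the shuffle reproducible.

```python
import random


def sort_under_test(numbers):
    """Stand-in for a correct sorting program under test."""
    return sorted(numbers)


def faulty_sort(numbers):
    """Deliberately faulty stand-in: leaves the first element in place."""
    return list(numbers[:1]) + sorted(numbers[1:])


def shuffled(numbers, seed=0):
    """Follow-up input: a shuffled copy of the source input."""
    rng = random.Random(seed)
    out = list(numbers)
    rng.shuffle(out)
    return out


def permutation_mr_holds(program, source_input, permute=shuffled):
    """MR: permuting the input must not change the sorted output."""
    follow_up_input = permute(source_input)
    return program(source_input) == program(follow_up_input)


source = [5, 3, 9, 1]
ok = permutation_mr_holds(sort_under_test, source)
# Reversal is used here as a fixed, reproducible permutation.
violated = not permutation_mr_holds(faulty_sort, source, permute=lambda xs: xs[::-1])
```

Note that no expected sorted output is ever specified: the check only compares the source and follow-up outputs with each other, which is what makes MT usable without an oracle.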

III Metamorphic Testing for AFP Tools

The first step of applying MT for testing a given program is defining MRs. The most commonly used approach to define MRs is to look at the changes that can be made to the input and whether those changes would cause predictable changes in the output. However, defining MRs for AFP tools should be done with caution: we cannot make random changes to the input sequences, since the input sequence represents a specific protein and such changes would cause the sequence to lose its meaning. Any changes that we make to the input sequence should be based on relevant biological knowledge, as we discuss below. MRs are designed to achieve a better understanding of the software.

We define an MR using the canonical sequences and their variants. A canonical sequence is defined as the “standard” sequence, generally based on its prevalence in the population and its similarity to orthologous sequences in other species. The term orthologous sequence is used in biology to refer to similar genetic sequences that are found in other species. Generally speaking, these orthologous genetic sequences are thought to maintain a similar function across the species in which they are found. All other sequences are hence considered variants of the canonical sequence. These sequence variants include genetic polymorphisms, disease-associated mutations and RNA editing events such as alternative splicing. Both the canonical sequence and the variants are generally listed under one single entry in the UniProt/Swissprot databases (which are the primary knowledge bases on proteins).

For testing AFP tools, we define a MR in the broad sense that says there should be a change in the output GO terms between the canonical proteins and their corresponding well-studied variants. Note that this assumption does not always hold true for all proteins, but we have carefully chosen only the protein examples that satisfy our MR. Thus, this MR imposes restrictions on the source test cases that can be used. More specific instances of this MR can be created by observing the characteristics of different variants. For instance, if the source test case is the canonical sequence of the protein Tyrosinase, and the follow-up test case is a disease variant of this protein which causes Albinism (OCA1A), biological knowledge entails that the output GO terms of the canonical sequence and the disease variant must be different.

We also note that the change in the output mentioned above is measured using the set difference. In other words, if the set of GO terms for the variant sequence is different from the set of GO terms for the canonical sequence, it is considered a change. In this setting, a GO term is only a match (i.e. equal) to that term itself, but not to any of its ancestors or descendants. This interpretation is consistent with the CAFA evaluation setup in which tools are penalized for predicting a GO term that is an ancestor or a descendant of the gold standard GO term annotation [31].

Fig. 3: Architecture of the Metamorphic Testing system on AFP tools

We use this MR to conduct MT on a given AFP tool by performing the following steps (Fig. 3 depicts this process):

  1. Running the program with the canonical sequence (i.e. the source test case) and obtaining the source output. The source test case is a FASTA sequence, and the source output is a set of GO terms.

  2. Generating a follow-up test case from the source test case, executing the program with the follow-up test case, and obtaining the follow-up output. The follow-up test case is also a FASTA sequence, derived from a known variation of the source test case, and the follow-up output is a set of GO terms as well.

  3. Checking whether the MR defined above holds for the source and follow-up outputs. In this example, the MR holds if there is a change in the list of output GO terms, i.e. additions, deletions, etc. If the expected change is observed, the test case passes; otherwise, it fails.
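The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `insensitive_tool` and `sensitive_tool` are hypothetical stand-ins for an AFP tool (any callable mapping a sequence to a set of GO terms), the sequences are toy strings rather than real proteins, and the GO identifiers are used only as opaque labels, not as real predictions.

```python
def run_metamorphic_test(predict, canonical_seq, variant_seq):
    """Run one source/follow-up pair and check the 'output must change' MR."""
    source_output = set(predict(canonical_seq))      # step 1
    follow_up_output = set(predict(variant_seq))     # step 2
    # Step 3: exact set comparison; ancestors and descendants do not count as matches.
    added = follow_up_output - source_output
    removed = source_output - follow_up_output
    return {"passed": bool(added or removed), "added": added, "removed": removed}


def insensitive_tool(sequence):
    """Hypothetical tool whose output ignores single-residue changes."""
    return {"GO:0003684"} if len(sequence) > 5 else set()


def sensitive_tool(sequence):
    """Hypothetical tool whose output depends on one residue being present."""
    terms = {"GO:0003684"}
    if "G" in sequence:
        terms.add("GO:0050681")
    return terms


canonical = "MKGAYIAK"  # toy sequence, not a real protein
variant = "MKDAYIAK"    # toy variant with G replaced by D at position 3

fail_case = run_metamorphic_test(insensitive_tool, canonical, variant)
pass_case = run_metamorphic_test(sensitive_tool, canonical, variant)
```

Reporting the added and removed terms alongside the verdict is a design choice of this sketch; it makes it easier to inspect what changed when a test case passes.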

III-A AFP Tool Selection Criteria

In order to identify a suitable set of AFP tools to apply MT to, we started with the 28 top-performing tools from the CAFA2 challenge. Of these 28 tools, most are not publicly available, and some are very hard to set up and run. So, we selected tools that a graduate student could set up for execution in at most thirty minutes. At the time of this investigation, only three tools were publicly available and/or worked as advertised. As we wanted to perform the experiments on as many tools as possible, we contacted the authors of the remaining 25 tools, asking them to run their tools on the sequences used as source and follow-up test cases and to provide us with the outputs. Twelve authors responded positively, and six of them sent us the outputs. Thus, in our evaluation we used the following nine tools: EVEX [29], PFP [15], CONS [16], GORBI [23], CBRG [24], ProFun [25], PANNZER [18], Argot2 [17], and INGA [27]. Fig. 4 depicts the above-mentioned workflow for selecting the nine tools used in the evaluation.

Fig. 4: The flow of selection of the tools

III-B Source and Follow-up Test Cases

Protein     | Name                                   | UniProt Id
TYRO_Human  | Tyrosinase                             | P14679
IL2RG_Human | Cytokine receptor common subunit gamma | P31785
TLR4_Human  | Toll-like receptor 4                   | O00206
TABLE I: The List of Selected Proteins

We used 18 sequences from three carefully selected, well-known proteins as source and follow-up test cases for testing the AFP tools using the previously defined MR. These proteins are shown in Table I. Tyrosinase (TYRO) and interleukin-2 receptor gamma (IL2RG, also termed the common gamma chain) were selected because they have well characterized and highly defined functions. Importantly, point mutations (modifications to a single location in the sequence) in TYRO and IL2RG cause a loss of a particular protein function that directly results in a clinical disease, i.e., oculocutaneous albinism and severe combined immunodeficiency, respectively. Toll-like receptor 4 (TLR4) likewise is a very well characterized innate immune receptor that mediates activation of pro-inflammatory signaling pathways upon binding of bacterial material. With respect to the three selected proteins, changes in protein functions due to an altered amino acid sequence are expected to result in changes in GO terms for Molecular Function and Biological Process. We do not anticipate alterations in the Cellular Component ontology.

These three proteins each have a large number of variants associated with them. For example, TYRO has 99 variants involved in oculocutaneous albinism type 1A (OCA1A) alone; therefore, it is neither feasible nor cost-effective to execute the tools using all these variants.

We selected the number of variants to execute for each protein proportional to its sequence length. Thus, for TYRO more variants would be selected for execution compared to IL2RG since TYRO sequence is longer than IL2RG, i.e. we selected seven variants for TYRO, four variants for IL2RG, and four variants for TLR4. In the next step, we divide the sequence into equal segments proportional to the number of selected variants, i.e. for TYRO we divide the sequence into seven equal segments. Next, from each segment we pick the variant with the largest number of associated publications, which provides more experimental evidence for the existence of the variant. Eventually, the sequences consist of the canonical, i.e. standard, sequence and the sequences of variants as follows:

  • TYRO_HUMAN: (Canonical sequence + 7 disease variants)

  • IL2RG_HUMAN: (Canonical sequence + 4 disease variants)

  • TLR4_HUMAN: (Canonical sequence + 2 splice variants + 2 natural variants)
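The segment-based selection described above can be sketched as follows. The input format (variant identifier, 1-based position, publication count), the tie-breaking rule (keep the first variant seen), and all values in `toy_variants` are assumptions made for illustration; the text does not specify them.

```python
def select_variants(sequence_length, variants, n_segments):
    """Pick, per equal-length segment, the variant with the most publications.

    variants: iterable of (variant_id, position, n_publications),
    where position is 1-based. Ties keep the first variant encountered.
    """
    best = {}
    for variant_id, position, n_pubs in variants:
        # Map the 1-based position to a 0-based segment index.
        segment = min((position - 1) * n_segments // sequence_length, n_segments - 1)
        current = best.get(segment)
        if current is None or n_pubs > current[2]:
            best[segment] = (variant_id, position, n_pubs)
    # Segments without any known variant contribute nothing.
    return [best[segment][0] for segment in sorted(best)]


# Hypothetical variants on a hypothetical 100-residue protein.
toy_variants = [
    ("v1", 5, 2), ("v2", 20, 7), ("v3", 30, 1),
    ("v4", 60, 3), ("v5", 70, 3), ("v6", 99, 4),
]
selected = select_variants(100, toy_variants, n_segments=4)
```

Spreading the picks across segments keeps the selected variants from clustering in one region of the sequence.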

Protein         | Identifier | Position | Change
TYRO_Human      | P14679     |          | Canonical
TYRO Variant 1  | VAR_007652 | 47       | G - D
TYRO Variant 2  | VAR_007658 | 81       | P - L
TYRO Variant 3  | VAR_007667 | 217      | R - Q
TYRO Variant 4  | VAR_007671 | 299      | R - H
TYRO Variant 5  | VAR_007680 | 373      | T - K
TYRO Variant 6  | VAR_007690 | 419      | G - R
TYRO Variant 7  | VAR_007692 | 446      | G - S
IL2RG_Human     | P31785     |          | Canonical
IL2RG Variant 1 | VAR_002668 | 39       | D - N
IL2RG Variant 2 | VAR_002681 | 153      | I - N
IL2RG Variant 3 | VAR_002690 | 226      | R - C
IL2RG Variant 4 | VAR_002701 | 285      | R - Q
TLR4_Human      | O00206     |          | Canonical
TLR4 Natural 1  |            | 526      | N - A
TLR4 Natural 2  |            | 711      | D - K
TLR4 Splice 1   | O00206-2   |          | Isoform 2
TLR4 Splice 2   | O00206-3   |          | Isoform 3
TABLE II: Variants Selection Criteria

Table II lists the exact changes in the canonical sequences of the proteins and their corresponding positions in the sequence. For each protein, we use the canonical sequence as the source test case and each of the variant sequences as a follow-up test case. Therefore, we have 15 pairs of source and follow-up test cases.
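For the single-residue variants, a follow-up sequence can be derived from the canonical one by a substitution at the listed position. The sketch below is an assumption about how this could be done, not a description of the pipeline actually used; `apply_substitution` is a hypothetical helper and the sequence is a toy string rather than a real protein. Splice isoforms cannot be produced this way and would need their own sequences.

```python
def apply_substitution(sequence, position, original, replacement):
    """Return a copy of sequence with one residue replaced.

    position is 1-based, as in variant tables. The original residue is
    checked first so that an off-by-one position is caught early.
    """
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original!r} at position {position}, found {sequence[index]!r}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


canonical = "MKGAYIAK"  # toy sequence, not a real protein
follow_up = apply_substitution(canonical, 3, "G", "D")
```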

III-C Test Execution

The next step in applying MT to AFP tools is to feed the test cases into the tools and check whether the MR holds for each execution. In total, we have nine tools and 15 pairs of source and follow-up test cases.

For each pair of source and follow-up test cases, we store the output GO terms for the Molecular Function and Biological Process ontologies separately, compare the GO terms of the source output with those of the follow-up output, and report the results for each ontology separately.

IV Results

Figs. 5(a) and 5(b) show the results of executing the tools with the 15 test case pairs on the Molecular Function and Biological Process ontologies, respectively. Each pie chart shows the number of passes and fails of the 15 test case pairs for a given tool. As shown in Fig. 5(a), only tool H passes all the test cases. Four of the nine tools fail all the test cases. This can happen if the tools are not designed to detect variations in the protein sequence. The remaining tools have a mix of passes and fails.

We executed the same 15 test case pairs (also known as metamorphic groups of inputs [35]) on the Biological Process ontology as well. As shown in Fig. 5(b), two tools, G and H, passed all the test cases. Four tools, A, B, E and F, failed all the test cases. Interestingly, these are the same tools that failed all the test cases for the Molecular Function ontology. This further supports our hypothesis that these four tools are not designed to detect variations in protein sequences.

(a) Molecular function ontology
(b) Biological process ontology
Fig. 5: Overall test results
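The per-tool pass and fail counts behind such pie charts can be tallied as in the sketch below. The records in `toy_results` are hypothetical and do not reproduce the reported outcomes; the tool and pair names are placeholders.

```python
from collections import Counter


def tally_by_tool(results):
    """Count passes and fails per tool.

    results: iterable of (tool, test_pair, passed) records.
    """
    tally = {}
    for tool, _pair, passed in results:
        counts = tally.setdefault(tool, Counter())
        counts["pass" if passed else "fail"] += 1
    return tally


def classify(counts):
    """Summarize one tool's tally as 'passes all', 'fails all', or 'mixed'."""
    if counts["fail"] == 0:
        return "passes all"
    if counts["pass"] == 0:
        return "fails all"
    return "mixed"


# Hypothetical records for three placeholder tools and two test case pairs.
toy_results = [
    ("tool_x", "pair_1", True), ("tool_x", "pair_2", True),
    ("tool_y", "pair_1", False), ("tool_y", "pair_2", False),
    ("tool_z", "pair_1", True), ("tool_z", "pair_2", False),
]
summary = {tool: classify(counts) for tool, counts in tally_by_tool(toy_results).items()}
```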

Next, we analyze the test results at the level of individual protein sequences for the two ontologies. Figs. 6(a), 6(b), and 6(c) show the pie charts of the test results for the Molecular Function ontology for the individual protein sequences TYRO, IL2RG and TLR4, respectively. As expected, tool H passed all the test cases. Further, tools D and G passed all the test cases for IL2RG and TLR4. However, the performance of tool G on TYRO is not satisfactory, as shown in Fig. 6(a). Thus, in addition to tool H, tool D could be another option to use when working with the Molecular Function ontology.

(a) TYRO
(b) IL2RG
(c) TLR4
Fig. 6: Test results at the protein level for the molecular function ontology

Similarly, Figs. 7(a), 7(b), and 7(c) show the test results for the Biological Process ontology for the three protein sequences. As expected, tools G and H passed all the test cases for this ontology. Besides them, tool D passed all the test cases for IL2RG and TLR4 and also performed satisfactorily on TYRO. Thus, tool D can be another option to use when working with the Biological Process ontology.

(a) TYRO
(b) IL2RG
(c) TLR4
Fig. 7: Test results at the protein level for the biological process ontology

Next, we investigate whether making predictions on certain protein sequences is harder than on others. Figs. 8(a) and 8(b) show the percentage of tools that successfully predicted the changes for each test case pair (named after the corresponding variant used in the follow-up test case) for the Molecular Function ontology and the Biological Process ontology, respectively. We observe that for both ontologies, a higher percentage of tools passed the test cases for the variants of TLR4 compared to the other two protein sequences. This observation may suggest that the AFP tools predict the functions of natural and splice variants better than those of disease variants. More test executions are needed to confirm this hypothesis.

(a) Molecular function ontology
(b) Biological process ontology
Fig. 8: Percentage of tools passed for each source and follow-up test case pair
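The per-pair percentages can be computed from the same kind of (tool, test pair, passed) records. The sketch below uses hypothetical records with placeholder names; it does not reproduce the reported values.

```python
def percent_tools_passed(results):
    """Return, per test case pair, the percentage of tools that passed it.

    results: iterable of (tool, test_pair, passed) records.
    """
    totals, passes = {}, {}
    for _tool, pair, passed in results:
        totals[pair] = totals.get(pair, 0) + 1
        passes[pair] = passes.get(pair, 0) + (1 if passed else 0)
    return {pair: 100.0 * passes[pair] / totals[pair] for pair in totals}


# Hypothetical records: four placeholder tools, two test case pairs.
toy_records = [
    ("t1", "pair_1", True), ("t2", "pair_1", False),
    ("t3", "pair_1", False), ("t4", "pair_1", False),
    ("t1", "pair_2", True), ("t2", "pair_2", True),
    ("t3", "pair_2", True), ("t4", "pair_2", False),
]
percentages = percent_tools_passed(toy_records)
```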

V Related Work

Srinivasan et al. applied MT to LingPipe, a tool for processing text using computational linguistics, which is often used in bioinformatics for bio-entity recognition from biomedical literature [36]. The authors proposed 10 novel MRs, and the fault detection effectiveness of each MR was evaluated using mutation testing. Lundgren et al. examined the effectiveness of MT for testing the genome alignment tool BBMap [37]. The experimental results showed that MT is effective in identifying subtle faults compared to pseudo-oracles. Ramanathan et al. used MT to test a workflow of epidemiological models [38]. They showed that MT can be useful when mathematical models fail. Pullum and Ozmen showed that MT could be effective in testing epidemiological models [39]. They used differential equation and agent-based models for generating MR-transformed parameter values. Chen et al. used MT to test two open-source bioinformatics programs [40]. The first program, GNLab, is a tool for large-scale analysis and simulation of gene regulatory networks. The second tool, SeqMap, maps short sequence reads to a reference genome. Mutants were generated for both GNLab and SeqMap. The MRs had different fault-finding abilities, and each mutant violated at least one MR. The MRs related to changes in the nodes of the GNLab network were less effective than the other MRs. Eleni et al. conducted metamorphic testing on three commonly used NGS (Next Generation Sequencing) short-read alignment programs: BWA, Bowtie, and Bowtie2 [41]. The results show that the MRs created by permuting reads and by adding reads do not hold for BWA. In addition, the MRs that reverse-complement and extend the reads fail on both Bowtie and BWA.

VI Conclusions and Future Work

In this study, we applied MT for testing nine AFP tools. We used biological knowledge about proteins and their variants to define an MR that specifies that there should be a change in the predicted GO terms between a canonical protein sequence and its variants. We used this MR to create source and follow-up test cases using carefully selected protein examples such as disease variants.

Our results indicate that several tools do not pass any of the test cases that we used in this study. This is surprising considering that all of these tools (except one) are among the top performing tools in CAFA2. The only tool for which this failure is to be expected is tool E, as it appears to have been designed specifically for bacterial and archaeal genomes, whereas we used only human data for testing.

However, it is also possible that these tools are not designed to handle variants. If that is the case, such limitations should be documented so that users of the tools are aware of them [35]. This would ensure that biologists, who are the intended primary users of these tools, use them for the appropriate use cases. This is of utmost importance because predictions from many of these tools will be used to guide wet-lab experiments. Predictions lacking in quality could alter research directions, resulting in a loss of resources, and, most importantly, could have a significant impact on healthcare applications.

In the future, we plan to develop more MRs for this domain that use biological knowledge and incorporate different orthogonal aspects of AFP, such as functional characteristics unique to different species. Further, we plan to develop an expanded test suite by exploring MRs for the cellular component sub-ontology and the Human Phenotype ontology (which is another structured vocabulary for describing phenotype abnormalities associated with human diseases), and by employing different types of protein examples. We will also work with the AFP community to increase the number of tools to be tested. Eventually, we will develop a testing framework that is readily available to the users and developers in this domain.


We thank the PIs and teams that provided us with the results of their tools including EVEX (PI: Filip Ginter), Orengo-FunFams (PI: Christine Orengo), SIFTER (PI: Steven Brenner), Jones-UCL (PI: David Jones), Paccanaro Lab (PI: Alberto Paccanaro), ProFun (PI: Jianlin Cheng), PANDA (PI: Zheng Wang), CBRG and GORBI (PI: Christophe Dessimoz), FANN-GO (PI: Predrag Radivojac), CONS and PFP (PI: Daisuke Kihara). We also would like to thank Dr. Iddo Friedberg and Dr. Rachael Huntley for insightful discussions.


  • [1] G. R. Cutting, “Cystic fibrosis genetics: from molecular understanding to clinical application,” Nature Reviews Genetics, vol. 16, no. 1, p. 45, 2015.
  • [2] M. Ferreira and J. Massano, “An updated review of Parkinson’s disease genetics and clinicopathological correlations,” Acta Neurologica Scandinavica, vol. 135, no. 3, pp. 273–284, 2017.
  • [3] C. M. Karch, C. Cruchaga, and A. M. Goate, “Alzheimer’s disease genetics: from the bench to the clinic,” Neuron, vol. 83, no. 1, pp. 11–26, 2014.
  • [4] B. Prasad, “Hemophilia: Genetics, diagnosis and treatment,” International Journal of Scientific Research, vol. 7, no. 2, 2018.
  • [5] J.-M. Lee, V. C. Wheeler, M. J. Chao, J. P. G. Vonsattel, R. M. Pinto, D. Lucente, K. Abu-Elneel, E. M. Ramos, J. S. Mysore, T. Gillis et al., “Identification of genetic factors that modify clinical onset of Huntington’s disease,” Cell, vol. 162, no. 3, pp. 516–526, 2015.
  • [6] D. Trujillano, M. E. Weiss, J. Schneider, J. Köster, E. B. Papachristos, V. Saviouk, T. Zakharkina, N. Nahavandi, L. Kovacevic, and A. Rolfs, “Next-generation sequencing of the BRCA1 and BRCA2 genes for the genetic diagnostics of hereditary breast and/or ovarian cancer,” The Journal of molecular diagnostics, vol. 17, no. 2, pp. 162–170, 2015.
  • [7] R. Sehgal, K. Sheahan, P. R. O’Connell, A. M. Hanly, S. T. Martin, and D. C. Winter, “Lynch syndrome: an updated review,” Genes, vol. 5, no. 3, pp. 497–507, 2014.
  • [8] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig et al., “Gene Ontology: tool for the unification of biology,” Nature genetics, vol. 25, no. 1, p. 25, 2000.
  • [9] S. Y. Rhee, V. Wood, K. Dolinski, and S. Draghici, “Use and misuse of the Gene Ontology annotations,” Nature Reviews Genetics, vol. 9, no. 7, p. 509, 2008.
  • [10] R. R. Gullapalli, K. V. Desai, L. Santana-Santos, J. A. Kant, and M. J. Becich, “Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics,” Journal of pathology informatics, vol. 3, 2012.
  • [11] N. H. G. R. Institute, “The Human Genome Project completion: frequently asked questions,” 2010.
  • [12] K. Schwarze, J. Buchanan, J. C. Taylor, and S. Wordsworth, “Are whole-exome and whole-genome sequencing approaches cost-effective? a systematic review of the literature,” Genetics in Medicine, 2018.
  • [13] G. O. Consortium et al., “The Gene Ontology resource: 20 years and still GOing strong.” Nucleic acids research, 2018.
  • [14] A. Shehu, D. Barbará, and K. Molloy, “A survey of computational methods for protein function prediction,” in Big Data Analytics in Genomics.   Springer, 2016, pp. 225–298.
  • [15] T. Hawkins, M. Chitale, S. Luban, and D. Kihara, “PFP: Automated prediction of Gene Ontology functional annotations with confidence scores using protein sequence data,” Proteins: Structure, Function, and Bioinformatics, vol. 74, no. 3, pp. 566–582, 2009.
  • [16] I. K. Khan, Q. Wei, S. Chapman, D. Kihara et al., “The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches,” GigaScience, vol. 4, no. 1, p. 43, 2015.
  • [17] M. Falda, S. Toppo, A. Pescarolo, E. Lavezzo, B. Di Camillo, A. Facchinetti et al., “Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms,” BMC bioinformatics, vol. 13, no. 4, p. S14, 2012.
  • [18] P. Koskinen, P. Törönen, J. Nokso-Koivisto, and L. Holm, “PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment,” Bioinformatics, vol. 31, no. 10, pp. 1544–1552, 2015.
  • [19] K. Molloy, M. J. Van, D. Barbara, and A. Shehu, “Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space,” BMC bioinformatics, vol. 15, no. 8, p. S4, 2014.
  • [20] H. Braberg, B. M. Webb, E. Tjioe, U. Pieper, A. Sali, and M. S. Madhusudhan, “SALIGN: a web server for alignment of multiple protein sequences and structures,” Bioinformatics, vol. 28, no. 15, pp. 2072–2073, 2012.
  • [21] J. O. Korbel, L. J. Jensen, C. Von Mering, and P. Bork, “Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs,” Nature biotechnology, vol. 22, no. 7, p. 911, 2004.
  • [22] L. Ferrer, J. M. Dale, and P. D. Karp, “A systematic study of genome context methods: calibration, normalization and combination,” BMC bioinformatics, vol. 11, no. 1, p. 493, 2010.
  • [23] N. Škunca, M. Bošnjak, A. Kriško, P. Panov, S. Džeroski et al., “Phyletic profiling with cliques of orthologs is enhanced by signatures of paralogy relationships,” PLoS computational biology, vol. 9, no. 1, p. e1002852, 2013.
  • [24] A. M. Altenhoff, N. Škunca, N. Glover, C.-M. Train, A. Sueki et al., “The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements,” Nucleic acids research, vol. 43, no. D1, pp. D240–D249, 2014.
  • [25] R. Cao and J. Cheng, “Integrated protein function prediction by mining function associations, sequences, and protein–protein and gene–gene interaction networks,” Methods, vol. 93, pp. 84–91, 2016.
  • [26] D. Wang and J. Hou, “Explore the hidden treasure in protein–protein interaction networks—an iterative model for predicting protein functions,” Journal of bioinformatics and computational biology, vol. 13, no. 05, p. 1550026, 2015.
  • [27] D. Piovesan, M. Giollo, E. Leonardi, C. Ferrari, and S. C. Tosatto, “INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity,” Nucleic acids research, vol. 43, no. W1, pp. W134–W140, 2015.
  • [28] M. N. Wass, G. Barton, and M. J. Sternberg, “CombFunc: predicting protein function using heterogeneous data sources,” Nucleic acids research, vol. 40, no. W1, pp. W466–W470, 2012.
  • [29] S. Van Landeghem, K. Hakala, S. Rönnqvist, T. Salakoski, Y. Van de Peer, and F. Ginter, “Exploring biomolecular literature with EVEX: connecting genes through events, homology, and indirect associations,” Advances in bioinformatics, vol. 2012, 2012.
  • [30] S. Raychaudhuri, J. T. Chang, P. D. Sutphin, and R. B. Altman, “Associating genes with Gene Ontology codes using a maximum entropy analysis of biomedical literature,” Genome Research, vol. 12, no. 1, pp. 203–214, 2002.
  • [31] Y. Jiang, T. R. Oron, W. T. Clark, A. R. Bankapur, D. D’Andrea, R. Lepore et al., “An expanded evaluation of protein function prediction methods shows an improvement in accuracy,” Genome biology, vol. 17, no. 1, p. 184, 2016.
  • [32] E. J. Weyuker, “On testing non-testable programs,” The Computer Journal, vol. 25, no. 4, pp. 465–470, 1982. [Online]. Available:
  • [33] T. Y. Chen, S. C. Cheung, and S. M. Yiu, “Metamorphic Testing: a new approach for generating next test cases,” Technical Report HKUST-CS98-01, Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, Tech. Rep., 1998.
  • [34] T. Y. Chen, F.-C. Kuo, H. Liu, P.-L. Poon, D. Towey, T. Tse, and Z. Q. Zhou, “Metamorphic testing: A review of challenges and opportunities,” ACM Computing Surveys (CSUR), vol. 51, no. 1, p. 4, 2018.
  • [35] Z. Q. Zhou, L. Sun, T. Y. Chen, and D. Towey, “Metamorphic relations for enhancing system understanding and use,” IEEE Transactions on Software Engineering, 2018.
  • [36] M. Srinivasan, M. Pourreza Shahri, I. Kahanda, and U. Kanewala, “Quality assurance of bioinformatics software: a case study of testing a biomedical text processing tool using Metamorphic Testing,” in Proceedings of the 3rd International Workshop on Metamorphic Testing.   ACM, 2018, pp. 26–33.
  • [37] A. Lundgren and U. Kanewala, “Experiences of testing bioinformatics programs for detecting subtle faults,” in Proceedings of the International Workshop on Software Engineering for Science.   ACM, 2016, pp. 16–22.
  • [38] A. Ramanathan, C. A. Steed, and L. L. Pullum, “Verification of compartmental epidemiological models using Metamorphic Testing, model checking and visual analytics,” in BioMedical Computing (BioMedCom), 2012 ASE/IEEE International Conference on.   IEEE, 2012, pp. 68–73.
  • [39] L. L. Pullum and O. Ozmen, “Early results from Metamorphic Testing of epidemiological models,” in 2012 ASE/IEEE International Conference on BioMedical Computing (BioMedCom).   IEEE, 2012, pp. 62–67.
  • [40] T. Y. Chen, J. W. Ho, H. Liu, and X. Xie, “An innovative approach for testing bioinformatics programs using Metamorphic Testing,” BMC bioinformatics, vol. 10, no. 1, p. 24, 2009.
  • [41] E. Giannoulatou, S.-H. Park, D. T. Humphreys, and J. W. Ho, “Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie,” BMC bioinformatics, vol. 15, no. 16, p. S15, 2014.