Genetic heterogeneity analysis using genetic algorithm and network science

08/12/2023
by Zhendong Sha, et al.

Through genome-wide association studies (GWAS), genetic variables associated with disease susceptibility can be identified by comparing the genetic data of individuals with and without a specific disease. However, discovering these associations is challenging because of genetic heterogeneity and feature interactions. Genetic variables subject to these effects often exhibit lower effect sizes and can therefore be difficult to detect with machine learning feature selection methods. To address these challenges, this paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCS-Net). FCS-Net is designed to extract heterogeneous subsets of genetic variables from a network constructed from multiple independent feature selection runs based on a genetic algorithm (GA), an evolutionary learning algorithm. We employ a non-linear machine learning algorithm to detect feature interactions. We introduce the Community Risk Score (CRS), a synthetic feature designed to quantify the collective disease association of each variable subset. Our experiment on synthetic data demonstrates that the GA-based feature selection method is effective at identifying feature interactions. Furthermore, we apply our approach to a case-control colorectal cancer GWAS dataset. The resulting synthetic features are then used to explain the genetic heterogeneity in an additional case-only GWAS dataset.
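The idea of building a network from multiple independent feature selection runs can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`co_selection_edges`, `components`), the `min_count` threshold, the use of connected components as a stand-in for community detection, and the toy SNP names are all assumptions made for illustration. Each run is represented simply as the list of features it selected, and two features are linked when they are selected together often enough.

```python
from collections import Counter
from itertools import combinations


def co_selection_edges(runs):
    """Count how often each pair of features is selected in the same run."""
    counts = Counter()
    for selected in runs:
        # Sort so each pair is stored under one canonical (a, b) key.
        for a, b in combinations(sorted(set(selected)), 2):
            counts[(a, b)] += 1
    return counts


def components(edges, min_count):
    """Group features connected by edges seen in at least min_count runs.

    Connected components are a simple stand-in (an assumption) for a
    proper community detection step.
    """
    adj = {}
    for (a, b), count in edges.items():
        if count >= min_count:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical output of four independent feature selection runs.
runs = [
    ["snp1", "snp2", "snp3"],
    ["snp1", "snp2"],
    ["snp4", "snp5"],
    ["snp4", "snp5", "snp1"],
]
edges = co_selection_edges(runs)
groups = components(edges, min_count=2)
print(groups)
```

In this toy input, only the pairs that recur across runs survive the threshold, so the features fall into separate subsets. A score such as the CRS described above would then be computed per subset; how that score is defined is not specified in the abstract and is not sketched here.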


