Assessing the Reproducibility of Machine-learning-based Biomarker Discovery in Parkinson's Disease

by Ali Amelia, et al.

Genome-Wide Association Studies (GWAS) identify genetic variations that are more common in people with a disease, such as Parkinson's disease (PD), than in people without it. GWAS data can therefore be used to find genetic variations associated with the disease, and feature selection and machine learning approaches can be applied to these data to identify potential disease biomarkers. However, GWAS differ in technical ways that affect the reproducibility of the identified biomarkers, such as the genotyping platform used and the criteria for selecting which individuals are genotyped. To address this issue, we collected five GWAS datasets from the database of Genotypes and Phenotypes (dbGaP) and explored several data integration strategies. We then evaluated how well the different strategies agreed on the Single Nucleotide Polymorphisms (SNPs) they identified as potential PD biomarkers. Our results showed low concordance among the biomarkers discovered using different datasets or integration strategies. Nevertheless, we found fifty SNPs that were identified at least twice, which could serve as novel PD biomarkers. These SNPs are indirectly linked to PD in the literature but have not previously been directly associated with PD. These findings suggest new avenues for investigation.
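The abstract does not state which agreement measure was used. The following is a minimal sketch, assuming each dataset or integration strategy yields a set of selected SNP identifiers and using the Jaccard index as an assumed concordance measure. It computes pairwise agreement between strategies and lists the SNPs selected at least twice. The function names, strategy labels, and SNP IDs are hypothetical placeholders, not the authors' code or data.

```python
from collections import Counter
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pairwise_concordance(selections: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard index for every unordered pair of strategies (assumed measure)."""
    return {
        (s1, s2): jaccard(selections[s1], selections[s2])
        for s1, s2 in combinations(sorted(selections), 2)
    }


def recurrent_snps(selections: dict[str, set[str]], min_count: int = 2) -> list[str]:
    """SNPs selected by at least `min_count` strategies, sorted by identifier."""
    counts = Counter(snp for chosen in selections.values() for snp in chosen)
    return sorted(snp for snp, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    # Hypothetical selections: strategy names and SNP IDs are placeholders.
    selections = {
        "dataset_A": {"snp1", "snp2", "snp3"},
        "dataset_B": {"snp3", "snp4"},
        "pooled": {"snp2", "snp3", "snp5"},
    }
    for pair, score in pairwise_concordance(selections).items():
        print(pair, round(score, 2))
    print(recurrent_snps(selections))  # prints ['snp2', 'snp3']
```

A low Jaccard index across most pairs would correspond to the low concordance described above, while `recurrent_snps` mirrors the idea of keeping SNPs identified at least twice; other overlap measures (for example, the overlap coefficient) could be substituted without changing the structure.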




