The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

08/19/2019
by Sterling Ramroach, et al.

Late diagnosis and high costs are key factors that negatively impact the care of cancer patients worldwide. Although the availability of biological markers for diagnosing cancer type is increasing, the cost and reliability of these tests currently present a barrier to their routine use. There is a pressing need for accurate methods that enable early diagnosis and cover a broad range of cancers. The use of machine learning and RNA-seq expression analysis has shown promise in the classification of cancer type. However, research is inconclusive about which types of machine learning model are optimal. The suitability of five algorithms was assessed for the classification of 17 different cancer types. Each algorithm was fine-tuned and trained on the full array of 18,015 genes per sample, for 4,221 samples (75% of the dataset). The algorithms were then tested on 1,408 samples (25% of the dataset), for which cancer types were withheld to determine the accuracy of prediction. The results show that ensemble algorithms achieve 100% classification accuracy for 14 of the 17 cancer types. The clustering and classification models, while faster than the ensembles, performed poorly due to the high level of noise in the dataset. When the features were reduced to a list of 20 genes, the ensemble algorithms maintained an accuracy above 95%, as opposed to the clustering and classification models.
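The evaluation protocol described in the abstract (a 75/25 train/test split, accuracy reported per cancer type, and a reduction of the feature set to 20 genes) can be sketched in plain Python. This is a minimal sketch under assumptions the abstract does not state: the split is taken to be stratified by cancer type, the 20-gene reduction is approximated by a simple variance filter (the abstract does not say how the genes were chosen), and "100% classification" of a cancer type is read as per-class accuracy on the held-out samples. The function names are hypothetical, and the classifiers themselves (ensembles, clustering) are left out.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Hold out roughly `test_fraction` of the samples of every class.

    Assumption: stratification by cancer type; the abstract only gives
    the overall 75/25 proportions. Returns sorted (train, test) indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def top_k_by_variance(rows, k):
    """Return the column indices of the k highest-variance features.

    Assumption: a variance filter stands in for the paper's unspecified
    method of selecting its 20-gene list.
    """
    n_features = len(rows[0])
    variances = [
        statistics.pvariance([row[j] for row in rows]) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {label: correct[label] / total[label] for label in total}


if __name__ == "__main__":
    # Toy, hypothetical labels: 8 samples of type "A", 4 of type "B".
    labels = ["A"] * 8 + ["B"] * 4
    train, test = stratified_split(labels)
    print(len(train), len(test))  # 9 training, 3 held-out samples
    print(per_class_accuracy(["A", "A", "B"], ["A", "B", "B"]))
```

For the reported dataset, 4,221 of 5,629 samples is about 75%. If a gene filter like `top_k_by_variance` were used, it would be fitted on the training rows only, so that the held-out 25% does not influence which genes are kept.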
