A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data

10/01/2021
by Xiaowen Cao, et al.

Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights into gene expression and provides critical information about the cellular composition of complex tissues. In scRNA-seq analysis, cell subtypes are often annotated manually, which is time-consuming and irreproducible. Garnett is a cell-type annotation tool based on the elastic net method. Beyond cell-type annotation, supervised machine learning methods can also be applied to predict other cell phenotypes from genomic data. Despite the popularity of such applications, no existing study has systematically investigated how these supervised algorithms perform on scRNA-seq data sets of various sizes.

Methods and Results: This study evaluates 13 popular supervised machine learning algorithms for classifying cell phenotypes, using published real and simulated data sets with varying numbers of cells. The benchmark had two parts. In the first part, we used real data sets to assess the algorithms' computing speed and cell phenotype classification performance. Classification performance was evaluated using AUC, F1-score, precision, recall, and false-positive rate. In the second part, we evaluated gene selection performance using published simulated data sets with a known list of real genes.

Conclusion: ElasticNet with interactions performed best on small and medium data sets. NB was another appropriate method for medium data sets. On large data sets, XGB performed excellently. Ensemble algorithms were not significantly superior to individual machine learning methods. Adding interactions to ElasticNet can help, and the improvement was significant on small data sets.
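As an illustration of the evaluation metrics named above, the following is a minimal sketch of how AUC, F1-score, precision, recall, and false-positive rate can be computed for a binary phenotype label. It is not the authors' benchmark code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example, and the binary setting is a simplification.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and false-positive rate from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive cell receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = phenotype present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc_score(y_true, scores)
print(metrics)
```

The pairwise AUC computation is quadratic in the number of cells, so it is suitable only for small examples; a rank-based implementation from an established library would be the usual choice on data sets of realistic size.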


