A statistical normalization method and differential expression analysis for RNA-seq data between different species

10/04/2018
by Yan Zhou, et al.

Background: High-throughput techniques bring novel tools, but also statistical challenges, to genomic research. Identifying genes that are differentially expressed between species is an effective way to discover evolutionarily conserved transcriptional responses. To remove systematic variation between species and allow a fair comparison, normalization serves as a crucial pre-processing step that adjusts for varying sample sequencing depths and other confounding technical effects. Results: In this paper, we propose a scale-based normalization (SCBN) method that uses the available knowledge of conserved orthologous genes together with a hypothesis testing framework. Because gene lengths differ between species and some genes are unmapped, we formulate the problem from the perspective of hypothesis testing and search for the optimal scaling factor, the one that minimizes the deviation between the empirical and nominal type I errors. Conclusions: Simulation studies show that the proposed method performs significantly better than the existing competitor in a wide range of settings. An RNA-seq dataset from different species is also analyzed, and the results agree with the conclusion that the proposed method outperforms the existing method. For practical applications, we have also developed an R package named "SCBN"; the software is available at http://www.bioconductor.org/packages/devel/bioc/html/SCBN.html.
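
To make the idea of searching for a scaling factor concrete, here is a minimal Python sketch. It is not the SCBN package's implementation, and the abstract does not specify the test statistic, so everything below is an assumption made for illustration: the Poisson count model, the conditional binomial test with a normal approximation, the grid of candidate scales, and all function names (`null_p_value`, `empirical_type1_error`, `choose_scale`, `simulate_conserved_genes`) are hypothetical. The sketch treats a set of conserved orthologous genes as "known nulls", computes the fraction rejected at level alpha for each candidate scale, and picks the scale whose empirical type I error is closest to alpha.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for lam up to a few hundred)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def null_p_value(x1, x2, len1, len2, scale):
    """Two-sided p-value for 'no differential expression' under a candidate scale.

    Assumed model (illustrative only): x1 ~ Poisson(mu * len1) and
    x2 ~ Poisson(scale * mu * len2), so x1 given x1 + x2 = n is
    Binomial(n, p) with p = len1 / (len1 + scale * len2).
    The binomial is approximated by a normal distribution.
    """
    n = x1 + x2
    if n == 0:
        return 1.0
    p = len1 / (len1 + scale * len2)
    z = (x1 - n * p) / math.sqrt(n * p * (1.0 - p))
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_type1_error(genes, scale, alpha):
    """Fraction of conserved genes rejected at level alpha for this scale."""
    rejections = sum(
        null_p_value(x1, x2, l1, l2, scale) < alpha for x1, x2, l1, l2 in genes
    )
    return rejections / len(genes)


def choose_scale(genes, candidates, alpha=0.05):
    """Grid search: the scale whose empirical type I error is closest to alpha."""
    return min(
        candidates,
        key=lambda c: abs(empirical_type1_error(genes, c, alpha) - alpha),
    )


def simulate_conserved_genes(n_genes, true_scale, rng):
    """Simulate counts for conserved genes with species-specific gene lengths."""
    genes = []
    for _ in range(n_genes):
        len1 = rng.uniform(0.5, 2.0)   # gene length in species 1 (arbitrary units)
        len2 = rng.uniform(0.5, 2.0)   # gene length in species 2
        mu = rng.uniform(50.0, 150.0)  # shared expression level per unit length
        x1 = poisson(mu * len1, rng)
        x2 = poisson(true_scale * mu * len2, rng)
        genes.append((x1, x2, len1, len2))
    return genes


rng = random.Random(0)
genes = simulate_conserved_genes(500, true_scale=2.0, rng=rng)
candidates = [0.5 + 0.1 * i for i in range(36)]  # candidate scales 0.5 .. 4.0
best = choose_scale(genes, candidates)
print(round(best, 1))
```

In this toy setup the data are simulated with a true scale of 2.0, so the selected scale should land near that value. A badly chosen scale makes most conserved genes look differentially expressed, which is why the empirical type I error is a usable criterion.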


research
03/29/2020

The covariance shift (C-SHIFT) algorithm for normalizing biological data

Omics technologies are powerful tools for analyzing patterns in gene exp...
research
04/29/2019

Individualized Treatment Selection: An Optimal Hypothesis Testing Approach In High-dimensional Models

The ability to predict individualized treatment effects (ITEs) based on ...
research
03/27/2019

Jaccard/Tanimoto similarity test and estimation methods

Binary data are used in a broad area of biological sciences. Using binar...
research
02/22/2018

SMAGEXP: a galaxy tool suite for transcriptomics data meta-analysis

Background: With the proliferation of available microarray and high throu...
research
05/08/2022

Assigning Species Information to Corresponding Genes by a Sequence Labeling Framework

The automatic assignment of species information to the corresponding gen...
research
09/13/2023

Simultaneous inference for generalized linear models with unmeasured confounders

Tens of thousands of simultaneous hypothesis tests are routinely perform...
research
08/21/2019

Efficient and powerful equivalency test on combined mean and variance with application to diagnostic device comparison studies

In medical device comparison studies, equivalency test is commonly used ...
