Robust model-based estimation for binary outcomes in genomics studies

10/28/2021
by Suyoung Park, et al.

In quantitative genetics, statistical modeling techniques help identify which genes underlie agronomically important traits and have enabled the use of genome-wide markers to accelerate genetic gain. The logistic regression model is a statistically optimal approach for quantitative genetics analysis of binary traits. To encourage more widespread use of the logistic model in such analyses, separation must be addressed. Separation occurs whenever a specific combination of predictors perfectly predicts the value of a binary trait, and it is especially prevalent in applications where the number of predictors is near the sample size. In this study we motivate a logistic model that is robust to separation, and we develop a novel prediction procedure for this robust model that is appropriate when separation exists. We show that this robust model offers superior inferences and comparable predictions relative to existing approaches while remaining true to the logistic model. This improves on existing approaches, which treat separation as a shortcoming of the model rather than as an antagonistic data configuration. Those approaches therefore change the modeling paradigm to accommodate separation, which, before our robust model, was problematic for logistic models. Our comparisons are conducted on several didactic examples and on a genomics study of kernel color in maize. The ensuing analyses confirm the claimed superior inferences and comparable predictive performance of our robust model. Our approach therefore provides scientists with an appropriate statistical modeling framework for analyses involving agronomically important binary traits.
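
To make the notion of separation concrete, the following is a minimal sketch that is not taken from the paper. It assumes a single predictor, no intercept, and a hypothetical toy data set in which the sign of the predictor perfectly predicts the binary outcome. Under that assumption the logistic log-likelihood keeps increasing toward zero as the coefficient grows, so no finite maximum likelihood estimate exists. The function name `log_likelihood` and the data values are illustrative only, and the sketch does not implement the robust model described above.

```python
import numpy as np

def log_likelihood(beta, x, y):
    """Logistic log-likelihood for one predictor and no intercept."""
    eta = beta * x
    # log p(y | x) = y * eta - log(1 + exp(eta)), computed stably via logaddexp
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

# Hypothetical toy data: y = 1 exactly when x > 0 (complete separation)
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Evaluate the log-likelihood on an increasing grid of coefficients
betas = [1.0, 5.0, 25.0, 125.0]
values = [log_likelihood(b, x, y) for b in betas]

for b, v in zip(betas, values):
    print(f"beta = {b:6.1f}  log-likelihood = {v:.6g}")
```

Because every larger coefficient fits the separated data at least slightly better, an unpenalized optimizer would push the estimate toward infinity, which is the difficulty the abstract refers to.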


