Multiresolution categorical regression for interpretable cell type annotation

08/29/2022
by Aaron J. Molstad, et al.

In many categorical response regression applications, the response categories admit a multiresolution structure. That is, subsets of the response categories may naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this article, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. In particular, our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. The statistical properties of our method show that it can exploit this multiresolution structure in a way existing estimators cannot. We use our method to model cell type probabilities as a function of a cell's gene expression profile (i.e., cell type annotation). Our fitted model provides novel biological insights that may be useful for future automated and manual cell type annotation methodology.
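To make the three kinds of predictors concrete, the toy sketch below illustrates the underlying multinomial logistic (softmax) model, not the authors' penalized estimator or fitting algorithm, which the abstract does not specify. It assumes non-overlapping coarse categories, and the helper names (`fine_probs`, `coarse_probs`, `within_group_ratio`) and coefficient values are hypothetical. A coarse category's probability is the sum of its fine-category probabilities. A predictor whose coefficients are equal within each coarse group shifts coarse probabilities but leaves the ratios between fine categories in a group unchanged. A predictor whose coefficients are equal across all categories has no effect, because softmax is invariant to adding a constant to every linear predictor.

```python
import numpy as np

def fine_probs(x, B):
    """Softmax (multinomial logistic) probabilities over K fine categories.

    x: shape (p,) predictor vector; B: shape (p, K), one column per fine category.
    """
    eta = x @ B                 # linear predictors, shape (K,)
    eta = eta - eta.max()       # subtract the max for numerical stability
    w = np.exp(eta)
    return w / w.sum()

def coarse_probs(pi, groups):
    """Probability of each coarse category = sum over its fine categories.

    groups: list of index lists; assumed non-overlapping in this sketch.
    """
    return np.array([pi[g].sum() for g in groups])

# Hypothetical toy setup: 4 fine categories merged into 2 coarse categories.
groups = [[0, 1], [2, 3]]

# Row 0: equal coefficients within each coarse group -> affects coarse level only.
# Row 1: different coefficients for fine categories 0 and 1 -> affects fine level.
# Row 2: all coefficients equal -> irrelevant predictor.
B = np.array([
    [1.0,  1.0, -1.0, -1.0],
    [0.5, -0.5,  0.0,  0.0],
    [2.0,  2.0,  2.0,  2.0],
])

def within_group_ratio(x):
    pi = fine_probs(x, B)
    return pi[0] / pi[1]        # equals exp(x @ (B[:, 0] - B[:, 1]))

x_a = np.array([0.0, 0.3, 0.0])
x_b = np.array([1.0, 0.3, 5.0])  # change only the coarse-level and irrelevant predictors

print(coarse_probs(fine_probs(x_a, B), groups))
print(coarse_probs(fine_probs(x_b, B), groups))
# Equal up to floating-point rounding: both depend only on the fine-level predictor.
print(within_group_ratio(x_a), within_group_ratio(x_b))
```

In a fitted model, the question of "at which resolution a predictor acts" amounts to asking which of these coefficient patterns each row of the coefficient matrix follows.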

research
11/23/2021

Binned multinomial logistic regression for integrative cell type annotation

Categorizing individual cells into one of many known cell type categorie...
research
07/15/2020

A likelihood-based approach for multivariate categorical response regression in high dimensions

We propose a penalized likelihood method to fit the bivariate categorica...
research
10/19/2021

On Clustering Categories of Categorical Predictors in Generalized Linear Models

We propose a method to reduce the complexity of Generalized Linear Model...
research
05/10/2017

Automatic Response Category Combination in Multinomial Logistic Regression

We propose a penalized likelihood method that simultaneously fits the mu...
research
01/12/2015

SPRITE: A Response Model For Multiple Choice Testing

Item response theory (IRT) models for categorical response data are wide...
research
11/08/2021

Bayesian profile regression for clustering analysis involving a longitudinal response and explanatory variables

The identification of sets of co-regulated genes that share a common fun...
research
01/05/2021

Weight-of-evidence 2.0 with shrinkage and spline-binning

In many practical applications, such as fraud detection, credit risk mod...
