Minimum Sample Size for Developing a Multivariable Prediction Model using Multinomial Logistic Regression

07/26/2022
by Alexander Pate, et al.

Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than two categories. When developing such a model, researchers should ensure that the number of participants (n) is appropriate relative to the number of events (E_k) and the number of predictor parameters (p_k) for each category k. Building on existing criteria developed for binary outcomes, we propose three criteria to determine the minimum n required. The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R². The third aims to ensure the overall risk is estimated precisely. For criterion (i), we show that the sample size must be based on the anticipated Cox-Snell R² of distinct one-to-one logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R² of the multinomial logistic regression. We tested the performance of criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) are natural extensions of previously proposed criteria for binary outcomes. We illustrate how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and worked example. We will embed our proposed criteria within the pmsampsize R library and Stata modules.
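The three criteria can be sketched numerically. This is a minimal illustration under stated assumptions, not the pmsampsize implementation. It assumes that criterion (i) applies the binary-outcome formula of Riley et al., n = p / ((S - 1) ln(1 - R²_CS / S)), to each pairwise sub-model (category k vs category r), then divides by the anticipated proportion of participants in k or r, since only they inform that sub-model. Criterion (ii) is assumed to use the overall Cox-Snell R² with S chosen so that the optimism in Nagelkerke R² is at most δ. Criterion (iii) is assumed to require each category's proportion to be estimated within ±δ. The function names, the target shrinkage S = 0.9, δ = 0.05, and all input values are hypothetical.

```python
import math

def n_binary(p, r2_cs, shrinkage=0.9):
    """Binary-outcome criterion (Riley et al.): smallest n giving an expected
    uniform shrinkage factor of at least `shrinkage`, for p predictor
    parameters and an anticipated Cox-Snell R^2."""
    return p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))

def criterion_i(p, pair_r2, props, shrinkage=0.9):
    """Assumed multinomial criterion (i): apply the binary formula to each
    pairwise sub-model (k vs r), then scale up by the share of participants
    expected in categories k or r. The largest requirement wins."""
    return max(n_binary(p, r2, shrinkage) / (props[k] + props[r])
               for (k, r), r2 in pair_r2.items())

def criterion_ii(p_total, r2_cs, props, delta=0.05):
    """Assumed criterion (ii): keep apparent minus adjusted Nagelkerke R^2
    below delta, using the overall Cox-Snell R^2 of the multinomial model."""
    # Max attainable Cox-Snell R^2 = 1 - exp(2 * lnL_null / n), where
    # lnL_null / n = sum_k phi_k * ln(phi_k) for an intercept-only model.
    max_r2 = 1 - math.exp(2 * sum(phi * math.log(phi) for phi in props.values()))
    if not 0 < r2_cs < max_r2:
        raise ValueError("anticipated R^2_CS must lie in (0, max R^2_CS)")
    s = r2_cs / (r2_cs + delta * max_r2)  # shrinkage implied by delta
    return n_binary(p_total, r2_cs, s)

def criterion_iii(props, delta=0.05, z=1.96):
    """Assumed criterion (iii): estimate every category's overall
    proportion phi_k within +/- delta (95% CI half-width)."""
    return max((z / delta) ** 2 * phi * (1 - phi) for phi in props.values())

# Hypothetical inputs: three outcome categories, "A" as reference.
props = {"A": 0.5, "B": 0.3, "C": 0.2}
pair_r2 = {("B", "A"): 0.15, ("C", "A"): 0.10, ("B", "C"): 0.05}
p = 10  # predictor parameters per sub-model (hypothetical)

n_i = criterion_i(p, pair_r2, props)
n_ii = criterion_ii(p * (len(props) - 1), 0.15, props)
n_iii = criterion_iii(props)
n_min = math.ceil(max(n_i, n_ii, n_iii))
print(math.ceil(n_i), math.ceil(n_ii), math.ceil(n_iii), n_min)
```

In this hypothetical example the weakest pairwise contrast (B vs C, R² = 0.05) drives the minimum n under criterion (i), which comes out far larger than the requirement from criterion (ii), which is based on the overall R².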
