Tutorial: a priori estimation of sample size, effect size, and statistical power for cluster analysis, latent class analysis, and multivariate mixture models

09/02/2023
by Edwin S. Dalmaijer, et al.

Before embarking on data collection, researchers typically compute how many individual observations they should collect. This is vital for running studies with sufficient statistical power, and is often a cornerstone of study pre-registrations and grant applications. For traditional statistical tests, one would typically determine an acceptable level of statistical power, (gu)estimate effect size, and then use both values to compute the required sample size. However, for analyses that identify subgroups, statistical power is harder to establish. Once sample size reaches a sufficient threshold, effect size is primarily determined by the number of measured features and the underlying subgroup separation. As a consequence, a priori computations of statistical power are notoriously complex. In this tutorial, I will provide a roadmap to determining sample size and effect size for analyses that identify subgroups. First, I introduce a procedure that allows researchers to formalise their expectations about effect sizes in their domain of choice, and to use these to compute the minimum required number of measured variables. Next, I outline how to establish the minimum sample size in subgroup analyses. Finally, I use simulations to provide a reference table for the most popular subgroup analyses: k-means, Ward agglomerative hierarchical clustering, c-means fuzzy clustering, latent class analysis, latent profile analysis, and Gaussian mixture modelling. The table shows the minimum number of observations per expected subgroup (sample size) and the minimum number of features (measured variables) needed to achieve acceptable statistical power, and can be readily used in study design.
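To make the simulation idea concrete, the following is a minimal sketch of a simulation-based power estimate for a two-subgroup k-means analysis. It is not the procedure from the tutorial itself. Everything named here is an assumption made for illustration: the helper functions, the spherical unit-variance Gaussian subgroups, the equal per-feature separation (so that the centroid distance `delta` equals the per-feature difference times the square root of the number of features), and the working definition of "power" as the proportion of simulated datasets in which at least 90% of observations are assigned to the correct subgroup.

```python
import math
import random


def simulate_two_groups(n_per_group, n_features, delta, rng):
    """Draw two spherical, unit-variance Gaussian subgroups whose
    centroids lie `delta` apart in Euclidean distance (assumption)."""
    shift = delta / math.sqrt(n_features)  # equal separation on every feature
    data, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            data.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            labels.append(label)
    return data, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(data, rng, n_iter=50):
    """Plain Lloyd's algorithm with k=2 and two random observations as seeds."""
    centroids = rng.sample(data, 2)
    assignment = None
    for _ in range(n_iter):
        new_assignment = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in data
        ]
        if new_assignment == assignment:
            break  # converged: no observation changed cluster
        assignment = new_assignment
        for k in (0, 1):
            members = [p for p, a in zip(data, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def accuracy(assignment, labels):
    """Proportion of correct assignments, allowing for label switching."""
    agree = sum(a == b for a, b in zip(assignment, labels)) / len(labels)
    return max(agree, 1.0 - agree)


def estimate_power(n_per_group, n_features, delta,
                   n_sims=200, min_accuracy=0.9, seed=1):
    """Proportion of simulated datasets in which 2-means recovers subgroup
    membership with at least `min_accuracy` (an illustrative criterion)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data, labels = simulate_two_groups(n_per_group, n_features, delta, rng)
        if accuracy(two_means(data, rng), labels) >= min_accuracy:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Same per-feature separation (1 SD), increasing number of features.
    for n_features in (1, 4, 9, 16):
        delta = 1.0 * math.sqrt(n_features)
        power = estimate_power(n_per_group=20, n_features=n_features, delta=delta)
        print(f"features={n_features:2d}  delta={delta:.1f}  power={power:.2f}")
```

Running the script prints the estimated power for a fixed per-feature separation as features are added, which is one way to explore the abstract's point that the number of measured features and the subgroup separation jointly drive the effect size. A real study design would swap in the clustering method, data-generating assumptions, and success criterion that fit the planned analysis.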

research
03/01/2020

Statistical power for cluster analysis

Cluster algorithms are gaining in popularity due to their compelling abi...
research
10/11/2022

Synthetic Power Analyses: Empirical Evaluation and Application to Cognitive Neuroimaging

In the experimental sciences, statistical power analyses are often used ...
research
09/16/2022

mpower: An R Package for Power Analysis via Simulation for Correlated Data

Estimating sample size and statistical power is an essential part of a g...
research
09/16/2020

Argus: Interactive a priori Power Analysis

A key challenge HCI researchers face when designing a controlled experim...
research
09/20/2022

Effects of Influential Points and Sample Size on the Selection and Replicability of Multivariable Fractional Polynomial Models

The multivariable fractional polynomial (MFP) procedure combines variabl...
research
08/27/2020

Analytical and statistical properties of local depth functions motivated by clustering applications

Local depth functions (LDFs) are used for describing the local geometric...
research
08/05/2019

Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances

This work presents a statistically principled method for estimating the ...
