Empirical Decision Rules for Improving the Uncertainty Reporting of Small Sample System Usability Scale Scores

01/02/2021
by Nicholas Clark, et al.

The System Usability Scale (SUS) is a short, survey-based approach used to determine the usability of a system from an end-user perspective once a prototype is available for assessment. Individual scores are gathered using a 10-question survey, and the results are reported in terms of central tendency: the sample mean serves as the estimate of the system's usability (the SUS study score), while a confidence interval on the sample mean communicates the uncertainty associated with this point estimate. When the number of individuals surveyed is large, SUS study scores and the accompanying confidence intervals, which rely on the central limit theorem for support, are appropriate. However, when only a small number of users are surveyed, reliance on the central limit theorem falls short, yielding confidence intervals that violate the 0-100 parameter bounds of the SUS scale and interval widths that confound mappings to adjective and other constructed scales. These shortcomings are especially pronounced when the underlying SUS score data are skewed, as they are in many instances. This paper introduces an empirically based remedy for such small-sample circumstances, proposing a set of decision rules that select either an extended bias-corrected and accelerated (BCa) bootstrap confidence interval or an empirical Bayesian credible interval about the sample mean to restore subsequent confidence interval accuracy. Data from historical SUS assessments are used to highlight the shortfalls of current practice and to demonstrate the improvements these alternate approaches offer while remaining statistically defensible. A freely available online application that automates SUS analysis under these decision rules is introduced and discussed, assisting usability practitioners in adopting the advocated approaches.
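
To make the quantities in the abstract concrete, here is a minimal sketch in Python. It applies the standard SUS scoring rule (odd items contribute response - 1, even items contribute 5 - response; the sum is multiplied by 2.5 to reach the 0-100 scale) and then contrasts the conventional CLT-based t interval with a BCa bootstrap interval on the sample mean. The respondent data are hypothetical, and scipy's method='BCa' is the conventional BCa interval, not the paper's extended variant or its empirical Bayes alternative.

```python
# A minimal sketch, assuming raw responses on the standard 1-5 Likert scale.
# The sample data are hypothetical; scipy's BCa interval is the conventional
# one, not the extended variant proposed in the paper.
import numpy as np
from scipy import stats

def sus_score(responses):
    """Convert one respondent's ten 1-5 Likert answers to a 0-100 SUS score."""
    r = np.asarray(responses)
    odd = r[0::2] - 1    # items 1, 3, 5, 7, 9 contribute (response - 1)
    even = 5 - r[1::2]   # items 2, 4, 6, 8, 10 contribute (5 - response)
    return 2.5 * (odd.sum() + even.sum())

# Hypothetical small-sample study: 8 respondents' per-item answers.
raw = np.array([
    [4, 2, 4, 1, 5, 2, 4, 2, 5, 1],
    [5, 1, 4, 2, 4, 1, 5, 2, 4, 2],
    [3, 2, 3, 3, 4, 2, 3, 3, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [4, 2, 5, 2, 4, 2, 4, 1, 4, 2],
    [2, 4, 3, 3, 2, 4, 3, 3, 2, 4],
    [5, 2, 4, 1, 5, 1, 4, 2, 5, 1],
    [4, 1, 5, 2, 5, 2, 5, 1, 4, 1],
])
scores = np.array([sus_score(r) for r in raw])  # individual SUS scores
mean = scores.mean()                            # the SUS study score

# Conventional t-based interval (CLT-reliant); with small n and skewed data
# it can extend past the 0-100 bounds of the scale.
t_ci = stats.t.interval(0.95, df=len(scores) - 1,
                        loc=mean, scale=stats.sem(scores))

# BCa bootstrap interval on the sample mean (requires scipy >= 1.7).
bca = stats.bootstrap((scores,), np.mean, confidence_level=0.95,
                      method='BCa', random_state=0)

print(f"SUS study score: {mean:.1f}")
print(f"t interval:   ({t_ci[0]:.1f}, {t_ci[1]:.1f})")
print(f"BCa interval: ({bca.confidence_interval.low:.1f}, "
      f"{bca.confidence_interval.high:.1f})")
```

With small, skewed samples, the symmetric t interval is the one prone to the parameter-bound violations the abstract describes; the paper's decision rules determine when to replace it with the extended BCa or empirical Bayes interval instead.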
