Making SMART decisions in prophylaxis and treatment studies
The optimal prophylaxis for a disease, and the optimal treatment if that prophylaxis fails, may be best evaluated using a sequential multiple assignment randomised trial (SMART). A SMART is a multi-stage study that randomises a participant to an initial treatment, observes some response to that treatment and then, depending on the observed response, randomises the same participant to an alternative treatment. Response-adaptive randomisation may, in some settings, improve trial participants' outcomes and expedite trial conclusions compared to fixed randomisation. But 'myopic' response-adaptive randomisation strategies, which are blind to multistage dynamics, may also result in suboptimal treatment assignments. We propose a 'dynamic' response-adaptive randomisation strategy based on Q-learning, an approximate dynamic programming algorithm. Q-learning uses stage-wise statistical models and backward induction to incorporate late-stage 'payoffs' (i.e. clinical outcomes) into early-stage 'actions' (i.e. treatments). Our real-world example is a COVID-19 prophylaxis and treatment SMART with qualitatively different binary endpoints at each stage. Standard Q-learning cannot be applied to such data because it does not accommodate sequences of binary endpoints. Sequences of qualitatively distinct endpoints may also require different weightings to ensure that the design guides participants to the regimens with the highest utility. We describe how a simple decision-theoretic extension to Q-learning can be used to handle sequential binary endpoints with distinct utilities. Using simulation, we show that, under a set of binary utilities, the 'dynamic' approach increases expected participant utility compared to the fixed approach, sometimes markedly, for all model parameters, whereas the 'myopic' approach can actually decrease utility.
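The backward induction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it replaces the stage-wise statistical models with simple tabular estimates (posterior means under an assumed Beta(1, 1) prior), and the arm labels, counts and utility values are all hypothetical. How the resulting Q-values would be turned into randomisation probabilities is not stated in the abstract and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Utilities:
    """Hypothetical utilities for the three possible participant trajectories."""
    prevented: float = 1.0  # stage 1 success: prophylaxis prevents the disease
    recovered: float = 0.9  # prophylaxis fails, stage 2 treatment succeeds
    failed: float = 0.0     # prophylaxis fails, stage 2 treatment fails


def posterior_mean(successes: int, trials: int) -> float:
    """Posterior mean of a success probability under an assumed Beta(1, 1) prior."""
    return (successes + 1) / (trials + 2)


def backward_induction(stage1, stage2, utilities):
    """Compute stage 1 Q-values by working backwards from stage 2.

    stage1[a1]       = (number prevented, number randomised to a1)
    stage2[(a1, a2)] = (number recovered, number given a2 after a1 failed)
    Returns (q1, best_a2): expected utility of each prophylaxis, and the
    best follow-up treatment after each prophylaxis fails.
    """
    q1, best_a2 = {}, {}
    for a1, (prevented, n1) in stage1.items():
        # Stage 2 Q-values: expected utility of each treatment given a1 failed.
        q2 = {}
        for (prior_a1, a2), (recovered, n2) in stage2.items():
            if prior_a1 == a1:
                p_rec = posterior_mean(recovered, n2)
                q2[a2] = p_rec * utilities.recovered + (1 - p_rec) * utilities.failed
        best_a2[a1] = max(q2, key=q2.get)
        v2 = q2[best_a2[a1]]  # value of acting optimally at stage 2
        # Stage 1 Q-value: immediate payoff plus the stage 2 value on failure.
        p_prev = posterior_mean(prevented, n1)
        q1[a1] = p_prev * utilities.prevented + (1 - p_prev) * v2
    return q1, best_a2


# Hypothetical interim data: prophylaxis "A" prevents slightly more often,
# but participants who fail on "B" respond much better to treatment "X".
stage1 = {"A": (60, 100), "B": (55, 100)}
stage2 = {("A", "X"): (4, 20), ("A", "Y"): (6, 20),
          ("B", "X"): (18, 22), ("B", "Y"): (10, 23)}

q1, best_a2 = backward_induction(stage1, stage2, Utilities())

# A 'myopic' rule looks only at the stage 1 endpoint; the 'dynamic' rule
# uses the Q-values, which include the downstream stage 2 payoff.
myopic_choice = max(stage1, key=lambda a: posterior_mean(*stage1[a]))
dynamic_choice = max(q1, key=q1.get)

print("Stage 1 Q-values:", q1)
print("Best stage 2 treatment after each prophylaxis:", best_a2)
print("Myopic choice:", myopic_choice, "| dynamic choice:", dynamic_choice)
```

With these made-up counts the two rules disagree: the myopic rule favours the prophylaxis with the higher estimated prevention rate, while the dynamic rule favours the one with the higher expected utility once stage 2 outcomes are taken into account.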