Time-evolving psychological processes over repeated decisions

06/26/2019
by   David Gunawan, et al.

Many psychological experiments have participants repeat a simple task. This repetition is often necessary in order to gain the statistical precision required to answer questions about quantitative theories of the psychological processes underlying performance. In such experiments, time-on-task can have important and sizable effects on performance, changing the psychological processes under investigation in interesting ways. These changes are often ignored, and the underlying model is treated as static. We apply modern statistical approaches to extend a static model of decision-making to account for changes with time-on-task. Using data from three highly-cited experiments, we show that there are changes in performance with time-on-task, and that these changes vary substantially over individuals - both in magnitude and direction. Model-based analysis reveals how different cognitive processes contribute to the observed changes. We find strong evidence in favor of a first order autoregressive process governing the time-based evolution of individual subjects' model parameters. The central idea of our approach can be applied quite generally to quantitative psychological theories, beyond the model that we investigate and the experimental data that we use.



1 A New Approach

We propose advancing scientific approaches to time-on-task effects by incorporating recent advances in statistical estimation. The key idea is to model changes in performance over time by allowing the parameters of a cognitive model to change with time-on-task, but to impose constraints upon those parameters by employing time-varying (dynamic) statistical models. These constraints combine the advantages of previous approaches, such as coherently pooling information across different blocks, while avoiding some of the limitations, such as assuming independence between blocks, or assuming rigid functional forms for the time effects.

We illustrate and test our methods using data from three experiments: the one discussed above (Forstmann et al., 2008), and two reported by Wagenmakers et al. (2008). As has become standard in cognitive modelling, we adopt a Bayesian hierarchical approach to the analysis, which allows each participant to have a unique set of parameters for a cognitive model of the task, while constraining those parameters to follow a parametric distribution across participants. The extensions we develop here allow each individual participant’s parameters to change with time-on-task. We operationalize “time-on-task” by splitting the data from each participant into blocks of trials. This allows for different ways in which the effects of time-on-task might be incorporated into a psychological theory:

Independent hierarchy:

Applying multi-level hierarchical structures to general linear models has become standard in many areas of psychological research. Including a level for blocks of trials (to model time-on-task) allows for coherent pooling of information across blocks, and does not impose a rigid functional form on the time-varying parameters. However, the standard hierarchical approach assumes independence between the parameters from different blocks of trials, making the incorrect assumption that there is no “smoothness” in changes over blocks.

Regression models on the parameters:

The standard approach of assuming a fixed functional form for the change in parameters across time is powerful because it removes difficulties associated with blocks of trials that are small (since time-on-task can be a continuous covariate). The main limitation is the difficulty in choosing an appropriate functional form. We explore a simple extension which relaxes this limitation. Explicitly modelling residual variation in the block-by-block parameters allows for individual subjects, in individual time periods, to have parameters for the cognitive model which deviate from the deterministic function.

Random walk on the parameters:

Given its prevalence in cognitive modelling, it is natural to explore random walks (or other Markov processes) on the cognitive model’s parameters. In the simplest version of the random walk, the parameters in each time period (block of trials) differ from the parameters in the previous time period by a random draw from a static distribution. This allows for pooling of information across trials, and also models the dependence between data from nearby time periods. However, we do not explore this approach further because it makes the psychologically implausible prediction that the variance in parameters (across participants) grows infinitely large in time.

Autoregressive models:

These provide a simple way to model time series data, in which the parameters from one or more previous blocks are used in a linear regression model to predict the parameters for the next block. Autoregressive (or “AR”) models account for the dependence of parameters from nearby (in time) blocks, allow for coherent averaging of data across trials, and do not enforce rigid functional forms for the overall change in parameters with time-on-task.
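To make the contrast concrete, the following minimal simulation (ours, not from the original study) compares how the across-participant variance of a single parameter behaves under a random walk versus a stationary AR(1) process: the random walk’s variance grows without bound, while the AR(1) variance converges.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_blocks, phi = 500, 200, 0.9

# Random walk: theta_t = theta_{t-1} + noise.
# The across-subject variance grows linearly with the number of blocks.
rw = np.cumsum(rng.normal(0.0, 1.0, size=(n_subjects, n_blocks)), axis=1)

# AR(1): theta_t = phi * theta_{t-1} + noise (zero long-run mean here).
# The across-subject variance converges to 1 / (1 - phi**2).
ar = np.zeros((n_subjects, n_blocks))
for t in range(1, n_blocks):
    ar[:, t] = phi * ar[:, t - 1] + rng.normal(0.0, 1.0, size=n_subjects)

print("variance at final block, random walk:", rw[:, -1].var())  # ~ n_blocks
print("variance at final block, AR(1):      ", ar[:, -1].var())  # ~ 1/(1-0.9**2) = 5.26
```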

We explored the practical and scientific performance of these approaches in psychological data. This required developing sophisticated estimation approaches, described below. While the examples we provide are particular to the data and model we explore, both the statistical approach and the new estimation methods are quite general and likely to be of use in a wide range of investigations.

2 Methods

2.1 Modelling Approach

All three of the data sets considered here consist of simple decisions. It has become standard to model such data using evidence accumulation models (Ratcliff et al., 2016; Donkin & Brown, 2018). We adopt the linear ballistic accumulator (LBA; Brown & Heathcote, 2008), a well-established accumulator model that has previously been applied to data including those analyzed here. The LBA represents a decision between two options (such as “word” vs. “non-word” or “leftward motion” vs. “rightward motion”) as a race between two accumulators – see Figure 2. Each accumulator gathers evidence in favor of one of the two responses. The amount of evidence in each accumulator increases with passing decision time, until the evidence in one of the accumulators reaches a threshold amount, which triggers a decision response. The model makes quantitative predictions about the joint distribution over response times and decision outcomes. These predictions are specified by the values given to the model’s parameters: the height of the response threshold, the distributions of drift rates and starting points for evidence accumulation, and the amount of time taken by processes outside of the decision itself.

Figure 2: The linear ballistic accumulator (LBA) model of decision-making represents a choice as a race between two accumulators. The accumulators gather evidence until one of them reaches a threshold, which triggers a decision response. The time taken to make the decision is the time taken to reach threshold, plus a constant offset amount for processes unrelated to the decision itself. Variability arises from the starting point of the evidence accumulation process and the speed of the accumulation process, which both vary randomly from trial-to-trial and independently in each accumulator.
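As an illustration of the mechanics just described, here is a minimal sketch of how a single LBA trial can be simulated. The function name and interface are ours; the parameters follow the description in the text (threshold, start-point range, non-decision time, and drift-rate means), with the drift-rate standard deviation fixed for identifiability.

```python
import numpy as np

def simulate_lba_trial(b, A, tau, v_means, s=1.0, rng=None):
    """Simulate one LBA decision as a race between two ballistic accumulators.

    b: response threshold; A: width of the uniform start-point distribution;
    tau: non-decision time; v_means: mean drift rate for each accumulator;
    s: drift-rate standard deviation (fixed, to identify the model).
    """
    rng = rng or np.random.default_rng()
    start = rng.uniform(0.0, A, size=2)   # random start point per accumulator
    drift = rng.normal(v_means, s)        # random drift rate per accumulator
    with np.errstate(divide="ignore"):
        # Linear (ballistic) rise: accumulators with non-positive drift never finish.
        finish = np.where(drift > 0, (b - start) / drift, np.inf)
    winner = int(np.argmin(finish))       # first accumulator to reach threshold b
    # In rare trials where both drifts are non-positive, rt is inf;
    # real applications resample such trials.
    return winner, tau + finish[winner]   # (choice, response time)

choice, rt = simulate_lba_trial(b=1.2, A=0.6, tau=0.2, v_means=np.array([2.5, 1.0]))
print(choice, rt)
```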

Modern applications of the LBA model use a hierarchical Bayesian implementation. Mostly, these applications have used differential evolution Markov chain Monte Carlo (MCMC) to estimate the posterior distribution over parameters (Turner et al., 2013). In the current analyses, we used a more efficient estimation procedure based on particle MCMC methods developed by Gunawan et al. (2019), whose notation we follow. Decisions in all three experiments under consideration were always forced choices between two alternatives. For these, the decision for the participant contains two pieces of information. The first is the response choice, which we denote $RE_{ji}$ for trial $i$ of participant $j$. The second is the response time, which we denote $RT_{ji}$. If we assume that the variance of the drift rate distributions in the LBA model is fixed at 1 (see also Donkin et al., 2009), then the predictions of the model for any particular decision are defined by a vector of five parameters:

$$\alpha = \left(b, A, \tau, v^{(1)}, v^{(2)}\right).$$

Here, $b$ is the decision threshold adopted by the participant on the trial – the amount of evidence required to trigger a decision. Parameter $A$ is the width of the uniform distribution of start points for the same participant and trial, and $\tau$ is the amount of time taken by processes other than evidence accumulation (also called the “non-decision time”). The values $v^{(1)}$ and $v^{(2)}$ give the means of the drift rate distributions for the two racing accumulators. The model parameters are constrained by the experimental conditions in sensible ways. For example, the thresholds are allowed to be different for speed-emphasis vs. accuracy-emphasis trials, but they are constrained to be equal for all speed-emphasis trials. With the usual assumptions of independence, the conditional density of all the observations is

$$p(y \mid \alpha_{1:J}) = \prod_{j=1}^{J} \prod_{i=1}^{n} p(y_{ji} \mid \alpha_j), \qquad (1)$$

where $J$ is the number of subjects and $n$ is the number of decisions by each subject.

We transformed the model’s parameters to the full real line, which allows the assumption of a multivariate normal distribution at the group level. For this, we estimate $c = b - A$ instead of directly estimating $b$, which ensures that the decision threshold is always larger than the starting point of evidence accumulation. We log-transform all the parameters, which we write as

$$\alpha_j = \log\left(c_j, A_j, \tau_j, v_j^{(1)}, v_j^{(2)}\right),$$

where the log is applied separately to each model parameter. In the following, we will sometimes drop the $\log$ notation to improve readability, but we will note when this is done. The group distribution of participants’ individual parameters (i.e., random effects) is assumed to be multivariate normal, with mean $\mu_\alpha$ and covariance matrix $\Sigma_\alpha$:

$$\alpha_j \sim N(\mu_\alpha, \Sigma_\alpha).$$

As described by Gunawan et al. (2019), we estimate the full covariance matrix, which is advantageous when working with models such as the LBA, in which parameters tend to covary strongly.
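A small sketch of the transformation just described, with function names of our own choosing: estimate $c = b - A$ rather than $b$, and take logs so that all parameters live on the real line.

```python
import numpy as np

def to_real_line(b, A, tau, v1, v2):
    # Work with c = b - A instead of b, so the threshold always exceeds
    # the top of the start-point distribution; then log-transform everything.
    return np.log([b - A, A, tau, v1, v2])

def from_real_line(alpha):
    c, A, tau, v1, v2 = np.exp(alpha)  # invert the log transform
    return c + A, A, tau, v1, v2       # recover b = c + A

alpha = to_real_line(b=1.2, A=0.6, tau=0.2, v1=2.5, v2=1.0)
print(alpha)
print(from_real_line(alpha))
```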

2.1.1 Three Time-Varying LBA Models

We consider three extensions of the standard, static LBA model that allow time-varying parameters for individual participants, and we denote these extensions as “AR”, “trend”, and “AR+trend”. The dynamic models all assume that the parameters for any participant in any given block are constant, but that those parameters evolve over blocks. The three models differ in the way that this evolution occurs. The AR model assumes that parameters evolve as a lag-1 autoregressive process. The trend model assumes that parameters evolve according to a polynomial regression process with linear and quadratic trends. The AR+trend model combines both approaches:

  1. AR: This extension assumes a first order autoregressive (AR(1)) process for the random effects. For each subject, $j = 1, \ldots, J$, and for each block, $t = 2, \ldots, T$, the density of $\alpha_{jt}$ is

    $$\alpha_{jt} \mid \alpha_{j,t-1} \sim N\!\left(\mu_\alpha + \phi\left(\alpha_{j,t-1} - \mu_\alpha\right),\ \Sigma_\alpha\right), \qquad (2)$$
    $$\alpha_{j1} \sim N\!\left(\mu_\alpha,\ \kappa\,\Sigma_\alpha\right). \qquad (3)$$

    The autoregressive coefficient $\phi$ is scalar and is the same for all components of $\alpha$ and all subjects, to obtain parameter parsimony, and is constrained to the interval $(-1, 1)$ to ensure that distant time blocks become uncorrelated. Values of $\phi$ near 1 indicate a high correlation between adjacent blocks, while values of $\phi$ near zero mean that the blocks are essentially independent of each other. Equation (3) initializes the sequence by providing a prior distribution for $\alpha_{j1}$. The location of the prior distribution is $\mu_\alpha$ and its covariance matrix is $\kappa\,\Sigma_\alpha$, where $\kappa$ is a scalar. We believe that this is a sensible prior for $\alpha_{j1}$, and it is used in the work below.

    We assume the following priors for the parameters in Equations (2) and (3):

    $$\mu_\alpha \sim N(0, I_D), \qquad \Sigma_\alpha \sim IW(\nu_0, \Psi_0), \qquad (4)$$

    with $D$ the dimension of $\alpha_{jt}$; $IW$ denotes the inverse Wishart distribution. For the autoregressive parameter, $\phi$, we follow Kim et al. (1998) and use a beta prior on the transformed parameter $(1+\phi)/2$.

  2. Trend: This extension assumes a polynomial trend for the random effects, rather than just a constant mean. For each subject, $j = 1, \ldots, J$, and for each time block, $t = 1, \ldots, T$, the density of $\alpha_{jt}$ is

    $$\alpha_{jt} \sim N\!\left(\mu_\alpha + \delta_1 t + \delta_2 t^2,\ \Sigma_\alpha\right), \qquad (5)$$

    where $\delta_1$ and $\delta_2$ are vectors of linear and quadratic trend coefficients, with one element for each component of $\alpha$. The prior for the polynomial coefficients is a standard normal distribution, $\delta_k \sim N(0, I_D)$ for $k = 1, 2$, and the prior for $(\mu_\alpha, \Sigma_\alpha)$ is the same as in the AR model above, specified in Equation (4).

  3. AR+Trend: This extension is a combination of the AR(1) and the trend models, allowing both a polynomial trend and correlation between adjacent blocks. Thus, for each subject, $j = 1, \ldots, J$, and for each block, $t = 1, \ldots, T$, we write $\alpha_{jt} = \mu_{jt} + \tilde{\alpha}_{jt}$, where $\mu_{jt} = \mu_\alpha + \delta_1 t + \delta_2 t^2$ is defined as in the trend model and the density of $\tilde{\alpha}_{jt}$ is

    $$\tilde{\alpha}_{jt} \mid \tilde{\alpha}_{j,t-1} \sim N\!\left(\phi\,\tilde{\alpha}_{j,t-1},\ \Sigma_\alpha\right), \quad t = 2, \ldots, T, \qquad (6)$$
    $$\tilde{\alpha}_{j1} \sim N\!\left(0,\ \kappa\,\Sigma_\alpha\right). \qquad (7)$$

    Thus, $\alpha_{jt}$ is the sum of a trend $\mu_{jt}$ and a zero mean AR(1) process $\tilde{\alpha}_{jt}$. We can more succinctly write this as

    $$\alpha_{jt} \mid \alpha_{j,t-1} \sim N\!\left(\mu_{jt} + \phi\left(\alpha_{j,t-1} - \mu_{j,t-1}\right),\ \Sigma_\alpha\right), \quad t = 2, \ldots, T, \qquad (8)$$
    $$\alpha_{j1} \sim N\!\left(\mu_{j1},\ \kappa\,\Sigma_\alpha\right). \qquad (9)$$

    The priors for $\mu_\alpha$, $\delta_1$, $\delta_2$, $\Sigma_\alpha$, and $\phi$ are all as above for the AR and trend models. (A simulation sketch of all three dynamic processes follows this list.)
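The sketch below simulates one subject’s block-wise random effects under each of the three processes in Equations (2)–(9). It is a minimal illustration under assumed placeholder values for $\mu_\alpha$, $\Sigma_\alpha$, $\phi$, $\delta_1$, $\delta_2$, and $\kappa$; the function is ours, not from the paper’s code.

```python
import numpy as np

def simulate_random_effects(model, T, mu, Sigma, phi=0.0, d1=None, d2=None,
                            kappa=1.0, rng=None):
    """Draw alpha_{1:T} for one subject on the log scale.

    model: "ar", "trend", or "ar+trend". mu/Sigma: group mean and covariance.
    phi: AR(1) coefficient. d1/d2: linear/quadratic trend vectors.
    kappa: scalar inflating the covariance of the block-1 prior (Eqs. 3 and 7).
    """
    rng = rng or np.random.default_rng()
    D = len(mu)
    d1 = np.zeros(D) if d1 is None else np.asarray(d1)
    d2 = np.zeros(D) if d2 is None else np.asarray(d2)
    use_trend = model in ("trend", "ar+trend")
    trend = lambda t: mu + d1 * t + d2 * t**2 if use_trend else mu
    alpha = np.empty((T, D))
    for t in range(1, T + 1):
        if model == "trend":
            alpha[t - 1] = rng.multivariate_normal(trend(t), Sigma)      # Eq (5)
        elif t == 1:
            alpha[0] = rng.multivariate_normal(trend(1), kappa * Sigma)  # Eqs (3), (9)
        else:
            # Shrink the previous deviation from the trend by phi.
            mean = trend(t) + phi * (alpha[t - 2] - trend(t - 1))        # Eqs (2), (8)
            alpha[t - 1] = rng.multivariate_normal(mean, Sigma)
    return alpha

a = simulate_random_effects("ar", T=50, mu=np.zeros(7), Sigma=0.01 * np.eye(7), phi=0.95)
print(a.shape)
```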

2.1.2 Model Estimation

The statistical difficulties associated with reliably estimating time-varying cognitive models have previously presented barriers to the theory-based investigation of time-varying effects, including of practice, fatigue, and learning. In addition to the updated statistical modelling approaches outlined above, our investigations are made possible by the new estimation methods we have developed, which build on recent developments in statistical treatments for the (static) LBA model (Gunawan et al., 2019). Our estimation approaches use particle Markov chain Monte Carlo (MCMC) to estimate the time-varying random effects by defining an augmented parameter space which includes copies of all the model’s parameters, and the trajectories (history) of the particles representing these. This allows for efficient sampling, while maintaining the convergence properties necessary for inference. We also extend the importance sampling based method of Gunawan et al. (2019) to robustly and efficiently estimate the marginal likelihood of each dynamic model. This is an important addition, as the marginal likelihood supports model selection via Bayes factors, which is critical for answering the scientific questions of interest. We follow Tran et al. (2019) and call this method Importance Sampling Squared, or $IS^2$.

The Appendix gives full details of the estimation methods, including statistical considerations related to the new algorithms we have developed, and practical computational details. The code to implement the algorithms is available from osf.io/x29wb.

3 Simulation Study

We ran a simulation study to provide confidence that the methods we propose are able to estimate the time-varying LBA models with practical sample sizes. We simulated data from each of four models: the standard, static LBA model, and each of the three time-varying LBA variants introduced above. For each simulated data set, we used the $IS^2$ method to estimate the marginal likelihoods of the four models. If the methods work as expected, then the marginal likelihoods should favor the data-generating model, at least in large enough simulated data samples (i.e., a model recovery study).

We simulated data for a hypothetical experiment which mimics the design used by Forstmann et al. (2008); the real data from that study are investigated below. The simulated participants made repeated decisions about the motion direction of a random dot kinematogram. For each decision, participants were randomly cued to emphasize decision speed, decision accuracy, or adopt a neutral emphasis. The different emphasis conditions were modeled by setting different evidence threshold parameters in the LBA ($b^{(s)}$, $b^{(n)}$, $b^{(a)}$). The remaining model parameters were held constant over all conditions. The vector of random effects for subject $j$ during block $t$ is

$$\alpha_{jt} = \left(b^{(s)}_{jt}, b^{(n)}_{jt}, b^{(a)}_{jt}, A_{jt}, v^{(c)}_{jt}, v^{(e)}_{jt}, \tau_{jt}\right). \qquad (10)$$

The parameters $v^{(c)}$ and $v^{(e)}$ refer to the means of the drift rate distributions for the two accumulators. The superscript $(c)$, for “correct”, is for the accumulator where the associated response matches the actual stimulus direction, and $(e)$, for “error”, is for the accumulator where the associated response does not match the actual stimulus direction. Actual estimation (sampling) was performed on the log of the vector in (10).

We simulated data from $J = 19$ participants and investigated two different ways of trading off the size of the time periods (blocks) vs. the number of blocks. In both cases, the total number of simulated trials per participant was 2,000, but this was either split into many smaller blocks ($T = 100$ time periods of $n = 20$ trials each) or fewer larger blocks ($T = 50$ time periods of $n = 40$ trials each). In each case, we generated simulated data in turn from the four different models.

When analyzing the simulated data in the model recovery study, we used the particle Metropolis-within-Gibbs (PMwG) method (see the Appendix) to generate samples from the posterior distributions. Sampling proceeded in three stages: an initial stage whose iterates were discarded as burn-in; an adaptation stage whose iterates were used to construct efficient proposal densities for the final sampling stage, including estimates of the covariance matrix; and a final sampling stage in which the MCMC posterior draws were retained. When estimating each of the time-varying LBA models, we used the same values of $n$ and $T$ which were used to generate the synthetic data, and when the synthetic data were generated by the static LBA model, we used the same two block structures when estimating the dynamic models.

We used the $IS^2$ method to estimate the marginal likelihood for each of the LBA models. Appendix A discusses the mixtures of normal proposals that were used to obtain the proposal distributions for the parameters and random effects (see particularly the material around Equation 26). We then ran the $IS^2$ algorithm to estimate the log of the marginal likelihood, using a fixed number of particles to obtain the unbiased estimate of the likelihood for each importance sample. The Monte Carlo standard errors of the log of the marginal likelihood estimates were obtained by bootstrapping the importance samples. More detail about the $IS^2$ method is provided by Tran et al. (2019).
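The core of the $IS^2$ estimator can be sketched as follows: each importance sample for the parameters carries a log weight built from an unbiased (particle-based) likelihood estimate, and the log marginal likelihood is the log of the mean weight, with a bootstrap standard error. The function below is our own illustrative implementation, not the authors’ code.

```python
import numpy as np

def log_marginal_likelihood_is2(log_weights, n_boot=1000, rng=None):
    """Combine IS^2 log importance weights into a log marginal likelihood.

    Each element of log_weights should equal
    log p_hat(y | theta_m) + log p(theta_m) - log q(theta_m),
    where p_hat is an unbiased (e.g., SMC-based) likelihood estimate and q is
    the proposal for theta. Returns (estimate, bootstrap standard error).
    """
    rng = rng or np.random.default_rng()

    def log_mean_exp(lw):
        m = lw.max()  # stabilize the exponentials before averaging
        return m + np.log(np.mean(np.exp(lw - m)))

    estimate = log_mean_exp(log_weights)
    boot = np.array([
        log_mean_exp(rng.choice(log_weights, size=log_weights.size, replace=True))
        for _ in range(n_boot)
    ])
    return estimate, boot.std()

# Toy check with synthetic weights centered on a known value.
lw = np.log(1e-3) + np.random.default_rng(1).normal(0.0, 0.5, size=10_000)
print(log_marginal_likelihood_is2(lw))
```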

Table 1 summarizes the results of the model recovery simulation, reporting the log of the estimated marginal likelihood for each of the four models (columns) when applied to data generated by each of the four models (rows). The entries in each row have had the entry for the data-generating model subtracted, so that a negative entry indicates less evidence for the model in question than for the data-generating model. In each row, the highest value is zero (for the data-generating model), which suggests that model selection via the marginal likelihood recovers the data-generating model in all simulated comparisons. Of particular importance is the first row, which shows that the models do not give “false positive” results. The first row shows model performance when data were simulated from a static LBA model, and in that row the static LBA model was preferred by a large Bayes factor. This was the case even though the simulated data naturally always contain some variability across time periods, due to sampling error. The model selection results confirm that this variability is correctly attributed to random sampling error and not to a dynamic generating process.

Generating model    n    T      Static   AR   Trend   AR+trend
Static              –    –
AR                  20   100
                    40   50
Trend               20   100
                    40   50
AR+Trend            20   100
                    40   50
Table 1: Model selection results for the simulation study. Rows represent the different models used to generate the data, and also different balances between the number of trials per block ($n$) and the number of blocks (time periods, $T$). The columns correspond to the different models estimated from the simulated data. Each entry shows the difference in the log of the estimated marginal likelihood for the estimated model relative to the data-generating model, with negative values indicating poorer performance than the data-generating model, i.e., a smaller marginal likelihood estimate. Standard errors of the log of the marginal likelihood estimates were all smaller than 1.1, and are omitted for clarity.

We show for the AR model that our methods recover the data-generating parameters. (For brevity, we report parameter recovery for just one of the models; we selected the AR model because it is the preferred model in data – see below.) Table 2 shows the mean group-level parameters ($\mu_\alpha$) and the corresponding group-level covariance matrix ($\Sigma_\alpha$). Each row represents a parameter of the AR model, and shows the data-generating value along with the credible interval estimated from simulated data. For six of the seven mean parameters ($\mu_\alpha$), the estimated credible interval includes the data-generating parameter; the remaining data-generating value falls just below its estimated credible interval. For five of the seven between-subject variance parameters (diagonal elements of $\Sigma_\alpha$), the credible interval includes the data-generating value; of the remaining two, one data-generating value falls just above its estimated credible interval and the other just below. The off-diagonal elements of $\Sigma_\alpha$, which describe the covariances between parameters, are mostly well recovered as well (17 of the 21 data-generating values are within the estimated credible intervals). Given the small sample size (just 19 participants) in this simulation study, this parameter recovery performance is good.

While Table 2 shows that the group-averaged parameters can be recovered, it is also interesting to know whether the methods recover time-changing random effect parameters for individual subjects. Figure 3 addresses this question by showing the block-by-block data-generating parameters for three example participants, overlaid with the estimated posterior distributions. In all cases, the data-generating values (solid lines) are closely tracked by the estimated values (transparently shaded regions), perhaps with the exception of the drift rate for the accumulator corresponding to the wrong response (orange series in the middle column). For completeness, a corresponding figure with data for all 19 simulated participants is available at osf.io/x29wb.

Table 2: Parameter estimation results for the simulation study. Each row represents one parameter of the AR model. Each cell shows the data-generating value of the parameter and the 95% credible interval estimated using the PMwG method. The left-most column corresponds to the means of the parameters ($\mu_\alpha$) and the remaining columns show the covariance matrix ($\Sigma_\alpha$ – omitting the upper triangle to avoid redundancy). All values are for the log-transformed parameters.
Figure 3: Recovery of the response threshold, drift rate, and non-decision time parameters (columns) for three sample participants from the simulation study. Shaded regions in each panel show 50% credible intervals of the column-named parameters for block-wise estimates of the AR model. Lines show the data-generating values. Parameters have been transformed back to their standard definitions by exponentiating the log-scale values.

4 Data

We used the psychological theories and statistical methods developed above to address questions about how participants’ decision-making processes changed with time-on-task in three experiments. The first two experiments investigated speed-accuracy tradeoffs, by manipulating the balance between caution and urgency. The third experiment investigated how people bias their decisions in response to changes in environmental base rates – more vs. less frequent presentations of some stimulus classes than others. In each data set, we used the models to establish whether or not there was evidence for changes in the cognitive processes of decision-making over time. We further investigated this question by comparing the different models for time-on-task, to better understand how the changes evolve with time.

Forstmann et al. (2008) had 19 participants make repeated decisions about the direction of motion shown in a random dot kinematogram. We analyzed data from the behavioral training (pre-scanning) session, in which each participant made 840 decisions distributed evenly over three conditions. Those conditions changed the instructions given to participants about whether they should emphasize the speed of their decisions, the accuracy of their decisions, or adopt a “neutral” balance between speed-emphasis and accuracy-emphasis. The data revealed large changes in both the speed (response time) and accuracy of decisions between speed-emphasis and accuracy-emphasis conditions, but only small differences between the accuracy-emphasis and neutral-emphasis conditions. See p. 17541 of the original article for full details of the method.

To model the decisions in these data, we followed the same LBA specification as used in the original article and confirmed subsequently by Gunawan et al. (2019). This specification is the one used in the simulation study above (see Equation (10)). The modelling collapses across left- and right-moving stimuli, forcing the same mean drift rate for the accumulator corresponding to a “right” response to a right-moving stimulus as for the accumulator corresponding to a “left” response to a left-moving stimulus; we denote this mean drift rate by $v^{(c)}$. Similarly, drift rates for the accumulators corresponding to the wrong direction of motion are constrained to be equal and denoted by $v^{(e)}$. Three different response thresholds were estimated, for the speed, neutral, and accuracy conditions: $b^{(s)}$, $b^{(n)}$, and $b^{(a)}$, respectively. Two other parameters were estimated: the time taken by non-decision processes ($\tau$) and the width of the uniform distribution for start points in evidence accumulation ($A$). To investigate time-on-task we divided the trials into blocks which matched the experimental procedure; block sizes varied slightly around their average because of a few missing data, and the number of blocks was the same for each subject.

Wagenmakers et al.’s (2008) Experiment 1 had 17 participants make decisions about whether letter strings were valid English words (e.g., “RACE”) or non-words (e.g., “RAXE”). Each participant made repeated decisions about letter strings. Half of the letter strings were non-words. The other half were divided across three types of words: high frequency words, which are very common in written English (e.g., “ROAD”); low frequency words, which are uncommon (e.g., “RITE”); and very low frequency words, which are extremely rare (e.g., “RAME”). In addition to this manipulation, decisions were arranged into blocks of 96 decisions (trials) each. The instructions given to participants changed from block to block: in alternate blocks, participants were instructed to emphasize the accuracy or the speed of their decisions. See p. 144 of the original article for full details of the method. We used the experimenter-defined blocks to investigate the effects of time-on-task, with the full number of trials in each block except for a very small number of missing trials for some participants.

To describe the lexical decisions with the LBA model, we made two assumptions: that the speed-emphasis vs. accuracy-emphasis manipulation influenced only threshold settings; and that the different stimulus categories (word frequency) influenced only the means of the drift rate distributions. Assumptions like these are sometimes called “selective influence” assumptions, and are important for the psychological interpretation of the theory (Ratcliff & Rouder, 1998; Voss et al., 2004). These assumptions result in the following random effects for participant $j$ in block $t$:

$$\alpha_{jt} = \left(b^{(acc)}_{jt}, b^{(spd)}_{jt}, A_{jt}, v^{(hf,W)}_{jt}, v^{(hf,NW)}_{jt}, v^{(lf,W)}_{jt}, v^{(lf,NW)}_{jt}, v^{(vlf,W)}_{jt}, v^{(vlf,NW)}_{jt}, v^{(nw,W)}_{jt}, v^{(nw,NW)}_{jt}, \tau_{jt}\right), \qquad (11)$$

such that independent mean drift rates ($v$) were estimated for each combination of stimulus class (hf, lf, vlf, nw) and response accumulator (W, NW), with separate thresholds for accuracy-emphasis ($b^{(acc)}$) and speed-emphasis ($b^{(spd)}$) blocks.

Wagenmakers et al.’s (2008) Experiment 2 was very similar in design to their Experiment 1, with 19 new participants. The important change for Experiment 2 was that the alternating blocks of speed-emphasis vs. accuracy-emphasis from Experiment 1 were replaced with alternating blocks in which either non-words or words appeared more often. In any one block of 96 trials, there were either 24 words and 72 non-words, or 72 words and 24 non-words. Blocks alternated between non-word-dominated and word-dominated. Participants were reminded, before each block began, which kind of stimulus string would be most common in the upcoming block of trials. For full details of the method, see p. 152 of the original article. We again used the experimenter-defined blocks of $n = 96$ trials to investigate the effects of time-on-task.

To describe the lexical decisions in Experiment 2, we used a similar model specification as for Experiment 1, with one change to capture the difference between the experiments (from a speed-accuracy to a bias manipulation). In modelling Experiment 2, we allowed for response bias, expressed as different decision thresholds in the accumulators corresponding to “word” ($b^{(W)}$) and “non-word” ($b^{(NW)}$) responses. These were allowed to be different between the blocks dominated by word stimuli ($w$) and non-word stimuli ($nw$), following the hypothesis that the base rate of the stimulus classes should influence participants’ biases. These assumptions result in the following random effects, for participant $j$ in block $t$:

$$\alpha_{jt} = \left(b^{(W,w)}_{jt}, b^{(NW,w)}_{jt}, b^{(W,nw)}_{jt}, b^{(NW,nw)}_{jt}, A_{jt}, v^{(hf,W)}_{jt}, v^{(hf,NW)}_{jt}, v^{(lf,W)}_{jt}, v^{(lf,NW)}_{jt}, v^{(vlf,W)}_{jt}, v^{(vlf,NW)}_{jt}, v^{(nw,W)}_{jt}, v^{(nw,NW)}_{jt}, \tau_{jt}\right). \qquad (12)$$

5 Results

Data from the three experiments, combined with the new time-varying LBA models, permit the investigation of interesting scientific questions which have previously been difficult or intractable. The primary theoretical question is whether there is evidence for changes in the parameters of cognitive processing with time-on-task, for each participant. This question is directly addressed by seeing whether the time-varying models, which allowed random effect parameters to evolve with time-on-task, have higher marginal likelihoods than the original static LBA model. The primary theoretical question opens up several subsidiary questions about the ways in which parameters might evolve with time-on-task. These questions are addressed by comparing the fit of different dynamic models, and by inspecting the estimated values of the model parameters related to time-on-task. For example, it is possible that time-on-task causes steady changes in decision-making processes, due to increasing fatigue or practice. On the other hand, it is also possible that time-on-task leads to non-smooth changes, for example if participants fluctuate between states that are more “on-task” and “off-task” (“mind wandering”: Mittner et al., 2016). Our modelling approach allows these hypotheses to be investigated via the posterior distributions of the parameters related to the time-on-task models ($\phi$ from Equation 2; $\delta_1$ and $\delta_2$ from Equation 5). A third important theoretical question concerns differences between estimated values for the parameters which are common to the static LBA model and the time-varying LBA models. That is to say, has the long-standing assumption of parameter stability systematically influenced parameter estimates?

We determined which model was best supported by the data by estimating the marginal likelihood using the $IS^2$ method, with the same settings as above. Table 3 reports the model-selection results for data from the three experiments. For all three experiments, the preferred model was the AR model, which describes time-on-task effects as a first order autoregressive process over blocks. The differences between the marginal likelihoods are large compared to the scales usually used for such comparisons (they correspond to very large Bayes factors for all pairwise comparisons). The Monte Carlo errors of the log of the estimated marginal likelihoods (in parentheses in Table 3) are also small, confirming the efficiency of the $IS^2$ method in real data. For all three experiments, the static (standard) LBA was the least-preferred account of the data, with the lowest marginal likelihood.

                                    Static   AR   Trend   AR+Trend
Forstmann et al. (2008)
Wagenmakers et al. (2008) Exp. 1
Wagenmakers et al. (2008) Exp. 2
Table 3: Model selection results for data from the three experiments: log of the marginal likelihoods and bootstrap-estimated standard errors (in parentheses).

The model selection results clearly favor the AR model over the others, but this does not necessarily imply that the model provides a good account of the data. Rather, the AR model may simply be the least-bad of a set of bad models. To examine this, we compared posterior predictive data generated from the AR model against the observed data, using three key summary statistics: the proportion of correct responses, the mean RT, and the standard deviation of RT. Figure 4 shows these comparisons, broken down by time-on-task (blocks) and by the important experimental conditions, but averaged across participants. The model captures the data quite well. In particular, the AR model accommodates the changes with block in the mean and standard deviation of RT for all experiments. The model also accommodates changes in the proportion of correct responses (i.e., accuracy) in the data reported by Forstmann et al. (2008), but under-predicts average accuracy for both experiments reported by Wagenmakers et al. (2008). The tightly-constrained selective influence assumptions which were imposed on the model nevertheless allowed it to accommodate the differences between conditions, although there is a tendency to under-predict the difference between conditions in the standard deviation of RT for the data from Wagenmakers et al. (2008).

Figure 4: Data (solid lines) from three highly-cited experiments: one reported by Forstmann et al. (2008) and two reported by Wagenmakers et al. (2008), shown in columns. The top row shows that average accuracy decreased with time-on-task (x-axis) in Forstmann et al.’s experiment and, to a lesser extent, in Experiment 1 of Wagenmakers et al. (2008), but not in Experiment 2. The second row shows that mean response time (RT) increased with practice in one experiment and decreased in the other two, and the third row shows that the standard deviation of RT either increased, decreased, or was approximately static. Posterior predictive data from the AR model are overlaid (dashed lines). In all cases, the model captures the time-varying trends present in the group-level data.

In addition to providing a good fit at the level of averaged data, the AR model also provides a good fit to individual subjects’ data. Figures 5 and 6 show – just for one of the three experiments – that trends in mean RT and accuracy are accommodated across blocks and between conditions, for most participants. Corresponding figures for the other two experiments are shown in Appendix B.

Figure 5: Mean RT from Experiment 1 of Wagenmakers et al. (2008) separately for each individual participant (panels). The red and green lines show performance in the accuracy-emphasis and speed-emphasis conditions, across time-on-task (blocks; x-axis). Posterior predictive data from the AR model are shown by dashed lines. In all cases, the model captures the time-varying trends present in the participant-level data.
Figure 6: Decision accuracy from Experiment 1 of Wagenmakers et al. (2008) separately for each individual participant (panels). The red and green lines show performance in the accuracy-emphasis and speed-emphasis conditions, across time-on-task (blocks; x-axis). Posterior predictive data from the AR model are shown by dashed lines. In almost all cases, the model captures the time-varying trends present in the participant-level data.

Figure 7 illustrates how the estimated parameters of the cognitive process change with time-on-task for the data from Forstmann et al. (2008). For three example individuals (rows in the figure), the columns of the figure show how the response threshold parameters ($b$), the drift rate parameters ($v$), and the non-decision time parameter ($\tau$) change over blocks. In each panel, the shaded regions show the 50% credible intervals estimated from the AR model across blocks (and across conditions, for the threshold parameters, and stimulus types for the drift rate parameters). There is substantial variation in the individual-level parameters with time-on-task. For example, one of the example participants almost doubles their response thresholds over the course of the experiment. Over the same time, their ability to distinguish between correct and incorrect responses (purple and orange in the middle panel, respectively) declines markedly over blocks, while the time taken for their non-decision processing decreases by about 25%. There is also substantial variation between participants – for example, the drift rate estimates reveal approximately constant sensitivity to the difference between correct and incorrect responses for one participant, while this sensitivity decreases substantially for another, and increases slightly for the third.

Figure 7 also includes the estimated parameters for the same three participants from the standard, static LBA model – these are the intervals shown just off the right-hand end of each x-axis. These estimates do not always match the naive expectation that the static model should estimate parameters that are some average of the true time-varying parameters. In around half of the cases, the static model estimates a parameter outside the range of the block-by-block parameter estimates from the AR model. This effect is most marked for the non-decision time parameters – for all three participants, the estimate from the static LBA is faster than the estimate for any block from the AR model. The effect on non-decision time is most pronounced for one participant, who also shows similar effects for drift rate and threshold estimates, all of which are higher in the static LBA than for any block-wise estimate of the AR model.

Figures 8 and 9 illustrate similar parameter estimates for three individual participants each from Experiments 1 and 2, respectively, reported by Wagenmakers et al. (2008). For completeness, corresponding figures showing estimates from all participants in all three studies are available at osf.io/x29wb.

Figure 7: Estimates of the response threshold, drift rate, and non-decision time parameters (columns) for three sample participants from Forstmann et al. (2008; rows). Shaded regions in each panel show 50% credible intervals of the column-named parameters for the static LBA (rightmost shaded region) and the time-varying block-wise estimates of the AR model (leftmost shaded region that evolves over blocks). Estimates from the standard LBA are not always related to the central tendency of the block-wise estimates of the AR model, which suggests that failing to account for time-varying trends in data may systematically bias parameter estimates. Parameters have been transformed back to their standard definitions (not log scale).
Figure 8: Estimates of the response threshold, drift rates (for word and non-word response accumulators), and non-decision time parameters (columns) for three sample participants from Experiment 1 of Wagenmakers et al. (2008; rows). Shaded regions in each panel show 50% credible intervals of the column-named parameters for the static LBA (rightmost shaded region) and the time-varying block-wise estimates of the AR model (leftmost shaded region that evolves over blocks). Parameters have been transformed back to their standard definitions (not log scale).
Figure 9: Estimates of the response thresholds (for word-dominated and non-word-dominated blocks), drift rates (for word and non-word response accumulators), and non-decision time parameters (columns) for three sample participants from Experiment 2 of Wagenmakers et al. (2008; rows). Shaded regions in each panel show 50% credible intervals of the column-named parameters for the static LBA (rightmost shaded region) and the time-varying block-wise estimates of the AR model (leftmost shaded region that evolves over blocks). Parameters have been transformed back to their standard definitions (not log scale).

We investigated the differences between the estimated parameters from the static LBA and the AR model across the entire sample of participants reported by Forstmann et al. (2008). Table 4 compares the average parameters ($\mu_\alpha$) as well as the between-participants variance of each parameter (diagonal elements of $\Sigma_\alpha$, in parentheses). The estimated between-participants variance is smaller for the time-varying model in every case. This is consistent with the hypothesis that the static model fails to capture some of the variability in parameters from block to block, and that this variability instead inflates the across-subjects variance estimates. There were also differences in the estimated means. The AR model led to slower estimates of non-decision time, corresponding to approximately an extra 40 ms spent in non-decision time on average. The estimated decision thresholds were lower in the AR model than in the standard LBA, but the important differences between conditions were similar. The autoregression parameter ($\phi$) for the AR model was estimated to be quite large, with the posterior distribution concentrated on values near 1. This indicates a “sticky” process, in which random effects are highly correlated from one time period (block of trials) to the next. This correlation emphasizes the importance of modelling the smooth changes in decision processes over time-on-task.

Parameter   Static   AR
Table 4: Group-level parameter estimates from Forstmann et al.’s (2008) data. Each row represents a different model parameter, and shows the mean of the posterior distribution for the log-scaled parameter ($\mu_\alpha$), and the mean of the posterior distribution for the corresponding diagonal element of the covariance matrix ($\Sigma_\alpha$, in parentheses). These are shown for the static LBA and the preferred autoregressive model (AR). Estimated between-subjects parameter variance is smaller for the AR model than for the standard LBA model.

The model selection clearly favored the time-varying models over the static LBA, and further favored the AR model over the other time-varying models which included polynomial trend components. Table 5 shows the parameter estimates for the linear and quadratic trend components of those models. The trend estimates are mostly smaller in the AR+trend model, which also includes an autoregressive component. This is consistent with the hypothesis that an AR type of process is part of the data-generating system. There is some evidence for linear trends in the threshold parameters of the model, but little evidence for reliable quadratic effects in any parameters.

            Linear ($\delta_1$)     Quadratic ($\delta_2$)
Parameter   Trend   AR+Trend        Trend   AR+Trend
Table 5: Posterior means (and standard deviations) for the linear and quadratic trend parameters estimated from Forstmann et al.’s (2008) data. Each row represents a different model parameter and shows the linear and quadratic trend estimates from the trend and AR+trend models.

6 Discussion

Human behavior is fundamentally non-stationary. The improvements gained through repeated practice are often large, and are reliably observed across a wide range of behavior, from very simple counting tasks to complex tasks with multiple stages (Newell & Rosenbloom, 1981; Evans et al., 2018; Wynton & Anglim, 2017). In other cases, time-on-task leads to steady decreases in performance, via fatigue (Van der Linden et al., 2003; Dorrian et al., 2007), and in yet other cases to alternating periods of good and bad performance, caused by mind-wandering (so-called “task unrelated thoughts”: Mittner et al., 2016; Giambra, 1995). In many cases – such as those cited above – cognitive changes with time-on-task are a central focus of the research activity, and are directly investigated.

However, in the vast majority of cases, changes with time-on-task are ignored. When off-the-shelf analysis tools such as ANOVA are applied, the data are routinely aggregated across time-on-task. More importantly, the same approximation is usually made even when more detailed analytic theories are constructed. Relevant to our investigation, there have been dozens or perhaps hundreds of model-based analyses of the cognitive processes underlying simple decision-making in recent years. These analyses include applied studies which reveal how experimental manipulations or person-based variables are associated with differences in decision-making due to changed caution, sensitivity, or other model parameters, while other studies focus on the development, elaboration, or extension of the decision-making theories themselves (for review, see Ratcliff et al., 2016). However, in almost all cases, the data are treated as if time-on-task has no effect, and the models are developed to predict unchanging and independent decisions.

Our work provides a pathway for resolving the obvious discrepancy between the reliably observed effects of time-on-task and the psychological theories which mostly ignore them. We augment a standard static model of decision-making, the linear ballistic accumulator, with a process that allows for noisy evolution of the model parameters with time-on-task, separately for each individual participant. Three different variants of the augmented time-varying LBA were compared with the standard LBA, using data from three well-cited experiments. Model comparison via Bayes factors yielded strong support for the time-varying models over the static model. The strongest support was observed for the dynamic model variant which assumes that time-on-task influences model parameters according to a first order autoregressive process (AR). This represents a psychological process in which model parameters evolve slowly, via a “sticky” process: during each period of the task, the model parameters for each subject are a combination of the parameters from the previous period and a long-run average. Autoregressive models are seldom used in theoretical psychology (Heath, 2014), and our results suggest that they should be considered more widely.

The effects of time-on-task were clear in group data, but were even more substantial in individuals. The same components of decision-making changed with time-on-task by different amounts, and sometimes in different directions, for different individuals. For example, Figure 7 shows that the caution of responses (threshold parameters) tended to increase with time-on-task, but that decision sensitivity increased for some participants and decreased for others. Such effects are not easily accounted for or investigated without a detailed model of time-varying effects. Previous investigations of decision-making, which have assumed stationary statistical models, have most likely estimated inflated levels of between-subject error variance by mis-attributing these individual differences in the effects of time-on-task. Our comparison with the standard LBA model reveals just that, with greater precision for between-subject parameter distributions in the time-varying model than in the standard static model. In addition to increased error variance, it also seems likely that traditional models have suffered from bias in parameter estimates by failing to account for the effects of time-on-task. Some of the parameters estimated for the static LBA were systematically different from the corresponding values estimated using the time-varying model. This speaks against the naive hope that the static model will simply estimate parameters that are the across-time average of the parameters from the time-varying model.

Our modelling approach is likely to be useful beyond the specific application to the LBA model, and beyond decision-making research more generally. The central idea involves incorporating statistically tractable models for time series as descriptions of the changes in model parameters for individual subjects with time-on-task. This approach presents some estimation challenges, which are surmountable for many psychological theories, given recent advances in statistical and computational approaches. We hope that our approach will be used in two ways: as a way to routinely model time-on-task effects in simple decision-making; and more generally to extend quantitative theories of cognition to investigate interesting effects such as those of practice, learning, and fatigue.

References

  • Andrieu, C., Doucet, A., & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B, 72(3), 269–342.
  • Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57(3), 153–178.
  • Brown, S. D., Marley, A. A. J., Donkin, C., & Heathcote, A. (2008). An integrated model of choices and response times in absolute identification. Psychological Review, 115, 396–425.
  • Bunch, P., Lindsten, F., & Singh, S. S. (2015). Particle Gibbs with refreshed backward simulation. IEEE International Conference on Acoustics, Speech, and Signal Processing.
  • Chib, S., & Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96(453), 270–281.
  • Craigmile, P. F., Peruggia, M., & Van Zandt, T. (2010). An autocorrelated mixture model for sequences of response time data. Psychometrika, 75, 613–632.
  • Donkin, C., & Brown, S. D. (2018). Response times and decision-making. Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, Methodology, 349.
  • Donkin, C., Brown, S. D., & Heathcote, A. (2009). The overconstraint of response time models: Rethinking the scaling problem. Psychonomic Bulletin & Review, 16(6), 1129–1135.
  • Dorrian, J., Roach, G. D., Fletcher, A., & Dawson, D. (2007). Simulated train driving: Fatigue, self-awareness and cognitive disengagement. Applied Ergonomics, 38(2), 155–166.
  • Evans, N. J., Brown, S. D., Mewhort, D. J., & Heathcote, A. (2018). Refining the law of practice. Psychological Review, 125(4), 592.
  • Evans, N. J., & Hawkins, G. E. (2019). When humans behave like monkeys: Feedback delays and extensive practice increase the efficiency of speeded decisions. Cognition, 184, 11–18.
  • Forstmann, B. U., Dutilh, G., Brown, S., Neumann, J., Von Cramon, D. Y., Ridderinkhof, K. R., & Wagenmakers, E.-J. (2008). Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences, 105(45), 17538–17542.
  • Giambra, L. M. (1995). A laboratory method for investigating influences on switching attention to task-unrelated imagery and thought. Consciousness and Cognition, 4, 1–21.
  • Gunawan, D., Hawkins, G. E., Tran, M. N., Kohn, R., & Brown, S. D. (2019). New estimation approaches for the linear ballistic accumulator model. Under review.
  • Heath, R. A. (2014). Nonlinear dynamics: Techniques and applications in psychology. Psychology Press.
  • Heathcote, A., Brown, S., & Mewhort, D. J. K. (2000). The power law repealed: The case for an exponential law of practice. Psychonomic Bulletin & Review, 7, 185–207.
  • Hesterberg, T. (1995). Weighted average importance sampling and defensive mixture distributions. Technometrics, 37, 185–194.
  • Kim, S., Potter, K., Craigmile, P. F., Peruggia, M., & Van Zandt, T. (2017). A Bayesian race model for recognition memory. Journal of the American Statistical Association, 112(517), 77–91.
  • Kim, S., Shephard, N., & Chib, S. (1998). Stochastic volatility: Likelihood inference and comparison with ARCH models. The Review of Economic Studies, 65(3), 361–393. doi:10.1111/1467-937X.00050
  • Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5, 1–25.
  • Mittner, M., Hawkins, G. E., Boekel, W., & Forstmann, B. U. (2016). A neural model of mind wandering. Trends in Cognitive Sciences, 20, 570–578.
  • Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), Cognitive Skills and Their Acquisition (pp. 1–55). Hillsdale, NJ: Erlbaum.
  • Palmeri, T. J. (1999). Theories of automaticity and the power law of practice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 543–551.
  • Ratcliff, R., & Rouder, J. N. (1998). Modeling response times for two-choice decisions. Psychological Science, 9(5), 347–356.
  • Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20, 260–281.
  • Ratcliff, R., & Van Dongen, H. P. (2011). Diffusion model for one-choice reaction-time tasks and the cognitive effects of sleep deprivation. Proceedings of the National Academy of Sciences, 108, 11285–11290.
  • Tran, M. N., Scharth, M., Gunawan, D., Kohn, R., Brown, S. D., & Hawkins, G. E. (2019). Robustly estimating the marginal likelihood for cognitive models via importance sampling. arXiv preprint arXiv:1906.06020.
  • Turner, B. M., Sederberg, P. B., Brown, S. D., & Steyvers, M. (2013). A method for efficiently sampling from distributions with correlated dimensions. Psychological Methods, 18, 368–384.
  • Van der Linden, D., Frese, M., & Meijman, T. F. (2003). Mental fatigue and the control of cognitive processes: Effects on perseveration and planning. Acta Psychologica, 113(1), 45–65.
  • Voss, A., Rothermund, K., & Voss, J. (2004). Interpreting the parameters of the diffusion model: An empirical validation. Memory & Cognition, 32, 1206–1220.
  • Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & McKoon, G. (2008). A diffusion model account of criterion shifts in the lexical decision task. Journal of Memory and Language, 58(1), 140–159.
  • Walsh, M. M., Gunzelmann, G., & Van Dongen, H. P. (2017). Computational cognitive modeling of the temporal dynamics of fatigue from sleep loss. Psychonomic Bulletin & Review, 24, 1785–1807.
  • Wynton, S. K., & Anglim, J. (2017). Abrupt strategy change underlies gradual performance change: Bayesian hierarchical models of component and aggregate strategy use. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(10), 1630.

7 Appendix A: Bayesian Estimation Methods for Time-Varying LBA Models

This section develops Bayesian estimation methods for the dynamic models. Let $\theta$ be the vector of unknown model parameters and let $p(\theta)$ be the prior distribution over $\theta \in \mathbb{R}^{D_\theta}$; $\mathbb{R}^{d}$ means $d$-dimensional Euclidean space. Let $y_{jt}$ be the vector of observations for the $j$th subject at the $t$th block, define $y_{j} = (y_{j1}, \ldots, y_{jT})$ to be the vector of all observations for subject $j$, and $y = (y_{1}, \ldots, y_{J})$ to be the vector of observations for all subjects. Let $\alpha_{jt}$ be the vector of the particular parameter values for subject $j$ during the time period $t$. We will refer to the subject-specific parameters as “random effects”, and to the time periods as “blocks”. We define $\alpha_{j} = (\alpha_{j1}, \ldots, \alpha_{jT})$ as all the random effects for subject $j$ and $\alpha = (\alpha_{1}, \ldots, \alpha_{J})$ as the vector of random effects for all subjects. We now have that

$$p(\alpha_{j} \mid \theta) = p(\alpha_{j1} \mid \theta) \prod_{t=2}^{T} p(\alpha_{jt} \mid \alpha_{j,t-1}, \theta)$$

for the AR and AR+Trend models, and

$$p(\alpha_{j} \mid \theta) = \prod_{t=1}^{T} p(\alpha_{jt} \mid \theta)$$

for the Trend model. Equation (1) gives the density of the observations conditional on the random effects. Our goal is to sample from the posterior density

$$\pi(\theta, \alpha \mid y) = \frac{p(y \mid \alpha)\, p(\alpha \mid \theta)\, p(\theta)}{p(y)}, \qquad (13)$$

where

$$p(y) = \int p(y \mid \alpha)\, p(\alpha \mid \theta)\, p(\theta)\, d\alpha\, d\theta \qquad (14)$$

is the marginal likelihood. In addition to sampling from the posterior density (for parameter inference), the estimated marginal likelihood itself is used for model selection via Bayes factors.
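Given two estimated log marginal likelihoods, the Bayes factor is simply the exponential of their difference. A toy computation with made-up numbers:

```python
import numpy as np

# Hypothetical log marginal likelihood estimates for two competing models.
log_ml_ar, log_ml_static = -1234.5, -1290.2

log_bf = log_ml_ar - log_ml_static  # log Bayes factor for AR over static
print(f"log BF = {log_bf:.1f}, BF ~ {np.exp(log_bf):.3g}")  # decisive evidence
```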

We develop a sampling algorithm using particle Markov chain Monte Carlo, based on methods from Andrieu et al. (2010). The core idea is to define a target distribution on an augmented space that includes all the parameters of the model (and random effects) as well as the random variables generated by Monte Carlo sampling, such that this augmented distribution has as its marginal distribution the joint posterior of the parameters and the random effects. The rest of the appendix is organized as follows. We first describe separately the target distributions for the trend model, and for the AR and AR+trend models. We then discuss a particle Metropolis-within-Gibbs sampler which can be applied to all three dynamic models. Finally, we develop an extension of the “Importance Sampling Squared” ($IS^2$) algorithm for estimating the marginal likelihoods of the dynamic LBA models.

7.1 Target Distribution for the Trend model

Let $q(\alpha_{jt} \mid y_{jt}, \theta)$ be a family of proposal densities that we use to approximate the conditional posterior densities $p(\alpha_{jt} \mid y_{jt}, \theta)$. We denote by $\alpha_{jt}^{1:N} = (\alpha_{jt}^{1}, \ldots, \alpha_{jt}^{N})$ all the particles for subject $j$ generated by a standard Monte Carlo algorithm at block $t$. We can then write the joint density of the particles given the parameters for subject $j$ as

$$\psi_{j}\!\left(\alpha_{j,1:T}^{1:N} \mid \theta\right) = \prod_{t=1}^{T} \prod_{i=1}^{N} q\!\left(\alpha_{jt}^{i} \mid y_{jt}, \theta\right), \qquad (15)$$

and the joint density of the particles given the parameters for all subjects as

$$\psi\!\left(\alpha^{1:N} \mid \theta\right) = \prod_{j=1}^{J} \psi_{j}\!\left(\alpha_{j,1:T}^{1:N} \mid \theta\right). \qquad (16)$$

To define the required augmented target densities, let $k_{j} = (k_{j1}, \ldots, k_{jT})$, with each $k_{jt} \in \{1, \ldots, N\}$, index a vector of selected individual random effects $\alpha_{j}^{k_{j}} = (\alpha_{j1}^{k_{j1}}, \ldots, \alpha_{jT}^{k_{jT}})$, with $k = (k_{1}, \ldots, k_{J})$ and $\alpha^{k} = (\alpha_{1}^{k_{1}}, \ldots, \alpha_{J}^{k_{J}})$.

7.1.1 Theorem 1

The augmented target density is

$$\tilde{\pi}\!\left(\alpha^{1:N}, k, \theta\right) = \frac{\pi\!\left(\theta, \alpha^{k} \mid y\right)}{N^{JT}} \prod_{j=1}^{J} \prod_{t=1}^{T} \prod_{i \neq k_{jt}} q\!\left(\alpha_{jt}^{i} \mid y_{jt}, \theta\right). \qquad (17)$$

Integrating Equation (17) over the particles not selected by $k$ gives the marginal distribution

$$\tilde{\pi}\!\left(\alpha^{k}, k, \theta\right) = \frac{\pi\!\left(\theta, \alpha^{k} \mid y\right)}{N^{JT}}, \qquad (18)$$

and hence, with some abuse of the notation, we can write $\tilde{\pi}\!\left(\theta, \alpha^{k}\right) = \pi\!\left(\theta, \alpha^{k} \mid y\right)$.

7.2 Target Distributions for the AR and AR+Trend models

7.2.1 Sequential Monte Carlo (SMC)

We first briefly describe the sequential Monte Carlo methods used to approximate the filtering densities $p(\alpha_{jt} \mid y_{j,1:t}, \theta)$ for $t = 1, \ldots, T$. The sequential Monte Carlo algorithm consists of recursively producing a set of weighted particles $\{\alpha_{jt}^{i}, w_{jt}^{i}\}_{i=1}^{N}$, such that the intermediate densities are approximated by

$$\widehat{p}\!\left(\alpha_{jt} \mid y_{j,1:t}, \theta\right) = \sum_{i=1}^{N} w_{jt}^{i}\, \delta_{\alpha_{jt}^{i}}\!\left(\alpha_{jt}\right), \qquad (19)$$

where $\delta_{x}(\cdot)$ is the Dirac delta distribution located at $x$. Given particles $\{\alpha_{j,t-1}^{i}, w_{j,t-1}^{i}\}_{i=1}^{N}$ representing the filtering density at time $t-1$, the filtering density at time $t$ is

$$p\!\left(\alpha_{jt} \mid y_{j,1:t}, \theta\right) \propto p\!\left(y_{jt} \mid \alpha_{jt}\right) \int p\!\left(\alpha_{jt} \mid \alpha_{j,t-1}, \theta\right)\, p\!\left(\alpha_{j,t-1} \mid y_{j,1:t-1}, \theta\right)\, d\alpha_{j,t-1}. \qquad (20)$$

We can use Equation (20) to obtain the particles at time $t$ by first drawing the particles from a proposal distribution $q(\alpha_{jt} \mid y_{jt}, \alpha_{j,t-1}, \theta)$ and then computing weights to account for the difference between the posterior density and the proposal density:

$$\tilde{w}_{jt}^{i} = \frac{p\!\left(y_{jt} \mid \alpha_{jt}^{i}\right)\, p\!\left(\alpha_{jt}^{i} \mid \alpha_{j,t-1}^{a_{jt}^{i}}, \theta\right)}{q\!\left(\alpha_{jt}^{i} \mid y_{jt}, \alpha_{j,t-1}^{a_{jt}^{i}}, \theta\right)}.$$

The weights are then normalized: $w_{jt}^{i} = \tilde{w}_{jt}^{i} / \sum_{l=1}^{N} \tilde{w}_{jt}^{l}$. We implement the SMC algorithm with a multinomial resampling scheme, in which the ancestor indices are drawn as $a_{jt}^{i} \sim \text{Multinomial}\!\left(w_{j,t-1}^{1:N}\right)$; the index $a_{jt}^{i}$ means that $\alpha_{j,t-1}^{a_{jt}^{i}}$ is the ancestor of $\alpha_{jt}^{i}$. The generic SMC algorithm is shown in Algorithm 1.

Inputs: $N$, $\theta$, $y_{j,1:T}$

Outputs: $\alpha_{j,1:T}^{1:N}$, $w_{j,1:T}^{1:N}$, $a_{j,2:T}^{1:N}$

  1. For $i = 1, \ldots, N$:

    1. Sample $\alpha_{j1}^{i}$ from $q\!\left(\alpha_{j1} \mid y_{j1}, \theta\right)$.

    2. Calculate the importance weights

      $$\tilde{w}_{j1}^{i} = \frac{p\!\left(y_{j1} \mid \alpha_{j1}^{i}\right)\, p\!\left(\alpha_{j1}^{i} \mid \theta\right)}{q\!\left(\alpha_{j1}^{i} \mid y_{j1}, \theta\right)}$$

      and normalize those to obtain $w_{j1}^{1:N}$.

  2. For $t = 2, \ldots, T$ and $i = 1, \ldots, N$:

    1. Sample the ancestor indices $a_{jt}^{i} \sim \text{Multinomial}\!\left(w_{j,t-1}^{1:N}\right)$.

    2. Sample $\alpha_{jt}^{i}$ from $q\!\left(\alpha_{jt} \mid y_{jt}, \alpha_{j,t-1}^{a_{jt}^{i}}, \theta\right)$.

    3. Calculate the importance weights

      $$\tilde{w}_{jt}^{i} = \frac{p\!\left(y_{jt} \mid \alpha_{jt}^{i}\right)\, p\!\left(\alpha_{jt}^{i} \mid \alpha_{j,t-1}^{a_{jt}^{i}}, \theta\right)}{q\!\left(\alpha_{jt}^{i} \mid y_{jt}, \alpha_{j,t-1}^{a_{jt}^{i}}, \theta\right)}$$

      and normalize those to obtain $w_{jt}^{1:N}$.

Algorithm 1 Generic Sequential Monte Carlo Algorithm
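For intuition, here is a compact bootstrap-style implementation of Algorithm 1 for a single subject, in which the state-transition density is used as the proposal so that the importance weights reduce to the observation likelihood; it also accumulates the unbiased likelihood estimate of Equation (21). This is our own sketch, not the paper’s implementation (which uses better-tuned proposals).

```python
import numpy as np

def smc_loglik(y_blocks, log_obs, init, transition, N=100, rng=None):
    """Bootstrap SMC for one subject; returns log p_hat(y_{1:T} | theta).

    y_blocks: list of per-block data. log_obs(y_t, particles): log p(y_t | alpha_t)
    evaluated for each particle. init(N): N draws of alpha_1 from its prior.
    transition(particles): one AR-style propagation of each particle.
    """
    rng = rng or np.random.default_rng()
    particles = init(N)
    log_z, w = 0.0, None
    for t, y_t in enumerate(y_blocks):
        if t > 0:
            ancestors = rng.choice(N, size=N, p=w)    # multinomial resampling
            particles = transition(particles[ancestors])
        lw = log_obs(y_t, particles)                  # log importance weights
        m = lw.max()
        log_z += m + np.log(np.mean(np.exp(lw - m)))  # running product of weight means
        w = np.exp(lw - m)
        w /= w.sum()                                  # normalized weights for resampling
    return log_z
```

A usage example would supply, for the AR(1) model, `transition = lambda p: mu + phi * (p - mu) + noise` and a `log_obs` function built from the LBA density.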

The unbiased likelihood estimate for subject $j$ is a by-product of the SMC algorithm,

$$\widehat{p}\!\left(y_{j,1:T} \mid \theta\right) = \prod_{t=1}^{T} \left(\frac{1}{N} \sum_{i=1}^{N} \tilde{w}_{jt}^{i}\right),$$

and the unbiased likelihood estimate for all $J$ subjects is

$$\widehat{p}\!\left(y \mid \theta\right) = \prod_{j=1}^{J} \widehat{p}\!\left(y_{j,1:T} \mid \theta\right). \qquad (21)$$

7.3 Particle Metropolis within Gibbs Sampling Scheme

The key idea of particle MCMC is to construct a target distribution on an augmented space that includes all the particles and ancestor indices, and which has the joint posterior density of the random effects and the parameters as its marginal.

We write $\alpha_{j,1:T}^{1:N}$ and $a_{j,2:T}^{1:N}$ to refer to all the particles and ancestor indices for subject $j$, respectively, generated by the SMC algorithm. We can write the joint density of the particles and ancestor indices given the parameters for subject $j$ as

$$\psi_{j}\!\left(\alpha_{j,1:T}^{1:N}, a_{j,2:T}^{1:N} \mid \theta\right) = \prod_{i=1}^{N} q\!\left(\alpha_{j1}^{i} \mid y_{j1}, \theta\right) \prod_{t=2}^{T} \prod_{i=1}^{N} w_{j,t-1}^{a_{jt}^{i}}\, q\!\left(\alpha_{jt}^{i} \mid y_{jt}, \alpha_{j,t-1}^{a_{jt}^{i}}, \theta\right),$$

and the joint density of the particles and ancestor indices given the parameters for all subjects as

$$\psi\!\left(\alpha^{1:N}, a^{1:N} \mid \theta\right) = \prod_{j=1}^{J} \psi_{j}\!\left(\alpha_{j,1:T}^{1:N}, a_{j,2:T}^{1:N} \mid \theta\right).$$

Let $\alpha_{j}^{k_{j}}$ be the selected reference trajectory for subject $j$ with associated i