A state space approach to dynamic modeling of mouse-tracking data

07/19/2019 ∙ by Antonio Calcagnì, et al. ∙ Università di Padova 0

Mouse-tracking recording techniques are becoming very attractive in experimental psychology. They provide an effective means of enhancing the measurement of some real-time cognitive processes involved in categorization, decision-making, and lexical decision tasks. Mouse-tracking data are commonly analysed using a two-step procedure which first summarizes individuals' hand trajectories with independent measures, and then applies standard statistical models on them. However, this approach can be problematic in many cases. In particular, it does not provide a direct way to capitalize the richness of hand movement variability within a consistent and unified representation. In this article we present a novel, unified framework for mouse-tracking data. Unlike standard approaches to mouse-tracking, our proposal uses stochastic state-space modeling to represent the observed trajectories in terms of both individual movement dynamics and experimental variables. The model is estimated via a Metropolis-Hastings algorithm coupled with a non-linear recursive filter. The characteristics and potentials of the proposed approach are illustrated using a lexical decision case study. The results highlighted how dynamic modeling of mouse-tracking data can considerably improve the analysis of mouse-tracking tasks and the conclusions researchers can draw from them.



There are no comments yet.


page 33

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Over the last decades, the study of computer-mouse trajectories has brought to light new perspectives into the investigation of a wide range of cognitive processes (e.g., for a recent review see [freeman2017]). Unlike traditional behavioral measures such as reaction times and accuracies, mouse trajectories may offer a valid and cost-effective way to measure the real-time evolution of ongoing cognitive processes during experimental tasks [friedman2013linking]. This has also been supported by recent researches investigating mouse-tracking in association to more consolidated experimental devices, such as eye-tracking and fMRI [quetard2016differential, stolier2017neural]. In a typical mouse-tracking experiment, participants are presented with a computer-based interface showing the stimulus at the bottom of the screen and two competing categories on the left and right top corners. Participants are asked to select the most appropriate label given the task instruction and stimulus while the x-y trajectories are instantaneously recorded. The main idea is that trajectories of reaching movements can unfold the decision process underlying the hand movement behavior. For instance, the curvature of computer-mouse trajectories might reveal competing processes activated in discriminating the two categories. Mouse-tracking has been successfully applied in several cognitive research studies, including lexical decision [ke2017quantificational, incera2017sentence], social categorization [freeman2016perceptual, carraro2016hand], numerical cognition [faulkenberry2014hand, faulkenberry2016testing], memory [papesh2012memory], moral decision [koop2013assessment], and lie detection [monaro2017detection]. Moreover, the availability of specialized and freely-available software for mouse-tracking experiments have strongly contributed to the wide-spread application of such a methodology in the more general psychological domain [freeman2010mousetracker, kieslich2017mousetrap]. Recently, the debate on the nature of cognitive processes tracked by this type of reaching trajectories have also received attention from the motor control literature [van2009trajectories, friedman2013linking].

So far, mouse-tracking data have been analysed using simple strategies based on the conversion of x-y trajectories into summary measures, such as maximum deviation, area under the curve, response time, initiation time [hehman2014advanced]. Although these steps are still meaningful in case of simple and well-behaved x-y trajectories, they can also provide biased results if applied to more complex and possibly noisy data. To circumvent these problems, other approaches have been proposed more recently [cox2012gaussian, calcagni2017analyzing, zgonnikov2017decision, krpan2017behavioral]. However, also the more recent proposals require modeling empirical trajectories before the data-analysis. Although these approaches potentially provide informative results in many empirical cases, they can also suffer from a number of issues, which revolve around the reduction of x-y data to simple scalar measures. For instance, problems may arise in the case of trajectories showing multiple phases, averaging with non-homogeneous curves, and signal-noise discrimination [calcagni2017analyzing]. As far as we know, a proper framework to simultaneously model and analyse mouse-tracking data in a unified way is still lacking.

In this paper we describe an alternative perspective based on a state-space approach with the aim to simultaneously model and analyse mouse-tracking data. State-space models are very general time-series methods that allow estimating unobserved dynamics which gradually evolve over discrete time. As for diffusion models, which are widely used in modeling the temporal evolution of cognitive decision processes [smith2004psychology], they belong to the general family of stochastic processes and offer optimal discrete approximation to many continuous differential systems used to represent dynamics with autoregressive patterns [cox2017theory]. In particular, we used a non-linear and discrete-time model that represents mouse trajectories as a function of some typical experimental manipulations. The model is estimated under a Bayesian framework, using a conjunction of a non-linear recursive filter and a Metropolis-Hastings algorithm. Data analyses is then performed using posterior distributions of model parameters [gelman2014bayesian].

The reminder of this article is organized as follows. In Section 2 we motivate our proposal by reviewing the main issues of a typical mouse-tracking experiment. In Section 3 we present our proposal and describe its main characteristics. In Section 4 we describe the application of our method to a psycholinguistic case study. Section 5 provides a general discussion of the results, comments and suggestions for further investigations. Section 6 concludes the article by summarizing its main findings.

2 A motivating example

To begin with, consider a two-choice semantic categorization task [dale2007graded]

, in which participants have to classify semantic stimuli (e.g., name of animals) into their corresponding categories (e.g., mammal, fish). In the most typical implementation of a mouse-tracking task, participants would sit in front of a computer screen showing a resting frame (see Figure

1-a). They start a trial by clicking a starting button at the bottom-center of the screen, after which they are presented with a given stimulus (e.g. hen, see Figure 1-b). To finalize the trial, participants move the cursor on the screen by means of a well tuned computer-mouse in order to reach and select one of the two labels presented on the top-left and top-right corners of the screen (e.g., mammal vs. bird, see Figure 1-c). In the meanwhile, x-y coordinates, initiation time, and final clicking time are registered for each participant and trial. The basic idea is that x-y trajectories reflect the extent to which the real-time categorization response is affected by the experimental manipulation. More precisely, as a result of the assumption that co-activation of competing processes continuously drive the explicit hand response [spivey2006continuous], one would suppose to see more curved - or generally irregular - trajectories in association with stimuli showing higher ambiguity. In our case, for instance, it would be expected that atypical exemplars, such as hen, dolphin, and penguin, globally produce more curved or irregular trajectories than typical exemplars like dog, rabbit, and lion (see Figure 1-d/e).

In the mouse-tracking literature, data analysis commonly proceeds summarizing the recorded trajectories by means of few indices, which are then used as input to standard statistical techniques. In the current example, for instance, the typicality manipulation could be tested by assessing whether the distribution of maximum deviations (i.e., the maximum curvature showed by trajectories) over trials and participants is bimodal or not [freeman2013assessing]. In a similar way, linear models could be employed to test whether the typicality effect varies as a function of external covariates, such as psycholinguistic variables.

Figure 1: (A)-(C) Conceptual diagram of a typical mouse-tracking task: (A)-(B) stimulus presentation, (B) participant’s response. (D)-(E) Prototypical mouse-tracking trajectories collapsed over participants and trials as a function of manipulation task: (D) case where the manipulation has an effect - as revealed by the curvature of the trajectories, (E) case where the manipulation has no effect. (F) Conceptual diagram for the atan2 conversion: grey circles represent the sampled x-y trajectories, yellow circles represent those x-y pairs projected onto the circumference outer the Cartesian plane, whereas red lines represent the projection direction. Note that in a two-choice categorization task, the correct category C2 is presented on the top-right label (target) whereas the competing category C1 is presented in the opposite top-left label (distractor).

However, the two-step approach does have some issues. For instance, it lacks a way to represent both the experimental variability - that is induced by task manipulations - and individual variability - that is instead produced by individual-specific motor programs. Likewise, in some cases, it might neglect relevant characteristics of x-y data, with the consequence that similar classes of trajectories are treated as if they were different. Still, a two-step approach does ignore the data generation process underlying observed trajectories. This does not allow, for example, making predictions or simulations on new data given the experimental settings.

In the next section, we will present a dynamic probabilistic model that handles mouse-tracking data in a unified way. Our proposal is based upon a Bayesian non-linear state space approach, which offers a good compromise between model flexibility and model simplicity while overcoming many drawbacks of the standard mouse-tracking analyses.

However, the two-step approach does have some issues. For instance, it lacks a way to represent both the experimental variability - that is induced by task manipulations - and individual variability - that is instead produced by individual-specific motor programs. Likewise, in some cases, it might neglect relevant characteristics of x-y data, with the consequence that similar classes of trajectories are treated as if they were different. Still, a two-step approach does ignore the data generation process underlying observed trajectories. This does not allow, for example, making predictions or simulations on new data given the experimental settings.

In the next section, we will present a dynamic probabilistic model that handles mouse-tracking data in a unified way. Our proposal is based upon a Bayesian non-linear state space approach, which offers a good compromise between model flexibility and model simplicity while overcoming many drawbacks of the standard mouse-tracking analyses.

3 State-space modeling of mouse-tracking data

A state-space model is a mathematical description used to represent linear or generally non-linear dynamic models. In their general form, state-space systems consist of (i) a measurement density

that describes how the observed vector of data

at time step is linked to a possibly underlying process and (ii) a state density describing the transition dynamics that drive the vector of states . Temporal dynamics can be discrete or continuous and, in the latter case, stochastic differential equations are used to model the transition dynamics. By and large, there are two aims of any analysis involving state-space models. The first is to infer the unobserved process given the data . This task is usually accomplished by means of filtering and smoothing procedures [jazwinski2007stochastic]. The second aim regards estimating the parameters given the complete set of data . This is commonly performed using gradient-based methods on the likelihood of the model [shumway1982approach]. Although state space models were originally used in the area of aerospace modeling [kalman1960new], they are now applied in a wide variety of domains, including control theory, remote sensing, economics, and statistics [hamilton1994state, shumway2006time]. Recently, there has also been an increasing interest in psychology, where state-space models have been used to analyse, for example, dyadic interactions [song2009state], affective dynamics [lodewyckx2011hierarchical, bringmann2017changing], facial electromyography data [yang2010using], individual differences [hamaker2012regime, chow2013nonlinear], and path analysis [gu2014state].

In line with this, we developed a state-space representation to simultaneously model and analyse mouse-tracking data. In particular, our proposal is to represent the empirical collection of computer-mouse trajectories as a function of two independent sub-models, one representing the experimental manipulations (stimuli equation) and the other capturing the main features of the mouse movement process (states equation). Thus, the goal of our analysis is twofold: (i) to determine the states equation for each participant over a set of experimental trials, (ii) to estimate the parameters governing the stimuli equation. The first goal will provide information on how participants differ from each other in terms of movement dynamics. By contrast, the second goal will find out to what extent the experimental manipulations affect the individual variations in producing mouse-tracking responses.

3.1 Data

Let be a (individuals) (trials) array representing the observed data. The element of defines the sub-array containing the streaming of Cartesian coordinates of the computer mouse movements:

with and being the first and the last coordinates for the -th participant in the -th trial. The coordinates in are temporally ordered because they are usually collected while the computer-mouse is moving along its surface with a constant sampling rate. Further, to make the observed data comparable, we rescale and normalize as a function of a common ordered scale, which indicates the cumulative amount of progressive time from to [tanawongsuwan2001gait, ramsay2006functional]. Thus, the final trajectories lie on the real plane defined by the hyper-rectangles , with the first movement being equal to by convention. Since we are interested in studying the co-activation of competing processes as reflected in some spatial properties of the response - such as location, direction, and amplitude of the action dynamics [spivey2006continuous, freeman2017] - we need to simplify the original data structure so that these properties can easily emerge. Inspired by some of the work on this problem [gowayyed2013histogram, kapsouras2014action, calcagni2017analyzing], we reduce the dimensionality of the data by projecting in a proper lower-dimensional subspace of movement via the restricted four-quadrant inverse tangent mapping (atan2, see [burger2010principles]) from the real coordinates to the interval as follows:

where is the angle at the beginning of the process whereas is the angle at the end of the process. Figure 1-F shows a graphical example of the atan2 function for a hypothetical movement path. Finally, the array of angles is the input for our state-space model.

3.2 Model representation

The unobserved states equation of the model is a AR(1) Gaussian model with transition density equal to:


which models how the movement process of the -th subject changes from the step to the next step . The stochastic dynamics for the

-th subject is constrained by the variance parameter

that represents the uncertainty about the future location given the current state .

The measurement equation for the observations is modeled by means of a two-component von-Mises mixture distribution with density equal to:


where the generic density is the standard von-Mises law:

In the density formula, the term is the exponentially scaled Bessel function of order zero [abramowitz1972handbook]. The parameters of the mixture density are mapped to the experimental interface of the two-choice categorization task (see Figure 1-d/e). In particular, the means are mapped to the label categories C1 and C2 whereas the concentrations indicate how the observations are spread around the means. In this sense, since are determined by the fixed and known positions of the labels C1 and C2 on the screen, they are not treated as parameters to be estimated. Finally, the terms and

are the probabilities to activate the first and second density of the von-Mises components and are expressed as function of the latent states

and some additional covariates. The model is Markovian, in the sense that the unobserved states form a Markov sequence and the measurements are conditionally independent given the unobserved states.

To further characterize our state-space representation, the probability is defined according to a logistic function:


with being the intercept of the model. Equation (3) can be interpreted as the probability for the -th subject at step to categorize the -th stimuli as belonging to C1 ( tends to ) or C2 ( tends to ). In addition, when the categories C1 and C2 are expressed in terms of distractor and target [freeman2017], the sequences can be interpreted as the attraction probability that the distractor has exerted on the trajectory process .
The state-space representation is completed by linearly expanding the intercept term as follows:


where , is an element of the array , whereas is an element of the (Boolean) partition matrix , with indicating whether the -th stimulus belongs to the -th level of the variable . Note that the matrix satisfies the property , for all . In our model representation, Equation (4) is the stimuli equation and conveys information about the experiment. It consists of three main terms. (i) A categorical term describing how the stimuli have been arranged into

distinct levels of a categorical variable

. (ii) A continuous term that expresses the stimuli as a function of a continuous variable weighted by the coefficient . (iii) An interaction term between the levels of and , where and . This definition allows for modeling all the cases implied by an univariate experimental design with at most one covariate variable. Indeed, for and this formulation boils down to the simplest experimental case with a single categorical variable . By contrast, for and it reduces to the case where stimuli are simply paired with a continuous predictor . Finally, when , the stimuli equation reduces to the most simple case where we have as many parameters as trials. Figure 2 shows a graphical representation of state-space model whereas Figure 3 illustrates the inner-working of the model for the simplest design with a two-level experimental factor.

Figure 2:

Graphical representation of our state-space model. Note that white circles represent unobserved random variables, white double circles indicate transformed random variables, gray circles are observed random variables. Finally, square objects depict scalar quantities. Loop over individuals

, trials , and time steps are represented by outer squares.
Figure 3: Conceptual diagram of the state-space representation for two hypothetical sequences of mouse-trajectories. (A) Latent movement process . (B) Logistic curves for two cases of . (C) Probability to activate the cue C1 (upper panel) and probability to activate the cue C2 (lower panel) for both and cases. (D) Measurements as a function of their frequency (rose diagram): A case of higher attraction (upper panel) and a case of lower attraction (lower panel).

In our model representation, the observed movement angles are sampled from C1 (resp. C2) with probabilities equal to (resp. ), which in turn are expressed as a function of the AR(1) latent trajectory . Hence, an increase in the latent process will also increase the probability that is sampled from the hemispace C1 (e.g., ). By contrast, a decrease in the latent process will increase the chance to sample from C2 (e.g., ). As a result of Eq. (4) such an increasing (or decreasing) pattern can be modulated by the stimuli component . Moreover, as the coefficients are decomposed as a function of , , and , we can also analyse the effect of on in terms of the experimental manipulation , the covariate , or the interaction term . Figure 3 shows a conceptual representation of the modeling steps involved by our approach. Panel (A) shows an example of the random-walk used to represent the movement process (Eq. 3). Instead, panel (B) shows the logistic function used to form the stimuli equation (Eq. 3) for two typical cases of . Panel (C) represents the probability to activate the distractor C1 (upper panel) and the probability to activate the target C2 (lower panel) as a function of and . Finally, panel (D) depicts two cases of observed radians that are associated to and . In particular, the upper panel shows an example of data with a pronounced attraction toward C1, which is in turn reflected by the blue probability curve of the panels (B)-(C). By contrast, the lower panel represents data with little attraction toward C1, as also reflected by the red probability curve of the panels (B)-(C). In this sense, as Equation (3) represents an intercept model, the parameter does not affect the shape of the movement dynamics . On the contrary, it acts by shifting the movement dynamics upward () or downward () toward the C1 or C2 hemispaces, respectively.

3.3 Model identification

State-space model identification consists of inferring the unobserved sequence of states by means of filtering and smoothing algorithms and estimating the model’s parameters via Likelihood-based approximations [shumway2006time, sarkka2013bayesian]

. For instance, in the simplest linear gaussian case, where both the states and measurement equations are linear with additive gaussian noise, inference of latent states is usually performed via Kalman filter whereas parameter’s estimation is realised with the Expectation-Maximization algorithm. In our case, as Eqs.(

3)-(4) describe a more complex non-linear model, we adopted a recursive Gaussian approximation filter for the inference problem [smith2003estimating], coupled with a marginal Metropolis-Hastings MCMC for the parameters estimation [andrieu2010particle].

To formulate the problem more precisely, let:


be the arrays representing all the unknown parameters and unobserved states that characterize the model’s behavior. In this context, can be set to without loss of model adequacy.222Indeed, the constraint still guarantees the mapping to cover the needed time-to-time variability of the random walk, as Eq. (3) acts as a shrinkage operator on the support of the r.vs . This has also been confirmed by several pilot simulations we ran on our model. Note that this assumption is not overly limiting, since our state-space representation is built under the smoothness assumption on the movement behavior of the hand, according to which large abrupt changes in the small interval are not allowed [yu2007mixture]. The joint log-density of the complete-data given the array of parameters and the observed data is defined as follows:


where the state and measurement equations are given as in (1)-(2) whereas the term is the a-priori density function for the initial state of the process. Note that the factorization (9) is due to the Markovian properties of the model. By adopting the Bayesian perspective, we perform inference conditional on the observed sample of angles , with being an unknown term. The posterior density of hidden states and parameters is as follows:


where is a prior density ascribed on the vector of model’s parameters . Note that Equation (10) comes from the standard conditional definition , where the joint density is re-arranged by factorization using the Markovian properties of the model [andrieu2010particle]. Since our aim is to get samples from the posterior , we proceed by jointly updating and using a marginal Metropolis-Hastings. This alternates between proposing a candidate sample given and filtering the sequences conditioned on . Finally, the candidate couple is jointly evaluated by the Metropolis-Hasting ratio.

The evaluation of both the densities and involve computing the expression in Eq. (11). To do so, we derived the first term by means of filtering and smoothing procedures [jazwinski2007stochastic] whereas the second term was evaluated by implementing a Metropolis-Hasting algorithm. All the technical steps for the model identification are included in Appendix A-C.

3.4 Model evaluation

The state-space model formulated can be evaluated in many ways under the Bayesian framework of analysis [gelman2014bayesian, shiffrin2008survey]. For instance, adequacy of the algorithm can be assessed via standard diagnostic measures, such as traceplot of the chains, autocorrelation measures, and the Gelman-Rubin statistics whereas the recovery of the true model structure can be done by simulations from the priors ascribed to the model [gelman2014bayesian]

. Similarly, the adequacy of the model to reproduce the observed data can be assessed by means of simulation-based procedures (e.g., posterior predictive checks) where the fitted model is used to generate new simulated datasets that are then compared to the observed data

[gelman1996posterior, cook2006validation]. In our context, the robustness of the model formulation in recovering the true model structure as well as the goodness of fit to the observed data are assessed by adopting a simulation-based approach. Technical details on this procedure are available in Supplementary Materials.

4 Application

In this section, we will present an application of the model to the analysis of an already published lexical decision dataset [barca2012unfolding]. The state-space modeling framework will be evaluated via three different instances of model representation with an increasing level of complexity. Note that the application we report here has only an illustrative purpose with the main goal to introduce and highlight the interpretation of the model’s parameters and the flexibility of its representation with dynamic data. All the models were estimated using 20 (chains) 10000 (iterations), with a burning-in period of 2500 iterations. Starting values for the MH algorithm were determined by maximizing the observed likelihood of the model in Eq. (2). Similarly, the starting covariance matrix was computed by using the Hessian of the observed likelihood at . The adaptive phase of the MH algorithm was performed at fixed interval (with ) to prevent the degeneracy of the adaptation. For each model, the prior densities were defined as , where the variance was sufficiently large to cover the natural range of the model parameters. The adequacy of the model to reproduce the data was evaluated with a simulation-based approach, where a series of new datasets were generated through the fitted model and compared with the observed data [cook2006validation]. The goodness of fit was evaluated overall (i.e., the adequacy of the model to reproduce the complete observed matrix ) and subject-based (i.e., the adequacy of the model to reproduce for each subject the observed matrix ). Comparisons were computed by means of 0-100% normalized measures, with 0% indicating bad fit and 100% optimal fit. Technical details as well as extended graphical results are included in Supplementary Materials.

4.1 General context and motivation

Lexical decision is one of the most known and widely used task to study visual world recognition and reading in the cognitive psycholinguistic literature [hawkins2012decision, norris2008perception, yap2008additive]. Generally, this task is very simple and versatile and provides an ideal context for applying the state-space modeling framework when lexical decision data are collected via the mouse tracking paradigm. In this application, we evaluated the extent to which the parameters of the state-space model reflect eventual differences associated with the manipulation of a stimulus type factor composed by words (with either high-frequency or low-frequency) and random strings (i.e., random sequence of letters that are phonotactically illegal in the language) in the lexical decision task. Moreover, we will take advantage of this psycholinguistic case study to show how our state-space model can deal with both categorical and (pseudo)quantitative predictive variables considered either individually or in interaction in the model. In particular, the first model instance will illustrate the application of our modeling framework when a simple categorical variable (stimulus type factor) is considered to affect the observed mouse-tracking trajectories collected using the lexical decision task. By contrast, the second model will be based on a simple regression-type model with a single quantitative independent variable (bigram frequency) used to predict the attraction toward the distractor category. Finally, the third model will integrate these two variables (stimulus type factor and bigram frequency) into a unified model including the main effects of the two variables as well as their interaction. In our context, the first two models will be considered as simple toy examples to illustrate the main features of the state-space model representation when applied to real data, whereas the third model will be discussed in more details according to a group analysis evaluation as well as an individual analysis representation.

4.2 Model 1

Data structure and variables. In the original work by [barca2012unfolding], the lexical decision experiment was run in Italian and based only on one stimulus type factor with four different levels: Words of high written frequency (HF, e.g., acqua “water"), words of low written frequency (LF, e.g., cervo “deer"), pseudowords (PW, e.g., “dorto"), and strings of letters that are orthographically illegal in Italian (NW, e.g., “btfpr"). In their study, participants saw a total of 96 stimuli, one at the time, and were required to categorize each stimulus as either a word or a nonword by using the mouse-tracking paradigm. Trajectories were recorded using the Mouse Tracker software [freeman2010mousetracker] with sampling rate of approximately 70Hz [barca2012unfolding]. As usual, raw trajectories were normalized into

time steps using linear interpolation with equal spaces between coordinate samples. However, for our analysis we preferred to select only three of the four levels of the experimental factor (that is to say, HF,LF, and NW) for a total of 72 stimuli equally distributed within each level.

333The motivation for this selection was due to some technical reasons regarding the lack of design balance in the original dataset, as the PW level showed a large number of errors when compared with the other three categories. In addition, the three-level representation of the stimulus type factor simplifies the interpretation of the results when we consider the full model with interaction. Finally, the dependent variable of Model 1 consisted of the movement angles array associated with the mouse-movement trajectory recorded for each distinct stimulus in the stimulus set.

Data analysis and results. In this first model the term in the stimuli equation boils down to the simple expression:

where the indices refer to HF, LF, and NW stimuli. The MCMC convergences of the algorithm are reported in Supplementary Materials. The model fitted the data very satisfactorily, with overall fit of 84% and subject-based fit of 74% (see Table 1

). The posterior quantiles (5%,50%, and 95%) are reported in Table

2 whereas figure 4-A shows the probability graph, that is to say, the probability to activate the distractor cue for each of the three levels HF, LF, NW as a function of the latent variable .

The results of this first analysis clearly show that the dynamics of the state-space model were unaffected by the different categories represented in the recoded experimental factor. This pattern finds further support in the post-hoc comparisons between the three experimental conditions (Figure 4-B). In sum, these findings indicate that for a dynamic model represented according to a state-space modeling framework, the three stimulus categories (HF, LF, and NW) were all processed in a very similar way, as the original trajectories were not sufficiently different among the three stimulus categories. In substantive terms, the results of the categorical model showed how the attraction probability toward the distractor was definitively modest in all the three experimental conditions. This is evident from a direct inspection of figure 4

-B where the probability activation function (logistic function) is shifted towards right (

) which in turn means that the average activation of the distractor category was relatively poor in HF, LF, and NW items. In this respect, the results of our simple spatial model were partially at odds with the outcomes observed using temporal measures (response time variables)


Figure 4: Application (Model 1): (A) Marginal posterior densities for the model parameters and (B) Probability to activate the distractor cue as a function of the levels (HF, LF, NW) of the categorical variable. Note that the densities in panel (A) are shown together for the sake of comparison.

4.3 Model 2

Data structure and variables. Also for the second model, the dependent variable was represented by the movement angles array . However, unlike model 1, in model 2 the original independent categorical variable (stymulus type factor) was replaced with a quantitative psycholinguistic variable called bigram frequency. Bigram frequency is defined as the frequency with which adjacent pairs of letters (bigrams) occur in printed texts; for its characteristics, it may be considered as a measure of orthographic typicality [hauk2006time]. In this second application, only bigram frequency was used as quantitative variable, since it was the only psycholinguistic variable that could be computed for all the 72 stimuli in the stimulus set. This second model instance nicely provides a simple but effective example of application of our state-space model when a continuous variable is considered to predict the attraction toward distractor.

Data analysis and results. In model 2 the term simply reduces to:

as the first and third terms in formula (4) cancel out. In this case, the variable denotes the value of the bigram frequency for stimulus in the stimulus set. For the model results, the posterior quantiles are reported in Table 2 whereas MCMC convergences of the algorithm are reported in Supplementary Materials. Also in this case, the model fitted the data very well, with overall fit of 73% and subject-based fit of 70% (see Table 1). Figure 5 shows the probability graph for model 2. This graph represents the probability to activate the distractor hemispace at three representative levels of the variable, i.e. the lowest, the medium, and the highest values of bigram. As evident from the graph, bigram frequency affects the probability to activate the target, with a higher probability for stimuli with low bigram frequency.

In substantive terms, the results of the quantitative model supported the evidence that the attraction probability toward the distractor was slightly affected by the specif value of the quantitative predictor (bigram frequency). In particular, low-level bigram frequencies were characterized by an average larger activation probability () for the distractor, whereas medium or large frequencies were associated with a logistic function slightly shifted toward positive values of the latent space , thus reflecting a lower chance for the distractor category (average activation probability of

). Moreover, by an inspection of the contingency table for the joint representation of bigram frequency (as a transformed categorical variable) and stimulus type, we noted that low bigram frequency values were mainly characterized by string letters (NW:

) whereas high bigram frequency values were predominantly associated with high frequency words (HF: ) or low frequency words (LF: ).

Figure 5: Application (Model 2): Probability to activate the distractor cue as a function of the continuous variable. For Note that just three representative levels (low, middle, high) are represented for the sake of graphical interpretation.

4.4 Model 3

Data structure and variables. The final and more complex model included both the three-level categorical predictor (stimulus type factor: HF, LF, STR) and the continuous predictor (bigram frequency) as well as the interaction term between these two variables. The dependent variable was the movement angles array .

Data analysis and results. The stimuli equation which characterizes the third model is defined as follows:

The MCMC diagnostics together with the estimated marginal posterior densities for the model’s parameters are reported in Supplementary Materials. The model fit was good, with an overall fit of 75% whereas the subject-based fit was equal to 71% (see Table 1). The posterior quantiles are reported in Table 2. Figure 6 shows the probability graph for model 3. This graph represents the probability to activate the distractor hemispace for each of the three levels HF, LF, NW of the categorical factor as a function of the latent variable and three distinct levels (high, medium, and low) for bigram frequency. The inspection of Figure 6 shows a clear interaction between stimulus type factor and bigram frequency indicating that the impact of stimulus category, in particular word frequency, increases with the decrease of stimulus bigram frequency. In other words, at high level of bigram frequency, the probability to activate the distractor is similarly low in all conditions (.17 p-distractor .2). By contrast, when bigram frequency decreases - that is stimuli become orthographically atypical - the probability of distractor activation increases, but only for the more lexically-familiar stimuli, i.e., words of high frequency (e.g., p-distractor raises from 0.17 to 0.70, in low and high bigram frequency condition respectively).

Figure 6: Application (Model 3): Probability to activate the distractor cue as a function of the categorical variable (within panels) and three representative levels of the continuous variable (between panels).

Finally, it is also worth mentioning the emergence of the main effect of stimulus category which was instead missing in model 1. By a quick inspection of Figure 7, one may clearly observe that HF words differ from both LF words and letter strings (NW), whereas LF words and letter strings do not differ with respect to the probability of activation of the distractor hemispace. Interestingly, the addition of the covariate bigram frequency in the model allowed the main effect of stimulus category to show up. Indeed, while at the medium and high levels of bigram frequency the results are in line with those observed at a sample level in the original study (see Figures 1,2,5 in [barca2012unfolding]) and in a recent re-analysis (see Table 2 in [calcagni2017analyzing]), in the case of low bigram the probability to activate the distractor increases with respect to high frequency words (HF). This might be somewhat related to a moderate difficulty in the orthographic processing of low frequency bigram words [rastle2008morphological] even in the case of stimuli with richer lexical representation.

Figure 7: Application (Model 3): Marginal posterior densities for the model parameters. Left panel: parameters associated to the categorical variable. Right panel: parameters associated to the continuous variable and its interaction with the categorical variable. Note that the densities are shown together for the sake of comparison.
overall by-subject
Model 1 84% 78%
Model 2 73% 70%
Model 3 75% 71%
Table 1: Application: Adequacy of the model to reproduce the observed matrices (overall fit) and (by-subject fit). All the measures are normalized in the range 0% (bad fit) - 100% (otimal fit). See Supplementary Materials for technical details.
Model 1 1.224 1.234 1.211
1.323 1.337 1.310
1.443 1.457 1.432
1.003 1.002 1.003
Model 2 0.063
Model 3 0.083 1.130 1.217 0.305 -0.505 -0.437
0.341 1.300 1.314 0.402 -0.385 -0.336
0.605 1.468 1.411 0.500 -0.269 -0.235
1.008 1.013 1.012 1.011 1.013 1.010
All the models
Table 2: Application: Posterior means (), 95% posterior intervals ([]), and Gelman-Rubin index for the estimated parameters of Models 1-3.

4.5 Profiles analysis

To further investigate the dynamic characteristics involved in the lexical decision task, we extend here the results of the third model to include also a profiles analysis. Figure 8 shows the estimated latent movement states for all the participants involved in the study. The profiles appear regular, as they evolve smoothly toward the target cue (T). We grouped the dynamics into four well-separated clusters (Figure 8, smallest panels on the right) according to their functional similarities [ramsay2006functional]. Particularly, the first group is characterized by a higher exploration of the distractor’s hemispace, especially in the first 30% of the process. The same applies to the third and fourth groups, although they show a gradual activation of the distractor. Finally, the second group clearly represents those profiles with no uncertainty in the categorization process, as they show no activation of the distractor’s hemispace at all. Although well-separated among them, these clusters still show some level of inner heterogeneity (for example, see group 1 and 4). To study this latter issue in terms of experimental manipulations, we focused on group 1 and considered the low vs. high frequency conditions (HF vs. LF). We also selected the middle phase of the process (), where it is expected to observe larger cognitive competitions in the categorization [barca2012unfolding]. Figure 9 shows the participants’ profiles in terms of attraction probability for the two lexical conditions. As expected, the profiles differ between these conditions, with LF eliciting higher attraction probability. This is in line with the fact that low frequency words have a weaker lexical representation than high frequency stimuli and consequently they are more difficult to process [barca2012unfolding]. Interestingly, the individual profiles also differ in the way they activate the distractor. For instance, the participant 6 had higher probability in both LF () and HF () conditions whereas the participant 7 had a more pronounced activation just in the LF condition () than HF (). Similarly, participants 6 and 7 seemed to prolong the competing dynamics up to the 50% of the process, by contrast participants 8 and 15 seemed to resolve the lexical competition earlier as showed by the abrupt decreasing of their curves. We complete our analysis by evaluating how individual profiles are linked to empirical measurements. Figure 10 represents this scenario for two stimuli belonging to HF and LF conditions. As we can notice, the curves present the same dynamics (due to the latent states ) although they clearly differ in terms of attraction exerted by the stimulus (due to the component of the model). In this case, the LF stimulus produced larger conflict than HF in the lexical categorization. This is evident when we turn back to the observed data: as expected, the rose diagrams of LF showed larger directions in the distractor’s hemispace.

Figure 8: Application: Estimated movement dynamics of each participant (biggest panel, left) and clusters of profiles in terms of their functional similarity (smallest panels, right). Note that averaged profiles are represented as dashed lines whereas D and T in all the panels indicate distractor and target, respectively. Groups’ composition: participants 6,7,8,15 (group 1), 1,4,19,21 (group 2), 2,3,5,12,13,16,17,20,22 (group 3), 10,11,14,18 (group 4).
Figure 9: Application: Estimated attraction probabilities of participants in Group 1 for the High Frequency (left panel) and Low Frequency (right panel) lexical conditions. Note that the probability curves are computed with respect to the distractor (D), the gray area in both panels indicates a selected window of processing (), whereas the terms are computed using a normalized discrete approximation of the integral of the probability curves in the selected process window .
Figure 10: Application: Estimated attraction probabilities of participants in Group 1 and rose diagrams of observed radians for two stimuli (HF: epoca

, epoch. LF:

zampa, paw). Note that D and T in all the panels indicate distractor and target, respectively.

5 Discussion

We have described a new approach to model and analyse dynamic data coming from mouse-tracking experiments. Our proposal took the advantages of a state-space representation, in which the observed data were thought as being function of two independent sub-models, one representing the movement process and its properties () and the second modeling the two-choice experimental task (

) according to which the data were collected. These sub-models were integrated by means of an inverse-logit function (

) that expressed how the experimental manipulations acted on the movement processes in selecting the final correct response against the competing one. This formulation was flexible enough to take into account the complexity of some dynamic behaviors showed by the reaching trajectories. Moreover, it allowed for separately accounting for the motor heterogeneity and experimental variability in . Indeed, when our state-space representation simply reduced to a model where the experimental manipulations had no relevant effect in reproducing the observed data. This instance has been illustrated in Section 4.3 (Model 1). In this case, as was not allowed in our model formulation, all the variability of can be ascribed to . This is relevant in view of the fact that movement variability may reflect only individual motor executions in absence of any experimental manipulations [yu2007mixture]. The movement component was modeled to be Markovian with gaussian transition density.

Although more complex models can be used to represent movement dynamics, simple random walks still allows a great deal of flexibility in modeling reaching trajectories under weak assumptions on the movement behavior [yu2007mixture, paninski2010new]. In particular, in the case of mouse-tracking tasks, they allow representations of the following three properties: (i) Each movement is goal-oriented as individuals have to finalize the action by clicking on one of the two categories shown on the screen. (ii) Mouse-tracking trajectories generally start at rest, proceed out in the movement space, and end at rest. (iii) Hand trajectories tend to be smooth during the reaching process, i.e. small changes in the interval are more likely than large and abrupt changes [brockwell2004recursive, spivey2010curved]. The stimuli component was defined to be a linear combination of information typically involved in a univariate design, namely a categorical variable containing the levels of the experimental factor and a continuous covariate . This gave researchers the opportunity to additionally analyse which component of the experimental design is relevant in producing the effect of on . The data-generation process was defined according to a mixture of two von-Mises distributions representing the categories of a two-choice categorization task. Among others, we chose this distribution because it provides a flexible representation for angular ordered data, especially because it simplifies mathematical computations involved in the model’s derivation [mcclintock2012general, mulder2017bayesian].

There are other existing methods that offer alternative ways to model mouse-tracking data. For instance, [van2009trajectories] proposed the use of the movement superposition model [henis1995mechanisms] to model and analyse the typical two-choice lexical decision task. In particular, they modeled mouse-tracking trajectories by representing the complete hand movement as a summation of sub-movements, which were obtained by the solution of the minimum-jerk equation for the standard reaching trajectory (i.e., a movement characterized by a bell-shaped speed profile that minimizes the sum of the squared rates of jerks over the movement duration). Similarly, [friedman2013linking] discussed how an intermittent model of arms movement can be used for reaching trajectories in random-dot experiments. They used both Wiener’s diffusion process and Flash and Hogan’s movement equation to predict reaction times (RTs) and movement data. Their goal was to assess the link between movement trajectories and underlying cognitive processing. Our model differs in some respects from these works. With regards to [van2009trajectories], for instance, we used a stochastic state-space approach to model the movement trajectories instead of deterministic equations. Instead, with respect to [friedman2013linking], we tailor-made our model to a typical two-choice categorization task, making use of few assumptions on the nature of the movement process (as those implied by the Gaussian AR(1) process). By and large, our goal was not to provide a mathematical representation of the cognitive components underpinning mouse-trajectories since the model does not describe the cognitive processing per se. By contrast, we simply provided a statistical model for the analysis of mouse-tracking data, which can offer a good compromise between data modeling and data analysis.

5.1 Model’s advantages and limitations

Our non-linear state-space model has several advantages. For instance, when comparing with the standard approaches, our proposal provides a unified analytic framework to simultaneously model and analyse trajectories data. By modeling movement heterogeneity and task variability together, we can evaluate how experimental variables directly act on the observed series of trajectories, with no need to use any kind of summary measures. An additional advantage of our model concerns the study of individual differences in terms of latent dynamics. While this is impractical in standard two-step approaches, in our proposal researchers can assess individual variations by studying the movement profiles once they are estimated. For instance, they can be analysed in terms of similarity/dissimilarity with regards to external individual covariates (e.g., vocabulary knowledge and bilingualism in psycholinguistic experiments; IQ, risk-taking propensity, or more generally clinical variables in decision-making tasks). Still, individual dynamics can be compared each other qualitatively in terms of chance to activate the distractor or target cues. As the dynamics are normalized on a common cumulative scale, researchers can assess whether the chance to activate the distractor cue at a certain percentage of the process and for an experimental manipulation, is particularly higher in a sub-group of participants (this case, for example, has been described in Section 4.6).

As for any modeling approach, also the current proposal can potentially suffer from some limitations. A first limitation concerns the only-intercept model used to integrate individual dynamics and experimental information. Although this was enough to represent whether or not certain stimuli can increase the probability to select the distractor cue, we may want to known whether some experimental manipulations can modify the individual dynamics as well. However, this would particularly pronounce the computational costs required for the model identification (especially with regards to filtering), as we need to appropriately generalize Eq. (3) to include more parameters. Lastly, in the current study we used univariate non-linear state-space models to represent individual dynamics for the sake of parsimony. However, more complicated situations may require models including further movement characteristics like step-length, velocity, acceleration, and jerk [kulkarni2008state], which may be modeled as statistical constraints of the model [ciavolino2014generalized, calcagni2017analyzing].

5.2 Further extensions

Our non-linear state-space model can be improved in many aspects. For instance, the stimuli equation (4) can be generalized to cope with more complex experimental designs, like those involving multiple factors and covariates together with high-order interaction terms. Likewise, the current model restrictions can be relaxed to allow changes in slopes of as a function of the experimental stimuli. Further, the development of a hierarchical representation of the model, with a random-effect component in the state equation (3), would offer a way to model the inter-individual variations as resulting from an underlying common population. Still, the development of a multivariate state-space model to include other movement components will surely be considered a future extension of the present work. Further studies may lead to generalize the AR(1) process used for the movement dynamics to include former knowledge on the deterministic constraints of the hand movement as those used, for instance, by [van2009trajectories] and [friedman2013linking]. Moreover, further studies may also lead to generalize the AR(1) process used for the movement dynamics to include former knowledge on the deterministic constraints of the hand movement as those used, for instance, by [van2009trajectories] and [friedman2013linking]

. Finally, an open issue which deserves greater consideration in future investigations is the need for a formal comparative framework with which we may eventually contrast and compare spatial modeling perspectives (like the one presented in this contribution) and currently used methods for analyzing mouse tracking data based on descriptive statistics


6 Conclusions

In this paper we presented a novel and comprehensive analytic framework for modeling and analyse mouse-tracking trajectories. In particular, a non-linear state-space approach was used to model the observed trajectories as a function of both individual movement dynamics and experimental variables. Model identification was performed under the umbrella of Bayesian methods, in which a Metropolis-Hastings algorithm was coupled with a recursive gaussian approximation filter to get posterior distributions of model parameters. For the sake of illustration, we applied our new approach to a real mouse-tracking dataset concerning a two-choice lexical categorization task. The results indicated how our proposal can provide valuable insights to assess the dynamics involved in the decision task and identify how the experimental variables significantly contributed to the observed movement heterogeneity. Moreover, the analysis of individual profiles allowed for comprehensive and reliable identification of individual and group-based differences in the dynamics of decision making.

In conclusion, we think that this work yielded interesting findings in the development of computational models able to capture the unfolding high-level cognitive processes as reflected by motor executions which are typically involved in mouse-tracking tasks. To our knowledge, this is the first time that mouse-tracking data are fully modeled and analysed within a process-oriented approach. We believe our contribution will offer a novel strategy that may help cognitive researchers to understand the roles of cognition and action in mouse-tracking based experiments.

Acknowledgments. The authors thank Dr. Simone Sulpizio for many helpful discussions on earlier versions of the manuscript and Dr. Laura Barca for providing the dataset used in the case study. This work was supported by CARITRO, Fondazione Cassa di Risparmio di Trento e Rovereto (ref. grant 2016.0189).

Appendix A: Filtering and smoothing

The term in Eq. (11) is recursively computed given all the measurements up to the -th step. Let:


be the filtering density at step . The first term of the right-side of the equation is the observation equation whereas the second term represents the Chapman-Kolmogorov equation. Given the initialization at , the filter proceeds by solving the integral in the Chapman-Kolmogorov equation (prediction step) and compute the log posterior filtering density (update step). In our model, we solve Eq. (A.1) by means of a Gaussian approximation filter [smith2003estimating], which computes a gaussian approximation to the posterior density and determines its posterior mode and variance recursively.
More technically, let:


be the filtering density at step . Consider the following definitions:


where and represent the mode and the variance of the gaussian approximation in the prediction step. Under definitions (A.4)-(A.5), integrating out for in Eq.(A.2) yields to:


For the sake of computational simplicity, we rewrite the measurement density in Eq. (2) as follows:


where is a (deterministic) indicator variable taking the value 1 when is in the area of the screen associated to C1, and 0 when is in the area associated to C2 (see Fig. 1) [banerjee2005clustering]. Next, using these results together with Eqs. (A.3)-(A.5) we obtain:


Differentiate around gives:


where is the hyperbolic cosine function.

Finally, the posterior moment

is obtained by solving Eq. (A.9) whereas is computed by the negative inverse of Eq. (A.10) [tanner1991tools]. As Eq. (A.9) is non-linear, it can be solved numerically (e.g., using the Broyden’s method). The complete filtering algorithm is summarized in Table A.1.

Algorithm 1 Gaussian Approximation filter algorithm
Table A.1: Filtering algorithm on the interval . The algorithm takes as input the parameters and the data , whereas returns as output the filtered states . Note that the notation indicates the prediction of the current -th state given the previous whereas denotes the update of the predicted state.

Finally, to ensure that the unobserved sequence is an approximate realization from we need to refine the filtering results conditional on the whole observed data . This task is achieved by means of a standard fixed-interval smoothing algorithm [ansley1982geometrical, mendel1995lessons], which uses the posterior filtering moments and as input. The smoothing algorithm is described in Table A.2.

Algorithm 2 Fixed-interval smoothing algorithm
backward update
Table A.2: Backward smoothing algorithm over the interval . The algorithm takes as input the filtering solutions whereas returns as output the smoothed states conditional on the whole set of data . Note that the the notation refers to the filtering solutions whereas indicates the smoothing results.

Appendix B: Posteriors computation and estimation

In what follows, we describe the steps for computing the term . First, we note that the array consists of two blocks of parameters associated to the observation equation and the stimuli equation of the model (3)-(4), namely scalars paired with the set of stimuli/trials and two parameters for the vonMises concentrations. To simplify the computations in the Metropolis-Hastings algorithm, the terms can be distinctively determined prior running the MH algorithm [marin2005bayesian]. Moreover, since we are not interested in the posterior distributions of these parameters, as long as they are not involved in the state-space dynamics, we compute them using the following Maximum-Likelihood estimators [banerjee2005clustering]:


where and are the location parameters fixed by the experimenter, is defined as in Eq. (Appendix A: Filtering and smoothing), whereas is the inverse of the modified Bessel function which is evaluated numerically [abramowitz1972handbook]. Given the above results and the constrains , the array of parameters simply reduces to . This simplifies the inner-working of the MH algorithm as it now works on a smaller and more compact parameter space. To proceed further, the decomposition (10) involves the following definition for the MH proposal density [andrieu2010particle]:


where is evaluated through filtering/smoothing. This is appealing since the posterior density from which it might be difficult to sample from, reduces now to that is conveniently defined on a smaller parameters space [andrieu2009pseudo]. In our case, we can set:


with being a suitable covariance matrix. Consequently, the MH acceptance ratio is as follows:


where is the density for the marginal likelihood computation, indicates the prior density over the parameters, whereas is the MH proposal density. Note that under (A.14), the terms and in Eq. (A.15) vanish as they refer to the same probability value. This yields to:


where the MH ratio is now expressed as a function of the marginal likelihood and the priors. The term can be easily computed as a byproduct of the filtering calculations (see Appendix C). Finally, the choice of a well-suited covariance matrix for the proposal distribution is crucial in order to achieve chains’ convergences. In our context, we used the Haario’s adaptive solution where the proposal covariance is iteratively adapted during the chains using the current proposal covariance up to the adaptation step [haario2001adaptive]. The complete MH algorithm is summarized in Table A.3.

Algorithm 3 Metropolis-Hasting algorithm
Set  , initialization
Run Algorithms 1-2 to get  
m-h loop
Run Algorithms 1-2 to get  
Compute   from Eq. (A.16)
Get   accept/reject
Set  ,   if
Set  ,   if
Run     see [haario2001adaptive] adapting phase
Table A.3: Marginal Metropolis-Hastings to estimate . Note that

indicates the Uniform distribution over the interval

whereas the adaptive phase can be performed at each iteration of the chain or rather after a fixed interval (with ).

Appendix C: Marginal Likelihood computation

The marginal likelihood is computed as follows. Let the observed-data log likelihood be: