1 Introduction
In several research problems we deal with probabilistic sequences of inputs (e.g., sequence of stimuli) from which an agent generates a corresponding sequence of responses and it is of interest to model or find some kind of relation between them. This is the case, for example, of many experiments related to the study of statistical learning in neuroscience. Statistical learning in this context refers to the ability to extract statistical regularities from the environment over time (Armstrong et al., 2017; Conway, 2020; Schapiro and TurkBrowne, 2015). In this kind of experiments, humans or animals are exposed to sequences of stimuli with some statistical regularity and it is conjectured that the brain is able to retrieve/learn the statistical regularities encoded in the stimuli (von Helmholtz, 1867; Wacongne et al., 2012; Garrido et al., 2013). The study of this conjecture usually involves data analysis of some physiological or behavioral responses recorded from participants during the performance of a suitable task. Statistical learning is also a widely used term in computer science but it is not that meaning we are referring to here.
Motivated by an auditory statistical learning problem, a new class of stochastic process was introduced in Duarte et al. (2019) to model the relation between the probabilistic sequence of inputs and the corresponding sequence of responses, namely sequences of random objects driven by context tree models. A process in this class has two elements: a stochastic source generating the sequence of stimuli and a sequence of responses generated by the agent during the exposure to the stimuli. The source generating the stimuli is assumed to be a context tree model (CTM) taking values in a categorical set (Rissanen, 1983; Bühlmann and Wyner, 1999; Galves and Löcherbach, 2008). In a context tree model, at each step, the occurrence of the next symbol is solely determined by a variablelength sequence of past symbols, called context. Any stationary stochastic chain can be approximate by a context tree model, therefore it constitutes a powerful tool to model the probabilistic structure of sequences of stimuli (Fernández and Galves, 2002; Duarte et al., 2006)
. The model also assumes the existence of a family of probability measures on the set of responses (hereafter called objects) indexed by the contexts characterizing the sequence of stimuli. This family of measures describes the relation between the source and the responses. More precisely, at each step, a new object of the response sequence is chosen according to the probability measure associated to the context ending at that step in the sequence of stimuli.
Given some data, statistical model selection methods can be applied to estimate the set of contexts and, in some cases, the family of probability measures characterizing the distribution of the response sequence. The theoretical framework introduced in
Duarte et al. (2019) addressed a model selection algorithm for the particular case in which the objects of the response sequence refer to functions (e.g., electrophysiological data). Nevertheless, the proposed mathematical framework is general enough to model other kind of objects in the response sequences such as responses taking values in a categorical set, in the real line or in . In the same way that context tree models have found applications in several research areas such as neuroscience, genetics (Busch et al., 2009) and linguistics (Galves et al., 2012), this new theoretical framework can find applications in research areas beyond neuroscience. For this reason, its computational implementation is useful.This paper introduces the SeqROCTM Matlab toolbox, a set of computational tools for working with sequences of random objects driven by context tree models. In addition to the statistical software, this article makes other contributions. We formalize two model selection procedures for sequences of random objects driven by context tree models with categorical responses. We also formalize and define general conditions for the use of the Smaller Maximizer Criteria (SMC) to tune model selection algorithms for both context tree models and sequence of random objects driven by context tree models.
Since the stimuli sequence is generated by context tree model, the SeqROCTM toolbox also implements several model selection algorithms and tuning procedures for context tree model. Up to our knowledge, there exist only an R package implementing model selection for context tree model (Mächler and Bühlmann, 2004).
The paper is organized as follows. Section 2 briefly describes the mathematical concepts and model selection algorithms included in the SeqROCTM toolbox. Section 3 introduces the architecture of the software. Sections 4 and 5 present the main functionalities of the toolbox through illustrative examples motivated by statistical learning problems. Conclusions are given in Section 6.
2 Model selection methods for sequence of random objects driven by context tree models
We begin this section by introducing some notation and formal definitions. Let be a finite set. Any string , is denoted by and its length by . Given two strings and of elements of , denotes the string of length obtained by concatenating and . The string is said to be a suffix of , and denote by , if there exists a string satisfying . When we say that is a proper suffix of and denote .
Definition 1.
A context tree is defined as any set satisfying

Suffix Property. No string is a proper suffix of another string .

Irreducibility. No string belonging to can be replaced by a proper suffix without violating the suffix property.
The elements of are called contexts. The height and size of are defined as and , respectively.
In this work we deal with experiments in which an agent is exposed to an stochastic sequence of stimulus , taking values in . A key point is that the stochastic sequence is a context tree model (Rissanen, 1983; Bühlmann and Wyner, 1999). Formally, consider a stationary ergodic process and for any string denote by
Definition 2.
We say is a context tree model with parameters if there exist a function such that

for any and any finite sequence such that , it holds that

no proper suffix of satisfies condition 2.
The function is said a context function.
A sequence of responses with values in some measurable space is recorded while the agent is exposed to the stimuli sequence (e.g., neurophysiology responses such as electroencephalographic data, behavioral responses, etc.). The relation between the responses and the stimuli is modeled through a class of stochastic processes called sequence of random objects driven by context tree model (Duarte et al., 2019).
Definition 3.
The bivariate stochastic chain taking values in is a sequence of random objects driven by context tree models with parameters , where is a family of probability measures on , if

is a context tree model with parameters ;
Given some training data and, under the assumption that it was generated by a sequence of random objects driven by context tree models, the statistical problem of interest is to estimate the parameters and characterizing the response sequence.
To formulate the model selection methods we consider two scenarios. A first scenario in which the responses belong to a finite set and a second scenario in which takes values in a functional space. Hereafter we will refer to these two scenarios as categorical and functional case, respectively.
Before introducing the model selection procedures a few more definitions are needed. Given a finite string we denote by the number of occurrences of the string in the sequence , that is
(1) 
Definition 4.
Let be an integer such that , an admissible context tree of maximum height for the sample is any context tree satisfying

if and only if and .

Any string with is a suffix of some or has a suffix .
The set of all admissible context trees of maximal height is denoted by .
Definition 5.
Let be a context tree and fix a finite string . A subtree in induced by is defined as the set . The set is called a terminal subtree if for all it holds that for some .
Given two context trees and , we denote by (resp. ) if for any there exists such that (resp. ).
The model selection procedure that will be introduced for the functional case and two out of three procedures developed for the categorical case are inspired by the algorithm Context (Rissanen, 1983). For this reason, we first describe below a general algorithm Context procedure, specifying later the difference on each case. In the sequence, we present a third model selection procedure for the categorical case which is based on BIC.
2.1 General algorithm Context
Given a sample , fix an integer and let be the context tree of maximum size in . We shorten this maximal candidate context tree by successively pruning the terminal subtrees according to some statistical criterion.
For any string such that is a terminal subtree, we decide to prune or not verifying a statistical criterion, say stat(u). If stat(u) is satisfied, we prune the subtree in ,
(2) 
Otherwise, if stat(u) is not satisfied, we keep in . At each pruning step, we check a string which induces a terminal subtree in and that has not been checked yet. This pruning procedure is repeated until all the existing terminal subtrees have been checked and no more pruning is possible.
A pseudo code for the general algorithm Context is given in Algorithm 1.
The use of the general algorithm Context for the functional and categorical cases differs with respect to stat(u).
In the functional case stat(u) implements a twoside goodness of fit test for functional data (CuestaAlbertos et al., 2006). For the categorical case, stat(u) implements two different procedures. The first one compares the conditional loglikelihoods between a string and its offspring while the second one compares the empirical distributions between a string and its offspring. The following sections describe the statistical criterion stat(u) implemented in each case.
2.2 Functional case
In this section it is defined the statistic criterion used in the general algorithm Context (Algorithm 1) for the functional case.
Consider the set of realvalued square integrable function in the interval and the Borel algebra on . Moreover, assume that is a sample of a sequence of random objects driven by context tree models with parameters , with a probability measure on For any string with , let be the set of indexes belonging to where the string occurs in the sample , that is
(3) 
By definition, the set has elements. If , we set for each . Thus, is the subsample of induced by the string .
To each element of the terminal subtree we associate the subsample of induced by it. These sets of functions are used to decide whether the subtree is pruned or not: the stat(u) criterion tests the equality of distribution between the sets of functions associated to the elements of .
To test whether two samples of functions have the same distribution, it is used a projective method proposed in CuestaAlbertos et al. (2006). In this method, each function of both sets is projected in a gaussian direction chosen at random. This produces two new projected samples of real numbers. Then, the equality of distribution for these new samples is tested using the KolmogorovSmirnov test. The results presented in CuestaAlbertos et al. (2006)
guarantees that, if the two samples of functions have different distributions then the set of random directions where the projected samples have the same distribution has Lebesgue measure equal to zero. This implies that if we reject the null hypothesis of equality of distributions for two sets of projected samples, then we can also reject it for the corresponding functional sets.
Formally, for any string such that is a terminal subtree, we test the null hypothesis
(4) 
using the test statistic
(5)  
Here is a realization of a Brownian bridge in the interval and is the KolmogorovSmirnov distance between the empirical distributions and of the projections of the samples and onto the direction , respectively.
We reject the null hypothesis when , with and . Observe that for a string with only one pair of offspring (i.e., ) the null hypothesis is tested with significance level (because becames the percentile of a KS distribution). On the other hand, for a string with more than two offspring, is tested with an unknown significance level upper bounded by . This is guaranteed because in the definition of we applied a terminal subtree correction, by using instead of . In this way, is tested with significance level at most for any string .
The statistic in equation (5) depends on the a random direction . For this reason, to improve the stability of the estimate we test the null hypothesis (4) several times using different Brownian bridges. We conclude that the sets of functions associated to the elements of the terminal subtree do not have the same distribution, and consequently, we do not prune the subtree
, if the number of rejections exceeds a certain threshold. This threshold is derived according to a binomial distribution with probability of success
. Formally, fixed a string , a significant level and consider independent Brownian bridges . Compute the test statistics and define(6) 
In this case, stat(u) checks whether , where is the percentile of a binomial distribution with parameters and . The elucidation for this procedure comes from the following proposition.
Proposition 1.
For any string and integer , consider the random variables defined in (5) with independent Brownian bridges in the interval . For any significance level define with . Under the null hypothesis (4), for any , it holds that,
where denotes the smallest constant such that with a random variable with Binomial distribution of parameters and .
2.3 Categorical case
2.3.1 General algorithm Context + Conditional loglikelihood
Given such that , we denote by the number of occurrences of the string in the sample followed by the occurrence of the symbol in the sample , that is
(7) 
The maximum conditional likelihood for a sample is given by
(8) 
with
(9) 
and the maximum likelihood estimator of the conditional probability , defined as
(10) 
Notice that is the portion of the conditional likelihood of the model given the data induced by the context .
Consider the statistic
(11) 
and fix a threshold . For any string such that is a terminal branch, the function stat(u) verifies whether the following inequality holds
(12) 
If the inequality is satisfied, we prune the subtree of . Otherwise, we keep in . The inequality (12) is equivalent to check whether
Notice that is the conditional loglikelihood ratio between a model with parameters and a model with parameters , where and they differ only by one set of offspring nodes branching from , that is .
Remark 1.
The idea of comparing the maximum likelihood induced by a node with the maximum likelihood induced by its offspring was originally used in the algorithm Context introduced by Rissanen (1983).
Given , set for all , and, for any define
(13) 
The context tree estimator obtained with this procedure can be defined as
(14) 
Notice that once we have , for a given , equation (13) implies that for any , .
Remark 2.
The consistency of the original algorithm Context was proved in Rissanen (1983). In this setting (i.e., model selection procedure for context tree model) the statistic used to identify the contexts is given by
The proof of consistency depends on through the ergodicity of and its memory relation with , which in our case also satisfies. Therefore, the consistency can be easily adapted for the formulation we are introducing here for sequences of random objects driven by context tree models.
2.3.2 General algorithm Context + Offspring empirical distributions
In this case, the function stat(u) inside the general algorithm Context compares the distance between the empirical distribution associate to the string and the ones associated to its offspring.
Formally, for any finite string , define the statistic
(15) 
For any finite string such that is a terminal subtree, the function stat(u) verifies whether . If the inequality is satisfied the subtree is pruned, otherwise it is kept.
Given , set for all , and, for any define
(16) 
The context tree estimator obtained with this procedure can be defined as
(17) 
where refers to the largest suffix of . Note that, as well as in the classical algorithm Context, implies for all .
Remark 3.
In Galves and Leonardi,Florencia (2008) it was proved the strong consistency of the estimator (17) (with instead of ) for the case of unbounded context tree models. The proof relies on a mixture property which is always satisfied in the case of finite context tree models. In particular, is also true for the law of the response sequence since its time memory depends on the time memory of the associated context tree model.
2.3.3 Bayesian Information Criterion (BIC)
This section describes a model selection procedure for the categorical case using the Bayesian Information Criterion. Model selection for context tree models via BIC was first addressed in Csiszár and Talata (2006). We formalize here how the procedure introduced in Csiszár and Talata (2006) is used in our case.
Given a sample and a constant , the BIC estimator for sequence of random objects driven by context tree models is defined as
(18) 
where
stands for the degree of freedom of the model. Formally, for any admissible context tree
, we define, for each ,and .
Csiszár and Talata (2006) showed that (18) can be computed efficiently through the following inductive procedure. Starting with , for any , define the quantity and the indicator , and for any define recursively the quantity
(19) 
and the indicator
(20) 
The estimate obtained solving (18) can be written as
(21) 
Observe that, on the contrary to the algorithm Context, for a given , does not imply that for any .
Remark 4.
The fact that the recursive procedure above effectively solves the BIC optimization problem and the consistency of the estimator were proved in Csiszár and Talata (2006) for the case of context tree models . In Csiszár and Talata (2006), the analogous to equation (18) is
in which the maximum loglikelihood is computed using the empirical probabilities of the distribution , instead of . Since the distribution affects these proofs only through the ergodic theorem and its memory dependency on , it is straight forward that all the proofs can be adapted to the case of the conditional loglikelihood considered in this section, which depends on instead of .
2.3.4 Tuning the model selection methods
The threshold used in the procedures based on the general algorithm Context and the penalization constant
involved in the model selection procedure based on BIC are hyperparameters whose values must be specified a priori. Small values of
and result in big context trees (big in the sense of its size) and, consequently, overfitted models while high values of these hyperparameters give rise to context trees of small size and underfitted models.To choose the value of the hyperparameters one can use the Smallest Maximizer Criterion (SMC) (Galves et al., 2012). The SMC procedure was introduced in Galves et al. (2012) to tune the model selection method for context tree models based on BIC. Here we extend this framework to the case of sequence of random objects driven by context tree models for tuning the model selection methods proposed in the categorical case.
The SMC procedure consists of two steps. In the first step a set of candidate models is computed, namely the champion trees. In the second step, an optimal model is chosen within the set of champion trees. The champion trees obtained will depend on the model selection procedure being tuned and may differs from one procedure to another.
In this section, denote either , or .

Step 1. Compute the champion trees. The champion trees constitute a set of estimated context trees obtained by varying the value of the hyperparameter . When we obtain the admissible context tree of maximum size (the more complex model). By successively increasing the value of , we obtain a finite set of context trees totally ordered with respect to the order , say . It is not hard to see that there exists a value such that for any the estimated model is the empty tree, , which refers to the independent model.
A crucial fact for the consistency of SMC is that the context tree generating the sample data belongs eventually almost surely to the set of champion trees as goes to . This is the content of the theorem below and its proof is a cofactor of Theorem 6 in Galves et al. (2012) and an extra argument given in Appendix B.
Proposition 2.
Assume is a sample of a sequence of random objects driven by context tree model with parameters , with . Consider the map with denoting either , or and denote by
(22) 
Then is totally ordered with respect to and eventually almost surely as .
It is well known that the bigger the context tree, the higher its sample likelihood. When SMC was introduced for tuning the BIC model selection algorithm for context tree models, Galves et al. (2012) theoretically proved the existence of a change of regime in the rate in which the sample likelihood increases in the set of champion trees (Theorem 7 in Galves et al. (2012)). The authors also showed that such changing point in the likelihood function occurs at the true model generating the data. A consequence of the proof of consistency of SMC is that the change of regime does not depends on the estimation method used to obtain the champion trees, but only on some properties of the set. For this reason we state the next theorem in a slightly more general form that stated in Galves et al. (2012) and in terms of sequences of random objects driven by context tree models.
Theorem 1.
Assume is a sample of a sequence of random objects driven by a context tree model with parameters , with . Given a set satisfying

is totally ordered with respect to and

eventually almost surely as .
The following holds:

For any with there exists a constant such that
(23) 
For any with there exists a constant such that
(24)
Theorem 1 is a cofactor of Theorem 7 in Galves et al. (2012) and its proof is presented in Appendix C. This theorem provides a criterion to choose the optimal model (and consequently, the optimal value) among the champion trees. That is to say, the model in at which the change of regime occurs. This is the scope of the second step of the SMC.

Step 2. Identify the optimal tree. To select an optimal tree we use the following consequence of Theorem 1. For any ,
(25) This suggest that should be the smallest context tree such that the rescaled difference between the conditional loglikelihood of and (its successor in the order ) decreases as
increases. This is done by comparing average bootstrapped conditional loglikelihood using a ttest, as follows.

Fix two different sample sizes . Obtain independent bootstrap resamples of , of size , say
Similarly, let be another set of independent bootstrap samples of size constructed by truncating the sequences to size .

For each and its successor () compute the rescaled loglikelihood differences
(26) for and .
Apply a oneside ttest to compare the mean of the samples
and . 
Select as optimal tree the smallest champion tree such that the test rejects the equality of the means in favor of the alternative hypothesis .

Step 2 involves the computation of bootstrap resamples of a sequence of random objects driven by context tree models. The toolbox implements different bootstrap strategies for that. Before introducing them, we describe the bootstrap schemes implemented to resample a context tree model .

(Parametric bootstrap) The bootstrap samples are obtained by drawing from a parameterized distribution (Bühlmann, 2002) in the following way:

Choose a hyperparameter value and estimate the model using the data .

Generate the bootstrap samples by simulating from the approximated distribution ,


(Block bootstrap) Split the sample into nonoverlapping blocks (see Figure 1a). These blocks are build by using a renewal context of to split the sequence. A renewal context is a string from which the next symbols can be generated without knowing further information from the past. A resample is obtained by repeatedly sampling uniformly a block from the set of blocks and concatenating them. In the toolbox, the user can specify the renewal context, or it can be computed from the estimated model .
A bootstrap resampling of the bivariate chain can be obtained with two different procedures:

(Parametric bootstrap)

Choose a hyperparameter value and estimate from .

Obtain a bootstrap resampling of the sequence using one of the bootstrap strategies for context tree models described above. The user can also choose not to resample the sequence .

Generate a sequence using the distribution .


(Block bootstrap) Split the sample into nonoverlapping blocks using a renewal context (see Figure 1b). In this case, a renewal context is a string from such that from it is possible to generate both the next symbols of sequence and its associates responses , without further information from the past. The bootstrap samples can be obtain by repeatedly sampling uniformly from the set of blocks and concatenating them.
Remark 5.
Bühlmann (2000) proposed a procedure based on Risk functions for tuning the algorithm Context. The SeqROCTM toolbox implements this tuning procedure for the particular case of the Final prediction error risk. This procedure is available in the toolbox for tuning the model selection procedures for context tree models and the model selection procedures for sequence of random objects driven by context tree models (for the categorical case).
3 Software Architecture
The SeqROCTM toolbox have been designed following a modular structure. The toolbox consists of functions written in Matlab that can be grouped in four modules regarding their functionalities (see Figure 2). This architecture makes the software easy to update, either by adding new functionalities or by improving the existing ones.
The module Data Simulation includes routines to simulate sequences of inputs (i.e., context tree models), to simulate the response sequence of a sequence of random objects driven by context tree models and to simulate the bivariate sequence. The module Visualization implements the algorithm described in Mill (2020) to graphically show a tree structure. This module contains also a routine to print the context tree in the console. The Tools module implements several functions that can assist the researcher during the experimental design and data analysis. Some of those functions are also invoked by functions in other modules. Some demos illustrating how to use the toolbox are included in the Demo module.
The main functions of the toolbox are in the Model Selection module. This module contains all the model selection procedures and tuning algorithms introduced in Section 2. Figure 3 presents a closeup of this module.
The novelty implemented in this toolbox it is illustrated in Figure 3 by the branch growing from the SeqROCTM node, that is, the mathematical framework to make inference in the class of sequences of random objects driven by context tree models. Nevertheless, due to the close relation to model selection in context tree models, we end up including in the toolbox several existing model selection algorithms for context tree models.
Up to our knowledge, there is only an Rpackage implementing model selection in context tree models introduced by Mächler and Bühlmann (2004). This package implements the algorithm Context and a tuning procedure for the algorithm Context based on Risk functions. Our toolbox also works as an alternative tool for this purpose.
All the implementations are selfcontained. The only external function the toolbox uses is the function permn (van der Geest, 2019).
4 Illustrative example: The Goalkeeper game
This section presents the major functionalities of the SeqROCTM toolbox through an illustrative example.
The Goalkeeper game is a video game developed by the NeuroMat team (https://game.numec.prp.usp.br/) as a tool to investigate the conjecture that the brain does statistical model selection (Castro, 2006). During the game, the kicker can throw the ball in three directions: left, center or right. The agent, playing the role of the Goalkeeper in a soccer penalty shootout, has to defend as much penalties as possible by predicting, at each step, in which direction the kicker will throw the ball. The kicker’s choices are randomly generated according to a context tree model. The Goalkeeper game has been used in the experimental protocol of some neurobiological experiments (Stern et al., 2020).
Here, for simplicity, instead of collecting some data using the game, we will simulate different participant strategies to generate the responses. We show how the toolbox can be used to assess the participant strategy from the data. The aim is that the estimated strategy matches the one used to generate the responses matches.
To generate the sequence of kick directions, we define a context tree model that generates sequences according to the following rule: after a shot to the left the kicker always send the ball to the center, after a shot to the right he always send the ball to the left, but after a shot to the center, if one step back he sent the ball to the center, then he send the ball to the left. Otherwise, if one step back he sent the ball to the left, then he send the ball to the right with probability 0.8 and to the center with probability 0.2.
Formally, the directions left, center and right
are represented by the symbols 0, 1 and 2, respectively. These symbols will conform the alphabet. When using the toolbox, an alphabet is always represented by a vector of consecutive positive integers from zero to the number of elements minus one. A context tree is defined through a cell array of vectors, each vector representing a context (i.e., a leaf of the tree). The distributions associated to the contexts are specified by a matrix with the number of rows equals the number of contexts and the number of columns equals the number of symbols in the alphabet. In this way, the
th row contains the distribution associated to the th context in the cell array defining the context tree. The following source code defines these variables according to the example.
Comments
There are no comments yet.