1 Introduction
Probabilistic models are used in a broad swathe of disciplines, including the social and behavioural sciences, biology, and the physical and computational sciences, to name but a few. At their very core, probabilistic models are defined in terms of random variables, which range over a set of outcomes that are subject to chance. For example, a random variable might be defined to model the performance of human memory. In this case, the possible outcomes might be words studied by a human subject before their memory is cued. After cueing, the subject recalls the first word that comes to mind from the set of study words. This outcome is recorded as a measurement. Repeated measurements over a set of subjects allow the probability of the recall of a certain word to be empirically established.
It is important to note from the outset that the random variable has been devised by the modeller with a specific functional identity in mind, namely to model the recall of a set of predefined study words. When developing probabilistic models in this way, the underlying assumption is that the functional identity of a random variable is independent of the context in which it is measured. For example, the purpose, or functional identity, of the variable is assumed to be the same regardless of whether the memories of human subjects are studied in a controlled laboratory or in "the wild", such as in a night club. This assumption seems perfectly reasonable. However, in quantum physics the analog of this assumption does not always hold, a phenomenon that has become known as "contextuality". More formally, the Kochen-Specker theorem (Kochen and Specker, 1967) implies that quantum theory is incompatible with the assumption that measurement outcomes are determined by physical properties that are independent of the measurement context. Placing this theorem in the context of probabilistic models: contextuality is the "impossibility of assigning a single random variable to represent the outcomes of the same measurement procedure in different measurement conditions" (Acacio De Barros and Oas, 2015).
Contextuality plays a central role in the rapidly developing field of quantum information in delineating how quantum resources can transcend the bounds of classical information processing (Howard et al., 2014). It also has important consequences for our understanding of the very nature of physical reality. It is still an open question, however, if contextuality manifests outside the quantum realm. Some authors in the emerging field of quantum cognition have investigated whether contextuality manifests in cognitive information processing, for example, human conceptual processing (Gabora and Aerts, 2002; Aerts et al., 2014; Aerts and Sozzo, 2014; Bruza et al., 2015; Gronchi and Strambini, 2016) and perception (Atmanspacher and Filk, 2010; Asano et al., 2014; Zhang and Dzhafarov, 2017).
It is curious that the preceding deliberations around random variables have a parallel in the field of computer programming languages. More than five decades ago, programming languages such as FORTRAN featured variables that were global. (In early versions of FORTRAN, all variables were global.) As programming languages developed, global variables were seen as a potential cause of errors. For example, in a large program a variable can inadvertently be used for functionally different purposes at different points in the program. The error can be fixed by splitting the variable into two distinct global variables, one for each functional purpose, so that there is no danger that their unique functional identities can become confounded. However, when the program involves large numbers of global variables, keeping track of the functional identities of variables can become tedious and a source of error. Such errors were considered serious and prevalent enough that, following in the wake of Dijkstra's famous paper titled "Go To statement considered harmful", Wulf and Shaw (1973) advocated in a similarly influential article that global variables are "harmful" and perhaps should be abolished. This stance was developed in relation to block structured programming languages. A "block", or "scope", refers to the set of program constructs, such as variable definitions, that are only valid within a delineated syntactic fragment of program code. Wulf and Shaw (1973) argued that when a program employs a scope in which a variable is defined locally, as well as a variable with the same label that is global to that scope, then the variable becomes "vulnerable" to erroneous overloading. The theory of programming languages subsequently developed means by which a variable with the same label can be used in two different scopes yet preserve a unique functional identity within the given scope. This is not the case in state-of-the-art probabilistic modelling.
We believe that the way probabilistic models are currently developed is somewhat akin to writing FORTRAN programs from a few decades ago. By this we mean that in the development of a probabilistic model all the random variables are global. As a consequence, errors can appear in the associated model if the functional identity of variables changes because the phenomenon being modelled is contextual.
The aim of this article is to contribute to the foundations of a probabilistic programming language that allows convenient exploration of contextuality in a wide range of applications relevant to cognitive science and artificial intelligence. For example, dedicated syntax is illustrated which shows how a measurement context can be specified as a syntactic scope in a probabilistic program. In addition, random variables can be declared local to a scope to allow overloading, which is convenient for the development of models. Such programs are referred to as P-programs and fall within the emerging area of probabilistic programming (Gordon et al., 2014).
Probabilistic programming languages (PPLs) unify techniques from conventional programming, such as modularity and imperative or functional specification, with the representation and use of uncertain knowledge. A variety of PPLs have been proposed (see Gordon et al. (2014) for references), which have attracted interest from the artificial intelligence, programming languages, cognitive science, and natural language processing communities (Goodman and Stuhlmüller, 2014). However, unlike conventional programming languages, which are written with the intention of being executed, a core purpose of a probabilistic program is to specify a model in the form of a probability distribution. In short, PPLs are high-level, universal languages for expressing probabilistic models. As a consequence, these languages should not be confused with probabilistic, or randomized, algorithms, which employ a degree of randomness as part of their logic.

In addition to the dedicated syntax, P-programs have a semantics based on hypergraphs which determines whether the phenomenon being modelled is contextual. These semantics rely on hypergraphs in two ways. Firstly, a hypergraph approach from relational database theory is used to determine whether the schema of variables of the various measurement contexts is acyclic. If so, the phenomenon being modelled is non-contextual. Secondly, if the schema is cyclic, measurement contexts are mapped to "contextuality scenarios", which are probabilistic hypergraphs. Although these hypergraphs were developed in the field of quantum physics, they provide a comprehensive general framework for determining whether the phenomenon being modelled by the P-program is contextual.
2 An example P-program
In order to convey some of the core ideas behind P-programs, Figure 1 illustrates an example program where the phenomenon being modelled is two coins being tossed in four experimental conditions. Some of these conditions induce various biases on the coins. The syntax of the P-program is expressed in the style of a feature-rich probabilistic programming language called WebPPL (http://webppl.org/) (Goodman and Tenenbaum, 2016). However, the choice of the language is not significant. WebPPL is simply being used as an example syntactic framework.
In P-programs, syntactic scopes are delineated by the reserved word context. Each scope specifies an experimental condition, or "measurement context", under which a phenomenon is being examined. The example P-program defines four such contexts labelled P1, P2, P3 and P4. Consider context P1, which declares two coins as dichotomous random variables A1 and B1 that are local to this scope. The syntax flip(0.5) denotes a fair coin; any value other than 0.5 defines a biased coin. Declaring variables local to the scope syntactically expresses the assumption that the variables retain a unique functional identity within the scope. The random variable declarations within a scope define a set of events which correspond to outcomes that can be observed in relation to the phenomenon being examined in the given measurement context. For example, the event A1 = 1 signifies that coin A1 has been observed as being a head after flipping. Joint event spaces are defined by the syntax p[A1,B1], which becomes a joint probability distribution by the syntax Infer(samples:1000,p). In this case two coins have been flipped 1000 times to estimate the probabilities associated with each of the four mutually exclusive joint events in the event space. The resulting distribution represents the model of the phenomenon in that measurement context, which is returned from the scope as a partial model. The other measurement contexts P2, P3 and P4 are similarly defined, resulting in the four distributions depicted in Figure 2. The structure of each distribution will be referred to as a probabilistic table, or ptable for short, as these are a natural probabilistic extension of the tables defined in relational databases (Bruza and Abramsky, 2016).
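As a rough illustration of what a scope such as P1 computes, the following Python sketch (a stand-in for the WebPPL syntax; the function names `flip` and `infer` are our own, with `infer` playing the role of Infer(samples:1000,p)) estimates the joint distribution of two fair coins by sampling.

```python
import random

def flip(p, rng):
    """Bernoulli draw: 1 with probability p, else 0 (mimics WebPPL's flip)."""
    return 1 if rng.random() < p else 0

def infer(samples, rng=None):
    """Estimate the joint distribution p[A1,B1] of two fair coins by sampling,
    mimicking Infer(samples:1000, p) in the example P-program."""
    rng = rng or random.Random(0)
    counts = {}
    for _ in range(samples):
        outcome = (flip(0.5, rng), flip(0.5, rng))
        counts[outcome] = counts.get(outcome, 0) + 1
    return {k: v / samples for k, v in counts.items()}
```

Each of the four scopes in Figure 1 would correspond to one such call, with the bias arguments varied per measurement context.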
Modelling practice is usually governed by the norm that it is desirable to construct a single model of the phenomenon being studied. Dedicated syntax, e.g., model(P1,P2,P3,P4), allows partial models from the four measurement contexts to be combined to form a single distribution, such that each distribution corresponding to a partial model can be recovered from this single distribution by appropriately marginalizing it (Bruza and Abramsky, 2016; Bruza, 2016). It turns out that it is not always possible to construct such a single model. As we shall see below, when this happens the phenomenon being modelled turns out to be "contextual". Abramsky (2015) discovered that this formulation of contextuality is equivalent to the universal relation problem in database theory. This problem involves determining whether the relations in a relational database can be joined together (using the natural join operator) to form a single "universal" relation such that each constituent relation can be recovered from the universal relation by means of projection.
Relational database theory tells us that a key consideration in this problem is whether the database schema comprising the constituent relations (ptables) is cyclic or acyclic. A database schema is deemed "acyclic" iff its hypergraph H = (V, E), where V denotes the set of vertices and E the set of edges, can be reduced to an empty graph using the Graham procedure (Gyssens and Paredaens, 1984).
A hypergraph differs from a normal graph in that an edge can connect more than two vertices. For this reason, such edges are termed "hyperedges". For example, the database schema corresponding to the P-program in Figure 1 is depicted in Figure 2. In our case, the nodes of the hypergraph are the individual variables in the headers of the ptables, and the edges correspond to the sets of variables in these headers, i.e., there is one edge per constituent ptable, namely the set of variables defining the header of that ptable. (As the headers of the ptables only contain two variables, the hypergraph is in this case a standard graph.)
The Graham procedure is applied to the hypergraph until no further action is possible:

1. delete every edge that is properly contained in another one;

2. delete every node that is only contained in one edge.
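The two reduction steps are mechanical enough to sketch directly. The following Python functions are an illustrative implementation (not taken from the paper): they apply the two steps repeatedly and report acyclicity when the hypergraph reduces to empty.

```python
from collections import Counter

def graham_reduce(edges):
    """Apply the two Graham reduction steps until neither changes the hypergraph.
    `edges` is an iterable of sets of variable names; returns the residual edges."""
    edges = [frozenset(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # Step 1: delete every edge that is properly contained in another edge.
        kept = [e for e in edges if not any(e < f for f in edges)]
        if len(kept) != len(edges):
            edges, changed = kept, True
        # Step 2: delete every node that is contained in only one edge.
        counts = Counter(v for e in edges for v in e)
        reduced = [frozenset(v for v in e if counts[v] > 1) for e in edges]
        if reduced != edges:
            edges, changed = reduced, True
    return [e for e in edges if e]

def is_acyclic(edges):
    """A schema is acyclic iff the Graham procedure empties its hypergraph."""
    return graham_reduce(edges) == []
```

For instance, a tree-shaped schema of two-variable headers reduces to empty, while the four pairwise contexts of the Bell-style schema discussed later (all four combinations of A1, A2 with B1, B2) leave a residual cycle.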
Applying the Graham procedure to the example results in an empty hypergraph, so the schema is deemed "acyclic".
There are a number of theoretical results in relational database theory which make acyclic hypergraphs significant with regard to providing the semantics of joining partial models into a single model. Wong (1997) formalizes the relationship between Markov distributions and relational database theory by means of a generalized acyclic join dependency (GAJD). The key idea behind this relationship is the equivalence between probabilistic conditional independence and a generalized form of multivalued dependency, the latter being a functional constraint imposed between two sets of attributes in the database schema. It turns out that a joint distribution factorized on an acyclic hypergraph is equivalent to a GAJD (Wong, 2001).

For example, consider once again the acyclic schema in Figure 2, which comprises four ptables. As the hypergraph is acyclic, there is necessarily a so-called join tree construction that satisfies the GAJD, in which each node denotes a unique ptable in the set. The practical consequence is that there is a generalized join expression whose sequence of joins follows a tree construction ordering derived from the acyclic hypergraph. In short, if the hypergraph constructed from the schema comprising the ptables is acyclic, then a generalized join expression exists which joins the ptables into a single probability distribution such that each constituent ptable is a marginal distribution of it.
In order to gain some intuition about how this plays out in practice, the acyclic database schema depicted in Figure 2 results in the join tree depicted in Figure 3. The nodes depict the variables in the respective ptables and the edges represent the overlap between the sets of variables in the respective headers. The numbers on the edges denote the ordering used to produce the join expression. Under the assumption that the probability distributions represented in the nodes have identical marginal distributions over the variables associated with a shared edge, we can see how the join tree produces a Markov network which, in turn, specifies the probabilistic join of the constituent ptables (Liu et al., 2011):
p(V) = [ p(t_1) p(t_2) p(t_3) p(t_4) ] / [ p(s_1) p(s_2) p(s_3) ]   (1)

where V is the set of all variables, t_1, ..., t_4 are the sets of variables heading the four ptables (the nodes of the join tree), and s_1, s_2, s_3 are the separators labelling its edges.
Observe how the structure of the equation mirrors the graph in Figure 3: the numerator corresponds to the nodes of the join tree and the denominator corresponds to terms which normalize the probabilities. In addition, this expression reflects the conditional independence assumptions implied by the join tree, namely that the variables in any two non-adjacent nodes are conditionally independent given the separator variables on the path connecting them.
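To make the probabilistic join concrete, here is a minimal Python sketch of the two-table case of this factorization, joining hypothetical ptables p(A1,B1) and p(B1,A2) through the shared variable B1 (the tables and their numbers are invented for illustration).

```python
def join(pAB, pBA2):
    """Probabilistic join of two ptables sharing the variable B1:
    p(A1, B1, A2) = p(A1, B1) * p(B1, A2) / p(B1)."""
    # Marginal of the shared variable B1, computed from the first table.
    pB = {}
    for (a, b), v in pAB.items():
        pB[b] = pB.get(b, 0.0) + v
    return {(a, b, c): pAB[(a, b)] * pBA2[(b2, c)] / pB[b]
            for (a, b) in pAB for (b2, c) in pBA2 if b2 == b}

# Hypothetical ptables whose marginals over B1 agree (0.4 for B1=1, 0.6 for B1=0).
pAB = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.4}
pBA2 = {(1, 1): 0.1, (1, 0): 0.3, (0, 1): 0.24, (0, 0): 0.36}
```

Marginalizing the joined distribution over A2 recovers p(A1,B1) exactly, which is the defining property of the GAJD-style join; when the shared marginals disagree across ptables, no such single distribution exists.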
Let us summarize the situation so far and reflect on the issue of contextuality. A P-program comprises a number of scopes, where each scope corresponds to a measurement context. A scope returns a probability distribution in the form of a ptable, which can be considered a partial model of the phenomenon. A reasonable goal is to combine these distributions into a single distribution so that a single model of the phenomenon is produced. When the schema of the constituent ptables is acyclic and the marginal distributions over the sets of intersecting variables agree across contexts, a straightforward extension of relational database theory can be used to produce the required single model (as has been shown in more detail in (Bruza, 2016)). The fact that it is possible to construct a single model means that the random variables in the P-program have a functional identity that is independent of the measurement contexts. The phenomenon is therefore non-contextual.
Much of the research on contextuality concerns the case where the schema of the ptables is cyclic (Zhang and Dzhafarov, 2017). In order to explore such cases, we will continue to use hypergraphs, but instead of defining the graph structure at the level of the schema, as was illustrated by the Graham procedure, the structure of the hypergraph will be defined in terms of the underlying events in the measurement contexts defined in the P-program. In database terms this equates to defining the hypergraph structure in terms of the data in the respective ptables.
3 Probabilistic models and hypergraph semantics
In the following, we draw from a comprehensive theoretical investigation using hypergraphs to model contextuality in quantum physics (Acin et al., 2015). The driving motivation is to leverage these theoretical results to provide the semantics of P-programs when the schema of the ptables to be joined is cyclic. How these semantics are expressed relates to how the syntax has been specified, which in turn relates to the experimental design that the modeller has in mind. The basic building block of these semantics is a "contextuality scenario".
Definition 3.1.
(Contextuality Scenario) (Definition 2.2.1 (Acin et al., 2015)) A contextuality scenario is a hypergraph H = (V, E) such that:

each vertex v ∈ V denotes an event which can occur in a measurement context;

each hyperedge e ∈ E is the set of all possible events in a measurement context.
The set of hyperedges E is determined by both the measurement contexts and the measurement protocol. Each measurement context is represented by an edge in the hypergraph H. The basic idea is that each syntactic scope in a P-program will lead to a hyperedge, whose events are a complete set of outcomes in the given measurement context specified in the associated scope. Additional hyperedges are a consequence of the constraints inherent in the measurement protocol that is applied. The examples to follow aim to make this clear.
In some cases, hyperedges will have a non-trivial intersection: if v ∈ e1 and v ∈ e2 for distinct hyperedges e1 and e2, then this represents the idea that the two different measurement outcomes corresponding to v should be thought of as equivalent, as will be detailed below by means of an order effects experiment.
Order effects experiments involve two measurement contexts, each involving two dichotomous variables A and B which represent the answers to two yes/no questions. In one measurement context, question A is served before question B, and in the second measurement context the reverse order is served, namely B then A. Order effects occur when the answer to the first question influences the answer to the second. These two measurement contexts are syntactically specified by the scopes P1 and P2 shown in Figure 4.
In this P-program, syntax of the form var B = A ? flip(0.8): flip(0.1) models the influence of the answer to A on B via a pair of biased coins. In this case, if A = 1 ('yes'), then the response to B is determined by flipping an 80% biased coin. Conversely, if A = 0, then the response to B is determined by flipping a 10% biased coin (the choices of such biases are determined by the modeller). It should be carefully noted that the measurement contexts in the order effects program do not reflect the usual understanding of measurement context employed in experiments analyzing contextuality in quantum physics. In those experiments, a measurement context comprises observables that are jointly measurable, so the order in which the observables within a given context are measured will not affect the associated statistics.
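A rough Python rendering of scope P1 makes the dependence of B on A explicit (the function names are hypothetical, and WebPPL's flip is replaced by a plain Bernoulli draw).

```python
import random

def flip(p, rng):
    """Bernoulli draw: 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def sample_P1(rng):
    """One run of measurement context P1: question A is answered first, then
    B's bias depends on A, mirroring `var B = A ? flip(0.8) : flip(0.1)`."""
    A = flip(0.5, rng)
    B = flip(0.8, rng) if A else flip(0.1, rng)
    return A, B

def estimate_conditional(n=20000, seed=0):
    """Estimate p(B = 1 | A = 1) by simple Monte Carlo."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n):
        A, B = sample_P1(rng)
        if A:
            total += 1
            hits += B
    return hits / total
```

Scope P2 would be obtained by swapping the roles of A and B, with whatever biases the modeller chooses for the reversed order.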
We will now use this simple example to illustrate the associated contextuality scenario, which is shown in Figure 5. Firstly, the set of events (measurement outcomes) comprises all possible combinations of yes/no answers to the questions A and B, where 1 denotes 'yes' and 0 denotes 'no'. In this figure, the two rounded rectangles represent the events within the two measurement contexts specified by the syntactic scopes P1 and P2. For example, in the rectangle labeled P1, "11" is shorthand for the event (A = 1, B = 1), etc. Observe that the corresponding hyperedges (rounded rectangles) contain an exhaustive, mutually exclusive set of events. In addition, the two spanning hyperedges going across these rectangles similarly comprise exhaustive, mutually exclusive sets of events. These spanning edges help illustrate events that are considered to be equivalent.
Firstly, it is reasonable to assume that answering yes (or no) to both questions in either measurement context represents equivalent events. Therefore, the events labelled 11 and 00 in P1 can respectively be assumed equivalent to 11 and 00 in P2. It becomes a little more subtle when the polarity of the answers differs. For example, the event labelled 10 in P1 represents the event (A = 1, B = 0), remembering that question A was asked before question B in this context. The equivalent event in hyperedge P2 is labelled 01, which corresponds to the event (B = 0, A = 1), where question B is asked before question A. As conjunction is commutative, it is reasonable to view these two converse events as equivalent. In summary, if 10 in P1 is equivalent to 01 in P2, and 01 in P1 is equivalent to 10 in P2, then the two spanning hyperedges (one shown dashed in Figure 5) can be established.
Let us now return to the issue of contextuality. A probabilistic model corresponding to a contextuality scenario is a mapping p : V → [0, 1] of measurement outcomes to probabilities. Henson and Sainz (2015) point out that
“By defining probabilistic models in this way [rather than by a function depending on the measurement performed], we are assuming that in the set of experimental protocols that we are interested in, the probability for a given outcome is independent of the measurement that is performed”.
Observe carefully that defining probabilistic models in this way formalizes the assumption mentioned in the introduction, namely that random variables are independent of the measurement context and thus have a single functional identity. Without a single functional identity it is impossible to assign a single random variable to represent the outcomes of the same measurement protocol in different measurement contexts.
It is a requirement that the mapping adheres to the expected normalization condition: the sum of p(v) over the events v in each hyperedge equals 1. By way of illustration, consider once again Figure 5. This contextuality scenario has four edges. The normalization condition enforces the following constraints:
p(11_P1) + p(10_P1) + p(01_P1) + p(00_P1) = 1   (2)

p(11_P2) + p(10_P2) + p(01_P2) + p(00_P2) = 1   (3)

p(11_P1) + p(10_P1) + p(00_P2) + p(10_P2) = 1   (4)

p(00_P1) + p(01_P1) + p(11_P2) + p(01_P2) = 1   (5)

where p(xy_Pi) denotes the probability of the outcome labelled xy in hyperedge Pi. A definition of contextuality can now be presented.
Definition 3.2 ((Probabilistic) contextuality).
Let H be a contextuality scenario and let G(H) denote the set of probabilistic models on H. H is deemed "contextual" if G(H) = ∅.
In other words, the impossibility of a probabilistic model signifies that the phenomenon being modelled is contextual. (The label "probabilistic" mirrors an analogous definition of contextuality based on sheaf theory (Abramsky et al., 2016).)
Let us now examine the possibility of a probabilistic model on the order effects contextuality scenario (Figure 5). Equations (2) and (5) imply that p(11_P1) + p(10_P1) = p(11_P2) + p(01_P2). Now, 11_P1 and 10_P1 are respectively associated with the outcomes (A = 1, B = 1) and (A = 1, B = 0). In other words, p(11_P1) + p(10_P1) denotes the marginal probability p(A = 1) in measurement context P1. By a similar argument, p(11_P2) + p(01_P2) denotes p(A = 1) in measurement context P2, written this way to emphasize that question B is asked first in that context. In other words, the constraints imposed by the normalization conditions in the hyperedges imply that the marginal probability p(A = 1) must be the same across both measurement contexts P1 and P2.
This conclusion makes sense when considered in relation to the definition of contextuality: the only way that a probabilistic model p can be defined is if the marginal probabilities of the variables A and B are the same in both measurement contexts P1 and P2. If not, then variable A has a different functional identity when question A is asked first (in measurement context P1) as opposed to when it is asked second (in measurement context P2).
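The argument above can be mechanized. The Python sketch below checks the normalization condition on every hyperedge for a candidate assignment p; the event labels and the spanning-edge sets are our assumed reading of Figure 5, not taken verbatim from it. An assignment whose marginal p(A = 1) differs between P1 and P2 fails a spanning edge, signalling contextuality in the sense of Definition 3.2.

```python
# Events labelled "xy@P1" / "xy@P2" (our notation for the events in Figure 5).
P1 = ["11@P1", "10@P1", "01@P1", "00@P1"]
P2 = ["11@P2", "10@P2", "01@P2", "00@P2"]
# Assumed spanning hyperedges, built from the equivalences
# 10@P1 ~ 01@P2 and 01@P1 ~ 10@P2.
E3 = ["00@P1", "01@P1", "11@P2", "01@P2"]
E4 = ["11@P1", "10@P1", "00@P2", "10@P2"]
EDGES = [P1, P2, E3, E4]

def satisfies_normalization(p, edges=EDGES, tol=1e-9):
    """True iff the probabilities in every hyperedge sum to 1."""
    return all(abs(sum(p[v] for v in e) - 1.0) <= tol for e in edges)

# Marginal p(A=1) agrees across contexts (0.5 in both): all edges normalize.
consistent = {"11@P1": 0.4, "10@P1": 0.1, "01@P1": 0.2, "00@P1": 0.3,
              "11@P2": 0.3, "10@P2": 0.25, "01@P2": 0.2, "00@P2": 0.25}
# Marginal p(A=1) differs (0.5 in P1 vs 0.4 in P2): a spanning edge fails.
signalling = {"11@P1": 0.4, "10@P1": 0.1, "01@P1": 0.2, "00@P1": 0.3,
              "11@P2": 0.2, "10@P2": 0.3, "01@P2": 0.2, "00@P2": 0.3}
```

Note that both assignments normalize within each context edge; only the spanning edges distinguish them.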
In summary, the semantics of a P-program is represented by a contextuality scenario, which has the form of a hypergraph. Contextuality equates to the impossibility of a probabilistic model over the hypergraph. This impossibility is where contextuality meets probabilistic models.
4 Syntax and semantics of combining contextual scenarios according to experimental design
Different fields employ various experimental designs when studying a phenomenon of interest. For example, in psychology a “between subjects” experimental design means a given participant should only be subjected to one measurement context. In quantum physics, however, some experiments involve measurement contexts which are enacted simultaneously with the requirement that observations made in each context are local to that context and don’t influence other measurement contexts. This constraint is often referred to as the “no signalling” condition.
One of the advantages of using a programming approach to develop probabilistic models is that experimental designs can be syntactically specified in a modular way. In this way, a wide variety of experimental designs across fields can potentially be catered for. For example, consider the situation where an experimenter wishes to determine whether a system can validly be modelled compositionally in terms of two component subsystems A and B, as shown in Figure 6.
Two different experiments can be carried out upon each of the two presumed components, which will answer a set of 'questions' with binary outcomes, leading to four measurement contexts. For example, one experimental context would be to ask one question of component A and another of component B.
This abstract experimental design has been instantiated in a number of ways. For example, in quantum physics it has been employed to determine whether a system comprising two photons A and B is entangled. (Whenever such bipartite systems of photons are mentioned throughout this article, in principle any bipartite or multipartite quantum system would do, even fermions.) In addition, it has been employed in cognitive psychology to test for contextuality in human cognition (Aerts et al., 2014; Aerts and Sozzo, 2014; Bruza et al., 2015; Dzhafarov et al., 2015; Gronchi and Strambini, 2016). For example, Bruza et al. (2015) describe an experiment to determine whether novel conceptual combinations such as BOXER BAT adhere to the principle of semantic compositionality (Pelletier, 1994). Semantic compositionality entails that the meaning of BOXER BAT is some function of the meaning of the component concepts BOXER and BAT. In this case, component A corresponds to the concept BOXER and component B corresponds to the concept BAT. Each of these concepts happens to be bi-ambiguous; for example, BOXER can be interpreted in a sport sense or an animal sense (a breed of dog). Similarly, the concept BAT can be interpreted in either of these senses. The interpretation of the concepts can be manipulated by priming words, which correspond to the 'questions' asked of the component concepts. For example, one experimental context would be to ask a set of human subjects to return an interpretation of BOXER BAT after being shown the priming words fighter and vampire. Note how fighter is designed to prime the sport sense of BOXER and vampire the animal sense of BAT. An interpretation given in this context might be "an angry furry black animal with boxing gloves on". It is important to note that the interpretation is probabilistic, namely the priming word influences an interpretation of the concept but does not determine it.
How can the system depicted in Figure 6 be modelled as a P-program? And how can the semantics of the P-program determine whether it is contextual? One way to think about the system is that it is equivalent to a set of biased coins A1, A2, B1 and B2, where the bias is local to a given measurement context. Figure 7 depicts a P-program that follows this line of thinking.
The P-program will be referred to as the "Bell scenario" program, as it programmatically specifies the design of experiments in quantum physics inspired by the physicist John Bell (Clauser and Horne, 1974). Such experiments involve a system of two space-like separated photons.
4.1 Bell contextuality scenario with no signalling
The Bell scenario program follows the design depicted in Figure 6 by first defining the components A and B together with the associated variables. Thereafter, the program features the four associated measurement contexts P1, P2, P3 and P4. Finally, the line model(design: 'nosignal',P1,P2,P3,P4) specifies that the measurement contexts are to be combined according to a "no signalling" condition. Formal details of this condition will follow, but essentially it imposes the constraint that measurements made on one component do not affect outcomes observed in relation to the other component. This could be because the components have sufficient spatial separation in a physics experiment, or alternatively, in a psychology experiment, because the cognitive phenomena represented by components A and B are independent cognitive functions.
The question now to be addressed is how the hypergraph semantics are to be formulated. Acin et al. (2015) provide the general semantics of Bell scenarios by means of multipartite composition of contextuality scenarios. As these semantics are compositional, the door is open to map syntactically specified components in a P-program to contextuality scenarios and then to exploit the composition to provide the semantics of the program as a whole. Consider the Bell scenario program depicted in Figure 7. The syntactically defined components A and B are modelled as contextuality scenarios H_A and H_B respectively. The corresponding hypergraphs are depicted in Figure 8.
Note how the variable definitions associated with a component map to edges in its hypergraph. For example, the syntax def A = component(A1,A2) corresponds to the two edges labelled A1 and A2 on the left hand side of Figure 8.
The question now is how to compose the contextuality scenarios H_A and H_B into a single contextuality scenario, which will express the semantics of the Bell scenario P-program. The most basic form of composition is by means of a direct product of the respective hypergraphs. The direct product H_A × H_B is a contextuality scenario such that V(H_A × H_B) = V(H_A) × V(H_B) and E(H_A × H_B) = { e_A × e_B : e_A ∈ E(H_A), e_B ∈ E(H_B) }. (See Definition 3.1.1 in (Acin et al., 2015).) The hypergraph of the product is shown in Figure 10. Observe how each syntactic context P1, P2, P3 and P4 specified in the Bell scenario P-program corresponds to an edge in the hypergraph. In addition, note the structural correspondence of the hypergraph in Figure 10 with the cyclic database schema depicted in Figure 9.
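The direct product is easy to state in code. The Python sketch below is illustrative (the vertex labels are our own); it builds the vertices and edges of the product for two single-component scenarios of the kind shown in Figure 8.

```python
def direct_product(VA, EA, VB, EB):
    """Direct product of two contextuality scenarios: vertices are pairs of
    vertices, and each pair of edges contributes the product edge e_A x e_B."""
    V = {(a, b) for a in VA for b in VB}
    E = [{(a, b) for a in eA for b in eB} for eA in EA for eB in EB]
    return V, E

# Each component has two dichotomous measurements, e.g. A1 and A2 for component A.
VA = ["A1=0", "A1=1", "A2=0", "A2=1"]
EA = [["A1=0", "A1=1"], ["A2=0", "A2=1"]]
VB = ["B1=0", "B1=1", "B2=0", "B2=1"]
EB = [["B1=0", "B1=1"], ["B2=0", "B2=1"]]
```

The result has 16 joint events and four edges of four events each, one edge per syntactic context P1 to P4, matching the structure of Figure 10.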
Note that the events in Figure 10 are denoted as various coloured dots with each such dot corresponding directly to a row of a ptable within the cyclic schema.
The Bell scenario program syntactically specifies that there should be "no signalling" between the respective components A and B via the command model(design: 'nosignal',P1,P2,P3,P4). This condition imposes constraints on the allowable probabilistic models on the combined hypergraph structure. Following Definition 3.1.2 in (Acin et al., 2015), a probabilistic model p is a "no signalling" model if, for every vertex v ∈ V(H_A) and all edges e_B, e'_B ∈ E(H_B), the marginal sum of p(v, w) over w ∈ e_B equals the sum over w ∈ e'_B, and symmetrically with the roles of H_A and H_B interchanged. The probabilistic constraints entailed by this definition will be illustrated in an example to follow. Acin et al. (2015, p. 45) show that not all probabilistic models of contextuality scenarios composed by a direct product are "no signalling" models. In order to guarantee that all probabilistic models of a combined contextuality scenario are "no signalling" models, the constituent contextuality scenarios H_A and H_B should be combined by the Foulis-Randall (FR) product, denoted H_A ⊗ H_B. As with the direct product of contextuality scenarios, the vertices of the FR product are defined by V(H_A ⊗ H_B) = V(H_A) × V(H_B). It is with respect to the hyperedges that the FR product differs from the direct product:

E(H_A ⊗ H_B) = E_{A→B} ∪ E_{B→A}

where E_{A→B} consists of the edges formed as the union over v ∈ e_A of {v} × f(v), for each e_A ∈ E(H_A) and each function f : e_A → E(H_B), and E_{B→A} is defined symmetrically.
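Under this reading of the definition, the FR product can be sketched in Python; for a Bell scenario with two dichotomous measurements per side it yields the 12 edges of the hypergraph in Figure 11, namely the four direct-product edges plus eight spanning edges. The code is illustrative, with hypothetical vertex labels.

```python
from itertools import product

def fr_half(EA, EB):
    """Edges of the form: union over a in eA of {a} x f(a),
    for every edge eA in EA and every function f : eA -> EB."""
    edges = set()
    for eA in EA:
        for choice in product(range(len(EB)), repeat=len(eA)):
            edges.add(frozenset((a, b)
                                for a, i in zip(eA, choice)
                                for b in EB[i]))
    return edges

def fr_product(EA, EB):
    """Hyperedges of the Foulis-Randall product: E_{A->B} union E_{B->A}."""
    a_to_b = fr_half(EA, EB)
    # The symmetric half, with vertex pairs flipped back into (a, b) order.
    b_to_a = {frozenset((a, b) for (b, a) in e) for e in fr_half(EB, EA)}
    return a_to_b | b_to_a

EA = [["A1=0", "A1=1"], ["A2=0", "A2=1"]]
EB = [["B1=0", "B1=1"], ["B2=0", "B2=1"]]
```

Constant functions f reproduce the direct-product edges, which is why the two halves overlap in exactly those four edges.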
We are now in a position to illustrate the semantics of the P-program of Figure 7 by the corresponding contextuality scenario depicted in Figure 11. Observe how the FR product produces extra edges that span the events across the measurement contexts labeled P1, P2, P3 and P4 when compared with the direct product hypergraph depicted in Figure 10. At first these spanning edges may seem arbitrary, but they guarantee that the allowable probabilistic models over the composite contextuality scenario satisfy the "no signalling" condition (Sainz and Wolfe, 2017). By way of illustration, the normalization condition on the edges imposes the following constraints (see Figure 11):
(6)–(9): ∑_{v ∈ e} p(v) = 1 for each of four hyperedges e of the scenario in Figure 11, where p(v) denotes the probability of the event v in the respective hyperedge. A consequence of constraints (6) and (8) is that the probabilities of the events involving component A’s outcomes sum to the same marginal in both hyperedges. When considering the associated outcomes, this means the marginal probability of each such outcome is unchanged.
(The preceding is an example of one of the constraints imposed by Definition 3.1.2 in (Acin et al., 2015), as specified above.) In other words, the marginal probability does not differ across the measurement contexts P1 and P2 specified in the P-program of Figure 7. In a similar vein, equations (6) and (9) imply that the corresponding marginal probability for the other component does not differ across its measurement contexts. The stability of marginal probabilities ensures that “no signalling” is occurring from one component to the other (see Figure 6). In terms of our BOXER BAT example, “no signalling” implies that the probability of interpretation of the concept BOXER does not change whether the priming word for BAT is ball (the sport sense) or vampire (the animal sense).
4.2 Bell contextuality scenario with signalling
Investigations into contextuality in quantum physics involve the “no signalling” condition. However, in cognitive science and related areas, the situation isn’t as clear cut. Dzhafarov and Kujala (2015) argue, for example, that the “no signalling” condition seems always to be violated in psychological experiments. By way of illustration, consider once again the conceptual combination BOXER BAT. Recall that the “no signalling” condition entails that the probability of interpretation of the concept BOXER does not change whether the priming word for BAT is ball (the sport sense) or vampire (the animal sense). Nor does the probability of interpretation of the concept BAT change whether the priming word for BOXER is fighter (the sport sense) or dog (the animal sense). However, it is easy to imagine that signalling may be involved in forming an interpretation of BOXER BAT. For example, Wisniewski (1997) identifies a property-based interpretation of conceptual combinations whereby properties of the modifying concept BOXER apply in some way to the head concept BAT. One way to view this kind of interpretation is that a sense of BOXER is first established and then influences the interpretation of the concept BAT. In other words, the interpretation of the conceptual combination is formed by processing the combination from left to right. In relation to the general system depicted in Figure 6, the preceding situation involves an arrow proceeding from component A to component B, which represents component A signalling information to component B.
We can model Wisniewski’s property-based interpretation by extending the Bell scenario to involve signalling, as specified in the P-program shown in Figure 12.
The signalling from concept to concept in a given measurement context is modelled as the outcome of the B coin being dependent on the outcome of the A coin. Note that signalling does not occur the other way: the probability of interpretation of component A does not change according to outcomes measured in relation to component B. This fact allows a more refined understanding of the hypergraph semantics depicted in Figure 11 and how these semantics relate to an experimental design which now involves signalling.
In the previous section it was established that the spanning edges in the left hand side of this figure prevent signalling from component B to component A. Conversely, the spanning edges in the right hand side of the figure prevent signalling from A to B. Therefore, the hypergraph semantics of the signalling Bell scenario specified by the program in Figure 12 does not include these right hand side spanning hyperedges. The resulting hypergraph semantics is depicted as the contextuality scenario shown in Figure 13, which is the semantics corresponding to the syntax model(design: ‘signal(A>B)’,P1,P2,P3,P4), where A>B expresses the direction of the signalling between the respective components. Definition 3.2 can then be applied to determine whether a probabilistic model exists in relation to this contextuality scenario. If not, the signalling system modelled by the P-program in Figure 12 is deemed to be “strongly contextual”.
5 Discussion
The aim of this article is to take an algorithmic approach to the development of probabilistic models by providing a high level language that makes it convenient for the modeller to express models of a phenomenon that may be contextual. Borrowing from programming language theory, a key feature is the use of syntactic scopes, which permit measurement contexts to be specified that correspond to the experimental conditions under which the phenomenon is being examined. The use of syntactic scopes has two consequences. Firstly, random variables local to a scope will be invisible to those local to other scopes. Secondly, each scope returns a probability distribution as a partial model.
The first consequence relates to scopes preventing the incorrect overloading of random variables. This article has attempted to show that the overloading of variables in probabilistic models relates to contextuality, namely that “contextual situations are those in which seemingly the same random variable changes its identity depending on the conditions under which it is recorded” (Dzhafarov and Kujala, 2014a).
Regarding the second consequence, Abramsky (2015) discovered that the problem of combining partial models into a single model has an equivalent expression in relational database theory, where the problem is to determine whether a universal relation exists for a set of relations such that these relations can be recovered from the universal relation via projection. Contextuality occurs when it is not possible to construct the universal relation. The question to be addressed, then, is how to determine when it is possible, and when it is not.
This article proposes hypergraphs as an underlying semantic structure to address this question. Firstly, an approach developed in relational database theory is used to determine whether the schema of the partial models is acyclic. If so, the hypergraph is exploited to form a join tree which can compute a single model such that the partial models can be retrieved by appropriately marginalizing this model.
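One standard acyclicity test of this kind is the GYO (Graham–Yu–Özsoyoğlu) reduction: repeatedly delete attributes that occur in only one schema edge and edges contained in another edge; the schema is acyclic exactly when everything is eventually deleted. A minimal sketch (any equivalent acyclicity test would serve):

```python
from collections import Counter

def is_acyclic(schema):
    """GYO reduction on a database schema given as a list of attribute sets."""
    edges = [set(e) for e in schema]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete attributes occurring in exactly one edge ("ears").
        counts = Counter(a for e in edges for a in e)
        for e in edges:
            lone = {a for a in e if counts[a] == 1}
            if lone:
                e -= lone
                changed = True
        # Rule 2: delete empty edges and edges contained in another edge
        # (keeping one copy of exact duplicates).
        kept = []
        for i, e in enumerate(edges):
            dominated = not e or any(
                e < f or (e == f and i < j)
                for j, f in enumerate(edges) if j != i)
            if dominated:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges

# A chain schema reduces away completely ...
# is_acyclic([{"A", "B"}, {"B", "C"}, {"C", "D"}]) -> True
# ... whereas a cyclic Bell-style schema (cf. Figure 9) gets stuck:
# is_acyclic([{"A1", "B1"}, {"B1", "A2"}, {"A2", "B2"}, {"B2", "A1"}]) -> False
```

When the reduction succeeds, the order in which edges are eliminated also yields a join tree for computing the single global model mentioned above.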
When the schema is cyclic, hypergraphs called “contextuality scenarios” are formed. The general picture is the following: experimental designs are syntactically specified in addition to associated measurement contexts appropriate to the design. Each component can be translated into a contextuality scenario. Multipartite composition of these contextuality scenarios yields a single contextuality scenario corresponding to the experimental design. In this article, we illustrated two Bell scenario designs based on whether the “no signalling” condition holds. If this condition does hold, then the Foulis–Randall (FR) product can be used to define the composition. However, when signalling is permitted, means other than the FR product need to be developed. This is an open question which is particularly relevant to psychology experiments, where signalling appears to be pervasive. In this regard, recent work on signalling in Bell scenarios may provide a useful basis for further development (Brask and Chaves, 2017). For example, Brask and Chaves (2017) study relaxations of the “no signalling” condition where different forms of communication are allowed. The P-program depicted in Figure 12 modelled one such condition, in which outcomes can be unidirectionally communicated between the two components of the assumed model. Investigating contextuality in the presence of signalling is an important issue for cognitive science and related areas. Perhaps surprisingly, it is an issue that has received scant attention to date (Dzhafarov and Kujala, 2014b). When signalling is not present, it would be interesting to investigate how variations of multipartite composition of contextuality scenarios being investigated in physics may inspire new experimental designs outside of physics (Sainz and Wolfe, 2017).
Once a contextuality scenario has been constructed for the P-program, “strong contextuality” occurs when it is not possible to construct a probabilistic model on the underlying hypergraph. If a probabilistic model on the hypergraph is possible, then the random variables are independent of the measurement contexts.
The motivation for demarcating the problem into acyclic vs. cyclic cases is related to efficiency: the number of variables at the schema level is likely to be much smaller than the number of underlying events, especially when one considers larger scale experiments involving numerous random variables. This is notwithstanding the fact that determining whether there is a global model turns out to be tractable. Stated more formally, given a contextuality scenario, a linear program can determine whether strong contextuality holds. (See Proposition 8.1.1 in (Acin et al., 2015).) This theoretical result echoes linear programming solutions which have been found for contextual semantics based on sheaf theory (Abramsky et al., 2016) and selective influence (Dzhafarov and Kujala, 2012). One of the advantages of the hypergraph semantics of contextuality scenarios is that they are general enough to allow contextuality to be investigated in a variety of experimental settings. In the next section we show how contextuality could be investigated in an information fusion setting.
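To make the linear-programming characterization concrete, the feasibility test can be sketched as follows (the use of scipy here is our own assumption; any LP solver would serve):

```python
import numpy as np
from scipy.optimize import linprog

def has_probabilistic_model(edges):
    """A contextuality scenario admits a probabilistic model iff some
    p : V -> [0, 1] satisfies sum_{v in e} p(v) = 1 for every hyperedge e.
    Infeasibility of this LP means the scenario is strongly contextual."""
    vertices = sorted({v for e in edges for v in e})
    col = {v: i for i, v in enumerate(vertices)}
    A_eq = np.zeros((len(edges), len(vertices)))
    for row, e in enumerate(edges):
        for v in e:
            A_eq[row, col[v]] = 1.0  # each edge's probabilities must sum to 1
    res = linprog(c=np.zeros(len(vertices)), A_eq=A_eq,
                  b_eq=np.ones(len(edges)), bounds=[(0, 1)] * len(vertices))
    return res.success

# The triangle scenario admits the model p(v) = 1/2 for every vertex ...
# has_probabilistic_model([{"a", "b"}, {"b", "c"}, {"a", "c"}]) -> True
# ... but no assignment can normalize all three of these edges at once:
# has_probabilistic_model([{"a"}, {"b"}, {"a", "b"}]) -> False
```

A zero objective vector makes this a pure feasibility check, which is all that Proposition 8.1.1 requires.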
5.1 The use of P-programs for investigating contextuality in information fusion
Information fusion refers to the problem of making a judgement about an entity, situation, or event by combining data from multiple sources which are simultaneously presented to a human subject. For example, one source might be an image and another might be a social media post. Fusion allows a much better judgment to be made because it is based on multiple sources of evidence. However, the sources may involve uncertainty, for example, the human subject may not trust the source of a social media post, or the image may appear manipulated. As a consequence, a decision of trust may be contextual because a random variable modelling trust may have different functional identities depending on the source stimulus.
Let us now sketch how a P-program could be developed to investigate whether trust is contextual. Firstly, imagine that empirical data is collected from human subjects in an experiment. For example, subjects could be simultaneously presented with two visual stimuli as is shown in Figure 14. The left hand stimulus purports to be an image of a typhoon hitting the Philippines sourced from an obscure Asian media site. The right hand stimulus is sourced from Twitter where the language is unfamiliar (Japanese), but the graphic seems to depict a typhoon tracking towards the Philippines. The subject must decide if they trust whether the stimuli depict the same event.
Random variables affecting the decision of trust could be defined as follows:

Variables relating to the image: one probing correspondence (e.g., “Do you trust that the image corresponds to the situation described by the text?”), and one probing authenticity (e.g., “Does the image look fake or manipulated in any way?”).

Variables relating to the tweet: Credibility (e.g., “Do you trust the source of the tweet to be credible?”), and one probing correspondence (e.g., “Do you trust that the tweet corresponds to the situation depicted in the image?”).
These four variables allow for an experiment in which one variable of each stimulus is measured, thus implying four measurement contexts based on pairs of variables, one drawn from each stimulus. A between-subjects design allows experimental data to be collected in each experimental context, meaning a human subject is exposed to only one measurement context in order to counter learning effects. The corresponding P-program would therefore include four scopes corresponding to these measurement contexts, and each scope would return the corresponding partial model based on the data collected in that measurement context. These four partial models correspond to the four pairwise distributions.
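As an illustrative sketch, each scope's partial model could be estimated from the between-subjects data along the following lines (the question pairing and the yes/no outcome coding are hypothetical):

```python
from collections import Counter

def partial_model(responses):
    """Empirical pairwise distribution for one measurement context,
    given the (left-stimulus answer, right-stimulus answer) pair
    recorded for each subject assigned to that context."""
    n = len(responses)
    return {pair: count / n for pair, count in Counter(responses).items()}

# Hypothetical responses for one of the four contexts, e.g. pairing the
# image-trust question with the tweet-credibility question.
context_data = [("yes", "yes"), ("yes", "no"), ("yes", "yes"), ("no", "yes")]
model = partial_model(context_data)
# model[("yes", "yes")] == 0.5 and the probabilities sum to 1
```

The four dictionaries so obtained are exactly the partial models whose compatibility the contextuality analysis then interrogates.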
As this program involves a cyclic schema, the situation is similar to that depicted in Figure 9. Therefore, measurement contexts would be defined around observations of individual variables using a signalling Bell scenario design. As subjects are processing both stimuli simultaneously, this raises the possibility of signalling between the left stimulus (component A) and the right stimulus (component B).
6 Summary and Future Directions
The aim of this article is to contribute to the foundations of a probabilistic programming language that allows exploration of contextuality in a wide range of applications relevant to cognitive science and artificial intelligence. The core idea is that probabilistic models are specified as a program with associated semantics which are sensitive to contextuality. The programs feature specific syntactic scopes to specify experimental conditions under which a phenomenon is being examined. Random variables are declared local to a scope, and hence are not visible in other scopes. In this way, random variables can be safely overloaded, which is convenient for developing models whilst the programming semantics, not the modeller, keeps track of whether the functional identities of the random variables are being preserved.
Hypergraphs were proposed as an underlying structure to specify contextually sensitive program semantics. Firstly, a hypergraph approach developed in relational database theory was used to determine whether the schema of the partial probabilistic models is acyclic. If so, the hypergraph is exploited to form a join tree which can compute a single model such that the partial models can be retrieved by appropriately marginalizing this model. In this case, the phenomenon is noncontextual. When the schema is cyclic, the phenomenon may or may not be contextual. For the cyclic case a hypergraph called a “contextuality scenario” is formed. “Strong contextuality” occurs when it is not possible to construct a probabilistic model on the hypergraph. If it is possible, then each such model is a candidate global model and the phenomenon is noncontextual. Further research could be directed at refining the semantics to admit different types of contextuality (Abramsky and Brandenburger, 2011; Acin et al., 2015), as well as experimental designs based on different variations of signalling (Brask and Chaves, 2017; Curchod et al., 2017).
Just as higher level programming languages, such as functional programming languages, provided a convenient means for harnessing the power of the lambda calculus, P-programs aim to advance the understanding of contextuality by providing a convenient means for harnessing the power of contextual semantics. As P-programs are algorithmic, future work could provide syntax to specify the temporal flow of actions using control structures akin to those used in high level programming languages. This feature would allow measurements with some causal structure, which is an important topic in cognitive psychology where Bayesian models are often used.
Finally, the overarching aim of this article is to raise awareness of contextuality beyond quantum physics and to contribute formal methods to detect its presence in the form of a convenient programming language.
Acknowledgements
Thanks to the three anonymous reviewers, Bevan Koopman and Ana Belen Sainz for their constructive input and suggestions. This research was supported by the Asian Office of Aerospace Research and Development (AOARD) grant FA2386-17-1-4016.
References
 Abramsky (2015) Abramsky, S. (2015). Contextual semantics: From quantum mechanics to logic, databases, constraints, and complexity. Bulletin of EATCS, 2(113).
 Abramsky et al. (2016) Abramsky, S., Barbosa, R., and Mansfield, S. (2016). Quantifying contextuality via linear programming. In Proceedings of the 13th International Conference on Quantum Physics and Logic (QPL 2016).
 Abramsky and Brandenburger (2011) Abramsky, S. and Brandenburger, A. (2011). The sheaf-theoretic structure of nonlocality and contextuality. New Journal of Physics, 13(113036).
 Acacio De Barros and Oas (2015) Acacio De Barros, J. and Oas, G. (2015). Some examples of contextuality in physics: Implications to quantum cognition. arXiv:1512.00033.
 Acin et al. (2015) Acin, A., Fritz, T., Leverrier, A., and Sainz, A. (2015). A combinatorial approach to nonlocality and contextuality. Communications in Mathematical Physics, 334:533–628.
 Aerts et al. (2014) Aerts, D., Gabora, L., and Sozzo, S. (2014). Concept combination, entangled measurements, and prototype theory. Topics in Cognitive Science, 6:129–137.
 Aerts and Sozzo (2014) Aerts, D. and Sozzo, S. (2014). Quantum entanglement in concept combinations. International Journal of Theoretical Physics, 53:3587–3603.
 Asano et al. (2014) Asano, M., Hashimoto, T., Khrennikov, A., Ohya, M., and Tanaka, Y. (2014). Violation of contextual generalization of the Leggett–Garg inequality for recognition of ambiguous figures. Physica Scripta, 2014(T163):014006.
 Atmanspacher and Filk (2010) Atmanspacher, H. and Filk, T. (2010). A proposed test of temporal nonlocality in bistable perception. Journal of Mathematical Psychology, 54:314–321.
 Brask and Chaves (2017) Brask, J. B. and Chaves, R. (2017). Bell scenarios with communication. Journal of Physics A: Mathematical and Theoretical, 50(9):094001.
 Bruza and Abramsky (2016) Bruza, P. and Abramsky, S. (2016). Probabilistic programs: Contextuality and relational database theory. In Acacio De Barros, J., Coecke, B., and Pothos, E., editors, Quantum Interaction: 10th International Conference (QI’2016), Lecture Notes in Computer Science. Springer (In Press).
 Bruza et al. (2015) Bruza, P., Kitto, K., Ramm, B., and Sitbon, L. (2015). A probabilistic framework for analysing the compositionality of conceptual combinations. Journal of Mathematical Psychology, 67:26–38.
 Bruza (2016) Bruza, P. D. (2016). Syntax and operational semantics of a probabilistic programming language with scopes. Journal of Mathematical Psychology, doi:10.1016/j.jmp.2016.06.006 (In press).
 Clauser and Horne (1974) Clauser, J. and Horne, M. (1974). Experimental consequences of objective local theories. Physical Review D, 10(2):526–535.
 Curchod et al. (2017) Curchod, F., Johansson, M., Augusiak, R., Hoban, M., Wittek, P., and Acin, A. (2017). Unbounded randomness certification using sequences of measurements. arXiv:1510.03394v2.
 Dzhafarov and Kujala (2014a) Dzhafarov, E. and Kujala, J. (2014a). Contextuality is about identity of random variables. Physica Scripta, T163(014009).
 Dzhafarov and Kujala (2014b) Dzhafarov, E. and Kujala, J. (2014b). Embedding quantum into classical: Contextualization vs conditionalization. PLoS ONE, 9(3):e92818.
 Dzhafarov and Kujala (2015) Dzhafarov, E. and Kujala, J. (2015). Probabilistic contextuality in EPR/Bohm-type systems with signaling allowed. In Dzhafarov, E., editor, Contextuality from Quantum Physics to Psychology, chapter 12, pages 287–308. World Scientific Press.
 Dzhafarov et al. (2015) Dzhafarov, E., Zhang, R., and Kujala, J. (2015). Is there contextuality in behavioral and social systems? Philosophical Transactions of the Royal Society A, 374(20150099).
 Dzhafarov and Kujala (2012) Dzhafarov, E. and Kujala, J. (2012). Selectivity in probabilistic causality: Where psychology runs into quantum physics. Journal of Mathematical Psychology, 56(1):54–63.
 Gabora and Aerts (2002) Gabora, L. and Aerts, D. (2002). Contextualizing concepts using a mathematical generalization of the quantum formalism. Journal of Experimental and Theoretical Artificial Intelligence, 14:327–358.
 Goodman and Stuhlmüller (2014) Goodman, N. D. and Stuhlmüller, A. (2014). The Design and Implementation of Probabilistic Programming Languages. http://dippl.org. Accessed: 2017-9-14.
 Goodman and Tenenbaum (2016) Goodman, N. D. and Tenenbaum, J. B. (2016). Probabilistic Models of Cognition. http://probmods.org/v2. Accessed: 2017-6-5.
 Gordon et al. (2014) Gordon, A., Henzinger, T., Nori, A., and Rajamani, S. (2014). Probabilistic programming. In Proceedings of the on Future of Software Engineering (FOSE 2014), pages 167–181. ACM Press.
 Gronchi and Strambini (2016) Gronchi, G. and Strambini, E. (2016). Quantum cognition and Bell’s inequality: A model for probabilistic judgment bias. Journal of Mathematical Psychology, 78:65–75.
 Gyssens and Paredaens (1984) Gyssens, M. and Paredaens, J. (1984). A decomposition methodology for cyclic databases. In Advances in Database Theory, volume 2, pages 85–122. Springer.
 Henson and Sainz (2015) Henson, J. and Sainz, A. (2015). Macroscopic noncontextuality as a principle for almost-quantum correlations. Physical Review A, 91:042114.
 Howard et al. (2014) Howard, M., Wallman, J., Veitch, V., and Emerson, J. (2014). Contextuality supplies the magic for quantum computation. Nature, 510(7505):351–355.
 Kochen and Specker (1967) Kochen, S. and Specker, E. (1967). The problem of hidden variables in quantum mechanics. Journal of Mathematics and Mechanics, 17(59).

 Liu et al. (2011) Liu, W., Yue, K., and Li, W. (2011). Constructing the Bayesian network structure from dependencies implied in multiple relational schemas. Expert Systems with Applications, 38:7123–7134.
 Pelletier (1994) Pelletier, J. (1994). The principle of semantic compositionality. Topoi, 13:11–24.
 Sainz and Wolfe (2017) Sainz, A. and Wolfe, E. (2017). Multipartite composition of contextuality scenarios. arXiv:1701.05171 [quant-ph].
 Wisniewski (1997) Wisniewski, E. J. (1997). When concepts combine. Psychonomic Bulletin and Review, 4(2):167–183.
 Wong (1997) Wong, S. (1997). An extended relational model for probabilistic reasoning. Journal of Intelligent Information Systems, 9:181–202.
 Wong (2001) Wong, S. (2001). The relational structure of belief networks. Journal of Intelligent Information Systems, 16:117–148.
 Wulf and Shaw (1973) Wulf, W. and Shaw, M. (1973). Global variable considered harmful. SIGPLAN Notices, 8(2):80–86.
 Zhang and Dzhafarov (2017) Zhang, R. and Dzhafarov, E. S. (2017). Testing contextuality in cyclic psychophysical systems of high ranks. In Acacio De Barros, J., Coecke, B., and Pothos, E., editors, Quantum Interaction: 10th International Conference (QI’2016), Lecture Notes in Computer Science. Springer (In Press).