1 Introduction
Relevance in Information Retrieval (IR) is widely accepted to be a cognitive feature, driving all our information interactions. All areas of research within IR thus strive to improve the relevance of documents to a user’s information need (IN). These research areas can be broadly divided into two: system-oriented and user-oriented IR. Whereas the system-oriented viewpoint treats relevance as an objective property of the document and query content, the user-oriented approach views relevance as a cognitive property. Although IR fundamentally involves user interaction and decision-making, the user-oriented approach has been found harder to implement, especially in evaluating the performance of IR systems. This is because of the variability in user judgements of relevance [cool_belkin_2011]. System-oriented IR thus sought to standardise IR evaluation, replacing the user-cognitive notion of relevance with an objective, topical relevance. This led to evaluation methodologies based on the Cranfield and TREC type test collections. The user and all of his/her contexts were removed from the evaluation process.
The recent surge in the availability of online user data has led to the incorporation of more user context in the computation of relevance, e.g. in learning-based ranking algorithms. This context is based on the user’s past interactions with the system, in addition to user attributes like age, interests, etc. and current attributes like location, type of device, etc. The common feature of these various contexts is that they are static: they are determined before the point of the user’s interaction with the IR system. However, the process of IR is interactive and dynamic. In this paper, we focus on another type of context driving user interactions: dynamic context. A dynamic context is one which changes the user’s cognitive state during information interaction.
One well-known example of a dynamic context affecting relevance is the phenomenon of Order Effect [order_Hogarth1992]. Order effects have been investigated and found to exist in IR in the presentation order of documents [Eisenberg1988_order, borlund_order, Huang2004_order, Xu2008_order]. For example, in a recent study reported in [benyou_quantum_interf_Order], two groups of participants were presented with a pair of documents $A$ and $B$ in two different orders. For some such pairs, it was found that the relevance of a document as judged by users differs depending on the order in which it was presented. Although the phenomenon may appear to have an intuitive explanation, it violates one of the fundamental assumptions of classical probability theory, the existence of joint distributions, where, for two random variables $A$ and $B$ representing the relevance of the two documents, $P(A, B) = P(B, A)$, i.e., the order of judging the documents does not matter. Order effects violate this fundamental assumption. Such order effects have also been investigated and reported between the different dimensions of relevance, like Topicality, Understandability, Reliability, etc. [bruza_perceptions_of_document_relevance, Uprety:2018:IOE:3234944.3234972, uprety2019modelling], where different orders of the dimensions considered in judging a document lead to different relevance judgements.
The field of Quantum Cognition [Busemeyer:2012:QMC:2385442] offers a generalised framework to model probabilistic outcomes of human decision-making. It has been successful in modelling and predicting order effects [Trueblood2011_quantum_account_ordereff, Wang2013] and other paradoxical findings where axioms of classical probability theory are violated [Pothos2009_quan_explan_irrational, Busemeyer2011_quantum_expl_prob_errors]. Conceptually, it challenges the notion that cognitive states have predefined values and that a measurement merely records them. Instead, the act of measurement creates a definite state out of an indefinite state and, in doing so, changes the initial state of the cognitive system. In terms of relevance, we cannot pre-assign the relevance of a document for a user. Instead, relevance is defined only at the point of interaction of the user’s cognitive state with the document. Therefore, judging document $A$ first changes the user’s initial state, and the subsequent judgement of the relevance of $B$ is different than when $B$ is judged before $A$. Should the relevance of the documents for a user be a predefined entity, it would not be influenced by the judgement of other documents and a joint distribution over the relevance of the two documents would exist. We also say that these two measurements of relevance are incompatible with each other. That is, it is not possible to jointly consider the relevance of the two documents at the same time.
At the mathematical level, measurements in quantum theory are represented by operators which, in general, do not commute with each other.
In a classical system, all measurements commute with each other. Conversely, however, commutativity of measurements does not necessarily imply that the system is classical. Therefore, the type of measurements becomes imperative in identifying a quantum system. Even then, not all measurements on quantum systems generate data violating classical probability theory. The system needs to be probed in a way which exploits the underlying quantum structure. In physics, this was done by experiments such as the Stern-Gerlach and double-slit experiments [sakurai], which showed the violation of classical probability principles for microscopic particles like electrons and photons. In cognitive science too, several experiments performed by Tversky, Kahneman and colleagues showed such violations in human decision-making under uncertainty [Tversky1974].
Recently, an experiment protocol inspired by the Stern-Gerlach experiment in Physics has provided a new way to probe cognitive systems such that they exhibit a quantum-like structure [fell:dehdashti:bruza:moreira:2019]. By quantum-like structure we mean the representation of a system using the mathematical framework of quantum theory in order to model and predict the experimental data. In [uprety2019modelling], this experiment was performed in an IR scenario involving judgement of relevance with respect to different dimensions. Extending the Stern-Gerlach protocol, in this paper we design a new experiment to show the violation of classical probability theory in multidimensional relevance judgements. We hypothesise that multidimensional relevance judgement has an underlying quantum-like structure which, when subjected to an appropriate measurement design, can exhibit violations of classical probability theory. Specifically, we investigate the violation of a particular axiom of Kolmogorovian probability theory [kolmogorov_book]. Our results show that the experimental data indeed violate classical probability theory, and that a quantum framework provides more accurate predictions of the data. This experiment not only shows the necessity of the quantum framework as an alternative for constructing probabilistic models, but also gives novel insights into user behaviour in IR. This understanding can contribute to the improvement of interactive IR systems, and we also discuss such implications in this paper.
2 Stern-Gerlach Inspired Protocol for Multidimensional Relevance
The basis of the research reported in this paper is the cognitive analogue of the Stern-Gerlach (SG) experiment, originally conducted in [uprety2019modelling]. The SG experiment [sakurai] was an important milestone in quantum physics as it showed the non-classical behaviour of microscopic systems. The key was a particular design of the experiment which exploited the incompatibility between measurements of electron spin states along different axes. An electron has a particular property called spin, having two possible values, up (+) and down (-), which can be measured along different axes. An electron may have one spin state along the x-axis but the other state along the y-axis. So the outcome of a measurement of the spin property of the electron depends upon the axis of measurement. Also, any measurement of spin disturbs the system. If spin is measured along the X axis and then along the Z axis, a third measurement along the X axis may give a different answer than the first one. This phenomenon is called measurement incompatibility, where two measurements cannot be jointly conducted on a system: one measurement disturbs the system and the other would then measure the changed system.
The SG experiment also describes the minimum number of measurements required from a system to construct a complex-valued Hilbert space structure. In particular, we need three incompatible measurements, each with two mutually exclusive outcomes. We can use this arrangement for measuring properties of a quantum system to measure the relevance of a document in IR. For this, we consider three dimensions of relevance: Topicality (T), whether a document is topically relevant to a query; Understandability (U), how easy it is to understand the content of the document; and Reliability (R), how much the document can be relied upon. Each of these three dimensions can be posed as a question requiring a Yes/No type answer (denoted as + and - respectively) for a document. These three dimensions are important factors considered by users in deciding relevance. Besides, they are tied to a single document, unlike diversity or novelty, which are always considered in comparison with other documents. Certain dimensions like Interest, Habit, etc. are difficult to ascertain via crowdsourcing. As reported in [bruza_perceptions_of_document_relevance], the different relevance dimensions can exhibit incompatibility for certain query-document pairs.
In [uprety2019modelling], three query-document pairs were designed in such a way as to potentially exhibit incompatibility between judgements of relevance with respect to different dimensions. The content of the documents was altered to introduce uncertainty in judging each of the three dimensions. The participants were presented with three questions related to the three relevance dimensions, for each query-document pair, in line with the SG design. Figure 1 shows the three questions asked to two different groups in different orders. More details about this design can be found in [uprety2019modelling] and [fell:dehdashti:bruza:moreira:2019]. This setup enables one to construct a complex-valued Hilbert space, which models the quantum-like structure of the user’s cognitive state during information interaction.
2.1 Constructing Complex-Valued Hilbert Space
The first step in building a quantum probabilistic model is to construct a representation for the user’s cognitive state. In the quantum framework, a complex-valued Hilbert space is used to represent a quantum system, and the state of the system is represented as a vector in this Hilbert space.
Following the convention used in Quantum Physics, we represent any complex-valued vector in a finite-dimensional Hilbert space as a ket vector $|A\rangle$ and its conjugate transpose as a bra vector $\langle A|$. The norm of this vector is the square root of its inner product with itself, $\sqrt{\langle A|A\rangle}$. For two such vectors $|A\rangle$ and $|B\rangle$, their projection onto each other is given as the squared magnitude of their inner product, $|\langle A|B\rangle|^2$. Each vector is written as a linear combination of the vectors of the basis in which it is represented. For the purpose of representing the cognitive state of a person judging a document as topically relevant or topically irrelevant, we consider a basis formed by two orthogonal vectors $|T+\rangle$ and $|T-\rangle$ respectively. Before a user considers a judgement of topicality, the cognitive state is indefinite with respect to considering the document as topically relevant or irrelevant. Both potentialities exist. We say that the cognitive state collapses to either $|T+\rangle$ or $|T-\rangle$ after the judgement. Before the judgement, we can represent the indefinite cognitive state in terms of the probabilities of its potential responses. This is represented as a linear combination of the two basis states, each weighted by a real or complex coefficient (called a probability amplitude), such that the squared magnitude of the probability amplitude gives the probability of collapsing to the respective state. The initial state is thus written as:
(1) $|S\rangle = t\,|T+\rangle + \sqrt{1-t^2}\,|T-\rangle$
In the SG-inspired experiment design, we ask the user sequential questions about the judgement of Topicality (T), Understandability (U) and Reliability (R) in the order TUR or TRU, as shown in Figure 1. Therefore we represent the cognitive state w.r.t. Understandability and Reliability in terms of Topicality:
(2) $|U+\rangle = u\,|T+\rangle + \sqrt{1-u^2}\,|T-\rangle, \quad |U-\rangle = \sqrt{1-u^2}\,|T+\rangle - u\,|T-\rangle$
$|U-\rangle$ is constructed using the fact that $|U+\rangle$ and $|U-\rangle$ are orthogonal. $u^2$ is the probability that users judge a document Understandable, given that they have judged it as Topically relevant.
Refer to [uprety2019modelling] (Section 3) or [sakurai] (Chapter 1) for the necessity of using a complex-valued probability amplitude in the representation of Reliability in terms of Topicality:
(3) $|R+\rangle = r\,|T+\rangle + \sqrt{1-r^2}\,e^{i\theta_r}\,|T-\rangle, \quad |R-\rangle = \sqrt{1-r^2}\,e^{-i\theta_r}\,|T+\rangle - r\,|T-\rangle$
The parameters ($u$, $r$ and $\theta_r$) comprise the construction of the Hilbert space for the user’s cognitive state w.r.t. the interaction between the three dimensions. The parameter $t$ defines the initial state. The experiment design of Figure 1 was carried out in [uprety2019modelling] for three queries. The results are listed in Figure 2.
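The construction above can be checked numerically. The following is a minimal sketch, assuming a parametrisation with real amplitudes `t`, `u`, `r` and a phase `theta_r` on the Reliability state; the function names and the parameter values are hypothetical and chosen only for illustration, they are not the fitted values from [uprety2019modelling].

```python
import cmath
import math


def make_states(t, u, r, theta_r):
    """Build the state vectors as (amplitude on |T+>, amplitude on |T->) pairs.

    The parametrisation is an assumption of this sketch.
    """
    phase = cmath.exp(1j * theta_r)
    return {
        "T+": (1.0, 0.0),
        "T-": (0.0, 1.0),
        "S": (t, math.sqrt(1 - t**2)),
        "U+": (u, math.sqrt(1 - u**2)),
        "U-": (math.sqrt(1 - u**2), -u),
        "R+": (r, math.sqrt(1 - r**2) * phase),
        "R-": (math.sqrt(1 - r**2) * phase.conjugate(), -r),
    }


def inner(a, b):
    """Inner product <a|b>: conjugate the components of a, multiply, and sum."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


# Hypothetical parameter values, chosen only for illustration.
states = make_states(t=0.8, u=0.6, r=0.7, theta_r=math.pi / 3)

u_overlap = abs(inner(states["U+"], states["U-"]))  # close to 0 if orthogonal
r_overlap = abs(inner(states["R+"], states["R-"]))  # close to 0 if orthogonal
p_u_given_t = abs(inner(states["U+"], states["T+"])) ** 2  # equals u**2
print(u_overlap, r_overlap, p_u_given_t)
```

The two overlaps confirm that each Yes/No pair forms an orthogonal basis, and the last value shows that the squared amplitude on the topically relevant basis state plays the role of the conditional probability of a positive Understandability judgement.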
3 Formulation of Research Hypotheses
Using the complex-valued Hilbert space of multidimensional relevance, this paper aims to design an extended experiment to test the following research hypotheses: (1) Fundamental axioms of classical Kolmogorov probability are violated in a multidimensional relevance judgement scenario; (2) Probabilities obtained from the experiment can be better predicted with quantum than classical (Bayesian) probabilistic models. In the following two subsections, we mathematically formulate these hypotheses.
3.1 Violation of Kolmogorov probability and Quantum Correction
Quantum probabilities are a generalisation of Kolmogorov probabilities. In fact, Kolmogorov probabilities are related to set theory, which formalises Boolean logic. The following proposition gives one of their fundamental properties [kolmogorov_book]:
(4) $P(A \vee B) - P(A) - P(B) + P(A \wedge B) = \delta = 0$
where $A$, $B$ are subsets of the set of all alternatives $\Omega$, and $P(A)$, $P(B)$ are the corresponding probabilities.
The axiom will be violated if the value of $\delta$ is different from zero.
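As an illustration of how this axiom can be tested on data, the sketch below computes a violation term, taken here as $\delta = P(A \vee B) - P(A) - P(B) + P(A \wedge B)$ (the sign convention is an assumption). All numbers are hypothetical and serve only to show that $\delta$ vanishes for any single classical joint distribution but need not vanish when the four probabilities are estimated from separate groups of participants.

```python
def kolmogorov_delta(p_or, p_a, p_b, p_and):
    """Return P(A or B) - P(A) - P(B) + P(A and B); zero under classical probability."""
    return p_or - p_a - p_b + p_and


# A classical joint distribution over two binary events (hypothetical numbers).
joint = {
    (True, True): 0.30,
    (True, False): 0.25,
    (False, True): 0.20,
    (False, False): 0.25,
}

p_a = sum(p for (a, _), p in joint.items() if a)
p_b = sum(p for (_, b), p in joint.items() if b)
p_and = joint[(True, True)]
p_or = 1.0 - joint[(False, False)]
classical_delta = kolmogorov_delta(p_or, p_a, p_b, p_and)

# Probabilities measured on separate groups need not come from one joint
# distribution, so the same expression can be non-zero (hypothetical numbers).
survey_delta = kolmogorov_delta(p_or=0.80, p_a=0.60, p_b=0.55, p_and=0.45)

print(round(classical_delta, 12), round(survey_delta, 12))
```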
In quantum probability theory, probabilities are computed using projection operators for the events of interest, which here correspond to relevance or non-relevance with respect to Understandability and Reliability. The analogue of relation (4) in quantum mechanics is given by the following definition [vourdas2019probabilistic]:
(5) 
where the projection operators are given by:
(6) 
It is possible to prove that this quantum correction term is proportional to the commutator of the projection operators of the two events [vourdas2019probabilistic] and can thus be obtained as:
(7) 
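where $[A, B] = AB - BA$ stands for the commutator of two operators $A$ and $B$. The projection operator for Understandability is equal to the outer product of the state $|U+\rangle$ with itself, where the vector $|U+\rangle$ is computed using Equation 2. In order to construct the vector, first the Topicality basis is represented as the standard basis and hence the orthogonal vectors $|T+\rangle$ and $|T-\rangle$ are given as:
(8) $|T+\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |T-\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$
Thus, writing $u$ and $r$ for the real amplitudes and $\theta_r$ for the phase that parameterise Equations (2) and (3), the vectors $|U+\rangle$ and $|R+\rangle$ are given as:
(9) $|U+\rangle = \begin{pmatrix} u \\ \sqrt{1-u^2} \end{pmatrix}, \quad |R+\rangle = \begin{pmatrix} r \\ \sqrt{1-r^2}\,e^{i\theta_r} \end{pmatrix}$
Then the projector $|U+\rangle\langle U+|$ is given as:
$\begin{pmatrix} u^2 & u\sqrt{1-u^2} \\ u\sqrt{1-u^2} & 1-u^2 \end{pmatrix}$
Similarly, the projector $|R+\rangle\langle R+|$ is:
$\begin{pmatrix} r^2 & r\sqrt{1-r^2}\,e^{-i\theta_r} \\ r\sqrt{1-r^2}\,e^{i\theta_r} & 1-r^2 \end{pmatrix}$
From the parameter values obtained in [uprety2019modelling], these projection operators can be constructed. The quantum analogue of $\delta$ can then be calculated from Equation (7). The value of $\delta$ obtained from our experiment is compared to that predicted by the classical (always zero) and quantum probability frameworks.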
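The link between the correction term and the commutator can be illustrated numerically. The sketch below makes several assumptions that are not taken from the text: the operator analogue of relation (4) is obtained by replacing each probability with the corresponding projector; in a two-dimensional space the disjunction of two distinct one-dimensional subspaces is the whole space (identity operator) and their conjunction is the zero subspace; and the parameter values are hypothetical. Under these assumptions the correction operator is `d = I - pi_u - pi_r`, and the code checks the identity `[pi_u, pi_r] = d (pi_u - pi_r)`.

```python
import cmath
import math


def outer(v):
    """Projector |v><v| for a two-component unit vector v."""
    return [[v[i] * complex(v[j]).conjugate() for j in range(2)] for i in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sub(a, b):
    """Entry-wise difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    return sub(matmul(a, b), matmul(b, a))


# Hypothetical parameter values, chosen only for illustration.
u, r, theta_r = 0.6, 0.7, math.pi / 3
u_plus = (u, math.sqrt(1 - u**2))
r_plus = (r, math.sqrt(1 - r**2) * cmath.exp(1j * theta_r))

pi_u = outer(u_plus)
pi_r = outer(r_plus)
identity = [[1, 0], [0, 1]]

# Assumed two-dimensional form of the correction operator (see lead-in).
d = sub(sub(identity, pi_u), pi_r)

lhs = commutator(pi_u, pi_r)
rhs = matmul(d, sub(pi_u, pi_r))
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
comm_size = max(abs(lhs[i][j]) for i in range(2) for j in range(2))
print(max_gap, comm_size)
```

A non-zero `comm_size` shows that the two projectors do not commute for these values, and a vanishing `max_gap` shows that the correction operator is tied to the commutator, so it disappears exactly when the two projectors commute.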
3.2 Quantum Probabilities vs Classical Probabilities
The violation of Kolmogorovian probability axiom by a given system would likely lead to inaccurate predictions on the system using Kolmogorovian probability. This subsection formulates computation of conditional probabilities of relevance judgement along one dimension given another, using classical vs. quantum frameworks. They will be compared for our experimental data in Section 5.
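For an initial state $|S\rangle$ of the system, the probability of an event $A$ in the quantum framework is given by $|\langle A|S\rangle|^2$, i.e., the squared magnitude of the projection of vector $|S\rangle$ onto vector $|A\rangle$. The probability for the sequence of $B$ following $A$ is given as [Busemeyer:2012:QMC:2385442]:
(10) $P(A, B) = |\langle A|S\rangle|^2\,|\langle B|A\rangle|^2$
The quantum framework does not define a joint probability of events $A$ and $B$, as in general $P(A, B) \neq P(B, A)$. As we can see, $P(B, A) = |\langle B|S\rangle|^2\,|\langle A|B\rangle|^2$, which for $|\langle B|S\rangle|^2 \neq |\langle A|S\rangle|^2$ is not equal to $P(A, B)$ in Equation 10. The conditional probabilities are given according to Lüders’ rule [Busemeyer:2012:QMC:2385442, khrennikov2019basics] as: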
(11)  
Note that a subscript is added to distinguish it from the classical conditional probability. The resulting conditional probability is then given as (see [uprety2019modelling] Section 4.2 for derivation):
(12)  
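The order dependence of sequential quantum probabilities can be made concrete with a short sketch. It assumes events represented by unit vectors (rank-one projectors), so that the probability of a sequence is the product of squared overlaps, and it reuses a hypothetical parametrisation with real amplitudes `t`, `u`, `r` and a phase `theta_r`; none of the values are taken from the experiment.

```python
import cmath
import math


def inner(a, b):
    """Inner product <a|b> of two 2-component vectors."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def prob(a, b):
    """Squared magnitude of the overlap between vectors a and b."""
    return abs(inner(a, b)) ** 2


# Hypothetical parameter values, chosen only for illustration.
t, u, r, theta_r = 0.8, 0.6, 0.7, math.pi / 3
s = (t, math.sqrt(1 - t**2))
u_plus = (u, math.sqrt(1 - u**2))
r_plus = (r, math.sqrt(1 - r**2) * cmath.exp(1j * theta_r))

# Probability of "yes" on U followed by "yes" on R, and of the reverse order.
p_u_then_r = prob(u_plus, s) * prob(r_plus, u_plus)
p_r_then_u = prob(r_plus, s) * prob(u_plus, r_plus)

# Conditional probability of R after U: the state has collapsed onto |U+>,
# so only the overlap between |R+> and |U+> remains.
cond_r_given_u = p_u_then_r / prob(u_plus, s)

print(p_u_then_r, p_r_then_u, cond_r_given_u)
```

Because the transition term is the same in both orders, the two sequence probabilities differ only through the overlap of each event vector with the initial state, which is why no single joint probability can be assigned to the pair.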
In contrast, classical probability theory has the basic assumption of commutativity of two events. Therefore the joint probability distribution always exists, which is the basis of calculating conditional probabilities in Bayes’ rule. Consequently, for events $T$, $U$ and $R$ we have:
(13) $P(T, U, R) = P(T, R, U)$
which can be written in terms of conditional probabilities as:
(14) $P(T)\,P(U|T)\,P(R|U,T) = P(T)\,P(R|T)\,P(U|R,T)$
This enables calculation of conditional probabilities using the Bayes rule:
(15) $P(R|U,T) = \frac{P(R|T)\,P(U|R,T)}{P(U|T)}$
Similarly, the other conditional probabilities can be obtained. Again, note that the probabilities in Equations (15) and (12) are different because the classical framework assumes commutativity, and hence the existence of a joint probability distribution, whereas the quantum framework does not.
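For comparison with the quantum calculation, the classical route can be sketched as follows. The function applies Bayes’ rule in the form $P(R|U,T) = P(R|T)\,P(U|R,T) / P(U|T)$, which is assumed here to be the relevant instance; the joint table is hypothetical and is used only to confirm that the rule agrees with direct conditioning when a joint distribution exists.

```python
def bayes_reliability(p_r_given_t, p_u_given_r_t, p_u_given_t):
    """Return P(R | U, T) = P(R | T) * P(U | R, T) / P(U | T)."""
    if p_u_given_t == 0:
        raise ValueError("P(U | T) must be non-zero")
    return p_r_given_t * p_u_given_r_t / p_u_given_t


# Hypothetical joint distribution of (U, R) among topically relevant judgements.
joint_given_t = {
    (True, True): 0.30,
    (True, False): 0.10,
    (False, True): 0.20,
    (False, False): 0.40,
}

p_u_given_t = sum(p for (u, _), p in joint_given_t.items() if u)
p_r_given_t = sum(p for (_, r), p in joint_given_t.items() if r)
p_u_given_r_t = joint_given_t[(True, True)] / p_r_given_t

posterior = bayes_reliability(p_r_given_t, p_u_given_r_t, p_u_given_t)
direct = joint_given_t[(True, True)] / p_u_given_t
print(posterior, direct)
```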
4 Experiment
4.1 Methodology
The main aim of this experiment is to investigate the violation of Equation 4. We already have the single-question probabilities from the experiment in [uprety2019modelling] and we need to obtain the probabilities of conjunction and disjunction. We do so by posing questions about Understandability and Reliability at the same time, as a pair, rather than sequentially. Each of the dimensions has two outcomes (e.g. Reliable or Not Reliable) and therefore we construct four pairs of statements, as listed in Figure 4. For the disjunction measurement, we ask the participants to select whether they agree with at least one of the two statements or with neither of them (corresponding to a Boolean Or condition). For a conjunction measurement on each of the four statement pairs, we ask the participants whether they agree with both of the statements or not. Figures (a) and (b) show the designs for the disjunction and conjunction questions for a query-document pair. We now have a total of eight such questions and we follow a between-subjects design such that a participant is shown only one of these eight questions, chosen at random. Note that we are able to use the probabilities from the experiment in [uprety2019modelling] because our experiment is a between-subjects design. The same participant is not asked all the questions, to avoid memory bias. The design is summarised in the following steps for each of the three query-document pairs:

1. The participants are shown the information need, query and document snippet.

2. Next, they are asked a Yes/No question about the Topicality of the document. This is to prepare the cognitive state of all participants by projecting their initial/background state onto the Topicality subspace of the underlying Hilbert space constructed in the previous experiment in [uprety2019modelling].

3. Lastly, they are randomly shown one of the eight possible conjunction or disjunction questions and asked to choose the appropriate answer.
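The random allocation in the last step can be sketched as below. This is only an illustration of a between-subjects assignment: the condition labels, the function name and the fixed seed are hypothetical, and the study itself was run on a survey platform rather than with this code.

```python
import random
from itertools import product

# Four statement pairs (each outcome of U with each outcome of R) combined
# with two question types give the eight conditions.
CONDITIONS = [
    (kind, u_outcome, r_outcome)
    for kind in ("disjunction", "conjunction")
    for u_outcome, r_outcome in product(("U+", "U-"), ("R+", "R-"))
]


def assign_conditions(participant_ids, seed=0):
    """Draw one of the eight conditions uniformly at random for each participant."""
    rng = random.Random(seed)
    return {pid: rng.choice(CONDITIONS) for pid in participant_ids}


assignment = assign_conditions(range(335))
group_sizes = {c: 0 for c in CONDITIONS}
for condition in assignment.values():
    group_sizes[condition] += 1
print(group_sizes)
```

Simple random choice gives groups of unequal size; a design that needs balanced groups could instead shuffle a repeated list of the eight conditions.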
4.2 Participants and Material
We recruited 335 participants for the experiment using the online crowdsourcing platform Prolific (prolific.ac). The study was designed using the survey platform Qualtrics (qualtrics.com/uk). The participants were paid at a rate of £6.30 per hour. We sought the participants’ consent and complied with the local data protection guidelines. The study was approved by The Open University UK’s Human Research Ethics Committee with reference number HREC/3063/Uprety. We use the same set of three query-document pairs for our experiment as used in [uprety2019modelling], as we have reused some of their data. Each participant was shown the three queries (and the documents) and was asked to judge the topicality of the document and to answer one of the eight questions (from which the conjunction and disjunction probabilities are obtained). Thus the participants can be said to be divided into eight groups for a between-subjects design.
5 Results and Discussion
5.1 Violation of Kolmogorov probability axiom
The probabilities of conjunction and disjunction of the Understandability and Reliability questions are reported in Figure 6. In order to compute the violation term $\delta$ of Equation 4, we also need the two probabilities related to the single questions on Understandability and Reliability, apart from the conjunction and disjunction probabilities. These single-question probabilities are obtained from the results in [uprety2019modelling] (listed in Figure 2). Then, we calculate $\delta$. In Figure 6 we see that $\delta$ is different from zero for all three queries, although according to classical probability we expect it to be zero in all cases. Equation (7), based on the projection operators in quantum probability, gives predictions of $\delta$, which are shown in the last column of the table.
The violation of classical probability is a result of the non-commutative structure of the operators for Understandability and Reliability. If these operators commute with each other, the quantum correction term in Equation (7) is zero (the commutator is zero). In fact, the probability values obtained may violate some of the other basic axioms of classical/Kolmogorovian probability. For example, for Query 2, the reported values clearly violate another basic relation of classical probability. Also, for this query, a conjunction probability is greater than both of the corresponding single-question probabilities. This type of violation has been termed the conjunction fallacy in the cognitive science literature [Tversky1983_conjunction]. Quantum models have previously been used to explain such violations [Busemeyer2011_quantum_expl_prob_errors], where the fundamental notion of incompatibility in judgements is identified as the potential cause.
5.2 Comparison of Quantum and Classical probability predictions
Figure 7 shows a comparison between quantum and classical probabilities with the experimental data for the first two queries. The data for Query 3 had many probabilities close to 0 (see Figure 2) and hence the sample became too small for a meaningful comparison. The probabilities are calculated for the prediction of the judgement of Reliability given that the participant has judged Understandability and Topicality (positively), using the equations derived in Section 3.2. Bayesian probabilities, in some cases, are significantly different from the experimental data, for both Query 1 and Query 2. Quantum probabilities are consistently closer to the experimental data.
The Bayesian probabilities, as mentioned earlier, are based on the chain rule $P(T, U, R) = P(T)\,P(U|T)\,P(R|U,T)$. The fundamental assumption here is that the variables corresponding to $T$, $U$ and $R$ can be jointly measured. In terms of the judgement process, this implies that a user can jointly consider information regarding the Reliability, Understandability and Topicality of a document with respect to the query. The incompatibility revealed in [uprety2019modelling] and the order effects shown in [bruza_perceptions_of_document_relevance] suggest that this is not always the case. Therefore we see Bayesian predictions deviate from the experimental data. As quantum probability theory based on the Hilbert space model is free from this assumption of compatibility, it provides a promising alternative model that gives predictions closer to the experimental data. In fact, the modelling of incompatibility of different judgement perspectives forms one of the pillars of the Quantum Cognition research framework.
6 Implications for IR
Quantum models can capture richer cognitive interactions, by way of generalising some of the constraints of classical models like commutativity. Here we discuss a few cases where our findings can inform the design of IR systems and algorithms.
The impossibility of jointly modelling Reliability and Understandability (which leads to the Kolmogorovian axiom violations) can be attributed to the fact that humans make decisions in a sequential manner and consideration of one dimension affects the judgement of the next dimension. Therefore, different orders of consideration of dimensions would lead to different final relevance judgements, making the order a factor in the variability of relevance judgements by users. When using an IR system to perform a task or make an important decision, there might be a particular order of dimensions which can lead the user to make an optimal decision. For example, for a health related query, a user might find a document difficult to understand, which may affect his or her judgement of Reliability and hence the overall relevance. However, if another user first judges reliability and finds it highly reliable, the judgement of understandability might be different. The IR system can help users to consider the optimum sequence of dimensions and thus maximise the utility, by providing extra information. For example, if the system can also provide information about the Reliability of the document in terms of a Reliability score or ratings by other users, it can reduce uncertainty in judgement and thus minimise the influence of judgement of other dimensions. Thus, for the given medical document, the low understandability might not affect the perception of Reliability.
Secondly, quantum probabilistic models can replace Bayesian models used in IR algorithms for ranking and evaluation. For example, in [Palotti_multidimensional_evaluation], a multidimensional evaluation metric is proposed where the gain provided by a document is written as a function of the joint probability of relevance with respect to different dimensions. Similar assumptions have also been made in [Palotti_understandability, zuccon_understandability_evaluation]. For documents exhibiting incompatibility between different dimensions, predictions from such a model will be inaccurate. A probabilistic model based on non-commutative operator algebra, accounting for the incompatibility between different dimensions, needs to be considered.
Finally, these results on the violation of classical probability theory call for further user behaviour experiments in IR that exploit the quantum-like structure in human judgements. This would require novel experimental protocols, like those of the Stern-Gerlach and double-slit experiments, to generate data beyond the modelling capacity of classical probability theory. Such experiments in themselves might lead us to new insights into user behaviour in IR and information-based decision-making in general.
7 Conclusion
Extending a quantum-inspired experiment protocol, in this work we begin with the hypothesis that the multidimensional property of relevance has an underlying quantum cognitive structure, which can be revealed as a violation of certain classical (Kolmogorovian) probability axioms. A particular experimental design is reported which can exploit the quantum cognitive structure. The data show a violation of one of the Kolmogorovian probability axioms. We further show that quantum probability theory is a better alternative for modelling multidimensional relevance judgements than its classical counterpart, i.e. the Bayesian model. Finally, we highlight important implications of our research findings for the design of IR algorithms, systems and user experiments.
Acknowledgements
Authors affiliated to the universities in UK, Italy and China are funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 721321, National Key Research and Development Program of China (grant No. 2018YFC0831704) and Natural Science Foundation of China (grant No. U1636203). Authors affiliated to QUT, Australia are supported by the Asian Office of Aerospace Research and Development (AOARD) grant: FA23861714016.