This paper is about quantum analogues of Bayesian reasoning. It works towards one main result, Theorem 3
below, which gives a relation between locally updating a joint state and Bayesian inference. This is a fundamental matter, which requires some preparation in order to set the scene.
We use the term ‘classical’ probability for the ordinary, non-quantum form. We often use the word ‘state’ for a probability distribution, both in the classical and the quantum case. Classical Bayesian probability is based on what is called Bayes’ rule. It describes probabilities of events (evidence) in an updated state. In fact, there are two closely related rules, sometimes called the ‘product rule’ and ‘Bayes’ rule’ (proper). Making this distinction is not so relevant in the classical case, but, as we shall see, it is very relevant in the quantum case.
The paper starts with the back-and-forth constructions between a joint state (distribution) on the one hand, and a channel with an initial state on the other. A channel is a categorical abstraction of a conditional probability. We shall describe this process in terms of pairing and disintegration, following . This process has a logical dimension that relates locally updating a joint state (‘crossover inference’) and Bayesian inference via the associated channel, in a result called the Bayesian Inference Theorem (see Theorem 2 below). This result is already described in , but is repeated here in more concrete form, and illustrated with an example.
The second part of the paper is about analogues in the quantum world. The constructions back-and-forth between a joint state and a channel exist in the literature  and are adapted to the current context. What is new here is the quantum logical analogue of this back-and-forth process. It is shown that updating a state with new evidence, in the form of a predicate, splits in two operations, which we call ‘lower’ and ‘upper’ conditioning. Both forms exist already, but not as counterparts. We show that the earlier mentioned product rule holds for lower conditioning, but Bayes’ rule itself holds for upper conditioning. In classical probability, the ‘lower’ and ‘upper’ versions coincide.
In a next step, the main result of the paper (Theorem 3) shows how ‘lower’ updating a joint state can equivalently be done via Bayesian inference with ‘upper’ conditioning, using the channel that is extracted from the joint state. This puts lower and upper conditioning into perspective and unveils some fundamental aspects of a quantum Bayesian theory.
Finally, there are two separate points worth emphasising. First, several constructions in this paper are illustrated with concrete calculations, via the Python-based tool EfProb; it works both for classical and quantum probability and uses a common language for both. Next, along the way we find a novel result about how disintegration introduces ‘semi’ higher order structure in discrete probability, see Subsection 3.2.
2 Basics of discrete classical probability
This section recalls the basics of (classical, finite) discrete probability and fixes notation. For more information we refer to . A distribution, also called a state, on a set $X$ is a function $\omega\colon X \to [0,1]$ with finite support and with $\sum_{x} \omega(x) = 1$. Such a distribution can also be written as formal convex sum $\sum_i r_i|x_i\rangle$. We write $\mathcal{D}(X)$ for the set of such distributions. The mapping $X \mapsto \mathcal{D}(X)$ is a monad on the category of sets, called the distribution monad.
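A joint state is a state on an $n$-ary product set. A binary state is thus a distribution $\omega \in \mathcal{D}(X_1 \times X_2)$. It has first and second marginals, written here as $\mathsf{M}_1(\omega) \in \mathcal{D}(X_1)$ and $\mathsf{M}_2(\omega) \in \mathcal{D}(X_2)$. These marginal states are defined in the standard way as $\mathsf{M}_1(\omega)(x_1) = \sum_{x_2} \omega(x_1, x_2)$ and $\mathsf{M}_2(\omega)(x_2) = \sum_{x_1} \omega(x_1, x_2)$. In the other direction, two states $\sigma_i \in \mathcal{D}(X_i)$ can be combined to a product state $\sigma_1 \otimes \sigma_2 \in \mathcal{D}(X_1 \times X_2)$ via $(\sigma_1 \otimes \sigma_2)(x_1, x_2) = \sigma_1(x_1) \cdot \sigma_2(x_2)$. Obviously, $\mathsf{M}_i(\sigma_1 \otimes \sigma_2) = \sigma_i$.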
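A channel is a function of the form $c\colon X \to \mathcal{D}(Y)$, that is, a map $X \to Y$ in the Kleisli category $\mathcal{K}\ell(\mathcal{D})$ of the distribution monad $\mathcal{D}$. Such a channel has a Kleisli extension function, or state transformer, $c \gg (-)\colon \mathcal{D}(X) \to \mathcal{D}(Y)$ given by $(c \gg \omega)(y) = \sum_x \omega(x) \cdot c(x)(y)$. For another channel $d\colon Y \to \mathcal{D}(Z)$ there is a composite channel $d \bullet c\colon X \to \mathcal{D}(Z)$ given by $(d \bullet c)(x) = d \gg c(x)$. Channels $c_i\colon X_i \to \mathcal{D}(Y_i)$ can be combined to a product channel $c_1 \otimes c_2\colon X_1 \times X_2 \to \mathcal{D}(Y_1 \times Y_2)$ by $(c_1 \otimes c_2)(x_1, x_2) = c_1(x_1) \otimes c_2(x_2)$.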
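A (fuzzy) predicate on a set $X$ is a function $p\colon X \to [0,1]$. For another predicate $q$ on $X$ there is a (sequential) conjunction predicate $p \mathbin{\&} q$ on $X$ via $(p \mathbin{\&} q)(x) = p(x) \cdot q(x)$. For two predicates on different sets, $p_1$ on $X_1$ and $p_2$ on $X_2$, we can form a parallel conjunction predicate $p_1 \otimes p_2$ on $X_1 \times X_2$, given by $(p_1 \otimes p_2)(x_1, x_2) = p_1(x_1) \cdot p_2(x_2)$. There is always a truth predicate $\mathbf{1}$ given by $\mathbf{1}(x) = 1$.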
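For a state $\omega \in \mathcal{D}(X)$ and a predicate $p$ on the same set $X$, the validity $\omega \models p$ in $[0,1]$ is the expected value $\sum_x \omega(x) \cdot p(x)$. If this validity is non-zero, one can form a conditioned state on $X$, given by $x \mapsto \omega(x) \cdot p(x) / (\omega \models p)$. This updated state is called ‘$\omega$ given $p$’, and is written as $\omega|_p$. It is easy to check that conditioning with truth does nothing: $\omega|_{\mathbf{1}} = \omega$.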
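Assuming the conditionings of the states below exist, we have the ‘product’ rule on the left, and the ‘Bayesian’ rule on the right:
$$\omega|_p \models q \;=\; \frac{\omega \models p \mathbin{\&} q}{\omega \models p} \qquad\qquad \omega|_p \models q \;=\; \frac{(\omega|_q \models p) \cdot (\omega \models q)}{\omega \models p} \tag{1}$$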
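Moreover, successive conditioning can be reduced to a single conditioning, as on the left below, so that conditioning becomes commutative, as on the right:
$$(\omega|_p)|_q \;=\; \omega|_{p \mathbin{\&} q} \qquad\qquad (\omega|_p)|_q \;=\; (\omega|_q)|_p \tag{2}$$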
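Each channel $c\colon X \to \mathcal{D}(Y)$ also gives rise to a predicate transformer function $c \ll (-)\colon [0,1]^Y \to [0,1]^X$, given by $(c \ll q)(x) = \sum_y c(x)(y) \cdot q(y)$. We can now relate validity and state/predicate transformation ($\gg$ and $\ll$) via the following fundamental equality of validities:
$$(c \gg \omega) \models q \;=\; \omega \models (c \ll q) \tag{3}$$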
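As a concrete companion to this section, the following NumPy sketch encodes states and predicates on a finite set as vectors and a channel as a row-stochastic matrix. It checks, on one arbitrarily chosen example, the product rule, Bayes’ rule, the reduction of successive conditioning, and the equality of validities for state and predicate transformation. The function names and all numbers are illustrative assumptions; this is not EfProb code.

```python
import numpy as np

# Assumed encoding: a state or predicate on X = {0, ..., n-1} is a vector of
# length n; a channel X -> D(Y) is a matrix c with c[x, y] = c(x)(y), so that
# every row sums to 1.

def validity(omega, p):
    """Expected value of the predicate p in the state omega."""
    return float(omega @ p)

def condition(omega, p):
    """The state 'omega given p'; only defined for non-zero validity."""
    v = validity(omega, p)
    if v == 0:
        raise ValueError("conditioning requires non-zero validity")
    return omega * p / v

def state_transform(c, omega):
    """Push the state omega on X forward along c to a state on Y."""
    return omega @ c

def pred_transform(c, q):
    """Pull the predicate q on Y back along c to a predicate on X."""
    return c @ q

# Arbitrary example data (hypothetical numbers).
omega = np.array([0.2, 0.3, 0.5])
p = np.array([0.9, 0.1, 0.4])
q = np.array([0.5, 1.0, 0.2])
c = np.array([[0.7, 0.3], [0.1, 0.9], [0.5, 0.5]])
r = np.array([0.6, 0.25])  # a predicate on the codomain of c

posterior_validity = validity(condition(omega, p), q)
via_product_rule = validity(omega, p * q) / validity(omega, p)
via_bayes_rule = (validity(condition(omega, q), p)
                  * validity(omega, q) / validity(omega, p))

print(np.isclose(posterior_validity, via_product_rule))
print(np.isclose(posterior_validity, via_bayes_rule))
print(np.allclose(condition(condition(omega, p), q), condition(omega, p * q)))
print(np.isclose(validity(state_transform(c, omega), r),
                 validity(omega, pred_transform(c, r))))
```

Sequential conjunction is pointwise multiplication in this encoding, which is why the order of classical conditioning does not matter.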
3 Classical Bayesian nets and disintegration
A major rationale for using Bayesian networks [23, 3, 2, 19] is efficiency of representation: a joint probability distribution (state) on multiple sample spaces (domains) quickly becomes very large. Representing the same distribution in graphical form, as a ‘Bayesian network’, is often much more efficient. The directed graph structure is determined by conditional independence. Semantically, the directed arcs are given by channels, that is by stochastic matrices, or more abstractly by Kleisli morphisms for the distribution monad [9, 16, 17].
The essence of this semantical view on Bayesian network theory consists of two parts.
1. The ability to move back-and-forth between a joint state and a graph (network) of channels. The difficult direction is extracting the various channels of the graph from a joint state. This is often called disintegration .
2. Equivalence of inference via joint states and inference via associated channels. In general, inference (or, Bayesian learning) happens via conditioning (updating, revising) of states, in the light of evidence given by predicates. Inference involves the propagation of such conditioning via joint states and/or via channels, via the back-and-forth connections in point 1, both in a forward and backward direction (as in [16, 18]).
Point 1 is well-known, but point 2 is usually left implicit; it is however a crucial part of why efficient representation of (big) joint states as Bayesian network graphs can be used for Bayesian reasoning. In this section we briefly elaborate both points below, and illustrate them with an example.
Note that we do not claim that with these two points 1 and 2 we capture all essentials of Bayesian network theory: e.g., we do not address the matter of how to turn a joint state into a graph, via conditional independence or via causality. This question has also been studied in a quantum setting, see e.g. .
Abstractly, point 1 involves the correspondence between a joint state on the one hand, and a channel and a (single) state on the other hand. In one direction this is easy: given a state $\sigma$ on $X$ and a channel $c\colon X \to \mathcal{D}(Y)$ we can form a joint state on $X \times Y$, namely as: $(\mathrm{id} \otimes c) \gg (\Delta \gg \sigma)$, where $\Delta\colon X \to \mathcal{D}(X \times X)$ is the copier channel with $\Delta(x) = 1|x,x\rangle$. This construction is drawn as a picture on the right (4), using the graphical language associated with monoidal categories. It will be used here only as illustration, hopefully in an intuitive self-explanatory manner. We refer to [25, 7, 4] for details.
Going in the other direction, from a joint state to a channel is less trivial. It is called disintegration e.g. in . It involves a joint state $\omega$ on $X \times Y$ from which a channel $c\colon X \to \mathcal{D}(Y)$ is extracted, in such a way that $\omega$ itself can be reconstructed from its first marginal and the channel $c$. Pictorially this marginal is represented by blocking its second wire via the ground symbol ⏚. We write $\mathsf{extr}(\omega)$ for this extracted channel $c$. Then we can write Equation (5) as $\omega = (\mathrm{id} \otimes \mathsf{extr}(\omega)) \gg (\Delta \gg \mathsf{M}_1(\omega))$.
Extracted channels exist and are unique in classical discrete probability, for joint states whose first marginal has full support.
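First, a state $\sigma \in \mathcal{D}(X)$ and a channel $c\colon X \to \mathcal{D}(Y)$ yield a joint state $\mathsf{pair}(\sigma, c) \in \mathcal{D}(X \times Y)$, namely, as described in (4) above:
$$\mathsf{pair}(\sigma, c)(x, y) \;=\; \sigma(x) \cdot c(x)(y).$$
In the other direction, let $\omega \in \mathcal{D}(X \times Y)$ be a joint state whose first marginal $\mathsf{M}_1(\omega)$ has full support. The latter means that its support is the whole of $X$, so that: $\mathsf{M}_1(\omega)(x) > 0$ for each $x \in X$, or, equivalently, $\forall x.\, \exists y.\, \omega(x, y) > 0$. We can now define a channel $\mathsf{extr}(\omega)\colon X \to \mathcal{D}(Y)$ by:
$$\mathsf{extr}(\omega)(x)(y) \;=\; \frac{\omega(x, y)}{\mathsf{M}_1(\omega)(x)}.$$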
For a more systematic, diagrammatic description of disintegration, also for continuous probability, we refer to . Here we only need it for discrete probability, as a preparation for the quantum case.
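For discrete probability, pairing and extraction amount to scaling the rows of a matrix. The sketch below is a minimal NumPy illustration of this back-and-forth passage, assuming a joint state on a product of two finite sets is stored as a matrix with rows indexed by the first component. The names `pair` and `extract` and the example numbers are chosen here for illustration.

```python
import numpy as np

def pair(sigma, c):
    """Joint state with entries sigma(x) * c(x)(y)."""
    return sigma[:, None] * c

def extract(joint):
    """Disintegration: divide every row of the joint state by the
    corresponding entry of its first marginal."""
    first = joint.sum(axis=1)
    if np.any(first == 0):
        raise ValueError("first marginal must have full support")
    return joint / first[:, None]

# Hypothetical example: a state on a 2-element set, a channel to a 3-element set.
sigma = np.array([0.25, 0.75])
c = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.2, 0.7]])

joint = pair(sigma, c)
first_marginal = joint.sum(axis=1)
second_marginal = joint.sum(axis=0)

print(np.allclose(extract(joint), c))                      # channel is recovered
print(np.allclose(pair(first_marginal, extract(joint)), joint))
print(np.allclose(second_marginal, sigma @ c))             # state transformation
```

The full-support requirement shows up as the division in `extract`: a zero entry in the first marginal would leave the corresponding row of the channel undetermined.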
3.2 Excursion on disintegration and semi-exponentials
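We conclude this part on disintegration with a novel observation. It is interesting in itself, but it does not play a role in the sequel. It shows that disintegration gives rise to higher order ‘semi-exponential’ structure, originally introduced in . Recall that a categorical description of a (proper) exponential in a cartesian closed category involves exponent objects $Z^Y$ with an evaluation map $\mathrm{ev}\colon Z^Y \times Y \to Z$ such that for each map $f\colon X \times Y \to Z$ there is an abstraction map $\Lambda(f)\colon X \to Z^Y$. These $\mathrm{ev}$ and $\Lambda$ should satisfy:
$$\mathrm{ev} \circ (\Lambda(f) \times \mathrm{id}) = f \qquad \Lambda(f) \circ g = \Lambda(f \circ (g \times \mathrm{id})) \qquad \Lambda(\mathrm{ev}) = \mathrm{id} \tag{6}$$
The last two equations ensure that $\Lambda(f)$ is the unique map $h$ with $\mathrm{ev} \circ (h \times \mathrm{id}) = f$, since: $h = \Lambda(\mathrm{ev}) \circ h = \Lambda(\mathrm{ev} \circ (h \times \mathrm{id})) = \Lambda(f)$.
For a semi-exponential, the last equation in (6) need not hold. A semi-exponential is thus more than a ‘weak’ exponential (only the first equation) since it also satisfies naturality (the second equation). In the language of the $\lambda$-calculus, having ‘semi-exponentials’ means that one has a $\beta$-equation, but not an $\eta$-equation, see  or  for more details.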
Consider the subcategory of the Kleisli category $\mathcal{K}\ell(\mathcal{D})$ of the distribution monad $\mathcal{D}$ on sets with only finite sets as objects. This category is symmetric monoidal ‘semi’ closed: it has semi-exponentials, which are semi-right adjoint to the (standard) tensor product.
It is well-known that cartesian products on sets and parallel product on Kleisli maps (channels) makes the category , and also , symmetric monoidal closed. We sketch how semi-exponentials are obtained via joint states whose first marginal has full support:
This definition assumes that is not the empty set. In that case we can set , the singleton set, since so that there is a trivial correspondence between maps and maps .
We define an evaluation channel via disintegration:
For abstraction, let be given. We define as:
and where is the number of elements in . Here we construct a joint state
as in the beginning of this section, from the uniform distribution on and the channel . We need to check that is well-defined, in particular that each first marginal of has full support:
It is easy to check that the first two equations from (6) hold.
3.3 Bayesian inference and disintegration
We now turn to point 2 from the very beginning of this section, about Bayesian inference, especially in relation to the passage back-and-forth between joint states and channels via pairing and extraction, as just described.
It may happen that a joint state $\omega$ is equal to the product of its two marginals, i.e. $\omega = \mathsf{M}_1(\omega) \otimes \mathsf{M}_2(\omega)$. The state is then called non-entwined. The more common case is that a joint state is entwined, and its marginal components are correlated. If we then update in one component, we see a change in the other component. This is called crossover influence in [17, 18].
The essence of point 2, from the beginning of this section, about inference and disintegration is that for a joint state $\omega$, this crossover influence can be propagated through the channel that is extracted from the state via disintegration. This is expressed in the next result, called the Bayesian Inference Theorem.
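Let $\omega \in \mathcal{D}(X \times Y)$ be a joint state, and $\mathsf{extr}(\omega)\colon X \to \mathcal{D}(Y)$ the extracted channel obtained via disintegration, as described in Subsection 3.1. For predicates $p$ on $X$ and $q$ on $Y$ we then have:
$$\mathsf{M}_2\big(\omega|_{p \otimes \mathbf{1}}\big) \;=\; \mathsf{extr}(\omega) \gg \big(\mathsf{M}_1(\omega)|_p\big) \qquad\qquad \mathsf{M}_1\big(\omega|_{\mathbf{1} \otimes q}\big) \;=\; \mathsf{M}_1(\omega)|_{\mathsf{extr}(\omega) \ll q} \tag{7}$$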
The first equation describes crossover inference on the left-hand side as forward inference on the right: first update and then do state transformation $\gg$. The second equation in (7) describes crossover inference in the other component as backward inference: first do predicate transformation $\ll$ and then update. The terminology of ‘forward’ and ‘backward’ inference comes from , see also . An abstract graphical proof of the equations (7) is given in . But it is not hard to prove these equations concretely, by unwrapping the definitions.
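The two equations of the Bayesian Inference Theorem can be checked numerically on a small joint state. The following sketch assumes the matrix encoding of a joint state (rows for the first component, columns for the second) and uses arbitrary example numbers; it compares crossover inference on the joint state with forward and backward inference along the extracted channel.

```python
import numpy as np

def condition(omega, p):
    """Update a state (of any shape) with a predicate of the same shape."""
    weighted = omega * p
    return weighted / weighted.sum()

# Hypothetical joint state on a product of a 2-element and a 3-element set.
omega = np.array([[0.10, 0.20, 0.05],
                  [0.30, 0.05, 0.30]])
p = np.array([0.8, 0.3])        # predicate on the first component
q = np.array([0.2, 0.9, 0.5])   # predicate on the second component

first = omega.sum(axis=1)             # first marginal
channel = omega / first[:, None]      # extracted channel (disintegration)

# Crossover inference: update the joint state with the weakened predicate,
# then marginalise onto the other component.
crossover_forward = condition(omega, p[:, None] * np.ones(3)).sum(axis=0)
crossover_backward = condition(omega, np.ones(2)[:, None] * q).sum(axis=1)

# Forward inference: update the first marginal, then transform the state.
forward = condition(first, p) @ channel
# Backward inference: transform the predicate, then update the first marginal.
backward = condition(first, channel @ q)

print(np.allclose(crossover_forward, forward))
print(np.allclose(crossover_backward, backward))
```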
3.4 An illustration of inference in a classical Bayesian network
We consider the relation between smoking and the presence of ashtrays and (lung) cancer, in the following simple Bayesian network.
Thus, 95% of people who smoke have an ashtray in their home, and 25% of the non-smokers too. On the right we see that in this situation a smoker has 40% chance of developing cancer, whereas a non-smoker only has 5% chance.
The question we want to address is: what is the influence of the presence or absence of an ashtray on the probability of developing cancer? Here the presence/absence of the ashtray is the ‘evidence’, whose influence is propagated through the network. We shall describe the outcome using the EfProb tool, concentrating on evidence propagation, and not so much on the precise representation of the above network, using channels a and c associated with the conditional probability tables.
We first consider the prior probabilities of smoking, ashtray, and cancer:
The network gives rise to a joint state, by tupling the ashtray, identity and cancer channels, and applying them to the smoking state. We can then obtain the above three prior probabilities alternatively via three marginalisations of this joint state, namely as first, second, third marginals, by using in EfProb the corresponding masks [1,0,0], [0,1,0], [0,0,1] after the marginalisation sign %.
We now wish to infer the (adapted) cancer probability when we have evidence of ashtrays. We shall do this in two ways, first via crossover inference using the above joint state. The ashtray evidence tt needs to be extended (weakened) to a predicate with the same domain as the joint state. In the Equations (7) this is written as: $tt \otimes \mathbf{1} \otimes \mathbf{1}$, but in EfProb it is: tt @ truth(bnd) @ truth(bnd). We first use this predicate for updating the joint state, written as / in EfProb, and then we marginalise to obtain the third ‘cancer’ component that we are interested in:
Alternatively we can compute this posterior cancer probability by following the graph structure. The ashtray evidence tt is now first turned into predicate a<< tt on the state smoking. After updating this state, we transform it to an updated cancer probability, via state transformation >>. We can do this down-and-up propagation in one go:
The fact that we get the same distribution is an instance of the equations (7). As expected, in presence of ashtrays the probability of cancer is higher.
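The same two computations can be reproduced outside EfProb. The sketch below is a plain NumPy rendering under explicit assumptions: the conditional probabilities (95%, 25%, 40%, 5%) are those stated in the text, but the prior probability of smoking is not given there, so the value 0.3 used here is purely hypothetical. The variable names `a`, `c` and `tt` mirror those in the text; everything else is illustrative.

```python
import numpy as np

def condition(omega, p):
    """Update a state with a predicate of the same shape."""
    weighted = omega * p
    return weighted / weighted.sum()

# Index 0 means 'yes' and index 1 means 'no' in every domain.
smoking = np.array([0.3, 0.7])     # ASSUMED prior; not taken from the text
a = np.array([[0.95, 0.05],        # ashtray, given smoker
              [0.25, 0.75]])       # ashtray, given non-smoker
c = np.array([[0.40, 0.60],        # cancer, given smoker
              [0.05, 0.95]])       # cancer, given non-smoker
tt = np.array([1.0, 0.0])          # evidence: an ashtray is present

# Joint state with joint[i, s, k] = P(ashtray=i, smoking=s, cancer=k).
joint = np.einsum('s,si,sk->isk', smoking, a, c)
prior_cancer = joint.sum(axis=(0, 1))

# Way 1, crossover inference: weaken tt to the domain of the joint state,
# update the joint state, and take the third marginal.
weakened = tt[:, None, None] * np.ones((2, 2, 2))
crossover = condition(joint, weakened).sum(axis=(0, 1))

# Way 2, along the graph: pull tt back along a, update smoking, push along c.
via_channels = condition(smoking, a @ tt) @ c

print(np.allclose(crossover, via_channels))
print(prior_cancer[0] < via_channels[0])   # ashtray evidence raises the cancer probability
```

The joint state here has only eight entries, but its size grows exponentially with the number of nodes, whereas the second computation only touches the two small channel matrices.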
Aside: clearly, ashtrays influence (the probability of) cancer, but they are not the cause; in the graph this influence happens via a common ancestor, namely smoking, working statistically as ‘confounder’, and as the actual cause of cancer.
4 Towards quantum Bayesian theory
The main aim of this paper is to investigate quantum analogues of the Bayesian Inference Theorem 2, from the conviction that any adequate quantum Bayesian network theory should address points 1 and 2 from the beginning of Section 3 in a satisfactory manner. Point 1 has received ample attention in quantum theory, see for instance [20, 8, 21, 1]. But point 2, involving quantum conditioning, has not been studied this explicitly. Our main result is that one can also describe quantum conditioning consistently, both on joint states and via channels, as in Equations (7), but this requires in the quantum case that one distinguishes two forms of conditioning, which we shall call lower and upper conditioning, written as $\omega|_p$ and $\omega|^p$ respectively. (The terminology ‘lower’ and ‘upper’ is simply determined by the position of the predicate $p$: low in $\omega|_p$ and up in $\omega|^p$.) Classically these two forms of conditioning coincide, but the quantum world is more subtle, as usual. Lower conditioning has appeared in effectus theory  and upper conditioning in the approach of . Here they are clearly distinguished for the first time, and used jointly to capture quantum inference and propagation of evidence. Interestingly, what is commonly called Bayes’ rule holds for upper conditioning, but not for lower conditioning, for which we “only” have the product rule.
First we introduce the basics about states and predicates in the quantum world. We shall do so for finite-dimensional quantum theory, using the formalism of Hilbert spaces.
4.1 Basics of quantum probability
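Let $\mathcal{H}$ be a finite-dimensional complex Hilbert space. A state of $\mathcal{H}$ is a positive operator on $\mathcal{H}$ with trace one. That is, $\omega$ is a linear function $\omega\colon \mathcal{H} \to \mathcal{H}$ satisfying $\omega \geq 0$ and $\mathrm{tr}(\omega) = 1$. A state is often called a density matrix. The canonical way to define a state is to start from a vector $|v\rangle \in \mathcal{H}$ with norm $1$, and consider the operator $|v\rangle\langle v|$. It sends any element $|w\rangle \in \mathcal{H}$ to the vector $\langle v|w\rangle\,|v\rangle$. An arbitrary state is a convex combination of such vector states $|v\rangle\langle v|$. A joint state on two Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$ is a state on the tensor product $\mathcal{H} \otimes \mathcal{K}$.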
A predicate, also called an effect, is a positive operator $p$ on $\mathcal{H}$ below the identity: $0 \leq p \leq \mathrm{id}$. The identity is given by the identity/unit matrix, and corresponds to the truth predicate, often written as $\mathbf{1}$. For each predicate $p$ there is an orthosupplement, written as $p^{\perp}$, playing the role of negation. It is defined by $p^{\perp} = \mathbf{1} - p$, and satisfies: $p^{\perp\perp} = p$ and $p + p^{\perp} = \mathbf{1}$.
The most interesting logical operation on quantum predicates is sequential conjunction $\&$. It is defined via the square root operation on predicates, as:
$$p \mathbin{\&} q \;=\; \sqrt{p}\, q\, \sqrt{p}.$$
We pronounce $\&$ as ‘and-then’, and read it as: after $p$ with its side-effect, the predicate $q$ holds. This operation has been studied in , and re-emerged in effectus theory [13, 6]. The square root $\sqrt{p}$ of the matrix $p$ exists since $p$ is positive. It is computed via diagonalisation $p = V D V^{\dagger}$, where $\sqrt{p} = V \sqrt{D}\, V^{\dagger}$, in which $\sqrt{D}$ is obtained from the diagonal matrix $D$ by taking the square roots of the (non-negative) eigenvalues on the diagonal.
States and predicates of the same Hilbert space can be combined in validity, defined as:
$$\omega \models p \;=\; \mathrm{tr}(\omega\, p).$$
This standard definition is also known as the Born rule.
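The square root via diagonalisation, sequential conjunction and the Born rule can be sketched in a few lines of NumPy. The helper names below are assumptions made for this illustration, not EfProb functions; the example uses two projections on a qubit to show that ‘and-then’ is not commutative.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive (semidefinite) Hermitian matrix,
    computed via diagonalisation."""
    eigenvalues, vectors = np.linalg.eigh(a)
    eigenvalues = np.clip(eigenvalues, 0.0, None)   # remove tiny negative round-off
    return (vectors * np.sqrt(eigenvalues)) @ vectors.conj().T

def andthen(p, q):
    """Sequential conjunction: sqrt(p) q sqrt(p)."""
    root = psd_sqrt(p)
    return root @ q @ root

def validity(omega, p):
    """Born rule: the trace of omega times p."""
    return float(np.real(np.trace(omega @ p)))

# Example data on a qubit (chosen for illustration).
plus = np.array([[0.5, 0.5], [0.5, 0.5]])        # projection onto |+>
zero = np.array([[1.0, 0.0], [0.0, 0.0]])        # projection onto |0>
omega = np.array([[0.75, 0.25], [0.25, 0.25]])   # a state: positive, trace one

print(validity(omega, plus))
print(np.allclose(andthen(plus, zero), andthen(zero, plus)))   # False: order matters
```

For a projection the square root is the projection itself, so `andthen(plus, zero)` is half of `plus` whereas `andthen(zero, plus)` is half of `zero`.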
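There is a standard way to embed classical probability into quantum probability. Suppose we have a classical state $\omega \in \mathcal{D}(X)$ and predicate $p \in [0,1]^X$ on a finite set $X = \{x_1, \ldots, x_n\}$ with $n$ elements. Then we consider the Hilbert space $\mathbb{C}^n$ with standard basis given by vectors $|i\rangle$ with a $1$ on the $i$-th position and zeros elsewhere. We write $\widehat{\omega} = \sum_i \omega(x_i)\,|i\rangle\langle i|$ for the ‘diagonal’ quantum state on $\mathbb{C}^n$. By construction it is positive and has trace $1$.
Similarly, a classical predicate $p$ gives a quantum predicate $\widehat{p}$ on $\mathbb{C}^n$ via $\widehat{p} = \sum_i p(x_i)\,|i\rangle\langle i|$. By construction, $0 \leq \widehat{p} \leq \mathrm{id}$. It is easy to see that the classical and quantum validities coincide:
$$\widehat{\omega} \models \widehat{p} \;=\; \mathrm{tr}(\widehat{\omega}\,\widehat{p}) \;=\; \sum_i \omega(x_i) \cdot p(x_i) \;=\; \omega \models p.$$
The mapping $p \mapsto \widehat{p}$ preserves the logical structure on predicates, including sequential conjunction $\&$.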
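A short NumPy check of this embedding, with arbitrary example numbers: classical states and predicates become diagonal matrices, validities agree, and sequential conjunction of diagonal predicates is again the diagonal of the pointwise product.

```python
import numpy as np

# Hypothetical classical data on a 3-element set.
omega = np.array([0.2, 0.3, 0.5])
p = np.array([0.9, 0.1, 0.4])
q = np.array([0.5, 1.0, 0.2])

# Embedding into quantum probability: diagonal matrices on C^3.
omega_hat = np.diag(omega)
p_hat = np.diag(p)
q_hat = np.diag(q)

classical_validity = float(omega @ p)
quantum_validity = float(np.trace(omega_hat @ p_hat))

# For a diagonal predicate the square root is taken entrywise on the diagonal.
root_p_hat = np.diag(np.sqrt(p))
quantum_andthen = root_p_hat @ q_hat @ root_p_hat

print(np.isclose(classical_validity, quantum_validity))
print(np.allclose(quantum_andthen, np.diag(p * q)))
```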
In both classical and quantum probability, as described here, a state is also a predicate. This is peculiar. When one moves to a higher level of abstraction, this is no longer the case, for instance when using von Neumann algebras instead of Hilbert spaces, or continuous probability distributions on measurable spaces instead of discrete distributions on sets. In the next section we sometimes ‘convert’ a state into a predicate, but we shall make explicit when we do so. A more abstract approach is possible, using the duality between states and effects, see also Remark 4.
4.2 Two forms of quantum conditioning
This subsection introduces two forms of quantum conditioning of a state by a predicate, called ‘lower’ and ‘upper’ conditioning, and describes their basic properties.
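Let $\omega$ be a state, and $p$ a predicate, on the same Hilbert space, for which the validity $\omega \models p$ is non-zero. We shall use the following terminology, notation and definition for two forms of conditioning:
$$\text{lower conditioning:}\quad \omega|_p \;=\; \frac{\sqrt{p}\,\omega\,\sqrt{p}}{\omega \models p} \qquad\qquad \text{upper conditioning:}\quad \omega|^p \;=\; \frac{\sqrt{\omega}\,p\,\sqrt{\omega}}{\omega \models p}$$
It is easy to see that both $\omega|_p$ and $\omega|^p$ are states again, using the familiar ‘rotation’ property of traces: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. Lower conditioning arises in effectus theory, whereas upper conditioning comes from . We first observe that this difference between ‘lower’ and ‘upper’ does not exist classically.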
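For classical (non-quantum) states and predicates, lower and upper conditioning coincide with classical conditioning. To express this more precisely we use the notation from Remark 1 to translate from classical to quantum: for a classical state $\omega$ and predicate $p$,
$$\widehat{\omega}|_{\widehat{p}} \;=\; \widehat{\omega|_p} \;=\; \widehat{\omega}|^{\widehat{p}}.$$
Diagonal matrices commute, so that $\sqrt{\widehat{p}}\,\widehat{\omega}\,\sqrt{\widehat{p}} = \widehat{\omega}\,\widehat{p} = \sqrt{\widehat{\omega}}\,\widehat{p}\,\sqrt{\widehat{\omega}}$.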
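A second observation is about truth $\mathbf{1}$ and sequential conjunction $\&$. Both lower and upper conditioning with truth do nothing, like in the classical case, but successive conditioning cannot be reduced to single conditioning, like in the first equation in (2), in Proposition 1. In addition, the order in quantum conditioning matters, just like the order of priming in psychology matters .
We have $\omega|_{\mathbf{1}} = \omega$ and $\omega|^{\mathbf{1}} = \omega$, but in general successive quantum conditionings cannot be reduced to a single conditioning via sequential conjunction:
$$(\omega|_p)|_q \;\neq\; \omega|_{p \mathbin{\&} q} \qquad\qquad (\omega|^p)|^q \;\neq\; \omega|^{p \mathbin{\&} q}$$
Similarly, in general, quantum conditionings do not commute:
$$(\omega|_p)|_q \;\neq\; (\omega|_q)|_p \qquad\qquad (\omega|^p)|^q \;\neq\; (\omega|^q)|^p$$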
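The ‘product’ rule holds for lower conditioning and Bayes’ rule holds for upper conditioning:
$$\omega|_p \models q \;=\; \frac{\omega \models p \mathbin{\&} q}{\omega \models p} \qquad\qquad \omega|^p \models q \;=\; \frac{(\omega|^q \models p) \cdot (\omega \models q)}{\omega \models p}$$
We simply go through the computations:
$$\omega|_p \models q \;=\; \frac{\mathrm{tr}(\sqrt{p}\,\omega\,\sqrt{p}\,q)}{\mathrm{tr}(\omega p)} \;=\; \frac{\mathrm{tr}(\omega\,\sqrt{p}\,q\,\sqrt{p})}{\mathrm{tr}(\omega p)} \;=\; \frac{\omega \models p \mathbin{\&} q}{\omega \models p}$$
$$\omega|^p \models q \;=\; \frac{\mathrm{tr}(\sqrt{\omega}\,p\,\sqrt{\omega}\,q)}{\mathrm{tr}(\omega p)} \;=\; \frac{\mathrm{tr}(\sqrt{\omega}\,q\,\sqrt{\omega}\,p)}{\mathrm{tr}(\omega q)} \cdot \frac{\mathrm{tr}(\omega q)}{\mathrm{tr}(\omega p)} \;=\; \frac{(\omega|^q \models p) \cdot (\omega \models q)}{\omega \models p}$$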
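Both forms of conditioning are easy to experiment with numerically. The following NumPy sketch generates a random state and two random predicates (with a fixed seed) and checks that lower and upper conditioning yield states, that the product rule holds for lower conditioning, and that Bayes’ rule holds for upper conditioning. All helper names are assumptions made for this illustration; they are not the EfProb operations.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive Hermitian matrix via diagonalisation."""
    eigenvalues, vectors = np.linalg.eigh(a)
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return (vectors * np.sqrt(eigenvalues)) @ vectors.conj().T

def validity(omega, p):
    return float(np.real(np.trace(omega @ p)))

def andthen(p, q):
    root = psd_sqrt(p)
    return root @ q @ root

def lower(omega, p):
    """Lower conditioning: sqrt(p) omega sqrt(p), normalised."""
    root = psd_sqrt(p)
    return root @ omega @ root / validity(omega, p)

def upper(omega, p):
    """Upper conditioning: sqrt(omega) p sqrt(omega), normalised."""
    root = psd_sqrt(omega)
    return root @ p @ root / validity(omega, p)

rng = np.random.default_rng(0)

def random_positive(n):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return a @ a.conj().T

def random_state(n):
    rho = random_positive(n)
    return rho / np.real(np.trace(rho))

def random_predicate(n):
    e = random_positive(n)
    return e / (1.1 * np.linalg.eigvalsh(e).max())   # eigenvalues now below 1

omega, p, q = random_state(3), random_predicate(3), random_predicate(3)

product_rule = np.isclose(validity(lower(omega, p), q),
                          validity(omega, andthen(p, q)) / validity(omega, p))
bayes_rule = np.isclose(validity(upper(omega, p), q),
                        validity(upper(omega, q), p) * validity(omega, q)
                        / validity(omega, p))
print(product_rule, bayes_rule)
```

Swapping `lower` and `upper` in the two checks is a quick way to see that, for generic inputs, each rule is tied to its own form of conditioning.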
5 Quantum channels
In order to express the quantum analogues of the equations in Theorem 2 we need the notion of ‘channel’ in a quantum setting. It exists, and is alternatively often called a quantum operation, see e.g. . There are several variations possible in the requirements, such as just positive or completely positive, unitary or subunitary, normal or not. These variations are not essential for what follows.
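For a finite-dimensional Hilbert space $\mathcal{H}$ we write $\mathcal{B}(\mathcal{H})$ for the set of linear maps $\mathcal{H} \to \mathcal{H}$. Because $\mathcal{H}$ has finite dimension, such maps are automatically bounded, or equivalently, continuous. The set of operators $\mathcal{B}(\mathcal{H})$ is in fact a Hilbert space itself, with Hilbert-Schmidt inner product $\langle A \,|\, B \rangle_{HS} = \mathrm{tr}(A^{\dagger} B)$, where $A^{\dagger}$ is the conjugate transpose of $A$, as matrix. Moreover, there are canonical isomorphisms $\mathcal{B}(\mathcal{H} \otimes \mathcal{K}) \cong \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathcal{K})$ and $\mathcal{B}(\mathbb{C}) \cong \mathbb{C}$.
If $\mathcal{K}$ is another finite-dimensional Hilbert space, then a CP-map $\mathcal{H} \to \mathcal{K}$ is a completely positive linear map $c\colon \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H})$. Notice the change of direction. This CP-map is called a channel if it preserves the unit/identity matrix: $c(\mathrm{id}) = \mathrm{id}$. It may be called subchannel if $c(\mathrm{id}) \leq \mathrm{id}$. Each CP-map $c$ has a ‘dagger’, written as $c^{\dagger}\colon \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{K})$, so that $\langle c^{\dagger}(A) \,|\, B \rangle_{HS} = \langle A \,|\, c(B) \rangle_{HS}$, that is, $\mathrm{tr}(c^{\dagger}(A)^{\dagger} B) = \mathrm{tr}(A^{\dagger} c(B))$.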
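For a channel $c\colon \mathcal{H} \to \mathcal{K}$ and a predicate (effect) $q$ on $\mathcal{K}$ we define predicate transformation via function application $c \ll q = c(q)$. Similarly, for a state $\omega$ on $\mathcal{H}$ we define state transformation via the dagger of the channel, as: $c \gg \omega = c^{\dagger}(\omega)$. Then, using that positive operators are self-adjoint, we get the same relation (3) between validity and state/predicate transformation as in the classical case:
$$(c \gg \omega) \models q \;=\; \mathrm{tr}(c^{\dagger}(\omega)\, q) \;=\; \mathrm{tr}(\omega\, c(q)) \;=\; \omega \models (c \ll q).$$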
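To make predicate and state transformation concrete, the sketch below uses a Kraus representation of a channel. This is a standard representation of completely positive maps that is brought in here as an assumption; it is not the matrix representation used in the text or in EfProb. In the direction conventions of this section, a list of Kraus operators $K_i$ with $\sum_i K_i^{\dagger} K_i = \mathrm{id}$ gives predicate transformation $q \mapsto \sum_i K_i^{\dagger} q K_i$ and state transformation $\omega \mapsto \sum_i K_i \omega K_i^{\dagger}$. The amplitude damping channel and the numbers are chosen only for illustration.

```python
import numpy as np

def pred_transform(kraus, q):
    """c << q: apply the (unital) CP-map to the predicate q."""
    return sum(k.conj().T @ q @ k for k in kraus)

def state_transform(kraus, omega):
    """c >> omega: apply the dagger of the CP-map to the state omega."""
    return sum(k @ omega @ k.conj().T for k in kraus)

def validity(omega, p):
    return float(np.real(np.trace(omega @ p)))

# Amplitude damping on a qubit with (hypothetical) damping parameter g.
g = 0.3
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - g)]]),
         np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])]

omega = np.array([[0.75, 0.25], [0.25, 0.25]])   # a state
q = np.array([[0.5, 0.5], [0.5, 0.5]])           # a predicate

# A channel preserves the identity (truth) under predicate transformation.
print(np.allclose(pred_transform(kraus, np.eye(2)), np.eye(2)))
# Validity is the same whether we transform the state or the predicate.
print(np.isclose(validity(state_transform(kraus, omega), q),
                 validity(omega, pred_transform(kraus, q))))
```

Preservation of the identity by `pred_transform` corresponds to preservation of the trace by `state_transform`, so transformed states are states again.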
Let be a (quantum) predicate on Hilbert space . It gives rise to a subchannel defined by:
States/predicates on are special instances of CP-maps , resp. . If we consider them as such channels, we can take their dagger . Then we can relate upper and lower conditioning via an exchange, namely as: . This re-formulation may be useful in a more general setting.
5.1 Representation of quantum channels
As mentioned, a channel is a (completely positive) linear function between spaces of operators. Let’s assume have dimensions , respectively. The space of operators then has dimension , so that the channel is determined by its values on the base vectors of . Thus, the channel is determined by matrices of size , as in:
The matrix entries of the channel will be written via double indexing, as for and .
This matrix representation of a quantum channel is used in EfProb. It is convenient, for instance because parallel composition of channels can simply be done by Kronecker multiplication of their (outer) matrices (12). We briefly describe how predicate and state transformation works.
Let be a predicate on , represented as a matrix. Predicate transformation is done simply by linear extension. It yields an matrix, forming a predicate on , via:
In the other direction we do state transformation essentially via the dagger of the channel . Explicitly, it works as follows. Let be a state of , represented by a matrix. Then we obtain the transformed state as an matrix given by computing traces:
Notice the change of order of indices: at position of we use the inner matrix from (12). The reason is the implicit use of the Hilbert-Schmidt inner product, given by , where the dagger involves a conjugate transpose.
5.2 Quantum pairing and extraction
The pairing of a classical state and a channel in (4) involves a copier $\Delta$. It does not exist in general in a quantum setting because of the ‘no-cloning’ theorem. But we do have ‘cup’ states with maximal entanglement. They are basis dependent: given a finite-dimensional Hilbert space with orthonormal basis of size , we can form a state of as . Similarly, there is a ‘cap’ predicate . The quantum pairing and extraction operations that we describe in this subsection are due to . But the more abstract description in terms of cups and caps does not occur there. These operations depend on a choice of basis.
Given a state of and a channel we can thus form a joint state of via the ‘cup’ state of . Then we can define a pair state of via state transformation as in:
In the other direction, given a joint state of we write for the transpose of its first marginal, so:
We extract a channel from in the manner defined in :
The next result is the analogue of Lemma 2 about disintegration for classical discrete probability.
Proposition 5 (After ).
Similarly, a joint state for which the transpose of its first marginal , as defined above, is invertible can be recovered as a pair, as on the left below. In addition, ’s second marginal can be obtained via state transformation, as on the right:
We shall do the first equation and leave the others to the interested reader.
The latter equation holds since a state is positive and thus self-adjoint.
6 A quantum Bayesian Inference Theorem
This section contains the main result of this paper, namely the quantum analogue of Theorem 2. It describes how conditioning of a joint state can also be performed via the extracted channel. The novelty in our quantum description is that we need both lower and upper conditioning to capture what is going on.
Let be a state of and let be predicates, on and on respectively. Then:
The proof is omitted since it involves rather long and tedious matrix calculations. Instead we include a random test: the quantum versions of pairing / projection / extraction and lower / upper conditioning have been implemented in EfProb. They can be used to test Theorem 3 as below, by generating an arbitrary state t, in this case of type , together with arbitrary (suitably typed) predicates. The EfProb notation for lower and upper conditioning is / and ^.
The two equality tests == involve and matrices of complex numbers.
In the equations in Theorem 3 we perform lower conditioning on the joint state. One may ask if there are also ‘dual’ equations where upper conditioning on the joint state is re-described via state/predicate transformation. We have not found them.
Thanks to Kenta Cho and Alex Kissinger for helpful feedback and discussions.
-  J.-M. Allen, J. Barrett, D. Horsman, C. Lee, and R. Spekkens. Quantum common causes and quantum causal models. Phys. Rev. X, 7(3):031021, 2017.
-  D. Barber. Bayesian Reasoning and Machine Learning. Cambridge Univ. Press, 2012. Publicly available via http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage.
-  J. Bernardo and A. Smith. Bayesian Theory. John Wiley & Sons, 2000.
-  K. Cho and B. Jacobs. Disintegration and Bayesian inversion, both abstractly and concretely. See arxiv.org/abs/1709.00322, 2017.
-  K. Cho and B. Jacobs. The EfProb library for probabilistic calculations. In F. Bonchi and B. König, editors, Conference on Algebra and Coalgebra in Computer Science (CALCO 2017), volume 72 of LIPIcs. Schloss Dagstuhl, 2017.
-  K. Cho, B. Jacobs, A. Westerbaan, and B. Westerbaan. An introduction to effectus theory. see arxiv.org/abs/1512.05813, 2015.
-  B. Coecke and A. Kissinger. Picturing Quantum Processes. A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge Univ. Press, 2016.
-  B. Coecke and R. Spekkens. Picturing classical and quantum Bayesian inference. Synthese, 186(3):651–696, 2012.
-  B. Fong. Causal theories: A categorical perspective on Bayesian networks. Master’s thesis, Univ. of Oxford, 2012. see arxiv.org/abs/1301.6201.
-  S. Gudder and R. Greechie. Sequential products on effect algebras. Reports on Math. Physics, 49(1):87–111, 2002.
-  S. Hayashi. Adjunction of semifunctors: categorical structures in nonextensional lambda calculus. Theor. Comp. Sci., 41:95–104, 1985.
-  R. Hoofman and I. Moerdijk. A remark on the theory of semi-functors. Math. Struct. in Comp. Sci., 5(1):1–8, 1995.
-  B. Jacobs. New directions in categorical logic, for classical, probabilistic and quantum logic. Logical Methods in Comp. Sci., 11(3), 2015. See https://lmcs.episciences.org/1600.
-  B. Jacobs. From probability monads to commutative effectuses. Journ. of Logical and Algebraic Methods in Programming, 94:200–237, 2017.
-  B. Jacobs. Quantum effect logic in cognition. Journ. Math. Psychology, 81:1–10, 2017. See https://doi.org/10.1016/j.jmp.2017.08.004.
-  B. Jacobs and F. Zanasi. A predicate/state transformer semantics for Bayesian learning. In L. Birkedal, editor, Math. Found. of Programming Semantics, number 325 in Elect. Notes in Theor. Comp. Sci., pages 185–200. Elsevier, Amsterdam, 2016.
-  B. Jacobs and F. Zanasi. A formal semantics of influence in Bayesian reasoning. In K. Larsen, H. Bodlaender, and J.-F. Raskin, editors, Math. Found. of Computer Science, volume 83 of LIPIcs, pages 21:1–21:14. Schloss Dagstuhl, 2017.
-  B. Jacobs and F. Zanasi. The logical essentials of Bayesian reasoning. See arxiv.org/abs/1804.01193, 2018.
-  D. Koller and N. Friedman. Probabilistic Graphical Models. Principles and Techniques. MIT Press, Cambridge, MA, 2009.
-  M. Leifer and R. Spekkens. Towards a formulation of quantum theory as a causally neutral theory of Bayesian inference. Phys. Rev. A, 88(5):052130, 2013.
-  M. Leifer and R. Spekkens. A Bayesian approach to compatibility, improvement, and pooling of quantum states. Journ. of Physics A: Mathematical and Theoretical, 47(27):275301, 2014.
-  M. Nielsen and I. Chuang. Quantum Computation and Quantum Information. Cambridge Univ. Press, 2000.
-  J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Graduate Texts in Mathematics 118. Morgan Kaufmann, 1988.
-  J. Pienaar and Č. Brukner. A graph-separation theorem for quantum causal models. New Journ. of Physics, 17:073020, 2015.
-  P. Selinger. A survey of graphical languages for monoidal categories. In B. Coecke, editor, New Structures in Physics, number 813 in Lect. Notes Physics, pages 289–355. Springer, Berlin, 2011.