# A Note on Quantum Markov Models

The study of Markov models is central to control theory and machine learning. A quantum analogue of the partially observable Markov decision process was studied in (Barry, Barry, and Aaronson, Phys. Rev. A, 90, 2014). It was proved that goal-state reachability is undecidable in the quantum setting, whereas it is decidable classically. In contrast to this classical-to-quantum transition from decidable to undecidable, we observe that the problem of approximating the optimal policy that maximizes the average discounted reward over an infinite horizon remains decidable in the quantum setting. Given that most relevant problems related to Markov decision processes are undecidable classically (which immediately implies undecidability in the quantum case), this provides one of the few examples where the quantum problem is tractable.

## 1 Introduction

The study of Markov models is important to control theory and machine learning. In optimal control, it is relevant for the theory of sequential decision making under uncertainty. Seminal results in this area were proved by Blackwell [3], Sondik [12], and others (see [2]). In machine learning, it is central for studying the main problems in reinforcement learning, such as planning and optimization (see [10, 14]). The primary model used in both areas is the Markov decision process (partially observable or otherwise).

For quantum computing, Markov models are useful for developing algorithms and for modeling phenomena observed in quantum systems. In the design of algorithms, several fundamental algorithms, such as Grover search, are based on quantization of Markov chains on graphs (see Szegedy [13]). Further generalizations to quantum Markov processes were studied in quantum information theory in the context of quantum channels (see [15]).

In a recent work, Barry et al. [1] introduced the quantum observable Markov decision process. This is a natural quantum analogue of a partially observable Markov decision process, although it is a strict generalization of a belief Markov decision process. The latter is a commonly used full-information model to represent a partially observable Markov decision process. The main component used to define a quantum observable Markov decision process is a set of superoperators. Using standard machinery from quantum information (see [17, 18]), we recast the set of superoperators as the composition of a conditional channel and a quantum instrument. We describe this in the context of quantum transducers.

A finite transducer is a finite automaton with output (see [6]). There are two standard models for transducers: the Moore machine and the Mealy machine. In the former model, the output is generated from the current state, while in the latter, the output is generated based on the action taken from the current state. Following in this classical vein, we describe models for quantum Moore and quantum Mealy machines, and we prove that they are equivalent. The proof of this equivalence relies on the decoupling of a superoperator into the composition of a conditional channel and a quantum instrument. In earlier works, Wiesner et al. [16] and Monras et al. [8] had also studied quantum transducers but in the context of stochastic generators.

In [1], Barry et al. studied the goal-state perfect reachability problem (also called planning) for quantum observable Markov decision processes. Here, the objective is to find a sequence of actions that evolves the system from the initial state to a given target state with probability one. Their main result is that this perfect planning problem is undecidable in the quantum setting, whereas the classical version is known to be decidable.

Madani et al. [7] proved that most classical problems related to partially observable Markov decision processes, such as planning and optimization, are undecidable. Since the quantum problems are generalizations of the classical ones, this immediately implies that the quantum problems are also undecidable. The list of these problems does not include the above perfect planning problem. So, Barry et al. [1] show an interesting classical-to-quantum transition from decidable to undecidable. A similar phenomenon occurs for the perfect non-occurrence problem, where we ask if there is an output sequence that will never be observed. This problem is decidable in the classical setting, but its quantum version (called the quantum measurement occurrence problem) was proved undecidable by Eisert et al. [5].

The infinite-horizon policy existence problems proved undecidable by Madani et al. [7] include the problems under the total reward, average reward, and discounted reward criteria. For approximating the optimal policy, the problems under the total and average reward criteria remain undecidable, but the problem under the discounted criterion is decidable. We focus on this latter problem and observe that its quantum analogue is still decidable. This is obtained by revisiting the work of Blackwell [3] on Markov decision processes over Borel spaces. We also adapt some observations of Sondik [12] on the compact representation of the optimal policy to the quantum setting.

We summarize the complexities of the problems related to Markov decision processes in Figure 1.

## 2 Preliminaries

### 2.1 Markov models

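Suppose $S$, $\Sigma$ and $\Delta$ are finite sets (of states, input actions, and output signals, respectively). Let $\{P_a : a \in \Sigma\}$ be a collection of Markov chains with a common state space $S$, where each $P_a$ is a $|S| \times |S|$ column-stochastic matrix. We view each column $j$ of $P_a$ as defining a conditional probability distribution over $S$, where $(P_a)_{ij}$ is the probability of moving to state $i$ from state $j$ under action $a$. Let $Q$ be a $|\Delta| \times |S|$ column-stochastic matrix. We view each column $j$ of $Q$ as defining a conditional probability distribution over $\Delta$, where $Q_{kj}$ is the probability of output $k \in \Delta$ in state $j \in S$. Let $p_0$ be an initial probability distribution over $S$. Suppose $R$ is a bounded real-valued reward function over $S$ and $\gamma$ is a discount factor. The tuple $(S, \Sigma, \Delta, \{P_a\}, Q, p_0, R, \gamma)$ is called a partially observable Markov decision process (POMDP).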

There are several well-known Markov models that can be derived from the above. We obtain a Markov chain if is a unary alphabet and is trivial (say, the constant zero function), and we let and for the observed chain, or let be unary (and trivial) for the unobserved chain. We derive a Markov decision process (MDP) if and , and denote this simply as . We get a hidden Markov model (HMM) if is a unary alphabet and is trivial, and denote this as .

The random process induced by a partially observable Markov decision process is defined as follows. For a sequence of actions $a_0, a_1, \ldots$, where $a_n \in \Sigma$, we consider the sequence of pairs of random variables $(X_n, Y_n)$, with $X_n \in S$ and $Y_n \in \Delta$, defined so that:

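$$ P[X_0 = i] = (p_0)_i, \quad \text{for } i \in S \tag{1} $$

$$ P[X_{n+1} = i \mid X_n = j] = (P_{a_n})_{ij}, \quad \text{for } i, j \in S \tag{2} $$

$$ P[Y_n = k \mid X_n = j] = Q_{kj}, \quad \text{for } k \in \Delta,\ j \in S \tag{3} $$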
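Equations (1) to (3) determine the probability of any finite output sequence once the actions are fixed. The following is a minimal sketch of that computation, not code from the paper: the function names, the two-state example, and the index convention (rows are targets, columns are sources, matching the column-stochastic matrices above) are illustrative assumptions.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def sequence_probability(p0, P, Q, actions, outputs):
    """Probability of observing outputs Y_0..Y_n under actions a_0..a_{n-1}.

    p0[j]      = P[X_0 = j]                      (equation (1))
    P[a][i][j] = P[X_{n+1} = i | X_n = j] for a  (equation (2))
    Q[k][j]    = P[Y_n = k | X_n = j]            (equation (3))
    """
    assert len(outputs) == len(actions) + 1
    # f[j] = P[outputs seen so far, current state = j] (unnormalised).
    f = [Q[outputs[0]][j] * p0[j] for j in range(len(p0))]
    for a, k in zip(actions, outputs[1:]):
        f = mat_vec(P[a], f)                         # one transition step
        f = [Q[k][j] * f[j] for j in range(len(f))]  # weight by the new output
    return sum(f)

# Hypothetical example: two states, action 0 stays, action 1 swaps the states.
P0 = [1.0, 0.0]
P = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
Q = [[0.9, 0.2], [0.1, 0.8]]  # each column sums to 1

print(sequence_probability(P0, P, Q, actions=[1], outputs=[0, 1]))
```

Summing the result over all output sequences of a fixed length gives 1, which is a quick sanity check that the matrices are column-stochastic.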

A policy is a sequence of maps, where and is the history at time , for each . Thus, if is the history at time , then is the action taken, for all . The value function of is given by

$$ V_\pi(p_0) = E\left[\sum_{i=0}^{\infty} \gamma^i R(X_i)\right]. \tag{4} $$

The expectation is taken over an infinite product probability space (a brief discussion on the existence of this infinite probability space is given in Appendix A). The main objective is to compute the optimal value function , and to find for which the maximum is achieved. A policy is called stationary if there is a map so that for all .

### 2.2 Quantum information

We briefly review some basic notation and background from quantum information (see Nielsen and Chuang [9], Watrous [17]). For a finite set , let denote the complex Euclidean space spanned by the unit vectors . We will use the standard Dirac notation throughout. It will often be convenient to view as a Hilbert space.

Given complex Euclidean spaces and , let be the set of linear operators from to . For an operator , let denote the unique operator in for which for all and . For brevity, we use when . We say an operator is positive semidefinite if for some . The set of all positive semidefinite operators in is denoted . A positive semidefinite operator is called a density matrix if . The set of all density matrices over is denoted . These density matrices will provide a convenient representation for quantum states.

Let be the set of linear operators from to . An operator is called positive if it maps a positive semidefinite to a positive semidefinite . An operator is called completely positive if is positive for every auxiliary space . The set of all linear operators in which are completely positive is denoted . An operator is called trace preserving if for all . The set of linear operators in that are completely positive and trace preserving is denoted . Each element of is called a quantum channel. These quantum channels will provide a representation of quantum operations.

For a finite alphabet , a conditional channel over is a collection of channels . The input to a conditional channel is a classical-quantum state , where and . First, the conditional channel prepares the following quantum state:

$$ \Phi(\rho) = \frac{1}{|\Sigma|} \sum_{b \in \Sigma} |b\rangle\langle b| \otimes \Phi_b(\rho). \tag{5} $$

It then applies the projection and returns the second register.

For a finite alphabet , a quantum instrument over is a collection of completely positive operators for which forms a channel. On input , the instrument first prepares the following quantum state:

$$ \Omega(\rho) = \sum_{b \in \Delta} |b\rangle\langle b| \otimes \Omega_b(\rho). \tag{6} $$

Note that is a quantum channel. Then, the quantum instrument applies the measurement where , for each . Here, an element is selected at random with probability and, conditioned on the measurement outcome was observed, the post-measurement state is (otherwise, if the outcome was not observed, the resulting state is ).

## 3 Quantum Transducers

A transducer is a finite automaton with output (see Hopcroft and Ullman [6]). We consider two standard models of transducers, namely Moore and Mealy machines, and propose their natural quantum analogues. Then, we show that the two quantum models are equivalent.

In this and subsequent sections, we assume is a finite set of states, is a finite set of input symbols, and is a finite set of output symbols. We also assume that is a complex Euclidean space associated with .

A quantum Moore machine is a tuple where is a conditional channel, is a quantum instrument, is the initial density matrix, and is an orthogonal projection (onto an accepting subspace). A single step of is a composition of the transition map followed by the output map . If the current state of is given by the density matrix and the current input is , then transforms to the intermediate state . Further, applies the instrument to and generates an output with probability . The final resulting state is .

For input and output , the probability that on input will output is denoted , while the final resulting state of is denoted . The function computed by on input and output is given by

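$$ \mathrm{Acc}_M(\alpha, \beta) = \mathrm{Tr}(\Pi \rho_M(\alpha, \beta)). \tag{7} $$

This is the probability that the final state belongs to the accepting subspace. The function computed by $M$ on input $\alpha$ is given by

$$ \mathrm{Acc}_M(\alpha) = \mathrm{Tr}(\Pi \rho_M(\alpha)) \tag{8} $$

where

$$ \rho_M(\alpha) = \sum_{\beta} p_M(\beta; \alpha)\, \rho_M(\alpha, \beta). \tag{9} $$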
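###### 3.1 Fact.

Let $M$ be a quantum Moore machine. For input $\alpha = a_1 \ldots a_n$ and output $\beta = b_1 \ldots b_n$, we have

$$ \rho_M(\alpha, \beta) = \frac{\Omega_{b_n} \circ \Phi_{a_n} \circ \cdots \circ \Omega_{b_1} \circ \Phi_{a_1}(\rho_0)}{p_M(\beta; \alpha)}, \qquad p_M(\beta; \alpha) = \mathrm{Tr}\left(\Omega_{b_n} \circ \Phi_{a_n} \circ \cdots \circ \Omega_{b_1} \circ \Phi_{a_1}(\rho_0)\right). \tag{10} $$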
###### Proof.

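We prove this by induction on the length $n$. For $n = 1$, the state after the conditional channel is $\tilde{\rho}_1 = \Phi_{a_1}(\rho_0)$. By the properties of the quantum instrument, the probability that $M$ outputs $b_1$ is given by

$$ p_M(b_1; a_1) = \mathrm{Tr}(\Omega_{b_1}(\tilde{\rho}_1)) = \mathrm{Tr}(\Omega_{b_1} \circ \Phi_{a_1}(\rho_0)) \tag{11} $$

and the post-measurement state of $M$ is

$$ \rho_M(a_1, b_1) = \frac{\Omega_{b_1} \circ \Phi_{a_1}(\rho_0)}{p_M(b_1; a_1)}. \tag{12} $$

This proves the base case.

Assume the claim holds for $n$. Suppose the input is $a_1 \ldots a_{n+1}$ and the output is $b_1 \ldots b_{n+1}$. Let $\rho_n = \rho_M(a_1 \ldots a_n, b_1 \ldots b_n)$ and $\tilde{\rho}_{n+1} = \Phi_{a_{n+1}}(\rho_n)$. Then, the probability that $M$ outputs $b_{n+1}$ on input $a_{n+1}$, given that it has read input $a_1 \ldots a_n$ and emitted output $b_1 \ldots b_n$, is

$$ \mathrm{Tr}(\Omega_{b_{n+1}}(\tilde{\rho}_{n+1})) = \mathrm{Tr}(\Omega_{b_{n+1}} \circ \Phi_{a_{n+1}}(\rho_n)) = \frac{\mathrm{Tr}\left(\Omega_{b_{n+1}} \circ \Phi_{a_{n+1}} \circ \cdots \circ \Omega_{b_1} \circ \Phi_{a_1}(\rho_0)\right)}{p_M(b_1 \ldots b_n; a_1 \ldots a_n)} \tag{13} $$

where the last step follows from the inductive hypothesis. Therefore, multiplying this conditional probability by $p_M(b_1 \ldots b_n; a_1 \ldots a_n)$,

$$ p_M(b_1 \ldots b_{n+1}; a_1 \ldots a_{n+1}) = \mathrm{Tr}\left(\Omega_{b_{n+1}} \circ \Phi_{a_{n+1}} \circ \cdots \circ \Omega_{b_1} \circ \Phi_{a_1}(\rho_0)\right). \tag{14} $$

The resulting quantum state is

$$ \rho_{n+1} = \frac{\Omega_{b_{n+1}} \circ \Phi_{a_{n+1}}(\rho_n)}{\mathrm{Tr}(\Omega_{b_{n+1}} \circ \Phi_{a_{n+1}}(\rho_n))} = \frac{\Omega_{b_{n+1}} \circ \Phi_{a_{n+1}} \circ \cdots \circ \Omega_{b_1} \circ \Phi_{a_1}(\rho_0)}{\mathrm{Tr}\left(\Omega_{b_{n+1}} \circ \Phi_{a_{n+1}} \circ \cdots \circ \Omega_{b_1} \circ \Phi_{a_1}(\rho_0)\right)}. \tag{15} $$

This shows the claim holds for $n + 1$. ∎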
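Fact 3.1 can be checked numerically on a small example. The sketch below is an illustration rather than the paper's construction: it assumes a single qubit, a conditional channel with two unitary channels (identity and Hadamard, labelled `"i"` and `"h"`), and an instrument that measures in the computational basis. All names (`PHI`, `OMEGA`, `run_moore`) are hypothetical, and maps are represented by Kraus operators.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def apply_cp(kraus, rho):
    """Apply the completely positive map rho -> sum_K K rho K^dagger."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for K in kraus:
        term = matmul(matmul(K, rho), dagger(K))
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

s = 1 / math.sqrt(2)
PHI = {"i": [[[1, 0], [0, 1]]],      # identity channel
       "h": [[[s, s], [s, -s]]]}     # Hadamard (unitary) channel
OMEGA = {0: [[[1, 0], [0, 0]]],      # outcome 0: project onto |0>
         1: [[[0, 0], [0, 1]]]}      # outcome 1: project onto |1>
RHO0 = [[1, 0], [0, 0]]              # initial state |0><0|

def run_moore(rho0, inputs, outputs):
    """Return (p(beta; alpha), rho(alpha, beta)) following equation (10)."""
    rho = rho0
    for a, b in zip(inputs, outputs):
        rho = apply_cp(OMEGA[b], apply_cp(PHI[a], rho))  # kept unnormalised
    p = sum(rho[i][i] for i in range(len(rho))).real     # trace
    if p < 1e-15:
        return 0.0, None                                 # output never occurs
    return p, [[x / p for x in row] for row in rho]

p, rho = run_moore(RHO0, ["h", "h"], [0, 1])
print(p, rho)
```

Keeping the state unnormalised until the end mirrors equation (10): the trace of the composed maps is the output probability, and dividing by it gives the final state.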

A quantum Mealy machine is a tuple where is a set of quantum instruments over a common alphabet , is the initial density matrix, and is an orthogonal projection (onto an accepting subspace). Each quantum instrument is a composition of a channel and a complete measurement over . For notational convenience, we will identify this channel as well (it becomes an instrument after the first register is measured; see Watrous [17]). More specifically, we associate with a collection of completely positive operators satisfying . Then, we define the channel

$$ \Lambda_a(\rho) = \sum_{b \in \Delta} |b\rangle\langle b| \otimes \Lambda_{a,b}(\rho). \tag{16} $$

Upon measuring the first register, the output is with probability and the post-measurement state is . Thus, if the current state of is and the current input is , applies to and generates output according to the above process.

For input and output , the probability that on input will output is denoted , while the final resulting state of is denoted . The function computed by on input with output is given by

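$$ \mathrm{Acc}_M(\alpha, \beta) = \mathrm{Tr}(\Pi \rho_M(\alpha, \beta)). \tag{17} $$

This is the probability that the final state belongs to the accepting subspace. The function computed by $M$ on input $\alpha$ is given by

$$ \mathrm{Acc}_M(\alpha) = \mathrm{Tr}(\Pi \rho_M(\alpha)) \tag{18} $$

where

$$ \rho_M(\alpha) = \sum_{\beta} p_M(\beta; \alpha)\, \rho_M(\alpha, \beta). \tag{19} $$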
###### 3.2 Fact.

Let be a quantum Mealy machine, where so that for each , the collection forms an instrument. Then, for any input and output , we have

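$$ \rho_M(\alpha, \beta) = \frac{\Lambda_{a_n, b_n} \circ \cdots \circ \Lambda_{a_1, b_1}(\rho_0)}{p(\beta; \alpha)}, \qquad p(\beta; \alpha) = \mathrm{Tr}\left(\Lambda_{a_n, b_n} \circ \cdots \circ \Lambda_{a_1, b_1}(\rho_0)\right). \tag{20} $$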
###### 3.3 Definition.

Let be a quantum Moore machine and be a quantum Mealy machine sharing the same input alphabet and output alphabet . We say and are equivalent if for all inputs and outputs , we have and .

In what follows, we show that the two models of quantum transducers are equivalent. The idea behind this is the decoupling of the set of quantum instruments in a quantum Mealy machine into a composition of a conditional channel and a shared quantum instrument in a quantum Moore machine (see Figure 2).

###### 3.4 Proposition.

For any quantum Moore machine, there is an equivalent quantum Mealy machine.

###### Proof.

Let be a quantum Moore machine. Consider a quantum Mealy machine where the collection of instruments are defined as

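$$ \Lambda_{a,b} = \Omega_b \circ \Phi_a. \tag{21} $$

Since both $\Omega_b$ and $\Phi_a$ are completely positive, so is $\Lambda_{a,b}$. Note that for each $a$, we have

$$ \sum_{b \in \Delta} \Lambda_{a,b} = \Big(\sum_{b \in \Delta} \Omega_b\Big) \circ \Phi_a \tag{22} $$

is a channel since channels are closed under composition. Therefore, the following map is also a channel:

$$ \Lambda_a(\rho) = \sum_{b \in \Delta} |b\rangle\langle b| \otimes \Lambda_{a,b}(\rho) = \sum_{b \in \Delta} |b\rangle\langle b| \otimes \Omega_b \circ \Phi_a(\rho). \tag{23} $$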

This is the same state that is prepared by before the measurement.

By Facts 3.1 and 3.2, for all and , we have and (since the states agree). ∎
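The construction in Proposition 3.4 can be sketched in terms of Kraus operators: the Kraus operators of $\Omega_b \circ \Phi_a$ are the products of those of $\Omega_b$ and $\Phi_a$, and trace preservation of their sum over $b$ amounts to the completeness relation $\sum K^\dagger K = I$. The example below is an illustration under assumed data (a qubit with identity and Hadamard channels and a computational-basis instrument); the names `compose`, `completeness`, and `is_channel` are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def compose(omega_b, phi_a):
    """Kraus operators of Omega_b after Phi_a: every product K_omega K_phi."""
    return [matmul(Ko, Kp) for Ko in omega_b for Kp in phi_a]

def completeness(kraus_ops):
    """Return the sum of K^dagger K over the given Kraus operators."""
    n = len(kraus_ops[0][0])
    total = [[0j] * n for _ in range(n)]
    for K in kraus_ops:
        KK = matmul(dagger(K), K)
        total = [[total[i][j] + KK[i][j] for j in range(n)] for i in range(n)]
    return total

s = 1 / math.sqrt(2)
PHI = {"i": [[[1, 0], [0, 1]]], "h": [[[s, s], [s, -s]]]}   # Moore: conditional channel
OMEGA = {0: [[[1, 0], [0, 0]]], 1: [[[0, 0], [0, 1]]]}      # Moore: instrument

# Mealy instruments built as in equation (21).
LAMBDA = {(a, b): compose(OMEGA[b], PHI[a]) for a in PHI for b in OMEGA}

def is_channel(a, tol=1e-12):
    """Check that sum_b Lambda_{a,b} is trace preserving (sum K^dagger K = I)."""
    ops = [K for b in OMEGA for K in LAMBDA[(a, b)]]
    total = completeness(ops)
    n = len(total)
    return all(abs(total[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

print({a: is_channel(a) for a in PHI})
```

Each individual $\Lambda_{a,b}$ is only trace non-increasing; it is the sum over outputs $b$, for a fixed input $a$, that satisfies the completeness relation.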

###### 3.5 Proposition.

For any quantum Mealy machine, there is an equivalent quantum Moore machine.

###### Proof.

Let be a quantum Mealy machine where so that , for each .

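Consider a quantum Moore machine defined as follows. We let $\Phi_a$ be a channel given by

$$ \Phi_a(\rho) = \sum_{b \in \Delta} |b\rangle\langle b| \otimes \Lambda_{a,b}(\rho) \tag{24} $$

and let $\Omega_b$ be a linear operator given by

$$ \Omega_b = |b\rangle\langle b| \otimes I. \tag{25} $$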

Note that is completely positive and is the identity channel. Hence, forms a quantum instrument.

We see that

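$$ (\Omega_b \circ \Phi_a)(\rho) = |b\rangle\langle b| \otimes \Lambda_{a,b}(\rho) \tag{26} $$

and

$$ \sum_{b \in \Delta} (\Omega_b \circ \Phi_a)(\rho) = \sum_{b \in \Delta} |b\rangle\langle b| \otimes \Lambda_{a,b}(\rho). \tag{27} $$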

This is the same state that is prepared by before the measurement.

By Facts 3.1 and 3.2, for all and , we have and (since the states agree). ∎

By combining Propositions 3.4 and 3.5, we have the following.

###### 3.6 Theorem.

Quantum Mealy and Moore machines are equivalent.

Remark: The classical transducers are special cases of the quantum transducers. For example, a Moore machine can be defined as a tuple where is a set of Markov chains, is a set of probability distributions over , is the initial state and is a collection of accepting states. A Mealy machine is defined similarly except for one difference: the set is given by , that is, each probability distribution depends on both the current state and the action taken. These differ slightly from the models defined in [6] in that we retain the notion of accepting states (which is standard for finite automata).

Remark: Note that any Markov chain, hidden Markov model, or partially observable Markov decision process (and hence Markov decision process) can be simulated by a quantum Moore machine. Moreover, the decoupled structure of a quantum Moore machine allows us to either focus on the input side (conditional channel) or the output side (quantum instrument). Thus, it is easy to see that a quantum Moore machine can simulate any quantum Markov chain, any quantum automaton [4], and any hidden quantum Markov model [16, 8].

## 4 Quantum observable Markov decision processes

We consider the quantum observable Markov decision process defined by Barry et al. [1]. Let be a finite set of states, be a finite set of input symbols, and be a finite set of output symbols. Let be the complex Euclidean space associated with the set of states.

###### 4.1 Definition.

(Eisert et al. [5], Barry et al. [1])
A superoperator is given by a set of Kraus operators where . If is applied to a density matrix , then with probability , the resulting state is and the observation is emitted.

We note that a superoperator is a quantum instrument.

###### 4.2 Definition.

(Barry et al. [1])
A quantum observable Markov decision process (QOMDP) is a tuple where is a set of superoperators, is a set of operators, is the initial state, and is a discount factor. Moreover, we assume each superoperator is defined by the set of Kraus operators satisfying . The reward associated with taking action from state is given by .

The set of superoperators in Definition 4.2 functionally acts as a “conditional quantum instrument.” Using Theorem 3.6, we formalize this using a conditional channel and a quantum instrument (as defined in Watrous [17] and in Wilde [18]).

Remark: The reward operator must be Hermitian since we require the rewards be real for the purpose of optimization.

###### 4.3 Fact.

The reward is a bounded real-valued function.

###### Proof.

Suppose where is a set of states. So, is a Hermitian matrix, for all , and , where is the maximum eigenvalue (in absolute value) of . ∎
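The bound behind Fact 4.3 can be spelled out with a spectral decomposition. The following is a sketch under an assumption about notation: we suppose the reward for action $a$ in state $\rho$ has the form $\mathrm{Tr}(R_a \rho)$ with $R_a$ Hermitian, and we write $\lambda_i$ and $|v_i\rangle$ for the eigenvalues and orthonormal eigenvectors of $R_a$ (these symbols are ours, not taken from the text above).

```latex
\begin{aligned}
R_a &= \sum_i \lambda_i \, |v_i\rangle\langle v_i|, \qquad \lambda_i \in \mathbb{R}, \\
\mathrm{Tr}(R_a \rho) &= \sum_i \lambda_i \, \langle v_i | \rho | v_i \rangle,
  \qquad \langle v_i | \rho | v_i \rangle \ge 0, \quad
  \sum_i \langle v_i | \rho | v_i \rangle = \mathrm{Tr}(\rho) = 1, \\
|\mathrm{Tr}(R_a \rho)| &\le \sum_i |\lambda_i| \, \langle v_i | \rho | v_i \rangle
  \le \max_i |\lambda_i|.
\end{aligned}
```

In words, the reward is a convex combination of the real eigenvalues of $R_a$, so it is real and bounded by the largest eigenvalue in absolute value; taking the maximum over the finitely many actions gives a uniform bound.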

The next proposition shows we may assume without loss of generality that the reward function depends on the current state alone. We adopt this simplifying assumption from here on.

###### 4.4 Proposition.

Let be a quantum observable Markov decision process where the reward is defined as for all . There is an equivalent quantum observable Markov decision process where the reward is defined as for all .

###### Proof.

Suppose is the underlying finite set of states of . Let . Since , we have . Let , for some arbitrary .

Each quantum instrument of is assumed to act on product states and is defined as

$$ \Gamma_a(\rho \otimes \alpha) = \Lambda_a(\rho) \otimes |a\rangle\langle a|. \tag{28} $$

So, applies the quantum instrument of to its first component and applies the replacement channel to its second component.

We define . Then, . So, if is the state trajectory of with a sequence of actions , then the corresponding trajectory of is . Thus, . ∎

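Suppose $M$ is a quantum observable Markov decision process. Given a sequence of actions $\alpha = a_1 \ldots a_n$ and a sequence of observations $\beta = b_1 \ldots b_n$, we have

$$ \rho_M(\alpha, \beta) = \frac{\Lambda_{a_n, b_n} \cdots \Lambda_{a_1, b_1}\, \rho_0\, \Lambda^\dagger_{a_1, b_1} \cdots \Lambda^\dagger_{a_n, b_n}}{p_M(\beta; \alpha)}, \qquad p_M(\beta; \alpha) = \mathrm{Tr}\left(\Lambda_{a_n, b_n} \cdots \Lambda_{a_1, b_1}\, \rho_0\, \Lambda^\dagger_{a_1, b_1} \cdots \Lambda^\dagger_{a_n, b_n}\right) \tag{29} $$

where $\rho_M(\alpha, \beta)$ is the state of $M$ on input $\alpha$ and output $\beta$, and $p_M(\beta; \alpha)$ is the probability that $M$ outputs $\beta$ on input $\alpha$.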

###### 4.5 Definition.

(Barry et al. [1])
A goal quantum observable Markov decision process is a tuple where , , , , and are as defined for quantum observable Markov decision process, and is the goal state. The goal state is absorbing, that is, for all and , if , then .

###### 4.6 Fact.

For every goal quantum observable Markov decision process, there are equivalent quantum Mealy and quantum Moore machines.

###### Proof.

(Sketch) We let the Mealy projector be a projection onto the one-dimensional subspace spanned by the unique goal state of the goal quantum observable Markov decision process. The equivalent quantum Moore machine is obtained from the quantum Mealy machine using Theorem 3.6. ∎

Similar to Definition 4.5, we may also define a goal partially observable Markov decision process as where is an absorbing state for each Markov chain , . Likewise, we can show that is equivalent to a Moore machine. Note that the structure of a goal partially observable Markov decision process (used in, for example, Madani et al. [7]) is closer to the structure of a Moore machine than to a Mealy machine (see Hopcroft and Ullman [6], pages 42-43). This is due to the decoupled nature of the set of Markov chains and the stochastic output function in a goal partially observable Markov decision process.

## 5 Complexity

We explore some computational problems related to partially observable and quantum observable Markov decision processes. These include reachability (or planning) and occurrence problems.

Reachability

Instance: A Moore machine , where is an absorbing goal state, and a threshold .

Question: Is there so that ?

The above problem is called the plan-existence or probabilistic planning problem in [7].

###### 5.1 Theorem.

(Madani, Hanks, and Condon [7], Theorem 3.7)
Reachability is undecidable.

In contrast, the reachability problem with is decidable (a proof was given in [1]).

Quantum Reachability

Instance: A quantum Moore machine and a threshold .

Question: Is there so that ?

###### 5.2 Theorem.

Quantum Reachability is undecidable.

###### Proof.

This follows from Theorem 5.1 since a quantum Moore machine can simulate a Moore machine. ∎

An alternative proof of Theorem 5.2 was also given by Wolf et al. [19].

###### 5.3 Theorem.

(Barry, Barry, and Aaronson [1])
Quantum Reachability with is undecidable.

Blondel et al. [4] proved that the quantum reachability problem where the conditional channel consists of unitary channels with a trivial quantum instrument is decidable if and only if the threshold condition is a strict inequality.

In the occurrence problem, the goal is to determine if, in a partially observable Markov decision process, the probability of observing a sequence of outputs is less than or equal to a given threshold. Informally, this asks if the output sequence is a rare anomaly.

Non-Occurrence

Instance: A Moore machine and .

Question: Are there and so that ?

Although the following result is possibly folklore, for completeness, we provide a simple proof.

###### 5.4 Proposition.

Non-Occurrence is undecidable.

###### Proof.

We reduce goal-state reachability to non-occurrence. Let be a Moore machine and let be a threshold. Recall that is the probability of the transition to under action and is the probability of output from state .

Let be a Moore machine where , where , and , where . For each , we let

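$$ \tilde{P}_a(j, i) = \begin{cases} P_a(j, i) & \text{if } i, j \in S \setminus \{s_g\} \\ 1 & \text{if } i \in \{s_g, \hat{s}\} \text{ and } j = \hat{s} \\ 0 & \text{otherwise.} \end{cases} \tag{30} $$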

Let if and , if and , and otherwise. We may set and arbitrarily.

We observe that reaches its goal state with probability at least if and only if outputs a string without with probability at most . ∎

Eisert et al. [5] showed that the classical Non-Occurrence problem with is decidable.

Quantum Non-Occurrence

Instance: A quantum Moore machine , and .

Question: Are there and so that ?

###### 5.5 Corollary.

Quantum Non-Occurrence is undecidable.

###### Proof.

This follows from the undecidability of the classical problem (Proposition 5.4). ∎

For the special case when , it is known that the classical problem is decidable. But, in the quantum setting, we have the following result.

###### 5.6 Theorem.

(Eisert, Müller, and Gogolin [5])
Quantum Non-Occurrence with trivial conditional channel and is undecidable.

## 6 Policy Existence

In a Markov decision process with finite action space, Blackwell [3] proved that there always exists an optimal policy that is deterministic and stationary. We describe Blackwell’s theorem along with relevant background.

###### 6.1 Definition.

Let be a topological space. A collection of subsets of is called a $\sigma$-algebra of if it contains and it is closed under complementation and countable unions. The smallest $\sigma$-algebra of which contains all open subsets of is called the Borel $\sigma$-algebra of . The elements of this Borel $\sigma$-algebra are called the Borel subsets of . A Borel set is a Borel subset of a complete separable metric space. The set of all bounded real-valued functions over a Borel set is denoted .

Let be a Markov decision process where is a Borel set, is a finite set of actions, is a conditional probability distribution over given , is a reward function, is the initial state, and is a discount factor.

For a positive integer , let be the history at time . A policy is a sequence where is a conditional distribution over given . A policy is Markov if each is a deterministic map. A policy is stationary if for all , , for some map . The value of a policy is given by

$$ V_\pi(p_0) = E\left[\sum_{i=0}^{\infty} \gamma^i R(X_i)\right]. \tag{31} $$

The expectation is taken over a probability space which contains all infinite trajectories (see Appendix A). Here, , , is the sequence of states induced by . Note . A policy is optimal if for all policies . In this case, we denote the optimal value as .

###### 6.2 Theorem.

(Blackwell [3], Theorem 5)
Let be a Markov decision process (as defined above). Suppose the operator is defined as where

$$ T_a(V)(p) = R(p) + \gamma\, E[V(P(p, a))]. \tag{32} $$

Then, is -Lipschitz, that is, , so that (by the Banach fixed-point theorem) has a unique fixed point and , for all and any .

###### 6.3 Theorem.

(Blackwell [3])
Let be a Markov decision process (as defined above). Then:

1. is optimal if and only if (see Theorem 6(f)).

2. There is an optimal stationary policy for (see Theorem 7(b)).

Theorems 6.2 and 6.3 show that is optimal if its value function is a fixed point of , that is, .
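The fixed-point characterization can be illustrated on a finite-state Markov decision process, which is a special case of the Borel setting. The sketch below is an illustration, not Blackwell's construction: it assumes the operator takes the maximum over actions of the maps in equation (32), uses a hypothetical two-state example, and stores transitions column-stochastically as `P[a][i][j]`, the probability of moving to state `i` from state `j`.

```python
def bellman(V, P, R, gamma):
    """One application of T: (TV)(j) = R(j) + gamma * max_a sum_i P[a][i][j] V(i)."""
    n = len(V)
    return [R[j] + gamma * max(sum(P[a][i][j] * V[i] for i in range(n))
                               for a in range(len(P)))
            for j in range(n)]

def sup_dist(V, W):
    """Supremum-norm distance between two value functions."""
    return max(abs(v - w) for v, w in zip(V, W))

def value_iteration(P, R, gamma, tol=1e-10):
    """Iterate T from V = 0 until successive iterates differ by less than tol."""
    V = [0.0] * len(R)
    while True:
        V_next = bellman(V, P, R, gamma)
        if sup_dist(V, V_next) < tol:
            return V_next
        V = V_next

# Hypothetical example: action 0 stays, action 1 swaps; only state 1 pays reward.
P = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
R = [0.0, 1.0]
GAMMA = 0.5

V_STAR = value_iteration(P, R, GAMMA)
print(V_STAR)

# Contraction check on two arbitrary value functions.
V, W = [3.0, -1.0], [0.0, 5.0]
print(sup_dist(bellman(V, P, R, GAMMA), bellman(W, P, R, GAMMA)),
      GAMMA * sup_dist(V, W))
```

In this example the optimal stationary policy is to swap in state 0 and stay in state 1, which gives values close to 1 and 2 respectively. The loop terminates because each application of the operator shrinks the distance between iterates by at least the discount factor.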

In what follows, we apply Blackwell’s theorems to a quantum observable Markov decision process. Given that a quantum observable Markov decision process is a Markov decision process over the set of density matrices, it suffices to show that the latter forms a Borel space.

###### 6.4 Proposition.

Let be a complex Euclidean space over a finite set . Then, the set of all density matrices over is a Borel set.

###### Proof.

Let . Then, each element of is a unit trace, positive semidefinite matrix from . The latter is a complete separable metric space (with a norm induced by the inner product ). To show is Borel, it suffices to show it is closed. Let be a convergent sequence in where as . We need to show . Since trace is linear, it is continuous; thus, . To show , consider the map for some fixed but arbitrary . Since is linear, it is continuous. Thus, . This shows that . Since is arbitrary, this shows . Thus, . ∎

###### 6.5 Theorem.

Let be a quantum observable Markov decision process. Then, there is an optimal stationary policy for .

###### Proof.

Note this follows from Theorem 6.3 since is a Borel set (by Proposition 6.4), is finite, induces a conditional probability distribution over given , and is a bounded real-valued function. ∎

Under certain assumptions, the optimal value function has a compact representation as a piecewise linear and convex function over the probability simplex. This was originally observed by Sondik [12]. In the next theorem, we generalize this observation to the quantum setting.

Let