I. Introduction
The detection of and response to stimuli is in general a multistage process that results in the hierarchical activation and interaction of several different regions in the brain. To understand the dynamics of brain functioning it is therefore essential to investigate the information flow and connectivity (interactions) between different regions in the brain. Further, at each of these hierarchical stages, sensory and motor information in the brain is represented and manipulated in the form of neural activity patterns. The superposition of this electrophysiological activity can be recorded via electrodes on the scalp and is termed electroencephalography (EEG).
Functional connectivity refers to the statistical dependencies between neural data recorded from spatially distinct regions in the brain [3, 4]. Information theory provides a stochastic framework that is fundamentally well suited to the task of assessing functional connectivity [5, 6] between neural responses. For example, [7, 8] used mutual information (MI) [9] estimates to assess the correlation between the spike timings of an ensemble of neurons. Likewise, [10, 11, 12] investigated the effectiveness of pairwise maximum entropy models in describing the activity of larger populations of neurons. MI has also been successfully employed in the past for determining the functional connectivity between EEG sensors for feature extraction and classification purposes [13, 14]. Similarly, other studies have used MI to analyze EEG data and investigate corticocortical information transmission under pathological conditions such as Alzheimer's disease [15] and schizophrenia [16], or during odor stimulation [17, 18]. One limitation of MI and entropy when applied in the traditional (Shannon) sense is their inability to distinguish the direction of information flow, as pointed out by Marko in [19]. In the same work, Marko also proposed calculating the information flow in each direction of a bidirectional channel using conditional probabilities based on Markovian dependencies. In
[20], Massey extended the initial work by Marko and formally defined directed information as the information flow from the input to the output of a channel with feedback. Other measures have similarly been defined for calculating the directional information transfer rate between random processes, most notably Kamitake's directed information [21] and Schreiber's transfer entropy [22]. Further, feedback and directionality are also closely related to the notion of causality in information measures [23, 24, 20] (causality and directionality are formally defined in Def. 1 and Def. 2, resp., in Sec. III). Massey's directed information and transfer entropy are in general referred to as causal, since they measure statistical dependencies between the past and current values of a process. We adopt this definition in this paper; causality here therefore takes on the usual meaning of a cause occurring prior to its effect, or a stimulus occurring before a response, i.e., how the past states of a system influence its present and future states [25].

Our interest here is in using EEG for assessing human perception of time-varying audio quality. We are inspired by our recent results in [26], which use MI to quantify the information flow over the end-to-end perceptual processing chain from audio stimulus to EEG output. One characteristic common to subjective audio testing protocols, including the current state-of-the-art approach, MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) [27], is that they require human participants to assign a single quality-rating score to each test sequence. Such conventional testing suffers from subject-based biases towards cultural factors in the local testing environment and can be highly variable. Neurophysiological measurements such as EEG, in contrast, directly capture and analyze the brainwave response patterns that depend only on the perceived variation in signal quality [28, 29].
As a result, EEG is inherently well suited to assess the human perception of audio [30, 31] and visual [31, 32, 33] quality. For example, in [31, 32] the authors used linear discriminant analysis classifiers on features extracted from EEG to detect the perception of noise in audio signals and to assess changes in perceptual video quality, respectively. Similarly, [30] identified features in EEG brainwave responses corresponding to time-varying audio quality using a time-space-frequency analysis, while [33] employed a wavelet-based approach for an EEG classification of commonly occurring artifacts in compressed video, using single-trial EEG.

To the best of our knowledge, however, the work presented here is the first time that functional connectivity has been applied in conjunction with EEG measurements for the purpose of assessing audio quality perception. By using causal information measures to detect a change in functional connectivity we directly identify those cortical regions which are most actively involved in perceiving a change in audio quality. Further, we establish the analytical relationships between the different presented information measures and compare how well each of them is able to distinguish between the perceived audio qualities. Towards this end, we consider two distinct scenarios for estimating the connectivity between EEG sensors by appropriately grouping them into regions of interest (ROIs) over the cortex. In the first scenario, we employ Massey's and Kamitake's directed information and transfer entropy, respectively, to calculate the pairwise directional information flow between ROIs while using causal conditioning to account for the influence of all other regions. In the second scenario we propose a novel information measure which can be considered a causal bidirectional modification of directed information applied to a generalized cortical network setting. In particular, we show that the proposed causal bidirectional information (CBI) measure assesses the direct connectivity between any two given nodes of a multiterminal cortical network by inherently calculating the divergence of the induced conditional distributions from those associated with a multiple access channel (MAC) with feedback.
Each presented measure is validated by applying it to analyze real EEG data recorded from human subjects as they listen to audio sequences whose quality changes over time. For the sake of simplicity and analytical tractability we restrict ourselves to only two levels of audio quality (high quality and degraded quality). We determine and compare the instantaneous information transfer rates as inferred by each of these measures for the case where the subject listens to high quality audio as opposed to the case where the subject listens to degraded quality audio. Finally, note that we are not able to make any assumptions about the actual structure of the underlying cortical channels (e.g., linear vs. nonlinear), as our analysis is solely based on the observed empirical distributions of the data at the input and output of these channels.
The rest of the paper is organized as follows. Section II provides an overview of EEG, the experiment, and the stimulus audio sequences. In Section III we review some directional information measures widely used in the literature for estimating connectivity. We assess the information flow between cortical regions using directional information measures in Section IV, along with determining the analytical relationship between these measures. In Section V we introduce CBI and discuss its properties. The results of our analysis on EEG data are presented in Section VI. We finally conclude with a summary of our study and future directions in Section VII.
II. Background
In the conducted study, the EEG response activity of human test subjects is recorded as they listen to a variety of audio test sequences. The quality of these stimulus test sequences is varied over time between different “quality levels”. All audio test sequences were created from three fundamentally different base sequences sampled at a reference base quality of 44.1 kHz, with a precision of 16 bits per sample. Here, we employ the same test sequences and distortion quality levels as in [34]. Two different types of distortion are considered for our analysis, scalar quantization and frequency band truncation, with the specific parameters listed in Table I. The test sequence for a specific trial is created by selecting one of the two distortion types and then applying it to the original base sequence in a time-varying pattern of nonoverlapping five-second blocks as shown in Fig. 1. Multiple such trials are conducted for each subject by choosing all possible combinations of sequences, distortion types, and time-varying patterns. Note that although the subjects were presented with all quality levels in our listening tests, here we focus, by way of example, only on the “high” base-quality and the “Q3 degraded” quality audio. This addresses the worst-case quality change and keeps the problem analytically and numerically tractable. A detailed exposition of the experimental setup, test sequences, and distortion quality levels is provided in [26].
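As an illustration of how such a time-varying stimulus can be constructed, the following sketch applies scalar quantization (retaining a given number of significant bits) to alternating five-second blocks of a normalized signal. The function name, the block pattern, and the rounding-based quantizer are illustrative assumptions, not the authors' exact processing pipeline.

```python
import numpy as np

def apply_quality_pattern(audio, fs, pattern, bits_kept, block_s=5.0):
    """Degrade selected 5 s blocks of a [-1, 1] signal by scalar quantization.

    pattern[i] == 1 keeps block i at base quality, 0 quantizes it so that
    only `bits_kept` significant bits are retained (illustrative quantizer).
    """
    out = np.asarray(audio, dtype=float).copy()
    n = int(block_s * fs)
    step = 2.0 ** (1 - bits_kept)  # quantizer step size on [-1, 1]
    for i, keep in enumerate(pattern):
        if not keep:
            blk = out[i * n:(i + 1) * n]
            out[i * n:(i + 1) * n] = np.round(blk / step) * step
    return out

# Example: a 20 s ramp at 8 kHz with the pattern high-low-high-low.
fs = 8000
audio = np.linspace(-1.0, 1.0, 20 * fs)
degraded = apply_quality_pattern(audio, fs, [1, 0, 1, 0], bits_kept=2)
```

With `bits_kept = 2` the degraded blocks collapse onto a handful of amplitude levels, while the blocks flagged as high quality pass through unchanged.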
Quality Level   Freq. Truncation        Scalar Quantization
                (Low Pass Filter)       (No. of Significant Bits Retained)
Q1              4.4 kHz                 4
Q2              2.2 kHz                 3
Q3              1.1 kHz                 2
The EEG data is captured on a total of 128 spatial channels using a BioSemi ActiveTwo system with a sampling rate of 256 Hz. To better manage the large amount of collected data while effectively covering the activity over different regions of the cortex, we group the 128 electrodes into specific regions of interest (ROIs) as shown in Fig. 2. While a large number of potential grouping schemes is possible, this scheme is favored for our purposes as it efficiently covers all the cortical regions (lobes) of the brain with a relatively low number of ROIs. Also, the number of electrodes in any given ROI only varies between a minimum of 9 and a maximum of 12. For example, in our region partitioning scheme ROI 2 (9 electrodes) covers the prefrontal cortex, ROI 6 (10 electrodes) the parietal lobe, ROI 8 (9 electrodes) the occipital lobe, and ROI 5 and ROI 7 (12 electrodes each) cover the left and right temporal lobes, respectively. In essence, our goal is then to investigate the causal connectivity between the different ROIs in response to the different audio quality levels.
III. Directionality, Causality, and Feedback in Information Measures
Let X^N = (X_1, X_2, ..., X_N) denote a random vector of N constituent discrete-valued random variables X_n. Also, let x^N = (x_1, x_2, ..., x_N) be the corresponding realizations drawn from the joint probability distribution denoted by p(x^N). Similarly, Y^N represents a length-N random vector. Denoting the expected value of a random variable by E[·], the entropy of the tuple X^N can be written as H(X^N) = -E[log p(X^N)]. The mutual information (MI) between two length-N interacting random processes X^N and Y^N is defined as

(1)   I(X^N; Y^N) = H(Y^N) - H(Y^N | X^N)
(2)               = H(X^N) + H(Y^N) - H(X^N, Y^N),

with the conditional entropy H(Y^N | X^N) and

(3)   H(Y^N | X^N) = -E[log p(Y^N | X^N)].

The MI measures the reduction in the uncertainty of Y^N due to the knowledge of X^N, and is zero if and only if the two processes are statistically independent. The MI between the two random processes is symmetric, i.e., I(X^N; Y^N) = I(Y^N; X^N), and can therefore not distinguish the direction of the information flow. Alternatively, a directional information measure introduces the notion of direction in the exchange of information between sources. In this paper we define a directional information measure as follows.
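To make the definition concrete, the following sketch estimates MI (in bits) from the empirical joint distribution of two paired discrete sample sequences; the function name is ours and the simple plug-in estimator is a deliberate illustrative choice.

```python
import numpy as np

def empirical_mi(x, y):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    # Empirical joint distribution over the observed symbol pairs.
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1.0 / n)
    px = joint.sum(axis=1, keepdims=True)   # marginal of X
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 10_000)
b = rng.integers(0, 2, 10_000)
# Identical sequences give MI close to H(X) = 1 bit for a fair binary
# source; independent sequences give MI close to zero.
```

Note that the estimate is symmetric, `empirical_mi(a, b) == empirical_mi(b, a)`, mirroring the symmetry of MI discussed above.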
Definition 1.
A directional information measure from X^N to Y^N, represented with an arrow as in I(X^N → Y^N), quantifies the information exchange rate in the direction from the input process X^N towards the output process Y^N.
As implied by the definition, the information flow measured by a directional information measure is not symmetric, and in general the flow from X^N to Y^N is not equal to the flow from Y^N to X^N. In the following, we will examine three different directional information measures presented in the literature.
III-A Massey’s directed information
Directed information as proposed by Massey in [20] is an extension of the preliminary work by Marko [19] to characterize the information flow on a communication channel with feedback. Given input X^N and output Y^N, the channel is said to be used without feedback if

(4)   p(x_n | x^{n-1}, y^{n-1}) = p(x_n | x^{n-1}),
i.e., the current channel input value does not depend on the past output samples. If the channel is used with feedback, then [20] shows that the chain rule of conditional probability can be reduced to the factorization

(5)   p(x^N, y^N) = ∏_{n=1}^{N} p(x_n | x^{n-1}, y^{n-1}) p(y_n | y^{n-1}, x^n).

In closely related work, [35] introduced the concept of causal conditioning based on (5). The entropy of Y^N causally conditioned on X^N is defined as

(6)   H(Y^N || X^N) = ∑_{n=1}^{N} H(Y_n | Y^{n-1}, X^n).
Definition 2.
A measure between two random processes X^N and Y^N is said to be causal if the information transfer rate at time n relies only on the dependencies between their past and current sample values x^n and y^n, and is not a function of the future sample values x_{n+1}^N and y_{n+1}^N.
Therefore, the notion of causality as used in this work is based on inferring the statistical dependencies of the past states of a system on its present and future states [23, 24, 20, 25]. This is in contrast to the stronger interventional interpretation of causal inference such as in [36], which draws conclusions about causation, e.g., that process X causes process Y.
Massey's directed information is a causal measure between the sequences X^N and Y^N, defined as

(7)   I(X^N → Y^N) = ∑_{n=1}^{N} I(X^n; Y_n | Y^{n-1})
(8)                = H(Y^N) - H(Y^N || X^N).
Equivalently, the directed information can also be written in terms of the Kullback-Leibler (KL) divergence as

(9)   I(X^N → Y^N) = E[ log ( p(Y^N || X^N) / p(Y^N) ) ]
(10)               = D( p(x^N, y^N) ∥ p(x^N || y^{N-1}) p(y^N) ),

where p(y^N || x^N) = ∏_{n=1}^{N} p(y_n | y^{n-1}, x^n) and p(x^N || y^{N-1}) = ∏_{n=1}^{N} p(x_n | x^{n-1}, y^{n-1}).
Also, in general,

(11)   I(X^N → Y^N) ≤ I(X^N; Y^N),

with equality if the channel is used without feedback. Directed information therefore not only gives a meaningful notion of the direction of information flow, but also provides a tighter characterization than MI of the total information flow over a channel with feedback.
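The inequality in (11) can be checked numerically on a toy channel. The sketch below evaluates MI and Massey's directed information by brute-force enumeration of a joint pmf over binary length-2 sequences; all function names are ours, and the example channel (a noisy first use followed by a feedback-driven second input) is an illustrative construction, not taken from the paper.

```python
from collections import defaultdict
import itertools
import math

def cond_entropy(pmf, target, cond):
    """H(target | cond) in bits for pmf = {outcome: prob}."""
    joint, marg = defaultdict(float), defaultdict(float)
    for w, p in pmf.items():
        joint[(target(w), cond(w))] += p
        marg[cond(w)] += p
    return -sum(p * math.log2(p / marg[c]) for (t, c), p in joint.items() if p > 0)

def mutual_info(pmf):
    """I(X^N; Y^N) = H(Y^N) - H(Y^N | X^N); outcomes are (x_tuple, y_tuple)."""
    return cond_entropy(pmf, lambda w: w[1], lambda w: ()) \
         - cond_entropy(pmf, lambda w: w[1], lambda w: w[0])

def directed_info(pmf, N):
    """Massey's I(X^N -> Y^N) = sum_n [H(Y_n|Y^{n-1}) - H(Y_n|Y^{n-1},X^n)]."""
    di = 0.0
    for n in range(1, N + 1):
        di += cond_entropy(pmf, lambda w, n=n: w[1][n - 1],
                           lambda w, n=n: w[1][:n - 1]) \
            - cond_entropy(pmf, lambda w, n=n: w[1][n - 1],
                           lambda w, n=n: (w[1][:n - 1], w[0][:n]))
    return di

# Feedback example: X1 fair, Y1 = X1 xor noise (p = 0.25), X2 = Y1, Y2 = X2.
pmf = defaultdict(float)
for x1, z in itertools.product((0, 1), repeat=2):
    p = 0.5 * (0.25 if z else 0.75)
    y1 = x1 ^ z
    pmf[((x1, y1), (y1, y1))] += p

print(directed_info(pmf, 2))  # ~0.189 bits
print(mutual_info(pmf))       # 1.0 bit: DI < MI under feedback
```

Here the second input is pure feedback of the first output, so part of the MI reflects the feedback loop rather than forward information flow, and the directed information is strictly smaller.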
III-B Transfer entropy
In [22], Schreiber introduced a causal measure for the directed exchange of information between two random processes called transfer entropy. Similar to the pioneering work in [19], transfer entropy considers a bidirectional communication channel between X^N and Y^N, and measures the deviation of the observed distribution over this channel from the Markov assumption

(12)   p(y_n | y^{n-1}, x^{n-1}) = p(y_n | y^{n-1}).
In particular, transfer entropy quantifies the deviation of the l.h.s. of (12) from the r.h.s. of (12) using the KL divergence and is defined as

(13)   T_{X→Y}(n) = D( p(y_n | y^{n-1}, x^{n-1}) ∥ p(y_n | y^{n-1}) )
(14)              = ∑_{x^{n-1}, y^n} p(x^{n-1}, y^n) log ( p(y_n | y^{n-1}, x^{n-1}) / p(y_n | y^{n-1}) )
(15)              = I(X^{n-1}; Y_n | Y^{n-1})
(16)              = H(Y_n | Y^{n-1}) - H(Y_n | Y^{n-1}, X^{n-1}).
Correspondingly, for random processes of block length N we can then define a sum transfer entropy [37, 38], which in effect calculates and adds the transfer entropy at every history depth n,

(17)   T^Σ_{X→Y} = ∑_{n=1}^{N} I(X^{n-1}; Y_n | Y^{n-1}).
Another widely used and closely related measure of causal influence was developed by Granger [39]. Granger causality is a directional measure of statistical dependency based on prediction via vector autoregression. The relationship between Granger causality and directional information measures has been analyzed previously in [40, 41]. For the specific case of Gaussian random variables, as is the case in our EEG scenario, transfer entropy has been shown to be equivalent to Granger causality.
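The Gaussian equivalence can be illustrated numerically: for jointly Gaussian processes, the transfer entropy at history depth 1 reduces to a log-ratio of conditional variances, which is exactly half of the Granger causality log-variance ratio. The simulated AR coupling below and all function names are our own illustrative choices.

```python
import numpy as np

def cond_var(cov, i, cond_idx):
    """Var(Z_i | Z_cond) for a zero-mean Gaussian with covariance cov."""
    s_ab = cov[np.ix_([i], cond_idx)]
    s_bb = cov[np.ix_(cond_idx, cond_idx)]
    return cov[i, i] - (s_ab @ np.linalg.solve(s_bb, s_ab.T))[0, 0]

# Simulate a coupled pair: x drives y with a one-sample delay.
rng = np.random.default_rng(1)
T = 200_000
x = rng.standard_normal(T)
y = np.zeros(T)
for n in range(1, T):
    y[n] = 0.5 * y[n - 1] + 0.8 * x[n - 1] + rng.standard_normal()

# Covariance of (y_n, y_{n-1}, x_{n-1}); np.cov treats rows as variables.
cov = np.cov(np.stack([y[1:], y[:-1], x[:-1]]))

# Transfer entropy x -> y at history depth 1 (in nats) and Granger ratio.
te_xy = 0.5 * np.log(cond_var(cov, 0, [1]) / cond_var(cov, 0, [1, 2]))
gc_xy = np.log(cond_var(cov, 0, [1]) / cond_var(cov, 0, [1, 2]))  # = 2 * te_xy

# Reverse direction: y does not help predict the i.i.d. process x.
cov_r = np.cov(np.stack([x[1:], x[:-1], y[:-1]]))
te_yx = 0.5 * np.log(cond_var(cov_r, 0, [1]) / cond_var(cov_r, 0, [1, 2]))
```

For this system the innovation variance is 1 and Var(y_n | y_{n-1}) = 1.64 analytically, so `te_xy` ≈ 0.5·ln(1.64) ≈ 0.25 nats, while `te_yx` is close to zero: the measure correctly recovers the direction of the coupling.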
III-C Kamitake’s directed information
Another variant of directed information was defined by Kamitake in [21] and is given as

(18)   I_K(X^N → Y^N) = ∑_{n=1}^{N-1} I(X_n; Y_{n+1}^N | Y^n)
(19)                  = ∑_{n=1}^{N-1} H(Y_{n+1}^N | Y^n) - H(Y_{n+1}^N | Y^n, X_n).
We notice that this measure differs from Massey's directed information in that it measures the influence of the current sample X_n at time n on the future samples Y_{n+1}^N of the output process. Kamitake's information measure is therefore directional, but not causal.
IV. Assessing Cortical Information Flow via Directional Measures
IV-A Causal conditioning and indirect influences in multiterminal networks
A multiterminal network characterizes the information flow between multiple communicating nodes with several senders and receivers. Consider three communicating nodes and denote the corresponding random processes associated with them as X^N, Y^N, and Z^N, respectively. In our case, the information transfer over the cortex can be considered equivalent to a cortical multiterminal network, with each ROI taking over the role of a communicating node. In the context of the cortical network, the quantities X^N, Y^N, and Z^N then correspond to the sampled EEG signals from different ROIs. Also, without any loss of generality, Z^N may represent the output of multiple (and potentially all other) ROIs. Our goal here is to identify the connectivity between the processes in the cortical network. In particular, there are two distinct instances of connectivity that can arise as a result of using directional information measures.
Definition 3.
A direct connectivity is said to exist from one node to another if there exists a nonzero information flow via a direct path between the nodes.
Definition 4.
An implied connectivity arises when there is no direct path between two nodes, but there is a nonzero information flow between them because of an influence through other nodes in the network.
Therefore, a positive directed information between the random processes associated with any two nodes in a multinode network alone does not necessarily equate to a direct connectivity between them [41, 42, 43]. In Fig. 3 we show an example network topology to illustrate how implied connectivity can lead to false inferences. In the shown relay channel there is no direct information transfer between X^N and Y^N; instead, the information flows from X^N to Z^N to Y^N, i.e., there is a Markovian influence X → Z → Y. This results in a positive value of Massey's directed information, I(X^N → Y^N) > 0, thereby leading to an implied connectivity between X^N and Y^N.
Notice, however, that in the presented example the knowledge of Z^N leads to statistical conditional independence between X^N and Y^N. We expand upon this idea and extend the expression for Massey's directed information to account for the influence of the additional random processes in the network via causal conditioning [35].
Definition 5.
Causally conditioned Massey's directed information is defined as the information flowing from X^N to Y^N causally conditioned on the sequence Z^N as

(20)   I(X^N → Y^N || Z^N) = ∑_{n=1}^{N} I(X^n; Y_n | Y^{n-1}, Z^n)
(21)                       = H(Y^N || Z^N) - H(Y^N || X^N, Z^N)
(22)                       = ∑_{n=1}^{N} H(Y_n | Y^{n-1}, Z^n) - H(Y_n | Y^{n-1}, X^n, Z^n).
Proposition 1 ([41, 42]).
Assume a network as shown in Fig. 3 with three nodes and corresponding random processes X^N, Y^N, and Z^N, respectively. Using causally conditioned Massey's directed information eliminates implied connectivity, with I(X^N → Y^N || Z^N) = 0 if the nodes associated with X^N and Y^N are not directly connected.
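Proposition 1 can be verified numerically on a toy relay chain. The sketch below evaluates Massey's directed information with and without causal conditioning by brute-force enumeration over a small binary pmf; the helper names are ours, and the implementation follows the standard forms I(X^N → Y^N) = Σ_n I(X^n; Y_n | Y^{n-1}) and I(X^N → Y^N || Z^N) = Σ_n I(X^n; Y_n | Y^{n-1}, Z^n).

```python
from collections import defaultdict
import itertools
import math

def cond_entropy(pmf, target, cond):
    """H(target | cond) in bits for pmf = {outcome: prob}."""
    joint, marg = defaultdict(float), defaultdict(float)
    for w, p in pmf.items():
        joint[(target(w), cond(w))] += p
        marg[cond(w)] += p
    return -sum(p * math.log2(p / marg[c]) for (t, c), p in joint.items() if p > 0)

def directed_info(pmf, N, conditioned):
    """I(X^N -> Y^N), causally conditioned on Z^N if requested.
    Outcomes are (x_tuple, z_tuple, y_tuple)."""
    total = 0.0
    for n in range(1, N + 1):
        def past(w, n=n):
            # Y^{n-1}, plus Z^n when causally conditioning.
            return (w[2][:n - 1], w[1][:n]) if conditioned else w[2][:n - 1]
        total += cond_entropy(pmf, lambda w, n=n: w[2][n - 1], past) \
               - cond_entropy(pmf, lambda w, n=n: w[2][n - 1],
                              lambda w, n=n: (past(w), w[0][:n]))
    return total

# Relay chain x -> z -> y with unit delays: z_n = x_{n-1}, y_n = z_{n-1}.
pmf = {}
for x in itertools.product((0, 1), repeat=3):
    z = (0, x[0], x[1])
    y = (0, 0, x[0])
    pmf[(x, z, y)] = 1 / 8

print(directed_info(pmf, 3, conditioned=False))  # 1.0: implied connectivity
print(directed_info(pmf, 3, conditioned=True))   # 0.0 after conditioning on z
```

Without conditioning, the relayed bit shows up as a full bit of directed information from x to y; once the relay process z is causally conditioned on, the implied connectivity vanishes, as the proposition states.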
Both transfer entropy and Kamitake’s directed information can be extended to incorporate causal conditioning from an additional random process as well.
Definition 6 ( [44, 45, 46]).
Causally conditioned transfer entropy and sum transfer entropy are defined respectively as

(23)   T_{X→Y||Z}(n) = I(X^{n-1}; Y_n | Y^{n-1}, Z^{n-1})
(24)   T^Σ_{X→Y||Z} = ∑_{n=1}^{N} T_{X→Y||Z}(n)
(25)               = ∑_{n=1}^{N} H(Y_n | Y^{n-1}, Z^{n-1}) - H(Y_n | Y^{n-1}, X^{n-1}, Z^{n-1}).
Definition 7.
Causally conditioned Kamitake's directed information is defined as

(26)   I_K(X^N → Y^N || Z^N) = ∑_{n=1}^{N-1} I(X_n; Y_{n+1}^N | Y^n, Z^n).
Similar to Massey's directed information, causally conditioned transfer entropy and Kamitake's directed information can also be used to eliminate false inferences resulting from implied connectivity [47, 46]. The proof follows along the exact same lines as Proposition 1 and is omitted here. Also, note that despite the conditioning on a causal sequence, Kamitake's directed information measure is not strictly causal due to the future-sample term Y_{n+1}^N.
We now discuss how to apply causally conditioned directional information measures in order to estimate the functional connectivity over the ROI network. In Fig. 4(a) we show an example communication network with four interacting nodes. The information flow is depicted using solid arrows and the feedback using dashed arrows, respectively. Also, some nodes do not have a direct link between them. In our cortical network model the nodes represent the ROIs, and the associated random processes describe the sampled output of the EEG signals in each ROI. Our goal is then to infer the causal connectivity between the ROIs given the EEG recordings from each region. Towards this end we propose calculating the pairwise conditional directed information by choosing an input and an output node, while using causal conditioning to account for the influence of all other nodes. For example, to estimate the connectivity between the pair of nodes in Fig. 4(b) we calculate the causally conditioned directed information from the source to the destination. Since there is nonzero information flow between these two nodes, we expect the directed information to return a positive value. Similarly, the computation in Fig. 4(c) yields a zero value since there is no direct connectivity between the two chosen nodes. Repeating this procedure pairwise for all nodes provides a functional connectivity graph representative of the directional information flow over the entire ROI network.
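The pairwise scan described above can be sketched as follows; `roi_data` maps ROI labels to their EEG sample arrays, and `measure` stands in for any of the causally conditioned measures (all names are ours).

```python
from itertools import permutations

def connectivity_graph(roi_data, measure):
    """Directional connectivity for every ordered ROI pair (source, dest).

    For each pair, the remaining ROIs are passed as side information so
    that the measure can causally condition on them.
    """
    graph = {}
    for src, dst in permutations(roi_data, 2):
        side = {r: roi_data[r] for r in roi_data if r not in (src, dst)}
        graph[(src, dst)] = measure(roi_data[src], roi_data[dst], side)
    return graph

# With 8 ROIs this evaluates all 56 ordered source-destination pairs,
# each conditioned on the 6 remaining regions.
toy = {i: None for i in range(1, 9)}
graph = connectivity_graph(toy, lambda s, d, side: len(side))
```

The resulting dictionary is the functional connectivity graph; thresholding its values would yield the directed edges of the inferred network.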
IV-B Relationship between different measures
The first relation that we are interested in is between Massey's and Kamitake's directed information measures, as examined in [48, 49]. We extend this analysis to include causal conditioning on Z^N and show its connection to causally conditioned MI.
Proposition 2.
The relation between causally conditioned Massey’s and causally conditioned Kamitake’s directed information is given by
(27) 
Proof.
Definition 8.
Causally conditioned MI is defined as the MI between X^N and Y^N causally conditioned on the sequence Z^N:
(36) 
The following corollary then expresses causally conditioned MI in terms of directed information and follows directly from (32)–(34) in the proof of Proposition 2.
Corollary 3.
Causally conditioned MI is the sum of causally conditioned Massey's directed information from X^N to Y^N, and causally conditioned Kamitake's directed information in the opposite direction:
(37) 
There also exists a connection between Massey's directed information and transfer entropy, as shown in [50, 51, 38], which we extend and state for causal conditioning as follows.
Proposition 4.
The relation between causally conditioned Massey’s directed information and causally conditioned transfer entropy is given by
(38) 
We observe that the directed information is the sum of the transfer entropy and an additional term describing the conditional undirected MI between the current samples X_n and Y_n. Massey's directed information as defined in Sec. III-A was originally intended to measure the dependency between two length-N sequences. If we now relax this constraint to instead measure the flow from a length-(N-1) sequence X^{N-1} to a length-N sequence Y^N, while causally conditioning on Z^N, we obtain a modified interpretation of causally conditioned Massey's directed information, which we define as follows,
(39)   I(X^{N-1} → Y^N || Z^N) = ∑_{n=1}^{N} I(X^{n-1}; Y_n | Y^{n-1}, Z^{n-1}) = T^Σ_{X→Y||Z},
where the equality in (39) follows directly from (24). Further, if we assume the sequences to be stationary and infinitely long and that the limit for N → ∞ exists, then asymptotically it can be shown [51, 38] that the information rates for causally conditioned Massey's modified directed information and causally conditioned transfer entropy (23) are in fact equal.
V. Causal Bidirectional Information Flow in EEG
In the following we propose an alternative bidirectional measure for estimating the causal dependency between the ROIs. This measure is motivated by the analysis of the three-terminal multiple access channel (MAC) with feedback, an important canonical building block in networked communication.
V-A Causal bidirectional information (CBI)
In order to derive the CBI measure, let us first consider a preliminary result which uses causally conditioned directed information to express the information rate for such a MAC with feedback. Fig. 5 shows a two-user MAC with feedback, with channel inputs X^N and Y^N and corresponding output Z^N. The capacity region of the two-user discrete memoryless MAC with feedback can be lower bounded using directed information, in a form similar to the standard cut-set bound for the MAC without feedback [9]. The information rate from X^N to Z^N is shown to be [35], for all n,

(40)   R_X ≤ (1/N) I(X^N → Z^N || Y^N), with p(x_n, y_n | x^{n-1}, y^{n-1}, z^{n-1}) = p(x_n | x^{n-1}, z^{n-1}) p(y_n | y^{n-1}, z^{n-1}).
The other rate, from Y^N to Z^N, and the sum rate can be found in [35] and are not of interest for the following discussion.
[Fig. 6: CBI measures the divergence of the observed joint distribution on the network of Fig. 6(a) from that of a MAC with feedback in (40).]

Now, consider the scenario of a general multiterminal network as shown in Fig. 6(a), where each node sends information to and receives feedback from every other node in the network. CBI considers the three-node network of Fig. 6(a) and measures the information flow between two designated nodes by using the MAC as a reference, as shown in Fig. 6(b). In particular, we direct our attention to the relation in (40) specifying the joint distribution of the two inputs of the MAC. The conditional independence of the inputs in the factorization of (40) arises due to the causal nature of the feedback structure, where the output at the receiver is available causally at both encoders at each time n. For the case of a MAC with feedback there is no direct connectivity (path) between the inputs. Any violation of (40) creates dependencies between X^N and Y^N, and these dependencies can be measured by the KL divergence between the joint distribution on the l.h.s. of (40) and the factorization on the r.h.s. of (40). This result is summarized in the following definition.
Definition 9.
Consider a multiterminal network with a source node, a destination node, and a group of nodes interacting causally with both of them, as shown in Fig. 6(a). CBI calculates the KL divergence between the observed conditional distribution and the one induced by an underlying MAC with feedback (40):
(41)  
(42)  
(43) 
Therefore, CBI ascertains the direct connectivity (as per Def. 3) between two nodes in a general multinode network and is zero if and only if the following two conditions are satisfied:

(i) X_n and Y_n are independent for all n, i.e., there is no information flow between the two nodes.

(ii) There is no direct link between the two nodes and all information flows only via an additional node.
In the following proposition we show that, by definition, CBI is inherently a causal bidirectional modification of Massey's directed information.
Proposition 5.
Causal bidirectional information (CBI) is the sum of causally conditioned Massey's directed information between X^N and Y^N, and the sum transfer entropy in the reverse direction:
(44) 
Proof.
We start with
(45) 
where the equality in (45) follows from (22) in Definition 5. Denoting the term inside the summation on the r.h.s. of (45) by and rewriting yields
(46)  
(47)  
(48) 
where in (47) we have made use of (21) and (23), respectively, and (48) follows from the chain rule of joint probability
(49) 
Taking the summation on (48) and comparing with (42) proves the claim. ∎
Corollary 6.
CBI is a symmetric measure
(50)  
(51) 
Proof.
We also evaluate the expression for CBI by comparing it to conditional MI. Conditional MI measures the divergence between the actual observations and those which would be observed under the Markov assumption that X^N and Y^N are conditionally independent given Z^N,
(56)  
(57)  
(58) 
where (57) follows from the chain rule of probability. Conditional MI is zero if and only if X^N and Y^N are conditionally independent given Z^N. By comparing conditional mutual information (57) with the expression for CBI in (42), we notice that CBI instead uses causal conditioning, i.e., conditioning on the full side sequence is replaced by conditioning on its causal past.
VI. Inferring Change in Functional Connectivity
VI-A Preliminaries
In our analysis, we separately extract the EEG response sections for each of the two audio quality levels and calculate the information measures individually for each of them. This allows us to compare how the different probability distributions used in each of the presented information measures affect the ability to detect a change in information flow among the ROIs in Fig. 2 between the case where the subjects listen to high quality audio and the case where they listen to degraded quality audio.
We begin by selecting a source and a destination ROI. The remaining six ROIs are considered to represent the side information. Since all electrodes in an ROI are located within close proximity of one another and capture data over the same cortical region, we consider every electrode in an ROI to be an independent realization of the same random process. For example, the sampled EEG data recorded on every electrode within a region, in a given time interval, is considered a realization of the same random process. This increases the sample size for the process, reducing the expected deviation between the obtained empirical distribution and the true one. The discussed information measures are therefore calculated between ROI pairs (and not between individual electrodes) for all 2-permutations of the 8 ROIs, i.e., 56 source-destination pairs.
In our earlier work [26] we demonstrated that the output EEG response to the audio quality, over an ROI, converges to a Gaussian distribution with zero mean. The intuition here is that the potential recorded at an EEG electrode at any given time instant can be considered the superposition of the responses of a large number of neurons. Thus, the distribution of a sufficiently high number of these trials taken at different time instances converges to a Gaussian distribution as a result of the central limit theorem. Fig. 7 shows the histogram of the sampled EEG data, formed by concatenating the output over all sensors in one ROI for a single subject. The sample skewness and kurtosis of the EEG output distribution are shown in Table II. For a Gaussian distribution the skewness equals 0 and the kurtosis equals 3 [52, 53]. To test the Gaussianity of a large sample set, the sample skewness and kurtosis should approach these values, while an absolute skewness larger than 2 [53] or a kurtosis larger than 7 [54] may be used as reference values for determining substantial nonnormality. By inspecting the sample estimates in Table II and comparing the histogram to a Gaussian distribution in Fig. 7, we observe that the EEG output distribution is indeed very close to Gaussian. Knowing that the interacting random processes from the ROIs converge to a Gaussian distribution [26] allows us to formulate analytical closed-form expressions for calculating the information measures. The joint entropy of a d-dimensional multivariate Gaussian random vector X with probability density is known to be given by [9]
(59)   h(X) = (1/2) log( (2πe)^d |K| ),

where K is the covariance matrix and |K| is the determinant of the matrix.
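A quick numerical check of (59), using the natural logarithm (entropy in nats); the helper name and the bivariate example are ours. For a pair of unit-variance Gaussians with correlation ρ, the identity I = h(X) + h(Y) - h(X, Y) recovers the familiar -0.5·ln(1 - ρ²).

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of N(0, cov): 0.5 * ln((2*pi*e)^d * |cov|)."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(cov))

# Bivariate example with correlation 0.5.
rho = 0.5
joint = np.array([[1.0, rho], [rho, 1.0]])
mi = gaussian_entropy([[1.0]]) + gaussian_entropy([[1.0]]) - gaussian_entropy(joint)
# mi equals -0.5 * ln(1 - rho**2)
```

Expressions such as (60)–(63) follow the same pattern: each entropy term becomes a log-determinant of the appropriate joint covariance submatrix.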
If X^N, Y^N, and Z^N are jointly Gaussian distributed, then using (59) in conjunction with (43) reduces CBI to a function of their joint covariance matrices,
In a similar manner, causally conditioned Massey’s directed information, Kamitake’s directed information, and sum transfer entropy, resp., can be reduced to obtain the following expressions,
(61) 
(62) 
(63) 
           HQ                             LQ
ROI    Mean    Skewness  Kurtosis    Mean    Skewness  Kurtosis
1      0.003   0.180     2.820       0.014   0.187     2.816
2      0.148   0.057     2.943       0.159   0.214     2.786
3      0.027   0.069     2.931       0.014   0.054     2.945
4      0.216   0.154     3.155       0.2975  0.131     2.869
5      0.058   0.132     3.133       0.073   0.103     3.103
6      0.305   0.145     3.146       0.283   0.259     3.260
7      0.111   0.016     2.984       0.101   0.050     2.950
8      0.465   0.070     3.071       0.445   0.342     3.342
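Statistics of the kind reported in Table II can be reproduced for any sample vector with a few lines; the function name is ours, and the check below simply confirms that synthetic Gaussian data yields skewness near 0 and kurtosis near 3.

```python
import numpy as np

def sample_skew_kurtosis(x):
    """Sample skewness and (non-excess) kurtosis of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # standardize the sample
    return float((z ** 3).mean()), float((z ** 4).mean())

rng = np.random.default_rng(0)
skew, kurt = sample_skew_kurtosis(rng.standard_normal(100_000))
# Gaussian reference values: skewness 0, kurtosis 3.
```

Values far outside these references (|skewness| > 2 or kurtosis > 7, per the thresholds cited above) would indicate substantial non-normality.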
VI-B Receiver operating characteristic (ROC) curves
In order to evaluate the accuracy with which an information measure, and in particular CBI, can distinguish between the perceived audio quality levels, we conduct a receiver operating characteristic (ROC) curve analysis [55, 56] on the generated vectors of measurements for the high and degraded quality audio, respectively. The ROC curve serves as a nonparametric statistical test to compare different information rates [57, 58, 59, 60] and has the advantage that a test statistic can be generated from the observed measurements.
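A common scalar summary of the ROC curve is the area under it (AUC), which for two sets of measurements equals the probability that a randomly drawn "high quality" measurement exceeds a randomly drawn "degraded quality" one. A minimal sketch (function name ours):

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """Empirical ROC AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) score pairs ranked correctly, ties counting half."""
    pos = np.asarray(scores_pos, dtype=float)
    neg = np.asarray(scores_neg, dtype=float)
    diffs = pos[:, None] - neg[None, :]   # all pairwise score differences
    return float(((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / diffs.size)

# Perfectly separated measurements give AUC 1.0; identical ones give 0.5.
```

An AUC of 0.5 corresponds to a measure that cannot distinguish the two quality conditions at all, while values approaching 1.0 indicate near-perfect separation.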
Consider a general binary classification scheme between two classes and