To resolve the imminent spectrum shortage problem, sharing spectrum with legacy systems has attracted intensive research during the past decade. Cognitive radio (CR), which has the capability to sense, learn, and adapt to the spectrum environment [1, 2, 3], can significantly improve spectrum efficiency and guarantee harmless coexistence with the legacy systems [4, 5, 6, 7, 8]. Nevertheless, the complex and uncertain spectrum environment makes spectrum sharing extremely challenging. The uncertainty may come from the radio propagation environment, the legacy system activity, or the complex behavior of the CR itself.
As with human beings, sophisticated cognitive capabilities are essential for the CR to cope with the uncertainty of the spectrum environment. These cognitive capabilities collectively define the intelligence of a CR. Although the CR concept was born with the core idea of realizing “cognition”, research on measuring CR cognitive capabilities or intelligence remains largely open.
Being able to quantitatively measure the intelligence of a CR brings several benefits.
With the intelligence model and measuring methodology, we gain deeper insight into the key factors that affect the intelligence of a CR, which can be used to guide the development of new CRs with high intelligence.
A CR vendor may advertise and price its CR products using CR intelligence as a metric. A CR with higher intelligence tends to achieve better performance in practical, uncertain spectrum environments, and can thus be priced higher.
With the knowledge of the intelligence of individual CRs, a service provider or network manager can better configure their networks by integrating CRs with different intelligence levels in a more cost-efficient way. For example, a CR with higher intelligence leading a set of CRs with lower intelligence may achieve a desirable performance with low network deployment cost.
This work is an extension of our previous work, in which we proposed a data-driven methodology to derive the intelligence measure. We construct a CR intelligence model following human intelligence theory, specifically the widely accepted Cattell-Horn-Carroll (CHC) intelligence model. Based on this model, we develop psychometric techniques to measure the CR intelligence. The basic idea of our methodology is to use simulations to test different CRs in various spectrum environments under different settings. Based on the obtained performance data, we apply the factor analysis (FA) technique to extract and measure the intelligence factors of CR.
More specifically, we present a case study consisting of 144 different types of CRs. We provide each CR with different levels of capabilities including learning-based algorithms [3, 18, 19, 20] for dynamic spectrum access, number of sensors, sensing accuracy, processing speed, and algorithmic complexity. With our methodology, five intelligence factors are identified for the CRs through our analysis, which are shown to comply with the nature of the tested algorithms. This validates our proposed methodology of measuring CR intelligence.
We summarize the contributions of this paper as follows:
For the first time, we propose the idea of identifying the cognitive capabilities of CR and introduce an intelligence model for the CR.
We propose a methodology to extract the CR’s intelligence factors and apply factor analysis as a theoretical framework for this purpose.
The proposed methodology is verified through a case study where we identify the intelligence factors of learning-based CRs under dynamic spectrum access scenarios and show these factors comply with the nature of the CRs.
The rest of the paper is organized as follows. Section II proposes our intelligence model for CR. Section III presents our methodology of deriving CR intelligence factors. In Section IV, we present a case study in which we measure the intelligence of learning-based CRs under dynamic spectrum access scenarios. Section V discusses the related work and compares it with our approach. In particular, work on human intelligence measurement and the difference between CR intelligence measurement and human intelligence measurement are highlighted. Future work and open problems are discussed in Section VI. Section VII concludes the paper.
II Quantitative Intelligence Model of CR
Motivated by the CHC model, which is widely used to describe human intelligence, we propose an intelligence model for the CR. Our model is structured in three strata (or stages), as shown in Fig. 1. At the top lies stratum III, which defines a single general intelligence factor. CRs with higher values (or loadings) on this factor are more intelligent in general. They tend to achieve better performance in various uncertain environments.
Stratum II represents a list of broad cognitive capabilities contributing to intelligence, which are modeled as follows:
Comprehension-Knowledge (): includes the breadth and depth of a CR’s acquired knowledge and the ability to reason using previously learned experiences or procedures.
Fluid Reasoning (): includes the broad ability to reason, form concepts, and perform dynamic spectrum access using unfamiliar information or novel procedures.
Short-Term Memory (): is the ability to apprehend and hold information in immediate awareness and then use it within a short period (e.g., a few seconds or the time the CR is on).
Long-Term Storage and Retrieval (): is the ability to store information and retrieve it later in the process of communication or dynamic spectrum access.
Spectrum Sensing (): is the ability to sense the spectrum environment, e.g., sensing the availability of white space or presence of primary users.
Processing Speed (): measures the information processing time, which includes delays resulting from channel sensing, accessing and switching, computing, reasoning, information retrieval, etc. The processing speed mainly refers to the delay or processing time required due to hardware limitations.
Algorithmic Processing Time (): is the time complexity of the algorithm employed within the cognitive radio. It is also called algorithmic complexity. Different learning algorithms have different time complexities, depending on the efficiency of the algorithms applied. Note that algorithmic complexity is different from processing speed.
Within each stratum II broad cognitive capability, we can further define stratum I, which is at the bottom, with more narrow and specific cognitive capabilities. For example, fluid reasoning includes inductive reasoning, sequential reasoning, deductive reasoning, and speed of reasoning. Spectrum sensing takes into account number of sensors and accuracy of sensing capability. Processing speed considers the speed of processing on the received data, and the speed of switching among channels. Algorithmic processing time consists of the speed of reasoning and decision making.
III Proposed Methodology to Measure the Intelligence of CR
In this section, we propose a data-driven methodology to measure the intelligence of CRs. The basic idea of this methodology is illustrated in Fig. 2.
For a pool of different CRs, we design a set of test items to evaluate their performance. The CRs differ in terms of learning-based spectrum access strategy, number of sensors, processing speed, computational complexity, etc. Various test environments arise from different primary user activity types or statistics, channel rates, frame delivery ratios, etc. By testing each CR in each testing scenario, we obtain a vector of performance data for each CR at each test scenario. The dimension of this vector equals the number of performance metrics used. In our case study, it is an array of length three for each cognitive radio performing in a given test scenario, since we measure three performance metrics for each CR. Then we apply the FA method to the measured data to derive the intelligence factors as latent factors. These factors are then matched to the broad cognitive capabilities described in Section II by analyzing the nature of the CR functions.
The FA technique is applied to the data matrix, which identifies the latent factors as intelligence factors. The latent factors are then matched to the right cognitive capabilities by analyzing the functions of the CRs.
There are two types of FA in the literature: exploratory FA and confirmatory FA [17, 21]. Exploratory FA is used to identify the potential latent factors when both the number and the loadings of the latent factors are unknown. Confirmatory FA, in contrast, is used when the number of latent factors is known; applying it lets us decide whether the model and the FA results match each other. It can also be used to test a theory on possible cognitive capabilities. In other words, it determines whether the designed test questions measure the factors they were designed to measure. In this paper, we use confirmatory FA to test our theory on the possible intelligence factors.
To describe the details about the intelligence model and the latent factors, consider the performance of a test taker modeled as
where is the measured performance of the cognitive radio on the testing scenario , is the general intelligence factor (see stratum III of the intelligence model in Fig. 1) of the cognitive radio . The parameter is called the “common factor”, whose value determines how smart the CR is in achieving a high performance value . The weighting coefficient denotes the loading, i.e., the importance, of the intelligence factor in achieving a high score on the testing scenario . The value of summarizes the performance deviation from the simplified model , which is unique to the specific performance measurement and is thus called the “unique factor”. Equation (1) also shows how cognitive capabilities or intelligence factors can be modeled by the common factor . Having all the measured data, we can use FA to determine whether the data fit the model (Eq. (1)) and, if so, to estimate the loading and the intelligence factor.
For more detailed cognitive capability analysis, we can consider the list of broad cognitive capabilities in stratum II. Let denote the th intelligence factor (or latent factor), where . The performance data vector can be modeled as
where and are the weights (loadings) and the unique factor, respectively. Note that since it is possible to measure several metrics, the single value in (1) is substituted by the vector performance measurement . In this case, with all the measured data , we can verify the validity of the model (2) and determine the weighting coefficients as well as the latent factors . By analyzing the CR functioning, we can match the latent factors with the CR stratum II cognitive capabilities listed in Section II.
where and are the matrices of common and the unique latent factors, respectively, and is the matrix of weights . Specifically,
and the other matrices can be obtained similarly.
From Eq. (3), we can obtain the correlation matrix of the observation as
where , and and denote expectation and transposition, respectively. Eq. (5) is derived under the assumption that the common factors and the unique factors are uncorrelated, which yields . Similarly, based on the same uncorrelatedness assumption, can be substituted by a diagonal positive definite matrix . Therefore, Eq. (5) can be rewritten as
Without loss of generality, it is assumed that the latent factors are orthogonal in the model. As a result . Then we subtract from both sides of Eq. (6) to derive
In this model, is called “the reduced correlation matrix” .
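As a compact reference for the derivation above, the following sketch restates the factor model in one possible notation. All symbols here ($X$ for the observations, $\Lambda$ for the loading matrix, $F$ and $E$ for the common and unique factors, $R$ for the correlation matrix, $\Phi$ and $\Psi$ for the factor and unique-factor covariance matrices) are assumed for illustration and may differ from the notation of Eqs. (3) and (5)-(7).

```latex
\begin{align*}
X &= \Lambda F + E, \\
R &= \mathbb{E}\!\left[X X^{T}\right]
   = \Lambda\,\mathbb{E}\!\left[F F^{T}\right]\Lambda^{T} + \mathbb{E}\!\left[E E^{T}\right]
   && \text{(cross terms vanish because } \mathbb{E}\!\left[F E^{T}\right] = 0\text{)}, \\
R &= \Lambda \Phi \Lambda^{T} + \Psi,
   && \Phi = \mathbb{E}\!\left[F F^{T}\right],\quad \Psi \text{ diagonal}, \\
R - \Psi &= \Lambda \Lambda^{T}
   && \text{(orthogonal factors, } \Phi = I\text{)}.
\end{align*}
```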
The next step is to determine both and . Note that is a diagonal matrix. If both and are known, then can be estimated as , where is the eigenvector matrix and is the diagonal eigenvalue matrix of the matrix. On the other hand, if has been estimated, then we can calculate as
Therefore, with an initial estimate of , Eq. (7) can be solved iteratively, where each iteration involves the following three steps:
Find the eigenvector and eigenvalue matrices and of “the reduced correlation matrix”: ;
This procedure runs iteratively until the maximum difference between the last two rounds of is less than a certain small threshold .
Let , then will generate the unrotated factor matrix. Normally, we pick as latent factors those entries in that are large enough, e.g., greater than 1. In practice, we may simply use principal component analysis to estimate , which considers only the latent factors influencing the performance and ignores the unique factors.
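The iterative procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration of principal axis factoring under stated assumptions, not the SPSS routine used in the case study: the function name `principal_axis_factoring`, the initial estimate of the unique variances (the reciprocal diagonal of the inverse correlation matrix), and the default tolerance are all hypothetical choices.

```python
import numpy as np


def principal_axis_factoring(R, n_factors, tol=1e-6, max_iter=500):
    """Estimate loadings and unique variances from a correlation matrix R.

    Assumed sketch: alternates between an eigendecomposition of the
    reduced correlation matrix and an update of the unique variances.
    """
    R = np.asarray(R, dtype=float)
    # Assumed starting point: unique variance = 1 / diag(R^{-1}),
    # i.e. one minus the squared multiple correlation of each variable.
    psi = 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        # Step 1: eigendecomposition of the reduced correlation matrix.
        reduced = R - np.diag(psi)
        eigvals, eigvecs = np.linalg.eigh(reduced)
        order = np.argsort(eigvals)[::-1][:n_factors]
        vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
        # Step 2: loadings from the leading eigenpairs.
        loadings = eigvecs[:, order] * np.sqrt(vals)
        # Step 3: update the unique variances from the residual diagonal.
        new_psi = np.diag(R - loadings @ loadings.T).copy()
        converged = np.max(np.abs(new_psi - psi)) < tol
        psi = new_psi
        if converged:
            break
    return loadings, psi
```

The number of factors is passed in explicitly here; in practice it could be chosen by inspecting the eigenvalues, for example by keeping those greater than 1 as mentioned in the text.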
IV Case Study: Intelligence Measure of CR with Learning Capabilities
In this section, we present a case study consisting of 144 different types of CRs. By designing a set of testing environments, we apply our methodology presented in Section III to derive the latent factors and analyze them as intelligence factors as well as cognitive capabilities contributing to the CR intelligence.
We consider a single hop scenario where there is only one CR and one PU. Therefore, we can focus on each CR’s performance without considering channel contention. There are several channels in the network. The PU can appear on some or all of the channels simultaneously. We also assume a time slot based network. Figure 3 shows the time slot structure used by the CR.
As shown in the figure, the first part of the time slot is assigned for channel sensing. During this period, the CR senses the chosen channel and at the end of this period decides whether the channel is idle or not. If the CR finds the channel idle, it begins data transmission. Otherwise, it keeps silent to avoid interfering with the PU.
During the third part of the time slot, the CR learns from its observation. Whether the channel was idle or busy, the observation provides useful information for the CR to learn and optimize its future decisions. The last part is the switching period, which is the amount of time it takes the CR to switch from one channel to another. The switching period depends on the hardware limitations of each CR.
We have conducted extensive simulations with different types of CRs. channels are considered in the network. testing scenarios are designed, such that each CR performs on each of them one by one. We run the simulations in MATLAB. For each CR performing in one single testing scenario we run the algorithm times and get the average.
IV-B Cognitive Radio Capabilities
Figure 4 shows the capabilities of CRs considered in this case study in terms of their features and parameters. Combinations of all these features give us 144 different types of CRs. The CR features are described as follows.
Channel access strategy (Access Policy) employed by the CR to learn and adapt to the environment. It can be a learning-based method, a deterministic method, or just a random strategy. We consider five types of learning-based access strategies, namely UCB1, EXP3, POLA, PROLA, and Q-Learning, and one random access strategy. Details of the strategies are described in the sequel.
Number of sensors. With more sensors, the CR observes more channels in each time slot. Depending on the reasoning it employs, the CR may then adapt better to the environment. This likely corresponds to higher loadings on the cognitive capabilities. In this case study, we consider the number of sensors () to be either , , or .
Sensing accuracy, which indicates the detection probability when the PU is present. There are several methods of channel sensing, including energy detection and feature extraction [22, 23, 24]. We consider three values of , , as the probability of correct sensing. The values are relatively large because, in practice, CRs usually have high sensing accuracy.
Processing speed is another feature of a CR; it affects the sensing, learning, and switching parts of the time slot. Learning delay has two causes: hardware limitations and the algorithmic complexity of the learning algorithm. We add up the delays due to hardware limitations in the different parts of a time slot into a single total delay. We assume this total delay to be either , , or in which indicates the time slot duration.
Algorithmic complexity. The delay incurred by the computations of the algorithm differs from the delay due to hardware limitations. It depends on the efficiency of the learning algorithm and is therefore called algorithmic complexity. This type of delay depends on how well the learning algorithm has been designed and is inherent to the learning technique.
As to the six channel access strategies we employ in this work, the random access strategy does not utilize any learning-based algorithm. The other learning based algorithms mentioned are described below.
The UCB1 and EXP3 algorithms [18, 19] are slightly modified from their original versions to address the more general case of observing more than one channel per time slot. The modified UCB1 and EXP3 algorithms are described in Algorithms 1 and 2, respectively. Note that UCB1 is a deterministic access policy designed for well-behaved environments, while EXP3 is designed for adversarial environments.
Algorithm 1: UCB1 Algorithm with multiple observations
Initialization: Play each machine once. For each play, make observations, including the played one. The observations are made on the subsequent actions beginning from the action played.
For each subsequent time step: Play the machine that maximizes a given deterministic index. The decision criterion is based on the upper confidence bound concept from statistics. Make observations on the subsequent channels beginning from the taken action.
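The steps of Algorithm 1 can be illustrated with a short Python sketch. It uses the standard UCB1 index (empirical mean plus a confidence term of the form sqrt(2 ln t / n)); the function name `ucb1_multi_obs`, the callback `reward_fn`, the wrap-around when observing subsequent channels, and the use of the round index `t` in the confidence term are assumptions made for illustration, not details taken from the algorithm listing.

```python
import math


def ucb1_multi_obs(reward_fn, n_channels, n_obs, horizon):
    """UCB1-style channel selection with n_obs observations per play.

    reward_fn(ch) is a hypothetical callback returning the reward in [0, 1]
    observed on channel ch in the current slot. Requires n_obs >= 1.
    """
    counts = [0] * n_channels   # number of observations per channel
    sums = [0.0] * n_channels   # accumulated observed reward per channel

    def play(ch):
        gained = 0.0
        for k in range(n_obs):
            c = (ch + k) % n_channels   # assumed wrap-around over channels
            r = reward_fn(c)
            counts[c] += 1
            sums[c] += r
            if k == 0:
                gained = r              # only the played channel earns reward
        return gained

    total = 0.0
    # Initialization: play each channel once.
    for ch in range(n_channels):
        total += play(ch)
    # Main loop: play the channel with the largest upper confidence bound.
    for t in range(n_channels + 1, horizon + 1):
        best = max(
            range(n_channels),
            key=lambda c: sums[c] / counts[c]
            + math.sqrt(2.0 * math.log(t) / counts[c]),
        )
        total += play(best)
    return total, counts
```

Because side observations also increase the counts, channels adjacent to a frequently played channel are explored without being played, which is the intended benefit of having more sensors.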
Algorithm 2: EXP3 Algorithm with multiple observations
Initialization: Assign a uniform random distribution on action selection.
1. Update the distribution on action selection based on the observations made so far plus adding some randomness. Randomness is added to make sure the agent makes enough explorations.
2. Choose an action randomly based on the distribution defined above.
3. Observe the reward on subsequent channels beginning from the taken action.
4. Update the observation history on all the channels. The observation history will be utilized in step one to optimize the channel selection distribution.
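For reference, the update loop that Algorithm 2 builds on can be sketched as follows. This is the standard single-observation EXP3 rule (exponential weights mixed with uniform exploration and an importance-weighted reward estimate); it does not reproduce the multi-observation modification described above. The function name `exp3`, the callback `reward_fn`, the exploration rate `gamma`, and the weight renormalization step are assumptions for illustration.

```python
import math
import random


def exp3(reward_fn, n_channels, horizon, gamma=0.1, rng=None):
    """Standard EXP3 with one observation per round (rewards in [0, 1])."""
    rng = rng or random.Random(0)
    weights = [1.0] * n_channels
    probs = [1.0 / n_channels] * n_channels
    total = 0.0
    for _ in range(horizon):
        # Step 1: mix the weight-based distribution with uniform exploration.
        w_sum = sum(weights)
        probs = [(1.0 - gamma) * w / w_sum + gamma / n_channels for w in weights]
        # Step 2: draw a channel from the distribution.
        ch = rng.choices(range(n_channels), weights=probs)[0]
        # Step 3: observe the reward on the chosen channel.
        r = reward_fn(ch)
        total += r
        # Step 4: importance-weighted estimate, then exponential weight update.
        estimate = r / probs[ch]
        weights[ch] *= math.exp(gamma * estimate / n_channels)
        # Renormalize to avoid floating-point overflow on long horizons.
        w_max = max(weights)
        weights = [w / w_max for w in weights]
    return total, probs
```

The uniform mixing term keeps every channel's selection probability at or above `gamma / n_channels`, which is what preserves exploration in adversarial environments.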
Algorithms 3 and 4 represent the POLA and the PROLA algorithms. Both algorithms are designed for adversarial environments. PROLA, as explained in Section V, is similar to the EXP3 algorithm in the sense that at each time step the agent is able both to gain reward and to make an observation that is utilized in its learning process. The difference between PROLA and EXP3 is that in EXP3 the agent observes the reward on the same action it takes and gains reward from, whereas in PROLA the agent makes its observation on a channel other than the one it takes.
POLA is similar to the PROLA algorithm in that both are designed for the case where the agent cannot observe the reward on the action it takes. However, POLA differs from PROLA and EXP3 in a major way: at each time step, it can either take an action or make an observation, but not both. This scenario arises when the agent has limited capabilities and cannot take an action and then switch to another channel for observation within the same time step.
Algorithm 3: POLA Algorithm with multiple Observations
Initialization: Assign uniform random distribution on the channels.
1. With small probability decaying in time, choose an action uniformly at random to observe its reward. Otherwise, take an action.
2. If it is decided to make an observation, choose channels to observe, then update the channel selection probability based on the channel observation history. Otherwise, choose a channel to access (take action) and accumulate the unobservable reward.
Algorithm 4: PROLA Algorithm with multiple Observations
Initialization: Assign a uniform random distribution on channel selection.
1. Assign a distribution on action selection based on the channel observation history.
2. Choose a channel based on the above distribution to play and accumulate the unobservable reward.
3. Choose channels other than the played one uniformly at random to observe their reward during the same time slot.
4. Update the channel observation history to optimize the distribution on channel selection policy.
The last learning algorithm we apply is the Q-Learning algorithm, as described in Algorithm 5. Q-Learning is similar to the UCB1 algorithm in the sense that both are designed for well-behaved environments. More specifically, the Q-Learning algorithm is usually applied in environments that follow a Markov chain. One major difference between Q-Learning and UCB1 is that Q-Learning solves an optimization problem at each time step to optimize the action selection distribution.
In order to implement Q-Learning in MATLAB and to solve the optimization problems of this algorithm, CVX toolbox [26, 27] is used. More specifically, CVX toolbox is designed to solve convex optimization problems in MATLAB.
Considering all the combinations of the features shown in Fig. 4, 162 different types of CRs are generated. However, the random access strategy uses no learning capability, so the number of channels being observed has no impact on the CR’s performance. By removing eighteen redundant combinations, 144 CRs remain. Different features and their assigned values are shown in Fig. 4.
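The count of distinct CR types can be checked by enumerating the feature combinations. The sketch below assumes six access strategies and three levels each for the number of sensors, the sensing accuracy, and the hardware delay, consistent with the twenty-seven CRs per learning strategy and nine random-access CRs reported in the case study; the level labels are placeholders, since only the number of levels matters for the count.

```python
from itertools import product

strategies = ["UCB1", "EXP3", "POLA", "PROLA", "Q-Learning", "Random"]
# Placeholder level indices (assumption): three levels per feature.
sensor_levels = [0, 1, 2]
accuracy_levels = [0, 1, 2]
delay_levels = [0, 1, 2]

all_combos = list(product(strategies, sensor_levels, accuracy_levels, delay_levels))


def canonical(combo):
    """Collapse the sensor count for the random strategy, which ignores it."""
    strategy, sensors, accuracy, delay = combo
    if strategy == "Random":
        return (strategy, None, accuracy, delay)
    return combo


distinct_crs = {canonical(c) for c in all_combos}
redundant = len(all_combos) - len(distinct_crs)
print(len(all_combos), redundant, len(distinct_crs))
```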
Algorithm 5: Q-Learning with multiple observations
Initialization: Assign a random uniform distribution on channel selection.
1. With a small probability, choose an action uniformly at random to play. Otherwise, choose an action with the distribution assigned based on the observation history.
2. Receive the reward on the action. Make more observations on the subsequent channels other than the played one.
3. Use linear programming to optimize the action selection distribution.
IV-C Testing Scenarios
We consider several parameters to design the testing scenarios:
Type of PU Activity. We consider three types of PU activity: i.i.d., Markov chain, and arbitrary, where no well-defined distribution exists.
PU Load, which indicates the probability that the PU is active on each channel. The PU may have a high load on all the channels, or a light load on only one channel and a heavy load on all other channels (large gap). This testing scenario can discriminate between learning-based and non-learning-based access strategies, since by utilizing the observations and learning, a CR can distinguish the good channel from the low-rewarding ones. We have considered several combinations of PU activity on the channels.
Channel Rate. Three different values are considered as channel rates as shown in Fig. 5. If we assume all other characteristics of the channels to be identical, a CR that learns the high rate channel may be considered as having high load in the corresponding cognitive capability.
Frame Delivery Ratio (FDR) which includes the impact of channel quality and noise on a given channel. Three possible values for FDR are considered in this case study.
Figure 5 shows a summary of the parameters considered. Combining these parameters, we create test scenarios. Each CR needs to perform on each testing scenario so that its cognitive capabilities can be derived.
IV-D Performance Metrics
We measure the performance of the CRs based on three different metrics:
Throughput which is stored as where and indicate the testing scenario and the CR indices, respectively.
Delay, which indicates the total delay incurred in the time slot and is stored as .
Violation ratio, which represents the average number of times the CR interfered with the PU due to a wrong sensing result, called missed detection. It is assumed that if the CR interferes with the PU, the CR is penalized and its data is blocked, so the CR obtains no throughput. Violation ratio is stored in .
The performance measure data vector is equal to for and .
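How the throughput and violation-ratio metrics arise from per-slot outcomes can be illustrated with a small sketch. It assumes a single channel, no false alarms on idle slots, and a fixed per-slot rate; the function name `run_slots` and its parameters are hypothetical, and the delay metric is not modeled.

```python
import random


def run_slots(pu_states, detect_prob, rate, rng):
    """Return (mean throughput per slot, violation ratio) over a slot sequence.

    pu_states: iterable of booleans, True when the PU occupies the slot.
    detect_prob: probability of detecting the PU when it is present.
    """
    throughput = 0.0
    violations = 0
    n_slots = 0
    for pu_active in pu_states:
        n_slots += 1
        if pu_active:
            detected = rng.random() < detect_prob
            if not detected:
                # Missed detection: the CR transmits, collides with the PU,
                # and obtains no throughput in this slot.
                violations += 1
        else:
            # Idle slot (assumed correctly sensed): the CR transmits at `rate`.
            throughput += rate
    return throughput / n_slots, violations / n_slots
```

For example, `run_slots([False, False, True, True], 1.0, 2.0, random.Random(0))` yields a mean throughput of 1.0 and a violation ratio of 0.0, since perfect detection keeps the CR silent on both busy slots.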
IV-E Simulation Results
In this subsection, we present the simulation results and analyze the intelligence factors as well as the cognitive capabilities of the CRs. We divide our simulations into several phases. In the first phase, we consider the UCB1, EXP3, and random access based CRs. For each of the UCB1 and EXP3 policies, there are twenty-seven CRs according to Fig. 4. There are nine CRs utilizing the random access strategy.
Figure 6 shows the simulation result of the first metric, throughput. This is the total throughput obtained by aggregating the throughput achieved from all the testing scenarios for each CR applying the mentioned access strategies.
From this figure, three clusters can be identified. The first cluster (for cognitive radio index 1 to 27) represents CRs employing UCB1 learning-based access strategy. The second cluster (for cognitive radio index 28 to 54) belongs to the CRs employing EXP3 learning-based access strategy. The last cluster (for cognitive radio index 55 to 63) represents CRs utilizing random access strategies.
One observation is that, within each cluster, as the number of sensors increases, the overall throughput increases as well. Next, the total throughput of CRs employing UCB1 is higher than that of CRs employing EXP3, since most of the designed testing scenarios are well behaved (stochastic), in which case UCB1 performs better [18, 19]. The third cluster corresponds to the CRs employing random access. Since the random strategy never utilizes previous observations, it achieves the lowest throughput of all. The graphs also show that within every three consecutive CRs (i.e., three consecutive bars in the graph), the throughput decreases because the sensing accuracy decreases.
In the next step, we conduct data analysis via FA. From the simulations, three matrices are generated for the three metrics we measure. Together, they form the data matrix with the dimension of . FA is applied to this matrix using the IBM SPSS software.
The analysis identifies four latent factors, as shown in Fig. 7. Only four factors are distinguishable; the rest are negligible, with values close to zero. Due to limited space, we skip the detailed output data of the FA results. Even though the number of latent factors is identified, it is not yet clear which cognitive capabilities these factors correspond to. We need to examine the data thoroughly and find the corresponding cognitive capabilities by matching them to the CR functions.
By examining the data, the four latent factors (cognitive capabilities) are found as follows: Spectrum sensing capability, processing speed capability, environment recognition capability, and environment adaptation capability. The results are summarized in the first four rows of the Table I.
As we study the FA results, the data of the first factor provide information on the violation ratio, which is affected by the sensing accuracy and the number of sensors. We therefore conclude that the first latent factor corresponds to the spectrum sensing capability. The second latent factor addresses the delay, which is associated with the processing speed capability due to the hardware limitations of the CR. The third factor is related to the learning capability, or specifically the environment recognition capability. The fourth factor shows better performance for EXP3 and the random access strategy than for UCB1 when the sensing accuracy decreases. The same happens when the environment is not well behaved. This indicates that EXP3 and the random access strategy adapt better to non-well-behaved environments, because they utilize randomness in their access strategy, which makes them more resilient to changes in the environment. Deterministic approaches assume a stable environment, which makes them vulnerable to changes in the environment. As a result, this latent factor addresses the environment adaptation capability.
Comparing to the intelligence model proposed in Section II, the processing speed capability matches the broad cognitive capability , the spectrum sensing matches , and the two others correspond to or as shown in Table I. In addition, all the CRs used in this case-study have high load on the factor.
|Factor I||Sensing Capability,|
|Factor II||Processing Speed Capability,|
|Factor III||Environment Recognition Capability, or|
|Factor IV||Environment Adaptation Capability, or|
|Factor V||Algorithmic Processing Time,|
Next, we plot the components obtained through the analysis. Component plot shows how the scenarios in the case study belong to each of the four latent factors. Since it is not possible to plot four dimensional figures, we plot the components for factors 1, 2 and 3 as shown in Fig. 8. The whole data is divided into three clusters, each corresponding to one latent factor.
To gain deeper insight into the results, we also apply the FA technique to only one of the performance metrics, throughput. In this more limited case, only two factors are identified, as shown in Fig. 9. One of them corresponds to the learning capability and the other corresponds to the environment adaptation capability. Figure 10 shows the components of the analyzed data, in which the whole data set is divided into two clusters, each corresponding to one latent factor.
In the next phase of our simulation, we add the rest of the learning based CRs applying POLA, PROLA, and Q-Learning to the ones we considered earlier to make a comprehensive list of CRs with different capabilities. Each of the CRs performs in the testing scenarios one by one. Three performance metrics are measured. This means that three matrices are generated, each with a dimension of . The combination of these matrices results in the data matrix with dimension .
As shown in Fig. 11, the performance of PROLA is similar to that of EXP3. Algorithmically, the only difference between these two algorithms is that in EXP3 the agent observes the reward on the same action it takes, while in PROLA the agent makes an observation on an action other than the one it takes. Our analysis shows that the cognitive capabilities of PROLA are almost the same as those of EXP3. All three algorithms are designed for non-stochastic environments. As shown in the figure, the POLA algorithm achieves a lower throughput than the other two. This is because POLA cannot take an action and make an observation simultaneously in the same time step; instead, it decides at each time step to do one or the other. This leads to a lower environment recognition capability, and as a result POLA has a lower load in this cognitive capability than the others. In contrast, EXP3 and PROLA demonstrate almost equal loads with respect to this cognitive capability. This indicates that non-stochastic online learning algorithms do not necessarily demonstrate the same cognitive capabilities and should not be categorized into the same group.
Similarly, Fig. 12 shows the performance comparison of Q-Learning and UCB1. These two algorithms are both designed for stochastic environments. As deterministic algorithms, they do not consider randomness in their policies. Our results indicate that both algorithms show high loads in the cognitive capability of environment recognition. However, their environment adaptation capability is low. Q-Learning demonstrates a low load in the cognitive capability of algorithmic processing. This is because, at each time slot, the Q-Learning algorithm solves an optimization problem in order to update the action policy. In contrast, the UCB1 algorithm updates the action policy at each time slot using simple addition and multiplication operations.
Finally, we derive the latent factors as shown in Fig. 13. Five cognitive factors are identified with the fifth factor as the algorithmic processing time. Table I shows the whole list of factors identified in our case study.
Fig. 14 shows the component plot for the whole data set used in this case study. Since there are five latent factors, the component plot is five-dimensional. To represent the five-dimensional data, we fix two of the latent factors and then plot three figures considering the third, fourth, and fifth latent factors, respectively.
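To make the notion of a factor load more concrete, the following is a minimal sketch of how loadings can be extracted from a performance data matrix. It is not the procedure used in this case study: it extracts only the first factor, uses the principal-component method on the correlation matrix, and finds the dominant eigenvector by power iteration. The function names and the toy data layout (rows as test runs, columns as measured metrics) are assumptions made for illustration.

```python
import math

def correlation_matrix(data):
    """Pearson correlation between the columns of `data` (a list of rows).

    Assumes every column has non-zero variance.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n)
            for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
             / (n * stds[i] * stds[j])
             for j in range(p)]
            for i in range(p)]

def first_factor_loadings(data, iters=200):
    """Loadings of the first factor (principal-component extraction).

    Returns (loadings, eigenvalue). Power iteration is started from the
    all-ones direction, so it assumes that direction is not orthogonal
    to the dominant eigenvector.
    """
    R = correlation_matrix(data)
    p = len(R)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue of the converged direction.
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(p))
                     for i in range(p))
    # A loading is the correlation between a variable and the factor.
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return loadings, eigenvalue
```

Calling `first_factor_loadings(rows)` on a list of equal-length numeric rows returns one loading per column; a loading close to 1 or -1 means that the metric is almost fully explained by the factor. A full analysis would extract several factors and usually rotate them, which this sketch omits.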
V Related Work
V-A Learning-based Algorithms
V-A1 Multi-Armed Bandits
There is a rich literature on multi-armed bandits (MAB). MAB problems have many applications in cognitive radio networks with learning capabilities [1, 3, 29]. In an MAB problem, an agent plays a machine repeatedly and obtains a reward each time it takes an action. Whenever it chooses an action, the agent faces a dilemma: take the most rewarding action known so far, or try other actions in the hope of finding a better one. To learn and optimize its actions, the agent must trade off exploration against exploitation. On the one hand, it needs to explore all actions often enough to learn which is the most rewarding; on the other hand, it needs to exploit the action it believes to be best in order to minimize its overall regret.
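For reference, one common way to formalize the regret of an agent over $T$ time steps is shown below. The notation is introduced here for illustration and is not taken from this paper: $K$ is the number of actions, $x_i(t)$ is the reward of action $i$ at time $t$, and $I_t$ is the action the agent takes at time $t$.

\[
R_T = \max_{1 \le i \le K} \sum_{t=1}^{T} x_i(t) - \mathbb{E}\left[ \sum_{t=1}^{T} x_{I_t}(t) \right]
\]

Under this definition, the regret compares the agent's expected cumulative reward with that of the single best action in hindsight, so pure exploration and pure exploitation can both lead to large regret.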
MAB problems have been studied in different settings. Stochastic MAB problems, in which the rewards on each arm are generated i.i.d., are studied in . The algorithm proposed is called UCB1 . UCB1 is a deterministic strategy that assumes the agent observes the reward of the action it takes. Another algorithm with the same assumption on reward observability is EXP3 . EXP3 is designed for non-stochastic (adversarial) environments in which the adversary is oblivious.
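The contrast between the two strategies can be illustrated with a minimal sketch of their textbook forms, assuming rewards in [0, 1]. This is not the implementation used in our simulations; the function names, the `reward_fn` callback, and the default value of `gamma` are assumptions made for illustration. UCB1 deterministically picks the arm with the largest index (empirical mean plus a confidence bonus), whereas EXP3 samples an arm from a distribution that mixes exponential weights with uniform exploration.

```python
import math
import random

def ucb1(reward_fn, n_arms, horizon):
    """Textbook UCB1: returns how many times each arm was played."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(horizon):          # t = number of plays done so far
        if t < n_arms:
            arm = t                   # initialization: play every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = reward_fn(arm)       # only the played arm's reward is seen
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts

def exp3(reward_fn, n_arms, horizon, gamma=0.1, rng=None):
    """Textbook EXP3: returns how many times each arm was played."""
    rng = rng or random.Random(0)
    weights = [1.0] * n_arms
    counts = [0] * n_arms
    for _ in range(horizon):
        total = sum(weights)
        probs = [(1.0 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)       # only the played arm's reward is seen
        estimate = reward / probs[arm]            # importance-weighted estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        top = max(weights)
        weights = [w / top for w in weights]      # rescale to avoid overflow
        counts[arm] += 1
    return counts
```

With a fixed reward per arm, for example `ucb1(lambda a: [0.2, 0.8][a], 2, 1000)`, both functions concentrate their plays on the better arm; only EXP3 keeps a non-zero probability of playing every arm at every step, which is what protects it against an adversarial reward sequence.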
There are also other types of MAB problems in which the agent cannot observe the reward of the action it takes. These are usually called multi-armed bandits with side observations . Two applications of MAB problems with side observations are addressed in [2, 3]. The PROLA algorithm is designed for an agent that takes an action but, instead of observing the reward of that action, observes the reward of another action, different from the one played . Another algorithm, POLA, is designed for an agent that not only cannot observe the reward of the action it plays, but also cannot both take an action and switch to another action to observe its reward in the same time step .
V-A2 Reinforcement Learning
Reinforcement learning (RL) is a branch of machine learning designed for online learning. Similar to MAB problems, RL methods need to trade off exploration against exploitation. Q-learning is a well-studied reinforcement learning technique that can be used to find the optimal action selection policy [20, 25]. The environment is usually modeled as a Markov decision process.
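As an illustration, the core of tabular Q-learning in its textbook form is a single update rule. The sketch below shows that rule only; it is not necessarily the Q-learning variant evaluated in our case study, and the function name, the nested-dictionary table layout, and the default values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    `q` maps each state to a dict of {action: estimated value}.
    """
    # Value of the best action available in the next state.
    best_next = max(q[next_state].values())
    # Target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction `alpha` toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]
```

For example, with hypothetical channel states `"idle"` and `"busy"` and actions `"stay"` and `"hop"`, the table is a dictionary such as `{"idle": {"stay": 0.0, "hop": 0.0}, "busy": {"stay": 1.0, "hop": 2.0}}`, and each observed transition triggers one call to `q_update`.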
V-B Related Work on CR Intelligence
The intelligence measure of CRs has not been well studied in the literature. However, there are various studies on evaluating the performance of CRs. A methodology for testing a CR system is presented in . The effect of the cognitive engine on both SU and PU performance is measured and evaluated. The authors suggest that cognition may be measured by the SU’s capability to improve its throughput while decreasing interference to the PU, and they call their method behavior-based testing. In other words, their goal is to measure SU cognition through the performance of both the SU and the PU rather than by evaluating the SU cognition itself. The testing scenarios are defined as narrow-band or wide-band environments. The PU workloads and SU cognition considered in that work are limited, and the authors note that more research is required to justify behavior-based cognition testing. Unlike our work, it does not use statistical tools or psychometrics, so our approach is fundamentally different.
The performance of cognitive radios is studied in , which considers four cognitive radio algorithms and aims to identify those that outperform the others often enough. The authors also study how sensitive the different algorithms are to suboptimal parameters. Simulations show that the algorithms that usually outperform the others are highly sensitive to suboptimal parameters, while those with lower performance behave more steadily and are more robust to suboptimal parameters. The conclusion is that there is a trade-off between performance and consistency. The goal of that work is to compare the performance of different learning-based algorithms and to identify those that perform consistently and depend less on parameter values. In contrast, we derive the cognitive capabilities of CRs, which is a different aspect of CR intelligence measurement.
V-C Cognitive Capabilities of Humans
The cognitive capabilities and the intelligence model of human beings have been studied extensively in psychology . Human cognitive capabilities include sensing, learning, memory, problem solving, etc. Intelligence is defined as the ability to learn and perform cognitive tasks . The Cattell-Horn-Carroll (CHC) model  is the most widely accepted model of human intelligence [33, 8].
The practical measurement of mental abilities is considered a pivotal development in the behavioral sciences, and the resulting theories and techniques formed a field called “psychometrics”. The first attempts at a mathematically more rigorous study of intelligence measurement occurred in the 1940s, with statistical techniques such as correlation and FA. Today, FA is used in multiple areas, including psychology and economics.
There have been some efforts to develop comprehensive benchmark frameworks for evaluating the performance of cognitive radio networks , or of more general wireless networks [35, 36, 37]. Since benchmarking wireless networks is challenging, simulation has been widely adopted as a tool in the literature. However, such benchmark studies are designed to evaluate CR performance, not to test CR intelligence.
It is helpful to identify the differences between testing human intelligence and testing CR intelligence. One is that, for human beings, the age of the test taker is an important factor in designing the test questions, for example in childhood, when the brain is still developing. In contrast, a testing scenario can be applied to all types of CRs.
Another important difference is that a human being can become tired during a long test or may be unable to focus on the test day, which can make the test results unreliable. A CR does not suffer from fatigue or loss of focus, so its test results are repeatable and are not biased by such factors.
VI Discussion and Open Problems
VI-A Cognitive Capabilities of Routing Algorithms
Our current work addresses the intelligence measure of CRs acting at the MAC layer. The intelligence measure of CRs at the routing layer is an interesting direction for future research. Some preliminary work has been done on learning-based routing methods , where the authors ask whether machine learning, including deep reinforcement learning, can replace traditional network protocol design. It is shown that data-driven routing methods that extract information from the traffic history achieve better performance. For any learning-based routing algorithm designed for cognitive radio networks, we can similarly measure its intelligence and cognitive capabilities. This could lead to better routing algorithms and better network configurations that maximize network throughput while minimizing costs.
VI-B Item Response Theory and IQ Measure
After extracting the intelligence factors and identifying the cognitive capabilities of CRs, the next step would be to combine these capabilities into a single quantitative value called the Intelligence Quotient (IQ). This corresponds to the unique general intelligence factor g in Stratum III shown in Fig. 1. To do so, one first needs to make sure that the test scenarios are comprehensive and standardized. In other words, the testing items shown in Fig. 2 should include all types of test scenarios, from easy to hard. Item Response Theory (IRT) , a design, analysis, and scoring paradigm for tests, is the tool needed to quantify the difficulty of test scenarios. Using IRT to design optimal test scenarios and to develop IQ measurement methods is another interesting direction for future research.
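To illustrate how IRT quantifies difficulty, the sketch below implements the standard two-parameter logistic (2PL) item response function, in which the probability of passing an item depends on the test taker's ability `theta`, the item's discrimination `a`, and its difficulty `b`. Which IRT model would suit CR test scenarios is an open question, so the choice of the 2PL model, the function names, and the idea of treating a test scenario as an item with a pass/fail outcome are all assumptions made for illustration.

```python
import math

def pass_probability(theta, a, b):
    """2PL item response function: P(pass | ability theta).

    a: item discrimination (how sharply the item separates abilities)
    b: item difficulty (the ability at which the pass probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = pass_probability(theta, a, b)
    return a * a * p * (1.0 - p)
```

Under this model, a harder scenario (larger `b`) lowers the pass probability for the same ability, and an item is most informative for test takers whose ability is close to its difficulty, which is why a standardized test needs items spread from easy to hard.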
VI-C Configuring the Network with a Combination of CRs with Different Intelligence
As explained in the introduction, cognitive radio networks can be configured by integrating CRs with different intelligence and cognitive capabilities. This may lead to a more efficient use of resources and would also be more cost-efficient. More comprehensive research is needed to quantitatively measure the performance of such networks and to show rigorously how one or a few CRs with higher intelligence can lead, and be networked with, other CRs with lower intelligence.
VII Conclusion
In this paper, we have proposed, for the first time, the idea of deriving an intelligence measure and analyzing the cognitive capabilities of the CR. We proposed an intelligence model for the CR and a data-driven methodology that applies FA techniques to identify CR intelligence factors and cognitive capabilities. In a case study based on extensive simulations, five latent factors were identified for the CR that comply well with the nature of the tested CRs.
Our ongoing effort is focused on measuring the intelligence quotient (IQ) for each CR. IQ can be considered as the general intelligence factor that indicates how well a CR performs in uncertain environments. We will also expand our methods to measure CR intelligence in multi-user and multi-hop networks.
Acknowledgment
M. Dabaghchian, A. Alipour-Fanid, S. Liu, and K. Zeng are partially supported by the NSF under grant No. CNS-1502584, CNS-1464487, and CNS-1619073. X. Li and Y. Chen are supported by NSF via grant CNS-1443885.
References
- [1] H. Li and Z. Han, “Dogfight in spectrum: Combating primary user emulation attacks in cognitive radio systems; Part II: Unknown channel statistics,” IEEE Transactions on Wireless Communications, vol. 10, no. 1, pp. 274–283, 2011.
- [2] M. Dabaghchian, A. Alipour-Fanid, K. Zeng, and Q. Wang, “Online learning-based optimal primary user emulation attacks in cognitive radio networks,” in IEEE Communications and Network Security (CNS 2016), Oct 2016.
- [3] M. Dabaghchian, A. Alipour-Fanid, K. Zeng, Q. Wang, and P. Auer, “Optimal online learning with randomized feedback graphs with application in PUE attacks in CRN,” CoRR, vol. abs/1709.10128, 2017. [Online]. Available: http://arxiv.org/abs/1709.10128
- [4] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty, “Next generation/dynamic spectrum access/cognitive radio wireless networks: A survey,” Computer Networks, vol. 50, no. 13, pp. 2127–2159, 2006.
- [5] Q. Zhao and B. M. Sadler, “A survey of dynamic spectrum access,” IEEE Signal Processing Magazine, vol. 24, no. 3, pp. 79–89, May 2007.
- [6] K. G. Shin, H. Kim, A. W. Min, and A. Kumar, “Cognitive radios for dynamic spectrum access: from concept to reality,” IEEE Wireless Communications, vol. 17, no. 6, pp. 64–74, December 2010.
- [7] M. J. Marcus, “Spectrum policy for radio spectrum access,” Proceedings of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1685–1691, May 2012.
- [8] SDR Forum, “Cognitive radio definition and nomenclature,” SDRF-06-P-0009, 2008.
- [9] International Civil Aviation Organization, “Potential for radio frequency interference to aeronautical surveillance systems for new terrestrial communications,” Montreal, Canada, Apr. 2012.
- [10] A. Alipour-Fanid, M. Dabaghchian, H. Zhang, and K. Zeng, “String stability analysis of cooperative adaptive cruise control under jamming attacks,” in 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE), Jan 2017, pp. 157–162.
- [11] A. Alipour-Fanid, M. Dabaghchian, and K. Zeng, “Platoon stability and safety analysis of cooperative adaptive cruise control under wireless Rician fading channels and jamming attacks,” CoRR, vol. abs/1710.08476, 2017. [Online]. Available: https://arxiv.org/abs/1710.08476
- [12] D. W. Matolak, “Unmanned aerial vehicles: Communications challenges and future aerial networking,” in 2015 International Conference on Computing, Networking and Communications (ICNC), Feb 2015, pp. 567–572.
- [13] V. C. Gungor, D. Sahin, T. Kocak, S. Ergut, C. Buccella, C. Cecati, and G. P. Hancke, “Smart grid technologies: Communication technologies and standards,” IEEE Transactions on Industrial Informatics, vol. 7, no. 4, pp. 529–539, Nov 2011.
- [14] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, “Internet of things for smart cities,” IEEE Internet of Things Journal, vol. 1, no. 1, pp. 22–32, Feb 2014.
- [15] M. Dabaghchian, S. Liu, A. Alipour-Fanid, K. Zeng, X. Li, and Y. Chen, “Intelligence measure of cognitive radios with learning capabilities,” in 2016 IEEE Global Communications Conference (GLOBECOM), Dec 2016, pp. 1–6.
- [16] K. S. McGrew, “Editorial: CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research,” Intelligence, p. 10, 2009.
- [17] H. Kestelman, “The fundamental equation of factor analysis,” British Journal of Statistical Psychology, vol. 5, no. 1, pp. 1–6, 1952. [Online]. Available: http://dx.doi.org/10.1111/j.2044-8317.1952.tb00106.x
- [18] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Mach. Learn., vol. 47, no. 2-3, pp. 235–256, May 2002. [Online]. Available: http://dx.doi.org/10.1023/A:1013689704352
- [19] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, “The nonstochastic multiarmed bandit problem,” SIAM J. Comput., vol. 32, no. 1, pp. 48–77, Jan. 2003. [Online]. Available: http://dx.doi.org/10.1137/S0097539701398375
- [20] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3, pp. 279–292, May 1992. [Online]. Available: https://doi.org/10.1007/BF00992698
- [21] S. A. Mulaik, Foundations of Factor Analysis. CRC Press, 2009.
- [22] S. Atapattu, C. Tellambura, and H. Jiang, “Energy detection based cooperative spectrum sensing in cognitive radio networks,” IEEE Transactions on Wireless Communications, vol. 10, no. 4, pp. 1232–1241, April 2011.
- [23] W. Zhang, R. K. Mallik, and K. B. Letaief, “Optimization of cooperative spectrum sensing with energy detection in cognitive radio networks,” IEEE Transactions on Wireless Communications, vol. 8, no. 12, pp. 5761–5766, December 2009.
- [24] W. l. Chin, H. c. Kuo, and H. h. Chen, “Features detection assisted spectrum sensing in wireless regional area network cognitive radio systems,” IET Communications, vol. 6, no. 8, pp. 810–818, May 2012.
- [25] M. L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Proceedings of the Eleventh International Conference on International Conference on Machine Learning, ser. ICML’94. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1994, pp. 157–163. [Online]. Available: http://dl.acm.org/citation.cfm?id=3091574.3091594
- [26] M. Grant and S. Boyd, “Graph implementations for nonsmooth convex programs,” in Recent Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences, V. Blondel, S. Boyd, and H. Kimura, Eds. Springer-Verlag Limited, 2008, pp. 95–110, http://stanford.edu/~boyd/graph_dcp.html.
- [27] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” http://cvxr.com/cvx, Mar. 2014.
- [28] IBM SPSS. [Online]. Available: http://www.ibm.com/analytics/us/en/technology/spss/
- [29] Y. Gai, B. Krishnamachari, and R. Jain, “Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation,” in 2010 IEEE Symposium on New Frontiers in Dynamic Spectrum (DySPAN), April 2010, pp. 1–9.
- [30] N. Alon, N. Cesa-Bianchi, O. Dekel, and T. Koren, “Online learning with feedback graphs: Beyond bandits,” CoRR, vol. abs/1502.07617, 2015. [Online]. Available: http://arxiv.org/abs/1502.07617
- [31] J. J. Thompson, K. M. Hopkinson, and M. D. Silvius, “A test methodology for evaluating cognitive radio systems,” IEEE Transactions on Wireless Communications, vol. 14, no. 11, pp. 6311–6324, Nov 2015.
- [32] A. Hess, F. Malandrino, N. J. Kaminski, T. K. Wijaya, and L. A. DaSilva, “Cognitive radio algorithms coexisting in a network: Performance and parameter sensitivity,” IEEE Transactions on Cognitive Communications and Networking, vol. 2, no. 4, pp. 381–396, Dec 2016.
- [33] R. J. Sternberg and S. B. Kaufman, Eds., The Cambridge Handbook of Intelligence. Cambridge University Press, 2011, Cambridge Books Online. [Online]. Available: http://dx.doi.org/10.1017/CBO9780511977244
- [34] Y. Zhao, S. Mao, J. O. Neel, and J. H. Reed, “Performance evaluation of cognitive radios: Metrics, utility functions, and methodology,” Proceedings of the IEEE, vol. 97, no. 4, pp. 642–659, April 2009.
- [35] N. Patwari and S. K. Kasera, “CRAWDAD Utah CIR measurements,” http://crawdad.cs.dartmouth.edu/meta.php?name=utah/CIR.
- [36] S. Rehman, T. Turletti, and W. Dabbous, “A roadmap for benchmarking in wireless networks,” Technical Report, INRIA, 2011.
- [37] G. Jourjon, T. Rakotoarivelo, C. Dwertmann, and M. Ott, “Executable paper challenge: Labwiki: an executable paper platform for experiment-based research,” Procedia Computer Science, 2011.
- [38] A. Valadarsky, M. Schapira, D. Shahaf, and A. Tamar, “A machine learning approach to routing,” CoRR, vol. abs/1708.03074, 2017. [Online]. Available: http://arxiv.org/abs/1708.03074
- [39] S. E. Embretson and S. P. Reise, Item Response Theory for Psychologists. LEA, 2000.