With the rapid development of electronic devices and mobile computing techniques, worldwide societal trends have demonstrated unprecedented changes in the way wireless communications are used. It is predicted that the monthly traffic of smartphones around the world will be about 50 exabytes in 2021 [1], which is about 12 times that in 2016. Wireless communications have become indispensable to our society and are involved in many aspects of our lives. Many familiar scenarios, such as ultra-dense residential areas and office towers, subways, highways, and high-speed railways, challenge future mobile networks in terms of ultra-high traffic volume density, ultra-high connection density, or ultra-high mobility. Because efficient allocation of radio resources can guarantee users’ Quality of Service (QoS) and optimize the usage of facilities to maximize operators’ revenue, it remains a hot topic for future wireless communications [2].
In practical networks, the overall performance depends on how well the fluctuations of wireless channels and traffic loads are exploited to manage the hyper-dimensional radio resources (such as frequency bands, time slots, orthogonal codes, transmit power, and transmit-receive beams) efficiently, dynamically, and fairly while supporting users’ QoS requirements. On one hand, radio resources are inherently scarce, since all users competitively share the common electromagnetic spectrum and wireless infrastructures. On the other hand, wireless services have become increasingly sophisticated and varied, each with a wide range of QoS requirements. Efficient and robust resource allocation algorithms are essential for the success of future mobile networks. Conventionally, resource allocation problems are formulated mathematically as optimization problems. After collecting instantaneous channel state information (CSI) and QoS requirements of users, the formulated optimization problems are solved online. That is, the solutions must be obtained quickly since wireless channels and traffic loads vary quickly. However, most of the optimization problems are not convex [3], which indicates that the optimal solutions are often very difficult to obtain, especially in scenarios with many users and diverse radio resources. Therefore, conventional Lagrangian relaxation or greedy methods are often employed to find solutions online. Inevitably, such online solutions result in performance loss. With increasing user QoS requirements, conventional methods face great challenges in designing more sophisticated resource allocation schemes to further improve system performance with scarce radio resources, which motivates the exploration of a novel design philosophy for resource allocation.
In March 2016, a five-game Go match was held between 18-time world champion Lee Sedol and AlphaGo, a computer Go program developed by Google DeepMind [4]. From the viewpoint of conventional computing theory, Go had previously been regarded as an extremely difficult problem that was expected to be out of reach for state-of-the-art technologies. Surprisingly, AlphaGo found its moves based on the knowledge previously “learned” from historical match records and won all but the fourth game. Inspired by the victory of AlphaGo, how to apply machine learning techniques to address the challenges in future communications has attracted great attention and has been discussed widely [5, 6].
In practical wireless communications, the radio resources are dynamically allocated according to instantaneous information, including the CSI and QoS requirements of users. Inexpensive cloud storage makes it very easy to save this information as data on historical scenarios that previously would have been ignored and discarded. Recent investigations have found that these data convey many similarities between current and historical scenarios in terms of user requirements and wireless propagation environments [7]. Using the similarities among scenarios, the solutions of resource allocation in historical scenarios can be exploited to improve the resource allocation of the current scenario. More specifically, the solutions of resource allocation in historical scenarios can be searched offline and stored in advance. When the measured data of the current scenario arrives, it is not necessary to use conventional Lagrangian relaxation or greedy methods to solve the resource allocation problem online. Instead, we only need to compare the current scenario with historical scenarios and find the most similar one. Then, we use the solution of the most similar historical scenario to allocate the radio resources for the current scenario. Interestingly, the offline characteristic makes it possible to use advanced cloud computing techniques to find optimal or near-optimal solutions of resource allocation for historical scenarios, which can improve the performance of resource allocation accordingly.
II Mathematical Modeling of Resource Allocation
As illustrated in Fig. 1, the architecture of wireless communications assisted by cloud computing consists of three main components: (i) configurable computing resources clustered as a cloud with high computational and storage capabilities, (ii) a base station (BS) with wireless access functions, and (iii) backhaul links which deliver the measured data of real scenarios from the BS to the cloud and deploy the machine learning based resource allocation schemes at the BS. More details will be discussed in the next section. In general, the resource allocation, which is performed at the BS, can be formulated as a mathematical optimization problem, given by

$$\begin{aligned} \underset{\mathbf{x}}{\text{minimize}} \quad & f(\mathbf{x}, \mathbf{a}) \\ \text{subject to} \quad & g_i(\mathbf{x}, \mathbf{a}) \le 0, \quad i = 1, \ldots, m, \\ & h_j(\mathbf{x}, \mathbf{a}) = 0, \quad j = 1, \ldots, p, \\ & \mathbf{x} \in \mathcal{X}, \end{aligned} \tag{1}$$

where $\mathbf{x}$ is the variable vector of the problem, $f$ is the objective function to be minimized over the vector $\mathbf{x}$, $\mathbf{a}$ is the parameter vector that specifies the problem instance, $g_i$ and $h_j$ are called inequality and equality constraint functions, respectively, and $\mathcal{X}$ is called a constraint set. By convention, the standard form defines a minimization problem. A maximization problem can be treated by negating the objective function.
If a resource allocation problem is formulated in the form (1), all elements in the vector $\mathbf{x}$ are referred to as variables, which describe the allocated amount or configuration of radio resources, such as the transmit power level and the assigned subcarrier index. All elements in the vector $\mathbf{a}$ are system parameters or wireless propagation parameters, such as the bandwidth, the subcarrier number, and the background noise level. The constraint functions $g_i$ and $h_j$ are used to define the specific scenario and the limitations on the resource allocation, such as the available amount of radio resources, users’ QoS requirements, and the impact of all kinds of interference and noise. The objective function describes the characteristics of the best possible solution and reveals the design objective, i.e., the key performance metrics for resource allocation. For a specified scenario described by $\mathbf{a}$, the optimal solution of resource allocation is the vector $\mathbf{x}$ that obtains the best value of the objective function among all possible vectors and satisfies all constraints.
III A Machine Learning Framework
For existing wireless systems assisted by cloud computing, a huge amount of data on historical scenarios may have been collected and stored at the cloud. The strong computing capability of the cloud is exploited to search for optimal or near-optimal solutions for these historical scenarios. By classifying these solutions, the similarities hidden in these historical scenarios are extracted as a machine learning based resource allocation scheme, which is then forwarded to the BS to guide it in allocating radio resources more efficiently. When a BS is deployed in a new area, there is usually no available data about historical scenarios. In this case, the initial historical data can be generated from an abstract mathematical model with realistic BS locations, accurate building footprints, presumptive user distribution and requirements, and wireless propagation models. Once the new BS enters service, the measured data of real-time scenarios will be collected from the practical system and later used as historical data for learning.
The proposed machine learning framework is shown in Fig. 2. At the cloud, a huge amount of historical data on scenarios is stored using cloud storage. The historical data has many attributes, including the user number, the CSI of users, the international mobile subscriber identification numbers (IMSIs) of users, and so on. Some attributes, such as the IMSIs of users, may be irrelevant for the specific resource allocation, i.e., these irrelevant attributes are not included in the parameter vector of the optimization problem (1). Learning from a large amount of raw data with many attributes generally requires a large amount of memory and computation power, and it may affect the learning accuracy. Therefore, the irrelevant attributes can be removed without incurring much loss of data quality. In order to reduce the dimensionality of the data and enable the learning process to operate faster and more effectively, feature selection is carried out to identify and remove as many irrelevant attributes as possible, which will be discussed in Section IV-A.
Through feature selection, some key attributes are selected from the historical data and presented as a feature vector. However, operation faults may occur in data measurement, transmission, and storage, resulting in abnormal, incomplete, or duplicate values in feature vectors. Therefore, preprocessing is required to delete erroneous or duplicate feature vectors. Then, all remaining feature vectors are collected to form a very large dataset. Further, all feature vectors in the dataset are split randomly into a training set and a test set. Normally, 70-90% of the feature vectors are assigned to the training set.
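The preprocessing and splitting steps described above can be sketched as follows. This is only a minimal illustration, not the implementation used in this article: the function names `clean` and `split`, the use of `None` or NaN to mark an incomplete value, and the 80% training fraction (one point inside the 70-90% range) are all assumptions.

```python
import math
import random

def clean(vectors):
    """Drop incomplete (None/NaN) and duplicate feature vectors, keeping order."""
    seen, kept = set(), []
    for v in vectors:
        if any(x is None or (isinstance(x, float) and math.isnan(x)) for x in v):
            continue  # incomplete or abnormal measurement
        key = tuple(v)
        if key in seen:
            continue  # duplicate feature vector
        seen.add(key)
        kept.append(key)
    return kept

def split(vectors, train_fraction=0.8, seed=0):
    """Randomly split the dataset into a training set and a test set."""
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical raw feature vectors with one duplicate and two incomplete rows.
raw = [(0.1, 0.2), (0.1, 0.2), (0.3, None), (0.5, float("nan")),
       (0.7, 0.8), (0.9, 1.0), (1.1, 1.2), (1.3, 1.4)]
dataset = clean(raw)
train_set, test_set = split(dataset)
```

Fixing the random seed makes the split reproducible, which helps when the same test set is reused to compare successive versions of the predictive model.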
With the training set, a supervised learning algorithm is adopted to find the similarities hidden in historical data. By doing so, a predictive model can be built which will be used to make resource allocation decisions for future unexpected scenarios. More specifically, with the aid of cloud computing, advanced computing techniques can be used to search for solutions of the optimization problem (1) with more computational time. Compared with conventional Lagrangian relaxation or greedy methods, the performance of the searched solutions can be improved significantly. Therefore, a high-performance solution of resource allocation can be searched offline and associated with each training feature vector, which will be discussed in Section IV-B. All training feature vectors with the same solution are classified into one class, and each class is associated with its own solution. The resource allocation problem is now transformed into a multiclass classification problem, which will be discussed in Section IV-C. In order to solve the multiclass classification problem, a predictive model will be built with two functions. The first is to predict the class of a future scenario, which can be mathematically described as a classifier that maps the input feature vector extracted from the scenario to an output class index indicating the class to which the scenario belongs. The second is to select the associated solution of the predicted class to allocate radio resources for the scenario depicted by the feature vector. Before deployment, the newly built predictive model is evaluated on the test set and further optimized until the evaluation results are satisfactory.
Using the backhaul links, the built predictive model and the associated solutions of all classes will be transmitted to the BS. At the BS, the measured data of a real-time scenario is first used to form its new feature vector. Then the new feature vector is input into the built predictive model to allocate radio resources. Meanwhile, the new feature vector will be collected and stored temporarily at the BS and forwarded to the cloud later for updating the dataset, which is very important for tracing the evolution of real scenarios, including user behaviors and wireless propagation environments.
Although a lot of computing resources are consumed to build a predictive model, the computing work can be carried out offline during the off-peak time. Moreover, the dataset updating and the model deployment can also be accomplished during the off-peak time. Therefore, the cloud can be shared with multiple BSs and the computing tasks can be flexibly scheduled to make full use of the available computing resources.
IV Application of Supervised Learning to Resource Allocation
In the proposed machine learning framework, a machine learning algorithm is adopted to build a predictive model. Generally speaking, machine learning algorithms are categorized as either supervised or unsupervised. In supervised learning, the goal is to learn from training data that are labeled with nonnegative integers or classes, in order to later predict the correct response when dealing with new data. The supervised approach is indeed similar to human learning under the supervision of a teacher. The teacher provides good training examples for the student, and the student then derives general rules from these specific examples. In contrast to supervised learning, the data for unsupervised learning have no labels, and the goal instead is to organize the data and find hidden structures in unlabeled data. Most machine learning algorithms are supervised. In the following, we will discuss how to apply supervised learning to solve the resource allocation problem.
IV-A Feature Selection
In machine learning, feature selection, also known as attribute selection, is the process of selecting a subset of relevant attributes in historical data to form feature vectors for building predictive models. The selection of an appropriate feature vector is critical due to the phenomenon known as “the curse of dimensionality”. That is, each dimension that is added to the feature vector requires exponentially more data in the training set, which in practice usually results in significant performance degradation. Therefore, it is necessary to find low-dimensional feature vectors that capture the essence of resource allocation in practical scenarios.
In order to reduce the dimensionality of feature vectors, only information that is valuable for the resource allocation should be selected as features. After modeling the resource allocation as the optimization problem (1), all valuable information is included in the parameter vector of the problem. Observing the elements of this vector, it can be found that they can be further divided into two categories: time-variant (dynamic) and time-invariant (static). Some elements are constants and thus labeled as time-invariant parameters, such as the subcarrier number, the maximum transmit power, and the antenna number. Other elements, which change quickly and must be measured and fed back continuously for making resource allocation decisions, are labeled as time-variant parameters, such as the user number, the CSI of all users, and the interference levels. As the time-invariant parameters remain unchanged, only the time-variant parameters need to be considered as features in order to minimize the dimension of the feature vectors. Moreover, some time-variant parameters need not be selected as features, since they may be redundant in the presence of another relevant feature with which they are strongly correlated. In short, an individual feature vector specifies a unique scenario for resource allocation. However, it should be noted that feature selection is a process of trial and error, which can be time consuming and costly, especially with very large datasets.
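The two screening rules above (discard time-invariant parameters, then discard parameters that are strongly correlated with an already selected feature) can be sketched as follows. This is only an illustrative sketch: the attribute names, the sample records, and the correlation threshold of 0.95 are assumptions, and real feature selection remains a trial-and-error process.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def select_features(records, names, corr_threshold=0.95):
    """Return the names of the attributes kept as features."""
    columns = {name: [r[i] for r in records] for i, name in enumerate(names)}
    kept = []
    for name in names:
        col = columns[name]
        if max(col) == min(col):
            continue  # time-invariant parameter: identical in every scenario
        if any(abs(pearson(col, columns[k])) > corr_threshold for k in kept):
            continue  # redundant: strongly correlated with a kept feature
        kept.append(name)
    return kept

# Hypothetical historical records (one row per scenario).
names = ["num_subcarriers", "num_users", "interference", "interference_scaled"]
records = [
    (64, 2, 0.5, 2.0),
    (64, 4, 0.1, 1.2),
    (64, 3, 0.9, 2.8),
    (64, 5, 0.3, 1.6),
]
features = select_features(records, names)
```

Here `num_subcarriers` is dropped because it never changes, and `interference_scaled` is dropped because it is a linear function of `interference`.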
IV-B Solutions of Optimization Problems
To facilitate the application of supervised learning, the solution of resource allocation problem specified by each training feature vector should be obtained in advance. Then, each training feature vector is associated with its solution. According to the associated solutions, all feature vectors are labeled into multiple classes. More specifically, all training feature vectors with the same solution are placed with the same class label, indexed by a nonnegative integer. In other words, each class is associated with its unique solution. The class label information of all training feature vectors will be used to build a predictive model. In practice, the measured data of real-time scenario is selected as a new feature vector. Then the predictive model will predict the class for the new feature vector, and output the associated solution of the predicted class, i.e., how to allocate the radio resource for the real-time scenario. Obviously, if too many training feature vectors are associated with low performance solutions, the built predictive model cannot supply high performance solutions for practical resource allocation. Therefore, finding optimal or near-optimal solutions of all training feature vectors is crucial for building a high performance predictive model.
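The labeling rule above, in which training feature vectors that share the same solution receive the same nonnegative class index, can be sketched as follows. The function name `label_by_solution` and the toy solutions (pairs of resource indices) are assumptions made for illustration only.

```python
def label_by_solution(solutions):
    """Map each distinct solution to a nonnegative class index.

    Returns (labels, class_solutions), where labels[i] is the class of the
    i-th training feature vector and class_solutions[c] is the unique
    solution associated with class c.
    """
    class_of = {}
    class_solutions = []
    labels = []
    for sol in solutions:
        key = tuple(sol)
        if key not in class_of:
            class_of[key] = len(class_solutions)  # open a new class
            class_solutions.append(key)
        labels.append(class_of[key])
    return labels, class_solutions

# Hypothetical solutions found offline for five training feature vectors.
solutions = [(0, 2), (1, 3), (0, 2), (2, 3), (1, 3)]
labels, class_solutions = label_by_solution(solutions)
```

At prediction time, the predictive model only needs to output a class index `c`; the allocation actually applied is `class_solutions[c]`.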
In the resource allocation problem (1), all elements in the variable vector are used to describe how to allocate the radio resources. Mathematically, the allocation of many radio resources, such as subcarriers, timeslots, and modulation and coding schemes, can be described by integer variables. Intuitively, the transmit power level can be adjusted arbitrarily between zero and the maximum transmit power, so it seems that only a continuous variable can be used to describe the transmit power allocation. However, in order to reduce system complexity, the transmitters in practical systems are usually allowed to transmit signals with only a few predefined power levels. Therefore, most practical resource allocation issues can be modeled as integer optimization problems. When the number of integer variables in an integer optimization problem is very small, the optimal solution can be found by exhaustive search. However, if there are many integer variables, finding an optimal solution of resource allocation is extremely computationally complex, because such problems are known to be non-deterministic polynomial-time hard (NP-hard) [10]. In this case, it is more feasible to search for near-optimal solutions for all training feature vectors. Moreover, the offline characteristic of model building and the strong cloud computing and storage capabilities make it possible to spend more computation time using metaheuristics to search for near-optimal solutions. Some well-known metaheuristic algorithms [11], such as particle swarm optimization (PSO) and ant colony optimization (ACO), have been applied to the solution of many classic combinatorial optimization problems. For most of these applications, the results show that these metaheuristic algorithms outperform other algorithms, including conventional Lagrangian relaxation or greedy algorithms.
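For a very small integer problem, the exhaustive search mentioned above can be sketched as follows. The toy model is an assumption made purely for illustration: two users share one channel, `gains[k][j]` is a hypothetical link gain from transmitter `j` to receiver `k`, transmit powers are restricted to three predefined levels, and the objective is the sum rate under a total power budget.

```python
import itertools
import math

def sum_rate(powers, gains, noise):
    """Sum of log2(1 + SINR) over users sharing one channel."""
    total = 0.0
    for k, p in enumerate(powers):
        interference = sum(powers[j] * gains[k][j]
                           for j in range(len(powers)) if j != k)
        total += math.log2(1.0 + p * gains[k][k] / (noise + interference))
    return total

def exhaustive_search(levels, gains, noise, power_budget):
    """Try every combination of discrete power levels and keep the best."""
    best, best_rate = None, -math.inf
    for powers in itertools.product(levels, repeat=len(gains)):
        if sum(powers) > power_budget:
            continue  # violates the total power constraint
        rate = sum_rate(powers, gains, noise)
        if rate > best_rate:
            best, best_rate = powers, rate
    return best, best_rate

levels = (0.0, 0.5, 1.0)              # predefined power levels
gains = [[2.0, 1.0], [1.0, 1.0]]      # strong mutual interference
best, best_rate = exhaustive_search(levels, gains, noise=0.1, power_budget=2.0)
```

With such strong interference, the search switches the second user off. The number of candidates grows as the number of levels raised to the number of users, which is why metaheuristics become attractive for larger problems.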
IV-C Multiclass Classification Problem and Classifier
When the class label information is ready for all training feature vectors, the next step is to look for the similarities hidden in the labeled feature vectors. Mathematically, the set of all possible feature vectors constitutes a feature space. As a special case, if the dimension of the feature vectors is 2, the feature space is a two-dimensional space, as shown in Fig. 3. Note that the two axes correspond to the first and second elements of the feature vectors, respectively. When all labeled feature vectors are shown in the feature space, it can be observed that the feature vectors with the same class label are often located very close to each other. Accordingly, the feature space can be divided into several subspaces, and most feature vectors with the same class label are located within the same subspace. Then, the hidden similarities can be exploited by building a classifier, which predicts the class of a new feature vector by determining which subspace it is located in. In supervised learning, such a learning process is often called a multiclass classification problem. So far, many machine learning algorithms have been used for designing multiclass classifiers. Selecting a machine learning algorithm is also a process of trial and error. It is a trade-off between specific characteristics of the algorithms, such as computational complexity, speed of learning, memory usage, predictive accuracy on new feature vectors, and interpretability. Therefore, the design of the multiclass classifier is an essential task.
Here, we briefly introduce the $k$-NN algorithm, which is very simple to understand but works remarkably well in practice. As shown in Fig. 4, whenever a new feature vector arrives, the $k$-NN algorithm picks the $k$ nearest neighbors of the new feature vector from the training set. Then, the new feature vector is judged to belong to the most common class among its $k$ nearest neighbors. If $k = 1$, the new feature vector is simply assigned to the class of its nearest neighbor.
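The voting rule above can be sketched in a few lines. This is a minimal sketch: the function name `knn_predict`, the toy training data, and the choice of the Euclidean distance are assumptions, since no particular metric is fixed at this point in the text.

```python
import math
from collections import Counter

def knn_predict(x, train_vectors, train_labels, k=3):
    """Return the most common class among the k nearest training vectors."""
    # Sort training indices by Euclidean distance to the new feature vector.
    order = sorted(range(len(train_vectors)),
                   key=lambda i: math.dist(x, train_vectors[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional training set with two classes.
train_vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = [0, 0, 0, 1, 1, 1]

near_origin = knn_predict((0.5, 0.5), train_vectors, train_labels, k=3)
near_corner = knn_predict((5.2, 5.1), train_vectors, train_labels, k=3)
single_vote = knn_predict((4.0, 4.0), train_vectors, train_labels, k=1)
```

Every prediction computes the distance to all stored training vectors, so the cost grows with the size of the training set.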
V Example: Beam Allocation in Multiuser Massive MIMO
In this section, the beam allocation problem in a single-cell multiuser massive MIMO system considered in [13] will be taken as an example to demonstrate the efficiency of our proposed machine learning framework of resource allocation.
In the single-cell system, it is assumed that the BS is located at the center of the circular cell, that $K$ users are uniformly distributed within the cell with unit radius, and that each user is equipped with a single antenna. A massive number $N$ of fixed beams are formed by deploying the Butler network [14] with a linear array of identical isotropic antenna elements at the BS. In such a fixed-beam system, a user will be served by a beam allocated to it, as shown in Fig. 5, where each user is served by the beam in the same color and $(\rho_k, \theta_k)$ denotes the polar coordinate of user $k$. To serve multiple users simultaneously, the key problem is: how to efficiently allocate beams to users to maximize the sum rate?
As the number of beams is much larger than the number of users and each user is served by one beam, only some of the beams will be active for serving users. Therefore, we first need to decide which beams are active. This can be solved by applying our machine learning framework. Specifically, the active beam solution serves as the output of the predictive model. Assuming a line-of-sight (LoS) channel, the beam gains of users from beams are determined by the users’ locations, so the user layout should serve as the input data, which contains both the radial distance and the phase information. Since the beam gains from various beams for a user vary significantly with its phase, as shown in Fig. 5, the achievable sum rate with a beam allocation solution is mainly determined by the phase information of users. Therefore, the feature vector of a user layout is selected as

$$\mathbf{u} = \left[\theta_{(1)}, \theta_{(2)}, \ldots, \theta_{(K)}\right], \tag{2}$$

where $\theta_{(k)}$ is the $k$th order statistic obtained by arranging $\theta_1, \theta_2, \ldots, \theta_K$.
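Forming the feature vector from the ordered user phases can be sketched as follows. The function name `feature_vector` and the representation of a user layout as a list of (radial distance, phase) pairs are assumptions made for illustration.

```python
def feature_vector(user_layout):
    """Build the feature vector of a user layout.

    user_layout: list of (radial_distance, phase) pairs, one per user.
    Returns the user phases sorted in ascending order (order statistics);
    the radial distances are deliberately left out.
    """
    return tuple(sorted(theta for _, theta in user_layout))

# Two hypothetical layouts containing the same users in a different order.
layout_a = [(0.3, 1.2), (0.9, -0.4), (0.5, 0.1)]
layout_b = [(0.5, 0.1), (0.3, 1.2), (0.9, -0.4)]

u_a = feature_vector(layout_a)
u_b = feature_vector(layout_b)
```

Sorting makes the feature vector independent of how the users happen to be indexed, so layouts that differ only by a relabeling of users map to the same point in the feature space.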
Before performing resource allocation, we first need to train the predictive model by learning from a large amount of training user layout data, which can be generated by computer according to the user distribution. For each training user layout, its feature vector formed according to (2) is associated with its active beam solution, which can be obtained by employing offline beam allocation algorithms. Thanks to the strong cloud computing capability, optimal exhaustive search or near-optimal metaheuristic algorithms can be adopted, as mentioned in Section IV-B. In this section, exhaustive search is applied for demonstration by assuming a smaller number of users and beams. After associating each feature vector in the training set with its active beam solution, all the training feature vectors are naturally classified into a variety of classes according to their active beam solutions. Specifically, the feature vectors sharing the same active beam solution are in the same class. A predictive model of the active beam solution can then be built by applying a simple $k$-NN algorithm (employed in this section for illustration thanks to its simplicity), which can then be evaluated and optimized to guarantee its performance, as Fig. 2 shows. For instance, it can be improved by adding more training data. The effect of the size of the training set will be discussed with Fig. 6(b).
The built predictive model is then deployed at the BS for beam allocation. For a new user layout , by forming its feature vector and defining the distance from its feature vector to a stored training feature vector , , as
the $k$ nearest-neighbor feature vectors with the smallest distances are picked. According to the $k$-NN algorithm, the most common class among these $k$ neighbors is chosen as the predicted class of the input user layout, and the predictive model outputs the associated active beam solution of the predicted class. Based on the active beam information, each active beam is allocated to its best user, i.e., the user with the highest received signal-to-interference-plus-noise ratio (SINR), assuming equal power allocation among users. In addition, the new feature vectors will be collected to further update the dataset and trace the evolution of the user layout.
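The final step, allocating each active beam to the user with the highest SINR under equal power, can be sketched as follows. The gain matrix, the noise level, and the function name `allocate_beams` are hypothetical; in particular, `gains[k][n]` stands for an assumed beam gain of user `k` from beam `n`, and the signals of the other active beams are treated as interference.

```python
def allocate_beams(gains, active_beams, power=1.0, noise=1.0):
    """Allocate each active beam to the user with the highest SINR on it.

    gains[k][n]: hypothetical beam gain of user k from beam n.
    Returns a dict mapping each active beam index to a user index.
    """
    allocation = {}
    for n in active_beams:
        def sinr(k):
            # Signals on the other active beams act as interference.
            interference = sum(power * gains[k][m]
                               for m in active_beams if m != n)
            return power * gains[k][n] / (noise + interference)
        allocation[n] = max(range(len(gains)), key=sinr)
    return allocation

gains = [
    [9.0, 1.0, 0.1, 0.0],   # user 0 is best covered by beam 0
    [0.2, 0.5, 8.0, 1.0],   # user 1 is best covered by beam 2
]
allocation = allocate_beams(gains, active_beams=[0, 2])
```

In this sketch the set of active beams is an input; in the framework it would be the solution associated with the class predicted by the classifier.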
Fig. 6 presents the average sum rate with our proposed machine learning framework of beam allocation versus the transmit signal-to-noise ratio (SNR) and the size of the training set. For comparison, the average sum rates with both the optimal exhaustive search and the low-complexity beam allocation (LBA) algorithm proposed in [13] are also plotted. It can be seen from Fig. 6(b) that as the number of training data increases, the average sum rate achieved by our proposed machine learning framework increases and gradually approaches that of the optimal exhaustive search. It can also be observed from Fig. 6 that with a larger training set, our algorithm outperforms the LBA algorithm proposed in [13], indicating that our proposed machine learning framework of resource allocation outperforms conventional techniques.
Note that for the aforementioned $k$-NN algorithm, the distances between new data and existing training data are calculated in real time. As a result, with a large number of training data, the computational complexity would become very high in practical systems. It is therefore important to design a low-complexity multiclass classifier, which will be discussed in Section VI-A.
VI Research Challenges and Open Issues
Machine learning offers a plethora of opportunities for research on resource allocation in future wireless communications. Many open issues have not yet been studied and need to be further explored. This section outlines some of the most important ones from our viewpoint.
VI-A Low-Complexity Classifier
More advanced techniques are required to design low-complexity multiclass classifiers. One of the promising techniques is to transform the multiclass classification problem into a set of binary classification problems that are efficiently solved using binary classifiers. So far, the support vector machine (SVM) has been regarded as one of the most robust and successful algorithms for designing low-complexity binary classifiers, which determine the class of a new feature vector by using linear boundaries (hyperplanes) in high-dimensional spaces. More specifically, the two classes are divided by only a few hyperplanes. Accordingly, the class is determined based on which sides of the hyperplanes the new feature vector falls on. Compared with the $k$-NN algorithm, the complexity of SVM-based binary classifiers is very low. For the aforementioned beam allocation example, the total number of active beam solutions is . In other words, there exist at most classes, which indicates the complexity is for determining the class of scenario. Meanwhile, the complexity of exhaustive search is . Obviously, our proposed machine learning framework of resource allocation can approach the optimal performance of exhaustive search with a low complexity. It is worth mentioning that several typical scenarios have been defined with different QoS requirements for future fifth-generation (5G) communications [15]. For example, the “Great service in a crowd” scenario focuses on providing reasonable experiences even in crowded areas including stadiums, concerts, and shopping malls. For each typical scenario, the hidden common features of user behaviors and wireless propagation environments may reduce the number of classes, which can be exploited to further reduce the complexity of classifiers. Recently, deep learning has shown significant advantages in exploiting hidden common features [16, 17].
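The idea of decomposing a multiclass problem into binary problems solved with hyperplanes can be sketched with a one-vs-rest scheme. Note the substitution: to keep the example self-contained, each binary classifier below is a simple perceptron rather than an SVM, so this only illustrates the decomposition and the low prediction cost, not SVM training itself. All function names and the toy data are assumptions.

```python
def train_perceptron(vectors, targets, epochs=1000):
    """Train one linear binary classifier; targets are +1 or -1."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(vectors, targets):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified: move the hyperplane
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                mistakes += 1
        if mistakes == 0:
            break  # every training vector is on the correct side
    return w, b

def train_one_vs_rest(vectors, labels):
    """One binary classifier per class: this class versus all others."""
    return {c: train_perceptron(vectors, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Pick the class whose hyperplane gives the largest score."""
    def score(c):
        w, b = models[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)

# Hypothetical, linearly separable training set with three classes.
vectors = [(0, 0), (1, 0), (0, 1), (10, 0), (9, 1), (10, 1),
           (0, 10), (1, 9), (1, 10)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
models = train_one_vs_rest(vectors, labels)
predictions = [predict(models, x) for x in vectors]
```

Prediction evaluates one inner product per class, so its cost depends on the number of classes and the feature dimension but not on the number of stored training vectors, in contrast to a nearest-neighbor search.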
VI-B Multi-BS Cooperation
In practical networks, some users may be located at the edge of the BS coverage. If an edge user is served by only one BS, the signal quality may be very poor due to the long transmission distance. Cooperative transmission among multiple nearby BSs has been shown to improve the edge user’s performance significantly [18]. In this case, the radio resources at multiple BSs should be allocated jointly. Compared with a single-BS scenario, the resource allocation problem of multi-BS cooperative transmission requires more information from multiple BSs. Accordingly, more attributes in the historical data will be selected into feature vectors, which makes the learning process more complicated and challenging. How to use historical data to improve resource allocation with multi-BS cooperative transmission needs to be studied.
VI-C Fast Evolution of Scenarios
In many real scenarios, user behaviors and wireless environments inherently evolve over time [19]. That is, the characteristics hidden in historical scenarios are also dynamic. In most cases, such evolutions are too slow and gentle to be noticed. Such slow evolutions can be traced easily by constantly collecting data and periodically updating the dataset for learning. However, in some special situations, the evolutions may be very sudden and significant. For example, emergency maintenance on a very busy road can greatly change the distribution of user locations and mobility characteristics, and the demolition of a high building by blasting can significantly change the propagation environment. Since the predictive model is built with outdated historical data, such fast evolutions may result in significant performance loss of resource allocation. In machine learning, this issue can be addressed by updating the predictive model whenever new data is available. However, since the cloud computing resources are shared by many applications, the new data can only be stored temporarily at the BSs and forwarded to update the dataset later. How to deal with the fast evolution of scenarios in resource allocation is a challenging topic for future research.
In future wireless communications, the conventional methods of resource allocation are facing great challenges to meet the ever-increasing QoS requirements of users with scarce radio resource. Inspired by the victory of AlphaGo, this paper proposed a machine learning framework for resource allocation and discussed how to apply the supervised learning to extract the similarities hidden in a great amount of historical data on scenarios. By exploiting the extracted similarities, the optimal or near-optimal solution of the most similar historical scenario is adopted to allocate the radio resources for the current scenario. An example of beam allocation in multi-user massive MIMO systems was then presented to verify that our proposed machine-learning based resource allocation performs better than conventional methods. In a nutshell, machine-learning based resource allocation is an exciting area for future wireless communications assisted by cloud computing.
- [1] E. AB. Traffic exploration, interactive online tool. https://www.ericsson.com/TET/trafficView/loadBasicEditor.ericsson.
- [2] E. Castaneda, A. Silva, A. Gameiro, and M. Kountouris, “An overview on resource allocation techniques for multi-user MIMO systems,” IEEE Communications Surveys Tutorials, vol. 19, no. 1, pp. 239–284, Firstquarter 2017.
- [3] Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1426–1438, Aug 2006.
- [4] https://deepmind.com/.
- [5] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow’s Intelligent Network Traffic Control Systems,” IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2432–2455, Fourthquarter 2017.
- [6] B. Mao, Z. M. Fadlullah, F. Tang, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network Packet Transmission Based on Deep Learning,” IEEE Transactions on Computers, vol. 66, no. 11, pp. 1946–1960, Nov 2017.
- [7] S. Bi, R. Zhang, Z. Ding, and S. Cui, “Wireless communications in the era of big data,” IEEE Communications Magazine, vol. 53, no. 10, pp. 190–199, October 2015.
- [8] L. Bottou. Traffic engineering. http://www.cs.princeton.edu/courses/archive/spring10/cos424/slides/18-feat.pdf.
- [9] D. Francois, “High-dimensional data analysis: From optimal metric to feature selection,” Ph.D. dissertation, University in Ottignies-Louvain-la-Neuve, Belgium, 2007.
- [10] C. H. Papadimitriou and K. Steiglitz, Combinatorial optimization: Algorithms and complexity. Courier Corporation, 1982.
- [11] X.-S. Yang, Nature-Inspired Metaheuristic Algorithms. Luniver Press, 2008.
- [12] E. Alpaydin, Introduction to Machine Learning. MIT Press, 2014. [Online]. Available: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6917138
- [13] J. Wang, H. Zhu, L. Dai, N. J. Gomes, and J. Wang, “Low-complexity beam allocation for switched-beam based multiuser massive MIMO systems,” IEEE Transactions on Wireless Communications, vol. 15, no. 12, pp. 8236–8248, Dec 2016.
- [14] J. Butler and R. Lowe, “Beam-forming matrix simplifies design of electrically scanned antennas,” Electronic Design, April 1962.
- [15] A. Osseiran, F. Boccardi, V. Braun, K. Kusume, P. Marsch, M. Maternia, O. Queseth, M. Schellmann, H. Schotten, H. Taoka, H. Tullberg, M. A. Uusitalo, B. Timus, and M. Fallgren, “Scenarios for 5G mobile and wireless communications: The vision of the METIS project,” IEEE Communications Magazine, vol. 52, no. 5, pp. 26–35, May 2014.
- [16] N. Kato, Z. M. Fadlullah, B. Mao, F. Tang, O. Akashi, T. Inoue, and K. Mizutani, “The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective,” IEEE Wireless Communications, vol. 24, 2017.
- [17] F. Tang, B. Mao, Z. M. Fadlullah, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “On Removing Routing Protocol from Future Wireless Networks: A Real-time Deep Learning Approach for Intelligent Traffic Control,” IEEE Wireless Communications, vol. PP, no. 99, pp. 1–7, 2017.
- [18] X. Chen, H. H. Chen, and W. Meng, “Cooperative communications for cognitive radio networks: From theory to applications,” IEEE Communications Surveys Tutorials, vol. 16, no. 3, pp. 1180–1192, Third 2014.
- [19] A. Rakhlin. Online methods in machine learning. http://www.mit.edu/~rakhlin/6.883/.