techniques are widely adopted to improve the estimation accuracy of channel state information (CSI). Recently, as an innovative and efficient branch of model-free machine learning methods, the extreme learning machine (ELM) has attracted much interest from researchers in diverse areas. Owing to unique properties such as fast training, solution uniqueness, and good generalization ability, ELM is a promising tool for channel interpolation tasks. However, the standard ELM was originally proposed to process vectorized data and is not directly applicable to the channel interpolation problem of MIMO channels. Specifically, MIMO channels exhibit frequency-space correlations, which are often recorded in the form of a matrix or tensor, and direct vectorization of such data leads to information loss. Therefore, a novel tensor-based ELM, capable of handling tensorial inputs, is needed to perform the interpolation tasks by learning from the CSI of MIMO channels.
In general, there has been much effort in adapting ELM to tensorial inputs by applying certain matrix/tensor decomposition techniques, which are usually empirical. In this paper, we propose a novel ELM model with tensorial inputs (TELM) that extends the traditional ELM models to tensorial contexts while retaining the valuable ELM features. Moreover, we propose a Tucker decomposed extreme learning machine (TDELM), based on the Tucker decomposition method, to reduce the computational complexity, and we establish a theoretical argument for its interpolation capability. Experimental results verify that the proposed methods achieve performance comparable to that of the traditional ELMs with reduced complexity, and outperform the other methods considerably.
The remainder of this paper is organized as follows. Section II reviews the background of single-layer feedforward neural networks (SLFNs), traditional ELM, tensor operations, and Tucker decomposition. Section III presents the proposed TELM and TDELM models and discusses how they are applied to channel interpolation, Section IV investigates the properties of the considered models, and Section V presents the experimental results. Finally, Section VI concludes the paper.
II-A Single-Layer Feedforward Networks with Vector Inputs
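Consider a dataset with $N$ data samples $(\mathbf{x}_i, y_i)$ for $i = 1, \dots, N$, where $\mathbf{x}_i \in \mathbb{R}^{d}$ is the feature vector of the $i$-th sample and $y_i$ is its label. Assume that an SLFN contains $d$ input neurons and $L$ hidden neurons. The prediction of the label $y_i$ can be formulated as
$$\hat{y}_i = \sum_{j=1}^{L} \beta_j \, \sigma\Big( \sum_{k=1}^{d} w_{kj} x_{ik} + b_j \Big),$$
where $\mathbf{W} \in \mathbb{R}^{d \times L}$ is the weight matrix, whose $(k, j)$-th entry $w_{kj}$ is the weight between the $k$-th input neuron and the $j$-th hidden neuron; $\mathbf{b} \in \mathbb{R}^{L}$ is the bias vector, whose $j$-th entry $b_j$ is the bias from the input layer to the $j$-th hidden neuron; $\boldsymbol{\beta} \in \mathbb{R}^{L}$ is the weight vector between the hidden layer and the output neuron; and $\sigma(\cdot)$ is the sigmoid activation function, defined as $\sigma(x) = 1/(1 + e^{-x})$. In this setting, the bias between the hidden layer and the output layer is omitted. We then aim to solve the following optimization problem
$$\min_{\mathbf{W}, \mathbf{b}, \boldsymbol{\beta}} \; \| \mathbf{y} - \hat{\mathbf{y}} \|_2^2,$$
where $\mathbf{y}$ and $\hat{\mathbf{y}}$ are the label vector and its prediction vector, respectively. A typical algorithm finds the optimal values of $\mathbf{W}$, $\mathbf{b}$, and $\boldsymbol{\beta}$ by propagating the errors backwards using gradient or sub-gradient descent methods. However, the algorithm can be very sensitive to initialization and may become stuck at a local minimum because the objective function is non-convex. In addition, the algorithm can be time-consuming, which restricts its usage in practical applications.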
II-B Traditional Extreme Learning Machines with Vector Inputs
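Traditional ELMs were originally designed to train SLFNs. They improve the training speed remarkably by randomly assigning $\mathbf{W}$ and $\mathbf{b}$, which turns the aforementioned optimization into a least squares problem. More specifically, let $\mathbf{H}$ be an $N \times L$ matrix, whose $(i, j)$-th entry is $\sigma\big(\sum_{k=1}^{d} w_{kj} x_{ik} + b_j\big)$. Instead of minimizing the objective function with respect to $\mathbf{W}$, $\mathbf{b}$, and $\boldsymbol{\beta}$, each entry of $\mathbf{W}$ and $\mathbf{b}$ is drawn from a continuous random distribution. After the matrix $\mathbf{H}$ is calculated, the solution for $\boldsymbol{\beta}$ is given by $\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger} \mathbf{y}$ according to the Gauss-Markov theorem, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathbf{H}$.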
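The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `elm_fit` and `elm_predict`, the uniform sampling interval, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, y, n_hidden, rng):
    """Train an ELM on vector inputs X of shape (N, d) and labels y of shape (N,)."""
    d = X.shape[1]
    # Hidden-layer weights and biases are drawn at random and never trained.
    # The interval [-1, 1] is an assumption; any continuous distribution works.
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)             # hidden-layer output matrix, shape (N, L)
    beta = np.linalg.pinv(H) @ y       # least squares solution via the pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

W, b, beta = elm_fit(X, y, n_hidden=50, rng=rng)
train_mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
```

Only the output weights are fitted, so training reduces to a single pseudoinverse computation rather than an iterative descent.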
II-C Tensor Operations and Tucker Decomposition
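In this paper, we treat tensors as multi-dimensional arrays. Specifically, a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_K}$ of order $K$ stores $\prod_{k=1}^{K} I_k$ elements. The inner product of two tensors of the same size, e.g., $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_K}$, is defined as $\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1, \dots, i_K} x_{i_1 \cdots i_K} \, y_{i_1 \cdots i_K}$. The vectorization of a tensor $\mathcal{X}$ is obtained by stacking the elements of $\mathcal{X}$ into a $\prod_{k=1}^{K} I_k$-dimensional column vector in a fixed order, e.g., in a lexicographical order or reverse lexicographical order. Tensors can also be unfolded into matrices. Given a tensor $\mathcal{X}$ and an index $k$, the $k$-mode matricization of $\mathcal{X}$ is a matrix $\mathbf{X}_{(k)}$ with $\prod_{j \neq k} I_j$ columns and $I_k$ rows obtained by unfolding $\mathcal{X}$ along the $k$-th coordinate. For $k = 1, \dots, K$, the $k$-mode rank of tensor $\mathcal{X}$, denoted by $r_k$, is defined as the rank of $\mathbf{X}_{(k)}$, which satisfies $r_k \le I_k$.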
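The tensor operations above map directly onto NumPy array operations. The following sketch is illustrative only: the helper names `unfold` and `mode_rank` are assumptions, modes are indexed from 0 in the code (the text counts from 1), and the column ordering of the unfolding is one fixed convention among several valid ones.

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization: one row per index of mode k, one column per
    combination of the remaining indices (0-based mode index)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_rank(T, k):
    """Mode-k rank: the matrix rank of the mode-k unfolding."""
    return np.linalg.matrix_rank(unfold(T, k))

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4, 5))
B = rng.normal(size=(3, 4, 5))

# The tensor inner product is the sum of element-wise products, which equals
# the ordinary dot product of the two vectorizations.
inner = np.sum(A * B)
vec_inner = A.reshape(-1) @ B.reshape(-1)

# Each mode rank is bounded by the size of that mode.
ranks = [mode_rank(A, k) for k in range(A.ndim)]
```

For a generic random tensor such as `A`, every mode rank reaches its upper bound; structured data such as correlated channel responses can have much smaller mode ranks.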
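The Tucker decomposition is a branch of the higher-order singular value decomposition. Consider $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_K}$ and a vector $(R_1, \dots, R_K)$. If $R_k \ge r_k$ for all $k$, there exist a core tensor $\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ and factor matrices $\mathbf{U}^{(1)}, \dots, \mathbf{U}^{(K)}$, with each $\mathbf{U}^{(k)} \in \mathbb{R}^{I_k \times R_k}$ column-wise orthogonal, such that
$$\mathcal{X} = \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \cdots \times_K \mathbf{U}^{(K)},$$
which could be written compactly as $\mathcal{X} = [\![ \mathcal{G}; \mathbf{U}^{(1)}, \dots, \mathbf{U}^{(K)} ]\!]$. If $R_K = I_K$, there exist a core tensor $\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_{K-1} \times I_K}$ and factor matrices with $\mathbf{U}^{(k)} \in \mathbb{R}^{I_k \times R_k}$ for $k = 1, \dots, K-1$ and $\mathbf{U}^{(K)} = \mathbf{I}$ (an identity matrix) such that $\mathcal{X} = [\![ \mathcal{G}; \mathbf{U}^{(1)}, \dots, \mathbf{U}^{(K-1)}, \mathbf{I} ]\!]$. If $R_k < r_k$ for some $k$, such a core tensor and factor matrices do not exist. As an alternative, we can use an approximation of $\mathcal{X}$ by the truncated Tucker decomposition.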
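One common way to obtain such a decomposition is the truncated higher-order SVD, sketched below. This is an assumption about how the decomposition could be computed, not a description of the routine used in the experiments; the helper names `mode_product`, `hosvd`, and `reconstruct` are hypothetical, and modes are 0-based.

```python
import numpy as np

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_product(T, M, k):
    """Multiply tensor T by matrix M along mode k; M has shape (J, T.shape[k])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, k)), 0, k)

def hosvd(T, ranks):
    """Truncated HOSVD: returns a core tensor and column-orthonormal factors."""
    factors = []
    for k, r in enumerate(ranks):
        # Leading left singular vectors of the mode-k unfolding span mode k.
        U, _, _ = np.linalg.svd(unfold(T, k), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for k, U in enumerate(factors):
        core = mode_product(core, U.T, k)   # project each mode onto its basis
    return core, factors

def reconstruct(core, factors):
    T = core
    for k, U in enumerate(factors):
        T = mode_product(T, U, k)
    return T

# Build a tensor whose mode ranks are (2, 3, 2) by construction.
rng = np.random.default_rng(0)
G0 = rng.normal(size=(2, 3, 2))
mats = [rng.normal(size=(n, r)) for n, r in zip((6, 7, 8), (2, 3, 2))]
X = reconstruct(G0, mats)

core, factors = hosvd(X, ranks=(2, 3, 2))        # exact: ranks match
X_exact = reconstruct(core, factors)

core_t, factors_t = hosvd(X, ranks=(1, 2, 1))    # truncated: approximation only
rel_err = np.linalg.norm(reconstruct(core_t, factors_t) - X) / np.linalg.norm(X)
```

When the requested ranks are at least the mode ranks, the reconstruction is exact up to floating-point error; smaller ranks give only an approximation.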
We next introduce an important property of the Tucker decomposition mentioned in .
Lemma 1 (Duality Lemma):
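Assume that a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_K}$ admits a Tucker decomposition $\mathcal{X} = [\![ \mathcal{G}; \mathbf{U}^{(1)}, \dots, \mathbf{U}^{(K)} ]\!]$, where all factor matrices are column-wise orthonormal. Then, for any tensor $\mathcal{W} \in \mathbb{R}^{I_1 \times \cdots \times I_K}$, we have $\langle \mathcal{W}, \mathcal{X} \rangle = \langle \tilde{\mathcal{W}}, \mathcal{G} \rangle$, where $\tilde{\mathcal{W}}$ admits a Tucker decomposition $\tilde{\mathcal{W}} = [\![ \mathcal{W}; \mathbf{U}^{(1)\top}, \dots, \mathbf{U}^{(K)\top} ]\!]$.
Lemma 1 tells us that the inner product of the original pair $\mathcal{W}$ versus $\mathcal{X}$ is the same as that of the decomposed pair $\tilde{\mathcal{W}}$ versus $\mathcal{G}$. If the sizes of $\tilde{\mathcal{W}}$ and $\mathcal{G}$ are much smaller than those of $\mathcal{W}$ and $\mathcal{X}$, the computational cost of calculating the inner product could be drastically reduced.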
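The duality property can be checked numerically. The sketch below builds a tensor from a small core and orthonormal factors, then compares the inner product in the original space with the inner product in the compressed space; all names and sizes are assumptions chosen for illustration.

```python
import numpy as np

def mode_product(T, M, k):
    """Multiply tensor T by matrix M along mode k; M has shape (J, T.shape[k])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, k)), 0, k)

rng = np.random.default_rng(0)
shape, ranks = (6, 7, 8), (2, 3, 2)

# Small core tensor and column-orthonormal factor matrices (via reduced QR).
G = rng.normal(size=ranks)
U = [np.linalg.qr(rng.normal(size=(n, r)))[0] for n, r in zip(shape, ranks)]

# Large tensor X assembled from the core and the factors.
X = G
for k in range(3):
    X = mode_product(X, U[k], k)

# Arbitrary tensor W of the same size as X, and its compressed counterpart
# obtained by applying the transposed factors along each mode.
W = rng.normal(size=shape)
W_small = W
for k in range(3):
    W_small = mode_product(W_small, U[k].T, k)

lhs = np.sum(W * X)          # inner product over 6 * 7 * 8 = 336 entries
rhs = np.sum(W_small * G)    # inner product over 2 * 3 * 2 = 12 entries
```

Both values agree up to floating-point error, while the second sum runs over far fewer entries.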
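|Algorithm 1 TELM|
|Input: Data samples $(\mathcal{X}_i, y_i)$, $i = 1, \dots, N$.|
|Output: Weight vector $\hat{\boldsymbol{\beta}}$.|
|1 Draw each entry of the weight tensor $\mathcal{W}$ and the bias vector $\mathbf{b}$ from a continuous random distribution.|
|2 Calculate matrix $\mathbf{H}$ from $\mathcal{X}_i$, $\mathcal{W}$, and $\mathbf{b}$ by (2).|
|3 Calculate parameter vector $\hat{\boldsymbol{\beta}}$ by $\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger} \mathbf{y}$.|
|4 Return $\hat{\boldsymbol{\beta}}$.|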
III-A Channel Interpolation
To conduct high-efficiency data transmission in a typical multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) setting, the CSI of each sub-carrier is required. One way of acquiring the CSI is to insert a pilot signal into each sub-carrier and then calculate the CSI at the receiver. To further reduce the probing overhead, the following interpolation scheme is adopted. Let denote the -th sub-carrier’s CSI; pilots are inserted into sub-carriers with odd indices. The CSI of sub-carriers with even indices is then inferred as
where is the window length and is the interpolation function. In our work, we adopt the modified ELM as , and the details are explained in the next section. Note that different window designs can be adopted to balance carrier usage against accuracy.
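To make the pilot/non-pilot split concrete, the sketch below builds training pairs from a single CSI sequence. The exact window layout is an assumption made here for illustration (a symmetric window of the nearest pilots around each unknown sub-carrier); the function name `build_windows` is hypothetical, and real CSI values would be complex rather than the real scalars used in this toy sequence.

```python
import numpy as np

def build_windows(h, w):
    """h: 1-D array of CSI values stored 0-based, so the odd 1-based indices
    (the pilots) sit at even array positions. For every non-pilot position
    that has w // 2 pilots on each side, return those w pilots as the
    feature vector and the non-pilot value as the label."""
    pilots = h[0::2]       # sub-carriers carrying pilots
    targets = h[1::2]      # sub-carriers whose CSI is to be inferred
    half = w // 2
    feats, labels = [], []
    for t in range(len(targets)):
        # Target t lies between pilots t and t + 1.
        lo, hi = t - half + 1, t + half + 1
        if lo < 0 or hi > len(pilots):
            continue       # window would run past the band edge
        feats.append(pilots[lo:hi])
        labels.append(targets[t])
    return np.array(feats), np.array(labels)

# Toy sequence that varies linearly across sub-carriers.
h = np.arange(11.0)
feats, labels = build_windows(h, w=4)

# Input averaging: predict each unknown value as the mean of its window.
mean_prediction = feats.mean(axis=1)
```

On this linear toy sequence, input averaging recovers the missing values exactly; a learned interpolation function is useful when the channel varies nonlinearly within the window.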
III-B Extreme Learning Machines with Tensorial Inputs
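Consider a dataset with $N$ data samples $(\mathcal{X}_i, y_i)$ for $i = 1, \dots, N$, where $\mathcal{X}_i \in \mathbb{R}^{I_1 \times \cdots \times I_K}$ is a tensor of order $K$, and $y_i$ is its label. For an SLFN with $\prod_{k=1}^{K} I_k$ input neurons and $L$ hidden neurons, its prediction of label $y_i$ could be calculated as $\hat{y}_i = \sum_{j=1}^{L} \beta_j \, \sigma( \langle \mathcal{W}_j, \mathcal{X}_i \rangle + b_j )$, where $\mathcal{W} \in \mathbb{R}^{I_1 \times \cdots \times I_K \times L}$ with $\mathcal{W}_j$ as the weight tensor of the $j$-th hidden neuron; $\mathbf{b}$, $\boldsymbol{\beta}$, $\sigma$, $\mathbf{y}$, and $\hat{\mathbf{y}}$ are defined as in Section II. The goal is still to minimize $\| \mathbf{y} - \hat{\mathbf{y}} \|_2^2$. Consistent with the traditional ELM, we draw each entry of the weight tensor $\mathcal{W}$ and the bias vector $\mathbf{b}$ from a continuous random distribution and solve the problem $\min_{\boldsymbol{\beta}} \| \mathbf{H} \boldsymbol{\beta} - \mathbf{y} \|_2^2$, where
$$\mathbf{H} = \big[ \sigma( \langle \mathcal{W}_j, \mathcal{X}_i \rangle + b_j ) \big]_{i = 1, \dots, N; \; j = 1, \dots, L} \in \mathbb{R}^{N \times L}. \qquad (2)$$
The problem has a unique least squares solution $\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger} \mathbf{y}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathbf{H}$ as defined above. Detailed procedures are summarized in Algorithm 1. It is worth noting that the TELM handles tensorial inputs with the same computational cost as the traditional ELM with vectorized inputs.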
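A minimal NumPy sketch of the TELM procedure follows. It is an illustration under stated assumptions rather than the authors' code: samples are stored with the sample axis first, the function names `telm_hidden` and `telm_fit` are hypothetical, and the weights are drawn uniformly from [-1, 1].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def telm_hidden(X, W, b):
    """Hidden-layer matrix for tensor samples X of shape (N, I_1, ..., I_K)
    and a weight tensor W of shape (I_1, ..., I_K, L)."""
    K = X.ndim - 1
    # Contract all K feature modes: entry (i, j) is the inner product of
    # sample i with the weight tensor of hidden neuron j.
    return sigmoid(np.tensordot(X, W, axes=K) + b)

def telm_fit(X, y, n_hidden, rng):
    W = rng.uniform(-1.0, 1.0, size=X.shape[1:] + (n_hidden,))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = telm_hidden(X, W, b)
    beta = np.linalg.pinv(H) @ y       # least squares solution
    return W, b, beta

# Synthetic second-order tensor samples (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4, 6))
y = X.mean(axis=(1, 2))

W, b, beta = telm_fit(X, y, n_hidden=40, rng=rng)
H = telm_hidden(X, W, b)

# The same matrix results from a traditional ELM applied to the vectorized
# samples with the correspondingly reshaped weights.
H_vec = sigmoid(X.reshape(100, -1) @ W.reshape(-1, 40) + b)
```

The comparison with `H_vec` mirrors the remark on computational cost: contracting over the tensor modes performs the same multiplications as a matrix product over the vectorized inputs.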
III-C Tucker Decomposed Extreme Learning Machines
To improve the learning efficiency of TELM, we further propose the Tucker decomposed extreme learning machine (TDELM) based on the Tucker decomposition method. Generally, computing $\mathbf{H}$ in (2) requires $N L \prod_{k=1}^{K} I_k$ multiplication operations, which is computationally demanding when dealing with a large dataset. However, by employing the Tucker decomposition, the cost of computing $\mathbf{H}$ could be largely reduced when working with datasets that have low $k$-mode ranks.
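As a rough illustration of the potential saving, the snippet below counts the multiplications needed for the inner products alone, before and after compression. The sample and neuron counts match those reported in the experiments, but the tensor dimensions and mode ranks are hypothetical, and the one-off cost of the Tucker decomposition itself is not included.

```python
from math import prod

def inner_product_mults(n_samples, n_hidden, dims):
    """Multiplications needed to evaluate every sample/neuron inner product."""
    return n_samples * n_hidden * prod(dims)

n_samples, n_hidden = 4104, 1080
full_dims = (8, 16, 4)      # hypothetical size of each input tensor
core_dims = (4, 4, 2)       # hypothetical mode ranks after decomposition

full_cost = inner_product_mults(n_samples, n_hidden, full_dims)
reduced_cost = inner_product_mults(n_samples, n_hidden, core_dims)
speedup = full_cost // reduced_cost
```

The ratio depends only on the product of the input dimensions relative to the product of the mode ranks, so the benefit grows as the data become more strongly correlated along each mode.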
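Consider a dataset with $N$ samples $(\mathcal{X}_i, y_i)$ for $i = 1, \dots, N$. We first concatenate $\mathcal{X}_1, \dots, \mathcal{X}_N$ into a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_K \times N}$ of order $K + 1$, then find the $k$-mode rank $r_k$ of $\mathcal{X}$ for $k = 1, \dots, K$, and next apply the Tucker decomposition to $\mathcal{X}$ such that the core tensor $\mathcal{G}$ is of size $r_1 \times \cdots \times r_K \times N$ and the $(K+1)$-th factor matrix is an identity matrix. Afterwards, we split $\mathcal{G}$ along the $(K+1)$-th axis into $N$ subtensors $\mathcal{G}_1, \dots, \mathcal{G}_N$. We next consider an SLFN with $\prod_{k=1}^{K} r_k$ input neurons and $L$ hidden neurons. Let $\tilde{\mathcal{W}} \in \mathbb{R}^{r_1 \times \cdots \times r_K \times L}$ be the total weight tensor with $\tilde{\mathcal{W}}_j$ as the weight tensor of the $j$-th hidden neuron, $b_j$ be the bias scalar from the input layer to the $j$-th hidden neuron, and $\boldsymbol{\beta}$ be the weight vector between the hidden layer and the output neuron. Each entry of $\tilde{\mathcal{W}}$ or $\mathbf{b}$ is randomly drawn from a continuous random distribution, and $\tilde{\mathbf{H}}$ is then calculated as
$$\tilde{\mathbf{H}} = \big[ \sigma( \langle \tilde{\mathcal{W}}_j, \mathcal{G}_i \rangle + b_j ) \big]_{i = 1, \dots, N; \; j = 1, \dots, L} \in \mathbb{R}^{N \times L}. \qquad (3)$$
Finally, we solve the optimization problem $\min_{\boldsymbol{\beta}} \| \tilde{\mathbf{H}} \boldsymbol{\beta} - \mathbf{y} \|_2^2$ with the least squares solution $\hat{\boldsymbol{\beta}} = \tilde{\mathbf{H}}^{\dagger} \mathbf{y}$. The above procedures are summarized in Algorithm 2.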
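|Algorithm 2 TDELM|
|Input: Data samples $(\mathcal{X}_i, y_i)$, $i = 1, \dots, N$.|
|Output: Weight vector $\hat{\boldsymbol{\beta}}$.|
|1 Concatenate $\mathcal{X}_1, \dots, \mathcal{X}_N$ into a tensor $\mathcal{X}$ of order $K + 1$.|
|2 Find the $k$-mode rank $r_k$ of $\mathcal{X}$ for $k = 1, \dots, K$.|
|3 Conduct a Tucker decomposition $\mathcal{X} = [\![ \mathcal{G}; \mathbf{U}^{(1)}, \dots, \mathbf{U}^{(K+1)} ]\!]$ such that $\mathbf{U}^{(k)} \in \mathbb{R}^{I_k \times r_k}$ for $k = 1, \dots, K$, and $\mathbf{U}^{(K+1)} = \mathbf{I}$.|
|4 Draw each entry of weight tensor $\tilde{\mathcal{W}}$ and bias vector $\mathbf{b}$ from a continuous random distribution.|
|5 Calculate matrix $\tilde{\mathbf{H}}$ from $\mathcal{G}_i$, $\tilde{\mathcal{W}}$, and $\mathbf{b}$ by (3).|
|6 Calculate parameter vector $\hat{\boldsymbol{\beta}}$ by $\hat{\boldsymbol{\beta}} = \tilde{\mathbf{H}}^{\dagger} \mathbf{y}$.|
|7 Return $\hat{\boldsymbol{\beta}}$.|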
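The steps of Algorithm 2 can be sketched as follows. Several choices here are assumptions made for illustration: the sample axis is stored first instead of last, the factors are obtained from an SVD of each unfolding with a numerical rank tolerance `tol`, and the helper names are hypothetical. The experiments in this paper rely on a modified Tensor Toolbox routine instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_product(T, M, k):
    """Multiply tensor T by matrix M along mode k; M has shape (J, T.shape[k])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, k)), 0, k)

def compress_feature_modes(X, tol=1e-10):
    """Tucker-decompose the stacked samples X of shape (N, I_1, ..., I_K),
    leaving the sample axis untouched (its factor is the identity)."""
    factors, core = [], X
    for k in range(1, X.ndim):
        U, s, _ = np.linalg.svd(unfold(X, k), full_matrices=False)
        r = int(np.sum(s > tol * s[0]))      # numerical mode rank
        factors.append(U[:, :r])
        core = mode_product(core, U[:, :r].T, k)
    return core, factors

def tdelm_fit(X, y, n_hidden, rng):
    core, factors = compress_feature_modes(X)
    # Random weights live in the compressed space, one small tensor per neuron.
    W = rng.uniform(-1.0, 1.0, size=core.shape[1:] + (n_hidden,))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(np.tensordot(core, W, axes=core.ndim - 1) + b)
    beta = np.linalg.pinv(H) @ y
    return core, factors, W, b, beta

# Synthetic samples of size 8 x 9 with mode ranks (2, 3) by construction.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(8, 2)), rng.normal(size=(9, 3))
small = rng.normal(size=(60, 2, 3))
X = np.einsum("nab,ia,jb->nij", small, A, B)
y = small.sum(axis=(1, 2))

core, factors, W, b, beta = tdelm_fit(X, y, n_hidden=30, rng=rng)

# Sanity check: the core and the factors reproduce the original samples.
X_back = core
for k, U in enumerate(factors, start=1):
    X_back = mode_product(X_back, U, k)
```

Each sample is reduced from 72 entries to 6, so every hidden-neuron inner product becomes correspondingly cheaper. How unseen samples are compressed at prediction time is not specified in the text; projecting them with the same factor matrices would be one option.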
IV Properties of TELM/TDELM
In this section, we establish the interpolation theorems for the TELM and TDELM. Specifically, given a dataset with $N$ distinct data samples $(\mathcal{X}_i, y_i)$, a TELM or TDELM with $N$ hidden neurons and sigmoid activation functions has the following properties.
Theorem 1 (Interpolation Capability of TELMs):
Theorem 2 (Interpolation Capability of TDELMs):
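Assume that each entry of the weight tensor $\tilde{\mathcal{W}}$ and bias vector $\mathbf{b}$ is randomly chosen from an interval according to a continuous probability distribution. Then, with probability one, $\tilde{\mathbf{H}}$ in (3) is invertible and $\| \tilde{\mathbf{H}} \boldsymbol{\beta} - \mathbf{y} \| = 0$.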
Please refer to the Appendix for details of the proof.
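The interpolation property can be observed numerically on a small example: with as many hidden neurons as distinct samples, the hidden-layer matrix is square and, for randomly drawn weights, invertible, so the training labels are fitted exactly. The sizes, the uniform sampling interval, and the random labels below are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N = 12                                   # number of samples = number of hidden neurons
X = rng.normal(size=(N, 3, 4))           # N distinct second-order tensor samples
y = rng.normal(size=N)                   # arbitrary labels

W = rng.uniform(-1.0, 1.0, size=(3, 4, N))
b = rng.uniform(-1.0, 1.0, size=N)

H = sigmoid(np.tensordot(X, W, axes=2) + b)   # square N x N hidden-layer matrix
rank = np.linalg.matrix_rank(H)

beta = np.linalg.solve(H, y)             # exact solve is possible since H is invertible
residual = np.linalg.norm(H @ beta - y)
```

A zero training residual says nothing about test accuracy; in practice the number of hidden neurons is chosen well below the number of samples.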
V Simulation Results
In this section, an experiment is conducted on a real-world wireless MIMO channel response dataset to compare the training time and accuracy of a simple model-free method, the SLFN-based method, the traditional ELM-based method, and the proposed TDELM-based method. We will show that 1) TDELM achieves prediction accuracy comparable to that of the other methods, and 2) TDELM requires lower computational cost and shorter training time than ELM and SLFN over decomposed data.
The dataset was generated by conducting experiments in a lecture hall. Specifically, a 64-element virtual uniform linear array (ULA) is used as the transmitter (Tx) antenna array, and the receiver (Rx) with a single antenna is deployed at three different locations within the auditorium of the hall. The carrier frequency is set to 4 GHz, and the measurement bandwidth is 200 MHz, uniformly sampled at 513 frequency points. Sixteen continuous snapshots are obtained for each sub-channel. The obtained results are stored in a tensor of size , and each entry represents the response from one Tx array element to one Rx array element at a given frequency point. More detailed descriptions of the dataset can be found in .
The channel responses are normalized to zero mean and unit standard deviation for more convenient comparison. The window size is set to 4. The feature tensor is created by a sliding window method, and the -th element of is given by . The sliding window method captures the local correlation in the neighborhood of adjacent points, and the tensor decomposition is conducted on tensor .
The dataset is divided evenly into two subsets, one as the training set and the other as the test set, each containing 4,104 samples. Two neural networks, an ELM and a TDELM, are constructed with the same number of hidden-layer neurons for a fair comparison. The Tucker decomposition was implemented by modifying the Tensor Toolbox library. Least mean square filtering and input averaging (the mean method) are used as comparison schemes, and a grid search is conducted over the hyperparameters: the net size and the decomposition size. The mean squared error (MSE) and running time are recorded. To mitigate the fluctuation caused by randomness, the SelectBest method is adopted and training is repeated 100 times to find the best parameter set.
|TC||689.94||N/A*||0.0805|
* There are no tunable parameters for the Mean method, and hence the training time is unavailable.
The hyperparameters are chosen as follows: the node size in the hidden layer is 1,080, and the decomposition size is . The Tucker decomposition takes about 6 seconds on this dataset. The MSE and time consumption results are summarized in Table I. A slice of the prediction results is shown in Figs. 1 and 2.
It is clear that TDELM performs best while also being reasonably fast. Fig. 2 shows that TDELM achieves consistent gains over ELM and NN. In addition, an instance of the curve is shown in Fig. 1, where the blue line and the yellow line denote the real value and the predicted value, respectively. In the area marked by the orange circle, the other methods exhibit visible errors, while TDELM still follows the curve well. Another interesting observation is that, although TDELM has better accuracy than ELM, the TD+NN scheme is the worst. This not only shows that TDELM can work better than ELM, but also implies that the gain comes from both the decomposition and the use of correlating learning machines for prediction.
VI Conclusion
In this paper, we have proposed an extreme learning machine with tensorial inputs (TELM) and a Tucker decomposed extreme learning machine (TDELM) to handle channel interpolation tasks. Moreover, we have established a theoretical argument for the interpolation capability of TDELM. The experimental results verify that the proposed models achieve performance comparable to that of the traditional ELM with reduced complexity, and outperform the other methods considerably in terms of channel interpolation accuracy.
-  H. C. Lee, J. C. Chiu, and K. H. Lin, “Two-dimensional interpolation-assisted channel prediction for OFDM systems,” IEEE Trans. Broadcast, vol. 59, no. 4, pp. 648–657, Dec. 2013.
-  I. C. Wong, A. Forenza, R. W. Heath and B. L. Evans, “Long range channel prediction for adaptive OFDM systems,” in Proc. IEEE Asilomar Conf. Signals Systems Computers, Pacific Grove, CA, Nov. 2004, pp. 732–736.
-  G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: a new learning scheme of feedforward neural networks,” in Proc. IEEE Int. Joint Conf. Neural Netw., Budapest, Hungary, Jul. 2004, pp. 985–990.
-  N. K. Nair and S. Asharaf, “Tensor Decomposition Based Approach for Training Extreme Learning Machines,” Big Data Research, vol. 10, pp. 8–20, Dec. 2017.
-  I. Kotsia and I. Patras, “Relative margin support tensor machines for gait and action recognition,” in Proc. ACM CIVR, Xi’an, China, Jul. 2010, pp. 446–453.
-  C. D. Meyer, Matrix Analysis and Applied Linear Algebra, Philadelphia, PA: SIAM, Jun. 2001.
-  T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Rev., vol. 51, no. 3, pp. 455–500, Aug. 2009.
-  L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, Sept. 1966.
-  X. Li, H. Zhou, and L. Li, “Tucker tensor regression and neuroimaging analysis,” Stat. Biosci., vol. 10, no. 3, pp. 520–545, Dec. 2018.
-  S. Banerjee and A. Roy, Linear Algebra and Matrix Analysis for Statistics, Norwell, MA: Chapman & Hall, Jun. 2014.
-  B. Zhang, B. Ai, R. He, F. Tufvesson, J. Flordelis, Q. Wang, and J. Li, “Empirical Evaluation of Indoor Multi-User MIMO Channels with Linear and Planar Large Antenna Arrays,” in IEEE PIMRC, Montreal, QC, Oct. 2017, pp. 1–6.
-  B. W. Bader and T. G. Kolda, “MATLAB tensor toolbox version 2.6,” Available: http://www.sandia.gov/~tgkolda/TensorToolbox/, 2015.
-  S. Dzeroski and B. Zenko, “Is combining classifiers with stacking better than selecting the best one?” Machine Learning, vol. 54, no. 3, pp. 255–273, Mar. 2004.