I Introduction
High-quality channel estimation is crucial for many wireless applications, yet it is resource-demanding in both time and frequency; channel interpolation [1] and prediction [2] techniques are therefore widely adopted to improve the estimation accuracy of channel state information (CSI). Recently, as an innovative and efficient branch of model-free machine learning methods, the extreme learning machine (ELM) has gathered much interest from researchers in diverse areas. Owing to its unique properties, such as fast training, solution uniqueness, and good generalization ability, the ELM is a promising tool for channel interpolation tasks. However, the standard ELM was originally proposed to process vectorized data [3] and is not directly applicable to the channel interpolation problem for MIMO channels. Specifically, MIMO channels exhibit frequency-space correlations, which are often recorded in the form of a matrix or tensor, and direct vectorization of such data leads to information loss. Therefore, a novel tensor-based ELM that can handle tensorial inputs is needed to learn from the CSI of MIMO channels.

There has been considerable effort in adapting the ELM to tensorial inputs by applying matrix/tensor decomposition techniques [4], but these adaptations are usually empirical. In this paper, we propose a novel ELM model with tensorial inputs (TELM) that extends the traditional ELM to tensorial contexts while retaining the valuable ELM features. Moreover, we propose a Tucker decomposed extreme learning machine (TDELM) based on the Tucker decomposition method [8] to reduce the computational complexity, and we establish a theoretical argument for its interpolation capability. Experimental results verify that our proposed methods can achieve performance comparable to the traditional ELM but with reduced complexity, and outperform the other tested methods considerably.
The remainder of this paper is organized as follows. Section II reviews the background of single-layer feedforward neural networks (SLFNs), the traditional ELM, tensor operations, and the Tucker decomposition. Section III presents the proposed TELM and TDELM models and discusses how they are applied to channel interpolation. Section IV investigates the properties of the considered models, and Section V presents the experimental results. Finally, Section VI concludes the paper.

II Preliminaries
II-A Single-Layer Feedforward Networks with Vector Inputs
Consider a dataset with $N$ data samples $(\mathbf{x}_i, y_i)$ for $i = 1, \dots, N$, where $\mathbf{x}_i \in \mathbb{R}^n$ is the feature vector of the $i$-th sample and $y_i$ is its label. Assume that an SLFN [3] contains $n$ input neurons and $L$ hidden neurons. The prediction of the label can be formulated as $\hat{y}_i = \sum_{j=1}^{L} \beta_j \sigma(\mathbf{w}_j^\top \mathbf{x}_i + b_j)$, where $\mathbf{W} = [\mathbf{w}_1, \dots, \mathbf{w}_L] \in \mathbb{R}^{n \times L}$ is the weight matrix, whose $(k, j)$-th entry is the weight between the $k$-th input neuron and the $j$-th hidden neuron; $b_j$ is the bias from the input layer to the $j$-th hidden neuron; $\boldsymbol{\beta} = [\beta_1, \dots, \beta_L]^\top$ is the weight vector between the hidden layer and the output neuron; and $\sigma$ is the sigmoid activation function, defined as $\sigma(x) = 1/(1 + e^{-x})$. In this setting, the bias between the hidden layer and the output layer is omitted. We then aim to solve the following optimization problem:

$$\min_{\mathbf{W}, \mathbf{b}, \boldsymbol{\beta}} \; \|\mathbf{y} - \hat{\mathbf{y}}\|^2, \tag{1}$$
where $\mathbf{y} = [y_1, \dots, y_N]^\top$ and $\hat{\mathbf{y}} = [\hat{y}_1, \dots, \hat{y}_N]^\top$ are the label vector and its prediction vector, respectively. A typical algorithm finds the optimal values of $(\mathbf{W}, \mathbf{b}, \boldsymbol{\beta})$ by propagating the errors backwards using gradient or subgradient descent methods. However, such an algorithm can be very sensitive to initialization and may become stuck at a local minimum because the objective function is nonconvex. In addition, the algorithm can be time-consuming, which restricts its use in practical applications.
II-B Traditional Extreme Learning Machines with Vector Inputs
Traditional ELMs were originally designed to train SLFNs [3]; they improve the training speed remarkably by randomly assigning $\mathbf{W}$ and $\mathbf{b}$, which transforms the aforementioned optimization into a least squares problem. More specifically, let $\mathbf{H}$ be an $N \times L$ matrix whose $(i, j)$-th entry is $\sigma(\mathbf{w}_j^\top \mathbf{x}_i + b_j)$. Instead of minimizing the objective function with respect to $\mathbf{W}$, $\mathbf{b}$, and $\boldsymbol{\beta}$, each entry of $\mathbf{W}$ and $\mathbf{b}$ is drawn from a continuous random distribution. After the matrix $\mathbf{H}$ is calculated, the solution to $\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{H}\boldsymbol{\beta}\|^2$ is given by $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{y}$ according to the Gauss-Markov theorem [6], where $\mathbf{H}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathbf{H}$.
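As a concrete illustration, the ELM training procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch: the function names and the Gaussian sampling of $\mathbf{W}$ and $\mathbf{b}$ are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, y, L, seed=0):
    """Vector-input ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.standard_normal((n, L))   # random input-to-hidden weights
    b = rng.standard_normal(L)        # random hidden-layer biases
    H = sigmoid(X @ W + b)            # N x L hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y      # beta = H^dagger y (least squares)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta
```

Only the output weights `beta` are fitted; the hidden layer stays at its random draw, which is what makes training a single least-squares solve.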
II-C Tensor Operations and Tucker Decomposition
In this paper, we treat tensors as multidimensional arrays. Specifically, a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_D}$ of order $D$ stores $\prod_{d=1}^{D} I_d$ elements. The inner product of two tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_D}$ is defined as $\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1, \dots, i_D} x_{i_1 \cdots i_D} \, y_{i_1 \cdots i_D}$. The vectorization $\mathrm{vec}(\mathcal{X})$ is obtained by stacking the elements of $\mathcal{X}$ into a $\prod_{d} I_d$-dimensional column vector in a fixed order [7], e.g., lexicographical or reverse lexicographical order. Tensors can also be unfolded into matrices. Given a tensor $\mathcal{X}$ and an index $d \in \{1, \dots, D\}$, the mode-$d$ matricization $\mathbf{X}_{(d)}$ of $\mathcal{X}$ is a matrix with $I_d$ rows and $\prod_{k \neq d} I_k$ columns obtained by unfolding $\mathcal{X}$ along the $d$-th coordinate [7]. For $d = 1, \dots, D$, the mode-$d$ rank of $\mathcal{X}$, denoted by $\mathrm{rank}_d(\mathcal{X})$, is defined as the rank of $\mathbf{X}_{(d)}$, which satisfies $\mathrm{rank}_d(\mathcal{X}) \le I_d$ [7].
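For concreteness, the mode-$d$ matricization and mode-$d$ rank can be computed as follows. This is a short NumPy sketch; the helper names are our own.

```python
import numpy as np

def mode_matricize(X, d):
    """Unfold tensor X along axis d into an (I_d x prod_{k != d} I_k) matrix."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

def mode_rank(X, d):
    """Mode-d rank: the rank of the mode-d matricization."""
    return np.linalg.matrix_rank(mode_matricize(X, d))
```

For example, a rank-one tensor formed as an outer product of three vectors has every mode rank equal to 1.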
The Tucker decomposition is a branch of the higher-order singular value decomposition [8]. Consider $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_D}$ and a vector $\mathbf{r} = (r_1, \dots, r_D)$. If $r_d = \mathrm{rank}_d(\mathcal{X})$ for all $d$, there exist a core tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times \cdots \times r_D}$ and factor matrices $\mathbf{U}^{(d)} \in \mathbb{R}^{I_d \times r_d}$, each column-wise orthonormal, such that

$$\mathcal{X} = \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \cdots \times_D \mathbf{U}^{(D)},$$

which can be written compactly as $\mathcal{X} = [\![\mathcal{G}; \mathbf{U}^{(1)}, \dots, \mathbf{U}^{(D)}]\!]$. If $r_d = \mathrm{rank}_d(\mathcal{X})$ for $d = 1, \dots, D-1$ and $r_D = I_D$, there exist a core tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times \cdots \times r_{D-1} \times I_D}$ and factor matrices with $\mathbf{U}^{(d)} \in \mathbb{R}^{I_d \times r_d}$ for $d < D$ and $\mathbf{U}^{(D)} = \mathbf{I}$ (an identity matrix) such that $\mathcal{X} = [\![\mathcal{G}; \mathbf{U}^{(1)}, \dots, \mathbf{U}^{(D)}]\!]$. If $r_d < \mathrm{rank}_d(\mathcal{X})$ for some $d$, such an exact core tensor and factor matrices do not exist. As an alternative, we can use an approximation of $\mathcal{X}$ given by the truncated Tucker decomposition [7].
We next introduce an important property of the Tucker decomposition mentioned in [9].
Lemma 1 (Duality Lemma):
Assume that a tensor $\mathcal{X}$ admits a Tucker decomposition $\mathcal{X} = [\![\mathcal{G}; \mathbf{U}^{(1)}, \dots, \mathbf{U}^{(D)}]\!]$, where all factor matrices are column-wise orthonormal. Then, for any tensor $\mathcal{W} \in \mathbb{R}^{I_1 \times \cdots \times I_D}$, we have $\langle \mathcal{W}, \mathcal{X} \rangle = \langle \mathcal{G}_{\mathcal{W}}, \mathcal{G} \rangle$, where $\mathcal{G}_{\mathcal{W}} = \mathcal{W} \times_1 \mathbf{U}^{(1)\top} \times_2 \cdots \times_D \mathbf{U}^{(D)\top}$.
Lemma 1 states that the inner product of the original pair $(\mathcal{W}, \mathcal{X})$ equals that of the decomposed pair $(\mathcal{G}_{\mathcal{W}}, \mathcal{G})$. If the sizes of $\mathcal{G}_{\mathcal{W}}$ and $\mathcal{G}$ are much smaller than those of $\mathcal{W}$ and $\mathcal{X}$, the computational cost of calculating the inner product can be drastically reduced.
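Lemma 1 can be checked numerically with a small NumPy experiment. This is a sketch under our own conventions: the mode-product helper, the tensor sizes, and the QR-based construction of orthonormal factors are illustrative choices.

```python
import numpy as np

def mode_product(X, M, d):
    """Mode-d product X x_d M: multiply matrix M into the d-th axis of X."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, d)), 0, d)

rng = np.random.default_rng(0)
# Build X = G x_1 U1 x_2 U2 x_3 U3 with column-wise orthonormal factors
G = rng.standard_normal((2, 3, 2))                 # core tensor
U = [np.linalg.qr(rng.standard_normal((I, r)))[0]  # orthonormal columns via QR
     for I, r in zip((5, 6, 4), G.shape)]
X = G
for d, Ud in enumerate(U):
    X = mode_product(X, Ud, d)

W = rng.standard_normal(X.shape)
# G_W = W x_1 U1^T x_2 U2^T x_3 U3^T
GW = W
for d, Ud in enumerate(U):
    GW = mode_product(GW, Ud.T, d)
# Duality: <W, X> equals <G_W, G>
assert np.isclose(np.sum(W * X), np.sum(GW * G))
```

Note that $\mathcal{G}_{\mathcal{W}}$ has the (small) size of the core, which is where the computational saving comes from.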
Algorithm 1 TELM
Input: Data samples $(\mathcal{X}_i, y_i)$, $i = 1, \dots, N$.
Output: Weight vector $\boldsymbol{\beta}$.
1. Draw each entry of the weight tensor $\mathcal{W}$ and the bias vector $\mathbf{b}$ from a continuous random distribution.
2. Calculate the matrix $\mathbf{H}$ from $\mathcal{W}$, $\mathbf{b}$, and $\{\mathcal{X}_i\}$ by (2).
3. Calculate the parameter vector $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{y}$.
4. Return $\boldsymbol{\beta}$.
III Frameworks
III-A Channel Interpolation
To conduct high-efficiency data transmission in a typical multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) setting, CSI for each subcarrier is required. One way of acquiring the CSI is to insert a pilot signal into each subcarrier and then calculate the CSI at the receiver. To further reduce the probing overhead, the following interpolation scheme is adopted. Let $h_k$ denote the $k$-th subcarrier's CSI; pilots are inserted only into the subcarriers with odd indices. The CSI of a subcarrier with an even index $2k$ is then inferred as

$$\hat{h}_{2k} = f(h_{2k-2w+1}, h_{2k-2w+3}, \dots, h_{2k+2w-1}),$$

where $w$ is the window length and $f$ is the interpolation function, i.e., $\hat{h}_{2k}$ is estimated from the $w$ nearest pilot subcarriers on each side. In our work, we adopt the modified ELM as $f$; the details are explained in the next section. Notice that different window designs can be adopted to balance carrier usage against accuracy.
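The pilot pattern above can be turned into (feature, label) training pairs for a learning-based interpolator as follows. This is an illustrative sketch: the exact indexing and window layout are our assumptions, not taken from the paper.

```python
import numpy as np

def make_interp_samples(h, w):
    """Build (feature, label) pairs for pilot-based interpolation.

    h : CSI per subcarrier; pilots sit at odd indices, and the
        even-indexed subcarriers are the interpolation targets.
    w : window length (number of pilot neighbours on each side).
    """
    feats, labels = [], []
    for k in range(2 * w, len(h) - 2 * w, 2):        # even target indices
        # the 2w odd-indexed pilots surrounding subcarrier k
        pilots = [h[k + off] for off in range(-2 * w + 1, 2 * w, 2)]
        feats.append(pilots)
        labels.append(h[k])
    return np.array(feats), np.array(labels)
```

On a channel response that varies linearly across subcarriers, the average of the window's pilots already recovers the target exactly; the learned $f$ is needed for the realistic, nonlinear case.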
III-B Extreme Learning Machines with Tensorial Inputs
Consider a dataset with $N$ data samples $(\mathcal{X}_i, y_i)$ for $i = 1, \dots, N$, where $\mathcal{X}_i \in \mathbb{R}^{I_1 \times \cdots \times I_D}$ is a tensor of order $D$ and $y_i$ is its label. For an SLFN with $\prod_{d=1}^{D} I_d$ input neurons and $L$ hidden neurons, the prediction of the label can be calculated as $\hat{y}_i = \sum_{j=1}^{L} \beta_j \sigma(\langle \mathcal{W}_j, \mathcal{X}_i \rangle + b_j)$, where $\mathcal{W}$ is the total weight tensor of order $D+1$ with $\mathcal{W}_j$ as the weight tensor of the $j$-th hidden neuron; $\mathbf{b}$, $\boldsymbol{\beta}$, $\sigma$, $\mathbf{y}$, and $\hat{\mathbf{y}}$ are defined as in Section II. The goal is still to minimize $\|\mathbf{y} - \hat{\mathbf{y}}\|^2$. Consistent with the traditional ELM, we draw each entry of the weight tensor $\mathcal{W}$ and the bias vector $\mathbf{b}$ from a continuous random distribution and solve the problem $\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{H}\boldsymbol{\beta}\|^2$, where

$$H_{ij} = \sigma(\langle \mathcal{W}_j, \mathcal{X}_i \rangle + b_j). \tag{2}$$

The problem has a unique least squares solution $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{y}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathbf{H}$ as defined above. The detailed procedure is summarized in Algorithm 1. It is worth pointing out that the TELM handles tensorial inputs with the same computational cost as the traditional ELM with vectorized inputs.
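Since the tensor inner product $\langle \mathcal{W}_j, \mathcal{X}_i \rangle$ equals the dot product of the corresponding vectorizations, the matrix $\mathbf{H}$ in (2) can be computed with a single matrix multiplication. A sketch (the array layout, with samples and hidden neurons stacked along the first axis, is our own convention):

```python
import numpy as np

def telm_hidden_matrix(Xs, Ws, b):
    """H[i, j] = sigmoid(<X_i, W_j> + b_j) for tensorial samples and weights.

    Xs : (N, I_1, ..., I_D) stack of sample tensors
    Ws : (L, I_1, ..., I_D) stack of hidden-neuron weight tensors
    b  : length-L bias vector
    """
    Xv = Xs.reshape(Xs.shape[0], -1)   # vectorize each sample tensor
    Wv = Ws.reshape(Ws.shape[0], -1)   # vectorize each weight tensor
    return 1.0 / (1.0 + np.exp(-(Xv @ Wv.T + b)))
```

This is why the TELM costs the same as a vector-input ELM: the tensorial structure changes what the weights mean, not how $\mathbf{H}$ is computed.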
III-C Tucker Decomposed Extreme Learning Machines
To improve the learning efficiency of the TELM, we further propose the Tucker decomposed extreme learning machine (TDELM) based on the Tucker decomposition method. In general, computing $\mathbf{H}$ in (2) requires $NL\prod_{d=1}^{D} I_d$ multiplications, which is computationally demanding for a large dataset. By employing the Tucker decomposition, the cost of computing $\mathbf{H}$ can be largely reduced for datasets with low mode ranks.
Consider a dataset with $N$ samples $(\mathcal{X}_i, y_i)$ for $i = 1, \dots, N$. We first concatenate $\{\mathcal{X}_i\}$ into a tensor $\bar{\mathcal{X}}$ of order $D+1$, then find the mode-$d$ rank $r_d$ of $\bar{\mathcal{X}}$ for $d = 1, \dots, D$, and next apply the Tucker decomposition to $\bar{\mathcal{X}}$ such that the core tensor $\mathcal{G}$ is of size $r_1 \times \cdots \times r_D \times N$ and the $(D+1)$-th factor matrix is an identity matrix. Afterwards, we extract $\mathcal{G}$ along the $(D+1)$-th axis into $N$ subtensors $\mathcal{G}_i \in \mathbb{R}^{r_1 \times \cdots \times r_D}$. We next consider an SLFN with $\prod_{d=1}^{D} r_d$ input neurons and $L$ hidden neurons. Let $\mathcal{W}$ be the total weight tensor with $\mathcal{W}_j$ as the weight tensor of the $j$-th hidden neuron, $b_j$ be the bias scalar from the input layer to the $j$-th hidden neuron, and $\boldsymbol{\beta}$ be the weight vector between the hidden layer and the output neuron. Each entry of $\mathcal{W}$ or $\mathbf{b}$ is randomly drawn from a continuous random distribution, and $\mathbf{H}$ is then calculated as

$$H_{ij} = \sigma(\langle \mathcal{W}_j, \mathcal{G}_i \rangle + b_j). \tag{3}$$

Finally, we solve the optimization problem $\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{H}\boldsymbol{\beta}\|^2$ with the least squares solution $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{y}$. The above procedure is summarized in Algorithm 2.
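The compression steps of this procedure can be sketched via a truncated HOSVD that takes the leading left singular vectors of each mode unfolding, compressing only the data modes and leaving the sample axis untouched. This is our own minimal sketch (the paper's implementation uses the Tensor Toolbox); the helper name and conventions are illustrative.

```python
import numpy as np

def tdelm_cores(Xbar, ranks):
    """Compress the concatenated sample tensor Xbar (samples on the last
    axis) to its Tucker core, keeping the sample axis uncompressed.
    ranks = (r_1, ..., r_D) for the D data modes."""
    G = Xbar
    for d, r in enumerate(ranks):                          # data modes only
        unf = np.moveaxis(G, d, 0).reshape(G.shape[d], -1) # mode-d unfolding
        U = np.linalg.svd(unf, full_matrices=False)[0][:, :r]
        # project mode d onto the leading r left singular vectors
        G = np.moveaxis(np.tensordot(U.T, G, axes=(1, d)), 0, d)
    return G                                               # (r_1, ..., r_D, N)
```

Each slice `G[..., i]` then plays the role of $\mathcal{G}_i$ in (3), so the ELM operates on $\prod_d r_d$ inputs instead of $\prod_d I_d$.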
Algorithm 2 TDELM
Input: Data samples $(\mathcal{X}_i, y_i)$, $i = 1, \dots, N$.
Output: Weight vector $\boldsymbol{\beta}$.
1. Concatenate $\{\mathcal{X}_i\}$ into a tensor $\bar{\mathcal{X}}$ of order $D+1$.
2. Find the mode-$d$ rank $r_d$ of $\bar{\mathcal{X}}$ for $d = 1, \dots, D$.
3. Conduct a Tucker decomposition $\bar{\mathcal{X}} = [\![\mathcal{G}; \mathbf{U}^{(1)}, \dots, \mathbf{U}^{(D)}, \mathbf{I}]\!]$ such that $\mathcal{G} \in \mathbb{R}^{r_1 \times \cdots \times r_D \times N}$, $\mathbf{U}^{(d)} \in \mathbb{R}^{I_d \times r_d}$ for $d = 1, \dots, D$, and $\mathbf{U}^{(D+1)} = \mathbf{I}$.
4. Draw each entry of the weight tensor $\mathcal{W}$ and bias vector $\mathbf{b}$ from a continuous random distribution.
5. Calculate the matrix $\mathbf{H}$ from $\mathcal{W}$, $\mathbf{b}$, and $\{\mathcal{G}_i\}$ by (3).
6. Calculate the parameter vector $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{y}$.
7. Return $\boldsymbol{\beta}$.
IV Properties of TELM/TDELM
In this section, we establish interpolation theorems for the TELM and TDELM. Specifically, given a dataset with $N$ distinct data samples $(\mathcal{X}_i, y_i)$, a TELM or TDELM with $N$ hidden neurons and the sigmoid activation function has the following properties.

Theorem 1 (Interpolation Capability of TELMs):
Assume that each entry of the weight tensor $\mathcal{W}$ and bias vector $\mathbf{b}$ is randomly chosen from an interval according to a continuous probability distribution. Then with probability one, $\mathbf{H}$ in (2) is invertible and $\|\mathbf{y} - \mathbf{H}\boldsymbol{\beta}\| = 0$.

Theorem 2 (Interpolation Capability of TDELMs):
Assume that each entry of the weight tensor $\mathcal{W}$ and bias vector $\mathbf{b}$ is randomly chosen from an interval according to a continuous probability distribution. Then with probability one, $\mathbf{H}$ in (3) is invertible and $\|\mathbf{y} - \mathbf{H}\boldsymbol{\beta}\| = 0$.
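As a quick numerical illustration of Theorem 1 (a toy example of our own, not from the paper): with $L = N$ hidden neurons, the random hidden-layer matrix $\mathbf{H}$ is square and, with probability one, invertible, so the training error is driven to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
# N distinct tensorial samples of order 3 and their labels
Xs = rng.standard_normal((N, 3, 4, 2))
y = rng.standard_normal(N)
# TELM with L = N hidden neurons, as in the Theorem 1 setting
Ws = rng.standard_normal((N, 3, 4, 2))
b = rng.standard_normal(N)
# H[i, j] = sigmoid(<W_j, X_i> + b_j), computed via vectorization
H = 1.0 / (1.0 + np.exp(-(Xs.reshape(N, -1) @ Ws.reshape(N, -1).T + b)))
beta = np.linalg.solve(H, y)      # H is square and invertible w.p. 1
assert np.allclose(H @ beta, y)   # zero training error
```

Zero training error here is an interpolation property, not a generalization guarantee; in the experiments of Section V the hidden-layer size is a tuned hyperparameter.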
Please refer to the Appendix for the details of the proofs.
V Simulation Results
In this section, an experiment is conducted on a real-world wireless MIMO channel response dataset to compare the training time and accuracy of a simple model-free method, an SLFN-based method, the traditional ELM-based method, and the proposed TDELM-based method. We will show that: 1) the TDELM achieves prediction accuracy comparable to the other methods, and 2) the TDELM requires lower computational cost and shorter training time than the ELM and the SLFN over decomposed data.
The dataset was generated by conducting experiments in a lecture hall. Specifically, a 64-element virtual uniform linear array (ULA) is used as the transmitter (Tx) antenna array, and a single-antenna receiver (Rx) is deployed at three different locations within the auditorium of the hall. The carrier frequency is 4 GHz, and the 200 MHz measurement bandwidth is uniformly sampled at 513 frequency points. Sixteen consecutive snapshots are obtained for each subchannel. The obtained results are stored in a tensor, in which each entry represents the response from one Tx array element to the Rx at a given frequency point. More detailed descriptions of the dataset can be found in [11].
The channel responses are normalized to zero mean and unit standard deviation for convenient comparison. The window size is set to 4. The feature tensor $\mathcal{X}$ is created by a sliding-window method, which captures the local correlation in the neighborhood of adjacent points; the tensor decomposition is then conducted on $\mathcal{X}$.
The dataset is divided evenly into two subsets, one as the training set and the other as the test set, each containing 4,104 samples. Two neural networks, an ELM and a TDELM, are constructed with the same number of hidden-layer neurons for fairness of comparison. The Tucker decomposition was implemented by modifying the Tensor Toolbox library [12]. Least mean square filtering (LMSE) and input averaging (the Mean method) are used as comparison schemes, and a grid search is conducted over the hyperparameters: the net size and the decomposition size. The mean squared error (MSE) and running time are recorded. To mitigate fluctuations caused by randomness, the Select-Best method [13] is adopted and training is repeated 100 times to find the best parameter set.

Table I: MSE and training time consumption (TC, in seconds) of the compared methods.

Method   TDELM    ELM      TD+NN    NN       Mean     LMSE
MSE      0.0280   0.0286   0.0645   0.0363   0.0769   0.0377
TC       1.3786   1.5896   397.28   689.94   N/A*     0.0805

*There are no tunable parameters for the Mean method, and hence its training time is unavailable.
The chosen hyperparameters are as follows: the number of nodes in the hidden layer is 1,080, and the decomposition size is selected by the grid search. The Tucker decomposition takes about 6 seconds on this dataset. The MSE and time consumption results are summarized in Table I, and a slice of the prediction results is shown in Figs. 1 and 2.
The TDELM clearly performs best while remaining reasonably fast. Fig. 2 shows that the TDELM achieves consistent gains over the ELM and the NN. In addition, an instance of the predicted curve is shown in Fig. 1, where the blue line and the yellow line denote the true value and the predicted value, respectively. In the area marked by the orange circle, the other methods exhibit visible errors, while the TDELM still follows the curve closely. Another interesting observation is that, although the TDELM is more accurate than the ELM, the TD+NN scheme performs worst. This not only shows that the TDELM can outperform the ELM, but also implies that the gain comes from combining the decomposition with a learning machine suited to the decomposed data.
VI Conclusion
In this paper, we have proposed an extreme learning machine with tensorial inputs (TELM) and a Tucker decomposed extreme learning machine (TDELM) to handle channel interpolation tasks. Moreover, we have established a theoretical argument for the interpolation capability of the TDELM. The experimental results verify that our proposed models can achieve performance comparable to the traditional ELM but with reduced complexity, and outperform the other tested methods considerably in terms of channel interpolation accuracy.
References
 [1] H. C. Lee, J. C. Chiu, and K. H. Lin, "Two-dimensional interpolation-assisted channel prediction for OFDM systems," IEEE Trans. Broadcast., vol. 59, no. 4, pp. 648–657, Dec. 2013.
 [2] I. C. Wong, A. Forenza, R. W. Heath and B. L. Evans, “Long range channel prediction for adaptive OFDM systems,” in Proc. IEEE Asilomar Conf. Signals Systems Computers, Pacific Grove, CA, Nov. 2004, pp. 732–736.
 [3] G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: a new learning scheme of feedforward neural networks,” in Proc. IEEE Int. Joint Conf. Neural Netw., Budapest, Hungary, Jul. 2004, pp. 985–990.
 [4] N. K. Nair and S. Asharaf, “Tensor Decomposition Based Approach for Training Extreme Learning Machines,” Big Data Research, vol. 10, pp. 8–20, Dec. 2017.
 [5] I. Kotsia and I. Patras, “Relative margin support tensor machines for gait and action recognition,” in Proc. ACM CIVR, Xi’an, China, Jul. 2010, pp. 446–453.
 [6] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, Philadelphia, PA: SIAM, Jun. 2001.
 [7] T. G. Kolda and B. W. Bader. “Tensor decompositions and applications,” SIAM Rev., vol. 51, no. 3, pp. 455–500, Aug. 2009.
 [8] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, no. 3, pp. 279–311, Sept. 1966.
 [9] X. Li, H. Zhou, and L. Li, “Tucker tensor regression and neuroimaging analysis,” Stat. Biosci., vol. 10, no. 3, pp. 520–545, Dec. 2018.
 [10] S. Banerjee and A. Roy, Linear Algebra and Matrix Analysis for Statistics, Norwell, MA: Chapman & Hall, Jun. 2014.
 [11] B. Zhang, B. Ai, R. He, F. Tufvesson, J. Flordelis, Q. Wang, and J. Li, "Empirical evaluation of indoor multi-user MIMO channels with linear and planar large antenna arrays," in Proc. IEEE PIMRC, Montreal, QC, Oct. 2017, pp. 1–6.
 [12] B. W. Bader and T. G. Kolda, “MATLAB tensor toolbox version 2.6,” Available: http://www.sandia.gov/~tgkolda/TensorToolbox/, 2015.

 [13] S. Dzeroski and B. Zenko, "Is combining classifiers with stacking better than selecting the best one?" Machine Learning, vol. 54, no. 3, pp. 255–273, Mar. 2004.