I Introduction
In fifth-generation (5G) and future wireless communications, millimeter wave (mmWave) communication is an appealing solution that provides abundant spectrum and thus satisfies the critical demands of explosively growing data traffic [7, 25]. However, data transmission in the mmWave band is challenging because the high path loss limits the coverage area. The small carrier wavelength enables packing a large number of antenna elements into small form factors. Leveraging the large antenna arrays employed at the transmitter and receiver, mmWave systems can perform directional beamforming to achieve high beamforming gain, which helps overcome the large free-space path loss of mmWave signals and guarantees a sufficient received signal-to-noise ratio (SNR). Nevertheless, large-scale antenna arrays bring significant challenges for channel estimation, especially in highly mobile environments.
Recently, deep learning (DL) [9] has been applied to physical layer communications and is regarded as an enabling technology for future wireless mobile networks. The learning-based approach is data-driven and inherently applicable to scenarios with imperfect models and/or intractable problems, where the model-driven method cannot work well [19, 17, 11, 12]. However, the data-driven method, especially the end-to-end scheme, has several obvious shortcomings, including high dependence on data, high training and model complexity, and a lack of interpretability and performance guarantees. Meanwhile, model-driven methods are free from these shortcomings by nature. Therefore, embedding learnable modules into the existing model-based system, or designing a specific neural network (NN) with domain knowledge in communications, can combine the advantages of both paradigms and possibly achieve better performance [30, 10].
In terms of beam alignment/tracking (BA/T) for mobile terminals (MTs), current model-based methods are feasible and achieve (sub)optimal performance with simple and explicit simulated models [3, 16, 5]. Meanwhile, practical environments carry implicit and complex prior information in the time, frequency and spatial domains, which data-driven methods can exploit better than model-driven ones. In the literature, learning-enabled BA/T in mobile environments has been widely investigated in recent years. Beam alignment and user localization are strongly coupled in mmWave communications. With the aid of spatial location information, it is possible to conduct beam alignment with higher accuracy and lower overhead. In [20], a mapping from the user location to the beam pairs (fingerprints) is learned by supervised learning (SL). The labeled data are collected at different locations and stored in a database, and the mapping is usually realized by a deep NN (DNN) in complex practical environments. The spatial location information can also be implicitly presented as global positioning system (GPS) signals [13] and 3D point clouds [26]. DL-enabled compressed sensing (CS) is developed in [14], where the researchers design a structured DNN-based CS matrix for vehicular environments. Besides the above SL approaches, BA/T can be realized by deep reinforcement learning (DRL) [22] in a closed-loop manner [24, 32, 31]. In [31], an interactive learning design paradigm, which makes full use of domain knowledge and adaptive learning, is developed. The paradigm requires no prior knowledge of the dynamic channel modeling, and thus is applicable to a variety of complicated scenarios. Different from the above DL-based approaches, sparse Bayesian learning (SBL) has also been considered in [15], where the low-rank property of time-varying massive multiple-input multiple-output (MIMO) channel covariances is utilized to reduce the training overhead. An expectation-maximization-based SBL framework is used to learn the sparse parameter set. Furthermore, a Kalman filter is adopted to exploit the channel temporal correlations and enhance the channel tracking accuracy. For high-speed railway (HSR) wireless networks, the significant angle offset induced in the initial access process is investigated in [28]. This research builds on the periodicity and regularity of the trains' trajectory. To compensate for the angle offset, the aligned beam is adjusted using the historical beam training results. To reduce the beam search space, a best beam pair lookup table is learned from the historical information.
In this paper, we investigate learning-aided BA/T for HSR mmWave wireless networks. The advanced HSR system has the following notable features: high-speed MTs up to ; high-density MTs up to hundreds on one carriage; and high-quality services such as real-time video transmission [2, 21]. Meanwhile, the current mobile network for the HSR system is far from satisfactory due to the scarcity of spectrum resources. Therefore, it is essential to develop mmWave techniques for the explosively growing demand in the advanced HSR system. The current beam management procedure, which includes beam measurement, reporting and indication, performs well in a regular mmWave scenario where the MTs move at a low speed, but is inapplicable to a typical HSR scenario [8]. In high-speed mobile scenarios, this procedure is inefficient for the following two reasons:

Beam training overhead. Regarding beam measurement, the overhead caused by frequent beam training can be very large because of the small beam dwelling time. When the number of MTs increases to and the train speed is , simulation results in [6] show that almost all time-frequency resources are occupied by beam training.

Time delay loss. Regarding beam reporting and beam indication, the corresponding latency is mainly produced by activating candidate beams from the radio resource control (RRC) pool. The report [6] demonstrates that the latency can be up to with synchronization signal block (SSB) periodicity.
To the best of our knowledge, the above two problems have not been addressed in the existing studies. Therefore, to reduce the beam training overhead and time delay loss in the HSR scenarios, it is essential to develop a new beam management framework.
In this paper, we propose a learning-aided beam prediction scheme. More concretely, given a group of received pilot signals and measurements, including Doppler frequencies and communication delays at different instants, we predict the optimal Tx/Rx beams over a future period with fine time granularity. The prediction duration, up to the order of seconds, reduces the overhead and delay to (near) zero, and the time granularity, on the order of milliseconds (much smaller than the beam dwelling time), guarantees the beamforming performance. Beam prediction can be carried out in a purely model-driven manner, but this cannot perform well with implicit environment priors and system models. On the other hand, a purely data-driven approach, which outputs high-dimensional beam indexes, is difficult to realize with an end-to-end DNN. Consequently, we propose a model-based learnable beam prediction scheme, which equivalently transforms the high-dimensional beam prediction into two cascaded stages, i.e., parameter estimation and hybrid beamforming.
First, given a group of observations, we derive estimates of two parameter sets, i.e., the MT locations and speeds, separately and independently. Meanwhile, the bias and variance of the estimated results cannot be derived in a practical environment. Therefore, we propose a learnable data fusion module to implicitly estimate the corresponding bias and variance, which further improves both the estimation accuracy and robustness. Secondly, owing to the prior information that the train trajectory is deterministic and that the mmWave channel can be well described as urban macro (UMa) line of sight (LoS) in TR [1], the hybrid beamforming is realized from the estimated parameter set. Additionally, to handle the nonlinearity of tracks, we propose a learnable nonlinear mapping module. The technical contributions of this work are summarized as follows.
We propose a beam prediction scheme which reduces the overhead and delay arising from beam measurement and reporting to (near) zero. The high-dimensional beam prediction problem is equivalently transformed into two cascaded subproblems, i.e., parameter estimation and hybrid beamforming, which are both model-based and learnable.

Separate estimation of two parameter sets is performed using the maximum likelihood (ML) criterion. Furthermore, we propose a data fusion module to learn the corresponding biases and variances, and obtain a final parameter set with higher accuracy and better robustness.

We propose to predict the optimal BS analog precoder and MT combiner with the estimated parameter set. The long-term prediction duration is up to , and the fine time granularity is . The BS digital precoder is realized by classical minimum mean square error (MMSE) precoding.

We propose a learnable nonlinear mapping module to fit the nonlinear tracks, where the approximator is composed of piecewise linear functions. The learned mapping is used for MT location search in ML estimators, BS analog precoding and MT combining. The upper bound of fitting error is also given.
The rest of this paper is organized as follows. The system model and the problem formulation are described in Section II. The beam prediction, including parameter estimation and hybrid beamforming, is described in Section III. The numerical results are given in Section IV, and the conclusions are drawn in Section V.
Notations: We use lowercase (uppercase) boldface to denote the vector (matrix), and is a scalar. Calligraphy letter represents the set or the probability distribution. Superscripts , and represent the transpose, the complex conjugate and the Hermitian transpose, respectively. denotes the expectation operator. denotes an identity matrix, and means is complex circularly-symmetric Gaussian distributed with zero mean and covariance . is an absolute operator, denotes the norm. and represent the real field and complex field, respectively.
II System Model and Problem Formulation
II-A System Model
We consider a link-level multi-user (MU) MIMO mmWave communication system composed of a BS and several MTs. The BS is equipped with antennas and radio frequency (RF) chains, and the RF chains are fully connected with the antennas. The BS simultaneously serves MTs, and each MT is equipped with antennas and RF chain. In practice, both the analog transmitter precoder and receiver combiner are realized by the discrete Fourier transform (DFT) codebooks, i.e., and . The received signal of MT in the antenna field is represented as follows
(1)
where denotes the channel matrix, is the transmitter digital precoder, is the baseband signal, and is the additive noise. According to the mmWave channel model [footnote 1: We assume the BS and the MTs are (approximately) on the same horizontal plane, and thus the uniform linear array (ULA) is considered.], is a sum of the contributions of dominant paths, thus the discrete-time narrowband channel matrix can be described as
(2) 
where is a complex channel gain of path , and are the azimuth angles of arrival (AoA) and departure (AoD), respectively. The array spacing is half of the carrier wavelength, and the array responses at the receiver and transmitter are respectively given as follows
(3) 
Abundant prior knowledge is available in the HSR scenario, which helps simplify the beam prediction. We summarize the prior knowledge point by point:

1) The channel always contains a LoS path.

2) The power of the LoS path is much higher than that of the non-LoS (NLoS) paths, i.e., .

3) The MT moves along the track at some initial speed and acceleration .

4) The mapping between the AoD of the LoS path and the corresponding spatial location projected on the x-axis is a bijection, i.e., .
According to the prior knowledge 1) and 2), the channel function in (2) can be further simplified as
(4) 
Therefore, the channel matrix can be described by a parameter set of only two elements, i.e., .
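The LoS-only channel and the DFT codebooks described above can be illustrated with a minimal sketch. The normalization of the array responses, the sign of the phase term and all function names below are assumptions made for illustration; they are not taken from (3) or (4).

```python
import numpy as np

def ula_response(num_antennas, angle_rad):
    """Unit-norm ULA response with half-wavelength spacing (assumed convention)."""
    n = np.arange(num_antennas)
    return np.exp(1j * np.pi * n * np.sin(angle_rad)) / np.sqrt(num_antennas)

def los_channel(gain, aoa_rad, aod_rad, n_rx, n_tx):
    """Rank-one LoS channel: gain * a_rx(AoA) * a_tx(AoD)^H."""
    a_rx = ula_response(n_rx, aoa_rad)
    a_tx = ula_response(n_tx, aod_rad)
    return gain * np.outer(a_rx, a_tx.conj())

def dft_codebook(num_antennas):
    """Columns are unit-norm DFT beams."""
    n = np.arange(num_antennas)
    return np.exp(2j * np.pi * np.outer(n, n) / num_antennas) / np.sqrt(num_antennas)

def best_beam_pair(channel):
    """Exhaustive search for the (Rx, Tx) DFT beam pair with the largest gain."""
    n_rx, n_tx = channel.shape
    gains = np.abs(dft_codebook(n_rx).conj().T @ channel @ dft_codebook(n_tx)) ** 2
    return np.unravel_index(np.argmax(gains), gains.shape)
```

Under these assumptions the channel is fully determined by a complex gain and the angles, which is what makes a low-dimensional parameterization possible.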
II-B Problem Formulation
As shown in Fig. 1, in the first step of the beam prediction procedure, the parameters of the parametric motion model, such as the projected location and speed, must be estimated from different observations. Specifically, the observations are carried out times with fixed time interval . At each instant, the BS transmits all horizontal pilot beams. The first observation is the downlink pilot signal. Given the projected location at instant , the received pilot signal of beam can be written as
(5) 
where is the pilot signal. The second observation consists of the measured communication delay and Doppler frequency . At instant , the measurements are respectively given as follows
(6)  
(7) 
where is the measurement error of communication delay with variance , and is the measurement error of Doppler frequency with variance . The variances are modeled by the range resolution and Doppler frequency resolution in radar theory. Besides, the residual carrier frequency is included in . Therefore, variances are modeled as follows
(8)  
(9) 
where is the speed of light, is the bandwidth, is the carrier frequency, is the integral time, and is the residual carrier frequency ratio.
The acceleration is considered in assumption 3) in Section II-A. To ensure the passengers' comfort, the absolute value of the HSR acceleration is relatively small. Besides, we prove in Appendix A that accurate estimation of the acceleration with a limited observation interval and a limited number of observations is infeasible. Therefore, the effects of HSR acceleration can be neglected, and we only estimate the projected location and speed of the MTs. Specifically, the parameter estimation problem is: given a group of received downlink pilot signals and measurements of communication delays and Doppler frequencies , how to estimate the parameter set of the MT, where , and are respectively the projected location and speed at the final instant .
III Beam Prediction
III-A Parameter Estimation
III-A1 Linear Tracks
Firstly, we consider a simple case where the track is modeled as a straight line parallel to the x-axis with fixed distance , as shown in Fig. 2(a). With the assumption of simplified uniform motion, the projected locations can be expressed as
(10) 
When the MT moves from left to right, the speed is regarded as positive, and vice versa. Therefore, the function can be expressed as follows
(11) 
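As an illustration of the uniform-motion model on a linear track, the following sketch back-computes the projected locations at all observation instants from the location and speed at the final instant, and maps each location to an AoD. It assumes the BS sits at the origin with the array broadside perpendicular to the track; this angle convention and the function names are assumptions, not the definitions in (10) and (11).

```python
import numpy as np

def projected_locations(x_final, speed, num_obs, interval):
    """Locations at the observation instants, ending at x_final (uniform motion)."""
    steps_back = np.arange(num_obs - 1, -1, -1)  # N-1, ..., 1, 0
    return x_final - steps_back * speed * interval

def aod_linear_track(x, track_distance):
    """AoD measured from the array broadside (assumed convention)."""
    return np.arctan2(x, track_distance)

# Example: 5 observations, 10 ms apart, MT moving left to right at 100 m/s.
xs = projected_locations(x_final=50.0, speed=100.0, num_obs=5, interval=0.01)
thetas = aod_linear_track(xs, track_distance=20.0)
```

With this convention the angle increases strictly with the projected location, so the location-to-AoD mapping is a bijection, consistent with the fourth item of prior knowledge.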
With parameter set , the posterior probability of the received pilot signal is expressed as follows
(12) 
Therefore, the overall posterior probability of a group of received pilot signals can be described as follows
(13) 
To estimate the binary set with a unique solution, the number of measurements must be larger than the number of elements, i.e., .
The prior of can be difficult to obtain, so the parameter set is estimated by the maximum likelihood (ML) criterion without the prior of . The posterior probability is nonconvex with respect to the multidimensional set . Due to the exponential computational complexity, exhaustive search in the high-dimensional parameter space is difficult and inefficient. To reduce the computational complexity, we propose to use the coordinate descent method, which updates the elements in the parameter set alternately and iteratively [4]. In the first iteration, the parameter elements are unknown, and the initial elements are obtained as follows
(14)  
(15)  
(16) 
The closed form of is given in (15). Due to the nonconvexity of (14) and (16), and are both derived by one-dimensional search. Similarly, the th update is given as follows
(17)  
(18)  
(19) 
For a linear track, the complete parameter estimation algorithm with the received pilot signals is given in Algorithm 1.
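The alternating one-dimensional searches can be sketched as follows. This is not Algorithm 1 itself: it assumes a single complex gain shared by all observations (estimated in closed form by least squares), white Gaussian noise (so that ML reduces to least squares), grid-based one-dimensional searches, and the angle convention AoD = arctan(x / distance). All names are hypothetical.

```python
import numpy as np

def steering(n_tx, theta):
    return np.exp(1j * np.pi * np.arange(n_tx) * np.sin(theta)) / np.sqrt(n_tx)

def beam_domain_model(x_final, speed, codebook, num_obs, interval, dist):
    """Noise-free pilots for unit gain, stacked over instants (a_tx^H f_b per beam)."""
    xs = x_final - np.arange(num_obs - 1, -1, -1) * speed * interval
    thetas = np.arctan2(xs, dist)
    n_tx = codebook.shape[0]
    return np.concatenate([steering(n_tx, t).conj() @ codebook for t in thetas])

def residual(y, g):
    """Squared error after plugging in the closed-form least-squares gain."""
    alpha = np.vdot(g, y) / np.vdot(g, g)
    return np.linalg.norm(y - alpha * g) ** 2

def coordinate_descent(y, codebook, num_obs, interval, dist, x_grid, v_grid, iters=3):
    n_beams = codebook.shape[1]

    def cost(x, v, n=num_obs):
        # Use only the last n observations of the stacked pilot vector.
        return residual(y[-n * n_beams:],
                        beam_domain_model(x, v, codebook, n, interval, dist))

    # Initial location from the last observation only (independent of the speed).
    x_hat = min(x_grid, key=lambda x: cost(x, 0.0, n=1))
    v_hat = 0.0
    for _ in range(iters):
        v_hat = min(v_grid, key=lambda v: cost(x_hat, v))  # 1-D search over speed
        x_hat = min(x_grid, key=lambda x: cost(x, v_hat))  # 1-D search over location
    return float(x_hat), float(v_hat)
```

Each iteration costs one pass over the location grid plus one pass over the speed grid, instead of the product of the two grid sizes required by a joint exhaustive search.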
The parameter estimations by the received pilot signals and by the measurements are carried out independently. According to the geometric relationship between the BS and MT in Fig. 2(a), the derivation of with respect to can be described as follows
(20)  
(21) 
where is the speed component along the direction of LoS. With (20) and (21), the posterior probabilities of the th measurement can be respectively described as follows
(22)  
(23) 
The overall posterior probability of a group of measurements can be described as follows
(24) 
Similarly, the estimation problem with given measurements can be solved by alternating iterative optimization. The initializations of the parameters and are respectively given as follows
(25)  
(26) 
where is the sign function. In iteration , the parameters are derived by one-dimensional search, and they are respectively given as follows
(27)  
(28) 
For a linear track, the complete parameter estimation algorithm with the measurements is given in Algorithm 2.
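A minimal sketch of the measurement model on a linear track and of the corresponding Gaussian negative log-likelihood is given below. The Doppler sign convention (negative when the MT recedes from the BS), the one-way delay and all names are assumptions for illustration; they are not taken from (20) to (24).

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def delay_and_doppler(x, speed, dist, carrier_hz):
    """One-way delay and Doppler shift of the LoS path on a linear track."""
    r = np.hypot(x, dist)                 # BS-to-MT range
    tau = r / C
    radial_speed = speed * x / r          # speed component along the LoS direction
    f_d = -radial_speed * carrier_hz / C  # assumed sign: receding MT gives negative shift
    return tau, f_d

def neg_log_likelihood(x_final, speed, taus, f_ds, interval, dist,
                       carrier_hz, var_tau, var_fd):
    """Gaussian negative log-likelihood (up to a constant) of all measurements."""
    n = len(taus)
    xs = x_final - np.arange(n - 1, -1, -1) * speed * interval
    tau_m, fd_m = delay_and_doppler(xs, speed, dist, carrier_hz)
    return (np.sum((taus - tau_m) ** 2) / (2.0 * var_tau)
            + np.sum((f_ds - fd_m) ** 2) / (2.0 * var_fd))
```

Minimizing this function alternately over the location and the speed by one-dimensional search mirrors the alternating optimization described above. Note that the delay depends on the location only through the range, so the side of the BS on which the MT lies has to be resolved by the Doppler term.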
III-A2 Nonlinear Tracks
Secondly, we consider a more general case where the track is curved and the MTs move along the track with constant speed. The track is assumed to be a parallel straight line in the first case, but this assumption is not (strictly) true in many cases. To address this issue, we develop a combined data-driven and model-driven approach for parameter estimation. More concretely, the data-driven method is used to fit the nonlinear track, and the model-driven method is used to estimate the parameter set by the ML criterion.
As shown in Fig. 2(b), the track is modeled as an arbitrary projection distance function but follows assumption 4) in Section II-A. Formula (10) holds when the track is linear. When the track is nonlinear, the solution of projected locations derived by (10) is replaced by
(29) 
where the function is defined as
(30) 
where is the first-order derivative with respect to . Even when is known, an analytical solution of (29) cannot be obtained in most cases. Meanwhile, monotonically decreases with respect to . Therefore, the solution of (29) can be derived by binary search in a lookup table.
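The lookup-table idea can be sketched as follows, under the assumption that the quantity tabulated is the arc length travelled along the track, which is monotonic in the projected location. The table resolution, the interpolation step and the function names are illustrative choices, not the definitions in (29) and (30).

```python
import numpy as np

def build_arclength_table(track, x_min, x_max, num=20001):
    """Tabulate the cumulative arc length of y = track(x) on a fine grid."""
    xs = np.linspace(x_min, x_max, num)
    ys = track(xs)
    seg = np.hypot(np.diff(xs), np.diff(ys))       # length of each small segment
    s = np.concatenate([[0.0], np.cumsum(seg)])    # strictly increasing in x
    return xs, s

def location_before(x_final, travelled, xs, s):
    """Projected location that lies `travelled` meters of track before x_final."""
    target = np.interp(x_final, xs, s) - travelled
    i = np.clip(np.searchsorted(s, target), 1, len(s) - 1)  # binary search in the table
    w = (target - s[i - 1]) / (s[i] - s[i - 1])              # interpolate inside the cell
    return xs[i - 1] + w * (xs[i] - xs[i - 1])
```

For example, with a travelled distance equal to the speed multiplied by the elapsed time, `location_before` returns the projected location at an earlier observation instant without solving the nonlinear equation analytically.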
According to the geometric relationship, the initializations of the parameters and are respectively given as follows
(31)  
(32) 
where the function and are respectively derived as follows
(33) 
(34) 
Both formulas (31) and (32) are transcendental equations. Formula (31) can be solved with a numerical solution . When is given, the in formula (34) depends only on the sign of , thus (32) can be easily solved by trying both signs. The expressions of and at projected location can be respectively derived as follows
(35)  
(36) 
It is easy to prove that the Doppler frequency function (21) is a special case of the generalized formula (36). The parameter estimation algorithm with the received pilot signals for a nonlinear track is similar to Algorithm 1, which considers a linear track. We highlight the differences between the linear and nonlinear cases as follows
Apart from these differences, both parameter estimation algorithms (with the received pilot signals and with the measurements) in the nonlinear case are the same as those in the linear case.
III-A3 Function Fitting
The derivations in the nonlinear case are obtained under the condition that the track function is known. In practice, the deterministic function is unknown. Hence, can be learned by some parametric function in a data-driven manner, where is the network parameter set. The labeled data set can be collected offline by geometric measurements, such as aerial photography and satellite photography.
According to the universal approximation theorem, with sufficient hidden computing units, the multilayer perceptron (MLP) can approximate the track with arbitrary accuracy. However, a regular MLP usually has redundant parameters and lacks interpretability. We propose to construct a combination of piecewise linear functions to fit the function , and the parametric function is defined as follows
(37)
where denotes the learnable parameter set, and the rectangular window function is defined as follows
(38) 
and the range is equally divided into pieces where is the BS maximum service radius. The expression of is given as follows
(39) 
where denotes the range interval. Apparently, function is continuous on the support set of . The training is carried out in an SL-enabled approach. The cost function [footnote 2: For the convenience of theoretical analysis, we use the mean absolute error (MAE) as the measure. In practice, the mean square error (MSE) is also feasible.] is defined as
(40) 
where means is observed with additive Gaussian noise. The parameter set is iteratively updated by mini-batch gradient descent (MBGD) until convergence. Because the tracks are deterministic, the deployed learned function does not require any online fine-tuning or periodic update. The corresponding analysis of the expected loss in (40) is presented in Appendix B.
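A continuous piecewise linear fit trained with an MAE cost can be sketched as follows. The sketch parameterizes the function by its values at equally spaced knots, which is one possible realization of a continuous piecewise linear approximator and not necessarily the parameterization in (37) to (39). The initialization at the median, the decaying step size and all names are assumptions.

```python
import numpy as np

def hat_basis(x, knots):
    """Row k holds the linear interpolation weights of sample x[k] on every knot."""
    step = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / step, 0.0, None)

def fit_track(x, y, radius, pieces, epochs=200, batch=64, lr=0.5, seed=0):
    """Mini-batch (sub)gradient descent on the mean absolute error."""
    rng = np.random.default_rng(seed)
    knots = np.linspace(-radius, radius, pieces + 1)
    values = np.full(pieces + 1, np.median(y))      # learnable knot values
    for epoch in range(epochs):
        order = rng.permutation(len(x))
        step = lr / np.sqrt(epoch + 1.0)            # decaying step size
        for start in range(0, len(x), batch):
            idx = order[start:start + batch]
            basis = hat_basis(x[idx], knots)
            err = basis @ values - y[idx]
            values -= step * (basis.T @ np.sign(err)) / len(idx)
    return knots, values

def predict(x, knots, values):
    """Evaluate the fitted piecewise linear function."""
    return np.interp(x, knots, values)
```

The number of learnable parameters equals the number of pieces plus one, and each parameter is the track distance at a known location, which is what gives this kind of approximator its interpretability.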
III-A4 Data Fusion
Ignoring the effects of acceleration, the parameter set is a sufficient statistic for the following beam prediction. In Section III-A we already have two estimated parameter sets, which are derived from different observations independently. According to statistical theory, there exists an optimal estimation from a group of independent observations. Meanwhile, the optimality is only guaranteed when the following assumptions hold true:

The estimated variables follow Gaussian distribution;

The estimations are unbiased;

The variances of estimated variables are known.
Firstly, due to the complexity and randomness of the practical wireless communication scenario, such as imperfect hardware and inaccurate models, the above assumptions cannot hold true and thus the performance gap between the theoretical and practical results is conspicuous. Secondly, there exists a potential mapping function between the projected location and the estimation accuracy. For example, when the MTs are far away from the BS, the estimation variance by the received pilot signals is very large due to high path loss and limited angular resolution, and vice versa. This indicates that using this mapping function can improve the estimation precision.
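For reference, when the three assumptions listed above do hold, the optimal linear combination of independent estimates is the classical inverse-variance weighting, sketched below. The function name is hypothetical.

```python
import numpy as np

def inverse_variance_fusion(estimates, variances):
    """Fuse independent, unbiased Gaussian estimates with known variances."""
    estimates = np.asarray(estimates, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    weights = precisions / precisions.sum()   # more accurate estimates weigh more
    fused = float(weights @ estimates)
    fused_variance = float(1.0 / precisions.sum())
    return fused, fused_variance

# Two location estimates with variances 1 and 3.
fused, fused_var = inverse_variance_fusion([10.0, 14.0], [1.0, 3.0])
```

The fused variance is never larger than the smallest input variance. In practice the variances are unknown and depend on the MT location, which is the gap a learned fusion rule is meant to close.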
Motivated by these two observations, we propose a data-driven data fusion method. To distinguish the estimation results, we mark the parameter set derived from the received pilot signals as , and similarly mark the parameter set derived from the measurements as . As shown in Fig. 3, and are then concatenated as the input of the NN model , where denotes the network parameter set, which is composed of the parameters of the location network and the speed network . The topologies of the two networks are the same, and each network is composed of a weight subnetwork and a bias subnetwork, as illustrated in Table I. The notation 'BN' denotes batch normalization, 'ReLU' denotes the rectified linear unit, and integer is the number of computation units in this layer. The expressions of the two networks are respectively written as
(41)  
(42) 
In principle, the network learns to estimate the variances and offsets of the input estimations implicitly, and assigns the weights and biases for the input estimations accordingly. The proposed data fusion network shares the same principle as the well-studied attention networks [23], which also adjust the weights according to the input features. The output of the network is . Inspired by the model-based estimation method, the output estimations are respectively derived as
(43)  
(44) 
Compared to regular NNs, the proposed data fusion network is lightweight, inherently resistant to overfitting, and has good interpretability.
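A forward pass of such a fusion network can be sketched as follows. The sketch assumes that the fused output is a convex combination of the two input estimates plus a learned offset, with the weight produced by a sigmoid output and the offset by a linear output; this combination rule is an assumption and may differ from (43) and (44). Batch normalization is omitted for brevity, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp(x, params, out_act):
    """Fully connected network: ReLU hidden layers, custom output activation."""
    h = x
    for weight, bias in params[:-1]:
        h = np.maximum(h @ weight + bias, 0.0)
    weight, bias = params[-1]
    return out_act(h @ weight + bias)

def init_params(rng, sizes):
    """Small random weights and zero biases for the given layer sizes."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def fuse_learned(theta_pilot, theta_meas, weight_net, bias_net, component):
    """Fuse one component (0: location, 1: speed) of the two parameter estimates."""
    feats = np.concatenate([theta_pilot, theta_meas])   # both estimates as input
    w = mlp(feats, weight_net, sigmoid)[0]              # weight in (0, 1)
    b = mlp(feats, bias_net, lambda z: z)[0]            # learned offset
    return w * theta_pilot[component] + (1.0 - w) * theta_meas[component] + b
```

Because the weight depends on the input estimates, the network can favor the pilot-based estimate in one location range and the measurement-based estimate in another, which a fixed-weight combination cannot do.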
The training can be realized in an open-loop manner, i.e., the network is trained with prepared labeled data by SL. The training procedure is similar to the function fitting in Section III-A3, and the cost function is defined as
(45) 
where the subscript denotes the labeled data. When some term dominates the overall cost function, theoretically the loss of the other terms can rise. Meanwhile, we have observed that the loss of the other terms grows slowly, even with a small training set. In fact, such domination rarely occurs in practical problems. Therefore, (45) is formulated as a sum. The parameter set is iteratively updated by the MBGD method until convergence. Besides, closed-loop training can be carried out by reinforcement learning.
Layer | Weight subnetwork | Bias subnetwork |
Output layer | sigmoid, | linear, |
BN layer | | |
Hidden layer | ReLU, | ReLU, |
BN layer | | |
Hidden layer | ReLU, | ReLU, |
Input layer | linear, | linear, |
III-B Hybrid Beamforming
In our proposed beam prediction procedure, the high-dimensional beam prediction problem is equivalently transformed into a low-dimensional parameter estimation problem and a cascaded hybrid beamforming problem. In the hybrid beamforming, the hybrid precoders over a future period are predicted from the parameter set . As shown in Fig. 4, the time granularity is , and the number of predicted instants is . Therefore, the period of hybrid beamforming is .
III-B1 Transmitter Analog Precoder and Receiver Combiner
Firstly, we consider a linear track and the predicted projected location of MT at instant is given as
(46) 
The corresponding AoD can be derived by (11). Secondly, for a nonlinear track, the projected location is rewritten as
(47) 
and the corresponding AoD is instead derived by (33). The receiver combiner and the transmitter analog precoder are respectively derived by
(48)  
(49) 
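For a linear track, the prediction of the analog beam index at future instants can be sketched as follows. The sketch extrapolates the projected location with the estimated speed, converts it to an AoD with the assumed convention AoD = arctan(x / distance), and picks the DFT beam whose spatial frequency is closest to that of a half-wavelength ULA response. The codebook indexing and the function name are assumptions, not the rules in (48) and (49).

```python
import numpy as np

def predict_beams(x_final, speed, dist, n_tx, granularity, num_instants):
    """Predicted locations and DFT beam indices at the future instants."""
    k = np.arange(1, num_instants + 1)
    xs = x_final + k * speed * granularity          # uniform-motion extrapolation
    thetas = np.arctan2(xs, dist)
    # DFT beam b has spatial frequency b / n_tx; the ULA response has sin(theta) / 2.
    idx = np.round(n_tx * np.sin(thetas) / 2.0).astype(int) % n_tx
    return xs, idx
```

All future instants are computed in one vectorized call, so no beam measurement or reporting is needed during the prediction period.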
III-B2 Digital Precoder
To simplify the description, we take one instant of beam prediction as an example. The digital precoding matrix is composed of precoders, i.e., . The equivalent low-dimensional channel is , which is obtained from the CSI reference signal (CSI-RS) at the BS. We adopt a classical linear MMSE precoder to derive the transmitter digital precoder as follows
(50) 
where , is a factor to control the BS maximum transmit power.
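A common form of the linear MMSE (regularized zero-forcing) precoder is sketched below. The regularization term, the rows-as-users layout of the equivalent channel and the Frobenius-norm power scaling are standard textbook choices assumed here; they may differ in detail from (50).

```python
import numpy as np

def mmse_precoder(h_eq, noise_var, max_power):
    """Linear MMSE digital precoder for a K x N_RF equivalent channel."""
    num_users = h_eq.shape[0]
    reg = num_users * noise_var / max_power
    gram = h_eq @ h_eq.conj().T + reg * np.eye(num_users)
    f = h_eq.conj().T @ np.linalg.inv(gram)
    # Scale so that the total transmit power equals max_power.
    return np.sqrt(max_power) / np.linalg.norm(f) * f
```

As the noise variance tends to zero, the regularization vanishes and the precoder approaches zero forcing, i.e., the effective channel becomes a scaled identity and inter-user interference is removed.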
III-B3 Hybrid Precoding
The complete hybrid beamforming procedure with a linear track is given in Algorithm 3. The procedure with a nonlinear track is similar to Algorithm 3. We highlight the differences between the linear and nonlinear cases as follows
All the other steps are the same as those in the linear case, which yields the hybrid beamforming procedure in the nonlinear case. The hybrid precoders of different predicted instants can be computed in parallel.
III-C Implementation
For a nonlinear track, the nonlinear mapping module and the data fusion module are assumed to be trained offline and fine-tuned online. Finally, we summarize the implementation procedure of beam prediction in Table II.
Initialization: 
The nonlinear mapping module and the data fusion module. 
Observation process: 
The BS transmits pilot signals to the MT. 
The MT estimates the Doppler frequencies and ToAs, then feeds back the received pilot signals, Doppler frequencies and ToAs to the BS. 
The BS derives the final estimation result from the feedback. 
Hybrid beamforming process: 
The BS predicts the BS analog precoder and MT combiner with the final estimation result. 
The BS transmits the MT combiner to the MT. 
The BS transmits the data signals to the MTs with hybrid beamforming, and the MTs receive the signals with combiners. 
IV Simulation Results
IV-A System Configurations
In this section, we present the simulation results to demonstrate the performance of the proposed learning-aided beam prediction scheme. Generally, the simulated mmWave channel in HSR is modeled as UMa LoS in TR , and the wireless communication configurations are listed in Table III. The BS has sectors and each sector covers range. Each MT has panels (left, back, right). The speeds of the MTs on board are modeled to follow a Laplacian distribution, and the acceleration is also considered as an uncertain factor. Besides, the geometry of the established HSR scenario and the setting of the training beams/measurements [18, 1, 29] are given in Table III. The BA/T is regarded as the benchmark and only the horizontal beam alignment is considered. The BS/MT tracks 3 Tx/Rx beams (current, left and right) in each BA/T with period being .
Name  Value  Name  Value  Name  Value 
Scenario  UMa LoS  MT speed variance  Carrier Frequency  
MT acceleration variance  Bandwidth  HSR speed  
Noise power spectral density  BS antenna number  BS maximum transmit power  
MT antenna number  Half inter site distance  prediction time duration  
Minimum BS to MT Distance  Prediction time granularity  Integral time  
Observation period  Residual carrier frequency ratio  Observation times 
IV-B Data Fusion
IV-B1 Linear Tracks
As shown in Fig. 5(a), the location estimation accuracy by measurements is high when the MTs are far away from the BS, while the accuracy is low when the MTs are located adjacent to the BS. This phenomenon is caused by the lack of prior knowledge on the MT speed. The communication delay contains the information of the distance between the BS and the MT; whether the MT is located on the right or left side of the BS, however, cannot be inferred from the delay. Besides, we use the sign of the Doppler frequency to discriminate the MT speed direction in (25). However, the estimation performance cannot be improved, especially when the measured Doppler frequency is very noisy or the projected speed component is sharply reduced. Therefore, as shown in Figs. 5(a) and 5(b), neither the location nor the speed estimation can be accurate in this range.
Meanwhile, the estimations derived from the received pilot signals become more accurate when the BS to MT distance is reduced, because the corresponding path loss is reduced and the SNR of the signals is increased. Besides, the AoD of the MT is also easy to distinguish in this range. When the MT moves far away from the BS, the variances of estimation sharply increase [footnote 3: The illustrated MSE curves are capped at a maximum value of .].
Generally, the estimations by measurements are more accurate than those by received pilot signals when the MTs are away from the BS, whereas the estimations by received pilot signals are more accurate than those by measurements when the MTs are around the BS. In a data-driven manner, our proposed data fusion method has the highest accuracy with respect to the projected location , in the estimations of both location and speed. The validity of the proposed method is verified by the simulation results, which also indicate that the function has learned a weight function with respect to .
IV-B2 Nonlinear Tracks
In this part, we consider a more complex case where the track is modeled as a nonlinear function, namely . Firstly, we demonstrate the estimation results with the nonlinear track in Figs. 6(a) and 6(b). Comparing Fig. 5(a) with Fig. 6(a) and Fig. 5(b) with Fig. 6(b), we find that the trends of estimation variance with linear and nonlinear tracks are similar.
Using the linear estimation algorithms described in Algorithms 1 and 2, the estimation results with the nonlinear track are shown in Figs. 6(a) and 6(b). We compare the estimations with and without nonlinear correction and find that the estimation performance is significantly improved with nonlinear correction, which verifies the effectiveness of our proposed nonlinear mapping module. We also notice that the data-driven method is significantly better than the two primary estimators in Figs. 6(a) and 6(b). This is mainly because the estimators become biased due to the model mismatch, and the data fusion module can reduce these biases and improve the estimation performance to some extent. Additionally, our proposed data fusion method also has the highest accuracy when the track is nonlinear.
IV-C Nonlinear Mapping
To verify the effectiveness of our proposed piecewise function, we consider several regular regressors for comparison, i.e., random forest (RF) with decision trees, support vector machine (SVM) with linear kernels, second-order polynomial regression (Poly), and a two-hidden-layer MLP where each layer has neurons. The simulation results are given in Table IV. The RF and SVM cannot perform well. The Poly is feasible, and its model complexity is very low (only three parameters). However, the polynomial order must be known; otherwise a low-order or a high-order Poly performs badly. The MLP is also feasible, but at the cost of huge model and computation complexities. Our proposed method achieves the best tradeoff between fitting precision and model/computation complexity, and does not require any priors. Thus, we claim interpretability and simplicity.
Regressor | RF | SVM | Poly | MLP | proposed |
MAE
Optimal  Data fusion  Measurements  Pilot signals  
Spectral efficiency (bps/Hz)  
Beam prediction accuracy 
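
The sensitivity of Poly to its order, noted in the regressor comparison above, can be illustrated with a minimal sketch. It does not reproduce the simulation behind Table IV: the quadratic track, its coefficients, the sampling grid, and the helper names `polyfit_ls`, `polyval`, and `mae` are all hypothetical and chosen only for illustration.

```python
def polyfit_ls(xs, ys, order):
    """Least-squares polynomial coefficients [c0, c1, ..., c_order]."""
    m = order + 1
    # Normal equations (V^T V) c = V^T y with Vandermonde rows [1, x, x^2, ...].
    gram = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    rhs = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [b] for row, b in zip(gram, rhs)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def mae(coeffs, xs, ys):
    """Mean absolute error of the fitted polynomial on (xs, ys)."""
    return sum(abs(polyval(coeffs, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical noiseless quadratic track sampled on [0, 1] (assumption).
xs = [i / 20 for i in range(21)]
ys = [0.3 + 0.5 * x + 2.0 * x * x for x in xs]

quadratic_coeffs = polyfit_ls(xs, ys, 2)   # matched order: three parameters
mae_linear = mae(polyfit_ls(xs, ys, 1), xs, ys)
mae_quadratic = mae(quadratic_coeffs, xs, ys)
```

With the matched second order the fit is essentially exact on this toy track, while the first-order fit leaves a clearly nonzero MAE, which is the low-order failure mode described in the text.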
IV-D Spectral Efficiency and Beam Prediction Accuracy
The results of spectral efficiency (SE) and beam prediction accuracy are listed in Table V. Since the performance with linear and nonlinear tracks is highly similar, the results averaged over both tracks are reported. Due to the influence of acceleration, misalignment of the optimal scheme occurs with a probability of , and the optimal accuracy is . The proposed data fusion method outperforms the methods based on measurements and on received pilot signals, in terms of both SE and beam prediction accuracy. We also notice that the SE and beam prediction accuracy of the proposed method are close to the optimal values.
IV-E Overhead and Throughput
Considering MT-specific downlink and uplink overheads, the overheads of BA/T and of the proposed beam prediction grow linearly with the number of MTs [6]. We define the overhead cost ratio as the proportion of time-frequency resources occupied by overhead. A real number is quantized with bits. For a time-division duplex system, the proposed beam prediction and the baseline are compared in Table VI. Compared with BA/T, the proposed beam prediction consumes near-zero overhead, and thus its effective throughput (after deducting the downlink/uplink overhead) is higher when the number of MTs is greater than . Additionally, the delay loss in BA/T is about , while the proposed beam prediction has zero delay. The simulation results validate the effectiveness of the proposed scheme.
MT number  
Overhead cost ratio  BA/T  
Overhead cost ratio  Beam prediction  
Mean effective throughput (Mbps)  BA/T  
Mean effective throughput (Mbps)  Beam prediction  
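
The relation between the overhead cost ratio and the effective throughput can be sketched as follows. This is only an illustration under stated assumptions: the effective throughput is modeled as the raw throughput scaled by one minus the overhead cost ratio, the overhead is taken to be exactly proportional to the number of MTs, and every number and function name below is hypothetical rather than taken from Table VI.

```python
def overhead_cost_ratio(num_mts, per_mt_overhead, total_resources):
    """Fraction of time-frequency resources spent on overhead (capped at 1)."""
    return min(1.0, num_mts * per_mt_overhead / total_resources)


def effective_throughput(raw_mbps, num_mts, per_mt_overhead, total_resources):
    """Raw throughput after deducting the overhead share (assumed model)."""
    ratio = overhead_cost_ratio(num_mts, per_mt_overhead, total_resources)
    return raw_mbps * (1.0 - ratio)


def crossover_mt_number(max_mts, baseline, prediction, total_resources):
    """Smallest MT number at which prediction beats the baseline, or None.

    `baseline` and `prediction` are (raw_mbps, per_mt_overhead) pairs.
    """
    for n in range(1, max_mts + 1):
        base = effective_throughput(baseline[0], n, baseline[1], total_resources)
        pred = effective_throughput(prediction[0], n, prediction[1], total_resources)
        if pred > base:
            return n
    return None


# Hypothetical values: BA/T has a higher raw rate but spends 2 of 100
# resource units per MT; beam prediction spends none.
bat = (1000.0, 2.0)
beam_prediction = (950.0, 0.0)
crossover = crossover_mt_number(20, bat, beam_prediction, total_resources=100.0)
```

In this toy setting the baseline wins for one or two MTs and the overhead-free scheme wins from three MTs onward, mirroring the qualitative crossover described above.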
Furthermore, the effective throughput of MTs is illustrated in Fig. 8. There, the edge MT on the left side denotes the lowest MT throughput, ile denotes the highest , and the middle is the average throughput. Generally, the mean and ile throughputs of the prediction-based methods are improved by about compared with the baseline BA/T. We also notice that the cell-edge MTs with data fusion achieve the highest throughput, outperforming those with pilot signals and measurements.
V Conclusions
In an HSR scenario, we investigated beam prediction, which was transformed into parameter estimation followed by cascaded hybrid beamforming. Based on the ML criterion, parameter estimation from the received pilot signals and from the measurements was carried out by the coordinate descent method, and a data fusion module was proposed to further improve the estimation accuracy and robustness. The future beam directions and channel amplitudes were then predicted for hybrid beamforming. In addition, a learnable nonlinear mapping module was adopted for HSR scenarios with nonlinear tracks. The simulation results showed that the proposed beam prediction scheme with learnable model-based modules outperformed the schemes without data fusion or nonlinear mapping, in terms of effective throughput and alignment rate.
In our future work, we will consider integrated communications and sensing, as well as learnable prior information to further improve the beam prediction performance.
VI Acknowledgments
We would like to thank Dr. Hengtao He and Bo Gao for valuable discussions.
Appendix A Analysis on the Speed and Acceleration Estimation
Suppose that the speed and acceleration of an MT are and , respectively. For a linear track, according to (10), the projected locations are , where denotes the number of observations. We have the following well-posed or overdetermined equation:
(51)  
(52) 
where and , , and denotes the time interval. Multiplying both sides of (51) by , we have the following equation:
(53) 
The sums of the first, second, third, and fourth powers are given by
(54)  
(55)  
(56)  
(57) 
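
For reference, the four power sums named above have the following standard closed forms. The upper limit $N$ and the index range $k = 1, \dots, N$ are assumptions made for illustration; the indexing used in (54)-(57) may differ.

```latex
\begin{align*}
\sum_{k=1}^{N} k   &= \frac{N(N+1)}{2}, \\
\sum_{k=1}^{N} k^2 &= \frac{N(N+1)(2N+1)}{6}, \\
\sum_{k=1}^{N} k^3 &= \frac{N^2(N+1)^2}{4}, \\
\sum_{k=1}^{N} k^4 &= \frac{N(N+1)(2N+1)(3N^2+3N-1)}{30}.
\end{align*}
```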
Using (54)-(57) to solve (51), we obtain
(58)  
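
The least-squares estimation of speed and acceleration can be sketched numerically as follows. This is a minimal illustration, not the closed form in (58): it assumes that the displacement from a known starting point follows d_k = v(kT) + a(kT)^2/2 at uniformly spaced times kT, k = 1, ..., N, and the function names `power_sums` and `estimate_speed_acceleration` are hypothetical.

```python
def power_sums(n):
    """Closed-form sums of k, k^2, k^3, k^4 over k = 1..n."""
    s1 = n * (n + 1) // 2
    s2 = n * (n + 1) * (2 * n + 1) // 6
    s3 = (n * (n + 1) // 2) ** 2
    s4 = n * (n + 1) * (2 * n + 1) * (3 * n * n + 3 * n - 1) // 30
    return s1, s2, s3, s4


def estimate_speed_acceleration(displacements, dt):
    """Least-squares fit of d_k = v*(k*dt) + 0.5*a*(k*dt)**2 for k = 1..n.

    The model (known start, constant acceleration) is an assumption.
    """
    n = len(displacements)
    if n < 2:
        raise ValueError("need at least two observations")
    _, s2, s3, s4 = power_sums(n)
    # Gram matrix A^T A for rows A_k = [k*dt, 0.5*(k*dt)**2],
    # written with the closed-form power sums.
    g11 = dt ** 2 * s2
    g12 = 0.5 * dt ** 3 * s3
    g22 = 0.25 * dt ** 4 * s4
    # Right-hand side A^T d.
    b1 = sum(k * dt * d for k, d in enumerate(displacements, start=1))
    b2 = sum(0.5 * (k * dt) ** 2 * d
             for k, d in enumerate(displacements, start=1))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = g11 * g22 - g12 * g12
    speed = (g22 * b1 - g12 * b2) / det
    accel = (g11 * b2 - g12 * b1) / det
    return speed, accel


# Hypothetical noiseless example: 80 m/s, 0.5 m/s^2, 10 ms sampling.
true_v, true_a, dt = 80.0, 0.5, 0.01
obs = [true_v * k * dt + 0.5 * true_a * (k * dt) ** 2 for k in range(1, 51)]
est_v, est_a = estimate_speed_acceleration(obs, dt)
```

With two observations the system is well-posed and with more it is overdetermined, so the same routine covers both cases; with noisy observations the estimates would scatter around the true values.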