A fundamental bottleneck in operating large-dimensional millimeter wave (mmWave) array antenna systems is how to accurately align beams between the transmitter and receiver with low latency [1, 2]. Searching the entire beam space with directional narrow beams (also called exhaustive beam search) is extremely time-consuming; nevertheless, exhaustive beam search has been used in existing mmWave WiFi standards, including IEEE 802.15.3c [3] and IEEE 802.11ad [4]. To reduce the beam alignment overhead, hierarchical codebooks [2, 5], compressed sensing-based algorithms [6, 7], overlapped beam patterns [8], and beam coding [9] have been proposed over the years, establishing a “structured beam alignment” paradigm. Despite a plethora of such beam alignment methods, the overhead remains a critical challenge in mmWave communications.
Recently, the beam alignment problem has been approached from a statistical machine learning point of view [10], with a primary focus on an application of the Kolmogorov model (KM) [11]. In [10], Kolmogorov elementary representations (KERs) of the received signal power values that are associated with the beam pairs in a training beam codebook are learned by solving a constrained error minimization problem. The KERs of unsounded beam pairs are then predicted by exploiting the predictive power of the KM, leading to a significantly reduced beam alignment overhead. However, conventional KM learning has two fundamental limitations in the beam alignment context. First, the computational complexity of the KM training algorithm in [10, 11] is prohibitively high; it does not scale with the number of antennas and the size of the codebooks. Second, the initial work in [10] centers on a frequency estimation (FE) method to estimate empirical probabilities of the training set, which has to rely on a threshold setting for hypothesis testing; the threshold value is treated as a hyper-parameter, which is determined based on numerical simulations. Ultimately, the desired threshold setting must account for a specific performance criterion so as to improve the predictive power of KM.
In fact, in mmWave-based systems, quality of service is primarily dominated by latency [12]. In particular, the requirements of low latency and low overhead are perhaps even more critical than those for high throughput. Motivated by this, we propose an enhancement to mmWave multiple-input multiple-output (MIMO) beam alignment by leveraging discrete monotonic optimization (DMO) frameworks [13, 14], leading to significantly reduced computational complexity compared to the previous KM learning [10]. We also propose a new thresholding approach to obtaining empirical probabilities of the training set, which improves the performance of hypothesis testing for the FE of KM. Our approach is based on the Kolmogorov-Smirnov (KS) test criterion [15, 16], which is desirable because it can set a detection threshold without access to a priori knowledge.
The remainder of the paper is organized as follows. In Section II, we introduce the system model and briefly review the related work on KM-based beam alignment. In Section III, we propose the DMO algorithm to solve the KM learning optimization problem and provide a new method for building the empirical training statistics via the KS test. In Section IV, simulation results are presented to illustrate the superior performance of the proposed algorithm. Finally, we conclude the paper in Section V.
II System Model and Previous Work
We present the beam alignment system model and provide an overview of the previous work under consideration.
II-A System Model
Suppose a point-to-point mmWave MIMO system where an independent block fading channel with a coherence block length (channel uses) is assumed. The transmitter and receiver are equipped with and antennas, respectively. For simplicity, we adopt a low-complexity architecture where only one radio-frequency (RF) chain is employed at both the transmitter and receiver sides.
During a coherence block , the transmitter and receiver intend to spend () channel uses to align the best transmit and receive beam pair for data transmission. To be specific, the transmitter and receiver choose an analog beamformer and combiner from the pre-designed beam sounding codebooks and such that and , respectively. We denote the index sets of and as and , respectively, with cardinalities and . Assume that and are unit-norm, i.e., . The received signal associated with the beam pair is therefore given by
where is the channel matrix and is the training symbol satisfying .
is the additive complex white Gaussian noise vector with each entry independently and identically distributed (i.i.d.) as zero mean and variance according to .
is the effective additive noise, and thus, the signal-to-noise ratio (SNR) is.
Exhaustive beam alignment (beam sounding) is a widely used method: the transmitter and receiver jointly sound all the beams in and to find the optimal beam pair that maximizes the received signal power
In fact, the training overhead for the exhaustive method is . Since the size of the codebooks and is large in mmWave cellular networks, the drastic training overhead of exhaustive beam alignment overwhelms the available coherent channel resources. To tackle this issue, a learning-based approach, KM, was proposed to reduce the beam alignment overhead while maintaining appreciable beam alignment performance [10].
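The exhaustive procedure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the DFT codebooks, the noise-free power measurement, the rank-one test channel, and all function names are assumptions made for illustration.

```python
import cmath
import math


def dft_codebook(n):
    """Unit-norm DFT beams (an assumed codebook); beam k has spatial frequency k/n."""
    return [[cmath.exp(2j * math.pi * k * m / n) / math.sqrt(n) for m in range(n)]
            for k in range(n)]


def received_power(w, H, f):
    """Noise-free |w^H H f|^2 for combiner w, channel H (rows = receive antennas), beamformer f."""
    Hf = [sum(H[r][t] * f[t] for t in range(len(f))) for r in range(len(w))]
    return abs(sum(w[r].conjugate() * Hf[r] for r in range(len(w)))) ** 2


def exhaustive_search(H, F, W):
    """Sound every (transmit, receive) beam pair; the overhead is len(F) * len(W) soundings."""
    best, best_power = None, -1.0
    for i, f in enumerate(F):
        for j, w in enumerate(W):
            p = received_power(w, H, f)
            if p > best_power:
                best, best_power = (i, j), p
    return best, best_power


# Hypothetical rank-one channel aligned with transmit beam 1 and receive beam 2.
F = dft_codebook(4)
W = dft_codebook(4)
H = [[W[2][r] * F[1][t].conjugate() for t in range(4)] for r in range(4)]
best_pair, best_power = exhaustive_search(H, F, W)
```

The nested loop makes the cost explicit: the number of soundings grows with the product of the two codebook sizes, which is the overhead that the learning-based approach aims to avoid.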
II-B Previous Work: KM-Based Beam Alignment
A binary random variable is introduced to indicate the “good” and “poor” quality of the beam pair for as
where denotes the probability of the event , is a pre-designed threshold value for the received signal power. We say that the beam pair has a “good” SNR, if . Because , it suffices to focus on the case when . The -dimensional KER of is then defined by 
where the probability mass function vector is on the unit probability simplex , i.e., and , is the all-one vector with dimension , and denotes the binary indicator vector of dimension such that the th entry of is .
The beam alignment using KM relies on the subsampled codebooks with index sets and , such that and , and have much smaller sizes, and . We let the empirical probability that beam pair has a “good” SNR be . In [10], an FE method was proposed to build the training set of empirical probabilities of beam pairs in the subsampled codebooks for the KM learning algorithm, i.e., , . Given the FE interval , the estimate of at time-slot , i.e., , is provided by
where is the received signal power obtained by sounding the beam pair at time-slot and denotes the indicator function. The best FE estimate comes from , which is carried out at the end of the estimation interval.
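As a rough illustration of the FE step, the sketch below computes the running fraction of soundings whose received power reaches the threshold. The function name, the sample values, and the use of a non-strict comparison are assumptions, since the original expression is not reproduced here.

```python
def frequency_estimate(powers, threshold):
    """Running estimate: fraction of soundings so far whose power reached the threshold."""
    estimates, hits = [], 0
    for t, p in enumerate(powers, start=1):
        hits += 1 if p >= threshold else 0  # indicator of a "good" sounding (assumed >=)
        estimates.append(hits / t)
    return estimates


# Hypothetical received powers of one beam pair over four time-slots.
fe = frequency_estimate([0.2, 1.5, 0.9, 2.0], threshold=1.0)
final_estimate = fe[-1]  # the estimate at the end of the interval is the one used for training
```

The sketch also shows why the threshold matters: every entry of the training set depends on it directly, so a poorly chosen value biases all empirical probabilities at once.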
Once the training set (of empirical probabilities) is constructed, the KM learning algorithm proceeds to optimize the KM parameter vectors and by solving the constrained error minimization problem:
In order to handle the coupled non-convex combinatorial optimization in (4), a block-coordinate descent (BCD) method [10, 11] was proposed by dividing the problem in (4) into two subproblems: (i) a linearly-constrained quadratic program (LCQP):
where , , and , and (ii) binary quadratic program (BQP):
where , and . The KM solves the two subproblems in (5) and (6) in an alternating manner and iteratively refines the KM parameters and . More specifically, by exploiting the fact that the optimization in (5) is carried out over the unit probability simplex, a simple iterative Frank-Wolfe (FW) algorithm [17] was proposed to optimally solve (5), while the semi-definite relaxation with randomization (SDRwR) was employed to optimally solve (6) asymptotically in .
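To make the FW step concrete, the following sketch minimizes a generic convex quadratic over the unit probability simplex. The objective form, the names `Q` and `c`, the uniform starting point, and the 2/(k+2) step size are assumptions chosen for illustration; they stand in for the LCQP data, which is not reproduced here.

```python
def frank_wolfe_simplex(Q, c, iters=2000):
    """Minimize theta^T Q theta - 2 c^T theta over the simplex (Q symmetric PSD, assumed)."""
    d = len(c)
    theta = [1.0 / d] * d  # start at the barycenter of the simplex
    for k in range(iters):
        # Gradient of the quadratic objective for symmetric Q.
        grad = [2.0 * (sum(Q[i][j] * theta[j] for j in range(d)) - c[i])
                for i in range(d)]
        # The linear subproblem over the simplex is solved by a single vertex e_s.
        s = min(range(d), key=grad.__getitem__)
        step = 2.0 / (k + 2.0)
        theta = [(1.0 - step) * t for t in theta]
        theta[s] += step  # convex combination keeps theta on the simplex
    return theta


# Hypothetical instance whose minimizer is c itself, since c lies on the simplex.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta_hat = frank_wolfe_simplex(identity, [0.6, 0.4, 0.0])
```

Because each iterate is a convex combination of simplex vertices, no projection is needed, which is what makes FW attractive for the simplex-constrained subproblem.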
We let be the learned KM parameters to the problem in (4). The predictive power of KM is exploited to infer the probabilities of the test set (i.e., beam pairs which are not sounded) as
Finally, the optimal beam pair with the highest probability of having a “good” SNR is selected by evaluating both the training and test sets as
A diagram of the KM-based beam alignment, which conceptually visualizes the system model and the framework, can be found in Fig. 1.
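The prediction and selection steps can be sketched as follows, assuming that the learned KM parameters consist of one probability vector on the simplex per transmit beam and one binary indicator vector per receive beam, and that a pair's probability is their inner product. Using the empirical probability for sounded pairs and the KM prediction otherwise is also an assumption of this sketch, as are all names.

```python
def km_probability(theta, psi):
    """Inner product of a simplex vector and a binary indicator vector."""
    return sum(t * s for t, s in zip(theta, psi))


def select_beam_pair(thetas, psis, empirical):
    """Pick the pair with the highest probability of a "good" SNR.

    empirical maps sounded pairs (u, i) to their trained empirical probabilities;
    unsounded pairs fall back to the KM prediction.
    """
    best_pair, best_prob = None, -1.0
    for u, theta in enumerate(thetas):
        for i, psi in enumerate(psis):
            prob = empirical.get((u, i), km_probability(theta, psi))
            if prob > best_prob:
                best_pair, best_prob = (u, i), prob
    return best_pair, best_prob


# Hypothetical learned parameters for two transmit and two receive beams.
thetas = [[0.7, 0.3], [0.1, 0.9]]
psis = [[1, 0], [0, 1]]
chosen_pair, chosen_prob = select_beam_pair(thetas, psis, empirical={(0, 0): 0.65})
```

Only the pair (0, 0) is sounded in this toy setup; the other three probabilities come from the model, which is where the overhead reduction originates.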
II-B1 Desired Attributes of KM
There are three main advantages of KM that make it superior to other data representations such as matrix factorization (MF) [19], SVD-based representations [20], and nonnegative MF [21]: (i) the fact that the KM in (2) represents an actual probability is exploited to model the quality of beam pairs in terms of SNR, (ii) KM offers improved prediction performance over nonnegative MF, and (iii) the KM in (2) is interpretable, namely, it provides insight into the data, which is not possible with black-box learning methods.
II-B2 Main Contribution of This Work
While the SDRwR method for solving (6) is asymptotically optimal [10, 11], it demands a huge computational cost and thus violates the low-latency requirement of mmWave communications [12]. Moreover, the lack of an appropriate threshold design criterion for the FE method in [10] limits the beam alignment performance of the KM-based approach. To address these limitations, we first propose an enhanced KM learning algorithm for beam alignment by leveraging DMO. A novel empirical probability estimation method based on the KS test is then provided with a proper threshold selection criterion. The proposed algorithm exhibits better beam alignment performance with a significantly reduced computational time compared to the existing work.
III Proposed Algorithm
To reduce the prohibitively high computational cost of SDRwR, in this section, a DMO framework is proposed. Moreover, a new method based on the KS test is presented.
III-A Discrete Monotonic Optimization
Prior to delivering the proposed algorithm, we provide a lemma showing an equivalent reformulation of the problem in (6).
Given the definition of and in (9), the objective function in (9) is attained by transforming the minimization to the maximization and discarding the constant in (6). Also, and are both increasing functions with respect to because and is a positive semi-definite matrix. The binary constraints , , can be equivalently rewritten as , , , i.e., , in (9), where and are increasing on . This completes the proof.
The BQP problem in (6) cannot be directly handled due to the discrete constraints. In [10], this difficulty was tackled by using SDRwR, which incurs impractical computational complexity. Unlike SDRwR, the equivalent problem formulation in (9), which leverages the difference of monotonic functions (DMF), removes the intractable discrete constraints without any relaxation. Motivated by Lemma 1, we propose to use a branch-reduce-and-bound (BRB) approach [13] to directly solve (9) without any relaxation and/or randomization. As will be seen in Fig. 2 in Section IV, the proposed DMO algorithm can substantially reduce the computational complexity (two-orders-of-magnitude improvement in time complexity). We introduce the following three main steps at each iteration in the proposed DMO algorithm, where the overall procedure is presented in detail in Algorithm 1.
We let be one of the boxes that contain feasible solutions to (9) and be the current maximum value of the objective function in (9). The reduced box can be defined by new lower and upper vertices and , respectively, without excluding any feasible solution , while maintaining  as
where and for , where is the th column of the -dimensional identity matrix. Note that the optimal values of and can be found by referring to the compactness of and utilizing the monotonicity of , , , and (for instance, by using a bisection method) [13].
For every reduced box , an upper bound of is calculated such that
The upper bound in (12) holds because and are monotonically increasing functions. Furthermore, ensure , where stands for any infinite nested sequence of boxes and is the optimal solution to (9). At each iteration, any box with is deleted because such a box does not contain anymore.
At the end of each iteration, the box with the maximum upper bound, denoted by , is selected and branched to accelerate the convergence of the algorithm. The box is divided into two boxes
where , , and represent the element-wise floor and ceiling operations, respectively.
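The bounding and branching steps can be sketched in a simplified form. The code below maximizes a difference of two increasing functions over binary vectors using only the monotonicity-based upper bound and coordinate-wise branching; it omits the reduction step entirely, so it is a plain branch-and-bound sketch rather than the full BRB procedure. The example functions (a nonnegative linear term minus a quadratic form with an entrywise nonnegative matrix) and all names are assumptions.

```python
import heapq
from itertools import count, product


def brb_maximize(f, g, dim, tol=1e-12):
    """Maximize f(x) - g(x) over x in {0,1}^dim, assuming f and g are increasing."""
    tie = count()  # tie-breaker so the heap never has to compare boxes
    lo, hi = (0,) * dim, (1,) * dim
    best_x, best_val = lo, f(lo) - g(lo)
    heap = [(-(f(hi) - g(lo)), next(tie), lo, hi)]  # max-heap keyed on the upper bound
    while heap:
        neg_ub, _, lo, hi = heapq.heappop(heap)
        if -neg_ub <= best_val + tol:
            break  # the largest remaining bound cannot beat the incumbent
        i = next(k for k in range(dim) if lo[k] < hi[k])  # first free coordinate
        children = (
            (lo, hi[:i] + (0,) + hi[i + 1:]),  # floor side: x_i fixed to 0
            (lo[:i] + (1,) + lo[i + 1:], hi),  # ceiling side: x_i fixed to 1
        )
        for child_lo, child_hi in children:
            for v in (child_lo, child_hi):  # both vertices are feasible points
                val = f(v) - g(v)
                if val > best_val:
                    best_x, best_val = v, val
            ub = f(child_hi) - g(child_lo)  # valid because f and g are increasing
            if child_lo != child_hi and ub > best_val + tol:
                heapq.heappush(heap, (-ub, next(tie), child_lo, child_hi))
    return best_x, best_val


# Hypothetical instance: f is linear with nonnegative weights, g is a quadratic
# form with an entrywise nonnegative matrix, so both are increasing on the box.
S = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.1], [0.2, 0.1, 1.0]]
c = [0.9, 0.2, 0.6]
f = lambda x: 2.0 * sum(ci * xi for ci, xi in zip(c, x))
g = lambda x: sum(S[i][j] * x[i] * x[j] for i in range(3) for j in range(3))
x_star, val_star = brb_maximize(f, g, 3)
brute_val = max(f(x) - g(x) for x in product((0, 1), repeat=3))
```

Boxes whose upper bound does not exceed the incumbent are never stored, which mirrors the deletion rule described above; the reduction step would tighten the boxes further before bounding.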
III-B Kolmogorov-Smirnov Test
The choice of in (3) has a profound impact on the beam alignment performance of the KM-based approach. The threshold value has been chosen subjectively based on numerical simulations [10], and the resulting value can vary substantially with the channel conditions and operating SNR. An appropriate selection criterion is lacking, due in part to the fact that the statistics of are unknown in practice. We overcome this difficulty by proposing, in this subsection, to estimate the trained empirical probabilities by applying the detection-theoretic criterion for threshold setting introduced by Kolmogorov and Smirnov [23, 15].
We first define the binary hypotheses of a beam pair , according to the signal model in (1) as
where the null hypothesis is declared when relies on noise only and the alternative hypothesis is true when is a function of both the signal and noise. While, under , given , the theoretical cumulative distribution function (CDF) of is given by
the test statistic under is unknown. To circumvent this difficulty, the KS test forms the empirical CDF of from the observed data samples ,
where denotes the number of the data samples in the KS test, which is distinguished from the time interval in (3).
The KS criterion to estimate the best sample point is given by
The binary hypothesis test is then , where is the KS threshold value. Similar to the conventional Neyman-Pearson test, the threshold is chosen to meet the target false alarm rate such that
where the last step is due to the Kolmogorov approximation [24]. The approximation becomes tight as grows large, so that the KS threshold can be determined by . Finally, similar to (3), the KS-estimated empirical probability at time-slot for any beam index pair is given by
where is the detection statistic obtained by (15) at time-slot and denotes the KS estimation interval.
The key implications of the KS criterion in (15) are threefold: (i) the maximum value converges to almost surely when tends to infinity if the data samples follow the distribution , (ii) the distribution of does not depend on the underlying CDF being tested, and (iii) the maximum difference between the CDFs stands for a jump/concentration in probability and is thus more representative of the difference between distributions than other statistics such as the minimum and median.
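The KS-based detection step can be sketched as follows. The sketch uses the standard two-sided KS statistic and the standard Kolmogorov limiting series to map a target false alarm rate to a threshold; the exponential noise-only CDF (which holds for the squared magnitude of zero-mean complex Gaussian noise), the bisection bracket, and all function names are assumptions rather than the paper's exact expressions.

```python
import math


def ks_statistic(samples, cdf0):
    """Largest gap between the empirical CDF of the samples and the reference CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f0 = cdf0(x)
        # The empirical CDF jumps from (k-1)/n to k/n at x; check both sides.
        d = max(d, abs(k / n - f0), abs((k - 1) / n - f0))
    return d


def kolmogorov_tail(z, terms=100):
    """Approximate P(sqrt(N) * D > z) under the null via the Kolmogorov series."""
    if z <= 0:
        return 1.0
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                     for k in range(1, terms + 1))


def ks_threshold(alpha, n):
    """Threshold on D giving an approximate false alarm rate alpha with n samples."""
    lo, hi = 0.2, 5.0  # assumed bracket for z = sqrt(n) * threshold
    for _ in range(100):  # bisection on the decreasing tail function
        mid = 0.5 * (lo + hi)
        if kolmogorov_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / math.sqrt(n)


def noise_only_cdf(x, sigma2=1.0):
    """Assumed null CDF: exponential with mean sigma2."""
    return 1.0 - math.exp(-x / sigma2) if x > 0 else 0.0


def signal_present(samples, alpha, sigma2=1.0):
    """Declare the alternative hypothesis when the KS statistic exceeds the threshold."""
    d = ks_statistic(samples, lambda x: noise_only_cdf(x, sigma2))
    return d > ks_threshold(alpha, len(samples))


d_uniform = ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0))
z_5_percent = ks_threshold(0.05, 100) * math.sqrt(100)
```

Averaging the outcomes of `signal_present` over the estimation interval would then play the role that the thresholded indicator plays in the FE method, with the false alarm rate replacing the hand-tuned power threshold.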
IV Simulation Results
In this section, we provide the numerical results of the proposed beam alignment approach in mmWave MIMO channels. We adopt the physical representation of sparse mmWave MIMO channels [5, 1] and assume that the rank of the channel matrix is . We set , , and throughout the simulation. The sampling rate, defined as the ratio of the number of beam pairs in the subsampled training codebook to the total number of the beam pairs in the original codebook, is given by . We obtain the numerical results by conducting Monte Carlo simulations.
In Fig. 2, the average time (in seconds) consumed to execute Algorithm 2 (i.e., the proposed KM learning with DMO) is compared with that of the conventional KM learning with SDRwR (i.e., [10, Algorithm 1]) for and , respectively. We measure the running time using the “cputime” function in MATLAB. We set the target false alarm rate for the KS test in Algorithm 2 and dB for the FE in [10, Algorithm 1] to obtain the empirical probabilities for the training set. We further assume here. It is clear from Fig. 2 that the proposed Algorithm 2 substantially reduces the computation time compared to the conventional KM learning with SDRwR [10, Algorithm 1]; more than times of improvement is observed.
In Fig. 3, the average beamforming gains of the conventional KM learning algorithm [10, Algorithm 1] and the proposed Algorithm 2 are evaluated for , , and , where given the selected beam pair , based on each algorithm, the beamforming gain is calculated from (1) by . In Fig. 3, the curves of the conventional KM learning are evaluated for different threshold values dB, while the curve of Algorithm 2 is evaluated for . Moreover, the performance of the exhaustive search, a benchmark, consuming channel uses for the beam alignment, is also presented. As can be seen from Fig. 3, Algorithm 2 shows an improvement compared to the conventional KM learning with substantially reduced complexity.
The efficacy of the proposed KS test in improving the KM learning capability is further evaluated. In Fig. 4, we compare the beamforming gain of Algorithm 2 with that obtained by replacing the KS test in Algorithm 2 with the FE in (3) for , , and . Fig. 4 illustrates that, with a false alarm rate guarantee, the proposed KS test substantially improves the learning capability of the KM.
V Conclusion
In this paper, we proposed an enhanced KM learning algorithm for beam alignment in mmWave MIMO channels. Based on DMO, one key step in learning the KM parameters, i.e., the BQP, was substantially accelerated. To address the uncertainty introduced by the subjective threshold setting of the FE, the KS test was proposed to obtain the empirical probabilities of the training set based on a detection-theoretic criterion. The simulation results demonstrate that the proposed KM learning with DMO and KS achieves better beam alignment performance with substantially reduced computational complexity compared to the conventional KM algorithm.
References
[1] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 436–453, 2016.
[2] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas, and A. Ghosh, “Millimeter wave beamforming for wireless backhaul and access in small cell networks,” IEEE Transactions on Communications, vol. 61, no. 10, pp. 4391–4403, 2013.
[3] IEEE Std 802.15.3c-2009. IEEE Standard, Oct 2009.
[4] IEEE Std 802.11ad-2012. IEEE Standard, Dec 2012.
[5] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 831–846, 2014.
[6] S. Sun and T. S. Rappaport, “Millimeter wave MIMO channel estimation based on adaptive compressed sensing,” in 2017 IEEE International Conference on Communications Workshops (ICC Workshops), 2017, pp. 47–53.
[7] W. Zhang, T. Kim, D. J. Love, and E. Perrins, “Leveraging the restricted isometry property: Improved low-rank subspace decomposition for hybrid millimeter-wave systems,” IEEE Transactions on Communications, vol. 66, no. 11, pp. 5814–5827, 2018.
[8] M. Kokshoorn, H. Chen, P. Wang, Y. Li, and B. Vucetic, “Millimeter wave MIMO channel estimation using overlapped beam patterns and rate adaptation,” IEEE Transactions on Signal Processing, vol. 65, no. 3, pp. 601–616, 2017.
[9] Y. Shabara, C. E. Koksal, and E. Ekici, “Beam discovery using linear block codes for millimeter wave communication networks,” IEEE/ACM Transactions on Networking, vol. 27, no. 4, pp. 1446–1459, 2019.
[10] W. M. Chan, H. Ghauch, T. Kim, E. De Carvalho, and G. Fodor, “Kolmogorov model for large millimeter-wave antenna arrays: Learning-based beam-alignment,” in 2019 53rd Asilomar Conference on Signals, Systems, and Computers, 2019, pp. 411–415.
[11] H. Ghauch, M. Skoglund, H. Shokri-Ghadikolaei, C. Fischione, and A. H. Sayed, “Learning Kolmogorov models for binary random variables,” in ICML, 2018.
[12] G. Yang, M. Xiao, and H. V. Poor, “Low-latency millimeter-wave communications: Traffic dispersion or network densification?” IEEE Transactions on Communications, vol. 66, no. 8, pp. 3526–3539, 2018.
[13] H. Tuy, M. Minoux, and N. T. Hoai-Phuong, “Discrete monotonic optimization with application to a discrete location problem,” SIAM Journal on Optimization, vol. 17, no. 1, pp. 78–97, 2006.
[14] T. Kim, D. J. Love, M. Skoglund, and Z. Jin, “An approach to sensor network throughput enhancement by PHY-aided MAC,” IEEE Transactions on Wireless Communications, vol. 14, no. 2, pp. 670–684, 2015.
[15] G. Zhang, X. Wang, Y. Liang, and J. Liu, “Fast and robust spectrum sensing via Kolmogorov-Smirnov test,” IEEE Transactions on Communications, vol. 58, no. 12, pp. 3410–3416, 2010.
[16] A. C. Marcum, J. Y. Kim, D. J. Love, and J. V. Krogmeier, “Interference detection using time-frequency binary hypothesis testing,” in MILCOM 2015 - 2015 IEEE Military Communications Conference, 2015, pp. 1485–1490.
[17] M. Jaggi, “Revisiting Frank-Wolfe: Projection-free sparse convex optimization,” in Proceedings of the 30th International Conference on Machine Learning, vol. 28, no. 1, 2013, pp. 427–435.
[18] M. Kisialiou and Z. Luo, “Probabilistic analysis of semidefinite relaxation for binary quadratic minimization,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1906–1922, 2010.
[19] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.
[20] Y. Koren, “Factorization meets the neighborhood: A multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 426–434.
[21] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in Advances in Neural Information Processing Systems 13. MIT Press, 2001, pp. 556–562.
[22] C. J. Stark, “Expressive recommender systems through normalized nonnegative models,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 1081–1087.
[23] J. Millard and L. Kurz, “The Kolmogorov-Smirnov tests in signal detection (corresp.),” IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 341–342, 1967.
[24] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th ed. Boston: McGraw Hill, 2002.