# Eliminating NB-IoT Interference to LTE System: a Sparse Machine Learning Based Approach

Narrowband internet-of-things (NB-IoT) is a competitive 5G technology for massive machine-type communication scenarios, but it introduces narrowband interference (NBI) to existing broadband transmission such as long term evolution (LTE) systems in enhanced mobile broadband (eMBB) scenarios. To facilitate harmonious and fair coexistence in wireless heterogeneous networks, it is important to eliminate NB-IoT interference to LTE systems. In this paper, a novel sparse machine learning based framework and a sparse combinatorial optimization problem are formulated for accurate NBI recovery, which can be efficiently solved using the proposed iterative sparse learning algorithm called sparse cross-entropy minimization (SCEM). To further improve the recovery accuracy and convergence rate, regularization is introduced to the loss function in the enhanced algorithm called regularized SCEM. Moreover, exploiting the spatial correlation of NBI, the framework is extended to multiple-input multiple-output systems. Simulation results demonstrate that the proposed methods are effective in eliminating NB-IoT interference to LTE systems, and significantly outperform the state-of-the-art methods.

## I Introduction

With the rapid development of 5G new radio technologies, research on enhanced mobile broadband (eMBB), massive machine-type communications (mMTC), and ultra-reliable low latency communications (URLLC) has drawn rapidly increasing attention from both academia and industry [1, 2, 3]. To satisfy the prospects of 5G, not only must these new radio techniques improve tremendously, but the harmonious and fair coexistence of heterogeneous networks and the compatibility between 4G and 5G systems must also be handled with great care [4]. Due to the scarcity of spectrum suitable for wireless transmission, many existing and emerging communication systems are deployed close to each other, or even overlap in spectrum, which inevitably results in intensive interference [5]. As a typical example, the narrowband internet-of-things (NB-IoT) system reuses the spectrum of long term evolution (LTE), occupying LTE spectrum when operating in the “in-band” mode [6, 7, 8]. NB-IoT is a promising emerging technology to support mMTC in 5G new radio, capable of interconnecting a large number of nodes with very low power consumption and narrow bandwidth [9, 10, 11]. Since LTE and LTE-Advanced (LTE-A) with cyclic-prefixed orthogonal frequency division multiplexing (CP-OFDM) modulation are the dominant technologies of the 4G era [12, 13, 14], the interference from NB-IoT systems should be properly tackled so that the transition from 4G to 5G can proceed smoothly [15, 16]. During the deployment of 5G eMBB facilities, it is likewise important to mitigate the interference from NB-IoT whenever the utilized spectrum overlaps.

However, how to mitigate or eliminate the interference between NB-IoT and LTE systems remains an open issue that has not been sufficiently investigated in the literature. Since the bandwidth of NB-IoT is small compared with that of LTE, the interference from NB-IoT can be regarded as a kind of narrowband interference (NBI). Although there are plenty of conventional methods to combat NBI in the literature [17, 18, 19, 20, 21], they have drawbacks that limit their efficiency and applicability: useful data might be lost, the statistics or locations of the NBI must be known a priori, or a large number of virtual sub-carriers are consumed.

Recently, sparse recovery methods that exploit the sparsity of NBI have been introduced to NBI estimation, and methods based on compressed sensing (CS) theory in particular are drawing great attention [22]. Nevertheless, the state-of-the-art CS-based methods are mostly designed for non-CP-OFDM systems, or the estimation is carried out at the preamble, which might turn out inaccurate for the payload data frames. Besides, it is difficult to design a practical observation matrix with the satisfactory restricted isometry property (RIP) required by CS-based methods [22]. Thus, the performance is limited when the background noise or sparsity level is unfavorable. Sparse Bayesian learning (SBL), another sparse recovery theory, was proposed [23] to solve block sparse recovery problems, but it requires prior information on the block partition and the statistics of the unknown signal, and its stringent parametric assumptions on the NBI are impractical.

Different from the aforementioned existing schemes, machine learning theory and techniques, which have drawn tremendous research attention recently, can inspire an NBI recovery method that is both efficient and reliable. In machine learning research, cross-entropy (CE) has been exploited as the loss function to train deep neural networks [24]. Nevertheless, the conventional CE method was not designed for sparse approximation. Moreover, research on sparse machine learning based NBI recovery using iterative cross-entropy guided training is insufficient in the literature. To fill this gap, a sparse machine learning inspired probabilistic framework is formulated, and a novel algorithm called sparse CE minimization (SCEM) is proposed to iteratively learn the support distribution. The proposed method is capable of learning and recovering the NBI more efficiently and more accurately than state-of-the-art counterparts, supporting the harmonious coexistence of NB-IoT and LTE systems.

The main contributions are listed as follows:

• The theory of sparse machine learning with the method of CE-guided training is introduced to the area of NBI recovery for the first time. A novel probabilistic framework of sparse machine learning is formulated to recover and eliminate the NB-IoT interference to the LTE system, with higher spectral efficiency and recovery accuracy than the existing methods.

• A novel algorithm called SCEM based on sparse machine learning is proposed for NBI recovery, which iteratively learns the NBI support distribution guided by the CE as the loss function. An enhanced algorithm called regularized SCEM (RSCEM) is proposed by regularizing the loss function, which achieves better recovery accuracy and convergence rate.

• The proposed framework is extended to MIMO systems to utilize the spatial correlation of the NBI across multiple antennas. The resulting simultaneous SCEM (S-SCEM) algorithm combines the contributions from multiple antennas and simultaneously recovers the common support of the NBI to further improve the spectral efficiency and accuracy.

The rest of this paper is organized as follows: The related works are presented in Section II. The system model is presented in Section III. The main contributions of this paper, namely the proposed probabilistic framework formulation and the proposed sparse machine learning algorithms for NBI recovery, are described in detail in Section IV. The performance of the proposed algorithms is evaluated through computer simulations in Section V, which is followed by the conclusions in Section VI.

Notation. Matrices and column vectors are denoted by boldface letters; frequency-domain and time-domain vectors are denoted by boldface vectors with tilde ($\tilde{\mathbf{v}}$) and without tilde ($\mathbf{v}$), respectively; $(\cdot)^{\dagger}$ and $(\cdot)^{H}$ denote the pseudo-inversion operation and conjugate transpose, respectively; $\|\cdot\|_r$ represents the $\ell_r$-norm operation; $|\Pi|$ denotes the cardinality of the set $\Pi$; $\mathbf{v}|_{\Pi}$ denotes the entries of the vector $\mathbf{v}$ in the set $\Pi$; $\mathbf{A}_{\Pi}$ represents the sub-matrix comprised of the columns of the matrix $\mathbf{A}$ indexed by $\Pi$; $\Pi^{c}$ denotes the complementary set of $\Pi$; $\mathrm{supp}(\mathbf{v})$ denotes the support of $\mathbf{v}$.

Abbreviations. BSBL (Block Sparse Bayesian Learning). CE (Cross-Entropy). CP (Cyclic Prefix). CRLB (Cramér-Rao Lower Bound). CS (Compressed Sensing). FTE (Frequency Threshold Excision). IBI (Inter-Block Interference). INR (Interference-to-Noise Ratio). LTE (Long Term Evolution). MIMO (Multiple-Input Multiple-Output). MSE (Mean Square Error). NBI (NarrowBand Interference). NB-IoT (NarrowBand Internet-of-Things). NLL (Negative Logarithm Likelihood). OFDM (Orthogonal Frequency Division Multiplexing). PA-SAMP (Priori Aided Sparsity Adaptive Matching Pursuit). RIP (Restricted Isometry Property). RSCEM (Regularized Sparse Cross-Entropy Minimization). SCEM (Sparse Cross-Entropy Minimization). S-SCEM (Simultaneous Sparse Cross-Entropy Minimization). SAMP (Sparsity Adaptive Matching Pursuit).

## II Related Works

Some coexistence simulation results for in-band and guard-band scenarios between NB-IoT and legacy systems are provided for initial analysis in the 3GPP technical document [25], which show significant interference between NB-IoT and LTE systems. Ratasuk et al. analyzed the impact of the NB-IoT signal on the link budget and block error rate performance of the LTE system [26]. Kim et al. investigated the interference between NB-IoT and LTE systems in the “in-band” mode [27]. Wang and Wu analyzed the coexistence between NB-IoT and LTE for the stand-alone mode, and studied the effects of NB-IoT on the performance of uplink LTE transmission [28].

Since the coexistence between NB-IoT and LTE systems is vital, several conventional methods have been developed to combat NBI. A commonly adopted approach, called frequency threshold excision (FTE), is to directly null out the sub-carriers where NBI is present [17]. Nilsson proposed a linear minimum mean square error based method to estimate NBI [18]. A successive interference cancellation approach mitigating the NBI in a recursive manner was introduced in [19]. A soft decision based successive NBI cancellation method was further proposed by Darsena et al. in [20]. Coulson designed a time-domain notch filter for NBI suppression based on the linear prediction criterion before discrete Fourier transform at the transmitter [21]. The limitation of conventional methods mainly lies in that useful data might be lost, and that statistical information or plenty of virtual sub-carriers are required.

To overcome the limitations of conventional methods, CS theory, a powerful approach for sparse recovery, can be utilized to deal with the NBI estimation problem. CS-based methods were first investigated by Al-Dhahir et al., utilizing the null space to obtain the measurements of NBI for OFDM systems [29, 30]. In that work, the NBI could be recovered by using CS-based greedy algorithms. Different CS-based greedy algorithms have been studied, such as subspace pursuit (SP) [31] proposed by W. Dai et al. and sparsity adaptive matching pursuit (SAMP) [32] proposed by T. Do et al. The SP algorithm is able to recover sparse signals with or without noise disturbance at low complexity [31]. The SAMP algorithm is designed to adapt to varying sparsity levels of the NBI. By dividing the iteration process into multiple stages, the SAMP algorithm is able to recover the sparse signal by iterative matching pursuit of the support basis without knowing its sparsity level [32].

Other CS-based methods were proposed to estimate the NBI, exploiting the time-domain training guard interval of time-domain synchronous OFDM (TDS-OFDM) systems [33] or the preamble in the frame header [34]. In [33], the priori aided SAMP (PA-SAMP) algorithm was proposed as an improvement of the classical SAMP algorithm [32]; it makes use of prior information on the partial NBI support acquired by a coarse power threshold method. The prior information is then exploited in the initialization and iteration process to reduce the complexity and improve the accuracy. The two-dimensional correlation of the NBI was exploited in the framework of multiple measurements and structured CS in [34]. The two-dimensional measurement data were obtained from the preambles at multiple receive antennas, and then utilized for the structured CS based recovery of the NBI. Another sparse recovery theory, sparse Bayesian learning (SBL), was proposed in [23] and has been utilized to effectively estimate impulsive noise [35]. A block SBL (BSBL) based method of estimating the NBI generated by NB-IoT was proposed in [36], which is an improvement of the SBL-based method in [23]. The BSBL-based method employs parametric Bayesian inference iteratively to estimate the unknown deterministic parameters of the block sparse NBI [36]. However, the major limitation of CS-based methods is that CS theory requires an observation matrix with satisfactory RIP [22], which is difficult to design in practice. Furthermore, the performance is limited when the intensity of the background noise or the sparsity level is large.

Machine learning has become a popular research trend in recent years, with many applications in sparse composite regularization [37], anti-jamming [38, 39], and wireless communications [40]. A reinforcement learning based scheme was proposed in [38] for ultra-dense networks, which adaptively learns the power control policy to improve the efficiency while mitigating the inter-cell interference. A two-dimensional anti-jamming mobile communication scheme based on reinforcement learning was proposed in [39], where a mobile device can achieve an optimal communication policy in a dynamic game framework without knowing the jamming and interference model. As an important method in machine learning, the CE method is usually utilized for training deep neural networks and machine learning models, and has solved many learning tasks well, such as pattern recognition and object classification [41, 42]. Recently, a machine learning based method exploiting CE was proposed in [43] to improve hybrid precoding performance for mmWave massive MIMO systems, which introduced the CE method to wireless communications research. Previously, the CE method was also adopted to solve combinatorial optimization problems, where it outperforms the brute-force approach [24, 44]. Different from the state-of-the-art methods, the proposed solution in this work introduces sparse machine learning to NBI estimation, and a novel algorithm based on CE minimization is proposed to efficiently learn the NBI support, which improves both the spectral efficiency and the estimation accuracy compared with existing approaches.

## III System Model

### III-A Signal Model of LTE

As adopted in the 3GPP standards of LTE [12, 13], the CP-OFDM frame structure is composed of the length-$N$ OFDM block $\mathbf{x}_i$, where $N$ is the number of sub-carriers at a given sub-carrier spacing, and the length-$V$ CP in front, which is formed by the last $V$ samples of the OFDM block, as illustrated in Fig. 1.

After transmission over the wireless multi-path fading channel with the length-$L$ channel impulse response (CIR) $\mathbf{h}_i$, in the presence of the NBI generated by the NB-IoT signal, the received $i$-th CP before the $i$-th OFDM block is represented as

$$\mathbf{c}_i = \boldsymbol{\Psi}_{\mathrm{CP}} \mathbf{h}_i + \mathbf{e}_i + \mathbf{w}_i, \tag{1}$$

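where $\mathbf{e}_i$ denotes the time-domain NBI vector located at the CP, $\mathbf{w}_i$ denotes the zero-mean additive white Gaussian noise (AWGN) vector, and $\mathbf{c}_i$ denotes the received CP, with the $V \times L$ matrix $\boldsymbol{\Psi}_{\mathrm{CP}}$ represented as

$$
\boldsymbol{\Psi}_{\mathrm{CP}} =
\begin{bmatrix}
x_{i,N-V} & x_{i-1,N-1} & x_{i-1,N-2} & \cdots & x_{i-1,N-L+1} \\
x_{i,N-V+1} & x_{i,N-V} & x_{i-1,N-1} & \cdots & x_{i-1,N-L+2} \\
x_{i,N-V+2} & x_{i,N-V+1} & x_{i,N-V} & \cdots & x_{i-1,N-L+3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{i,N-V+L-2} & x_{i,N-V+L-3} & x_{i,N-V+L-4} & \cdots & x_{i-1,N-1} \\
x_{i,N-V+L-1} & x_{i,N-V+L-2} & x_{i,N-V+L-3} & \cdots & x_{i,N-V} \\
x_{i,N-V+L} & x_{i,N-V+L-1} & x_{i,N-V+L-2} & \cdots & x_{i,N-V+1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{i,N-1} & x_{i,N-2} & x_{i,N-3} & \cdots & x_{i,N-L}
\end{bmatrix}
$$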

The entries $x_{i-1,N-L+1}, \cdots, x_{i-1,N-1}$ in the matrix above represent the last $L-1$ samples of the preceding $(i-1)$-th OFDM block $\mathbf{x}_{i-1}$, which cause inter-block interference (IBI) on the following $i$-th CP. Since $\mathbf{x}_{i-1}$ only causes IBI on the first $L-1$ samples of the $i$-th CP as illustrated in Fig. 1, the last $G = V-L+1$ samples of $\mathbf{c}_i$ form the IBI-free region given by

$$\mathbf{p}_i = [c_{i,L-1}, c_{i,L}, \cdots, c_{i,V-1}]^{T} = \mathbf{S}_{G,V} \mathbf{c}_i, \tag{2}$$

where $\mathbf{S}_{G,V}$ denotes the selection matrix composed of the last $G$ rows of the $V \times V$ identity matrix. The IBI-free region exists in practical broadband transmission systems because a common rule for system design is to configure the guard interval length to be much larger than the maximum channel delay spread in the worst case to avoid IBI between OFDM symbols, which is specified in standards and supported in the literature [45, 12, 36].

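For simplicity of notation, the subscript $i$ denoting the frame number is omitted in the rest of this paper when there is no ambiguity about the current frame number, unless otherwise stated. Then the IBI-free region can be rewritten as

$$\mathbf{p} = \mathbf{S}_{G,V} \boldsymbol{\Psi}_{\mathrm{CP}} \mathbf{h} + \mathbf{e} + \mathbf{w}, \tag{3}$$

where $\mathbf{p}$, $\mathbf{e}$, and $\mathbf{w}$ consist of the last $G$ entries of $\mathbf{c}_i$, $\mathbf{e}_i$, and $\mathbf{w}_i$ in (1), respectively, while $\mathbf{S}_{G,V} \boldsymbol{\Psi}_{\mathrm{CP}}$ is composed of the last $G$ rows of $\boldsymbol{\Psi}_{\mathrm{CP}}$ without the IBI component. Since the CP is identical to the last $V$ samples of the OFDM block, there is a duplicate of the IBI-free region at the last $G$ samples of its subsequent OFDM block, denoted by $\mathbf{p}_X$ and given by

$$\mathbf{p}_X = \mathbf{S}_{G,V} \boldsymbol{\Psi}_{\mathrm{CP}} \mathbf{h} + \mathbf{e}_X + \mathbf{w}_X, \tag{4}$$

where $\mathbf{e}_X$ and $\mathbf{w}_X$ denote the length-$G$ time-domain NBI and AWGN vectors at the end of the OFDM block, respectively.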

### III-B NBI Model Generated by NB-IoT

In LTE systems, the NB-IoT signal working in the “in-band” mode within the LTE spectrum generates NBI at the receivers of the LTE system [46]. The widely adopted model of the NBI in the frequency domain is the superposition of several tone interferers, each modeled as band-limited Gaussian noise (BLGN) with a given power spectral density (PSD) [47]. The frequency-domain locations of the tone interferers can be randomly distributed among all sub-carriers [48, 47], and different tone interferers are mutually independent [48]. Let $\tilde{\mathbf{e}}_i$ denote the frequency-domain NBI vector associated with the CP. Then each entry of the corresponding time-domain NBI signal can be represented as

$$e_{i,n} = \sum_{k \in \Pi} \tilde{e}_{i,k} \cdot \exp\left(\frac{j2\pi kn}{N}\right), \quad n = 0, 1, \cdots, V-1, \tag{5}$$

where $\Pi$ is the set of the indices of nonzero entries, which is defined as the support. The sparsity level $K = |\Pi|$ is defined by the number of nonzero entries, which is much smaller than the signal dimension, i.e., $K \ll N$. The interference-to-noise ratio (INR) is used to represent the intensity of the NBI, defined as the ratio of the average NBI power to the noise power. Since the tone interferers are BLGN as described, the average NBI power follows from their PSD, which in turn determines the INR.
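As a minimal illustration of the synthesis in (5), the sketch below evaluates the time-domain NBI samples from a sparse set of tone amplitudes. The function name `nbi_time_samples`, the dictionary representation of the support, and all numerical values are hypothetical choices made for this example, not part of the original system description.

```python
import cmath

def nbi_time_samples(tones, N, V):
    """Evaluate (5): e_n = sum over k in the support of e_tilde_k * exp(j*2*pi*k*n/N).

    `tones` maps a sub-carrier index k (an element of the support) to its
    complex amplitude; only n = 0..V-1 (the CP samples) are produced.
    """
    return [
        sum(amp * cmath.exp(2j * cmath.pi * k * n / N) for k, amp in tones.items())
        for n in range(V)
    ]

# Hypothetical toy values: N = 64 sub-carriers, CP length V = 8, two tone interferers.
tones = {5: 1.0 + 0.0j, 20: 0.5j}
samples = nbi_time_samples(tones, N=64, V=8)
sparsity_level = len(tones)  # number of nonzero entries, much smaller than N
```

Representing the NBI by its support and amplitudes only, rather than by a dense length-$N$ vector, mirrors the sparsity assumption that the recovery algorithms rely on.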

Since the bandwidth of NB-IoT is small compared with that of LTE [49], the NBI generated by NB-IoT can be modeled as a sparse vector in the frequency domain, which has only a few nonzero entries compared with the number of sub-carriers. The nonzero entries of the NBI are not necessarily located exactly at the frequencies of the OFDM sub-carriers in LTE, because in practice there might be a fractional frequency offset (FO) between the NB-IoT working frequency and the OFDM sub-carriers. Thus, the generalized NBI model becomes a block sparse vector due to spectral leakage [50]. The frequency-domain block sparse NBI vector $\tilde{\mathbf{e}}_B$ associated with the CP can then be represented as

$$\tilde{\mathbf{e}}_B = \mathbf{F}_N^{H} \boldsymbol{\Lambda}_{\mathrm{FO}} \mathbf{F}_N \tilde{\mathbf{e}}_i, \tag{6}$$

where $\mathbf{F}_N$ denotes the $N \times N$ inverse discrete Fourier transform (IDFT) matrix, whose entries follow the kernel $\exp(j2\pi kn/N)$ appearing in (5), and $\boldsymbol{\Lambda}_{\mathrm{FO}}$ is the FO matrix, whose frequency offset $\alpha$ can be modeled as a uniformly distributed random variable [50]. Transforming the frequency-domain NBI signal in (6) to the time domain by partial IDFT, the NBI vector associated with the IBI-free region in (3) is obtained as

$$\mathbf{e} = \mathbf{S}_{G,N} \mathbf{F}_N \tilde{\mathbf{e}}_B. \tag{7}$$
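The spectral leakage behind the block sparse model in (6) can be reproduced numerically. The sketch below is only illustrative and rests on two assumptions that the text does not spell out: the IDFT matrix is taken as unitary (scaled by $1/\sqrt{N}$), and the FO matrix is taken as a diagonal time-domain phase ramp with entries $\exp(j2\pi\alpha n/N)$. The helper names are hypothetical.

```python
import cmath

def idft_matrix(N):
    """Unitary N-point IDFT matrix (the 1/sqrt(N) scaling is an assumption)."""
    s = 1 / N ** 0.5
    return [[s * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)] for n in range(N)]

def apply_frequency_offset(e_tilde, alpha):
    """Evaluate (6), F^H * Lambda_FO * F * e_tilde.

    Lambda_FO is assumed to be diag(exp(j*2*pi*alpha*n/N)), n = 0..N-1.
    """
    N = len(e_tilde)
    F = idft_matrix(N)
    # Frequency domain -> time domain
    time = [sum(F[n][k] * e_tilde[k] for k in range(N)) for n in range(N)]
    # Apply the offset as a phase ramp over the time samples
    shifted = [time[n] * cmath.exp(2j * cmath.pi * alpha * n / N) for n in range(N)]
    # Time domain -> frequency domain via the conjugate transpose of F
    return [sum(F[n][k].conjugate() * shifted[n] for n in range(N)) for k in range(N)]

N = 16
e_tilde = [0j] * N
e_tilde[4] = 1 + 0j                                   # a single tone on sub-carrier 4

aligned = apply_frequency_offset(e_tilde, alpha=0.0)  # no offset: stays 1-sparse
leaked = apply_frequency_offset(e_tilde, alpha=0.3)   # fractional offset: energy spreads
peak_bin = max(range(N), key=lambda k: abs(leaked[k]))
```

Under these assumptions the leaked energy remains concentrated around the original sub-carrier, which is why a block sparse rather than a dense model is appropriate.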

A useful feature of NBI, called temporal correlation, can be utilized for measuring the NBI from the compound received signal containing both the NBI and the data components. Temporal correlation means that the NBI signal usually has invariant support and amplitude over one received OFDM frame of interest. According to experiments and observations, the coherence time of the NBI signal is normally much larger than the duration of one OFDM symbol, and the working band of an NBI source such as NB-IoT does not change quickly [51, 52, 36]. The NB-IoT signal working in-band in the LTE spectrum is usually observed at fixed frequency locations [8, 11]. Temporal correlation is also verified by substantial field tests and experimental observations in real houses and apartments [53].

Because of the temporal correlation, the frequency-domain NBI vectors associated with the CP part and the following OFDM block part share the same support and amplitude, with only a phase shift in between. Let $\tilde{\mathbf{e}}_{BX}$ denote the frequency-domain NBI vector associated with the CP's duplicate in the OFDM block given by (4), where the time-domain representation of $\tilde{\mathbf{e}}_{BX}$ is given by

$$\mathbf{e}_X = \mathbf{S}_{G,N} \mathbf{F}_N \tilde{\mathbf{e}}_{BX}. \tag{8}$$

Hence, $\tilde{\mathbf{e}}_{BX}$ can be derived as a phase-shifted version of $\tilde{\mathbf{e}}_B$ associated with the CP in (3), which can be represented as

$$\tilde{e}_{BX,k} = \tilde{e}_{B,k} \exp\left(\frac{j2\pi (k+\alpha)\Delta l_B}{N}\right), \quad k = 0, 1, \cdots, N-1, \tag{9}$$

where the value of the FO $\alpha$ determines the phase shift, and $\Delta l_B$ is the corresponding time-domain distance between the CP and its duplicate in the OFDM block.

Note that $\Delta l_B = N$ as illustrated in Fig. 1, so it can be further derived that $\exp(j2\pi k \Delta l_B / N) = 1$ for every integer $k$, which yields a simpler relation depending only on $\alpha$, given by

$$\tilde{\mathbf{e}}_{BX} = \exp(j2\pi\alpha)\, \tilde{\mathbf{e}}_B. \tag{10}$$
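The reduction from (9) to (10) can be checked numerically: when the time-domain distance equals the block length, the sub-carrier dependent part of the phase is a whole number of turns and only the offset-dependent factor remains. The values below are hypothetical toy parameters.

```python
import cmath

def phase_factor(k, alpha, delta_l, N):
    """Per-sub-carrier phase rotation appearing in (9)."""
    return cmath.exp(2j * cmath.pi * (k + alpha) * delta_l / N)

N, alpha = 64, 0.3                          # hypothetical toy values
common = cmath.exp(2j * cmath.pi * alpha)   # the single factor in (10)

# With delta_l = N, every sub-carrier k sees the same rotation as in (10).
max_dev = max(abs(phase_factor(k, alpha, N, N) - common) for k in range(N))
```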

## IV Probabilistic Sparse Machine Learning Based Framework Formulation and Algorithms for NBI Recovery

In this section, the probabilistic framework of sparse machine learning and the sparse combinatorial optimization problem for NBI recovery are first formulated in Section IV-A. Then the proposed sparse machine learning based iterative algorithm called SCEM is introduced in detail in Section IV-B, followed by the enhanced RSCEM algorithm, which imposes regularization on the loss function, in Section IV-C. Afterwards, the extension of the proposed method to MIMO systems is presented in Section IV-D.

### IV-A Probabilistic Sparse Machine Learning Framework Formulation for NBI Recovery

The ultimate goal of this work is to accurately recover the NBI vector $\tilde{\mathbf{e}}_{BX}$ located at the OFDM data block and eliminate it from the data, which can be done by estimating $\tilde{\mathbf{e}}_B$ and using the relation in (10). Hence, the measurement of the NBI should first be obtained, after which a probabilistic sparse machine learning based framework can be formulated to efficiently recover the NBI using the proposed algorithms.

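The measurement vector of the NBI can be obtained using the temporal differential measuring operation [36]. Specifically, as illustrated in Fig. 1, the measurement vector $\Delta\mathbf{p} = \mathbf{p} - \mathbf{p}_X$ is obtained by the differential operation between the received IBI-free region $\mathbf{p}$ in (3) and its duplicate $\mathbf{p}_X$ in (4) at the end of the OFDM block, which nulls out the cyclic data component $\mathbf{S}_{G,V} \boldsymbol{\Psi}_{\mathrm{CP}} \mathbf{h}$, yielding the measurement vector of the NBI

$$\Delta\mathbf{p} = \Delta\mathbf{e} + \Delta\mathbf{w}, \tag{11}$$

where $\Delta\mathbf{e} = \mathbf{e} - \mathbf{e}_X$ and $\Delta\mathbf{w} = \mathbf{w} - \mathbf{w}_X$. Thus by substituting (7) and (8) into (11), the measurement vector can be rewritten as

$$\Delta\mathbf{p} = \mathbf{S}_{G,N} \mathbf{F}_N \Delta\tilde{\mathbf{e}}_B + \Delta\mathbf{w}, \tag{12}$$

where $\Delta\tilde{\mathbf{e}}_B$ is given by

$$\Delta\tilde{\mathbf{e}}_B = \tilde{\mathbf{e}}_B - \tilde{\mathbf{e}}_{BX} = \big(1 - \exp(j2\pi\alpha)\big)\, \tilde{\mathbf{e}}_B, \tag{13}$$

whose support is the same as that of $\tilde{\mathbf{e}}_B$ and $\tilde{\mathbf{e}}_{BX}$.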
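The cancellation of the data component by the differential operation can be illustrated with a small noise-free simulation. The sketch below is a simplified stand-in rather than the authors' simulator: the blocks are random time-domain samples instead of modulated OFDM symbols, the channel taps and tone parameters are arbitrary toy values, and the NBI is a single complex tone with a fractional offset that persists over the whole frame.

```python
import cmath
import random

def convolve(signal, h):
    """Linear convolution of the stream with the CIR, truncated to len(signal)."""
    return [sum(h[l] * signal[t - l] for l in range(len(h)) if t - l >= 0)
            for t in range(len(signal))]

rng = random.Random(0)
N, V, L = 32, 8, 3                      # toy block length, CP length, CIR length
G = V - L + 1                           # size of the IBI-free region

def random_block():
    """Random time-domain samples standing in for one OFDM block."""
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(N)]

prev_block, block = random_block(), random_block()
stream = prev_block + block[-V:] + block            # previous block, CP, current block
h = [1.0, 0.5 - 0.2j, 0.1j]                         # toy channel impulse response
k, alpha, amp = 5, 0.3, 2.0                         # toy tone with fractional offset
nbi = [amp * cmath.exp(2j * cmath.pi * (k + alpha) * t / N) for t in range(len(stream))]
received = [d + e for d, e in zip(convolve(stream, h), nbi)]   # noise-free for clarity

cp_end = N + V                                      # index just past the received CP
p = received[cp_end - G:cp_end]                     # IBI-free region, as in (3)
p_x = received[cp_end + N - G:cp_end + N]           # duplicate at the block tail, as in (4)
delta_p = [a - b for a, b in zip(p, p_x)]           # differential measurement, as in (11)

# The data terms cancel; what remains is (1 - exp(j*2*pi*alpha)) times the NBI, cf. (13).
expected = [(1 - cmath.exp(2j * cmath.pi * alpha)) * e for e in nbi[cp_end - G:cp_end]]
max_err = max(abs(a - b) for a, b in zip(delta_p, expected))
```

Because the CP repeats the block tail and the last $G$ CP samples see no contribution from the previous block, the channel output is identical in both windows, so the difference contains only the NBI (and, in practice, noise).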

After obtaining the measurement of the NBI in (12), the probabilistic sparse machine learning framework of NBI recovery can be formulated, by which the support distribution of the NBI can be learnt using the proposed algorithms. Because of the sparsity of the frequency-domain NBI vector, it is crucial to recover its support, i.e., the set of the indices of the nonzero entries. Since the sparsity level of the NBI is $K$, the unknown NBI vector $\Delta\tilde{\mathbf{e}}_B$ to be reconstructed in (12) should satisfy

$$\|\Delta\tilde{\mathbf{e}}_B\|_0 \le K, \tag{14}$$

where $\|\cdot\|_0$ denotes the $\ell_0$-norm, i.e., the number of nonzero entries. To recover the optimal NBI vector based on the measurement in (12), we should solve the optimization problem given by

$$\Delta\hat{\mathbf{e}}_B^{*} = \arg\min_{\Delta\tilde{\mathbf{e}}_B} \left\| \Delta\mathbf{p} - \mathbf{S}_{G,N} \mathbf{F}_N \Delta\tilde{\mathbf{e}}_B \right\|_2, \quad \text{s.t.} \ \|\Delta\tilde{\mathbf{e}}_B\|_0 \le K, \tag{15}$$

where $\Delta\hat{\mathbf{e}}_B^{*}$ denotes the optimal NBI vector to be recovered from the measurement in (12) that minimizes the residue error norm $r$, with $r$ given by

$$r = \left\| \Delta\mathbf{p} - \mathbf{S}_{G,N} \mathbf{F}_N \Delta\tilde{\mathbf{e}}_B \right\|_2. \tag{16}$$

In the conventional perspective of signal processing, the problem in (15) is intractable because of the non-convex $\ell_0$-norm constraint. Since the constraint is a sparse one, the problem can be regarded as a sparse combinatorial optimization problem. Let $\Xi$ denote the set of all possible supports of sparse vectors satisfying the constraint in (14). We have

$$\Xi = \big\{ \Pi \subseteq \{0, 1, \cdots, N-1\} : |\Pi| \le K \big\}, \tag{17}$$

so the size of the set of possible solutions is given by

$$|\Xi| = \sum_{k=0}^{K} C_N^k = \sum_{k=0}^{K} \frac{N!}{(N-k)!\,k!}. \tag{18}$$

It can be noted from (18) that the number of possible supports in the solution space grows exponentially and combinatorially with the problem size $N$ and the sparsity level $K$.
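To give a feeling for the growth described by (18), the following sketch counts the candidate supports for a tiny case and for a larger, purely illustrative problem size; the function name and the chosen values of the dimension and sparsity level are hypothetical.

```python
from math import comb

def num_candidate_supports(N, K):
    """Size of the search space in (18): supports with at most K of N indices."""
    return sum(comb(N, k) for k in range(K + 1))

small = num_candidate_supports(4, 2)       # 1 + 4 + 6 = 11
large = num_candidate_supports(1024, 10)   # far beyond what exhaustive search can visit
```

Even for a modest sparsity level the count is astronomically large, which motivates learning a distribution over supports instead of enumerating them.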

Some sparse approximation methods, including the popular CS-based theory, have been exploited in the literature to relax the non-convex optimization problem to a tractable one. For instance, the non-convex $\ell_0$-norm constraint in (15) can be relaxed to the convex $\ell_1$-norm minimization problem [22] as

$$\arg\min_{\Delta\tilde{\mathbf{e}}_B} \|\Delta\tilde{\mathbf{e}}_B\|_1, \quad \text{s.t.} \ \left\| \Delta\mathbf{p} - \mathbf{S}_{G,N} \mathbf{F}_N \Delta\tilde{\mathbf{e}}_B \right\|_2 \le \epsilon, \tag{19}$$

where $\epsilon$ denotes the error norm bound due to the background AWGN $\Delta\mathbf{w}$, and thus convex programming can be exploited to solve it [54]. However, the performance of the CS-based methods depends on the RIP of the observation matrix [22, 55]. Besides, performance degradation can be caused by intensive background noise and a large sparsity level [22]. The spectral efficiency can also still be improved, because many measurement samples have to be reserved in the guard interval for CS-based methods [33].

To overcome the difficulties of state-of-the-art methods, a probabilistic sparse machine learning based approach called SCEM is proposed for NBI recovery. It is able to efficiently solve the non-convex sparse combinatorial optimization problem in (15) without strict prior RIP requirements on the observation matrix $\mathbf{S}_{G,N} \mathbf{F}_N$, and it is much more spectrum-efficient because it reduces the amount of measurement data needed. The proposed algorithm substantially extends the conventional CE method [24] to accommodate the sparse recovery problem, so that the unknown sparse NBI signal can be accurately recovered, as described in detail in the next sub-section.

### IV-B Proposed Sparse Machine Learning Inspired Algorithm: Sparse Cross-Entropy Minimization

Based on the probabilistic framework of sparse learning, the SCEM algorithm proposed in this paper efficiently solves the sparse combinatorial optimization problem in (15) by iteratively minimizing the cross-entropy between the current support distribution and the one minimizing the residue error norm. The pseudo-code of the proposed SCEM algorithm is summarized in Algorithm 1, and the flowchart of its essential computing modules, parameters, nodes, and data flows is illustrated in Fig. 2.

It can be observed from Fig. 2 that the proposed sparse machine learning algorithm iteratively learns the probability distribution of the NBI support by minimizing the loss function (i.e., the cross-entropy). In each iteration within the algorithm loop, the algorithm randomly generates a set of candidate supports based on the current support distribution $\mathbf{q}^{(k)}$ (initialized by $\mathbf{q}^{(0)}$), and computes the corresponding residue error norms using the measurement vector $\Delta\mathbf{p}$ from the input. After the residue error norms are sorted, the set of favorable supports is selected, which serves as the training data set. Then, the loss function is computed as the cross-entropy between the training data set and the estimated output. By minimizing the loss function using gradient descent, the support distribution is backward updated to $\mathbf{q}^{(k+1)}$ for the next iteration. This process gradually drives the support distribution towards the one with minimum estimation error. The iterations continue until the halting condition of the algorithm is met, at which point the output of the algorithm is obtained.

The overall structure and explanations of Algorithm 1 are described as follows:

Phase 1 - Input. The measurement vector $\Delta\mathbf{p}$, the observation matrix $\mathbf{S}_{G,N} \mathbf{F}_N$, the residue error norm threshold $\epsilon$ given in (19), and the numbers of candidate supports and favorable supports, i.e., $N_c$ and $N_f$, are input to the algorithm.

Phase 2 - Initialization. The initial probability distribution of the NBI support is set as $\mathbf{q}^{(0)} = [0.5, 0.5, \cdots, 0.5]^{T}$, where $\mathbf{q}^{(k)} = [q_0^{(k)}, q_1^{(k)}, \cdots, q_{N-1}^{(k)}]^{T}$, and $q_n^{(k)}$ denotes the probability that the $n$-th entry is in the NBI support $\Pi^{(k)}$, i.e.,

$$\Pr\big(n \in \Pi^{(k)}\big) = q_n^{(k)}, \quad n = 0, \cdots, N-1. \tag{20}$$

Since the nonzero entries can be randomly distributed in the support, assuming that each entry has an initial probability of 0.5 of being nonzero is reasonable without loss of generality.

Phase 3 - Main iterations. The main process is composed of multiple iterations, and terminates when the halting condition of the algorithm is met. The main process includes the following steps:

1) Candidate supports generation (Line 4): $N_c$ candidate supports are generated based on the support distribution $\mathbf{q}^{(k)}$. Each candidate support is generated in an efficient and simple recursive manner to obtain a $K$-sparse support. Let $\pi_l$ denote the current temporary support in the recursive generation process, where the initial temporary support $\pi_0 = \{0, 1, \cdots, N-1\}$ contains all indices. Then, based on the current temporary support $\pi_l$ and its corresponding probabilities derived from the current support distribution $\mathbf{q}^{(k)}$, a sparser temporary support $\pi_{l+1}$ can be generated by a Bernoulli trial on each entry as

$$\pi_{l+1} = \big\{ n \,\big|\, n \in \pi_l, \ \text{and} \ f_n^{(\pi_l)} = 1 \big\}, \tag{21}$$

where the $\{0,1\}$-valued parameter $f_n^{(\pi_l)}$ is the outcome of the Bernoulli trial on entry $n$ with Bernoulli probability $q_n^{(k)}$. Afterwards, $l$ is incremented and the process is repeated until $|\pi_l| \le K$, and then the candidate support is set as $\pi_l$.

2) Computing NBI and residue (Lines 5-6): the estimated NBI vectors corresponding to the candidate supports are calculated by least squares restricted to each candidate support, and the corresponding residue error norms are calculated by (16) using the estimated NBI vectors.

3) Favorable supports selection (Lines 7-8): the candidate supports are sorted by their residue error norms in ascending order, and the best $N_f$ candidate supports with the smallest estimation errors, which are closest to the real NBI support, are picked out as the favorable supports $\{\Pi^{(k)}_{[j]}\}_{j=1}^{N_f}$. The implicit probability distribution implied by the favorable supports is the training target of the current support distribution $\mathbf{q}^{(k)}$, which is gradually driven towards the ground-truth distribution by iteratively minimizing the CE between them.

4) Learning support distribution by minimizing CE (Line 9): The CE is utilized as the loss function in the perspective of machine learning theory, which is given by

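$$L\left(\Pi^{(k)}_{[j]};\mathbf{q}^{(k)}\right) = -\frac{1}{N_f}\sum_{j=1}^{N_f}\ln\Pr\left(\Pi^{(k)}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right), \tag{22}$$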

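where $-\ln\Pr\left(\Pi^{(k)}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right)$ is the negative log-likelihood (NLL) of the favorable support $\Pi^{(k)}_{[j]}$ conditioned on the current probability distribution $\mathbf{q}^{(k)}$. By minimizing the loss function in (22), the current support distribution $\mathbf{q}^{(k)}$ is updated to $\mathbf{q}^{(k+1)}$, which is given by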

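$$\mathbf{q}^{(k+1)} = \arg\min_{\mathbf{q}^{(k)}}\left\{-\frac{1}{N_f}\sum_{j=1}^{N_f}\ln\Pr\left(\Pi^{(k)}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right)\right\}. \tag{23}$$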

Let a $\{0,1\}$-valued length-$N$ vector $\mathbf{f}_{[j]}$ denote the favorable support $\Pi^{(k)}_{[j]}$, where its $n$-th entry $f_{[j],n}$ satisfies

$$f_{[j],n} = \begin{cases} 1, & n \in \Pi^{(k)}_{[j]} \\ 0, & n \notin \Pi^{(k)}_{[j]} \end{cases} \tag{24}$$

Then the conditional probability in the CE in (23) is given by

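$$\Pr\left(\Pi^{(k)}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right) = \Pr\left(\mathbf{f}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right), \tag{25}$$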

where each entry $f_{[j],n}$ is a Bernoulli random variable given by

$$\Pr\left(f_{[j],n}=1\right) = q^{(k)}_n, \quad \Pr\left(f_{[j],n}=0\right) = 1-q^{(k)}_n. \tag{26}$$

Thus, one can derive that

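$$\Pr\left(\mathbf{f}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right) = \prod_{n=0}^{N-1}\left(q^{(k)}_n\right)^{f_{[j],n}}\left(1-q^{(k)}_n\right)^{1-f_{[j],n}}. \tag{27}$$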

By substituting (27) into (23), the first derivative of the CE with respect to $q^{(k)}_n$ can be derived as

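$$
\begin{aligned}
&\frac{\partial}{\partial q^{(k)}_n}\left\{-\frac{1}{N_f}\sum_{j=1}^{N_f}\ln\Pr\left(\Pi^{(k)}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right)\right\} \\
&\quad= \frac{\partial}{\partial q^{(k)}_n}\left\{-\frac{1}{N_f}\sum_{j=1}^{N_f}\left[f_{[j],n}\ln q^{(k)}_n + \left(1-f_{[j],n}\right)\ln\left(1-q^{(k)}_n\right)\right]\right\} \\
&\quad= -\frac{1}{N_f}\sum_{j=1}^{N_f}\left[\frac{f_{[j],n}}{q^{(k)}_n} - \frac{1-f_{[j],n}}{1-q^{(k)}_n}\right].
\end{aligned} \tag{28}
$$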

To minimize the CE, the first derivative (28) is set to zero, so the updated support distribution can be learnt by

$$q^{(k+1)}_n = \frac{1}{N_f}\sum_{j=1}^{N_f} f_{[j],n}, \quad n = 0, 1, \cdots, N-1. \tag{29}$$
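The closed-form update in (29) is simply the entry-wise average of the indicator vectors in (24). A minimal sketch is given below; the function names and the toy supports are illustrative only.

```python
import numpy as np

def supports_to_indicators(supports, n):
    """Build the {0,1}-valued indicator vectors of (24), one row per favorable support."""
    F = np.zeros((len(supports), n))
    for j, support in enumerate(supports):
        F[j, support] = 1.0
    return F

def scem_update(supports, n):
    """Updated support distribution of (29): fraction of favorable supports containing n."""
    return supports_to_indicators(supports, n).mean(axis=0)

# Three toy favorable supports over N = 4 entries.
q_next = scem_update([[0, 2], [2, 3], [2]], n=4)
# q_next is [1/3, 0, 1, 1/3]: entry 2 appears in every favorable support.
```

An entry that appears in all favorable supports is assigned probability 1, while an entry that appears in none is assigned probability 0, which is how the distribution concentrates on the likely support over the iterations.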

5) Iteration switching (Lines 10-11): if either of the two halting criteria is satisfied, the algorithm ends. Otherwise, the algorithm proceeds to the next iteration.

Phase 4 - Output. The output of the algorithm includes the learnt support probability distribution, the recovered NBI support, and the recovered sparse NBI vector, which constitutes the solution of the sparse combinatorial optimization problem in (15).

Afterwards, using (13) and then (10), the NBI associated with the OFDM block can be calculated. Then, the NBI can be eliminated from the information data directly in the frequency domain by subtracting it from the received frequency-domain OFDM sub-carriers $\mathbf{X}$, which is given by

$$\mathbf{X}_0 = \mathbf{X} - \tilde{\mathbf{e}}_{BX}, \tag{30}$$

where $\mathbf{X}$ is the DFT of the received OFDM block as illustrated in Fig. 1, while $\mathbf{X}_0$ is the frequency-domain OFDM data block free from the NB-IoT interference. The NBI-free OFDM data block can then be used for information demapping and decoding.

### IV-C Enhanced Sparse Machine Learning Based Algorithm: Regularized SCEM

In the proposed SCEM algorithm, where the CE serves as the loss function, the NLL of every favorable support contributes equally to the CE given in (23), so favorable supports with different residue error norms have the same influence on the loss function. However, supports with smaller error should contribute more to the loss function, so as to encourage the algorithm to learn the more accurate supports. Motivated by this insight, an enhanced sparse learning algorithm, regularized SCEM (RSCEM), is proposed, in which each NLL term of the loss function in (22) is multiplied by a weighting parameter $\lambda_{[j]}$ to generate the regularized loss function given by

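$$L_{\mathrm{reg}}\left(\Pi^{(k)}_{[j]};\mathbf{q}^{(k)}\right) = -\frac{1}{N_f}\sum_{j=1}^{N_f}\lambda_{[j]}\ln\Pr\left(\Pi^{(k)}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right), \tag{31}$$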

where the regularization weighting parameter is given by

$$\lambda_{[j]} = \frac{\bar{r}^{(k)}}{r^{(k)}_{[j]}}, \quad j = 1, 2, \cdots, N_f, \tag{32}$$

where $\bar{r}^{(k)}$ is the average residue error norm over the favorable supports, given by

$$\bar{r}^{(k)} = \frac{1}{N_f}\sum_{j=1}^{N_f} r^{(k)}_{[j]}. \tag{33}$$

Note that a smaller residue error norm leads to a larger weighting parameter in (32). Hence, the NLL corresponding to a more accurate support will have a larger contribution to the regularized loss function in (31), which will drive the support distribution to converge to the ground-truth support more accurately and more efficiently. The pseudo-code of RSCEM is thus similar to that of SCEM given in Algorithm 1 except for the procedure of minimizing the loss function in Line 9, where the regularized loss function is now adopted to update the distribution as given by

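$$\mathbf{q}^{(k+1)} = \arg\min_{\mathbf{q}^{(k)}}\ -\frac{1}{N_f}\sum_{j=1}^{N_f}\lambda_{[j]}\ln\Pr\left(\Pi^{(k)}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right). \tag{34}$$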

To minimize the regularized loss function in (34), the notation of the previous sub-section is reused, i.e., the Bernoulli vector $\mathbf{f}_{[j]}$ in (24) denotes the favorable support $\Pi^{(k)}_{[j]}$. Following the same deduction as from (24) to (27), and substituting (27) into (34), the first derivative of the regularized loss function with respect to $q^{(k)}_n$ can be obtained as

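$$
\begin{aligned}
&\frac{\partial}{\partial q^{(k)}_n}\left\{-\frac{1}{N_f}\sum_{j=1}^{N_f}\lambda_{[j]}\ln\Pr\left(\Pi^{(k)}_{[j]}\,\middle|\,\mathbf{q}^{(k)}\right)\right\} \\
&\quad= \frac{\partial}{\partial q^{(k)}_n}\left\{-\frac{1}{N_f}\sum_{j=1}^{N_f}\lambda_{[j]}\left[f_{[j],n}\ln q^{(k)}_n + \left(1-f_{[j],n}\right)\ln\left(1-q^{(k)}_n\right)\right]\right\} \\
&\quad= -\frac{1}{N_f}\sum_{j=1}^{N_f}\lambda_{[j]}\left[\frac{f_{[j],n}}{q^{(k)}_n} - \frac{1-f_{[j],n}}{1-q^{(k)}_n}\right].
\end{aligned} \tag{35}
$$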

Setting the first derivative of the regularized loss function given in (35) to zero, the regularized loss function can be minimized, yielding the updated support probability distribution given by

$$q^{(k+1)}_n = \frac{\sum_{j=1}^{N_f}\lambda_{[j]} f_{[j],n}}{\sum_{j=1}^{N_f}\lambda_{[j]}}, \quad n = 0, 1, \cdots, N-1. \tag{36}$$
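For completeness, the algebra between (35) and (36) can be spelled out as follows. Setting the last expression of (35) to zero and multiplying both sides by $q^{(k)}_n\left(1-q^{(k)}_n\right)$, which is valid for $0 < q^{(k)}_n < 1$, gives

$$
\sum_{j=1}^{N_f}\lambda_{[j]}\left[f_{[j],n}\left(1-q^{(k)}_n\right) - \left(1-f_{[j],n}\right)q^{(k)}_n\right]
= \sum_{j=1}^{N_f}\lambda_{[j]}\left(f_{[j],n} - q^{(k)}_n\right) = 0,
$$

so that

$$
q^{(k)}_n\sum_{j=1}^{N_f}\lambda_{[j]} = \sum_{j=1}^{N_f}\lambda_{[j]} f_{[j],n},
$$

and the minimizer is the weighted average in (36). The same steps with all weights equal to one lead from (28) to (29).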

Comparing (36) with (29), it can be observed that, for the SCEM algorithm, all the entries $f_{[j],n}$ contribute equally to the update of $q^{(k+1)}_n$ in (29), so the differences in accuracy among the favorable supports are not taken into consideration. In the enhanced RSCEM algorithm, on the other hand, a more accurate support imposes a larger weighting parameter $\lambda_{[j]}$ on $f_{[j],n}$ and thus has a larger contribution to the update of $q^{(k+1)}_n$, as implied by (36). In fact, (29) can be regarded as a special case of (36) in which all the weights are equal, e.g., $\lambda_{[j]} = 1$ for all $j$. Consequently, the enhanced RSCEM algorithm can be expected to learn the ground-truth support distribution more accurately and more efficiently than SCEM, which is also validated by the simulation results in the next section.
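The weighted update in (32), (33), and (36) can be sketched in a few lines. The function name `rscem_update` and the toy values are illustrative, and the sketch assumes strictly positive residue error norms so that the weights in (32) are well defined.

```python
import numpy as np

def rscem_update(supports, residues, n):
    """Weighted support-distribution update of (36) with the weights of (32)-(33)."""
    r = np.asarray(residues, dtype=float)  # assumed strictly positive
    lam = r.mean() / r                     # (32): smaller residue -> larger weight
    F = np.zeros((len(supports), n))       # indicator vectors of (24), one per row
    for j, support in enumerate(supports):
        F[j, support] = 1.0
    return lam @ F / lam.sum()             # (36): weighted average of the indicators

supports = [[0, 2], [2, 3], [2]]
q_weighted = rscem_update(supports, residues=[1.0, 2.0, 4.0], n=4)
# q_weighted is [4/7, 0, 1, 2/7]: entry 0 comes from the most accurate support.
q_equal = rscem_update(supports, residues=[3.0, 3.0, 3.0], n=4)
# q_equal is [1/3, 0, 1, 1/3], i.e. the unweighted SCEM update of (29).
```

Compared with the unweighted result, entry 0 rises from 1/3 to 4/7 because it belongs to the favorable support with the smallest residue error norm, while entry 3, which belongs to a less accurate support, drops from 1/3 to 2/7.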

### IV-D Extension to MIMO: Simultaneous Multi-Antenna NBI Recovery Algorithm

The proposed method can be extended to MIMO systems to further improve the estimation accuracy by exploiting the spatial correlation of the NBI. Due to the spatial correlation, the received NBI signals at different receive antennas in the MIMO system share the same support, i.e., the locations of nonzero entries are the same, although their amplitudes might be different [34]. One can make use of the spatial correlation in the iterations of the proposed sparse machine learning algorithm to simultaneously recover the NBI signals contaminating multiple receive antennas.