I Introduction
Analog-to-digital conversion plays an important role in digital signal processing systems. While physical signals take values in continuous time over continuous sets, they must be represented using a finite number of bits in order to be processed in digital hardware [eldar2015sampling]. This operation is carried out using analog-to-digital converters (ADCs), which typically perform uniform sampling followed by uniform quantization of the discrete-time samples. When using high-resolution ADCs, this conversion induces minimal distortion, allowing the signal to be processed effectively using methods derived under the assumption of access to the continuous-amplitude samples. However, the cost, power consumption, and memory requirements of ADCs grow with the sampling rate and the number of bits assigned to each sample [walden1999analog]. Consequently, recent years have witnessed an increasing interest in digital signal processing systems operating with low-resolution ADCs. In particular, in multiple-input multiple-output (MIMO) communication receivers, which are required to simultaneously capture multiple analog signals with high bandwidth, there is a growing need to operate reliably with low-resolution ADCs [andrews2014will]. The coarsest form of quantization reduces the signal to a single bit per sample, which may be accomplished by comparing the sample to some reference level and recording whether the signal is above or below it. One-bit acquisition allows the use of high sampling rates at low cost and low energy consumption. Owing to these favorable properties, one-bit ADCs have been employed in a wide array of applications, including wireless communications [jeon2018one, rao2020massive, 8683876], radar signal processing [ameri2019one, jin2020one, xi2020bilimo], and sparse signal recovery [xiao2019one, khobahi2019model].
The nonlinear nature of low-resolution quantization makes symbol detection a challenging task. This situation is significantly exacerbated in practical one-bit communication and sensing, where the channel must be estimated in conjunction with symbol detection. A coherent symbol detection task is concerned with recovering the underlying signal of interest from the one-bit measurements assuming the channel state information (CSI) is known at the receiver. On the other hand, the more difficult task of blind symbol detection, which is the focus here, recovers the underlying transmitted symbols when CSI is not available.

Two main strategies have been proposed in the literature to facilitate operation with low-resolution ADCs. The first designs the overall acquisition system in light of the task for which the signals are acquired. For instance, MIMO communication receivers acquire their channel output in order to extract some underlying information, e.g., for symbol detection. As the analog signals need not be recovered from their digital representation, one can design the acquisition system to reliably infer the desired information while operating with low-resolution ADCs [shlezinger2018hardware, salamatian2019task, shlezinger2019deep, shlezinger2020learning, neuhaus2020task]. Such task-based quantization systems rely on pre-quantization processing, which requires dedicated hardware in the form of hybrid receiver architectures [gong2019rf, ioushua2019family] or unique antenna structures [wang2019dynamic, shlezinger2020dynamic], which are configured along with the quantization rule.
An alternative approach to task-based quantization, which does not require additional configurable analog hardware and is the focus of the current work, is to recover the desired information from the distorted, coarsely discretized representation of the signal in the digital domain. The main benefit of schemes carried out only in the digital domain is their simplicity of implementation, as they require no modifications to the quantization system and circumvent the need for additional pre-quantization analog processing hardware. In the context of MIMO systems, various methods have been proposed in the literature for channel estimation and signal decoding from quantized outputs, including model-based signal processing methods, as surveyed in [liu2019low], as well as model-agnostic systems based on machine learning and data-driven techniques [zhang2020deep, klautau2018detection, balevi2019one, balevi2019two, balevi2020autoencoder, kim2019machine, nguyen2020svm, nguyen2020linear].

Most existing model-based detection algorithms require coherent operation, i.e., they rely on prior knowledge of the CSI and other system parameters. Among these works are the near-maximum-likelihood (nML) detector proposed for one-bit MIMO receivers in [choi2016near], the linear receivers studied in [risi2014massive, jacobsson2015one], and the message-passing-based detectors considered in [ivrlac2007mimo, mo2014channel]. The fact that such approaches require accurate CSI has led to several works specifically dedicated to CSI estimation in the presence of low-resolution ADCs. These include [choi2016near, mezghani2018blind], which studied maximum-likelihood estimation for recovering the CSI in the presence of one-bit data; the works in [li2017channel, jacobsson2017throughput], which developed linear estimators for CSI estimation in one-bit MIMO systems; and [mo2017channel], which focuses on sparse channels and utilizes one-bit sparse recovery methods for CSI estimation. However, all these strategies inevitably induce non-negligible CSI estimation error, which may notably degrade the accuracy of signal detection based on the estimated CSI.
Over the past several years, data-driven methods, and specifically deep neural networks (DNNs), have attracted unprecedented attention from research communities across the board. The advent of low-cost, specialized, powerful computing resources and the continually increasing amount of data generated by humans and machines, along with new optimization and learning methods, have paved the way for DNNs and machine-learning-based models to prove their effectiveness in many engineering areas, such as computer vision and natural language processing [lecun2015deep]. DNNs learn their mapping from data in a model-agnostic manner, and can thus facilitate non-coherent (blind) detection.

Previously proposed DNN-aided symbol detection techniques for communication receivers can be divided based on their receiver architectures, namely, those that utilize conventional machine learning architectures for detection, including [farsad2017detection, corlay2018multilevel, liao2019deep], and schemes combining DNNs with model-based detection methods, such as the blind DNN-aided receivers proposed in [shlezinger2019viterbinet, shlezinger2020deepsic, shlezinger2020data, he2019model] and the coherent detectors of [samuel2019learning, takabe2019trainable]; see also the surveys in [balatsoukas2019deep, farsad2020data]. In the context of one-bit DNN-aided receivers, previous works to date focus mainly on the first approach, i.e., applying conventional DNNs for the overall detection task. Among these works are [zhang2020deep, balevi2019two] and [klautau2018detection], which applied generic DNNs for channel estimation in one-bit MIMO receivers. The application of conventional architectures for symbol detection was studied in [balevi2019one, kim2019machine] and [nguyen2020svm], while [balevi2020autoencoder] showed that autoencoders can facilitate the design of error correction codes for communications with one-bit receivers. Recently, the authors in [nguyen2020linear] considered the problem of symbol detection for a one-bit massive MIMO system and proposed a linear estimator module based on the Bussgang decomposition technique combined with a model-driven neural network.

The vast majority of the aforementioned works on learning-aided one-bit receivers rely on conventional DNN architectures. Such DNNs require a massive amount of training samples and must be trained on data from the same (or a similar) statistical model as the one under which they are required to operate, imposing a major challenge in dynamic wireless communications. In fact, the use of generic black-box DNNs is mostly justified in applications where a satisfactory description of the underlying dynamics of the system is not achievable, as is the case in computer vision and natural language processing. As surveyed above, this is not the case in the field of one-bit MIMO systems. This gives rise to the need to bridge the gap between data-driven and model-based approaches in this context, and to move towards specialized deep learning models for signal processing in one-bit MIMO systems, which is the aim of this work.
In this paper, we develop a hybrid model-based and data-driven system which learns to carry out blind symbol detection from one-bit measurements. The proposed architecture, referred to as LoRDNet (Low Resolution Detection Network), combines the well-established model-based maximum-likelihood estimator (MLE) with machine learning tools through the deep unfolding method [hershey2014deep, monga2019algorithm, khobahi2020unfolded, agarwal2020deep, khobahi2020deep, naimipour2020upr] for designing DNNs based on model-based optimization algorithms. To derive LoRDNet, we first formulate the MLE for the task of symbol detection from one-bit samples. Next, we resort to first-order gradient-based methods for the MLE computation, and unfold the iterations onto layers of a DNN. The resulting LoRDNet learns to carry out MLE-approaching symbol detection without requiring CSI.
Applying conventional gradient-based optimization methods requires knowledge of the underlying system parameters, i.e., full CSI. Hence, a typical approach to unfolding such a symbol detection algorithm would be to estimate the unknown parameters from training data and substitute them into the unfolded network [he2019model]. We show that instead of estimating the unknown system parameters, it is preferable to learn an alternative channel which allows the receiver to detect the symbols reliably. Surprisingly, we demonstrate that the alternative channel learned by LoRDNet is in general not the true channel. Based on this observation, we propose a two-stage training procedure, comprised of learning the proper optimization process to unfold, followed by an end-to-end training of the unfolded DNN.
The proposed LoRDNet thus has the following properties:

Compared to the vanilla MLE symbol detector, our model does not need to estimate the channel separately.

Owing to its hybrid nature, it has a low computational cost in operation and is highly scalable, facilitating much faster inference compared to its black-box data-driven and model-based counterparts.

The proposed deep architecture is interpretable and has far fewer parameters than existing black-box deep learning solutions. This follows from the incorporation of domain knowledge in the design of the network architecture (i.e., being model-based), allowing LoRDNet to train with far fewer labeled samples than existing data-driven one-bit receivers.
We verify the above characteristics of LoRDNet in an experimental study, where we show that the proposed LoRDNet architecture can be trained with far fewer samples than its data-driven counterparts, and demonstrate substantially superior performance compared to existing model-based and data-driven algorithms for symbol detection in massive MIMO channels with one-bit ADCs.
The rest of the paper is organized as follows. In Section II, we present the considered system model and the corresponding MLE formulation. In Section III, we derive LoRDNet by unfolding the first-order gradient iterations associated with the MLE computation, and present its two-stage training procedure. Section IV provides a detailed numerical analysis of LoRDNet applied to MIMO communications. Finally, Section V concludes the paper.
Throughout the paper, we use the following notation. Bold lowercase and bold uppercase letters denote vectors and matrices, respectively. We use
, , and , and to denote the transpose operator, the diagonal matrix formed by the entries of the vector argument, the sign operator, and the natural logarithm, respectively. The symbol represents the Hadamard product, while and are the all-one and all-zero vectors/matrices. The th entry of the vector is , and is the norm of ; is the -ary Cartesian product of a set , and denotes the cone of symmetric positive definite matrices.

II System Model and Preliminaries
In this section, we discuss the considered system model. We focus on one-bit data acquisition and blind signal recovery. We then formulate the MLE for this problem, which is used in designing the LoRDNet architecture in Section III.
II-A Problem Formulation
We consider a low-resolution data-acquisition system which utilizes one-bit ADCs. Letting denote the received signal, the discrete output of the ADCs can be written as , where denotes the vector of quantization thresholds, and is the sign function, i.e., if and otherwise. The received vector is statistically related to the unknown vector of interest according to the following linear relationship:
(1) 
where denotes additive Gaussian noise with a covariance matrix of the form , with diagonal entries representing the noise variance in each respective dimension, and is the channel matrix. We assume that the elements of the unknown vector are chosen independently from a finite alphabet . This setup represents low-resolution receivers in uplink multiuser MIMO systems, where is the symbols transmitted by the users, and is the corresponding channel output, as illustrated in Fig. 1.

The overall dynamics of the system are thus compactly expressed as:
(2) 
In the sequel, we refer to as the system parameters. Note that the above system model can be modified using conventional transformations to accommodate a complex-valued system model.
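As a minimal numerical sketch of the measurement model described above, the following numpy snippet simulates a one-bit ADC output for a linear Gaussian channel. Since the inline symbols are not reproduced here, all names (`H`, `x`, `tau`, `y`) and dimensions are illustrative assumptions following the standard one-bit MIMO notation, not the paper's exact definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_users = 16, 4                          # illustrative dimensions

H = rng.standard_normal((m, n_users))       # channel matrix (assumed real-valued)
x = rng.choice([-1.0, 1.0], size=n_users)   # BPSK symbols from a finite alphabet
tau = np.zeros(m)                           # quantization thresholds
noise = 0.1 * rng.standard_normal(m)        # additive Gaussian noise

r = H @ x + noise                           # unquantized channel output, cf. (1)
y = np.sign(r - tau)                        # one-bit ADC output, cf. (2)
```

Each receive dimension retains only the sign of the thresholded analog observation, which is what makes recovery of `x` nonlinear.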
Our main goal is to perform the task of symbol detection, i.e., to recover , from the collected one-bit measurements . We focus on blind (non-coherent) recovery, namely, the system parameters , i.e., the channel matrix and the covariance of the noise, are not available to the receiver. Nonetheless, the receiver has access to a limited set of labeled samples , representing, e.g., pilot transmissions. The quantization thresholds of the ADCs, i.e., the vector , are assumed to be fixed and known. While we do not consider the selection of in the following, we discuss in the sequel how its optimization can be incorporated into the detection method.
II-B Maximum Likelihood Recovery
To understand the challenges associated with blind low-resolution detection, we next discuss the MLE for recovering from . In particular, the intuitive model-based approach is to utilize the labeled data to estimate the system parameters , and then use this estimate to compute the coherent (non-blind) MLE. Therefore, to highlight the limitations of this strategy, we assume here that the system parameters are fully known at the receiver. Let
(3) 
represent the log-likelihood objective for a given vector of one-bit observations , where is derived in [8683876]. The coherent MLE is then given by
(4) 
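A brute-force sketch of the discrete search behind the coherent MLE can illustrate its combinatorial cost. Here a simplified sign-agreement score stands in for the Gaussian log-likelihood (which this extraction does not reproduce), and all dimensions are hypothetical; the point is only that the candidate set grows as the alphabet size raised to the number of users.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
m, n_users = 16, 4
H = rng.standard_normal((m, n_users))
x_true = rng.choice([-1.0, 1.0], size=n_users)
y = np.sign(H @ x_true + 0.05 * rng.standard_normal(m))

# Surrogate fitness: number of one-bit measurements whose sign the
# candidate explains. (The actual MLE scores the Gaussian
# log-likelihood; this placeholder only exposes the search cost.)
def score(x):
    return np.sum(np.sign(H @ x) == y)

# Exhaustive search enumerates the full lattice: |alphabet|^n points.
candidates = [np.array(c) for c in itertools.product([-1.0, 1.0], repeat=n_users)]
x_hat = max(candidates, key=score)
print(len(candidates))  # 2**4 = 16 candidates; doubles with every additional user
```

Even for a modest number of users this enumeration becomes infeasible, which motivates the continuous relaxation discussed next.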
Although the MLE in (4) has full, accurate knowledge of the parameters , its computation is still challenging. The main difficulty emanates from solving the underlying optimization problem in the discrete domain, implying that the MLE requires an exhaustive search over the discrete domain , whose computational complexity grows exponentially with . A common strategy to tackle the discrete optimization problem in (4) is to relax the search space to be continuous. This results in the following relaxed unconstrained MLE rule:
(5) 
The optimization problem in (5) is convex due to the log-concavity of , and can thus be solved using first-order gradient optimization. In particular, the gradient of the negative log-likelihood function with respect to the unknown vector can be compactly expressed as [8683876]:
(6) 
where is a nonlinear function defined as , in which the operator denotes elementwise division, is the derivative of , given by the negative probability density function of a standard normal distribution, and is the semi-whitened version of the one-bit matrix .

As obtained via (5) is not guaranteed to take values in , the final estimate of the symbols is obtained by applying a projection operator to . This operator maps the continuous input vector onto its closest lattice point in the discrete set , i.e.,
(7) 
Tackling a discrete program via continuous relaxation, as done in (5), suffers from an inherent drawback. In particular, one can only expect to obtain an accurate approximation of the true MLE if the real-valued vector is very close to the discrete-valued MLE . In such a case, the MLE is obtained by projecting onto the lattice points in . However, this is not the case in many scenarios, specifically when the noise variance in each respective dimension is high. In other words, it is not necessarily the case that the minimizer of the objective function on the continuous domain (5) is close to the MLE, which takes values in the discrete set . Note that utilizing the true system parameters only leads to optimal estimates when considering the original discrete problem (4). In fact, one can no longer argue that the true system parameters are necessarily the optimal choice for in the relaxed MLE. This insight, obtained from the computation of the coherent MLE, is used in our derivation of the blind unfolded detector in the following section.
III Proposed Methodology
In this section, we present the proposed Low Resolution Detection Network, abbreviated as LoRDNet. We begin with a high-level description of LoRDNet in Subsection III-A. Then, we present the unfolded architecture in Subsection III-B and discuss the training procedure in Subsection III-C. Finally, we provide a discussion in Subsection III-D.
III-A High-Level Description
As noted in the previous section, the intuitive approach to blind symbol detection is to utilize the labeled data to estimate the true system model , and then recover the symbol vector from using the MLE. Nonetheless, the coherent MLE (4) is computationally prohibitive, while its relaxed version in (5) may be inaccurate. Alternatively, one can pursue a purely data-driven strategy, using the data to train a black-box, highly parameterized DNN for detection, which requires a massive amount of labeled samples. Consequently, to facilitate accurate detection at affordable complexity and with limited data, we design LoRDNet via model-based deep learning [shlezinger2020model], combining the learning of a competitive objective with deep unfolding of the relaxed MLE.
Learning a competitive objective refers to the setting of the unknown system parameters . However, the goal here is not to estimate the true system parameters, but rather the ones for which the solution to the relaxed MLE coincides with the true value of . This system identification problem can be written as
(8) 
where is the relaxed MLE (5). The optimization problem (8) yields a surrogate objective function , or equivalently, a set of system parameters , referred to as a competitive objective to the true . An illustration of such a competitive objective obtained for the case of is depicted in Fig. 2.
The main difficulty in solving (8) stems from the fact that is not differentiable with respect to the system parameters . We overcome this obstacle by applying a differentiable approximation of , or equivalently, an algorithm that approximates the operator specific to our problem. Since can be computed by first-order gradient methods, we design a deep unfolded network [monga2019algorithm] to compute the relaxed MLE in a manner which is differentiable with respect to . The use of deep unfolding not only allows learning a competitive objective via (8), but also results in accurate inference with a reduced number of iterations compared to model-based first-order gradient optimization. Furthermore, the unfolded network utilizes a relatively small number of trainable parameters, thus enabling learning from small amounts of labeled samples.
III-B LoRDNet Architecture
We now present the architecture of LoRDNet, which maps the low-resolution observations into an estimate of . For given system parameters , whose learning based on the competitive objective rationale described above is detailed in Subsection III-C, LoRDNet is obtained by unfolding the iterations of a first-order optimization of the relaxed MLE (5). Our derivation thus begins by formulating the first-order method that iteratively solves (5) for a given .
Let be a parametrized operator defined as , where is a positive-definite weight matrix and denotes the set of parameters of the operator . Such a linear operator can be used to model a first-order optimization solver by considering a composition of mappings of the form:
(9)  
where is an initial point, and is the set of parameters of the overall mapping . The mapping (9) is differentiable with respect to the system parameters and its local weights . For a fixed number of iterations , the resulting function is thus differentiable with respect to the set of parameters and its input (unlike the original operator). Therefore, it can be used as a differentiable approximation of , which allows training (optimization) over the set of its parameters using gradient-based training algorithms and the backpropagation technique.
Following the deep unfolding framework [monga2019algorithm], the function can be implemented as a -layer feedforward neural network, where the initial point and the one-bit samples constitute the input to the network, and whose trainable parameters are given by . By (6), the th layer computes:
(10)  
(11) 
where the overall dynamics of LoRDNet are given by:
(12) 
Each vector in (10) represents the input to the th layer (or, equivalently, the output of the previous iteration), with being the input of the entire network (representing the initial point of the optimization task). Upon the arrival of a new one-bit measurement , the recovered symbols are obtained by feeding forward through the layers of LoRDNet. In order to obtain discrete symbols, the output of LoRDNet is projected onto the feasible discrete set , viz.
(13) 
An illustration of LoRDNet is depicted in Fig. 3.
We note that one can also propose an alternative architecture, derived by applying the projection operator at the output of each layer, i.e., by defining . Such a setting corresponds to the unfolding of a projected gradient descent method. However, our numerical investigations have consistently shown that such an architecture suffers from the vanishing gradient problem during training and a significant degradation in performance. As a result, we implement LoRDNet by applying the projection operator once at the output of the network, and only during inference, as discussed above.
In principle, one can fix for some , for which (12) represents steps of gradient descent with step size . In the unfolded implementation, the weights are tuned from data, allowing detection with fewer iterations, i.e., layers. As a result, once LoRDNet is trained, i.e., its weight matrices and the unknown system parameters are learned from data, it is capable of carrying out fast inference, owing to its hybrid model-based/data-driven structure. Furthermore, the number of iterations is optimized to facilitate fast inference in the training procedure, as detailed in the following.
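The forward pass of the unfolded architecture can be sketched as a fixed-depth loop with one preconditioning matrix per layer. This assumes the probit-style gradient used earlier in this rewrite; here the per-layer matrices are fixed to a scaled identity (the basic optimization policy), whereas training would replace them with learned PSD matrices. All names and dimensions are illustrative.

```python
import numpy as np
from math import erf

Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / np.sqrt(2.0))))

def pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def grad_nll(x, H, y):
    # gradient of the one-bit negative log-likelihood, cf. (6)
    z = y * (H @ x)
    return -H.T @ (y * pdf(z) / (Phi(z) + 1e-12))

rng = np.random.default_rng(3)
m, n_users, K = 32, 4, 5
H = rng.standard_normal((m, n_users))
x_true = rng.choice([-1.0, 1.0], size=n_users)
y = np.sign(H @ x_true + 0.1 * rng.standard_normal(m))

# One preconditioning matrix per layer; delta * I mimics plain
# gradient descent, cf. the discussion around (12).
delta = 0.01
Ws = [delta * np.eye(n_users) for _ in range(K)]

x = np.zeros(n_users)          # network input / initial point
for W in Ws:                   # k-th layer: x <- x - W @ grad(x)
    x = x - W @ grad_nll(x, H, y)
x_hat = np.sign(x)             # projection applied once, at the output
```

With learned, layer-specific `Ws`, the same K-step loop can reach a good estimate in far fewer layers than plain gradient descent needs iterations, which is the speedup deep unfolding targets.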
III-C Training Procedure
Here, we present the training procedure for LoRDNet. In particular, our main goal is to infer the unknown system parameters based on the rationale detailed in Subsection III-A, i.e., to obtain a competitive objective. The learned competitive objective is then used to tune the weights of the unfolded network . Accordingly, we present a two-stage training procedure for LoRDNet (12). Once the training of LoRDNet is completed, it carries out symbol detection from one-bit observations without requiring knowledge of the system parameters .
III-C1 Training Stage 1: Learning a Competitive Objective
The first stage corresponds to learning the unknown system parameters . However, as formulated in (8), we do not seek to estimate the true values of the channel matrix and noise covariance , but rather to learn the surrogate values which will facilitate accurate detection using the relaxed MLE formulation. We do this by taking advantage of two properties of LoRDNet: the first is the differentiability of the unfolded architecture with respect to , which facilitates gradient-based optimization; the second is the fact that, for , LoRDNet essentially implements steps of gradient descent with step size over the convex objective (5), and is thus expected to approach its optimum.
Based on the above properties, we fix a relatively large number of layers/iterations for this training stage, and fix the weights to . Under this setting, the output of LoRDNet represents an approximation of the relaxed MLE for a given parameter , denoted , i.e., we have that
(14) 
We refer to the setting used in this stage as the basic optimization policy. Note that as the number of layers grows large, the above approximation becomes more accurate. Hence, by substituting (14) into (8) and replacing with the corresponding outputs of LoRDNet, we formulate the loss measure of the first training stage of LoRDNet as:
(15) 
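A sketch of this Stage-1 loss may help: the unfolded network is run under the basic optimization policy for a candidate channel parameter, and its output is compared against known pilot symbols. The probit-style likelihood, the restriction of the learned parameter to the channel matrix, and all names and dimensions are assumptions of this sketch; in practice the loss would be minimized over the parameter via automatic differentiation through the unfolded layers.

```python
import numpy as np
from math import erf

Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / np.sqrt(2.0))))

def pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def grad_nll(x, H, y):
    z = y * (H @ x)
    return -H.T @ (y * pdf(z) / (Phi(z) + 1e-12))

def relaxed_mle(H_theta, y, K=150, delta=0.01):
    """Approximate minimizer of (5): K basic-policy layers, cf. (14)."""
    x = np.zeros(H_theta.shape[1])
    for _ in range(K):
        x = x - delta * grad_nll(x, H_theta, y)
    return x

rng = np.random.default_rng(4)
m, n_users = 32, 4
H_true = rng.standard_normal((m, n_users))
pilots = []                            # labeled pairs (y, x) from pilot transmissions
for _ in range(8):
    x_t = rng.choice([-1.0, 1.0], size=n_users)
    y_t = np.sign(H_true @ x_t + 0.1 * rng.standard_normal(m))
    pilots.append((y_t, x_t))

def stage1_loss(H_theta):
    """Empirical Stage-1 loss, cf. (15): squared error between the
    unfolded output and the known pilot symbols."""
    return sum(np.sum((relaxed_mle(H_theta, y_t) - x_t) ** 2)
               for y_t, x_t in pilots)

loss = stage1_loss(H_true)             # training would descend on the parameter
```

Note that the minimizer of this loss need not be the true channel, which is precisely the competitive-objective observation motivating Stage 1.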
Owing to the differentiable nature of with respect to , we recover based on (15) using conventional gradient-based training, e.g., stochastic gradient descent with backpropagation, as detailed in the description of our numerical evaluations in Section IV.

III-C2 Training Stage 2: Learning the Unfolded Weights
Having learned the unknown system parameters in Stage 1, we turn to tuning the parameters of LoRDNet, i.e., the set . We note that in Stage 1, the rationale was to use the basic optimization policy with a large number of layers , exploiting the insight that, under this setting, LoRDNet effectively implements conventional gradient descent. However, once Stage 1 is concluded and is learned, it is preferable to reduce the number of layers compared to that used in Stage 1, thus exploiting the ability of unfolded networks to carry out faster inference than their model-based iterative counterparts by learning the weights applied in each iteration [gregor2010learning, monga2019algorithm]. Consequently, the first step in this stage is to set the number of layers to a value which can potentially be smaller than that used in the first training stage, and then optimize the weights according to the following criterion:
(16) 
Generally speaking, in order for a first-order optimizer (LoRDNet in this case) to provide a descent direction at each iteration (layer), the preconditioning matrices must be positive semidefinite, so that no iteration reverses the gradient direction. To incorporate this requirement into LoRDNet training, we reparameterize the preconditioning matrices by writing and performing the training over the matrices . The resulting two-stage training algorithm is summarized as Algorithm 1.
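The reparameterization trick can be verified in a few lines: writing a preconditioner as a product of an unconstrained matrix with its own transpose makes it symmetric positive semidefinite by construction, so a preconditioned step can never reverse the gradient direction. Names and dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n_users = 4

B = rng.standard_normal((n_users, n_users))  # unconstrained trainable matrix
W = B @ B.T                                  # W = B B^T: symmetric PSD by construction

eigvals = np.linalg.eigvalsh(W)              # all eigenvalues are >= 0 (up to roundoff)
g = rng.standard_normal(n_users)             # an arbitrary gradient direction
quad = g @ W @ g                             # g^T W g = ||B^T g||^2 >= 0:
                                             # the step -W g never opposes g
```

Training can therefore update `B` freely with standard gradient steps, while the effective preconditioner `W` remains feasible at every iteration.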
When the network is properly trained, LoRDNet is expected to carry out learned and accelerated first-order optimization, tuned to operate even in channel conditions for which such an approach does not yield the MLE for the true channel.
III-D Discussion
LoRDNet is a data-driven acquisition system based on unfolding first-order gradient optimization methods, designed for low-resolution MIMO receivers operating without analog processing. Its model-awareness enables the receiver to learn to accurately infer from smaller training sets than conventional DNN architectures applied to such setups, as suggested, e.g., in [balevi2019one], giving rise to the possibility of tracking block-fading channel conditions via online training, as in [shlezinger2019viterbinet]. Furthermore, LoRDNet differs from previously proposed deep unfolded MIMO receivers, as surveyed in [balatsoukas2019deep], in two key aspects: First, LoRDNet is particularly designed for one-bit observations, being derived from the iterative optimization formulation which arises in such setups. Second, previous unfolded MIMO receivers either assumed prior knowledge of the channel parameters, as in [samuel2019learning], or, alternatively, utilized external modules to directly estimate the CSI, as in [he2019model]. LoRDNet exploits the fact that, for its unfolded relaxed convex optimization algorithm to yield the desired MLE, alternative channel parameters, which differ from the true ones, should be estimated. Consequently, the training procedure of LoRDNet does not aim to recover the true CSI, but rather the CSI which yields a competitive objective facilitating symbol detection, thus accounting for the overall system task.
The proposed training procedure detailed in Algorithm 1 carries out each training stage once, in a sequential manner. This strategy can be extended to optimizing the hyperparameters and the weights in an alternating fashion, i.e., repeating the stages multiple times, while using the values learned in Stage 2 in the Stage 1 that follows. Alternatively, the hyperparameters and the weights can be learned jointly in an end-to-end manner, by optimizing (16) with respect to both and simultaneously. The main requirement for carrying out these training strategies, compared to that detailed in Subsection III-C, is that the same number of layers must be used when learning both and , whereas when these stages are carried out once sequentially, it is preferable to use a large value at Stage 1 and a smaller value, which dictates the number of learned weights, at Stage 2. Furthermore, our numerical evaluations show that training once in a two-stage fashion via Algorithm 1 yields similar, and sometimes even improved, performance compared to learning both and simultaneously in a one-stage manner, as well as compared to alternating between the two stages, as demonstrated in Section IV.

A possible extension of the training procedure is to account for ADCs with more than one bit, as well as to allow LoRDNet to optimize the quantization thresholds in light of the overall symbol recovery task. While accounting for multi-level ADCs is a rather simple extension achieved by reformulating the objective function (3), optimizing the quantization thresholds requires modifying the overall training strategy. The challenge here is that modifying results in different one-bit measurements . In a communication setup, in which periodic pilots are transmitted, one can envision gradual optimization of between consecutive pilot sequences, using their corresponding one-bit observations to further optimize LoRDNet. The study of LoRDNet with multi-level ADCs and optimized thresholds is left for future work.
IV Numerical Study
In this section, we numerically evaluate LoRDNet (the source code is available at: https://github.com/skhobahi/LoRDNet), and compare its performance with state-of-the-art model-based and data-driven methodologies. As a motivating application, we focus on evaluating LoRDNet for the blind symbol detection task in one-bit MIMO wireless communications. In the following, we first detail the considered one-bit MIMO simulation settings in Subsection IV-A, after which we evaluate the receiver performance, compare LoRDNet to alternative unfolded architectures, and numerically investigate its training procedure in Subsections IV-B, IV-C, and IV-D, respectively.
IV-A Simulation Setting
We consider an uplink one-bit multiuser MIMO scenario as in (2). We focus on a single cell in which a base station (BS) equipped with antenna elements serves single-antenna users. Specifically, we consider two cases of and , i.e., a and a MIMO channel setup. The transmitted symbols of the users, represented by the unknown vector , are randomized in an independent and identically distributed (i.i.d.) fashion from a BPSK constellation set . The projection mapping is thus , where the function is applied elementwise to the vector argument. In the sequel, we assume that while the channel matrix , representing the CSI, is not available at the BS, the noise statistics are known and fixed to . Accordingly, our goal is to utilize LoRDNet to recover the transmitted symbols from the one-bit measurements. Note that the proposed methodology can carry out the task of symbol detection even when the noise statistics are unknown.
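For the BPSK constellation used here, the projection mapping reduces to the elementwise sign, as the following sketch illustrates (the input values are hypothetical):

```python
import numpy as np

# Continuous network output projected onto the BPSK lattice {-1, +1}
# by taking the elementwise sign (nearest lattice point per entry).
x_cont = np.array([0.8, -1.3, 0.1, -0.2])
x_proj = np.sign(x_cont)   # -> [ 1., -1.,  1., -1.]
```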
Channel Models: We evaluate LoRDNet under two channel models: (i) i.i.d. Rayleigh fading channels, where ; and (ii) the COST2100 massive MIMO channel [flordelis2019massive]. The COST2100 channel model is a realistic geometry-based stochastic model which accounts for prominent characteristics of massive MIMO channels, and is considered an established benchmark for evaluating MIMO communication systems. We generate the channel matrices for the COST2100 model for a narrowband indoor scenario with closely-spaced users at the GHz band, where the BS is equipped with a uniform linear array (ULA) of omnidirectional receive antenna elements. The one-bit ADC operation uses zero thresholds, i.e.
. We define the signal-to-noise ratio (SNR) as in (17).
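Since the exact form of (17) is not recoverable from the extracted text, the following helper computes one common SNR convention, the average received signal power over the total noise power; treat this definition as an assumed stand-in rather than the paper's formula.

```python
import numpy as np

def empirical_snr_db(H, symbol_vectors, sigma2):
    """A common SNR convention: average received signal power over total
    noise power, 10*log10( E[||H s||^2] / (N_r * sigma2) ). The exact
    definition in (17) is not recoverable from the text; this is an
    assumed stand-in for illustration."""
    N_r = H.shape[0]
    sig_pow = np.mean([np.sum((H @ s) ** 2) for s in symbol_vectors])
    return 10.0 * np.log10(sig_pow / (N_r * sigma2))
```

For example, an identity channel with unit-power BPSK symbols and unit noise variance yields 0 dB under this convention.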
Benchmark Algorithms: As LoRDNet combines both model-based and data-driven inference, we compare its performance with state-of-the-art model-based and data-driven methodologies in a one-bit MIMO receiver scenario. In particular, we use the following benchmark detection algorithms:

The model-based nML detector proposed in [choi2016near]. The nML algorithm is based on a convex relaxation of the conventional ML estimator, and requires exact knowledge of the channel parameters . We set the number of iterations of the nML algorithm to , and the step-size is chosen via a grid search to further improve the performance of nML, while the remaining parameters are those reported in [choi2016near].

The data-driven Deep Soft Interference Cancellation (DeepSIC) methodology proposed in [shlezinger2020deepsic], with five learned interference cancellation iterations. DeepSIC is channel-model-agnostic and can be utilized for symbol detection in nonlinear settings such as low-resolution quantization setups. Unlike LoRDNet, which is designed particularly for observations of the form (2) where is unknown, DeepSIC has no prior knowledge of either the channel model or its parameters.
LoRDNet Setting: The LoRDNet receiver is implemented with layers. Recall that the first training stage of LoRDNet is concerned with finding a competitive objective by training the network over the unknown set of channel parameters . Unless otherwise specified, we focus on the case where only is unknown, and the correlation matrix of the noise is available.
During the first training stage, we set , and recover based on the objective (15) using the Adam stochastic optimizer [kingma2014adam] with a constant learning rate of . Next, we train LoRDNet during the second stage according to the objective function defined in (16) over the set of trainable parameters , using the Adam optimizer with a learning rate of and a mini-batch of size . We consider the learning of diagonal preconditioning matrices (unfolded weights) during the second training stage. The network is trained for epochs during the first training stage and epochs during the second training stage, with the same value of used in both stages.
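The two-stage logic can be sketched on a toy smooth surrogate: stage 1 fits the unknown model parameters from pilots, and stage 2 freezes them and tunes per-layer step sizes of a fixed-depth unfolded solver. This is a minimal stand-in for objectives (15) and (16), not the actual LoRDNet losses; a least-squares fit and plain finite differences with backtracking replace Adam for brevity, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ground truth: 4 receive antennas, 2 users, BPSK pilots (illustrative sizes).
H = rng.standard_normal((4, 2))
S = rng.choice([-1.0, 1.0], size=(2, 50))            # pilot symbols
Y = H @ S + 0.05 * rng.standard_normal((4, 50))      # unquantized observations

# Stage 1 (stand-in for objective (15)): fit a surrogate channel from the pilots.
H_hat = np.linalg.lstsq(S.T, Y.T, rcond=None)[0].T

def unfolded_output(H_est, P, y, K=5):
    """K unfolded gradient steps on the smooth toy surrogate
    f(x) = 0.5 * ||y - H_est x||^2, with per-layer step sizes P
    standing in for the learned diagonal preconditioners."""
    x = np.zeros(H_est.shape[1])
    for k in range(K):
        x = x - P[k] * (H_est.T @ (H_est @ x - y))
    return x

# Stage 2 (stand-in for objective (16)): freeze H_hat and tune the per-layer steps.
s_true, y = S[:, 0], Y[:, 0]
loss = lambda P: np.sum((unfolded_output(H_hat, P, y) - s_true) ** 2)

P = np.full(5, 0.05)
for _ in range(100):
    # Crude finite-difference gradient with backtracking, for illustration only.
    g = np.array([(loss(P + eps) - loss(P - eps)) / 2e-4
                  for eps in 1e-4 * np.eye(5)])
    step = 0.05
    while not (loss(P - step * g) < loss(P)) and step > 1e-8:
        step *= 0.5
    P = P - step * g
```

The point of the sketch is the separation of concerns: the surrogate model is learned once, and the fixed-depth solver is then specialized to it, mirroring the sequential training of and described above.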
IV-B Receiver Performance
Here, we evaluate the performance of the proposed LoRDNet, comparing it to the aforementioned benchmarks and examining its dependence on the number of training samples . In particular, we numerically evaluate the bit-error-rate (BER) performance versus SNR using different training sizes , for both and channel configurations. For DeepSIC, we use only , while the nML receiver of [choi2016near] operates with perfect CSI, i.e., with full accurate knowledge of . All data-driven receivers are trained for each SNR separately, using a dataset corresponding to that specific SNR value.
The results are depicted in Figs. 4(a) and 4(b) for a channel configuration under the Rayleigh fading and COST2100 channel models, respectively. Furthermore, the BER performance for a configuration under both channel models is illustrated in Fig. 5(a) for the Rayleigh fading channel and in Fig. 5(b) for the COST2100 channel model. Based on the results presented in Figs. 4 and 5, one can observe that LoRDNet significantly outperforms the competing model-based and data-driven algorithms and achieves improved detection performance under both simulated channels, as well as both MIMO configurations.
In particular, the nML algorithm, which is designed to iteratively approach the MLE using ideal CSI (prior knowledge of the channel matrix), is notably outperformed by LoRDNet. Such gains by LoRDNet, which learns to compute the MLE from data without requiring CSI, over the model-based nML algorithm demonstrate the benefits of learning a competitive objective function combined with a relaxed deep unfolded optimization process. Specifically, the results depicted in Figs. 4 and 5 illustrate that one can significantly improve the receiver performance by learning a new channel matrix upon which the learned competitive objective function admits optimal points near the true symbols. Learning the competitive objective function is possible due to the hybrid model-based/data-driven nature of LoRDNet, and the fact that it is derived based on the unfolding of first-order optimization techniques. From a computational complexity point of view, the depicted performance of the nML algorithm in Figs. 4 and 5 is achieved by employing iterations of a first-order optimization algorithm, while LoRDNet uses only layers/iterations, exhibiting a significant reduction in computational cost during inference as compared to the nML algorithm.
Comparing LoRDNet to DeepSIC illustrates that LoRDNet benefits considerably from its model-aware architecture. The fact that LoRDNet is particularly tailored to the one-bit system model of (2) allows it to achieve improved accuracy, even when trained with small amounts of data. For instance, for the MIMO Rayleigh fading channel (see Fig. 4(a)), LoRDNet trained with samples achieves a BER of at an SNR of dB, while DeepSIC trained with the same dataset requires an SNR as high as dB to achieve such an error rate. Considering Fig. 4(b), a similar behavior is observed in the COST2100 channel for a BER of . A similar performance gain for LoRDNet can be observed in a configuration; see Fig. 5. Furthermore, LoRDNet still outperforms DeepSIC even when trained on times fewer training samples. In particular, for the channel setup considered in this part, the total number of trainable parameters of LoRDNet is merely . For comparison, DeepSIC, which uses and trains a multi-layer fully-connected network for each user at each interference cancellation iteration, consists here of over trainable parameters. Such a reduction in the number of parameters allows for achieving substantially improved performance with much smaller training sets, as observed in Figs. 4 and 5. Finally, we note that the small number of trainable parameters of LoRDNet shows its potential for online or real-time training, as proposed in [shlezinger2019viterbinet]. This can be achieved by using periodic pilots with minimal communication overhead, while inducing a relatively low computational burden in the periodic retraining.
So far, we have investigated the performance of the proposed LoRDNet for scenarios with known noise statistics and unknown (i.e., ). Next, we investigate the detection performance of LoRDNet when both the channel and noise covariance matrices are unavailable, i.e., we set and carry out the training according to the proposed two-stage methodology. Specifically, for this scenario we consider the learning of a diagonally structured in addition to the channel matrix . Fig. 6 depicts the BER versus performance of LoRDNet under both channel models, when trained using a dataset of size . The performance of LoRDNet for the case of is further provided for comparison. Observing Fig. 6, one can readily conclude that the proposed network successfully performs the task of symbol detection also when is unknown. Furthermore, a small performance gain is achieved for both channel models when as compared to the case of , which is presumably due to the careful addition of more degrees of freedom in learning a competitive surrogate model.
IV-C Performance of Competing Deep Unfolded Architectures
In this part, we compare the performance of the proposed LoRDNet with alternative deep unfolding-based architectures tailored for the problem at hand. Recall that the architecture of LoRDNet uses trainable parameters which are shared among the different layers, as illustrated in Fig. 3. Thus, LoRDNet is comprised of a relatively small number of trainable parameters, and uses a two-stage learning method to train the shared parameters, representing the competitive model, and the iteration-specific weights, encapsulating the first-order optimization coefficients. Nonetheless, the conventional approach for unfolding first-order optimization techniques is to overparameterize the iterations, and then train in an end-to-end manner using the one-stage training procedure discussed earlier. Therefore, to numerically evaluate the proposed unfolding mechanism of LoRDNet, we next compare it to two conventional unfolding-based benchmarks derived from the relaxed MLE:

Benchmark 1: An overparameterized deep unfolded architecture obtained by setting the computational dynamics for the th layer as:
(18) Here, are the trainable parameters of the th layer, and .

Benchmark 2: Here, we again use the unfolded architecture given in (18), while limiting the number of trainable parameters by constraining the rank of the learned matrices. In particular, we set and , where denotes the set of trainable parameters of the th layer of the unfolded network. The dimension controls the rank of the resulting weight matrices , and thus the number of trainable parameters.
Comparing (18) with the corresponding dynamics of LoRDNet in (10), we note that the channel matrix , the preconditioning matrices , and the noise covariance matrix are now absorbed into the per-layer trainable matrices and . Accordingly, these unfolded benchmarks, which follow the conventional approach for unfolding optimization algorithms, are less faithful to the underlying model. These benchmarks also differ from LoRDNet in their number of trainable parameters. In particular, Benchmark 1 with layers has trainable parameters, while Benchmark 2 has weights, which can be controlled by the setting of the hyperparameter . For comparison, LoRDNet has trainable parameters for the case of and diagonally structured preconditioning matrices, while for the case of with a diagonally structured preconditioning matrix and noise covariance matrix it has trainable parameters.
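The parameter-count trade-off behind Benchmark 2 can be checked directly: factoring a dense per-layer weight matrix as a product of two tall factors caps its rank at the chosen dimension and reduces the per-layer parameter count from n^2 to 2nd. The sizes below are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 32, 4                 # layer width and rank hyperparameter (illustrative)

# Benchmark 2 style low-rank factorization: W = U V^T has rank at most d.
U = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
W = U @ V.T

full_params = n * n          # dense per-layer matrix, as in Benchmark 1
lowrank_params = 2 * n * d   # two factors per layer, as in Benchmark 2
```

Here the factorization shrinks the per-layer parameter count from 1024 to 256, which is the mechanism by which the hyperparameter controls the capacity of Benchmark 2.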
We evaluate the performance of the proposed LoRDNet compared to the unfolded benchmarks in the following simulation setup. We train all of the considered networks using a dataset of size , while the highly-parameterized Benchmark 1 is also trained using samples. For Benchmark 2, we set . All architectures have layers, and their performance is evaluated on the same testing dataset of size . The unfolded benchmarks are trained in the conventional end-to-end fashion. The channel model is a Rayleigh fading channel. For the considered scenario, LoRDNet admits a total of trainable parameters, while Benchmark 1 has a total of (approximately times more parameters than LoRDNet), and Benchmark 2 has trainable parameters.
Fig. 7 depicts the BER versus of LoRDNet compared to the unfolded benchmarks. We observe in Fig. 7 that the proposed LoRDNet significantly outperforms the conventional unfolding-based benchmarks, indicating the gains of the increased level of domain knowledge incorporated into the architecture of LoRDNet and its two-stage training procedure. It is also observed that the performance of Benchmark 1 improves with more training samples. Interestingly, for a small training set of samples, Benchmark 2, which is obtained by imposing a rank constraint on the trainable parameters of Benchmark 1, achieves improved performance over Benchmark 1, due to its notable reduction in the number of trainable parameters.
IV-D Training Analysis
In this part, we analyze the performance of the proposed two-stage training procedure described in Subsection III-C. The training aspects of LoRDNet are numerically evaluated for the Rayleigh channel model detailed before.
Following our insight on the ability of LoRDNet to accurately train with small datasets, we begin by evaluating the performance of LoRDNet versus the training data size . For this study, we generate training datasets of size and evaluate the performance of LoRDNet using test samples. Fig. 8 depicts the BER achieved for each training size , for dB. We observe from Fig. 8 that the performance of LoRDNet improves across all values, where the improvements are most notable for . Interestingly, it may be concluded from Fig. 8 that LoRDNet is capable of accurately and reliably performing the task of symbol detection without CSI with as few as samples. The ability of LoRDNet to train with very few samples (compared to the black-box DNN models for one-bit MIMO receivers [balevi2019one, zhang2020deep], as well as the DeepSIC architecture) stems from the incorporation of domain knowledge into the design of the LoRDNet architecture. This in turn leads to far fewer trainable parameters, which require far fewer training samples to optimize.
Next, we analyze the effect of the two-stage training methodology detailed in Algorithm 1 on the detection performance of the LoRDNet architecture. Recall that the first training stage is concerned with finding a competitive objective function through an optimization of LoRDNet over the unknown system parameters , while the second training stage tunes the positive definite preconditioning matrices to accelerate the convergence of LoRDNet to the optimal points of the obtained competitive objective. To numerically evaluate the training methodology, we set dB, and generate a training dataset of size and a testing dataset of size . Then, we compare the performance of Algorithm 1 with two other competing training procedures:
One-Stage Training: Here, the weights and the unknown system parameters are jointly learned in a single stage. The objective of this one-stage training procedure for LoRDNet is
(19) 
Alternating Training: This procedure trains the network by alternating between the two optimization problems (15) and (16) at each training epoch. Here, the network is trained over alternations, corresponding to a total of training epochs. Namely, at each epoch index , we update the variables for odd , and update for even .

Before we proceed with the evaluation results, we provide some useful connections to notions widely used in the deep learning literature. Generally speaking, the performance of a statistical learning framework and its training procedure is evaluated using its generalization gap and testing error. The generalization gap of a model is defined as the difference between the training and testing errors. A model with a smaller generalization gap and a smaller testing error is highly favorable, and a large generalization gap may indicate that the network has overfitted to the data and hence does not generalize well. For two models with the same generalization gap, the one with the lower testing error is preferable. Fig. 9 depicts the BER versus the training epoch for both the training and testing datasets. We first note that the proposed two-stage training method outperforms all other competing procedures and attains a significantly lower testing error. Interestingly, one can observe that the proposed methodology has successfully closed the generalization gap, as the testing and training errors are very close to each other. On the other hand, the other two training procedures admit a very large generalization gap, indicating that their utilization has resulted in an overfitting of the network to the data. Furthermore, it can be observed from Fig. 9 that the major improvement in the detection accuracy of LoRDNet takes place during the first training stage, when finding a competitive objective function, i.e., epochs , while only a slight improvement in the BER is achieved during the second stage, i.e., .
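A minimal sketch of the alternating baseline's schedule and of the generalization-gap notion used above (the function names and the odd/even parity convention are illustrative):

```python
def alternating_schedule(num_epochs):
    """Which objective the alternating baseline optimizes at each epoch:
    odd epochs update the system parameters (stage-1 objective (15)),
    even epochs update the unfolded weights (stage-2 objective (16))."""
    return ["stage1" if t % 2 == 1 else "stage2" for t in range(1, num_epochs + 1)]

def generalization_gap(train_err, test_err):
    # Gap between testing and training error; a large positive gap
    # suggests the model has overfitted to the training data.
    return test_err - train_err
```

In contrast, the proposed procedure runs all stage-1 epochs first and all stage-2 epochs afterwards, rather than interleaving them epoch by epoch.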
The success of the proposed two-stage training procedure in closing the generalization gap, compared to the one-stage procedure, is presumably due to the fact that the two-stage approach imposes an implicit regularization on the model capacity, limiting the total number of parameters used during the entire training procedure. In contrast, the one-stage training procedure allows the network to use its full capacity, leading to overfitting and a larger generalization gap, as observed in Fig. 9.
As discussed in Subsection III-C, the second training stage allows LoRDNet to achieve fast inference, i.e., accelerated convergence to the optimal points of the competitive objective function. To illustrate this behavior, we perform a per-layer BER evaluation of LoRDNet, exploiting its interpretable model-based nature, in which each layer represents an unfolded first-order optimization iteration, and thus its output can be used as an estimate of the transmitted symbols. Figs. 10(a) and 10(b) depict the BER versus the layer/iteration number of LoRDNet at the completion of training stages 1 and 2, for the Rayleigh fading channel and the COST2100 channel model, respectively. We observe in Fig. 10 that the convergence of LoRDNet after the completion of the first training stage is slow, requiring at least layers/iterations to converge. Interestingly, we note from Fig. 10 that the second training stage indeed accelerates the convergence of LoRDNet by learning the best set of preconditioning matrices for the problem at hand in an end-to-end manner. In particular, after the completion of the second training stage, LoRDNet can accurately and reliably recover the symbols with as few as layers. This observation hints that one can further truncate LoRDNet after training to reduce the computational complexity while maintaining its superior performance.
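The per-layer evaluation can be sketched as follows: each unfolded iteration's output is thresholded to the nearest BPSK symbol and scored against the transmitted symbols. The layer map below is a toy gradient step on a smooth surrogate f(x) = 0.5||y - H x||^2, standing in for the actual LoRDNet layer dynamics; function and argument names are illustrative.

```python
import numpy as np

def per_layer_ber(H_hat, steps, y, s_true):
    """Per-layer BER read-out: each unfolded iteration performs a gradient
    step on the smooth toy surrogate f(x) = 0.5 * ||y - H_hat x||^2, and its
    output is sliced to BPSK symbols by the sign function."""
    x = np.zeros(H_hat.shape[1])
    bers = []
    for p in steps:
        x = x - p * (H_hat.T @ (H_hat @ x - y))
        s_hat = np.sign(x)
        bers.append(float(np.mean(s_hat != s_true)))
    return bers
```

Plotting such a per-layer BER curve before and after the second training stage is what reveals the convergence acceleration discussed above: a well-tuned set of step sizes (or preconditioners) drives the BER down in far fewer layers.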
In order to quantify the quality of the learned competitive objective in closing the gap between the discrete optimization problem and its continuous version, we further provide the per-iteration performance of the nML algorithm and of a variant of LoRDNet that operates with perfect CSI. For this scenario, LoRDNet utilizes the true , and is thus optimized only over the weights , while employing the exact channel model . It is observed from Figs. 10(a) and 10(b) that learning a new surrogate model for the continuous optimization problem at hand is indeed highly beneficial and yields far superior performance in recovering the transmitted symbols. The analysis provided in Fig. 10 further supports the rationale behind the proposed two-stage training methodology, and the fact that the second training stage accelerates the underlying first-order optimization solver (i.e., achieving a much faster descent per step) upon which the layers of LoRDNet are based.
V Conclusion
In this work, we introduced LoRDNet, a hybrid data-driven and model-based deep architecture for blind symbol detection from one-bit observations. The proposed methodology is based on the unfolding of first-order optimization iterations for the recovery of the MLE. We proposed a two-stage training procedure incorporating the learning of a competitive objective function, for which the unfolded network yields an accurate recovery of the transmitted symbols from one-bit noisy measurements. Owing to its model-based nature, LoRDNet has far fewer trainable parameters than its data-driven counterparts, and can be trained with very few training samples. Our numerical results demonstrate that the proposed LoRDNet architecture outperforms state-of-the-art model-based and data-driven symbol detectors in multi-user one-bit MIMO systems. We also numerically illustrated the benefits of the proposed two-stage training procedure, which allows training with small datasets and quick inference, due to its interpretable model-aware nature.