Towards Radio Designs with Non-Linear Processing for Next Generation Mobile Systems

12/24/2020 ∙ by Konstantinos Nikitopoulos, et al. ∙ 0

MIMO mobile systems, with a large number of antennas at the base-station side, enable the concurrent transmission of multiple, spatially separated information streams and, therefore, enable improved network throughput and connectivity both in uplink and downlink transmissions. Traditionally, to efficiently facilitate such MIMO transmissions, linear base-station processing is adopted, which translates the MIMO channel into several single-antenna channels. Still, while such approaches are relatively easy to implement, they can leave on the table a significant amount of unexploited MIMO capacity. Recently proposed non-linear base-station processing methods claim this unexploited capacity and promise a substantially increased network throughput. Still, to the best of the authors' knowledge, non-linear base-station processing methods not only have not yet been adopted by actual systems, but have not even been evaluated in a standard-compliant framework involving all the algorithmic modules required by a practical system. This work outlines our experience of trying to incorporate and evaluate the gains of non-linear base-station processing in a 3GPP standard environment. We discuss the corresponding challenges and our adopted solutions, together with their limitations. We report gains that we have managed to verify, and we also discuss remaining challenges, missing algorithmic components and future research directions that would be required towards highly efficient future mobile systems that can exploit the gains of non-linear base-station processing.




I Introduction

Much of the current communication systems research focuses on finding new, breakthrough ways to increase the achievable throughput (both at a user and a system level) and user connectivity capabilities, while meeting very tight latency requirements. In this direction, a plethora of ideas have been proposed. Still, very few of these ideas, and perhaps only the simplest in terms of practical realization, have finally been adopted by actual wireless systems and standards. To the natural question “why is this happening?” one can give several answers. In many published works, the proposed ideas are only evaluated via simulations and, therefore, the results may be heavily assumption-dependent. Namely, the reported gains can be a strong function of the simulated environment, which can differ substantially from the actual transmission environment. To facilitate more realistic evaluations, many researchers use “proof-of-concept” systems. However, this approach comes with its own challenges and practical limitations. One such challenge is the availability of appropriate research platforms able to realize and validate proposed novel ideas. In addition, in many cases and especially in physical layer research, the proposed ideas are not evaluated in a complete, standard-compliant environment. As a result, additional algorithmic components may be required to make a new idea adoptable by a practical system or communication standard. As we will discuss later in detail, such components are often related to limitations imposed by the existing system design, for example limitations related to conventional signalling procedures or to other mechanisms required for transmission optimization, such as transmission rate selection.
As a result, implementing and evaluating new physical layer concepts and ideas in a standard-compliant environment can be of substantial importance, not only to verify the potential gains in more realistic transmission scenarios, but also to identify any need for new “building blocks” or required modifications in the standards (e.g., the signalling, pilots, access methods). In other words, testing and verifying new physical layer ideas in a research-grade, standard-compliant environment can be an important step towards future systems design and evaluation: it can greatly increase our confidence in newly proposed approaches and can assist in identifying further requirements and missing components that will enable the adoption of novel ideas in actual systems.

In this work we focus on recently proposed ideas to improve BS processing in MIMO spatially multiplexed systems. The use of a large number of antennas at the BS side has been shown to be a very efficient way to increase the achievable throughput and the user connectivity capabilities of mobile systems, both in uplink and downlink transmissions, by enabling several concurrently transmitting, spatially separated users (i.e., Multi-User MIMO) [bigstation, shepard2012argos, tan2009sam]. Traditionally, in such systems, linear precoding (in the downlink) and detection (in the uplink) approaches are employed at the BS, based on the ZF or on the MMSE principles. Such linear approaches have two major practical benefits. Their implementation is relatively simple and, since they practically translate the mutually interfering information streams into traditional, non-interfering ones, they can be easily adopted by standards with minimum modifications. Still, their main drawback is that, in order for these approaches to be efficient in terms of achievable throughput, the number of BS antennas needs to be much larger than the number of concurrently transmitted information streams and, therefore, the number of served users [shepard2012argos, bigstation]. However, since increasing the number of antennas generally increases the capacity of the MIMO channel [telatar1999capacity], such an approach leaves on the table a significant amount of unexploited capacity [nikitopoulos2014geosphere]. Equivalently, it unnecessarily increases the number of BS antennas for a certain number of users, significantly increasing the BS cost and reducing power efficiency. In contrast, non-linear BS processing approaches, like “hard” and “soft” Maximum-Likelihood detection in the uplink [nikitopoulos2014geosphere, STS], and Vector-Perturbation in the downlink [hochwald2005vector], promise substantially increased achievable throughput and user connectivity.
Still, to the best of our knowledge, such approaches have not yet been adopted by practical systems and have not even been validated in a standard-compliant scenario. In addition, and perhaps as a consequence, it is not obvious what further system changes are required to deliver the promised gains in practice.
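To make the contrast between linear and non-linear detection concrete, the following is a minimal sketch, not the detectors evaluated in this work. It compares zero-forcing with an exhaustive maximum-likelihood search on a toy 2x2 channel with QPSK symbols. The channel matrix, the noise values and all function names are illustrative assumptions; practical non-linear detectors (e.g., sphere decoders) avoid the exhaustive search used here.

```python
import itertools

# QPSK alphabet with coordinates +/-1 (illustrative, unnormalised).
QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]

def matvec(H, x):
    """Multiply matrix H (list of rows) by vector x."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def slice_qpsk(z):
    """Map a complex value to the nearest QPSK point."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)

def zf_detect_2x2(H, y):
    """Zero-forcing: invert the 2x2 channel, then slice each stream on its own."""
    (a, b), (c, d) = H
    det = a * d - b * c
    x0 = (d * y[0] - b * y[1]) / det
    x1 = (-c * y[0] + a * y[1]) / det
    return [slice_qpsk(x0), slice_qpsk(x1)]

def ml_detect(H, y):
    """Exhaustive maximum-likelihood: minimise ||y - Hx||^2 over all symbol vectors."""
    def cost(x):
        return sum(abs(yi - ri) ** 2 for yi, ri in zip(y, matvec(H, x)))
    return list(min(itertools.product(QPSK, repeat=len(H[0])), key=cost))

# Hypothetical, poorly conditioned channel and a received vector that equals
# H @ [1+1j, 1+1j] plus a small noise term [0.3, -0.3].
H = [[1.0, 0.9], [0.9, 1.0]]
y = [2.2 + 1.9j, 1.6 + 1.9j]
zf_hat = zf_detect_2x2(H, y)
ml_hat = ml_detect(H, y)
```

In this hand-picked example the channel inversion amplifies the noise enough for zero-forcing to mis-detect the second stream, whereas the joint search recovers both transmitted symbols. The example is constructed to show the effect and says nothing about average gains.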

Contributions of this paper. In this work we present our experience of trying to incorporate and validate the performance gains of advanced, non-linear methods in a 3GPP-compliant framework. We outline the main challenges we have faced in order to realize and evaluate such approaches, together with our adopted solutions and their limitations. We report gains that we have been able to verify, and we describe missing components, remaining challenges, potential solutions, and open research directions that would enable the adoption of such approaches in practice. Finally, we discuss some future research directions that have the potential to substantially increase both the throughput and connectivity capabilities of next generation wireless systems by adopting new forms of non-linear BS processing.

II Outline of Non-linear Processing Techniques for Base-Station Processing

As discussed, current MIMO deployments mostly employ linear BS processing, both for uplink and downlink transmissions, but such approaches may leave a significant amount of capacity unexploited. Instead, non-linear BS processing approaches promise consistent gains compared to the linear ones, both in terms of achievable throughput and user connectivity. For uplink transmissions, “hard” Maximum-Likelihood detection methods have been proposed, both exact and approximate, with most of them being realized in terms of sphere decoding [BurgHardSD, nikitopoulos2014geosphere, FSD, GSD, shabanykbest, Nilssonkbest]. Most of these approaches, though, have been evaluated using simulations and by assuming “Rayleigh” or other mathematically modeled MIMO channels. While such mathematical models are necessary for the theoretical analysis of such systems, they do not necessarily capture the spatial multiplexing capabilities of actual MIMO channels. In addition, the performance of these methods is often presented in terms of (uncoded) bit-error-rate, which is not adequate for evaluating system throughput gains. The sphere decoding approaches of [nikitopoulos2014geosphere, FlexCore, FSD] are evaluated in actual transmission channel environments and in terms of achievable throughput. Still, their evaluations are based on a very limited number of transmission rates (i.e., combinations of QAM constellation size and coding rate) that are selected based on their average performance. In addition, since the processing takes place off-line, the reported achievable rates do not include the impact of the higher layers of the protocol stack (e.g., the impact of the HARQ mechanism). Finally, “hard” detection cannot be used jointly with the state-of-the-art “soft” channel encoding and decoding schemes (e.g., LDPC, Turbo) adopted in recent standards and, therefore, is of limited practical interest.

For use with soft channel decoding schemes, soft-output sphere decoders have been proposed to reduce the complexity of optimal soft detection [STS, Tuplesoftout, SFSD, jalden2005parallel]. Again, most soft sphere decoding approaches, including the sequential sphere decoder of [STS] and the soft fixed complexity sphere decoder of [SFSD], are evaluated by assuming mathematically modelled MIMO channels. The massively parallel hard and soft detectors of [nikitopoulos2018massively, 5GRefWileyKN], which enable practical low-complexity and low-latency non-linear detection, are evaluated both in mathematically modelled and measured channels. Still, these evaluations are based on a limited number of transmission rates, and the performance is reported for rates that have been chosen by an exhaustive search to maximize the average throughput across all positions, instead of optimizing the rate per packet transmission. In addition, similarly to the hard detection approaches, they do not capture the impact of the higher layers of the protocol stack.

In the downlink, non-linear, theoretical precoding approaches exist which claim the MIMO channel capacity that is currently unexploited by linear precoders [hochwald2005vector, FCSE, MMSEVP, gain15dbvp, improvedvp, VPgeneralized]. These approaches are based on Dirty Paper Coding principles, which can achieve the capacity of the Gaussian broadcast channel [costa1983writing]. In this direction, the non-linear Tomlinson-Harashima precoding [harashima1972matched] can substantially improve on the throughput achievable by traditional linear precoding. Improving on Tomlinson-Harashima precoding, vector perturbation precoding [hochwald2005vector] can further contribute to bridging the gap to the MIMO channel capacity limit. In particular, it perturbs the transmitted constellation symbols in such a way that the corresponding perturbation effect can be efficiently compensated at the receiver side. Again, most evaluations of vector perturbation precoding [hochwald2005vector, FCSE] are limited to simulations employing mathematically modelled channels. The massively parallel vector perturbation precoder of [husmann2018viper], which promises practical non-linear precoding, is evaluated by over-the-air experiments, but also with off-line processing, inheriting the corresponding evaluation drawbacks.
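The receiver-side compensation of the perturbation can be sketched as a modulo operation on each received symbol. The snippet below is an illustration under stated assumptions rather than the implementation used in this work: the function name is hypothetical, the alphabet is QPSK with coordinates ±1, and the modulo base τ = 4 is one common choice for that alphabet. The search for a good perturbation vector at the transmitter, which is the computationally demanding part, is not shown.

```python
import math

def modulo_tau(z, tau):
    """Fold the real and imaginary parts of z into the interval [-tau/2, tau/2)."""
    def fold(v):
        return v - tau * math.floor(v / tau + 0.5)
    return complex(fold(z.real), fold(z.imag))

TAU = 4.0                       # assumed modulo base for QPSK with points +/-1 +/-1j
symbol = 1 - 1j                 # data symbol intended for the user
perturbation = -1 + 2j          # complex-integer offset chosen by the transmitter
noise = 0.1 - 0.2j              # illustrative receiver noise

# Effective symbol seen at the receiver after the precoded channel.
received = symbol + TAU * perturbation + noise
recovered = modulo_tau(received, TAU)   # close to symbol + noise
```

Because the perturbation is an integer multiple of τ in each dimension, the fold removes it without the receiver knowing which offset was chosen. The receiver must, however, know that the fold is needed.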

As discussed, to the best of our knowledge, none of the above approaches has been evaluated in a standard-compliant framework, nor has a corresponding attempt been reported that would identify missing algorithmic components and further challenges that need to be resolved.

III Challenges, Adopted Approaches & Lessons Learned

Here we describe our experience of trying to incorporate and validate the performance of non-linear processing approaches in a 3GPP-compliant environment. We outline some of the main challenges we have faced, as well as our adopted solutions together with their corresponding limitations, and the related lessons we have learned. As we will discuss in detail, such an attempt came with numerous challenges, ranging from finding (and extending) an appropriate software and hardware platform to perform our evaluations, to challenges related to missing components and practical aspects of the algorithms that, to the best of our knowledge, have not been identified before.

III-A Seeking the appropriate software platform

There are a number of software platforms which aim at providing a 3GPP-compliant protocol stack capable of running on general-purpose processors. They can be broadly classified as commercial and open-source. The commercial solutions include, among others, the LTE and NR Network Software Suite by Amarisoft [Amarisoft], the National Instruments LTE Application Framework for LabVIEW Communications System Design Suite [LTEAPPFW] and Intel’s FlexRAN [FlexRAN]. The most complete is perhaps the solution provided by Amarisoft which, in contrast to other options, provides a full protocol stack implementation on both the BS side and the UE side. Although it supports many features and transmission modes, due to its closed-source nature, the Amarisoft solution cannot be openly used for physical layer research. In contrast to commercial platforms, open-source solutions, which include srsLTE [srsLTE], openLTE [openLTE] and OAI [kaltenberger2020openairinterface], are freely available to the public. Among those, the most advanced platform appears to be OAI, which is the open-source solution with the largest developer community actively working towards adding new features to the existing code base (e.g., support for 5G NR).

Our adopted solution. For our evaluations, we extended our recently proposed SWORD platform [8938716], which overcomes the missing support for large/massive MIMO setups, as well as the inherent inability of existing approaches to investigate non-linear processing without requiring prohibitive software and hardware optimization. To support downlink and uplink MU-MIMO transmission schemes, which were our main interest for testing non-linear processing approaches, SWORD significantly extends the OAI code base [9053352] and introduces a completely new mode of operation which we call pseudo-RT. This new mode combines the properties of, and builds upon, two existing modes of operation already supported by OAI, which permit RT OTA transmission and emulation of an entire radio access network without the use of SDR modules. Compared to the generally adopted method of offline processing, in which a received signal is stored in a raw format on the receiver side and then processed, the pseudo-RT mode can be effectively used to evaluate the impact of advanced physical layer approaches on the overall system performance. In contrast to offline processing, the pseudo-RT mode makes use of a pause period between each transmission to facilitate signal processing on both sides. As a result, it preserves the dependence between consecutive events, allowing for a more realistic setting in which the full protocol stack is executed.

In order to enable pseudo-RT processing, a mechanism is required to ensure that processing of a subframe at the receiver side begins only after processing at the transmitter side is completed. For this purpose, the subframe processing synchronization mechanism developed as part of the OAI emulation mode was reused and extended. The extensions include a number of changes in the OAI device library responsible for handling communication with SDR modules, to allow for appropriate handling of multiple UE and multiple BS radio chains. Further, to account for various aspects related to transmission over a real channel (e.g., propagation delay), a subset of the routines used in the OAI RT mode were also modified. It is worth noting that the subframe processing synchronization mechanism does not control the exact time at which transmission and reception are initiated on each SDR module. To address this, additional synchronization between SDR modules is required. This can be achieved with an external reference clock source which is common to all SDR modules on the BS side and the UE side.
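The ordering constraint described above, where receiver processing of a subframe starts only after transmitter processing has finished and the transmitter then pauses until the receiver is done, can be sketched with two semaphores. This is only a conceptual illustration with hypothetical names. It does not reproduce the OAI synchronization mechanism or any interaction with SDR hardware.

```python
import threading

def run_lockstep(num_subframes):
    """Run TX and RX stand-ins so that RX work on subframe n follows TX work on n."""
    log = []
    tx_done = threading.Semaphore(0)   # released by TX when a subframe is ready
    rx_done = threading.Semaphore(0)   # released by RX when the subframe is consumed

    def transmitter():
        for n in range(num_subframes):
            log.append(("tx", n))      # stand-in for TX-side baseband processing
            tx_done.release()
            rx_done.acquire()          # pause period: wait for the receiver to finish

    def receiver():
        for n in range(num_subframes):
            tx_done.acquire()          # block until TX processing has completed
            log.append(("rx", n))      # stand-in for RX-side baseband processing
            rx_done.release()

    threads = [threading.Thread(target=transmitter),
               threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

order = run_lockstep(3)
```

Because each side blocks on the other, the processing time of either side can be arbitrarily long without breaking the order of events, which is the property that distinguishes this style of operation from storing samples and processing them later.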

Lessons learned. The effective investigation of advanced physical layer approaches requires support for large/massive MIMO setups and a pseudo-RT mode of operation, which are not yet available in existing platforms. While our platform provides these features, the current implementation of the pseudo-RT mode of operation mandates that processing for all UEs and BSs is performed by a single process, executed on a single workstation. Although beneficial during development and debugging of new features, we found that this architecture does not scale well to a higher number of UEs and BSs due to limited computational power. In the next iteration of our software platform we intend to adopt a new architecture which permits UE processing to be executed in a separate process (and on a separate machine), allowing for greater flexibility in the allocation of processing resources. Note that this is also a key enabler for more flexibility in interconnecting SDR modules, as the current software architecture mandates that all radio modules are connected to the same workstation. As a result, in order to conduct measurements under various channel conditions, long, low-attenuation cables are required to interconnect the UE antennas with the SDR modules on the UE side. We noticed that these cables, due to their limited length, can significantly restrict the set of scenarios that can be investigated. To address this, we plan to rework the subframe processing synchronization mechanism which constitutes the core of the pseudo-RT mode, and thus eliminate the need for such cables. Given the new architecture, the reworked mechanism would allow the SDR modules used on the UE side to be connected to any workstation dedicated to UE processing.

III-B Seeking the appropriate hardware platform

There are a number of hardware platforms capable of supporting MIMO setups which aim to be open to everyone for experimentation and can potentially be used for the evaluation of new physical layer solutions. One example of such a hardware platform is COSMOS [COSMOS], a city-scale testbed deployed in the city of New York, aimed at providing means for real-world experimentation on next generation wireless technologies and applications. Another example is POWDER [POWDER], which is also a city-scale testbed, run by the University of Utah. In contrast to COSMOS, POWDER provides hardware components specifically dedicated to large/massive MIMO experimentation, with up to 64 antennas per site/sector. Interestingly, both COSMOS and POWDER allow for the use of various open-source software platforms such as OAI, srsLTE, or openLTE. Yet another example of a hardware platform is LuMaMi [malkowsky2017world] of Lund University. LuMaMi is much smaller in scale compared to COSMOS and POWDER but, in contrast to the other two testbeds, it is specifically dedicated to large/massive MIMO research and supports up to 128 radio chains.

Although all three setups have a broad range of capabilities, they come with certain limitations that make them inappropriate for meeting our objectives, at least at their current design stage. For instance, in the case of COSMOS, the capabilities of the SDR modules used in the deployed nodes are limited to a maximum of four radio chains per site/sector. This could potentially be circumvented by considering a distributed MIMO setup; however, due to additional challenges, this type of setup is currently not our main focus. The situation is slightly different in the case of POWDER. Here the limitation resides on the UE side, as only two SDR modules in POWDER’s massive MIMO setup seem to be currently dedicated to running as UEs. This means that the non-linear processing gains would be difficult to demonstrate, since they target supporting a number of users that is similar to the number of BS antennas [nikitopoulos2014geosphere, FlexCore, husmann2018viper]. The main limitation of LuMaMi is that, in contrast to the other testbeds, it relies heavily on proprietary hardware and software solutions from National Instruments [NIMIMO]. This means that any experiments would have to be based on National Instruments’ software. Note that LuMaMi was not designed to be used for the evaluation of physical layer approaches as part of a full 3GPP-compliant protocol stack, and it is not clear whether LuMaMi would support the National Instruments software extensions which could potentially bridge this gap. In addition to the above, in all three cases, the lack of physical access to the nodes dedicated to experimentation on the UE side restricts investigation to a limited set of scenarios.

Our adopted solution. The identified limitations of the existing publicly available hardware platforms convinced us to invest in the development of our own hardware platform, which can be easily moved around and which permits the investigation of scenarios with different numbers of BS antennas and different numbers of UEs. The main hardware component of our SWORD hardware platform is a multi-core x86_64 workstation with a large number of PCIe lanes. The large number of PCIe lanes is necessary to host multiple NICs, which in turn we use to interface with the SDR modules of our choice. The SDR module selected is the USRP X series with a UBX daughterboard. The USRP X series hosts two independent radio chains and is one of the SDR modules recommended by Ettus for applications which require phase alignment [USRP_UBX]. In order to synchronize and maintain phase alignment across multiple SDR modules we use the Ettus Research Octoclock-G CDA-2990 [octoclock], which is a highly accurate external clock reference and pulse distribution module. The synchronized USRPs on the BS side are connected to a 3.4-3.8 GHz antenna array which is composed of 64 half-wavelength-spaced elements in a dual-polarized, 8x8 configuration. Circulators are used to connect the TX and RX paths of each radio chain to an antenna port. A more detailed description, with the rationale behind using specific building blocks, can be found in [8938716].

Lessons learned. In order to investigate and demonstrate the benefits of non-linear processing, a movable hardware platform which can run multiple UEs and support a large/massive MIMO setup is needed. While our SWORD hardware solution meets these requirements, maintaining phase alignment through reference clock sharing across multiple USRPs proved to be difficult and required frequent execution of TDD reciprocity calibration to compensate for any drifts. We observed that such drifts had a significant negative impact on system performance, in particular when the number of UEs in a setup approached the number of BS antennas. In order to improve this we plan to achieve phase alignment through LO sharing, rather than reference clock sharing. As highlighted by Ettus in [NI_RFSYNC], LO sharing can significantly reduce short-term and long-term phase drift. Note that the USRP N32X series would be required for this purpose. Further, our existing hardware setup is currently based on the use of circulators which, due to the limited output power of the USRP X series, significantly limits the range of scenarios that can be investigated. In order to overcome this we intend to replace the circulators with external power amplifiers in the next iteration of our platform.

III-C Remaining system challenges and tweaks around them

While trying to evaluate the non-linear approaches we came across a number of practical issues that needed either to be resolved or to be bypassed. These are:

Enabling non-linear processing. As many non-linear decoding approaches are designed for OFDM transmission, in order to test non-linear processing in the uplink, we modified the processing of PUSCH in our LTE-based platform by making transform precoding optional (see LTE PUSCH processing in [3gpp.36.211]). To inform the UE about the use of transform precoding we extended RRC signaling in line with the 5G-NR specification (note that transform precoding in 5G-NR is optional and can be dynamically enabled or disabled using RRC signaling). We faced similar issues with non-linear precoding approaches in the downlink, which adopt vector perturbation and thus require a modulo operation to be applied at the transmitter side [hochwald2005vector, improvedvp]. To revert this operation on the receiver side, we modified the UE processing of PDSCH accordingly. Furthermore, we extended RRC signaling to inform the UE about the use of vector perturbation. Note that, to enable more dynamic switching, the existing set of DCI in LTE and 5G-NR used for scheduling transmission opportunities could be extended to include information on whether the incoming transmission underwent vector perturbation.

Transmission rate selection. AMC is an important aspect of 3GPP systems that enables the efficient utilization of the available spectrum resources. However, AMC for non-linear processing is still an open problem. As discussed before, in order to evaluate the performance of non-linear algorithms, the research community usually conducts an exhaustive search by running experiments for a small number of rates (i.e., QAM constellations and channel coding rates) and shows the average performance per rate. Although useful, the number of rates is in general very limited. In order to better evaluate the performance of non-linear approaches, and in the absence of AMC, we have applied an “adaptive” rate adaptation algorithm which selects the employed MCS based on the reported ACK/NACK information. More specifically, the employed algorithm tracks the erroneous and correctly received transmissions in both uplink and downlink. Based on this information, and given a maximum and minimum MCS, the algorithm attempts to adjust the MCS value after a predefined number of consecutive ACKs or NACKs is received (resetting the ACK counter when a NACK is received, and the NACK counter when an ACK is received). To prevent excessive MCS switching, the MCS selection approach also employs a simple “cool-off period” mechanism that prevents any MCS changes for a specific number of frames after the last MCS change. Still, while our adopted approach can provide an improved throughput evaluation compared to traditionally used approaches that use a limited number of rates (and depict the rate that maximizes the average performance across channels), it is far from being realistic, and can only be used to reliably evaluate the performance in a static environment where the channel remains relatively unchanged over multiple radio frames.
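The rate adaptation logic described above can be sketched as follows. This is a minimal illustration, not the code used in our platform: the class and parameter names are hypothetical, and several details that the description leaves open are assumptions here, namely a step size of one MCS index, one ACK/NACK report per frame, counters that keep running during the cool-off period, and counters that are cleared after every adjustment attempt.

```python
class AckNackRateAdapter:
    """ACK/NACK-driven MCS selection with a cool-off period (illustrative sketch)."""

    def __init__(self, mcs_min, mcs_max, up_after, down_after, cooloff_frames):
        self.mcs_min, self.mcs_max = mcs_min, mcs_max
        self.up_after = up_after            # consecutive ACKs needed to step up
        self.down_after = down_after        # consecutive NACKs needed to step down
        self.cooloff_frames = cooloff_frames
        self.mcs = mcs_min                  # assumption: start from the lowest rate
        self.acks = 0
        self.nacks = 0
        self.frames_since_change = cooloff_frames  # no cool-off at start-up

    def on_frame(self, ack):
        """Feed the feedback of one frame and return the MCS for the next one."""
        self.frames_since_change += 1
        if ack:
            self.acks += 1
            self.nacks = 0                  # an ACK resets the NACK counter
        else:
            self.nacks += 1
            self.acks = 0                   # a NACK resets the ACK counter
        if self.frames_since_change > self.cooloff_frames:
            if self.acks >= self.up_after:
                self._set(self.mcs + 1)
            elif self.nacks >= self.down_after:
                self._set(self.mcs - 1)
        return self.mcs

    def _set(self, new_mcs):
        """Clamp to the allowed range, clear counters, restart cool-off on a change."""
        new_mcs = max(self.mcs_min, min(self.mcs_max, new_mcs))
        self.acks = self.nacks = 0
        if new_mcs != self.mcs:
            self.mcs = new_mcs
            self.frames_since_change = 0

adapter = AckNackRateAdapter(mcs_min=0, mcs_max=28, up_after=3,
                             down_after=2, cooloff_frames=2)
feedback = [True] * 6 + [False] * 3
trace = [adapter.on_frame(ack) for ack in feedback]
```

With these example settings the selected MCS steps up after the third and sixth ACK, and steps down only at the third NACK, because the second NACK still falls inside the cool-off window. A scheme of this kind reacts only after several frames, which is why it suits static channels and not rapidly varying ones.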

DL channel estimation. As indicated in [hochwald2005vector], non-linear precoding approaches, such as vector perturbation, result in an expanded transmitted symbol constellation. As we have identified here, this makes non-linear approaches more sensitive to channel estimation errors than linear approaches. In order to evaluate the performance of non-linear precoding approaches adopting vector perturbation, we compensated for the impact of the channel estimation errors by boosting the transmit power of the UE-specific DMRS used in LTE and 5G-NR for channel estimation. We note that the DMRS is only used for detection at the UE side and not for beamforming, which is based on the SRS. We also note that LTE and 5G-NR already support power boosting for UE-specific DMRS; however, only a predefined set of power boosting values can be used for this purpose. In the case of LTE, a 3 dB power boost is used when more than 2 layers are transmitted. In the case of 5G-NR, a 3 dB or 4.77 dB power boost can be applied, depending on the DMRS configuration used [3gpp.38.214]. To inform the UE about the non-standard-compliant values we also extended RRC signaling. Figure 1 presents data for a single indoor measurement position and depicts the impact of UE-specific DMRS power boosting on downlink sum spectral efficiency for a 4x4 MU-MIMO configuration. As seen, power boosting of UE-specific DMRS can lead to significant performance improvements when non-linear precoding is used.

Figure 1: The impact of UE-specific DMRS power boosting on system performance.
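For reference, the standard boosting values mentioned above correspond to power ratios of roughly 2 and 3. The short sketch below converts a boost in dB to the power ratio and to the amplitude scale that would be applied to pilot symbols. The function names and the example pilot values are illustrative assumptions and are not taken from our implementation.

```python
def db_to_power_ratio(boost_db):
    """Convert a power boost in dB to a linear power ratio."""
    return 10 ** (boost_db / 10)

def db_to_amplitude_scale(boost_db):
    """Convert a power boost in dB to the factor applied to signal amplitudes."""
    return 10 ** (boost_db / 20)

def boost_pilots(pilots, boost_db):
    """Scale a list of complex pilot symbols by the requested power boost."""
    scale = db_to_amplitude_scale(boost_db)
    return [scale * p for p in pilots]

ratio_3db = db_to_power_ratio(3.0)      # close to 2
ratio_477db = db_to_power_ratio(4.77)   # close to 3
boosted = boost_pilots([1 + 0j, 0 - 1j], 6.0)  # hypothetical non-standard value
```

The receiver has to divide its channel estimate by the same amplitude scale, which is why any non-standard boost value must be signalled to the UE.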

Channel state information (CSI) estimation. Another issue that we came across, independent of the processing type (i.e., linear or non-linear), is related to the estimation of CSI. For the purpose of CSI estimation, 5G-NR and LTE employ a special signal transmitted in the uplink, termed SRS. The existing implementation of the SRS TPC mechanism may, however, result in a partial loss of CSI, which in turn can limit the performance of a precoder. In particular, the signal amplitude difference between multiple UEs in a cell is lost. The fundamental objective of the TPC mechanism is to ensure that signals transmitted by multiple UEs arrive at the BS with approximately the same strength, which in turn results in the aforementioned loss of information. In order to circumvent this, as a first-attempt solution, we set the SRS transmit power to a constant value. To achieve this, and at the same time retain the benefits of TPC for the uplink, we separated TPC for SRS from TPC for other uplink signals so that they are not conducted jointly. Note that in 5G-NR separate TPC for SRS and other uplink signals is already part of the standard. Separate TPC for a new variant of SRS (termed “additional” SRS) has also been recently introduced in LTE release 16.
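The loss of relative amplitude information can be illustrated with a toy link budget. The sketch below assumes idealised power control that fully compensates each UE's path gain, with no transmit power cap and no fractional compensation. All numerical values and function names are hypothetical.

```python
def tpc_tx_power_dbm(path_gain_db, target_rx_dbm):
    """Idealised power control: choose TX power so the BS receives the target level."""
    return target_rx_dbm - path_gain_db

def received_amplitude(tx_power_dbm, path_gain_db):
    """Received SRS amplitude in arbitrary linear units."""
    return 10 ** ((tx_power_dbm + path_gain_db) / 20)

path_gains_db = [-80.0, -92.0]   # two UEs with a 12 dB difference in path gain

# SRS under power control: both UEs arrive at the same level.
with_tpc = [received_amplitude(tpc_tx_power_dbm(g, -90.0), g) for g in path_gains_db]

# SRS at a constant transmit power: the 12 dB difference is preserved.
constant = [received_amplitude(0.0, g) for g in path_gains_db]

ratio_with_tpc = with_tpc[0] / with_tpc[1]   # about 1
ratio_constant = constant[0] / constant[1]   # about 10 ** (12 / 20)
```

A precoder fed with the power-controlled estimates would see two users of equal channel strength, while the constant-power estimates retain the amplitude ratio between them.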

Lessons learned. Adopting non-linear processing in a real system requires a number of changes in 3GPP standards, which include changes in PUSCH and PDSCH processing. Additional changes are also required in the signaling procedures. These primarily include extensions of the RRC signaling which is used to inform the UE about the settings of non-linear processing (e.g., additional power boosting for UE-specific DMRS), but can also affect DCI, e.g., to allow for a “per transmission” parameter selection. In addition, AMC for non-linear systems is a critical missing component that, as we also discuss later in more detail, can determine system performance. In this context, its absence may be one of the main reasons preventing the adoption of non-linear approaches in actual systems.

IV Evaluation results

This section presents results obtained by OTA measurements that validate our design and provide an indicative performance evaluation of advanced NL processing against linear (i.e., ZF) processing, which serves as the baseline. Without loss of generality, we employed the soft, near-optimal, non-linear detection algorithm discussed in [nikitopoulos2018massively] and the vector-perturbation-based, non-linear precoder introduced in [husmann2018viper], since they are the most promising in terms of processing latency and complexity. The numbers of processing elements assumed are 40 and 32 for uplink and downlink, respectively, which have been observed to provide a good trade-off between error performance and computational complexity. The measurements were conducted using the developed hardware and software SWORD platform for MU-MIMO setups with a 4-antenna and an 8-antenna BS and 4 single-antenna UEs. While the examined MIMO dimensions are small, as we will discuss later in detail, they have been sufficient to verify the gains of non-linear processing. It is also important to note that the aim of this short evaluation is not only to validate the gains of non-linear approaches compared to linear ones, but primarily to reveal hidden challenges in the system design that can affect performance.

Measurement setup. Several indicative locations were selected for measurements, including three outdoor locations (uplink measurements only) and four indoor locations (uplink and downlink measurements). Note that our platform does not currently integrate external power amplifiers. As a result, due to the limited output power, all selected locations had LOS, and the SNR was 15 dB or below. The platform was set to TDD mode, with a 5 MHz channel bandwidth and an operating frequency of 3.55 GHz. The LTE downlink/uplink slot configuration number 3, which includes 6 downlink slots, 3 uplink slots and 1 special slot, was selected for measurements [3gpp.36.211]. A subset of antenna array elements with the same polarisation, equivalent to a ULA, was used. Furthermore, the MU-MIMO scheduler was set to always schedule transmissions to all UEs with the same number of resource blocks. CSI at the transmitter was obtained using SRS transmitted in every frame, with a moving average filter applied to reduce the effect of thermal noise. To compensate for any thermal phase drift, each measurement instance was preceded by the TDD reciprocity calibration procedure.
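The moving-average smoothing of the SRS-based CSI mentioned above can be sketched as follows. This is a minimal illustration rather than the platform's implementation: the window length, the array layout and the function name are assumptions, and the channel is assumed to stay static over the window so that averaging only suppresses noise.

```python
import numpy as np

def moving_average_csi(estimates, window):
    """Average each per-frame channel estimate with up to `window - 1` preceding frames.

    estimates: complex array of shape (n_frames, n_ue, n_bs_antennas),
               one noisy SRS-based estimate per frame (assumed layout).
    """
    smoothed = np.empty_like(estimates)
    for t in range(len(estimates)):
        start = max(0, t - window + 1)  # shorter window for the first frames
        smoothed[t] = estimates[start:t + 1].mean(axis=0)
    return smoothed
```

A longer window suppresses more noise but tracks channel changes more slowly, so the window has to be chosen with the channel coherence time in mind.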

Figure 2: Relative gain of NL over Linear processing in uplink for 4x4 and 8x4 MU-MIMO configuration for indoor measurement positions.
Figure 3: Relative gain of NL over Linear processing in uplink for 4x4 and 8x4 MU-MIMO configuration for outdoor measurement positions.

Uplink results. As can be seen in Figure 2 for our indicative indoor evaluations, and in Figure 3 for the outdoor ones, NL detection yields a consistent increase in overall system performance compared to linear detection. In the indoor scenario, the average gains of NL processing are approx. 57% and 9% for the 4-antenna and 8-antenna setups, respectively. In the outdoor scenario, average gains of approx. 47% and 5% were achieved. The reduced gain in the 8-antenna setup is expected, since increasing the number of BS antennas while maintaining the same number of UEs simplifies the signal detection processing, at the cost of highly under-utilizing the MIMO channel [nikitopoulos2018massively]. Still, contrary to expectations, the gains of NL processing in the 8x4 MU-MIMO case are non-negligible.

Figure 4: Relative gain of NL over Linear processing in downlink for 4x4 and 8x4 MU-MIMO configuration for indoor measurement positions.

Downlink results. As seen in Figure 4, similarly to the uplink case, NL precoding increased system performance in all measured positions, with an average relative gain of approx. 50% and 2% for the 4x4 and 8x4 MIMO configurations, respectively. While the downlink NL gains over linear processing are consistent, they are less prominent than in the uplink. This is attributed to factors such as channel aging and imperfections in the SRS-based channel estimation and TDD calibration, all of which can be further improved.
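The principle behind vector-perturbation precoding can be illustrated with a brute-force Python sketch. It is not the ViPer precoder of [husmann2018viper]: the exhaustive search over a small set of integer offsets, the unnormalised QPSK alphabet (points ±1±j) with modulo base τ = 4, and all function names are assumptions chosen to keep the example short.

```python
import itertools
import numpy as np

def vp_precode(H, s, tau=4.0):
    """Pick the integer offset vector l that minimises the power of pinv(H) @ (s + tau*l).

    H: (n_users, n_bs_antennas) downlink channel, s: (n_users,) data symbols.
    Returns the unnormalised precoded vector and its power.
    """
    P = np.linalg.pinv(H)  # ZF precoding matrix (right inverse of H)
    offsets = [a + 1j * b for a in (-1, 0, 1) for b in (-1, 0, 1)]
    L = np.array(list(itertools.product(offsets, repeat=len(s))))  # (9^K, K)
    X = (s + tau * L) @ P.T  # row i holds P @ (s + tau * L[i])
    power = np.sum(np.abs(X) ** 2, axis=1)
    best = np.argmin(power)
    return X[best], power[best]

def modulo_tau(z, tau=4.0):
    """Receiver-side modulo that removes the unknown offset tau*l."""
    def wrap(v):
        return v - tau * np.floor((v + tau / 2) / tau)
    return wrap(z.real) + 1j * wrap(z.imag)
```

Since the all-zero offset is part of the search set, the selected vector never needs more power than plain ZF precoding of the same symbols, and in the noiseless case `modulo_tau(H @ x)` returns the original symbols. A practical transmitter would additionally scale the vector to its power budget, with the receivers compensating for that scaling; this step is omitted here.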

V Remaining Challenges and Way Forward

Here we discuss some of the remaining challenges that need to be addressed in order to develop future BSs that can benefit from non-linear processing approaches.

Transmission Rate Adaptation. As has become evident, one of the important components missing for making non-linear processing both practical and efficient is efficient rate adaptation. In this direction, two approaches can potentially be examined. The first is to develop non-linear-specific AMC methods, and the other is to adopt rateless (or fountain) channel encoding.

The direction towards developing non-linear-specific AMC methods is particularly challenging in the uplink. In this case, the per-user SNR would typically differ and, therefore, each user should transmit at its own rate. This issue could be partially handled by retaining the transmit power control mechanism of SRS signals (see discussion in Section III-C). Still, the maximum achievable transmission rate is a function of both the MIMO channel and the adopted detection method, which makes the AMC problem even more complicated. A promising direction towards non-linear-specific AMC would be to consider the mathematical framework used for identifying the “most promising” vector solutions in the massively parallel methods of [nikitopoulos2018massively, FlexCore], since the corresponding metrics of promise are related to the achievable error-rate probability. In the downlink, predicting the modulation order and the coding rate that maximize the throughput could perhaps be easier, since typical non-linear precoding approaches result in the same SNR per user. Still, if a per-user “power-loading” approach is adopted (which is by itself an interesting research direction), the problem becomes similar to the uplink case, and the duality between uplink and downlink transmissions could then perhaps be explored.

Instead of a non-linear-specific AMC, rateless codes can be used, which remove the need for choosing an MCS mode [etesamiRaptor, rfc5053, spinal]. This is achieved by initially transmitting high-rate information and then transmitting parity information, which decreases the effective information rate, until the transmitted information is correctly decoded. Still, such approaches will require revisiting the way we transmit ACK/NACK signalling. It is noted that a problem akin to rate adaptation, which is still open and perhaps needs to be considered jointly with AMC, is that of “user selection”, which allocates users to MIMO transmissions (or MIMO antennas) in order to maximize system performance.
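The rateless principle described above (keep sending coded symbols until the receiver can decode) can be sketched in Python as follows. The sketch assumes a random linear fountain code over GF(2) on a symbol-erasure channel, which is a deliberately simple stand-in for the Raptor and spinal codes cited above; the ACK is modelled simply as decoding success, and all names and parameters are hypothetical.

```python
import numpy as np

def gf2_solve(A, B):
    """Solve A X = B over GF(2) by Gaussian elimination.

    Returns X, or None if A does not have full column rank yet.
    """
    A, B = A.copy(), B.copy()
    n_rows, k = A.shape
    row = 0
    for col in range(k):
        pivots = np.nonzero(A[row:, col])[0]
        if pivots.size == 0:
            return None  # not enough independent symbols received
        p = row + pivots[0]
        A[[row, p]] = A[[p, row]]
        B[[row, p]] = B[[p, row]]
        for r in range(n_rows):
            if r != row and A[r, col]:
                A[r] ^= A[row]
                B[r] ^= B[row]
        row += 1
    return B[:k]

def rateless_transfer(blocks, erasure_prob, rng, max_symbols=1000):
    """Send random XOR combinations of the source blocks until decoding succeeds.

    blocks: (k, block_len) array of bits. Returns (decoded blocks, symbols sent).
    """
    k = blocks.shape[0]
    coeffs_rx, symbols_rx = [], []
    for sent in range(1, max_symbols + 1):
        g = rng.integers(0, 2, size=k, dtype=np.uint8)  # which blocks to XOR
        symbol = ((g.astype(int) @ blocks.astype(int)) % 2).astype(np.uint8)
        if rng.random() < erasure_prob:
            continue  # symbol lost on the channel
        coeffs_rx.append(g)
        symbols_rx.append(symbol)
        if len(coeffs_rx) >= k:
            decoded = gf2_solve(np.array(coeffs_rx), np.array(symbols_rx))
            if decoded is not None:
                return decoded, sent  # receiver would now send the ACK
    raise RuntimeError("decoding did not succeed within max_symbols")
```

The effective rate, `k / sent`, adapts to the channel without the transmitter ever selecting an MCS, at the price of a feedback mechanism that tells the transmitter when to stop.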

Scaling to large numbers of users. Channel estimation is an important aspect of every MIMO system. To allow for effective channel estimation in 3GPP systems, each stream is assigned a DMRS that is orthogonal to the DMRS allocated to the other streams. As 3GPP systems have not been specifically designed for non-linear processing (which enables supporting very large numbers of users), the number of orthogonal DMRS allocations in 3GPP is limited to 8 in LTE [3gpp.36.211] and 12 in 5G-NR [3gpp.38.300]. Whilst these limits seem reasonable when systems are based on linear processing, for which the required number of antennas grows rapidly with the number of streams (making the deployment of sites supporting large numbers of streams impractical), such limits may become a bottleneck for systems with non-linear processing (in particular, when the number of concurrently supported UEs is larger than the number of BS antennas, as discussed in the next Section). Note also that the SRS capability to support multiple users is highly dependent on the channel delay spread, and only a limited number of cyclic shifts can be used in practice. As a result, and given that the SRS periodicity needs to reflect changes in the channel coherence time, the SRS capacity may not be sufficient to maintain the CSI reliability.
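The role of orthogonal reference signals can be illustrated with a small Python sketch in which several pilots are derived from one base sequence through cyclic shifts (linear phase ramps in the frequency domain). It is not the 3GPP sequence definition: the random constant-modulus base sequence, the sequence length of 24, the flat-channel model and the function names are assumptions made for illustration.

```python
import numpy as np

def cyclic_shift_pilots(base, n_shifts):
    """Build n_shifts pilots by applying linear phase ramps to one base sequence.

    The pilots are mutually orthogonal when the base sequence has constant
    modulus and its length is a multiple of n_shifts.
    """
    n = np.arange(len(base))
    shifts = np.arange(n_shifts)[:, None]
    return base[None, :] * np.exp(2j * np.pi * shifts * n[None, :] / n_shifts)

def ls_channel_estimate(Y, pilots):
    """Per-stream channel estimate obtained by correlating with each pilot.

    Y: (n_bs_antennas, seq_len) received pilot block, pilots: (n_streams, seq_len).
    Assumes a flat channel over the pilot block.
    """
    return Y @ pilots.conj().T / pilots.shape[1]
```

With 8 shifts of a length-24 sequence, `pilots @ pilots.conj().T` equals 24 times the identity, so the superimposed streams separate exactly in the noiseless flat-channel case. The number of mutually orthogonal pilots can never exceed the sequence length, and channel delay spread reduces the usable number further.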

VI Conclusions & Future Research

We have, for the first time, verified in a 3GPP-compliant framework that non-linear processing is a promising approach for increasing the achievable throughput and user connectivity of mobile systems. Still, there is a lot of research yet to be done before such approaches are adopted by actual wireless systems and standards. Two of the most significant open questions are how to perform rate adaptation and how to redesign the corresponding wireless systems so that they can support a much larger number of users, especially since, as has already been shown in the literature, the gains of non-linear processing increase with the number of concurrently supported users.

Despite the already verified gains, the most interesting capabilities that non-linear BS processing can offer, and that could perhaps revolutionize future wireless systems, have not yet been explored. In this direction, we can identify two promising research pathways: (a) non-linear processing for supporting numbers of transmitted information streams that are larger than the number of BS antennas, and (b) practical, non-linear, iterative BS processing for further bridging the gap between the theoretical capacity and the achievable throughput of systems with large connectivity.

Transmitting more streams than base-station antennas. In a “fully connected” wireless ecosystem, future communication systems will need to be able to support a very large number of users. Traditional wireless system designs with linear processing are not capable of supporting more information streams than the number of BS antennas and, in practice, can efficiently support only a much smaller number of information streams. Non-linear processing approaches can remove this limitation, at least theoretically, and promise a supported number of information streams that is much larger than the number of BS antennas [GSD, gMultisphere], even without the need for specifically designated Non-Orthogonal Multiple Access (NOMA) techniques [PwrNOMA, LDSOFDM, nikopour2013sparse]. Still, as already discussed, developing such systems will require revisiting the signalling procedures, as well as the way we perform channel estimation.
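The antenna-count limit of linear processing follows from a rank argument, sketched below for a narrowband uplink model with $N$ BS antennas, $K$ single-antenna streams and a finite symbol alphabet $\mathcal{A}$. The notation is introduced here for illustration and is not taken from the paper.

```latex
\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n}, \qquad
\mathbf{H} \in \mathbb{C}^{N \times K}, \quad \mathbf{s} \in \mathcal{A}^{K}.

% Linear (e.g., ZF) detection needs a matrix W with W H = I_K, but for K > N:
\operatorname{rank}(\mathbf{W}\mathbf{H}) \le \operatorname{rank}(\mathbf{H})
\le \min(N, K) = N < K = \operatorname{rank}(\mathbf{I}_K),

% so no such W exists. The joint (non-linear) search remains well defined:
\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \mathcal{A}^{K}}
\left\lVert \mathbf{y} - \mathbf{H}\mathbf{s} \right\rVert^{2}.
```

In the noiseless case, the joint search returns the transmitted vector whenever no two candidate vectors map to the same received point, which holds for generic channel matrices even when $K > N$. The cost of an exhaustive search grows as $|\mathcal{A}|^{K}$, which is why reduced-complexity non-linear detectors are needed in practice.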

Non-linear, iterative, base-station processing. Iterative systems that exchange “soft” information between a non-linear detector and a “soft” channel decoder promise substantial gains [hochwald2003achieving, STSsoftin, LLRclip]. Still, such approaches do not scale to large numbers of information streams due to their exponentially increasing complexity and latency requirements. For example, the approximate non-linear approach of [LLRclip] would require a multiplications for a MIMO system. On the other hand, currently proposed massively parallel, soft-output approaches that can substantially reduce the corresponding complexity and processing latency requirements [nikitopoulos2018massively, gMultisphere] are not applicable to the iterative case. This is because such approaches rely heavily on the geometrical properties of the transmitted signal constellation, which are destroyed by the existence of prior information (from previous iterations). Furthermore, existing iterative schemes cannot currently support more information streams than BS antennas. Developing non-linear, massively parallel, iterative detection and decoding techniques that are able to support more users than BS antennas could give a significant connectivity boost and allow us to access unexploited capacity resources.