The terahertz (THz) band is the last piece of the radio frequency (RF) spectrum puzzle for wireless systems. Its massive bandwidth promises ultra-high data rates and seamless support for new 6G applications such as extended reality, autonomous driving, and edge intelligence. At the same time, many problems, such as the limited coverage range, must be solved before its potential can be fully unleashed. To combat the coverage problem in an energy-efficient manner, ultra-massive multiple-input multiple-output (UM-MIMO) with an array-of-subarray (AoSA) structure has been proposed as a promising solution. Such a structure groups the antenna array into multiple subarrays, each powered by one RF chain, to perform highly directional hybrid beamforming. The fine-grained beamforming design necessitates accurate channel estimation with a low pilot overhead, which, however, is extremely challenging with a limited number of RF chains.
Conventional wireless systems operating at sub-6 GHz and millimeter wave bands mostly considered the far-field region only, as the radius of the near-field region, determined by the Rayleigh distance, is much smaller compared with the coverage range. By contrast, for THz UM-MIMO, the near-field region becomes critical due to the enlarged Rayleigh distance, which, for example, is about m for an array with m aperture at GHz. This will occupy a large portion of the coverage of a typical THz system. Therefore, depending on the distances between the RF source/scatterers and the array, far- and near-field paths typically co-exist and together constitute the hybrid-field channel. Considering such a unique feature, channel estimation algorithms for THz UM-MIMO have to be compatible with both the far- and near-field paths, and be robust against the variable channel conditions.
Unfortunately, no unified algorithm has so far addressed these challenges. To reduce pilot overhead, existing works mostly adopted the uni-field assumption and exploited dedicated sparsity patterns in either the far field [11, 4] or the near field to design efficient compressed sensing algorithms. A hybrid-field scenario was considered in [15], but the authors assumed a priori knowledge of whether each path is from the far- or near-field region to decide which algorithm to apply, which is far from practical. As an alternative, deep unfolding (DU) methods can be adopted to learn the complex channel conditions by augmenting classic iterative algorithms with learnable components [9, 8]. However, several critical problems remain unsolved and hinder their application. Specifically, although adapted from classic algorithms, DU methods generally come with no convergence guarantee. In addition, DU methods are truncated to a fixed number of iterations. This contradicts the property of classic iterative algorithms and can lead to unstable performance under changeable channel conditions.
To tackle these issues, we develop a deep learning based hybrid-field channel estimator for THz UM-MIMO that enjoys a convergence guarantee and adaptive complexity. Specifically, inspired by fixed point theory, we transform each iteration of orthogonal approximate message passing (OAMP) into a contractive mapping by replacing the nonlinear estimator with a specially trained convolutional neural network (CNN). Thanks to the powerful modeling capacity of CNNs, the patterns of the hybrid-field THz UM-MIMO channel can be accurately identified and exploited. The estimated channel is computed via a fixed point iteration of the contractive mapping. It is shown that the proposed estimator enjoys provable linear convergence and can model neural networks whose depth adapts to the hybrid-field channel conditions and to an adjustable error tolerance. Simulation results verify our theoretical results and demonstrate that the proposed method outperforms state-of-the-art approaches by a large margin.
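Notation: Throughout this paper, $\mathbf{A}^{T}$, $\mathbf{A}^{H}$, $\mathbf{A}^{\dagger}$, $\mathrm{tr}(\mathbf{A})$, $\mathrm{vec}(\mathbf{A})$, and $[\mathbf{A}]_{i,j}$ are respectively the transpose, Hermitian, pseudoinverse, trace, vectorization, and the $(i,j)$-th element of matrix $\mathbf{A}$. $\|\mathbf{a}\|_{p}$ and $[\mathbf{a}]_{i}$ are the $\ell_{p}$-norm and the $i$-th element of vector $\mathbf{a}$, respectively. $|a|$ is the absolute value of scalar $a$; $\mathrm{blkdiag}(\cdot)$ returns a block diagonal matrix by aligning its arguments along the diagonal. $\mathbf{I}$ and $\mathbf{0}$ are the identity matrix and the all-zero vector of appropriate dimensions. $\mathbb{E}[\cdot]$ denotes expectation. $\circ$ denotes the composition of functions. $\mathcal{U}(a, b)$ is a continuous uniform distribution over the interval between $a$ and $b$. $\mathcal{CN}(\boldsymbol{\mu}, \mathbf{R})$ is a complex normal distribution with mean $\boldsymbol{\mu}$ and covariance $\mathbf{R}$.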
II System Model and Problem Formulation
We consider the uplink channel estimation for THz UM-MIMO systems. The base station (BS) is equipped with a planar AoSA with subarrays (SAs), while each SA is a uniform planar array consisting of antenna elements (AEs), as illustrated in Fig. 4(a). To improve energy efficiency, the AoSA adopts partially-connected hybrid analog-digital beamforming , as shown in Fig. 4(b). Within each SA, the AEs share the same RF chain through dedicated phase shifters. A total of RF chains are utilized to receive data streams from multiple single-antenna user equipments (UEs).
We define the index of the SA at the -th row and -th column of the AoSA by , where and . Similarly, the index of the AE at the -th row and -th column of a certain SA is defined by , where and . The distances between adjacent SAs and adjacent AEs are denoted by and , respectively. As shown in Fig. 4(a), we construct a Cartesian coordinate system with the origin being the first AE in the first SA. Assuming that the AoSA lies in the - plane, the coordinate of the -th AE in the -th SA is given by
II-A Hybrid-Field THz UM-MIMO Channel Model
Here we introduce the hybrid far- and near-field propagation environment along with the channel model. The boundary between the far- and near-field regions is determined by the Rayleigh distance, i.e., $2D^{2}/\lambda$, where $D$ is the array aperture, and $\lambda$ is the carrier wavelength. Due to the enlarged near-field region in THz UM-MIMO, the channel can consist of both far- and near-field paths, as shown in Fig. 4(c). The number of far- and near-field paths may vary, which renders the channel condition changeable. Additionally, since the wavefront is approximately planar in the far field and spherical in the near field, the array responses should be modeled separately.
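To give a feel for how quickly the near-field region grows with carrier frequency, the following minimal sketch evaluates the Rayleigh distance, i.e., twice the squared aperture divided by the wavelength. The function name and the example values (a 0.1 m aperture at 300 GHz and at 3.5 GHz) are illustrative assumptions and are not taken from the simulation setup of this paper.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def rayleigh_distance(aperture_m: float, carrier_hz: float) -> float:
    """Return the Rayleigh distance 2 * D^2 / lambda in metres."""
    wavelength_m = SPEED_OF_LIGHT / carrier_hz
    return 2.0 * aperture_m ** 2 / wavelength_m


# Assumed example values: the same 0.1 m aperture at two carrier frequencies.
thz_boundary = rayleigh_distance(0.1, 300e9)   # THz band
sub6_boundary = rayleigh_distance(0.1, 3.5e9)  # sub-6 GHz band
print(f"300 GHz: {thz_boundary:.2f} m, 3.5 GHz: {sub6_boundary:.3f} m")
```

Under these assumed values the boundary sits at roughly 20 m in the THz band but only about 0.23 m at 3.5 GHz, since the Rayleigh distance scales linearly with the carrier frequency for a fixed aperture.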
Due to limited scattering, the spatial channel between the BS and a specific UE can be characterized by the superposition of one LoS path and NLoS paths , i.e.,
where is a normalization factor such that , is the carrier frequency. Also, , , , , , and are respectively the path loss, azimuth angle of arrival (AoA), elevation AoA, distance between the array and the RF source/scatterer, array response vector, and time delay of the -th path. In particular, , , are measured with respect to the origin of the coordinate system, as shown in Fig. 4(a).
II-A1 Path Loss
The path loss accounts for both the spread loss and the molecular absorption loss. Assuming that denotes the LoS path and denote NLoS paths, then
where is the reflection coefficient, is the LoS path length, and is the molecular absorption coefficient . For the LoS path, . For the NLoS paths, is given by
where is the angle of incidence of the -th path, is the angle of refraction. Also, and are respectively the refractive index and the roughness coefficient of the reflecting material .
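As a rough illustration of how spread loss and molecular absorption combine for the LoS path, the sketch below uses a commonly adopted THz model: an amplitude gain of $c/(4\pi f r)$ multiplied by $e^{-k_{\mathrm{abs}} r/2}$. This specific form, the function names, and the absorption coefficient value are assumptions made for illustration; the reflection coefficient of the NLoS paths is not modeled here.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def los_path_gain(carrier_hz, distance_m, absorption_per_m):
    """Amplitude gain of a LoS path under the assumed model:
    free-space spreading c / (4*pi*f*r) times absorption exp(-k*r/2)."""
    spreading = SPEED_OF_LIGHT / (4.0 * math.pi * carrier_hz * distance_m)
    absorption = math.exp(-0.5 * absorption_per_m * distance_m)
    return spreading * absorption


def to_loss_db(amplitude_gain):
    """Convert an amplitude gain to a power loss in dB (positive means loss)."""
    return -20.0 * math.log10(amplitude_gain)


# Hypothetical numbers, for illustration only.
K_ABS = 0.0033  # assumed molecular absorption coefficient in 1/m
loss_10m = to_loss_db(los_path_gain(300e9, 10.0, K_ABS))
loss_20m = to_loss_db(los_path_gain(300e9, 20.0, K_ABS))
print(f"10 m: {loss_10m:.1f} dB, 20 m: {loss_20m:.1f} dB")
```

In this model, doubling the distance adds about 6 dB of spread loss plus an absorption term that grows linearly with distance.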
II-A2 Array Response Vector
The array response vector differs in the far- and near-field regions, which are determined by the distance , and is given by
For notational brevity, we first construct the array response matrix. Due to the planar wavefront, each element of the far-field array response matrix is
where is the speed of light, and is the unit-length vector in the AoA direction of the -th path, given by . The corresponding far-field array response vector is given by . Due to the spherical wavefront, each element of the near-field array response matrix depends on the exact distance between the AE and the RF source/scatterer, i.e.,
The near-field array response vector can be similarly obtained by vectorization, i.e., .
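The difference between the planar and spherical wavefront models can be checked numerically. The sketch below is an illustration under stated assumptions: the direction-vector convention (elevation measured from the z-axis), the sign of the phase, the absence of any amplitude normalization, and the 8 x 8 half-wavelength array are all hypothetical choices rather than the exact definitions used in this paper.

```python
import numpy as np


def direction_vector(azimuth, elevation):
    """Unit vector pointing from the array origin towards the source (assumed convention)."""
    return np.array([
        np.sin(elevation) * np.cos(azimuth),
        np.sin(elevation) * np.sin(azimuth),
        np.cos(elevation),
    ])


def far_field_response(positions, azimuth, elevation, wavelength):
    """Planar wavefront: the phase depends only on the direction of arrival."""
    k = 2.0 * np.pi / wavelength
    return np.exp(1j * k * positions @ direction_vector(azimuth, elevation))


def near_field_response(positions, azimuth, elevation, distance, wavelength):
    """Spherical wavefront: the phase uses the exact element-to-source distance."""
    k = 2.0 * np.pi / wavelength
    source = distance * direction_vector(azimuth, elevation)
    exact = np.linalg.norm(source - positions, axis=1)
    return np.exp(-1j * k * (exact - distance))


# Hypothetical 8 x 8 planar array in the x-y plane with half-wavelength spacing.
wavelength = 1e-3
grid = np.arange(8) * wavelength / 2
positions = np.array([[x, y, 0.0] for x in grid for y in grid])

far = far_field_response(positions, 0.3, 0.8, wavelength)
for distance in (0.01, 0.1, 10.0):
    near = near_field_response(positions, 0.3, 0.8, distance, wavelength)
    print(distance, np.max(np.abs(near - far)))
```

For a source well beyond the Rayleigh distance the two responses nearly coincide, whereas for a source close to the array the planar approximation breaks down, which is why the two regimes need separate models.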
II-B Problem Formulation
In uplink channel estimation, the UEs transmit known pilot signals to the BS for time slots. We assume that orthogonal pilots are adopted and consider an arbitrary UE without loss of generality. For ease of algorithm design and comparison, the spatial channel is transformed to its angular domain representation in an SA-by-SA manner by using , where is a unitary matrix with each being an normalized discrete Fourier transform matrix. The received pilot signal in the -th time slot is given by
where is the digital combining matrix, is the analog combining matrix while the elements of each component vector satisfy the unit-modulus constraint, i.e., , is the known pilot signal that is set as 1 for convenience, and is the noise. The average received signal-to-noise ratio (SNR) is . Since the combining matrices cannot be optimally tuned without knowledge of the channel, we consider an arbitrary scenario where is set as the identity matrix and the analog phase shifts in are randomly chosen from one-bit quantized angles, i.e., , to reduce energy consumption. The received signal after time slots of pilot transmission is given by
where , and .
To transform (9) into its equivalent real-valued form, we let , . , and
Then, the equivalent real-valued form is given by
Based on (11), channel estimation can be formulated as a linear inverse problem whose goal is to compute a good estimate of given the knowledge of and . However, due to the practical requirements of low pilot overhead and limited RF chains, it is often the case that , which makes the problem significantly ill-posed. Existing works rely heavily on the channel sparsity to design compressed sensing algorithms for channel estimation. However, the sparsifying transformations for the far- and near-field paths are not compatible with each other . DU methods could be adopted but they generally lack theoretical guarantees. Additionally, the fixed number of layers limits their ability to adapt to the changeable channel conditions and can cause unstable performance. These drawbacks motivate us to design an efficient deep learning based hybrid-field channel estimator with provable convergence guarantee and adaptive complexity.
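The conversion of a complex-valued linear model into an equivalent real-valued one can be sketched as follows. The example uses the standard stacking of real and imaginary parts; the variable names, the helper function, and the random 4 x 8 complex measurement matrix are hypothetical and only illustrate the construction, with noise omitted for clarity.

```python
import numpy as np


def to_real_system(M_c, y_c):
    """Stack real and imaginary parts so that y = M h holds over the reals."""
    M = np.block([[M_c.real, -M_c.imag],
                  [M_c.imag,  M_c.real]])
    y = np.concatenate([y_c.real, y_c.imag])
    return M, y


rng = np.random.default_rng(0)
# Hypothetical sizes: 4 complex measurements of an 8-dimensional complex channel.
M_c = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
h_c = rng.standard_normal(8) + 1j * rng.standard_normal(8)
y_c = M_c @ h_c  # noiseless measurements

M, y = to_real_system(M_c, y_c)
h = np.concatenate([h_c.real, h_c.imag])
print(M.shape, np.allclose(M @ h, y))
```

The real-valued system has twice as many rows and columns as the complex one, so it remains underdetermined whenever the complex system is, which is the ill-posed regime discussed above.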
III Fixed Point Networks for Hybrid-Field THz UM-MIMO Channel Estimation
III-A Fixed Point Modeling of Neural Networks
Most iterative algorithms for solving linear inverse problems can be represented in the following general form, i.e.,
$$\mathbf{h}^{(t+1)} = f_{\theta}(\mathbf{h}^{(t)}; \mathbf{y}), \quad (12)$$
where $\mathbf{h}^{(t)}$ denotes the intermediate estimate at the $t$-th iteration, and $f_{\theta}(\cdot; \mathbf{y})$ denotes a mapping parameterized by $\theta$. Well-known examples of this general form include proximal algorithms, OAMP, and also weight-tied neural networks. The limit as $t \to \infty$, i.e., $\mathbf{h}^{*}$, provided that it exists, is a solution of the fixed point equation
$$\mathbf{h}^{*} = f_{\theta}(\mathbf{h}^{*}; \mathbf{y}), \quad (13)$$
which models the behavior of an algorithm at its convergence. In particular, if $f_{\theta}$ corresponds to one layer of a weight-tied neural network, then the fixed point $\mathbf{h}^{*}$ is the output of the network after an infinite number of layers [1, 5].
Our basic idea is to design a neural network assisted mapping $f_{\theta}$ such that its fixed point $\mathbf{h}^{*}$ is a good estimate of the hybrid-field channel given the pilot measurement $\mathbf{y}$. We refer to such a general framework as the fixed point network (FPN). Specifically, $f_{\theta}$ can be constructed by adding learnable components to any of the algorithms that belong to (12). The remaining question is how to ensure the existence of the fixed point and find it efficiently. Before further discussion, we first define two key concepts.
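Definition 1 (Lipschitz continuity).
A mapping $f$ is Lipschitz continuous if there exists a constant $L$ such that
$$\|f(\mathbf{x}_{1}) - f(\mathbf{x}_{2})\| \le L \|\mathbf{x}_{1} - \mathbf{x}_{2}\|$$
holds for any $\mathbf{x}_{1}$, $\mathbf{x}_{2}$.
Definition 2 (Contractive).
A mapping $f$ is contractive if it is Lipschitz continuous with constant $L < 1$.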
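The existence of the fixed point and an efficient way to find it can both be ensured by fixed point theory. As long as $f_{\theta}$ is a contractive mapping (no matter what detailed operations it contains), a simple repeated application of $f_{\theta}$ will make $\mathbf{h}^{(t)}$ converge linearly to the unique fixed point $\mathbf{h}^{*}$.
Theorem 3 (Banach-Picard [2, Theorem 1.50]).
For any initial value $\mathbf{h}^{(0)}$, if the sequence $\{\mathbf{h}^{(t)}\}$ is generated via the relation $\mathbf{h}^{(t+1)} = f_{\theta}(\mathbf{h}^{(t)}; \mathbf{y})$ and $f_{\theta}$ is a contractive mapping with Lipschitz constant $L$, then $\mathbf{h}^{(t)}$ converges to the unique fixed point $\mathbf{h}^{*}$ of $f_{\theta}$ with a linear convergence rate $L$. The gap between $\mathbf{h}^{(t)}$ and $\mathbf{h}^{*}$ decreases geometrically as $\|\mathbf{h}^{(t)} - \mathbf{h}^{*}\| \le L^{t} \|\mathbf{h}^{(0)} - \mathbf{h}^{*}\|$.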
This theorem reveals several unique advantages of FPNs that are not available in prevailing DU methods. First, it provides a simple and unified framework to establish a convergence guarantee. The only requirement, i.e., that $f_{\theta}$ is contractive, can be satisfied by controlling the Lipschitz constant of the neural network during training [6]. Second, the complexity of FPNs is adaptive and can be adjusted at test time. Since the fixed point iteration converges linearly to the unique fixed point $\mathbf{h}^{*}$, one can run it to an arbitrary depth depending on the desired accuracy and the hybrid-field channel condition (reflected in $\mathbf{y}$). This offers a flexible tradeoff between complexity and performance, as well as an excellent ability to adapt to changeable channel conditions.
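The adaptive-depth behavior can be illustrated on a toy problem. In the sketch below, a random affine map with spectral norm 0.5 stands in for one layer of a fixed point network, and the relative-update stopping rule is an assumed criterion that may differ from the one used in Algorithm 1. All names and sizes are hypothetical.

```python
import numpy as np


def fixed_point_iteration(f, x0, tol, max_iter=1000):
    """Iterate x <- f(x) until the relative update falls below tol."""
    x = x0
    for step in range(1, max_iter + 1):
        x_next = f(x)
        if np.linalg.norm(x_next - x) <= tol * np.linalg.norm(x_next):
            return x_next, step
        x = x_next
    return x, max_iter


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # orthogonal matrix
L = 0.5                                           # chosen Lipschitz constant
A, b = L * Q, rng.standard_normal(6)              # spectral norm of A equals L


def f(x):
    """Toy contractive affine map standing in for one network layer."""
    return A @ x + b


x_star = np.linalg.solve(np.eye(6) - A, b)        # exact fixed point for reference
for tol in (1e-1, 1e-3, 1e-6):
    x_hat, steps = fixed_point_iteration(f, np.zeros(6), tol)
    print(tol, steps, np.linalg.norm(x_hat - x_star))
```

A tighter tolerance simply runs more iterations of the same map, so accuracy can be traded against complexity at test time without retraining anything.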
As mentioned before, there are a lot of possible choices for the mapping . To incorporate wireless domain knowledge, we design it based on the algorithmic structure of a powerful compressed sensing algorithm, i.e., the OAMP. The proposed FPN-based variant of it, called the FPN-OAMP, consists of a closed-form linear estimator (LE) and a CNN-based nonlinear estimator (NLE). The mapping is a composition of them, i.e., . The process of FPN-OAMP is summarized in Fig. 7(a) and Algorithm 1.
III-B1 Linear Estimator
The LE of FPN-OAMP is similar to that of OAMP. It is given by
The LE matrix is constructed by (see footnote 1)
where is the step size that ensures , such that the LE is de-correlated.
Footnote 1: The measurement matrix is fixed since the combining matrices cannot be optimally tuned without knowledge of the channel [9]. As a result, the LE matrix is also fixed, and thus the computation of the pseudoinverse is avoided.
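A minimal sketch of one linear-estimator step is given below. It assumes the usual OAMP-style update, in which the current estimate is corrected by the measurement residual mapped back through a scaled pseudoinverse of the measurement matrix. The exact LE matrix and step size of this paper are not reproduced: the step size of 1, the matrix sizes, and all names are hypothetical.

```python
import numpy as np


def linear_estimator(h, y, M, W):
    """One LE step: correct h by the residual y - M h mapped back through W."""
    return h + W @ (y - M @ h)


rng = np.random.default_rng(1)
M = rng.standard_normal((8, 32))   # hypothetical wide measurement matrix
h_true = rng.standard_normal(32)
y = M @ h_true                     # noiseless pilots, for illustration only

eta = 1.0                          # assumed step size
W = eta * np.linalg.pinv(M)        # computed once and reused in every iteration

r = linear_estimator(np.zeros(32), y, M, W)
print(np.linalg.norm(M @ r - y))   # measurement residual after one LE step
```

Because the measurement matrix is fixed, the pseudoinverse is computed only once. With the assumed unit step size, a single LE step already makes the estimate consistent with the measurements, and the nonlinear estimator then has to resolve the remaining ambiguity.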
III-B2 Nonlinear Estimator
The NLE is given by
where is a ResNet-like structure consisting of three residual blocks (RBs), as shown in Fig. 7(b). Before the RBs, is first reshaped into a tensor of feature maps of size , each corresponding to an SA, and then passes through a convolution (Conv) layer to lift them to 64 feature maps. Each RB consists of two Conv layers with kernels and a fixed number of 64 feature maps, each followed by a ReLU activation. We further follow the RBs by two Conv layers, where the first one adopts a leaky ReLU activation, before reshaping back to the vector form .
III-C Linear Convergence of the FPN-OAMP
To prove the linear convergence of the FPN-OAMP, on the basis of Theorem 3, we need to show that is contractive, which is proved as follows.
Lemma 4 ([6]).
The composition of an $L_{1}$-Lipschitz and an $L_{2}$-Lipschitz mapping is $L_{1}L_{2}$-Lipschitz.
Each layer of the FPN-OAMP is a contractive mapping if is contractive.
We begin by showing that the Lipschitz constant of is 1. Because the non-zero eigenvalues of and are the same, the eigenvalues of equal either 0 or 1. Since is an affine mapping, its Lipschitz constant is the spectral norm of , i.e., the largest singular value of the matrix, given by
where denotes the -th eigenvalue of a matrix. Therefore, according to Lemma 4, the composition is a contractive mapping if is contractive. ∎
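The argument of the proof can be checked numerically on a toy example. The sketch below assumes a pseudoinverse-based LE with unit step size (an assumption, since the exact LE matrix is not reproduced here) and replaces the CNN with a hypothetical 0.9-Lipschitz nonlinearity. It verifies that the LE is 1-Lipschitz and that the composed layer contracts distances by at least the factor predicted by Lemma 4.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((8, 32))          # hypothetical measurement matrix
M_pinv = np.linalg.pinv(M)
P = M_pinv @ M                            # orthogonal projector onto the row space of M

# Eigenvalues of the projector are 0 or 1, so the spectral norm of I - P is 1.
eigenvalues = np.linalg.eigvalsh((P + P.T) / 2)
lipschitz_le = np.linalg.norm(np.eye(32) - P, ord=2)

lipschitz_nle = 0.9                       # assumed Lipschitz constant of the NLE
y = rng.standard_normal(8)


def layer(h):
    """Toy layer: assumed LE followed by a 0.9-Lipschitz stand-in for the CNN."""
    r = h + M_pinv @ (y - M @ h)
    return lipschitz_nle * np.tanh(r)


a, b = rng.standard_normal(32), rng.standard_normal(32)
ratio = np.linalg.norm(layer(a) - layer(b)) / np.linalg.norm(a - b)
print(lipschitz_le, ratio)
```

The observed ratio never exceeds the product of the two Lipschitz constants, which is the property that makes the whole layer contractive once the nonlinear estimator is.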
We provide details on training a contractive afterwards. Since is contractive regardless of , the linear convergence rate of FPN-OAMP will hold for different channel conditions and SNR levels. The error tolerance in Algorithm 1 explicitly controls the accuracy of the approximate fixed point , since the gap between and decreases geometrically. Adjusting can provide a flexible tradeoff between complexity and performance.
III-D Training of the FPN-OAMP
During training, we first run the fixed point iteration to find the approximate fixed point given an error tolerance . The loss function is chosen as the normalized mean squared error (NMSE) of and the ground truth channel , i.e.,
We adopt the Jacobian-free backpropagation in [5] to train the FPN-OAMP, which imposes only a constant memory overhead regardless of the number of fixed point iterations. To enforce the contractive property of , we check its Lipschitz constant after each weight update using the current batch of training data. The Lipschitz constant is approximated by
where is the batch size, corresponds to the -th training sample, and is a small random perturbation. According to Lemma 4, since the LE and the (leaky) ReLU activations in the NLE are all 1-Lipschitz, the Lipschitz constant of is determined only by the Conv layers. Additionally, since each Conv layer is an affine mapping, its Lipschitz constant is the spectral norm of the weight matrix, which can be controlled by multiplying the weight by a constant. Therefore, if the contractive property is found violated, i.e., , we can correct it by replacing the weights of each Conv layer in , i.e., , by , where is the desired Lipschitz constant of the NLE (see footnote 2).
Footnote 2: The exponent $1/9$ is chosen because the NLE has 9 Conv layers in total. Although there are residual connections in the RBs, [10] has shown that the Lipschitz constant can still be controlled in this way, as long as is close to 1, i.e., the contractive property is not seriously violated. In practice, we observed that the correction was seldom triggered during training.
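The check-and-correct procedure can be sketched on a toy network. In the example below, a bias-free stack of nine random linear layers with ReLU activations stands in for the CNN, the Lipschitz constant is estimated as the largest finite-difference ratio over a batch (an assumed form of the approximation, which is a lower bound on the true constant), and every layer is rescaled by the ninth root of the ratio between the target and the estimate. All names, sizes, and the target value 0.9 are hypothetical.

```python
import numpy as np


def toy_nle(x, weights):
    """Bias-free stack of linear layers with ReLU in between (stand-in for the CNN)."""
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)
    return weights[-1] @ x


def estimate_lipschitz(weights, batch, perturbations):
    """Largest output-to-input change ratio over the batch (assumed estimator)."""
    return max(
        np.linalg.norm(toy_nle(x + d, weights) - toy_nle(x, weights)) / np.linalg.norm(d)
        for x, d in zip(batch, perturbations)
    )


def rescale(weights, estimate, target):
    """Multiply every layer by (target / estimate) ** (1 / number_of_layers)."""
    factor = (target / estimate) ** (1.0 / len(weights))
    return [factor * W for W in weights]


rng = np.random.default_rng(3)
weights = [0.5 * rng.standard_normal((16, 16)) for _ in range(9)]
batch = rng.standard_normal((32, 16))
perturbations = 1e-3 * rng.standard_normal((32, 16))

target = 0.9
before = estimate_lipschitz(weights, batch, perturbations)
corrected = rescale(weights, before, target) if before >= 1.0 else weights
after = estimate_lipschitz(corrected, batch, perturbations)
print(before, after)
```

For this bias-free toy network the rescaling hits the target exactly, because scaling each of the nine layers by a common factor scales the output by that factor to the ninth power. With residual connections, as in the actual NLE, the correction is only approximate.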
IV Simulation Results
This section provides simulation results to evaluate the performance of the proposed FPN-OAMP method in a typical THz UM-MIMO system. The key simulation parameters are summarized in Table I. Specifically, is set as a random variable spanning both far- and near-field regions to model the hybrid-field propagation. The performance metric is the NMSE, which is averaged over a testing dataset with 5000 samples. Five benchmarks are adopted for comparison:
- LS: Least squares.
- OMP: Orthogonal matching pursuit [11].
- OAMP: OAMP with the pseudoinverse LE [13].
- FISTA: Fast iterative soft thresholding algorithm [3].
- ISTA-Net+: State-of-the-art DU method based on the iterative soft thresholding algorithm [16]. The number of layers is fixed as 10, since a further increase is observed to offer only negligible performance gain.
Table I: Key simulation parameters
|Parameter|Value|Unit|
|Number of SAs / RF chains|||
|Number of AEs per SA|||
|Number of BS antennas|||
|Angle of incidence|||
|Number of paths|||
|LoS path length||m|
|Scatterer distance ()||m|
|Time delay of LoS path||nsec|
|Time delay of NLoS paths ()||nsec|
We train two different sets of parameters for both ISTA-Net+ and the proposed FPN-OAMP: one for the low SNR scenario (0 to 10 dB), and the other for the high SNR scenario (10 to 20 dB). For each scenario, we generate 80000, 5000, and 5000 samples for training, validation, and testing, respectively. The SNR of each sample is randomly drawn from the range of its scenario. We train the networks for 150 epochs using the Adam optimizer with an initial learning rate of 0.001 and a batch size of 128. The learning rate is halved after every 30 epochs. When training the FPN-OAMP, we set the error tolerance as 0.01 and the maximum number of iterations as 15 to accelerate the process. Note that in the testing stage, FPN-OAMP can run for an arbitrary number of iterations, depending on the channel condition and the error tolerance . For a fair comparison with ISTA-Net+, during testing we also set the error tolerance as 0.01 and the maximum number of iterations as 15.
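The step learning-rate schedule described above can be written in a few lines. This sketch assumes zero-indexed epochs and a hypothetical helper name; it only reproduces the stated schedule (initial rate 0.001, halved after every 30 epochs, 150 epochs in total) and not the training loop itself.

```python
import math


def learning_rate(epoch, initial=1e-3, drop_every=30, factor=0.5):
    """Step schedule: multiply the rate by `factor` after every `drop_every` epochs."""
    return initial * factor ** (epoch // drop_every)


# Zero-indexed epochs 0..149 (assumed indexing convention).
schedule = [learning_rate(epoch) for epoch in range(150)]
print(schedule[0], schedule[30], schedule[149])
```

Under this indexing the rate is halved four times over the 150 epochs, so the final 30 epochs run at one sixteenth of the initial rate.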
In Fig. 11(a), we present the NMSE performance versus the average received SNR. It is observed that the proposed FPN-OAMP significantly outperforms all five benchmarks under different SNR levels. Compared with its base algorithm OAMP, the performance gain of FPN-OAMP is as large as about 15 dB in terms of NMSE. This indicates that the CNN component of FPN-OAMP can effectively identify and exploit the complicated hybrid-field channel conditions.
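To make the dB figures concrete, the sketch below computes an NMSE in dB for two synthetic estimators whose error amplitudes differ by a factor of $10^{15/20}$. The per-sample averaging used here is one common NMSE definition and is an assumption, as are the synthetic Gaussian channels; no result of this paper is reproduced.

```python
import numpy as np


def nmse_db(h_hat, h_true):
    """Mean over samples (rows) of ||h_hat - h||^2 / ||h||^2, expressed in dB."""
    err = np.sum(np.abs(h_hat - h_true) ** 2, axis=1)
    ref = np.sum(np.abs(h_true) ** 2, axis=1)
    return 10.0 * np.log10(np.mean(err / ref))


rng = np.random.default_rng(4)
h_true = rng.standard_normal((5000, 64))             # synthetic stand-in channels
noise_scale = 0.1
coarse = h_true + noise_scale * rng.standard_normal(h_true.shape)
fine = h_true + noise_scale * 10 ** (-15 / 20) * rng.standard_normal(h_true.shape)

print(nmse_db(coarse, h_true), nmse_db(fine, h_true))
```

A 15 dB NMSE gain therefore corresponds to an estimation error whose power is about 32 times smaller.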
In Fig. 11(b), we illustrate the NMSE evaluated at different iterations/layers when dB. LS and OMP are not plotted since they do not produce intermediate results. As observed, the NMSE of the proposed FPN-OAMP and of its base algorithm OAMP converges rapidly within only 4 iterations, while FISTA converges after about 20 iterations. Notably, FPN-OAMP at iteration 2 already outperforms the final performance of all benchmark methods, which demonstrates its superior efficiency. By contrast, the intermediate performance of the DU method, ISTA-Net+, is very unstable and keeps fluctuating until layer 7. This is because DU is a black-box solver that only seeks to optimize the final estimation quality and, unlike the proposed FPN-OAMP, does not explicitly control the internal dynamics. Additionally, the fixed number of layers limits its ability to adapt to the changeable hybrid-field channel conditions.
Fig. 11(c) provides numerical verifications for the linear convergence of FPN-OAMP under different SNR levels. In logarithmic scale, we plot the normalized gap between the intermediate estimate at iteration/layer , i.e., , and the approximate fixed point, i.e., , defined by . It is observed that the curves are all linear, which demonstrates that the linear convergence property identified in Subsection III-C is consistent under different SNR levels, and the training strategy presented in Subsection III-D is effective. Although we plot the averaged results over the testing dataset for illustration, it is worth noting that the linear convergence property holds for each individual sample.
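The reason a straight line on a logarithmic scale indicates linear convergence can be seen from a toy example. The sketch below iterates a hypothetical affine contraction with rate 0.6 (a stand-in, not the trained FPN-OAMP), records the normalized gap to the fixed point, and recovers the rate from the slope of a least-squares line fitted to the base-10 logarithm of the gap.

```python
import numpy as np

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # orthogonal matrix
rate = 0.6                                        # assumed contraction rate
A, b = rate * Q, rng.standard_normal(6)
x_star = np.linalg.solve(np.eye(6) - A, b)        # exact fixed point

x, gaps = np.zeros(6), []
for _ in range(15):
    x = A @ x + b
    gaps.append(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))

# On a log scale the gap is a straight line whose slope is log10(rate).
slope, intercept = np.polyfit(np.arange(1, 16), np.log10(gaps), deg=1)
estimated_rate = 10.0 ** slope
print(estimated_rate)
```

The same fit applied to measured gap curves gives an empirical estimate of the convergence rate, which can be compared against the Lipschitz constant enforced during training.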
V Conclusions and Future Work
In this paper, we proposed an efficient deep learning based hybrid-field channel estimator for THz UM-MIMO. Significant performance gains are then observed in comparison with state-of-the-art benchmarks. One unique advantage is the linear convergence guarantee based on fixed point theory, which is not available in the prevailing DU methods. Besides, the computational complexity is adaptive, making it more suitable for future wireless networks with complicated hybrid-field channel conditions. Further extension of the proposed FPNs to other important inverse problems in wireless communications, such as data detection, is a promising future direction.
[1] (2019-Dec.) Deep equilibrium models. In Proc. Adv. Neural Inf. Process. Syst. (NeurIPS).
[2] (2019) Convex analysis and monotone operator theory in Hilbert spaces (2nd edition). Springer.
[3] (2009-Feb.) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 (1), pp. 183–202.
[4] (2021-Jun.) Channel estimation and hybrid combining for wideband terahertz massive MIMO systems. IEEE J. Sel. Areas Commun. 39 (6), pp. 1604–1620.
[5] (2022-Feb.) JFB: Jacobian-free backpropagation for implicit models. In Proc. Assoc. Adv. Artif. Intell. (AAAI).
[6] (2021-Dec.) Regularisation of neural networks by enforcing Lipschitz continuity. Mach. Learn. 110 (2), pp. 393–416.
[7] (2020-May) Channel estimation for extremely large-scale massive MIMO systems. IEEE Wireless Commun. Lett. 9 (5), pp. 633–637.
[8] (2019-Oct.) Model-driven deep learning for physical layer communications. IEEE Wireless Commun. 26 (5), pp. 77–83.
[9] (2018-Oct.) Deep learning-based channel estimation for beamspace mmWave massive MIMO systems. IEEE Wireless Commun. Lett. 7 (5), pp. 852–855.
[10] (2021-Nov.) Feasibility-based fixed point networks. Fixed Point Theory Algorithms Sci. Eng. 2021 (1), pp. 1–19.
[11] (2016-Jun.) Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications. IEEE Trans. Commun. 64 (6), pp. 2370–2386.
[12] Edge artificial intelligence for 6G: vision, enabling technologies, and applications. IEEE J. Sel. Areas Commun. 40 (1), pp. 5–36.
[13] (2017-Jan.) Orthogonal AMP. IEEE Access 5, pp. 2020–2033.
[14] (2021-Oct.) An overview of signal processing techniques for terahertz communications. Proc. IEEE 109 (10), pp. 1628–1665.
[15] (2022-Jan.) Channel estimation for extremely large-scale massive MIMO: far-field, near-field, or hybrid-field? IEEE Commun. Lett. 26 (1), pp. 177–181.
[16] ISTA-Net: interpretable optimization-inspired deep network for image compressive sensing. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR).
[17] (2020-Jan.) Hybrid beamforming for 5G and beyond millimeter-wave systems: a holistic view. IEEE Open J. Commun. Soc. 1, pp. 77–91.