The broad range of Internet-of-Things (IoT) applications continues to contribute substantially to the economic development and the improvement of our lives . In particular, the wirelessly networked sensors are growing at an unprecedented rate, making data aggregation highly critical for IoT services . For large scale wireless networking of sensor nodes, orthogonal multiple access protocols are highly impractical because of their low spectrum utilization efficiency for IoT and the excessive network latency . In response, the concept of over-the-air computation
(AirComp) has recently been considered for computing a class of nomographic functions, such as arithmetic mean, weighted sum, geometric mean and Euclidean norm of distributed sensor data via concurrent, instead of the sequential, node transmissions. AirComp exploits the natural superposition of co-channel transmissions from multiple data source nodes .
There have already been a number of published works related to AirComp. Among them, one research thread takes on the information theoretic view and focuses on achievable computation rate under structured coding schemes. Specifically, in the seminal work of , linear source coding was designed to reliably compute a function of distributed sensor data transmitted over the multiple-access channels (MACs). Lattice codes were adopted in [6, 7] to compute the sum of source signals over MACs efficiently. Leveraging lattice coding, a compute-and-forward relaying scheme  was proposed for relay assisted networks. On the other hand, a different line of studies [8, 9] investigates the error of distributed estimation in wireless sensor networks. In particular, linear decentralized estimation was investigated in  for coherent multiple access channels. Power control was investigated in  to optimize the estimation distortion. It was shown in  that pre- and post-processing functions enable the optimization of computation performance by harnessing the interference for function computations. Another more recent line of studies focused on designing transmitter and receiver matrices in order to minimize the distortion error when computing desired functions. Among others, MIMO-AirComp equalization and channel feedback techniques for spatially multiplexing multi-function computation have been proposed . Another work developed a novel transmitter design at the multiple antennas IoT devices with zero-forcing beamforming .
However, the main limitation of current AirComp is the dependence on channel-state-information (CSI), which leads to high latency and significant overhead in the massive Internet-of-Things networks with a large amount of devices. Even though the work  has proposed a type of CSI at sensor nodes, called No CSI, the receiver still needs to obtain the statistical channel knowledge. Recently, blind demixing has become a powerful tool to elude channel-state-information (i.e., without channel estimation at both transmitters and receivers) thereby enabling low-latency communications [13, 14, 15]. Specifically, in blind demixing, a sequence of source signals can be recovered from the sum of bilinear measurements without the knowledge of channel information . Inspired by the recent progress of blind demixing, in this paper, we shall propose a novel blind over-the-air computation
(BlairComp) scheme for low-latency data aggregation, thereby computing the desired function (e.g., arithmetic mean) of sensing data vectors without the prior knowledge of channel information. However, the BlairComp problem turns out to be a highly intractable nonconvex optimization problem due to the bilinear signal model.
There is a growing body of recent works to tame the nonconvexity in solving the high-dimensional bilinear systems. Specifically, semidefinite programming was developed in  to solve the blind demixing problem by lifting the bilinear model into the matrix space. However, it is computationally prohibitive for solving large-scale problem due to the high computation and storage cost. To address this issue, the nonconvex algorithm, e.g., regularized gradient descent with spectral initialization , was further developed to optimize the variables in the natural vector space. Nevertheless, the theoretical guarantees for the regularized gradient  provide pessimistic convergence rate and require carefully-designed initialization. The Riemannian trust-region optimization algorithm without regularization was further proposed in  to improve the convergence rate. However, the second-order algorithm brings unique challenges in providing statistical guarantees. Recently, theoretical guarantees concerning regularization-free Wirtinger flow with spectral initialization for blind demixing was provided in . However, this regularization-free method still calls for spectral initialization. To find a natural implementation for the practitioners that works equally well as spectral initialization, in this paper, we shall propose to solve the BlairComp problem via randomly initialized Wirtinger flow with provable optimality guarantees.
Based on the random initialization strategy, a line of research studies the benign global landscapes for the high-dimensional nonconvex estimation problems, followed by designing generic saddle-point escaping algorithms, e.g., noisy stochastic gradient descent, trust-region method , perturbed gradient descent . With sufficient samples, these algorithms are guaranteed to converge globally for phase retrieval , matrix recovery , matrix sensing , robust PCA 
and shallow neural networks, where all local minima are provably as good as global and all the saddle points are strict. However, the theoretical results developed in [17, 18, 19, 20, 21, 22] are fairly general and may yield pessimistic convergence rate guarantees. Moreover, these saddle-point escaping algorithms are more complicated for implementation than the natural vanilla gradient descent or Wirtinger flow. To advance the theoretical analysis for gradient descent with random initialization, the fast global convergence guarantee concerning randomly initialized gradient descent for phase retrieval has been recently provided in .
In this paper, our main contribution is to provide the global convergence guarantee concerning Wirtinger flow with random initialization for solving the nonconvex BlairComp problem. It turns out that, for BlairComp, the procedure of Wirtinger flow with random initialization can be separated into two stages:
Stage I: the estimation error is nearly stable, which takes only a few iterations,
Stage II: the estimation error decays exponentially at a linear convergence rate.
In addition, we identify the exponential growth of the magnitude ratios of the signals to perpendicular components, which explains why Stage I lasts only for a few iterations.
Throughout this paper, or denotes that there exists a constant such that whereas means that there exists a constant such that . denotes that there exists some sufficiently large constant such that . In addition, the notation means that there exists constants such that . Let superscripts and denote the transpose and conjugate transpose of a matrix/vector, respectively. Let the superscript denote the conjugate transpose of a complex number.
Ii Problem Formulation
Blind over-the-air computation (BlairComp) aims to facilitate low-latency data aggregation in IoT networks without a priori knowledge of CSI. This is achieved by computing the desired functions of the distributed sensing data based on the natural signal superposition of transmission over multi-access channels.
Ii-a Blind Over-the-Air Computation
We consider a wireless sensor network consisting of active sensor nodes and a single fusion center. Let denote the sensor data vector collected at the -th node. The fusion center, through AirComp, aims to compute nomographic functions of distributed data that can be decomposed as 
Function denotes the pre-processing function by the sensor nodes and denotes the post-processing function at the fusion center. Typical nomographic functions by AirComp include the arithmetic mean, weighted sum, geometric mean, polynomial, Euclidean norm .
In this work, we focus on a specific nomographic function
where is the preprocessed data vector transmitted by the -th node. Over channel access opportunities (e.g., time slots), the received signals at fusion center in the frequency domain can be written as [13, 15]
where are the access vectors, the CSI vectors, and is an independent circularly symmetric complex Gaussian measurement noise.
To compute the desired functions via BlairComp without knowledge of , we can consider a precoding scheme with randomly selected known vectors
follows i.i.d. circularly symmetric complex normal distributionfor . Furthermore, the first
columns of the unitary discrete Fourier transform (DFT) matrixform the known matrix . The target of BlairComp is to compute the desired function vector via concurrent transmissions without channel information, thereby providing low-latency data aggregation in the IoT networks.
Ii-B Multi-Dimensional Nonconvex Estimation
One way to estimate the result vector from the received signals in (3) is to use with as the ambiguity alignment parameter, for which one could first solve the bilinear optimization problem:
which is highly nonconvex. To measure the computation accuracy for BlairComp problem, we define the following metric
Note that are ambiguity alignment parameters such that
To estimate , one reference symbol in is needed.
In this paper, we shall propose to solve the high-dimensional BlairComp problem via Wirtinger flow with random initialization. Our main contribution is to provide the statistical optimality and convergence guarantee for the randomly initialized Wirtinger flow algorithm by exploiting the benign geometry of the high-dimensional BlairComp problem.
Iii Main Approach
In this section, we first propose an algorithm based on randomly initialized Wirtinger flow to solve the BlairComp problem . We shall present a statistical analysis to demonstrate the optimality of this algorithm for solving the high-dimensional nonconvex estimation problem.
Iii-a Randomly Initialized Wirtinger Flow Algorithm
Wirtinger flow with random initialization is an iterative algorithm with a simple gradient descent update procedure without regularization. Specifically, the gradient step of Wirtinger flow is represented by the notion of Wirtinger derivatives , i.e., the derivatives of real valued functions over complex variables.
To simplify the notations, we denote , where
For each , and denote the Wirtinger gradient of with respect to and respectively as:
In light of the Wirtinger gradient (8), the update rule of Wirtinger flow uses a stepsize via
Before proceed to theoretical analysis, we first present an example to illustrate the practical efficiency of Wirtinger flow with random initialization for solving problem (4). The ground truth values and initial points are randomly generated according to
for . In all our simulations, we set and normalize for . Specifically, for each value of , and , the design vectors ’s and ’s for each , are generated according to the descriptions in Section II. With the chosen step size in all settings, Fig. 1 shows the relative error, i.e., (5), versus the iteration count. We observe the convergence of Wirtinger flow with random initialization exhibits two stages: Stage I: within dozens of iterations, the relative error remains nearly flat, Stage II: the relative error shows exponential decay despite the different problem sizes.
In practical scenario, the estimation error of ambiguity alignment parameters would have influences on the relative error, i.e., (5). Hence, we illustrate the relationship between the estimation error of ambiguity alignment parameters and the relative error via the following experiment. Let , , the step size be and the number of users . In each iteration, for , the estimated ambiguity alignment parameter is given by where is given by (6) and is the additive noise. In the experiment, the parameter varies from to . Fig. 1 shows the relative error versus the parameter . Both the relative error and the parameter are shown in the dB scale. As we can see, the relative error scales linearly with the parameter .
Iii-B Theoretical Analysis
To present the main theorem, we first introduce several fundamental definitions. Specifically, the incoherence parameter , which characterizes the incoherence between and for .
Definition 1 (Incoherence for BlairComp).
Let the incoherence parameter be the smallest number such that
Let and , respectively, denote
where ’s are alignment parameters. We further define the norm of the signal component and the perpendicular component with respect to for as
respectively. Here, ’s are the alignment parameters. Similarly, the norms of the signal component and the perpendicular component with respect to for can be represented as
Without loss of generality, we assume () for and , for . Define the condition number with . Then the main theorem is presented in the following.
Assume that the initial points obey (11) for and the stepsize satisfies . Suppose that the sample size satisfies for some sufficiently large constant . Then with probability at least
. Then with probability at leastfor some constants , there exists a sufficiently small constant and such that
The randomly initialized Wirtinger flow makes the estimation error decays exponentially, i.e.,
The magnitude ratios of the signal component to the perpendicular component with respect to and obey
respectively, where for some constants .
The normalized root mean square error for obeys
for some constant .
Theorem 1 provides precise statistical analysis on the computational efficiency of Wirtinger flow with random initialization. Specifically, in Stage I, it takes iterations for randomly initialized Wirtinger flow to reach sufficient small relative error, i.e., where is some sufficiently small constant. The short duration of Stage I is own to the exponential growth of the magnitude ratio of the signal component to the perpendicular components. Moreover, in Stage II, it takes iterations to reach -accurate solution at a linear convergence rate. Thus, the iteration complexity of randomly initialized Wirtinger flow is guaranteed to be as long as the sample size exceeds .
To further illustrate the relationship between the signal component (resp. ) and the perpendicular component (resp. ) for , we provide the simulation results under the setting of , , and with for . In particular, , versus iteration count (resp. , versus iteration count) for is demonstrated in Fig. 2 (resp. Fig. 2). Consider Fig. 1, Fig. 2 and Fig. 2 collectively, it shows that despite the rare decline of the estimation error, i.e., , during Stage I, the size of the signal component, i.e., and for each , exponentially increase and the signal component becomes dominant component at the end of Stage I. Furthermore, the exponential growth of the ratio (resp. ) for each is illustrated in Fig. 2 (resp. Fig. 2).
Iv Dynamics Analysis
In this section, we prove the main theorem by investigating the dynamics of the iterates of Wirtinger flow with random initialization. The steps of proving Theorem 1 are summarized as follows.
Dynamics of population-level state evolution. Provide the population-level state evolution of (24a) and (24b), (25a), (25b) respectively, where the sample size approaches infinity. We then develop the approximate state evolution (27), which are remarkably close to the population-level state evolution, in the finite-sample regime. See details in Section IV-A.
Leave-one-out arguments. Prove that with high probability , , and satisfy the approximate state evolution (27) if the iterates are independent with . Please refer to Lemma 2. To achieve this, the “near-independence" between and is established via exploiting leave-one-out arguments and some variants of the arguments. Specifically, the leave-one-out sequences and random-sign sequences are constructed in Section IV-C. The concentrations between the original and these auxiliary sequences are then provided in Lemma 4-Lemma 9.
Stage II: Local geometry in the region of incoherence and contraction. We invoke the prior theory provided in  to show local convergence of the random initialized Wirtinger flow in Stage II.
Iv-a Dynamics of Population-level State Evolution
Without loss the generality, we assume that for , where are some constants and , and
denotes the first standard basis vector. This assumption is based on the rotational invariance of Gaussian distributions. Since the deterministic nature of, the ground truth signals (channel vectors) cannot be transferred to a simple form, which yields more tedious analysis procedure. For simplification, for , we denote
To study the population-level state evolution, we start with consider the case where the sequences (refer to (7)) are established via the population gradient, i.e., for ,
Here, the population gradients are computed based on the assumption that (resp. ) and (resp. ) are independent with each other. With simple calculations, the dynamics for both the signal and the perpendicular components with respect to , are given as
Assuming that is sufficiently small and () for and recognizing that , we arrive at the following population-level state evolution for both and :
Likewise, the population-level state evolution for both and :
In finite-sample case, the dynamics of the randomly initialized Wirtinger flow iterates can be represented as
where and represent the perturbation terms.
Iv-B Dynamics of Approximate State Evolution
To begin with, we define the discrepancy between the estimate and the ground truth as the distance function, given as
for , then . Moreover, based triangle inequality, there is .
In this subsection, we shall show that as long as the approximate state evolution (27) holds, there exists some constant satisfying condition (IV-B). This is demonstrated in the following Lemma. Prior to that, we first list several conditions and definitions that contribute to the lemma.
The initial points obey
(30a) (30b) (30c) for .
where is some sufficiently small constant.
for some small absolute positive constants .
For , it has
for some constants .
Then with the stepsize following , one has that , , .