I. Introduction
The broad range of Internet-of-Things (IoT) applications continues to contribute substantially to economic development and the improvement of our lives [1]. In particular, wirelessly networked sensors are growing at an unprecedented rate, making data aggregation highly critical for IoT services [2]. For large-scale wireless networking of sensor nodes, orthogonal multiple access protocols are highly impractical because of their low spectrum utilization efficiency and excessive network latency [3]. In response, the concept of over-the-air computation (AirComp) has recently been considered for computing a class of nomographic functions, such as the arithmetic mean, weighted sum, geometric mean and Euclidean norm of distributed sensor data, via concurrent, instead of sequential, node transmissions
[4]. AirComp exploits the natural superposition of co-channel transmissions from multiple data source nodes [5]. There have already been a number of published works related to AirComp. Among them, one research thread takes the information-theoretic view and focuses on the achievable computation rate under structured coding schemes. Specifically, in the seminal work of [6], linear source coding was designed to reliably compute a function of distributed sensor data transmitted over multiple-access channels (MACs). Lattice codes were adopted in [6, 7] to efficiently compute the sum of source signals over MACs. Leveraging lattice coding, a compute-and-forward relaying scheme [5] was proposed for relay-assisted networks. On the other hand, a different line of studies [8, 9] investigates the error of distributed estimation in wireless sensor networks. In particular, linear decentralized estimation was investigated in [8] for coherent multiple-access channels. Power control was investigated in [9] to optimize the estimation distortion. It was shown in [10] that pre- and post-processing functions enable the optimization of computation performance by harnessing interference for function computation. Another, more recent, line of studies focuses on designing transmitter and receiver matrices to minimize the distortion error when computing desired functions. Among others, MIMO-AirComp equalization and channel feedback techniques for spatially multiplexed multi-function computation have been proposed [3]. Another work developed a novel transmitter design for multi-antenna IoT devices with zero-forcing beamforming [11].
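To make the superposition principle concrete, the following toy numpy sketch (all dimensions and variable names are illustrative assumptions, not taken from the paper) shows how a fusion center obtains the arithmetic mean of distributed data in a single shot when each node applies ideal channel-inversion precoding — which requires exactly the transmitter-side CSI that the blind scheme studied here aims to remove.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 5, 8                      # illustrative: 5 nodes, data vectors of length 8

data = rng.normal(size=(K, N))                     # sensor data d_k (real-valued here)
h = rng.normal(size=K) + 1j * rng.normal(size=K)   # flat-fading channel coefficients

# Each node precodes by inverting its own channel (assumes perfect CSI at
# the transmitters -- the overhead that a blind scheme avoids).
tx = data / h[:, None]

# The multiple-access channel superimposes all transmissions "over the air".
y = np.sum(h[:, None] * tx, axis=0)

arithmetic_mean = np.real(y) / K   # post-processing at the fusion center
```

The channel contributions cancel exactly, so the receiver observes the plain sum of the data vectors and only needs to divide by the number of nodes.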
However, the main limitation of current AirComp is its dependence on channel-state information (CSI), which leads to high latency and significant overhead in massive IoT networks with a large number of devices. Even though the work [12] proposed a setting without CSI at the sensor nodes, called "No CSI", the receiver still needs to obtain statistical channel knowledge. Recently, blind demixing has emerged as a powerful tool to dispense with CSI (i.e., without channel estimation at either the transmitters or the receiver), thereby enabling low-latency communications [13, 14, 15]. Specifically, in blind demixing, a sequence of source signals can be recovered from the sum of bilinear measurements without knowledge of the channel information [16]. Inspired by the recent progress on blind demixing, in this paper we propose a novel blind over-the-air computation (BlairComp) scheme for low-latency data aggregation, which computes the desired function (e.g., the arithmetic mean) of the sensing data vectors without prior knowledge of the channel information. However, the BlairComp problem turns out to be a highly intractable nonconvex optimization problem due to its bilinear signal model.
There is a growing body of recent work on taming the nonconvexity of high-dimensional bilinear systems. Specifically, semidefinite programming was developed in [14] to solve the blind demixing problem by lifting the bilinear model into the matrix space. However, it is computationally prohibitive for large-scale problems due to its high computation and storage cost. To address this issue, nonconvex algorithms, e.g., regularized gradient descent with spectral initialization [13], were developed to optimize the variables in the natural vector space. Nevertheless, the theoretical guarantees for regularized gradient descent [13] provide a pessimistic convergence rate and require carefully designed initialization. A Riemannian trust-region optimization algorithm without regularization was further proposed in [15] to improve the convergence rate. However, this second-order algorithm brings unique challenges in providing statistical guarantees. Recently, theoretical guarantees concerning regularization-free Wirtinger flow with spectral initialization for blind demixing were provided in [16]. However, this regularization-free method still calls for spectral initialization. To find a natural implementation for practitioners that works equally well as spectral initialization, in this paper we propose to solve the BlairComp problem via randomly initialized Wirtinger flow with provable optimality guarantees.
Based on the random initialization strategy, a line of research studies the benign global landscapes of high-dimensional nonconvex estimation problems, followed by designing generic saddle-point escaping algorithms, e.g., noisy stochastic gradient descent [17], the trust-region method [18], and perturbed gradient descent [19]. With sufficient samples, these algorithms are guaranteed to converge globally for phase retrieval [18], matrix recovery [20], matrix sensing [21], robust PCA [21] and shallow neural networks [22], where all local minima are provably as good as global minima and all saddle points are strict. However, the theoretical results developed in [17, 18, 19, 20, 21, 22] are fairly general and may yield pessimistic convergence rate guarantees. Moreover, these saddle-point escaping algorithms are more complicated to implement than natural vanilla gradient descent or Wirtinger flow. To advance the theoretical analysis of gradient descent with random initialization, a fast global convergence guarantee for randomly initialized gradient descent applied to phase retrieval was recently provided in [23].

In this paper, our main contribution is to provide a global convergence guarantee for Wirtinger flow with random initialization applied to the nonconvex BlairComp problem. It turns out that, for BlairComp, the procedure of Wirtinger flow with random initialization can be separated into two stages:

Stage I: the estimation error is nearly stable, which takes only a few iterations,

Stage II: the estimation error decays exponentially at a linear convergence rate.
In addition, we identify an exponential growth of the magnitude ratios of the signal components to the perpendicular components, which explains why Stage I lasts only for a few iterations.
Notations
Throughout this paper, $f(n)\lesssim g(n)$ or $f(n)=O(g(n))$ denotes that there exists a constant $c>0$ such that $|f(n)|\le c|g(n)|$, whereas $f(n)\gtrsim g(n)$ means that there exists a constant $c>0$ such that $|f(n)|\ge c|g(n)|$. $f(n)\gg g(n)$ denotes that there exists some sufficiently large constant $c>0$ such that $|f(n)|\ge c|g(n)|$. In addition, the notation $f(n)\asymp g(n)$ means that there exist constants $c_1,c_2>0$ such that $c_1|g(n)|\le|f(n)|\le c_2|g(n)|$. The superscripts $(\cdot)^{\top}$ and $(\cdot)^{\mathsf{H}}$ denote the transpose and conjugate transpose of a matrix/vector, respectively, and the superscript $(\cdot)^{*}$ denotes the conjugate of a complex number.
II. Problem Formulation
Blind over-the-air computation (BlairComp) aims to facilitate low-latency data aggregation in IoT networks without a priori knowledge of CSI. This is achieved by computing the desired functions of the distributed sensing data based on the natural signal superposition of transmissions over multiple-access channels.
II-A Blind Over-the-Air Computation
We consider a wireless sensor network consisting of $s$ active sensor nodes and a single fusion center. Let $\bm{d}_k$ denote the sensor data vector collected at the $k$-th node. The fusion center, through AirComp, aims to compute nomographic functions of the distributed data that can be decomposed as [10]

$$f(\bm{d}_1,\dots,\bm{d}_s)=\psi\Big(\sum_{k=1}^{s}\phi_k(\bm{d}_k)\Big). \qquad (1)$$

Here, $\phi_k$ denotes the pre-processing function at the $k$-th sensor node and $\psi$ denotes the post-processing function at the fusion center. Typical nomographic functions computable by AirComp include the arithmetic mean, weighted sum, geometric mean, polynomial, and Euclidean norm [10].
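As a minimal sketch of the nomographic decomposition (the function and its pre-/post-processing split are the standard textbook example, not taken from the paper's experiments), the geometric mean can be computed from a plain sum — exactly the operation the multiple-access channel provides — by pre-processing with a logarithm and post-processing with a scaled exponential:

```python
import math

def geometric_mean_aircomp(values):
    """Geometric mean as a nomographic function psi(sum_k phi(d_k)),
    with pre-processing phi = log and post-processing psi = exp(./K)."""
    K = len(values)
    superposed = sum(math.log(v) for v in values)  # what the channel would add up
    return math.exp(superposed / K)                # post-processing at the fusion center
```

For instance, `geometric_mean_aircomp([2.0, 8.0])` returns the geometric mean of 2 and 8, namely 4.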
In this work, we focus on the specific nomographic function

$$\bm{g}:=\sum_{k=1}^{s}\bm{x}_k, \qquad (2)$$

where $\bm{x}_k=\phi_k(\bm{d}_k)\in\mathbb{C}^{N}$ is the pre-processed data vector transmitted by the $k$-th node. Over $m$ channel access opportunities (e.g., time slots), the received signal at the fusion center in the frequency domain can be written as [13, 15]

$$y_j=\sum_{k=1}^{s}\bm{b}_j^{\mathsf{H}}\bm{h}_k\bm{x}_k^{\top}\bm{a}_j+e_j,\quad 1\le j\le m, \qquad (3)$$

where the $\bm{a}_j\in\mathbb{C}^{N}$ are the access vectors, the $\bm{h}_k\in\mathbb{C}^{K}$ are the CSI vectors, and $e_j$ is independent circularly symmetric complex Gaussian measurement noise.
To compute the desired function via BlairComp without knowledge of the CSI vectors $\{\bm{h}_k\}$, we consider a precoding scheme with randomly selected known vectors: each $\bm{a}_j$ follows the i.i.d. circularly symmetric complex normal distribution $\mathcal{CN}(\bm{0},\bm{I}_N)$ for $1\le j\le m$. Furthermore, the first $K$ columns of the $m\times m$ unitary discrete Fourier transform (DFT) matrix form the known matrix $\bm{B}$ [13]. The target of BlairComp is to compute the desired function vector $\bm{g}$ via concurrent transmissions without channel information, thereby providing low-latency data aggregation in IoT networks.

II-B Multi-Dimensional Nonconvex Estimation
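The measurement model can be simulated in a few lines of numpy. The sketch below (sizes and variable names are illustrative assumptions) draws Gaussian access vectors, builds the known matrix from the first columns of the unitary DFT matrix, and forms the noiseless received signal under the bilinear model $y_j=\sum_k \bm{b}_j^{\mathsf{H}}\bm{h}_k\bm{x}_k^{\top}\bm{a}_j$:

```python
import numpy as np

rng = np.random.default_rng(0)
s, K, N, m = 2, 4, 6, 64          # illustrative sizes: s nodes, m channel uses

h = rng.normal(size=(s, K)) + 1j * rng.normal(size=(s, K))   # unknown CSI vectors
x = rng.normal(size=(s, N)) + 1j * rng.normal(size=(s, N))   # pre-processed data

# Known design: rows of A are the random access vectors a_j ~ CN(0, I_N);
# B holds the first K columns of the m x m unitary DFT matrix, so that
# b_j^H h equals (B.conj() @ h)[j].
A = (rng.normal(size=(m, N)) + 1j * rng.normal(size=(m, N))) / np.sqrt(2)
B = np.fft.fft(np.eye(m))[:, :K] / np.sqrt(m)

# Noiseless received signal: y_j = sum_k b_j^H h_k x_k^T a_j.
y = sum((B.conj() @ h[k]) * (A @ x[k]) for k in range(s))
```

Because the DFT matrix is unitary, the columns of `B` are orthonormal, a property the theoretical analysis relies on.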
One way to estimate the result vector $\bm{g}$ from the received signals in (3) is to use $\sum_{k=1}^{s}\omega_k\bm{x}_k$, with $\omega_k$ as the ambiguity alignment parameter, for which one could first solve the bilinear optimization problem

$$\mathop{\mathrm{minimize}}_{\{\bm{h}_k\},\{\bm{x}_k\}}\;\; f(\bm{h},\bm{x}):=\sum_{j=1}^{m}\Big|\sum_{k=1}^{s}\bm{b}_j^{\mathsf{H}}\bm{h}_k\bm{x}_k^{\top}\bm{a}_j-y_j\Big|^2, \qquad (4)$$

which is highly nonconvex. To measure the computation accuracy of the BlairComp problem, we define the following relative-error metric

(5)
Note that the $\omega_k$'s are ambiguity alignment parameters such that

(6)

To estimate $\omega_k$, one reference symbol in $\bm{x}_k$ is needed.
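The role of the reference symbol can be illustrated with a short sketch (variable names and sizes are hypothetical). A blind solver can at best recover each signal up to an unknown complex scalar; one known entry suffices to estimate that scalar and undo it:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
x_true = rng.normal(size=N) + 1j * rng.normal(size=N)
x_true[0] = 1.0 + 0.0j            # known reference symbol in the first entry

omega = 0.7 - 1.3j                # unknown scaling ambiguity of the estimate
x_hat = omega * x_true            # what a blind solver can at best return

# Align using the reference symbol: estimate the scalar from the known entry.
omega_hat = x_hat[0] / x_true[0]
x_aligned = x_hat / omega_hat
```

After division by the estimated scalar, the aligned estimate matches the ground truth exactly in this noiseless toy setting.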
In this paper, we propose to solve the high-dimensional BlairComp problem via Wirtinger flow with random initialization. Our main contribution is to provide statistical optimality and convergence guarantees for the randomly initialized Wirtinger flow algorithm by exploiting the benign geometry of the high-dimensional BlairComp problem.
III. Main Approach
In this section, we first propose an algorithm based on randomly initialized Wirtinger flow to solve the BlairComp problem (4). We then present a statistical analysis demonstrating the optimality of this algorithm for solving the high-dimensional nonconvex estimation problem.
III-A Randomly Initialized Wirtinger Flow Algorithm
Wirtinger flow with random initialization is an iterative algorithm with a simple gradient descent update procedure and no regularization. Specifically, the gradient step of Wirtinger flow is represented by the notion of Wirtinger derivatives [24], i.e., derivatives of real-valued functions over complex variables.
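A minimal sketch of the idea, on a toy objective unrelated to BlairComp: for a real-valued function of a complex variable, descent follows the Wirtinger derivative with respect to the conjugate variable. For $f(z)=|z-c|^2$ this derivative is simply $z-c$ (the constant `c` and step size below are arbitrary choices for illustration):

```python
# Toy illustration (not the BlairComp objective): minimize f(z) = |z - c|^2
# over complex z. Writing f = (z - c) * conj(z - c), the Wirtinger derivative
# with respect to conj(z) is (z - c), and gradient descent reads
# z <- z - eta * df/dconj(z).
c = 1.0 - 2.0j
z = 5.0 + 3.0j                   # arbitrary starting point
eta = 0.2
for _ in range(100):
    z = z - eta * (z - c)        # Wirtinger-gradient step
```

Each step shrinks the error $z-c$ by the factor $1-\eta$, so the iterate converges geometrically to the minimizer $c$.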
To simplify the notation, we denote , where
(7) 
For each , the Wirtinger gradients of with respect to and are denoted, respectively, as:
(8a)  
(8b) 
In light of the Wirtinger gradients (8), the update rule of Wirtinger flow with step size is given by
(9) 
Before proceeding to the theoretical analysis, we first present an example illustrating the practical efficiency of Wirtinger flow with random initialization for solving problem (4). The ground truth values and initial points are randomly generated according to

(10)  
(11)

for . In all our simulations, the ground-truth signals are normalized. Specifically, for each problem size, the design vectors ’s and ’s are generated according to the descriptions in Section II. With the chosen step size in all settings, Fig. 1 shows the relative error, i.e., (5), versus the iteration count. We observe that the convergence of Wirtinger flow with random initialization exhibits two stages. Stage I: within dozens of iterations, the relative error remains nearly flat; Stage II: the relative error decays exponentially, despite the different problem sizes.
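The experiment above can be sketched end-to-end in numpy. The snippet below (all sizes, the step size, and the iteration budget are illustrative assumptions, not the paper's settings) runs randomly initialized Wirtinger flow on a small synthetic instance of the bilinear objective, assuming the measurement model $y_j=\sum_k \bm{b}_j^{\mathsf{H}}\bm{h}_k\bm{x}_k^{\top}\bm{a}_j$; the gradients are the standard Wirtinger derivatives of the squared residual:

```python
import numpy as np

rng = np.random.default_rng(3)
s, K, N, m = 2, 4, 4, 100         # illustrative sizes: s nodes, m measurements
eta = 5e-3                        # illustrative step size

# Ground-truth channels h_k and signals x_k (names/dimensions are assumptions).
h_true = [rng.normal(size=K) + 1j * rng.normal(size=K) for _ in range(s)]
x_true = [rng.normal(size=N) + 1j * rng.normal(size=N) for _ in range(s)]

# Known design: rows of A are Gaussian access vectors a_j; B holds the first
# K columns of the unitary DFT matrix, so b_j^H h = (B.conj() @ h)[j].
A = (rng.normal(size=(m, N)) + 1j * rng.normal(size=(m, N))) / np.sqrt(2)
B = np.fft.fft(np.eye(m))[:, :K] / np.sqrt(m)
y = sum((B.conj() @ hk) * (A @ xk) for hk, xk in zip(h_true, x_true))

def residual(h, x):
    return sum((B.conj() @ hk) * (A @ xk) for hk, xk in zip(h, x)) - y

def loss(h, x):
    return np.sum(np.abs(residual(h, x)) ** 2)

# Random initialization: no spectral step, just small Gaussian vectors.
h = [0.1 * (rng.normal(size=K) + 1j * rng.normal(size=K)) for _ in range(s)]
x = [0.1 * (rng.normal(size=N) + 1j * rng.normal(size=N)) for _ in range(s)]

loss0 = loss(h, x)
for _ in range(500):
    r = residual(h, x)
    # Wirtinger gradients of sum_j |r_j|^2 w.r.t. conj(h_k) and conj(x_k).
    grad_h = [B.T @ (r * np.conj(A @ xk)) for xk in x]
    grad_x = [A.conj().T @ (r * np.conj(B.conj() @ hk)) for hk in h]
    h = [hk - eta * g for hk, g in zip(h, grad_h)]
    x = [xk - eta * g for xk, g in zip(x, grad_x)]
```

Tracking `loss` per iteration reproduces the qualitative two-stage picture: a near-flat initial phase followed by a fast decay once the signal components dominate.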
In practical scenarios, the estimation error of the ambiguity alignment parameters influences the relative error (5). Hence, we illustrate the relationship between the estimation error of the ambiguity alignment parameters and the relative error via the following experiment, with the problem size, step size and number of users fixed as before. In each iteration, the estimated ambiguity alignment parameter is given by the true parameter in (6) plus additive noise. In the experiment, the noise parameter is varied over several orders of magnitude. Fig. 1 shows the relative error versus this parameter, both in dB scale. As we can see, the relative error scales linearly with the noise parameter.
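The linear scaling can be reproduced with a small sketch (sizes and names are hypothetical): perturbing each alignment parameter by a complex error of fixed direction and magnitude `delta` perturbs the aggregated vector by an amount exactly proportional to `delta`:

```python
import numpy as np

rng = np.random.default_rng(4)
s, N = 4, 8                       # illustrative: 4 nodes, length-8 data vectors
x = rng.normal(size=(s, N)) + 1j * rng.normal(size=(s, N))
g = x.sum(axis=0)                 # desired aggregate under perfect alignment

# Perturb each alignment parameter by a complex error of magnitude delta,
# keeping the error directions fixed so the scaling is exact.
phases = np.exp(2j * np.pi * rng.random(s))
errors = []
for delta in (1e-3, 1e-2, 1e-1):
    g_hat = ((1.0 + delta * phases)[:, None] * x).sum(axis=0)
    errors.append(np.linalg.norm(g_hat - g) / np.linalg.norm(g))
```

Each tenfold increase in `delta` raises the relative error tenfold, matching the linear trend observed in the figure.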
III-B Theoretical Analysis
To present the main theorem, we first introduce several fundamental definitions, in particular the incoherence parameter [13], which characterizes the incoherence between the vectors $\bm{b}_j$ and the channel vectors.
Definition 1 (Incoherence for BlairComp).
Let the incoherence parameter be the smallest number such that
Let and , respectively, denote
(12) 
where ’s are alignment parameters. We further define the norm of the signal component and the perpendicular component with respect to for as
(13)  
(14) 
respectively. Here, ’s are the alignment parameters. Similarly, the norms of the signal component and the perpendicular component with respect to for can be represented as
(15)  
(16) 
respectively.
Without loss of generality, we assume () for and , for , and define the condition number with . The main theorem is then presented as follows.
Theorem 1.
Assume that the initial points obey (11) for and the step size satisfies . Suppose that the sample size satisfies for some sufficiently large constant . Then, with probability at least for some constants , there exists a sufficiently small constant and such that:
The randomly initialized Wirtinger flow makes the estimation error decay exponentially, i.e.,
(17) 
The magnitude ratios of the signal component to the perpendicular component with respect to and obey
(18a) (18b) respectively, where for some constants .

The normalized root mean square error for obeys
(19) for some constant .
Theorem 1 provides a precise statistical analysis of the computational efficiency of Wirtinger flow with random initialization. Specifically, in Stage I, it takes only a few iterations for randomly initialized Wirtinger flow to reach a sufficiently small relative error. The short duration of Stage I is owing to the exponential growth of the magnitude ratios of the signal components to the perpendicular components. Moreover, in Stage II, the estimation error decays to an accurate solution at a linear convergence rate. Thus, the iteration complexity of randomly initialized Wirtinger flow is guaranteed to be low as long as the sample size is sufficiently large.
To further illustrate the relationship between the signal components and the perpendicular components, we provide simulation results under the same settings as before. In particular, the signal and perpendicular components versus the iteration count are demonstrated in Fig. 2. Considering Fig. 1 and Fig. 2 collectively, we see that despite the barely noticeable decline of the estimation error during Stage I, the sizes of the signal components increase exponentially and the signal components become dominant by the end of Stage I. Furthermore, the exponential growth of the ratio of the signal component to the perpendicular component is also illustrated in Fig. 2.
IV. Dynamics Analysis
In this section, we prove the main theorem by investigating the dynamics of the iterates of Wirtinger flow with random initialization. The steps of proving Theorem 1 are summarized as follows.

Stage I:

Dynamics of population-level state evolution. We provide the population-level state evolution (24a), (24b), (25a) and (25b), where the sample size approaches infinity. We then develop the approximate state evolution (27), which is remarkably close to the population-level state evolution in the finite-sample regime. See Section IV-A for details.

Leave-one-out arguments. We prove that, with high probability, the iterates satisfy the approximate state evolution (27) provided that they are independent of the design vectors; please refer to Lemma 2. To achieve this, the “near-independence” between the iterates and the design vectors is established by exploiting leave-one-out arguments and some variants thereof. Specifically, the leave-one-out sequences and random-sign sequences are constructed in Section IV-C. The concentration bounds between the original and these auxiliary sequences are then provided in Lemmas 4–9.


Stage II: Local geometry in the region of incoherence and contraction. We invoke the prior theory provided in [16] to show local convergence of randomly initialized Wirtinger flow in Stage II.
IV-A Dynamics of Population-level State Evolution
In this subsection, we investigate the dynamics of the population-level (infinite-sample) state evolution of (13), (14), (15) and (16).
Without loss of generality, we assume that each ground-truth signal is a constant multiple of the first standard basis vector. This assumption is based on the rotational invariance of Gaussian distributions. Owing to the deterministic nature of the DFT-based design matrix, the ground-truth channel vectors cannot be reduced to such a simple form, which yields a more tedious analysis procedure. For simplicity, we denote

(20)

as the first entry and the second through the last entries, respectively. Based on this alignment assumption, (15) and (16) can be reformulated as
(21) 
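Under this alignment assumption, the signal/perpendicular split is just a coordinate split, as the following tiny sketch illustrates (the vector and its dimension are hypothetical): when the ground truth is proportional to the first standard basis vector $e_1$, the signal component of an iterate is its first coordinate and the perpendicular component is everything else.

```python
import numpy as np

rng = np.random.default_rng(5)
K = 6
h = rng.normal(size=K) + 1j * rng.normal(size=K)   # a running iterate

# With the ground truth proportional to e_1, the decomposition is a
# coordinate split: first entry vs. the remaining entries.
signal_mag = abs(h[0])                  # size of the signal component
perp_mag = np.linalg.norm(h[1:])        # size of the perpendicular component
```

By the Pythagorean theorem, the two magnitudes recombine into the full norm of the iterate, which is why tracking their ratio captures the whole dynamics.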
To study the population-level state evolution, we start by considering the case where the sequences (refer to (7)) are generated via the population gradient, i.e., for ,
(22) 
where
Here, the population gradients are computed based on the assumption that (resp. ) and (resp. ) are independent of each other. With simple calculations, the dynamics for both the signal and the perpendicular components are given as
(23a)  
(23b) 
Assuming that the step size is sufficiently small and () for , and recognizing that , we arrive at the following population-level state evolution for both and :
(24a)  
(24b) 
Likewise, the population-level state evolution for both and is given by:
(25a)  
(25b) 
In the finite-sample case, the dynamics of the randomly initialized Wirtinger flow iterates can be represented as
(26) 
Under the assumption that the last term in (26) is well-controlled, which will be justified in Appendix B, we arrive at the approximate state evolution:
(27a)  
(27b)  
(27c)  
(27d) 
where and represent the perturbation terms.
IV-B Dynamics of Approximate State Evolution
To begin with, we define the discrepancy between the estimate and the ground truth as the distance function, given as
(28) 
where for . Here, and each is the alignment parameter. It is easily seen that if (13), (14), (15) and (16) obey
(29) 
for , then . Moreover, by the triangle inequality, there is .
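The reason alignment parameters must appear in any such distance can be illustrated directly (sizes and names below are hypothetical): the bilinear measurements determine each channel/signal pair only up to a nonzero complex scale, since rescaling one factor and inversely rescaling the other leaves their outer product, and hence every measurement, unchanged.

```python
import numpy as np

rng = np.random.default_rng(6)
K, N = 4, 6
h = rng.normal(size=K) + 1j * rng.normal(size=K)
x = rng.normal(size=N) + 1j * rng.normal(size=N)

sigma = 2.0 - 0.5j                        # arbitrary non-zero complex scale
h_scaled, x_scaled = sigma * h, x / sigma

# The outer product -- hence every bilinear measurement of the pair --
# is unchanged, which is exactly why the distance metric optimizes over
# alignment parameters.
M1 = np.outer(h, x)
M2 = np.outer(h_scaled, x_scaled)
```

Because `M1` and `M2` coincide, any meaningful discrepancy measure must first align the estimate with the ground truth, which is the role the alignment parameters play in (28).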
In this subsection, we show that as long as the approximate state evolution (27) holds, there exists some constant satisfying condition (IV-B). This is demonstrated in the following lemma. Prior to that, we first list several conditions and definitions that contribute to the lemma.

The initial points obey
(30a) (30b) (30c) for . 
Define
(31) where is some sufficiently small constant.

Define
(32) (33) for some small absolute positive constants .

For , it has
(34) (35) for some constants .