I. Introduction
THE availability of big data and computing power, along with advances in optimization algorithms, has triggered a booming era of artificial intelligence (AI). Notably, deep learning [2]
is regarded as the most prominent branch of modern AI and has achieved exciting breakthroughs in applications such as speech recognition, computer vision
[3], etc. Benefiting from these achievements, AI is becoming a promising tool that streamlines people's decision-making processes and facilitates the development of diversified intelligence services (e.g., virtual personal assistants, recommendation systems, etc.). Meanwhile, with the proliferation of mobile computing and Internet-of-Things (IoT) devices, massive real-time data are generated locally [4]. However, it is widely acknowledged that traditional cloud-based computing [5, 6] faces challenges (e.g., latency, privacy, and network congestion) in supporting ubiquitous AI-empowered applications on mobile devices [7]. In contrast, edge AI is a promising approach that tackles the above concerns by fusing mobile edge computing [8]
with AI-enabled techniques (e.g., deep neural networks (DNNs)). By pushing AI models to the network edge, it brings edge servers close to the requesting mobile devices and thus enables low-latency and privacy-preserving services. Notably, edge AI is envisioned as a key ingredient of future intelligent
networks [9, 10, 11, 12], which fully unleashes the potential of mobile communication and computation. Typically, edge AI consists of two phases: edge training and edge inference. In particular, federated learning [13] is a key enabling technology for training machine learning models directly on mobile devices without uploading data to the cloud center. This paper mainly focuses on edge inference, i.e., deploying trained AI models and performing model inference at the network edge. Following
[7, 14], edge AI inference architectures are generally classified into three major types:

On-device inference: It performs model inference directly on the end devices where DNN models are deployed. While some enabling techniques (e.g., model compression [15, 16], hardware speedup [17]) have been proposed to facilitate the deployment of DNN models, it still poses challenges for resource-limited (e.g., in memory, power budget, and computation) end devices [18]. To mitigate such concerns, on-device distributed computing, which enables AI model inference across multiple distributed end devices, is envisioned as a promising solution for on-device inference [19].

Joint device-edge inference: This mode carries out AI model inference in a device-edge cooperation fashion [7] with model partition and model early-exit techniques [20]. While device-edge cooperation is flexible and enables low communication-latency edge inference, it may still place high resource demands on end devices due to the resource-demanding nature of DNNs [21].

Edge server inference: Such methods transfer the raw input data to edge servers for processing, which then return the inference results to end users [22, 23]. Edge server inference is particularly suitable for computation-intensive tasks. Nonetheless, the inference performance relies mainly on the channel bandwidth between the edge server and the end devices. Cooperative transmission [24] thus becomes promising for communication-efficient delivery of inference results.
To support computation-intensive tasks on resource-limited end devices, edge server inference stands out as a viable solution that fulfills the key performance requirements. The main focus of this paper is AI model inference for mobile devices under the edge server inference architecture. For an edge AI inference system, energy efficiency is a key performance indicator [14], which motivates us to focus on energy-efficient edge inference design. This is achieved by optimizing the overall network power consumption, including the computation power consumed in performing inference tasks and the transmission power consumed in returning inference results. In particular, cooperative transmission [24] is a widely recognized technique that reduces the downlink transmit power consumption and provides low-latency transmission services by exploiting high beamforming gains for edge AI inference. In this work, we thus consider that multiple edge base stations (BSs) collaboratively transmit the inference results to the end devices [22]. To enable transmission cooperation, we apply the computation replication principle [25], i.e., the inference task of each end device can be performed by several neighboring edge BSs to create multiple copies of the inference results. However, computation replication greatly increases the power consumed in performing inference tasks. Therefore, it is necessary to select the inference tasks performed by each edge BS so as to achieve an optimal balance between communication and computation power consumption.
In this paper, we propose a joint inference task selection and downlink beamforming strategy for energy-efficient edge AI inference, optimizing the overall network power consumption, which comprises the computation power consumption and the transmission power consumption, under quality-of-service (QoS) constraints. However, the resulting formulation contains combinatorial variables and nonconvex constraints, making it computationally intractable. To address this issue, we observe that the transmit beamforming vector has an intrinsic connection with the inference task selection (i.e., the tasks each edge server opts to execute). Based on this crucial observation, we present a group sparse beamforming (GSBF) reformulation, followed by a log-sum function based three-stage GSBF approach. In particular, in the first stage, we adopt a weighted log-sum relaxation to enhance the group sparsity of the structured solutions.
Nonetheless, the log-sum minimization problem poses challenges in computation and analysis. To resolve these issues, we present a proximal iteratively reweighted algorithm that solves a sequence of weighted convex subproblems. Moreover, we establish the global convergence and worst-case convergence rate of the presented proximal iteratively reweighted algorithm. Specifically, by leveraging the Fréchet subdifferential [26], we characterize the first-order necessary optimality conditions of the formulated convex-constrained log-sum problem. We then show that the iterates generated by the proposed algorithm steadily decrease the objective value, and prove that, for any feasible initial point, any cluster point of the generated sequence is a critical point of the original objective. Finally, we show that the defined optimality residual enjoys an ergodic worst-case convergence rate with respect to the iteration counter.
The major contributions of this paper are summarized as follows.

We propose a joint task selection and downlink beamforming strategy to optimize the tradeoff between computation and communication power consumption in an energy-efficient edge AI inference system. In particular, task selection is achieved by controlling the group sparsity structure of the transmit beamforming vector, thereby formulating a group sparse beamforming problem under target QoS constraints.

To solve the resulting optimization problem, we propose a log-sum function based three-stage GSBF approach. In particular, we adopt a weighted log-sum approximation to enhance the group sparsity of the transmit beamforming vector in the first stage. Moreover, we propose a proximal iteratively reweighted algorithm to solve the log-sum minimization problem.

For the presented proximal iteratively reweighted algorithm, we establish a global convergence analysis. We prove that every cluster point generated by the presented algorithm satisfies the first-order necessary optimality condition of the original nonconvex log-sum problem. Furthermore, a worst-case convergence rate is established for this algorithm in an ergodic sense.

Numerical experiments demonstrate the effectiveness and competitive performance of the log-sum function based three-stage GSBF approach for designing green edge AI inference systems.
I-A. Related Works
The study of inducing sparsity generally falls into the category of sparse optimization [27, 28]. Sparse optimization, emerging as a powerful tool, has recently contributed to the effective design of wireless networks, e.g., group sparse beamforming for energy-efficient cloud radio access networks [29, 30] and sparse signal processing for Internet-of-Things (IoT) networks [31, 32]. In particular, to induce the group sparsity structure of the beamforming vector, the works [22, 23] adopted the mixed norm. As illustrated in [27], mixed norms can induce the group sparsity structure of the solution of interest. Moreover, the mixed norm and the norm of [33] are also commonly adopted. However, the sparsity induced by convex sparsity-inducing norms is often unsatisfactory, since some small nonzero elements always remain in the obtained solutions [34]. In contrast, other works applied nonconvex sparsity-inducing functions to seek sparser solutions [35]. Notably, [34] reported the capability of the log-sum function to enhance the sparsity of solutions.
Motivated by its superior performance in inducing sparsity, we adopt the log-sum function to promote the sparsity pattern of the solutions. However, adopting the log-sum function to enhance sparsity usually makes the problem difficult to compute and analyze. In [34], the authors first proposed an iteratively reweighted algorithm (IRL1) for tackling nonconvex and nonsmooth log-sum functions with linear constraints. Nonetheless, they did not conduct a convergence analysis of the proposed method. Under reasonable assumptions, the work [36] established convergence results for a class of unconstrained nonconvex nonsmooth problems based on the limiting-subgradient tool. In particular, these results apply to the log-sum model in the unconstrained setting. In [37], the authors proposed a proximal iteratively reweighted algorithm and proved that any accumulation point is a critical point. The work [38] further showed that, from any feasible starting point, the sequence generated by their proximal iteratively reweighted algorithm converges to a critical point under the Kurdyka-Łojasiewicz (KL) property [39]. However, these works focus on unconstrained formulations or linearly constrained cases of the log-sum model; the theoretical analysis of the log-sum function with general convex-set constraints has not been investigated.
I-B. Organization
The remainder of this paper is organized as follows. Section II presents the system model of edge AI inference, followed by the problem formulation and analysis. Section II-C provides the group sparse beamforming formulation. The log-sum function based three-stage GSBF approach is proposed in Section III. Section IV provides the global convergence and convergence rate analysis of the proposed proximal iteratively reweighted algorithm. Section V demonstrates the performance of the proposed approach. Concluding remarks are given in Section VI. To keep the main text coherent and free of technical details, most mathematical proofs are deferred to the Appendices.
I-C. Notation
Throughout this paper, we summarize the notation as follows. We use and to denote the complex vector space and the real Euclidean space, respectively. Boldface lowercase and uppercase letters represent vectors and matrices of appropriate size, respectively. The inner product between two vectors is denoted as . and denote the conventionally defined norms for vectors in , respectively. In addition, we use and to denote the Hermitian and transpose operators, respectively. denotes the real part of a complex scalar. is a vector with all components equal to 1, and denotes the zero vector of appropriate size. In particular, represents a vector whose th element is the norm of a structured sub-vector . We use to denote the composition of two functions, and the symbol defines the elementwise product of two vectors.
For any closed convex set , we use to denote the associated characteristic function. Similarly, defines an indicator function associated with a given condition , i.e., it returns one value if condition is met and another value otherwise. Moreover, corresponds to a complex Gaussian random variable with the given mean and variance.

II. System Model and Problem Formulation
This section describes the overall system model and power consumption model for performing intelligent tasks in the considered edge AI inference system, followed by the problem formulation and analysis.
II-A. System Model
We consider an edge computing system in which multi-antenna BSs collaboratively serve single-antenna mobile users (MUs), as illustrated in Fig. 1. The deployed BSs act as dedicated edge nodes and have access to enormous computation and storage resources [8]. For convenience, define and as the index sets of MUs and BSs, respectively. Each MU has an inference task whose results can be computed by a task-related DNN. For ease of exposition, we use to denote the raw input data collected from MU , and the corresponding inference results are represented as . As performing intelligent tasks on DNNs is typically resource-demanding, it is usually impractical to execute the tasks locally on resource-constrained mobile devices. In the proposed edge AI inference system, exploiting computation replication [25], we consider the scenario in which each neighboring edge BS has collected the raw input data from all MUs. The edge BSs then process the data for model inference. After the edge BSs complete the model inference, the inference results are returned to the corresponding MUs via the downlink channels. We assume that all edge BSs are equipped with the pre-trained deep network models for all inference tasks [23].
In the downlink transmission, the edge BSs that cooperatively perform the inference task of the same MU return the inference results to that MU. We assume that perfect channel state information (CSI) is available at all edge BSs to enable cooperative transmission of the inference results [24]. Let denote the index set of MUs whose tasks are selectively performed by BS , and let represent the task selection strategy.
II-A1. Downlink Transmission Model
Let denote the encoded scalar of the requested output for MU , and let be the corresponding transmit beamforming vector at the BS. For convenience, and without loss of generality, we assume that , i.e., the power of is normalized to unity. The transmitted signal at the BS can then be expressed as
(1) 
Let be the propagation channel coefficient vector between the BS and the MU. The received signal at the MU, denoted as , is then given by
(2)  
where is the complex additive white Gaussian noise.
We assume that all data symbols are mutually independent of each other as well as of the noise. Based on (2), the signal-to-interference-plus-noise ratio (SINR) for the MU is therefore given as
(3) 
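The symbols of (3) did not survive extraction; purely for illustration, and in notation of our own choosing, with $\mathbf{h}_m$ the channel vector seen by MU $m$, $\mathbf{w}_k$ the beamforming vector carrying MU $k$'s result, and $\sigma_m^2$ the noise power, a downlink SINR of this standard form reads:

```latex
\mathrm{SINR}_m \;=\;
\frac{\left|\mathbf{h}_m^{\mathsf{H}}\mathbf{w}_m\right|^{2}}
     {\sum_{k\neq m}\left|\mathbf{h}_m^{\mathsf{H}}\mathbf{w}_k\right|^{2}+\sigma_m^{2}}
```

The denominator collects the interference from the beams intended for the other MUs plus the receiver noise.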
II-A2. Power Consumption Model
Both the computation and transmission power consumption for model inference are generally large. Energy efficiency is therefore of significant importance in edge AI inference system design, so the overall network power consumed in computation and communication at the edge BSs is our main interest. Specifically, we express the total downlink transmission power over all edge BSs as
(4)  
where is the radio frequency power amplifier efficiency coefficient of edge BS .
In addition to the downlink transmission power, the power consumed in performing AI inference tasks should be taken into account as well, owing to the power-demanding nature of running DNNs. We use to denote the computation power consumption of the BS in performing an inference task. The computation power consumed by all BSs is then given by
(5) 
Regarding the estimation of the computation energy consumed in executing a task, the works [40, 41] state that the inference energy consumption of a deep neural network layer mainly comprises computation energy and data movement energy. For illustration, we take GoogLeNet v1 [42] as a concrete example of the energy consumed by performing inference tasks. Specifically, we use GoogLeNet v1 to perform image classification tasks on the Eyeriss chip [43]. With the help of an online energy estimation tool [44], we can visualize the energy consumption breakdown of GoogLeNet v1, as illustrated in Fig. 2. We obtain an estimate of the computation power consumption by dividing the total energy consumption by the computation time. In particular, the computation time is determined by the total number of multiplication-and-accumulation (MAC) operations and the peak throughput of the Eyeriss chip. Therefore, the overall power consumption for edge AI inference, comprising transmission and computation power, is calculated as
(6) 
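The power-estimation recipe above (energy divided by computation time, with the time given by the MAC count over the peak throughput) can be sketched as follows. All numeric values below are hypothetical placeholders, not measurements of GoogLeNet v1 or the Eyeriss chip.

```python
# Sketch of the computation-power estimate described above:
#   power = total inference energy / computation time,
#   computation time = number of MACs / peak MAC throughput.
# The example numbers are made up for illustration only.

def computation_power(total_energy_j, num_macs, peak_macs_per_s):
    """Return the estimated computation power in watts."""
    computation_time_s = num_macs / peak_macs_per_s
    return total_energy_j / computation_time_s

# Hypothetical values: 0.1 J per inference, 1.5e9 MACs, 60e9 MAC/s peak.
power_w = computation_power(0.1, 1.5e9, 60e9)  # -> 4.0 W
```

With these placeholder numbers, the computation time is 0.025 s, giving a 4 W estimate; real values depend on the network and accelerator.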
II-B. Problem Formulation and Analysis
Note that there is a fundamental tradeoff between transmission and computation power consumption. To be specific, having more edge BSs perform the same task for an MU can significantly reduce the transmission power by exploiting higher transmit beamforming gains. However, this inevitably increases the computation power consumed in performing inference tasks. Therefore, an energy-efficient edge inference system can be obtained by minimizing the overall network power consumption so as to strike a balance between these two parts.
Let be the target SINR required for the MUs to successfully receive reliable AI inference results in the downlink. In the proposed energy-efficient edge AI inference system, the overall power minimization problem is thus formulated as
(7)  
s.t.  
where denotes the maximum transmit power of edge BS .
Unfortunately, problem (7) turns out to be a mixed combinatorial optimization problem due to the presence of the combinatorial variable , which makes it computationally intractable. Moreover, the nonconvex SINR constraints pose additional challenges for solving (7). To address these issues, we recast problem (7) into a tractable formulation by inducing group sparsity in the beamforming vector in the following section.
II-C. A Group Sparse Beamforming Representation Framework
One naive approach to handling the combinatorial variable is exhaustive search. However, it is computationally prohibitive owing to its exponential complexity. As a practical alternative, a critical observation is that the combinatorial variable can be eliminated by exploiting the inherent connection between task selection and the group sparsity structure of the beamforming vectors. Specifically, if an edge BS does not perform the inference task of an MU (i.e., ), then it does not deliver the corresponding inference result in the downlink transmission (i.e., ). In other words, if , all coefficients of the beamforming vector are simultaneously zero. Mathematically, we have , for all , meaning that the task selection strategy is uniquely determined by the group sparsity structure of . In this respect, the overall network power minimization problem (7) can be rewritten as
(8) 
By exploiting the sparsity structure of the beamforming vectors, the SINR expression (3) is transformed into
(9)  
where and are the aggregated channel vector and downlink transmit beamforming vector for MU , respectively.
On the other hand, since an arbitrary phase rotation of the transmit beamforming vectors affects neither the downlink SINR constraints nor the objective value, we can always find proper phases to equivalently transform the SINR constraints in (7) into convex second-order cone constraints [45]. We thus arrive at the following convex-constrained sparse optimization framework for network power minimization
(10)  
s.t.  
However, problem (10) is still nonconvex due to the indicator function in the objective. As presented in [29, Proposition 1], a weighted mixed norm serves as the tightest convex surrogate of the objective in (10), i.e.,
(11) 
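As a concrete sketch of the convex surrogate in (11), a weighted mixed l1/l2 norm sums weighted Euclidean norms over the per-(BS, MU) beamformer groups, so a group that is entirely zero contributes nothing. The function and variable names below are our own, not the paper's:

```python
import numpy as np

def weighted_mixed_l12(groups, weights):
    """Weighted mixed l1/l2 norm: sum_j w_j * ||v_j||_2 over beamformer groups.

    `groups` is a list of per-(BS, MU) beamforming sub-vectors; `weights`
    holds the corresponding nonnegative weight coefficients.
    """
    return sum(w * np.linalg.norm(g) for g, w in zip(groups, weights))
```

Minimizing this convex function tends to drive whole groups to zero at once, which is exactly the structure that encodes task de-selection.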
In this paper, we instead propose to adopt a new group-sparsity-inducing function for inference task selection, thereby enhancing sparsity and further reducing the network power consumption.
III. A Log-sum Function Based Three-stage Group Sparse Beamforming Framework
In this section, we propose to adopt the log-sum function to enhance the group sparsity of the beamforming vector, and then describe the log-sum function based three-stage GSBF approach. In particular, we propose a proximal iteratively reweighted algorithm to address the log-sum minimization problem in the first stage.
III-A. Log-sum Function for Enhancing Group Sparsity
Let denote the aggregated beamforming vector . To promote group sparsity of the beamforming vector , we propose to use the following weighted nonconvex log-sum function as an approximation of the objective
(12) 
where is a weight coefficient and is a tunable parameter. The main motivation for adopting the log-sum penalty among the various types of sparsity-inducing functions [27] is the following:

From the perspective of the performance and theoretical analysis of the designed algorithm, the log-sum function offers more practicability due to its coercivity and the boundedness of its first derivative.
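For reference, a minimal sketch of a weighted log-sum group penalty of the form in (12), with names of our own choosing (weights and the tunable parameter `theta` are assumptions, since the paper's symbols were lost):

```python
import numpy as np

def log_sum_group_penalty(groups, weights, theta):
    """Weighted log-sum penalty: sum_j rho_j * log(1 + ||v_j||_2 / theta).

    Unlike the mixed l1/l2 norm, the marginal cost of growing an already
    large group flattens out, so small groups are penalized relatively
    harder -- this is what sharpens the induced group sparsity.
    """
    return sum(w * np.log(1.0 + np.linalg.norm(g) / theta)
               for g, w in zip(groups, weights))
```

A smaller `theta` makes the penalty a closer (but less smooth) surrogate of the group-cardinality objective.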
III-B. A Log-sum Function Based Three-stage Group Sparse Beamforming Approach
We now present the proposed log-sum based three-stage GSBF framework. Specifically, the first stage solves the log-sum convex-constrained problem via the proposed proximal iteratively reweighted algorithm to obtain a solution ; the second stage prioritizes the tasks based on the obtained solution and the system parameters, yielding the optimal task selection strategy ; with fixed, the third stage refines . The details are as follows.
Stage 1: Log-sum Function Minimization. In the first stage, we obtain the group sparsity structure of the beamformer by solving the following nonconvex program
(13) 
where denotes the convex constraints in (10).
However, the nonconvex and nonsmooth objective in (13), together with the convex constraints, poses challenges in computation and analysis. Inspired by the work of [34], we iteratively minimize the objective by solving a sequence of tractable convex subproblems. The main idea of the presented algorithm is to solve a well-constructed convex surrogate subproblem instead of directly tackling the original nonconvex problem.
Let . First observe that is a composite function with convex and nonconvex. At the th iterate , for any feasible , we have
(14)  
where is the subgradient of at , is the prescribed proximity parameter, and the first inequality holds by the definition of the subgradient of a convex function. Hence, a convex subproblem is derived as an approximation of at the current iterate , which reads
(15) 
with weights
(16) 
As presented in [34], a smaller yields a larger , which drives the nonzero components of toward zero more aggressively. Overall, the proposed proximal iteratively reweighted algorithm for enhancing the group sparsity structure of the beamforming vector is summarized in Algorithm 1.
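To make the loop structure concrete, here is a toy unconstrained analogue of the reweighting scheme (our construction, not the paper's algorithm: the paper's subproblem carries the second-order cone constraints and a proximal term, whereas this toy fits a fixed vector so that each weighted convex subproblem has a closed-form group soft-thresholding solution):

```python
import numpy as np

def group_soft_threshold(b, tau):
    """Closed-form solution of argmin_v 0.5*||v - b||^2 + tau*||v||_2."""
    nb = np.linalg.norm(b)
    return np.zeros_like(b) if nb <= tau else (1.0 - tau / nb) * b

def reweighted_group_sparsify(b_groups, theta=0.1, iters=20):
    """Toy analogue of the proximal iteratively reweighted scheme:
    minimize 0.5*||v - b||^2 + sum_j log(1 + ||v_j||_2 / theta)
    by solving a sequence of weighted convex (group-lasso) subproblems."""
    v = [b.copy() for b in b_groups]
    for _ in range(iters):
        # Reweighting step, in the spirit of (16): the weight is the slope of
        # log(1 + t/theta) at t = ||v_j||, so small groups get large weights.
        weights = [1.0 / (np.linalg.norm(vj) + theta) for vj in v]
        # Weighted convex subproblem, cf. (15): closed form per group here.
        v = [group_soft_threshold(bj, wj) for bj, wj in zip(b_groups, weights)]
    return v
```

The reweighting step mirrors (16): groups that are already small receive large weights and are pushed to exact zero, while large groups are barely shrunk.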
Stage 2: Task Selection. In the second stage, an ordering rule, guided by the solution obtained in Stage 1, determines the priority of the inference tasks. For ease of notation, let denote the set of all tasks. By taking the key system parameters (e.g., , and ) into account, the priority of task
is heuristically given as
(17) 
Intuitively, if an edge BS has a lower aggregate beamforming gain, lower power amplifier efficiency, and lower channel power gain, but a higher computation power consumption for an MU, then the corresponding task has a lower priority. A lower indicates that the task of the MU has a lower priority and may not be performed by the BS. Tasks are thus arranged according to rule (17) in descending order. That is, the task priority order is , where denotes the permutation of task indexes.
We then solve a sequence of convex feasibility detection problems to obtain task selection strategy ,
(18) 
where increases from to until (18) becomes feasible. Here, the added convex constraints force all coefficients of to zero for the deprioritized tasks. The support set of the beamformer is defined as , from which the optimal task selection strategy is derived.
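One plausible reading of this stage, sketched with hypothetical names (`order` holds task pairs sorted by the priority rule (17); `is_feasible` stands in for the convex feasibility solver of (18), which is not shown here):

```python
def select_tasks(order, is_feasible):
    """Keep the top-j highest-priority tasks for growing j until the
    feasibility problem becomes solvable; return the selected active set.

    `order`: task identifiers sorted by descending priority.
    `is_feasible`: callable taking the active task set and returning bool;
    it abstracts the convex feasibility detection problem (18).
    """
    for j in range(len(order) + 1):
        active = set(order[:j])
        if is_feasible(active):
            return active
    return set(order)  # fall back to performing every task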
Stage 3: Solution Refinement. At this point, the task selection of each BS has been determined. Fixing the obtained task index sets, we solve the following convex program to refine the beamforming vectors
(19)  
s.t.  
IV. Global Convergence Analysis
In this section, we establish the global convergence of Algorithm 1. Specifically, we derive the first-order necessary optimality condition to characterize the optimal solutions. We then establish convergence results for a subsequence of the sequence generated by Algorithm 1. Furthermore, we show that, for any feasible initial point, the generated sequence has cluster points, and any cluster point satisfies the established first-order optimality condition. Finally, the ergodic worst-case convergence rate of the optimality residual is derived.
IV-A. First-order Necessary Optimality Condition
In this subsection, we derive the first-order necessary conditions characterizing the optimal solutions of (13). Problem (13) is equivalently rewritten as
(20) 
Similarly, for the derived subproblem (15), we have
(21) 
Due to the nonconvex and nonsmooth nature of the log-sum function, we employ the Fréchet subdifferential as the main tool in our analysis. Its definition is introduced as follows.
Definition 1 (Fréchet subdifferential [26])
Let be a real Banach space, let denote its topological dual, and let be a function from into the extended real line , finite at . The set
is called the Fréchet subdifferential of at . Its elements are referred to as Fréchet subgradients.
Several important properties of the Fréchet subdifferential [26], which are used to characterize the optimal solutions of (13), are listed below.
Proposition 1
Let be a closed and convex set. Then the following properties of Fréchet subdifferentials hold true.

If is Fréchet subdifferentiable at and attains a local minimum at , then .

Let be Fréchet subdifferentiable at with being convex, then is Fréchet subdifferentiable at such that
for any .

with closed and convex sets .
Theorem 1 (Fermat’s rule)
If (20) attains a local minimum at , then it holds true that
(22) 
We next investigate the properties of in the following Proposition 2, which indicates that the Fréchet subdifferential of at is bounded.
Proposition 2
If , then for any . In particular, is any element of .
To explore the behavior of the proposed proximal iteratively reweighted algorithm, based on Theorem 1 and Proposition 2, we define the optimality residual associated with (20) at a point as
(23) 
where and . Since , it follows that if , then satisfies the first-order necessary optimality condition (22). We adopt to measure the convergence rate of our algorithm.
Moreover, the first-order optimality condition of the subproblem (21) is given as follows
(24) 
where , and . Note that the existence of an optimal solution to (21) follows directly from the convexity and coercivity of the objective .
Now we show that an optimal solution of (21) also satisfies the firstorder necessary optimality condition of (20) in the following lemma.
Lemma 1
satisfies the firstorder necessary optimality condition of (20) if and only if
Proof:
Please refer to Appendix A for details.
Define the model reduction caused by at a point as
(25) 
The new iterate decreases the objective , and the model reduction (25) converges to zero in the limit; both results are revealed in the following Lemma 2.
Lemma 2
Suppose is generated by Algorithm 1 with . The following statements hold true.

.

.

is monotonically decreasing. Indeed,
Proof:
Please refer to Appendix B for details.
We now provide our main result in the following Theorem 2.
Theorem 2
Proof:
Please refer to Appendix C for details.
IV-B. Ergodic Worst-case Convergence Rate
In this subsection, we show that the presented proximal iteratively reweighted algorithm enjoys an ergodic worst-case convergence rate in terms of the optimality residual. The following lemma states that the optimality residual is upper bounded by the displacement of the iterates.
Lemma 3
The optimality residual associated with problem (20) satisfies
with , where denotes the maximum element among for all .
Proof:
Please refer to Appendix D for details.
The subproblem (21) is referred to as the primal problem; by exploiting the conjugate function [46], the associated Fenchel-Rockafellar dual is constructed as
(26)  
s.t. 
where the dual objective is given as , and the technical details of constructing (26) are provided in Appendix E.
The Fenchel-Rockafellar duality theorem [46] states that the optimal value of (26) provides a lower bound on the optimal value of (21). Moreover, the gap between the primal objective value of (21) and the corresponding dual objective value of (26) at the th iterate is defined as
(27) 
If this gap is zero, then strong duality holds. That is, at the optimal solution , we have
(28) 
We now show in the following theorem that the duality gap vanishes asymptotically.
Theorem 3
Let be the sequence generated by Algorithm 1 with . Then has an ergodic worst-case convergence rate.
Proof:
Please refer to Appendix F for details.
V. Numerical Experiments
In this section, we conduct numerical experiments to validate the effectiveness of the proposed algorithms and to illustrate the presented theoretical results. We compare the log-sum function based three-stage GSBF approach with the coordinated beamforming (CB) approach [47] and the mixed GSBF beamforming approach [29] (Mixed GSBF). These two baselines are described below:

CB considers minimizing the total transmit power consumption. In other words, all BSs are required to perform the inference tasks from all MUs.
For the experimental setup, we consider an edge AI inference system with multi-antenna BSs and single-antenna MUs, all uniformly and independently distributed in a km km square region. The channel between a BS and an MU is set as , where the path-loss model is given by , is the Euclidean distance between the BS and the MU, and is the small-scale fading coefficient, i.e., . We set W and specify W, and . Furthermore, for the proposed log-sum function based three-stage GSBF approach, we set , and initialize . In particular, we terminate the proximal iteratively reweighted algorithm when it either reaches the predefined maximum number of iterations or satisfies
(29) 
where is a prescribed tolerance.
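The exact form of the stopping test (29) was lost in extraction; purely as an assumed placeholder, a common choice terminates when the relative change of successive objective values drops below the tolerance:

```python
def should_stop(f_new, f_old, tol=1e-4):
    """Assumed relative-decrease stopping test; the paper's actual
    criterion (29) may differ from this placeholder."""
    return abs(f_new - f_old) <= tol * max(1.0, abs(f_old))
```

The `max(1.0, ...)` guard keeps the test meaningful when the objective value is close to zero.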
V-A. Convergence of the Proximal Iteratively Reweighted Algorithm
This subsection illustrates the convergence behavior of the proposed proximal iteratively reweighted algorithm. The presented result is obtained for a typical channel realization. Fig. 3 illustrates the convergence of the proximal iteratively reweighted algorithm. We can see that the objective steadily decreases over the iterations, which is consistent with our analysis in Lemma 2. Interestingly, the objective value drops quickly within the first few iterations, indicating that the proposed proximal iteratively reweighted algorithm converges very fast. In view of this, Algorithm 1 may be terminated early in practice to obtain an approximate solution, speeding up the entire procedure while preserving the overall performance.
V-B. Effectiveness of the Proposed Approach
We evaluate the performance of the three algorithms in terms of the overall network power consumption, the transmit power consumption and the number of computation tasks. The presented results are averaged over randomly and independently generated channel realizations.
Fig. 4 depicts the overall network power consumption of the three approaches under different target SINRs. First, we observe that all three approaches consume more total power as the required SINR becomes more stringent. This is because more edge BSs are required to transmit the inference results to meet the higher QoS. In addition, the CB approach has the highest power consumption among the three approaches, and the relative power difference between CB and the other two approaches is substantial across the tested SINR values, indicating the effectiveness of the joint task selection strategy and group sparse beamforming approach in minimizing the overall network power consumption. On the other hand, the proposed log-sum function based three-stage GSBF approach outperforms the mixed GSBF approach, which demonstrates that enhancing the group sparsity further reduces the overall network power consumption. In particular, the performance gap between these two approaches remains approximately constant over the tested SINR range, which indicates that the proposed log-sum function based three-stage GSBF approach remains attractive in the high SINR regime.
Tables I and II further report the number of inference tasks performed by the edge BSs and the transmission power consumption, respectively. Specifically, in Table I, we observe that the number of performed inference tasks differs among the three approaches under various target SINRs, which reflects the effect of the task selection strategy. Besides, the log-sum function based three-stage GSBF approach always performs fewer inference tasks than the mixed GSBF approach at each target SINR, which indicates that the log-sum function based three-stage GSBF approach better enhances the group sparsity pattern in the beamforming vector. Meanwhile, as observed in Table II, the CB approach has the lowest transmission power among the three approaches because it optimizes only the transmission power while performing all inference tasks. On the other hand, the transmission power consumption of the log-sum function based three-stage GSBF approach is slightly higher than that of the mixed GSBF approach under most SINRs. This is because more edge BSs participate in performing inference tasks in the mixed GSBF approach, resulting in a higher transmit beamforming gain that reduces the transmission power. In other words, performing fewer inference tasks further reduces the computation power consumption of the edge BSs but increases the transmission power consumption. Taken together, Fig. 4 and Tables I and II indicate that the proposed joint task selection strategy and GSBF approach strike a good balance between computation power consumption and transmission power consumption, yielding the lowest network power consumption.
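The computation/transmission trade-off described above can be made concrete with a toy accounting example. All numbers here (per-BS computation power, transmit powers under full versus sparse cooperation) are illustrative assumptions, not the paper's simulation values.

```python
def network_power(active_bs, p_comp, tx_power):
    """Total network power = computation power of the active BSs plus
    their transmit power (the quantity minimized in the paper)."""
    return sum(p_comp[b] + tx_power[b] for b in active_bs)

# Serving one task either from all 3 BSs (CB-style: high beamforming gain,
# low transmit power) or from 1 BS (sparse: low computation power, but the
# single BS needs more transmit power to meet the same SINR target).
p_comp = {0: 5.0, 1: 5.0, 2: 5.0}      # W per active BS (assumed)
tx_all = {0: 0.5, 1: 0.5, 2: 0.5}      # W each, with full cooperation
tx_sparse = {0: 2.0}                   # W, single active BS

print(network_power({0, 1, 2}, p_comp, tx_all))    # → 16.5
print(network_power({0}, p_comp, tx_sparse))       # → 7.0
```

Even though the sparse selection spends more transmit power (2.0 W versus 1.5 W in total), switching off two BSs saves far more computation power, which is exactly the balance the joint task selection and GSBF approach exploits.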
Table I. Number of inference tasks performed by the edge BSs.
Target SINR [dB]  |  Proposed  |  Mixed GSBF  |  CB

Table II. Transmission power consumption.
Target SINR [dB]  |  Proposed  |  Mixed GSBF  |  CB
VI Conclusion
In this paper, we developed an energy-efficient edge AI inference system through the joint selection of inference tasks and optimization of the transmit beamforming vectors, minimizing the computation power consumption and the downlink transmission power consumption, respectively. Based on the critical insight that inference task selection can be achieved by controlling the group sparsity structure of the transmit beamforming vectors, we developed a group sparse optimization framework for network power minimization, for which a log-sum function based three-stage group sparse beamforming algorithm was developed to enhance the group sparsity of the solutions. To solve the resulting nonconvex and nonsmooth log-sum function minimization problem, we further proposed a proximal iteratively reweighted algorithm. Furthermore, a global convergence analysis was provided, and a worst-case convergence rate in an ergodic sense was derived for this algorithm.
Appendix A Proof of Lemma 1
Appendix B Proof of Lemma 2
First of all, the subproblem objective at each iteration is convex, while the log-sum function is concave; hence we have
(31) 
Therefore,
(32) 
where the first inequality follows from (31). This completes the proof of the first statement.
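The body of inequality (31) is elided above. In standard analyses of iteratively reweighted algorithms for log-sum minimization, the key step is the first-order concavity bound; a sketch under that assumption, where the symbols $p > 0$ (the smoothing parameter) and $t^k$ (the current group norm) are our own naming:

```latex
% Concavity of t -> log(t + p) gives a global linear upper bound:
\log(t + p) \;\le\; \log(t^k + p) + \frac{t - t^k}{t^k + p},
\qquad \forall\, t,\ t^k \ge 0 .
```

Summing this bound over the groups, with $t$ set to each group norm, produces the convex surrogate minimized at every reweighted iteration; evaluating the surrogate at the new iterate then yields the descent chain stated in (32).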