I Introduction
The number of wireless devices is now over billion, and with the advent of new 5G-and-beyond technologies, this is expected to grow to billion by 2022 [10]. Many of these devices will be data-processing-capable nodes in the hands of users that facilitate rapidly growing data-intensive applications running at the network edge, e.g., social networking, video streaming, and distributed data analytics. Given the bursty nature of user demands, when certain devices are occupied with processing for computationally intensive applications, e.g., facial recognition, location-based augmented/virtual reality (AR/VR), and online 3D gaming, it may be desirable for them to offload their data to devices with underutilized resources [41, 40, 29]. Traditionally, cloud computing architectures, such as Amazon Web Services and Microsoft Azure, have been adopted for such data-intensive applications, but the exponential rise in data generation at the edge is making centralized architectures infeasible for providing latency-sensitive quality of service at scale [10].
As a current trend in wireless networks is reducing cell sizes [32], many 5G networks will be dense with short distances, forming several smaller subnets [11]. Networks of small subnets combined with improved computational and storage capabilities of edge devices are enabling mobile edge computing (MEC) architectures. At a high level, MEC leverages radio access networks (RANs) to boost computing power in close proximity to end users, thus enabling the users to offload their computations to an edge server (central processing entity) as shown in Fig. 0(a) [8, 23, 28, 3, 26, 42, 27, 7]. In an MEC architecture, the edge servers have high-performance computing units which can process large amounts of computationally intensive tasks efficiently. This concept has been extended to “helper” edge server architectures as well, where devices with idle computation resources become (small) edge servers [6, 12, 9, 39, 17, 20].
The current trend in distributed computing, though, is a migration to architectures that are more decentralized than MEC. This is due to the fact that all edge nodes can take part in data offloading at different times, given the advances in 5G communication technologies in conjunction with improved computational capabilities of individual devices. For this reason, device-to-device (D2D) network architectures (shown in Fig. 0(b)) that were previously studied in 4G LTE standards now hold the promise of providing distributed computing at scale [15].
Unlike the MEC system in Fig. 0(a), distributed computing in the D2D network of Fig. 0(b) will have more complicated topology management needs that must be considered together with the management of device resources. From a computation perspective, the edge nodes that receive offloaded tasks must have a suitable strategy for allocating their central processing unit (CPU) and/or storage resources to the tasks. From a communication perspective, wireless transmissions among edge nodes participating in data offloading will inevitably incur inter-channel interference, which requires interference management via strategies to allocate subchannels, transmission powers, antenna array gains, and other device transmit resources. The focus of this paper is on addressing these challenges: we develop methodologies that jointly optimize computation and communication resources together with topology configuration in D2D networks to minimize overhead in edge computing systems.
I-A Related Work and Differentiation
We discuss related works on task offloading, resource management, and edge computing. We divide our analysis into two main categories: MEC and D2D.
I-A1 MEC systems
Researchers have developed methods for resource management and offloading decision-making to maximize MEC system performance. Offloading decisions were thoroughly studied in [8], where management of device resources is assumed to be fixed. On the other hand, under the assumption that offloading decisions are given, studies have considered optimal allocations of CPU and subchannel resources [23], and have also considered these together with beamforming design for multiple-input multiple-output (MIMO) systems [28, 3]. In a large network with limited subchannels, beamforming design is essential to mitigate inevitable inter-channel interference for robust data transfer and optimization. Recently, offloading decisions have been considered together with management of resources in MEC systems such as CPU [42, 27, 7, 26], subchannels [42], transmit powers [42, 27], and beamforming design [27]. Although many of these works have considered some computation and communication resources, they have not yet addressed all of the important variables in a unified optimization problem.
Though we focus on D2D in this paper, as mentioned previously, newer MEC architectures allow idle devices in close proximity to be dedicated computing nodes. Therefore, optimization in MEC systems can be viewed as a special case of D2D networks, where offloading is restricted to specific devices unidirectionally.
I-A2 D2D networks
Several prior works have focused on optimizing communication quality in D2D systems, where the objectives have been to maximize sum-rate [14, 36, 44, 22, 13, 37], spectral efficiency [24], or signal-to-interference-plus-noise ratio (SINR) [34], with consideration of device and channel resources such as subchannels [14, 36, 44, 22, 13], transmit powers [44, 22, 13], and beamforming design for MIMO systems [24, 34]. In this work, by contrast, we are focused on optimizing these and other system parameters to minimize time and energy consumption required to complete a task, which is an important objective in edge computing systems. Works on D2D in edge computing have primarily focused on D2D-enabled (or D2D-assisted) MEC systems where several helper nodes are available as dedicated nodes for computing together with the edge server. In this respect, within a fixed topology, [6] investigated energy minimization based on CPU and transmission power allocation, and [12] studied joint time and energy minimization based on CPU, subchannel, and transmission power allocation. On the other hand, for a given set of system resources, the strategy of topology reconfiguration was discussed to minimize total energy in [9]. Some recent works have addressed topology configuration together with the allocation of specific resources such as CPU [39, 17, 20] and power [39, 17]. However, we are not aware of any work that has addressed computation, communication, and topology configuration together in a unified optimization model for D2D edge computing, which is the focus of our paper. Also, we consider the fully distributed case where there are no edge servers or dedicated nodes for computing, which makes the topology configuration problem more challenging.
I-B Summary of Contributions
Compared to the related works discussed in Section I-A, the contributions of this paper are as follows:

We formulate a unified optimization model for D2D edge computing networks that minimizes total network overhead, defined as the weighted sum of time and energy consumption required to process a given task. Our model includes a framework for joint topology configuration, CPU allocation, subchannel allocation, and beamforming design for MIMO systems (Sections II and III).

We propose two methods for minimizing the total network overhead in our model, which we refer to as semi-exhaustive search optimization and efficient alternate optimization. We compare these two methods in terms of optimality guarantees and computational complexity in solving our nonconvex problem, and show that semi-exhaustive search optimization can be viewed as a “best effort” at obtaining the optimal solution in a realistic amount of time, but that its complexity becomes problematic as the size of the network grows (Section IV).

In developing these methods, we study the decomposition of the optimization into several subproblems: topology design, CPU allocation, and beamforming design. In particular, we solve the beamforming design problem for a fixed resource allocation as a subproblem of the overall optimization. We derive minimum communication overhead beamforming (MCOB), a coordinated beamforming algorithm which we show obtains the optimal beamformer for a minimum mean squared error receiver (Section IV).

We conduct several numerical experiments to evaluate the performance of our network overhead optimization methodology. Our results show, for example, that our two proposed algorithms for efficient data offloading can reduce the total overhead in D2D networks by 20%-30% compared to computation without offloading (Section V).
II Wireless Device-to-Device (D2D) Network Model
In this section, we develop our models for computational tasks, wireless signals, and the allocation of network resources in D2D systems.
II-A Task Model
We let be the set of nodes in the D2D network, with a total of nodes. Each node has a task to be completed, consisting of computational work involved in data processing, where the objective of the data processing is generically to perform a transformation from input to output data. For simplicity, we assume that each node has a single task that should be processed as a whole. This means that the task processing and its input data cannot be subdivided. A task is considered to be completed when the input data is successfully processed to the desired output. In general, task completion requires computational resources including CPU, RAM, and storage. In this paper, similar to previous works [42, 27, 7, 39, 17, 20], we focus on CPU as the computation resource. In the case of mobile devices, many of today’s tasks require computation-intensive processing with high CPU requirements, such as 3D gaming and location-based augmented/virtual reality (AR/VR) [41, 40, 29].
To quantify the complexity of the task for node (which we will refer to succinctly as task ), we introduce data size (in bits), which is the length of the bit stream of input data constituting task . In other words, the bit stream of input data is represented as . Then, data workload is denoted as (in cycles), where (in cycles/bit) is the processing density, meaning how many CPU cycles are required to process a bit of data. That is, represents the total number of CPU cycles required to complete task . The processing density depends on the application; for example, in the case of the audio signal detection in [19], since 500 cycles are required for processing 1 bit of data, is 500.
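As a small numeric illustration of the workload definition above, the following sketch multiplies a data size by a processing density. The function name and the 8000-bit example input are hypothetical; only the 500 cycles/bit density is taken from the audio signal detection example.

```python
def workload_cycles(data_bits: int, cycles_per_bit: float) -> float:
    """Total CPU cycles needed to process `data_bits` bits of input data."""
    if data_bits < 0 or cycles_per_bit < 0:
        raise ValueError("data size and processing density must be non-negative")
    return data_bits * cycles_per_bit


# Hypothetical task: 8000 bits of input at 500 cycles/bit.
cycles = workload_cycles(8000, 500)
```

The workload scales linearly in both inputs, so doubling either the data size or the processing density doubles the cycles to be processed.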
II-B Signal Model
Fig. 2 demonstrates our wireless D2D channel model among a set of nodes. We assume that the nodes can transmit using multiple antennas on subchannels, where the set of subchannels is denoted . Each node receives a signal through subchannel in our model as
(1) 
where is the number of antennas of node . The scalar denotes the transmit signal sent by node with unit power , where can be understood as a single channel use of a Gaussian codeword vector that is encoded with bits per channel use. The vector is the transmit beamformer of node with transmission power constraint , i.e., . Also, the matrix denotes a multiple-input multiple-output (MIMO) channel from transmit node to receive node through subchannel . The noise vector is assumed to be complex additive Gaussian noise with zero mean and identity covariance matrix, i.e., . The scalar is an indicator of whether transmit node uses subchannel for transmission. In this paper, we assume that the transmit node uses only one subchannel for transmission; in other words, if , then .
At receive node on subchannel , we consider a linear receive combiner so that the estimated value is given by
(2) 
where a superscript denotes the conjugate transpose.
II-C Task and Resource Allocation
The assignment of tasks to either offloading or local processing determines the D2D network topology. Constraints on how subchannels and processing resources are allocated must be specified based on these assignments.
II-C1 Task assignment
Each task can be either processed locally at node or offloaded to another node for processing. We define as the task assignment variable of whether task is assigned to node for . If , then we have local processing of task at node . On the other hand, if for some , then we have offloaded processing where task is offloaded from to and processed at node . The set of task assignments is denoted by
(3) 
Due to the assumption that each task should be processed as a whole, task should be assigned to only one node, which implies the constraint that
(4) 
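The single-assignment requirement can be sketched as a check on a binary matrix. The layout used here, `x[k][n] == 1` meaning that task `k` is processed at node `n`, and both function names are assumptions made for illustration, not the paper's notation.

```python
from typing import List, Tuple


def is_valid_assignment(x: List[List[int]]) -> bool:
    """Check that every task is assigned to exactly one node."""
    n_nodes = len(x)
    for row in x:
        if len(row) != n_nodes:
            return False
        if any(v not in (0, 1) for v in row):
            return False
        if sum(row) != 1:  # task processed as a whole, at a single node
            return False
    return True


def topology(x: List[List[int]]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (locally processed tasks, offloading links (source, target))."""
    local = [k for k, row in enumerate(x) if row[k] == 1]
    links = [(k, n) for k, row in enumerate(x)
             for n, v in enumerate(row) if v == 1 and n != k]
    return local, links


# Hypothetical 3-node example: task 1 is offloaded to node 0.
x = [[1, 0, 0],
     [1, 0, 0],
     [0, 0, 1]]
local_tasks, offload_links = topology(x)
```

Diagonal entries correspond to local processing and off-diagonal entries to offloading links, which is how the assignment determines the network topology.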
II-C2 Subchannel allocation
The task assignment specifies the configuration of how the nodes communicate with each other. Therefore, the subchannel allocation variable is related to task assignment variable as
(5) 
implies node is a transmit node, because task is not locally processed at node , implying transmission to another node. In this case, transmit node uses one of the subchannels for transmission, i.e., . On the other hand, if node is not a transmit node, then and there is no subchannel allocation for node , i.e., .
Each of the subchannels is assumed to have equal and non-overlapping bandwidth of width . Consider, however, the case that node receives multiple tasks from multiple transmit nodes. If the same subchannel is used by these transmitters, the receive node must jointly decode the data of tasks, which leads to degraded decoding performance. Therefore, in this paper, we follow prior work and assume that the transmit nodes that transmit to the same receive node use different subchannels [35]. In other words, for each receive node , we restrict the number of transmitters on subchannel according to
(6) 
where denotes the set of transmit nodes that transmit to the receive node given by
(7) 
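The restriction that transmitters sharing a receive node use different subchannels can be sketched as follows. The dictionary-based representation (`target[k]` for the receive node of transmit node `k`, `subchannel[k]` for its single subchannel) and the function name are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict


def subchannels_conflict_free(target: Dict[int, int],
                              subchannel: Dict[int, int]) -> bool:
    """True if no receive node hears two transmitters on the same subchannel."""
    if set(target) != set(subchannel):
        return False  # every transmit node needs exactly one subchannel
    used = defaultdict(set)  # receive node -> subchannels already taken
    for k, n in target.items():
        if subchannel[k] in used[n]:
            return False
        used[n].add(subchannel[k])
    return True


# Hypothetical example: nodes 1 and 2 both offload to node 0.
ok = subchannels_conflict_free({1: 0, 2: 0}, {1: 0, 2: 1})
clash = subchannels_conflict_free({1: 0, 2: 0}, {1: 0, 2: 0})
```

Transmitters that target different receive nodes may still reuse a subchannel in this sketch, and that reuse is what gives rise to the inter-channel interference handled by the beamforming design.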
II-C3 Computational resource allocation
Consider that node has multiple tasks to complete (its own and/or those offloaded to it). Its computational resource (CPU) will be shared across these multiple tasks, where (in cycles/sec or Hz) denotes the available CPU of node . We define the amount of CPU resource of node allocated to task as , which is subject to the constraints
(8)  
(9)  
(10) 
In (8), the total CPU resource allocated cannot exceed the available CPU resource for each node . In (9), implies that task has not been assigned to node , so no CPU resources will be allocated to task . In (10), the allocated CPU is restricted to a positive real value.
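The three CPU constraints described above can be checked as in the following sketch. The indexing (`f[n][k]` for the CPU of node `n` given to task `k`, `x[k][n]` for the assignment) and the relative tolerance are assumptions for illustration. Allocations are treated as non-negative here so that unassigned tasks can receive zero.

```python
from typing import List


def cpu_allocation_ok(f: List[List[float]], capacity: List[float],
                      x: List[List[int]], tol: float = 1e-9) -> bool:
    """Check capacity, assignment-consistency, and sign constraints."""
    n_nodes = len(capacity)
    for n in range(n_nodes):
        for k in range(n_nodes):
            if f[n][k] < 0:
                return False  # allocations cannot be negative
            if x[k][n] == 0 and f[n][k] != 0:
                return False  # no CPU for tasks not assigned to node n
        if sum(f[n]) > capacity[n] * (1 + tol):
            return False  # total allocation exceeds available CPU
    return True


# Hypothetical 3-node example: task 1 is offloaded to node 0.
x = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
capacity = [2e9, 1e9, 1e9]  # cycles/sec
f = [[1e9, 1e9, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1e9]]
feasible = cpu_allocation_ok(f, capacity, x)
```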
III D2D Network Optimization Model
In this section, we formulate the optimization problem for minimizing D2D network task completion overhead. We define the total network overhead as a cost function to be minimized, consisting of both computation and communication overhead.
III-A Computation Overhead
We first define the computation overhead associated with node offloading to node . Based on the models from Section II, we can compute the computation time (in seconds) of task computed at node according to
(11) 
The computation energy consumption (in Joules) can be computed as
(12) 
where is the energy coefficient (in Joules seconds/cycles) of node that depends on the processor chip architecture [38]. Here, denotes the energy consumption per cycle (in units of Joules/cycle).
With this, we define the computation overhead as the weighted sum of time and energy consumption, given by
(13) 
where is a demand overhead factor. From (11) and (12), note that time consumption and energy consumption have a tradeoff relationship with respect to computation resources: as more computation resources are used, computation time decreases while computation energy increases. The overhead factor trades off the importance of these two factors, and should be determined by the requirement of task . For example, a node with a stringent requirement on task completion time can have a lower in order to place more importance on shortening the time at the expense of more energy consumption. gives the local computation overhead where task is locally processed at node .
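The time-energy tradeoff described above can be made concrete with a short sketch. The specific forms used here are assumptions consistent with the text rather than confirmed expressions: time is workload divided by allocated CPU, energy per cycle is the energy coefficient times the squared CPU frequency, and the overhead weights time by `1 - beta` and energy by `beta`, so that a lower `beta` emphasizes time. All names and numeric values are hypothetical.

```python
def computation_time(cycles: float, cpu_hz: float) -> float:
    """Seconds needed to run `cycles` CPU cycles at `cpu_hz` cycles/sec."""
    return cycles / cpu_hz


def computation_energy(cycles: float, cpu_hz: float, kappa: float) -> float:
    """Joules consumed, assuming kappa * cpu_hz**2 Joules per cycle."""
    return kappa * cpu_hz ** 2 * cycles


def computation_overhead(cycles: float, cpu_hz: float, kappa: float,
                         beta: float) -> float:
    """Weighted sum of time and energy; lower beta puts more weight on time."""
    return ((1.0 - beta) * computation_time(cycles, cpu_hz)
            + beta * computation_energy(cycles, cpu_hz, kappa))


# Hypothetical values: 4e6 cycles, 1 GHz allocation, kappa = 1e-27.
t_slow = computation_time(4e6, 1e9)
e_slow = computation_energy(4e6, 1e9, 1e-27)
t_fast = computation_time(4e6, 2e9)
e_fast = computation_energy(4e6, 2e9, 1e-27)
```

Under these assumptions, doubling the allocated CPU halves the time and quadruples the energy, which is why the weighted overhead has an interior minimum in the CPU allocation.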
III-B Communication Overhead
We now define the communication overhead associated with transmission of a task from node to . When , we can write the signal-to-interference-plus-noise ratio (SINR) from node to node on subchannel as
(14) 
where all other transmit nodes using subchannel are interferences to the data stream from node when it uses subchannel .
Assuming perfect channel state information (CSI), we can write the maximum achievable data rate (in bits/second) from node to node on subchannel as
(15) 
where is the bandwidth of each frequency subchannel. Then, the total maximum achievable data rate from node to node over all subchannels is
(16) 
When node is a transmitter, by (5), only one subchannel is active. In other words, when , for , leading to . Letting be the active subchannel for node , i.e., satisfying , the achievable rate is
(17)  
Given the data rate, we can compute the communication time (in seconds) from offloading node ’s task to as
(18) 
The communication energy consumption for node corresponding to the link from to is
(19) 
where is the constant circuit power including power dissipations in the transmit filter, mixer, and digital-to-analog converter, which are independent of the actual transmit power .
With these expressions for and , the communication overhead is defined with respect to the overhead factor as
(20) 
Note that there is a tradeoff between and with respect to the transmit power : as more power is applied, decreases due to the increasing data rate in (17), while increases because increases.
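The chain from SINR to communication overhead can be sketched as below. The forms are assumptions consistent with the description: a Shannon-type rate of bandwidth times log2(1 + SINR), time equal to data size over rate, energy equal to transmit power plus circuit power multiplied by time, and the same `1 - beta` / `beta` weighting assumed for the computation overhead. Names and numbers are hypothetical.

```python
import math


def data_rate(bandwidth_hz: float, sinr: float) -> float:
    """Achievable rate in bits/sec on the single active subchannel."""
    return bandwidth_hz * math.log2(1.0 + sinr)


def communication_overhead(data_bits: float, bandwidth_hz: float, sinr: float,
                           tx_power: float, circuit_power: float,
                           beta: float) -> float:
    """Weighted sum of transmission time and transmission energy."""
    time_s = data_bits / data_rate(bandwidth_hz, sinr)
    energy_j = (tx_power + circuit_power) * time_s
    return (1.0 - beta) * time_s + beta * energy_j


# Hypothetical link: 8000 bits, 1 MHz subchannel, SINR of 3 (linear scale).
rate = data_rate(1e6, 3.0)
overhead = communication_overhead(8000, 1e6, 3.0, tx_power=0.1,
                                  circuit_power=0.05, beta=0.5)
```

In this sketch the SINR is passed in as a fixed number. In the full model it depends on the transmit power and on the beamformers of all nodes sharing the subchannel, which is what couples the links together.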
III-C Total Network Overhead
Recall that there are two possibilities for task : (i) local processing, i.e., , and (ii) offloaded processing, i.e., for some . Local processing only incurs computation overhead while offloaded processing incurs both communication and computation overhead, . With this, for a given D2D network topology configuration, we can write the total network overhead to complete all tasks in the network as
(21) 
III-D Optimization Formulation
We now formulate the problem of jointly optimizing the D2D network parameters to achieve the minimum total network overhead . The degrees of freedom available are the task assignments , computational resource allocations , subchannel allocations , and beamforming design variables involving transmit beamformers and receive combiners . The optimization problem is given by:
minimize  (22)  
subject to  (23)  
(24)  
(25)  
(26)  
(27)  
(28)  
(29)  
(30)  
(31)  
(32)  
variables 
Constraints (23)-(27) and (30)-(32) account for task assignment, subchannel allocation, and CPU allocation requirements, which were described in Section II-C. Constraint (29) captures the transmission power budget of an individual node. Note that there is no constraint on such as a maximum magnitude restriction because the data rate is not affected by the magnitude of .
Assuming all nodes have antennas, meaning that for all , the optimization is a mixed integer program (MIP) with non-integer variables from , , , and integer variables from , . The function is nonconvex with respect to and , which makes the problem a nonconvex MIP. Existing solvers for nonconvex MIPs do not scale well with the number of variables [5], and even in a relatively small D2D setting with nodes, subchannels, and antennas, our problem already has more than 2000 variables. We next turn to addressing the challenge of solving this optimization at scale.
III-E D2D Network Optimization Assumptions
A few assumptions made on the D2D model in this section are noteworthy. First, although the network states will be dynamic over time, we assume a quasi-static scenario with active nodes and fixed channels during one codeword block, similar to previous works [23, 28, 3, 42, 27, 7, 26, 6, 12, 39, 17]. The algorithms we develop for solving the optimization (22)-(32) in Section IV could then be applied to each quasi-static scenario as the number of nodes and channel conditions change, or at some suitable time interval. Second, we assume the availability of a network operator, e.g., a base station, which can solve the optimization in a centralized manner via measurements of CSI, availability of subchannels, and knowledge of computation resources. This operator does not provide any additional computational capability to the D2D network as we assume it is occupied solving the optimization. Third, we do not take into account the process of transferring the result of an offloaded task computation back to the source node. We consider that the output data is negligible in size compared with the task so that it can be transferred through the network with minimal load.
IV Optimization Algorithms
In this section, we develop two methods for solving the minimum overhead optimization problem (22)-(32). The first method, semi-exhaustive search, provides a best-effort attempt to obtain the optimal solution, but has exponential complexity. The second method, efficient alternate optimization, reduces the complexity to polynomial time, for which we use semi-exhaustive search as an optimality benchmark.
IV-A Semi-Exhaustive Search Optimization
Given that the task assignment and subchannel allocation variables are binary, an intuitive approach to solving the optimization is to exhaustively search through all of their possibilities, so long as the search space is not prohibitively large. Then, for each possibility, we can derive solvers for the non-integer variables , , and . We refer to this method as semi-exhaustive search. The overall procedure is described in Algorithm 1: each choice of and satisfying constraints (23)-(27) is considered. For given task assignments , we solve the CPU allocation problem for the processing resources , which is a convex problem. In addition, for fixed task assignments and subchannel allocations , we solve the problem with respect to the beamformers and combiners , which is a beamforming design problem. We will develop solutions to these two problems in the rest of this section.
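The outer enumeration of semi-exhaustive search can be sketched as follows. This is not Algorithm 1 itself: the data layout, the function names, and the `inner_cost` callback (a stand-in for the CPU allocation and beamforming solvers) are assumptions made for illustration.

```python
import itertools
from typing import Callable, Dict, Iterator, Optional, Tuple

Assignment = Tuple[int, ...]  # assignment[k] = node that processes task k
Channels = Dict[int, int]     # channels[k] = subchannel of transmit node k


def feasible_configs(n_nodes: int,
                     n_subchannels: int) -> Iterator[Tuple[Assignment, Channels]]:
    """Yield every (task assignment, subchannel allocation) pair that is feasible."""
    for assignment in itertools.product(range(n_nodes), repeat=n_nodes):
        tx = [k for k, n in enumerate(assignment) if n != k]  # offloading nodes
        for choice in itertools.product(range(n_subchannels), repeat=len(tx)):
            channels = dict(zip(tx, choice))
            # Transmitters sharing a receive node must use different subchannels.
            pairs = [(assignment[k], m) for k, m in channels.items()]
            if len(pairs) == len(set(pairs)):
                yield assignment, channels


def semi_exhaustive_search(
        n_nodes: int, n_subchannels: int,
        inner_cost: Callable[[Assignment, Channels], float],
) -> Optional[Tuple[float, Assignment, Channels]]:
    """Keep the configuration whose inner-solver cost is lowest."""
    best = None
    for assignment, channels in feasible_configs(n_nodes, n_subchannels):
        cost = inner_cost(assignment, channels)
        if best is None or cost < best[0]:
            best = (cost, assignment, channels)
    return best


# Toy inner cost (hypothetical): one unit of overhead per offloading link.
best = semi_exhaustive_search(2, 2, lambda a, ch: float(len(ch)))
```

The number of task assignments alone grows as the number of nodes raised to the power of the number of nodes, before subchannel choices are counted, which illustrates why this approach has exponential complexity.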
IV-A1 CPU allocation
With task assignments determined, the optimization problem (22)-(32) with respect to CPU allocations can be reduced to
minimize  (33)  
subject to  (34)  
variables 
The problem can be decomposed into independent subproblems: each node can allocate its own CPU regardless of the others. For each node , the optimization problem is given as
minimize  (35)  
subject to  (36)  
(37)  
variables 
Note that is convex with respect to (since all parameters in are positive) and the constraints (30)-(32) are also convex. Therefore, optimization (33)-(34) is convex. The decomposed subproblem (35)-(37) for each is also a convex problem that can be easily solved.
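One way the per-node convex subproblem could be solved is through its optimality (KKT) conditions with bisection on the multiplier of the capacity constraint. The sketch below assumes a per-task cost of `(1 - beta) * c / f + beta * kappa * c * f**2` for workload `c` and allocation `f`, which is an assumed form consistent with the computation overhead description. It is an illustrative solver, not necessarily the method used by the authors.

```python
from typing import List, Sequence


def _marginal(f: float, lam: float, c: float, beta: float, kappa: float) -> float:
    """Derivative of the per-task cost plus lam * f with respect to f."""
    return 2.0 * beta * kappa * c * f + lam - (1.0 - beta) * c / (f * f)


def _best_f(lam: float, c: float, beta: float, kappa: float,
            f_max: float, iters: int = 100) -> float:
    """Root of the increasing function _marginal on (0, f_max], via bisection."""
    if _marginal(f_max, lam, c, beta, kappa) <= 0.0:
        return f_max  # cost still decreasing at the cap
    lo, hi = 0.0, f_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _marginal(mid, lam, c, beta, kappa) > 0.0:
            hi = mid
        else:
            lo = mid
    return hi


def allocate_cpu(cycles: Sequence[float], betas: Sequence[float],
                 kappa: float, f_max: float, iters: int = 100) -> List[float]:
    """Split f_max (cycles/sec) across the tasks assigned to one node."""
    if any(c <= 0 for c in cycles) or any(not 0.0 <= b < 1.0 for b in betas):
        raise ValueError("need positive workloads and 0 <= beta < 1")

    def alloc(lam: float) -> List[float]:
        return [_best_f(lam, c, b, kappa, f_max) for c, b in zip(cycles, betas)]

    f = alloc(0.0)
    if sum(f) <= f_max:
        return f  # capacity constraint is inactive
    # At lam_hi every task receives at most an equal share, so the sum fits.
    lam_lo = 0.0
    lam_hi = max((1.0 - b) * c for c, b in zip(cycles, betas)) * (len(cycles) / f_max) ** 2
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        if sum(alloc(lam)) > f_max:
            lam_lo = lam
        else:
            lam_hi = lam
    return alloc(lam_hi)


# Hypothetical node with two identical tasks and kappa = 1e-27.
loose = allocate_cpu([4e6, 4e6], [0.5, 0.5], 1e-27, f_max=2e9)
tight = allocate_cpu([4e6, 4e6], [0.5, 0.5], 1e-27, f_max=1e9)
```

When the capacity is loose, each task receives its unconstrained optimum and some CPU is left unused, because running faster would cost more energy than the time saved is worth. When the capacity binds, the multiplier shrinks all allocations until they fit.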
IV-A2 Beamforming design
With task assignments and subchannel allocations determined, the optimization problem (22)-(32) with respect to the beamforming design variables and can be reduced to
minimize  (38)  
subject to  (39)  
variables 
We refer to this as the minimum communication overhead beamforming (MCOB) problem. Conventionally, objective functions in beamforming resource allocation problems take the form of sum rate or sum harmonic rate utility functions [18]. In our D2D setting, the objective instead becomes the weighted sum of time and energy consumption for transmission.
We are interested in determining the variables and related to active data streams, i.e., for , , and with and . Denote the set of all transmit nodes as from (7). Since each node offloads to one on one subchannel , we index this data stream as the tuple . (Footnote 1: Once and are determined, the tuple is specified by and can be written as . For convenience, we omit the dependency of and on .) Our problem can then be rewritten as
minimize  (40)  
subject to  (41)  
variables 
This problem is nonconvex and hard to solve due to the logarithm term in the data rate in (17). However, if the beamformers are fixed, minimizing (40) leads to the well-known minimum mean square error (MMSE) receiver. (Footnote 2: In this case, the notation is short for which denotes all variables with . Throughout the paper, the context will make the distinction clear. The same simplification is applied for , , , and .) If we restrict ourselves to using the MMSE receiver, we can transform the data rate into a quadratic form with the following lemma.
Lemma 1.
With an MMSE-designed receiver, the data rate in (17) can be represented in quadratic form as
(42) 
where
(43) 
is an auxiliary variable, and the term is the MSE of receive node given by
(44) 
The proof is immediate from [31]. Since is concave with respect to each of the variables , and , the optimal solution to (42) is
(45)  
(46) 
where . Note that is the MMSE receiver solution.
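The link between the MMSE receiver and the data rate rests on a standard identity: for unit-power streams and identity noise covariance, the MSE achieved by the MMSE combiner equals 1/(1 + SINR), so log2(1 + SINR) equals -log2(MSE). The NumPy sketch below checks this numerically. The function names and the use of precomputed effective channels (channel matrix times transmit beamformer) are assumptions for illustration, not the paper's notation.

```python
import numpy as np


def mmse_receiver(h_desired, h_interf):
    """Return the MMSE combiner and its MSE.

    h_desired: effective channel (channel matrix times beamformer) of the
               desired stream, shape (n_rx,).
    h_interf:  list of effective channels of interfering streams.
    """
    n_rx = h_desired.shape[0]
    # Received-signal covariance: desired + interference + identity noise.
    cov = np.eye(n_rx, dtype=complex) + np.outer(h_desired, h_desired.conj())
    for g in h_interf:
        cov += np.outer(g, g.conj())
    u = np.linalg.solve(cov, h_desired)       # MMSE combiner
    mse = (1.0 - h_desired.conj() @ u).real   # resulting mean squared error
    return u, mse


def sinr(u, h_desired, h_interf):
    """SINR of the stream after applying the linear combiner u."""
    signal = abs(u.conj() @ h_desired) ** 2
    interference = sum(abs(u.conj() @ g) ** 2 for g in h_interf)
    noise = np.linalg.norm(u) ** 2
    return signal / (interference + noise)


# Hypothetical 4-antenna receiver with two interferers.
rng = np.random.default_rng(0)
h = rng.standard_normal(4) + 1j * rng.standard_normal(4)
gs = [rng.standard_normal(4) + 1j * rng.standard_normal(4) for _ in range(2)]
u, mse = mmse_receiver(h, gs)
rate_from_sinr = np.log2(1.0 + sinr(u, h, gs))
rate_from_mse = -np.log2(mse)
```

The two rate expressions agree up to floating-point error. Expressing the rate through the MSE is what makes a quadratic reformulation in the beamformers possible.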
Using the formulation in Lemma 1, the optimization problem (40)-(41) can be written as
minimize  (47)  
subject to  (48)  
variables 
where
(49) 
For a given , the optimal solutions of and for (47)-(48) are given by (45) and (46). Moreover, for given and , the function is convex and is concave with respect to . Optimization (47)-(48) with respect to is thus a convex-concave multiple-ratio fractional programming problem [30], which is not convex. Motivated by [16], we will exploit the fractional programming approach to solve it.
Specifically, we have the following theorem, which introduces an equivalent problem that is convex with respect to each individual set of variables , , and when two other sets of variables and are introduced.
Theorem 1.
The proof of Theorem 1 is relegated to the Appendix. Optimization (47)-(48) is equivalent to (50)-(52) in the sense that they have the same globally optimal solutions. Using the fact that optimization (50)-(51) is convex with respect to each set of variables , , and , we will solve for each set, iteratively, which will yield solutions with and being fixed. Specifically, we propose an iterative algorithm to solve (50)-(51) and satisfy the system equations (52) simultaneously: given and , we solve for , , and , and then update and from the updated variables , , and .
To solve (50)-(51) for fixed and , we use the block coordinate descent (BCD) method, where each set of the variables is solved fixing the other two. In particular, with and fixed, the optimal solution of each is given in (45). With and fixed, the optimal solution of each is given in (46). The remaining part is to solve for with and fixed.