I Introduction
Modern mobile devices such as smartphones and tablets are sophisticated computing and networking platforms with enhanced sensing capabilities (e.g., user location detection and air temperature and humidity measurement). Moreover, mobile medical devices are now equipped with advanced medical sensors for electrocardiography (ECG) signals, body temperature, blood glucose, heart rate, blood oxygen saturation, etc. The popularity of mobile devices now enables the instantaneous formation of large-scale ad hoc networks by making connections among them. With the support of autonomous networking technologies (e.g., Qualcomm WiFi SON [1] and Bluetooth mesh networking [2]), devices can participate in such networks by transmitting the data collected from their sensors and relaying data from other devices. Therefore, it is important to form a network topology that improves data transfer rates and data fidelity under severe energy constraints.
One of the challenges in designing a network topology is the high computational complexity involved in finding the optimal solution in a large-scale network. Because recent sensor networks contain a large number of sensor nodes, the number of potential network topologies increases exponentially with the number of nodes. Therefore, it is generally difficult to solve the optimization problem for network topology unless it can be formulated as a special type of optimization problem (e.g., a convex optimization problem).
Another challenge in large-scale sensor networks comes from multisource multicast flows, which are inevitable in ad hoc networks. Multiple sources can exist in the network because sensor nodes generate data from their own sensing operations and deliver information to a set of target destination nodes, resulting in multicast flows in the network. Examples of multisource multicast flows in networks include a sensor grid [3], a healthcare wireless sensor network [4], and the Internet of things [5]. Multisource multicast flows frequently overlap in network paths, but only one flow can be delivered at a time. This is the bottleneck problem, which can induce delay in data delivery and degrade network throughput [6]. Therefore, the incoming rate of a node should be taken into account as a constraint in the network topology design problem, such that it does not exceed the link capacity.
In this paper, we propose a distributed network topology design that overcomes the challenges discussed above, while explicitly considering the multisource multicast flows in large-scale sensor networks. Specifically, we adopt a game-theoretic approach and formulate distributed network topology design as a network formation game. The nodes in the network are the players of the game, and each decides whether to make connections with its neighbor nodes by considering the associated rewards and costs. The reward in the utility function represents the reduced distance between source and destination nodes, which creates shortcuts from sources to destinations and reduces the connection failure ratio between them, leading to potential network throughput improvement. To this end, we design the reward as a decreasing function of the distance between a node and the destination. This gives a node distant from the destination, which has higher path diversity, an incentive to make alternative links, and gives a node close to the destination an incentive to make direct links to the destination. Hence, the reward function represents the importance of the locations of nodes in a network. We impose the cost associated with link formation on the utility function to prevent nodes from making redundant outgoing links. Therefore, each node can make the optimal number of outgoing links by taking actions that maximize its utility. Because each node determines its own actions for outgoing link formation, the proposed approach is a distributed solution to the network topology formation problem. Unlike a centralized optimization solution, which must evaluate all potential network topologies and thereby incurs high computational complexity, the proposed solution enables each node to choose its own actions, leading to significantly lower overall complexity.
The proposed approach also adopts network coding [7] to solve the bottleneck problem incurred by multisource multicast flows in the network [8]. Network coding is widely known to have several advantages, such as efficient resource usage (e.g., bandwidth and power) and improved robustness and throughput [9, 10]. In this paper, we employ intersession network coding [11], which combines multiple packets from different sources into a single packet before transmission [12]. However, it is challenging to design a low-complexity strategy for topology formation, since finding the optimal network topology in a network with multisource multicast flows is an NP-hard problem when network coding is blindly deployed [13].
In this paper, we propose a low-complexity solution to topology design for multisource multicast flows by deploying network coding. Since network coding operations combine multiple incoming packets into a single packet, the outgoing rate of a node can always be fixed, which eliminates the constraint on the incoming rate of a node. This enables a node to build its outgoing links without considering the link formations of other nodes, which means that decisions about link formation can be made between only two nodes. Therefore, a multiplayer network formation game that includes multicast flows can be decomposed into independent two-player link formation games with a unicast flow, as we analytically show in this paper. Because the complexity required to solve a two-player link formation game with a unicast flow is significantly lower than that needed for a multiplayer network formation game with multisource multicast flows, the overall complexity can be significantly reduced. Note that if network coding is not deployed, the multiplayer network formation game cannot be decomposed into two-player link formation games.
The main contributions of this paper can be summarized as follows.

We formulate the problem of network topology design as a network formation game, which leads to a distributed strategy for topology formation.

We propose to deploy network coding as a solution to the bottleneck problem inherently incurred by multisource multicast flows.

We design a utility function for the network formation game. Players act to maximize this utility, which eventually leads to increased network throughput and fewer redundant links between nodes.

We analytically show that network coding can decompose the network formation game into link formation games, leading to an algorithm with significantly lower complexity.
Note that the focus of this paper is not on the code design for intersession network coding, which has been extensively studied in prior works [14, 15, 16, 17, 18, 19, 20, 21]. We rather focus on how to design network topologies that can lead to improved network performance (e.g., throughput and delay), which have been mostly considered as a given condition in previous literature.
The rest of the paper is organized as follows. In Section II, we briefly review the related works. The network model and detailed process of data collection and dissemination based on network coding operations are discussed in Section III. The network formation game for multicast flows and its decomposition into link formation games for a unicast flow are proposed in Section IV. Simulation results and numerical evaluations are presented in Section V, and conclusions are drawn in Section VI.
II Related Works
Before the notion of network coding, it was infeasible to achieve the upper bound of multicast capacity with conventional store-and-forward (SF) relaying architectures [22]. Steiner tree based topology design can achieve the upper bound of multicast capacity, but solving the Steiner tree problem is NP-hard [23]. In [7], it is first shown that network coding can achieve the maximum throughput via the max-flow min-cut theorem, and it is further proved that linear network coding [24] can achieve the upper bound of capacity. The optimal topology solution for a single source scenario is studied in [25]. However, in multisource scenarios, which are frequently observed in sensor networks, the max-flow min-cut bounds cannot fully characterize the capacity region, and thus only loose outer bounds [26] and suboptimal solutions [27] have been studied.
Network coding has been deployed in a variety of sensor network scenarios [28, 9, 10]. For example, network coding can improve the energy efficiency of a body area sensor network [28]. A robust network coding protocol is proposed for smart grids to enhance the reliability and speed of data gathering [9]. In [10], a mobile crowdsensing scenario is considered for decentralized data collection, and network coding is deployed for energy and spectrum efficiency.
Topology design in sensor networks has been studied in the context of self-organizing networks [29, 30, 31]. In [29], protocols are proposed for the self-organization of wireless sensor networks with a large number of mainly static and highly energy-constrained nodes. In [30], a self-organizing routing protocol for mobile sensor nodes declares the membership of a cluster as the nodes move and confirms whether a mobile sensor node can communicate with a specific cluster head within a time slot allocated in a time division multiple access schedule. In [31], distributed energy-efficient deployment algorithms are proposed for mobile sensors and intelligent devices that form an ambient intelligent network.
Distributed decision making has been widely considered in the field of game theory, and there is a large body of literature on network formation games, not only in economics but also in engineering [32]. For application to wireless sensor networks, game-theoretic distributed topology control for wireless transmission power is proposed in [33]. The purpose of topology control is to assign the per-node optimal transmission power such that the resulting topology can guarantee the target network connectivity. A similar study of a topology control game in [34] aims to choose the optimal power level for network nodes in ad hoc networks to ensure desired connectivity properties. In [35], a dynamic topology control scheme that prolongs the lifetime of a wireless sensor network is provided based on a noncooperative game.
|  | [29, 30] | [31, 33, 34] | [27] | [25] | [35] | This Paper |
|---|---|---|---|---|---|---|
| Source type | Multisource | Multisource | Multisource | Single source | Multisource | Multisource |
| Flow type | Unicast | Unicast | Unicast | Multicast | Multicast | Multicast |
| Solution type | Centralized | Distributed | Centralized | Distributed | Distributed | Distributed |
| Relaying type | SF | SF | Network coding | Network coding | SF | Network coding |
In Table I, several representative related works are classified in terms of source, flow, solution, and relaying types. In contrast to [29, 30, 31, 33, 34, 25, 27], this paper covers the most general source and flow types, i.e., multisource multicast flows. Compared to [35], which also considers multisource multicast flows, this paper explicitly considers the network coding function in the topology design problem, so that the previously described throughput advantage of network coding in a multicast flow can be properly utilized.
III Network Coding based Sensor Networks
In this section, we describe our network model and network coding based packet dissemination in sensor networks.
III-A Network Model
We consider a directed graph with a set of nodes and a set of directed links (footnote 1). An element can be a source node, a destination node, or a source and destination node. The number of nodes in is denoted by , where denotes the cardinality of a set. The set of destination nodes for is denoted by , and represents an index set of destinations for all network nodes.
Footnote 1: If the considered network changes over time, the network can be modeled by a directed graph with a set of nodes and a set of directed links as a function of time slot . However, our focus in this paper is on the distributed solution for topology formation at each time slot, so we omit the subscript in the rest of this paper without loss of generality.
A directed link from to is denoted by , where the active link () can deliver data. Otherwise, the link is inactive, and . Note that the link has direction, i.e., is the tail and is the head, so that . In this paper, is called an incoming link of or an outgoing link of . includes only active links so that is the number of active links in .
Let be the Euclidean distance between and . As special cases, we define that if and if and are not reachable. The set of neighbor nodes of is denoted by , where denotes a connection boundary. If , then a link between and cannot be formed, i.e., . and denote a set of neighbor nodes of with active incoming and outgoing links, respectively. An illustrative example of a sensor network topology with seven nodes is shown in Fig. 1.
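The neighbor set defined above can be illustrated with a minimal Python sketch. The function name `neighbors`, the 2-D coordinates, and the choice of a non-strict comparison at the connection boundary are assumptions made for illustration only; the paper's own notation for these quantities is not reproduced here.

```python
import math

def neighbors(positions, i, boundary):
    """Return indices of nodes whose Euclidean distance to node i is within
    the connection boundary (node i itself is excluded).

    Assumption: a node exactly at the boundary still counts as a neighbor.
    """
    xi, yi = positions[i]
    return [
        j
        for j, (xj, yj) in enumerate(positions)
        if j != i and math.hypot(xi - xj, yi - yj) <= boundary
    ]

# Hypothetical node coordinates for a three-node example.
positions = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
print(neighbors(positions, 0, 2.0))  # node 2 is at distance 5, so only node 1 qualifies
```

A link can only be considered between a node and the members of this list; any pair outside the boundary is excluded before a link formation decision is made.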
For simplicity, we assume that the capacity of a link is one packet per unit of time slot, i.e., a node can transmit only one packet in each time slot [36, 37, 38]. We also assume that each sensor always has a packet to send at each time slot (e.g., a sensor generates a packet for every time slot). Hence, if it builds an outgoing link, there always exists a packet to be transmitted through the outgoing link. If a node has multiple outgoing links, it multicasts a single packet per time slot through all the outgoing links, so that all outgoing links from one node deliver the same packet in the same time slot. If a node has multiple incoming links, it can receive multiple individual packets by deploying, e.g., multipacket reception techniques [39, 40]. However, even though a node can receive multiple packets in a single time slot, under the conventional SF relaying architecture it cannot transmit more than one packet at a time because of the link capacity constraint. Hence, a node becomes a bottleneck of flows when it receives a larger number of packets than its output link capacity (i.e., one packet per unit of time slot), which is referred to as the bottleneck problem.
To prevent the bottleneck problem, a node may restrict the number of incoming packets so that it does not exceed the link capacity. Such a constraint can be enforced by restricting the number of incoming links to at most one, i.e.,
(1) 
which is referred to as interlink dependency condition in this paper.
III-B Data Collection and Network Coding Based Dissemination
Every node in plays the role of source by collecting data (i.e., sensing) and simultaneously plays the role of relay by disseminating the collected data. Since the goal of the network is to deliver all data collected by source nodes to their own destination nodes, this may result in multisource multicast flows.
Let be source data collected by node , which needs to be delivered to destination nodes. We denote as the data transmitted at the link . The network status of is defined as the set of data included in with the interlink dependency condition in (1), which is denoted by and expressed as (2).
(2) 
If , then . Otherwise, . Hence, in (2) represents a set of data included in for the link dependent data .
III-B1 Elimination of interlink dependency by network coding
If network coding is deployed in , the resulting network status is denoted by and is expressed as
(3) 
where denotes a network coded packet that flows into . The network coded packet is a vector of the global coding coefficients as the header and as the payload, which is constructed as
(4)
where and denote the addition and multiplication operations in a Galois field (GF), respectively. Hence, the network coding function combines all packets that flow into and generates a single packet . This operation allows a node to take multiple incoming links and prevent the bottleneck problem, so that the interlink dependency in (2) can be eliminated as in (3).
The elimination of interlink dependency through the network coding function can be interpreted as follows. The network coding function converts the link dependent data into the link independent data . Hence, receives a single packet from all incoming links no matter how many incoming links are formed (footnote 2). Examples of this interpretation are illustrated in Fig. 2.
Footnote 2: Note that (3) does not mean that the packet is coming to the node for all incoming links . Actual packets in are not all the same as . All incoming packets are combined into based on the network coding operation in (4), and this can be interpreted as the node receiving from previous nodes.
In the network operation in (4), node combines its data and all the incoming data multiplied by local coding coefficients , expressed as
(5)  
(6) 
In (5), because is transmitted through , and (6) is induced from (4).
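The combining operation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the field is taken to be the prime field GF(257) so that plain modular arithmetic suffices (the paper only specifies "a Galois field"), each payload is a single field symbol, and the names `source_packet`, `combine`, and `random_combine` are hypothetical.

```python
import random

P = 257  # assumed prime field size; any prime works for this sketch

def source_packet(index, num_sources, data):
    """Uncoded packet of source `index`: unit-vector header plus one payload symbol."""
    header = [0] * num_sources
    header[index] = 1
    return header, data % P

def combine(packets, coeffs):
    """Linearly combine (header, payload) packets with the given local coefficients.

    Headers and payloads are mixed with the same coefficients, so the header
    of the result always holds the global coefficients of the payload.
    """
    header = [0] * len(packets[0][0])
    payload = 0
    for (h, x), c in zip(packets, coeffs):
        header = [(a + c * b) % P for a, b in zip(header, h)]
        payload = (payload + c * x) % P
    return header, payload

def random_combine(packets, rng=random):
    """Combine packets with nonzero local coefficients drawn uniformly at random."""
    coeffs = [rng.randrange(1, P) for _ in packets]
    return combine(packets, coeffs)

# Three hypothetical source symbols mixed with fixed local coefficients 2, 3, 5.
data = [10, 20, 30]
pkts = [source_packet(k, len(data), x) for k, x in enumerate(data)]
print(combine(pkts, [2, 3, 5]))  # header [2, 3, 5], payload 2*10 + 3*20 + 5*30 = 230
```

Whatever the number of incoming packets, `combine` emits exactly one packet, which is the property that removes the limit on incoming links in this sketch.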
III-B2 Conditions for perfect decoding
Let be a set of source nodes whose destination set includes and be an index set of source nodes for . Given the packets that received, we can construct a vector of network coded data and the global coding coefficient matrix , expressed as
(9) 
where .
Node is then able to perfectly reconstruct its source data, as long as satisfies the following two conditions: 1) for all , where denotes the all-zero vector with length , and 2) is full-rank, where is the matrix with all for removed from . Condition 1) ensures that the received packets include all data that should be reconstructed. This greatly depends on the connection failure ratio of , which is considered later in this paper (i.e., Fig. 6) and can be controlled as a system parameter. Condition 2) guarantees that the decoding process can uniquely reconstruct data for all . Since the global coding coefficient matrix becomes full-rank with high probability under random linear network coding (RLNC) [22, 42] (footnote 3), condition 2) can be satisfied. The decoding process can then be implemented based on the well-known Gaussian elimination in a GF [43].
Footnote 3: It is shown in [42] that if RLNC is employed, the probability that the global coding coefficient matrix is full-rank is at least . In general settings, the GF size is much larger than the number of destinations in the network. Hence, it is widely accepted that the global coding coefficient matrix is full-rank with high probability if RLNC is used.
While the conditions for perfect reconstruction can generally be satisfied with high probability, some special applications (e.g., delay-sensitive applications and error-prone networks with a high packet loss rate) may cause perfect reconstruction to fail, i.e., random mixing in the intersession network coding may lead to increased decoding delay if only a subset of the coded sources of interest arrives at the destination node. In this case, alternative decoding algorithms [44, 45, 46] can be deployed.
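The Gaussian elimination step and the full-rank condition can be sketched as follows. As an assumption for illustration, the field is the prime field GF(257), the coefficient matrix is square (as many received packets as unknown source symbols), and the function name `solve_mod_p` is hypothetical.

```python
P = 257  # assumed prime field size, so inverses follow from Fermat's little theorem

def solve_mod_p(C, y, p=P):
    """Solve C x = y over GF(p) by Gauss-Jordan elimination.

    C is a square list-of-lists coefficient matrix and y the received payloads.
    Returns the decoded symbols, or None if C is not full-rank.
    """
    n = len(C)
    # Augmented matrix [C | y], reduced modulo p.
    A = [[v % p for v in row] + [rhs % p] for row, rhs in zip(C, y)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return None  # rank deficient: unique decoding is impossible
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], p - 2, p)  # multiplicative inverse in GF(p)
        A[col] = [(v * inv) % p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [(a - f * b) % p for a, b in zip(A[r], A[col])]
    return [row[n] for row in A]

# Two coded packets carrying hypothetical source symbols 10 and 20.
C = [[1, 2], [3, 4]]
y = [(1 * 10 + 2 * 20) % P, (3 * 10 + 4 * 20) % P]
print(solve_mod_p(C, y))  # recovers [10, 20]
```

When the rows of the coefficient matrix are linearly dependent, the function returns `None`, which corresponds to a violation of condition 2).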
IV Distributed Topology Formation Based on Game-theoretic Approaches
In this section, we propose a distributed topology formation strategy in a sensor network with multisource multicast flows. We formulate the problem of how to make decisions on link connections between nodes in the considered network as a game, referred to as network formation game. Then, we show that the network formation game can be decomposed into link formation games, which enables each node to decide which links are active or inactive. Therefore, this eventually leads to a distributed solution.
IV-A Network Formation Game
Given a set of nodes and a destination index matrix , where is an index vector for destination nodes of , a strategic form of the network formation game can be expressed as
where , and denote a finite set of players, a finite set of actions for player , and the utility function of player , respectively. denotes the Cartesian product.
A network node is a player in the network formation game, which makes decisions about link formation with its neighbor nodes . The action of is denoted by . The utility of is defined as a quasilinear utility function, expressed as
(10) 
where denotes a set of actions taken by players other than in . Given destination nodes, the utility of a player can be determined by the reward and cost associated with its own and others’ actions. The reward for action taken by is given as
(11) 
where is determined by action .
The reward in (11) represents the distance reduction toward the destination nodes achieved by the action , which induces short delay and high throughput. If all nodes in the network build links with consideration of the reward, they connect to the node as close as possible to the destination, which creates a shortcut from sources to destinations. Let be a vector of distances from to destinations for all . We define the function such that it is inversely proportional to (footnote 4), thereby leading to higher rewards for nodes closer to the destinations. For example, if is located closer to destinations than , then and correspondingly, meaning that positive rewards are given for the formation of link . Hence, the reward function can represent the importance of a node location. Moreover, the reward function can lead to higher source-destination connectivity in the network. In link formation, closer nodes have a greater effect on connectivity than distant nodes because a distant node has higher path diversity; it may have alternative routes toward destinations.
Footnote 4: In Section V, we provide more details about with an example (e.g., (31) and (32)) and confirm the effect of node location on link formation (e.g., Fig. 4).
Given the actions selected by players, the cost is defined as
(12) 
where is a unit cost for link formation. We define . The cost in (12) represents the total payment required for all outgoing links that makes. This can be considered as the penalty for causing interference to neighbor nodes or as the energy consumption required to transmit a radio signal. For a link between and , if only one of and decides to build the outgoing link, the unit cost for link formation is charged solely to the node that builds the link. If both nodes decide to build the link, the link formation cost is divided equally between them, following the equal-division mechanism that was first proposed in [47] and has been extensively deployed for network formation costs (e.g., [48, 49]), i.e.,
(13) 
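The cost-sharing rule stated above can be written as a small Python sketch. The function name `link_cost_share` and the boolean encoding of the two nodes' decisions are assumptions for illustration; only the rule itself (sole payer if one node builds, equal split if both build) comes from the text.

```python
def link_cost_share(builds_i, builds_j, unit_cost):
    """Return (cost charged to i, cost charged to j) for the link between i and j.

    builds_i / builds_j: whether each node decides to build the link.
    unit_cost: the unit cost of forming one link.
    """
    if builds_i and builds_j:
        # Both nodes build the link: the unit cost is divided equally.
        return unit_cost / 2, unit_cost / 2
    if builds_i:
        return unit_cost, 0.0
    if builds_j:
        return 0.0, unit_cost
    return 0.0, 0.0  # no link, no cost

print(link_cost_share(True, True, 1.0))   # (0.5, 0.5)
print(link_cost_share(True, False, 1.0))  # (1.0, 0.0)
```

Under this rule a node pays less for a link when its neighbor also wants it, so the cost term depends on the other player's action, which is what makes the pairwise decision a game rather than a single-node optimization.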
The solution to the network formation game is the set of actions , which is optimally taken by each player and determines and the corresponding network topology. While the proposed solution to the network formation game can be obtained in a distributed way, the computational complexity required to find the solution can increase significantly, especially as becomes large (i.e., as the network size grows). Hence, we show next that the network formation game can be decomposed into several link formation games by deploying network coding, which enables the solution to be found with significantly lower complexity.
IV-B Network Coding Based Game Decomposition
We define edge-disjoint subgraphs of as a set of subgraphs whose links are disjoint and whose union is . (There can be at most edge-disjoint subgraphs in , as each link with its two nodes becomes a subgraph of ; hence, it is always possible to decompose into edge-disjoint subgraphs.) Specifically, for edge-disjoint subgraphs of ,

,

,

and

for .
The network formation game for a subgraph with can be expressed as
and the network status for the resulting network from is denoted by , as defined in (2).
Since the actions simultaneously determined by the players in a game are the union of the links that are active and inactive in the network, the product operation for games can be considered as the union of their network status, expressed as
(14) 
In Theorem 1, we show that network coding can decompose the network formation game into independent games for subgraph for .
Theorem 1.
The network formation game for a graph can be decomposed by network coding into independent games for edge-disjoint subgraphs.
Proof.
To show that the network formation game for a graph can be decomposed by network coding into independent games for edge-disjoint subgraphs, it should be proved that
(15) 
where is the network coding function defined in (3).
The network formation game for a graph is the joint game of edge-disjoint subgraphs , which can be played as sequential conditional games based on a chain rule as in (16)–(18).
(16)
(17)  
(18) 
Here, the equality between (16) and (17) is based on (14), and (18) is based on the definition of network status in (2). Note that the network formation game expressed in (18) still includes the interlink dependency.
By applying the network coding function in (18), we have
(19)  
(20)  
(21)  
(22)  
where the equality between (20) and (21) is based on (3), and (22) is based on (14). Therefore, the network formation game for a graph can be decomposed by network coding into independent games for edge-disjoint subgraphs, which completes the proof. ∎
Importantly, Theorem 1 implies that with multiple destinations in can be further decomposed into for and with single destination node , which is shown in Theorem 2.
In order to prove this, we define a virtual subnode of that has flows to be delivered to destination as . By definition, there are virtual subnodes in . Similarly, a virtual sublink of with destination is denoted by , and includes sublinks. Then, a virtual subgraph for destination can be defined as , which satisfies

,

,

and

for .
Theorem 2.
The network formation game with multicast flows can be decomposed by network coding into independent games with unicast flows for edge-disjoint subgraphs.
Proof.
In this proof, we show that can be decomposed into for and .
In Theorem 1, it is shown that
(23) 
Since a subgraph can be decomposed into virtual subgraphs , the game with network coding can also be decomposed into independent games for virtual subgraphs based on Theorem 1, i.e.,
(24) 
Therefore, we can conclude from (23) and (24) that
(25) 
which completes the proof. ∎
Theorem 2 implies that network coding allows the network formation game for multicast flows (i.e., ) to be decomposed into independent games with unicast flows for edge-disjoint subgraphs (i.e., ). Moreover, Theorem 2 enables the topology of a network with multisource multicast flows to be determined in a distributed way, by solving independent games of edge-disjoint subgraphs with unicast flows, referred to as link formation games in this paper. More details about the link formation game are given in the next subsection. An illustrative example of the network formation and link formation games is shown in Fig. 3.
IV-C Link Formation Games and Distributed Topology Design
As discussed in Section IV-B, a link formation game consists of two players with a unicast flow. The strategic form of the link formation game can be expressed as
(26) 
where and denote a player set and an action set for player for destination , respectively. The utility function is expressed as
(27)  
(28) 
where is an inversely proportional function of . For the link formation game, the cost function can be expressed as
The corresponding normal form of the link formation game is shown in Table II.
[Table II: normal form of the link formation game]

As a solution concept for the link formation game, we adopt the pure strategy Nash equilibrium (NE). A pure strategy NE for and can be expressed as
(29) 
and
(30) 
If multiple pure strategy NEs exist, the set of pure strategy NEs is denoted by
The pure strategy NE enables nodes and to decide which outgoing links are active or inactive, resulting in a stable network topology .
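The pure strategy NE of a two-player, two-action game can be found by checking every action profile for profitable unilateral deviations. The sketch below does this for generic payoff tables; the function name `pure_nash_equilibria`, the action encoding (0 for an inactive link, 1 for an active link), and the payoff numbers are hypothetical and are not taken from Table II.

```python
from itertools import product

ACTIONS = (0, 1)  # assumed encoding: 0 = keep the link inactive, 1 = activate it

def pure_nash_equilibria(u_i, u_j):
    """Enumerate pure strategy NEs of a 2x2 game.

    u_i[a_i][a_j] and u_j[a_i][a_j] are the payoffs of players i and j
    when i plays a_i and j plays a_j.
    """
    equilibria = []
    for a_i, a_j in product(ACTIONS, repeat=2):
        # Neither player may gain by deviating while the other's action is fixed.
        i_is_best = all(u_i[a_i][a_j] >= u_i[b][a_j] for b in ACTIONS)
        j_is_best = all(u_j[a_i][a_j] >= u_j[a_i][b] for b in ACTIONS)
        if i_is_best and j_is_best:
            equilibria.append((a_i, a_j))
    return equilibria

# Hypothetical payoffs: building pays off for i in any case, and for j only
# when i also builds and the cost is shared.
u_i = [[0, 0], [4, 6]]
u_j = [[0, -1], [0, 1]]
print(pure_nash_equilibria(u_i, u_j))  # [(1, 1)]
```

With payoffs where building is never worthwhile, e.g. `u_i = [[0, 0], [-1, -1]]` and `u_j = [[0, -1], [0, -1]]`, the only equilibrium returned is `(0, 0)`, i.e., both links stay inactive.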
The steps for the proposed solution are described in Algorithm 1. In Algorithm 1, nodes are decomposed into sets of node pairs for . Then, the link formation game is formulated as given and a destination node . The link formation games, for , , are solved by finding pure strategy NE, and all the active links can eventually be included in .
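The overall loop structure described above (decompose into node pairs, solve one two-player game per pair and destination, collect the active links) can be sketched as follows. This is not a reproduction of Algorithm 1: the function `form_topology`, the shared destination list, and the toy solver `closer_to_destination` are assumptions for illustration, and the toy solver is only a stand-in for the NE computation of the link formation game.

```python
from itertools import combinations

def form_topology(nodes, neighbors, destinations, solve_link_game):
    """Collect active directed links from independent two-player games.

    neighbors[i] is the set of nodes within the connection boundary of i.
    solve_link_game(i, j, d) returns (a_i, a_j), where a_i = 1 means that
    i activates its outgoing link to j for destination d.
    """
    active = set()
    for i, j in combinations(nodes, 2):
        if j not in neighbors[i]:
            continue  # a link cannot be formed outside the connection boundary
        for d in destinations:
            a_i, a_j = solve_link_game(i, j, d)
            if a_i:
                active.add((i, j))
            if a_j:
                active.add((j, i))
    return active

# Toy stand-in solver on a line of nodes: a node activates its link toward a
# partner that is strictly closer to the destination (hypothetical rule).
position = {0: 0.0, 1: 1.0, 2: 2.0}

def closer_to_destination(i, j, d):
    dist_i = abs(position[i] - position[d])
    dist_j = abs(position[j] - position[d])
    return int(dist_j < dist_i), int(dist_i < dist_j)

neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
print(sorted(form_topology([0, 1, 2], neighbors, [2], closer_to_destination)))
# [(0, 1), (1, 2)]
```

Each pairwise game is solved without reference to any other pair, so in this sketch the amount of work grows with the number of neighboring pairs and destinations rather than with the number of possible topologies.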
Theorem 3.
It is guaranteed that at least one topology can be determined by Algorithm 1.
Proof.
This can be proved by showing that there exists at least one pure strategy NE for the link formation game, i.e. .
It is shown from Nash’s Existence Theorem [50] that every finite game has a mixed strategy NE, where pure strategies are chosen stochastically with certain probabilities. Since the link formation game defined in (26) includes a finite number of nodes and a finite number of actions, it is a finite game. Therefore, there exists a mixed strategy NE for this game.
Suppose that is a strategy of player with the probability of taking action . The corresponding utility is given by
Let and be mixed strategy NE for and that satisfy and for all .
For perturbation of mixed strategy NE , the resulting utility of can be expressed as
If , then , and thus, can always decrease its utility by decreasing . This means that the pure strategy strictly dominates any strategies . For , on the other hand, , and thus, the pure strategy strictly dominates any strategies . If , then