We revisit the problem of device association in the setting of massive connectivity. In this problem, we seek to find a rule to associate mobile traffic to certain serving stations such that the incurred load is balanced across the available stations. Although a plurality of association methodologies are available in the literature, here we focus on the underexplored aspect of scalability; we seek load-balancing associations for thousands of devices to hundreds of stations, a setting where even linear program solvers become cumbersome. To address the arising computational challenge, we propose to use Optimal Transport (OT) theory, which studies the transfer of masses over a metric space. Recently, an entropic regularization of OT was shown to provide superfast algorithms for very large OT instances 
, making the framework applicable to a wide range of challenging applications, including image processing and machine learning.Our goal in this paper is to use regularized OT to derive association rules for a very large number of wireless devices.
Our centralized approach is motivated by the upcoming 5G wireless networks. According to recent reports, telecom operators are increasingly interested in deploying Cloud-Radio Access Network (Cloud-RAN, or C-RAN) systems, cf. . The architecture of C-RAN economizes computation and signal processing by migrating the computing part of base stations to a central cloud location, and using simple Remote Radio Heads (RRH) to broadcast signals . Since all the intelligence is now moved to the C-RAN controller, provisioning connectivity in a large geographical area is centrally decided, motivating the centralized association of devices to RRHs.
A contemporary C-RAN controller manages a big number of RRHs and mobile devices, thus the centralized device association is inherently a large scale problem. Furthermore, the number of RRHs and of devices are both expected to increase in the near future. On one hand, the deployment of RRHs will become very dense to improve the effective capacity of the network 
, while on the other hand, we expect a huge number of heterogenous smart IoT devices to connect to 5G mobile networks by 2020–estimated 20.4 billions in. Since 5G applications can have radically diverse requirements, the traffic footprint of each connected device (IoT or regular) can vary significantly. This motivates us to study the large-scale centralized device association with potentially different traffic requirements per device.
Ii The Device Association Problem
Ii-a Downlink Model
We consider the downlink111 Our approach applies also to the uplink as long as the modeled upload rate can be considered independent of the association rule. transmissions of a large C-RAN cell, containing devices and RRHs. We call the association variable, which is the fraction of time device connects to RRH . Our objective is to decide the association rules in order to optimize a network performance metric, such as the sum of RRH load or the average job completion time (delay).
In order to evaluate the impact of an association rule we consider the download rate obtained when device is connected to station while receiving exclusive service; this is denoted by . This rate should be calculated at the granularity of association changes. Typically, we may assume the use of a temporally-fair scheduler which distributes the station resources to the different users and averages out fast fading effects. In this case, a reasonable model for the download rate is:
where, is the bandwidth, , and the index is reversed on purpose to denote directivity, denotes the transmit power, the path loss, the strength of thermal noise. This model has been extensively used in the literature, cf. [7, 8, 9]. Here we use it only as an example; our framework only requires that are independent of the association variables. At this point, it might seem reasonable to connect each device to the RRH that provides the largest –known as the maxSINR rule–however, this results in poor performance as we explain next.
Ii-B RRH Load
We suppose that a device requires to download traffic , which is known222This information can either be provided by the device, or forecasted by the system. and device-dependent. Also, it downloads jobs with average size ; extension to device- or RRH-specific job sizes is trivial. Given a decision we may determine the load of RRH as:
Connecting devices to RRHs with highest has the beneficial effect that the load contribution of each device is minimized (since the term
is maximized). However, when devices are not uniformly distributed in the area and/or total traffic demand is imbalanced, some RRHs attract more connections and become overloaded. This will result into poor performance, because wireless service also depends on the competition between users at the associated RRH, and rapidly degrades as. For example, we may estimate the average job completion time by , where is the average number of jobs running at RRH . Assuming the jobs are served according to the processor sharing discipline, for , is given by 
As , the average completion time of a job of the connected devices will become very high.
Summing up the associations of device , we may quantify its average completion time by . Ultimately, we are interested in minimizing the average completion time over all devices, which leads to device association problem:
This is a non-convex continuous optimization problem, and we are interested to solve it for large dimensions.
Ii-C Related Work
The baseline maxSNR association rule reads “connect each device to the station with the strongest signal”. This simple association rule works well when the mobile traffic load is low, or symmetrically scattered around stations, but otherwise it can lead to significant load imbalances. An improved rule that captures interference (but not traffic fluctuations) is the maxSINR rule “connect each device to the station with the highest SINR”, .
To improve the performance over the above rules, the problem of device association has been studied in its integral form, where each device can be associated to a single station, i.e. the association variable , cf. [12, 13, 14, 15]. However, the combinatorial nature of such formulations makes them inefficient for large-scale instances. Instead,  allowed for all points in the plane and proved that the optimal solution is integral almost everywhere (except boundary points). In this paper, we directly define the association variables to take values with the understanding that fractional solutions correspond to multi-station coverage like CoMP . We observe that our optimal solutions are also “sparse”, meaning that although variables are allowed to take values in , at optimality most of them will be integral (0 or 1).
In the context of continuous device association, past work has considered the optimization of -optimal functions [7, 8] or general convex functions . None of the past approaches can be used to solve (1), which is non-convex. In fact, to the best of our knowledge, solving a large non-convex problem like (1) is generally intractable. Our goal in this paper is to propose a useful methodological tool, which can be applied to very large device association problems.
From the perspective of scalability, most past approaches do not meet the extreme requirements we consider in this paper. An exception is perhaps the distributed algorithm of . A limitation of this prior work, however, is that it depends on the observation of the exact statistical load , and it is not robust to erroneous estimates.
Iii Optimal Transport
Iii-a Introduction to Optimal Transport
The concepts of OT date back to the French mathematician Gaspard Monge who studied in 1781 the transportation of sand masses , what seems to be one of the first linear programming problems studied.
Although the theory of OT has been generalized to optimization with infinite variables, here we restrict our discussion to the illustrative case of “discrete OT”333The notion of discreteness refers to discrete probability measures, hence we have a finite number of continuous transportation variables., where probability mass must be transported between two discrete distributions and , and the transportation cost from point to point is –quite often taken to be the Euclidean distance between the two points. Due to Kantorovich , we can describe the transportation with a coupling
, essentially a joint probability distributionwith marginals and . The discrete OT problem can be written as:
This is a linear program, solvable in polynomial time w.r.t. its size . Further, consider a bipartite graph connecting the points with links of weight , and connect each point of to a virtual source with link capacity , and each point of to a virtual destination with link capacity . The discrete OT corresponds to finding a minimum cost s-t flow of one unit. Using network simplex , we can obtain the solution in ,444Pseudo-polynomial algorithms are faster, but their runtime guarantee depends on the values of [19, 20]. which for , , and , becomes , essentially quadratic to the input size . Although such solution is polynomial to the input size, our problem is of enormous dimensions, and hence the degree of the polynomial is important as well. Below we describe the regularized OT, a method to approximate OT in .
Iii-B Regularized OT
The OT can be approximated in an efficient manner using an entropic regularization . We modify the objective of OT by subtracting the entropy , weighted with the regularization strength coefficient . The regularized OT becomes:
Adding has a number of beneficial effects:
The objective of the regularized OT is –strongly convex, hence (5) has a unique optimal solution.
forces to be non-negative, hence we can drop the constraint .
More importantly, the new objective admits an intuitive reformulation, which leads to a faster algorithm. Consider the affine constraint sets , and , and let ; we can rewrite the regularized OT as:
where is the Kullback-Leibler (KL) divergence. Hence, the regularized OT is a KL projection of onto the intersection of and . Since the KL divergence is the special case of the Bregman divergence for the entropy function, our KL projections benefit from the convergence property of iterative Bregman projections on intersections of affine constraint sets [23, 24]. To obtain an iterative projection algorithm it is convinient to operate on the dual domain. Consider the Lagrangian function
The KKT stationarity condition requires that for each it must hold:
To obtain the projection with respect to we follow the steps: (i) fix , (ii) apply the complementary slackness condition , (iii) solve for . Similarly for
Setting and , we obtain the Sinkhorn algorithm  for regularized OT:
Theorem 1 (From ).
Assume , fix , and choose , Sinkhorn algorithm computes a –approximate solution of (2) in operations.
Since the problem size is , Sinkhorn algorithm converges almost linearly for any fixed (less the logarithmic term). This is to be contrasted with the almost quadratic convergence of network simplex
. Further, Sinkhorn requires only matrix-vector multiplications, hence it admits highly efficient GPU implementations. More information on computational transport can be found in.
Iii-C Sinkhorn vs LP
We implemented Sinkhorn in python and compared its performance to the embedded glpk solver , known to be one of the fastest LP solvers. Table I provides some indicative numerical results to highlight the advantageous performance of Sinkhorn over a standard LP solver, namely: (i) it is faster, (ii) scales better, and (iii) doesn’t run out of memory.
In the experiments, we stopped Sinkhorn when the total absolute residual becomes less than or . The runtimes provided are averages over 7 runs, computed with the timeit package. Notably, we can compute a 1.001-optimal transport in half a second.
Fig. 2(a)-2(b) showcase associations obtained by Sinkhorn and LP-glpk, where we can verify the fidelity of Sinkhorn to the optimal LP solution. Fig. 2(c) showcases a very large instance that the LP solver cannot address.
|1000||25||out of memory||27||37.9|
|5000||25||out of memory||135||204|
|10000||25||out of memory||434||568|
Iv OT as a Heuristic Load Balancer
In this section we explain how one can use OT algorithms to obtain device association rules. An instance of the OT problem is defined by the triplet , i.e., the costs and the left-right marginals. In order to produce a load balancer based on OT, we must provide appropriate values for these parameters. We propose the following choices:
For : we propose to use (i) the Euclidean distance between the location of device and the location of RRH , i.e., , or (ii) the incurred load per unit traffic .
For : the input traffic, .
For : we propose to use (i) equal RRH traffic , (where is the number of RRHs) or (ii) RRH traffic from the maxSINR rule, .
We provide some explanations about the choices. First, the left marginal ensures that the entire traffic of each devices is split among RRHs. To choose wisely, we should know the RRH traffic at optimality, however this information is often not accessible. Instead, it is easy to obtain the maxSINR rule. Last, Euclidean cost favors nearby RRHs, while normalized load connects devices to the RRHs with minimum incurred load, taking into account interference and path loss. We have the following interesting result:
Consider the maxSINR rule denoted with , and the incurred traffic , and suppose it is feasible, i.e., . Also, consider a instance of the OT problem, such that (i) , (ii) , (iii) , with solution .
Then, the association minimizes the total load .
First, note that whenever the maxSINR rule is feasible, it minimizes the total load. This is easy to check by observing that the total load contribution of each device is minimized under the maxSINR rule. Then the total load minimization follows by summing up over devices.
The lemma states that given the RRH traffic under the maxSINR rule denoted with , we may use OT to retrieve the association variables that minimize the total load, a very useful property.
Choosing the marginals like in Lemma 1 (or with a more elaborate scheme as we show below) will work better in practice, however, the choices given at the start of the section can also be useful: they are simpler to compute and can be sufficient when the spatial traffic is uniform. To showcase this property, we experiment with a scenario with random but uniform traffic. In a C-RAN cell with 25 RRHs we scale the number of devices from 100 to 5000. We compare the average completion time under 4 policies, (a) maxSINR, (b) OT with Euclidean costs and equal RRH traffic, (c) OT with load costs and equal RRH traffic, and (d) OT with load costs and RRH traffic equal to (same as Lemma 1). The values shown in Fig. 4 are the ratios of average completion time between the algorithm and the maxSINR. We observe that algorithm (d) performs the same with maxSINR (a) as predicted by the Lemma. The other two algorithms perform similarly. In particular, for a small number of devices, we see that selecting results in performance deterioration (up to 4 times worse), due to the random locations of the devices. However, as the number of devices increases, the traffic becomes more uniform and the choice becomes as good as the maxSINR solution.
V Learning RRH Load
. In this setting, both the maxSINR rule and our OT heuristics will overload the hot spot station. To effectively balance the load we must discover the correct traffic accumulation at RRHs which will provide the average completion time optimization.
We propose an iterative algorithm, where at iteration Sinkhorn algorithm is used with to obtain an association which results in a specific RRH loading . According to this loading, a new marginal is computed, and the process repeats until an accuracy criterion is satisfied. More specifically, our iterative algorithm picks the RRH with the highest load (step 5) and then decreases its aggregate traffic by a fixed term , dispersing the traffic to all other RRHs (step 6). We mention that increasing the traffic in a remote RRH will result in a large number of association changes in the Sinkhorn algorithm which will ensure that the steered traffic maintains minimum transportation cost. We provide the algorithm flow here.
Adaptive Sinkhorn Association
Figures 4(a)-4(c) show the results. First, comparing the two association rules we see that although the maxSINR rule associates the devices according to interference and not traffic, our adaptive Sinkhorn algorithm considers both, and converges to an association where some cell edge devices are steered to the neighboring RRHs. This is done to alleviate the load of the bottom-left RRH. Indeed, figure 4(c) shows the resulting loads of the 4 RRH. Our approach successfully equalizes the loads of the different RRH, while the maxSINR rule fails to do so. Ultimately, our scheme achieves an average completion time 6.3msec while the maxSINR rule 24msec, which corresponds to almost 4 times improvement.
In this paper, we studied the device association in C-RAN, and proposed an iterative algorithm to adjust the loads based on the theory of Optimal Transport. Specifically, we showed that an extension of the Sinkhorn algorithm for C-RAN systems can provide low delay associations for thousands of users in 0.5sec. Our methodology scales to very large problem instances, and has the potential to provide great improvements over the simple baseline approach.
-  M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport”, in NIPS, 2013.
-  “The C-RAN (Centralized Radio Access Network) Ecosystem: 2017 - 2030 - Opportunities, Challenges, Strategies & Forecasts,” SNS Research Report, Jul. 2017.
-  A. Checko, H. Christiansen, Y. Yan, L. Scolari, G. Kardaras, M. Berger, and L. Dittmann, “Cloud RAN for Mobile Networks–A Technology Overview,” IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 405–426, 2015.
-  R. Irmer, H. Droste, P. Marsch, M. Grieger, G. Fettweis, S. Brueck, H.-P. Mayer, L. Thiele, and V. Jungnickel, “Coordinated multipoint: Concepts, performance, and field trial results,” IEEE Communications Magazine, vol. 49, no. 2, pp. 102–111, 2011.
-  N. Bhushan, J. Li, D. Malladi, R. Gilmore, D. Brenner, A. Damnjanovic, R. Sukhavasi, C. Patel, and S. Geirhofer, “Network densification: the dominant theme for wireless evolution into 5G,” IEEE Communications Magazine, vol. 52, no. 2, pp. 82–89, 2014.
-  Gartner newsroom, February 2017, http://www.gartner.com/newsroom/id/3598917
-  H. Kim, G. de Veciana, X. Yang, and M. Venkatachalam, “Distributed alpha-Optimal User Association and Cell Load Balancing in Wireless Networks,” IEEE/ACM Transactions on Networking, 2012.
-  N. Sapountzis, T. Spyropoulos, N. Nikaein, and U. Salim, “Optimal Downlink and Uplink User Association in Backhaul-limited HetNets,” IEEE INFOCOM, 2016.
-  N. Liakopoulos, G. Paschos, and T. Spyropoulos, “Robust User Association for Ultra Dense Networks,” IEEE INFOCOM, 2018.
-  H.-B. Mor, “Performance Modeling and Design of Computer Systems: Queueing Theory in Action,” Cambridge University Press, 2013.
-  Y. L. Lee, L.-C. Wang, T. C. Chuah, and J. Loo, “Joint Resource Allocation and User Association for Heterogeneous Cloud Radio Access Networks,” ITC 2016.
-  T. Bu, L. Li, and R. Ramjee, “Generalized Proportional Fair Scheduling in Third Generation Wireless Data Networks,” IEEE INFOCOM, 2006.
-  Q. Ye, B. Rong, Y. Chen, M. Al-Shalash, C. Caramanis, and J. G. Andrews, “User Association for Load Balancing in Heterogeneous Cellular Networks,” IEEE Transactions on Wireless Communications, vol. 12, no. 6, pp. 2706–2716, 2013.
-  G. Athanasiou, P. Weeraddana, C. Fischione, and L. Tassiulas, “Optimizing Client Association for Load Balancing and Fairness in Millimeter-Wave Wireless Networks,” IEEE/ACM Transactions on Networking, 2014.
-  M. Awais, A. Ahmed, M. Naeem, M. Iqbal, W. Ejaz, A. Anpalagan, and H. S. Kim, “Efficient Joint User Association and Resource Allocation for Cloud Radio Access Networks,” IEEE Access, 2017.
-  G. Monge, “Mémoire sur la théorie des déblais et des remblais,” Histoire de l’Académie Royale des Sciences de Paris, pp. 666–704, 1781.
-  L. Kantorovich, “On the transfer of masses,” Dokl. Acad. Nauk. USSR, vol. 37, pp. 7–8, 1942.
-  J. Orlin, S. Plotkin, and E. Tardos, “Polynomial Dual Network Simplex Algorithms,” Stanford report, STAN-CS-91-1374, 1991.
-  J. Orlin, “A polynomial time primal network simplex algorithm for minimum cost flows,” Mathematical Programming, vol. 78, pp. 109–129, 1997.
-  R. Tarjan, “Dynamic trees as search trees via euler tours, applied to the network simplex algorithm,” Mathematical Programming, vol. 78, no. 2, pp. 169–177, 1997.
-  G. Peyré and M. Cuturi, “Computational Optimal Transport,” Center for Research in Economics and Statistics, 2017.
-  J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, “Iterative Bregman Projections for Regularized Transportation Problems”, SIAM Journal on Scientific Computing, 37, pp. 1111-1138, 2015.
-  L. Bregman, “The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming,” USSR computational mathematics and mathematical physics, vol. 7, no. 3, pp. 200–217, 1967.
-  H. H. Bauschke and A. S. Lewis, “Dykstra’s algorithm with Bregman projections: a convergence proof”, Optimization, 2000.
-  R. Sinkhorn,“A relationship between arbitrary positive matrices and doubly stochastic matrices,” Ann. Math. Statist., pp. 876–879, 1964.
-  J. Altschuler, J. Weed, and P. Rigollet, “Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration,” arXiv:1705.09634, 2017.
-  The GNU Linear Programming Kit. https://www.gnu.org/software/glpk/