Artificial Intelligence (AI) applications are widely employed in many sectors of daily life, such as smart cities, smart homes, smart transportation, smart health, and finance. Moreover, the upcoming 6th Generation mobile networks (6G) will support an even wider range of AI-based applications, such as holographic communications, super-smart homes, and cooperative autonomous robots. The 6G system will change the network vision from "connected things" to "connected intelligence", in which advanced and ubiquitous AI will empower numerous applications.
Traditional use of Machine Learning (ML) models relies on batch processing in a central server and the employment of datasets containing user data. With the worldwide adoption of data protection and privacy legislation, the creation of datasets and applications based on ML has been considerably limited. One way of coping with such restrictions is the adoption of Federated Learning (FL), which is a distributed way of processing machine learning algorithms that does not disclose private data. In FL, clients train a local ML model using a private dataset, and the parameters of these local models are then sent to a central server. The server produces a global model on the basis of the numerous parameter values received and distributes this global model to the clients for further training. This round of processing is repeated until the global model produces results with an acceptable level of accuracy. In this way, user privacy is preserved. The most common approaches for the consolidation of the parameters sent by the clients to produce the global model (FedAvg) rely on the assumption that clients are synchronized and that local datasets are independent and identically distributed [mcmahan2017communication]. Such FL can be used for the training of large ML models involving thousands or even millions of parameters.
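The consolidation step described above can be sketched as a weighted average of the client parameters. The snippet below is a minimal illustration in the spirit of FedAvg, assuming that each client reports a flat list of parameters together with the size of its local dataset; the function name and the example values are hypothetical and are not taken from [mcmahan2017communication].

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           num_samples: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need one sample count per client")
    total = sum(num_samples)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, num_samples):
        weight = n / total  # clients with more data contribute more
        for i in range(dim):
            global_params[i] += weight * params[i]
    return global_params


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 local samples, respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a synchronous round, the server would call such a function only on the parameters that arrived before the round deadline, which is why late uploads directly affect the resulting model.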
The processing of Federated Learning models has brought challenges to communication networks. FL clients may produce highly bursty traffic when uploading their model parameters to the server. For example, clients training a Convolutional Neural Network model with few convolution layers and thousands of parameters may need to send hundreds of megabytes. When millions of parameters are involved, the amount of data sent can be on the order of gigabytes [Kaiming_CVPR2016-7780459]. Moreover, FL may impose stringent delay requirements on the uploading of client parameters to achieve fast convergence of the global model, especially when the federation involves numerous clients. To cope with diverse communication delays, the server may either wait for the arrival of the local parameters from all clients, which increases the convergence time, or exclude the late-arriving data from the parameter consolidation step, which reduces the accuracy of the model [li2020bandwidth]. In addition, FL may also require a very large number of training rounds to produce accurate global models [caldas2018leaf]. These challenges call for efficient resource allocation mechanisms to meet the FL requirements.
Passive Optical Network (PON) is a cost-efficient access network technology for delivering broadband services [flavio]. Operators have already deployed 10 Gb/s Time Division Multiplexing (TDM) PONs during the past two decades. In recent years, the ITU and IEEE standardization groups have proposed next-generation PONs based on Time and Wavelength Division Multiplexing (TWDM) to increase the network capacity for supporting demanding applications and services. TWDM allows allocation in various wavelength channels of 25 Gb/s (50G-EPON) and 10 Gb/s (40G-XPON) [wey2020outlook].
Infrastructure service Providers (InPs) owning a PON can increase network utilization, as well as their profits, by offering diverse services to different types of customers, such as residential users, enterprises, and mobile network operators (MNOs). Supporting the Quality of Service (QoS) requirements of emerging 5G/6G applications, such as distributed Machine Learning, Tactile Internet, and Mobile Fronthauling (MFH), calls for efficient use of network resources. Nonetheless, the aforementioned characteristics of FL applications make QoS provisioning challenging.
A few approaches have been proposed to deal with FL processing over PONs [li2020bandwidth, li2020scalable]. One of these is the Bandwidth Slicing (BS) approach for TDM-PONs, which reserves portions of the PON capacity for FL clients [li2020bandwidth]. The bandwidth granted per cycle for each slice is orders of magnitude less than that required for transmitting a model update, which implies that several scheduling cycles are required to fully upload the parameters of the client models. An architecture for scalable federated learning involving two steps of aggregation over PONs was introduced in [li2020scalable]. In this proposal, the parameters of local models are first aggregated at clients connected to the Optical Network Units (ONUs) and then aggregated on a server connected to the Optical Line Terminal (OLT). As a consequence, the amount of upstream traffic remains relatively constant regardless of the number of clients in the federation. However, the high volume of traffic generated by the FL clients can still create network bottlenecks, which increases the time needed to upload the local parameters.
This paper discusses the main issues for supporting FL applications over PONs and introduces a Dynamic Wavelength and Bandwidth Allocation (DWBA) algorithm for 50G-EPONs that dynamically prioritizes FL traffic while maintaining the traditional guaranteed bandwidth scheme for PON customers. Two prioritization policies are proposed to reduce the delay of FL traffic and delay-critical applications. In the first, the intra-ONU scheduler strictly prioritizes the FL traffic over that from other types of applications (FL-first policy). In the second, the delay-critical traffic is prioritized over the FL traffic (DC-first policy). For comparison purposes, the BS algorithm introduced by Li et al. (2020) [li2020bandwidth] was adapted to employ multiple wavelengths, as well as an adaptive polling cycle, as required in 50G-EPON networks with dynamic resource allocation.
Results show that the DC-first policy increases the FL model accuracy and reduces the delay of federated learning and delay-critical applications when compared to the BS approach and the FL-first policy.
II Resource Allocation in Passive Optical Networks
PON is a network access technology that offers larger capacity, greater cost-efficiency, and more energy savings than other network access technologies. There are two main PON standards: Ethernet PON (EPON) and Gigabit-capable PON (GPON), with EPON being less expensive. The GPON transmission system employs synchronous frames issued every 125 μs, while EPON transmits Ethernet frames asynchronously in granted cycles of variable duration. While traditional PON standards allow bit rates of 1 and 10 Gb/s, the next-generation PON standards allow bit rates of 40 to 100 Gb/s.
The 50 Gb/s optical access network standardized in IEEE 802.3ca-2020 (50G-EPON) is a promising technology for adoption by InPs to support emerging services with stringent latency and bandwidth requirements. This 50G-EPON technology employs the Time and Wavelength Division Multiple Access (TWDMA) technique for controlling uplink transmissions between the ONUs and the OLT. There are three main TWDM-PON-based access architectures for the connectivity between the OLT and ONUs: Multiple-Scheduling Domain (MSD), Single-Scheduling Domain (SSD), and Wavelength Agile (WA). In the first, ONUs transmit on a single wavelength at a time. In the second, ONUs can transmit simultaneously on all wavelengths, and in the third, more than one wavelength can be granted to a single ONU.
In this technology, the Multipoint Control Protocol (MPCP) is the signaling protocol employed for resource allocation. It relies on Report and Gate messages. Report messages are sent upstream by the ONUs to the OLT to request bandwidth, while Gate messages are sent downstream by the OLT to the ONUs to inform them of the granted wavelength(s) and transmission windows, as well as the starting time of the next transmission window. Resource allocation is carried out in two steps, one for wavelength allocation and the other for bandwidth allocation. The use of different schemes for transmission on multiple wavelengths can be defined on the basis of conventional Dynamic Bandwidth Allocation (DBA) algorithms for TDM-PONs.
For dynamic bandwidth allocation over EPONs, the Interleaved Polling with Adaptive Cycle Time (IPACT) algorithm has been adopted to complement the MPCP protocol. The IPACT algorithm employs an interleaved polling and statistical multiplexing technique that leads to efficient upstream channel usage. The limited policy has been used to assure bandwidth to ONUs according to pre-defined Service Level Agreements. Moreover, the original IPACT algorithm employs a single wavelength channel for scheduling. It has been modified to operate with multiple wavelengths in [mcgarry2006wdm], [wang2017dynamic] and [hussain2017low]. The modified IPACT algorithm was proposed for the SSD and MSD architectures [mcgarry2006wdm]. Additional algorithms have been proposed: the Water-Fill (WF) [wang2017dynamic] to promote fairness in the wavelength utilization and First-Fit (FF) [hussain2017low] to provide less delay. Moreover, when there is no scheduler for Quality of Service (QoS) provisioning in the PON, the First-Come-First-Served (FCFS) queuing policy is employed. However, this strategy does not consider the priority or required bandwidth/delay of the applications.
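As a minimal sketch of the limited policy mentioned above, the grant issued to an ONU is the requested amount capped at a maximum window that reflects the bandwidth guaranteed by its Service Level Agreement. The function name, the byte-based units, and the example values are illustrative assumptions.

```python
def limited_grant(requested_bytes: int, max_window_bytes: int) -> int:
    """Limited policy: grant what was requested, up to the SLA maximum."""
    if requested_bytes < 0 or max_window_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return min(requested_bytes, max_window_bytes)


if __name__ == "__main__":
    max_window = 15_000  # hypothetical per-cycle maximum, in bytes
    for request in (4_000, 15_000, 60_000):
        print(request, "->", limited_grant(request, max_window))
```

A lightly loaded ONU therefore receives exactly what it reports, which shortens the cycle, while a heavily loaded ONU cannot exceed its guaranteed share in a single cycle.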
The performance of diverse applications in a PON is ensured by the adopted QoS mechanism, which controls the way frames are queued, prioritized, and scheduled. Such assurance of QoS can be provided by either the ONU or the OLT. In the single-level architecture, the ONUs report individual queue sizes, while the OLT distributes the bandwidth for each type of traffic. In the hierarchical architecture, the OLT allocates bandwidth for each ONU, and the ONUs manage the amount of bandwidth to be allocated to each traffic type.
The most straightforward method of facilitating QoS provisioning is the Differentiated Service approach. It classifies network traffic and delivers different services to different applications. The simplest way to implement Differentiated Services is to employ strict priority scheduling. The ONUs categorize the incoming traffic and place it in per-class buffers, enforcing the prioritization of the different traffic classes. With the employment of Differentiated Services, the PON can support packetized voice and video with strict bandwidth and latency constraints, as well as best-effort traffic [mcgarry2008ethernet]. However, none of the existing mechanisms has been specifically designed to support the QoS requirements of FL applications.
III Resource Allocation for Federated Learning
Fig. 1 illustrates the scenario of FL processing over a PON, where clients are connected to the ONUs and the server is attached to the OLT.
The FL training process can be either asynchronous or synchronous. In the former, the global model parameters are computed as soon as the server receives updates of the parameters of the local models from a certain number of clients. In the latter, the server aggregates the local parameters that arrive in a period of constant duration excluding the parameters from the late arriving stragglers. This exclusion, however, reduces the accuracy of the model as well as increases the time required for convergence to a final global model.
The synchronization time per round includes the downstream, computing, network, propagation, and aggregation delays, as shown in Fig. (a). The downstream delay includes the propagation and transmission delays of the parameters of the global model from the server to the clients. The computing time is the time taken to train the local model at the client in each round. The network delay is the time spent communicating the local model parameters from the clients to the server. The propagation delay is the time taken by the transmitted bits to travel from the client to the server over the network medium. The aggregation delay is the processing time of the aggregation algorithm.
The network delay depends on the network load and the resource allocation mechanism employed for the PON for allocating bandwidth and wavelength(s) to the ONUs. The computing time depends on the capacity of the client and the size of the training dataset, while the propagation delay depends on the distances between the server and the clients. Moreover, long network delays may increase the number of straggler clients, decreasing the model accuracy and increasing the convergence time.
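The delay components listed above can be combined in a short sketch that also shows how a round deadline turns slow clients into stragglers. The class and function names are hypothetical, the delay values are invented for illustration, and the rule that a client is involved when its upload arrives no later than the deadline is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RoundDelays:
    """Per-client delay components of one FL round, in seconds."""
    downstream: float
    computing: float
    network: float
    propagation: float

    def arrival_time(self) -> float:
        # Instant at which the local parameters reach the server.
        return (self.downstream + self.computing
                + self.network + self.propagation)


def synchronization_time(delays: RoundDelays, aggregation: float) -> float:
    """Round duration seen through one client, including aggregation."""
    return delays.arrival_time() + aggregation


def involved_fraction(clients: Sequence[RoundDelays], deadline: float) -> float:
    """Fraction of clients whose parameters arrive before the deadline."""
    on_time = sum(1 for c in clients if c.arrival_time() <= deadline)
    return on_time / len(clients)


if __name__ == "__main__":
    clients = [
        RoundDelays(0.25, 0.5, 0.25, 0.0),
        RoundDelays(0.25, 1.5, 0.25, 0.0),
        RoundDelays(0.25, 2.5, 0.25, 0.0),  # straggler for a 2 s deadline
    ]
    print(involved_fraction(clients, deadline=2.0))
```

Because the computing and propagation terms are fixed by the clients and the topology, the network term is the only component that the PON resource allocation mechanism can reduce.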
The time taken to transmit the local model parameters to the server depends on the bandwidth allocated to the FL traffic. In general, PON customers receive a portion of the total available bandwidth in the PON due to the shared nature of the upstream channel. Residential and business customers usually have guaranteed bandwidths from tens to hundreds of Mbps, and other PON customers can request up to tens of Gbps on demand. However, the large size of the local model parameters, which can be on the order of gigabytes, may demand several seconds to be fully transmitted, even with guaranteed bandwidths on the order of Gbps, as shown in Figure (b).
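The orders of magnitude above follow from dividing the model size by the allocated bandwidth. The sketch below makes that arithmetic explicit; the function name is hypothetical, and 1 GB is taken as 10^9 bytes and protocol overhead is ignored, both of which are simplifying assumptions.

```python
def transmission_time_s(model_bytes: float, bandwidth_bps: float) -> float:
    """Time to push model_bytes through a link of bandwidth_bps (bits/s)."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return model_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    one_gigabyte = 1e9  # assumption: decimal gigabyte
    for label, bw in (("100 Mb/s", 100e6), ("1 Gb/s", 1e9), ("10 Gb/s", 10e9)):
        print(label, transmission_time_s(one_gigabyte, bw), "s")
```

Under these assumptions, a 1 GB update needs 8 s at a guaranteed 1 Gb/s, which is far longer than a polling cycle and explains why the upload spans many scheduling cycles.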
The unique characteristics of FL applications introduce challenges for the management of the available bandwidth in scenarios with limited bandwidth and a diversity of customers, services, and applications. The recently proposed BS algorithm for QoS support of FL applications assures a slice of bandwidth for the FL traffic. This slice is then allocated to clients in ascending order of downstream delay and computing time. However, the approach is not efficient for PON scenarios in which loads are high and traffic is bursty. The large bandwidth slice required to serve FL traffic properly reduces the bandwidth available to other PON customers. It also reduces the statistical multiplexing gain due to the static allocation of the network resources. On the other hand, if the slice is much smaller than the total available bandwidth, the granted bandwidth is likely to be insufficient to serve the FL traffic in a timely manner, given the large size of the FL model updates. As a consequence, clients send a small portion of the FL frames per cycle and require numerous scheduling cycles to be fully served. If two or more clients have FL frames in the queue at a given time, then only one client can use the slice, while the others have to wait for that slice to become available, as shown in Figure (c).
Furthermore, the BS approach is not compatible with traditional PON business models, in which customers rent portions of the PON capacity from the InP to support their applications and services. Moreover, it is unclear who pays for a shared bandwidth slice.
In summary, even though the BS approach reduces the latency of FL applications in relation to the traditional First-Come-First-Served approach, its static bandwidth allocation and its incompatibility with traditional business models may lead to issues of QoS support and deployment.
IV DWBA Schemes for Supporting FL Traffic over 50G-EPON Networks
This section introduces the proposed scheme for providing QoS for FL applications over 50G-EPON networks.
IV-A Adaptation of the Bandwidth Slicing Approach to TWDM-EPON
The BS algorithm [li2020bandwidth] serves FL traffic by dynamically allocating bandwidth on a single wavelength for FL clients. It calculates the number of cycles an FL client requires to be completely served based on the required bandwidth and the fixed polling cycle length of 125 μs employed in the GPON technology. However, next-generation PONs employ multiple wavelengths, and the cycle duration is unknown a priori when an adaptive polling cycle mechanism is employed, such as in the EPON technology. Therefore, the original BS algorithm cannot be directly employed in TWDM-EPON networks.
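The cycle count used by the BS approach can be illustrated with a short calculation. This is only a sketch under stated assumptions: the polling cycle is taken as the 125 μs GPON frame duration, the slice rate of 1 Gb/s is a hypothetical value, and the function name is not from [li2020bandwidth].

```python
def cycles_to_serve(model_bytes: int, slice_rate_bps: int,
                    cycle_us: int = 125) -> int:
    """Number of fixed-length cycles needed to upload model_bytes."""
    # Bytes the slice can carry in one cycle (integer arithmetic).
    bytes_per_cycle = slice_rate_bps * cycle_us // 1_000_000 // 8
    if bytes_per_cycle <= 0:
        raise ValueError("slice too small to carry any data per cycle")
    return -(-model_bytes // bytes_per_cycle)  # ceiling division


if __name__ == "__main__":
    cycles = cycles_to_serve(26_400_000, 1_000_000_000)
    print(cycles, "cycles,", cycles * 125 / 1e6, "s")
```

With a fixed cycle, the total upload time is simply the cycle count multiplied by the cycle length. With an adaptive polling cycle, that product is no longer known in advance, which is the difficulty that motivates the adaptation described next.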
We propose an adaptation of the BS approach for TWDM-EPONs called the multi-wavelength BS algorithm (MW-BS). It deals with multiple wavelengths and employs an adaptive polling cycle for dynamic resource allocation. A portion of the PON capacity (bandwidth slice) is still reserved for the FL traffic in each scheduling cycle. However, instead of using the polling cycle information to grant the transmission windows for FL traffic and then sharing the remaining slice capacity with other traffic types, MW-BS reserves the whole slice for the FL traffic as long as a bandwidth request from any FL client exists. The use of a dynamic polling cycle reduces the FL traffic delays and avoids the need for information about the duration of the unknown upcoming cycles. Three variations of the MW-BS algorithm are proposed for different TWDM wavelength allocation policies, namely MW-BS-SSD, MW-BS-MSD, and MW-BS-FF.
The flow diagram in Figure 6 summarizes the proposed DWBA scheme residing on the OLT. The ONUs send Report messages requesting bandwidth for Federated Learning as well as conventional applications. When a Report message containing a bandwidth request for FL traffic arrives from an ONU, the OLT first grants the bandwidth from the reserved slice, if available (Block 1); otherwise, the OLT allocates bandwidth for the conventional traffic (Block 2).
When allocating bandwidth from the slice, the OLT also reserves bandwidth for the ONU for upcoming cycles. It selects the wavelength(s) as a function of the TWDM wavelength allocation policy and calculates the next starting time for the FL transmission. For the SSD policy, the OLT grants all wavelengths. For the MSD policy, the OLT grants a predetermined, fixed wavelength. For the FF policy, the OLT grants the first available wavelength. The OLT then calculates the granted transmission window for each allocated wavelength, depending on the number of wavelengths and the portion of the PON capacity designated for FL use. If the granted window is equal to the requested window, the FL traffic will be fully served, and the OLT will make the bandwidth slice available for the next cycle.
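The three wavelength selection rules can be sketched as a single function. The representation of the channel state as a map from wavelength index to the time at which it next becomes free, the function name, and the reading of "first available" as "free earliest" are assumptions made for illustration.

```python
from typing import Dict, List


def select_wavelengths(policy: str,
                       next_free_time: Dict[int, float],
                       fixed_wavelength: int = 0) -> List[int]:
    """Return the wavelength indices granted under a TWDM policy."""
    if policy == "SSD":
        # Single scheduling domain: grant every wavelength.
        return sorted(next_free_time)
    if policy == "MSD":
        # Multiple scheduling domains: the ONU keeps its assigned wavelength.
        return [fixed_wavelength]
    if policy == "FF":
        # First-Fit: pick the wavelength that becomes free first
        # (ties broken by the lowest index).
        return [min(next_free_time, key=lambda w: (next_free_time[w], w))]
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    state = {0: 5.0, 1: 2.0}  # hypothetical next-free times in microseconds
    for policy in ("SSD", "MSD", "FF"):
        print(policy, select_wavelengths(policy, state))
```

After the selection, the start time of the grant on each chosen wavelength would be derived from the corresponding next-free time, and the transmission window would be sized according to the bandwidth policy in use.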
The OLT also calculates the granted bandwidth for the conventional applications. If the OLT has previously allocated the bandwidth for the slice, it selects these wavelength(s) for the FL traffic. Otherwise, the OLT selects the wavelength(s) for the FL traffic and calculates the next starting time for that FL transmission as a function of the TWDM wavelength allocation policy involved. The transmission window for the conventional applications is calculated according to the limited policy. Finally, the OLT sends a Gate message with the granted bandwidths for both FL and conventional applications.
IV-B DWBA for Federated Learning
To address the issues of resource allocation raised here, we introduce a DWBA algorithm that supports QoS provisioning for Federated Learning applications while meeting the requirements of delay-critical applications in TWDM-EPON networks. The algorithm is called DWBA for Federated Learning (DWBA-FL).
The idea behind our proposal is to allow PON customers to employ their guaranteed bandwidth for the scheduling of the FL application, without jeopardizing the QoS provisioning of other delay-critical applications. To achieve this, the proposed mechanism adopts the widely-used Differentiated Service approach to tackle the QoS provisioning problem of Federated Learning applications over Ethernet PON. The prioritization of FL traffic is used to comply with the traditional business model, as well as to improve statistical multiplexing gain.
The proposal employs an intra-ONU scheduler with a strict priority queuing policy to serve the ONU queues. The ONUs arbitrate the transmission demands of the different applications. Upon the arrival of a Report message, the OLT calculates the transmission window according to the conventional limited policy and selects the wavelength(s) on the basis of the TWDM wavelength allocation policy. The OLT then sends a Gate message containing the resource allocation decision. Upon receipt of that Gate message, the intra-ONU scheduler distributes the received bandwidth among the queues in the ONU. In our model, traffic is classified as Federated Learning, delay-critical, delay-sensitive, or best-effort. The ONUs maintain four different queues for buffering frames of these types of traffic.
We propose two prioritization policies. The FL-first policy assigns the first (highest) priority to the FL traffic, and the second, third, and fourth priorities to the delay-critical, delay-sensitive, and best-effort traffic, respectively. On the one hand, this strict prioritization of FL frames can reduce the synchronization time for FL processing. On the other hand, it can increase the delay of delay-critical applications because the FL traffic requires a large bandwidth per cycle. To help alleviate this problem, we propose the DC-first policy, in which delay-critical traffic has the highest priority and Federated Learning traffic the second-highest priority. Moreover, the proposal is defined for all TWDM architectures.
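The effect of the two policies on the intra-ONU scheduler can be sketched as follows. This is a simplified illustration rather than the simulator code: queues hold frame sizes in bytes, frames are not fragmented, a lower class may use leftover capacity when the head frame of a higher class does not fit, and the ordering of the delay-sensitive and best-effort classes under DC-first is assumed to be unchanged.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Traffic classes: Federated Learning, delay-critical,
# delay-sensitive, and best-effort.
PRIORITY_ORDER: Dict[str, List[str]] = {
    "FL-first": ["FL", "DC", "DS", "BE"],
    "DC-first": ["DC", "FL", "DS", "BE"],
}


def schedule(queues: Dict[str, Deque[int]], grant_bytes: int,
             policy: str) -> List[Tuple[str, int]]:
    """Serve the ONU queues in strict priority order within one grant."""
    sent: List[Tuple[str, int]] = []
    remaining = grant_bytes
    for traffic_class in PRIORITY_ORDER[policy]:
        queue = queues[traffic_class]
        # Drain this class while its head-of-line frame still fits.
        while queue and queue[0] <= remaining:
            size = queue.popleft()
            remaining -= size
            sent.append((traffic_class, size))
    return sent


def example_queues() -> Dict[str, Deque[int]]:
    """Hypothetical backlog: two FL frames and two small DC packets."""
    return {"FL": deque([1500, 1500]), "DC": deque([70, 70]),
            "DS": deque(), "BE": deque()}


if __name__ == "__main__":
    for policy in ("FL-first", "DC-first"):
        print(policy, schedule(example_queues(), 1600, policy))
```

In this toy backlog, FL-first leaves one delay-critical packet waiting for the next cycle, whereas DC-first sends both small packets at the cost of deferring one FL frame, which mirrors the trade-off described above.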
V Performance Evaluation
The performance of the proposed DWBA scheme was evaluated using an EPON simulator (EPON-Sim), previously validated in [ciceri2021PON_5G_MFH]. This simulator was extended to support the three architectures (SSD, MSD, and WA) proposed for 50G-EPON networks. Moreover, our proposal and the BS approach were implemented in the simulator.
V-A Simulation Model and Setup
The simulation scenarios include a 50G-EPON network with an OLT serving 32 ONUs on an optical distribution network with a tree topology. Two wavelength channels of 25 Gbit/s were employed for upstream transmission, giving a total capacity of 50 Gbit/s. The total available bandwidth in the PON was equally distributed among the ONUs, so that each ONU has the same guaranteed bandwidth , while the aggregated offered load per ONU varied from to (for the sake of clarity and brevity, hereinafter, is omitted from the offered load values of ONU ).
The aggregated load included the traffic of the four different types of application: Federated Learning, delay-critical, delay-sensitive, and best-effort. LEAF [caldas2018leaf], a benchmarking framework for learning in federated settings, was used to generate the FL traffic. The FEMNIST dataset and a CNN with two 5×5 convolution layers were used for model training, while the FedAvg algorithm was employed to aggregate the local parameters in the server. Other configurations for the learning process, such as learning rate and batch size, followed the settings defined in [li2020scalablex]. FL clients generated 26.4 MBytes of data in each round of training. Moreover, the ONUs put the local parameters into frames according to the Ethernet protocol, which has a Maximum Transmission Unit of 1500 bytes and a header field for signaling (preamble) of 20 bytes.
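The framing described above determines how many frames one model update produces and how many bytes actually cross the upstream channel. The sketch below assumes that 1 MByte equals 10^6 bytes, that each frame carries up to 1500 bytes of parameters, and that the only per-frame overhead is the 20 bytes mentioned in the text; the function name is hypothetical.

```python
from typing import Tuple


def frames_and_wire_bytes(payload_bytes: int, mtu: int = 1500,
                          overhead: int = 20) -> Tuple[int, int]:
    """Return (number of frames, total bytes on the wire) for one update."""
    frames = -(-payload_bytes // mtu)  # ceiling division
    return frames, payload_bytes + frames * overhead


if __name__ == "__main__":
    frames, wire = frames_and_wire_bytes(26_400_000)
    print(frames, "frames,", wire, "bytes on the wire")
```

Under these assumptions, each update is split into 17,600 frames and the per-frame overhead adds roughly 1.3 % to the uploaded volume.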
The delay-critical applications were modeled employing a Constant Bit Rate (CBR) flow with a fixed packet size of 70 bytes and an inter-arrival time of 12.5 μs, which produces an offered load of 44.8 Mbps. The rest of the offered load was evenly distributed between delay-sensitive and best-effort traffic. These traffic streams were generated employing Pareto ON-OFF sources. The ON period and the packet-burst size followed Pareto and Bounded Pareto distributions, respectively. The aggregated traffic at the ONU had a Hurst parameter of 0.8. Moreover, the packet lengths are uniformly distributed between and bytes.
A threshold value of was employed in the MW-BS algorithm, as in [li2020bandwidth]. This algorithm reduces the bandwidth available to each ONU since it reserves bandwidth for the slice. Moreover, we employ the same aggregated offered load in all simulated algorithms to make a fair comparison. The duration of the guard period was set to 0.624 μs with a maximum cycle length of 1 ms. Each simulation scenario lasted 100 s and was replicated times.
V-B Simulation Results and Discussion
Mean delay values obtained by DWBA-FL for the FL traffic were lower than 80 ms and 150 ms under underloaded and overloaded conditions, respectively. The delay values given by BS were at least twice as high as those given by our proposal (Fig. (b)). This improvement is a consequence of the large windows allocated for transmissions of FL traffic when our proposal is employed.
Moreover, the DC-first policy produced lower delay values for the delay-critical traffic than both the FL-first policy and the BS algorithm (Fig. (a)). In the case of BS, this occurs because the bandwidth slice is statically allocated for the FL traffic. Furthermore, the strict prioritization of FL traffic under the FL-first policy, combined with the huge amount of traffic produced by the FL application, leads to bandwidth starvation of the delay-critical applications. The mean delay of the delay-critical traffic produced by the FL-first policy ranged from 200 μs to 1000 μs, which is higher than that produced by the DC-first policy. Thus, DWBA-FL with the DC-first policy produces lower mean delay values for the Federated Learning and delay-critical applications than the other algorithms.
Furthermore, the FF policy produces slightly lower delay values for both traffic types than the other wavelength allocation policies (i.e., SSD and MSD). This result is a consequence of the bandwidth wasted on excessive guard periods under SSD and of the poor multiplexing gain under MSD.
In Fig. (a), the blue curve shows the proportion of clients involved as a function of the computing time. It shows the minimal synchronization time per round without any communication delay. The MW-BS algorithm requires a longer synchronization time per round to produce the same percentage of involved clients as the proposed scheme with the DC-first policy. For example, synchronization times of 1.9 s and 2.1 s are required to produce a percentage of involved clients of 50 % with our proposal and the MW-BS algorithm, respectively. To achieve a training accuracy of 76 % (Fig. (b)), the proposed scheme reduces the training time by 9.5 % compared to the BS algorithm (i.e., 0.2 s less for a synchronization time of 2.1 s) when the total traffic load is 0.8.
Fig. 13 shows the network delay as a function of the ONU offered load. The MW-BS algorithm produces delay values greater than 300 ms, whereas, with the DWBA-FL algorithm, these values are reduced to less than 150 ms. Moreover, for 80 % of the clients, which is the typical percentage of clients that produces an accuracy greater than 75 % (see Fig. 12), the MW-BS scheme imposes a network delay greater than 200 ms, while DWBA-FL imposes delay values lower than 100 ms under underloaded conditions (i.e., load < 0.85). In summary, DWBA-FL reduces the network delay when compared to the MW-BS scheme. This reduction in delay may decrease the number of stragglers, which in the end leads to faster convergence and greater model accuracy.
This paper has introduced a resource allocation (RA) scheme for supporting Federated Learning applications in TWDM-EPON networks. Our proposal includes a strict prioritization of federated learning and delay-critical traffic, which maximizes the allocated bandwidth and reduces the delay for both types of application. Our proposal reduces the synchronization time without compromising the number of involved clients. It also reduces the delay of delay-critical and federated learning applications. Future research directions are envisioned as follows. Various FL applications may coexist on a given network infrastructure, with different sizes of local model parameters, numbers of clients, and synchronization times. Mechanisms are needed to appropriately address QoS provisioning for this diversity of FL applications. These schemes may schedule the FL traffic based not only on the required bandwidth but also on the number of straggler clients, the diverse FL model sizes, and the synchronization time.
This work was partially sponsored by grant #15/24494-8, São Paulo Research Foundation (FAPESP), and CNPq.