A Payload Optimization Method for Federated Recommender Systems

07/27/2021 ∙ by Farwa K. Khan, et al. ∙ HUAWEI Technologies Co., Ltd. 0

We introduce a payload optimization method for federated recommender systems (FRS). In federated learning (FL), the global model payload that is moved between the server and users depends on the number of items to recommend, so the payload grows as the number of items increases. This becomes challenging for an FRS running in production. To tackle the payload challenge, we formulate a multi-armed bandit solution that selects part of the global model and transmits it to all users. The selection process is guided by a novel reward function suitable for FL systems. As far as we are aware, this is the first optimization method that seeks to address item-dependent payloads. The method was evaluated using three benchmark recommendation datasets. The empirical validation confirmed that the proposed method outperforms simpler methods that do not use bandits for item selection. In addition, we demonstrate the usefulness of the proposed method by rigorously evaluating the effect of payload reduction on recommendation performance degradation. Our method achieved up to a 90% reduction in model payload, yielding only a ∼4% - 8% loss in recommendation performance for highly sparse datasets.




1 Introduction

Federated Learning (FL) mcmahan2017communication, a privacy-by-design machine learning approach, has introduced new ways to build recommender systems (RS). Unlike traditional approaches, FL removes the need to collect and store the users’ private data on central servers while still making it possible to train robust recommendation models. In practice, FL distributes the model training process to the users’ devices (i.e., the client or edge devices), thus allowing a global model to be trained using the user-specific local models. Each user updates the global model locally using their personal data and sends the local model updates to a server, which aggregates them according to a pre-defined scheme in order to update the global model.

A prominent direction of research in this domain is based on Federated Collaborative Filtering (FCF) ammad2019federated; chai2019secure; dolui2019poster, which extends the standard Collaborative Filtering (CF) Hu2008 model to the federated mode. CF is one of the most frequently used matrix factorization models for generating personalized recommendations, either independently or in combination with other types of models koren2009matrix. Essentially, the CF model decomposes the user-item interaction (or rating) data into two sets of low-dimensional latent factors, namely the user-factors and item-factors, which capture the user-specific and item-specific dependencies in the interaction data, respectively. The learned factors are then used to generate personalized recommendations of items that the users have not interacted with before.

FCF distributes parts of the model computation so that all of the item-factors (i.e., the global model) are updated on the FL server and then distributed to each user. The user-specific factors are updated independently and locally on each device using the user’s private data and the item-factors received from the server. The local model updates, in the form of gradients, are then calculated for all of the items on each user’s device and transmitted to the server, where they are aggregated to update the item-factors (i.e., the global model). To achieve model convergence, FCF and similar federated recommendation models require several communication rounds (of global vs. local model updates) between the FL server and the users. In each round, the payload (also known as the carrying capacity of a packet or transmission data unit) that is transferred (uploaded/downloaded) across the network between the server and users depends on the size of the global model (here, the item-factor matrix Q).

Beyond the major challenges of FL systems litian2019federated; li2019federated, a practical concern arises when running large-scale federated recommender systems (FRS) in production. With the number of factors held fixed, the model payload increases linearly with the number of items. Table 1 shows the estimated payload of a global model with a total number of items between 3,000 and 10 million. For a large-scale FRS comprised of 100,000 items, the growing payload is a key problem not only for the users but also for the broadband/mobile internet service providers and operators. The requirement to transmit huge payloads between the FL server and users over several communication rounds imposes strict limitations on a real-world large-scale FL-based recommender system.

# Items             3912     10k      100k    500k    1M       10M
Payload (approx.)   625 KB   1.6 MB   16 MB   80 MB   160 MB   1.6 GB

Table 1: The Federated Collaborative Filtering model’s payload increases linearly with the number of items. For a large-scale FL recommender system comprised of millions of items, the payload can exceed 1 GB. The payloads were estimated assuming a fixed number of factors (20), 64-bit floating point precision, and 8 bits per byte. The simple formula used to estimate the payload is (#parameters × 64) / 8 = bytes. FL model training with increased payloads becomes challenging for resource-constrained devices as well as for network operators with a limited communication bandwidth.
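The arithmetic behind Table 1 can be reproduced with a few lines of Python. This is a minimal sketch under the caption's stated assumptions (20 factors, 64-bit parameters); the function names and the use of decimal units (1 KB = 1000 bytes) are illustrative choices, not something the paper specifies.

```python
def payload_bytes(num_items: int, num_factors: int = 20, bits_per_param: int = 64) -> int:
    """Estimated size of the item-factor matrix: (#parameters x bits) / 8."""
    num_parameters = num_items * num_factors
    return num_parameters * bits_per_param // 8


def human_readable(num_bytes: float) -> str:
    """Format a byte count using decimal units (an assumption of this sketch)."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1000 or unit == "GB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000


if __name__ == "__main__":
    for items in (3912, 10_000, 100_000, 500_000, 1_000_000, 10_000_000):
        print(items, human_readable(payload_bytes(items)))
```

Because the payload is a product of the item count and a fixed per-item cost, halving the number of transmitted items halves the payload, which is the lever the proposed method uses.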

To tackle the payload challenge, we present a new payload optimization method for FRS, as shown in Figure 1. We adopt the multi-armed bandit (MAB), a classical approach in reinforcement learning, to formulate our solution for minimizing the payloads. In each communication round, our optimization method intelligently selects part of the global model to be transmitted to all users. The selection process is guided by a bandit model with a novel reward policy well-suited for FRS. In this way, instead of transmitting (uploading/downloading) the huge payload that includes the entire global model, only part of the global model with a smaller payload is transmitted over the FL network. The users perform the standard model updates as part of the FRS ammad2019federated; 10.1007/978-3-030-67661-2_20, thus avoiding any additional optimization steps (see Figure 1). As a case study, we present the payload optimization of a traditional FCF method. However, the proposed method can be generalized to advanced deep learning-based FL recommendation systems qi2020privacy, and it can also be applied to a generic class of matrix factorization models 10.1007/978-3-030-67661-2_20. We extensively compared the results on three benchmark recommendation datasets, specifically Movielens, Last-FM, and MIND. The findings confirm that the proposed method consistently performed better than the baseline method and achieved a 90% reduction in payload with an average recommendation performance degradation ranging from 4% to 8% for highly sparse datasets (Last-FM and MIND).

The contribution of this work is two-fold: (1) We have proposed the first method to optimize the payload in FRS and (2) We have empirically demonstrated the usefulness of our proposed method by rigorously evaluating the effects of payload reduction on recommendation performance.

Figure 1: The payload optimization method proposed in this study for federated recommender systems (FRS). The payload optimization is performed on the FL server using a multi-armed bandit solution. 1. The bandit model samples a set of items from a probability distribution. 2. The bandit takes an action by selecting a particular set of items. 3. The FL server selects part of the global model Q based on the items suggested by the bandit model. The selected part is transmitted to all users. The gradient updates for the selected part are computed on each user’s device and transmitted to the FL server to update Q. 4. The local model updates for the selected items are treated as feedback. 5. The rewards are estimated for the selected items based on their feedback. 6. The estimated rewards are used by the bandit model to update the parameters of the probability distribution.

2 Methods

2.1 Collaborative Filtering (CF)

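Given a collection of user-item interactions for users $i \in \{1, \dots, N\}$ and items $j \in \{1, \dots, M\}$ collected in a data matrix $R \in \mathbb{R}^{N \times M}$, the standard CF koren2009matrix is defined as a matrix factorization model:

$$r_{ij} \approx p_i^{\top} q_j \tag{1}$$

The CF model factorizes $R$ into a linear combination of low-dimensional latent item-factors $q_j \in \mathbb{R}^{K}$ for $j = 1, \dots, M$ and user-factors $p_i \in \mathbb{R}^{K}$ for $i = 1, \dots, N$, collected in factor matrices $Q \in \mathbb{R}^{K \times M}$ and $P \in \mathbb{R}^{K \times N}$ respectively, where $K$ is the number of factors. The cost function optimizing across all users and items is then given as:

$$J = \sum_{i} \sum_{j} c_{ij} \left( r_{ij} - p_i^{\top} q_j \right)^2 + \lambda \left( \sum_{i} \lVert p_i \rVert^2 + \sum_{j} \lVert q_j \rVert^2 \right) \tag{2}$$

where a confidence parameter $c_{ij}$ is introduced to account for the uncertainties arising from the unspecified interpretations of $r_{ij} = 0$ in the implicit feedback scenario. Specifically, $r_{ij} = 1$ denotes that the user $i$ has interacted with the item $j$. However, $r_{ij} = 0$ can have multiple interpretations, such as the user does not like the item or the user is oblivious to the existence of the $j$th item Hu2008. Lastly, $\lambda$ is the L2-regularization parameter set to avoid over-fitting.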
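As a concrete reading of the cost function described above, the following minimal Python sketch evaluates a confidence-weighted squared reconstruction error plus an L2 penalty on both sets of factors. The function name `cf_cost`, the nested-list data layout, and the argument names are illustrative assumptions and are not taken from the paper.

```python
def cf_cost(R, P, Q, C, lam):
    """Confidence-weighted squared error plus an L2 penalty on the factors.

    R, C : N x M nested lists (interactions and confidences).
    P    : list of N user-factor vectors (each of length K).
    Q    : list of M item-factor vectors (each of length K).
    lam  : L2-regularization strength.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Weighted reconstruction error over every (user, item) pair.
    fit = sum(
        C[i][j] * (R[i][j] - dot(P[i], Q[j])) ** 2
        for i in range(len(P))
        for j in range(len(Q))
    )
    # Squared norms of all user-factors and item-factors.
    penalty = sum(dot(p, p) for p in P) + sum(dot(q, q) for q in Q)
    return fit + lam * penalty
```

For example, with one user, two items, `R = [[1, 0]]`, `P = [[1.0]]`, `Q = [[1.0], [0.0]]`, `C = [[5, 1]]` and `lam = 1`, both entries are reconstructed exactly, so only the penalty term contributes to the cost.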

2.2 Federated Collaborative Filtering (FCF)

FCF extends the classical CF model to the federated mode ammad2019federated; chai2019secure; dolui2019poster. FCF distributes parts of the model computation (Eq. 2) to the user’s device, as illustrated in Figure 1. The key idea is to perform local training on the device so that the user’s private interaction data (e.g., ratings or clicks) is never transferred to the central server. The global model is updated on the server after the local model updates have been received from a certain number of users. Specifically, for a particular user $i$, the federated update of the private user-factor $p_i$ is performed independently without requiring any other user’s private data. The optimal solution $p_i^{*}$ is obtained by taking $\partial J / \partial p_i$ from Eq. 2 and setting $\partial J / \partial p_i = 0$:

$$p_i^{*} = \left( Q\, C^{i} Q^{\top} + \lambda I \right)^{-1} Q\, C^{i} r_i \tag{3}$$

where $C^{i}$ is the diagonal matrix of the confidences $c_{ij}$ of user $i$ and $r_i$ is the vector of that user’s interactions. Importantly, the update depends on the item-factor Q, which is received from the FL server for each round of model updates. The item-factor Q, in contrast, is updated on the FL server using a stochastic gradient descent approach:

$$q_j \leftarrow q_j - \eta \frac{\partial J}{\partial q_j} \tag{4}$$

for some gain parameter $\eta$ and number of federated model updates $\Theta$ to be determined. The gradient with respect to each item-factor decomposes over users, so a particular user $i$ computes their share of the item gradients independently of all other users:

$$\frac{\partial J}{\partial q_j} = -2 \sum_{i} f(i, j) + 2 \lambda q_j \tag{5}$$

where $f(i, j)$ for item $j$ is defined as:

$$f(i, j) = c_{ij} \left( r_{ij} - p_i^{\top} q_j \right) p_i \tag{6}$$

Each user transmits the gradients $\nabla Q$ of all items as local model updates to the FL server, where the $\nabla Q$s are aggregated to update the global model Q (see Eq. 4). The Adaptive Moment Estimation (Adam) kingma2015adam method is used in the context of FCF ammad2019federated; 10.1007/978-3-030-67661-2_20 to better adapt the learning rate ($\eta$) to support faster convergence and greater stability. Finally, in order to compute the recommendations, the user downloads the global model from the FL server according to a predefined configuration setting.

Importantly, in each FL training iteration, the model payloads Q and $\nabla Q$ are transferred between the server and users, and vice-versa. The payload scales linearly with the number of items (as shown in Table 1). We next present our method for optimizing the model payloads by reducing the size of Q and $\nabla Q$ to the point where it is suitable for FRS deployed in production.
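The split of work between users and server described in this section can be sketched as follows. This is an illustrative sketch only: it assumes the standard implicit-feedback CF gradient, where each user contributes confidence × residual × user-factor per item, and it applies a plain gradient step on the server, whereas the text notes that Adam is used in practice. All function and variable names are hypothetical.

```python
def user_item_gradients(r_i, p_i, Q, c_i):
    """Computed on the user's device: one contribution vector per item.

    r_i : the user's interactions (length M), p_i : the user's factor (length K),
    Q   : list of M item-factor vectors, c_i : the user's confidences (length M).
    """
    grads = []
    for r_ij, c_ij, q_j in zip(r_i, c_i, Q):
        residual = r_ij - sum(a * b for a, b in zip(p_i, q_j))
        grads.append([c_ij * residual * a for a in p_i])
    return grads


def server_update(Q, user_grads, lam, lr):
    """Computed on the FL server: aggregate user contributions and take a
    plain gradient step (assumption: no Adam, for brevity)."""
    new_Q = []
    for j, q_j in enumerate(Q):
        summed = [sum(g[j][k] for g in user_grads) for k in range(len(q_j))]
        grad = [-2.0 * s + 2.0 * lam * q for s, q in zip(summed, q_j)]
        new_Q.append([q - lr * g for q, g in zip(q_j, grad)])
    return new_Q
```

Note that only `Q` travels to the users and only the per-item gradient vectors travel back, so both directions of the payload scale with the number of items.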

3 Payload Optimization method for Federated Collaborative Filtering

We formulate a multi-armed bandit method to optimize the payload of the Q model for federated recommender systems (FRS). There exist numerous challenges when optimizing payloads. First, the FCF server does not know the user’s identity: each user sends updates, which are aggregated without referencing any one user’s identity. To optimize the payload, we cannot determine the item memberships in terms of groups of users, therefore potentially relevant items may be selected in the Q model. Second, in contrast to the standard (offline) training of models, FL training is performed online, with the federated updates arriving from users in a continuously asynchronous fashion. In each iteration, Q is updated when the number of collected updates reaches a certain threshold. Several factors make FL training computationally challenging, such as a low number of users participating in the update, a low frequency of updates being sent by the users, and, most importantly, unreliable communication over the Internet and the related network latency. In practice, FCF model training is a complex online sequential learning problem, which motivated our choice of method for payload optimization. Consider a particular FCF-based recommendation model training set-up where, at each FL iteration $t$,

  1. the FL server requests the set of items (potential arms) from the bandit model,

  2. the bandit model selects a subset of items among the set of available items,

  3. the FL server only transmits the global model comprised of the selected items to users (or clients),

  4. each participating user returns feedback for the transmitted part of the model as the gradients of the selected items.

In our context, the feedback is used to compute the quantity to be optimized, i.e., the reward. To handle the online sequential aspect of FL model training, our bandit solution is composed of two main ingredients: (1) a strategy for recommending the items in order to select the optimal part of the global model, and (2) a function to infer the rewards from the feedback received from the FL users. We refer to the proposed method as FCF-BTS throughout the manuscript and outline its algorithmic steps in Algorithm 1.

1:  FL Server
2:  Set the number of items to sample
3:  Initialize the global model Q matrix and the update threshold
4:  Initialize the Bayesian Thompson Sampling bandit model
5:  Initialize the local model updates matrix
6:  Initialize the parameter that records the exponential decay of the squared gradient
7:  for each FL iteration do
8:     Select the items from the available items representing the largest sampled values ordered by their expected rewards Eqs. 9, 7
9:     Subset the Q factor matrix to the selected items
10:     Transmit the selected part of Q to the FL users
11:     Receive the item-factor gradients of the selected part from the FL users
12:     if the number of collected updates reaches the threshold then
13:         Update Q based on Eq. 4
14:         Update the squared-gradient decay parameter using Eq. 14
15:         for each selected item do
16:            Compute reward using Eq. 13
17:            Update parameters using Eqs. 12, 11, 10
18:            Update the selection count of the item
19:         end for
20:     end if
21:  end for
Algorithm 1 FCF-BTS: Payload optimization for Federated Collaborative Filtering

Formally, our bandit method for payload optimization is a tuple consisting of four elements (items, state, actions, reward):
Items: the subset of items selected among the set of available items.
State: the set containing the feedback (or observations) collected by the bandit model from the FL environment. In particular, for each selected item, the state includes the feedback that the item has received from the FL users at the current iteration. We consider the feedback to be the local model updates (gradients) of the selected items.
Actions: the set of actions suggested by the bandit model, where each action denotes the bandit’s recommendation of an item to be included in the selected part of the global model at a given FL iteration.
Reward: the output of the reward function, which assigns a reward to each selected item in each FL iteration. After an action is taken by the bandit model, the users provide feedback, which is then used to estimate the reward using Eq. 13.

3.1 Sampling Strategy

As an item-based payload selection strategy, we used the widely known Bayesian Thompson Sampling (BTS) thompson1933likelihood; thompson1935theory; chapelle2011empirical; scott2010modern; kawale2015efficient approach with Gaussian priors for the rewards. We formulated a probabilistic model to sample the next set of items from the posterior distributions, which were then used for selecting the part of the global model to transmit. Specifically, we assumed that the reward $r_j$ of item $j$ followed a normal distribution with an unknown mean $\mu_j$ and fixed precision ($\tau$), as given by:

$$r_j \sim \mathcal{N}\!\left( \mu_j, \tau^{-1} \right) \tag{7}$$

The prior probability for the unknown $\mu_j$ of an item is also assumed to be normally distributed, with mean $\mu_0$ and precision $\tau_0$:

$$\mu_j \sim \mathcal{N}\!\left( \mu_0, \tau_0^{-1} \right) \tag{8}$$

The posterior probability distribution of the unknown $\mu_j$ was obtained using Bayes’ theorem:

$$p(\mu_j \mid r_j) \propto p(r_j \mid \mu_j)\, p(\mu_j) = \mathcal{N}\!\left( \hat{\mu}_j, \hat{\tau}_j^{-1} \right) \tag{9}$$

where the updates for the posterior parameters are estimated as fink1997compendium; gelman2013bayesian:

$$\hat{\mu}_j = \frac{\tau_0 \mu_0 + \tau\, n_j z_j}{\tau_0 + n_j \tau} \tag{10}$$

where $n_j$ is the number of times that the item $j$ has been selected as part of the transmitted model, and

$$\hat{\tau}_j = \tau_0 + n_j \tau \tag{11}$$

In Eq. 10, $z_j$ is the estimated value of the action at FL iteration (or time step) $t$, given by:

$$z_j = \frac{1}{n_j} \sum_{s=1}^{n_j} r_j^{(s)} \tag{12}$$

where $r_j^{(s)}$ (Eq. 13) is the reward obtained at the FL iteration when the action was taken for the $s$th time. Essentially, in each FL iteration $t$, we update the two parameters $\hat{\mu}_j$ and $\hat{\tau}_j$ of each selected item $j$. Next, we sample from the posterior distribution (specified in Eq. 9) before selecting the items (i.e., arms) corresponding to the largest sampled values ordered by their expected rewards (Eq. 7). Our setting is similar to that of the multiple-arm selection problem in RS streeter2008online; radlinski2008learning; uchiya2010algorithms; louedec2015multiple, where numerous studies have concluded that BTS achieved a substantial reduction in running time compared to simpler non-Bayesian sampling strategies gopalan2014thompson; broden2018ensemble.
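The sampling strategy can be sketched in plain Python as below. The sketch assumes the standard conjugate update for a normal likelihood with known precision and a normal prior on the mean, and it selects the items with the largest posterior draws. The class name `GaussianArm`, the default prior values, and the helper `select_items` are hypothetical and are not the paper's settings.

```python
import random


class GaussianArm:
    """One item (arm) with a normal prior on its mean reward and a fixed,
    known reward precision `tau` (standard conjugate model)."""

    def __init__(self, mu0=0.0, tau0=1.0, tau=1.0):
        self.mu0, self.tau0, self.tau = mu0, tau0, tau
        self.n = 0               # number of times this item was selected
        self.mean_reward = 0.0   # running average of the observed rewards

    def posterior(self):
        """Posterior mean and precision of the unknown mean reward."""
        tau_post = self.tau0 + self.n * self.tau
        mu_post = (self.tau0 * self.mu0 + self.tau * self.n * self.mean_reward) / tau_post
        return mu_post, tau_post

    def sample(self, rng):
        """Draw one value from the posterior (standard deviation = precision ** -0.5)."""
        mu_post, tau_post = self.posterior()
        return rng.gauss(mu_post, tau_post ** -0.5)

    def update(self, reward):
        """Record a reward observed after this item was transmitted."""
        self.n += 1
        self.mean_reward += (reward - self.mean_reward) / self.n


def select_items(arms, k, rng):
    """Thompson sampling for multiple arms: keep the k largest posterior draws."""
    draws = [(arm.sample(rng), j) for j, arm in enumerate(arms)]
    return [j for _, j in sorted(draws, reverse=True)[:k]]
```

A typical round would call `select_items(arms, k, random.Random())` on the server, transmit the corresponding rows of the item-factor matrix, and then call `update` on each selected arm once its reward has been computed. Items that are rarely selected keep a wide posterior, which is what lets them occasionally win a draw and be explored.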

3.2 Reward Function

In this section we present a novel reward function designed for FRS. At FL iteration $t$, the sampling strategy recommends a set of items, selected as part of the transmitted model, to receive feedback (model updates or gradients) from all of the users. For each item, the reward (Eq. 13) is computed by jointly integrating the immediate and gradual rates of change in the gradients, with a regularization term that balances the two. The immediate change is measured from the gradients of the item at the current and previous iterations. For the gradual change, as in ADAM kingma2015adam, a quantity $v$ records an exponential decay of the past squared gradients $g$ of an item:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2} \tag{14}$$

Taking inspiration from stochastic gradient approaches, our method computes a composite reward regularized by the number of FL iterations. The first term of the reward is a function of the absolute differences in the gradients, specifically modelling immediate changes during the initial FL iterations; its impact decreases as more rounds of updates are completed. The second term increases the reward as the cosine similarity of the gradual changes in the gradients increases with the number of FL iterations. The composite reward, combined with the BTS strategy, aims to balance exploration and exploitation. For instance, in the beginning, item selection depends on the rate of change in the gradients: the items whose gradient changes are large are selected more often. In the later phase, the selection of items depends on the overall similarity of the gradients in order to favor stable convergence, particularly in the online training of the recommendation model. Moreover, the regularization parameter can be tuned to adjust the strength of the information sharing between the immediate and gradual changes. For example, setting it at one extreme restricts the method to estimating the reward from long-term gradual changes, whereas the other extreme pushes the function to infer the reward from the immediate changes in the gradients.
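The ingredients of such a reward can be illustrated with a short Python sketch. The first three functions are standard: an Adam-style decayed average of squared gradients, a mean absolute change between consecutive gradients, and a cosine similarity. The final `blended_reward` function is a hypothetical way of combining them and is explicitly not the paper's Eq. 13; its weighting scheme, its choice of vectors to compare, and the default `gamma` are assumptions made purely for illustration.

```python
import math


def decayed_squared_gradient(v_prev, grad, beta2=0.99):
    """Adam-style second-moment estimate, element-wise:
    v = beta2 * v_prev + (1 - beta2) * grad ** 2."""
    return [beta2 * v + (1.0 - beta2) * g * g for v, g in zip(v_prev, grad)]


def mean_abs_change(grad, grad_prev):
    """Immediate change: mean absolute difference between consecutive gradients."""
    return sum(abs(a - b) for a, b in zip(grad, grad_prev)) / len(grad)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def blended_reward(grad, grad_prev, v, t, gamma=0.999):
    """HYPOTHETICAL blend, not the paper's Eq. 13: the weight gamma ** t on the
    immediate change shrinks with the iteration count t, shifting emphasis to
    the similarity term in later rounds."""
    w = gamma ** t
    return w * mean_abs_change(grad, grad_prev) + (1.0 - w) * cosine_similarity(grad, v)
```

The design point this illustrates is the schedule rather than the exact formula: early rounds reward items whose gradients are still moving, while later rounds reward items whose gradients have settled into a consistent direction.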

3.3 Regret

We believe that the regret of FCF-BTS can be bounded with respect to the number of FL iterations. However, the existing works on FL (combined with stochastic gradient and BTS) do not provide sufficient tools for a proper analysis of our method. While the existing approaches provide regret bounds (using Gaussian priors) agrawal2013further for the BTS algorithm, they do not assume the FL problem setting. Alternatively, an information-theoretic analysis proposed an entropy-based regret bound over time steps for online learning algorithms using BTS russo2016information. However, that bound increases linearly with the number of actions, which is typically large in our particular problem setting. An optimal regret bound for FCF-BTS is one that has a sub-linear dependency (or no dependency at all dong2018information) on the number of items (or arms), in addition to remaining sharp in large action spaces, so as to satisfy the constraints of a privacy-preserving FL recommendation environment.

To summarize, the proposed FCF-BTS method offers a number of advantages in terms of production: (i) it allows for the optimizing of the payloads without collecting the user’s private or personal information such as the user-item interactions, (ii) the optimization of the payloads is performed on the server-side, thus avoiding any additional computational overhead on the user devices, (iii) no customization is needed on the user-side, and the users perform a typical federated local model update step as part of the FRS, and (iv) it enables the smooth plug-in/out payload optimization without making changes to the FL architecture or recommendation pipeline.

4 Related Work

The payload optimization problem and our solution to it are related to communication-efficient methodologies in federated learning. We next discuss the existing methods that promote communication efficiency and relate them to our work.

4.1 Non Recommender Systems

For traditional FL systems, our method can be viewed as a generalized approach for effective and efficient communication at each FL round DBLP:journals/corr/KonecnyMYRSB16 without assuming additional constraints on the users (or client devices), thus supporting privacy-sensitive applications. Several recent studies have provided practical strategies, such as the sparsification of model updates han2020adaptive and the use of Golomb lossless encoding sattler2019robust. Other strategies include knowledge distillation and augmentation jeong2018federated; he2020group, quantization DBLP:journals/corr/KonecnyMYRSB16, lossy compression and dropout caldas2018expanding, and sub-sampling of the clients saputra2019energy. From a theoretical perspective, these prior works have explored convergence guarantees with low-precision training in the presence of non-identically distributed data.

Federated Reinforcement Learning: A number of recent studies have adopted reinforcement learning, primarily to address hyper-parameter optimization NEURIPS2020_6dfe08ed and to solve contextual linear bandits NEURIPS2020_4311359e in federated mode.

However, unlike our method, none of these methods address the key challenge of the large-scale FRS running in production, specifically the huge payloads associated with the high number of items to be recommended.

4.2 Recommender Systems

Many studies have demonstrated promising results for FRS ammad2019federated; zhou2019privacy; chai2019secure; dolui2019poster; 10.1007/978-3-030-67661-2_20; qi2020privacy; tan2020federated. The recommendation models include factorization machines and singular value decomposition tan2020federated, deep learning qi2020privacy, and matrix factorization ammad2019federated; chai2019secure; dolui2019poster. To overcome the computation and communication costs as part of the recommendations, Chen et al. chen2018federated extended meta-learning to the federated mode. Muhammad et al. muhammad2020fedfast proposed a mechanism for the better sampling of users using K-means clustering and the efficient aggregation of local training models for faster convergence, hence favoring fewer communication rounds for FL model training. However, none of these approaches address the item-dependent payload optimization problem.

Recently, Qin et al. qin2020novel proposed a 4-layer hierarchical framework to reduce the communication cost between the server and the users. Notably, their approach assumes that the user-item interaction behaviors (such as ratings or clicks) are public data that can be collected on a central server. The idea is to select a small candidate set of items for each user by sorting the items based on the recorded user-item interactions, and then to transmit the user-specific candidate set to each user in order to train the local model and perform inference. Unlike the approach of Qin et al. qin2020novel, our approach does not require the recording of any user-sensitive interaction data, and it solves the payload optimization problem in a standard federated setting with minimal computational overhead on the FL server. Our approach follows the widely accepted FRS setting without requiring the users to share their sensitive data. It uses only the local model updates to solve the payload optimization problem. (Footnote 1: Notably, we did not consider the approach of Qin et al. as a baseline in our experiments owing to the differences in the FL architecture and the assumptions on data privacy.)

To the best of our knowledge, we have proposed the first method to solve the payload optimization problem for FCF assuming an implicit feedback scenario. However, the proposed method is applicable to a wider class of FRS, particularly concerning the modelling of explicit user feedback without a loss of generality.

5 Datasets

We used three benchmark recommendation datasets to test the proposed federated payload optimization method. The datasets were processed in order to model the implicit feedback interactions in this study. The characteristics of each preprocessed dataset are given in Table 2. We dropped the timestamp information from the datasets, since we only needed the user-item interactions to analyze the proposed FCF-BTS method. We selected the datasets to rigorously test FCF-BTS primarily for two reasons: (i) the datasets contain a diverse number of items, ranging from 3064 to 17632, and (ii) the datasets are highly sparse in nature, which is typically anticipated in a production environment.

5.1 Movielens-1M

The Movielens-1M harper2015movielens rating dataset was made publicly available by the Grouplens research group (https://grouplens.org/datasets/movielens/). The dataset contained 1,000,209 ratings of 3952 movies made by 6040 users. The rating dataset consisted of user, movie, rating, and timestamp information. The ratings were explicit, so we converted them to implicit feedback based on the assumption that the users had watched the movies that they rated. All ratings were changed to one irrespective of their original value, and missing ratings were set to zero.
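The conversion from explicit ratings to implicit feedback described above amounts to marking every observed rating as one and leaving everything else at zero. A minimal sketch follows, assuming the ratings have already been mapped to zero-based user and item indices; the function name and the tuple layout are illustrative.

```python
def to_implicit(ratings, num_users, num_items):
    """Build a binary user-item matrix from (user_index, item_index, rating) triples.

    Any observed rating becomes 1 regardless of its value; unobserved pairs stay 0.
    """
    matrix = [[0] * num_items for _ in range(num_users)]
    for user, item, _rating in ratings:
        matrix[user][item] = 1
    return matrix
```

For the dataset sizes used here, a dense nested list is only practical for small experiments; a sparse representation would be the natural choice at scale.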

5.2 Last-FM

Last-FM cantador2011second rating dataset was made publicly available by the Grouplens research group (https://grouplens.org/datasets/hetrec-2011/). The dataset contained 92834 listening counts of 17632 music artists by 1892 users. The listening count for each user-artist pair was set to one irrespective of the original value and missing listening counts were set to zero to convert the data into implicit feedback.

5.3 Mind

The MIND-small wu2020mind news recommendation dataset was made publicly available by Microsoft (https://msnews.github.io/). It was collected from the anonymized behavior logs of the Microsoft News website. This dataset contained the behavioral logs of 50,000 users. It is an implicit feedback dataset where 1 refers to clicked and 0 refers to non-clicked behavior. Only users with at least 5 news clicks were considered. For simplicity, we denote the MIND-small dataset with the abbreviation “MIND” throughout the manuscript.

Datasets       # Users   # Items   # Interactions   Sparsity (%)
Movielens-1M   6040      3064      914676           96.05
Last-FM        1892      17632     92834            99.78
MIND           16026     6923      163137           99.89

Table 2: Overview of datasets used in the study, where # Interactions represents the total number of observed user-item interactions and Sparsity (%) refers to the percentage of unobserved interactions in the training dataset. The Last-FM and MIND datasets are highly sparse in nature, exhibiting a similar level of sparsity to what is typically expected in production datasets.

6 Experiments

To demonstrate the usefulness of the proposed bandit method, we compared the performance of FCF-BTS with three other methods. As a baseline approach, we used the FCF-Random method that does not benefit from bandits for item selection. Instead, it selects a part of the global model that is comprised of items selected at random. Furthermore, to assess the advantage of optimizing the payload in a model-driven fashion compared to the naive optimization method, we compared the FCF-BTS performance with the TopList recommendation of the most popular items to every user. In addition, we used FCF ammad2019federated as an upper-bound comparison to our FCF-BTS method. In each FL communication round, FCF (Original) transfers (uploads/downloads) the whole global model between the server and users. This provides an estimate of the recommendation performance for each dataset, achievable when no payload optimization is performed in federated mode.

6.1 Hyper-parameters

To ensure the fair treatment of all three methods, we adapted the same hyper-parameter settings for FCF (as shown in Table  3) that were found to be optimal from the previous studies ammad2019federated; 10.1007/978-3-030-67661-2_20. The FCF-BTS specific hyper-parameters of the prior were set as and the regularization of the reward was set as .

Model   $K$   $\lambda$   $\alpha$   $\beta_1$   $\beta_2$   $\eta$   $\epsilon$
FCF     25    1           4          0.1         0.99        0.01     1e-8

Table 3: FCF’s hyper-parameter values used in our experiments. $K$ represents the number of latent factors, $\lambda$ is the L2-regularization term, and $\alpha$ denotes the implicit confidence parameter. $\beta_1$, $\beta_2$, $\eta$, and $\epsilon$ are the parameters of the ADAM optimizer.

The threshold parameter in Algorithm 1 refers to the number of federated model updates that are needed to update the global model. For each dataset, we selected as (Movielens, Last-FM, MIND) relative to the total number of users ammad2019federated; 10.1007/978-3-030-67661-2_20.

6.2 Model training and evaluation criteria

We followed the training and evaluation approach of Flanagan et al. 10.1007/978-3-030-67661-2_20 and performed 3 rounds of model rebuilds. The training set of every user comprised 80% of the user’s item interactions, selected at random. The performance metrics were then computed on the remaining 20% of interactions (test set) for each user separately. Likewise, the users’ performance metrics were aggregated to update the global metric values on the FL server. Notably, the FL server triggers the update of the global model once the number of collected updates reaches the threshold, implying that in each iteration, only a subset of users sent their test set performance metrics along with the local model updates. At the 1000th iteration, we took the average of the previous ten global metric values to account for the biases that originate from the unequal test set distributions of the users sending asynchronous updates to the FL server.

We used well-known recommendation metrics bobadilla2013recommender, namely Precision, Recall, F1, and Mean Average Precision (MAP), to evaluate our models on the top predicted recommendations, with a recommendation list length of 100. To implement these metrics, we adapted the formulation of Flanagan et al. 10.1007/978-3-030-67661-2_20 (as described in their equations S2 - S5). To make the recommendation metrics comparable, we further normalized the performance metrics using the theoretically best achievable metrics for each dataset. We computed the theoretically best metrics by recommending items from the test set of each user. However, if the user had fewer than 100 items in their test set, the recommendation list was completed by adding, at random, items with which the user had not interacted in the past. Likewise, the TopList performance metrics were estimated using the 100 most popular items ranked by their interaction frequency in the training set.
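For readers unfamiliar with these metrics, the sketch below computes them for a single user from a ranked recommendation list and that user's test-set items. It uses common textbook definitions, which may differ in detail from the formulation of Flanagan et al. that the study adapts; in particular, the normalization of average precision by the smaller of the two list sizes is an assumption of this sketch. MAP is then the mean of the per-user average precision.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 of a recommendation list against the test items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision(recommended, relevant):
    """Average of the precision values at each rank where a relevant item appears."""
    relevant = set(relevant)
    if not relevant or not recommended:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), len(recommended))
```

Dividing each metric by the value obtained from the theoretically best list, as described above, then puts datasets with very different test-set sizes on a comparable scale.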

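Finally, we calculated two summary statistics to analyze the effect of the payload reduction on the recommendation performance degradation: “Impr %” quantifies the relative performance improvement of FCF-BTS compared to FCF-Random or TopList, and “Diff %” gives the relative difference between the FCF-BTS and FCF (Original) performances,

\[
\text{Impr \%} = \frac{\bar{m}_{\text{FCF-BTS}} - \bar{m}_{\text{baseline}}}{\bar{m}_{\text{baseline}}} \times 100,
\qquad
\text{Diff \%} = \frac{\bar{m}_{\text{FCF (Original)}} - \bar{m}_{\text{FCF-BTS}}}{\bar{m}_{\text{FCF (Original)}}} \times 100,
\]

where $\bar{m}$ is the mean of the recommendation metric values across 3 model builds and the baseline is either FCF-Random or TopList.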
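The two statistics can be checked against the mean values reported in Table 4. The sketch below assumes that Impr % is the gain of FCF-BTS relative to the baseline mean and that Diff % is the drop of FCF-BTS relative to the FCF (Original) mean; this reading is an assumption, chosen because it reproduces the reported percentages to within the rounding of the table entries. The function names are hypothetical.

```python
def impr_pct(bts_mean, baseline_mean):
    """Relative improvement of FCF-BTS over a baseline, in percent."""
    return 100.0 * (bts_mean - baseline_mean) / baseline_mean


def diff_pct(bts_mean, fcf_mean):
    """Relative loss of FCF-BTS with respect to FCF (Original), in percent."""
    return 100.0 * (fcf_mean - bts_mean) / fcf_mean


# Mean precision values for Movielens at 90% payload reduction (Table 4).
fcf, bts, fcf_random, toplist = 0.3744, 0.3041, 0.2370, 0.2075

print(f"Diff % vs FCF:        {diff_pct(bts, fcf):.2f}  (reported: 18.77)")
print(f"Impr % vs FCF-Random: {impr_pct(bts, fcf_random):.2f}  (reported: 28.3)")
print(f"Impr % vs TopList:    {impr_pct(bts, toplist):.2f}  (reported: 46.53)")
```

Small discrepancies in the second decimal are expected, since the table means are themselves rounded to four decimals.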

7 Results

As FCF-BTS is, to our knowledge, the first payload optimization method for FRS, we used FCF-Random and TopList as the baseline comparison methods. We rigorously analyzed the effect of payload reduction on the recommendation performance degradation (loss of accuracy) using FCF-BTS and FCF-Random. In particular, we analyzed the recommendation performance when the original model payload was reduced by 25%, 50%, 75%, 80%, 85%, 90%, 95% or 98%. In practice, these payload reductions mean that only 75%, 50%, 25%, 20%, 15%, 10%, 5% or 2% of the total number of items, respectively, were used during the FL model training.
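The correspondence between payload reduction and the number of items used can be made concrete with a small calculation. The sketch assumes the payload scales linearly with the number of items, as the mapping above implies; the catalogue size of 10,000 items is purely hypothetical and not taken from any of the datasets.

```python
def items_kept(n_items, reduction_pct):
    """Number of items transmitted after reducing the payload by `reduction_pct` percent."""
    return n_items * (100 - reduction_pct) // 100


n_items = 10_000  # hypothetical catalogue size
for reduction_pct in (25, 50, 75, 80, 85, 90, 95, 98):
    kept = items_kept(n_items, reduction_pct)
    print(f"{reduction_pct}% reduction -> {kept} items ({100 - reduction_pct}% of the catalogue)")
```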

Figure 2: The effects of payload reduction on the recommendation performance degradation (loss of accuracy). The X-axis denotes the % reduction in payload of the original model. The Y-axis (left side) represents the metric values, while the % degradation compared to the original model’s performance is shown on the right side. Each point denotes the average test set recommendation performance over three rounds of model rebuild, with error bars showing the standard deviation around the mean. The proposed FCF-BTS consistently outperforms FCF-Random (baseline) and demonstrates a substantial performance gain compared to the TopList recommendations while minimizing the payload by up to 90%.

The results demonstrate that FCF-BTS consistently outperforms FCF-Random (Baseline), as shown in Figure 2. The improvement is particularly pronounced for highly sparse datasets such as Last-FM and MIND. Relative to the upper bound, FCF-BTS closely matches the performance of FCF (Original) on the Last-FM and MIND datasets with up to a 90% reduction in the model payload, confirming that FCF-BTS achieves the required performance with an extremely small payload. On the Movielens dataset, the method gets close to the upper bound with a 75% payload reduction. This finding implies that the use of bandits is beneficial for production datasets, which are inherently sparse. Most importantly, FCF-BTS yields substantial performance gains over the TopList recommendations on the Last-FM dataset while using only 2% of the model payload. It shows a comparable performance on Movielens and MIND when 5% of the items are used for model training.

In particular, FCF-BTS showed promising results with a 90% payload reduction for all three datasets, as shown in Table 4. In the Movielens dataset, the performance degradation for precision, recall, F1 and MAP was 18.77%, 20.19%, 19.88% and 23.06% respectively, compared to the performance achievable by the FCF (Original) model. On the other hand, FCF-BTS improved precision (28.3%), recall (27.57%), F1 (27.74%) and MAP (40.75%) relative to FCF-Random (Baseline). Similarly, FCF-BTS improved precision (46.53%), recall (48.19%), F1 (47.32%) and MAP (59.99%) compared to the TopList recommendations.

In the Last-FM dataset, the precision, recall, F1 and MAP of FCF-BTS were 6.12%, 5.69%, 5.93% and 8.8% lower, respectively, than the upper-bound performance metrics. FCF-BTS showed an increase in precision (72.64%), recall (73.6%), F1 (73.1%) and MAP (98.85%) over FCF-Random (Baseline). In comparison to the TopList, FCF-BTS produced substantially better recommendations, improving precision, recall, F1, and MAP by 164.88%, 165.14%, 164.93% and 233.44% respectively (see Table 4).

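| Dataset | Method | Precision | Recall | F1 | MAP |
|---|---|---|---|---|---|
| Movielens | FCF | 0.3744 ± 0.00582 | 0.3855 ± 0.00754 | 0.3817 ± 0.00566 | 0.2400 ± 0.00702 |
| Movielens | FCF-BTS | 0.3041 ± 0.00801 | 0.3076 ± 0.01055 | 0.3058 ± 0.00918 | 0.1846 ± 0.00774 |
| Movielens | FCF-Random | 0.2370 ± 0.01154 | 0.2411 ± 0.00644 | 0.2394 ± 0.00765 | 0.1311 ± 0.00685 |
| Movielens | TopList | 0.2075 ± 0.00027 | 0.2076 ± 0.00052 | 0.2076 ± 0.00046 | 0.1154 ± 0.00014 |
| Movielens | FCF-BTS vs. FCF (Diff%) | 18.77 | 20.19 | 19.88 | 23.06 |
| Movielens | FCF-BTS vs. FCF-Random (Impr%) | 28.3 | 27.57 | 27.74 | 40.75 |
| Movielens | FCF-BTS vs. TopList (Impr%) | 46.53 | 48.19 | 47.32 | 59.99 |
| Last-FM | FCF | 0.2131 ± 0.01128 | 0.2124 ± 0.01044 | 0.2127 ± 0.01086 | 0.1328 ± 0.00745 |
| Last-FM | FCF-BTS | 0.2001 ± 0.00523 | 0.2003 ± 0.00502 | 0.2001 ± 0.00512 | 0.1211 ± 0.00456 |
| Last-FM | FCF-Random | 0.1159 ± 0.00487 | 0.1153 ± 0.00479 | 0.1156 ± 0.00482 | 0.0609 ± 0.00218 |
| Last-FM | TopList | 0.0755 ± 0.00233 | 0.0755 ± 0.00232 | 0.0755 ± 0.00232 | 0.0363 ± 0.00139 |
| Last-FM | FCF-BTS vs. FCF (Diff%) | 6.12 | 5.69 | 5.93 | 8.8 |
| Last-FM | FCF-BTS vs. FCF-Random (Impr%) | 72.64 | 73.6 | 73.1 | 98.85 |
| Last-FM | FCF-BTS vs. TopList (Impr%) | 164.88 | 165.14 | 164.93 | 233.44 |
| MIND | FCF | 0.1108 ± 0.00314 | 0.1121 ± 0.00438 | 0.1110 ± 0.00339 | 0.0496 ± 0.00286 |
| MIND | FCF-BTS | 0.1059 ± 0.00379 | 0.1057 ± 0.00386 | 0.1059 ± 0.00380 | 0.0461 ± 0.00264 |
| MIND | FCF-Random | 0.0294 ± 0.00259 | 0.0296 ± 0.00281 | 0.0294 ± 0.00263 | 0.0102 ± 0.00112 |
| MIND | TopList | 0.1002 ± 0.00067 | 0.1003 ± 0.00046 | 0.1003 ± 0.00063 | 0.0418 ± 0.00044 |
| MIND | FCF-BTS vs. FCF (Diff%) | 4.43 | 5.71 | 4.67 | 7.1 |
| MIND | FCF-BTS vs. FCF-Random (Impr%) | 260.06 | 256.1 | 259.32 | 352.46 |
| MIND | FCF-BTS vs. TopList (Impr%) | 5.67 | 5.32 | 5.58 | 10.39 |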
Table 4: A detailed analysis of the 90% payload reduction for the recommendation performance degradation (loss of accuracy). The values denote the mean ± standard deviation of the test set recommendation performance across 3 rounds of model rebuild. “Diff (%)” represents the relative percentage differences in the performance of the proposed FCF-BTS model compared to the upper-bound performance achievable by FCF (Original). “Impr (%)” refers to the relative percentage improvements in the performance of the proposed FCF-BTS model in comparison to the FCF-Random (Baseline) and TopList recommendations. Notably, FCF-BTS consistently outperforms both baseline methods, FCF-Random (Baseline) and TopList. With a 4%–8% loss in accuracy, FCF-BTS closely matches the performance of FCF (Original) on the highly sparse datasets (Last-FM and MIND).

Lastly, for the MIND dataset, the performance of FCF-BTS closely matched the performance achievable by the FCF (Original) model. The relative differences in precision, recall, F1 and MAP were 4.43%, 5.71%, 4.67% and 7.1% respectively, which are small compared to the performance differences given by FCF-Random (Baseline). FCF-BTS substantially outperformed FCF-Random (Baseline), with 260.06%, 256.1%, 259.32% and 352.46% higher precision, recall, F1 and MAP, respectively. Compared to TopList, FCF-BTS showed modest increases in precision (5.67%), recall (5.32%), F1 (5.58%) and MAP (10.39%).

Next, we show that the proposed FCF-BTS method converges on the optimum and closely matches the solution achieved by FCF (Original) for the sparse datasets (Last-FM and MIND). Figure 3 shows that FCF (Original) reached the optimal solution between FL iterations in all three datasets. For the Last-FM and MIND datasets, we observed that the FCF-BTS method converges on the optimal solution between , thus requiring additional iterations to get close to the upper-bound optimal solution as shown in Figure 3. This is expected of any optimization method that uses only part of the whole model (fewer parameters) in each iteration. Most importantly, it confirms that FCF-BTS, unlike the naive FCF-Random (baseline) method, converges on the optimal solution while using only 10% of the model payload. In the Movielens dataset, we observed that FCF-BTS converges on the optimum in iterations similar to FCF (Original). However, the differences in performance are relatively large compared to the Last-FM and MIND datasets. Nevertheless, Figure 3 illustrates that the convergence of FCF-BTS remains stable across the three datasets up to 1,000 FL iterations, similar to the convergence of the FCF (Original) method. In summary, our analysis confirms that the FCF-BTS solution closely matches the FCF (Original) method’s optimal solution for sparse datasets, although at a different rate. With a loss in recommendation accuracy of 4%–8% (for highly sparse datasets) in comparison to the standard FCF method, FCF-BTS makes it possible to use a smaller payload (a reduction of up to 90%) in FL model training.

Figure 3: Convergence analysis of the proposed FCF-BTS method in a 90% payload reduction scenario. The X-axis shows the FL iterations. The Y-axis represents the performance metric values. For each FL iteration, we took the average of the previous ten metric values to account for the biases originating from the users’ unequal test set distributions. Each line denotes the average test set recommendation performance across 3 rounds of model rebuilds. In iterations, FCF-BTS converges to the best performance, which is close to FCF (Original) for the sparse datasets (Last-FM and MIND), while minimizing the payload by up to 90%.

8 Conclusion

In this study, we tackled the challenge of growing payloads that an FRS faces when deployed in a real-world setting. Moving huge model payloads between the FL server and the users over several training rounds is neither practical nor feasible for an RS operating in production. We introduced an optimization method that addresses the payload challenge by selecting a part (smaller payload) of the global model to be transmitted to all users. The selection process was guided by a bandit model optimizing a novel reward policy suitable for FRS. The proposed method was rigorously tested on three benchmark recommendation datasets, and the empirical results demonstrate that our method consistently outperformed the simpler, naive optimization approaches. Our method achieved a 90% reduction in payload with a loss of recommendation performance of only 4% to 8% on highly sparse datasets. In addition, our method yielded a performance comparable to TopList with a 95% payload reduction in two out of three datasets. The results establish that bandit-based payload optimization can provide a similar quality of recommendation without increasing the computational cost for the users’ devices when participating in the FRS, particularly in production.

In future work, we intend to extend the current research in multiple directions. We have presented the payload optimization of the standard FCF as a proof-of-concept. It would be interesting to investigate whether similar results can be achieved with larger datasets and with more recent and advanced FRS methods qi2020privacy; 10.1007/978-3-030-67661-2_20. In this study, we empirically validated the usefulness of the proposed optimization method. A key next step would be to study its theoretical properties, in particular the convergence guarantees and regret bounds for the novel reward function.


This work was supported by Helsinki Research Center, Europe Cloud Service Competence Center, Huawei Technologies Oy (Finland) Co. Ltd.