Federated Multi-view Matrix Factorization for Personalized Recommendations

by Adrian Flanagan et al.

We introduce the federated multi-view matrix factorization method, which extends the federated learning framework to matrix factorization with multiple data sources. Our method is able to learn the multi-view model without transferring the user's personal data to a central server. As far as we are aware, this is the first federated model to provide recommendations using multi-view matrix factorization. The model is rigorously evaluated on three datasets in production settings. Empirical validation confirms that federated multi-view matrix factorization outperforms simpler methods that do not take into account the multi-view structure of the data. In addition, it demonstrates the usefulness of the proposed method for the challenging prediction task of cold-start federated recommendations.





Privacy Threats Against Federated Matrix Factorization

Matrix Factorization has been very successful in practical recommendatio...

Secure Federated Matrix Factorization

To protect user privacy and meet law regulations, federated (machine) le...

Federated Multi-View Learning for Private Medical Data Integration and Analysis

Along with the rapid expansion of information technology and digitalizat...

Bayesian exponential family projections for coupled data sources

Exponential family extensions of principal component analysis (EPCA) hav...

Multi-View Non-negative Matrix Factorization Discriminant Learning via Cross Entropy Loss

Multi-view learning accomplishes the task objectives of classification b...

Multi-view Clustering via Deep Matrix Factorization and Partition Alignment

Multi-view clustering (MVC) has been extensively studied to collect mult...

Practical and Secure Federated Recommendation with Personalized Masks

Federated recommendation is a new notion of private distributed recommen...

1 Introduction

In many machine learning problems, multiple heterogeneous and related data sources, or views, are available to build a model. The challenge is to effectively integrate the views into a coherent model that performs better than the equivalent single-view model. As an example of a multi-view problem, we consider the case of a movie recommender system, where historical user-movie watch data allow us to generate personalized recommendations. By adding other sources of information, or data views, such as user personal information (e.g., age, gender and location) and item features (for movies, e.g., genre, actors, director and box office revenue), we would expect to generate better personalized recommendations. While we use the example of a movie recommender, the methods described here can be generalized to any type of recommender system.

With an increasing focus on user privacy and legislation such as the GDPR (https://gdpr-info.eu/), users may not opt in to share their personal data, and companies may be less willing to record, upload and store users' data to generate multi-view recommendations. The Federated Learning (FL) paradigm [mcmahan2017communication] addresses the issue of users' privacy. In FL, model learning is distributed to the end clients (i.e., users' devices): model updates are generated locally with the users' data, and only these updates are uploaded and aggregated on a central server, ensuring that the users' raw private data never leave the client device.

Multiple data views can be combined using Multi-View Matrix Factorization (MVMF), which is an extension of the standard Collaborative Filter (CF) [balabanovic1997fab, sarwar2001item] for generating recommendations. Figure 1 conceptualizes the principle of learning from the multiple data views, 1) user-item (video), 2) user-features and 3) item (video)-features, for predicting personalized movie recommendations. The goal of multi-view matrix factorization is to learn a joint factorization of all three data matrices. Essentially, the joint factorization decomposes the observed data into sets of low-dimensional latent factors that capture the dependencies between the matrices. The MVMF model therefore uses a combination of the three datasets to learn a better model of the user-item interactions. As a result, the model is able to significantly improve the recommendations to the user, thereby enhancing the user experience. By blending matrix factorization, latent variables and multi-view machine learning approaches, it is possible to address several challenges in recommendation systems, such as generating a recommendation for a new user (cold-start user), a recommendation for new items (cold-start item) and a recommendation for an entirely new user and new item (out-of-matrix prediction), which is effectively not possible with simpler CF approaches.

In what follows, we describe a Federated Learning (FL) implementation of Multi-View Matrix Factorization (MVMF). The use of Federated MVMF (FED-MVMF) also allows us to address the issue of cold start in recommender systems in a distributed FL setting. We show that federation of the MVMF is technically feasible and formulate the updates using a stochastic gradient-based approach. We compare our multi-view approach with single-view matrix factorization on the MovieLens, BookCrossings and an in-house production dataset (anonymized). The findings confirm that our model substantially outperforms the simpler alternatives. In addition, we empirically demonstrate cold-start recommendations with FED-MVMF. Our original contributions in this work are threefold: (1) we formulate, to the best of our knowledge, the first federated multi-view matrix factorization method with side-information sources; (2) we empirically demonstrate that the method outperforms simpler federated Collaborative Filter methods; (3) we present the first mechanism for cold-start predictions in federated learning mode.

Figure 1: Multi-view matrix factorization method with side-information sources (left). Cold-start prediction using FED-MVMF (right). For a newly released item, the model predicts the user-item interactions (cold-start item). Similarly, if a new user arrives, the model can predict which items this user is likely to interact with, even though there is no historical interaction data for this user (cold-start user). Finally, when a new user signs up and new items are released, the model can make predictions with no historical user-item interaction information for either the user or the item (cold-start user-item).

2 Multi-view matrix factorization

Given multi-view data sources $R \in \mathbb{R}^{N \times M}$, $X \in \mathbb{R}^{N \times D_x}$ and $Y \in \mathbb{R}^{M \times D_y}$ for $N$ users and $M$ items, characterized by $D_x$ and $D_y$ descriptive features, the multi-view matrix factorization method MVMF is defined as a generative model [fang2011matrix]:

$$ r_{ij} \approx p_i^\top q_j, \qquad x_{il} \approx p_i^\top u_l, \qquad y_{jk} \approx q_j^\top v_k \tag{1} $$

where $r_{ij}$ represents the interaction between user $i$ and item $j$, $x_{il}$ denotes the value of user $i$'s personal data feature $l$, where $l \in \{1, \dots, D_x\}$, and $y_{jk}$ denotes the value of item $j$ at feature $k$, for $k \in \{1, \dots, D_y\}$. The interactions $r_{ij}$ are generally derived from explicit user feedback, such as ratings given by a user to an item [zhou2008large], or from implicit feedback, with $r_{ij} = 1$ when user $i$ interacted with item $j$ and unspecified otherwise [Hu2008]. In this work, we consider the case of implicit feedback for simplicity; however, the proposed method is applicable to the explicit feedback scenario without loss of generality.

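The model learns the latent matrices $P \in \mathbb{R}^{N \times K}$, $Q \in \mathbb{R}^{M \times K}$, $U \in \mathbb{R}^{D_x \times K}$ and $V \in \mathbb{R}^{D_y \times K}$, which are, respectively, the user-factor matrix, the item-factor matrix, the user-feature factor matrix and the item-feature factor matrix, where $K$ is the number of latent factors. Their rows are denoted $p_i$, $q_j$, $u_l$ and $v_k$.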

The shared user-factor matrix P captures the statistical dependencies between the item interactions and the user personal data, as shown in Figure 1. Likewise, the item-factor matrix Q captures the common patterns between the item interactions and the item-features data source. The joint factorization therefore learns the shared dependencies between the item interactions and the side-information sources. The latent factors U and V are specific to the user personal data and the item features, respectively, capturing the source-specific variation.

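Inference is then performed by optimizing the cost function of the joint factorization of all the data sources:

$$ J = \sum_{i=1}^{N}\sum_{j=1}^{M} c_{ij}\,(r_{ij} - p_i^\top q_j)^2 + \lambda \sum_{i=1}^{N}\sum_{l=1}^{D_x} (x_{il} - p_i^\top u_l)^2 + \lambda \sum_{j=1}^{M}\sum_{k=1}^{D_y} (y_{jk} - q_j^\top v_k)^2 + \gamma \Big( \sum_{i} \|p_i\|^2 + \sum_{j} \|q_j\|^2 + \sum_{l} \|u_l\|^2 + \sum_{k} \|v_k\|^2 \Big) \tag{2} $$

where $c_{ij} = 1 + \alpha r_{ij}$, with $\alpha > 0$, is a confidence parameter that accounts for implicit-feedback uncertainty [Hu2008], and $\gamma$ weights the L2-regularization terms. Specifically, $\lambda$ can be tuned to adjust the strength of information sharing with the side-data matrices. For example, setting $\lambda = 0$ restricts the model to not learn any factors shared with the side-data sources, while larger values of $\lambda$ push the model to learn factors shared between the item interactions and the side-data sources. The value of $\lambda$ can be chosen based on prior knowledge about the data generation process or through hyper-parameter optimization. The latent factors P, Q, U and V are inferred using Alternating Least Squares [fang2011matrix].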
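The cost in Eq. 2 can be made concrete with a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function name `mvmf_cost`, dense arrays, and treating unobserved implicit entries as $r_{ij} = 0$ with confidence $c_{ij} = 1$ (the usual convention of [Hu2008]) are assumptions.

```python
import numpy as np

def mvmf_cost(R, X, Y, P, Q, U, V, alpha, lam, gamma):
    """Evaluate the multi-view cost J of Eq. 2 (illustrative sketch).

    R: (N, M) implicit interactions (1 = interacted, 0 = unobserved; assumed)
    X: (N, D_x) user features, Y: (M, D_y) item features
    P: (N, K), Q: (M, K), U: (D_x, K), V: (D_y, K) latent factors
    """
    C = 1.0 + alpha * R                              # confidences c_ij = 1 + alpha * r_ij
    interaction = np.sum(C * (R - P @ Q.T) ** 2)     # sum_ij c_ij (r_ij - p_i^T q_j)^2
    user_side = lam * np.sum((X - P @ U.T) ** 2)     # lambda * sum_il (x_il - p_i^T u_l)^2
    item_side = lam * np.sum((Y - Q @ V.T) ** 2)     # lambda * sum_jk (y_jk - q_j^T v_k)^2
    reg = gamma * sum(np.sum(F ** 2) for F in (P, Q, U, V))  # L2 penalty on all factors
    return interaction + user_side + item_side + reg
```

With `lam=0` the two side-information terms vanish and J reduces to a single-view, confidence-weighted objective of the kind used by FCF, which is one way to see the role of $\lambda$ described above.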

Several related formulations for joint factorization have been proposed earlier, with linear, kernelized and Bayesian variants of matrix factorization with data sources on both sides [ammad2014integrative, cortes2018cold, gonen2013kernelized, singh2008relational].

3 Federated multi-view matrix factorization

We present the first federated treatment of Multi-view Matrix Factorization (FED-MVMF), which combines side-information sources from both sides simultaneously. The multi-view data sources are distributed and not stored on central servers. The user-item interaction data and the user personal data are available on the users' devices only, while the item features are stored on the item server, as shown in Figure 2. The proposed framework is presented for the particular case of personalized federated recommendations, though it is applicable to other domains as well.

Figure 2: Federated multi-view matrix factorization method with side-information data sources. The master model ($Q$, $U$) is updated on the server and then distributed to users. Each user-specific local model $p_i$ (user-factor) remains on the user's device and is updated there using the local user data ($r_i$, $x_i$) and $Q$, $U$ from the server. The updates, in the form of gradients of $Q$ and $U$, are computed on each user and transmitted to the server, where they are aggregated to update the master model. Meanwhile, the master model $Q$ is also transmitted to the item server, where the item-feature factor matrix $V$ is updated using the item features $Y$. The resulting gradients of $Q$ are computed there and transmitted to the FL server.

FED-MVMF performs a federated factorization of the data matrices R, X and Y jointly, as defined in Eq. 2, to learn the latent factors P, Q, U and V. The federated factorization is formulated using stochastic gradient descent inference. Federated inference of U and Q is fundamentally challenging because their updates depend on all the users, while federated constraints prohibit direct integration of user-level data. Next, we discuss our federated solution.

Federated U:

Federated inference of the user-feature factors U is a key challenge. The update of each $u_l$ requires the user-factor vectors $p_i$ from all the users, as setting $\partial J / \partial u_l = 0$ in Eq. 2 gives

$$ u_l = \Big( \lambda \sum_{i=1}^{N} p_i p_i^\top + \gamma I \Big)^{-1} \lambda \sum_{i=1}^{N} x_{il}\, p_i \tag{3} $$

Therefore, $u_l$ cannot be inferred on individual users and must be computed on the FL server. However, owing to the privacy constraint, each user keeps the corresponding $p_i$ locally on the device and cannot transmit it to the FL server, further complicating the update of $U$.

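We formulate a stochastic gradient descent approach that allows the $u_l$ vectors to be updated on the server while preserving the users' privacy. Formally, $u_l$ is updated on the FL server as

$$ u_l \leftarrow u_l - \eta \frac{\partial J}{\partial u_l} \tag{4} $$

for some gain parameter $\eta$ to be determined. However, computing $\partial J / \partial u_l$ requires a summation over all users. Therefore, we define

$$ f(i, l) = -2\lambda \, (x_{il} - p_i^\top u_l)\, p_i \tag{5} $$

where $f(i, l)$ is calculated on each user independently of all the other users. All the users then send the gradient values back to the FL server. Finally, $\partial J / \partial u_l$ can be formulated as the aggregate of the user gradients, with the regularization term added on the server:

$$ \frac{\partial J}{\partial u_l} = \sum_{i=1}^{N} f(i, l) + 2\gamma\, u_l \tag{6} $$

enabling the federated update of $U$.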
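The federated U update can be checked numerically. The sketch below assumes that each user uploads the full $(D_x \times K)$ array of its $f(i,l)$ values, and shows that summing these per-user gradients on the server and adding $2\gamma u_l$ reproduces the centralized gradient of Eq. 2 with respect to U. The function names are hypothetical, and a plain step with gain `eta` (Eq. 4) stands in for the Adam optimizer used later.

```python
import numpy as np

def user_u_gradient(x_i, p_i, U, lam):
    """Eq. 5 on user i's device: row l is f(i, l) = -2 lam (x_il - p_i^T u_l) p_i.
    Only this (D_x, K) array is uploaded; x_i and p_i never leave the device."""
    residual = x_i - U @ p_i                     # x_il - u_l^T p_i for every feature l
    return -2.0 * lam * np.outer(residual, p_i)

def server_u_update(U, user_grads, gamma, eta):
    """Eqs. 6 and 4 on the FL server: aggregate, add 2 gamma u_l, take one step."""
    grad = np.sum(user_grads, axis=0) + 2.0 * gamma * U
    return U - eta * grad, grad
```

The server never sees $x_i$ or $p_i$, only their product-shaped gradient contributions, yet the aggregated gradient equals $-2\lambda (X - PU^\top)^\top P + 2\gamma U$.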

Federated Q:

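Analogous to U, the federated inference of Q is also non-trivial, and more complex, because Q is a factor shared between the user-item interactions and the item features. Its inference depends on both latent factors P and V. In practice, the update of each item-factor vector $q_j$ requires the user-factor vectors $p_i$ from all the users and the item-feature factors $v_k$ for all the item features, since setting $\partial J / \partial q_j = 0$ in Eq. 2 gives

$$ q_j = \Big( \sum_{i=1}^{N} c_{ij}\, p_i p_i^\top + \lambda \sum_{k=1}^{D_y} v_k v_k^\top + \gamma I \Big)^{-1} \Big( \sum_{i=1}^{N} c_{ij}\, r_{ij}\, p_i + \lambda \sum_{k=1}^{D_y} y_{jk}\, v_k \Big) $$

Therefore, the updates of $q_j$ cannot be done on the user's device and must be performed on the FL server. But, due to the privacy constraints, each user keeps $p_i$ locally on the device and cannot send it to the FL server, further complicating the inference of $Q$. We present a stochastic gradient descent approach that allows the $q_j$ vectors to be updated on the FL server while preserving the users' personal data.

Formally, $q_j$ is updated on the FL server as

$$ q_j \leftarrow q_j - \eta \frac{\partial J}{\partial q_j} \tag{7} $$

for a gain parameter $\eta$ to be determined. However, computing $\partial J / \partial q_j$ involves a summation over all users $i$ and all item features $k$. Therefore, we define $f(i, j)$ and $g(j, k)$ as

$$ f(i, j) = -2\, c_{ij} \, (r_{ij} - p_i^\top q_j)\, p_i \tag{8} $$

$$ g(j, k) = -2\lambda \, (y_{jk} - q_j^\top v_k)\, v_k \tag{9} $$

where $f(i, j)$ is calculated on each user independently of all the other users, and all the users report the gradient values back to the FL server. Likewise, $g(j, k)$ is calculated on the item server and its values are transmitted to the FL server. Finally, the derivative can be computed by aggregating the user and item-feature gradients, with the regularization term added on the server:

$$ \frac{\partial J}{\partial q_j} = \sum_{i=1}^{N} f(i, j) + \sum_{k=1}^{D_y} g(j, k) + 2\gamma\, q_j \tag{10} $$

making it possible to perform the federated update of $Q$.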
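As with U, the split of the Q gradient across users and the item server can be verified against a centralized computation. In this sketch (hypothetical function names, dense NumPy arrays assumed) each user returns all of its $f(i,j)$ rows, the item server returns the per-item sums $\sum_k g(j,k)$, and the FL server adds the two together with $2\gamma q_j$ as in Eq. 10.

```python
import numpy as np

def user_q_gradient(r_i, p_i, Q, alpha):
    """Eq. 8 on user i's device: row j is f(i, j) = -2 c_ij (r_ij - p_i^T q_j) p_i."""
    c_i = 1.0 + alpha * r_i                      # confidences for this user's row
    residual = r_i - Q @ p_i                     # r_ij - q_j^T p_i for every item j
    return -2.0 * np.outer(c_i * residual, p_i)

def item_server_q_gradient(Y, Q, V, lam):
    """Eq. 9 on the item server: row j is sum_k g(j, k) = -2 lam sum_k (y_jk - q_j^T v_k) v_k."""
    return -2.0 * lam * (Y - Q @ V.T) @ V

def server_q_gradient(Q, user_grads, item_grad, gamma):
    """Eq. 10 on the FL server: aggregate both gradient sources and add 2 gamma q_j."""
    return np.sum(user_grads, axis=0) + item_grad + 2.0 * gamma * Q
```

Neither $p_i$ nor $Y$ reaches the FL server; it only receives $(M \times K)$ gradient arrays.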

Localized P:

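The inference of the user factors P depends on the item factors Q, the user-feature factors U and the user's own data $r_i$ and $x_i$, which are available locally at each user. The factor matrices Q and U are received from the FL server and are used to compute the corresponding $p_i$ on each user's device as

$$ p_i^{*} = \Big( \sum_{j=1}^{M} c_{ij}\, q_j q_j^\top + \lambda \sum_{l=1}^{D_x} u_l u_l^\top + \gamma I \Big)^{-1} \Big( \sum_{j=1}^{M} c_{ij}\, r_{ij}\, q_j + \lambda \sum_{l=1}^{D_x} x_{il}\, u_l \Big) \tag{11} $$

where $p_i^{*}$ is the optimal solution obtained by setting $\partial J / \partial p_i = 0$ in Eq. 2. Notably, the updates can be carried out independently for each user without reference to any other user's personal data.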
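Eq. 11 is a $K \times K$ linear system per user, so the local update is cheap on a device. A minimal sketch, assuming dense NumPy vectors for $r_i$ and $x_i$; it calls `np.linalg.solve` rather than forming the inverse explicitly, and the function name is hypothetical.

```python
import numpy as np

def local_user_factor(r_i, x_i, Q, U, alpha, lam, gamma):
    """Closed-form p_i of Eq. 11, computed on user i's device (sketch).

    Solves (sum_j c_ij q_j q_j^T + lam sum_l u_l u_l^T + gamma I) p_i
         = sum_j c_ij r_ij q_j + lam sum_l x_il u_l.
    """
    c_i = 1.0 + alpha * r_i                                  # confidences c_ij
    K = Q.shape[1]
    A = Q.T @ (c_i[:, None] * Q) + lam * U.T @ U + gamma * np.eye(K)
    b = Q.T @ (c_i * r_i) + lam * U.T @ x_i
    return np.linalg.solve(A, b)                             # no explicit inverse
```

Because $\gamma > 0$ makes the system matrix positive definite, the solve is always well posed, even for a user with no interactions.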

Localized V:

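The FL server transmits the latest item factors Q to the item server in each iteration. The item features $Y$ are used to compute $V$ locally on the item server. The updates can be carried out independently for each item feature $k$ without reference to any other private data as

$$ v_k^{*} = \Big( \lambda \sum_{j=1}^{M} q_j q_j^\top + \gamma I \Big)^{-1} \lambda \sum_{j=1}^{M} y_{jk}\, q_j \tag{12} $$

where $v_k^{*}$ is the optimal solution obtained by setting $\partial J / \partial v_k = 0$ in Eq. 2.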
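On the item server, all $D_y$ item-feature vectors share the same $K \times K$ system matrix, so Eq. 12 can be solved for every $k$ in a single call. A sketch under the assumption that $Y$ is held as a dense $(M \times D_y)$ array; the function name is hypothetical.

```python
import numpy as np

def item_feature_factors(Y, Q, lam, gamma):
    """Closed-form V of Eq. 12, computed on the item server (sketch).

    For every item feature k: (lam sum_j q_j q_j^T + gamma I) v_k = lam sum_j y_jk q_j.
    The D_y right-hand sides are the columns of lam * Q^T Y and are solved together.
    """
    K = Q.shape[1]
    A = lam * Q.T @ Q + gamma * np.eye(K)
    return np.linalg.solve(A, lam * Q.T @ Y).T               # shape (D_y, K)
```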

1:  FL Server
2:  Number of items $M$, number of user features $D_x$, number of factors $K$
3:  Initialize master model factor matrices $Q$, $U$ and update threshold $T$
4:  while True do
5:     Transmit $Q$ and $U$ to users
6:     Transmit $Q$ to Item Server
7:     Receive factor $Q$ gradients $g(j,k)$ from Item Server
8:     Receive factor $Q$ gradients $f(i,j)$ for $j \in \{1, \dots, M\}$ from users
9:     Receive factor $U$ gradients $f(i,l)$ for $l \in \{1, \dots, D_x\}$ from users
10:     if number of received updates $\geq T$ then
11:         Update $U$ using Eq. 6
12:         Update $Q$ using Eq. 10
13:     end if
14:  end while
16:  FL User
17:  while True do
18:     Receive master model factor matrices $Q$, $U$
19:     Compute local model factor $p_i$ using Eq. 11
20:     Generate recommendations: $\hat{r}_i = Q p_i$
21:     Compute factor $Q$ gradients $f(i,j)$ using Eq. 8
22:     Compute factor $U$ gradients $f(i,l)$ using Eq. 5
23:     Transmit $f(i,j)$ and $f(i,l)$ to FL Server
24:  end while
26:  Item Server
27:  while True do
28:     Receive master model factor matrix $Q$
29:     Compute local model factor $V$ using Eq. 12
30:     Compute factor $Q$ gradients $g(j,k)$ using Eq. 9
31:     Transmit $g(j,k)$ to FL Server
32:  end while
Algorithm 1 FED-MVMF: Federated Multi-View Matrix Factorization
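To see how the three roles of Algorithm 1 interact, the following single-process simulation runs them in one loop. It is an illustrative sketch under stated assumptions, not the production system: there is no networking, every user reports in every round, the threshold $T$ counts user payloads, and a constant gain `eta` replaces Adam. Class and function names are hypothetical.

```python
import numpy as np

# Single-process simulation of Algorithm 1 (sketch; assumptions listed above).

class FLUser:
    def __init__(self, r_i, x_i, alpha, lam, gamma):
        self.r, self.x = r_i, x_i                 # private data: never transmitted
        self.alpha, self.lam, self.gamma = alpha, lam, gamma
        self.recommendations = None

    def step(self, Q, U):
        """FL User loop: local p_i (Eq. 11), recommendations, gradients (Eqs. 8 and 5)."""
        c = 1.0 + self.alpha * self.r
        K = Q.shape[1]
        A = Q.T @ (c[:, None] * Q) + self.lam * U.T @ U + self.gamma * np.eye(K)
        p = np.linalg.solve(A, Q.T @ (c * self.r) + self.lam * U.T @ self.x)
        self.recommendations = Q @ p              # r_hat_ij = p_i^T q_j for all items j
        f_q = -2.0 * np.outer(c * (self.r - Q @ p), p)           # rows f(i, j)
        f_u = -2.0 * self.lam * np.outer(self.x - U @ p, p)      # rows f(i, l)
        return f_q, f_u                           # only gradients leave the device

class ItemServer:
    def __init__(self, Y, lam, gamma):
        self.Y, self.lam, self.gamma = Y, lam, gamma

    def step(self, Q):
        """Item Server loop: local V (Eq. 12), then the Q gradients of Eq. 9."""
        K = Q.shape[1]
        A = self.lam * Q.T @ Q + self.gamma * np.eye(K)
        V = np.linalg.solve(A, self.lam * Q.T @ self.Y).T
        return -2.0 * self.lam * (self.Y - Q @ V.T) @ V          # rows sum_k g(j, k)

def run_fl_server(users, item_server, M, D_x, K, eta, gamma, T, iterations, seed=0):
    """FL Server loop: broadcast, collect gradients, update once T payloads arrived."""
    rng = np.random.default_rng(seed)
    Q = 0.1 * rng.standard_normal((M, K))
    U = 0.1 * rng.standard_normal((D_x, K))
    for _ in range(iterations):
        g_item = item_server.step(Q)                   # Q sent to the item server
        payloads = [user.step(Q, U) for user in users] # Q and U sent to the users
        if len(payloads) >= T:                         # threshold on collected updates
            grad_q = sum(f_q for f_q, _ in payloads) + g_item + 2.0 * gamma * Q  # Eq. 10
            grad_u = sum(f_u for _, f_u in payloads) + 2.0 * gamma * U           # Eq. 6
            Q, U = Q - eta * grad_q, U - eta * grad_u                           # Eqs. 7, 4
    return Q, U
```

Because each user solves for $p_i$ exactly and the item server solves for $V$ exactly before the gradients are formed, the server's step is a gradient step on $J$ with P and V already at their optimum for the current Q and U.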

A constant gain factor $\eta$ is used for the updates of Q and U, and its value needs to be chosen with some care to ensure convergence. Gradient descent is the simplest form of optimisation, and there are many variations of it that can lead to faster convergence and greater stability, some of which are summarised in [DBLP:journals/corr/Ruder16]. The Adaptive Moment Estimation (Adam) method [kingma2015adam] has also been used in the context of FCF [ammad2019federated]. We adopt the same approach for the inference of the FED-MVMF model. In Adam, the gradient descent is split into two separate parts, which record an exponentially decaying average of past squared gradients $s_t$ and an exponentially decaying average of past gradients $m_t$:

$$ m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t $$

$$ s_t = \beta_2 s_{t-1} + (1 - \beta_2)\, g_t^2 $$

where $g_t = \partial J / \partial q_j$ (Eq. 10) at step $t$, the square is taken element-wise, and $\beta_1, \beta_2 \in [0, 1)$. Since $m_t$ and $s_t$ are typically initialized to zero vectors, they are biased towards zero. To counteract these biases, bias-corrected versions of $m_t$ and $s_t$ are given by

$$ \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{s}_t = \frac{s_t}{1 - \beta_2^t} $$

The updates are then given by

$$ q_j^{(t)} = q_j^{(t-1)} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{s}_t} + \epsilon} $$

where $\eta$ is a constant learning rate and $\epsilon$ (e.g., $10^{-8}$) avoids division by zero. We used a Bayesian Optimization approach [snoek2012practical] to choose the values of these hyper-parameters.

Likewise, we adopt the same treatment for the inference of the user-feature factors U.
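A compact version of the Adam update above, as it might be applied on the FL server to Q or U, is sketched below. The class name and the defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$ (the common defaults from [kingma2015adam]) are assumptions; the paper tunes these values with Bayesian Optimization.

```python
import numpy as np

class Adam:
    """Adam update for one factor matrix such as Q or U (illustrative sketch)."""

    def __init__(self, shape, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m = np.zeros(shape)                  # decaying average of gradients
        self.s = np.zeros(shape)                  # decaying average of squared gradients
        self.eta, self.beta1, self.beta2, self.eps = eta, beta1, beta2, eps
        self.t = 0

    def step(self, theta, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.s = self.beta2 * self.s + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias corrections
        s_hat = self.s / (1 - self.beta2 ** self.t)
        return theta - self.eta * m_hat / (np.sqrt(s_hat) + self.eps)
```

On the first step the bias correction gives $\hat m_1 = g_1$ and $\hat s_1 = g_1^2$, so each parameter moves by roughly $\eta$ against the sign of its gradient, independently of the gradient's scale. One such optimizer instance would be kept per factor matrix.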

Iterative federated updates: We outline the steps of the proposed FED-MVMF model in Algorithm 1. The FL iterations are performed until the model has converged; in each iteration, the master model is updated once the number of collected federated updates from the users and the item server reaches a threshold $T$. In the standard (non-federated) mode, the computational complexity of the algorithm grows linearly with the number of iterations. However, in a federated learning set-up, several other parameters can influence the computational complexity of the algorithm, such as the number of users participating in the update, how frequently the users send updates, the specifications of the users' devices (laptop or mobile) and, importantly, the communication over the Internet and network latency [li2019federated]. In Section 7, we give payload estimates of the algorithm complexity in terms of model sizes and time when tested in production settings.

4 Federated cold-start recommendations

Multi-view matrix factorization allows the inclusion of side-information sources for both users and items simultaneously, making it possible to solve the difficult task of predicting recommendations for new users (cold-start users) or new items (cold-start items), and/or predicting recommendations for an entirely new user on a previously unseen item. Here, a common assumption is that for a new user or a new item there is no historical interaction data $r_{ij}$, though the user's personal features $x_i$ or the item's features $y_j$ are available. Figure 1 shows the schematic for several cold-start recommendation scenarios using standard multi-view matrix factorization. However, in contrast to the cold-start recommendation solutions offered by standard approaches [ammad2014integrative, cortes2018cold], FL requires a customized solution owing to the privacy constraints and the decentralized nature of the multi-view data. We next present the solution to the federated cold-start recommendation problem using the proposed FED-MVMF model.

Cold-start user recommendation:

When a new user joins an FL recommendation system with no previous item interaction data, a new local factor model $p_i$ is created using $U$ and the user's personal features $x_i$. The cold-start user recommendation is then generated as outlined in Algorithm 2.

1:  FL User
2:  Receive master model factor matrices $Q$, $U$ from FL Server
3:  Get personal features $x_i$ of new user
4:  Compute new local factor vector $p_i$ from $x_i$ and $U$
5:  Compute recommendations $\hat{r}_i = Q p_i$
Algorithm 2 Federated cold-start user recommendation
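The sketch below computes the new user's factor under an assumption: that it follows Eq. 11 with the interaction terms dropped, since the new user has no $r_i$, giving $p_i = (\lambda \sum_l u_l u_l^\top + \gamma I)^{-1} \lambda \sum_l x_{il} u_l$. The function names and the top-$n$ ranking helper are hypothetical illustrations of steps 4 and 5 of Algorithm 2.

```python
import numpy as np

def cold_start_user_factor(x_new, U, lam, gamma):
    """New user's p_i from personal features only (assumed: Eq. 11 without the
    interaction terms): (lam U^T U + gamma I) p = lam U^T x."""
    K = U.shape[1]
    return np.linalg.solve(lam * U.T @ U + gamma * np.eye(K), lam * U.T @ x_new)

def top_n_items(p_i, Q, n=3):
    """Score every item with r_hat_ij = p_i^T q_j and return the n best item indices."""
    scores = Q @ p_i
    return np.argsort(-scores)[:n], scores
```

Everything runs on the new user's device: only $Q$ and $U$ are downloaded, and $x_i$ never leaves the device.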

Cold-start item recommendation:

New items are frequently added to the collection, and it is very important for the service provider to recommend a new item to potentially interested users from day zero. FED-MVMF solves the cold-start item recommendation challenge by creating a new item factor vector $q_j$ at the item server, given the item features $y_j$ and $V$. The new $q_j$ is appended to the master model item-factor matrix $Q$, which is then transmitted to the FL server. The users receive the updated $Q$ and compute recommendations $\hat{r}_i = Q p_i$, which include the new item, as outlined in Algorithm 3.

1:  Item Server
2:  Receive master model factor matrix $Q$ from FL Server
3:  Get local item-feature factor matrix $V$
4:  Get item features $y_j$ of new item
5:  Compute new item factor vector $q_j$ from $y_j$ and $V$
6:  Update the item-factor model matrix $Q$ with $q_j$
7:  Transmit $Q$ to FL Server
8:  FL Server
9:  Receive updated master model factor matrix $Q$
10:  Transmit master model $Q$ to existing users
11:  FL User
12:  Compute recommendations $\hat{r}_i = Q p_i$
Algorithm 3 Federated cold-start item recommendation
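Steps 5 and 6 of Algorithm 3 can be sketched as follows. The way $q_j$ is obtained is an assumption: the closed-form $q_j$ with the interaction terms dropped, since no user has interacted with the new item yet, i.e. $q_j = (\lambda \sum_k v_k v_k^\top + \gamma I)^{-1} \lambda \sum_k y_{jk} v_k$. The function names are hypothetical.

```python
import numpy as np

def cold_start_item_factor(y_new, V, lam, gamma):
    """Item server: new q_j from item features only (assumed closed form without
    interaction terms): (lam V^T V + gamma I) q = lam V^T y."""
    K = V.shape[1]
    return np.linalg.solve(lam * V.T @ V + gamma * np.eye(K), lam * V.T @ y_new)

def append_item(Q, q_new):
    """Master item-factor matrix with the new item added as its last row."""
    return np.vstack([Q, q_new])
```

For the cold-start user-item case of Algorithm 4, the same two pieces combine with the cold-start user factor: the score $p_i^\top q_j$ then depends only on $x_i$, $y_j$, $U$ and $V$.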

Cold-start user-item recommendation:

The cold-start user-item recommendation is an out-of-matrix prediction task and is perhaps the most challenging in practice. It is nevertheless made possible by FED-MVMF through the factor matrices learned from the side-information sources. Technically, FED-MVMF solves this prediction task by combining the solutions for federated cold-start user recommendation and federated cold-start item recommendation. The user creates a new local factor model $p_i$ using $U$ and the user's personal features $x_i$, and receives the updated master model item-factor matrix $Q$ to compute the recommendation $\hat{r}_{ij} = p_i^\top q_j$ for the new item, as outlined in Algorithm 4.

1:  Item Server
2:  Receive master model factor matrix $Q$ from FL Server
3:  Get local item-feature model matrix $V$
4:  Get item features $y_j$ of new item
5:  Compute new item factor vector $q_j$ from $y_j$ and $V$
6:  Update the item-factor model matrix $Q$ with $q_j$
7:  Transmit $Q$ to FL Server
8:  FL Server
9:  Receive updated master model factor matrix $Q$
10:  Transmit master model $Q$ to existing users
11:  FL User
12:  Receive master model factor matrices $Q$, $U$
13:  Get personal features $x_i$ of new user
14:  Compute new local factor vector $p_i$ from $x_i$ and $U$
15:  Compute recommendations $\hat{r}_{ij} = p_i^\top q_j$
Algorithm 4 Federated cold-start user-item recommendation

5 Related work

The federated multi-view matrix factorization problem and our solution for it are related to several matrix factorization as well as federated learning methodologies. We next discuss the existing methods that solve special cases of the problem, and relate them to our work.

5.1 Multi-view learning

For the non-federated case, our model can be seen as a multi-view matrix factorization with side-information sources [fang2011matrix].

One-way factorization: Several methods perform integrated analysis of multiple matrices $X_1 \in \mathbb{R}^{N \times D_1}, \dots, X_m \in \mathbb{R}^{N \times D_m}$, where $N$ is the number of paired samples and $D_1, \dots, D_m$ are the matrix dimensions, such that the matrices are paired in one mode only. Classical approaches like Canonical Correlation Analysis [hotelling1936relations] perform joint factorization of two matrices. More recent advances, like Group Factor Analysis [klami2015group], can integrate several matrices. However, unlike ours, none of these methods performs factorization of matrices paired on both sides.

Two-way factorization: Similar to our approach, a few methods perform two-way factorization of matrices coupled in both modes [cortes2018cold, fang2011matrix]. Moreover, [gonen2013kernelized] introduced a non-linear Kernelized Bayesian Matrix Factorization coupled with multiple side-information sources in X and Y. Recently, [bunte2016sparse] extended the group factor analysis approach to bi-clustering using two-way factorization that integrates data sources from both sides. Extending this line of research, [strahl2020] scales matrix factorization with two-way side-information sources for efficient inference when the number of covariates is large. Two-way factorization, or collective matrix factorization, methods are especially suitable and widely used for personalized recommendation applications [ammad2014integrative, cortes2018cold, singh2008relational].

However, none of these methods presents a federated learning solution, and our method is the first to provide a federated multi-view matrix factorization integrating side-information sources from both sides.

5.2 Federated learning

Within federated learning, our method is a general multi-view matrix factorization, of which several existing methods can be seen as special cases.

One matrix: For a single matrix, our model can be seen as a collaborative filter [ammad2019federated, chai2019secure, dolui2019poster]. This case is also close to distributed matrix factorization of [gemulla2011large, yu2014parallel, zhou2008large]. However, none of these approaches is able to integrate multiple data sources.

Two matrices: For two data sets with partially paired samples, vertical federated learning approaches [hardy2017private, liu2018secure] take advantage of the common samples, while horizontal federated learning approaches [mcmahan2017communication, DBLP:conf/nips/SmithCST17] leverage the overlap of feature columns to improve predictive performance. For a comprehensive review of these approaches, see [li2019federated, yang2019federated]. Furthermore, [huangiterative] recently performed federated factorization of multiple matrices; however, their method is able to factorize matrices paired in a single mode only.

Other federated learning approaches based on neural networks [DBLP:journals/corr/KonecnyMRR16, mcmahan2017communication] do not address the problem of personalized recommendation using multiple side-data sources. In addition, [mcmahan2017communication] aggregates model weights at the server, whereas we employ a gradient-based aggregation suitable for matrix factorization. Other approaches, like meta-learning [chen2018federated], have been proposed in the context of recommendation systems. Recently, [jalalirad2019simple] adapted the meta-learning approach and a parallel implementation of federated learning. More recently, [bonawitz2019towards] discussed optimizing federated learning at scale; however, none of these methods addresses the multi-view matrix factorization problem.

To the best of our knowledge, our method is the first federated multi-view matrix factorization that integrates data from both sides to provide personalized federated recommendations.

6 Data

In this study, we used three datasets: two public datasets and a private, anonymized in-house production dataset. These datasets are MovieLens-1M [harper2016movielens], BookCrossings [ziegler_2005] and the in-house (anonymized) dataset. They are characterized by varying degrees of sparsity in the user-item interactions, and all have descriptive features for both users and items. Interactions in the in-house dataset are implicit, while interactions in the public datasets are explicit, i.e., they include the exact rating a user gave an item. For a uniform treatment, we convert the public datasets into implicit datasets as well. The pre-processing details of each dataset are provided in the Supplementary Material. The final characteristics of each dataset after pre-processing are presented in Table 1.

6.1 MovieLens-1M

The MovieLens dataset contains about 1 million explicit ratings that users gave to movies [harper2016movielens]. We converted explicit ratings to implicit ones simply by assuming that a user watched a movie if she rated it, and did not watch it otherwise. We also ignore timestamps in all subsequent experiments. The user features are Age, Gender, Occupation and ZipCode. We converted the ZipCodes into US regions (e.g., MidWest, South), so all user features are categorical with small cardinality. Item features are much richer: each item is described by 1128 real numbers from the interval [0, 1], each corresponding to the strength of a tag. Examples of tags are atmospheric, thought-provoking and realistic. After excluding ratings that do not have both item and user features, we have 914676 interactions in total. More statistics are provided in Table 1.

6.2 BookCrossings

The BookCrossings dataset was scraped by [ziegler_2005] from a popular book-rating website. The dataset contains both explicit and implicit ratings. First, we discarded all 0 ratings, and then we substituted all positive ratings with 1. Hence, we derived implicit ratings from the explicit ones, similarly to the MovieLens pre-processing. We also selected the 2999 most popular items and kept only the interactions with them. The number of items was chosen to be close to that of the other datasets, which makes the results more comparable and reduces the computational workload. User features include only Age and Location. We discarded all users with an empty or implausibly high age, and then formed age groups of 10-year intervals. Location typically consists of town, region and country. We cleaned it as much as possible, but it remains a high-cardinality categorical feature. There are 4 book features: Book-Title, Book-Author, Year-Of-Publication and Publisher. We extracted key-words from the titles and used them as features. All book features except Year-Of-Publication have high cardinality. Parameters of the pre-processed dataset are given in Table 1.
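The BookCrossings conversion steps can be sketched in plain Python. The (user, item, rating) triple layout, the function names and the `max_age` cut-off for "too high" ages are assumptions for illustration; the authors' exact choices are in their Supplementary Material.

```python
from collections import Counter

def to_implicit(ratings, n_items=2999):
    """ratings: iterable of (user, item, rating) triples (hypothetical layout).

    Drops 0 ratings, maps every positive rating to 1 and keeps only the
    interactions with the n_items most popular items."""
    positive = [(u, i) for u, i, r in ratings if r > 0]
    popularity = Counter(i for _, i in positive)             # interactions per item
    top = {i for i, _ in popularity.most_common(n_items)}
    return sorted({(u, i, 1) for u, i in positive if i in top})

def age_group(age, max_age=100):
    """10-year age bucket; None means the user is discarded.
    max_age is an assumed cut-off for implausibly high ages."""
    if age is None or age <= 0 or age > max_age:
        return None
    low = (int(age) // 10) * 10
    return f"{low}-{low + 9}"
```

The same `to_implicit` step also covers the MovieLens conversion, where every rating is positive and therefore becomes an interaction.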

6.3 In-house Production Dataset

The in-house production dataset consists of a data snapshot extracted from a database that contains user view events (interactions). It is the largest dataset we experimented with. We did not filter out users or items with few events. Hence, many users have very few interactions, which makes this dataset challenging for collaborative filtering methods like FCF. The user features comprise several categorical features, some of which have high cardinality. In general, they are similar to the user features of the public datasets. The item features are similar to the tag features of the MovieLens data, although not as rich. Further statistics are given in Table 1.

FED-MVMF integrates user and item features into the matrix factorization. We treat all user and item features as categorical (except for the MovieLens item features, which are real numbers). Some features may have high cardinality (e.g., key-words of a book title), and the number of features may differ per item; for instance, one book has 1 key-word in its title while another may have 3. Therefore, we processed all user (or item) features using a hash function with a fixed output dimension, which we call the hash size. This size depends on the dataset, and the exact numbers are provided in Table 1. More precisely, we form strings of the form {feature_name}__{feature_value} and hash all the strings of a user (item) into a vector of length hash size. The vector starts as all zeros, and hashing a string sets the corresponding coordinate to 1; if there is a hashing collision, the corresponding value is increased by 1. As a result, we obtain a sparse vector of fixed size for each user and item.
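A sketch of this feature-hashing scheme follows. The paper does not name the hash function, so MD5 from Python's `hashlib` is an assumption, chosen because Python's built-in `hash()` is salted per process for strings and would not give the same indices on different devices. The function name is hypothetical, and a dense list is returned for simplicity where a real system would store only the non-zero coordinates.

```python
import hashlib

def hash_features(features, hash_size):
    """features: dict feature_name -> value, or a list of values (e.g. title key-words).

    Each "{feature_name}__{feature_value}" string increments one coordinate,
    so a hash collision simply adds 1 to an already non-zero entry."""
    vec = [0] * hash_size
    for name, values in features.items():
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            key = f"{name}__{value}".encode("utf-8")
            idx = int(hashlib.md5(key).hexdigest(), 16) % hash_size
            vec[idx] += 1
    return vec
```

For example, `hash_features({"Age": "20-29", "keyword": ["war", "peace"]}, 16)` yields a length-16 vector whose entries sum to 3, one count per feature string.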

Dataset                                    # Users   # Items   # Interactions   Sparsity (%)   User features   Item features
Movielens                                     6040      3064           914676            4.9            3434            1128
BookCrossings                                19912      2999            72794           0.12            7405           10000
In-house production dataset (anonymized)    815614      3912          2213122           0.07             300             300
Table 1: Overview of the datasets used in the study, where # Interactions refers to the total number of user-item interactions and Sparsity (%) denotes the percentage of observed interactions in a particular dataset. User features is the number of features used to encode personal data, while Item features is the number of features used to describe the items. In this study, the user-item interactions, user features and item features are denoted by $R$, $X$ and $Y$, respectively.
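As a quick consistency check of the Sparsity column, the snippet below recomputes it from the user, item and interaction counts in Table 1 as $100 \cdot \#\text{interactions} / (N \cdot M)$; it reproduces the reported values up to rounding.

```python
def sparsity_percent(n_users, n_items, n_interactions):
    """Percentage of observed entries in the N x M interaction matrix."""
    return 100.0 * n_interactions / (n_users * n_items)

table1 = {
    "Movielens": (6040, 3064, 914676),
    "BookCrossings": (19912, 2999, 72794),
    "In-house": (815614, 3912, 2213122),
}
for name, counts in table1.items():
    print(f"{name}: {sparsity_percent(*counts):.2f}%")
# prints 4.94%, 0.12% and 0.07% for the three datasets
```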

7 Experiments and Results

We used the Federated Collaborative Filter (FCF) [ammad2019federated, chai2019secure, dolui2019poster] as a baseline method. FCF is a federated matrix factorization method; however, it does not integrate side-information sources.

Federated learning production system design:

We implement a production-equivalent client-server architecture [SHARMA201516][Gamma95], in which numerous clients are served by a single FL server and an Items service. All entities are implemented in Python (3.7.3) in a multiprocessing setup and cloud-hosted on Ubuntu Linux 18.04 server infrastructure. The FL server has two data persistence layers, Redis (4.0.9) and PostgreSQL (9.6.10) databases; the Items service has only the Redis layer, and the clients have no persistence layer. The hardware specifications are 8 cores with 64 gigabytes (GB) of memory for the FL server, and 16 cores with 16 GB of memory for the Items service and clients.

Both the FL server and the Items service use Gunicorn (19.7.1), a Python-based web server, to expose Application Programming Interfaces (APIs) as the only mechanism to consume their services. An Nginx (1.14) server sits between Gunicorn and the clients, optimizing service requests and responses by caching, compressing and decompressing payloads. Additionally, the FL server Base64 encodes/decodes all outgoing and incoming communications.

The FL server initializes a master model for each of the available models with their respective hyper-parameters. The FL server and the clients query the Items service for item metadata. Thus, the data on each client is formed from the user's personal data, item interactions and the features associated with those items. Each client downloads a copy of an initialized master model and its update-signature from the FL server to its local storage. Model updates, metrics and inferences are derived from this local copy against the user's data on the client. Periodically, clients assemble an update payload by combining their model updates and performance metrics. The update payload is uploaded to the FL server for aggregation at random. This random upload strategy differs from the approach of [bonawitz2019towards], which maintains a subset of pre-selected known clients to upload their updates; by providing client anonymity, it enhances the clients' privacy.

The FL server implements a queuing strategy to process incoming client requests, responses and updates for models at different stages. An update processor validates client payloads against their update-signature before appending them to the respective queues. A First In First Out (FIFO) model aggregator pops the oldest payload from the queue and recovers its model update and metrics. Updates are aggregated, an update counter is incremented, and the metrics are persisted to a Structured Query Language (SQL) database for performance monitoring. The update counter is governed by a threshold parameter defined for each model at initialization. When the counter reaches this threshold, the aggregator first invalidates the current update-signature, implicitly informing clients to stop uploading payloads and to prepare for an updated master model and update-signature. Then a new model, composed of the previous model and the aggregate of its updates, is promoted to replace its predecessor as the master model. A new update-signature is also generated for the renewed master model, before the payload queues are flushed and the validation values of the update processor and aggregator are updated.
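A minimal single-process sketch of this server-side flow is shown below. It assumes models are plain dictionaries of floats, update-signatures are random UUID strings, and aggregation is element-wise summation; `FLServer`, `receive`, `aggregate_step` and `threshold` are hypothetical names, and the additive promotion step stands in for whatever optimizer step the production system applies.

```python
import uuid
from collections import deque


class FLServer:
    """Hypothetical single-process sketch of the FL server's update pipeline."""

    def __init__(self, model, threshold):
        self.model = dict(model)            # master model: parameter name -> value
        self.threshold = threshold          # updates required before promotion
        self.signature = uuid.uuid4().hex   # current update-signature
        self.queue = deque()                # FIFO queue of validated payloads
        self.metrics_log = []               # stands in for the SQL metrics table
        self._reset_aggregate()

    def _reset_aggregate(self):
        self.aggregate = {name: 0.0 for name in self.model}
        self.counter = 0

    def receive(self, payload):
        """Update processor: enqueue only payloads signed for the current model."""
        if payload.get("signature") != self.signature:
            return False
        self.queue.append(payload)
        return True

    def aggregate_step(self):
        """FIFO aggregator: fold the oldest payload into the running aggregate."""
        if not self.queue:
            return
        payload = self.queue.popleft()
        for name, delta in payload["update"].items():
            self.aggregate[name] += delta
        self.counter += 1
        self.metrics_log.append(payload["metrics"])
        if self.counter >= self.threshold:
            self._promote()

    def _promote(self):
        # New master = previous model + aggregated updates (plain addition here).
        self.model = {n: v + self.aggregate[n] for n, v in self.model.items()}
        # A fresh signature invalidates the old one, so stale payloads are rejected.
        self.signature = uuid.uuid4().hex
        self.queue.clear()
        self._reset_aggregate()
```

In this sketch the signature check is what keeps late uploads for a superseded model out of the new aggregate; a concurrent server would additionally need locking around `_promote`.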

Hyper-parameters, training and evaluation criteria:

BO Bounds [2, 25] [100, 50000] [4, 110] [0.001, 0.099] [0.1, 1] [0.1, 0.55] [0.55, 0.99] [1e-8, 0.05] [0.1, 0.99]
In-house Anonymized Dataset
FCF 25 32000 110 n/a 1 0.1 0.99 1e-8 0.1
FED-MVMF 22 47200 110 0.099 0.1 0.55 0.55 0.05 0.99
Movielens Dataset
FCF 25 100 4 n/a 1 0.1 0.99 1e-8 0.1
FED-MVMF 25 3700 4 0.0989 1 0.1 0.98 0.0499 0.1
Book Crossings Dataset
FCF 25 10000 110 n/a 1 0.1 0.55 0.05 0.99
FED-MVMF 23 13300 4 0.099 0.1 0.1 0.55 1e-8 0.1
Table 2: Summary of the hyper-parameters selected using Bayesian Optimization (BO). The BO Bounds row gives the search range of each hyper-parameter; n/a marks the FED-MVMF-specific hyper-parameter, which FCF does not use. Particularly, , , and are defined for the model, while , and are used for Adam's learning rate, and is the FL hyper-parameter defining the threshold on the number of federated model updates needed to update the master model.

The FCF and FED-MVMF models share a similar set of hyper-parameters, except for the parameter that controls the strength of information shared with the side-information sources, which is specific to FED-MVMF. To choose optimal hyper-parameters, we used a Bayesian optimization approach [snoek2012practical]. Table 2 lists the hyper-parameter bounds used by Bayesian optimization together with the optimal hyper-parameters it selected for each model. We performed 3 rounds of model rebuilds in production settings. In each round, the item interactions of every user were randomly divided into 80% training and 20% test sets, and we selected the metric value reached after 1000 iterations of federated model updates, to allow the model to converge. Notably, in each federated iteration only a subset of users contributes to updating the master model and reports performance metrics. Hence, at the 1000th iteration we take the average of the previous 10 values to account for sampling bias in the metric values.
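The per-user 80/20 split and the trailing 10-value average can be sketched as below. The function names and the use of Python's `random` and `statistics` modules are assumptions for illustration, not the production code.

```python
import random
from statistics import mean


def split_user_interactions(interactions, train_fraction=0.8, seed=0):
    """Randomly split each user's item interactions into train and test lists."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in interactions.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train[user], test[user] = shuffled[:cut], shuffled[cut:]
    return train, test


def final_metric(history, window=10):
    """Average the last `window` reported values to smooth client-sampling noise."""
    return mean(history[-window:])
```

Splitting per user, rather than globally, guarantees that every user contributes both training and test interactions; a different `seed` would be used for each of the three rebuilds.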

To evaluate the models, we use widely adopted recommendation metrics [BOBADILLA2013109]: Precision, Recall, F1, Mean Average Precision (MAP) and Normalized Mean Ranking (NMR) for the top predicted recommendations. The metrics were further normalized by the theoretically best achievable values for each dataset, to make them comparable.


The notations , and denote the true positives, true negatives and false negatives, respectively, within the top positions of the ranked prediction list, and is the number of data points in the test set of a user. Note that for all these metrics the possible values lie in the range [0, 1], where values closer to 1 indicate better performance, except for NMR, for which values closer to 0 are better.
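Because the metric equations are not reproduced here, the sketch below uses the standard textbook top-N definitions of precision, recall, F1 and average precision for a single user, which may differ in detail from the formulas used in this work. `best_precision_at_n` is an assumed reading of "theoretically best achievable" (the precision an oracle ranking would reach), and NMR is left out of the sketch.

```python
def precision_recall_f1_at_n(ranked, relevant, n):
    """Standard top-N precision, recall and F1 for one user."""
    top = ranked[:n]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision_at_n(ranked, relevant, n):
    """Average precision over the top-N list (one user's contribution to MAP)."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), n)
    return total / denom if denom else 0.0


def best_precision_at_n(num_relevant, n):
    """Precision of an oracle ranking; an assumed normalizer, not the paper's."""
    return min(num_relevant, n) / n
```

For example, with `ranked = ["a", "b", "c", "d"]`, `relevant = {"a", "c", "e"}` and `n = 4`, precision is 0.5, recall is 2/3 and F1 is 4/7; dividing the precision by `best_precision_at_n(3, 4)` (0.75) gives a normalized value of 2/3.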

When comparing metric values between two models, for example FCF and FED-MVMF, we quote the relative improvement "Impr %", defined as

$$\text{Impr \%} = \frac{\text{Metric Mean(FED-MVMF)} - \text{Metric Mean(FCF)}}{\text{Metric Mean(FCF)}} \times 100,$$

where Metric Mean(FED-MVMF) and Metric Mean(FCF) refer to the average value of the metric over a given number of model builds. This definition implies that if the Impr % value is positive, the FED-MVMF metric value is higher than the FCF value, and vice versa. For NMR, where lower values are better, the reported improvements correspond to the sign-reversed quantity, so a positive value again means that FED-MVMF performs better. Next we show performance plots for these metrics, demonstrating stable model convergence over several rounds of federated iterations.
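The Impr % computation, including the sign reversal for NMR inferred from the reported values in Table 3, can be sketched as follows (the function name is hypothetical). Feeding in the in-house means from Table 3 reproduces the reported 53% for Precision and 50% for NMR.

```python
def impr_percent(fed_mvmf_mean, fcf_mean, lower_is_better=False):
    """Relative improvement of FED-MVMF over FCF, in percent.

    For metrics where lower is better (NMR), the sign is reversed so that a
    positive value still means FED-MVMF performs better.
    """
    change = (fed_mvmf_mean - fcf_mean) / fcf_mean * 100.0
    return -change if lower_is_better else change


# In-house means from Table 3.
print(round(impr_percent(0.2771, 0.1811)))                        # 53 (Precision)
print(round(impr_percent(0.1545, 0.3097, lower_is_better=True)))  # 50 (NMR)
```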

Convergence Analysis and Performance Metric Selection:

Figure 3 shows the convergence of the FED-MVMF model on each of the datasets. In the figure, the Y-axis denotes the recommendation metric, whereas the X-axis represents the rounds of master model updates. At the 1000th round we compute the average of the previous 10 values and report it in Table 3.

Figure 3: Recommendation performance of FED-MVMF over several rounds of master model updates. The in-house, Movielens and Book-Crossing results are shown in the top, middle and bottom rows, respectively. The results demonstrate stable model convergence over several rounds of updates. In each federated iteration only a subset of users contributes to updating the master model and reports performance metrics; hence, at the 1000th iteration we take the average of the previous 10 values to account for sampling bias in the metric values.

Recommendation Performance:

We first compare the performance of the proposed FED-MVMF with FCF on three real personalized recommendation datasets. As shown in Table 3, FED-MVMF outperforms FCF on every metric and dataset, substantially so in most cases. The relative improvement reaches up to 71% (MAP on the Book-Crossing dataset). We observed the largest improvements on highly sparse datasets such as the in-house anonymized production dataset and the Book-Crossing dataset.

This finding implies that the use of side-information sources is beneficial for production datasets, which are inherently sparse. The strength of the predictive signal may depend on the type and nature of the side-information sources used for a particular dataset. The results show larger improvements in recommendation performance for the in-house and Book-Crossing datasets, which integrated discretized features, than for the Movielens dataset, which included dense real-valued features. Here, our main research goal was to propose a FED-MVMF method that can take advantage of side-information sources in a federated learning setting. Developing a multi-view model is a technically challenging research problem owing to the federated nature of the multi-view datasets and the hard constraints of the federated learning design. Our results confirm that the developed solution takes advantage of the side-information sources to provide substantially improved recommendations.

Precision Recall F1 MAP NMR
In-house Anonymized Dataset
FCF 0.1811±0.0009 0.1816±0.0007 0.1812±0.0007 0.0842±0.0009 0.3097±0.0017
FED-MVMF 0.2771±0.0022 0.2779±0.0028 0.277±0.0023 0.1411±0.0021 0.1545±0.0006
Impr (%) 53 53 53 68 50
Movielens Dataset
FCF 0.3410±0.0100 0.3571±0.0019 0.3525±0.0037 0.2055±0.0090 0.1006±0.0021
FED-MVMF 0.3666±0.0017 0.3825±0.0043 0.3785±0.0037 0.2295±0.0013 0.0841±0.0012
Impr (%) 8 7 7 12 16
Book Crossings Dataset
FCF 0.0431±0.0025 0.0428±0.0023 0.0431±0.0024 0.0166±0.0012 0.39±0.0015
FED-MVMF 0.0639±0.0011 0.0625±0.0016 0.0636±0.0013 0.0284±0.0012 0.3378±0.006
Impr (%) 48 46 48 71 13
Table 3: Comparison of test-set performance between the FED-MVMF and FCF methods. The values denote the mean ± standard deviation of the metric values across 3 different model builds. Impr (%) is the relative percentage improvement of the FED-MVMF mean over the FCF mean (sign-reversed for NMR, where lower is better). The FED-MVMF model outperforms the FCF model, with improvements of up to 71%.

Furthermore, we present payload estimates, in terms of model size and time, measured in production settings.

Payload Analysis:

We next analyzed the payloads of the two federated models. The item-factor matrix Q is common to the FCF and FED-MVMF models; the additional payload of the FED-MVMF model comes from the user-feature factor matrix U. The size of Q depends on the number of latent factors and the number of items, whereas the size of U depends on the number of latent factors and the number of user features. In the case of FED-MVMF, both Q and U are transmitted as part of the model updates between the FL clients and the FL server. As expected, we observed larger payloads for the FED-MVMF model than for FCF, which does not include user features. Notably, model size scales linearly with the number of items and user features. The relative increase in model size ranges from about 80% to 200% across the three datasets, whereas the client-side model update time increases by 24% to 52% for the in-house and Movielens datasets. For Book-Crossing, we think the longer time taken by FCF to compute model updates could merely be a technical artifact, which needs further investigation. The model update time on the FL server also includes the time taken by the Items service to update the item-feature matrix V and compute the gradients for Q.
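To make the linear scaling concrete, the following back-of-the-envelope estimate assumes dense float32 factor matrices with no serialization or Base64 overhead, so it will not reproduce Table 4 exactly. The function names and the example dimensions are hypothetical.

```python
def factor_matrix_kib(rows, num_factors, bytes_per_value=4):
    """Raw size of a dense rows x num_factors matrix in KiB (no encoding overhead)."""
    return rows * num_factors * bytes_per_value / 1024


def payload_kib(num_items, num_user_features, num_factors, bytes_per_value=4):
    """Hypothetical payload estimate: FCF ships Q only, FED-MVMF ships Q and U."""
    q = factor_matrix_kib(num_items, num_factors, bytes_per_value)
    u = factor_matrix_kib(num_user_features, num_factors, bytes_per_value)
    return {"FCF": q, "FED-MVMF": q + u}


sizes = payload_kib(num_items=1000, num_user_features=500, num_factors=25)
```

Under these assumptions the FED-MVMF overhead relative to FCF is simply `num_user_features / num_items` (50% in the example), independent of the number of latent factors; both payloads still grow linearly with the number of items.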

Compared to the FCF model, the proposed FED-MVMF model increases the payloads of the FL recommendation system. However, FED-MVMF yields better recommendation performance and additionally provides a principled solution to the cold-start recommendation problem in FL.

FL Client FL Server
Model Download Model Update Model Upload Model Update
Size (KB) Time (ms) Time (ms) Size (KB) Time (ms) Time (ms)
In-house Anonymized Dataset
FCF 265.28 7.39 9.47 242.31 7.35 2.87
FED-MVMF 483.16 10.78 11.78 436.07 9.06 4.19
Change (%) 82 45 24 79 23 45
Movielens Dataset
FCF 402.17 8.06 10.46 402.17 16.09 3.44
FED-MVMF 849.47 14.80 15.98 849.38 15.29 3.30
Change (%) 111 83 52 111 -4 -4
Book Crossings Dataset
FCF 390.84 7.23 46.56 350.85 4.62 2.95
FED-MVMF 1021.80 15.78 17.40 1057.00 7.07 3.16
Change (%) 161 118 -62 201 53 7
Table 4: Payload comparison between FCF and FED-MVMF in terms of model size (KB = kilobytes) and time (ms = milliseconds). The FL client downloads the model (Model Download), updates the local model and computes the master-model updates, or gradients (Model Update), and uploads the gradients to the FL server (Model Upload). The FL server aggregates the updates arriving from the clients and updates the master model (Model Update). Change (%) is the relative difference of the FED-MVMF value with respect to FCF; positive values indicate that FED-MVMF is larger or slower.

Cold-start recommendation performance:

FED-MVMF provides a principled solution to a problem that commonly occurs in production: cold-start recommendations. To demonstrate the usefulness of the FED-MVMF model for cold-start predictions, we conducted a comprehensive analysis of all three cold-start scenarios: cold-start users, cold-start items and cold-start users-items. In the cold-start users scenario, the model did not observe any interaction data of the held-out users: a random subset of 10% of the users was completely held out during model training, and the model parameters were learned from the remaining 90% of the users. In the cold-start items scenario, a random subset of 10% of the items was entirely left out during model training, and the model parameters were learned from the remaining 90% of the items. In the cold-start users-items scenario, a random subset of 10% of the users and 10% of the items was excluded from model training, and the model parameters were learned from the remaining 90% of the users and items. As before, 3 rounds of model rebuilds were performed for each scenario. Table 5 summarizes the recommendation performance across all scenarios.
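The three hold-out scenarios can be sketched as follows, assuming users and items are identified by strings and interactions are (user, item) pairs. `cold_start_split` and `training_interactions` are hypothetical helpers, and a different seed would be used for each of the three rebuilds.

```python
import random

SCENARIOS = ("users", "items", "users-items")


def cold_start_split(users, items, scenario, holdout=0.1, seed=0):
    """Pick the users and/or items to hold out for one cold-start scenario."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    rng = random.Random(seed)
    held_users, held_items = set(), set()
    if scenario in ("users", "users-items"):
        held_users = set(rng.sample(sorted(users), round(holdout * len(users))))
    if scenario in ("items", "users-items"):
        held_items = set(rng.sample(sorted(items), round(holdout * len(items))))
    return held_users, held_items


def training_interactions(interactions, held_users, held_items):
    """Keep only (user, item) pairs that touch no held-out user or item."""
    return [(u, i) for u, i in interactions
            if u not in held_users and i not in held_items]
```

Filtering at the pair level means that in the users-items scenario an interaction is excluded if either its user or its item is held out, so the trained model has never seen any held-out entity.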

The results demonstrate that the FED-MVMF model can be used reliably for cold-start recommendations. In particular, the model shows good cold-start prediction performance for new users, which is fundamentally valuable in a federated learning solution where new users enroll in the service continuously. Cold-start item prediction performs worse than cold-start user prediction, indicating that it may be improved further; the difference is likely due to the lower quality of the item side-information source. Moreover, the low standard deviations indicate that the model predictions are consistent across variations in the training sets.

Precision Recall F1 MAP NMR
In-house Anonymized Dataset
CS-Users 0.3559±0.0015 0.3359±0.0012 0.3518±0.0014 0.1743±0.0012 0.0621±0.0025
CS-Items 0.0263±0.0006 0.0515±0.0011 0.0292±0.0007 0.0114±0.0003 0.2727±0.0032
CS-Users-Items 0.1739±0.0012 0.3352±0.0027 0.1916±0.0014 0.1384±0.0030 0.0792±0.0030
Movielens Dataset
CS-Users 0.4618±0.0086 0.5008±0.0102 0.4984±0.0093 0.3504±0.0088 0.3025±0.0072
CS-Items 0.0043±0.0003 0.0464±0.0032 0.0291±0.002 0.0031±0.0005 0.4157±0.0006
CS-Users-Items 0.0440±0.0031 0.4528±0.0252 0.2903±0.0173 0.0239±0.0015 0.3384±0.0084
Book Crossings Dataset
CS-Users 0.0521±0.0027 0.0559±0.0022 0.0531±0.0024 0.0254±0.0032 0.3396±0.0034
CS-Items 0.0054±0.0005 0.012±0.001 0.0063±0.0005 0.0047±0.0004 0.4918±0.0128
CS-Users-Items 0.0166±0.0030 0.0399±0.0068 0.0199±0.0034 0.0137±0.0054 0.3503±0.0055
Table 5: Cold-start recommendation performance of FED-MVMF on different metrics, averaged over all users. The values denote the mean ± standard deviation across 3 different model builds. The proposed FED-MVMF model makes it possible to recommend items in challenging cold-start scenarios in federated learning.

8 Conclusion

We introduced the federated multi-view matrix factorization method, in which the federated paradigm does not require collecting raw user data on a centralized server, thus enhancing user privacy. The proposed federated multi-view model was tested on three different datasets, and we showed that including side-information from both users and items increases recommendation performance compared to a standard federated Collaborative Filter. The multi-view approach also provides a solution to the cold-start problem common to standard Collaborative Filter recommenders. The results establish that the federated multi-view model can provide better-quality recommendations without compromising user privacy in widely used recommender applications.

Future Work:

An important aspect of any federated learning system is the amount of data, or payload, to be moved between the FL server and the users. In any matrix factorization based recommender system, the model size, i.e. the size of the factor matrix, is directly proportional to the number of items to recommend, which becomes infeasible in large federated recommendation systems. Our main challenge is to break this direct dependence of model size on the number of items to recommend.