1 Introduction
Machine learning models exploit similarities in users’ interaction patterns to provide recommendations in applications across fields including entertainment (e.g., books, movies, and articles), dating, and commerce. Such recommendation models are typically trained using millions of data points on a single, central system, and are designed under the assumption that the central system has complete access to all the data. Further, they assume that accessing the model poses no privacy risk to individuals. In many settings, however, these assumptions do not hold. In particular, in domains such as healthcare, privacy requirements and regulations may preclude direct access to data. Moreover, models trained on such data can inadvertently leak sensitive information about patients and clients. In addition to privacy concerns, when data is gathered in a distributed manner, centralized algorithms may lead to excessive memory usage and generally require significant communication resources.
As a concrete example, consider the use of recommender systems in the healthcare domain. There, recommender systems have been used in a variety of tasks including decision support (Duan et al., 2011), clinical risk stratification (Hassan and Syed, 2010) and automatic detection of omissions in medication lists (Hasan et al., 2008). Such systems are typically built using electronic health records (EHRs), which are subject to privacy constraints that limit the ability to share the data between hospitals. This restricts practical applications of recommender systems in healthcare settings as single hospitals typically do not have sufficient amounts of data to train insightful models. Even when training based on a single hospital’s data is possible, the resulting models will not capture distributional differences between hospitals, thus limiting their applicability to other hospitals.
Recently, federated learning (McMahan et al., 2017)
was proposed as an algorithmic framework for settings where the data is distributed across many clients and, due to practical constraints, cannot be centralized. In federated learning, a shared server sends a global model to each client, which then updates the model using its local data. The clients send statistics of the local models (for example, gradients) to the server. The server updates the shared model based on the received client information and broadcasts the updated model to the clients. This procedure is repeated until convergence. Federated learning has proved efficient in learning deep neural networks for image classification
(McMahan et al., 2017) and text generation tasks
(Yang et al., 2018; Hard et al., 2018).
While federated methods address practical computing and communication concerns, the privacy of users in a federated system remains potentially vulnerable. Although such systems do not share data directly, the model updates sent to the server may contain sufficient information to uncover model features and raw data information (Milli et al., 2019; Koh and Liang, 2017; Carlini et al., 2019; Hitaj et al., 2017), possibly leaking information about the users. These concerns motivate us to adopt differential privacy (Dwork et al., 2006) as a framework for limiting exposure of users’ data in federated systems. A differentially private mechanism is a randomized algorithm which allows us to bound the dependence of the output on a single data point. This, in turn, translates to bounds on the amount of additional information a malicious actor could infer about a single individual if that individual were included in the training set.
While differential privacy presents itself as a natural solution to the privacy concerns of users in federated systems, combining the two paradigms faces major challenges. The key ones emerge from differences in how the two frameworks operate. On the one hand, federated learning algorithms are typically iterative and involve repeatedly querying the individual entities to collect up-to-date information. On the other hand, in a differential privacy setting where the information obtained in each query must be privatized by injecting noise, the total amount of noise that must be added to a query scales linearly with the number of iterations, thus reducing the utility of the system and its information content (Kairouz et al., 2019; McMahan et al., 2018).
In this paper, we present a novel differentially private federated recommendation framework for the setting where each user’s data is associated with one of many entities, e.g., hospitals, schools or banks. Each entity is tasked with maintaining the privacy of the data entrusted to it against possible attacks by malicious entities. An untrusted server is available to learn centralized models and communicate (in both directions) with the individual entities. Our method learns per-entity recommender models by sharing information between entities in a federated manner, without compromising users’ privacy or requiring excessive communication. Specifically, our method learns differentially private prototypes for each entity, and then uses those prototypes to learn global model parameters on a central server. These parameters are returned to the entities, which use them to learn local recommender models without any further communication (and, therefore, without any additional privacy risk).
To our knowledge, the proposed framework is the first scheme that introduces differential privacy mechanisms to federated recommendations. Unlike typical federated learning algorithms, our method requires only two global communication steps. Such a succinct communication reduces the amount of noise required to ensure differential privacy while also reducing communication overhead and minimizing the risk of communication interception. Yet despite providing differential privacy guarantees to participating entities, the framework allows each entity to benefit from data held by other entities through building its own private, uniquely adapted model. Specific contributions of the paper can be summarized as follows:

We propose a federated recommendation framework for learning latent representation of products and services while bounding the privacy risk to the participating entities. This is accomplished by estimating the column space of an interaction matrix from differentially private prototypes via matrix factorization.

We enable federating recommendations under communication constraints by building in the requirement that the number of communication rounds between participating entities and the shared server is only two.

We demonstrate generalizable representations and strong predictive performance in benchmarking tests on synthetic and real-world data, comparing the proposed framework with individual models and conventional federated schemes that lack privacy guarantees.
2 Background
2.1 Recommender Systems
The goal of recommender systems is to suggest new content to users. Recommender systems can be broadly classified into two categories: content filtering and collaborative filtering. Under the content-based paradigm, user and/or item profiles are constructed from demographic data or item information, respectively. For example, a user profile could include age while movies could be associated with genre, principal actors, etc. With this information, similarity between users or items can be computed and utilized for recommendation via, for example, clustering or nearest neighbors techniques
(Koren et al., 2009). Collaborative filtering (Goldberg et al., 1992) relies on past user behaviour (ratings, views, purchases) to make recommendations, avoiding the need for additional data collection (Herlocker et al., 1999; Koren, 2008). In this paper we focus on collaborative filtering, although our methodology could be extended to incorporate additional content-based information (Rendle, 2010). Below we introduce notation and summarize relevant techniques.
Consider a set of $n$ users and a set of $m$ items, where each user has interacted with a subset of the items. We assume that the interactions for user $i$ can be summarized via a partially observed feedback vector $r_i$, and that all user-item interactions can be represented by a partially observed matrix $R \in \mathbb{R}^{n \times m}$. Entries can be in the form of explicit feedback, e.g. numerical ratings from 1 to 5, or implicit, such as binary values indicating that a user viewed or clicked on some content (Hu et al., 2008; Zhao et al., 2018; Jawaheer et al., 2010). The goal is to predict items that a user would like but has not previously interacted with (i.e., to predict which of the missing values in $R$ have high values).
2.1.1 Matrix Factorization
Matrix factorization is a popular and effective collaborative filtering approach used in many different fields to find low dimensional representation of users and items (Koren, 2008; Koren et al., 2009; Srebro and Salakhutdinov, 2010; McAuley and Leskovec, 2013).
A matrix factorization approach assumes that users and items can be characterized in a low-dimensional space $\mathbb{R}^k$ for some $k \ll \min(n, m)$, i.e., that the partially observed matrix $R$ can be approximated by $UV^\top$, where $U \in \mathbb{R}^{n \times k}$ aggregates users’ representations, and $V \in \mathbb{R}^{m \times k}$ collects items’ representations. In this paper, we rely on nonnegative factorization, i.e., we constrain the estimates of $U$ and $V$ to be nonnegative. Such a constraint often results in more interpretable latent factors and improved performance (Zhang et al., 2006; McAuley and Leskovec, 2013). In this setting, $U$ and $V$ can be estimated as
$$\min_{U \ge 0,\, V \ge 0} \; \|R - UV^\top\|_F^2 + \lambda \Phi(U, V), \qquad (1)$$
where $\Phi$ is a regularizer. For the remainder of this paper, we assume $\Phi(U, V) = \|U\|_F^2 + \|V\|_F^2$.
Since we only have access to a subset of the entries of $R$, (1) is solved by minimizing the error over the training set of observed ratings $\Omega$,
$$\min_{U \ge 0,\, V \ge 0} \sum_{(i,j) \in \Omega} \left( R_{ij} - u_i v_j^\top \right)^2 + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right), \qquad (2)$$
where $u_i$ denotes the $i$th row of $U$ – i.e., the latent representation for the $i$th user – and $v_j$ is the $j$th row of $V$ – i.e., the latent representation for the $j$th item.
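As an illustration of solving (2), the following sketch uses projected gradient descent over the observed entries with a non-negativity projection; the solver and hyperparameters are our own illustrative choices, not necessarily those used in the paper.

```python
import numpy as np

def factorize_observed(R, mask, k, lam=0.05, lr=0.01, iters=1000, seed=0):
    """Approximately minimize the objective in Eq. (2): squared error over the
    observed entries (mask == 1) plus Frobenius regularization, keeping U and V
    non-negative by projecting after each gradient step."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.random((n, k))
    V = rng.random((m, k))
    for _ in range(iters):
        E = mask * (U @ V.T - R)      # residual on observed entries only
        gU = E @ V + lam * U
        gV = E.T @ U + lam * V
        U = np.maximum(U - lr * gU, 0.0)
        V = np.maximum(V - lr * gV, 0.0)
    return U, V
```

Given the learned factors, predictions for unobserved entries are simply the corresponding entries of $UV^\top$.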
2.2 Federated learning
Federated learning was introduced by McMahan et al. (2017) as a framework for learning models in a decentralized manner, and originally applied to learning with neural networks. The goal of federated learning is to infer a global model without collecting raw data from participating users. This is achieved by having the users (or entities representing multiple users) locally compute model updates based on their data and share these updates with a central server. The server then updates the global model and sends it back to the users.
While they avoid directly sharing users’ data, most federated learning algorithms offer no formal guarantees that a malicious agent could not infer private information from the updates. For example, in a naïve application of the original federated learning method (McMahan et al., 2017) to a matrix-factorization-based recommender system, each entity shares parameters including a low-dimensional representation of each user, leading to a high risk of potential privacy breaches.
2.3 Differential Privacy
Differential privacy (Dwork et al., 2006) is a statistical notion of privacy that bounds the potential privacy loss an individual risks by allowing her data to be used in the algorithm.
Definition 2.1.
A randomized algorithm $\mathcal{M}$ satisfies $(\epsilon, \delta)$-differential privacy (DP) if for any datasets $D$ and $D'$ differing by only one record and any subset of outcomes $S$,
$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta.$$
In other words, for any possible outcome, including any given individual in a data set can only change the probability of that outcome by at most a multiplicative constant that depends on $\epsilon$ (up to a small additive slack $\delta$). Differential privacy has been applied to recommender systems by adding noise to the average item ratings and the item-item covariance matrix (McSherry and Mironov, 2009). However, this approach is designed for systems wherein a centralized server needs to collect all the data to derive users’ and items’ means and covariances. Differential privacy is more difficult to impose in iterative algorithms, such as those commonly used in federated learning scenarios, since the iterative nature of these algorithms requires splitting the privacy budget across iterations, thus bringing forth technical challenges (Abadi et al., 2016; Wu et al., 2017; McMahan et al., 2018).
2.3.1 Differentially private prototypes
Our design of private prototypes is motivated by the efficient differentially private $k$-means estimator for high-dimensional data introduced in Balcan et al. (2017). This algorithm first relies on the Johnson-Lindenstrauss lemma to project the data into a lower-dimensional space that still preserves much of the data’s underlying structure. Then, the space is recursively subdivided, with each subregion and its corresponding centroid being considered a candidate centroid with probability that depends on the number of points in the region and the desired privacy level $\epsilon$. The final means are selected from the candidate centroids by recursively swapping out candidates using the exponential mechanism (McSherry and Talwar, 2007), where the score for each potential collection is the clustering loss. The selected candidates are mapped back to the original space by taking a noisy mean of the data points in the corresponding cluster, providing $(\epsilon, \delta)$-DP. The complete algorithm is parametrized by the number of clusters, the privacy budget, and a confidence parameter; with high probability, the clustering loss associated with the selected centers is bounded by a multiplicative factor of OPT plus an additive error term, where OPT is the optimal loss under a non-private algorithm. The bound depends on the number of data points, the dimensionality of the data, and the radius of a ball that bounds the data. We formalize this algorithm as private_prototypes in the supplementary material.
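For intuition only, the sketch below is a greatly simplified stand-in for private_prototypes: it runs ordinary (non-private) k-means and then privatizes only the cluster means via Laplace noise on the per-cluster sums and counts. The actual algorithm of Balcan et al. (2017) instead uses a Johnson-Lindenstrauss projection, recursive partitioning, and the exponential mechanism; the budget split and noise scales here are illustrative assumptions, not a verified privacy accounting.

```python
import numpy as np

def noisy_prototypes(X, k=4, eps=1.0, radius=1.0, iters=20, seed=0):
    """Return k noisy cluster centers for data X (assumed bounded by `radius`).
    Non-private Lloyd's iterations, then Laplace-noised sum / Laplace-noised
    count per cluster; `eps` is split evenly between the sums and the counts."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each point to its nearest center, then recompute means
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    half = eps / 2.0
    protos = np.empty_like(centers)
    for j in range(k):
        pts = X[labels == j]
        noisy_sum = pts.sum(axis=0) + rng.laplace(0.0, 2.0 * radius / half, size=d)
        noisy_cnt = max(len(pts) + rng.laplace(0.0, 1.0 / half), 1.0)
        protos[j] = noisy_sum / noisy_cnt
    return protos
```

Only the noisy prototypes, never the raw rows of `X`, would leave an entity.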
The Balcan et al. (2017) method is one of a number of differentially private algorithms for finding cluster representatives or prototypes. Blum et al. (2005) introduced SuLQ $k$-means, where the server updating the clusters’ centers receives only noisy averages. Unlike the approach of Balcan et al. (2017), this algorithm does not have guarantees on the convergence of the loss function.
Nissim et al. (2007) and Wang et al. (2015) use a similar framework but calibrate the noise by local sensitivity, which is difficult to estimate without assumptions on the dataset (Zhu et al., 2017). Private coresets have been used to construct differentially private $k$-means and $k$-medians estimators (Feldman et al., 2017), but this approach does not scale to large data sets.
3 Related Work
To the best of our knowledge, the current paper is the first to propose learning recommender systems in a federated manner while guaranteeing differential privacy. However, a number of other approaches have been proposed for incorporating notions of privacy in recommender systems (Friedman et al., 2015); we summarize the main ones below.
Cryptographic methods.
A number of private recommender systems have been developed using a cryptographic approach (Miller et al., 2004; Kobsa and Schreck, 2003). Such methods use encryption to protect users by encoding personal information with cryptographic functions before it is communicated. In the healthcare context, Hoens et al. (2013) have applied cryptographic methods to providing physician recommendations. However, these methods require centralizing the dataset to perform calculations on the protected data, which may be infeasible when the total data size is large or communication bandwidth is limited, or where regulations prohibit sharing of individuals’ data even under encryption.
Federated recommender systems.
In addition to the generic federated learning algorithms discussed in Sec 2.2, alternative federation methods have been proposed for matrix factorization, where the information being shared is less easily mapped back to individual users. Kim et al. (2017)
consider federated tensor factorization for computational phenotyping. There, the objective function is broken into subproblems using the Alternating Direction Method of Multipliers
(ADMM; Boyd et al., 2011), and alternating optimization is used to distribute the computation among the different entities. User factors are learned locally, and the server updates the global factor matrix and sends it back to each entity. In a similar way, Ammad-ud-din et al. (2019) perform federated matrix factorization by taking advantage of alternating least squares. They decouple the optimization process, globally updating the items’ factors and locally updating the users’ factors. These two approaches converge to the same solution as non-federated methods. However, since current variables need to be shared at each optimization stage, this technique requires high communication rates and synchronization among users. While either of the above factorization methods could be adapted to recommender systems, they lack strict privacy guarantees and require extensive communication.
Differential privacy.
In a recommender systems context, McSherry and Mironov (2009) rely on differential privacy results to obtain noisy statistics from ratings. Although the resulting model provides privacy guarantees, it requires access to the centralized raw data in order to estimate the appropriate statistics. This makes it unsuited for the datadistributed setting we consider.
Exploiting public data.
Xin and Jaakkola (2014) consider matrix factorization methods to learn $V$ (see Sec 2) in the setting where the item representation matrix $V$ can be learned from publicly available data. The public item matrix is then shared with private entities to locally estimate their latent factors matrix $U$. The applicability of this approach is hindered by potentially limited access to public data, which is the case in sensitive applications such as healthcare recommendations. Our approach provides an alternative method for learning a shared estimate of $V$ from appropriately obscured private data.
4 A Differentially Private, Federated Recommender System
We propose a model for learning a recommender system in a federated setting where different entities possess different numbers of records. We assume the data is split between entities such that each entity possesses data for more than one user. The partially observed user-item interaction matrix associated with the $e$th entity is denoted by $R_e$.
We assume that the training data is sensitive and should not be shared outside the entity to which it belongs. While each entity will need to communicate information to a nonprivate server, we wish to ensure this communication does not leak sensitive information.
In order for differential privacy and federated recommender systems to work in concert, our framework must accomplish two objectives: 1) make recommendations privately by injecting noise in a principled way, and 2) reduce the number of communications to minimize the amount of injected noise. The solutions to these requirements are interrelated. We first describe a method that reduces the number of communication steps to two, and then proceed to describe how to solve the privacy challenge.
4.1 A One-shot Private System
Most federated learning methods require multiple rounds of communication between entities and a central server, which poses a problem for differential privacy requirements. Specifically, we can think of each round of communication from the entities to the server as a query sent to the individual entities, which has the potential to leak information. If we query an $\epsilon$-DP mechanism $k$ times, then the resulting sequence of queries is only $k\epsilon$-DP (McSherry, 2009). In practice, this means that the more communication we require, the more noise must be added to maintain the same level of differential privacy.
To minimize the amount of noise a differential privacy technique will introduce, our method must limit the number of communication calls between the entities. In the context of matrix-factorization-based recommendations, which involve estimating $V$, as discussed in Sec 3, Xin and Jaakkola (2014) show that transmission of private data can be avoided by using a public dataset to learn the shared item representation $V$. Given $V$, each entity can privately estimate its users’ factors without releasing any information about its data. Building upon this idea, we constrain the communication to only two rounds, back and forth. However, in our problem setting we do not have access to a public data set. Instead, we construct a shared item representation based on privatized prototypes collected from each entity. These prototypes are designed to: a) contain similar information as $R_e$, thus allowing construction of an accurate item representation; b) be of low dimension relative to $R_e$, hence minimizing the communication load; and c) maintain differential privacy with respect to the individual users. We elaborate on building prototypes in Sec 4.2.
Once we have generated prototypes for each entity, we send them to a centralized server that learns the shared item representation $V$ through traditional matrix factorization (see Sec 2). This shared matrix is then communicated back to the individual entities, which use it to learn their own users’ profile matrices and make local predictions.
In contrast to iterative methods, the proposed approach requires only two rounds of communication: one from the entities to the server, and one from the server to the entities. In addition to reducing communication costs and removing the need for synchronization, this strategy allows us to conserve the privacy budget. With only one communication requiring privatization, we are able to minimize the noise that must be added to guarantee a desired level of privacy.
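The two-round protocol above can be sketched end to end as follows. This is a schematic under our own simplifications: round 1 sends plain subsampled rows where the paper would send DP prototypes, and both factorizations use a basic projected-gradient NMF rather than any particular solver from the paper.

```python
import numpy as np

def entity_round1(R_e, n_protos, rng):
    """Round 1 (entity -> server): release a small set of prototype rows of R_e.
    Here we simply subsample rows; the paper would send DP prototypes instead."""
    idx = rng.choice(R_e.shape[0], size=min(n_protos, R_e.shape[0]), replace=False)
    return R_e[idx]

def server_item_matrix(prototype_sets, k, lr=0.01, iters=800, rng=None):
    """Server: stack all entities' prototypes and factorize once to learn V."""
    rng = np.random.default_rng(0) if rng is None else rng
    P = np.vstack(prototype_sets)
    U = rng.random((P.shape[0], k))
    V = rng.random((P.shape[1], k))
    for _ in range(iters):
        E = U @ V.T - P
        U, V = (np.maximum(U - lr * (E @ V), 0.0),
                np.maximum(V - lr * (E.T @ U), 0.0))
    return V

def entity_round2(R_e, V, lr=0.01, iters=800, rng=None):
    """Round 2 (server -> entity): with V fixed, fit the local user matrix U_e."""
    rng = np.random.default_rng(0) if rng is None else rng
    U = rng.random((R_e.shape[0], V.shape[1]))
    for _ in range(iters):
        U = np.maximum(U - lr * ((U @ V.T - R_e) @ V), 0.0)
    return U
```

After round 2, no further communication occurs; each entity predicts locally from its own $U_e V^\top$.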
4.2 Learning Prototypes
For our algorithm to succeed, we must find a way to share the information from all entities in order to build a global item representation matrix $V$. We want the prototypes to be representative of the data set, i.e., ensure they convey useful information. Note that to satisfy differential privacy, each set of prototypes must be differentially private.
Differentially private dataset synthesis methods (see Bowen and Liu (2016) for a survey) could be used to generate synthetic data having statistical properties similar to $R_e$. However, these methods tend to be ill-suited for high-dimensional settings and would involve sending a large amount of data to the server. Instead, we consider methods that find differentially private prototypes of our dataset, with the aim of obtaining fewer samples that still capture much of the variation present in the individual data. Since we will use these prototypes to capture low-rank structure, provided each entity sends a number of prototypes larger than the rank, it is possible for such prototypes to contain the information required to recover the singular vectors of $R_e$ yet still be smaller than $R_e$, thus reducing the amount of information that needs to be communicated. When selecting the prototype mechanism, we recall the following two observations.
Remark 1.
Nonnegative matrix factorization (NMF) and spectral clustering have been shown to be equivalent (Ding et al., 2005).
Remark 2.
(Theorem 3 in Pollard (1982)) Let $C^*$ be the optimum of the $k$-means objective on a dataset distributed according to some distribution $P$ on $\mathbb{R}^d$, and let $\mathcal{D}_k$ be the set of discrete distributions on $\mathbb{R}^d$ with support size at most $k$. Then, the discrete distribution implied by $C^*$ is the closest discrete distribution to $P$ in $\mathcal{D}_k$ with respect to the 2-Wasserstein metric.
Since we are learning the item matrix $V$ via NMF, Remark 1 suggests that one should capture the centroids of clusters in the data to preserve spectral information. Remark 2 implies that the prototypes obtained via $k$-means are close, in a distributional sense, to the underlying distribution. Following these intuitions, we consider prototype generation methods based on $k$-means. Since the learned prototypes are created to capture the same latent representation that would be captured by NMF, we expect the estimated item matrix to be close to the true $V$.
Because it is well suited to high-dimensional data, we adopt the framework of the differentially private candidates algorithm of Balcan et al. (2017). Note that this algorithm initially maps the data onto a low-dimensional space; however, since we are using the prototypes to learn a low-dimensional representation, such a mapping is unlikely to adversely impact the accuracy of the proposed method. We augment the scheme of Balcan et al. (2017) – while maintaining its accuracy and privacy guarantees – with a novel recovery algorithm. The algorithm increases overall efficiency by exploiting the sparsity of the data and the Gumbel trick, often used to efficiently sample from discrete distributions (Papandreou and Yuille, 2011; Balog et al., 2017; Durfee and Rogers, 2019).
After obtaining cluster assignments for each datapoint, instead of sequentially applying the exponential mechanism to recover the nonzero entries of each centroid, we add noise drawn from a Gumbel distribution to the centroid mean and take the top $h$ entries, where $h$ denotes the number of nonzero entries in the dataset. We formalize this procedure as Algorithm LABEL:alg:sparserec. Algorithm LABEL:alg:fedRecSys summarizes the proposed private, federated recommender system.
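The Gumbel-trick selection can be sketched as a one-shot top-h routine: adding Gumbel noise to every utility score and keeping the h largest indices, which Durfee and Rogers (2019) show matches h sequential applications of the exponential mechanism. The noise scale below is a common calibration under the stated sensitivity, not necessarily the constant used in the paper.

```python
import numpy as np

def gumbel_top_h(utility, h, eps, sensitivity=1.0, rng=None):
    """Select h indices by adding Gumbel(2 * sensitivity / eps) noise to each
    utility score and keeping the h largest, returned best-first."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = utility + rng.gumbel(0.0, 2.0 * sensitivity / eps, size=len(utility))
    return np.argsort(noisy)[::-1][:h]
```

This replaces h sequential exponential-mechanism draws with a single vectorized pass over the utilities.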
Theorem 4.1.
Algorithm LABEL:alg:fedRecSys is Differentially Private.
Proof.
The server interacts with the private datasets only once, when collecting the private prototypes. Durfee and Rogers (2019) prove that adding Gumbel noise to the utility function and selecting the top $h$ values from the noisy utility is equivalent to applying the exponential mechanism $h$ times; therefore, transmission of a single prototype is differentially private. The parallel composition theorem (McSherry, 2009) establishes that the overall privacy budget is given by the maximum of the individual budgets, implying that the overall algorithm is differentially private. ∎
5 Experiments
We first test the performance of the proposed differentiallyprivate federated recommender system on synthetic data and report the results in Sec 5.3. Then, to demonstrate the ability to provide highquality recommendations in realistic settings, in Sec 5.4 we apply the system to realworld datasets.
For all the experiments, we fixed the level of regularization $\lambda$, since we did not observe a notable difference in performance when varying it.
5.1 Datasets
We test the proposed scheme on three different datasets. The first one is a synthetic dataset intended to simulate discrete processes such as ratings or counts of event occurrences. The relevant matrices are generated from nonnegative latent factors, and the data is distributed uniformly across 10 different entities.
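The paper's exact dimensions and sampling distributions are not reproduced here; as a purely illustrative sketch of how such discrete interaction data can be simulated (all sizes and distributions below are placeholder assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, n_entities = 500, 100, 5, 10      # illustrative sizes only
U = rng.gamma(2.0, 1.0, size=(n_users, k))             # nonnegative user factors
V = rng.gamma(2.0, 1.0, size=(n_items, k))             # nonnegative item factors
R = rng.poisson(U @ V.T)                               # discrete counts (ratings/events)
entity_of = rng.integers(0, n_entities, size=n_users)  # uniform split across entities
```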
The second dataset is from the eICU Collaborative Research Database (Pollard et al., 2018), which contains data collected from critical care units throughout the continental United States in 2014 and 2015. Since different visits can have diverse causes and diagnoses, we count each patient visit as a separate observation. We use the laboratories and medicines tables from the database, and create a two-way table where each row represents a patient and each column either a lab or a medicine. The matrix is composed using data from many thousands of patients, laboratories and medications across multiple hospitals. Each entry represents how many times a patient took a test or a medication; the goal is to recommend treatments.
Finally, we consider the Movielens 1M dataset, containing 1,000,209 anonymous ratings from 6,040 MovieLens users on approximately 3,900 movies. We use the first digit of each user’s ZIP code to set up a natural federation of the data.
5.2 Evaluation metrics
To assess convergence and perform parameter tuning, we use the Root Mean Squared Error (RMSE) between the real and the reconstructed $R$. In the case of the synthetic data, RMSE is a suitable measure to examine the fit quality since we have access to the ground truth.
Additionally, to evaluate the quality of recommendations in the hospital and movie data tests, we compare the real and predicted rankings over the test samples using the mean percentile ranking (MPR) (Hu et al., 2008). Concretely, let $\mathrm{rank}_{ui}$ be the percentile of the predicted ranking of item $i$ for user $u$, where 0 means very highly ranked, above all other items. We calculate MPR on a test set $\mathcal{T}$, defined as
$$\mathrm{MPR} = \frac{1}{|\mathcal{T}|} \sum_{(u,i) \in \mathcal{T}} \mathrm{rank}_{ui}. \qquad (3)$$
This measure compares the similarity between the real and predicted ranks. Intuitively, for a random ranking the expected $\mathrm{MPR}$ is $0.5$, so $\mathrm{MPR} \ge 0.5$ means a ranking no better than random. Conversely, lower values indicate highly ranked recommendations matching the users’ patterns.
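The metric above can be computed directly from a score matrix; the following is a minimal sketch of a mean-percentile-rank implementation (the function name and interface are our own):

```python
import numpy as np

def mean_percentile_rank(scores, test_pairs):
    """For each test pair (u, i), compute the percentile of item i in user u's
    predicted ranking (0 = ranked above all other items, 1 = ranked last), and
    average over the test set. Lower is better; 0.5 is the random-ranking baseline."""
    n_items = scores.shape[1]
    ranks = []
    for u, i in test_pairs:
        order = np.argsort(-scores[u])        # best-first item ordering
        pos = int(np.flatnonzero(order == i)[0])
        ranks.append(pos / (n_items - 1))     # percentile in [0, 1]
    return float(np.mean(ranks))
```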
5.3 Evaluating the impact of federation and privacy on synthetic data
Recall that our algorithm differs from standard matrix factorization schemes in two key aspects: first, it learns the item matrix $V$ using prototypes rather than the actual data; second, it learns the users’ submatrices independently given $V$, rather than jointly. Moreover, instead of learning the prototypes using exact $k$-means, to ensure differential privacy we use a DP algorithm. Here we explore the effect of these algorithmic features.
In particular, we compare our framework with the following algorithms:

Matrix factorization: Apply Eq 2 until convergence on the full data $R$.

MF + $k$-means: Apply Eq 2 to factorize a matrix of exemplars $\tilde{R}$, where $\tilde{R}$ collects the $k$-means centroids from each matrix $R_e$. Use the resulting estimate of $V$ to learn the individual matrices $U_e$ from $R_e$.

MF + random: Identical to MF + $k$-means, but instead of using the cluster means, use random samples from $R_e$.

MF + private prototypes: Identical to MF + $k$-means, but instead of using true cluster means, use the generated DP prototypes (see Algorithm private_prototypes in the supplementary).
We first evaluate how $k$-means performs in a non-private setting. Figures 0(a) and 0(b) show the RMSE with the number of exemplars and the number of latent factors fixed, respectively. (In the supplementary, we provide additional experiments varying the number of exemplars between 10 and 300 and the dimension of the latent space between 20 and 80; the behavior is similar.) In both figures, we see, unsurprisingly, that MF has the lowest RMSE, with random exemplars from the original dataset performing second best. For larger numbers of exemplars in Fig 0(b), $k$-means performance deteriorates compared to random. Based on examination of the centroids, this is most likely due to $k$-means overfitting to outliers as the number of centroids grows, while random’s performance improves as its number of exemplars approaches the full dataset. We note that our synthetic data does not contain any clusters, so this is the worst-case scenario for the $k$-means setting; even so, we observe that the difference in reconstructive performance between the three methods is fairly small. None of the above methods guarantee privacy.
Next, we compare the performance of private $k$-means and non-private $k$-means. In Fig 0(c), we consider a relatively small number of exemplars and investigate the effect of $\epsilon$ as the number of latent factors changes. As expected, larger values of $\epsilon$ (i.e., less private settings) yield better results. Here we observe little difference in the performance between the private and non-private algorithms. However, in Fig 0(d) we see that for large numbers of exemplars, the private methods perform significantly better than the non-private $k$-means, mirroring the results in Fig 0(b). We hypothesise that the noise introduced in the private and random scenarios acts as a regularizer, helping avoid overfitting. We note that, since the sensitivity of the random exemplar mechanism is equal to the range of the data, directly privatizing random exemplars would add excessive noise.
In both Fig 0(c) and Fig 0(d), we find that decreasing $\epsilon$ (and therefore increasing privacy) does not have a significant negative effect on the reconstruction quality. In Fig 0(d), for larger numbers of exemplars, MF + private $k$-means performs equally well even for the smallest value of $\epsilon$, as the noise is averaged over a large number of samples. Here, we can guarantee a stronger level of DP with a minimal drop in RMSE.
5.4 Evaluating the federated recommender system
To evaluate the entire system, we assess our model on real-world data from the eICU dataset and the Movielens 1M dataset. Similar to the experiments in the previous section, we assume that each entity extracts exemplars via private prototypes and sends them to the server. The server learns the item matrix $V$ and sends it back to the entities. Each entity learns its own user matrix and reconstructs $R_e$. We construct a test set by randomly selecting 20% of the users; for each selected user, we randomly select five entries.
We compare our private federated recommender system with: 1) non-private centralized matrix factorization, and 2) individual centralized matrix factorization for each hospital. The comparison is performed with the number of latent factors varied between 10 and 50. For the private prototypes, we use fixed privacy parameters, chosen separately for the hospitals' data and for Movielens.
The results on the two datasets are similar, and thus we present the Movielens results in the supplementary material. Fig 1(a) and Fig 1(b) show the average reconstruction error over the training and test data, respectively. As expected, on the training set the individual models achieve lower RMSE than the jointly learned model, since together they have many times as many parameters to model the overall variation. Perhaps surprisingly, given the noise introduced via the differential privacy mechanism, the federated model achieves a training set RMSE comparable to that achieved by the individual models.
Analysis of the test set RMSE (Fig 1(b)) reveals the benefit of the federated model. The individual models obtain RMSE comparable to the jointly learned model, indicating that the low training set RMSE results from the individual models overfitting. The federated model, however, generalizes well to the test set. We hypothesise that this is because the jointly learned item matrix aids in generalization, and the use of noisy prototypes discourages overfitting.
Fig 1(c) shows the average ranking quality for the three methods. Consistent with the test set RMSE, the federated model obtains the best ranking performance. As intended, the federated model allows each hospital to improve its predictions by obtaining relevant information from other hospitals, without compromising its patients’ information.
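The ranking quality used above can be sketched as a mean percentile rank, where 0.5 corresponds to random recommendations and lower is better. We assume a metric in the style of Hu et al.'s expected percentile ranking; the paper's exact metric may differ, and the names below are illustrative.

```python
import numpy as np

def mean_percentile_rank(predictions, held_out):
    """Average ranking quality over held-out (user, item) pairs.

    For each pair, compute the item's percentile position in that
    user's predicted ranking (0 = top, 1 = bottom). A score of 0.5
    corresponds to random recommendations; lower is better.
    """
    ranks = []
    for user, item in held_out:
        scores = predictions[user]
        # Fraction of other items ranked above this held-out item.
        rank = np.sum(scores > scores[item]) / (len(scores) - 1)
        ranks.append(rank)
    return float(np.mean(ranks))
```

A model that consistently places held-out items near the top of each user's list drives this score toward 0, which is the behaviour the federated model exhibits in Fig 1(c).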
6 Conclusion
We propose a novel, efficient framework to learn recommender systems in federated settings. Our framework enables entities to collaborate and learn common patterns without compromising users’ privacy, while requiring minimal communication. Our method assumes individuals are grouped into entities, at least some of which are large enough to learn informative prototypes; we do not require privacy within an entity.
A future direction could be to extend this approach to the more extreme scenarios where each entity represents a single individual. This would be useful for commerce or content sites where each user wants to maintain privacy. Another avenue for future work is to investigate error bounds for the reconstructed matrix. Such results could allow entities to determine an appropriate privacy budget while still learning useful models.
References
Deep learning with differential privacy. In SIGSAC, pp. 308–318.
Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv preprint arXiv:1901.09888.
Differentially private clustering in high-dimensional Euclidean spaces. In ICML, pp. 322–331.
Lost relatives of the Gumbel trick. In ICML, pp. 371–379.
Practical privacy: the SuLQ framework. In PODS, pp. 128–138.
Comparative study of differentially private data synthesis methods. arXiv preprint arXiv:1602.01063.
Distributed optimization and statistical learning via the alternating direction method of multipliers. FTML 3 (1), pp. 1–122.
The secret sharer: evaluating and testing unintended memorization in neural networks. In USENIX Security Symposium, pp. 267–284.
On the equivalence of nonnegative matrix factorization and spectral clustering. In ICDM, pp. 606–610.
Healthcare information systems: data mining methods in the creation of a clinical recommender system. Enterprise Information Systems 5 (2), pp. 169–181.
Practical differentially private top-k selection with pay-what-you-get composition. In NeurIPS, pp. 3527–3537.
Calibrating noise to sensitivity in private data analysis. In TCC, pp. 265–284.
Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks. In IPSN, pp. 3–16.
Privacy aspects of recommender systems. In Recommender Systems Handbook, pp. 649–688.
Using collaborative filtering to weave an information tapestry. Communications of the ACM 35 (12), pp. 61–71.
Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604.
Towards a collaborative filtering approach to medication reconciliation. In AMIA Annual Symposium, Vol. 2008, pp. 288.
From Netflix to heart attacks: collaborative filtering in medical datasets. In IHI, pp. 128–134.
An algorithmic framework for performing collaborative filtering. In SIGIR, pp. 230–237.
Deep models under the GAN: information leakage from collaborative deep learning. In CCS, pp. 603–618.
Reliable medical recommendation systems with patient privacy. TIST 4 (4), pp. 67.
Collaborative filtering for implicit feedback datasets. In ICDM, pp. 263–272.
Comparison of implicit and explicit feedback from an online music recommendation service. In HetRec, pp. 47–51.
Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977.
Federated tensor factorization for computational phenotyping. In SIGKDD, pp. 887–895.
Privacy through pseudonymity in user-adaptive systems. TOIT 3 (2), pp. 149–183.
Understanding black-box predictions via influence functions. In ICML, pp. 1885–1894.
Matrix factorization techniques for recommender systems. Computer (8), pp. 30–37.
Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD, pp. 426–434.
Hidden factors and hidden topics: understanding rating dimensions with review text. In RecSys, pp. 165–172.
A general approach to adding differential privacy to iterative training procedures. arXiv preprint arXiv:1812.06210.
Communication-efficient learning of deep networks from decentralized data. In AISTATS.
Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD, pp. 19–30.
Differentially private recommender systems: building privacy into the Netflix Prize contenders. In SIGKDD, pp. 627–636.
Mechanism design via differential privacy. In FOCS, pp. 94–103.
PocketLens: toward a personal recommender system. TOIS 22 (3), pp. 437–476.
Model reconstruction from model explanations. In FAT*.
Smooth sensitivity and sampling in private data analysis. In STOC, pp. 75–84.
Perturb-and-MAP random fields: using discrete optimization to learn and sample from energy models. In ICCV, pp. 193–200.
Quantization and the method of k-means. IEEE Transactions on Information Theory 28 (2), pp. 199–205.
The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data 5.
Factorization machines. In ICDM, pp. 995–1000.
Collaborative filtering in a non-uniform world: learning with the weighted trace norm. In NeurIPS, pp. 2056–2064.
Differentially private subspace clustering. In NeurIPS, pp. 1000–1008.
Bolt-on differential privacy for scalable stochastic gradient descent-based analytics. In SIGMOD, pp. 1307–1322.
Controlling privacy in recommender systems. In NeurIPS, pp. 2618–2626.
Applied federated learning: improving Google keyboard query suggestions. arXiv preprint arXiv:1812.02903.
Learning from incomplete ratings using non-negative matrix factorization. In ICDM, pp. 549–553.
Explicit or implicit feedback? Engagement or satisfaction? A field experiment on machine-learning-based recommender systems. In SAC.
Differentially private data publishing and analysis: a survey. TKDE 29 (8), pp. 1619–1638.
Appendix A Private $k$-means: Definitions and Subroutines
A.1 Differential Privacy Definitions
Definition A.1.
Let be a utility function where measures the utility of outputting given a dataset . The exponential mechanism outputs with probability proportional to , where is the sensitivity of defined by
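For intuition, the exponential mechanism of Definition A.1 over a finite candidate set can be sketched as follows; the candidate enumeration and the utility function passed in are illustrative assumptions, not part of the definition.

```python
import numpy as np

def exponential_mechanism(candidates, utility, dataset, epsilon, sensitivity):
    """Sample r with probability proportional to
    exp(epsilon * u(X, r) / (2 * sensitivity)), per Definition A.1."""
    utilities = np.array([utility(dataset, r) for r in candidates])
    logits = epsilon * utilities / (2.0 * sensitivity)
    logits -= logits.max()                    # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    idx = np.random.default_rng(0).choice(len(candidates), p=probs)
    return candidates[idx]
```

Subtracting the maximum logit before exponentiating leaves the sampling distribution unchanged while avoiding overflow for large $\epsilon u / \Delta u$.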
Definition A.2.
A.2 Subroutines
Appendix B Experiment Details
B.1 Experiment 1: Private $k$-means vs $k$-means on Poisson-Distributed Data
For this experiment we generated Poisson-distributed synthetic data, fixed the remaining parameters, and observed the average behaviour of private $k$-means over repeated runs. As $\epsilon$ increases, the level of privacy decreases, reducing the $k$-means objective and approaching the objective achieved by standard, non-private $k$-means.
To implement standard $k$-means we used the Python library scikit-learn. For private prototypes, we modified the publicly available MATLAB code from Balcan et al. (2017) (https://github.com/mouwenlong/dpclusteringicml17) and reimplemented it in Python.
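The comparison in this experiment can be sketched as follows: run plain Lloyd's $k$-means (a numpy stand-in for the scikit-learn implementation), perturb the resulting centers with Laplace noise of scale range/$\epsilon$, and watch the objective approach the non-private one as $\epsilon$ grows. All parameter values here are illustrative, not the paper's.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iters=50, seed=0):
    """Plain (non-private) Lloyd's k-means; a numpy stand-in for the
    scikit-learn implementation used in the experiment."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):       # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def objective(X, centers):
    """k-means objective: summed squared distance to the nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).sum())

rng = np.random.default_rng(1)
X = rng.poisson(5.0, size=(500, 2)).astype(float)
centers = lloyd_kmeans(X, k=5)
clean = objective(X, centers)             # non-private reference objective

# Larger epsilon -> smaller Laplace noise on the centers -> the
# objective approaches the non-private one, as in the experiment.
objectives = {
    eps: objective(X, centers + rng.laplace(0.0, np.ptp(X) / eps,
                                            size=centers.shape))
    for eps in (0.1, 1.0, 10.0, 100.0)
}
```

Note the noise scale here is calibrated naively to the data range; the mechanism actually evaluated in the paper follows Balcan et al. (2017).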
B.2 Further Experimentation on the Number of Entities
Fig 5 shows the RMSE on the synthetic test dataset described in Section 5.1. We observe that as the number of entities increases, convergence improves. This is expected, since the number of observations used to approximate $V$ also grows.
B.3 Experiment 2: Varying Parameters for Normally Distributed Synthetic Data
In Section 5.3 we showed results for fixed values of the number of prototypes and the number of latent features. Below we show additional plots for different values of those parameters.
In Fig 6 we observe that as the number of samples increases, random exemplars outperform $k$-means for all values of $k$. Note that private $k$-means performs well over a wide range of $k$. As the number of samples increases, private $k$-means converges to the same value for the various values of $k$. Fig 7 compares all methods for different numbers of latent factors. The difference in RMSE is clearer for small numbers of latent factors; for large numbers, the performance of the random and private approaches matches that of matrix factorization.
B.4 Movielens Results
Similar to the experiments on the eICU dataset, we observe unsurprisingly in Fig 8 that the global non-private matrix factorization model has lower RMSE than the distributed approaches (i.e., the individual models and Private 50-means). However, there is a benefit from collaboration. Recall that an average ranking of 0.5 or above indicates recommendations no better than random, whereas lower values indicate highly ranked recommendations matching the users' patterns. We observe in Fig 8 the benefit of collaboration: the quality of recommendations is better for the prototype models than for the local individual models. With a small privacy budget, our method is able to share insights among entities without sacrificing their privacy, while delivering better recommendations.