A common thread in ML applications is the collection of massive amounts of training data, which is often centralized for analysis. However, when training ML models in a multi-party setting, users must share their potentially sensitive information with a centralized service.
Federated learning is the current state of the art in secure multi-party ML: clients train a shared model with a secure aggregator without revealing their underlying data or computation. But doing so introduces a subtle threat: clients, who previously acted as passive data contributors, are now actively involved in the training process. This presents new privacy and security challenges.
Prior work has demonstrated that adversaries can attack the shared model through poisoning attacks [9, 33], in which an adversary contributes adversarial updates to influence shared model parameters. Adversaries can also attack the privacy of other clients in federated learning. In an information leakage attack, an adversary poses as an honest client and attempts to steal or de-anonymize sensitive training data through careful observation and isolation of a victim’s model updates [21, 27].
We claim that defenses against these two attacks are in tension and are inherently difficult to deploy concurrently: client contributions or data can be made public and verifiable to prevent poisoning, but this violates the privacy guarantees of federated learning. Client contributions can be made more private, but this eliminates the potential for holding adversaries accountable. Prior work has attempted to address these two attacks individually through centralized anomaly detection, differential privacy [4, 17, 18] or secure aggregation. However, a private and decentralized solution that addresses both attacks concurrently does not yet exist. This is the focus of our work.
Because ML does not require strong consensus or consistency to converge, traditional peer to peer (P2P) strong consensus protocols such as BFT protocols are too restrictive for ML workloads. To facilitate private, verifiable, crowd-sourced computation, distributed ledgers (blockchains) have emerged. Through design elements such as publicly verifiable proof of work, eventual consistency, and ledger-based consensus, blockchains have been used for a variety of decentralized multi-party tasks such as currency management [19, 31], archival data storage [5, 29] and financial auditing. Despite this wide range of applications, a fully realized, accessible system for large scale multi-party ML that is robust to both attacks on the global model and attacks on other clients does not exist.
We propose Biscotti: a decentralized public P2P system that co-designs a blockchain ledger with a privacy-preserving multi-party ML process. Peering clients join Biscotti and contribute to a ledger to train a global model, under the assumption that peers are willing to collaborate on building ML models, but are unwilling to share their data. Each peer has a local training data set that matches the desired objective of the training process and increases training data diversity. Biscotti is designed to support stochastic gradient descent (SGD), an optimization algorithm which iteratively selects a batch of training examples, computes their gradients against the current model parameters, and takes gradient steps in the direction that minimizes the loss function. SGD is general purpose and can be used to train a variety of models, including deep learning.
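The SGD loop described above can be made concrete with a minimal sketch. It assumes a least-squares linear model on synthetic data; the function names (`sgd_step`, `train`) and all hyperparameters are illustrative and are not taken from Biscotti.

```python
import random


def sgd_step(w, batch, lr):
    """One minibatch SGD step for a least-squares linear model."""
    grad = [0.0] * len(w)
    for x, y in batch:
        # Prediction error of the current model on this example.
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(batch)
    # The update is the negative gradient scaled by the learning rate.
    delta_w = [-lr * g for g in grad]
    new_w = [wi + di for wi, di in zip(w, delta_w)]
    return new_w, delta_w


def train(data, dim, lr=0.1, batch_size=8, iterations=300, seed=0):
    """Repeatedly sample a batch and take a gradient step."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(iterations):
        batch = rng.sample(data, batch_size)
        w, _ = sgd_step(w, batch, lr)
    return w


# Synthetic, noise-free data generated by y = 3*x0 - 2*x1 (an assumption
# made only so that the expected result is known).
data_rng = random.Random(1)
data = []
for _ in range(64):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    data.append((x, 3.0 * x[0] - 2.0 * x[1]))

w = train(data, dim=2)
```

In this sketch `delta_w` plays the role of the per-iteration update that a peer would contribute; the model itself is incidental.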
The Biscotti blockchain coordinates ML training between the peers. Each peer has a local training data set that matches the learning goal of the global model and provides training data diversity. Peers in the system are weighted by the value, or stake, that they have in the system. Inspired by prior work, Biscotti uses proof of stake in combination with verifiable random functions (VRFs) to select key roles that help to arrange the privacy and security of peer SGD updates. Our use of stake prevents groups of colluding peers from overtaking the system without a sufficient ownership of stake. Biscotti prevents peers from poisoning the model through a Reject on Negative Influence defense and prevents peers from performing an information leakage attack through differentially private noise and Shamir secrets for secure aggregation.
The contributions of our Biscotti design include:
A novel protocol that leverages blockchain primitives to facilitate secure and private ML in a P2P setting.
The translation of established blockchain primitives such as proof of stake and verification into a multi-party ML setting.
In our evaluation we deployed Biscotti on Azure and considered its performance, scalability, churn tolerance, and ability to withstand different attacks. We found that Biscotti can train an MNIST softmax model with 100 peers on a 60,000 image dataset in 50 minutes. Biscotti is fault tolerant to node churn every 15 seconds across 50 nodes, and converges even with such churn. Finally, we subjected Biscotti to information leakage attacks and poisoning attacks from prior work and found that Biscotti is resilient to both types of attacks.
2 Challenges and contributions
In this section we describe some of the challenges in designing a peer
to peer (P2P) solution for multi-party ML and the key pieces in
Biscotti’s design that resolve each of these challenges.
Poisoning attacks: update validation using RONI. In multi-party ML, peers possess a disjoint, private set of the total training data. They have full control over this hidden dataset and thus a malicious peer can perform poisoning attacks without limitation.
Biscotti validates an SGD update using the Reject on Negative Influence (RONI) defense, in which the effect of a set of candidate data points (or model updates) is evaluated and compared to a known baseline model. In multi-party settings, known baseline models are not available to users, so Biscotti validates an SGD update by evaluating the effect of the isolated update with respect to a peer’s local dataset. The performance of the current model with and without the update is evaluated. A validation error is computed and updates that negatively influence the model are rejected.
Since no peer has access to the global training dataset, using a
peer’s local data for update verification may introduce false
positives and incorrectly reject honest updates.
Biscotti solves this by using a majority voting scheme. As more peers
are used to validate a peer’s SGD update, the baseline dataset
approaches the global data distribution, reducing false positives.
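The combination of RONI and majority voting can be sketched as follows. This is a simplified illustration, assuming a linear classifier, a classification-error metric, and a hypothetical rejection threshold of 0.02; none of these specifics come from the Biscotti design, and the function names are illustrative.

```python
def error_rate(w, dataset):
    """Fraction of (x, label) pairs misclassified by the model sign(w . x)."""
    wrong = 0
    for x, label in dataset:
        score = sum(wi * xi for wi, xi in zip(w, x))
        pred = 1 if score >= 0 else -1
        wrong += pred != label
    return wrong / len(dataset)


def roni_accepts(w, delta_w, local_data, threshold=0.02):
    """Accept the update unless it raises local validation error too much."""
    before = error_rate(w, local_data)
    after = error_rate([wi + di for wi, di in zip(w, delta_w)], local_data)
    return after - before <= threshold


def majority_accepts(w, delta_w, verifier_datasets, threshold=0.02):
    """Each verifier runs RONI on its own data; a strict majority must accept."""
    votes = sum(roni_accepts(w, delta_w, d, threshold) for d in verifier_datasets)
    return votes > len(verifier_datasets) / 2


# Three verifiers with small local datasets; the label is the sign of x[0].
verifier_datasets = [
    [([1.0, 0.5], 1), ([-1.0, 0.2], -1), ([0.5, -0.3], 1), ([-0.7, -0.9], -1)],
    [([2.0, 1.0], 1), ([-0.4, 0.4], -1)],
    [([0.3, 0.1], 1), ([-0.2, -0.8], -1), ([-1.5, 0.6], -1)],
]
current_model = [1.0, 0.0]
honest_update = [0.1, 0.0]     # keeps the decision boundary in place
poisoned_update = [-2.0, 0.0]  # flips every prediction

honest_ok = majority_accepts(current_model, honest_update, verifier_datasets)
poisoned_ok = majority_accepts(current_model, poisoned_update, verifier_datasets)
```

Because each verifier only sees its own data, a single verifier may misjudge an honest update; the vote smooths over such disagreements.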
Information leakage attacks: random verifier peers and differentially private updates using pre-committed noise. Revealing SGD updates to untrusted nodes puts a peer’s training data at risk. By observing a peer’s updates from each verification round, an adversary can perform an information leakage attack and recover details about the peer’s training data.
We prevent attacks during update verification in two ways. First,
Biscotti uses verifiable random functions (VRFs) to
ensure that malicious peers cannot deterministically be selected to
verify a victim’s gradient. Second, peers send differentially-private
updates [4, 17] to verifier peers
(Figure 1 top): before sending a gradient to a
verifier, pre-committed $\epsilon$-differentially private noise is
added to the update, masking the peer’s gradient in a way that neither
the peer nor the attacker can influence or observe. By verifying
noised SGD updates, peers in Biscotti can verify the updates of other
peers without directly observing their un-noised versions. As shown in
Figure 1, a decreasing value of $\epsilon$
provides stronger privacy for a user, at the cost of a higher false
positive rate.
Leaking information through public ledger: secure update aggregation. The goal of a blockchain is to provide complete verifiability and auditing of state in a ledger, but this is counter to the goal of privacy-preserving multi-party ML. For example, information leakage attacks are also possible if SGD updates are stored in the ledger.
The differentially private updates noted above are one piece of the puzzle. The second piece is secure update aggregation: a block in Biscotti does not store updates from individual peers, but rather an aggregate that obfuscates any single peer’s contribution during that round of SGD. We use polynomial commitments to hide the values of individual updates before they are aggregated in a block, thereby hiding the update until it is aggregated as a part of a block in the ledger (Figure 1 bottom).
The bottom of Figure 1 shows the effect of
increasing the number of batched SGD updates. Even when these updates
are coming from the same user, it becomes increasingly difficult to
perform information leakage on a single instance of a user’s SGD
update when the number of aggregated examples increases.
Sybil attacks: VRFs based on proof of stake. Adversaries can collude or generate aliases to increase their influence in a sybil attack. For example, an adversary can exploit the majority voting scheme to validate their malicious SGD updates. With an increasing number of peers in the system, an adversary increases their chance of sending malicious SGD updates to the final block.
Inspired by Algorand, Biscotti uses three verifiable random functions (VRFs) to select a subset of peers for the different stages of the training process: adding noise to updates, validating an update, and securely aggregating the update. To mitigate the effect of sybils, these VRFs select peers proportionally to peer stake, ensuring that an adversary cannot increase their influence in the system without increasing their total stake.
3 Assumptions and threat model
We assume a public machine learning system that is backed by an auxiliary measure of stake. Stake may be derived from an online data sharing marketplace, a shared reputation score among competing agencies or auxiliary information on a social network.
Like federated learning, Biscotti assumes that users are willing to collaborate on building ML models among each other, but are unwilling to directly share their data when doing so. Each peer has a private, locally-held training data set that matches the learning goal of the global model, and each peer’s dataset has sufficient utility to increase performance of the global model.
3.1 Design assumptions
Proof of stake. Users in the system are weighted by
the value they have in the system. We assume that
there is a proof of stake function that is available to all nodes in
the system, that takes a peer identifier and returns the fraction of
stake that this peer has in the system. Peers accrue stake as they
contribute towards building the global model, and lose stake when
their proposed updates are rejected (peers use a well-defined
stake update mechanism). We assume that at any point in time the
majority of stake in the system is honest and that the stake function is
properly bootstrapped: upon initialization and onwards, any set of
peers that has a majority of stake in the system is legitimate and
will not attempt to subvert the system.
Blockchain topology. Each peer is connected to some
subset of other peers in a topology that allows flooding or
gossip-based dissemination of updates that eventually reach all peers in the
system. For example, this could be a random mesh topology with
flooding, similar to the one used for transaction and block
dissemination in Bitcoin.
Machine learning. We assume that ML training parameters are known to all peers: the model, its hyperparameters, its optimization algorithm, the learning goal of the system and other details (these are distributed in the first block). Peers have local datasets that they wish to keep private and that are sufficiently distinct from the rest of the global data. However, a peer’s dataset is not so distinct that it cannot be used as the quiz set for validation of other gradients. We assume that the distribution of data between clients in the system is sufficiently uniform such that RONI is accurate when used to validate gradients. This is essential for RONI to be used as a defense against poisoning attacks in our setting.
3.2 Attacker assumptions
Peers may be adversarial and send malicious updates to the system to perform a poisoning attack on the shared model or an information leakage attack against a targeted victim peer. In doing so, we assume that adversaries may control multiple peers in the system and collude in a sybil attack. We do not restrict adversaries at the API level: they can provide any arbitrary value when computing, validating, or adding noise to a gradient update. Although adversaries may be able to increase the number of peers they control in the system, we assume that they cannot artificially increase their stake in the system except by providing valid updates that significantly improve the performance of the global model.
When adversaries perform a poisoning attack, we assume that their goal is to influence the final global model prediction outputs in a way that impacts the performance of the global model on validation sets in an observable manner. Our defense relies on using RONI with local validation datasets to verify the authenticity of gradients: this does not cover gradients that are used for specific targeted poisoning attacks that only impact a narrow subset of classes in the model, or attacks on unused parts of the model topology, such as backdoor attacks [6, 20].
When adversaries perform an information leakage attack, we assume that they aim to learn properties of a victim’s local dataset. This does not include class-level information leakage attacks on federated learning, which attempt to learn the properties of an entire target class. We do not consider the properties of a defined class to be private in our setting, and only consider attacks that are performed by targeting and observing the contributions from an individual target peer.
4 Biscotti design
Biscotti implements peer-to-peer ML based on SGD. For this process, Biscotti’s design has the following goals:
Convergence to an optimal global model
Peer contributions to the model are verified before being accepted to prevent poisoning
Peer training data is kept private and information leakage attacks on training data are prevented
Colluding peers cannot gain influence without acquiring sufficient stake
Biscotti meets these goals through a blockchain-based design
that we describe in this section.
Design overview. Each peer in the system has local data that the peer is incentivized to contribute to the learning procedure, and an amount of preconceived stake in the system. Using Biscotti, peers join a public ledger and collaboratively train a global model. Each block in the distributed ledger represents a single iteration of SGD and the ledger contains the state of the global model at each iteration. Figure 2 overviews the Biscotti design with a step-by-step illustration of what happens during a single SGD iteration in which a single block is generated.
In each iteration, peers locally compute SGD updates (see Figure 2). Since SGD updates need to be kept private, each peer first masks their update using differentially private noise. This noise is obtained from a unique set of noising peers for each client, selected by a VRF. By using a unique noising VRF set, Biscotti prevents unmasking of updates using collision attacks. This noise is committed by the noise providers in advance of the iteration during a bootstrapping phase; therefore, it cannot be used to maliciously mask poisoned updates as valid.
The masked updates are validated by a verification committee (selected using a different VRF, also weighted by peer stake) to defend against poisoning. Each member in the verification committee signs the commitment to the peer’s unmasked update if their local RONI passes. The majority of the committee must sign an update for it to be considered valid. Each unmasked update whose commitment accumulates enough verification signatures is divided into Shamir secret shares and given to an aggregation committee (selected using a third weighted VRF) that uses a secure protocol to aggregate the unmasked updates. Each peer who contributes a share to the final aggregated update receives additional stake in the system, which can be verified through the visibility of commitments in the ledger. The aggregate update is added to the global model to create a newly-generated block, which is added to the ledger: the block is disseminated to all the peers so that the next iteration can take place with an updated global model and stake.
Next, we describe how we bootstrap the Biscotti training process.
4.1 Initializing the training
Biscotti requires peers to initialize the training process from information distributed in the first (genesis) block. We assume that this block is distributed out of band by a trusted authority and the information within it is reliable. This block includes key P2P information like each peer’s public key, and information required for multi-party ML, such as the initial model state and the expected number of iterations.
Biscotti also requires the initial stake for all peers and the stake update function, which is used to update the weight of peers based on their contributions throughout the learning process.
Lastly, Biscotti requires differentially private noise to be pre-committed to the system. This noise is used in the noising protocol of Biscotti (see Figure 2) and must be pre-committed to prevent information leakage and poisoning attacks.
In summary, each peer that joins Biscotti obtains the following information from the genesis block:
The commitment public key for creating commitments to SGD updates (see Appendix C)
The public keys of all other peers in the system, which are used to verify signatures in the verification stage.
The initial stake distribution among the peers.
A stake update function for updating a peer’s stake when a new block is appended.
4.2 Blockchain design
Distributed ledgers are constructed by appending read-only blocks to a chain structure and disseminating blocks using a gossip protocol. Each block maintains a pointer to its previous block as a cryptographic hash of the contents in that block.
Each block in Biscotti contains, in addition to the previous block hash pointer, an aggregate ($\Delta w$) of SGD updates from multiple peers and a snapshot of the global model $w_t$ at iteration $t$. Newly appended blocks store the aggregate updates of multiple peers. Each block also contains a list of commitments $COMM(\Delta w_i)$, one for each peer $i$’s update, for privacy and verifiability. These commitments provide privacy by hiding the individual updates, yet can be homomorphically combined to verify that the update to the global model by the aggregator was computed honestly. The following equality holds if the list of committed updates equals the aggregate sum:

$$COMM\Big(\sum_i \Delta w_i\Big) = \prod_i COMM(\Delta w_i)$$
This process continues until the specified number of iterations is met, upon which the learning process is terminated and each peer extracts the global model from the final block. In summary, each block for an iteration contains the following (Figure 3):
The global model $w_t$ at iteration $t$.
The aggregate $\Delta w$, which moves the model from $w_{t-1}$ to $w_t$.
The commitments to each peer’s model update in $\Delta w$.
A list of verifier signatures of each commitment confirming that the update has passed validation.
An updated stake map for all peers in the system.
The SHA-256 hash of the previous block at iteration $t-1$.
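The block contents listed above can be sketched as a small data structure with a hash pointer. This is an illustrative layout only: the field names, the JSON serialization, and the helpers `append_block` and `chain_is_valid` are assumptions made for the example, and commitments and signatures are represented as opaque strings.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class Block:
    iteration: int
    global_model: list   # model snapshot after applying the aggregate
    aggregate: list      # sum of the verified updates for this iteration
    commitments: list    # one opaque commitment per contributing peer
    signatures: list     # verifier signatures over the commitments
    stake: dict          # updated stake map
    prev_hash: str       # SHA-256 hash of the previous block

    def hash(self):
        # Serialize deterministically before hashing (assumed encoding).
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


def append_block(chain, aggregate, commitments, signatures, stake):
    """Apply the aggregate to the previous model and link the new block."""
    prev = chain[-1]
    model = [w + d for w, d in zip(prev.global_model, aggregate)]
    block = Block(prev.iteration + 1, model, aggregate,
                  commitments, signatures, stake, prev.hash())
    chain.append(block)
    return block


def chain_is_valid(chain):
    """Check hash pointers and that each model equals previous model + aggregate."""
    for prev, cur in zip(chain, chain[1:]):
        if cur.prev_hash != prev.hash():
            return False
        expected = [w + d for w, d in zip(prev.global_model, cur.aggregate)]
        if cur.global_model != expected:
            return False
    return True


genesis = Block(0, [0.0, 0.0], [0.0, 0.0], [], [], {"peer-a": 1}, "")
chain = [genesis]
append_block(chain, [0.5, -0.25], ["comm-1"], ["sig-1"], {"peer-a": 2})
valid_before_tampering = chain_is_valid(chain)
```

Changing any field of an earlier block changes its hash, so the pointer stored in the next block no longer matches.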
4.3 Using stake for role selection
For each ML iteration in the algorithm, public VRFs designate roles for each peer in the system to prevent colluding peers from dominating the system. These VRFs are weighted based on the stake that peers have in the system.
Each role in the system (noiser, verifier, aggregator) is selected using a different VRF. A peer can hold any or all of the roles in any given iteration. To provide additional privacy, the output of the noiser VRF should be unique to each peer. Therefore, it uses a peer’s public key as the random seed in the VRF (more details are given in Section 4.4). The verification and aggregation VRFs are seeded with a global public key and the SHA-256 hash of the previous block to enforce determinism once a block is appended. For any iteration, the output of the VRF is globally observed and verifiable. Since an adversary cannot predict the future state of a block until it is created, they cannot speculate on outputs of the VRF and strategically perform attacks.
To assign each role to multiple peers, we use consistent hashing (Figure 4). The initial SHA-256 hash is repeatedly re-hashed: each new hash result is mapped onto a keyspace ring where portions of the keyspace are proportionally assigned to peers based on their stake. This provides the same stake-based property as Algorand: a peer’s probability of winning this lottery is proportional to their stake.
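The repeated re-hashing onto a stake-weighted keyspace ring can be sketched as below. This is not a VRF: it produces no proof and uses no private key, and it only illustrates the stake-proportional mapping. Integer stakes, the committee being made of distinct peers, and the name `select_committee` are all assumptions of the example.

```python
import hashlib


def select_committee(seed, stake, size):
    """Pick `size` distinct peers by repeatedly re-hashing `seed`.

    Each hash is mapped onto a ring with one slot per unit of stake, so a
    single draw lands on a peer with probability proportional to its stake.
    """
    peers = sorted(stake)  # fixed order so every peer derives the same ring
    eligible = [p for p in peers if stake[p] > 0]
    if size > len(eligible):
        raise ValueError("committee larger than the number of staked peers")
    total = sum(stake[p] for p in peers)

    digest = hashlib.sha256(seed).digest()
    chosen = []
    while len(chosen) < size:
        point = int.from_bytes(digest, "big") % total
        upto = 0
        for p in peers:
            upto += stake[p]
            if point < upto:
                # Skip repeats so the committee has `size` distinct members.
                if p not in chosen:
                    chosen.append(p)
                break
        # Re-hash to obtain the next point on the ring.
        digest = hashlib.sha256(digest).digest()
    return chosen


stake = {"peer-a": 50, "peer-b": 30, "peer-c": 20, "peer-d": 0}
verifiers = select_committee(b"hash-of-previous-block", stake, 2)
```

Because the seed is public and the procedure is deterministic, every peer that runs it on the same block hash and stake map obtains the same committee.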
Each peer runs each of the VRFs to determine whether they are an aggregator, verifier or both during a particular iteration. Each peer that is not an aggregator or verifier also computes their set of noise providers using the noise VRF. This noise VRF set is also accompanied by a proof. When a noising peer receives a noise request, they can use this proof and the peer’s public key to verify its correctness, allowing them to determine that they are indeed a noise provider for that particular peer.
4.4 Noising protocol
Sending a peer’s SGD update directly to another peer may leak information about their private dataset. To prevent this, peers use differential privacy to hide their updates prior to verification. We adopt our implementation of differential privacy from prior work by Abadi et al., which samples noise from a normal distribution and adds it to SGD updates (see Appendix B for formalisms).
Since the noise is only used in the verification stage, this
does not affect the utility of the model once the update is
aggregated into the ledger. Biscotti uses secure aggregation to preserve
privacy in the aggregation phase, so differential privacy is not
required in this stage.
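A minimal sketch of Gaussian masking in the style of Abadi et al. follows: the update is clipped to a norm bound and noise with standard deviation proportional to that bound is added. The parameter values, the helper names, and the choice to sum noise vectors from several noising peers are assumptions of this sketch rather than details confirmed by the text.

```python
import math
import random


def clip(update, max_norm):
    """Scale the update down so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def gaussian_noise(dim, sigma, max_norm, rng):
    """Sample one noise vector with per-coordinate std dev sigma * max_norm."""
    return [rng.gauss(0.0, sigma * max_norm) for _ in range(dim)]


def mask(update, noise_vectors):
    """Add every noise vector to the update, coordinate by coordinate."""
    masked = list(update)
    for noise in noise_vectors:
        masked = [m + n for m, n in zip(masked, noise)]
    return masked


rng = random.Random(0)
update = clip([3.0, 4.0], max_norm=1.0)
# Noise that would, in the protocol, come from several noising peers.
noises = [gaussian_noise(2, sigma=1.0, max_norm=1.0, rng=rng) for _ in range(3)]
masked_update = mask(update, noises)
```

Only `masked_update` would be shown to verifiers; the unmasked `update` is what is later aggregated, which is why the noise does not reduce the utility of the final model.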
Using pre-committed noise to thwart poisoning. Attackers may maliciously use the noising protocol to execute poisoning or information leakage attacks. For example, a peer can send a poisonous update $\Delta w_{poison}$ and add noise $\epsilon_{unpoison}$ that ‘unpoisons’ this update into an honest-looking update $\Delta w$, such that $\Delta w_{poison} + \epsilon_{unpoison} = \Delta w$. By doing this, a verifier observes the honest update $\Delta w$, but the poisonous update $\Delta w_{poison}$ is applied to the model.
To prevent this, Biscotti requires that every peer pre-commits the noise vector for every iteration in the genesis block. Since updates cannot be generated in advance without knowledge of the global model, a peer cannot effectively commit noise that unpoisons an update. Furthermore, Biscotti requires that the noise that is added is taken from a different peer than the one creating the update. This peer is determined using a noising VRF, which further restricts the control that a malicious peer has over the noise used to sneak poisoned updates past verification.
Using a VRF-chosen set of noisers to thwart information leakage. A further issue may occur in which a noising peer and a verifier collude in an information leakage attack against a victim peer. The noising peer can commit a set of zero noise, which does not hide the original gradient value at all. When the victim peer sends its noised gradient to the verifier, the verifier performs an information leakage attack and correlates the victim’s gradient back to its original training data. This attack is viable because the verifier knows that its colluding partner is providing the noise and that this noise is zero.
This attack motivates Biscotti’s use of a private VRF that selects a
group of noising peers based on the victim’s public key. In
doing so, an adversary cannot pre-determine whether their noise will
be used in a specific verification round by a particular victim, and
also cannot pre-determine if the other peers in the noising committee
will be malicious in a particular round. Our results in
Figure 12 show that the probability of an
information leakage is negligible given a sufficient number of
noising peers.
Protocol description. For an ML workload that is expected to run for a specific number of iterations $T$, each peer generates $T$ noise vectors and commits these noise vectors into the ledger, storing a table of size $N$ by $T$, where $N$ is the number of peers (Figure 5). When a peer is ready to contribute an update in an iteration, it runs the noising VRF and contacts each selected noising peer, requesting the noise vector that this peer pre-committed for the iteration in the genesis block. The peer then uses a verifier VRF to determine the set of verifiers. The peer masks their update using this noise and submits to these verifiers the masked update, a commitment to the noise, and a commitment to the unmasked update. It also submits the noise VRF proof, which attests to the verifier that its noise is sourced from peers that are a part of its noise VRF set.
4.5 Verification protocol
The verifier peers evaluate the quality of a peer’s update with respect to their local dataset and reject the update if it fails their RONI validation. Each verifier receives the following from each peer $i$:
The masked SGD update: $\Delta w_i + \sum_k \epsilon_k$
Commitment to the SGD update: $COMM(\Delta w_i)$
The set of noise commitments: $\{COMM(\epsilon_k)\}$
A VRF proof confirming the identity of the noise peers
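When a verifier receives a masked update from another peer, it can confirm that the masked SGD update is consistent with the commitments to the unmasked update and the noise by using the homomorphic property of the commitments. Writing $\Delta w_i$ for the unmasked update and $\epsilon_k$ for the noise vectors, a masked update is legitimate if the following equality holds:

$$COMM\Big(\Delta w_i + \sum_k \epsilon_k\Big) = COMM(\Delta w_i) \cdot \prod_k COMM(\epsilon_k)$$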
The verifier then evaluates the masked SGD update using RONI. In RONI, as long as applying the isolated update to the model does not degrade performance beyond a threshold, the update is accepted and is signed by the verifier using their private key.
Verifiers use their local dataset in performing verification and may therefore disagree on the impact of an update. We use a majority voting scheme to accept or reject updates. Once a peer receives signatures from a majority of the verifiers, the update can be disseminated for aggregation.
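The commitment consistency check that a verifier performs can be illustrated with a toy additively homomorphic commitment, $g^x \bmod p$ applied per coordinate. This toy scheme is a stand-in chosen for brevity: it is not hiding, it is not the polynomial commitment scheme Biscotti uses, and the constants `P` and `G` and all function names are assumptions of the example. Updates are assumed to be quantized to integers.

```python
P = 2**127 - 1  # a Mersenne prime, used here as a toy modulus
G = 3           # toy base


def commit(vec):
    """Commit to an integer vector coordinate-wise as G**v mod P."""
    # Exponents are reduced mod P - 1 so that negative entries are handled.
    return [pow(G, v % (P - 1), P) for v in vec]


def combine(*commitments):
    """Multiply commitments coordinate-wise; this commits to the vector sum."""
    out = [1] * len(commitments[0])
    for c in commitments:
        out = [(a * b) % P for a, b in zip(out, c)]
    return out


def masked_update_is_consistent(masked, comm_update, noise_comms):
    """Check COMM(masked) == COMM(update) * product of COMM(noise_k)."""
    return commit(masked) == combine(comm_update, *noise_comms)


update = [5, -3, 12]                   # quantized unmasked update (kept private)
noise = [[2, 7, -4], [-1, 1, 1]]       # noise vectors from two noising peers
masked = [u + n1 + n2 for u, n1, n2 in zip(update, *noise)]

comm_update = commit(update)
noise_comms = [commit(n) for n in noise]
honest_check = masked_update_is_consistent(masked, comm_update, noise_comms)
tampered_check = masked_update_is_consistent([6, 5, 10], comm_update, noise_comms)
```

The verifier in this sketch never sees `update` itself, only `masked` and the commitments, yet it can still detect a masked update that does not match what was committed.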
4.6 Aggregation protocol
All peers with a sufficient number of signatures in the verification stage aggregate their SGD updates together and apply the aggregate to the global model. The update equation in SGD (see Appendix A) can be re-written as:

$$w_t = w_{t-1} + \sum_i \Delta w_i$$

where $\Delta w_i$ is the verified SGD update of peer $i$ and $w_t$ is the global model at iteration $t$.
However, these updates cannot be directly collected at a peer because they contain sensitive information. Hence, they present a privacy dilemma: no peer should observe an update from any other peer, but the updates must eventually be stored in a block.
The objective of the aggregation protocol is to enable a set of aggregators, predetermined by the VRF, to create the next block with the updated global model by learning the aggregate of the verified updates without learning any individual update. Biscotti uses a technique that preserves privacy of the individual updates if at least half of the aggregators participate honestly in the aggregation phase. This guarantee holds if the VRF selects a majority of honest aggregators, which is likely when the majority of stake is honest.
Biscotti achieves the above guarantees using polynomial commitments (see Appendix C) combined with verifiable secret sharing of individual updates. The update of length $d$ is encoded within a $d$-degree polynomial, which can be broken down into $n$ shares such that $n = 2(d + 1)$. These shares are distributed equally among the $m$ aggregators. Since an update can be reconstructed using $d + 1$ shares, it would require $m/2$ colluding aggregators to compromise the privacy of an individual update. Therefore, given that any majority of aggregators is honest and does not collude, the privacy of an individual peer update is preserved.
Recall that a peer with a verified update already possesses a commitment to its SGD update signed by a majority of the verifiers from the previous step. To compute and distribute its update shares among the aggregators, the peer runs the following secret sharing procedure:
The peer computes the required set of secret shares for each aggregator. To ensure that an adversary does not provide shares from a poisoned update, the peer also computes a set of associated witnesses. These witnesses allow the aggregator to verify that each secret share belongs to the update committed to in the signed commitment. The peer then sends the shares and witnesses to each aggregator, along with the signatures obtained in the verification stage.
After receiving the above vector from a peer, the aggregator runs the following sequence of validations:
It ensures that the update has passed the validation phase by verifying that its commitment carries the signatures of a majority of the verification set.
It verifies that each share is the correct evaluation, at that share’s evaluation point, of the polynomial committed to in the update commitment. (For details, see Appendix C.)
Once every aggregator has received shares for the minimum number of updates required for a block, each aggregator aggregates its individual shares and shares the aggregate with all of the other aggregators. As soon as an aggregator receives the aggregated shares from at least half of the aggregators, it can compute the aggregate sum of the updates and create the next block. The protocol to recover the aggregate is as follows:
All $m$ aggregators broadcast the sum of their accepted shares and witnesses.
Each aggregator verifies the aggregated broadcast shares made by each of the other aggregators by checking the consistency of the aggregated shares and witnesses.
Once an aggregator has obtained the aggregated shares from at least half of the aggregators, including itself, it can interpolate the aggregated shares to determine the aggregated secret, which is the sum of the verified updates.
Once an aggregator has recovered the aggregate, it can create a block with the updated global model. All commitments to the updates and the signature lists that contributed to the aggregate are added to the block. The block is then disseminated in the network. Any peer in the system can verify that all updates are verified by looking at the signature list, and can homomorphically combine the commitments to check that the update to the global model was computed honestly (see Section 4.2). If any of these conditions are violated, the block is rejected.
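The idea that aggregators can sum the shares they hold and then interpolate only the sum can be illustrated with classic Shamir secret sharing over a prime field. This is a simplified sketch: it shares a single integer per peer with one share per aggregator, omits the witnesses and polynomial commitments, and does not reproduce how Biscotti encodes a whole update vector in one polynomial. The field modulus and all names are assumptions of the example.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime used as the toy field modulus


def make_shares(secret, threshold, n_shares, rng):
    """Return points (x, f(x)) of a random polynomial of degree
    threshold - 1 whose constant term f(0) is the secret."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def interpolate_at_zero(points):
    """Lagrange-interpolate the polynomial through `points` and return f(0)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem.
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total


def aggregate_shares(secrets, n_aggregators, threshold, rng):
    """Each peer shares its secret; each aggregator sums the shares it holds."""
    per_peer = [make_shares(s, threshold, n_aggregators, rng) for s in secrets]
    summed = []
    for a in range(n_aggregators):
        x = per_peer[0][a][0]
        y = sum(shares[a][1] for shares in per_peer) % PRIME
        summed.append((x, y))
    return summed


rng = random.Random(0)
peer_updates = [12, 30, 7]  # quantized single-coordinate updates
summed = aggregate_shares(peer_updates, n_aggregators=5, threshold=3, rng=rng)
recovered = interpolate_at_zero(summed[:3])  # any 3 of the 5 aggregators suffice
```

Because polynomial addition is linear, the summed shares lie on a polynomial whose constant term is the sum of the secrets, so no aggregator needs to reconstruct an individual update.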
4.7 Blockchain consensus
Because VRF-computed subsets are globally observable by each peer, and based only on the SHA-256 hash of the latest block in the chain, ledger forks should rarely occur in Biscotti. For an update to be included in the ledger at any iteration, the same noising/verification/aggregation committees are used. Thus, race conditions between aggregators will not cause forks in the ledger to occur as frequently as in, e.g., Bitcoin.
When a peer observes a more recent ledger state through the gossip protocol, it can catch up by verifying that the computation performed is correct by running the VRF for the ledger state and by verifying the signatures of the designated verifiers and aggregators for each new block.
In Biscotti, each verification and aggregation step occurs only for a specified duration. Any updates that are not successfully propagated in this period of time are dropped: Biscotti does not append stale updates to the model once competing blocks have been committed to the ledger. This synchronous SGD model is acceptable for large scale ML workloads, which have been shown to be tolerant of bounded asynchrony. However, these stale updates could be leveraged in future iterations to lead to faster convergence if their learning rate is appropriately decayed. We leave this as an optimization for future work.
We implemented Biscotti in 4,500 lines of Go 1.10 and 1,000 lines of Python 2.7 and released it as an open source project. We use Go to handle all networking and distributed systems aspects of our design. We use PyTorch to generate SGD updates and noise during training. By building on the general-purpose API in PyTorch, Biscotti can support any model that can be optimized using SGD. We use the go-python library to interface between Python and Go.
We use the kyber and CONIKS libraries to implement the cryptographic parts of our system. We use CONIKS to implement our VRF and kyber to implement the commitment scheme and public/private key mechanisms. To bootstrap clients with the noise commitments and public keys, we use an initial genesis block. We use the bn256 curve API in kyber to generate the commitments and public keys that form the basis of the aggregation protocol and verifier signatures. For signing updates, we use the Schnorr signature scheme instead of ECDSA because multiple verifier Schnorr signatures can be aggregated into one signature. Therefore, our block size remains constant as the verifier set grows.
6 Evaluation
We had several goals when designing the evaluation of our Biscotti implementation. We wanted to demonstrate that (1) Biscotti can be used to successfully train a variety of ML models in a P2P deployment, (2) Biscotti’s design scales to accommodate more nodes, (3) Biscotti is robust to poisoning attacks and information leakage, and (4) Biscotti is tolerant to node churn.
For experiments done in a distributed setting, we deployed Biscotti to train an ML model across 20 Azure A4m v2 virtual machines, each with 4 CPU cores and 32 GB of RAM. We deployed a varying number of peers in each of the VMs. We measured the error of the global model against a partitioned validation dataset and ran each experiment for 100 iterations. To evaluate Biscotti’s defense mechanisms, we ran known inference and poisoning attacks on federated learning [27, 22] and measured their effectiveness under various attack scenarios and Biscotti parameters. We also evaluated the performance implications of our design by isolating specific components of our system and varying the VRF committee sizes with different numbers of peers.
By default, we execute Biscotti with the parameter values in Table 2. Table 1 shows the datasets, types of model, number of training examples, and number of parameters in the model that we used for our experiments. In local training we were able to train a model on the Credit Card and MNIST datasets with accuracy of 100% and 96%, respectively.
6.1 Baseline performance
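Table 1: Datasets, model types, number of training examples, and number of model parameters used in our experiments.
|Dataset||Model Type||Examples||Params|
Table 2: Default parameter values used in our experiments.
|Parameter||Default value|
|Privacy budget (ε)||2|
|Number of noisers||2|
|Number of verifiers||3|
|Number of aggregators||3|
|Proportion of secret shares needed||0.125|
|Initial stake||Uniform, 10 each|
|Stake update function||Linear, +/- 5|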
We start by evaluating how Biscotti generalizes to different workloads and model types. We evaluated Biscotti with logistic regression and softmax classifiers. However, due to the general-purpose PyTorch API, we claim that Biscotti can generalize to models of arbitrary size and complexity, as long as they can be optimized with SGD and can be stored in our block structure. We evaluate logistic regression with a credit card fraud dataset from the UC Irvine data repository, which uses an individual’s financial and personal information to predict whether or not they will default on their next credit card payment. We evaluate softmax with the canonical MNIST  dataset, a task that involves predicting a digit based on its image.
We first execute Biscotti in a baseline deployment and compare it to the original federated learning baseline. For this we partitioned the MNIST dataset into 100 equal partitions, each of which was shared with an honest peer on an Azure cluster of 20 VMs, with each VM hosting 5 peers. These Biscotti/Federated Learning peers collaborated on training an MNIST softmax classifier, and after 100 iterations both models approached the global optimum. The convergence rate over time for both systems is plotted in Figure 6 and the same convergence over the number of iterations is shown in Figure 7. In this deployment, Biscotti takes about 8.3 times longer than Federated Learning (50 minutes vs. 6 minutes), yet achieves similar model performance after 100 iterations.
6.2 Performance cost break-down
To break down the overhead in Biscotti, we deployed Biscotti over a varying number of peers training on MNIST. We captured the total amount of time spent in each of the major phases of our algorithm in Figure 2: collecting the noise from each of the noising clients, executing RONI and collecting the signatures, and securely aggregating the SGD update. Figure 8 shows the breakdown of the total cost per iteration for each stage under a deployment of 40, 60, 80 and 100 nodes. Biscotti spends most of its time in the verification stage. We found this to be a limitation of the go-python library and of performing RONI on models in PyTorch. Optimizing these components remains future work.
6.3 Scaling up VRF sets in Biscotti
We evaluate Biscotti’s performance as we change the size of the VRF-computed noiser/verifier/aggregator sets. For this we deploy Biscotti on Azure with the MNIST dataset with a fixed size of 50 peers, and only vary the number of noisers needed for each SGD update, number of verifiers used for each verification committee, and the number of aggregators used in secure aggregation. Each time one of these sizes was changed, the change was performed in isolation; the rest of the system used the default values in Table 2. Figure 9 plots the average time taken per iteration in each experiment.
Increasing the number of noisers only incurs additional round trip latencies and adds little overhead to the system. As shown in Figure 8, verification adds the largest overhead to our system, with a large performance penalty when 10 verifiers are used. Lastly, the performance of the system improves when the number of aggregators increases. Since aggregators do not contribute updates to the ledger at a given iteration, this reduces the number of contributors, which lowers the aggregate amount of communication and verification in the system. Additionally, increasing the number of aggregators makes it easier for a sufficient number of secret shares to be collected in any given iteration, reducing the amount of time taken to create the block.
6.4 Training with node churn
A key feature of Biscotti’s P2P design is that it is resilient to node churn (node joins and failures). For example, the failure of any Biscotti node does not block or prevent the system from converging. We evaluate Biscotti’s resilience to peer churn by performing an MNIST deployment with 50 peers, failing a peer at random at a specified rate. For each specified time interval, a peer is chosen randomly and is killed. In the next time interval, a new peer joins the network and is bootstrapped, maintaining a constant total number of 50 peers. Figure 10 shows the rate of convergence for failure rates of 4, 2, and 1 peers failing and joining per minute. When a verifier or aggregator fails, Biscotti defaults to the next iteration, which appears as periods of inactivity (flat regions) in Figure 10.
Even with churn, Biscotti is able to make progress towards the global objective, albeit with some delay. Increasing the failure rate does not harm Biscotti’s ability to converge; Biscotti is resilient to churn rates up to 4 nodes per minute (1 node joining and 1 node failing every 15 seconds).
6.5 Biscotti defense mechanisms
We evaluate Biscotti’s ability to resolve the challenges presented in Section 2: security (from poisoning), privacy (using secure aggregation), and privacy (using noise).
Defending against a poisoning attack. We deploy Biscotti and federated learning and subject both systems to a poisoning attack while training on the Credit Card dataset. We configure 48% of the peers in the system with a malicious dataset: all labels in their local dataset are flipped. A successful attack results in a final global model that predicts credit card defaults to be normal, and normal credit card bills to be defaults, resulting in a high error. Figure 11 shows the resulting test error while training with federated learning and with Biscotti when 48% of the total stake is malicious.
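A minimal sketch of the label-flipping setup follows, assuming a binary-labelled dataset represented as `(features, label)` pairs. The helper names and the toy classifier in the usage note are illustrative assumptions, not part of the evaluation harness.

```python
def flip_labels(dataset):
    """Return a poisoned copy of a binary-labelled dataset with every label flipped."""
    return [(features, 1 - label) for features, label in dataset]


def error_rate(predict, dataset):
    """Fraction of examples on which `predict` disagrees with the label."""
    wrong = sum(1 for features, label in dataset if predict(features) != label)
    return wrong / len(dataset)
```

For example, a classifier `predict = lambda x: int(x[0] > 0.5)` that is perfect on a clean dataset has an error rate of 1.0 when scored against `flip_labels` of that dataset, which is why a model pulled towards the poisoning objective shows a high test error on honest data.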
In federated learning, no defense mechanisms exist, and thus a poisoning attack creates a tug of war between honest peers and malicious peers. The performance of the model (top line in Figure 11) jostles between the global objective and the poisoning objective, without the model making progress towards either objective.
The combination of RONI verification and the stake update function ensures that honest peers gain influence over time. Each time a malicious client’s update is rejected, their stake in the system is penalized. In the above experiments, after 100 iterations (at the end of the training), the stake of honest clients went up from 52% to
Defending against an information leakage attack. We also evaluate a proposed attack on the noising protocol, which aims to de-anonymize peer gradients. This attack is performed when a verifier colludes with several malicious peers. When bootstrapping the system, the malicious peers pre-commit noise that sums to 0. As a result, when a noising VRF selects these noise elements for verification, the masked gradient is not actually masked with any noise, allowing a malicious peer to perform an information leakage attack on the victim.
We evaluated the effectiveness of this attack by deploying Biscotti with 100 peers and varying the proportion of malicious users (and malicious stake) in the system. Each malicious peer is bootstrapped in the system with zero noise, and performs an information leakage attack whenever it is chosen as a verifier and the total added noise is zero. We measure the probability of a privacy violation occurring in this deployment; Figure 12 shows this probability as the proportion of malicious peers increases.
When the number of noisers for an iteration is 3, an adversary needs at least 15% of stake to successfully unmask an SGD update. This trend continues when 5 noisers are used: over 30% of the stake must be malicious. When the number of noisers is 10 (which has minimal additional overhead according to Figure 9), privacy violations do not occur even with 50% of malicious stake. By using a stake-based VRF for selecting noising clients, Biscotti prevents adversaries from performing information leakage attacks on other clients unless their proportion of stake in the system is overwhelmingly large.
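The trend above can be explored with a back-of-the-envelope model. The sketch below is an assumption-laden simplification, not the measurement behind Figure 12: it treats each committee seat as an independent draw that is malicious with probability equal to the malicious stake fraction, and it counts a violation when every selected noiser is malicious and at least one verifier is malicious.

```python
def violation_probability(malicious_stake, num_noisers, num_verifiers=3):
    """Rough probability that one update is unmasked in a single iteration."""
    if not 0.0 <= malicious_stake <= 1.0:
        raise ValueError("malicious_stake must be a fraction in [0, 1]")
    # All noisers must collude so that the added noise sums to zero.
    all_noisers_malicious = malicious_stake ** num_noisers
    # At least one verifier must collude to observe the unmasked update.
    some_verifier_malicious = 1.0 - (1.0 - malicious_stake) ** num_verifiers
    return all_noisers_malicious * some_verifier_malicious
```

Under this model the probability falls geometrically in the number of noisers, which is in line with the observation that larger noising committees require a much larger malicious stake before violations appear.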
7 Limitations
RONI limitations. We used RONI for our poisoning detection mechanism in Biscotti. However, for RONI to work effectively, the data distribution among the clients should ideally be IID. This ensures that the verifiers in each round are qualified to evaluate the quality of updates by having data for classes that are included in the update. Applications like training a collective facial recognition system where the data has a bimodal distribution (e.g., every client possesses data belonging to a different class) would not make significant progress in Biscotti because updates would be frequently rejected by RONI.
Fortunately, Biscotti is compatible with other poisoning detection approaches. In particular, Biscotti can integrate any mechanism that can deal with masked updates and does not require history of updates (since previous updates are aggregated in the blockchain).
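For reference, the RONI check and the linear stake update from Table 2 can be sketched as follows. The function names, the acceptance threshold, the plain list-of-floats model representation, and the floor of zero on stake are assumptions for illustration; in Biscotti the verifier evaluates a masked update rather than the raw one.

```python
def roni_accepts(validation_error, model, update, threshold=0.0):
    """Reject an update if it raises the verifier's local validation error."""
    candidate = [w + d for w, d in zip(model, update)]
    influence = validation_error(candidate) - validation_error(model)
    return influence <= threshold


def update_stake(stake, accepted, step=5):
    """Linear stake update: +step when accepted, -step (not below 0) when rejected."""
    return stake + step if accepted else max(0, stake - step)
```

A verifier whose local data does not cover the classes touched by an update will see its `validation_error` barely change or even worsen, which is the mechanism behind the false rejections discussed above.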
Stake limitations and data diversity. The stake that a client possesses plays a significant role in determining the chance that a node is selected as a noiser/verifier/aggregator. For verifiers this is motivated by the fact that the more contributions you make to the training, the better qualified you are in evaluating the quality of future updates. However, this naive verifier selection means that some unpoisoned updates might be rejected because a verifier may be insufficiently qualified to evaluate the quality of those particular updates.
In future work, we plan to implement a stake mechanism that is based on the diversity of data that a peer contributes to the system. This data diversity measure would allow a peer to be matched with a verifier with similar data/classes so that the effectiveness of the update can be established more accurately.
Biscotti’s use of stake does not resolve fundamental issues with stake mechanisms, which we consider to be outside the scope of our design. For example, Biscotti’s stake mechanism suffers from the classic nothing-at-stake and long-range attack problems.
8 Related work
We review the broader literature on secure, private, and distributed ML. Many solutions defend against poisoning attacks and information leakage attacks separately, but due to the tradeoff between accountability and privacy, no solution is able to defend against both in a P2P setting.
Securing ML. As an alternative defense against poisoning attacks, Baracaldo et al. employ data provenance. The history of training data in the system is tracked, and when an anomalous data source is detected, the system removes information from that source. This approach requires infrastructure to track data provenance, but if this exists, Biscotti could potentially encode this information into the block structure and retroactively retrain models upon detection of an anomalous source.
AUROR and ANTIDOTE are other systems that defend multi-party ML systems from poisoning attacks. However, these defenses rely on anomaly detection methods, which require centralized access to information in the system and are infeasible in a P2P setting.
Biscotti is designed for the P2P setting: models, training data and computational resources are not housed in a central location and there does not exist a single point of control in the training process.
By using RONI, Biscotti increases its false positive rate in detecting malicious updates but is able to decentralize the information required for detecting poisoning attacks, while providing data and SGD update privacy.
Privacy-preserving ML. Cheng et al. recently proposed using TEEs and privacy-preserving smart contracts to provide security and privacy to multi-party ML tasks. However, some TEEs have been found to be vulnerable; Biscotti does not use TEEs.
Other solutions employ two party computation  and encrypt both the model parameters and the training data when performing multi-party ML. However, this requires an assumption about the two party setting, and systems in this space do not scale well to P2P settings.
Differential privacy is another technique that is often applied to multi-party ML systems to provide privacy [18, 4, 41], but this introduces privacy-utility tradeoffs. Since Biscotti does not apply the noised updates to the model, we do not need to consider the privacy-utility tradeoff when performing distributed SGD. However, the value of the privacy budget ε affects the error rate in verification.
Biscotti does not rely on the assumptions made by prior work to provide privacy and is novel in simultaneously handling poisoning attacks and in preserving privacy.
9 Conclusion
Large scale multi-party ML workloads and distributed ledgers for scalable consensus have emerged as two rapid and powerful trends; Biscotti’s design lies at their confluence. To the best of our knowledge, this is the first system to provide privacy-preserving peer-to-peer ML through a distributed ledger while simultaneously considering poisoning attacks. Unlike prior work on distributed ledgers that facilitate privacy-preserving ML, Biscotti does not rely on trusted execution environments or specialized hardware. In our evaluation we demonstrated that Biscotti can coordinate a collaborative learning process across 100 peers and produces a final model that is similar in utility to state of the art federated learning alternatives. We also illustrated its ability to withstand poisoning and information leakage attacks, as well as frequent failures and joining of nodes (one node joining and one node leaving every 15 seconds).
We would like to thank Heming Zhang for his help with bootstrapping the deployment of Biscotti on PyTorch. We would also like to thank Gleb Naumenko, Nico Ritschel, Jodi Spacek, and Michael Hou for their work on the initial Biscotti prototype. This research has been sponsored by the Huawei Innovation Research Program (HIRP), Project No: HO2018085305. We also acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), 2014-04870.
-  A CONIKS Implementation in Golang. https://github.com/coniks-sys/coniks-go, 2018.
-  DEDIS Advanced Crypto Library for Go. https://github.com/dedis/kyber, 2018.
-  go-python. https://github.com/sbinet/go-python, 2018.
-  Abadi, M., Chu, A., Goodfellow, I., McMahan, B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In 23rd ACM Conference on Computer and Communications Security (2016), CCS.
-  Azaria, A., Ekblaw, A., Vieira, T., and Lippman, A. Medrec: Using blockchain for medical data access and permission management. OBD ’16.
-  Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., and Shmatikov, V. How To Backdoor Federated Learning. ArXiv e-prints (2018).
-  Baracaldo, N., Chen, B., Ludwig, H., and Safavi, J. A. Mitigating poisoning attacks on machine learning models: A data provenance based approach. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (2017), AISec.
-  Barreno, M., Nelson, B., Joseph, A. D., and Tygar, J. D. The Security of Machine Learning. Machine Learning 81, 2 (2010).
-  Biggio, B., Nelson, B., and Laskov, P. Poisoning Attacks Against Support Vector Machines. In Proceedings of the 29th International Conference on Machine Learning (2012), ICML.
-  Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., Ramage, D., Segal, A., and Seth, K. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2017), CCS.
-  Bottou, L. Large-Scale Machine Learning with Stochastic Gradient Descent. COMPSTAT ’10. 2010.
-  Castro, M., and Liskov, B. Practical byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (1999), OSDI ’99.
-  Cheng, R., Zhang, F., Kos, J., He, W., Hynes, N., Johnson, N., Juels, A., Miller, A., and Song, D. Ekiden: A platform for confidentiality-preserving, trustworthy, and performant smart contract execution. ArXiv e-prints (2018).
-  Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., Mao, M. Z., Ranzato, M., Senior, A., Tucker, P., Yang, K., and Ng, A. Y. Large scale distributed deep networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (2012), NIPS.
-  Dheeru, D., and Karra Taniskidou, E. UCI machine learning repository, 2017.
-  Douceur, J. J. The sybil attack. IPTPS ’02.
-  Dwork, C., and Roth, A. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9, 3-4 (2014).
-  Geyer, R. C., Klein, T., and Nabi, M. Differentially private federated learning: A client level perspective. NIPS Workshop: Machine Learning on the Phone and other Consumer Devices (2017).
-  Gilad, Y., Hemo, R., Micali, S., Vlachos, G., and Zeldovich, N. Algorand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles (2017), SOSP.
-  Gu, T., Dolan-Gavitt, B., and Garg, S. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. ArXiv e-prints (2017).
-  Hitaj, B., Ateniese, G., and Pérez-Cruz, F. Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2017), CCS.
-  Huang, L., Joseph, A. D., Nelson, B., Rubinstein, B. I., and Tygar, J. D. Adversarial Machine Learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence (2011), AISec.
-  Kate, A., Zaverucha, G. M., and Goldberg, I. Constant-size commitments to polynomials and their applications.
-  Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11 (1998).
-  Maxwell, G., Poelstra, A., Seurin, Y., and Wuille, P. Simple schnorr multi-signatures with applications to bitcoin. Cryptology ePrint Archive, Report 2018/068, 2018.
-  McMahan, H. B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (2017), AISTATS.
-  Melis, L., Song, C., De Cristofaro, E., and Shmatikov, V. Inference Attacks Against Collaborative Learning. ArXiv e-prints (2018).
-  Micali, S., Vadhan, S., and Rabin, M. Verifiable random functions. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (1999), FOCS.
-  Miller, A., Juels, A., Shi, E., Parno, B., and Katz, J. Permacoin: Repurposing bitcoin work for data preservation. In Proceedings of the 2014 IEEE Symposium on Security and Privacy (2014), S&P.
-  Mohassel, P., and Zhang, Y. Secureml: A system for scalable privacy-preserving machine learning, 2017.
-  Nakamoto, S. Bitcoin: A peer-to-peer electronic cash system.
-  Narula, N., Vasquez, W., and Virza, M. zkledger: Privacy-preserving auditing for distributed ledgers. In 15th USENIX Symposium on Networked Systems Design and Implementation (2018), NSDI.
-  Nelson, B., Barreno, M., Chi, F. J., Joseph, A. D., Rubinstein, B. I. P., Saini, U., Sutton, C., Tygar, J. D., and Xia, K. Exploiting Machine Learning to Subvert Your Spam Filter. In Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats (2008), LEET.
-  Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. Automatic differentiation in pytorch. In NIPS Autodiff Workshop (2017).
-  Pedersen, T. P. Non-interactive and information-theoretic secure verifiable secret sharing. In Proceedings of the 11th Annual International Cryptology Conference (1992), CRYPTO.
-  Recht, B., Re, C., Wright, S., and Niu, F. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24, NIPS. 2011.
-  Rubinstein, B. I., Nelson, B., Huang, L., Joseph, A. D., Lau, S.-h., Rao, S., Taft, N., and Tygar, J. D. ANTIDOTE: Understanding and Defending Against Poisoning of Anomaly Detectors. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement (2009), IMC.
-  Schnorr, C. P. Efficient identification and signatures for smart cards. In Advances in Cryptology — CRYPTO’ 89 Proceedings (1990).
-  Shamir, A. How to share a secret. Communications of the ACM (1979).
-  Shen, S., Tople, S., and Saxena, P. Auror: Defending against poisoning attacks in collaborative deep learning systems. In Proceedings of the 32nd Annual Conference on Computer Security Applications (2016), ACSAC.
-  Shokri, R., and Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security (2015), CCS.
-  Van Bulck, J., Minkin, M., Weisse, O., Genkin, D., Kasikci, B., Piessens, F., Silberstein, M., Wenisch, T. F., Yarom, Y., and Strackx, R. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In Proceedings of the 27th USENIX Security Symposium (2018), USENIX SEC.
Appendix A Distributed SGD
Given a set of training data, a model structure, and a proposed learning task, ML algorithms train an optimal set of parameters, resulting in a model that optimally performs this task. In Biscotti, we assume stochastic gradient descent (SGD)  as the optimization algorithm.
In federated learning, a shared model is updated through SGD. Each client uses their local training data and their latest snapshot of the shared model state to compute the optimal update on the model parameters. The model is then updated and clients update their local snapshot of the shared model before performing a local iteration again. The model parameters are updated at each iteration $t$ as follows:
$$w_{t+1} = w_t - \eta_t \left( \lambda w_t + \frac{1}{b} \sum_{(x_i, y_i) \in B_t} \nabla l(w_t, x_i, y_i) \right) \qquad \text{(1a)}$$
where $\eta_t$ represents a degrading learning rate, $\lambda$ is a regularization parameter that prevents over-fitting, $B_t$ represents a gradient batch of local training data examples $(x_i, y_i)$ of size $b$, and $\nabla l$ represents the gradient of the loss function.
SGD is a general learning algorithm that can be used to train a variety of models, including deep learning models. A typical heuristic involves running SGD for a fixed number of iterations or halting when the magnitude of the gradient falls below a threshold. When this occurs, model training is considered complete and the shared model state $w_t$ is returned as the optimal model $w^*$.
In a multi-party ML setting federated learning assumes that clients possess training data that is not identically and independently distributed (non-IID) across clients. In other words, each client possesses a subset of the global dataset that contains specific properties distinct from the global distribution.
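When performing SGD across clients with partitioned data sources, we redefine the SGD update $\Delta_{i,t} w_g$ at iteration $t$ of each client $i$ to be:
$$\Delta_{i,t} w_g = -\eta_t \left( \lambda w_g + \frac{1}{b} \sum_{(x, y) \in B_{i,t}} \nabla l(w_g, x, y) \right) \qquad \text{(1b)}$$
where, as in the single-model update, $\eta_t$ is the learning rate, $\lambda$ the regularization parameter, and $b$ the batch size; the distinction is that the gradient is computed on a global model $w_g$, and the gradient steps are taken using a local batch of data $B_{i,t}$ from client $i$. When all SGD updates are collected, they are averaged and applied to the model, resulting in a new global model. The process then proceeds iteratively until convergence.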
To increase privacy guarantees in federated learning, secure aggregation protocols have been added to the central server such that no individual client’s SGD update is directly observable by the server or other clients. However, this relies on a centralized service to perform such an aggregation and does not provide security against adversarial attacks on ML.
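The distributed SGD round described in this appendix can be sketched in plain Python. The sketch assumes a linear model with squared loss, and it adopts the convention that a client's update already carries the negative sign, so that the averaged update is added to the global model. The function names and these modeling choices are illustrative assumptions.

```python
def local_update(w_global, batch, lr, lam):
    """One client's update: -lr * (lam * w + mean gradient of 0.5 * (w.x - y)^2)."""
    dim = len(w_global)
    grad = [0.0] * dim
    for x, y in batch:
        residual = sum(wj * xj for wj, xj in zip(w_global, x)) - y
        for j in range(dim):
            grad[j] += residual * x[j] / len(batch)
    return [-lr * (lam * w_global[j] + grad[j]) for j in range(dim)]


def apply_updates(w_global, updates):
    """Average the clients' updates and add the result to the global model."""
    n = len(updates)
    return [w + sum(u[j] for u in updates) / n for j, w in enumerate(w_global)]


def train(w_global, client_batches, lr, lam, rounds):
    """Run `rounds` synchronous iterations over all clients."""
    for _ in range(rounds):
        updates = [local_update(w_global, b, lr, lam) for b in client_batches]
        w_global = apply_updates(w_global, updates)
    return w_global
```

With two clients holding `[([1.0], 2.0)]` and `[([2.0], 4.0)]`, a learning rate of 0.1 and no regularization, the global weight starts at 0 and converges towards 2, the slope shared by both local datasets.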
Appendix B Differentially private SGD
We use the implementation of differential privacy as explained by Abadi et al., which samples normally distributed noise at each iteration, shown in Algorithm 1. Each client commits noise to the genesis block for all expected iterations. Abadi et al.’s solution also requires that the norm of the gradients be clipped to a maximum of 1; we assume this to also be the case when performing SGD in Biscotti.
This precommitted noise is designed such that a neutral third party can aggregate a client update from Equation (1b) and precommitted noise from Algorithm 1 without any additional information. The noise is generated without any prior knowledge of the SGD update it will be applied to, while retaining the computation and guarantees provided by prior work. The noisy SGD update then follows from aggregation: the client’s clipped update is summed with the selected precommitted noise.
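As a rough sketch of this pre-commitment scheme, the code below samples Gaussian noise for every expected iteration up front, clips an update to unit norm, and masks it with noise from the selected noisers. The noise scale `sigma` is left as a free parameter (its relation to the privacy budget is not modeled here), and all function names are assumptions.

```python
import math
import random


def precommit_noise(num_iterations, dim, sigma, seed=None):
    """Sample one Gaussian noise vector per expected iteration, ahead of training."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(dim)]
            for _ in range(num_iterations)]


def clip_to_unit_norm(update):
    """Scale the update down so that its L2 norm is at most 1."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, 1.0 / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def mask_update(update, noise_vectors):
    """Clipped update plus the sum of the selected noisers' noise vectors."""
    clipped = clip_to_unit_norm(update)
    return [v + sum(noise[j] for noise in noise_vectors)
            for j, v in enumerate(clipped)]
```

Because the noise is fixed before any update exists, it cannot be tailored to a particular update. The same code also shows the collusion risk evaluated in Section 6.5: if the selected noise vectors sum to zero, `mask_update` returns the clipped update unchanged.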
Appendix C Polynomial Commitments and Verifiable Secret Sharing
Polynomial Commitments  is a scheme that allows commitments to a secret polynomial for verifiable secret sharing . This allows the committer to distribute secret shares for a secret polynomial among a set of nodes along with witnesses that prove in zero-knowledge that each secret share belongs to the committed polynomial. The polynomial commitment is constructed as follows:
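Given two groups $G_1$ and $G_2$ with generators $g_1$ and $g_2$ of prime order $p$ such that there exists an asymmetric bilinear pairing $e: G_1 \times G_2 \rightarrow G_T$ for which the $t$-SDH assumption holds, a commitment public key (PK) is generated such that $PK = (g_1, g_1^{\alpha}, g_1^{\alpha^2}, \ldots, g_1^{\alpha^t}, g_2, g_2^{\alpha})$, where $\alpha$ is the secret key. The committer can create a commitment to a polynomial $\phi(x) = \sum_{j=0}^{t} \phi_j x^j$ of degree $t$ using the commitment PK such that:
$$COMM(PK, \phi(x)) = \prod_{j=0}^{t} \left(g_1^{\alpha^j}\right)^{\phi_j} = g_1^{\phi(\alpha)}$$
Given a polynomial $\phi(x)$ and a commitment $COMM(\phi(x))$, it is trivial to verify whether the commitment was generated using the given polynomial or not. Moreover, we can multiply two commitments to obtain a commitment to the sum of the polynomials in the commitments by leveraging their homomorphic property:
$$COMM(\phi_1(x) + \phi_2(x)) = COMM(\phi_1(x)) \cdot COMM(\phi_2(x))$$
Once the committer has generated $COMM(\phi(x))$, it can carry out a $(t+1, n)$ secret sharing scheme to share the polynomial among a set of $n$ participants in such a way that in the recovery phase a subset of at least $t+1$ participants can compute the secret polynomial. All secret shares $(i, \phi(i))$ shared with the participants are evaluations of the polynomial at a unique point $i$ and are accompanied by a commitment $COMM(\psi_i(x)) = g_1^{\psi_i(\alpha)}$ to a witness polynomial $\psi_i(x)$ such that $\psi_i(x) = \frac{\phi(x) - \phi(i)}{x - i}$. By leveraging the divisibility property of the two polynomials and the bilinear pairing function $e$, it is trivial to verify that the secret share comes from the committed polynomial $\phi(x)$. This is carried out by evaluating whether the following equality holds:
$$e\left(COMM(\phi(x)),\, g_2\right) = e\left(COMM(\psi_i(x)),\, g_2^{\alpha} / g_2^{i}\right) \cdot e(g_1, g_2)^{\phi(i)}$$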
If the above holds, then the share is accepted. Otherwise, the share is rejected.