Blockchain-based Federated Learning: A Comprehensive Survey

10/05/2021
by   Zhilin Wang, et al.

With technological advances in machine learning, effective ways are now available to process the huge amounts of data generated in real life. However, issues of privacy and scalability constrain the development of machine learning. Federated learning (FL) can prevent privacy leakage by assigning training tasks to multiple clients, thus separating the central server from the local devices. However, FL still suffers from shortcomings such as a single point of failure and malicious data. The emergence of blockchain provides a secure and efficient solution for the deployment of FL. In this paper, we conduct a comprehensive survey of the literature on blockchained FL (BCFL). First, we investigate how blockchain can be applied to federated learning from the perspective of system composition. Then, we analyze the concrete functions of BCFL from the perspective of mechanism design and illustrate what problems blockchain addresses specifically for FL. We also survey the real-world applications of BCFL. Finally, we discuss some challenges and future research directions.

I Introduction

Nowadays, machine learning (ML) is applied in nearly every field, profoundly changing human life. Data generated daily can be gathered from massive numbers of end users to train ML models, which provide better services and improve our quality of life. However, the current ML framework usually requires end devices to transfer the collected data to a central server for model training, which causes two challenges. First, transferring data may consume a large amount of communication resources. Second, submitting raw data increases the risk of privacy leakage, making data owners reluctant to upload data to the central server out of security concerns.

To address the above concerns, Google proposed a novel ML framework named federated learning (FL), which can effectively protect the privacy of users while allowing multiple end devices to collaboratively train an ML model [mcmahan2017communication]. Unlike the conventional ML framework, FL does not require data owners, i.e., clients, to transfer raw data to the central server for model training, but only to upload the parameters of the model trained on their local data. This prevents privacy leakage caused by data transfer and reduces the transmission cost. In the past few years, FL has been well studied and has developed rapidly [Bonawitz2019, Hard2018]. However, the traditional FL framework still faces some problems which undermine the reliability of the whole system [Sattler2020, mcmahan2017communication, Bagdasaryan2018] and can be summarized as follows.

  • Single point of failure. In an FL paradigm, a central server, usually called the aggregator, integrates the local training results so as to update the global model. However, the aggregator is not always reliable. Once the centralized aggregator is compromised, the whole FL system will fail. Potential problems of the aggregator include intentionally dishonest aggregation, accidental network connection failure, unexpected external attacks, etc.

  • Malicious clients and false data. Given the large number of participants in FL, it is unrealistic to assume that all clients are honest and will train the local models according to the predefined FL protocols. Therefore, there may exist dishonest clients submitting false data about their local training results. The performance of the global model can be heavily affected by contamination with invalid data, and the whole FL system might be attacked by malicious clients via other means, such as training the local models using partial datasets.

  • The lack of incentives. In traditional FL, clients are assumed to contribute their computing power without receiving any payment, which makes it difficult to encourage clients to follow the protocol honestly and provide reliable data. More importantly, since FL usually requires multiple devices to work collaboratively, especially for data-intensive training tasks that need a large number of participants, the traditional FL framework may fail to attract enough clients to engage in FL training due to the lack of incentives.

Clearly, the above deficiencies prevent FL from working efficiently and reliably. Therefore, improvements to traditional FL become essential. Blockchain, as an emerging technology, offers several attractive properties, such as decentralization, anonymity, and traceability, and has been applied in many fields [bodkhe2020blockchain, frizzo2020blockchain, pournader2020blockchain]. Recently, blockchain has also been utilized to address the challenges faced by conventional FL. First, decentralization can be realized by deploying blockchain in FL, which means that the central aggregator can be replaced by the peer-to-peer blockchain system and the job of aggregating the global model can be handled by blockchain nodes, thus avoiding the unreliability of the whole FL system caused by the failure of the centralized server [Ramanan2019]. Moreover, blockchain can provide verification mechanisms for FL in the form of transaction verification, by which unqualified or even malicious local model updates can be removed before the global model is aggregated [kim2019blockchain]. Further, blockchain can effectively distribute rewards to FL clients to encourage their participation and honest behavior [Liu2020]. Based on our investigation, we argue that the blockchained FL framework has at least the following merits:

  • A single point of failure can be avoided by replacing the central aggregator with blockchain. In a blockchained FL system, the model aggregation is executed by more than one client.

  • Unreliable data can be filtered out by the verification mechanism. Before the local model updates are aggregated, unreliable data will be detected, and only valid data can be added to the global model.

  • More participants and computational resources can be attracted through the incentive mechanisms. Economic incentives (e.g., cryptocurrency) can not only encourage more devices to participate in the model training but also encourage clients to follow the rules.

  • Learning data can be stored and shared on the distributed ledger. Once the data are recorded on the distributed ledger, they can hardly be tampered with. Meanwhile, authorized clients can access the distributed ledger to retrieve the public data, improving the training efficiency.

From existing research, despite the excellent performance of BCFL in terms of decentralization and providing incentives, new problems such as resource allocation, communication delays, and external attacks arising from the combination of the two still need to be addressed in future research.

To the best of our knowledge, our paper is the first comprehensive investigation of BCFL. Following are the main contributions of our work:

  • We investigate the current research of blockchained FL, and analyse the motivations of applying blockchain to FL.

  • We detail the foundations of BCFL, including the BCFL architecture, blockchain types, and training devices. We first propose that BCFL architectures can be classified into three types based on coupling: fully coupled BCFL, flexibly coupled BCFL, and loosely coupled BCFL.

  • We present the functions of BCFL from the perspective of verification mechanism, global model aggregation, distributed ledger and incentive mechanism. The analysis of these functions explains the changes that blockchain can bring to FL.

  • We analyze the current challenges of BCFL and discuss the future research directions.

The rest of this article is organized as follows. In Section II, we introduce the basics of FL and blockchain; we present the foundations of BCFL in Section III. In Section IV, we detail the four functions of BCFL; and in Section V, we investigate the applications of BCFL in different domains; discussions of the current challenges and future research directions of BCFL are presented in Section VI; and we conclude the paper in Section VII.

II Background Knowledge

In this section, we go through the basic principles of FL and blockchain, respectively.

II-A Brief Introduction to FL

In real life, mobile devices with smart sensors are used extensively and generate a massive amount of data. Training on such data to improve device performance has driven much of the development of artificial intelligence. However, the data across devices is usually unbalanced and not independent and identically distributed (non-IID), and communication among devices is expensive since the devices are massively distributed [Zhao2018, Yang2020, Kairouz2019]. In addition, storing all the data in a centralized manner is not a secure choice. To address these issues of machine learning on mobile devices, Google introduced a novel distributed machine learning framework termed FL [konevcny2016federated, mcmahan2017communication, Konecny2016].

FL is a distributed machine learning technique in which local devices train models on their own data and then upload the local model updates, i.e., the weights and gradients of the local models, to a central server, which runs a predefined aggregation algorithm to obtain a global model. The topology of FL is shown in Figure 1. Usually, the local devices are referred to as clients, and the central server is termed the aggregator. The basic merit of FL is that it requires no direct access to the raw data on local devices [mcmahan2017communication]. FL thus preserves the privacy of the raw data effectively, reduces the cost of data transmission, and extends the availability of mobile devices [Li2019]. The workflow of traditional FL is described as follows [Konecny2016, Hard2018, Nishio2019]:

  • Client Selection: Clients are selected based on defined protocols, and then they download the latest global model before the training task starts.

  • Local Model Training: Clients independently train on their local data and update the local models based on predefined algorithms (e.g., Stochastic Gradient Descent (SGD)).

  • Upload Local Model Updates: Clients transfer the local model updates to the aggregator.

  • Global Model Aggregation: The global model is calculated in the aggregator by executing an aggregation algorithm such as FederatedAveraging (FedAvg).
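
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the reference FedAvg implementation: the linear model, the squared loss, the learning rate, and the helper names `local_sgd_step`, `fed_avg`, and `run_round` are all assumptions made for the example, and client selection is reduced to "every client participates".

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]  # (feature vector x, label y)


def local_sgd_step(weights: List[float], data: Sequence[Sample], lr: float = 0.1) -> List[float]:
    """One pass of SGD on a linear model y = w . x with squared loss (toy local training)."""
    w = list(weights)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average the client models, weighting each by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def run_round(global_w: List[float], client_datasets: List[Sequence[Sample]], lr: float = 0.1) -> List[float]:
    """One FL round: clients start from the global model, train locally, and upload."""
    # Steps 1-2: each (selected) client downloads the global model and trains locally.
    local_models = [local_sgd_step(global_w, data, lr) for data in client_datasets]
    # Steps 3-4: the aggregator receives the local models and averages them.
    return fed_avg(local_models, [len(data) for data in client_datasets])
```

Only model parameters cross the client boundary in `run_round`; the raw samples in `client_datasets` are never passed to `fed_avg`, which is the property the workflow relies on.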

Fig. 1: Topology of traditional FL

In [Yang2019], FL is classified into three categories, i.e., horizontal FL, vertical FL, and federated transfer learning, based on the characteristics of the raw data distribution. The details of the categories are as follows.

  • Horizontal FL: The datasets share the same feature space but contain different samples.

  • Vertical FL: The datasets share the same sample space but differ in feature space.

  • Federated Transfer Learning: When the datasets overlap little in both samples and features, transfer learning is used to overcome the lack of data or labels without slicing the data.
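
To make the difference between the first two categories concrete, the toy sketch below partitions one small table by rows (horizontal) and by columns (vertical). The table contents, the feature names, and the helpers `horizontal_split` and `vertical_split` are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple

# Toy table: one row per sample (user), one column per feature. All values are made up.
FEATURES = ["age", "income", "purchases"]
ROWS: Dict[str, List[int]] = {
    "u1": [34, 52000, 7],
    "u2": [27, 41000, 3],
    "u3": [45, 88000, 12],
    "u4": [51, 67000, 5],
}


def horizontal_split(rows: Dict[str, List[int]], ids_a: List[str], ids_b: List[str]) -> Tuple[dict, dict]:
    """Horizontal FL: both parties hold the same features for different samples."""
    return {i: rows[i] for i in ids_a}, {i: rows[i] for i in ids_b}


def vertical_split(rows: Dict[str, List[int]], features: List[str],
                   feats_a: List[str], feats_b: List[str]) -> Tuple[dict, dict]:
    """Vertical FL: both parties hold the same samples but different features."""
    def project(keep: List[str]) -> dict:
        cols = [features.index(f) for f in keep]
        return {i: {features[c]: v[c] for c in cols} for i, v in rows.items()}
    return project(feats_a), project(feats_b)
```

In the horizontal case the two parties could each run the same local training code, whereas in the vertical case they must first align on shared sample identifiers before any joint training is possible.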

In FL, privacy and communication efficiency are often the primary concerns [Kairouz2019]. The survey in [Mothukuri2021] covers research related to the privacy issues of FL, illustrating several attacks which lead to the leakage of private data, e.g., membership inference attacks and inference attacks based on generative adversarial networks (GANs, a class of deep learning models). The authors also introduce several countermeasures, such as Differential Privacy (DP) and Secure Multi-party Computation (SMC). DP is a commonly applied technology which preserves privacy by adding noise to private information, i.e., the local model updates that are required to be uploaded to the central server. DP reduces the possibility of the data being inferred in reverse without too significant a loss of data quality. DP is now widely used in FL to protect the privacy of clients [Geyer2017, Choudhury2019, Truex2019].
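
A common way to add such noise is to clip each local update to a bounded norm and then perturb it with Gaussian noise. The sketch below illustrates that idea only; the function names and the parameters `clip_norm` and `noise_multiplier` are assumptions for this example, and it does not compute or guarantee any particular privacy budget.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], clip_norm: float) -> List[float]:
    """Scale the update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize_update(update: List[float], clip_norm: float = 1.0,
                     noise_multiplier: float = 1.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip a local model update, then add zero-mean Gaussian noise to every coordinate."""
    rng = rng or random.Random()
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    return [u + rng.gauss(0.0, sigma) for u in clipped]
```

A client would call `privatize_update` on its update just before uploading it. A larger `noise_multiplier` hides individual contributions better but degrades the aggregated model more, which is the privacy versus quality trade-off mentioned above.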

Regarding communication, Sattler et al. [Sattler2020] argue that collaborative training protects data privacy but causes communication challenges, e.g., increased communication costs. They propose a novel compression algorithm termed Sparse Ternary Compression (STC), which is derived from top-k sparsification (a gradient compression technique), to address this issue. The experimental results indicate that STC is effective in general situations. Besides, FedPAQ, i.e., an FL method with periodic averaging and quantization, is another methodology designed to overcome the communication bottleneck and scalability challenges in FL while maintaining accuracy [Reisizadeh2019]. FedPAQ allows a subset of nodes to participate in local training; these nodes then transfer quantized updates to the parameter server, which averages the global model periodically.
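
The core idea of combining top-k sparsification with ternarization can be sketched as follows. This is a simplified illustration, not the full STC algorithm of [Sattler2020]: it omits the residual accumulation and the lossless encoding of the sparse result, and the function name `sparse_ternary_compress` is an assumption for this example.

```python
from typing import List


def sparse_ternary_compress(update: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries and replace each by +mu or -mu,
    where mu is the mean magnitude of the kept entries; all others become 0."""
    if not 1 <= k <= len(update):
        raise ValueError("k must be between 1 and len(update)")
    # Indices of the k entries with the largest absolute value (top-k sparsification).
    top = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)[:k]
    mu = sum(abs(update[i]) for i in top) / k
    out = [0.0] * len(update)
    for i in top:
        sign = (update[i] > 0) - (update[i] < 0)
        out[i] = mu * sign  # ternary values: -mu, 0, +mu
    return out
```

After compression, a client only needs to transmit the positions and signs of the `k` surviving entries plus the single scalar `mu`, instead of one floating-point value per model parameter.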

Since the FL framework was proposed by Google in 2016, it has been used in practice in many areas, including wireless devices [Chen2020a, Niknam2020, li2020review], healthcare [xu2019federated, brisimi2018federated, chen2020fedhealth], Internet of Things [Du2020, khan2020federated], smart city [qolomany2020particle, jiang2020federated], and business and finance [Yang2019]. According to [mcmahan2017communication], the following situations are appropriate for the implementation of FL.

  • Training on the raw data from local devices offers a significant benefit compared to training on proxy data in the central server.

  • The data to be trained on is sensitive or large, so uploading it to the data center is not appropriate if the privacy of the devices is to be protected.

  • The labels of the data can be deduced based on the user interactions in the supervised learning tasks.

II-B Brief Introduction to Blockchain

In 2008, Nakamoto introduced a peer-to-peer payment system termed Bitcoin, which is totally decentralized and transparent [nakamoto2019bitcoin]. Now Bitcoin is the largest cryptocurrency in the world. The technology behind Bitcoin is blockchain, which provides traceable and immutable records for every transaction and delivers rewards to working nodes based on their contributions. Figure 2 indicates the topology of blockchain. Blockchain technology has been widely used in cryptocurrency (e.g., Bitcoin, Ethereum [wood2014ethereum]), healthcare [agbo2019blockchain, tanwar2020blockchain], smart city [Xie2019, aujla2020blocksdn], Internet of Things (IoT) [wang2020blockchain], etc.

Typically, blockchain has the following properties[Lin2017, Niranjanamurthy2019].

  • Decentralization: Since blockchain utilizes a P2P network, there is no need for a third party or a single central node to assist in network propagation. In this way, all nodes are equal. For example, Bitcoin, first proposed by Nakamoto in 2008, was designed to avoid third-party payment platforms interfering with transactions.

  • Traceability: The data on the blockchain can be traced back to its source due to the special structure of the blockchain.

  • Anonymity: Although data on the chain is public, the private information of users is encrypted, which prevents others from obtaining it.

  • Immutability: Data stored in the blockchain structure is very difficult to alter.

Blockchain is a distributed ledger which is maintained by participating devices named miners. Each miner keeps one replica of the entire ledger locally and competes to win the opportunity to generate a new block, which contains a package of transactions.

The Bitcoin system is public, which means everyone can join or leave without permission, while other blockchain-based systems are private, allowing only certified users to participate. Typically, blockchain can be roughly classified into three categories, i.e., private blockchain, consortium blockchain, and public blockchain.

  • Public Blockchain: In a public blockchain, everyone can join or leave without permission, participate in the consensus process, and access the public ledger. Bitcoin and Ethereum are public blockchains. A public blockchain is entirely decentralized, with no central authority that may control the network, which keeps the records on it immutable. However, the transaction processing speed of a public blockchain is limited, since numerous users are on the chain and a large number of transactions must be processed.

  • Private Blockchain: In contrast to a public blockchain, nodes on a private blockchain are under supervision, which means that only authorized nodes can join the network and access the shared ledger. Meanwhile, nodes on a private blockchain are visible to other nodes, making all actions on that blockchain traceable. However, a private blockchain is not fully decentralized.

  • Consortium Blockchain: A consortium blockchain is partially decentralized and controlled by several predefined or selected nodes (i.e., authorities who have the right to generate new blocks). Typically, a consortium blockchain is a private blockchain with a different authority mechanism.

Generally, private blockchain and consortium blockchain can be termed permissioned blockchain, since both of them require permission before potential users register in the blockchain network. In applications of blockchain, the kind of blockchain to adopt is determined by the purpose of use.

From the perspective of structure, blockchain is composed of six layers, i.e., data layer, network layer, consensus layer, incentive layer, contract layer, and application layer [Xie2019, xu2017design, yu2018virtualization, yuan2016towards]. Next, we discuss the details of each layer of blockchain.

Fig. 2: Structure of block.
  • Data Layer: This is the fundamental layer of blockchain. Each block consists of a block header and a block body. The block header contains the hash of the parent block, which is used to connect two blocks. When multiple blocks are generated and connected, they form a blockchain, as shown in Figure 2. The block header also includes data related to mining, such as the timestamp, nonce, and difficulty value, as well as the Merkle root, which is the hash value obtained from the Merkle tree in the block body. In the block body, the transactions are hashed and organized in a Merkle tree, which facilitates data queries.

  • Network Layer: The network layer mainly provides mechanisms of information exchange for each node in the blockchain network, including the P2P network mechanism, the information propagation mechanism, and the data verification mechanism. With a P2P network, the risks caused by failures of some nodes or parts of the network can be avoided because the nodes communicate with each other directly. When a transaction is created, it is propagated to all nearby nodes for validation. If it passes validation, it is propagated to other nodes. Through the propagation mechanism and validation mechanism, invalid transactions can be effectively filtered out, and only valid transactions move on to the mining process.

  • Consensus Layer: Since the blockchain is composed of a large number of nodes, each of which can validate transactions, it is necessary to determine who can generate the new block. This is a process of reaching consensus among nodes, which should be both democratic (avoiding authoritative centers) and efficient (making all nodes willing to reach consensus). Many consensus mechanisms are currently used, such as PoW [vukolic2015quest], PoS [saleh2018blockchain], etc. Proof of Work (PoW), for example, is the most commonly used consensus mechanism and is adopted by Bitcoin. Each working node (miner) performs a mining process on a block which contains a package of transactions, i.e., it solves a mathematical puzzle, and the node which solves the puzzle first gets the opportunity to generate the new block. The mining process consumes a lot of computing power but provides robust security. Usually, the choice of consensus mechanism is determined by the specific needs of the designed blockchain.

  • Incentive Layer: Nodes on a blockchain do not all voluntarily provide the computing power to generate new blocks unless incentives are offered. In the incentive layer, miners are rewarded based on defined protocols. Typically, rewards are issued when a new block is generated or are obtained by charging fees for transactions. Economic rewards encourage miners to participate in mining honestly.

  • Contract Layer: The contract layer provides various types of code, scripts, and smart contracts that control the operation of the blockchain. Smart contracts are written into the blockchain as programs with trigger conditions for certain events, and once these events are triggered, the smart contracts are automatically executed according to defined rules. Smart contracts can automatically handle issues on the blockchain network, requiring no third-party intervention and making the blockchain more independent and transparent.

  • Application Layer: The uppermost layer of the blockchain, i.e., the application layer, provides the channel for the blockchain to connect with the real world. Blockchain-based applications are deployed in the application layer, such as the various types of applications developed on Ethereum.
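
The data layer described above can be illustrated with a small sketch of hash-linked blocks whose headers carry a Merkle root. This is a simplified model under stated assumptions: it uses single SHA-256 over hex strings and a JSON-serialized header, whereas real systems such as Bitcoin use their own binary formats and hashing rules, and the helper names (`merkle_root`, `make_block`, `block_hash`, `chain_is_valid`) are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions: List[str]) -> str:
    """Hash the transactions pairwise, level by level, until a single root remains."""
    if not transactions:
        return sha256_hex(b"")
    level = [sha256_hex(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash when the level is odd
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def make_block(parent_hash: str, transactions: List[str], timestamp: int, nonce: int = 0) -> Dict:
    """Build a block: the header links to the parent and commits to the body via the Merkle root."""
    header = {
        "parent_hash": parent_hash,
        "merkle_root": merkle_root(transactions),
        "timestamp": timestamp,
        "nonce": nonce,
    }
    return {"header": header, "body": list(transactions)}


def block_hash(block: Dict) -> str:
    """Hash of the header only; the body is covered indirectly through the Merkle root."""
    return sha256_hex(json.dumps(block["header"], sort_keys=True).encode())


def chain_is_valid(chain: List[Dict]) -> bool:
    """Check the parent-hash links and that each body still matches its Merkle root."""
    for prev, cur in zip(chain, chain[1:]):
        if cur["header"]["parent_hash"] != block_hash(prev):
            return False
    return all(b["header"]["merkle_root"] == merkle_root(b["body"]) for b in chain)
```

Changing any transaction in a block body changes its Merkle root, and changing a header changes the hash that the next block points to, which is why stored records are difficult to alter without detection.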

Note that a blockchain does not need to implement all of the layers mentioned above. The three lower layers are essential, while the upper three layers are not required for all blockchains.

We take a real-life payment application of Bitcoin as an example to illustrate the workflow of the blockchain.

  • First, user A pays a certain amount of bitcoins to user B and this transaction is recorded.

  • The nearby node propagates this transaction to other nodes, and these nodes will verify whether the transaction is valid or not.

  • If the verification result is valid, the transaction will be put into a block; otherwise, it will be discarded.

  • All nodes that receive the transaction execute PoW, and the one who wins will have the right to generate a new block.

  • The new block will be broadcast to other nodes and added to the blockchain.
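
The PoW step in this workflow can be sketched as a search for a nonce whose hash meets a difficulty target. The example below is a toy version under stated assumptions: difficulty is expressed as a number of leading zero hex digits, the header is a simple delimited string, and the names `mine` and `is_valid_proof` are hypothetical. Bitcoin's actual header format and target encoding differ.

```python
import hashlib
from typing import Tuple


def _digest(parent_hash: str, payload: str, nonce: int) -> str:
    return hashlib.sha256(f"{parent_hash}|{payload}|{nonce}".encode()).hexdigest()


def mine(parent_hash: str, payload: str, difficulty: int) -> Tuple[int, str]:
    """Try nonces in order until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = _digest(parent_hash, payload, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def is_valid_proof(parent_hash: str, payload: str, nonce: int, difficulty: int) -> bool:
    """Verification needs a single hash, while mining needs many attempts on average."""
    return _digest(parent_hash, payload, nonce).startswith("0" * difficulty)
```

For instance, `mine("0" * 64, "A pays B 1 BTC", difficulty=3)` needs a few thousand hash attempts on average, whereas any other node can confirm the result with one call to `is_valid_proof`. This asymmetry is what lets the network cheaply check the winner's block before adding it to the chain.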

III Foundations of BCFL

In this paper, we investigate the implementation of several new features in the FL model through blockchain, so as to address some existing problems of FL. In this section, we explore BCFL as a whole system, describing and classifying its architectures. Our work is based on the perspective of the components of the BCFL model. First, we propose a methodology to classify the architectures of BCFL according to the coupling between blockchain and FL. Next, we analyze blockchain and FL in this system respectively. Since blockchain has different types, we discuss the properties that BCFL models have on different types of chains. We notice that the participants in BCFL model training differ across systems, which affects the deployment of BCFL in specific applications. We also provide lessons learned in each subsection to illustrate more concretely how the BCFL model works. Table I summarizes the relevant literature.

| Category | Ref. | Blockchain | Learning devices | Blockchain type | Consensus protocol | Central aggregator | Contribution |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Fully coupled | [Lu2020c] | / | Device | Permissioned | PoQ | No | Blockchained FL to share and retrieve data for IoT devices |
| Fully coupled | [cao2021towards] | / | Device | Public | DAG-FL | No | Introduced a directed acyclic graph based FL consensus (DAG-FL) to address the asynchrony of devices and anomaly detection of BCFL |
| Fully coupled | [Mugunthan2020] | Ethereum | / | Public | / | No | BlockFlow is an accountable FL system that is fully decentralized and privacy-preserving |
| Fully coupled | [Preuveneers2018] | MultiChain | / | Permissioned | PoC | No | Proposed a permissioned blockchained FL to do anomaly detection |
| Fully coupled | [Li2020c] | / | / | Permissioned | CCM | No | Proposed BFLC, which defines the model storage patterns, the training process, and committee consensus |
| Fully coupled | [Ramanan2019] | Ethereum | / | Public/permissioned | PoA | No | Leveraged smart contracts to take the place of the central aggregator |
| Fully coupled | [Toyoda2019] | Ethereum | / | Public | / | No | Designed a competitive incentive mechanism |
| Fully coupled | [kim2019blockchain] | / | Edge | Public | / | No | Proposed a node recognition based local learning weighting method and a node selection method |
| Fully coupled | [Hua2020] | / | / | / | / | No | Used blockchain to store, transfer, and share machine learning models |
| Fully coupled | [Chai2020] | / | Device | Public | PoK | No | Hierarchical BCFL to share information in IoV |
| Fully coupled | [weng2019deepchain] | / | / | Public | blockwise-BA | No | DeepChain provides a value-driven incentive mechanism based on BC |
| Fully coupled | [Bao2019] | / | / | Public | / | No | Proposed FLChain to distribute trust and incentive among trainers |
| Flexibly coupled | [Qu2020] | / | Device | / | PoW | Yes | Prevented single point of failure in fog computing based on FL-Block |
| Flexibly coupled | [Majeed2019] | Ethereum | Edge | / | PoW | No | Leveraged channels to train models and a global model state trie to aggregate the model |
| Flexibly coupled | [Kim2019] | Ethereum | Device | Public | PoW | No | Blockchained FL to exchange and verify model updates, with latency analysis |
| Flexibly coupled | [passerat2019blockchain] | Ethereum | / | Permissioned | / | Yes | Privacy-preserved healthcare consortia based on BCFL |
| Flexibly coupled | [Hieu2020] | / | Device | / | PoW | Yes | Used deep reinforcement learning to derive the optimal decisions for the model owner |
| Flexibly coupled | [Lu2020b] | / | Edge | Permissioned | DPoS | Yes | Leveraged blockchain to train local model updates before global aggregation |
| Flexibly coupled | [Li2020b] | Ethereum | / | Permissioned | PoW/PoS | Yes | Transplanted the entire crowdsourcing system onto the blockchain |
| Flexibly coupled | [Ma2020] | / | Device | Public | PoW | No | The central server was replaced by blockchain |
| Flexibly coupled | [Lu2020] | / | Edge | Permissioned | DPoS | No | Developed a lightweight verification scheme for permissioned blockchain based on DPoS |
| Flexibly coupled | [Pokhrel2020] | / | Device | / | PoW | No | Autonomous vehicular system based on BCFL |
| Flexibly coupled | [Desai2020] | Hyperledger Fabric/Ethereum | Device | Permissioned/public | / | Yes | Proposed a BCFL to deter adversarial attacks by accounting |
| Flexibly coupled | [Zhao2020] | / | / | Permissioned | PoS/PoQ | No | Used the blockchain to replace the central aggregator |
| Flexibly coupled | [Martinez2019] | EOS | Device | Public | PoC | Yes | Used EOS BC and IPFS to record uploaded updates in a scalable manner and reward users based on training data cost |
| Flexibly coupled | [Lu2020a] | / | Edge | Permissioned | DPoS | Yes | Developed hybrid blockchain and asynchronous FL to share data in IoT |
| Flexibly coupled | [Sharma2020] | / | Edge | / | / | Yes | Proposed a multi-layer distributed computing defence framework |
| Flexibly coupled | [Shen2020] | / | Edge | Permissioned | / | Yes | Analyzed the unintended property leakage in BCFL for intelligent edge computing |
| Flexibly coupled | [Zhang2020a] | Ethereum | / | / | PoW | Yes | Designed an anchoring protocol to build a Merkle tree |
| Flexibly coupled | [Cui2020] | Ethereum | Edge | Permissioned | PoS | Yes | Blockchain serves as a distributed ledger that records the transactions in terms of models and training parameters |
| Flexibly coupled | [Liu2020] | Ethereum | Device | / | / | Yes | Prevented poisoning and membership inference attacks |
| Loosely coupled | [UrRehman2020] | Ethereum | Edge | Public | PoW | Yes | Designed a blockchain-based reputation-aware fine-grained FL |
| Loosely coupled | [kang2019incentive] | / | Edge | Permissioned | / | Yes | Reputation management and incentive mechanism based on blockchain |
| Loosely coupled | [Kumar2020] | / | / | Public | / | Yes | Leveraged blockchain to retrieve data in hospitals |
| Loosely coupled | [Kang2020] | / | Edge | Permissioned | / | Yes | The concept of reputation is introduced as a metric |
| Loosely coupled | [fan2020hybrid] | Ethereum | Edge | Public/permissioned | PBFT | Yes | Leveraged the hybrid blockchained system for FL in edge computing |
| Loosely coupled | [Doku2020] | / | Edge | / | PoCI | No | Ensured the data used to train models in the network is trustworthy and relevant |
| Loosely coupled | [Nagar2019] | Ethereum | Device | Public | PoW | Yes | The blockchain network enables exchanging devices' local model updates while verifying their work |

"/" means that the relevant information is not clearly mentioned in that work.
TABLE I: Literature related to blockchained FL

For rigorous expression in this paper, some terminologies of blockchained FL are listed and explained below:

  • Clients: devices that work in the FL system to collect data and train local models.

  • Nodes: members of the blockchain network that provide computing power and generate new blocks; they can also be called miners.

  • Aggregator: a server or other sufficiently powerful equipment that aggregates the global model.

  • Distributed ledger: a traceable and auditable database distributed across multiple nodes in the blockchain network, storing data for retrieval or audit.

  • Transaction: data records in each block.

  • Local Model updates: gradients and weights computed by clients based on local raw data.

III-A Architectures of BCFL

Before designing a BCFL model, a clear understanding of its architecture is necessary. No relevant studies have been conducted on the architecture of BCFL, and our paper fills this gap. We group the architectures of BCFL into three categories based on the degree of coupling: fully coupled BCFL, flexibly coupled BCFL, and loosely coupled BCFL.

III-A1 Fully Coupled BCFL

We define the framework as the fully coupled blockchain-based FL model (FuC-BCFL) when the clients of FL are also the nodes of the blockchain; in other words, the clients not only train the local models but also verify the updates and generate new blocks. The topology of FuC-BCFL is shown in Figure 3. It follows from this definition that the FL model is decentralized, since every node on the blockchain has a chance to participate in local model training and global model aggregation; thus the role of the central aggregator can be taken over by the blockchain. In such a framework, there are two ways to aggregate the global model: i) some selected nodes collect the validated local model updates and then run the aggregation algorithm; ii) all the nodes participate in the global model aggregation. The distributed ledger contains the training data, including the verified local model updates, global model updates, and other data produced during the learning process. Typically, the workflow of FuC-BCFL can be summarized as follows:

Fig. 3: Topology of fully coupled BCFL.
  • Clients collect data and train the models locally.

  • Local model updates are verified by the (selected) clients.

  • Verified local updates are collected by (selected) clients and then the global model will be updated.

  • New block which stores the verified model updates is added into the distributed ledger.

  • According to the incentive mechanism, rewards are distributed to participants.
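
The workflow above can be mimicked by a toy single-process simulation in which the same participants submit updates, verify them, aggregate them, and append a block. Everything specific in this sketch is an assumption made for illustration: the norm-based check in `verify_update`, the unweighted averaging, the flat per-client reward, and the block layout are hypothetical and do not reproduce any particular surveyed system.

```python
import hashlib
import json
import math
from typing import Dict, List, Tuple


def verify_update(update: List[float], max_norm: float = 10.0) -> bool:
    """Hypothetical verification rule: reject non-finite or implausibly large updates."""
    if not all(math.isfinite(u) for u in update):
        return False
    return math.sqrt(sum(u * u for u in update)) <= max_norm


def fuc_bcfl_round(ledger: List[Dict], global_model: List[float],
                   local_updates: Dict[str, List[float]],
                   reward: float = 1.0) -> Tuple[List[float], Dict[str, float]]:
    """One round: verify local models, aggregate, append a block, and assign rewards."""
    # Verification is done by the clients themselves, since they are also blockchain nodes.
    verified = {cid: u for cid, u in local_updates.items() if verify_update(u)}
    if verified:
        dim = len(global_model)
        new_global = [sum(u[j] for u in verified.values()) / len(verified)
                      for j in range(dim)]
    else:
        new_global = list(global_model)  # nothing valid this round, keep the old model
    # The new block stores the verified updates and links to the previous block.
    parent = ledger[-1]["hash"] if ledger else "0" * 64
    body = {"parent": parent, "updates": verified, "global_model": new_global}
    block_hash = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    ledger.append(dict(body, hash=block_hash))
    # Only clients whose updates passed verification are rewarded.
    return new_global, {cid: reward for cid in verified}
```

In this sketch no separate server appears anywhere: the ledger replaces the aggregator's storage role, and the filtering and averaging are carried out by the participants, which matches the aggregator-free property that defines the fully coupled architecture.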

FuC-BCFL has been adopted in various studies. In [Kim2019], the clients of FL are edge devices that can sense data and provide computing power, and they are responsible for data collection and model training. The blockchain in that framework also works as the distributed ledger that records the training data. In that system, the integrity of the raw data is protected and malicious clients are kept out.

[Hua2020] proposed an FL system based on blockchain in which all participants compete to generate new blocks; the winner then collects the model parameters and records them on the blockchain. Since no raw data is shared during the training process, the system preserves data privacy in a secure manner.

An FL platform with blockchain is designed in [Toyoda2019], assuming that all participants act rationally under a competition-based incentive mechanism. This platform can deal with any kind of raw data, such as text, audio, and images. Before local model updates are uploaded, several workers are selected to go through the security procedure under the smart contract to choose the valid data.

BAFFLE is a blockchained FL framework that is free of a central aggregator and thus decentralized [Ramanan2019]. By dividing the FL process into rounds, collecting the local model updates, and then updating the global model in each round, BAFFLE can achieve the same model performance as the conventional FL model while consuming fewer computational resources.

From the above discussion, we summarize the merits and demerits of the FuC-BCFL framework as follows.

Merits of FuC-BCFL:

  • Single-point-failure is avoided effectively, as the framework is decentralized and every node holds a copy of the distributed ledger.

  • No data needs to be transferred to any central server, which avoids data privacy leakage and reduces communication cost.

Demerits of FuC-BCFL:

  • More computational resources are required because the operations of both blockchain and FL run on the same network. Clients not only perform local training but also aggregate the global model.

  • The communication bandwidth of the blockchain network is limited, so communication latency could be a challenge for FuC-BCFL.

III-A2 Flexibly Coupled BCFL

We define a framework as the flexibly coupled blockchain-based FL model (FlC-BCFL) when the blockchain and the FL system are in distinct networks. This means that the clients of FL are not the nodes of the blockchain (miners). The topology of flexibly coupled BCFL is shown in Figure 4. From the topology we can see that clients are responsible for local data collection and training, while the verification of local model updates is done by miners on the blockchain. FL can also use the blockchain to store the model updates, and the miners can also aggregate the global model, which frees the system from a central aggregator. The workflow of FlC-BCFL can be summarized as follows:

Fig. 4: Topology of flexibly coupled BCFL.
  • Clients collect local data and train the local models, and then upload the local model updates to the blockchain.

  • Miners on the blockchain run the verification mechanism, and only the validated updates are used to update the global model.

  • After the global model is aggregated, all the data is stored on the distributed ledger.

  • Rewards are allocated to participants according to their performance.

In [Ma2020], a reliable and self-motivated FlC-BCFL system is presented, which designs a smart contract to publish tasks and calculate the global model. Nodes train models locally, while miners aggregate the global model on the blockchain according to the algorithm defined in the smart contract.

In [Lu2020], blockchain is used to aggregate the global model, while the FL process is executed locally to obtain the local model updates. All the base stations act as miners on the blockchain and execute the global model aggregation process, whereas the work in [Lu2020b] leverages one Macro Base Station to aggregate the global model.

Kim et al. [kim2019blockchain] proposed BlockFL to exchange and verify local model updates on the blockchain. BlockFL focuses on removing the central aggregator from the FL model, making it decentralized. Miners associated with clients are randomly selected, and local model updates are cross-verified among miners. That paper also analyzes the communication latency of BlockFL.

Reference [Pokhrel2020a] adopts a framework similar to BlockFL that combines autonomous vehicles and miners. A uniform random vehicle-miner association scheme is proposed in that framework to ensure that all participants can be trusted. To prevent privacy leakage on Internet of Things devices, the model in [Zhao2020] is composed of manufacturers, customers, and the blockchain. Manufacturers establish the learning task and obtain the final global model, customers provide their computational power to train local models, and the blockchain verifies and stores the model updates.

We summarise the merits and demerits of flexibly coupled BCFL as below.

Merits of flexibly coupled BCFL:

  • FL and blockchain run on different networks and devices, reducing communication pressure and latency.

  • The raw data remains on the clients, reducing the risk of data leakage caused by malicious attacks on the blockchain network.

  • Blockchain can provide data sharing for FL, which is more efficient than conventional FL.

Demerits of flexibly coupled BCFL:

  • Blockchain and FL belong to two different systems, so it is hard to coordinate the management of them.

  • Single-point-failure still occurs when the central aggregator remains.

III-A3 Loosely Coupled BCFL

In [Kang2020a] and [UrRehman2020], reputation is introduced as a crucial criterion to measure the reliability and trustworthiness of the participants in a blockchained FL system. In the loosely coupled BCFL framework (LoC-BCFL), blockchain is used to verify model updates and manage the reputation of participants, and only the reputation-related data is kept on the distributed ledger. Verification of the updates and reputation management are part of the incentive mechanism that ensures the participants behave honestly. The framework of loosely coupled BCFL is shown in Figure 5. The workflow of loosely coupled BCFL is as follows:

Fig. 5: Topology of loosely coupled BCFL.
  • Clients train models locally and upload the local model updates to the blockchain.

  • Miners verify the local model updates, then generate reputation opinions for the clients.

  • Miners compete to generate a new block that contains the reputation-related data, and the new block is added to the distributed ledger.

  • The aggregator collects the verified updates and then executes the global model aggregation algorithm.

  • Rewards and penalties depend on the reputation opinions of clients.
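
A minimal sketch of this workflow is given below, with the ledger holding reputation opinions only. The scoring rule (a Laplace-smoothed share of positive opinions), the reward threshold, and all names are assumptions chosen for illustration; the cited works define their own reputation models.

```python
from dataclasses import dataclass, field


@dataclass
class ReputationLedger:
    """Stands in for the blockchain: it records reputation opinions, not model data."""
    opinions: dict[str, list[int]] = field(default_factory=dict)

    def record(self, client: str, positive: bool) -> None:
        self.opinions.setdefault(client, []).append(1 if positive else 0)

    def reputation(self, client: str) -> float:
        history = self.opinions.get(client, [])
        # Assumed scoring rule: Laplace-smoothed share of positive opinions.
        return (sum(history) + 1) / (len(history) + 2)


def loc_bcfl_round(ledger, updates, verify, min_reputation=0.5):
    verified = {}
    for client, update in updates.items():
        ok = verify(update)           # miners verify the update
        ledger.record(client, ok)     # only the opinion goes on the ledger
        if ok:
            verified[client] = update
    # A separate aggregator averages the verified updates.
    global_update = sum(verified.values()) / len(verified) if verified else None
    # Rewards depend on reputation; clients below the threshold get none.
    rewarded = [c for c in updates if ledger.reputation(c) >= min_reputation]
    return global_update, rewarded


ledger = ReputationLedger()
updates = {"a": 0.25, "b": 0.75, "c": 50.0}
global_update, rewarded = loc_bcfl_round(ledger, updates, verify=lambda u: abs(u) < 1.0)
```

Because opinions accumulate across rounds, a client that repeatedly fails verification sees its reputation fall further, which is what lets the incentive mechanism penalize it.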

In [Kang2020a], reputation and contract theory are combined to support the incentive mechanism for FL. In that system, the task publisher calculates the reputation of workers according to their historical reputation records. The reputation opinions are stored on the reputation blockchain after the selected workers finish their proof work. The selected workers can then start the FL process; after training the model locally, workers upload the local model updates to the task publisher for verification and global aggregation. In that system, reputation is used to choose qualified devices as the workers that conduct federated learning.

In [UrRehman2020], a reputation-aware fine-grained FL system is proposed to establish a trustworthy computational environment for mobile edges. The reputation of each participant is calculated by a public blockchain and smart contracts. The details of other related literature can be found in Table 3-1.

Merits of loosely coupled BCFL:

  • Blockchain and FL are completely independent, so FL keeps its data with its own members to a greater degree than in the previous two architectures.

  • The reputation management mechanism enables better management of participants, ensures the quality of data submitted during model training, and improves the accuracy of the model; it also prevents malicious participants from attacking the system.

Demerits of loosely coupled BCFL:

  • Blockchain is rarely involved in the FL process and is only responsible for verification and reputation management; thus the FL model is not decentralized, and risks such as private data leakage and single-point-failure still exist.

  • Blockchain and FL are maintained independently, resulting in inefficient utilization of resources.

III-A4 Summarized Lessons

We classify the blockchain-based FL frameworks into three categories as mentioned above, and we explore their topologies, workflows, merits, and demerits. To better understand the characteristics of the different structures of BCFL, we summarise the lessons learned from the above discussion.

  • We can design different BCFL structures according to specific demands. If the system needs to be aggregator free, the fully coupled BCFL framework is recommended. The flexibly coupled BCFL is suggested when the FL network is not suited to run on the blockchain network but still needs blockchain to assist its learning process for higher model accuracy or data sharing. We can also use blockchain for reputation management to restrain the behaviors of participants; in this situation, the loosely coupled BCFL is a good choice.

  • Although we can classify them and they exhibit different properties, it is currently rather challenging to indicate which structure of BCFL is the safest and most reliable. We argue that the safety and reliability of BCFL should be evaluated from the perspective of specific computing needs and environments.

  • Resource constraint and communication latency are impediments to the efficient operation of BCFL and must be addressed regardless of the architecture.

III-B Blockchain Types in BCFL

In this subsection, we analyze two kinds of blockchain used to assist FL systems: public blockchain and permissioned blockchain. We introduce their properties, related works, merits, and demerits in turn, and close the subsection with the lessons learned.

III-B1 Public Chain

Public blockchain is widely used in blockchain-based FL systems since it is decentralized and transparent. Nodes on a public blockchain can be any devices that are willing and have enough power to take part in the learning process, without further certification.

Reference [Ma2020] proposes an FL system named BC-FL, which runs on a public blockchain network. Training nodes and miners can join the system without permission and work together to train a global model. Miners on that network use Proof of Work (PoW) as their consensus to generate new blocks. The BlockFL model advocated in [Kim2020] uses a public blockchain to verify model updates; the miners can be any devices that provide sufficient computational power. Miners compete to complete the PoW, and the newly generated block is then added to the distributed ledger. To attract more vehicles and base stations to provide data and computational resources, the FL system in [Chai2020] runs on a public blockchain network. That system presents Proof of Knowledge (PoK), a lightweight consensus that combines machine learning with blockchain consensus to avoid complicated computation. In the above models, lowering the barrier to participation brings in more computational resources and more data; however, invalid data and malicious nodes remain a problem because misbehaviour detection is barely discussed.
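
To illustrate why PoW-style block generation is computationally expensive, the sketch below searches for a nonce by brute force. It is a generic hash puzzle, not the exact scheme of any cited system; the payload string, the `|` separator, and the difficulty of three leading zero hex digits are assumptions.

```python
import hashlib


def mine(block_data: str, difficulty: int = 3) -> tuple[int, str]:
    """Try nonces until the SHA-256 digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def check(block_data: str, nonce: int, difficulty: int = 3) -> bool:
    """Verification needs a single hash, which is why other miners can check cheaply."""
    digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Hypothetical payload standing in for the verified model updates of one round.
nonce, digest = mine("round-1-verified-updates", difficulty=3)
```

Each extra hex digit of difficulty multiplies the expected search effort by 16, while `check` stays a single hash; this asymmetry is the source of the resource cost discussed for public chains.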

To tackle the security issues of public blockchain used for FL, researchers have designed protocols that prevent misbehaviour by malicious workers and thereby ensure the quality of the learned model. BlockFlow in [Mugunthan2020] is an FL system aided by a public blockchain; to guard against malicious clients, the system requires every participant to evaluate the others. Subsequently, a scoring procedure maintained in a smart contract is run to reflect the training performance of the clients. In this way, clients are encouraged to contribute high-quality training to the system. The model in [Toyoda2019] presents a generic, full-fledged protocol to improve the reliability of an FL system via a permissionless blockchain. Workers are unlikely to sabotage the learning process because of the competitive model update method it adopts.

The following are the merits and demerits of public blockchain in BCFL.

Merits of public blockchain in BCFL:

  • More data resources and computational power can be attracted to collaboratively train a common model, so large-scale FL tasks can be realized.

  • Public blockchain is totally decentralized and transparent, thus the learning process is traceable and auditable.

Demerits of public blockchain in BCFL:

  • Being open to all devices makes it difficult to keep out low-quality data and malicious behaviours.

  • Public blockchain in BCFL generally requires complex consensus, e.g. PoW and PoS, to validate model updates and to create new blocks, causing a significant consumption of computing resources.

III-B2 Permissioned Chain

In contrast to public blockchain, permissioned blockchain is only available to authorised clients. In a BCFL system, before devices are registered in the FL, they are selected based on their computational resources, willingness to participate, and historical performance.

Current research on permissioned blockchain in BCFL mainly focuses on node selection, i.e., which nodes can be part of the chain and continue the learning process. The devices that are intended to be included are usually evaluated before the blockchain starts to operate. In addition, the devices stay or leave at the end of the training depending on their performance. In [Li2020c], an alliance chain is leveraged to enforce authority, i.e., node management, gradient validation, and block generation. A Committee Consensus Mechanism (CCM) is designed to validate gradients. The committee is composed of a few honest nodes, which are in charge of the verification process. CCM requires fewer computational resources than PoW while still serving as a secure and reliable consensus mechanism. Reference [Bao2019] introduces FLChain to build a reliable and auditable FL ecosystem. Trainers registered on the blockchain are the entities that are willing to take part in the training process. Before local model training, miners are selected according to their reliability and motivation. A malicious trainer's misbehavior is detected and punished by the authority in FLChain. In [Lu2020c], FL and permissioned blockchain are integrated. End IoT devices, i.e., base stations and roadside units, are called super nodes on that chain. The local model is trained by committee parties, which are the registered devices that can satisfy the data sharing request. Meanwhile, the permissioned blockchain retains the data sharing records for audit.

The following is about the merits and demerits of the permissioned blockchain as applied to BCFL systems.

Merits of permissioned blockchain in BCFL:

  • The permissioned blockchain offers a platform for a light-weight consensus protocol, which reduces resource consumption while keeping the system secure.

  • The exposure of the system to malicious attacks is reduced by excluding unauthorised devices from the training of the model.

  • The performances of authorised nodes can be constrained to guarantee the accuracy of the model, due to the evaluation scheme that usually exists within the permissioned blockchain.

Demerits of permissioned blockchain in BCFL:

  • Less attractive to devices and computing resources than public blockchain.

  • Reduced applicability of the system due to the access threshold imposed on users.

III-B3 Summarized Lessons

When designing a BCFL system, it is necessary to decide which type of blockchain to use. Different types of blockchain serve different model training needs. Based on the above discussion, we conclude with some lessons about blockchain types.

  • We argue that the type of blockchain intrinsically determines the number as well as the quality of the BCFL system's users. When a computing environment requires more computing resources and more participants, FL can be established on a public chain. However, if the training of an FL model needs to be carried out on a small scale, a permissioned blockchain can be chosen.

  • The public blockchain and the permissioned blockchain can be used in conjunction with each other; in [Ramanan2019, Desai2020, fan2020hybrid], each plays a distinct role in the BCFL system to assist the training of the model.

III-C Learning Devices in BCFL

In this part we explore the devices in a BCFL system, i.e., on which devices FL runs. We argue that, based on the current literature, BCFL can be used on either end devices, such as mobile phones and smart cars, which can sense external data, or edge nodes [Wang2020], such as base stations, routers, and other devices with high computing power. In the following, we discuss the deployment of BCFL on end devices and edge nodes respectively.

III-C1 End Devices

Mobile devices such as mobile phones and automated vehicles generally have computing capabilities, so on-device machine learning is used to improve computing efficiency. On-device machine learning requires more data than a single device's local data, so data sharing among devices is necessary [konevcny2016federated, mcmahan2017communication]. FL, as a technique for distributed learning, is designed to address these issues. When end devices are involved in FL, they gather external data and train on it locally. Raw data is not transferred to the server; only the local model updates are sent to the aggregator. When blockchain is used in this situation, it usually provides decentralization or serves as the distributed ledger for FL. This not only guarantees the data privacy of the end devices but also improves the security of the entire system.

End devices are currently designed for convenience and intelligence, so their storage capacity and computing capability are inevitably constrained. Current research focuses on the issues that arise when BCFL is used on end devices, such as communication delays, security leakage, and computing resource allocation. In [Kim2019], an on-device blockchained FL model is proposed. That paper focuses on data exchange and verification, and argues that end-to-end latency is an obstacle for BCFL and that adjusting the block generation rate could help. However, the limited computing capability of end devices is not addressed. The model in [Hieu2020] considers these issues and designs a deep reinforcement learning method to help the machine learning model owner make optimal decisions that reduce transmission delay and manage energy consumption. Reference [Lu2020c] leverages blockchain to prevent privacy leakage and secure the data sharing process of the distributed devices. Numerical results show that the proposed data sharing scheme performs accurately and effectively.

The merits and demerits of applying BCFL on end devices are listed below.

Merits of on-device based BCFL:

  • Raw data is not required to be transferred to any other devices, reducing the resources consumed by data transmission, while data security is ensured.

  • The usage of end devices is widespread, thus attracting more users and generating more data for model training.

Demerits of on-device based BCFL:

  • End devices have limited computing, storage and communication capacity to undertake complex local computing.

  • An end device is not only responsible for local data collection, model training and data storage, but also for providing the resources to keep the blockchain network running, which may result in the device being unable to do other tasks properly.

III-C2 Edge Nodes

In [wang2019adaptive, wang2019edge, qian2019privacy], FL technology is used to support edge computing. In the conventional edge computing scheme, raw data is sent to a nearby edge node, which can be considered the central server where the raw data is processed. Although FL can avoid the transmission of raw data by training on the raw data locally and uploading the model updates to the edge node or central server, the risks of FL itself, such as single-point-failure and privacy leakage, still remain. By leveraging blockchain to support FL-based edge computing, the whole system can be more secure and reliable. In a system that combines edge computing, FL, and blockchain, all the end devices collect the raw data and then send it to the nearby edge nodes for model training; blockchain provides data verification and data sharing for the edge nodes; and the verified model updates are transmitted to the central server for global model aggregation.

Reference [Cui2020] introduces a system named CREAT, which adopts blockchain to help edge computing cache content during the FL process. IoT devices transfer collected data to the blockchain, and each edge node downloads the data and then computes the gradients independently. The original purpose of applying the FL model to edge computing is to let edge nodes collaboratively learn the features of users and files, so that the cache hit rate can be raised by predicting popular files. Blockchain is incorporated to secure the data transmission and sharing. In [fan2020hybrid], edge nodes equipped with computational power and storage receive the data from end devices and train the deep learning model collaboratively. Public blockchain and permissioned blockchain provide the collaboration and auction mechanisms to the FL system, respectively.

Here are the merits and demerits of edge nodes based BCFL.

Merits of edge nodes based BCFL:

  • Edge nodes based BCFL is able to provide sufficient storage capacity and computational resources.

  • Edge computing can be more secure and reliable, and its application is wider.

Demerits of edge nodes based BCFL:

  • Raw data needs to be transferred, which reduces security and increases the consumption of the resources required for transmission.

  • The distribution of edge nodes is not as widespread as end devices, which may limit the application of BCFL.

III-C3 Summarized Lessons

In this section we explore the scenarios in which BCFL is deployed on end devices and edge nodes respectively. Besides leading to different workflows, different devices can affect the overall performance of the system. We conclude this subsection as follows.

  • The overall difference between the ways in which end devices and edge nodes are involved in a BCFL system is that the former keeps raw data local, while the latter needs to collect raw data from multiple devices.

  • At the blockchain level, some of the models' blockchains are maintained by edge nodes, while others are maintained by end devices. The blockchain, as a technology that helps FL become more secure and improves its communication, can have participants, i.e., nodes, that are not devices directly involved in the training process of FL. This was noted in Section III-A.

IV Functions of BCFL

In this section, we investigate the specific functions of BCFL from the perspective of its workflow, including verification of model updates, aggregation of the global model, utilization of the distributed ledger, and the incentive mechanism.

IV-A Verification of Model Updates

To train a well-performing global model, FL needs to ensure that all the devices engaged in the model training process work honestly and provide reliable data. This problem is not well tackled in traditional FL models. To address this issue, we can take advantage of blockchain to verify the submitted data and exclude dishonest and unreliable data.

IV-A1 Verification Protocol

In each round, the local devices transmit the trained local model updates to the miners for validation (no data transfer is required in the FuC-BCFL framework). Therefore, a suitable validation mechanism needs to be designed to verify the validity of the data while reducing the time and resources consumed.

Current research places significant emphasis on verification mechanisms. The work in [Kang2020a] proposes a Proof of Verifying (PoV) consensus to ensure that the uploaded local model updates are valid before the global model aggregation. The main idea of PoV is to prepare a testing dataset in advance and set a threshold for accuracy. According to PoV, a reliable testing dataset provided by the task publisher is prepared on the blockchain, and the miners then use this dataset to verify the uploaded updates. The qualified updates are selected based on the given accuracy threshold and put into blocks as transactions. The threshold can be determined empirically, but the selection of the testing dataset is a challenge, because it is hard to reuse previous data for evaluation once the learning environment changes.
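
The threshold idea behind PoV can be sketched as follows. This is only an illustration of accuracy-based filtering, not the protocol of [Kang2020a]: the linear threshold classifier, the four-sample test set, and the 0.8 threshold are all assumptions.

```python
def accuracy(weights, test_set):
    """Accuracy of a linear threshold classifier (assumed model) on labelled samples."""
    correct = 0
    for features, label in test_set:
        score = sum(w * x for w, x in zip(weights, features))
        predicted = 1 if score >= 0 else 0
        correct += predicted == label
    return correct / len(test_set)


def select_qualified(updates, test_set, threshold=0.8):
    """Keep only the updates whose accuracy on the shared test set meets the threshold."""
    return {cid: w for cid, w in updates.items() if accuracy(w, test_set) >= threshold}


# Hypothetical test set published by the task publisher: (features, label) pairs.
test_set = [((1.0, 0.0), 1), ((2.0, 1.0), 1), ((-1.0, 0.0), 0), ((-2.0, -1.0), 0)]
updates = {"honest": (1.0, 0.0), "poisoned": (-1.0, 0.0)}
qualified = select_qualified(updates, test_set)
```

Here the `poisoned` update misclassifies every sample and is dropped before aggregation. The sketch also makes the stated weakness visible: the filter is only as good as the test set, so a test set drawn from an outdated environment would reject valid updates.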

The verification process in [Li2020c] is similar to the PoV mentioned above: the miners in a committee are responsible for verifying the updates and scoring them, but the details of how the updates are scored are not given.

Reference [Cui2020] designs smart contracts to verify the transactions storing the local model updates. The whole process requires the randomly selected consortium members to vote on whether the updates are reliable, and the decision is based on the number of received votes. Although randomly selected members are required to participate in the voting, it is hard to show that this avoids the influence of subjectivity, so more evidence is needed to support this methodology.
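
A vote-based check of this kind might look like the sketch below. The committee size, the strict-majority rule, and the `judge` callback are assumptions; the source only states that the decision depends on the number of votes received.

```python
import random


def vote_on_update(members, judge, update, committee_size=5, seed=None):
    """Randomly pick a committee and accept the update on a strict majority (assumed rule)."""
    rng = random.Random(seed)
    committee = rng.sample(members, committee_size)
    yes_votes = sum(1 for member in committee if judge(member, update))
    return yes_votes > committee_size // 2, committee


# Hypothetical consortium in which every member applies the same plausibility check.
members = [f"member-{i}" for i in range(10)]
same_check = lambda member, update: abs(update) < 1.0
accepted, committee = vote_on_update(members, same_check, 0.3, seed=7)
rejected, _ = vote_on_update(members, same_check, 42.0, seed=7)
```

Since `judge` is an arbitrary per-member function, the sketch also shows where subjectivity enters: random selection changes who votes, not how each member forms its opinion.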

[Lu2020a] designs a two-stage verification scheme, which uses cumulatively calculated reputations based on the accuracy of the updates and nodes on the blockchain to evaluate the quality of the transactions.

Although the importance of validation mechanisms is mentioned in some studies, no specific descriptions of the workflows are provided [kim2019blockchain, Pokhrel2020a].

IV-A2 Summarized Lessons

  • The verification mechanism can be designed in various forms, but it is more common to filter the updates before conducting model aggregation to prevent unreliable data from affecting the global model. Of course, it is also possible to manage the updates through feedback after model aggregation.

  • By validating the updates, the verification mechanism can not only filter out unreliable data but also constrain the data providers to behave honestly. In addition, the results of verification can be used later to guide the allocation of rewards.

  • Based on our research, although researchers realized the importance of the verification of model updates before aggregating them, studies about the design of effective validation mechanisms are lacking.

IV-B Aggregation of Global Model

The basic idea of FL is to distribute model training tasks to numerous local devices and then to integrate the local models through a central aggregator. Therefore, model integration is a crucial component of the FL process. In this subsection, we explore how blockchain technology can be used to assist the aggregation of the global model for FL. Based on our investigation of current research, our analysis focuses on the members who are engaged in model integration in the BCFL framework. In Section III-A, we discussed the architectures of BCFL and found that in some BCFL models the central aggregator still remains, since blockchain and FL are coupled in different ways [passerat2019blockchain, Lu2020b]. We do not discuss this situation below, because we are interested in how decentralized model integration is enabled by the application of blockchain.

IV-B1 Selected Blockchain Nodes

In some models, after the local model updates are verified by the nodes on the blockchain, only the selected nodes participate in global model integration. Those selected nodes are usually well equipped with enough computational resources or have good historical performance records.

In [Li2020c], the authors propose a committee consensus mechanism to verify the local model updates and then aggregate the global model. They argue that the election of the committee is crucial to the performance of the global model, and they introduce three committee election methods: random election, sorting by score, and multi-factor optimization. The experimental results show that the model under this mechanism can obtain performance similar to that of the conventional FL model.
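
Two of the three election methods are simple enough to sketch directly. The function names, the tie-breaking by node name, and the example scores are assumptions; multi-factor optimization is left out because the text gives no detail on the factors involved.

```python
import random


def elect_random(nodes, size, seed=None):
    """Random election: every node has the same chance of joining the committee."""
    return random.Random(seed).sample(sorted(nodes), size)


def elect_by_score(scores, size):
    """Sorting by score: take the top-scoring nodes, ties broken by name (assumed)."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:size]


# Hypothetical validation scores from the previous round.
scores = {"n1": 0.9, "n2": 0.4, "n3": 0.7, "n4": 0.7}
by_score = elect_by_score(scores, 2)
by_chance = elect_random(scores, 2, seed=1)
```

Sorting by score favours nodes with good historical performance, while random election is harder for an attacker to predict; which trade-off is preferable depends on the threat model.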

In [Lu2020c], the committee nodes, which are selected according to their registration records, are responsible for model training and aggregation. This kind of committee election lacks an evaluation of the data providers' reliability, leaving the quality of the raw data uncertain.

Selecting some nodes to participate in the model integration has two benefits: on the one hand, it avoids the existence of a central node and achieves decentralization; on the other hand, the selected nodes are usually more reliable, and the overall resource consumption can be reduced by having only them complete the model aggregation.

IV-B2 All Blockchain Nodes

When all the data providers or miners are independently involved in the aggregation of the model, such a framework is decentralized and avoids any authority center completely. This is the most commonly used framework for applying blockchain to FL.

Fully decentralized global model aggregation is usually done by miners or by data providers, i.e., local devices, on the blockchain. In flexibly coupled BCFL models, miners and data providers are not the same, and each miner aggregates the global model via an aggregation algorithm after finishing the verification of local model updates [Ma2020, Kim2019]. In the fully coupled BCFL framework, by contrast, the local devices are usually the miners, so they not only collect the data and train the local models but also verify the updates and calculate the global model [Toyoda2019, Preuveneers2018].

By replacing the central aggregator with the blockchain, the task of model integration is delegated to nodes on the blockchain, which can be miners or data providers depending on the coupling framework. In that case, BCFL can be completely decentralized so that every node can participate in model aggregation, effectively avoiding single-point-failure.
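
When every node aggregates independently, all nodes must arrive at the same global model from the same verified updates. The sketch below uses a FedAvg-style weighted average as the aggregation algorithm; that choice, the node names, and the sample counts are assumptions, since the surveyed systems may use other algorithms.

```python
def fed_avg(updates, sample_counts):
    """Average weight vectors, weighting each client by its local sample count."""
    total = sum(sample_counts[c] for c in updates)
    dim = len(next(iter(updates.values())))
    # Iterate clients in sorted order so every node sums in the same sequence.
    return [
        sum(updates[c][i] * sample_counts[c] for c in sorted(updates)) / total
        for i in range(dim)
    ]


# Verified updates as every node would read them from the ledger (hypothetical values).
ledger_updates = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
sample_counts = {"a": 1, "b": 3}
per_node = {node: fed_avg(ledger_updates, sample_counts)
            for node in ("miner-1", "miner-2", "miner-3")}
```

Because the inputs come from the shared ledger and the computation is deterministic, no node needs to trust another node's result; any single node can fail without stopping the aggregation.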

IV-B3 Summarized Lessons

  • Blockchain allows FL to modify the process of model aggregation, making the central aggregator unnecessary.

  • Whether the global model is computed by a subset of nodes or by all nodes, the integration of the model can be effectively decentralized.

IV-C Utilization of the Distributed Ledger

In the conventional FL model introduced by Google [Konecny2016a], the raw data are kept on the local devices while the local model updates are uploaded to the central aggregator. With the help of blockchain technology, FL can work effectively without the central aggregator. When the miners finish the verification work, a new block is generated and added to the blockchain, where the validated local model updates and the aggregated global model are stored [kim2019blockchain]. In this process, blockchain works as the distributed ledger, which stores the model updates and provides an accessible platform from which all qualified participants can retrieve the data. In this subsection, we discuss two aspects of blockchain as the distributed ledger in the BCFL model: data storage and data sharing.

IV-C1 Data Storage

In conventional FL, local model updates are generally transferred to the central aggregator and then stored, requiring more transfer cost and storage capacity. By incorporating blockchain to assist FL, the data storage issue in the training process can be effectively ameliorated. Blockchain is, in essence, a distributed ledger that provides a secure, traceable, and immutable way to store data. All the training-related data, including the local model updates, global model updates, and reputation of the participants, are treated as transactions of the blockchain and need to be verified by the miners. Only the validated data is recorded in the newly generated block, and the block is then added to the blockchain. By this design, the data in the distributed ledger is traceable and immutable, which means that once a transaction is added to the blockchain, it is nearly impossible for any device to change the record.
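
The immutability property comes from chaining block hashes, which the following sketch demonstrates on a toy in-memory ledger. The block layout, the JSON serialization, and the all-zero genesis hash are assumptions for illustration; consensus among nodes is not modelled.

```python
import hashlib
import json


def block_hash(index, prev_hash, transactions):
    body = json.dumps({"index": index, "prev": prev_hash, "tx": transactions}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_block(chain, transactions):
    """Add a block whose hash covers its content and the previous block's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "prev": prev, "tx": transactions,
                  "hash": block_hash(index, prev, transactions)})


def is_intact(chain):
    """Recompute every hash and check each link back to the genesis value."""
    prev = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["prev"], block["tx"])
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


chain = []
append_block(chain, [{"client": "a", "update": [0.1, 0.2]}])
append_block(chain, [{"global": [0.1, 0.2]}])
intact_before = is_intact(chain)
chain[0]["tx"][0]["update"][0] = 9.9  # tamper with a recorded update
intact_after = is_intact(chain)
```

Changing a stored update invalidates the hash of its block, and rewriting that hash would break the link from the next block, so the tampering is detectable by any node holding a copy of the ledger.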

Current research pays little attention to the concrete structure of the blockchain in BCFL. The work in [Sharma2020] describes in detail the structure of the blocks in a blockchain used for FL. A block consists of a block header, which contains information such as the model ID, data ID, timestamp, and data types, and a block body, which holds the model updates.
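As a rough illustration of this header/body split, the following sketch models a block with Python dataclasses. The field names and types are assumptions based only on the description above. The exact layout used in [Sharma2020] is not reproduced here.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockHeader:
    # Metadata fields named in the text; the types are assumed.
    model_id: str
    data_id: str
    timestamp: float
    data_type: str


@dataclass
class Block:
    header: BlockHeader
    # The body holds model updates, here as plain lists of floats.
    body: List[List[float]] = field(default_factory=list)


def make_block(model_id, data_id, data_type, updates):
    """Build a block whose header describes the updates stored in its body."""
    header = BlockHeader(model_id, data_id, time.time(), data_type)
    return Block(header=header, body=[list(u) for u in updates])


# Hypothetical identifiers and update values.
block = make_block("model-7", "data-3", "float32", [[0.1, -0.2], [0.05, 0.3]])
```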

In [Li2020c], the proposed system uses an alliance blockchain to store the data, allowing only authorized participants to access the ledger. There are two kinds of blocks on that blockchain: the model block, which stores the global model for each round, and the update block, which stores the local model updates and other learning information such as device addresses and update scores.
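A minimal sketch of a ledger with these two block kinds is given below. The class and field names are hypothetical and chosen only to mirror the description. The sketch shows how a participant might look up the latest global model and the updates submitted for a given round.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ModelBlock:
    round_id: int
    global_weights: List[float]


@dataclass
class UpdateBlock:
    round_id: int
    device_address: str
    local_update: List[float]
    score: float


AnyBlock = Union[ModelBlock, UpdateBlock]


def latest_global_model(ledger: List[AnyBlock]) -> Optional[ModelBlock]:
    """Return the most recent model block, or None if there is none yet."""
    for block in reversed(ledger):
        if isinstance(block, ModelBlock):
            return block
    return None


def updates_for_round(ledger: List[AnyBlock], round_id: int) -> List[UpdateBlock]:
    """Collect all update blocks recorded for one training round."""
    return [b for b in ledger if isinstance(b, UpdateBlock) and b.round_id == round_id]


# Hypothetical ledger contents.
ledger = [
    ModelBlock(round_id=0, global_weights=[0.0, 0.0]),
    UpdateBlock(round_id=1, device_address="0xA1", local_update=[0.2, -0.1], score=0.9),
    UpdateBlock(round_id=1, device_address="0xB2", local_update=[0.1, 0.1], score=0.7),
    ModelBlock(round_id=1, global_weights=[0.15, 0.0]),
]
```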

Iv-C2 Data Sharing

In Google’s conventional FL model, only the central aggregator can obtain the updates from the devices [Konecny2016, Konecny2016a], while in the blockchained FL model, all qualified participants can access the blockchain to retrieve and share data in support of model training. Blockchain thus provides a data sharing platform for FL to train a machine learning model with better generalization capability. Moreover, the data shared during training are the local model updates and other related data (e.g., reputation, IP address, and timestamp) rather than the raw data [UrRehman2020, Awan2019, Ma2020]. In this way, data privacy can be well protected and the efficiency of model training can be improved.

Some studies focus on designing data sharing schemes based on blockchained FL [Lu2020a, Lu2020c]. For example, [Lu2020c] builds a permissioned blockchain-based FL environment to share data among distributed industrial IoT devices. In the permissioned blockchain network, two kinds of transactions are processed: data retrieval and data sharing. The local devices communicate through the blockchain, which can ensure the security of data transmission. The super nodes on the permissioned blockchain, i.e., routers, base stations, and other facilities with strong computing power, keep the encrypted records of the local IoT devices. In addition, to improve the efficiency of data retrieval and model training, local devices with the same data type are grouped into a committee. Within each committee, the ID information of each participant is public. With this design, the data can be shared efficiently and securely. The authors argue that encryption alone cannot prevent data leakage during data sharing, so they design a request and reply protocol between the data requester and the permissioned blockchain. After the requester sends a data sharing request, the blockchain members first check whether existing records already match the request and, if so, return the result directly; if not, they train the model through the relevant committee nodes and then return the result. In this model, the blockchain provides the platform to store data and retrieve it securely. However, since this framework stores models in advance for retrieval and keeps the data of local devices on super nodes, further research is needed to investigate whether it can effectively prevent external attacks.
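The request and reply flow described above can be sketched as a lookup with a training fallback. The class name `PermissionedLedger`, the query format, and the placeholder training function are all hypothetical. The actual protocol in [Lu2020c], including encryption and committee selection, is not reproduced here.

```python
class PermissionedLedger:
    """Toy model of the request/reply flow: answer from records, else train."""

    def __init__(self, train_via_committee):
        self._records = {}                # previously stored results, keyed by query
        self._train = train_via_committee
        self.training_calls = 0           # counts how often training was needed

    def handle_request(self, query):
        # Step 1: return a stored result directly if one matches the request.
        if query in self._records:
            return self._records[query]
        # Step 2: otherwise train through the relevant committee nodes.
        self.training_calls += 1
        result = self._train(query)
        # Assumption: the new result is recorded for later retrieval.
        self._records[query] = result
        return result


def toy_committee_training(query):
    """Placeholder for committee training; returns made-up parameters."""
    return [float(len(query)), 1.0]


ledger = PermissionedLedger(toy_committee_training)
first = ledger.handle_request("temperature-forecast")
second = ledger.handle_request("temperature-forecast")  # served from records
```

In this sketch the second request triggers no training. This is the efficiency benefit of storing models in advance, and it is also why the stored records themselves become something an attacker might target.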

Iv-C3 Summarized Lessons

  • From the perspective of the learning process, blockchain provides distributed data storage and data sharing for FL. Instead of storing the data generated during learning in the central aggregator, FL only needs to store these data on the blockchain, which makes the relevant data freely available to all authenticated participants.

  • From the perspective of data security, the blockchain itself can be seen as a distributed ledger with characteristics such as immutability, auditability, and decentralization. Blockchain can record all necessary data and prevent malicious nodes from altering them. Moreover, only authenticated participants can access the data related to FL, which helps prevent privacy leakage.

Iv-D Incentive Mechanism

This subsection discusses how the incentive mechanism in BCFL ensures that participants work honestly according to the protocol, so that the final trained global model is reliable.

Iv-D1 Incentive Mechanism Design

FL offers a distributed computing solution for machine learning. However, traditional FL models cannot guarantee that all participating clients are reliable. Blockchain can address this issue by rewarding the nodes that contribute to the generation of blocks in proportion to their contributions. By incorporating a blockchain into the FL model and rewarding the participants (local devices and miners) according to a certain scheme, participants can be motivated to provide reliable training data. In addition, the incentive mechanism can penalize dishonest nodes, filtering out malicious participants.

Incentive mechanisms have been emphasized in existing studies. The work in [weng2019deepchain] designs a payment-based incentive mechanism to encourage participants to collaboratively train a deep learning model. Two properties of that incentive mechanism are introduced, i.e., compatibility and liveness. Compatibility assumes that all participants can obtain the maximum reward based on their contributions, and liveness means that all participants are willing to update both the local model and the global model. After the final global model is updated, the rewards are distributed to local devices and miners according to their contributions.

In [Toyoda2019], repeated competition is implemented to motivate the workers to obey the rules of the protocol in order to obtain the maximum profits. The basic idea is to introduce a mechanism for workers to compete for the opportunity to update models at each training round and to constrain their subsequent performance through a voting scheme. The distribution of the returns will be determined by sorting the records of the votes.
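The idea of determining returns by sorting vote records can be sketched as follows. Only the ranking step comes from the description above. The function name, the `top_k` cutoff, and the linearly decreasing rank weights are assumptions for illustration and are not the reward rule of [Toyoda2019].

```python
def distribute_by_votes(votes, budget, top_k):
    """Rank workers by votes and split the budget among the top_k of them.

    Rank weights k, k-1, ..., 1 are an assumed split rule, not one taken
    from the surveyed paper. Ties are broken by worker name.
    """
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    winners = ranked[:top_k]
    weights = list(range(len(winners), 0, -1))
    total = sum(weights)
    return {
        worker: budget * weight / total
        for (worker, _), weight in zip(winners, weights)
    }


# Hypothetical vote counts collected in one training round.
rewards = distribute_by_votes({"w1": 5, "w2": 9, "w3": 2, "w4": 7}, budget=60.0, top_k=3)
```

With these inputs the ranking is `w2`, `w4`, `w1`, so the budget is split in the ratio 3:2:1 and `w3` receives nothing.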

Reference [Desai2020] argues that money is the most popular incentive for participants in BCFL, and illustrates a penalty scheme that requires each participant to deposit a certain amount of cryptocurrency on the blockchain. When the global model is well trained, the deposits are returned to the participants, and additional rewards are distributed to encourage honest behavior. The rewards are determined by the average time participants take to submit data, with faster submissions being rewarded more. Conversely, if a participant is found to be dishonest, its deposit is forfeited.
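A minimal settlement sketch for such a deposit scheme is shown below. The source states only that deposits are returned, that faster submitters earn more, and that dishonest participants lose their deposits. The inverse-time weighting, the function name, and the parameter names are assumptions, and what happens to forfeited deposits is left unspecified.

```python
def settle_deposits(deposits, avg_submit_time, dishonest, reward_pool):
    """Return each participant's payout after training ends.

    Honest participants get their deposit back plus a share of reward_pool
    weighted by 1 / average submission time (an assumed rule). Dishonest
    participants forfeit their deposit and receive nothing.
    """
    honest = [p for p in deposits if p not in dishonest]
    speed = {p: 1.0 / avg_submit_time[p] for p in honest}
    total_speed = sum(speed.values())
    payouts = {}
    for p in deposits:
        if p in dishonest:
            payouts[p] = 0.0
        else:
            payouts[p] = deposits[p] + reward_pool * speed[p] / total_speed
    return payouts


# Hypothetical participants: "c" was caught behaving dishonestly.
payouts = settle_deposits(
    deposits={"a": 10.0, "b": 10.0, "c": 10.0},
    avg_submit_time={"a": 2.0, "b": 4.0, "c": 1.0},
    dishonest={"c"},
    reward_pool=30.0,
)
```

Here `a` submits twice as fast as `b` and therefore receives twice the reward share, while `c` loses its deposit despite being the fastest.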

Other studies distribute returns according to the calculated contribution of each participant to model training [Li2020c, Zhang2020a], or manage participants based on reputation [kang2019incentive, UrRehman2020, Zhao2020]. These studies provide ideas for future research.

Iv-D2 Summarized Lessons

  • Incorporating an incentive mechanism into the FL model to give participants certain rewards can effectively regulate and discipline their behaviors and can encourage participants to provide reliable training data.

  • Current research lacks an in-depth study of how to allocate rewards. On the one hand, a decentralized evaluation system needs to be designed; on the other hand, some defects of the blockchain itself should be taken into account when designing an incentive mechanism.

V Applications of BCFL

FL and blockchain are already being applied in many fields. Instead of exploring the real-life applications of each separately, this section investigates the applications of BCFL as a joint technology. According to current research, BCFL has initially been applied in the fields of the Internet of Things, smart cities, financial payment, and healthcare, among others. Although these studies all apply BCFL in specific usage environments, there is no general framework for it yet.

V-a BCFL for IoT

In the IoT area, devices are decentralized, so training models on them requires timely and secure data and strong model generalization capability. FL in the Internet of Things (IoT) lets numerous devices collaboratively train a global model while avoiding leakage of each device's private raw data [Du2020, Khan2020]. However, FL itself has several deficiencies (e.g., single-point-failure and lack of incentives), and blockchain technology can make the training of models for IoT devices more secure.

Research on the applications of BCFL in the IoT domain focuses on data security, resource planning, communication, and failure detection, all with the aim of enabling IoT devices to jointly train a model with good performance.

The work in [Lu2020a] introduces a BCFL model to protect privacy in the Internet of Vehicles (IoV), and [Lu2020] investigates communication efficiency and resource limitations of IoT devices under the BCFL framework. In industrial IoT (IIoT), data heterogeneity in failure detection challenges the reliability of the whole system. In [Zhang2020a], a blockchain-based FL model is proposed for failure detection in IIoT. First, an FL model is deployed among IoT devices and a central server is set up for model integration; then, data from local devices are stored via blockchain, which also provides incentives. For failure detection, a new aggregation algorithm is designed to reduce the impact of data heterogeneity by considering the distance between positive and negative classes in each dataset.

V-B BCFL for Healthcare

In the healthcare area, patient data are sensitive, so both patients and hospitals are reluctant to share their health data. FL can help train models in a distributed manner, but data leakage remains the biggest challenge [choudhury2019differential, chen2020fedhealth]. Blockchain can be deployed among patients or hospitals, allowing participants to share data without disclosing private information.

Passerat-Palmbach et al. [passerat2019blockchain] point out that the protection of patient privacy constrains researchers from analyzing health data and that existing tools are insufficient to address the issue, so they suggest using both blockchain and FL for healthcare consortia. Their model emphasizes data access, model integration, weight encryption, and auditing of the learning process. However, this study is specific to consortia, so it does not apply to most health problems, and it lacks concrete, operable solutions.

In contrast to [passerat2019blockchain], Kumar et al. [Kumar2020] offer a specific solution for COVID-19 detection via BCFL models. Hospitals train local models on their own private data and share only the weights and gradients, while the blockchain records the learning process and related data. The authors highlight patient privacy, which the BCFL framework can protect while the global model is being trained. That paper builds a secure and decentralized data sharing platform among hospitals, enabling the automatic detection of COVID-19 in a secure manner.

Studies that use BCFL in healthcare are still rare, but the research direction is promising, since a large amount of medical data has to be processed and BCFL can offer secure learning environments.

V-C BCFL for Business and Finance

Blockchain first emerged as the basis for Bitcoin, and the explosion of various blockchain-based virtual currencies in recent years in particular has elevated the status of blockchain as an underlying technology in finance and business. Meanwhile, FL can offer a distributed machine learning framework. Therefore, BCFL can provide secure and decentralized applications for the financial and business fields.

The most direct application of BCFL in the finance and business field is to provide a monetary payment method. FedCoin, introduced in [Liu2020], provides a blockchain-based peer-to-peer payment system for FL. Unlike Bitcoin, which depends on PoW, FedCoin uses the proof of Shapley (PoSap) to generate new blocks. Such a payment system can be applied to a commercial system based on FL.
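The Shapley value that gives PoSap its name measures each participant's average marginal contribution over all orders in which participants could join a coalition. The sketch below computes it exactly for a small set of clients. It illustrates only the underlying quantity, not FedCoin's consensus protocol, and the utility function and client names are hypothetical.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values by averaging marginal gains over all orderings.

    Cost grows factorially with the number of players, so this is only
    practical for very small sets.
    """
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            values[p] += utility(extended) - utility(coalition)
            coalition = extended
    return {p: v / len(orderings) for p, v in values.items()}


def toy_utility(coalition):
    """Hypothetical model quality achieved by a coalition of clients."""
    base = {"a": 0.2, "b": 0.3, "c": 0.1}
    score = sum(base[p] for p in coalition)
    if {"a", "b"} <= coalition:
        score += 0.1  # assumed synergy between the data of a and b
    return score


contributions = shapley_values(["a", "b", "c"], toy_utility)
```

The values always sum to the utility of the full coalition, so a reward pool can be divided in proportion to them without any remainder. In this toy example the synergy bonus is shared equally between `a` and `b`.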

In addition, BCFL can be applied in areas of financial investment, for example, the processing of financial big data. Customer data held by financial companies are sensitive: customers do not want to disclose their data out of concern for privacy, and companies are obliged to keep their customers’ information confidential. Therefore, companies can deploy BCFL to obtain data and train models to develop more accurate market-oriented financial products, while protecting customer privacy.

V-D BCFL for Smart City

The construction of a smart city requires a large amount of data; training reasonable models on these data makes it possible to provide better services to citizens. As in many machine learning settings, privacy and security have been constraints on the development of smart cities.

BCFL can provide a secure big data training architecture, while offering rewards based on user contributions to motivate users to provide more data. Consider a scenario in which government departments want to optimize urban traffic and need multiple devices and users to provide data and collaboratively train models. Traditional machine learning frameworks cannot guarantee privacy and provide incentives at the same time, whereas BCFL can meet both requirements.

The major advantage of BCFL for smart cities is not only that it can protect privacy and deliver incentives, but also that it can allow more devices to join, adapting to the large number of devices and users in a smart city.

Vi Challenges and Future Research Directions

While BCFL has many advantages, some challenges that may hinder the operation of the BCFL model cannot be ignored. In this section, we analyze the deficiencies of current BCFL research and suggest some potential future research topics. We argue that a good BCFL model should have high security, high training efficiency, and low computational cost. The design of BCFL is a trade-off among these three aspects, and the following analysis is organized around them.

Vi-a Privacy and Security

Security and privacy are of importance to the BCFL model, and although both blockchain and FL have privacy-preserving properties, there are still issues that may lead to privacy leakage.

Vi-A1 Anonymity

In the conventional FL model, only the central server knows the sources of the local model updates. In BCFL, however, the addresses of clients are public, and other clients can infer training behaviors from the public information on the blockchain. Moreover, in conventional FL, clients generally do not communicate with each other because their addresses are private. In BCFL, since identity information such as addresses is public, clients may be able to communicate with each other, which increases the risk of collusion among clients.

Vi-A2 Shared Data

Blockchain stores the blocks that contain the model updates in a chained structure, and all members of the blockchain can access and download the data from the public distributed ledger. In BCFL, clients can obtain information about other members from the blockchain. When BCFL uses a public blockchain, there is no access restriction, so information about members may be available to external devices, threatening the security of the whole system. Data sharing can speed up model training and make it easier for clients to perform model updates, but the associated data security risks cannot be ignored.

Vi-A3 Malicious Attack

In the decentralized BCFL model, there is no central authority to regulate the behavior of participants, so there is a risk of attacks by potentially malicious participants. On the one hand, attacks may come from the blockchain system, such as forking, double spending, and selfish mining. Forking is one of the most common attacks: the attacker tries to obtain more profit by replacing the most trusted chain (i.e., the longest chain) with an alternative chain. Double spending occurs when a currency is spent twice. A selfish mining attack, also named a block withholding attack, happens when an entity validates a block but does not broadcast it to the network. On the other hand, attacks from the FL system, including data poisoning and inference attacks, will hinder the deployment of BCFL. Malicious users can launch a data poisoning attack by training local models on dirty data and then uploading the biased local models to the aggregator, making the parameters of the global model inaccurate. Even when the uploaded parameters are encrypted, malicious users may still deduce the real information by analyzing them, so inference attacks may cause privacy leakage in the FL system.

Since malicious attacks deteriorate the reliability of BCFL, future research can focus on the combination of the two technologies to reduce the risk of being attacked. For example, reasonable mechanisms can be designed to use blockchain for the selection of users and data.

Vi-B Training Efficiency

The goal of FL is to train a global model through the collaborative work of multiple devices. Not only the accuracy of the global model but also the time and computational cost of training should be taken into account.

Vi-B1 Reliability of Data

Since we cannot guarantee that all participants are honest, it is unreasonable to assume that all data are reliable. We cannot ignore the impact of unreliable data on the global model. At least three measures are considered to improve the reliability of data:

  • Perform client selection before training to exclude potentially dishonest nodes. The type of blockchain needs to be considered in client selection. When BCFL uses a public blockchain, any device can join the training without permission. In this case, a selection scheme can decide whether to allow a device to continue participating based on its performance in previous rounds, which requires evaluating device performance over one or multiple rounds. In fact, many verification mechanisms have adopted this approach. With a permissioned blockchain, potential devices need to obtain permission to join the training, so malicious attacks and invalid data can be reduced to some extent.

  • Design efficient verification mechanisms that speed up processing to reduce time consumption and that improve verification accuracy so that only qualified data participate in model integration. Current research lacks a detailed study of the verification process. The devices involved in verification need to be considered: whether data are verified among clients or by miners on the blockchain, privacy leakage must be prevented. Different verification mechanisms can also affect the security of the model.

  • Design a reasonable incentive mechanism that encourages participants to provide truthful data and penalizes those who are dishonest. Most current research focuses on how to distribute rewards, and we believe that innovation can be made from the perspective of punishment. In the blockchain ecosystem, the behavior of clients can be constrained by requiring them to deposit a portion of virtual currency (e.g., Bitcoin or Ethereum) before training.
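The first measure above, selecting clients based on their performance in previous rounds, can be sketched as a simple threshold rule. The score scale, the threshold, and the `window` parameter are assumptions for illustration. How scores are obtained (for example, by a verification mechanism) is outside the scope of the sketch.

```python
def select_clients(history, threshold, window=1):
    """Keep clients whose mean score over the last `window` rounds is high enough.

    `history` maps a client ID to its per-round scores, oldest first.
    Clients with no recorded rounds are excluded.
    """
    selected = []
    for client, scores in history.items():
        recent = scores[-window:]
        if recent and sum(recent) / len(recent) >= threshold:
            selected.append(client)
    return sorted(selected)


# Hypothetical per-round scores for three clients.
history = {
    "c1": [0.9, 0.8],
    "c2": [0.9, 0.2],
    "c3": [0.2, 0.7],
}

last_round_only = select_clients(history, threshold=0.6, window=1)
two_round_average = select_clients(history, threshold=0.6, window=2)
```

Evaluating a single round admits `c3`, whose latest score recovered, while averaging over two rounds keeps only `c1`. The choice of window therefore trades tolerance for occasional bad rounds against robustness to intermittently dishonest devices.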

Vi-B2 Communication Latency

Communication latency occurs in both FL and blockchain networks and is a constraint on the development of both technologies. Latency analysis has received considerable attention in BCFL, and a number of studies have already proposed solutions. For example, Kim et al. [kim2019blockchain] suggest reducing the computational difficulty of PoW to lower latency.
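The link between PoW difficulty and block generation time can be seen in a toy mining loop. This is a generic hash-puzzle sketch, not the latency model of [kim2019blockchain]. The function name, the input string, and the difficulty values are assumptions. With a difficulty of d leading zero bits, the expected number of hash attempts is 2^d, so each bit removed halves the expected work.

```python
import hashlib


def mine(block_data, difficulty_bits):
    """Search for a nonce whose SHA-256 digest has `difficulty_bits` leading zero bits.

    Returns the nonce and the number of hash attempts that were needed.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode("utf-8")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, nonce + 1
        nonce += 1


# Hypothetical block payload mined at two difficulty levels.
easy_nonce, easy_attempts = mine("round-1-updates", difficulty_bits=4)
hard_nonce, hard_attempts = mine("round-1-updates", difficulty_bits=12)
```

Because any nonce that satisfies the harder puzzle also satisfies the easier one, the easy search never needs more attempts than the hard one for the same data. The trade-off is that a lower difficulty also makes it cheaper for an attacker to produce competing blocks.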

Vi-B3 Asynchrony

During the training process, the times at which participants enter and exit affect the effectiveness of training. The time to join training can be specified by a participant selection mechanism; however, several factors can cause participants to drop out of training early, such as network issues, damaged devices, and limited storage space. This problem affects the distribution of rewards as well as the correctness of the global model.

Vi-C Training Cost

Vi-C1 Storage

In the conventional FL model, local model updates are stored on the aggregator, while in BCFL, data are stored on the blockchain. Meanwhile, every client can also keep a local copy of the blockchain and update it continuously, which increases the storage cost. Devices with insufficient storage capacity may not be able to continue participating in training as the data stored in the blockchain grow. Beyond data storage, how clients and miners can efficiently retrieve data from the blockchain is another research direction.
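A back-of-the-envelope estimate shows how quickly a full local copy of the ledger can grow. The formula below assumes that each round stores every client's full local update plus one global model, with a fixed per-round overhead. These assumptions, the parameter names, and the example numbers are illustrative only and do not describe any particular BCFL system.

```python
def ledger_size_bytes(num_clients, params_per_model, rounds,
                      bytes_per_param=4, overhead_per_round=1024):
    """Estimate ledger size when every round stores all updates in full.

    Each round is assumed to add num_clients local updates and one global
    model, each with params_per_model parameters, plus fixed overhead.
    """
    model_bytes = params_per_model * bytes_per_param
    per_round = (num_clients + 1) * model_bytes + overhead_per_round
    return rounds * per_round


# Hypothetical deployment: 100 clients, 1M-parameter model, 50 rounds.
total = ledger_size_bytes(num_clients=100, params_per_model=1_000_000, rounds=50)
total_gb = total / 1e9
```

Under these assumptions the ledger reaches roughly 20 GB after 50 rounds, which is already beyond the spare storage of many IoT devices. This motivates designs in which resource-limited clients do not hold a full copy of the chain.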

Vi-C2 Computing Consumption

FL usually requires multiple rounds of iterations to get the final global model, so the cost of model training is usually related to the number of training iterations. The trade-off between model accuracy and cost has been a topic for researchers. Compared with traditional FL models, BCFL requires not only local model training, model aggregation and updating, but also data validation and block generation. These activities consume a large amount of computational resources and increase the cost of training models.

In the FlC-BCFL model, miners and clients are different devices, so the cost calculation needs to be based on their different roles. Miners spend a significant amount of computation on running consensus protocols, i.e., mining, which is a computation-intensive process, so lightweight consensus protocols can be designed to reduce the computational difficulty. In addition, the overall cost can be lowered by reducing the cost of data verification. For clients, reducing the number of training sessions while maintaining training quality can reduce the cost.

The training cost of the model is not only related to the accuracy of the global model, but also affects the security of the whole BCFL model. For example, if the difficulty of generating blocks is reduced or the process of verification is simplified, although the computational cost can be reduced, it may lead to security problems. In addition, from the overall perspective of BCFL, the blockchain and FL need to operate in a coordinated manner, and how to allocate resources will also affect the computational cost. Therefore, computational cost is a topic that needs to be addressed gradually in future research.

Vii Conclusion

In this paper, a detailed investigation of blockchained FL (BCFL) is provided. We first introduce blockchain and FL respectively. Then we investigate the foundations of BCFL, including the architecture of BCFL, the blockchain on BCFL, and the devices on BCFL. We also analyze four functions of BCFL, i.e., verification of model updates, aggregation of global model, utilization of the distributed ledger, and incentive mechanism. After that, we survey the applications of BCFL in real life. Finally, we discuss the existing challenges of BCFL and give the corresponding future research directions.

Blockchain and FL are both emerging technologies, and their combination can efficiently address the security and privacy issues of distributed machine learning. This paper is the first detailed survey on BCFL, and we believe that BCFL will be used more often in the future. We hope our work will bring new ideas for future BCFL research.

References