Selecting Reliable Blockchain Peers via Hybrid Blockchain Reliability Prediction

10/31/2019 ∙ by Peilin Zheng, et al. ∙ 0

Blockchain and blockchain-based decentralized applications are attracting increasing attentions recently. In public blockchain systems, users usually connect to third-party peers or run a peer to join the P2P blockchain network. However, connecting to unreliable blockchain peers will make users waste resources and even lose millions of dollars of cryptocurrencies. In order to select the reliable blockchain peers, it is urgently needed to evaluate and predict the reliability of them. Faced with this problem, we propose H-BRP, Hybrid Blockchain Reliability Prediction model to extract the blockchain reliability factors then make personalized prediction for each user. Large-scale real-world experiments are conducted on 100 blockchain requesters and 200 blockchain peers. The implement and dataset of 2,000,000 test cases are released. The experimental results show that the proposed model obtains better accuracy than other approaches.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 2

page 7

Code Repositories

BlockchainReliabilityPrediction

Data and Implement of Paper


view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

Blockchain is firstly proposed by Bitcoin (Nakamoto, 2008). It consists of a continuously growing list of records, called blocks, which are linked and secured using cryptography. In a period of time, each peer in the P2P transaction network records the transactions and package them into a block to join the blockchain. The blockchain is maintained by all the peers in the P2P network through a consensus protocol. In Bitcoin-like blockchain systems, after receiving the previous block, the peer will try to calculate the hash for the next block as soon as possible to get the rewards, such as cryptocurrencies. This competition is so called mining and the mining users are called miners. In Bitcoin-like mining, if a user connects to unreliable peers that returns the wrong block or old block, the user will never gain the cryptocurrency reward.

Blockchain-based decentralized applications (DApp) have gained a lot of attentions from both industry and academia in recent years (Zheng et al., 2017). Most DApp users do not run a blockchain peer by themselves, but interact with the third-party peers. However, this kind of third-party peers had been reported to be unreliable (12), leading to bad user experience and even cryptocurrency lost by users’ misoperation.

Figure 1. Real-world Blockchain Peers and Requesters

Therefore, it is necessary for blockchain users to select the more reliable peers. The effect of selecting the peers can be summarized as two folds: (1) For blockchain mining: The blockchain users’ mining profit is proportional to reliability of the peers connected to it. (2) For blockchain-based application users: The reliability of the peers determines the correctness and delay of transactions. Selecting reliable peers will help reduce the delay and avoid cryptocurrencies lost by repeated transactions. Thus there is great economic benefit and urgency to select reliable blockchain peers. There are more than 20,000 blockchain peers at the same time in the real-world. But single user cannot connect to all the peers in the meanwhile to evaluate their reliability so that the user need to predict the reliability.

There are some difficulties of blockchain reliability prediction As for blockchain reliability, Zheng et al.(Zheng et al., 2018) and Dinn et al. (Dinh et al., 2017) propose the ways to evaluate the availability and performance of blockchain systems. However, these methods enable only the owner of the peer to know the reliability, thus do not work for other users. And, since the network situation is different for each user, the observed reliability of the same peer could be different for different users, which will be shown in Section 5.2. To attack this challenge, a personalized reliability prediction method is needed.

In this paper, we propose a hybrid collaborative reliability prediction model for blockchain systems, called H-BRP

. H-BRP does not predict the success rate of the blockchain peers directly. The main idea of H-BRP is to extract blockchain-related factors from the request history (e.g., block hash, block height). Then it uses the relationship between similar blockchain users and peers to do the collaborative prediction with hybrid linear regression. In this way, H-BRP obtains personalized prediction results for different users with higher accuracy than other approaches, as the real-world experiment shows. As shown in Figure 

1, we deploy 100 blockchain requesters to evaluate and predict the reliability of 200 real-world blockchain peers, showing the feasibility and effectiveness of the model.

In summary, the main contributions of this paper are summarized as follows:

  • We propose H-BRP, Hybrid Blockchain Reliability Prediction model for blockchain systems. This model can extract blockchain factors related to reliability. And, it uses the relationship between similar users and peers for personalized prediction.

  • We conduct real-world experiment with 2,000,000 test cases from 100 requesters to 200 blockchain peers as shown in Figure 1

    . The results show the effectiveness of the proposed model. The implement and dataset will be open-source.

The rest of the paper is organized as follows. Section 2 introduces the basic concepts of blockchain systems. Section 3 describes the motivating example of reliability prediction of blockchain systems. Section 4 proposes the details of the Hybrid Blockchain Reliability Prediction model, including the data processing, training and prediction. Section 5 introduces the implement of H-BRP and the experiment results. Section 6 provides the related work and discussion about blockchain reliability. Section 7 concludes the paper and gives the future work.

2. Basic Concepts

This section introduces the basic concepts of the blockchain and decentralize application.

In a narrow sense, the blockchain is a kind of data structure. The concept of the blockchain was firstly proposed as the underlying storage for peer-to-peer payments in Bitcoin(Nakamoto, 2008). As shown in Figure 2, every block contains the transactions in a period of time. Then every block is joint to a chain-like data structure named blockchain. Each peer in the peer-to-peer network maintains a blockchain by itself. And they keep it the same with each other via consensus protocols. Each block has a hash value of itself and this hash value is contained in the next block to make it tamper-resistant and traceable.

Figure 2. Data Structure of Blockchain(Zheng et al., 2018)
Figure 3. Different Thrid-party Blockchain Peers in DApps

In a wide sense, the blockchain can be regarded as a new kind of distributed system. The basic concepts of blockchain are listed as follows:

  • Transaction: A transaction represents a message to change the ledger, such as transferring cryptocurrencies. If someone wants to send Bitcoin to others, he should broadcast the transaction to the p2p network.

  • Block: Block is a data package consists of the transactions in a period of time. Each block has a hash value of it self (Nakamoto, 2008; Buterin and others, 2013; Wood, 2014) so that the hash value can be used to check the authenticity of the block.

  • Chain: The chain consists of all the blocks that are linked by their hash. In the Bitcoin blockchain(Nakamoto, 2008), every block is generated after the previous one so that they record the hash of the previous block. This chain-like structure is shown in Figure 2.

  • Mining: In public blockchain systems, after receive the previous block, the peers will try to find out a nonce for the next block to get the rewards, such as Bitcoin. This process is so called mining and the peers are called miners.

  • Decentralized Application: Blockchain-based decentralized applications (DApp) use blockchain as the underlying technology (17; R. S. (2016); 20; H. Yung-chen (2017); R. Chen. (2017)). Most DApp users connect to third-party peers to get the blockchain data. Figure 3 shows different third-party blockchain peers in DApps. If the user connect to an unreliable peer, the DApp would not work.

In summary, blockchian is a growing chain-like data structure maintained by peers in P2P network. Each peer in the P2P network wants to get the latest (or so-called highest) correct block. Therefore, a blockchain peer is reliable if it can return the latest block for the requesters.

Figure 4. Motivating Example of Blockchain Reliability Prediction

3. Motivating Example

In this section, a motivating example of blockchain reliability prediction is given. Blockchain users can choose either to run blockchain peers by themselves or use the third-party peers to interact with the blockchain systems. As shown in Figure 4, no matter the user run a peer by himself or not, he should select some blockchain peers to connect to. The challenge is that, there are more than 20,000 peers online at the same time, and he cannot test all of them to see which one is reliable for him. The influence of the reliability of the blockchain peers can be divided into two folds: for blockchain mining and for blockchain-based application user.

3.1. For Blockchain Mining

The premise of blockchain mining is that the user should get the latest previous block. If the user connects to the peers with low reliability, he cannot get the previous block in time. Then the user will waste the computing resources in computing at the wrong block without any rewards of cryptocurrencies. To improve the block synchronization speed and economic benefit, it is vital to evaluate and predict the reliability of the blockchain peers.

3.2. For Blockchain-based Application User

Assumed that the user in Figure 4 is like most blockchain-based application users that he connect to third-party blockchain peers. Then he need to select the most reliable one since unreliable peer will cause consequences. For example, one of the most famous wallet of cryptocurrencies called imToken had been reported to fail to sync with the Ethereum network (12)

. At that moment, the users think wrongly that their transactions are not confirmed by the network, then they send other repeated transactions again and again, which causes the loss of their money.

Thus it is really important for the blockchain users to know which blockchain peer is more reliable to avoid the loss of cryptocurrencies and improve the experience with the blockchain.

And, in reality, a user cannot connect to all the blockchain peers in the meanwhile, which means that there are lots of peer of unknown reliability. Thus it is necessary to predict the unknown reliability of blockchain peers.

In summary, the motivations of this paper are to evaluate and predict the reliability of blockchain peers and help select the reliable peers to improve the blockchain synchronization speed, avoid loss of money, and improve the experience of blockchain.

4. Methodology

In this section, the methodology of Hybrid Blockchain Reliability Prediction model will be introduced, including the architecture and the steps in details.

Figure 5. Architecture of the Blockchain Reliability Prediction

4.1. Architecture

The architecture of the blockchain reliability prediction is shown in Figure 5. It consists of 3 major roles as follows:

  • Blockchain Peers: Blockchain peers are the nodes which maintaining the blockchain in the P2P network.

  • Blockchain Requesters: Requesters are the nodes installed with the H-BRP data collecting program, which will randomly request the blockchain data from some of the peers in a period of time. The requesters can be considered as the users in the blockchain systems.

  • Data Collector & Predictor: Data collector is a central server that collecting all the test cases from the blockchain requesters. Then the data will be used to evaluate and predict the reliability of the blockchain systems.

The main idea of this architecture is that users can contribute their blockchain request history to a central data collector. Then the collector will summarize the historical data and do personalized reliability prediction for each user and peer. As shown in Figure 5, there are 4 steps of Hybrid Blockchain Reliability Prediction: Block Request Testing, Upload Test Cases, Reliability Prediction, and Select Reliable Peers.

4.1.1. Block Request Testing


In a period of time, a requester has more than one blockchain peer as candidate to connect to. Before knowing the candidates are reliable or not, the requester needs to connect to them. However, limited by the network conditions, the requester cannot request all the candidates in the meantime. Therefore, H-BRP proposes random batch block request testing for blockchain peers.

The random batch request testing include several stages as follows:

  1. Given a batch size n, and the time period of t seconds,

  2. For each time period, each requester selects n candidates from the list randomly.

  3. Each requester requests the latest block from the selected candidates, parses it and records the height and hash to the local storage.

4.1.2. Upload Test Cases


After above stages, a requester achieves the raw data in a tuple of ClientIP, BatchTime, PeerIP, StartTime, EndTime, Height, BlockHash. The example is shown in Table 1. In particular, H-BRP records the BatchTime to compare the test cases that are sent at the same time to see which one returns the higher block. H-BRP also records the StartTime and EndTime to see how long the round-trip time is during every test case. And, H-BRP records the Height and BlockHash in order to backtrack that whether the peer returns a correct block.

RequesterIP BatchTime PeerIP StartTime EndTime Height BlockHash
103.49.160.131 1532328744 167.99.208.120 1532328744 1532328744 6232293 0xa8b2b…
103.49.160.131 1532328744 219.117.201.187 1532328744 1532328744 null null
103.49.160.131 1532328744 116.62.100.69 1532328744 1532328744 5936957 0x073d4…
103.49.160.131 1532328744 147.75.80.165 1532328744 1532328745 6014476 0x7793a…
103.49.160.131 1532328744 47.75.9.16 1532328749 1532328749 6013794 0x8050b…
Table 1. Example of Random Batch Block Request Testing

When there are enough test cases, the requester can choose to upload the test cases to the collector. The more test cases that the requesters contribute to the collector, the more accurate reliability prediction will be done.

4.1.3. Reliability Prediction


After receiving enough test cases from users, the collector/predictor can choose different predicting model to do reliability prediction for the users and peers. Reliability prediction is the key step in the whole architecture. As shown in Figure 5, H-BRP model includes three substeps: H-BRP Matrix Generation,Collaborative Prediction, and Hybrid Training/Prediction. The detailed model and implement will be proposed in the next subsection. Here are the main ideas of it.

  • H-BRP Matrix Generation: This substep can be regarded as the data preprocessing. H-BRP proposes some factors that are related to blockchain reliability. It transfers the data from a list of test cases into some metrics. In this substep, each factor is extracted into a requester-peer matrix.

  • Collaborative Prediction: Since blockchain peer shows different network delay to different users, the reliability observed by different users could be different. The case study in Section 5 will show this difference. Therefore, it is necessary for the model to do personalized prediction for different users. To attack this problem, this substep is to find out similar blockchain users or peers, and then predict the unknown reliability factors for them.

  • Hybrid Training/Prediction: H-BRP assumed that there is a mapping between the reliability and factors extracted in previous substeps. Thus the reliability can be predicted based on the prediction of the related factors. In this substep, H-BRP first trains a linear regression model using the known reliability and factors. After that, H-BRP uses this model and the predicted factors to do reliability prediction.

Sincerely, directly predicting the reliability without extracting the factors could be chosen. But direct reliability prediction will make it lose lots of valuable information from the source data. That is why H-BRP extracts the factors from the test cases and do collaborative prediction by finding similar blockchain users/peers.

In summary, the key idea is to maximize the use of available information, such as blockchain features and users’ similarity. The detailed model will be propose in next subsection.

4.1.4. Select Reliable Peers


Based on the personalized reliability prediction result, the users can choose the blockchain peers with more reliability. For blockchain-based application users, the most reliable peer should be chosen. As for blockchain miners, they can choose top K peers ranked by predicted reliability.

4.2. Hybrid Block Reliability Prediction Model

In this subsection, the details of Hybrid Block Reliability Prediction Model will be described, as shown in Figure 6.

Figure 6. Details of Hybrid Block Reliability Prediction Model

4.2.1. Blockchain Factor Matrix Generation


After finishing the block request testing, the data collector use the request data to generate the Blockchain Factor Matrix.

First, Set up a blocks-tolerance value as MaxBlockBack to represent the max tolerance for block backwardness of the peer in the blockchain. Then set up a time-tolerance value as MaxRTT to represent the max round-trip time for the peer.

The Success Rate Matrix is generated as follow:

For each requester R and peer P , set up a success counters for reliable requests as SuccessRequest and a failure counters as FailureRequest.

Next backtrack each batch of block requests to the peer, the peer responses successfully if and only if it:

  1. Returns right block: The block hash is right in the corresponding block height on the main blockchain.

  2. Returns recent block height: The block height subtracted from the highest one in the batch is no more than MaxBlockBack. If MaxBlockBack is set to 0, it requires the peer is reliable only when it returns the highest block in the batch.

  3. Returns in time: The round-trip time of the request to the peer is no more than MaxRTT.

If the blockchain peer P responses successfully in a batch, then count it into SuccessRequest, otherwise into FailureRequest. Then the success rate of requester R to peer P can be calculated by :

(1)
(2)

After that, a matrix of success rate is achieved. As shown in Figure 6, the gray area is the known success rates, while the yellow area is the unknown success rates, which is needed to predict. Some research in service computing use the success rate or failure rate to predict the unknown entries in the matrix to predict the reliability of service. However, in blockchain reliability, this would lose some information from the source data because it do not take blockchain factors into account. Therefore, H-BRP generates three matrix corresponding to the above three blockchain related factors:

  1. Right Block Matrix:

    Right Block Matrix reflects on the rate at which the P returns the correct block to R. It can be generated by the following equation:

    (3)

    where is the counter of the requests that return the right block from P to R.

  2. Recent Height Matrix:

    Recent Height Matrix reflects on the rate at which the P returns the recent height to R. It can be generated by the following equation:

    (4)

    where is the counter of the requests that return the recent height from P to R.

  3. Round-trip Time Matrix:

    Round-trip Time Matrix reflects on the average round-trip time of the block requests between P and R. It can be generated by the following equation:

    (5)

    where is the round-trip time of the request from R to P in batch k.

In summary, in this phase, H-BRP generates one Success Rate matrix and three blockchain related factor matrices. The main idea of the matrix generation is to extract more information related to blockchain in the source data. Thus the prediction using this data will be more accurate.

4.2.2. Collaborative Prediction


After generating the matrices, the RightBlock Matrix, RecentHeight Matrix and RoundTripTime Matrix will be used into three collaborative filtering models. The target is to predict the missing value in the blank of the matrix. As shown in Figure 6, the target in this step is to predict the factors. It is assumed that the three events (right block, recent height, and in time) corresponding to the matrix are independent. Thus every factor matrix will be predicted through the following phases independently.

(1)Similarity Calculation

In each matrix, H-BRP employ PCC to calculate the similarity between Blockchain Peers P and P by using:

(6)

where is a set of blockchain requesters that connected to both the blockchain peers and , and

is the average value of the vector

in the matrix

(2)Similar Blockchain Peer Selection

After calculating the similarity values between the peers, a set of similar peers can be identified by setting a parameter k to select Top-k peers as similar peers to one specific peer.

To predict a missing factor entry in the factor matrix, a set of similar blockchain peers with the blockchain peer P can be identified by:

(7)

where is the largest PCC value with blockchain peer P and can be computed by Equation (6).

(3)Unknown Factor Prediction

Employing the similar blockchain peers , H-BRP adopts item-based approaches (Sarwar et al., 2001) (named as IPCC) to predict the missing value by:

(8)

where and are average value of the blockchain peer and observed by different requesters, respectively, and is the significant weight of the similar blockchain peer , which defined as:

(9)

4.2.3. Hybrid Training/Prediction


After the collaborative filtering prediction of the three-factor matrix, the factor prediction is achieved. In this step, the predicted factors will be used to predict the unknown success rate.

(1)Hybrid Training

First, it is assumed that there is a mapping between Success Rate and the three factors (RightBlock, RecentHeight, and RoundTripTime):

(10)

Thus the mapping can be transferred to the matrix by the above equations.

And H-BRP sets up a regression model to fit this mapping. As shown in Figure 6, as the gray area and arrows show, the known SuccessRate (gray area) and the known value in the three-factor matrices are used to train the regression model. During the training, the model learns from this hybrid data.

(2)Success Rate Prediction

After the regression training, the regression model can represent the mapping between Success Rate and the three factors (RightBlock, RecentHeight, and RoundTripTime). Thus the factors predicted in the collaborative prediction can be input into the model, with the output as the success rate. As shown in Figure 6, the predicted three-factors matrices (the white area) are input into the regression model and come out with the SuccessRate matrix predicted (the yellow area).

(3)Predict Reliabity

By the above steps, H-BRP obtain the predicted Success Rate from blockchain requester R to Blockchain Peer P. To predict the reliability of P observed by R, H-BRP adopts the commonly used exponential reliability function (Lyu, 1996):

(11)

where (failure-rate) is the rate of failures of request during a certain time duration, and is the time period for which the reliability is to be calculated.

The value of can be calculated by:

(12)

Thus the reliability from R to P can be calculated by:

(13)

5. Implement and Experiment

In this section, we implement and evaluate the proposed approach based on a real-world dataset, which is collected from 100 requesters to 200 blockchain peers. First, we introduce the details of implementation and dataset description, and then the evaluation & analysis of three research questions (i.e., reliability, accuracy, parameters impacts) are introduced, respectively.

5.1. Implement and Dataset

H-BRP is implemented by ShellScript, NodeJS and Python. Random Batch Block Request Testing is implemented by ShellScript to enable it to collect the data in all Linux server. Although there are some Remote Procedure Call testing frameworks that can be used, but most of them have lots of dependencies. And the dependencies are different in different Linux versions case by case. If a user wants to install H-BRP quickly to his client, the program should be light enough. ShellScript can meet all these requirements. And Matrix Generation is implemented by NodeJS and Python. More specifically, the NodeJS program is used to parse and analysis the data from the main blockchain to check which block request returns the right block. And the Python program is used to generate the SuccessRate Matrix, RightBlock Matrix, RecentHeight Matrix, and RoundTripTime Matrix and predict the blockchain reliability.

As for the blockchain requesters and peers. PlanetLab111http://www.planetlab.org is an organization that provided more than 1000 nodes all over the world. In this paper, 61 of them are selected to send the block requests. Vultr222http://www.vultr.com is a platform that provides cloud server leases. In this paper, 35 Linux servers (running the Cent OS) are rented from Vultr. Besides another 4 Linux servers owned by the research team, H-BRP deploys the requester program on 100 servers in total as the requesters. Ethernode333http://www.ethernode.org is a website that showing all the blockchain peers of Ethereum over the world. In this paper, 200 blockchain peers are selected to be test. The blockchain peers are from 21 countries and the requesters are from 15 countries.

We deploy the requester with the batch size as n=5 and the time period as t=5. After deploying the requesters, each requester sends random batch block requests to 5 blockchain peers in the period of 5 seconds. Finally, with the H-BRP implement, a dataset of over 2,000,000 test cases from 100 requesters to 200 blockchain peers is obtained. All the implement and dataset will be released on the website. For double-blind review, we upload the examination result to an anonymous github444https://github.com/forreview/H-BRP .

The experiment of analysis and prediction is conducted on the dataset to answer the following research questions:

Question 1: How is the reliability of the blockchain system evaluated by H-BRP?

Question 2: How accurate is the method proposed compared with other reliability prediction methods?

Question 3: What is the impact of different parameters set in the model?

5.2. RQ1: Case Study

In this subsection, H-BRP parses and analysis the obtained dataset to give some cases study to see the reliability of the blockchain system. The matrix generation is under the experimental settings of MaxBlockBack=12, MaxRTT=2000. After matrix generation, the dataset is presented as a 100 200 SuccessRate matrix.

Figure 7. Success Rate Distribution of H-BRP Dataset

From Equation 13 we learn that the higher success rate means the higher reliability. Figure 7 shows the success rate distribution of 20 blockchain peers and 4 requesters. In this case, the blockchain peers show different reliability to different requesters. This is mainly caused by the network situation that some requesters cannot receive the block from some remote peers due to the long network delay Thus the reliability they observed could be different. Since different requesters have different observed reliability to the same peer, it is required to make personalized prediction.

RequesterSuccess RatePeer 147.75.111.247 147.75.100.193
130.194.252.8 0.4873 0.0802
130.194.252.9 0.4444 0.0327
192.33.90.67 0.1783 0.7079
194.29.178.14 0.1929 0.7014
Table 2. Case Study of H-BRP Dataset
Figure 8. Average Succsess Rate of Blockchain Peers
Figure 9. Average Succsess Rate of Blockchain Requesters

Moreover, we extract 2 peers and 4 requesters to see how is the reliability exactly, as shown in Table 2. As for the requesters in 130.194.252.8 and 130.194.252.9, the success rate when they connect to 147.75.111.247 is much higher than 147.75.100.193. However, when considering to 192.33.90.67 and 194.29.178.14, the comparison of success rate is opposite. From this case, we can learn that the similar users may observe similar reliability of similar peers. That is why H-BRP obtains the relationship between similar users and peers to do collaborative prediction.

Parameter Method Density=0.30 Density=0.50 Density=0.65 Density=0.80 Density=0.95
UMEAN 0.3789 0.3774 0.3765 0.3758 0.3755
IMEAN 0.0925 0.0922 0.0925 0.0921 0.0918
MaxBlockBack=0, UPCC 0.2823 0.2791 0.2769 0.2754 0.2758
MaxRTT=1000 IPCC 0.0821 0.0777 0.0764 0.0758 0.0748
UIPCC 0.0851 0.0806 0.0791 0.0782 0.0773
H-BRP 0.0803 0.0731 0.0712 0.07 0.0672
UMEAN 0.4547 0.4531 0.4519 0.4513 0.4508
IMEAN 0.1171 0.1168 0.1167 0.1162 0.1166
MaxBlockBack=12, UPCC 0.3627 0.3591 0.3566 0.3552 0.3559
MaxRTT=1000 IPCC 0.1009 0.0949 0.0919 0.0908 0.092
UIPCC 0.1053 0.0994 0.0963 0.0951 0.0961
H-BRP 0.1031 0.0925 0.0899 0.0879 0.0845
UMEAN 0.4735 0.472 0.4707 0.4702 0.47
IMEAN 0.0848 0.0851 0.0848 0.0845 0.085
MaxBlockBack=12, UPCC 0.3793 0.3775 0.3758 0.3752 0.376
MaxRTT=2000 IPCC 0.081 0.0785 0.0765 0.0748 0.0751
UIPCC 0.0887 0.0864 0.0845 0.0829 0.0831
H-BRP 0.0654 0.0567 0.0529 0.0507 0.0464
UMEAN 0.5088 0.5081 0.5067 0.5065 0.5067
IMEAN 0.0938 0.094 0.0934 0.0931 0.0941
MaxBlockBack=100, UPCC 0.4595 0.4585 0.4569 0.4564 0.4574
MaxRTT=5000 IPCC 0.0921 0.0887 0.0842 0.0806 0.0774
UIPCC 0.1022 0.0993 0.0954 0.0923 0.0895
H-BRP 0.0648 0.0562 0.0524 0.0494 0.0433
Table 3. Comparison of RMSE of Blockchain Reliability Prediction Approaches

We rank the blockchain peers to see how their reliability is. The average success rate of block requests is shown in Figure 8. In these 200 blockchain peers, half of them show very low reliability to all the requesters, which means that they do not always return the latest block in time. This is mainly caused by that the block propagation in blockchain system is slow. Once a block is mined, it takes it a period of time to be propagated to the whole network. If the P2P network connectivity is not good, some of the peers will not receive the latest block. On the other hand, there are some large blockchain miners that generate most of the blocks. Thus the peers that are closer to the miners will have the higher chance to receive the latest block, resulted in the difference of reliability.

As for the blockchain users, we calculate out the average success rate of 100 requesters to see the reliability observed by the users. Figure 9 shows the result that the average success rate is lower than 0.3. It means that, if a user connect to the blockchain peers randomly, his chance to get the correct latest block is quite low. Compared to the most reliable peer shown in Figure 8, if the user connect to the most reliable one, the chance to get the block will be increased by more than two times. Therefore, before selecting blockchain peers, predicting the reliability and choosing the reliable peers will truly help the users to get the latest block.

5.3. RQ2: Accuracy of Different Method

To study the prediction performance, we compare our approach (H-BRP) with five other ones in reliability prediction: user-mean (UMEAN), item-mean (IMEAN), user-based approach using PCC (UPCC) (Breese et al., 1998), item-based approach using PCC (IPCC) (Sarwar et al., 2001), and user-item-based approach (UIPCC) (Zheng and Lyu, 2010)

. UMEAN employs the average success rate of the current requester on other blockchain peers for the prediction, while IMEAN employs the average success rate of the blockchain peers observed by other requesters for the prediction. UPCC only employs similar blockchain requesters for the failure probability prediction, while IPCC only employs similar blockchain peers for the prediction. And UIPCC is the combination of UPCC and IPCC. In this paper, those approaches are compared with H-BRP, to predict the same training Success Rate Matrix.

For each round, first we randomly remove the entries in the generated Success Rate Matrix to transfer it into the target density. After that, the removed entries are set as the test value. The same training matrix is the input of every reliability prediction approach, while the predicted value is the output. And the output value is compared with the test value to measure the prediction accuracy.

Root Mean Square Error (RMSE) metric is employed to measure the prediction accuracy of different approaches. RMSE is defined as:

(14)

where smaller RMSE values indicate better prediction accuracy.

(a) MaxBlock’=0, MaxRTT=1000
(b) MaxBlock’=12, MaxRTT=1000
(c) MaxBlock’=12, MaxRTT=2000
(d) MaxBlock’=100, MaxRTT=5000
Figure 10. Impact of Density
(a) MaxRTT=2000, Density=0.5
(b) MaxRTT=2000, Density=0.8
Figure 11. Impact of MaxBlockBack
(a) MaxBlockBack=12, Density=0.5
(b) MaxBlockBack=12, Density=0.8
Figure 12. Impact of MaxRTT

As for the parameters in this subsection, the density of the matrices is set as density = 0.3, 0.5, 0.65, 0.80, 0.95. We set K=3 to select Top3 similar blockchain peers for the collaborative prediction. And we set MaxBlockBack=0, MaxRTT=1000 to evaluate the prediction accuracy for the siutation that have extremely high requirements for blockchain synchronization speed (e.g., Bitcoin miner). We set MaxBlockBack=12, MaxRTT=1000 and MaxBlockBack=12, MaxRTT=1000 to evaluate the accuracy for the situation that have high requirement for confirming blockchain data (e.g., cryptocurrencies wallet, cryptocurrencies exchange). We set MaxBlockBack=100, MaxRTT=5000 to evaluate the accuracy for daily usage (e.g., ordinary blockchain users) that has high tolerance for block backwardness and latency.

The experiment under the same setting will be run in 20 rounds then come out with the average value of RMSE. The results are shown in Table 3. The experiment results show that H-BRP model achieves better accuracy than other approaches in different requirements for reliability and different matrix density. Mean Absolute Error (MAE) and Normalized Mean Absolute Error (NMAE) are also used in this experiment. The results can be checked on the anonymous github4.

5.4. RQ3: Impact of Parameters

In this subsection, comparison of RMSE with different parameters is given to evaluate the impact of different parameters set in the model. Since UPCC and UMEAN have much higher RMSE (lower accuracy) than other approaches, we will not take UPCC and UMEAN into the comparison to make the figures more clear.

5.4.1. Impact of Density


In this experiment, we compare the RMSE in the same MaxBlockBack and MaxRTT to see the impact of the density of the training matrix.

As shown in Figure 10, the results show that the accuracy rises as density increases. The main reason is that, the higher density of the training matrix is, the more information is input into the model, thus the more accurate the model is. The result also shows that H-BRP model has better accuracy than other models in most cases. It means that even each requester only has the request history with random 30% of blockchain peers, the prediction of the remain 70% can be realized.

5.4.2. Impact of MaxBlockBack


MaxBlockBack represents the block backwardness tolerance of the blockchain requesters. To compare the impact of MaxBlockBack, we set the parameters as MaxRTT=2000 and Density=0.5, 0.8

to see how the accuracy is changed with the variance of block backwardness tolerance.

The experiment result shows that the more block backwardness tolerance given, the lower accurate the model is, as the RMSE is increasing. This may be caused by that the higher block latency tolerance is given, the less difference between the blockchain peers is. Thus the accuracy of the models is affected.

5.4.3. Impact of MaxRTT


As for different MaxRTT, we set the experimental parameters as MaxBlockBack=12 and Density=0.5, 0.8 to see how the accuracy is changed with the round-trip time tolerance.

The experiment result shows that the higher MaxRTT is given, the more accurate the model is. In the MaxRTT1000, the RMSE of all prediction models are very close and large, which means that the models show low accuracy in this setting. The main reason is that only few blockchain peers can response in 1000 ms. Therefore, the fluctuation of success rate is relatively large, which causes the models to be less accurate. However, especially in the situation that MaxRTT1000, H-BRP shows great advantage over other models.

6. Related Work and Discussion

This section will describe the related work in blockchain reliability prediction, including blockchain related research and traditional software reliability research.

As for the blockchain reliability or availability, Zheng et.al (Zheng et al., 2018) propose a scalable framework for detailed and real-time monitoring of blockchain systems, which has much lower overhead and more details about the blockchain systems compared with previous approaches. Weber et al. (Weber et al., 2017) propose a method to identify the availability limitations of Bitcoin and Ethereum, showing that the reading availability is high while the writing availability is low. Kalodner et al. (Kalodner et al., 2017) propose an open-source software platform for blockchain systems, which parsing the data from the p2p nodes and raw blockchain data for users to monitor and analyze the system. Yang et al. (Yang, 2017) propose a benchmark for Fabric blockchain. Dinh et al. (Dinh et al., 2017) describe frameworks for analyzing private blockchains in varying workloads. Guapta et al. (Dickson, ) also propose a method for analyzing performance. Gervais et al. (Gervais et al., 2016) present a novel quantitative framework for the security and performance of PoW blockchains.

As for traditional software reliability research, Michael et al. propose a handbook of software reliability engineering (Lyu, 1996). The main idea of software reliability prediction is to predict the unknown reliability of software systems based on the past data (Jiang et al., 2013). Chen et al. (Chen et al., 2011) propose an enhanced qos prediction approach for service selection. Zheng et al. (Zheng and Lyu, 2010) and Silic et al. (Silic et al., 2013) propose a set of collaborative filtering approaches to predict reliability of software systems.

However, the previous blockchain research does not give a method of reliability prediction for blockchain systems. And it always focus on few blockchain peers. On the other hand, the previous research about reliability prediction cannot fit the blockchain systems since blockchain factors are not taken into consideration. To attack these challenges, in this paper, the main idea of Hybrid Blockchain Reliability Prediction model is to extract blockchain related factors to predict the reliability of blockchain system.

7. Conclusion and Future Work

In this paper, we firstly propose a Hybrid Blockchain Reliability Prediction model for blockchain systems. It can do personalized reliability prediction for blockchain users to improve the block synchronization speed and avoid the loss of cryptocurrencies. Real-world experiment with 2,000,000 test cases from 100 requesters to 200 blockchain peers is conducted, and the results show the proposed model is effective and more accurate than previous reliability prediction model. Specifically, implementation details and the dataset will be released for research.

In the future, our work can be extended in different aspects: (1) Decentralized Collector: The centralized collector is a limitation as a blockchain tool. To collect the data on blockchain could be chosen but the throughput would be too low. It should be transferred to a suitable decentralized platform. (2) Model Complexity: As for time complexity, H-BRP consumes 120% to 530% of other approaches in different scalability. This might not meet the requirement of online prediction. The model complexity could be improved. (3) Scalability: This paper only selects 200 blockchain peers of Ethereum Mainnet to evaluate and predict the reliability. However, it is reported that there are over 14,000 Ethereum peers and over 10,000 Bitcoin peers over the world which are available to be test.

References

  • J. S. Breese, D. Heckerman, and C. Kadie (1998) Empirical analysis of predictive algorithms for collaborative filtering. In

    Proc. 14th Annual Conf. Uncertainty in Artificial Intelligence (UAI’98)

    ,
    pp. 43–52. Cited by: §5.3.
  • V. Buterin et al. (2013) Ethereum white paper. Ethereum. Cited by: 2nd item.
  • L. Chen, Y. Feng, J. Wu, and Z. Zheng (2011) An enhanced qos prediction approach for service selection. In Services Computing (SCC), 2011 IEEE International Conference on, pp. 727–728. Cited by: §6.
  • R. Chen. (2017) A brief overview of dapp development. https://thecontrol.co/a-brief-overview-of-dapp-development-b8ac1648322c. Cited by: 5th item.
  • [5] A. D. Gupta. A. Dickson Analyzing performance in blockchain-based systems. http://blog.stratumn.com/perf-blockchains/. Cited by: §6.
  • T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K. Tan (2017) BLOCKBENCH: a framework for analyzing private blockchains. In Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1085–1100. Cited by: §1, §6.
  • A. Gervais, G. O. Karame, K. Wüst, V. Glykantzis, H. Ritzdorf, and S. Capkun (2016) On the security and performance of proof of work blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 3–16. Cited by: §6.
  • Y. Jiang, H. Zhang, H. Liu, X. Song, W. N. N. Hung, M. Gu, and J. Sun (2013) System reliability calculation based on the run-time analysis of ladder program. See DBLP:conf/sigsoft/2013, pp. 695–698. External Links: Link, Document Cited by: §6.
  • H. Kalodner, S. Goldfeder, A. Chator, M. Möser, and A. Narayanan (2017) BlockSci: design and applications of a blockchain analysis platform. arXiv preprint arXiv:1709.02489. Cited by: §6.
  • M. R. Lyu (1996) Handbook of software reliability engineering. McGraw-Hill, New York. Cited by: §4.2.3, §6.
  • S. Nakamoto (2008) Bitcoin: a peer-to-peer electronic cash system. Cited by: §1, 2nd item, 3rd item, §2.
  • [12] Own your rpc: abnormal wallet peer causes money lost by repeated transaction. https://medium.com/c2736464697/own-your-rpc-abnormal-wallet-peer-causes-money-lost-by-repeated-transaction-9aa7fefea8e6. Cited by: §1, §3.2.
  • R. S. (2016) Decentralized applications: harnessing bitcoin’s blockchain technology. O’Reilly Media. Cited by: 5th item.
  • B. Sarwar, G. Karypis, J. Konstan, and J. Riedl (2001) Item-based collaborative filtering recommendation algorithms. In Proc. 10th Int’l Conf. World Wide Web (WWW’01), pp. 285–295. Cited by: §4.2.2, §5.3.
  • M. Silic, G. Delac, and S. Srbljic (2013)

    Prediction of atomic web services reliability based on k-means clustering

    .
    See DBLP:conf/sigsoft/2013, pp. 70–80. External Links: Link, Document Cited by: §6.
  • I. Weber, V. Gramoli, A. Ponomarev, M. Staples, R. Holz, A. B. Tran, and P. Rimba (2017) On availability for blockchain-based systems. In Proceedings of the 36th International Symposium on Reliable Distributed Systems (SRDS). IEEE, Cited by: §6.
  • [17] (2017) What is a decentralized application coindesk. https://www.
    coindesk.com/information/what-is-a-decentralized-application-dapp/
    .
    Cited by: 5th item.
  • G. Wood (2014) Ethereum: a secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper 151. Cited by: 2nd item.
  • B. Yang (2017) Blockchain guide. https://github.com/yeasy/. Cited by: §6.
  • [20] (2017) Your first dapp. https://dappsforbeginners.wordpress.com/tutorials/your-first-dapp/. Cited by: 5th item.
  • H. Yung-chen (2017) Talk about dapp decentralized application.. https://medium.com/taipei-ethereum-meetup/%E8%AB%96-dapp-decentralized-application-c843e7ed2b69. Cited by: 5th item.
  • P. Zheng, Z. Zheng, X. Luo, X. Chen, and X. Liu (2018) A detailed and real-time performance monitoring framework for blockchain systems. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, pp. 134–143. Cited by: §1, Figure 2, §6.
  • Z. Zheng and M. R. Lyu (2010) Collaborative reliability prediction of service-oriented systems. In Software Engineering, 2010 ACM/IEEE 32nd International Conference on, Vol. 1, pp. 35–44. Cited by: §5.3, §6.
  • Z. Zheng, S. Xie, H. Dai, X. Chen, and H. Wang (2017) An overview of blockchain technology: architecture, consensus, and future trends. In Big Data (BigData Congress), 2017 IEEE International Congress on, pp. 557–564. Cited by: §1.