In the past decade, blockchain technology has been successfully applied in different fields, and it will potentially be applied in more critical areas. However, the current mechanism consumes a huge amount of energy for the computation needed to maintain the security guarantee of the system, and various concerns are arising accordingly. “The cryptocurrency uses as much CO₂ a year as 1 million transatlantic flights. We need to take it seriously as a climate threat,” according to the Guardian. Because of such concerns, “Bitcoin’s need for electricity is its Achilles Heel,” according to Forbes. According to the Bitcoin Energy Consumption Index, the energy consumption of Bitcoin increased steadily until late November 2018, when the Bitcoin price dropped suddenly. Even after the drop, more than 50 TWh (terawatt-hours, i.e., trillion watt-hours) per year is consumed just for maintaining the blockchain underlying Bitcoin. This amount is from Bitcoin only, and the total energy consumption of all blockchain-based applications will be much more than that.
The main issue is that all of this energy is wasted to some extent. The majority of the energy is consumed by the hash calculation in PoW-based blockchains. Proof-of-Capacity (PoC) has been proposed to address this issue (e.g., Burstcoin); however, it does not completely solve the problem because storage resources are wasted instead.
Primecoin uses prime number finding instead of hash calculation as PoW, and miners seek special sequences of prime numbers (Cunningham chains). However, the application of those numbers is limited in cryptographic protocols. Gridcoin, Golem, and FoldingCoin are cryptocurrencies that distribute rewards to miners based on the amount of scientific computation they performed. Though similar, the Proof-of-Deep-Learning (PoDL) proposed in this paper differs from them substantially. In those systems, the miners’ computing ability does not make the blockchain secure, whereas PoDL is an improved PoW-like consensus mechanism in which the deep learning power of honest miners provides tamper-proofness. Owing to this, PoDL can be deployed in any PoW-based blockchain application, recycling its miners’ energy for deep learning. Proof-of-Stake (e.g., Nxt) and Proof-of-Importance (e.g., NEM) are alternative consensus mechanisms with less energy consumption. However, their principle differs from that of PoW, and their ‘block mining’ does not involve computation. Therefore, these are orthogonal to our work.
We present a novel blockchain design that reinvests the energy consumed by blockchain maintenance in the computation tasks of deep learning. This is done by introducing a Proof-of-Deep-Learning (PoDL) mechanism that forces miners to perform deep learning training and present trained models as proofs. The contributions of this paper are summarized as follows.
(1) We present the first consensus mechanism, PoDL, that maintains blockchain via deep learning instead of useless hash calculation; (2) Our PoDL can be applied to any cryptocurrency based on PoW mechanisms because we only incrementally add components to block headers; (3) Our experiment shows that the design is feasible for cryptocurrencies whose block intervals are much greater than 10 seconds.
II-A Proof of Work (PoW) and Block Mining
Proof of work (PoW) is commonly used in many cryptocurrencies (Bitcoin, ZCash, Monero, Litecoin, etc.), where miners need to find a hash value smaller than some threshold, which involves a brute-force search over a large search space. Therefore, plenty of computation resources are needed in the PoW mechanism of those cryptocurrencies. Block mining is the process of creating a block with a valid hash value (i.e., less than a small threshold). Miners who create blocks with valid hash values are rewarded for their work. More specifically, they are allowed to insert one Coinbase transaction, which creates a certain amount of block reward and sends it to any address specified by the miner.
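The brute-force search described above can be sketched as follows. This is a minimal illustration rather than the header format of any real cryptocurrency: the byte layout of `header`, the 8-byte nonce encoding, and the `difficulty_bits` parameter are assumptions made for the example, while the double SHA-256 follows Bitcoin's convention.

```python
import hashlib


def block_hash(header: bytes, nonce: int) -> bytes:
    """Double SHA-256 over the header followed by an 8-byte nonce (assumed layout)."""
    payload = header + nonce.to_bytes(8, "big")
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()


def is_valid(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """A hash is valid when, read as an integer, it is below the threshold."""
    target = 1 << (256 - difficulty_bits)
    return int.from_bytes(block_hash(header, nonce), "big") < target


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Try nonces one by one until the hash falls below the threshold."""
    for nonce in range(max_nonce):
        if is_valid(header, nonce, difficulty_bits):
            return nonce, block_hash(header, nonce).hex()
    return None  # search space exhausted without success


if __name__ == "__main__":
    # 12 leading zero bits needs about 4,096 attempts on average.
    print(mine(b"prev-hash|merkle-root", difficulty_bits=12))
```

Each additional bit in `difficulty_bits` doubles the expected number of attempts, which is why the search dominates the energy cost, while checking a candidate with `is_valid` takes a single hash evaluation.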
II-B Deep Learning (DL) and its Training
Deep learning (DL) outperforms traditional machine learning algorithms dramatically in many areas. To obtain a proper DL model, one needs to provide a dataset (called the training dataset) and train the model. The training is composed of two algorithms, feed-forwarding and back-propagation, that are executed alternately. We say an epoch is finished when a pair of feed-forwarding and back-propagation has been performed through the neural network exactly once for every record in the training dataset. Multiple epochs are repeated in training, and the accuracy of a trained model can be tested with another, non-overlapping dataset (called the test dataset). Such training is an approximation algorithm based on hill climbing that seeks local optima in the entire model parameter space. No efficient algorithm that finds the global optimum is known yet, and this is the core of our PoDL.
III Our Energy-recycling Blockchain with PoDL
We propose to recycle the energy consumed in block mining by introducing Proof-of-Deep-Learning (PoDL) with DL training. Namely, we let miners train DL models, and the block generated by the miner who trained a proper DL model will be accepted by full nodes. In the blockchain with PoDL, we have an extra stakeholder besides miners and full nodes: the model requester, who outsources DL model training to miners. The goal of this paper is to present a proof-of-concept design, and we consider the simplest setting, where there is only one model requester who provides training/test datasets that describe the desired model. Because the model requester’s goal is to get the best DL model, we assume s/he will be a semi-honest adversary who does not collude with anyone.
III-A Overview of New Blockchain with PoDL
Block acceptance policy. When miners submit blocks and block headers, we let them submit trained DL models as well. Then, we let full nodes choose the block that is valid and comes with the model that has the highest accuracy when validated with the test datasets. Full nodes are asked to validate models on the test datasets on their own to calculate the accuracy. In order to prevent Denial-of-Service attacks, we let miners self-validate their models and report their models’ accuracy as well. Full nodes are asked to start their validation from the models with the highest claimed accuracy and stop when they find the first model whose validated accuracy is the same as the claimed one. This replaces the PoW validation, and we do not require the hash of the block header to be smaller than a threshold value. To tie blocks and DL models, we require that models be hashed and that the hash values be included in block headers.
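A minimal sketch of this acceptance policy is given below. The `Submission` record, the `evaluate` callback (standing in for running a model on the test datasets), and the `tolerance` parameter are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Submission:
    miner: str               # hypothetical miner identifier
    claimed_accuracy: float  # accuracy self-reported by the miner
    model: Any               # opaque trained model


def select_block(
    submissions: List[Submission],
    evaluate: Callable[[Any], float],
    tolerance: float = 1e-9,
) -> Optional[Submission]:
    """Validate in decreasing order of claimed accuracy; accept the first match.

    sorted() is stable, so submissions with equal claims keep their arrival
    order, which mirrors an "earliest arrival wins" tie-breaking rule.
    """
    ranked = sorted(submissions, key=lambda s: s.claimed_accuracy, reverse=True)
    for sub in ranked:
        if abs(evaluate(sub.model) - sub.claimed_accuracy) <= tolerance:
            return sub  # stop here: lower-ranked models are never evaluated
    return None


if __name__ == "__main__":
    subs = [
        Submission("A", 0.99, {"true_acc": 0.80}),  # overstated claim
        Submission("B", 0.91, {"true_acc": 0.91}),
        Submission("C", 0.85, {"true_acc": 0.85}),
    ]
    winner = select_block(subs, evaluate=lambda m: m["true_acc"])
    print(winner.miner if winner else None)
```

Under this ordering, a miner who overstates accuracy costs a full node one wasted evaluation, while honest submissions ranked below the winner cost nothing.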
Preventing model overfitting. If test datasets are available to miners, they are motivated to cheat by training DL models directly on the test datasets, i.e., overfitting the model. To prevent this, we set up two phases between blocks. In the first phase, the model requester releases training datasets to miners for their training, and s/he does not release test datasets until the second phase. Miners can validate their accuracy and submit the models to full nodes only after the first phase is over. The following mechanism prevents miners from continuing the training in the second phase.
Preventing model stealing and training in the second phase. Miners may cheat by (1) stealing others’ DL models published in the second phase or by (2) further training DL models with the released test datasets for higher accuracy (i.e., model overfitting). To prevent these, we require that miners release the block headers in the first phase if they want to compete in the second phase, and the headers serve as the commitment of their models. In the second phase, full nodes will validate the blocks and models whose block headers have been submitted in the first phase only. By doing so, if miners steal others’ models or retrain their models, their new models’ hash value will be different, and their block headers will be different from what full nodes received in the first phase.
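The commitment check can be sketched as follows. The header here is reduced to three hypothetical fields (previous block hash, Merkle root, model hash) joined by a separator, and `FullNode` is an illustrative class, not part of any existing client.

```python
import hashlib


def model_hash(model_bytes: bytes) -> str:
    """Hash of the serialized model; this value is placed in the block header."""
    return hashlib.sha256(model_bytes).hexdigest()


def header_id(prev_hash: str, merkle_root: str, m_hash: str) -> str:
    """Identifier of a simplified header (assumed field layout)."""
    return hashlib.sha256(f"{prev_hash}|{merkle_root}|{m_hash}".encode()).hexdigest()


class FullNode:
    def __init__(self) -> None:
        self.phase1_headers = set()  # commitments received during Phase 1

    def receive_header(self, hid: str) -> None:
        self.phase1_headers.add(hid)

    def accepts(self, prev_hash: str, merkle_root: str, model_bytes: bytes) -> bool:
        """In Phase 2, consider only models whose header was committed in Phase 1."""
        hid = header_id(prev_hash, merkle_root, model_hash(model_bytes))
        return hid in self.phase1_headers


if __name__ == "__main__":
    node = FullNode()
    honest_model = b"weights-after-phase-1"
    node.receive_header(header_id("prev", "root-honest", model_hash(honest_model)))

    print(node.accepts("prev", "root-honest", honest_model))              # committed model
    print(node.accepts("prev", "root-honest", b"weights-retrained"))      # retrained in Phase 2
    print(node.accepts("prev", "root-thief", honest_model))               # stolen model, thief's block
```

A retrained model changes the model hash, and a stolen model placed in the thief's own block changes the Merkle root, so in both cases the resulting header does not match any Phase 1 commitment.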
Blockchain verification: To verify the whole blockchain (e.g., when Initial Block Download occurs), full nodes must ensure that the accepted DL models are trained from the training datasets only and that their accuracy on the test datasets is the same as the claimed one. To provide such verifiability, miners are asked to submit the parameters necessary for repeating the training: hyperparameters, initial weights, number of epochs, etc. With these, full nodes are able to repeat the training to determine whether the accepted model can be reconstructed from the training datasets only. Furthermore, they can verify the claimed accuracy with the test datasets.
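The idea of verification by repeating the training can be illustrated with a toy example. The sketch below replaces deep learning with a tiny perceptron so that it stays self-contained, and it uses a seed to stand in for the published initial weights; all function names and parameters are hypothetical. It also assumes that retraining is bit-for-bit reproducible, which is an additional requirement on real DL frameworks and hardware.

```python
import hashlib
import random
import struct


def predict(w, x):
    """Perceptron output; the last element of w is the bias."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0 else 0


def train(train_set, seed, epochs, lr):
    """Deterministic training: the seed fixes the initial weights."""
    rng = random.Random(seed)
    n = len(train_set[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, y in train_set:
            err = y - predict(w, x)
            for i in range(n):
                w[i] += lr * err * x[i]
            w[-1] += lr * err
    return w


def weights_hash(w):
    """Hash of the serialized weights, playing the role of the model hash."""
    return hashlib.sha256(struct.pack(f"<{len(w)}d", *w)).hexdigest()


def accuracy(w, data):
    return sum(1 for x, y in data if predict(w, x) == y) / len(data)


def verify(train_set, test_set, params, committed_hash, claimed_acc):
    """Repeat the training from the published parameters and compare results."""
    w = train(train_set, **params)
    return weights_hash(w) == committed_hash and accuracy(w, test_set) == claimed_acc


if __name__ == "__main__":
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # logical AND
    params = {"seed": 7, "epochs": 20, "lr": 0.1}
    w = train(data, **params)
    print(verify(data, data, params, weights_hash(w), accuracy(w, data)))
```

The verifier needs only the training dataset, the test dataset, and the published parameters; a model trained on anything else would, in general, fail the hash comparison.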
III-B Blockchain Description with PoDL
Phase 1 for determining the block at height i: Given the training datasets released by the model requester, miners train DL models without knowing the test datasets as part of PoDL. The miners generate the block and the header by following the rules of the underlying blockchain system (e.g., generating transactions and Merkle trees of blocks in cryptocurrencies) and including the hashed model, and they submit the block header to full nodes by the end of Phase 1.
Phase 2 for determining the block at height i: The model requester releases the test datasets, and miners validate their trained models and submit the highest-accuracy ones to full nodes along with the block and the block header. Then, full nodes validate the submitted DL models in decreasing order of the accuracy claimed by the miners, and accept the first one (as well as the corresponding block and header) that has the claimed accuracy. In the case of a tie, full nodes follow the policy of the underlying blockchain system (e.g., accept the one that arrived earlier, as in Bitcoin). Full nodes ignore all models whose block headers were not received in the first phase, and furthermore, full nodes do not accept other blocks at height i once Phase 2 for height i is finished.
Dealing with short training time: Block generation rates are controlled to be constant on average in many cryptocurrency systems (e.g., one block per 10 minutes in Bitcoin and per 2.5 minutes in Litecoin). Therefore, miners train for only a short period of time in Phase 1, yielding a small accuracy increment. We present two mechanisms to address this. First, the model requester does not collect the model until its accuracy stops increasing significantly over multiple blocks, and s/he releases a new test dataset for each block. By doing so, training for one DL model spans multiple blocks, and a good-enough model is achieved at the end. Second, Phase 2 for height i and Phase 1 for height i+1 may happen concurrently. Namely, after validating the accuracy of the trained model for height i (which is Phase 2 for height i), miners immediately start training for height i+1 (which is Phase 1 for height i+1). This is acceptable as long as the model requester provides fresh test datasets for every block. Note that one training dataset is reused across multiple blocks; therefore, the miners need to access the training dataset only once per model.
The rest of the blockchain remains the same. Finally, we present an example blockchain with our new PoDL mechanism in Fig. 1, in which the root of the Merkle tree is also indicated.
III-C Properties of Our Design with PoDL
Block reversibility: Because of our block acceptance policy, accepted DL models have lower accuracy in earlier blocks, and the DL models have higher accuracy in later blocks. Accordingly, it becomes much more challenging to present a model whose accuracy is higher than the accepted ones. Therefore, the previous blocks become hardly reversible only after the models with high-enough accuracy appear in the blockchain. Due to this, whether blocks are reversible does not depend on the number of confirmations. Rather, it depends on the highest accuracy of the model along the blocks.
Hardness of double spending: Firstly, full nodes in our blockchain accept blocks in Phase 2 if and only if their headers were received in Phase 1. Therefore, even if adversaries have access to the test datasets after a block is confirmed, they are unable to submit new blocks with new models, because the corresponding block header does not exist in the list of Phase 1 headers for that block height as long as full nodes are honest.
Even if the majority of the full nodes collude with miners, double spending without 51% of the computing resources is still a low-probability event. Training algorithms seek local optima with certain randomness because no known algorithm can find the global optimum. Therefore, if only the highest-accuracy models are accepted, it is challenging to improve the accuracy further (also see Fig. 3 in our simulation). If adversaries wish to double spend in our blockchain by controlling a majority of the full nodes, they must present a longer sequence of blocks in which every block contains a DL model with higher accuracy. Furthermore, the training of those DL models must be repeatable with the training datasets only. Owing to the randomness of the training performance, which depends on the random choice of hyperparameters and initial weights, we conjecture that this is extremely challenging unless the adversary possesses more than 51% of the computing resources for DL training. Adversaries having good hyperparameters may have an advantage in reversing the blocks; however, those parameters will be published to other miners as well, making it hard to reverse blocks again.
Dataset provision: Training and test datasets for DL may have large volumes; however, they are necessary for blockchain verification. The storage burden would be prohibitively high if the datasets were stored in the blockchain; therefore, we assume that the model requester will provision the datasets properly (i.e., by following the release time for different blocks). Model requesters are motivated to play this role because their goal is to get the best model.
Storage burden: DL models’ sizes vary from 100KB to 10GB, and storing all model parameters including those for training repeatability can be a huge burden. However, various techniques can be used to reduce the sizes without affecting the accuracy too much [15, 14], and we may limit the model sizes to a common one, e.g., 10MB/model in [15, 14]. Furthermore, because the tamper-proofness is guaranteed by high-accuracy models only, we can free the storage by removing models with low accuracy. The later blocks with high-accuracy models will still prevent the double spending.
Network delay: Blocks submitted to full nodes include DL models and training parameters; therefore, full nodes will experience extra network delay. Miners experience extra delay as well, owing to the retrieval of training/test datasets. However, note that the same training datasets are used for multiple blocks, so they need not be retrieved at every block. New test datasets need to be retrieved in every Phase 2; however, that phase overlaps with Phase 1 for the next block, so it does not affect the block generation rate as long as the network delay is smaller than the block interval. Besides, because block acceptance is determined by the accuracy of DL models rather than their arrival time (except for tied models), the impact of the test datasets’ network delay is minimal.
Impact on ASIC devices: ASICs are traditionally considered adverse to the blockchain ecosystem, but they will in fact be beneficial in a blockchain with PoDL, because ASIC devices will be designed for deep learning training, which will contribute to the development of better hardware.
IV Feasibility Validation by Experiments
IV-A Experiment Setting
The experiment for block generation and validation was conducted on a laptop with an Intel i7-6700. We implemented the blockchain functions (e.g., transaction/block generation, hash calculation) based on van Flymen’s “Learn blockchains by building one” under Python 3.6. The deep learning experiment was conducted on a desktop with an i7-6850K, 24 GB RAM, and two GTX 1080Ti GPUs. A single-word command recognition model was trained with TensorFlow using a dataset including 105,000 audio samples.
IV-B Benchmark Tests
Instead of constantly calculating hash values, miners will constantly conduct DL training in our blockchain. Therefore, DL training itself is the counterpart of the brute-force hash search rather than extra overhead compared to existing blockchain systems. The extra computation tasks brought by our new PoDL mechanism are: (1) Miners’ calculation of the model hash at Phase 1; (2) Full nodes’ sorting by accuracy at Phase 2; (3) Full nodes’ validation of the model hash’s correctness and search for the block header at Phase 2; (4) Miners’ accuracy determination at Phase 2; (5) Full nodes’ accuracy verification at Phase 2; (6) Full nodes’ full verification at Initial Block Download. Owing to the lack of sufficient data, we omit the evaluation of (6), which requires a large number of models as well as their training parameters. Since (6) is a one-time process, its impact is much smaller than the rest. For the rest, we measure the elapsed time for conducting those tasks and compare it with common block intervals, which indicates how much of the miners’ time is devoted to DL training, i.e., the effectiveness of our energy recycling.
Table I: Existing benchmark results
| Hashing (SHA-256) | Hash table (Google Dense Map) | Sorting (Quicksort) |
| 5.9 ms/MB | 89.75 ms/1M inserts | 154.9 ms/1M objects |
| | 16.25 ms/1M reads | |
Hash calculation is involved in (1); sorting is needed in (2); hashing and searching are needed in (3). For those, we present existing benchmark results in Table I to show their overhead. Note that the hash table results were measured at the load factor used in the cited benchmark. Generally speaking, this extra overhead is negligible compared to the block intervals of any cryptocurrency.
(4) and (5) involve feed-forwarding on DL models. Miners need to run the feed-forwarding algorithm once for every record in the test datasets. Full nodes need to run it once per record for each model they validate in Phase 2, but the number of such models will be small because full nodes stop validating as soon as they find a model with the claimed accuracy. We present the elapsed time for model validation in Fig. 2. The average over 1,000 repeated block validations is 1.96 seconds, which is negligible when compared to the block intervals of some popular cryptocurrencies (e.g., 10 minutes for Bitcoin and Bitcoin Cash, 2.5 minutes for Litecoin), meaning that PoDL works well with those cryptocurrencies because most of the energy can be recycled. If block intervals are smaller (e.g., 10-19 seconds for Ethereum), PoDL will be less effective.
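To make the comparison concrete, the following back-of-the-envelope sketch estimates the share of each block interval that remains available for DL training. It assumes, for illustration only, that the measured 1.96 seconds is the sole overhead per block and that validations run serially; the function name and the `models_validated` parameter are hypothetical.

```python
VALIDATION_SECONDS = 1.96  # average block validation time reported above

# Block intervals in seconds, taken from the figures quoted in the text.
BLOCK_INTERVALS = {
    "Bitcoin": 600,
    "Litecoin": 150,
    "Ethereum (19 s)": 19,
    "Ethereum (10 s)": 10,
}


def training_fraction(interval_s, models_validated=1, validation_s=VALIDATION_SECONDS):
    """Fraction of the block interval left for training after validation."""
    overhead = models_validated * validation_s
    return max(0.0, 1.0 - overhead / interval_s)


if __name__ == "__main__":
    for name, interval in BLOCK_INTERVALS.items():
        print(f"{name}: {training_fraction(interval):.1%} of the interval left for training")
```

Under these assumptions, more than 99% of a 10-minute interval remains for training, whereas a 10-second interval loses roughly a fifth of its time to a single validation, and more if several models must be validated.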
IV-C Simulation for Accuracy Growth
We also performed a simulation to see how accuracy of a model increases with the growth of blockchain. Specifically, we measured the accuracy increment along the epochs in the DL training by measuring the accuracy of the model we achieved every 400 epochs, which takes approximately 100 seconds. In Bitcoin, this translates to 1 block per 2400 epochs, and the result is shown in Fig. 3. For more complicated models, the training will span across more blocks.
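The conversion from training speed to epochs per block is simple arithmetic, sketched below. The helper name is hypothetical, and the calculation assumes that the whole block interval is spent on training at the measured rate of 400 epochs per 100 seconds.

```python
def epochs_per_block(block_interval_s, epochs_per_checkpoint=400, seconds_per_checkpoint=100):
    """Epochs that fit in one block interval at the measured training rate."""
    return block_interval_s * epochs_per_checkpoint / seconds_per_checkpoint


if __name__ == "__main__":
    print(epochs_per_block(600))  # Bitcoin, 10-minute interval
    print(epochs_per_block(150))  # Litecoin, 2.5-minute interval
```

With Bitcoin's 600-second interval this yields 2400 epochs per block, matching the figure above; a shorter interval yields proportionally fewer epochs, so the same model would span more blocks.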
V Conclusion and Future Work
We presented a proof-of-concept design of an energy-recycling blockchain with our novel PoDL mechanism. Miners perform deep learning training tasks instead of hash calculation, and they present trained DL models as their proof of deep learning. Model stealing and overfitting are prevented by our block acceptance policy with separated phases. Without a majority of the DL training power, double spending is hard even if a majority of the full nodes are malicious.
Our proof-of-concept design has much room for improvement. The model requester may be generalized to multiple malicious requesters who may collude with miners. Besides, a more extensive study needs to be performed with a realistic pattern of block submission and more DL models/datasets. It is our future work to extend this study to improve and complete the PoDL mechanism.
-  https://gridcoin.us/assets/img/whitepaper.pdf
-  https://www.febooti.com/products/filetweak/members/hash-and-crc/hash-benchmark/
-  Bitcoin energy consumption index, https://digiconomist.net/bitcoin-energy-consumption
-  Blockchain, https://www.burst-coin.org/proof-of-capacity
-  The blockchain application platform, https://nxtplatform.org/
-  Foldingcoin, https://foldingcoin.net/
-  Golem gnt, https://golem.network/
-  Nem, https://nem.io/technology/
-  Comparison of internal sorting algorithms (Sep 2008), https://attractivechaos.wordpress.com/2008/08/28/comparison-of-internal-sorting-algorithms/
-  (Aug 2016), https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html
-  Coppola, F.: Bitcoin’s need for electricity is its ’achilles heel’ (May 2018), https://www.forbes.com/sites/francescoppola/2018/05/30/bitcoins-need-for-electricity-is-its-achilles-heel/#3825e5ef2fb1
-  Dwork, C., Naor, M.: Pricing via processing or combatting junk mail. In: Annual International Cryptology Conference. pp. 139–147. Springer (1992)
-  van Flymen, D.: Learn blockchains by building one. https://github.com/dvf/blockchain (2017)
-  Gong, Y., Liu, L., Yang, M., Bourdev, L.: Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014)
-  Han, S., Mao, H., Dally, W.J.: Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015)
-  Hern, A.: Bitcoin’s energy usage is huge – we can’t afford to ignore it (Jan 2018), https://www.theguardian.com/technology/2018/jan/17/bitcoin-electricity-usage-huge-climate-cryptocurrency
-  King, S.: Primecoin: Cryptocurrency with prime number proof-of-work. July 7th (2013)
-  LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
-  Warden, P.: Speech commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209 (2018)