Recent studies highlight the vulnerability of machine learning algorithms to adversarial attacks, in which subtle, hard-to-detect changes are introduced into the input data, causing the machine learning algorithm to produce erroneous predictions. The authors in  define and analyse the various forms of adversarial attacks launched in real-world situations and propose plausible defence strategies to combat them. In the case of adversarial images, adversarial noise is introduced into the images, and these perturbed images are used to train machine learning models that are subjected to black-box attacks. Detectors help identify adversarial changes introduced into the original image. Threats from adversarial attacks are most prominent in the classification of objects in images captured with cell phone cameras, where even Google's Inception model falls prey to such attacks. The Robust Physical Perturbation algorithm illustrates a case in which impostors print forged road-sign posters and use them to replace the real sign. Similar approaches have been identified in cyberspace attacks. Robotic visual images and three-dimensional object images are likewise fed to ML algorithms for classification and prediction , .

One of the most interesting applications of blockchain is intrusion detection. Combining intrusion detection with blockchain has considerable scope for implementation in cryptocurrencies and smart contracts . Blockchain also has many potential applications in the energy sector, such as peer-to-peer energy trading, IoT applications incorporating blockchain, decentralized marketplaces, electric vehicle charging and e-mobility . Ethereum and Hyperledger are examples of blockchain platforms that extend blockchain to non-financial applications.

The authors in  have found Binary Neural Networks (BNNs) to be more robust than full-precision networks; hence, combining input discretization or dimensionality reduction of the input with a BNN makes the model more robust against adversarial attacks. The challenges of existing works are summarized in Table 1. Existing solutions against the various types of attacks on training data and ML algorithms are listed below:
Adversarial Attack on Training Data: Adversarial Training using Brute Force, Data Compression as a Countermeasure, Foveated Imaging Mechanism, Randomization of Data
Adversarial Attack on Network Model: Deep Contractive Network, Regularization and Masking of the Gradient, Defensive Filtration, Bioinspired Defence Mechanism
Poisoning Attack: Sanitization of Data, Micromodel-based Defence, Strong Intentional Perturbations, Human in the Loop (HITL) Model, TRIM Algorithm
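To make one of the listed poisoning defences concrete, the sketch below follows the idea behind the TRIM algorithm of Jagielski et al.: alternately fit the model and keep only the points it explains best, so that injected points are trimmed away. It is a simplified illustration, not the published implementation: it assumes a one-dimensional linear regression, assumes the number of clean points (`n_keep`) is known in advance, and the function names are hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b over a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)  # assumes not all x are equal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def trim_regression(points, n_keep, max_iter=100, seed=0):
    """TRIM-style fit: alternate between fitting and keeping the n_keep best-fitting points."""
    rng = random.Random(seed)
    subset = rng.sample(range(len(points)), n_keep)  # random initial subset
    prev_loss = float("inf")
    for _ in range(max_iter):
        a, b = fit_line([points[i] for i in subset])
        residuals = [(y - (a * x + b)) ** 2 for x, y in points]
        # Keep the n_keep points the current model explains best; suspected poison is dropped.
        subset = sorted(range(len(points)), key=lambda i: residuals[i])[:n_keep]
        loss = sum(residuals[i] for i in subset)
        if loss >= prev_loss:  # trimmed loss stopped decreasing
            break
        prev_loss = loss
    return fit_line([points[i] for i in subset]), sorted(subset)
```

On 20 clean points from y = 2x + 1 (x = 0..19) plus five injected points at (9.5, 1000), plain `fit_line` on all 25 points returns slope 2 but an intercept of 197, whereas `trim_regression(points, 20)` recovers slope 2 and intercept 1 and keeps exactly the clean points.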
| Ref. | Methods used | Evaluation metrics | Research challenges |
|---|---|---|---|
|  | Blockchain technology to secure e-commerce transactions | MD5, smart contracts and digital signatures | Scalability, computing resources |
|  | Linear regression | Mean squared error, execution time | Delay/overhead in data processing |
|  | Intrusion detection system on blockchain | Data integrity, transparency | Attack prevention, scalability |
|  | Binary neural networks | Weight decay value, learning rate | Multi-step attacks still occur |
|  | Blockchain system for dApps | Smart contracts | Transaction delay, low throughput |
Some of the limitations of the existing defence mechanisms include:
Existing defence mechanisms each target a specific type of attack and hence fail to adapt to newer attacks.
Defence mechanisms such as the brute-force method consume excessive computational resources.
The present work aims to eliminate these limitations using the proposed blockchain-based approach.
Blockchain-Based Fragmentation Approach to Secure Machine Learning Datasets
Blockchain is a technology in which a list of timestamped, immutable data records is stored and managed in blocks by a group of computational entities. Each block is linked to its predecessor through the cryptographic hash of the preceding block, and contains three components: a timestamp, the hash of the preceding block, and the transaction data. Hence, if a transaction recorded in one block were modified, that block's hash would change, and every subsequent block would have to be updated accordingly and accepted by the network through the consensus mechanism . This gives the blockchain its immutability property, which makes it well suited to addressing attacks on machine learning algorithms. Blockchain technologies have been successfully implemented in cryptocurrencies, supply chain management, asset management, health care, the maintenance of digital IDs and many other domains [14, 15]. As the present world depends increasingly on data-centric analysis that requires accurate machine learning algorithms, it is essential to defend these algorithms against possible attacks. There is a pressing need for a model robust enough to combat such attacks on datasets and ML algorithms, and this is the primary motivation behind the present work.

In this work, the datasets and the ML algorithm are stored in encrypted form in a private cloud. Any user who intends to use a dataset or ML algorithm is issued a block ID along with the hash of the dataset. On receiving this ID, the user can apply the ML algorithm to the dataset to perform predictive analysis. When this process completes at the user end, a new hash is generated and compared with the hash stored in the blockchain. If the hashes match, it can be concluded that the datasets and the ML algorithm have not been compromised.
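The hash-linked block structure described above (timestamp, hash of the preceding block, data) can be sketched in a few lines. This is a minimal illustration under stated assumptions: SHA-256 over a JSON encoding of each block is assumed (the text does not name the chaining hash function), the helper names are hypothetical, and there is no network, mining or consensus.

```python
import hashlib
import json
import time


def block_hash(block):
    """SHA-256 of a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(data, prev_hash):
    """A block with the three components: timestamp, previous block's hash, data."""
    return {"timestamp": time.time(), "prev_hash": prev_hash, "data": data}


def build_chain(records):
    """Start from a fixed genesis block and link each new block to its predecessor."""
    chain = [make_block("genesis", "0" * 64)]
    for record in records:
        chain.append(make_block(record, block_hash(chain[-1])))
    return chain


def chain_is_valid(chain):
    """Each block must still store the current hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )
```

Changing the data of any block except the newest invalidates the `prev_hash` stored in the next block; re-linking would require recomputing every later block, which is what the consensus mechanism is meant to prevent. A change to the newest block is caught only by comparison with copies held by other participants.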
Figures 1 and 2 show the architecture of the proposed work. Figure 1 describes how data is handled and stored in the private cloud, which holds the encrypted fragments of the datasets and ML algorithms. When the user initiates a download request, the data is decrypted and defragmented. Figure 2 describes the hybrid blockchain: block creation is performed only by the administrator, which constitutes the private blockchain, while the blocks are visible and accessible to users, which constitutes the public blockchain.
The key objective of using a private cloud is to give the owner full control over the dataset. The owner uses the private cloud to restrict access to approved users and to prevent unauthorized access, which directly serves the overall objective of securing the dataset. In the cloud, the dataset is stored as encrypted fragments to further improve security. Once a download request for the dataset is initiated, the fragments are decrypted and defragmented to provide the user with the original dataset file.
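The fragmentation and defragmentation steps can be sketched as follows. The paper performs fragmentation with 7-Zip; this sketch substitutes plain fixed-size byte splitting, and it leaves out the AES-256 encryption and decryption of each fragment, which would require a cryptographic library. Function names and the fragment size are hypothetical.

```python
from __future__ import annotations


def fragment(data: bytes, fragment_size: int) -> list[bytes]:
    """Split file contents into fixed-size fragments; the last may be shorter."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    # In the described system, each fragment would be AES-256 encrypted before upload (omitted here).
    return [data[i:i + fragment_size] for i in range(0, len(data), fragment_size)]


def defragment(fragments: list[bytes]) -> bytes:
    """Reassemble fragments, in their original order, into the original file contents."""
    # Decryption of each fragment would happen before this step (omitted here).
    return b"".join(fragments)
```

For a 2560-byte file and a 1000-byte fragment size, `fragment` yields fragments of 1000, 1000 and 560 bytes, and `defragment` restores the original bytes exactly. Fragment order must be preserved (or recorded), since reassembly simply concatenates.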
The user may then use the public blockchain to view the recorded hash of the file and compare it with the hash computed over the downloaded file, thereby verifying file integrity. The same comparison allows the integrity of the dataset to be demonstrated to any third party.
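The user-side integrity check amounts to hashing the downloaded file and comparing the result with the hash read from the public blockchain. A minimal sketch, assuming SHA-256 via Python's `hashlib` (the text does not name the hash function) and hypothetical function names:

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Hash a file incrementally so large datasets never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex):
    """Compare the downloaded file's digest with the hex hash read from the blockchain."""
    return file_sha256(path) == recorded_hex.strip().lower()
```

Reading in chunks keeps memory use constant regardless of dataset size; normalizing the recorded hash to lowercase avoids false mismatches caused only by hex letter case.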
The admin is responsible for adding the dataset name and the file hash to the blockchain. This requires privileged access to the private side of the blockchain through the admin's private key, with which the admin adds the dataset hash as a block; the block is then publicly visible, so anyone can check the integrity of the file.
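The paper implements this access rule as a Solidity smart contract in Remix, but the contract code is not given. The Python class below is a hypothetical model of the rule only: writes are restricted to the admin, reads are open to everyone, and a registered hash cannot be overwritten. The `caller_id` argument stands in for the address derived from the admin's private key (on Ethereum, `msg.sender`); signatures, blocks and gas are not modelled.

```python
class DatasetRegistry:
    """Hypothetical in-memory stand-in for the admin-controlled registry contract."""

    def __init__(self, admin_id):
        self.admin_id = admin_id
        self._hashes = {}  # dataset name -> recorded file hash

    def add_dataset(self, caller_id, name, file_hash):
        # Only the admin may write, mirroring the private side of the hybrid blockchain.
        if caller_id != self.admin_id:
            raise PermissionError("only the admin can register datasets")
        # Recorded entries are never overwritten, mirroring blockchain immutability.
        if name in self._hashes:
            raise ValueError(f"dataset {name!r} is already registered")
        self._hashes[name] = file_hash

    def get_hash(self, name):
        # Anyone may read, mirroring the public side; None means "not registered".
        return self._hashes.get(name)
```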
This form of blockchain-based integrity check adds a new dimension to existing forms of security and can act as a stepping stone towards more advanced, automated security. The hybrid blockchain combines the features of both private and public blockchains to achieve the desired outcome: the owner of the data retains full authority, while the public is not restricted from viewing the blockchain records.
Experiments and Results
To simulate the experiments, the following software was used. Fragmentation is performed with 7-Zip, an open-source file archiver. The private cloud is hosted on Google Cloud Platform. The blockchain is simulated in the Remix IDE (Ethereum) through a smart contract written in Solidity. The experiments use the Medical Cost dataset from Kaggle, which has 1338 rows and 7 attributes. Before being stored in the private cloud, the dataset is divided into several fragments with 7-Zip. These fragments are then encrypted using AES with a 256-bit key and uploaded to the Virtual Private Cloud (VPC) in Google Cloud. The admin then computes the hashes of the dataset and the ML algorithm and stores them in the blockchain. Linear regression is the ML algorithm used for experimentation in the present study. Sample logs created in the blockchain are shown in Figure 3.
The deployed contracts are simulated to manage the blocks in the blockchain.
To test the accuracy of the ML algorithm on the dataset, the user requests access from the admin. Once the user provides a private key, the dataset is defragmented and the user can download the dataset and the ML algorithm. The user may compute the hash of the downloaded file and compare it with the hash recorded on the public blockchain, after which the user can run the ML algorithm on the dataset. After experimentation, any third party may verify the originality of the results by comparing the generated hash with the hash on the public blockchain. If the hashes match, the dataset and the ML algorithm have not been compromised.
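The text leaves open whether the dataset and the ML algorithm are hashed separately or together. Assuming, hypothetically, a single SHA-256 digest computed over both, the third-party check can be sketched as below; the function names and the length-prefix encoding are illustrative choices, not the paper's method.

```python
import hashlib


def artifact_digest(dataset_bytes: bytes, algorithm_source: bytes) -> str:
    """One SHA-256 digest over the dataset and the ML algorithm source."""
    h = hashlib.sha256()
    for part in (dataset_bytes, algorithm_source):
        # An 8-byte length prefix keeps the boundary between the two artifacts unambiguous.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def third_party_check(dataset_bytes, algorithm_source, recorded_digest):
    """True only if neither artifact changed since the admin recorded the digest."""
    return artifact_digest(dataset_bytes, algorithm_source) == recorded_digest
```

Changing a single value in one row, as a poisoning attack would, or editing one line of the algorithm yields a different digest, so the check fails. A matching hash shows that the artifacts are unchanged since registration, not that they were clean when first registered.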
Conclusion and Future Scope
In this work, we successfully implemented a blockchain-based solution to identify attacks on machine learning algorithms and medical datasets. Applying the same concept to secure the datasets of an entire organization would require the private blockchain to obtain authentication from a wide range of senior officials before reaching consensus. A feasibility study of different consensus mechanisms at this scale, taking into account the processing power, time and resources needed for block creation and mining, is a much-needed analysis. A fully decentralized version of this solution could use decentralized storage such as the InterPlanetary File System (IPFS) or Swarm, so that the dataset is not held by a single entity and is therefore more secure. Securing datasets through decentralized storage may be a stepping stone towards a decentralized future, offering a glimpse of Web 3.0.
-  Mahdavinejad, Mohammad Saeid, Mohammadreza Rezvan, Mohammadamin Barekatain, Peyman Adibi, Payam Barnaghi, and Amit P. Sheth, “Machine learning for Internet of Things data analysis: A survey,” Digital Communications and Networks, vol. 4, no. 3, pp. 161–175, 2018.
-  Zhang, Yun, Xiaohua Li, Jili Fan, Tiezheng Nie, and Ge Yu, “A Blockchain Based Secure E-Commerce Transaction System,” Proc. International Conference on Web Information Systems and Applications, pp. 560–566, 2019.
-  Kwon, Hyun, Yongchul Kim, Ki-Woong Park, Hyunsoo Yoon, and Daeseon Choi, “Multi-targeted adversarial example in evasion attack on deep neural network,” IEEE Access, vol. 6, pp. 46084–46096, 2018.
-  Bhagoji, Arjun Nitin, Daniel Cullina, Chawin Sitawarin, and Prateek Mittal, “Enhancing robustness of machine learning systems via data transformations,” Proc. 52nd Annual Conference on Information Sciences and Systems (CISS), pp. 1–5, 2018.
-  Jagielski, Matthew, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, and Bo Li, “Manipulating machine learning: Poisoning attacks and countermeasures for regression learning,” Proc. 2018 IEEE Symposium on Security and Privacy (SP), pp. 19–35, 2018.
-  Suciu, Octavian, Radu Marginean, Yigitcan Kaya, Hal Daume III, and Tudor Dumitras, “When does machine learning FAIL? Generalized transferability for evasion and poisoning attacks,” Proc. 27th USENIX Security Symposium (USENIX Security 18), pp. 1299–1316, 2018.
-  IEEE Access, vol. 6, pp. 14410–14430, 2018.
-  Goodfellow, Ian, Patrick McDaniel, and Nicolas Papernot, “Making machine learning robust against adversarial inputs,” Communications of the ACM, vol. 61, no. 7, 2018.
-  Rouani, Bita Darvish, Mohammad Samragh, Tara Javidi, and Farinaz Koushanfar, “Safe machine learning and defeating adversarial attacks,” IEEE Security & Privacy, vol. 17, no. 2, pp. 31–38, 2019.
-  Meng, Weizhi, Elmar Wolfgang Tischhauser, Qingju Wang, Yu Wang, and Jinguang Han, “When intrusion detection meets blockchain technology: a review,” IEEE Access, vol. 6, pp. 10179–10188, 2018.
-  Andoni, Merlinda, Valentin Robu, David Flynn, Simone Abram, Dale Geach, David Jenkins, Peter McCallum, and Andrew Peacock, “Blockchain technology in the energy sector: A systematic review of challenges and opportunities,” Renewable and Sustainable Energy Reviews, vol. 100, pp. 143–174, 2019.
-  Panda, Priyadarshini, Indranil Chakraborty, and Kaushik Roy, “Discretization based Solutions for Secure Machine Learning against Adversarial Attacks,” IEEE Access, 2019.
-  Deepa, N., Q. V. Pham, D. C. Nguyen, S. Bhattacharya, T. R. Gadekallu, P. K. R. Maddikunta, …, and P. N. Pathirana, “A Survey on Blockchain for Big Data: Approaches, Opportunities, and Future Directions,” arXiv preprint arXiv:2009.00858, 2020.
-  Cai, Wei, Zehua Wang, Jason B. Ernst, Zhen Hong, Chen Feng, and Victor CM Leung, “Decentralized applications: The blockchain-empowered software system,” IEEE Access, vol. 6, pp. 53019–53033, 2018.
-  Bojja, G. R., and J. Liu, “Impact of IT Investment on Hospital Performance: A Longitudinal Data Analysis,” Proc. 53rd Hawaii International Conference on System Sciences, 2020.