Digital or crypto currencies, for example Bitcoin (https://bitcoin.org/en) and Ethereum (https://www.ethereum.org), have recently attracted tremendous interest from both the user and the developer communities [2, 3]. Crypto currencies are essentially smart contracts between users that are executed using a data structure referred to as a ‘blockchain’. Thus, a blockchain stores financial transactions whilst satisfying the following two constraints: (a) anyone should be able to write to the blockchain, and (b) there should not be any centralised control.
A blockchain is a database together with application software on top of it that dictates the data definition and data update mechanism for the blockchain. A blockchain not only allows new data to be added to the database but also ensures that all the users on the network have exactly the same data. Thus, a blockchain is a distributed and decentralised linked data structure for data storage and retrieval that is resistant to data modifications.
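The linked, modification-resistant structure described above can be illustrated with a minimal sketch. It is a toy hash chain, not the ChainSQL or Ripple implementation; the block layout and the helper names `block_hash`, `append_block` and `is_valid` are assumptions made for illustration only.

```python
import hashlib
import json


def block_hash(index, prev_hash, transactions):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a new block that points at the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "prev_hash": prev_hash,
            "transactions": transactions,
            "hash": block_hash(index, prev_hash, transactions),
        }
    )


def is_valid(chain):
    """Recompute every hash and check every back-link."""
    expected_prev = "0" * 64
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != recomputed:
            return False
        expected_prev = block["hash"]
    return True


chain = []
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2"])
print(is_valid(chain))  # True

chain[0]["transactions"] = ["alice pays bob 500"]  # tamper with history
print(is_valid(chain))  # False
```

Because each block stores the hash of its predecessor, changing an old transaction invalidates the stored hash of that block, which is what makes modifications detectable.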
One of the limitations of blockchain is its inherent deficiency in search query processing, primarily due to the linked data storage and the absence of a well-defined data indexing structure for various queries. Bitcoin, for instance, is the most notable blockchain network; in practice, however, it has two limitations: (a) it takes a considerable amount of time, possibly up to ten (10) minutes, for a transaction to be issued and verified, and the final confirmation may take up to an hour, and (b) a new block can only be generated by miners, which requires extensive computations.
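To make the query limitation concrete, the following sketch contrasts answering a query by walking a linked chain with answering it from an index of the kind a database maintains. The toy chain, the transaction tuples and the function names are hypothetical and serve only to illustrate the point.

```python
from collections import defaultdict

# Toy chain: each block holds a list of (sender, receiver, amount) transactions.
chain = [
    {"index": 0, "transactions": [("alice", "bob", 5)]},
    {"index": 1, "transactions": [("bob", "carol", 2), ("alice", "carol", 1)]},
    {"index": 2, "transactions": [("carol", "alice", 4)]},
]


def scan_by_sender(chain, sender):
    """Without an index, every block must be visited to answer the query."""
    return [tx for block in chain for tx in block["transactions"] if tx[0] == sender]


def build_sender_index(chain):
    """Build a sender -> transactions index once, as a database would."""
    index = defaultdict(list)
    for block in chain:
        for tx in block["transactions"]:
            index[tx[0]].append(tx)
    return index


sender_index = build_sender_index(chain)

# Both paths return the same answer; the index avoids re-walking the chain.
print(scan_by_sender(chain, "alice") == sender_index["alice"])  # True
```

The scan costs time proportional to the length of the chain for every query, whereas the index is built once and then answers each lookup directly.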
Databases, in addition to having a defined data structure, are optimised for faster query processing [6, 7, 8, 9, 10, 11], but they are not resistant to data modifications [12, 13, 14]. More specifically, distributed databases have the following limitations: (a) the database can be tampered with either by a malicious user or by the database administrator [15, 16, 17], (b) the backup-based disaster recovery scheme of the database cannot normally be activated in the event of a system failure caused by data loss, and (c) the multiple copies of the database are not entirely consistent, so data synchronisation operations are required to resolve data conflicts.
Therefore, a database system is desirable that combines the features of the blockchain and of distributed databases, so that both the inherent resistance of the blockchain to data modification and the query speed of distributed databases are achieved. In this work, we showcase ChainSQL (source code available online at https://github.com/ChainSQL/chainsqld), a new blockchain-based log database system that combines the features of blockchain technology and distributed databases.
ChainSQL has two usecases, each of which is implemented as a middleware between the enterprise application and the underlying database: (i) a multi-active database middleware that connects the enterprise application with the database system, and (ii) a database disaster recovery middleware that connects the database production nodes with the disaster recovery nodes. Details about the usecases are provided in Section 3.
ChainSQL Highlights. ChainSQL features a secure design because authorisation is required to access personal user data. The transactions are stored in the blockchain whereas the actual data is stored in the database. The data is distributed to improve service availability. A many-to-one disaster recovery architecture allows a single backup centre to be used with multiple production sites. The backup database can be operated without data recovery. Thus, ChainSQL provides not only the instantaneity of traditional databases but also the security of the blockchain. It can be easily configured with commonly used databases such as MySQL, Oracle and IBM DB2 (see Figure 2 for the configuration interface), and it is easy to program using APIs. As the database log is immutable, the history of database actions is preserved, which allows auditing using the data stored in the blockchain. The integration of the blockchain with new applications is simple, as it only requires using the ChainSQL interface rather than the database interface.
2 ChainSQL Overview
In this section, we present an overview of ChainSQL as the context for understanding the applications that we demonstrate in Section 3. ChainSQL involves three components, namely a blockchain network, a database and a set of users, which are outlined as follows:
The blockchain network used by ChainSQL is Ripple (https://ripple.com), for the following reasons: (a) Ripple is able to issue and verify transactions quickly (within four (4) seconds); (b) it avoids the extensive Bitcoin-like computations for new block generation by incorporating its own ‘unique node list’ (UNL) scheme. More details about blockchain consensus models and Ripple can be found in the technical report by Baliga.
A database is configured on top of the blockchain nodes; it is synchronised with the blockchain and facilitates quick, database-like ‘read’ operations on the blockchain.
Users can query the ChainSQL network as follows: (a) directly query the blockchain network, (b) create a database blockchain node for fast access, or (c) both (a) and (b) combined. An overview of the ChainSQL access mechanism is shown in Figure 1.
We now briefly discuss ChainSQL components:
Application Interface. Access to the blockchain is via APIs provided as an interface to the application; therefore, from the user's perspective, a transaction command to the blockchain is similar to a database operation. The ChainSQL APIs (https://github.com/ChainSQL) support multiple programming languages, which adds to the flexibility and applicability of ChainSQL.
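The idea that a blockchain transaction looks like a database operation to the user can be sketched as follows. This is not the ChainSQL API: the function names, the payload fields and the use of HMAC-SHA256 as a stand-in for an account signature are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def build_table_transaction(account, secret, table, operation, rows):
    """Wrap a database-style operation in a signed transaction payload.

    All field names are hypothetical; HMAC-SHA256 stands in for the
    account signature that a real blockchain client would produce.
    """
    body = {"account": account, "table": table, "operation": operation, "rows": rows}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, encoded, hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_table_transaction(tx, secret):
    """Recompute the signature, as a receiving node might before relaying."""
    encoded = json.dumps(tx["body"], sort_keys=True).encode("utf-8")
    expected = hmac.new(secret, encoded, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tx["signature"])


tx = build_table_transaction(
    account="demo-account",
    secret=b"demo-secret",
    table="customers",
    operation="insert",
    rows=[{"id": 1, "name": "Ada"}],
)
print(verify_table_transaction(tx, b"demo-secret"))   # True
print(verify_table_transaction(tx, b"wrong-secret"))  # False
```

The caller only states the table, the operation and the rows, much as it would for a database call; packaging and signing are hidden behind the interface.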
Database Operations. The database operations are performed in a real-time environment. The blockchain network directly transmits the transaction data to the corresponding database for processing. The consensus mechanism to authenticate the validity of a transaction is given as follows: a transaction is authenticated by a set of nodes that are a subset of the blockchain network nodes drawn by the implementation of a UNL scheme. If a transaction is authenticated, it is sent to the blockchain network for consensus and is subsequently written into the database. If consensus fails, the database operation is rolled back. The entire process is completed within a few seconds and therefore, the user is updated in near real-time about the transaction status.
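The write path described above (authenticate, reach consensus, write, and roll back on failure) can be sketched with an in-memory SQLite database. This is a simplified illustration rather than the ChainSQL implementation: the vote list stands in for the responses of the UNL nodes, the 0.8 quorum is an assumption of this sketch and not a figure from the text, and the function names are hypothetical.

```python
import sqlite3


def unl_accepts(votes, quorum=0.8):
    """Return True when the share of 'yes' votes from the trusted nodes reaches the quorum.

    The 0.8 default is an assumption made for this sketch.
    """
    return sum(votes) / len(votes) >= quorum


def apply_with_consensus(conn, sql, params, votes):
    """Execute the statement, then commit or roll back depending on consensus."""
    try:
        conn.execute(sql, params)
        if unl_accepts(votes):
            conn.commit()
            return "committed"
        conn.rollback()
        return "rolled back"
    except sqlite3.Error:
        # The statement itself was invalid, so nothing is kept.
        conn.rollback()
        return "rejected"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.commit()

insert = "INSERT INTO accounts (id, balance) VALUES (?, ?)"
print(apply_with_consensus(conn, insert, (1, 100), [True, True, True, True, True]))
# committed
print(apply_with_consensus(conn, insert, (2, 50), [True, True, False, False, True]))
# rolled back
print(conn.execute("SELECT id, balance FROM accounts").fetchall())
# [(1, 100)]
```

Only the operation that reached the quorum is visible afterwards; the second insert leaves no trace in the table because it was rolled back.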
Database Recovery. One of the blockchain network nodes is configured with a database to keep transactions in the blockchain or to execute database operations to recreate a new table. A node on the blockchain network can be either a full-record node (that stores all the transactions on the blockchain network) or a partial-record node.
We demonstrate two usecases of ChainSQL: (i) a multi-active database middleware for connecting the user application with the underlying database, and (ii) a disaster recovery middleware that connects user application production nodes with the disaster recovery nodes.
3.1 Multi-active Database Middleware
The multi-active database is a middleware that connects the enterprise application with the underlying database, which can be either a traditional relational database or a NoSQL database. All data definition and data manipulation operations for the database are recorded in an operation log that is maintained using blockchain technology and is immutable, i.e. it cannot be modified or deleted. The operation log can be used to regenerate the database and can therefore also be used for database audit.
A user application calls the ChainSQL APIs to obtain the transaction data suitable for the blockchain network and sends the transaction data to a network node. The node has to authenticate and validate the newly arrived data before it sends the data to the other nodes in the network. The network nodes achieve consensus on the data in the form of transactions that are grouped together as blocks. If consensus is achieved, every node in the network has exactly the same data, stored as an ordered set of blocks. The nodes configured with a database send the data to the database to synchronise the database operations.
In case of a node failure, the user can switch seamlessly to any other node on the network. This ensures zero recovery time and a multi-active database deployment in real time. The faulty node is restored from the most recent checkpoint during the recovery process.
One of the concerns in a multi-active database is the security of data while it is being transmitted across the network. The middleware provides both symmetric and asymmetric encryption schemes, from which the user can choose the appropriate security mechanism for the data.
Another important aspect of the multi-active middleware is the expandability of the blockchain network. A new node can automatically obtain the log from an existing node in the network and replay that log to generate its own version of the database, which is the same as that of the other network nodes. Once it is established that the new node has the same data as the other network nodes, it can also participate in consensus building and synchronous data writing.
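The way a new node catches up by replaying the operation log, and then checks that it holds the same data as an existing node, can be sketched as follows. The log contents, the `replay` and `state_digest` helpers and the use of a SHA-256 digest for the comparison are assumptions for illustration, not the ChainSQL mechanism.

```python
import hashlib
import sqlite3

# Ordered operation log as it might be read from an existing node (toy data).
operation_log = [
    ("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)", ()),
    ("INSERT INTO accounts (id, balance) VALUES (?, ?)", (1, 100)),
    ("INSERT INTO accounts (id, balance) VALUES (?, ?)", (2, 50)),
    ("UPDATE accounts SET balance = balance - ? WHERE id = ?", (30, 1)),
]


def replay(log):
    """Rebuild a database from scratch by executing the log in order."""
    conn = sqlite3.connect(":memory:")
    for sql, params in log:
        conn.execute(sql, params)
    conn.commit()
    return conn


def state_digest(conn):
    """Hash the table contents in a fixed order so two nodes can compare states."""
    rows = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


existing_node = replay(operation_log)
new_node = replay(operation_log)

print(new_node.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 70), (2, 50)]
print(state_digest(new_node) == state_digest(existing_node))  # True
```

Since the log is ordered and immutable, replaying it is deterministic, so matching digests indicate that the new node has reached the same state as its peers.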
3.2 Multi-disaster Recovery Middleware
Multi-disaster Recovery is also a middleware; it connects the database production nodes of the enterprise application with the disaster recovery nodes. As mentioned earlier, the user operations are recorded as log files, e.g. Binlog, Redo Log and so forth, in the production centre and are analysed before a blockchain transaction is generated. During disaster recovery, the first step is to achieve consensus in the blockchain network, which must include the backup node, so that the backup node has exactly the same data as every other node on the blockchain network. When a new block is generated, the backup node reads the block and sends it to the disaster recovery centre. The recovery centre performs the database backup using the transaction data. Thus, if a node fails at the production centre, the users switch to the recovery centre to complete the task. This is achieved by elevating a backup node to the status of a production node. The data of the production centre is transmitted to the disaster recovery centre within ten (10) seconds of a node failure, and the log is immediately re-executed to achieve the recovery point objective within a user-specified time.
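The forwarding of new blocks to the recovery centre and the promotion of the backup on failure can be sketched as below. The `RecoveryCentre` class, the block layout with a `writes` list and the `forward_new_blocks` helper are hypothetical and only illustrate the flow described above.

```python
class RecoveryCentre:
    """Keeps a standby copy of the data by applying forwarded blocks in order."""

    def __init__(self):
        self.state = {}
        self.last_block = -1
        self.role = "backup"

    def apply_block(self, block):
        """Apply a block only if it is the next one in sequence."""
        if block["index"] != self.last_block + 1:
            raise ValueError("block out of order")
        for key, value in block["writes"]:
            self.state[key] = value
        self.last_block = block["index"]

    def promote(self):
        """Elevate the standby so that it serves production traffic."""
        self.role = "production"


def forward_new_blocks(chain, centre):
    """Backup-node step: send every block the centre has not yet seen."""
    for block in chain[centre.last_block + 1:]:
        centre.apply_block(block)


chain = [
    {"index": 0, "writes": [("acct:1", 100)]},
    {"index": 1, "writes": [("acct:2", 50), ("acct:1", 70)]},
]
centre = RecoveryCentre()
forward_new_blocks(chain, centre)

chain.append({"index": 2, "writes": [("acct:2", 80)]})
forward_new_blocks(chain, centre)  # only block 2 is sent this time

centre.promote()  # a production node failed: users switch to the recovery centre
print(centre.role, centre.state)
# production {'acct:1': 70, 'acct:2': 80}
```

Because the centre tracks the last block it applied, forwarding is incremental, and the promoted standby already holds the latest agreed state when users switch over.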
Application. An important application scenario in which ChainSQL has been used is a banking environment. The business requires the core business data to be protected and the most recent data to be available across the whole business process. The encryption-based tamper-resistance feature of ChainSQL, along with the multi-active database, ensures the security of customer data and continuous business operations. In case of a node failure, the data-level disaster recovery backup system activates seamlessly, thus ensuring the continuity of business operations.
4 Related Work
Many blockchain systems have been proposed in the literature, of which Bitcoin, Ethereum and Ripple are some of the most notable. The ‘Untangling blockchain’ study by Dinh et al. is a detailed note on data processing for blockchain systems. A number of initiatives have been taken from the data storage and retrieval perspective. BigchainDB, for instance, combines the capabilities of a NoSQL document-based database for fast queries with the reliability of a blockchain. Its tamper-resistance is achieved via shared replication, reversion of disallowed updates or deletes, and cryptographic signing of all transactions. However, BigchainDB only supports MongoDB and lacks support for SQL databases. Another notable blockchain system, Ustore, uses locality-aware partitioning and remote direct memory access to achieve fast retrievability, but its in-memory data storage compromises the disaster-recovery capability of the system.
In this proposal, we demonstrate the key features of ChainSQL and its novel applications through two usecases that are implemented as a middleware between the user application and the database. The first usecase is a tamper-resistant multi-active database and the second is a data-level disaster recovery backup. ChainSQL is the first system of its kind that features the tamper-resistance of the blockchain and the fast query processing of distributed databases. The utility of ChainSQL is evident from its business usecases in domains including finance and supply chain, and it therefore offers promising application scenarios for the future. Future research on spatial data, dynamics, data analytics, sharding, and verification may be conducted based on the system [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31].
The work was partially supported by the CAS Pioneer Hundred Talents Program, China [grant number Y84402, 2017], the SIAT-Peersafe IoT Security Lab supported by PeerSafe and PeerCome in Beijing and Shenzhen, China, respectively [grant number Y7Z0181001, 2017], and the CAS President’s International Fellowship Initiative, China [grant number 2018VTB0005, 2018]. The authors would also like to acknowledge the application development contributions made by Xiaoming Lu and other developers at Peersafe and SIAT.
- [1] Blockchain news. https://cointelegraph.com/, 2018. online; accessed: 09-Jan-2018.
- [2] Davide De Rosa. Blockchain programming, 2015. online; accessed: 09-Jan-2018; available at: http://davidederosa.com/basic-blockchain-programming/.
- [3] Tien-Tuan-Anh Dinh et al. Untangling blockchain: A data processing view of blockchain systems. CoRR, abs/1708.05665, 2017.
- [4] S. Suzuki and J. Murai. Blockchain as an audit-able communication channel. In 2017 IEEE COMPSAC, volume 2, pages 516–522, July 2017.
- [5] Aaron Wright and Primavera De Filippi. Decentralized blockchain technology and the rise of lex cryptographia. Available at: https://ssrn.com/abstract=2580664, Mar 2015.
- [6] Ali Hadian, Sadegh Nobari, Behrouz Minaei-Bidgoli, and Qiang Qu. ROLL: fast in-memory generation of gigantic scale-free networks. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 1829–1842, 2016.
- [7] Qiang Qu, Jiangnan Qiu, Chenyan Sun, and Yanzhang Wang. Graph-based knowledge representation model and pattern retrieval. In Fifth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2008, 18-20 October 2008, Jinan, Shandong, China, Proceedings, Volume 5, pages 541–545, 2008.
- [8] Siyuan Liu and Qiang Qu. Dynamic collective routing using crowdsourcing data. Transportation Research Part B: Methodological, 93:450–469, 2016.
- [9] Qiang Qu, Cen Chen, Christian S. Jensen, and Anders Skovsgaard. Space-time aware behavioral topic modeling for microblog posts. IEEE Data Eng. Bull., 38(2):58–67, 2015.
- [10] Qiang Qu, Siyuan Liu, Bin Yang, and Christian S. Jensen. Integrating non-spatial preferences into spatial location queries. In Conference on Scientific and Statistical Database Management, SSDBM ’14, Aalborg, Denmark, June 30 - July 02, 2014, pages 8:1–8:12, 2014.
- [11] Qiang Qu, Siyuan Liu, Feida Zhu, and Christian S. Jensen. Efficient online summarization of large-scale dynamic networks. IEEE Trans. Knowl. Data Eng., 28(12):3231–3245, 2016.
- [12] C.L. Philip Chen and Chun-Yang Zhang. Data-intensive applications, challenges, techniques and technologies: A survey on big data. J. Inf. Sci., 275(C):314–347, 2014.
- [13] Muhammad Muzammal, Qiang Qu, and Bulat Nasrulin. Renovating blockchain with distributed databases: An open source system. Future Generation Comp. Syst., 90:105–117, 2019.
- [14] Bulat Nasrulin, Muhammad Muzammal, and Qiang Qu. Chainmob: Mobility analytics on blockchain. In 19th IEEE International Conference on Mobile Data Management, MDM 2018, Aalborg, Denmark, June 25-28, 2018, pages 292–293, 2018.
- [15] Muhammad Muzammal, Moneeb Gohar, Arif Ur Rahman, Qiang Qu, Awais Ahmad, and Gwanggil Jeon. Trajectory mining using uncertain sensor data. IEEE Access, 6:4895–4903, 2018.
- [16] Siyuan Liu, Qiang Qu, Lei Chen, and Lionel M. Ni. SMC: A practical schema for privacy-preserved data sharing over distributed data streams. IEEE Trans. Big Data, 1(2):68–81, 2015.
- [17] A. S. M. Touhidul Hasan, Qiang Qu, Chengming Li, Lifei Chen, and Qingshan Jiang. An effective privacy architecture to preserve user trajectories in reward-based LBS applications. ISPRS Int. J. Geo-Information, 7(2):53, 2018.
- [18] Arati Baliga. Understanding blockchain consensus models. Technical report, Persistent Systems Ltd., 2017. Available online at: https://pdfs.semanticscholar.org/da8a/37b10bc1521a4d3de925d7ebc44bb606d740.pdf.
- [19] Trent McConaghy et al. BigchainDB: A scalable blockchain database. BigChainDB, 2016.
- [20] Ann Dinh et al. Ustore: A distributed storage with rich semantics. CoRR, abs/1702.02799, 2017.
- [21] Xin Cao, Lisi Chen, Gao Cong, Christian S. Jensen, Qiang Qu, Anders Skovsgaard, Dingming Wu, and Man Lung Yiu. Spatial keyword querying. In Conceptual Modeling - 31st International Conference ER 2012, Florence, Italy, October 15-18, 2012. Proceedings, pages 16–29, 2012.
- [22] Lei Wang, Hongyan Li, Qiang Qu, Huaqiang Zhang, and Bin Zhou. Verifying the consistency between business process model and data First IITA International Joint Conference on Artificial Intelligence, Hainan Island, China, 25-26 April 2009, pages 171–174, 2009.
- [23] Qiang Qu, Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu, and Hongyan Li. Efficient topological OLAP on information networks. In Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, Hong Kong, China, April 22-25, 2011, Proceedings, Part I, pages 389–403, 2011.
- [24] Fang Zhou, Qiang Qu, and Hannu Toivonen. Summarisation of weighted networks. J. Exp. Theor. Artif. Intell., 29(5):1023–1052, 2017.
- [25] Sadegh Nobari, Qiang Qu, and Christian S. Jensen. In-memory spatial join: The data matters! In Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21-24, 2017, pages 462–465, 2017.
- [26] Muhammad Muzammal. Test-suite prioritisation by application navigation tree mining. In International Conference on Frontiers of Information Technology, FIT 2016, Islamabad, Pakistan, December 19-21, 2016, pages 205–210, 2016.
- [27] Muhammad Muzammal and Rajeev Raman. Mining sequential patterns from probabilistic databases. Knowl. Inf. Syst., 44(2):325–358, 2015.
- [28] Muhammad Muzammal and Rajeev Raman. Mining sequential patterns from probabilistic databases. In Advances in Knowledge Discovery and Data Mining - 15th Pacific-Asia Conference, PAKDD 2011, Shenzhen, China, May 24-27, 2011, Proceedings, Part II, pages 210–221, 2011.
- [29] Qiang Qu, Siyuan Liu, Christian S. Jensen, Feida Zhu, and Christos Faloutsos. Interestingness-driven diffusion process summarization in dynamic networks. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part II, pages 597–613, 2014.
- [30] Biying Tan, Feida Zhu, Qiang Qu, and Siyuan Liu. Online community transition detection. In Web-Age Information Management - 15th International Conference, WAIM 2014, Macau, China, June 16-18, 2014. Proceedings, pages 633–644, 2014.
- [31] Ildar Nurgaliev, Muhammad Muzammal, and Qiang Qu. Enabling blockchain for efficient spatio-temporal query processing. In Web Information Systems Engineering - WISE 2018 - 19th International Conference, Dubai, United Arab Emirates, November 12-15, 2018, Proceedings, Part I, pages 36–51, 2018.