Verity: Blockchains to Detect Insider Attacks in DBMS

01/02/2019
by   Shubham S. Srivastava, et al.
0

Integrity and security of the data in database systems are typically maintained with access control policies and firewalls. However, insider attacks -- where someone with an intimate knowledge of the system and administrative privileges tampers with the data -- pose a unique challenge. Measures like append only logging prove to be insufficient because an attacker with administrative privileges can alter logs and login records to eliminate the trace of attack, thus making insider attacks hard to detect. In this paper, we propose Verity -- first of a kind system to the best of our knowledge. Verity serves as a dataless framework by which any blockchain network can be used to store fixed-length metadata about tuples from any SQL database, without complete migration of the database. Verity uses a formalism for parsing SQL queries and query results to check the respective tuples' integrity using blockchains to detect insider attacks. We have implemented our technique using Hyperledger Fabric, Composer REST API, and SQLite database. Using TPC-H data and SQL queries of varying complexity and types, our experiments demonstrate that any overhead of integrity checking remains constant per tuple in a query's results, and scales linearly.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

02/06/2018

Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries

Deciding the equivalence of SQL queries is a fundamental problem in data...
08/24/2018

Integrity Authentication for SQL Query Evaluation on Outsourced Databases: A Survey

Spurred by the development of cloud computing, there has been considerab...
03/26/2021

HyperSec: Visual Analytics for blockchain security monitoring

Today, permissioned blockchains are being adopted by large organizations...
12/06/2018

vChain: Enabling Verifiable Boolean Range Queries over Blockchain Databases

Blockchains have recently been under the spotlight due to the boom of cr...
06/06/2018

Extended Diffix

A longstanding open problem is that of how to get high quality statistic...
01/13/2020

Leveraging Blockchain for Immutable Logging and Querying Across Multiple Sites

Blockchain has emerged as a decentralized and distributed framework that...
08/11/2020

High-concurrency Custom-build Relational Database System's design and SQL parser design based on Turing-complete automata

Database system is an indispensable part of software projects. It plays ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Conventional integrity constraints in a relational database system (DBMS) involve ensuring the integrity of tuples according to predefined constraints such as foreign key constraints, data types of attribute values, etc. Another aspect of integrity stems from malicious tampering of the tuples in a DBMS. Typically this data integrity is ensured with access control policies and firewalls. With access control policies, only selected few users of a DBMS are given administrative privileges. Firewalls ensure that an outsider cannot get direct access to the DBMS server. However, an insider attack is where a privileged user, e.g., an administrator, misuses the privileges to gain access to or tamper with the data. As reported in the recent 2017 and 2018 surveys [4, 5, 15, 31], about 30% of organizations face insider attacks and a staggering 55–60% of the attackers are privileged users or administrators. Among the assets that are most at risk, “database systems” top the list with 50–57% of the insider attacks on them. The surveys conjecture that the topmost reason for insider attacks is “insufficient data protection strategies”. The surveys further point out that 90% of organizations feel vulnerable to insider threats, and insider attacks remain the most difficult to detect. Thus, insider attack has become a non-trivial and non-negligible issue in protecting the data integrity in a DBMS.

Insider attacks can be passive or active. Passive attacks involve unethical access and use of the data, while active attacks involve tampering with the data and logs, to alter the results of queries. A simple yet illuminating example of the second type of attack is tampering with academic grade records, and it has been reported multiple times in the recent past [3, 6, 7, 13]. Some countries like India have adopted Electronic Voting Machines (EVMs) for elections, avoiding paper ballots. An EVM has an embedded DBMS inside it, and each vote serves as a transaction. These EVMs are also vulnerable to insider attacks and tampering [8, 11, 18]. The traditional databases have come a long way over the past several decades in efficient, diverse, and scalable data storage solutions. SQL query optimization and processing along with the modern hardware has efficiently tackled the “memory-wall”. However, as noted above, the new age challenges are security and integrity of the data, and detection and prevention of insider attacks forms a critical component.

On the other hand, blockchain is an emerging technology for decentralized data storage with strong guarantees of immutability and tamper resistance. Blockchains can be considered analogous to append only logs in a native DBMS. However, in an insider attack, the attacker with administrative privileges can alter logs and login records to remove proof of data tampering. One straight-forward solution could be to push all the databases on blockchain frameworks. Indeed, there have already been efforts in this direction. E.g., BigchainDB [9] integrates Tendermint [20, 44] with MongoDB [17] (a NoSQL database), and provides a high transaction rate. However, it supports only decentralized blockchain based data management eco-system and supports only MongoDB’s querying interface. LedgerDB [16] is another blockchain based database, which supports high transaction throughput. However, LedgerDB supports only a single table and does not support various SQL features. ChainDB by Bitpay Inc [10] is another such solution, but it focuses on bitcoin transactions, than providing a general purpose DBMS solution. Most blockchain frameworks, as well as blockchain powered DBMSs, do not provide a rich SQL querying interface that is common to a modern DBMS. Additionally, there is a growing concern for data privacy in pushing the existing data on the public blockchain networks [1].

Intrusion Detection Systems (IDS) are analytical systems that focus on user profiling for suspicious activity detection, e.g., sudden large financial transactions, user logins from irregular locations, transactions non-compliant with the DBMS policies etc [30, 27, 41, 51, 45, 38, 26]. An IDS will not necessarily detect an insider attack if the attack does not violate its analytical modelling and user-profiling framework rules, e.g., a DBMS administrator illegitimately modifying or inserting a few tuples in a DBMS may not come under IDS radar if there is no perceived irregularity of the behaviour.

On this background, we propose a solution to detect insider attacks in a DBMS. The primary contribution of this paper is Verity, that acts as a framework facilitating use of any blockchain with any SQL DBMS (centralized or distributed). It uses the data immutability of blockchains, along with the rich SQL interface of a DBMS, without requiring to migrate entire data, or adopting a new query interface or language. The main novelty of Verity lies in our detailed algorithmic and protocol framework to handle a variety of CRUD (Create, Read, Update, Delete) SQL queries by supporting a large part of the SQL grammar – specifically the queries having nested SELECT clauses and joins over multiple tables (Section 3, Section 3.2). Verity layer in itself does not store any data or metadata about data, thus maintaining data privacy. It facilitates identification of illegitimate data tampering whenever the tampered data is accessed in an SQL query evaluation. This in turn helps to stop any cascade effect of the tampered data affecting critical decisions based on it, e.g., academic grades, financial accounting etc111Prevention of an insider attack will require different artifacts for interacting with the DBMS, and is not within the scope of the current paper.. Our solution also allows the flexibility of enabling or disabling the blockchain-based integrity checking in a plug-and-play fashion. We have implemented Verity using web-based SQL interface, Hyperledger Fabric [14], its Composer REST API [2], and SQLite database [19]. Since Verity is a framework facilitating the use of blockchains with an SQL DBMS, for the scope of this paper, we have not focused on the aspects of performance optimization for increasing the system throughput, failure-recovery, or efficient storage/indexing methods, because they are dependent on individual blockchain and DBMS platforms. However, our experimental results on TPC-H data of varying scaling factor and SQL queries of varying complexity (Section 5, Appendix A) demonstrate that any overhead incurred by Verity’s integrity checking process remains constant per tuple in the results of a query, and thus scales linearly. Also in Section 4 we discuss the aspects of possible future throughput optimizations.

2 Problem Setting

In this section, we discuss preliminaries about blockchain framework, cryptographic hash functions, and data privacy, along with an overview of the attacker model and proposed framework.

2.1 Blockchains

The concept of blockchains was introduced as a technology powering a peer to peer digital currency, Bitcoin [46]. However, capabilities of blockchains go beyond cryptocurrencies, as it can be used as a decentralized data store with strong cryptographic guarantees of tamper resistance. Conceptually, a blockchain is a linked-list, where each node in the list is called a block. Each block is cryptographically linked to the previous block, forming a continuously growing chain of blocks. The first block is called Genesis Block, which is known to all the blockchain participants, and acts as a common reference for verifying the sanctity of the blockchain. Each block after genesis may contain several transactions and other metadata such as timestamp, block height (distance from the Genesis Block) etc. Unlike a conventional data storage or DBMS, blockchain has stronger cryptographic guarantees, i.e., data once written in a block, cannot be easily modified, and creation of a new block on the chain requires consensus among blockchain peers. Any roll-back of a transaction or erasure of data gets logged as another transaction in a block, thus providing strong guarantees of data provenance (traceability).

In a blockchain, all the participants are peers. Identities of these peers may be known or hidden, which makes two broad categories of blockchains: (a) permissionless with hidden peer identities, and (b) permissioned with known peer identities. Permissionless blockchains are more popular for cryptocurrencies, where any peer can join the blockchain network and participate in the consensus protocol. Permissioned blockchains are more suitable for business applications, and they provide an alternate way of peer consensus than permissionless blockchains. Among permissioned blockchains, each may have different type of peer consensus. In Verity, we have used Hyperledger Fabric [14]. However, Verity by design is agnostic to any specific blockchain, and if a different blockchain network is used, appropriate peer consensus protocol will need to be implemented. For the scope of this paper, we do not get into the details of peer consensus algorithms, and refer the interested reader to the relevant literature [46, 14].

However, we would like to make a note here that unlike a conventional DBMS, where even a single administrator or a small number of them colluding can tamper the data and logs, in a blockchain, it requires collusion among a large fraction of peers, making it a non-trivial process. We explain this further in Section 2.2.

2.2 Attacker Model Assumptions

A DBMS has one or more administrators, and we assume that an inside attacker has full access to the database server with administrative privileges. Thus the administrator/s can tamper with the data, and modify logs and login records as well. A typical DBMS and its administrators do not have a strong cryptographic peer consensus protocol like blockchains to authorize a change. Any subset of the administrators can pose a threat as inside attackers.

The goal of Verity is to detect an insider attack, so we assume that the set of peers in a permissioned blockchain that we use along with a DBMS, are not exactly same as the respective database administrators. We also assume that, in a rare case, if the permissioned blockchain peers are same as DBMS administrators, not all of them are colluding in an insider attack. An insider attack that may go undetected for a long time can jeopardize any decisions based on the tampered data due to the cascading effect. In this scenario, we also assume that in an organization, not all critical decision makers are DBMS administrators – for the sake of fairness, the three sets (1) blockchain peers, (2) DBMS administrators, and (3) critical decision makers are not completely overlapping one another – thus making collusion among a large fraction of them less likely.

Our goal is to make tamper detection automatic, whenever the tampered data is accessed through SQL queries, than relying on manual audit, and without involving any specific privileged users. We also want tampering to be visible to anyone having access to the concerned data. In Section 3 we demonstrate how we achieve this.

For the scope of Verity, we assume that administrators’ access methods, such as private keys or passwords are secure. Our attack model does not consider any application vulnerability or system vulnerability based attacks – we consider that otherwise the database is not hacked – only a privileged user is tampering with the tuples. Note, however, that since Verity uses blockchains to log changes and verify tuple integrity, as described further in this paper, even data tampering done via hacking and bypassing blockchains can be detected. Peer consensus makes logging an illegitimate change on blockchain non-trivial, and adds traceability to it. Any aggressive, active damage to the system such as corrupting the hardware or physical theft of property is self-evident, and will result in detection of the attack.

2.3 Cryptographic Hash Functions

In the blockchain nodes we mainly store metadata about tuples in a DBMS. This process is elaborated in Section 2.4. In this section, we set the basic background of hash functions used to generate this metadata.

Cryptographic hash functions [12, 37] map any data of arbitrary length to a bit string of fixed length, called hash. A hash function is given as: where is a constant such as 128, 256, or 512. By design, cryptographic hash functions are one-way functions, and it is computationally difficult to invert (a.k.a. break) them, to generate the original data from a given hash. Also for two different pieces of input data, the difference in their hash values is not relative, and often a minor change in the input value results in a drastic change in the output hash value. Noting these properties of cryptographic hash functions, in Verity we use SHA256 [37] to generate a unique fingerprint of 256-bit for each tuple in the DBMS. This procedure is elaborated in the next section.

2.4 Metadata about Data

In order to use a blockchain network with an existing DBMS, without migrating the whole data on the blockchain, we create metadata about data, and only store that on the blockchain. This metadata is generated using the SHA256 cryptographic hash function for each tuple in a DBMS as follows. We assume that each table has a primary key defined on it222If a table does not have a primary key, we can use row-number as the primary key, or treat all the columns together as a composite primary key.. For each tuple in table , we first generate RowID as:

That is we concatenate primary key of the tuple along with the table name and generate a hash from it. Assuming that the tuple has in all columns, , using the RowID generated earlier, we generate fingerprint of the entire tuple as follows:

That is, we concatenate the RowID along with each column value and generate another 256-bit hash from it. If an attribute has a NULL value, we skip over that and concatenate the next non-NULL valued attribute. This is specifically done so that any change of a NULL value to a non-NULL and vice versa can be detected through the change in the fingerprint.

This fingerprint is then stored on a blockchain using digital signature (private key) of a blockchain peer, and after passing the peer consensus protocol (ref. Section 2.1). Thus every fingerprint of every tuple stored in a blockchain has an owner (blockchain peer) of the transaction associated with it. Any legitimate modifications to a tuple get logged on the blockchain with a new fingerprint of the tuple along with the digital signature of the respective owner. Any previous fingerprints of the modified tuple are also preserved in the blockchain. Thus blockchain provides non-repudiation on maintaining the number of legitimate tuple modifications along with their owners. However, with hash based fingerprints, the exact nature of modifications cannot be tracked through blockchains. This is in order to honour data privacy as discussed next.

2.5 Data Privacy

In our setting, the aforementioned fingerprint of each tuple in a DBMS is pushed on the blockchain. Along with this, we also store number of rows in each table on the blockchain. Detailed handling of this metadata in the event of updates is discussed in Section 3. Note that our method of storing fingerprints preserves the privacy of the original data in a DBMS, because from a fingerprint of a tuple, original attribute values cannot be retrieved (ref. Section 2.3). We trust that this is an important property of Verity system, especially among the growing concern of data privacy in using public blockchain frameworks [1]. Thus with Verity, even if a public blockchain framework is used for storing metadata, the original database values are never revealed on the blockchain framework.

2.6 Detecting Tampering

Conceptually, working of Verity can be summarized as follows:

  1. [itemindent=0mm,labelindent=0mm,leftmargin=4mm,noitemsep]

  2. We assume that at the very beginning the data in a DBMS is clear from any tampering or existing attack.

  3. We create a fingerprint for each tuple, and store this fingerprint on the blockchain.

  4. As given in Section 2.2, we assume that the same set of DBMS administrators are not peers on the blockchain network, and not all blockchain peers are colluding in an attack.

  5. Further we assume that a normal user does not have a way to access/query the given DBMS by circumventing the Verity framework. Note that in Section 2.2, we have clarified that in the context of Verity, we are not assuming a hacker scenario, and private keys and passwords are secure.

  6. A well-intentioned DBMS administrator makes any updates to the data through Verity, following the peer consensus protocol, and the modification gets logged into the blockchain. Each update that gets logged on the blockchain has a digital signature of the peer submitting the transaction. Thus for every blockchain update, there is accountability, and number of modifications can be traced through blockchain.

  7. When an inside attacker tampers with the DBMS by circumventing Verity and in turn blockchain logging, the tampering gets detected when the tampered tuples are retrieved in an SQL query at a later time issued through Verity’s interface.

Having summarized the overall concept and functioning of Verity, next in Section 3, we elaborate on the detailed architecture, and handling of SQL queries by taking into consideration SQL grammar. We trust that this will facilitate the community to use the powerful features of any commercial or opensource blockchain framework with any SQL database, without having to migrate entirely to a new system. This, in our opinion, will greatly help in faster detection of insider attacks.

3 Architecture

Three main components of the Verity framework are:

  1. [label=(),leftmargin=8mm,noitemsep]

  2. A blockchain network,

  3. An SQL database, and

  4. An HTTP based web application connecting these two that intercepts SQL queries for data integrity checks.

Figure 1: Architecture and Communication Sequence

Blockchain network and SQL database can be any blockchain and any DBMS. It is the third component (c) where we make our main contributions. Through this component we facilitate a DBMS to use a blockchain network without requiring to migrate entire data on the blockchain. Verity’s system architecture is given in Figure 1. The web application server is an HTTP based frontend for users to issue SQL queries.

In this interface, we have built our logic of parsing and intercepting SQL queries issued by an end user, and checking fingerprints of the tuples involved in building the results of that query against blockchain. REST API is an interface for Hyperledger Composer framework and is used for querying Hyperledger Fabric for retrieving or adding tuple fingerprints (ref. Section 2.4).

Note: In Verity, we have used an HTTP based interface for SQL query processing, but this interface can be in any other form too. E.g., command line or programmatic (e.g., JDBC, ODBC), based on the individual application need. Using the detailed SQL parsing algorithms given further in this section, any different interface can be built for achieving the same functionality.

In Section 3.1 we take a brief overview of the sequence of actions taken in the Verity framework for any SQL query issued through it, and then in Section 3.2 we give the details of handling four main type of SQL queries – SELECT, INSERT, UPDATE, DELETE, and any intermix of them adhering to the SQL grammar.

3.1 Verity Communication Sequence

Figure 1 shows the interactions between different components of Verity for any SQL query. An SQL query initiated using the web application interface (

1
) is parsed by the SQL processor, and it is sent to the DBMS with methodical modifications required to check integrity (

2
). The details of this process are discussed in Section 3.2. DBMS returns tuples matching this modified query (

3
). These tuples are then verified using the corresponding fingerprints stored on the blockchain (

4
,

5
), which constitutes Verity’s integrity checking phase. Once the check is successfully completed, Verity returns results of the original SQL query to the end user. Thus the integrity checking process is completely opaque to the end user, and user is notified only if the integrity check fails. In case of INSERT, UPDATE, DELETE queries, steps

2

5
may have to be executed twice if these queries have nested SELECT queries inside them. This process is elaborated further in Section 3.2.

3.2 SQL Parsing

As noted before in Section 2.6, our tamper detection model is through intercepting SQL queries and their results to check tuples’ integrity. In this section we elaborate on this aspect and show methodically how tampering can be detected through subsequent SQL query results using the Verity framework. We achieve this in the following main steps.

  1. [itemindent=0mm,labelindent=0mm,leftmargin=4mm,noitemsep]

  2. Parse the given SQL query using standard SQL grammar.

  3. Determine which tuples from the base tables are accessed, modified, or inserted in the query execution and results. For this step, Verity maintains information about the DBMS table schemas with it. Note that consistent with Verity’s privacy policy, it only stores DBMS schema, and not actual data or metadata (fingerprints of the tuples).

  4. Depending on the type of the query (detailed in Sections 3.2.13.2.4) retrieve the tuples accessed or modified by the query from the base tables in entirety, i.e., with all the attributes of those tuples. This step is opaque to the end user.

  5. Generate fingerprints of these tuples as given in Section 2.4, and cross-check those fingerprints against Hyperledger Fabric, or if tuples are inserted or modified, log the new fingerprints on Hyperledger.

  6. Once the validity of all the base tuples is established, send back the results of the original user query.

SQL is a rich data manipulation language with a lot of syntactic sugar. Within the entire range of SQL’s syntactic features, currently in Verity, we have focused on SELECT, INSERT, UPDATE, and DELETE queries, with any intermix of them. Presently we do not handle queries with outerjoins, and IN, ANY, EXISTS, GROUP-BY, HAVING, and aggregation clauses. In the future we plan to extend our parser to handle most of the SQL syntax.

In Verity, we do not assume any special access to the DBMS for knowing the tuples processed by an SQL query. Thus it is imperative to methodically intercept and parse every SQL query to get the base tuples for the validity check. We have achieved this through a LookAhead Left-to-Right (LALR) bottom up SQL parser. This is similar to a typical DBMS query parser and plan generator.

Next we present our algorithms to handle the four type of queries with any intermix and nesting. Algorithm LABEL:alg:parser is the general outer wrapper which accepts an SQL query and invokes appropriate parsing mechanism depending on the type of the outermost query, i.e., if an INSERT query has a nested SELECT subquery, the parser calls Algorithm LABEL:alg:insert meant to process INSERT queries (ref line LABEL:alg:parser:insert in Algorithm LABEL:alg:parser).

algocf[htbp]    

3.2.1 SELECT queries

SQL grammar for a SELECT query is given below (complete grammar is not shown for the sake of simplicity).

  select_statement: ‘SELECT’ projection-attr ‘FROM’
                     tables [‘WHERE’ qualifications]
  tables: single_table | ‘(’ select_statement ‘)’
                    | tables ‘,’ tables

An SQL query – with or without joins – can have projections for only some attributes from the tables, and SQL grammar allows FROM clause to have a nested SELECT query, which is treated as a temporary table333Since currently we do not handle IN, ANY, EXISTS, GROUP-BY, HAVING clauses, we assume that the WHERE clause will be devoid of any nested SELECT statements.. However, recall from Section 2.4 that the fingerprint of each tuple stored on the blockchain includes all the attributes in that tuple. Thus to verify integrity of a tuple, we need to retrieve all the attributes of it from the base tables. One naïve way of doing it is to retrieve all the tuples in the base tables before query execution, and check each tuple’s integrity. However, this incurs following main problems – (a) it misses the advantages of selectivity of a query444Selectivity of a query is high if it accesses only few tuples, and vice versa., (b) it puts the onus of performing joins of these tuples after verification on the Verity framework, thus missing the benefits of SQL query optimization methods of a native DBMS, (c) it incurs the problem of “atomicity” of checking the integrity of tuples and returning query results to the user – commonly known as the Time Of Check to Time Of Use (TOCTOU) race condition. Hence we perform this step as follows.

As shown in the grammar, a SELECT query can have nested subqueries within it as a part of the FROM clause . Our LALR parser does bottom-up parsing of all the subqueries wherein innermost nested SELECT query is parsed first. For every detected SELECT query, the parser modifies its projected attributes to include all the attributes of the respective base tables in the FROM clause. Consider for instance, the following query on tables with attributes , , respectively stored in that order.

ŁSELECT_t1.a,_t2.b,_t3.c ŁFROM_t1,_t2,_t3 ŁWHERE_t1.a=t2.a_AND_t2.b=t3.b

This query has two joins over three tables, , and is projecting out only three out of total nine attributes in three tables. For Verity’s integrity check however, we need to have all the attributes of the tuples from that are part of the join results. Thus Verity’s SQL parser internally rewrites this query as follows.

ŁSELECT_t1.x,_t1.y,_t1.a,_t2.a,_t2.s, Ł11t2.b,_t3.c,_t3.d,_t3.b ŁFROM_t1,_t2,_t3 ŁWHERE_t1.a=t2.a_AND_t2.b=t3.b

Now consider the same query rewritten in a different syntax using a nested SELECT clause as given below.

ŁSELECT_t1.a,_r1.b,_t3.c ŁFROM_t1,_(SELECT_a,_b_FROM_t2)_AS_r1,_t3 ŁWHERE_t1.a=r1.a_AND_r1.b=t3.b

This query is intercepted and rewritten by Verity as:

ŁSELECT_t1.x,_t1.y,_t1.a, Ł11r1.a,_r1.s,_r1.b, Ł11t3.c,_t3,d,_t3,b ŁFROM_t1,_(SELECT_a,_s,_b_FROM_t2)_AS_r1,_t3 ŁWHERE_t1.a=r1.a_AND_r1.b=t3.b

algocf[t]    

Algorithm LABEL:alg:select shows programmatic way of doing this procedure for a SELECT query having any level of nesting. At the beginning Algorithm LABEL:alg:select receives the original user query , with empty. The original query is stored for later reference as (line LABEL:alg:select:qorig). If the query has nested SELECT queries, then they are parsed recursively. Every time  is invoked on a nested query, the outer SELECT query is sent as . Note here that since this is a bottom-up parser, the outermost query does not get parsed or processed until all the inner queries are done parsing (lines LABEL:alg:select:recurse-beginLABEL:alg:select:recurse-end).

When the query has no more nested subqueries, we change the projected attributes in the current query to include all the attributes of all the base tables in that query (line LABEL:alg:select:changeproj). Recall that we store the DBMS schema with Verity to be able to do this. After the innermost subquery is parsed, recursion unwinds, and all the parent queries are processed to change projected attributes from their respective base tables. When the recursion ends, we are back to the outermost query. Here we get a new different from which projects out all the attributes in the accessed base tables. Then this query is executed on the DBMS (line LABEL:alg:select:dbexec). table_list contains all the base tables accessed in the query, including any subqueries. table_list is populated while the queries are parsed recursively in a bottom-up manner. This step is not shown explicitly in the algorithm for simplicity of presentation.

The results returned by this modified are cached in . Note that this is a temporary caching of the results, and Verity does not store persistently in any manner. Then for each unique base table in the query, we project out tuples of only that table from by using the DBMS schema stored in Verity, and verify the tuple fingerprint against Hyperledger Fabric (lines LABEL:alg:select:foreachtable-beginLABEL:alg:select:foreachtable-end). Once this verification is done, we project out the attributes from according to the original query , and generate the results of the original user query (line LABEL:alg:select:origres).

3.2.2 UPDATE queries

The grammar for UPDATE queries is as given below.

update_statement: ‘UPDATE’ single_table ‘SET’
                set_clauses [‘WHERE’ qualifications]
set_clauses: set_clause ‘,’ set_clauses
            | set_clause
set_clause: identifier ‘=’ expression
           | identifier ‘=’ ‘(’ select_statement ‘)’

An UPDATE query can have nested SELECT queries as a part of the SET clause. For instance consider the following UPDATE query.

ŁUPDATE_T1_SET_T1.a_=_ Ł(SELECT_a_FROM_T2_WHERE_T2.key=1234) ŁWHERE_T1.b_=_4567

In this query whichever row of has “b” column with value 4567, its respective “a” column is updated with a value of “a” from such that in the same row has value 1234. Note that in this query the nested SELECT query must always return a unique single value, and not a list of values (as that will violate the arithmetic of “=” operator).

Here Verity uses the same principle of modifying the query along with its nested queries to get – (a) the entire tuples that are going to provide update values, (b) old tuples that are going to get updated. It first checks the fingerprints of all these rows, generates new fingerprints for the updated rows, stores them on Hyperledger, and then sends the updated rows to the DBMS. This is achieved methodically using Algorithm LABEL:alg:update. Its functioning is explained briefly as follows.

algocf[htbp]    

If the SET clause of UPDATE query has nested SELECT queries, then for each such SELECT query , we invoke Algorithm LABEL:alg:select for   to get the tuples used in UPDATE. Note that Algorithm LABEL:alg:select also methodically checks the integrity of the tuples accessed by this SELECT query, thus ensuring that the tuples used for an update have not been tampered. The SET clause can have multiple such SELECT queries for each set condition. The results returned for each SELECT query are stored separately as rows[Q’] (lines LABEL:alg:update:each-select-startLABEL:alg:update:each-select-end). Then if there is a WHERE clause in the original UPDATE query, e.g., WHERE T1.b = 4567 of the example query given above, we retrieve old tuples from table that will get updated (lines LABEL:alg:update:newq-beginLABEL:alg:update:dbexec).

We check the fingerprints of these tuples against Hyperledger (line LABEL:alg:update:checkoldkeys). Next we construct the new tuples that will be in the table after the update (line LABEL:alg:update:construct), create their fingerprints, and send them to Hyperledger. Note that since we have completely deconstructed the original UPDATE query and have individual updated tuples, we simply send these individual tuples to the DBMS than the original UPDATE query (lines LABEL:alg:update:foreachrow-startLABEL:alg:update:foreachrow-end).

3.2.3 INSERT queries

SQL grammar for an INSERT query is as given below.

insert_statement: ‘INSERT INTO’ single_table
        ‘(’ id_list ‘)’ ‘VALUES’ ‘(’ expr_list ‘)’
        | ‘INSERT INTO’ single_table
        ‘(’ id_list ‘)’ select_statement

algocf[htbp]    

Like UPDATE, INSERT query too can have nested SELECT statements in it. This is when the tuples are inserted by constructing them out of a results of another SELECT query. We have given a methodical way of handling an INSERT query in Algorithm LABEL:alg:insert. If the INSERT query has a nested SELECT, we invoke Algorithm LABEL:alg:select, verify the tuples accessed by this SELECT query, done as a part of Algorithm LABEL:alg:select, and get the tuples to be inserted (lines LABEL:alg:insert:select-startLABEL:alg:insert:select-end in Algorithm LABEL:alg:insert). Else tuples are populated from the original query (line LABEL:alg:insert:nonesting). For each new tuple to be inserted, we generate its fingerprint and store it on Hyperledger. Like UPDATE queries, here too we deconstruct the INSERT query completely to generate each new tuple, and thus we send these individual tuples to the DBMS for insertion than executing the original INSERT query (lines LABEL:alg:insert:foreachrow-startLABEL:alg:insert:foreachrow-end).

3.2.4 DELETE queries

DELETE query’s grammar is as given below.

delete_statement: ‘DELETE FROM’ single_table
                [‘WHERE’ qualifications]

Processing of a DELETE query is very similar to an INSERT query. DELETE query does not contain any nested SELECT query. This is because currently we do not process IN, ANY, EXISTS, GROUP-BY, HAVING clauses, which may in turn contain nested SELECTs as a part of the WHERE qualifications. Thus processing of DELETE queries just involves fetching the tuples to be deleted, verifying their fingerprints before deletion, marking their fingerprints as deleted on Hyperledger, and then deleting them from DBMS. Algorithm LABEL:alg:delete shows this procedure.

algocf[htbp]    

4 Discussion and Future Work

In this section we discuss additional factors for future enhancements in Verity.

4.1 Thwarting SQL Injection Attack

In Section 3.2, we elaborated on our main contribution, i.e., our way of intercepting SQL queries using the SQL grammar, methodically deconstructing each nested SQL query, and verifying the integrity of the tuples returned by that query. A positive side effect of this is – Verity’s SQL parser can act as an intermediary that can thwart SQL injection attacks. In the future, Verity’s SQL parser can also be empowered with checks for malicious SQL statements.

4.2 ACID Properties

ACID (Atomicity, Consistency, Isolation, Durability) properties make the core of any mature DBMS. Verity is a dataless framework between a blockchain and a DBMS (it does not store any data or metadata with itself), thus durability does not apply to Verity. However, queries that modify the tuples in DBMS are processed via Verity, and thus need atomicity, consistency, and isolation properties of transaction management. As given in Section 3.2, Verity maintains information about the schema of the database. Using that, in the future, we plan to have an elaborate method to handle concurrent SQL queries that are modifying tuple values. DBMS transaction management is a well-studied topic [49], and borrowing the same concepts of transaction management and serializability, we can handle concurrent SQL queries modifying the tuples using the DBMS schema.

4.3 Optimizations

Currently the Verity SQL parser is single-threaded, processes one query at a time, and makes blockchain fingerprint lookup one tuple at a time for that query. Also for any transactional queries (queries that modify tuples), to maintain atomicity, consistency, and isolation (ACI properties) Verity immediately pushes them on Hyperledger. The typical block creation process is time consuming, and Hyperledger Composer adds further overheads [33]. This is the reason for our relatively higher total query execution times as shown in Table 2, Figure 2, and our supplementary report [52]. In the future we will make tuple fingerprint lookups on Hyperledger in parallel to improve overall query processing time, and investigate into the throughput of Hyperledger Fabric and Composer framework. These optimizations will go hand-in-hand with the previousely mentioned aspects in Section 4.2.

SQL language is a well-researched topic with a rich set of literature for manipulating and rewriting SQL queries for better performance. In the current Verity framework, we have adopted a way of projecting out all the columns of the tables in a nested SELECT query for integrity checking. This is due to the way tuple fingerprints are stored on the blockchain. However in the future, we can have more intelligent logic based on the structure of the query, for parsing and tuple checking, which can reduce the performance overhead and blockchain lookup time.

4.4 Indexes, Views on Base Tables

Indexes and views on tables are common in modern DBMS. The use of indexes or views in query execution is most times opaque to the end user and is decided internally by the DBMS query optimization method. Thus in an insider attack, it is possible that the attacker modifies tuples in the base tables of a DBMS, but indexes or views created on them are not updated. In such a case, the database itself is in inconsistent state. Since Verity considers DBMS to be an independent entity, we assume that these consistency checks across base tables and their indexes or views will be done by the DBMS. However, we want to highlight that the goal of Verity is to preclude illegitimately tampered tuples from being used in subsequent SQL queries, and avoid the cascade effect in important decisions based on them, e.g., academic grades or financial entities. Verity allows any tuples retrieved from DBMS that pass the integrity check against blockchain fingerprints. In case of any illegitimate modifications found in tuples, it flags them, and prevents sending the results of the query.

4.5 Special Case of Illegitimate Delete

In Section 3.2.4, we outlined our procedure for handling a DELETE query that comes through the Verity framework. However, an insider attack containing illegitimate tuple deletions presents a different challenge for detecting it purely through the subsequent SQL queries and their results. If an inside attacker deletes tuples from DBMS (and erases any logs relating to that too), the respective tuples never get selected as a part of any subsequent SQL queries, never show up in Verity framework for checking against blockchain fingerprints, and thus deletion does not get detected.

We propose the following solution for this. Recall from Section 2.5, that along with the fingerprint of each tuple, we also store the total number of tuples in a table on the blockchain. One solution for detecting illegitimate delete is to periodically run SELECT count(*) FROM T query for each table T in the DBMS, and cross verify the number of tuples returned with the tuple count stored on the blockchain. However, this check can be fooled with an illegitimate delete followed by a dummy (illegitimate) insert in the same table. To handle this case, Verity framework can periodically run SELECT * FROM T query for every table T, and check every tuple’s fingerprint against that stored on the blockchain. The dummy tuple’s fingerprint will not be found on the blockchain, and an alert will be raised for tampering of tuples.

5 Experiments

In this section we present our experiments using Verity.

(a) Scaling Factor=
(b) Scaling Factor=
(c) Scaling Factor=
(d) Scaling Factor=
Figure 2: Plots of relation between processing time vs number of effective tuples processed as per our parsing method (see Section 3.2) for 4 TPC-H scaling factors, on 43 queries as given in Appendix A.

5.1 Computation Setup

As described before in Section 3, we use Hyperledger Fabric v1.1 as the blockchain network, and Hyperledger Composer v0.19.18 REST API to write the chaincodes (for interacting with Hyperledger Fabric). Hyperledger Fabric was deployed using docker version 18.09.0, build 4d60db4. We wrote our SQL parser using Python v3.5.2 programming language, and used SQLite v3.11.0 as the underlying DBMS.

For testing, we deployed this setup on two Asus 2U servers with Intel C602-A chipset, Intel Core Xeon E5-2620 processor, should we mention number of cores on openstack vm? 16GB RAM and 1TB SATA Hard disk running Ubuntu 16.04.01 with Linux 4.4.0-62-generic kernel. We used one machine to deploy Hyperledger Fabric, and used another to parse the queries and send requests to Hyperledger. The machines were networked together using a CAT5 LAN cable. Which machine SQLite deployed on?.

5.2 Dataset and SQL Queries

We used the TPC-H [21] synthetic relational data generator in our experiments, and generated four datasets using scaling factors 0.001, 0.002, 0.005, and 0.01. Note that, although we have used TPC-H, our goal in Verity framework is not to benchmark SQLite or Hyperledger Fabric, but to demonstrate functioning of Verity under variety of SQL queries. Table 1 shows the number of tuples in each table within these datasets corresponding to each scaling factor.

Using the 22 queries from TPC-H suite as guidelines, we generated 43 queries of varying complexity and intermix of SELECT, INSERT, UPDATE, and DELETE555TPC-H suite did not have many update queries.. All the 43 queries are given in Appendix A. In summary – we used 27 SELECT queries having nested subqueries as well as joins, 6 INSERT, 5 UPDATE, and 5 DELETE queries, simple as well as with nested SELECT statements within them. Note that the queries have varying levels of complexity and nested structures within them.

Number of tuples
Scale Factors 0.001 0.002 0.005 0.01
Customer 150 300 750 1500
Lineitem 6005 11957 30201 60175
Nation 25 25 25 25
Orders 1500 3000 7500 15000
Part 200 400 1000 2000
Patsupp 700 1500 3900 8000
Region 5 5 5 5
Supplier 10 20 50 100
Total 8595 17207 43431 86805
Table 1: Number of tuples in each table of the TPC-H dataset
Q No. Q Type Is nested query? # Tables # Tuples End-to-end time Time per tuple
Scaling Factor = 0.001
2 S 1 5 0.52 0.10
11 U(S) 3 3 2.68 0.89
18 I 1 5 12.12 2.42
19 U 1 6 14.18 2.36
26 D 1 12 29.37 2.45
40 S(S) 3 2504 249.08 0.10
Scaling Factor = 0.002
2 S 1 5 0.48 0.10
11 U(S) 3 3 2.55 0.85
18 I 1 5 11.84 2.37
19 U 1 6 13.87 2.31
26 D 1 12 28.80 2.40
40 S(S) 3 5040 416.84 0.08
Scaling Factor = 0.005
2 S 1 5 0.45 0.09
11 U(S) 3 3 2.59 0.86
18 I 1 5 11.97 2.39
19 U 1 6 13.96 2.33
26 D 1 12 29.14 2.43
40 S(S) 3 12632 1,037.30 0.08
Scaling Factor = 0.01
2 S 1 5 0.52 0.10
11 U(S) 3 3 2.66 0.89
18 I 1 5 12.25 2.45
19 U 1 6 14.08 2.35
26 D 1 12 29.30 2.44
40 S(S) 3 25314 2,146.84 0.08
Table 2: Execution stats of representative queries from full report [52]. “QNo.” as per list in Appendix A. “# Tables” – tables accessed by the query, “# Tuples” – total number of tuples accessed/verified. S, I, U, D stands for SELECT, INSERT, UPDATE, DELETE respectively, letter in bracket shows type of nested query.

5.3 Performance Metrics

For experiments we used the following metrics:

  1. [noitemsep]

  2. TPC-H scaling factor, 0.001, 0.002, 0.005, 0.01.

  3. Time taken for end-to-end execution of each query, i.e., wall-time, averaged over 5 runs.

  4. Time taken for the execution of SQL queries on SQLite (Section 3.2 elaborates how the queries are executed).

  5. Time taken for Hyperledger Fabric lookup of fingerprints of the tuples in the query results (Section 3.2 describes how tuples are extracted for lookup from the SQL query results).

  6. Number of tuples accessed, inserted, or modified by a query – in case of an INSERT, UPDATE, DELETE query with nested SELECT subquery, this number includes the tuples accessed by the nested SELECT query as well as the ones modified by the INSERT, UPDATE, or DELETE query.

5.4 Analysis of the Results

We ran 43 queries on the data generated by each TPC-H scaling factor separately, and noted the end-to-end wall time, time taken to run the queries on SQLite, and time taken by Hyperledger Fabric averaged over 5 runs. We observed that the time taken for running the queries on SQLite was very negligible compared to the Hyperledger lookup. Hence we plotted the end-to-end runtime of the 43 queries against the total number of tuples affected or accessed in each query for each scaling factor. Figure 2 shows these plots.

From the plots in Figure 2, we can clearly see a linear relationship between the number of tuples affected by a query and total query runtime. Recall from our architecture given in Section 3, that Verity acts only as a framework facilitating the use of a blockchain network with an SQL DBMS. As given in Section 2.4, we store only 256 bit fixed-length fingerprint of each tuple on the blockchain network. Thus lookup overhead per tuple remains constant, and scales linearly as per total number of tuples.

For an interested reader, we have also given the exact execution times for each query, for each scaling factor in our supplementary report at [52]. In Table 2, we present the execution statistics for representative queries from this report of varying types (S, I, U, D), complexity, and nesting structure for quick insights into the results. From this detailed analysis, it can be observed that for a SELECT query, typical Hyperledger lookup time is 0.08–0.1 second per tuple, and for queries requiring modification of tuples, such as INSERT, UPDATE, DELETE, this time is about 0.8–2.5 seconds per tuple. The higher time for the queries requiring modifications in the tuples is due to the peer-consensus protocol that needs to add a fingerprint, or adjust the number of tuples per table on the Hyperledger network (ref. Sections 2.4, 2.5).

From these results, one may feel that use of blockchains along with a DBMS incurs overheads that seem undesirable for high-performance throughput. However, as discussed before in the text of this paper, in the present work, our aim is to – (a) establish a formalism for handling a rich set of complex SQL queries, (b) without migrating entire DBMS data on a blockchain, (c) thereby maintaining privacy, (d) while using the tamper-resistance properties of a blockchain to detect an insider attack. We have achieved this by intercepting SQL queries and their resulting tuples, by treating both the blockchain as well as DBMS as black-boxes. Verity itself does not do any SQL query optimization, or data or metadata caching and indexing. Thus the overall throughput of the Verity framework can be improved by investigating the blockchain throughput improvement methods (which is not the focus of our present work). Nevertheless, in Section 4.3 we have discussed the performance optimizations we intend to do in the Verity framework through parallelizing the blockchain lookups, and investigating SQL query structures.

6 Related Work

We classify related work broadly into categories given in the following subsections.

6.1 Blockchains and DBMS

As discussed briefly in Section 1, current efforts for using blockchain technology for DBMS mainly offer solutions for integrating blockchain functionalities, such as peer-consensus protocol and decentralized storage into native DBMS. Big-chainDB [9] integrates Tendermint [20, 44] with MongoDB [17]. It supports only decentralized blockchain based data management eco-system, and only MongoDB’s querying interface. LedgerDB [16] supports high transaction throughput, but provides only a single table, and does not support various SQL features. ChainDB by Bitpay Inc [10] focuses on bitcoin transactions, than providing a general purpose DBMS solution. EthernityDB [39] integrates a DBMS functionality into an Ethereum blockchain, by keeping all the data on the chain, and by mapping DBMS functionalities onto Ethereum smart contracts.

In most of these solutions the data from native DBMS has to be migrated entirely onto a new blockchain powered system. Solutions such as BigchainDB, LedgerDB, or ChainDB do not provide rich SQL interface. In comparison to them, our goal in Verity is to use blockchain functionality of non-repudiation with a native DBMS without complete migration of the data, and by just intercepting the SQL queries and their results for insider attack detection. In Verity, we store only fingerprints of the tuples on a blockchain, thus maintaining data privacy.

6.2 Intrusion Detection Systems

Intrusion Detection Systems (IDS) mainly use machine learning techniques to model legitimate or illegitimate behaviour to detect an anomaly. Their performance heavily relies on the training data used for modelling user behaviour

[23, 24, 43, 47, 50]. An interested reader can refer to a more comprehensive survey given in [40].

IDSs specific for DBMS model access patterns of DBMS users and learn data dependencies among data items. DEMIDS (DEtection of MIsuse in Database System) [30] analyzes access patterns of users by using audit logs, builds user profiles, and use these to detect unusual behaviour. RBAC (Role Based Access Control) [27] improves upon DEMIDS by creating profiles for each role instead of individual user and associating users with roles, which enables them to handle a large number of database users. They use Naïve Bayes Classifier. Wu et al [54] do role profiling with role hierarchies. Another technique mines the dependencies among data items [41], to detect what set of data items are usually accessed before a particular data is changed. Any transactions not compliant with these pre-built models are treated as malicious. WDDRM (Weighted Data Dependency Rule Miner) [51] improves upon this technique by associating weight to an attribute based on the sensitivity of the data stored in that attribute. There are other techniques that use time signatures [45, 38]

and Hidden Markov Models

[26].

6.3 Mitigation and Prevention Techniques

In [53], Vance et al propose to make the user accountable for policy violation, and in [28], Bishop et al propose attribute-based group access control (ABGAC) policies to act as a deterrent. Wu et al [55] propose the concept of “active data leak prevention” where they use an encrypted secure data container (SDC) to ensure that only authorized users are able to access the data in a trusted environment. In [48], Pramanik et al propose policies that prohibit modification of a sensitive file, while another “inappropriate” file is open. Chagarlamudi at al [29] propose sequential access checking technique that prevents execution of malicious user tasks by using PetriNets. Confidentiality via camouflage performs deterministic numerical interval-based responses of ad-hoc queries to a DBMS [36, 34, 35].

Along with these techniques, access control policies are used to prevent insider attacks. In [42], they integrate security policy mechanism in the system. Other techniques augment access control with role based policies, trustworthiness of the users, and risk assessment of the roles [32, 25]. In Cyber Deception [22], they propose a technique based on software defined networks (SDN) to defend the network with extensive scrutiny of the network.

In summary, Verity’s approach to insider attack detection is significantly different from IDS, or previously proposed prevention and mitigation techniques. In Verity, we use a technique to detect any tuple with illegitimate modifications, that is accessed in subsequent SQL queries, through fingerprint checks against Hyperledger Fabric. At the time of bootstrapping, Verity stores fingerprints of the tuples on Hyperledger, but that does not involve any machine learning of the system666Special case of illegitimate deletion of tuples is discussed in Section 4.5.. Also through Verity, one can use any blockchain network with any SQL DBMS by using their respective interfaces, and peer consensus protocols.

7 Conclusion

In this paper, we propose Verity – a framework to use blockchains to detect an insider attack or data tampering in a DBMS. We have achieved this by storing fingerprints of DBMS tuples on a blockchain (Section 2.4), intercepting SQL queries (Section 3.2), and verifying validity of the tuples accessed or affected by these queries against the blockchain fingerprints. For any queries modifying the data that are processed through Verity, it can attribute the changes to the corresponding owner. Verity has some latency due to its tuple integrity checks against blockchain for each query. However, we trust that in systems where integrity of the DBMS tuples is important from the point of critical decisions based on them, this latency can be tolerated instead of compromising the integrity of the data. Nevertheless, we have discussed ways of improving throughput in Section 4.3. Verity framework facilitates use of blockchain’s non-repudiation along with a native SQL DBMS without the need of data migration or adoption of different query interface. Verity framework does not store any data or metadata within itself persistently. This makes Verity quickly adoptable in critical applications requiring data integrity, provided some latency can be tolerated. We have implemented Verity for an academic grade management system and our current solution works very well there as the database does not have very high transaction rate and can tolerate delays caused by the intervention of the blockchain.

References

  • [1] By 2021, 75 percent of public blockchains will suffer privacy poisoning – inserted personal data that renders the blockchain noncompliant with privacy laws. https://www.dig-in.com/list/gartners-top-10-tech-predictions-for-2019-and-beyond.
  • [2] Hyperledger Composer. https://hyperledger.github.io/composer/v0.19/index.html.
  • [3] NY school official accused of changing grades. http://www.fox5ny.com/news/new-rochelle-grade-tampering-scandal.
  • [4] Trends and future directions in data security – 2015 vormetric insider threat report. Technical report, 2015. https://dtr.thalesesecurity.com/insiderthreat/2015/pdf/2015-vormetric-insider-threat-press-deck-v3.pdf.
  • [5] 2017 Insider Threat Study, 2017. https://haystax.com/blog/whitepapers/insider-attacks-industry-survey/.
  • [6] Former University of Iowa student nabbed in high-tech cheating scheme, 2017. http://www.nydailynews.com/news/national/student-arrested-stealing-tests-changing-grades-article-1.3595691.
  • [7] Probe finds late grade changes for 5,500 in Prince George’s, 2017. https://www.washingtonpost.com/local/education/probe-finds-late-grade-changes-for-5500-in-prince-georges/2017/11/03/5e54e10c-be62-11e7-959c-fe2b598d8c00_story.html.
  • [8] AAP Alleges Mass Deletion Of Votes, EVM Tampering At All-Party Meet, 2018. https://www.ndtv.com/delhi-news/aap-alleges-mass-deletion-of-votes-evm-tampering-at-all-party-meet-1960579.
  • [9] Bigchaindb 2.0: The blockchain database. white paper, 2018. https://www.bigchaindb.com/whitepaper/bigchaindb-whitepaper.pdf.
  • [10] Chaindb: A peer-to-peer database system, 2018. https://bitpay.com/chaindb.pdf.
  • [11] Congress demands JPC probe over alleged irregularities in Telangana polls, 2018. https://www.indiatoday.in/elections/story/congress-demands-jpc-probe-over-alleged-irregularities-in-telangana-polls-1409998-2018-12-15.
  • [12] Cryptographic hash functions, 2018. https://en.wikipedia.org/wiki/Cryptographic˙hash˙function.
  • [13] Former UGA student hacked into system to change grade, 2018. http://www.fox5atlanta.com/news/police-former-uga-student-hacked-into-system-to-change-grade.
  • [14] Hyperledger fabric, 2018. https://www.hyperledger.org/projects/fabric.
  • [15] Insider Threat – CA Technologies, 2018. https://www.ca.com/content/dam/ca/us/files/ebook/insider-threat-report.pdf.
  • [16] LedgerDB github repo., 2018. https://github.com/ledgerdb/ledgerdb.
  • [17] MongoDB – Opensource Document Database, 2018. https://www.mongodb.com/.
  • [18] Opposition raises EVM tampering claims in Kairana, Noorpur, 2018. https://economictimes.indiatimes.com/news/politics-and-nation/opposition-raises-evm-tampering-claims-in-kairana-noorpur/articleshow/64351367.cms.
  • [19] SQLite Database, 2018. https://www.sqlite.org/index.html.
  • [20] Tendermint, 2018. https://tendermint.com/.
  • [21] TPC-H Benchmark, 2018. http://www.tpc.org/tpch/.
  • [22] S. Achleitner, T. La Porta, P. McDaniel, S. Sugrim, S. V. Krishnamurthy, and R. Chadha. Cyber deception: Virtual networks to defend insider reconnaissance. In Proceedings of the 8th ACM CCS International Workshop on Managing Insider Security Threats, MIST ’16, pages 57–68, New York, NY, USA, 2016. ACM.
  • [23] G. Ali, N. A. Shaikh, and Z. A. Shaikh. Towards an automated multiagent system to monitor user activities against insider threat. In 2008 International Symposium on Biometrics and Security Technologies, pages 1–5, April 2008.
  • [24] Q. Althebyan and B. Panda. A knowledge-base model for insider threat prediction. In 2007 IEEE SMC Information Assurance and Security Workshop, pages 239–246, June 2007.
  • [25] N. Baracaldo and J. Joshi. A Trust-and-risk Aware RBAC Framework: Tackling Insider Threat. In Proceedings of the 17th ACM Symposium on Access Control Models and Technologies, SACMAT ’12, pages 167–176, New York, NY, USA, 2012. ACM.
  • [26] D. Barbará, R. Goel, and S. Jajodia. Mining malicious corruption of data with hidden Markov models. In Research Directions in Data and Applications Security, pages 175–189. Springer, 2003.
  • [27] E. Bertino, E. Terzi, A. Kamra, and A. Vakali. Intrusion detection in RBAC-administered databases. In Computer security applications conference, 21st annual, pages 10–pp. IEEE, 2005.
  • [28] M. Bishop, S. Engle, S. Peisert, S. Whalen, and C. Gates. We have met the enemy and he is us. In Proceedings of the 2008 New Security Paradigms Workshop, NSPW ’08, pages 1–12, New York, NY, USA, 2008. ACM.
  • [29] M. Chagarlamudi, B. Panda, and Y. Hu. Insider threat in database systems: Preventing malicious users’ activities in databases. In 2009 Sixth International Conference on Information Technology: New Generations, pages 1616–1620, April 2009.
  • [30] C. Y. Chung, M. Gertz, and K. Levitt. Demids: A misuse detection system for database systems. In Integrity and Internal Control in Information Systems, pages 159–178. Springer, 2000.
  • [31] M. Collins, M. Theis, R. Trzeciak, J. Strozer, J. Clark, D. Costa, T. Cassidy, M. Albrethsen, and A. Moore. Common sense guide to mitigating insider threats. Technical Report CMU/SEI-2016-TR-015, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, 2016.
  • [32] J. Crampton and M. Huth. Towards an access-control framework for countering insider threats. In Insider Threats in Cyber Security, 2010.
  • [33] G. Engstrand. Evaluating Hyperledger Composer. https://www.infoq.com/articles/evaluating-hyperledger-composer.
  • [34] R. Garfinkel, R. Gopal, and P. Goes. Privacy protection of binary confidential data against deterministic, stochastic, and insider threat. Management Science, 48(6):749–764, 6 2002.
  • [35] R. Garfinkel, R. Gopal, and D. Rice. New approaches to disclosure limitation while answering queries to a database: Protecting numerical confidential data against insider threat based on data or algorithms. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS’06), volume 6, pages 125a–125a, Jan 2006.
  • [36] R. D. Gopal, R. S. Garfinkel, and P. B. Góes. Confidentiality via camouflage: The CVC approach to disclosure limitation when answering queries to databases. Operations Research, 50(3):501–516, 2002.
  • [37] H. Handschuh. SHA Family (Secure Hash Algorithm), pages 565–567. Springer US, Boston, MA, 2005.
  • [38] S. Hashemi, Y. Yang, D. Zabihzadeh, and M. Kangavari. Detecting intrusion transactions in databases using data item dependencies and anomaly analysis. Expert Systems, 25(5):460–473, 2008.
  • [39] S. Helmer, M. Roggia, N. E. Ioini, and C. Pahl. EthernityDB – Integrating Database Functionality into a Blockchain. In New Trends in Databases and Information Systems, 2018.
  • [40] I. Homoliak, F. Toffalini, J. Guarnizo, Y. Elovici, and M. Ochoa. Insight into insiders: A survey of insider threat taxonomies, analysis, modeling, and countermeasures. CoRR, abs/1805.01612, 2018.
  • [41] Y. Hu and B. Panda. A data mining approach for database intrusion detection. In Proceedings of the 2004 ACM symposium on Applied computing, pages 711–716. ACM, 2004.
  • [42] G. G. Jabbour and D. A. Menascé. Stopping the insider threat : the case for implementing integrated autonomic defense mechanisms in computing systems. 2010.
  • [43] M. Kandias, A. Mylonas, N. Virvilis, M. Theoharidou, and D. Gritzalis. An insider threat prediction model. In S. Katsikas, J. Lopez, and M. Soriano, editors, Trust, Privacy and Security in Digital Business, pages 26–37, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
  • [44] J. Kwon. Tendermint: Consensus without mining. Draft v. 0.6, fall, 2014.
  • [45] V. C. Lee, J. A. Stankovic, and S. H. Son. Intrusion detection in real-time database systems via time signatures. In Real-Time Technology and Applications Symposium, 2000. RTAS 2000. Proceedings. Sixth IEEE, pages 124–133. IEEE, 2000.
  • [46] S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008. https://bitcoin.org/bitcoin.pdf.
  • [47] S. Panigrahi, S. Sural, and A. K. Majumdar. Two-stage database intrusion detection by combining multiple evidence and belief update. Information Systems Frontiers, 15(1):35–53, Mar 2013.
  • [48] S. Pramanik, V. Sankaranarayanan, and S. Upadhyaya. Security policies to mitigate insider threat in the document control domain. In 20th Annual Computer Security Applications Conference, pages 304–313, Dec 2004.
  • [49] R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, Inc., New York, NY, USA, 3 edition, 2003.
  • [50] V. Sankaranarayanan, S. Pramanik, and S. Upadhyaya. Detecting masquerading users in a document management system. In 2006 IEEE International Conference on Communications, volume 5, pages 2296–2301, June 2006.
  • [51] A. Srivastava, S. Sural, and A. K. Majumdar. Weighted intra-transactional rule mining for database intrusion detection. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 611–620. Springer, 2006.
  • [52] S. S. Srivastava, M. Atre, S. Sharma, R. Gupta, and S. K. Shukla. Verity Technical Report – Additional Experimental Results. Technical report. https://www.cs.ox.ac.uk/people/medha.atre/papers/veritytechrep.pdf.
  • [53] A. Vance, P. B. Lowry, and D. Eggett. Increasing accountability through user-interface design artifacts: A new approach to addressing the problem of access-policy violations. MIS Q., 39(2):345–366, June 2015.
  • [54] G. Z. Wu, S. L. Osborn, and X. Jin. Database intrusion detection using role profiling with role hierarchy. In W. Jonker and M. Petković, editors, Secure Data Management, pages 33–48, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
  • [55] J. Wu, J. Zhou, J. Ma, S. Mei, and J. Ren. An active data leakage prevention model for insider threat. In Intelligence Information Processing and Trusted Computing (IPTC), 2011 2nd International Symposium on, pages 39–42. IEEE, 2011.

Appendix A SQL Queries

Q1: SELECT o_year, nation, ( sum (volume) as mkt_share) from (( select (o_orderdate as o_year), ((l_extendedprice * (1 - l_discount)) as volume), (n2.n_name as nation) from part, supplier, lineitem, orders, customer, (nation as n1), (nation as n2), region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = “asia” and s_nationkey = n2.n_nationkey and o_orderdate ’1995-01-01’ and o_orderdate ‘1996-12-31’ and p_type = ‘large plated tin’ ) as all_nations);

Q2: SELECT * from region;

Q3: SELECT * from supplier;

Q4: SELECT * from nation;

Q5: UPDATE customer set c_name = “sjadfd”, c_address = “kafawehrnj”, c_phone=“7894561265”, c_acctbal = 22, c_mktsegment = “klasjfaw”, c_comment=“laksfnwe” where c_custkey = 91639739;

Q6: INSERT into nation (n_nationkey, n_name, n_regionkey, n_comment) values ( 93793619 ,“algeria”, 123454556741 ,“haggle detect slyly agai”);

Q7: INSERT into customer ( c_custkey , c_name , c_address , c_nationkey , c_phone , c_acctbal ,c_mktsegment, c_comment ) values (91639739 , “loren”, “lipsum”, 93793619 , “1234”, 234, “muspil”, “nerol”);

Q8: INSERT into region (r_regionkey, r_name, r_comment) values (123454556741, “sambhal”, “jhfasfhf kajhfawerb idauhfwerbe aksfhnwejrb”);

Q9: DELETE from nation where n_nationkey = 93793619;

Q10: DELETE from region where r_regionkey = 123454556741;

Q11: UPDATE supplier set s_nationkey = (select c_nationkey from customer where c_custkey= 91639739), s_phone = ( select c_phone from customer where c_custkey= 91639739), s_comment =“askdenrjuhereu”, s_acctbal = 2 + 10 where s_suppkey = 91639739 +1000;

Q12: SELECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 1 and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = “africa”;

Q13: SELECT n_name, (sum(l_extendedprice * (1 - l_discount)) as revenue) from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = “asia” and o_orderdate “1995-03-09” and o_orderdate “1996-03-09”;

Q14: SELECT supp_nation, cust_nation, l_shipdate, (sum (volume) as revenue) from (( select (n1.n_name as supp_nation), ( n2.n_name as cust_nation), l_shipdate, ((l_extendedprice * (1 - l_discount)) as volume) from supplier, lineitem, orders, customer, (nation as n1), (nation as n2) where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = “india” and n2.n_name = “united states”) or (n1.n_name = “united states” and n2.n_name = “india”) ) and l_shipdate ‘1995-01-01’ and l_shipdate ‘1996-12-31’ ) as shipping);

Q15: SELECT (sum(l_extendedprice * l_discount) as revenue) from lineitem where l_shipdate “1994-04-15” and l_shipdate “1995-04-15” and l_discount 0.04 - 0.01 and l_discount 0.04 + 0.01 and l_quantity 20 ;

Q16: SELECT o_orderpriority, (count(*) as order_count) from orders where o_orderdate “1995-03-09” and o_orderdate “1995-06-09”;

Q17: SELECT l_shipmode, o_orderpriority from orders, lineitem where o_orderkey = l_orderkey and (l_shipmode = “ship” or l_shipmode = “air”) and l_commitdate l_receiptdate and l_shipdate l_commitdate and l_receiptdate “1995-01-01” and l_receiptdate “1996-01-01”;

Q18: INSERT into customer ( c_custkey , c_name , c_address , c_nationkey , c_phone , c_acctbal ,c_mktsegment, c_comment) values (91639738 , “sumpil”, “renol”, 93793619 , “9242”, 234, “pilsum”, “rolen”) , ( 91639737 , “abc”, “def”, 93793619 , “1234”, 234, “yhbdsra”, “afgsdf”), (96244913, “pkjhbc”, “mnhgre”, 93793619 , “9543”, 234, “qaxcvf”, “iomnbgf”), ( 96244914, “yuthgbvfg”, “qgytrevd”, 93793619 , “75345”, 234, “liyhvdrt”, “qfgkdyv”), (96244915, “ramnabfubt”,
“njhiyfcvh”, 93793619 , “126789”, 234, “summinhsve”,
“qgjorutbbs”);

Q19: UPDATE customer set c_name=“ashdehhrbeki” where c_name = “sjadfd”;

Q20: UPDATE customer set c_name = “sjadfd”, c_address = “asndkwewhrwoer”, c_phone = “3245456458”, c_acctbal=22 ,c_mktsegment = “ajshuejre”, c_comment = “asjhdeke” where c_nationkey = 93793619;

Q21: DELETE from customer where c_nationkey = 93793619;

Q22: INSERT into supplier (s_suppkey, s_name, s_address, s_nationkey, s_phone, s_acctbal, s_comment) select ( c_custkey + 1000 ), c_name, c_address, c_nationkey, c_phone, c_acctbal, c_comment from customer where c_nationkey = 93793619;

Q23: SELECT l_orderkey, (sum ( l_extendedprice) as revenue), o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = “automobile” and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate “1995-03-09” and l_shipdate “1995-03-09”;

Q24: SELECT (sum ( l_extendedprice * (1 - l_discount )) as promo_revenue) from lineitem, part where l_partkey = p_partkey and l_shipdate “1995-01-01” and l_shipdate “1995-02-01”;

Q25: SELECT * from customer;

Q26: DELETE from supplier where s_nationkey = 93793619;

Q27: SELECT p_brand, p_type, p_size, (count ( ps_suppkey ) as supplier_cnt ) from partsupp, part where p_partkey = ps_partkey and p_brand ‘brand#34’ and p_type not like ‘medium brushed brass’ and ( p_size = 22 or p_size = 47 or p_size = 30 or p_size = 29 or p_size = 11 or p_size = 37 or p_size = 42 or p_size = 34 or p_size = 40 );

Q28: SELECT * from part;

Q29: SELECT ps_partkey, (sum(ps_supplycost * ps_availqty) as value) from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = “india”;

Q30: SELECT c_custkey, c_name, (sum ( l_extendedprice * (1 - l_discount)) as revenue), c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate “1995-01-01” and o_orderdate “1995-04-01” and l_returnflag = ‘r’ and c_nationkey = n_nationkey;

Q31: UPDATE supplier set s_acctbal = (select sum(c_custkey) from (customer as c), (nation as n) where c.c_nationkey = n.n_nationkey), s_phone = ( select c_phone from customer where c_custkey= 91639738), s_comment = “asdherbejhrbeh” where s_suppkey = 100001;

Q32: SELECT s_suppkey, n_name, s_name from supplier, nation where supplier.s_nationkey = nation.n_nationkey;

Q33: SELECT s_suppkey, n_name, s_name from (( select * from supplier ) as sup ), nation where sup.s_nationkey = nation.n_nationkey;

Q34: SELECT s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, (( select ( l_suppkey), (max ( l_extendedprice * (1 - l_discount)) as total_revenue) from lineitem where l_shipdate “1995-01-01” and l_shipdate “1995-04-01”) as sup) where s_suppkey = sup.l_suppkey;

Q35: SELECT sn.s_name, rn.r_name from (( select n_name, s_name from (supplier as sup ), nation where sup.s_nationkey = nation.n_nationkey) as sn), (( select n_name, r_name from ( region as reg ), nation where reg.r_regionkey = nation.n_regionkey ) as rn ) where sn.n_name = rn.n_name;

Q36: SELECT * from partsupp;

Q37: SELECT * from orders;

Q38: SELECT nation, o_year, (sum (amount) as sum_profit) from (( select ( n_name as nation), (o_orderdate as o_year), ((( l_extendedprice * (1 - l_discount )) - ( ps_supplycost * l_quantity )) as amount ) from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like ‘%ab%’ ) as profit);

Q39: SELECT l_returnflag, l_linestatus, (sum (l_quantity) as sum_qty), (sum (l_extendedprice) as sum_base_price), (avg (l_quantity) as avg_qty), (avg (l_extendedprice) as avg_price), (avg (l_discount) as avg_disc) from lineitem where l_quantity 20;

Q40: SELECT c_orders.c_custkey, (count(*) as custdist) from (( select c_custkey, o_orderkey from customer, orders where c_custkey = o_custkey and o_comment not like ‘%fi%al%’ ) as c_orders); select * from customer;

Q41: DELETE from supplier where s_suppkey 20000;

Q42: INSERT into supplier ( s_suppkey, s_name, s_address, s_nationkey, s_phone, s_acctbal, s_comment ) select ( c_custkey + 100000 ), n_name, c_address, c_nationkey, c_phone, c_acctbal, c_comment from ( customer as c), ( nation as n ) where c.c_nationkey = n.n_nationkey ;

Q43: SELECT * from lineitem;