Satoshi Nakamoto’s 2008 proposal of Bitcoin has revolutionised the financial sector. It realised a monetary system that does not rely on a central trusted authority and has since given rise to hundreds of new systems known as cryptocurrencies. Interestingly, however, a closer look at the basics of Bitcoin sheds light on a new technology: blockchains. Since then, there has been a great deal of academic research [21, 28, 14, 16] on the security and applications of blockchains as a primitive. A blockchain in its most primitive form is a decentralised chain of agreed-upon blocks containing timestamped data.
A consensus mechanism supports the decentralised nature of blockchains. Different consensus mechanisms rely on different resources, such as Proof of Work (PoW), based on computational power; Proof of Stake (PoS), based on the stake in the system; and Proof of Space, based on storage capacity, among many others. Typically, users in the system store a local copy of the blockchain and run the consensus mechanism to agree on a unified view of the blockchain. These mechanisms must rely on the non-replicability of resources to be resilient against simple Sybil attacks, in which the adversary spawns multiple nodes under his control.
Apart from its fundamental purpose as a digital currency, Bitcoin's blockchain has been exploited as a tool for many different applications, such as timestamping services [23, 22], achieving fairness and correctness in secure multi-party computation [9, 7, 15, 31], and building smart contracts. It acts as an immutable public bulletin board that supports the storage of arbitrary data through special operations. For instance, one such operation can carry up to 80 bytes of arbitrary data that gets stored in the blockchain. With no requirement for centralised trust and its capability of supporting complex smart contracts, communication through the blockchain has become practical, reasonably inexpensive and very attractive for applications.
Blockchain and Immutability
The debate about the immutability of blockchain protocols has recently gained worldwide attention due to the adoption of the General Data Protection Regulation (GDPR) by European states. Several provisions of the GDPR are inherently incompatible with current permissionless immutable blockchain proposals (e.g., Bitcoin and Ethereum), as such protocols make it impossible to remove any data (addresses, transaction values, timestamp information) that has stabilised in the chain (a transaction, or any data, is considered stable when it is “deep” enough in the chain; we formally define this property in Section 2.2). Since permissionless blockchains are completely decentralised and allow any user to post transactions to the chain for a small fee, malicious users can post transactions containing illegal and/or harmful data, such as (child) pornography, private information or stolen private keys. The existence of such illicit content was first reported in earlier work and has remained a challenge for law enforcement agencies such as Interpol. Moreover, the quantitative analysis in the recent work of Matzutt et al. shows that it is not feasible to “filter” all data from incoming transactions for malicious content before the transactions are inserted into the chain. Therefore, once it becomes public knowledge that malicious data was inserted into (and has stabilised in) the chain, honest users face a choice: either willingly broadcast illicit (and possibly illegal [34, 5]) data to other users, or stop using the system altogether.
This effect greatly hinders the adoption of permissionless blockchain systems, as honest users that are required to comply with regulations, such as GDPR, are forced to withdraw themselves from the system if there is no recourse in place to deal with illicit data inserted into the chain.
1.1 State of the Art
To tackle specifically the problem of arbitrary harmful data insertions in the blockchain, the notion of redacting the contents of a blockchain was first proposed by Ateniese et al. The authors propose a solution, based on chameleon hashes, that is focused more on the permissioned blockchain setting (the setting in which a trusted third party (TTP) decides on users’ entry into the system). In their protocol, a chameleon hash function replaces the regular SHA-256 hash function when linking consecutive blocks in the chain. When a block is modified, a collision for the chameleon hash function can be efficiently computed for that block (with knowledge of the chameleon trapdoor key), keeping the state of the chain consistent after arbitrary modifications.
In a permissioned setting, where control of the chain is shared among a few semi-trusted parties, the solution of Ateniese et al. is elegant and works well; it has even been commercially adopted by a large consultancy company [4, 3, 11]. However, in permissionless blockchains such as Bitcoin, where users join and leave the system constantly and without any regulation, their protocol falls short: their techniques of secret sharing the chameleon trapdoor key and running an MPC protocol to compute a collision for the chameleon hash function do not scale to the thousands of users in the Bitcoin network. Moreover, when a block is removed in their protocol, the change is completely unnoticeable to the users, leaving no trace of the old state. Although this could make sense in a permissioned setting, in a permissionless setting one would like some public accountability as to when and where a redaction has occurred.
Later, Puddu et al. proposed a blockchain protocol in which the sender of a transaction can encrypt alternate versions of the transaction data, known as “mutations”; the only unencrypted version of the transaction is considered to be the active transaction. The decryption keys are secret-shared among the miners, and the sender of a transaction establishes a mutation policy for it that details how (and by whom) the transaction may be mutated. On receiving a mutate request, the miners run an MPC protocol to reconstruct the decryption key and decrypt the appropriate version of the transaction. The miners then publish this new version as the active transaction. For permissionless blockchains, they propose using voting to gauge approval based on computational power. However, in a permissionless setting a malicious user can simply omit mutations for his transaction, or even set a mutation policy under which only he himself is able to mutate the transaction. Moreover, to tackle transaction consistency, where a mutated transaction affects other transactions in the chain, they propose to mutate all affected transactions through a cascading effect. This, however, completely breaks the notion of transaction stability; e.g., a payment made to a user in the past could be altered as a result of this cascading mutation. The proposal of Puddu et al. also suffers from scalability issues due to the MPC protocol used to reconstruct decryption keys across different users.
It is clear that, for a permissionless blockchain without centralised trust assumptions, a practical solution for redacting harmful content must refrain from employing large-scale MPC protocols that hinder the performance of the blockchain. It must also accommodate public verifiability and accountability, such that rational miners are incentivised to follow the protocol.
1.2 Our Contributions
Editable Blockchain Protocol
In Section 3 we propose the first editable blockchain protocol for permissionless systems, which is completely decentralised and does not rely on heavy cryptographic primitives or additional trust assumptions. This makes our protocol easy to integrate into systems like Bitcoin (as described in Section 5). Edit operations can be proposed by any user and are voted on in the blockchain through consensus; edits are only performed if approved by the blockchain policy (e.g., voted for by the majority). The protocol is based on PoW consensus; however, it can easily be adapted to any consensus mechanism, since the core ideas are inherently independent of the type of consensus used. Our protocol also offers accountability for edit operations, as any edit in the chain can be publicly verified.
We build our protocol on firm theoretical grounds: we formalise all the necessary properties of an editable blockchain in Section 4 and show that our generic protocol of Section 3.3 satisfies them. We borrow the fundamental properties of a secure blockchain protocol from prior work and adapt them to our setting.
We demonstrate the practicality of our protocol with a proof-of-concept implementation in Python. We first show in Section 6 that adding our redaction mechanism incurs only a small overhead in chain validation time compared to the immutable protocol. Then, we show that, compared to a redactable chain with no redactions, the overhead incurred by different numbers of redactions in the chain is minimal (less than for redactions on a blocks chain). Finally, we analyse the effect of the protocol's parameters by measuring the overhead introduced by different choices of the system parameters when validating chains with redactions.
1.3 Our Protocol
Our protocol extends the immutable blockchain of Garay et al. to accommodate edit operations as follows. We extend the block structure with another copy of the transactions’ Merkle root, which we denote by old state. We also consider an editing policy for the chain, which determines the constraints and requirements for approving edit operations. To edit a block in the chain, our protocol (Fig. 1) executes the following steps:
A user first proposes an edit request to the system. The request consists of the index of the block he wants to edit, and a candidate block to replace it.
When miners in the network receive an edit request, they first validate the candidate block using its old state information, verifying the following conditions: (1) it contains the correct information about the previous block, (2) it has solved the proof of work, and (3) it does not invalidate the next block in the chain. If the candidate block is valid, miners can vote for it during the request’s voting period by simply including the hash of the request in the next block they mine. The collision resistance of the hash function ensures that a vote for one edit request cannot be counted as a vote for any other edit request.
After the voting period for a request is over, everyone in the network can verify whether the edit request was approved in accordance with the policy (e.g., by checking the number of votes it received). If the request was approved, the edit operation is performed by replacing the original block with the candidate block.
To validate an edited chain, the miners validate each block exactly as in the immutable protocol; if a “broken” link is found between blocks, the miner checks whether the link still holds for the old state information (a similar technique has previously been used to “scar” a block that was redacted). If it does, the miner ensures that the edited block has gathered enough votes and was approved according to the policy of the chain.
Throughout this work we denote by $\lambda$ the security parameter and by $y \leftarrow A(x)$ the output $y$ of an algorithm $A$ on input $x$. We also use the terms “redact” and “edit” interchangeably in this paper.
2.1 Blockchain Basics
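We make use of the notation of prior work to describe a blockchain. A block is a triple of the form $B := \langle s, x, ctr \rangle$, where $s \in \{0,1\}^{\lambda}$, $x \in \{0,1\}^{*}$ and $ctr \in \mathbb{N}$. Here $s$ is the state of the previous block, $x$ is the data and $ctr$ is the proof of work of the block. A block $B$ is valid iff $H(ctr, G(s, x)) < D$.
Here, $H$ and $G$ are cryptographic hash functions, and the parameter $D \in \mathbb{N}$ is the block’s difficulty level.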
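The blockchain is simply a chain (or sequence) of blocks, which we call $C$. The rightmost block is called the head of the chain, denoted by $\mathrm{Head}(C)$. Any chain $C$ with head $\mathrm{Head}(C) := \langle s, x, ctr \rangle$ can be extended to a new, longer chain $C' := C \Vert B'$ by attaching a (valid) block $B' := \langle s', x', ctr' \rangle$ such that $s' = H(ctr, G(s, x))$; the head of the new chain $C'$ is $\mathrm{Head}(C') = B'$. A chain can also be empty, in which case we let $C := \varepsilon$. The function $\mathrm{len}(C)$ denotes the length of a chain $C$ (i.e., its number of blocks). For a chain $C$ of length $n$ and any $k \geq 0$, we denote by $C^{\lceil k}$ the chain resulting from removing the $k$ rightmost blocks of $C$, and analogously we denote by ${}^{k\rceil}C$ the chain resulting from removing the $k$ leftmost blocks of $C$; note that if $k \geq n$ (where $n = \mathrm{len}(C)$), then $C^{\lceil k} := \varepsilon$ and ${}^{k\rceil}C := \varepsilon$. If $C$ is a prefix of $C'$, we write $C \preceq C'$. We also note that the difficulty level $D$ can differ among blocks in a chain.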
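To make this notation concrete, here is a minimal Python sketch of blocks $\langle s, x, ctr \rangle$, the validity check $H(ctr, G(s, x)) < D$, chain extension, and the pruning and prefix operators. It assumes SHA-256 for both $H$ and $G$, an arbitrary byte encoding, an all-zero state for the first block, and a toy difficulty $D = 2^{252}$ so that mining finishes almost instantly; none of these choices come from the protocol, and all helper names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List

D = 2 ** 252  # toy difficulty (assumption): roughly 1 in 16 counters succeeds


def G(s: bytes, x: bytes) -> bytes:
    """Hash of (previous state, data); SHA-256 is an assumed instantiation."""
    return hashlib.sha256(len(s).to_bytes(4, "big") + s + x).digest()


def H(ctr: int, digest: bytes) -> bytes:
    """Hash of (proof-of-work counter, G(s, x)); SHA-256 is an assumption."""
    return hashlib.sha256(ctr.to_bytes(8, "big") + digest).digest()


@dataclass(frozen=True)
class Block:
    s: bytes  # state of the previous block
    x: bytes  # block data
    ctr: int  # proof of work


def valid_block(b: Block) -> bool:
    # B = <s, x, ctr> is valid iff H(ctr, G(s, x)) < D
    return int.from_bytes(H(b.ctr, G(b.s, b.x)), "big") < D


def extend(chain: List[Block], x: bytes) -> List[Block]:
    """Return C || B' where s' = H(ctr, G(s, x)) is computed from Head(C)."""
    if chain:
        head = chain[-1]
        s_new = H(head.ctr, G(head.s, head.x))
    else:
        s_new = bytes(32)  # hypothetical fixed state for the first block
    ctr = 0
    while not valid_block(Block(s_new, x, ctr)):  # brute-force the PoW
        ctr += 1
    return chain + [Block(s_new, x, ctr)]


def prune_right(chain: List[Block], k: int) -> List[Block]:
    """C^{⌈k}: drop the k rightmost blocks (empty chain if k >= len(C))."""
    return chain[: max(len(chain) - k, 0)]


def prune_left(chain: List[Block], k: int) -> List[Block]:
    """^{k⌉}C: drop the k leftmost blocks (empty chain if k >= len(C))."""
    return chain[k:]


def is_prefix(c1: List[Block], c2: List[Block]) -> bool:
    """C1 ≼ C2: c1 is a prefix of c2."""
    return c2[: len(c1)] == c1
```

For example, extending an empty list three times yields a chain in which each block's `s` equals $H(ctr, G(s, x))$ of its predecessor, and `prune_right(C, 1)` is a prefix of `C`.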
2.2 Properties of a Secure Blockchain
In this section we detail the relevant aspects of the underlying blockchain system that are required for our protocol.
We consider time to be divided into standard discrete units, such as minutes. A well-defined, contiguous span of these units is called a slot. Each slot $sl_r$ is indexed by $r \in \{1, 2, 3, \dots\}$. We assume that users have a synchronised clock that indicates the current time down to the smallest discrete unit. The users execute a distributed protocol to generate a new block in each slot, where a block contains some data. We assume the slots’ real-time window properties as in prior work. In [21, 39, 28] it is shown that a “healthy” blockchain must satisfy the properties of persistence and liveness, which intuitively guarantee that after some time period all honest users of the system will have a consistent view of the chain, and that transactions posted by honest users will eventually be included. We informally discuss the two properties next.
Persistence: Once a user in the system announces a particular transaction as stable, all of the remaining users, when queried, will either report the transaction in the same position in the ledger or will not report any other conflicting transaction as stable. A system parameter $k$ determines the number of blocks that stabilise a transaction: a transaction is stable if the block containing it has at least $k$ blocks following it in the blockchain. We only consider a transaction to be in the chain after it becomes stable.
Liveness: If all the honest users in the system attempt to include a certain transaction into their ledger, then after the passing of a number of slots that represents the transaction confirmation time, all users, when queried and responding honestly, will report the transaction as being stable.
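The stability rule used by persistence can be illustrated with a small, hypothetical helper. It models a chain as a list of blocks, each a list of transaction IDs, and reports the position of a transaction only once at least $k$ blocks follow the block that contains it; this is a sketch of the counting rule, not of the consensus protocol that guarantees persistence.

```python
from typing import List, Optional


def stable_position(chain: List[List[str]], tx: str, k: int) -> Optional[int]:
    """Return the index of the block holding tx if at least k blocks follow it
    (i.e., tx is stable); return None if tx is absent or not yet stable.

    The chain is modelled as a list of blocks, each a list of transaction IDs.
    """
    for i, block in enumerate(chain):
        if tx in block:
            blocks_after = len(chain) - 1 - i
            return i if blocks_after >= k else None
    return None
```

With `k = 2` and a four-block chain `[["a"], ["b"], ["c"], ["d"]]`, transactions `a` and `b` are stable while `c` is not yet.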
Throughout the paper we use the terms user and miner interchangeably.
2.3 Execution Model
In the following we define the notation for our protocol executions. Our definitions follow the same lines as those of prior work.
A protocol refers to an algorithm for a set of interactive Turing machines (also called nodes) to interact with each other. The execution of a protocol $\Pi$ is directed by an environment (or outer game) $\mathcal{Z}$, which activates a number of parties as either honest or corrupted parties. Honest parties faithfully follow the protocol’s prescription, whereas corrupted parties are controlled by an adversary $\mathcal{A}$, which reads all their inputs and messages and sets the outputs and messages they send.
A protocol’s execution proceeds in rounds that model atomic time steps. At the beginning of every round, honest parties receive inputs from the environment $\mathcal{Z}$; at the end of every round, honest parties send outputs to $\mathcal{Z}$.
$\mathcal{A}$ is responsible for delivering all messages sent by parties (honest or corrupted) to all other parties. $\mathcal{A}$ cannot modify the content of messages broadcast by honest parties.
At any point, $\mathcal{Z}$ can corrupt an honest party $P$, which means that $\mathcal{A}$ gets access to its local state and subsequently controls $P$.
At any point of the execution, $\mathcal{Z}$ can uncorrupt a corrupted party $P$, which means that $\mathcal{A}$ no longer controls $P$. A party that becomes uncorrupted is treated in the same way as a newly spawned party, i.e., its internal state is re-initialised and it then starts executing the honest protocol, no longer controlled by $\mathcal{A}$.
Note that a protocol execution can be randomised, where the randomness comes from honest parties as well as from $\mathcal{A}$ and $\mathcal{Z}$. We denote by $\mathsf{view} \leftarrow \mathsf{EXEC}^{\Pi}(\mathcal{A}, \mathcal{Z}, \lambda)$ the randomly sampled execution trace. More formally, $\mathsf{view}$ denotes the joint view of all parties (i.e., all their inputs, random coins and messages received, including those from the random oracle) in the above execution; note that this joint view fully determines the execution.
3 Editing the Blockchain
In this section we introduce an abstraction of a blockchain protocol $\Gamma$, and we describe how to extend $\Gamma$ into an editable blockchain protocol $\Gamma'$.
3.1 Blockchain Protocol
We consider an immutable blockchain protocol (for instance, the immutable blockchain of Garay et al.), denoted by $\Gamma$, where nodes receive inputs from the environment $\mathcal{Z}$ and interact among each other to agree on an ordered ledger that achieves persistence and liveness. The blockchain protocol $\Gamma$ is characterised by a set of global parameters and by a public set of rules for validation. The protocol $\Gamma$ provides the nodes with the following set of interfaces, which are assumed to have complete access to the network and its users.
Chain update: returns a longer and valid chain $C$ in the network (if it exists); otherwise returns the local chain $C$.
Chain validation: takes as input a chain $C$ and returns $1$ iff the chain is valid according to a public set of rules.
Block validation: takes as input a block $B$ and returns $1$ iff the block is valid according to a public set of rules.
Broadcast: takes as input some data $x$ and broadcasts it to all the nodes of the system.
The nodes in the protocol $\Gamma$ have their own local chain $C$, which is initialised with a common genesis block. The consensus in $\Gamma$ guarantees the properties of persistence and liveness discussed in Section 2.2.
3.2 Editable Blockchain
We build our editable blockchain protocol $\Gamma'$ by modifying and extending the aforementioned protocol $\Gamma$. The protocol $\Gamma'$ has copies of all the basic blockchain functionalities exposed by $\Gamma$ through the interfaces described above, and modifies the chain validation and block validation algorithms in order to accommodate edits in $\Gamma'$. In addition, the protocol $\Gamma'$ provides the following interfaces:
Edit proposal: takes as input the chain $C$, an index $j$ of a block to edit and some data $x_j^\star$. It then returns a candidate block $B_j^\star$ for $B_j$.
Candidate validation: takes as input a candidate block $B_j^\star$ and the chain $C$ and returns $1$ iff the candidate block is valid.
The modified chain validation and block validation algorithms are presented in Algorithm 1 and Algorithm 2, respectively, while the new algorithms to propose an edit to a block and to validate candidate blocks are presented in Algorithm 3 and Algorithm 4, respectively. In Fig. 2 we formally describe the protocol $\Gamma'$.
Intuitively, the chain validation and block validation algorithms need to be modified to account for an edited block in the chain. A block that has been edited possesses a different state that does not immediately correlate with its neighbouring blocks. Therefore, for such an edited block we need to ensure that the old state of the block (the state before the edit) is still accessible for verification. (Note that the protocol does not need to maintain the redacted data for verification, and therefore all redacted data is completely removed from the chain.) We do this by storing the old state information in the block itself. This requires both a modified block validation algorithm and a modified chain validation algorithm.
We note that, for simplicity, our protocol is restricted to a single edit operation per block throughout the run of the protocol. In Appendix A we describe an extension of the protocol that accommodates an arbitrary number of redactions per block.
We introduce the notion of a blockchain policy $\mathcal{P}$ that determines whether an edit to the chain should be approved. The protocol $\Gamma'$ is parameterised by a policy $\mathcal{P}$, a function that takes as input a chain $C$ and a candidate block $B^\star$ (which proposes a modification to the chain $C$) and returns accept if the candidate block complies with the policy $\mathcal{P}$, and reject otherwise; in case the modification proposed by $B^\star$ is still being deliberated in the chain $C$, $\mathcal{P}$ returns voting.
In its most basic form, a policy $\mathcal{P}$ requires that a candidate block $B^\star$ should only be accepted if $B^\star$ was voted for by the majority of the network within some predefined interval of blocks (the voting period $\ell$). A formal definition follows.
Definition 1 (Policy).
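A candidate block $B^\star$ generated in round $r$ is said to satisfy the policy $\mathcal{P}$ of chain $C$, i.e., $\mathcal{P}(C, B^\star) = \text{accept}$, if its voting period has ended and is stable in $C$ (with respect to the persistence parameter $k$) and the ratio of blocks between round $r$ and round $r + \ell$ containing $H(B^\star)$ (a vote for $B^\star$) is at least $\rho$, for $\rho \in (0, 1]$ and $k, \ell \in \mathbb{N}$, where $k$ is the persistence parameter, $\ell$ is the voting period, and $\rho$ is the ratio of votes necessary within the voting period $\ell$.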
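One possible reading of Definition 1 is sketched below. It is an assumption-laden illustration rather than the paper's policy: votes are modelled as SHA-256 hashes of the serialised candidate block, each block is reduced to the set of votes it contains, the voting window is taken to be the $\ell$ blocks after the block of round $r$, and that window must be followed by $k$ further blocks before a decision is returned. The names `policy`, `vote_for` and the outcome strings are hypothetical.

```python
import hashlib
from typing import List, Set

ACCEPT, REJECT, VOTING = "accept", "reject", "voting"


def vote_for(candidate: bytes) -> bytes:
    """A vote is the hash of the serialised candidate block (SHA-256 assumed)."""
    return hashlib.sha256(candidate).digest()


def policy(chain_votes: List[Set[bytes]], candidate: bytes,
           r: int, k: int, ell: int, rho: float) -> str:
    """chain_votes[i] is the set of votes included in block i; the candidate
    was generated in round r. The window placement and the k-block burial
    rule are assumptions, not a verbatim transcription of Definition 1."""
    if ell < 1:
        raise ValueError("the voting period must contain at least one block")
    if len(chain_votes) < r + 1 + ell + k:
        return VOTING  # voting window not yet complete and stable
    window = chain_votes[r + 1 : r + 1 + ell]
    h = vote_for(candidate)
    # Count blocks (not individual votes) in the window that contain H(B*).
    ratio = sum(1 for votes in window if h in votes) / ell
    return ACCEPT if ratio >= rho else REJECT
```

Because each block is counted at most once, a miner cannot inflate the ratio by repeating the same vote inside one block.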
3.3 Protocol Description
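We denote a block to be of the form $B := \langle s, x, ctr, y \rangle$, where $s$ is the hash of the previous block, $x$ is the block data, $ctr$ is the proof of work, and $y$ is the old state of the block data. To extend an editable chain $C$ to a new longer chain $C' := C \Vert B'$, the newly created block $B' := \langle s', x', ctr', y' \rangle$ sets $s' := H(ctr, G(s, x))$, where $\mathrm{Head}(C) := \langle s, x, ctr, y \rangle$. Note that upon the creation of block $B'$, the component $y'$ takes the value $y' := G(s', x')$, which represents the initial state of block $B'$.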
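A minimal sketch of the extended block $\langle s, x, ctr, y \rangle$ and of chain extension follows. It assumes SHA-256 for $H$ and $G$, a toy difficulty $D = 2^{252}$ and an all-zero state for the first block, all of which are illustrative choices, and the helper names are hypothetical. Because a freshly created block has $y = G(s, x)$, mining on $H(ctr, y)$ is the same as mining on $H(ctr, G(s, x))$.

```python
import hashlib
from dataclasses import dataclass
from typing import List

D = 2 ** 252  # toy difficulty target (assumption)


def G(s: bytes, x: bytes) -> bytes:
    return hashlib.sha256(len(s).to_bytes(4, "big") + s + x).digest()


def H(ctr: int, digest: bytes) -> bytes:
    return hashlib.sha256(ctr.to_bytes(8, "big") + digest).digest()


@dataclass(frozen=True)
class EBlock:
    s: bytes  # hash of the previous block
    x: bytes  # block data
    ctr: int  # proof of work
    y: bytes  # old state: G(s, x), fixed when the block is created


def create_block(s: bytes, x: bytes) -> EBlock:
    y = G(s, x)  # initial state of the new block
    ctr = 0
    while int.from_bytes(H(ctr, y), "big") >= D:  # PoW on H(ctr, G(s, x))
        ctr += 1
    return EBlock(s, x, ctr, y)


def extend(chain: List[EBlock], x: bytes) -> List[EBlock]:
    if chain:
        head = chain[-1]
        s_new = H(head.ctr, G(head.s, head.x))  # s' := H(ctr, G(s, x))
    else:
        s_new = bytes(32)  # hypothetical state for the first block
    return chain + [create_block(s_new, x)]
```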
During the setup of the system, the chain $C$ is initialised with the genesis block, and all the users in the system maintain a local copy of the chain and a pool of candidate blocks for edits, which is initially empty. The protocol runs in a sequence of rounds $r$ (starting with $r = 1$).
At the beginning of each round $r$, the users try to extend their local chain using the chain update interface, which tries to retrieve new valid blocks from the network and append them to the local chain. Next, the users collect all the candidate blocks from the network and validate them using the candidate validation algorithm (Algorithm 4); then, the users add all valid candidate blocks to the candidate pool. For each candidate block $B^\star$ in the pool, the users compute $\mathcal{P}(C, B^\star)$ to verify whether the candidate block should be adopted by the chain: if the output is accept, they replace the original block in the chain by $B^\star$ and remove $B^\star$ from the pool; if the output is reject, they remove $B^\star$ from the pool; otherwise, if the output is voting, they do nothing. To create a new block $B$, the users collect transactions from the network and store them in the block data $x$; if a user wishes to endorse the edit proposed by a candidate block $B^\star$ that is still in the voting stage, she can vote for it by simply adding $H(B^\star)$ to the data $x$. After the block is created and the new extended chain $C'$ is built, the users broadcast $C'$ iff the chain validation algorithm (Algorithm 1) accepts it. Finally, if a user wishes to propose an edit to block $B_j$ in the chain $C$, she first creates the new data $x_j^\star$, which represents the modifications she proposes to make to the data $x_j$, and calls the edit proposal algorithm (Algorithm 3) with the chain $C$, the index $j$ of the block in $C$ and the new data $x_j^\star$. The algorithm returns a candidate block $B_j^\star$, which is broadcast to the network.
Given a chain $C$, the user needs to validate $C$ according to some set of validation rules. To do this, she uses the chain validation interface, which is implemented by Algorithm 1. The algorithm takes as input a chain $C$ and starts validating from the head of $C$. In Algorithm 1, the validity of each block $B_j$ is checked. If the assertion $s_j = H(ctr_{j-1}, G(s_{j-1}, x_{j-1}))$ is false and the check $s_j = H(ctr_{j-1}, y_{j-1})$ succeeds, then the block $B_{j-1}$ is a valid edited block. Algorithm 1 then checks the validity of $B_{j-1}$ in the context of a candidate block, and whether the block is accepted according to the voting policy of the chain.
To validate a block, the block validation algorithm (Algorithm 2) takes as input a block $B := \langle s, x, ctr, y \rangle$ and first validates the data $x$ included in the block according to some predefined validation predicate. It then checks whether the block satisfies the constraints of the PoW puzzle. Beyond this check, the or condition ($H(ctr, G(s, x)) < D \lor H(ctr, y) < D$) ensures that, in the case of an edited block $B$, the old state $y$ of $B$ still satisfies the PoW constraints.
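The two validation procedures described above can be sketched together as follows. This is not Algorithm 1 or Algorithm 2 verbatim: the data predicate defaults to accepting everything, the policy check for an edited block is abstracted into a caller-supplied hook `approved(chain, i)`, and hashing and difficulty are illustrative SHA-256 and toy-difficulty choices; all helper names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Callable, List

D = 2 ** 252  # toy difficulty target (assumption)


def G(s: bytes, x: bytes) -> bytes:
    return hashlib.sha256(len(s).to_bytes(4, "big") + s + x).digest()


def H(ctr: int, digest: bytes) -> bytes:
    return hashlib.sha256(ctr.to_bytes(8, "big") + digest).digest()


@dataclass(frozen=True)
class EBlock:
    s: bytes
    x: bytes
    ctr: int
    y: bytes  # old state G(s, x) recorded at creation


def extend(chain: List[EBlock], x: bytes) -> List[EBlock]:
    s = H(chain[-1].ctr, G(chain[-1].s, chain[-1].x)) if chain else bytes(32)
    y, ctr = G(s, x), 0
    while int.from_bytes(H(ctr, y), "big") >= D:
        ctr += 1
    return chain + [EBlock(s, x, ctr, y)]


def validate_block(b: EBlock,
                   valid_data: Callable[[bytes], bool] = lambda x: True) -> bool:
    """Data predicate first, then PoW on the current state OR on the old state y."""
    if not valid_data(b.x):
        return False
    return (int.from_bytes(H(b.ctr, G(b.s, b.x)), "big") < D
            or int.from_bytes(H(b.ctr, b.y), "big") < D)


def validate_chain(chain: List[EBlock],
                   approved: Callable[[List[EBlock], int], bool]) -> bool:
    """Walk from the head; a broken link is tolerated only if the old link holds
    and the hypothetical policy hook approved(chain, i) accepts edited block i."""
    for j in range(len(chain) - 1, -1, -1):
        if not validate_block(chain[j]):
            return False
        if j == 0:
            continue
        prev = chain[j - 1]
        if chain[j].s == H(prev.ctr, G(prev.s, prev.x)):
            continue  # unedited link holds
        if chain[j].s == H(prev.ctr, prev.y) and approved(chain, j - 1):
            continue  # prev was edited, its old link holds and the edit was approved
        return False
    return True
```

Editing the middle block of a three-block chain (replacing its data with an empty string) keeps the chain valid only when the hook approves the edit, because the link from the next block now holds only through the old state $y$.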
Proposing an Edit
Any user in the network can propose that a particular piece of data be removed from or replaced in the blockchain. She uses the edit proposal algorithm described in Algorithm 3 and constructs a candidate block $B_j^\star$ to replace the original block $B_j$. The algorithm takes as input a chain $C$, the index $j$ of the original block and new data $x_j^\star$ that will replace the original data. If the user’s intention is simply to remove all data from block $B_j$, then $x_j^\star := \varepsilon$. It then generates a candidate block as the tuple $B_j^\star := \langle s_j, x_j^\star, ctr_j, y_j \rangle$.
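Proposing an edit amounts to copying the original block and swapping its data while keeping the previous-state link, the proof of work and the old state. A minimal sketch with a hypothetical `propose_edit` helper and a frozen dataclass standing in for $\langle s, x, ctr, y \rangle$; the default empty byte string plays the role of $x_j^\star := \varepsilon$.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class EBlock:
    s: bytes  # hash of the previous block
    x: bytes  # block data
    ctr: int  # proof of work
    y: bytes  # old state


def propose_edit(chain: List[EBlock], j: int, new_data: bytes = b"") -> EBlock:
    """Return B*_j = <s_j, x*_j, ctr_j, y_j>: only the data changes; the
    original block in the chain is left untouched until the edit is approved."""
    return replace(chain[j], x=new_data)
```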
Validating Candidate Blocks
When the user wishes to validate a candidate block $B_j^\star := \langle s_j, x_j^\star, ctr_j, y_j \rangle$ for the $j$-th block of a chain $C$, she uses the candidate validation algorithm described in Algorithm 4. It retrieves the blocks $B_{j-1}$ and $B_{j+1}$ of index $j-1$ and $j+1$, respectively, from the chain $C$. Algorithm 4 checks whether the link from $B_{j-1}$ to $B_j^\star$ holds and whether the link from $B_j^\star$ to $B_{j+1}$ satisfies the condition $s_{j+1} = H(ctr_j, y_j)$. The latter condition checks whether the “old link” still holds. If both checks succeed, the candidate block is considered valid; otherwise it is considered invalid.
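A sketch of candidate validation under illustrative assumptions (SHA-256 for $H$ and $G$, a toy difficulty, hypothetical helper names). It additionally assumes that a candidate must have both a predecessor and a successor in the chain, and it checks the incoming link only in its unedited form, $s_j = H(ctr_{j-1}, G(s_{j-1}, x_{j-1}))$.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import List

D = 2 ** 252  # toy difficulty target (assumption)


def G(s: bytes, x: bytes) -> bytes:
    return hashlib.sha256(len(s).to_bytes(4, "big") + s + x).digest()


def H(ctr: int, digest: bytes) -> bytes:
    return hashlib.sha256(ctr.to_bytes(8, "big") + digest).digest()


@dataclass(frozen=True)
class EBlock:
    s: bytes  # hash of the previous block
    x: bytes  # block data
    ctr: int  # proof of work
    y: bytes  # old state G(s, x) fixed at creation


def extend(chain: List[EBlock], x: bytes) -> List[EBlock]:
    s = H(chain[-1].ctr, G(chain[-1].s, chain[-1].x)) if chain else bytes(32)
    y, ctr = G(s, x), 0
    while int.from_bytes(H(ctr, y), "big") >= D:
        ctr += 1
    return chain + [EBlock(s, x, ctr, y)]


def validate_cand(chain: List[EBlock], j: int, cand: EBlock) -> bool:
    """Check a candidate B*_j for position j of the chain."""
    if not 0 < j < len(chain) - 1:
        return False  # assumption: both neighbours must exist
    prev, nxt = chain[j - 1], chain[j + 1]
    link_in = cand.s == H(prev.ctr, G(prev.s, prev.x))  # B_{j-1} -> B*_j
    old_link_out = nxt.s == H(cand.ctr, cand.y)         # "old link" B*_j -> B_{j+1}
    return link_in and old_link_out
```

A candidate that tampers with the copied `y` or `s` fields fails one of the two link checks, which is what prevents an editor from rewriting more than the data.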
4 Security Analysis
In this section we analyse the security of our editable blockchain protocol of Fig. 2.
We assume the existence of an immutable blockchain protocol $\Gamma$, as described in Section 3.1, that satisfies the properties of chain growth, chain quality and common prefix. The basic intuition behind our security analysis is that, given that $\Gamma$ satisfies these properties, our editable blockchain protocol $\Gamma'$ (which is parameterised by a policy $\mathcal{P}$) preserves the same properties (or a variation of the property, in the case of common prefix). Therefore, our protocol $\Gamma'$ behaves exactly like the immutable blockchain $\Gamma$ when there are no edits in the chain, and if an edit operation was performed, it must have been approved by the policy $\mathcal{P}$. We discuss each individual property next.
The chain growth property of $\Gamma$ is automatically preserved in our editable blockchain $\Gamma'$, since edits neither remove blocks nor otherwise influence the growth of the chain. We present the formal definition next, followed by a theorem stating that $\Gamma'$ preserves chain growth whenever $\Gamma$ satisfies chain growth.
Definition 2 (Chain Growth).
Consider the chains $C_1, C_2$ possessed by two honest parties at the onset of two slots $sl_1, sl_2$, with $sl_2$ at least $s$ slots ahead of $sl_1$. Then it holds that $\mathrm{len}(C_2) - \mathrm{len}(C_1) \geq \tau \cdot s$, for $s \in \mathbb{N}$ and $0 < \tau \leq 1$, where $\tau$ is the speed coefficient.
If $\Gamma$ satisfies $(\tau, s)$-chain growth, then $\Gamma'$ satisfies $(\tau, s)$-chain growth for any policy $\mathcal{P}$.
We note that $\Gamma'$ extends $\Gamma$, which by assumption satisfies chain growth. Also, note that in $\Gamma'$ it is not possible to remove a block from the chain $C$ (for any policy $\mathcal{P}$) and thereby reduce the length of $C$. In other words, the edits performed do not alter the length of the chain. Therefore, we conclude that $\Gamma'$ satisfies chain growth whenever $\Gamma$ satisfies chain growth.
The chain quality property informally states that the ratio of adversarial blocks in any segment of a chain held by an honest party is no more than a fraction $\mu$, where $\mu$ is the fraction of resources controlled by the adversary.
Definition 3 (Chain Quality).
Consider a portion of length $\ell$ blocks of a chain possessed by an honest party during any given round, for $\ell \in \mathbb{N}$. Then, the ratio of adversarial blocks in this segment of the chain is at most $\mu$, where $\mu$ is the chain quality coefficient.
Let $H$ be a collision-resistant hash function. If $\Gamma$ satisfies $\mu$-chain quality, then $\Gamma'$ satisfies $\mu$-chain quality for any $(k, \ell, \rho)$-policy $\mathcal{P}$ where $\rho > \mu$.
We note that the only difference between $\Gamma'$ and $\Gamma$ is that blocks can be edited. An adversary could edit an honest block in the chain into a malicious block (e.g., one that contains illegal content), increasing the proportion of malicious blocks in the chain and thereby breaking the chain quality property. We show below that the adversary $\mathcal{A}$ has only a negligible probability of violating the chain quality of $\Gamma'$.
Let $\mathcal{A}$ propose a malicious candidate block $B_j^\star$ for editing an honest block $B_j$. Since $\mathcal{A}$ possesses only a $\mu$ fraction of the computational power, by the chain quality property of $\Gamma$ we know that the adversary mines at most a $\mu$ ratio of the blocks in the voting phase. As the policy $\mathcal{P}$ stipulates, the ratio of votes has to be at least $\rho$ for $B_j^\star$ to be approved, where $\rho > \mu$. Therefore, $B_j^\star$ can only be approved by the policy $\mathcal{P}$ if honest nodes vote for it. Observe that the adversary could try to build an “honest-looking” (e.g., free of illegal content) candidate block $\tilde{B}_j$ such that $H(\tilde{B}_j) = H(B_j^\star)$, in an attempt to deceive the honest nodes during the voting phase; the honest nodes could endorse $\tilde{B}_j$ during the voting phase, and the adversary would instead edit the chain with the malicious block $B_j^\star$. The adversary has only a negligible chance of producing such a candidate block $\tilde{B}_j$, since this would violate the collision-resistance property of the hash function $H$.
Hence, $B_j^\star$ is incorporated into the chain only if it is an honest candidate block. This concludes the proof.
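The counting step of the argument can be made explicit with one inequality. Write $\ell$ for the number of blocks in the voting period, $\mu$ for the maximal fraction of those blocks mined by the adversary (by chain quality of $\Gamma$), and $\rho$ for the vote ratio required by the policy; since the policy counts blocks containing $H(B_j^\star)$, each block contributes at most one vote:

$$
\#\{\text{honest blocks voting for } B_j^\star\} \;\geq\; \rho\,\ell - \mu\,\ell \;=\; (\rho - \mu)\,\ell \;>\; 0 \qquad \text{whenever } \rho > \mu .
$$

For instance, with the illustrative values $\rho = 0.51$ and $\mu = 0.33$, at least $18\%$ of the blocks in the voting period must be honest votes for $B_j^\star$.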
The common prefix property informally says that if we take the chains of two honest nodes at different time slots, the shorter chain is a prefix of the longer chain (up to the common prefix parameter $k$). We show the formal definition next.
Definition 4 (Common Prefix).
The chains $C_1, C_2$ possessed by two honest parties at the onset of the slots $sl_1 < sl_2$ are such that $C_1^{\lceil k} \preceq C_2$, where $C_1^{\lceil k}$ denotes the chain obtained by removing the last $k$ blocks from $C_1$, and $k \in \mathbb{N}$ is the common prefix parameter.
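We remark, however, that our protocol inherently does not satisfy Definition 4. To see this, consider the case where two chains $C_1$ and $C_2$ are held by two honest parties $P_1$ and $P_2$ at slots $sl_1$ and $sl_2$, respectively, such that $sl_1 < sl_2$. In slot $sl_1$, the voting phase (which lasts $\ell$ blocks) starts for a candidate block $B_j^\star$ proposing to edit block $B_j$, such that $B_j \in C_1^{\lceil k}$. Note that at slot $sl_1$ the voting phase is still on, therefore $\mathcal{P}(C_1, B_j^\star) = \text{voting}$. By slot $sl_2$ the voting phase is complete and, in case $\mathcal{P}(C_2, B_j^\star) = \text{accept}$, the block $B_j$ is replaced by $B_j^\star$ in $C_2$. However, in chain $C_1^{\lceil k}$ the $j$-th block is still $B_j$, since the edit of $B_j$ is waiting to be confirmed. Therefore, $C_1^{\lceil k} \not\preceq C_2$, thereby violating Definition 4.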
The pitfall in Definition 4 is that it does not account for edits or modifications in the chain. We therefore introduce a new definition that is suited for an editable blockchain (with respect to an editing policy). The formal definition follows.
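Definition 5 (Editable Common Prefix).
The chains $C_1, C_2$ of length $l_1$ and $l_2$, respectively, possessed by two honest parties at the onset of the slots $sl_1 < sl_2$ satisfy one of the following:
$C_1^{\lceil k} \preceq C_2$; or
for each $B_j \in C_1^{\lceil k}$ such that $B_j \notin C_2$, it must be the case that $\mathcal{P}(C_2, B_j^\star) = \text{accept}$, where $B_j^\star$ is the block that replaces $B_j$ in $C_2$, for $j \in \{1, \dots, l_1 - k\}$,
where $C_1^{\lceil k}$ denotes the chain obtained by pruning the last $k$ blocks from $C_1$, $\mathcal{P}$ denotes the chain policy, and $k \in \mathbb{N}$ denotes the common prefix parameter.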
Intuitively, the above definition states that if there exists a block that violates the common prefix as defined in Definition 4, then this block must be an edited block whose adoption was voted for and approved according to the policy $\mathcal{P}$ in chain $C_2$. We show that our protocol satisfies Definition 5 next.
Let $H$ be a collision-resistant hash function. If $\Gamma$ satisfies $k$-common prefix, then $\Gamma'$ satisfies $k$-editable common prefix for a $(k, \ell, \rho)$-policy $\mathcal{P}$.
If no edits were performed in a chain $C$, then the protocol $\Gamma'$ behaves exactly like the immutable protocol $\Gamma$, and hence the common prefix property follows directly.
However, in case of an edit, consider an adversary $\mathcal{A}$ that proposes a candidate block $B_j^\star$ to edit $B_j$ in chain $C_2$, which is later edited by an honest party $P_2$ at slot $sl_2$. Observe that, by the collision resistance property of $H$, $\mathcal{A}$ is not able to efficiently produce another candidate block $\tilde{B}_j$ such that $H(\tilde{B}_j) = H(B_j^\star)$. Therefore, since $P_2$ is honest and adopted the edit in $C_2$, it must be the case that $B_j^\star$ received enough votes such that $\mathcal{P}(C_2, B_j^\star) = \text{accept}$. This concludes the proof. ∎
How the properties play together: By showing that $\Gamma'$ satisfies the three aforementioned properties, we show that $\Gamma'$ is a live and persistent blockchain protocol that is immutable against edits not authorised by the policy $\mathcal{P}$.
The editable common prefix property ensures that only policy-approved edits are performed on the chain. The chain quality property, for a $(k, \ell, \rho)$-policy where $\rho > \mu$, ensures that an adversary does not get a disproportionate contribution of blocks to the chain.
5 Integrating into Bitcoin
In this section we describe how our generic editable blockchain protocol (Fig. 2) can be integrated into Bitcoin. For simplicity, we consider one redaction per block and the redaction is performed on one or more transactions included in the block. The extension of the generic protocol for multiple redactions (described in Appendix A) can be immediately applied to the construction described in this section. Next, we give a brief background on the Bitcoin protocol.
5.1 Bitcoin Basics
A simple transaction in Bitcoin has the following basic structure: an input script, an output script with a corresponding amount, and a witness. More complex transactions may have multiple input and output scripts and/or more complex scripts. A transaction that spends some output of has the ID of in its input, denoted by , and a witness that satisfies the output script of (as shown in Fig. 3). The amount being spent by the output script must be less than or equal to the amount of . The most common output scripts in Bitcoin consist of a public key, and the witness is a signature on the transaction computed using the corresponding secret key. We refer the reader to for a comprehensive overview of the Bitcoin scripting language.
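To make the spending relation concrete, the following Python sketch (Python matches the proof-of-concept described in Section 6) models a transaction with inputs, outputs and a witness, and checks the amount rule stated above. The class names, the JSON serialisation and the use of double SHA-256 over it are illustrative assumptions rather than Bitcoin's wire format, and witness (signature) verification is deliberately omitted.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


def sha256d(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


@dataclass
class TxIn:
    prev_txid: str  # ID of the transaction whose output is being spent
    index: int      # position of the spent output in that transaction


@dataclass
class TxOut:
    script: str     # output script (toy string form, not real Script bytes)
    amount: int     # amount in satoshis


@dataclass
class Tx:
    inputs: List[TxIn]
    outputs: List[TxOut]
    witness: str = ""  # e.g. a signature; not part of the ID

    def txid(self) -> str:
        # Toy serialisation of inputs and outputs only; the witness is excluded.
        body = json.dumps(
            {"in": [[i.prev_txid, i.index] for i in self.inputs],
             "out": [[o.script, o.amount] for o in self.outputs]},
            sort_keys=True,
        ).encode()
        return sha256d(body).hex()


def amounts_ok(spender: Tx, funding: Tx) -> bool:
    """Each input of `spender` must reference `funding` by ID, and the total
    output amount must not exceed the amounts of the outputs being spent."""
    available = 0
    for txin in spender.inputs:
        if txin.prev_txid != funding.txid() or txin.index >= len(funding.outputs):
            return False
        available += funding.outputs[txin.index].amount
    return sum(o.amount for o in spender.outputs) <= available
```

Because the witness is excluded from `txid`, replacing a signature does not change the ID a spending transaction refers to.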
Insertion of Data
Users are allowed to propose new transactions containing arbitrary data, which are then sent to the Bitcoin network for a small fee. Data can be inserted into specific parts of a Bitcoin transaction, namely the output script, the input script and the witness. Matzutt et al. provide a quantitative analysis of data insertion methods in Bitcoin. According to their analysis, apart from some non-standard transactions, OP_RETURN outputs and coinbase transactions are the main places where data is inserted.
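As an illustration of the most common insertion method, the sketch below builds the bytes of an OP_RETURN output script carrying arbitrary data. It follows Bitcoin's push-data encoding (a direct push for up to 75 bytes, OP_PUSHDATA1 above that) and enforces the 80-byte limit; this limit has long been a relay-policy default of Bitcoin Core rather than a consensus rule, and the function name is our own.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
MAX_DATA = 80  # 80-byte data limit (relay policy default, not a consensus rule)


def op_return_script(data: bytes) -> bytes:
    """Build an OP_RETURN output script that pushes `data`.

    Pushes of up to 75 bytes use a single length byte as the opcode;
    76-80 bytes need OP_PUSHDATA1 followed by a one-byte length."""
    if len(data) > MAX_DATA:
        raise ValueError(f"{len(data)} bytes exceeds the {MAX_DATA}-byte limit")
    if len(data) <= 75:
        push = bytes([len(data)])
    else:
        push = bytes([OP_PUSHDATA1, len(data)])
    return bytes([OP_RETURN]) + push + data


print(op_return_script(b"hello").hex())  # 6a0568656c6c6f
```

Such an output can never be spent, which is why removing its payload later cannot invalidate other transactions.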
5.2 Modifying the Bitcoin Protocol
In this section we detail the modifications to the Bitcoin protocol necessary to integrate it with our generic editable blockchain protocol of Section 3. The resulting protocol is a version of Bitcoin that allows for the redaction of (harmful) data from its transactions.
By redaction of transactions, we mean removing data from a transaction without making any other changes to the remaining components of the transaction. As shown in Fig. 5, consider a transaction that contains some harmful data in its output script, and let be a candidate transaction to replace in the chain, where is exactly the same as , except that the harmful data is removed.
A user who wishes to propose a redaction proceeds as follows. First, constructs a special transaction (as shown in Fig. 4) containing and , which respectively denote the hash of the transaction being redacted and the hash of the candidate transaction to replace in the chain (we note that our transaction ID is SegWit-compatible, as the witness is not included in the hash that generates a transaction's ID). Then, broadcasts the special transaction and the candidate transaction to the network; requires a transaction fee to be included in the blockchain, while is added to a pool of candidate transactions (if a candidate transaction does not have a corresponding in the blockchain, it is not included in the candidate pool and is treated as spam instead). The candidate transaction is validated by checking its contents with respect to , and if it is valid, it can be considered for voting.
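The proposal step can be sketched as follows. Transactions are plain dictionaries, output scripts are hex strings, and only outputs whose script starts with `6a` (OP_RETURN) lose their data; the field names `old_hash` and `new_hash` of the special transaction, the toy hash over a JSON encoding, and all helper names are hypothetical.

```python
import hashlib
import json


def tx_hash(tx: dict) -> str:
    """Toy H(tx): double SHA-256 over inputs and outputs; the witness is excluded."""
    body = json.dumps({"in": tx["inputs"], "out": tx["outputs"]}, sort_keys=True)
    return hashlib.sha256(hashlib.sha256(body.encode()).digest()).hexdigest()


def make_candidate(tx: dict) -> dict:
    """tx*: a copy of tx in which every OP_RETURN output (script hex starting
    with '6a') keeps only the opcode, i.e. its data payload is removed."""
    outputs = [["6a", amount] if script.startswith("6a") else [script, amount]
               for script, amount in tx["outputs"]]
    return {"inputs": tx["inputs"], "outputs": outputs, "witness": tx["witness"]}


def make_edit_request(tx: dict, tx_star: dict) -> dict:
    """The special transaction: it carries H(tx) and H(tx*)."""
    return {"old_hash": tx_hash(tx), "new_hash": tx_hash(tx_star)}


def admit_to_pool(tx_star: dict, edit_requests_on_chain: list, pool: dict) -> bool:
    """Admit tx* to the candidate pool only if an on-chain edit request
    commits to its hash; otherwise it is treated as spam."""
    h = tx_hash(tx_star)
    if any(req["new_hash"] == h for req in edit_requests_on_chain):
        pool[h] = tx_star
        return True
    return False
```

The order matters: the candidate only enters the pool once the fee-paying special transaction is on chain, so unsolicited candidates cannot flood the pool.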
The redactable Bitcoin protocol is parameterised by a policy parameter (Definition 1). The policy dictates the requirements and constraints for redaction operations in the blockchain. An informal description of a (basic) policy for Bitcoin would be:
A proposed redaction is approved as valid if the following conditions hold:
It is identical to the transaction being replaced, except that it can remove data.
It can only remove data that can never be spent, e.g., OP_RETURN output scripts.
It does not redact votes for other redactions in the chain.
It has received more than % of the votes in the consecutive blocks (voting period) after the corresponding is stable in the chain.
where voting for a candidate transaction simply means that the miner includes in the coinbase (transaction) of the new block he produces. After the voting phase is over, the candidate transaction is removed from the candidate pool.
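A minimal sketch of the basic policy above. We assume parameters named `k` (depth after which the edit request is considered stable), `ell` (length of the voting period) and `rho` (required vote fraction); these names, the height arithmetic for the voting window, and the reduction of condition 3 to "coinbase transactions are never redacted" (since votes are stored in coinbases) are our simplifications, not a definitive encoding of the policy.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Policy:
    k: int       # depth at which the edit request counts as stable (assumed)
    ell: int     # length of the voting period, in blocks
    rho: float   # required fraction of votes, e.g. 0.5


def only_drops_op_return_data(old_outputs, new_outputs) -> bool:
    """Conditions 1-2: same outputs and amounts, except that OP_RETURN
    outputs (hex starting with '6a') may lose their payload."""
    if len(old_outputs) != len(new_outputs):
        return False
    for (old_script, old_amount), (new_script, new_amount) in zip(old_outputs, new_outputs):
        if old_amount != new_amount:
            return False
        if old_script != new_script and not (old_script.startswith("6a") and new_script == "6a"):
            return False
    return True


def policy_accepts(tx: dict, tx_star: dict, candidate_hash: str, is_coinbase: bool,
                   request_height: int, coinbase_votes: List[Set[str]], policy: Policy) -> bool:
    """coinbase_votes[h] is the set of candidate hashes voted for in block h."""
    # Conditions 1-2: inputs and witness unchanged, only OP_RETURN data removed.
    if tx["inputs"] != tx_star["inputs"] or tx["witness"] != tx_star["witness"]:
        return False
    if not only_drops_op_return_data(tx["outputs"], tx_star["outputs"]):
        return False
    # Condition 3 (simplified): votes live in coinbases, so coinbases are never redacted.
    if is_coinbase:
        return False
    # Condition 4: count votes in the ell blocks after the request is k blocks deep.
    start = request_height + policy.k + 1
    window = coinbase_votes[start:start + policy.ell]
    if len(window) < policy.ell:
        return False  # voting period not finished yet
    votes = sum(1 for block_votes in window if candidate_hash in block_votes)
    return votes > policy.rho * policy.ell
```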
The reason for restricting redactions to non-spendable components of a transaction (e.g., OP_RETURN) is that permitting redactions of spendable content could lead to misuse (Section 7) and to future inconsistencies within the chain. We stress, however, that this is not a technical limitation of our solution, but rather a mechanism that relieves the user of the burden of deciding which redactions could cause inconsistencies in the chain in the future. We believe that the aforementioned policy is suitable for Bitcoin, but since policies are highly application-dependent, a different policy may be better suited to other settings.
New Block Structure
To account for redactions, the block header must accommodate an additional field called . When a block is initially created, i.e., prior to any redaction, this new field takes the same value as . For a redaction request on block , that proposes to replace with the candidate transaction , the transactions list of the candidate block (that will replace ) must contain in addition to the remaining transactions. A new is computed for the new set of transactions, while remains unchanged. To draw parallels with the abstraction we described in Section 3.1, is analogous to and is analogous to .
|hash of the previous block header|
|root of the Merkle tree (whose leaves are the transactions)|
|the difficulty of the proof-of-work|
|the timestamp of the block|
|nonce used in proof-of-work|
|root of the Merkle tree of the old set of transactions|
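The header extension can be sketched as follows. The field name `old_merkle_root`, the restriction to one redaction per block, and the choice to keep the hash of the old transaction next to the Merkle tree (rather than among its leaves) are assumptions for illustration; the Merkle root itself follows Bitcoin's rule of duplicating the last node on odd levels.

```python
import hashlib
from dataclasses import dataclass, field, replace
from typing import Dict, List


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Bitcoin-style Merkle root: hash pairs level by level, duplicating the
    last node when a level has an odd number of nodes."""
    if not leaves:
        raise ValueError("a block has at least one transaction")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


@dataclass(frozen=True)
class Header:
    prev_hash: bytes
    merkle_root: bytes
    target: int
    timestamp: int
    nonce: int
    old_merkle_root: bytes  # hypothetical name for the additional field


@dataclass
class Block:
    header: Header
    tx_hashes: List[bytes]
    old_tx_hashes: Dict[int, bytes] = field(default_factory=dict)  # position -> H(tx)


def create_block(prev_hash: bytes, tx_hashes: List[bytes], target: int,
                 timestamp: int, nonce: int) -> Block:
    root = merkle_root(tx_hashes)
    # Before any redaction, the new field equals merkle_root.
    return Block(Header(prev_hash, root, target, timestamp, nonce, root), list(tx_hashes))


def redact(block: Block, position: int, new_tx_hash: bytes) -> Block:
    """Replace one transaction by its candidate: merkle_root is recomputed,
    old_merkle_root is left untouched, and H(tx) is kept alongside."""
    tx_hashes = list(block.tx_hashes)
    old_hashes = dict(block.old_tx_hashes)
    old_hashes[position] = tx_hashes[position]
    tx_hashes[position] = new_tx_hash
    header = replace(block.header, merkle_root=merkle_root(tx_hashes))
    return Block(header, tx_hashes, old_hashes)
```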
The validation of a block consists of the steps described below.
Validating transactions: All transactions contained in the block's transaction list are validated; non-redacted transactions are validated in the same way as in the immutable version of the protocol. Transactions that have been previously redacted require a special validation, which we describe next. Consider the case presented in Fig. 5, where is replaced by . The witness was generated with respect to and is not valid with respect to . Fortunately, the old state (the hash of the redacted transaction) is stored, as shown in Fig. 7, so the witness can be successfully validated against the old version of the transaction. Therefore, we can ensure that every transaction included in the block has a valid witness or, in the case of a redacted transaction, that its old version had a valid witness. To verify that the redaction was approved in the chain, one needs to find a corresponding (Fig. 4) in the chain and verify that it satisfies the chain's policy.
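The following sketch illustrates this special validation of redacted transactions. Since only the Python standard library is used, an HMAC stands in for the signature in the witness; this is a deliberate substitution for illustration, not Bitcoin's signature verification. The set `approved` of (old hash, new hash) pairs is assumed to be the outcome of the policy check on the chain.

```python
import hashlib
import hmac
from typing import AbstractSet, Optional, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def make_witness(key: bytes, tx_digest: bytes) -> bytes:
    # HMAC stands in for a signature so that the sketch needs only the stdlib.
    return hmac.new(key, tx_digest, hashlib.sha256).digest()


def witness_ok(key: bytes, tx_body: bytes, witness: bytes,
               old_hash: Optional[bytes] = None,
               approved: AbstractSet[Tuple[bytes, bytes]] = frozenset()) -> bool:
    """Non-redacted tx: check the witness against H(tx).
    Redacted tx*: the witness was made for the old version, so check it
    against the stored old hash, and require an approved (H(tx), H(tx*)) pair."""
    if old_hash is None:
        return hmac.compare_digest(make_witness(key, h(tx_body)), witness)
    if (old_hash, h(tx_body)) not in approved:
        return False
    return hmac.compare_digest(make_witness(key, old_hash), witness)
```

Without the stored old hash, the original witness would no longer match the redacted transaction, which is exactly the failure the text describes.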
PoW verification: The procedure to verify the PoW puzzle is described in Algorithm 2. If the block contains an edited transaction, i.e., , then substitute the value in with that in and check if the hash of this new header is within .
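The PoW check can be sketched as below; it is not a transcription of Algorithm 2. The header serialisation is a toy one that hashes both Merkle fields, which coincide when a block is mined, so restoring the old root reproduces the original header exactly. Field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Header:
    prev_hash: bytes
    merkle_root: bytes
    target: int
    timestamp: int
    nonce: int
    old_merkle_root: bytes


def header_hash(h: Header) -> int:
    """Double SHA-256 of a toy header serialisation, read as an integer."""
    data = (h.prev_hash + h.merkle_root + h.old_merkle_root
            + h.target.to_bytes(32, "big") + h.timestamp.to_bytes(8, "big")
            + h.nonce.to_bytes(8, "big"))
    return int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "big")


def verify_pow(h: Header) -> bool:
    """If the block was edited, hash the original header instead by putting
    old_merkle_root back into merkle_root, then compare with the target."""
    if h.merkle_root != h.old_merkle_root:
        h = replace(h, merkle_root=h.old_merkle_root)
    return header_hash(h) <= h.target
```

A redaction therefore never requires redoing the proof of work: the puzzle solution remains attached to the header as it was originally mined.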
To validate a full chain, a miner needs to validate all the blocks in the chain. The miner can detect whether a block has been redacted by verifying its hash link with the next block; for a redacted block, the miner verifies whether the redaction was approved according to the chain's policy. The miner rejects a chain as invalid if any of the following holds: (1) a block's redaction was not approved according to the policy, (2) the value of the redacted block is incorrect with respect to the set of transactions (which contains the hash of the redacted transaction), or (3) a previously approved redaction was not performed on the chain.
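Chain validation can be sketched along the three rejection rules above. To keep the example short, we assume a toy commitment (a hash of the concatenated transaction hashes) in place of a Merkle tree, blocks are dictionaries with hypothetical field names, a block is treated as redacted when its two roots differ (instead of detecting it through the hash link), and `approved` is the assumed result of the policy check.

```python
import hashlib
from typing import AbstractSet, Dict, List, Tuple


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def toy_root(tx_hashes: List[bytes]) -> bytes:
    # Simplified commitment (hash of the concatenation) standing in for a Merkle tree.
    return H(b"".join(tx_hashes))


def original_header_hash(block: Dict) -> bytes:
    # Hash of the header as mined, i.e. with old_merkle_root as the root.
    return H(block["prev_hash"] + block["old_merkle_root"])


def validate_chain(chain: List[Dict], approved: AbstractSet[Tuple[bytes, bytes]]) -> bool:
    for height, block in enumerate(chain):
        # Hash link: each block commits to the previous header as originally mined.
        if height > 0 and block["prev_hash"] != original_header_hash(chain[height - 1]):
            return False
        # A block counts as redacted exactly when its two roots differ.
        redacted = block["merkle_root"] != block["old_merkle_root"]
        if redacted != bool(block["old_tx_hashes"]):
            return False
        # (2) merkle_root must match the current set of transactions.
        if block["merkle_root"] != toy_root(block["txs"]):
            return False
        # (1) every redaction in the block must have been approved by the policy.
        for pos, old_hash in block["old_tx_hashes"].items():
            if (old_hash, block["txs"][pos]) not in approved:
                return False
    # (3) every approved redaction must actually have been performed.
    present = {tx for block in chain for tx in block["txs"]}
    return not any(old_hash in present for old_hash, _ in approved)
```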
Removing a transaction entirely or changing spendable data of a transaction may result in serious inconsistencies in the chain. For example, consider a transaction that has two outputs denoted by and , where the second output has a data entry and the first output contains a valid spendable script that will eventually be spent by some other transaction . If the redaction operation performed on affects the output script of , may become invalid, in turn causing other transactions to become invalid. A similar problem may arise if the redaction is performed on the input part of , which could enable the user who generated to double-spend the funds. Therefore, we only allow redactions that do not affect a transaction's consistency with past and future events.
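The inconsistency described above can be reproduced with a toy hash-lock script standing in for a real output script (an assumption made for brevity; Bitcoin's common scripts check signatures instead). Redacting the data output leaves the spending transaction valid, while touching the spendable output breaks it.

```python
import hashlib


def hash_lock(secret: bytes) -> str:
    """Toy spendable output script: spendable by revealing a preimage of this hash."""
    return hashlib.sha256(secret).hexdigest()


def witness_satisfies(witness: bytes, output_script: str) -> bool:
    """Toy script evaluation for hash-lock outputs."""
    return hashlib.sha256(witness).hexdigest() == output_script


def redact_output(tx: dict, index: int, new_script: str) -> dict:
    """Return a copy of tx with one output script replaced (no policy check here)."""
    outputs = list(tx["outputs"])
    outputs[index] = new_script
    return {**tx, "outputs": outputs}


# tx has a spendable output (index 0) and a data output (index 1);
# a later transaction spends output 0 by revealing the secret.
tx = {"outputs": [hash_lock(b"alice-secret"), "6a" + b"some data".hex()]}
spender_witness = b"alice-secret"

safe = redact_output(tx, 1, "6a")    # removes only unspendable data
unsafe = redact_output(tx, 0, "")    # touches the spendable script

print(witness_satisfies(spender_witness, safe["outputs"][0]))    # True
print(witness_satisfies(spender_witness, unsafe["outputs"][0]))  # False
```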
Redaction and Retrievability
The redaction policy for Bitcoin restricts redactions to those operations that do not violate a transaction's consistency. This means that we do not allow monetary transactions (such as standard coin transfers) to be edited. We stress, however, that the main objective of redacting a transaction is to prevent some malicious content , stored inside , from being broadcast as part of the chain, thereby ensuring that the chain and its users are legally compliant. Note that we cannot prevent an adversary from locally storing and retrieving the data , even after its redaction, since the content was publicly stored in the blockchain. In this case, the user who willingly keeps the malicious (and potentially illegal) data is liable.
Our proposal offers accountability both during the voting phase and after it is over. Moreover, accountability during the voting phase prevents the transaction inconsistencies discussed above.
Voting Phase Accountability: During the voting phase, anyone can verify all the details of a redaction request. The old transaction and the proposed modification (via the candidate transaction) are open to public scrutiny. It is publicly observable if a miner misbehaves by voting for a redaction request that, apart from removing data, also tampers with the input or a spendable output of the transaction, thereby affecting its consistency. Such misbehaviour could discourage users from using the system, as it would make the system unreliable as a public ledger for monetary purposes. Since miners are heavily invested in the system and are expected to behave rationally, they would not vote for such an edit request (which violates the policy) during the voting phase.
Victim Accountability: After a redaction is performed, our protocol allows the data owner, whose data was removed, to prove that it was indeed her data that was removed. Since we store the hash of the old transaction along with the candidate transaction in the edited block (refer to Fig. 7), a user who possesses the old (removed) data can verify it against the hash stored in the redacted block. This enforces accountability on the miners of the network who vote for a redaction request, discouraging them from removing benign data. At the same time, our protocol protects against false claims: by the collision resistance of the hash function, any data other than the removed data fails the hash verification.
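Verifying a victim's claim reduces to a hash comparison against the value stored in the edited block. The sketch assumes double SHA-256 as the hash function and raw transaction bytes as input; a false claim would succeed only if the claimant found a different input with the same hash, which collision resistance rules out.

```python
import hashlib


def tx_hash(tx_bytes: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(tx_bytes).digest()).digest()


def verify_claim(claimed_old_tx: bytes, stored_old_hash: bytes) -> bool:
    """A data owner claims `claimed_old_tx` was the redacted transaction.
    The claim holds only if it hashes to the value kept in the edited block."""
    return tx_hash(claimed_old_tx) == stored_old_hash
```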
6 Proof-of-Concept Implementation
In this section we report on a Python proof-of-concept implementation used for evaluating our approach. We implement a full-fledged blockchain system based on Python 3 that mimics all the basic functionalities of Bitcoin. Specifically, we include a subset of Bitcoin's script language that allows us to insert arbitrary data into the chain, which can be redacted afterwards. The redaction mechanism is built upon the proposed modifications to Bitcoin that we describe in Section 5. For conceptual simplicity, we rely on PoW as the consensus mechanism.