Collaborative Privacy for Web Applications

by Yihao Hu, et al.
Boston University

Real-time, online-editing web apps provide free and convenient services for collaboratively editing, sharing and storing files. The benefits of these web applications do not come for free: not only do service providers have full access to the users' files, but they also control access, transmission, and storage mechanisms for them. As a result, user data may be at risk of data mining, third-party interception, or even manipulation. To combat this, we propose a new system for helping to preserve the privacy of user data within collaborative environments. There are several distinct challenges in producing such a system, including developing an encryption mechanism that does not interfere with the back-end (and often proprietary) control mechanisms utilized by the service, and identifying transparent code hooks through which to obfuscate user data. Toward the first challenge, we develop a character-level encryption scheme that is more resilient to the types of attacks that plague classical substitution ciphers. For the second challenge, we design a browser extension that robustly demonstrates the feasibility of our approach, and show a concrete implementation for Google Chrome and the widely-used Google Docs platform. Our example tangibly demonstrates how several users with a shared key can collaboratively and transparently edit a Google Docs document without revealing the plaintext directly to Google.




I Introduction

Collaborative web applications (apps) such as the Google productivity suite (Docs, Sheets, and Slides) enable multiple users to simultaneously edit a number of common documents. To enable server-side features, such as compression or version control, the contents of these documents are typically available in plaintext to the app provider. As a result, the provider, affiliated third parties, or malicious parties who have infiltrated the provider, may also be able to mine the plaintext for behavioral advertising, social engineering, or even identity theft.

To help preserve their privacy, some users encrypt their data client-side, allowing only users who know a shared private key to read the plaintext. However, standard encryption often inhibits the human usability experience [23, 28], and its block or streaming encoding is likely to impair or completely break the collaborative functionality provided by a web service. Likewise, anonymization overlays like Tor [8] or private browsing may only superficially obfuscate the connection between users and data, as the data itself may very well contain deanonymizing features.

In contrast to these existing approaches, we propose a transparent and light-weight encryption layer between clients and providers that cryptographically protects user data without breaking collaborative features. Users with access to the document secret may view and edit the document within the collaborative framework as if no encryption layer were present. On the other hand, users who do not know the document secret, and this may include the app provider, see obfuscated text. This layer is implemented as a browser extension, and it makes extensive use of the standard XMLHttpRequest API [9] used by a variety of web applications (e.g., Google productivity suite, Conceptboard, MeetingWords, Collabedits, Codepen, etc. [4, 1, 5, 7]) to transmit user edits.

Our approach is based on a novel character-level variation of the venerable polyalphabetic substitution cipher [20]. The benefit of encrypting without the need for context, and at the smallest unit of information of many collaborative apps (i.e., one Unicode character), is that our approach maintains app functionality while simultaneously maintaining provider bandwidth usage and avoiding the need for heavy-duty reverse-engineering of app-related code or network protocols (which may be obfuscated). For example, when a character is added into a Google Docs document, we do not need to parse or modify the control traffic designed to mark the location, style, and font of the edit; instead, we encrypt only the raw character datum, and all associated control information flows through to the provider as before.

Though the substitution cipher itself is vulnerable to a number of well-known attacks, such as statistical attacks and chosen plaintext attacks [24], we provide approaches for strengthening the cipher to provide useful security, through standard mitigation approaches such as homophony and mapping randomization, in addition to novel approaches based on range extension. In the latter case, the plaintext is extended from the typically narrow band of the Unicode character space (e.g., those associated with the English and/or Greek alphabets) to the entirety of the Unicode space in a manner that helps equalize character and multi-character distributions in order to complicate statistical attacks. Finally, we demonstrate the effectiveness of our system through the implementation of a Chrome browser extension that showcases its use in preserving privacy for the popular Google Docs collaborative platform.

The following are our main contributions:

  • We identify a robust mechanism for encrypting/decrypting user data within collaborative environments that utilize the XMLHttpRequest API without affecting server-side control traffic.

  • We develop and analyze a novel character-level variation of the polyalphabetic substitution cipher that is more resilient to classical attacks on the cipher.

  • We concretely demonstrate an integration of the two previous contributions as a prototype privacy-preserving Chrome extension for managing Google Docs.

We begin in Section II with a review of some of the related work from the literature. Next we present the architecture of our system in Section III, including descriptions of our software interface, followed by our new character-level encryption scheme in Section IV. Section V describes our prototype together with screenshots of it in action. We conclude in Section VI with some final thoughts, including limitations of our approach.

II Related Work

We next outline several representative (but hardly exhaustive) approaches to web-based privacy preservation from the literature. Our approach is specifically attuned to online collaborative environments, and our use of a memoryless character-level encryption is thus one key point of departure with the related work below.

II-A M-Aegis

M-Aegis [15] aims to protect data from cloud providers by using a transparent window that sits atop an existing application and encrypts input data in transit. Our approach differs from M-Aegis in a number of ways.

First, M-Aegis focuses on native Android apps, whereas our approach focuses on browser-hosted apps and is not operating-system dependent. Rather than mimicking portions of an app’s interface with a GUI overlay, we hook directly into the web application to intercept user input as transparently as possible. Indeed, our implementation executes in the browser together with the web app and has access to its Document Object Model (DOM).

Moreover, in real-time collaborative environments, edits may occur character-by-character and at different locations in a document. Block-based encryption schemes, like those utilized by M-Aegis, require the ability to discern the context of edits and re-encrypt on the fly. On the other hand, our approach is specifically targeted to real-time collaborative environments, where bandwidth constraints typically dictate sending only localized modifications of a resource, rather than its entire contents, when a change is made. Because our character-level encryption functions at the granularity of the service provider, it maintains low network traffic load and preserves a natural transparency for many back-end services, including search, find-and-replace, character count, and the like.

II-B MessageGuard

Another closely related work is MessageGuard [19], which implements a system that layers end-to-end encryption on top of existing web applications, using the browser as a global control point and deploying as either a browser extension or a bookmarklet. Our approach differs from MessageGuard in the following ways.

MessageGuard uses the iFrame HTML element as a middleware overlay between the user and web app, modifying data before it reaches the application. We, on the other hand, intercept data between the application and server, modifying it in transit. In this manner, our users’ interaction with the app does not change, meaning that their experience is preserved. In addition, MessageGuard currently focuses mainly on messaging tools such as Facebook Chat and Gmail, rather than collaborative editing tools. For example, MessageGuard customizes iFrame Graphical User Interfaces for each web application to which it is applied. For collaborative editing tools, which are often constructed with JavaScript, it can be quite challenging to produce an iFrame overlay that matches the users’ experience with respect to formatting, layout, and interface.

Finally, MessageGuard utilizes block-based encryption, which is less suitable for collaborative environments than our character-level approach, as discussed previously.

II-C Fully-homomorphic Encryption

There are also a number of methods of ensuring data confidentiality with the help of the cloud provider, most notably based on the use of fully-homomorphic encryption (see, for example,  [12, 27, 25, 13], although there is a wealth of additional literature within their citations and reverse-citations). These methods aim to have the server agnostically compute functions of a user’s data, and they are aimed toward an honest-but-curious provider. Our approach does not require any server-side modifications while maintaining transparency to the user or multiple collaborating users. We also avoid the heavy computational machinery required for these schemes.

II-D Classical Encryption

More generally, there are quite a number of tools that aim to encrypt user data, as typified by PGP [29] and S/MIME [18]. These tools are all meant for one operating user at a time, rather than collaborating users, and they are not designed up-front to function within the back-end’s existing processing methods.

III System Architecture

Our proposed system is divided into two fundamental components:

  1. A browser interface, which intercepts and modifies data that enters or leaves the app within the browser.

    Our specific prototype extension makes use of standard Chrome features to insert interface code between the app and the provider, and, thus, it may be expected to persist over several browser revisions. Indeed, these features are also common in the popular Firefox browser, and our system should be portable to it as well.

  2. A character-by-character collaborative encryption scheme that runs within the interface to encrypt and decrypt data streams using an extension of the substitution cipher.

We next present details of our architecture, starting in Section III-A with an overview of our threat model. Section III-B describes the browser interface together with the software hooks that enable it. Thereafter, Section IV describes the collaborative encryption scheme, together with analyses and approaches to strengthening its security.

III-A Threat Model

We assume that the collaborating users have an out-of-band method for sharing a common secret key for encryption and decryption, and the strength of our encryption scheme is based on some standard assumptions about the statistical properties of the text being edited (elaborated in Section IV within each relevant subsection), which are known to the attacker.

III-A1 In Scope

Our threat model includes an honest-but-curious cloud collaborative service provider or third party that observes and mines data at rest on the service’s servers. Third parties could include attackers with access to the provider’s data servers, partners in a business relationship with the provider or law enforcement agencies.

III-A2 Out of Scope

Since we focus only on collaborative real-time editors, threats to other kinds of apps, such as Facebook Messenger and email, are not considered in this paper. Moreover, we primarily protect against attacks at rest on the service provider’s servers, and thus do not handle:

  • Browser attacks - we assume that the browser reliably executes both the application and our browser extension, even though the provider might also provide the browser (e.g. Google and the Google Chrome browser).

  • Side-channel attacks - either by the provider or by a “man-in-the-middle” attacker. These include active attacks based on statistically correlating key-strokes or client-server communication with user activities based on fine-grained timing. We also do not consider information leakage from formatting, style, table structure, or other “area effects”, and, instead, focus on text alone. We believe that these side-channel attacks should be addressed by orthogonal mechanisms.

  • The client-side app - although we do not assume the app is trustworthy, we do assume that the implementer of our framework can reverse engineer the application’s protocols to the level of identifying the paths through which input data is transported. Our approach does not cover providers maintaining concealed channels for transferring this data or encrypted metadata, which we would not be expected to access.

III-B Browser Interface

Our approach uses a browser extension-based content script [6, 2] to inject JavaScript payloads into web applications. The payloads hook specific functions of JavaScript objects that serve as interfaces for app data. With hooks in place, we can filter and modify the data, which contains event messages from an app’s proprietary protocol.

We have implemented our framework as a browser extension that provides application data interception and modification functionality for the Google Chrome browser. Content scripts typically can access the Document Object Model (DOM) of targeted pages, but cannot use variables or functions defined by web pages or by other content scripts [2]. However, by utilizing features in the environment of the browser, we are able to interact with web scripts and implement hooks on typical sources and sinks of web application data.

The ability to run code on a web page is only part of the challenge of collaborative encryption. Editors that are not HTML-backed (like the Google Docs framework [4]) often generate their client graphical interface through obfuscated JavaScript. As such, a successful prototype must also identify an appropriate hook through which to intercept communications between the client and the service provider. When these frameworks make use of the XMLHttpRequest Application Program Interface (XHR), however, one may pick out and overwrite the XMLHttpRequest.send and XMLHttpRequest.open methods to intercept and modify the entire client-provider data stream.

We next describe some of the details involved with this prototype implementation, stressing that our content script hooking exploits a stable feature of the browser (dating back at least to Chrome 9.0, circa 2011).

III-B1 Content Scripts

Content scripts run JavaScript code within a specific web page context. In Chrome, these scripts may be injected either before the DOM is constructed (document_start mode), after the DOM is complete (document_end mode), or right after the window’s onload handler is called (document_idle mode) [2]. An enabling feature of these scripts is that, in document_start mode, they can insert code before the DOM is constructed. The result is that, by design, the inserted code overshadows corresponding methods that are loaded through the web page.

As a general template, the script modeled in Figure 1 can be injected into a web page, before the DOM is constructed, to overshadow an existing overrideFunction. In our prototype example, we combine two payload injections into a Google Docs page to produce an encryption middleware:

  • Outgoing Payload
    - A script that overwrites XMLHttpRequest.send.

  • Incoming Payload
    - A script that overwrites and decrypts (with a user-supplied key) initial page content that has been retrieved from the service provider.

  var code = function overrideFunction() {
      ...//Payloads to be injected
  };
  var script = document.createElement("script");
  script.textContent = "(" + code + ")();";
  (document.head || document.documentElement).appendChild(script);
Fig. 1: Injection template.
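
The serialisation step in this template can be tried outside a browser. The following is a minimal sketch, not the extension's actual code: `pageScope`, `greet`, and `payload` are hypothetical names, and `new Function` stands in for the script element that would evaluate the text in the page's own scope.

```javascript
// Hypothetical stand-in for the page's global scope (window in a browser).
const pageScope = {
  greet: function () { return "original"; }
};

// Payload written as an ordinary function; it is never called directly here.
function payload() {
  // Overshadow the page's function while keeping a reference to the original.
  const realGreet = page.greet;
  page.greet = function () { return "hooked:" + realGreet(); };
}

// Serialise the payload exactly as in the template: "(" + code + ")();"
const source = "(" + payload + ")();";

// In the extension this text would become script.textContent; here,
// new Function evaluates it with `page` bound to the stand-in scope.
new Function("page", source)(pageScope);

console.log(pageScope.greet()); // hooked:original
```

Writing the payload as a function and converting it to a string keeps it readable and syntax-checked, while the injected copy still runs in the page's context rather than the content script's isolated one.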

III-B2 Outgoing Payload

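  XMLHttpRequest.prototype.realSend = XMLHttpRequest.prototype.send;
  var newSend = function(outgoing_data) {
    // Encrypt only the newly entered characters; control data is untouched.
    if (outgoing_data.contain(new_entered_chars)){
      encrypt_algorithm(outgoing_data.new_entered_chars, key);
    }
    this.realSend(outgoing_data); // forward to the original send
  };
  XMLHttpRequest.prototype.send = newSend;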
Fig. 2: Outgoing data interception and modification.

The outgoing interception payload queries the user for an encryption key and then injects a JavaScript snippet similar to that in Figure 2 into the DOM of the underlying page. When this snippet redefines the XMLHttpRequest.send method, the new method is subsequently applied to all XMLHttpRequest uses and is executed every time XMLHttpRequest.send is called. In effect, the overshadowing method acts as a “man in the middle” and allows direct access to the outgoing data so that it may be viewed and modified before being sent out.

The outgoing_data is sent in an incremental fashion, every time changes (such as keystrokes or formatting modifications) are made within the editing window. In our current prototype, we focus only on modifying keystrokes.
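
As a runnable illustration of this hooking pattern, the sketch below overshadows `send` on a stand-in class, since XMLHttpRequest is not available outside a browser. The message shape (`insert`, `index`), the `FakeXHR` class, and the one-step shift cipher are all assumptions made for the example; the real Google Docs protocol is proprietary and differs.

```javascript
// Stand-in for XMLHttpRequest so the sketch runs outside a browser.
class FakeXHR {
  send(body) { FakeXHR.wire.push(body); }   // records what "reaches the provider"
}
FakeXHR.wire = [];

// Hypothetical placeholder cipher: shifts each code unit by one.
function encryptChars(text) {
  return Array.from(text, ch => String.fromCharCode(ch.charCodeAt(0) + 1)).join("");
}

// Install the hook: keep the original send, then overshadow it.
function installOutgoingHook(XhrClass, encrypt) {
  const realSend = XhrClass.prototype.send;
  XhrClass.prototype.send = function (body) {
    let outgoing = body;
    try {
      const msg = JSON.parse(body);
      // Only the newly typed characters are rewritten; all other
      // fields (position, style, ...) pass through untouched.
      if (typeof msg.insert === "string") {
        msg.insert = encrypt(msg.insert);
        outgoing = JSON.stringify(msg);
      }
    } catch (e) {
      // Not a message this sketch understands: forward unchanged.
    }
    return realSend.call(this, outgoing);
  };
}

installOutgoingHook(FakeXHR, encryptChars);
new FakeXHR().send(JSON.stringify({ insert: "ab", index: 7 }));
console.log(FakeXHR.wire[0]); // {"insert":"bc","index":7}
```

Forwarding unrecognised bodies unchanged is a deliberate choice in this sketch: a hook that fails closed would break the application whenever the provider sends traffic the overlay does not parse.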

III-B3 Incoming Payload

    Object.defineProperty(this, "target", {
        get: function(){
            // Decrypt the stored ciphertext before the app reads it.
            var text = decrypt(saved_file, key);
            return text;
        },
        set: function(val){
            saved_file = val;
        }
    });
Fig. 3: Decoding data stored on the service.

The incoming payload requests a decryption key from the user, and then initially decrypts the current state of the document from the service backend using a JavaScript snippet resembling that of Figure 3. In the case of our prototype, Google Docs loads the document content into a page property, and our redefinition of the getter function of this property allows the server-stored ciphertext to be intercepted and decrypted into plaintext before being displayed.

Once the initial state has been established, the incoming payload decrypts incoming updates from the provider in the fashion of Figure 4. Similarly to the overshadowed XMLHttpRequest.send in the outgoing payload, this snippet acts as a “man in the middle” to intercept and decrypt incoming data, where incoming_data here carries only updates to the document. The only difference from the overshadowed XMLHttpRequest.send is that the incoming data is loaded into the property responseText, from which the web app loads updates to the editing window. Therefore, once the getter method is redefined, the incoming data can be successfully intercepted, identified, and modified before being returned to the web app for further processing.

  var realOpen = XMLHttpRequest.prototype.open;
  var newOpen = function(){
    var xhr = this;
    Object.defineProperty(xhr, "responseText", {
      get: function(){
          if (xhr.readyState===4){
            var incoming_data = xhr.response;
            if (incoming_data.contains(new_inputs))
              ...//Decrypt the new inputs with the shared key
            return incoming_data;
          } }
    });
    realOpen.apply(this, arguments);
  };
  XMLHttpRequest.prototype.open = newOpen;
Fig. 4: Decoding incoming user updates.
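
The getter redefinition can be sketched in a self-contained form as follows. This is an illustration under assumptions, not the prototype's code: `makeFakeXhr` builds a plain object standing in for a completed XMLHttpRequest, and `decryptChars` is a hypothetical placeholder for the real decryption routine.

```javascript
// Stand-in for an XHR object whose response has already arrived.
function makeFakeXhr(raw) {
  return { readyState: 4, response: raw };
}

// Hypothetical placeholder inverse cipher: shifts each code unit back by one.
function decryptChars(text) {
  return Array.from(text, ch => String.fromCharCode(ch.charCodeAt(0) - 1)).join("");
}

// Redefine the responseText getter so the app reads decrypted data.
function installIncomingHook(xhr, decrypt) {
  Object.defineProperty(xhr, "responseText", {
    configurable: true,
    get: function () {
      if (xhr.readyState !== 4) return "";   // nothing to decode yet
      return decrypt(xhr.response);
    }
  });
}

const xhr = makeFakeXhr("ifmmp");
installIncomingHook(xhr, decryptChars);
console.log(xhr.responseText); // hello
```

Defining the property on the instance shadows whatever accessor the prototype provides, which is why the application code that later reads `responseText` sees the decrypted value without being modified itself.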

IV Collaborative Encryption

Traditional encryption schemes aim to “confuse and diffuse” a plaintext [22] into a ciphertext, so that a small perturbation in the plaintext produces an unpredictable “avalanche” [11] of changes in the ciphertext. As an overlay for a collaborative system, however, this model has some significant drawbacks.

Consider, for example, several users editing a shared document online. If one user changes an “e” to an “a” somewhere in the document, the cloud back-end propagates only this change to the other users, and not an entirely new copy of the document, in order to limit communication overhead. From the perspective of an overlay, however, if the one letter change completely affects an encryption block, then, in effect, the back-end must update all users with the entire block that was changed upon every edit, and the collaboration is very inefficient.

As such, for our platform we seek a “locally-encodable” encryption scheme that manages two seemingly contradictory demands:

  1. Minimize the number of ciphertext characters affected by a small change to the plaintext.

  2. Make it difficult to determine a plaintext, or even parts of a plaintext, from a given ciphertext.

The second demand is typical of encryption protocols, and can be defined in a number of ways, most notably based on computational or information-theoretic assumptions. The first demand is specific to our collaborative context, and we next develop several approaches for meeting it.

The overarching basis for our approach will be the classical substitution cipher, as formalized and described in Section IV-A. It is well known that the substitution cipher leaks statistical information about its plaintext and is also not robust to a (chosen or known) plaintext attack. For the issue of statistical leakage, we propose two approaches based on spreading the plaintext alphabet over a larger ciphertext alphabet. In Section IV-B, we consider the approach of apportioning the plaintext alphabet into many equal-sized blocks in the ciphertext alphabet, a practical scheme with a challenging analysis. In Section IV-C, on the other hand, we evaluate apportioning the plaintext into varying-sized blocks in the ciphertext alphabet, resulting in a more complicated implementation with a simpler analysis. Finally, in Section IV-D we consider mitigations for plaintext attacks.

IV-A Substitution - a simple approach

We formalize the first demand of our locally-encodable encryption as Definition 1, based on an encryption function indexed by a key string (which is the encryption secret shared by collaborating users) and mapping plaintext strings over an alphabet into ciphertext strings over the same alphabet.

Definition 1.

An encryption function is locally-encodable if, for some constants and all ,

where is the Levenshtein edit distance metric [17], denoting the minimum number of insertions, deletions, and/or single-character substitutions needed to transform string into string .

When is the ciphertext length, Definition 1 generalizes any deterministic, fixed-length encryption algorithm . Likewise, is disallowed because it prohibits unique decryption, in that two different plaintexts might map to the same ciphertext.

From the perspective of interfacing cleanly with existing collaborative environments, we also desire an encryption that is position-independent.

Definition 2.

Encryption function is position-independent if:

where denotes string concatenation and .

Position-independent encryptions are useful because they can be calculated without needing to consider the entire text. More precisely, the encryption can be calculated based on individual characters being edited.

Consider, for example, a plaintext in a document that is being edited collaboratively, and suppose the string is encrypted with a position-independent scheme as on the server. Changing the plaintext by transposing the “c” to a “g” and deleting first “h” corresponds to modifying the ciphertext by transposing to and deleting . The implementational consequence of this is that, when looking at the event messages that are being sent between users, we just need to identify the actual letters being transmitted, and not other metadata, such as their position in the text.

It is not hard to see that the simple substitution cipher, which maps strings based on a one-to-one correspondence between input and output character spaces, is an encryption scheme that is both locally-encodable () and position-independent. One of its well-known drawbacks is that, despite the theoretically large work-factor to break it (95! for printable ASCII characters), the cipher readily yields to classical statistical analysis, since it preserves the character distribution of its source and leaks exact information about where a given string is being changed.
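
Both properties can be checked mechanically on a toy cipher. The sketch below is illustrative only: the 26-letter key and the test strings are hypothetical, and the distance function is the textbook dynamic-programming Levenshtein distance over insertions, deletions, and substitutions.

```javascript
// Toy one-to-one substitution on lowercase letters (a hypothetical key).
const PLAIN  = "abcdefghijklmnopqrstuvwxyz";
const CIPHER = "qwertyuiopasdfghjklzxcvbnm";

function encrypt(text) {
  return Array.from(text, ch => {
    const i = PLAIN.indexOf(ch);
    return i < 0 ? ch : CIPHER[i];      // leave other characters unchanged
  }).join("");
}

// Classic dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

const x = "the cat", y = "he gat";

// Position independence: encrypting a concatenation equals
// concatenating the encryptions.
console.log(encrypt(x + y) === encrypt(x) + encrypt(y)); // true

// Local encodability: one plaintext edit is one ciphertext edit.
console.log(levenshtein(x, y), levenshtein(encrypt(x), encrypt(y))); // 2 2
```

Because the cipher acts on each character independently and is one-to-one, any edit script on the plaintext maps to an edit script of the same length on the ciphertext, which is what the equal distances reflect.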

Figure 5 shows a histogram of the characters found in Mark Twain’s The Adventures of Tom Sawyer [26] as a baseline for subsequent examples. One can clearly see the dominance of characters such as the space (SP) and letter e, which can thus be identified in the substitution-encrypted text.

Fig. 5: Histogram of characters in Gutenberg’s translation of Mark Twain’s “Tom Sawyer”. SP, CR, and LF denote a space, carriage-return, and linefeed, respectively.
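
The frequency attack on a simple substitution cipher can be sketched in a few lines. The sample sentence and the shift-by-three cipher below are hypothetical stand-ins for the novel's text and for an arbitrary one-to-one substitution.

```javascript
// Count how often each character occurs in a string.
function histogram(text) {
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) || 0) + 1);
  return counts;
}

// Characters sorted from most to least frequent (ties broken by code point).
function byFrequency(text) {
  return [...histogram(text).entries()]
    .sort((a, b) => b[1] - a[1] || a[0].codePointAt(0) - b[0].codePointAt(0))
    .map(([ch]) => ch);
}

// Hypothetical simple substitution: shift every code unit by three.
const encrypt = text =>
  Array.from(text, ch => String.fromCharCode(ch.charCodeAt(0) + 3)).join("");

const plain = "the quick brown fox jumps over the lazy dog and the sleepy cat";
const cipher = encrypt(plain);

// The most frequent ciphertext symbol is the image of the most frequent
// plaintext symbol, so an attacker can guess that it stands for a space.
console.log(JSON.stringify(byFrequency(plain)[0]));                       // " "
console.log(byFrequency(cipher)[0] === encrypt(byFrequency(plain)[0]));   // true
```

The cipher changes which symbol is most frequent but not how frequent it is, so the ranking of the histogram survives encryption intact.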

IV-B Alphabet Extension

To combat frequency analysis, it is possible to embed the range of usable characters (say, the 95 printable ASCII characters) within the larger Unicode space that is supported by many web applications. For JavaScript engines, it is convenient to use the Unicode characters in the range 0x0020 to 0xD7FF, since many JavaScript engines encode strings as sequences of 16-bit Unicode Transformation Format (UTF-16) code units, and each character in this range is represented by a single code unit [3].


IV-B1 Encryption

One way of achieving this embedding is to divide the available Unicode region into non-intersecting 95-character blocks (corresponding to the 95 printable ASCII characters), and to assign to each block a pseudorandom permutation seeded by the block’s ID and the encryption key (i.e., the password shared by the various users). To encrypt a printable character, one uniformly randomly picks a 95-character block from the Unicode range, and uses the corresponding permutation to produce a Unicode character.

As an example, consider the extension algorithm encoding a plaintext character b (see Figure 6). The user picks a 95-character Unicode block in a random (and not necessarily reproducible) way; in this case, she may choose the second block, with Unicode characters in the range 0x007F-0x00DD. A concatenation of the block ID (2) and a shared password is then used to seed a pseudorandom number generator (PRNG) that produces a permutation of the Unicode characters in the range; there are a number of well-known and efficient methods for producing such a random permutation of integers, dating back (at least) to Hall and Knuth (see [14, 16, 21] for some implementations). Since our plaintext b is the 66th printable ASCII character, we replace it with the 66th element of our pseudorandom permutation, in this case <<, which is our ciphertext character.

Fig. 6: Demonstration of the alphabet extension of a substitution cipher.

IV-B2 Decryption

To decrypt a ciphertext, a second (authorized) user identifies the Unicode block in which the encrypted character is found, and seeds a PRNG with a concatenation of the resulting block ID and the shared password. This PRNG is then used to produce a pseudorandom permutation, the same one produced by the encrypting user, which is inverted to produce the original plaintext character.

In our earlier example, the second user would identify ciphertext << as belonging to Unicode block 0x007F-0x00DD, which has block ID (2). She would concatenate this ID with her shared password to seed a PRNG and produce the permutation found in that block on the figure. The permutation is a one-to-one correspondence between printable ASCII characters and Unicode characters in the range, so it is readily inverted to produce the plaintext b.
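
The encryption and decryption steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the FNV-1a hash and mulberry32 generator are small non-cryptographic stand-ins for a proper keyed PRNG, the `":"` separator in the seed is an arbitrary choice, and `Math.random` is used for block selection only for brevity.

```javascript
const ASCII_BASE = 0x20;                 // first printable ASCII character
const UNI_BASE = 0x0020;                 // start of the usable Unicode range
const BLOCK = 95;                        // printable ASCII characters per block
const NUM_BLOCKS = Math.floor((0xD7FF - UNI_BASE + 1) / BLOCK);

// Non-cryptographic 32-bit string hash (FNV-1a), used only to derive a seed.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Small seeded PRNG (mulberry32); a stand-in, not a secure generator.
function mulberry32(a) {
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle of 0..94, seeded by block ID and password.
function blockPermutation(blockId, password) {
  const rng = mulberry32(fnv1a(blockId + ":" + password));
  const perm = Array.from({ length: BLOCK }, (_, i) => i);
  for (let i = BLOCK - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [perm[i], perm[j]] = [perm[j], perm[i]];
  }
  return perm;
}

function encrypt(text, password, random = Math.random) {
  return Array.from(text, ch => {
    const idx = ch.charCodeAt(0) - ASCII_BASE;
    if (idx < 0 || idx >= BLOCK) throw new RangeError("not printable ASCII: " + ch);
    const blockId = 1 + Math.floor(random() * NUM_BLOCKS);   // random block
    const perm = blockPermutation(blockId, password);
    return String.fromCharCode(UNI_BASE + (blockId - 1) * BLOCK + perm[idx]);
  }).join("");
}

function decrypt(text, password) {
  return Array.from(text, ch => {
    const off = ch.charCodeAt(0) - UNI_BASE;
    const blockId = 1 + Math.floor(off / BLOCK);             // recover block ID
    const perm = blockPermutation(blockId, password);
    return String.fromCharCode(ASCII_BASE + perm.indexOf(off % BLOCK));
  }).join("");
}

const secret = "shared password";        // hypothetical key
const ciphertext = encrypt("Hello, Docs!", secret);
console.log(decrypt(ciphertext, secret)); // Hello, Docs!
```

The decryptor never needs to know which block the encryptor chose, because the block ID is implied by the position of the ciphertext character in the Unicode range. A practical version would cache permutations per block rather than regenerate them for every character.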

IV-B3 Unigram analysis

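This approach naturally increases the entropy of the resulting ciphertext over simple substitution, as expressed by the following straightforward lemma (we use the notation $\Sigma$ to denote the plaintext alphabet, and $\Gamma$ to denote the ciphertext alphabet).

Lemma 1.

Extending the base alphabet from $|\Sigma|$ to $|\Gamma|$ characters in this manner increases character entropy by $\log_2(|\Gamma|/|\Sigma|)$ bits.

The plaintext unigram entropy is given by

$$H(X) = -\sum_{x \in \Sigma} p(x) \log_2 p(x).$$

The encoding process uniformly distributes characters among the $|\Gamma|/|\Sigma|$ blocks, meaning that the probability of seeing a character corresponding to $x$ (according to the random permutation of its block) in the ciphertext alphabet is $p(x)\,|\Sigma|/|\Gamma|$. Computing the resulting entropy of the ciphertext produces:

$$H(Y) = -\sum_{\text{blocks}} \sum_{x \in \Sigma} \frac{|\Sigma|}{|\Gamma|}\, p(x) \log_2\!\left(\frac{|\Sigma|}{|\Gamma|}\, p(x)\right) = H(X) + \log_2 \frac{|\Gamma|}{|\Sigma|}.$$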

In the specific case of graphic Unicode characters consistently accessible via JavaScript (i.e., 0x0020 to 0xD7FF), we add roughly bits of entropy to the ciphertext. Extending to valid code points of UTF-8 provides approximately bits of extra entropy.

Figure 7 shows the sorted empirical entropies of the blocks produced by this extended encoding of “Tom Sawyer”. All blocks have a reasonably high entropy within roughly bits of the input text’s entropy of bits, and the overall ciphertext entropy is roughly , which is about bits more than the input text entropy.
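
The entropy gain from spreading each plaintext symbol uniformly over several blocks can be checked numerically. In the sketch below, the sample sentence is arbitrary and the block count of 581 is an assumption derived from fitting 95-character blocks into the range 0x0020 to 0xD7FF; the gain should match the base-2 logarithm of the number of blocks regardless of the input distribution.

```javascript
// Shannon entropy (in bits) of a probability vector.
function entropy(probs) {
  return -probs.reduce((s, p) => (p > 0 ? s + p * Math.log2(p) : s), 0);
}

// Unigram distribution of a string, as an array of probabilities.
function unigram(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total++;
  }
  return [...counts.values()].map(c => c / total);
}

// Spread every plaintext symbol uniformly over `blocks` ciphertext symbols.
function extend(probs, blocks) {
  return probs.flatMap(p => Array(blocks).fill(p / blocks));
}

const p = unigram("a sample sentence with an uneven letter distribution");
const blocks = 581; // assumed number of 95-character blocks in 0x0020-0xD7FF
const gain = entropy(extend(p, blocks)) - entropy(p);

// The two printed values agree up to floating-point rounding.
console.log(gain.toFixed(4), Math.log2(blocks).toFixed(4));
```

Note that this gain is uniform across symbols: the relative frequencies inside each block are unchanged, which is the weakness that the per-block histogram exposes.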

Fig. 7: Sorted entropies of Unicode blocks in the alphabet extension cipher. The horizontal line represents the plaintext’s entropy.
Fig. 8: Histogram of plaintext characters mapping to Unicode characters 0x18DF0 - 0x18E4F in the alphabet extension cipher.
Fig. 9: Histogram of plaintext characters mapping to Unicode characters 0x18DF0 - 0x18E4F in the entropy maximization cipher.
Fig. 10: Sorted entropies of Unicode blocks in the entropy maximization cipher. The horizontal line represents the full text’s entropy.

The problem with this scheme becomes evident by examining the histogram of input characters mapped to a specific block, as shown in Figure 8. The uneven distribution of plaintext characters may carry over to ciphertext characters; for example, one can readily identify that the most common ciphertext characters will be mapped from a space character and the letter “e”.

IV-B4 Greedy Entropy Maximization

Within the alphabet extension scheme of Section IV-B there is flexibility about which Unicode block range to use in producing a ciphertext character. Though a random choice produces a high overall entropy, it might be more advantageous to flatten the histogram of characters mapped into each block. In other words, we would like each character within a given block to appear more or less equally in the ciphertext in order to complicate frequency analysis, or, more formally, to maximize the minimum entropy of a block.

A joint optimization of entropy across the entire plaintext message is inappropriate for our application, which requires online encryption one character at a time, but a greedy entropy maximization is feasible. In this approach, we maintain the histograms of each Unicode block in memory. When a new character needs to be mapped, we consider its effect on the entropy of each Unicode block and add it to the block for which it most raises the entropy, breaking ties uniformly at random. This has the effect of significantly flattening the distribution of characters.
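
One way to realise this greedy rule is sketched below. It is an illustration under assumptions rather than the prototype's code: symbols are given as integer indices, `makeGreedyChooser` is a hypothetical helper, and every block's entropy is recomputed from scratch for each character, which is simple but not optimised.

```javascript
// Entropy (bits) of a histogram given as an array of counts.
function entropyOfCounts(counts) {
  const n = counts.reduce((s, c) => s + c, 0);
  if (n === 0) return 0;
  return -counts.reduce((s, c) => (c > 0 ? s + (c / n) * Math.log2(c / n) : s), 0);
}

// Greedy block chooser: keeps one histogram per block and places each new
// character in the block whose entropy it raises the most.
function makeGreedyChooser(numBlocks, alphabetSize, random = Math.random) {
  const hist = Array.from({ length: numBlocks }, () => new Array(alphabetSize).fill(0));
  return {
    hist,
    choose(symbol) {                       // symbol: index in [0, alphabetSize)
      let best = [], bestGain = -Infinity;
      hist.forEach((counts, b) => {
        const before = entropyOfCounts(counts);
        counts[symbol]++;                  // tentatively add the symbol
        const gain = entropyOfCounts(counts) - before;
        counts[symbol]--;                  // undo the tentative change
        if (gain > bestGain + 1e-12) { best = [b]; bestGain = gain; }
        else if (Math.abs(gain - bestGain) <= 1e-12) best.push(b);
      });
      const pick = best[Math.floor(random() * best.length)]; // random tie-break
      hist[pick][symbol]++;
      return pick;
    }
  };
}

// Usage: a heavily skewed stream of symbols over a 4-letter alphabet.
const chooser = makeGreedyChooser(3, 4);
const stream = [0, 0, 0, 0, 1, 0, 0, 2, 0, 3, 0, 0];
const picks = stream.map(s => chooser.choose(s));
console.log(picks, chooser.hist.map(entropyOfCounts));
```

The chosen block index would then select the permutation used to encrypt the character. Since the histograms depend on everything a client has typed, each encrypting client keeps its own state; decryption is unaffected because the block is recoverable from the ciphertext character alone.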

Indeed, Figure 9 shows that the first k of characters from the same text under this heuristic produces a flat distribution for the block from Figure 8, with correspondingly high entropies for each block, as shown in Figure 10.
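The greedy block selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `GreedyBlockChooser`, the number of blocks, and the seeded tie-breaking generator are all assumptions made for the example, and the per-block permutation that would turn the chosen block into an actual ciphertext character is omitted.

```python
import math
import random
from collections import Counter


def entropy(hist: Counter) -> float:
    """Shannon entropy (in bits) of a histogram of counts; 0.0 if empty."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in hist.values() if c > 0)


class GreedyBlockChooser:
    """Hypothetical helper: keeps one histogram per block in memory and
    assigns each new plaintext character to the block whose entropy it
    raises the most, breaking ties uniformly at random."""

    def __init__(self, num_blocks: int, seed: int = 0):
        self.hists = [Counter() for _ in range(num_blocks)]
        self.rng = random.Random(seed)  # seeded only to make the demo repeatable

    def choose(self, ch: str) -> int:
        gains = []
        for hist in self.hists:
            trial = hist.copy()
            trial[ch] += 1
            gains.append(entropy(trial) - entropy(hist))
        best = max(gains)
        candidates = [i for i, g in enumerate(gains)
                      if math.isclose(g, best, abs_tol=1e-12)]
        block = self.rng.choice(candidates)
        self.hists[block][ch] += 1
        return block


chooser = GreedyBlockChooser(num_blocks=4, seed=1)
text = "the quick brown fox jumps over the lazy dog " * 5
blocks = [chooser.choose(c) for c in text]
```

Because each decision uses only the histograms accumulated so far, the procedure runs online, one character at a time, as the application requires.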

Local decodability for

By increasing the constant in the locally-decodable definition, it is possible to even further reduce the amount of information leaked by an edit, at the expense of significantly increasing the implementational complexity of the system. In this scenario, one user edit results in a constant number of edits in the ciphertext. A consequence of this approach is that the overlay has to be able to produce edit events de novo, which is fragile to updates in the underlying web application.


Our encryption scheme relies on two elements for its security. First, by greedily maximizing entropy, we end up significantly flattening the distribution of characters mapped into a given block, complicating character-based frequency analysis. Second, the pseudo-random permutation choices per block provide some separability, in that decoding the permutation of one block does not directly lead to the decoding of another block (although it may provide side-information with which to mount an attack).

IV-C Information Theoretic Optimization

Thus far, we have utilized fixed-length blocks, each encrypting (through substitution) the entire range of usable plaintext characters, and a heuristic greedy entropy maximization method to flatten the block-wise unigram distribution. It turns out, however, that variable-length blocks, with lengths adapted to the probability distribution of plaintext characters, can provide even better defense against statistical attacks because the unigram distribution of ciphertext characters can provably be made arbitrarily close to uniform.

IV-C1 Variable-length block algorithm

Let $\mathcal{X} = \{x_1, \ldots, x_m\}$ denote the plaintext alphabet of size $m$. Each plaintext character $x_i$ is first mapped (independently of other plaintext characters) uniformly at random to a ciphertext character $y \in \mathcal{Y}_i$, where $\mathcal{Y}_1, \ldots, \mathcal{Y}_m$ are disjoint ciphertext sub-alphabets (one for each plaintext character) of sizes $n_i = |\mathcal{Y}_i|$, and $\mathcal{Y} = \bigcup_{i=1}^{m} \mathcal{Y}_i$ is the entire ciphertext alphabet, of size $n = \sum_{i=1}^{m} n_i$. Let $\pi$ denote a permutation on $\mathcal{Y}$. Our randomized homophonic substitution cipher can be described by the encryption function which maps $x_i$ to $\pi(y)$, which is invertible with knowledge of $\pi$. Here, the permutation $\pi$ represents a shared secret key which is available to both the encryption and decryption algorithms, but not the attacker.
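As an illustration of this construction, the following minimal sketch builds a randomized homophonic substitution cipher from a table of sub-alphabet sizes and a shared key. Several details are assumptions made only for the example: the function name `build_cipher`, the use of the code-point range starting at `0x4E00` as the ciphertext alphabet, and the derivation of the secret permutation from an integer key with Python's `random.Random`, which is not cryptographically secure and merely stands in for a proper keyed permutation.

```python
import random


def build_cipher(sub_sizes: dict[str, int], key: int, base: int = 0x4E00):
    """Return (encrypt, decrypt) closures for a randomized homophonic cipher.

    sub_sizes maps each plaintext character to the size of its disjoint
    ciphertext sub-alphabet; `base` (hypothetical) is the first code point
    of the ciphertext alphabet.
    """
    n = sum(sub_sizes.values())

    # Partition the indices 0..n-1 into disjoint bins, one per plaintext char.
    bins, owner, start = {}, [], 0
    for ch, size in sub_sizes.items():
        bins[ch] = range(start, start + size)
        owner.extend([ch] * size)
        start += size

    # Secret permutation on the whole ciphertext alphabet (illustrative PRNG).
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    inverse = [0] * n
    for src, dst in enumerate(perm):
        inverse[dst] = src

    # The homophone choice is local randomness and need not be shared.
    rng = random.SystemRandom()

    def encrypt(plaintext: str) -> str:
        return "".join(chr(base + perm[rng.choice(bins[ch])])
                       for ch in plaintext)

    def decrypt(ciphertext: str) -> str:
        return "".join(owner[inverse[ord(c) - base]] for c in ciphertext)

    return encrypt, decrypt


enc, dec = build_cipher({"a": 3, "b": 1, " ": 2}, key=42)
message = "ab ba a"
ciphertext = enc(message)
recovered = dec(ciphertext)
```

Decryption needs only the permutation and the (public) partition into sub-alphabets, and each plaintext character yields exactly one ciphertext character, so edit positions are preserved.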

In order to simplify the exposition, we shall assume that the plaintext stochastic process is first-order stationary, meaning that the unigram (i.e., marginal) distribution of individual plaintext characters is the same at all positions within the plaintext sequence. While this assumption may not hold exactly in practice, it is a fairly weak technical assumption to make since it still allows the process to be non-stationary (of higher orders) and also to have strong temporal dependencies (memory). Moreover, it can be made to hold to any desired degree by encrypting a suitably long sequence of consecutive plaintext characters at once as a group and permuting the sequential ordering of characters within the group using another shared secret key. For simplicity, however, we will assume that the first-order stationarity condition holds without such grouping and sequential ordering permutation.

IV-C2 Unigram distribution

Since the encryption process operates in a character-wise and statistically time-invariant manner, it follows that the ciphertext character process is also first-order stationary. If the unigram (first-order) probability mass function (pmf) of the plaintext is $p_i := P(X = x_i)$, $i = 1, \ldots, m$, then the unigram (first-order) pmf of the ciphertext is given by $q(\pi(y)) = p_i / n_i$, for all $y \in \mathcal{Y}_i$ and $i = 1, \ldots, m$, where $\mathcal{Y}_i$ is the ciphertext sub-alphabet of plaintext character $x_i$ and $n_i = |\mathcal{Y}_i|$ is its size.

This is because in order to get ciphertext character $\pi(y)$, the plaintext character $x_i$ that corresponds to it must be generated (this happens with probability $p_i$) and then the particular ciphertext character $y$ within the bin $\mathcal{Y}_i$ from which $\pi(y)$ arises (under permutation $\pi$) must be picked (this happens with probability $1/n_i$).

Proposition 1.

If the unigram pmf over plaintext characters and the ciphertext sub-alphabet sizes are such that for all $i = 1, \ldots, m$, $n_i = n \, p_i$, then the unigram distribution of ciphertext characters is exactly uniform over the ciphertext alphabet, i.e., $q(y) = 1/n$ for all $y \in \mathcal{Y}$.

If the plaintext unigram probabilities are all rational numbers, then the ciphertext unigram probabilities can be made exactly uniform over the ciphertext alphabet using a sufficiently large, but finite, ciphertext alphabet size $n$. In practice, the plaintext unigram probabilities would be estimated empirically as normalized character counts (frequencies) in some corpus of documents. The estimated probabilities would therefore be rational numbers. If, on the other hand, even one $p_i$ is irrational, exact uniformity cannot be attained with any finite $n$. However, one can always approximate any irrational fraction with a rational one with a sufficiently large denominator. Thus, the ciphertext unigram distribution can be made as close to uniform as desired by making $n$ sufficiently large. In practice, if the products $n \, p_i$ are not integers, we would drop the fractional parts and distribute (in some manner) any remaining characters in the ciphertext alphabet among the plaintext alphabet characters.
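The sizing rule discussed above can be checked with exact rational arithmetic. The sketch below is illustrative only: the three-character distribution is made up, the function names are hypothetical, and handing leftover ciphertext characters to the largest fractional parts is just one possible way of distributing them "in some manner". It also assumes the ciphertext alphabet is large enough that every plaintext character receives at least one ciphertext character.

```python
from fractions import Fraction
from math import lcm


def exact_sizes(pmf: dict[str, Fraction]) -> dict[str, int]:
    """Smallest sub-alphabet sizes with n_i = n * p_i exactly (rational p_i)."""
    n = lcm(*(p.denominator for p in pmf.values()))
    return {ch: int(p * n) for ch, p in pmf.items()}


def approx_sizes(pmf: dict[str, Fraction], n: int) -> dict[str, int]:
    """Drop fractional parts of n * p_i, then give the leftover ciphertext
    characters to the largest fractional parts (one possible rule)."""
    sizes = {ch: int(p * n) for ch, p in pmf.items()}  # floor for p > 0
    leftover = n - sum(sizes.values())
    by_fraction = sorted(pmf, key=lambda ch: pmf[ch] * n - sizes[ch],
                         reverse=True)
    for ch in by_fraction[:leftover]:
        sizes[ch] += 1
    return sizes


def ciphertext_pmf(pmf: dict[str, Fraction], sizes: dict[str, int]):
    """Probability p_i / n_i of each ciphertext character in sub-alphabet i."""
    return {ch: pmf[ch] / sizes[ch] for ch in pmf}


# Made-up plaintext distribution over three characters.
pmf = {"e": Fraction(1, 2), "t": Fraction(1, 3), "z": Fraction(1, 6)}
sizes = exact_sizes(pmf)
uniform = ciphertext_pmf(pmf, sizes)
rounded = approx_sizes(pmf, 100)
```

With the exact sizes every ciphertext character has the same probability, whereas the rounded sizes for an alphabet of 100 characters give probabilities that are close to, but not exactly, uniform.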

In different scenarios, the plaintext distribution may be known to both the users and the attacker, or only to the various users, or to none. Similarly, the sizes of the ciphertext sub-alphabets may be known to both the users and the attacker or only the users. However, the overall ciphertext alphabet will be known to both the users and the attacker. In what follows, we assume that the users know everything and with the exception of the secret key, the attacker also knows everything.

If the unigram ciphertext distribution can be made exactly uniform, then no statistical test based only on observed ciphertext (ciphertext-only attack) will be able to tease the plaintext characters apart (with confidence better than a random guess) using only a unigram frequency analysis (we discuss multigrams below). On the other hand, if the ciphertext unigram probabilities $p_i/n_i$ are not exactly uniform, and they are all distinct for different $i$ and known to the attacker, then as the ciphertext length becomes very large, a ciphertext-only attack may be able to break the cipher with overwhelming probability. However, the closer that the $p_i/n_i$’s are to being uniform, the longer that the attacker will have to wait to gather enough ciphertext characters before being able to break the cipher with sufficient confidence.

IV-C3 Unigram sample complexity analysis

In order to gain quantitative insight into how long the ciphertext needs to be before it can be broken with some desired degree of confidence via unigram analysis and how this minimum length increases with increasing ciphertext alphabet size $n$, we consider the following simpler task for the attacker: in a binary plaintext alphabet $\mathcal{X} = \{x_1, x_2\}$, decide whether a particular ciphertext character $y$ corresponds to the plaintext character $x_1$ or the plaintext character $x_2$. Let $N_y$ denote the number of ciphertext characters that equal $y$ in a message of length $N$. For analytical tractability, here we will assume that the plaintext process is stationary and memoryless, i.e., it is a sequence of independent and identically distributed (iid) characters. Then $N_y$ will have a binomial distribution for $N$ trials with success probability equal to $p_1/n_1$, if $y$ corresponds to plaintext character $x_1$, and success probability $p_2/n_2$, if $y$ corresponds to plaintext character $x_2$, where $p_i$ is the probability of $x_i$ and $n_i$ is the size of its ciphertext sub-alphabet. The attacker’s task of deciding $x_1$ or $x_2$ based on $N_y$ and knowledge of $p_1/n_1$ and $p_2/n_2$ is a simple Bayesian binary hypothesis testing problem that has been extensively studied in the literature.

Indeed, for sufficiently large , the error probability of the optimum plaintext decoding (i.e., based on the Maximum A posteriori Probability [MAP] rule) approximates , where denotes the Kullback-Leibler (KL) divergence [10, Section 11.9]. We have the following result whose proof may be found in Section 11.9 of [10].

Proposition 2.

[10] If and the plaintext process is stationary and memoryless, then for each ciphertext character , the error probability of the optimum plaintext decoding rule (which is the Maximum A posteriori Probability or MAP rule) goes to zero exponentially fast with the ciphertext size :



is a probability and denotes the Kullback-Leibler (KL) divergence between the binary probability distributions and . (To be technically precise, is finite if (resp. 1) whenever (resp. 1) and is infinity otherwise. Also is treated as zero.)

Thus for all sufficiently large, . In order to achieve a target decoding error probability of or less, we require a message of length characters. If there was no ciphertext alphabet expansion, i.e., , then the minimum number of samples needed to attain a decoding error probability of is given by . Therefore, for each , we need times more ciphertext samples compared to the case when there is no alphabet expansion.
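The effect of alphabet expansion on the required ciphertext length can be explored numerically. The sketch below rests on an assumption that should be checked against the paper's own formulas: it takes the governing error exponent to be the Chernoff information between the two Bernoulli hypotheses, which is the standard Bayesian hypothesis-testing result of [10, Section 11.9]. The function names and the probabilities and sub-alphabet sizes in the usage lines are hypothetical values chosen only for illustration.

```python
import math


def kl_bernoulli(a: float, b: float) -> float:
    """KL divergence D(a || b) in bits between Bernoulli(a) and Bernoulli(b),
    with 0 log 0 treated as zero and infinity where b excludes an outcome."""
    def term(x: float, y: float) -> float:
        if x == 0.0:
            return 0.0
        if y == 0.0:
            return math.inf
        return x * math.log2(x / y)
    return term(a, b) + term(1.0 - a, 1.0 - b)


def chernoff_bernoulli(a: float, b: float) -> float:
    """Chernoff information between Bernoulli(a) and Bernoulli(b), for
    0 < a, b < 1 and a != b: the common value D(r || a) = D(r || b)."""
    num = math.log((1.0 - a) / (1.0 - b))
    den = math.log(b / a) + num
    r = num / den  # the probability at which the two divergences coincide
    return kl_bernoulli(r, a)


def length_ratio(p1: float, p2: float, n1: int, n2: int) -> float:
    """Ratio of error exponents without and with alphabet expansion, i.e.
    (under the stated assumption) how many times more ciphertext is needed."""
    return chernoff_bernoulli(p1, p2) / chernoff_bernoulli(p1 / n1, p2 / n2)


# Hypothetical skewed binary source and sub-alphabet sizes summing to 1024.
ratio = length_ratio(0.9, 0.1, 921, 103)
```

The closer $p_1/n_1$ and $p_2/n_2$ are to each other, the smaller the exponent with expansion and the larger the resulting ratio.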

Theorem 1.

If and the plaintext process is iid, then the ratio of the length of ciphertext needed to break the cipher (via unigram analysis) with ciphertext alphabet expansion to the length needed without alphabet expansion is given by:

IV-C4 Example

Consider as a toy example the non-uniform plaintext unigram distribution given by and , and the ciphertext alphabet size is (giving a ciphertext to plaintext size ratio equal to that of UTF-16 Unicode to ASCII). Then taking and , we get and which is more uniform over an alphabet of size than the plaintext is over the plaintext alphabet. Of course, in this particular example if was a multiple of 10, then the distribution would be exactly uniform, i.e., and the cipher will be unbreakable even via multigram analysis (for an iid plaintext process). Continuing, we have and . This makes , i.e., the ciphertext length needed to break a single character (at any confidence level) with a -fold ciphertext alphabet expansion is about million times that needed to break a single character (to the same confidence level) without ciphertext alphabet expansion. Specifically, for (99% We would like to emphasize that these numbers are just for the toy example where the plaintext alphabet has only two characters and the unigram distribution of the two characters is highly non-uniform. These numbers can be expected to be much larger in practice because typical plaintext alphabet sizes are much larger than ( for ASCII) and the unigram plaintext distribution is much less skewed.

IV-C5 Multigram distribution

If the plaintext process has memory, then even if the unigram distribution of the ciphertext can be made exactly uniform by choosing sufficiently large, it may still be possible to break the cipher to any desired degree of confidence by performing an -gram analysis, with , on a sufficiently long piece of the ciphertext ( sufficiently large). This is because it is, in general, difficult to ensure the uniformity of the -gram ciphertext distribution (for ) even if the unigram ciphertext distribution is exactly uniform. Interestingly, however, the “uniformizing” effect of ciphertext alphabet expansion becomes more effective on ciphertext -grams as increases. In particular we show that the -distance between the -gram ciphertext distribution and the uniform distribution on ciphertext -grams, decreases exponentially fast in if the ciphertext alphabet size is strictly larger than , where is the probability of the smallest non-zero-probability plaintext character. To see this, let denote the joint pmf of consecutive plaintext characters, i.e., the -gram plaintext distribution. Thus, .

Then for all -tuples of plaintext characters and all , the joint pmf of consecutive ciphertext characters (-gram ciphertext distribution) is given by

The following proposition bounds the distance between and the which is the uniform pmf on ciphertext -grams.

Theorem 2.


The numerator of the first inequality in the above proof is the -distance between the joint pmf of ciphertext -grams and the product of marginal pmfs of ciphertext unigrams. This can be interpreted as a measure of statistical independence of consecutive ciphertext characters. This distance is never more than . The bound in Theorem 2 implies that if , then

showing that higher order -gram ciphertext distributions approach uniform exponentially fast with increasing . Thus, even though higher-order -gram analysis may help the attacker to break the cipher (due to the underlying memory in the plaintext), for larger and larger it will also become increasingly harder to confidently break it since the required message length will become unreasonably large. The hardness is not just computational but also statistical.

IV-D Plaintext Attacks

A known or chosen plaintext (with a corresponding ciphertext) significantly reduces the complexity of breaking a substitution cipher by providing some of the plaintext-ciphertext substitutions that form the encryption key; language and context may be used to infer the remaining substitutions. Utilizing a polyalphabetic cipher, as described in this work, improves the resilience of the cipher, since less information is revealed with each substitution. In other words, if the letter character a is mapped uniformly at random to one of ten ciphertext characters, then revealing one of these plaintext-ciphertext connections only reveals one tenth of the occurrences of a. This mapping can be modified in some coarse manner, say based on the month in which the text is produced or the name of the original author of the work, in order to limit the usability of known plaintexts.

The cipher can be made even more resilient by encrypting one plaintext character with more than one ciphertext character, and, indeed, this does not break our collaborative encryption model or our prototype implementation, although it is possible that location-sensitive processing could suffer. Though character-level encryption is inherently weaker than block- or stream-based encryption, we stress that the proposed approach provides a measurable increase in privacy, where currently none exists, without requiring server-side or browser rewriting, and that the encryption can be strengthened further at the expense of efficiency.

V Prototype

We demonstrate our proposed encryption framework through the Google Docs platform, as implemented for the Google Chrome browser.

Google Docs is a collaborative document editing service provided by Google for personal, academic, and corporate use. Two or more users can edit a document’s state, including text, formatting, and figures, together in real-time using only a modern web browser. Data resides on Google’s servers, and any collaborative edits pass through Google’s infrastructure before being forwarded to other collaborators.

Our code (which will be made public after deanonymization) runs on the Google Chrome browser. Screenshots of our prototype in action are provided in Figures 11 and 12.

V-A Mechanism

For Google Docs, document data is structured as a series of event messages, each of which has an associated opcode and a set of fields specific to that opcode. The client-side app parses these messages, which specify document contents, layout, and formatting, to render a document for the user.

With our architecture we were able to intercept both sources of information using the techniques presented in Section III-B. Our implementation involves hooking the XMLHttpRequest prototype for all frames originating from Google’s relevant servers, providing access to the incoming and outgoing XHR data streams. By examining the effects on document state of messages with particular opcodes, we found that events with opcode “is” specify one or more character insertions. By filtering for “is” events, we were able to capture collaborative edits of a document’s text contents.

Fig. 11: A collaborator running our browser extension with correct password edits a Google Docs document. The app functions as normal with its collaborative features. However, data specifying the document’s text contents is intercepted and modified in transit to and from Google’s servers according to the encryption scheme described before. Only users running the browser extension with the correct password can view the original contents.

We then exploited the document_start feature to gain control of the DOCS_modelChunk variable at app load-time and inspected the series of event messages it contained. We found that these messages captured the cumulative document state as determined by the combination of all previous collaborative edits. The messages followed the same format as those of the XHR stream, but state changes were combined across messages where possible. Filtering for “is” events again provided enough specificity to capture all document state information concerning text contents. Our filter excludes formatting states, so formatting is preserved among clients. Google’s servers only ever handle the encrypted document text, and all other document state remains unmodified. From the provider’s or an outsider’s perspective, documents contain indecipherable text, but protocol messages are all valid and parseable. From an authorized client’s perspective, the application performs normally, and document contents do not deviate from the original form authored by collaborators.
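The filtering step described above can be illustrated with a simplified, language-neutral sketch. The prototype itself runs as a browser extension, so the Python below is not the authors' code; it only models the idea of rewriting the text of insertion events while leaving every other event untouched. The JSON layout and the field names `ty` (opcode) and `s` (inserted text) are hypothetical placeholders for whatever the real protocol uses, and the toy per-character cipher stands in for the scheme of Section IV.

```python
import json
from typing import Callable


def transform_events(raw: str, cipher: Callable[[str], str],
                     opcode_field: str = "ty",
                     text_field: str = "s") -> str:
    """Apply `cipher` to the text of every insertion ("is") event in a JSON
    list of event messages; all other events and fields pass through
    unchanged. Field names are assumptions made for this example."""
    events = json.loads(raw)
    for event in events:
        if isinstance(event, dict) and event.get(opcode_field) == "is":
            event[text_field] = cipher(event[text_field])
    return json.dumps(events, ensure_ascii=False)


def toy_cipher(text: str) -> str:
    """Placeholder character-level cipher: shift each code point by one."""
    return "".join(chr(ord(c) + 1) for c in text)


# Made-up outgoing batch: one text insertion and one formatting event.
outgoing = json.dumps([
    {"ty": "is", "s": "hi", "ibi": 1},
    {"ty": "as", "st": "bold"},
])
rewritten = json.loads(transform_events(outgoing, toy_cipher))
```

Because only the text field of insertion events is rewritten, and each plaintext character maps to one ciphertext character, positional fields and formatting events remain valid for the server.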

V-B Performance Evaluation

We focused our evaluation of the project demonstration on the users’ experience, in both qualitative and quantitative ways.

Qualitatively, our prototype extension modifies neither formatting (bold type, font size, line spacing, etc.) nor the Google Docs GUI. This enables users to edit and collaborate as if the encryption layer did not exist. The only discrepancy is that spelling corrections (which are handled server-side) are disabled, since the servers only store ciphertext.

On the quantitative side, we measured the delays our extension would cause due to encryption, recording the time between the start and the end of each overridden XMLHttpRequest call. The resulting delay is linearly proportional to the number of characters being encrypted, fitting the equation , where is the number of character edits, and time is measured in milliseconds and covers both the outgoing and incoming data streams. Since large numbers of character edits happen only during copy-and-paste and page loading, most users will experience an average of delay during their active editing.

Fig. 12: An undesired third party (or the provider) views the Google Docs document as a guest without a correct private key. Since the document has been configured to allow public viewing, Google Docs permits access. However, in the absence of the extension and a valid password, document text is indecipherable to the third party.

VI Conclusions

We have presented a client-side encryption system for real-time collaborative editing web apps. The system consists of an encryption interface as well as a novel variant of the polyalphabetic substitution cipher, designed to seamlessly encrypt and decrypt data without interfering with app functionality or the users’ experience within the web app. In this way, the users’ data privacy is preserved both during transmission and at rest with the provider.

We have implemented a prototype of our system for the Google Docs collaborative word processing web application within the framework of the Google Chrome web browser. We believe that our choice of prototyping framework serves as a reasonable template for other major browsers and web apps, and that our design can be straightforwardly extended to any real-time collaborative editor which uses the standard XHR interface for client-server communication, including such well-used products as the Google productivity suite (Docs, Slides, Sheets), Conceptboard, MeetingWords, Collabedit, Codepen [4, 1, 5, 7].

VI-A Extensions

There are a number of potential extensions to this work. One likely straightforward task would be to extend our proof-of-concept beyond the Chrome browser. We have been able to load our extension within Firefox with some minor changes to the manifest and content script, but it could be useful to generalize the prototype to other browsers, like Safari, Microsoft Edge, and Opera.

Secondly, our framework relies on the implementer to perform a light-weight evaluation of a particular cloud application’s internal protocol to identify app data sources and sinks and implement filters for relevant event messages that are to be encrypted. The lack of standardized formatting of event messages across similar web apps necessitates this manual analysis, although it should be possible to automate most of this process.

Our architecture works best for applications that represent a shared resource’s state as a series of event messages that follow a consistent protocol. Because rich media, such as images and charts, usually require more complex representations, we do not currently protect such data. In other words, we currently only protect collaborative data leaving the client-side app via the standard XHR interface, although it is technically possible for a client-side application to transfer unencrypted data to external hosts using other standard browser interfaces. We also choose not to obfuscate the document formatting (e.g. font type, highlight, etc.) in our initial prototype for the sake of efficiency. A more complete prototype would seek to protect all potential data transfer mechanisms in an efficient manner.

With respect to our encryption, a motivated provider could identify and blacklist features of our approach, such as the utilization of a broad range of the Unicode spectrum. One mitigation to such blacklisting could include limiting encryption character ranges (with a corresponding reduction in privacy preservation). In addition, our character-level encryption, of necessity, provides weaker protection against some commonly addressed attacks, such as chosen and known plaintext and malleability attacks; we have provided approaches to strengthen our cipher against these attacks, but our application model is not conducive to the level of protection afforded by some modern cryptographic ciphers.

With respect to the encryption interface, the first step is to port the Chrome extension to a Firefox Add-on. We found that the Chrome extension could be loaded directly into Firefox with some minor changes to the manifest and content script, even though Firefox is a different browser. Moreover, while Google Docs uses the same methods in Firefox to send and receive updates and to load saved content, some of the transmitted data has a different structure from that of the web app running in Chrome. Fully implementing the same functional system on Firefox would therefore require further analysis of this data structure. Besides duplicating the system for other browsers, more collaborative web apps, or even social media such as Facebook and Twitter, remain to be explored, as this system has the potential to be applied to various web apps for the purpose of data security and users’ privacy.


The authors would like to thank John Moore for work on an earlier prototype that paved the way for this approach. The work of Ari Trachtenberg was supported, in part, by the National Science Foundation under Grant No. CCF-1563753. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding parties.


  • [1] Conceptboard - visual project collaboration made easy.
  • [2] Content Scripts.
  • [3] ECMAScript Language Specification.
  • [4] Google Docs.
  • [5] MeetingWords: Collaborative text editing.
  • [6] Mozilla Developer Network: Content scripts.
  • [7] Online text editor - collabedit.
  • [8] Tor: Anonymity online.
  • [9] XMLHttpRequest living standard.
  • [10] T. M. Cover and J. A. Thomas. Elements of information theory. John Wiley & Sons, 2012.
  • [11] H. Feistel. Cryptography and computer privacy. Scientific American, 228:15–23, 1973.
  • [12] C. Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford University, 2009.
  • [13] S. Halevi and V. Shoup. HElib: An implementation of homomorphic encryption, 2014.
  • [14] M. Hall and D. E. Knuth. Combinatorial analysis and computers. American Mathematical Monthly, pages 21–28, 1965.
  • [15] B. Lau, S. Chung, C. Song, Y. Jang, W. Lee, and A. Boldyreva. Mimesis Aegis: A Mimicry Privacy Shield–A System’s Approach to Data Privacy on Public Cloud. In Proceedings of the 23rd USENIX conference on Security Symposium, pages 33–48. USENIX Association, 2014.
  • [16] D. H. Lehmer. Teaching combinatorial tricks to a computer. In Proc. Sympos. Appl. Math. Combinatorial Analysis, volume 10, pages 179–193, 1960.
  • [17] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics Doklady, volume 10, pages 707–710, 1966.
  • [18] B. Ramsdell. S/MIME version 3 message specification. 1999.
  • [19] S. Ruoti, K. Seamons, and D. Zappala. Layering Security at Global Control Points to Secure Unmodified Software. In 2017 IEEE Secure Development Conference, pages 42–49. IEEE, 2017.
  • [20] B. Schneier. Applied cryptography: protocols, algorithms, and source code in C. John Wiley & Sons, 2007.
  • [21] R. Sedgewick. Permutation generation methods. ACM Computing Surveys (CSUR), 9(2):137–164, 1977.
  • [22] C. E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28(4):656–715, 1949.
  • [23] S. Sheng, L. Broderick, C. A. Koranda, and J. J. Hyland. Why Johnny still can’t encrypt: evaluating the usability of email encryption software. In Symposium On Usable Privacy and Security, 2006.
  • [24] A. Sinkov and T. Feil. Elementary cryptanalysis, volume 22. MAA, 2009.
  • [25] D. Stehlé and R. Steinfeld. Faster fully homomorphic encryption. In Advances in Cryptology-ASIACRYPT 2010, pages 377–394. Springer, 2010.
  • [26] M. Twain. The Adventures of Tom Sawyer.
  • [27] M. Van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan. Fully homomorphic encryption over the integers. In Advances in cryptology–EUROCRYPT 2010, pages 24–43. Springer, 2010.
  • [28] A. Whitten and J. D. Tygar. Why Johnny can’t encrypt: A usability evaluation of PGP 5.0. In USENIX Security, volume 1999, 1999.
  • [29] P. R. Zimmermann. The official PGP user’s guide, volume 265. MIT Press, Cambridge, 1995.