SNARKs to the rescue: proof-of-contact in zero knowledge

05/26/2020 ∙ by Zachary Ratliff, et al. ∙ 0

This paper describes techniques to help with COVID-19 automated contact tracing and with restoration efforts. We describe a decentralized protocol for "proof-of-contact" (PoC) in zero knowledge, where a person can publish a short cryptographic proof attesting that they have been infected and that they have come in contact with a set of people, without revealing any information about any of the people involved. More importantly, we describe how to compose these proofs to support broader functionality such as proofs of nth-order (transitive) exposure, which can further speed up automated contact tracing. The cryptographic proofs can be publicly verified. Whether proving contact or health, the burden (and control) rests with the person generating the proof and not with third parties or healthcare providers, rendering the system more decentralized and accordingly more scalable.

1 Introduction

Contact tracing, identifying and notifying individuals who have been in close contact with an infected individual, is widely recognized as an essential tool in protecting against the spread of the novel COVID-19 virus. Automated approaches to contact tracing can help significantly scale the effort relative to manual approaches alone which tend to be slower, and are more labor intensive and costly. Implementations of automated contact tracing systems must however address the privacy concerns of individuals in order to enjoy widespread adoption, something that early straightforward attempts failed to do [singapore2020, cho2020contact].

There is a large body of recent proposals for automated Bluetooth-based contact tracing systems with differing privacy guarantees. We refer the reader to [Vaudenay:2020] for a recent survey. The systems fall into two general categories based on their information flows: decentralized and centralized. With decentralized approaches, a user's mobile application generates ephemeral randomized tokens that are regularly broadcast to (and received by) nearby mobile phones. Phones save the tokens they broadcast and the ones they receive for a defined period of time. Once a user tests positive, he can opt to report all the tokens his application generated. Reporting is done with the help of the medical provider or some third party. Other individuals who saw the user's token, and accordingly were in close physical proximity to the user, learn they are at risk and can seek testing and/or quarantine. Centralized approaches use a central server to generate the ephemeral tokens that users share with each other; reporting involves the central server making the connections and alerting users [Vaudenay:2020].

The majority of these existing proposals for automated contact tracing are slow to react and do not adequately address exposure risk. Specifically, they only alert the first level of individuals who have come in close contact with the infected individual after the latter tests positive. Such first-order contact tracing may not be enough to control the spread of the virus in a timely manner given that there is a period of time in which individuals can be asymptomatic but infectious, and this period is generally longer than the virus incubation period. Consider for example the following scenario. Alice is asymptomatic but infectious at time $t_0$ and comes in contact with Bob. Bob gets infected and comes in contact with Charlie at time $t_1 \ge t_0 + \Delta$, where $\Delta$ is the virus incubation period. Alice starts showing symptoms and tests positive at time $t_2 \ge t_1$, at which point Bob gets notified. Bob may not show symptoms, may wait to get tested, or may not even get tested. Even if Bob gets tested at time $t_3 \ge t_2$, there is a period of time from $t_1$ to $t_3$ (could be several days) during which Charlie is not even aware of the exposure risk, and is going about his business as usual.

We propose a new cryptographic protocol for privacy-preserving contact tracing using zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs). Our protocol builds on existing Bluetooth-based decentralized approaches and allows clients to provide a cryptographic proof of proximity after a positive diagnosis. These proofs are succinct, consisting of only a few hundred bytes, and take just a few milliseconds to verify. The zero-knowledge property ensures that a person verifying these proofs only learns the statement “I was close to someone who tested positive for the virus” or “I was close to someone who was close to someone who tested positive for the virus” and so on, but nothing else, such as who the person is or where the interaction occurred. Our approach is fully decentralized, requiring little assistance from the healthcare provider, and may be extended to support broader functionality (e.g., proof of health/immunity). We start with a simple proof-of-contact protocol, and extend it to support transitive exposure by composing proofs of contact using proof-carrying data (PCD) [chiesa2012proof]. Additionally, we demonstrate a simple SNARK-based construction of proof-of-health/immunity, useful for a society in the restoration phases of a pandemic.

In a nutshell, our baseline contact tracing zk-SNARK attests to the statements:

  1. person A was in close proximity to B at some time $t$

  2. person A tested positive for the virus at time $t'$

  3. $t$ is within 14 days of $t'$

in zero knowledge, i.e., without the verifier learning anything about A or B. A uses the SNARK to produce a proof, and publishes the proof. Anyone, including B, can publicly verify the proof, and seek testing if the proof checks. This is then extended to create proofs of transitive exposure.

In summary, our protocol offers the following benefits:

  • Efficiency. We use an efficient preprocessing zk-SNARK construction and perform the signature verification outside the SNARK to reduce prover cost [naveh2016photoproof].

  • No trusted third parties or databases required. The public registry need not be trusted. We only require that the zk-SNARK for the desired functionality is set up correctly.

  • Strong end-to-end privacy guarantees. Proximity tokens are not shared, neither with third parties nor with healthcare providers.

  • Correctness. A valid proof guarantees the authenticity of the user’s test results and the validity of the statement.

  • Adoption/Practicality. A medical organization only needs to sign records using an existentially unforgeable and publicly verifiable signature scheme. This is a simple task for the medical organization and it deters malicious (non-infected) users from seeking signatures.

  • Decentralization. The burden is on the infected person to actually prove and publish. This allows better scaling (instead of requiring providers or third parties to centrally manage patient proximity data) and potentially better privacy since the user (the stakeholder) has full control over their private data and can share at will.

  • Exposure risk using proof composition. Transitive exposure risk is available in a timely manner through proof composition. This allows users who may have had secondhand contact with an individual who tested positive for a virus to learn about this exposure in zero knowledge.

2 Background

2.1 Proximity Tokens

Contact tracing requires monitoring and recording physical interactions between clients. For example, if Alice walks into a cafe where Bob is eating, a method for detecting and measuring their proximity is needed. There have been several works proposing various means of proximity sensing between mobile phones, including using Bluetooth [liu2013face], WiFi [sapiezynski2017inferring], and audio [thiel2012sound] signals.

Regardless of the underlying technology, we assume a mobile phone frequently broadcasts proximity tokens that are received by nearby phones. For example, within each epoch, the phone frequently broadcasts its unique token, and receives tokens from nearby phones. This simplified model has been adopted by the majority of decentralized privacy-preserving contact tracing protocols [Vaudenay:2020].

2.2 Preprocessing zk-SNARK

We review the definitions of arithmetic circuits, preprocessing zero knowledge succinct non-interactive arguments of knowledge (pp-zk-SNARKs) and we refer the reader to [ben2017scalable] for details.

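First, we introduce arithmetic circuit satisfiability in a field $\mathbb{F}$. An $\mathbb{F}$-arithmetic circuit $C: \mathbb{F}^n \times \mathbb{F}^h \to \mathbb{F}^l$ is defined by the relation $\mathcal{R}_C = \{(x, w) : C(x, w) = 0^l\}$. Here $w$ is called the witness (auxiliary input), $x$ is the public input, and the output is $0^l$. The language of the circuit is defined by $\mathcal{L}_C = \{x : \exists w,\ C(x, w) = 0^l\}$. Here $x \in \mathbb{F}^n$ (i.e., $x$ is represented as $n$ field elements), $w \in \mathbb{F}^h$, and the output is in $\mathbb{F}^l$.

A hashing circuit, for example, takes the (private) input/witness $w$ and its hash $x$, and asserts that $H(w) = x$.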
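As a rough illustration of the relation a hashing circuit enforces, the sketch below checks the same predicate in ordinary Python. It assumes SHA-256 as a stand-in hash and uses the hypothetical name `hash_relation`; a real circuit would express the hash over field elements, and nothing here is zero knowledge.

```python
import hashlib


def hash_relation(x: bytes, w: bytes) -> bool:
    """Return True when the public input x equals SHA-256 of the witness w.

    SHA-256 is an assumption made for this sketch; it stands in for
    whatever hash the circuit actually encodes.
    """
    return hashlib.sha256(w).digest() == x


# The prover knows the witness; the verifier only ever sees the public input.
witness = b"private preimage"
public_input = hashlib.sha256(witness).digest()

assert hash_relation(public_input, witness)
assert not hash_relation(public_input, b"some other witness")
```

A zk-SNARK for this relation would let the prover convince a verifier that such a witness exists without handing it over, which is what the plain check above cannot do.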

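A preprocessing zk-SNARK (pp-zk-SNARK) for $\mathbb{F}$-arithmetic circuit satisfiability comprises three algorithms $(G, P, V)$, corresponding to the Generator, the Prover, and the Verifier.

$G(\lambda, C) \to (pk, vk)$. Given a security parameter $\lambda$ and the $\mathbb{F}$-arithmetic circuit $C$, sample a keypair comprising a public proving key $pk$ and a public verification key $vk$.

$P(pk, x, w) \to \pi$. Given the public proving key $pk$ and any $(x, w) \in \mathcal{R}_C$, generate a succinct proof $\pi$ attesting that $x \in \mathcal{L}_C$.

$V(vk, x, \pi) \to \{0, 1\}$ checks that $\pi$ is a valid proof for $x \in \mathcal{L}_C$.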

2.3 Proof-carrying data

Proof-carrying data (PCD) captures the security guarantees necessary for recursively composing zk-SNARKs. More specifically, given a compliance predicate $\Pi$, a PCD system checks that a computation involving a set of incoming messages $\vec{z}_{in}$, private local data $z_{loc}$, and outgoing message $z_{out}$ is $\Pi$-compliant.

Formally, a proof-carrying data system consists of three polynomial-time algorithms $(\mathbb{G}, \mathbb{P}, \mathbb{V})$ corresponding to the Generator, Prover, and Verifier.

$\mathbb{G}(\lambda, \Pi) \to (pk, vk)$. Given a security parameter $\lambda$ and the compliance predicate $\Pi$ expressed as an $\mathbb{F}$-arithmetic circuit, sample a keypair comprising a public proving key $pk$ and a public verification key $vk$.

$\mathbb{P}(pk, \vec{z}_{in}, \vec{\pi}_{in}, z_{loc}, z_{out}) \to \pi_{out}$. Given the public proving key $pk$, a set of input messages $\vec{z}_{in}$ along with compliance proofs $\vec{\pi}_{in}$, local input $z_{loc}$, and output $z_{out}$, generate a succinct proof $\pi_{out}$ attesting that $z_{out}$ is $\Pi$-compliant.

$\mathbb{V}(vk, z_{out}, \pi_{out}) \to \{0, 1\}$ checks that $z_{out}$ is $\Pi$-compliant.

3 Proof-of-Contact Protocol

Consider an existentially unforgeable signature scheme $\mathcal{S}$ (e.g., ECDSA) with private signing key $sk_\sigma$ and public verification key $vk_\sigma$. Let $H_1, H_2, H_3$ be three collision-resistant hash functions. Let $(G, P, V)$ be a pp-zk-SNARK. The baseline protocol builds on [canetti2020anonymous] and works as follows:

  • Trusted setup phase: a trusted entity sets up the system and runs the generator algorithm $(pk, vk) \leftarrow G(\lambda, C)$; we describe the circuit $C$ in more detail shortly. During this phase, each healthcare provider obtains a certificate for its signing key, signed by a trusted certification authority.

  • Each user generates a private random string $S$.

  • User A generates a random token every time period $t$ (the epoch, e.g., 5-minute intervals) as $T_{A,t} = H_1(S_A, t)$, and frequently broadcasts the token. We omit the time subscript $t$ hereafter whenever it is clear.

  • Whenever user A receives a proximity token $T_B$ from user B at time $t$, she computes $h_{AB} = H_2(T_A, T_B)$ and stores it for 14 days. User B computes the same output. Here we sort the tokens (e.g., lexicographically) before passing them to the hash function.

  • User A tests positive for the virus at time $t'$, and obtains a “COVID.positive” test result from a medical provider. User A computes $h_A = H_3(S_A)$ and requests a signature $\sigma$ on $(h_A, t')$ from the healthcare provider, produced with the provider’s private signing key $sk_\sigma$. Note that user A does not have to reveal her secret $S_A$ to the provider. User A may provide $h_A$ only, and a cryptographic proof that $h_A = H_3(S_A)$ for some valid private witness $S_A$.

  • User A then generates a short cryptographic proof $\pi$ using $pk$ attesting to these facts

    1. $t' - t \le 14$ days

  • User A publishes the tuple $(h_{AB}, h_A, t', \sigma, \pi)$ to some public registry. If the public registry already contains a tuple with the value $h_{AB}$, then the user does not upload these values (in order to prevent linkability). Several techniques may be used here for network unlinkability (e.g., the user app can use mixing or onion routing solutions, or the provider can publish the material on behalf of the user).

  • User B checks the public registry periodically to find a matching $h_{AB}$ and can quickly verify the proof $\pi$ using $vk$. If the proof checks, user B verifies the signature $\sigma$ given $(h_A, t')$ and the public verification key $vk_\sigma$ of the healthcare provider.

  • User B seeks testing, and can show the proof-of-contact to her healthcare provider to expedite the process if needed.
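The token and hash bookkeeping in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 with a domain tag stands in for the three hash functions, the helper names (`token`, `edge_hash`, `commitment`) and the exact inputs to each hash are hypothetical, and the zk-SNARK proof and provider signature are omitted entirely.

```python
import hashlib
import secrets


def H(tag: bytes, *parts: bytes) -> bytes:
    """Domain-separated SHA-256 over length-prefixed parts (assumed stand-in)."""
    h = hashlib.sha256(tag)
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def token(secret: bytes, epoch: int) -> bytes:
    """Ephemeral token for one epoch, derived from the user's private string."""
    return H(b"H1", secret, epoch.to_bytes(8, "big"))


def edge_hash(token_a: bytes, token_b: bytes) -> bytes:
    """Hash of two tokens, sorted first so both parties get the same value."""
    first, second = sorted((token_a, token_b))
    return H(b"H2", first, second)


def commitment(secret: bytes) -> bytes:
    """Value a user could hand to the provider instead of the secret itself."""
    return H(b"H3", secret)


# Each user holds a private random string.
s_a, s_b = secrets.token_bytes(32), secrets.token_bytes(32)

# Both phones broadcast their token for the same epoch and hear each other.
epoch = 42
t_a, t_b = token(s_a, epoch), token(s_b, epoch)
h_a = edge_hash(t_a, t_b)  # stored by A after receiving t_b
h_b = edge_hash(t_b, t_a)  # stored by B after receiving t_a

# A tests positive and publishes the edge hash (proof and signature omitted).
registry = {h_a}

# B periodically intersects the registry with its own stored hashes.
matches = registry & {h_b}
assert matches == {h_a}
```

Sorting before hashing is what lets A and B arrive at the same registry key independently, and the registry itself only ever holds hashes, never raw tokens.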

3.1 Security Analysis

Linkability

Tokens are never shared with third parties or published. Only the hash of two tokens is published after a user tests positive. This means different tokens may not be linked as belonging to the same user. The same is true of linking different hashes. Recall that when reporting a positive test, user A publishes $h_{AB} = H_2(T_A, T_B)$ for all proximity edges. Only user B, or some dishonest user C who forms a clique with A and B at time $t$, may learn $h_{AB}$. Since user C is part of the clique, $h_{AB}$ does not leak additional information. User C cannot use $h_{AB}$ to create valid proofs on behalf of A or B without knowledge of their private strings $S_A$ and $S_B$.

Identification

After seeing a proof containing the hash $h_{AB}$ of their shared tokens, a curious user B who keeps track of all physical encounters can a posteriori identify the infected person in some form. This attack is common to the majority of the decentralized systems [Vaudenay:2020]. We observe that some form of this leakage is inherent to the protocol. For example, if user B has only encountered one person before getting alerted, user B will be able to identify the infected person no matter how privacy-preserving the alert/protocol is. This may be acceptable in some cases, for example, learning that the “tall person in the dairy aisle at the grocery store” tested positive. A more recent decentralized protocol that mitigates identification attacks has been proposed using “parroting” [kalai-aen:2020]. As discussed earlier, we think our general idea can be applied to this class of protocols as well.

3.2 Proof of Health or Immunity

The same technique may be used by user A to prove her health, i.e., that user A received a negative test from her healthcare provider within the past day or week. The same is true for proving immunity with the antibody test that is gaining traction as communities look towards restoration. Using zk-SNARKs to do this avoids any dissemination of the sensitive test results to third parties, keeping users in control and maintaining decentralization. It also provides assurances that allow the other party to trust the result.

3.3 Transitive exposure proofs

As discussed earlier, it can be beneficial to provide more granular $n$th-order exposure risk data to users to limit the spread of the virus. For example, a user may want to know whether they have had transitive exposure to a virus. Consider that Alice comes in contact with both Bob and Charlie independently of one another. Later, Bob tests positive for the virus, and Alice is alerted that she is at risk. Although Charlie did not directly come in contact with a carrier of the virus, he may find it useful to know that someone he came in contact with has. This transitive approach to contact tracing could enable more informative statistics for users such as a risk profile, i.e., a risk score based on how many degrees of exposure an individual has. Someone who is four transitive hops away from a virus carrier would be at lower risk than someone who is two hops away.
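To make the notion of "degrees of exposure" concrete, the sketch below computes hop counts with a breadth-first search over a plaintext contact graph. It is purely illustrative: the function name `exposure_degrees` is hypothetical, contact times are ignored, and no real deployment would hold such a graph in the clear, which is what the zero-knowledge proofs are meant to avoid.

```python
from collections import deque
from typing import Dict, Set


def exposure_degrees(contacts: Dict[str, Set[str]], positives: Set[str]) -> Dict[str, int]:
    """Return the number of hops from each reachable person to the nearest positive case."""
    degree = {person: 0 for person in positives}
    queue = deque(positives)
    while queue:
        person = queue.popleft()
        for other in contacts.get(person, ()):
            if other not in degree:  # first visit is the shortest path in BFS
                degree[other] = degree[person] + 1
                queue.append(other)
    return degree


# Alice met Bob and Charlie independently; Bob later tests positive.
contacts = {
    "Alice": {"Bob", "Charlie"},
    "Bob": {"Alice"},
    "Charlie": {"Alice"},
}
degrees = exposure_degrees(contacts, {"Bob"})
assert degrees == {"Bob": 0, "Alice": 1, "Charlie": 2}
```

A risk score could then be any decreasing function of the hop count; the choice of function is left open here.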

A strawman approach to extending the proof-of-contact protocol for transitive proofs works as follows:

  • As in the original protocol, a trusted entity sets up the system and runs the generator algorithm $(pk', vk') \leftarrow G(\lambda, C')$; here $C'$ is an additional circuit, with corresponding prover and verifier keys $pk'$ and $vk'$, for proving transitive exposure.

  • User B checks the public registry periodically to find a matching $h_{AB}$ (from some user A who tested positive) and can quickly verify the proof $\pi$ using $vk$. If the proof checks, user B verifies the signature $\sigma$ given the public verification key $vk_\sigma$ of the healthcare provider.

  • User B then generates a short cryptographic proof $\pi'$ using $pk'$ attesting to these facts

    1. $t_{BC} - t_{AB} \ge \Delta$ days

  • User B publishes the tuple $(h_{BC}, h_{AB}, \pi')$ to the public registry.

  • User C checks the public registry periodically to find a matching $h_{BC}$ and can quickly verify the proof $\pi'$ using $vk'$. If the proof checks, user C can recursively verify the next proof in the chain until eventually arriving at the original proof. Finally, user C verifies the original proof $\pi$ using $vk$.

Observe that in this case the SNARK includes the constraint $t_{BC} - t_{AB} \ge \Delta$, where $t_{AB}$ and $t_{BC}$ are the times at which Bob came in contact with Alice and with Charlie, respectively, and $\Delta$ corresponds to the incubation period of COVID-19 in days. The parameter $\Delta$ is configurable; in general, however, the time that Bob comes in contact with Charlie should come after the time Bob came in contact with Alice plus the incubation period. This reduces the number of false positives that arise when Bob alerts Charlie of 2nd-order exposure even though Bob could not possibly have become contagious from Alice yet.
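The timing constraint described above amounts to a one-line predicate. The sketch below states it in Python under assumptions: the function name is hypothetical, the comparison is taken as non-strict, and the 5-day incubation value is a placeholder chosen for the example rather than a figure from this paper.

```python
from datetime import datetime, timedelta


def second_order_alert_allowed(t_ab: datetime, t_bc: datetime, incubation: timedelta) -> bool:
    """True when B met C at least one incubation period after B met A."""
    return t_bc - t_ab >= incubation


incubation = timedelta(days=5)  # placeholder value, configurable
t_ab = datetime(2020, 5, 1, 12, 0)

# B met C six days after meeting A: B could plausibly have been contagious.
assert second_order_alert_allowed(t_ab, t_ab + timedelta(days=6), incubation)

# B met C two days after meeting A: alerting C would be a false positive.
assert not second_order_alert_allowed(t_ab, t_ab + timedelta(days=2), incubation)
```

Inside the SNARK the same comparison would be expressed as circuit constraints over the two timestamps, so that the verifier learns only that the inequality holds.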

3.4 Transitive exposure using proof-carrying data

The above protocol suffers from a linkability flaw with the uploaded pairs. An adversary observing the public registry can deduce that whoever uploaded the transitive pair $(h_{BC}, \pi')$ must have come in contact with the person who uploaded the original pair $(h_{AB}, \pi)$. In order to circumvent this drawback, we modify the protocol to use proof-carrying data (PCD). Using PCD, previous proofs in the chain are verified and a proof that this verification was performed correctly is provided. The PCD system hides the details of intermediate proofs, while allowing a user to verify that the entire chain is valid. Instead of uploading the pairs $(h, \pi)$, transitive proofs consist only of single values $z$ which are indistinguishable from random.

For proof-of-contact, we represent the compliance predicate $\Pi$ as the hospital signature verification algorithm, coupled with the steps necessary to prove that the randomness of the token $T_A$ is consistent with the randomness of some $h_A = H_3(S_A)$. More formally, a user who tested positive can perform the $\Pi$-compliant computation that takes as input local data $z_{loc}$, and outputs $z_{out}$ satisfying the following constraints:

  1. $t' - t \le 14$ days

The user then uploads the value $z_{out}$ along with a cryptographic proof $\pi_{out}$ attesting that $z_{out}$ is $\Pi$-compliant.

For proving transitive exposure, a user B who sees the value $z_{out}$ along with the PCD proof $\pi_{out}$ attesting to first-hand exposure performs the $\Pi$-compliant computation that takes as input $(z_{out}, \pi_{out})$ and local data $z'_{loc}$, and outputs $z'_{out}$ satisfying the following constraints, where $t_{AB}$ and $t_{BC}$ are the times of B's contacts with A and with C and $\Delta$ is the incubation period:

  1. $t_{BC} - t_{AB} \ge \Delta$ days

Additionally, user B runs a verifier circuit over $(z_{out}, \pi_{out})$ and provides a cryptographic proof $\pi'_{out}$ that the verification accepts and that $z'_{out}$ is $\Pi$-compliant. Figure 1 illustrates the complete flow from proof-of-contact to proof of transitive exposure.

Choice of digital signature scheme

Encoding the digital signature verification scheme inside the compliance predicate is expensive with respect to circuit size. For this reason, we choose the RSA digital signature scheme, which can be represented efficiently over $\mathbb{F}$ by choosing a small public exponent $e$ and performing modular multiplication via radix arithmetic as suggested in [naveh2016photoproof].
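The appeal of a small public exponent is that verification needs only a handful of modular multiplications. The toy sketch below shows this for textbook RSA, assuming $e = 3$ purely for illustration (the exponent used in the construction is not stated here). The key is tiny and unpadded, so it is insecure by design, and the radix decomposition needed to express the multiplications over $\mathbb{F}$ is not shown.

```python
import hashlib

# Toy parameters: two small Fermat primes, chosen so that gcd(3, phi) = 1.
P_PRIME, Q_PRIME = 257, 65537
N = P_PRIME * Q_PRIME
E = 3  # assumed small public exponent, for illustration only
D = pow(E, -1, (P_PRIME - 1) * (Q_PRIME - 1))  # private exponent (Python 3.8+)


def digest_mod_n(message: bytes) -> int:
    """Reduce a SHA-256 digest modulo N (toy encoding, no padding)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Textbook RSA signature: digest raised to the private exponent."""
    return pow(digest_mod_n(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Check signature**3 == digest (mod N) with two modular multiplications."""
    squared = (signature * signature) % N
    cubed = (squared * signature) % N
    return cubed == digest_mod_n(message)


sig = sign(b"COVID.positive")
assert verify(b"COVID.positive", sig)
assert not verify(b"COVID.positive", (sig + 1) % N)
```

Each of the two multiplications in `verify` maps to a fixed number of circuit constraints, which is the cost that a small exponent keeps low.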

3.5 Proofs of surface transmission via PCD

In some cases, contact tracing by measuring proximity between users may not be sufficient for effectively curbing the spread of a virus. A virus that lives for extended periods on surfaces could transmit from one user to another even though they have never been in close contact. For example, if a contagious user Alice sits on a park bench, Bob, who visits the park the next day, may become infected from sitting on the same bench. If Alice tests positive, it would be ideal that users who are at risk from the surface spread of the virus are alerted.

One approach is to place Bluetooth devices around public spaces, and have them participate in the contact tracing protocol. The devices could exchange tokens with users and verify proofs in the usual way. After discovering a matching token, and verifying the corresponding proof, the device uploads a transitive proof of exposure, which alerts users of the surface transmission risk.

Suppose rather than using PCD, the Bluetooth device on the park bench simply uploads its secondary tokens, i.e., the tokens exchanged with users within 14 days of Alice’s park visit. Although Bob is alerted of his surface contact, he must trust that the park bench Bluetooth device is acting honestly since he has no way of verifying that an infected user actually came in contact with the park bench. By using PCD, Bob maintains all the security and privacy guarantees that hold from the original contact tracing protocol.

Figure 1: Overview of proof-carrying data for transitive exposure

4 Performance Evaluation

We implemented a simplified proof-of-concept zero-knowledge proof-of-contact SNARK using the libsnark library [libsnark]. The library uses the NP-complete language R1CS to express the arithmetic circuits that represent the SNARK. There are existing R1CS gadgets for performing useful functionality, such as comparisons and collision-resistant hashing. It includes an implementation of the subset-sum collision-resistant hashing gadget, which we use as an efficient one-way hash.
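For readers unfamiliar with R1CS, the sketch below checks a rank-1 constraint system in plain Python. It is a generic textbook-style example (proving knowledge of $x$ with $x^3 + x + 5 = 35$), not the proof-of-contact circuit and not the libsnark API; the field modulus and all names are assumptions made for the illustration.

```python
P = 2**31 - 1  # a small prime modulus, assumed for this example


def dot(row, z):
    """Inner product of a constraint row with the assignment, modulo P."""
    return sum(r * v for r, v in zip(row, z)) % P


def r1cs_satisfied(A, B, C, z):
    """Each constraint i requires (A[i] . z) * (B[i] . z) == (C[i] . z) mod P."""
    return all(dot(a, z) * dot(b, z) % P == dot(c, z) for a, b, c in zip(A, B, C))


# Assignment layout: z = [1, x, out, sym1, y, sym2]
#   sym1 = x * x,  y = sym1 * x,  sym2 = y + x,  out = sym2 + 5
A = [[0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0], [5, 0, 0, 0, 0, 1]]
B = [[0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0]]
C = [[0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0, 0]]

z_good = [1, 3, 35, 9, 27, 30]  # x = 3 satisfies x^3 + x + 5 = 35
z_bad = [1, 3, 36, 9, 27, 30]   # wrong output value

assert r1cs_satisfied(A, B, C, z_good)
assert not r1cs_satisfied(A, B, C, z_bad)
```

Gadgets such as comparisons or hashing are, in effect, reusable bundles of rows like these together with the code that fills in the intermediate variables.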

We characterize the performance of our proof-of-contact SNARK in terms of the running time and key sizes for both the prover and verifier (Table 1). Since the generator phase is executed only once during setup, Table 1 omits it; for reference, the arithmetic circuit has 3060 gates and the generator runs in 166 ms. The circuit does not account for sorting the tokens.

                     Prover    Verifier
Running time (ms)    65        9
Key size (KB)        722       30

Table 1: Performance of the PoC pp-zk-SNARK implementation on a MacBook Pro with a 2.9 GHz Intel Core i9 and 32 GB RAM

5 Related Work

There are a few existing proposals for privacy-preserving contact tracing [canetti2020anonymous][dp3t2020][pepppt2020][applegoogle2020][berke2003assessing][raskar2020apps]. Although most of these works suggest similar techniques for estimating and exchanging proximity information between users, the underlying cryptographic protocols and their privacy guarantees differ.

[canetti2020anonymous][pepppt2020] use randomly generated pseudonyms that nearby users can exchange over Bluetooth. Individuals who test positive for a virus can upload their generated pseudonyms to a public registry, allowing other users to match the tokens they have collected with those in the registry. Both works suggest that healthcare workers should be the ones to upload users’ tokens to the public registry after giving a positive test diagnosis in order to prevent malicious polluting of the database. Similar to the protocol introduced in this work, mixing can be applied to prevent linkability via traffic analysis.

Apple and Google have released a protocol specification that closely resembles that of [canetti2020anonymous] and [pepppt2020]. Users generate a rolling pseudorandom identifier and some associated encrypted metadata, that nearby users exchange over Bluetooth. The pseudorandom identifiers are derived using the current time and temporary exposure keys, which get distributed after a positive diagnosis.

Finally, [berke2003assessing] proposes partitioning GPS and time data into discrete spatiotemporal points and obfuscating these points using a one-way hash function. Infected users upload their obfuscated location histories after redacting personally identifiable information such as the GPS coordinates that represent a home or work address. Using private set intersection (PSI), individuals can privately determine whether or not their location history overlaps with that of infected users. The privacy guarantees of such an approach differ significantly from those offered in [canetti2020anonymous][pepppt2020][applegoogle2020], and from those presented in this paper.

The approaches described above provide different flavors of privacy and decentralization. However, each solution places an increased burden on the healthcare providers relative to the zero-knowledge SNARK technique we have outlined. Our approach requires only that healthcare workers sign positive diagnoses rather than generate one-time codes or upload tokens to a public registry. Additionally, our approach is fully decentralized and supports broader functionality such as proofs of transitive exposure, not currently supported by the other proposed solutions.

References