1 Introduction
Contact tracing, identifying and notifying individuals who have been in close contact with an infected individual, is widely recognized as an essential tool for limiting the spread of the novel COVID-19 virus. Automated approaches to contact tracing can significantly scale the effort relative to manual approaches alone, which tend to be slower and more labor intensive. Implementations of automated contact tracing systems must, however, address the privacy concerns of individuals in order to enjoy widespread adoption, something that early straightforward attempts failed to do [singapore2020, cho2020contact].
There is a large body of recent proposals for automated Bluetooth-based contact tracing systems with differing privacy guarantees; we refer the reader to [Vaudenay:2020] for a recent survey. The systems fall into two general categories based on their information flows: decentralized vs. centralized. With decentralized approaches, a user's mobile device generates ephemeral randomized tokens that are regularly broadcast to (and received by) nearby mobile devices. Devices save the tokens they broadcast and the ones they receive for a defined period of time. Once an individual tests positive, he can opt to report all the tokens his application generated. Reporting is done with the help of the healthcare provider or some third party. Other individuals who saw the token, and accordingly were in close physical proximity to the infected individual, learn they are at risk and may seek testing and/or quarantine as a result. Centralized approaches, on the other hand, use a central server to generate the ephemeral tokens that users share with each other, and reporting involves the central server making the connections and alerting users [Vaudenay:2020].
The majority of these existing proposals for automated contact tracing are slow to react and do not adequately address exposure risk. Specifically, they only alert the first level of individuals who have come in close contact with the infected individual after the latter tests positive. Such first-order contact tracing may not be fast enough to control the spread of the virus in a timely manner, given that there is a period of time in which individuals can be asymptomatic but infectious, and this period is generally longer than the virus incubation period. Consider for example the following scenario. Alice is asymptomatic but infectious at time $t_0$ and comes in contact with Bob. Bob gets infected and comes in contact with Charlie at time $t_1 > t_0 + \delta$, where $\delta$ is the virus incubation period. Alice starts showing symptoms and tests positive at time $t_2$, at which point Bob gets notified. Bob may not show symptoms, may wait to get tested, or may not even get tested. Even if Bob gets tested at time $t_2$, there is a period of time (possibly several days) during which Charlie is not even aware of the exposure risk, and is going about his business as usual.
We propose a new protocol for privacy-preserving contact tracing that does not suffer from these limitations. Our protocol permits an individual A who tests positive to quickly furnish a cryptographic proof attesting to the following statements:

individual A was in close proximity to individual B at some time $t$

individual A tested positive for the virus at time $t'$

$t$ is within 14 days of $t'$
in zero knowledge, i.e., without leaking information about A or B. A produces and publishes the proof. Anyone, including B, can publicly verify the proof, and seek testing if the proof checks and they are involved. Using this first proof, individual(s) B who came in close proximity to A can then quickly publish a cryptographic proof attesting to the fact that B was in close proximity to the individual who tested positive (in this case A), and that B was in close proximity to other individual(s) C at some later time $t''$. This allows individual(s) C who came in close proximity to B to realize their exposure risk in a timely manner, and act accordingly.
Our protocol relies on zero-knowledge succinct non-interactive arguments of knowledge (zkSNARKs) as the cryptographic building block. The cryptographic proofs are succinct, consisting of only a few hundred bytes, and take just a few milliseconds to verify. The zero-knowledge property ensures that a person verifying these proofs only learns the statement “I was close to someone who tested positive for the virus” or “I was close to someone who was close to someone who tested positive for the virus,” and so on, but nothing else, such as who the person is or where the interaction occurred (with some caveats discussed later in the paper). Our approach is fully decentralized, requiring little assistance from the healthcare provider, and may be extended to support broader functionality (e.g., proof of health/immunity). We start by describing the simple proof-of-contact protocol, and extend it to support transitive exposure by composing proofs of contact using proof-carrying data (PCD) [chiesa2012proof]. Additionally, we demonstrate a simple SNARK-based construction of proof-of-health/immunity, useful for a society in the restoration phases of a pandemic.
In summary, our protocol offers the following benefits:

Efficiency. This is achieved by using an efficient preprocessing zkSNARK construction and by performing the signature verification outside the SNARK to reduce prover cost [naveh2016photoproof].

No trusted third parties or databases are required. The public registry need not be trusted. We only require that the zkSNARK for the desired functionality is correctly set up.

Strong end-to-end privacy guarantees. Proximity tokens are never shared, neither with third parties nor with healthcare providers.

Correctness. A valid proof guarantees the authenticity of the user’s test results and the validity of the statement.

Adoption/Practicality. A medical organization only needs to sign records using an existentially unforgeable and publicly verifiable signature scheme. This is a simple task for the medical organization, and it deters malicious (non-infected) users from seeking signatures.

Decentralization. The burden is on the infected person to actually prove and publish. This allows better scaling (instead of requiring providers or third parties to centrally manage patient proximity data) and potentially better privacy since the user (the stakeholder) has full control over their private data and can share at will.

Exposure risk using proof composition. Transitive exposure risk is available in a timely manner through proof composition. This allows users who may have had second-hand contact with an individual who tested positive for a virus to learn, in zero knowledge, about this exposure.
2 Background
2.1 Proximity Tokens
Contact tracing requires monitoring and recording physical interactions between clients. For example, if Alice walks into a cafe where Bob is eating, a method for detecting and measuring their proximity is needed. There have been several works proposing various means of proximity sensing between mobile phones, including using Bluetooth [liu2013face], WiFi [sapiezynski2017inferring], and audio [thiel2012sound] signals.
Regardless of the underlying technology, we assume a mobile phone frequently broadcasts proximity tokens that are received by nearby phones. For example, within each epoch, the phone frequently broadcasts its unique token, and receives tokens from nearby phones. This simplified model has been adopted by the majority of decentralized privacypreserving contact tracing protocols [Vaudenay:2020].
2.2 Preprocessing zkSNARK
We review the definitions of arithmetic circuits and preprocessing zero-knowledge succinct non-interactive arguments of knowledge (ppzkSNARKs), and we refer the reader to [ben2017scalable] for details.
First, we introduce arithmetic circuit satisfiability over a field $\mathbb{F}$. An arithmetic circuit $C$ is defined by the relation $\mathcal{R}_C = \{(x, w) : C(x, w) = 0^l\}$. Here $w$ is called the witness (auxiliary input), $x$ is the public input, and the output is $C(x, w)$. The language of the circuit is defined by $\mathcal{L}_C = \{x : \exists\, w,\ C(x, w) = 0^l\}$. Here $x \in \mathbb{F}^n$ (i.e., $x$ is represented as $n$ field elements), $w \in \mathbb{F}^h$, and the output is in $\mathbb{F}^l$.
A hashing circuit, for example, takes the (private) input/witness $w$ and its hash $x$, and asserts that $H(w) = x$.
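For concreteness, the satisfiability relation of such a hashing circuit can be modeled in plain Python. This is a minimal sketch only: SHA-256 stands in for the circuit's hash function, and no actual arithmetic circuit is built.

```python
import hashlib

def hash_circuit(x: bytes, w: bytes) -> bool:
    """Models the relation R_C for the hashing circuit:
    (x, w) is in R_C iff H(w) == x."""
    return hashlib.sha256(w).digest() == x

# The prover knows a witness w; the public input x is its hash.
w = b"secret witness"
x = hashlib.sha256(w).digest()

assert hash_circuit(x, w)             # a valid (x, w) pair satisfies the circuit
assert not hash_circuit(x, b"other")  # any other witness fails
```

A zkSNARK for this circuit would convince a verifier that the prover knows such a $w$ without revealing it.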
A preprocessing zkSNARK (ppzkSNARK) for arithmetic circuit satisfiability comprises three algorithms $(G, P, V)$, corresponding to the Generator, the Prover, and the Verifier.

Given a security parameter $\lambda$ and the arithmetic circuit $C$, sample a keypair $(pk, vk) \leftarrow G(1^\lambda, C)$ comprising a public proving key $pk$ and a public verification key $vk$.

Given the public proving key $pk$ and any $(x, w) \in \mathcal{R}_C$, generate a succinct proof $\pi \leftarrow P(pk, x, w)$ attesting that $x \in \mathcal{L}_C$.

$V(vk, x, \pi)$ checks that $\pi$ is a valid proof for $x \in \mathcal{L}_C$.
2.3 Proof-carrying data
Proof-carrying data (PCD) captures the security guarantees necessary for recursively composing zkSNARKs. More specifically, given a compliance predicate $\Pi$, a PCD system checks that a computation involving a set of incoming messages $\vec{z}_{\mathrm{in}}$, private local data $z_{\mathrm{loc}}$, and an outgoing message $z$, is $\Pi$-compliant.
Formally, a proof-carrying data system consists of three polynomial-time algorithms $(G, P, V)$ corresponding to the Generator, Prover, and Verifier.

Given a security parameter $\lambda$ and the compliance predicate $\Pi$ expressed as an arithmetic circuit, sample a keypair comprising a public proving key $pk_\Pi$ and a public verification key $vk_\Pi$.

Given the public proving key $pk_\Pi$, a set of input messages $\vec{z}_{\mathrm{in}}$ along with compliance proofs $\vec{\pi}_{\mathrm{in}}$, local input $z_{\mathrm{loc}}$, and output $z$, generate a succinct proof $\pi$ attesting that $z$ is $\Pi$-compliant.

$V(vk_\Pi, z, \pi)$ checks that $z$ is $\Pi$-compliant.
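For intuition, the guarantee a PCD verifier obtains can be modeled in a few lines of Python. This is an illustrative model only: a real PCD system establishes the property via a single succinct, recursively composed proof rather than by re-checking every step, and the toy predicate here is hypothetical.

```python
from typing import Callable, List, Tuple

# A compliance predicate takes incoming messages, private local data,
# and the outgoing message, and decides whether one step is compliant.
Predicate = Callable[[List[int], int, int], bool]

def compliant_chain(pred: Predicate,
                    steps: List[Tuple[List[int], int, int]]) -> bool:
    """Models what a PCD verifier is convinced of: every step
    (z_in, z_loc, z_out) of the computation satisfies the predicate."""
    return all(pred(z_in, z_loc, z_out) for (z_in, z_loc, z_out) in steps)

# Toy predicate: each outgoing message is the sum of inputs plus local data.
pred = lambda z_in, z_loc, z_out: z_out == sum(z_in) + z_loc

steps = [([], 1, 1), ([1], 2, 3), ([3], 4, 7)]
assert compliant_chain(pred, steps)
assert not compliant_chain(pred, [([], 1, 2)])
```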
3 Proof-of-Contact Protocol
Consider an existentially unforgeable signature scheme (e.g., ECDSA) with a private signing key $sk$ and a public verification key $pk$. Let $H_1, H_2, H_3$ be three collision-resistant hash functions. Let $(G, P, V)$ be a ppzkSNARK. The baseline protocol builds on [canetti2020anonymous] and works as follows:

Trusted setup phase: a trusted entity sets up the system and runs the generator algorithm $(pk_C, vk_C) \leftarrow G(1^\lambda, C)$; we describe the circuit $C$ in more detail shortly. During this phase, each healthcare provider obtains a certificate for its signing key, signed by a trusted certification authority.

Each user generates a private random string $s$.

User A generates a random token every time period $t$ (the epoch, e.g., 5-minute intervals) as $tok_{A,t} = H_1(s_A, t)$, and frequently broadcasts the token. We omit the time subscript hereafter whenever it is clear.

Whenever user A receives a proximity token $tok_B$ from user B at time $t$, she computes $h = H_2(tok_A, tok_B)$ and stores it for 14 days. User B computes the same output. Here we sort the tokens (e.g., lexicographically) before passing them to the hash function.

User A tests positive for the virus at time $t'$, and obtains a “COVID.positive” test result from a medical provider. User A computes $m = H_3(s_A, \text{“COVID.positive”})$ and requests a signature $\sigma$ on $m$ from the healthcare provider, where the provider signs under its private signing key. Note that user A does not have to reveal her secret $s_A$ to the provider. User A may provide $m$ only, and a cryptographic proof that $m = H_3(s_A, \text{“COVID.positive”})$ for some valid private witness $s_A$.

User A then generates a short cryptographic proof $\pi$ using $pk_C$ attesting to the following facts:

$tok_A = H_1(s_A, t)$

$h = H_2(tok_A, tok_B)$

$m = H_3(s_A, \text{“COVID.positive”})$

$t' - t \le 14$ days

User A publishes the tuple $(h, m, \sigma, \pi)$ to some public registry. If the public registry already contains a tuple with the value $h$, then the user does not upload these values (in order to prevent linkability). Several techniques may be used here for network unlinkability (e.g., the user app can either use mixing or onion routing solutions, or the provider can publish the material on behalf of the user).

User B checks the public registry periodically to find a matching $h$ and can quickly verify the proof $\pi$ using $vk_C$. If the proof checks, user B verifies the signature $\sigma$ given $m$ and the public verification key of the healthcare provider.

User B seeks testing, and can show the proof-of-contact to her healthcare provider to expedite the process if needed.
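The token-generation and contact-hash steps above can be sketched in Python. This is a minimal client-side model in which SHA-256 with domain-separation tags stands in for the protocol's hash functions (here called $H_1$ and $H_2$); proof generation and the registry are omitted.

```python
import hashlib

EPOCH_SECONDS = 5 * 60  # 5-minute epochs, as in the protocol

def H(tag: bytes, *parts: bytes) -> bytes:
    # Domain-separated SHA-256 standing in for the hash functions H1, H2.
    h = hashlib.sha256(tag)
    for p in parts:
        h.update(p)
    return h.digest()

def token(s: bytes, t: int) -> bytes:
    """tok_t = H1(s, t): the ephemeral token broadcast during epoch t."""
    return H(b"H1", s, t.to_bytes(8, "big"))

def contact_hash(tok_a: bytes, tok_b: bytes) -> bytes:
    """h = H2(tok_A, tok_B), sorting the tokens so both parties agree."""
    lo, hi = sorted((tok_a, tok_b))
    return H(b"H2", lo, hi)

# Alice and Bob each derive their epoch token from a private random string.
t = 1_600_000_000 // EPOCH_SECONDS
tok_alice = token(b"alice-secret", t)
tok_bob = token(b"bob-secret", t)

# After exchanging tokens over Bluetooth, both compute the same contact hash.
assert contact_hash(tok_alice, tok_bob) == contact_hash(tok_bob, tok_alice)
```

Sorting before hashing is what makes the contact hash symmetric, so A and B store identical values without further interaction.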
3.1 Security Analysis
 Linkability

Tokens are never shared or published. Only the hash of two tokens is published after a user tests positive. This means different tokens may not be linked as belonging to the same user. The same is true of linking different hashes. Recall that when reporting a positive test, user A publishes $h = H_2(tok_A, tok_B)$ for all her proximity edges. Only user B, or some dishonest user C who forms a clique with A and B at time $t$, may recognize $h$. Since user C is part of the clique, $h$ does not leak additional information. User C cannot use $h$ to create valid proofs on behalf of A or B without knowledge of their private strings $s_A$ and $s_B$.
 Identification

After seeing a proof containing $h$, a curious user B who keeps track of all physical encounters can a posteriori identify the infected person in some form. This attack is common to the majority of decentralized systems [Vaudenay:2020]. We observe that some form of this leakage is inherent to the protocol. For example, if user B has encountered only one person before getting alerted, user B will be able to identify the infected person no matter how privacy-preserving the alert/protocol is. This may be acceptable in some cases, for example, learning that the “tall person in the dairy aisle at the grocery store” tested positive.
4 Transitive exposure proofs
As discussed earlier, it can be beneficial to provide more granular $n$th-order exposure risk data to users to limit the spread of the virus. For example, a user may want to know whether they have had transitive exposure to a virus. Consider that Alice comes in contact with both Bob and Charlie independently of one another. Later, Bob tests positive for the virus, and Alice is alerted that she is at risk. Although Charlie did not directly come in contact with a carrier of the virus, he may find it useful to know that someone he came in contact with has. This transitive approach to contact tracing could enable more informative statistics for users, such as a risk profile, i.e., a risk score based on how many degrees of exposure an individual has. Someone who is four transitive hops away from a virus carrier would be at lower risk than someone who is two hops away.
A strawman approach to extending the proofofcontact protocol for transitive proofs works as follows:

As in the original protocol, a trusted entity sets up the system and runs the generator algorithm $G$; here $C'$ is an additional circuit, with corresponding prover and verifier keys $(pk_{C'}, vk_{C'})$, for proving transitive exposure.

User B checks the public registry periodically to find a matching $h_{AB}$ (from some user A who tested positive) and can quickly verify the proof $\pi$ using $vk_C$. If the proof checks, user B verifies the signature $\sigma$ given $m$ and the public verification key of the healthcare provider.

User B then generates a short cryptographic proof $\pi'$ using $pk_{C'}$ attesting to the following facts:

$tok_B = H_1(s_B, t)$ and $h_{AB} = H_2(tok_A, tok_B)$

$h_{BC} = H_2(tok_B', tok_C)$, where $tok_B' = H_1(s_B, t'')$

$t'' - t \ge \delta$ days

User B publishes the tuple $(h_{AB}, h_{BC}, \pi')$ to the public registry.

User C checks the public registry periodically to find a matching $h_{BC}$ and can quickly verify the proof $\pi'$ using $vk_{C'}$. If the proof checks, user C can recursively verify the next proof in the chain until eventually arriving at the original proof. Finally, user C verifies the original proof using $vk_C$.
Observe that in this case the SNARK includes the constraint $t'' - t \ge \delta$, where $\delta$ corresponds to the incubation period of COVID-19. The parameter $\delta$ is configurable; however, in general the time that Bob comes in contact with Charlie should come after the time Bob came in contact with Alice plus the incubation period. This reduces the number of false positives that arise when Bob alerts Charlie of 2nd-order exposure even though Bob could not yet have possibly become contagious from Alice.
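User C's recursive walk over the chain, including the incubation-period constraint, can be modeled as follows. This is an illustrative sketch: the registry layout, the `verify_stub` placeholder, and the epoch-based incubation parameter are hypothetical simplifications, and a real implementation would invoke the zkSNARK verifier at each hop.

```python
INCUBATION_EPOCHS = 5  # the configurable incubation parameter (illustrative)

def verify_stub(proof) -> bool:
    # Placeholder for the zkSNARK verifier V(vk, x, pi).
    return proof == "valid"

def verify_chain(registry: dict, h: bytes) -> bool:
    """Walks the chain of transitive tuples back to the original
    proof-of-contact, mirroring user C's recursive verification.
    Each entry maps h -> (h_prev, t, proof); the base entry has
    h_prev = None."""
    t_next = None
    while True:
        h_prev, t, proof = registry[h]
        if not verify_stub(proof):
            return False
        # Each hop must occur at least the incubation period after the
        # previous contact, reducing false positives.
        if t_next is not None and t_next < t + INCUBATION_EPOCHS:
            return False
        if h_prev is None:  # reached the original proof-of-contact
            return True
        h, t_next = h_prev, t

# A-B contact at epoch 3, B-C contact at epoch 10 >= 3 + 5: accepted.
registry = {b"h_bc": (b"h_ab", 10, "valid"), b"h_ab": (None, 3, "valid")}
assert verify_chain(registry, b"h_bc")

# B-C contact at epoch 6 < 3 + 5: rejected as a likely false positive.
bad_registry = {b"h_bc": (b"h_ab", 6, "valid"), b"h_ab": (None, 3, "valid")}
assert not verify_chain(bad_registry, b"h_bc")
```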
4.1 Transitive exposure using proof-carrying data
The above protocol suffers from a linkability flaw with the uploaded pairs $(h_{AB}, h_{BC})$. An adversary observing the public registry can deduce that whoever uploaded the tuple containing $h_{BC}$ must have come in contact with the person who uploaded the tuple containing $h_{AB}$. In order to circumvent this drawback, we modify the protocol to use proof-carrying data (PCD). Using PCD, previous proofs in the chain are verified, and a proof that this verification was performed correctly is provided. The PCD system hides the details of intermediate proofs, while allowing a user to verify that the entire chain is valid. Instead of uploading the pairs $(h_{AB}, h_{BC})$, transitive proofs consist only of single values $h$, which are indistinguishable from random.
For proof-of-contact, we represent the compliance predicate $\Pi$ as the hospital signature verification algorithm, coupled with the steps necessary to prove that the randomness of $h_{AB}$ is consistent with the randomness of $m$. More formally, a user who tested positive can perform the compliant computation that takes as input $(s_A, tok_B, t, t', \sigma)$ and outputs $h_{AB}$ satisfying the following constraints:

$tok_A = H_1(s_A, t)$ and $h_{AB} = H_2(tok_A, tok_B)$

$m = H_3(s_A, \text{“COVID.positive”})$ and $\sigma$ is a valid signature on $m$

$t' - t \le 14$ days

The user then uploads the value $h_{AB}$ along with a cryptographic proof $\pi_{AB}$ attesting that $h_{AB}$ is compliant.
For proving transitive exposure, a user B who sees the value $h_{AB}$ along with the PCD proof $\pi_{AB}$ attesting to firsthand exposure, performs the compliant computation that takes as input $(s_B, tok_A, tok_C, t, t'')$ and outputs $h_{BC}$ satisfying the following constraints:

$tok_B = H_1(s_B, t)$ and $h_{AB} = H_2(tok_A, tok_B)$

$h_{BC} = H_2(tok_B', tok_C)$, where $tok_B' = H_1(s_B, t'')$

$t'' - t \ge \delta$ days
Additionally, user B runs a verifier circuit over $\pi_{AB}$ and provides a cryptographic proof $\pi_{BC}$ that $\pi_{AB}$ verifies and that $h_{BC}$ is compliant. Figure 1 illustrates the complete flow from proof-of-contact to proof of transitive exposure.
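The two compliance predicates described above can be modeled concretely in Python. This is a sketch under simplifying assumptions: SHA-256 with domain-separation tags stands in for the hash functions, the signature check is a toy stub rather than a real RSA/ECDSA verification, and the incubation parameter is illustrative.

```python
import hashlib

def H(tag: bytes, *parts: bytes) -> bytes:
    h = hashlib.sha256(tag)
    for p in parts:
        h.update(p)
    return h.digest()

def epoch(t: int) -> bytes:
    return t.to_bytes(8, "big")

def sig_ok(m: bytes, sigma: bytes) -> bool:
    # Toy stand-in for the provider's signature verification.
    return sigma == H(b"SIG", m)

def base_predicate(s_a, tok_b, t, t_pos, sigma, h_ab):
    """Compliance for proof-of-contact: h_ab is derived from A's secret,
    and A holds a signed positive result within 14 days of the contact."""
    tok_a = H(b"H1", s_a, epoch(t))
    lo, hi = sorted((tok_a, tok_b))
    m = H(b"H3", s_a, b"COVID.positive")
    return h_ab == H(b"H2", lo, hi) and sig_ok(m, sigma) and 0 <= t_pos - t <= 14

def transitive_predicate(s_b, tok_a, tok_c, t, t2, h_ab, h_bc, delta=5):
    """Compliance for transitive exposure: B's secret links h_ab and h_bc,
    and the second contact respects the incubation period delta."""
    tok_b = H(b"H1", s_b, epoch(t))
    lo, hi = sorted((tok_a, tok_b))
    tok_b2 = H(b"H1", s_b, epoch(t2))
    lo2, hi2 = sorted((tok_b2, tok_c))
    return h_ab == H(b"H2", lo, hi) and h_bc == H(b"H2", lo2, hi2) and t2 - t >= delta

# Build consistent values and check both predicates.
tok_a = H(b"H1", b"sA", epoch(1))
tok_b = H(b"H1", b"sB", epoch(1))
h_ab = H(b"H2", *sorted((tok_a, tok_b)))
sigma = H(b"SIG", H(b"H3", b"sA", b"COVID.positive"))
assert base_predicate(b"sA", tok_b, 1, 3, sigma, h_ab)

tok_c = H(b"H1", b"sC", epoch(10))
tok_b2 = H(b"H1", b"sB", epoch(10))
h_bc = H(b"H2", *sorted((tok_b2, tok_c)))
assert transitive_predicate(b"sB", tok_a, tok_c, 1, 10, h_ab, h_bc)
```

In the real system these predicates would be compiled into the arithmetic circuit that the PCD prover runs over; here they only make the constraints explicit.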
 Choice of digital signature scheme

Encoding the digital signature verification algorithm inside the compliance predicate is expensive with respect to circuit size. For this reason, we choose the RSA digital signature scheme, which can be represented efficiently over $\mathbb{F}$ by choosing a small public exponent and performing modular multiplication via radix arithmetic, as suggested in [naveh2016photoproof].
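To illustrate why a small public exponent keeps verification cheap, consider textbook RSA verification, which checks $\sigma^e \equiv m \pmod{N}$; with $e = 3$ this costs only two modular multiplications, translating into a small number of circuit constraints. The sketch below uses toy parameters and no padding scheme, for illustration only (real keys are at least 2048 bits and real signatures use a padding scheme).

```python
# Toy textbook-RSA parameters (illustrative only).
p, q = 11, 17
n = p * q   # modulus N = 187
e = 3       # small public exponent keeps verification cheap
d = 107     # private exponent: e * d = 1 (mod (p-1)*(q-1))

def sign(m: int) -> int:
    return pow(m, d, n)

def verify(m: int, sig: int) -> bool:
    # With e = 3 this is just sig * sig * sig mod n:
    # two modular multiplications, i.e., a small arithmetic circuit.
    return pow(sig, e, n) == m

sig = sign(42)
assert verify(42, sig)
assert not verify(43, sig)
```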
4.2 Proofs of surface transmission via PCD
In some cases, contact tracing by measuring proximity between users may not be sufficient to effectively curb the spread of a virus. A virus that lives for extended periods on surfaces could transmit from one user to another even though they have never been in close contact. For example, if a contagious user Alice sits on a park bench, Bob, who visits the park the next day, may become infected from sitting on the same bench. If Alice tests positive, it would be ideal if users at risk from the surface spread of the virus were alerted.
One approach is to place Bluetooth devices around public spaces, and have them participate in the contact tracing protocol. The devices could exchange tokens with users and verify proofs in the usual way. After discovering a matching token, and verifying the corresponding proof, the device uploads a transitive proof of exposure, which alerts users of the surface transmission risk.
Suppose that, rather than using PCD, the Bluetooth device on the park bench simply uploads its secondary tokens, i.e., the tokens exchanged with users within 14 days of Alice's park visit. Although Bob is alerted of his surface contact, he must trust that the park bench Bluetooth device is acting honestly, since he has no way of verifying that an infected user actually came in contact with the park bench. By using PCD, Bob retains all the security and privacy guarantees of the original contact tracing protocol.
5 Performance Evaluation
We implemented a simplified proof-of-concept zero-knowledge proof-of-contact SNARK using the libsnark library [libsnark]. The library uses the NP-complete language R1CS (rank-1 constraint systems) to express the arithmetic circuits that represent the SNARK. There are existing R1CS gadgets for performing useful functionality, such as comparisons and collision-resistant hashing. The library includes an implementation of the subset-sum collision-resistant hashing gadget, which we use as an efficient one-way hash.
We characterize the performance of our proof-of-contact SNARK in terms of running time and key sizes for both the prover and verifier (Table 1). Since the generator phase is only executed once during setup, we provide concrete numbers on the size of the arithmetic circuit (3060 gates) but disregard the running time of the generator (166 ms). The circuit did not account for sorting.
                    Prover   Verifier
Running time (ms)       65          9
Key size (KB)          722         30

Table 1: Performance of the proof-of-contact SNARK.
6 Related Work
There are a few existing proposals for privacy-preserving contact tracing [canetti2020anonymous, dp3t2020, pepppt2020, applegoogle2020, berke2003assessing, raskar2020apps]. Although most of these works suggest similar techniques for estimating and exchanging proximity information between users, the underlying cryptographic protocols and their privacy guarantees differ.
[canetti2020anonymous, pepppt2020] use randomly generated pseudonyms that nearby users can exchange over Bluetooth. Individuals who test positive for a virus can upload their generated pseudonyms to a public registry, allowing other users to match the tokens they have collected against those in the registry. Both works suggest that healthcare workers should be the ones to upload users' tokens to the public registry after giving a positive test diagnosis, in order to prevent malicious pollution of the database. Similar to the protocol introduced in this work, mixing can be applied to prevent linkability via traffic analysis.
Apple and Google have released a protocol specification that closely resembles that of [canetti2020anonymous] and [pepppt2020]. Users generate a rolling pseudorandom identifier and some associated encrypted metadata that nearby users exchange over Bluetooth. The pseudorandom identifiers are derived using the current time and temporary exposure keys, which get distributed after a positive diagnosis.
Finally, [berke2003assessing] proposes partitioning GPS and time data into discrete spatiotemporal points and obfuscating these points using a one-way hash function. Infected users upload their obfuscated location histories after redacting personally identifiable information, such as the GPS coordinates that represent a home or work address. Using private-set intersection (PSI), individuals can privately determine whether or not their location history overlaps with that of infected users. The privacy guarantees of such an approach differ significantly from those offered in [canetti2020anonymous, pepppt2020, applegoogle2020], and those presented in this paper.
The approaches described above provide different flavors of privacy and decentralization. However, each solution places an increased burden on healthcare providers relative to the zero-knowledge SNARK technique we have outlined. Our approach requires only that healthcare workers sign positive diagnoses, rather than generate one-time codes or upload tokens to a public registry. Additionally, our approach is fully decentralized and supports broader functionality, such as proofs of transitive exposure, not currently supported by the other proposed solutions.