In the design of contact tracing apps, the choice between ‘centralised’ and ‘decentralised’ architectures has received great public attention. The latter, which has been adopted by the majority of countries, by the Google-Apple API and by the DP3T consortium [troncoso2020decentralized], has many privacy advantages, but one disadvantage is that each individual user is able to determine which of the many tokens they have collected came from an infected user, and consequently (by recalling the precise time and strength of the contact), may be able to identify the infected individual among their contacts [vaudenay2020centralized]. This may compromise the privacy of the infected person, and violates the principle of manual contact tracing that a person should be told only that they have been in contact with an infected person, and not the person’s identity [cdc2020contact].
One possible solution to this problem, discussed in [chan2020pact], is for infected users to send to the system the tokens they have collected rather than those they have distributed. These can then be rerandomised before being broadcast to all users, so that users are able to recognise a rerandomised token as being derived from one they have broadcast, but not specifically which one. The simple protocol described in [chan2020pact] is as follows (working in a multiplicative group $G$ of prime order $q$ with generator $g$):
Each user $A$ generates a keypair $(a, g^a)$ with random $a \in \mathbb{Z}_q$.
Each broadcast token is of the form $(g^{r_i}, g^{a r_i})$, where each $r_i \in \mathbb{Z}_q$ is chosen uniformly at random.
On testing positive, a user who has received the token $(x, y)$ uploads $(x^s, y^s)$ for fresh random $s \in \mathbb{Z}_q$.
To determine whether they are at risk, user $A$ checks whether a token of the form $(u, v)$ with $v = u^a$ is present on the server.
Conditional on the Decisional Diffie-Hellman (DDH) assumption on $G$, the tokens generated by each user are pseudorandom. Furthermore, a user who learns they are at risk cannot tell which of their tokens (generated using $a$) was reported, because of the exponentiation by fresh random $s$. However, as the authors note, this protocol is fatally flawed in the presence of malicious users. Such a user can simply use a fresh key $a_i$ for every token they generate, and then tell which token was reported by finding the $a_i$ with $v = u^{a_i}$. This cannot be detected or prevented (one could imagine requiring every token to be accompanied by a zero-knowledge proof that it was generated using one of the keys in a public list of public keys, but this would not be remotely practical).
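The following minimal Python sketch makes the attack concrete. It runs the simple protocol above in a toy group: the safe prime $p = 2039 = 2 \cdot 1019 + 1$ with $g = 4$, which generates the subgroup of order $q = 1019$. These parameters are hypothetical and far too small to be secure, and the helper names are illustrative rather than taken from [chan2020pact]. An honest user holding a single key learns only that some token was reported. A malicious user who uses a fresh key per token learns exactly which one.

```python
import secrets

# Toy, insecure parameters (hypothetical): p = 2q + 1 is a safe prime and
# g = 4 generates the subgroup of quadratic residues, of prime order q.
P, Q, G = 2039, 1019, 4

Token = tuple[int, int]


def rand_exp() -> int:
    """Uniformly random nonzero exponent in Z_q."""
    return 1 + secrets.randbelow(Q - 1)


def make_token(a: int) -> Token:
    """Broadcast token (g^r, g^{ar}) for a fresh random r."""
    r = rand_exp()
    return pow(G, r, P), pow(G, a * r, P)


def upload(tok: Token) -> Token:
    """An infected user rerandomises a received token (x, y) to (x^s, y^s)."""
    x, y = tok
    s = rand_exp()
    return pow(x, s, P), pow(y, s, P)


def matches(a: int, tok: Token) -> bool:
    """Check whether the uploaded token (u, v) satisfies v = u^a."""
    u, v = tok
    return pow(u, a, P) == v


# Honest user: one key for all tokens, so a match says only "some token".
a = rand_exp()
honest_tokens = [make_token(a) for _ in range(5)]
honest_at_risk = matches(a, upload(honest_tokens[2]))

# Malicious user: a fresh key per token, so the key that matches reveals
# which token (and hence which contact) was reported.
keys = [rand_exp() for _ in range(5)]
malicious_tokens = [make_token(k) for k in keys]
reported = upload(malicious_tokens[3])
matching = [i for i, k in enumerate(keys) if matches(k, reported)]
print(honest_at_risk, matching)
```

Because $r$ and $s$ are nonzero here, another index can match only if two of the malicious user's keys happen to coincide, so `matching` is almost always just `[3]`.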
In this note we show how to modify the protocol so as to be robust against fully malicious users, at the cost of requiring the server to send a ‘personalised’ (but non-secret) set of rerandomised tokens to each user. The essential idea is to extend the rerandomisation step so that the messages corresponding to malformed tokens are uniformly random.
2 Protocol description
We describe the protocol in three phases: registration; broadcasting, where a user has contacts with others and transmits tokens; and infection, after the user has tested positive. Throughout, $G$ is assumed to be a multiplicative abelian group of prime order $q$, with generator $g$.
Registration phase: sample $b \leftarrow \mathbb{Z}_q$ and send $h = g^b$ (non-anonymously) to the server, which adds it to the list of public keys.
Since registration is not required to be anonymous, the server can ensure that each individual is only able to register a single key.
Broadcast phase: sample $r \leftarrow \mathbb{Z}_q$, and broadcast the token
$$(g^r, h^r) = (g^r, g^{br}),$$
replacing $r$ with a fresh random value after a suitable period.
Infection phase: a user who tests positive sends the server the list of tokens it has received. The server verifies that each token $(x, y)$ has $x \neq 1$ (discarding those that fail). At the end of each day, for each user $B$, say with public key $h = g^b$, and for each token $(x, y)$ in its list, the server samples $s, t \leftarrow \mathbb{Z}_q$ and sends to $B$
$$\mathrm{Shuff}_h(x, y; s, t) = (x^s g^t, \; y^s h^t).$$
On receiving $(u, v)$, user $B$ checks whether $u^b = v$. If so, they know that they have been in contact with an infected user.
Note that ambiguity (but not other privacy properties) is dependent on the honesty of the server; similarly, in manual contact tracing ambiguity is dependent on the discretion of the tracer.
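To establish correctness, observe that if $(x, y)$ came from a token broadcast by $B$, then $(x, y) = (g^r, h^r)$ for some $r \in \mathbb{Z}_q$, and we have
$$\mathrm{Shuff}_h(x, y; s, t) = (g^{rs} g^t, \; h^{rs} h^t) = (g^{r'}, h^{r'}) = \big(g^{r'}, (g^{r'})^b\big),$$
as required (for some $r'$, namely $r' = rs + t$).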
The communication cost of this protocol is equivalent to that of just sending each user a list of all the tokens from infected users. The computational cost is constant per message passed from server to user: a single exponentiation by the user, and four exponentiations by the server.
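The three phases can be sketched in Python as follows, again over a hypothetical toy group ($p = 2039$, $q = 1019$, $g = 4$). These parameters are insecure and for illustration only, and all function names are illustrative. The server rerandomises each reported token $(x, y)$ for user $B$ as $(x^s g^t, y^s h^t)$, and $B$ tests whether $u^b = v$. One part of this sketch is an assumption: the server's validity check is modelled as rejecting tokens that lie outside $G$ or have $x = 1$. The final lines show that a token broadcast under a key other than the registered one passes the check only by chance.

```python
import secrets

# Toy, insecure parameters (hypothetical): safe prime p = 2q + 1, and g = 4
# generates the order-q subgroup G of quadratic residues mod p.
P, Q, G = 2039, 1019, 4

Pair = tuple[int, int]


def in_group(z: int) -> bool:
    """Membership test for the order-q subgroup G."""
    return 0 < z < P and pow(z, Q, P) == 1


def register() -> tuple[int, int]:
    """Registration: secret key b and public key h = g^b (sent to server)."""
    b = secrets.randbelow(Q)
    return b, pow(G, b, P)


def broadcast(h: int) -> Pair:
    """Broadcast: token (g^r, h^r) = (g^r, g^{br}) for a fresh nonzero r."""
    r = 1 + secrets.randbelow(Q - 1)
    return pow(G, r, P), pow(h, r, P)


def server_accepts(tok: Pair) -> bool:
    """Assumed validity check: both components in G, and x != 1."""
    x, y = tok
    return in_group(x) and in_group(y) and x != 1


def shuff(h: int, tok: Pair) -> Pair:
    """Server: (x^s g^t, y^s h^t) for fresh s, t (four exponentiations)."""
    x, y = tok
    s, t = secrets.randbelow(Q), secrets.randbelow(Q)
    return pow(x, s, P) * pow(G, t, P) % P, pow(y, s, P) * pow(h, t, P) % P


def at_risk(b: int, msg: Pair) -> bool:
    """User: a single exponentiation, testing u^b = v."""
    u, v = msg
    return pow(u, b, P) == v


# Bob registers; Carol receives one of his tokens and later tests positive.
b, h = register()
received_by_carol = [broadcast(h)]
uploaded = [tok for tok in received_by_carol if server_accepts(tok)]
bob_at_risk = any(at_risk(b, shuff(h, tok)) for tok in uploaded)

# Mallory broadcasts under an unregistered key (hypothetical attack).
m, hm = register()
rogue = broadcast(pow(G, (m + 1) % Q, P))
false_hits = sum(at_risk(m, shuff(hm, rogue)) for _ in range(200))
print(bob_at_risk, false_hits)
```

Bob's check always succeeds. Each of Mallory's 200 checks succeeds with probability $1/q$, so `false_hits` has expected value $200/q \approx 0.2$. Her rogue token gives her only noise.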
3 Security properties
In this section we establish three key security properties of the protocol.

First, conditional on the DDH assumption on $G$, the tokens broadcast by a user with randomly chosen key $b$ are computationally indistinguishable from independent random group elements, even with knowledge of the public key $h = g^b$ (Theorem 1). So no privacy is lost by uninfected users.

Second, for each user $b$ the output of $\mathrm{Shuff}_h(x, y; s, t)$ (as a probability distribution on $G^2$ with random $s, t \in \mathbb{Z}_q$) is equal on all tokens $(x, y)$ honestly generated by $b$ (Theorem 2). So an honest-but-curious $b$ will be unable to determine which of their broadcast tokens corresponded to contact with an infected person.

Third, for any $b$ the output of $\mathrm{Shuff}_h$ on any input other than a token honestly generated by $b$ is uniformly random on $G^2$ (Theorem 3). So a malicious user is not able to defeat ambiguity by broadcasting malformed tokens.
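Theorem 1. Let $n$ be a positive integer, and let $b, r_1, \dots, r_n, z_1, \dots, z_n \leftarrow \mathbb{Z}_q$ be uniformly random. If $G$ satisfies the DDH assumption then
$$\big(g^b, (g^{r_i}, g^{b r_i})_{i=1}^n\big) \approx_c \big(g^b, (g^{r_i}, g^{z_i})_{i=1}^n\big),$$
where $\approx_c$ denotes computational indistinguishability.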
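Proof. Let $\mathcal{A}$ be a PPT algorithm distinguishing the two distributions. Let $(X, Y, Z) = (g^\alpha, g^\beta, g^\gamma)$ be a DDH challenge, so that either $\gamma = \alpha\beta$ or $\gamma \leftarrow \mathbb{Z}_q$ is random. Run $\mathcal{A}$ on
$$\big(X, (Y^{u_i} g^{v_i}, \; Z^{u_i} X^{v_i})_{i=1}^n\big)$$
for random $u_i, v_i \in \mathbb{Z}_q$. The $i$th pair is $(g^{\beta u_i + v_i}, g^{\gamma u_i + \alpha v_i})$. There are two cases:
- If $\gamma = \alpha\beta$, the $i$th pair is $(g^{r_i}, g^{\alpha r_i})$ with $r_i = \beta u_i + v_i$ independently uniform. This is the first distribution with $b = \alpha$.
- If $\gamma \neq \alpha\beta$ (which holds with probability $1 - 1/q$ when $\gamma$ is random), the map $(u_i, v_i) \mapsto (\beta u_i + v_i, \gamma u_i + \alpha v_i)$ is a bijection of $\mathbb{Z}_q^2$, since its determinant is $\alpha\beta - \gamma \neq 0$. Hence the two exponents are independently uniform, which gives the second distribution.

So any non-negligible advantage of $\mathcal{A}$ yields, up to $1/q$, the same advantage against DDH. ∎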
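The input mapping used in the proof of Theorem 1 can be sketched in Python over a hypothetical toy group. Given a DDH triple $(X, Y, Z)$, the sketch outputs $X$ together with the pairs $(Y^{u_i} g^{v_i}, Z^{u_i} X^{v_i})$. The distinguisher $\mathcal{A}$ itself is not modelled, and the function names are illustrative only. The checks confirm two things. A genuine triple yields tokens that are all consistent with key $\alpha$. A triple with $\gamma = \alpha\beta + 1$ yields pairs that are consistent with $\alpha$ only by chance.

```python
import secrets

# Toy, insecure parameters (hypothetical): g = 4 has prime order q mod p.
P, Q, G = 2039, 1019, 4


def reduction_input(X: int, Y: int, Z: int, n: int):
    """Turn a DDH triple (g^alpha, g^beta, g^gamma) into an input for the
    distinguisher: (X, [(Y^u g^v, Z^u X^v) for fresh random u, v])."""
    pairs = []
    for _ in range(n):
        u, v = secrets.randbelow(Q), secrets.randbelow(Q)
        # Exponents of the pair: (beta*u + v, gamma*u + alpha*v).
        pairs.append((pow(Y, u, P) * pow(G, v, P) % P,
                      pow(Z, u, P) * pow(X, v, P) % P))
    return X, pairs


alpha, beta = secrets.randbelow(Q), secrets.randbelow(Q)
X, Y = pow(G, alpha, P), pow(G, beta, P)

# Genuine DDH triple: every pair is an honest token under key alpha.
_, real = reduction_input(X, Y, pow(G, alpha * beta, P), 50)
real_consistent = all(pow(x, alpha, P) == y for x, y in real)

# Non-DDH triple: y = x^alpha * g^u, so a pair looks honest only if u = 0.
_, fake = reduction_input(X, Y, pow(G, alpha * beta + 1, P), 50)
fake_consistent = sum(pow(x, alpha, P) == y for x, y in fake)
print(real_consistent, fake_consistent)
```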
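Theorem 2. Let $b \in \mathbb{Z}_q$ and $h = g^b$. Then for all $r \in \mathbb{Z}_q$ we have
$$\mathrm{Shuff}_h(g^r, h^r; s, t) \sim (g^u, h^u),$$
where $s, t, u \leftarrow \mathbb{Z}_q$ are uniformly random.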
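Proof. We have $\mathrm{Shuff}_h(g^r, h^r; s, t) = (g^{rs+t}, h^{rs+t})$. Since $t$ is uniformly distributed and independent of $s$, so is $rs + t$. Hence $(g^{rs+t}, h^{rs+t})$ is distributed as $(g^u, h^u)$ for uniform $u$, as required. ∎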
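Theorem 2 can be checked exhaustively by enumerating all $(s, t) \in \mathbb{Z}_q^2$ in a hypothetical tiny group: $p = 23$, $q = 11$, $g = 2$, where $2$ has order 11 modulo 23. The sketch below uses illustrative names and an arbitrary fixed key $b = 7$. It confirms that every honest token $(g^r, h^r)$ yields the same distribution, namely each pair $(g^u, h^u)$ exactly $q$ times.

```python
from collections import Counter

# Tiny, insecure parameters for exhaustive checking: 2 has order 11 mod 23.
P, Q, G = 23, 11, 2


def shuff_distribution(h: int, x: int, y: int) -> Counter:
    """Exact distribution of (x^s g^t, y^s h^t) over all (s, t) in Z_q^2."""
    return Counter(
        (pow(x, s, P) * pow(G, t, P) % P, pow(y, s, P) * pow(h, t, P) % P)
        for s in range(Q)
        for t in range(Q)
    )


b = 7  # arbitrary fixed secret key for the check
h = pow(G, b, P)

# One distribution per honest token (g^r, h^r), r = 0, ..., q - 1.
dists = [shuff_distribution(h, pow(G, r, P), pow(h, r, P)) for r in range(Q)]
all_equal = all(d == dists[0] for d in dists)
support_is_honest = all(pow(u, b, P) == v for u, v in dists[0])
print(all_equal, len(dists[0]), set(dists[0].values()))  # True 11 {11}
```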
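Theorem 3. Let $b \in \mathbb{Z}_q$ and $h = g^b$. Then for all $(x, y) \in G^2$, either $(x, y) = (g^r, h^r)$ for some $r \in \mathbb{Z}_q$, or $\mathrm{Shuff}_h(x, y; s, t)$ is uniformly distributed on $G^2$ for uniformly random $s, t \in \mathbb{Z}_q$.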
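Proof. Without loss of generality $x = g^c$ and $y = g^d$ for some $c, d \in \mathbb{Z}_q$. If $(x, y)$ is not of the form $(g^r, h^r)$, then $d \neq bc$. Then
$$\mathrm{Shuff}_h(x, y; s, t) = (g^{cs+t}, \; g^{ds+bt}).$$
Since $s$ and $t$ are independently uniformly distributed (IUD), so are $s$ and $cs + t$. Since $d - bc \neq 0$, it follows that $(d - bc)s$ and $cs + t$ are also IUD. Now $ds + bt = (d - bc)s + b(cs + t)$, so $cs + t$ and $ds + bt$ are IUD. Hence so are $g^{cs+t}$ and $g^{ds+bt}$, as required. ∎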
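Theorem 3 can likewise be checked exhaustively in the same hypothetical tiny group ($p = 23$, $q = 11$, $g = 2$, arbitrary key $b = 7$; names are illustrative). Take any pair $(x, y) \in G^2$ with $y \neq x^b$. The map $(s, t) \mapsto \mathrm{Shuff}_h(x, y; s, t)$ should then hit each of the $q^2$ elements of $G^2$ exactly once, which is the bijection underlying the proof.

```python
from collections import Counter

# Tiny, insecure parameters for exhaustive checking: 2 has order 11 mod 23.
P, Q, G = 23, 11, 2
b = 7  # arbitrary fixed secret key for the check
h = pow(G, b, P)
group = [pow(G, k, P) for k in range(Q)]  # the q elements of G


def shuff_distribution(x: int, y: int) -> Counter:
    """Exact distribution of (x^s g^t, y^s h^t) over all (s, t) in Z_q^2."""
    return Counter(
        (pow(x, s, P) * pow(G, t, P) % P, pow(y, s, P) * pow(h, t, P) % P)
        for s in range(Q)
        for t in range(Q)
    )


# Uniform on G^2: each of the q^2 pairs exactly once over the q^2 choices.
uniform = Counter({(u, v): 1 for u in group for v in group})
malformed = [(x, y) for x in group for y in group if pow(x, b, P) != y]
all_uniform = all(shuff_distribution(x, y) == uniform for x, y in malformed)
print(len(malformed), all_uniform)  # 110 True
```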
The other approach for achieving ambiguity of which the author is aware is to use a Private Set Intersection Cardinality (PSI-CA) protocol to allow users to determine whether the set of tokens they have collected intersects with the set of tokens held by the server from infected users, without learning which tokens are in the intersection. This was proposed independently in [trieu2020epione] and in [contrail2020]. The security analysis in [trieu2020epione] is expressly limited to the semi-honest setting, although it is suggested that one could guard against a dishonest user by requiring them to provide zero-knowledge proofs of correct behaviour, no doubt with significant performance consequences.
The protocol in [contrail2020] is similarly clearly flawed in the presence of a fully malicious user. Specifically, at step 2 of the protocol, Alice may use different exponent values for different elements and thereby reidentify elements despite Bob’s permutation. Moreover, no proofs are provided for the claimed security properties. It seems that even in the semi-honest setting, the claim that the server obtains no information about the contacts of undiagnosed users may be incorrect. For example, if the authorities can send a suspect two tokens satisfying a suitably chosen relation, then they will be able to identify the suspect as Alice when she performs the protocol.
The trick for this protocol was to ensure correct behaviour not by cumbersome zero-knowledge proofs but by rerandomising in such a way that a malformed token just results in the malefactor seeing random noise. The most important question for future work is whether a similar trick can be applied to obtain a lightweight DH-based protocol for PSI-CA which is robust against a fully malicious adversary. This would be extremely desirable because it could easily be added to DP3T-style systems with no changes to the system structure or to the technically constrained Bluetooth Low Energy tokens.
A second question is whether it is possible for the server, rather than sending all the rerandomised tokens to each user, to instead combine them in some way such that the user can tell whether they included at least one of the special form $(u, u^b)$. This would be desirable for both performance and privacy reasons, since it would prevent users from learning how many of the tokens sent in by infected individuals were theirs. If the question were whether they were all of the special form, then this would be trivial: just multiply together all the tokens componentwise, since $\prod_i (u_i, u_i^b) = (U, U^b)$ with $U = \prod_i u_i$. Unfortunately we have been unable to find a similar solution for the ‘disjunctive’ task.
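The trivial conjunctive test can be sketched as follows, using a hypothetical toy group and illustrative names. Multiplying pairs componentwise preserves the special form $(u, u^b)$. A product containing exactly one pair not of that form always fails the check, because the discrepancy factor survives multiplication. This says nothing about the disjunctive task.

```python
# Toy, insecure parameters (hypothetical): g = 4 has prime order q mod p.
P, Q, G = 2039, 1019, 4


def combine(pairs: list[tuple[int, int]]) -> tuple[int, int]:
    """Multiply pairs componentwise: (prod u_i, prod v_i)."""
    U, V = 1, 1
    for u, v in pairs:
        U, V = U * u % P, V * v % P
    return U, V


def is_special(b: int, pair: tuple[int, int]) -> bool:
    """Check whether (u, v) has the special form v = u^b."""
    u, v = pair
    return pow(u, b, P) == v


b = 5  # arbitrary secret key for the illustration
special = [(pow(G, k, P), pow(G, b * k, P)) for k in (3, 8, 21)]
bad = (pow(G, 2, P), pow(G, 2 * b + 1, P))  # v = u^b * g, not special

print(is_special(b, combine(special)))          # True
print(is_special(b, combine(special + [bad])))  # False
```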