Providing genuine identity to all people by 2030 is a UN Sustainable Development Goal. Yet, current top-down digital identity-granting solutions are unlikely to close the 1Bn-people gap [worldbankundocumented] in time, as they are not working for citizens of failed states nor for people fleeing harshness [geisler2017impediments, humantide]. While more than half the world population is now online, the prevailing types of digital identity provided online may be fake or duplicated, resulting in lack of accountability and trust.
Granting an identity document by a state is normally a complex process as it requires careful verification of the person’s credentials. The process culminates in the state granting the applicant a state-wide identifier that is unique (no two people have the same identifier) and singular (no person has two identifiers).
Granting a globally-unique and singular identifier, henceforth referred to as a genuine global identifier, might seem even more daunting, except for the following fundamental premise: Every person deserves a global identifier; thus, there are no specific credentials to be checked, except for the existence of the person. As a result, a solution for granting global identity documentation to people, and in particular granting a genuine global identifier, may focus solely on ensuring a one-to-one correspondence between humans and their global identifiers. Hence, a solution that is workable for all must be a decentralized, distributed, grassroots, bottom-up, self-sovereign process in which every human being may create and own a genuine global identifier. Solid foundations are being laid by the notion of self-sovereign identities [ssi] and the W3C Decentralized Identifiers standards [did], which aim to let people freely create and own identifiers and associated credentials. We augment this freedom with the goal that each person declares exactly one identifier as its genuine global identifier. (Besides the genuine global identifier, one may create, own and use any number of identifiers of other types.) (Footnote 1: Such identifiers may provide the necessary foundation for a notion of global citizenship and, subsequently, for democratic global governance [shapiro2018point].)
A person can become the owner of a genuine global identifier in a simple and straightforward way. With a suitable app, it could be done literally with a click of a button:
Choose a new cryptographic key-pair, consisting of a public key and a private key.
Declare the public key to be a global identifier by publicly posting a declaration to that effect, signed with the private key.
Lo and behold! You have become the proud rightful owner of a genuine global identifier. (Footnote 2: As the public key may be quite long, one may also associate oneself with a shorter “nickname”, a hash of the public key, e.g. a 128-bit hash (as a UUID) or a 256-bit hash (as common in the crypto world).) Such a declaration does not necessarily expose the person making the declaration; it only reveals to all that someone who knows the secret key for the declared public key has declared that public key as a genuine global identifier. Depending on personality and habit, the person may or may not publicly associate oneself with the identifier. E.g., a person with truthful social media accounts may wish to associate these accounts with its newly-minted genuine global identifier.
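The two steps above, together with the hashed nickname, can be sketched as follows. This is only an illustration under loud assumptions: it uses textbook RSA with two fixed, publicly known Mersenne primes, so the key-pair is neither new nor secure, and the function names (`new_key_pair`, `sign`, `verify`, `nickname`) and the `gid:` declaration string are hypothetical rather than taken from the paper. A real app would rely on a vetted cryptographic library and freshly generated keys.

```python
import hashlib

# Toy textbook RSA with fixed, publicly known Mersenne primes.
# Insecure by construction; it only illustrates the declare-and-sign flow.
P, Q = 2**61 - 1, 2**89 - 1
E = 65537


def new_key_pair():
    """Return (public_key, private_key); a real app would generate fresh primes."""
    n = P * Q
    d = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+
    return (n, E), (n, d)


def digest(message: str, n: int) -> int:
    """Hash the message and reduce it into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n


def sign(message: str, private_key) -> int:
    n, d = private_key
    return pow(digest(message, n), d, n)


def verify(message: str, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)


def nickname(public_key, bits: int = 128) -> str:
    """Short hex nickname: a truncated SHA-256 hash of the public key."""
    n, e = public_key
    return hashlib.sha256(f"{n}:{e}".encode()).hexdigest()[: bits // 4]


# Step 1: choose a key-pair. Step 2: post a signed declaration.
public, private = new_key_pair()
declaration = f"gid:{nickname(public)}"  # hypothetical declaration format
signature = sign(declaration, private)
```

Anyone holding the public key can check the posted declaration with `verify(declaration, signature, public)`, without learning who made it.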
If becoming the rightful owner of a genuine global identifier is so simple, what could go wrong? In fact, so many things can go wrong that this paper is but an initial investigation into describing, analyzing, and preventing them. Some of them are enumerated below, for an agent attempting the two steps above:
The key-pair is not new, or else someone got hold of it between Step 1 and Step 2 above. Either way, someone else has declared the same public key prior to the declaration by the agent, in which case the agent cannot declare it.
The agent failed to keep its private key secret, so that other people know it, in which case each of them is also an owner of the public key and, thus, the identifier is compromised. Figure 1 (left) illustrates a compromised identifier.
The agent intended to divulge its public key only on a need-to-know basis, but the public key and its association to the agent became public knowledge, requiring the agent to replace its genuine global identifier with a new one.
The agent declared a global identifier, but also declared another global identifier. Then the two identifiers are duplicates, the identifier declared in the latter of the two declarations is a sybil, and the agent is corrupt. An honest agent does not declare sybil identifiers. Figure 1 illustrates honest and corrupt agents.
We aim to develop a foundation for genuine global identifiers with which we can describe, characterize and aim to prevent all the problems listed above, utilizing basic concepts of public-key cryptography and graph theory.
Related Work. Genuine global identifiers aim to bridge the gap between agents and their corresponding identifiers. This distinction, acknowledged in semiotics as the difference between signified and signifier, strongly relates to the study of sense and reference initiated by Frege [frege2003sense] followed by vast literature in analytic philosophy and the philosophy of language. Conceptually, the formal framework suggested in this paper may be seen as an attempt to computationally realize unique signifiers of agents in a distributed setting.
On the practical side, digital identity is a subject of extensive study, with many organizations aiming at providing solutions. Many of these solutions, e.g. Self-Sovereign Identifiers [ssi], are not concerned with uniqueness or singularity; our requirements are different, as we aim at a one-to-one correspondence between identifiers and their owners. Some business initiatives (e.g., the Decentralized Identity Foundation, https://identity.foundation/) bring together tech giants as well as smaller organizations (e.g., the Global Identity Foundation [gif] and Sovrin [sovrin]); we aim at developing foundations for a bottom-up solution.
Some high-profile projects provide nation-wide digital identities, e.g. India’s Aadhaar system [aadhaar], Sierra Leone’s Kiva identity protocol [staats2013kiva] and the World Food Programme’s cash aid distribution program in refugee camps [kshetri2018blockchain]. Here we are concerned with global identities, not bound to any national boundaries, and argue that top-down approaches fail to provide such a solution. In this context, we mention the concept of Proof of Personhood [borge2017proof], aiming at providing unique and singular identities by means of conducting face-to-face encounters, an approach suitable only for small communities.
Our solution is based upon the notion of trust, so we mention Andersen et al. [andersen2008trust], who study axiomatizations of trust systems. They are not concerned, however, with sybils, but with the quality of recommendations. Finally, we mention the work on sybil-resilient community growth [poupko2019sybil], describing algorithms for the growth of an online community that keep the fraction of sybils in it small; and work on sybil-resilient social choice [SRSC], describing aggregation methods to be applied in situations where sybils have infiltrated the electorate. In these two papers, a notion of genuine and sybil identities is used without specifying what they are; here, we define a concrete notion of genuine global identifiers, and derive from it a formal definition of sybils and related notions of honest and corrupt agents and byzantine identifiers.
Genuine Global Identifiers
Ingredients. The ingredients needed for a realization of genuine global identities are:
A set of agents. It is important to note that, mathematically, the agents form a set (of unique entities) not a multiset (with duplicates). Intuitively, it is best to think of agents as people (or other physical beings with unique personal characteristics, unique personal history, and agency, such as intelligent aliens), which cannot be duplicated, but not as software agents, which can be.
A way for agents to create cryptographic key-pairs. This can be realized, e.g., using the RSA standard [rsapaper]. Our solution does not require a global standard or a uniform implementation for public key encryption: Different agents can use different technologies for creating and using such key-pairs, as long as the signature-verification methods are declared. (Footnote 4: We assume standard cryptographic computational hardness.)
A way for agents to sign strings using their key-pairs. As we assume cryptographic hardness, it shall not be possible for an agent that does not know a certain key-pair to sign strings with this key-pair.
A bulletin board or public ledger, to which agents may post signed messages and observe other agents’ signed messages. A critical requirement is that all agents observe the same order of messages. Future work may allow weakening this requirement so that the same order is observed only eventually, as well as allowing partial orders.
Agents and their Global Identifiers. We assume a set of agents that is fixed over time. (Footnote 5: Birth and death of agents will be addressed in future work.) Agents can create new key-pairs . We assume that an agent that has a key-pair can sign a string, and denote by the string resulting from signing the string with . Intuitively, each agent corresponds to a human being. Importantly, members of the set of agents (e.g., containing all human beings) cannot be referenced explicitly; specifically, the posted signed messages never refer directly to agents : Indeed, the desire to provide people with genuine global identifiers without having access to, and without being dependent upon, their intrinsic (e.g. biometric) identifiers is the main motivation for this work.
As we aim global identifiers to be self-sovereign identities that conform to the W3C Decentralized Identifiers emerging standards, we let agents create and own their global identifiers. An agent can publicly declare a global identifier for which it knows the private key . A global identifier declaration has the form and can be effected by agent posting to a public ledger. We denote this action by . Here we assume that all agents have the same view of the sequence of all declarations made; subsequent work may relax this assumption.
Definition 1 (Global Identifier).
Let be a sequence of global identifier declaration events and the first declaration event in which occurs. Then is a global identifier and is the rightful owner of , given .
Definition 2 (Genuine Global Identifier, Sybil, Honest, and Corrupt Agents).
Let be a sequence of global identifier declaration events and be the rightful owner of global identifier in . Then is genuine if it is the first global identifier declared in by , else is a sybil. An agent is corrupt if it declares any sybils, else is honest. (All notions are relative to .)
See Figure 1 and some remarks: (1) An agent is the rightful owner of its genuine identifier as well as of any subsequent sybils that it declares. (2) If , the rightful owner of , is corrupt, then its first declared identifier is genuine and the rest of its declared identifiers are all sybils. (3) An honest agent may create and use many key-pairs for various purposes, yet remain honest as long as it has declared at most one public key as a global identifier.
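Definitions 1 and 2 can be illustrated with a small classifier over a declaration sequence. This is a sketch under stated assumptions: agents are named explicitly, which is an omniscient view used only for illustration (the ledger itself never refers to agents), repeated declarations of an already-declared identifier are ignored, and the function name `classify` and the sample ledger are hypothetical.

```python
def classify(declarations):
    """declarations: list of (agent, identifier) pairs in ledger order."""
    owner = {}    # identifier -> rightful owner (its first declarer)
    genuine = {}  # agent -> its genuine (first declared) identifier
    sybils = set()
    for agent, identifier in declarations:
        if identifier in owner:
            continue  # assumption: an identifier cannot be declared twice
        owner[identifier] = agent
        if agent in genuine:
            sybils.add(identifier)  # any later identifier of the same agent
        else:
            genuine[agent] = identifier
    corrupt = {owner[s] for s in sybils}
    # Only agents that rightfully own some identifier are listed as honest here.
    honest = set(genuine) - corrupt
    return owner, set(genuine.values()), sybils, corrupt, honest


ledger = [("alice", "a1"), ("bob", "b1"), ("alice", "a2"), ("carol", "a1")]
owner, genuine_ids, sybil_ids, corrupt_agents, honest_agents = classify(ledger)
```

In this sample, `a2` is a sybil because its rightful owner had already declared `a1`, which matches remark (2) above.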
Mutual Sureties and Their Graphs
A key element of our approach is the pledging of mutual sureties by agents. Intuitively, mutual surety pledges provide a notion of trust (to be formalized below) between the owners of global identifiers; later we employ such notions of trust.
Specifically, we aim to capture the notion that two agents that know each other and know the global identifiers declared by each other, are each willing to pledge surety to the other regarding the good standing of the global identifiers.
For to provide surety regarding the global identifier of , first has to know . How this knowledge is established is not specified in our formal framework, but this is quite an onerous requirement that cannot be taken lightly or satisfied casually. E.g., we may assume that one knows one’s family, friends and colleagues, and may diligently get to know new people if one so chooses. We envision several types of sureties of increasing strength, in which an agent with global identifier makes a pledge regarding the global identifier of another agent ; all assume that the agent knows the agent . We propose four Surety Types, which are cumulative as each includes the previous ones, and explain on what basis one may choose to pledge each of them.
- Surety of Type 1: Ownership of global identifier.
The pledging agent pledges that the other agent owns the global identifier in question.
An agent can prove to another agent that it owns a global identifier without disclosing the corresponding private key. This can be done, for example, by asking the owner to sign a novel string and verifying that the string is indeed signed using the private key of that identifier. This surety type is the weakest of all four; it is the one given in “key signing parties” and is implicitly assumed by applications such as PGP and web-of-trust [abdul1997pgp]. For a given surety type, we say that the surety is violated if its assertion does not hold; in particular, a surety of Type 1 is violated if in fact the agent receiving the surety does not know the secret key for the public key in question.
In general, mutual surety between two agents with two global identifiers is pledged by both agents pledging a surety to the global identifier of the other agent. (Footnote 6: We consider undirected graphs, as we require surety to be symmetric. One may also consider directed sureties.) We define below three additional surety types, where the format of a surety pledge of Type by the owner of to the owner of is , . The corresponding surety event is , and the surety enters into effect once both parties have made the mutual pledges. We now take to be a record of both declaration events and pledge events.
Definition 3 (Mutual Surety).
The global identifiers have mutual surety of type X, , if there are for which in which case and are the witnesses for the mutual surety between and .
A sequence of events induces a sequence of surety graphs in which the vertices are global identifiers that correspond to global identifier declarations and the edges correspond to mutual surety pledges, as follows.
Definition 4 (Surety Graph).
Let be a sequence of events and let denote its first events. Then, for each , induces a surety graph of type X, , , as follows: (Footnote 7: We allow surety pledges to be made before the corresponding global identifier declarations, as we do not see a reason to enforce order.)
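A minimal sketch of how a prefix of the event sequence might induce a surety graph is given below. The event encoding (`("declare", v)` and `("pledge", src, dst, type)`) and the function name `surety_graph` are hypothetical. Two further assumptions are made: since surety types are cumulative, a pledge of a stronger type is taken to count toward a mutual surety of any weaker type, and an edge appears only once both of its endpoints have been declared.

```python
def surety_graph(events, k, surety_type):
    """Build the surety graph of the given type induced by the first k events."""
    vertices = set()
    pledged = {}  # (from_identifier, to_identifier) -> strongest pledged type
    for event in events[:k]:
        if event[0] == "declare":
            vertices.add(event[1])
        elif event[0] == "pledge":
            _, src, dst, pledge_type = event
            pledged[(src, dst)] = max(pledge_type, pledged.get((src, dst), 0))
    # An edge requires pledges in both directions, each of sufficient strength.
    edges = {
        frozenset((u, v))
        for (u, v), t in pledged.items()
        if u != v
        and u in vertices
        and v in vertices
        and t >= surety_type
        and pledged.get((v, u), 0) >= surety_type
    }
    return vertices, edges


events = [
    ("declare", "a"),
    ("declare", "b"),
    ("pledge", "a", "b", 2),
    ("pledge", "b", "a", 1),
    ("declare", "c"),
    ("pledge", "b", "a", 3),
    ("pledge", "c", "a", 1),  # one-directional, so it never forms an edge
]
```

Evaluating the graph at successive prefixes, e.g. `surety_graph(events, 4, 2)` and then `surety_graph(events, 7, 2)`, shows the edge between `a` and `b` appearing only after `b` strengthens its pledge.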
Observe that mutual sureties can be easily pledged by agents, technically. However, we wish agents to be prudent and sincere in their mutual surety pledges. Thus, we expect a mechanism that, on one hand, rewards the pledging of sureties but, on the other hand, punishes surety violations, for example based on the approach of [seuken2014sybil]. While the specifics of such a mechanism are beyond the scope of the current paper, note that with such a mechanism in place, the commissive illocutionary force [illocutionarybook] of a surety pledge will come to bear.
Updating a Global Identifier with Mutual Sureties
Once creating a genuine global identifier is provided for, one must also consider the many circumstances under which a person may wish to update their global identifier:
Identifier loss: The private key was lost.
Identifier theft: The private key was stolen, robbed, extorted, or otherwise compromised.
Identifier disclosure: The global identifier was disclosed with unwarranted consequences.
Identifier refresh: Proactive identifier update to protect against all the above.
The global identifier declaration event establishes as a global identifier. To support updating a global identifier, we add the global identifier update event , which declares that is a new global identifier that replaces . A public declaration of identifier update has the form , i.e., it is signed with the new identifier. We refer to declarations of both types as global identifier declarations, and extend the assumption that a new identifier can be declared at most once to this broader definition of identifier declaration. The validity of an identifier update declaration is defined inductively, as follows.
Definition 5 (Valid identifier Update declaration).
Let be a sequence of declarations, the set of global identities declared in , and . A global identifier update event over has the form , .
A global identifier update event is valid and is the rightful owner of if it is the first identifier declaration event of and is the rightful owner of .
Valid global identifier declarations should form linear chains, one for each agent, each starting from and ending with the currently valid global identifier of the agent:
Definition 6 (Identifier Provenance Chain).
Let be a sequence of declarations and the declared set of global identities. An identifier provenance chain (identifier chain for short) is a subsequence of of the form (starting from the bottom):
Such an identifier chain is valid if the declarations in it are valid. Such an identifier chain is maximal if there is no declaration
for any and . A global identifier is current in if it is the last identifier in a maximal identifier chain in .
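The validity rule for updates and the notion of a current identifier can be sketched as below. The encoding is hypothetical: each event is a 4-tuple `(kind, agent, new_identifier, old_identifier)`, with `old_identifier` set to `None` for plain declarations, and agents are named explicitly only for the sake of illustration. The sketch assumes that any identifier, once declared in either form, cannot be declared again, and it treats an identifier as current when it has a rightful owner and no valid update replaces it.

```python
def current_identifiers(events):
    """Return {identifier: rightful owner} for identifiers not yet replaced."""
    declared = set()  # every identifier that appeared as 'new' in any declaration
    owner = {}        # identifier -> rightful owner, for valid declarations only
    replaced = set()  # identifiers superseded by a valid update
    for kind, agent, new, old in events:
        if new in declared:
            continue  # an identifier can be declared at most once
        declared.add(new)
        if kind == "declare":
            owner[new] = agent
        elif kind == "update" and owner.get(old) == agent:
            # Valid update: first declaration of 'new' by the rightful owner of 'old'.
            owner[new] = agent
            replaced.add(old)
    return {v: h for v, h in owner.items() if v not in replaced}


events = [
    ("declare", "alice", "a1", None),
    ("declare", "bob", "b1", None),
    ("update", "alice", "a2", "a1"),    # valid: alice rightfully owns a1
    ("update", "mallory", "m1", "b1"),  # invalid: mallory does not own b1
]
current = current_identifiers(events)
```

Here the invalid update by the hypothetical agent `mallory` leaves `b1` current, while `a1` is replaced by `a2` at the end of alice's chain.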
Note that it is very easy for an agent to make an update declaration for its identifier. However, it is just as easy for an adversarial agent wishing to steal the identifier to make such a declaration. Hence, this ability must be coupled with a mechanism that protects the rightful owner of an identifier from identifier theft through invalid identifier update declarations. Here we propose to use a stronger type of mutual sureties to support valid identifier update declarations and help distinguish between them and invalid declarations.
- Surety of Type 2:
Rightful ownership of a global identifier. Agent pledges that is the rightful owner of global identifier .
In addition to proving to that it owns , must provide evidence that itself, and not some other agent, has declared . A selfie video of pressing the declare button with , signed with a certified timestamp promptly after the video was taken, and then signed by , may constitute such evidence. A suitable app may record, timestamp, and sign such a selfie video automatically during the creation of a genuine global identifier. In particular, this surety is violated if in fact did not declare as a global identifier.
Note that immediately after an identifier update declaration has been made, the new identifier may not have any surety edges incident to it. Thus, as a crude measure, we may require that the identifier update would come to bear only after all the Type 2 surety neighbors of the old identifier, or a sufficiently large majority of them, would update their mutual sureties to be with the new identifier. To achieve that, an agent wishing to update its identifier would have to approach its neighbors and to create such updated Type 2 mutual surety pledges.
Consider two friends, agent and agent having a mutual surety pledge between them. If would lose her identifier, she would create a new key-pair, make an identifier update declaration, and ask for a new mutual surety pledge between ’s identifier and ’s new identifier.
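The crude acceptance rule described above, under which an update takes effect only once enough Type 2 neighbors of the old identifier have re-pledged to the new one, might be sketched as follows. The function name, the parameter `required_fraction`, and the treatment of an identifier with no neighbors are assumptions made for illustration; the paper does not fix a specific threshold.

```python
from fractions import Fraction


def update_in_effect(old_neighbors, new_neighbors, required_fraction=Fraction(1)):
    """Decide whether an identifier update should take effect.

    old_neighbors: Type 2 surety neighbors of the old identifier.
    new_neighbors: identifiers that already have a Type 2 mutual surety
                   with the new identifier.
    """
    old_neighbors = set(old_neighbors)
    if not old_neighbors:
        return True  # assumption: with no neighbors there is nothing to wait for
    moved = len(old_neighbors & set(new_neighbors))
    return moved >= required_fraction * len(old_neighbors)


old = {"bob", "carol", "dave"}
new = {"bob", "carol"}
```

With all neighbors required (the default), the update above is not yet in effect; with a two-thirds majority, `update_in_effect(old, new, Fraction(2, 3))`, it is.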
The following observation follows from: (1) a valid identifier chain has a single owner; and (2) whether a Type 2 surety between two identifiers is violated depends on their rightful owners.
Let be a sequence of update declarations and and be two valid identifier chains in . If a Type 2 surety pledge between two global identities is valid, then any Type 2 surety pledge between two global identifiers in these chains, is valid.
The import of Observation 1 is that a Type 2 mutual surety can be “moved along” valid identifier chains as they grow, without being violated, as it should be. Below we argue that invalid identifier update declarations are quite easy to catch, thus the risk of stealing identities can be managed. In effect, we show the value of Type 2 surety pledges in defending an identifier against invalid update declarations.
Let be a sequence of declarations, be two identifier chains in , and assume that there is a valid Type 2 surety pledge between the two current global identifiers . Now assume that the identifier update declaration is made, namely, some agent has declared to replace by . Then, it will be hard for to secure surety from and, if she attempts to do so, then will know that is not valid and thus (if is honest) a Type 2 mutual surety between and will not be established. Consider the following case analysis:
Assume notices . Then she would inform that she did not declare , and thus will know that is not valid.
Alternatively, assume that notices . She would approach to update the Type 2 mutual surety between them accordingly; would deny owning , and thus will know that is invalid.
Alternatively, would approach to update the Type 2 mutual surety has with to be with with instead; will see (or suspect, if did not reveal himself) that is not , will double check with and thus know that the declaration is invalid.
Sybil- and Byzantine-Resilient Community Growth with Mutual Sureties
Ideally, we would like to attain sybil-free communities, but acknowledge that one cannot prevent sybils from being declared, and, furthermore, perfect detection and eradication of sybils is out of reach. Thus, our aim is to allow a community of agents that are the rightful owners of genuine global identifiers to grow (i.e., admit new members) indefinitely, while retaining a bounded sybil penetration. (Note that, fortunately, democratic governance can be achieved even with bounded sybil penetration; see [SRSC].)
Community history. For simplicity, we assume a single global community and consider elementary transitions obtained by either adding a single member to the community or removing a community member:
Definition 7 (Elementary Community Transition).
Let denote two communities in . We say that is obtained from by an elementary community transition, and we denote it by , if:
for some , or
for some .
Definition 8 (Community History).
Let be a sequence of events. A community history wrt. is a sequence of communities such that and holds for every .
We do not consider community governance in this paper, only the effects of community decisions to add or remove members. Hence, we assume that the sequence of events includes the events and . With this addition, induces a community history , where if for ; if for ; else = .
Definition 9 (Community, Sybil penetration rate).
Let be a sequence of events and let denote the sybil identifiers in wrt. . A community in is a subset of identifiers . The sybil penetration of the community is given by .
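The induced community history and the sybil penetration of each community in it can be sketched as follows. The event encoding (`("add", v)`, `("remove", v)`, with any other event leaving the community unchanged) follows the description above, while the function names, the optional initial community, and the convention that an empty community has penetration 0 are assumptions made for illustration. The set of sybil identifiers is supplied from an omniscient view.

```python
from fractions import Fraction


def community_history(events, initial=frozenset()):
    """Return the list of communities induced by add/remove events."""
    community = frozenset(initial)
    history = [community]
    for kind, identifier in events:
        if kind == "add":
            community = community | {identifier}
        elif kind == "remove":
            community = community - {identifier}
        # Any other event kind leaves the community unchanged.
        history.append(community)
    return history


def sybil_penetration(community, sybils):
    """Fraction of community members that are sybil identifiers."""
    if not community:
        return Fraction(0)  # assumption: empty community has penetration 0
    return Fraction(len(community & sybils), len(community))


events = [("add", "a"), ("add", "s"), ("declare", "x"), ("remove", "a")]
sybils = {"s"}
history = community_history(events)
rates = [sybil_penetration(c, sybils) for c in history]
```

Exact fractions are used so that penetration values can be compared against a bound without floating-point rounding.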
The following observation is immediate.
Let be the community history wrt. a sequence of declarations . Assume that , and that whenever for some , it holds that for some fixed . Then, the expected sybil penetration rate for every is at most .
That is, the observation above states that a sybil-free community can keep its expected sybil penetration rate below a fixed bound, as long as the probability of admitting a sybil to it is at most that same bound. While the simplicity of Observation 2 might seem promising, its premise is naively optimistic. Due to the ease with which sybils can be created and to the benefits of owning sybils in a democratic community, the realistic scenario is of a horde of sybils and a modest number of genuine global identifiers hoping to join the community. Furthermore, once a fraction of sybils has already been admitted, it is reasonable to assume that all of them (together with their perpetrators of course) would support the admission of further sybils. Thus, there is no reason to assume either the independence of candidates being sybils or a constant upper bound on the probability of sybil admission to the community. Hence, in the following we explore sybil-resilient community growth under more realistic assumptions.
Sybil-Resilient Community Growth
A far more conservative assumption includes a process employed by the community with the aim of detecting sybil identities. We shall use the abstract notion of a sybil detector in order to capture such a process, which may take the form of a query, a data-based comparison to other identifiers, or a manual examination by some other agent. To leverage this detector for sybil-resilient community growth regardless of the sybil distribution among the candidates, we shall utilize a stronger surety type, defined as follows:
- Surety of Type 3:
Rightful ownership of a genuine global identifier. Agent pledges Surety Type 2 and that is the genuine global identifier of .
Providing this surety requires a leap of faith. In addition to obtaining from a proof of rightful ownership of , must also trust not to have declared any other identifier prior to declaring . There is no reasonable way for to prove this to , hence the leap of faith.
Since Type 3 sureties inherently aim to distinguish between genuine and sybil identifiers, our approach for sybil-resilient community growth is established upon the underlying surety graph. Specifically, we consider a setting where potential candidates to join the community are identifiers with a surety obtained from current community members. Conversely, we consider a violation of a surety in one direction as a strong indication that the surety in the other direction is violated. That is, if and was shown to be sybil, (i.e., has declared some other as a global identifier before declaring as a global identifier), then should undergo a thorough examination in order to determine whether it is sybil as well.
Next, we formalize this intuition in a simple stochastic model where admissions of new members are interleaved with random sybil detection among community members:
An identifier is admitted to the community via an elementary community transition only if there is some with .
Every admittance of a candidate is followed by a random sybil detection within the community: An identifier is chosen uniformly at random. If is genuine it is declared as such. If is sybil, it is successfully detected with probability .
The detection of a sybil identifier implies the successful detection of its entire connected component (with probability 1). That is, if is detected as sybil, then the entire connected component of in is detected and expelled from the community.
The sybil identities are operated from at most disjoint sybil components in . Furthermore, we assume that sybils join sybil components uniformly at random, i.e., a new sybil member has a surety to a given sybil component with probability , else, it forms a new sybil component with probability .
Note that assumption (2) is far weaker than the premise in Observation 2 as it presumes nothing about the sybil penetration among the candidates, but rather concerns the proactive ability to detect a sybil, once examined. Assumption (3) exploits the natural cooperation among sybil identities, especially if owned by the same agent, and assumes that if a sybil is detected by a shallow random check with probability , then all its neighbours will be examined with a thorough examination that would detect sybils with probability , and that the process will continue iteratively until the entire sybil component connected to the initially-detected sybil is identified. In assumption (4), the choice of both the parameter and the locations of the components is adversarial: the attacker may choose how to operate. While realistic attackers may also choose which component a new (sybil) member joins, uniformity is assumed to simplify the analysis. Possible relaxations of this model to fit more realistic scenarios are deferred to future work.
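The stochastic model can be explored with a small Monte Carlo sketch. It simplifies the model in several ways that are assumptions of this illustration only: sybil components are tracked by size rather than as subgraphs of the surety graph, each candidate is a sybil with a fixed probability `p_sybil_candidate` (the model itself presumes nothing about candidates), a new sybil joins one of `m` component slots chosen uniformly at random (an empty slot meaning a new component), and the community starts with a single genuine founder. All parameter names are hypothetical.

```python
import random


def simulate(steps, m, p_detect, p_sybil_candidate, seed=0):
    """Return the sybil penetration after the given number of admissions."""
    rng = random.Random(seed)
    genuine = 1           # assumption: one genuine founder
    components = [0] * m  # sizes of at most m disjoint sybil components
    for _ in range(steps):
        # Admission of one candidate.
        if rng.random() < p_sybil_candidate:
            components[rng.randrange(m)] += 1
        else:
            genuine += 1
        # Random detection: examine one member chosen uniformly at random.
        pick = rng.randrange(genuine + sum(components))
        if pick >= genuine:  # the examined member is a sybil
            offset = pick - genuine
            for j, size in enumerate(components):
                if offset < size:
                    if rng.random() < p_detect:
                        components[j] = 0  # expel its whole component
                    break
                offset -= size
    sybils = sum(components)
    return sybils / (genuine + sybils)


rate = simulate(steps=1000, m=3, p_detect=0.5, p_sybil_candidate=0.5)
```

Varying `m` and `p_detect` in such a simulation gives an informal feel for how spreading sybils across more components trades off against the chance of losing a whole component at once.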
Our main result for this setting is an upper bound on the expected sybil penetration, assuming bounded computational resources of the attacker.
In the stochastic model described above, obtaining an expected sybil penetration is NP-hard for every constant .
Let denote a sybil component within the community at time . In the stochastic model described above, is detected and immediately expelled with probability . The expected size of the component in this model is obtained in a steady state, i.e., in a state in time where , that is:
where and . It follows that . Solving this quadratic equation implies that the size of a single sybil component in the steady state is . It follows that the number of sybil identities in the community in a steady state is .
The crucial observation now is that operating from nonempty sybil components corresponds to obtaining an independent set of size (at least, choosing a single vertex in each component). The theorem follows from the fact that approximating independent set within a constant factor is NP-hard (see, e.g., [arora2009computational]). ∎
The following corollary establishes an upper bound on the sybil penetration rate regardless of the attacker’s computational power. The result is formulated in terms of the second eigenvalue of the graph restricted to. (A formal definition of is provided in the Supplementary Material.)
Let be a community history wrt. a sequence of events . If every community with satisfies , then the expected sybil penetration in every under the stochastic model depicted above, is at most .
Recall that the size of the maximal independent set is a trivial upper bound on . The cardinality of an independent set in a -expander is at most [hoory2006expander]; thus, . It follows that the number of sybil identities in the community in a steady state is . ∎
Byzantine-Resilient Community Growth
Here we consider the challenge of byzantine-resilient community growth. Intuitively, the term byzantines aims to capture identifiers owned by agents that are acting maliciously, possibly in collaboration with other malicious agents. Formally, we define byzantines as follows.
Definition 10 (Byzantine and harmless identifiers, Byzantine penetration).
An identifier is said to be byzantine if it is either a sybil or the genuine identifier of a corrupt agent. Non-byzantine identifiers are referred to as harmless. We denote the byzantine and harmless identifiers in by , respectively. The byzantine penetration of a community is given by .
Since , it holds that for every community , hence an upper bound on the byzantine penetration also provides an upper bound on the sybil penetration.
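The relation between byzantine and sybil penetration can be checked on a toy example. As before, the sketch takes an omniscient view in which declarations are (agent, identifier) pairs, ignores repeated declarations of the same identifier, and uses hypothetical function names; it is meant only to illustrate Definition 10.

```python
from fractions import Fraction


def byzantine_and_sybil(declarations):
    """Return (byzantine identifiers, sybil identifiers) for a declaration list."""
    owner = {}    # identifier -> rightful owner
    genuine = {}  # agent -> genuine identifier
    sybils = set()
    for agent, identifier in declarations:
        if identifier in owner:
            continue  # assumption: an identifier cannot be declared twice
        owner[identifier] = agent
        if agent in genuine:
            sybils.add(identifier)
        else:
            genuine[agent] = identifier
    corrupt = {owner[s] for s in sybils}
    # Byzantine: all sybils plus the genuine identifiers of corrupt agents.
    byzantine = sybils | {genuine[h] for h in corrupt}
    return byzantine, sybils


def penetration(community, bad):
    """Fraction of a nonempty community belonging to the set 'bad'."""
    return Fraction(len(community & bad), len(community))


ledger = [("alice", "a1"), ("bob", "b1"), ("alice", "a2")]
byzantine, sybils = byzantine_and_sybil(ledger)
community = {"a1", "a2", "b1"}
beta = penetration(community, byzantine)
sigma = penetration(community, sybils)
```

In this example the genuine identifier `a1` counts as byzantine because its owner also declared the sybil `a2`, so the byzantine penetration is twice the sybil penetration.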
To achieve byzantine-resilient community growth, we need a stronger surety type, defined as follows:
- Surety of Type 4:
Rightful ownership of a genuine global identifier by an honest agent. Agent pledges Surety Type 3 and, furthermore, that is a genuine global identifier of an honest agent .
Here the pledging agent has to put even greater trust in the agent receiving the surety: Not only does the pledging agent have to trust that the other agent's past actions resulted in the identifier in question being her genuine global identifier, but she also has to take on faith that the other agent has not declared any sybils since and, furthermore, that she will not do so in the future. Note that a Type 4 surety is violated if the agent receiving the surety ever declares some other global identifier after declaring its genuine global identifier. See Figure 2 for illustrations of violations of sureties of Types 3 and 4.
We provide sufficient conditions for Type 4 sureties to be used for byzantine-resilient community growth.
Let be a community history wrt. sequence . Let , , and . Assume that , and that for all :
for all .
Every satisfies .
Then, every community has Byzantine penetration .
The definition of conductance and a proof are given in the Supplementary Material. Roughly speaking, Theorem 2 suggests that whenever: (1) the surety graph has a bounded degree, (2) a single identifier is added to the community at each time step, (3) sufficiently many edges are adjacent to community members, (4) edges between harmless and byzantine identifiers are relatively scarce, and (5) conductance within each community is sufficiently high, then the community may grow indefinitely with bounded byzantine penetration.
An analogous theorem was presented by Poupko et al. [poupko2019sybil], who assumed a notion of honest (what we refer to as genuine) and sybil identities without defining what they are; this also results in a difference in the ensuing definitions of harmless and byzantine identifiers.
We provided a formal foundation for genuine global identifiers, their mutual sureties and their applications.
While this paper is quite formal, we aimed the constructions to be readily amenable to implementation, and hinted at some needed implementation ingredients.
As promising future work, we note the following:
Directed surety graphs, as opposed to the undirected graphs considered here;
Surety graphs containing surety edges of various types combined;
Agent birth and death, to accommodate a dynamic, real-world setting;
Relaxations of the surety-based community growth assumptions.
More broadly, realizing the proposed solution entails developing additional components, notably sybil-resilient governance mechanisms, e.g. along the lines of [SRSC]; a mechanism for encouraging honest behavior and discouraging corrupt behavior, e.g. along the lines of [seuken2014sybil]; and a cryptocurrency to fuel such a mechanism and the system in general.
We thank the generous support of the Braginsky Center for the Interface between Science and the Humanities.