Symbolic Timed Observational Equivalence

01/12/2018 ∙ by Vivek Nigam, et al. ∙ SRI International

Intruders can infer properties of a system by measuring the time it takes for the system to respond to some request of a given protocol, that is, by exploiting time side channels. These properties may help intruders distinguish whether a system is a honeypot or a concrete system, helping them avoid defense mechanisms, or track a user among others, violating the user's privacy. Observational equivalence is the technical machinery used for verifying whether two systems are distinguishable. Moreover, efficient symbolic methods have been developed for automating the check of observational equivalence of systems. This paper introduces a novel definition of timed observational equivalence which also distinguishes systems according to their time side channels. Moreover, as our definition uses symbolic time constraints, it can be automated by using SMT-solvers.

1 Introduction

Time side channels can be exploited by intruders in order to infer properties of systems, helping them avoid defense mechanisms and track users, violating their privacy. For example, honeypots are normally used for attracting intruders in order to defend real systems from their attacks. However, as honeypots run over virtual machines whereas normal client systems usually do not, it takes longer for a honeypot to respond to some protocol requests. This information can be used by the attacker to determine which servers are real and which are honeypots. For another example, passports using RFID mechanisms have been shown to be vulnerable to privacy attacks. An intruder can track a particular passport by replaying messages of previous sessions and measuring response times.

The formal verification of such properties is different from usual reachability-based properties, such as secrecy, authentication and other correspondence properties. In the verification of reachability properties, one searches for a trace that exhibits the flaw, e.g., the intruder learning a secret. In attacks such as the ones described above, one searches instead for behaviors that can distinguish two systems, e.g., a behavior that can be observed when interacting with one system, but that cannot be observed when interacting with the other system. That is, one checks whether the systems are observationally distinguishable. This requires reasoning over sets of traces.

Various notions of observational equivalence have been proposed in the programming languages community as well as in concurrent systems [2, 7, 28, 22] using, for example, logical relations and bisimulation. Observational equivalence has also been proposed for protocol verification, notably in the work of Cortier and Delaune [14]. A number of properties, e.g., unlinkability and anonymity [3], have been reduced to the problem of observational equivalence. As protocol verification involves infinite domains, the use of symbolic methods has been essential for the success of such approaches.

The contribution of this paper is three-fold:

  • Symbolic Timed Observational Equivalence: We propose a novel definition of timed equivalence over timed protocol instances [30]. Timing information, e.g., duration of computation, is left symbolic and can be specified in the form of time constraints relating multiple time symbols, e.g., ;

  • SMT Solvers for proving Timed Observational Equivalence: SMT solvers are used in two different ways. We specify the operational semantics of timed protocols using Rewriting Modulo SMT [31]. Instead of instantiating time symbols with concrete values, in Rewriting Modulo SMT, a configuration of the system is symbolic and therefore may represent an unbounded number of concrete configurations. Rewriting of a symbolic configuration is only allowed if the set of (time) constraints in the resulting state is satisfiable. SMT-Solvers are used to perform this check. This not only means that there is a finite number of symbolic traces starting from a given configuration, but also considerably reduces the search space needed to enumerate these traces. We demonstrate this with experiments.

    The second application of SMT-Solvers is in the proof of timed observational equivalence, namely, to check whether the timing of observations can be matched. This check involves checking the satisfiability of formulas [17].

  • Implementation: Relying on the Maude [12] support for Rewriting Modulo SMT using the SMT-solvers CVC4 [4] or Yices [17], we implemented in Maude the machinery necessary for enumerating symbolic traces. However, as checking for the satisfiability of formulas [17] is not supported by Maude, we integrate our Maude machinery with the SMT solver Yices [17]. We carry out some proof-of-concept experiments demonstrating the feasibility of our approach.

Section 2 describes some motivating examples of how intruders can use time side channels to their benefit. We introduce the basic symbolic language in Section 3 and the timed protocol language in Section 4. Section 5 introduces symbolic timed observational equivalence and describes how to prove this property. Section 6 describes our implementation architecture and the experiments carried out. Finally, in Section 7, we conclude by commenting on related and future work.

Some missing proofs are shown in the Appendix.

2 Examples

We discuss some motivating examples illustrating how intruders can exploit time side channels of protocols.

Red Pill

Our first example is taken from [23]. The attack is based on the concept of red pills. The overall goal of the attacker is to determine whether some system is running on a virtual machine or not. As honeypots trying to lure attackers normally run on virtual machines, determining whether a system is running on a virtual machine gives an attacker one means to avoid honeypots [23]. A system running in a virtual machine and a system running on a concrete machine follow exactly the same protocol.

When an application connects to the malicious server, the server first sends a baseline request followed by a differential request. The time to respond to the baseline request is the same whether running in a virtual machine or not and is used for calibration. The time to respond to the differential request is longer when executed in a virtual machine. When not taking time into account, the set of traces for this exchange is the same whether the application is running on a virtual machine or not. However, if we also consider the time to respond to the two requests, the timed traces of applications running on virtual machines can be distinguished from those of applications running on native hardware.

Passport RFID

Our second example comes from the work of Chothia and Smirnov [11] investigating the security of e-passports. These passports contain an RFID tag that, when powered, broadcasts information intended for passport readers. Also, once powered, e-passport broadcasts can’t be turned off. Chothia and Smirnov identified a flaw in one of the passport’s protocols that makes it possible to trace the movements of a particular passport, without having to break the passport’s cryptographic key. In particular, if the attacker records one session between the passport and a legitimate reader, one of the recorded messages can be replayed to distinguish that passport from other passports. Assuming that the target carries their passport on them, an attacker could place a device in a doorway that would detect when the target entered or left a building. In the protocol, the passport receives an encryption and a MAC verifying the integrity of the encryption. The protocol first checks the MAC, and reports an error if the check fails. If the MAC check succeeds, it checks the encryption. This will fail if the encryption isn’t fresh. When the recorded (encryption, MAC) pair is replayed to the recorded passport, the MAC check will succeed but the encryption check will fail, while the MAC check will fail when carried out by any other passport as it requires a key unique to the passport. The time to failure is significantly longer for the targeted passport than for the others, since the other passports only perform the MAC check, which is faster.
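The timing difference behind this attack can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from [11]: the function `passport_response`, the durations `D_MAC` and `D_ENC`, and the dictionary encoding of a message are hypothetical, and no real cryptography is performed.

```python
# Hypothetical durations in arbitrary time units; real values would be measured.
D_MAC = 1.0   # time to check the MAC
D_ENC = 3.0   # time to check the encryption (including freshness)


def passport_response(passport_key, seen_nonces, message):
    """Return (reply, elapsed_time) for a received (encryption, MAC) pair.

    A message is modelled as a dict holding the key used for the MAC and the
    nonce inside the encryption.
    """
    elapsed = D_MAC
    if message["mac_key"] != passport_key:
        return "error", elapsed          # MAC check failed: fast reply
    elapsed += D_ENC
    if message["nonce"] in seen_nonces:
        return "error", elapsed          # stale encryption: slow reply
    return "ok", elapsed


# A message recorded from an earlier session with the target passport.
recorded = {"mac_key": "k_target", "nonce": "n1"}

target = passport_response("k_target", {"n1"}, recorded)  # replay to target
other = passport_response("k_other", set(), recorded)     # replay to another
```

Both passports reply with an error, so the untimed observations coincide; only the elapsed time (here `D_MAC + D_ENC` versus `D_MAC`) tells them apart.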

Anonymous Protocol

Abadi and Fournet [1] proposed an anonymous group protocol where members of a group can communicate with each other without revealing that they belong to the same group. A member of a group broadcasts a message, , encrypted with the shared group key. Whenever a member of a group receives this message, it is able to decrypt the message and then check whether the sender indeed belongs to the group and whether the message is directed to him. In this case, the receiver broadcasts an encrypted response .

Whenever a player that is not a member of the group receives the message , it does not simply drop the message, but sends a decoy message with the same shape as if he belonged to the group, i.e., in the same shape as . In this way, other participants and outsiders cannot determine whether two players belong to the same group or not.

However, as argued in [13], by measuring the time when a response is issued, an intruder can determine whether two players belong to the same group. This is because decrypting and generating a response take longer than just sending a decoy message.

3 Term Language

The basic term language contains usual cryptographic operators such as encryption, nonces, tuples. More precisely the term language is defined by the following grammar. We assume given text constants, and player names . We also assume a countable set of nonces, , and of symbols, , as well as a countable number of sorted variables, , where and are disjoint. Below represents a variable of sort player.

A term is ground if it does not contain any occurrence of variables and symbols. A term is symbolic if it does not contain any occurrence of variables, but it may contain occurrences of symbols. will range over symbolic terms. We define as the set of symbols appearing in a symbolic term.

It is possible to add other cryptographic constructions, such as hashes and signatures, but in order to keep things simple and more understandable, we only include encryption. As hashes and signatures can be specified using encryption, this is not limiting. Finally, it is easy to extend the results here with fresh keys. These are treated in the same way as nonces, but to keep it simple, we do not include them.

We will use two types of (capture avoiding) substitutions. Variable substitutions written which are maps from variables to symbolic terms . Symbol substitutions written mapping symbols to symbolic terms .

3.1 Symbolic Term Constraints

Intuitively, variables are entities that can be replaced by symbolic terms, while a symbol denotes a (possibly infinite) set of terms. For example, if the symbol can be instantiated by any one of the (symbolic) terms , then the symbolic term represents the set of terms:

This simple idea has enabled the verification of security protocols, which have an infinite search space on ground terms, but a finite state space using symbolic terms.

We formalize this idea by using derivability constraints. Derivability constraints are constructed over minimal sets defined below.

Definition 3.1

A set of symbolic messages is minimal if it satisfies the following conditions:

  • contains all guessable constants, such as player names and public keys;

  • does not contain tuples;

  • if if and only if where is the inverse key of ;

Formally, the symbolic terms derivable from is the smallest set defined inductively as follows:

  • if then ;

  • if and , then ;

  • if , then ;
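A derivability check of this kind can be sketched as follows. The sketch assumes that the three clauses above are membership in the minimal set, encryption of a derivable term with a derivable key, and tupling of derivable terms; the term encoding (strings for atoms, tagged Python tuples for `enc` and `tuple`) and the function name `derivable` are illustrative choices, not the authors' implementation.

```python
def derivable(term, knowledge):
    """Decide whether `term` can be built from the minimal set `knowledge`.

    Terms are strings (atoms), ("tuple", t1, ..., tn) or ("enc", m, k).
    Because a minimal set is assumed to be already decomposed, no
    decryption or projection step is needed here.
    """
    if term in knowledge:
        return True                      # the term is known as it is
    if isinstance(term, tuple) and term and term[0] in ("tuple", "enc"):
        # Build the term from its components: every component (including
        # the key of an encryption) must itself be derivable.
        return all(derivable(sub, knowledge) for sub in term[1:])
    return False


# Hypothetical minimal set: two player names, a public key, and an
# encryption that cannot be opened because its key k_ab is unknown.
S = frozenset({"alice", "bob", "pk_bob", ("enc", "n1", "k_ab")})
```

For instance, `derivable(("enc", ("tuple", "alice", "bob"), "pk_bob"), S)` holds, whereas the nonce `"n1"` is not derivable because the key `"k_ab"` is not in `S`.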

From two minimal sets, and , we can construct the minimal set, , representing the union of and by applying the following operations until a fixed point is reached starting from :

  • and , then ;

  • , then ;

  • , then .

For example, given the minimal sets:

The minimal set obtained by the union of and is:

We consider two types of constraints on terms: Derivability constraints (Definition 3.2) and comparison constraints (Definition 3.6).

Definition 3.2

A derivability constraint has the form , where is minimal. This constraint denotes that can be any (symbolic) term derived from .

For example, the derivability constraint

specifies that may be instantiated by, e.g., the terms , and so on.

To improve readability (and also reflect our implementation), we will elide in any constraint the guessable terms. For example, we write the derivability constraint above simply as as are all guessables, namely player names and public keys.

Notice that any denotes an infinite number of symbolic terms due to the tupling closure. We will abuse notation and use to denote that the symbolic term is in the set of terms that can be instantiated with. Moreover, we assume that for any given set of derivability constraints , there is at most one derivability constraint for any given , that is, if , then . We write . We write for the derivability constraint for in if it exists. Moreover, we write if the term can be derived from .

Definition 3.3

The symbol dependency graph of a given set of derivability constraints , written , is a directed graph defined as follows:

  • Its nodes are symbols in , that is, ;

  • It contains the edge if and only if and contains at least one occurrence of .

While in general the symbol dependency graph of can be cyclic, our operational semantics will ensure that these graphs are acyclic.

Consider the following set of derivability constraints:

Its dependency graph is the directed acyclic graph (DAG).

Whenever the dependency graph of a set of constraints is a DAG, we classify the set as acyclic. We can compute a topological sort of the DAG in linear time. For example, a topological sort of

is .
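Computing such an order is straightforward with a standard library. The following sketch uses Python's `graphlib`; the encoding of symbols as strings starting with `?`, the example constraints, and the edge direction (a symbol comes after every symbol occurring in its constraint set) are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def symbols_of(term):
    """Collect the symbols (strings starting with '?') occurring in a term."""
    if isinstance(term, str):
        return {term} if term.startswith("?") else set()
    # Compound terms are tagged tuples such as ("enc", m, k).
    return set().union(*(symbols_of(sub) for sub in term[1:]))


def dependency_order(constraints):
    """Topologically sort the symbols of a set of derivability constraints.

    `constraints` maps each symbol to the set of terms it may be derived
    from. Raises graphlib.CycleError if the dependency graph is cyclic.
    """
    graph = {sym: set() for sym in constraints}
    for sym, terms in constraints.items():
        for term in terms:
            graph[sym] |= symbols_of(term)   # predecessors of `sym`
    return list(TopologicalSorter(graph).static_order())


# Hypothetical acyclic constraint set over three symbols.
constraints = {
    "?a": {"n1"},
    "?b": {("enc", "?a", "k")},
    "?c": {("tuple", "?a", "?b")},
}
order = dependency_order(constraints)
```

Here `?b` depends on `?a`, and `?c` depends on both, so the only valid order lists `?a` first and `?c` last.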

Given a set of derivability constraints, we can now formally specify the set of terms that a symbolic term denotes.

Definition 3.4

Let be a symbolic term. Let be an acyclic set of derivability constraints. Assume . We define the operator as the set of symbolic terms obtained by replacing all occurrences of in by a term . Formally, the set:

Moreover, for a set of symbolic terms is the set .

Let be any topological sort of the DAG . Then the meaning of a symbolic term with respect to , written , is the set obtained by applying consecutively as follows:

For example, is the set of terms:

It contains the terms , , . The set contains the terms , by applying to the term the substitution followed by .

Notice that for any acyclic set of derivability constraints such that its lowest height symbols (w.r.t. ) have constraints of the form where are ground terms, then is an (infinite) set of ground terms. This is because the successive application of will eventually eliminate all symbols.

Given terms , we describe how to check whether . We first build the matching substitution from symbols in to (sub)terms in . If no such matching substitution exists, then . For each , let . We check whether recursively as follows:

  • If , return true;

  • If , then we check whether ;

  • if , then for each , we check whether .

Definition 3.5

if for each , .

The following definitions specify the second type of term constraints called comparison constraints.

Definition 3.6

A comparison constraint is either an equality constraint of the form or an inequality constraint of the form .

A set of comparison constraints should be interpreted as a conjunction of constraints. The following definition specifies when it is satisfiable.

Definition 3.7

Let be a set of derivability constraints and be a set of comparison constraints. The set is satisfiable w.r.t. , written , if there is a substitution mapping all symbols in to ground terms in , such that:

  • for all equality constraints , ;

  • for all inequality constraints , .

We define the procedure below, , for checking whether a set of comparison constraints is satisfiable.

Definition 3.8

Let be a (finite) set of comparison constraints and a set of derivability constraints. Let be all the equality constraints in . Then is true if and only if

  1. There is a unifier of the terms and mapping symbols to symbolic terms, that is, ;

  2. For all inequality constraints , ;

  3. is consistent with (as done in Section 3.3).

Lemma 3.9

if and only if .

Moreover, the meaning of a symbolic term should take comparison constraints into account. That is, it should not be possible to replace a symbol by a term that falsifies some comparison constraint. We extend Definition 3.4 accordingly.

Definition 3.10

Let be an acyclic set of derivability constraints and a set of comparison constraints. The meaning of a symbolic term w.r.t. and , written , is the set of terms such that there exists a matching substitution :

  • ;

  • For all equality constraints , ;

  • For all inequality constraints , .

For example, and a set of a single comparison constraint . The term , but . This is because the matching substitution turns the constraint false: .

3.2 Symbolic Time Constraints

Assume a time signature which is disjoint from the message alphabet . It contains numbers (real and natural), variables and pre-defined functions.

Time Expressions are constructed inductively by applying arithmetic symbols to time expressions. For example is a Time Expression. The symbols range over Time Expressions. We do not constrain the set of numbers and function symbols in . However, in practice, we allow only the symbols supported by the SMT solver used. All examples in this paper will contain SMT supported symbols (or equivalent). Finally, the time variable will be a keyword in our protocol specification language denoting the current global time.

Definition 3.11 (Symbolic Time Constraints)

Let be a time signature. The set of symbolic time constraints is constructed inductively using time expressions as follows: Let be time expressions, then

are Symbolic Time Constraints.

For example, is a Time Constraint. Time Constraints will range over .

Intuitively, given a set of time constraints , each of its models with concrete instantiations for the time variables corresponds to a particular scenario. This means that one single set of time constraints denotes a possibly infinite number of concrete scenarios. For example, the set of constraints has an infinite number of models, e.g., .

Finally, SMT-solvers, such as CVC4 [4] and Yices [17], can check for the satisfiability of a set of time constraints.
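While finding a model is the solver's job, checking that a given assignment is a model is simple and can be sketched directly. The encoding below (triples for expressions and constraints), the names `evaluate` and `is_model`, the set of relation symbols, and the example constraint set are all assumptions for illustration; they are not the paper's concrete syntax.

```python
import operator
from fractions import Fraction

RELATIONS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
             ">=": operator.ge, ">": operator.gt}
ARITHMETIC = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, model):
    """Evaluate a time expression under `model` (variable name -> number).

    An expression is a number, a variable name, or a triple (op, e1, e2).
    """
    if isinstance(expr, str):
        return Fraction(model[expr])
    if isinstance(expr, tuple):
        op, left, right = expr
        return ARITHMETIC[op](evaluate(left, model), evaluate(right, model))
    return Fraction(expr)


def is_model(model, constraints):
    """True if every constraint (e1, relation, e2) holds under `model`."""
    return all(RELATIONS[rel](evaluate(e1, model), evaluate(e2, model))
               for e1, rel, e2 in constraints)


# Hypothetical constraint set: tt1 >= tt0 + 2 and tt2 < tt1 + 5.
constraints = [("tt1", ">=", ("+", "tt0", 2)),
               ("tt2", "<", ("+", "tt1", 5))]
```

Both `{"tt0": 0, "tt1": 2, "tt2": 6}` and `{"tt0": 0, "tt1": 2.5, "tt2": 3}` are models of this set, which illustrates how one set of constraints stands for many concrete scenarios.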

3.3 Symbolic Constraint Solving

For protocol verification, we will assume a traditional Dolev-Yao intruder [15], that is, an intruder that can construct messages from his knowledge by tupling and encrypting messages. However, he cannot decrypt a message for which he does not possess the inverse key. This is captured by the definition of minimal sets (Definition 3.1).

Definition 3.12

An intruder knowledge is a minimal set of symbolic terms.

During protocol execution, the intruder sends messages to honest participants constructed from his knowledge base. Suppose an honest player is ready to receive a message matching a term , possibly containing variables. Rather than considering all possible ground instances of that the intruder could send, we consider a finite representation of this set, namely symbolic messages where the possible values of the symbols are constrained by derivability constraints. To compute this representation, the intruder replaces variables with symbolic terms, possibly containing fresh symbols, and then constrains the symbols so that the allowed instances are exactly the terms matching that the intruder can derive from his current knowledge .

For example, consider the term (which is expected as input by an honest player). Here and are variables and is constrained by derivability constraints . We create two fresh symbols and for, respectively, the variables and . We use to denote such substitution of variables by symbolic terms. In this example . We then obtain .

It remains to solve the following problem:

Given an intruder knowledge, , and a set of derivability constraints constraining the symbols in , find a representation of all instances of a symbolic term , satisfying , that can be generated from .

We implemented the function called that enumerates all possible instances. Its specification is in the Appendix. We describe informally next and illustrate it with some examples. A similar algorithm is also used by [14].

In particular, takes as input a term , which is expected by the honest participant, the intruder knowledge and the derivability constraints for the existing symbols. then generates as output a pair:

where maps the variables of to symbols, and each is a solution to the problem above for . If , then there are no solutions, that is, the intruder is not able to generate a term which matches .

Intuitively, the function constructs a solution by either matching with a term in his knowledge (base case) or constructing from terms in and using tupling and encryption. The following example illustrates the different cases involved:

Example 3.13

Consider the following cases for deriving the term .

  • Case 1 (matching with a term in ): Assume:

    Then the solution of is:

    where and is a fresh symbol. Notice that since is mapped to a particular term (), no derivability constraint for it is generated. Additionally, notice that is constrained to be the same as . This causes the removal of the derivability constraint ;

  • Case 2 (constructing terms from ): Assume that and has no encryption term. Then the solution of is:

    which corresponds to generating the term .

  • Case 3 [No Solution]: Assume that and . Since cannot be instantiated to , the intruder cannot use the term .

4 Timed Protocol Language

The language used to specify a cryptographic protocol has the standard constructions, such as the creation of fresh values, sending and receiving messages. Moreover, it also includes “if then else” constructors needed to specify, for example, the RFID protocol used by passports. A protocol is composed of a set of roles.

Definition 4.1 (Timed Protocols)

The set of Timed Protocols, , is composed of Timed Protocol Roles, , which are constructed by using commands as specified by the following grammar:

Intuitively, generates a fresh value binding it to the variable , denotes sending the term and receiving a term, and denotes that if can be matched with , that is, instantiate the variables in so that the resulting term is , then the protocol proceeds by executing and otherwise . A command is only applicable if the associated constraint is satisfiable. We elide the associated time constraint whenever is a tautology, that is, it is always true.

Example 4.2

The Needham-Schroeder [29] protocol is specified as follows where are variables:

Example 4.3

Consider the following protocol role which is a modification of Alice’s role in the Needham-Schroeder’s protocol (Example 4.2):

Here, Alice checks whether the received message has the expected shape before proceeding. If it does not have this shape, then she sends an error message.

Example 4.4

The following role specifies the verifier of a (very simple) distance bounding protocol [8]:

It creates a fresh constant and sends it to the prover, remembering the current global time by assigning it to the time variable . Finally, when it receives the response it checks whether the current time is less than .

Example 4.5 (Passport)

Timed conditionals can be used to specify the duration of operations, such as checking whether some message is of a given form. In practice, the duration of these operations can be measured empirically to obtain a finer analysis of the protocol [11].

For example, consider the following protocol role:

This role creates a fresh value and sends it. Then it is expecting a pair of two messages and , remembering at time variable when this message is received. It then checks whether the first component is of the form , i.e., it is the correct MAC. This operation takes time units. The time variable is equal to the time , i.e., the time when the message was received plus the MAC check duration. If the MAC is not correct, an message is sent exactly at time . Otherwise, if the first component, , is as expected, the role checks whether the second component, , is an encryption of the form , which takes (a longer) time . If so it sends the message, otherwise the message, both at time which is .

Example 4.6 (Red Pill Example)

We abstract the part of sending the baseline message, e.g., the messages that establish the connection to the server, and the part that sends the differential messages. We assume that it takes to complete the exchange of the baseline messages.

Then the part of the protocol that depends on the application starts. We abstract this part using the messages and . If the application is running over a virtual machine, then takes time units; otherwise takes time units, where dVirtual > dReal.

The intruder can distinguish whether an application is running over a virtual machine or not by measuring the time it takes to complete the exchange of and messages.
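This distinction can be made concrete by comparing traces with and without their time stamps. The sketch below is illustrative only: the message names, the concrete durations `D_BASELINE`, `D_REAL` and `D_VIRTUAL`, and the trace encoding as `(direction, message, time)` triples are assumptions, chosen so that the virtual machine duration exceeds the native one.

```python
# Hypothetical durations in arbitrary time units.
D_BASELINE = 2.0   # baseline exchange, same on both kinds of machine
D_REAL = 1.0       # differential exchange on native hardware
D_VIRTUAL = 4.0    # differential exchange on a virtual machine (> D_REAL)


def red_pill_trace(d_app):
    """Timed trace of the application, as (direction, message, time)."""
    t = 0.0
    trace = [("recv", "baseline_req", t)]
    t += D_BASELINE
    trace.append(("send", "baseline_done", t))
    trace.append(("recv", "diff_req", t))
    t += d_app                       # the only machine-dependent step
    trace.append(("send", "diff_resp", t))
    return trace


def untimed(trace):
    """Project a timed trace onto its untimed observations."""
    return [(direction, message) for direction, message, _ in trace]


real = red_pill_trace(D_REAL)
virtual = red_pill_trace(D_VIRTUAL)
```

The projections `untimed(real)` and `untimed(virtual)` are equal, while the timed traces differ in the time stamp of the final response.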

Example 4.7 (Anonymous Protocol)

We specify (a simplified version of) the anonymous group protocol proposed by Abadi and Fournet for private authentication [1]. Whenever a broadcast message is received by an agent, it checks whether it has been encrypted with the group key . If this is the case, then it checks whether the player sending the message with key is part of the group. If so, then it sends a response encrypted with his private key. Otherwise, he sends a decoy message.

Notice the use of time constraints to capture that the steps of the protocol take some time, namely and .

4.1 Operational Semantics for Timed Protocols

The operational semantics of timed protocols is given in Figure 1. Its rules rewrite configurations, which are defined below:

Definition 4.8

A symbolic term configuration has the form , where

  • is a set of player roles of the form composed by an identifier, , a protocol , and a set of known keys ;

  • is the intruder knowledge;

  • is a set of derivability constraints;

  • is a set of comparison constraints;

  • is a set of time constraints;

  • is a time symbol representing global time.

The operational semantics of timed protocols is defined in Figure 1. The New rule replaces the (bound) variable by a fresh nonce . The Send rule sends a message which is then added to the intruder knowledge. The Receive rule expects a term of the form . The function returns the variable substitution and a set of solutions . Each solution intuitively generates a different trace. We apply in the remainder of the program and apply the symbol substitution to all symbols in the resulting configuration. This rule also has a proviso that the message is encrypted with keys that can be decrypted by the honest participant. This is specified by the function isReceivable. Finally, it also adds to the set of keys of the honest participant , the keys he can learn from the message . The rule If-true checks whether the terms and can be matched from the intruder knowledge . This is done by the function which is defined in a similar fashion as . It then adds the equality constraint to the set of comparison constraints. Finally, the rule If-false replaces the variables in by fresh symbols, constrained in with the intruder knowledge. That is, if is a fresh symbol, then . It also adds the corresponding inequality constraint. The intuition of replacing variables in by fresh symbols is to specify that for any instance of these variables, the resulting term cannot be matched with , as the inequality constraint specifies.

Example 4.9

Consider the Needham-Schroeder protocol in Example 4.2. Assume that the intruder initially only knows his secret key (and the guessables), and there are no symbols . An execution of Alice’s protocol role is as follows. Alice creates a fresh constant and sends the message . At this point, the intruder knowledge is:

He now can send a message to , namely where are fresh and constrained . At this point, Bob creates a fresh value and sends the message . The intruder learns this message (and no further):

Now, the intruder can fool Alice by sending her a message of the form . We create a fresh symbol for obtaining and attempt to generate this message from using . Indeed we can generate this message using . This generates the . This substitution is consistent with . Notice that is not constrained. The protocol finishes by the intruder simply forwarding the message sent by Alice to Bob. Bob then thinks he is communicating with Alice, but he is not.

Each rule has two general provisos. The first is that the resulting set of comparison constraints should be consistent. This can be checked as defined in Definition 3.8.

The second, more interesting, condition is on the time symbols. Whenever a rule is applied, time constraints are added to the configuration’s constraint set. These time constraints are obtained by replacing in with together with the constraint specifying that time can only advance. The rule is fired only if the resulting set of time constraints () is consistent, which can be checked using an SMT solver. This way of specifying systems is called Rewriting Modulo SMT [31].
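The accumulate-and-check discipline can be sketched in a few lines. To keep the example self-contained, the sketch below swaps the SMT solver for a much simpler decision procedure: it only handles non-strict difference constraints of the form x - y <= c, whose satisfiability is decided by a Bellman-Ford negative-cycle test. The names `satisfiable` and `apply_rule` and the example constraints are hypothetical; the actual implementation relies on Maude with an SMT solver and supports richer constraints.

```python
def satisfiable(constraints):
    """Decide a set of difference constraints (x, y, c) meaning x - y <= c.

    The set is satisfiable iff the constraint graph (edge y -> x of weight
    c) has no negative cycle, which Bellman-Ford relaxation detects.
    """
    nodes = {v for x, y, _ in constraints for v in (x, y)}
    dist = {v: 0.0 for v in nodes}   # virtual source at distance 0 from all
    for _ in range(len(nodes) + 1):
        changed = False
        for x, y, c in constraints:
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return True              # fixed point reached: a model exists
    return False                     # still relaxing: negative cycle


def apply_rule(current, added):
    """Return the extended constraint list, or None if the rule cannot fire."""
    extended = current + added
    return extended if satisfiable(extended) else None


# Accumulated so far: tt1 >= tt0 + 3, written as tt0 - tt1 <= -3.
current = [("tt0", "tt1", -3)]
# Time advances (tt2 >= tt1) and the rule demands tt2 <= tt0 + bound.
too_tight = [("tt1", "tt2", 0), ("tt2", "tt0", 2)]
feasible = [("tt1", "tt2", 0), ("tt2", "tt0", 5)]
```

With the bound 2 the rule is blocked (`apply_rule(current, too_tight)` returns `None`), whereas with the bound 5 it fires and the new constraints become part of the configuration.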

Fig. 1: Operational semantics for basic protocols. Here is a substitution mapping the variables in to fresh symbols; and the function applies the symbol substitution to the range of the variable substitution ; is the time constraint obtained by replacing in by the global time ; and . The function isReceivable checks whether the message can be decrypted with the keys the participant has in . Every rule has the proviso that the set of comparison constraints and the set of time constraints should be satisfiable.

Definition 4.10

Let be the set of rules in Figure 1. A timed trace is a labeled sequence of transitions written such that for all , is an instance of a rule in and is if it is an instance of Send rule sending term at time , if it is an instance of Receive rule receiving term at time , and otherwise.

The use of rewriting modulo SMT considerably reduces the search space. Timed protocols are infinite state systems, as time symbols can be instantiated by any (positive) real number. With the use of rewriting modulo SMT we simply have to accumulate constraints. Only traces with satisfiable sets of time constraints are allowed. Indeed, as we describe in Section 6, the number of traces is not only finite (as stated in the following Proposition), but very low (fewer than 40 traces). As observational equivalence involves the matching of traces, checking for observational equivalence can be automated.

Proposition 4.11

The set of traces starting from any configuration is finite.

Proposition 4.12

Let be a trace. For any , such that