1 Introduction
Information leaks through timing side channels remain a challenging problem [38, 32, 27, 17, 14, 51, 40]. A program leaks secret information through timing side channels if an attacker can deduce secret values (or their properties) by observing response times. We consider the problem of mitigating timing side channels. Unlike elimination techniques [7, 34, 50] that aim to completely remove timing leaks without considering the performance penalty, the goal of mitigation techniques [29, 10, 52] is to weaken the leaks, while keeping the penalty low.
We define the Shannon mitigation problem, which decides whether there is a mitigation policy that achieves a lower bound on a given entropy-based security measure while respecting an upper bound on the performance overhead. Consider an example where the program under analysis has a secret variable with seven possible values and three different timing behaviors, each forming a cluster of secret values. It takes 1 second if the secret value is 1, longer if the secret is one of four other values, and longest if the secret is one of the remaining two values. The entropy-based measure quantifies the remaining uncertainty about the secret after timing observations. Min-guess entropy [28, 45, 11] for this program is 1, because if the observed execution time is 1 second, the attacker guesses the secret in one try. A mitigation policy involves merging some timing clusters by introducing delays. A good solution might be to introduce a delay if the secret is 1, which merges two timing clusters. But this might be disallowed by the budget on the performance overhead. Therefore, another solution must be found, such as introducing a different delay when the secret is one.
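To make the arithmetic concrete, the following minimal Python sketch applies the min-guess formula of Köpf and Basin [28] (defined formally in Section 3) to the cluster sizes of this example (1, 4, and 2 secret values) and to the two ways of merging the singleton cluster. Delays and the overhead budget are not modeled; the sketch only shows why merging raises the measure.

```python
def min_guess(class_sizes):
    """Min-guess entropy under a uniform prior (Köpf and Basin): min over clusters of (|B| + 1) / 2."""
    return min((b + 1) / 2 for b in class_sizes)


clusters = [1, 4, 2]            # three timing clusters over seven secret values
print(min_guess(clusters))      # 1.0: the singleton cluster {1} is guessed in one try
print(min_guess([1 + 2, 4]))    # 2.0: delay secret 1 into the two-value cluster
print(min_guess([1 + 4, 2]))    # 1.5: delay secret 1 into the four-value cluster
```

Both merges remove the singleton cluster; which one is acceptable depends on the delay it costs.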
We develop two variants of the Shannon mitigation problem: deterministic and stochastic. The mitigation policy of the deterministic variant requires us to move all secret values associated with an observation to another observation, while the policy of the stochastic variant allows us to move only a portion of the secret values in an observation to another one. We show that the deterministic variant of the Shannon mitigation problem is intractable and propose a dynamic programming algorithm that approximates the optimal solution by searching through a restricted set of solutions. We develop an algorithm that reduces the stochastic variant to a well-known optimization problem that depends on the entropy-based measure. For instance, with min-guess entropy, the optimization problem is mixed-integer linear programming.
We consider a threat model where an attacker knows the public inputs (known-message attacks [29]) and, furthermore, where the public input changes much more often than the secret inputs (for instance, secrets such as bank account numbers do not change often). As a result, for each secret, the attacker observes a timing function of the public inputs. We call this model functional observations of timing side channels.
We develop our tool Schmit, which has three components: side channel discovery [49], search for the mitigation policy, and policy enforcement. The side channel discovery builds the functional observations [49] and measures the entropy of the secret set after the observations. The mitigation policy component implements the dynamic programming and optimization algorithms. The enforcement component is a monitoring system that uses the program internals and functional observations to enforce the policy at runtime.
To summarize, we make the following contributions:

We formalize the Shannon mitigation problem with two variants and show that finding a deterministic mitigation policy is NP-hard.

We describe two algorithms for synthesizing the mitigation policy: one, for the deterministic variant, is based on dynamic programming, runs in polynomial time, and yields an approximate solution; the other solves the stochastic variant of the problem with optimization techniques.

We consider a threat model that results in functional observations. On a set of microbenchmarks, we show that existing mitigation techniques are not both secure and efficient under this threat model.

We evaluate our approach on five real-world Java applications. We show that Schmit scales, synthesizing mitigation policies within a few seconds, and significantly improves the security (entropy) of the applications.
2 Overview
First, we describe the threat model considered in this paper. Second, we describe our approach on a running example. Third, we compare the results of Schmit with the existing mitigation techniques [29, 10, 52] and show that Schmit achieves the highest entropy (i.e., best mitigation) for all three entropy objectives.
Threat Model. We assume that the attacker has access to the source code and the mitigation model, and she can sample the runtime of the application arbitrarily many times on her own machine. During an attack, she intends to guess a fixed secret of the target machine by observing the mitigated running time. Since we consider attack models in which the attacker knows the public inputs and the secret inputs are less volatile than the public inputs, her observations are functional observations: for each secret value, she learns a function from the public inputs to the running time.
Example 2.1.
Consider the program shown in Fig 1(a). It takes secret and public values as inputs. The running time depends on the number of set bits in both secret and public inputs. We assume that secret and public inputs can be between 1 and 1023. Fig 1(b) shows the running time of different secret values as timing functions, i.e., functions from the public inputs to the running time.
Side channel discovery. One can use existing tools to find the initial functional observations [49, 48]. In Example 2.1, the functional observations are functions of $y$, where $y$ is a variable whose value is the number of set bits in the public input; there is one functional observation for each number of set bits in the secret. The corresponding secret classes after this observation are $\langle S_1, \ldots, S_{10} \rangle$, where $S_n$ is the set of secret values that have $n$ set bits. The sizes of the classes are $\langle 10, 45, 120, 210, 252, 210, 120, 45, 10, 1 \rangle$. We use a norm as the metric to calculate the distance between the functional observations. This distance (penalty) matrix specifies the extra performance overhead of moving from one functional observation to another. Assuming a uniform distribution over the secret input, the Shannon entropy, guessing entropy, and min-guess entropy are 7.3, 90.1, and 1.0, respectively. These entropies are defined in Section 3 and measure the remaining entropy of the secret set after the observations. We aim to maximize the entropy measures while keeping the performance overhead below a threshold, say 60% for this example.

Mitigation with Schmit. We use our tool Schmit to mitigate the timing leaks of Example 2.1. The mitigation policy for the Shannon entropy objective is shown in Fig 2(a). The policy results in two classes of observations: it moves some functional observations to one target observation and all other observations to a second one. To enforce this policy, we use a monitoring system at runtime. The monitoring system uses a decision tree model of the initial functional observations, which characterizes each functional observation with associated program internals such as method calls or basic block invocations [48, 47]. The decision tree model for Example 2.1 is shown in Fig 2(b). The monitoring system records program internals and matches them against the decision tree model to detect the current functional observation. Then, it adds delays, if necessary, to the execution time in order to enforce the mitigation policy. The resulting mitigated functional observations and secret classes are shown in Fig 2(c). The performance overhead of this mitigation is 43.1%. The Shannon, guessing, and min-guess entropies improve to 9.7, 459.6, and 193.5, respectively.

Comparison with the state of the art. We compare our mitigation results to the black-box mitigation scheme [10] and bucketing [29].

Black-box double scheme technique. We use the double scheme technique [10] to mitigate the leaks of Example 2.1. This mitigation uses a prediction model to release events at scheduled times. Predictions are made per epoch: the prediction for releasing a request depends on the arrival time of the request and on a basis that models the prediction scheme in the current epoch. We assume that the requests are of the same type and that the sequence of public-input requests for each secret is received at the beginning of the epoch. Fig 3(a) shows the functional observations after applying the predictive mitigation. With this mitigation, the number of classes of observations is reduced from 10 to 4. The performance overhead is 39.9%. The Shannon, guessing, and min-guess entropies increase to 9.00, 321.5, and 5.5, respectively.

Bucketing. We consider the mitigation approach with buckets [29]. For Example 2.1, if the attacker does not know the public input (unknown-message attacks [29]), the observations are as shown in Fig 3(b). We apply the bucketing algorithm of [29] to these observations, and it finds the two buckets shown with red lines in Fig 3(b). The bucketing mitigation moves each observation to the closest bucket. Without functional observations, there are 2 classes of observations. However, with functional observations, there are more than 2 classes. Fig 3(c) shows how the pattern of observations leaks through functional side channels: there are 7 classes of observations. The Shannon, guessing, and min-guess entropies are 7.63, 102.3, and 1.0, respectively. Overall, Schmit achieves the highest entropy measures for all three objectives under the performance overhead bound of 60%.

3 Preliminaries
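For a finite set $Q$, we use $|Q|$ for its cardinality. A discrete probability distribution, or just distribution, over a set $Q$ is a function $d : Q \to [0, 1]$ such that $\sum_{q \in Q} d(q) = 1$. Let $\mathcal{D}(Q)$ denote the set of all discrete distributions over $Q$. We say a distribution $d \in \mathcal{D}(Q)$ is a point distribution if $d(q) = 1$ for some $q \in Q$. Similarly, a distribution $d \in \mathcal{D}(Q)$ is uniform if $d(q) = 1/|Q|$ for all $q \in Q$.
Definition 1 (Timing Model).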
The timing model of a program $\mathcal{P}$ is a tuple $(X, Y, S, \delta)$, where $X$ is the set of secret-input variables, $Y$ is the set of public-input variables, $S$ is a finite set of secret inputs, and $\delta$ is the execution-time function of the program over the secret and public inputs.
We assume that the adversary knows the program and wishes to learn the value of the secret input. To do so, for some fixed secret value $s \in S$, the adversary can invoke the program to estimate (to an arbitrary precision) the execution time of the program. If the set of public inputs is empty, i.e. $Y = \emptyset$, the adversary can only make scalar observations of the execution time corresponding to a secret value. In the more general setting, however, the adversary can arrange the observations in a functional form by estimating an approximation of the timing function of the program.

A functional observation of the program $\mathcal{P}$ for a secret input $s \in S$ is the function $f_s$ defined as $f_s(y) = \delta(s, y)$ for every public input $y$, where $\delta$ is the execution-time function of the timing model. Let $F$ be the finite set of all functional observations of the program $\mathcal{P}$. We define an order $\preceq$ over the functional observations $F$: for $f, g \in F$ we say that $f \preceq g$ if $f(y) \le g(y)$ for all $y$.
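The set $F$ characterizes an equivalence relation $\equiv_F$, namely secrets with equivalent functional observations, over the set $S$, defined as follows: $s \equiv_F s'$ if there is an $f \in F$ that is the functional observation of both $s$ and $s'$. Let $S_F = \langle S_1, \ldots, S_k \rangle$ be the quotient space of $S$ characterized by the observations $F = \langle f_1, \ldots, f_k \rangle$. We write $S_f$ for the secret set corresponding to the observation $f$. Let $B_i$ be the size of the observational equivalence class $S_{f_i}$ in $S_F$, i.e. $B_i = |S_{f_i}|$ for $1 \le i \le k$, and note that $\sum_{i=1}^{k} B_i = |S|$.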
Shannon entropy, guessing entropy, and min-guess entropy are three prevalent information metrics to quantify information leaks in programs. Köpf and Basin [28] characterize expressions for various information-theoretic measures of information leaks when there is a uniform distribution on $S$, given below.
Proposition 1 (Köpf and Basin [28]).
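Let $F = \langle f_1, \ldots, f_k \rangle$ be a set of observations and let $S$ be the set of secret values. Let $B_1, \ldots, B_k$ be the corresponding sizes of the secret sets in each class of observation and $B = \sum_{i=1}^{k} B_i$. Assuming a uniform distribution on $S$, entropies can be characterized as:

Shannon Entropy: $SE(S \mid F) = \frac{1}{B} \sum_{i=1}^{k} B_i \log_2 B_i$,

Guessing Entropy: $GE(S \mid F) = \frac{1}{2B} \sum_{i=1}^{k} B_i^2 + \frac{1}{2}$, and

MinGuess Entropy: $mGE(S \mid F) = \min_{1 \le i \le k} \frac{B_i + 1}{2}$.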
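The three expressions translate directly into code. The sketch below is a minimal Python rendering of Proposition 1, evaluated on the class sizes of Example 2.1, where grouping the 10-bit secrets 1..1023 by their number of set bits gives classes of binomial sizes C(10, n); the function names are our own.

```python
import math


def shannon_entropy(sizes):
    """SE = (1/B) * sum_i B_i log2 B_i for a uniform prior over B = sum_i B_i secrets."""
    total = sum(sizes)
    return sum(b * math.log2(b) for b in sizes if b > 0) / total


def guessing_entropy(sizes):
    """GE = (1/(2B)) * sum_i B_i^2 + 1/2."""
    total = sum(sizes)
    return sum(b * b for b in sizes) / (2 * total) + 0.5


def min_guess_entropy(sizes):
    """mGE = min_i (B_i + 1) / 2 over the non-empty classes."""
    return min((b + 1) / 2 for b in sizes if b > 0)


# Example 2.1: the secrets 1..1023 grouped by their number of set bits n = 1..10,
# so class n contains C(10, n) secrets.
sizes = [math.comb(10, n) for n in range(1, 11)]
print(sum(sizes))                           # 1023
print(round(shannon_entropy(sizes), 2))     # 7.3
print(min_guess_entropy(sizes))             # 1.0
print(round(guessing_entropy(sizes), 1))
```

The Shannon and min-guess values agree with the 7.3 and 1.0 reported for Example 2.1 in Section 2.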
4 Shannon Mitigation Problem
Our goal is to mitigate the information leakage due to timing side channels by adding synthetic delays to the program. An aggressive, but commonly used, mitigation strategy aims to eliminate the side channels by adding delays such that every secret value yields a common functional observation. However, this strategy may often be impractical, as it may result in unacceptable performance degradation of the response time. Assuming a well-known penalty function associated with the performance degradation, we study the problem of maximizing entropy while respecting a bound on the performance degradation. We dub the decision version of this problem Shannon mitigation.
Adding synthetic delays to the execution time of the program, so as to mask the side channel, can give rise to new functional observations that correspond to upper envelopes of various combinations of the original observations. Let $F = \langle f_1, \ldots, f_k \rangle$ be the set of functional observations. For $\emptyset \neq I \subseteq \{1, \ldots, k\}$, let $f_I$ be the functional observation corresponding to the upper envelope of the functional observations $\{f_i : i \in I\}$, i.e. $f_I(y) = \max_{i \in I} f_i(y)$. Let $\mathcal{G}(F) = \{f_I : \emptyset \neq I \subseteq \{1, \ldots, k\}\}$ be the set of all possible functional observations resulting from the upper-envelope calculations. To change the observation of a secret value with functional observation $f_i$ to a new observation $f_I$ (we assume that $i \in I$), we need to add the delay function $d_i^I(y) = f_I(y) - f_i(y)$.
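The following Python sketch shows the upper envelope and the delay function on sampled timing functions. It assumes each functional observation is available as running times sampled on a common grid of public inputs; the three observations used here are hypothetical. Note that the envelope of two crossing observations coincides with neither of them, which is why mitigation can create new functional observations.

```python
import numpy as np

y = np.arange(1, 11)                  # hypothetical grid of public inputs
observations = {                      # hypothetical sampled functional observations
    1: 1.0 * y,
    2: 2.0 * y,
    3: 1.5 * y + 4.0,
}


def upper_envelope(obs, subset):
    """f_I(y) = max_{i in I} f_i(y), evaluated pointwise on the sampled grid."""
    return np.max(np.stack([obs[i] for i in sorted(subset)]), axis=0)


def delay(obs, i, subset):
    """d(y) = f_I(y) - f_i(y), the non-negative padding that lifts f_i onto f_I (needs i in I)."""
    if i not in subset:
        raise ValueError("the lifted observation must belong to the subset")
    return upper_envelope(obs, subset) - obs[i]


env = upper_envelope(observations, {2, 3})
print(env[0], env[-1])                # 5.5 20.0: f_3 dominates for small y, f_2 for large y
print(delay(observations, 2, {2, 3})[:3])
```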
Mitigation Policies.
Let $G \subseteq \mathcal{G}(F)$ be a set of admissible post-mitigation observations. A mitigation policy is a function $\mu : F \to \mathcal{D}(G)$, where $\mathcal{D}(G)$ is the set of distributions over $G$, that for each class of secrets suggests a probability distribution over the functional observations. We say that a mitigation policy is deterministic if for all $f \in F$ we have that $\mu(f)$ is a point distribution. Abusing notation, we represent a deterministic mitigation policy as a function $\mu : F \to G$. A mitigation policy $\mu$ recommends to a program analyst a probability $\mu(f)(g)$ to elevate a secret input from the observational class $f$ to the class $g$ by adding $g(y) - f(y)$ units of delay to the corresponding execution time for all $y$. We assume that the mitigation policies respect the order, i.e. for every mitigation policy $\mu$ and for all $f \in F$ and $g \in G$, we have that $\mu(f)(g) > 0$ implies $f \preceq g$. Let $M(F, G)$ be the set of mitigation policies from the set of observational clusters $F$ into the clusters $G$.
For the functional observations $F$ and a mitigation policy $\mu$, the resulting observation set is defined as $F[\mu] = \{g \in G : \mu(f)(g) > 0 \text{ for some } f \in F\}$.
Since the mitigation policy is stochastic, we use the average sizes of the resulting observations to represent the fitness of a mitigation policy. For $g \in G$, we define the expected class size as $B^{\mu}_g = \sum_{f \in F} \mu(f)(g) \cdot |S_f|$ (observe that $\sum_{g \in G} B^{\mu}_g = |S|$). Assuming a uniform distribution on $S$, the various entropies for the expected class sizes after applying a policy $\mu$ can be characterized by the following expressions:

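Shannon Entropy: $SE(S \mid F, \mu) = \frac{1}{|S|} \sum_{g \in G} B^{\mu}_g \log_2 B^{\mu}_g$ (with $0 \log_2 0 = 0$),

Guessing Entropy: $GE(S \mid F, \mu) = \frac{1}{2|S|} \sum_{g \in G} (B^{\mu}_g)^2 + \frac{1}{2}$, and

MinGuess Entropy: $mGE(S \mid F, \mu) = \min_{g \in G,\ B^{\mu}_g > 0} \frac{B^{\mu}_g + 1}{2}$.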
We note that the above definitions do not represent the expected entropies, but rather the entropies corresponding to the expected cluster sizes. However, the three quantities provide bounds on the expected entropies after applying $\mu$. Since min-guess entropy is a concave function of the class sizes, Jensen's inequality implies that the min-guess expression above is an upper bound on the expected min-guess entropy. Similarly, the Shannon and guessing expressions, being convex functions of the class sizes, give lower bounds on the expected Shannon and guessing entropies.
We are interested in maximizing the entropy while respecting constraints on the overall performance of the system. We formalize the notion of performance by introducing performance penalties: there is a function $r : F \times G \to \mathbb{R}_{\ge 0}$ such that elevating from the observation $f$ to the functional observation $g$ adds an extra performance overhead $r(f, g)$ to the program. The expected performance penalty associated with a policy $\mu$ is defined as the sum of the penalties $r(f, g)$ weighted by the probabilities $\mu(f)(g)$. Now, we introduce our key decision problem.
Definition 2 (Shannon Mitigation).
Given a set of functional observations $F$, a set of admissible post-mitigation observations $G$, a set of secrets $S$, a penalty function $r$, a performance penalty upper bound $\Delta$, and an entropy lower bound $E$, the Shannon mitigation problem for a given entropy measure $H \in \{SE, GE, mGE\}$ is to decide whether there exists a mitigation policy $\mu \in M(F, G)$ such that $H(S \mid F, \mu) \ge E$ and the expected performance penalty of $\mu$ is at most $\Delta$. We define the deterministic Shannon mitigation variant, where the goal is to find such a deterministic policy.
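Verifying a candidate policy against Definition 2 is straightforward, which is also the polynomial-time certificate check used for NP membership in Theorem 5.1. The Python sketch below is illustrative only and rests on several assumptions: G = F, observations indexed so that f_i ⪯ f_j exactly when i ≤ j, min-guess entropy as the measure, and an expected penalty equal to the sum of r(f, g) weighted by μ(f)(g).

```python
def expected_class_sizes(sizes, mu):
    """B^mu_g = sum_f mu(f)(g) * |S_f|, with mu[i][j] = probability of moving class i to target j."""
    return [sum(mu[i][j] * sizes[i] for i in range(len(sizes))) for j in range(len(mu[0]))]


def min_guess(class_sizes):
    """Min-guess entropy of the expected class sizes, over the targets that receive mass."""
    return min((b + 1) / 2 for b in class_sizes if b > 0)


def is_feasible(sizes, mu, penalty, delta, entropy_bound, entropy=min_guess, tol=1e-9):
    """Check a candidate policy (G = F, observations sorted by the order) against Definition 2."""
    n = len(sizes)
    for i in range(n):
        if abs(sum(mu[i]) - 1.0) > tol:            # each row must be a distribution
            return False
        if any(mu[i][j] > 0 for j in range(i)):    # order constraint: only move upwards
            return False
    # assumed penalty model: sum of r(f, g) weighted by mu(f)(g)
    cost = sum(mu[i][j] * penalty[i][j] for i in range(n) for j in range(n))
    return cost <= delta + tol and entropy(expected_class_sizes(sizes, mu)) >= entropy_bound
```

For example, with class sizes 1, 4, 2, penalties r_ij = j - i, and the deterministic policy that lifts the first class onto the second, the expected sizes are 0, 5, 2, the min-guess entropy is 1.5, and the penalty is 1.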
5 Algorithms for Shannon Mitigation Problem
5.1 Deterministic Shannon Mitigation
We first establish the intractability of the deterministic variant.
Theorem 5.1.
The deterministic Shannon mitigation problem is NP-complete.
Proof.
It is easy to see that the deterministic Shannon mitigation problem is in NP: one can guess a certificate as a deterministic mitigation policy and verify in polynomial time that it satisfies the entropy and overhead constraints. Next, we sketch the hardness proof for the min-guess entropy measure by providing a reduction from the two-way partitioning problem [31]. For the Shannon entropy and guessing entropy measures, a reduction can be established from the Shannon capacity problem [19] and the Euclidean sum-of-squares clustering problem [8], respectively.
Given a set $V = \{v_1, \ldots, v_n\}$ of integer values, the two-way partitioning problem is to decide whether there is a partition of $V$ into two sets $V_1$ and $V_2$ with equal sums, i.e. $\sum_{v \in V_1} v = \sum_{v \in V_2} v$. W.l.o.g. assume that $v_i \le v_j$ for $i \le j$. We reduce this problem to a deterministic Shannon mitigation problem with $n$ clusters $F = \langle f_1, \ldots, f_n \rangle$ over a secret set $S$ such that $|S_{f_i}| = v_i$. If $\sum_{i=1}^{n} v_i$ is odd, then the answer to the two-way partitioning instance is trivially no. Otherwise, let $E = \frac{1}{2}\left(\frac{1}{2}\sum_{i=1}^{n} v_i + 1\right)$. Notice that any deterministic mitigation strategy that achieves min-guess entropy larger than or equal to $E$ must have at most two clusters, since every resulting cluster must then contain at least half of the secrets. On the other hand, the best min-guess entropy value can be achieved by having just a single cluster. To avoid this and force two clusters corresponding to the two partitions of a solution to the two-way partitioning instance $V$, we introduce performance penalties and a performance bound such that merging all clusters into a single one is disallowed. It is straightforward to verify that an instance of the resulting min-guess entropy problem has a yes answer if and only if the two-way partitioning instance does. ∎

Since the deterministic Shannon mitigation problem is intractable, we design an approximate solution for the problem. Note that the problem is hard even if we only use existing functional observations for mitigation, i.e., $G = F$. Therefore, we consider this case for the approximate solution. Furthermore, we assume the following sequential dominance restriction on a deterministic policy $\mu$: for $f, f' \in F$, if $f \preceq f'$ then either $\mu(f) = \mu(f')$ or $\mu(f) \preceq f'$. In other words, for any given $f \preceq f'$, $f$ cannot be moved to a higher cluster than $f'$ without having $f'$ be moved to that cluster. For example, Fig 4(a) shows a Shannon mitigation problem with four functional observations and all possible mitigation policies. Fig 4(b) satisfies the sequential dominance restriction, while Fig 4(c) does not.
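Under the simplifying assumption that the observations are totally ordered, so that f_i ⪯ f_j exactly when i ≤ j, a deterministic policy over G = F can be encoded as a list of target indices, and both the order requirement and the sequential dominance restriction become simple index checks. The Python sketch below is a minimal illustration under that assumption.

```python
def respects_order(policy):
    """policy[i] is the index of the target of f_i; observations sorted so f_i ⪯ f_j iff i <= j."""
    return all(target >= i for i, target in enumerate(policy))


def sequentially_dominant(policy):
    """For every f_i ⪯ f_j (i < j): same target, or the target of f_i is not above f_j."""
    n = len(policy)
    return all(policy[i] == policy[j] or policy[i] <= j
               for i in range(n) for j in range(i + 1, n))


print(sequentially_dominant([1, 1, 3, 3]))   # True: {f_0, f_1} -> f_1 and {f_2, f_3} -> f_3
print(sequentially_dominant([2, 1, 2, 3]))   # False: f_0 jumps past f_1, which stays put
```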
The search for deterministic policies satisfying the sequential dominance restriction can be performed efficiently with dynamic programming by memoizing intermediate results.
Our dynamic programming algorithm finds a deterministic mitigation policy satisfying the sequential dominance restriction. The key idea is to start by considering policies that produce a single cluster for subproblems over a subset of the observations, and then to compute policies producing one additional cluster in each step by utilizing the previously computed subproblems and keeping track of the performance penalties. The algorithm terminates as soon as the solution of the current step respects the performance bound. The algorithm runs in polynomial time.
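The paper's pseudocode is not reproduced here; the Python sketch below is only a hypothetical illustration of the described strategy for the min-guess measure. It assumes totally ordered observations, groups contiguous observations (as sequential dominance allows), moves each group to its top member, and uses an assumed penalty equal to the sum of r(f, g) over the moved observations. For k = 1, 2, ... clusters it computes the grouping with the largest smallest class (ties broken by lower penalty) and stops at the first k that fits the budget, so it is an approximation rather than an exact optimizer.

```python
def dp_mitigation(sizes, penalty, budget):
    """Hypothetical DP over contiguous groupings of sorted observations for min-guess entropy.

    sizes[i]: class size of f_i, with f_0 ⪯ f_1 ⪯ ... (total order assumed).
    penalty[i][j]: cost r(f_i, f_j) of lifting f_i to f_j (i <= j), with penalty[i][i] == 0.
    Returns (k, groups, min_guess, cost) for the first cluster count k whose best grouping fits.
    """
    n = len(sizes)
    prefix = [0]
    for b in sizes:
        prefix.append(prefix[-1] + b)

    def group_cost(i, j):
        # every observation f_i..f_j is lifted to the top member f_j of its group
        return sum(penalty[t][j] for t in range(i, j + 1))

    # dp[j] = (smallest class, cost, groups) for f_0..f_{j-1} split into k groups
    dp = {j: (prefix[j], group_cost(0, j - 1), [(0, j - 1)]) for j in range(1, n + 1)}
    for k in range(1, n + 1):
        if k > 1:
            new_dp = {}
            for j in range(k, n + 1):
                candidates = []
                for i in range(k - 1, j):
                    prev_min, prev_cost, prev_groups = dp[i]
                    candidates.append((min(prev_min, prefix[j] - prefix[i]),
                                       prev_cost + group_cost(i, j - 1),
                                       prev_groups + [(i, j - 1)]))
                # prefer the largest smallest class, then the lowest penalty
                new_dp[j] = max(candidates, key=lambda c: (c[0], -c[1]))
            dp = new_dp
        smallest, cost, groups = dp[n]
        if cost <= budget:
            return k, groups, (smallest + 1) / 2, cost
    raise ValueError("even the identity grouping exceeds the budget")
```

With class sizes 1, 4, 2, penalties r_ij = j - i, and a budget of 2, the single cluster (penalty 3) is rejected and the sketch returns two groups, {f_0, f_1} and {f_2}, with min-guess entropy 1.5 and penalty 1.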
5.2 Stochastic Shannon Mitigation Algorithm
Next, we solve the (stochastic) Shannon mitigation problem by posing it as an optimization problem. Consider the stochastic Shannon mitigation problem with a stochastic policy $\mu$ over $n$ functional observations $f_1, \ldots, f_n$, and write $\mu_{ij} = \mu(f_i)(f_j)$, $r_{ij} = r(f_i, f_j)$, and $B_i = |S_{f_i}|$. The following program characterizes the optimization problem that solves the Shannon mitigation problem with a stochastic policy.
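Maximize $H(B'_1, \ldots, B'_n)$, subject to:

(1) $0 \le \mu_{ij} \le 1$ for $1 \le i, j \le n$,

(2) $\sum_{j=1}^{n} \mu_{ij} = 1$ for all $1 \le i \le n$,

(3) $\sum_{i=1}^{n} \sum_{j=1}^{n} \mu_{ij}\, r_{ij} \le \Delta$, and

(4) $B'_j = \sum_{i=1}^{n} \mu_{ij} B_i$ for $1 \le j \le n$.

Here, the objective function $H$ is one of the following functions:

Guessing Entropy: $\sum_{j=1}^{n} (B'_j)^2$

MinGuess Entropy: $\min_{j \,:\, B'_j > 0} B'_j$

Shannon Entropy: $\sum_{j=1}^{n} B'_j \log_2 B'_j$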
The constraints of the problem are linear. Conditions (1) and (2) express that $\mu$ provides probability distributions, condition (3) restricts the performance penalty, and condition (4) is the entropy-specific constraint. The objective function of the optimization problem is defined based on the chosen entropy criterion. For simplicity, we omit the constant terms from the objective function definitions. For the guessing entropy, the problem is an instance of a linearly constrained quadratic optimization problem [36]. The problem with Shannon entropy is a nonlinear optimization problem [12]. Finally, the optimization problem with min-guess entropy is an instance of mixed-integer programming [35]. We evaluate the scalability of these solvers empirically in Section 6 and leave the exact complexity as an open problem. We show empirically that the min-guess entropy objective can be solved efficiently with branch-and-bound algorithms [39]. Fig 4(b,c) show two instantiations of the mitigation policies that are possible for the stochastic mitigation.
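For the min-guess objective, one possible mixed-integer encoding introduces a variable t for the smallest expected class size and a binary y_j per target that records whether the target receives any mass. The sketch below uses SciPy's `milp` (HiGHS) in place of Gurobi, which the implementation in Section 6 uses, and assumes G = F with sorted observations and the penalty sum over i, j of μ_ij r_ij; it is a sketch of the idea, not the authors' encoding.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def stochastic_min_guess(sizes, penalty, budget):
    """Maximise the smallest non-empty expected class size t; min-guess entropy is (t + 1) / 2."""
    n = len(sizes)
    total = float(sum(sizes))
    n_mu = n * n                        # mu[i, j] stored at index i * n + j
    t_idx = n_mu + n                    # then one binary y_j per target, then t
    n_var = t_idx + 1

    c = np.zeros(n_var)
    c[t_idx] = -1.0                     # milp minimises, so minimise -t

    lb, ub = np.zeros(n_var), np.ones(n_var)
    ub[t_idx] = total
    for i in range(n):
        for j in range(i):
            ub[i * n + j] = 0.0         # order constraint: classes only move upwards

    rows, lo, hi = [], [], []

    def add(row, low, high):
        rows.append(row)
        lo.append(low)
        hi.append(high)

    for i in range(n):                  # each row of mu is a distribution
        row = np.zeros(n_var)
        row[i * n:(i + 1) * n] = 1.0
        add(row, 1.0, 1.0)

    row = np.zeros(n_var)               # assumed penalty: sum_ij mu_ij * r_ij <= budget
    for i in range(n):
        for j in range(i, n):
            row[i * n + j] = penalty[i][j]
    add(row, -np.inf, budget)

    for j in range(n):                  # link t to the expected class sizes B'_j
        size_j = np.zeros(n_var)
        for i in range(n):
            size_j[i * n + j] = sizes[i]
        row = -size_j                   # t <= B'_j + total * (1 - y_j)
        row[n_mu + j] = total
        row[t_idx] = 1.0
        add(row, -np.inf, total)
        row = size_j.copy()             # B'_j <= total * y_j: unused targets stay empty
        row[n_mu + j] = -total
        add(row, -np.inf, 0.0)

    integrality = np.zeros(n_var)
    integrality[n_mu:t_idx] = 1         # only the y_j are integer (binary via bounds)
    res = milp(c, integrality=integrality, bounds=Bounds(lb, ub),
               constraints=LinearConstraint(np.array(rows), lo, hi))
    if not res.success:
        return None
    t = res.x[t_idx]
    return res.x[:n_mu].reshape(n, n), t, (t + 1) / 2
```

On a toy instance with class sizes 1, 4, and 2, penalties r_ij = j - i, and a budget of 2, the optimum moves the first class up and splits the middle class between two targets, reaching t = 3.5, i.e. a min-guess entropy of 2.25.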
6 Implementation Details
A. Environmental Setups. All timing measurements are conducted on an Intel NUC5i5RYH. We switch off JIT compilation, run each experiment multiple times, and use the mean running time. This helps to reduce the effects of environmental factors such as garbage collection. All other analyses are conducted on an Intel i5 2.7 GHz machine.
B. Implementation of Side Channel Discovery. We use the technique presented in [49] for the side channel discovery. The technique applies functional data analysis [41] to create a B-spline basis and fit functions to the vector of timing observations for each secret value. Then, the technique applies functional data clustering [23] to obtain classes of observations. We use the number of secret values in a cluster as the class size metric and the distance norm between the clusters as the penalty function.

C. Implementation of Mitigation Policy Algorithms. For the stochastic optimization, we encode the Shannon entropy and guessing entropy objectives with linear constraints in SciPy [25]. Since the objective functions are nonlinear (for the Shannon entropy) and quadratic (for the guessing entropy), SciPy uses sequential least squares programming (SLSQP) [37] to maximize the objectives. For the stochastic optimization with the min-guess entropy, we encode the problem in Gurobi [21] as a mixed-integer programming (MIP) problem [35]. Gurobi solves the problem efficiently with branch-and-bound algorithms [1]. We use Java to implement the dynamic programming.
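The SLSQP route can be sketched with `scipy.optimize.minimize` for the Shannon objective. This is a hypothetical, minimal Python version: it assumes G = F with sorted observations, optimizes only the upward entries of μ, and uses an assumed penalty equal to the sum over i, j of μ_ij r_ij. Because a convex function of the class sizes is being maximized, SLSQP only returns a local optimum from the chosen starting point (here the identity policy, which has zero cost).

```python
import numpy as np
from scipy.optimize import minimize


def stochastic_shannon(sizes, penalty, budget):
    """Locally maximise SE over stochastic policies with SLSQP (a non-convex problem)."""
    sizes = np.asarray(sizes, dtype=float)
    n = len(sizes)
    pairs = [(i, j) for i in range(n) for j in range(i, n)]   # only upward moves are variables

    def to_matrix(x):
        mu = np.zeros((n, n))
        for k, (i, j) in enumerate(pairs):
            mu[i, j] = x[k]
        return mu

    def shannon(x):
        b = sizes @ to_matrix(x)                               # expected class sizes B'_j
        return np.sum(b * np.log2(np.maximum(b, 1e-12))) / sizes.sum()

    def cost(x):                                               # assumed: sum_ij mu_ij * r_ij
        return sum(penalty[i][j] * x[k] for k, (i, j) in enumerate(pairs))

    def row_sum(x, i):
        return sum(x[k] for k, (a, _) in enumerate(pairs) if a == i) - 1.0

    constraints = [{"type": "eq", "fun": row_sum, "args": (i,)} for i in range(n)]
    constraints.append({"type": "ineq", "fun": lambda x: budget - cost(x)})
    x0 = np.array([1.0 if i == j else 0.0 for i, j in pairs])  # identity policy, zero cost
    res = minimize(lambda x: -shannon(x), x0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * len(pairs), constraints=constraints)
    return to_matrix(res.x), shannon(res.x), cost(res.x)
```

Starting from the identity keeps the initial point feasible, so the returned policy should be at least as good as doing nothing; a multi-start loop over random feasible policies is a common way to reduce the risk of a poor local optimum.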
D. Implementation of Enforcement. The enforcement of the mitigation policy is implemented in two steps. First, we take the initial timing functions and characterize them with program-internal properties such as basic block calls. To do so, we use the decision tree learning approach presented in [49]. The decision tree model characterizes each functional observation with properties of the program internals. Second, given the mitigation policy, we enforce it with a monitoring system implemented on top of the Javassist [16] library. The monitoring system uses the decision tree model and matches the properties enabled during an execution against the tree model (detection of the current cluster). Then, it adds extra delays, based on the mitigation policy, to the current execution time and thereby enforces the mitigation policy. Note that the dynamic monitoring itself can add a few microseconds of delay. For programs with timing differences on the order of microseconds, we instead transform the source code using the decision tree model. The transformation requires manual effort to modify and compile the new program, but it adds negligible delay.
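The runtime side of the enforcement can be sketched as follows. The Python function below is a language-agnostic illustration, not the Javassist-based monitor: `run`, `classify`, `policy`, and `target_time` are hypothetical stand-ins for the instrumented program, the learned decision tree, a deterministic mitigation policy, and the timing function of each target observation.

```python
import time


def monitored_call(run, secret, public, classify, policy, target_time):
    """Run the program, detect its cluster, then pad the response time to the policy's target.

    run(secret, public) -> (result, features): hypothetical instrumented program that also
        returns program-internal features (e.g. basic-block counters).
    classify(features) -> cluster id: stands in for the learned decision tree.
    policy[cluster] -> target cluster of a deterministic mitigation policy.
    target_time(target, public) -> seconds: the timing function of the target observation.
    """
    start = time.perf_counter()
    result, features = run(secret, public)
    elapsed = time.perf_counter() - start
    target = policy[classify(features)]
    delay = max(0.0, target_time(target, public) - elapsed)
    time.sleep(delay)                  # enforce the mitigated running time
    return result, delay
```

The delay is computed from the measured elapsed time rather than from a fixed table, so small run-to-run variations are absorbed as long as the target timing function lies above the actual running time.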
E. Microbenchmark Results. Our goal is to compare different mitigation methods in terms of their security and performance. We also examine the computation time of our tool Schmit in calculating the mitigation policies. See the appendix for the relationships between performance bounds and entropy measures.
Applications: Mod_Exp applications [33] are instances of square-and-multiply modular exponentiation used for secret key operations in RSA [43]. The Branch_and_Loop series consists of 6 applications, where each application has conditions over secret values and runs a linear loop over the public values. The running time of the applications depends on the slope of the linear loops, which is determined by the secret input.
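The leak in the Mod_Exp applications comes from the secret-dependent multiply step. The minimal Python sketch below (our own illustration, not the benchmark code) counts the multiplications performed by left-to-right square-and-multiply; the count equals the number of set bits of the exponent, which is the kind of dependence Example 2.1 also exhibits.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right modular exponentiation that also reports the number of multiply steps."""
    result, multiplies = 1, 0
    for bit in bin(exponent)[2:]:              # scan exponent bits from the most significant
        result = (result * result) % modulus   # always square
        if bit == "1":                         # multiply only on set bits: secret-dependent work
            result = (result * base) % modulus
            multiplies += 1
    return result, multiplies


print(square_and_multiply(7, 0b1011, 101))     # (pow(7, 11, 101), 3): three set bits
```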
Computation time comparisons: Fig 5 shows the computation time for the Branch_and_Loop applications (the applications are ordered on the x-axis by the discovered number of observational classes). For the min-guess entropy, we observe that both the stochastic and the dynamic programming approaches are fast, as shown in Fig 5(a). For the Shannon and guessing entropies, the dynamic programming is scalable, while the stochastic mitigation becomes computationally expensive beyond 60 classes of observations, as shown in Fig 5(b,c).
Mitigation Algorithm Comparisons: Tab 1 shows microbenchmark results that compare the four mitigation algorithms on the two program series. First, the double scheme mitigation technique [10] does not provide guarantees on the performance overhead, and we can see that the overhead increases by more than 75 times for mod_exp_6. The double scheme method reduces the number of classes of observations; however, we observe that this mitigation has difficulty improving the min-guess entropy. Second, the bucketing algorithm [29] can guarantee the performance overhead, but it is not an effective method to improve the security of functional observations; see the examples mod_exp_6 and Branch_and_Loop_6. Third, among the algorithms, Schmit guarantees the performance to be below a certain bound while achieving the highest entropy values. In most cases, the stochastic optimization technique achieves the highest min-guess entropy value. Here, we show the results with the min-guess entropy measure. We also have strong evidence that Schmit achieves higher Shannon and guessing entropies. For example, in B_L_5, the initial Shannon entropy has improved from to , , , and for the double scheme, the bucketing, the stochastic, and the deterministic algorithms, respectively.
Initial Characteristics  Double Scheme  Bucketing  Schmit (Determ.)  Schmit (Stoch.)  
App(s)  #S  #P  –  –  #K  mGE  #K  mGE  O(%)  #K  mGE  O(%)  #K  mGE  O(%)  #K  mGE  O(%)  
M_E_1  32  32  0.5  1.0  1  16.5  1  16.5  0.0  1  16.5  0.0  1  16.5  0.0  1  16.5  0.0 
M_E_2  64  64  0.5  1.0  2  16.5  1  32.5  5,221  1  32.5  27.6  1  32.5  21.4  1  32.5  21.4 
M_E_3  128  128  0.5  2.0  2  32.5  1  64.5  5,407  1  64.5  33.9  1  64.5  22.7  1  64.5  22.7 
M_E_4  256  256  0.5  2.0  4  10.5  1  128.5  6,679  1  128.5  30.7  1  128.5  28.3  1  128.5  28.3 
M_E_5  512  512  0.5  5.0  23  1.0  1  256.5  7,294  2  128.5  50.0  1  256.5  31.0  1  253.0  30.3 
M_E_6  1,024  1,024  0.5  8.0  40  1.0  1  512.5  7,822  20  1.0  34.5  2  27.5  46.7  5  85.5  50.0 
B_L_1  25  50  0.5  10.0  4  3.0  3  3.0  73.0  3  3.0  17.5  2  5.5  26.1  2  6.5  34.9 
B_L_2  50  50  0.5  10.0  8  3.0  4  3.0  61.3  5  3.0  21.9  2  10.5  45.3  2  13.0  45.3 
B_L_3  100  50  0.5  20.0  16  3.0  4  8.0  42.4  8  3.0  33.4  2  20.5  48.3  2  21.5  50 
B_L_4  200  50  0.5  20.0  32  3.0  6  3.0  36.9  16  3.0  28.7  2  48.0  48.7  2  50.5  49.7 
B_L_5  400  50  0.5  20.0  64  3.0  8  3.0  35.4  32  3.0  27.2  3  65.5  32.0  2  100.5  50.0 
B_L_6  800  50  0.5  20.0  125  3.0  12  8.0  37.8  29  3.0  52.5  3  133.0  34.6  2  200.5  49.6 
7 Case Study
Research Question. Does Schmit scale well and improve the security of applications (entropy measures) within the given performance bounds?
Methodology. We use the deterministic and stochastic algorithms for mitigating the leaks. We show our results for the minguess entropy, but other entropy measures can be applied as well. Since the task is to mitigate existing leakages, we assume that the secret and public inputs are given.
Objects of Study. We consider four real-world applications:
Application  Num. Methods  Num. Secret  Num. Public  –  Initial Clusters  Initial Minguess  
GabFeed  573  1,105  65  6.50  34  1.0 
Jetty  63  800  635  0.1  20  4.5 
Java Verbal Expressions  61  2,000  10  0.02  9  50.5 
Password Checker  6  20  2,620  0.05  6  1.0 
In the inset table, we show the basic characteristics of these benchmarks.
GabFeed is a chat server with 573 methods [4]. There is a side channel in the authentication part of the application, where the application takes a user's public key and its own private key and generates a common key [15]. The vulnerability leaks the number of set bits in the secret key. Initial functional observations are shown in Fig (a)a. There are 34 clusters and the min-guess entropy is 1. We aim to maximize the min-guess entropy under a performance overhead of 50%.
Jetty. We mitigate the side channels in the util.security package of the Eclipse Jetty web server. The package has a Credential class, which had a timing side channel. This vulnerability was analyzed in [15] and fixed initially in [6]. Then, the developers noticed that the implementation in [6] can still leak information and fixed this issue with a new implementation in [5]. However, this new implementation still leaks information [49]. We apply Schmit to mitigate this timing side channel. The initial functional observations are shown in Fig (d)d. There are 20 classes of observations and the initial min-guess entropy is 4.5. We aim to maximize the min-guess entropy under a performance overhead of 50%.
Java Verbal Expressions is a library with 61 methods that construct regular expressions [2]. If the library has secret inputs, it contains a timing side channel similar to the password comparison vulnerability [3]. In this case, starting from the initial character of a candidate expression, if the character matches the regular expression, the library takes slightly more time to respond to the request than otherwise. This vulnerability can leak all the regular expressions. We consider regular expressions with a maximum size of 9. There are 9 classes of observations and the initial min-guess entropy is 50.5. We aim to maximize the min-guess entropy under a performance overhead of 50%.
Password Checker. We consider the password matching example from the loginBad program [9]. The password stored in the server is secret, and the user's guess is a public input. We consider 20 secrets (of length at most 6) and 2,620 public inputs. There are 6 different clusters, and the initial min-guess entropy is 1.
Findings for GabFeed. With the stochastic algorithm, Schmit calculates a mitigation policy that results in 4 clusters. This policy improves the min-guess entropy from 1 to 138.5 and adds an overhead of 42.8%. With the deterministic algorithm, Schmit returns 3 clusters. The performance overhead is 49.7% and the min-guess entropy improves from 1 to 106. The user chooses the deterministic policy and enforces the mitigation. We apply CART decision tree learning and characterize the classes of observations with GabFeed method calls, as shown in Fig (b)b. The monitoring system uses the decision tree model and automatically detects the current class of observation. Then, it adds extra delays based on the mitigation policy to enforce it. The results of the mitigation are shown in Fig (c)c. Answer for our research question. Scalability: It takes about 1 second to calculate the stochastic and the deterministic policies. Security: The stochastic and deterministic variants improve the min-guess entropy by more than 100 times (138.5 and 106 times, respectively) under the given performance overhead of 50%.
Findings for Jetty. The stochastic and deterministic algorithms find the same policy, which results in 1 cluster with 39.6% performance overhead. The min-guess entropy improves from 4.5 to 400.5. For the enforcement, Schmit first uses the initial clusterings and specifies their characteristics with program internals, which results in the decision tree model shown in Fig (e)e. Since the response time is on the order of microseconds, we transform the source code using the decision tree model by adding extra counter variables. The results of the mitigation are shown in Fig (f)f. Scalability: It takes less than 1 second to calculate the policies for both algorithms. Security: The stochastic and deterministic variants improve the min-guess entropy by 89 times under the given performance overhead.
Findings for Java Verbal Expressions. For the stochastic algorithm, the policy results in 2 clusters, and the min-guess entropy improves to 500.5 with a performance overhead of 36%. For the dynamic programming algorithm, the policy also results in 2 clusters; it adds 28% performance overhead while improving the min-guess entropy from 50.5 to 450.5. The user chooses the deterministic policy. For the mitigation, we transform the source code using the decision tree model and add extra delays based on the mitigation policy.
Findings for Password Matching. Both the deterministic and the stochastic algorithms find a policy with 2 clusters in which the min-guess entropy improves from 1 to 5.5 with a performance overhead of 19.6%. For the mitigation, we transform the source code using the decision tree model and add extra delays based on the mitigation policy where necessary.
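The entropy values reported above are consistent with computing min-guess entropy under a uniform prior over secret values: an attacker who lands in a cluster of n equally likely secrets needs (n + 1) / 2 guesses on average, and min-guess entropy takes the worst case, the smallest cluster. For the password checker, 20 secrets split into two clusters of 10 give 5.5. The sketch below assumes that uniform prior; the function name and the six pre-mitigation cluster sizes in the usage lines are hypothetical.

```python
from typing import Sequence


def min_guess_entropy(cluster_sizes: Sequence[int]) -> float:
    """Min-guess entropy of a clustering, assuming a uniform prior over secrets.

    Within a cluster of n equally likely secrets, guessing in the best order takes
    (n + 1) / 2 tries on average; the measure is that value for the smallest cluster.
    """
    if not cluster_sizes or min(cluster_sizes) < 1:
        raise ValueError("every cluster must contain at least one secret")
    return (min(cluster_sizes) + 1) / 2


# After mitigation: 20 passwords split into two clusters of 10.
print(min_guess_entropy([10, 10]))            # 5.5
# Hypothetical sizes for six clusters, one of which isolates a single secret.
print(min_guess_entropy([1, 2, 3, 4, 5, 5]))  # 1.0
```

Because the measure depends only on the smallest cluster, a policy that enlarges every cluster except the smallest one does not improve it.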
8 Related Work
Quantitative information theory has been widely used to measure how much information leaks through side-channel observations [45, 28, 11, 22]. Mitigation techniques increase the remaining entropy of the secret sets leaked through side channels while taking performance into account [29, 10, 52, 53, 26, 44].
Köpf and Dürmuth [29] use a bucketing algorithm to partition a program's observations into intervals. Under the unknown-message threat model, they propose a dynamic programming algorithm to find the optimal number of possible observations under a performance penalty. The works [10, 52] introduce different black-box schemes to mitigate leaks. In particular, Askarov et al. [10] show that time-quantization techniques, which release events only at scheduled constant slots, suffer worst-case leakage when a slot is not filled with an event. Instead, they introduce a doubling scheme that, like the quantization approach, follows a schedule of predictions; but if the event source fails to deliver an event at the predicted time, this misprediction generates a new schedule in which the interval between predictions is doubled. We compare our mitigation technique with both algorithms throughout this paper.
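Both black-box ideas can be illustrated in a few lines. The sketch below is a simplified illustration, not the exact algorithms of [29] or [10]: bucketing pads each response up to the next of a fixed, sorted list of bucket boundaries, and the doubling schedule releases events at predicted slots and doubles the slot interval whenever an event is not ready in time. The function names, the parameter `quantum`, and the example timings are assumptions made for illustration.

```python
import bisect
from typing import List, Sequence


def bucketize(response_time: float, boundaries: Sequence[float]) -> float:
    """Delay a response to the smallest bucket boundary >= its response time.

    `boundaries` must be sorted ascending; only len(boundaries) observations remain.
    """
    idx = bisect.bisect_left(boundaries, response_time)
    if idx == len(boundaries):
        raise ValueError("response time exceeds the largest bucket boundary")
    return boundaries[idx]


def doubling_schedule(ready_times: Sequence[float], quantum: float) -> List[float]:
    """Release times under a simplified predictive doubling scheme.

    Events are released in order at predicted slots spaced `quantum` apart. When an
    event is not ready at its predicted slot (a misprediction), the interval doubles
    and the prediction moves to the next slot of the new, sparser schedule.
    """
    releases: List[float] = []
    next_slot = quantum
    for ready in ready_times:
        while ready > next_slot:  # misprediction: nothing to release at this slot
            quantum *= 2
            next_slot += quantum
        releases.append(next_slot)
        next_slot += quantum
    return releases


print(bucketize(2.5, [1.0, 3.0, 10.0]))                 # 3.0
print(doubling_schedule([0.5, 1.5, 5.0], quantum=1.0))  # [1.0, 2.0, 5.0]
```

In both cases an observer sees only boundary or slot times, so the number of distinguishable observations shrinks at the price of added delay.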
Elimination of timing side channels is a common technique to guarantee the confidentiality of software [7, 34, 50, 18, 30, 33]. The work [50] aims to eliminate side channels using static analysis, enhanced with various techniques that keep the performance overhead low, but without guaranteeing a bound on that overhead. In contrast, we use dynamic analysis and allow a small amount of information to leak, but we guarantee an upper bound on the performance overhead.
Machine learning techniques have been used to explain timing differences between traces [46, 47, 48]. Tizpaz-Niari et al. [48] consider performance issues in software. They also cluster the execution times of programs and then explain which program properties distinguish the different functional clusters. We adopt their techniques for our security problem.
Acknowledgements.
The authors would like to thank Mayur Naik for shepherding our paper and providing useful suggestions. This research was supported by DARPA under agreement FA87501520096.
References
 [1] Branch and bound algorithm for MIP problems, http://www.gurobi.com/resources/gettingstarted/mipbasics
 [2] Verbal expressions library, https://github.com/VerbalExpressions/JavaVerbalExpressions
 [3] Timing attack in Google Keyczar library (2009), https://rdist.root.org/2009/05/28/timingattackingooglekeyczarlibrary/
 [4] GabFeed application (2016), https://github.com/ApogeeResearch/STAC/tree/master/Engagement_Challenges/Engagement_2/gabfeed_1
 [5] Timing side-channel on the length of password in Eclipse Jetty (May 2017), https://github.com/eclipse/jetty.project/commit/2baa1abe4b1c380a30deacca1ed367466a1a62ea
 [6] Timing side-channel on the password in Eclipse Jetty (May 2017), https://github.com/eclipse/jetty.project/commit/f3751d70787fd8ab93932a51c60514c2eb37cb58
 [7] Agat, J.: Transforming out timing leaks. In: Proceedings of the 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 40–53. ACM (2000)
 [8] Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75(2), 245–248 (May 2009)
 [9] Antonopoulos, T., Gazzillo, P., Hicks, M., Koskinen, E., Terauchi, T., Wei, S.: Decomposition instead of self-composition for proving the absence of timing channels. In: PLDI. pp. 362–375. ACM (2017)
 [10] Askarov, A., Zhang, D., Myers, A.C.: Predictive black-box mitigation of timing channels. In: Proceedings of the 17th ACM Conference on Computer and Communications Security. pp. 297–307. ACM (2010)
 [11] Backes, M., Köpf, B., Rybalchenko, A.: Automatic discovery and quantification of information leaks. In: Security and Privacy, 2009 30th IEEE Symposium on. pp. 141–153. IEEE (2009)
 [12] Bertsekas, D.P.: Nonlinear Programming. Athena Scientific (2016), ISBN 9781886529052
 [13] Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and regression trees. Wadsworth: Belmont, CA (1984)
 [14] Brumley, D., Boneh, D.: Remote timing attacks are practical. Computer Networks 48(5), 701–716 (2005)
 [15] Chen, J., Feng, Y., Dillig, I.: Precise detection of side-channel vulnerabilities using quantitative Cartesian Hoare logic. In: CCS. pp. 875–890 (2017)
 [16] Chiba, S.: Javassist: a reflection-based programming wizard for Java. In: Proceedings of OOPSLA’98 Workshop on Reflective Programming in C++ and Java. vol. 174 (1998)
 [17] Dhem, J.F., Koeune, F., Leroux, P.A., Mestré, P., Quisquater, J.J., Willems, J.L.: A practical implementation of the timing attack. In: International Conference on Smart Card Research and Advanced Applications. pp. 167–182. Springer (1998)
 [18] Eldib, H., Wang, C.: Synthesis of masking countermeasures against side channel attacks. In: International Conference on Computer Aided Verification. pp. 114–130. Springer (2014)
 [19] Fallgren, M.: On the complexity of maximizing the minimum Shannon capacity in wireless networks by joint channel assignment and power allocation. In: 2010 IEEE 18th International Workshop on Quality of Service (IWQoS). pp. 1–7 (2010)
 [20] Ferraty, F., Vieu, P.: Nonparametric functional data analysis: theory and practice. Springer Science & Business Media (2006)
 [21] Gurobi Optimization, L.: Gurobi optimizer reference manual (2018), http://www.gurobi.com
 [22] Heusser, J., Malacaria, P.: Quantifying information leaks in software. In: Proceedings of the 26th Annual Computer Security Applications Conference. pp. 261–269. ACM (2010)
 [23] Jacques, J., Preda, C.: Functional data clustering: a survey. Advances in Data Analysis and Classification 8(3), 231–255 (2014)
 [24] Johnson, S.C.: Hierarchical clustering schemes. Psychometrika 32(3), 241–254 (1967)
 [25] Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: Open source scientific tools for Python (2001–), http://www.scipy.org/
 [26] Kadloor, S., Kiyavash, N., Venkitasubramaniam, P.: Mitigating timing based information leakage in shared schedulers. In: Infocom, 2012 Proceedings IEEE. pp. 1044–1052. IEEE (2012)
 [27] Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Annual International Cryptology Conference. pp. 104–113. Springer (1996)
 [28] Köpf, B., Basin, D.: An information-theoretic model for adaptive side-channel attacks. In: Proceedings of the 14th ACM Conference on Computer and Communications Security. pp. 286–296. CCS ’07, ACM, New York, NY, USA (2007)
 [29] Köpf, B., Dürmuth, M.: A provably secure and efficient countermeasure against timing attacks. In: Computer Security Foundations Symposium, 2009. CSF’09. 22nd IEEE. pp. 324–335. IEEE (2009)
 [30] Köpf, B., Mantel, H.: Transformational typing and unification for automatically correcting insecure programs. International Journal of Information Security 6(2–3), 107–131 (2007)
 [31] Korf, R.E.: A complete anytime algorithm for number partitioning. AI 106, 181–203 (1998)
 [32] Lampson, B.W.: A note on the confinement problem. Communications of the ACM 16(10), 613–615 (1973)
 [33] Mantel, H., Starostin, A.: Transforming out timing leaks, more or less. In: European Symposium on Research in Computer Security. pp. 447–467. Springer (2015)
 [34] Molnar, D., Piotrowski, M., Schultz, D., Wagner, D.: The program counter security model: Automatic detection and removal of control-flow side channel attacks. In: International Conference on Information Security and Cryptology. pp. 156–168. Springer (2005)
 [35] Nemhauser, G.L., Wolsey, L.A.: Integer programming and combinatorial optimization. Wiley, Chichester. GL Nemhauser, MWP Savelsbergh, GS Sigismondi (1992). Constraint Classification for Mixed Integer Programming Formulations. COAL Bulletin 20, 8–12 (1988)
 [36] Nocedal, J., Wright, S.J.: Numerical optimization 2nd (2006)
 [37] Nocedal, J., Wright, S.J.: Sequential quadratic programming. Springer (2006)
 [38] Padlipsky, M., Snow, D., Karger, P.: Limitations of end-to-end encryption in secure computer networks. Tech. rep., MITRE CORP BEDFORD MA (1978)
 [39] Papadimitriou, C.H., Steiglitz, K.: Combinatorial optimization: algorithms and complexity. Courier Corporation (1998)
 [40] Phan, Q.S., Bang, L., Pasareanu, C.S., Malacaria, P., Bultan, T.: Synthesis of adaptive side-channel attacks. In: Computer Security Foundations Symposium (CSF), 2017 IEEE 30th. pp. 328–342. IEEE (2017)
 [41] Ramsay, J., Hooker, G., Graves, S.: Functional data analysis with R and MATLAB. Springer Science & Business Media (2009)
 [42] Ramsay, J.O.: Functional data analysis. Wiley Online Library (2006)
 [43] Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21(2), 120–126 (1978)
 [44] Schinzel, S.: An efficient mitigation method for timing side channels on the web. In: 2nd International Workshop on Constructive Side-Channel Analysis and Secure Design (COSADE) (2011)
 [45] Smith, G.: On the foundations of quantitative information flow. In: International Conference on Foundations of Software Science and Computational Structures. pp. 288–302. Springer (2009)
 [46] Song, L., Lu, S.: Statistical debugging for real-world performance problems. In: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications. pp. 561–578. OOPSLA ’14 (2014), http://doi.acm.org/10.1145/2660193.2660234
 [47] Tizpaz-Niari, S., Černý, P., Chang, B.Y.E., Sankaranarayanan, S., Trivedi, A.: Discriminating traces with time. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 21–37. Springer (2017)
 [48] Tizpaz-Niari, S., Černý, P., Chang, B.E., Trivedi, A.: Differential performance debugging with discriminant regression trees. In: 32nd AAAI Conference on Artificial Intelligence (AAAI). pp. 2468–2475 (2018)
 [49] Tizpaz-Niari, S., Černý, P., Trivedi, A.: Data-driven debugging for functional side channels. arXiv preprint arXiv:1808.10502 (2018)
 [50] Wu, M., Guo, S., Schaumont, P., Wang, C.: Eliminating timing side-channel leaks using program repair. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. pp. 15–26. ACM (2018)
 [51] Yarom, Y., Genkin, D., Heninger, N.: CacheBleed: a timing attack on OpenSSL constant-time RSA. Journal of Cryptographic Engineering 7(2), 99–112 (2017)
 [52] Zhang, D., Askarov, A., Myers, A.C.: Predictive mitigation of timing channels in interactive systems. In: Proceedings of the 18th ACM conference on Computer and communications security. pp. 563–574. ACM (2011)
 [53] Zhang, D., Askarov, A., Myers, A.C.: Language-based control and mitigation of timing channels. PLDI 47(6), 99–110 (2012)
9 Appendix
9.1 Overview of Schmit
Schmit consists of three components:
1) Initial Security Analysis.
Inspired by [49], for each secret value we generally use a B-spline basis [42] to model arbitrary timing functions of secret values over the domain of public inputs, but we also allow simpler functional models such as polynomial functions. We use non-parametric functional clustering [20] with hierarchical algorithms [24] to obtain the initial classes of observations, or clusters. The clustering algorithm groups timing functions that are close to each other into the same cluster. The size of a class is the number of secret values in the cluster. The norm distances between clusters form the penalty matrix.
Highlight. This step finds the classes of observations over secret values using functional clustering and returns the label (cluster) of each secret value and the distance (as a penalty) between the clusters.
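A minimal sketch of this step, under simplifying assumptions: each secret's timing function is represented by its execution times sampled at a fixed grid of public inputs (a stand-in for the B-spline representation), distances are discrete L2 norms, the hierarchical algorithm is single-linkage agglomerative clustering with a distance threshold, and the penalty between two clusters is taken to be the distance between their mean curves. These choices and all names below are hypothetical and need not match Schmit's implementation.

```python
import math
from itertools import combinations
from typing import Dict, List

Curve = List[float]  # a timing function sampled at fixed public inputs


def l2_distance(f: Curve, g: Curve) -> float:
    """Discrete L2 distance between two timing functions on the same input grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def cluster_secrets(curves: Dict[int, Curve], threshold: float) -> List[List[int]]:
    """Single-linkage agglomerative clustering of secret values by timing function.

    Starts with one cluster per secret and repeatedly merges the closest pair of
    clusters until the closest pair is farther apart than `threshold`.
    """
    clusters = [[s] for s in sorted(curves)]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i, j in combinations(range(len(clusters)), 2):
            d = min(l2_distance(curves[a], curves[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations() guarantees i < j
    return clusters


def mean_curve(curves: Dict[int, Curve], members: List[int]) -> Curve:
    """Pointwise mean of the timing functions of a cluster's members."""
    return [sum(vals) / len(members) for vals in zip(*(curves[s] for s in members))]


def penalty_matrix(curves: Dict[int, Curve], clusters: List[List[int]]) -> List[List[float]]:
    """Pairwise L2 distances between cluster mean curves (the assumed penalties)."""
    means = [mean_curve(curves, c) for c in clusters]
    return [[l2_distance(m, n) for n in means] for m in means]


curves = {0: [1.0, 1.0, 1.0], 1: [3.0, 3.1, 3.0], 2: [3.0, 3.0, 3.1],
          3: [5.0, 5.0, 5.0], 4: [5.1, 5.0, 5.0]}
clusters = cluster_secrets(curves, threshold=0.5)
print(clusters)                    # [[0], [1, 2], [3, 4]]
print([len(c) for c in clusters])  # class sizes: [1, 2, 2]
```

The threshold plays the role of the cut height in the dendrogram; a library routine such as SciPy's hierarchical clustering could replace the hand-written loop.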
2) Mitigation policy.
We use the policy algorithms (Section 5) to calculate the mitigation policy given the clusters, their sizes, and their distances. We use two types of algorithms: deterministic and stochastic. The deterministic algorithm is an instance of dynamic programming implemented in Java. The stochastic algorithm has three variants, one for each of three information-theoretic measures. The variant based on min-guess entropy, which is the main focus of this paper, is implemented using Gurobi [21]. The two other variants (for Shannon and guessing entropies) are implemented in Python using the SciPy library [25]. See Section 6(C) for further details.
Highlight. This step calculates the mitigation policy, which shows how to merge different clusters to maximize an information-theoretic criterion given an upper bound on the performance overhead.
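To make the deterministic search concrete, the following sketch restricts policies to merges of adjacent clusters (sorted by execution time), where every cluster in a group is delayed up to the slowest time in that group. This restriction, the relative-overhead definition (total added delay divided by total original execution time), and the function names are assumptions for illustration; Section 5 defines the actual restricted solution space. For each candidate minimum group size k, from largest to smallest, a prefix dynamic program finds the cheapest partition whose groups all hold at least k secrets; the first k that fits the budget maximizes min-guess entropy.

```python
from typing import List, Optional, Tuple


def merge_cost(sizes: List[int], times: List[float], i: int, j: int) -> float:
    """Total extra delay to move clusters i..j-1 up to the time of cluster j-1."""
    top = times[j - 1]
    return sum(sizes[l] * (top - times[l]) for l in range(i, j))


def best_contiguous_merge(sizes: List[int], times: List[float],
                          budget: float) -> Optional[Tuple[float, List[List[int]]]]:
    """Maximize min-guess entropy over merges of adjacent clusters.

    Clusters must be sorted by execution time. `budget` bounds the relative
    overhead. Returns (min-guess entropy, groups of cluster indices).
    """
    n, total = len(sizes), sum(sizes)
    base = sum(s * t for s, t in zip(sizes, times))
    inf = float("inf")
    for k in range(total, 0, -1):  # required minimum group size, largest first
        best = [0.0] + [inf] * n   # best[j]: cheapest split of clusters 0..j-1
        cut = [0] * (n + 1)
        for j in range(1, n + 1):
            for i in range(j):
                if best[i] < inf and sum(sizes[i:j]) >= k:
                    cost = best[i] + merge_cost(sizes, times, i, j)
                    if cost < best[j]:
                        best[j], cut[j] = cost, i
        if best[n] <= budget * base:
            groups, j = [], n
            while j > 0:  # walk the cut points back to recover the groups
                groups.append(list(range(cut[j], j)))
                j = cut[j]
            groups.reverse()
            smallest = min(sum(sizes[g] for g in grp) for grp in groups)
            return (smallest + 1) / 2, groups
    return None  # only reached when there are no secrets at all


sizes, times = [1, 4, 2], [1.0, 3.0, 5.0]  # hypothetical clusters
print(best_contiguous_merge(sizes, times, budget=0.1))  # (1.5, [[0, 1], [2]])
print(best_contiguous_merge(sizes, times, budget=0.6))  # (4.0, [[0, 1, 2]])
```

With the tighter budget, merging everything (12 units of added delay against a base of 23) is too expensive, so only the cheap merge of the two fastest clusters is taken.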
3) Enforcement of mitigation policy.
In the first step, we characterize each class of observation with program-internal properties, using decision tree algorithms [13] over the corresponding program-internal features. Fig 2(b) in Section 2 is an example of a decision tree model that characterizes each class of observation of Fig 1(b) in Section 2 by the number of invocations of the basic block at line 16 of the modExp method. In the second step, we enforce the mitigation policy. This can be done either automatically, with a runtime monitoring system, or semi-automatically, with a source-code transformation. The enforcement matches the properties observed during an execution against the decision tree model and then adds extra delays to the execution, based on the mitigation policy, in order to enforce it. The result of the mitigation can be verified by applying the clustering algorithm to the mitigated execution times.
Highlight. This step uses the functional clusters and the decision tree model and enforces the mitigation policy either with a runtime monitoring system or with source-code transformations. Applying the clustering algorithm to the mitigated execution times verifies the mitigation.
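The enforcement loop can be sketched as follows, assuming a deterministic policy: a learned decision tree (here a hypothetical hand-written stand-in that thresholds a basic-block invocation count, in the spirit of the modExp example) identifies the class of the current execution, the policy maps that class to its target class, and the execution is padded up to the target class's time. Treating each class's time as a single number rather than a function of the public input, and the names `classify`, `enforcement_delay`, and `class_time`, are simplifying assumptions.

```python
from typing import Dict


def classify(block_count: int) -> int:
    """Hypothetical decision tree: maps a basic-block invocation count to a class."""
    if block_count <= 3:
        return 0
    if block_count <= 7:
        return 1
    return 2


def enforcement_delay(block_count: int, elapsed: float,
                      policy: Dict[int, int], class_time: Dict[int, float]) -> float:
    """Extra delay needed so an execution looks like its policy's target class.

    `policy` maps a source class to the class it is merged into (unlisted classes
    stay put); `class_time` gives the execution time of each target class.
    """
    source = classify(block_count)
    target = policy.get(source, source)
    return max(0.0, class_time[target] - elapsed)


policy = {0: 1}                        # merge class 0 into class 1
class_time = {0: 1.0, 1: 3.0, 2: 5.0}
print(enforcement_delay(2, 1.2, policy, class_time))  # about 1.8: pad up to 3.0
print(enforcement_delay(9, 5.0, policy, class_time))  # 0.0: class 2 is untouched
```

A stochastic policy would replace the single `policy` lookup with a random draw of the target class according to the policy's probabilities.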
9.2 Exponential blowup for functional mitigations.
Figure 8 shows the possible observational classes for three observation classes. Each resulting cluster corresponds to a set of the original classes, and its execution time equals the upper envelope of all timing functions from the classes in that set.
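The blowup can be made concrete: if every nonempty subset of the original classes is a candidate merged class whose timing function is the pointwise maximum (upper envelope) of its members, then n classes yield 2^n - 1 candidates, which is 7 for three classes. In the sketch below, the class names A, B, C and their sampled timing functions are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Curve = List[float]  # execution time sampled at fixed public inputs


def upper_envelope(curves: List[Curve]) -> Curve:
    """Pointwise maximum of several timing functions."""
    return [max(values) for values in zip(*curves)]


def merged_classes(classes: Dict[str, Curve]) -> Dict[Tuple[str, ...], Curve]:
    """Every nonempty subset of classes, mapped to its upper envelope.

    With n classes this yields 2**n - 1 candidate observation classes.
    """
    names = sorted(classes)
    result: Dict[Tuple[str, ...], Curve] = {}
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            result[subset] = upper_envelope([classes[c] for c in subset])
    return result


classes = {"A": [1.0, 2.0, 3.0], "B": [3.0, 2.0, 1.0], "C": [2.0, 2.0, 2.0]}
candidates = merged_classes(classes)
print(len(candidates))         # 7
print(candidates[("A", "B")])  # [3.0, 2.0, 3.0]
```

Because the timing functions of A and B cross, their envelope is slower than either one alone at some inputs, so merging functional classes can cost more than merging scalar execution times.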
9.3 Performance bounds versus entropy measures.
Fig 9 shows how relaxing the performance bound affects the entropy values. For min-guess entropy, Fig 9(a) shows that the stochastic optimization improves the entropy gradually from 95 to 186 as the bound is relaxed, whereas the dynamic programming algorithm improves the entropy only once the performance bound exceeds 1.0. For Shannon and guessing entropy, Fig 9(b) shows how Schmit improves the entropy as the performance bound is relaxed.