Privacy Vulnerabilities of Dataset Anonymization Techniques

05/28/2019 ∙ by Eyal Nussbaum, et al.

Vast amounts of information of all types are collected daily about people by governments, corporations and individuals. The information is collected when users register to or use on-line applications, receive health-related services, use their mobile phones, utilize search engines, or perform common daily activities. As a result, there is an enormous quantity of privately-owned records that describe individuals' finances, interests, activities, and demographics. These records often include sensitive data and may violate the privacy of the users if published. The common approach to safeguarding user information, or data in general, is to limit access to the storage (usually a database) by using an authentication and authorization protocol. This way, only users with legitimate permissions can access the user data. In many cases though, the publication of user data for statistical analysis and research can be extremely beneficial for both academic and commercial uses, such as statistical research and recommendation systems. To maintain user privacy when such a publication occurs, many databases employ anonymization techniques, either on the query results or on the data itself. In this paper we examine variants of two such techniques, "data perturbation" and "query-set-size control", and discuss their vulnerabilities. Data perturbation deals with changing the values of records in the dataset while maintaining a level of accuracy over the resulting queries. We focus on a relatively new data perturbation method called NeNDS to show a possible partial-knowledge attack on its privacy. Query-set-size control allows publication of a query result dependent on having a minimum set size, k, of records satisfying the query parameters. We show that some query types relying on this method may still be used to extract hidden information, and prove that others maintain privacy even when using multiple queries.


1 Introduction

In today’s world, many organizations and individuals constantly gather information about people, whether directly or indirectly. This leads to enormous databases storing private information regarding individuals’ personal and professional lives. Commonly, access to these records is limited and safeguarded using authorization and authentication protocols: only authorized users may query the system for data. However, in today’s global network of organizational connections, there is a growing demand to disseminate and share this information, motivated by various academic, commercial and other benefits. This information is becoming a very important resource for many systems and corporations that may analyze the data in order to enhance and improve their services and performance.

The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals becomes increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. A comparative analysis and discussion of such algorithms with regard to statistical databases can be found in [1]. One common practice for publishing such data without violating privacy is applying regulations, policies and guiding principles for the use of the data. Such regulations usually entail data distortion for the sake of anonymizing the data.

In recent years, there has been a growing use of anonymization algorithms based on differential privacy, introduced by Dwork et al. [3]. Differential privacy is a mathematical definition of how to measure the privacy risk of an individual when they participate in a database. To construct a data collection or data querying algorithm which satisfies differential privacy, one must add some level of noise to the collected or returned data, respectively. While ensuring some level of privacy, these methods still have several issues with regard to implementation and data usability. Sarwate and Chaudhuri [4] discuss the challenges of differential privacy with regard to continuous data, as well as the trade-off between privacy and utility. In some cases, the data may become unusable after distortion. Lee and Clifton [5] discuss the difficulty of correctly implementing differential privacy with regard to the choice of ε as the differential privacy factor. Due to these issues and restrictions, other privacy-preserving algorithms are still prevalent in many databases and statistical data querying systems. In this paper, we address vulnerabilities of several implementations of such privacy-preserving algorithms.
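To make the noise-addition step concrete, the following is a minimal Python sketch of the Laplace mechanism commonly used to achieve differential privacy for a numeric aggregate. The function name, parameter choices and toy data are our own illustration, not code from the works cited above.

```python
import numpy as np

def dp_average(values, epsilon, value_range):
    """Differentially private average via the Laplace mechanism (illustrative sketch).

    values: the sensitive numeric records.
    epsilon: the privacy budget; smaller epsilon means more noise and more privacy.
    value_range: (lo, hi) bounds on a single record, used to derive sensitivity.
    """
    lo, hi = value_range
    n = len(values)
    # Sensitivity of the average: replacing one record changes it by at most (hi - lo) / n.
    sensitivity = (hi - lo) / n
    true_avg = float(np.mean(values))
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_avg + noise

# Hypothetical usage: speeds in km/h, assumed bounded by [0, 120].
speeds = [48.0, 52.5, 61.0, 57.2, 49.9]
print(dp_average(speeds, epsilon=0.5, value_range=(0.0, 120.0)))
```

As the text notes, the smaller the privacy factor ε, the larger the injected noise, and hence the worse the utility of the published statistic.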
The vulnerability of databases, and hence the potential avenues of attack, depend among other things on the underlying data structure (and query behavior). The information stored in databases also comes in many forms, such as plain text, spatial coordinates, numeric values, and others. Each combination of structure and data format allows for its own specific attack and requires its own unique handling of privacy protection. Another factor when handling privacy in databases is the type of queries allowed (which may be dictated by the previously mentioned structure and data format). For example, datasets with timestamp values may only allow min/max and grouping queries, while those containing sequential numeric values may also allow queries regarding averages, sums, and other mathematical formulas. In Section 2 we analyze the effectiveness of different queries using the k-query-set-size limitation over aggregate functions in maintaining individual user privacy in a vehicular network.
Another field where privacy concerns are a growing issue is the field of recommendation systems. Many of these systems use the collaborative filtering technique, in which users are required to reveal their preferences in order to benefit from the recommendations. Su et al. [6] survey these techniques in depth. Several methods aimed at hiding and anonymizing user data have been proposed and studied in an attempt to reduce the privacy issues of collaborative filtering. These methods include data obfuscation, random perturbation, data suppression and others [2, 7, 8, 9]. Most of these methods rely on experimental results alone to show effectiveness, and some have already been shown to have weaknesses that can be exploited in order to recover the original user data [10, 11]. Parameswaran and Blough [2] propose a new data obfuscation technique dubbed “Nearest Neighbor Data Substitution” (NeNDS). In Section 3 we detail a privacy attack on NeNDS based on partial prior information, as well as address shortcomings in the NeNDS algorithm and propose avenues of research for its improvement. Finally, we conclude in Section 4.

2 Combining Queries with k-Limited Results

The underlying data structure of a database is one of the factors in determining the querying methods used over the database. The database logic itself may further restrict queries, in some cases allowing for querying a specific key and in others only returning aggregate results over a set of values. The data type stored may also be a factor when discerning which querying methods may be used. Numeric values can allow for mathematical queries such as sums, averages and medians. Text fields may allow for string operations such as “contains”, “starts-with”, or even regular expressions. In the same manner, these queries may also be prohibited as they may convey information that is meant to remain private. Other limitations may be placed on queries as well, such as the query-set-size limitation, blocking query results in cases where a predefined number (k) of record look-ups has not been reached (i.e. the number of users/items taken into consideration by the query is less than k). Venkatadri et al. [12] recently demonstrated a privacy attack on Facebook users by utilizing an exploit in Facebook’s targeted advertising API which similarly restricted query results containing too few users. Using a combination of multiple queries which returned aggregate results (or no results due to a low number of users matching the query), the researchers were able to narrow down personally identifiable information which was regarded as private by the users. In this section we look at such cases and attempt to determine whether an attacker can use a combination of allowed queries in order to extract information which the prohibited queries mean to block. This may be done using multiple queries of the same type, or a combination of several query types.

2.1 Dataset and Query Models

We attempt to show privacy attacks on data gathered from vehicular networks. The gathered data is stored in a centralized database which allows a set of queries that are designed to return meaningful information without compromising the privacy of the users. A privacy attack is defined as access to any information gathered from the user that was not made available from standard queries on the database.

2.1.1 Graph Datasets Model


A vehicular network is comprised of n unique units distributed in the real world, displayed on a graph as a set of vertices such that each vertex represents one vehicle at a single (discrete) point in time t. The timestamps are measured as incremental time steps from the system’s initial measurement, designated t_0. We consider three different graph models:

  • A linear graph G_1 with n vehicles distributed along discrete coordinates on the x axis within the graph bounds.

  • A two-dimensional planar graph G_2 with n vehicles distributed along discrete coordinates on the x and y axes within the graph bounds.

  • A three-dimensional cubic graph G_3 with n vehicles distributed along discrete coordinates on the x, y and z axes within the graph bounds.

For each vehicle v at each timestamp t, the speed is measured. We denote this speed s(v, t), with t being a discrete-valued timestamp.

2.1.2 Query Model


The following is the set of queries allowed over the database.

  • AVG(R, t): given a range R and a timestamp t, return the average speed over all vehicles in the given range at the given time.

  • MAX(R, t): given a range R and a timestamp t, return the max speed over all vehicles in the given range at the given time.

  • MIN(R, t): given a range R and a timestamp t, return the min speed over all vehicles in the given range at the given time.

  • MED(R, t): given a range R and a timestamp t, return the median speed over all vehicles in the given range at the given time.

The range R is defined by a set of boundaries over the relevant graph:

  • in G_1: a starting coordinate x_start and an end coordinate x_end.

  • in G_2: a rectangle defined by two opposite corners (x_start, y_start) and (x_end, y_end).

  • in G_3: a box defined by two opposite corners (x_start, y_start, z_start) and (x_end, y_end, z_end).

In order to protect user privacy, all queries deal with measurements over aggregated data so as not to indicate a single user’s information. As such, the queries only return a result if at least k unique values have been recorded for the scope over which the query has been run. The value of k is known to the attacker; however, the number of records which were part of each query result is not (i.e. the attacker only knows that if a result was returned there are at least k records in the requested scope R, but not the exact number).
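As an illustration, here is a minimal Python sketch of a query interface enforcing this query-set-size control. The function names, dataset layout and value of K are our own simplification of the model above, not an implementation from the paper.

```python
from statistics import mean, median_low

K = 5  # minimum query set size; known to the attacker

# Toy dataset: vehicle -> (x coordinate, speed) at a fixed timestamp t.
vehicles = {
    "v1": (1, 48.0), "v2": (3, 52.5), "v3": (4, 61.0),
    "v4": (7, 57.2), "v5": (8, 49.9), "v6": (9, 55.1),
}

def query(kind, x_start, x_end):
    """Return the aggregate `kind` over speeds in [x_start, x_end], or None
    if fewer than K vehicles fall inside the range (query-set-size control)."""
    speeds = [s for (x, s) in vehicles.values() if x_start <= x <= x_end]
    if len(speeds) < K:
        return None  # blocked: the result would cover too few vehicles
    if kind == "AVG":
        return mean(speeds)
    if kind == "MAX":
        return max(speeds)
    if kind == "MIN":
        return min(speeds)
    if kind == "MED":
        return median_low(speeds)  # lower median, as assumed in Section 2.3
    raise ValueError(kind)

print(query("AVG", 1, 9))  # answered: 6 vehicles >= K
print(query("AVG", 1, 4))  # None: only 3 vehicles in range
```

Note that, matching the model, a blocked query returns nothing at all; the attacker learns only that the range held fewer than K vehicles.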

2.2 Analysis of AVG

In this section we present privacy attack problems over different graphs and queries.

2.2.1 Linear vehicular placement


Model: A linear graph G_1 with n vehicles.
Queries: AVG.
Attack: find the speed of a single vehicle at a given time t.

It is easy to see that a single query will not constitute an attack. The attack can be performed using the following algorithm:

  • Select a starting range R_0 = [x_0, x_end] for which the query returns a result.

  • Run query AVG(R_0, t) and denote the result a_0.

  • Select a new range R_1 = [x_1, x_end] with x_1 = x_0 + 1.

  • Run query AVG(R_1, t) and denote the result a_1.

  • Continue querying over such ranges, each time incrementing the starting coordinate, until a result isn’t returned. Mark the last starting coordinate which returned a result as x_j and the result returned as a_j. Note that there were exactly k records in this scope.

  • You can now backtrack over all results and calculate the speed of each vehicle between x_0 and x_j.

Denote this algorithm A_1. We can see that the runtime for this algorithm is the number of query iterations required to find a section with exactly k vehicles.
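The following Python sketch illustrates A_1 under the simplifying assumptions stated in its comments (at most one vehicle per discrete coordinate, contiguously occupied coordinates); the helper names are hypothetical.

```python
def attack_linear(query_avg, x_min, x_max, k, t):
    """Sketch of algorithm A_1: recover individual speeds on a linear graph.

    Assumes (for simplicity) at most one vehicle per discrete coordinate and
    contiguously occupied coordinates in every queried range [x, x_max].
    query_avg(x_start, x_end, t) returns the k-gated average or None.
    """
    averages = []  # (x_start, avg) for every range that answered
    x = x_min
    while x <= x_max:
        a = query_avg(x, x_max, t)
        if a is None:
            break  # the previous range was the last one with >= k vehicles
        averages.append((x, a))
        x += 1
    if not averages:
        return {}

    # The last answering range holds exactly k vehicles (one per coordinate),
    # so its speed *sum* is known; backtrack to the earlier, larger ranges.
    sums = {}
    x_j, a_j = averages[-1]
    sums[x_j] = a_j * k
    n_vehicles = k
    speeds = {}
    for (x_i, a_i) in reversed(averages[:-1]):
        n_vehicles += 1                          # one more vehicle than the range to its right
        sums[x_i] = a_i * n_vehicles
        speeds[x_i] = sums[x_i] - sums[x_i + 1]  # the vehicle dropped between the two ranges
    return speeds
```

Each pair of consecutive ranges differs by a single vehicle, so subtracting their speed sums isolates that vehicle's speed; this is the backtracking step named in the last bullet above.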

2.2.2 2D vehicular placement


Model: A two-dimensional planar graph G_2 with n vehicles distributed along discrete coordinates on the x and y axes within the graph bounds.
Queries: AVG.
Attack(1): find the speed of a single vehicle at a given time t.
Attack(2): find the average speed of a set of vehicles, with the size of the set smaller than k, at a given time t.
Assumptions: The values of n and k are known, where k < n.

We first select some value on the y axis, denote this value y', and split G_2 into 3 ranges (the section above y', the section below y', and the section containing only y'):

  • C_1: all coordinates with y > y'.

  • C_2: all coordinates with y < y'.

  • C_3: all coordinates with y = y'.

We define B_1 and B_2 to be C_1 ∪ C_3 and C_2 ∪ C_3 respectively. Note that both B_1 and B_2 contain C_3, and the union of B_1 and B_2 is the entire graph containing all vehicles (B_1 ∪ B_2 = G_2). See the partition example in Figure 1. It is important to note that, due to symmetry, this partition can also be done around some value x' on the x axis, with the sections built around this value x'.

We now perform 5 queries on G_2: AVG(C_1, t), AVG(C_2, t), AVG(B_1, t), AVG(B_2, t) and AVG(G_2, t), and denote their results a_C1, a_C2, a_B1, a_B2 and a_G respectively.

Figure 1: Vehicular network range partition around y'

If one of the selected queries does not return a response (i.e. its range contains less than k vehicles), we re-select y' and repeat the process until all 5 queries are answered (such a value should exist due to the size of G_2 and the probable distribution of vehicles).

Using the results we wish to find the average speed a_C3 of vehicles in section C_3, and the number of vehicles in each section. The number of vehicles in each section is a function of the section range and a given timestamp. We denote the number of vehicles in each section as follows:

  1. n_B1: the number of vehicles in B_1.

  2. n_B2: the number of vehicles in B_2.

  3. n_C1: the number of vehicles in C_1.

  4. n_C2: the number of vehicles in C_2.

  5. n_C3: the number of vehicles in C_3.

Note that n_B1 = n_C1 + n_C3 and n_B2 = n_C2 + n_C3.

To do so, we solve the following equation system, writing s_C3 = a_C3 * n_C3 for the (unknown) sum of speeds in C_3:

  • n_C1 + n_C2 + n_C3 = n.

  • a_B1 * (n_C1 + n_C3) = a_C1 * n_C1 + s_C3.

  • a_B2 * (n_C2 + n_C3) = a_C2 * n_C2 + s_C3.

  • a_G * n = a_C1 * n_C1 + a_C2 * n_C2 + s_C3.

Solving this system gives us the following:

  • n_C1, the number of vehicles in C_1.

  • n_C2, the number of vehicles in C_2.

  • n_C3, the number of vehicles in C_3.

  • s_C3, and with it the average speed a_C3 = s_C3 / n_C3 of the vehicles on the line y = y'.

Denote this process A_2. The runtime for this algorithm is equivalent to running 5 queries on the dataset, with the addition of solving the above equation system.
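A minimal sketch of the algebraic step of A_2, assuming the notation above; numpy's linear solver stands in for any method of solving the 4-equation system. (Degenerate vehicle distributions, e.g. a_B1 = a_C1, can make the system singular, in which case a new y' must be chosen.)

```python
import numpy as np

def solve_partition(n, a_B1, a_B2, a_C1, a_C2, a_G):
    """Solve the linear system at the heart of A_2 (reconstruction sketch).

    Unknowns: n_C1, n_C2, n_C3 (vehicle counts of the three sections) and
    s_C3 = a_C3 * n_C3 (the speed sum of the boundary line C_3). Inputs are
    the known total count n and the five k-gated AVG query results.
    """
    A = np.array([
        [1.0,         1.0,         1.0,  0.0],  # n_C1 + n_C2 + n_C3 = n
        [a_B1 - a_C1, 0.0,         a_B1, -1.0], # sum over B_1 = sum(C_1) + s_C3
        [0.0,         a_B2 - a_C2, a_B2, -1.0], # sum over B_2 = sum(C_2) + s_C3
        [a_C1,        a_C2,        0.0,  1.0],  # sum over G_2 = sum(C_1) + sum(C_2) + s_C3
    ])
    b = np.array([n, 0.0, 0.0, a_G * n])
    n_C1, n_C2, n_C3, s_C3 = np.linalg.solve(A, b)
    a_C3 = s_C3 / n_C3  # average speed on the line y = y'
    return round(n_C1), round(n_C2), round(n_C3), a_C3
```

The counts are rounded because they are integers obscured only by floating-point arithmetic; the recovered a_C3 is the exact average over the line, not an estimate.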

With these values we can now attempt Attack(1) and Attack(2): If n_C3 < k, we have succeeded in Attack(2), since a_C3 is the average speed of a set of vehicles smaller than k. If n_C3 > k, we can run A_1 on C_3, which represents the boundaries of a linear graph; we can select any vehicle with k vehicles on either side of it as the target vehicle and perform Attack(1). If n_C3 = k, we cannot complete either attack, so we select a new value y' and run A_2 again. There exists an edge case of graphs where for all values that we can choose as y', the number of vehicles will be equal to k, in which case we will be unable to perform any attack. This scenario is, however, unlikely in the case of vehicular networks. In addition, since we have the number of vehicles n_C1 and n_C2 in C_1 and C_2 respectively, if these values are sufficiently large in relation to k, we can look at these ranges as sub-graphs of G_2 and run A_2 on them with n = n_C1 and n = n_C2 respectively.
It is easy to see that we can apply the same method used on the two-dimensional graph to the three-dimensional graph G_3 with some minor modifications, as follows. We again select some value, this time on the z axis, denote this value z', and split G_3 into 3 ranges (the section above z', the section below z', and the section containing only z'). In this instance, these sections are represented as cubes in the following manner:

  • C_1: all coordinates with z > z'.

  • C_2: all coordinates with z < z'.

  • C_3: all coordinates with z = z'.

Similarly, we define B_1 and B_2 to be C_1 ∪ C_3 and C_2 ∪ C_3 respectively. Note that after running our five queries on the five sections, we obtain the same linear equations as in the two-dimensional case. Solving these equations now leaves us with the average speed over the plane defined by z = z', and the number of vehicles n_C3 in this plane. As in the two-dimensional case, if n_C3 < k we have succeeded in Attack(2). If n_C3 > k we now have a sub-graph of G_3 which constitutes a two-dimensional graph on which we may be able to perform A_2; the minimum number of vehicles in this plane for that to be possible again depends on k.
While our results, given as a_C3 in the two-dimensional case and as the corresponding plane average in the three-dimensional case, refer to the average speeds of vehicles in their respective graph placements, they are not limited to speed values. The same methods can be used for any numeric value that can be averaged over a set of vehicles in this manner, such as the number of traffic violations a vehicle has accumulated, the number of accidents the vehicle has been involved in, and so on. Any of these, when given as averages over a set of vehicles, may appear innocent and maintain a high level of privacy for an individual in the system. However, as we have shown, an individual’s data can be inferred with minimal effort by employing our methods. Of course, we are also not limited to vehicular networks. Any data set with the same structure of node placement in a graph will yield the same results.

2.3 Analysis of MIN, MAX and MED

In this section we look at possible attacks using the minimum, maximum and median value queries over ranges in the graph, as defined previously by MIN, MAX and MED respectively. Similar to the case of AVG, we define that the queries will not return a result if the target range R at time t contains fewer than k individual values. In addition, our analysis of potential attacks rests on the following set of assumptions:

  • The data set consists of unique values.

  • The value of k is known to the attacker.

  • In case a result is returned, the number of actual values in R is not known to the attacker.

  • If R contains an even number of values, MED returns the lower of the two median values.

  • The attacker is limited only to the MIN, MAX and MED queries, but can perform any number of queries over the data set.

For simplicity, we will treat the data set as in the previous section: a linear graph representing a snapshot in time of recorded speeds of vehicles in a specified area. A query of type f (f being MIN, MAX or MED) at time t over a range beginning at coordinate x_i and ending at x_j (inclusive) will be denoted f([x_i, x_j], t).

We note that there are several special cases in which a trivial attack can be performed. We will address these cases before moving on to the general case.

2.3.1 Case 1: Global Min/Max


Since there exists a unique global minimum and a unique global maximum in the graph, it is easy to see that, by querying over the entire graph and iteratively decreasing the range until a new minimum/maximum is found, the vehicles with the minimum and maximum speeds can be discovered.

2.3.2 Case 2: Local Min/Max


Similar to the case of a global min/max, if a vehicle holds a local minimum or maximum value with regard to its nearest neighbors, then its speed can be discovered. This is done using the same method as stated for the global min/max. A range consisting of k + 1 vehicles, with the outer vehicle having the min (max) speed in that group, must be found. Once found, decrease the range until a group of size k remains in its bounds. By our definition, the min (max) value now changes, and the attacker knows that the previous value belongs to the vehicle that has been removed from the range. Note that if such a min/max vehicle exists in the graph, the attacker can find it given enough queries.
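A minimal Python sketch of this window-shrinking attack, under the assumptions that speeds are unique, at most one vehicle occupies a coordinate, and the attacker has enumerated the occupied coordinates; query_min is a hypothetical handle to the gated MIN query.

```python
def attack_local_min(query_min, coords, k, t):
    """Sketch of the Case 2 attack: expose a vehicle holding a local minimum.

    coords: ordered list of occupied coordinates (assumed known to the attacker).
    query_min(x_start, x_end, t): the k-gated MIN query.
    Scans windows of k + 1 vehicles; when shrinking a window to k vehicles
    changes the minimum, the dropped outer vehicle owned the old minimum.
    """
    for i in range(len(coords) - k):
        wide = query_min(coords[i], coords[i + k], t)        # k + 1 vehicles
        narrow = query_min(coords[i + 1], coords[i + k], t)  # k vehicles
        if wide is not None and narrow is not None and wide != narrow:
            # The vehicle at coords[i] was the minimum of the wide window.
            return coords[i], wide
    return None  # no leftmost local-minimum vehicle found by this scan
```

The symmetric scan with MAX, or with the window shrunk from the right, covers the remaining local-extremum configurations.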

2.3.3 Case 3: k = 3


In this case, since all values are defined to be unique, querying MIN, MED and MAX on a range containing exactly 3 vehicles returns 3 values, each belonging to a specific vehicle. An attacker can query over a single coordinate at the left-most side of the graph and increase the range until a result is returned. The first time a result is returned, the minimum group size has been reached, and the attacker has the speed of each of the 3 vehicles. Each speed cannot yet be attributed to a specific vehicle, but we will denote these values s_a, s_b, s_c. The attacker now decreases the range’s size from the left until no result is returned; this indicates the range now only contains 2 vehicles. Increasing the range to the right until a result is returned indicates that a new vehicle has been added to the range. Since all values are unique, one of the values s_a, s_b, s_c will be missing from the results; it belongs to the left-most vehicle from the previous query results. Continuing this method until the entire graph has been scanned will reveal the speeds of each vehicle in the graph.
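The scanning attack lends itself to a short sketch as well. The following Python sketch assumes the gated query interface of Section 2.1.2 (here a hypothetical query callable), unique speeds, and at most one vehicle per coordinate; the last window's three speeds are returned without ordering, which a symmetric right-to-left pass could resolve.

```python
def attack_case3(query, x_min, x_max, t):
    """Sketch of the Case 3 (k = 3) scanning attack.

    query(kind, xs, xe, t) is the gated query; with k = 3 and unique speeds,
    MIN/MED/MAX over a window of exactly 3 vehicles reveal all 3 values.
    Returns recovered speeds in left-to-right vehicle order.
    """
    def values(xs, xe):
        lo = query("MIN", xs, xe, t)
        if lo is None:
            return None  # fewer than 3 vehicles in [xs, xe]
        return {lo, query("MED", xs, xe, t), query("MAX", xs, xe, t)}

    # Grow a window from the left edge until it first covers 3 vehicles.
    xs, xe = x_min, x_min
    cur = values(xs, xe)
    while cur is None and xe < x_max:
        xe += 1
        cur = values(xs, xe)
    if cur is None:
        return []  # fewer than 3 vehicles in the whole graph

    speeds = []
    while True:
        # Advance past empty coordinates while the window still holds 3 vehicles.
        while xs < xe and values(xs + 1, xe) is not None:
            xs += 1
        # Admit one vehicle from the right so [xs + 1, xe] answers again.
        nxt = None
        while xe < x_max and nxt is None:
            xe += 1
            nxt = values(xs + 1, xe)
        if nxt is None:
            speeds.extend(sorted(cur))  # rightmost window: values known, order not
            return speeds
        # The value that vanished belongs to the vehicle dropped on the left.
        speeds.append((cur - nxt).pop())
        cur = nxt
        xs += 1
```

Because speeds are unique, the set difference between consecutive windows always contains exactly one value, which is what attributes each speed to a vehicle.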

2.3.4 The General Case: k ≥ 4


We show that for the general case, there exists a linear placement of vehicles such that at least 1 vehicle will have a speed whose value will remain hidden from an attacker. Note that if a combination of MIN, MAX and MED queries could be used to attain the same results as the query AVG, then a privacy attack could be performed in the manner detailed in Section 2.2.1. Hozo et al. [13] devise a method to estimate the average value and variance of a group using knowledge of only the minimum, maximum and median values. However, for the attack described in Section 2.2.1 to succeed, the actual average value is required and not just an estimate. We use an adversarial model and show that for any number of vehicles n and any minimal query size k ≥ 4, a vehicle arrangement can be created in which the attacker, using any combination of the above mentioned queries, lacks the ability to discover the speed of at least 1 vehicle. For any value of n and k with n ≥ k ≥ 4 we prove this for a specific vehicle placed at the leftmost occupied coordinate on the x axis (denoted v_1). For any value of n and k with n ≥ 2k we prove this vehicle may be at any coordinate.

Lemma 2.1.

Let V be a set of n vehicles positioned along a linear graph at coordinates x_1 < x_2 < ... < x_n at time t. If k ≥ 4, for any value of n ≥ k there exists a corresponding assignment of speeds s_1, ..., s_n such that the speed s_1 of v_1 cannot be determined by any attacker with access to the MIN, MAX and MED queries over the graph.

Proof.

We prove by induction on n for k = 4, then extrapolate for k > 4 and any n ≥ k.

Show Correctness for n = 4


With 4 vehicles positioned at x_1 < x_2 < x_3 < x_4, set the values of s_1, s_2, s_3, s_4 such that s_1 is the second-largest of the four values. Since k = 4, the queries will only return results when the range queried contains the range [x_1, x_4]. It is easy to see that:

  • MIN([x_1, x_4], t) returns the smallest of the four values, which is not s_1.

  • MAX([x_1, x_4], t) returns the largest of the four values, which is not s_1.

  • MED([x_1, x_4], t) returns the lower of the two middle values, which is not s_1.

As such, the value of s_1 is never revealed.

Assume Correctness for n


Given a set of n vehicles positioned at coordinates x_1 < x_2 < ... < x_n, assume there exists an assignment of corresponding speeds s_1, ..., s_n such that s_1 cannot be determined by an attacker with access to any number of queries with a k = 4 limitation.

Prove for n + 1


We assign s_{n+1} such that, given the subgraph G' containing v_1, ..., v_n for which (by the induction assumption) the value of s_1 is never revealed by any query, s_1 remains hidden after adding v_{n+1}. We note the following properties regarding the node v_{n+1}, placed at x_{n+1} > x_n:

  1. There exists only one queryable range, [x_1, x_{n+1}], for which any query will take both v_1 and v_{n+1} into consideration.

  2. Regardless of the value of s_{n+1}, the queries MIN([x_1, x_{n+1}], t) and MAX([x_1, x_{n+1}], t) cannot return s_1 as a result. (Otherwise, s_1 would have been a result of one of the corresponding queries over the subgraph G'.)

Due to these properties, we must only ensure that the query MED([x_1, x_{n+1}], t) does not return s_1 as its result. Denote m to be the result of MED([x_1, x_n], t); by the induction assumption, m ≠ s_1. If m > s_1 then we set s_{n+1} so that s_{n+1} > m. Conversely, if m < s_1 then we set s_{n+1} so that s_{n+1} < m. In either case the new median remains on m’s side of s_1, and we now have an assignment such that the value s_1 cannot be discovered by an attacker.

Extrapolate for k > 4


The parameter k is defined as the minimum number of vehicles required to be in a range in order for a result to be returned. For any value of n, increasing the value of k only reduces the number of available queries that will return a result. Since it holds that there exists an assignment such that s_1 cannot be discovered for k = 4, setting k > 4 for the same assignment will not give any new information to the attacker and s_1 will remain unknown. It can be seen that this is true for any value of k such that 4 ≤ k ≤ n. ∎ While Lemma 2.1 holds for any such value of n and k, such an assignment, where a specific node is deterministically undiscoverable, is susceptible to prior knowledge attacks. In addition, in most real-world cases, the value of k is chosen to be an order of magnitude lower than n so as to allow for many queries. We show that for these cases, specifically any case where k ≥ 4 and n ≥ 2k, the vehicle whose speed is never returned by any query can be chosen as any vehicle by the adversary.

Lemma 2.2.

Let V be a set of n vehicles positioned along a linear graph at coordinates x_1 < x_2 < ... < x_n at time t. If n ≥ 2k, for any value of h there exists a corresponding assignment of speeds s_1, ..., s_n such that the speed s_h of node v_h cannot be determined by any attacker with access to the MIN, MAX and MED queries over the graph.

Proof.

We prove by induction on n for k = 4 (base case n = 8), then extrapolate for k > 4 and n ≥ 2k.

Show Correctness for n = 8


With 8 vehicles positioned at x_1, ..., x_8, set the values of s_1, ..., s_8 such that the speed s_h of the chosen target vehicle is never the result of any query. The value of s_h cannot be determined by an attacker even by running all possible query combinations on the graph. The results of all such possible queries can be seen in Table 1.

Table 1: All possible results of the MIN, MAX and MED queries with n = 8 and k = 4, over ranges containing 4, 5, 6, 7 and 8 vehicles.
Assume Correctness for n


Given a set of n vehicles positioned at coordinates x_1, ..., x_n, assume there exists an assignment of corresponding speeds s_1, ..., s_n such that there exists some value s_h, belonging to some vehicle v_h at position x_h, which cannot be determined by an attacker with access to any number of queries under a k = 4 limitation.

Prove for n + 1


We assign s_{n+1} such that, given the subgraph G' containing v_1, ..., v_n for which (by the induction assumption) the value s_h is never revealed by any query, s_h remains hidden. Assume v_h is not among the rightmost k − 1 vehicles. We note the following properties regarding the node v_{n+1}, placed at x_{n+1} > x_n:

  1. Regardless of the value of s_{n+1}, s_h cannot be the result of any MIN query over a range containing both v_h and v_{n+1}. (Otherwise, s_h would have been the result of a MIN query over the corresponding range in the subgraph G'.)

  2. Regardless of the value of s_{n+1}, s_h cannot be the result of any MAX query over a range containing both v_h and v_{n+1}, for the same reason.

Therefore, we must only assign s_{n+1} such that it does not cause s_h to be the result of any MED query. Define m_i to be the result of MED([x_i, x_n], t) for each queryable range [x_i, x_n] containing v_h. Due to the properties of MED, if m_i < s_h for one such query then m_j < s_h for all of them; conversely, if m_i > s_h for one such query then m_j > s_h for all of them. Otherwise at least one of those queries would have returned s_h as a result, which contradicts the induction assumption. Define m* to be the closest median value to s_h from the previously stated queries.

We set s_{n+1} to be some uniformly distributed random value strictly between m* and s_h. We now look at the ranges [x_i, x_{n+1}] and note that for any value of i, the results of MED([x_i, x_n], t) and MED([x_i, x_{n+1}], t) are either the same value or adjacent values, as the two ranges differ by exactly 1 value. Since no median value is adjacent to s_h (s_{n+1} now lies between the closest median m* and s_h), s_h cannot be the result of any query MED([x_i, x_{n+1}], t). There exist no other queries of the type MED which contain both v_h and v_{n+1}; therefore we now have an assignment such that the value s_h cannot be discovered by an attacker.
The above holds for the assumption that v_h is not among the rightmost k − 1 vehicles. It is easy to see that, due to symmetry, the remaining case allows us to shift all values of G' one vehicle to the right and assign the random value between m* and s_h to the new leftmost vehicle. This completes correctness for all positions of v_h.

Extrapolate for k > 4


Similar to Lemma 2.1, increasing k for a given value of n only reduces the amount of information available to the attacker. Therefore, if a hidden value s_h exists for an assignment in a graph with n vehicles under the k = 4 limitation (with n ≥ 8), it will exist for any value of k such that 4 ≤ k ≤ n/2. ∎

3 Collaborative Filtering

Collaborative filtering (CF) is a technique commonly used to build personalized recommendations on the Web. In collaborative filtering, algorithms are used to make automatic predictions about a user’s interests by compiling preferences from several users. In order to provide personalized information to a user, the CF system needs to be provided with sufficient information regarding his or her preferences, behavioral characteristics, as well as demographic information of the individual. The accuracy of the recommendations is dependent largely on how much of this information is known to the CF system. However, this information can prove to be extremely dangerous if it falls in the wrong hands. Several methods aimed at hiding and anonymizing user data have been proposed and studied in an attempt to reduce the privacy issues of collaborative filtering. Among these methods is the data obfuscation technique “Nearest Neighbor Data Substitution” (NeNDS) proposed by Parameswaran and Blough in [2]. Using this approach, items in each column of the database are clustered into groups by closeness of their values, and a substitution algorithm is applied to each group. The algorithm gives each item a new location within the group such that each item now corresponds to a new row in the original database. The relative closeness in values of the substituted items allows for the recommendation system to maintain a good degree of approximation when the CF algorithm is applied to obtain recommendations, while the substitution itself offers a level of privacy by hiding the original values associated with each individual user. In this section, we show the possibility of a privacy attack on the substituted database by an attacker with partial knowledge of the original data.

3.1 The NeNDS Algorithm

The Nearest Neighbor Data Substitution (NeNDS) technique is a lossless data obfuscation technique that preserves the privacy of individual data elements by substituting them with one of their Euclidean-space neighbors. NeNDS uses a permutation-based approach in which groups of similar items undergo permutation. The permutation approach hides the original value of a data item by substituting it with another data item that is similar to it but not the same. NeNDS treats each column in the database as a separate dataset. The first step in NeNDS is the creation of sets of similar items called neighborhoods. The items contained in each neighborhood are selected in a manner that maintains Euclidean closeness between neighbors, using some distance-measuring function suited to the data. Each data set is divided into a pre-specified number of neighborhoods. The items in each neighborhood are then permuted in such a way that each item is displaced from its original position, no two items undergo swapping, and the difference between the values of the original and the obfuscated items is minimal. The number of neighbors in each neighborhood is denoted m, with 3 ≤ m ≤ N, where N is the number of items in the dataset (this is due to the fact that m = 1 does not allow any permutation and m = 2 is the trivial case of swapping between 2 items, which is easily reversible).
The substitution process is performed by determining the optimal permutation set subject to the following conditions:

  • No two elements in the neighborhood undergo swapping.

  • The elements are displaced from their original position.

  • Substitution is not performed between duplicate elements.

The permutation mapping is done by creating a tree depicting all possible permutation paths and selecting the path with the minimal maximum distance between any 2 substitutions. For example, in a neighborhood of numeric values, the optimal path is the cyclic substitution order that displaces every item while keeping the largest difference between any substituted pair as small as possible. Once the substitutions in each neighborhood are complete, the column of the original database is replaced with a column containing the new item positions. The detailed algorithm can be found in [2]. Note that this algorithm is deterministic for any given value of m, and will yield the same permutations given any original order of the original dataset.
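The following Python sketch illustrates the substitution step on one neighborhood with hypothetical values. It brute-forces all fixed-point-free, swap-free permutations and keeps one minimizing the maximum substitution distance; the actual NeNDS algorithm builds a permutation-path tree instead of enumerating, and the duplicate-element condition is omitted here for brevity.

```python
from itertools import permutations

def nends_substitute(neighborhood):
    """Sketch of the NeNDS substitution step for one neighborhood of numeric
    values: find a permutation that displaces every item, swaps no pair, and
    minimizes the largest value change. Exhaustive search; illustrative only,
    suitable for small neighborhoods."""
    n = len(neighborhood)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        if any(perm[i] == i for i in range(n)):
            continue  # an item would keep its original position
        if any(perm[perm[i]] == i for i in range(n)):
            continue  # two items would simply swap places
        # Cost: the maximum distance between an item and its substitute.
        cost = max(abs(neighborhood[perm[i]] - neighborhood[i]) for i in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [neighborhood[best[i]] for i in range(n)], best_cost

# Hypothetical neighborhood of 4 values:
print(nends_substitute([10, 12, 13, 20]))
```

With 3 items both admissible permutations (the two 3-cycles) use the same value pairs and therefore the same cost, which is why the interesting optimization begins at m = 4.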

3.2 Privacy Attack on NeNDS

In this section we will show an attack on a NeNDS-permutated database by an attacker with partial knowledge of the original database; specifically, the attacker knows the original position of at least m − 2 items in each neighborhood. The attack is performed under the following assumptions:

  • The attacker has complete knowledge of the NeNDS algorithm.

  • The attacker knows the neighborhood size m used by the algorithm.

  • The attacker can measure the Euclidean distance between the items in the database.

  • The attacker has access to the output permutated database (i.e. the new positions of all items).

We will show the attack for a single dataset (column); however, since the algorithm is performed independently for each dataset, this can be extended to the entire database. For a given dataset of size N, we define the following notations:

  • Let D be the original dataset.

  • Let D' be the NeNDS-obfuscated dataset.

  • Let N_i be the original data items in the i-th neighborhood.

  • Let N'_i be the obfuscated data items in the i-th neighborhood.

  • Let u_1, u_2 be the 2 items in N_i whose original position is unknown to the attacker.

The attack is successful if the attacker can determine the original position in D of u_1 and u_2 for all values of i.

3.2.1 The Case of m = 3


We look at the simple case of the minimal neighborhood size, m = 3. In this case, for each value of i the neighborhood N_i contains 3 items; denote them a, u_1 and u_2. The attacker can only know the location of 1 of these items (m − 2 = 1). Assume, without loss of generality, that the attacker knows the position of a and that a was originally in the first position, so the original neighborhood order is known to be (a, ?, ?), where both remaining positions could be the original positions of u_1 and u_2. We now look at the output neighborhood N'_i after the NeNDS algorithm. Due to the restrictions of the NeNDS algorithm, which require each item to be relocated and do not allow swapping between 2 items, the resulting neighborhood can only be one of the following cyclic permutations:

  1. Every item moves one position to the right (cyclically), so a is now in the second position.

  2. Every item moves one position to the left (cyclically), so a is now in the third position.

Any other permutation would entail leaving an item in its original position or swapping 2 items. Assume permutation (1). The attacker can determine that the item now in the first position could not have originally been in the second position, since that is the current position of a and the algorithm does not allow swapping between 2 items. Therefore it originally occupied the third position, and the remaining item originally occupied the second position. Assume permutation (2). The attacker can determine that the item now in the first position could not have originally been in the third position for the same reason, and reaches the same conclusion: the original order of the neighborhood is fully recovered.
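A minimal Python sketch of this m = 3 reversal, using the observation that both admissible permutations are cyclic shifts of the three rows; the names and values are illustrative.

```python
def attack_m3(original_known_pos, obfuscated):
    """Sketch of the Section 3.2.1 attack for neighborhood size m = 3.

    obfuscated: the neighborhood after NeNDS, as a list of 3 values in row order.
    original_known_pos: (value, row) for the one item whose original row the
    attacker knows. Since every item must move and no pair may swap, the
    obfuscation is one of the two 3-cycles; seeing where the known item landed
    identifies which cycle was applied, and the cycle can then be undone.
    """
    value, orig_idx = original_known_pos
    new_idx = obfuscated.index(value)
    shift = (new_idx - orig_idx) % 3  # uniform cyclic shift of +1 or +2 rows
    assert shift in (1, 2), "an item cannot remain in place under NeNDS"
    # Undo the cycle: the value now at row r originated at row (r - shift) mod 3.
    original = [None, None, None]
    for r, v in enumerate(obfuscated):
        original[(r - shift) % 3] = v
    return original

# The attacker knows value 10 was originally at row 0 and observes the output:
print(attack_m3((10, 0), [13, 10, 12]))  # recovers [10, 12, 13]
```

One known position per neighborhood thus suffices when m = 3, matching the m − 2 knowledge assumption above.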

3.2.2 The General Case of any m


In this section we will show that the knowledge of m − 2 original value positions is enough for an attacker to learn the original positions of all values in a neighborhood. We define pos(v) and pos'(v) for any value v to be the original and new location (row) of that value respectively. Taking some neighborhood N_i in D, the attacker knows the position pos(v) for m − 2 values in N_i. For 2 values, u_1 and u_2, the original positions remain unknown; denote these 2 unattributed positions p_1 and p_2. After obfuscation, all new positions are known to the attacker. With this knowledge, since the values in each neighborhood are chosen by their Euclidean closeness, the attacker learns the 2 values u_1 and u_2 and their new positions pos'(u_1) and pos'(u_2). There remain 2 possible original positions between which the attacker cannot distinguish (i.e. each one of the values u_1, u_2 could have been at each one of the positions p_1, p_2 originally).
We now examine the new values residing in positions p_1 and p_2. There are 2 cases: either 1 of the values is u_1 or u_2, or both values are from the other m − 2 values in N_i whose original positions are known to the attacker. Note that the case where both u_1 and u_2 now reside in p_1 and p_2 cannot exist, since by definition of the algorithm no item stays in place and no 2 items undergo swapping. We now show the attack for both cases, resulting in the discovery of the original positions of u_1 and u_2.

Case 1


Assume, without loss of generality, that u_1 now resides in a position whose original value is unknown, say pos'(u_1) = p_1. It is easy to see that pos(u_1) = p_2, since no item remains in the same position after obfuscation. In addition, the remaining unknown position must be pos(u_2) = p_1. The attacker now knows the original position of both previously unknown values.

Case 2


In this case, both p_1 and p_2 now contain values whose original positions were known to the attacker. We arbitrarily define the values now occupying p_1 and p_2 to be w_1 and w_2 respectively. The attacker can now use the following method to backtrack the obfuscation path and find the original positions of u_1 and u_2.