Graph Based Proactive Secure Decomposition Algorithm for Context Dependent Attribute Based Inference Control Problem

03/01/2018 · Ugur Turan et al. · Middle East Technical University and The University of Texas at Dallas

Relational DBMSs continue to dominate the database market, and the inference problem on the external schema of relational DBMSs is still an important issue in terms of data privacy. Especially over the last 10 years, external schema construction for application-specific database usage has become increasingly independent of the conceptual schema, as the definitions and implementations of views and procedures have been optimized. This paper offers an optimized decomposition strategy for the external schema, which concentrates on the privacy policy and the required associations of attributes for the intended user roles. The proposed method performs a proactive decomposition of the external schema in order to satisfy both the forbidden and the required associations of attributes. Functional dependency constraints of a database schema can be represented as a graph, in which vertices are attribute sets and edges are functional dependencies. In this representation, the inference problem can be defined as searching for a subtree in the dependency graph containing the attributes that need to be related. The optimized decomposition process aims to generate an external schema which guarantees the prevention of inference of the forbidden attribute sets while guaranteeing the association of the required attribute sets, with minimal loss of possible associations among other attributes, provided that the inhibited and required attribute sets are consistent with each other. Our technique is purely proactive and can be viewed as a normalization process. Due to the usage independence of external schema construction tools, it can easily be applied to existing systems without rewriting the data access layer of applications. Our extensive experimental analysis shows the effectiveness of this optimized proactive strategy for a wide variety of logical schema volumes.


1 Introduction

As the demand for automated systems and processes has increased, the technology in business applications has focused on two different dimensions: application usage and statistical analysis. In both cases, inference-based privacy preservation in databases has been an important problem for protecting sensitive data. Modern approaches like differential privacy techniques [12, 16] or intentional deception mechanisms provide secure ways to present statistical results without revealing sensitive data. However, these techniques cannot be applied to traditional applications [23].

Many business applications aim to monitor and update single-entity data. As an example, consider the call center module of a bank. When a customer wants to apply for a campaign, the operator should check her transaction history for prerequisites. Transactions are sensitive data, but they cannot be altered by adding noise, nor can hypothetical rows be added for deception: the financial transactions must be viewed exactly as they are. If the task were a statistical analysis of transactions, both techniques could have been applied to protect against the inference of sensitive data. However, these kinds of processes are mainly based on single-entity business procedures, i.e., standard application usage. That is, the call center employee should be able to access a set of sensitive data according to the assigned user role, and the external layer of the database presented to this user role should not reveal any more information than required.

This objective requires three different perspectives. The first is that the schema of the external layer should be decomposed with a fine-grained, attribute-based approach that preserves the required associations for the user role and prevents any other inferences; this is the focus of this article, and the necessary theorems and algorithms are proposed in this paper. The second perspective deals with inferences based on the dynamic data distribution, and the last one concerns collaboration attacks [8, 13]. Both must also be handled in order to preserve the privacy of sensitive data, and mechanisms for these two inference channels can be applied as add-ons to the strategy given in this paper. However, the first and main objective should be arranging the external layer for a specific user so as to prevent all unwanted inference operations. By definition, this is a proactive step, and it should be viewed as a policy-based normalization stage in terms of privacy.

For this research, we were motivated by a real-life example. The problem was related to a recycling business application developed for a municipality in the Antalya province of Turkey. The product was a web and point-of-sale application in which all citizens holding smart cards gave their recycling waste to waste-collector companies, and these companies loaded credits onto the citizens' cards according to the current expected market value of the waste, determined also by the type of the waste. In 2016, a citizen complained that she had been identified and was receiving messages consistent with her consumption and the recycling waste she produced. It may be argued that many citizens have nearly the same consumption and, accordingly, similar waste statistics over a year. However, this case was different, as the claimant owns a handmade gift company and produces much more glass waste in November, during the preparation of gifts for New Year's Day. Waste-collector companies were expected to query only the time-based collection statistics of the trucks, and the town management was expected to view only the usage statistics of the system. Additionally, the collector's views are defined as a subset of the town management's external view set.

For a simplified description, the views are as follows:

[Available for the Collector and Town Management]

View1 = (TruckId, DateTime, GPSCoordinates, TotalWasteWeight, RecyclingWasteWeight, WasteType)

[Available for only Town Management]

View2 = (CitizenId, Name, Surname, Address, PhoneNumber)

A malicious worker in the town management can use the GPSCoordinates attribute in association with the Address attribute to determine a small subset of citizens who had given glass waste in a period of time, using a simple GPS checking function such as:

Select *
From View1, View2
Where near(View1.GPSCoordinates, View2.Address)
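The near predicate here stands in for a hypothetical user-defined proximity function; it is not part of any DBMS. A minimal Python sketch, assuming both the GPS reading and the geocoded address are (latitude, longitude) pairs, could look like:

import math

def near(gps, address, threshold_km=0.1):
    # Hypothetical proximity predicate: true when the truck's GPS reading
    # lies within threshold_km of the geocoded address (haversine distance).
    lat1, lon1 = map(math.radians, gps)
    lat2, lon2 = map(math.radians, address)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a)) <= threshold_km  # Earth radius ~6371 km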

Indeed, one malicious worker shared this information with an advertisement company, which used it in favor of their glass-producer clients. It is not surprising for a company that regularly purchases glass to receive messages from glass producers. However, this small gift company is part of a charity organization: it uses glasses collected from its members and purchases little from the market. Therefore, it should not have been possible for glass producers to single this company out for advertisement. As a result, the privacy of the citizen had been violated. The main reason behind the problem is the lack of a security policy while defining the views, and the lack of attribute-association-based cross-control between the views. As a solution, security dependent sets have been formed for the original database schema and the algorithms proposed in this paper have been applied. It should be noted that the relationship between the GPSCoordinates and Address attributes is a probabilistic dependency, which can be treated as a kind of functional dependency. To overcome this privacy problem, the following views are generated:

New View1 = (TruckId, DateTime, WasteType, RecyclingWasteWeight, TotalWasteWeight)

New View2 = (TruckId, GPSCoordinates)

New View3 = (CitizenId, Name, Surname, Address, PhoneNumber)

Any join between "New View1" and "New View2" is not a meaningful join (i.e., an equi-join between primary and foreign keys), as TruckId is only a foreign key, not a primary or candidate key, in these relations [26].

This paper focuses on this problem and presents a complete mechanism to satisfy the goal. The core of the mechanism is the Functional Dependency Graph representation of a database schema, constructed by defining attribute sets as vertices and dependencies as edges. The aim is to find an external layer decomposition that strictly allows the required attribute associations and prevents the inhibited ones, both in compliance with the privacy policy for a specific user role. Owing to the nature of the domain, the proposed mechanism has attribute-based granularity and the advantage of rearranging the external layer without making any change to the other layers. There are alternative approaches to this kind of secure decomposition: the decomposition may be based on the forbidden attribute sets to satisfy the privacy, or on the required attribute sets needed for the user role [5]. The strategy given in this paper is an optimized combination of these two approaches, and the most crucial step proposed in this paper is to check the required sets together with the forbidden sets. This control step proactively assures a fully compliant policy.

The rest of this paper is organized as follows: the preliminaries and problem definition are given in the next section. In the following section, we first present the graph-based representation of the problem, then the formal definitions related to the problem, and then the algorithm for secure decomposition, which is based only on inhibited attribute sets and aims to minimize dependency loss. Afterward, the required attribute sets are defined, and the algorithm is extended to perform a complete policy check and output a minimal secure decomposition with the help of both the user policy-based requirements and the privacy policy. All algorithms are proven to produce secure decompositions. A brief related work section reviews the literature, and experiments show that the algorithm is applicable even to large relational schemas. Future work and conclusions are given at the end of the paper.

2 Preliminaries and Problem Definition

Secure decomposition of the external schema to prevent unwanted inferences was first covered in the literature by [11] and [10]. These works concentrate on the required attribute sets (visibility constraints) and produce minimal-sized fragments according to the dependencies and constraints. We improved this process in our previous work [26], where we formally defined the security dependent set concept and the secure decomposition problem, and proposed a decomposition algorithm. That algorithm aims to produce maximal fragments (minimal dependency loss) according to a forbidden set of attributes.

We use the following concepts (the original and fully formal definitions are in [26]):

  • Security dependent set is a set of attributes from the logical schema that should not be associated by using the related schema and meaningful joins. The sets are determined logically with respect to the domain requirements.

  • Meaningful join is, briefly, an equi-join between a foreign key and a primary key. The inference problem is defined over inhibiting all possible meaningful joins among the attributes of the forbidden attribute sets.

  • The schema closure is the closure of all relations in a logical schema, determined by performing all possible meaningful joins on all relations of the schema.

  • F+ is the closure of all functional dependencies, including the ones produced by the transitivity, union and decomposition properties of functional dependencies.

  • Identifiers of an attribute are defined as the attribute sets on which the attribute is dependent in F+.

  • Secure logical schema is one that prevents all associations of the security dependent sets by using the schema closure and F+. It is generated by a decomposition algorithm applied on the original schema; the algorithm is given in [26].

The original definitions in [26] also include the definition of probabilistic dependencies. For the sake of simplicity, they can be assumed to produce new security dependent sets, as in the decomposition algorithm. Therefore, probabilistic dependencies are assumed to be inherently covered by all definitions and algorithms in this paper. Moreover, a security dependent set with a single attribute can easily be eliminated by removing that attribute from the schema, so all security dependent sets are assumed to consist of at least two attributes hereafter.
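As a small illustration of this preprocessing, the following Python sketch (an illustrative helper of our own, not part of [26]) treats each probabilistic dependency as an additional security dependent set and drops singleton sets together with their attributes:

def preprocess(schema_attributes, security_sets, probabilistic_deps):
    # Probabilistic dependencies are assumed to contribute new
    # security dependent sets, as described above.
    sets = [set(s) for s in security_sets] + [set(d) for d in probabilistic_deps]
    for s in [s for s in sets if len(s) == 1]:
        schema_attributes -= s   # remove the lone attribute from the schema (a set)
        sets.remove(s)           # the singleton set is now trivially satisfied
    return schema_attributes, sets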

The decomposition algorithm proposed in [26] works as follows (a sketch in Python is given after the steps):

  1. Produce the power set of all attributes of a relation.

  2. For each element set in this power set:

    1. Eliminate the element set if it contains all attributes of any security dependent set.

    2. Eliminate the element set if it contains any attribute of a security dependent set together with its identifier.

  3. Lastly, eliminate all element sets (trivial subsets) contained by other element sets.
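The following Python sketch illustrates these steps for a single relation. It is a minimal sketch under our own naming assumptions: identifiers maps each attribute to its identifier sets, and the power-set enumeration is exponential, as in the original algorithm.

from itertools import combinations

def strong_cut_decompose(attributes, forbidden_sets, identifiers):
    # Step 1: all non-empty subsets of the relation's attributes.
    candidates = [set(c) for r in range(1, len(attributes) + 1)
                  for c in combinations(sorted(attributes), r)]

    def violates(subset):
        for fset in forbidden_sets:
            # Step 2a: the subset contains a whole security dependent set.
            if fset <= subset:
                return True
            # Step 2b: the subset contains a forbidden attribute together
            # with one of its identifiers.
            for attr in fset & subset:
                if any(ident <= subset for ident in identifiers.get(attr, [])):
                    return True
        return False

    survivors = [s for s in candidates if not violates(s)]
    # Step 3: drop subsets contained in other surviving subsets.
    return [s for s in survivors if not any(s < t for t in survivors)]

# Hypothetical relation: key K identifies A and B; {A, B} is forbidden.
print(strong_cut_decompose({"K", "A", "B"}, [{"A", "B"}],
                           {"A": [{"K"}], "B": [{"K"}]}))
# -> [{'K'}, {'A'}, {'B'}], i.e., the strong cut of the example below.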

In order to illustrate the behavior of the secure decomposition algorithm in [26], we consider the following simple example:

Let a relation be given whose primary key is the single identifier for all other attributes, with the dependencies illustrated in Figure 1, and let there be a single security dependent set.

Figure 1: Dependency Graph

Therefore, the subsets containing both attributes of the security dependent set need to be eliminated in order to prevent their association. Moreover, the subsets containing one of these attributes together with the primary key will be eliminated as well, since the key is the identifier. After the elimination of trivial subsets, the secure decomposition of the relation can be given as:

       

Figure 2: Dependency Graph After Strong-Cut

As can be seen from the new decomposed schema, there is no way to perform a meaningful join between the decomposed sets to associate the forbidden attributes. In graph notation (details will be discussed later), Figure 2 shows the dependencies formed after the decomposition; according to this figure, there is no vertex in the graph from which both forbidden attributes can be reached. In other words, if the ways of associating the attributes of a security dependent set are defined as chains of functional dependencies through meaningful joins, the algorithm breaks these chains from both sides for both attributes. It is obvious that the relations containing the security dependent set should be removed, but the algorithm in [26] also breaks the association of each attribute in the security dependent set with its identifiers to prevent all meaningful joins. In this paper, we call this strategy the strong-cut approach.

However, this approach can be relaxed by cutting the chains at only a single point, producing:

   

The dependencies of this schema are depicted in Figure 3. There is also another possible schema:

   

Figure 3: Dependency Graph After Weak-Cut

Both of these schemas are consistent with the privacy constraint defined by the security dependent set. The aim of this work is to develop an algorithm that decomposes the original schema with a minimum loss of functional dependencies while satisfying the security constraints.

The motivation of this work can be described as developing a decomposition that is no lossier than needed. This requirement defines an optimization problem, and to the best of our knowledge, this is the first attempt in the literature to construct an optimized secure decomposition that satisfies the policy while minimizing the dependency loss.

The directed graph representation is selected as the most suitable mathematical model for the problem, since a functional dependency can easily be represented as a directed edge; in this way, the algorithmic background of graph theory can be used for further enhancements of the concept and the algorithm. As a result, a new algorithm will be proposed for the secure decomposition concept, which aims to decompose the original schema minimally while prohibiting the decomposed relations from being used in meaningful joins to associate the attributes of a security dependent set.

The problem is basically building a decomposition of the original schema such that no set of securely dependent attributes can be associated by joins on keys.

This paper also introduces the required attribute set definition, and the relaxed-cut algorithm is extended with a consistency check between the forbidden and required attribute sets.

3 Relaxed-Cut Secure Decomposition Algorithm

We first give the basic definitions using graph notation.

Definition 1

Functional Dependency Graph (denoted as FDG hereafter, for a schema): The given functional dependency set F of a logical schema (where F is decomposed, i.e., there is a single element on the right-hand side, so that for each functional dependency X → A, X is an attribute set and A is a single attribute) can be represented as a directed graph G as follows:

  • In a normalized schema, all attributes are expected to appear in F; but the schema may not be normalized, so each attribute is also an individual node of G.

  • Each of X and A is a single node in G.

  • Each relation (as an attribute set) is an individual node in G.

  • Each dependency X → A of F is an edge in G, if both sides of the dependency exist as different nodes in G.

Example-1: Assume that the logical schema consists of four relations, in which some of the attributes are foreign keys.

Then, the graph (FDG) constructed for this schema is given in Figure 4.

Figure 4: FDG of Example-1
1:Input:
2:S: logical schema as R1, …, Rn; F: set of functional dependencies
3:Output:
4:G = (V, E): functional dependency graph of S
5:begin
6:V ← ∅
7:E ← ∅
8://Step-1
9:for each (X → Y) ∈ F do
10:     if |Y| > 1 then
11:         remove (X → Y) from F
12:         for each A ∈ Y do
13:              add (X → A) to F
14:         end for
15:     end if
16:end for
17://Step-2
18:for each Ri ∈ S do
19:     for each attribute A ∈ Ri do
20:         add A to V
21:     end for
22:end for
23://Step-3
24:for each (X → A) ∈ F do
25:     if |X| > 1 and X ∉ V then
26:         add X to V
27:     end if
28:end for
29://Step-4
30:for each Ri ∈ S do
31:     if attributes(Ri) ∉ V then
32:         add attributes(Ri) to V
33:     end if
34:end for
35://Step-5
36:F+ ← closure set of F
37://Step-6
38:for each (X → A) ∈ F+ do
39:     if X ∈ V and A ∈ V and X ≠ A then
40:         add edge (X, A) to E
41:     end if
42:end for
43:end
Algorithm 1 Constructing Functional Dependency Graph (FDG)

The steps of the FDG construction algorithm are as follows:

  1. Decompose all functional dependencies in F, such that each functional dependency has a single element on the right-hand side.

  2. Create an individual vertex for each attribute in the schema and add it to G.

  3. Create vertices for the attribute sets with more than one element which appear on the left-hand side of any functional dependency and do not yet exist in G.

  4. Create additional vertices for the attribute sets of the relations in the schema that do not yet exist in G.

  5. Generate F+.

  6. For each X → A in F+, add an edge (X, A) to G if X and A are different vertices in G.

The graph in Figure 4 is obtained by applying the above algorithm to Example-1.
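A compact Python sketch of this construction is given below, assuming relations are given as attribute sets and functional dependencies as (left-hand side, right-hand side) pairs; the closure computation applies only the transitivity rule, a simplification of the full F+.

def build_fdg(relations, fds):
    # Step 1: decompose dependencies to single right-hand-side attributes.
    decomposed = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    vertices = set()
    # Step 2: one vertex per individual attribute.
    for rel in relations:
        vertices |= {frozenset([a]) for a in rel}
    # Step 3: vertices for composite left-hand sides not yet present.
    for lhs, _ in decomposed:
        if len(lhs) > 1:
            vertices.add(lhs)
    # Step 4: a vertex for each relation not already present.
    for rel in relations:
        vertices.add(frozenset(rel))
    # Step 5: transitive closure of the dependencies (simplified F+).
    closure = set(decomposed)
    changed = True
    while changed:
        changed = False
        for lhs, a in list(closure):
            for lhs2, b in list(closure):
                if lhs2 == frozenset([a]) and (lhs, b) not in closure:
                    closure.add((lhs, b))
                    changed = True
    # Step 6: edges between distinct existing vertices.
    edges = {(lhs, frozenset([a])) for lhs, a in closure
             if lhs in vertices and frozenset([a]) in vertices
             and lhs != frozenset([a])}
    return vertices, edges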

Lemma 1

The edge set of the transitive closure of the FDG is equal to the edge set induced by F+.

(SKETCH) It can easily be seen that the definition of transitivity is the same for functional dependencies and for the corresponding graph; the equivalence follows from the transitive property of both.

Definition 2

Common Ancestor of an Attribute Set is a vertex in the FDG from which there exist simple paths to each element of the attribute set.

In Figure 4, for example, any vertex with simple paths to both elements of a given attribute pair is a Common Ancestor of that pair.

Definition 3

Join Chain of an Attribute Set (denoted as JC hereafter) is the set of edges of the simple paths in the FDG from a common ancestor to the elements of the attribute set.

An attribute set may be a forbidden set (i.e., a security dependent set) or a required set (which will be defined later). Let the relational schema be as in Example-1, with the functional dependency graph constructed as in Figure 4, and let a forbidden set be given. The join chain sets for this forbidden set are generated by the algorithm below; in the original figure, the first join chain is emphasized in red and bold as an illustrative example.

1:Input:
2:G = (V, E): functional dependency graph of the schema
3:T: attribute set
4:Output:
5:JC: join chain set
6://Step-1
7:begin
8:G' ← G
9:for each (u, v) ∈ E(G') do
10:     remove (u, v) from G'
11:     add (v, u) to G'
12:end for
13://Step-2
14:Initialize TargetArr as array of array of vertices
15:Initialize PathArr as array of array of set of edges
16:for each t ∈ T do
17:     C ← empty set of connected vertices
18:     P ← empty set of path edge sets
19:     C ← apply DFS to G' with starting vertex t
20:     TargetArr[t] ← C
21:     for each v in C do
22:         p ← ∅
23:         p ← simple paths to v in G'
24:         add p to PathArr[t]
25:     end for
26:end for
27://Step-3
28:CA ← array of vertices shared by all TargetArr rows
29:for each combination of simple paths, one per t ∈ T, ending at the same vertex in CA do
30:     add the union of the combination's edges to JC
31:end for
32://Step-4
33:for each jc1 ∈ JC do
34:     for each jc2 ∈ JC do
35:         if jc2 ⊂ jc1 then
36:              remove jc1 from JC
37:         end if
38:     end for
39:end for
40:end
Algorithm 2 Generating Join Chain Set Algorithm

The steps of the Join Chain Construction algorithm are as follows:

  1. All edges are reversed.

  2. Taking each element of the attribute set as a starting vertex, apply DFS to reach all connected vertices, and determine all possible simple paths to each end vertex.

  3. If, for all set attributes (assumed to be starting vertices), there exist simple paths to the same end vertex (a common ancestor in the original FDG), then every combination of the constructed simple paths, starting from the different set attributes and ending at the same vertex, is a join chain.

  4. If a chain contains another chain, it is discarded.

In Example-1, two vertices are determined as common ancestors, and all combinations of the simple paths from them to the set attributes (two alternatives each) are given as different join chains.
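A Python sketch of Algorithm-2 under the same graph representation (vertices as frozensets of attributes, edges as vertex pairs) follows; the simple-path enumeration is a DFS scheme in the spirit of [21] and is exponential in the worst case:

from itertools import product

def join_chains(edges, attr_set):
    # Step 1: reverse all edges (map each vertex to its predecessors).
    radj = {}
    for u, v in edges:
        radj.setdefault(v, set()).add(u)

    def simple_paths(start):
        # Step 2: DFS over the reversed graph; collect, for every reachable
        # vertex, all simple paths from `start` as sets of original edges.
        paths = {}
        def dfs(v, visited, trail):
            paths.setdefault(v, []).append(frozenset(trail))
            for w in radj.get(v, ()):
                if w not in visited:
                    # (w, v) keeps the original edge direction in the trail.
                    dfs(w, visited | {w}, trail + [(w, v)])
        dfs(start, {start}, [])
        return paths

    per_attr = [simple_paths(a) for a in attr_set]
    # Step 3: common ancestors are reachable from every attribute of the set;
    # every combination of simple paths meeting there is a join chain.
    common = set.intersection(*(set(p) for p in per_attr))
    chains = set()
    for anc in common:
        for combo in product(*(p[anc] for p in per_attr)):
            chains.add(frozenset().union(*combo))
    # Step 4: discard chains that contain another chain.
    return {c for c in chains if not any(o < c for o in chains)}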

Definition 4

Minimum-Cut Secure Decomposition: decomposing the relational schema by removing the minimum number of functional dependencies (i.e., not allowing the attributes of a removed functional dependency in the same relation) needed to satisfy all security requirements.

The Minimum-Cut Secure Decomposition Problem is equivalent to the Minimum Hitting Set Problem [2], and thus it is NP-Complete: each join chain is a set of edges, and a valid decomposition must remove a set of edges that hits every join chain, so minimizing the number of removed dependencies is exactly minimizing a hitting set. We propose a simple greedy heuristic algorithm to solve the Relaxed-Cut Secure Decomposition problem (defined below), and due to the structure of our problem, we observed that this greedy approach mostly determines the optimal solution.

Definition 5

Relaxed-Cut Secure Decomposition: decomposing the relational schema by greedily removing functional dependencies in order to cut every join chain of the securely dependent attributes at at least one point.

Unlike the strong cut, which cuts the identifiers of all attributes of the secure dependent sets, the relaxed cut aims to remove as few functional dependencies as possible. The steps of the Relaxed-Cut Secure Decomposition algorithm are as follows:

  1. Calculate all join chains for each security dependent set (Algorithm-2).

  2. For each edge in the FDG, determine the number of times (SecurityCount) it appears in join chains.

  3. Sort the edges first by their SecurityCount in descending order, and then by the number of attributes on the nodes at both sides of the edge in ascending order (in order to cut lower chains first).

  4. Traverse the sorted list and mark as cut each join chain containing the current edge. An edge is selected only if it is an element of at least one unmarked join chain. The attribute sets of the selected edges are named the new security dependent sets.

  5. Generate all subsets of the attributes of the relational schema (the power set in the algorithm). Then, for each new security dependent set, process each element of the power set and eliminate it if it contains all attributes of that security dependent set together.

  6. After that, among the remaining subsets, eliminate the redundant ones (unnecessary sub-relations contained in other sub-relations).

Steps 1 through 4 can be named the "Relaxation Stage", and steps 5 and 6 the "Decomposition Stage"; a sketch of the relaxation stage follows. The decomposition stage is a subpart of the secure decomposition algorithm proposed in [26], without the identifier elimination stage.
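The relaxation stage can be sketched in Python as follows; join_chain_sets is assumed to come from Algorithm-2 (one list of edge-set chains per security dependent set), and node_size gives the attribute count of each vertex for the tie-breaking rule:

def relaxation_stage(fdg_edges, join_chain_sets, node_size):
    all_chains = [c for jcs in join_chain_sets for c in jcs]
    # Step 2: SecurityCount = number of join chains each edge appears in.
    count = {e: sum(e in c for c in all_chains) for e in fdg_edges}
    # Step 3: descending SecurityCount, then ascending endpoint attribute counts.
    ordered = sorted(fdg_edges,
                     key=lambda e: (-count[e], node_size[e[0]] + node_size[e[1]]))
    # Step 4: select an edge only if it cuts a not-yet-marked join chain.
    selected, unmarked = [], set(range(len(all_chains)))
    for e in ordered:
        hit = {i for i in unmarked if e in all_chains[i]}
        if hit:
            selected.append(e)
            unmarked -= hit
    # The attributes of each selected edge form a new security dependent set.
    return [set(u) | set(v) for u, v in selected]

The decomposition stage then proceeds as in [26] on these new sets, without the identifier elimination.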

1:Input:
2:S: logical schema as R1, …, Rn
3:SDS: set of security dependent sets for S
4:Output:
5:D: a subset of maximal attribute subsets of S satisfying the security decomposition constraints
6:begin
7:JCArr ← empty array of join chain sets
8://Step-1
9:for each sds ∈ SDS do
10:     JC ← join chain set for sds (Algorithm-2)
11:     add JC to JCArr
12:end for
13:G ← FDG of S (Algorithm-1)
14:SecurityCount ← array of integers initialized to 0, size |E(G)|
15://Step-2
16:for each e ∈ E(G) do
17:     for each JC ∈ JCArr do
18:         for each jc ∈ JC do
19:              if e ∈ jc then
20:                  SecurityCount[e] ← SecurityCount[e] + 1
21:              end if
22:         end for
23:     end for
24:end for
25://Step-3
26:SortedEdges ← sort edges in descending order (uses SecurityCount)
27:Selected ← empty set of edges
28:for each e ∈ SortedEdges do //Step-4
29:     for each JC ∈ JCArr do
30:         for each jc ∈ JC do
31:              if e ∈ jc and jc is unmarked then
32:                  mark jc
33:                  add e to Selected
34:              end if
35:         end for
36:     end for
37:end for
38:NewSDS ← empty set of attribute sets
39:for each (u, v) ∈ Selected do
40:     add attributes(u) ∪ attributes(v) to NewSDS
41:end for
42:for each Ri ∈ S do //Step-5
43:     P ← Power Set of attributes(Ri)
44:     for each p ∈ P do
45:         for each sds ∈ NewSDS do
46:              if sds ⊆ p then
47:                  remove p from P
48:              end if
49:         end for
50:     end for
51:     for each p1 ∈ P do //Step-6
52:         for each p2 ∈ P do
53:              if p1 ⊂ p2 then
54:                  remove p1 from P
55:              end if
56:         end for
57:     end for
58:     add the remaining elements of P to D
59:end for
60:end
Algorithm 3 Relaxed-Cut Secure Decomposition Algorithm

Figure 5: FDG of Example-2

Example-2: Consider the following schema and forbidden sets.

Forbidden Sets =

The functional dependency graph is constructed as in Figure 5.

The join chains determined for this example are as follows:

For :

   

   

   

For :

   

For :

   

Finally, the relaxed-cut secure decomposition algorithm is executed using the SecurityCounts. The edges are listed in Table-1, up to the point where all join chains are marked. A plus (+) sign in a row indicates that the edge is selected and the join chain in that column is marked; a minus (-) sign is used for already marked join chains, and edges without a plus (+) sign in their row are not selected.

As a result, the following edges are selected:

The output of the algorithm in [26] (i.e., with the strong cut) for the security dependent sets would be:

       

   

    

On the other hand, the output of Algorithm-3 with the security dependent sets is as follows:

       

   

The algorithm could be improved by assigning a total participation count to all edges over all possible join chains of attribute combinations, but this would result in a high time cost.

Theorem 3.1

Algorithm-3 generates a secure logical schema.

Assume that the resulting decomposed relations can be joined via foreign keys to associate securely dependent attributes. Then the functional dependency graph of the new schema should contain a join chain for the attributes of this security dependent set. However, this cannot happen, since each join chain has been cut at at least one edge, and the attributes of these cut edges are given as new security dependent sets, which means that their coexistence is prevented.

Therefore, the resulting decomposed relations form a secure logical schema, and the new forbidden sets provide the same privacy degree as the secure decomposition in [26] using the original forbidden sets.

4 Secure Decomposition with Required Attribute Sets

Considering the frontend applicational usage of relational databases, all roles and permitted functionalities are predetermined at the requirement analysis and design stages of the project. Thus, each functionality can be represented as a set of database queries, and each query can be expressed as an attribute set that should be associated to accomplish the task. However, these required sets should be checked against the security dependent sets as a verification step; if any inconsistency is determined, the design of the queries or the security policy should be reviewed. In addition, the decomposition alternatives (according to the edge selection strategy in Algorithm-3) can be quantified and chosen according to the given association sets, resulting in a decomposition that satisfies both the security dependencies and the needed functionalities.

First, we define required attribute set:

Definition 6

Required Attribute Set is a set of attributes in the relational schema that should be associated through a series of meaningful joins to satisfy a functionality of the applicational usage.

It is important to note that each functionality of a user role should be mapped to a set of required attribute sets.

Required and forbidden sets must be consistent, and the decomposition algorithm must satisfy both requirements.

Definition 7

Consistency Check Between Required Sets and Forbidden Sets: required and forbidden sets are consistent with each other if there is a "cut set" of edges that contains at least one element from each join chain of each forbidden set (each forbidden set forms a set of join chains), while, for each required set (which also forms a set of join chains), at least one of its join chains contains no element of the cut set.

If any inconsistency is discovered, the security policy and the association set requirements should be revised by the designer.

The consistency check (CC) problem can be simplified as follows: the edges are mapped to letters, and thus join chains become sets of letters. The problem is then to determine a set of letters (the cut set) containing at least one letter from each set of each forbidden set, such that, for each required set, at least one of its sets contains none of the letters in the cut set.

Consider the following CC problem instance:

The above CC instance is consistent, since at least one element of the cut set {a, g} is in each forbidden join chain, and one set of each required set ({d, e} and {b, c, f}, respectively) contains no element of the cut set.

On the other hand, the following CC instance is inconsistent, since no cut set satisfying the requirements exists.
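Since the letters abstraction makes the problem self-contained, a brute-force checker is easy to sketch in Python. The instance below is a hypothetical one, constructed only to match the cut set {a, g} and the sets {d, e} and {b, c, f} mentioned above:

from itertools import product

def consistent(forbidden, required):
    # Every join chain of every forbidden set must lose one letter to the cut.
    chains = [chain for fset in forbidden for chain in fset]
    for pick in product(*chains):
        cut = set(pick)
        # Each required set needs at least one join chain untouched by the cut.
        if all(any(not (cut & set(chain)) for chain in rset)
               for rset in required):
            return cut
    return None

# Hypothetical instance in the spirit of the consistent example above:
forbidden = [[{"a", "b"}, {"a", "c"}], [{"g", "h"}]]
required = [[{"d", "e"}, {"a", "f"}], [{"b", "c", "f"}]]
print(consistent(forbidden, required))  # a possible output: {'a', 'g'}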

Theorem 4.1

The Consistency Check (CC) Problem is NP-Complete.

Input: A set F of forbidden join chain sets and a set R of required join chain sets, where each member of F or R is itself a set of join chains (i.e., sets of edges).

Problem: Given F and R as above, determine whether the system is consistent, i.e., whether there is a cut set C such that C intersects every join chain of every member of F, while, for each member of R, at least one of its join chains does not include any element of C.

More formally, the system is consistent if there exists a set C such that, for every Fi ∈ F and every join chain f ∈ Fi, C ∩ f ≠ ∅, and, for every Rj ∈ R, there exists a join chain r ∈ Rj with C ∩ r = ∅.

NPC Proof: Given a 3SAT instance, construct an instance of CC as follows:

3SAT instance: φ = C1 ∧ C2 ∧ … ∧ Ck, such that each clause Ci = (li1 ∨ li2 ∨ li3), each literal l is either x or ¬x, and there are exactly n different propositional variables x1, …, xn.

Construct the CC instance as follows: for each variable xi, add a forbidden set consisting of the single join chain {xi, ¬xi}; for each clause Ci, add a required set consisting of the three singleton join chains {li1}, {li2}, {li3}.

φ is satisfiable if and only if the CC instance is consistent.

φ is satisfiable if there is a truth assignment that makes all clauses true. This is equivalent to consistency: selecting a literal into the cut set from each forbidden join chain {xi, ¬xi} corresponds to assigning that literal false in φ (i.e., an edge to be cut in the original consistency check problem). Each clause Ci is satisfied exactly when at least one of li1, li2, li3 is true, that is, when at least one of the singleton chains {li1}, {li2}, {li3} avoids the cut set.

The following algorithm checks consistency for given required and forbidden sets; its initial iterations also determine a suitable decomposition, if one exists.

The steps of Algorithm-4 are as follows:

  1. Calculate all join chains for each security dependent set (Algorithm-2).

  2. Calculate all join chains for each required attribute set (Algorithm-2).

  3. For each edge combination in which one edge is selected from every join chain of every forbidden set, check whether all forbidden join chains are broken while at least one join chain of each required set remains unbroken. If such an edge combination exists, it is a cut set that breaks all forbidden join chains without disturbing the required sets; if no such combination can be found, an inconsistency exists.

The process then continues with the decomposition stage of Algorithm-3, using the output cut set as the new forbidden sets, to find a secure decomposition.

1:Input:
2:RS: set of required sets for the logical schema
3:SDS: set of security dependent sets for the logical schema
4:Output:
5:Consistent: true or false
6:NewSDS: possible forbidden sets for decomposition
7:begin
8:FJC ← empty set of join chain sets
9:RJC ← empty array of set of join chain sets
10:for each sds ∈ SDS do //Step-1
11:     JC ← join chain sets for sds (Algorithm-2)
12:     add JC to FJC
13:end for
14:Consistent ← false
15:for each rs ∈ RS do //Step-2
16:     JC ← join chain sets for rs (Algorithm-2)
17:     add JC to RJC
18:end for
19:for each possible edge set C, where one edge is selected from
20:     every join chain of every member of FJC do //Step-3
21:     Valid ← true
22:     for each JC ∈ RJC do
23:         if every jc ∈ JC satisfies jc ∩ C ≠ ∅ then
24:              Valid ← false
25:              break
26:         end if
27:     end for
28:     if Valid then
29:         Consistent ← true
30:         NewSDS ← C
31:         break
32:     end if
33:end for
34:end
Algorithm 4 Consistency Check and Determining Cut Set Algorithm
EDGE EDGE NUMBER SecurityCount
4 + + + +
3 + + +
2 + +
2 + -
Not Selected 2 - -
Not Selected 2 - -
Not Selected 2 - -
Not Selected 2 - -
Not Selected 2 - -
2 + -
Table 1: Greedy Edge Selection Phase
Criteria
#edges in FDG 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
#edges in Forbidden Join Chains 10 10 10 10 10 10 20 30 40 50
#Forbidden Join Chains 20 40 60 80 100 100 100 100 100 100
#Required Attribute Sets 50 50 50 50 50 50 50 50 50 50
#Join Chains per Required Attribute Set 10 10 10 10 10 10 10 10 10 10
#edge per Join Chain for Required Attribute Set 10 10 10 10 10 10 10 10 10 10
#Duration 2 ms 2 ms 3 ms 3 ms 4 ms 3 ms 3 ms 4 ms 4 ms 6 ms
Table 2: Timings for Implementation Strategy-I
Criteria
#edges in FDG 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
#edges in Forbidden Join Chains 10 10 10 10 10 10 10 10 10 10
#Forbidden Join Chains 100 80 60 40 20 100 100 100 100 100
#Required Attribute Sets 50 50 50 50 50 50 50 50 50 50
#Join Chains per Required Attribute Set 10 10 10 10 10 10 10 10 10 10
#edge per Join Chain for Required Attribute Set 10 10 10 10 10 10 20 30 40 50
#Duration 2 ms 2 ms 1 ms 1 ms 1 ms 1 ms 1 ms 1 ms 1 ms 1 ms
Table 3: Timings for Implementation Strategy-II

5 Experiments

Constructing the functional dependency graph of a given schema is a straightforward task, and its time complexity is negligible for a proactive process. Additionally, the time complexity of generating join chains for given "Required" or "Forbidden" sets mainly depends on the "all simple paths" algorithm [21], which constitutes the main step of the process. Hence, the overall complexity is governed by that of DFS, which is also negligible for a proactive solution. The main time-consuming part of the whole process is Algorithm-4, since it is an exhaustive algorithm. The algorithm can be implemented in two different directions:

  • Checking each “Required” set according to a brute-force selection on the “Forbidden” sets (Implementation Strategy-I, as given in Algorithm-4).

  • Checking each “Forbidden” set according to a brute-force selection on the “Required” sets (Implementation Strategy-II).

Since the problem is NP-complete, heuristics may be added to the algorithm, although they can only reduce the expected execution time. The experiments are performed with meaningful database parameters. It is important to note that any logical database can be divided into sub-schemas in terms of join chains, so the algorithm can easily be repeated on each sub-schema. The timings are collected on an i7 machine with 16 GB RAM, and heuristics are used in the implementation for better results, such as checking the forbidden sets against the required ones, or vice versa, according to their counts.

Table-2 presents the results of the algorithm when each “Required” set is checked according to a brute-force selection on the “Forbidden” sets (Implementation Strategy-I), and Table-3 depicts the results for Implementation Strategy-II.

These benchmarks show that the algorithm is scalable even to large database sizes. The algorithm is exponential in the worst case for both strategies, but better strategies can be designed according to the size of the brute-force selection set; as the experiments show, the timings mainly depend on the size of the brute-force selected set.

6 Related Work

Database privacy and the inference problem have been widely studied, and several works in this field influenced us in developing the new strategy proposed in this paper. The inference problem has been discussed in many papers [19, 24, 3, 1, 6], but most of them are about reactive solutions. These approaches include query rewriting mechanisms [20], data perturbation methods [9, 18], deception strategies [15] and decomposition-based approaches [26, 11, 10]. The basis of the main data perturbation methods is "Differential Privacy", whose idea is to add noise to the query result in order to prevent identification while producing a meaningful result. Differential privacy is a big step in the literature towards preventing inference attacks; however, its usage is limited to statistical analysis. Another approach, known as the deception mechanism, aims to corrupt the data by inserting anonymous data or structures; however, its applicability is as limited as differential privacy's. K-anonymity [25] is another leading line of research in this field, but differential privacy work has proven the hardness of satisfying "non-identifiability" [14] under dynamic data distribution.

Applicational usage of a database basically needs single-row identification with actual values (as described in the Introduction). For this kind of application, the methods developed to prevent identification can be categorized as reactive or proactive. Reactive methods behave dynamically according to the policy or the data distribution. Query rewriting techniques (as in the Truman and Non-Truman models named in Bertino et al.'s paper [3]) are reactive solutions to the inference problem. Query history tracking mechanisms and Chinese-Wall-like approaches [4] are subject to performance issues because of being reactive.

Determining the purpose [7] of the user is a major step in privacy protection, and during these checks, attribute-based granularity [22] should be used to preserve precise privacy. The idea proposed in this paper is totally proactive, like a normalization process, and can also be supported by reactive methods to construct a complete mechanism. The database security policy should be checked against visibility requirements [17], and the external layer should be constructed accordingly. This paper presents a complete, optimized and applicable decomposition strategy compared to [26], [11] and [10]: it performs the decomposition with a policy check, with minimal loss of dependencies, and by taking care of indirect dependencies (called 'probabilistic dependencies' in [26]). The related works propose effective ways of decomposing a database in a somewhat similar manner; nevertheless, this paper combines maximal availability, intended privacy, policy checking and indirect dependencies to carry out a definite decomposition for the external layer.

7 Conclusion and Future Work

The approach given in this paper is a proactive cross-control of the required and the forbidden attribute sets of a relational schema, achieved using a secure decomposition technique that produces an external schema with maximal availability and minimal loss of dependencies. As future work, this optimization process can be further improved by considering the query statistics of the users and by integrating applicable reactive control mechanisms. Even though the method presented in this paper is proactive, the experiments show that the timings would be acceptable even for reactive-like behavior in the future. We have experienced the benefits of this approach in the real-life example given earlier.

References

  • [1] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Hippocratic databases. In Proceedings of the 28th International Conference on Very Large Data Bases, VLDB ’02, pages 143–154. VLDB Endowment, 2002.
  • [2] J. Bailey and P. J. Stuckey. Discovery of minimal unsatisfiable subsets of constraints using hitting set dualization. In M. V. Hermenegildo and D. Cabeza, editors, Practical Aspects of Declarative Languages, pages 174–186, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
  • [3] E. Bertino, J.-W. Byun, and N. Li. Privacy-preserving database systems. In Foundations of Security Analysis and Design III, pages 178–206. Springer-Verlag, Berlin, Heidelberg, 2005.
  • [4] D. Brewer and M. Nash. The chinese wall security policy. In Security and Privacy, 1989. Proceedings., 1989 IEEE Symposium on, pages 206–214, May 1989.
  • [5] A. Brodsky, C. Farkas, and S. Jajodia. Secure databases: constraints, inference channels, and monitoring disclosures. Knowledge and Data Engineering, IEEE Transactions on, 12(6):900–919, Nov 2000.
  • [6] L. J. Buczkowski and E. Perry. Database inference controller. In DBSec, pages 311–322, 1989.
  • [7] J.-W. Byun and N. Li. Purpose based access control for privacy protection in relational database systems. The VLDB Journal, 17(4):603–619, July 2008.
  • [8] Y. Chen and W. Chu. Protection of database security via collaborative inference detection. In H. Chen and C. Yang, editors, Intelligence and Security Informatics, volume 135 of Studies in Computational Intelligence, pages 275–303. Springer Berlin Heidelberg, 2008.
  • [9] Oracle Corporation. Oracle Database: Security Guide. b14266.pdf, July 2012.
  • [10] S. De Capitani di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati. Fragments and loose associations: Respecting privacy in data publishing. Proc. VLDB Endow., 3(1-2):1370–1381, Sept. 2010.
  • [11] S. D. C. di Vimercati, S. Foresti, S. Jajodia, G. Livraga, S. Paraboschi, and P. Samarati. Fragmentation in presence of data dependencies. IEEE Transactions on Dependable and Secure Computing, 11(6):510–523, Nov 2014.
  • [12] C. Dwork. Differential Privacy: A Survey of Results. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
  • [13] T. H. Hinke. Inference aggregation detection in database management systems. In Proceedings of the 1988 IEEE Conference on Security and Privacy, SP’88, pages 96–106, Washington, DC, USA, 1988. IEEE Computer Society.
  • [14] J. Lee and C. Clifton. Differential identifiability. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pages 1041–1049, New York, NY, USA, 2012. ACM.
  • [15] W. Luo, Q. Xie, and U. Hengartner. Facecloak: An architecture for user privacy on social networking sites. In 2009 International Conference on Computational Science and Engineering, volume 3, pages 26–33, Aug 2009.
  • [16] F. McSherry and K. Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), pages 94–103, 2007.
  • [17] A. Motro. An access authorization model for relational databases based on algebraic manipulation of view definitions. In Proceedings of the Fifth International Conference on Data Engineering, pages 339–347, Washington, DC, USA, 1989. IEEE Computer Society.
  • [18] K. Muralidhar, R. Parsa, and R. Sarathy. A general additive data perturbation method for database security. Management Science, 45(10):pp. 1399–1415, 1999.
  • [19] J. Park, X. Zhang, and R. Sandhu. Attribute mutability in usage control. In Proceedings of the 18th Annual IFIP WG 11.3 Working Conference on Data and Applications Security, pages 15–29. Kluwer, 2004.
  • [20] S. Rizvi, A. Mendelzon, S. Sudarshan, and P. Roy. Extending query rewriting techniques for fine-grained access control. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD ’04, pages 551–562, New York, NY, USA, 2004. ACM.
  • [21] F. Rubin. Enumerating all simple paths in a graph. IEEE Transactions on Circuits and Systems, 25(8):641–642, August 1978.
  • [22] J. Shi, H. Zhu, G. Fu, and T. Jiang. On the soundness property for sql queries of fine-grained access control in dbmss. In Computer and Information Science, 2009. ICIS 2009. Eighth IEEE/ACIS International Conference on, pages 469–474, June 2009.
  • [23] G. Smith. The semantic data model for security: representing the security semantics of an application. In Data Engineering, 1990. Proceedings. Sixth International Conference on, pages 322–329, Feb 1990.
  • [24] M. Stonebraker and E. Wong. Access control in a relational data base management system by query modification. In Proceedings of the 1974 Annual Conference - Volume 1, ACM ’74, pages 180–186, New York, NY, USA, 1974. ACM.
  • [25] L. Sweeney. K-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10(5):557–570, Oct. 2002.
  • [26] U. Turan, I. H. Toroslu, and M. Kantarcioglu. Secure logical schema and decomposition algorithm for proactive context dependent attribute based inference control. Data and Knowledge Engineering, 111:1–21, 2017.