Snapshot Semantics for Temporal Multiset Relations (Extended Version)

02/13/2019 ∙ by Anton Dignös, et al. ∙ 0

Snapshot semantics is widely used for evaluating queries over temporal data: temporal relations are seen as sequences of snapshot relations, and queries are evaluated at each snapshot. In this work, we demonstrate that current approaches for snapshot semantics over interval-timestamped multiset relations are subject to two bugs regarding snapshot aggregation and bag difference. We introduce a novel temporal data model based on K-relations that overcomes these bugs and prove it to correctly encode snapshot semantics. Furthermore, we present an efficient implementation of our model as a database middleware and demonstrate experimentally that our approach is competitive with native implementations and significantly outperforms such implementations on queries that involve aggregation.


1 Introduction

Recently, there has been renewed interest in temporal databases, fueled by the fact that abundant storage has made long-term archival of historical data feasible. This has led to the incorporation of temporal features into the SQL:2011 standard [27], which defines an encoding of temporal data associating each tuple with a validity period. We refer to such relations as SQL period relations. Note that SQL period relations use multiset semantics. Period relations are supported by many DBMSs, e.g., PostgreSQL [34], Teradata [44], Oracle [30], IBM DB2 [35], and MS SQLServer [29]. However, none of these systems, with the partial exception of Teradata, supports snapshot semantics, an important class of temporal queries. Given a temporal database, a non-temporal query Q interpreted under snapshot semantics returns a temporal relation that assigns to each point in time the result of evaluating Q over the snapshot of the database at this point in time. This fundamental property of snapshot semantics is known as snapshot-reducibility [28, 42]. A specific type of snapshot semantics is the so-called sequenced semantics [7], which in addition to snapshot-reducibility enforces another property called change preservation that determines how time points are grouped into intervals in a snapshot query result.

Figure 1: Snapshot semantics query evaluation – highlighted tuples are erroneously omitted by approaches that exhibit the aggregation gap (AG) and bag difference (BD) bugs.
(a) Input period relations: works(name, skill, period) with tuples (Ann, SP), (Joe, NS), (Sam, SP), (Ann, SP); assign(mach, skill, period) with tuples (M1, SP), (M2, SP), (M3, NS).
(b) Snapshot aggregation: result (cnt, period) with cnt values 0, 1, 2, 1, 0, 1, 0, one per result period.
(c) Snapshot difference: result (skill, period) with skill values SP, SP, NS.
Example 1.1 (Snapshot Aggregation).

Consider the SQL period relation works in Figure 1(a) that records factory workers, their skills, and when they are on duty. The validity period of each tuple is stored in the temporal attribute period. To simplify examples, we restrict the time domain to the hours of 2018-01-01, represented as integers. The company requires that at least one SP worker is in the factory at any given time. This can be checked by evaluating the following query under snapshot semantics: SELECT count(*) AS cnt FROM works WHERE skill = 'SP'. Evaluated under snapshot semantics, a query returns a snapshot (time-varying) result that records when the result is valid, i.e., this query returns the number of SP workers that are on duty at any given point of time. The result is shown in Figure 1(b). For instance, at 08:00am two SP workers (Ann and Sam) are on duty. The query exposes several safety violations, e.g., no SP worker is on duty between 00 and 03.

In the example above, safety violations correspond to gaps, i.e., periods of time where the aggregation's input is empty. As we will demonstrate, all approaches for snapshot semantics that we are aware of do not return results for gaps (tuples marked in red) and, therefore, violate snapshot-reducibility. Teradata [44, p.149], for instance, recognizes the importance of reporting results for gaps but, in contrast to snapshot-reducibility, reports gaps in the presence of grouping while omitting them otherwise. As a consequence, in our example these approaches fail to identify safety violations. We refer to this type of error as the aggregation gap bug (AG bug).
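The gap behaviour of Example 1.1 can be illustrated with a small Python sketch that evaluates the count at every time point and then groups equal neighbouring results into periods. The concrete periods and the 24-hour time domain below are assumptions made for illustration (the figure's period values are not reproduced here), and `snapshot_count` is a hypothetical helper, not part of the paper's implementation.

```python
from itertools import groupby

# Hypothetical periods: half-open [begin, end) hour ranges assumed for
# illustration only; they are not taken from the source figure.
WORKS = [
    ("Ann", "SP", 3, 10),
    ("Joe", "NS", 8, 16),
    ("Sam", "SP", 8, 16),
    ("Ann", "SP", 18, 20),
]
T_MIN, T_MAX = 0, 24  # assumed time domain: the hours of one day


def snapshot_count(rows, skill):
    """Evaluate count(*) at every time point, then merge equal neighbours."""
    per_point = [
        (t, sum(1 for (_, s, b, e) in rows if s == skill and b <= t < e))
        for t in range(T_MIN, T_MAX)
    ]
    result = []
    for cnt, group in groupby(per_point, key=lambda p: p[1]):
        points = [t for t, _ in group]
        # Gaps (cnt == 0) are kept, which is what snapshot-reducibility requires.
        result.append((cnt, points[0], points[-1] + 1))
    return result


RESULT = snapshot_count(WORKS, "SP")
for cnt, begin, end in RESULT:
    print(f"cnt={cnt} period=[{begin:02d}, {end:02d})")
```

Under these assumed periods, the sketch reports a count of 0 for every period without an SP worker, which is exactly the information an approach with the AG bug would omit.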

Similar to the case of aggregation, we also identify a common error related to snapshot bag difference (EXCEPT ALL).

Example 1.2 (Snapshot Bag Difference).

Consider again Figure 1. Relation assign records machines (mach) that need to be assigned to workers with a specific skill over a specific period of time. For instance, the third tuple records that machine M3 requires a non-specialized (NS) worker for its period. To determine which skill sets are missing during which time period, we evaluate the following query under snapshot semantics: SELECT skill FROM assign EXCEPT ALL SELECT skill FROM works. The result in Figure 1(c) indicates that one more SP worker is required during two periods.

Many approaches treat bag difference as a NOT EXISTS subquery and therefore do not return a tuple from the left input if this tuple exists in the right input (independent of its multiplicity). For instance, the two tuples for the SP workers (highlighted in red) are not returned, since there exists an SP worker at each snapshot in the works relation. This violates snapshot-reducibility. We refer to this type of error as the bag difference bug (BD bug).
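The difference between EXCEPT ALL and the NOT EXISTS reading can be sketched per snapshot in Python. The periods below are assumptions chosen for illustration, and `snapshot`, `except_all`, and `not_exists` are hypothetical helper names; the sketch only contrasts the two semantics at individual time points.

```python
from collections import Counter

# Hypothetical periods (assumptions): half-open [begin, end) hour ranges.
WORKS = [("SP", 3, 10), ("NS", 8, 16), ("SP", 8, 16), ("SP", 18, 20)]
ASSIGN = [("SP", 3, 12), ("SP", 6, 14), ("NS", 3, 16)]


def snapshot(rows, t):
    """Multiset of skill values valid at time point t."""
    return Counter(skill for (skill, b, e) in rows if b <= t < e)


def except_all(left, right):
    """Bag difference: subtract multiplicities, truncating at zero."""
    return left - right  # Counter subtraction keeps only positive counts


def not_exists(left, right):
    """The buggy reading: drop a tuple as soon as it occurs on the right."""
    return Counter({k: n for k, n in left.items() if right[k] == 0})


for t in (4, 7, 11):
    l, r = snapshot(ASSIGN, t), snapshot(WORKS, t)
    print(t, dict(except_all(l, r)), dict(not_exists(l, r)))
```

At a time point where two SP machines face one SP worker, the bag difference keeps one SP tuple, whereas the NOT EXISTS reading drops both.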

The interval-based representation of temporal relations creates an additional problem: the encoding of a temporal query result is typically not unique. For instance, a tuple from the works relation in Figure 1 can equivalently be represented as two tuples whose adjacent periods together cover the original period. We refer to a method that determines how temporal data and snapshot query results are grouped into intervals as an interval-based representation system. A unique and predictable representation of temporal data is a desirable property, because equivalent relational algebra expressions should not lead to syntactically different result relations. This problem can be addressed by using a representation system that associates a unique encoding with each temporal database. Furthermore, overlap between multiple periods associated with a tuple and unnecessary splits of periods complicate the interpretation of data and, thus, should be avoided if possible.

Given these limitations and the lack of implementations for snapshot semantics queries over bag relations, users currently resort to manually implementing such queries in SQL, which is time-consuming and error-prone [39]. We address the above limitations of previous approaches for snapshot semantics and develop a framework based on the following desiderata: (i) support for set and multiset relations, (ii) snapshot-reducibility for all operations, and (iii) a unique interval-based encoding of temporal relations. Note that while previous work on sequenced semantics (e.g., [18, 16]) also aims to support snapshot-reducibility, we emphasize a unique encoding instead of trying to preserve intervals from the input of a query.

We address these desiderata using a three-level approach. Note that we focus on data with a single time dimension, but are oblivious to whether this is transaction time or valid time. First, we introduce an abstract model that supports both sets and multisets, and by definition is snapshot-reducible. This model, however, uses a verbose encoding of temporal data and, thus, is not practical. Afterwards, we develop a more compact logical model as a representation system, where the complete temporal history of all equivalent tuples from the abstract model is stored in an annotation attached to one tuple. The abstract and the logical models leverage the theory of K-relations, which are a general class of annotated relations that cover both set and multiset relations. For our implementation, we use SQL over period relations to ensure compatibility with SQL:2011 and existing DBMSs. We prove the equivalence between the three layers (i.e., the abstract model, the logical model, and the implementation) and show that the logical model determines a unique interval-encoding for the implementation and a correct rewriting scheme for queries over this encoding.

Our main technical contributions are:

  • Abstract model: We introduce snapshot K-relations as a generalization of snapshot set and multiset relations. These relations are by definition snapshot-reducible.

  • Logical model: We define an interval-based representation, termed period K-relations, and prove that these relations are a compact and unique representation system for snapshot semantics over snapshot K-relations. We show this for the full relational algebra plus aggregation.

  • We achieve a unique encoding of temporal data as period K-relations by generalizing set-based coalescing [10].

  • We demonstrate that the multiset version of period K-relations can be encoded as SQL period relations, a common interval-based model in DBMSs, and how to translate queries with snapshot semantics over period K-relations into SQL.

  • We implement our approach as a database middleware and present optimizations that eliminate redundant coalescing steps. We demonstrate experimentally that we do not need to sacrifice performance to achieve correctness.

2 Related Work

Temporal Query Languages. There is a long history of research on temporal query languages [22, 6]. Many temporal query languages including TSQL2 [38, 40], ATSQL2 (Applied TSQL2) [8], IXSQL [28], ATSQL [9], and SQL/TP [46] support sequenced semantics, i.e., these languages support a specific type of snapshot semantics. In this paper, we provide a general framework that can be used to correctly implement snapshot semantics over period set and multiset relations for any language.

Interval-based Approaches for Sequenced Semantics. In the following, we discuss interval-based approaches for sequenced semantics. Table 1 shows for each approach whether it supports multisets, whether it is free of the aggregation gap and bag difference bugs, and whether its interval-based encoding of a sequenced query result is unique. An N/A indicates that the approach does not support the operation for which this type of bug can occur or the semantics of this operation is not defined precisely enough to judge its correctness. Note that while temporal query languages may be defined to apply sequenced semantics and, thus, by definition are snapshot-reducible, (the specification of) their implementation might fail to be snapshot-reducible. In the following discussion of the temporal query languages in Table 1, we refer to their semantics as provided in the referenced publication(s).

Interval preservation (ATSQL) [9, Def. 2.10] is a representation system for SQL period relations (multisets) that tries to preserve the intervals associated with input tuples, i.e., fragments of all intervals (including duplicates) associated with the input tuples "survive" in the output. Interval preservation is snapshot-reducible for multiset semantics for positive relational algebra [36] (selection, projection, join, and union), but exhibits the aggregation gap and bag difference bugs. Moreover, the period encoding of a query result is not unique as it depends both on the query and the input representation.

Teradata [44] is a commercial DBMS that supports sequenced operators using ATSQL's statement modifiers. The implementation is based on query rewriting [2] and does not support difference. Teradata's implementation exhibits the aggregation gap bug. Since the application of coalescing is optional, the encoding of snapshot relations as period relations is not unique.

Change preservation [18, Def. 3.4] determines the interval boundaries of a query result tuple based on the maximal interval for which there is no change in the input. To track changes, it employs the lineage provenance model in [16] and the PI-CS model in [18]. The approach uses timestamp adjustment in combination with traditional database operators, but does not provide a unique encoding, exhibits the AG bug, and only supports set semantics. Our work addresses these issues and significantly generalizes this approach, in particular by supporting bag semantics.

TSQL2 [38, 40, 42] implicitly applies coalescing [10] to produce a unique representation. Thus, it only supports set semantics, and it does not support aggregation.

Snodgrass et al. [41] present a valid-time extension of SQL/Temporal and an algebra with sequenced semantics. The algebra supports multisets, but exhibits both the aggregation gap and bag difference bugs. Since intervals from the input are preserved where possible, the interval representation of a snapshot relation is not unique.

TimeDB [43] is an implementation of ATSQL2 [8]. It uses a semantics for bag difference and intersection that is not snapshot-reducible (see [43, pp. 63]).

Our approach is the first that supports set and multiset relations, is resilient against the two bugs, and specifies a unique interval-encoding.

Approach | Multisets | AG bug free | BD bug free | Unique encoding
Interval preservation [9] (ATSQL)
Teradata [44] N/A (1)
Change preservation [16, 18] N/A
TSQL2 [38, 40, 42] N/A N/A
ATSQL2 [8] N/A
TimeDB [43] (ATSQL2) N/A
SQL/Temporal [41]
SQL/TP [46] (2)
Our approach
(1) Optionally, coalescing (NORMALIZE ON in Teradata) can be applied to get a unique encoding at the cost of losing multiplicities.
(2) Sequenced semantics can be expressed, but this is inefficient.
Table 1: Interval-based approaches for snapshot semantics.

Figure 2: Overview of our approach. Our abstract model is snapshot K-relations and non-temporal queries over snapshots (snapshot semantics). Our logical model is period K-relations and queries corresponding to the abstract model's snapshot queries. Our implementation uses SQL period relations and rewritten non-temporal queries implementing the other models' snapshot queries. Each model is associated with transformations to the other models which commute with queries (modulo the rewriting Rewr when mapping to the implementation). Panels, top to bottom: Implementation (SQL period relations), Logical (period K-relations), Abstract (snapshot K-relations); PeriodEnc connects the logical model to the implementation.

Non-sequenced Temporal Queries. Non-sequenced temporal query languages, such as IXSQL [28] and SQL/TP [46], do not explicitly support sequenced semantics. Nevertheless, we review these languages here since they allow queries with sequenced semantics to be expressed. SQL/TP [46] introduces a point-wise semantics for temporal queries [12, 45], where time is handled as a regular attribute. Intervals are used as an efficient encoding of time points, and a normalization operation is used to split intervals. The language supports multisets and a mechanism to manually produce sequenced semantics. However, sequenced semantics queries are specified as the union of non-temporal queries over snapshots. Even if such subqueries are grouped together for adjacent time points where the non-temporal query's result is constant, this still results in a large number of subqueries to be executed. Even worse, the number of subqueries that is required is data dependent. Also, the interval-based encoding is not unique, since time points are grouped into intervals depending on query syntax and encoding of the input. While this has no effect on the semantics, since SQL/TP queries cannot distinguish between different interval-based encodings of a temporal database, it might be confusing to users who observe different query results for equivalent queries/inputs.

Implementations of Temporal Operators. A large body of work has focused on the implementation of individual temporal algebra operators such as joins [17, 32, 11] and aggregation [5, 33, 31]. Some exceptions supporting multiple operators are [25, 18, 13]. These approaches introduce efficient evaluation algorithms for a particular semantics of a temporal algebra operator. Our approach can utilize efficient operator implementations as long as (i) their semantics is compatible with our interval-based encoding of snapshot query results and (ii) they are snapshot-reducible.

Coalescing. Coalescing produces a unique representation of a set semantics temporal database. Böhlen et al. [10] study optimizations for coalescing that eliminate unnecessary coalescing operations. Zhou et al. [47] and [1] use analytical functions to efficiently implement coalescing in SQL. We generalize coalescing to K-relations to define a unique encoding of interval-based temporal relations, including multiset relations. Similar to [10], we remove unnecessary K-coalescing steps and, similar to [47], we use OLAP functions for efficient implementation.

Temporality in Annotated Databases. Kostylev et al. [26] is, to the best of our knowledge, the only previous approach that uses semiring annotations to express temporality. The authors define a semiring whose elements are sets of time points. This approach is limited to set semantics, and no interval-based encoding was presented. The LIVE system [15] combines provenance and uncertainty annotations with versioning. The system uses interval timestamps, and query semantics is based on snapshot-reducibility [15, Def. 2]. However, computing the intervals associated with a query result requires provenance to be maintained for every query result.

3 Solution Overview

In this section, we give an overview of our three-level framework, which is illustrated in Figure 2.

Abstract model – Snapshot K-relations. As an abstract model we use snapshot relations which map time points to snapshots. Queries over such relations are evaluated over each snapshot, which trivially satisfies snapshot-reducibility. To support both sets and multisets, we introduce snapshot K-relations [20], which are snapshot relations where each snapshot is a K-relation. In a K-relation, each tuple is annotated with an element from a domain K. For example, relations annotated with elements from the semiring ℕ (natural numbers) correspond to multiset semantics. The result of a snapshot query Q over a snapshot K-relation is the result of evaluating Q over the K-relation at each time point.

Example 3.1 (Abstract Model).

Figure 2 (bottom) shows the snapshots at times 00, 08, and 18 of an encoding of the running example as snapshot ℕ-relations. Each snapshot is an ℕ-relation where tuples are annotated with their multiplicity (shown with shaded background). For instance, the snapshot at time 08 has three tuples, each with multiplicity 1. The result of the aggregation query from Example 1.1 is shown on the bottom right. Every snapshot in the result is computed by running the query over the corresponding snapshot in the input. For instance, at time 08 there are two SP workers, i.e., the result snapshot contains the count 2.

Logical Model – Period K-relations. We introduce period K-relations as a logical model, which merges equivalent tuples over all snapshots from the abstract model into one tuple. In a period K-relation, every tuple is annotated with a temporal K-element that is a unique interval-based representation for all time points of the merged tuples from the abstract model. We define a class of semirings called period semirings whose elements are temporal K-elements. Specifically, for any semiring K we can construct a period semiring whose annotations are temporal K-elements. For instance, there is a period semiring corresponding to semiring ℕ (multisets). We define necessary conditions for an interval-based model to correctly encode snapshot K-relations and prove that period K-relations fulfill these conditions. Specifically, we call an interval-based model a representation system iff the encoding of every snapshot K-relation R is (i) unique and (ii) snapshot-equivalent to R. Furthermore, (iii) queries over encodings are snapshot-reducible.

Example 3.2 (Logical Model).

Figure 2 (middle) shows an encoding of the running example as period K-relations. For instance, all (Ann, SP) tuples from the abstract model are merged into one tuple in the logical model whose annotation records two intervals, because at each time point during these two intervals a tuple (Ann, SP) with multiplicity 1 exists. In Section 4.2, we will introduce a mapping from snapshot K-relations to period K-relations and the timeslice operator, which restores the snapshot at a given time point.

Implementation – SQL Period Relations. To ensure compatibility with the SQL standard, we use SQL period relations in our implementation and translate snapshot semantics queries into SQL queries over these period relations. For this we define an encoding of the multiset version of period K-relations as SQL period relations (PeriodEnc) together with a rewriting scheme for queries (Rewr).

Example 3.3 (Implementation).

Consider the SQL period relations shown on the top of Figure 2. Each interval-annotation pair of a temporal K-element in the logical model is encoded as a separate tuple in the implementation. For instance, the annotation of tuple (Ann, SP) from the logical model is encoded as two tuples, each of which records one of the two intervals from this annotation.
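The encoding step of Example 3.3 can be sketched in Python as follows. The dictionary layout, the concrete periods, and the function name `period_enc` are assumptions made for illustration; in particular, emitting an interval with multiplicity n as n duplicate rows is one plausible reading of a multiset encoding, not a statement of the paper's exact PeriodEnc definition.

```python
# Logical model (assumed layout): tuple -> {(begin, end): multiplicity}.
# The periods are hypothetical values chosen for illustration.
LOGICAL = {
    ("Ann", "SP"): {(3, 10): 1, (18, 20): 1},
    ("Joe", "NS"): {(8, 16): 1},
    ("Sam", "SP"): {(8, 16): 1},
}


def period_enc(relation):
    """Emit one SQL-period-style row per interval-annotation pair.

    Assumption: an interval with multiplicity n yields n duplicate rows.
    """
    rows = []
    for tup, element in sorted(relation.items()):
        for (begin, end), mult in sorted(element.items()):
            rows.extend([tup + (begin, end)] * mult)
    return rows


ROWS = period_enc(LOGICAL)
for row in ROWS:
    print(row)
```

With these inputs, the single logical tuple (Ann, SP) becomes two rows, one per interval of its annotation.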

We present an implementation of our framework as a database middleware that exposes snapshot semantics as a new language feature in SQL and rewrites snapshot queries into SQL queries over SQL period relations. That is, we directly evaluate snapshot queries over data stored natively as period relations.

4 Snapshot K-relations

We first review background on the semiring annotation framework (K-relations). Afterwards, we define snapshot K-relations as our abstract model and snapshot semantics for this model. Importantly, queries over snapshot K-relations are snapshot-reducible by construction. Finally, we state requirements for a logical model to be a representation system for this abstract model.

4.1 K-relations

In a K-relation [20], every tuple is annotated with an element from a domain K of a commutative semiring. A structure (K, +_K, ·_K, 0_K, 1_K) over a set K with binary operations +_K and ·_K is a commutative semiring iff (i) addition and multiplication are commutative, associative, and have a neutral element (0_K and 1_K, respectively); (ii) multiplication distributes over addition; and (iii) multiplication with zero returns zero. Abusing notation, we will use K to denote both a semiring structure as well as its domain.

Consider a universal countable domain U of values. An n-ary K-relation over U is a (total) function that maps tuples (elements from U^n) to elements from K with the convention that tuples mapped to 0_K are not in the relation. Furthermore, we require that R(t) ≠ 0_K only holds for finitely many t. Two semirings are of particular interest to us: The semiring 𝔹 with elements true and false using ∨ as addition and ∧ as multiplication corresponds to set semantics. The semiring ℕ of natural numbers with standard arithmetic corresponds to multisets.

The operators of the positive relational algebra [36] (RA+) over K-relations are defined by applying the +_K and ·_K operations of the semiring to input annotations. Intuitively, the +_K and ·_K operations of the semiring correspond to the alternative and conjunctive use of tuples, respectively. For instance, if an output tuple is produced by joining two input tuples annotated with k and k′, then the tuple is annotated with k ·_K k′. Below we provide the standard definition of RA+ over K-relations [20]. For a tuple t, we use t.A to denote the projection of t on a list of projection expressions A and t[R] to denote the projection of t on the attributes of relation R. For a condition θ and tuple t, θ(t) denotes a function that returns 1_K if t satisfies θ and 0_K otherwise.

Definition 4.1 (RA+ over K-relations).

Let K be a semiring, R, S denote K-relations, t, u denote tuples of appropriate arity, and k ∈ K. RA+ on K-relations is defined as:
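In the notation introduced above (t.A, t[R], θ(t)), the standard definition of the positive relational algebra over K-relations from [20] can be sketched in LaTeX as follows. This is a reconstruction under the assumption that the usual formulation applies; the subscripted operator names +_K and ·_K are notational choices.

```latex
\begin{align*}
\sigma_\theta(R)(t) &= R(t) \cdot_K \theta(t)
  && \text{(selection)} \\
\Pi_A(R)(t) &= \sum_{u \,:\, u.A = t} R(u)
  && \text{(projection)} \\
(R \bowtie S)(t) &= R(t[R]) \cdot_K S(t[S])
  && \text{(join)} \\
(R \cup S)(t) &= R(t) +_K S(t)
  && \text{(union)}
\end{align*}
```

The sum in the projection ranges over the finitely many tuples u with a non-zero annotation and is taken with respect to +_K.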

We will make use of homomorphisms, functions from the domain of a semiring K to the domain of a semiring K′ that commute with the semiring operations. Since RA+ over K-relations is defined in terms of these operations, it follows that semiring homomorphisms commute with queries, as was proven in [20].

Definition 4.2 (Homomorphism).

A mapping h: K → K′ is called a homomorphism iff for all k, k′ ∈ K: h(0_K) = 0_K′, h(1_K) = 1_K′, h(k +_K k′) = h(k) +_K′ h(k′), and h(k ·_K k′) = h(k) ·_K′ h(k′).

Example 4.1.

Consider the ℕ-relations shown below, which are non-temporal versions of our running example. The query returns machines for which there are workers with the right skill to operate the machine. Under multiset semantics we expect M1 to occur in the result with a multiplicity that sums the contributions of its two join partners, since (M1, SP) joins with (Pete, SP) and with (Bob, SP). Evaluating the query in ℕ yields the expected result by multiplying the annotations of these join partners. Given the result of the query, we can compute the result of the query under set semantics by applying a homomorphism which maps all non-zero annotations to true and 0 to false. For example, the annotation of result tuple M1 is non-zero and is therefore mapped to true, i.e., this tuple is in the result under set semantics.

name   skill
Pete   SP
Bob    SP
Alice  NS

mach  skill
M1    SP
M2    NS

Result A
M1
M2
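Example 4.1 can be replayed with a short Python sketch that treats ℕ-relations as dictionaries from tuples to multiplicities. Two things are assumptions here: every input multiplicity is set to 1, and the query is taken to be a join on skill followed by a projection on mach. The function names are hypothetical.

```python
from collections import defaultdict

# N-relations as dicts from tuples to multiplicities. All multiplicities
# are assumed to be 1 for illustration.
WORKS = {("Pete", "SP"): 1, ("Bob", "SP"): 1, ("Alice", "NS"): 1}
ASSIGN = {("M1", "SP"): 1, ("M2", "NS"): 1}


def join_project_mach(works, assign):
    """Projection on mach of (works JOIN assign on skill).

    Join multiplies annotations; projection adds them up.
    """
    out = defaultdict(int)
    for (_, w_skill), k1 in works.items():
        for (mach, a_skill), k2 in assign.items():
            if w_skill == a_skill:
                out[(mach,)] += k1 * k2
    return dict(out)


def to_set_semantics(relation):
    """Homomorphism from N to B: non-zero maps to True, zero to False."""
    return {t: k != 0 for t, k in relation.items()}


BAG = join_project_mach(WORKS, ASSIGN)
SET = to_set_semantics(BAG)
print(BAG)
print(SET)
```

Because the mapping to Booleans is a homomorphism, applying it to the multiset result gives the same answer as evaluating the query directly under set semantics.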

4.2 Snapshot K-relations

We now formally define snapshot K-relations, snapshot semantics over such relations, and then define representation systems. We assume a totally ordered and finite domain 𝕋 of time points and use ≤ to denote its order. T_min and T_max denote the minimal and maximal (exclusive) time point in 𝕋 according to ≤, respectively. We use T+1 to denote the successor of a time point T according to ≤.

A snapshot K-relation over a relation schema R is a function that maps each time point in 𝕋 to a K-relation with schema R. Snapshot K-databases are defined analogously. We use DB_{𝕋,K} to denote the set of all snapshot K-databases for time domain 𝕋.

Definition 4.3 (Snapshot K-relation).

Let K be a commutative semiring and R a relation schema. A snapshot K-relation is a function that maps each time point in 𝕋 to a K-relation with schema R.

For instance, a snapshot ℕ-relation is shown in Figure 2 (bottom). Given a snapshot K-relation R, we use the timeslice operator τ_T [23] to access its state (snapshot) at a time point T: τ_T(R) = R(T).

The evaluation of a query Q over a snapshot database D (set of snapshot relations) under snapshot semantics returns a snapshot relation Q(D) that is constructed as follows: for each time point T we have Q(D)(T) = Q(D(T)). Thus, snapshot temporal queries over snapshot K-relations behave like queries over K-relations for each snapshot, i.e., their semantics is uniquely determined by the semantics of queries over K-relations.

Definition 4.4 (Snapshot Semantics).

Let D be a snapshot K-database and Q be a query. The result Q(D) of Q over D is a snapshot K-relation that is defined point-wise as follows: for every time point T, Q(D)(T) = Q(D(T)).

For example, consider the snapshot ℕ-relation shown at the bottom of Figure 2 and the evaluation of the aggregation query from Example 1.1 under snapshot semantics as also shown in this figure. Observe how the query result is computed by evaluating the query over each snapshot individually using multiset (ℕ) query semantics. Furthermore, since τ_T(Q(D)) = Q(D)(T), per the above definition, the timeslice operator commutes with queries: τ_T(Q(D)) = Q(τ_T(D)). This property is snapshot-reducibility.

4.3 Representation Systems

To compactly encode snapshot K-relations, we study representation systems that consist of a set of representations E, a function which associates an encoding in E with the snapshot K-database it represents, and a timeslice operator τ_T which extracts the snapshot at time T from an encoding. If Enc is injective, then we use Enc(D) to denote the unique encoding associated with D. We use τ_T to denote the timeslice over both snapshot databases and representations. It will be clear from the input which operator τ_T refers to. For such a representation system, we consider two encodings E1 and E2 from E to be snapshot-equivalent [21] (written as E1 ∼ E2) if they encode the same snapshot K-database. Note that this is the case if they encode the same snapshots, i.e., iff for all time points T we have τ_T(E1) = τ_T(E2). For a representation system to behave correctly, the following conditions have to be met: 1) uniqueness: for each snapshot K-database D there exists a unique element from E representing D; 2) snapshot-reducibility: the timeslice operator commutes with queries; and 3) snapshot-preservation: the encoding function Enc preserves the snapshots of the input.

Definition 4.5 (Representation System).

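We call a triple consisting of a set of representations, an encoding function, and a timeslice operator a representation system for snapshot K-databases with regard to a class of queries iff for every snapshot database, every pair of encodings, every time point, and every query in the class we have

  1. each snapshot K-database is represented by a unique encoding (uniqueness)

  2. the timeslice operator commutes with queries (snapshot-reducibility)

  3. the encoding function preserves the snapshots of the input (snapshot-preservation)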

5 Temporal K-elements

We now introduce temporal K-elements that are the annotations we use to define our logical model (representation system). Temporal K-elements record, using an interval-based encoding, how the K-annotation of a tuple in a snapshot K-relation changes over time. We introduce a unique normal form for temporal K-elements based on a generalization of coalescing [10].

5.1 Defining Temporal K-elements

To define temporal K-elements, we need to introduce some background on intervals. Given the time domain 𝕋 and its associated total order ≤, an interval I = [T_b, T_e) is a pair of time points from 𝕋, where T_b < T_e. Interval I represents the set of contiguous time points {T | T ∈ 𝕋 and T_b ≤ T < T_e}. For an interval I = [T_b, T_e) we use I⁺ to denote T_b and I⁻ to denote T_e. We use I and I′ (possibly with subscripts) to represent intervals. We define a relation adj that contains all interval pairs that are adjacent: adj(I1, I2) ⇔ (I1⁻ = I2⁺) ∨ (I2⁻ = I1⁺). We will implicitly understand set operations, such as T ∈ I or I1 ⊆ I2, to be interpreted over the set of points represented by an interval. Furthermore, I ∩ I′ denotes the interval that covers precisely the intersection of the sets of time points defined by I and I′, and I ∪ I′ denotes their union (only well-defined if I ∩ I′ ≠ ∅ or adj(I, I′)). For convenience, we define iff . We use 𝕀 to denote the set of all intervals over 𝕋.

Definition 5.1 (Temporal K-elements).

Given a semiring K, a temporal K-element is a function that maps intervals over 𝕋 to elements of K. We use TE_K to denote the set of all such temporal elements for K.

We represent temporal K-elements as sets of input-output pairs. Intervals that are not explicitly mentioned are mapped to 0_K.

Example 5.1.

Reconsider our running example with K = ℕ. The history of the annotation of tuple (Ann, SP) from the works relation is as shown in Figure 2 (middle). For the sake of the example, we change the multiplicity of this tuple during its validity periods. This information is encoded as a temporal ℕ-element that maps each of these periods to the corresponding multiplicity.

Note that a temporal K-element may map overlapping intervals to non-zero elements of K. We assign the following semantics to overlap: the annotation at a time point T recorded by a temporal K-element 𝒯 is the sum of the annotations assigned to intervals containing T. For instance, if two overlapping intervals that both contain T are mapped to k1 and k2, then the annotation at T is k1 +_K k2. To extract the annotation valid at time T from a temporal K-element 𝒯, we define a timeslice operator for temporal K-elements as follows:

τ_T(𝒯) = Σ_{I : T ∈ I} 𝒯(I) (timeslice operator)

Given two temporal K-elements 𝒯1 and 𝒯2, we would like to know if they represent the same history of annotations. For that, we define snapshot-equivalence (∼) for temporal K-elements:

𝒯1 ∼ 𝒯2 ⇔ for all time points T: τ_T(𝒯1) = τ_T(𝒯2) (snapshot-equivalence)
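The timeslice operator and snapshot-equivalence for temporal elements can be sketched in Python for the semiring ℕ. The dictionary representation, the 24-point time domain, and the sample elements are assumptions made for illustration; intervals are half-open, and intervals that are absent from the dictionary count as mapped to 0.

```python
# A temporal N-element as a dict {(begin, end): multiplicity} over half-open
# intervals. The time domain and all sample values are assumptions.
T_MIN, T_MAX = 0, 24


def timeslice(element, t):
    """Annotation at time t: sum over all intervals that contain t."""
    return sum(k for (b, e), k in element.items() if b <= t < e)


def snapshot_equivalent(e1, e2):
    """True iff both elements yield the same annotation at every time point."""
    return all(timeslice(e1, t) == timeslice(e2, t) for t in range(T_MIN, T_MAX))


A = {(3, 10): 2}
B = {(3, 10): 1, (3, 6): 1, (6, 10): 1}  # overlapping intervals add up
C = {(3, 9): 2}

print(timeslice(B, 4), snapshot_equivalent(A, B), snapshot_equivalent(A, C))
```

Elements A and B differ syntactically but record the same annotation history, which is the situation that motivates a normal form.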

5.2 A Normal Form Based on K-Coalescing

The encoding of the annotation history of a tuple as a temporal K-element is typically not unique.

Example 5.2.

Reconsider the temporal ℕ-element from Example 5.1. Recall that intervals not shown are mapped to 0. Several syntactically different ℕ-elements are snapshot-equivalent to it, for example elements that split one of its intervals into adjacent pieces with the same annotation.

To be able to build a representation system based on temporal K-elements we need a unique way to encode the annotation history of a tuple as a temporal K-element (condition 1 of Definition 4.5). That is, we need to define a normal form that is unique for snapshot-equivalent temporal K-elements. To this end, we generalize coalescing, which was defined for temporal databases with set semantics in [37, 10]. The generalized form, which we call K-coalescing, coincides with standard coalescing for semiring 𝔹 (set semantics) and, for any semiring K, yields a unique encoding.

K-coalescing creates maximal intervals of contiguous time points with the same annotation. The output is a temporal K-element such that (a) no two intervals mapped to a non-zero element overlap and (b) adjacent intervals assigned to non-zero elements are guaranteed to be mapped to different annotations. To determine such intervals, we define annotation changepoints, time points T where the annotation of a temporal K-element 𝒯 differs from the annotation at T−1, i.e., τ_{T−1}(𝒯) ≠ τ_T(𝒯). It will be convenient to also consider T_min as an annotation changepoint.

Definition 5.2 (Annotation Changepoint).

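Given a temporal $\mathcal{K}$-element $\mathcal{T}$, a time point $T$ is called a changepoint in $\mathcal{T}$ if one of the following conditions holds:

  • $T = T_{min}$ (smallest time point)

  • $\tau_{T-1}(\mathcal{T}) \neq \tau_T(\mathcal{T})$ (change of annotation)

We use $CP(\mathcal{T})$ to denote the set of all annotation changepoints for $\mathcal{T}$. Furthermore, we define $CPI(\mathcal{T})$ to be the set of all intervals that consist of consecutive changepoints:

$$CPI(\mathcal{T}) = \{ [T_b, T_e) \mid T_b < T_e \wedge T_b \in CP(\mathcal{T}) \wedge (T_e \in CP(\mathcal{T}) \vee T_e = T_{max}) \wedge \nexists T' \in CP(\mathcal{T}): T_b < T' < T_e \}$$

In Definition 5.2, $CPI(\mathcal{T})$ computes maximal intervals such that the annotation assigned by $\mathcal{T}$ to each point in such an interval is constant. In the coalesced representation of $\mathcal{T}$, only such intervals are mapped to non-zero annotations.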

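Definition 5.3 ($\mathcal{K}$-Coalesce).

Let $\mathcal{T}$ be a temporal $\mathcal{K}$-element. We define $\mathcal{K}$-coalescing as a function $\mathcal{C}_{\mathcal{K}}: TE_{\mathcal{K}} \to TE_{\mathcal{K}}$:

$$\mathcal{C}_{\mathcal{K}}(\mathcal{T})(I) = \begin{cases} \tau_{T_b}(\mathcal{T}) & \text{if } I = [T_b, T_e) \in CPI(\mathcal{T}) \\ 0_{\mathcal{K}} & \text{otherwise} \end{cases}$$

We use $TEC_{\mathcal{K}}$ to denote all normalized temporal $\mathcal{K}$-elements, i.e., elements $\mathcal{T}$ for which $\mathcal{C}_{\mathcal{K}}(\mathcal{T}') = \mathcal{T}$ for some $\mathcal{T}'$.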
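Changepoints and coalescing can be sketched as follows. The sketch assumes $\mathbb{N}$ annotations, a hypothetical time domain with time points 0 to 23 (so that `T_MAX = 24` acts as the exclusive upper bound), and a dictionary representation that omits intervals mapped to 0. The function names and the sample element are illustrative only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]             # half-open interval [begin, end)
TemporalElement = Dict[Interval, int]  # intervals not listed map to 0

T_MIN, T_MAX = 0, 24                   # assumed domain: time points 0..23


def timeslice(elem: TemporalElement, t: int) -> int:
    """Sum the annotations of all intervals that contain time point t."""
    return sum(k for (begin, end), k in elem.items() if begin <= t < end)


def changepoints(elem: TemporalElement) -> List[int]:
    """T_MIN plus every time point whose annotation differs from the previous one."""
    cps = [T_MIN]
    for t in range(T_MIN + 1, T_MAX):
        if timeslice(elem, t - 1) != timeslice(elem, t):
            cps.append(t)
    return cps


def changepoint_intervals(elem: TemporalElement) -> List[Interval]:
    """Intervals between consecutive changepoints; the last one ends at T_MAX."""
    cps = changepoints(elem)
    return list(zip(cps, cps[1:] + [T_MAX]))


def coalesce(elem: TemporalElement) -> TemporalElement:
    """Annotate each changepoint interval with the timeslice at its start."""
    result = {}
    for begin, end in changepoint_intervals(elem):
        k = timeslice(elem, begin)
        if k != 0:                     # zero annotations stay implicit
            result[(begin, end)] = k
    return result


elem = {(3, 10): 1, (3, 13): 1}        # hypothetical overlapping intervals
print(changepoints(elem))              # [0, 3, 10, 13]
print(coalesce(elem))                  # {(3, 10): 2, (10, 13): 1}
```

Recomputing timeslices from scratch keeps the sketch close to the definitions; a practical implementation would sweep over interval endpoints instead of scanning every time point.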

[Figure 3: Example period multiset relation (columns: sal, period) and temporal $\mathbb{N}$-elements encoding the history of tuples.]

Example 5.3.

Consider the SQL period relation shown in Figure 3. The temporal -elements encode the history of tuples , and . Note that is not coalesced since the two non-zero intervals of this -element overlap. Applying -coalesce we get:

That is, this tuple occurs twice within the time interval and once in , i.e., it has annotation changepoints , , and . Interpreting the same relation under set semantics (semiring ), the history of can be encoded as a temporal -element . Applying -coalesce:

That is, this tuple occurs (is annotated with ) within the time interval and its annotation changepoints are and .

We now prove several important properties of the $\mathcal{K}$-coalesce operator, establishing that $TEC_{\mathcal{K}}$ (coalesced temporal $\mathcal{K}$-elements) is a good choice for a normal form of temporal $\mathcal{K}$-elements.

Lemma 5.1.

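Let $\mathcal{K}$ be a semiring and $\mathcal{T}$, $\mathcal{T}_1$ and $\mathcal{T}_2$ temporal $\mathcal{K}$-elements. We have:

$\mathcal{C}_{\mathcal{K}}(\mathcal{C}_{\mathcal{K}}(\mathcal{T})) = \mathcal{C}_{\mathcal{K}}(\mathcal{T})$ (idempotence)
$\mathcal{T}_1 \sim \mathcal{T}_2 \Leftrightarrow \mathcal{C}_{\mathcal{K}}(\mathcal{T}_1) = \mathcal{C}_{\mathcal{K}}(\mathcal{T}_2)$ (uniqueness)
$\mathcal{T} \sim \mathcal{C}_{\mathcal{K}}(\mathcal{T})$ (equivalence preservation)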
Proof.

All proofs are shown in Appendix A. ∎

6 Period Semirings

Having established a unique normal form of temporal $\mathcal{K}$-elements, we now proceed to define period semirings as our logical model. The elements of a period semiring are temporal $\mathcal{K}$-elements in normal form. We prove that these structures are semirings and ultimately that relations annotated with period semirings form a representation system for snapshot $\mathcal{K}$-relations for $\mathcal{RA}^+$. In Section 7, we then prove them to also be a representation system for $\mathcal{RA}^{agg}$, i.e., queries involving difference and aggregation.

When defining the addition and multiplication operations and their neutral elements in the semiring structure of temporal $\mathcal{K}$-elements, we have to ensure that these definitions are compatible with semiring $\mathcal{K}$ on snapshots. Furthermore, we need to ensure that the output of these operations is guaranteed to be $\mathcal{K}$-coalesced. The latter can be ensured by applying $\mathcal{K}$-coalesce to the output of the operation. For addition, snapshot reducibility is achieved by pointwise addition (denoted as $+_{\mathcal{K}_P}$) of the two functions that constitute the two input temporal $\mathcal{K}$-elements. That is, for each interval $I$, the function that is the result of the addition of temporal $\mathcal{K}$-elements $\mathcal{T}_1$ and $\mathcal{T}_2$ assigns to $I$ the value $\mathcal{T}_1(I) +_{\mathcal{K}} \mathcal{T}_2(I)$. For multiplication, the product of two $\mathcal{K}$-elements assigned to an overlapping pair of intervals $I_1$ and $I_2$ is valid during the intersection of $I_1$ and $I_2$. Since both input temporal $\mathcal{K}$-elements may assign non-zero values to multiple intervals that have the same overlap, the resulting $\mathcal{K}$-value at a point $T$ would be the sum over all pairs of overlapping intervals. We denote this operation as $\cdot_{\mathcal{K}_P}$. Since $+_{\mathcal{K}_P}$ and $\cdot_{\mathcal{K}_P}$ may return a temporal $\mathcal{K}$-element that is not coalesced, we define the operations of our structures to apply $\mathcal{C}_{\mathcal{K}}$ to the result of $+_{\mathcal{K}_P}$ and $\cdot_{\mathcal{K}_P}$. The zero element $0_{\mathcal{K}_{\mathcal{T}}}$ of the temporal extension of $\mathcal{K}$ is the temporal $\mathcal{K}$-element that maps all intervals to $0_{\mathcal{K}}$, and the one element $1_{\mathcal{K}_{\mathcal{T}}}$ is the temporal element that maps every interval to $0_{\mathcal{K}}$ except for $[T_{min}, T_{max})$, which is mapped to $1_{\mathcal{K}}$.

Definition 6.1 (Period Semiring).

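For a time domain $\mathbb{T}$ with minimum $T_{min}$ and maximum $T_{max}$ and a semiring $\mathcal{K}$, the period semiring $\mathcal{K}_{\mathcal{T}}$ is defined as:

$$\mathcal{K}_{\mathcal{T}} = (TEC_{\mathcal{K}}, +_{\mathcal{K}_{\mathcal{T}}}, \cdot_{\mathcal{K}_{\mathcal{T}}}, 0_{\mathcal{K}_{\mathcal{T}}}, 1_{\mathcal{K}_{\mathcal{T}}})$$

where for $\mathcal{T}_1, \mathcal{T}_2 \in TEC_{\mathcal{K}}$ and $I \in \mathbb{I}$:

$$0_{\mathcal{K}_{\mathcal{T}}}(I) = 0_{\mathcal{K}} \qquad 1_{\mathcal{K}_{\mathcal{T}}}(I) = \begin{cases} 1_{\mathcal{K}} & \text{if } I = [T_{min}, T_{max}) \\ 0_{\mathcal{K}} & \text{otherwise} \end{cases}$$

$$\mathcal{T}_1 +_{\mathcal{K}_{\mathcal{T}}} \mathcal{T}_2 = \mathcal{C}_{\mathcal{K}}(\mathcal{T}_1 +_{\mathcal{K}_P} \mathcal{T}_2) \qquad (\mathcal{T}_1 +_{\mathcal{K}_P} \mathcal{T}_2)(I) = \mathcal{T}_1(I) +_{\mathcal{K}} \mathcal{T}_2(I)$$

$$\mathcal{T}_1 \cdot_{\mathcal{K}_{\mathcal{T}}} \mathcal{T}_2 = \mathcal{C}_{\mathcal{K}}(\mathcal{T}_1 \cdot_{\mathcal{K}_P} \mathcal{T}_2) \qquad (\mathcal{T}_1 \cdot_{\mathcal{K}_P} \mathcal{T}_2)(I) = \sum_{I_1, I_2 \,:\, I = I_1 \cap I_2} \mathcal{T}_1(I_1) \cdot_{\mathcal{K}} \mathcal{T}_2(I_2)$$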
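The operations of the period semiring can be sketched for the case $\mathcal{K} = \mathbb{N}$. The sketch assumes integer time points 0 to 23, half-open intervals as pairs, and dictionaries that omit intervals mapped to 0. All names (`add`, `mul`, `ZERO`, `ONE`) and the sample annotations are illustrative and are not taken from the paper's implementation.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]             # half-open interval [begin, end)
TemporalElement = Dict[Interval, int]  # intervals not listed map to 0

T_MIN, T_MAX = 0, 24                   # assumed domain: time points 0..23


def timeslice(elem: TemporalElement, t: int) -> int:
    return sum(k for (begin, end), k in elem.items() if begin <= t < end)


def coalesce(elem: TemporalElement) -> TemporalElement:
    """Maximal intervals of constant annotation; zero annotations are dropped."""
    out, start = {}, T_MIN
    for t in range(T_MIN + 1, T_MAX + 1):
        # Close the current run at a changepoint or at the end of the domain.
        if t == T_MAX or timeslice(elem, t) != timeslice(elem, t - 1):
            k = timeslice(elem, start)
            if k != 0:
                out[(start, t)] = k
            start = t
    return out


def add(x: TemporalElement, y: TemporalElement) -> TemporalElement:
    """Pointwise addition per interval, followed by coalescing."""
    pointwise = dict(x)
    for interval, k in y.items():
        pointwise[interval] = pointwise.get(interval, 0) + k
    return coalesce(pointwise)


def mul(x: TemporalElement, y: TemporalElement) -> TemporalElement:
    """Products of overlapping interval pairs, summed per intersection."""
    pointwise: TemporalElement = {}
    for (b1, e1), k1 in x.items():
        for (b2, e2), k2 in y.items():
            lo, hi = max(b1, b2), min(e1, e2)
            if lo < hi:                # non-empty intersection
                pointwise[(lo, hi)] = pointwise.get((lo, hi), 0) + k1 * k2
    return coalesce(pointwise)


ZERO: TemporalElement = {}                  # every interval maps to 0
ONE: TemporalElement = {(T_MIN, T_MAX): 1}  # only [T_MIN, T_MAX) maps to 1

x = {(3, 10): 1}                       # hypothetical annotations
y = {(8, 16): 1}
print(add(x, y))                       # {(3, 8): 1, (8, 10): 2, (10, 16): 1}
print(mul(x, y))                       # {(8, 10): 1}
```

In this sketch `ZERO` and `ONE` behave as neutral elements for `add` and `mul` on coalesced inputs, e.g., `add(x, ZERO) == x` and `mul(x, ONE) == x`.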


Example 6.1.

Consider the -relation works shown in Figure 2 (middle) and query . Recall that the annotation of a tuple in the result of a projection over a -relation is the sum of all input tuples which are projected onto . For result tuple (SP) we have input tuples (Ann,SP) and (Sam,SP) with and , respectively. The tuple (SP) is annotated with the sum of these annotations, i.e., . Substituting definitions we get:

Thus, as expected, the result records that, e.g., there are two skilled workers (SP) on duty during time interval .

Having defined the family of period semirings, it remains to be shown that $\mathcal{K}_{\mathcal{T}}$ with standard K-relational query semantics is a representation system for snapshot $\mathcal{K}$-relations.

6.1 $\mathcal{K}_{\mathcal{T}}$ is a Semiring

As a first step, we prove that for any semiring $\mathcal{K}$, the structure $\mathcal{K}_{\mathcal{T}}$ is also a semiring. The following lemma shows that $\mathcal{K}$-coalesce can be redundantly pushed into $+_{\mathcal{K}_P}$ and $\cdot_{\mathcal{K}_P}$ operations.

Lemma 6.1.

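Let $\mathcal{K}$ be a semiring and $\mathcal{T}_1, \mathcal{T}_2 \in TE_{\mathcal{K}}$. Then,

$$\mathcal{C}_{\mathcal{K}}(\mathcal{T}_1 +_{\mathcal{K}_P} \mathcal{T}_2) = \mathcal{C}_{\mathcal{K}}(\mathcal{C}_{\mathcal{K}}(\mathcal{T}_1) +_{\mathcal{K}_P} \mathcal{T}_2) \qquad \mathcal{C}_{\mathcal{K}}(\mathcal{T}_1 \cdot_{\mathcal{K}_P} \mathcal{T}_2) = \mathcal{C}_{\mathcal{K}}(\mathcal{C}_{\mathcal{K}}(\mathcal{T}_1) \cdot_{\mathcal{K}_P} \mathcal{T}_2)$$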

Using this lemma, we now prove that for any semiring $\mathcal{K}$, the structure $\mathcal{K}_{\mathcal{T}}$ is also a semiring.

Theorem 6.2.

For any semiring $\mathcal{K}$, structure $\mathcal{K}_{\mathcal{T}}$ is a semiring.

6.2 Timeslice Operator

We define a timeslice operator for $\mathcal{K}_{\mathcal{T}}$-relations based on the timeslice operator for temporal $\mathcal{K}$-elements. We annotate each tuple in the output of this operator with the result of $\tau_T$ applied to the temporal $\mathcal{K}$-element the tuple is annotated with.

Definition 6.2 (Timeslice for $\mathcal{K}_{\mathcal{T}}$-relations).

Let $R$ be a $\mathcal{K}_{\mathcal{T}}$-relation and $T \in \mathbb{T}$. The timeslice operator $\tau_T(R)$ is defined as:

$$\tau_T(R)(t) = \tau_T(R(t))$$

We now prove that $\tau_T$ is a homomorphism $\mathcal{K}_{\mathcal{T}} \to \mathcal{K}$. Since semiring homomorphisms commute with queries [20], $\mathcal{K}_{\mathcal{T}}$ equipped with this timeslice operator fulfills the snapshot-reducibility condition of representation systems (Definition 4.5).

Theorem 6.3.

For any $T \in \mathbb{T}$, the timeslice operator $\tau_T$ is a semiring homomorphism from $\mathcal{K}_{\mathcal{T}}$ to $\mathcal{K}$.
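The homomorphism property can be spot-checked on concrete inputs. The sketch below assumes $\mathcal{K} = \mathbb{N}$, integer time points 0 to 23, and a dictionary representation of temporal elements. It tests only two hypothetical elements and is no substitute for the proof; all names are illustrative.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]             # half-open interval [begin, end)
TemporalElement = Dict[Interval, int]  # intervals not listed map to 0

T_MIN, T_MAX = 0, 24                   # assumed domain: time points 0..23


def timeslice(elem: TemporalElement, t: int) -> int:
    return sum(k for (begin, end), k in elem.items() if begin <= t < end)


def coalesce(elem: TemporalElement) -> TemporalElement:
    out, start = {}, T_MIN
    for t in range(T_MIN + 1, T_MAX + 1):
        if t == T_MAX or timeslice(elem, t) != timeslice(elem, t - 1):
            k = timeslice(elem, start)
            if k != 0:
                out[(start, t)] = k
            start = t
    return out


def add(x: TemporalElement, y: TemporalElement) -> TemporalElement:
    pointwise = dict(x)
    for interval, k in y.items():
        pointwise[interval] = pointwise.get(interval, 0) + k
    return coalesce(pointwise)


def mul(x: TemporalElement, y: TemporalElement) -> TemporalElement:
    pointwise: TemporalElement = {}
    for (b1, e1), k1 in x.items():
        for (b2, e2), k2 in y.items():
            lo, hi = max(b1, b2), min(e1, e2)
            if lo < hi:
                pointwise[(lo, hi)] = pointwise.get((lo, hi), 0) + k1 * k2
    return coalesce(pointwise)


def commutes_with_timeslice(x: TemporalElement, y: TemporalElement) -> bool:
    """Check tau_T(x + y) = tau_T(x) + tau_T(y) and the analogue for products."""
    total, product = add(x, y), mul(x, y)
    return all(
        timeslice(total, t) == timeslice(x, t) + timeslice(y, t)
        and timeslice(product, t) == timeslice(x, t) * timeslice(y, t)
        for t in range(T_MIN, T_MAX)
    )


x = {(3, 10): 2, (10, 13): 1}          # hypothetical coalesced elements
y = {(8, 16): 3}
print(commutes_with_timeslice(x, y))   # True
```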

As an example of the application of this homomorphism, consider the period -relation works from our running example as shown on the left of Figure 2. Applying to this relation yields the snapshot shown on the bottom of this figure (three employees work between 8am and 9am out of whom two are specialized). If we evaluate query over this snapshot we get the snapshot shown on the right of this figure (the count is 2). By Theorem 6.3 we get the same result if we evaluate over the input period -relation and then apply to the result.

6.3 Encoding of Snapshot K-relations

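We now define a bijective mapping $ENC_{\mathcal{K}}$ from snapshot $\mathcal{K}$-relations to $\mathcal{K}_{\mathcal{T}}$-relations. We then prove that the set of $\mathcal{K}_{\mathcal{T}}$-relations together with the timeslice operator for such relations and the mapping $ENC_{\mathcal{K}}^{-1}$ (the inverse of $ENC_{\mathcal{K}}$) form a representation system for snapshot $\mathcal{K}$-relations. Intuitively, $ENC_{\mathcal{K}}(R)$ is constructed by assigning each tuple $t$ a temporal $\mathcal{K}$-element where the annotation of the tuple at time $T$ (i.e., $\tau_T(R)(t)$) is assigned to a singleton interval $[T, T+1)$. This temporal $\mathcal{K}$-element is then coalesced to create a $TEC_{\mathcal{K}}$ element.

Definition 6.3.

Let $\mathcal{K}$ be a semiring and $R$ a snapshot $\mathcal{K}$-relation. $ENC_{\mathcal{K}}$ is a mapping from snapshot $\mathcal{K}$-relations to $\mathcal{K}_{\mathcal{T}}$-relations defined as follows.

$$ENC_{\mathcal{K}}(R)(t) = \mathcal{C}_{\mathcal{K}}(\mathcal{T}_{R,t}) \qquad \mathcal{T}_{R,t}(I) = \begin{cases} \tau_T(R)(t) & \text{if } I = [T, T+1) \\ 0_{\mathcal{K}} & \text{otherwise} \end{cases}$$

We first prove that this mapping is bijective, i.e., it is invertible, which guarantees that $ENC_{\mathcal{K}}^{-1}$ is well-defined and also implies uniqueness (condition 1 of Definition 4.5).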

Lemma 6.4.

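For any semiring $\mathcal{K}$, $ENC_{\mathcal{K}}$ is bijective.

Next, we have to show that $ENC_{\mathcal{K}}$ preserves snapshots, i.e., the instance at a time point $T$ represented by $R$ can be extracted from $ENC_{\mathcal{K}}(R)$ using the timeslice operator.

Lemma 6.5.

For any semiring $\mathcal{K}$, snapshot $\mathcal{K}$-relation $R$, and time point $T \in \mathbb{T}$, we have $\tau_T(ENC_{\mathcal{K}}(R)) = \tau_T(R)$.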
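The encoding and the snapshot-preservation property can be sketched for $\mathcal{K} = \mathbb{N}$. The sketch assumes that a snapshot relation is given as a dictionary from time points to multisets of tuples (tuple to multiplicity), with time points 0 to 23. The function names and the sample data are hypothetical.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]             # half-open interval [begin, end)
TemporalElement = Dict[Interval, int]  # intervals not listed map to 0
Row = Tuple[str, ...]
Snapshots = Dict[int, Dict[Row, int]]  # time point -> (tuple -> multiplicity)

T_MIN, T_MAX = 0, 24                   # assumed domain: time points 0..23


def timeslice(elem: TemporalElement, t: int) -> int:
    return sum(k for (begin, end), k in elem.items() if begin <= t < end)


def coalesce(elem: TemporalElement) -> TemporalElement:
    out, start = {}, T_MIN
    for t in range(T_MIN + 1, T_MAX + 1):
        if t == T_MAX or timeslice(elem, t) != timeslice(elem, t - 1):
            k = timeslice(elem, start)
            if k != 0:
                out[(start, t)] = k
            start = t
    return out


def encode(snapshots: Snapshots) -> Dict[Row, TemporalElement]:
    """Attach each snapshot annotation to a singleton interval, then coalesce."""
    rows = {row for snap in snapshots.values() for row in snap}
    encoded = {}
    for row in rows:
        singletons = {(t, t + 1): snap[row]
                      for t, snap in snapshots.items() if snap.get(row, 0) != 0}
        encoded[row] = coalesce(singletons)
    return encoded


def timeslice_relation(rel: Dict[Row, TemporalElement], t: int) -> Dict[Row, int]:
    """Apply the element-level timeslice to every tuple's annotation."""
    out = {}
    for row, elem in rel.items():
        k = timeslice(elem, t)
        if k != 0:
            out[row] = k
    return out


# Hypothetical snapshot relation; time points not listed have empty snapshots.
snapshots = {8: {("SP",): 2, ("NS",): 1}, 9: {("SP",): 2}, 10: {("SP",): 1}}
encoded = encode(snapshots)
print(encoded[("SP",)])                # {(8, 10): 2, (10, 11): 1}
print(all(timeslice_relation(encoded, t) == snapshots.get(t, {})
          for t in range(T_MIN, T_MAX)))  # True
```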

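Based on these properties of $ENC_{\mathcal{K}}$ and the fact that the timeslice operator over $\mathcal{K}_{\mathcal{T}}$-relations is a homomorphism $\mathcal{K}_{\mathcal{T}} \to \mathcal{K}$, our main technical result follows immediately. That is, the set of $\mathcal{K}_{\mathcal{T}}$-relations equipped with the timeslice operator and $ENC_{\mathcal{K}}^{-1}$ is a representation system for positive relational algebra queries ($\mathcal{RA}^+$) over snapshot $\mathcal{K}$-relations.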

Theorem 6.6 (Representation System).

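Given a semiring $\mathcal{K}$, let $DB_{\mathcal{K}_{\mathcal{T}}}$ be the set of all $\mathcal{K}_{\mathcal{T}}$-relations. The triple $(DB_{\mathcal{K}_{\mathcal{T}}}, ENC_{\mathcal{K}}^{-1}, \tau)$ is a representation system for $\mathcal{RA}^+$ queries over snapshot $\mathcal{K}$-relations.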

7 Complex Queries

Having proven that $\mathcal{K}_{\mathcal{T}}$-relations form a representation system for $\mathcal{RA}^+$, we now study extensions for difference and aggregation.

7.1 Difference

Extensions of $\mathcal{K}$-relations for difference have been studied in [19, 3]. For instance, the difference operator on $\mathbb{N}$-relations corresponds to bag difference (SQL's EXCEPT ALL). Geerts et al. [19] apply an extension of semirings with a monus operation that is defined based on the natural order of a semiring and demonstrate how to define a difference operation for $\mathcal{K}$-relations based on the monus operation for semirings where this operation is well-defined. Following the terminology introduced in this work, we refer to semirings with a monus operation as m-semirings. We now prove that if a semiring $\mathcal{K}$ has a well-defined monus, then so does $\mathcal{K}_{\mathcal{T}}$. From this it follows that for any such $\mathcal{K}$, the difference operation is well-defined for $\mathcal{K}_{\mathcal{T}}$. We proceed to show that the timeslice operator is an m-semiring homomorphism, which implies that $\mathcal{K}_{\mathcal{T}}$-relations for any m-semiring $\mathcal{K}$ form a representation system for $\mathcal{RA}$ (full relational algebra).

The definition of a monus operator is based on the so-called natural order $\preceq_{\mathcal{K}}$. For two elements $k$ and $k'$ of a semiring $\mathcal{K}$, $k \preceq_{\mathcal{K}} k' \Leftrightarrow \exists k'': k +_{\mathcal{K}} k'' = k'$. If $\preceq_{\mathcal{K}}$ is a partial order, then $\mathcal{K}$ is called naturally ordered. For instance, $\mathbb{N}$ is naturally ordered ($\preceq_{\mathbb{N}}$ corresponds to the order of natural numbers) while $\mathbb{Z}$ is not (for any $k, k' \in \mathbb{Z}$ we have $k \preceq_{\mathbb{Z}} k'$). For the monus to be well-defined on $\mathcal{K}$, $\mathcal{K}$ has to be naturally ordered and for any $k, k' \in \mathcal{K}$, the set $\{k'' \mid k \preceq_{\mathcal{K}} k' +_{\mathcal{K}} k''\}$ has to have a smallest member. For any semiring fulfilling these two conditions, the monus operation $-_{\mathcal{K}}$ is defined as $k -_{\mathcal{K}} k' = k''$ where $k''$ is the smallest element such that $k \preceq_{\mathcal{K}} k' +_{\mathcal{K}} k''$. For instance, the monus for $\mathbb{N}$ is the truncating minus: $k -_{\mathbb{N}} k' = \max(0, k - k')$.
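For $\mathbb{N}$, the definition of the monus via the natural order can be compared against the truncating minus. The following sketch evaluates both by brute force on small numbers; the function names are illustrative.

```python
def natural_leq(k: int, k2: int) -> bool:
    """k is below k2 in the natural order iff some k'' in N satisfies k + k'' = k2."""
    return any(k + d == k2 for d in range(k2 + 1))


def monus_by_definition(k: int, k2: int) -> int:
    """Smallest k'' in N such that k is below k2 + k'' in the natural order."""
    d = 0
    while not natural_leq(k, k2 + d):
        d += 1
    return d


def truncating_minus(k: int, k2: int) -> int:
    """Closed form of the monus on N."""
    return max(0, k - k2)


print(monus_by_definition(5, 3))  # 2
print(monus_by_definition(3, 5))  # 0
print(all(monus_by_definition(a, b) == truncating_minus(a, b)
          for a in range(10) for b in range(10)))  # True
```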

Theorem 7.1.

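For any m-semiring $\mathcal{K}$, semiring $\mathcal{K}_{\mathcal{T}}$ has a well-defined monus, i.e., is an m-semiring.

Let $-_{\mathcal{K}_P}$ denote an operation that returns a temporal $\mathcal{K}$-element which assigns to each singleton interval $[T, T+1)$ the result of the monus for $\mathcal{K}$: $\tau_T(\mathcal{T}_1) -_{\mathcal{K}} \tau_T(\mathcal{T}_2)$ (this is as defined in the proof of Theorem 7.1, see Appendix A). In the proof of Theorem 7.1, we demonstrate that $\mathcal{T}_1 -_{\mathcal{K}_{\mathcal{T}}} \mathcal{T}_2 = \mathcal{C}_{\mathcal{K}}(\mathcal{T}_1 -_{\mathcal{K}_P} \mathcal{T}_2)$. Obviously, computing $-_{\mathcal{K}_{\mathcal{T}}}$ using singleton intervals is not efficient. In our implementation, we use a more efficient way to compute the monus for $\mathcal{K}_{\mathcal{T}}$ that is based on normalizing the input temporal $\mathcal{K}$-elements $\mathcal{T}_1$ and $\mathcal{T}_2$ such that annotations are attached to larger time intervals where $\tau_T(\mathcal{T}_1) -_{\mathcal{K}} \tau_T(\mathcal{T}_2)$ is guaranteed to be constant. Importantly, $\tau_T$ is a homomorphism for monus-semiring $\mathcal{K}_{\mathcal{T}}$.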
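The two ways of computing the temporal monus can be contrasted in a small sketch for $\mathcal{K} = \mathbb{N}$. `monus_naive` follows the singleton-interval definition. `monus_split` is one possible way to work with larger intervals: it splits the time domain at the interval endpoints of both inputs, between which both timeslices are constant. This splitting strategy is an assumption made for illustration and is not necessarily the normalization used in the paper's implementation; all names and sample values are hypothetical, and interval endpoints are assumed to lie within the time domain.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]             # half-open interval [begin, end)
TemporalElement = Dict[Interval, int]  # intervals not listed map to 0

T_MIN, T_MAX = 0, 24                   # assumed domain: time points 0..23


def timeslice(elem: TemporalElement, t: int) -> int:
    return sum(k for (begin, end), k in elem.items() if begin <= t < end)


def coalesce(elem: TemporalElement) -> TemporalElement:
    out, start = {}, T_MIN
    for t in range(T_MIN + 1, T_MAX + 1):
        if t == T_MAX or timeslice(elem, t) != timeslice(elem, t - 1):
            k = timeslice(elem, start)
            if k != 0:
                out[(start, t)] = k
            start = t
    return out


def monus_naive(x: TemporalElement, y: TemporalElement) -> TemporalElement:
    """Truncating minus at every time point, attached to singleton intervals."""
    singletons = {}
    for t in range(T_MIN, T_MAX):
        k = max(0, timeslice(x, t) - timeslice(y, t))
        if k != 0:
            singletons[(t, t + 1)] = k
    return coalesce(singletons)


def monus_split(x: TemporalElement, y: TemporalElement) -> TemporalElement:
    """Evaluate the monus once per piece between consecutive interval endpoints."""
    endpoints = {T_MIN, T_MAX}
    for begin, end in list(x) + list(y):
        endpoints.update((begin, end))
    points = sorted(endpoints)
    pieces = {}
    for begin, end in zip(points, points[1:]):
        # Both timeslices are constant on [begin, end), so one evaluation suffices.
        k = max(0, timeslice(x, begin) - timeslice(y, begin))
        if k != 0:
            pieces[(begin, end)] = k
    return coalesce(pieces)


x = {(3, 13): 2}                       # hypothetical annotations
y = {(8, 10): 1, (9, 16): 3}
print(monus_naive(x, y))               # {(3, 8): 2, (8, 9): 1}
print(monus_split(x, y) == monus_naive(x, y))  # True
```

On these inputs the timeslice of the result at every time point equals the truncating minus of the input timeslices, which is what the homomorphism property requires.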

Theorem 7.2.

Mapping $\tau_T$ is an m-semiring homomorphism.

For example, consider from Example 1.2 which can be expressed in relational algebra as . The -relation corresponding to the period relation assign shown in this example annotates each tuple with a singleton temporal -element mapping the period of this tuple to , e.g., (M1, SP) is annotated with . The annotation of result tuple (SP) is computed as