1 Introduction
There has recently been renewed interest in temporal databases, fueled by the fact that abundant storage has made long-term archival of historical data feasible. This has led to the incorporation of temporal features into the SQL:2011 standard [27], which defines an encoding of temporal data that associates each tuple with a validity period. We refer to such relations as SQL period relations. Note that SQL period relations use multiset semantics. Period relations are supported by many DBMSs, e.g., PostgreSQL [34], Teradata [44], Oracle [30], IBM DB2 [35], and MS SQLServer [29]. However, none of these systems, with the partial exception of Teradata, supports snapshot semantics, an important class of temporal queries. Given a temporal database, a non-temporal query interpreted under snapshot semantics returns a temporal relation that assigns to each point in time the result of evaluating the query over the snapshot of the database at this point in time. This fundamental property of snapshot semantics is known as snapshot-reducibility [28, 42]. A specific type of snapshot semantics is the so-called sequenced semantics [7], which in addition to snapshot-reducibility enforces another property called change preservation that determines how time points are grouped into intervals in a snapshot query result.



Example 1.1 (Snapshot Aggregation).
Consider the SQL period relation works in Figure 1(a) that records factory workers, their skills, and when they are on duty. The validity period of each tuple is stored in the temporal attribute period. To simplify examples, we restrict the time domain to the hours of 2018-01-01, represented as integers 00 to 23. The company requires that at least one SP worker is in the factory at any given time. This can be checked by evaluating the following query under snapshot semantics: SELECT count(*) AS cnt FROM works WHERE skill = 'SP'. Evaluated under snapshot semantics, a query returns a snapshot (time-varying) result that records when the result is valid, i.e., this query returns the number of SP workers that are on duty at any given point of time. The result is shown in Figure 1(b). For instance, at 08:00am two SP workers (Ann and Joe) are on duty. The query exposes several safety violations, e.g., no SP worker is on duty between 00 and 03.
In the example above, safety violations correspond to gaps, i.e., periods of time where the aggregation's input is empty. As we will demonstrate, none of the approaches for snapshot semantics that we are aware of returns results for gaps (tuples marked in red); they therefore violate snapshot-reducibility. Teradata [44, p.149], for instance, realized the importance of reporting results for gaps but, in contrast to snapshot-reducibility, provides gaps in the presence of grouping while omitting them otherwise. As a consequence, in our example these approaches fail to identify safety violations. We refer to this type of error as the aggregation gap bug (AG bug).
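The aggregation gap bug can be reproduced with a few lines of Python. The sketch below is illustrative only: the tuples in `WORKS`, the half-open period convention `[begin, end)`, and all function names are assumptions made for this example and are not taken from Figure 1. It evaluates the count query at every hour, groups equal adjacent results into periods, and then shows what is lost if rows for gaps are dropped.

```python
# Hypothetical data: (name, skill, begin, end) with half-open periods [begin, end).
WORKS = [
    ("Ann", "SP", 3, 10),
    ("Joe", "NS", 8, 16),
    ("Sam", "SP", 8, 16),
    ("Ann", "SP", 18, 20),
]
DOMAIN = range(0, 24)  # assumed time domain: the hours of one day


def timeslice(rel, t):
    """Return the non-temporal snapshot of rel that is valid at time t."""
    return [(name, skill) for (name, skill, b, e) in rel if b <= t < e]


def count_sp(snapshot):
    """Non-temporal query: SELECT count(*) FROM works WHERE skill = 'SP'."""
    return sum(1 for (_name, skill) in snapshot if skill == "SP")


def snapshot_count(rel):
    """Evaluate count_sp at every time point and merge equal adjacent results."""
    result = []  # rows of the form (cnt, begin, end)
    for t in DOMAIN:
        cnt = count_sp(timeslice(rel, t))
        if result and result[-1][0] == cnt and result[-1][2] == t:
            c, b, _ = result[-1]
            result[-1] = (c, b, t + 1)  # extend the current period
        else:
            result.append((cnt, t, t + 1))
    return result


def drop_gaps(result):
    """Mimic the AG bug: periods with an empty aggregation input vanish."""
    return [row for row in result if row[0] != 0]


correct = snapshot_count(WORKS)
buggy = drop_gaps(correct)
```

With this data, `correct` contains explicit rows with count 0 for the hours in which no SP worker is on duty, whereas `buggy` silently omits them, so a check for safety violations on `buggy` finds nothing.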
Similar to the case of aggregation, we also identify a common error related to snapshot bag difference (EXCEPT ALL).
Example 1.2 (Snapshot Bag Difference).
Consider again Figure 1. Relation assign records machines (mach) that need to be assigned to workers with a specific skill over a specific period of time. For instance, the third tuple records that machine M3 requires a non-specialized (NS) worker for the time period stored with this tuple. To determine which skill sets are missing during which time period, we evaluate the following query under snapshot semantics: SELECT skill FROM assign EXCEPT ALL SELECT skill FROM works. The result in Figure 1(c) indicates that one more SP worker is required during two time periods.
Many approaches treat bag difference as a NOT EXISTS subquery and therefore do not return a tuple from the left input if this tuple exists in the right input (independent of its multiplicity). For instance, the two tuples for the SP workers (highlighted in red) are not returned, since there exists an SP worker at each snapshot in the works relation. This violates snapshot-reducibility. We refer to this type of error as the bag difference bug (BD bug).
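The difference between the two behaviors is easy to see on a single snapshot. The following Python sketch uses made-up `ASSIGN` and `WORKS` tuples (assumptions for illustration, not the data of Figure 1) and compares a multiplicity-aware `EXCEPT ALL` with the NOT EXISTS interpretation at one time point.

```python
from collections import Counter

# Hypothetical data with half-open periods [begin, end).
ASSIGN = [("M1", "SP", 3, 12), ("M2", "SP", 6, 14), ("M3", "NS", 3, 16)]
WORKS = [("Ann", "SP", 3, 10), ("Joe", "NS", 8, 16), ("Sam", "SP", 8, 16)]


def skills_at(rel, t):
    """Multiset of skill values in the snapshot of rel at time t."""
    return Counter(skill for (_x, skill, b, e) in rel if b <= t < e)


def except_all(left, right):
    """Bag difference: subtract multiplicities, keep only positive counts."""
    return left - right


def not_exists(left, right):
    """BD bug: drop a left tuple as soon as it occurs at all on the right."""
    return Counter({k: n for k, n in left.items() if right[k] == 0})


t = 6  # two machines need an SP worker, but only one SP worker is on duty
correct = except_all(skills_at(ASSIGN, t), skills_at(WORKS, t))
buggy = not_exists(skills_at(ASSIGN, t), skills_at(WORKS, t))
```

At time 6 the correct result still reports one missing SP worker, while the NOT EXISTS variant reports none because some SP worker exists in the right input.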
The interval-based representation of temporal relations creates an additional problem: the encoding of a temporal query result is typically not unique. For instance, a single tuple from the works relation in Figure 1 can equivalently be represented as two tuples with adjacent periods that together cover the period of the original tuple. We refer to a method that determines how temporal data and snapshot query results are grouped into intervals as an interval-based representation system. A unique and predictable representation of temporal data is a desirable property, because equivalent relational algebra expressions should not lead to syntactically different result relations. This problem can be addressed by using a representation system that associates a unique encoding with each temporal database. Furthermore, overlap between multiple periods associated with a tuple and unnecessary splits of periods complicate the interpretation of data and, thus, should be avoided if possible. Given these limitations and the lack of implementations for snapshot semantics queries over bag relations, users currently resort to manually implementing such queries in SQL, which is time-consuming and error-prone [39].
We address the above limitations of previous approaches for snapshot semantics and develop a framework based on the following desiderata: (i) support for set and multiset relations, (ii) snapshot-reducibility for all operations, and (iii) a unique interval-based encoding of temporal relations. Note that while previous work on sequenced semantics (e.g., [18, 16]) also aims to support snapshot-reducibility, we emphasize a unique encoding instead of trying to preserve intervals from the input of a query.
We address these desiderata using a three-level approach. Note that we focus on data with a single time dimension, but are oblivious to whether this is transaction time or valid time. First, we introduce an abstract model that supports both sets and multisets, and by definition is snapshot-reducible. This model, however, uses a verbose encoding of temporal data and, thus, is not practical. Afterwards, we develop a more compact logical model as a representation system, where the complete temporal history of all equivalent tuples from the abstract model is stored in an annotation attached to one tuple. The abstract and the logical models leverage the theory of K-relations, which are a general class of annotated relations that cover both set and multiset relations. For our implementation, we use SQL over period relations to ensure compatibility with SQL:2011 and existing DBMSs. We prove the equivalence between the three layers (i.e., the abstract model, the logical model, and the implementation) and show that the logical model determines a unique interval encoding for the implementation and a correct rewriting scheme for queries over this encoding.
Our main technical contributions are:


Abstract model: We introduce snapshot K-relations as a generalization of snapshot set and multiset relations. These relations are by definition snapshot-reducible.

Logical model: We define an interval-based representation, termed period K-relations, and prove that these relations are a compact and unique representation system for snapshot semantics over snapshot K-relations. We show this for the full relational algebra plus aggregation.

We achieve a unique encoding of temporal data as period K-relations by generalizing set-based coalescing [10].

We demonstrate that the multiset version of period K-relations can be encoded as SQL period relations, a common interval-based model in DBMSs, and show how to translate queries with snapshot semantics over period K-relations into SQL.

We implement our approach as a database middleware and present optimizations that eliminate redundant coalescing steps. We demonstrate experimentally that we do not need to sacrifice performance to achieve correctness.
2 Related Work
Temporal Query Languages. There is a long history of research on temporal query languages [22, 6]. Many temporal query languages including TSQL2 [38, 40], ATSQL2 (Applied TSQL2) [8], IXSQL [28], ATSQL [9], and SQL/TP [46] support sequenced semantics, i.e., these languages support a specific type of snapshot semantics. In this paper, we provide a general framework that can be used to correctly implement snapshot semantics over period set and multiset relations for any language.
Interval-based Approaches for Sequenced Semantics. In the following, we discuss interval-based approaches for sequenced semantics. Table 1 shows for each approach whether it supports multisets, whether it is free of the aggregation gap and bag difference bugs, and whether its interval-based encoding of a sequenced query result is unique. An N/A indicates that the approach does not support the operation for which this type of bug can occur, or that the semantics of this operation is not defined precisely enough to judge its correctness. Note that while temporal query languages may be defined to apply sequenced semantics and, thus, by definition are snapshot-reducible, (the specification of) their implementation might fail to be snapshot-reducible. In the following discussion of the temporal query languages in Table 1, we refer to their semantics as provided in the referenced publication(s).
Interval preservation (ATSQL) [9, Def. 2.10] is a representation system for SQL period relations (multisets) that tries to preserve the intervals associated with input tuples, i.e., fragments of all intervals (including duplicates) associated with the input tuples "survive" in the output. Interval preservation is snapshot-reducible for multiset semantics for positive relational algebra [36] (selection, projection, join, and union), but exhibits the aggregation gap and bag difference bugs. Moreover, the period encoding of a query result is not unique as it depends both on the query and the input representation.
Teradata [44] is a commercial DBMS that supports sequenced operators using ATSQL's statement modifiers. The implementation is based on query rewriting [2] and does not support difference. Teradata's implementation exhibits the aggregation gap bug. Since the application of coalescing is optional, the encoding of snapshot relations as period relations is not unique.
Change preservation [18, Def. 3.4] determines the interval boundaries of a query result tuple based on the maximal interval for which there is no change in the input. To track changes, it employs the lineage provenance model in [16] and the PICS model in [18]. The approach uses timestamp adjustment in combination with traditional database operators, but does not provide a unique encoding, exhibits the AG bug, and only supports set semantics. Our work addresses these issues and significantly generalizes this approach, in particular by supporting bag semantics.
TSQL2 [38, 40, 42] implicitly applies coalescing [10] to produce a unique representation. Thus, it only supports set semantics, and it does not support aggregation.
Snodgrass et al. [41] present a valid-time extension of SQL/Temporal and an algebra with sequenced semantics. The algebra supports multisets, but exhibits both the aggregation gap and bag difference bugs. Since intervals from the input are preserved where possible, the interval representation of a snapshot relation is not unique.
TimeDB [43] is an implementation of ATSQL2 [8]. It uses a semantics for bag difference and intersection that is not snapshot-reducible (see [43, pp. 63]).
Our approach is the first that supports set and multiset relations, is resilient against the two bugs, and specifies a unique interval encoding.
Approach  Multisets  AG bug free  BD bug free  Unique encoding 

Interval preservation [9] (ATSQL)  ✓  
Teradata [44]  ✓  N/A  (1)  
Change preservation [16, 18]  N/A  
TSQL2 [38, 40, 42]  N/A  N/A  
ATSQL2 [8]  ✓  N/A  
TimeDB [43] (ATSQL2)  ✓  N/A  
SQL/Temporal [41]  ✓  
SQL/TP [46] (2)  ✓  ✓  ✓  
Our approach  ✓  ✓  ✓  ✓ 
(1) Optionally, coalescing (NORMALIZE ON in Teradata) can be applied to get a unique encoding at the cost of losing multiplicities.
(2) Sequenced semantics can be expressed, but this is inefficient.
Non-sequenced Temporal Queries. Non-sequenced temporal query languages, such as IXSQL [28] and SQL/TP [46], do not explicitly support sequenced semantics. Nevertheless, we review these languages here since they allow queries with sequenced semantics to be expressed. SQL/TP [46] introduces a point-wise semantics for temporal queries [12, 45], where time is handled as a regular attribute. Intervals are used as an efficient encoding of time points, and a normalization operation is used to split intervals. The language supports multisets and a mechanism to manually produce sequenced semantics. However, sequenced semantics queries are specified as the union of non-temporal queries over snapshots. Even if such subqueries are grouped together for adjacent time points where the non-temporal query's result is constant, this still results in a large number of subqueries to be executed. Even worse, the number of subqueries that is required is data dependent. Also, the interval-based encoding is not unique, since time points are grouped into intervals depending on query syntax and encoding of the input. While this has no effect on the semantics, since SQL/TP queries cannot distinguish between different interval-based encodings of a temporal database, it might be confusing to users who observe different query results for equivalent queries or inputs.
Implementations of Temporal Operators. A large body of work has focused on the implementation of individual temporal algebra operators such as joins [17, 32, 11] and aggregation [5, 33, 31]. Some exceptions supporting multiple operators are [25, 18, 13]. These approaches introduce efficient evaluation algorithms for a particular semantics of a temporal algebra operator. Our approach can utilize efficient operator implementations as long as (i) their semantics is compatible with our interval-based encoding of snapshot query results and (ii) they are snapshot-reducible.
Coalescing. Coalescing produces a unique representation of a set semantics temporal database. Böhlen et al. [10] study optimizations for coalescing that eliminate unnecessary coalescing operations. Zhou et al. [47] and [1] use analytical functions to efficiently implement coalescing in SQL. We generalize coalescing to K-relations to define a unique encoding of interval-based temporal relations, including multiset relations. Similar to [10], we remove unnecessary K-coalescing steps and, similar to [47], we use OLAP functions for efficient implementation.
Temporality in Annotated Databases. Kostylev et al. [26] is, to the best of our knowledge, the only previous approach that uses semiring annotations to express temporality. The authors define a semiring whose elements are sets of time points. This approach is limited to set semantics, and no interval-based encoding was presented. The LIVE system [15] combines provenance and uncertainty annotations with versioning. The system uses interval timestamps, and query semantics is based on snapshot-reducibility [15, Def. 2]. However, computing the intervals associated with a query result requires provenance to be maintained for every query result.
3 Solution Overview
In this section, we give an overview of our three-level framework, which is illustrated in Figure 2.
Abstract model – Snapshot K-relations. As an abstract model we use snapshot relations, which map time points to snapshots. Queries over such relations are evaluated over each snapshot, which trivially satisfies snapshot-reducibility. To support both sets and multisets, we introduce snapshot K-relations [20], which are snapshot relations where each snapshot is a K-relation. In a K-relation, each tuple is annotated with an element from a domain $K$. For example, relations annotated with elements from the semiring $\mathbb{N}$ (natural numbers) correspond to multiset semantics. The result of a snapshot query $Q$ over a snapshot K-relation is the result of evaluating $Q$ over the K-relation at each time point.
Example 3.1 (Abstract Model).
Figure 2 (bottom) shows the snapshots at times 00, 08, and 18 of an encoding of the running example as snapshot $\mathbb{N}$-relations. Each snapshot is an $\mathbb{N}$-relation where tuples are annotated with their multiplicity (shown with shaded background). For instance, the snapshot at time 08 has three tuples, each with multiplicity 1. The result of the query from Example 1.1 is shown on the bottom right. Every snapshot in the result is computed by running the query over the corresponding snapshot in the input. For instance, at time 08 there are two SP workers, i.e., the count is 2.
Logical Model – Period K-relations. We introduce period K-relations as a logical model, which merges equivalent tuples over all snapshots from the abstract model into one tuple. In a period K-relation, every tuple is annotated with a temporal element that is a unique interval-based representation for all time points of the merged tuples from the abstract model. We define a class of semirings called period semirings whose elements are temporal elements. Specifically, for any semiring $K$ we can construct a period semiring $K_{\mathcal{T}}$ whose annotations are temporal elements. For instance, $\mathbb{N}_{\mathcal{T}}$ is the period semiring corresponding to semiring $\mathbb{N}$ (multisets). We define necessary conditions for an interval-based model to correctly encode snapshot K-relations and prove that period K-relations fulfill these conditions. Specifically, we call an interval-based model a representation system iff the encoding of every snapshot K-relation $R$ is (i) unique and (ii) snapshot-equivalent to $R$. Furthermore, (iii) queries over encodings are snapshot-reducible.
Example 3.2 (Logical Model).
Figure 2 (middle) shows an encoding of the running example as period $\mathbb{N}_{\mathcal{T}}$-relations. For instance, all equivalent tuples from the abstract model are merged into one tuple in the logical model whose annotation records the intervals during which the tuple exists together with its multiplicity in each of these intervals. In Section 4.2, we will introduce a mapping from snapshot K-relations to $K_{\mathcal{T}}$-relations and the timeslice operator, which restores the snapshot at a given time.
Implementation – SQL Period Relations. To ensure compatibility with the SQL standard, we use SQL period relations in our implementation and translate snapshot semantics queries into SQL queries over these period relations. For this we define an encoding of $\mathbb{N}_{\mathcal{T}}$-relations as SQL period relations (PeriodEnc) together with a rewriting scheme for queries (Rewr).
Example 3.3 (Implementation).
Consider the SQL period relations shown on the top of Figure 2. Each interval-annotation pair of a temporal element in the logical model is encoded as a separate tuple in the implementation. For instance, the annotation of a tuple from the logical model that contains two intervals is encoded as two tuples, each of which records one of the two intervals from this annotation.
We present an implementation of our framework as a database middleware that exposes snapshot semantics as a new language feature in SQL and rewrites snapshot queries into SQL queries over SQL period relations. That is, we directly evaluate snapshot queries over data stored natively as period relations.
4 Snapshot K-relations
We first review background on the semiring annotation framework (K-relations). Afterwards, we define snapshot K-relations as our abstract model and snapshot semantics for this model. Importantly, queries over snapshot K-relations are snapshot-reducible by construction. Finally, we state requirements for a logical model to be a representation system for this abstract model.
4.1 K-relations
In a K-relation [20], every tuple is annotated with an element from the domain $K$ of a commutative semiring. A structure $(K, +_K, \cdot_K, 0_K, 1_K)$ over a set $K$ with binary operations $+_K$ and $\cdot_K$ is a commutative semiring iff (i) addition and multiplication are commutative, associative, and have a neutral element ($0_K$ and $1_K$, respectively); (ii) multiplication distributes over addition; and (iii) multiplication with zero returns zero. Abusing notation, we will use $K$ to denote both a semiring structure as well as its domain.
Consider a universal countable domain $\mathcal{U}$ of values. An n-ary K-relation over $\mathcal{U}$ is a (total) function $R$ that maps tuples (elements from $\mathcal{U}^n$) to elements from $K$, with the convention that tuples mapped to $0_K$ are not in the relation. Furthermore, we require that $R(t) \neq 0_K$ only holds for finitely many $t$. Two semirings are of particular interest to us: The semiring $\mathbb{B}$ with elements true and false using $\vee$ as addition and $\wedge$ as multiplication corresponds to set semantics. The semiring $\mathbb{N}$ of natural numbers with standard arithmetic corresponds to multisets.
The operators of the positive relational algebra [36] ($\mathcal{RA}^+$) over K-relations are defined by applying the $+_K$ and $\cdot_K$ operations of the semiring to input annotations. Intuitively, the $+_K$ and $\cdot_K$ operations of the semiring correspond to the alternative and conjunctive use of tuples, respectively. For instance, if an output tuple is produced by joining two input tuples annotated with $k$ and $k'$, then the tuple is annotated with $k \cdot_K k'$. Below we provide the standard definition of $\mathcal{RA}^+$ over K-relations [20]. For a tuple $t$, we use $t.A$ to denote the projection of $t$ on a list of projection expressions $A$ and $t[R]$ to denote the projection of $t$ on the attributes of relation $R$. For a condition $\theta$ and tuple $t$, $\theta(t)$ denotes a function that returns $1_K$ if $t \models \theta$ and $0_K$ otherwise.
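Definition 4.1 ($\mathcal{RA}^+$ over K-relations).
Let $K$ be a semiring, $R$, $S$ denote K-relations, $t$, $u$ denote tuples of appropriate arity, and $k \in K$. $\mathcal{RA}^+$ on K-relations is defined as:
$$\sigma_\theta(R)(t) = R(t) \cdot_K \theta(t) \qquad \Pi_A(R)(t) = \sum_{u:\, u.A = t} R(u)$$
$$(R \bowtie S)(t) = R(t[R]) \cdot_K S(t[S]) \qquad (R \cup S)(t) = R(t) +_K S(t)$$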
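To make the role of the semiring operations concrete, the following Python sketch implements selection, projection, union, and a join over K-relations represented as dictionaries from tuples to annotations. It is a minimal illustration under stated assumptions: the `Semiring` class, the dictionary encoding, the use of a predicate-based join that concatenates tuples (instead of a natural join), and the example data are all choices made here, not part of the original definition.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)            # multisets
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)  # sets


def normalize(K, rel):
    """Tuples annotated with zero are, by convention, not in the relation."""
    return {t: k for t, k in rel.items() if k != K.zero}


def select(K, rel, pred):
    """Multiply each annotation with one if the predicate holds, else with zero."""
    return normalize(K, {t: K.mul(k, K.one if pred(t) else K.zero)
                         for t, k in rel.items()})


def project(K, rel, idxs):
    """Sum the annotations of all input tuples projected onto the same output."""
    out = {}
    for t, k in rel.items():
        key = tuple(t[i] for i in idxs)
        out[key] = K.add(out.get(key, K.zero), k)
    return normalize(K, out)


def union(K, r, s):
    """Add the annotations of a tuple in both inputs."""
    out = dict(r)
    for t, k in s.items():
        out[t] = K.add(out.get(t, K.zero), k)
    return normalize(K, out)


def join(K, r, s, on):
    """Multiply the annotations of joining tuples; output is the concatenation."""
    out = {}
    for t, k in r.items():
        for u, l in s.items():
            if on(t, u):
                out[t + u] = K.mul(k, l)
    return normalize(K, out)


# Hypothetical non-temporal N-relations: tuple -> multiplicity.
works = {("Ann", "SP"): 2, ("Sam", "SP"): 1, ("Joe", "NS"): 1}
assign = {("M1", "SP"): 1, ("M2", "NS"): 3}

joined = join(NAT, assign, works, lambda a, w: a[1] == w[1])
machines = project(NAT, joined, [0])
```

Because all operators are written against `K.add` and `K.mul` only, the same code evaluates the query under set semantics when `BOOL` is passed instead of `NAT`.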
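We will make use of homomorphisms, functions from the domain of a semiring $K_1$ to the domain of a semiring $K_2$ that commute with the semiring operations. Since $\mathcal{RA}^+$ over K-relations is defined in terms of these operations, it follows that semiring homomorphisms commute with queries, as was proven in [20].
Definition 4.2 (Homomorphism).
A mapping $h: K_1 \to K_2$ is called a homomorphism iff for all $k, k' \in K_1$:
$$h(0_{K_1}) = 0_{K_2} \qquad h(1_{K_1}) = 1_{K_2} \qquad h(k +_{K_1} k') = h(k) +_{K_2} h(k') \qquad h(k \cdot_{K_1} k') = h(k) \cdot_{K_2} h(k')$$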
Example 4.1.
Consider $\mathbb{N}$-relations that are non-temporal versions of the relations from our running example. The query returns machines for which there are workers with the right skill to operate the machine. Under multiset semantics we expect M1 to occur in the result with a multiplicity that reflects all of its join partners. Evaluating the query in $\mathbb{N}$ yields the expected result by multiplying the annotations of these join partners. Given the result of the query in $\mathbb{N}$, we can compute the result of the query under set semantics by applying a homomorphism $h: \mathbb{N} \to \mathbb{B}$ which maps all non-zero annotations to true and $0$ to false. For example, the non-zero annotation of the result tuple for M1 is mapped to true, i.e., this tuple is in the result under set semantics.
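The claim that homomorphisms commute with queries can be checked on a small instance. The sketch below is an illustration with made-up relation contents and hypothetical helper names; it evaluates a join followed by a projection once in $\mathbb{N}$ and once in $\mathbb{B}$ and compares the two paths.

```python
def query(works, assign, add, mul, zero):
    """Project machines from assign joined with works on skill, over K-relations."""
    out = {}
    for (mach, skill_a), k in assign.items():
        for (_name, skill_w), l in works.items():
            if skill_a == skill_w:
                out[mach] = add(out.get(mach, zero), mul(k, l))
    return {t: k for t, k in out.items() if k != zero}


def h(n):
    """Homomorphism from N to B: non-zero multiplicities become True."""
    return n != 0


def apply_h(rel):
    """Apply h to every annotation and drop tuples mapped to False."""
    return {t: h(k) for t, k in rel.items() if h(k)}


# Hypothetical non-temporal N-relations (tuple -> multiplicity).
works = {("Ann", "SP"): 2, ("Sam", "SP"): 1, ("Joe", "NS"): 1}
assign = {("M1", "SP"): 1, ("M2", "NS"): 3}

# Path 1: evaluate under multiset semantics, then map the result to sets.
bag_result = query(works, assign, lambda a, b: a + b, lambda a, b: a * b, 0)
via_bags = apply_h(bag_result)

# Path 2: map the inputs to sets, then evaluate under set semantics.
via_sets = query(apply_h(works), apply_h(assign),
                 lambda a, b: a or b, lambda a, b: a and b, False)
```

Both paths produce the same set-semantics result, which is the commutation property stated above for this particular query and input.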
4.2 Snapshot K-relations
We now formally define snapshot K-relations, snapshot semantics over such relations, and then define representation systems. We assume a totally ordered and finite domain $\mathbb{T}$ of time points and use $\leq_{\mathbb{T}}$ to denote its order. $T_{min}$ and $T_{max}$ denote the minimal and maximal (exclusive) time point in $\mathbb{T}$ according to $\leq_{\mathbb{T}}$, respectively. We use $T+1$ to denote the successor of $T \in \mathbb{T}$ according to $\leq_{\mathbb{T}}$.
A snapshot K-relation over a relation schema $\mathbf{R}$ is a function that maps each time point in $\mathbb{T}$ to a K-relation with schema $\mathbf{R}$. Snapshot K-databases are defined analogously.
Definition 4.3 (Snapshot K-relation).
Let $K$ be a commutative semiring and $\mathbf{R}$ a relation schema. A snapshot K-relation is a function that maps each time point in $\mathbb{T}$ to a K-relation with schema $\mathbf{R}$.
For instance, a snapshot $\mathbb{N}$-relation is shown in Figure 2 (bottom). Given a snapshot K-relation $R$, we use the timeslice operator [23] to access its state (snapshot) at a time point $T$:
$$\tau_T(R) = R(T)$$
The evaluation of a query $Q$ over a snapshot database $D$ (set of snapshot K-relations) under snapshot semantics returns a snapshot K-relation $Q(D)$ that is constructed as follows: for each time point $T \in \mathbb{T}$ we have $Q(D)(T) = Q(D(T))$. Thus, snapshot temporal queries over snapshot K-relations behave like queries over K-relations for each snapshot, i.e., their semantics is uniquely determined by the semantics of queries over K-relations.
Definition 4.4 (Snapshot Semantics).
Let $D$ be a snapshot K-database and $Q$ be a query. The result $Q(D)$ of $Q$ over $D$ is a snapshot K-relation that is defined pointwise as follows:
$$\forall T \in \mathbb{T}: \; Q(D)(T) = Q(D(T))$$
For example, consider the snapshot $\mathbb{N}$-relation shown at the bottom of Figure 2 and the evaluation of the query from Example 1.1 under snapshot semantics as also shown in this figure. Observe how the query result is computed by evaluating the query over each snapshot individually using multiset ($\mathbb{N}$) query semantics. Furthermore, since $\tau_T(D) = D(T)$, per the above definition, the timeslice operator commutes with queries: $\tau_T(Q(D)) = Q(\tau_T(D))$. This property is snapshot-reducibility.
4.3 Representation Systems
To compactly encode snapshot K-databases, we study representation systems that consist of a set of representations $\mathcal{E}$, a function Enc which associates an encoding in $\mathcal{E}$ with the snapshot K-database it represents, and a timeslice operator $\tau_T$ which extracts the snapshot at time $T$ from an encoding. If Enc is injective, then we use $\text{Enc}(D)$ to denote the unique encoding associated with $D$. We use $\tau_T$ to denote the timeslice over both snapshot databases and representations. It will be clear from the input which operator $\tau_T$ refers to. For such a representation system, we consider two encodings $E$ and $E'$ from $\mathcal{E}$ to be snapshot-equivalent [21] (written as $E \sim E'$) if they encode the same snapshot K-database. Note that this is the case if they encode the same snapshots, i.e., iff for all $T \in \mathbb{T}$ we have $\tau_T(E) = \tau_T(E')$. For a representation system to behave correctly, the following conditions have to be met: 1) uniqueness: for each snapshot K-database $D$ there exists a unique element from $\mathcal{E}$ representing $D$; 2) snapshot-reducibility: the timeslice operator commutes with queries; and 3) snapshot-preservation: the encoding function Enc preserves the snapshots of the input.
Definition 4.5 (Representation System).
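We call a triple $(\mathcal{E}, \text{Enc}, \tau)$ a representation system for snapshot K-databases with regard to a class of queries $\mathcal{C}$ iff for every snapshot database $D$, encodings $E, E' \in \mathcal{E}$, time point $T$, and query $Q \in \mathcal{C}$ we have

$E \sim E' \Rightarrow E = E'$ (uniqueness)

$\tau_T(Q(E)) = Q(\tau_T(E))$ (snapshot-reducibility)

$\tau_T(\text{Enc}(D)) = \tau_T(D)$ (snapshot-preservation)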
5 Temporal elements
We now introduce temporal elements, which are the annotations we use to define our logical model (representation system). Temporal elements record, using an interval-based encoding, how the K-annotation of a tuple in a snapshot K-relation changes over time. We introduce a unique normal form for temporal elements based on a generalization of coalescing [10].
5.1 Defining Temporal elements
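To define temporal elements, we need to introduce some background on intervals. Given the time domain $\mathbb{T}$ and its associated total order $\leq_{\mathbb{T}}$, an interval $I = [T_b, T_e)$ is a pair of time points from $\mathbb{T}$, where $T_b <_{\mathbb{T}} T_e$. Interval $I$ represents the set of contiguous time points $\{T \mid T \in \mathbb{T} \wedge T_b \leq_{\mathbb{T}} T <_{\mathbb{T}} T_e\}$. For an interval $I = [T_b, T_e)$ we refer to $T_b$ as its begin point and to $T_e$ as its end point. We use $I$, $I'$, $I_1$, and so on to represent intervals. We define a relation $adj(I_1, I_2)$ that contains all interval pairs that are adjacent, i.e., pairs where the end point of one interval is the begin point of the other. We will implicitly understand set operations, such as $T \in I$ or $I_1 \subseteq I_2$, to be interpreted over the set of points represented by an interval. Furthermore, $I_1 \cap I_2$ denotes the interval that covers precisely the intersection of the sets of time points defined by $I_1$ and $I_2$, and $I_1 \cup I_2$ denotes their union (only well-defined if $I_1 \cap I_2 \neq \emptyset$ or $adj(I_1, I_2)$). For convenience, we define iff . We use $\mathbb{I}$ to denote the set of all intervals over $\mathbb{T}$.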
Definition 5.1 (Temporal elements).
Given a semiring $K$, a temporal element is a function $\mathcal{T}: \mathbb{I} \to K$. We use $TE_K$ to denote the set of all such temporal elements for $K$.
We represent temporal elements as sets of input-output pairs. Intervals that are not explicitly mentioned are mapped to $0_K$.
Example 5.1.
Reconsider our running example with $K = \mathbb{N}$. The history of the annotation of tuple (Ann,SP) from the works relation is as shown in Figure 2 (middle). For the sake of the example, assume that the multiplicity of this tuple differs between the two intervals during which it exists. This information is encoded as a temporal element that maps each of the two intervals to the corresponding multiplicity.
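Note that a temporal element may map overlapping intervals to non-zero elements of $K$. We assign the following semantics to overlap: the annotation at a time point $T$ recorded by a temporal element $\mathcal{T}$ is the sum of the annotations assigned to intervals containing $T$. For instance, if a temporal element over $\mathbb{N}$ maps two overlapping intervals to 1 and 2, respectively, then the annotation at every time point in their overlap is 3. To extract the annotation valid at time $T$ from a temporal element $\mathcal{T}$, we define a timeslice operator for temporal elements as follows:
$$\tau_T(\mathcal{T}) = \sum_{I \in \mathbb{I}:\, T \in I} \mathcal{T}(I)$$ (timeslice operator)
Given two temporal elements $\mathcal{T}_1$ and $\mathcal{T}_2$, we would like to know if they represent the same history of annotations. For that, we define snapshot-equivalence ($\sim$) for temporal elements:
$$\mathcal{T}_1 \sim \mathcal{T}_2 \Leftrightarrow \forall T \in \mathbb{T}: \tau_T(\mathcal{T}_1) = \tau_T(\mathcal{T}_2)$$ (snapshot-equivalence)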
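For the multiset semiring, the timeslice operator and snapshot-equivalence can be sketched in a few lines of Python. The representation of a temporal element as a dictionary from half-open intervals `(begin, end)` to multiplicities, the 24-hour domain, and the example elements are assumptions made for illustration.

```python
DOMAIN = range(0, 24)  # assumed finite time domain


def timeslice(te, t):
    """Sum the annotations of all intervals [b, e) that contain time point t."""
    return sum(k for (b, e), k in te.items() if b <= t < e)


def snapshot_equivalent(te1, te2, domain=DOMAIN):
    """Two temporal elements are equivalent iff they agree at every time point."""
    return all(timeslice(te1, t) == timeslice(te2, t) for t in domain)


# Hypothetical temporal elements over N; unmentioned intervals map to 0.
overlapping = {(3, 10): 1, (8, 16): 2}
split = {(3, 8): 1, (8, 10): 3, (10, 16): 2}
different = {(3, 10): 1, (10, 16): 2}
```

Here `overlapping` and `split` are syntactically different but snapshot-equivalent, which is the kind of non-uniqueness that a normal form has to remove.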
5.2 A Normal Form Based on Coalescing
The encoding of the annotation history of a tuple as a temporal element is typically not unique.
Example 5.2.
Reconsider the temporal element from Example 5.1. Recall that intervals not shown are mapped to $0_K$. Many other temporal elements are snapshot-equivalent to it, e.g., elements that split one of its intervals into adjacent subintervals with the same annotation.
To be able to build a representation system based on temporal elements, we need a unique way to encode the annotation history of a tuple as a temporal element (condition 1 of Definition 4.5). That is, we need to define a normal form that is unique for snapshot-equivalent temporal elements. To this end, we generalize coalescing, which was defined for temporal databases with set semantics in [37, 10]. The generalized form, which we call K-coalescing, coincides with standard coalescing for semiring $\mathbb{B}$ (set semantics) and, for any semiring $K$, yields a unique encoding.
K-coalescing creates maximal intervals of contiguous time points with the same annotation. The output is a temporal element such that (a) no two intervals mapped to a non-zero element overlap and (b) adjacent intervals assigned to non-zero elements are guaranteed to be mapped to different annotations. To determine such intervals, we define annotation changepoints, time points $T$ where the annotation of a temporal element $\mathcal{T}$ differs from the annotation at $T-1$, i.e., $\tau_{T-1}(\mathcal{T}) \neq \tau_T(\mathcal{T})$. It will be convenient to also consider $T_{min}$ as an annotation changepoint.
Definition 5.2 (Annotation Changepoint).
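Given a temporal element $\mathcal{T}$, a time point $T$ is called a changepoint in $\mathcal{T}$ if one of the following conditions holds:

$T = T_{min}$ (smallest time point)

$\tau_{T-1}(\mathcal{T}) \neq \tau_T(\mathcal{T})$ (change of annotation)
We use $CP(\mathcal{T})$ to denote the set of all annotation changepoints for $\mathcal{T}$. Furthermore, we define $CPI(\mathcal{T})$ to be the set of all intervals that consist of consecutive changepoints:
$$CPI(\mathcal{T}) = \{[T_b, T_e) \mid T_b <_{\mathbb{T}} T_e \wedge T_b \in CP(\mathcal{T}) \wedge (T_e \in CP(\mathcal{T}) \vee T_e = T_{max}) \wedge \nexists T' \in CP(\mathcal{T}): T_b <_{\mathbb{T}} T' <_{\mathbb{T}} T_e\}$$
In Definition 5.2, $CPI(\mathcal{T})$ computes maximal intervals such that the annotation assigned by $\mathcal{T}$ to each point in such an interval is constant. In the coalesced representation of $\mathcal{T}$ only such intervals are mapped to non-zero annotations.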
Definition 5.3 (Coalesce).
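Let $\mathcal{T} \in TE_K$ be a temporal element. We define K-coalescing as a function $C_K: TE_K \to TE_K$ where for an interval $I = [T_b, T_e)$:
$$C_K(\mathcal{T})(I) = \begin{cases} \tau_{T_b}(\mathcal{T}) & \text{if } I \in CPI(\mathcal{T}) \\ 0_K & \text{otherwise} \end{cases}$$
We use $TEC_K$ to denote all normalized temporal elements, i.e., elements $\mathcal{T}$ for which $C_K(\mathcal{T}') = \mathcal{T}$ for some $\mathcal{T}'$.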
[Figure 3: SQL period relation with attributes sal and period]
Example 5.3.
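Consider the SQL period relation shown in Figure 3. The history of each distinct tuple in this relation is encoded by a temporal element. One of these temporal elements is not coalesced since its two non-zero intervals overlap. Applying $\mathbb{N}$-coalesce to this element yields a temporal element with two non-overlapping intervals.
That is, this tuple occurs twice within the time interval in which the two periods overlap and once during the remaining time points that are covered by only one of the periods, i.e., it has three annotation changepoints. Interpreting the same relation under set semantics (semiring $\mathbb{B}$), the history of the tuple can be encoded as a temporal element that maps both periods to true. Applying $\mathbb{B}$-coalesce yields a single interval.
That is, this tuple occurs (is annotated with true) within one maximal time interval and its annotation changepoints are the begin and end points of this interval.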
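Coalescing for the multiset case can be sketched directly from the definitions of changepoints and changepoint intervals. The Python code below is a minimal illustration, assuming temporal elements over $\mathbb{N}$ stored as dictionaries from half-open intervals to multiplicities and a 24-hour time domain; the function names and the example element are hypothetical. It scans the domain point by point, which is simple but not how an efficient implementation would proceed.

```python
T_MIN, T_MAX = 0, 24  # assumed time domain; T_MAX is exclusive


def timeslice(te, t):
    """Annotation at time t: sum over all intervals [b, e) containing t."""
    return sum(k for (b, e), k in te.items() if b <= t < e)


def changepoints(te):
    """T_MIN plus every time point whose annotation differs from its predecessor."""
    return [t for t in range(T_MIN, T_MAX)
            if t == T_MIN or timeslice(te, t - 1) != timeslice(te, t)]


def coalesce(te):
    """Map each interval between consecutive changepoints to its annotation."""
    cps = changepoints(te)
    out = {}
    for b, e in zip(cps, cps[1:] + [T_MAX]):
        k = timeslice(te, b)
        if k != 0:  # intervals annotated with zero are not stored
            out[(b, e)] = k
    return out


# Hypothetical element with two overlapping intervals, each of multiplicity 1.
te = {(3, 10): 1, (3, 13): 1}
normal_form = coalesce(te)
```

For this input the normal form assigns multiplicity 2 to the overlap and multiplicity 1 to the remainder, and applying `coalesce` again leaves the result unchanged.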
We now prove several important properties of the K-coalesce operator, establishing that $TEC_K$ (coalesced temporal elements) is a good choice for a normal form of temporal elements.
Lemma 5.1.
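Let $K$ be a semiring and $\mathcal{T}$, $\mathcal{T}_1$, and $\mathcal{T}_2$ temporal elements. We have:
$C_K(C_K(\mathcal{T})) = C_K(\mathcal{T})$ (idempotence)
$\mathcal{T}_1 \sim \mathcal{T}_2 \Leftrightarrow C_K(\mathcal{T}_1) = C_K(\mathcal{T}_2)$ (uniqueness)
$\mathcal{T} \sim C_K(\mathcal{T})$ (equivalence preservation)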
Proof.
All proofs are shown in Appendix A. ∎
6 Period Semirings
Having established a unique normal form of temporal elements, we now proceed to define period semirings as our logical model. The elements of a period semiring are temporal elements in normal form. We prove that these structures are semirings and ultimately that relations annotated with period semirings form a representation system for snapshot K-relations for $\mathcal{RA}^+$. In Section 7, we then prove them to also be a representation system for the full algebra, i.e., queries involving difference and aggregation.
When defining the addition and multiplication operations and their neutral elements in the semiring structure of temporal elements, we have to ensure that these definitions are compatible with semiring $K$ on snapshots. Furthermore, we need to ensure that the output of these operations is guaranteed to be K-coalesced. The latter can be ensured by applying K-coalesce to the output of the operation. For addition, snapshot-reducibility is achieved by pointwise addition (denoted as $+_{K_P}$) of the two functions that constitute the two input temporal elements. That is, for each interval $I$, the function that is the result of the addition of temporal elements $\mathcal{T}_1$ and $\mathcal{T}_2$ assigns to $I$ the value $\mathcal{T}_1(I) +_K \mathcal{T}_2(I)$. For multiplication, the multiplication of two elements assigned to an overlapping pair of intervals $I_1$ and $I_2$ is valid during the intersection $I_1 \cap I_2$ of $I_1$ and $I_2$. Since both input temporal elements may assign non-zero values to multiple intervals that have the same overlap, the resulting K-value at a point would be the sum over all pairs of overlapping intervals. We denote this operation as $\cdot_{K_P}$. Since $+_{K_P}$ and $\cdot_{K_P}$ may return a temporal element that is not coalesced, we define the operations of our structures to apply $C_K$ to the result of $+_{K_P}$ and $\cdot_{K_P}$. The zero element of the temporal extension of $K$ is the temporal element that maps all intervals to $0_K$, and the one element is the temporal element that maps every interval to $0_K$ except for $[T_{min}, T_{max})$, which is mapped to $1_K$.
Definition 6.1 (Period Semiring).
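For a time domain $\mathbb{T}$ with minimum $T_{min}$ and maximum $T_{max}$ and a semiring $\mathcal{K}$, the period semiring $\mathcal{K}_{\mathcal{T}}$ is defined as the structure over the coalesced $\mathcal{K}$-temporal elements with operations $+_{\mathcal{K}_{\mathcal{T}}}$ and $\cdot_{\mathcal{K}_{\mathcal{T}}}$ and neutral elements $0_{\mathcal{K}_{\mathcal{T}}}$ and $1_{\mathcal{K}_{\mathcal{T}}}$,
where for $k, k' \in \mathcal{K}_{\mathcal{T}}$, every interval $I$, and $\mathcal{C}_{\mathcal{K}}$ denoting coalescing:
$$0_{\mathcal{K}_{\mathcal{T}}}(I) = 0_{\mathcal{K}} \qquad 1_{\mathcal{K}_{\mathcal{T}}}(I) = \begin{cases} 1_{\mathcal{K}} & \text{if } I = [T_{min}, T_{max}) \\ 0_{\mathcal{K}} & \text{otherwise} \end{cases}$$
$$k +_{\mathcal{K}_{\mathcal{T}}} k' = \mathcal{C}_{\mathcal{K}}(k +_{\mathcal{K}_P} k') \qquad (k +_{\mathcal{K}_P} k')(I) = k(I) +_{\mathcal{K}} k'(I)$$
$$k \cdot_{\mathcal{K}_{\mathcal{T}}} k' = \mathcal{C}_{\mathcal{K}}(k \cdot_{\mathcal{K}_P} k') \qquad (k \cdot_{\mathcal{K}_P} k')(I) = \sum_{I', I'' :\, I = I' \cap I''} k(I') \cdot_{\mathcal{K}} k'(I'')$$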
Example 6.1.
Consider the relation works shown in Figure 2 (middle) and query . Recall that the annotation of a tuple in the result of a projection over a relation is the sum of all input tuples which are projected onto . For result tuple (SP) we have input tuples (Ann,SP) and (Sam,SP) with and , respectively. The tuple (SP) is annotated with the sum of these annotations, i.e., . Substituting definitions we get:
Thus, as expected, the result records that, e.g., there are two skilled workers (SP) on duty during time interval .
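The addition used in Example 6.1 can be mimicked with a minimal Python sketch for the semiring of natural numbers. It assumes an integer time domain from 0 to 24, represents a temporal element as a dictionary from half-open intervals to multiplicities, and assumes that the normal form keeps maximal intervals of constant non-zero annotation. The function names and the two input intervals are illustrative assumptions, not values taken from Figure 2.

```python
from collections import defaultdict

T_MIN, T_MAX = 0, 24  # assumed time domain: the hours of one day


def timeslice(elem, t):
    """Annotation at time point t: sum over all intervals [b, e) containing t."""
    return sum(v for (b, e), v in elem.items() if b <= t < e)


def coalesce(elem):
    """Assumed normal form for N: maximal intervals of constant non-zero annotation."""
    out, start, cur = {}, T_MIN, timeslice(elem, T_MIN)
    for t in range(T_MIN + 1, T_MAX + 1):
        nxt = timeslice(elem, t) if t < T_MAX else None  # None closes the last run
        if nxt != cur:
            if cur != 0:
                out[(start, t)] = cur
            start, cur = t, nxt
    return out


def add(a, b):
    """Pointwise addition of the interval-to-value functions, then coalesce."""
    raw = defaultdict(int)
    for elem in (a, b):
        for interval, value in elem.items():
            raw[interval] += value
    return coalesce(raw)


# Hypothetical annotations of two input tuples projected onto the same result tuple.
ann = {(3, 10): 1}
sam = {(8, 16): 1}
total = add(ann, sam)
print(total)  # {(3, 8): 1, (8, 10): 2, (10, 16): 1}
```

With these assumed inputs, the overlap of the two intervals is the only period annotated with 2, which matches the kind of result the example describes.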
Having defined the family of period semirings, it remains to be shown that $\mathcal{K}_{\mathcal{T}}$ with standard $\mathcal{K}$-relational query semantics is a representation system for snapshot $\mathcal{K}$-relations.
6.1 $\mathcal{K}_{\mathcal{T}}$ is a Semiring
As a first step, we prove that for any semiring $\mathcal{K}$, the structure $\mathcal{K}_{\mathcal{T}}$ is also a semiring. The following lemma shows that coalesce can be redundantly pushed into the $+_{\mathcal{K}_P}$ and $\cdot_{\mathcal{K}_P}$ operations.
Lemma 6.1.
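Let $\mathcal{K}$ be a semiring and $k$, $k'$ be $\mathcal{K}$-temporal elements. Then,
$$\mathcal{C}_{\mathcal{K}}(k +_{\mathcal{K}_P} k') = \mathcal{C}_{\mathcal{K}}(\mathcal{C}_{\mathcal{K}}(k) +_{\mathcal{K}_P} k') \qquad \mathcal{C}_{\mathcal{K}}(k \cdot_{\mathcal{K}_P} k') = \mathcal{C}_{\mathcal{K}}(\mathcal{C}_{\mathcal{K}}(k) \cdot_{\mathcal{K}_P} k')$$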
Using this lemma, we now prove that for any semiring $\mathcal{K}$, the structure $\mathcal{K}_{\mathcal{T}}$ is also a semiring.
Theorem 6.2.
For any semiring $\mathcal{K}$, structure $\mathcal{K}_{\mathcal{T}}$ is a semiring.
6.2 Timeslice Operator
We define a timeslice operator for $\mathcal{K}_{\mathcal{T}}$-relations based on the timeslice operator for temporal elements. We annotate each tuple in the output of this operator with the result of $\tau_T$ applied to the temporal element the tuple is annotated with.
Definition 6.2 (Timeslice for relations).
Let $R$ be a $\mathcal{K}_{\mathcal{T}}$-relation and $T \in \mathbb{T}$. The timeslice operator $\tau_T(R)$ is defined as:
$$\tau_T(R)(t) = \tau_T(R(t))$$
We now prove that $\tau_T$ is a homomorphism $\mathcal{K}_{\mathcal{T}} \to \mathcal{K}$. Since semiring homomorphisms commute with queries [20], $\mathcal{K}_{\mathcal{T}}$ equipped with this timeslice operator fulfills the snapshot-reducibility condition of representation systems (Definition 4.5).
Theorem 6.3.
For any $T \in \mathbb{T}$, the timeslice operator $\tau_T$ is a semiring homomorphism from $\mathcal{K}_{\mathcal{T}}$ to $\mathcal{K}$.
As an example of the application of this homomorphism, consider the period relation works from our running example as shown on the left of Figure 2. Applying the timeslice operator for 8am to this relation yields the snapshot shown on the bottom of this figure (three employees work between 8am and 9am, two of whom are specialized). If we evaluate the query over this snapshot, we get the snapshot shown on the right of this figure (the count is 2). By Theorem 6.3, we get the same result if we evaluate the query over the input period relation and then apply the same timeslice operator to the result.
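The commutation property behind Theorem 6.3 can be checked on small inputs with a minimal Python sketch for natural-number annotations. The representation (a dictionary from half-open integer intervals to multiplicities), the time domain 0 to 24, the coalescing rule, and all function names are assumptions made for illustration; the check is a sanity test on sample values, not a proof.

```python
from collections import defaultdict

T_MIN, T_MAX = 0, 24  # assumed time domain


def timeslice(elem, t):
    """Annotation at time point t: sum over all intervals [b, e) containing t."""
    return sum(v for (b, e), v in elem.items() if b <= t < e)


def coalesce(elem):
    """Assumed normal form for N: maximal intervals of constant non-zero annotation."""
    out, start, cur = {}, T_MIN, timeslice(elem, T_MIN)
    for t in range(T_MIN + 1, T_MAX + 1):
        nxt = timeslice(elem, t) if t < T_MAX else None
        if nxt != cur:
            if cur != 0:
                out[(start, t)] = cur
            start, cur = t, nxt
    return out


def add(a, b):
    """Pointwise addition followed by coalescing."""
    raw = defaultdict(int)
    for elem in (a, b):
        for interval, value in elem.items():
            raw[interval] += value
    return coalesce(raw)


def mul(a, b):
    """Products of values on overlapping intervals, valid on the intersection."""
    raw = defaultdict(int)
    for (b1, e1), v1 in a.items():
        for (b2, e2), v2 in b.items():
            lo, hi = max(b1, b2), min(e1, e2)
            if lo < hi:
                raw[(lo, hi)] += v1 * v2
    return coalesce(raw)


def commutes_with_timeslice(a, b):
    """True if slicing after add/mul equals add/mul in N after slicing, for all t."""
    return all(
        timeslice(add(a, b), t) == timeslice(a, t) + timeslice(b, t)
        and timeslice(mul(a, b), t) == timeslice(a, t) * timeslice(b, t)
        for t in range(T_MIN, T_MAX)
    )


# Hypothetical temporal elements.
a = {(3, 10): 1, (18, 20): 1}
b = {(8, 16): 2}
print(commutes_with_timeslice(a, b))  # True
```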
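6.3 Encoding of Snapshot $\mathcal{K}$-relations
We now define a bijective mapping $\mathrm{Enc}_{\mathcal{K}}$ from snapshot $\mathcal{K}$-relations to $\mathcal{K}_{\mathcal{T}}$-relations. We then prove that the set of $\mathcal{K}_{\mathcal{T}}$-relations together with the timeslice operator for such relations and the mapping $\mathrm{Enc}_{\mathcal{K}}^{-1}$ (the inverse of $\mathrm{Enc}_{\mathcal{K}}$) form a representation system for snapshot $\mathcal{K}$-relations. Intuitively, $\mathrm{Enc}_{\mathcal{K}}(R)$ is constructed by assigning each tuple $t$ a temporal element where the annotation of the tuple at time $T$ (i.e., $R(T)(t)$) is assigned to a singleton interval $[T, T+1)$. This temporal element is then coalesced to create a $\mathcal{K}_{\mathcal{T}}$ element.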
Definition 6.3.
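Let $\mathcal{K}$ be a semiring and $R$ a snapshot $\mathcal{K}$-relation. $\mathrm{Enc}_{\mathcal{K}}$ is a mapping from snapshot $\mathcal{K}$-relations to $\mathcal{K}_{\mathcal{T}}$-relations defined as follows, where $TE_{R,t}$ maps every non-singleton interval to $0_{\mathcal{K}}$:
$$\forall t: \mathrm{Enc}_{\mathcal{K}}(R)(t) = \mathcal{C}_{\mathcal{K}}(TE_{R,t}) \qquad \forall T \in \mathbb{T}: TE_{R,t}([T, T+1)) = R(T)(t)$$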
We first prove that this mapping is bijective, i.e., it is invertible, which guarantees that $\mathrm{Enc}_{\mathcal{K}}^{-1}$ is well-defined and also implies uniqueness (condition 1 of Definition 4.5).
Lemma 6.4.
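For any semiring $\mathcal{K}$, $\mathrm{Enc}_{\mathcal{K}}$ is bijective.
Next, we have to show that $\mathrm{Enc}_{\mathcal{K}}$ preserves snapshots, i.e., the instance at a time point $T$ represented by $R$ can be extracted from $\mathrm{Enc}_{\mathcal{K}}(R)$ using the timeslice operator.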
Lemma 6.5.
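For any semiring $\mathcal{K}$, snapshot $\mathcal{K}$-relation $R$, and time point $T \in \mathbb{T}$, we have $\tau_T(\mathrm{Enc}_{\mathcal{K}}(R)) = R(T)$.
Based on these properties of $\mathrm{Enc}_{\mathcal{K}}$ and the fact that the timeslice operator over $\mathcal{K}_{\mathcal{T}}$-relations is a homomorphism $\mathcal{K}_{\mathcal{T}} \to \mathcal{K}$, our main technical result follows immediately. That is, the set of $\mathcal{K}_{\mathcal{T}}$-relations equipped with the timeslice operator and $\mathrm{Enc}_{\mathcal{K}}^{-1}$ is a representation system for positive relational algebra queries ($\mathcal{RA}^{+}$) over snapshot $\mathcal{K}$-relations.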
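The encoding and the snapshot-preservation property of Lemma 6.5 can be illustrated with a minimal Python sketch for natural-number annotations. A snapshot relation is modelled as a dictionary from time points to bags (tuple to multiplicity); this representation, the time domain 0 to 24, the coalescing rule, the function names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

T_MIN, T_MAX = 0, 24  # assumed time domain


def timeslice(elem, t):
    """Annotation at time point t: sum over all intervals [b, e) containing t."""
    return sum(v for (b, e), v in elem.items() if b <= t < e)


def coalesce(elem):
    """Assumed normal form for N: maximal intervals of constant non-zero annotation."""
    out, start, cur = {}, T_MIN, timeslice(elem, T_MIN)
    for t in range(T_MIN + 1, T_MAX + 1):
        nxt = timeslice(elem, t) if t < T_MAX else None
        if nxt != cur:
            if cur != 0:
                out[(start, t)] = cur
            start, cur = t, nxt
    return out


def encode(snapshots):
    """Attach each tuple's multiplicity at time t to [t, t+1), then coalesce."""
    singles = defaultdict(dict)
    for t, bag in snapshots.items():
        for tup, n in bag.items():
            if n != 0:
                singles[tup][(t, t + 1)] = n
    return {tup: coalesce(elem) for tup, elem in singles.items()}


def timeslice_relation(rel, t):
    """Snapshot at time t: slice every annotation, keep non-zero tuples."""
    sliced = {tup: timeslice(elem, t) for tup, elem in rel.items()}
    return {tup: n for tup, n in sliced.items() if n != 0}


# Hypothetical snapshot relation over the hours of one day.
snapshots = {t: {} for t in range(T_MIN, T_MAX)}
for t in range(3, 10):
    snapshots[t][("Ann", "SP")] = 1
for t in range(8, 16):
    snapshots[t][("Sam", "SP")] = 1

encoded = encode(snapshots)
print(encoded)  # {('Ann', 'SP'): {(3, 10): 1}, ('Sam', 'SP'): {(8, 16): 1}}
```

Slicing the encoded relation at any hour returns the bag the sketch started from, which is the property the lemma states.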
Theorem 6.6 (Representation System).
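Given a semiring $\mathcal{K}$, let $\mathcal{R}_{\mathcal{K}_{\mathcal{T}}}$ be the set of all $\mathcal{K}_{\mathcal{T}}$-relations. The triple $(\mathcal{R}_{\mathcal{K}_{\mathcal{T}}}, \mathrm{Enc}_{\mathcal{K}}^{-1}, \tau)$ is a representation system for $\mathcal{RA}^{+}$ queries over snapshot $\mathcal{K}$-relations.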
7 Complex Queries
Having proven that $\mathcal{K}_{\mathcal{T}}$-relations form a representation system for $\mathcal{RA}^{+}$, we now study extensions for difference and aggregation.
7.1 Difference
Extensions of $\mathcal{K}$-relations for difference have been studied in [19, 3]. For instance, the difference operator on $\mathbb{N}$-relations corresponds to bag difference (SQL's EXCEPT ALL). Geerts et al. [19] extend semirings with a monus operation that is defined based on the natural order of a semiring, and demonstrate how to define a difference operation for $\mathcal{K}$-relations based on the monus operation for semirings where this operation is well-defined. Following the terminology introduced in this work, we refer to semirings with a monus operation as m-semirings. We now prove that if a semiring $\mathcal{K}$ has a well-defined monus, then so does $\mathcal{K}_{\mathcal{T}}$. From this it follows that, for any such $\mathcal{K}$, the difference operation is well-defined for $\mathcal{K}_{\mathcal{T}}$. We proceed to show that the timeslice operator is an m-semiring homomorphism, which implies that $\mathcal{K}_{\mathcal{T}}$-relations for any m-semiring $\mathcal{K}$ form a representation system for $\mathcal{RA}$ (full relational algebra).
The definition of a monus operator is based on the so-called natural order $\preceq_{\mathcal{K}}$. For two elements $k$ and $k'$ of a semiring $\mathcal{K}$, $k \preceq_{\mathcal{K}} k' \Leftrightarrow \exists k'': k +_{\mathcal{K}} k'' = k'$. If $\preceq_{\mathcal{K}}$ is a partial order, then $\mathcal{K}$ is called naturally ordered. For instance, $\mathbb{N}$ is naturally ordered ($\preceq_{\mathbb{N}}$ corresponds to the order of natural numbers) while $\mathbb{Z}$ is not (for any $k, k' \in \mathbb{Z}$ we have $k \preceq_{\mathbb{Z}} k'$). For the monus to be well-defined on $\mathcal{K}$, $\mathcal{K}$ has to be naturally ordered and, for any $k, k' \in \mathcal{K}$, the set $\{k'' \mid k \preceq_{\mathcal{K}} k' +_{\mathcal{K}} k''\}$ has to have a smallest member. For any semiring fulfilling these two conditions, the monus operation $-_{\mathcal{K}}$ is defined as $k -_{\mathcal{K}} k' = k''$ where $k''$ is the smallest element such that $k \preceq_{\mathcal{K}} k' +_{\mathcal{K}} k''$. For instance, the monus for $\mathbb{N}$ is the truncating minus: $k -_{\mathbb{N}} k' = \max(0, k - k')$.
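The definition of the monus via the natural order can be made concrete for the natural numbers with a minimal Python sketch. It searches for the smallest element that satisfies the defining condition instead of using the closed form, so that the truncating minus appears as a consequence; the function names are illustrative assumptions.

```python
from itertools import count


def natural_leq(a, b):
    """Natural order on N: a precedes b iff some c in N satisfies a + c == b."""
    return any(a + c == b for c in range(b + 1))


def monus(a, b):
    """Smallest c in N such that a precedes b + c in the natural order."""
    return next(c for c in count() if natural_leq(a, b + c))


print(monus(5, 3))  # 2
print(monus(3, 5))  # 0
```

For integers, the same search would be meaningless, because every integer precedes every other one in the natural order and no smallest candidate exists.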
Theorem 7.1.
For any m-semiring $\mathcal{K}$, semiring $\mathcal{K}_{\mathcal{T}}$ has a well-defined monus, i.e., is an m-semiring.
Let $-_{\mathcal{K}_P}$ denote an operation that returns a temporal element which assigns to each singleton interval $[T, T+1)$ the result of the monus for $\mathcal{K}$: $\tau_T(k) -_{\mathcal{K}} \tau_T(k')$ (as defined in the proof of Theorem 7.1, see Appendix A). In the proof of Theorem 7.1, we demonstrate that $k -_{\mathcal{K}_{\mathcal{T}}} k' = \mathcal{C}_{\mathcal{K}}(k -_{\mathcal{K}_P} k')$. Obviously, computing the monus using singleton intervals is inefficient. In our implementation, we use a more efficient way to compute the monus for $\mathcal{K}_{\mathcal{T}}$ that is based on normalizing the input temporal elements $k$ and $k'$ such that annotations are attached to larger time intervals where the result of the monus is guaranteed to be constant. Importantly, $\tau_T$ is a homomorphism for the monus-semiring $\mathcal{K}_{\mathcal{T}}$.
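The singleton-interval construction of the monus can be sketched in Python for natural-number annotations. This is the naive variant described above, not the more efficient normalization-based implementation; the representation of temporal elements, the time domain 0 to 24, the coalescing rule, the function names, and the sample inputs are assumptions made for illustration.

```python
T_MIN, T_MAX = 0, 24  # assumed time domain


def timeslice(elem, t):
    """Annotation at time point t: sum over all intervals [b, e) containing t."""
    return sum(v for (b, e), v in elem.items() if b <= t < e)


def coalesce(elem):
    """Assumed normal form for N: maximal intervals of constant non-zero annotation."""
    out, start, cur = {}, T_MIN, timeslice(elem, T_MIN)
    for t in range(T_MIN + 1, T_MAX + 1):
        nxt = timeslice(elem, t) if t < T_MAX else None
        if nxt != cur:
            if cur != 0:
                out[(start, t)] = cur
            start, cur = t, nxt
    return out


def monus_nat(a, b):
    """Truncating minus on natural numbers."""
    return max(0, a - b)


def monus(a, b):
    """Apply the monus of N at every singleton interval [t, t+1), then coalesce."""
    singles = {
        (t, t + 1): monus_nat(timeslice(a, t), timeslice(b, t))
        for t in range(T_MIN, T_MAX)
    }
    return coalesce(singles)


# Hypothetical temporal elements.
a = {(3, 10): 2}
b = {(8, 16): 1}
diff = monus(a, b)
print(diff)  # {(3, 8): 2, (8, 10): 1}
```

Slicing `diff` at any hour gives the truncating minus of the sliced inputs, which is the behaviour the homomorphism property requires.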
Theorem 7.2.
Mapping $\tau_T$ is an m-semiring homomorphism.
For example, consider from Example 1.2 which can be expressed in relational algebra as . The relation corresponding to the period relation assign shown in this example annotates each tuple with a singleton temporal element mapping the period of this tuple to , e.g., (M1, SP) is annotated with . The annotation of result tuple (SP) is computed as