M odel S upport Integrity Constraints and More Henrietta Dombrovskaya Braviant Holdings Chicago Boris Novikov Saint Petersburg University Russia Chad Slaughter Scale Genius Inc Chicago ID: 718663
Download Presentation The PPT/PDF document "Enhancements to the B itemporal" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
Enhancements to the Bitemporal Model Support: Integrity Constraints and More
Henrietta Dombrovskaya,
Braviant
Holdings, Chicago
Boris
Novikov
, Saint Petersburg University, Russia
Chad Slaughter, Scale Genius, Inc. ChicagoSlide2
Bitemporal Data: Making It Happen In Postgres
Incorporating time, thereby making the data models temporal, is a hot topic. The practical usage include:
Snapshot queries (This report looked differently last week! I have a print out! Can you run it as it looked at the end of November?)
Time-interval queries (Why this query takes twice longer now than last month? What exactly changed in the data?)Correlated time queries (How many people where fired since X became a CTO? How many people left since the new head of Analytics started?)Many systems are already providing this kind of support (DB2, Oracle) and it is included into the SQL ANSI 2011 standard.However, Postgres never formally included the time dimensions, and we are going to fix this!
2
Sep-17
Bitemporal Data Model - PG Open 2017Slide3
ReferencesC.J. Date, Hugh
Darwen
, and Nikos
Lorentzos. 2014. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL (2nd ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.Tom Johnston. 2014. Bitemporal Data: Theory and Practice (1st ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.Tom Johnston and Randall Weis. 2010. Managing Time in Relational Databases: How to Design, Update and Query Temporal Data. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.Krishna Kulkarni and Jan-Eike Michels. 2012. Temporal features in SQL:2011. SIGMOD Rec. 41, 3 (October 2012), 34-43.
Richard Thomas Snodgrass. 1999. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
ISO/IEC 9075-2:2011, Information technology — Database languages — SQL , 2011Martin Kaufmann, Peter M. Fischer, Norman May, Donald Kossmann
: Benchmarking
Bitemporal
Database Systems: Ready for the Future or Stuck in the Past? EDBT 2014: 738-749
3
Sep-17
Bitemporal Data Model - PG Open 2017Slide4
Previous Work OverviewWhat we’ve presented a year ago
4
Sep-17
Bitemporal Data Model - PG Open 2017Slide5
SummaryThe surveys (
Kosman
et al) indicate, that is should be easy to support time in Postgres due to the existence of the GIST indexes. And we’ve utilized this advantage.
Last year we presented the Postgres-based implementation of asserted versioning framework (Johnston and Weis). This model supports effective time and asserted time and differs from the ANSI 2011 standard.5Sep-17
Bitemporal Data Model - PG Open 2017Slide6
Where Our Work Can Be Found
pg_bitemporal
git repohttps://github.com/scalegenius/pg_bitemporalIncludes:Domain definition
Allen relations implementation (including the ones already implemented in Postgres)O
perations:create bitemporal
table
b
itemporal insert
bitemporal updatebitemporal
correctioninactivate
b
itemporal
delete
S
upport of
bitemporal
constraints
6
Sep-17
Bitemporal Data Model - PG Open 2017Slide7
Overview Of Bitemporal Operations
The next several slides present the basic
bitemporal
data manipulation operations and explain the differences between bitemporal and conventional operations.Terminology: effective timeasserted time
the combination of effective and asserted times: time regions
7
Sep-17
Bitemporal Data Model - PG Open 2017Slide8
Bitemporal Insert
#
Effective Interval
Assertive IntervalCustomer No.Name
Type1
[2015-06-01, oo
)
[ 2015-05-01 ,
oo
)
C100
John Doe
Silver
Asserted
1
Effective
now
= 2015-05-01
select
ll_bitemporal_insert
(
'customers’
,
,
$$’
customer_no
’, name
',
’type' $$,
,
$$'C100','John Doe', 'Silver' $$,
timeperiod
('2015-06-01','infinity'),
timeperiod
('2015-05-01','infinity')
8
Sep-17
Bitemporal Data Model - PG Open 2017Slide9
Bitemporal Update
#
Effective Interval
Assertive IntervalCustomer No.Name
Type1
[2015-06-01, oo)
[2015-05-01,2015-09-15)
C100
John Doe
Silver
2
[2015-06-01,2015-09-15)
[2015-09-15,
oo
)
C100
John Doe
Silver
3
[2015-09-15, oo)
[ 2015-09-15,
oo
)
C100
John Doe
Gold
Asserted
1
3
1
2
Effective
now =
2015-09-15
select
ll_bitemporal_update
($$customers$$,
$$
customer_no
$$, $$100$$,
$$type$$, $$Gold$$,
t
imeperiod
(‘2015-09-15’, ‘infinity’),
timeperiod
(‘2015-09-15’,
‘infinity’))
9
Sep-17
Bitemporal Data Model - PG Open 2017Slide10
Bitemporal Correction
#
Effective Interval
Assertive IntervalCustomer No.Name
Type1
[2015-06-01, oo)
[2015-05-01,2015-09-15)
C100
John Doe
Silver
2
[2015-06-01,2015-09-15)
[2015-09-15,
oo
)
C100
John Doe
Silver
3
[2015-09-15, oo)
[2015-09-15, 2015-09-22)
C100
John Doe
Gold
4
[2015-09-15,
oo
)
[ 2015-09-22,
oo
)
C100
John Doe
Platinum
Asserted
3
1
2
4
3
1
2
Effective
now = 2015-09-15
select
ll_bitemporal_correction
($$customers$$,
$$type
$$,
$$
Platinum$$,
$$
customer_no
$$,
$$
C100$$,
timeperiod
('2015-09-15'
,
'infinity
')
,
n
ow())
10
Sep-17
Bitemporal Data Model - PG Open 2017Slide11
Bitemporal Inactivate
#
Effective Interval
Assertive IntervalCustomer No.Name
Type1
[2015-06-01, oo)
[2015-05-01,2015-09-15)
C100
John Doe
Silver
2
[2015-06-01,2015-09-15)
[2015-09-15,
oo
)
C100
John Doe
Silver
3
[2015-09-15, oo)
[2015-09-15, 2015-09-22)
C100
John Doe
Gold
4
[2015-09-15,
oo
)
[2015-09-22, 2015-11-05)
C100
John Doe
Platinum
5
[2015-09-15,2015-12-31 )
[ 2015-11-05,
oo
)
C100
John Doe
Platinum
Asserted
4
3
1
2
5
4
3
1
2
Effective
now =
2015-11-05
select
ll_bitemporal_inactivate
(
$$customers$$,
$$
customer_no
$$,
$$C100$$,
timeperiod
('2015-12-31','infinity
'),
timeperiod
('2015-11-05','infinity'),
11
Sep-17
Bitemporal Data Model - PG Open 2017Slide12
Bitemporal Delete
#
Effective Interval
Assertive IntervalCustomer No.Name
Type1
[2015-06-01, oo)
[2015-05-01,2015-09-15)
C100
John Doe
Silver
2
[2015-06-01,2015-09-15)
[2015-09-15,
oo
)
C100
John Doe
Silver
3
[2015-09-15, oo)
[2015-09-15, 2015-09-22)
C100
John Doe
Gold
4
[2015-09-15,
oo
)
[2015-09-22, 2015-11-05)
C100
John Doe
Platinum
5
[2015-09-15,2015-12-31 )
[2015-11-05, 2015-11-17 )
C100
John Doe
Platinum
Asserted
4
5
3
1
2
5
4
3
1
2
Effective
now =
2015-11-17
select
ll_bitemporal_delete
(
'customers',
$$
customer_no
$$,
$$ 'C100' $$,
timeperiod
('2015-11-17','infinity
'))
12
Sep-17
Bitemporal Data Model - PG Open 2017Slide13
Bitemporal ConstraintsWe need to support the following constraint types:
Primary key
–
supported by GIST with exclusionUnique - supported by GIST with exclusionCheck – no difference from regular tablesIS/IS NOT NULL – no difference from regular tablesForeign key – most difficult to supportWe use
metacode to record the presence of the bitemporal constraints:
PK: select
bitemporal_internal.pk_constraint
('
postgres_cluster_id');
UQ: select bitemporal_internal.unique_constraint
('port');
13
Sep-17
Bitemporal Data Model - PG Open 2017Slide14
Defining Foreign Key ConstraintThe difficulty of verifying the bi-temporal FK is that the PK/UQ in the parent table should be effective and asserted all the time when a dependent record is effective/asserted
CONSTRAINT "
bitemporal fk postgres_version_database_versionsrelease_version" check (true or 'fk' <> '@
postgres_version ->
database_versions(release_version
)@'));
Function:
select
bitemporal_internal.fk_constraint
(
'
postgres_version
’
,'
database_versions
’
,'
release_version
');
14
Sep-17
Bitemporal Data Model - PG Open 2017Slide15
FK CreationCheck whether the referencing field is a PK/UQ validate_bitemporal_pk_uqCreate check constraint
fk_constraint
Check whether the validation on the parent table field already exists
ll_lookup_validation_functionIf not, create itll_generate_fk_validatecreate trigger on insert/update and a trigger function15
Sep-17Bitemporal Data Model - PG Open 2017Slide16
How a Validation Function Should Work?
In order to define the algorithm, we need to take a closer look at how
bitemporal
queries work.16Sep-17Bitemporal Data Model - PG Open 2017Slide17
Understanding Bitemporal Queries
17
Sep-17
Bitemporal Data Model - PG Open 2017Slide18
Types of Bitemporal QueriesSnapshot queries –
how the data looked at a certain point of time
What was the status of
customer_no ’C100’ on Sep 1 2015?May also use one-dimensional conditionsInterval queries in selection criteriaHow did the status change for customer_no ‘C100’ during 2015?May return multiple rows with time regionsOne-dimensional conditions are also hereCorrelated time queriesWhat was the status of customer ‘C200’ when customer ‘C100’ was ’Platinum’?Values are equal only at time when they are valid in both time regions
18
Sep-17
Bitemporal Data Model - PG Open 2017Slide19
How to Define Time Regions For the Query Result? Conceptually each row has associated bitemporal
time region
The strategy is to perform an operation as in non-temporal DB and calculate region for each tuple of resultOperations on time regions: ∩ intersection∪ union − differenceExamples:Interval query: intersect query region with tuple regions: what was the customer type for customer C100 in September 2015?Join: find matching tuples and intersect their regions: what was the status of customer C300 when customer C100 was “Platinum”?19
Sep-17
Bitemporal Data Model - PG Open 2017Slide20
Defining Time Regions For Updates
UPDATE
a_table
SET attr = (expr) WHERE (cond)Time regions for:S – the time region defined by UPDATE function (for new values to be set)P – time region for each row identified by WHERE clause (with
old values to be updated)N-
time region for each already existing row with values, which we are going to assign
(empty of not exists)
P ⋂ N = ⍉
After UPDATE the DB will contain regions:Old values: P ∖ S
New values: N ∪S
20
Sep-17
Bitemporal Data Model - PG Open 2017
P
N
s
P
N
sSlide21
Time Regions as Sets of RectanglesOperations on time regions:
∩
intersection
∪ union − differenceAsserted interval × Effective interval = rectangleUnfortunately, union and difference of rectangles are not rectangularSets of non-intersecting rectangles: all operations produce sets or rectangles
21
Sep-17
Bitemporal Data Model - PG Open 2017Slide22
Operations on Regions (Rectangle Sets)Disjoin (auxiliary)Split both arguments into smaller rectangles such that each pair of rectangles from different arguments either coincide or do not overlap
Set-theoretic operations: union, intersection, set difference:
Calculate the disjoin
Choose rectangles that belong to union/intersection/differenceSep-17Bitemporal Data Model - PG Open 201722Slide23
Using Rectangle setsChecking foreign key constraintsRect_set (FK) ⊊
rect_set
(PK)
Computing time regions (rectangle sets) for query resultsIntersectionUnionDifferenceBulk inserts and updates
Sep-17
Bitemporal Data Model - PG Open 2017
23Slide24
Finding One-Dimensional (Interval) DisjoinSort all left and right ends of source intervals togetherChoose intervals that intersect with or overlap with at least one of source intervals
Sep-17
Bitemporal Data Model - PG Open 2017
24Slide25
Two-Dimensional Disjoin AlgorithmProject both arguments on one dimension (say, on asserted and ignore effective time)Compute on-dimensional disjoin
Put back the second dimension producing stripes
For each stripe, find one-dimensional disjoin.
Sep-17Bitemporal Data Model - PG Open 201725Slide26
Conceptual Representation: Nested Relational. (Do we still need this one?)
Each row contains a cuboid set
DB primary key is the entire non-temporal row
(business key, non-key attributes)DB foreign keys are unusableUpdates affect at most 3 rows(business) keyNon-key
attributesCuboid
dim1
dim2
Alice
10
1,10
1,100
Alice
15
10,20
1,100
Bob
9
1,5
1,30
5,10
50,100
Bob
16
10,20
30,100
26
Sep-17
Bitemporal Data Model - PG Open 2017Slide27
Current PG Implementation: Flat Relational (Do we still need this one?)
DB primary key
(
busines key, rectangle)PK uniqueness can be supported with indexesDB foreign keys are unusableUpdates almost always affect several rows(business) key
Non-key attributes
Cuboid
dim1
dim2
Alice
10
1,10
1,100
Alice
15
10,20
1,100
Bob
9
1,5
1,30
Bob
9
5,10
50,100
Bob
16
10,20
30,100
27
Sep-17
Bitemporal Data Model - PG Open 2017Slide28
Enhancements To The Existing Bitemporal Functions
New features added as a result of
bitemporal
production system implementation28Sep-17Bitemporal Data Model - PG Open 2017Slide29
Bulk Operations
INSERT-SELECT
The power of SQL is operating on sets, but so far we only implemented single-row insert. The new function
ll_bitemporal_insert_select
(
p_table text
,
p_list_of_fields
text
,
p_select
TEXT
,
p_effective
temporal_relationships.timeperiod
,
p_asserted
temporal_relationships.timeperiod
)
Allows to pass a select statement as a parameter.
29
Sep-17
Bitemporal Data Model - PG Open 2017Slide30
UPDATE/CORRECTION AS SELECTOur existing bitemporal_update and bitemporal_correction
support updating multiple records in one call, but the update conditions are reduced to the lists of values
New “bulk” update and correction will allow free-format expressions for both selection criteria and new values.
The same concept will be applied to the DELETE/INACTIVATE functions.30Sep-17
Bitemporal Data Model - PG Open 2017Slide31
Consistent Transactional BehaviorQuestion:What should be recorded in a
bitemporal
table, when several operations happen in one transaction and affect the same record(s)?
Transactions are atomic, thereby the intermediate states should not be visible.Also, if we materialize the record status before (and regardless) of transaction commit/rollback, we might (most likely) get inconsistent results for the snapshot queries.31Sep-17
Bitemporal Data Model - PG Open 2017Slide32
Bitemporal Correction Immediately After Insert
#
Effective Interval
Assertive IntervalCustomer No.
NameType
1
[2015-06-01, oo)
[2015-05-01,2015-09-15)
C100
John Doe
Silver
2
[2015-06-01,2015-09-15)
[2015-09-15,
oo
)
C100
John Doe
Silver
3
[2015-09-15, oo)
[2015-09-15,
2015-09-15)
C100
John Doe
Gold
4
[2015-09-15,
oo
)
[ 2015-09-15,
oo
)
C100
John Doe
Platinum
Asserted
3
1
2
4
3
1
2
Effective
now = 2015-09-15
select
ll_bitemporal_correction
($$customers$$,
$$type
$$,
$$
Platinum$$,
$$
customer_no
$$,
$$
C100$$,
timeperiod
('2015-09-15'
,
'infinity')
,
32
Sep-17
Bitemporal Data Model - PG Open 2017Slide33
Corrected Behavior Of UPDATE/CORRECTIONNow we check the asserted interval of the “last” record, and if the “last” asserted start is the same as the “new” asserted start we do not insert a new record, but update the old one.
This still allows the ”non-transactional” behavior, of the app developers choose to ignore transactions
Bitemporal
correction has a new parameter: asserted_start_time (instead of “now” by default). One can still use the old version33
Sep-17Bitemporal Data Model - PG Open 2017Slide34
Bitemporal Correction – Correct Behavior
#
Effective Interval
Assertive IntervalCustomer No.Name
Type
1[2015-06-01, oo)
[2015-05-01,2015-09-15)
C100
John Doe
Silver
2
[2015-06-01,2015-09-15)
[2015-09-15,
oo
)
C100
John Doe
Silver
3
[2015-09-15, oo)
[2015-09-15,
oo
)
C100
John Doe
Platinum
4
[2015-09-15,
oo
)
[ 2015-09-22,
oo
)
Cl 00
John Doe
Platinum
Asserted
3
1
2
3
1
2
Effective
now = 2015-09-15
select
ll_bitemporal_correction
($$customers$$,
$$type
$$,
$$
Platinum$$,
$$
customer_no
$$,
$$
C100$$,
timeperiod
('2015-09-15'
,
'infinity')
,
34
Sep-17
Bitemporal Data Model - PG Open 2017Slide35
High-level FunctionsThe app developers do not care about how exactly bitemporal support is implemented.
We created a high-level functions for them to perform select/insert/update/delete without knowing about the
bitemporal
operationsExample:create or replace function common.hl_bitemporal_insert
(p_schema_name
text,
p_table_name
text
,
p_list_of_fields
text
,
p_list_of_values
text) returns integer
35
Sep-17
Bitemporal Data Model - PG Open 2017Slide36
PerformanceWe are using pg_bitemporal for both OLTP and OLAP systems, and the data volumes appear to be manageable
Indexing Strategy: GIST indexes are efficient, but occasionally we need to build extra indexes, for example conditional index with UPPER (asserted)=‘infinity’
Sep-17
Bitemporal Data Model - PG Open 201736Slide37
Future WorkFinalizing FK supportFinalizing constraints creation Finalizing
bitemporal
UPDATE
Continue research on performance and the choice of indexesMany thanks to Braviant Holdings leadership for their continuous support!37Sep-17
Bitemporal Data Model - PG Open 2017