Lecture 11 Fall 2019 Kesden Socrative httpsapisocrativecomrcNfu6Lp Todays Goals A brief introduction to relational databases A brief introduction to the SQL language used by relational databases ID: 800804
Download The PPT/PDF document "14-848 Cloud Infrastructure" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
14-848Cloud Infrastructure
Lecture 11 * Fall 2019 * Kesden
Slide2Socrative
https://api.socrative.com/rc/Nfu6Lp
Slide3Today’s Goals
A brief introduction to relational databases
A brief introduction to the SQL language used by relational databases
A brief look at how relational databases are organized on disk
Reflect on how they can be used within clouds
Reflect on why there are limits to their ability to scale
Slide4Not Among Today’s GoalS
A deep study of relational databases
Or, even minimal functional competency with SQL databases
If interested, take 15-615 or 15-645
Slide5Database
“A database is an organized collection of data, stored and accessed electronically.”
-- https://en.wikipedia.org/wiki/Database
Slide6Relational Database
“A relational database is a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. “
“The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic…where all data is represented in terms of tuples, grouped into relations.
“Virtually all relational database systems use SQL (Structured Query Language) for querying and maintaining the database.”
-- https://en.wikipedia.org/wiki/Relational_database
-- https://en.wikipedia.org/wiki/Relational_model
Slide7Relational DatabasesRole In Clouds
Most widely used type of database since the 1980s
Unlikely to lose that title, or see reduced growth in use, any time soon
Good for up to millions of records
Beyond that, for “Big Data” applications, we’ll talk about various NoSQL databases
Everything from managing system configuration data to customer accounts to inventories to order and shipping records, to HR data, etc.
Slide8Relational Database, Informally
A database wherein the data is structured in tables known as
relations.
Rows
represent entities and are unordered
A relational database, although drawn as a table, functions more like an unordered set of rows.
Columns
represent attributes
Constraints can be enforced
w.r.t.
types and values
Schema
for each table describes attributes and associated types.
Operations can be performed across tables.
Slide9Example Relation:Student Table
StudentID
AndrewID
FName
LName
YearEnrolled
123456789
lkennedy
Laurie
Kennedy
2014
098765432
alight
Angela
Light
2015
135790246
gmcclendon
Gina
McClendon
2016
Slide10Keys:What Are They?
Used to name rows within tables
Used to associate rows across tables
Can be tuple of attributes
Slide11KeYS:Example Role
StudentID
AndrewID
FName
LName
YearEnrolled
123456789
lkennedy
Laurie
Kennedy
2014
098765432
llight
Laurie
Light
2015
135790246
gmcclendon
Gina
McClendon
2016
StudentID
CourseSemesterGrade13579024614-848F18A12345678914-736S17B+0987654314-740F17A-
Unique within relation
Used to associate relations
Slide12SQL (“Sequel”)Structured Query Language
Used by nearly all relational databases
Both a
Data Definition Language (DDL)
and a
Data Manipulation Language (DML)
DDL: Create, define, modify, and administer (access restrictions,
etc
) relations
Structure, types, constraints,
etc
– not values therein
DML: Query and modify values: Tuples and attributes thereof
Hides organization of data, e.g. whether and how indexed, how stored, location, etc.
Big difference from older database languages that described tree paths, etc.
Name derived from earlier
IBM predecessor,
“
S
tandard
E
nglish QUery Language”
Slide13Keys:Types
Candidate – Unique to row
Primary – Used as name for row
Foreign – Key used in another table
Think of a
foreign key
as a pointer
Slide14SQL:CREATE
CREATE TABLE
StudentBios
(
StudentID
CHAR(9) NOT NULL,
FName
VARCHAR(200)NOT NULL,
LName
VARCHAR(200)NOT NULL,
YearEnrolled
YEAR
)
StudentID
AndrewID
FName
LName
YearEnrolled
123456789lkennedyLaurieKennedy2014098765432llightLaurieLight2015135790246gmcclendon
GinaMcClendon2016
Slide15SQL:CREATE, REVISITED
CREATE TABLE
StudentCourses
(
StudentID
CHAR(9) NOT NULL,
Course CHAR(6) NOT NULL,
Semester CHAR(3) NOT NULL,
Grade CHAR(2)
PRIMARY KEY (
StudentID
, Course, Semester),
FOREIGN KEY (
StudentID
) REFERENCES
StudentBios
)
StudentID
Course
Semester
Grade
13579024614-848F18A12345678914-736S17B+0987654314-740
F17A-
Why is the primary key (StudentID, Course, Semester)?Not just StudentID?
Slide16SQL:INSERT
INSERT INTO
StudentBios
(CHAR(9), VARCHAR(200),VARCHAR(200),YEAR)
VALUES
(123456789, ‘
lkennedy
’, ‘Laurie’, ‘Kennedy’, 2014)
StudentID
AndrewID
FName
LName
YearEnrolled
123456789
lkennedy
Laurie
Kennedy
2014
098765432
llight
Angela
Light2015135790246gmcclendonGinaMcClendon2016
Slide17SQL:SELECT
SELECT
Fname
,
Lname
from
StudentBios
WHERE
YearEnrolled
>= 2015
StudentID
AndrewID
FName
LName
YearEnrolled
123456789
lkennedy
Laurie
Kennedy
2014
098765432
llightAngelaLight2015135790246gmcclendonGinaMcClendon2016
Slide18SQL:SELECT, Continued
SELECT
StudentID
,
Fname
,
Lname
from
StudentBios
WHERE
YearEnrolled
<= 2015
ORDER BY
FName,LName
StudentID
FName
LName
123456789
Angela
Light
098765432
LaurieKennedy
Slide19SQL:DELETE
DELETE FROM
StudentBios
WHERE
StudentBios.Fname
= ‘Laurie’
StudentID
AndrewID
FName
LName
YearEnrolled
123456789
lkennedy
Laurie
Kennedy
2014
098765432
llight
Laurie
Light
2015
135790246gmcclendonGinaMcClendon2016
Slide20Combines columns from one or more relations into a new relation
Rows are aligned based upon a matching key
Columns are selected from logically combined rows
Yes, a relation can be joined with itself, hence one or more
Complexity
Some keys may be present in some relations but not others
Depending upon which table is being iterated, some rows may be missed or some columns may be empty
This leads to different flavors of JOIN
SQL:
JOIN
Slide21SQL:JOIN
SELECT
AndrewID
, Course, Semester, Grade
FROM
StudentBios
INNER JOIN
StudentCourses
WHERE
StudentBIOS.StudentID
=
StudentCourses.StudentID
This effectively forms a new relation which is the union of the
StudentBios
and
StudentCourses
tables
The rows of the two are associated up based upon the
StudentID
of the the rows matching. INNER JOIN requires matches in both tablesIf there isn’t a matching key in both tables it isn’t includedThe result has (AndrewID, Course, Semester, Grade) columns
Slide22SQL:Views
CREATE
VIEW
StudentClassList
AS
SELECT
AndrewID
, Course, Semester
FROM
StudentBios
INNER JOIN
StudentCourses
WHERE
StudentBIOS.StudentID
=
StudentCourses.StudentID
This forms what is essentially a virtual table
Abstracts away complexity of underlying tables
Allows view to be created from tables that may be from different domains or systems
Allows protections to be applied to table, e.g. can access course list for student, but not course grades
May enable better system performance by aiding caching, etc.
Slide23Really approximate access times (latency):
Registers: Speed of processor
SRAM (Cache): 2-3nS = ~2.5x10
-6
mS = ~2.5/1,000,000 mS
DRAM (Main Memory): 20-30nS = ~2.5x10
-5
mS = 2.5/100,000 mS
FlashRAM
(SSD): 0.035 mS
Disk Drive: 1 mS, sequential
Disk Drive: 10 mS, random
The name of the game is…
To build a system that performs nearly as well as if all of its memory is of the fastest type
At a cost nearly as low as if all of its memory is of the slowest type
Caching is the magic sauce
Relational Databases:
Memory Hierarchy: A Quick Review
Slide24Memory is limited
High throughput is key
ACID properties
Atomicity – A transaction is all done, or note done at all, but not half done (Succeed or fail)
Consistency – Find things consistent, leave things consistent. Don’t break things with side-effects.
Isolation – Even if there is concurrency, effects should be the same as if sequential
Durability – Once it is done, it is done (.)
Minimize overhead, while maximizing performance
Relational Databases:
Additional Considerations
Slide25Caching is critical
Durability will require consideration of hardware
SSD or other
FlashRAM
layer?
Battery powered backup?
Etc
Indexes and data stored separately
Pack as many indexes into RAM as possible. Think like Haystack, but 40 years earlier.
Multiple indexes, separate for each use
Indexes and data different caching domains
Relational Databases:
Implications OF ENVIRONMENT, GOALS
Slide26Block/Page based storage of data pages
Often LRU cached
All same size for interchangeable allocation and caching, mapping to disk blocks,
etc
Indexes constructed to speed queries
B-Tree based – sorted, range-based,
etc
Caching may be page-based LRU.
Hash based – fast lookup without rich features
Trades disk space for lookup speed
Exact lookup only, not ranges, etc.
May be dynamic
Lookup results can be cached.
Indexes point into data pages
Commonly created by administrator, not based upon any intrinsic understanding by the database, itself
Relational Databases:
Typical Organization
Slide27Backups are hard because they may require a freeze
Blurry backups may not be useful because of ACID
Some algorithms can make blurry backup and fix-up
Replicated databases are a common option
A few replicas provide robustness without blowing scale
Need policies for working through failure and obtaining recovery
Need atomic commit protocol or master-slave,
etc
, to maintain consistency
Relational Databases:
Obtaining Robustness
Slide28Scaling to a point is easy enough
Buy bigger disk
Solution can be replicated
Scaling even bigger is easy enough
Buy an array
Solution can be replicated
Scaling even bigger gets complicated
Distributing whole tables makes joins more expensive
Sharding
tables by row makes summarizing across attributes slow
Sharding
tables by column makes lookup by entity slow
At a certain point, it may be better to focus on precise needs and structure a solution to meet them
NoSQL Databases: Row-Oriented, Column-Oriented, Key-Value Stores, etc.
Relational Databases:
Scaling Up
Slide29Distill data as much as possible.
Make representation of data as dense as possible.
Keep data redundantly, if (absolutely needed) to allow distribution with performance
Relational Databases:
Scaling Up, Simple Ideas
Slide30SQL:The STORY CONTINUES
There is a ton of theory behind relational databases
There is a ton more to the SQL language
There is a ton more to how SQL databases are organized, indexed, etc.
Take 15-615 or 15-645 if interested.
Slide31Moving Forward:Data (Not File) Storage at Scale
Key-Value Stores
Column, Row, and Document-Oriented Databases
RAM layer storage