/
14-848 Cloud Infrastructure 14-848 Cloud Infrastructure

14-848 Cloud Infrastructure - PowerPoint Presentation

ideassi
ideassi . @ideassi
Follow
344 views
Uploaded On 2020-08-06

14-848 Cloud Infrastructure - PPT Presentation

Lecture 11 Fall 2019 Kesden Socrative httpsapisocrativecomrcNfu6Lp Todays Goals A brief introduction to relational databases A brief introduction to the SQL language used by relational databases ID: 800804

studentid relational data databases relational studentid databases data database sql fname studentbios lname tables key yearenrolled andrewid based rows

Share:

Link:

Embed:

Download Presentation from below link

Download The PPT/PDF document "14-848 Cloud Infrastructure" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

14-848Cloud Infrastructure

Lecture 11 * Fall 2019 * Kesden

Slide2

Socrative

https://api.socrative.com/rc/Nfu6Lp

Slide3

Today’s Goals

A brief introduction to relational databases

A brief introduction to the SQL language used by relational databases

A brief look at how relational databases are organized on disk

Reflect on how they can be used within clouds

Reflect on why there are limits to their ability to scale

Slide4

Not Among Today’s GoalS

A deep study of relational databases

Or, even minimal functional competency with SQL databases

If interested, take 15-615 or 15-645

Slide5

Database

“A database is an organized collection of data, stored and accessed electronically.”

-- https://en.wikipedia.org/wiki/Database

Slide6

Relational Database

“A relational database is a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. “

“The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic…where all data is represented in terms of tuples, grouped into relations.

“Virtually all relational database systems use SQL (Structured Query Language) for querying and maintaining the database.”

-- https://en.wikipedia.org/wiki/Relational_database

-- https://en.wikipedia.org/wiki/Relational_model

Slide7

Relational DatabasesRole In Clouds

Most widely used type of database since the 1980s

Unlikely to lose that title, or see reduced growth in use, any time soon

Good for up to millions of records

Beyond that, for “Big Data” applications, we’ll talk about various NoSQL databases

Everything from managing system configuration data to customer accounts to inventories to order and shipping records, to HR data, etc.

Slide8

Relational Database, Informally

A database wherein the data is structured in tables known as

relations.

Rows

represent entities and are unordered

A relational database, although drawn as a table, functions more like an unordered set of rows.

Columns

represent attributes

Constraints can be enforced

w.r.t.

types and values

Schema

for each table describes attributes and associated types.

Operations can be performed across tables.

Slide9

Example Relation:Student Table

StudentID

AndrewID

FName

LName

YearEnrolled

123456789

lkennedy

Laurie

Kennedy

2014

098765432

alight

Angela

Light

2015

135790246

gmcclendon

Gina

McClendon

2016

Slide10

Keys:What Are They?

Used to name rows within tables

Used to associate rows across tables

Can be tuple of attributes

Slide11

KeYS:Example Role

StudentID

AndrewID

FName

LName

YearEnrolled

123456789

lkennedy

Laurie

Kennedy

2014

098765432

llight

Laurie

Light

2015

135790246

gmcclendon

Gina

McClendon

2016

StudentID

CourseSemesterGrade13579024614-848F18A12345678914-736S17B+0987654314-740F17A-

Unique within relation

Used to associate relations

Slide12

SQL (“Sequel”)Structured Query Language

Used by nearly all relational databases

Both a

Data Definition Language (DDL)

and a

Data Manipulation Language (DML)

DDL: Create, define, modify, and administer (access restrictions,

etc

) relations

Structure, types, constraints,

etc

– not values therein

DML: Query and modify values: Tuples and attributes thereof

Hides organization of data, e.g. whether and how indexed, how stored, location, etc.

Big difference from older database languages that described tree paths, etc.

Name derived from earlier

IBM predecessor,

S

tandard

E

nglish QUery Language”

Slide13

Keys:Types

Candidate – Unique to row

Primary – Used as name for row

Foreign – Key used in another table

Think of a

foreign key

as a pointer

Slide14

SQL:CREATE

CREATE TABLE

StudentBios

(

StudentID

CHAR(9) NOT NULL,

FName

VARCHAR(200)NOT NULL,

LName

VARCHAR(200)NOT NULL,

YearEnrolled

YEAR

)

StudentID

AndrewID

FName

LName

YearEnrolled

123456789lkennedyLaurieKennedy2014098765432llightLaurieLight2015135790246gmcclendon

GinaMcClendon2016

Slide15

SQL:CREATE, REVISITED

CREATE TABLE

StudentCourses

(

StudentID

CHAR(9) NOT NULL,

Course CHAR(6) NOT NULL,

Semester CHAR(3) NOT NULL,

Grade CHAR(2)

PRIMARY KEY (

StudentID

, Course, Semester),

FOREIGN KEY (

StudentID

) REFERENCES

StudentBios

)

StudentID

Course

Semester

Grade

13579024614-848F18A12345678914-736S17B+0987654314-740

F17A-

Why is the primary key (StudentID, Course, Semester)?Not just StudentID?

Slide16

SQL:INSERT

INSERT INTO

StudentBios

(CHAR(9), VARCHAR(200),VARCHAR(200),YEAR)

VALUES

(123456789, ‘

lkennedy

’, ‘Laurie’, ‘Kennedy’, 2014)

StudentID

AndrewID

FName

LName

YearEnrolled

123456789

lkennedy

Laurie

Kennedy

2014

098765432

llight

Angela

Light2015135790246gmcclendonGinaMcClendon2016

Slide17

SQL:SELECT

SELECT

Fname

,

Lname

from

StudentBios

WHERE

YearEnrolled

>= 2015

StudentID

AndrewID

FName

LName

YearEnrolled

123456789

lkennedy

Laurie

Kennedy

2014

098765432

llightAngelaLight2015135790246gmcclendonGinaMcClendon2016

Slide18

SQL:SELECT, Continued

SELECT

StudentID

,

Fname

,

Lname

from

StudentBios

WHERE

YearEnrolled

<= 2015

ORDER BY

FName,LName

StudentID

FName

LName

123456789

Angela

Light

098765432

LaurieKennedy

Slide19

SQL:DELETE

DELETE FROM

StudentBios

WHERE

StudentBios.Fname

= ‘Laurie’

StudentID

AndrewID

FName

LName

YearEnrolled

123456789

lkennedy

Laurie

Kennedy

2014

098765432

llight

Laurie

Light

2015

135790246gmcclendonGinaMcClendon2016

Slide20

Combines columns from one or more relations into a new relation

Rows are aligned based upon a matching key

Columns are selected from logically combined rows

Yes, a relation can be joined with itself, hence one or more

Complexity

Some keys may be present in some relations but not others

Depending upon which table is being iterated, some rows may be missed or some columns may be empty

This leads to different flavors of JOIN

SQL:

JOIN

Slide21

SQL:JOIN

SELECT

AndrewID

, Course, Semester, Grade

FROM

StudentBios

INNER JOIN

StudentCourses

WHERE

StudentBIOS.StudentID

=

StudentCourses.StudentID

This effectively forms a new relation which is the union of the

StudentBios

and

StudentCourses

tables

The rows of the two are associated up based upon the

StudentID

of the the rows matching. INNER JOIN requires matches in both tablesIf there isn’t a matching key in both tables it isn’t includedThe result has (AndrewID, Course, Semester, Grade) columns

Slide22

SQL:Views

CREATE

VIEW

StudentClassList

AS

SELECT

AndrewID

, Course, Semester

FROM

StudentBios

INNER JOIN

StudentCourses

WHERE

StudentBIOS.StudentID

=

StudentCourses.StudentID

This forms what is essentially a virtual table

Abstracts away complexity of underlying tables

Allows view to be created from tables that may be from different domains or systems

Allows protections to be applied to table, e.g. can access course list for student, but not course grades

May enable better system performance by aiding caching, etc.

Slide23

Really approximate access times (latency):

Registers: Speed of processor

SRAM (Cache): 2-3nS = ~2.5x10

-6

mS = ~2.5/1,000,000 mS

DRAM (Main Memory): 20-30nS = ~2.5x10

-5

mS = 2.5/100,000 mS

FlashRAM

(SSD): 0.035 mS

Disk Drive: 1 mS, sequential

Disk Drive: 10 mS, random

The name of the game is…

To build a system that performs nearly as well as if all of its memory is of the fastest type

At a cost nearly as low as if all of its memory is of the slowest type

Caching is the magic sauce

Relational Databases:

Memory Hierarchy: A Quick Review

Slide24

Memory is limited

High throughput is key

ACID properties

Atomicity – A transaction is all done, or note done at all, but not half done (Succeed or fail)

Consistency – Find things consistent, leave things consistent. Don’t break things with side-effects.

Isolation – Even if there is concurrency, effects should be the same as if sequential

Durability – Once it is done, it is done (.)

Minimize overhead, while maximizing performance

Relational Databases:

Additional Considerations

Slide25

Caching is critical

Durability will require consideration of hardware

SSD or other

FlashRAM

layer?

Battery powered backup?

Etc

Indexes and data stored separately

Pack as many indexes into RAM as possible. Think like Haystack, but 40 years earlier.

Multiple indexes, separate for each use

Indexes and data different caching domains

Relational Databases:

Implications OF ENVIRONMENT, GOALS

Slide26

Block/Page based storage of data pages

Often LRU cached

All same size for interchangeable allocation and caching, mapping to disk blocks,

etc

Indexes constructed to speed queries

B-Tree based – sorted, range-based,

etc

Caching may be page-based LRU.

Hash based – fast lookup without rich features

Trades disk space for lookup speed

Exact lookup only, not ranges, etc.

May be dynamic

Lookup results can be cached.

Indexes point into data pages

Commonly created by administrator, not based upon any intrinsic understanding by the database, itself

Relational Databases:

Typical Organization

Slide27

Backups are hard because they may require a freeze

Blurry backups may not be useful because of ACID

Some algorithms can make blurry backup and fix-up

Replicated databases are a common option

A few replicas provide robustness without blowing scale

Need policies for working through failure and obtaining recovery

Need atomic commit protocol or master-slave,

etc

, to maintain consistency

Relational Databases:

Obtaining Robustness

Slide28

Scaling to a point is easy enough

Buy bigger disk

Solution can be replicated

Scaling even bigger is easy enough

Buy an array

Solution can be replicated

Scaling even bigger gets complicated

Distributing whole tables makes joins more expensive

Sharding

tables by row makes summarizing across attributes slow

Sharding

tables by column makes lookup by entity slow

At a certain point, it may be better to focus on precise needs and structure a solution to meet them

NoSQL Databases: Row-Oriented, Column-Oriented, Key-Value Stores, etc.

Relational Databases:

Scaling Up

Slide29

Distill data as much as possible.

Make representation of data as dense as possible.

Keep data redundantly, if (absolutely needed) to allow distribution with performance

Relational Databases:

Scaling Up, Simple Ideas

Slide30

SQL:The STORY CONTINUES

There is a ton of theory behind relational databases

There is a ton more to the SQL language

There is a ton more to how SQL databases are organized, indexed, etc.

Take 15-615 or 15-645 if interested.

Slide31

Moving Forward:Data (Not File) Storage at Scale

Key-Value Stores

Column, Row, and Document-Oriented Databases

RAM layer storage