Database I/O Mechanisms Performance and persistence

Database I/O Mechanisms Performance and persistence Database I/O Mechanisms Performance and persistence - Start

2018-03-22 30K 30 0 0

Database I/O Mechanisms Performance and persistence - Description

Richard Banville. Fellow, OpenEdge Development. Progress Software. Agenda. 1. Database I/O Types. User Data I/O. Recovery Data I/O. Other I/O. 2. 3. 4. Summary. 5. File Write I/O for File Types. Logical . ID: 660914 Download Presentation

Download Presentation

Database I/O Mechanisms Performance and persistence




Download Presentation - The PPT/PDF document "Database I/O Mechanisms Performance and ..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.



Presentations text content in Database I/O Mechanisms Performance and persistence

Slide1

Database I/O Mechanisms

Performance and persistence

Richard Banville

Fellow, OpenEdge Development

Progress Software

Slide2

Agenda

1

Database I/O Types

User Data I/O

Recovery Data I/O

Other I/O

2

3

4

Summary

5

Slide3

File Write I/O for File Types

Logical vs PhysicalDatabase request vs

OS I/O

Database I/O

vs O/S I/O

Physical I/O always uses file system cache (no raw I/O)Buffered vs unbuffered I/OUnbuffered I/O considered durable after write system callRecovery data with integrityUser data with -directioBuffered I/O requires file system sync. for durabilityRecovery data with no integrity

User data

Slide4

OpenEdge I/O & The File System

Database

Buffer Pool

BI

Buffers

A

I

Buffers

File system

cache

.d

.d

.d

.d

.d

.d

.d

.b

.d

.a

System Memory

Process Shared Memory

Physical

Disk Devices

Multi level caches

Multi level caches

I/O via F/S cache

Slide5

OpenEdge Data I/O & The File System

Database

Buffer Pool

File system

cache

System Memory

Process Shared Memory

.d

.d

.d

.d

.d

.d

Multi level caches

Buffered I/O

to F/S cache

F/S decides when to write to disk device

Disk device decides when to write to physical disk

At checkpoint, made durable via

fdatasync() / FlushFileBuffers()

Required

for crash recovery and Bi space reuse to work properly

Promon Checkpoints:

Flushes

Duration

Sync Time

0

0.20

0.02

4

0.20

0.04

4

0.17

0.02

2

0.22

0.03

Disk Devices

Slide6

OpenEdge –

directio

I/O & The File System

Database

Buffer Pool

File system

cache

System Memory

Process Shared Memory

.d

.d

.d

.d

.d

.d

Multi level caches

-

directio

Unbuffered I/O thru F/S cache

Not

raw

I/O to disk device

Each I/O sync‘d to disk device

Operational affects

No

need to sync at checkpoint

Write I/O more expensive

Additional cost to page writers

Promon Checkpoints:

Flushes

Duration

Sync Time

0

0.16

0.00

2

0.18

0.00

11

0.16

0.00

0

0.18

0.00

Disk Devices

Slide7

OpenEdge –

directio

I/O Performance

Database

Buffer Pool

File system

cache

System Memory

Process Shared Memory

.d

.d

.d

.d

.d

.d

Multi level caches

How could the more expensive writes of –

directio

improve performance?

APWs

absorb the additional cost

If they

do all the writing without adding OLTP contention

Lower

checkpoint costs

Each I/O sync‘d to disk device

No sync needed during

checkpoint

Higher throughput

due to less pause

May help on inadequate file system

Less useful for

Well tuned deployments

Properly sized systems

When buffers flushed at checkpoint

Disk Devices

Slide8

OpenEdge Recovery I/O & The File System

Unbuffered I/O to F/S cacheEach I/O sync‘d to disk device

For .bi, called “reliable I/O”

BI blocks written when:

BIW notices full block in out buffer

APW writes data block with bi dependancyBroker notices aged commit (-Mf)User can‘t find empty bi block to store update notesUser must perform checkpoint

BI

Buffers

File system

cache

System Memory

Process Shared Memory

Multi level caches

.d

.b

Disk Devices

Slide9

OpenEdge Recovery: Making it unreliable

Never in productionSpecific maintenance only

-r: BI writes are buffered (un-reliable) to F/S

All change notes recorded

Rollback will work

Crash recovery likely to workRecovery from OS crash will most likely failidxbuild some index, !“some; !”

BI

Buffers

File system

cache

System Memory

Process Shared Memory

Multi level caches

.d

.b

***

An earlier -r session crashed

,

the database may be damaged

. (514)

Disk Devices

Slide10

OpenEdge Recovery: Making it more unreliable

Never

ever

in production

Specific maintenance only

-i: no-integrityBI writes are bufferedNo data dependency check (!WAL)No F/S sync at checkpointNo record of purely physical notesRollback might workOS, DB crash, abnormal termination

Must restore from backup

BI

Buffers

File system

cache

System Memory

Process Shared Memory

Multi level caches

.d

.b

** Your database cannot be repaired.

You must restore a backup copy. (510)

Disk Devices

Slide11

Agenda

1

Database I/O Types

User Data I/O

Recovery Data I/O

Other I/O

2

3

4

Summary

5

Slide12

Buffer Pool I/O

Database

Buffer Pool (-B, -B2)

4

160

32

128

64

2

144

192

112

80

LRU buffer eviction policy

LRU2

buffer eviction

policy

Database

Buffer

Lookup

If not found via hash table lookup

Incur O/S read I/O – “page-in”

But where do you read into?

1 Buffer pool cache

1 Hash table

Multiple LRU

replacement chains

Buffer pool hash table

Slide13

Buffer Pool I/O

Database

Buffer Pool (-B, -B2)

C

C

C

C

D

D

D

C

C

D

LRU buffer eviction policy

LRU2

buffer eviction

policy

Database

Buffer

Lookup

Start at LRU end of buffer replacement chain

Look for first “non-dirty” buffer (to avoid write)

Can’t find one after 10 tries?

“Page-out” least recently used buffer (O/S write I/O) “LRU writes”

May force (multiple) BI/AI writes, usually partial writes!

“Page-in” your block to available buffer (O/S read I/O)

1 Buffer pool cache

1 Hash table

Multiple LRU

replacement chains

Buffer pool hash table

Slide14

Data Read I/O Tuning

Avoiding read I/OLarge buffer pool (-B)

Utilize alternate buffer pool (-B2)

Improve queries; Avoid table scans; Cache data locally

Private “read-only” buffers (–Bp),

utilities too!Increase pool when read I/O unacceptable for properly tuned applicationToo many buffers may cause O/S pagingDecrease file system cacheAvoid non-essential activities on production serverConsider buying more memory

Database Buffer Pool-B & -B2 buffers

I/O

DB

Increase performance by decreasing I/O

Slide15

Promon R&D => Performance indicators

Promon R&D => Buffer cache

O/S reads and O/S writes

Flushed at checkpoint

LRU Writes

APW

enqueues

*

Data I/O Performance Monitoring - Promon

What about buffer pool hit

ratio % (BHR)?Too easily skewed by bad queriesNot a fine enough metric (hits / requests)270,000 database read requests / secondBuffer hit ratio % of

98Still means 5,400 O/S Read I/Os per second!Fast F/S access still 75x slower than -B

Database Buffer Pool-B & -B2 buffers

I/O

DB

A low BHR indicates a

poorly tuned

system

A high BHR does not

denote a well tuned

system

Slide16

Data Write I/O Tuning

Avoiding write I/OLarge buffer pool lessens forced “page-outs”Improve queries in the application

Reduce checkpoint frequency (see next section)

Run with APWs (

Have someone else do it

!)Avoids user and server writesDecreases LRU writes (forced “page-outs”)Reduces checkpoint timePerforms DB buffer pool I/OMay flush AI and BI data

Database Buffer Pool-B & -B2 buffers

I/O

DB

Increase performance by decreasing I/O

Slide17

Asynchronous Page Writer Activities

Checkpoint

Queue

Primary –B buffer pool

AND

Alternate –B2 buffer pool

C

D

C

D

D

C

LRU

chains

4

148

200

120

BI

WAL

APW

DB

D

D

D

APW Queue

Forced bi write only if cluster > 95% full

New adaptive mechanism for checkpoint processing

Avoids buffers flushed

10.2b FCS

#1

#2

#3

Slide18

Asynchronous Page Writer Performance

Checkpoint

Queue

Primary –B buffer pool

AND

Alternate –B2 buffer pool

R

U

R

U

U

R

LRU

chains

4

148

200

120

BI

WAL

APW

DB

U

U

U

APW Queue

Promon R&D => Page Writers

APW queue writes

Checkpoint queue writes

Buffers scanned

Scan writes

Tuning

Increase until 0 blocks flushed at checkpoint

Decrease if partial BI writes increase

Increasing BI cluster size can avoid:

partial BI writes

forcing BI writes (95% full less of the time)

Typically need more if running with Direct I/O

Slide19

Agenda

1

Database I/O Types

User Data I/O

Recovery Data I/O

Other I/O

2

3

4

Summary

5

Slide20

Rollback Processing

BI Buffer Pool

-

bibufs

10

Free(a)

Free(b)

Free(c)

Free(d)

Free(e)32

31

3029

Modified Queue

Free List

15

Current Input Buffer

9

Backout Buffer

12

Backout Buffer

BI

Current Output Buffer

New Notes (Actions)

Forward Processing

Slide21

BI Buffer Pool – Recording a change

-

bibufs

10

Free(a)

Free(b)

Free(c)

Free(d)Free(e)

32

3130

29

Modified QueueFree List

BI

Current Output Buffer

New Notes (Actions)

Forward Processing

B I W

User

Empty buffer waits

Busy buffer waits

BIB latch contention

-

bwdelay

in ms (30ms)

Nap time when nothing dirty

Not much positive tuning affect

Slide22

BI Buffer Pool – Forced Write I/O

-

bibufs

10

Free(a)

Free(b)

Free(c)

Free(d)

Free(e)32

31

3029

Modified Queue

Free List

BI

Current Output Buffer

New Notes (Actions)

Forward Processing

User

Buffer Pool

172

128

Associated

BI note

dependency

ctr

(based on fill %)

Data Blocks

WAL

APW

DB

256

512

768

Checkpoint

Queue

Slide23

BI Buffer Pool – Write I/O

-

bibufs

10

Free(a)

Free(b)

Free(c)

Free(d)

Free(e)32

31

3029

Modified Queue

Free List

BI

Current Output Buffer

New Notes (Actions)

Forward Processing

Broker

User

Is it OK to buffer

modified

BI

blocks?

YES

Is it OK to buffer

committed

BI data

?

Delayed commit

(-Mf) is

up to you!

Delayed commit (Durability)

Based on –Mf value, Broker may flush BI buffers to disk

For aged

txn ends-Mf default 3Increasing -Mf Pros/Cons:

Slide24

Rollback Processing

BI

Buffer Pool –

Change rollback

-

bibufs 10

Free(a)Free(b)

Free(c)

Free(d)

Free(e)32

31

30

29Modified Queue

Free List

15

Current Input Buffer

9

Backout

Buffer

12

Backout

Buffer

BI

Current Output Buffer

New Notes (Actions)

Forward Processing

1 shared input buffer

Multiple private

back out buffers

Slide25

Rollback Processing

BI

Buffer Pool –

Change rollback

32

31

30

29

Modified Queue

15Current Input Buffer

9

Back out Buffer

12Back out Buffer

BI

Current Output Buffer

Read I/O to find notes

Write I/O when undoing

Promon:

BI Reads

Input buffer hits

Output buffer hits

Mod buffer hits

BO buffer hits

Slide26

Tuning the Bi Buffer Pool

-

bibufs

10

Free(a)

Free(b)

Free(c)

Free(d)

Free(e)32

31

3029

Modified Queue

Free List

BI

Current Output Buffer

New Notes (Actions)

Forward Processing

B I W

User

Run BIW

Promon: 5. BI Log Activity

Empty buffer waits – all full

Increase –

bibufs

(online)

-

aibufs

>= -

bibufs

Start with –

bibuf

150

Partial (forced) writes

-Mf expired

Increase if not risk adverse

Too many APWsTune checkpoint processing Busy buffer waits – busy - OKLog force waits/write – 2PC commit

Slide27

Monitoring BI Activity & Performance Summary

Activity

Forward Activity

Total BI writes

Records (notes) written

Clusters closedUndoTotal BI readsNotes read Input buffer hitsOutput buffer hits

Mod buffer hitsBO Buffer HitsPerformanceOK Waits & Writes

Busy buffer waitsBIW writesBad Waits & WritesEmpty buffer waits

Partial writesForced writes (2PC)Flushed at checkpointCheckpoint duration (wait)

Slide28

Checkpoint Processing

Slide29

Checkpoint Processing

Slide30

Promon Checkpoint Data

No.

Time

CPT Q

Scan

APW Q

Flushes(Cont.)

27

10:23:12…038452

0…26

10:22:46…0

3813813…

25

10:22:18

0

380

380

2

24

10:21:50

…201158158

0…

--------- Database Writes ---------

APW Specific Activity…

CPT Q: # data buffers APW wrote from checkpoint queue (from prev chkpt)

Scan: # data buffers APW wrote while scanning -B APW Q: # data buffers APW wrote from APW Q Dirty buffers added to APWQ from -B LRU eviction

Slide31

Promon Checkpoint Data

No.

Time

CPT Q

Scan

APW Q

Flushes(Cont.)

27

10:23:12…038452

0…26

10:22:46…0

3813813…

25

10:22:18

0

380

380

2

24

10:21:50

…201158158

0…

--------- Database Writes ---------

Flushes:

Number of database blocks written during checkpointVery costly operation (db updates paused)Should add ai

/bi flushesMarked from previous checkpointAvoid with APWs and larger cluster sizes

Slide32

Promon Checkpoint Data

No.

Time

Duration

Sync Time

27

10:23:12…

0.120.0426

10:22:46…0.110.03

2510:22:18

…0.110.04

2410:21:50

0.13

0.04

----- New Columns -----

Duration:

Time to process checkpoint including:

Write

chkpt

queue, buffer pool scan, bi/

ai

flush, F/S Sync Sync Time: Amount of time in seconds it took for fdatasync() or

FlushFileBuffers()Limit file system cache size and flush frequencyFaster disks for data filesAvoid with –directio (but increases all write I/Os)

File System Cache

DB

Slide33

Tuning Checkpoint Processing

Physical

BI truncate

Values in K

-bi (cluster size in KB)

-biblocksize (size in KB)Before-image block size set to 8 or 16 kbFollowed by sync command Runtime

BI bufsBIW

proutil <db>

-C truncate bi -biblocksize 8 -bi

8192

proutil <db> -C

bigrow 8 -r

Slide34

Summary: Recovery Subsystem

AI/BI buffersNo LRU replacement mechanismDatabase changes recorded orderly

Forward processing causes BI write I/O

Rollback may cause read I/O

Backout

Buffers (BOB) help rollback contentionCheckpointsBuffers flushed during checkpointPage writersBIW/AIW processingAPW processing

Slide35

Agenda

1

Database I/O Types

User Data I/O

Recovery Data I/O

Other I/O

2

3

4

Summary

5

Slide36

Database Extend

Database extend

Storage area locked - no other extends

Writes performed 16K at a time

Extend by 64 blocks or cluster size

Recovery extend (AI/BI)

Acquire space from F/S

Unbuffered write

Bi grow after truncate

Performance Improvements

F/S interaction for extent create 11.3

BI

extend, format

&

grow

in 11.3

Slide37

Monitoring I/O With Promon R&D

2. Activity Displays ...

1. Summary

3. Buffer Cache

4. Page Writers

5. BI Log / 6. AI Log8. I/O Operations by Type

9. I/O Operations by File

Database Accesses

vs

File I/O

Database writesO/S Writes

3. Other Displays…1. Performance Indicators2. I/O Operations by Process

4. Checkpoints5. I/O Operations by User by Table6. I/O Operations by User by Index

Slide38

Agenda

1

Database I/O Types

User Data I/O

Recovery Data I/O

Other I/O

2

3

4

Summary

5

Slide39

Summary

Slide40

Questions

?

Slide41

October

6–

9, 2013 • Boston

#PRGS13

www.progress.com/exchange-pug

Special

low rate of $495

for PUG Challenge attendees with the code PUGAM

And visit the Progress booth to learn more about the Progress App

Dev Challenge!

Slide42

Slide43

File Write I/O for File Types

I/O always uses file system cache (no raw I/O)Buffered vs unbuffered I/O

Unbuffered I/O considered durable after write system call

Buffered I/O requires file system sync.

f

or durabilityUser data files: <db>_<area>.d’s (table, index and LOB data)Changes recorded in .bi/.ai for undo/redo purposesUpdates use buffered I/OCan be overridden to use “unbuffered I/O” mechanism (-directio)

Recovery files: .ai, .bi, .tl (for recovery to work…)Updates recorded using “unbuffered I/O” mechanismAlways written before user data (WAL rule)Must be durable on disk when writtenFile system must observe write ordering

Slide44

Other parameters affecting I/O operations

-groupdelay (when –Mf 0)This is really group commit.User waits # ms to write end note so several commits can occur in same bi buffer write

Since –Mf 0, each commit would immediately force the bi buffer to be written.

-bwdelay

Forces BIW to delay # seconds between each bi buffer pool scan

Slide45

Recognizing You Have A Problem

Things are sluggishImprove cache utilizationAvoid causes of I/OPhysical disk activity

File system cache – too large or too slow

Users blocked on I/O

Promon “Blocked Users” screen

Checkpoint duration > 0.50Excessive I/OTable scans, buffer pool hit ratioRecovery subsystem (reads vs writes)APW, BIW, User writes

Slide46


About DocSlides
DocSlides allows users to easily upload and share presentations, PDF documents, and images.Share your documents with the world , watch,share and upload any time you want. How can you benefit from using DocSlides? DocSlides consists documents from individuals and organizations on topics ranging from technology and business to travel, health, and education. Find and search for what interests you, and learn from people and more. You can also download DocSlides to read or reference later.