Split-Level I/O Scheduling

Uploaded On 2016-07-18

Presentation Transcript

Slide1

Split-Level I/O Scheduling

Suli Yang, Tyler Harter, Nishant Agrawal, Samer Al-Kiswany, Salini Selvaraj Kowsalya, Anand Krishnamurthy, Rini T Kaushik, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

Slide2

…yet another I/O scheduling paper?

CFQ (2003)
BFQ (2010)
Deadline (2002)
mClock (2011)
Token-Bucket (2008)
Libra (2014)
pClock (2007)
Fahrrad (2008)
YFQ (1999)
Facade (2003)

Slide3

Some mistakes we have been making for decades…

(in trying to build better schedulers)

Slide4

Problem

Current frameworks fundamentally limited
  CFQ, Deadline, Token-Bucket
Important policies cannot be realized
  Fairness, Latency Guarantee, Isolation
Wasted effort trying to build new schedulers without fixing the framework

Slide5

Can we design a simple and effective framework that lets us build schedulers to correctly realize important I/O policies?

Slide6

Solution: Split-Level Framework

Control: Allow scheduling at multiple levels
  Block level
  System-call level
  Page-cache level
Information: Tag requests to identify the origin
Simplicity: Small set of hooks at key junctions within the storage stack

Slide7

Results

Three distinct policies implemented
  Priority, Deadline, Isolation
Large performance improvements
  Fairness: 12x
  Tail latency: 4x
  Isolation: 6x
Good foundation for applications
  Reduce transaction latency for databases
  Improve isolation for virtual machines
  Effective rate limit for HDFS

Slide8

Overview

How I/O scheduling frameworks work
Split-Level Scheduling Framework: Design
Split-Level Scheduler Case Study
Conclusion

Slide9

Framework vs. Scheduler

Framework: A running environment (mechanism)
Scheduler: Implement different policies
How it works
  Framework provides callbacks to schedulers.

Slide10

Traditional Approach: Block-Level I/O Scheduling

[Diagram: App, App, App → Page Cache → File System → Block-Level Queues → Device. The Block-Level Scheduler attaches to the Block-Level Queues through three hooks: add_req, dispatch_req, req_complete.]

Slide11

Block-Level I/O Scheduling

Simplified Completely Fair Queuing (CFQ) implementation:

[Diagram: Block-Level Scheduler attached to the Block-Level Queues above the Device, with hooks add_req, dispatch_req, req_complete.]

add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}

dispatch_req() {
    q = get_high_prio_queue()
    r = dequeue(q)
    dispatch(r)
}

complete_req(r) {
    // clean up
}

Slide12
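The simplified CFQ callbacks above (add_req, dispatch_req, complete_req) can be turned into a small runnable model. The Python sketch below is only an illustration, not the Linux CFQ code: the `Request` and `SimpleCFQ` classes, the priority table, and the rule that a lower number means higher priority are all assumptions made for the example, and the time slices that real CFQ uses are left out.

```python
from collections import deque


class Request:
    """A block-level request; submit_process is whoever handed it to the block layer."""

    def __init__(self, submit_process, sector):
        self.submit_process = submit_process
        self.sector = sector


class SimpleCFQ:
    """Toy per-process-queue scheduler exposing the three block-level callbacks."""

    def __init__(self, priorities):
        # priorities: process name -> priority (assumption: lower number wins)
        self.priorities = priorities
        self.queues = {}
        self.in_flight = 0

    def add_req(self, r):
        # Framework calls this when a request reaches the block layer.
        self.queues.setdefault(r.submit_process, deque()).append(r)

    def dispatch_req(self):
        # Framework calls this when the device can take more work: pick the
        # non-empty queue of the highest-priority process, pop its oldest request.
        ready = [p for p, q in self.queues.items() if q]
        if not ready:
            return None
        p = min(ready, key=lambda name: self.priorities.get(name, 7))
        self.in_flight += 1
        return self.queues[p].popleft()

    def complete_req(self, r):
        # Framework calls this when the device finishes a request.
        self.in_flight -= 1


sched = SimpleCFQ({"app1": 4, "app2": 0})
sched.add_req(Request("app1", sector=100))
sched.add_req(Request("app2", sector=200))
print(sched.dispatch_req().submit_process)  # app2
```

Note that the queue is chosen from `r.submit_process`, which is exactly the field that later slides show to be unreliable for buffered writes.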

Overview

What is an I/O scheduling framework
Split-Level Scheduling Framework: Design
  The reordering problem
  The cause-mapping problem
  The cost-estimation problem
Split-Level Scheduler Case Study
Conclusion

Slide13

Reordering

Scheduling is just reordering I/O requests

Slide14

Data Entanglement

File system tangles data into one bundle
  Journal transaction
  Shared metadata block
Impossible for the schedulers to reorder

[Diagram: App1 and App2 write into the File System, which sits above the Block-Level Scheduler.]

Slide15

Write Dependencies

File systems carefully order writes
Schedulers cannot reorder (unless FS allows)

[Diagram: App issues transactions tx1 and tx2 through the File System to the Block-Level Scheduler.]

Slide16

Fundamental Limitation #1

(of block-level scheduling)

The file system imposes ordering requirements contrary to the scheduling goals
The scheduler cannot reorder
Too late once data is in the file system
Need admission control

Slide17

Split-Level I/O Scheduling: Multi-Layer Hooks

[Diagram: App, App, App → Page Cache → File System → Block-Level Queues → Device. The Split-Level Scheduler hooks write() and fsync() at the system-call level in addition to add_req, dispatch_req, req_complete at the block level.]

Avoid data entanglement and ordering above the file system

Slide18
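The multi-layer hook idea above can be sketched as a scheduler interface that is consulted both at the system-call level and at the block level. This is a minimal Python illustration under stated assumptions: the hook names `write_entry` and `fsync_entry`, the `ByteBudgetScheduler` policy, and the `sys_write` helper are hypothetical and are not taken from the split-level source code; only add_req, dispatch_req and req_complete appear on the slides.

```python
class SplitScheduler:
    """Base class: every hook defaults to 'admit' or 'do nothing'."""

    # System-call level hooks (hypothetical names).
    def write_entry(self, proc, nbytes):
        return True  # True = let the write enter the page cache now

    def fsync_entry(self, proc):
        return True

    # Block-level hooks, named as on the slide.
    def add_req(self, req):
        pass

    def dispatch_req(self):
        return None

    def req_complete(self, req):
        pass


class ByteBudgetScheduler(SplitScheduler):
    """Illustrative policy: admit writes only while the caller has budget left."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)
        self.block_queue = []

    def write_entry(self, proc, nbytes):
        if self.budgets.get(proc, 0) < nbytes:
            return False  # hold the write above the file system
        self.budgets[proc] -= nbytes
        return True

    def add_req(self, req):
        self.block_queue.append(req)

    def dispatch_req(self):
        return self.block_queue.pop(0) if self.block_queue else None


def sys_write(sched, page_cache, proc, nbytes):
    """Toy write() path: ask the scheduler before dirtying the page cache."""
    if not sched.write_entry(proc, nbytes):
        return False  # a real kernel would block the caller instead of failing
    page_cache.append((proc, nbytes))
    return True


cache = []
sched = ByteBudgetScheduler({"app1": 8192})
print(sys_write(sched, cache, "app1", 4096))  # True
```

The point of the design is that a refused write never reaches the file system, so it cannot be bundled into a journal transaction with other applications' data.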

Cause Mapping

A scheduler needs to map an I/O request to the originating application

Slide19

Write Delegation

[Diagram: App1 and App2 call write() into the Page Cache; the Write-back Daemon submits the resulting requests to the Block-Level Scheduler.]

Loss of cause information!
Write-back daemon submits all requests!
Write-back, journaling, delayed allocation…

Slide20

Fundamental Limitation #2

(of block-level scheduling)

Cause-mapping information lost within the framework
Impossible to map an I/O request back to its originating application (no matter how you implement the scheduler)

Slide21

Split-Level I/O Scheduling: Tags

[Diagram: App1 and App2 call write() into the Page Cache; the Write-back Daemon submits requests to the Block-Level Scheduler.]

Tags to identify origin
Tags pass across layers
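The tagging idea above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Page`, `PageCache`, `BlockRequest` classes and the `writeback` function are invented for illustration, and the only field names borrowed from the slides are `submit_process` and `tagged_cause`.

```python
class Page:
    def __init__(self):
        self.causes = set()  # tag: processes responsible for the dirty data


class PageCache:
    def __init__(self):
        self.pages = {}

    def write(self, proc, page_no):
        # Tag the page with the writing process at system-call time.
        self.pages.setdefault(page_no, Page()).causes.add(proc)


class BlockRequest:
    def __init__(self, submit_process, causes):
        self.submit_process = submit_process   # who called into the block layer
        self.tagged_cause = frozenset(causes)  # who actually caused the I/O


def writeback(cache, daemon="writeback"):
    """Flush all dirty pages: the daemon submits, but the tags travel along."""
    reqs = [BlockRequest(daemon, page.causes)
            for _, page in sorted(cache.pages.items())]
    cache.pages.clear()
    return reqs


cache = PageCache()
cache.write("app1", 0)
cache.write("app2", 1)
for req in writeback(cache):
    print(req.submit_process, sorted(req.tagged_cause))
```

Every request reports the write-back daemon as its submitter, yet a scheduler that reads `tagged_cause` can still charge the I/O to the application that dirtied the page.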

Slide22

Cost Estimation

A scheduler needs to estimate the cost of I/O
  Memory-level notification for timely estimate
  Block-level notification for accurate estimate
Details in paper

Slide23
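The cost-estimation slide above defers the details to the paper, so the following Python sketch is only one plausible reading of "timely, then accurate": bill a quick guess when a page is dirtied in memory, then correct it once the block-level request shows the real cost. The `CostAccount` class, its method names, and the cost units are all assumptions made for illustration.

```python
class CostAccount:
    """Per-process I/O cost ledger with a provisional and a corrected charge."""

    def __init__(self):
        self.charged = {}  # process -> cost currently billed
        self.pending = {}  # page number -> (process, provisional cost)

    def on_page_dirtied(self, proc, page_no, guess=1.0):
        # Memory-level notification: bill a quick guess immediately.
        self.pending[page_no] = (proc, guess)
        self.charged[proc] = self.charged.get(proc, 0.0) + guess

    def on_block_request(self, page_no, actual):
        # Block-level notification: replace the guess with the observed cost.
        proc, guess = self.pending.pop(page_no)
        self.charged[proc] += actual - guess


acct = CostAccount()
acct.on_page_dirtied("app1", page_no=0)       # billed 1.0 right away
acct.on_block_request(page_no=0, actual=4.0)  # e.g. the write turned out random
print(acct.charged["app1"])  # 4.0
```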

Split-Level I/O Scheduling Framework: Summary

Three key pieces:
  Multiple-layer hooks to prevent adverse file system interaction
  Tags to track causes across layers
  Early memory-level notification of write work
Easy Implementation
  ~300 LOC in Linux
  Little added complexity for building schedulers

Slide24

Overview

How I/O scheduling frameworks work
Split-Level Scheduling Framework: Design
Split-Level Scheduler Case Study
Conclusion

Slide25

Challenge #1: Priority Scheduler

Fairly allocate I/O resources based on the processes’ priorities

Slide26

Block-Level: CFQ

Workload: Eight processes with different priorities (0-7), each sequentially writing its own file

[Figure: results plot with the "goal" allocation marked.]

add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}

Slide27

Block-Level: CFQ

[Figure annotation: the write-back thread]

add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}

Slide28

Split-Level: AFQ

CFQ deviates from the goal by 82%
AFQ by 7%
12x improvement

add_req(r) {
    p = r.tagged_cause
    q = get_queue(p)
    enqueue(q, r)
}

Slide29

Challenge #2: Deadline Scheduler

Provide guaranteed latency of I/O requests

Slide30

Block-Deadline

Block-Deadline: cannot serve the low-latency requests until the previous transaction has completed

[Diagram: App issues tx1 and tx2 through the File System to Block-Deadline.]

Slide31

Block-Deadline

Workload: Flush 4KB data to disk with or w/o background writes
Expected Results: Operation finishes within deadline (100ms)

Slide32

Split-Deadline

Split-Deadline: suspend write() and fsync() to keep many high-latency requests from accumulating in one transaction.

[Diagram: Apps call write() and fsync(); Split-Deadline intercepts them above the File System, which holds transaction tx1.]

Write and fsync blocked to prevent high-latency data from entering the FS

Slide33
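The admission idea behind Split-Deadline, as described above, can be sketched as a gate on write(): admit data only while the running transaction can still be flushed within the deadline. This Python sketch is an illustration under assumptions, not the paper's actual policy: the `SplitDeadlineGate` class, the fixed-bandwidth cost model, and the 100 MB/s figure are invented for the example; only the 100 ms deadline comes from the slides.

```python
class SplitDeadlineGate:
    """Admission control at write(): cap the dirty data held by one transaction."""

    def __init__(self, bandwidth_bytes_per_s, deadline_s):
        self.bandwidth = bandwidth_bytes_per_s
        self.deadline = deadline_s
        self.tx_bytes = 0  # bytes already admitted into the running transaction

    def flush_time(self, extra=0):
        # Estimated seconds to flush the transaction if `extra` more bytes join it.
        return (self.tx_bytes + extra) / self.bandwidth

    def try_write(self, nbytes):
        if self.flush_time(nbytes) > self.deadline:
            return False  # caller would be suspended until the transaction commits
        self.tx_bytes += nbytes
        return True

    def on_commit(self):
        self.tx_bytes = 0


# Assumed device: 100 MB/s; deadline: 100 ms as in the workload on the slides.
gate = SplitDeadlineGate(bandwidth_bytes_per_s=100_000_000, deadline_s=0.1)
print(gate.try_write(4096))        # True: a small flush fits easily
print(gate.try_write(50_000_000))  # False: would take about 0.5 s to flush
```

Because the large background write is held back before it reaches the file system, a later 4KB fsync() does not have to wait for it to drain.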

Split-Level: Split-Deadline

Split-Deadline maintains the deadline regardless of background writes.

Slide34

The Fsync-Freeze Problem

During checkpointing, the system begins writing out the data that needs to be fsync()’d so aggressively that the service time for I/O requests from other processes goes through the roof.
---Robert Haas (PostgreSQL)

Slide35

The Fsync-Freeze Problem

4x tail latency reduction.
Split-Deadline solves the fsync-freeze problem!
Workload: SQLite transactions with different checkpoint intervals
Expected Results: Consistent transaction latency

Slide36

Other Evaluation Results

Low overhead
  <1% runtime overhead
  <50 MB memory overhead
Other schedulers
  Token-bucket for performance isolation
Other applications
  PostgreSQL: latency guarantee for TPC-B workloads
  QEMU: provides isolation across VMs
  HDFS: effective I/O rate limit

Slide37
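The token-bucket scheduler mentioned above is not spelled out in the slides, so here is a sketch of the classic token-bucket algorithm as a rate limiter. It is a generic illustration, not the authors' split-level token-bucket implementation: the `TokenBucket` class, the explicit `now` argument (used instead of a real clock so the example is deterministic), and the 10 MB/s rate are assumptions.

```python
class TokenBucket:
    """Classic token bucket: tokens are bytes, refilled at a fixed rate."""

    def __init__(self, rate_bytes_per_s, capacity_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last = 0.0               # time of the last refill, in seconds

    def _refill(self, now):
        earned = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now

    def try_consume(self, nbytes, now):
        """Spend tokens and return True if nbytes may be issued at time `now`."""
        self._refill(now)
        if nbytes > self.tokens:
            return False
        self.tokens -= nbytes
        return True


bucket = TokenBucket(rate_bytes_per_s=10_000_000, capacity_bytes=10_000_000)
print(bucket.try_consume(10_000_000, now=0.0))  # True: the bucket starts full
print(bucket.try_consume(4096, now=0.0))        # False: no tokens left yet
print(bucket.try_consume(4096, now=0.5))        # True: 0.5 s refilled 5 MB
```

In a split-level setting the check would sit at the system-call hooks, so a throttled writer is stopped before its data enters the page cache rather than after.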

Overview

What is an I/O scheduling framework and how does it work
Split-Level Scheduling Framework: Design
Split-Level Scheduler Case Study
Conclusion

Slide38

Conclusion

For decades, people have been trying to build better block-level schedulers
  bound to fail without appropriate framework support
Split-level framework enables correct scheduler implementation
  Cross-layer tags
  Multi-level hooks
  Memory-level notification

Source code and more information: http://research.cs.wisc.edu/adsl/Software/split/

Slide39

Backup slides

Slide40

Slide41

Write Dependencies

Modern file systems maintain data consistency by carefully ordering writes.
Schedulers cannot reorder unless the file system allows it.

[Diagram: App issues tx1 and tx2 through the File System to the Block-Level Scheduler.]

Slide42

Split-Level I/O Scheduling: Multi-Layer Hooks

System-call scheduling above the file system to avoid data entanglement.
Block-level scheduling below the file system to maximize performance.

[Diagram: App, App, App issue read(), write(), fsync() → Page Cache → File System (write-back) → Block-Level Queues (add_req, dispatch_req, req_complete) → Disk, SSD. The Scheduler hooks each level.]

Slide43

Split-Level I/O Scheduling: Tags

Write-heavy HDFS workload on a machine with 8GB RAM.

Slide44

Split-Level I/O Scheduling: Tags

Write-heavy HDFS workload on a machine with 8GB RAM.

Slide45

Split-Level Framework Overhead

I/O performance with noop scheduler:

Slide46

Split-Level I/O Scheduling: Tags

Write-heavy HDFS workload on a machine with 8GB RAM.
Worst-case memory overhead of tags: 50MB.

Slide47

Block-Level: Windows

Slide48

Performance Isolation

[Figure labels: Sequential Reader, Unthrottled, A:, Throttled to 10MB/s, B:]

Slide49

Real Applications

Slide50

Write Delegation

[Diagram: App1 and App2 call write() into the Page Cache; write-back submits the requests to the Block-Level Scheduler.]

Loss of Cause Information!
The process that submitted the block-level requests may not be the process that issued the I/O.
Write-back, journaling, delayed allocation…

Slide51

Split-Level I/O Scheduling: Tags

[Diagram: App1 and App2 call write() into the Page Cache; write-back submits the requests to the Block-Level Scheduler. Pages and requests carry the tags 1, 1, 2.]

Use tags to track I/O requests across layers and identify the originating application.
Tags identify a set of processes responsible for an I/O request.

Slide52

Myth #1 in I/O Scheduling: I don’t have to care about I/O scheduling. It is someone else’s problem…

Slide53

Why Is I/O Scheduling Relevant (to You)

Bottleneck of many systems, from phones to servers.
  […our servers appear to freeze for tens of seconds during disk writes…]
Foundation of performance isolation.
  […the interference as a result of competing I/Os remains problematic in a virtualized environment…]
Pain points for databases, hypervisors, key-value stores and more.
  […one customer reported that just changing cfq to noop solved their innoDB IO problems]

Slide54

Myth #1 in I/O Scheduling: I don’t have to care about I/O scheduling. It is someone else’s problem…

Fact #1: If you care about performance, you should care about I/O scheduling

Slide55

Myth #2 in I/O Scheduling: Can’t the disk (or SSD) handle all I/O scheduling? (Do I still need I/O scheduling in the era of SSD?)

Slide56

Why Should OS Do I/O Scheduling

Device powerless when handed the “wrong” requests from the OS
  file system may withhold requests
Devices rely on OS-provided information
  lack such mechanisms
Other common reasons:
  more contextual information
  OS-level isolation unit
  multi-device I/O scheduling

Slide57

Why Should OS Do I/O Scheduling

Device is powerless when handed the “wrong” requests from the OS.
Isolation can only be done at the OS level, as only the OS knows about the isolation unit (processes, containers, or virtual machines).
OS has more contextual information to assist I/O scheduling (e.g., file-based prefetching).
Multi-device I/O scheduling can only be done at the OS level.

Slide58

Myth #2 in I/O Scheduling: Shouldn’t the disk (or SSD) handle all the I/O scheduling?

Fact #2: OS has to issue the right request at the right time

Slide59

Why Is I/O Scheduling Still An Open Problem

Current I/O scheduling frameworks are fundamentally limited (and so is any scheduler built on them).
Important policies (isolation, fairness, meeting deadlines…) cannot be realized in the current framework.
Causes applications to suffer (databases, hypervisors, and more).
  […one customer reported that just changing cfq to noop solved their innoDB IO problems…]

Slide60

Myth #3 in I/O Scheduling: Isn’t it a solved problem? After all, we have many different I/O schedulers.

Slide61

Myth #3 in I/O Scheduling: Isn’t it a solved problem? After all, we have many different I/O schedulers.

Fact #3: Fundamental limitations in framework
Important policies cannot be realized

Slide62

What is I/O Scheduling?

Applications submit I/O requests to storage devices.
I/O scheduling: which requests, and when, to send to the device?
Different scheduling goals and strategies for different schedulers.

Slide63

Block-Level I/O Scheduling

Simplified Completely Fair Queuing (CFQ) implementation:

[Diagram: Block-Level Scheduler attached to the Block-Level Queues above the Device, with hooks add_req, dispatch_req, req_complete.]

add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}

dispatch_req() {
    q = get_high_prio_queue()
    r = dequeue(q)
    dispatch(r)
}

complete_req(r) {
    // clean up
}