Balancing Throughput and Latency to Improve Real-Time I/O Service in Commodity Systems

min-jolicoeur (@min-jolicoeur) · Uploaded on 2018-03-07



Presentation Transcript

Slide1

Balancing Throughput and Latency to Improve Real-Time I/O Service in Commodity Systems

Mark Stanovich

Slide2

Outline

Motivation and Problem

Approach

Research Directions

Multiple worst-case service times

Preemption coalescing

Conclusion

Slide3

Overview

Real-time I/O support using commercial off-the-shelf (COTS) devices

General purpose operating systems (OS)

Benefits

Cost effective

Shorter time-to-market

Prebuilt components

Developer familiarity

Compatibility

Slide4

Example: Video Surveillance System

Receive video

Intrusion detection

Recording

Playback

Local network

Internet

4

(Diagram: all-in-one server with CPU and network resources.)

Changes to make the system work?

How do we know the system works?

Slide5

Problem with Current I/O in Commodity Systems

Commodity system relies on heuristics

One size fits all

Not amenable to RT techniques

RT too conservative

Considers a missed deadline as catastrophic

Assumes a single worst case

RT theoretical algorithms ignore practical considerations

Time on a device vs. service provided

Effects of implementation

Overheads

Restrictions

Slide6

Approach

Balancing throughput and latency

Variability in provided service

More distant deadlines allow for higher throughput

Tight deadlines require low latency

Trade-off

Latency and throughput are not independent

Maximize throughput while keeping latency low enough to meet deadlines
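The trade-off above can be sketched as a toy policy (the cost model, names, and numbers below are invented for illustration; this is not the dissertation's actual algorithm): serve the largest batch of queued requests whose amortized worst-case service time still fits within the earliest deadline's slack.

```python
# Toy batching policy: grow throughput (batch size) only as far as the
# earliest deadline allows. SEEK and PER_REQUEST are assumed round numbers.
SEEK = 20          # assumed fixed positioning overhead per batch (time units)
PER_REQUEST = 1    # assumed per-request transfer cost

def wcst(n):
    """Assumed worst-case service time of an n-request batch sharing one seek."""
    return SEEK + n * PER_REQUEST

def best_batch(now, queue_len, earliest_deadline):
    slack = earliest_deadline - now
    n = queue_len
    while n > 1 and wcst(n) > slack:
        n -= 1     # shrink the batch until it meets the tightest deadline
    return n

print(best_batch(now=0, queue_len=50, earliest_deadline=45))  # 25
```

With generous slack the whole queue is batched for throughput; with tight slack the batch shrinks toward single, low-latency requests.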

6

http://www.wikihow.com/Race-Your-Car

Slide7

Latency and Throughput

7

(Diagram: request arrivals on a timeline, with smaller vs. larger scheduling windows.)

Slide8

Observation #1:

WCST(1) * N > WCST(N)

Sharing cost of I/O overheads

I/O service overhead examples

Positioning hard disk head

Erasures required when writing to flash

Less overhead means higher throughput
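A toy calculation makes the inequality WCST(1) * N > WCST(N) concrete (the seek and transfer costs below are assumed round numbers, not measurements):

```python
# Assumed costs, in arbitrary time units: each batch pays one positioning
# (seek) overhead plus a small per-request transfer cost.
SEEK = 20
PER_REQUEST = 1

def wcst(n):
    """Worst-case service time for a batch of n requests sharing one seek."""
    return SEEK + n * PER_REQUEST

print(10 * wcst(1))  # 210: ten requests served one at a time pay ten seeks
print(wcst(10))      # 30: one batch of ten pays the seek once
```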

Slide9

Device Service Profile Too Pessimistic

Service rate is workload dependent

Sequential vs. random

Fragmented vs. bulk

Variable levels of achievable service by issuing multiple requests

9

(Diagram: disk service profile: minimum access size, seek time, rotational latency.)

Slide10

Overloaded?

(Timeline diagram: RT 1, RT 2, and RT 1 + RT 2 over ticks 0, 15, 25, 50, 75.)

Slide11

Increased System Performance

11

(Timeline diagram: RT 1, RT 2, and RT 1 + RT 2 over ticks 0, 15, 25, 50.)

Slide12


Small Variations Complicate Analysis

(Timeline diagram: arrivals and deadlines for RT 1, RT 2, and RT 1 + RT 2 over ticks 0, 5, 15, 25, 50.)

Slide13

Current Research

Scheduling algorithm to balance latency and throughput

Sharing the cost of I/O overheads

RT and NRT

Analyzing amortization effect

How much improvement?

Guarantee

Maximum lateness

Number of missed deadlines

Effects considering sporadic tasks

Slide14

Observation #2: Preemption, a double-edged sword

Reduces latency

Arrival of work can begin immediately

Reduces throughput

Consumes time without providing service

Examples

Context switches

Cache/TLB misses

Tradeoff

Too often: reduces throughput

Not often enough: increases latency

Slide15

Preemption

15

(Timeline diagram: arrivals and a deadline.)

Slide16

Cost of Preemption

16–18

(Diagram, built up across three slides: the CPU time for a job, plus context switch time, plus cache misses.)

Slide19

Current Research: How much preemption?

19–21

(Diagram, repeated across three slides: network packet arrivals on a timeline.)

Slide22

Current Research: Coalescing

Without breaking RT analysis

Balancing overhead of preemptions and requests serviced

Interrupts

Good: services immediately

Bad: can be costly if it occurs too often

Polling

Good: batches work

Bad: may unnecessarily delay service
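A minimal sketch of the coalescing idea (the class and all numbers are invented for illustration; real drivers, e.g. Linux's NAPI, are far more involved): take one interrupt to learn that work arrived, then poll in bounded batches with interrupts disabled until the queue drains.

```python
from collections import deque

class Nic:
    """Hypothetical device model mixing interrupt and polling service."""
    def __init__(self):
        self.queue = deque()
        self.interrupts_enabled = True
        self.interrupts_taken = 0

    def packet_arrives(self, pkt):
        self.queue.append(pkt)
        if self.interrupts_enabled:
            self.interrupts_taken += 1       # one interrupt announces the burst
            self.interrupts_enabled = False  # then switch to polling

    def poll(self, budget=4):
        """Service up to `budget` packets; re-enable interrupts once drained."""
        served = []
        while self.queue and len(served) < budget:
            served.append(self.queue.popleft())
        if not self.queue:
            self.interrupts_enabled = True
        return served

nic = Nic()
for p in range(6):
    nic.packet_arrives(p)
print(nic.interrupts_taken)              # 1: the whole burst cost one interrupt
print(len(nic.poll()), len(nic.poll()))  # 4 2: the rest is handled by polling
```

The `budget` bounds how long polling can delay other work, which is the knob the slide's latency/throughput balance turns on.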

Slide23

Average Response Time

(Chart slides 23–24.)

Slide25

Can we get the best of both?

Sporadic Server

Light load: low response time

Polling Server

Heavy load: low response time, no dropped packets

Slide26

Average Response Time

Slide27

Conclusion

Implementation effects force a tradeoff between throughput and latency

Existing RT I/O support is artificially limited

One size fits all approach

Assumes a single worst-case

Balancing throughput and latency uncovers a broader range of RT I/O capabilities

Several promising directions to explore

Slide28

Extra Slides

Slide29

Latency and Throughput

Timeliness depends on min throughput and max latency

Tight timing constraints

Smaller number of requests to consider

Fewer possible service orders

Low latency, Low throughput

Relaxed timing constraints

Larger number of requests

Larger number of possible service orders

High throughput, high latency

(Diagram: resource (service provided) over a time interval; lengthening latency increases throughput.)

29

Slide30

System Resources

Observation #3: RT Interference on Non-RT

Non-real time != not important

Isolating RT from NRT is important

RT can impact NRT throughput

30

(Diagram: RT sharing system resources with anti-virus, backup, and maintenance work.)

Slide31

Current Research: Improving Throughput of NRT

Pre-allocation

NRT applications as a single RT entity

Group multiple NRT requests

Apply throughput techniques to NRT

Interleave NRT requests with RT requests

Mechanism to split RT resource allocation

POSIX sporadic server (high, low priority)

Specify low priority to be any priority including NRT

Slide32

Research Description

One real-time application

Multiple non-real-time applications

Limit NRT interference

Provide good throughput for non-real-time

Treat hard disk as black box

Slide33

Amortization Reducing Expected Completion Time

(Diagram: higher throughput means more jobs serviced and the queue size decreases; lower throughput means fewer jobs serviced and the queue size increases.)

Slide34

Livelock

All CPU time spent dealing with interrupts

System not performing useful work

First interrupt is useful

Until the packet(s) for the interrupt are processed, further interrupts provide no benefit

Disable interrupts until no more packets (work) available

Provided the notification needed for scheduling decisions is still delivered

Slide35

Other Approaches

Only account for time on device [Kaldewey 2008]

Group based on deadlines [SCAN-EDF, G-EDF]

Require device-internal knowledge [Cheng 1996], [Reuther 2003], [Bosch 1999]

35

Slide36

“Amortized” Cost of I/O Operations

WCST(n) << n * WCST(1)

Cost of some ops can be shared amongst requests

Hard disk seek time

Parallel access to flash packages

Improved minimum available resource

36

WCST(5)

5 * WCST(1)

timeSlide37

Amount of CPU Time?

37

(Diagram: host A sends ping traffic to host B; B receives and responds to packets from A. The timeline marks arrival, interrupt, and deadlines.)

Slide38

Measured Worst-Case Load

38

Slide39

Some Preliminary Numbers

Experiment

Send n random read requests simultaneously

Measure the longest time to complete the n requests

Amortized cost per request should decrease for larger values of n

Amortization of the seek operation

Hard disk, n random requests

39

Slide40

50 Kbyte Requests

(Chart slides 40–41.)

Slide42

Observation #1: I/O Service Requires CPU Time

Examples

Device drivers

Network protocol processing

Filesystem

RT analysis must consider OS CPU time

42

(Diagram: applications, OS, and a device such as a network adapter or HDD.)

Slide43

Example System

Web services

Multimedia

Website

Video surveillance

Receive video

Intrusion detection

Recording

Playback

Local network

Internet

43

(Diagram: all-in-one server with CPU and network, connected to the local network and the Internet.)

Slide44

Example

44

(Timeline: application runs between arrival and deadline.)

Slide45

Example: Network Receive

45

(Timeline: an arrival triggers an interrupt; OS and application work must both complete before the deadline.)

Slide46

OS CPU Time

Interrupt mechanism outside control of OS

Make interrupts schedulable threads [Kleiman 1995]

Implemented by RT Linux

Slide47

Example: Network Receive

47

(Timeline: arrival, interrupt, then OS and application execution before the deadline.)

Slide48

Other Approaches

Mechanism

Enable/disable interrupts

Hardware mechanism (e.g., Motorola 68xxx)

Schedulable thread [Kleiman 1995]

Aperiodic servers (e.g., sporadic server [Sprunt 1991])

Policies

Highest priority with budget [Facchinetti 2005]

Limit number of interrupts [Regehr 2005]

Priority inheritance [Zhang 2006]

Switch between interrupts and schedulable thread [Mogul 1997]

Slide49

Problems Still Exist

Analysis?

Requires a known maximum on the amount of priority inversion

What is the maximum amount?

Is enforcement of the maximum amount needed?

How much CPU time?

Limit using POSIX defined aperiodic server

Is an aperiodic server sufficient?

Practical considerations?

Overhead

Imprecise control

Can we back-charge an application?

No priority inversion: charge to the application

Priority inversion: charge to a separate entity

Slide50

Concrete Research Tasks

CPU

I/O workload characterization [RTAS 2007]

Tunable demand [RTAS 2010, RTLWS 2011]

Effect of reducing availability on I/O service

Device

Improved schedulability due to amortization [RTAS 2008]

Analysis for multiple RT tasks

End-to-end I/O guarantees

Fit into analyzable framework [RTAS 2007]

Guarantees including both CPU and device components

Slide51

Feature Comparison

51

Slide52

New Approach

Better Model

Include OS CPU consumption into analysis

Enhance OS mechanisms to allow better system design

Models built on empirical observations

Timing information unavailable

Static analysis not practical and too pessimistic

Resources operate at a variety of service rates

Tighter deadlines == lower throughput

Longer deadlines == higher throughput

Slide53

Example: Rate-Latency Curve Convolution

β(rate1, Latency1) ⊗ β(rate2, Latency2) = β(min(rate1, rate2), Latency1 + Latency2)
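Assuming the standard network-calculus definition of min-plus convolution, the identity can be checked numerically (the rates and latencies below are arbitrary toy parameters):

```python
# Convolving two rate-latency service curves yields a rate-latency curve
# with the minimum rate and the summed latencies.
def beta(rate, latency):
    return lambda d: max(0.0, rate * (d - latency))

def convolve(f, g, delta, step=0.5):
    """(f ⊗ g)(delta) = inf over 0 <= s <= delta of f(delta - s) + g(s)."""
    s, best = 0.0, float("inf")
    while s <= delta:
        best = min(best, f(delta - s) + g(s))
        s += step
    return best

b1, b2 = beta(2.0, 1.0), beta(3.0, 2.0)   # rates 2 and 3, latencies 1 and 2
combined = beta(2.0, 3.0)                 # min(2, 3) = 2, latency 1 + 2 = 3
for d in (0.0, 2.0, 3.5, 5.0, 10.0):
    assert abs(convolve(b1, b2, d) - combined(d)) < 1e-9
print("convolution matches the rate-latency form")
```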

 

53

Slide54

A Useful Tool: Real-Time Calculus

Based on network calculus, derived from queueing theory

Provides an analytical framework to compose systems

More precise analysis (bounds), especially for end-to-end analysis

Can be used with existing models (e.g., periodic)

Provides a very general representation for modeling systems

Slide55

End-to-End Analysis

I/O service time includes multiple components

Analysis must consider all components

Worst-case delay for each?

Is this bound tight?

Framework to “compose” individual resources

(Diagram: a request flows through Tx, the device, and Rx before the response returns.)

55

Slide56

Real-Time Calculus

(Diagram: max arrival curve α and min service curve β plotted against Δ.)

The maximum horizontal distance is the worst-case response time.

56

Slide57

Real-Time Calculus [Thiele 2000]

(Diagram: workload (arrivals) mapped onto resources.)

57

Slide58

Composing RT I/O Service

(Diagram: applications, Tx, Rx, and the device composed in sequence.)

58

Slide59

Constraint on Output Arrival

Deconvolution of the envelope arrival curve

γ: maximum service curve

β: minimum service curve

α: input arrival curve; α′: output arrival curve

59

Slide60

Timing Bounds

(Chart: frequency vs. response time, distinguishing measured and possible values and the observable, empirical, actual, and analytical upper bounds.)

60

Slide61

Job

61

Slide62

Task

(Diagram: task model with worst-case execution time (WCET), inter-arrival time, and deadline on a timeline.)

62

Slide63

Theoretical Analysis

Non-preemptive job scheduling reduces to bin packing (NP-hard)

Slide64

Real-Time Calculus [Thiele 2000]

Resource availability in the time interval [s,t) is C[s,t)

(Diagram: accumulated resource availability over time.)

64

Slide65

Real-Time Calculus [Thiele 2000]

65

Slide66

Real-Time Calculus

(Diagram: arrival curve α and service curve β plotted against Δ.)

The maximum horizontal distance is the worst-case response time.

The maximum vertical distance is the maximum queue length.
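These distances can be computed numerically for simple curves (the periodic arrival and rate-latency service parameters below are invented for illustration, not taken from the slides):

```python
import math

# Toy curves: arrivals of 2 units per period of 10, served by a
# rate-latency curve with rate 1 and latency 3.
def alpha(delta):
    """Upper bound on arrivals in any window of length delta."""
    return 2.0 * (math.floor(delta / 10.0) + 1) if delta > 0 else 0.0

def beta(delta):
    """Lower bound on service in any window of length delta."""
    return max(0.0, delta - 3.0)

def horizontal_deviation(alpha, beta, horizon=60.0, step=0.5):
    """Maximum horizontal distance between alpha and beta on a grid:
    the worst-case response time."""
    worst, d = 0.0, step
    while d < horizon:
        t = d
        while beta(t) < alpha(d):      # slide right until service catches up
            t += step
        worst = max(worst, t - d)
        d += step
    return worst

print(horizontal_deviation(alpha, beta))  # 4.5 on this grid
```

The first two-unit burst must wait out the 3-unit latency plus service, which is what the computed maximum reflects.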

 

 

66

Slide67

Network Calculus

67

Slide68

68

(Diagram: applications mapped onto CPUs.)

Slide69

Real-Time Background

Explicit timing constraints

Finish computation before a deadline

Retrieve sensor reading every 5 msecs

Display image every 1/30th of a second

Schedule (online) access to resources to meet timing constraints

Schedulability analysis (offline)

Abstract models

Workloads

Resources

Scheduling algorithm

(Diagram: applications App 1 through App n.)

69

Slide70

Current Research: Analyzing CPU Time for I/Os

Applications demand CPU time

Measure the interference

Ratio of max demand to interval length defines load

Schedulability (fixed-task priority)

Characterize I/O CPU time in terms of a load function
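One way to sketch this is a standard fixed-priority response-time iteration with the I/O CPU time folded in as extra interference. Here τ1 = (WCET 2, period 10) matches the later example slide, but the linear I/O load bound is an assumed placeholder for a measured load function:

```python
import math

def io_demand(t):
    """Assumed upper bound on OS/I/O CPU time consumed in any interval t."""
    return 1.0 + 0.1 * t

def response_time(C, hp_tasks):
    """Iterate R = C + io_demand(R) + sum(ceil(R/T) * C_hp) to a fixed point."""
    R = C
    for _ in range(1000):
        nxt = C + io_demand(R) + sum(math.ceil(R / T) * Cj for Cj, T in hp_tasks)
        if abs(nxt - R) < 1e-9:
            return nxt
        R = nxt
    raise RuntimeError("no convergence: task set may be overloaded")

# A task with WCET 4 interfered with by tau1 = (2, 10) and the I/O demand.
print(round(response_time(4.0, [(2.0, 10.0)]), 3))  # 7.778
```

The task is schedulable if the converged response time is at or below its deadline; dropping the `io_demand` term recovers the textbook analysis.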

 

70

(Diagram: a task under consideration suffering interference from higher-priority tasks.)

Slide71

How to measure load

I/O CPU component at high priority

Measurement task at low priority

71

Slide72

Measured Worst-Case Load

72

Slide73

Analyzing

73

 

(Diagram: the task under consideration suffers interference from higher-priority tasks.)

τ1 is a periodic task (WCET = 2, Period = 10)

Slide74

Bounding

74

Slide75

Adjusting the Interference

May have missed the worst case

CPU time consumed too high

Aperiodic servers

Force the workload into a specific workload model

Example: Sporadic server
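A much-simplified sporadic-server simulation shows how a burst is forced into a budgeted workload model. All parameters are invented, and replenishment here is per unit of work; the actual POSIX replenishment rules are more detailed:

```python
def sporadic_server(arrivals, budget=3, period=10, horizon=30):
    """arrivals maps time -> units of work; returns the times at which each
    unit is served at the server's (high) priority."""
    served, queue, remaining, replenish = [], 0, budget, {}
    for t in range(horizon):
        remaining += replenish.pop(t, 0)   # budget comes back after `period`
        queue += arrivals.get(t, 0)
        if queue and remaining:
            queue -= 1
            remaining -= 1
            served.append(t)
            replenish[t + period] = replenish.get(t + period, 0) + 1
    return served

# A burst of 5 units at t=0: three run immediately on the initial budget,
# the remaining two wait for replenishments.
print(sporadic_server({0: 5}))  # [0, 1, 2, 10, 11]
```

However the input bursts, demand at the server's priority never exceeds `budget` units per `period`, which is exactly the bound the analysis can then assume.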

75

Slide76

Future Research

Combine bounding and accounting

Accounting

Charge the user of services

Cannot always charge the correct account

Bound

Set aside separate account

If exhausted disable I/O until account is replenished

76

Slide77

Future Research: Practicality of Aperiodic Servers

Practical considerations

Is the implementation correct?

Overhead

Context switches

Latency vs. throughput

77

Slide78

Past Research: Throttling

(Diagram: the OS scheduler separating real-time from non-real-time work.)

Slide79

“Amortized” Cost of I/O Operations

WCST(n) << n * WCST(1)

Cost of some ops can be shared amongst requests

Hard disk seek time

Parallel access to flash packages

Improved minimum available resource

79

Slide80

Seek Time Amortization

(Chart slides 80–82.)

Slide83

50 Kbyte Requests

83

Slide84

Example System

Web services

Multimedia

Website

Video surveillance

Receive video

Intrusion detection

Recording

Playback

Local network

Internet

84

(Diagram: all-in-one server with CPU and network.)

How do we make the system work?