
Distributed Systems CS 15-440 - PowerPoint Presentation

Uploaded On 2019-02-14


Caching, Part I. Lecture 15, November 1, 2017. Mohammad Hammoud. Last Lecture: Pregel & GraphLab. Today's Lecture: Latency and Bandwidth; Introduction to Caching; Announcements.



Presentation Transcript

Slide1

Distributed Systems CS 15-440

Caching – Part I

Lecture 15, November 1, 2017

Mohammad Hammoud

Slide2

Today…

Last Lecture: Pregel & GraphLab

Today’s Lecture:
  Latency and Bandwidth
  Introduction to Caching

Announcements:
  PS4 is due today by midnight
  P3 is due on Nov 12th by midnight
  Quiz II is on Nov 16 (during the recitation time)

Slide3

Latency and Bandwidth

Latency and bandwidth are partially intertwined

If bandwidth is saturated
  Congestion occurs and latency increases

If bandwidth is not at peak
  Congestion will not occur, but latency will NOT decrease
  E.g., sending a single bit on a non-congested 50Mbps medium is not going to be much faster than sending 32KB when the link's latency dominates the transfer time

Bandwidth can be easily increased, but it is inherently hard to decrease latency!

Slide4
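As a rough illustration of the preceding slide's example, the time to deliver a message can be modeled as a fixed latency plus the transmission time (size divided by bandwidth). This is a minimal sketch: the 50 Mbps figure comes from the slide, while the 50 ms one-way latency and the function name `transfer_time_s` are assumptions made only for this example.

```python
def transfer_time_s(size_bits, bandwidth_bps, latency_s):
    """Delivery time = fixed latency + time to push the bits onto the medium."""
    return latency_s + size_bits / bandwidth_bps


BANDWIDTH_BPS = 50_000_000  # 50 Mbps, as in the slide
LATENCY_S = 0.050           # assumed 50 ms one-way latency (not from the slide)
SIZE_32KB_BITS = 32 * 1024 * 8

one_bit = transfer_time_s(1, BANDWIDTH_BPS, LATENCY_S)
block_32kb = transfer_time_s(SIZE_32KB_BITS, BANDWIDTH_BPS, LATENCY_S)
# Doubling the bandwidth only shrinks the transmission term, not the latency.
block_32kb_fast = transfer_time_s(SIZE_32KB_BITS, 2 * BANDWIDTH_BPS, LATENCY_S)

print(f"1 bit:              {one_bit * 1000:.2f} ms")
print(f"32KB:               {block_32kb * 1000:.2f} ms")
print(f"32KB, 2x bandwidth: {block_32kb_fast * 1000:.2f} ms")
```

Under these assumed numbers, both transfers take roughly 50 to 55 ms, and doubling the bandwidth cannot push the total below the 50 ms latency floor. On a link with much lower latency, the transmission term would matter more.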

Latency and Bandwidth

In reality, latency is the killer, not bandwidth

Bandwidth can be improved through redundancy
  E.g., more pipes, fatter pipes, more lanes on a highway, more clerks at a store, etc.
  It costs money, but it is not fundamentally difficult

Latency is much harder to improve
  Typically, it requires deep structural changes
  E.g., shorten distance, reduce path length, etc.

How can we reduce latency in distributed systems?

Slide5

Replication and Caching

One way to reduce latency is to use replication and caching

What is replication?
  Replication is the process of maintaining several copies of data at multiple locations
  Afterwards, a client can access the replicated copy that is nearest to it, potentially saving latency

What is caching?
  Caching is a special kind of client-controlled replication
  In particular, client-side replication is referred to as caching

Slide6

Replication and Caching

Example Applications
  Caching webpages at the client browser
  Caching IP addresses at clients and DNS Name Servers
  Replication in Content Delivery Networks (CDNs)
    Commonly accessed contents, such as software and streaming media, are cached at various network locations

[Figure: a Main Server and Replicated Servers]

Slide7

Dilemma

CDNs address a major dilemma

Businesses want to know your every click and keystroke
  This is to maintain deep, intimate knowledge of clients

Client-side caching hides this knowledge from servers
  So, servers mark pages as “uncacheable”
  This is often a lie, because the content is actually cacheable

But the lack of caching hurts latency and, subsequently, user experience!

Can businesses benefit from caching without giving up control?

Slide8
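The preceding slide does not say how a server marks a page as uncacheable. On the web, one common mechanism is the HTTP `Cache-Control` response header with the `no-store` directive. The sketch below assumes that mechanism; the helper name `is_marked_uncacheable` and the plain-dict representation of headers are hypothetical simplifications (real header names are case-insensitive).

```python
def is_marked_uncacheable(headers):
    """Return True if the response headers carry Cache-Control: no-store."""
    value = headers.get("Cache-Control", "")
    # Split "a, b=1, c" into directive names, ignoring any "=value" part.
    directives = {
        part.strip().lower().split("=")[0]
        for part in value.split(",")
        if part.strip()
    }
    return "no-store" in directives


# A response the server declares uncacheable, even if its content rarely changes.
print(is_marked_uncacheable({"Cache-Control": "no-store"}))
# A response the client is allowed to keep for an hour.
print(is_marked_uncacheable({"Cache-Control": "public, max-age=3600"}))
```

A client cache that honors this marking has to return to the server on every request, which lets the server observe each access at the cost of extra latency.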

CDNs: A Solution to this Dilemma

Third-party caching sites (or providers) provide hosting services, which are trusted by businesses

A provider owns a collection of servers across the Internet
  Typically, its hosting service can dynamically replicate files on different servers
  E.g., based on the popularity of a file in a region

Examples:
  Akamai (which pioneered CDN in the late 1990s)
  Amazon CloudFront CDN
  Windows Azure CDN

Slide9

CDNs: A Solution to this Dilemma

Slide10

Client- vs. Server-side Replication

Would replication help if clients perform non-overlapping requests to data objects?
  Yes, through client-side caching

[Figure, animated across Slides 11-14 with the same title and text: a Server holding objects O0, O1, O2, and O3, with Client 1 and Client 2; from Slide 13 onward, copies of O0 and O1 also appear]

Slide15

Client- vs. Server-side Replication

Would replication help if clients perform overlapping requests to data objects?
  Yes, through server-side replication

[Figure, animated across Slides 16-25 with the same title and text: a Server holding objects O0, O1, O2, and O3, a Proxy, Client 1, and Client 2; from Slide 19 onward, a copy of O0 also appears]

Slide26

Client- vs. Server-side Replication

Would combined client- and server-side replication help if clients perform overlapping requests to data objects?
  Yes

[Figure, animated across Slides 27-28 with the same title and text: a Server holding objects O0, O1, O2, and O3, a Proxy, Client 1, and Client 2, with a copy of O0; Slide 28 adds two further copies of O0]

Slide29
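The client-side versus server-side comparison on the preceding slides can be sketched as a small simulation that counts how many requests reach the server and the proxy. This is only an illustrative model: the classes `Server` and `Cache`, the function `run`, the unbounded caches, and the two-requests-per-client workload are assumptions, not something given in the slides.

```python
class Server:
    """Origin server holding objects O0..O3; counts every fetch it serves."""

    def __init__(self):
        self.objects = {f"O{i}": f"data{i}" for i in range(4)}
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return self.objects[key]


class Cache:
    """Unbounded cache in front of an upstream (the server or another cache)."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}
        self.lookups = 0

    def get(self, key):
        self.lookups += 1
        if key not in self.store:          # miss: go upstream once
            self.store[key] = self.upstream.get(key)
        return self.store[key]


def run(requests_by_client, use_proxy, use_client_cache):
    """Return (server fetches, proxy lookups) for the given workload."""
    server = Server()
    proxy = Cache(server) if use_proxy else None
    shared = proxy if use_proxy else server
    for requests in requests_by_client:
        front = Cache(shared) if use_client_cache else shared
        for key in requests:
            front.get(key)
    return server.fetches, (proxy.lookups if proxy else 0)


overlapping = [["O0", "O0"], ["O0", "O0"]]      # both clients want O0
non_overlapping = [["O0", "O0"], ["O1", "O1"]]  # each client wants its own object

print(run(overlapping, use_proxy=False, use_client_cache=False))
print(run(overlapping, use_proxy=False, use_client_cache=True))
print(run(overlapping, use_proxy=True, use_client_cache=False))
print(run(overlapping, use_proxy=True, use_client_cache=True))
```

In this model, a shared proxy is what removes the duplicate server fetch for overlapping requests, while client caches keep repeated requests from leaving the client at all. Combining the two reduces both the server load and the proxy load.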

Caching

We will focus first on caching, then on replication

The basic idea of caching (it is very simple):
  A data object is stored far away
  A client needs to make multiple references to that object
  A copy (or a replica) of that object can be created and stored nearby
  The client can transparently access the replica instead

Local storage used for client-side replicas is referred to as a “cache”

Slide30
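The "transparent" access described on the preceding slide can be sketched with Python's standard `functools.lru_cache` decorator: the caller uses the same function call whether the object comes from the far-away store or from the local copy. The names `fetch_remote` and `fetch`, and the list used to record remote trips, are hypothetical and exist only for this example.

```python
import functools

remote_calls = []  # records each trip to the (hypothetical) far-away store


def fetch_remote(name):
    """Stand-in for an expensive access to a distant data object."""
    remote_calls.append(name)
    return f"contents of {name}"


@functools.lru_cache(maxsize=None)  # unbounded local cache of replicas
def fetch(name):
    """Same interface as fetch_remote; repeated calls are served locally."""
    return fetch_remote(name)


# The client makes multiple references to the same object.
for _ in range(3):
    value = fetch("O0")

info = fetch.cache_info()
print(value, len(remote_calls), info.hits, info.misses)
```

Only the first reference travels to the remote store; the later ones are served from the local replica without any change to the calling code.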

Simple Cache Metrics

References = Number of attempts to find an object in a cache
Hits = Number of successes
Misses = Number of failures
Miss Ratio = Misses / References
Hit Ratio = Hits / References = (1 − Miss Ratio)
Expected Cost of a Reference = (Miss Ratio × Cost of Miss) + (Hit Ratio × Cost of Hit)
Cache Advantage = (Cost of Miss / Cost of Hit)

Where cost is measured in time delay to access an object

Slide31
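The metrics on the preceding slide translate directly into a few small functions. The sample numbers below (90 hits, 10 misses, a 100 ms miss, and a 1 ms hit) are made up purely for illustration.

```python
def miss_ratio(hits, misses):
    """Misses / References, where References = Hits + Misses."""
    return misses / (hits + misses)


def expected_cost(miss_ratio_value, cost_of_miss, cost_of_hit):
    """(Miss Ratio x Cost of Miss) + (Hit Ratio x Cost of Hit)."""
    hit_ratio_value = 1 - miss_ratio_value
    return miss_ratio_value * cost_of_miss + hit_ratio_value * cost_of_hit


def cache_advantage(cost_of_miss, cost_of_hit):
    """How many times cheaper a hit is than a miss."""
    return cost_of_miss / cost_of_hit


# Assumed example values; costs are time delays in milliseconds.
mr = miss_ratio(hits=90, misses=10)
cost_ms = expected_cost(mr, cost_of_miss=100.0, cost_of_hit=1.0)
advantage = cache_advantage(cost_of_miss=100.0, cost_of_hit=1.0)
print(mr, cost_ms, advantage)
```

With these assumed values, the miss ratio is 0.1 and the expected cost is about 10.9 ms per reference: even a 10% miss ratio leaves the misses accounting for most of the average cost.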

Why is Caching Effective?

Applications tend to reuse data they accessed recently
  Referred to as the principle of locality

Two different types of locality:
  Temporal locality
    Recently accessed objects are likely to be accessed again
  Spatial locality
    Objects that are near one another are likely to be accessed successively

Slide32

Why is Caching Effective?

The principle of locality enables:
  Effective caching
  Prefetching
    I.e., fetching an object that is likely to be requested before it is actually requested, thus resulting in a cache hit when it is requested
    Enabled especially by spatial locality

Applications with minimal or no data reuse (e.g., a streaming application such as video streaming) do not benefit from caching
  They may, though, benefit from prefetching

Slide33
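The preceding slide's point about streaming workloads can be illustrated with a toy sequential prefetcher. This is a sketch under stated assumptions: blocks are numbered consecutively, each block is read exactly once (no reuse), the cache is unbounded, and the name `scan_hits` and the prefetch depths are hypothetical.

```python
def scan_hits(num_blocks, prefetch_depth):
    """Count cache hits for one linear scan over blocks 0..num_blocks-1.

    On a miss, the requested block is fetched together with the next
    prefetch_depth blocks, exploiting spatial locality.
    """
    cache = set()
    hits = 0
    for block in range(num_blocks):
        if block in cache:
            hits += 1
        else:
            cache.update(range(block, block + prefetch_depth + 1))
    return hits


print(scan_hits(8, prefetch_depth=0))  # plain caching, no reuse
print(scan_hits(8, prefetch_depth=1))
print(scan_hits(8, prefetch_depth=3))
```

With no prefetching, a single pass never hits, because nothing is referenced twice. With prefetching, most references become hits even though no block is ever reused.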

Temporal and Spatial Localities

Temporal and spatial localities are very different
  Caching implementations often tightly combine them
  One can exist without the other
    Spatial without temporal (e.g., linear scan of a huge file)
    Temporal without spatial (e.g., tight loop accessing just one object)

Example: “rm -f *”
  The shell expands “*” into a list
  The loop iterates through the list
    stat object
    unlink object
  The parent directory exhibits temporal locality
  The directory entries exhibit spatial locality

Slide34
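The “rm -f *” example from the preceding slide can be approximated in Python. This is a simplified sketch, not how `rm` is implemented: `os.listdir` stands in for the shell's expansion of “*” (unlike the shell, it would also return dot-files), the function name `remove_all` is hypothetical, and the files live in a throwaway temporary directory.

```python
import os
import tempfile


def remove_all(directory):
    """Stat and unlink every entry of a directory, one after another."""
    names = sorted(os.listdir(directory))  # stands in for expanding "*"
    for name in names:
        # The same parent directory is consulted on every iteration
        # (temporal locality); neighboring entries are visited in turn
        # (spatial locality).
        path = os.path.join(directory, name)
        os.stat(path)
        os.unlink(path)
    return names


with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        open(os.path.join(tmp, f"file{i}.txt"), "w").close()
    removed = remove_all(tmp)
    remaining = os.listdir(tmp)

print(removed, remaining)
```

A cache that keeps the parent directory's data benefits every iteration of the loop, while fetching adjacent directory entries together helps each subsequent `stat`.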

Three Key Questions

What data should be cached and when?
  Fetching Policy

How can updates be made visible everywhere?
  Consistency or Update Propagation Policy

What data should be evicted to free up space?
  Cache Replacement Policy

Slide35
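To make two of the three questions on the preceding slide concrete, the sketch below pairs an on-demand fetching policy with a least-recently-used (LRU) replacement policy. The slides do not name specific policies at this point, so both choices are assumptions for illustration, as are the class name `LRUCache` and the use of `str.upper` as a stand-in fetch function. Consistency (update propagation) is not modeled here.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache: fetch on demand, evict the least recently used."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # called only on a miss
        self.entries = OrderedDict()  # oldest entry first

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = self.fetch(key)            # fetching policy: on demand
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # replacement policy: evict LRU
        return value


cache = LRUCache(capacity=2, fetch=str.upper)
for key in ["a", "b", "a", "c"]:
    cache.get(key)

print(list(cache.entries))
```

After the four references, "b" has been evicted because "a" was touched more recently, leaving "a" and "c" in the cache.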

Next Class

Continue with Caching…