Presentation Transcript

Distributed Systems
CS 15-440
Caching – Part II
Lecture 21, November 28, 2018
Mohammad Hammoud

Today…
- Last Lecture: One-copy semantics and cache consistency
- Today’s Lecture: Continue with cache consistency
- Announcements:
  - Project 4 will be out today. It is due on Dec 12.
  - PS5 is out. It is due on Dec 06.

Key Questions
- What data should be cached and when? Fetching Policy
- How can updates be made visible everywhere? Consistency or Update Propagation Policy
- What data should be evicted to free up space? Cache Replacement Policy

Cache Consistency Approaches
We will study 7 cache consistency approaches:
1. Broadcast Invalidations
2. Check on Use
3. Callback
4. Leases
5. Skip Scary Parts
6. Faith-Based Caching
7. Pass the Buck

Check on Use
- The server does not invalidate cached copies upon updates
- Rather, a requestor at any site checks with the server before using any object
- Versioning can be used, wherein each copy of a file is given a version number
  - Is my copy still valid?
  - If no, fetch a new copy of the object
  - If yes and I am a reader, proceed
  - If yes and I am a writer, proceed and write back when done
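The versioned check-on-use protocol above can be sketched in a few lines. This is a hypothetical in-process model (the class and method names are illustrative, not from the lecture, and network messages are replaced by direct method calls):

```python
class Server:
    def __init__(self):
        self.versions = {}   # object id -> current version number
        self.data = {}       # object id -> contents

    def is_valid(self, obj_id, version):
        # "Is my copy still valid?"
        return self.versions.get(obj_id) == version

    def fetch(self, obj_id):
        return self.versions[obj_id], self.data[obj_id]

    def write_back(self, obj_id, contents):
        # Each update bumps the version number.
        self.versions[obj_id] = self.versions.get(obj_id, 0) + 1
        self.data[obj_id] = contents


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # obj_id -> (version, contents)

    def open(self, obj_id):
        # Check with the server before using the cached copy.
        cached = self.cache.get(obj_id)
        if cached is None or not self.server.is_valid(obj_id, cached[0]):
            self.cache[obj_id] = self.server.fetch(obj_id)  # refetch a new copy
        return self.cache[obj_id][1]

    def close(self, obj_id, contents):
        # A writer writes back when done; only then does the server see the update.
        self.server.write_back(obj_id, contents)
        self.cache[obj_id] = (self.server.versions[obj_id], contents)
```

Note that the server keeps no record of who is caching what; the validity check happens entirely on the client's initiative, which is exactly why "up-to-date" here is only as fresh as the last check.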

Check on Use
- Has to be done at coarse granularity (e.g., entire file or large blocks)
  - Otherwise, reads are slowed down excessively
- It results in session semantics if done at whole-file granularity
  - Open {Read | Write}* Close = a “session”
  - Updates on an open file are initially visible only to the updater of the file
  - Only when the file is closed are the changes made visible to the server

Check on Use
Disadvantages:
- “Up-to-date” is relative to network latency
  - Example: Client 1 and Client 2 each cache file F, and each asks the server, “Is the version of file F still X?”
  - The server answers YES to both, so each client proceeds to update its copy of F
  - Both clients then write back F: concurrent updates!

Check on Use
Disadvantages:
- How to handle concurrent writes? The final result depends on whose write-back arrives last at the server
  - This is impacted by network latency
  - Even if updates A and B are exactly the same, the machines pursuing them are homogeneous, and A is started, finished, and sent before B, it is not guaranteed that A will reach the server before B
- Slow reads
  - Especially with loaded servers and high-latency networks

Check on Use
Disadvantages:
- Pessimistic approach, especially with read-mostly workloads
  - Can we employ an optimistic (or trust-and-verify) approach?
Advantages:
- Strict consistency (though not across all copies) at coarse granularity
- No special server state is needed
  - Servers do not need to know anything about caching sites
- Easy to implement


Callback
- Reads on cached objects can proceed directly
- A write goes as follows (say Clients 1, 2, and 3 cache (F1), (F1, F2), and (F3) respectively):
  - Client 1, needing to write on F1, asks the server
  - The server sends “Invalidate F1” to Client 2 (the other site caching F1) and waits for its ack; Client 2 now caches only (F2)
  - The server tells Client 1 “Go Ahead”
  - Client 1 writes on F1 and writes back F1 to the server

Callback
- The server maintains a directory that keeps track of who is currently caching every object
- Thus, upon an update to an object, the server sends invalidation messages (i.e., callbacks) only to the sites that are currently caching the object
- Typically done at coarse granularity (e.g., entire file)
  - Can be made to work with byte ranges
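The directory-plus-callbacks scheme can be sketched as follows. Again this is an illustrative in-process model (hypothetical names; callbacks are modeled as direct method calls on client objects, and acks are implicit in the call returning):

```python
class CallbackServer:
    def __init__(self):
        self.data = {}
        self.directory = {}   # obj_id -> set of client objects caching it

    def read(self, client, obj_id):
        # Record the caching site in the directory before handing out a copy.
        self.directory.setdefault(obj_id, set()).add(client)
        return self.data[obj_id]

    def write(self, writer, obj_id, contents):
        # Callback: invalidate only the sites currently caching this object.
        for client in self.directory.get(obj_id, set()) - {writer}:
            client.invalidate(obj_id)
        self.directory[obj_id] = {writer}
        self.data[obj_id] = contents


class CallbackClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, obj_id):
        # Reads on cached objects proceed directly: zero network traffic.
        if obj_id not in self.cache:
            self.cache[obj_id] = self.server.read(self, obj_id)
        return self.cache[obj_id]

    def invalidate(self, obj_id):
        # Callback handler: drop the now-stale copy.
        self.cache.pop(obj_id, None)

    def write(self, obj_id, contents):
        self.server.write(self, obj_id, contents)
        self.cache[obj_id] = contents
```

The directory is exactly the "sizable state on the server" listed among the disadvantages below: it grows with the number of (object, caching site) pairs.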

Callback
Advantages:
- Targeted notification of caching sites
- Zero network traffic for reads of cached objects
  - Favors read performance over write performance
- Excellent scalability, especially with read-mostly workloads

Callback
Disadvantages:
- Complexity of tracking cached objects on clients
- Sizable state on the server
- Silence at the server is ambiguous for clients
  - What if a client has been reading a file for a while without hearing from the server? Perhaps the server is down
  - A keep-alive (or heartbeat) mechanism can be incorporated, whereby the server pings the clients (or the other way around) every now and then to indicate that it is still alive


Leases
- A client places a request to obtain finite-duration control from the server
  - This duration is called a lease period (typically a few seconds)
- There are three types of leases:
  - Read and write leases, assuming an invalidation-based protocol
    - Multiple requestors can obtain read leases on the same object, but only one can get a write lease on any object
  - Open leases, assuming a check-on-use protocol
- A requestor loses control when its lease expires
  - However, it can renew the lease if needed
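A minimal sketch of read/write lease granting under an invalidation-based protocol: many readers may hold leases on an object at once, but a write lease is exclusive. `LeaseServer` and its bookkeeping are hypothetical (expired leases are simply filtered out on each request rather than actively revoked):

```python
import time

class LeaseServer:
    def __init__(self, lease_period=2.0):
        self.lease_period = lease_period   # lease duration in seconds
        self.leases = {}                   # obj_id -> list of (client, kind, expiry)

    def _active(self, obj_id, now):
        # Leases that have not yet expired; expiry is enforced by the clock,
        # so clocks at the involved machines are assumed synchronized.
        return [(c, k, e) for (c, k, e) in self.leases.get(obj_id, []) if e > now]

    def request(self, client, obj_id, kind):
        """Grant a 'read' or 'write' lease, or return None if it conflicts."""
        now = time.monotonic()
        active = self._active(obj_id, now)
        if kind == "write" and active:
            return None     # a writer must wait until every live lease expires
        if kind == "read" and any(k == "write" for _, k, _ in active):
            return None     # readers are blocked while a write lease is live
        expiry = now + self.lease_period
        active.append((client, kind, expiry))
        self.leases[obj_id] = active
        return expiry       # the client may use the object until this time
```

Renewal falls out for free: a renew is just another `request` issued before (or after) the old lease expires.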

Lease Renewal
Example (clocks at the involved machines are assumed to be synchronized):
- Client 1, caching F: “Give me a read lease for time X on F”
- Server: “Sorry, I can give you a read lease for time Y on F”
- Client 1 reads F for duration Y
- Client 1: “Renew my read lease for another time Y on F”
- Server: “Okay, you got an extension for Y time on your read lease over F”
- Client 1 reads F for duration Y

Leases
A write goes as follows, assuming an invalidation-based protocol:
- Notation: [Fi, <x, y>] means file Fi is cached at Client x and is either not leased (y = nl), read-leased (y = rl), or write-leased (y = wl)
- Initial server state: ([F1, <1, nl>, <2, nl>], [F2, <2, nl>], [F3, <3, nl>]), with Clients 1, 2, and 3 caching (F1), (F1, F2), and (F3)
- Client 1, needing to write on F1 for time t, asks the server
- The server sends “Invalidate F1” to Client 2 and receives its ack; Client 2 now caches only (F2), and the state becomes ([F1, <1, wl>], [F2, <2, nl>], [F3, <3, nl>])
- The server tells Client 1 “Go Ahead”; Client 1 writes on F1 and then writes back F1
- After the write-back, the state becomes ([F1, <1, nl>], [F2, <2, nl>], [F3, <3, nl>])

Leases
- What if a write request arrives at the server in between?
  - It can be queued until the previous request is satisfied
  - Only one write can go at a time; multiple requests can be queued and serviced in a specific order (e.g., FIFO order)
  - When serviced, the up-to-date copy has to be shipped to the requesting site (as its copy was invalidated before the previous write was allowed to proceed)
- What if a read request arrives at the server in between?
  - It can be queued as well
  - After the write is done, either another write is pursued singlehandedly, or one or more reads go in parallel
  - In any case, the up-to-date copy has to be shipped as well
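The FIFO queuing of conflicting write requests described above can be sketched as a tiny queue at the server; `WriteQueue` and its method names are illustrative, and shipping the up-to-date copy is left as a comment:

```python
from collections import deque

class WriteQueue:
    """Hypothetical per-object queue of pending write requests."""

    def __init__(self):
        self.pending = deque()        # writers waiting their turn, FIFO
        self.current_writer = None    # the one writer allowed to proceed

    def request_write(self, client):
        if self.current_writer is None:
            self.current_writer = client
            return True               # "Go Ahead" immediately
        self.pending.append(client)
        return False                  # queued behind the active writer

    def finish_write(self):
        # The active write is done; service the next request in FIFO order.
        # At this point the server would also ship the up-to-date copy to
        # the next writer's site, since its cached copy was invalidated.
        self.current_writer = self.pending.popleft() if self.pending else None
        return self.current_writer
```

A real server would keep one such queue per object and would mix queued reads in (letting several proceed in parallel once a write completes), but the ordering discipline is the same.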

Leases
An open goes as follows, assuming session semantics:
- Notation: [Fi, <x, y>] means file Fi is cached at Client x and either has its lease expired (y = E) or is valid until the end of y
- Initial server state: ([F1, <2, t’>], [F2, <2, t’’>], [F3, <3, t>]), with Clients 1, 2, and 3 caching (F1), (F1, F2), and (F3)
- Client 1 opens F1 for time t’’’; the server says “Go Ahead”, and the state becomes ([F1, <2, t’>, <1, t’’’>], [F2, <2, t’’>], [F3, <3, t>])
- Client 1 writes on F1 for time t’’’ and then writes back F1
- Case t’ > t’’’: Client 2’s lease on F1 is still valid when the write-back arrives, so the server pushes the new value of F1 to Client 2; the state becomes ([F1, <2, t’>, <1, E>], [F2, <2, t’’>], [F3, <3, t>])
- Client 2 can see the up-to-date F1 without polling the server

Leases
An open goes as follows, assuming session semantics (continued):
- Same scenario, but with t’ < t’’’: by the time Client 1 writes back F1, Client 2’s lease on F1 has already expired, and the state becomes ([F1, <2, E>, <1, E>], [F2, <2, t’’>], [F3, <3, t>])
- The server does not push the new value
- Client 2 does NOT see the up-to-date F1 (it can pull it after t’ expires)

Leases
In this case:
- A lease becomes a promise by the server that it will push updates to a client for a specified time (i.e., the lease duration)
- When a lease expires, the client is forced to poll the server for updates and pull the modified data if necessary
- The client can also renew its lease and again get updates pushed to its site for the new lease duration
- Flexibility in choices!

Leases
Advantages:
- Generalizes the check-on-use and callback schemes
- Lease duration can be tuned to adapt to the mutation rate
  - It is a clean tuning knob for design flexibility
- Conceptually simple, yet flexible

Leases
Disadvantages:
- The lease-holder has total autonomy during the lease
  - Load/priorities can change at the server
  - Revocation (where a lease is withdrawn by the server from the lease-holder) can be incorporated
- In an invalidation-based, lease-based protocol:
  - Writes on an object are delayed until all read leases on that object have expired
- Keep-alive callbacks are needed
- Stateful server, which typically implies inferior fault tolerance and scalability (in terms of capacity and communication)


Skip Scary Parts
Basic Idea:
- When write-sharing is detected, caching is turned off
- Afterwards, all references go directly to the master copy
- Caching is resumed when write-sharing ends
Advantages:
- Precise single-copy semantics (even at byte-level consistency)
- Excellent fallback strategy
  - Exemplifies good engineering: “Handle the average case well; the worst case safely”
- Good adaptation of caching aggressiveness to workload characteristics (i.e., patterns of reads and writes)
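The write-sharing detection that drives this scheme can be sketched as follows; `SkipScaryServer` and its "write-sharing" test are a simplified, hypothetical model (a file is write-shared when someone is writing it while any other site is also using it):

```python
class SkipScaryServer:
    def __init__(self):
        self.readers = {}   # obj_id -> set of sites with the object open for read
        self.writers = {}   # obj_id -> set of sites with the object open for write

    def caching_allowed(self, obj_id):
        w = self.writers.get(obj_id, set())
        r = self.readers.get(obj_id, set())
        # Write-sharing: a writer exists alongside another writer or reader.
        # While write-shared, caching is off and all references go to the
        # master copy; otherwise caching is allowed.
        return not (w and (len(w) > 1 or (r - w)))

    def open(self, site, obj_id, mode):
        # Returns whether the opener may cache the object.
        table = self.writers if mode == "write" else self.readers
        table.setdefault(obj_id, set()).add(site)
        return self.caching_allowed(obj_id)

    def close(self, site, obj_id):
        # When sharing ends, caching_allowed() becomes True again.
        self.readers.get(obj_id, set()).discard(site)
        self.writers.get(obj_id, set()).discard(site)
```

Note this presumes the server learns about every open and close, which is precisely the "server needs to be aware of every use of data" disadvantage discussed next.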

Skip Scary Parts
Disadvantages:
- The server needs to be aware of every use of data
  - Assuming it is used in conjunction with check-on-use:
    - Either clients declare their intent to write upon opening files
    - Or the server relies on clients’ write-backs upon closing files (which indicate writes on files)
- The server maintains some monitoring state


A Primer: Eventual Consistency
- Many applications can tolerate inconsistency for a long time
  - E.g., webpage updates; Web search (crawling, indexing, and ranking); updates to a DNS server
- In such applications, it is acceptable and efficient if updates are propagated infrequently
- A caching scheme is termed eventually consistent if:
  - All replicas gradually become consistent in the absence of updates

A Primer: Eventual Consistency
Caching schemes typically apply eventual consistency if:
- Write-write conflicts are rare
  - It is very rare for two processes to write to the same object
  - Generally, one client updates the data object
    - E.g., one DNS server updates the name-to-IP mappings
  - Rare conflicts can be handled through simple mechanisms, such as mutual exclusion
- Read-write conflicts are more frequent
  - Conflicts where one process is reading an object while another process is writing (or attempting to write) to a replica of it
  - Eventually consistent schemes have to focus on efficiently resolving these conflicts

Faith-Based Caching
Basic Idea (an implementation of eventual consistency):
- A client blindly assumes cached data is valid for a while
  - Referred to as the “trust period”
  - E.g., in Sun NFSv3, cached files are assumed current for 3 seconds, and directories for 30 seconds
  - A small variant is to set a time-to-live (TTL) field for each object
- The client periodically checks (based on time since last check) the validity of cached data
- No communication occurs during the trust period
Advantages:
- Simple implementation
- The server is stateless
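A trust-period cache is only a timestamp check away from a plain cache. The sketch below is hypothetical (illustrative names; `fetch` stands in for the network round trip to the server, and the default trust period echoes NFSv3's 3 seconds for files):

```python
import time

class FaithBasedCache:
    def __init__(self, fetch, trust_period=3.0):
        self.fetch = fetch            # obj_id -> contents (the "server")
        self.trust_period = trust_period
        self.entries = {}             # obj_id -> (contents, time fetched)
        self.validations = 0          # how often we actually contacted the server

    def get(self, obj_id):
        now = time.monotonic()
        entry = self.entries.get(obj_id)
        if entry is not None and now - entry[1] < self.trust_period:
            return entry[0]           # blindly trusted: no communication at all
        # Trust expired (or first access): go back to the source.
        self.validations += 1
        contents = self.fetch(obj_id)
        self.entries[obj_id] = (contents, now)
        return contents
```

Because validity is purely a local clock comparison, the server keeps no state, which is exactly the advantage listed above; the cost is that a client can serve data up to one trust period stale.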

Faith-Based Caching
Disadvantages:
- Potential user-visible inconsistencies when a client accesses data from different replicas
  - E.g., after an update event on Webpage-A, some replicas serve the new Webpage-A while others still serve the old one
- Consistency guarantees are typically needed for a single client while accessing cached copies (e.g., read-your-own-writes)
- This becomes more of a consistency problem for server-side replication (we will discuss it later under server-side replication)


Pass the Buck
Basic Idea (another implementation of eventual consistency):
- Let the user trigger cache revalidation (hit “reload”)
- Otherwise, all cached copies are assumed valid
- Equivalent to infinite-TTL faith-based caching
Advantages:
- Simple implementation
- Avoids frivolous cache-maintenance traffic
- The server is stateless
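The "equivalent to infinite-TTL faith-based caching" claim is easy to see in code: drop the timestamp check and expose revalidation as an explicit, user-triggered operation. A minimal sketch (hypothetical names; `fetch` again stands in for the server round trip):

```python
class PassTheBuckCache:
    def __init__(self, fetch):
        self.fetch = fetch      # obj_id -> contents (the "server")
        self.entries = {}

    def get(self, obj_id):
        # Cached copies are assumed valid forever: no automatic revalidation.
        if obj_id not in self.entries:
            self.entries[obj_id] = self.fetch(obj_id)
        return self.entries[obj_id]

    def reload(self, obj_id):
        # The user hit "reload": the only way a stale copy is ever refreshed.
        self.entries[obj_id] = self.fetch(obj_id)
        return self.entries[obj_id]
```

The design choice is visible in the API itself: `reload` is a separate call the user must make, which is exactly why scripts and programs (with no user to hit reload) fare poorly under this scheme.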

Pass the Buck
Disadvantages:
- Places the burden on users
  - Users may be clueless about the levels of consistency needed
- Assumes the existence of users
  - A pain for scripts/programs that write

Cache Consistency Approaches
We will study 7 cache consistency approaches:
1. Broadcast Invalidations
2. Check on Use
3. Callback
4. Leases
5. Skip Scary Parts
6. Faith-Based Caching
7. Pass the Buck
Many minor variants over the years, but these have withstood the test of time!

Three Key Questions
- What data should be cached and when? Fetching Policy
- How can updates be made visible everywhere? Consistency or Update Propagation Policy
- What data should be evicted to free up space? Cache Replacement Policy

Next Class
- Discuss cache replacement policies