Slide1
Thialfi: A Client Notification Service for Internet-Scale Applications
Atul Adya, Gregory Cooper, Daniel Myers, Michael Piatek
Google Seattle
Slide2
A Case for Notifications
Problem:
Ensuring cached data is fresh across users and devices
Slide3
Common Application Patterns
Clients poll to detect changes
  Simple and reliable, but slow and inefficient
Push updates to the client
  Fast but complex
  Add backup polling to get reliability
  Tail latencies can be high: masks bugs
  Application-specific protocol
sacrifice reliability
Slide4
Our Solution: Thialfi
Scalable: tracks millions of clients and objects
Fast: notifies clients in less than a second
Reliable: even when entire data centers fail
Easy to use: deployed in Chrome Sync, Contacts, Google Plus
Slide5
Talk Outline
Thialfi’s abstraction: reliable signaling
Delivering notifications in the common case
Detecting and recovering from failures
Evaluation and experience
Slide6
Thialfi Overview
[Diagram: Clients C1 and C2 each run the Thialfi client library and send Register X to the Thialfi Service in the data center, which records X: C1, C2. An Update X reaches the application backend and is passed on to the Thialfi Service, which sends Notify X to both clients.]
Slide7
Thialfi Abstraction
Objects have unique IDs and version numbers, monotonically increasing on every update
Delivery guarantee
  Registered clients learn latest version number
  Reliable signal only: cached object ID X at version Y
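A minimal sketch of how a client might consume such a signal, assuming the application already has a polling path to its own backend. The `SignalCache` class and the `fetch` hook are hypothetical names for illustration and are not part of Thialfi's client library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class SignalCache:
    """Client-side cache driven by (object ID, version) signals."""

    # Hypothetical hook: returns the latest (version, value) from the app backend.
    fetch: Callable[[str], Tuple[int, Any]]
    entries: Dict[str, Tuple[int, Any]] = field(default_factory=dict)

    def on_notify(self, object_id: str, version: int) -> bool:
        """Refetch only when the signaled version is newer than the cached one."""
        cached = self.entries.get(object_id)
        if cached is not None and cached[0] >= version:
            return False  # duplicate or out-of-date signal: cache is already fresh
        self.entries[object_id] = self.fetch(object_id)
        return True


backend = {"X": (7, "contents of X at v7")}
cache = SignalCache(fetch=lambda object_id: backend[object_id])
refreshed_first = cache.on_notify("X", 7)  # nothing cached yet, so refetch
refreshed_again = cache.on_notify("X", 5)  # late signal for an older version
```

Because versions only increase, comparing the signaled version with the cached one is enough to discard duplicates and reordered signals without buffering any data.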
Slide8
Why Signal, Not Data?
Developers want reliable, in-order data delivery
  Adds complexity to Thialfi and application, e.g.,
    Hard state, arbitrary buffering
    Offline applications flooded with data on wakeup
For most applications, reliable signal is enough
  Invoke polling path on signal: simplifies integration
Slide9
API Without Failure Recovery
Thialfi Service: Publish(objectId, version)
Client Library: Register(objectId), Unregister(objectId), Notify(objectId, version)
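The four calls above can be exercised with a toy in-memory model. This is a sketch under stated assumptions, not Thialfi's implementation: `ToyNotificationService`, `connect`, and `make_client` are hypothetical names, everything runs in one process, and Notify is modeled as a plain callback.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

NotifyFn = Callable[[str, int], None]


class ToyNotificationService:
    """Single-process stand-in for Publish / Register / Unregister / Notify."""

    def __init__(self) -> None:
        self.versions: Dict[str, int] = {}
        self.registrations: Dict[str, Set[str]] = defaultdict(set)
        self.clients: Dict[str, NotifyFn] = {}

    def connect(self, client_id: str, notify: NotifyFn) -> None:
        # Assumption: each client hands the service a Notify callback.
        self.clients[client_id] = notify

    def register(self, client_id: str, object_id: str) -> None:
        self.registrations[object_id].add(client_id)
        # A newly registered client learns the latest known version, if any.
        if object_id in self.versions:
            self.clients[client_id](object_id, self.versions[object_id])

    def unregister(self, client_id: str, object_id: str) -> None:
        self.registrations[object_id].discard(client_id)

    def publish(self, object_id: str, version: int) -> None:
        # Versions increase monotonically, so older publishes are dropped.
        if version <= self.versions.get(object_id, -1):
            return
        self.versions[object_id] = version
        for client_id in sorted(self.registrations[object_id]):
            self.clients[client_id](object_id, version)


received: List[Tuple[str, str, int]] = []


def make_client(name: str) -> NotifyFn:
    return lambda object_id, version: received.append((name, object_id, version))


service = ToyNotificationService()
for name in ("C1", "C2"):
    service.connect(name, make_client(name))
    service.register(name, "X")
service.publish("X", 7)        # both C1 and C2 are notified
service.unregister("C2", "X")
service.publish("X", 8)        # only C1 is notified
```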
Slide10
Talk Outline
Thialfi’s abstraction: reliable signaling
Delivering notifications in the common case
Detecting and recovering from failures
Evaluation and experience
Slide11
Architecture
[Diagram: Within a data center, the client library exchanges registrations, notifications, and acknowledgments with the Registrar, which is backed by the Client Bigtable. The Matcher is backed by the Object Bigtable and receives notifications from the application backend.]
Matcher: Object ID → registered clients, version
Registrar: Client ID → registered objects, notifications
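The two tables hold the same registration relation indexed in two directions. A minimal sketch of that layout, using in-memory dictionaries in place of Bigtable; the row classes, field names, and `add_registration` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class MatcherRow:
    """Object Bigtable row, keyed by object ID."""
    version: Optional[int] = None
    registered_clients: Set[str] = field(default_factory=set)


@dataclass
class RegistrarRow:
    """Client Bigtable row, keyed by client ID."""
    registered_objects: Set[str] = field(default_factory=set)
    pending_notifications: Dict[str, int] = field(default_factory=dict)


object_table: Dict[str, MatcherRow] = {}
client_table: Dict[str, RegistrarRow] = {}


def add_registration(client_id: str, object_id: str) -> None:
    # Record the registration under both keys so either side can look it up.
    client_table.setdefault(client_id, RegistrarRow()).registered_objects.add(object_id)
    object_table.setdefault(object_id, MatcherRow()).registered_clients.add(client_id)


add_registration("C1", "x")
add_registration("C2", "x")
add_registration("C1", "y")
```

Keying by object ID answers "who must be told about x?", while keying by client ID answers "what does C1 care about, and what has it not yet acknowledged?".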
Slide12
Life of a Notification
[Diagram: The Matcher row for x starts as x: v5; C1, C2. The application backend calls Publish(x, v7), and the Matcher row becomes x: v7; C1, C2. The Registrar rows become C1: x, v7 and C2: x, v7. The Registrar sends Notify: x, v7 to client C2, which replies Ack: x, v7.]
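The sequence in the diagram can be traced with a small simulation. This is a sketch, not the production design: the `Matcher` and `Registrar` classes below are in-memory stand-ins, and the rule that an acknowledgment clears the pending entry is an assumption.

```python
from typing import Dict, Set


class Matcher:
    def __init__(self) -> None:
        # object_id -> {"version": int, "clients": set of client IDs}
        self.rows: Dict[str, dict] = {}

    def publish(self, object_id: str, version: int) -> Set[str]:
        """Store a newer version and return the clients that must be notified."""
        row = self.rows.setdefault(object_id, {"version": -1, "clients": set()})
        if version <= row["version"]:
            return set()
        row["version"] = version
        return set(row["clients"])


class Registrar:
    def __init__(self) -> None:
        # client_id -> {object_id: version not yet acknowledged}
        self.pending: Dict[str, Dict[str, int]] = {}

    def enqueue(self, client_id: str, object_id: str, version: int) -> None:
        self.pending.setdefault(client_id, {})[object_id] = version

    def ack(self, client_id: str, object_id: str, version: int) -> None:
        # Assumption: an ack for the pending version clears the entry.
        if self.pending.get(client_id, {}).get(object_id) == version:
            del self.pending[client_id][object_id]


matcher, registrar = Matcher(), Registrar()
matcher.rows["x"] = {"version": 5, "clients": {"C1", "C2"}}

to_notify = matcher.publish("x", 7)      # Publish(x, v7) reaches the Matcher
for client_id in sorted(to_notify):
    registrar.enqueue(client_id, "x", 7)  # Registrar records C1: x, v7 and C2: x, v7
registrar.ack("C2", "x", 7)              # C2 received Notify: x, v7 and acked it
```

After the run, C1 still has a pending entry for x at v7, so the Registrar knows it must keep trying to deliver that signal.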
Slide13
Talk Outline
Thialfi’s abstraction: reliable signaling
Delivering notifications in the common case
Detecting and recovering from failures
Evaluation and experience
Slide14
Possible Failures
[Diagram: The Thialfi Service spans data centers 1 through n, each with a Registrar, a Matcher, a Client Bigtable, and an Object Bigtable. A client with its client library and client store connects to the service, and a publish feed enters it. Failures marked: client restart, client state loss, network failures, partial storage unavailability, server state loss/schema migration, data center loss.]
Slide15
Failures Addressed by Thialfi
Client restart
Client state loss
Network failures
Partial storage unavailability
Server state loss / schema migration
Publish feed loss
Data center outage
Slide16
Main Principle: No Hard State
Thialfi remains correct even if all state is lost
  All registrations
  All object versions
Detect and reconstruct after failures using:
  ReissueRegistrations() client event
  Registration Sync Protocol
  NotifyUnknown() client event
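A minimal sketch of how registration state might be rebuilt after the server loses it. It relies on the detail, given later in the talk, that every message carries a hash of the registered objects. The SHA-256 digest, the class names, and resolving every mismatch by asking the client to reissue its registrations are all assumptions; the real sync protocol is not specified in these slides.

```python
import hashlib
from typing import Dict, Iterable, Set


def registration_digest(object_ids: Iterable[str]) -> str:
    """Order-independent digest of a registration set (SHA-256 is an assumption)."""
    joined = "\n".join(sorted(object_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class Client:
    def __init__(self, client_id: str, object_ids: Iterable[str]) -> None:
        self.client_id = client_id
        # The application already knows this set, e.g. from its own cache.
        self.object_ids: Set[str] = set(object_ids)

    def digest(self) -> str:
        return registration_digest(self.object_ids)

    def reissue_registrations(self) -> Set[str]:
        """Stand-in for handling the ReissueRegistrations() client event."""
        return set(self.object_ids)


class SoftStateRegistrar:
    def __init__(self) -> None:
        self.registered: Dict[str, Set[str]] = {}  # may be lost at any time

    def on_message(self, client: Client) -> str:
        known = self.registered.get(client.client_id, set())
        if registration_digest(known) == client.digest():
            return "in-sync"
        # Digests differ: rebuild this client's registrations from the client.
        self.registered[client.client_id] = client.reissue_registrations()
        return "resynced"


client = Client("C1", ["x", "y"])
registrar = SoftStateRegistrar()
first = registrar.on_message(client)    # registrar starts empty
second = registrar.on_message(client)   # digests now match
registrar.registered.clear()            # simulate total server state loss
third = registrar.on_message(client)    # detected and repaired on next message
```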
Slide17
Recovering Client Registrations
[Diagram: The service side (Registrar, Matcher, Object Bigtable) sends ReissueRegistrations() to the client, which responds with Register(x); Register(y), so that x and y are recorded again.]
ReissueRegistrations: Not a burden for applications
  Application stores objects in its cache, or
  Object list is implicit, e.g., bookmarks for user X
Slide18
Syncing Client Registrations
[Diagram: The client sends Register: x, y to the Registrar. Both sides hold x and y and compute Hash(x, y); a Reg sync exchange runs between the client and the Registrar. The Matcher and Object Bigtable sit behind the Registrar.]
Goal: Keep client-registrar registration state in sync
  Every message contains a hash of the registered objects
  Registrar initiates the protocol when it detects an out-of-sync state
  Allows simpler reasoning about registration state
Slide19
Recovering From Lost Versions
Versions may be lost, e.g., schema migration
Refreshing from backend requires tight coupling
Inform client with NotifyUnknown(objectId)
  Client must refresh, regardless of its current state
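A minimal sketch contrasting the two client events, assuming the client refreshes through its own backend polling path. `ClientCache`, `fetch`, and `fetch_count` are hypothetical names for illustration.

```python
from typing import Any, Callable, Dict, Tuple

Fetch = Callable[[str], Tuple[int, Any]]


class ClientCache:
    def __init__(self, fetch: Fetch) -> None:
        self.fetch = fetch  # hypothetical hook into the application backend
        self.entries: Dict[str, Tuple[int, Any]] = {}
        self.fetch_count = 0

    def _refresh(self, object_id: str) -> None:
        self.entries[object_id] = self.fetch(object_id)
        self.fetch_count += 1

    def on_notify(self, object_id: str, version: int) -> None:
        """Known version: refresh only if it is newer than the cached one."""
        cached = self.entries.get(object_id)
        if cached is None or cached[0] < version:
            self._refresh(object_id)

    def on_notify_unknown(self, object_id: str) -> None:
        """Version lost at the server: refresh regardless of cached state."""
        self._refresh(object_id)


backend = {"x": (7, "x at v7")}
cache = ClientCache(fetch=lambda object_id: backend[object_id])
cache.on_notify("x", 7)        # first signal: fetch
cache.on_notify("x", 7)        # duplicate signal: no fetch
cache.on_notify_unknown("x")   # unknown version: fetch even though v7 is cached
```

The unconditional refresh is what lets the service drop version state without contacting the application backend: the client, not the server, re-establishes what the current version is.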
Slide20
Talk Outline
Thialfi’s abstraction: reliable signaling
Delivering notifications in the common case
Detecting and recovering from failures
Evaluation and experience
Slide21
Notification Latency Breakdown
Batching accounts for significant fraction of latency
Slide22
Thialfi Usage by Applications
Application          Language    Network Channel      Client Lines of Code (Semi-colons)
Chrome Sync          C++         XMPP                 535
Contacts             JavaScript  Hanging GET          40
Google+              JavaScript  Hanging GET          80
Android Application  Java        C2DM + Standard GET  300
Google BlackBerry    Java        RPC                  340
Slide23
Some Lessons Learned
Add complexity at the server, not the client
  Deploy at server: minutes. Upgrade clients: years+
Asynchronous events, not callbacks
  Spontaneous events occur: need to handle them
Initial applications have few objects per client
  Earlier use of polling forces such a model
Slide24
Thialfi Summary
Fast, scalable notification service
Reliable even when data centers fail
Two key ideas simplify failure handling
  Deliver a reliable signal, not data
  No hard state: reconstruct after failure
Deployed in Chrome Sync, Contacts, Google+