
1. UNIT - 2: Remote Procedure Call (RPC)

2. Motivation
- A Request/Reply protocol model naturally fits the Client/Server model, and hence is appropriate for distributed systems.
- RPCs (Remote Procedure Calls) emerged as an IPC protocol for designing distributed applications in 1984 (Birrell and Nelson).
- RPC is a mechanism through which control and data are transferred from one program to another.

3. Mechanism
- The caller places the arguments to the procedure (located at a remote place) in some specified location and format.
- Control is then transferred to the sequence of instructions that constitute the body of the remote procedure.
- The procedure is executed.
- After execution, control (and the result data) is returned to the caller.

4. Complexity
- The remote procedure does not reside in the address space of the calling process.
- The remote procedure may be on the same computer or on a different one; passing parameters and results is therefore complicated.
- Machines can crash and the network may fail.

5. UNIT - 2: Design Issues

6. Parameter Passing
- Call by value
  - Parameters are copied into a message.
  - Suitable for simple, compact types such as integers.
  - Passing large structures can increase transmission costs.
- Call by reference
  - Highly difficult in the absence of shared memory and the presence of disjoint address spaces.
  - Copy-in/copy-out may help, but is language dependent.
- Call by object reference
  - The Emerald designers proposed moving the parameter object along with its reference to the callee's node.
  - Depending on whether the object is moved back to the caller's node after the call, this is interpreted as call-by-visit or call-by-move.

7. Data Representation
- Different byte orderings: little endian or big endian.
- Different sizes of integers and other types: 16-bit or 32-bit; 1's or 2's complement.
- Different floating-point representations.
- Different character sets: ASCII, EBCDIC, Unicode.
- A simple solution is to do conversions on the fly; alternatively, implicit typing (only values are transmitted) or explicit typing (both type and value are transmitted) can be employed, as in the sketch below.
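
To make the conversion step concrete, here is a minimal Python sketch (the function names are illustrative, not from the slides) of implicit typing with marshaling into a canonical big-endian "network order" byte stream, so that little- and big-endian hosts interoperate:

    import struct

    # Pack two 32-bit signed integers in network (big-endian) byte order;
    # '!' in the format string selects network order regardless of host.
    def marshal_sum_args(x, y):
        return struct.pack("!ii", x, y)

    def unmarshal_sum_args(data):
        return struct.unpack("!ii", data)   # -> (x, y) on any host

This is implicit typing: only the values travel, and both sides must agree on the format out of band.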

8. Call Semantics
- Normal functioning of an RPC may be disrupted when:
  - the call or the response is lost due to network failure, or
  - the caller or callee node crashes and is restarted.
- Therefore, some call semantics need to be standardized.
- Possibly/may-be call semantics
  - The weakest semantics (included for completeness only).
  - To prevent the caller from waiting indefinitely for a response, a timeout-based mechanism is employed.
  - Guarantees nothing about receipt of the call message or about its execution.
  - Suitable for some applications distributed over a LAN with high reliability.

9. Call Semantics
- Last-one call semantics
  - Call messages are retransmitted on timeouts until a response is received by the caller.
  - The results of the last executed call are used by the caller.
  - Last-one semantics are easy to achieve when only two nodes are involved.
  - If N1 crashes and restarts, it calls R1 again, which in turn calls another R2; such orphan calls tend to create problems, and their extermination is a difficult and costly solution.
[Diagram: N1 calls R1 on N2, which in turn calls R2 on N3]

10. Call Semantics
- Last-of-many call semantics
  - Similar to last-one semantics, except that the results of orphan calls are avoided or discarded.
  - Calls are given unique identifiers, and response messages carry the corresponding identifier.
  - The caller accepts a response only if its identifier matches that of the latest repeated call.
  - Unfortunately, the caller has to wait for the last response.

11. Call Semantics
- At-least-once call semantics
  - Timeout-based retransmission without caring about orphan calls.
  - For nested calls, the first response message is taken.
  - Weaker call semantics.
- Exactly-once call semantics
  - A feature of LPCs (local procedure calls), and thus the most desirable.
  - The disadvantage of the previous semantics is that they do not guarantee the same results for the same parameters if the procedure is executed more than once, e.g. readNextRecord(filename) or malloc(10).
  - Timeouts, retransmissions, call identifiers, a reply cache, and duplicate filtering are employed to achieve it; see the sketch below.
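
A minimal server-side sketch of the duplicate filtering just described (the names are assumptions for illustration): each call carries a unique call identifier, and a reply cache answers retransmitted requests without re-executing the procedure.

    # Reply cache: call_id -> stored result.
    reply_cache = {}

    def handle_request(call_id, proc, args):
        if call_id in reply_cache:        # duplicate: filter it and
            return reply_cache[call_id]   # resend the cached reply
        result = proc(*args)              # execute exactly once
        reply_cache[call_id] = result
        return result

A real implementation would also bound the cache and combine it with client-side timeouts and retransmission.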

12. Server Creation Semantics
- Server processes may be created either before clients invoke RPCs or on demand.
- Based on the time duration for which servers survive, they can be:
- Instance-per-call server
  - The server is created on demand and then terminated.
  - Any state information must be maintained by the OS or by the client; if the OS maintains it, RPCs become expensive, while in the other case they lose transparency.
  - Multiple requests to the same type of server are also expensive.

13. Server Creation Semantics
- Instance-per-session server
  - The server exists for the entire session initiated by the client.
  - Normally dedicated to one client, it maintains that client's state information until the client declares end-of-session.
  - Thus it can be used by only a single client, and hence is not scalable.
- Persistent server
  - Exists indefinitely and is shared by many clients.
  - Created before client requests arrive.
  - Has to service concurrent requests, so RPCs need to be designed accordingly (see pop-up threads).
  - Reliability can be achieved by replication along with load balancing.

14. Binding
- The client needs to locate the server before the call.
- The process by which a client process becomes associated with a server process, so that calls can take place, is called binding.

15. Considerations in Binding
- Server naming
- Server locating
- Binding time
- Changing bindings
- Multiple simultaneous bindings

16. Server Naming
- An interface name is used by the client to specify the server. It has two parts:
  - Type: specifies the interface itself (e.g. FAT_FS_SVC).
  - Instance: specifies one of the several instances of the same server.
- In general, the type alone is enough.
- Version numbers can be attached to the type field to provide new as well as old servers (e.g. FAT_FS_SVC_1_0 and FAT_FS_SVC_1_1).
- Interface names are created by programmers and are not dictated by RPC packages.

17. Server Locating
Two common methods; the first is broadcasting:
- The client sends messages to all nodes asking for the interface type.
- If the server is replicated, many response messages are received; choose the best one (by load and network distance).
- Good for small networks, but expensive for large ones:
  - heavy network traffic, and
  - incomplete and out-of-date decision-making criteria (number of servers, workload, best path).

18. Server Locating
- Binding agent
  - A name server (naming agent) binds the client to the server by providing the client with the server's location.
  - In addition, it maintains complete and up-to-date decision-making criteria.
  - Its binding table maps a server's interface to its location.
  - Additional information can include instances, versions, load, best path, etc.

19. Server Locating
- Binding agent (continued)
  - The binding agent can poll servers periodically to check that they still exist.
  - Its address is implementation specific.
  - A client may use broadcasting and caching to locate the binding agent.
  - On relocation of the binding agent, the name server can use broadcasting to inform every node.

20. Server Locating
- Three primitives (see the sketch below):
  - Register: when a server comes up, it registers itself with the binding agent (which can be located by broadcasting).
  - Deregister: when a server goes down, it deregisters itself, but may cache the location of the binding agent.
  - Lookup: used by the client to find the location of a server.
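
A minimal Python sketch of these three primitives at the binding agent (all names are illustrative): the binding table maps an interface name to the locations of the servers exporting it.

    # Binding table: interface name -> list of server locations.
    binding_table = {}

    def register(interface, location):
        binding_table.setdefault(interface, []).append(location)

    def deregister(interface, location):
        servers = binding_table.get(interface, [])
        if location in servers:
            servers.remove(location)

    def lookup(interface):
        servers = binding_table.get(interface)
        return servers[0] if servers else None   # could pick by load instead

    register("FAT_FS_SVC_1_1", "node7:5001")
    print(lookup("FAT_FS_SVC_1_1"))              # -> node7:5001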

21. Binding Time
- Compile time
  - Hardcodes values in the code.
  - Inflexible if the server is moved or replicated, or if the interface is changed.
  - Cannot exploit the runtime characteristics of the system for efficient decisions.
- Link time
  - The client contacts the agent for the interface location; the agent returns a handle, which the client caches.
  - The client then calls the RPC.
  - Good for situations where a client calls a specific RPC multiple times.
[Diagram: client C, binding agent A, server S]

22. Binding Time
- Call time
  - Server-client binding takes place when the client calls the server for the first time.
  - Indirect call method: the client passes the interface name and arguments to the agent; the agent calls the RPC on the client's behalf and returns the handle together with the result. The next time, a direct call can be made.
[Diagram: client C, binding agent A, server S]

23. Changing Bindings
- A call to the server fails: contact the agent.
- The server is moved to another node: contact the agent.
- A new version is installed: contact the agent.
- Migration of the state information is also required.

24. Multiple Simultaneous Bindings
- A client may be bound to many servers of the same type, for reliability and fault tolerance.
- Multicast communication can be established at the binding agent, e.g. for an update to a file replicated at several nodes.
[Diagram: client C, binding agent A, servers S1 and S2]

25. Server Locating
- Advantages of a binding agent:
  - Fault tolerant, as multiple servers of the same interface type are possible.
  - Load balancing; best-path selection; filtering of clients.
  - Location transparency; low network bandwidth consumption.
- Disadvantages:
  - Single point of failure; replicating agents can help, but synchronization must then be ensured.
  - Performance bottleneck; agents holding binding information for a specific class of services can be used.
  - Binding overhead if many short-lived clients exist.

26. Reading Assignment
Distributed Systems: Principles and Paradigms by A. S. Tanenbaum and M. van Steen, Chapter 2:
2.4. Remote Procedure Call
  2.4.1. Basic RPC Operation
  2.4.2. Parameter Passing
  2.4.3. Dynamic Binding
  2.4.4. RPC Semantics in the Presence of Failures (client cannot locate the server; lost request messages; lost reply messages; server crashes; client crashes)
  2.4.5. Implementation Issues

27. UNIT - 2: Implementation of RPC

28. Implementation of RPC
- Transparency, both syntactic and semantic, is the main goal.
- RPCs achieve this goal by exploiting the concept of stubs: "Every problem in computer science can be solved by adding a layer of abstraction."
- RPC packages contain three entities:
  - the client/server process,
  - the client/server stub, and
  - the RPC runtime.

29. Implementation of RPC
- Client/server process.
- Client stub: packs the specification of the target RPC and the arguments into a message, and unpacks the result on receipt.
- Server stub: unpacks the call message and packs the result.
- RPC runtime: handles transmission, interacts with the binding agent, and handles retransmission, call semantics, etc.

30. Implementation of RPC
[Diagram: Client -> client stub (pack, send) -> RPC runtime -> network -> RPC runtime -> server stub (receive, unpack) -> body of the RPC at the server; the result returns along the reverse path (pack, send, receive, unpack, return).]
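
A minimal runnable Python sketch of this path (the service name, transport, and stub names are assumptions; a real runtime would use sockets, binding, and retransmission):

    import struct

    def client_stub_sum(x, y):
        request = struct.pack("!ii", x, y)          # pack arguments
        reply = transport_send("sum_svc", request)  # runtime: send & wait
        (result,) = struct.unpack("!i", reply)      # unpack result
        return result

    def server_stub_sum(request):
        x, y = struct.unpack("!ii", request)        # unpack arguments
        result = x + y                              # body of the RPC
        return struct.pack("!i", result)            # pack result

    def transport_send(service, request):
        # Loopback stand-in for the two RPC runtimes and the network,
        # so the sketch runs without an actual connection.
        return server_stub_sum(request)

    print(client_stub_sum(2, 3))                    # -> 5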

31. Implementation of RPC
Stub generation can be done in two ways. Manually:
- The RPC programmer provides a set of translation functions from which a user can construct his or her own stubs.
- Easy to implement, and can handle complex parameter types.

32. Implementation of RPC
Automatically:
- Uses an IDL (Interface Definition Language) to define the interface.
- An interface definition is a list of procedure signatures with their argument and result types. It also contains:
  - constants, enumerated types, and so on, to be used by both client and server;
  - whether each argument is of input type, output type, or both (input arguments are copied from client to server; output arguments are copied from server to client).
- The server exports the interface while the client imports it; hence compile-time type checking is possible.

33. Implementation of RPC
- IDL compiler
  - Uses the interface definition to create client and server stubs automatically.
  - Uses the interface definition to create routines for argument marshaling and unmarshaling; marshaling means taking data and converting it into a form suitable for transmission.
  - Uses the interface definition to create other supporting files.
- Example interface definition (schematic):

    interface sum_svc {
        int sum([in] int x, [in] int y);
    };

34. UNIT - 2: Classes of RPC

35. Classes of RPC
- Callback RPC
  - The client-server relationship fits RPCs, but a peer-to-peer relationship is required by some applications.
  - Example: a remote interactive application may need the user to input some data periodically.
  - Callback RPC works as follows:
    1. The RPC is called by the client.
    2. The server executes part of the RPC and calls the client back.
    3. The client processes the callback and returns the requested data to the server.
    4. Steps 2 and 3 can happen multiple times.
    5. Finally, the server returns the result.

36. Classes of RPC
[Diagram: the client issues an RPC to the server; the server issues a callback to the client; the client returns the callback result; the server finally returns the RPC result.]

37. Classes of RPC
Three issues in callback RPC:
- Providing the server with the client's handle: a client that uses callback RPC should use a transient but unique identifier for its callback service, register it with the binding agent, and pass this identification to the server during the RPC call. The server then invokes the callback RPC on the client (a peer-to-peer relationship).
- Making the client process wait: the primitive should be synchronous/blocking.
- Handling deadlocks.
[Diagram: circular wait among processes P1, P2, and P3]

38. Classes of RPC
- Broadcast RPC
  - A 1-to-1 relationship fits RPCs, but a 1-to-many relationship is required by some applications.
  - Example: an update to a file replicated at n nodes.
  - Two ways:
    - use a special broadcast primitive that is processed by the binding agent, which calls the RPCs on multiple servers, or
    - use a special broadcast port to which all nodes are connected.

39. Classes of RPC
- Batch-mode RPC
  - Ordinarily RPCs are not called frequently, but some applications do call RPCs frequently.
  - To reduce the overhead of sending every individual RPC independently, and the individual waiting times, calls can be buffered at the client and sent to the server in a batch (see the sketch below).
  - The prime requisite of this mode is that the client must not require a reply for the sequence of requests.
  - How long to queue? Until a predefined interval elapses, a predefined number of calls accumulates, or the buffer space fills.
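
A minimal client-side batching sketch in Python (class and parameter names are assumptions), showing the flush triggers listed above:

    import time

    class BatchClient:
        def __init__(self, send, max_calls=16, max_age=0.5):
            self.send = send              # runtime send function
            self.queue = []
            self.max_calls = max_calls    # flush after this many calls
            self.max_age = max_age        # ... or after this many seconds
            self.oldest = None

        def call(self, proc, args):
            if not self.queue:
                self.oldest = time.monotonic()
            self.queue.append((proc, args))
            if (len(self.queue) >= self.max_calls
                    or time.monotonic() - self.oldest >= self.max_age):
                self.flush()

        def flush(self):
            if self.queue:
                self.send(self.queue)     # one message carries many calls
                self.queue = []

    batch = BatchClient(send=print)
    batch.call("append_record", ("log.txt", "hello"))
    batch.flush()

Note that none of the queued calls returns a result, matching the requirement above.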

40. Classes of RPC
- Complicated RPC: long-duration calls
  - Some mechanism must be established to keep the parties in sync. Two options:
    1. Periodically send a probe packet to the server, which is acknowledged immediately. The packet contains the message identifier of the last call; the acknowledgement may indicate "processing" or "failed".
    2. Periodically, the server generates an acknowledgement to tell the client that it is processing the request; if the acknowledgement is not received, the client assumes that the server has crashed or the network has failed.

41. Classes of RPC
- Complicated RPC: long message calls
  - Some mechanism must be established if the arguments do not fit in a single packet. Two options:
    1. Use several physical RPCs for one logical RPC (with a fixed overhead in each individual RPC).
    2. Fragment at a lower level of the protocol hierarchy.

42. UNIT - 2: RPC in Linux

43. RPC in Linux
- Stub generation: both automatic and manual.
- Procedure arguments and result
  - A call accepts only one argument and returns one result.
  - Multiple arguments can be packed into a single one and then sent, like a structure in the case of the C language.
  - UNIX RPCs take two arguments: a pointer to the single argument struct, and the handle of the client.
- Marshaling: the RPC runtime library has procedures, used by the stubs, for marshaling some basic data types.

44. RPC in Linux
- Call semantics
  - Supports at-least-once call semantics (timeout = 5 seconds, retries = 5 times; see the sketch below).
- Exception handling
  - Error strings or the global stderr variable.
- Binding
  - There is no network-wide client-server binding.
  - Each server node has a local binding agent called the portmapper; it maintains a database mapping each service, identified by its program number and version number, to a port number.
  - Clients have to mention the hostname of the server explicitly, so location transparency is compromised.
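
In the spirit of those defaults, a generic at-least-once retry loop might look like the following Python sketch (this is not the actual Sun RPC library API; the socket is assumed to be already connected to the server):

    import socket

    def call_at_least_once(sock, request, timeout=5.0, retries=5):
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.send(request)            # (re)transmit the call
            try:
                return sock.recv(4096)    # first reply wins
            except socket.timeout:
                continue                  # retransmit on timeout
        raise TimeoutError("RPC failed after %d retries" % retries)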

45. RPC in Linux
- Security
  - No authentication.
  - UNIX style: using the UID and GID.
  - DES style: each user has a unique netname, which is sent in encrypted form.

46. RPC in Linux
- Classes of RPC
  - Asynchronous RPC: set the timeout to zero.
  - Callback RPC: register the client process as a server with the local portmapper.
  - Broadcast RPC: the call is directed to all portmappers.
  - Batch mode: using queuing.

47. UNIT - 2: Synchronization in Distributed Systems

48. Synchronization
- Certain rules must be followed in an OS for sharing resources among concurrently executing programs to get correct results: synchronization mechanisms.
- Synchronization is harder to achieve in distributed systems because of:
  - disjoint address spaces,
  - a physically unreliable network, and
  - relevant information scattered over multiple machines.

49. Clock Synchronization
- Temporal ordering of the events produced by concurrent processes is mandatory.
- On a centralized system all processes see the same clock, so such an ordering can be achieved.
- In a distributed system there are multiple clocks, and if they are not synchronized:
  - senders and receivers will be out of sync, and
  - serialization of concurrent accesses to shared objects cannot be guaranteed.
- Clock synchronization can be achieved by:
  - synchronizing physical clocks, or
  - using logical clocks.

50. UNIT - 2: Physical Clock Synchronization

51. Physical Clock Synchronization
- Idea: set and start all clocks at the same time.
- Computer clocks are realized as quartz crystals, which oscillate at a certain frequency when put under tension; under a specific tension they generate clock ticks at specific intervals.
- However, the frequency also depends on physical characteristics such as voltage, humidity, temperature, and the cut and quality of the crystal.
- This means that even if two (or more) clocks are set and started at the same time, they may drift from the ideal clock, and hence from each other.
- What is the solution?
  - Attach a UTC receiver (atomic clock) to each machine: economically not feasible.
  - Attach a UTC receiver to one machine and periodically synchronize all clocks.

52. Physical Clock Synchronization
- When to synchronize?
- The drift rate is the rate at which a clock drifts away from the expected real time (generally about 1 second in 10-11 days).
- The clock skew is the amount of difference between two clocks at any instant of time.
- Depending on the nature and criticality of the system, any two clocks are said to be synchronized if the clock skew is less than some specified constant.

53. Physical Clock Synchronization
[Graph: clock time c versus UTC time t, showing fast, normal, and slow clocks]
- dc/dt < 1: slow clock
- dc/dt = 1: perfect clock
- dc/dt > 1: fast clock

54. Physical Clock Synchronization
Time | UTC (1 tick/sec) | Slow clock (0.5 ticks/sec) | Fast clock (1.5 ticks/sec)
  1  |  1               |                            |
  2  |  2               |  1 (1 - 2 = -1)            |  3 (3 - 2 = +1)
  3  |  3               |                            |
  4  |  4               |  2 (2 - 4 = -2)            |  6 (6 - 4 = +2)
  5  |  5               |                            |
  6  |  6               |  3 (3 - 6 = -3)            |  9 (9 - 6 = +3)
  7  |  7               |                            |
  8  |  8               |  4 (4 - 8 = -4)            | 12 (12 - 8 = +4)
  9  |  9               |                            |
 10  | 10               |  5 (5 - 10 = -5)           | 15 (15 - 10 = +5)
 11  | 11               |                            |
 12  | 12               |  6 (6 - 12 = -6)           | 18 (18 - 12 = +6)
(The parenthesized values are the skew of each clock from UTC.)

55. Physical Clock Synchronization
- In the worst case, two clocks drift in opposite directions; after ∆t of UTC time they are 2d∆t of clock time apart, where d is the drift rate.
- If the maximum affordable skew is S, then S = 2d∆t, i.e. ∆t = S/2d.
- Thus, after every S/2d interval, the clocks need to be synchronized so that the maximum skew stays below S; a worked example follows.
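
A quick worked example (the numbers are assumed, not from the slides): for a typical quartz drift rate d = 10^-6 and a maximum affordable skew S = 1 ms,

    ∆t = S / 2d = 10^-3 / (2 x 10^-6) = 500 seconds

so the clocks must be resynchronized roughly every 8 minutes.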

56. Physical Clock Synchronization
- What if two clocks drift in opposite directions at different rates d1 and d2?
  S = d1∆t + d2∆t, so ∆t = S/(d1 + d2).
  If d1 > d2, then S/2d1 < S/(d1 + d2): using the single-rate bound, we synchronize early.
- What if two clocks drift in the same direction?
  S = d1∆t - d2∆t, so ∆t = S/(d1 - d2).
  Whether d1 > d2, d1 < d2, or d1 = d2, we have S/2d < S/(d1 - d2): we again synchronize early.

57. Physical Clock Synchronization
[Graph: clock time c versus UTC time t; after an interval ∆t, a fast and a slow clock have drifted 2d∆t apart]

58. Physical Clock Synchronization
Based on this proposition, there are two types of clock synchronization algorithms:
- Centralized algorithms
  - Passive time server algorithm
  - Active time server algorithm
  - Berkeley algorithm
- Distributed algorithms
  - Global averaging algorithm
  - Localized averaging algorithm

59. Passive Time Server Algorithm
- Steps:
  - A time-server node has a UTC receiver.
  - Periodically (before S/2d time is over), every node sends a message to this time server asking for its time, and synchronizes accordingly.
  - The time server responds immediately with its current time t.
- Issue:
  - Because of the propagation delay incurred, the received time needs to be adjusted. Assuming a symmetric delay: current time = t + (T1 - T0)/2, where T0 and T1 are the client's send and receive times.
[Diagram: client C sends at T0 and receives at T1; server S replies with its time t]

60. Passive Time Server Algorithm
- Issue:
  - The measure above does not eliminate the request-processing time I at the server. Again assuming a symmetric delay: current time = t + (T1 - T0 - I)/2.
  - This way, only the time taken by the message to reach the client is used for the adjustment.
[Diagram: client C sends at T0 and receives at T1; server S processes the request for time I and replies with t]

61. Passive Time Server Algorithm
- The accuracy can be improved further (see the sketch below):
  - A series of calls yields a number of (T1 - T0) measurements; the minimum of the measurements, or their average, is used.
  - Fault-tolerant average: values of (T1 - T0) greater than some threshold are discarded as victims of network congestion.
- This is Cristian's algorithm, used in NTP.
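
A minimal Python sketch of the improved measurement (ask_server and the threshold value are assumptions): sample the server several times, discard round trips above a congestion threshold, and adjust by half of the best remaining round trip.

    import time

    def cristian_time(ask_server, rounds=5, threshold=0.1):
        best = None
        for _ in range(rounds):
            t0 = time.monotonic()
            server_time = ask_server()      # server's clock value t
            t1 = time.monotonic()
            rtt = t1 - t0                   # this sample's (T1 - T0)
            if rtt > threshold:
                continue                    # fault-tolerant filtering
            if best is None or rtt < best[0]:
                best = (rtt, server_time)
        if best is None:
            raise RuntimeError("all samples exceeded the threshold")
        rtt, server_time = best
        return server_time + rtt / 2        # t + (T1 - T0)/2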

62. Active Time Server Algorithm
- Steps:
  - A time-server node has a UTC receiver.
  - The time server periodically broadcasts its current time T.
  - All nodes have some prior knowledge of the minimum network delay td, and use the estimate: correct time = T + td.
- Issue:
  - Not fault tolerant: it fails if the actual network delay exceeds td.

63. Berkeley Algorithm
- Steps (no UTC is used; see the sketch below):
  - The time server asks every node for its current time.
  - The time server has some prior knowledge of the network delay between itself and every node, and uses this delay to estimate the correct current time of every node.
  - A fault-tolerant average of all the values (including its own) is calculated.
  - The adjustments are then propagated to all nodes.
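
A minimal Python sketch of one Berkeley round (names are assumptions), matching the worked example on the next slide: the server adds the known one-way delay Td to each reported clock (including, as in the example, its own), averages, and returns per-node adjustments rather than absolute times.

    def berkeley_round(server_clock, reports, td=1.0):
        estimates = {"server": server_clock + td}
        for node, clock in reports.items():
            estimates[node] = clock + td     # estimated current time
        avg = sum(estimates.values()) / len(estimates)
        return {node: avg - t for node, t in estimates.items()}

    # Numbers from the next slide: server 15, A 18, B 12, Td = 1.
    print(berkeley_round(15.0, {"A": 18.0, "B": 12.0}))
    # -> {'server': 0.0, 'A': -3.0, 'B': 3.0}

Sending adjustments instead of absolute times keeps the result independent of the reply's own transmission delay.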

64. Berkeley Algorithm (worked example)
- Clocks when polled: time server 15, client A 18, client B 12; one-way network delay Td = 1.
- Estimated current times at the server: A: 18 + 1 = 19; B: 12 + 1 = 13; the server itself: 15 + 1 = 16.
- Average: (16 + 19 + 13)/3 = 16.
- Adjustments: server: 16 - 16 = 0; client A: 16 - 19 = -3; client B: 16 - 13 = +3.
- By the time the adjustments are applied, the clocks have advanced further: 20 - 3 = 17, 14 + 3 = 17, 17 + 0 = 17, so all three clocks now read 17.

65. Centralized Algorithms
- Drawbacks:
  - Subject to single-point failures.
  - Not scalable.
  - Positive adjustments pose no problem, but negative adjustments (setting a clock back) can create chaos.

66. Distributed Global Averaging Algorithm
- Steps:
  - Every node periodically broadcasts its local time.
  - It then waits for some specified time T, during which it collects the same kind of messages from the other nodes, recording the arrival time of each message according to its own clock.
  - After T has elapsed, the node estimates the skew of its clock with respect to each of the other nodes, computes a fault-tolerant average, and uses the skew to adjust its own clock.
- When to resynchronize? At times T0 + iR, where T0 is a fixed time in the past agreed upon by all nodes and R is a system parameter.

67. Distributed Localized Averaging Algorithm
- The global averaging algorithm puts a load on the network.
- In this algorithm, two near neighbours exchange their clock times, take the average, and re-adjust their clocks; the load on the network is reduced.
- With time, all the clocks in the system get synchronized, and are then resynchronized.
- However, it requires some ordering of the nodes.

68. UNIT - 2: Logical Clocks

69. Logical Clocks
- "It is sufficient to ensure that all events be totally ordered in a manner consistent with observed behavior" - Lamport.
- Let us define time in terms of the order in which events occur, and not in terms of physical clock time.
- A logical clock marks all our events with unique numbers in sequence.

70. Happened-Before Relation
- Denoted by ->.
- If a and b are two events in the same process and a occurs before b, then a -> b.
- If two processes exchange a message, and a is the message-send event and b the message-receive event, then a -> b.
- Law of causality (transitivity): if a -> b and b -> c, then a -> c.

71. Happened-Before Relation
- a -> a is not true.
- If a and b are two events in two processes that do not exchange messages (directly or indirectly), then neither a -> b nor b -> a: they are concurrent, and nothing can be said. In other words, neither can causally affect the other.
- This gives a causal ordering.
[Diagram: processes P1 and P2 with events a, b, and c; a -> c holds, but a and b are concurrent]

72. Happened-Before Relation
- The happened-before relation is thus a partial ordering.
[Diagram: processes P1, P2, and P3 with events a through f; a -> b -> c -> d -> e holds, but some events remain concurrent]

73. Implementation of Logical Clocks
- A logical clock is a way to associate a timestamp (a number) with each event, so that events related to each other by the happened-before relation (i.e. non-concurrent events) can be properly ordered.
- If a -> b, then clock(a) < clock(b).
- The clock must always go forward; it is incremented between any two successive events (related or not).
- Can be implemented using counters or physical clocks, and can be global or local.
- A global clock makes the system centralized, and hence vulnerable to the usual problems.

74. Local Logical Clocks
- Every process gets a logical clock.
- Suppose event a is the sending of a message, with timestamp t1, by process P1 (clock C1) to process P2 (clock C2). On receipt, P2 increments C2 as follows (see the sketch below):
  - if t1 < C2 + 1, set C2 = C2 + 1;
  - else (t1 >= C2 + 1), set C2 = t1 + 1.
- Equivalently: C2 = max(C2 + 1, t1 + 1).
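
A minimal Lamport clock sketch in Python (class and method names are assumptions) implementing exactly this receive rule:

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):                    # any local event
            self.time += 1
            return self.time

        def send(self):                    # returns the timestamp t1
            return self.tick()

        def receive(self, t1):             # message carried timestamp t1
            self.time = max(self.time + 1, t1 + 1)
            return self.time

    p1, p2 = LamportClock(), LamportClock()
    t1 = p1.send()         # p1's clock: 1
    print(p2.receive(t1))  # p2's clock: max(0 + 1, 1 + 1) = 2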

75. Local Logical Clocks (example)
Clock values for P1 (4 ticks per interval), P2 (7 ticks per interval), and P3 (10 ticks per interval); => marks an adjustment on message receipt:

Time | P1 (4 ticks) | P2 (7 ticks) | P3 (10 ticks)
  0  |   0          |   0          |   0
  1  |   4          |   7          |  10
  2  |   8          |  14          |  20
  3  |  12          |  21          |  30
  4  |  16          |  28          |  40
  5  |  20          |  35          |  50
  6  |  24          |  42 => 51    |  60
  7  |  28          |  58          |  70
  8  |  32 => 59    |  65          |  80
  9  |  63          |  72          |  90
 10  |  67          |  79          | 100

76. Reading Assignment
Distributed Systems: Principles and Paradigms by A. S. Tanenbaum and M. van Steen, Chapter 3: Synchronization in Distributed Systems
3.1. Clock Synchronization
  3.1.1. Logical Clocks
  3.1.2. Physical Clocks
  3.1.3. Clock Synchronization Algorithms (Cristian's algorithm; the Berkeley algorithm; averaging algorithms)

77. UNIT - 2: Mutual Exclusion in Distributed OS

78. Mutual Exclusion?
- It is a way to ensure that two or more processes access a shared resource in a serialized way; in other words, a process is given exclusive access while updating a shared resource.
- The region within the process that is given exclusive access is the critical region.

79. How to Achieve It?
- Centralized algorithms
- Distributed algorithms
  - Contention based: timestamp based or voting based
  - Token based

80. Centralized Algorithm
- A coordinator process coordinates exclusive access to the shared resources (see the sketch below).
[Diagram: processes 0-3 and a coordinator. One process sends REQUEST and is GRANTED access; while it is inside the critical region, another process's REQUEST is put in the queue; on RELEASE, the coordinator pulls the next request from the queue and sends GRANTED.]
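
A minimal coordinator sketch in Python (names are assumptions): REQUESTs are granted in order; while the resource is held, new requests are queued; RELEASE grants the next queued request.

    from collections import deque

    class Coordinator:
        def __init__(self):
            self.holder = None
            self.queue = deque()

        def request(self, pid):
            if self.holder is None:
                self.holder = pid
                return "GRANTED"
            self.queue.append(pid)       # put it in the queue
            return "QUEUED"              # or send an explicit denial

        def release(self, pid):
            assert self.holder == pid
            self.holder = self.queue.popleft() if self.queue else None
            # a real coordinator would now send GRANTED to self.holder

    c = Coordinator()
    print(c.request(1))   # GRANTED
    print(c.request(2))   # QUEUED
    c.release(1)          # process 2 now holds the resource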

81. Advantages & ProblemsAdvantagesGrants permission in the order of requests – A Fair Algorithm.Easy to implementProblemsSingle point failure can bring down the system.Not ScalableConfusion in dealing with Denial & Dead.Can be solved by adding a message for Denial.

82. Distributed Algorithms
- Contention based
- Token based

83. Contention Based
- Contention-based algorithms allow multiple processes to request shared resources simultaneously, but settle the contention based on:
  - timestamps, or
  - voting.

84. Timestamp Based
- When a process p wants to access a shared resource, it sends a message containing the resource ID and its timestamp to all other nodes.
- The receiver can take the following actions:
  - If it is not in the critical region and does not want to enter it, it sends a reply message.
  - If it is in the critical region, it does not reply and queues the request message.
  - If it wants to enter the critical region, it compares the two timestamps, the sender's and its own; the earlier request (lowest value) wins:
    - if the receiver wins, it queues the request message and does not reply;
    - if the sender wins, the receiver replies.

85. Timestamp Based
- The sender can take the following actions (see the sketch below):
  - It waits until it receives replies (say, OK) from every node.
  - After getting permission from all nodes, it executes the critical section.
  - After the critical section, it replies to all the messages queued previously (if any).
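
A minimal sketch of the per-node logic in Python (names are assumptions; ties between equal timestamps are broken by process ID, a detail the slides leave implicit). State is one of RELEASED, WANTED, HELD.

    class Node:
        def __init__(self, pid):
            self.pid = pid
            self.state = "RELEASED"
            self.my_ts = None            # timestamp of own request
            self.deferred = []           # queued (unanswered) requests

        def request_cs(self, ts):
            self.state, self.my_ts = "WANTED", ts
            # ...then send (ts, pid) to all others, await n-1 OK replies,
            # set state = "HELD", and enter the critical section.

        def on_request(self, ts, sender, send_ok):
            if self.state == "HELD" or (
                    self.state == "WANTED"
                    and (self.my_ts, self.pid) < (ts, sender)):
                self.deferred.append(sender)   # queue, do not reply
            else:
                send_ok(sender)                # reply OK

        def exit_cs(self, send_ok):
            self.state = "RELEASED"
            for sender in self.deferred:       # answer queued requests
                send_ok(sender)
            self.deferred.clear()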

86. Timestamp Based
[Diagram: three processes; two request concurrently with timestamps 5 and 3, and both wait; the request with the earlier timestamp (3) wins, and that process enters the critical section.]

87. Advantages & ProblemsAdvantagesNo Starvation & Guaranteed Mutual ExclusionProblems2(n-1) messages are send/received for a single request.Number of point-failures is n – Higher single point failure probability.Confusion in dealing with Denial & Dead.Can be solved by adding a message for Denial (Tanenbaum95)

88. Voting Based
- The same as the timestamp-based approach, except that the decision is made as soon as replies (say, OK) are received from a majority of the nodes.
- In this approach, a process can give permission to only one process at a time.
- It requires the requesting process to inform the others when it is done.

89. Token Based
- Nodes are arranged in a logical ring.
- Some process p0 initializes the token, which then starts circulating around the ring.
- If a process needs to access a shared resource, it waits for the token to arrive, holds the token while executing the critical section, and then passes the token to the next process/node.
- If a process gets the token but does not need to access any shared resource, it simply passes it to the next node.

90. Token Based
[Diagram: processes 0-3 in a ring; a process needing a shared resource holds the token while executing its critical section, then passes the token on.]

91. Advantages & ProblemsAdvantagesGuarantees Mutual Exclusion & avoids starvation.ProblemsLost tokens can create confusionDenial or DeadLost Process

92. Comparison of the Three Algorithms
Algorithm                | Messages per critical section | Delay before permission | Problems
Centralized              | 3                             | 2                       | Coordinator crash
Distributed (contention) | 2(n-1)                        | 2(n-1)                  | Crash of any process
Distributed (token)      | 1 to infinity                 | 0 to n-1                | Lost token and lost process

93. Reading Assignment
Distributed Systems: Principles and Paradigms by A. S. Tanenbaum and M. van Steen, Chapter 3:
3.2. Mutual Exclusion
  3.2.1. A Centralized Algorithm
  3.2.2. A Distributed Algorithm
  3.2.3. A Token Ring Algorithm
  3.2.4. A Comparison of the Three Algorithms

94. UNIT - 2: Election Algorithms in Distributed OS

95. Election Algorithms
- Failures are inevitable. Two strategies: don't care, or reorganize.
- Most distributed algorithms rely on the existence of a coordinator, a sequencer, an initiator, or some other special process, generically called the coordinator process.
- What to do if such a process goes down? We need to dynamically elect a new coordinator process.

96. Election Algorithms
- Election algorithms are meant for electing a coordinator process from among the currently running processes, in such a manner that at any instant of time there is a single coordinator for all processes.
- All election algorithms make certain assumptions:
  - every process has a unique priority number;
  - every process knows the other processes' priorities;
  - the highest-priority process is elected;
  - on recovery, the actual coordinator takes appropriate steps.

97. Election Algorithms
- Bully algorithm
- Invitation algorithm
- Ring algorithm

98. Bully Algorithm
The bully algorithm assumes that:
- each process stores its state on some permanent storage;
- there are no transmission errors;
- the communication subsystem does not fail.

99. Bully Algorithm
- When a process p asks the coordinator for some service and the coordinator does not respond, p assumes that the coordinator is down.
- What to do? Announce an election: the node that found the coordinator down announces an election by sending an ELECTION message to every other node whose priority is greater than its own.

100. Bully Algorithm
There are three possible outcomes (a sketch follows):
- Nobody replies: in this case the process becomes the coordinator process.
- A single reply: the interested (higher-priority) process takes on the position of coordinator.
- Multiple replies: the current process is relieved of its duties, and these higher-priority processes carry on the election until a single coordinator is found.
After a coordinator is found, every process is informed by a COORDINATOR message.
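
A minimal sketch of the election logic in Python (names are assumptions; alive() stands in for sending ELECTION to a higher-priority process and getting a reply within the timeout):

    def bully_election(me, processes, alive):
        higher = [p for p in processes if p > me]
        responders = [p for p in higher if alive(p)]
        if not responders:
            return me                 # nobody replied: I am coordinator
        # a higher-priority process takes over; in the real algorithm it
        # runs its own election, which this sketch models as recursion
        return bully_election(max(responders), processes, alive)

    # Example: processes 1-5, with the old coordinator 5 crashed.
    up = {1, 2, 3, 4}
    print(bully_election(2, [1, 2, 3, 4, 5], lambda p: p in up))  # -> 4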

101. Bully Algorithm
[Diagram: five processes; the coordinator (process 5) has crashed (X), and another process detects the failure.]

102. Bully Algorithm
[Diagram: the recovered former coordinator comes back up and announces a new election against the current coordinator (Garcia-Molina).]

103. Bully Algorithm
- Assuming n processes:
  - Worst case: if the initiator is the lowest-priority process and every other process is interested, O(n^2) messages are sent.
  - Best case: if the initiator is the highest-priority process, n-1 messages are sent.

104. Invitation Algorithm
- The bully algorithm fails for asynchronous systems: it works fine only if processes respond in a timely manner, which is unrealistic.
- A synchronous DS is one in which:
  - the time to execute each step of a process has known lower and upper bounds, and
  - each message transmitted over a channel is received within a known bounded time.

105. Invitation Algorithm
- The invitation algorithm works in the presence of timing failures; it makes no timing assumptions.
- The algorithm works even if some router fails, making communication between two subsets of the processes impossible. How can a single global coordinator then exist? It makes sense to think in terms of a coordinator for each sub-group of processes.

106. Invitation Algorithm
- Initially there is a single group with a global coordinator.
- When it fails, the node(s) sensing the failure start creating singleton groups, each with itself as coordinator; every such group is given a unique group number.
- The coordinators of these singleton groups periodically send invitations to other processes belonging to the old group, asking them to join in forming a larger group.

107. Invitation Algorithm
- As the group structure changes, the group is assigned a new unique group number. Unification is done as follows:
  - The coordinator sends messages to every other node asking whether that node is itself a coordinator process.
  - If they reply that they are, it waits for a period based on its own group number before issuing the invitation; lower-priority processes defer sending invitations for a longer period, to avoid all processes sending invitations at once.
  - A coordinator (or ordinary process), on receiving an invitation from a higher-priority process, accepts the proposal.

108. Invitation Algorithm
- When a coordinator receives an invitation (from another process), it forwards it to all members of its group.
- Any process receiving an invitation accepts by sending ACCEPT to the inviting coordinator, which acknowledges with ANSWER.
- The process that initiated the merger becomes the coordinator of the new group; this is confirmed by sending READY to each member, which responds with ANSWER.

109. Invitation Algorithm
[Diagram: processes 1 (top priority) through 4 (least priority) exchanging INVITATION, ACCEPT, ANSWER, and READY messages.]

110. Invitation Algorithm
- In the bully algorithm, the low-priority processes are "bullied" into submission by the high-priority processes.
- In the invitation algorithm, a process invites the other processes to join its group and agree upon it being the leader.

111. Ring Algorithm
- There is a logical ring of processes.
- When a process senses that the coordinator has failed, it initiates an election by passing an ELECTION message, containing its priority, to its successor in the ring.
- The next process appends its own priority to the message and passes it to its successor; if the next process is down, the sender skips it until an alive successor is found.
- Finally, the initiator receives the message back; the highest-priority process in the list is the coordinator.

112. Ring Algorithm
- After the coordinator is found, every node is informed by a COORDINATOR message, sent in the same fashion as the ELECTION message; the message is again received by the initiator, and then removed.
- Two or more processes might simultaneously initiate elections; in this case extra messages may circulate, but still the same new coordinator is elected. A sketch follows.
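
A minimal sketch of the election phase in Python (names are assumptions): the ELECTION message accumulates the priorities of the alive processes around the ring, and when it returns to the initiator the maximum wins.

    def ring_election(initiator, ring, alive):
        n = len(ring)
        i = (ring.index(initiator) + 1) % n
        collected = [initiator]
        while ring[i] != initiator:
            if alive(ring[i]):            # skip dead successors
                collected.append(ring[i])
            i = (i + 1) % n
        return max(collected)             # highest priority wins
        # a COORDINATOR message announcing the winner would now circulate

    # Example: ring 1-2-3-4 with the old coordinator 4 crashed.
    up = {1, 2, 3}
    print(ring_election(2, [1, 2, 3, 4], lambda p: p in up))  # -> 3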

113. Ring Algorithm
[Diagram: a ring of four processes; the coordinator has crashed (X). Process 3 initiates: the ELECTION message grows as 3; 3, 4; 3, 4, 2 (skipping the dead node); back at the initiator, "4 is the Coordinator" is then circulated to every node.]

114. Reading Assignment
Distributed Systems: Principles and Paradigms by A. S. Tanenbaum and M. van Steen, Chapter 3:
3.3. Election Algorithms
  3.3.1. The Bully Algorithm
  3.3.2. A Ring Algorithm

115. References
Books:
- Distributed Systems: Principles and Paradigms by A. S. Tanenbaum and M. van Steen
- Distributed OS: Concepts and Design by P. K. Sinha
Papers:
- "Implementing Remote Procedure Calls" by Andrew D. Birrell and Bruce J. Nelson