1 Simulation Modeling Imitation of the operation of a real-world process or system over - PowerPoint Presentation

riley
Uploaded On 2023-11-11

Presentation Transcript

1. Simulation Modeling
Imitation of the operation of a real-world process or system over time.
Objective: to collect data as if a real system were being observed.
Data collected from the simulation are used to estimate the performance/dependability measures of the system.

2. Discrete Event Simulation
Modeling of a system as it evolves over time by a representation in which the state variables change only at a countable number of points in time.
Terminology:
  simulation clock: a variable that gives the current value of the simulated time
  event: an instantaneous occurrence which may change the state of the system

3. Simulation Terminology
  event list: a list (data structure) of event records, each containing the time of occurrence of a particular event, e.g., the arrival time or the departure time of a client
  timing routine: a subroutine which determines and removes the most imminent event record from the event list and advances the simulation clock to the time when the corresponding event is to occur
  event routine: a subroutine which updates the state of the system when a particular type of event occurs; there is one event routine for each type of event

4. Event Scheduling
  Determine the number of event types in the system, e.g., 1: arrival, 2: request for service, 3: service completion, 4: timer, etc.
  Place one or more initial event records in the event list, each containing event time, event type, customer class, etc.
  In a loop, until a specified stopping rule is satisfied, determine the most imminent event in the event list (by the timing routine).
  Update the simulation clock when an event record is removed from the event list.

5. Event Scheduling (cont.)
  Pass control to the event routine corresponding to the event type:
    update the state of the system
    gather statistics if necessary
  Report the simulation results when the simulation is completed, for example:
    the average response time per client
    the loss probability of calls
    the system throughput
    the average number of clients served over a time period
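The event list and timing routine described on slides 3 to 5 can be sketched in plain C, independent of smpl. This is a minimal illustration, not smpl's implementation: the names `EventList`, `ev_schedule`, and `ev_cause`, the fixed capacity, and the linear search are all assumptions made for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 64              /* capacity chosen for illustration only */

typedef struct {
    double time;                   /* event occurrence time */
    int    type;                   /* event type, e.g. 1 = arrival */
    int    token;                  /* customer (token) id */
} EventRecord;

typedef struct {
    EventRecord rec[MAX_EVENTS];   /* unsorted event list */
    size_t      count;
    double      clock;             /* simulation clock */
} EventList;

void ev_init(EventList *el) { el->count = 0; el->clock = 0.0; }

/* Insert an event record that occurs `delay` time units from now. */
void ev_schedule(EventList *el, int type, double delay, int token)
{
    assert(el->count < MAX_EVENTS && delay >= 0.0);
    EventRecord r = { el->clock + delay, type, token };
    el->rec[el->count++] = r;
}

/* Timing routine: remove the most imminent record and advance the clock.
   Returns 0 if the event list is empty, 1 otherwise. */
int ev_cause(EventList *el, int *type, int *token)
{
    if (el->count == 0) return 0;
    size_t best = 0;
    for (size_t i = 1; i < el->count; i++)
        if (el->rec[i].time < el->rec[best].time) best = i;
    EventRecord r = el->rec[best];
    el->rec[best] = el->rec[--el->count];  /* fill the hole with the last record */
    el->clock = r.time;
    *type  = r.type;
    *token = r.token;
    return 1;
}
```

A simulation loop would call `ev_cause` repeatedly and switch on the returned event type. A real event list would normally be kept sorted or stored in a heap; the linear scan here only keeps the sketch short, and events with equal times come out in no particular order.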

6. Simulation using smpl
In the smpl view of systems, there are three types of entities:
  resources (facilities): smpl provides functions to define, request, release and preempt facilities, and manages their queues
  tokens: the active entities of the system, e.g., tasks, users (indistinguishable or distinguishable)
  events: a change of state of any system entity is an event; smpl provides functions for scheduling events and for selecting events in the order of event occurrence time

7. Structure of an smpl Program
    initialization routine;
    timing control routine to select the most imminent event
    from the event list (event clock is updated implicitly)
    {
      event type 1: event routine for event type 1;
      event type 2: event routine for event type 2;
      ...
      event type n: event routine for event type n;
    }
    statistics reporting routine;

8. Initialization Routine
    smpl(m, s)
    int m = 0;   /* always 0 */
    char *s;
smpl provides seeds for 15 streams for generating random numbers. To collect a set of 15 sample values of a particular performance measure, one can invoke smpl() 15 times:
    loop: repeat 15 times { smpl(0, "hw1"); }
One can also use stream(1), stream(2), etc. to specify the stream number to be used in a simulation run.

9. Facility Definition and Control
    fd = facility(s, n)
    char *s; int n;   /* # of servers */
  => define a queueing facility with "n" servers; smpl automatically manages enqueueing/dequeueing activities
    r = request(fd_id, token_id, pri)
    int fd_id; int token_id; int pri;
  => request that a server of facility "fd_id" be reserved for the token designated by "token_id" with priority "pri" (higher is better)
     r=0: facility is reserved
     r=1: facility is busy and the request is blocked in the queue, ordered on priority

10. Facility Definition and Control (cont.)
    r = preempt(fd_id, tkn_id, pri)
    int fd_id, tkn_id, pri;
  => same as request() except that it will preempt the server if it is busy serving a task with priority < "pri"
  => the event record corresponding to the preempted token (for the service completion event) is removed from the event list and a queue entry with the residual time is created
     r=0: facility is reserved
     r=1: facility is busy and the request is blocked in the queue, ordered on priority
    release(fd_id, tkn_id)
    int fd_id; int tkn_id;
  => release the facility and, if the queue is not empty, reschedule an event with the event occurrence time at NOW for a blocked task, or at NOW + the residual time for a preempted task
     (to reschedule means to create an event of the same type and put it in the event list)

11. Scheduling Events
    schedule(event_id, te, tkn_id)
    int event_id;
    real te;   /* time interval relative to the current time */
    int tkn_id;
  => schedule the event with id "event_id" to occur at NOW+te
  => this essentially inserts an event record with the event occurrence time NOW+te into the event list
  => part of the information in the event record is event_id, tkn_id and the event occurrence time NOW+te
Example: schedule(2, 0.0, token_id)
  => schedule event type #2 associated with token id "token_id" to occur NOW

12. Timing Routine
    cause(event_id, tkn_id)
    int *event_id;
    int *tkn_id;
  => remove the most imminent event from the event list and automatically advance the simulation clock to the event occurrence time
  => return the event number (type) and token id to the caller
Typically in an smpl program, we use a switch statement on the event_id returned, so as to transfer control to the appropriate event routine.

13. Canceling Events
    cancel(event_id)
    int event_id;
  => search the event list and remove the first event with event number = event_id
Get Current Simulation Time
    t = real time()
  => return the current simulation clock value
  => real is a predefined type; it is the same as double in C

14. Status Functions
    n = int inq(fd)
  => return the # of tokens currently in queue (not including the ones in service)
    r = int status(fd)
  => r=0: facility is free; r=1: facility is busy
    u = real U(fd)
  => mean # of tokens in service
    n = real Lq(fd)
  => mean # of tokens in queue, excluding the ones in service
    b = real B(fd)
  => mean busy period = accumulated busy time / release counts

15. Random Variate Generation (rand.c)
    r = real drand48();   /* available on UNIX machines */
  => return r in the range of (0,1)
    r = real expntl(x)
    double x;
  => return an exponentially distributed sample value with mean x
    r = real uniform(a,b)
    double a,b;
  => return a real number r in the range of (a,b)
    k = int random(i,j)
    int i, j;
  => return an integer k in the range of (i,j)
    r = real normal(x,s)
  => return a normally distributed sample value with mean x and standard deviation s

16. Traces and Debugging
    trace(n)
    int n;
  => generate trace messages when a facility is defined, requested, or released, or whenever an event is scheduled or caused
     n=0: trace is off
     n=1: free-running, i.e., trace messages are generated continuously
     n=2: screen by screen running (press any key to resume tracing)
     n=3: message by message running (press any key to resume)

17. M/M/1 smpl program
    #include "smpl.h"
    main()
    {
      /* mean interarrival time, mean service time, simulation end time */
      real Ta=200, Ts=100, te=200000;
      int customer=1, event, server;
      smpl(0, "M/M/1 Queue");
      server = facility("server",1);
      schedule(1, 0.0, customer);
      while (time() < te) {
        cause(&event, &customer);
        switch(event) {
          case 1: /* arrival */
            schedule(2, 0.0, customer);
            schedule(1, expntl(Ta), customer);
            break;
          case 2: /* request server */
            if (request(server, customer, 0) == 0)
              schedule(3, expntl(Ts), customer);
            break;
          case 3: /* completion */
            release(server, customer);
            break;
        }
      }
      report();
    }

18. Confidence Interval and Level
Suppose we collect N sample values $Y_1, Y_2, \ldots, Y_N$ from N simulation runs.
  sample mean: $\bar{Y} = (Y_1 + Y_2 + \cdots + Y_N)/N$
  true mean: $\mu$
Define $1-\alpha$ as the probability that the absolute value of the difference between $\bar{Y}$ and $\mu$ is equal to or less than $H$, that is,
  $\mathrm{prob}[\bar{Y} - H \le \mu \le \bar{Y} + H] = 1 - \alpha$
Here $[\bar{Y}-H, \bar{Y}+H]$ is the confidence interval, $1-\alpha$ is the confidence level, and $H$ is the confidence interval half-width.

19. Confidence Interval and Level (cont.)
When $Y_1, Y_2, \ldots, Y_N$ are independent random variables from a normal distribution with mean $\mu$, $H$ is defined by
  $H = t_{\alpha/2;\,N-1} \cdot s/\sqrt{N}$
where $t$ is the Student's t distribution and $s^2$ is the sample variance given by
  $s^2 = \sum_i (Y_i - \bar{Y})^2 / (N-1)$
(and thus $s$ is the sample standard deviation).

20. Batch Mean Analysis by smpl
  Use a batch size m of around 2000 observations to collect one sample value $Y_i$, to justify the normal distribution assumption (by the central limit theorem).
  Delete d = 0.1 m initial observations.
  Collect k = 10 batches and compute the confidence interval half-width H.
  If the desired accuracy has not been reached, collect another batch and compute H again. Repeat as necessary.
[Figure: the observation sequence split into consecutive batches of m samples; the first batch generates $Y_1$, the second generates $Y_2$.]

21. BMA: stat.c and bmeans.c
Based on a 95% confidence level ($\alpha$ = 0.05) with 10% confidence accuracy ($H/\bar{Y}$ = 10%).
The following three routines are provided:
  init_bm(d, m): d is the number of initial observations to be discarded and m is the number of observations needed to collect one sample $Y_i$
  obs(y): y is an observation value generated by the simulation run; a return value of 1 means that the required confidence level and accuracy have been reached; otherwise, continue calling obs(y)
  civals(Y, H, k): Y, H and k are passed in by reference; this function returns the final result

22. M/M/1 smpl program with BMA
    #include "smpl.h"
    #define TOKENS 1000
    #define TRUE 1
    #define FALSE 0
    main()
    {
      real Ta=200.0,Ts=100.0,mean,hw;
      int tk_id=0,customer=0,event,server,nb;
      real ts[TOKENS]; /* start time stamp */
      int cont=TRUE;
      smpl(0,"M/M/1 Queue with BMA");
      init_bm(200,2000); /* d=200; m=2000 */
      server=facility("server",1);
      schedule(1,0.0,tk_id);
      while (cont) {
        cause(&event,&customer);
        switch(event) {
          case 1: /* arrival */
            ts[customer] = time();
            schedule(2,0.0,customer);
            if (++tk_id >= TOKENS) tk_id=0;
            schedule(1,expntl(Ta),tk_id);
            break;
          case 2: /* request server */
            if (request(server,customer,0)==0)
              schedule(3,expntl(Ts),customer);
            break;
          case 3: /* release server */
            release(server,customer);
            if (obs(time()-ts[customer]) == 1) cont = FALSE;
            break;
        }
      } /* end while */
      civals(&mean, &hw, &nb);
      printf("Y= %f; H= %f after %d batches\n", mean, hw, nb);
    }

23. bmeans.c
    #include "smpl.h"
    #include "stat.c"
    static int d,k,m,n;
    static real smy,smY,smY2,Y,h;

    init_bm(m0,mb) int m0,mb;
    { /* set deletion amount & batch size */
      d=m0; m=mb; smy=smY=smY2=0.0; k=n=0;
    }

    obs(y) real y;
    {
      int r=0; real var;
      if (d) then {d--; return(r);}
      smy+=y; n++;
      if (n==m) then
      { /* batch complete: update sums & counts */
        smy/=n; smY+=smy; smY2+=smy*smy; k++;
        printf("batch %2d mean = %.3f",k,smy);
        smy=0.0; n=0; /* reset batch variables */
        if (k>=10) then
        { /* compute grand mean & half width */
          Y=smY/k; var=(smY2-k*Y*Y)/(k-1);
          h=T(0.025,k-1)*sqrt(var/k);
          printf(", rel. HW = %.3f",h/Y);
          if (h/Y<=0.1) then r=1;
        }
        printf("\n");
      }
      return(r);
    }

    civals(mean,hw,nb) real *mean,*hw; int *nb;
    { /* return batch means analysis results */
      *mean=Y; *hw=h; *nb=k;
    }

24. Chap 2: Reliability and Availability Models
Reliability $R(t)$ = prob{S is fully functioning in [0,t]}
Suppose that over the time period [0,t] we observe N components, of which
  $N_0(t)$: # of components operating correctly at time t
  $N_f(t)$: # of components which have failed at time t
Then $R(t) = N_0(t)/N = 1 - N_f(t)/N$, and $dR(t)/dt = -(1/N)\, dN_f(t)/dt$.
Physical meaning of $dN_f(t)/dt$: the instantaneous rate at which components are failing.

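25. Failing rate of a single component
How many unfailing components are there at time t? $N_0(t)$.
[Figure: time axis with the interval from t to t+dt, during which $N_f(t+dt) - N_f(t)$ components fail.]
  $Z(t) = \frac{1}{N_0(t)} \frac{dN_f(t)}{dt} = -\frac{1}{R(t)} \frac{dR(t)}{dt}$   (2.3), text p. 28
$Z(t)$ is also called the hazard rate.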

26. For electronic components, the relationship of $Z(t)$ with respect to time is a bathtub curve.
[Figure: $Z(t)$ versus time, with three phases.]
  Infant mortality phase (burn-in period): high failure rate due to faulty design, manufacturing or assembly. Weak components are removed during the "burn-in" period.
  Useful time phase: the failure rate can be assumed to be a constant during the useful time period, say $\lambda$.
  Wear-out phase: failures due to aging.

27. Exponential Failure Law
With a constant failure rate $Z(t) = \lambda$, the reliability is $R(t) = e^{-\lambda t}$.
e.g., $\lambda$ = 0.01 hr$^{-1}$; what is $R(t)$ at t = 100 hrs? Ans: $e^{-0.01 \cdot 100} = e^{-1} \approx 0.368$
For hardware components, the exponential failure law is frequently assumed.
For software components, the reliability may grow as the software's design faults are removed during the testing/debugging phase.

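28. Weibull distribution
In general, we can assume $Z(t) = \alpha\lambda(\lambda t)^{\alpha-1}$. This yields $R(t) = e^{-(\lambda t)^{\alpha}}$ (Weibull dist.).
[Figure: $Z(t)$ versus t for $\alpha < 1$, $\alpha = 1$ and $\alpha > 1$.]
Use this to model the software failure rate, e.g., $\alpha = -1$ gives $R(t) = e^{-(\lambda t)^{-1}}$:
  $t \to \infty$, $R(t) \to 1$
  $t \to 0$, $R(t) \to 0$
Reliability improves as a function of t.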

29. Formal Definition of R(t)
Let x be a random variable representing the life of a system and let F be the cumulative distribution function (CDF) of x. Then
  $R(t) = P(x > t) = 1 - F(t)$
For a component obeying the exponential failure law,
  $F(t) = 1 - e^{-\lambda t}$, $R(t) = e^{-\lambda t}$

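30. Mean Time to Failure (MTTF)
The expected time that a system will operate before the first failure occurs.
Discrete case: observe N identical systems, where system i first fails at time $t_i$ (i = 1, ..., N):
  $MTTF = \frac{1}{N} \sum_{i=1}^{N} t_i$
Continuous case, with failure-time density $f(t)$:
  $MTTF = \int_0^{\infty} t f(t)\, dt = \int_0^{\infty} R(t)\, dt$
Q: what is the reliability of a system obeying the exponential failure law at t = MTTF?
Ans: $MTTF = 1/\lambda$, so $R(MTTF) = e^{-1} \approx 0.368$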

31. Relationship between MTBF (Mean Time Between Failures), MTTR (Mean Time to Repair) & MTTF
[Figure: time axis alternating MTTF and MTTR intervals, marking the time of the first failure and the time of the 2nd failure.]
  MTBF = MTTR + MTTF
  If MTTF >> MTTR, then MTBF $\approx$ MTTF.
If we also assume that a failed system obeys the "Exponential Repair Law" with repair rate $\mu$, then $MTTR = 1/\mu$.

32. Availability
Instantaneous (or point) availability: $A(t)$ = prob{the system is functioning at time t}, regardless of the # of times it may have failed & been repaired during [0, t].
Steady-state availability = $\frac{MTTF}{MTTF + MTTR}$
For a system without repair, $A(t) = R(t)$.
[Figure: $A(t)$ versus time t.]

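33. Assume exponential "failure" & "repair" laws
[State diagram: state O (operational) goes to state F (failed) at rate $\lambda$; F returns to O at rate $\mu$.] See also page 67, text chapter 4.
Time domain:
  $P_O'(t) = -\lambda P_O(t) + \mu P_F(t)$
  $P_F'(t) = \lambda P_O(t) - \mu P_F(t)$
  with initial state $P_O(0) = 1$ & $P_F(0) = 0$
Laplace domain:
  $s P_O(s) - 1 = -\lambda P_O(s) + \mu P_F(s)$   (the 1 is $P_O(0)$)
  $s P_F(s) - 0 = \lambda P_O(s) - \mu P_F(s)$   (the 0 is $P_F(0)$)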

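34. Solve for $P_O(s)$ (similarly $P_F(s)$), then apply the inverse Laplace transform (LT) to return to the time domain:
  $P_O(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu} e^{-(\lambda+\mu)t}$
  $P_F(t) = \frac{\lambda}{\lambda+\mu} - \frac{\lambda}{\lambda+\mu} e^{-(\lambda+\mu)t}$
Physical meaning: $P_O(t)$ = prob{the system is functioning at time t}, so
  $A(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu} e^{-(\lambda+\mu)t}$
Laplace transform pairs, $L(F(t)) = f(s)$:
  F(t)        f(s)
  1           1/s
  t           1/s^2
  t^n         n!/s^(n+1)
  e^(at)      1/(s-a)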

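35. Q1: unavailability? Ans: $P_F(t)$
Q2: $R(t)$? Still $e^{-\lambda t}$
Q3: steady-state availability? Ans: $A(t \to \infty) = \frac{\mu}{\lambda+\mu}$
  e.g., $\lambda$ = 0.01 & $\mu$ = 0.1 give 0.90909...
[Figure: $A(t)$ versus t, starting at 1.0 and levelling off near 0.9.]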

36. Modeling: Series-Parallel Reliability Block Diagrams
  A series-parallel block diagram represents the logical structure of a system with regard to how the reliabilities of its components affect the system reliability.
  Components are combined into blocks in
    series,
    parallel, or
    k-of-n configurations.

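37. A. Serial system: each element of the system is required to function correctly for the system to function correctly.
[Figure: blocks 1, 2, 3, ..., n in series.]
  $R_{series}(t) = \prod_{i=1}^{n} R_i(t)$   (1)
B. Parallel system: only one of several elements must be operational for the system to be operational.
[Figure: blocks 1, 2, ..., n in parallel.]
  $R_{parallel}(t) = 1 - \prod_{i=1}^{n} (1 - R_i(t))$   (2)
Assumptions: component lifetimes are independent random variables; coverage is perfect, so up to n-1 failures can be tolerated.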

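38. C. Combination of series & parallel systems
e.g., four stages in series, each a parallel pair:
  stage 1: Computer 1, Computer 2
  stage 2: Interface 1, Interface 2
  stage 3: Display 1, Display 2
  stage 4: Bus 1, Bus 2
  $R_{system} = \prod_{i=1}^{4} \left[ 1 - \prod_{j=1}^{2} (1 - R_{j,i}) \right]$, where $R_{j,i}$ is for the jth component of the ith unit
Numerical ex: R = 0.9; then $R_{system} = [1-(1-0.9)^2]^4 = 0.96$ vs. $R_{non-redundant} = (0.9)^4 = 0.6561$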
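The series and parallel rules compose mechanically, which the following sketch illustrates by recomputing the numerical example (four series stages, each a parallel pair with R = 0.9). The function names are illustrative, and component independence is assumed as on the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Series structure: every component must work, so reliabilities multiply. */
double series_reliability(const double *r, size_t n)
{
    assert(n > 0);
    double prod = 1.0;
    for (size_t i = 0; i < n; i++) prod *= r[i];
    return prod;
}

/* Parallel structure: the system fails only if every component fails. */
double parallel_reliability(const double *r, size_t n)
{
    assert(n > 0);
    double all_fail = 1.0;
    for (size_t i = 0; i < n; i++) all_fail *= (1.0 - r[i]);
    return 1.0 - all_fail;
}
```

A redundant pair is `parallel_reliability` over {0.9, 0.9}, which is 0.99; four such stages in series give 0.99^4, about 0.9606, against 0.9^4 = 0.6561 for the non-redundant chain.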

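39. e.g., three branches in parallel: branch 1 has two components ($R_1$, $R_1$) in series, branch 2 has three components ($R_2$, $R_2$, $R_2$) in series, and branch 3 has a single component ($R_3$).
  $R_{parallel} = 1 - (1 - R_{series,1})(1 - R_{series,2})(1 - R_{series,3}) = 1 - (1 - R_1^2)(1 - R_2^3)(1 - R_3)$
Q: Prove the following theorem: replication at the component level is more effective than replication at the system level in improving system reliability using the same # of components.
Ans: Show that the component-level replicated structure is better than the system-level replicated structure. [Figures of the two structures not recoverable.] Assume R = 1/2 for each component.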

40. D. k-out-of-n
e.g., TMR (Triple Module Redundancy) is a 2-out-of-3 system.
  $R_{TMR}(t) = R_1(t) R_2(t) R_3(t)$   (all are functioning)
    $+ R_1(t) R_2(t) (1 - R_3(t)) + R_1(t) R_3(t) (1 - R_2(t)) + R_2(t) R_3(t) (1 - R_1(t))$   (1 failed & 2 are functioning)
When $R_1(t) = R_2(t) = R_3(t) = R(t)$:
  $R_{2\text{-out-of-}3}(t) = 3R^2(t) - 2R^3(t)$
In general, for n identical components,
  $R_{k\text{-out-of-}n}(t) = \sum_{i=k}^{n} \binom{n}{i} R^i(t) (1 - R(t))^{n-i}$   (3)

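41. Q: Is $R_{TMR}(t) > R_{single}(t)$, the reliability of a system with a single component?
Let $3R^2 - 2R^3 = R$; then $R^2 - \frac{3}{2}R + 0.5 = 0$, so $R = 1.0$ or $0.5$.
$R_{TMR}(t) = R(t)$ when the reliability of a single module is 1.0 or 0.5. In fact, when $R(t) < 0.5$, $R(t) > R_{TMR}(t)$.
[Figure: $R_{system}$ versus $R(t)$ on [0,1] for the single module and for TMR; the curves cross at 0.5 and 1.0, and the region above 0.5 is the more realistic one.]
Q: what is the MTTF of a k-out-of-n system when each single module follows the exponential failure law with a failure rate of $\lambda$?
  $R_{system}(t) = \sum_{i=k}^{n} \frac{n!}{i!\,(n-i)!} e^{-i\lambda t} (1 - e^{-\lambda t})^{n-i}$
  $MTTF_{system} = \int_0^{\infty} R_{system}(t)\, dt = \frac{1}{\lambda} \left( \frac{1}{k} + \frac{1}{k+1} + \cdots + \frac{1}{n} \right)$
e.g.,
  2-out-of-3: $MTTF = \frac{1}{\lambda}\left(\frac{1}{2} + \frac{1}{3}\right) = \frac{5}{6\lambda}$
  2-out-of-5: $MTTF = \frac{1}{\lambda}\left(\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5}\right) = \frac{77}{60\lambda}$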
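As a worked check of the k-out-of-n MTTF expression, the 2-out-of-3 (TMR) case can be integrated directly, assuming identical modules with $R(t) = e^{-\lambda t}$:

```latex
\begin{aligned}
MTTF_{2\text{-out-of-}3}
  &= \int_0^{\infty} \left( 3e^{-2\lambda t} - 2e^{-3\lambda t} \right) dt \\
  &= \frac{3}{2\lambda} - \frac{2}{3\lambda}
   = \frac{5}{6\lambda}
   = \frac{1}{\lambda}\left( \frac{1}{2} + \frac{1}{3} \right)
\end{aligned}
```

Since $5/(6\lambda) < 1/\lambda$, TMR has a shorter MTTF than a single module, even though its reliability is higher whenever $R(t) > 0.5$.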

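42. Q: what is the reliability & MTTF of the following structure? (Fig. 2.5, p. 36)
[Figure: processors P1, P2 and memories m1, m2, m3.]
  processors: $\lambda_p$ = 0.00139/day, 1-out-of-2 (also a parallel system)
  memories: $\lambda_m$ = 0.00764/day, 1-out-of-3 (a parallel system too)
  $R_{system}(t) = [1 - (1 - e^{-0.00139t})^2][1 - (1 - e^{-0.00764t})^3]$
    $= 6e^{-0.00903t} - 3e^{-0.01042t} - 6e^{-0.01667t} + 3e^{-0.01806t} + 2e^{-0.02431t} - e^{-0.0257t}$
  $MTTF = \frac{6}{0.00903} - \frac{3}{0.01042} - \frac{6}{0.01667} + \frac{3}{0.01806} + \frac{2}{0.02431} - \frac{1}{0.0257} = 226.09$
Recall that
  $A_i(t) = \frac{\mu_i}{\lambda_i+\mu_i} + \frac{\lambda_i}{\lambda_i+\mu_i} e^{-(\lambda_i+\mu_i)t}$
where $\mu_i$ is the repair rate and $\lambda_i$ is the failure rate of component i.
Equations (1), (2) & (3) obtained above can also be used to compute the system availability by replacing $R_i(t)$ with $A_i(t)$.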

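43. Specifically, the availability counterparts of equations (1), (2) & (3) are:
  $A_{series}(t) = \prod_{i=1}^{n} A_i(t)$   (4)
  $A_{parallel}(t) = 1 - \prod_{i=1}^{n} (1 - A_i(t))$   (5)
  $A_{k\text{-out-of-}n}(t) = \sum_{i=k}^{n} \binom{n}{i} A^i(t) (1 - A(t))^{n-i}$   (6)
For steady-state availability, use $A_i(t \to \infty) = \frac{\mu_i}{\lambda_i + \mu_i}$ in equations (4), (5) & (6) above.
[The assumption stated on the slide for these equations is not recoverable.]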

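44. Chap 9: Reliability & Availability Modeling
Reliability Block Diagram using sharpe (p. 358, Appendix B)
[Figure: block diagram with components A, B, three C's, two D's, and E.]
Syntax:
    block name {( param-list )}     (the parameter list is optional)
    < block line >
    ...
    end
A < block line > can be one of the following: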

45. 1) comp name exponential-polynomial
     The exponential polynomial refers to the cumulative distribution function (cdf). It can be
       exp (lambda), meaning $F(t) = 1 - e^{-\lambda t}$, $R(t) = e^{-\lambda t}$
       cdf (component-name{,state}{;arg-list}), which has been defined before
       gen triple 1, triple 2, ... with triples in the form of $(a_j, k_j, b_j)$; see p. 352
2) parallel name name-1 name-2 {name3 name4 …}
     The parallel system is assigned to the first name.
3) series name name-1 name-2 {name3 name4 …}
     The series system is assigned to the first name.
4) kofn name expression-1, expression-2, component-name
     expression-1 is k and expression-2 is n; the n components are identical.

46. 5) kofn name k-expression, n-expression, name1 name2 {name3…}
     representing a k-out-of-n system having possibly different components (components do not have identical failure-time distributions)
Ex: Fig. 2.5 (p. 36): CPU1, CPU2 sharing memory m1, m2, m3 (the memory is a k-out-of-n device)
    block system ( k, n, pfrate, mfrate )
    comp CPU exp ( pfrate )
    comp mem exp ( mfrate )
    parallel CPUs CPU CPU
    kofn mems k, n, mem
    series subsystem CPUs mems
    end
Output:
  k=1.00 (1-out-of-3)
    CDF for system, in semi-symbolic form: $1 - 6e^{-0.00903t} + 3e^{-0.0104t} + \ldots$
    mean(system;k,3,…) = 2.26*10^2
    rel (10,k,3…) = 9.9981*10^-1
    rel (365,k,3…) = 8.33*10^-1
  k=2.00
    ...

47. Comments: any line starting with the symbol "*".
    * printing output
    * printing F(t) in symbolic form
    cdf (system; 1, 3, 0.00139, 0.00764)
    * define reliability function at time t
    func rel(t, k, n, pf, mf)\
      1-value (t; system; k, n, pf, mf)
    loop k, 1, 3, 1
      expr mean (system; k, 3, 0.00139, 0.00764)
      expr rel (10, k, 3, 0.00139, 0.00764)
      expr rel (365, k, 3, 0.00139, 0.00764)
    end
    end
Notes: value(t; system; ...) returns F(t) at time t numerically; "\" means continuation to the next line.

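48. Fig. 9.1, p. 156
[Figure: block diagram with A ($\lambda$=0.05), B ($\lambda$=0.01), three C's in parallel ($\lambda$=0.3), two D's in parallel ($\lambda$=0.25), and E ($\lambda$=0.1), all in series.]
    block block1a
    comp A exp(0.05)
    comp B exp(0.01)
    comp C exp(0.3)
    comp D exp(0.25)
    comp E exp(0.1)
    parallel threeC C C C
    parallel twoD D D
    series sys1 A B threeC twoD E
    end
    * printing: 5 decimal places
    format 5
    cdf (block1a)
    expr 1-value(10; block1a)
    end
The last expr prints 1-F(t) = R(t) at t = 10.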

49. Availability Modeling
To define $1 - A_i(t)$:
    poly unavail(mu, lambda)\
      gen\
      1, 0, 0\
      -mu/(lambda+mu), 0, 0\
      -lambda/(lambda+mu), 0, -(lambda+mu)
    block block1a
    comp A unavail(muA, lambdaA)
    comp B unavail(muB, lambdaB)
    ...
    end
    bind
    muA 1
    lambdaA 0.0001
    ...
    end
    * print steady state availability A(infinity)
    expr pinf(block1a)
    * print instantaneous availability at t=100
    expr 1-value(100; block1a)
    end
See p. 354 of the text on user-defined distributions. Syntax: poly name(param-list) dist., where the distribution is given as gen\ triple\ triple ... with triples of the form $(a_j, k_j, b_j)$.

50. Fault Trees
  A pictorial representation of events that can cause the occurrence of an undesirable event.
  An event at a high level is reduced to a combination of lower level events by means of logic gates:
    AND: when all fail, the failure event occurs.
    OR: when one fails, the failure event occurs.
    k out of n: when at least k out of n components fail, the failure event occurs.
e.g., [Figure: fault tree with top event "failure" as an OR of two AND gates, one over P1, P2 and one over M1, M2, M3.] (p. 39, chap. 2)

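51. For a fault tree without repeated components, the unreliability (failure probability) $Q_{ftree}(t)$ is computed gate by gate from the input failure probabilities $Q_i(t)$:
  AND gate: $Q(t) = \prod_{i=1}^{n} Q_i(t)$
  OR gate: $Q(t) = 1 - \prod_{i=1}^{n} (1 - Q_i(t))$
  k-out-of-n gate, for n identically distributed components: $Q(t) = \sum_{i=k}^{n} \binom{n}{i} Q^i(t) (1 - Q(t))^{n-i}$
  k-out-of-n gate, for non-identically distributed components: sum, over every set that contains at least k failed components, of the probability that exactly the components in that set have failed.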

52. The above equations cannot be used when there is a repeated component.
e.g., [Figure: fault tree with top event "failure" as an AND of subsystems S1 and S2; S1 is an OR of P1 and (M1 AND M3); S2 is an OR of P2 and (M2 AND M3).]
  2 processors: P1 & P2
  3 memory modules: M1, M2 & M3
  M3 is shared by P1 & P2
  M1 is private to P1
  M2 is private to P2
The system will operate as long as there is at least one operational processor with access to either a private or shared memory.

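53. Let an indicator variable be defined for each component, i.e., it is 1 if the component (e.g., processor i) fails at time t and 0 otherwise.
Then subsystem S1 fails when either P1 fails or both M1 and M3 fail (OR gate for S1); likewise for S2 with P2, M2 and M3 (OR gate for S2).
[Equations not recoverable.]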

54. Fault Tree using Sharpe (p. 172, chapter 9)
e.g., [Figure: top event "failure" as an OR of A (hyperexponential) and B (exponential failure law).]
    bind
    a1 0.028
    a2 0.25
    P 0.5
    end
    ftree series (lambda)
    basic B exp (lambda)
    basic A gen \
      1, 0, 0 \             * for 1
      -P, 0, -a1 \          * for -P e^(-a1 t)
      -(1-P), 0, -a2        * for -(1-P) e^(-a2 t)
    or top A B
    end
    * print
    cdf (series; 0.05)
    eval (series; 0.05) 0.5 1.5 0.5
    end
The argument 0.05 is the failure rate lambda.

55. e.g., Aircraft flight control system
[Figure: top event "failure" as an OR of four k-out-of-n gates: inertial reference sensors (3 IRS, gate 2/3), pitch rate sensors (PRS1 to PRS3, gate 2/3), computer systems (4 CS, gate 3/4), and secondary actuators (SA1 to SA3, gate 2/3). The 3/4 gate means that if 3 out of 4 fail, then the subsystem fails.]
    bind
    mIRS 0.000015
    mPRS 0.00099
    mSA 0.000037
    mCS 0.00048
    end
    ftree aircraft
    basic IRS exp (mIRS)
    basic PRS exp (mPRS)
    :
    kofn IRS23 2, 3, IRS
    kofn PRS23 2, 3, PRS
    :
    or top IRS23 PRS23 CS34 SAS23
    end
    * most susceptible to failure
    * => use 4 CS components
    * print
    format 8
    expr mean(aircraft)
    eval (aircraft) 1000 10000 1000
    expr value(10; aircraft)    * unreliability
    end

56. Fault Trees using SHARPE with repeated components
Ex: also see the example in Fig. 9.22.
Ex: [Figure: fault tree with top event "failure" as an AND of (P1 OR (M1 AND M3)) and (P2 OR (M2 AND M3)).]
    ftree system
    basic P1 exp (lambda_P1)
    basic P2 exp (lambda_P2)
    basic M1 exp (lambda_M1)
    basic M2 exp (lambda_M2)
    repeat M3 exp (lambda_M3)
    AND M1M3 M1 M3
    AND M2M3 M2 M3
    OR system1 P1 M1M3
    OR system2 P2 M2M3
    AND top system1 system2
    end
    * print reliability at time t
    expr 1-value ( t; system)
Equivalent forms: "OR system1 P1 M1M3" is the same as "kofn system1 1, 2, P1 M1M3", and "AND top system1 system2" is the same as "kofn top 2, 2, system1 system2".

57. 2.6 Series-Parallel Block Diagrams with Components in Common
(Also called Network Reliability Models)
e.g., [Figure: network with components 1, 2, 3, 4, 5.]
Such a network can be arranged as
  a parallel connection of series structures, or
  a series connection of parallel structures.
Definition: a minimal path (set) is a minimal set of components whose functioning ensures the functioning of the system.
Definition: a minimal cut (set) is a minimal set of components whose failure ensures the failure of the system.

58. Q: how many minimal cuts?
A: 4: {1,2}, {4,5}, {1,3,5}, {2,3,4}. Connect them in series (each cut is a parallel block of its components).
Q: how many minimal paths?
A: 4: {1,3,5}, {1,4}, {2,5}, {2,3,4}. Use a parallel connection (each path is a series block of its components).
These are series-parallel diagrams with common components.

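59. Another example: TMR is a 2-out-of-3 system
3 minimal paths: {1,2}, {2,3} & {1,3}
[Figure: three series pairs (1,2), (2,3), (3,1) connected in parallel.]
Let $X_i$ = 1 if component i is functioning and 0 otherwise. When the product over the paths is expanded, repeated factors simplify due to $X_i^2 = X_i$, which follows from the definition of $X_i$.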

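60. Now
  $X_{system} = 1 - (1 - X_1 X_2)(1 - X_2 X_3)(1 - X_1 X_3) = X_1 X_2 + X_2 X_3 + X_1 X_3 - 2 X_1 X_2 X_3$
  $R_{system} = R_1 R_2 + R_2 R_3 + R_1 R_3 - 2 R_1 R_2 R_3$
The terms each contain only independent components, so the expectation can be taken term by term.
For identical components, $R_1 = R_2 = R_3 = R$, $R_{system} = 3R^2 - 2R^3$.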

61. Modeling with a Reliability Graph
Chap. 2 (2.5) & Chap. 9 (9.3), p. 43 & p. 180
* A reliability graph consists of nodes & directed arcs.
    source node: no arcs enter it
    target (sink) node: no arcs leave it
* A system represented by a reliability graph fails when there is no path from the source to the sink.
* Arcs are associated with failure time distributions (as cdfs).

62. 62e.g.,12345acdb13452sourcesinkArcs are associated with exponential distributions with rates i’srelgraph bridge(v1, v2, v3, v4, v5)a b exp(v1)a c exp(v4)b d exp(v2)c d exp(v5)bidirectb c exp(v3)end* print cdf (bridge;1,2,3,4,5) pqcdf (bridge;1,2,3,4,5)endThe underlying technique for solvingthe model is minimal path set &minimal cut set.output for pqcdf (bridge;1,2,3,4,5): 1-([P(0:a,b)*P(1:b,d)]+[P(2:a,c)*P(3:c,d)*(1-P(0:a,b)*P(1:b,d))]+[P(0;a,b)*Q(1:b,d)*Q(2:a,c)*P(3:c,d)*P(4:b,c)]+[Q(0:a,b)*P(1:b,d)*P(2:a,c)*Q(3:c,d)*P(4:b,c)]meaning 1 - e-1 t e-2 tmeaning (1-e-l1t)

63. What is the reliability graph corresponding to the fault tree model (P1, P2 with private memories M1, M2 and shared memory M3)?
Ans: [Figure: source "src" has arcs P1 to node 1 and P2 to node 2; node 1 has arc M1 to the sink and node 2 has arc M2 to the sink; nodes 1 and 2 each have a never-failing arc to node "share", which has arc M3 to the sink.]
Sharpe code:
    relgraph P2M3shared
    src 1 exp(lambda_P1)
    src 2 exp(lambda_P2)
    1 sink exp(lambda_M1)
    2 sink exp(lambda_M2)
    share sink exp(lambda_M3)
    1 share inf
    2 share inf
    end
    * print reliability at time t
    expr 1-value(t; P2M3shared)
    end
The "inf" arcs mean that the 1 -> share & 2 -> share links never fail. See p. 353, Appendix B: inf specifies a component having all its mass at $\infty$, i.e., $F(t) = 0$ except $F(\infty) = 1$.

64. Single Queuing Systems
M/M/1 queuing system:
  the arrival process is a Poisson process (or the inter-arrival time is exponentially distributed)
  the service process is also a Poisson process (or the service time is exponentially distributed)
Advantage: a mathematically tractable model with solutions applicable to a wide variety of situations.
A Poisson process is a counting process $\{N(t), t \ge 0\}$ representing the # of events that have occurred up to time t.

65. Poisson process with an average arrival rate $\lambda$ ($\lambda$ is the proportionality constant):
[Figure: time axis divided into small intervals of length $\Delta t$.]
  Pr(exactly 1 arrival in $[t, t+\Delta t]$) = $\lambda \Delta t$
  Pr(no arrivals in $[t, t+\Delta t]$) = $1 - \lambda \Delta t$
Analogy with coin flipping: results of coin flips are independent; arrivals are also independent.

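66. Let
  $P_n(t) \equiv P(\text{# of arrivals} = n \text{ at time } t)$
  $P_{ij}(\Delta t) \equiv$ the prob. of going from i arrivals to j arrivals in a time interval of $\Delta t$ seconds
[State diagram: states 0, 1, 2, 3, ...; each state moves to the next with probability $\lambda \Delta t$ and stays put with probability $1 - \lambda \Delta t$.]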

67. 67Let t 0 then12solutionFrom1

68. The Poisson distribution
Continuing, by induction,
  $P_n(t) = \frac{(\lambda t)^n}{n!} e^{-\lambda t}$   (Poisson distribution)
meaning: the prob. of n arrivals in an interval of t seconds.
Ex: $\lambda$ = 100 arrivals/min.; what is the prob. of no arrivals in 5 sec.?
[Figure: $P_0(t)$, $P_1(t)$, $P_2(t)$, $P_3(t)$ plotted against t.]

69. * The mean & the variance of the Poisson dist. are both equal to $\lambda t$; the derivation of the mean is based on $\sum_n n P_n(t)$.
Inter-arrival time: the inter-arrival time T is a random variable.
  cdf: $P(\text{time between arrivals} \le t) = 1 - P(\text{time between arrivals} > t) = 1 - P_0(t) = 1 - e^{-\lambda t}$
    (the probability of no arrivals in a time interval of t is $P_0(t)$)
  pdf: inter-arrival time density = $\lambda e^{-\lambda t}$
  => T is an exponentially distributed r.v.
  => T has a memory-less property

70. e.g., $1/\lambda$ = mean interarrival time = 20 min.
The last train arrived 19 minutes ago. What is the expected time until the next train arrives? Ans: $1/\lambda$ = 20 min.
Coin flipping explanation: in each interval $\Delta t$ there is an arrival (yes) with probability $\lambda \Delta t$ or no arrival (no) with probability $1 - \lambda \Delta t$, independent of the past.
Memory-less property (Markov property): $P(T > t_0 + t \mid T > t_0) = P(T > t)$
* Definition: a Markov chain is a Markov process with a discrete state space.

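71. Probability flux (or flow): (probability of a state) * (transition rate originating from the state)
Physical meaning: the # of times per second that the event corresponding to the transition occurs.
[State diagram, M/M/1: states 0, 1, 2, 3, ..., n-1, n, n+1, ...; arrivals move one state up with probability $\lambda \Delta t$, departures move one state down with probability $\mu \Delta t$, and each state has a self-loop with the remaining probability.]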

72. 72n-1n+1nflows into stateflows out of statennflows out of state nflows into state nStudy:— transient behavior— equilibrium behaviorGlobal balance equations: a set of linear equations for tthis yields

73. Equilibrium state probabilities
Conservation of probability (normalization equation): $\sum_n P_n = 1$
Use local balance equations to solve the global balance equations:
  1. Local satisfies global.
  2. Local allows us to relate $P_n$ with a reference state, e.g., $P_0$.
Definition of local balance: "the probability flow into a state due to an arrival to a queue equals the probability flow out of the same state due to a departure from the same queue"
[State diagram: states 0, 1, 2, 3, ..., n-1, n.]

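74. Expressing each state probability in terms of $P_0$ by local balance:
  $P_0$: $P_0$
  $P_1$: $\lambda P_0 = \mu P_1$
  $P_2$: $\lambda P_1 = \mu P_2$
  ...
  $P_n$: $\lambda P_{n-1} = \mu P_n$, so $P_n = (\lambda/\mu)^n P_0$
Applying the normalization equation gives $P_0 = 1 - \lambda/\mu$.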
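The normalization step can be written out explicitly. Writing $\rho = \lambda/\mu$ and assuming $\rho < 1$ so that the geometric series converges:

```latex
\begin{aligned}
\sum_{n=0}^{\infty} P_n
  &= P_0 \sum_{n=0}^{\infty} \rho^n
   = \frac{P_0}{1-\rho} = 1
  \quad\Longrightarrow\quad P_0 = 1 - \rho, \\
P_n &= (1-\rho)\,\rho^n, \qquad n = 0, 1, 2, \ldots
\end{aligned}
```

For $\rho = 1/2$ this gives $P_n = (1/2)^{n+1}$, i.e. 1/2, 1/4, 1/8, and so on.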

75. Utilization: the prob. that an M/M/1 queuing system is nonempty, $\rho = 1 - P_0 = \lambda/\mu$
$P_n$ for an M/M/1 system when $\rho = \lambda/\mu = 1/2$:
  n     0     1     2     3      4      5
  P_n   1/2   1/4   1/8   1/16   1/32   1/64
* For a lightly loaded system, there are usually fewer than 4 customers in the system.

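76. Check: $\sum_{n=0}^{\infty} P_n = (1-\rho) \sum_{n=0}^{\infty} \rho^n = 1$
* $\rho < 1$ is required; otherwise $\lambda \ge \mu$ and the queuing system would no longer be in equilibrium, i.e., it would be unstable.
Q1: throughput? $X = \mu (1 - P_0) = \mu\rho = \lambda$, because when there is no customer, there is no contribution to throughput.
Q2: average # of customers in the queuing system? $\bar{n} = \sum_{n=0}^{\infty} n P_n = \frac{\rho}{1-\rho}$
  e.g., $\rho = 0.5$ gives $\bar{n} = 1$; $\rho = 0.9$ gives $\bar{n} = 9$.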

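77. Let R be the mean response time per customer.
Q3: R? $R = \frac{\bar{n}}{\lambda} = \frac{1/\mu}{1-\rho} = \frac{1}{\mu - \lambda}$, since $\bar{n} = \lambda R$ by Little's law (to be discussed later).
  $R = \frac{1}{\mu} + \frac{\rho}{\mu(1-\rho)}$, i.e., service time + waiting time.
When $\rho = 1$ the system is unstable.
[Figure: M/M/1 average # of customers as a function of $\rho$, passing through (0.5, 1) and (0.9, 9) and growing without bound as $\rho \to 1$.]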
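The M/M/1 results can be collected into a small helper for checking simulation output against theory. The sketch below uses the standard closed forms $\rho = \lambda/\mu$, $\bar{n} = \rho/(1-\rho)$ and $R = 1/(\mu-\lambda)$; the struct and function names are illustrative.

```c
#include <assert.h>

typedef struct {
    double rho;             /* utilization lambda/mu */
    double mean_customers;  /* average # of customers in the system */
    double response_time;   /* mean response time per customer */
    double throughput;      /* completions per time unit */
} MM1;

/* Fill *out with the equilibrium M/M/1 measures.
   Returns 0 (and leaves *out untouched) if the queue is unstable. */
int mm1_metrics(double lambda, double mu, MM1 *out)
{
    assert(lambda > 0.0 && mu > 0.0 && out != 0);
    if (lambda >= mu) return 0;
    out->rho = lambda / mu;
    out->mean_customers = out->rho / (1.0 - out->rho);
    out->response_time = 1.0 / (mu - lambda);
    out->throughput = lambda;
    return 1;
}
```

With a mean interarrival time of 200 and a mean service time of 100, as in the M/M/1 smpl program ($\lambda$ = 1/200, $\mu$ = 1/100), this gives $\rho$ = 0.5, an average of 1 customer in the system, and a mean response time of 200.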

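78. M/M/1/N Queuing system: the finite buffer case
[State diagram: states 0, 1, 2, 3, ..., N-1, N; arrival rate $\lambda$ up, service rate $\mu$ down.]
An arriving customer is "lost" or "turned away" when there are already N customers in the system.
Following the previous derivation for M/M/1/$\infty$, with $\rho = \lambda/\mu$:
  $P_n = \rho^n P_0$, $0 \le n \le N$, with $P_0 = \frac{1-\rho}{1-\rho^{N+1}}$
  & no restriction on the range of $\rho$
As N increases, $P_N$ decreases; as $\lambda/\mu$ increases, $P_N$ increases.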

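79. Q1: the prob. that the queuing system is full? $P_N$
Q2: how fast are customers lost? $\lambda \times P_N$ (arrival rate $\times$ blocking probability)
  Here $\rho = \lambda/\mu$, the utilization in the M/M/1 case. When $\rho = 1$, applying L'Hopital's rule gives $P_n = \frac{1}{N+1}$.
Q3: population? $\bar{n} = \sum_{n=0}^{N} n P_n$
Q4: throughput? $\lambda (1 - P_N)$
Q5: utilization? $1 - P_0$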

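80. Variations of M/M/1
M/M/$\infty$: infinite # of servers
[State diagram: states 0, 1, 2, 3, ..., n-1, n, n+1, ...; arrival rate $\lambda$ in every state; service rates $\mu$, $2\mu$, $3\mu$, ..., $(n-1)\mu$, $n\mu$, $(n+1)\mu$.]
The number of customers in the system follows a Poisson distribution with mean $\lambda/\mu$.
Q1: throughput? $\lambda$
Q2: response time? $1/\mu$
Q3: population? $\lambda/\mu$, by Little's law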

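81. M/M/m: m servers, e.g., a system with m processors
[State diagram: states 0, 1, 2, 3, ..., m-1, m, m+1, ...; arrival rate $\lambda$; service rates $\mu$, $2\mu$, $3\mu$, ..., $(m-1)\mu$, $m\mu$, and $m\mu$ thereafter.]
Solution: can be obtained by considering 2 cases separately:
  $\mu_n = n\mu$ for $n < m$; $\mu_n = m\mu$ for $n \ge m$
Q1: what is the probability that all servers are busy? Ans: $\sum_{n \ge m} P_n$
Q2: throughput?
Q3: response time?
[Answers to Q2 and Q3 not recoverable.]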

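82. M/M/m/m: m servers with a single queue having a buffer space of m (when all servers are busy, a customer walks away), e.g., a telephone switching system
[State diagram: states 0, 1, 2, 3, ..., m-1, m; arrival rate $\lambda$; service rates $\mu$, $2\mu$, $3\mu$, ..., $m\mu$.]
Q1: prob. that all m servers are busy (e.g., in a telephone switch company)? $P_m$
Q2: mean # of calls turned away per time unit? $P_m \times \lambda$
The expression for $P_m$ is called Erlang's B formula:
  $P_m = \frac{(\lambda/\mu)^m / m!}{\sum_{k=0}^{m} (\lambda/\mu)^k / k!}$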
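Erlang's B formula is usually evaluated with a recursion on the number of servers rather than with factorials, which overflow quickly. The sketch below uses the well-known recursion $B(0) = 1$, $B(k) = \frac{a B(k-1)}{k + a B(k-1)}$ with offered load $a = \lambda/\mu$; the function name is illustrative.

```c
#include <assert.h>

/* Erlang B blocking probability for m servers and offered load a = lambda/mu,
   computed with the recursion B(k) = a*B(k-1) / (k + a*B(k-1)), B(0) = 1. */
double erlang_b(double a, int m)
{
    assert(a > 0.0 && m >= 0);
    double b = 1.0;
    for (int k = 1; k <= m; k++)
        b = a * b / ((double)k + a * b);
    return b;
}
```

For example, with a = 2 and m = 2 the direct formula gives (4/2)/(1 + 2 + 2) = 0.4, which is what `erlang_b(2.0, 2)` evaluates. The mean number of calls turned away per time unit is then $\lambda$ times this value.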

83. A Client-Server System
[Figure: m users (1, 2, ..., m) submitting requests to a server system with one cpu.]
  Request arrival rate per user: $\lambda$
  Response time: the time spent by a user at the system between submitting the request & the return of the response
  Service rate of the server system with one server: $\mu$
State description: one state component representation
  n: a number representing the # of users in the server system
  # of users still thinking (i.e., not issuing requests) = m - n
[State diagram: states 0, 1, 2, 3, ..., m-1, m; arrival rates $m\lambda$, $(m-1)\lambda$, $(m-2)\lambda$, ..., $\lambda$; service rate $\mu$.]

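84. Q1: avg. # of users in the server system? $\bar{n} = \sum_{n=0}^{m} n P_n$
Q2: avg. # of users still thinking (not issuing requests)? $m - \bar{n}$
Q3: system throughput? $\mu (1 - P_0)$ (recall the same argument in M/M/1)
Q4: response time per user? $\bar{n}$ divided by the system throughput, by Little's law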

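85. What happens if the server system has m servers, each with a service rate of $\mu$?
[Figure: m users and m servers.]
[State diagram: states 0, 1, 2, 3, ..., m-1, m; arrival rates $m\lambda$, $(m-1)\lambda$, $(m-2)\lambda$, ...; service rates $\mu$, $2\mu$, $3\mu$, ..., $m\mu$.]
Q1: throughput?
Q2: response time?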

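86. Fundamental Laws: algebraic relationships among performance measurement quantities
[Figure: a system observed over a period T, with arrivals $A_1, A_2, A_3, A_4, \ldots$ and departures (completions).]
  A = # of arrivals; $\lambda = A/T$, arrival rate
  C = # of completions; $X = C/T$, throughput
  B = total system busy time
  $D = B/C$, average service time per request
  $\rho = B/T$, utilization of the system
Utilization law: mathematically, $\rho = \frac{B}{T} = \frac{C}{T} \cdot \frac{B}{C} = X \cdot D$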

87. Little's law
[Figure: # of arrivals A(t) and # of departures D(t) versus time over the observation period. The vertical distance between the two curves is the # of customers in the system (n); the shaded region between them has area W.]
* One meaning of W is the total time spent by all customers in the system. Considering the response time per customer: R = W/C.
* Another meaning of W is the total population accumulated (in queue & in service) over T time units. Considering the # of customers per time unit: avg. # of customers $\bar{n} = W/T$.
Algebraically: $\bar{n} = W/T = (C/T)(W/C) = X \cdot R$
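The operational quantities behind the utilization law and Little's law can be computed directly from a trace. The sketch below uses a small made-up trace (three customers, all arriving and departing inside the observation period); the function name, the trace and the busy time are illustrative assumptions.

```python
def operational_measures(arrivals, departures, busy_time, T):
    """Operational analysis over [0, T].

    arrivals[i] and departures[i] are the arrival and departure times of
    customer i; every customer is assumed to arrive and depart within [0, T].
    """
    C = len(departures)                                   # completions
    W = sum(d - a for a, d in zip(arrivals, departures))  # total time in system
    return {
        "X": C / T,             # throughput
        "D": busy_time / C,     # average service time per request
        "rho": busy_time / T,   # utilization
        "R": W / C,             # response time per customer
        "n_bar": W / T,         # average # of customers in the system
    }


obs = operational_measures(
    arrivals=[0.0, 1.0, 2.0],
    departures=[3.0, 4.0, 5.0],
    busy_time=5.0,   # server busy from t=0 to t=5 in this trace
    T=10.0,
)
print(obs)
```

Both laws then hold as identities on the measured values: `rho == X * D` and `n_bar == X * R`, up to floating-point rounding.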

88. Non-State-Space Models
1) Reliability block diagrams
2) Fault trees
3) Reliability graphs
These can be analyzed based on the individual components & info. about the system structure; the assumption is that the failure or repair of a component is not affected by other components.
State-Space Models
1) Markov: the "sojourn" time, i.e., the amount of time in a state, is exponentially distributed.
2) Semi-Markov: the "sojourn" time, i.e., the amount of time in a state, can follow any distribution.
When we associate "rewards" with states of Markov or Semi-Markov models, we have so-called Markov reward models.
3) Stochastic Petri Net Models (Chap. 7): a concise & more intuitive representation for the Markov model. When we associate "rewards" with the markings of the net, we have stochastic reward nets.
(Other chapter labels on the slide: Chap. 4, Chap. 8, Chap. 6.)

89. Markov Models (continuous-time)
Two main concepts in the Markov model are "system state" and "state transition".
System state: used to describe the system at any time. For reliability models, we frequently use faulty & non-faulty modules in the system.
State transition: represents the change of state due to the occurrence of an event, e.g., failures, repairs, etc.
Ex: TMR
System state representation: $(S_1, S_2, S_3)$, where $S_i = 1$ if module i is fault free and $S_i = 0$ if module i is faulty.
States in which the system is operational: (1,1,1), (1,1,0), (0,1,1), (1,0,1)
States in which the system has failed: (0,0,0), (0,0,1), (0,1,0), (1,0,0)
How many of these? $2^n$, where n is the # of components in the state representation.

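90. State transition
Assume that each module obeys the exponential failure law and has a constant failure rate $\lambda$. The prob. of module 1 being failed at time $t+\Delta t$, given that it was operational at time t, is given by
$1 - e^{-\lambda \Delta t} \approx \lambda \Delta t$
The prob. that a transition will occur is determined by the prob. of failure, fault coverage, prob. of repair, etc. Assume only one failure at a time. Then the state diagram of TMR is as follows:
[State diagram: (1,1,1) has a self-loop with probability $1-3\lambda\Delta t$ and one transition with probability $\lambda\Delta t$ to each of (1,1,0), (1,0,1), (0,1,1). Each state with two working modules has a self-loop $1-2\lambda\Delta t$ and two outgoing transitions $\lambda\Delta t$. Each state with one working module has a self-loop $1-\lambda\Delta t$ and one transition $\lambda\Delta t$ to (0,0,0), which has a self-loop with probability 1.0.]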

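91. The Markov model can be simplified by combining states having the same # of non-failed modules, i.e., there is only a single component in the state representation. The aggregate transition rate is from the perspective of the source state.
[State diagram: states labeled by the # of non-failed modules, with transition probabilities $3\lambda\Delta t$ (3 → 2), $2\lambda\Delta t$ (2 → 1) and $\lambda\Delta t$ (1 → 0) and self-loops $1-3\lambda\Delta t$, $1-2\lambda\Delta t$, $1-\lambda\Delta t$, 1.0; in the reduced diagram the failed states are merged into F, which has a self-loop with probability 1.0.]
* prob{a single transition from i to j occurs within $\Delta t$}
e.g.,
$P_3(t+\Delta t) = (1-3\lambda\Delta t)\,P_3(t)$
i.e., (prob. that the system is in state 3 at time $t+\Delta t$) = (prob. that 3 → 3 occurs within time $\Delta t$) × (prob. that the system was in state 3 at time t).
Similarly,
$P_2(t+\Delta t) = 3\lambda\Delta t\,P_3(t) + (1-2\lambda\Delta t)\,P_2(t)$
$P_F(t+\Delta t) = 2\lambda\Delta t\,P_2(t) + P_F(t)$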

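92. Rewriting the above three expressions and letting $\Delta t \to 0$, we have:
(1) $P_3'(t) = -3\lambda P_3(t)$
(2) $P_2'(t) = 3\lambda P_3(t) - 2\lambda P_2(t)$
(3) $P_F'(t) = 2\lambda P_2(t)$
Or in matrix form as
$\begin{bmatrix} P_3'(t) \\ P_2'(t) \\ P_F'(t) \end{bmatrix} = \begin{bmatrix} -3\lambda & 0 & 0 \\ 3\lambda & -2\lambda & 0 \\ 0 & 2\lambda & 0 \end{bmatrix} \begin{bmatrix} P_3(t) \\ P_2(t) \\ P_F(t) \end{bmatrix}$
or $P'(t) = A\,P(t)$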

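93. * This can be derived directly from the following state-transition-rate diagram:
[State-transition-rate diagram: 3 → 2 with rate $3\lambda$, 2 → F with rate $2\lambda$. A rate leaving a state enters that state's equation with a negative sign (out); a rate entering a state enters with a positive sign (in).]
The set of differential equations can be solved numerically or analytically. To solve it analytically, one approach is to use the Laplace transform (LT).
Laplace transform of derivatives: if L(F(t)) = f(s), then L(F'(t)) = s f(s) - F(0); e.g., if L(P3(t)) = P3(s), then L(P3'(t)) = s P3(s) - P3(0).
Transform pairs, time domain F(t) ↔ Laplace domain f(s) = L(F(t)) (use the inverse LT to go back):
$1 \leftrightarrow 1/s$
$t \leftrightarrow 1/s^2$
$t^n \leftrightarrow n!/s^{n+1}$
$e^{at} \leftrightarrow 1/(s-a)$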

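94. Applying the LT to the equations of the diagram 3 → 2 (rate $3\lambda$) → F (rate $2\lambda$), we have
$sP_3(s) - P_3(0) = -3\lambda P_3(s)$
$sP_2(s) - P_2(0) = 3\lambda P_3(s) - 2\lambda P_2(s)$
$sP_F(s) - P_F(0) = 2\lambda P_2(s)$
where $P_3(s)$ is the LT of $P_3(t)$.
With $P_3(0) = 1$ and $P_2(0) = P_F(0) = 0$, applying the inverse LT gives
$P_3(t) = e^{-3\lambda t}$
$P_2(t) = 3e^{-2\lambda t} - 3e^{-3\lambda t}$
$P_F(t) = 1 - 3e^{-2\lambda t} + 2e^{-3\lambda t}$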

95. For the TMR system, the system reliability is the sum P3(t) + P2(t), i.e., 1 - PF(t). This is the same expression as we obtained earlier using a reliability block diagram or a fault tree model.
In sharpe:
    markov main(lambda)
    3 2 3*lambda
    2 F 2*lambda
    end
    3 1.0
    end
    * print cdf=F(t) in symbolic form
    cdf(main;0.000001)
    * same as cdf(main,F;0.000001)
    * print F(t) at t = 0.2, 0.4, 0.6, 0.8, 1.0
    eval(main,F;0.000001) 0.2 1.0 0.2
    end
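The claim that the Markov model and the reliability block diagram give the same TMR reliability can be checked numerically. The sketch below assumes the closed-form solution of the 3 → 2 → F chain started in state 3 (rates $3\lambda$ and $2\lambda$), and compares it with the 2-out-of-3 formula $3R^2 - 2R^3$ with $R = e^{-\lambda t}$; the function names are illustrative.

```python
import math


def tmr_state_probs(lam, t):
    """State probabilities of the chain 3 -(3*lam)-> 2 -(2*lam)-> F, started in state 3."""
    p3 = math.exp(-3 * lam * t)
    p2 = 3 * math.exp(-2 * lam * t) - 3 * math.exp(-3 * lam * t)
    pf = 1.0 - p3 - p2
    return p3, p2, pf


def tmr_reliability(lam, t):
    """System reliability = P3(t) + P2(t) = 1 - PF(t)."""
    p3, p2, _ = tmr_state_probs(lam, t)
    return p3 + p2


lam, t = 0.5, 1.0                   # illustrative values
r = math.exp(-lam * t)              # reliability of a single module
rbd = 3 * r ** 2 - 2 * r ** 3       # 2-out-of-3 reliability block diagram result
print(tmr_reliability(lam, t), rbd)
```

The unreliability `1 - tmr_reliability(lam, t)` is the quantity the slide prints with `cdf` and `eval`.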

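96. Example: the 2P3m system
[Diagram: processors P1, P2 and memory units m1, m2, m3.]
Modeling 1-out-of-3 memory & 1-out-of-2 CPU: the system is alive when at least one memory and one CPU are alive.
State representation (i, j): i = # of alive memory units, j = # of alive processors.
[State diagram: operational states (3,2), (2,2), (1,2), (3,1), (2,1), (1,1). Memory failures move (3,j) → (2,j) → (1,j) → (0,j) with rates $3\lambda_m, 2\lambda_m, \lambda_m$; processor failures move (i,2) → (i,1) → (i,0) with rates $2\lambda_p, \lambda_p$. States (3,0), (2,0), (1,0) are processor-failure states (Fp); states (0,2), (0,1) are memory-failure states (Fm).]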

97.
    * MTTF of a processor is 720 hrs
    * MTTF of a memory unit is 2*720 hrs
    bind
    lambdap 1/720
    lambdam 1/(2*720)
    end
    markov 2P3m
    * memory failure
    32 22 3*lambdam
    22 12 2*lambdam
    12 02 lambdam
    31 21 3*lambdam
    21 11 2*lambdam
    11 01 lambdam
    * processor failure
    32 31 2*lambdap
    31 30 lambdap
    22 21 2*lambdap
    21 20 lambdap
    12 11 2*lambdap
    11 10 lambdap
    end
    32 1.0
    end
    * Q(t)
    echo Q(t) is as follows:
    cdf (2P3m)
    * R(t) can be found by "expr 1-value(t;2P3m)";
    * it can also be found by defining my own function called gp(t) below
    func gp(t) value(t;2P3m,32)\
     +value(t;2P3m,22)\
     +…..\
     +value(t;2P3m,11)
    * R(1 hr)
    * print reliability(t=1 hr)
    expr 1-value(1;2P3m)
    * use loop to print R(t) at different values
    loop t, 0.5, 1, 0.1
    expr gp(t)
    end
    end
Notes: value(t; 2P3m) is the prob. of being in an absorbing state at time t; value(t; 2P3m, 32) is the prob. of being in state 32 at time t.

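98. Availability Modeling
[Diagram: processors P1, P2 with failure rate $\lambda_p$ and repair rate $\mu_p$; memory units m1, m2, m3 with failure rate $\lambda_m$ and repair rate $\mu_m$, forming a k-out-of-n memory subsystem.]
Case 1: Independent repairman model, i.e., every component has its own repair facility and can be repaired independently.
Unavailability of a component with failure rate $\lambda$ and repair rate $\mu$:
$U(t) = \frac{\lambda}{\lambda+\mu} - \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t}$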

99. See p.354 of the text on the user-defined distribution syntax: poly name(param-list) dist.
When defining a component, use the unreliability F(t) for reliability modeling, and use the unavailability U(t), or its steady-state value, for availability modeling.
The distribution "gen" is followed by triples of the form $a_j, k_j, b_j$, one triple per term $a_j t^{k_j} e^{b_j t}$.
    * MTTR = 4 hrs for a processor (mup = 1/4)
    bind
    lambdap 1/720
    lambdam 1/(2*720)
    mup 1/4
    mum 1/2
    end
    poly U(lambda,mu) gen\
     lambda/(lambda+mu), 0, 0\
     -lambda/(lambda+mu), 0, -(lambda+mu)
    block case1 (k,n)
    comp proc U(lambdap, mup)
    comp mem U(lambdam, mum)
    parallel procs proc proc
    kofn mems k, n, mem
    series top procs mems
    end
    loop k, 1,3,1
    * availability at the steady state (when t = ∞)
    expr pinf(case1; k, 3)
    * instantaneous availability at t=100
    expr 1-value(100;case1;k, 3)
    end
    end
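The block model of Case 1 can be mirrored in a few lines of Python to see what the `parallel`, `kofn` and `series` constructs compute when every component is described by its unavailability. This is a sketch under the independence assumption of Case 1; the function names are illustrative, and it is not a reimplementation of sharpe.

```python
from math import comb, exp


def unavailability(lam, mu, t):
    """U(t) of one repairable component that is up at t = 0."""
    return lam / (lam + mu) * (1.0 - exp(-(lam + mu) * t))


def k_of_n_unavailability(u, k, n):
    """Probability that fewer than k of n independent components are up."""
    a = 1.0 - u
    return sum(comb(n, i) * a ** i * u ** (n - i) for i in range(k))


def case1_unavailability(k, u_proc, u_mem, n_mem=3):
    u_procs = u_proc ** 2                             # parallel: both processors down
    u_mems = k_of_n_unavailability(u_mem, k, n_mem)   # k-out-of-n memory subsystem
    return 1.0 - (1.0 - u_procs) * (1.0 - u_mems)     # series: either subsystem down


lambdap, lambdam, mup, mum = 1 / 720, 1 / (2 * 720), 1 / 4, 1 / 2
for k in (1, 2, 3):
    u_inf = case1_unavailability(k, lambdap / (lambdap + mup), lambdam / (lambdam + mum))
    u_100 = case1_unavailability(
        k, unavailability(lambdap, mup, 100.0), unavailability(lambdam, mum, 100.0)
    )
    print(k, 1.0 - u_inf, 1.0 - u_100)   # steady-state and t=100 availability
```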

100. Case 2: There is only 1 repair facility capable of repairing one component at a time, with processor repair having a higher priority than memory repair.
Assume that the system is up when at least 1 processor & 1 memory are up. When the system is in a failure state, it halts until it is repaired to become operational again, so no further component failure will occur in a failure state.
[State diagram: the 2P3m states (i, j), i = # of alive memory units, j = # of alive processors, with memory failure rates $3\lambda_m, 2\lambda_m, \lambda_m$ and processor failure rates $2\lambda_p, \lambda_p$ as before, plus repair transitions with rate $\mu_p$ (processor) and $\mu_m$ (memory).]
Is there a memory repair transition (rate $\mu_m$) out of a state that also has a failed processor? No, because processor repair takes priority over memory repair.

101. Sharpe code for availability modeling of Case 2
    bind
    lambdap 1/720
    lambdam 1/(2*720)
    mup 1/4
    mum 1/2
    end
    markov M
    * memory failure
    32 22 3*lambdam
    * (remaining memory failure transitions: same as before in the
    * 2P3m Markov model for reliability modeling)
    * processor failure
    * (same as before in the 2P3m Markov model for reliability modeling)
    * processor repair
    30 31 mup
    31 32 mup
    20 21 mup
    21 22 mup
    10 11 mup
    11 12 mup
    01 02 mup
    * memory repair
    22 32 mum
    12 22 mum
    02 12 mum
    end
    * steady state unavailability
    expr prob(M,30)+prob(M,20)+\
     prob(M,10)+prob(M,01)+prob(M,02)
    * for unavailability at time t = 1 hr
    expr tvalue(1; M, 30)\
     +tvalue(1; M, 20)\
     +tvalue(1; M, 10)\
     +tvalue(1; M, 02)\
     +tvalue(1; M, 01)
    end

102. Modeling Near-Coincident Fault using a Markov Model (Section 9.4.1)
System Description:
1. 4 CPUs & 3 memories ($\lambda_p$ & $\lambda_m$ are failure rates). The system must have at least 2 CPUs & 2 memories working.
2. When a CPU or memory fails, the system can reconfigure to remove the failed component.
3. Reconfiguration fails iff a second failure of the same component type (as the failed component) occurs during the reconfiguration period. The system cannot cope with such a near-coincident fault, i.e., the system fails if such a near-coincident fault occurs during the reconfiguration period.

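103. 4. Reconfiguration rate is $\alpha$.
[State diagram: states (i, j) = (# of working CPUs, # of working memories): (4,3), (3,3), (2,3), (4,2), (3,2), (2,2), and the failure state F.
Covered CPU failures: (4,j) → (3,j) with rate $4\lambda_p c(4,\lambda_p)$; (3,j) → (2,j) with rate $3\lambda_p c(3,\lambda_p)$.
Covered memory failures: (i,3) → (i,2) with rate $3\lambda_m c(3,\lambda_m)$.
Transitions to F:
from (4,3): $4\lambda_p(1-c(4,\lambda_p)) + 3\lambda_m(1-c(3,\lambda_m))$
from (3,3): $3\lambda_p(1-c(3,\lambda_p)) + 3\lambda_m(1-c(3,\lambda_m))$
from (2,3): $2\lambda_p + 3\lambda_m(1-c(3,\lambda_m))$
from (4,2): $4\lambda_p(1-c(4,\lambda_p)) + 2\lambda_m$
from (3,2): $3\lambda_p(1-c(3,\lambda_p)) + 2\lambda_m$
from (2,2): $2\lambda_p + 2\lambda_m$]
Here $c(n,\lambda)$ means the coverage factor when 1 out of n components (with failure rate $\lambda$) fails: it is the probability that the system can successfully perform a reconfiguration using the remaining n-1 components.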

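104. [Diagram: from the "fault" state, the system moves to "recovered" after time $T_2$ (rate $\alpha$), or to F after time $T_1$ (rate $(n-1)\lambda$) if a second failure occurs among the remaining n-1 components.]
Probability of being in "recovered" when $t = \infty$:
$c(n,\lambda) = \frac{\alpha}{\alpha + (n-1)\lambda}$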

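105. In general (even for a non-exponential distribution of the recovery time $T_2$):
$c = \text{prob}\{T_2 < T_1\} = \int_0^\infty \text{prob}\{t < T_1\}\, f_{T_2}(t)\,dt = \int_0^\infty e^{-(n-1)\lambda t}\, f_{T_2}(t)\,dt$
where $f_{T_2}(t)$ is the pdf of $T_2$ and $\text{prob}\{t < T_1\} = e^{-(n-1)\lambda t}$.
The Laplace transform of F(t) is $\int_0^\infty e^{-st} F(t)\,dt$, so c is the Laplace transform of the pdf of $T_2$ evaluated at $s = (n-1)\lambda$.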
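The coverage factor can be checked numerically: for an exponential recovery time with rate $\alpha$, the general integral should reduce to $\alpha / (\alpha + (n-1)\lambda)$. The sketch below evaluates the integral with a simple midpoint rule; the function names, the truncation point `upper` and the example values are illustrative assumptions.

```python
import math


def coverage_exponential(n, lam, alpha):
    """Closed form when the recovery time T2 is exponential with rate alpha."""
    return alpha / (alpha + (n - 1) * lam)


def coverage_numeric(n, lam, pdf_t2, upper, steps=200_000):
    """Midpoint-rule estimate of the integral of exp(-(n-1)*lam*t) * pdf_t2(t) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(-(n - 1) * lam * t) * pdf_t2(t)
    return total * h


alpha, lam, n = 2.0, 1.0, 3   # illustrative values
closed = coverage_exponential(n, lam, alpha)
numeric = coverage_numeric(n, lam, lambda t: alpha * math.exp(-alpha * t), upper=20.0)
print(closed, numeric)
```

Passing a different `pdf_t2` (for example, a deterministic or Erlang recovery time) gives the coverage for a non-exponential $T_2$, provided `upper` is large enough to capture the tail.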

106. Sharpe code:
    func c(n, λ)\
     alpha/(alpha+(n-1)*λ)
    bind alpha 360
    end
    markov sift(λp, λm)
    43 33 4*λp*c(4,λp)
    33 23 3*λp*c(3,λp)
    42 32 4*λp*c(4,λp)
    32 22 3*λp*c(3,λp)
    43 42 3*λm*c(3,λm)
    33 32 3*λm*c(3,λm)
    23 22 3*λm*c(3,λm)
    * to failure state
    43 F 4*λp*(1-c(4,λp))+3*λm*(1-c(3,λm))
    33 F 3*λp*(1-c(3,λp))+3*λm*(1-c(3,λm))
    23 F 2*λp+3*λm*(1-c(3,λm))
    42 F 4*λp*(1-c(4,λp))+2*λm
    32 F 3*λp*(1-c(3,λp))+2*λm
    22 F 2*λp+2*λm
    end
    43 1.0
    end
    expr mean(sift, F; 0.0001, 0.00001)
    expr 1-value(10;sift;0.0001,0.00001)
    end

107. Chap. 6 & Chap. 12: Performability Modeling
6.4 Markov Reward Model
We can associate each state with a "reward" denoting the performance level given by the system while it is in that state. State j is associated with a reward $r_j$.
Markov process: $\{X(t),\ t \ge 0\}$
State probabilities: $P_j(t) = \text{prob}\{X(t) = j\}$
Steady-state probabilities: $\pi_j = \lim_{t\to\infty} P_j(t)$
Reward rate at time t: $Z(t) = r_{X(t)}$
Then, the amount of reward accumulated during an interval (0, t) is given by:
$Y(t) = \int_0^t Z(\tau)\,d\tau$

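108. Ex: a 3-state model with state labels 2, 1, 0 and rewards $r_2 = 2$, $r_1 = 1$, $r_0 = 0$.
[Figure: three sample-path plots: state label X(t) vs. t, reward rate Z(t) vs. t, and accumulated reward Y(t) vs. t; two versions of the state diagram, one without and one with an absorbing state.]
With no absorbing states: $Y(\infty)$ is infinite.
With absorbing states: $Y(t)$ reaches a finite value, denoted by $Y(\infty)$: the accumulated reward until absorption.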

109. Performability Measures:
a) Expected reward at time t: $E[Z(t)] = \sum_i r_i P_i(t)$
Sharpe: exrt(t; system-name)
Can be used to represent the instantaneous "computational capacity" of the system at time t.
b) Expected reward at steady state: $E[Z] = \sum_i r_i \pi_i$
Sharpe: exrss(system-name)
Meaningless for a Markov chain with absorbing states (i.e., meaningful only for irreducible Markov models, which by definition do not have absorbing states).

110. c) Expected cumulative reward over the interval [0, t]: $E[Y(t)] = \sum_i r_i \int_0^t P_i(\tau)\,d\tau$
Sharpe: cexrt(t; system-name)
Here $\int_0^t P_i(\tau)\,d\tau$ is the expected total time that the Markov chain stays at state i during the time interval [0, t].
d) Time-averaged cumulative reward: W(t) = Y(t) / t
with absorbing states: $Y(\infty)$ is finite, so as $t \to \infty$, $W(\infty) = 0$
with no absorbing states: $W(\infty)$ is finite

111. e) Distribution of cumulative reward: prob{$Y(t) \le r$} (a hard problem)
Sharpe: not provided
Usage: can answer the following question: What is the probability that the system is able to achieve a given amount of work r during the interval [0, t]?
f) Probability that the "cumulative reward until absorption" $Y(\infty)$ is less than or equal to r when an absorbing state is reached: prob{$Y(\infty) \le r$}; meaningful only for a Markov model with absorbing states
Sharpe: reward(system-name) gives prob{$Y(\infty) \le r$} in symbolic form; rvalue(r; system-name) gives it in numerical form

112. Reward assignments
Ex 1: An irreducible Markov model (no absorbing states)
reward assignment: $r_i = 1$ to operational states, $r_i = 0$ to non-operational states
E[Z(t)] = A(t): availability at time t
E[Z(∞)] = E[Z] = A: steady state availability
E[Y(t)] = expected system up time during [0, t]
E[Y(∞)] = infinity (undefined)
Same assignments for a Markov chain with absorbing states?
E[Z(t)] = R(t): reliability
E[Z(∞)] = 0
E[Y(t)] = $\int_0^t R(\tau)\,d\tau$
E[Y(∞)] = MTTF
Ex 2: A birth-death process modeling an M/M/k (k servers)
[State diagram: states 0, 1, 2, ..., k-1, k, k+1, ...; arrival rate $\lambda$; service rates $\mu, 2\mu, 3\mu, \ldots, (k-1)\mu, k\mu, k\mu, \ldots$]
Suppose we know the s.s. probability vector $\pi = \{\pi_0, \pi_1, \pi_2, \ldots\}$

113. 1) assign a reward of "# of customers" to each state
E[Z(t)] = expected population at time t
E[Z(∞)] = E[Z] = steady state population
2) assign a reward of "service rate" to each state
E[Z(t)] = expected throughput at time t
E[Z(∞)] = steady state throughput
Ex 3: 2P3M without repair capability (p. 314, chap. 12)
[Diagram: 2 CPUs and 3 memory modules (mm); $\lambda_p$, $\lambda_m$: failure rates.]
The system functions if at least 1 processor & 1 mm are functioning.
• state representation: (i, j)
• assume that the service rate of the system in state (i, j) is a given function of i and j (the expression is not preserved in this transcript)
1) assign a reward of "service rate" to each state
E[Z(t)] = expected throughput at time t
E[Y(∞)] = expected # of customers serviced before failure

114. A Markov chain for reliability analysis of a system without repair capability
[State diagram: the 2P3m states (i, j) with memory failure rates $3\lambda_m, 2\lambda_m, \lambda_m$ and processor failure rates $2\lambda_p, \lambda_p$.]
• $\lambda_p = 1/(2 \cdot 720)$, $\lambda_m = 1/720$
2) reward assignment:
$r_i = 1$ to operational states, i.e., (3,2), (2,2), (1,2), (3,1), (2,1) and (1,1)
$r_i = 0$ to non-operational states
E[Z(t)] = reliability at time t
E[Y(∞)] = MTTF

115.
    bind λm 1/(2*720)
    bind λp 1/720
    Markov 3mem-2proc
    * memory failure
    32 22 3*λm
    22 12 2*λm
    12 02 λm
    31 21 3*λm
    21 11 2*λm
    11 01 λm
    * processor failure
    32 31 2*λp
    31 30 λp
    22 21 2*λp
    21 20 λp
    12 11 2*λp
    11 10 λp
    * reward assignment
    reward
    32 r32
    22 r22
    12 r12
    31 r31
    21 r21
    11 r11
    * default is 0 assigned to other states
    end
    32 1.0
    end
    ** Reward assignment is the
    * service rate in state (i, j)
    bind
    r32 15/9
    r22 3/2
    r12 1
    r31 1
    r21 1
    r11 1
    end
    * print prob{Y(∞) ≤ r}
    * in symbolic form
    reward (3mem-2proc)
    * print prob{Y(∞) ≤ 200}: probability of the system serving
    * less than 200 customers before it fails
    rvalue (200; 3mem-2proc)
    * print E[Z(20)]: expected reward (throughput) at time t=20
    exrt (20; 3mem-2proc)
    * print E[Z(20)] again based on
    * the definition of E[Z(t)], i.e.,
    * (p.375: sum(index, low, high, expression))
    expr sum(i, 1, 3, sum(j, 1, 2,\
    (sreward(3mem-2proc, $(i)$(j))*\
    value(20;3mem-2proc, $(i)$(j)))))
    *
    * sreward returns the reward assigned
    * to a state
    *
    * Reward assignment to calculate R(t)
    *
    bind r32 1
     :
     :
    bind r11 1
    *
    * print R(t) at t=20
    expr exrt(20; 3mem-2proc)
    * code to be continued in the next page

116.
    * R(20) is the same as E[Z(20)] with this reward assignment
    * (these two should give the same result)
    expr exrt (20; 3mem-2proc)
    expr value (20; 3mem-2proc, 32) + \
     value (20; 3mem-2proc, 31) + \
     value (20; 3mem-2proc, 22) + \
     value (20; 3mem-2proc, 21) + \
     value (20; 3mem-2proc, 12) + \
     value (20; 3mem-2proc, 11)
    *
    * What is E[Y(∞)] with
    * this reward assignment?
    *
    * compare E[Z(t)], E[Y(t)] & E[Y(t)]/t
    loop t, 0, 30, 5
     expr exrt (t; 3mem-2proc)
     expr cexrt (t; 3mem-2proc)
     expr cexrt (t; 3mem-2proc)/t
    end
    * end the sharpe program
    end

117. A cyclic (irreducible) Markov chain for availability analysis (p. 318): this Markov model is irreducible due to repair.
[State diagram: the 2P3m states (i, j) with memory failure rates $3\lambda_m, 2\lambda_m, \lambda_m$, processor failure rates $2\lambda_p, \lambda_p$, and repair transitions with rates $\mu_p$ (processor) and $\mu_m$ (memory).]
Per processor $\lambda_p = 1/(2 \cdot 720)$, per MM $\lambda_m = 1/720$
Once a system enters a failure state, the system halts until it enters an operational state again via repair.
There is 1 repair facility for processors with the repair rate $\mu_p = 1/4$ & 1 repair facility for memory modules with the repair rate $\mu_m = 1/2$, so simultaneous repair is possible in this case.

118.
    bind λm 1/(2*720)
    bind λp 1/720
    bind μm 1/2
    bind μp 1/4
    * readprobs: required for transient analysis of
    * an irreducible Markov chain
    Markov 3mem-2proc readprobs
    * memory failure
    32 22 3*λm
    22 12 2*λm
    12 02 λm
    31 21 3*λm
    21 11 2*λm
    11 01 λm
    * processor failure
    32 31 2*λp
    31 30 λp
    22 21 2*λp
    21 20 λp
    12 11 2*λp
    11 10 λp
    * processor repair
    30 31 μp
    31 32 μp
    20 21 μp
    21 22 μp
    10 11 μp
    11 12 μp
    01 02 μp
    * memory repair
    22 32 μm
    12 22 μm
    02 12 μm
    21 31 μm
    11 21 μm
    01 11 μm
    * reward assignment
    reward
    32 r32
    22 r22
    12 r12
    31 r31
    21 r21
    11 r11
    * default is 0 assigned to other states
    end
    32 1.0
    end
    * Reward assignment is the
    * service rate in state (i, j)
    bind
    r32 15/9
    r22 3/2
    r12 1
    r31 1
    r21 1
    r11 1
    end
    * prob{Y(∞) ≤ r}
    * is not meaningful in this case
    * Print expected cumulative # of clients
    * served at t=50 as E[Y(50)]
    cexrt (50; 3mem-2proc)
    * print E[Z(20)]: expected reward (throughput) at time t=20
    exrt (20; 3mem-2proc)
    * print E[Z(20)] again based on
    * the definition of E[Z(t)], i.e.,
    expr sum(i, 1, 3, sum(j, 1, 2,\
    (sreward(3mem-2proc, $(i)$(j))*\
    value(20;3mem-2proc, $(i)$(j)))))
    * reward assignment for availability
    bind r32 1
     :
     :
    bind r11 1
    *
    * print A(t) at t=20 as E[Z(t=20)];
    expr exrt(20; 3mem-2proc)
    * When repair exists & Markov chain is
    * irreducible, E[Z(∞)] exists; it is
    * the steady state availability in this case
    expr exrss (3mem-2proc)
    end

119. Case Study 1: Replicated File Management
Source: S. Jajodia & D. Mutchler, "Dynamic voting algorithms for maintaining the consistency of a replicated database," ACM Trans. Database Systems, Vol. 15, No. 2, June 1990, pp. 230-280.
One copy: if failed, then it is not accessible.
[State diagram: state 1 (up) → state 0 (down) with failure rate $\lambda$; 0 → 1 with repair rate $\mu$.]
Availability = $\frac{\mu}{\lambda + \mu}$

120. Can we use replicated copies to improve availability?
Consider only the update operations: suppose we have 7 copies.
We cannot update just one copy and leave the others unchanged, since that will create inconsistency problems.
We must maintain the one-copy illusion to the user.

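121. Consistency algorithms for replicated data:
Static (simple voting), n copies:
* can do update if a majority of the n copies can be reached & updated
[Figure: the n copies split into partitions by communication failures. A partition holding a majority of the copies (a write quorum) can do update; a partition without a majority cannot do update. After a different failure, another majority partition (another write quorum) can still do update. When no partition holds a majority, no partition can do any update.]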

122. Dynamic voting: can do update if a majority of the current (up-to-date) copies (since the last update) can be found and updated. These copies are said to be in the "major partition".
Each copy is associated with a set of local variables:
1) version number (VN): to tell if the local copy is current
2) site cardinality (SC): to tell how many copies are current, e.g., if in the last update 5 copies were updated, then SC = 5
[Figure: 7 copies start with VN = 0 and SC = 7. With no failure, an update sets VN = 1 and SC = 7 on all 7 copies. After a communication failure, the partition with 4 copies can do update because 4 is a majority.]

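123. ** All copies within the major partition are updated & the new SC is set to the # of copies in the major partition.
[Figure: successive partitionings of the 7 copies, listing the VN of each copy and the SC of each group.
VN = 2,2,2,1,2,1,1: the 4 updated copies have SC = 4 (can update: yes); the other 3 keep SC = 7 (no).
VN = 3,3,3,1,2,1,1: the 3 updated copies have SC = 3 (yes); the others keep SC = 4 and SC = 7 (no).
VN = 4,4,3,1,2,1,1: this partition can do update because 2 is a majority of SC = 3; the 2 updated copies have SC = 2; the others keep SC = 3, SC = 4 and SC = 7 (no).
After a further split of the 2 current copies, no partition can do update. The system halts & must wait for repairs to occur.]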

124. Reunion Scenarios
[Figure: copies with VN = 4,4,3,1,2,1,1 and SC = 2, 3, 4, 7, shown for each repair scenario, with yes/no labels marking which groups can update.]
Repair of network partitioning: still not a major partition because the # of copies with the highest version # (i.e., 4) is 1, which is not a majority of 2 (the SC associated with the current copy). No major partition exists.
Repair of network partitioning and node failure: a major partition exists now; after the update, VN = 5,5,5,1,5,1,1, with SC = 4 for the updated copies and SC = 7 for the others.

125. Availability modeling:
Site-failure only model: there is only one partition
System model:
1) failure rate of each site is $\lambda$
2) repair rate of each site is $\mu$
3) updates are frequent and there is always an update immediately following a failure/repair
Static voting: the system is available as long as k out of n sites are available, so the "site availability" is given by a k-out-of-n expression (the expression and the value of k are not preserved in this transcript).
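Because the slide's own expression is not preserved, the sketch below only illustrates the k-out-of-n idea stated in the text: it computes the probability that at least k of n independent sites are up, each with steady-state availability $\mu/(\lambda+\mu)$. Taking k as a simple majority is an assumption based on the static voting rule, and the paper's exact definition of "site availability" may differ. The function names and example rates are illustrative.

```python
from math import comb


def majority(n):
    """Smallest number of copies that forms a majority of n (assumed quorum size)."""
    return n // 2 + 1


def k_out_of_n_availability(k, n, lam, mu):
    """Probability that at least k of n independent sites are up in steady state."""
    p = mu / (lam + mu)   # steady-state availability of one site
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


lam, mu = 1.0, 9.0        # illustrative rates, giving p = 0.9 per site
for n in (1, 3, 5, 7):
    print(n, majority(n), k_out_of_n_availability(majority(n), n, lam, mu))
```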

126. Dynamic voting: no simple probability expression exists. Resort to Markov modeling or Petri net modeling.
State representation: (X, Y, Z)
Y = current site cardinality (SC), i.e., the # of current copies
X of the Y current copies are alive; Y-X of the Y current copies are down
Z of the n-Y other sites are alive but out-of-date
n: # of initial copies (e.g., n = 7)

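127. [Markov model for dynamic voting with n = 7. States (X, Y, Z): (7,7,0), (6,6,0), (5,5,0), (4,4,0), (3,3,0), (2,2,0), and (1,2,Z), (0,2,Z) for Z = 0, 1, ..., 5. Failure transitions have rates that are multiples of $\lambda$ and repair transitions have rates that are multiples of $\mu$. Two kinds of repair transitions are annotated: repair of a current copy in the major partition, and repair of an out-of-date copy in the major partition.]
** In state (1,2,0), no update can be performed because 1 is not a majority of 2.
Site availability: given on the slide as a sum of state probabilities (the expression is not preserved in this transcript).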

128. Site availability comparison results:
* static voting is better than dynamic voting when n = 3
[Figure: three copies, two up-to-date and one out-of-date, illustrating a case where update is permitted in static voting but not permitted in dynamic voting.]
* when n > 3, dynamic voting is better

129. Chap 5: Product-Form Queuing Networks (QN)
Entities:
1) service centers: with different service disciplines
2) customers (jobs): single class, or multiple classes (each with a different workload)
3) links connecting service centers
[Diagram: a queue feeding CPU1, CPU2 and CPU3, which connect to disk 1 and disk 2 and to the exit.]

130. Service disciplines
1) FCFS
2) Priority: can be preemptive or non-preemptive
3) Round Robin (RR): time-slot based
4) Processor Sharing (PS): the server's capability is equally divided among all jobs
5) Last-Come-First-Serve Preemptive Resume (LCFSPR): stack push-pop style
Open vs. Closed QNM
Open: customers arrive from an external source, spend time in the system & finally depart.
Closed: the # of customers circulating among the service centers is a constant, i.e., no external source of jobs & no departure.

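131. What is a "product-form" solution for a QNM?
- The joint probability of the queue sizes in the network is a product of the probabilities of queue sizes in the individual service centers.
e.g., a tandem queuing network with 2 servers
[Diagram: arrivals with rate $\lambda$ → server 1 (rate $\mu_1$) → server 2 (rate $\mu_2$) → departures.]
[Markov chain around state $(n_1, n_2)$: a job arrives at server 1 (rate $\lambda$): $(n_1, n_2) \to (n_1+1, n_2)$; a job leaves server 1 (rate $\mu_1$): $(n_1, n_2) \to (n_1-1, n_2+1)$; a job leaves server 2 (rate $\mu_2$): $(n_1, n_2) \to (n_1, n_2-1)$. The corresponding transitions enter $(n_1, n_2)$ from $(n_1-1, n_2)$, $(n_1+1, n_2-1)$ and $(n_1, n_2+1)$.]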

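132. By solving the steady-state global balance equations (one for each state), it can be shown that the joint population probability that there are $n_1$ jobs at server 1 & $n_2$ jobs at server 2 is the product of the population probabilities for two individual M/M/1 queues:
$P(n_1, n_2) = (1-\rho_1)\rho_1^{n_1} \times (1-\rho_2)\rho_2^{n_2}$, with $\rho_1 = \lambda/\mu_1$ and $\rho_2 = \lambda/\mu_2$
Here $P(n_1, n_2)$ is the prob. that the system has $n_1$ jobs at server 1 & $n_2$ at server 2; the first factor is for the 1st M/M/1 queue and the second factor is for the 2nd M/M/1 queue.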

133. Product-Form Queuing Networks
A QN is said to have a product-form solution if
$P(n_1, n_2, \ldots, n_J) = \frac{1}{G} \prod_{j=1}^{J} P_j(n_j)$
where $P_j(n_j)$ is a function only of the j-th center and G is a normalizing constant.
This is true when the following characteristics hold (p. 93, text):
1. The routing of customers from one service center to the next must be history independent, i.e., memoryless (or Markovian).
2. The queuing disciplines may be FCFS, PS (Processor Sharing), IS (Infinite Server) or LCFSPR (Last Come First Serve with Preemptive-Resume).
3. For an FCFS center, the service time distribution must be exponential; for other servers, the service time distribution does not have to be exponential but must be differentiable (w.r.t. time).
4. A product-form network may have multiple chains (multiple classes) of jobs and may be open with respect to some chains of jobs and closed with respect to others. External arrivals for all open chains must be Poisson.

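134. Open product-form QNs: all jobs arrive from an external source & depart to a sink.
Arrival rate to center j:
$\lambda_j = r_j + \sum_k \lambda_k p_{kj}$
where $r_j$ is the arrival rate from the external source and the sum is the arrival rate from the other centers in the network.
Ex: [two centers in tandem; external arrivals with rate $r_1$ enter center 1, and every job leaving center 1 goes to center 2]
$\lambda_1 = r_1$, $\lambda_2 = \lambda_1 \times 1$, so $\lambda_1 = \lambda_2 = r_1$
A general method to solve an open system QNM with a PF solution:
1) get the input arrival rate for each center.
2) analyze each center separately.
3) get aggregate measures.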

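135. e.g., $r_1 = 0.5$, $\mu_1 = 1$ & $\mu_2 = 2$
Evaluate each center independently as an M/M/1 queue with $\lambda_1 = \lambda_2 = r_1$.
Time spent in the system by a customer:
$R = R_1 + R_2 = \frac{1}{\mu_1 - \lambda_1} + \frac{1}{\mu_2 - \lambda_2}$
where $R_1$ is the waiting time at center 1 and $R_2$ is the waiting time at center 2 (each including service).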
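The tandem example can be worked through numerically by treating each center as an independent M/M/1 queue, which is what the product-form property permits. The sketch below uses the slide's values $r_1 = 0.5$, $\mu_1 = 1$, $\mu_2 = 2$; the helper names are illustrative.

```python
def mm1_response_time(lam, mu):
    """Mean time in an M/M/1 queue (waiting plus service)."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    return 1.0 / (mu - lam)


def mm1_population(lam, mu):
    """Mean number of jobs in an M/M/1 queue."""
    rho = lam / mu
    return rho / (1.0 - rho)


r1, mu1, mu2 = 0.5, 1.0, 2.0
lam1 = lam2 = r1   # tandem network: all jobs visit both centers once

R = mm1_response_time(lam1, mu1) + mm1_response_time(lam2, mu2)
n = mm1_population(lam1, mu1) + mm1_population(lam2, mu2)
print(R, n)        # R is about 2.667, n is about 1.333
```

Little's law ties the two aggregate measures together: `n` equals `r1 * R`.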

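136. Ex: open product-form QNM with feedback
[Diagram: external arrivals with rate $r_1$ enter center 1; a job leaving center 1 exits the network with probability P1 or moves on with probabilities P2 and P3 to centers 2 and 3, which feed back to center 1.]
Q: X, n, R?
$X = \lambda_1 P_1 = (r_1 / P_1) \times P_1 = r_1$
$n = n_1 + n_2 + n_3$
R = n/X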

137. Closed Product-Form Networks
A network with a set of jobs circulating indefinitely, or a network in which a job leaving the network will be replaced instantly by a statistically identical new job, e.g.,
[Diagram: two centers, 1 and 2, with N = 4 jobs circulating.]
In general, consider a network with J service centers serving N jobs. The visit count $v_j$ to center j satisfies
$v_j = \sum_{k=1}^{J} v_k\, p_{kj}$
where $p_{kj}$ is the probability that a job leaving center k moves to center j.
A particular center's visit count is set to one based on the model's physical meaning.

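138. Solution Technique: Mean Value Analysis (MVA) Algorithm. It yields the average values of performance measures.
In a closed system with N jobs, when a job arrives at a center, it actually sees only (N-1) jobs distributed in the system.
Notation (for a network population of k jobs):
$\mu_j$: service rate of center j
$v_j$: visit count to center j
$r_j(k)$: response time per visit at center j
$X(k)$: system throughput
$n_j(k)$: average # of jobs at center j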

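139. Formulas:
Given $\mu_j$, $v_j$ for all j, $1 \le j \le J$.
k = 0: $n_j(0) = 0$ for all j
Recursion: for k = 1, k = k+1 until k = N, compute for all j's
$r_j(k) = \frac{1}{\mu_j}\left[1 + n_j(k-1)\right]$
$X(k) = \frac{k}{\sum_{j=1}^{J} v_j\, r_j(k)}$
$n_j(k) = X(k)\, v_j\, r_j(k)$
$r_j(k)$ is estimated just like in M/M/1 except that the population seen is one less.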

140. e.g., $\mu_2 = 1$, $\mu_1 = 2$, N = 4 (p. 100, text)
Set $v_1$ to 1; then $v_2 = 1$ (the relative visit count is the same for both centers in this example).
Starting with k = 0
k = 1 (throughput and populations by Little's Law, with $v_1 = v_2 = 1$)
k = 2

141. k = 3
k = 4 (last iteration)
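The iteration values for this example are not preserved in the transcript, so the sketch below recomputes them. It implements the standard single-class MVA recursion for queueing (non-IS) centers, under the reading that the example has $\mu_1 = 2$, $\mu_2 = 1$, $v_1 = v_2 = 1$ and N = 4; the function name `mva` and its return format are illustrative.

```python
def mva(mu, visits, N):
    """Single-class Mean Value Analysis for a closed network of queueing centers.

    mu[j] is the service rate and visits[j] the visit count of center j.
    Returns a list of (k, X, r, n) tuples for populations k = 1..N.
    """
    J = len(mu)
    n = [0.0] * J                  # n_j(0) = 0
    history = []
    for k in range(1, N + 1):
        # An arriving job sees the network with one job less.
        r = [(1.0 / mu[j]) * (1.0 + n[j]) for j in range(J)]
        X = k / sum(visits[j] * r[j] for j in range(J))    # Little's law, whole network
        n = [X * visits[j] * r[j] for j in range(J)]       # Little's law, per center
        history.append((k, X, r, n))
    return history


history = mva(mu=[2.0, 1.0], visits=[1.0, 1.0], N=4)
for k, X, r, n in history:
    print(k, round(X, 4), [round(v, 4) for v in r], [round(v, 4) for v in n])
```

For k = 4 this gives a throughput of about 0.9677 with about 0.839 jobs at center 1 and 3.161 jobs at center 2.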

142. Using sharpe to solve closed QNMs
There are 6 types of service centers in a closed QNM which can be specified in a sharpe program:
1. FCFS. Syntax: station-name fcfs rate
2. IS. Syntax: station-name is rate. There are an infinite # of servers in the center. Q: how to calculate $r_j(k)$ for an IS center?
3. MS. Syntax: station-name ms #servers rate. There are multiple servers in the center, each with the identical service rate.
4. LCFSPR. Syntax: station-name lcfspr rate
5. PS. Syntax: station-name ps rate. All n jobs present at the center share one server, with each job seeing the server speed reduced by a factor of n.
6. LDS (load-dependent server). Syntax: station-name lds rate1, rate2, …. All jobs at the center again share one server, but the service rate of the server is load dependent (i.e., depending on the # of jobs present in the center).
Slide annotations: single-chain vs. multiple-chain; automatically computed by sharpe vs. must be specified in the sharpe code.

143. Program structure for single-chain (or single class) product-form queuing networks (for a closed system); see B.4.6, p.361 of the text
    pfqn {(para-list)}
    * section 1: station-to-station probabilities
    <station-name station-name expression>
    end
    * section 2: station types & parameters
    <station-name station-type expression, …>
    end
    * section 3: number of customers per chain (or per class)
    <chain-name expression>
    end

144. e.g., p.222 & p.102
[Diagram: central server model with M terminals (think time 25 s), a CPU (20 ms), disk 1 (30 ms) and disk 2 (42.9 ms); the times are service times per visit. After a CPU visit a job goes to disk 1 with P1 = 0.667, to disk 2 with P2 = 0.233, or back to the terminals with P0 = 0.1.]
    * central server model
    bind
    P1 0.667
    P2 0.233
    end
    pfqn csm
    * section 1
    CPU disk1 P1
    CPU disk2 P2
    CPU terminals 1-P1-P2
    terminals CPU 1
    disk1 CPU 1
    disk2 CPU 1
    end
    * section 2: station types & parameters (these are rate parameters)
    terminals IS 1/25
    CPU fcfs 1000/20
    disk1 fcfs 1000/30
    disk2 fcfs 1000/42.9
    end
    * section 3: number of jobs
    chain1 M
    end
For other service disciplines see p.362, text.

145.
    * (continued)
    * reporting per center (CPU) measures
    loop i, 2, 10, 2
     bind M i
     expr tput (csm, CPU)
     expr util (csm, CPU)
     expr qlength (csm, CPU)
     expr rtime (csm, CPU)
    end
    * calculate the system response time
    * by applying Little's Law
    func x() \
     tput (csm, CPU) * (1-P1-P2)
    func nbar() \
     qlength (csm, CPU) + \
     qlength (csm, disk1) + \
     qlength (csm, disk2)
    bind M 10
    * calculate the average response time per
    * terminal user once it enters the central
    * system when M=10 in the terminal center
    expr nbar()/x()
    end /* end the entire program */
qlength returns the per-center population.
Apply MVA to this closed system and set the visit count to 1 for the terminals center. You should get the same output. Note: set $r_{terminals}$ = 25 s because it is an IS center.

146. Multiple-chain product-form queuing networks (for a closed system)
    mpfqn {(parameter-list)}
    * section 1: station to station probabilities for each chain.
    <chain chain-name
    <station-name station-name expression>
    :
    end>
    end
    * section 2: station types & parameters
    <<station-name station-type expression, …>
    <chain-name expression, …>
    :
    end>
    end
    * section 3: number of jobs per chain
    <chain-name expression>
    end

147. Example:
[Diagram: a closed network with two centers, CPU and disk.]
Two classes: chain A: 2 jobs; chain B: 1 job
Service time (service demand):
DA,CPU = 2; DA,disk = 1
DB,CPU = 3; DB,disk = 2
Performance measures of interest?
- Response time of a job (system)
- Response time of a class A job (per class)
- Response time of a class B job in the CPU center (per center per class)
- Throughput of the CPU center for class A jobs (per center per class)
- Utilization of the CPU center for class B jobs (per center per class)

148.
    * An example of using sharpe for solving a multiple-class product form queuing network
    * Two classes: A and B; assume visit count is 1 for each center
    * number of jobs: (2A, 1B)
    * number of stations: 2 -- cpu and disk
    * DA,cpu = 2
    * DA,disk = 1
    * DB,cpu = 3
    * DB,disk = 2
    * want to know RA, XA, (nA,cpu), (UA,cpu)
    mpfqn simple
    * section1: station to station transition probabilities
    chain A
    cpu disk 1
    disk cpu 1
    end
    chain B
    cpu disk 1
    disk cpu 1
    end
    end

149.
    *section 2: station types and parameters
    cpu ps
    A 1/2
    B 1/3
    end
    disk ps
    A 1/1
    B 1/2
    end
    end
    *section 3: number of customers in each chain
    A 2
    B 1
    end
    *In general need to calculate RA=nA/XA.
    *But for the simple system here, we
    *can calculate RA = RA,cpu + RA,disk
    expr mrtime(simple,cpu,A) +mrtime(simple,disk,A)
    *XA = XA, cpu for this simple system
    expr mtput(simple,cpu,A)
    *population of class A at CPU: nA,cpu
    expr mqlength(simple,cpu,A)
    *utilization of class A at CPU: UA,cpu
    expr mutil(simple,cpu,A)
    end
Output:
    mrtime(simple,cpu,A)+mrtime(simple,disk,A): 6.3478e+00
    mtput(simple,cpu,A): 3.1507e-01
    mqlength(simple,cpu,A): 1.4795e+00
    mutil(simple,cpu,A): 6.3014e-01
Per-center per-class measures -> per-class measures: summation applies to population only.
Per-class measures -> system measures: summation applies to population and throughput.
Once you know the population and the throughput, you can obtain the response time by Little's Law.
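The sharpe output for this two-class example can be reproduced with the exact multi-class MVA recursion, in which a class-c job arriving at a center sees the total queue length of the network with one class-c job removed. The sketch below assumes queueing (PS) centers and works with service demands rather than rates; the function name `multiclass_mva` and its return format are illustrative, not part of sharpe.

```python
from itertools import product


def multiclass_mva(demands, populations):
    """Exact MVA for a closed multi-class network of queueing centers.

    demands[c][j]: service demand of class c at center j.
    populations[c]: number of jobs of class c.
    Returns (resp, tput, qlen) at the full population, where
    resp[c][j] is the residence time, tput[c] the class throughput
    and qlen[c][j] the class-c population at center j.
    """
    C, J = len(populations), len(demands[0])
    total_q = {tuple([0] * C): [0.0] * J}   # total queue length per population vector
    result = None
    # product() enumerates population vectors in lexicographic order, so the
    # vector with one job removed has always been processed already.
    for pop in product(*(range(n + 1) for n in populations)):
        if sum(pop) == 0:
            continue
        resp = [[0.0] * J for _ in range(C)]
        tput = [0.0] * C
        for c in range(C):
            if pop[c] == 0:
                continue
            prev = list(pop)
            prev[c] -= 1
            seen = total_q[tuple(prev)]
            resp[c] = [demands[c][j] * (1.0 + seen[j]) for j in range(J)]
            tput[c] = pop[c] / sum(resp[c])
        qlen = [[tput[c] * resp[c][j] for j in range(J)] for c in range(C)]
        total_q[pop] = [sum(qlen[c][j] for c in range(C)) for j in range(J)]
        result = (resp, tput, qlen)
    return result


# Classes A, B; centers cpu, disk; demands from the slide.
resp, tput, qlen = multiclass_mva(demands=[[2.0, 1.0], [3.0, 2.0]], populations=[2, 1])
print(sum(resp[0]))        # RA = RA,cpu + RA,disk
print(tput[0])             # XA
print(qlen[0][0])          # nA,cpu
print(tput[0] * 2.0)       # UA,cpu = XA * DA,cpu
```

The four printed values agree with the sharpe output listed on the slide (6.3478, 0.31507, 1.4795 and 0.63014) to the printed precision.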

150. Chap 11: Hierarchical Models
Objective: to avoid large models so as to improve solution efficiency.
Ex1: [Figure: the full system with components 1 to 15. Upper-level model (a reliability block diagram): components 11, 12, 13, 14, 15 plus two blocks, bridge 1 and bridge 2. Lower-level model for a bridge (a reliability graph): nodes W, X, Y, Z connected by edges v1, v2, v3, v4, v5.]

151. P.265. Partial sharpe code shown:
    relgraph rbridge (v1, v2, v3, v4, v5)
    w x exp (v1)
    x z exp (v2)
    w y exp (v4)
    y z exp (v5)
    bidirect
    x y exp (v3)
    end
    block rel-in-block
    comp 11 exp (u11)
    comp 12 exp (u12)
    comp 13 exp (u13)
    comp 14 exp (u14)
    comp 15 exp (u15)
    comp bridge1 cdf (rbridge; u1, u2, u3, u4, u5)
    comp bridge2 cdf (rbridge; u6, u7, u8, u9, u10)
    parallel C 12 13
    series D bridge1 11 C
    series E 14 bridge2 15
    parallel top D E
    end
    eval (rel-in-block) 0 50000 500
    end

152. Ex2: A queuing model with resource constraints (p. 277)
[Diagram: M terminals (think time 25 sec) and, within the dashed line, the central server system with CPU (20 ms), disk 1 (30 ms) and disk 2 (42.9 ms); P1 = 0.667, P2 = 0.233, P0 = 0.1.]
The # of running jobs in the central server is limited to n < M.
Not in product-form, because the resource limitation causes input flow ≠ output flow.
[Diagram: equivalent two-center model with the M terminals and a load-dependent center replacing the central server system.]
In product-form: both servers can be evaluated independently.

153.
    * low-level model
    pfqn inner(n)
    CPU disk1 P1
    CPU disk2 P2
    CPU CPU 1-P1-P2
    disk1 CPU 1
    disk2 CPU 1
    end
    CPU fcfs 1000/20
    disk1 fcfs 1000/30
    disk2 fcfs 1000/42.9
    end
    chain1 n
    end
    end
    * high-level model: M terminals (term, 25 sec. think time) and central (lds)
    pfqn outer(M)
    term central 1.0
    central term 1.0
    end
    *station types
    term is 1/25
    central lds X(1), X(2), X(3), X(4)
    end
    chain1 M
    end
    * define function for lds throughput X(n)
    func X(n) tput(inner,CPU;n)*(1-P1-P2)
    * can also be obtained as
    * (1000/20 * util(inner,CPU;n))*(1-P1-P2)
    * by Little's Law, i.e., xCPU = μCPU * ρCPU
    * (service rate of CPU times utilization of CPU)
    bind
    P1 0.667
    P2 0.233
    end
    * reporting each terminal user's response time in
    * the central system as the number of users (M)
    * increases
    loop i, 0, 4, 1
     expr 5*(2^i)
     expr rtime(outer, central; 5*(2^i))
    end
    end

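154. Ex3: A queuing model with job priorities (p. 284, text)
Two classes of jobs, 1 & 2: class 1 has low priority and class 2 has high priority, at the CPU only.
[Diagram: terminals with M1 = 3 class 1 users & M2 = 4 class 2 users; terminal rates $\lambda_1 = 1/12$, $\lambda_2 = 1/7$; a CPU; disk1, disk2, disk3 with service demand 0.03 sec. each.]
CPU service demand: class 1: 0.1 sec.; class 2: 0.06 sec. (class 2 has higher priority at CPU)
Routing probabilities after a CPU visit:
P0 (back to the terminals): class 1: 1/15; class 2: 1/31
P1 (to disk1): class 1: 8/15; class 2: 5/31
P2 (to disk2): class 1: 5/15; class 2: 15/31
P3 (to disk3): class 1: 1/15; class 2: 10/31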

155. * Performance measures of interest: response time & queue length at the CPU.
* Not in product-form because of priority scheduling.
Approximation solution: suppose u2 is the utilization of the CPU dedicated to class 2 jobs. Then the CPU service rate for class 1 jobs is slowed down by a factor of (1-u2).
* We don't know u2 since it is an output, but we need it as an input for class 1 jobs, so we use an iterative technique.
Create two CPUs, one for class 1 & the other for class 2, with the CPU service rate to class 1 jobs reduced by a factor of (1-u2).

156. Sharpe code (see p.285, text)
    mpfqn iter (M1, M2, u2)
    * Section 1: routing prob. per class
    * chain 1 for class 1 jobs
    chain 1
    CPU1 disk1 8/15
    CPU1 disk2 5/15
    CPU1 disk3 1/15
    CPU1 terminals 1/15
    disk1 CPU1 1
    disk2 CPU1 1
    disk3 CPU1 1
    terminals CPU1 1
    end
    * chain 2 for class 2 jobs
    chain 2
    CPU2 disk1 5/31
    CPU2 disk2 15/31
    CPU2 disk3 10/31
    CPU2 terminals 1/31
    disk1 CPU2 1
    disk2 CPU2 1
    disk3 CPU2 1
    terminals CPU2 1
    end
    end
    * Section 2: server types
    * service rate of class 1 jobs is reduced by a factor of (1-u2)
    CPU1 fcfs (1-u2)*1/0.1
    end
    CPU2 fcfs 1/0.06
    end
    disk1 fcfs 1/0.03
    end
    disk2 fcfs 1/0.03
    end
    disk3 fcfs 1/0.03
    end
    * terminal rates: class 1, then class 2
    terminals is
    1 1/12
    2 1/7
    end
    end
    * Section 3: number of jobs per class
    1 M1
    2 M2
    end

157. Iterating on u2:
    * we don't know what the initial value of u2 is,
    * so make a guess u2=0 initially
    * arguments of mutil: system name, station name, chain 2; parameters for M1, M2, & u2
    * (M1=3; M2=4 & u2 is equal to the u2 in the previous iteration)
    bind u2 mutil (iter, CPU2, 2; 3, 4, 0)
    * continue this for a sufficient # of iterations until u2 converges: try 5 times
    loop i, 1, 5, 1
    bind u2 mutil (iter, CPU2, 2; 3, 4, u2)
    end
    * outputs are:
    * i=1 u2 -> 0.659839
    * i=2 u2 -> 0.659838
    * i=3 u2 -> 0.659838
    * (converged after 3 iterations)

    * try starting u2 with another initial value,
    * say u2 = 0.9
    bind u2 0.9
    loop i, 1, 5, 1
    bind u2 mutil (iter, CPU2, 2; 3, 4, u2)
    end
    * outputs are
    * i=1 u2 -> 0.660454
    * i=2 u2 -> 0.659839
    * i=3 u2 -> 0.659838
    * (u2 is also converged in 3 iterations)

    * printing response time & queue size
    expr mrtime (iter, CPU1, 1; 3, 4, u2)
    expr mrtime (iter, CPU2, 2; 3, 4, u2)
    expr mqlength (iter, CPU1, 1; 3, 4, u2)
    expr mqlength (iter, CPU2, 2; 3, 4, u2)
    * outputs are
    * R1,CPU=0.47534
    * R2,CPU=0.10511
    end
To be compared with the corresponding values without priority scheduling: R1,CPU=0.28483 and R2,CPU=0.15834.

158. Ex4: M/M/1/k queue with server failure & repair (p. 233, text & p. 294)
* γ: failure rate; λ: job arrival rate; τ: repair rate; μ: job service rate
* 1-level model for M/M/1/10. State representation (a, b): a = # of jobs; b = 1 if the server is alive, 0 if it has failed.
[Figure: state diagram over (0,0), (1,0), (2,0), …, (9,0), (10,0) and (0,1), (1,1), (2,1), …, (9,1), (10,1).]
* prob {idle server} = prob(0,0) + prob(0,1)
* rejection probability = prob(10,0) + prob(10,1)

159. Two-level model
* Observation: job arrivals/services are much faster than server failures/repairs.
* Isolate the fast recurrent set of states from the 1-level model, analyze it for steady-state probabilities & replace it by a single state in the original model.
* The assumption below is justified: "the set of states (0,1), (1,1), …, (9,1), (10,1), whose transitions are job arrivals and departures, will reach equilibrium between the times when a failure/repair occurs."

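160. High-level model: states (0,0), (1,0), (2,0), …, (9,0), (10,0), plus a single state 1 that replaces the low-level states. The transition from state 1 to (i,0) has rate (failure rate) * prob{low-level, (i,1)}, e.g., * prob{low-level, (0,1)} into (0,0) and * prob{low-level, (10,1)} into (10,0).
Low-level model: states (0,1), (1,1), (2,1), …, (9,1), (10,1).
Prob{idle server} = prob(high-model, (0,0)) + prob(high-model, 1) * prob(low-model, (0,1))
Rejection prob = prob(high-model, (10,0)) + prob(high-model, 1) * prob(low-model, (10,1))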

161. Chap 7: Stochastic Petri Net Models
* A stochastic Petri net (SPN) consists of places, transitions, arcs, tokens and a set of firing rules.
[Figure: places P1 and P2 connected through a transition by arcs; tokens in a place represent jobs.]
* An arc's multiplicity can be 1 (default), >1, or a variable depending on the state of the system.
* Immediate transition: the transition time is zero.
* Timed transition: the transition time is exponentially distributed; it can be an arbitrary distribution in an extended SPN (ESPN) model.
* Both kinds of transitions are allowed in a generalized SPN model, or a GSPN model (SPNP is based on GSPN).

162. * Firing rule: a transition is enabled if:
  a) the # of tokens in each input place without an inhibitor arc is at least equal to the multiplicity of the input arc from that place;
  b) the # of tokens in each input place with an inhibitor arc is less than the multiplicity of the input inhibitor arc from that place;
  c) the enabling function of the transition (if any is assigned) returns TRUE, which is the default if none is assigned.
* Concept of a state in SPN: each distinct Petri net marking (as a result of tokens being distributed to various places) constitutes a separate state in the underlying Markov model.
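
The three enabling conditions can be sketched in a few lines of Python. This is only an illustration of rules (a) to (c), not SPNP code: the function name `is_enabled`, the dictionary representation of markings and arcs, and the place names are all assumptions made for the example.

```python
def is_enabled(marking, input_arcs, inhibitor_arcs=None, guard=None):
    """Return True if a transition is enabled in `marking`.

    marking        : dict place -> number of tokens
    input_arcs     : dict place -> multiplicity of the ordinary input arc
    inhibitor_arcs : dict place -> multiplicity of the inhibitor arc
    guard          : optional enabling function taking the marking
    """
    inhibitor_arcs = inhibitor_arcs or {}
    # (a) each ordinary input place holds at least `multiplicity` tokens
    if any(marking.get(p, 0) < mult for p, mult in input_arcs.items()):
        return False
    # (b) each inhibitor input place holds fewer than `multiplicity` tokens
    if any(marking.get(p, 0) >= mult for p, mult in inhibitor_arcs.items()):
        return False
    # (c) the enabling function, if assigned, must return TRUE (default TRUE)
    return True if guard is None else bool(guard(marking))


marking = {"P1": 2, "P2": 0}
print(is_enabled(marking, {"P1": 1}))                                # True
print(is_enabled(marking, {"P1": 3}))                                # False
print(is_enabled(marking, {"P1": 1}, inhibitor_arcs={"P2": 1}))      # True
print(is_enabled(marking, {"P1": 1}, guard=lambda m: m["P2"] > 0))   # False
```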

163. e.g. [Figure: two example Petri nets with transitions P1 and P2. First net, places a and b: states (2, 0) and (1, 1). Second net, places a, c and d: states (a, c, d) = (2, 0, 0), (1, 1, 0), (1, 0, 1).]

164. Ex: draw an SPN corresponding to the following M/M/1/5 queue
[Figure: Markov chain over the states (5,0), (4,1), (3,2), (2,3), (1,4), (0,5), where the state is (# of free buffer slots still available, # of jobs in the system); the SPN has places "queue" and "free buffer" and an arrival transition.]
* An inhibitor arc of multiplicity m from a place P to a transition t will disable t when P contains at least m tokens.
* A transition can be associated with a priority, an enabling function (also called a guard) which can be state-dependent, and a rate function which can also be state-dependent.
* When both immediate and timed transitions are enabled in a marking, only immediate transitions will fire.
* When there are many transitions enabled, the highest priority one will be fired first.
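
As a rough check of the M/M/1/5 example, the markings of a candidate SPN can be enumerated by breadth-first search. The net below is one possible answer and is an assumption of this sketch: two places named `free` and `queue` (the names are made up here), an `arrival` transition that moves a token from `free` to `queue`, and a `service` transition that moves it back.

```python
from collections import deque

# transition name -> (input arcs, output arcs); each arc maps place -> multiplicity
TRANSITIONS = {
    "arrival": ({"free": 1}, {"queue": 1}),
    "service": ({"queue": 1}, {"free": 1}),
}


def enabled(marking, inputs):
    # ordinary input arcs only: enough tokens in every input place
    return all(marking[p] >= m for p, m in inputs.items())


def fire(marking, inputs, outputs):
    # remove tokens from input places, deposit tokens in output places
    new = dict(marking)
    for p, m in inputs.items():
        new[p] -= m
    for p, m in outputs.items():
        new[p] += m
    return new


def reachability(initial):
    key = lambda mk: (mk["free"], mk["queue"])
    seen = {key(initial)}
    edges = []
    todo = deque([initial])
    while todo:
        mk = todo.popleft()
        for name, (ins, outs) in TRANSITIONS.items():
            if enabled(mk, ins):
                nxt = fire(mk, ins, outs)
                edges.append((key(mk), name, key(nxt)))
                if key(nxt) not in seen:
                    seen.add(key(nxt))
                    todo.append(nxt)
    return sorted(seen, reverse=True), edges


states, edges = reachability({"free": 5, "queue": 0})
print(states)      # [(5, 0), (4, 1), (3, 2), (2, 3), (1, 4), (0, 5)]
print(len(edges))  # 10
```

The six markings match the states (free slots, jobs) listed on the slide; each marking is one state of the underlying Markov chain.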

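165. Q: [Figure: SPN with places P1, P2, P3 and transitions t1, t2, t3; the arcs carry multiplicities 2 and 3.] How many states will be generated based on this SPN?
* When a transition fires, the # of tokens removed in each of its input places is equal to the multiplicity of the input arc, and the # of tokens deposited in each of its output places is equal to the multiplicity of the output arc.
Ans: [Figure: state diagram over the markings (6,0,0), (4,1,0), (2,2,0), (0,3,0), (3,0,1), (1,1,1), (0,0,2), with arcs labeled 1, 2 and 3; one marking is labeled as the absorbing state.]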

166. [Figure: SPN with places P1, P2, P3 and transitions t1, t2, t3.]
Q1: What is the reachability graph of this SPN?
Ans: same as the one on the previous page except 2 is replaced by infinity.
Q2: What is the underlying Markov model of this SPN?
Ans: [Figure: Markov chain over the markings (6,0,0), (3,0,1), (0,0,2), with arcs labeled 1 and 3.]
* The above state diagram is called the reachability graph of the SPN model.
* When the SPN model does not contain immediate transitions, the reachability graph is a Markov chain.

167. Ex: draw an SPN using inhibitor arcs for an M/M/m/b queue with m=3 and b=5
This example features an inhibitor arc and a transition rate function which depends on the marking (state) of the system:
    service rate = (# of tokens in place "buf") × μ   if # of tokens < m
                   m × μ                              if # of tokens ≥ m
[Figure: Markov chain over states 0, 1, 2, 3, 4, 5 with arrival rate λ and service rates μ, 2μ, 3μ, 3μ, 3μ (m servers, buffer size limitation b); SPN with place buf, transitions t1 and t2, and an inhibitor arc of multiplicity b.]

168. Ex: an M/M/1/6 queue with a bulk service center (e.g., an elevator) capable of servicing 3 jobs per service whenever there are 3 jobs to be serviced
Define an arc multiplicity function for the input arc from buf to t2 that returns:
    3             if tokens(buf) ≥ 3
    tokens(buf)   otherwise
[Figure: Markov model over states 0, 1, 2, 3, 4, 5, 6, and the corresponding SPN model with place buf, transitions t1 and t2, and buffer limit b.]

169. Stochastic "Reward" Petri Net
* Assigning a reward to each "marking" of the system (a marking is like a state in a Markov model).
* Tangible marking: a state that enables no immediate transition.
* Vanishing marking: not tangible.
* Absorbing marking: a state that does not enable any transition (it is also tangible by definition).
* Vanishing markings are not shown as "states" in the corresponding Markov chain (they are shown in the reachability graph).

170. Structure of an SPNP program (no main procedure)
The following procedures must be included in an SPNP program (* = introduced later in detail):
  1. parameters(){} for reading input parameters: double input(msg) can be called within.
 *2. net(){} for defining the stochastic Petri net.
  3. assert(){} for checking illegal markings: return(RES_NOERR) or return(RES_ERROR).
  4. ac_init(){} called before starting the reachability graph construction: normally empty (pr_net_info() can be called within; output to .out file).
  5. ac_reach(){} called after the reachability graph is constructed: normally empty (pr_rg_info() can be called within; output to .out file).
 *6. ac_final(){} for calculating & reporting performance results.

171. Frequently-used SPNP built-in functions:
1. Within net() for defining a stochastic Petri net:
  • place(char *name)
  • trans(char *name)
  • init(name, n): the initial # of tokens in place "name" is n
  • priority(char *name, int priority): priority of transition "name"
  • guard(char *name, func): associating an enabling function with transition "name"; enabling_type (*func)();
  • rateval(char *name, rate_type val): for a timed transition
  • ratefun(char *name, func): for a timed transition; rate_type (*func)();
  • probval(char *name, probability_type val): for an immediate transition
  • probfun(char *name, func): for an immediate transition; probability_type (*func)();
"func" can be a marking-dependent function.
Default: priority = 0 (lowest priority); no enabling function (or a function always returning TRUE).
Use mark(p_name) to return the # of tokens in place p_name, & enabled(t_name) to see if transition t_name is enabled.

172. Arc functions (iarc: input arc; oarc: output arc; harc: inhibitor arc):
  With arc multiplicity = 1:
    iarc(char *t_name, char *p_name);
    oarc(char *t_name, char *p_name);
    harc(char *t_name, char *p_name);
  With arc multiplicity = constant (int mult):
    miarc(t_name, p_name, mult)
    moarc(t_name, p_name, mult)
    mharc(t_name, p_name, mult)
  With arc multiplicity defined by a function (int (*func)()):
    viarc(t_name, p_name, func)
    voarc(t_name, p_name, func)
    vharc(t_name, p_name, func)

173. 2. Within ac_final() for reporting the final analysis results:
A. For calculating E[Z(∞)]:
    pr_expected(msg, function)    char *msg; reward_type (*function)();
    expected(function)            returns a double
These print/return the expected reward defined by "function", which assigns rewards to states; it assigns a reward to every state of the system.
e.g., for a place P1 and a transition t1:
    reward_type util() {
        if (mark("P1")) return (1);
        else return (0);
    }
    reward_type X() {
        if (enabled("t1")) return (rate("t1"));
        else return (0);
    }
Use mark(p_name) to return the # of tokens in place p_name, & rate(t_name) to return the transition rate of transition t_name.

174. B. For calculating E[Z(t)]: expected reward at time t
    expected(function)            reward_type (*function)()
  * must call time_value(double t) prior to calling expected(function) in ac_final()
  * must call iopt(IOP_METHOD, VAL_TSUNIF) in parameters() for transient analysis
C. For calculating E[Y(∞)]: cumulative expected reward until absorption
    cum_abs(function)             reward_type (*function)()
D. For calculating E[Y(t)]: cumulative expected reward over [0, t]
    cum_expected(function)        reward_type (*function)()
  * must call time_value(double t) prior to calling cum_expected(function) in ac_final()
  * must call iopt(IOP_METHOD, VAL_TSUNIF) in parameters()
See p.13 SPNP reference guide v.3.1; TSUNIF stands for "Transient Solution using Uniformization"; if not set, the default is VAL_SSSOR (Steady State SOR).

175. Example: M/M/m/b stochastic Petri net model
[Figure: SPN with place buf, transition trin (rate λ, inhibitor arc of multiplicity b) and transition trserv (a variable rate); the underlying Markov model of M/M/3/5 has states 0, 1, 2, 3, 4, 5 with service rates μ, 2μ, 3μ, 3μ, 3μ.]
    #include "user.h"

    double lambda;
    double mu;
    int b;
    int m;

    parameters() {
        lambda = input("enter lambda");
        mu = input("enter mu");
        b = input("enter b");   /* b=5 in this example */
        m = input("enter m");   /* m=3 in this example */
    }

176. [Figure: SPN with place buf, transition trin (fixed rate λ, inhibitor arc of multiplicity b) and transition trserv (a variable rate).]
    rate_type rate_serv() {
        if (mark("buf") < m) return (mark("buf") * mu);
        else return (m * mu);
    }

    net() {
        place("buf");
        trans("trin");
        trans("trserv");
        rateval("trin", lambda);        /* fixed transition rate */
        ratefun("trserv", rate_serv);   /* variable transition rate */
        oarc("trin", "buf");
        iarc("trserv", "buf");
        mharc("trin", "buf", b);        /* inhibitor arc: no arrivals once buf holds b tokens */
    }

177. [Figure: Markov chain over states 0, 1, 2, …, b with arrival rate λ and service rates μ, 2μ, …, mμ, mμ.]
    assert() {
        if (mark("buf") > b) return (RES_ERROR);
        else return (RES_NOERR);
    }

    ac_init() { pr_net_info(); }
    ac_reach() { pr_rg_info(); }

    /* reward assignment functions for calculating performance metrics */
    reward_type population() { return (mark("buf")); }
    reward_type util() { return (enabled("trserv")); }   /* or return mark("buf") > 0 */
    reward_type tput() { return (rate("trserv")); }
    reward_type probrej() { if (mark("buf") == b) return (1.0); else return (0.0); }

    ac_final() {
        printf("average population = %f\n", expected(population));   /* output to screen */
        pr_expected("average throughput", tput);                     /* output to *.out */
        pr_expected("average utilization", util);
        pr_expected("rejection probability", probrej);
        pr_value("response time", expected(population) / expected(tput));
    }
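
The same four rewards can be cross-checked without SPNP, because the underlying Markov model is a birth-death chain. The sketch below is an independent Python calculation, not SPNP output; the function name `mmmb_metrics` and the sample values λ = 2, μ = 1 are assumptions chosen only for illustration (m = 3 and b = 5 follow the example).

```python
def mmmb_metrics(lam, mu, m, b):
    """Steady-state metrics of an M/M/m/b queue via the birth-death solution."""

    def rate_serv(n):
        # mirrors rate_serv() in the SPNP program: n*mu below m jobs, else m*mu
        return n * mu if n < m else m * mu

    # unnormalized probabilities: pi[n] = pi[n-1] * lam / rate_serv(n)
    weights = [1.0]
    for n in range(1, b + 1):
        weights.append(weights[-1] * lam / rate_serv(n))
    total = sum(weights)
    pi = [w / total for w in weights]

    population = sum(n * p for n, p in enumerate(pi))      # reward mark("buf")
    util = sum(p for n, p in enumerate(pi) if n > 0)       # reward enabled("trserv")
    tput = sum(rate_serv(n) * p for n, p in enumerate(pi)) # reward rate("trserv")
    probrej = pi[b]                                        # reward 1 when buf == b
    return {
        "pi": pi,
        "population": population,
        "util": util,
        "tput": tput,
        "probrej": probrej,
        "response_time": population / tput,                # Little's Law
    }


result = mmmb_metrics(lam=2.0, mu=1.0, m=3, b=5)
for name in ("population", "util", "tput", "probrej", "response_time"):
    print(name, result[name])
```

A useful sanity check is flow balance: the throughput must equal the accepted arrival rate, λ × (1 − rejection probability).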

178. An M/M/N/K queue with server failure & repair (p. 235, text)
For an M/M/1/10 queue with server failure/repair, we have the states (0,0), (1,0), (2,0), …, (9,0), (10,0) and (0,1), (1,1), (2,1), …, (9,1), (10,1). The state representation is (a, b): a = # of jobs, b = # of alive servers.
We can study any (N, K) easily with an SPN without having to recreate a Markov model for each (N, K) pair.
[Figure: SPN with places job, server_up and server_down and transitions t1, t2, t3, t4; N = # of servers, K = buffer space.]
* Failure transition: variable rate mark("server_up") * γ, where γ is the per-server failure rate.
* Repair transition: τ is a fixed repair rate of one facility.
* Service transition: enabling function: if mark("server_up") return true; service rate: depending on the # of jobs, i.e., mark("job"), and the # of alive servers, i.e., mark("server_up").

179. [Figure: Markov chain 3 -> 2 -> F with rates 3*lambda and 2*lambda.]
    * An example to illustrate how to
    * calculate reliability and
    * mean time to failure of a TMR
    * system using a Markov model
    markov TMR
    3 2 3 * lambda
    2 F 2 * lambda
    reward
    3 1
    2 1
    end
    * initial probability
    3 1
    end

    bind
    lambda 0.001
    end

    * value(t; system, state) or
    * tvalue(t; system, state) returns the
    * probability of the system in "state" at time t
    func R(t) 1 - value(t; TMR, F)

    echo The following loop prints the
    echo reliability as a function of t
    * number of digits after decimal point set to 8
    format 8
    loop t, 0, 100, 50
    expr R(t)
    expr exrt(t,TMR)
    end
    expr mean(TMR,F)
    expr mean(TMR)
    end
========== output ============
    * The following loop prints the
    * reliability as a function of t
    t=0.000000    R(t): 1.00000000e+00    exrt(t,TMR): 1.00000000e+00
    t=50.000000   R(t): 9.93096301e-01    exrt(t,TMR): 9.93096301e-01
    t=100.000000  R(t): 9.74555818e-01    exrt(t,TMR): 9.74555818e-01
    -------------------------------------------
    mean(TMR,F): 8.33333333e+02
    -------------------------------------------
    mean(TMR): 8.33333333e+02

180. [Figure: SPN with place p_sites and transition t_failure (variable failure rate).]
    /* An example to illustrate how to calculate reliability and
       mean time to absorption of a TMR system using SPNP */
    #include <stdio.h>
    #include "user.h"
    #define LAMBDA 0.001

    parameters() {
        iopt(IOP_METHOD, VAL_TSUNIF);   /* for transient analysis */
    }
    /* See p.13 SPNP reference guide v.3.1; TSUNIF stands for "Transient Solution
       using Uniformization"; if not set, the default is VAL_SSSOR (Steady State SOR) */

    assert() {}

    ac_init() { pr_net_info(); }

    ac_reach() {
        fprintf(stderr, "\nThe reachability graph has been generated \n");
        pr_rg_info();
    }

    rate_type failure_rate() {
        return (LAMBDA * mark("p_sites"));
    }

    net() {
        place("p_sites");
        init("p_sites", 3);
        trans("t_failure");
        ratefun("t_failure", failure_rate);
        iarc("t_failure", "p_sites");
    }

    reward_type reliability() {
        if (mark("p_sites") >= 2) return (1.0);
        else return (0.0);
    }

    ac_final() {
        double t;
        for (t = 0; t <= 100; t += 50) {
            /* must call time_value() before calling expected() for transient analysis */
            time_value(t);
            pr_expected("Reliability at this time = ", reliability);
        }
        pr_mtta("mean time to absorption = ");
    }

181. ============= output ===============
    NET:
    =================================
    places: 1
    immediate transitions: 0
    timed transitions: 1
    constant input arcs: 1
    constant output arcs: 0
    constant inhibitor arcs: 0
    variable input arcs: 0
    variable output arcs: 0
    variable inhibitor arcs: 0
    =================================
    RG:
    =================================
    tangible markings: 4 (1 absorbing)
    vanishing markings: 0
    marking-to-marking transitions: 3
    =================================
    =================================
    TIME : 0.000000000000
    =================================
    EXPECTED: Reliability at this time = 1
    =================================
    TIME : 50.000000000000
    =================================
    EXPECTED: Reliability at this time = 0.993096301257
    =================================
    TIME : 100.000000000000
    =================================
    EXPECTED: Reliability at this time = 0.97455581787
    MTTA: mean time to absorption = 1833.33333333
[Figure: Markov chain 3 -> 2 -> 1 -> 0.]
The MTTA is 1833.33 rather than 833.33 because the absorbing state is 0, not 1. To model a true TMR system, add guard("t_failure", t_efunc) in net(){}, where t_efunc is defined as:
    enabling_type t_efunc() {
        if (mark("p_sites") >= 2) return 1;
        else return 0;
    }
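
Both tool outputs can be reproduced in closed form, which makes the gap between 833.33 and 1833.33 easy to see. The Python sketch below is a hand calculation, not SHARPE or SPNP code; the function names are made up for the example. It uses the chain 3 -> 2 -> F with rates 3λ and 2λ, for which R(t) = 3e^(-2λt) - 2e^(-3λt), and sums the mean sojourn times 1/(kλ) of the states visited before absorption.

```python
import math

LAMBDA = 0.001  # per-unit failure rate, as in the example


def reliability(t, lam=LAMBDA):
    # probability that at least 2 of the 3 units are still up at time t
    return 3.0 * math.exp(-2.0 * lam * t) - 2.0 * math.exp(-3.0 * lam * t)


def mean_time_to_absorption(start, absorbing, lam=LAMBDA):
    # with k units up the chain leaves at rate k*lam, so the mean sojourn
    # time is 1/(k*lam); sum over the states start, start-1, ..., absorbing+1
    return sum(1.0 / (k * lam) for k in range(absorbing + 1, start + 1))


for t in (0, 50, 100):
    print(t, reliability(t))

# absorbing state F (1 unit left): matches mean(TMR) = 833.33
print(mean_time_to_absorption(3, 1))
# absorbing state 0 (no guard on t_failure): matches MTTA = 1833.33
print(mean_time_to_absorption(3, 0))
```

The extra 1000 time units are exactly the mean sojourn time 1/λ in the state with one unit left, which the unguarded SPN does not treat as failed.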

182. [Figure: Markov chain 4 -> 3 -> 2 -> 1 -> F with rates 4*lambda, 3*lambda, 2*lambda and lambda.]
    * An example to illustrate how to
    * calculate the performability of a 1
    * out of 4 processor system using sharpe
    bind
    lambda 0.001
    * assume that one processor is able to process
    * one job per time unit
    mu 1
    end

    markov mp readprobs
    4 3 4 * lambda
    3 2 3 * lambda
    2 1 2 * lambda
    1 F lambda
    * reward is throughput
    reward
    4 4*mu
    3 3*mu
    2 2*mu
    1 mu
    end
    * initial probability
    4 1
    end

    * number of digits after decimal point is set to 4
    format 4
    echo ===============================
    echo The following loop prints the cumulative
    echo expected reward, i.e., the total number of jobs
    echo having been serviced, over (0,t)
    loop t, 0, 100, 50
    * print cumulative expected reward over (0,t)
    expr cexrt(t; mp)
    end
    echo
    echo
    echo ===============================
    echo The following loop prints the probability
    echo that cumulative reward is less than a specified
    echo value r (i.e., the probability that less than r jobs
    echo have been serviced) when the system fails
    loop r, 2000, 0, -1000
    * print probability that the cumulative reward is less
    * than r when the system fails
    expr rvalue(r; mp)
    end

183.     echo
    echo
    echo ==============================
    echo how to compute the cumulative expected
    echo reward (number of jobs serviced) until
    echo absorption ????
    end
=============== output ===============
    ===============================
    The following loop prints the cumulative expected reward, i.e., the total number of jobs having been serviced, over (0,t)
    t=0.000000    cexrt(t;mp): 0.0000e+00
    t=50.000000   cexrt(t;mp): 1.9508e+02
    t=100.000000  cexrt(t;mp): 3.8065e+02
    ===============================
    The following loop prints the probability that the cumulative reward is less than a specified value r (i.e., the probability that less than r jobs have been serviced) when the system fails
    -------------------------------------------
    r=2000.000000  rvalue(r; mp): 1.4288e-01
    r=1000.000000  rvalue(r; mp): 1.8988e-02
    r=0.000000     rvalue(r; mp): 0.0000e+00
    ===============================
    how to compute the cumulative expected reward (number of jobs serviced) until absorption ????

184. [Figure: SPN with place p_sites and transition t_failure (variable failure rate).]
    /* An example to illustrate how to calculate the performability
       of a 1 out of 4 processor system using SPNP */
    #include <stdio.h>
    #include "user.h"
    #define LAMBDA 0.001
    #define MU 1

    parameters() {
        iopt(IOP_METHOD, VAL_TSUNIF);   /* Transient analysis */
    }

    assert() {}

    ac_init() { pr_net_info(); }

    ac_reach() {
        fprintf(stderr, "\nThe reachability graph has been generated \n");
        pr_rg_info();
    }

    rate_type failure_rate() {
        return (LAMBDA * mark("p_sites"));
    }

    net() {
        place("p_sites");
        init("p_sites", 4);
        trans("t_failure");
        ratefun("t_failure", failure_rate);
        iarc("t_failure", "p_sites");
    }

    reward_type job_service_rate() {
        /* reward is throughput */
        if (mark("p_sites")) return (mark("p_sites") * MU);
        else return (0.0);
    }

    ac_final() {
        double t;
        for (t = 0; t <= 100; t += 50) {
            time_value(t);
            pr_cum_expected("Expected cumulative number of jobs serviced from (0,t) ", job_service_rate);
        }
        pr_cum_abs("Expected cumulative number of jobs serviced until absorption", job_service_rate);
    }

185. ============== output ==============
    NET:
    =================================
    places: 1
    immediate transitions: 0
    timed transitions: 1
    constant input arcs: 1
    constant output arcs: 0
    constant inhibitor arcs: 0
    variable input arcs: 0
    variable output arcs: 0
    variable inhibitor arcs: 0
    =================================
    RG:
    =================================
    tangible markings: 5 (1 absorbing)
    vanishing markings: 0
    marking-to-marking transitions: 4
    =================================
    =================================
    TIME : 0.000000000000
    =================================
    Expected cumulative number of jobs serviced from (0,t) = 0
    =================================
    TIME : 50.000000000000
    =================================
    Expected cumulative number of jobs serviced from (0,t) = 195.082301997
    =================================
    TIME : 100.000000000000
    =================================
    Expected cumulative number of jobs serviced from (0,t) = 380.650327856
    Expected cumulative number of jobs serviced until absorption = 4000
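
These numbers also have a closed form, which gives one way to answer the question about the cumulative reward until absorption. The Python sketch below is a hand calculation with made-up function names, not SHARPE or SPNP code. It assumes, as in the model, that each of the 4 processors fails independently at rate λ, so the expected number of working processors at time s is 4e^(-λs), and that the reward rate in state k is kμ.

```python
import math

LAMBDA = 0.001  # per-processor failure rate
MU = 1.0        # jobs per time unit per processor
N = 4           # processors initially up


def cum_expected_reward(t):
    # integral over (0, t) of MU * E[# working at s] = MU * N * exp(-LAMBDA*s)
    return N * MU * (1.0 - math.exp(-LAMBDA * t)) / LAMBDA


def cum_reward_until_absorption():
    # state k earns k*MU per time unit for a mean sojourn time of 1/(k*LAMBDA)
    return sum((k * MU) * (1.0 / (k * LAMBDA)) for k in range(1, N + 1))


for t in (0, 50, 100):
    print(t, cum_expected_reward(t))   # 0, about 195.0823, about 380.6503
print(cum_reward_until_absorption())   # about 4000
```

Each state contributes μ/λ = 1000 jobs regardless of k, which is why the total until absorption is 4 × 1000 = 4000.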

186. Case Study 2: Dynamic Quota-Based Admission Control With Subrating in Multimedia Servers
Sheng-Tzong Cheng, Chi-Ming Chen and Ing-Ray Chen. ACM/Springer Journal on Multimedia Systems, Vol. 8, No. 2, 2000, pp. 83-91.

187. Background
Reservation-based admission control:
* Allocates a fraction of the server capacity for a new request based on certain criteria.
* The allocated server capacity is reserved for the specific request until it leaves the system.
* Problem: a new request may be rejected if no available resource is left to serve the request. In such a case, the system incurs a loss due to the rejected request.

188. Background (cont.)
Possible ways of reservation-based admission control:
Deterministic approach
* uses the worst-case scenario to provide an absolute Quality of Service (QoS) guarantee
* resources are under-utilized
Best-effort approach
* based on statistical or average estimations of the required data rate
* no absolute QoS guarantee

189. Subrating Mechanism
* Quota-based reservation: partition the server capacity into several partitions (or quotas).
* Subrating mechanism: reduce the QoS of low-priority clients to accept a new high-priority client, with the objective of achieving a higher "system value".

190. Notation
    λh   Arrival rate of high-priority clients (HPCs)
    λl   Arrival rate of low-priority clients (LPCs)
    μ    Departure rate of clients
    vh   Reward of a HPC if the client is serviced successfully
    vl   Reward of a LPC if the client is serviced successfully
    qh   Penalty of a HPC if the client is rejected on admission
    ql   Penalty of a LPC if the client is rejected on admission
    N    Total number of server capacity slots for servicing clients
    nh   Number of slots reserved for HPC only, 0 <= nh <= N

191. 191

192. System Model
[Figure: a new request arrives at a server divided into a high-priority partition nh, a common pool partition nm, and a low-priority partition nl.]

193. System Model (cont.)
* A high-priority client does not degrade its QoS, while a low-priority client has a range of QoS requirements (Qmax, Qmin) with Qmin = (1 - 1/a) Qmax.
* A low-priority client in the common pool area can degrade its QoS once by 1/a (to Qmin) if necessary; if it departs in degraded service mode, the system only receives (1 - 1/a) vl.
* If the common pool area is fully occupied, a low-priority clients (if available) each degrade their QoS by 1/a to make room for an arriving high-priority client.
* A degraded low-priority client can raise its QoS level to Qmax when a client in the common pool area departs.

194. Payoff Function
Definition: the average system value received by the server per time unit.
The payoff function is given by:
    Xh vh + Xl vl + Xld [vl (1 - 1/a)] - Mh qh - Ml ql
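
A minimal Python sketch of the payoff formula follows. It assumes that Xh, Xl and Xld are the rates at which high-priority, low-priority and degraded low-priority clients complete service, and that Mh and Ml are the rates at which high- and low-priority clients are rejected. The function name and all numbers in the usage example are hypothetical and serve only to show the arithmetic.

```python
def payoff(x_h, x_l, x_ld, m_h, m_l, v_h, v_l, q_h, q_l, a):
    """Average system value per time unit."""
    reward = x_h * v_h + x_l * v_l + x_ld * v_l * (1.0 - 1.0 / a)
    penalty = m_h * q_h + m_l * q_l
    return reward - penalty


# hypothetical rates and values, for illustration only
value = payoff(x_h=2.0, x_l=3.0, x_ld=1.0, m_h=0.5, m_l=1.0,
               v_h=10.0, v_l=4.0, q_h=5.0, q_l=2.0, a=4)
print(value)  # 30.5 = (20 + 12 + 3) - (2.5 + 2)
```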

195. A Class of Quota-Based Admission Control Algorithms
* Free-quota scheme: nh = 0, nl = 0, and nm = N
* Fixed-quota scheme: nm = 0
* Dynamic quota scheme (nh, nm, nl): with subrating, or with no subrating

196. SPN Model for Dynamic Quota with No Subrating
Figure 1. SPN Model for Quota-Based Admission Control with No Subrating (NoSUB). [Figure: SPN with the three partitions nm, nh and nl.]

197. SPN Model for Dynamic Quota with No Subrating (cont.)
Places:
(In the high partition)
* RH: mark(RH) indicates the number of available slots for high-priority clients
* H: mark(H) indicates the number of high-priority clients being served
  (mark(RH) + mark(H) = nh)
(In the low partition)
* RL: mark(RL) indicates the number of available slots for low-priority clients
* L: mark(L) indicates the number of low-priority clients being served
  (mark(RL) + mark(L) = nl)
(In the common pool partition)
* RS: mark(RS) indicates the number of available slots
* SH: mark(SH) is the number of high-priority clients using the common pool part
* SL: mark(SL) is the number of low-priority clients using the common pool part
  (mark(RS) + mark(SH) + mark(SL) = nm)

198. SPN Model for Dynamic Quota with No Subrating (cont.)
    Transition   Rate function    Enabling function
    T1           λh               true
    T2           mark(H) * μ      true
    T3           λl               true
    T4           mark(L) * μ      true
    T5           λh               mark(RH) == 0
    T6           mark(SH) * μ     true
    T7           λl               mark(RL) == 0
    T8           mark(SL) * μ     true

199. SPN Model for Dynamic Quota with Subrating
Figure 2. SPN Model for Quota-Based Admission Control with Subrating (NoSUB). [Figure: SPN whose arcs in the middle partition carry multiplicities a, a - 1, a * a and a * (a - 1); a tokens represent 1 full slot in the middle partition.]

200. SPN Model for Dynamic Quota with Subrating (cont.)
Places:
(In the high partition) -- each slot is represented by 1 token
* RH: mark(RH) indicates the number of available slots for high-priority clients
* H: mark(H) indicates the number of high-priority clients being served
  (mark(RH) + mark(H) = nh)
(In the low partition) -- each slot is represented by 1 token
* RL: mark(RL) indicates the number of available slots for low-priority clients
* L: mark(L) indicates the number of low-priority clients being served
  (mark(RL) + mark(L) = nl)
(In the middle partition) -- each slot is represented by a tokens
* RS: mark(RS) is the number of tokens available in the middle partition
* SH: mark(SH) indicates the number of tokens held by mark(SH) / a high-priority clients
* SL: mark(SL) indicates the number of tokens held by mark(SL) / a low-priority clients
* SLL: mark(SLL) is the number of tokens held by mark(SLL) / (a - 1) degraded low-priority clients
  (mark(RS) + mark(SH) + mark(SL) + mark(SLL) = a * nm)

201. SPN Model for Dynamic Quota with Subrating (cont.)
    Transition   Rate function               Enabling function
    T1           λh                          mark(RH) == 0
    T2           λl                          mark(RL) == 0
    T3           mark(SH) / a * μ            true
    T4           mark(SL) / a * μ            true
    T5           mark(SLL) / (a - 1) * μ     true
    T6           λh                          mark(RH) == 0 && mark(RS) == 0
    T7           (immediate transition)      true
    T8           λh                          true
    T9           mark(H) * μ                 true
    T10          λl                          true
    T11          mark(L) * μ                 true

202. SPN Model for Dynamic Quota with Subrating (cont.)
    Arc          Multiplicity function
    RS -> T1     a
    T1 -> SH     a
    RS -> T2     a
    T2 -> SL     a
    SH -> T3     a
    T3 -> RS     a
    SL -> T4     a
    T4 -> RS     a
    SLL -> T5    a - 1
    T5 -> RS     a - 1
    SL -> T6     a * a
    T6 -> SH     a
    T6 -> SLL    a * (a - 1)
    SLL -> T7    a - 1
    T7 -> SL     a

203. Calculating System Value Payoff
The pay-off rate for dynamic quota with subrating can be obtained by the following steps:
* Calculate the values of Xh, Xl, Xld, Mh, and Ml from SPNP (by associating proper rewards with markings of the system).
  What is the reward assignment to calculate Xh?
      return rate("T3") + rate("T9");
  What is the reward assignment to calculate Mh?
      if (mark("RH") == 0 && mark("RS") == 0 && !enabled("T6")) return λh; else return 0;
* Compute the pay-off rate by:
      Xh vh + Xl vl + Xld [vl * (a - 1) / a] - Mh qh - Ml ql

204. Analysis Result

205. Analysis Result (cont.)

206. Case Study #3: Analysis of Replicated Data with Repair Dependency
Ing-Ray Chen and Ding-Chau Wang. The Computer Journal, Vol. 39, No. 9, 1996, pp. 767-779.

207. Replicated data management
Extend Case Study 1 by considering both node and link failures/recovery, as well as the effect of repair dependency, which occurs when many sites and links may have to share the same repairman due to repair constraints.

208. Dynamic voting for replicated data management
* Dynamic voting: each site Si maintains (VNi, SCi, DSi) to determine whether it is in the major partition.
* Site i is in the major partition if:
  - the number of copies it can access is larger than one half of SCi, or
  - the number of copies it can access is exactly equal to one half of SCi and it can access the "distinguished site" indicated in DSi.
* If a site is in the major partition, it can update locally. After an update is done, all copies in the major partition are updated along with the new (VNi, SCi, DSi) value.
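
The membership test can be sketched as a small Python predicate. This only encodes the two conditions listed on the slide; the function name, the use of a set of site identifiers for the copies a site can access, and the sample site numbers are assumptions of the example, and the handling of version numbers (VNi) is left out.

```python
def in_major_partition(accessible_sites, sc, ds):
    """Decide whether a site is in the major partition.

    accessible_sites : set of site ids whose copies the site can access
    sc               : site cardinality SCi recorded at the last update
    ds               : distinguished site DSi recorded at the last update
    """
    n = len(accessible_sites)
    if 2 * n > sc:                              # more than one half of SCi
        return True
    if 2 * n == sc and ds in accessible_sites:  # exactly half, plus the DS
        return True
    return False


print(in_major_partition({1, 2, 3}, sc=5, ds=5))  # True: 3 > 5/2
print(in_major_partition({1, 2}, sc=4, ds=2))     # True: exactly half, DS reachable
print(in_major_partition({1, 2}, sc=4, ds=3))     # False: exactly half, DS missing
print(in_major_partition({1, 2}, sc=5, ds=1))     # False: fewer than half
```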

209. System model
* Sites and links have independent failure rates λs and λl.
* A repairman can repair a failed site with rate μs and a failed link with rate μl.
* There is always an update (called an immediate update) after a failure or repair event, since the update rate is much faster than the failure/repair rate.
Site subnet: a site can be in one of four states:
* up and current (upcc)
* up and out-of-date (upoc)
* down and current (downcc)
* down and out-of-date (downoc)

210. Site Subnet (site i model: one for each site)
* gi true: site i is in the major partition; gi-bar true: site i is not in the major partition.
* When an update arrives and the major partition exists, a token will be put into place "ready".
* A double circle means a common place.
* i tokens are circulating for the site i model.
* Only one out of the six transitions (t0, t1, t2, t3, t4, t5) can fire, all with the same priority level (3).
* (t5, 3, gi-bar): t5 is the name of the transition, 3 is the priority of the transition, and gi-bar is the enabling function.
* t5 will fire if gi-bar is true and site i is up and current.
* After t5 fires, the state will go from upcc to upoc, meaning that site i is up and out of date.

211. Site Subnet: (t2, 3, gi)
* gi true: site i is in the major partition; gi-bar true: site i is not in the major partition.
* t2's enabling function gi returns TRUE if site i is in the major partition.
* t2 will fire if gi returns true and site i is up and out-of-date.
* After t2 fires, the state will go from upoc to upcc, meaning that the new state will be up and current.

212. Site Subnet: (t0, 3)
* gi true: site i is in the major partition; gi-bar true: site i is not in the major partition.
* t0 will fire if site i is down and current.
* After t0 fires, site i is down and out of date (downoc).

213. Site Subnet: (t1, 3)
* gi true: site i is in the major partition; gi-bar true: site i is not in the major partition.
* t1 will fire if site i is down and out-of-date.
* After t1 fires, site i remains down and out of date.

214. System subnet (system subnet model)
* Each of the boxes labeled site i is the site subnet model for site i.
* f true: there is a major partition; f-bar true: there isn't a major partition.
* Transitions tf and tfbar are given the highest priority levels (5 and 4).
* When an update event arrives, a token will be put in place "update event".
* After all sites are evaluated and each site's status is updated, tds and tsc, which have the lowest priority levels, will execute.

215. System subnet: (tsc, 1) updates the site cardinality as the number of tokens in place sc:
* Input arc multiplicity: #(sc)
* Output arc multiplicity: the number of sites with mark(upcc) > 0 (i.e., in the major partition)

216. System subnet: (tds, 2) updates the distinguished site as the number of tokens in place ds:
* Input arc multiplicity: #(ds)
* Output arc multiplicity: maximum #(upcc) among all sites in the major partition

217. Site failure/repair subnets (for site i; independent repairman model)
* This subnet describes the effect of site i's failure and repair on the system state.
* Site i can only be in one state at a time, so only one transition out of these two subnets is possible at any time.
* Subscript i refers to node i.
* Failure events: upcci -> dwcci and upoci -> dwoci, with rate λs.
* Repair events: dwcci -> upcci and dwoci -> upoci, with rate μs.

218. Link failure/repair subnets (for each link ij; independent repairman model)
* Subscript ij refers to the link between nodes i and j.
* Failure events: uplinkij -> dwlinkij, with rate λl.
* Repair events: dwlinkij -> uplinkij, with rate μl.

219. Meanings of places.

220. Arc multiplicity functions.

221. Enabling functions.

222. FIFO repairman model (one repairman)
* We can make use of the independent repairman model and modify the repair rates to account for repair dependency.
* The repair rate is "deflated" by the total number of failed sites and links to account for the effect of repair resource sharing.
* Example: a state has 3 failed entities, two failed sites and one failed link.
  - For the independent repairman model, the repair rates are μs, μs and μl.
  - For the FIFO repairman model, the repair rates are μs / 3, μs / 3 and μl / 3.
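
The rate deflation can be written as a marking-dependent rate function. The Python sketch below mirrors the rule on the slide; the function name and the sample rates μs = 3 and μl = 6 are hypothetical.

```python
def fifo_repair_rates(failed_sites, failed_links, mu_s, mu_l):
    """Per-entity repair rates under the FIFO repairman model.

    Each failed entity is repaired at its independent-model rate divided by
    the total number of failed sites and links in the current state.
    """
    total_failed = failed_sites + failed_links
    if total_failed == 0:
        return 0.0, 0.0  # nothing to repair in this state
    return mu_s / total_failed, mu_l / total_failed


# the example on the slide: two failed sites and one failed link
site_rate, link_rate = fifo_repair_rates(2, 1, mu_s=3.0, mu_l=6.0)
print(site_rate, link_rate)  # 1.0 2.0, i.e. mu_s/3 and mu_l/3
```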

223. FIFO repairman model

224. Linear-order repairman model (one repairman)
* Failed sites/links are repaired in a prescribed order.
* A new enabling function is associated with each repair transition.
* In any state, only one enabling function returns TRUE, based on the prescribed linear order, and all others return FALSE.

225. Linear-order repairman model
* A 5-site ring topology with the linear repair order being sites 5, 4, 3, 2, 1 followed by links 45, 51, 43, 32, 21.
* If sites 2 and 4, and link 51 are down, then site 4 is chosen to be repaired first.
[Figure: ring of sites 1, 2, 5, 4, 3.]

226. Linear-order repairman model
Enabling functions associated with sites 4 and 2 and link 51 will return TRUE, FALSE and FALSE, respectively, meaning that site 4 will be repaired first over site 2 and link 51.
[Figure: ring of sites 1, 2, 5, 4, 3.]
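
The enabling functions for the linear-order model all follow one pattern, sketched below in Python. The entity labels such as "site4" and "link51" and the function name are invented for this example; the order itself is the one given for the 5-site ring.

```python
# prescribed linear repair order: sites 5,4,3,2,1 then links 45,51,43,32,21
REPAIR_ORDER = ["site5", "site4", "site3", "site2", "site1",
                "link45", "link51", "link43", "link32", "link21"]


def repair_enabled(entity, failed):
    """Enabling function of the repair transition of `entity`.

    Returns True only if `entity` is the failed entity that comes first in
    REPAIR_ORDER, so at most one repair transition is enabled in any state.
    """
    if entity not in failed:
        return False
    first = next(e for e in REPAIR_ORDER if e in failed)
    return entity == first


failed = {"site2", "site4", "link51"}
print(repair_enabled("site4", failed))   # True
print(repair_enabled("site2", failed))   # False
print(repair_enabled("link51", failed))  # False
```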

227. Linear-order repairman model

228. Best-first repairman model (one repairman)
* Preference is given to the site or link whose repair can lead to the existence of a major partition with respect to the current state.
* If more than one failed site or link would lead to the existence of a major partition after repair, then a tie-breaker rule is applied to select the one to be repaired next.

229. Best-first repair strategy: tie-breaker rules. Choose a failed entity whose repair results in more current copies (i.e., a larger SC) in the major partition: the more upcc sites in the major partition, the better. Among failed sites, choose the highest linearly ordered site, so it has a higher chance of becoming the DS. Choose a failed entity that will stay alive longer after repair, that is, one with a lower failure rate and a higher repair rate. For example, when choosing between a failed site and a failed link, repair the failed site if μs/λs > μl/λl; otherwise repair the failed link.
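The last tie-breaker reduces to one comparison of repair-to-failure ratios. A small sketch, with a hypothetical function name and hypothetical rate values:

```python
def pick_by_lifetime_ratio(mu_s, lam_s, mu_l, lam_l):
    """Choose between a failed site and a failed link.

    Repairs the site if mu_s / lam_s > mu_l / lam_l, otherwise the link,
    as stated in the tie-breaker rule.
    """
    return "site" if mu_s / lam_s > mu_l / lam_l else "link"


# Hypothetical rates: the site ratio is 10, the link ratio is 5.
choice = pick_by_lifetime_ratio(mu_s=1.0, lam_s=0.1, mu_l=1.0, lam_l=0.2)
```

Note that equal ratios fall through to the link, because the rule uses a strict inequality.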

230. Best-first repair example. Effective: SC is 4 after repair. Ineffective: SC is 3 after repair.

232. Evaluation. Tested with a 5-site ring topology. Four repairman models: independent repair, and three dependent-repair models with one repairman (FIFO, linear-order, best-first). [Figure: 5-site ring, sites labeled 1 to 5.]

233. Model complexity: number of states in the underlying Markov model. Independent: 8674; FIFO: 8674; Linear-order: 5429; Best-first: 3821.

234. Performance metrics and reward assignments for calculation. System availability: the steady-state probability that a major partition exists. Reward assignment: reward rate = 1 for those states in which enabling function f() evaluates to TRUE; reward rate = 0 otherwise. Site availability: the probability that an update arriving at an arbitrary site will succeed. Reward assignment: reward rate = k/n for those states in which enabling function f() evaluates to TRUE; reward rate = 0 otherwise. Here k is the number of 'up and current' (upcc) sites in the major partition in a particular state, and n is the total number of sites in the system (n = 5 in a 5-site ring topology).
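The two reward assignments can be written out as plain functions. The sketch below is an illustration in Python, not SPNP code: the boolean `major_partition_exists` stands in for the result of f(), and the three-state distribution is made up purely to show the arithmetic.

```python
def system_availability_reward(major_partition_exists):
    """Reward rate 1 when f() is TRUE (a major partition exists), else 0."""
    return 1.0 if major_partition_exists else 0.0


def site_availability_reward(major_partition_exists, k, n=5):
    """Reward rate k/n when f() is TRUE, else 0.

    k: number of upcc sites in the major partition; n: total number of sites.
    """
    return k / n if major_partition_exists else 0.0


# Hypothetical steady-state distribution: (probability, f() result, k).
toy_states = [(0.7, True, 4), (0.2, True, 3), (0.1, False, 0)]

system_avail = sum(p * system_availability_reward(f) for p, f, _ in toy_states)
site_avail = sum(p * site_availability_reward(f, k) for p, f, k in toy_states)
```

Each metric is the probability-weighted sum of its reward rates over all states.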

235. Results: independent repairman model. [Figure: site availability plotted against l:l and against s:s.] The site-failure-only assumption (as in Case Study 1) unrealistically overestimates site availability.

236. Results: comparison of repairman models. Site availability under independent repair is much higher than under dependent repair. Among the dependent-repair models: Best-first > Linear-order > FIFO.

237. HW3, Problem #3. Reward assignment for calculating the rejection probability of low-priority clients: if (mark("RS")==0 && !enabled("T8")) return 1; else return 0; Figure 2. SPN Model for Quota-Based Admission Control with Sub-rating. [Figure 2: arc multiplicities include a, a-1, a*a, a*(a-1) and (a-1)*a; transition T8.] A new transition "T8" is added to account for subrating for low-priority clients. For the homework, in which only the middle partition exists, the enabling condition for "T8" is mark("RS")==0. For case study #2, in which all three partitions exist, the enabling condition for "T8" is mark("RL")==0 && mark("RS")==0.

238. Problem #3 in HW#3. [Figure: network of Client, CPU, Disk and Ethernet.] The client arrival rate depends on mark("Po"). The Ethernet service rate depends on mark("P1").

239. Partial Sharpe code example (HW #2, Problem #4)
func A(k) (1-1/k)^(k-1)
func C(k) (1-A(k))/A(k)
bind mu1 1.0/((1.0/Np)*(Lp/B))
func mu(k) 1.0/((1.0/Np)*(Lp/B)+S*C(k+1))
….
pfqn system
* Section 1
. . .
. . .
end
* Section 2
. . .
. . .
Ethernet lds \
mu1,mu(2),mu(3),mu(4),mu(5),mu(6),mu(7),mu(8),mu(9),mu(10),\
mu(11),mu(12),mu(13),mu(14),mu(15),mu(16),mu(17),mu(18),mu(19),mu(20)
end
* Section 3
. . .
end
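To sanity-check the load-dependent rates before putting them in the Sharpe file, the same formulas can be evaluated in Python. This sketch only mirrors the func and bind lines of the partial example; the numeric values chosen for Np, Lp, B and S are hypothetical placeholders, and the list follows the order of the lds line (mu1 first, then mu(2) to mu(20)).

```python
def A(k):
    # Mirrors: func A(k) (1-1/k)^(k-1)
    return (1.0 - 1.0 / k) ** (k - 1)


def C(k):
    # Mirrors: func C(k) (1-A(k))/A(k)
    return (1.0 - A(k)) / A(k)


def ethernet_rates(Np, Lp, B, S, max_jobs=20):
    """Return [mu1, mu(2), ..., mu(max_jobs)] as listed on the lds line."""
    base = (1.0 / Np) * (Lp / B)
    mu1 = 1.0 / base                           # bind mu1 ...
    rest = [1.0 / (base + S * C(k + 1))        # func mu(k) ...
            for k in range(2, max_jobs + 1)]
    return [mu1] + rest


# Hypothetical parameter values, chosen only so the numbers are easy to follow.
rates = ethernet_rates(Np=2.0, Lp=8.0, B=4.0, S=0.5)
```

Printing `rates` gives the 20 values that the lds line expects, which makes it easy to confirm that the rate falls as the number of jobs grows.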

240. HW #2, Problem #3: Partial Markov Model (you finish the rest)

241. How to obtain a performance measure using Sharpe? (HW #2, Problem #3) 1. Do a reward assignment (using bind). 2. Call exrss(system) to obtain the expected steady-state reward, and use bind to hold the result if necessary, e.g., bind high_priority_population exrss(system).
Example 1: High-priority class population. Reward by state: 000: 0; 001: 1; 002: 2; 003: 3; 010: 0; 011: 1; 012: 2; 020: 0; 021: 1; 202: 2; 030: 0; 211: 1; 220: 0; 401: 1; 410: 0; 600: 0.
Example 2: High-priority class throughput. Reward by state: 000: 0; 001: h; 002: 2h; 003: 3h; 010: 0; 011: h; 012: 2h; 020: 0; 021: h; 202: 2h; 030: 0; 211: h; 220: 0; 410: 0; 401: h; 600: 0.
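What exrss returns is the probability-weighted sum of the rewards assigned to the states. The Python sketch below illustrates that sum with the Example 1 reward table; the uniform steady-state probabilities and the value of h are hypothetical, since the real probabilities come from solving the Markov model.

```python
# Example 1 reward assignment (high-priority population), copied from the table.
POPULATION_REWARD = {
    "000": 0, "001": 1, "002": 2, "003": 3, "010": 0, "011": 1,
    "012": 2, "020": 0, "021": 1, "202": 2, "030": 0, "211": 1,
    "220": 0, "401": 1, "410": 0, "600": 0,
}


def expected_steady_state_reward(pi, reward):
    """Sum of pi[state] * reward[state] over all states."""
    return sum(pi[s] * reward[s] for s in pi)


# Hypothetical placeholder: uniform steady-state probabilities.
pi = {s: 1.0 / len(POPULATION_REWARD) for s in POPULATION_REWARD}

# Example 2 rewards are the Example 1 rewards scaled by h (h is hypothetical here).
h = 2.0
throughput_reward = {s: r * h for s, r in POPULATION_REWARD.items()}

population = expected_steady_state_reward(pi, POPULATION_REWARD)
throughput = expected_steady_state_reward(pi, throughput_reward)
```

In Sharpe the two measures need two separate reward assignments, each followed by its own exrss(system) call.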

242. HW #3, Problem #1: can make use of the SPN model and code on slide #178 for M/M/m/b (with m=3 and b=8). [Figure labels: trin, buf, trserv, b, variable rate.] HW #3, Problem #3: can make use of the SPN model in Figure 2 (slide #204) of Case Study 2 (to be discussed).

243. HW #3, Problem #2: Reliability Modeling and Analysis of a 3P2M system using SPNP. Three cases: (a) Each component (CPU or memory) has an independent repair facility. (b) Each subsystem (CPU or memory) has an independent repair facility that can repair failed components within the subsystem one at a time. (c) The whole system shares a repair facility which repairs failed components one at a time, with repair priority of memory modules over CPUs. [Figure: places CPU_up, CPU_down, MM_up, MM_down; transitions t1 and t2 between CPU_up and CPU_down, t3 and t4 between MM_up and MM_down.] CPU failure transition: the variable rate is mark("CPU_up") * per-CPU failure rate; the enabling function returns false when CPU_up==0 or MM_up==0. CPU repair transition: the variable rate is mark("CPU_down") * per-CPU repair rate for case (a), and just the per-CPU repair rate for cases (b) and (c); the enabling function returns false when CPU_up==0 or MM_up==0, and for case (c) only it also returns false when MM_down>0. t3 and t4 also each have a variable rate and an enabling function.
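The case-dependent rate and enabling functions for the CPU subnet can be prototyped in Python before coding them in SPNP. This is a sketch under assumptions: the function names are hypothetical, markings are passed in as integers instead of being read with mark(), and the numeric rates are placeholders.

```python
def cpu_failure_rate(cpu_up, lam_cpu):
    """Variable rate: mark("CPU_up") * per-CPU failure rate."""
    return cpu_up * lam_cpu


def cpu_repair_rate(case, cpu_down, mu_cpu):
    """Case (a): every failed CPU is repaired in parallel.
    Cases (b) and (c): one repair at a time, so just the per-CPU rate."""
    return cpu_down * mu_cpu if case == "a" else mu_cpu


def cpu_repair_enabled(case, cpu_up, mm_up, mm_down):
    """Enabling function of the CPU repair transition.

    Whether a failed CPU is present is already enforced by the
    input arc from CPU_down, so it is not tested here.
    """
    if cpu_up == 0 or mm_up == 0:
        return False
    if case == "c" and mm_down > 0:   # memory modules have repair priority
        return False
    return True


# Hypothetical marking: 1 CPU up, 2 CPUs down, 1 memory up, 1 memory down.
rate_a = cpu_repair_rate("a", cpu_down=2, mu_cpu=0.5)
rate_b = cpu_repair_rate("b", cpu_down=2, mu_cpu=0.5)
enabled_b = cpu_repair_enabled("b", cpu_up=1, mm_up=1, mm_down=1)
enabled_c = cpu_repair_enabled("c", cpu_up=1, mm_up=1, mm_down=1)
```

The memory subnet (t3 and t4) would follow the same pattern with its own rates.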

244. Problem #1 in HW#3. [Figure labels: λ, trin, buf, 6, tr_fast, tr_slow, fastserver, slowserver, tr_fastserv, tr_slowserv.] tr_fast and tr_slow are immediate transitions with probability 1, but the priority of tr_fast is greater than the priority of tr_slow, so whenever the fast server is free, a job waiting in "buf" goes to it. tr_fastserv and tr_slowserv are timed transitions with rates mu_f and mu_s, respectively.
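The effect of the priorities can be illustrated with a small Python sketch. It assumes each server holds at most one job and that an immediate transition is enabled only when its server is free; both are assumptions about the figure, and the function name is hypothetical.

```python
def next_immediate_transition(buf, fast_busy, slow_busy):
    """Return the immediate transition that fires next, or None.

    tr_fast has the higher priority, so it wins whenever both are enabled.
    """
    if buf == 0:
        return None               # no job waiting in "buf"
    if not fast_busy:
        return "tr_fast"          # fast server free: it always gets the job
    if not slow_busy:
        return "tr_slow"          # only reached when the fast server is busy
    return None                   # both servers busy: the job keeps waiting


choice_both_free = next_immediate_transition(buf=2, fast_busy=False, slow_busy=False)
choice_fast_busy = next_immediate_transition(buf=2, fast_busy=True, slow_busy=False)
```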

245. Problem #2 in HW#3. The places and transitions are the same for all three cases. Associate an enabling function with each of cpu_failure, cpu_repair, mem_failure and mem_repair. Associate a rate function with each of cpu_failure, cpu_repair, mem_failure and mem_repair. The three cases differ in how you define the enabling functions and the rate functions. You need to code and run 3 separate SPNP programs for the three cases.

246. Problem #3 in HW#3 (α = 2, so a token represents one-half slot). A new transition "T8" is added to account for subrating for low-priority clients. For the homework, in which only the middle partition exists, the enabling condition for "T8" is mark("RS")==0. For case study #2, in which all three partitions exist, the enabling condition for "T8" is mark("RL")==0 && mark("RS")==0. [Figure: arc labels include (α-1).] Reward assignment function for calculating the population of low-priority clients: reward_type population_low_priority(){return (mark("SLL") + mark("SL")/2);}

247. Problem #4 in HW#3. Five places: CPU, temp, disk1, disk2, and term, each holding tokens (jobs). Place "temp" is a temporary place for holding jobs departing from CPU. Initially 4 tokens (jobs) are in place "term". Four timed transitions: TCPU, Tdisk1, Tdisk2, and Tterm. Only Tterm has a variable rate; all others have fixed rates. Three immediate transitions: Tp0, Tp1, and Tp2, with probabilities 0.1, 0.667, and 0.233, respectively. Place "temp" is the input place to these three immediate transitions.
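The probabilistic branching out of "temp" can be checked with a short Python sketch. It only samples which immediate transition fires; where each transition sends the job is left to the model. The function name and the seed are arbitrary choices.

```python
import random

# Firing probabilities of the three immediate transitions fed by "temp".
BRANCHES = {"Tp0": 0.1, "Tp1": 0.667, "Tp2": 0.233}


def route_from_temp(rng):
    """Sample which immediate transition fires for one job in 'temp'."""
    names = list(BRANCHES)
    weights = [BRANCHES[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)            # fixed seed so the run is repeatable
counts = {name: 0 for name in BRANCHES}
for _ in range(10000):
    counts[route_from_temp(rng)] += 1
```

Over many samples the relative frequencies in `counts` should approach the three probabilities, which sum to 1.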