Rapid Parameterized Model Checking of Snoopy Cache Coherence Protocols
E. Allen Emerson and Vineet Kahlon
Department of Computer Sciences and Computer Engineering Research Center, The University of Texas, Austin TX 78712, USA

This work was supported in part by NSF grants CCR-009-8141 and CCR-020-5483, and SRC contract 2002-TJ-1026. The authors' email addresses are {emerson,kahlon}@cs.utexas.edu

Abstract. A new method is proposed for parameterized reasoning about snoopy cache coherence protocols. The method is distinctive for being exact (sound and complete), fully automatic (algorithmic), and tractably efficient. The states of most cache coherence protocols can be organized into a hierarchy reflecting how tightly a memory block in a given cache state is bound to the processor. A broad framework encompassing snoopy cache coherence protocols is proposed where the hierarchy implicit in the design of protocols is captured as a pre-order. This yields a new solution technique that hinges on the construction of an abstract history graph where a global concrete state is represented by an abstract state reflecting the occupied local states. The abstract graph also takes into account the history of local transitions of the protocol that were fired along the computation to get to the global state. This permits the abstract history graph to exactly capture the behaviour of systems with an arbitrary number of homogeneous processes. Although the worst case size of the abstract history graph can be exponential in the size of the transition diagram describing the protocol, the actual size of the abstract history graph is small for standard cache protocols. The method is applicable to all of the most common snoopy cache protocols described in Handy's book [19], from Illinois-MESI to Dragon. The experimental results for parameterized verification of each of those protocols document the efficiency of this new method in practice, with each protocol being verified in just a fraction of a second. It is emphasized that this is parameterized verification.

1 Introduction

Cache protocols provide a vital buffer between the ever growing performance of processors and lagging memory speeds, making them indispensable for applications such as shared memory multi-processors. Unfortunately, cache protocols are behaviorally complex. Ensuring their correct operation, in particular that they maintain the fundamental safety property of coherence so that different processes agree on their view of shared data items, can be subtle. The difficulty of the problem is often magnified as the number $n$ of coordinating caches increases. Moreover, it is highly desirable that a cache protocol be correct independent of the magnitude of $n$. There is thus great practical as well
as theoretical interest in uniform parameterized reasoning about systems comprised of $n$ homogeneous cache protocols so as to ensure correctness for systems of all sizes $n$. This general problem is known in the literature as the Parameterized Model Checking Problem (PMCP). It is, in general, algorithmically undecidable. Prior attempts to address the PMCP for cache protocols (cf. Section 5) have had a number of limitations, ranging from incompleteness to the need for considerable human intervention and ingenuity to potentially catastrophic inefficiency.

In this paper, we present a general method for solving the PMCP over snoopy cache coherence protocols of the sort commonly used in shared memory multiprocessors. Our framework includes all of the protocols in the book of Handy [19]. Our method is specialized to dealing with safety properties, as is appropriate for reasoning about coherence. We give a solution for this PMCP over our cache framework for safety that is distinguished by being exact (sound and complete), fully automatic (algorithmic), and having complexity bounds that are quite tractable. The worst case complexity of our general algorithm is single exponential time in the size of the state diagram of a single cache unit; however, our experimental results show that our algorithm performs very efficiently in practice. We have applied our method to verify parameterized versions of the MSI, MESI, MOESI, Illinois (MESI-type), Berkeley, N+1, Dragon, and Firefly cache coherence protocols.

In our framework, we model cache coherence protocols using a specialized variant of broadcast protocols [14] that we call pre-ordered broadcast protocols, where processes coordinate using broadcast primitives plus boolean guards. A broadcast transmission corresponds to a cache protocol putting a message on the bus; reception of such a message corresponds to snooping the bus and taking appropriate action. Boolean guards make it possible to model protocols (e.g., Illinois, Firefly, Dragon) that need to determine the presence or absence of the required memory block in other caches.

Our approach exploits a feature common to most snoopy cache coherence protocols [8]: their states can be organized into a hierarchy based on how tightly a memory block in a given state is bound to the processor. Consider, for example, the MSI cache coherence protocol (cf. Figure 1). A memory block in the modified state is intended to be used by at most one processor and can be written to by that processor locally without generating any memory transactions across the bus. So it is tightly bound to the processor. However, a block in the shared state can potentially be shared by multiple processes and cannot be modified locally. Hence it is less tightly bound to the processor. We make precise this notion of tightness by capturing it as a pre-order on the state set of an individual cache protocol. Intuitively, a state higher in the order is more tightly bound to the processor than a state that is comparably lower in the order. For instance, in the case of the MSI protocol, the pre-order is given by $I \prec S \prec M$.

A pre-order $\preceq$ on a finite set is a reflexive and transitive binary relation on that set. There are several associated relations. We say $s$ is equivalent to $t$, written $s \equiv t$, if $s \preceq t \wedge t \preceq s$; $s$ strictly precedes $t$, written $s \prec t$, if $s \preceq t \wedge \neg(t \preceq s)$; $s$ is incomparable to $t$ if $\neg(s \preceq t) \wedge \neg(t \preceq s)$.

Our technique involves the construction of an abstract history graph over nodes that pair a local state with a set of local states of the given cache protocol. The idea is the following: represent a global state of a system with $n$ caches
by a tuple consisting of a local state and a set of local states. Here the state component denotes the local state of the process executing the most recent transition, in the computation leading up to the global state, that flushes all the other processes into some unique fixed state. The set component denotes the maximal set of states of the protocol that could potentially be filled, given arbitrarily many processes, by firing (a stuttering of) the sequence of local transitions that were fired in the system with $n$ caches to get to the global state. The standard abstract graph construction used in, e.g., [25] just stores the set of local states occurring in a global state. The extra historical information in our new construction permits us to reason about an arbitrary number of caches in an exact fashion with respect to safety properties.

In the worst case, the size of the abstract graph may be exponential in the size of the state diagram of the given cache protocol. But in practice the abstract graph tends to be small, as documented by our empirical results. In our experiments, protocols with states had abstract graphs with abstract states, for small. We believe this may be a reflection of the tendency for broadcast transitions to drive recipients from a wider range of cache states to a narrower (lower in the pre-order) range of cache states, thereby reducing the number of degrees of freedom possible for abstract states. Finally, we discuss how our technique enables us to generate error traces once an error is detected.

The rest of the paper is organized as follows. We begin by introducing the system model in section 2. In section 3, we present a model checking algorithm for verifying parameterized safety properties based on the construction of the abstract history graph. Applications and experimental results are discussed in section 4, while comparison with related works and some concluding remarks are given in the final section 5.

2 Preliminaries

2.1 Motivating Example

We use as an example the simple MSI cache coherence protocol. The state transition diagram for the MSI protocol is shown in figure 1. The symbols M, S and I stand for the modified, shared and invalid states, respectively. The states are organized so that the closer the state is to the top, the more tightly is the memory block in that state bound to the processor. In our system model we capture this notion of tightness as a pre-order on the states of the cache protocol. The notation A/B means that if the controller observes the event A from the processor side of the bus then, in addition to the state change, it generates the bus transaction or action B. The null action is denoted by "-". Transitions due to observed bus transactions are shown as dashed arcs, while those due to local processor actions are shown in bold arcs.

The BusRd transaction is generated by a processor read (PrRd) request when the memory block is not in the cache. The newly loaded block is promoted, viz., moved up in the state diagram, from the invalid to the shared state in the requesting cache. If any other cache has the block in the modified state and it observes a BusRd transaction on the bus, then its copy is stale and so it demotes its copy to the shared state. We call such a transition a low-push broadcast. More generally, a broadcast transition is a low-push transition with respect to the pre-order if it forces every other process in a local state that is strictly higher in the pre-order than the transition's target state to a state that is at most as high as that target state. The read exclusive (BusRdX) transaction is generated by a PrWr to a block that is either not in the cache or is in the cache
but not in the modified state. The cache controller puts the address on the bus and asks for an exclusive copy that it intends to modify. All other caches are invalidated. Once the cache obtains the exclusive copy, the write can be performed in the cache. This is an example of a flush broadcast transition, that forces every process other than the one firing the transition and in its non-initial state into a unique fixed state defined by the transition.

Fig. 1. The MSI Cache Coherence Protocol and its template

The template for a protocol, such as MSI, is obtained from its state transition diagram through simple abstraction, treating the behavior of the processors as purely non-deterministic. The transformation is straightforward, syntactic, and mechanical: each transition generated by processor actions (represented by a bold line) and labeled by A/B, where B is not the null action, is labeled with the broadcast send label A!!, while every transition generated by bus actions (represented by dashed lines) and labeled with B/C is labeled with the matching broadcast receive label A??. In the original diagram the relationship between a broadcast send A/B and its corresponding receive B/C was established with the common symbol B, while in the template it is established by the common symbol A in the labels A!! and A??. Every bold transition labeled with A/- represents a local action and is therefore labeled with the local transition label. The natural pre-order on the states is $I \prec S \prec M$. All transitions labeled with PrRd!! are low-pushes with respect to this pre-order, while those labeled with PrWr!! are flushes.

2.2 The System Model: Pre-Ordered Broadcast Protocols

In this paper we consider families of systems of the form $U^n$ such that a pre-order $\preceq$ can be imposed on the states of the template $U$ such that each transition of $U$ is either a local transition, a flush broadcast, or a low-push broadcast with respect to $\preceq$. (There is usually a natural and visually obvious pre-order, but there may be more than one suitable pre-order; a suitable pre-order can be constructed as shown in section 3.4.) Furthermore,
the transition could also be labeled with the specialized disjunctive guard or the specialized conjunctive guard. We call such systems pre-ordered broadcasts.

The process template $U$ is formally defined by a 4-tuple in which the state set is a finite, non-empty set of states and the label set is a finite set of labels including the local transition label, broadcast send labels (of the form a!!) and receive labels (of the form a??). The local transition relation is such that each transition is either a local transition, a broadcast send, or a receive. We assume that receives are deterministic: for each label a!! appearing in some broadcast send and for each state of $U$ there is a unique corresponding receive transition on a?? out of that state. The guard labeling each transition of $U$ is either the boolean expression true, or the specialized conjunctive guard, or the specialized disjunctive guard. We assume that the guard is true for receive transitions. In practice, the above mentioned guards suffice in modeling cache coherence protocols, as each cache only needs to know whether another cache has the memory block it requires, expressed using the specialized disjunctive guard, or whether no other cache has it, expressed using the specialized conjunctive guard.

We further stipulate a pre-ordering $\preceq$ on the state set of $U$ such that the initial state is the minimum element, i.e., every local state is at least as high as the initial state, and such that each broadcast transition is of either of the two forms:

1. Flush. Given a state $t$ of $U$, a broadcast send transition is called a $t$-flush transition provided that the matching receive transition out of the initial state is a self-loop and, for each non-initial state of $U$, the matching receive transition leads to $t$. A flush transition is a $t$-flush for some $t$. Intuitively, a $t$-flush transition pushes every process in its non-initial state, other than the one firing the transition, into local state $t$.

2. Low-push. A broadcast send transition with target state $t$ is a low-push transition provided that, for each state $u$ such that $t \prec u$, there is a matching receive transition from $u$ to a state $v$ such that $v \preceq t$ and, for all other states, there is a matching self-loop receive transition. Intuitively, a transition is low-push if it pushes every process in a local state strictly higher than $t$ in the pre-order into a state at most as high as $t$, while leaving the rest of the processes untouched.

In practice, a natural pre-order is normally supplied along with the diagram of $U$, as it is drawn in appropriate levels. If not, section 3.4 gives an efficient algorithm ($O(|U|^2)$) to compute an appropriate pre-order if one exists.

To capture block replacement behavior we also require that templates be initializable. (Initializability is not needed for the mathematical results of section 3.1; however, it is needed for the results of section 3.2.) This means that from each state of the protocol, there is a local transition to the initial state. Such initializations model block replacement behavior where a cache is non-deterministically pushed into its invalid state, irrespective of the current state of the
block. For simplicity, re-initialization transitions and self-loop receptions are not drawn in state transition diagrams of cache protocols (cf. [8]).

Given the state transition diagram for $U$, the system $U^n$ with $n$ copies of $U$ is based on interleaving semantics in the standard way. A path of $U^n$ is a sequence of global states of $U^n$, starting at the initial state of $U^n$, in which every state leads to the next by a transition of $U^n$. For a global state of $U^n$ and a process index, we refer to the local state of that process in the global state, and for a computation path of $U^n$ we refer to the local computation path of a process, viz., the sequence of its local states along the path. We say that a finite computation path of $U^n$ ends in a global state when that state is its last. In this paper we will focus on finite paths and computations, as they suffice for safety. Finally, given a global state of $U^n$ and a local state of $U$, we count the number of copies of the local state in the global state, viz., the number of processes in that local state in the global state.

Safety Properties. Given a state $s$ of $U$, we say that $s$ is reachable if there exists $n$ such that there is a finite computation of $U^n$ leading to a global state with a process in local state $s$. For cache coherence protocols, we are typically interested in pairwise reachability, viz., given a pair of local states $s$ and $t$ of template $U$, deciding whether for some $n$ there exists a reachable global state of $U^n$ with a process in each of the local states $s$ and $t$. For instance, in the case of the MSI protocol, we are interested in showing that none of the pairs in the set $\{(M, M), (M, S)\}$ is pairwise reachable.

3.1 Systems without conjunctive guards

In this section, we assume that $U$ is a template without conjunctive guards; guards of the form true or the specialized disjunctive guard are permitted. This allows us to handle the MSI, MOESI, MESI (not the Illinois version, which is handled in the next section), Berkeley and N+1 protocols.

A standard technique for reasoning about parameterized systems involves the construction of an abstract graph to capture the behaviour of a system instance of arbitrary size. Classically, the abstract graph is defined to be a transition diagram over the sets of local states, with a given concrete global state of a system instance being mapped onto the set of local states occupied in it. A transition is introduced from one abstract state to another in the abstract graph if there exist $n$ and concrete states of $U^n$, mapping onto the two abstract states respectively, such that the second results from the first by firing a concrete transition of $U^n$. There is loss of information in the mapping, which is reflected in the fact that it might not be possible to identify a unique successor of an abstract state $A$ that results by firing a transition from $s$ to $t$ with $s$ in $A$. For instance, if the transition is a local transition, then two different successors are possible: $(A \setminus \{s\}) \cup \{t\}$ and $A \cup \{t\}$, depending, respectively, on whether there is exactly one or at least two copies of $s$ in the concrete state that maps onto $A$. To preserve soundness we cover for both cases and introduce both as possible successors. However, this may generate bogus paths in the abstract graph, viz., paths for which there do not exist matching concrete computations. Thus there might exist paths in the abstract graph that do not "lift" to concrete computations, and hence the above technique, though sound, is not complete.
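The ambiguity in the classical construction can be made concrete with a minimal Python sketch. The function name and the use of frozensets for abstract states are assumptions made for illustration, not notation from the paper.

```python
def classical_local_successors(abstract_state, src, dst):
    """Abstract successors of firing a local transition src -> dst.

    `abstract_state` is a frozenset of occupied local states (hypothetical encoding).
    """
    if src not in abstract_state:
        return set()  # not enabled: no process occupies src
    # Case 1: exactly one process was in src, so src becomes unoccupied.
    emptied = frozenset((abstract_state - {src}) | {dst})
    # Case 2: at least two processes were in src, so src stays occupied.
    kept = frozenset(abstract_state | {dst})
    return {emptied, kept}


# Every cache starts invalid; one of them fires a local move from I to S.
successors = classical_local_successors(frozenset({"I"}), "I", "S")
```

Both results must be kept to stay sound, which is how paths without a matching concrete computation can enter the classical abstract graph.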
In this paper, to check pairwise reachability, we use the abstract history graph of $U$, where we bypass the above problem by mapping each concrete state onto a tuple, made of a state and a set of states, that denotes a formal state with at least one copy of the state component and finite but arbitrarily many copies of each state in the set component. As we later show, this permits us to reason about safety properties in a sound and complete fashion.

Definition (representative). Given a template $U$ and a finite computation $x$ of $U^n$, we define rep($x$) to be a tuple of this form. Its two components are fixed by one of two cases: either no flush transition was fired along $x$, or some process was the last to fire a flush transition along $x$.

Given a template $U$, the abstract history graph of $U$ is a transition diagram defined over such tuples. For a finite computation $x$ we will show how to map $x$ onto a tuple of the graph. This mapping depends not only on the global state reached but also on $x$, viz., the history of the computation leading to that state, and thus the term abstract history graph. Essentially, in a tuple, the state component records the local state of the process executing the last flush along $x$, whereas the set component is a superset of the set of the local states of the remaining processes. This dichotomy is justified on the basis of the fact that we can pump up the multiplicity of each local state in the set component to any desired value, except possibly that of the current local state of the process to last execute a flush along $x$, which could have multiplicity exactly one, as we later show.

We now define the transition relation. Towards that end, given a tuple and a local or broadcast send transition, we define the successor of the tuple via the transition as either the state-successor, denoted by state-succ, or the set-successor, denoted by set-succ. As mentioned above, we think of a tuple as a state with finite but arbitrarily many copies of each state in its set component plus one copy of its state component. The case of the state-successor captures the scenario when the process in the state component, which possibly has multiplicity only one, fires the transition, while the case of the set-successor captures the scenario when a process in a local state with arbitrarily large multiplicity fires the enabled transition.

Definition (state-successor). Let a tuple be given, and let a transition out of its state component, labeled by a guard, be enabled in the tuple. Then state-succ is the tuple in which the state component moves along the transition and, if the transition is a local transition, the set component is unchanged, while if the transition is a broadcast send transition, each state of the set component is replaced by the target of its matching receive. As an example, firing the PrRd!! transition of the MSI protocol affects only processes in state M, by causing them to transit to state S.

Definition (set-successor). Let a tuple be given, and let a transition out of a state in its set component be such that, if it is labeled by a guard, then it is enabled in the tuple. Then set-succ is defined as the tuple given by one of three cases. One case applies if the transition is a flush transition.
Another case applies if the transition is a local transition. Note that since we had arbitrarily many copies of the firing state to start with, even after firing the local transition we are guaranteed arbitrarily many processes in that local state, which is therefore not excluded from the second component of the resulting tuple.

The remaining case applies if the transition is a low-push broadcast transition. Here the state component is replaced by the target of the (unique) matching receive from it, and the states of the set component are replaced by the targets of their matching receives. As in the previous case, since we have arbitrarily many copies of the firing state, we include in the set component the local state that results from firing the matching receive from the firing state, which by the definition of a low-push transition is that state itself. As an example, firing the PrWr!! transition of the MSI protocol flushes every other process into state I.

Fig. 2. The abstract history graph for the MSI Cache Coherence Protocol

We now formally define the abstract history graph of a template.

Definition (Abstract History Graph). Given a template $U$, the abstract history graph of $U$ is defined to be the tuple whose transition relation relates a tuple to its state-succ or set-succ for some local or broadcast send transition of $U$.

As an example, the abstract history graph for the MSI protocol is shown in figure 2. Self loops are omitted for the sake of simplicity. For convenience, we have labeled each transition of the graph by the label of the transition responsible for "firing" it. Note that, as opposed to the classical construction, given a tuple and a transition, both the set-successor and state-successor of the tuple via the transition are uniquely defined. This is because, as will be shown in proposition 3.3, we can have arbitrarily many copies of each state in the set component, thereby alleviating the problem of considering the different successors that may arise from concrete states with different counts of local states, as was the case with the classical abstract graph construction. This permits us to give exact path correspondences between the parameterized family of concrete systems and the abstract
history graph, as we now show. Since we are dealing with systems of "disjunctive" nature, having (arbitrarily many) extra copies does not disable any transitions. Given a finite computation $x$ of $U^n$, the precise mapping of $x$ onto a tuple of the abstract history graph is given by the -representative of $x$, denoted by -rep($x$).

Definition (-representative). Let $x$ be a finite computation path of $U^n$. Then we define the -representative of $x$, denoted by -rep($x$), as a tuple defined by cases. In the base case the tuple is fixed directly. Otherwise, suppose that the last transition of $x$ is initiated by a transition of $U$ fired locally by some process, and consider the process to last execute a flush transition in $x$; the tuple is then determined from these.

The tuple rep($x$) specifies the actual set of states present in the global state reached by following path $x$ through $U^n$. In contrast, the -representative -rep($x$) incorporates not only the local states present in that global state but also the states that could potentially be present, given sufficiently many processes, in a global state that results from firing (a stuttering of) the same local transitions as were fired along $x$. Thus, -rep($x$) drags along some "history" of the computation and thereby stores more information than rep($x$). This is formalized as follows.

Proposition 3.1 (Containment Property). Given a finite computation $x$, the set of states recorded by rep($x$) is contained in that recorded by -rep($x$).

We now establish a "path correspondence" between finite computations of $U^n$ and finite paths of the abstract history graph starting at its initial tuple.

Proposition 3.2 (Projection). For any finite path $x$ in $U^n$ there exists a finite path in the abstract history graph, starting at its initial tuple, that ends at -rep($x$).

For the other direction, we have

Proposition 3.3 (Lifting). Let $y$ be a path of the abstract history graph starting at its initial tuple and leading to a tuple. Then, given a number $k$, there exists a finite computation $x$ of $U^n$, for some $n$, such that rep($x$) agrees with that tuple and the final global state has at least $k$ copies of each state in the set component plus a copy of the state component.

Combining the previous three results, we have

Theorem 3.4 (Decidability Result). A pair of local states is pairwise reachable if there exists a path in the abstract history graph, starting at its initial tuple, to a tuple in which either both states lie in the set component, or one of them is the state component and the other lies in the set component.

Thus we have reduced the problem of pairwise reachability for a pair of local states of a given template $U$ to the problem of reachability in the abstract history graph constructed from $U$. Since the size of the abstract graph is at most exponential in the size of $U$, we have

Corollary 3.5. The pairwise reachability problem for a pair of local states of a given template $U$ can be solved in time exponential in $|U|$, where $|U|$ is the size of template $U$ as measured by the number of states and transitions in $U$.
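The reduction to graph reachability can be sketched in Python as follows. This is only an illustration under stated assumptions: a node is encoded as a pair (state, frozenset of states) that stands for one copy of its state component plus arbitrarily many copies of each state in its set component, the successor function is supplied by the caller, and the tiny graph at the end is hand-made rather than taken from the paper's figures. All names are hypothetical.

```python
from collections import deque


def covers_pair(node, a, b):
    """True if the tuple can host one process in a and another in b."""
    state, others = node
    # `others` may be replicated at will; `state` is only guaranteed one copy.
    return (
        (a in others and b in others)
        or (state == a and b in others)
        or (state == b and a in others)
    )


def pair_reachable(initial, successors, a, b):
    """Breadth-first search over the tuples reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        node = queue.popleft()
        if covers_pair(node, a, b):
            return True
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hand-made toy graph (hypothetical, not copied from the paper's figures).
n0 = ("I", frozenset({"I"}))
n1 = ("S", frozenset({"I", "S"}))
n2 = ("M", frozenset({"I"}))
TOY_GRAPH = {n0: [n1], n1: [n1, n2], n2: [n0]}


def toy_successors(node):
    return TOY_GRAPH[node]
```

Because only tuples reachable from the initial tuple are ever visited, the search touches far fewer nodes than the worst-case bound whenever the reachable part of the graph is small.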
Note that in the construction of the abstract history graph it suffices to consider only the set of tuples reachable from the initial tuple. In practice, the number of states of this graph may be much smaller than in the worst case scenario. This is illustrated clearly by our experimental results in section 4.2.

3.2 Adding the Specialized Conjunctive Guard

To reason about systems wherein the templates are augmented with the specialized conjunctive guard, along with the assumption of initializability, we use a modification of the abstract history graph. Broadly speaking, the intuition behind the modification is that we can make the specialized conjunctive guard of a process evaluate to true starting at any global state by driving all the other processes into their respective initial states by making use of the local initializing transition mentioned above. Thus for every tuple in the abstract history graph, we add a transition to a tuple that retains a single state, taken either from the state component or from the set component, and has all the remaining processes in their initial states.

Definition (Modified Abstract History Graph). Given a template $U$ and its abstract graph, define the modified abstract graph to be the tuple whose transition set consists of all transitions of the following kinds. First, transitions from a tuple to the tuple that retains one state, taken either from the state component or from the set component, and places every other process in the initial state. This transition corresponds to the successive firing of the local initializing transition that leaves one process in the retained state and the rest of the processes in their initial states, thereby enabling the conjunctive guard labeling its transitions. Second, transitions out of such a tuple labeled by the conjunctive guard. This corresponds to the firing of a transition labeled with the conjunctive guard. Third, transitions labeled either by the disjunctive guard or by true that lead to a state-succ or set-succ. This corresponds to the firing of transitions labeled with the disjunctive guard or true.

Then, as in section 3.1, we can show a "path correspondence" between concrete finite computations of $U^n$ and finite paths in the modified graph starting at its initial tuple. The proofs are similar and are therefore omitted. Thus, as in section 3.1, we have the following decidability result, from which it follows, as before, that for this model of computation pairwise reachability can be decided within a time bound in terms of $|U|$, the size of the template.

Theorem 3.6 (Decidability Result). A pair of local states is pairwise reachable if there exists a path in the modified abstract history graph, starting at its initial tuple, to a tuple in which either both states lie in the set component, or one of them is the state component and the other lies in the set component.

3.3 Generating Error Traces

A critical part of the verification process, once an error is detected, is the generation of a concrete computation of the system at hand leading to an erroneous global state.
Till now, we have shown how to reduce the verification process for safety properties of the parameterized version of a given cache protocol to reachability analysis over the corresponding abstract history graph. This only allows us to detect an erroneous state in the abstract history graph and thereby construct a path in the abstract graph to an erroneous state. To get back a concrete computation of an instance of the original system leading to a concrete erroneous state, we make use of the construction used in proving proposition 3.3. Given a path starting at the initial tuple and leading to an erroneous tuple of the abstract history graph, this construction can be used to give a fully automated procedure to construct a finite computation of a concrete system $U^n$, for some $n$, ending in a global state represented by the erroneous tuple. In general, $n$ is of size linear in the length of the abstract path in the worst case. But, as mentioned above, in practice the number of states of the abstract history graph reachable from its initial state tends to be small, and consequently so does the length of the path. The ability to automatically generate error traces distinguishes our work from [9], where no effective way to generate error traces was given.

Fig. 3. The template for the Broken MSI Protocol and its abstract history graph

We now illustrate the construction with a broken version of the MSI protocol (figure 3). The MSI protocol is clobbered by replacing the flush transition labeled with PrWr from the shared state to the modified state by a low-push transition labeled with PrWr. In the abstract history graph, self loops are omitted for simplicity and erroneous tuples are shaded. Note that an erroneous tuple can be reached by firing a transition labeled with PrRd followed by a transition labeled with PrWr. From this path we can get back a concrete computation of a system with caches by firing transitions labeled with PrRd, PrRd and PrWr in the order listed, a stuttering of the sequence PrRd, PrWr. In the resulting concrete computation, the symbol labeling each transition indicates which process fires which labeled transition of the template.
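The stuttered schedule above can be replayed concretely with a short Python sketch. The send and receive tables are a hypothetical reading of the broken protocol (PrWr from the shared state no longer flushes other caches), the choice of two caches is an assumption, and none of the names come from the paper.

```python
# Hypothetical encoding of the broken MSI template: the sender's move for each
# broadcast label, and the deterministic reaction of every other process.
SEND = {("I", "PrRd"): "S", ("S", "PrWr"): "M"}
RECEIVE = {
    "PrRd": {"I": "I", "S": "S", "M": "S"},  # low-push: M is demoted to S
    "PrWr": {"I": "I", "S": "S", "M": "M"},  # broken: nobody is flushed
}


def fire(global_state, process, label):
    """Fire a broadcast send by `process`; all others take the matching receive."""
    sender_next = SEND[(global_state[process], label)]
    return tuple(
        sender_next if j == process else RECEIVE[label][local]
        for j, local in enumerate(global_state)
    )


def replay(initial, schedule):
    """Return the sequence of global states visited by a schedule of sends."""
    trace = [initial]
    for process, label in schedule:
        trace.append(fire(trace[-1], process, label))
    return trace


# PrRd, PrRd, PrWr: a stuttering of the abstract label sequence PrRd, PrWr.
trace = replay(("I", "I"), [(0, "PrRd"), (1, "PrRd"), (0, "PrWr")])
```

Under these assumptions the replay ends with one cache in the modified state while the other still holds a shared copy, which is the kind of incoherent global state an error trace should exhibit.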
3.4 Automatic Construction of Pre-order

In practice, one can usually obtain the natural pre-order by drawing the diagram in levels, reflecting how tightly a memory block in a given cache state is bound to the processor. Such levels are used in the textbook by Culler et al. [8]. If not, we can efficiently exhibit a feasible pre-order that can be imposed, or determine that none exists. We proceed by constructing a labeled, directed graph […] over the states of the template, where […] is its edge set. The label on an edge records the order relation that must hold between the two states the edge joins. We construct the graph as follows.

1. Initially, […]. This is because of the assumption we made in the system model that for each […] we have […].
2. For each non-local transition or non-flush broadcast send transition […], we have […]. Thus we augment the graph by adding the edge […]. Furthermore, if […] is a matching receive for […] such that […], then we have that […], and so we add the edges […] and […] to the graph. On the other hand, if […] is a matching receive for […], then we have that […], and so we add the edge […] to the graph. If the graph already contains an edge of the form […], then in case we add the edge […] in the above step, we remove […] to ensure that there is only one edge from […] to […] labeled with […] or […].

Let […] be the subgraph of the graph that we get by deleting all edges labeled with […]. Then we can impose a pre-order on the states of the template compatible with its transitions if (1) there does not exist a cycle in […] containing an edge labeled with […], and (2) for each edge […] of […] there do not exist two distinct maximal strongly connected components of […], one containing state […] and the other one containing state […], such that there is a path from […] to […] in […]. Since the maximal strongly connected components of […] can be constructed in time linear in the size of […], viz., linear in […], the above mentioned conditions (1) and (2) can be checked in time quadratic in the size of […]. Thus we can decide in O([…]) time whether the desired pre-order can be imposed on […] or not.

4 Applications

As applications, we consider model checking parameterized versions of all of the snoopy based cache protocols presented in [19]. The translation from the state transition diagram of a given protocol to its template is straightforward and syntactic and can be performed in the same mechanical fashion as was done for the MSI protocol in Section 2.1: Firing a bold transition labeled with […] and/or one that requires that no other cache currently possesses the desired memory block does not affect the status of the memory block in any other cache. Such a transition is therefore labeled with the local transition label […] and in the second case also guarded with the […]. Otherwise, a transition labeled by […] where […] is labeled with the broadcast send label […].

Flush broadcast send transitions can be identified syntactically, as all their matching receives from every non-initial state transit to a unique state, with the matching receive from […] self-looping on itself. Local transitions can be identified by the absence of matching receives.
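The feasibility question of Section 3.4 (can a pre-order be imposed, or does none exist?) can be illustrated with a simplified sketch. It assumes only two kinds of constraint, weak ("u is below or equivalent to v") and strict ("u is strictly below v"), which may not coincide with the edge labels used in Section 3.4. Under that assumption a compatible pre-order exists exactly when no strict edge has both endpoints in the same strongly connected component. The function names are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm. Returns a dict mapping node -> component id."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:  # all successors explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish order.
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        todo = [root]
        while todo:
            node = todo.pop()
            for prev in rev[node]:
                if prev not in comp:
                    comp[prev] = root
                    todo.append(prev)
    return comp


def preorder_is_feasible(states, weak, strict):
    """Assumed constraint model: (u, v) in weak means u <= v,
    (u, v) in strict means u < v. Feasible iff no strict edge
    lies inside a single strongly connected component."""
    comp = strongly_connected_components(states, list(weak) + list(strict))
    return all(comp[u] != comp[v] for u, v in strict)


if __name__ == "__main__":
    states = ["I", "S", "E", "M"]
    weak = [("I", "S"), ("S", "E"), ("E", "M"), ("M", "E")]
    print(preorder_is_feasible(states, weak, [("I", "S")]))
    print(preorder_is_feasible(states, weak, [("I", "S"), ("M", "E")]))
```

Both passes are linear in the size of the graph, and the final check is a single scan over the strict edges.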
Fig. 4. The Illinois MESI Cache Coherence Protocol and its template

Every transition generated by bus actions (represented by dashed lines) and labeled with […] is labeled with the matching broadcast receive label […]. If firing the transition additionally requires some other

cache to possess the desired memory block, then it is also guarded by […]. Below we consider only the Illinois MESI protocol in detail, with some others being handled in the full report [10].

4.1 The Illinois MESI Cache Coherence Protocol

The transition diagram and the template for the Illinois MESI cache coherence protocol are shown in figure 4. Formally, the template is defined as a tuple […] whose set of states is {I, S, E, M}, with the pre-order ordering the states as I, S, E, M. The set of transition labels consists of the local transition label together with PrRd!!, PrRd??, PrWr!! and PrWr??. The transitions are as defined below.

Empty Broadcasts (Local Transitions): M → I, E → I, S → I, I → E, S → S, E → E, M → M, each carrying the local transition label.
Low-push sends: (I, PrRd!!, S)
Low-push receives: (M, PrRd??, S), (E, PrRd??, S)
Flush sends: (I, PrWr!!, M), (S, PrWr!!, M)
Flush receives: (M, PrWr??, I), (E, PrWr??, I), (S, PrWr??, I)

Note that the first three transitions are included because of the assumption of initializability and are, for simplicity, not shown in the figure, nor are broadcast receive transitions that are self loops.

The transition (I, PrRd!!, S) and the local transition from I to E are labeled with the guards […] and […], respectively, with the rest of the transitions being labeled with the true guard. We need to decide whether the following pairs are pairwise reachable: (M, M), (M, E), (M, S), (E, E), (E, S).
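As a sanity check on the template above, the following sketch enumerates the reachable global states for a small, fixed number of caches and looks for the five pairs. This is a brute-force exploration for fixed n, not the abstract history graph construction, so it says nothing about the parameterized problem. It assumes the guard semantics described in Section 4: the send (I, PrRd!!, S) requires some other cache to possess the block, and the local transition from I to E requires that no other cache possesses it. It also assumes that a cache with no listed receive self-loops. All names are hypothetical.

```python
I, S, E, M = "I", "S", "E", "M"

# Local transitions (source, target), as listed in the template.
LOCAL = [(M, I), (E, I), (S, I), (I, E), (S, S), (E, E), (M, M)]
# Broadcast sends (source, action, target).
SENDS = [(I, "PrRd", S), (I, "PrWr", M), (S, "PrWr", M)]
# Matching receives; any (action, state) not listed is a self-loop.
RECEIVES = {("PrRd", M): S, ("PrRd", E): S,
            ("PrWr", M): I, ("PrWr", E): I, ("PrWr", S): I}
# Pairs of local states that must never be occupied together.
BAD_PAIRS = [(M, M), (M, E), (M, S), (E, E), (E, S)]


def others_hold_block(state, k):
    """True if some cache other than cache k is in a non-I state."""
    return any(s != I for j, s in enumerate(state) if j != k)


def successors(state):
    for k, s in enumerate(state):
        for src, dst in LOCAL:
            if src != s:
                continue
            # Assumed guard: I -> E only when no other cache holds the block.
            if (src, dst) == (I, E) and others_hold_block(state, k):
                continue
            yield state[:k] + (dst,) + state[k + 1:]
        for src, action, dst in SENDS:
            if src != s:
                continue
            # Assumed guard: (I, PrRd!!, S) only when some other cache holds it.
            if (src, action) == (I, "PrRd") and not others_hold_block(state, k):
                continue
            yield tuple(dst if j == k else RECEIVES.get((action, t), t)
                        for j, t in enumerate(state))


def reachable(n):
    """All global states reachable from the all-I state with n caches."""
    start = (I,) * n
    seen, stack = {start}, [start]
    while stack:
        for nxt in successors(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def violates(state):
    for a, b in BAD_PAIRS:
        if a == b:
            if state.count(a) >= 2:
                return True
        elif a in state and b in state:
            return True
    return False


if __name__ == "__main__":
    for n in (2, 3, 4):
        bad = [s for s in reachable(n) if violates(s)]
        print(n, len(reachable(n)), bad)
```

For each n tried, the list of violating states is empty, which is consistent with the protocol being coherent for those instance sizes only.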
4.2 Experimental Results

Here we summarize the results for a wide range of examples of cache coherence protocols. For detailed descriptions of these protocols refer to [19]. The column under "# of Abstract States" refers to the number of reachable states in the abstract history graph for protocols that don't use conjunctive guards, viz., MSI, MESI, MOESI, Berkeley and N+1; and in the modified abstract history graph for ones that use conjunctive guards, viz., Illinois-MESI, Firefly and Dragon. It is worth noting that although in the worst case the number of reachable abstract states in the modified abstract history graph corresponding to the template […] could be as large as […], in practice it typically turns out to be much smaller. For instance, in the MESI protocol, the number of reachable abstract states was 6, against a worst case

possibility of […] states. A similar scenario holds for the other protocols. Thus, in conclusion, the abstract history graph construction seems to work well in practice.

The experiments were carried out on a machine with a 797MHz Intel Pentium III processor and 256 MB RAM. Below we tabulate the results for a variety of cache coherence protocols. The user time for verifying each of the cache coherence protocols was less than […] seconds.

Protocol | Pre-Order | # of Abstract States
MSI | Invalid, Shared, Modified | […]
MESI | Invalid, Shared, Exclusive, Modified | […]
Illinois | Invalid, Shared, Exclusive, Modified | […]
MOESI | Invalid, Owned, Shared, Exclusive, Modified | […]
N+1 | Invalid, Valid, Dirty | […]
Berkeley | Invalid, Owned Non-exclusively, Unowned; Unowned, Owned Exclusively | […]
Firefly | Invalid, Shared, Dirty, Valid Exclusive | […]
Dragon | Invalid, Shared Clean, Shared Modified, Exclusive; Exclusive, Modified | […]

5 Concluding Remarks

The generally undecidable PMCP has received a good deal of attention in the literature. A number of interesting proposals have been put forth, and successfully applied to certain examples ([7, 6, 26, 20, 2, 3, 27, 21]). Most of these works, however, suffer from the drawbacks of

being either only partially automated or being sound but not guaranteed complete. Much human ingenuity may be required to develop, e.g., network invariants; the method may not terminate; the complexity may be intractably high; and the underlying abstraction may only be conservative, rather than exact. (For frameworks that handle specialized application domains, however, decision procedures can be given that are both sound and complete, fully automatic, and in some cases efficient [13, 15, 11, 12, 5, 24].) Similar limitations apply to prior work on PMCP for cache protocols. Pong and Dubois [25] described methods that were sound but not complete, as they were based on conservative, inexact abstractions. In [14], a general framework of parameterized broadcast protocols was introduced and it was shown how certain simple cache protocols could be modeled. That framework, however, did not admit guarded transitions, necessary to model many cache protocols such as Illinois (MESI). In [16], it was shown that PMCP for safety over the broadcast protocols of [14] is decidable using the general backward reachability procedure of [1]. However,

the backward reachability algorithm of [1] that [16] makes use of, although general, suffers from the handicap that its running time is not known to be primitive recursive [23]. In [22], Maidl, using a proof tree based construction, shows decidability of the PMCP for a broad class of systems including broadcast protocols, but again the decision procedure is not known to be primitive recursive. Moreover, [22, 16, 14] do not report experimental results for cache protocols. More recently, Delzanno [9] uses arithmetical constraints to model global states of systems with many identical caches. This method uses invariant checking via the backward reachability analysis of [1] and provides a broad framework for reasoning about cache coherence protocols, but the procedure does not terminate on some examples. Furthermore, this technique does not provide a way to generate error traces when a bug is detected. In [17], it was shown that for a subclass of broadcast protocols called entropic broadcast protocols, a generalization of the Karp-Miller procedure for Petri nets terminates. While mathematically elegant, the model does not allow for boolean guards necessary for modeling protocols like Illinois-MESI, Firefly and Dragon. Also, no explicit bounds were provided on the size of the resulting coverability tree (cf. [23]).

In this paper we have exploited the hierarchical organization inherent in the design of snoopy cache protocols, representing and generalizing this organization using pre-orders. We then present a specialized variant of the broadcast protocols model called pre-ordered protocols, tailored to capture snoopy cache coherence protocols. This has allowed us to provide a unified, fully automated and efficient method to reason about

parameterized snoopy cache coherence protocols. Our method is unique in meeting all these important criteria: (a) it is sound and complete; (b) it is algorithmic; (c) it is rapid, meaning reasonably efficient in principle: worst case complexity is single exponential; (d) it has broad modeling power: it handles all examples from Handy's book; (e) it is rapid, also meaning demonstrably efficient in experimental practice: each example protocol was verified for parameterized correctness in a fraction of a second; and (f) it caters for error trace recovery.

References

1. P. Abdulla, K. Cerans, B. Jonsson, Y.-K. Tsay. General Decidability Theorems for Infinite State Systems. LICS 1996.
2. P. Abdulla, A. Bouajjani, B. Jonsson and M. Nilsson. Handling global conditions in parameterized systems verification. CAV 1999.
3. P. Abdulla and B. Jonsson. On the existence of network invariants for verifying parameterized systems. In Correct System Design: Recent Insights and Advances, LNCS 1710, pp. 180-197, 1999.
4. K. Apt and D. Kozen. Limits for automatic verification of finite-state concurrent systems. Information Processing Letters 15, pages 307-309, 1986.
5. T. Arons, A. Pnueli, S. Ruah, J. Xu and L. Zuck. Parameterized Verification with Automatically Computed Inductive Assertions. CAV 2001, LNCS 2102, 2001.
6. M.C. Browne, E.M. Clarke and O. Grumberg. Reasoning about Networks with Many Identical Finite State Processes. Information and Control 81(1), pages 13-31, April 1989.
7. E.M. Clarke, O. Grumberg and S. Jha. Verifying Parameterized Networks using Abstraction and Regular Languages. CONCUR. LNCS 962, pages 395-407, Springer-Verlag, 1995.
8. D. E. Culler and J. P. Singh. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers, 1998.
9. G. Delzanno. Automatic Verification of Parameterized Cache Coherence Protocols. CAV 2000, 51-68.
10. E.A. Emerson and V. Kahlon. This paper, full version. Available at http://www.cs.utexas.edu/users/ emerson,ka hlon /taca s03/
11. E.A. Emerson and V. Kahlon. Reducing Model Checking of the Many to the Few. CADE-17. LNCS, Springer-Verlag, 2000.
12. E.A. Emerson and V. Kahlon. Model Checking Large-Scale and Parameterized Resource Allocation Systems. TACAS, 2002.
13. E.A. Emerson and K.S. Namjoshi. Reasoning about Rings. POPL. pages 85-94, 1995.
14. E.A. Emerson and K.S. Namjoshi. On Model Checking for Non-Deterministic Infinite-State Systems. LICS 1998.
15. E.A. Emerson and K.S. Namjoshi. Automatic Verification of Parameterized Synchronous Systems. CAV, LNCS, Springer-Verlag, 1996.
16. J. Esparza, A. Finkel and R. Mayr. On the Verification of Broadcast Protocols. LICS 1999.
17. A. Finkel and J. Leroux. A finite covering tree for analyzing entropic broadcast protocols. Proc. VCL 2000. Report DSSE-TR-2000-6, Univ. Southampton, GB.
18. S.M. German and A.P. Sistla. Reasoning about Systems with Many Processes. JACM 39(3), July 1992.
19. J. Handy. The Cache Memory Book. Academic Press, 1993.
20. R.P. Kurshan and K.L. McMillan. A Structural Induction Theorem for Processes. PODC. pages 239-247, 1989.
21. D. Lesens, N. Halbwachs and P. Raymond. Automatic Verification of Parameterized Linear Networks of Processes. POPL 1997. pp. 346-357, 1997.
Parallel Coordination Programs I. Acta Informatica 21, 1984.
22. M. Maidl. A Unifying Model Checking Approach for Safety Properties of Parameterized Systems. CAV 2001.
23. K. McAloon. Petri Nets and Large Finite Sets. Theoretical Computer Science 32, pp. 173-183, 1984.
24. A. Pnueli, S. Ruah and L. Zuck. Automatic Deductive Verification with Invisible Invariants. TACAS 2001, LNCS, 2001.
25. F. Pong and M. Dubois. A New Approach for the Verification of Cache Coherence Protocols. IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 8, August 1995.
26. A.P. Sistla. Parameterized Verification of Linear Networks Using Automata as Invariants. CAV 1997.
27. P. Wolper and V. Lovinfosse. Verifying Properties of Large Sets of Processes with Network Invariants. In J. Sifakis (ed.), Automatic Verification Methods for Finite State Systems, Springer-Verlag, LNCS 407, 1989.