# On Derivative Estimation of the Mean Time to Failure in Simulations of Highly Reliable Markovian Systems

Marvin K. Nakayama
Department of Computer and Information Science, New Jersey Institute of Technology, Newark, NJ 07102

## Abstract

The mean time to failure (MTTF) of a Markovian system can be expressed as a ratio of two expectations. For highly reliable Markovian systems, the resulting ratio formula consists of one expectation that cannot be estimated with bounded relative error when using standard simulation, while the other, which we call a non-rare expectation, can be estimated with bounded relative error. We show that some derivatives of the non-rare expectation cannot be estimated with bounded relative error when using standard simulation, which in turn may lead to an estimator of the derivative of the MTTF that has unbounded relative error. However, if particular importance-sampling methods (e.g., balanced failure biasing) are used, then the estimator of the derivative of the non-rare expectation will have bounded relative error, which (under certain conditions) will yield an estimator of the derivative of the MTTF with bounded relative error.

*Subject classifications:* Probability, stochastic model applications: highly dependable systems. Simulation: statistical analysis of derivative estimators. Simulation, efficiency: importance sampling.
## 1 Introduction

The mean time to failure (MTTF) of a highly reliable Markovian system satisfies a ratio formula $\alpha = \xi/\gamma$, where $\xi$ and $\gamma$ are expectations of random quantities defined over regenerative cycles; e.g., see Goyal et al. (1992). Shahabuddin (1994) showed that $\xi$ can be estimated with bounded relative error when using standard simulation (i.e., no importance sampling), and so we call $\xi$ a non-rare expectation. (A simulation estimator has bounded relative error if the ratio of the standard deviation to the mean remains bounded as the failure rates of the components vanish. In practice, this means that one can obtain good estimates of the mean independently of how rarely the system fails.) He also proved that the standard-simulation estimator of $\gamma$ has unbounded relative error, and so we call $\gamma$ a rare expectation; however, if certain importance-sampling methods such as balanced failure biasing (see Shahabuddin 1994 and Goyal et al. 1992) are used to estimate $\gamma$, the corresponding estimator has bounded relative error (see Nakayama 1996 for generalizations, and Hammersley and Handscomb 1964 and Glynn and Iglehart 1989 for details on importance sampling). Moreover, Shahabuddin (1994) showed that if both the numerator and denominator are estimated with bounded relative error, the resulting estimator of $\alpha$ has bounded relative error. In this paper, we consider estimators of derivatives of $\alpha$ with respect to the failure rate of any component type, obtained using the likelihood-ratio derivative method (e.g., see Glynn 1990).
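To make the bounded-relative-error notion concrete, here is a small illustration of our own (not from the paper): it computes the theoretical relative error of the standard Monte Carlo estimator of a rare-event probability $p = \epsilon$ and shows it growing as $\epsilon$ shrinks. The Bernoulli toy model and all parameter values are invented for the example.

```python
import random

def standard_mc(eps, n=100_000, seed=0):
    """Standard simulation of the rare-event probability p = eps.

    Returns the sample-mean estimate and the theoretical relative error
    (standard deviation over mean) of that estimator, which scales like
    1/sqrt(n * eps) and is therefore unbounded as eps -> 0.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < eps)
    estimate = hits / n
    # Per-sample variance of the Bernoulli(eps) indicator is eps * (1 - eps).
    rel_err = (eps * (1 - eps) / n) ** 0.5 / eps
    return estimate, rel_err

for eps in (1e-1, 1e-3, 1e-5):
    print(eps, standard_mc(eps))
```

Importance sampling replaces the sampling measure so that the event is no longer rare and reweights each sample by the likelihood ratio, which is what keeps the relative error bounded in the schemes discussed below.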

Letting $\partial_i$ denote the derivative operator with respect to $\lambda_i$, the failure rate of component type $i$, we have that
$$\partial_i \alpha = \frac{\gamma\,\partial_i\xi - \xi\,\partial_i\gamma}{\gamma^2}. \tag{1}$$
Since the standard-simulation estimator of $\gamma$ does not have bounded relative error whereas the one for $\xi$ does, the previous research on derivative estimation in reliability systems focused on $\partial_i\gamma$. Nakayama (1995) showed that for any component type $i$, the standard-simulation estimator of $\partial_i\gamma$ has unbounded relative error and the balanced-failure-biasing estimator of $\partial_i\gamma$ has bounded relative error; see Nakayama (1996) for generalizations. However, we now prove that when standard simulation is applied, the estimator of $\partial_i\xi$ may not have bounded relative error, even though the estimator of $\xi$ always does. We show by example that this can result in an estimator of $\partial_i\alpha$ that has unbounded relative error, even if $\xi$, $\gamma$, and $\partial_i\gamma$ are estimated (mutually independently) with bounded relative error (i.e., $\xi$ is estimated using standard simulation and $\gamma$ and $\partial_i\gamma$ are estimated with, for example, balanced failure biasing). Hence, we apply particular importance-sampling schemes (e.g., balanced failure biasing) to obtain an estimator of $\partial_i\xi$ having bounded relative error, which then results (under certain conditions) in an estimator of $\partial_i\alpha$ having bounded relative error.

The rest of the paper is organized as follows. Section 2 contains a description of the mathematical model. In Section 3 we first review the basics of derivative estimation and importance sampling. Then we present results on the asymptotics of $\partial_i\xi$, which are subsequently used in an example showing that applying standard simulation to estimate $\partial_i\xi$ can result in an estimator of $\partial_i\alpha$ having unbounded relative error. Section 3 concludes with our theorem on the bounded relative error of an estimator of $\partial_i\alpha$. The proofs are collected in the appendix. For closely related empirical results on the estimation of $\partial_i\alpha$, see Nakayama,

Goyal, and Glynn (1994). (In that paper all four terms in (1) are estimated using the same simulation runs with importance sampling, whereas in our analysis here, we do not apply importance sampling to estimate $\xi$ and the four quantities are estimated independently.)

## 2 Model

Shahabuddin (1994) developed a model of a highly reliable Markovian system to analyze some performance-measure estimators, and Nakayama (1995) later modified it to study derivative estimators. We will work with the latter model, which we now describe briefly. The system consists of $K < \infty$ different types of

components, and there are $n_i$ components of type $i$, with $i = 1, \ldots, K$. As time evolves, the components fail at random times and are repaired by some repairpersons. We model the evolution of the system as a continuous-time Markov chain (CTMC) $Y = (Y(t) : t \ge 0)$ on a finite state space $S$. We decompose $S$ as $S = U \cup F$, where $U$ (resp., $F$) is the set of operational (resp., failed) states. We assume that the system starts in state 0, the state with all components operational, and that $0 \in U$. Also, we assume that the system is coherent; i.e., if $x \in F$ and $y \in S$ with $N_i(y) \le N_i(x)$ for all $i = 1, \ldots, K$, then $y \in F$, where $N_i(x)$ denotes the number of components of type $i$ operational in state $x$.

The lifetimes of each component of type $i$ are exponentially distributed with rate $\lambda_i > 0$, and we let $\lambda = (\lambda_1, \ldots, \lambda_K)$. We will examine derivatives with respect to the $\lambda_i$ for different component types $i$. We let $p(x, i; y)$ be the probability that the next state visited is $y$ when the current state is $x$ and a component of type $i$ fails. This general form of the state transitions allows for component failure propagation (i.e., the failure of one component causes others to fail simultaneously with some probability). We denote a failure transition $(x, y)$ (i.e., a transition of $Y$ corresponding to the failure of some component(s)) by "$x \to_{\mathrm{f}} y$." A repair transition $(x, y)$ (i.e., a transition of $Y$ corresponding to the repair of some component(s)), which we denote by "$x \to_{\mathrm{r}} y$," occurs at exponential rate $\mu(x, y) > 0$. A single transition $(x, y)$ cannot consist of some components failing and others completing repair. Let $Q(\lambda) = (q(\lambda, x, y) : x, y \in S)$ be the generator matrix of $Y$. Let $P\{\cdot\}$, $E[\cdot]$, and $\mathrm{Var}[\cdot]$ be the probability measure and expectation and variance operators, respectively, induced by the $Q$-matrix. The total transition rate out of state $x$ is
$$q(\lambda, x) = -q(\lambda, x, x) = \sum_{i=1}^{K} N_i(x)\,\lambda_i + \sum_{y} \mu(x, y). \tag{2}$$

Let $X = (X_n : n = 0, 1, \ldots)$ be the embedded discrete-time Markov chain (DTMC) of the CTMC $Y$. Let $P(\lambda) = (P(\lambda, x, y) : x, y \in S)$ denote the transition probability matrix of $X$. Then define $\Gamma = \{(x, y) : P(\lambda, x, y) > 0\}$, which is the set of possible transitions of the DTMC and is independent of the parameter setting $\lambda$. As in Shahabuddin (1994) and Nakayama (1995), the failure rate of each component type $i$ is parameterized as $\lambda_i = \lambda_i(\varepsilon) = a_i \varepsilon^{b_i}$, where $b_i \ge 1$ is an integer, $a_i > 0$, and $\varepsilon > 0$. We define $\bar b = \min_i b_i$. All other parameters in the model (including the repair rates) are independent of $\varepsilon$. In the literature on highly reliable systems, the behavior of the system is examined as $\varepsilon \to 0$. In the following, we will sometimes parameterize quantities by $\varepsilon$ rather than $\lambda$ to emphasize when limits are being considered. For some constant $d$, a function $f$ is said to be $o(\varepsilon^d)$ if $f(\varepsilon)/\varepsilon^d \to 0$ as $\varepsilon \to 0$. Similarly, $f(\varepsilon) = O(\varepsilon^d)$ if $|f(\varepsilon)| \le c_1 \varepsilon^d$ for some constant $c_1 > 0$ for all $\varepsilon$ sufficiently small. Also, $f(\varepsilon) = \Omega(\varepsilon^d)$ if $|f(\varepsilon)| \ge c_2 \varepsilon^d$ for some constant $c_2 > 0$ for all $\varepsilon$ sufficiently small. Finally, $f(\varepsilon) = \Theta(\varepsilon^d)$ if $f(\varepsilon) = O(\varepsilon^d)$ and $f(\varepsilon) = \Omega(\varepsilon^d)$.

We use the following assumptions from Shahabuddin (1994) and Nakayama (1995):

**A1** The CTMC $Y$ is irreducible over $S$.

**A2** For each state $x$ with $x \neq 0$, there exists a state $y$ (which depends on $x$) such that $(x, y) \in \Gamma$ and $x \to_{\mathrm{r}} y$.

**A3** For all states $y$ such

that $(0, y) \in \Gamma$ and $y \in F$, $P(\varepsilon, 0, y) = o\big(\max_{z} P(\varepsilon, 0, z)\big)$.

**A4** If $p(x, i; y) > 0$ and $p(x, j; y) > 0$, then $b_i = b_j$.

**A5** If there exists a component type $i$ such that $b_i = \bar b$ and $p(0, i; y) > 0$ for some $y$, then there exists another component type $j \neq i$ such that $b_j = \bar b$ and $p(0, j; z) > 0$ for some $z$.

Assumption A3 ensures that if the fully operational system can reach a failed state in one transition, then the probability of this transition is much smaller than the largest transition probability from state 0. Nakayama (1995) introduced A4 and A5 as technical assumptions to simplify the analysis of certain derivative estimators.
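To illustrate the parameterization $\lambda_i(\varepsilon) = a_i\varepsilon^{b_i}$ and the rate formula (2), the sketch below (our own illustration, not code from the paper) builds the generator of a toy CTMC tracking the number of failed components for a single component type and extracts the embedded DTMC. The two-component example and unit repair rate are assumptions made purely for the example.

```python
def build_generator(eps, a=1.0, b=1, n=2, mu=1.0):
    """Generator matrix Q(eps) of a toy birth-death CTMC whose state is the
    number of failed components: n identical components, each failing at
    rate lam = a * eps**b, with a single repairperson of rate mu.
    (The paper's model is more general: K types, failure propagation, etc.)"""
    lam = a * eps ** b
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        if x < n:                  # failure transition x -> x + 1
            Q[x][x + 1] = (n - x) * lam
        if x > 0:                  # repair transition x -> x - 1
            Q[x][x - 1] = mu
        Q[x][x] = -sum(Q[x])       # so that q(eps, x) = -Q[x][x], cf. (2)
    return Q

def embedded_dtmc(Q):
    """Transition matrix of the embedded DTMC: P(x, y) = q(x, y) / q(x)."""
    P = []
    for x, row in enumerate(Q):
        q_x = -row[x]
        P.append([row[y] / q_x if y != x else 0.0 for y in range(len(row))])
    return P
```

For small $\varepsilon$, the failure transition out of a partially failed state has probability $\Theta(\varepsilon^{b})$ in the embedded chain, which is the kind of rarity that the assumptions above quantify.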
## 3 Derivative Estimators of the MTTF

Our goal is to analyze estimators of derivatives of the MTTF $\alpha = E[G \mid Y(0) = 0]$, with $G = \inf\{s > 0 : Y(s) \in F\}$. Goyal et al. (1992) showed that $\alpha = \xi/\gamma$, with
$$\xi = E\!\left[\sum_{n=0}^{T-1}\frac{1}{q(\lambda, X_n)}\right] \quad\text{and}\quad \gamma = E[1\{T_F < T_0\}],$$
where for some set of states $A$, $T_A = \inf\{n > 0 : X_n \in A\}$ (with $T_0 = T_{\{0\}}$), $T = \min(T_0, T_F)$, and $1\{\cdot\}$ denotes the indicator function of the event $\{\cdot\}$. To simplify notation, let $H = \sum_{n=0}^{T-1} 1/q(\lambda, X_n)$ and $I = 1\{T_F < T_0\}$, so that $\xi = E[H]$ and $\gamma = E[I]$. Also, observe that since $X_0 = 0$ with probability 1, $H = 1/q(\lambda, 0) + \tilde H$ with probability 1 and $\xi = 1/q(\lambda, 0) + E[\tilde H]$, where $\tilde H = \sum_{n=1}^{T-1} 1/q(\lambda, X_n)$. Now recall our expression for the derivative of $\alpha$ given in (1). Shahabuddin (1994) analyzed estimators of $\xi$ and $\gamma$, and

Nakayama (1995) studied estimators of $\partial_i\gamma$. Thus, to complete the analysis of $\partial_i\alpha$, we now will study $\partial_i\xi$. Using the likelihood ratio method for computing derivatives, we obtain $\partial_i\xi = -n_i/q(\lambda,0)^2 + E[\tilde H'_i + \tilde H S_i]$, where $\partial_i\big(1/q(\lambda,0)\big) = -\partial_i q(\lambda,0)/q(\lambda,0)^2 = -n_i/q(\lambda,0)^2$ by (2), and
$$\tilde H'_i + \tilde H S_i = -\sum_{n=1}^{T-1}\frac{\partial_i q(\lambda, X_n)}{q(\lambda, X_n)^2} + \left(\sum_{n=1}^{T-1}\frac{1}{q(\lambda, X_n)}\right) S_i \tag{3}$$
with $S_i = \sum_{n=0}^{T-1} \partial_i P(\lambda, X_n, X_{n+1})/P(\lambda, X_n, X_{n+1})$ and $\partial_i P(\lambda, x, y) = \partial P(\lambda, x, y)/\partial\lambda_i$. (For further details on the likelihood ratio method for estimating derivatives, see Glynn 1990.)

We now briefly review the basics of standard simulation and importance sampling. Consider a random variable $Z$ having a density $f$, and suppose we want to estimate $\mu = E_f[Z]$, where $E_f$ is the expectation operator under $f$. We apply standard simulation by collecting i.i.d. samples $Z_j$, $j = 1, \ldots, n$, of $Z$ generated using density $f$ and constructing the estimator $(1/n)\sum_{j=1}^{n} Z_j$. Let $g$ be a density such that $f$ is absolutely continuous with respect to $g$ (i.e., $g(z) = 0$ implies $f(z) = 0$ for $z \in \Re$), and let $E_g$ denote its expectation operator. Define $L(z) = f(z)/g(z)$ to be the likelihood ratio evaluated at the point $z \in \Re$, and define the random variable $L = L(Z)$. Then $\mu = E_g[ZL]$. We implement importance sampling by collecting i.i.d. samples $(Z_j, L_j)$, $j = 1, \ldots, n$, of $(Z, L)$ generated using the density $g$, and constructing the estimator $(1/n)\sum_{j=1}^{n} Z_j L_j$. Properly choosing the new density $g$ can result in a (substantial) variance reduction. As we shall soon see, importance sampling can be generalized beyond the realm of single random variables having a density to complex stochastic systems; for more details on importance sampling, see Glynn and Iglehart (1989).

We now show how these ideas apply in the estimation of $\partial_i\xi$. We use standard simulation to estimate $\partial_i\xi$ by collecting i.i.d. observations $\tilde H'_{ij} + \tilde H_j S_{ij}$, $j = 1, \ldots, n$, of $\tilde H'_i + \tilde H S_i$. Each sample is generated by using the transition matrix $P(\lambda)$ to simulate the DTMC $X$ starting in state 0 until

the set $\{0\} \cup F$ is hit. Then our standard-simulation estimator of $\partial_i\xi$ is
$$-\frac{n_i}{q(\varepsilon,0)^2} + \frac{1}{n}\sum_{j=1}^{n}\big(\tilde H'_{ij} + \tilde H_j S_{ij}\big).$$
Turning now to importance sampling, consider a distribution $P'$ (which may depend on the particular failure and repair rates) that is absolutely continuous with respect to $P$. We define
$$L(x_0, \ldots, x_m) = \frac{P\{(X_0, \ldots, X_T) = (x_0, \ldots, x_m)\}}{P'\{(X_0, \ldots, X_T) = (x_0, \ldots, x_m)\}}$$
to be the likelihood ratio evaluated at the sample path $(x_0, \ldots, x_m)$, and let $L = L(X_0, \ldots, X_T)$. Then $\partial_i\xi = -n_i/q(\varepsilon,0)^2 + E'[(\tilde H'_i + \tilde H S_i)L]$, where $E'$ is the expectation operator under $P'$. We assume

**A6** The distribution $P'$ is Markovian with transition matrix $P'(\varepsilon) = (P'(\varepsilon, x, y) : x, y \in S)$ such that $P'(\varepsilon, x, y) = 0$ implies that $P(\varepsilon, x, y) = 0$ (i.e., $P$ is absolutely continuous with respect to $P'$). Thus, $L(x_0, \ldots, x_m) = \prod_{n=0}^{m-1} P(\varepsilon, x_n, x_{n+1})/P'(\varepsilon, x_n, x_{n+1})$. Also, for all $(x, y) \in \Gamma$, $P'(\varepsilon, x, y) = \Theta(1)$ as $\varepsilon \to 0$.

We apply importance sampling by collecting i.i.d. samples $(\tilde H_j, \tilde H'_{ij}, S_{ij}, L_j)$, $j = 1, \ldots, n$, of $(\tilde H, \tilde H'_i, S_i, L)$ generated by simulating the DTMC $X$ using the transition matrix $P'(\varepsilon)$. Then
$$-\frac{n_i}{q(\varepsilon,0)^2} + \frac{1}{n}\sum_{j=1}^{n}\big(\tilde H'_{ij} + \tilde H_j S_{ij}\big)L_j$$
is the importance-sampling estimator.

Balanced failure biasing is an importance-sampling method that satisfies Assumption A6. The basic idea of the technique is as follows. From any state having both failure and repair transitions, we increase (resp., decrease) the total probability of a failure (resp., repair) transition to $\rho$ (resp., $1 - \rho$), where $0 < \rho < 1$ is independent of $\varepsilon$. We allocate the $\rho$ equally to the individual failure transitions from the state. The $1 - \rho$ is allotted to the individual repair transitions in proportion to their original transition probabilities. From state 0, we change the transition probability of any possible (failure) transition to $1/m_0$, where $m_0$ is the number of failure transitions possible from state 0. See Shahabuddin (1994) and Goyal et al. (1992) for more details.
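The balanced-failure-biasing change of measure just described can be sketched as follows; this is our own illustrative implementation (the explicit failure/repair labeling of transitions and the choice $\rho = 1/2$ are assumptions of the example, not prescriptions from the paper).

```python
def balanced_failure_biasing(P, is_failure, rho=0.5):
    """Return the balanced-failure-biasing matrix P' built from DTMC matrix P.

    P          : list of lists, original embedded-DTMC transition probabilities
    is_failure : is_failure[x][y] is True if (x, y) is a failure transition
    rho        : total probability given to failure transitions, independent
                 of eps (0 < rho < 1)

    From a state with both kinds of transitions, the failure transitions
    share probability rho equally, and the repairs share 1 - rho in
    proportion to their original probabilities. From a failures-only state
    (e.g., the all-up state 0), each failure gets 1/m0.
    """
    m = len(P)
    P_new = [[0.0] * m for _ in range(m)]
    for x in range(m):
        fails = [y for y in range(m) if P[x][y] > 0 and is_failure[x][y]]
        repairs = [y for y in range(m) if P[x][y] > 0 and not is_failure[x][y]]
        if fails and repairs:
            for y in fails:
                P_new[x][y] = rho / len(fails)
            rep_total = sum(P[x][y] for y in repairs)
            for y in repairs:
                P_new[x][y] = (1 - rho) * P[x][y] / rep_total
        elif fails:                  # failures only: equal allocation 1/m0
            for y in fails:
                P_new[x][y] = 1.0 / len(fails)
        else:                        # repairs only: leave the row unchanged
            for y in repairs:
                P_new[x][y] = P[x][y]
    return P_new
```

From a state whose original total failure probability is $\Theta(\varepsilon)$, the new matrix assigns the failure transitions total probability $\rho = \Theta(1)$, so the likelihood ratio picks up a factor $\Theta(\varepsilon)$ on each failure transition; this is what drives the bounded-relative-error results.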

As previously noted, we need estimators of $\xi$, $\partial_i\xi$, $\gamma$, and $\partial_i\gamma$ in (1) to estimate $\partial_i\alpha$. In a manner analogous to how the estimator of $\partial_i\xi$ was developed earlier, we can construct $\hat\xi = 1/q(\varepsilon,0) + (1/n)\sum_{j=1}^{n} \tilde H_j$ and $\hat\gamma = (1/n)\sum_{j=1}^{n} I_j$, which are the standard-simulation estimators of $\xi$ and $\gamma$, respectively. Shahabuddin (1994) showed that $\xi = \Theta(\varepsilon^{-\bar b})$, and that $\mathrm{Var}[H] = O(1)$, where $\mathrm{Var}$ represents the variance operator under the measure $P$. Thus, the relative error of $\hat\xi$, defined as $\mathrm{RE}(\hat\xi) = \mathrm{Var}[\hat\xi]^{1/2}/\xi$, remains bounded (and actually vanishes) as $\varepsilon \to 0$. On the other hand, Shahabuddin proved that $\gamma = \Theta(\varepsilon^{r})$ for some constant $r \ge 1$ which depends on the model and that $\mathrm{Var}[I] = \gamma - \gamma^2 = \Theta(\varepsilon^{r})$, so $\mathrm{RE}(\hat\gamma) = (\mathrm{Var}[I]/n)^{1/2}/\gamma \to \infty$ as $\varepsilon \to 0$. Also, $\mathrm{Var}'[IL] = \Theta(\varepsilon^{2r})$, where $\mathrm{Var}'$ is the variance operator under a probability measure $P'$ satisfying Assumption A6. Hence, the corresponding importance-sampling estimator of $\gamma$ under the measure $P'$ has bounded relative error. Nakayama (1995, 1996) proved that $\partial_i\gamma = \Theta(\varepsilon^{s_i})$, where $s_i = \min(v_i, w_i)$ and $v_i$ and $w_i$ are constants depending on the model. It was also shown that $\mathrm{Var}[IS_i] = \Theta(\varepsilon^{u_i})$, where $u_i$ is another model-dependent constant with $u_i < 2s_i$, and that $\mathrm{Var}'[IS_iL] = \Theta(\varepsilon^{2s_i})$ under any measure satisfying Assumption A6. Therefore, the standard-simulation estimator of $\partial_i\gamma$ has unbounded relative error, whereas the importance-sampling estimator has bounded relative error. In this paper, we analyze various estimators of

$\partial_i\xi$ and $\partial_i\alpha$. We start with the analysis of $\partial_i\xi$. The following result shows that standard simulation is not always an efficient way of estimating $\partial_i\xi$, and so importance sampling needs to be applied.

**Lemma 1** *Consider any system as described in Section 2. Then,*

*(i)* $\partial_i\xi = -n_i/q(\varepsilon,0)^2 + E[\tilde H'_i + \tilde H S_i]$, *where* $n_i/q(\varepsilon,0)^2 = \Theta(\varepsilon^{-2\bar b})$ *and* $E[\tilde H'_i + \tilde H S_i] = o(\varepsilon^{-2\bar b})$; *hence,* $\partial_i\xi = \Theta(\varepsilon^{-2\bar b})$.

*(ii)* *When applying standard simulation,* $\mathrm{Var}[\tilde H'_i + \tilde H S_i] = \Theta(\varepsilon^{-(b_i + \bar b)})$. *Thus, when standard simulation is used,* $\mathrm{RE}$ *remains bounded as* $\varepsilon \to 0$ *if and only if* $b_i \le 3\bar b$.

*(iii)* *If an importance-sampling distribution satisfying Assumption A6 is applied, then* $\mathrm{Var}'[-n_i/q(\varepsilon,0)^2 + (\tilde H'_i + \tilde H S_i)L] = o(\varepsilon^{-4\bar b})$ *as* $\varepsilon \to 0$.

*(iv)* *For any component type* $i$, *the relative error of the importance-sampling estimator of* $\partial_i\xi$ *satisfies* $\mathrm{RE} \to 0$ *as* $\varepsilon \to 0$.

Now let us turn to the estimation of $\partial_i\alpha$, which is what we are really interested in. For our estimator of $\partial_i\alpha$, we assume that $\xi$, $\partial_i\xi$, $\gamma$, and $\partial_i\gamma$ are estimated mutually independently. Suppose we have four (importance-sampling) probability measures $P_1$, $P_2$, $P_3$, and $P_4$ having corresponding expectation operators $E_1$, $E_2$, $E_3$, and $E_4$. Any of these probability measures may be the original probability measure $P$. Also, suppose we have random variables $V_1$, $V_2$, $V_3$, and $V_4$ having measures $P_1$, $P_2$, $P_3$, and $P_4$, respectively, for which $E_1[V_1] = \xi$, $E_2[V_2] = \partial_i\xi$, $E_3[V_3] = \gamma$, and $E_4[V_4] = \partial_i\gamma$. We assume there exist constants $c_1 \neq 0$, $c_2 \neq 0$, $c_3 \neq 0$, $c_4 \neq 0$ and exponents $g_1$, $g_2$, $g_3$, $g_4$ that are independent of $\varepsilon$ such that $\xi = c_1\varepsilon^{g_1} + o(\varepsilon^{g_1})$, $\partial_i\xi = c_2\varepsilon^{g_2} + o(\varepsilon^{g_2})$, $\gamma = c_3\varepsilon^{g_3} + o(\varepsilon^{g_3})$, and $\partial_i\gamma = c_4\varepsilon^{g_4} + o(\varepsilon^{g_4})$. As we saw in Lemma 1, $g_2 = -2\bar b$. Shahabuddin (1994) showed that $g_3 = r \ge 1$ and $g_1 = -\bar b$, and Nakayama (1995) established that $g_4 = s_i$. We construct an estimator of $\partial_i\alpha$ as follows. Collect $n_1$ (resp., $n_2$, $n_3$, and $n_4$) i.i.d. samples of $V_1$ (resp., $V_2$, $V_3$, and $V_4$) generated using measure $P_1$ (resp., $P_2$, $P_3$, and $P_4$), where the observations of $V_1$, $V_2$, $V_3$, and $V_4$ are mutually independent. This yields $\{V_{1j} : j = 1, \ldots, n_1\}$;
Page 8
$\{V_{2j} : j = 1, \ldots, n_2\}$; $\{V_{3j} : j = 1, \ldots, n_3\}$; and $\{V_{4j} : j = 1, \ldots, n_4\}$. Let $\bar V_1 = \sum_{j=1}^{n_1} V_{1j}/n_1$, and similarly define $\bar V_2$, $\bar V_3$, and $\bar V_4$. Then our estimator of $\partial_i\alpha$ is $(\bar V_3 \bar V_2 - \bar V_1 \bar V_4)/\bar V_3^2$. This approach is known as "measure-specific importance sampling"; see Goyal et al. (1992).
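The measure-specific estimator just defined is a plug-in of the four independent sample means into the quotient rule (1). A minimal sketch of the combination step (our own code; the numeric batches in the usage below are placeholders, and any likelihood ratios are assumed to be already folded into the samples):

```python
def derivative_estimator(v1, v2, v3, v4):
    """Measure-specific estimator of the MTTF derivative, cf. (1):

        (gamma * dxi - xi * dgamma) / gamma**2

    v1, v2, v3, v4 are mutually independent batches of samples whose means
    estimate xi, dxi (the derivative of xi), gamma, and dgamma, respectively.
    """
    def mean(v):
        return sum(v) / len(v)
    xi, dxi, gamma, dgamma = mean(v1), mean(v2), mean(v3), mean(v4)
    return (gamma * dxi - xi * dgamma) / gamma ** 2
```

For example, with batch means $\bar V_1 = 2$, $\bar V_2 = 3$, $\bar V_3 = 0.5$, and $\bar V_4 = 0.1$, the estimator evaluates to $(0.5 \cdot 3 - 2 \cdot 0.1)/0.5^2 = 5.2$.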

Now let $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$, and $\sigma_4^2$ be the variances of $V_1$, $V_2$, $V_3$, and $V_4$, respectively, under their corresponding measures. Nakayama, Goyal, and Glynn (1994) showed that the asymptotic variance of the resulting estimator of $\partial_i\alpha$ is
$$\sigma^2 = \left(\frac{\partial_i\gamma}{\gamma^2}\right)^{\!2}\sigma_1^2 + \frac{1}{\gamma^2}\,\sigma_2^2 + \left(\frac{2\xi\,\partial_i\gamma - \gamma\,\partial_i\xi}{\gamma^3}\right)^{\!2}\sigma_3^2 + \frac{\xi^2}{\gamma^4}\,\sigma_4^2. \tag{4}$$
There are no covariance terms since $V_1$, $V_2$, $V_3$, and $V_4$ are mutually independent. Now let us examine a particular estimator of $\partial_i\alpha$ for a specific model.

**Example 1:** Consider a system with three types of components, where all three component types have a redundancy of 1 (i.e., $n_i = 1$). The first two component types have failure rate $\varepsilon$ (i.e., $b_1 = b_2 = 1$), and components of the third type have failure rate $\varepsilon^4$ (i.e., $b_3 = 4$); therefore, $\bar b = 1$. Failed components are fixed by a single repairperson at rate 1 using a processor-sharing discipline, and the system is operational if and only if at least two components (of any types) are operational. We consider derivatives with respect to $\lambda_3$. It can be shown that $\xi = 1/(2\varepsilon) + O(1)$, $\gamma = \varepsilon + o(\varepsilon)$, $\partial_3\xi = -1/(4\varepsilon^2) + o(\varepsilon^{-2})$, $\partial_3\gamma = 3/2 + o(1)$, and $\partial_3\alpha = -\varepsilon^{-3} + o(\varepsilon^{-3})$. Now suppose that we use measure-specific importance sampling in which $\gamma$ and $\partial_3\gamma$ are estimated using a probability measure $P'$ satisfying Assumption A6 and $\xi$ and $\partial_3\xi$ are estimated using standard simulation; i.e., $V_3 = IL$ and $V_4 = IS_3L$, and $V_1 = H$ and $V_2 = -n_3/q(\varepsilon,0)^2 + \tilde H'_3 + \tilde H S_3$. We can show that $\sigma_1^2 = \Theta(1)$, $\sigma_2^2 = \Theta(\varepsilon^{-5})$, $\sigma_3^2 = \Theta(\varepsilon^{2})$, and $\sigma_4^2 = \Theta(1)$. Hence, the estimators of $\xi$, $\gamma$, and $\partial_3\gamma$ have bounded relative error, but the estimator of $\partial_3\xi$ does not (consistent with Lemma 1(ii), since $b_3 = 4 > 3\bar b$). Moreover, the variance of the resulting estimator of $\partial_3\alpha$ in (4) is $\Theta(\varepsilon^{-7})$, and so the estimator of $\partial_3\alpha$ has unbounded relative error. Thus, if the estimator of $\partial_i\xi$ has unbounded relative error, then the resulting estimator of $\partial_i\alpha$ may also, even though the other terms in (1) are estimated with bounded relative error. On the other hand, we have the following theorem, which shows that under certain conditions, if each of the four terms is estimated with bounded relative error, then the corresponding

estimator of $\partial_i\alpha$ has bounded relative error.

**Theorem 1** *Consider any system as described in Section 2. Using the notation above, assume that* $c_2c_3 - c_1c_4 \neq 0$ *whenever* $g_2 + g_3 = g_1 + g_4$. *Consider any importance-sampling distribution* $P'$ *satisfying Assumption A6, and let* $E'$ *and* $\mathrm{Var}'$ *denote the corresponding expectation and variance operators. Let* $V_1 = HL$, $V_2 = -n_i/q(\varepsilon,0)^2 + (\tilde H'_i + \tilde H S_i)L$, $V_3 = IL$, *and* $V_4 = IS_iL$, *and assume that* $V_1$, $V_2$, $V_3$, *and* $V_4$ *are mutually independent. Then,* $\sigma/\partial_i\alpha$ *remains bounded as* $\varepsilon \to 0$.
**Remarks:** (i) In Example 1, $c_2 = -1/4$, $c_3 = 1$, $c_1 = 1/2$, $c_4 = 3/2$, and $g_2 = -2$, $g_3 = 1$, $g_1 = -1$, $g_4 = 0$. Thus, $g_2 + g_3 = g_1 + g_4$ and $c_2c_3 - c_1c_4 = -1 \neq 0$, and so the estimator of $\partial_3\alpha$ based on Theorem 1 will have bounded relative error.

(ii) We can show that when the technical condition of Theorem 1 is not satisfied, the resulting derivative corresponds to a small sensitivity (i.e., there are other sensitivities that are asymptotically at least an order of magnitude larger). (The sensitivity of the MTTF with respect to $\lambda_i$ is defined to be $\lambda_i\,\partial_i\alpha$, which measures the effect of relative changes in the parameter value on the overall MTTF.) Therefore, even when we may not be able to estimate the derivative with bounded relative error, it is typically not that important, since there are derivatives with respect to other failure rates that have a much larger impact on the MTTF.

(iii) Theorem 1 easily generalizes to an estimator of the derivative of any ratio formula, not only for the MTTF. Specifically, suppose $(\bar V_3\bar V_2 - \bar V_1\bar V_4)/\bar V_3^2$ is any derivative estimator (not just for $\partial_i\alpha$) with $\bar V_1$, $\bar V_2$, $\bar V_3$, and $\bar V_4$ mutually independent and each having bounded relative error under their respective probability measures. Then, $(\bar V_3\bar V_2 - \bar V_1\bar V_4)/\bar V_3^2$ has bounded relative error if the technical condition of Theorem 1 holds. In particular, the steady-state unavailability satisfies a ratio formula consisting of a rare expectation and a non-rare expectation, and so we can similarly analyze this performance measure.

## 4 Appendix

We will

prove Lemma 1 by using some order-of-magnitude and bounding arguments as in Nakayama (1995). To do this, we define $\Omega$ to be the set of state sequences that start in state 0 and end in either 0 or $F$; i.e.,
$$\Omega = \{(x_0, \ldots, x_n) : n \ge 1,\ x_0 = 0,\ x_n \in \{0\} \cup F,\ x_k \notin \{0\} \cup F \ \text{for}\ 1 \le k \le n-1\}.$$
For $d \ge 0$, let
$$\Omega_d = \{(x_0, \ldots, x_n) \in \Omega : P\{(X_0, \ldots, X_T) = (x_0, \ldots, x_n)\} = \Theta(\varepsilon^d)\},$$
the set of state sequences in $\Omega$ that have probability of order $\varepsilon^d$ (under the original measure). For each component type $i$, let $\tau_i = \inf\{k > 0 : p(X_{k-1}, i; X_k) > 0\}$, which is the index of the first failure transition of the DTMC that may have been triggered by a failure of a component of type $i$. We use the notation $H(x_0, \ldots, x_n)$ to denote the random variable $H$ evaluated at the sample path $(x_0, \ldots, x_n) \in \Omega$. (We do the same for the random variables $\tilde H$, $\tilde H'_i$, $S_i$, and $\tau_i$.) Also, for each component type $i$, we define
$$\Lambda_i = \{(x_0, \ldots, x_n) \in \Omega : \tau_i(x_0, \ldots, x_n) \le T(x_0, \ldots, x_n)\}$$
and
$$\bar\Lambda_i = \{(x_0, \ldots, x_n) \in \Omega : \tau_i(x_0, \ldots, x_n) > T(x_0, \ldots, x_n)\}.$$
Furthermore, let $\Lambda_{i,d} = \Lambda_i \cap \Omega_d$ and $\bar\Lambda_{i,d} = \bar\Lambda_i \cap \Omega_d$, and note that $\Lambda_i = \cup_{d \ge 0}\Lambda_{i,d}$ and $\bar\Lambda_i = \cup_{d \ge 0}\bar\Lambda_{i,d}$. Let $N = \sum_{i=1}^{K} n_i$, which is the total number of components in the system.

**Lemma 2** *Under the assumptions of Lemma 1, the following hold:*

*(i)* $\Omega_0 \neq \emptyset$, *and* $n \le (d+1)(N+1)$ *for all* $(x_0, \ldots, x_n) \in \Omega_d$.

*(ii)* *For any path* $(x_0, \ldots, x_n) \in \Omega_d$, $\delta_1\varepsilon^d \le P\{(X_0, \ldots, X_T) = (x_0, \ldots, x_n)\} \le \delta_2\varepsilon^d$ *for all* $\varepsilon > 0$ *sufficiently small, where* $\delta_1 > 0$ *and* $\delta_2 > 0$ *are constants which are independent of* $(x_0, \ldots, x_n)$, $d$, *and* $\varepsilon > 0$.

*(iii)* $\Lambda_{i,d} = \emptyset$ *for* $d < b_i - \bar b$. *For all* $(x_0, \ldots, x_n) \in \Lambda_{i,d}$, $S_i(x_0, \ldots, x_n) = O(\varepsilon^{-b_i})$; *more precisely,* $|S_i(x_0, \ldots, x_n)| \le (n+1)\phi/\varepsilon^{b_i}$, *where* $\phi$ *is some constant which is independent of* $(x_0, \ldots, x_n)$, $d$, *and* $\varepsilon > 0$.

*(iv)* *For all* $(x_0, \ldots, x_n) \in \bar\Lambda_{i,d}$, $S_i(x_0, \ldots, x_n) = O(\varepsilon^{-\bar b})$, *and* $|S_i(x_0, \ldots, x_n)| \le (n+1)\phi/\varepsilon^{\bar b}$.

We can prove Lemma 2(i) by constructing a path $(0, x_1, \ldots, x_m)$ whose first transition is a failure transition with transition probability $\Theta(1)$ and whose remaining transitions are repair transitions (there may be more than one due to failure propagation). The proof is straightforward and is omitted. Also, the proofs of the other parts are not included, since they can be shown with arguments like those used to establish part (i) and Theorems 2 and 3 of Nakayama (1995). Parts (iii)–(iv) of Lemma 2 imply that $\Lambda_i = \cup_{d \ge b_i - \bar b}\Lambda_{i,d}$ and $\bar\Lambda_i = \cup_{d \ge 0}\bar\Lambda_{i,d}$.

**Proof of Lemma 1.** First we prove part (i). Note that
$$\partial_i\xi = -\frac{n_i}{q(\varepsilon,0)^2} + E[\tilde H'_i] + E[\tilde H S_i], \tag{5}$$
where we recall the definition of $\tilde H'_i + \tilde H S_i$ in (3). Observe that
$$\frac{n_i}{q(\varepsilon,0)^2} = \Theta(\varepsilon^{-2\bar b}) \tag{6}$$
since $q(\varepsilon,0) = \Theta(\varepsilon^{\bar b})$ by (2) and the definition of $\bar b$. We now analyze

the second term on the right-hand side of (5). First, define $\bar\kappa(\varepsilon) = \max\{1/q(\varepsilon, x) : x \in S,\ x \neq 0\}$, which is $\Theta(1)$ by Assumption A2 since every state $x \neq 0$ has a repair transition whose rate does not depend on $\varepsilon$. Thus, there exists some constant $0 < \kappa < \infty$ such that $\bar\kappa(\varepsilon) \le \kappa$ for all $\varepsilon$ sufficiently small. As a consequence, $\sum_{n=1}^{T-1} 1/q(\varepsilon, X_n)^2 \le \kappa^2(T-1)$ for all sufficiently small $\varepsilon > 0$, and since $|\partial_i q(\varepsilon, x)| = N_i(x) \le n_i$, we obtain $|E[\tilde H'_i]| \le n_i\kappa^2 E[T-1]$ for all sufficiently small $\varepsilon > 0$. Using Lemma 2 and bounding arguments similar to those used in the proof of Theorem 1 of Nakayama (1995), we can show that $E[T] = \Theta(1)$ for all sufficiently small $\varepsilon > 0$, which implies that
$$E[\tilde H'_i] = O(1). \tag{7}$$

We now analyze the third term on the right-hand side of (5). Note that
$$E[\tilde H S_i] = E[\tilde H S_i;\ \tau_i \le T] + E[\tilde H S_i;\ \tau_i > T], \tag{8}$$
where $E[V; A] = E[V\,1\{A\}]$ for some set of events $A$. We analyze the two terms on the right-hand side of (8) separately. For the first term, observe that $\{\tau_i \le T\} = \{(X_0, \ldots, X_T) \in \Lambda_i\}$, and
$$E[\tilde H S_i;\ \tau_i \le T] = \sum_{d \ge b_i - \bar b}\ \sum_{(x_0, \ldots, x_n) \in \Lambda_{i,d}} \tilde H(x_0, \ldots, x_n)\, S_i(x_0, \ldots, x_n)\, P\{(X_0, \ldots, X_T) = (x_0, \ldots, x_n)\}.$$
To analyze this expression, consider any path $(x_0, \ldots, x_n) \in \Lambda_{i,d}$ with $n > 0$. Define $\bar\upsilon(\varepsilon) = \max\{1/q(\varepsilon, x) : x \in S,\ x \neq 0\}$, which is $\Theta(1)$ by Assumption A2. Thus, there exists some constant $0 < \upsilon < \infty$ such that $\bar\upsilon(\varepsilon) \le \upsilon$ for all $\varepsilon > 0$ sufficiently small. It then follows that for any path $(x_0, \ldots, x_n) \in \Lambda_{i,d}$,
$$\tilde H(x_0, \ldots, x_n) = \sum_{k=1}^{n-1}\frac{1}{q(\varepsilon, x_k)} \le \upsilon\,(d+1)(N+1)$$
for all sufficiently small $\varepsilon > 0$ by Lemma 2(i), which implies that $\tilde H(x_0, \ldots, x_n) = O(1)$ for any path $(x_0, \ldots, x_n) \in \Lambda_{i,d}$. Lemma 2(iii) states that $S_i(x_0, \ldots, x_n) = O(\varepsilon^{-b_i})$ for each $(x_0, \ldots, x_n) \in \Lambda_{i,d}$, and $P\{(X_0, \ldots, X_T) = (x_0, \ldots, x_n)\} = \Theta(\varepsilon^d)$ for each $(x_0, \ldots, x_n) \in \Omega_d$. Hence, since the number of sample paths in $\Lambda_{i,d}$ is finite by Lemma 2(i), the inner sum over $\Lambda_{i,d}$ is $O(\varepsilon^{d - b_i})$. Also, using bounding arguments similar to those used in the proof of Theorem 1 of Nakayama (1995), we can use Lemma 2(ii)–(iv) to control the sum over $d \ge b_i - \bar b$ and show that $E[\tilde H S_i;\ \tau_i \le T] = O(\varepsilon^{-\bar b})$. We can similarly apply Lemma 2(iii)–(iv) to prove that the second term in (8) satisfies $E[\tilde H S_i;\ \tau_i > T] = O(\varepsilon^{-\bar b})$, and so $E[\tilde H S_i] = O(\varepsilon^{-\bar b}) = o(\varepsilon^{-2\bar b})$. This, along with (6) and (7), establishes part (i).
To prove part (ii), first note that $\mathrm{Var}[\tilde H'_i + \tilde H S_i] = E[(\tilde H'_i + \tilde H S_i)^2] - (E[\tilde H'_i + \tilde H S_i])^2$ and that $E[(\tilde H'_i + \tilde H S_i)^2] = E[(\tilde H'_i)^2] + E[(\tilde H S_i)^2] + 2E[\tilde H'_i\,\tilde H S_i]$. Following the same line of reasoning used to establish part (i), we can prove that $E[(\tilde H'_i)^2] = O(1)$ and $E[\tilde H'_i\,\tilde H S_i] = o(\varepsilon^{-(b_i + \bar b)})$. Thus, we need to show that $E[(\tilde H S_i)^2] = \Theta(\varepsilon^{-(b_i + \bar b)})$, which we do by first noting that $E[(\tilde H S_i)^2] = E[(\tilde H S_i)^2;\ \tau_i \le T] + E[(\tilde H S_i)^2;\ \tau_i > T]$. Then by following arguments like those used to prove part (i), we can show that $E[(\tilde H S_i)^2;\ \tau_i \le T] = \Theta(\varepsilon^{-(b_i + \bar b)})$ and $E[(\tilde H S_i)^2;\ \tau_i > T] = \Theta(\varepsilon^{-2\bar b}) = O(\varepsilon^{-(b_i + \bar b)})$ since $b_i \ge \bar b$. Therefore, $E[(\tilde H S_i)^2] = \Theta(\varepsilon^{-(b_i + \bar b)})$, and $\mathrm{Var}[\tilde H'_i + \tilde H S_i] = \Theta(\varepsilon^{-(b_i + \bar b)})$. Combining this with part (i) establishes part (ii). We omit the proofs of parts (iii) and (iv), since they can be established using the techniques above and in the proof of Theorem 9 of Nakayama (1995).

**Proof of Theorem 1.** We assumed that $\xi = \Theta(\varepsilon^{g_1})$, $\partial_i\xi = \Theta(\varepsilon^{g_2})$, $\gamma = \Theta(\varepsilon^{g_3})$, and $\partial_i\gamma = \Theta(\varepsilon^{g_4})$. Then,
$$\partial_i\alpha = \frac{\Theta(\varepsilon^{g_2})\,\Theta(\varepsilon^{g_3}) - \Theta(\varepsilon^{g_1})\,\Theta(\varepsilon^{g_4})}{\Theta(\varepsilon^{2g_3})} = \Theta\big(\varepsilon^{\min(g_2+g_3,\ g_1+g_4) - 2g_3}\big)$$
since $c_2c_3 - c_1c_4 \neq 0$ whenever $g_2 + g_3 = g_1 + g_4$. Because all of the estimators have bounded relative error (see Lemma 1, Shahabuddin 1994, and Nakayama 1995, 1996), $\sigma_1^2 = O(\varepsilon^{2g_1})$, $\sigma_2^2 = O(\varepsilon^{2g_2})$, $\sigma_3^2 = O(\varepsilon^{2g_3})$, and $\sigma_4^2 = O(\varepsilon^{2g_4})$. Then (4) implies that
$$\sigma^2 = O(\varepsilon^{2(g_1+g_4) - 4g_3}) + O(\varepsilon^{2(g_2+g_3) - 4g_3}) + O(\varepsilon^{2\min(g_2+g_3,\ g_1+g_4) - 4g_3}) + O(\varepsilon^{2(g_1+g_4) - 4g_3}) = O\big(\varepsilon^{2\min(g_2+g_3,\ g_1+g_4) - 4g_3}\big),$$
and the result easily follows.

## Acknowledgments

This research was partially supported by NJIT SBR Grant 421180. The author would also like to thank the Area Editor, the Associate Editor, and two anonymous referees for their detailed comments, which improved the quality of the paper.

## References

Glynn, P. W. 1990. Likelihood Ratio Derivative Estimators for Stochastic Systems. *Comm. ACM* **33**, 75–84.

Glynn, P. W. and D. L. Iglehart. 1989. Importance Sampling for Stochastic Simulations. *Mgmt. Sci.* **35**, 1367–1393.
Goyal, A., P. Shahabuddin, P. Heidelberger, V. F. Nicola, and P. W. Glynn. 1992. A Unified Framework for Simulating Markovian Models of Highly Dependable Systems. *IEEE Trans. Comput.* **C-41**, 36–51.

Hammersley, J. M. and D. C. Handscomb. 1964. *Monte Carlo Methods*. Methuen, London.

Nakayama, M. K. 1995. Asymptotics of Likelihood Ratio Derivative Estimators in Simulations of Highly Reliable Markovian Systems. *Mgmt. Sci.* **41**, 524–554.

Nakayama, M. K. 1996. General Conditions for Bounded Relative Error in Simulations of Highly Reliable Markovian Systems. *Adv. Appl. Prob.* **28**, 687–727.

Nakayama, M. K., A. Goyal, and P. W. Glynn. 1994. Likelihood Ratio Sensitivity Analysis for Markovian Models of Highly Dependable Systems. *Opns. Res.* **42**, 137–157.

Shahabuddin, P. 1994. Importance Sampling for the Simulation of Highly Reliable Markovian Systems. *Mgmt. Sci.* **40**, 333–352.