Analysis of Simulation Results
Andy Wang
CIS 5930-03
Computer Systems Performance Analysis
Analysis of Simulation Results
Check for correctness of implementation
  Model verification
Check for representativeness of assumptions
  Model validation
Handle initial observations
Decide how long to run the simulation
Model Verification Techniques
Verification is similar to debugging
  Programmer's responsibility
Validation
  Modeling person's responsibility
Top-Down Modular Design
Simulation models are large computer programs
Software engineering techniques apply
Modularity
  Well-defined interfaces for pieces to coordinate
Top-down design
  Hierarchical structure
Antibugging
Sanity checks
Probabilities of events should add up to 1
No simulated entities should disappear
  Packets sent = packets received + packets lost
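The packet-conservation check can be sketched as follows (a toy simulator; `simulate_packets` and `loss_prob` are illustrative names, not from the slides):

```python
import random

def simulate_packets(n_sent, loss_prob, seed=0):
    """Toy packet simulator: each packet is either delivered or lost."""
    rng = random.Random(seed)
    received = lost = 0
    for _ in range(n_sent):
        if rng.random() < loss_prob:
            lost += 1
        else:
            received += 1
    # Antibugging check: no packet may appear or disappear.
    assert received + lost == n_sent, "packet conservation violated"
    return received, lost

received, lost = simulate_packets(10_000, 0.1)
```

The assertion costs almost nothing and catches a whole class of bookkeeping bugs early.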
Structured Walk-Through
Explaining the code to another person
Many bugs are discovered by reading the code carefully
Deterministic Models
Hard to verify simulation against random inputs
Should debug by specifying constant or deterministic distributions
Run Simplified Cases
Use only one packet, one source, one intermediary node
Can compare analyzed and simulated results
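For instance, an M/M/1 queue simulated with the Lindley recursion can be compared against the analytical mean waiting time W_q = ρ/(µ − λ). This is a sketch under assumed parameters; the slides do not prescribe a particular model:

```python
import random

def mm1_mean_wait(lam, mu, n_customers, seed=1):
    """Simulate mean waiting time in an M/M/1 queue via the Lindley
    recursion: W_{k+1} = max(0, W_k + S_k - A_{k+1})."""
    rng = random.Random(seed)
    wait = 0.0
    total = 0.0
    for _ in range(n_customers):
        service = rng.expovariate(mu)        # S_k
        interarrival = rng.expovariate(lam)  # A_{k+1}
        wait = max(0.0, wait + service - interarrival)
        total += wait
    return total / n_customers

lam, mu = 0.5, 1.0
simulated = mm1_mean_wait(lam, mu, 200_000)
analytic = (lam / mu) / (mu - lam)   # W_q = rho / (mu - lam) = 1.0 here
```

With λ = 0.5 and µ = 1 the analytical value is 1.0; the simulated estimate should agree to within sampling error.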
Trace
Time-ordered list of events
With associated variables
Should have levels of detail
  In terms of events occurred, procedures called, or variables updated
  Properly indented to show levels
Should allow traces to be turned on and off
On-line Graphic Displays
Important when viewing a large amount of data
Verifying that a CPU scheduler preempts processes according to priorities and time budgets
Continuity Test
Run the simulation with slightly different values of input parameters
A small change in an input parameter should lead to a correspondingly small change in the output
If not, possibly a bug
Degeneracy Test
Test for extreme simulation input and configuration parameters
Routers with zero service time
Idle and peak load
Also unusual combinations
  Single CPU without disk
Consistency Tests
Check results for input parameters with similar effects
  Two sources with an arrival rate of 100 packets per second
  Four sources with an arrival rate of 50 packets per second
If the results are dissimilar, possibly a bug
Seed Independence
Different random seeds should yield statistically similar results
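A sketch of a seed-independence check on a toy simulation (illustrative names; a real model would replace `sim_mean`):

```python
import random
import statistics

def sim_mean(n, mu, seed):
    """Toy simulation: sample mean of exponential service times, rate mu."""
    rng = random.Random(seed)
    return statistics.fmean(rng.expovariate(mu) for _ in range(n))

# Same model, different seeds: the means should be statistically similar,
# all close to the true mean 1/mu = 0.5.
means = [sim_mean(50_000, 2.0, seed) for seed in (1, 2, 3, 4)]
spread = max(means) - min(means)
```

A spread far larger than the expected sampling error suggests the random-number usage is buggy (e.g., overlapping streams).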
Model Validation Techniques
Should validate
  Assumptions
  Input values and distributions
  Output values and conclusions
Against
  Expert intuition
  Real-system measurements
  Theoretical results
Model Validation Techniques
May not be possible to check all nine possibilities (three quantities × three sources)
  Real system not available
May not be possible at all
  The simulation may have been built as a last resort, precisely because nothing else exists
  E.g., an economic model
Expert Intuition
Should validate assumptions, input, and output separately and as early as possible
[Plot: throughput vs. % packet loss; an expert would question why increased packet loss leads to better throughput]
Real-System Measurements
Most reliable way for validation
Often not feasible
  System may not exist
  Too expensive to measure
Apply statistical techniques to compare model output and measured data
  Use multiple traces under different environments
Theoretical Results
Can apply queueing models
If too complex
  Can validate only the common scenarios
  Can validate a small subset of simulation parameters
  E.g., compare analytical equations against CPU simulation models with one and two cores
Use the validated simulation to simulate many cores
Transient Removal
In most cases, we care only about steady-state performance
We need to perform transient removal to remove initial data from the analysis
Difficulty
  Finding where the transient state ends
Long Runs
Just run the simulation for a long time
  Wastes resources
  Not sure if it's long enough
Proper Initialization
Start the simulation in a state close to steady state
Pre-populate requests in various queues
Pre-load memory cache content
Pre-fragment storage (e.g., flash)
Reduces the length of transient periods
Truncation
Assume steady state variance < transient state variance
Algorithm
  Measure variability in terms of range
  Remove the first L observations, one at a time
  Until the (L + 1)th observation is neither the min nor the max of the remaining observations
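The truncation algorithm can be sketched as follows (illustrative helper name):

```python
def find_truncation_point(data):
    """Return L, the number of leading observations to discard: the
    smallest L such that the (L + 1)th observation is strictly between
    the min and max of the remaining observations."""
    for L in range(1, len(data)):
        rest = data[L:]           # observations after removing the first L
        if min(rest) < rest[0] < max(rest):
            return L
    return len(data)              # no steady state detected

# Rising transient followed by a noisy plateau:
data = [1, 2, 3, 4, 10, 9, 10, 11, 10, 9, 10]
L = find_truncation_point(data)
```

Here the first four observations form the rising transient, so the heuristic returns L = 4.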
Truncation
[Plot: observations vs. observation number, with the truncation point L marked]
Initial Data Deletion
m replications
n data points for each replication
X_ij = jth data point in the ith replication
Initial Data Deletion
Step 1: average across replications: x̄_j = (1/m) Σ_i X_ij, for j = 1, …, n
Initial Data Deletion
Step 2: compute the grand mean µ = (1/n) Σ_j x̄_j
Step 3: compute µ_L = mean of the last n − L values of x̄_j, for each L = 1, …, n − 1
Initial Data Deletion
Step 4: offset µ_L by µ and normalize by µ, giving the relative change Δµ_L = (µ_L − µ)/µ
[Plot: Δµ_L vs. L; the knee where the curve flattens marks the end of the transient interval]
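Steps 1-4 can be sketched as (illustrative function name; `reps` holds the m replications):

```python
def initial_data_deletion(reps):
    """reps: m replications, each a list of n data points X_ij.
    Returns the relative changes delta_mu_L for L = 1 .. n - 1."""
    m, n = len(reps), len(reps[0])
    xbar = [sum(rep[j] for rep in reps) / m for j in range(n)]  # step 1
    mu = sum(xbar) / n                                          # step 2
    deltas = []
    for L in range(1, n):
        mu_L = sum(xbar[L:]) / (n - L)     # step 3: mean of last n - L values
        deltas.append((mu_L - mu) / mu)    # step 4: relative change
    return deltas   # the curve flattens once L passes the transient

reps = [[0, 1, 2, 10, 10, 10],
        [0, 2, 2, 10, 10, 10]]
deltas = initial_data_deletion(reps)
```

In this toy example the transient spans the first three points, so Δµ_L stops changing from L = 3 onward.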
Moving Average of Independent Replications
Similar to initial data deletion
Requires computing the mean over a sliding time window
Moving Average of Independent Replications
m replications
n data points for each replication
X_ij = jth data point in the ith replication
Moving Average of Independent Replications
Step 1: average across replications: x̄_j = (1/m) Σ_i X_ij
Moving Average of Independent Replications
Step 2: pick a k, say 1; average the (j − k)th through (j + k)th data points, for each j; increase k as necessary until the plot is smooth
[Plot: smoothed x̄_j vs. j; the knee marks the end of the transient interval]
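The sliding-window step can be sketched as (illustrative names):

```python
def moving_average(xbar, k):
    """Smooth the replication-averaged series with a window of 2k + 1
    points centered on each j (edges where the window does not fit are
    dropped)."""
    n = len(xbar)
    return [sum(xbar[j - k:j + k + 1]) / (2 * k + 1)
            for j in range(k, n - k)]

xbar = [0, 4, 2, 6, 10, 8, 12]   # averaged across replications (step 1)
smooth = moving_average(xbar, k=1)
```

If the smoothed curve still looks jagged, increase k and repeat.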
Batch Means
Used for very long simulations
Divide N data points into m batches of n data points each
Step 1: pick n, say 1; compute the mean for each batch
Step 2: compute the mean of means
Step 3: compute the variance of the means
Step 4: n++, go to Step 1
Batch Means
Rationale: as n approaches the transient size, the variance peaks
Does not work well with few data points
[Plot: batch size n vs. variance of batch means; the peak marks the transient interval]
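The variance-peak rationale can be illustrated with toy data whose transient is the initial run of zeros (illustrative names):

```python
def variance_of_batch_means(data, n):
    """Split data into batches of size n and return the sample variance
    of the batch means (needs at least two batches)."""
    m = len(data) // n
    means = [sum(data[i * n:(i + 1) * n]) / n for i in range(m)]
    mu = sum(means) / m
    return sum((x - mu) ** 2 for x in means) / (m - 1)

# Transient of length 20 followed by a steady plateau:
data = [0] * 20 + [10] * 180
curve = {n: variance_of_batch_means(data, n) for n in (5, 20, 80)}
# The variance peaks when n is near the transient length (20 here).
```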
Terminating Simulations
For systems that never reach a steady state
  Network traffic consists of transfers of small files
  Transferring large files just to reach steady state is not useful
System behavior changes with time
  Cyclic behavior
Less need for transient removal
Terminating Conditions
Increase the number of multimedia streams
Until missing 10 deadlines within a time window
May not terminate
  One deadline miss may stretch across time windows
  Fix: until missing 3 deadlines…
Final Conditions
Handling the end of simulations
Might need to exclude some final data points
E.g., mean service time = total service time / number of completed jobs
Stopping Criteria: Variance Estimation
If the simulation run is too short
  Results are highly variable
If too long
  Wastes resources
Only need to run until the confidence interval is narrow enough
Since the confidence interval is a function of variance, how do we estimate the variance?
Independent Replications
m runs with different seed values
Each run has n + n0 data points
  First n0 data points discarded due to the transient phase
Step 1: compute the mean of each replication from its n data points
Step 2: compute µ, the mean of means
Step 3: compute s², the variance of the means
Independent Replications
Confidence interval: µ ± z_{1−α/2} s/√m
  Use t[1−α/2; m − 1] instead of z for m < 30
This method needs to discard m·n0 data points
  A good idea to keep m small
  Increase n to get a narrower confidence interval
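The procedure can be sketched as (illustrative names; the t quantile is hard-coded for m = 5 replications):

```python
import math
import random
import statistics

def independent_replications_ci(reps, n0, t_crit):
    """Drop n0 transient points per run, then mean of means, variance of
    means, and a t-based confidence interval (steps 1-3 of the slides)."""
    means = [statistics.fmean(r[n0:]) for r in reps]   # step 1
    mu = statistics.fmean(means)                       # step 2
    s = statistics.stdev(means)                        # step 3 (sqrt of variance)
    half = t_crit * s / math.sqrt(len(means))
    return mu - half, mu + half

def make_run(seed, length=1_100):
    """One replication of a noisy process with true mean 5.0."""
    rng = random.Random(seed)
    return [rng.gauss(5.0, 1.0) for _ in range(length)]

reps = [make_run(seed) for seed in range(5)]            # m = 5 different seeds
lo, hi = independent_replications_ci(reps, n0=100, t_crit=2.776)  # t[0.975; 4]
```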
Batch Means
Given a long run of N + n0 data points
  First n0 data points discarded due to the transient phase
N data points are divided into m batches of n data points each
Batch Means
Start with n = 1
Step 1: compute the mean for each batch
Step 2: compute µ, the mean of means
Step 3: compute s², the variance of the means
Confidence interval: µ ± z_{1−α/2} s/√m
  Use t[1−α/2; m − 1] instead of z for m < 30
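A sketch of the batch-means interval (illustrative names; z = 1.96 for a 95% interval):

```python
import math
import random
import statistics

def batch_means_ci(data, n0, n, z=1.96):
    """One long run: drop the n0 transient points, form m batches of
    size n, and build the interval from the variance of batch means."""
    steady = data[n0:]
    m = len(steady) // n
    means = [statistics.fmean(steady[i * n:(i + 1) * n]) for i in range(m)]
    mu = statistics.fmean(means)        # step 2: mean of means
    s = statistics.stdev(means)         # step 3: sqrt of variance of means
    half = z * s / math.sqrt(m)
    return mu - half, mu + half

rng = random.Random(7)
data = [rng.gauss(3.0, 1.0) for _ in range(10_050)]  # true mean 3.0
lo, hi = batch_means_ci(data, n0=50, n=100)          # m = 100 batches
```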
Batch Means
Compared to independent replications
  Only need to discard n0 data points
Problem with batch means
  Autocorrelation if the batch size n is small
  Can use the mean of the ith batch to guess the mean of the (i + 1)th batch
Need to find a suitable batch size n
Batch Means
Plot batch size n vs. variance of batch means
Plot batch size n vs. autocovariance Cov(batch_mean_i, batch_mean_{i+1}), over all i
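The autocovariance criterion can be sketched as follows (illustrative names; the AR(1) series stands in for autocorrelated simulation output):

```python
import random
import statistics

def batch_means(data, n):
    m = len(data) // n
    return [statistics.fmean(data[i * n:(i + 1) * n]) for i in range(m)]

def lag1_autocovariance(means):
    """Cov(batch_mean_i, batch_mean_{i+1}) averaged over all i."""
    mu = statistics.fmean(means)
    return statistics.fmean((a - mu) * (b - mu)
                            for a, b in zip(means, means[1:]))

# Strongly autocorrelated AR(1) series: x_t = 0.9 x_{t-1} + noise.
rng = random.Random(3)
x, data = 0.0, []
for _ in range(20_000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    data.append(x)

cov_small = lag1_autocovariance(batch_means(data, 2))    # large: batches dependent
cov_large = lag1_autocovariance(batch_means(data, 500))  # near zero: n big enough
```

Picking the smallest n for which the lag-1 autocovariance is negligible keeps the batch means nearly independent without wasting batches.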
Method of Regeneration
Regeneration
Measured effects for a computational cycle are independent of the previous cycle
[Plot: system state vs. time; regeneration points delimit the regeneration cycles]
Method of Regeneration
m regeneration cycles, with n_i data points in cycle i
Step 1: compute y_i, the sum of the data points in each cycle
Step 2: compute the grand mean µ = (Σ_i y_i)/(Σ_i n_i)
Step 3: compute the difference between expected and observed sums: w_i = y_i − n_i µ
Step 4: compute s², the variance of the w_i
Method of Regeneration
Step 5: compute the average cycle length, c
Confidence interval: µ ± z_{1−α/2} s/(c√m)
  Use t[1−α/2; m − 1] instead of z for m < 30
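Steps 1-5 can be sketched as (illustrative function name; the interval follows the form µ ± z s/(c√m) from the slide):

```python
import math
import statistics

def regeneration_ci(cycles, z=1.96):
    """cycles: one list of data points per regeneration cycle.
    y_i = cycle sums, mu = grand mean, w_i = y_i - n_i*mu,
    s from the w_i, c = average cycle length."""
    m = len(cycles)
    y = [sum(cyc) for cyc in cycles]             # step 1: per-cycle sums
    n = [len(cyc) for cyc in cycles]
    mu = sum(y) / sum(n)                         # step 2: grand mean
    w = [yi - ni * mu for yi, ni in zip(y, n)]   # step 3
    s = statistics.stdev(w)                      # step 4
    c = sum(n) / m                               # step 5: average cycle length
    half = z * s / (c * math.sqrt(m))
    return mu - half, mu + half

cycles = [[4], [1, 3], [2, 2, 2]]   # three regeneration cycles
lo, hi = regeneration_ci(cycles)
```

Note that no transient points are discarded: each cycle is independent of the previous one by construction.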
Method of Regeneration
Advantages
  Does not require removing transient data points
Disadvantages
  Can be hard to find regeneration points