Thoratec Workshop in Applied Statistics for QA/QC, Mfg, and R+D --- Part 2 of 3 (PowerPoint Presentation Transcript)

Slide1

Thoratec Workshop in Applied Statistics for QA/QC, Mfg, and R+D

Part 2 of 3: Intermediate Difficulty Methods

Instructor: John Zorich
www.JOHNZORICH.COM ; JOHNZORICH@YAHOO.COM

Part 2 was designed for students who have taken Part 1 of these workshops, or who have used statistical methods on the job.

Slide2

John Zorich's Qualifications:

20 years as a "regular" employee in the medical device industry (R&D, Mfg, Quality)
ASQ Certified Quality Engineer (since 1996)
Statistical consultant + instructor (since 1999) for many companies, including Siemens Medical, Boston Scientific, Stryker, and Novellus
Instructor in applied statistics for Ohlone College (CA), Pacific Polytechnic Institute (CA), and KEMA / DEKRA
Past instructor in applied statistics for UC Santa Cruz Extension, ASQ Silicon Valley Biomedical Group, & TUV
Publisher of 9 commercial, formally validated, statistical application Excel spreadsheets that have been purchased by over 80 companies, worldwide. Applications include: Reliability, Normality Tests & Normality Transformations, Sampling Plans, SPC, Gage R&R, and Power.

You're invited to "connect" with me on LinkedIn.

Slide3

Self-teaching & Reference Texts RECOMMENDED by John Zorich

Clements: Handbook of Statistical Methods in Mfg.
Cohen: Statistical Power Analysis
D'Agostino & Stephens: Goodness-of-Fit Techniques
Dovich: Quality Engineering Statistics
Dovich: Reliability Statistics
Gross: A Normal Distribution Course
Kaminsky et al.: Statistics & QC for the Workplace
Kraemer: How Many Subjects?
Mace: Sample-Size Determination
Motulsky: Intuitive Biostatistics
Murphy & Myors: Statistical Power Analysis
Natrella: Experimental Statistics (1st edition; recently re-published)
NIST Engineering Statistics Internet Handbook (free), found at http://www.itl.nist.gov/div898/handbook/index.htm
Phillips: How to Think about Statistics
Thode: Testing For Normality

Slide4

Main Topics in Today's Workshop

Confidence Intervals
Significance Tests
Null Hypothesis
t-Tests, ANOVA, and P-values
Power calculations (e.g., for t-Tests)
Confidence & Reliability Calculations
Attribute ( pass / fail ) data
Variables (measurement) data
Normal vs. Non-Normal data
K-tables
MTTF and MTBF
Normality Tests

This is a lot to cover in 1 day, but your studying of the Student files, plus instructor accessibility by email, completes the course.

Slide5

(as taught in Part 1 of this Workshop) Distribution of Sample Avgs. vs. Population

Width of the distribution of sample Avgs is measured in Std Errors; width of the Population's distribution is measured in Std Deviations. (The narrower curve is the theoretical distribution of thousands of individual avgs taken from the population.)

Slide6

(as taught in Part 1 of this Workshop) Calculating a "standard error"

Avg#1
Avg#2
Avg#3
Avg#4
etc.
Avg#N
-------------
Std Dev of Avgs = Std Error of the Mean

Multiple samples (with replacement) of the same size, from the same population, generated these Avgs. This is a theoretically correct but impractical method of calculation.

Slide7

Standard Error of the (sample) Mean ( estimated from 1 sample )

Practical application ( new topic for Part 2 ): the 95% confidence interval for the Population Mean can be estimated using this equation:
Sample Average +/– " t " x (Std Error of Mean)
Practical formula for "Std Error of Mean":
Sample Standard Deviation / sqrt( Sample Size )

As taught in Part 1 of this workshop
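As a minimal sketch (not from the workshop itself), the same calculation can be scripted in Python, with scipy.stats.t.ppf standing in for the t-table; the helper name and example values are illustrative:

```python
# Sketch: 2-sided confidence interval for the population mean, from one sample's
# summary statistics (illustrative helper; requires SciPy).
from math import sqrt
from scipy import stats

def mean_confidence_interval(avg, std_dev, n, confidence=0.95):
    sem = std_dev / sqrt(n)                               # Std Error of the Mean
    t = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)   # 2-tailed " t " value
    return avg - t * sem, avg + t * sem                   # Avg +/- t x SEavg

print(mean_confidence_interval(100, 10, 9, confidence=0.90))  # ≈ (93.8, 106.2)
```

The 90% example reproduces the class exercise that appears a few slides later.

Slide8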

In 2009, the US FDA told a medical-device start-up company client of John Zorich's that, in regards to the company's planned clinical trials... "The equivalence of the device to the predicate can be demonstrated if the confidence interval for the difference in the mean values for the tested parameter excludes a difference larger than 20% from the predicate."

What the FDA was saying is this: Enroll a large number of patients, or we won't approve your product for sale in the United States! Unfortunately, the number of patients required to meet the FDA's confidence-interval mandate was determined to be larger than the company had time, money, or product to meet (the company had to close down & lay off everyone).

Confidence intervals are serious business!

Slide9

Confidence intervals are valuable!

Many statistical tests or methods can be thought of in terms of confidence intervals. For example:

A 2-sided t-test is really an examination of whether or not the "Null Hypothesis" is inside or outside the sample's 2-sided confidence interval.
Reliability calculations are really lower 1-sided confidence limits on the observed % in-specification; and (as we'll see in upcoming slides) because confidence limits are automatically adjusted based upon sample size, ANY sample size is valid.
"SPC chart control-limit lines" are really upper and lower 2-sided confidence limits on the current process average.

Slide10

95% confidence interval for the Sample's "Parameter Mean" is 91.6 to 108.4

White area = 95% of the area under both curves ( colored tails = 5% ). These 2 curves are theoretical distributions of sample Avgs taken from Populations having identical Std Deviations but different Means --- 91.6 & 108.4, respectively; the Smpl Avg sits between them. Let's look at this in more detail, using Excel.

Slide11

A "

95% confidence interval of the mean

" can be thought of in at least 2 ways:

an interval around the observed mean of a sample

, in which interval you can expect (with 95% confidence) to find the true mean (

= parameter) of the population from which the sample was taken (e.g., for cable diameters, sample avg = 0.020, 95% conf. interval = 0.019 to 0.021). THIS IS THE CORRECT & MOST COMMONLY USED INTERPRETATION OF THE TERM. an interval around the true mean of a population, in which interval you can expect to find 95% of all possible random sample means of a given size sample taken from that population. THIS IS NOT A CORRECT USE OF THE TERM !! Some people use this term to refer to the range of 95% of the population data --- that is also incorrect.95% Confidence Interval ( & Limits)Slide12

95% Confidence Interval ( & Limits)

An interval around the Sample Mean: THIS is the "correct" interpretation of what a "Confidence Interval" is. The "Confidence Limits" are at the extreme left and right ends of this range.
An interval around the True Mean: NOT the "correct" interpretation of what a "Confidence Interval" is.

Slide13

Sample Mean, shown with its 90% confidence interval, 95% confidence interval, and 99% confidence interval.

Where is the true mean ( = the Parameter Mean)? Answer: Somewhere in the Confidence Interval.
How confident can we be of that answer? Answer: It depends on the size of the interval.
What is the size of the interval in which we are sure ( = 100% confident) of finding the Parameter Mean? Answer: Minus infinity to plus infinity!

Slide14

Sample Mean, shown with confidence intervals based on a LARGE, a MEDIUM, and a SMALL sample-size.

Where is the true mean ( = the Parameter Mean)? Answer: Somewhere in the Confidence Interval (at the chosen confidence level, typically 95%).
The choice of sample size is arbitrary, based upon how narrow you want the confidence interval ( i.e. how precisely you want to know the parameter). CI width is inversely proportional to the square root of sample size:
width = +/– t x SampleStdev / sqrt( SampleSize )

Slide15

Example of misuse of Confidence Intervals:

In 2009, a billion-dollar manufacturing company submitted to a regulatory agency a report claiming that performance data between the stressed and unstressed new product were not significantly different, because the confidence intervals of the two populations overlapped.

The agency officially requested a literature or textbook reference that explained such a rationale. After a few rounds of emails and re-writings of the report (and still no literature reference), the company consulted a professional statistician, who used a different statistical method to prove equivalence.

NOTE: As you can tell from the previous 2 slides, confidence intervals of virtually "any" two samples can overlap, if you use a small-enough sample size and/or a large-enough confidence level.

Slide16

This is a " t " Table (which is used to calculate Confidence Intervals)

"A" = the sum of BOTH dark areas of the curve above, expressed as a decimal fraction of the whole area under the curve.

"

v

" or "

d.f." is always smaller than the sample size, (in most cases, it’s equal to “sample size – 1” ).This is a t-curveSlide17

For a sample size of 6 ( d.f. = 5 ), the 95% confidence interval is Avg +/– 2.571 SEavg.

For d.f. = 30+, the table values are ≈ 1.7, ≈ 2.0, ≈ 2.4, and ≈ 2.7; the #s in this last line are useful values to memorize. ( "One-tailed", this would be 0.05 = 5%. )

Slide18

95% upper 1-tail confidence limit of the mean = 106.7

The solid area ( 5% of the whole ) represents the % of the area under the curve to the left of the Smpl Avg; the white area = 95% of the area under the curve.

Slide19

95% lower 1-tail confidence limit of the mean = 93.3

The solid area ( 5% of the whole ) represents the % of the area under the curve to the right of the Smpl Avg; the white area = 95% of the area under the curve. Let's look at this in more detail, using Excel.

Slide20

( t-Table image; highlighted entry: d.f. = 63, t ≈ 2.0 )

Use this " t " Table to do the exercises on the next few slides.

Slide21

Class exercise: Confidence Limits

As a group, let's calculate the 90% 2-sided confidence limits for the population mean, assuming the sample has...
Avg = 100
Standard deviation = 10
Sample size = 9

Answer:
= 100 +/– ( 1.860 x 10 / sqrt( 9 ) )
= 100 +/– ( 18.6 / 3 )
= 100 +/– ( 6.2 )
= 93.8 and 106.2
We are 90% sure that the Parameter Avg is between these limits.
( 1.860 = t-table value for A = ( 1.0 – 0.9 ), d.f. = ( 9 – 1 ) )

Slide22

DIFFICULT exercise: Sample Size

What minimum sample size ( n ) is needed to know the true (Parameter) Average to within +/– 3, with a confidence of 98% ( = tails are 2% = 0.02 ), if we anticipate sample Std Deviation = 2.83 ?

Answer: Conf. Interval = Avg +/– t x SEavg. We need t x SEavg ≤ 3, using the t-table column with A = 0.02:
t x SEavg = 4.541 x 2.83 / sqrt( 4 ) = 6.43
t x SEavg = 3.365 x 2.83 / sqrt( 6 ) = 3.89
t x SEavg = 2.998 x 2.83 / sqrt( 8 ) = 3.00
n = 8
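A sketch of the same trial-and-error search in Python (the helper name is illustrative; SciPy supplies the t-table values):

```python
from math import sqrt
from scipy import stats

def min_sample_size(half_width, std_dev, confidence):
    """Smallest n for which t x SEavg is no larger than the desired half-width."""
    n = 2
    while stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1) * std_dev / sqrt(n) > half_width:
        n += 1
    return n

print(min_sample_size(3, 2.83, 0.98))  # 8, matching the manual search above
```

Slide23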

This is called a " Z " Table

In a normal distribution, +/– Z standard values from the Parameter Avg encompasses 2 x A of the population.
+/– 2.00 standard values equals 2 x 0.4773 = 95.5% of the area under the normal curve.
+/– 3.00 standard values equals 2 x 0.4987 = 99.7% of the area under the normal curve.

Slide24

" Z " Table

Some statistical tests use

Z-tables instead of t-tables

,

for simplicity

when sample sizes are large (sample size does not appear in a Z-table, as it does in a t-Table). However... a " t-Test " always provides a (slightly) more accurate answer than a " Z-test " when comparing most statistics (e.g., a comparison of averages), and so Z-tests will not be taught in this workshop.

+/

2.00 standard values equals

2 x 0.4773 = 95.5% of the area

under the normal curve;

but on a t-Table, +/– 2.00 std values =

just 95.0% even

if sample size is 60 !!

Only if n = infinity are t-table values identical to Z-table values.Slide25

Microsoft Excel "Confidence" function

Microsoft Excel has 3 different "functions" that claim to calculate 1/2 of the confidence interval of the mean:

Excel 2007 & earlier: CONFIDENCE ( 1 – confidence, StdDev, sample size )
Starting in Excel 2010: CONFIDENCE.NORM ( 1 – confidence, StdDev, sample size )
and CONFIDENCE.T ( 1 – confidence, StdDev, sample size )

Do not use CONFIDENCE nor CONFIDENCE.NORM, because they base their calculations on the Z-table rather than the t-table, and so produce a confidence interval that is too small for a sample mean and a sample standard deviation. Use only the last one ( CONFIDENCE.T ), because it will produce the exact same results as a manual calculation using the formula: SmplAvg +/– t x StdErrorMean.

What about ATTRIBUTE Confidence Limits?

Over a dozen different methods exist for calculating binomial confidence limits for sample % defective --- each of those methods gives a different length & different conf. limits !!!

The classic method is called the "Exact" binomial --- it can be calculated via Excel's "Beta" function --- for example:
UPPER 2-tailed Confidence Limit = betainv( 1 – (1 – C ) / 2 , k + 1 , N – k )
LOWER 2-tailed Confidence Limit = betainv( (1 – C ) / 2 , k , N – k + 1 )
C = Confidence; N = Sample size (e.g., 100); k = observed number of events (heads, or "yes" votes, or defects, etc.)

95% conf. limits for observed 10 defects in sample size = 100:
betainv( 1 – (1 – 0.95) / 2 , 10 + 1 , 100 – 10 ) = 0.17622
betainv( (1 – 0.95) / 2 , 10 , 100 – 10 + 1 ) = 0.04900

Continued from previous slide...

If the Parameter % Defective is 17.622%, then we have a 2.5% chance of observing 10 or fewer defective parts in a sample of 100 parts. If the Parameter % Defective is 4.900%, then we have a 2.5% chance of observing 10 or more defective parts in a sample of 100 parts.

This is similar to what we saw with variables confidence limits, except here the 2 curves are not identically shaped.

Slide28

Attribute confidence limits, continued

One-sided confidence limits are calculated like so:
UPPER 1-tailed Confidence Limit = betainv( C , k + 1 , N – k )
LOWER 1-tailed Confidence Limit = betainv( 1 – C , k , N – k + 1 )

95% 1-sided limits for observed 10 defects when N = 100:
upper = betainv( 0.95 , 10 + 1 , 100 – 10 ) = 0.16372
lower = betainv( 1 – 0.95 , 10 , 100 – 10 + 1 ) = 0.05526

Which one is useful when calculating % in-spec for the population from which that sample was taken?

Slide29

Binomial Confidence Limits (cont.)

VERY IMPORTANT NOTE: Do not use the "Z table" or "Poisson" formula / table given in some textbooks for calculation of binomial confidence limits. Those methods are pre-computer-era approximations for the "Exact" binomial results (binomial calculations are VERY difficult to do by hand, whereas Z and Poisson are easy).

Using the 10% Defective example (see previous slides), the approximate confidence limits are...
Z table: 4.120% and 15.880%
Poisson: 4.795% and 18.390%
Whereas, the "Exact" (correct !!!) confidence limits are...
Binomial: 4.900% and 17.622%
Beta approximation: 4.900% and 17.622%

Slide30

TESTS OF STATISTICAL SIGNIFICANCE

Most statistical tests are tests of "statistical significance" ( e.g., t-Test, Chi-Square, F-test, ANOVA). These next sections discuss this topic in general, using the t-Test and ANOVA as examples.

Slide31

What is Statistical Significance?

Significance (also referred to as either " alpha " or " α " ) is a number between 0 and 1 (or a % between 0% and 100%) that you choose, that... in your opinion, indicates how odd an event has to be... before you think it is fair to conclude that... "something's fishy" or "I'm being conned" or "the null hypothesis is false". For example: flip of 1 coin by the seminar presenter, plus audience participation (all audience members are asked to now raise a hand high into the air).

(e.g.) Significance is the probability of concluding that there is a difference, when in fact there is no difference.

Slide32

Statistical Significance

# of heads in a row    Approximate chance ( = probability) of such occurrence
                       (null hypothesis, using 1 coin, is that the coin is honest)
 1                     50 %
 2                     25 %
 3                     12 %
 4                      6 %   ( ≈ now-popular "0.05" )
 7                      1 %
10                      0.1 %
13                      0.01 %
17                      0.001 %
20                      0.0001 %

JZ to describe coin-toss demo results from training 100+ TUV auditors in 2002. We can demo significance with a pair of dice.

Slide33

Statistical Significance

Slide34

t-Tests

There are a huge number of different t-Tests --- for example:

To determine if cables from a new Supplier have a mean diameter that we suspect is significantly larger than the value we've been getting for months from the old supplier, we perform a one-tailed test of a mean vs. a historical average.
To determine if cables from a new Supplier have a mean diameter that is significantly different than the value we've been getting for months from the old supplier, we perform a two-tailed test of a mean vs. a historical average.

Slide35

t-Tests

There are at least 3 (mathematically identical) ways to explain how a t-Test works, e.g.:
Using confidence intervals
Using P-values
Using the "t-statistic"

The first and second methods can be explained graphically and therefore are easy to understand. The third method is basically one that can be applied without necessarily understanding it, and therefore it should be used with caution.

Slide36

Statistical Significance

When using the "confidence interval" method to explain the t-Test...
Significance = 1 – Confidence
Therefore, Significance can be any value between 0 and 1 ( = 0% and 100% ). However, because Confidence values are typically relatively large (90% or more), "Significance" values are typically relatively small (10% or less).

Commonly used "significance" terms:
"significant" = a significance of 0.05 or less
"highly significant" = a significance of 0.01 or less

Slide37

What is the "NULL HYPOTHESIS" ?

If you're testing whether dice are "loaded" (that is, dishonest), the null hypothesis is this: "the dice are honest". If you are testing whether Croatians are taller than Bosnians, the null hypothesis is this: "Croatians and Bosnians are, on average, the same height". If you are testing whether 2 products are significantly different, the "null hypothesis" is that there is no difference between the products.

The NULL HYPOTHESIS VALUE is what your "test of significance" assumes is the Parameter, until you get a result that's so odd ( = has such a low probability of occurring, assuming the Null Hypothesis Value is the Parameter) that you decide to "reject the Null Hypothesis" rather than "accept" it.

Slide38

Confidence Interval explanation of t-Tests

Two cases are diagrammed: a 95% confidence interval, Sample Avg +/– t x SEavg, for when the Smpl Avg is smaller than the Null Hypothesis Value, and one for when the Smpl Avg is larger than the Null Hypothesis Value.

If the value of the Null Hypothesis IS outside the 95% confidence interval, then the Sample Avg IS "statistically significantly different" than the Null Hypothesis Value (the case above does show "significance").

Slide39

Confidence Interval explanation of t-Tests

The same two cases, now with 99% confidence intervals ( Sample Avg +/– t x SEavg ).

If the Null Hypothesis is NOT outside the 99% confidence interval, then the Sample Avg IS NOT "statistically highly significantly different" than the Null Hypothesis Value (the case above does not show "high significance").

Slide40

Confidence Interval explanation of t-Tests

Diagrammed: Sample Avg + t x SEavg, the Upper 1-sided 95% Confidence Limit on the Sample Avg, for a case where the Smpl Avg is smaller than the Null Hypothesis Value.

Only if the value of the Null Hypothesis IS larger than the 1-sided upper 95% confidence limit can we say that the Sample Avg IS "statistically significantly smaller" than the Null Hypothesis Value (the case shown above does show "significance").

Slide41

Confidence Interval explanation of t-Tests

Diagrammed: Sample Avg + t x SEavg, the Upper 1-sided 99% Confidence Limit on the Sample Avg, for the same case.

Only if the value of the Null Hypothesis IS larger than the 1-sided upper 99% confidence limit can we say that the Sample Avg IS "statistically highly significantly smaller" than the Null Hypothesis Value (the case above does not show "high significance").

Slide42

Confidence Interval explanation of t-Tests

Diagrammed: Sample Avg – t x SEavg, the Lower 1-sided 95% Confidence Limit on the Sample Avg, for a case where the Smpl Avg is larger than the Null Hypothesis Value.

Only if the value of the Null Hypothesis IS smaller than the 1-sided lower 95% confidence limit can we say that the Sample Avg IS "statistically significantly larger" than the Null Hypothesis Value (the case shown above does show "significance").

Slide43

Confidence Interval explanation of t-Tests

Diagrammed: Sample Avg – t x SEavg, the Lower 1-sided 99% Confidence Limit on the Sample Avg, for the same case.

Where would the Lower 99% confidence limit have to be, in order to conclude that the sample average is NOT "statistically highly significantly larger" than the Null Hypothesis Value ?

Slide44

Use this in the exercises on the next slides... the t-Table used in t-Tests is the same one used for Confidence Intervals.

Slide45

Class exercise: 1-sample t-Test

Calculate whether this Sample Avg is "significantly" larger than the Null Hypothesis:
Null Hypothesis = 100
Sample Avg = 107
Sample Std Dev = 10
Sample size = 9

Lower 1-tailed Conf. Limit:
= 107 – ( 1.86 x 10 / sqrt( 9 ) )
= 107 – ( 18.6 / 3 )
= 107 – ( 6.2 )
= 100.8
( 1.86 = t-table value for A = 0.10, d.f. = ( 9 – 1 ); A = 0.10 2-tailed equals A = 0.05 1-tailed on the t-Table. )

The Null Hypoth. Value is below the Lower 1-tailed Confidence Limit, & so the Sample Avg IS statistically significantly LARGER.

Slide46

Class exercise: t-Test

Calculate whether this Sample Avg is "significantly" different than the Null Hypothesis:
Null Hypothesis = 100
Sample Avg = 107
Sample Std Dev = 10
Sample size = 9

Conf. Interval ( Lower & Upper Conf. Limits ):
= 107 +/– ( 2.306 x 10 / sqrt( 9 ) )
= 107 +/– ( 23.06 / 3 )
= 107 +/– ( 7.69 )
= 99.31 to 114.69
( 2.306 = t-table value for A = 0.05, d.f. = ( 9 – 1 ) )

The Null Hypoth. Value is INside the Conf. Interval, & so the Sample Avg IS NOT statistically significantly DIFFERENT.
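A sketch of this confidence-interval method in Python (illustrative; only the summary statistics above are assumed):

```python
from math import sqrt
from scipy import stats

null_value, avg, std_dev, n, alpha = 100, 107, 10, 9, 0.05
t = stats.t.ppf(1 - alpha / 2, df=n - 1)        # 2.306 for A = 0.05, d.f. = 8
lo = avg - t * std_dev / sqrt(n)
hi = avg + t * std_dev / sqrt(n)
print(round(lo, 2), round(hi, 2))               # ≈ 99.31 and 114.69
print("significant" if not lo <= null_value <= hi else "NOT significant")
```

Slide47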

1-tailed vs. 2-tailed

Another argued-about point, in the use of tests of significance, is whether or not to use a 1-tailed or 2-tailed test. The most commonly accepted view is this: You MUST make your decision BEFORE you start your study (e.g., when you write your protocol).

If you KNOW that the Sample Avg will be larger (or if you KNOW it will be smaller) than the Null Hypothesis Value, and BEFORE the study starts you can prove it (e.g., on the basis of multiple preliminary studies), then you MUST use a 1-tailed test. Otherwise, you MUST use a 2-tailed test. To do it any other way tempts you to modify your results to create the conclusion you want.

Slide48

" P " vs. "Significance"

Significance is the probability chosen by you before the experiment (or study) was conducted, with the understanding that if the "p" value of your experimental result is equal to or less than the significance value, you will reject the null hypothesis.

The "P" value represents the probability of the experimental result that you actually observed, assuming that the null hypothesis is true. It is the probability of getting the observed result or a result that's even further out on the "tail" of the null-hypothesis distribution. If the "p" value is equal to or less than the chosen "significance" value, then you can say that the result is "significant" (or highly significant, if significance = 0.01 was chosen).

Slide49

The probability of getting 20 heads or more ( = a sample "average" of 20, or any even more extremely unlikely result) is 0.05 ( = 5% ).

Each of the X-values is a possible result. Each is a "statistic". The statistics on the tails are less likely than the ones closer to the middle of the curve. Do we need to examine this relationship more thoroughly, using Excel?

Slide50

When 30 coins are tossed, any number of heads is possible (from 0 to 30).

A result here is NOT significant: ≈ 28% chance of getting 17 or more heads.
A result here IS significant: ≈ 3% chance of getting 21 or more heads.
Significance point: 5% ( = our chosen "Alpha") chance of getting 20 or more heads, but...
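The tail chances quoted on this slide can be checked with the binomial distribution; a sketch (binom.sf(k - 1, ...) gives the chance of k or more heads from 30 honest coins):

```python
from scipy import stats

for k in (17, 20, 21):
    print(k, round(stats.binom.sf(k - 1, 30, 0.5), 3))
# 17 -> ≈ 0.292 (not significant); 20 -> ≈ 0.049 (the chosen 5% "Alpha" point);
# 21 -> ≈ 0.021 (significant)
```

Slide51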

A t-Test assumes that the Null Hypothesis is true, and then checks whether or not the Sample Avg being evaluated is in the red tail(s) or not (see next slide).

The wider curve is the distribution of the Raw Data in the theoretical Null Hypothesis population. The narrower curve is the distribution of 1000's of Sample Avgs (all of one size) taken from that theoretical Null Hypothesis Normal population; any of these are possible to be drawn by chance from the population.

Slide52

An Avg here is NOT significant: ≈ 25% chance of getting this avg or larger.
Significance point: 5% ( = our chosen "Alpha") chance of getting this Average or larger.
An Avg here IS significant: ≈ 3% chance of getting this avg or larger.
(The curve is the theoretical distribution of thousands of individual avgs taken from the population.)

Slide53

"p" values are typically given as part of the output of a statistical test or statistical evaluation, e.g.

Assuming the "null hypothesis" is true, a "p" value is the

probability of occurrence of the

observed

result

OR a result that's even more extreme.“P-value” explanation of t-TestsSlide54

"P-value" explanation of t-Tests

Are the means of the 2 samples that generated this output from Excel "significantly different" from each other? No, because the 2-tailed P-value is not 0.05 ( = 5% ) or less. We used the 2-tail value because the question had to do with being "different", not "larger" or "smaller". However, because the one-tailed p = 0.041, we can say that 97.106 is statistically smaller than 106.123 --- that seeming contradiction is why there are arguments !!!

Slide55

Instructions for a t-table version of a t-test:

Calculate the number of standard errors your observed mean is from the Null Hypothesis mean; this is your "observed t-statistic". Compare that value to the appropriate value in a t-table. If your observed t-statistic is equal to or greater than the t-table value, you have a "significant" result; otherwise, your result is not significant.

For example: Is the sample avg significantly different from 100 ?
Null Hypothesis = 100
Sample Avg = 107
Sample Std Dev = 10
Sample size = 9
Observed t-statistic = ( 107 − 100 ) / ( 10 / sqrt( 9 ) ) = 2.100
2-sided t-table value (at alpha = 0.05, df = 8) = 2.306
2.100 < 2.306, thus 107 is not statistically different from 100.
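A sketch of that recipe in Python (illustrative helper; stats.t.sf also yields the 2-tailed P-value, which the t-table method itself does not need but which ties the explanations together):

```python
from math import sqrt
from scipy import stats

def one_sample_t(avg, std_dev, n, null_value, alpha=0.05):
    t_obs = (avg - null_value) / (std_dev / sqrt(n))   # observed t-statistic
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)      # 2-sided t-table value
    p = 2 * stats.t.sf(abs(t_obs), df=n - 1)           # 2-tailed P-value
    return t_obs, t_crit, p

print(one_sample_t(107, 10, 9, 100))  # ≈ (2.100, 2.306, 0.069): not significant
```

Slide56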

The Significance of Significance: If it's significant or not significant, who cares, so what ?

An almost-century-old controversy revolves around the role that tests of significance should play in evaluating the results of an experiment. On one side of the controversy are those who preach that a "statistically significant" result indicates that the experimental results are important, and a "non-significant" result indicates that the results are not important. On the other side are those who argue that the importance of a result must be decided upon by the researcher him/herself. If he/she decides "important", then the degree of statistical significance indicates how much confidence one should have in the result, especially in regards to whether or not one should perform a confirmational study. If he/she decides "not important", then the researcher considers the statistical significance to be irrelevant.

Slide57

The Significance of Significance: If it's significant or not significant, who cares, so what ?

A big company and a small company each conduct clinical trials on their own unique, new, unapproved medical device, both of which are designed to extend the life of seriously ill patients. The results are...

                                        Big company &       Small company &
                                        large sample size   small sample size
Avg. life extension:                    3 months            36 months
Statistically different from 0.00 months?   Yes             No
FDA gives approval?                     Yes                 No

Subsequent big-company advertisements say the product is "Clinically shown to significantly extend life of patients!" Most scientists make the mistake of thinking a "statistically significant" result means it is of practical importance.

Slide58

The Significance of Significance: If it's significant or not significant, who cares, so what ?

[per Murphy & Myors, Statistical Power Analysis] "With a sufficiently large N, virtually any [ test ] statistic will be significantly different from zero, and virtually any null hypothesis that is tested will be rejected."

See STUDENT_SmplSize_vs_Significance.XLS for the following t-Test example: Null Hypothesis = 1000.0, SmplAvg = 999.9, SmplStdDev = 10.0, 1-tailed alpha = 0.05; if N = 30, not significant, but if N = 30,000, yes, significant.

Read... The Cult of Statistical Significance by Ziliak & McCloskey, 2008, Univ. of Mich. Press, Ann Arbor.

John Zorich's Solution (per Ziliak & McCloskey): if the observed difference is NOT of practical importance, then ignore a "significant" result; if the observed difference IS of practical importance, then ignore a "non-significant" result (and then repeat the study with a larger sample size).

Slide59

Understanding the Basis of Analysis of Variance ( ANOVA )

Slide60

Understanding the Basis of Analysis of Variance ( ANOVA )

Example of data to be analyzed in a "One-Factor ANOVA" test: Effect of Different Conditioning Methods on the Breaking Strength of cement briquettes* (lbs/in2)

Method#1   Method#2   Method#3
553        553        492
550        599        530
568        579        528
541        545        510
537        540        571

* data taken from Juran's Quality Control Handbook, 4th edition, p. 26.11

Slide61

Understanding the Basis of Analysis of Variance ( ANOVA )

Example of data to be analyzed in a "Two-Factor ANOVA" test (data from a Gage R&R study; values in inches):

             Part#1   Part#2   Part#3
Operator#1   0.123    0.127    0.135
Operator#2   0.121    0.124    0.130
Operator#3   0.124    0.127    0.134

Slide62

Understanding the Basis of Analysis of Variance ( ANOVA )

Example of data to be analyzed in a "Three-Factor ANOVA" test: Seal-Strength vs. Sealer Dwelltime, Temperature, Pressure (data from validation of a pouch sealer; values in lbs):

         Time#1           Time#2           Time#3
         P1   P2   P3     P1   P2   P3     P1   P2   P3
Temp#1   0.8  1.3  0.9    0.9  1.0  0.9    0.7  1.1  1.2
Temp#2   1.2  1.1  1.2    1.0  1.1  1.2    0.9  0.9  1.1
Temp#3   1.1  0.9  0.8    1.3  1.2  1.2    0.7  0.8  1.0

Slide63

Understanding the Basis of Analysis of Variance ( ANOVA )

For the sake of simplicity, we're going to discuss only One-Factor ANOVA; the principles & methods that we'll see here also apply to multi-factor ANOVA. We first need to discuss t-Tests, in order to understand some concepts needed for explaining ANOVA tests.

Slide64

Understanding the Basis of Analysis of Variance ( ANOVA )

t-Tests and ANOVA analyses both evaluate Sample Avg's, to see if they are significantly different from what would be expected if the Samples came from the same population. If the Samples come from the same population, then the differences between Sample Avg's are due to random chance, rather than caused by their coming from different populations that have truly different Avg's.

A t-Test (as used here) evaluates 2 sample averages. An ANOVA test evaluates more than 2 sample averages.
(FYI: With two samples, a 1-factor ANOVA gives the same p-value as a 2-tailed t-Test.)

Slide65

Understanding the Basis of Analysis of Variance ( ANOVA )

A two-sample t-Test is performed by dividing the difference between Sample Avg's by a value calculated from two estimates of the... Population Std Error of the Mean ( = SEM ). If that ratio is too large, then we reject the idea that the Samples came from the same Population. Tables of " t " are used to decide if the ratio is "too large".

Slide66

Understanding the Basis of Analysis of Variance ( ANOVA )

Standard Error of the Mean (SEM): SEM = Std Dev of the Population of all possible Sample Avg's (of a given Smpl Size, from a given Population). In practice, assuming you have at least 2 samples, SEM can be estimated one of 2 ways (as we saw during our discussion of "Standard Errors" earlier today):

SEM (theoretical formula) = Std Dev (n-1) of Sample Avg's
SEM (practical formula) = (Pooled) Sample Std Dev / Sqrt( Smpl Size )

Slide67

Understanding the Basis of Analysis of Variance ( ANOVA )

The (Pooled) Sample Std Dev is derived from a relatively simple equation found in any introductory Statistics textbook. That equation combines all the individual Sample Std Dev's into a single, better estimate of the Population Std Dev than is any single one of them. We don't have time to examine the equation today, but it's important to know that a Pooled Std Dev is NOT the Avg of the Sample Std Dev's !!
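For reference, a minimal sketch of that textbook equation (pooling the variances, weighted by their degrees of freedom; the function name is illustrative):

```python
from math import sqrt

def pooled_std_dev(std_devs, sizes):
    num = sum((n - 1) * s ** 2 for s, n in zip(std_devs, sizes))
    den = sum(n - 1 for n in sizes)
    return sqrt(num / den)

# Breaking-strength data from the next slide --- NOT the average of 12.1, 25.0, 29.4:
print(round(pooled_std_dev([12.1, 25.0, 29.4], [5, 5, 5]), 2))  # ≈ 23.35
```

Slide68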

Understanding the Basis of Analysis of Variance ( ANOVA )

EXAMPLE OF A ONE-FACTOR ANOVA ANALYSIS: Effect of Different Conditioning Methods on the Breaking Strength of Cement Briquettes (lbs/in2)

Method#1   Method#2   Method#3
553        553        492
550        599        530
568        579        528
541        545        510
537        540        571
--------   --------   --------
550        563        526     = Average
5          5          5       = Sample size
12.1       25.0       29.4    = StdDev (n-1)

Slide69

Understanding the Basis of Analysis of Variance ( ANOVA )

ANOVA table for the "Breaking strength" data:

          SS       df    MS      F      P
Total     10,053   14
Between    3,509    2    1,754   3.22   0.076
Within     6,544   12      545

If the P value is equal to or less than your chosen "alpha" value, then there is a statistically significant difference between the means of the 3 samples.

Understanding the Basis of Analysis of Variance ( ANOVA )

Mean Square ( MS ): Each MS in an ANOVA table is really a variance (variance = square of the standard deviation). In our example: the " F " ratio = MS(between) / MS(within) = Variance(between) / Variance(within). If that ratio is too large, then we reject the idea that the Samples that generated the variances came from the same Population. F-tables are used to determine whether or not F-ratios are "too large".

Slide71

Understanding the Basis of Analysis of Variance ( ANOVA )

From earlier in this discussion, we have these equations:
1) (Pooled Smpl Std Dev) = Population Std Dev
2) SEM = (Std Dev (n-1) of Smpl Avg's)
3) SEM = (Pooled Smpl Std Dev) / Sqrt( Smpl Size )
Rearranging equation 3), we have...
4) Sqrt( Smpl Size ) x (SEM) = (Pooled Smpl Std Dev)
Substituting the definition of SEM from 2) into 4), we have...
5) Sqrt( Smpl Size ) x (Std Dev (n-1) of Smpl Avg's) = (Pooled Smpl Std Dev)
Substituting the "Pooled" definition from 1) into 5), we have...
6) Sqrt( Smpl Size ) x (Std Dev (n-1) of Smpl Avg's) = Population Std Dev

The two sides of equation 6) are 2 different estimates of the Population Std Dev.

Slide72

Understanding the Basis of Analysis of Variance ( ANOVA )

Applying those equations (on the previous slide) to the "Breaking Strength of Cement Briquettes" data, we have...

           Method#1   Method#2   Method#3
Sample...  550        563        526     = Average
           5          5          5       = Size
           12.1       25.0       29.4    = StdDev (n-1)

StdDev (n-1) of the Sample Averages = 18.7
Estimate#1 of Population StdDev = sqrt( 5 ) x 18.7 = 41.89
Estimate#2 of Population StdDev = Pooled Sample Std Dev = 23.35

Slide73

Understanding the Basis of Analysis of Variance ( ANOVA )

Estimate#1 of population StdDev = 41.89
Estimate#2 of population StdDev = 23.35

F = (Estimate#1)² / (Estimate#2)² = (41.89)² / (23.35)² = 3.22

which is the same value derived from the classic ANOVA calculations described earlier; and of course, this works in an F-test, because Variance = ( Std Dev )².

Slide74

Understanding the Basis of Analysis of Variance ( ANOVA )

Conclusion: Instead of viewing ANOVA in terms of unfamiliar terms (such as: sums of squares, degrees of freedom, and mean squares), view it as a comparison of two estimates of the Population Standard Deviation. One estimate is derived from the Standard Deviation of the Sample Averages, and the other is derived from the Pooled Standard Deviation of the Samples' individual data points. If the ratio of the squares of the two estimates is too large (as determined by an F-test), then we conclude that not all the samples came from the same population.

Slide75

Choosing Sample Size based upon POWER Calculations (focus will be on t-Tests)

Slide76

Statistical Power

Power (which also is referred to as either " 1 – Beta " or " 1 – ß ") is a number between 0 and 1 ( = 0 to 100%) that you calculate (or choose) based upon...
Your choice of Confidence ( = 1 – Significance)
Your choice of Sample Size
Your choice of the difference (between the populations) that is important to detect
Std Deviation of the data (estimated or known)
...that represents the probability of "detecting" an important difference when there really is one (where "detecting" means the test result is "significant").

(e.g.) Power is the probability of concluding that there is a difference, when in fact there is an important difference. (Compare to Significance.)

Slide77

When is power important?

(As we shall see on the next several slides...) Calculation of "statistical power" is important only if the actual or desired conclusion from your significance test is that the "Null Hypothesis could very possibly be true" and that the "Alternate Hypothesis is most likely not true". Another way to say the same thing is... Calculation of "statistical power" is important only if the actual or desired conclusion is that there is "no statistically significant difference" between the sample result and the Null Hypothesis value.

Slide78

This is the distribution of Possible Sample Averages, assuming that the Parameter Mean = 100. (Curve centered on SAMPLE AVG = 100.)

Slide79

This represents the situation if we were to perform a 1-tailed t-test of whether or not the Parameter Mean = 100. In this case, if we obtain a sample average of ≈ 107 or larger, we have a "significant" result.

Slide80

What if (unbeknownst to us) the Parameter Mean = 106 (that is, we assume incorrectly that the Parameter Mean = 100)? There is a Δ = 6 between these 2 populations.

Slide81

If the true Mean = 106, a t-test that assumes Mean = 100 won't have a good chance of being "significant" (see next slide).

Slide82

If the true Mean is 106, a t-test that assumes Mean = 100 will reject that false assumption ≈ 45% of the time. Power ≈ 45% to detect Δ = +6.

[per Murphy & Myors] "The power of a statistical test is the proportion of the [ true ] distribution of test statistics... that is above the critical value used to establish statistical significance."

Slide83

"Power" increases as we increase the sample size from 2 (on the previous slide) to 6 (on this slide). Power ≈ 80% to detect Δ = +6.

Slide84

"Power" increases as sample size increases (or if the Std Error is reduced some other way). Power ≈ 100% to detect Δ = +6.

[per Murphy & Myors] "...the effects of sample size on statistical power are so profound that it is tempting to conclude that a significance test is little more than a roundabout measure of how large the sample is. If the sample is sufficiently small, then [we] never reject the null hypothesis. If the sample is sufficiently large, [we] always reject the null hypothesis."

Slide85

Statistical Power : t-Tests in general

There are a huge number of different t-tests, each with a different way to calculate their own "standard error". To estimate "power" for any t-test, use the test's Std Error & Null Hypothesis to draw the Null Hypothesis curve, and then draw an identically shaped one in the location of the Alternate Hypothesis (all as we did on a previous slide).

Statistical Power : General Comments

(per Murphy & Myors) "...power of 0.80 or above is usually judged to be adequate. The 0.80 convention is arbitrary (in the same way that significance criteria of 0.05 or 0.01 are arbitrary), but it seems to be widely accepted."

Power is most useful in planning, so that you don't spend time and money on a study only to find out it was doomed from the beginning. All major statistical software programs (such as StatGraphics) will calculate power for any test that the program can perform. Some textbooks explain the concept of Power and how to hand-calculate power values by using formulas and tables found in the books (e.g., Cohen, Kraemer, or Murphy).

Slide87

For a specific test, computer programs typically output a "Power Curve" chart, with "power" on the Y-axis and "sample size" on the X-axis.

Slide88

...and/or they output a "Power Curve" chart, with "power" on the Y-axis and some version of the alternate hypothesis on the X-axis (here, Δ is used on the X-axis).

Slide89

" t-Tests" require "normal" data !!

( The text above is a scanned image from

William Mendenhall,

Intro. to Probability

& Statistics

, 5th ed., p. 281 )

Is “nearly” good enough?Slide90

" t-Tests" require "normal" data !!

( The text above is a scanned

image from C. L. Chiang's

Statistical Methods of Analysis

, 2003 by World Press )

( in regards to the theorem that underlies the use of the t-Test ...)Don’t we want to be sure our inferences are “valid”?(Central Limit Theorem applies to Means, not to raw data !!)Slide91

The "original data" (below) is not Normally distributed (it has an "inverse normal" distribution, based upon an analysis not shown here). Therefore, a t-test on the original data gives the wrong answer for significance and power !!

Slide92

Reliability Calculations

Slide93

Regulatory requirements:

Product safety regulations (e.g., MDD) require that "risks" be "acceptable" when weighed against "benefits", which are described in "risk management" docs (e.g., an FMEA). ISO 13485 (one such regulation) requires that the "output of risk management" be used as "Design... Input".

For example, if your FMEA states that the risk of failure of a given component is acceptable only if mitigation (e.g., process improvement) reduces the frequency of failure to 0.1% at a confidence of 95%, then you must perform Verification studies on that component to prove it has 100% − 0.1% = 99.9% reliability at 95% confidence, or... You must update your FMEA with the reliability results you observe in your Verification study, and then decide if (at the new frequency-of-failure level) the risk is still "acceptable".

Slide94

NOTE: The field of reliability statistics is vast. Today, we will not cover system reliability, although we will discuss "mean time between failures" and "mean time to failure" (MTBF and MTTF), which are concepts typically associated with electronic finished goods. Rather, we will primarily discuss component failures, and will focus on calculation of the time or stress level at which the first failures occur in a population (e.g., the stress level at which the first patients are put at risk of injury or death by a failing medical device component, or the time when the first space-shuttle tile falls off during re-entry to Earth's atmosphere).

Slide95

Definitions of " Failure

" and "

Reliability

"

In many of the slides in this section of the class, the words

" Failure " and " Reliability " are used. By "Failure" is meant that an individual component or product has been put on-test or under inspection and has either not passed specification or has literally failed (e.g., broke, separated, or burst -- it may have passed spec but then been taken past spec, until it eventually failed) --- which meaning is intended is obvious (or should be !!) in each situation."Failure Rate" refers to the % of a lot or sample that has failed in testing, so far (that is, up to a given stress level).By "Reliability" is meant the % of the lot that does not exhibit "failure" (Reliability = 100% minus the Failure Rate)…AT OR BELOW A SPECIFIC STRESS LEVEL (a level that is typically set equal to the “QC” specifications)Slide96

"Confidence" = 1

Significance

Therefore, Confidence is a value between 0 and 1

( = 0% and 100% )

Typical desired values are 95% or 99%"Confidence" represents the probability that you are "right" when you make a statistical claim such as... "This product is 99.99% reliable".Reliability calculations are really lower 1-sided confidence limits on the observed % in-specification; and (as we saw previously) because confidence limits are automatically adjusted based upon sample size, ANY sample size if valid.Slide97

ATTRIBUTE DATA: Pass/Fail testing, 0 or more Failures

Method: Beta Equation (see Krishnamoorthy, Handbook... p. 38). Don't use Dovich's Reliability Statistics beta table (many errors!)

= betainv ( 1 – C , N – F , F + 1 )
where...
C = Confidence desired (expressed as a decimal fraction)
N = sample size
F = # of failures seen in the sample

That formula outputs the lower 1-tailed "exact" binomial confidence limit on the success rate (see conf. limit discussion).

If no failures in a sample of 299, then 95% confidence in...
= betainv( 1 – 0.95 , 299 – 0 , 0 + 1 ) = 0.99 = 99% reliability
If 2 failures in a sample size of 30, then 95% confidence in...
= betainv( 1 – 0.95 , 30 – 2 , 2 + 1 ) = 0.80 = 80% reliability

Why does a sample of 299, with zero failures, equal 95% confidence of at least 99% reliability?

A reliability calculation on a binomial proportion is, in effect, a lower 1-sided confidence limit on the observed proportion. It's the lower-most edge of the interval in which we predict we will find the true ("parameter") proportion. Here, the Sample Statistic is 100%, and the Lower 1-tailed 95% Confidence Limit on the Sample Statistic, when N = 299 and no failures are found in the sample, is 99%.

We are 95% sure that the Parameter is somewhere in this interval; for reliability, we get to claim the worst value in that interval (in this case, 99%).

Slide99

ATTRIBUTE DATA: Pass/Fail testing, 0 or more Failures

If the Sample Size is 100% of the lot, use the BetaInv formula; but if between 1% & 100% of the Lot Size, reliability is more accurately calculated using the Hypergeometric function (but it is more work!!):

Confidence = 1.00 – SUM( hypgeomdist( F, N, D, P ) ), summed from F = 0 to F = # of failures seen in Sample
N = Sample size
D = P x ( 100% – %Reliability to be determined )
P = Population Size
Keep modifying "%Reliability" until Confidence = 95%.

If F = 2, P = 300, N = 30, then %Reliability ≈ 80.9% (vs. 80.5% if we use the binomial approximation method); that is...
0.001 = hypgeomdist( 0 , 30 , 300 x ( 1 – 0.809 ) , 300 )
0.010 = hypgeomdist( 1 , 30 , 300 x ( 1 – 0.809 ) , 300 )
0.038 = hypgeomdist( 2 , 30 , 300 x ( 1 – 0.809 ) , 300 )
Sum ≈ 0.05 (subtracted from 1.00 equals 95% Confidence)

Variables Data, Normally Distributed

Situation: Based on analysis of prior R&D work, failures are believed to be "Normally Distributed"; you likewise have estimates of the Mean and StdDev (e.g., from R&D work).

Method (e.g.): K-factor Table for "Normal" data (e.g., see Juran's Q-Handbook, Table V). To use the table, calculate the "Observed K". Then, compare the "Observed K" to the K in the "Normal" K-factor Table.

"Observed K" = number of Std Deviations that the Process Mean is from the nearest side of a 1- or 2-sided specification, i.e.,
| SmplAvg – NearestSpecLimit | ÷ SmplStdDev

You can claim the confidence and reliability that is associated with a given normal K-table value, if your "observed K" is equal to or greater than the K-table value. See next slide >>

Slide101

( K-factor table from Juran's QH, showing K vs. confidence and reliability. )

If the "observed K" is at least 3.520, & the population is "Normally Distributed", & the sample size is 15, then we are 95% confident that the Lot from which the sample came has at least 99% in-spec parts.
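That table entry can be cross-checked: the exact 1-sided normal-tolerance K-factor is computable from the noncentral t distribution (a standard result in tolerance-interval texts, though not necessarily how Juran's table was generated); a sketch:

```python
from math import sqrt
from scipy import stats

def k_factor_one_sided(n, reliability=0.99, confidence=0.95):
    nc = stats.norm.ppf(reliability) * sqrt(n)         # noncentrality parameter
    return stats.nct.ppf(confidence, n - 1, nc) / sqrt(n)

print(round(k_factor_one_sided(15), 3))  # ≈ 3.520, matching the table value above
```

Slide102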

Why does a sample of 15, whose average is 3.52 std dev's away from a 1-sided QC spec, equal 95% confidence of at least 99% reliability?

A reliability calculation on a sample from a Normal population is, in effect, a lower 1-sided confidence limit on the % in-spec that would be calculated from the observed K. It's the lower-most edge of the 95% confidence interval of % in-spec.

Observed Sample Statistic: =NORMSDIST( 3.52 ) = 99.98% in-spec --- but no confidence can be claimed for this statement! When the sample Avg is 3.52 stdevs from the one-sided spec, we can claim 99% reliability at 95% confidence. We are 95% sure that the Parameter % in-spec is somewhere in the interval at or above 99%; for reliability, we get to claim the worst value in that interval (in this case, 99%).

Slide103

( K-factor table from Juran's QH, showing K vs. confidence. )

Use this table for the Class Exercise on the next slide.

Slide104

Class exercise:

Using K-factor Tables, determine what is the reliability, at 95% confidence, of a population from which this sample was taken (assume the population is "normal"):
Sample size = 20
Avg = 1000
StdDev = 10
2-Sided specification = 950 to 1040

Answer: The sample mean is 4 StdDevs from the nearest specification limit ( 1040 – 1000 = 40, and 40 / 10 = 4 ). And, on the sample size = 20 line (on the K-table), under 95% confidence, K = 4.000 is midway between 99% and 99.9% reliability. Therefore, we are 95% confident that the product is more than 99% reliable.

Slide105

K-factors give accurate reliability estimates only if the raw data has a " normal " distribution. (The text below is a scanned image from Juran's Quality Control Handbook.)

Slide106

Basing reliability on the correct distribution is critical, because you're always concerned with the tail regions, & slight differences in the shape of a tail make huge differences in reliability estimates (99.1% vs. 99.9% may be the difference between deciding to launch a new product or not !!).

(The figure is the distribution curve for individual values in a hypothetical "normal" population, as estimated from the mean & std deviation of one sample; the QC Spec is 1.00.)

Slide107

"

Cumulative Distribution

"

Sometimes this is referred to as an " S " curve.Slide108

Normal Probability Plotting Paper

In the pre-computer days, you would use special graph paper, called Normal Probability Plotting ( NPP ) paper, to determine if data were "normal". "Normal" data plots as a straight line on NPP paper. What NPP paper does, in effect, is to straighten out the "S"-shaped cumulative "probability plot" curve we saw on the previous slide. Different versions of NPP paper are shown on the next two slides, followed by a way to create such paper using MS Excel.

Slide109

This is an S-curve and an NPP plot that have been combined by using 2 different Y-axes: the "S" curve of the Cumulative Normal Probability Distribution (left Y-axis, running 0.0 to 1.0), and the straight line of the Normal Distribution on "Normal Probability Plotting Paper" (right Y-axis, running 0.01 to 0.99 --- this axis never gets to 1.000).

Slide110

NORMAL PROBABILITY PLOTTING PAPER

(Figure: Y-axis = Cumulative %; X-axis runs 70, 80, 90, 100, 110, 120, 130; the plotted line is for Avg = 100, StdDev = 10, & a Normal Distribution; gridlines at Z = -2, -1, 0, 1, 2.)

Z values allow use of Linear Regression. These Z values are taken from a "Normal Distribution Z-table"; however, MS Excel can give them to us automatically (more about this, soon). The Y-axis never gets to 100% (this is NOT a "log" scale !!).

Slide111

Definition of "

F

"

Regarding the use of NPP paper, textbooks provide various transformation of % Cumulative values; such transformations are called “

plotting positions”; one purpose for them is to allow all data, even the "100%" point, to be plotted onto the Y-axis of NPP paper. In textbooks on Reliability Statistics, such a "plotting position" is given the symbol " F ". There are many different formulas for F, but a commonly used one is... F = Median Rank = ( Rank – 0.3 ) / ( SampleSize + 0.4 )where “rank" of the lowest value in the data set = 1, next lowest value = 2, next lowest = 3, and so on.A “more accurate and theoretically justified” calculation (per one of the authors of Applied Reliability) is (using Excel)... F = BETAINV ( 0.5 , Rank , SampleSize – Rank + 1 )Slide112

Plot "F" on a Probability Plot

(Figure: F plotted against the data values for ranks 1 through 10.)

Slide113

"Normal Probability Paper" using MS Excel

Create an X,Y chart with...

X-axis = the observed measurement values

Y-axis = Z ( F )

To calculate Z(F), do the following:

Arrange measurements in order of magnitude."Rank" of the lowest value = 1, next = 2, & so on. " F " = ( Rank – 0.3 ) / ( Sample Size + 0.4 )" Z (F) " is = Normsinv( F ) the MS Excel functionIf Sample Size = 10, then " F " for the lowest rank = (1 – 0.3 ) / (10 + 0.4 ) = 0.7 / 10.4 = 0.067, and therefore Z(F) = Normsinv( F ) = – 1.50, whereas..." F " for the highest rank =(10 – 0.3 ) / (10 + 0.4 ) = 9.7 / 10.4 = 0.933, and therefore Z(F) = Normsinv( F ) = + 1.50Slide114

Data from a few slides ago, plotted using cumulative Z-table values of F ( that is, " Z(F) " ).

Slide115

Y-axis = Z( F ) = Normsinv( F ); X-axis = Xi.

How straight do data points have to be, before claiming that they fit a straight line? John Zorich's personal choice is that the Correlation Coefficient must be at least 0.975, but preferably 0.99 or larger! (Juran says to use your "judgment...the sample is never a perfect fit".)

This is actual data; if the data were "normally distributed", the plotted points would lie on a straight line (on this electronic version of NPP paper).

Slide116

An important point:

Mathematical methods, including statistical ones, do not require "data". They require "numbers". Said differently, the "data" inputted to some statistical methods do not have to exhibit normality, but the "numbers" do. That is, if the "data" are not normal, then transform them into "normal" numbers.

Slide117

Normal Probability Plotting Transformations

In the pre-computer-era of the 20th Century, if data was not straight on regular NPP paper, then transformed NPP paper would be used (sometimes provided at the back of textbooks). One such paper plots Cumulative % vs. SquareRoot( DATA ): if data is straight on that NPP paper, then the data distribution is "SquareRoot-Normal". Another plots Cumulative % vs. Log( DATA ): if data is straight on that NPP paper, then the data distribution is "Log-Normal".

Slide118

(Figure: the upper curve, on an X-axis running 100 to 10,000, has been "transformed" into the lower curve, on an X-axis running 2 to 4, by taking the Log of each data point.)

Slide119

The % of data that is outside the Specification Interval ( = 100 to 10,000 ) is the same, no matter how the data is "transformed".

X         Log( X )   Sqrt( X )   1 / X
5         0.70       2.2         0.20000
50        1.70       7.1         0.02000
100       2.00       10.0        0.01000
10,000    4.00       100.0       0.00010
50,000    4.70       223.6       0.00002
99,000    5.00       314.6       0.000010

In this case, 2/3 of the data is out of spec, whether or not the data is transformed, and no matter how the data is transformed.

Slide120

This is a scanned image from Juran's Q-Handbook ("Basic Statistical Methods" section), which says there that: "These convenient methods do require judgment (e.g., how 'straight' must the [normal probability plot] line be?) because the sample is never a perfect fit...."

Data is "Lognormal" if it plots straight on Normal Probability Plotting Paper after the X-axis values are transformed by this formula.

Slide121

Non-normal Data

The file called "Student Normal Transformations" uses most of the formulas on the previous pages to create a series of charts that you can use to help determine if data can be transformed into "normality".

Slide122

Actual data from presenter's client...

Slide123

continued from previous slide...

This is the Excel equivalent of a Normal Probability Plot (data is "Normal" if it shows as a straight line on this plot). In reliability statistics textbooks, a plot like this, or one that is not even as straight as this, is sometimes shown as an example of a "Normal" distribution; but... even though this data does "pass" the best "tests" for Normality (Anderson-Darling A2*, Cramer-von Mises W2*, and Shapiro-Francia W'), with test p-values all > 0.425, ... and even though the correlation coefficient is very high... this plot is slightly curved; and therefore this data is not truly normal (it is almost Normal). Is "almost" good enough for critical products?

Slide124

continued from previous slide...
The "inverse" ( = 1 / X ) transformation gives a much straighter line on "Normal Probability Plotting" paper, and so the distribution is "Inverse Normal" rather than "Normal".
Slide125

continued from previous slide...
[Figure: observed reliability of this data analyzed untransformed vs. with the 1 / X ( = "inverse" ) transformation, each at 95% confidence --- see next slide.]
Slide126

Untransformed, the average on the previous slide is less than 2.2 StdDev from the Spec (using the K-factors in Juran's Q-Handbook, at 95% confidence), and so is less than 90% reliable when the population is incorrectly assumed to be "normal"...
...but is almost 4.9 StdDev from the Spec if inverse-transformed, and so is almost 99.9% reliable (when analyzed correctly).
If the input numbers are not "Normal", you get the wrong answer!
Slide127
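For orientation, here is a hedged sketch of where such K-factors come from: a standard textbook formula for the one-sided normal tolerance factor, computed from the noncentral t distribution. This is not the workshop's own spreadsheet or Juran's table itself, and the sample size of 12 is simply the size of the "actual data" set discussed nearby:

```python
# One-sided normal-tolerance K-factor via the noncentral t distribution.
from math import sqrt
from scipy.stats import nct, norm

def k_factor(n, reliability, confidence):
    """Tolerance factor k: if (spec - avg) / stdev >= k, then at least
    `reliability` of a normal population is in-spec, with `confidence`."""
    delta = norm.ppf(reliability) * sqrt(n)          # noncentrality parameter
    return nct.ppf(confidence, df=n - 1, nc=delta) / sqrt(n)

print(round(k_factor(12, 0.90, 0.95), 3))   # compare to a published K-table
```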

MTTF and MTBF
The discussion of MTTF and MTBF on this and the following slides applies to testing of electronic products that occurs AFTER burn-in has eliminated cases of "infant mortality". That is, these calculations are valid only for the second part of a product's lifetime, when failures are random and the failure rate is relatively constant, and when therefore those failures are accurately modeled by the "Exponential Distribution".
Typically, assessment of "Exponentiality" is done the same way as testing is done for "Normality", i.e., with probability plots, as shown on the next slide.
Slide128

Is data exponentially distributed?
30 devices put on-test; 9 failures occurred at these # of hours: 367, 422, 476, 508, 552, 589, 642, 683, 738.
Create a plot of Exponential(F) vs. the raw data; if the line appears straight, the data can be considered exponential.
F = as defined on previous slides (using N = 30, not N = 9)
Exponential(F) = Ln ( 1 / ( 1 – F ) )
Slide129
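A minimal sketch of those plotting coordinates in Python (not from the workshop materials); plot y against the failure times and judge straightness:

```python
# Exponential probability-plot coordinates for the 9 observed failures.
from math import log

failures = [367, 422, 476, 508, 552, 589, 642, 683, 738]
N = 30                                    # units on test, not the 9 failures

for rank, t in enumerate(failures, start=1):
    F = (rank - 0.3) / (N + 0.4)          # same plotting position as before
    y = log(1.0 / (1.0 - F))              # Exponential(F) = Ln( 1 / (1 - F) )
    print(f"t = {t:4d} h   Exponential(F) = {y:.4f}")
# An approximately straight line of y vs. t supports exponentiality.
```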

MTTF and MTBF
MTTF = Mean Time To Failure. This term applies to products that are not repairable; that is, once the product fails, it cannot be repaired.
MTBF = Mean Time Between Failures. This term applies to products that are repairable; that is, such a product may fail (and be repaired) multiple times during its "lifetime". Possibly a better term for this might be "Mean time between repairs".
Slide130

MTTF and MTBF
MTTF = Mean Time To Failure. It is calculated by adding up all the time the on-test devices were functioning correctly during the study, and then dividing by the total number of failures observed during the study. After each failure, the device is taken out of service.

  Device   Failure (indicated by "X")   Hours in service
  # 1      X                             400
  # 2      X                             100
  # 3      (none)                        500
  # 4      X                             200

MTTF = ( 400 + 100 + 500 + 200 ) / 3 failures = 400 hours
Slide131

MTTF and MTBF
MTBF = Mean Time Between Failures. It is calculated similarly to MTTF, but takes into consideration that repairs have occurred; all devices are typically in-service for the same length of time. After each failure, the device is quickly repaired & put back into service.

  Device   Failures (each indicated by "X")   Hours in service
  # 1      X X                                 500
  # 2      X                                   500
  # 3      (none)                              500
  # 4      X X                                 500

MTBF = ( 500 + 500 + 500 + 500 ) / 5 failures = 400 hours
Slide132

class exercise (taken from a reliability textbook): Calculate MTTF and MTBF

MTTF: 30 devices put on-test; 9 failures occurred at these # of hours: 367, 422, 476, 508, 552, 589, 642, 683, 738. After failure, each failed device was NOT put back into service. After the 9th failure, the study was terminated (as planned).
ANSWER:
  Sum of the 9 failure hours = 4,977
  Sum of non-failure hours = ( 30 – 9 ) x 738 = 15,498
  MTTF = ( 4977 + 15498 ) / 9 = 2,275 hours

MTBF: Using the same data as above, assume all failed devices were immediately repaired (or replaced with a good device, which is the same thing, for purposes of MTBF calcs) and put back into service; all 30 devices were then in-service for 999 hours each.
ANSWER:
  MTBF = ( 30 x 999 ) / 9 = 3,330 hours
Slide133
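The same arithmetic as a short Python sketch (again, not the instructor's spreadsheet):

```python
# Reproducing the class-exercise MTTF and MTBF arithmetic.
failure_hours = [367, 422, 476, 508, 552, 589, 642, 683, 738]
n_on_test, n_failures = 30, len(failure_hours)

# MTTF: study stopped at the 9th failure; the 21 survivors each ran 738 h.
total_time = sum(failure_hours) + (n_on_test - n_failures) * failure_hours[-1]
print(total_time / n_failures)            # 2275.0 hours

# MTBF: all 30 repaired/replaced devices ran the full 999 h.
print(n_on_test * 999 / n_failures)       # 3330.0 hours
```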

Confidence Limits on MTTF & MTBF
There is no difference in the way Confidence Limits are calculated for either MTTF or MTBF; but consideration does need to be given to how the study was conducted...
Virtually all studies are RIGHT CENSORED (that is, ended before all the on-test devices have failed).
"Type I" censored studies are terminated after a pre-defined time (e.g., # of hours, or # of cycles), as in the MTBF example on the previous slide.
"Type II" censored studies are terminated after a pre-defined number of failures have occurred in all the on-test devices combined, as in the MTTF example on the previous slide.
Slide134

Confidence Limits on MTTF & MTBF
To calculate MTTF or MTBF confidence limits, we use a formula from the Chi-squared distribution, because 2 x (total test time) / (the true MTTF or MTBF) is known to be "Chi-squared" distributed.
The shape of such a distribution changes drastically, depending upon the # of device failures in the sample.
[Figure: Chi-squared probability curves, "few failures" vs. "many failures", plotted against the calculated chi-square value.]
As the number of failures becomes large, the distribution of the chi-square statistic takes on a more "Normal" shape. But if you want your calculation to be as accurate as possible, you should resist the temptation to use the Normal approximation calculation instead of a Chi-squared one.
Slide135

Confidence Limits on MTTF & MTBF
We could calculate 2-sided intervals and limits, but typically those are not what are sought in a reliability study. Typically, we want to know how bad the product might be. That is, we want to know what MTTF or MTBF we can claim, based on our data --- to do that, we calculate their...
LOWER 1-SIDED CONFIDENCE LIMITS:
  Type I study:  2 x T / Chiinv( 1 – Conf, ( 2 x F ) + 2 )
  Type II study: 2 x T / Chiinv( 1 – Conf, 2 x F )
Where
  T = Total in-service test time (all devices combined)
  Chiinv = the Excel function
  Conf = desired confidence (as a decimal fraction)
  F = Total number of failures (all devices combined)
Slide136

class exercises: Calculate MTTF & MTBF conf. limits
Calculate lower 1-sided confidence limits for the MTTF & MTBF answers to the "class exercise", a few slides back.
Lower 1-sided 95% confidence limit on the MTTF = 2,275 is...?
  = 2 ( 4977 + 15498 ) / Chiinv( 1 – 0.95, 2 x 9 ) = 1,418 hours
  (using the formula for a Type II censored study)
Lower 1-sided 95% confidence limit on the MTBF = 3,330 is...?
  = 2 ( 30 x 999 ) / Chiinv( 1 – 0.95, ( 2 x 9 ) + 2 ) = 1,908 hours
  (using the formula for a Type I censored study)
Slide137
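The same limits in Python (a sketch; note that Excel's right-tailed CHIINV(1 − Conf, df) equals scipy's left-tailed chi2.ppf(Conf, df)):

```python
# Lower 1-sided confidence limits on MTTF and MTBF via chi-squared.
from scipy.stats import chi2

conf, F = 0.95, 9

T_mttf = 4977 + 15498                    # Type II (failure-censored) study
print(2 * T_mttf / chi2.ppf(conf, 2 * F))        # ~1418 hours

T_mtbf = 30 * 999                        # Type I (time-censored) study
print(2 * T_mtbf / chi2.ppf(conf, 2 * F + 2))    # ~1908 hours
```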

Reliability Calculations based upon MTTF or MTBF
The following formulas predict, with confidence, either the % of the population that has not yet experienced failure after T hours of use (or after T number of cycles), or the probability of an individual device surviving to T without experiencing a failure.
Lower 1-sided RELIABILITY CONFIDENCE LIMIT:
  = e ^ ( − T / MTTF(or MTBF)_Confidence_Limit )
Where
  T = time (or cycles) at which to calculate reliability
  e = the base of the natural logarithm = 2.718282...
For example, if the lower 1-sided 95% confidence limit for MTTF is 1418 hours, then we can be 95% confident that e ^ ( − 500 / 1418 ) = 0.7029 = 70.29% of the population will run without failure for 500 hours, or that a single device has a 70.29% chance of not failing in 500 hours.
Slide138
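And that last example as a few lines of Python (a sketch using the limit computed above):

```python
# Reliability at T = 500 h, given the lower 95% confidence limit on MTTF.
from math import exp

mttf_lower_95 = 1418                     # hours, from the earlier exercise
print(exp(-500 / mttf_lower_95))         # ~0.7029, i.e. ~70.29% reliability
```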

Tests for Normality -- primary references
  A Normal Distribution Course, by J. Gross (Peter Lang GmbH, 2004)
  Testing for Normality, by H. C. Thode (Marcel Dekker Inc., 2002)
  Applied Reliability, by Tobias & Trindade (Chapman & Hall, 2nd ed., 1995)
  How to Test Normality & Other Distributional Assumptions, by S. S. Shapiro (ASQC Press, 2nd ed., 1990)
  Goodness-of-Fit Techniques, by D'Agostino & Stephens (Marcel Dekker Inc., 1986)
There are dozens of different tests for normality! Thode: "In our review of the literature, we found more tests than we ever imagined existed." Shapiro: "Unfortunately, there is no one overall 'best' test."
Slide139

Tests for Normality
All commercial statistical packages provide Normality Tests. For example, StatGraphics Centurion XV provides:
  Kolmogorov-Smirnov D
  Kuiper's V
  Cramer-von Mises W2
  Watson U2
  Anderson-Darling A2
Such tests typically involve simple algebraic calculations and subsequent comparisons of the results to values in tables. How to perform the tests is detailed in the reference texts mentioned at the end of this webinar. See also the explanations in the "Normality Tests…" demo-spreadsheet found on the statistics page at www.JOHNZORICH.com.
Slide140
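As a hedged aside (scipy's menu of tests differs slightly from StatGraphics'), several such tests can be run in a few lines of Python. The data values are illustrative, and note that the Cramer-von Mises p-value is only approximate when the mean and StdDev are estimated from the data itself:

```python
# Running a few normality tests in scipy.
import numpy as np
from scipy import stats

data = np.array([9.8, 10.1, 10.3, 9.7, 10.0, 10.6, 9.9, 10.2, 10.4, 9.5])

print(stats.shapiro(data))                  # Shapiro-Wilk W and p-value
print(stats.anderson(data, dist="norm"))    # Anderson-Darling A2 + critical values
print(stats.cramervonmises(                 # Cramer-von Mises W2
    data, "norm", args=(data.mean(), data.std(ddof=1))))
```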

Tests for Normality
[Slides 141 & 142 were image-only examples; their content is not reproduced in this transcript.]
Slide142

WARNING about Tests for Normality
Data that is obviously non-normal might not fail a "test for normality". Your best bet is to rely upon the shape of the Normal Probability Plot to help you decide if data is non-normal.
For example, as we saw above in the example involving the use of K-tables on 12 "actual data" points, that data showed an obviously curved line on NPP paper. However, none of the available tests reject normality for that data, as we see on the next slide...
Slide143

Normality tests on the non-normal "actual data" that we saw above:
[Table of normality-test results for that data; no test rejects normality.]
Slide144

Recommendations....
The recommendations by Gross, Thode, Tobias, Shapiro, D'Agostino, and John Zorich can be summarized as follows (a software sketch of this workflow appears after the list):
1. Plot the data on Normal Probability Plotting paper. If that plot looks curved (even slightly!), the data set is definitely not Normally distributed.
2. If non-Normal, try Normal Probability Plots of transformations of the data, looking for a straight line. Choose the straightest plot.
3. Only then, perform one or more of the most highly recommended "Tests for Normality" on the (transformed?) data. If the test(s) pass, you can assume Normality in subsequent statistical analyses of the (transformed?) data (e.g., Normal K-tables, ANOVA, t-Tests, & Cp or Cpk --- all of which have Normality as a requirement for using the method).
4. If no plot looks straight, even after transformation, then use "Reliability Plotting" or "Non-Parametric" methods.
Slide145
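A minimal sketch of steps 1-3 (illustrative data and an assumed menu of transformations; not the workshop's spreadsheet):

```python
# Pick the straightest NPP among several transforms, then test only that one.
import numpy as np
from scipy import stats

def straightness(values):
    return stats.probplot(values, dist="norm")[1][2]   # NPP correlation r

data = np.array([12.0, 15.0, 19.0, 25.0, 33.0, 45.0, 60.0, 85.0, 120.0, 180.0])
candidates = {"raw": data, "sqrt": np.sqrt(data),
              "log": np.log(data), "inverse": 1.0 / data}

best = max(candidates, key=lambda name: straightness(candidates[name]))
print(best, stats.shapiro(candidates[best]))   # normality test on the winner
```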

(Normality) In conclusion...
Just because raw data does not fail a "test for normality" does not mean it really is "normal". Incorrectly concluding normality often leads to incorrectly rejecting product (in the experience of the presenter).
It's simple to use Normality Tests, and it's simple to choose a Normality Transformation, with commercial software or spreadsheets you create yourself, although you must be willing to make decisions regarding the straightness of NPP plots.
In the presenter's experience in the past few years...
the FDA accepts "transformation to normality" (even in PMA supplements) when justified solely on the basis of a curved NPP plot of the raw data (that is, the FDA accepts the "non-normal" conclusion, even if the raw data "passes" a "normality test");
the FDA accepts the use of transformed raw data, based only on (1) the relatively straighter NPP plot of the transformed raw data, and (2) the transformed raw data "passing" a "normality test".
Slide146

TEST DATA FROM ZTC CLIENTS

  TEST                        RELIABILITY USING K-FACTORS **        RELIABILITY USING K-FACTORS AFTER DATA TRANSFORMATION
  Lubricity                   99.3% (however, data not "normal")    89.4% (data had a CubeRoot-Normal distribution)
  Crimp-Joint Bond Strength   92.1% (however, data not "normal")    99.7% (data actually had a Log-Normal distribution)
  Burst Pressure              94.2% (however, data not "normal")    99.999% (data actually had an Inverse-Normal distribution)

  ** Assuming ( incorrectly ) that data is "normal" and NOT applying any "transformation".
Slide147

How to implement what you learned today?

Read your company's SOP (or ??) on statistical techniques.

Ask to read some of the validation protocols and validation reports that relate to your work, and study their "statistics" section (or it might be called the "data analysis" section).

Ask your boss to explain statistical statements made in meetings, reports, or SOPs.

Ask to be part of the planning team for verification, validation, or new-product "transfers", especially in regard to choosing what sample sizes to use for product evaluations.