
“It is better to observe than to criticise.” - PowerPoint Presentation

mitsue-stanley . @mitsue-stanley
Uploaded On 2018-02-17

Presentation Transcript

Slide1

“It is better to observe than to criticise.”

– Bobby Wellins (Jazz Line-up, 13/2/2011)

Slide2

“Best of all is to convey the magnitude of the effect and the degree of certainty explicitly.”

– Pinker (2014, p. 45)

Slide3

“Usually what one wants to know is not whether the change makes any difference, but to know how likely it is that the change will be big enough.”

– Landauer (1997, p. 222)

Slide4

Magnitude-based inference in behavioural research

Paul van Schaik

p.van-schaik@tees.ac.uk

http://sss-studnet.tees.ac.uk/psychology/staff/Paul_vs/index.htm

Slide5

Outline

Problem and proposed solution

Quantification in behavioural research

Statistical inference in behavioural research

Magnitude-based inference

The application of magnitude-based inference in behavioural research

Other approaches

Limitations

Recommendations

Slide6

The problem

A researcher conducts a study comparing two software designs in terms of their usability.

She conducts usability tests with two groups, each using one of the designs, and collects various measures.

These include perceived usability, error rate and time-on-task.

She then compares the two groups in terms of their mean scores on the measures, using a t test.

She finds that, although differences in mean scores are apparent, the test results do not show statistical significance.

What should the researcher conclude about the difference in usability between the two designs?

Slide7

A proposed solution

As an alternative to null-hypothesis significance testing (NHST), use information about uncertainty in the data, the observed value of the effect and the smallest substantial values for the effect to make two kinds of magnitude-based inference: mechanistic and practical

Use the results of NHST as input

Use spreadsheets available on the Internet to generate inferences

Developed and influential in sport and exercise science

Slide8

Quantification in user research

“The systematic study of the goals, needs, and capabilities of users so as to specify design, construction, or improvement of tools to benefit how users work and live” (Schumacher, 2009, p. 6)

Usability and user-experience data

  E.g. psychometric data, error rate and time-on-task

Formative research: users’ interaction with an artefact is studied to generate data that, when analysed, provide information to inform system improvement

Summative research: establishes the quality of interaction with an artefact in comparison with another artefact or a benchmark

Slide9

Statistical inference in user research

Usually, null-hypothesis significance testing (NHST) is used; limitations:

The null hypothesis of no effect is (almost) always false

NHST ignores the smallest important effect, which has no influence on the inference that is made in NHST

NHST does not address practical relevance; it does not clearly define or distinguish practical and mechanistic significance

A non-significant result is inconclusive and a crude classification of inference is used (reject or retain H0)

Sample size estimation is based on NHST

Slide10

Merits of magnitude-based inference

Requires the researcher to define the smallest important effect, rather than the null effect

Uses the smallest important effect as an integral part of inference, so inferences are not an artefact of sample size

Provides a rigorous and principled approach to inferring practical significance; provides a rigorous distinction between practical and mechanistic significance

Slide11

More merits

Provides a more refined classification of inferences that can be made than merely rejecting or retaining the null hypothesis

Estimates of required sample size are based on practical significance or mechanistic significance and the researcher-defined smallest important effect

Slide12

Inference of mechanistic significance (1)

For descriptive purposes, an effect can be classified in terms of its size in relation to the smallest important positive (+) and negative (-) effect sizes as positive, trivial or negative

For inference proper, the chances of an effect being positive, negative or trivial are used

The chances of the effect being positive: effect falling above the threshold of the smallest important + effect

The chances of the effect being negative: effect falling below the threshold of the smallest important - effect

The chances of a trivial effect: 100% minus the sum of the chances of a + effect and those of a - effect

Slide13
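The three chances defined under “Inference of mechanistic significance (1)” can be sketched in a few lines. This is only an illustration: it assumes a normal sampling distribution for the effect, and the function name `chances` and the example numbers are hypothetical, not taken from the presentation or its spreadsheets.

```python
from statistics import NormalDist


def chances(effect, se, smallest_pos, smallest_neg):
    """Return (p_positive, p_trivial, p_negative) for an observed effect.

    Assumption for illustration only: the effect has a normal sampling
    distribution centred on the observed value with standard error `se`.
    """
    dist = NormalDist(mu=effect, sigma=se)
    # Chance that the effect lies above the smallest important + effect.
    p_positive = 1.0 - dist.cdf(smallest_pos)
    # Chance that the effect lies below the smallest important - effect.
    p_negative = dist.cdf(smallest_neg)
    # Trivial: 100% minus the sum of the other two chances.
    p_trivial = 1.0 - (p_positive + p_negative)
    return p_positive, p_trivial, p_negative


# Hypothetical standardised effect of 0.35 with a standard error of 0.20.
pos, triv, neg = chances(effect=0.35, se=0.20, smallest_pos=0.2, smallest_neg=-0.2)
print(f"positive {pos:.1%}, trivial {triv:.1%}, negative {neg:.1%}")
```

A t distribution or bootstrapping could replace the normal distribution here without changing the structure of the calculation.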

Inference of mechanistic significance (2)

An inference is then made from the chances of each of three ranges of outcome (positivity, triviality and negativity) as follows

Unclear effect: both the chances of the obtained effect being + and the chances of the effect being - are too large (e.g., both greater than the default value of 0.05 or other appropriate cut-offs)

Otherwise, clear effect, seen as substantially +, - or trivial and considered to have the size of the observed value, with a qualification of probability

Proposed interpretation of probability ranges

Slide14

Probability      Chances          Odds              The effect … (positive/trivial/negative; beneficial/negligible/harmful)
<0; 0.005]       <0; 0.5%]        <0; 1:199]        is almost certainly not …
<0.005; 0.05]    <0.5%; 5%]       <1:199; 1:19]     is very unlikely to be …
<0.05; 0.25]     <5%; 25%]        <1:19; 1:3]       is unlikely to be …, is probably not …
<0.25; 0.75]     <25%; 75%]       <1:3; 3:1]        is possibly (not) …, may (not) be …
<0.75; 0.95]     <75%; 95%]       <3:1; 19:1]       is likely to be …, is probably …
<0.95; 0.995]    <95%; 99.5%]     <19:1; 199:1]     is very likely to be …
<0.995; 1>       <99.5%; 100%>    <199:1; ∞>        is almost certainly …

Slide15
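The table of probability ranges and the unclear/clear rule for mechanistic inference can be combined in a short sketch. The function names are hypothetical, and reporting the most probable of the three ranges for a clear effect is an assumption made here for illustration; the slides do not spell out that choice.

```python
# Upper bounds of the probability ranges <lower; upper] and their descriptors,
# copied from the table of proposed interpretations.
DESCRIPTORS = [
    (0.005, "almost certainly not"),
    (0.05, "very unlikely"),
    (0.25, "unlikely"),
    (0.75, "possibly"),
    (0.95, "likely"),
    (0.995, "very likely"),
    (1.0, "almost certainly"),
]


def describe(p):
    """Map a probability onto the qualitative descriptor of its range."""
    for upper, label in DESCRIPTORS:
        if p <= upper:
            return label
    raise ValueError("probability must not exceed 1")


def mechanistic_inference(p_pos, p_neg, cutoff=0.05):
    """Return 'unclear' or a qualified statement about the effect."""
    # Unclear: the chances of + and of - are both greater than the cut-off.
    if p_pos > cutoff and p_neg > cutoff:
        return "unclear"
    p_triv = 1.0 - p_pos - p_neg
    # Assumption: qualify the most probable of the three ranges.
    name, p = max(
        [("positive", p_pos), ("trivial", p_triv), ("negative", p_neg)],
        key=lambda pair: pair[1],
    )
    return f"{describe(p)} {name}"


print(mechanistic_inference(0.77, 0.003))  # likely positive
print(mechanistic_inference(0.30, 0.10))   # unclear
```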
Slide16
Slide17

Inference of practical significance (1)

For descriptive purposes, an effect can be classified in terms of its size in relation to the smallest important beneficial and harmful effect sizes as beneficial, negligible or harmful

For inference proper, the chances of an effect being beneficial, harmful or negligible are used

The chances of the effect being beneficial: effect falling above the threshold of the smallest important beneficial effect

The chances of the effect being harmful: effect falling below the threshold of the smallest important harmful effect

The chances of a negligible effect: 100% minus the sum of the chances of a beneficial effect and those of a harmful effect

Slide18

Inference of practical significance (2)

Type-1 practical error: analogous to Type-I error in NHST (rejecting the null hypothesis when it is true)

Type-2 practical error: analogous to Type-II error in NHST (retaining the null hypothesis when it is false)

In the practical (‘clinical’) application of effects, the chance of using a harmful effect (a Type-1 practical error) needs to be far smaller than the chance of not using a beneficial effect (a Type-2 practical error)

Slide19

Inference of practical significance (3)

An inference is then made from the chances of each of three ranges of outcome (benefit, negligibility and harm) as follows

If the chances of benefit are greater than the suggested cut-off of 25% for a Type-2 practical error and the chances of harm are greater than the suggested cut-off of 0.5% for a Type-1 practical error, then the effect is unclear

If the chances of benefit are greater than 25% and the chances of harm are smaller than 0.5%, then the effect is clearly beneficial

Otherwise, the effect is clearly negligible or harmful

Proposed interpretation of probability ranges as before

Slide20
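The decision rule stated under “Inference of practical significance (3)” translates almost directly into code. The following is a minimal sketch with a hypothetical function name; the cut-offs default to the suggested 25% and 0.5%.

```python
def practical_inference(p_benefit, p_harm, benefit_cutoff=0.25, harm_cutoff=0.005):
    """Apply the three-way rule to chances given as proportions (0 to 1)."""
    # Both chances exceed their cut-offs: the effect is unclear.
    if p_benefit > benefit_cutoff and p_harm > harm_cutoff:
        return "unclear"
    # Enough chance of benefit and a sufficiently small chance of harm.
    if p_benefit > benefit_cutoff and p_harm < harm_cutoff:
        return "clearly beneficial"
    # Everything else, including a chance of harm exactly at the cut-off,
    # falls under "otherwise" in the rule as worded.
    return "clearly negligible or harmful"


print(practical_inference(0.92, 0.0))   # clearly beneficial
print(practical_inference(0.40, 0.02))  # unclear
print(practical_inference(0.0, 0.25))   # clearly negligible or harmful
```

The asymmetric cut-offs reflect the requirement that the chance of using a harmful effect be far smaller than the chance of not using a beneficial one.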

Example from sport science (1)

I am grateful to Matt Weston for providing this example

A sports researcher is interested in whether a new, commercially available nutritional supplement has a beneficial or harmful effect on elite cyclists’ 40 km time-trial performance (the faster the time, the better the performance)

The researcher conducts an experiment to examine the effect of two different doses of the supplement (a low dose and a high dose)

Experimental crossover design: all of the cyclists perform the time trial under three different conditions (placebo [no supplement], low dose and high dose), in a counterbalanced manner

The researcher’s experience led to the belief that the smallest worthwhile change in 40 km time-trial performance was -1%

Slide21

Example from sport science (2)

The mean (± SD) performance times were 59.5 ± 1.6 min (low dose), 60.9 ± 2.2 min (high dose) and 60.5 ± 1.9 min (placebo)

Magnitude-based inferences calculate the chances of benefit (or harm), with reference to a change of -1%

Compared to placebo, performance with the low dose improved (change in time of -1.7%; 90% confidence interval -2.4 to -0.9%), with a 92% chance of benefit and a 0.0% chance of harm

A low dose of the supplement is therefore likely to be beneficial and is recommended

However, compared to placebo, the high dose impaired performance by 0.7% (90% confidence interval -0.1 to 1.5%), with a 0% chance of benefit and a 25% chance of harm

A high dose of the supplement is therefore most unlikely to be beneficial and is not recommended

Slide22
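The figures in “Example from sport science (2)” can be approximately retraced from the reported means and confidence intervals. The sketch below rests on assumptions that are not in the slides: a normal sampling distribution, a standard error recovered from the rounded 90% confidence limits, and a harm threshold of +1% mirroring the stated -1% benefit threshold. The resulting chances are therefore close to, but need not match, the 92% and 25% reported.

```python
from statistics import NormalDist


def percent_change(treatment_mean, placebo_mean):
    """Percentage change in mean time relative to placebo."""
    return 100.0 * (treatment_mean - placebo_mean) / placebo_mean


def chances_from_ci(effect, lower, upper, threshold=1.0, level=0.90):
    """Return (% chance of benefit, % chance of harm) from a confidence interval.

    Benefit is a change below -threshold (a faster time); harm is a change
    above +threshold (assumed symmetric, which the slides do not state).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for 90%
    se = (upper - lower) / (2.0 * z)             # SE recovered from the CI width
    dist = NormalDist(mu=effect, sigma=se)
    p_benefit = dist.cdf(-threshold)
    p_harm = 1.0 - dist.cdf(threshold)
    return 100.0 * p_benefit, 100.0 * p_harm


low_change = percent_change(59.5, 60.5)
high_change = percent_change(60.9, 60.5)
print(round(low_change, 1), round(high_change, 1))  # -1.7 0.7

low_benefit, low_harm = chances_from_ci(-1.7, -2.4, -0.9)
high_benefit, high_harm = chances_from_ci(0.7, -0.1, 1.5)
print(f"low dose: benefit {low_benefit:.0f}%, harm {low_harm:.1f}%")
print(f"high dose: benefit {high_benefit:.0f}%, harm {high_harm:.0f}%")
```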

Demonstration

Example: unrelated t test

Mechanistic inference

Practical inference

Spreadsheets available at http://www.sportsci.org/

Slide23

Observations

Practical and mechanistic inference, but not statistical inference, depend on the smallest worthwhile effect

The range of practical and mechanistic inferences (e.g., “is very (un)likely to be harmful/trivial/beneficial”) is greater than that of statistical inference (dichotomous)

The results of practical and mechanistic inference concur about half of the time with those of statistical inference; when the results differ, the latter is more conservative

Practical and mechanistic inference mostly concur

Slide24

Smallest        Smallest          Total sample size (N)    Sample size ratio
harmful/-ive d  beneficial/+ive d   P      M      S          S/P    S/M    M/P
-0.2            0.2               268    274    788          2.94   2.88   1.02
-0.3            0.3               122    122    352          2.89   2.89   1.00
-0.4            0.4                70     70    198          2.83   2.83   1.00
-0.5            0.5                46     46    128          2.78   2.78   1.00
-0.6            0.6                34     32     90          2.65   2.81   0.94
-0.7            0.7                26     24     66          2.54   2.75   0.92
-0.8            0.8                22     20     52          2.36   2.60   0.91
-0.9            0.9                18     16     42          2.33   2.63   0.89
-1.0            1.0                14     14     34          2.43   2.43   1.00
-1.1            1.1                14     12     28          2.00   2.33   0.86
-1.2            1.2                14     10     24          1.71   2.40   0.71

Slide25
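The ratio columns of the sample-size table follow directly from the three sample-size columns, which the short check below reproduces for a few rows. It assumes that P, M and S stand for practical, mechanistic and statistical (NHST) inference; the table itself does not expand the abbreviations, and the helper name `ratios` is hypothetical.

```python
def ratios(p, m, s):
    """Return (S/P, S/M, M/P) rounded to two decimals, as in the table."""
    return round(s / p, 2), round(s / m, 2), round(m / p, 2)


# (smallest beneficial d, N for P, N for M, N for S), copied from the table.
rows = [
    (0.2, 268, 274, 788),
    (0.6, 34, 32, 90),
    (1.2, 14, 10, 24),
]

for d, p, m, s in rows:
    s_p, s_m, m_p = ratios(p, m, s)
    print(f"d = {d}: S/P = {s_p}, S/M = {s_m}, M/P = {m_p}")
```

For these rows the statistical approach calls for between roughly 1.7 and 2.9 times the sample size of the practical approach.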

Further alternatives to NHST

Counter-null statistic (Rosenthal & Rubin, 1994)

p rep (Killeen, 2005)

p-intervals (Cumming, 2008)

Minimum-effect tests (Murphy & Myors, 1999)

Equivalence-testing (Tryon, 2001)

Non-inferiority-testing (Head et al., 2014)

Bayesian statistics (Rouder et al., 2009)

Slide26

Limitations

Apparent

  As in NHST, need to make several choices or accept recommended choices

    Confidence level

    Type-1 and Type-2 practical-error rates

    The smallest important effect

    The mapping of quantitative probabilities onto qualitative descriptors

  As in NHST, assumptions about the sampling distribution of the outcome statistic; can use bootstrapping

Substantive

  The decision rules do not necessarily take all relevant factors into account, for example the (financial) value of inputs to and outputs from using a harmful or beneficial effect (Murphy & Myors, 1999)

Slide27

Recommendations

Plan sample size using magnitude-based inference

Analyse data using NHST; make better use of the results as input for magnitude-based inference

Always analyse data using mechanistic inference; also use practical inference for effects where benefit and harm can be meaningfully defined

Use appropriate spreadsheets for sample size estimation and magnitude-based inference (http://www.sportsci.org/)

When preparing for journal publication, cogently argue why it is appropriate to use magnitude-based inference in your research; in your Data Analysis section, explain the specific magnitude-based inference that you have used (see, e.g., Barnes et al., 2014)

Slide28

Some publications

Barnes, K. R., Hopkins, W. G., McGuigan, M. R., & Kilding, A. E. (2015). Warm-up with a weighted vest improves running performance via leg stiffness and running economy. Journal of Science and Medicine in Sport, 18, 103-108. doi:10.1016/j.jsams.2013.12.005

Batterham, A. M., & Hopkins, W. G. (2006). Making meaningful inferences about magnitudes. International Journal of Sports Physiology and Performance, 1(1), 50-57.

Hopkins, W. G. (2006). Estimating sample size for magnitude-based inference. Sportscience, 10, 63-70.

Hopkins, W. G. (2006). Spreadsheets for analysis of controlled trials, with adjustment for a subject characteristic. Sportscience, 10, 46-50.

Hopkins, W. G., Marshall, S. W., Batterham, A. M., & Hanin, J. (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise, 41(1), 3-12. doi:10.1249/MSS.0b013e31818cb278

Schaik, P. van & Weston, M. (2016). Magnitude-based inference and its application in user research. International Journal of Human-Computer Studies, 88, 38-50.