Slide 1: Introduction to Statistical Analysis
Slide 2: What is STATISTICS?
Statistics fulfills one of the basic human needs.
A process to:
- Manage: clean and format the data in order to obtain valid data that is feasible to analyze.
- Analyze: explore the data in order to answer the objective.
- Interpret data: convert the statistical interpretation into common understanding.
Slide 3: Classification of Statistics
Descriptive statistics
- describe the data by summarizing them
Inferential statistics
- techniques by which inferences about population parameters are drawn from sample statistics
OR
- conclusions are made about the population using sample data
Slide 4: What Are Descriptive Statistics For?
In any study, before answering the research question, we should recognize the characteristics of the sample (e.g. age, gender, ethnicity).
Slide 5: How to describe a categorical variable?
Statistics
- Frequency
- Relative frequency
- Cumulative relative frequency
Figure/Chart
- Bar
- Pie
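The three statistics listed for a categorical variable can be computed in a few lines. The sketch below is a minimal illustration using only the Python standard library; the function name `frequency_table` and the sample data are hypothetical, not from the slides.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, frequency, relative %, cumulative relative %) rows."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # Categories are sorted alphabetically here; for an ordinal variable
    # (e.g. pain severity) sort them in their natural order instead.
    for category in sorted(counts):
        freq = counts[category]
        relative = 100 * freq / total
        cumulative += relative
        rows.append((category, freq, round(relative, 1), round(cumulative, 1)))
    return rows


if __name__ == "__main__":
    # Hypothetical data, not taken from the slides
    gender = ["Male", "Female", "Female", "Male", "Female"]
    for row in frequency_table(gender):
        print(row)
```

For the hypothetical data this prints `('Female', 3, 60.0, 60.0)` and `('Male', 2, 40.0, 100.0)`. The cumulative column is only meaningful when the categories have a sensible order.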
Slide 6: How to describe a numerical variable?
Statistics
- Central tendency: mean, median (50th percentile)
- Dispersion: standard deviation, interquartile range (IQR) = 3rd quartile - 1st quartile
Figure/Chart
- Histogram/frequency polygon
- Box plot
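As a rough sketch, the four summary statistics above map directly onto Python's standard `statistics` module. The function name and the data are hypothetical; note that quartile definitions differ between software packages, so an IQR from SPSS or another tool may not match exactly.

```python
import statistics


def describe_numeric(values):
    """Central tendency and dispersion for a numerical variable."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),  # 50th percentile
        "sd": statistics.stdev(values),       # sample standard deviation
        "iqr": q3 - q1,                       # 3rd quartile - 1st quartile
    }


if __name__ == "__main__":
    # Hypothetical exercise durations (minutes/day), not from the slides
    print(describe_numeric([2, 4, 4, 4, 5, 5, 7, 9]))
```

For this data the mean is 5, the median is 4.5 and the IQR is 2.5 (Q1 = 4.0, Q3 = 6.5 with the module's default method).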
Slide 7: Inferential statistics
With inferential statistics, we take a sample (a small subset of a larger set of data).
We then use this sample to draw inferences or make generalizations about the population from which the sample was drawn.
- Estimation (confidence interval), e.g. estimation of a mean, estimation of a proportion
- Hypothesis test, e.g. comparing means, comparing proportions, association between 2 variables
Slide 8: Estimation
In estimation, the sample is used to estimate a population parameter, and a confidence interval (CI) is constructed around the estimate.
Estimation (CI) of a mean:
e.g. x̄ = 16.14, 95% CI = (15.30, 16.98)
We are 95% confident that the mean duration of exercise in the population lies between 15.30 and 16.98 minutes/day.
Estimation (CI) of a proportion:
e.g. p = 0.37, 95% CI = (0.27, 0.47)
We are 95% confident that the prevalence of obesity in the population lies between 27% and 47%.
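The two intervals above follow the pattern "estimate ± critical value × standard error". The sketch below is a minimal large-sample version using the normal critical value (about 1.96 for 95%). This is an assumption: SPSS uses the t distribution for a mean, and the slide's proportion interval was presumably computed with a different method, so the numbers will not match exactly. The sample size n = 100 is also assumed, taken from the 37 obese and 63 non-obese respondents reported later.

```python
import math
from statistics import NormalDist, mean, stdev


def z_critical(level=0.95):
    """Two-sided critical value from the standard normal distribution (1.96 for 95%)."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def mean_ci(values, level=0.95):
    """Large-sample CI for a mean: x-bar +/- z * SD / sqrt(n)."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))
    half_width = z_critical(level) * se
    return m - half_width, m + half_width


def proportion_ci(p, n, level=0.95):
    """Wald CI for a proportion: p +/- z * sqrt(p(1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / n)
    half_width = z_critical(level) * se
    return p - half_width, p + half_width


if __name__ == "__main__":
    # n = 100 is assumed from the 37 obese + 63 non-obese respondents
    low, high = proportion_ci(0.37, 100)
    print(f"95% CI for prevalence: ({low:.3f}, {high:.3f})")
```

This prints `(0.275, 0.465)`, close to but slightly narrower than the slide's (0.27, 0.47).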
Slide 9: Hypothesis
A testable statement that describes the relationship between variables.
Derived from research questions.
Postulates the existence of:
1. A difference between groups.
2. An association among factors.
Null hypothesis (H0): the hypothesis to be tested, of no difference.
Alternative hypothesis (Ha): the hypothesis that postulates a treatment effect or a difference between groups.
The purpose of inferential statistics is to judge whether we have enough evidence (based on probability) to reject or fail to reject H0.
Slide 10: Interpretation
- Sample Size
- Data Collection
- Data Quality
- Statistical Analysis
- Statistical Procedure
Slide 11: Sample Size
Slide 12: What Is Sample Size?
Sample size is the number of units (persons, animals, patients, specific circumstances, etc.) from a population that need to be studied to represent the population.
Slide 13: Why Do We Need to Calculate Sample Size?
- Guide: when to start and stop collecting data, and how to collect it.
- Minimum required sample: depends on the availability of the sample, time constraints, subject constraints and ethical issues.
- Study design: influences the quality and accuracy of the research.
- Economics: resources are wasted if the study cannot produce useful results.
Slide 14: Process of Sample Size Determination
Slide 15: Journal sample size for pilot study
Slide 16: What Happens If the Sample Size Is…
Too small
- A well-conducted study may fail to answer its research question.
- May fail to detect important effects.
- May estimate those effects imprecisely.
Too large
- Costly: the longer the study, the higher the cost.
- Difficulties: lack of manpower and time.
- Tiring: recruiting subjects (or collecting outcomes) may be tiring over a long period.
Sample size should be adequate to achieve good precision in estimation.
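One common way to make "adequate for good precision" concrete for a prevalence study is the single-proportion formula n = z² · p(1 − p) / d², where p is the anticipated prevalence and d the desired absolute precision. The formula is an assumption here (it is not stated on the slides), and it presumes simple random sampling from a large population; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p, d, level=0.95):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole subject.

    p: anticipated proportion (prevalence); d: desired absolute precision.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


if __name__ == "__main__":
    # p = 0.5 gives the largest (most conservative) n for a given precision
    print(sample_size_proportion(0.5, 0.05))
```

With p = 0.5 and a precision of ±5 percentage points this prints 385; halving the precision requirement to ±10 points drops n to 97, which illustrates the trade-off between cost and precision described above.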
Slide 17: Statistical Software: Website Address
- Power and sample size: http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize
- Epi Info software: http://wwwn.cdc.gov/epiinfo/html/downloads.htm
- EpiCalc: http://www.brixtonhealth.com/epicalc.html
- Sample size for Prevalence Studies.xls (Lin Naing)
- Sample size for Sensitivity and Specificity.xls (Lin Naing)
- Power Analysis and Sample Size (PASS): most powerful, but a license must be purchased first: http://www.ncss.com/pass.html
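Power-and-sample-size tools such as those listed above solve, among other things, for the number of subjects needed to compare two means. As a rough sketch only, the normal-approximation formula n per group = 2(z₁₋α/₂ + z₁₋β)² σ² / Δ² is shown below; the formula choice, the function name and the inputs are assumptions, and dedicated software may use exact t-based methods that give a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_two_means(sd, difference, alpha=0.05, power=0.80):
    """Subjects per group to detect `difference` between two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: SD of 1 unit, smallest important difference of 0.5 units
    print(sample_size_two_means(sd=1.0, difference=0.5))
```

With α = 0.05 and 80% power this prints 63 per group; detecting a smaller difference requires a markedly larger sample.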
Slide 18: Data Collection
Slide 19
Categorical (nominal), e.g. gender, race
Categorical (ordinal), with a logical ordering of the categories, e.g. education level, pain severity
Numerical, e.g. age, weight, height
Categorical variables
- Statistics: frequency & percentage; relative frequency & percentage; cumulative frequency & percentage
- Figure/chart: bar, pie
Numerical variables
- Statistics: central tendency & dispersion; mean & SD (if normality is assumed), median & IQR (if skewed)
- Figure/chart: histogram, boxplot
Slide 20: Before Key-in: What to Prepare?
INSTRUMENT/QUESTIONNAIRE
- Purpose and objective
- Variables and units
- Format
DATA DEFINITION/DATA DICTIONARY
- Explain the summary of variables in terms of variable name, description, formatting and labelling (where necessary/applicable).
DATA ENTRY
Slide 21: DATA DEFINITION/DATA DICTIONARY
Slide 22: Data Quality
Slide 23: DATA CLEANING
Definition: patientid - patient identification number; ptgender - 1 is Male and 2 is Female; height - defined from 1.4 m to 1.8 m; weight - defined from 50.0 kg to 150.0 kg.
- DUPLICATES: more than one observation with the same patientid.
- MISSING VALUES: a blank cell without information.
- INCONSISTENCY: a value of 3 means nothing in the definition of ptgender.
- EXTREME VALUES: a value outside the defined range (e.g. exceeding the upper limit).
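The four checks above can be automated once the data dictionary is written down. The sketch below encodes the slide's definitions (ptgender 1 or 2, height 1.4 to 1.8 m, weight 50.0 to 150.0 kg) in a hypothetical `DATA_DICTIONARY` and flags each problem type; treating `None` or an empty string as a blank cell is an assumption about how the data was loaded.

```python
# Hypothetical data dictionary built from the slide's definitions
DATA_DICTIONARY = {
    "ptgender": {"allowed": {1, 2}},        # 1 = Male, 2 = Female
    "height": {"min": 1.4, "max": 1.8},     # metres
    "weight": {"min": 50.0, "max": 150.0},  # kilograms
}


def check_records(records):
    """Flag duplicates, missing values, inconsistencies and extreme values."""
    issues = {"duplicates": [], "missing": [], "inconsistent": [], "extreme": []}
    seen = set()
    for row in records:
        pid = row.get("patientid")
        if pid in seen:  # same patientid already observed
            issues["duplicates"].append(pid)
        seen.add(pid)
        for var, rule in DATA_DICTIONARY.items():
            value = row.get(var)
            if value is None or value == "":  # blank cell
                issues["missing"].append((pid, var))
            elif "allowed" in rule and value not in rule["allowed"]:
                issues["inconsistent"].append((pid, var, value))
            elif "min" in rule and not rule["min"] <= value <= rule["max"]:
                issues["extreme"].append((pid, var, value))
    return issues


if __name__ == "__main__":
    # Hypothetical records, each row illustrating one problem from the slide
    records = [
        {"patientid": 1, "ptgender": 1, "height": 1.65, "weight": 70.0},
        {"patientid": 1, "ptgender": 2, "height": 1.60, "weight": 58.5},   # duplicate id
        {"patientid": 2, "ptgender": 3, "height": 1.72, "weight": None},   # gender 3, blank weight
        {"patientid": 3, "ptgender": 2, "height": 1.55, "weight": 180.0},  # extreme weight
    ]
    print(check_records(records))
```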
Slide 24: Statistical Analysis
Slide 25: Type of Analysis
Dependent variable: numerical data (parametric test / non-parametric test)
- Two groups (independent), categorical independent variable (e.g. smokers and non-smokers): Independent t-test / Mann-Whitney test
- More than two groups (independent), categorical independent variable (e.g. Malay, Chinese and Indian): One-way ANOVA / Kruskal-Wallis test
- Two groups (dependent), categorical independent variable (e.g. pre and post intervention): Paired t-test / Wilcoxon Signed Rank test
- Numerical independent variable (e.g. weight in kg): Pearson's correlation / Spearman's correlation
Slide 26: Type of Analysis
Dependent variable: categorical data (non-parametric tests; chi-square test if its assumptions are met, otherwise Fisher's exact test)
- Two groups (independent), categorical independent variable (e.g. smokers and non-smokers): Chi-square test / Fisher's exact test
- More than two groups (independent), categorical independent variable (e.g. Malay, Chinese and Indian): Chi-square test / Fisher's exact test
Assumptions (chi-square test):
- The number of cells with an expected count (EC) less than 5 must be less than 25% of the total number of cells.
- The smallest EC must be at least 2.
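The two decision tables for numerical and categorical dependent variables can be read as a simple lookup. The helper below is hypothetical and only encodes the rows shown on those slides; `assumptions_met` stands for "parametric assumptions hold" for numerical data and "chi-square expected-count rules hold" for categorical data.

```python
def choose_test(dependent, independent="categorical", groups=2,
                paired=False, assumptions_met=True):
    """Pick a test from the Type of Analysis tables (hypothetical helper)."""
    if dependent == "numerical":
        if independent == "numerical":
            return "Pearson's correlation" if assumptions_met else "Spearman's correlation"
        if paired:  # e.g. pre and post intervention
            return "Paired t-test" if assumptions_met else "Wilcoxon Signed Rank test"
        if groups == 2:
            return "Independent t-test" if assumptions_met else "Mann-Whitney test"
        return "One-way ANOVA" if assumptions_met else "Kruskal-Wallis test"
    if dependent == "categorical":
        return "Chi-square test" if assumptions_met else "Fisher's exact test"
    raise ValueError("dependent must be 'numerical' or 'categorical'")


if __name__ == "__main__":
    # Exercise duration (numerical) by obesity group (2 independent groups)
    print(choose_test("numerical", "categorical", groups=2))
```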
Slide 27: Example Study: Numerical Data
RQ: Is there any difference in time spent on exercise between the obese and non-obese groups?
Objective: To compare the mean duration of exercise between the obese and non-obese groups.
Assumption: The dependent variable should be approximately normally distributed within each category of the independent variable.
Slide 28: Step 2 - Test statistic using SPSS
Steps:
- Analyze >> Compare Means >> Independent-Samples T Test
- Analyze >> Descriptive Statistics >> Explore
Slide 29: Step 3 - Make a decision
- Descriptive statistics of each variable
- Levene's test result:
  - If P value > 0.05 (not significant), read the first row (Equal variances assumed).
  - If P value < 0.05 (significant), read the second row (Equal variances not assumed).
Here, Levene's test is not significant, so the first row (Equal variances assumed) is read.
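Levene's test compares the spread of the groups by running a one-way ANOVA on the absolute deviations from each group mean. The sketch below computes the classic mean-centred W statistic and its degrees of freedom; converting W to a P value needs an F distribution function (for example from SciPy), which is not included here. Other variants (e.g. the median-centred Brown-Forsythe version) give different values; the function name and data are hypothetical.

```python
from statistics import mean


def levene_statistic(*groups):
    """Mean-centred Levene's W statistic with its (df1, df2)."""
    # Absolute deviation of each observation from its own group mean
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


if __name__ == "__main__":
    # Hypothetical groups with different spreads
    print(levene_statistic([1, 2, 3], [2, 4, 6]))
```

A large W (small P value) suggests unequal variances, which is when the second row of the SPSS output is read.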
Slide 30: Interpretation
“An independent-samples t-test indicated that duration of exercise was not significantly different between obese (Mean = 16.7, SD = 4.83) and non-obese (Mean = 15.8, SD = 3.88) respondents, t(98) = 1.06, p = .291. Therefore, there is no significant association between duration of exercise and obesity.”
Table 1: Comparing mean duration of exercise between obese (n = 37) and non-obese (n = 63) respondents
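The two rows of the SPSS independent t-test output correspond to Student's pooled-variance t (equal variances assumed) and Welch's t (equal variances not assumed). The sketch below computes both from summary statistics; the function name is hypothetical, and P values are omitted because they need a t distribution function (SPSS reports them directly).

```python
import math


def t_test_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Independent t statistics from group means, SDs and sizes."""
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2
    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t_welch = (m1 - m2) / math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {"equal_var": (t_pooled, df_pooled), "unequal_var": (t_welch, df_welch)}


if __name__ == "__main__":
    # Summary statistics from Table 1: obese vs non-obese duration of exercise
    t, df = t_test_from_summary(16.7, 4.83, 37, 15.8, 3.88, 63)["equal_var"]
    print(f"t({df}) = {t:.2f}")
```

This prints `t(98) = 1.02` rather than 1.06, presumably because the reported means and SDs are rounded; the degrees of freedom (98) match the first row.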
Slide 31: Example Study: Categorical Data
Comparing 2 or more proportions.
RQ: Is there any association between gender and obesity group?
Slide 32: Step 2 - Test statistic using SPSS
Analyze >> Descriptive Statistics >> Crosstabs
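Slide 33: Step 3 - Make a decision
Check the chi-square assumptions:
- The percentage of cells with an expected count less than 5 must be less than 25%.
- The minimum expected count must be at least 2.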
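The expected counts behind these two rules come from the crosstab margins: EC = row total × column total / grand total. The sketch below computes them, checks both rules, and returns the Pearson chi-square statistic (without continuity correction). For a 2 × 2 table (df = 1) the P value follows exactly from `math.erfc`; other tables would need a chi-square distribution function, which is omitted. The function name and the counts are hypothetical, not the data behind Table 9.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count (EC) for each cell = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    cells = [e for row in expected for e in row]
    pct_below_5 = 100 * sum(e < 5 for e in cells) / len(cells)
    # The slide's two rules: < 25% of cells with EC < 5, and smallest EC >= 2
    assumptions_met = pct_below_5 < 25 and min(cells) >= 2
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # For df = 1, chi-square is a squared standard normal, so p = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
    return {"expected": expected, "pct_cells_ec_below_5": pct_below_5,
            "min_ec": min(cells), "assumptions_met": assumptions_met,
            "chi2": chi2, "df": df, "p": p}


if __name__ == "__main__":
    # Hypothetical gender x obesity counts
    print(chi_square_independence([[10, 20], [30, 40]]))
```

If `assumptions_met` is false, the tables above point to Fisher's exact test instead.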
Slide 34: Step 4 - Interpretation
“A Chi-square test for independence indicated that the prevalence (proportion) of obesity was not significantly different between males and females (P = 0.753). Therefore, there is no significant association between gender and obesity.”
Table 9: Association between gender and obesity
Slide 36: Thank You