Slide1
Introducing Big Data in Stat 101 with Small Changes
John D. McKenzie, Jr.
Babson College
Babson Park, MA 02457-0310
mckenzie@babson.edu
DSI
Baltimore, MD
2013 November 18
Slide2
Abstract
Today’s technology produces massive amounts of data from a variety of sources such as social networking activities, financial transactions, genetic sequences, and astronomical transmissions. Very few introductory applied statistics courses consider such ‘Big Data’, for which many standard descriptive and inferential methods fail. This presentation will consider a number of ways that students can be easily exposed to the three V’s of ‘Big Data’ (Volume, Velocity, and Variety) in such courses.
Slide3
Agenda
Big Data and its Three+ V’s
Standard Introductory Applied Course
Big Data Sets
Volume
Velocity
Varieties
Slide4
2012 Mathematics Awareness Month
http://www.maa.org/mathematics-awareness-month-2012
Slide5
Big Data in the News
OSTP’s Big Data Initiative (US$200,000,000) (nsf.gov – search on Big Data)
McKinsey Global Institute Report (a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions)
Big Data Special Issue of Significance Magazine (August 2012)
NSA Disclosures, …
Slide6
Bits and Bytes
Prefixes for multiples of bits (b) or bytes (B)

Decimal
Value     Metric
1000      k   kilo
1000^2    M   mega
1000^3    G   giga
1000^4    T   tera
1000^5    P   peta
1000^6    E   exa
1000^7    Z   zetta
1000^8    Y   yotta

Binary
Value     JEDEC       IEC
1024      K   kilo    Ki   kibi
1024^2    M   mega    Mi   mebi
1024^3    G   giga    Gi   gibi
1024^4                Ti   tebi
1024^5                Pi   pebi
1024^6                Ei   exbi
1024^7                Zi   zebi
1024^8                Yi   yobi
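The gap between the decimal and binary prefixes on this slide grows with each step, which a short calculation can make concrete. The following is a minimal Python sketch, not part of the original presentation; the function names `prefix_value` and `binary_to_decimal_ratio` are hypothetical.

```python
# Prefix symbols in increasing order, as listed on the slide.
SI_PREFIXES = ["k", "M", "G", "T", "P", "E", "Z", "Y"]            # powers of 1000
IEC_PREFIXES = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]   # powers of 1024


def prefix_value(symbol: str) -> int:
    """Return the multiplier for a decimal (SI) or binary (IEC) prefix symbol."""
    if symbol in SI_PREFIXES:
        return 1000 ** (SI_PREFIXES.index(symbol) + 1)
    if symbol in IEC_PREFIXES:
        return 1024 ** (IEC_PREFIXES.index(symbol) + 1)
    raise ValueError(f"unknown prefix: {symbol!r}")


def binary_to_decimal_ratio(power: int) -> float:
    """Ratio of 1024**power to 1000**power for power = 1..8."""
    return 1024 ** power / 1000 ** power


if __name__ == "__main__":
    for power, (si, iec) in enumerate(zip(SI_PREFIXES, IEC_PREFIXES), start=1):
        ratio = binary_to_decimal_ratio(power)
        print(f"1 {iec}B is {ratio:.4f} times 1 {si}B")
```

At the kilo level the two conventions differ by 2.4 percent; at the yotta level the binary unit is roughly 21 percent larger, so the choice of convention matters more as data sets grow.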
Slide7
The Three V’s of Big Data
Volume
Velocity
Variety
META Group (now Gartner) analyst, Doug Laney
Slide8
Introductory Applied Course
Terminology and Sampling Methods
Descriptive Statistics (graphs and numeric measures)
Basic Probability
Fundamental Inference
Advanced Topics
Only one course (De Veaux)
Slide9
Volume
Massive Data Sets
Practical Significance
Visualization
Slide10
Big Data Sets
http://www.kdnuggets.com/datasets/
Over 60 Data Repositories and growing
Data Mining Competitions
KDD Cup Results Summary
Slide11
Practical Significance
p-value > .05 from a one-sample z-test
versus
p-value = .000 from a one-sample z-test with the same sample mean and standard deviation but 1000 times the sample size
Doane and Seward (2009), Applied Statistics in Business & Economics
Reinforcement on pp. 364, 371, 374, 404, and 594
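The effect described on this slide can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the hypothesized mean, sample mean, standard deviation, and sample sizes are illustrative values that do not come from the slide or from the textbook, and the test is taken to be two-sided.

```python
import math


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers chosen only for illustration.
MU0 = 100.0          # hypothesized mean
SAMPLE_MEAN = 100.5  # observed sample mean (same in both runs)
SIGMA = 5.0          # standard deviation (same in both runs)
N_SMALL = 50
N_LARGE = 1000 * N_SMALL

z_small, p_small = one_sample_z_test(SAMPLE_MEAN, MU0, SIGMA, N_SMALL)
z_large, p_large = one_sample_z_test(SAMPLE_MEAN, MU0, SIGMA, N_LARGE)

if __name__ == "__main__":
    print(f"n = {N_SMALL}: z = {z_small:.3f}, p-value = {p_small:.3f}")
    print(f"n = {N_LARGE}: z = {z_large:.3f}, p-value = {p_large:.3f}")
```

The difference between the sample mean and the hypothesized mean is the same half unit in both runs. Only the sample size changes, yet the p-value moves from well above .05 to .000 at three decimals, which is why statistical significance alone says little about practical significance with massive data sets.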
Slide12
Practical Significance 2
Chi-Square Test of Independence
p-value of .255 for

100     60
90      70

versus p-value of .000 for

1000    600
900     700
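The two p-values on this slide can be checked directly. The sketch below is a minimal Python illustration, assuming the eight counts form two 2x2 tables read row by row (the statistic is the same if rows and columns are swapped) and assuming the Pearson chi-square statistic without a continuity correction; the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (chi-square statistic, p-value). No continuity correction is
    applied. With 1 degree of freedom the upper-tail probability is
    erfc(sqrt(chi2 / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


small_table = [[100, 60], [90, 70]]
large_table = [[1000, 600], [900, 700]]  # every count multiplied by 10

chi2_small, p_small = chi_square_2x2(small_table)
chi2_large, p_large = chi_square_2x2(large_table)

if __name__ == "__main__":
    print(f"small table: chi-square = {chi2_small:.3f}, p-value = {p_small:.3f}")
    print(f"large table: chi-square = {chi2_large:.3f}, p-value = {p_large:.3f}")
```

Both tables have identical row and column proportions. Multiplying every count by 10 multiplies the chi-square statistic by 10, which is enough to move the p-value from .255 to .000.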
Slide13
Data Visualization
A visualization created by IBM of Wikipedia edits. At multiple terabytes in size, the text and images of Wikipedia are a classic example of big data.
Slide14
Data Visualization
Twitter Mentions
Slide15
Velocity
Time Series Data
Process Data
Slide16
Variety (structure)
Two Sample Data
Missing Data
Messy Data
Text Data
Date and Time Data
Slide17
Variety: Two Sample Data
Slide18
Text Data: Word Cloud
Slide19
Text Data: Word Cloud
Slide20
DSI Constitution and By-Laws
Slide21
Text Data: N-Gram
Slide22
Big Data, Business Analytics, Predictive Analytics, …, Data Science
Slide23
Variety (sources)
Slide24
Future Introductory Course
Math Common Core State Standards will result in Remedial Sections?
Today’s Course with More Topics?
Today’s Second Core?
Big Data Analytics Course?
or ?
Slide25
Two Current Examples of Analytics
Sharpe, De Veaux, and Velleman (2012), Business Statistics, Second Edition, Chapter 25, Introduction to Data Mining (Paralyzed Veterans of America)
Berenson, Levine, and Krehbiel (2012), Basic Business Statistics, Twelfth Edition, Online Topic: Analytics and Data Mining
2015?