2/25/13 - Union University

Presentation Transcript

Slide1

ADVENTURES IN DATA MINING

Margaret H. Dunham
Southern Methodist University
Dallas, Texas 75275
mhd@lyle.smu.edu

This material is based in part upon work supported by the National Science Foundation under Grant No. 9820841 and NIH Grant No. 1R21HG005912-01A1.

Some slides used by permission from Dr. Eamonn Keogh, University of California, Riverside; eamonn@cs.ucr.edu

ACM Distinguished Speakers Program

Slide2

The 2000 ozone hole over the Antarctic seen by EPTOMS
http://jwocky.gsfc.nasa.gov/multi/multi.html#hole

Slide3

Data Mining Outline

Introduction
Techniques
    Classification
    Clustering
    Association Rules
Examples

Explore some interesting data mining applications

Slide4

Introduction

Data is growing at a phenomenal rate
Users expect more sophisticated information
How?

UNCOVER HIDDEN INFORMATION
DATA MINING

Slide5

But it isn’t Magic

You must know what you are looking for
You must know how to look for it

Suppose you knew that a specific cave had gold:
    What would you look for?
    How would you look for it?
    Might need an expert miner

Slide6

CLASSIFICATION

Assign data into predefined groups or classes.

Slide7

“If it looks like a duck, walks like a duck, and quacks like a duck, then it’s a duck.”

Description       Behavior        Associations
Classification    Clustering      Link Analysis
(Profiling)       (Similarity)

“If it looks like a terrorist, walks like a terrorist, and quacks like a terrorist, then it’s a terrorist.”

Slide8

Classification Ex: Grading

x >= 90: A
x < 90:
    x >= 80: B
    x < 80:
        x >= 70: C
        x < 70:
            x >= 60: D
            x < 60: F

Slide9
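The grading tree from the Classification Ex: Grading slide can be read as a chain of threshold tests on the score x. The following is a minimal Python sketch of that idea. The function name `grade` is an illustrative choice, and the last branch assumes that every score below 60 is an F.

```python
def grade(x: float) -> str:
    """Walk the grading decision tree from the root test (x >= 90) downward."""
    if x >= 90:
        return "A"
    if x >= 80:
        return "B"
    if x >= 70:
        return "C"
    if x >= 60:
        return "D"
    return "F"  # assumption: every score below 60 is an F


if __name__ == "__main__":
    for score in (95, 85, 72, 60, 41):
        print(score, grade(score))
```

Each test only runs when the earlier ones failed, so the order of the comparisons encodes the nesting of the tree.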

Grasshoppers    Katydids

Given a collection of annotated data (in this case five instances of Katydids and five of Grasshoppers), decide what type of insect the unlabeled example is.

(c) Eamonn Keogh, eamonn@cs.ucr.edu

Slide10

Insect ID   Abdomen Length   Antennae Length   Insect Class
1           2.7              5.5               Grasshopper
2           8.0              9.1               Katydid
3           0.9              4.7               Grasshopper
4           1.1              3.1               Grasshopper
5           5.4              8.5               Katydid
6           2.9              1.9               Grasshopper
7           6.1              6.6               Katydid
8           0.5              1.0               Grasshopper
9           8.3              6.6               Katydid
10          8.1              4.7               Katydid
11          5.1              7.0               ???????

The classification problem can now be expressed as: Given a training database, predict the class label of a previously unseen instance.

previously unseen instance = insect 11 (abdomen length 5.1, antennae length 7.0)
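One simple way to predict the class of insect 11 is a k-nearest-neighbor vote over the ten labeled rows. The sketch below is an illustration only: nearest-neighbor voting is a technique chosen here for brevity and is not stated on the slide, and the names `TRAINING` and `classify` and the choice k = 3 are assumptions.

```python
from collections import Counter
from math import dist

# Labeled rows from the slide: (abdomen length, antennae length, insect class).
TRAINING = [
    (2.7, 5.5, "Grasshopper"),
    (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"),
    (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),
    (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),
    (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),
    (8.1, 4.7, "Katydid"),
]


def classify(abdomen: float, antennae: float, k: int = 3) -> str:
    """Majority vote among the k training insects closest in Euclidean distance."""
    ranked = sorted(
        TRAINING,
        key=lambda row: dist((row[0], row[1]), (abdomen, antennae)),
    )
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Insect 11 from the table: abdomen 5.1, antennae 7.0.
    print(classify(5.1, 7.0))
```

With k = 3 the closest rows to insect 11 are insects 7, 5, and 1, so the vote is two Katydids to one Grasshopper. An odd k avoids ties when there are only two classes.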

(c) Eamonn Keogh, eamonn@cs.ucr.edu

Slide11

[Figure: scatter plot of Antenna Length against Abdomen Length (both axes 1 to 10) for Grasshoppers and Katydids]

(c) Eamonn Keogh, eamonn@cs.ucr.edu

Slide12

How Stuff Works, “Facial Recognition,” http://computer.howstuffworks.com/facial-recognition1.htm

Slide13

Facial Recognition

(c) Eamonn Keogh, eamonn@cs.ucr.edu

Slide14

Handwriting Recognition

George Washington Manuscript

[Figure: plot with axis ticks from 0 to 450 and from 0 to 1]

(c) Eamonn Keogh, eamonn@cs.ucr.edu

Slide15

Rare Event Detection

Slide16

Slide17

Dallas Morning News, October 7, 2005

Slide18

Classification Performance

True Positive
True Negative
False Positive
False Negative

© Prentice Hall

Slide19
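The four outcomes listed under Classification Performance can be tallied by comparing each predicted label with the true label. The sketch below is a hedged illustration: `confusion_counts` is a hypothetical helper, and the fraud/ok labels are made-up example data rather than results from the talk.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one class treated as 'positive'."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            # Predicted positive: correct if the truth is positive too.
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the truth was actually positive.
            counts["FN" if truth == positive else "TN"] += 1
    return counts


if __name__ == "__main__":
    # Made-up labels for six transactions (illustrative only).
    actual = ["fraud", "ok", "ok", "fraud", "ok", "ok"]
    predicted = ["fraud", "ok", "fraud", "ok", "ok", "ok"]
    print(confusion_counts(actual, predicted, positive="fraud"))
```

For rare events such as fraud, the false negative and false positive counts are usually more informative than overall accuracy, because a classifier that always answers "ok" would still be right most of the time.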

Behavior Based Classification/Prediction

Credit Card Fraud Detection
Credit Score
Home Mortgage Approval

Slide20

CLUSTERING

Partition data into previously undefined groups.
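As an illustration of partitioning data into groups that are not defined in advance, here is a minimal k-means sketch in Python. k-means is one common partitional method chosen for this example and is not named on the slide; the function name `kmeans`, the sample points, and the starting centroids are all assumptions.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then move each centroid to its cluster mean."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old centroid.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # assignments can no longer change
            break
        centroids = updated
    return centroids, clusters


if __name__ == "__main__":
    # Made-up 2-D points forming two loose groups (illustrative only).
    points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    centers, groups = kmeans(points, centroids=[points[0], points[3]])
    print(centers)
    print(groups)
```

No class labels are supplied. The groups emerge only from the distances between points, which is the contrast with classification.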

Slide21

http://149.170.199.144/multivar/ca.htm

Slide22

What is Similarity?

(c) Eamonn Keogh, eamonn@cs.ucr.edu

Slide23

Two Types of Clustering

Hierarchical
Partitional

(c) Eamonn Keogh, eamonn@cs.ucr.edu

Slide24

Hierarchical Clustering Example

Iris Data Set

Setosa
Versicolor
Virginica

The data originally appeared in Fisher, R. A. (1936). "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics 7, 179-188.

Hierarchical Clustering Explorer Version 3.0, Human-Computer Interaction Lab, University of Maryland, http://www.cs.umd.edu/hcil/multi-cluster

Slide25

ASSOCIATION RULES/LINK ANALYSIS

Find relationships between data

Slide26

ASSOCIATION RULES EXAMPLES

People who buy diapers also buy beer
If gene A is highly expressed in this disease then gene B is also expressed
Relationships between people
Book Stores
Department Stores
Advertising
Product Placement

http://www.amazon.com/Data-Mining-Introductory-Advanced-Topics/dp/0130888923/ref=sr_1_1?ie=UTF8&s=books&qid=1235564485&sr=1-1
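A rule such as "people who buy diapers also buy beer" is usually scored by its support (how often the items appear together) and its confidence (how often the consequent appears when the antecedent does). The sketch below computes both. The baskets are made-up data, and the names `support`, `confidence`, and `BASKETS` are illustrative assumptions.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    # Raises ZeroDivisionError if the antecedent never occurs in any basket.
    return support(baskets, both) / support(baskets, antecedent)


# Made-up shopping baskets (illustrative only).
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

if __name__ == "__main__":
    print("support:", support(BASKETS, {"diapers", "beer"}))
    print("confidence:", confidence(BASKETS, {"diapers"}, {"beer"}))
```

In this toy data diapers and beer share two of five baskets, and two of the three diaper baskets also contain beer.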

Slide27

Data Mining Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall, 2003.

DILBERT reprinted by permission of United Feature Syndicate, Inc.

Slide28

Data Mining Outline

Introduction
Techniques
Examples
    Vision Mining
    Law Enforcement (Cheating, Plagiarism, Fraud, Criminal Behavior, …)
    Bioinformatics

Slide29

Vision Mining

License Plate Recognition
    Red Light Cameras
    Toll Booths
    http://www.licenseplaterecognition.com/
Computer Vision
    http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/vid/

Slide30

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

Slide31

No/Little Cheating

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

Slide32

Rampant Cheating

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

Slide33

Jialun Qin, Jennifer J. Xu, Daning Hu, Marc Sageman and Hsinchun Chen, “Analyzing Terrorist Networks: A Case Study of the Global Salafi Jihad Network,” Lecture Notes in Computer Science, Publisher: Springer-Verlag GmbH, Volume 3495 / 2005, p. 287.

Slide34

ArnetMiner
http://arnetminer.org/

Slide35

DNA

Basic building blocks of organisms
Located in nucleus of cells
Composed of 4 nucleotides
Two strands bound together

http://www.visionlearning.com/library/module_viewer.php?mid=63

Slide36

Central Dogma: DNA -> RNA -> Protein

DNA        CCTGAGCCAACTATTGATGAA
    (transcription)
RNA        CCUGAGCCAACUAUUGAUGAA
    (translation)
Protein    Amino Acid
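The two steps on this slide can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: transcription is modeled as copying the DNA sequence shown and replacing T with U, and `CODON_TO_AMINO_ACID` is a deliberately partial table covering only the codons in the slide's sequence. The function names are hypothetical.

```python
# Partial codon table (standard genetic code, one-letter amino acid symbols).
# Assumption: only the codons occurring in the slide's sequence are listed.
CODON_TO_AMINO_ACID = {
    "CCU": "P", "CCA": "P",  # proline
    "GAG": "E", "GAA": "E",  # glutamate
    "ACU": "T",              # threonine
    "AUU": "I",              # isoleucine
    "GAU": "D",              # aspartate
}


def transcribe(dna: str) -> str:
    """Simplified transcription: the RNA matches the given strand with U in place of T."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Read the RNA three bases at a time and look up each codon's amino acid."""
    usable = len(rna) - len(rna) % 3  # ignore an incomplete trailing codon
    codons = (rna[i:i + 3] for i in range(0, usable, 3))
    # Raises KeyError for codons missing from the partial table above.
    return "".join(CODON_TO_AMINO_ACID[codon] for codon in codons)


if __name__ == "__main__":
    rna = transcribe("CCTGAGCCAACTATTGATGAA")
    print(rna)
    print(translate(rna))
```

Run on the slide's DNA sequence, the sketch reproduces the RNA sequence shown and yields the amino acid string PEPTIDE.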

www.bioalgorithms.info; chapter 6; Gene Prediction

Slide37

Human Genome

Scientists originally thought there would be about 100,000 genes
Appear to be about 20,000
WHY?
Almost identical to that of Chimps. What makes the difference?
Answers appear to lie in the noncoding regions of the DNA (formerly thought to be junk)

Slide38

RNAi – Nobel Prize in Medicine 2006

Double stranded RNA
Short Interfering RNA (~20-25 nt)
RNA-Induced Silencing Complex
    Binds to mRNA
    Cuts RNA
siRNA may be artificially added to cell!

Image source: http://nobelprize.org/nobel_prizes/medicine/laureates/2006/adv.html, Advanced Information, Image 3

Slide39

miRNA

Short (20-25 nt) sequence of noncoding RNA
Known since 1993 but significance not widely appreciated until 2001
Impact / Prevent translation of mRNA
Generally reduce protein levels without impacting mRNA levels (animal cells)
Functions
    Causes some cancers
    Guide embryo development
    Regulate cell differentiation
    Associated with HIV
    …

Slide40

TCGR – Mature miRNA (Window=5; Pattern=3)

[Figure: TCGR panels for All Mature, Mus Musculus, Homo Sapiens, and C Elegans, with the patterns ACG, CGC, GCG, and UCG labeled]

Slide41

TCGRs for Xue Training Data

POSITIVE
NEGATIVE

C. Xue, F. Li, T. He, G. Liu, Y. Li, and X. Zhang, “Classification of Real and Pseudo MicroRNA Precursors using Local Structure-Sequence Features and Support Vector Machine,” BMC Bioinformatics, vol 6, no 310.

Slide42

Affymetrix GeneChip® Array
http://www.affymetrix.com/corporate/outreach/lesson_plan/educator_resources.affx

Slide43

BIG BROTHER ?

Total Information Awareness
    http://en.wikipedia.org/wiki/Information_Awareness_Office
Terror Watch List
    http://www.businessweek.com/technology/content/may2005/tc20050511_8047_tc_210.htm
    http://www.theregister.co.uk/2004/08/19/senator_on_terror_watch/
    http://blog.wired.com/27bstroke6/2008/02/us-terror-watch.html
CAPPS
    http://en.wikipedia.org/wiki/CAPPS

Slide44

http://ieeexplore.ieee.org/iel5/6/32236/01502526.pdf?tp=&arnumber=1502526&isnumber=32236

Slide45

Slide46

My DM Toolbelt

C, C++
Perl, Ruby
Weka
R, SAS
Excel, XLMiner
Vi, word, …
Grep, sed, …

Slide47

Thanks!