
AnHai Doan - PowerPoint Presentation

Uploaded On 2016-07-20




Presentation Transcript

Slide1

AnHai Doan
University of Wisconsin

Big Data, Big Knowledge, and Big Crowd

Slide2

The world has changed; now everything is data centric
  everyone collects, stores, analyzes TBs and PBs of data
To manage data in this new world, need 3B technologies:
  lot of data
    → need big data technologies to scale up algorithms
  data is noisy, unstructured, heterogeneous
    → need a lot of domain knowledge to understand
    such knowledge is often captured in big knowledge bases
  algorithms are imperfect, certain things humans do better, need humans in the loop
    scale is such that there are not enough human developers
    → need crowdsourcing with big crowd

Slide3

Examples

Semantic analysis of the Twitter stream
  process 3000-6000 tweets per sec, need fast data infrastructure
  to recognize entities, e.g., “go giant!”, need a big KB
  KB being built in real time using crowdsourcing

Product matching for e-commerce
  build 500+ matchers to match products
    one matcher per category: toy, electronics, clothes, etc.
  match 500K electronics products with 500K → need Hadoop
  use a KB to match numerous synonyms: soft cover = paperback, etc.
  use crowdsourcing to generate training and testing data

Slide4

Big Knowledge Technologies

Everyone is now building KBs
  IT companies: Google, Microsoft, …
  e-retailers: Amazon, Walmart, …
  stodgy behemoths: Johnson Controls, GE, …
  tiny startups, academia, …

User communities are building KBs (e.g., biomedical)

There will be not just data centers, but also knowledge centers
  KBs and tools that use such KBs
  critical for understanding data (e.g., tweets)

How do we help people build KBs? Knowledge centers?
  a next important direction for data integration research

Slide5

Big Crowd Technologies

Industry has been doing these for years

For us it’s not a fad, it’s fundamental
  as data management increasingly involves semantic problems

Have gotten off to a good start (platforms / problems)

Need hands-off crowdsourcing
  no developer in the loop, otherwise will not scale
  e.g., crowdsourcing 500 product matching problems, one per category

Need crowdsourcing for the masses
  e.g., journalist wants to match two political lists of donors

Need “grand challenges” for crowdsourcing?
  e.g., something like Wikipedia?
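
The product matching example from the slides (one matcher per category, with a KB resolving synonyms such as soft cover = paperback) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the system described in the talk: the tiny `SYNONYMS` dictionary stands in for a real KB, token-level Jaccard similarity and the `threshold` value are arbitrary choices made here, and nothing in it addresses the scale that the slides say requires Hadoop.

```python
from collections import defaultdict

# Hypothetical stand-in for a knowledge base: surface form -> canonical term.
SYNONYMS = {"soft cover": "paperback", "softcover": "paperback"}


def normalize(title, synonyms=SYNONYMS):
    """Lowercase, collapse whitespace, and rewrite known synonyms."""
    text = " ".join(title.lower().split())
    for variant, canonical in synonyms.items():
        text = text.replace(variant, canonical)
    return text


def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed, simplistic matcher)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match_by_category(left, right, threshold=0.6):
    """Match (category, title) records, comparing only within a category."""
    right_by_category = defaultdict(list)
    for category, title in right:
        right_by_category[category].append(title)

    matches = []
    for category, title in left:
        for candidate in right_by_category.get(category, []):
            score = jaccard(normalize(title), normalize(candidate))
            if score >= threshold:
                matches.append((title, candidate))
    return matches


left = [("books", "Data Integration soft cover")]
right = [
    ("books", "Data Integration paperback"),
    ("toys", "Data Integration paperback"),
]
print(match_by_category(left, right))
```

Grouping candidates by category before comparing mirrors the "one matcher per category" idea: records in different categories are never compared, and in a fuller system each category could carry its own matcher and threshold. Without the synonym rewrite, the two book titles share only two of five distinct tokens and would fall below the assumed threshold.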