Presentation Transcript

Slide1

Crowd Mining

Tova Milo

Slide2

The engagement of crowds of Web users for data procurement
10/1/2013

Background: Crowd (Data) sourcing

Slide3

Crowdsourcing

Challenges (or, shameless self-advertisement)

What questions to ask? [SIGMOD13, VLDB13]

How to define & determine correctness of answers? [WWW12, ICDE11]

Who to ask? How many people? How to best use the resources? [ICDE12, ICDT13, ICDE13, VLDB13]


Data Mining

Data Cleaning

Probabilistic Data

Optimizations and Incremental Computation

Slide4

Crowd Mining: Crowdsourcing in an open world

Human knowledge forms an open world
Assume we want to find out what is interesting and important in some domain area
Folk medicine, people’s habits, …
What questions to ask?


Slide5

Back to classic databases...

Significant data patterns are identified using data mining techniques.
A useful type of pattern: association rules
E.g., stomach ache → chamomile
Queries are dynamically constructed in the learning process

Is it possible to mine the crowd?


Slide6

Turning to the crowd

Let us model the history of every user as a personal database
Every case = a transaction consisting of items
Not recorded anywhere – a hidden DB
It is hard for people to recall many details about many transactions!
But … they can often provide summaries, in the form of personal rules:
“To treat a sore throat I often use garlic”


Treated a sore throat with garlic and oregano leaves
Treated a sore throat and low fever with garlic and ginger …
Treated a heartburn with water, baking soda and lemon
Treated nausea with ginger, the patient experienced sleepiness

Slide7

Two types of questions

Free recollection (mostly simple, prominent patterns) → Open questions
Concrete questions (may be more complex) → Closed questions

We use the two types in an interleaved fashion.

Closed question: “When a patient has both headaches and fever, how often do you use a willow tree bark infusion?”
Open question: “Tell me how you treat a particular illness”
“I typically treat nausea with ginger infusion”


Slide8

Contributions (at a very high level)

Formal model for crowd mining: allowed questions and the interpretation of answers; personal rules and their overall significance.
A framework of the generic components required for mining the crowd.
Significance and error estimations [and how these change if we ask more questions…].
Crowd-mining algorithms [implementation & benchmark, on both synthetic & real data].


Slide9

The model: User support and confidence

A set of users U
Each user u ∈ U has a (hidden!) transaction database Du
Each rule X → Y is associated with:
a user support
a user confidence

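A minimal Python sketch of these per-user measures (illustration only, not from the talk; the transaction database and item names are invented, echoing the folk-medicine examples):

```python
# One user's (hidden) personal database D_u: each transaction is a set of items.
# The transactions below are invented for illustration.
D_u = [
    {"sore_throat", "garlic", "oregano"},
    {"sore_throat", "low_fever", "garlic", "ginger"},
    {"nausea", "ginger"},
    {"sore_throat", "honey"},
]

def user_support(db, X, Y):
    """Fraction of the user's transactions that contain X ∪ Y."""
    XY = set(X) | set(Y)
    return sum(XY <= t for t in db) / len(db)

def user_confidence(db, X, Y):
    """Among the user's transactions containing X, the fraction that also contain Y."""
    X, Y = set(X), set(Y)
    with_X = [t for t in db if X <= t]
    return sum(Y <= t for t in with_X) / len(with_X) if with_X else 0.0

# The rule sore_throat -> garlic for this user:
print(user_support(D_u, {"sore_throat"}, {"garlic"}))     # 0.5
print(user_confidence(D_u, {"sore_throat"}, {"garlic"}))  # 0.666...
```

The crowd-mining twist is that Du is hidden: these numbers are never computed directly, only reported approximately by the user.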

Slide10

Model for closed and open questions

Closed questions: X → Y ?
Answer: (approximate) user support and confidence
Open questions: ? → ?
Answer: an arbitrary rule with its user support and confidence

“I typically have a headache once a week. In 90% of the times, coffee helps.”


Slide11

Significant rules

Significant rules: rules where the mean user support and confidence are above some specified thresholds Θs, Θc.
Goal: estimating rule significance (and identifying the significant rules) while asking the smallest possible number of questions to the crowd.

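A deliberately naïve sketch of this goal: poll a sample of users with the closed question and compare the mean reported support and confidence to Θs and Θc. (The function names, the fixed sample size, and the toy crowd are assumptions for illustration; the talk’s algorithms choose questions far more economically.)

```python
import random

def estimate_significance(users, ask, theta_s, theta_c, sample_size=30, seed=0):
    """Decide significance of one rule from a sample of crowd answers.

    ask(u) poses the closed question to user u and returns the
    (approximate) user support and confidence that u reports.
    """
    rng = random.Random(seed)
    sample = rng.sample(users, min(sample_size, len(users)))
    answers = [ask(u) for u in sample]
    mean_supp = sum(s for s, _ in answers) / len(answers)
    mean_conf = sum(c for _, c in answers) / len(answers)
    return mean_supp >= theta_s and mean_conf >= theta_c

# Toy crowd: each "user" just reports fixed (support, confidence) values.
crowd = [(0.6, 0.9), (0.4, 0.8), (0.5, 0.7)] * 10
print(estimate_significance(range(len(crowd)), lambda u: crowd[u],
                            theta_s=0.3, theta_c=0.5))  # True
```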

Slide12

Framework components

One generic framework for crowd mining
Particular choices of implementation for the black boxes
Validated by experiments


Slide13

What is a good algorithm?

How do we measure the efficiency of crowd mining algorithms?

Two distinct cost factors:
Crowd complexity: # of crowd queries used by the algorithm
Computational complexity: the complexity of computing the crowd queries and processing the answers
[A crowd complexity lower bound is a trivial computational complexity lower bound]
There is a tradeoff between the two complexity measures: naïve question selection -> more crowd questions


Slide14

Semantic knowledge can save work

Given a taxonomy of is-a relationships among items, e.g., espresso is-a coffee:
frequent({headache, espresso}) → frequent({headache, coffee})

Advantages:
Allows inference on itemset frequencies
Allows avoiding semantically equivalent itemsets: {espresso}, {espresso, coffee}, {espresso, beverage}, …

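The inference step can be sketched as follows (a toy illustration; the `parent` map encodes the assumed is-a edges espresso → coffee → beverage):

```python
from itertools import product

# Assumed is-a taxonomy, child -> parent; items not listed are roots.
parent = {"espresso": "coffee", "coffee": "beverage"}

def generalizations(itemset):
    """All itemsets reachable by replacing items with their is-a ancestors.

    Since frequency is monotone along is-a edges, if `itemset` is known to
    be frequent, every generalization is frequent for free: no crowd
    question is needed for it.
    """
    choices = []
    for item in itemset:
        ups = [item]
        while item in parent:
            item = parent[item]
            ups.append(item)
        choices.append(ups)
    return {frozenset(c) for c in product(*choices)}

# frequent({headache, espresso}) implies frequent({headache, coffee}):
gs = generalizations({"headache", "espresso"})
print(frozenset({"headache", "coffee"}) in gs)  # True
```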

Slide15

Complexity measures

Given a taxonomy Ψ, we measure complexity in terms of:
the input Ψ: its size, shape, width…
the output frequent itemsets, represented compactly by the Maximal Frequent Itemsets (MFI) and Minimal Infrequent Itemsets (MII)


Slide16

Complexity boundaries

Notations:
|Ψ| - the taxonomy size
|I(Ψ)| - the number of itemsets (modulo equivalences)
|S(Ψ)| - the number of possible solutions

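For intuition, the MFI and MII representations can be computed by brute force over a tiny universe (a sketch only; the three items and the frequent family are invented, with the family downward-closed as frequency requires):

```python
from itertools import combinations

items = ["a", "b", "c"]
# Downward-closed family of frequent itemsets (every subset of a frequent
# itemset is frequent), as would be learned from crowd answers.
frequent = {frozenset(), frozenset("a"), frozenset("b"), frozenset("c"),
            frozenset("ab")}

def all_itemsets(items):
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def mfi(frequent):
    """Maximal frequent itemsets: frequent, with no frequent strict superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]

def mii(items, frequent):
    """Minimal infrequent itemsets: infrequent, all strict subsets frequent."""
    universe = all_itemsets(items)
    return [s for s in universe if s not in frequent
            and all(t in frequent for t in universe if t < s)]

print(sorted(sorted(s) for s in mfi(frequent)))         # [['a', 'b'], ['c']]
print(sorted(sorted(s) for s in mii(items, frequent)))  # [['a', 'c'], ['b', 'c']]
```

Either list determines the whole family, which is why the output can be measured by |MFI| or |MII| rather than by the number of frequent itemsets.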

Slide17

Now, back to the bigger picture…


The user’s question in natural language:
“I’m looking for activities to do in a child-friendly attraction in New York, and a good restaurant nearby”

Answers:
“You can go bike riding in Central Park and eat at Maoz Vegetarian. Rent bikes at the boathouse”
“You can go visit the Bronx Zoo and eat at Pine Restaurant. Order antipasti at Pine. Skip dessert and go for ice cream across the street”

Slide18

ID  Transaction Details
1   <Football> doAt <Central_Park>
2   <Biking> doAt <Central_Park>. <BaseBall> doAt <Central_Park>. <Rent_Bikes> doAt <Boathouse>
3   <Falafel> eatAt <Maoz_Veg.>
4   <Antipasti> eatAt <Pine>
5   <Visit> doAt <Bronx_Zoo>. <Antipasti> eatAt <Pine>

Slide19

Solution Ingredients

Crowd Mining Query Language (based on SPARQL and DMQL)
Describing the relevant part of the ontology, and
The type of (association) rules we are interested in

Crowd-based Query Evaluation Algorithms
Open and closed questions to the crowd
Sampling and answers aggregation
Refinement…


Slide20

Crowd Mining Query language

(based on SPARQL and DMQL)


FIND association rules
RELATED TO x, y+, z, u?, p?, v?
WHERE {$x instanceOf <Attraction>.
       $x inside <NYC>.
       $x hasLabel “Child Friendly”.
       $y subClassOf* <Activity>.
       $z instanceOf <Restaurant>.
       $z nearBy $x.
       ...}
MATCHING
  ( {} => {([] eatAt $z.)}
    WITH support THRESHOLD = 0.007 )
AND
  ( {([] doAt $x)} => {($y doAt $x), ($u $p $v)}
    WITH support THRESHOLD = 0.01
    WITH confidence THRESHOLD = 0.2
    RETURN MFI )

Annotations on the query:
RELATED TO determines the group size for each variable
SPARQL-like WHERE clause; $x is a variable
subClassOf* – a path of length 0 or more
The left-hand and right-hand parts of the rule are defined as SPARQL patterns
Mining parameters: thresholds, output granularity, etc.
Several rule patterns can be mined, joined by AND/OR

Slide21

Can we trust the crowd?


Slide22


Can we trust the crowd?


Slide23

Summary


The crowd is an incredible resource…

But must be used carefully!

Many challenges:
(very) interactive computation
A huge amount of data
Varying quality and trust

“Computers are useless, they can only give you answers”
- Pablo Picasso

But, as it seems, they can also ask us questions!

Slide24

Thanks


Antoine Amarilli, Yael Amsterdamer, Rubi Boim, Susan Davidson, Ohad Greenshpan, Benoit Gross, Yael Grossman, Ezra Levin, Ilia Lotosh, Slava Novgordov, Neoklis Polyzotis, Sudeepa Roy, Pierre Senellart, Amit Somech, Wang-Chiew Tan…

EU-FP7 ERC MoDaS: Mob Data Sourcing

Slide25

(Closed) Questions to the crowd


How often do you do something in Central Park?
{ ( [] doAt <Central_Park> ) }
Supp = 0.7

How often do you eat at Maoz Vegeterian?
{ ( [] eatAt <Maoz_Vegeterian> ) }
Supp = 0.1

How often do you swim in Central Park?
{ ( <Swim> doAt <Central_Park> ) }
Supp = 0.3

Slide26

(Open) Questions to the crowd


What do you do in Central Park?
{ ( $y doAt <Central_Park> ) }
$y = Football, supp = 0.6

What else do you do when biking in Central Park?
{ ( <Biking> doAt <Central_Park> ), ( $y doAt <Central_Park> ) }
$y = Football, supp = 0.6

Complete: “When I go biking in Central Park…”
{ ( <Biking> doAt <Central_Park> ), ( $u $p $v ) }
$u = <Rent_Bikes>, $p = doAt, $v = <BoatHouse>, supp = 0.6
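One way the “sampling and answers aggregation” ingredient might combine such answers, sketched in Python (hypothetical: the `min_mentions` cutoff and the toy answer list are assumptions for illustration; the rule triples echo the slides):

```python
from collections import defaultdict

# Open answers collected from different users: (rule triple, reported support).
open_answers = [
    (("Biking", "doAt", "Central_Park"), 0.6),
    (("Biking", "doAt", "Central_Park"), 0.4),
    (("Football", "doAt", "Central_Park"), 0.6),
]

def candidate_rules(answers, min_mentions=2):
    """Group open answers by rule and average the reported supports.

    Rules mentioned by enough users become candidates for targeted
    closed questions to the rest of the crowd.
    """
    by_rule = defaultdict(list)
    for rule, supp in answers:
        by_rule[rule].append(supp)
    return {rule: sum(supps) / len(supps)
            for rule, supps in by_rule.items() if len(supps) >= min_mentions}

print(candidate_rules(open_answers))
# {('Biking', 'doAt', 'Central_Park'): 0.5}
```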

Slide27

Example of extracted association rules

< Ball Games, doAt, Central_Park > (Supp: 0.4, Conf: 1)
< [ ], eatAt, Maoz_Vegeterian > (Supp: 0.2, Conf: 0.2)
< Visit, doAt, Bronx_Zoo > (Supp: 0.2, Conf: 1)
< [ ], eatAt, Pine_Restaurant > (Supp: 0.2, Conf: 0.2)
