On Incentive-Based Tagging - PowerPoint Presentation




Slide1

On Incentive-Based Tagging

Xuan S. Yang, Reynold Cheng, Luyi Mo, Ben Kao, David W. Cheung
{xyang2, ckcheng, lymo, kao, dcheung}@cs.hku.hk
The University of Hong Kong

Slide2

Outline

Introduction
Problem Definition & Solution
Experiments
Conclusions & Future Work

Slide3

Collaborative Tagging Systems

Example: Delicious, Flickr
Users / Taggers
Resources
  Webpages
  Photos
Tags
  Descriptive keywords
Post
  Non-empty set of tags

Slide4

Applications with Tag Data

Search [1][2]
Recommendation [3]
Clustering [4]
Concept Space Learning [5]

[1] Optimizing web search using social annotations. S. Bao et al. WWW’07
[2] Can social bookmarking improve web search? P. Heymann et al. WSDM’08
[3] Structured approach to query recommendation with social annotation data. J. Guo CIKM’10
[4] Clustering the tagged web. D. Ramage et al. WSDM’09
[5] Exploring the value of folksonomies for creating semantic metadata. H. S. Al-Khalifa IJWSIS’07

Slide5

Problem of Collaborative Tagging

Most posts are given to a small number of highly popular resources
Dataset from Delicious [6]: 30m URLs in total
  Under-tagging: over 10m URLs are tagged just once
  Over-tagging: 39% of posts vs. 1% of URLs

[6] Analyzing Social Bookmarking Systems: A del.icio.us Cookbook. ECAI Mining Social Data Workshop. 2008

Slide6

Under-Tagging

Resources with very few posts have low-quality tag data
Low quality of one single post
  Irrelevant to the resource: {3dmax}
  Does not cover all the aspects: {geography, education}
  Don’t know which tag is more important: {maps, education}
Improve tag data quality for an under-tagged resource by giving it a sufficient number of posts

Slide7

Having a sufficient No. of Posts

All aspects of the resource will be covered
Relative occurrence frequency of tag t can reflect its importance
  Irrelevant tags rarely appear
  Important tags occur frequently
Can we always improve tag data quality by giving more posts to a resource?

Slide8

Over-Tagging

Relative frequency vs. no. of posts
  >= 250 posts: stable
Tagging efforts are wasted!

Slide9

Incentive-Based Tagging

Guide users’ tagging effort
  Reward users for annotating under-tagged resources
Reduce the number of under-tagged resources
Save the tagging efforts wasted on over-tagged resources

Slide10

Incentive-Based Tagging (cont’d)

Limited budget
Incentive allocation
  Selected resource
Objective: maximize quality improvement
  Quality metric for tag data

Slide11

Effect of Incentive-Based Tagging

Top-10 most similar query over 5,000 tagged resources
Query: www.myphysicslab.com
  Simulation for physics experiments
  Implemented in Java

Tag data and top-10 result:
  Base case, 150k posts from Delicious: 10 Java
  150k + 10k more posts from Delicious: 4 Physics, 6 Java
  150k + 10k more posts from incentive-based tagging: 9 Physics, 1 Simulation
  Ideal case, 2m posts from Delicious: 10 Physics

Slide12

Related Work

Tag Recommendation [7][8][9]
  Automatically assign tags to resources
Differences:
  Machine-learning based methods vs. human labor

[7] Social Tag Prediction. P. Heymann, SIGIR’08
[8] Latent Dirichlet Allocation for Tag Recommendation, R. Krestel, RecSys’09
[9] Learning Optimal Ranking with Tensor Factorization for Tag Recommendation, S. Rendle, KDD’09

Slide13

Related Work (Cont’d)

Data Cleaning under Limited Budget [10]
Similarity:
  Improve data quality with human labor
Opposite directions:
  “-” Remove uncertainty
  “+” Enrich information

[10] Explore or Exploit? Effective Strategies for Disambiguating Large Databases. R. Cheng VLDB’10

Slide14

Outline

Introduction
Problem Definition & Solution
Experiments
Conclusions & Future Work

Slide15

Data Model

Set of resources
For a specific resource ri
  Post: a set of tags
  Post sequence {pi(k)}
  Relative frequency distribution (rfd), after ri has k posts
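
A minimal sketch of how an rfd could be computed from a post sequence, using the three example posts on this slide. The function name is illustrative, not taken from the presentation; normalizing by the total number of tag occurrences is inferred from the slide's own numbers (5 occurrences, so 1/5 = 0.2 and 2/5 = 0.4).

```python
from collections import Counter


def relative_frequency_distribution(posts):
    """Map each tag to its share of all tag occurrences in `posts`.

    `posts` is a sequence of posts, each post being a non-empty set of tags.
    """
    counts = Counter(tag for post in posts for tag in post)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# The three example posts from the slide.
posts = [{"maps", "education"}, {"geography", "education"}, {"3dmax"}]
rfd = relative_frequency_distribution(posts)
print(rfd["education"])  # 0.4
print(rfd["maps"])       # 0.2
```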

Example posts: {maps, education}, {geography, education}, {3dmax}

Tag         Frequency   Relative Frequency
Maps        1           0.2
Geography   1           0.2
Education   2           0.4
3dmax       1           0.2

Slide16

Quality Model: Tagging Stability

Stability of rfd
  Average similarity between ω rfds, i.e., the (k-ω+1)-th, …, k-th rfd
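
A rough sketch of the stability measure, under two assumptions the slide does not spell out: similarity is taken to be cosine similarity, and the "average similarity" is taken over adjacent pairs of rfds inside the window. The names `stability`, `stable_point`, `omega` and `threshold` are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (assumed metric)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rfd_sequence(posts):
    """Yield the rfd after the 1st, 2nd, ..., last post."""
    counts, total = {}, 0
    for post in posts:
        for tag in post:
            counts[tag] = counts.get(tag, 0) + 1
            total += 1
        yield {t: n / total for t, n in counts.items()}


def stability(rfds, k, omega):
    """Average adjacent-pair similarity over the (k-omega+1)-th..k-th rfd.

    Returns None while the window is not yet full (k < omega) or omega < 2.
    """
    if omega < 2 or k < omega:
        return None
    window = rfds[k - omega:k]  # rfds is 0-indexed, k is 1-indexed
    sims = [cosine(x, y) for x, y in zip(window, window[1:])]
    return sum(sims) / len(sims)


def stable_point(posts, omega, threshold):
    """Smallest k whose stability reaches the threshold, or None."""
    rfds = list(rfd_sequence(posts))
    for k in range(omega, len(rfds) + 1):
        if stability(rfds, k, omega) >= threshold:
            return k
    return None


posts = [{"maps", "education"}, {"geography", "education"}, {"3dmax"},
         {"maps", "education"}, {"education"}]
print(stable_point(posts, omega=3, threshold=0.95))
```

Under this reading, the rfd at the stable point would serve as the resource's stable rfd.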

[Figure labels: stable point, threshold, stable rfd]

Slide17

Quality

For one resource ri with k posts
  Similarity between its current rfd and its stable rfd
For a set of resources R
  Average quality of all the resources

Slide18

Incentive-Based Tagging

Input
  A set of resources
  Initial posts
  Budget
Output
  Incentive assignment: how many new posts should ri get
Objective
  Maximize quality
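
To make the objective concrete, here is a small sketch of the quality being maximized: per-resource quality as the similarity between the current rfd and the stable rfd, averaged over the resource set. Cosine similarity is an assumption (the slides only say "similarity"), and the rfd values below are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (assumed metric)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def quality(current_rfd, stable_rfd):
    """Quality of one resource: similarity of its current rfd to its stable rfd."""
    return cosine(current_rfd, stable_rfd)


def set_quality(current, stable):
    """Average quality over a set of resources (dicts: resource id -> rfd)."""
    return sum(quality(current[r], stable[r]) for r in stable) / len(stable)


# Hypothetical data: r1 already matches its stable rfd, r2 shares no tag with it.
current = {"r1": {"maps": 0.5, "education": 0.5}, "r2": {"3dmax": 1.0}}
stable = {"r1": {"maps": 0.5, "education": 0.5}, "r2": {"physics": 1.0}}
print(set_quality(current, stable))  # roughly 0.5
```

An incentive assignment changes `current` by adding posts; the problem asks for the assignment, within budget, that makes this average as large as possible.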

[Figure: post timelines of r1, r2, r3, with the current time marked]

Slide19

Incentive-Based Tagging (cont’d)

Optimal solution
  Dynamic programming
  Best quality improvement
  Assumption: the stable rfd and the future posts are known
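
One way such a dynamic program could look, as a sketch rather than the authors' exact formulation. It assumes a hypothetical table `quality_after[i][x]` giving the quality of resource i after x extra posts, which is only computable in hindsight (this is the slide's assumption that stable rfds and future posts are known). Maximizing the sum of qualities is equivalent to maximizing the average.

```python
def optimal_allocation(quality_after, budget):
    """Allocate at most `budget` posts to maximize total quality.

    quality_after[i][x] is the quality of resource i after x extra posts,
    for x = 0 .. len(quality_after[i]) - 1.
    Returns (best_total_quality, posts_per_resource).
    """
    n = len(quality_after)
    neg = float("-inf")
    # best[i][b]: max total quality of the first i resources with at most b posts.
    best = [[0.0] * (budget + 1)] + [[neg] * (budget + 1) for _ in range(n)]
    choice = [[0] * (budget + 1) for _ in range(n)]
    for i, q in enumerate(quality_after, start=1):
        for b in range(budget + 1):
            for x in range(min(b, len(q) - 1) + 1):
                cand = best[i - 1][b - x] + q[x]
                if cand > best[i][b]:
                    best[i][b] = cand
                    choice[i - 1][b] = x
    # Walk back through the recorded choices to recover the assignment.
    alloc, b = [0] * n, budget
    for i in range(n, 0, -1):
        alloc[i - 1] = choice[i - 1][b]
        b -= alloc[i - 1]
    return best[n][budget], alloc


# Made-up quality curves for three resources and a budget of 2 posts.
quality_after = [
    [0.50, 0.90, 0.95],  # r1: the first extra post helps a lot
    [0.90, 0.92, 0.93],  # r2: already close to stable
    [0.20, 0.60, 0.90],  # r3: under-tagged
]
total, alloc = optimal_allocation(quality_after, budget=2)
print(alloc)  # [1, 0, 1]
```

The sketch runs in O(n * budget^2) time, which is fine for illustration but would need care at the scale of the experiments.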

[Figure: post timelines of r1, r2, r3, with the current time marked]

Slide20

Strategy Framework

Slide21

Implementing CHOOSE()

Free Choice (FC)
  Users freely decide which resource they want to tag.
Round Robin (RR)
  The resources have an even chance to get posts.

Slide22

Implementing CHOOSE()

Fewest Post First (FP)
  Prioritize under-tagged resources
Most Unstable First (MU)
  Resources with unstable rfds need more posts
  Window size
Hybrid (FP-MU)
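
A hedged sketch of how a budget loop with a pluggable CHOOSE() might be organized, with RR, FP and MU as example choosers. Everything beyond the strategy names is an assumption: the loop structure, the `request_post` callback standing in for a rewarded tagger, cosine similarity, and treating resources with fewer than `omega` posts as maximally unstable. FC and the FP-MU hybrid are left out because the slides do not say how users choose or how the two rules are combined.

```python
import math
from collections import Counter
from itertools import cycle


def rfd(posts):
    """Relative frequency distribution of tags over a list of posts."""
    counts = Counter(tag for post in posts for tag in post)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (assumed metric)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def instability(posts, omega):
    """1 - average adjacent-rfd similarity over the last omega rfds (omega >= 2)."""
    k = len(posts)
    if k < omega:
        return 1.0  # assumption: too few posts counts as maximally unstable
    rfds = [rfd(posts[:j]) for j in range(k - omega + 1, k + 1)]
    sims = [cosine(x, y) for x, y in zip(rfds, rfds[1:])]
    return 1.0 - sum(sims) / len(sims)


def choose_fp(state, omega):
    """Fewest Post First: the resource with the fewest posts so far."""
    return min(state, key=lambda r: len(state[r]))


def choose_mu(state, omega):
    """Most Unstable First: the resource whose recent rfds vary the most."""
    return max(state, key=lambda r: instability(state[r], omega))


def make_choose_rr(resources):
    """Round Robin: cycle through the resources in a fixed order."""
    order = cycle(resources)
    return lambda state, omega: next(order)


def run_strategy(state, budget, choose, request_post, omega=3):
    """Spend `budget` rewarded posts; return how many each resource received."""
    assignment = {r: 0 for r in state}
    for _ in range(budget):
        r = choose(state, omega)          # CHOOSE(): pick the resource to reward
        state[r].append(request_post(r))  # a tagger submits one post for r
        assignment[r] += 1
    return assignment


state = {"r1": [{"x"}] * 5, "r2": [{"x"}], "r3": [{"x"}] * 3}
print(run_strategy(state, 4, choose_fp, lambda r: {"x"}))
# {'r1': 0, 'r2': 3, 'r3': 1}
```

In a simulation like the one in the experiments, `request_post` could replay the next recorded post of the chosen resource.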

[Figure: post timelines of r1, r2, r3]

Slide23

Outline

Introduction
Problem Definition & Solution
Experiments
Conclusion & Future Work

Slide24

Setup

Delicious dataset during year 2007
5,000 resources
  Passed their stable point
  Know the entire post sequence
Simulation from Feb. 1, 2007
  148,471 posts in total
  7% passed stable point
  25% under-tagged (# of posts < 10)

[Figure: post timelines of r1, r2, r3, with the simulation start marked]

Slide25

Quality vs. Budget

FP & FP-MU are close to optimal
FC does NOT increase the quality
Budget = 1,000
  0.7% more posts compared with the initial number
  6.7% quality improvement
Make all resources reach their stable point
  FC: over 2 million more posts
  FP & FP-MU: 90% saved

Slide26

Over-Tagging

Free Choice: 50% of posts are over-tagging, wasted
FP, MU and FP-MU: 0%

Slide27

Top-10 Similar Sites (Cont’d)

On Feb. 1, 2007
  www.myphysicslab.com
  3 posts
  Top-10 all Java related
10,000 more posts by FC
  Gets 4 more posts
  4/10 physics related

Slide28

Top-10 Similar Sites (Cont’d)

On Dec. 31, 2007
  270 posts
  Top-10 all physics related
  Perfect result
10,000 more posts by FP
  Gets 11 more posts
  Top 9 physics related
  9 included in the perfect result
  Top 6 in the same order as the perfect result

Slide29

Conclusion

Define tag data quality
Problem of incentive-based tagging
Effective solutions
  Improve data quality
  Improve quality of application results
    E.g. top-k search

Slide30

Future Work

Different costs of tagging operation
User preference in allocation process
System development

Slide31

References

[1] Optimizing web search using social annotations. S. Bao et al. WWW’07
[2] Can social bookmarking improve web search? P. Heymann et al. WSDM’08
[3] Structured approach to query recommendation with social annotation data. J. Guo CIKM’10
[4] Clustering the tagged web. D. Ramage et al. WSDM’09
[5] Exploring the value of folksonomies for creating semantic metadata. H. S. Al-Khalifa IJWSIS’07
[6] Analyzing Social Bookmarking Systems: A del.icio.us Cookbook. ECAI Mining Social Data Workshop. 2008
[7] Social Tag Prediction. P. Heymann, SIGIR’08
[8] Latent Dirichlet Allocation for Tag Recommendation, R. Krestel, RecSys’09
[9] Learning Optimal Ranking with Tensor Factorization for Tag Recommendation, S. Rendle, KDD’09
[10] Explore or Exploit? Effective Strategies for Disambiguating Large Databases. R. Cheng VLDB’10

Slide32

Thank you!

Contact Info: Xuan Shawn Yang
University of Hong Kong
xyang2@cs.hku.hk
http://www.cs.hku.hk/~xyang2

Slide33

Effectiveness of Quality Metric (Backup)

All-pair similarity
  Represent each resource by its tags
  Calculate the similarity between all pairs of resources
  Compare the similarity result with the gold standard

Slide34

Under-Tagged Resources (Backup)

Slide35

Other Top-10 Similar Sites (Backup)

Slide36

Problem of Collaborative Tagging (Backup)

Most posts are given to a small number of highly popular resources
Dataset from delicious.com
  All 30m URLs
  39% of posts vs. top 1% of URLs
  Over 10m URLs are tagged just once
Selected 5,000 resources
  High-quality resources
  7% passed stable points
  50% over-tagging posts
  25% under-tagged (< 10 posts)

Slide37

Tagging Stability (Backup)

Example
  Window size
  Threshold
  Stable point: 100
  Stable rfd:
