
Ensuring quality in crowdsourced search relevance evaluation - PowerPoint Presentation

Uploaded On 2023-09-19


Subtitle: search relevance evaluation - The effects of training question distribution. Authors: John Le (CrowdFlower), Andy Edmonds (eBay), Vaughn Hester (CrowdFlower), Lukas Biewald (CrowdFlower).




Presentation Transcript

1. Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution
John Le - CrowdFlower
Andy Edmonds - eBay
Vaughn Hester - CrowdFlower
Lukas Biewald - CrowdFlower

2. Background/Motivation
Human judgments for search relevance evaluation/training
Quality control in crowdsourcing
Observed worker regression to the mean over previous months

3. (image-only slide; no transcript text)

4. Our Techniques for Quality Control
Training data = training questions
Questions to which we know the answer
Dynamic learning for quality control
An initial training period
Per-HIT screening questions
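The screening scheme above (gold "training questions" with known answers, used to decide which workers to trust) can be sketched as follows. This is a minimal illustration: the class name, the 0.7 trust threshold, and the toy answers are our assumptions, not details from the paper.

```python
class WorkerScreen:
    """Track a worker's running accuracy on gold (known-answer) questions."""

    def __init__(self, trust_threshold=0.7):
        self.trust_threshold = trust_threshold  # illustrative cutoff
        self.correct = 0
        self.seen = 0

    def record_gold(self, worker_answer, gold_answer):
        # Called once per embedded training question the worker answers.
        self.seen += 1
        if worker_answer == gold_answer:
            self.correct += 1

    def is_trusted(self):
        # Workers are untrusted until they have answered at least one
        # gold question at or above the threshold accuracy.
        return self.seen > 0 and self.correct / self.seen >= self.trust_threshold


screen = WorkerScreen()
for ans, gold in [("Matching", "Matching"), ("Matching", "Not Matching"),
                  ("Not Matching", "Not Matching"), ("Off Topic", "Off Topic")]:
    screen.record_gold(ans, gold)
# 3 of 4 gold answers correct -> 0.75 >= 0.7, so this worker stays trusted.
```

In a real pipeline the same check would run per HIT, so a worker who starts answering randomly is dropped mid-task rather than only at the initial training period.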

5. (image-only slide; no transcript text)

6. Contributions
Questions explored:
Does training data setup and distribution affect worker output and final results?
Why important?
Quality control is paramount
Quantifying and understanding the effect of training data

7. The Experiment: AMT
Using Mechanical Turk and the CrowdFlower platform
25 results per HIT
20 cents per HIT
No Turk qualifications
Title: “Judge approximately 25 search results for relevance”

8. Judgment Dataset
Dataset: a major online retailer’s internal product search projects
256 queries with 5 product pairs associated with each query = 1280 search results
Examples: “epiphone guitar”, “sofa”, and “yamaha a100”

9. Experimental Manipulation

Judge training question answer distribution skews:

              Exp 1   Exp 2   Exp 3   Exp 4   Exp 5
Matching      72.7%   58%     45.3%   34.7%   12.7%
Not Matching  8%      23.3%   47.3%   56%     84%
Off Topic     19.3%   18%     7.3%    9.3%    3.3%
Spam          0%      0.7%    0%      0.7%    0%

Underlying distribution skew:

Matching   Not Matching   Off Topic   Spam
14.5%      82.67%         2.5%        0.33%
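One way to quantify how far each experimental training skew sits from the underlying answer distribution is total variation distance (half the L1 distance). The percentages come from the slide; the TV-distance comparison itself is our illustration, not a metric the paper reports.

```python
# Training-question answer distributions for experiments 1-5 (from the slide),
# expressed as fractions, versus the underlying answer distribution.
underlying = {"Matching": 0.145, "Not Matching": 0.8267,
              "Off Topic": 0.025, "Spam": 0.0033}
experiments = [
    {"Matching": 0.727, "Not Matching": 0.080, "Off Topic": 0.193, "Spam": 0.000},
    {"Matching": 0.580, "Not Matching": 0.233, "Off Topic": 0.180, "Spam": 0.007},
    {"Matching": 0.453, "Not Matching": 0.473, "Off Topic": 0.073, "Spam": 0.000},
    {"Matching": 0.347, "Not Matching": 0.560, "Off Topic": 0.093, "Spam": 0.007},
    {"Matching": 0.127, "Not Matching": 0.840, "Off Topic": 0.033, "Spam": 0.000},
]

def tv_distance(p, q):
    """Total variation distance between two distributions over the same labels."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

distances = [tv_distance(e, underlying) for e in experiments]
# Experiment 5's training skew is the closest to the underlying distribution,
# experiment 1's (Matching-heavy) the farthest.
```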

10. Experimental Control
Round-robin workers into the simultaneously running experiments
Note: only one HIT showed up on Turk
Workers were sent to the same experiment if they left and returned

11. Results
Worker participation
Mean worker performance
Aggregate majority vote accuracy
Performance measures: precision and recall

12. Worker Participation

                  Exp 1   Exp 2   Exp 3   Exp 4   Exp 5
Came to the Task  43      42      42      87      41
Did Training      26      25      27      50      21
Passed Training   19      18      25      37      17
Failed Training   7       7       2       13      4
Percent Passed    73%     72%     92.6%   74%     80.9%

(Experiments run from Matching skew at Exp 1 to Not Matching skew at Exp 5)

13. Mean Worker Performance

                          Exp 1   Exp 2   Exp 3   Exp 4   Exp 5
Accuracy (Overall)        0.690   0.708   0.749   0.763   0.790
Precision (Not Matching)  0.909   0.895   0.930   0.917   0.915
Recall (Not Matching)     0.704   0.714   0.774   0.800   0.828

(Experiments run from Matching skew at Exp 1 to Not Matching skew at Exp 5)
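The per-worker measures on this slide are overall accuracy, plus precision and recall treating "Not Matching" as the positive class. A minimal sketch of how they are computed; the four toy judgments are made up for illustration.

```python
def performance(gold, pred, positive="Not Matching"):
    """Accuracy over all labels, precision/recall for the positive class."""
    correct = sum(g == p for g, p in zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy example: one worker's answers against four gold labels.
gold = ["Not Matching", "Not Matching", "Matching", "Not Matching"]
pred = ["Not Matching", "Matching", "Matching", "Not Matching"]
acc, prec, rec = performance(gold, pred)
# acc = 0.75; precision = 1.0 (no false "Not Matching" calls);
# recall = 2/3 (one "Not Matching" item was missed).
```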

14. Aggregate Majority Vote Accuracy: Trusted Workers
(chart: accuracy across experiments 1–5, with the underlying distribution skew marked)

15. Aggregate Majority Vote Performance Measures

           Exp 1   Exp 2   Exp 3   Exp 4   Exp 5
Precision  0.921   0.932   0.936   0.932   0.912
Recall     0.865   0.917   0.919   0.863   0.921

(Experiments run from Matching skew at Exp 1 to Not Matching skew at Exp 5)
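The aggregate figures above are computed after majority-vote aggregation: each search result takes the label most workers gave it, and the aggregated labels are scored against gold. A minimal sketch with three hypothetical workers; the judgment data is illustrative.

```python
from collections import Counter

def majority_vote(label_sets):
    """For each item, return the label chosen by the most workers."""
    return [Counter(labels).most_common(1)[0][0] for labels in label_sets]


# Toy data: three workers' labels for three search results.
judgments = [
    ["Matching", "Matching", "Not Matching"],       # result 1
    ["Not Matching", "Not Matching", "Matching"],   # result 2
    ["Not Matching", "Not Matching", "Off Topic"],  # result 3
]
gold = ["Matching", "Not Matching", "Not Matching"]

agg = majority_vote(judgments)
accuracy = sum(a == g for a, g in zip(agg, gold)) / len(gold)
# Here the majority label matches gold on all three results.
```

With an odd number of workers per item and no exact ties this is unambiguous; `Counter.most_common` breaks ties by insertion order, so a production system would want an explicit tie-break rule.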

16. Discussion and Limitations
Maximize entropy -> minimize perceptible signal
For a skewed underlying distribution
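The "maximize entropy" point can be made concrete with Shannon entropy: a uniform spread over the four answer options carries the most uncertainty, so a worker cannot lift their apparent training accuracy just by guessing the dominant label. The comparison below is our illustration; the skewed figures are the underlying distribution from slide 9.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]              # maximum-entropy training mix
skewed = [0.145, 0.8267, 0.025, 0.0033]         # underlying skew from slide 9
# entropy(uniform) is 2.0 bits (the maximum for four options); the skewed
# distribution carries far less, i.e. a much more perceptible signal.
```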

17. Future Work
Optimal judgment task design and metrics
Quality control enhancements
Separate validation and ongoing training
Long-term worker performance optimizations
Incorporation of active learning
IR performance metric analysis

18. Acknowledgements
We thank Riddick Jiang for compiling the dataset for this project. We thank Brian Johnson (eBay), James Rubinstein (eBay), Aaron Shaw (Berkeley), Alex Sorokin (CrowdFlower), Chris Van Pelt (CrowdFlower) and Meili Zhong (PayPal) for their assistance with the paper.

19. Questions?
john@crowdflower.com
aedmonds@ebay.com
vaughn@crowdflower.com
lukas@crowdflower.com
Thanks!