Miscellaneous Midterm project review

debby-jeon
Uploaded On 2018-02-16

Presentation Transcript

Slide1

Miscellaneous

Slide2

Midterm project review

Due in two weeks

Instructions will be sent out by the weekend

Will be graded, unlike the proposal

Slide3

Profiler

Slide4

Summary of Paper

A “data cleaning” browsing and visualization tool

Builds on Wrangler/Potter’s Wheel ideas

Adds a few notions of its own:

Recommending visualizations that highlight anomalies

Linked visualizations to see how value dependencies manifest across visualizations

Slide5

Assumptions

What are the assumptions made by the paper?

Slide6

Assumptions

Single table:

Foreign key dependencies missed …

No data integration

Univariate outliers (typically)

Fits in main memory

No entire row deduplication

Slide7

Lots more to do…

What are the future directions from this paper?

Slide8

Lots more to do…

What are the future directions from this paper?

Lots of user options: how does the user make sense of them?

What does the user do after browsing?

When does the user stop?

Slide9

Other Open Questions

Recommendation of what to clean first?

Notions of completeness?

Real world statistics on what sorts of anomalies are more present than others?

Fixing errors?

Slide10

Mutual Information-based Anomaly

The metric tries to capture the relationship between COUNT(*) GROUP BY X computed on the anomalous data and the same counts computed on the rest of the data

What else can you think of?
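
One possible reading of the metric on this slide, as a rough sketch: treat "anomalous vs. other" as a boolean flag A, and measure the mutual information I(X; A) between column X and that flag from the group counts. The slide does not give the paper's exact formula, so this is the textbook definition of mutual information and not necessarily the paper's method. The function name and its arguments are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(values, flags):
    """Mutual information I(X; A) in bits.

    values: the value of column X for each row (assumed hashable).
    flags:  True if the row is considered anomalous, else False.
    This is the standard definition, used here as an illustration only.
    """
    n = len(values)
    if n == 0 or n != len(flags):
        raise ValueError("values and flags must be non-empty and equal length")

    joint = Counter(zip(values, flags))  # COUNT(*) GROUP BY X, A
    x_counts = Counter(values)           # COUNT(*) GROUP BY X
    a_counts = Counter(flags)            # COUNT(*) GROUP BY A

    mi = 0.0
    for (x, a), c in joint.items():
        # p(x, a) * log2( p(x, a) / (p(x) * p(a)) ), written with raw counts
        mi += (c / n) * log2(c * n / (x_counts[x] * a_counts[a]))
    return mi
```

A high score means the distribution of X differs between the anomalous rows and the rest, so a view grouped by X is likely to show the anomaly. A score near zero means X looks the same in both groups.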