
Visualizing Big Data - PowerPoint Presentation

Uploaded On 2016-07-06



Presentation Transcript

Slide1

Visualizing Big Data

David Schmittdiel

CSC 9010-003

9/16/2014

Slide2

Outline

Me

Big Data review and background

Problem statement

Case study: StubHub

Slide3

Intro

I don’t have a Computer Science background (but I really, really regret it)

MATLAB → PHP → MySQL → Oracle

Manager of Business Intelligence Development at StubHub

Bringing actionable data to the masses

Self-service, on-demand, exploratory BI

Data discovery through visualization

Automation

Slide4

Big Data, Big Ruse?

Stephen Few: “What the hell is Big Data anyway?”

BI vendor-driven responses:

Increased data volume AND velocity

New data sources (unstructured)

Fundamental question:

Do you really need Big Data?

“Until you’ve figured out how to use the data that you already have, collecting more will only distract you from the real task. Time spent collecting more data is time that could be better spent weaving it into something meaningful.”

Stephen Few, Perceptual Edge - July/August/September 2012, “Big Data, Big Ruse”

http://www.perceptualedge.com/articles/visual_business_intelligence/big_data_big_ruse.pdf

Slide5

The Real Task

Transforming raw data into meaningful, useful, actionable information

Leveraging the past to guide future endeavors

Finding the signals amidst the noise

Driving forces:

Scientific research

Business (ecommerce)

Government

Stephen Few: “The success of BI … [is] measured in our increased ability to understand data and then make better decisions based on that understanding.”

Slide6

Visualizing Small Data

MS Excel

Ease of use for tasks involving smaller data sets, limited interactivity

Stephen Few: “building applications on top of Excel can be arduous and painful”

Stephen Few, Perceptual Edge – September/October 2009, “Fundamental Differences in Analytical Tools”

http://www.perceptualedge.com/articles/visual_business_intelligence/differences_in_analytical_tools.pdf

Slide7

Visualizing Small Data

Static dashboards: “custom analytics”

Time-consuming to build but relatively easy to maintain

“Remove … functionality that isn’t relevant to the analytical objective of its users”

Slide8

Unique Challenges

Juliana Freire: “Visualization: Big Data Considerations”

Interactivity is key, but challenging for Big Data

Need better integration between data management and visualization components

Phil Simon describing Netflix’s data mindset:

Data should be accessible, easy to discover, and easy to process for everyone

The longer you take to find the data, the less valuable it becomes

Whether a dataset is large or small, being able to visualize it makes it easier to explain

Juliana Freire, DIMACS 2013, “Big Data Analysis and Integration”

http://dimacs.rutgers.edu/Workshops/BigData/Slides/2013-dimacs.pdf

Phil Simon, HBR Webinar, The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions

http://www.scribd.com/doc/232032215/HBR-Webinar-Summary-The-Visual-Organization

Slide9

Case Study: StubHub

Using SAP Business Objects (BO) since at least 2008 on top of Oracle 11g DW

Included in the “Leaders” quadrant of 2014 Gartner report

BO “delivers a broad range of BI and analytic capabilities through a semantic layer best suited for large IT-managed deployments that require robust governance and administrative capabilities”

Customers use “primarily for reporting; the number that use it for interactive discovery or visualization was well below the average”

Gartner, Magic Quadrant for Business Intelligence and Analytics Platforms

www.gartner.com/technology/reprints.do?id=1-1QLGACN&ct=140210&st=sb

Slide10

Case Study: StubHub

Feedback from business users was universally poor

Hard to use

Limited number of (inadequate) visualizations available

Not interactive

Supported by Tech org only

Reporting Team within Analytics org formed in January 2013

Innovative

Responsive

Promote self-service

Objective vs subjective use of data

Slide11

Case Study: StubHub

General concept: aggregate any metrics by any breakdown, over any time period, filtered for anything

→ Supports “exploratory analytics”: pursue each question as it arises

Settle instead for a collection of dashboards categorized by business use case

Slide12

Case Study: StubHub

First iteration: Dynamic SQL

Complicated rules for commenting based on front-end selections

select
    -- DATE: sp.src_created_dttm_sale
    g.genre_cat_final as "GCF", -- DISPLAY CATEGORY: GCF
    g.genre_descr as "Genre", -- DISPLAY CATEGORY: Genre
    sum(sp.ticket_cost) as "GTS", -- DATA METRIC: GTS
    count(distinct transaction_id) as "# Orders", -- DATA METRIC: # Orders
from owbruntarget_dw.dw_sales_pipeline_fact sp
join owbruntarget_dw.dw_genre_dim g on sp.genre_dw_id = g.genre_dw_id -- DISPLAY CATEGORY or FILTER: GCF, Genre
where 1=1
    -- FILTER: g.genre_cat_final for GCF
    -- FILTER: g.genre_descr for Genre
    AND trunc(src_created_dttm_sale) between :startdate and :enddate
group by
    g.genre_cat_final, -- DISPLAY CATEGORY: GCF
    g.genre_descr, -- DISPLAY CATEGORY: Genre
    -- DATEG: sp.src_created_dttm_sale
    ''
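As a rough illustration of the same idea without comment toggling, the front-end selections could instead be mapped onto whitelisted SQL fragments. This is only a sketch: the column expressions and table names are copied from the template above, but the `CATEGORIES` and `METRICS` dictionaries, the `build_query` function, and the `:f_0`-style bind names are hypothetical and not taken from the StubHub implementation.

```python
# Hypothetical registries: the SQL expressions come from the template above,
# the dictionary layout and names are assumptions made for illustration.
CATEGORIES = {
    "GCF": "g.genre_cat_final",
    "Genre": "g.genre_descr",
}
METRICS = {
    "GTS": "sum(sp.ticket_cost)",
    "# Orders": "count(distinct transaction_id)",
}


def build_query(categories, metrics, filters=()):
    """Return SQL text assembled only from whitelisted fragments.

    Selected values never enter the SQL text; they are meant to be supplied
    separately as bind variables (:startdate, :enddate, :f_0, ...).
    Unknown category or metric names raise KeyError.
    """
    if not metrics:
        raise ValueError("at least one metric is required")

    group_cols = [CATEGORIES[name] for name in categories]
    select_items = [f'{CATEGORIES[name]} as "{name}"' for name in categories]
    select_items += [f'{METRICS[name]} as "{name}"' for name in metrics]

    where = ["trunc(src_created_dttm_sale) between :startdate and :enddate"]
    where += [f"{CATEGORIES[name]} = :f_{i}" for i, name in enumerate(filters)]

    lines = [
        "select " + ", ".join(select_items),
        "from owbruntarget_dw.dw_sales_pipeline_fact sp",
    ]
    # The genre dimension is joined only when a category or filter needs it,
    # mirroring the "DISPLAY CATEGORY or FILTER" rule in the template.
    if categories or filters:
        lines.append(
            "join owbruntarget_dw.dw_genre_dim g on sp.genre_dw_id = g.genre_dw_id"
        )
    lines.append("where " + " and ".join(where))
    if group_cols:
        lines.append("group by " + ", ".join(group_cols))
    return "\n".join(lines)


sql = build_query(["GCF"], ["GTS", "# Orders"], filters=["Genre"])
print(sql)
```

Building the statement from a fixed vocabulary avoids the trailing-comma bookkeeping of the commented template, but it does nothing about execution time, which was the deciding problem.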

Proved unworkable because of long query execution times, even after incorporating bind variables

Slide13

Case Study: StubHub

Next iteration: “pandas” DataFrames

Open source Python library for data manipulation and analysis

Fast and efficient DataFrame object for data manipulation with integrated indexing

Tools for reading and writing data between in-memory data structures and different formats (e.g. CSV)
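To make the DataFrame bullet concrete, here is a minimal sketch of aggregating arbitrary metrics by an arbitrary breakdown over a date range with optional filters. The `summarize` function, the column names, and the sample rows are all made up for illustration; nothing here is taken from the actual dashboards.

```python
import pandas as pd


def summarize(df, metrics, breakdown, start, end, filters=None, date_col="sale_date"):
    """Aggregate `metrics` by `breakdown` over [start, end] after filtering.

    metrics:   {output_name: (column, aggfunc)}, e.g. {"GTS": ("ticket_cost", "sum")}
    breakdown: list of columns to group by
    filters:   {column: list of allowed values}
    """
    # Date range is inclusive on both ends.
    mask = df[date_col].between(pd.Timestamp(start), pd.Timestamp(end))
    for column, allowed in (filters or {}).items():
        mask &= df[column].isin(allowed)
    # Named aggregation: each keyword becomes an output column.
    return df[mask].groupby(breakdown, as_index=False).agg(**metrics)


# Made-up rows for illustration only.
sales = pd.DataFrame({
    "sale_date": pd.to_datetime(["2014-09-01", "2014-09-01", "2014-09-02", "2014-10-01"]),
    "genre": ["Rock", "Pop", "Rock", "Rock"],
    "ticket_cost": [100.0, 50.0, 25.0, 999.0],
    "transaction_id": [1, 2, 3, 4],
})

result = summarize(
    sales,
    metrics={"GTS": ("ticket_cost", "sum"), "Orders": ("transaction_id", "nunique")},
    breakdown=["genre"],
    start="2014-09-01",
    end="2014-09-30",
)
print(result)
```

Because the filtering and grouping happen in memory, changing the breakdown or the filter does not require another round trip to the database.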

For each dashboard, one static query

Tuning + Oracle query optimizer

Retrieve comprehensive data set needed to power the dashboard

Store data in CSV files on network

“Jukebox” functionality: only files needed are loaded into memory for processing
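One way to picture the “Jukebox” behaviour is a small cache in front of `pandas.read_csv`, so that a file is read only when a dashboard request first needs it. This is a sketch under assumptions: the `extracts` directory, the file naming, the cache size, and the function names are hypothetical, since the slides only say that CSV files live on the network.

```python
from functools import lru_cache
from pathlib import Path

import pandas as pd

# Hypothetical location of the per-dashboard extracts.
DATA_DIR = Path("extracts")


@lru_cache(maxsize=8)  # keep at most 8 files in memory, evicting the least recently used
def load_extract(name: str, data_dir: Path = DATA_DIR) -> pd.DataFrame:
    """Read one CSV extract into a DataFrame the first time it is requested.

    Later calls with the same arguments return the same cached object,
    so callers should not modify it in place.
    """
    return pd.read_csv(data_dir / f"{name}.csv")


def frames_for(names, data_dir: Path = DATA_DIR):
    """Return only the extracts that a given dashboard request needs."""
    return {name: load_extract(name, data_dir) for name in names}
```

A request for a dashboard that uses two extracts would call `frames_for(["sales", "listings"])` (names invented here); files that no request asks for are never loaded.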

Pandas: http://pandas.pydata.org/pandas-docs/stable/index.html

Slide14

Case Study: StubHub

Results:

Huge decrease in dashboard run times

Corresponding increase in adoption rate

Slide15

Case Study: StubHub

Where does the interactivity necessary for data discovery come from?

Template-based front end built with PHP + HTML + CSS + jQuery

Provide different levels of granularity

Decreases amount of time needed to create a new dashboard (vs. Tableau)

Menus control requests for:

Categories → group by

Metrics → aggregate functions

Filters → where clause

Date range

Chart types, date aggregation

Slide16

Case Study: StubHub

How to provide integration between back-end data management and front-end visualization components?

Solution is Data-Driven Documents (D3.js)

JavaScript library to drive the creation and control of dynamic and interactive graphical forms which run in web browsers

W3C-compliant, making use of the widely implemented Scalable Vector Graphics (SVG), JavaScript, HTML5, and Cascading Style Sheets (CSS3) standards

Large data sets can be easily bound to SVG objects using JSON and simple D3 functions to generate charts and diagrams
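On the back-end side, the JSON hand-off could look roughly like the sketch below, which turns tabular results into an array of objects, a shape that D3 data joins commonly consume. The `to_d3_records` helper, the field names, and the rows are invented for illustration and are not from the deck.

```python
import json


def to_d3_records(columns, rows):
    """Turn tabular results into a list of objects, one per row."""
    return [dict(zip(columns, row)) for row in rows]


# Made-up aggregated rows for illustration only.
columns = ["genre", "gts", "orders"]
rows = [("Rock", 125.0, 2), ("Pop", 50.0, 1)]

payload = json.dumps(to_d3_records(columns, rows))
print(payload)
```

Each object in the array then corresponds to one bound SVG element (for example one bar in a bar chart), with its fields available to the functions that set position, size, and color.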

D3: http://d3js.org/

Slide17

Case Study: StubHub

Summary of approach

Create a collection of BI dashboards that are:

Fast

Customizable

Interactive

Highly visual

On-demand

Scalable

Consistent

Custom build EVERYTHING as needed

Leverage open source technologies whenever possible

Data source agnostic to accommodate new data stores as they become available

→ Output from MapReduce jobs in CSV format
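As a sketch of what consuming such output might involve, the function below reads every part file in a job output directory as headerless CSV. It assumes the usual Hadoop convention of `part-*` file names and that the column names are known to the caller; the function name and the example fields are hypothetical.

```python
import csv
from pathlib import Path


def read_job_output(output_dir, fieldnames):
    """Yield one dict per record from every part file in a job output directory.

    Files that do not match part-* (for example a _SUCCESS marker) are skipped.
    All values are returned as strings, exactly as they appear in the files.
    """
    for part in sorted(Path(output_dir).glob("part-*")):
        with part.open(newline="") as handle:
            yield from csv.DictReader(handle, fieldnames=fieldnames)
```

The resulting records could be written to the same kind of CSV extract that the dashboards already read, so the front end would not need to know where the data originated.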