Big Data, Small Data and Data Visualisation via - PowerPoint Presentation

Uploaded On 2018-02-26


Presentation Transcript

Slide1

Big Data, Small Data and Data Visualisation via Sentiment Analysis with HDInsight

James Beresford
Gavin Russell-Rockliff
Group Managers, Avanade

DBI222

Slide2

Who am I?... and what am I doing here?

James
Over a decade of MS BI & DW experience
Hands-on Big Data experience
Blogger at www.bimonkey.com
Tweeter @BI_Monkey

Gavin
Another decade of MS BI & DW experience
Focus on Analysis Services & Visualisation
Leads Avanade’s Australia BI Practice
Tweeter @gavinrr

Slide3

Sentiment Analysis with HDInsight
Big Data, Small Data and Data Visualisation

Session question:

How can I understand what my customers are saying and thinking?

Slide4

Sentiment Analysis with HDInsight
Big Data, Small Data and Data Visualisation

Outcomes for you:

Understand Sentiment Analysis
Get familiar with the basics of Big Data
Know some key Hadoop terms and tools
Know how to connect Big Data with Microsoft BI

2%

Slide5

Sentiment Analysis 101

Slide6

Sentiment Analysis 101

An overview of Sentiment Analysis

Sentiment Analysis is the process of understanding the emotional content of text

Slide7

Sentiment Analysis 101

Take free-form text

I had a fantastic time on holiday at your resort. The service was excellent and friendly. My family all really enjoyed themselves. The pool was closed, which kind of sucked though.

Hotel Feedback

Slide8

Sentiment Analysis 101

Take a list of positive and negative words

Positive: Good, Great, Fantastic, Excellent, Friendly, Awesome, Enjoyed

Negative: Bad, Worse, Rubbish, Sucked, Awful, Terrible, Bogus

Slide9

Sentiment Analysis 101

Match the two

I had a [fantastic] time on holiday at your resort. The service was [excellent] and [friendly]. My family all really [enjoyed] themselves. The pool was closed, which kind of [sucked] though.

(Matched words are shown in brackets.)

Hotel Feedback

Slide10

Sentiment Analysis 101

Count them

Positive: Fantastic, Excellent, Friendly, Enjoyed = 4

Negative: Sucked = 1

Slide11

Sentiment Analysis 101

Subtract negative from positive

4 - 1 = 3

Overall sentiment: Positive

Slide12
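The match, count and subtract steps from the preceding slides can be sketched in a few lines. This is a minimal illustration in Python rather than the C# used later in the session; the function name `score_sentiment` and the tokenising rule are assumptions, and the word lists are the short ones from the slides, not a real lexicon.

```python
import re

# Word lists copied from the slides; a real lexicon would be far larger.
POSITIVE = {"good", "great", "fantastic", "excellent", "friendly", "awesome", "enjoyed"}
NEGATIVE = {"bad", "worse", "rubbish", "sucked", "awful", "terrible", "bogus"}


def score_sentiment(text):
    """Return (positive_count, negative_count, net_score) for free-form text."""
    # Assumed tokenising rule: lower-case runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for w in words if w in POSITIVE)
    negative = sum(1 for w in words if w in NEGATIVE)
    return positive, negative, positive - negative


feedback = (
    "I had a fantastic time on holiday at your resort. "
    "The service was excellent and friendly. "
    "My family all really enjoyed themselves. "
    "The pool was closed, which kind of sucked though."
)

positive, negative, net = score_sentiment(feedback)
label = "Positive" if net > 0 else "Negative" if net < 0 else "Neutral"
print(positive, negative, net, label)  # 4 1 3 Positive
```

The "Neutral" label for a zero score is an addition for completeness; the slides only show the positive case.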

Sentiment Analysis 101

Sentiment Analysis gets a lot more complicated…

Slide13

Big Data 101

Slide14

Big Data 101

What is Big Data?

It is a new set of approaches for analysing data sets that were not previously accessible because they posed challenges across one or more of the “3 V’s” of Big Data:

Volume - too Big - Terabytes and more of Credit Card Transactions, Web Usage data, System logs
Variety - too Complex - truly unstructured data such as Social Media, Customer Reviews, Call Center Records
Velocity - too Fast - Sensor data, live web traffic, Mobile Phone usage, GPS Data

Slide15

Big Data 101

Hadoop is just a File System - HDFS

Read Optimised & Failure Tolerant
Replicated 3 times

[Diagram label: File]

Slide16

Big Data 101

Map + Reduce = Extract, Load + Transform

[Diagram: Raw Data (x4) -> Mapper (x4) -> Data (x4) -> Reducer -> Output; the two phases are labelled MAP and REDUCE]

Slide17
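The Raw Data -> Mapper -> Data -> Reducer -> Output flow in the diagram can be imitated in memory. The sketch below is plain Python, not Hadoop code; the four sample chunks and the function names `mapper`, `shuffle` and `reducer` are assumptions chosen to mirror the diagram, and word counting is used only because it is the simplest job to follow.

```python
from collections import defaultdict


def mapper(chunk):
    """Map phase: emit a (word, 1) pair for every word in one chunk of raw data."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group mapped values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce phase: collapse all values for one key into a single result."""
    return key, sum(values)


# Four hypothetical chunks of raw data, one per mapper in the diagram.
raw_chunks = ["great pool", "bad service", "great staff", "great food"]

mapped = [pair for chunk in raw_chunks for pair in mapper(chunk)]
output = dict(reducer(key, values) for key, values in shuffle(mapped).items())
print(output["great"])  # 3
```

On a real cluster each mapper runs on a different node against its own block of the file, and the grouping step is handled by the framework rather than by user code.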

HDInsight hands on

Slide18

HDInsight hands on

What we will cover in the Technical Session

Creating an Azure HDInsight cluster

Loading data into Azure Blob Storage
C# Streaming
Pig to enrich
Hive to output
ODBC to PowerPivot
PowerPivot to PowerView

Slide19

HDInsight hands on

Creating an Azure HDInsight cluster

HDInsight is available as an Azure service
Launched from www.windowsazure.com

Slide20

Video removed for web optimisation

Slide21

HDInsight hands on
Loading data into Azure Blob Storage

HDInsight can directly reference Azure Blob Storage
No need to load into HDFS
Data can be loaded using a variety of utilities

Slide22

Video removed for web optimisation

Slide23

HDInsight hands on

Checking Azure loaded data via HDInsight

HDInsight can view Azure Blob Storage similarly to data on HDFS
Can check data using Hadoop File System command line

Slide24

Video removed for web optimisation

Slide25

HDInsight hands on
Hadoop and C#

Java is the preferred language for working with Hadoop
C# APIs are being developed as part of the MS implementation

Slide26

HDInsight hands on

Hadoop Streaming with C#

Hadoop Streaming allows line by line processing of text data in a Mapper process
Any programming language can be used, including C#
You can also use C# to build your reducers

Slide27
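Because Hadoop Streaming only requires a program that reads lines and writes lines, the reducer side can be sketched in any language. The example below uses Python instead of the session's C#; it assumes the common streaming convention of sorted `key<TAB>value` input lines, and the function name `reduce_lines` and the sample data are hypothetical.

```python
from itertools import groupby


def reduce_lines(lines):
    """Sum the integer values for each key in sorted 'key<TAB>value' lines."""
    # Assumes every non-blank line contains exactly one tab-separated value.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # groupby only merges adjacent keys, which is why the input must be sorted.
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Hypothetical sorted mapper output.
sample = ["bad\t1", "great\t1", "great\t1", "great\t1"]
results = list(reduce_lines(sample))
print(results)  # ['bad\t1', 'great\t3']
```

In an actual streaming job the same function would be fed `sys.stdin` and each result printed to standard output. The job shown later in this session sets the number of reduce tasks to 0, so it runs mappers only.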

HDInsight hands on
Hadoop Streaming with C#

BOB wrote:
>> We all do. It takes time. Some people never do catch up.
Another one of your favorite answers. Vague, meaningless, and unfathomable.
---END.OF.DOCUMENT---

Slide28

HDInsight hands on

Hadoop Streaming with C#

Here we will see the code that splits out the words in the file for further processing, with metadata

Slide29

Hadoop Streaming with C#

Set a list of common words to ignore in output

Slide30

Hadoop Streaming with C#
Pick up some file metadata from Hadoop

Slide31

Hadoop Streaming with C#
Read file line by line

Slide32

Hadoop Streaming with C#
Output Mapped content

Slide33
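The C# source for these four steps is not included in the transcript, so the following is only a guess at their shape, written in Python. The stop-word list, the function name `map_lines`, the message-numbering rule and the `file|message_id|author|word` layout are all assumptions inferred from the sample input and the pipe-delimited output shown elsewhere in the deck.

```python
import re

# Hypothetical stop-word list; the session's actual list is not shown.
STOP_WORDS = {"a", "and", "do", "it", "of", "one", "the", "up", "we", "your"}
END_OF_DOCUMENT = "---END.OF.DOCUMENT---"


def map_lines(lines, file_name):
    """Yield 'file|message_id|author|word' for each word that is not a stop word."""
    message_id = 1
    author = "unknown"
    for line in lines:
        line = line.strip()
        if line == END_OF_DOCUMENT:
            # A new message starts after the marker (assumed numbering rule).
            message_id += 1
            author = "unknown"
            continue
        wrote = re.match(r"^(\S+) wrote:$", line)
        if wrote:
            author = wrote.group(1).lower()
            continue
        for word in re.findall(r"[a-z]+", line.lower()):
            if word not in STOP_WORDS:
                yield f"{file_name}|{message_id}|{author}|{word}"


sample = [
    "BOB wrote:",
    ">> We all do. It takes time. Some people never do catch up.",
    "Another one of your favorite answers. Vague, meaningless, and unfathomable.",
    "---END.OF.DOCUMENT---",
]
records = list(map_lines(sample, file_name="276.0"))
print(records[0])  # 276.0|1|bob|all
```

In a streaming job the lines would come from standard input. On Hadoop 1.x, streaming typically exposes the current input file to the mapper through the `map_input_file` environment variable, which is one plausible source for the file metadata; whether the session's code used it is not stated.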

HDInsight hands on
Hadoop Streaming with C#

Building the job
Executing
No need to unzip data

Slide34

HDInsight hands on
Hadoop Streaming with C#

C:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd jar C:\apps\dist\hadoop-1.1.0-SNAPSHOT\lib\hadoop-streaming.jar ^
  "-D mapred.output.compress=true" ^
  "-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" ^
  -files "asv://container@storage/user/hadoop/code/Sentiment_v2.exe" ^
  -numReduceTasks 0 ^
  -mapper "Sentiment_v2.exe" ^
  -input "asv://container@storage.blob.core.windows.net/user/hadoop/data/" ^
  -output "asv://container@storage.blob.core.windows.net/user/hadoop/output/Sentiment"

Slide35

Video removed for web optimisation

Slide36

HDInsight hands on
Hadoop Streaming with C#

276.0|5|bob|government
276.0|5|bob|telling
276.0|5|bob|opposed
276.0|5|bob|liberty
276.0|5|bob|obviously
276.0|5|bob|fail
276.0|5|bob|comprehend
276.0|5|bob|qualifier
276.0|5|bob|legalized
276.0|5|bob|curtis

Field labels: Timestamp, File Chunk, Message Id, Author, Word

Slide37

HDInsight hands on
Using Pig to Enrich the data

Pig is a query language which shares some concepts with SQL
Invoked from the Hadoop command shell
No GUI
Does not do any work until it has to output a result set
Under the hood executes Map/Reduce jobs

Slide38

HDInsight hands on
Using Pig to Enrich the data with Sentiment scores

Load sentiment word lists and assign scores
Loading the data
Preprocess to get some key fields
Count words in various contexts and add sentiment value
Dump results to Azure Blob Storage

Slide39

Video removed for web optimisation

Slide40

Using Pig to Enrich the data

Code sample: LOAD Operation

data_raw = LOAD '<filename>' USING PigStorage('|') AS (filename:chararray, message_id:chararray, author_id:chararray, word:chararray);

Slide41

Using Pig to Enrich the data

Code sample: JOIN Statement

words_count_sentiment = JOIN words_count_flat BY words LEFT, sentiment BY sentiment_word;

Slide42

Using Pig to Enrich the data

Code sample: SUM Operation

message_sum_sentiment = FOREACH messages_grouped GENERATE group AS message_details, SUM(messages_joined.sentiment_value) AS sentiment;

Slide43
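For readers less familiar with Pig, the effect of the JOIN and SUM samples can be imitated on a handful of rows in plain Python. The rows, scores and variable names below are hypothetical, and the grouping is by message id only, which simplifies whatever key `messages_grouped` actually uses in the session.

```python
from collections import defaultdict

# Hypothetical mapper output rows: (message_id, word).
words = [(5, "fantastic"), (5, "pool"), (5, "sucked"), (6, "excellent")]

# Hypothetical sentiment word list: word -> score.
sentiment = {"fantastic": 1, "excellent": 1, "sucked": -1}

# LEFT join: keep every word row; words with no match get None (null in Pig).
joined = [(message_id, word, sentiment.get(word)) for message_id, word in words]

# Group by message and sum the scores, skipping the unmatched (null) rows.
message_sentiment = defaultdict(int)
for message_id, _word, value in joined:
    if value is not None:
        message_sentiment[message_id] += value

print(dict(message_sentiment))  # {5: 0, 6: 1}
```

The left join matters here: an inner join would silently drop every word that is not in the sentiment list, which is fine for the sum but loses the word counts.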

HDInsight hands on
Outputting results to Hive

Hive's query language is near SQL compliant, with a lot of similarities to SQL
Again, under the hood it issues MapReduce jobs
Exposed via ODBC

Slide44

HDInsight hands on
Outputting results to Hive

Create some Hive tables to reference the Pig Output
Use the Interactive console

Slide45

Video removed for web optimisation

Slide46

Outputting data to Hive

Code review: CREATE EXTERNAL TABLE

CREATE EXTERNAL TABLE words (
  word STRING,
  counts INT,
  sentiment INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '124'
STORED AS TEXTFILE
LOCATION 'asv://westburycorpus@westburycorpusnoreur.blob.core.windows.net/user/hadoop/pig_out/words';

Slide47
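The `'124'` in the table definition is easy to misread. Assuming Hive interprets it as a decimal character code, it is simply the pipe character that the Pig step used as its separator. The Python check below illustrates that reading; the function name `parse_word_row` and the sample row are hypothetical.

```python
# Assumption: Hive reads the delimiter '124' as a decimal character code.
DELIMITER = chr(124)


def parse_word_row(line):
    """Split one pipe-delimited text row into (word, counts, sentiment)."""
    word, counts, sentiment = line.rstrip("\n").split(DELIMITER)
    return word, int(counts), int(sentiment)


print(repr(DELIMITER))                   # '|'
print(parse_word_row("fantastic|12|1"))  # ('fantastic', 12, 1)
```

Because the table is EXTERNAL, Hive reads the files in place at the given location; dropping the table would leave the Pig output untouched.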

Visualising Results

Slide48

Simple Sentiment Analysis Using HDInsight

Using a PowerPivot Model to prepare the data for Analysis
Hive to PowerPivot

Slide49

Video removed for web optimisation

Slide50

Simple Sentiment Analysis Using HDInsight

Using PowerView to visualise the results
PowerPivot to PowerView

Slide51

Simple Sentiment Analysis Using HDInsight

Using PowerView to visualise the results
Excel Demo

Slide52

Sentiment Analysis with HDInsight
Big Data, Small Data and Data Visualisation

Goals of session:

Understand what Sentiment Analysis involves
Help you understand what Big Data actually is
Know what some of the core technologies do
See how Big Data fits into the traditional BI world

Slide53

Sentiment Analysis with HDInsight
Big Data, Small Data and Data Visualisation

Session question:

How can I understand what my customers are saying and thinking?

Slide54

Developer Network
Resources for Developers
http://msdn.microsoft.com/en-au/

Learning
Virtual Academy
http://www.microsoftvirtualacademy.com/

TechNet
Resources for IT Professionals
http://technet.microsoft.com/en-au/

Resources
Sessions on Demand
http://channel9.msdn.com/Events/TechEd/Australia/2013

Slide55

Track Resources

Download the CTP for SQL Server 2014 and accelerate your queries using In-Memory OLTP - http://technet.microsoft.com/en-us/evalcenter/dn205290.aspx

Get into the cloud with an Azure account - use SQL database in Windows Azure or take your workload into Azure VM - www.windowsazure.com

Get big with big data - HDInsight on Azure and grab the latest Power BI features - http://www.windowsazure.com/en-us/documentation/services/hdinsight/?fb=en-us

Power BI - www.powerbi.com

Slide56

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.