Slide1
Big Data, Small Data and Data Visualisation via Sentiment Analysis with HDInsight
James Beresford
Gavin Russell-Rockliff
Group Managers, Avanade
DBI222
Slide2
Who am I?... and what am I doing here?
James
Over a decade of MS BI & DW experience
Hands-on Big Data experience
Blogger at www.bimonkey.com
Tweeter @BI_Monkey
Gavin
Another decade of MS BI & DW experience
Focus on Analysis Services & Visualisation
Leads Avanade’s Australia BI Practice
Tweeter @gavinrr
Slide3
Sentiment Analysis with HDInsight
Big Data, Small Data and Data Visualisation
Session question:
How can I understand what my customers are saying and thinking?
Slide4
Sentiment Analysis with HDInsight
Big Data, Small Data and Data Visualisation
Outcomes for you:
Understand Sentiment Analysis
Get familiar with the basics of Big Data
Know some key Hadoop terms and tools
Know how to connect Big Data with Microsoft BI
2%
Slide5
Sentiment Analysis 101
Slide6
Sentiment Analysis 101
An overview of Sentiment Analysis
Sentiment Analysis is the process of understanding the emotional content of text
Slide7
Sentiment Analysis 101
Take Free Form text
I had a fantastic time on holiday at your resort. The service was excellent and friendly. My family all really enjoyed themselves.
The pool was closed, which kind of sucked though.
Hotel Feedback
Slide8
Sentiment Analysis 101
Take a list of positive and negative words
Positive
Good
Great
Fantastic
Excellent
Friendly
Awesome
Enjoyed
Negative
Bad
Worse
Rubbish
Sucked
Awful
Terrible
Bogus
Slide9
Sentiment Analysis 101
Match the two
I had a [fantastic] time on holiday at your resort. The service was [excellent] and [friendly]. My family all really [enjoyed] themselves.
The pool was closed, which kind of [sucked] though.
Hotel Feedback
Slide10
Sentiment Analysis 101
Count them
Positive (4): Fantastic, Excellent, Friendly, Enjoyed
Negative (1): Sucked
Slide11
Sentiment Analysis 101
Subtract negative from positive
4 - 1 = 3
Overall sentiment: Positive
Slide12
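The match, count and subtract steps from the preceding slides can be sketched in a few lines of Python. This is only an illustration, not the presenters' code: the word lists are copied from the slides, while the function name `sentiment_score` and the simple letters-only tokenisation are assumptions made for the example.

```python
import re

# Word lists copied from the slides; a real lexicon would be far larger.
POSITIVE = {"good", "great", "fantastic", "excellent", "friendly", "awesome", "enjoyed"}
NEGATIVE = {"bad", "worse", "rubbish", "sucked", "awful", "terrible", "bogus"}


def sentiment_score(text):
    """Return (positive_count, negative_count, score) for free-form text."""
    # Assumption: lower-case the text and keep runs of letters as words.
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for word in words if word in POSITIVE)
    negative = sum(1 for word in words if word in NEGATIVE)
    # Subtract negative from positive to get the overall score.
    return positive, negative, positive - negative


feedback = (
    "I had a fantastic time on holiday at your resort. "
    "The service was excellent and friendly. "
    "My family all really enjoyed themselves. "
    "The pool was closed, which kind of sucked though."
)
print(sentiment_score(feedback))  # (4, 1, 3)
```

A score above zero is read as positive overall, which matches the hotel feedback example.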
Sentiment Analysis 101
Sentiment Analysis gets a lot more complicated…
Slide13
Big Data 101
Slide14
Big Data 101
What is Big Data?
It is a new set of approaches for analysing data sets that were not previously accessible because they posed challenges across one or more of the “3 V’s” of Big Data:
Volume - too Big - Terabytes and more of Credit Card Transactions, Web Usage data, System logs
Variety - too Complex - truly unstructured data such as Social Media, Customer Reviews, Call Center Records
Velocity - too Fast - Sensor data, live web traffic, Mobile Phone usage, GPS Data
Slide15
Big Data 101
Hadoop is just a File System - HDFS
Read Optimised & Failure Tolerant
Replicated 3 times
Slide16
Big Data 101
Map + Reduce = Extract, Load + Transform
[Diagram: Raw Data (x4) -> Mapper (x4) -> Data (x4) -> Reducer -> Output]
Slide17
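The Map + Reduce flow on the preceding slide can be imitated on a single machine with a word count. This is a rough sketch that ignores distribution and fault tolerance entirely; the function names `mapper`, `reducer` and `map_reduce` are made up for the example.

```python
from collections import defaultdict


def mapper(chunk):
    """Map: turn one chunk of raw text into (word, 1) pairs."""
    for word in chunk.lower().split():
        yield word, 1


def reducer(pairs):
    """Reduce: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def map_reduce(chunks):
    """Run every chunk through the mapper, then reduce all the output."""
    mapped = []
    for chunk in chunks:
        # On a cluster each chunk would go to its own mapper, in parallel.
        mapped.extend(mapper(chunk))
    return reducer(mapped)
```

For example, `map_reduce(["good bad good", "bad good"])` returns the counts for "good" and "bad" across both chunks.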
HDInsight hands on
Slide18
HDInsight hands on
What we will cover in the Technical Session
Creating an Azure HDInsight cluster
Loading data into Azure Blob Storage
C# Streaming
Pig to enrich
Hive to output
ODBC to PowerPivot
PowerPivot to PowerView
Slide19
HDInsight hands on
Creating an Azure HDInsight cluster
HDInsight is available as an Azure service
Launched from www.windowsazure.com
Slide20
Video removed for web optimisation
Slide21
HDInsight hands on
Loading data into Azure Blob Storage
HDInsight can directly reference Azure Blob Storage
No need to load into HDFS
Data can be loaded using a variety of utilities
Slide22
Video removed for web optimisation
Slide23
HDInsight hands on
Checking Azure loaded data via HDInsight
HDInsight can view Azure Blob Storage similarly to data on HDFS
Can check data using Hadoop File System command line
Slide24
Video removed for web optimisation
Slide25
HDInsight hands on
Hadoop and C#
Java is the preferred language for working with Hadoop
C# APIs are being developed as part of the MS implementation
Slide26
HDInsight hands on
Hadoop Streaming with C#
Hadoop Streaming allows line by line processing of text data in a Mapper process
Any programming language can be used, including C#
You can also use C# to build your reducers
Slide27
HDInsight hands on
Hadoop Streaming with C#
BOB wrote:
>> We all do. It takes time. Some people never do catch up.
Another one of your favorite answers. Vague, meaningless, and unfathomable.
---END.OF.DOCUMENT---
Slide28
HDInsight hands on
Hadoop Streaming with C#
Here we will see the code that splits out the words in the file for further processing, with metadata
Slide29
Hadoop Streaming with C#
Set a list of common words to ignore in output
Slide30
Hadoop Streaming with C#
Pick up some file metadata from Hadoop
Slide31
Hadoop Streaming with C#
Read file line by line
Slide32
Hadoop Streaming with C#
Output Mapped content
Slide33
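The C# code from the preceding slides is not reproduced in this transcript. Since Hadoop Streaming accepts any language, the four steps (ignore common words, pick up file metadata, read line by line, output mapped content) might look roughly like the Python sketch below. Everything specific in it is an assumption: the stop-word list, the "NAME wrote:" author pattern, the message counter, and the `filename|message_id|author|word` record layout, which is borrowed from the Pig LOAD schema shown later in the deck.

```python
import os
import re
import sys

# Assumption: a tiny illustrative stop-word list; the deck's real list is not shown.
STOP_WORDS = {"a", "and", "the", "of", "to", "it", "we", "do", "all", "one", "your", "some"}
END_OF_DOCUMENT = "---END.OF.DOCUMENT---"
AUTHOR_LINE = re.compile(r"^(\S+) wrote:\s*$")
WORD = re.compile(r"[a-z]+")


def map_lines(lines, file_name):
    """Yield 'file|message_id|author|word' records, one per kept word."""
    message_id = 0
    author = "unknown"
    for raw in lines:
        line = raw.strip()
        if line == END_OF_DOCUMENT:
            # Assumption: this marker closes one message and starts the next.
            message_id += 1
            author = "unknown"
            continue
        match = AUTHOR_LINE.match(line)
        if match:
            author = match.group(1).lower()
            continue
        for word in WORD.findall(line.lower()):
            if word not in STOP_WORDS:
                yield "|".join([file_name, str(message_id), author, word])


def main():
    # Hadoop Streaming exposes job settings as environment variables with
    # dots replaced by underscores; map_input_file is the Hadoop 1.x name.
    input_file = os.path.basename(os.environ.get("map_input_file", "unknown"))
    for record in map_lines(sys.stdin, input_file):
        print(record)
```

When deployed as a streaming mapper, the script would end with a call to `main()` so that it reads standard input and writes one record per line to standard output.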
HDInsight hands on
Hadoop Streaming with C#
Building the job
Executing
No need to unzip data
Slide34
HDInsight hands on
Hadoop Streaming with C#
C:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd jar C:\apps\dist\hadoop-1.1.0-SNAPSHOT\lib\hadoop-streaming.jar ^
  "-D mapred.output.compress=true" ^
  "-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" ^
  -files "asv://container@storage/user/hadoop/code/Sentiment_v2.exe" ^
  -numReduceTasks 0 ^
  -mapper "Sentiment_v2.exe" ^
  -input "asv://container@storage.blob.core.windows.net/user/hadoop/data/" ^
  -output "asv://container@storage.blob.core.windows.net/user/hadoop/output/Sentiment"
Slide35
Video removed for web optimisation
Slide36
HDInsight hands on
Hadoop Streaming with C#
276.0|5|bob|government
276.0|5|bob|telling
276.0|5|bob|opposed
276.0|5|bob|liberty
276.0|5|bob|obviously
276.0|5|bob|fail
276.0|5|bob|comprehend
276.0|5|bob|qualifier
276.0|5|bob|legalized
276.0|5|bob|curtis
Field labels: Timestamp, File Chunk, Message Id, Author, Word
Slide37
HDInsight hands on
Using Pig to Enrich the data
Pig is a query language which shares some concepts with SQL
Invoked from the Hadoop command shell
No GUI
Does not do any work until it has to output a result set
Under the hood executes MapReduce jobs
Slide38
HDInsight hands on
Using Pig to Enrich the data with Sentiment scores
Load sentiment word lists and assign scores
Loading the data
Preprocess to get some key fields
Count words in various contexts and add sentiment value
Dump results to Azure Blob Storage
Slide39
Video removed for web optimisation
Slide40
Using Pig to Enrich the data
Code sample: LOAD Operation
data_raw = LOAD '<filename>' USING PigStorage('|')
    AS (filename:chararray, message_id:chararray, author_id:chararray, word:chararray);
Slide41
Using Pig to Enrich the data
Code sample: JOIN Statement
words_count_sentiment = JOIN words_count_flat BY words LEFT, sentiment BY sentiment_word;
Slide42
Using Pig to Enrich the data
Code sample: SUM Operation
message_sum_sentiment = FOREACH messages_grouped GENERATE group AS message_details, SUM(messages_joined.sentiment_value) AS sentiment;
Slide43
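To make the Pig samples on the preceding slides more concrete, here is a plain-Python approximation of the same idea: load pipe-delimited records, left-join each word against a sentiment list, then sum the scores per message. It is a sketch only. The function name `score_messages`, the dictionary-based word list and the example scores are hypothetical, and it skips the intermediate word-count relations used in the session.

```python
from collections import defaultdict


def score_messages(records, sentiment):
    """Sum sentiment per message.

    records:   iterable of 'filename|message_id|author_id|word' lines
    sentiment: dict mapping a word to its score, e.g. +1 or -1
    """
    totals = defaultdict(int)
    for record in records:
        filename, message_id, author_id, word = record.split("|")
        # Left join: words missing from the sentiment list contribute 0 here.
        totals[(filename, message_id, author_id)] += sentiment.get(word, 0)
    return dict(totals)
```

One difference worth noting: a left join in Pig leaves unmatched words with a null sentiment value that SUM skips, whereas this sketch counts them as zero, so a message with no matched words scores 0 rather than null.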
HDInsight hands on
Outputting results to Hive
Hive is a near SQL compliant language with a lot of similarities to SQL
Again, under the hood it issues MapReduce jobs
Exposed via ODBC
Slide44
HDInsight hands on
Outputting results to Hive
Create some Hive tables to reference the Pig Output
Use the Interactive console
Slide45
Video removed for web optimisation
Slide46
Outputting data to Hive
Code review: CREATE EXTERNAL TABLE
CREATE EXTERNAL TABLE words (
    word STRING,
    counts INT,
    sentiment INT )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '124'
STORED AS TEXTFILE
LOCATION 'asv://westburycorpus@westburycorpusnoreur.blob.core.windows.net/user/hadoop/pig_out/words';
Slide47
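The external table on the preceding slide lays a three-column schema over the delimited text files that Pig wrote. The delimiter '124' is read here as the decimal character code for the pipe character, which is an interpretation rather than something the slide states. A small Python sketch of the same schema-on-read idea follows; the `WordRow` type and `read_words` function are names invented for the example.

```python
import csv
from typing import NamedTuple


class WordRow(NamedTuple):
    """Mirrors the columns declared in the Hive table."""
    word: str
    counts: int
    sentiment: int


def read_words(lines):
    """Parse pipe-delimited 'word|counts|sentiment' lines into typed rows."""
    reader = csv.reader(lines, delimiter=chr(124))  # chr(124) is '|'
    return [WordRow(word, int(counts), int(sentiment))
            for word, counts, sentiment in reader]
```

As with an external table, the files themselves are left untouched and the types are applied only when the rows are read. Unlike Hive, this sketch raises an error on a malformed number instead of returning a null.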
Visualising Results
Slide48
Simple Sentiment Analysis Using HDInsight
Using a PowerPivot Model to prepare the data for Analysis
Hive to PowerPivot
Slide49
Video removed for web optimisation
Slide50
Simple Sentiment Analysis Using HDInsight
Using PowerView to visualise the results
PowerPivot to PowerView
Slide51
Simple Sentiment Analysis Using HDInsight
Using PowerView to visualise the results
Excel Demo
Slide52
Sentiment Analysis with HDInsight
Big Data, Small Data and Data Visualisation
Goals of session:
Understand what Sentiment Analysis involves
Help you understand what Big Data actually is
Know what some of the core technologies do
See how Big Data fits into the traditional BI world
Slide53
Sentiment Analysis with HDInsight
Big Data, Small Data and Data Visualisation
Session question:
How can I understand what my customers are saying and thinking?
Slide54
Developer Network: Resources for Developers - http://msdn.microsoft.com/en-au/
Learning: Virtual Academy - http://www.microsoftvirtualacademy.com/
TechNet: Resources for IT Professionals - http://technet.microsoft.com/en-au/
Resources: Sessions on Demand - http://channel9.msdn.com/Events/TechEd/Australia/2013
Slide55
Track Resources
Download the CTP for SQL Server 2014 and accelerate your queries using In-Memory OLTP - http://technet.microsoft.com/en-us/evalcenter/dn205290.aspx
Get into the cloud with an Azure account - use SQL database in Windows Azure or take your workload into Azure VM - www.windowsazure.com
Get big with big data - HDInsight on Azure and grab the latest Power BI features - http://www.windowsazure.com/en-us/documentation/services/hdinsight/?fb=en-us
Power BI - www.powerbi.com
Slide56
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.