Harnessing the Power of Hadoop: Cloud Scale with Microsoft


ellena-manuel
Uploaded On 2017-05-01




Presentation Transcript

Slide1
Slide2

Harnessing the Power of Hadoop: Cloud Scale with Microsoft Azure HDInsight

Lance Olson
Partner Group Program Manager

BRK2557

Slide3

Agenda

Big data and traditional data warehouse
Big data in the cloud
Cloud versus on-premises
Patterns and case studies
HDInsight workloads

Slide4

Big Data vs Traditional DW

Slide5

Two Approaches to Information Management for Analytics: Top-Down + Bottom-Up

Top-Down (Deductive), diagram labels: Confirmation, Theory, Hypothesis, Observation
Bottom-Up (Inductive), diagram labels: Observation, Pattern, Theory, Hypothesis

[Figure: analytics chart with axis labels DIFFICULTY and VALUE, ranging from INFORMATION to OPTIMIZATION]
Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What will happen?
Prescriptive Analytics: How can we make it happen?

Slide6

Data Warehousing Uses A Top-Down Approach

Understand Corporate Strategy
Gather Requirements: Business Requirements, Technical Requirements
Dimension Modelling, ETL Design, Reporting & Analytics Design, Setup Infrastructure
Implement Data Warehouse: Physical Design, ETL Development, Reporting & Analytics Development, Install and Tune

[Diagram: Data sources (OLTP, ERP, CRM, LOB), ETL, Data warehouse, BI and analytic (Dashboards, Reporting)]

Slide7

The “data lake” Uses A Bottom-Up Approach

Ingest all data regardless of requirements
Store all data in native format without schema definition
Do analysis using analytic engines like Hadoop
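A minimal sketch of the "store in native format, apply structure at analysis time" idea from this slide, often called schema-on-read. The record layout, the field names, and the `read_with_schema` helper are hypothetical and chosen only for illustration; none of them come from the presentation.

```python
import json

# Raw events are kept exactly as they arrived: one JSON string per record,
# with no schema enforced at write time. All field names are made up.
RAW_EVENTS = [
    '{"device": "meter-1", "temp_c": 21.5, "ts": "2015-05-04T10:00:00"}',
    '{"device": "meter-2", "ts": "2015-05-04T10:00:05", "note": "no reading"}',
    '{"user": "u42", "page": "/home"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: keep only records that have every
    field the analysis needs, and coerce each field to the requested type."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
    return rows


# Two different analyses read the same raw store with different schemas.
temperature_rows = read_with_schema(RAW_EVENTS, {"device": str, "temp_c": float})
clickstream_rows = read_with_schema(RAW_EVENTS, {"user": str, "page": str})

print(temperature_rows)
print(clickstream_rows)
```

The point of the sketch is that nothing about the stored data changes between the two analyses; only the schema supplied at read time differs.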

Analysis types: Interactive queries, Batch queries, Machine Learning, Data warehouse, Real-time analytics
Data sources: Devices, Relational, Sensors, Video, LOB applications, Web, Social, Clickstream

Slide8

Data Lake + Data Warehouse Better Together

What happened?

What is happening?

Why did it happen?

What are key relationships?

What will happen?

What if?
How risky is it?
What should happen?
What is the best option?
How can I optimize?

[Diagram: Data sources (OLTP, ERP, CRM, LOB), ETL, Data warehouse, BI and analytic (Dashboards, Reporting); data lake sources: Devices, Relational, Sensors, Video, LOB applications, Web, Social, Clickstream]

Slide9

Big Data in the Cloud

Slide10

Why Cloud + Big Data?

Massive Compute and Storage

Deployment expertise

Data of all Volume, Variety, Velocity
Speed
Scale
Economics
Always Up, Always On
Open and flexible
Time to value

Slide11

Why Microsoft Azure?

On-premises Servers, Software, Appliances

Azure Facts
>4 trillion objects in Azure
300,000-1M+ requests per second
Double compute and storage every 6 months

Azure: Storage, HDInsight, Data Factory, ML, Stream Analytics, Database, DocumentDB, Search, Event Hubs

Slide12

Introducing Azure HDInsight

Microsoft’s cloud Hadoop offering
100% open source Apache Hadoop
Built on the latest releases across Hadoop (2.6)
Up and running in minutes with no hardware to deploy
Harness existing .NET and Java skills
Utilize familiar BI tools for analysis including Microsoft Excel

Slide13

Hadoop Is Being Run Everywhere in the World

Slide14

Cloud and On-Premises “vs” or “+”?

Slide15

Cloud + On-Premises Hybrid Scenarios

On-Premises

Development, Testing, & Pilot

IoT Applications
Other Azure Services such as BI / ML

Slide16

Use Cases: Let the data decide

Use Cases | Where?
Active Archive / Compliance Reporting | Restricted data = “down here”. “Up there” could be considered for other scenarios.
ETL / Data Warehouse Optimization | Often has “down here” gravity, but cloud-based ETL offload has big payout
Smart Meter Analysis | Typically born “up there”
Single View of Customer | May have heavy “down here” gravity; unless you’re using SaaS apps, then why not “up there”?
New Data for Product Management | Restricted data = “down here”. “Up there” could be considered for many scenarios.
Vehicle Data for Transportation/Logistics | Why not “up there”?
Vehicle Data for Insurance | May have heavy “down here” gravity (ex. join w/risk data, etc.)

Slide17

Use Cases: Patterns and Case Studies

Slide18

Rockwell Automation is partnered with one of the six oil and gas super majors to build unmanned internet-connected gas dispensers. Each dispenser emits real-time management metrics allowing them to detect anomalies and predict when proactive maintenance needs to occur.

Store sensor data every 5 minutes
Temperature, pressure, vibration, etc.
Tens of thousands of data points / second

[Architecture diagram: Data Factory, Azure Blobs, Azure HDInsight (Hive, Pig), Azure SQL DB, Power BI for O365, Mobile Notification Hub, Mobile Device, Real-time notification]

Slide19

JustGiving wanted to harness the power of their data by using network science to map people’s connections and relationships so that they could connect people with the causes they care about. Based on 15 years of data, the JustGiving GiveGraph is the world’s largest ecosystem of giving behavior. It contains more than 81 million person nodes, thousands of causes and 285 million connections and is the engine that drives JustGiving’s social platform, enabling levels of personalization and engagement that a traditional infrastructure would be unable to deliver.

[Architecture diagram: SQL Server, On-premises, Agent, Azure Blobs, Azure HDInsight, GiveGraph, Azure Tables, Web API, Website + Event store, Service Bus, Real-time Event, Serves results, Azure Cache, Activity Feeds]

Slide20

Common Hadoop Patterns

Single view of entity
Customer, Product, Machine, etc.

Predictive Analytics
Data Scientists and Analysts finding patterns and correlations
New models emerge to explain business performance
New predictions emerge based on previously disassociated data

Data Discovery
Large amounts of machine, sensor, clickstream, and geolocation data
New value emerges when correlated with data from product, customer, and inventory catalogs

Use Cases
Ad Placement and Offers
Active Archive
ETL Offload
Single View of Customer
Recommendation Engine
Customer Targeting and Acquisition
New Data for Product Management
Vehicle Data
Web Personalization and Experience

Slide21

HDInsight Workloads

Slide22

HDInsight Supports Hive

Microsoft contribution to Apache code

[Chart: sample query run time (chart labels include Hadoop 2.0). Hive 10: 1400s; HDP 1.3 / Hive 11: 44.3s (32x speedup); HDP 2.0: 35.1s (40x speedup)]

SQL-like queries on Hadoop data in HDInsight
HDInsight provides an easy-to-use graphical query interface for Hive
HiveQL is a SQL-like language (subset of SQL)
Hive structures include well-understood database concepts such as tables, rows, columns, partitions
Compiled into MapReduce jobs that are executed on Hadoop
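To illustrate what "compiled into MapReduce jobs" means, here is a toy, single-process sketch of how a query such as SELECT page, COUNT(*) FROM clicks GROUP BY page breaks down into map, shuffle, and reduce phases. The table, the column names, and the helper functions are hypothetical, and this is not how Hive itself is implemented; Hive generates distributed jobs that run across a cluster.

```python
from collections import defaultdict

# Hypothetical rows of a "clicks" table: (user, page).
CLICKS = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"), ("u3", "/home")]


def map_phase(rows):
    """Map: emit a (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: aggregate each key's values; here COUNT(*) is a sum of ones."""
    return {key: sum(values) for key, values in grouped.items()}


page_counts = reduce_phase(shuffle_phase(map_phase(CLICKS)))
print(page_counts)
```

On a real cluster the map and reduce steps run in parallel on many nodes, and the shuffle moves data between them over the network.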

Dramatic performance gains with Stinger/Tez
Stinger is a Microsoft, Hortonworks and OSS driven initiative to bring interactive queries to Hive
Brings query execution engine technology from Microsoft SQL Server to Hive
Performance gains up to 100x
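As a quick check of the speedup figures on this slide's chart, the ratios can be recomputed from the run times shown. The pairing of each run time with a release is inferred from the order of the chart labels, so treat it as an assumption.

```python
# Run times in seconds for the sample query, as read off the slide's chart.
# The label-to-time pairing is an assumption based on label order.
BASELINE_SECONDS = 1400.0  # Hive 10
RUN_TIMES = {
    "HDP 1.3 / Hive 11": 44.3,
    "HDP 2.0": 35.1,
    "HDP 2.1": 15.0,
}

speedups = {release: BASELINE_SECONDS / seconds for release, seconds in RUN_TIMES.items()}

for release, factor in speedups.items():
    print(f"{release}: {factor:.1f}x faster than the baseline")
```

The first two ratios round to the 32x and 40x shown on the chart; 1400 / 15 is about 93, which the slide rounds up to 100x.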

[Chart, continued: HDP 2.1: 15s (100x speedup)]

Slide23

HDInsight Supports HBase

[Cluster diagram: Name Node, Job Tracker, HMaster, Coordination; four each of Data Node, Task Tracker, Region Server]

NoSQL database on data in HDInsight
Columnar, NoSQL database
Runs on top of the Azure Blob Stores in HDInsight
Provides flexibility in that new columns can be added to column families at any time

Slide24
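The last HBase point above, that new columns can be added to column families at any time, can be pictured with a toy in-memory model. This is not the HBase client API; the `ToyTable` class, the row keys, and the column names are hypothetical and only mimic the data model, in which column families are declared when the table is created while the columns inside a family are free-form per row.

```python
class ToyTable:
    """Toy model of a column-family store: row key -> family -> qualifier -> value."""

    def __init__(self, families):
        # Column families are declared up front, when the table is created.
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        # Unknown families are rejected, but any qualifier is accepted,
        # so new columns can appear at write time without a schema change.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[qualifier] = value

    def get(self, row_key, family):
        return self.rows.get(row_key, {}).get(family, {})


table = ToyTable(families=["reading"])
table.put("meter-1", "reading", "temp_c", 21.5)
# A later write introduces a brand-new column in the same family.
table.put("meter-1", "reading", "vibration_hz", 120)
# Other rows are unaffected and simply lack that column.
table.put("meter-2", "reading", "temp_c", 19.0)

print(table.get("meter-1", "reading"))
print(table.get("meter-2", "reading"))
```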

Storm for Azure HDInsight

Stream analytics for Near-Real Time Processing
Consumes millions of real-time events from a scalable event broker (e.g. Apache Kafka, Azure Event Hub)
Performs time-sensitive computation
Output to persistent stores, dashboards or devices
Customizable with Java + .NET
Deeply integrated with Visual Studio
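To make "performs time-sensitive computation" concrete, here is a small single-process sketch of a sliding-window average over a stream of events, the kind of logic a stream-processing step might carry. It does not use the Storm API; the event shape, the window length, and the `SlidingAverage` class are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingAverage:
    """Average of the values seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value), oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Record one event, evict expired ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that fell out of the window relative to the newest one.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _old_ts, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Hypothetical sensor stream: (seconds since start, temperature).
stream = [(0, 20.0), (5, 22.0), (10, 24.0), (65, 30.0)]
window = SlidingAverage(window_seconds=60)
averages = [window.add(ts, value) for ts, value in stream]
print(averages)
```

Keeping a running total and evicting from the front of the deque means each event is handled in amortized constant time, which matters when the stream carries millions of events.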

[Diagram: event processing pipeline]
Event producers: Applications, Devices, Sensors, Web and Social
Collection: Field gateways, Cloud gateways (web APIs)
Event Queuing System: Event Hubs, Kafka / RabbitMQ / ActiveMQ
Transformation: Stream processing (Apache Storm on HDInsight, Azure Stream Analytics)
Long-term storage: Storage adapters (HDFS, Azure DBs, Azure storage, HBase)
Presentation and action: Search and query, Data analytics (Excel), Web/thick client dashboards, Live Dashboards, Devices to take action

Slide25

Azure HDInsight running Linux

Choice of Windows or Linux clusters

Managed & supported by Microsoft

Re-use common tools, documentation, samples from Hadoop/Linux ecosystem

Add Hadoop projects that were authored on Linux to HDInsight
Easier transition from on-premises to cloud

Slide26

Microsoft Makes Hadoop Easier

Deep Visual Studio Integration
Debug Hive jobs through YARN logs or troubleshoot Storm topologies
Visualize Hadoop clusters, tables, and storage
Submit Hive queries, Storm topologies (C# or Java spouts/bolts)
IntelliSense for authoring Hive jobs and Storm business logic

Slide27

Introducing Azure Data Lake

A hyper scale repository for big data analytic workloads

Built for Hadoop
Hyper Scale, Massive throughput
Enterprise Ready

Sign up: http://azure.com/datalake

Slide28

Please evaluate this session
Your feedback is important to us!

Visit Myignite at http://myignite.microsoft.com or download and use the Ignite Mobile App with the QR code above.

Slide29