Bob Ward Principal Architect Microsoft httpsakamsbobsql httpsakamsbobwardms httpsakamsbobsqldemos Session learning objectives At the end of this session you should be better able to ID: 806546
Download The PPT/PDF document "Inside SQL Server Polybase" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
Inside SQL Server Polybase
Bob Ward, Principal Architect, Microsoft
https://aka.ms/bobsql
https://aka.ms/bobwardms
https://aka.ms/bobsqldemos
Slide2Session learning objectivesAt the end of this session, you should be better able to…
Understand what is Polybase and how to use it.Understand how Polybase works to know its capabilities for data virtualizationLearn how to use Polybase to build a data hub with SQL Server
Slide3SQL Server
T-SQL
Analytics
Apps
ODBC
NoSQL
Relational databases
Big data
Excel
PolyBase external tables
Cosmos DB
What is SQL Server Polybase?
D
istributed
compute engine integrated with SQL
Server
Query data where it lives using T-SQL
Distributed, scalable query performance
Manual/deploy with SQL Server
Auto deploy/optimize with Big Data Clusters
Intelligence over all data
HDFS
“It’s all about
Data Virtualization”
Slide4The Journey to Data Virtualization in SQL Server 2019
Linux support is coming
SQL Server 2019 is in Preview
Some details subject to change
David Dewitt
Rimma Nehme
Slide5Using Polybase in SQL Server: T-SQL EXTERNAL TABLE
Not simple without BDC
Login and password
Only for HDFS
And join to any other table or external table
INSERT only for HDFS
WWI
SQL
Cosmos DB
HDFS
metadata
Data lives
here
Results streamed
Slide6The SQL Server Polybase Architecture
HDFS
……...
Cosmos DB
Your data sources
Need more scale?
Add compute nodes
“Head” node
“Compute” node
“Compute” node
SQL Engine
Polybase Engine
Polybase Data Movement Service
control and
execution
Data flow
SQL Engine
Polybase Engine
Polybase Data Movement Service
SQL Engine
Polybase Engine
Polybase Data Movement Service
tempdb
DW dbs
tempdb
DW dbs
tempdb
DW dbs
Scan or pushdown
Scale out with partitions
shuffle
EE only
All editions
All editions
mpdwsvc.exe
Slide7DemoDive into Polybase
Slide8SQL Server Polybase and Hadoop
Slide9Polybase and Other ConnectorsBuilt-in Connectors (use ODBC)
sqlserver oracle teradata mongodb
\binn\Polybase\ODBC Drivers
SQL Server, Azure SQL Database, Azure SQL Data Warehouse
MongoDB or CosmosDB (using MongoDB API)
64bit
ODBC 3.0+ compliant
ODBC Connector
odbc
You install the driver
LOCATION “string” in EXTERNAL DATA SOURCE
Ex. SAP HANA (HDBCODBC Driver)
No client software install required
Scale out with
partitions
Slide10Polybase as a Semantic Layer
Slide11External Tables vs Linked Servers
* Insert into HDFS allowed
Slide12SQL Server 2019: Data Virtualization
WWI
SQL
Mobile
App Orders
Legacy Suppliers
Accounts Receivable
SQL Server
2019
Order Reviews
Customers from Acquisition
WideWorldimporters
Modern StockItems
Slide13DemoData Virtualization around the WideWorld
Slide14Lessons Learned
COLLATE required for character columns. Compat may be requiredORACLE case sensitive for LOCATION = <instance>.<schema>.<table>LOCATION for SQL Server is <db>.<schema>.<table>
EXTERNAL tables don’t support these types (they may be more)VARCHAR(MAX)GEOGRAPHYComputed Columns
JSON
MongoDB (CosmosDB) observations
Be careful of types in your document
LOCATION = <database>.<collection>
Need to dive in EXTERNAL TABLE compatibility (Ex. Row Level Security)
Slide15SQL Server 2019 Big Data Clusters and PolybaseThe Problem
Customers want to integrate with Big Data and other data sources easilyPolybase is not simple to setup, configure, maintain, and scale elasticallyCustomers may not have a Hadoop cluster or want to build one
Polybase “classic” (MapReduce) could be better
The Solution
Kubernetes and containers to deploy and scale elastically
Everything pre-installed including HDFS cluster
Build a control plane to help manage and monitor
Enhance SQL Server to read from HDFS natively
Provide a data mart for cached results
Introduce Spark and Notebooks for Data scientists
Slide16Polybase in SQL Server 2019 Big Data Clusters
Cluster
Compute pool
SQL Compute Node
SQL Compute Node
SQL Compute Node
…
Compute pool
SQL Compute Node
External data sources
IoT data
Directly read from HDFS
Persistent storage
…
Storage pool
SQL Server
Spark
HDFS Data Node
SQL Server
Spark
HDFS Data Node
SQL Server
Spark
HDFS Data Node
Kubernetes pod
Analytics
Custom
apps
BI
SQL Server
master instance
Node
Node
Node
Node
Node
Node
Node
SQL
Data mart
SQL Data Node
SQL Data Node
Compute pool
SQL Compute Node
Storage
Storage
Polybase head node
In Linux container
Polybase compute nodes
In Linux containers
“Built-in”
Data Sources
MapReduce
Not used
mpdwsvc.exe
uses SQLPAL
Control Plane
Controller Svc
Configuration Store (SQL Server)
Grafana
Elastic Search
Azure FSM Engine
InfluxDB
Kibana
Slide17Storage and Data Pools Data SourcesCREATE EXTERNAL DATA SOURCE
SqlStoragePoolWITH (LOCATION = 'sqlhdfs://service-mssql-controller:8080’);CREATE EXTERNAL TABLE
…WITH(DATA_SOURCE = SqlStoragePool,LOCATION = '/clickstream_data',FILE_FORMAT = csv_file);
CREATE EXTERNAL DATA SOURCE
SqlDataPool
WITH (LOCATION =
'sqldatapool
://
service-mssql-controller:8080/datapools/default
’);
CREATE EXTERNAL TABLE…
WITH ( DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN );
mssql-controller REST endpoint
Preinstalled in model of Master Instance
Slide18Azure Data Studio and Data Virtualization
Using
PROSE
for intelligent import and schema detection
Slide19Session takeawaysPolybase = Data Virtualization = Reduced Need for ETLPolybase provides distributed read scale performance
Big Data Clusters automate the deployment of PolybaseDownload and try it yourselfSign up for EAP for SQL Server 2019 Big Data Clusters
Slide20Questions?
aka.ms/SQLBits19
Slide21Session resources
SQL Server 2019 Polybase documentationSQL Server Big Data Clusters documentationPolybase demos on
GitHubAzure SQL Database Elastic Query documentationLoading data into Azure SQL Data Warehouse with Polybase
https://aka.ms/bobsql
https://aka.ms/bobwardms
https://aka.ms/bobsqldemos
Slide22