Operational Analytics Sponsorzy strategiczni Sponsorzy srebrni Łukasz Grala Microsoft MVP Data Platform MCT MCSE Architect Mentor Data Platform amp Business Intelligence Solutions ID: 551665
Download Presentation The PPT/PDF document "SQL Server 2016" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
SQL Server 2016
Operational
AnalyticsSlide2
Sponsorzy strategiczni
Sponsorzy srebrniSlide3
Łukasz Grala
Microsoft MVP Data Platform | MCT | MCSE
Architect - Mentor Data Platform & Business Intelligence SolutionsTrainer Data Platform and Business IntelligenceUniversity
LecturerAuthor Webcasts
and PublicationsMicrosoft MVP Data PlatformLeader PLSSUG Poznań Phd Student on
Poznan University of Technology, Faculty of Computing Science(topics – database and datawarehouse architecture
, data
mining
, machine learning)
lukasz@grala.biz
lukasz@sqlexpert.pl Slide4
Marcin Szeliga
Data
PhilosopherBI Expert and ConsultantData Platform Architect20 years of experience with SQL Server Ph.D
. Candidate at Politechnika Śląska marcin@sqlexpert.plSlide5
Operational Database Management Systems
Data Warehouse Database Management Systems
Business Intelligence and Analytics Platforms
x86 Server Virtualization
Cloud Infrastructure as a Service
Enterprise Application Platform as a Service
Public Cloud Storage
Leader in 2014 for Gartner Magic Quadrants
Microsoft platform leads the way on-premises and cloudSlide6
Hyperscale cloud
Mission-critical performance
Deeper insights across data
Do more. Achieve more.Slide7
Performance
Security
Availability
Scalability
Operational analytics
Insights on operational data; Works with in-memory OLTP and disk-based OLTP
In-memory OLTP enhancements
Greater T-SQL surface area, terabytes of memory supported, and greater number of parallel CPUs
Query data store
Monitor and optimize query plans
Native JSON
Expanded support for JSON data
Temporal database support
Query data as
points in time
Always encrypted
Sensitive data remains encrypted at all times with ability to query
Row-level security
Apply fine-grained access control to table rows
Dynamic data masking
Real-time obfuscation of data to prevent unauthorized access
Other enhancements
Audit success/failure of database operations
TDE support for storage of in-memory OLTP tables
Enhanced auditing for OLTP with ability to track history of record changes
Enhanced AlwaysOn
Three synchronous replicas for auto failover across domains
Round robin load balancing of replicas
Automatic failover based on database health
DTC for transactional integrity across database instances with AlwaysOn
Support for SSIS with AlwaysOn
Enhanced database caching
Cache data
w
ith automatic, multiple
TempDB
files per instance in multi-core environments
SQL Server 2016 improvementsSlide8
Performance
Security
Availability
Scalability
Operational analytics
Insights on operational data; Works with in-memory OLTP and disk-based OLTP
In-memory OLTP enhancements
Greater T-SQL surface area, terabytes of memory supported, and greater number of parallel CPUs
Query data store
Monitor and optimize query plans
Native JSON
Expanded support for JSON data
Temporal database support
Query data as
points in time
Always encrypted
Sensitive data remains encrypted at all times with ability to query
Row-level security
Apply fine-grained access control to table rows
Dynamic data masking
Real-time obfuscation of data to prevent unauthorized access
Other enhancements
Audit success/failure of database operations
TDE support for storage of in-memory OLTP tables
Enhanced auditing for OLTP with ability to track history of record changes
Enhanced AlwaysOn
Three synchronous replicas for auto failover across domains
Round robin load balancing of replicas
Automatic failover based on database health
DTC for transactional integrity across database instances with AlwaysOn
Support for SSIS with AlwaysOn
Enhanced database caching
Cache data
w
ith automatic, multiple
TempDB
files per instance in multi-core environments
Mission-critical performanceSlide9
Refers to Operational Workload (i.e. OLTP)
Examples: Enterprise Resource Planning (ERP) – Inventory, Order, Sales, Machine Data – Data from machine operations on factory floorOnline Stores (e.g. Amazon, Expedia)
Stock/Security tradesMission CriticalNo downtime (High Availability) – impact on revenueLow latency and high transaction throughput
What does operational mean?Slide10
Analytics
Studying past data (e.g. operational, social media) to identify potential trends To
analyze the effects of certain decisions or events (e.g. Ad campaign)Analyze past/current data to predict outcomes (e.g. credit score)GoalsEnhance the business by gaining knowledge to make improvements or changes
Source – MIT/SLOAN Management Review
What does
analytics mean?Slide11
SQL Server
Database
Application Tier
Presentation Layer
IIS
Server
SQL Server
Relational DW
Database
ETL
BI and analytics
Dashboards
Reporting
SQL Server
Analysis Server
Key Issues
Complex Implementation
Requires two Servers (
CapEx
and
OpEx
)
Data Latency in Analytics
More businesses demand/require real-time Analytics
Hourly, Daily, Weekly
Traditional
BI architectureSlide12
SQL Server
Database
Application Tier
Presentation Layer
IIS
Server
BI and analytics
Dashboards
Reporting
Benefits
No Data Latency
No ETL
No Separate DW
Challenges
Analytics queries are resource intensive and can cause blocking
How to minimize Impact on Operational workload
Sub-optimal execution of Analytics on relational schema
Add analytics specific indexes
This is
OPERATIONAL ANALYTICS
SQL Server
Analysis Server
Minimizing
data latency for analytics Slide13
SQL Server 2016Slide14
Ability to run analytics queries concurrently with operational workload using the same schema
Not a replacement forExtreme analytics queries performance possible using schemas customized (e.g. star/snowflake) and pre-aggregated cubes
Data coming from non-relational sourcesData coming from multiple relational sources requiring integrated analytic
Operational AnalyticsSlide15
Goals
Minimal impact on operational workload with concurrent analyticsPerformant
analytics on operational schema
Achieved using columnstore Index
Operational
Analytics
with SQL
Server
2016Slide16
Quick Recap:
Columnstore Index
Improved compression:Data from same domain compress betterReduced I/O
:Fetch only columns needed
…
Data stored as rows
Data stored as columns
Ideal for OLTP
Efficient operation on small set of rows
C1
C2
C3
C5
C4
Improved
performance
:
More data fits in memory
Optimized for CPU utilization
Ideal for DW
workloadSlide17
Clustered
Columnstore
Performance: TPC-HSlide18
Operational analytics
disk-based tablesSlide19
Key Points
Create an
updateable
non-clustered columnstore index (NCCI) for analytics queries
Drop all other indexes that were created for
analytics
No
application changes
ColumnStore index is maintained just like any other index
Query Optimizer will choose columnstore index where
needed
Relational Table
(Clustered Index/Heap
)
Btree
Index
Delete bitmap
Nonclustered columnstore index (NCCI)
Delta rowgroups
Operational
Analytics with
columnstore
indexSlide20
Key Points
Create Columnstore only on cold data – using filtered predicate to minimize maintenance
Analytics query accesses both columnstore and ‘hot’ data transparently
Example –
Order Management Application –
CREATE NONCLUSTERED COLUMNSTORE INDEX ….. WHERE
order_status
= ‘SHIPPED’
Relational Table
(Clustered Index/Heap
)
Btree
Index
Delete bitmap
Nonclustered columnstore index (NCCI) – filtered index
HOT
Delta rowgroups
DML Operations
Minimizing
CSI
overheadSlide21
Operational analytics for
in-memory tablesSlide22
Operational
Analytics with
columnstore
on
In-Memory Tables
No explicit delta rowgroup
Rows (tail) not in columnstore stay in in-memory OLTP table
No columnstore index overhead when operating on tail
Background task migrates rows from tail to columnstore in chunks of 1 million rows not changed in last 1 hourDeleted Rows Table (DRT) – tracks deleted rows Columnstore data fully resident in memory
Persisted together with operational dataNo application changes required
In-Memory OLTP Table
Updateable CCI
DRT
Tail
Range Index
Hash Index
Hot
Like
Delta rowgroupSlide23
DML Operations on In-Memory OLTP
Operation
Hash
or Range IndexHK-CCI
InsertInsert row into HK
Insert row into HKDeleteSeek row(s) to be deleted
Delete the row
Seek row(s) to be deleted
Delete the row in HK
If row in TAIL then return
else insert <colstore-RID> into DRT
UpdateSeek the row(s) Update (delete/insert)
Seek the row(s) Update (delete/insert) in HK
If row in TAIL then return else insert <colstore-RID> into DRT
Columnstore
index
on in-memory table overheadSlide24
DML
Operations
In-Memory OLTP Table
Updateable CCI
DRT
Tail
Range
Index
Hash Index
Like
Delta rowgroup
Hot
Keep hot data only in in-memory tables
Example – data stays hot for 1 day, 1 week…
Minimizing
this overheadSlide25
Query
processingSlide26
Demo
time Slide27
Performance
improvments
Scan typeElapsed time (s)
SpeedupRow store
scan, interop44.441
Row store scan, native28.4451.6x
CSI
scan
, interop
0.80255.4xSlide28
Insert, Update,
Delete costs and query time
OperationElasped
time (s) with CSIElasped
time (s) No CSIIncrease % Update
Increase % QueryCSI scan, interop0.802
BASE
Insert 400
000 rows
53.547.811.9%
CSI scan, interop0.869
8.4%Update 400
000 rows42.4
28.946.7%CSI scan, interop1.181
47.3%
Delete
400 000
rows
38.3
30.5
25.6%
CSI
scan
,
interop
1.231
53.5%Slide29
Single
thread insert and update
OperationRows affectedRow
store (s)Secondary CSI (s)
Primary CSI (s)1000 updates10 000
0.8931.4006.86610% insert
18M
233.9
566
291.42% update
3.96M123.2314.3
275.9Slide30
Single
thread scan
Millions of rowsRow
storeSecondary CSIPrimary
CSINew built180
99.14.71.71After 1000 updates180
99.4
5.4
1.75
After 10% inserts
198108.714.5
9.5After 2% updates198109.5
16.810.0Slide31Slide32
Elapsed
time per lookup in a B-Tree index
Columns projected
B-tree over CSI (ms) without
mappingB-tree over CSI with
mappingB-tree over B-tree (ms) No compresion
B-
tree
over B-tree (ms) Page
compresion43.92
5.282.413.6584.33
5.732.323.85
206.678.07
2.554.44Slide33
Overhead
on data loading of having an NCI
New partition size
(milion rows)Time for bulk
load into CSI (ms)Time to create B-
tree (ms)Ratio (index creation/bulk load)1
5
327
66
1.24%55 335
771.45%10
5 354831.55%Slide34
Comparing
performance
OperationBillions of value per secondNo SIMD
Billions of value per second
SIMDSpeedup
Bit unpacking 6bits 2.08
11.55
5.55x
Bit
unpacking
12
bits 1.919.76
5.11x
Bit
unpacking 21 bits1.965.292.70x
Compaction
32
bits
1.24
6.70
5.40x
Range
predicate
16
bits
0.94
11.42
5.06x
Sum 16 bit
values
2.86
14.46
5.06x
128-bit bitmap
filter
0.97
11.4211.77x
64KB bitmap filter 1.012.372.35xSlide35
Query performance (1)
Predicate
or aggregationDuration SQL2014 (ms)
Duration SQL2016 (ms)Speedup
Billion of rows per sQ1-Q4: select
count(*) from LINEITEM where <predicate>
L_ORDERKEY = 235236
220
140
1.57x
12.9L_QUANTITY = 1900664
689.76x26.5L_SHIPMODE='AIR'694
1474.72x12.2
L_SHIPDATE between '01.01.1997' and '01.01.1998'512
875.89x20.7Slide36
Query performance (2)
Predicate
or aggregationDuration SQL2014 (ms)
Duration SQL2016 (ms)Speedup
Billion of rows per sQ5-Q6: select count(*) from PARTSUPP where
<predicate>
PS_AVAILQTY < 10
50
27
1.85x
8.9
PS_AVAILQTY = 10 45
153.00x16
Q7-Q8: select
<aggregates> from LINEITEM
avg
(L_DISCOUNT)
1272
196
6.49x
9.1
avg
(L_DISCOUNT), min(L_ORDERKEY), max(L_ORDERKEY)
1978
356
5.56x
5.1Slide37
Columnstore
index overhead
DML operations on OLTP workload
Operation
BTREE (NCI)
Non Clustered ColumnStore Index (NCCI)InsertInsert row into
btree
Insert
row into
btree (delta store)
DeleteSeek row(s) to be deleted
Delete the rowSeek for the row in the delta stores (there can be multiple)If row found, then delete
Otherwise insert the key into delete row buffer
UpdateSeek the row(s)
UpdateDelete the row (steps same as above)Insert the updated row into delta storeSlide38
Availability
Groups as data warehouse
Key pointsMission Critical Operational Workloads typically configured for High Availability using AlwaysOn Availability Groups
You can offload analytics to readable secondary replica
Secondary Replica
Secondary Replica
Secondary Replica
Primary Replica
Always on Availability GroupSlide39
SQL Server
Database
Application Tier
Presentation Layer
IIS
Server
BI and analytics
Dashboards
Reporting
Add analytics specific indexes
SQL Server
Analysis Server
Minimizing
data latency for analytics Slide40
High-end Server Hardware
SSAS Enterprise Readiness: Tabular
New
DirectQuery
DirectQuery
for Oracle,
Teradata
, ASP
DirectQuery
support
for MDX
query
(Excel Tools)Slide41
Analytics in real-time with no data latency
Rich set of options to control impact on Operational workload
Industry leading solution Integrating in-memory OLTP with in-memory Analytics
No Application changes required
Summary – OA with SQL Server 2016Slide42
Sponsorzy strategiczni
Sponsorzy srebrni