A Deep Dive into Spatial Indexing Michael Rys Principal Program Manager Microsoft Corporation SQLServerMike DBI405 Q Why is my Query so Slow A Usually because the index isnt being used ID: 306948
Download Presentation The PPT/PDF document "Troubleshooting SQL Server Spatial Query..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
Troubleshooting SQL Server Spatial Query Performance: A Deep Dive into Spatial Indexing
Michael RysPrincipal Program ManagerMicrosoft Corporation@SQLServerMike
DBI405Slide2
Q: Why is my Query so Slow?
A: Usually because the index isn’t being used.Q: How do I tell?
A: SELECT * FROM T WHERE g.STIntersects(@x) = 1
NO INDEX
INDEX!Slide3
Hinting the Index
Spatial indexes can be forced if needed.SELECT * FROM T
WHERE g.STIntersects(@x) = 1Use SQL Server 2008 SP1 or later!
WITH(INDEX(
T_g_idx
))Slide4
But Why Isn't My Index Used?
Plan choice is cost-basedQO uses various information, including cardinality
When can we estimate cardinality?Variables: neverLiterals: not for spatial since they are not literals under the covers
Parameters: yes, but cached, so first call matters
DECLARE @x geometry = 'POINT (0 0)'
SELECT *
FROM T
WHERE
T.g.STIntersects(@x) = 1
SELECT *FROM TWHERE T.g.STIntersects('POINT (0 0)') = 1
EXEC sp_executesql N'SELECT * FROM T
WHERE
T.g.STIntersects
(@x) = 1',
N'@x
geometry', N'POINT (0 0)'Slide5
Spatial Indexing Basics
In general, split predicates in twoPrimary filter finds all candidates, possibly
with false positives (but never false negatives)Secondary filter removes false positivesThe index provides our primary filterOriginal predicate is our secondary filterSome tweaks to this scheme
Sometimes possible to skip secondary filter
A
B
C
D
A
B
D
A
B
Primary Filter
(Index lookup)
Secondary Filter
(Original predicate)
ESlide6
Using B+-Trees for Spatial Index
SQL Server has B+-TreesSpatial indexing is usually done through other structuresQuad tree, R-TreeChallenge: How do we repurpose the B+-Tree to handle spatial queries?
Add a level of indirection!Slide7
Mapping to the B+-Tree
B+-Trees handle linearly ordered sets wellWe need to somehow linearly order 2D space Either the plane or the globeWe want a locality-preserving mapping from the original space to the line
i.e., close objects should be close in the indexCan’t be done, but we can approximate itSlide8
SQL Server Spatial Indexing Story
Requires bounding boxOnly one grid
Planar IndexGeographic Index
No bounding box
Two top-level projection grids
3.
2.
1.
1
2
15
16
4
3
14
13
5
8
9
12
6
7
10
11
1
2
15
16
4
3
14
13
5
8
9
12
6
7
10
11
1
2
15
16
4
3
14
13
5
8
9
12
6
7
10
11
1. Overlay a grid on the spatial object
2. Identify grids for spatial object to store in index
3. Identify grids for query object(s)
4. Intersecting grids identifies candidates
Indexing Phase
Primary Filter
Secondary Filter
5. Apply actual CLR method on candidates to find matchesSlide9
SQL Server Spatial Indexing Story
Multi-Level GridMuch more flexible than a simple grid
Hilbert numberingModified adaptable QuadTreeGrid index features4 levelsCustomizable grid subdivisionsCustomizable maximum number of cells per object (default 16)
NEW IN SQL Server 2012:
New Default tessellation with 8 levels of cell nestingSlide10
Multi-Level Grid
Deepest-cell Optimization: Only keep the lowest level cell in index
Covering Optimization: Only record higher level cells when all lower
cells are completely covered by the object
/
(“cell 0”)
Cell-per-object Optimization: User restricts max number of cells per object
/4/2/3/1Slide11
Implementation of the Index
Persist a table-valued functionInternally rewrite queries to use the table
Prim_keygeography1
g1
2
g2
3
g3
Prim_key
cell_id
sridcell_attr10x00007
42
0
3
0x00007
42
1
3
0x0000A
42
2
3
0x0000B
42
0
3
0x0000C
42
1
10x0000D42
020x00014421
Base Table T
Internal Table for sixdCREATE SPATIAL INDEX sixd
ON T(geography)0 – cell at least touches the object (but not 1 or 2)
1 – guarantee that object partially covers cell2 – object covers cell
15 columns and 895 byte limitation
Spatial Reference ID
Have to be the same to produce match
Varbinary
(5) encoding of grid cell idSlide12
Auto Grid Spatial Index
New spatial index Tessellations: geometry_auto_gridgeography_auto_gridUses 8 Grid levels instead of the previous 4No GRIDS parameter needed (or available)Fixed at HLLLLLLL
default number of cells per object:8 for geometry 12 for geographyMore stable performance for windows of different sizefor data with different spatial density
For default values:
Up to 2x faster for longer queries > 500
ms
More efficient primary filter Fewer rows returned10ms slower for very fast queries < 50 msIncreased tessellation time which is constantSlide13
Spatial Index Performance
New grid gives much stable performance for query windows of different size
Better grid coverage gives fewer high peaksSlide14
Index Creation and Maintenance
Create index example GEOMETRY:CREATE SPATIAL INDEX sixd ON
spatial_table(geom_column)WITH (BOUNDING_BOX = (0, 0, 500, 500), GRIDS = (LOW, LOW, MEDIUM, HIGH),
CELLS_PER_OBJECT
= 20)
Create index example GEOGRAPHY:
CREATE SPATIAL INDEX
sixd ON spatial_table
(geogr_column)USING GEOGRAPHY_GRID
WITH (GRIDS = (LOW, LOW, MEDIUM, HIGH), CELLS_PER_OBJECT = 20)NEW IN SQL Server 2012
(equivalent to default creation):CREATE SPATIAL INDEX sixd ON spatial_table(geom_column)
USING GEOGRAPHY_AUTO_GRID
WITH (CELLS_PER_OBJECT = 20)
Use ALTER and DROP INDEX for maintenance.Slide15
demo Indexing and PerformanceSlide16
Spatial queries supported by index in SQL Server
Geometry:STIntersects() = 1
STOverlaps() = 1STEquals()= 1STTouches() = 1STWithin
() = 1
STContains
() = 1
STDistance
() < val
STDistance() <= val
Nearest NeighborFilter() = 1Geography
STIntersects() = 1 STOverlaps() = 1
STEquals
()= 1
STWithin
() = 1
STContains
() = 1
STDistance
() <
val
STDistance
() <=
val
Nearest Neighbor
Filter() = 1
New in SQL Server 2012Slide17
How Costing is Done
The stats on the index contain a trie constructed on the string form of the packed binary(5) typed Cell ID.
When a window query is compiled with a sniffable window object, the tessellation function on the window object is run at compile time. The results are used to construct a trie for use during compilation. May lead to wrong compilation for later objectsNo costing on:Local variables, constants, results of expressions
Use different indices and different stored
procs
to account for different query characteristicsSlide18
Understanding the Index Query PlanSlide19
Seeking into a Spatial Index
Minimize I/O and random I/OIntuition: small windows should touch small portions of the indexA cell 7.2.4 matches Itself
AncestorsDescendants
Spatial Index S
7
7.2
7.2.4Slide20
Understanding the Index Query Plan
T(@g)
Spatial Index Seek
Ranges
Remove dup ranges
Optional SortSlide21
Spatial index tessellation
Better and more continuous coverage
64 cells 128 cells 256 cells
Fully contained cells
Partially contained cellsSlide22
Query window number of cells
Typical spatial query performance
Optimal value (theoretical) is somewhere between two extremes
Time needed to
process false positives
Default values:
512 - Geometry AUTO grid
768 - Geography AUTO grid
1024 - MANUAL grids
SELECT * FROM table t WITH (
SPATIAL_WINDOW_MAX_CELLS
=256)
WHERE
t.geom.STIntersects
(@window)=1;Slide23
Query Window Hinting (SQL Server 2012)
SELECT * FROM table t with(SPATIAL_WINDOW_MAX_CELLS=1024)WHERE t.geom.STIntersects
(@window)=1Used if an index is chosen (does not force an index)Overwrites the default (512 for geometry, 768 for geography)Rule of thumb:Higher value makes primary filter phase longer but reduces work in secondary filter phase
Set higher for dense spatial data
Set lower for sparse spatial data Slide24
Index Hinting
FROM T WITH (INDEX (<Spatial_idxname>))Spatial index is treated the same way a non-clustered index is
the order of the hint is reflected in the order of the indexes in the planmultiple index hints are concatenatedno duplicates are allowedThe following restrictions exist:The spatial index must be either first in the first index hint or last in the last index hint for a given table.
Only one spatial index can be specified in any index hint for a given table.Slide25
demo Spatial
Query HintingSlide26
Additional Query Processing
SupportIndex intersection
Enables efficient mixing of spatial and non-spatial predicatesMatchingNew in SQL Server 2012: Nearest Neighbor queryDistance queries: convert to STIntersectsCommutativity: a.STIntersects(b) =
b.STIntersects
(a)
Dual:
a.STContains
(b) = b.STWithin(a)Multiple spatial indexes on the same columnVarious bounding boxes, granularitiesOuter references as window objectsEnables spatial join to use one indexSlide27
Other Spatial Performance Improvements in SQL Server 2012
Spatial index build time for point data can be as much as four to five times faster Optimized spatial query plan for STDistance
and STIntersects like queriesFaster point data queriesOptimized STBuffer, lower memory footprintSlide28
Spatial Nearest NeighborMain scenario
Give me the closest 5 Italian restaurantsExecution plan SQL Server 2008/2008 R2: table scan
SQL Server 2012: uses spatial indexSpecific query pattern requiredSELECT
TOP
(5) *
FROM Restaurants r
WHERE
r.type
= ‘Italian’
AND r.pos.
STDistance(@me) IS NOT NULLORDER BY
r.pos.
STDistance
(@me)Slide29
demo Spatial Performance in SQL Server
2012Slide30
Nearest Neighbor Performance
NN query
vs best current workaround (sort all points in 10km radius)
*Average time for NN query is ~236ms
Find the closest 50 business points to a specific location (out of 22 million in total)Slide31
Limitations of Spatial Plan Selection
Off whenever window object is not a parameter:Spatial join (window is an outer reference)Local variable, string constant, or complex expressionHas the classic SQL Server parameter-sensitivity problem
SQL compiles once for one parameter value and reuses the plan for all parameter valuesDifferent plans for different sizes of window require application logic to bucketize the windowsSlide32
Error 8635: Cannot find a plan
Error: The query processor could not produce a query plan for a query with a spatial index hint. Reason: %S_MSG. Try removing the index hints or removing SET FORCEPLAN.Possible Reasons (%S_MSG):The spatial index is disabled or offline
The spatial object is not defined in the scope of the predicateSpatial indexes do not support the comparand supplied in the predicateSpatial indexes do not support the comparator supplied in the predicateSpatial indexes do not support the method name supplied in the predicateThe comparand references a column that is defined below the predicate
The
comparand
in the comparison predicate is not deterministic
The spatial parameter references a column that is defined below the predicate
Could not find required binary spatial method in a conditionCould not find required comparison predicateSlide33
Index Support
Can be built in parallelCan be hinted
File groups/PartitioningAligned to base table or Separate file groupFull rebuild onlyNew catalog views, DDL EventsDBCC Checks
Supportability stored procedures
New in SQL Server 2012:
Index Page and Row Compression
Ca. 50% smaller indices, 0-15% slower queries
Not supported
Online rebuild
Database Tuning advisorSlide34
SET Options
Spatial indexes requires:ANSI_NULLS: ONANSI_PADDING: ONANSI_WARNINGS: ONCONCAT_NULL_YIELDS_NULL: ON
NUMERIC_ROUNDABORT: OFFQUOTED_IDENTIFIER: ONSlide35
Spatial Indices and Partitions and Filegroups
Default partitioned to the same filegroups as the base table. Overwrite with: [ ON {
filegroup_name | "default" } ]If filegroup_name is specified, the index will be placed on the specified filegroup regardless of the table’s partitioning scheme. If “default” is specified, the base table’s default filegroup/partitioning scheme is applied.Altering the base table’s partition scheme is not allowed unless the spatial index was created with the “ON
filegroup
” option (and is hence not aligned with the partitioning anyway). The index has to be dropped and then the base table repartitioned. Slide36
Spatial Catalog Views
sys.spatial_indexes catalog viewsys.spatial_index_tessellations catalog view
Entries in sys.indexes for a spatial index:A clustered index on the internal table of the spatial indexA spatial index (type = 4) for spatial indexAn entry in sys.internal_tables
An entry to
sys.index_columns
Slide37
sp_spatial_help_geometry_histogram
sp_spatial_help_geography_histogram
Used for spatial data and index analysis
New Spatial Histogram Helpers
Histogram of 22 million business points over US
Left: SSMS view of a histogram
Right: Custom drawing on top of Bing MapsSlide38
Indexing Support Procedures
sys.sp_help_spatial_geometry_indexsys.sp_help_spatial_geometry_index_xml
sys.sp_help_spatial_geography_indexsys.sp_help_spatial_geography_index_xmlProvide information about index:
64 properties
10 of which are considered coreSlide39
sys.sp_help_spatial_geometry_index
Arguments
Results
in property name/value pair table of the format:
Parameter
Type
Description
@
tabname
nvarchar
(776)
the name of the table for which the index has been specified
@
indexname
sysname
the index name to be investigated
@
verboseoutput
tinyint
0 core set of properties is reported
1 all properties are being reported
@
query_sample
geometry
A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window.
PropName
:
nvarchar
(256)
PropValue
:
sql_variantSlide40
sys.sp_help_spatial_geography_index_xml
Arguments
ParameterType
Description
@
tabname
nvarchar
(776)
the name of the table for which the index has been specified
@
indexnamesysname
the index name to be investigated
@
verboseoutput
tinyint
0 core set of properties is reported
1 all properties are being reported
@
query_sample
geography
A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window.
@
xml_output
xml
This is an
output
parameter that contains the returned properties in an XML fragmentSlide41
Some of the returned Properties
Property
Type
Description
Base_Table_Rows
Bigint
All
Number of rows in the base table
Index properties
-
All
index properties: bounding
box, grid densities, cell
per object
Total_Primary_Index_Rows
Bigint
All
Number of rows in the index
Total_Primary_Index_Pages
Bigint
All
Number of pages in the index
Total_Number_Of_ObjectCells_In_Level0_For_QuerySample
Bigint
Core
Indicates whether the representative query sample falls outside of the bounding box of the geometry index and into the root cell (level 0 cell). This is either 0 (not in level 0 cell) or 1. If it is in the level 0 cell, then the investigated index is not an appropriate index for the query sample.
Total_Number_Of_ObjectCells_In_Level0_In_Index
Bigint
Core
Number of cell instances of indexed objects that are tessellated in level
0.
For
geometry indexes, this will happen if the bounding box of the index is smaller than the data domain. A
high number of objects in level 0 may require a costly application of secondary filters if the query window falls partially outside the bounding box.
If the query window falls inside the bounding box, having a high number of objects in level 0 may actually improve the performance. Slide42
Some of the returned Properties
Property
Type
Description
Number_Of_Rows_Selected_By_Primary_Filter
bigint
Core
P = Number
of rows selected by the primary filter.
Number_Of_Rows_Selected_By_Internal_Filter
bigint
Core
S = Number
of rows selected by the internal filter. For these rows, the secondary filter is not called.
Number_Of_Times_Secondary_Filter_Is_Called
bigint
Core
Number of times the secondary filter is called.
Percentage_Of_Rows_NotSelected_By_Primary_Filter
float
Core
Suppose there are N rows in the base table, suppose P are selected by the primary filter. This is (N-P)/N as percentage.
Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter
float
Core
This
is S/P as a percentage. The higher the percentage, the better is the index in avoiding the more expensive secondary filter.
Number_Of_Rows_Output
bigint
Core
O=Number
of rows output by the query.
Internal_Filter_Efficiency
floatCoreThis is S/O as a percentage.
Primary_Filter_Efficiencyfloat
CoreThis is O/P as a percentage. The higher the efficiency is, the less
false positives have to be processed by the secondary filter.Slide43
Spatial Tips on index settings
Some best practice recommendations (YMMV):Start out with new default tesselation
Point data: always use HIGH for all 4 level. CELL_PER_OBJECT are not relevant in the case.Simple, relatively consistent polygons: set all levels to LOW or MEDIUM, MEDIUM, LOW, LOW Very complex LineString or Polygon instances:High number of CELL_PER_OBJECT (often 8192 is best)
Setting all 4 levels to HIGH may be beneficial
Polygons or line strings which have highly variable sizes: experimentation is needed.
Rule of thumb for GEOGRAPHY: if MMMM is not working, try HHMM Slide44
What to do if my Spatial Query is slow?
Make sure you are running SQL Server 2008 SP1, 2008 R2 or 2012
Check query plan for use of indexMake sure it is a supported operationHint the index (and/or a different join type)Do not use a spatial index when there is a highly selective non-spatial predicate
Run above index support procedure:
Assess effectiveness of primary filter (
Primary_Filter_Efficiency
)
Assess effectiveness of internal filter (Internal_Filter_Efficiency)
Redefine or define a new index with better characteristicsMore appropriate bounding box for GEOMETRYBetter grid densitiesSlide45
Summary: Spatial Index Improvements in SQL Server 2012
Auto Grid Spatial IndexSpatial Index HintMore supported OperationsSpatial Index Compression
Improved “Create Spatial Index” Time For Point DataSlide46
Related Content
Some Spatial Presentations (with further links
) and whitepapers:
http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial
http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial-indexing
http://social.technet.microsoft.com/wiki/contents/articles/9694.tuning-spatial-point-data-queries-in-sql-server-2012.aspx
Forum:
http://social.msdn.microsoft.com/Forums/en-US/sqlspatial/threads
Blogs:
http
://blogs.msdn.com/b/edkatibah/
Find Me Later At
…
@
SQLServerMike
mrys@microsoft.com
Breakout
Sessions
DBI324 - Taking SQL Server into the World of Spatial Data ManagementSlide47
How to contact me
Michael
Rys
mrys@microsoft.com
http://
sqlblog.com/blogs/michael_rys/
@
SQLServerMikeSlide48
Track Resources
@sqlserver
@
TechEd_europe
#
msTechEd
m
v
a
Microsoft Virtual Academy
SQL Server 2012 Eval Copy
Get Certified!
Hands-On LabsSlide49
Resources
Connect. Share. Discuss.
http
://europe.msteched.com
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Resources for Developers
http://microsoft.com/msdn Slide50
Evaluations
http://europe.msteched.com/sessionsSubmit your evals online Slide51
©
2012 Microsoft
Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the
part
of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT
MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.