Spring 2023 Instructor Jianer Chen Office PETR 428 Phone 8454259 Email chencsetamuedu Notes 16 Physical Query Plan and Optimization What Does DBMS Do Prepare a collection C of efficient algorithms for operations in relational algebra ID: 1028816
Download Presentation The PPT/PDF document "CSCE-608 Database Systems" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
1. CSCE-608 Database SystemsSpring 2023Instructor: Jianer ChenOffice: PETR 428Phone: 845-4259Email: chen@cse.tamu.eduNotes 16: Physical Query Plan and Optimization
2. What Does DBMS Do?Prepare a collection C of efficient algorithms for operations in relational algebra;An input database program Pparserparse tree-lqp convertorLqp-pqp convertortake care of issues in optimization and security.Machine executable codephysical query planparse treepreprocessingparse treelogic query planapply logic lawslogic query planlogic query planOptimization via logic and sizeOptimization via algorithms and cost2
3. What Does DBMS Do?Prepare a collection C of efficient algorithms for operations in relational algebra;An input database program Pparserparse tree-lqp convertorLqp-pqp convertortake care of issues in optimization and security.Machine executable codephysical query planparse treepreprocessingparse treelogic query planapply logic lawslogic query planlogic query planOptimization via logic and sizeOptimization via algorithms and cost2SELECT a1, b1, c1FROM A, B, CWHERE a2=1 AND b2=2 AND c2=3×ABC×σπa1, b1, c1a2=1, b2 =2, c2=3×ABC×σπa1, b1, c1a2=1σσb2 =2c2=3A parse tree
4. What Does DBMS Do?Prepare a collection C of efficient algorithms for operations in relational algebra;An input database program Pparserparse tree-lqp convertorLqp-pqp convertortake care of issues in optimization and security.Machine executable codephysical query planparse treepreprocessingparse treelogic query planapply logic lawslogic query planlogic query planOptimization via logic and sizeOptimization via algorithms and cost2SELECT a1, b1, c1FROM A, B, CWHERE a2=1 AND b2=2 AND c2=3×ABC×σπa1, b1, c1a2=1, b2 =2, c2=3×ABC×σπa1, b1, c1a2=1σσb2 =2c2=3A parse tree
5. Input: an optimized LQP T, and main memory constraint M ×∩ πσσσG FEDCBAConstruction of Physical Query Plan
6. Input: an optimized LQP T, and main memory constraint MReplacing each leaf R of T by “scan(R)”; ×∩ πσσσscan(G) scan(F)scan(E)scan(D)scan(C)scan(B)scan(A)Construction of Physical Query Plan
7. Input: an optimized LQP T, and main memory constraint MReplacing each leaf R of T by “scan(R)”;Combining the “scan’s” with other operations; ×∩ πσσσscan(G) scan(F)scan(E)scan(D)scan(C)scan(B)scan(A)index-scanindex-scanConstruction of Physical Query Planindex-scan
8. Input: an optimized LQP T, and main memory constraint MReplacing each leaf R of T by “scan(R)”;Combining the “scan’s” with other operations; Replacing each internal node v of T by a proper algorithm; ×∩ πσσσscan(G) scan(F)scan(E)scan(D)scan(C)scan(B)scan(A)index-scanindex-scanJ2PJ2PJ1PJ1PCJI1PConstruction of Physical Query Planindex-scan
9. Input: an optimized LQP T, and main memory constraint MReplacing each leaf R of T by “scan(R)”;Combining the “scan’s” with other operations; Replacing each internal node v of T by a proper algorithm;For each edge e in T, decide if e should be “materialized”; ×∩ πσσσscan(G) scan(F)scan(E)scan(D)scan(C)scan(B)scan(A)index-scanindex-scanJ2PJ2PJ1PJ1PCJI1PConstruction of Physical Query Planindex-scan
10. Input: an optimized LQP T, and main memory constraint MReplacing each leaf R of T by “scan(R)”;Combining the “scan’s” with other operations; Replacing each internal node v of T by a proper algorithm;For each edge e in T, decide if e should be “materialized”; ×∩ πσσσscan(G) scan(F)scan(E)scan(D)scan(C)scan(B)scan(A)index-scanindex-scanJ2PJ2PJ1PJ1PCJI1PConstruction of Physical Query Planindex-scan
11. Input: an optimized LQP T, and main memory constraint MReplacing each leaf R of T by “scan(R)”;Combining the “scan’s” with other operations; Replacing each internal node v of T by a proper algorithm;For each edge e in T, decide if e should be “materialized”;Cut all materialized edges; ×∩ πσσσscan(G) scan(F)scan(E)scan(D)scan(C)scan(B)scan(A)index-scanindex-scanJ2PJ2PJ1PJ1PCJI1PConstruction of Physical Query Planindex-scan
12. Input: an optimized LQP T, and main memory constraint MReplacing each leaf R of T by “scan(R)”;Combining the “scan’s” with other operations; Replacing each internal node v of T by a proper algorithm;For each edge e in T, decide if e should be “materialized”;Cut all materialized edges;Each subtree is a call to the subroutine at the root of the subtree. The order of the calls follows the bottom-up order in the structure. ×∩ πσσσscan(G) scan(F)scan(E)scan(D)scan(C)scan(B)scan(A)index-scanindex-scanJ2PJ2PJ1PJ1PCJI1P123Construction of Physical Query Planindex-scan
13. Input: an optimized LQP T, and main memory constraint MReplacing each leaf R of T by “scan(R)”;Combining the “scan’s” with other operations; Replacing each internal node v of T by a proper algorithm;For each edge e in T, decide if e should be “materialized”;Cut all materialized edges;Each subtree is a call to the subroutine at the root of the subtree. The order of the calls follows the bottom-up order in the structure. ×∩ πσσσscan(G) scan(F)scan(E)scan(D)scan(C)scan(B)scan(A)index-scanindex-scanJ2PJ2PJ1PJ1PCJI1P123This produces an executable code for the input DB programConstruction of Physical Query Planindex-scan
14. Physical Query Plan: SummaryReplacing internal nodes of an LQP by proper algorithms;Deciding if a subroutine call should be pipelined or materialized;Many optimization techniques are involved here;In practice, heuristic optimization techniques are used to construct good physical query plans;The resulting physical query plan is an executable code.
15. What Does DBMS Do?Prepare a collection C of efficient algorithms for operations in relational algebra;An input database program Pparserparse tree-lqp convertorLqp-pqp convertortake care of issues in optimization and security.Machine executable codephysical query planparse treepreprocessingparse treelogic query planapply logic lawslogic query planlogic query planOptimization via logic and sizeOptimization via algorithms and cost2SELECT a1, b1, c1FROM A, B, CWHERE a2=1 AND b2=2 AND c2=3×ABC×σπa1, b1, c1a2=1, b2 =2, c2=3×ABC×σπa1, b1, c1a2=1σσb2 =2c2=3A parse tree
16. What Does DBMS Do?Prepare a collection C of efficient algorithms for operations in relational algebra;An input database program Pparserparse tree-lqp convertorLqp-pqp convertortake care of issues in optimization and security.Machine executable codephysical query planparse treepreprocessingparse treelogic query planapply logic lawslogic query planlogic query planOptimization via logic and sizeOptimization via algorithms and cost2SELECT a1, b1, c1FROM A, B, CWHERE a2=1 AND b2=2 AND c2=3×ABC×σπa1, b1, c1a2=1, b2 =2, c2=3×ABC×σπa1, b1, c1a2=1σσb2 =2c2=3A parse tree
17. secondarystorage(disks)in tables(relations)databaseadministratorDDLlanguagedatabase programmerDML (query)languageDBMSfile managerbuffermanagermainmemorybuffersindex/file managerDML complierDDL complierquery executionenginetransaction managerconcurrency controllock tablelogging &recovery
18. secondarystorage(disks)in tables(relations)databaseadministratorDDLlanguagedatabase programmerDML (query)languageDBMSfile managerbuffermanagermainmemorybuffersindex/file managerDML complierDDL complierquery executionenginetransaction managerconcurrency controllock tablelogging &recoveryWhat is still missing?
19. secondarystorage(disks)in tables(relations)databaseadministratorDDLlanguagedatabase programmerDML (query)languageDBMSfile managerbuffermanagermainmemorybuffersindex/file managerDML complierDDL complierquery executionenginetransaction managerconcurrency controllock tablelogging &recoveryEfficient Algorithms forRelational algebriac operations
20. secondarystorage(disks)in tables(relations)databaseadministratorDDLlanguagedatabase programmerDML (query)languageDBMSfile managerbuffermanagermainmemorybuffersindex/file managerDML complierDDL complierquery executionenginetransaction managerconcurrency controllock tablelogging &recoveryEfficient Algorithms forRelational algebriac operations
21. secondarystorage(disks)in tables(relations)databaseadministratorDDLlanguagedatabase programmerDML (query)languageDBMSfile managerbuffermanagermainmemorybuffersindex/file managerDML complierDDL complierquery executionenginetransaction managerconcurrency controllock tablelogging &recovery
22. secondarystorage(disks)in tables(relations)databaseadministratorDDLlanguagedatabase programmerDML (query)languageDBMSfile managerbuffermanagermainmemorybuffersindex/file managerDML complierDDL complierquery executionenginetransaction managerconcurrency controllock tablelogging &recovery
23. The Main Purpose of Index StructuresSpeedup the search process23 indexσa=6(R)blockscontainingthe desired tuplesquickly figure out disksotherwise have toscan the entire RBut also need to handle dynamic changes of R
24. Support fast searchSupport range search Support dynamic changesCould be either dense or sparse dense: pointers to all records sparse: one pointer per block24B+Trees
25. A B+tree node of order n where ph are pointers (disk addresses) and kh are search-keys (values of the attributes in the index)25B+Treespn+1knk2p2k1p1p3……
26. A B+tree node of order n where ph are pointers (disk addresses) and kh are search-keys (values of the attributes in the index)How big is n? Basically we want each B+tree node to fit in a disk block so that a B+tree node can be read/written by a single disk I/O. Typically, n ~ 100-200.26pn+1knk2p2k1p1p3……B+Trees
27. B+Tree Example order n = 3271003012015018035113035100101110120130150156179180200root
28. A B+Tree of order nEach node has: n keys and n+1 pointers These are fixedTo keep the nodes not too empty, also for the operations to be applied efficiently: * Non-leaf: at least (n+1)/2 pointers (to children) * Leaf: at least (n+1)/2 pointers to data (plus a “sequence pointer” to the next leaf) Basically: use at least one half of the pointers28
29. Sample non-leaf order n = 329578195To keys k < 57To keys 57 k<81To keys 81 k<95To keys k 95
30. Sample leaf node order n = 3 30From non-leaf nodeTo next leaf in sequence578195To record with key 57To record with key 81To record with key 95
31. Example (B+ tree of order n=3)311201501803035113035Full nodeMin. nodeNon-leafLeaf
32. B+tree rulesRule 1. All leaves are at same lowest level (balanced tree)Rule 2. Pointers in leaves point to records except for “sequence pointer”Rule 3. Number of keys/pointers in nodes:32Max. #pointersMax. # keysMin. #pointersMin. #keysNon-leafn+1n(n+1)/2(n+1)/2 1Leafn+1n(n+1)/2 + 1(n+1)/2Rootn+1n21
33. B+tree rulesRule 1. All leaves are at same lowest level (balanced tree)Rule 2. Pointers in leaves point to records except for “sequence pointer”Rule 3. Number of keys/pointers in nodes:33Max. #pointersMax. # keysMin. #pointersMin. #keysNon-leafn+1n(n+1)/2(n+1)/2 1Leafn+1n(n+1)/2 + 1(n+1)/2Rootn+1n21could be 1
34. Search in a B+treeStart from the rootSearch in a leaf blockMay not have to go to the data file 34 Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 ≤ k < ki; return(Search(pi, k));
35. Search in B+tree351003012015018035113035100101110120130150156179180200rootSearch(*prt, 130)120 130 <150130 =130return100 130 Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 ≤ k < ki; return(Search(pi, k)); A tree nodek1k2‧‧‧‧‧‧knp1p2p3pn+1pn