Slide1
Variations on Balanced Trees
Lazy Red-Black Trees
Stefan Kahrs

Slide2
Overview
some general introduction on BSTs
some specific observations on red-black trees
how we can make them lazy - and why we may want to
conclusions

Slide3
Binary Search Trees
commonly used data structure to implement sets or finite maps (only keys shown):
[Figure: binary search tree diagram; node labels as extracted: 56, 33, 227]

Slide4
A problem with ordinary BSTs
on random data, searching, inserting or deleting an entry takes O(log(n)) time, where n is the number of entries, but...
if the data is biased, this can deteriorate to O(n)...
and thus building a tree of n entries can deteriorate to O(n^2)

Slide5
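Looking back at the slide on ordinary BSTs: the degradation on biased data is easy to reproduce. The sketch below is a minimal, assumed implementation (the class `PlainBst` and its methods are illustrative names, not taken from the slides). It inserts the same keys once in sorted order and once in shuffled order and reports the resulting tree heights.

```java
// Illustrative sketch; class and method names are not from the slides.
final class PlainBst {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    /** Standard iterative BST insertion; duplicate keys are ignored. */
    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        Node n = root;
        while (true) {
            if (key < n.key) {
                if (n.left == null) { n.left = new Node(key); return; }
                n = n.left;
            } else if (key > n.key) {
                if (n.right == null) { n.right = new Node(key); return; }
                n = n.right;
            } else {
                return;
            }
        }
    }

    /** Number of nodes on the longest root-to-leaf path (0 for an empty tree). */
    int height() { return height(root); }

    private static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    /** Builds a tree by inserting the keys in the given order. */
    static PlainBst build(int[] keys) {
        PlainBst t = new PlainBst();
        for (int k : keys) t.insert(k);
        return t;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;

        // Fisher-Yates shuffle with a fixed seed, so runs are repeatable.
        int[] shuffled = sorted.clone();
        java.util.Random rnd = new java.util.Random(42);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = shuffled[i]; shuffled[i] = shuffled[j]; shuffled[j] = tmp;
        }

        // Sorted input yields one long chain, so the height equals n.
        System.out.println("sorted input:   height " + build(sorted).height());
        System.out.println("shuffled input: height " + build(shuffled).height());
    }
}
```

With sorted input every key becomes the right child of the previous one, so each insertion walks the whole chain; that is where the O(n) per operation and O(n^2) total come from.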
Therefore...
people have come up with various schemes that make trees self-balance
the idea is always that insertion/deletion pay an O(log(n)) tax to maintain an invariant
the invariant guarantees that search, insert and delete all perform in logarithmic time

Slide6
Well-known invariants for trees
Braun trees: sizes of left/right subtrees differ by at most 1; too strong for search trees, O(n^0.58)
AVL trees: depths of left/right subtrees differ by at most 1
2-3-4 trees: a node has 1 to 3 keys, and 2 to 4 subtrees (special case of B-tree)
Red-Black trees: an indirect realisation of 2-3-4 trees

Slide7
Red-Black Tree
BST with an additional colour field which can be RED or BLACK
invariant 1: red nodes have only black children, root/nil are black
  thus, counting through its red children, a non-empty black node has between 2 and 4 black children
invariant 2: all paths to leaves go through the same number of black nodes

Slide8
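The two invariants from the Red-Black Tree slide can be checked mechanically. The following is a small sketch under stated assumptions: `null` stands for the black nil leaf, and the class `RbInvariants` with its `Node` type is a hypothetical illustration, not code from the talk.

```java
// Illustrative sketch; null plays the role of the black nil leaf.
final class RbInvariants {
    static final class Node {
        final int key;
        final boolean red;
        final Node left, right;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key; this.red = red; this.left = left; this.right = right;
        }
    }

    private static boolean isRed(Node n) { return n != null && n.red; }

    /** Invariant 1: the root is black and no red node has a red child. */
    static boolean coloursOk(Node root) {
        return !isRed(root) && noRedRed(root);
    }

    private static boolean noRedRed(Node n) {
        if (n == null) return true;
        if (n.red && (isRed(n.left) || isRed(n.right))) return false;
        return noRedRed(n.left) && noRedRed(n.right);
    }

    /** Invariant 2: returns the common black height, or -1 if two paths disagree. */
    static int blackHeight(Node n) {
        if (n == null) return 0;
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    public static void main(String[] args) {
        // A black root with two red children satisfies both invariants.
        Node t = new Node(2, false, new Node(1, true, null, null), new Node(3, true, null, null));
        System.out.println("colours ok: " + coloursOk(t) + ", black height: " + blackHeight(t));
    }
}
```

Keeping the two checks separate is deliberate: the lazy variant discussed later gives up invariant 1 temporarily while still maintaining invariant 2.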
Example
[Figure: red-black tree diagram; node labels as extracted: 68, 12, 83, 7, 43, 75, 96, 98, 94, 70, 7, 6]

Slide9
Perceived Wisdom
Red-Black trees are cheaper to maintain than AVL trees, though they may not be quite as balanced
pretty balanced though: the average path length of a Red-Black tree is in the worst case only 5% longer than that of a Braun tree

Slide10
Aside: a problem with balanced trees
an ordinary BST has on random data an average path length of 2*ln(n)
this is only 38% longer than the average path length of a Braun tree
thus: most balanced tree schemes lose against ordinary BSTs on random data, because they fail to pay their tax from those 38%
red-black trees succeed though

Slide11
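The 38% figure from the aside on balanced trees can be reconstructed with one line of arithmetic. The worked step below assumes that the average path length of a Braun tree is approximated by log2(n); the slides do not state this explicitly.

```latex
\[
\frac{2\ln n}{\log_2 n}
  \;=\; \frac{2\ln n}{\ln n / \ln 2}
  \;=\; 2\ln 2
  \;\approx\; 1.386
\]
```

Under that assumption the ratio does not depend on n: a random BST has paths roughly 38 to 39% longer than a perfectly balanced tree, which is the whole budget a balancing scheme has to pay its maintenance tax from.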
Algorithms on RB trees
search: unchanged, ignores colour
insert:
  insert as in BST (a fresh red node)
  rotate subtrees until colour violation goes away
  colour root black
delete (more complex than insert):
  delete as in BST
  if underflow, rotate from siblings until underflow goes away

Slide12
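The insert and delete outlines on the algorithms slide both rely on rotations. As a reminder of what a single rotation does, here is a minimal sketch; the class `Rotations` and its helpers are illustrative names, and colour handling is left out on purpose.

```java
// Illustrative sketch of the two basic rotations; colours are omitted.
final class Rotations {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    /** Left rotation: the right child of x becomes the root of the subtree. */
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;   // y's left subtree holds keys between x and y
        y.left = x;
        return y;
    }

    /** Right rotation: the left child of y becomes the root of the subtree. */
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;   // x's right subtree holds keys between x and y
        x.right = y;
        return x;
    }

    /** In-order listing of the keys, used to show that rotations keep the order. */
    static String inorder(Node n) {
        return n == null ? "" : inorder(n.left) + n.key + " " + inorder(n.right);
    }

    public static void main(String[] args) {
        Node chain = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        Node rotated = rotateLeft(chain);
        System.out.println("root after rotation: " + rotated.key);
        System.out.println("in-order keys: " + inorder(rotated));
    }
}
```

A rotation changes only two child links and never the in-order sequence, which is why it is safe to use as the repair primitive. With parent pointers each rotation would have to update more links, a cost the later slides come back to.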
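Example
[Figure: red-black tree diagram; node labels as extracted: 68, 12, 83, 7, 43, 75, 96, 98, 94, 70, 7, 6, 69]

Slide13
Example
[Figure: red-black tree diagram; node labels as extracted: 68, 12, 83, 7, 43, 75, 96, 98, 94, 70, 7, 6, 69]

Slide14
Example
[Figure: red-black tree diagram; node labels as extracted: 68, 12, 83, 7, 43, 75, 96, 98, 94, 70, 7, 6, 69]

Slide15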
Standard Imperative Algorithm
find the place of insertion in a loop
check your parent whether you’re a naughty child, and correct behaviour if necessary, by going up the tree

Slide16
Problem with this
Question: how do you go up the tree?
Answer: children should know their parent.
Which means: trees in imperative implementations are often not proper trees; every link consists of two pointers

Slide17
Functional Implementations
in a pure FP language such as Haskell you don’t have pointer comparison and so parent pointers won’t work
instead we do something like this:
  insert x tree = repair (simplInsert x tree)
simplInsert inserts data in a subtree and produces a tree with a potential invariant violation at the top; repair fixes that
the ancestors sit on the recursion stack

Slide18
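The scheme from the slide on functional implementations, insert x tree = repair (simplInsert x tree), can be transliterated into Java. This is a sketch under assumptions: the slides do not spell out repair, so Okasaki-style four-case rebalancing is used in its place, nodes are mutated rather than rebuilt, and all names (`RecursiveRb`, `insertRoot`, `rebuild`) are hypothetical.

```java
// Sketch only: repair is assumed to be Okasaki-style four-case rebalancing.
final class RecursiveRb {
    static final class Node {
        final int key;
        boolean red = true;              // fresh nodes are red
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private static boolean isRed(Node n) { return n != null && n.red; }

    /** Top-level insertion: repair on the way back up, then colour the root black. */
    static Node insertRoot(int x, Node root) {
        Node t = insert(x, root);
        t.red = false;
        return t;
    }

    /** insert x tree = repair (simplInsert x tree) */
    static Node insert(int x, Node tree) { return repair(simplInsert(x, tree)); }

    /** Plain BST step; the result may have a red node with a red child at the top. */
    static Node simplInsert(int x, Node t) {
        if (t == null) return new Node(x);
        if (x < t.key) t.left = insert(x, t.left);
        else if (x > t.key) t.right = insert(x, t.right);
        return t;                        // duplicate keys are ignored
    }

    /** Fixes a black node that has a red child with a red grandchild below it. */
    static Node repair(Node n) {
        if (n == null || n.red) return n;
        Node l = n.left, r = n.right;
        if (isRed(l) && isRed(l.left))
            return rebuild(l.left, l, n, l.left.left, l.left.right, l.right, n.right);
        if (isRed(l) && isRed(l.right))
            return rebuild(l, l.right, n, l.left, l.right.left, l.right.right, n.right);
        if (isRed(r) && isRed(r.left))
            return rebuild(n, r.left, r, n.left, r.left.left, r.left.right, r.right);
        if (isRed(r) && isRed(r.right))
            return rebuild(n, r, r.right, n.left, r.left, r.right.left, r.right.right);
        return n;
    }

    /** Makes b a red root over black children a (with t1, t2) and c (with t3, t4). */
    private static Node rebuild(Node a, Node b, Node c, Node t1, Node t2, Node t3, Node t4) {
        a.left = t1; a.right = t2; a.red = false;
        c.left = t3; c.right = t4; c.red = false;
        b.left = a;  b.right = c;  b.red = true;
        return b;
    }

    /** True if no red node has a red child. */
    static boolean noRedRed(Node n) {
        if (n == null) return true;
        if (n.red && (isRed(n.left) || isRed(n.right))) return false;
        return noRedRed(n.left) && noRedRed(n.right);
    }

    /** Common black height of all paths, or -1 if two paths disagree. */
    static int blackHeight(Node n) {
        if (n == null) return 0;
        int l = blackHeight(n.left), r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        Node root = null;
        for (int i = 1; i <= 1000; i++) root = insertRoot(i, root);   // biased (sorted) input
        System.out.println("height: " + height(root) + ", black height: " + blackHeight(root));
    }
}
```

The call to repair sits after the recursive call, so the pending ancestors live on the call stack. That placement is what the following slides identify as the obstacle to turning the insertion into a plain loop.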
Recursion
actually, nothing stops us from doing likewise in an imperative language, using recursive insertion (or deletion)
cost: recursive calls rather than loops
benefit: no parent pointers, which saves memory and makes all rotations cheaper
it is still more expensive though...

Slide19
Can we do better?
the problem is that the recursive insertion algorithm is not tail-recursive and thus not directly loopifiable: we repair after we insert
what if we turn this around?
  newinsert x tree = simplInsert x (repair tree)
this is the fundamental idea behind lazy red-black trees

Slide20
What does that mean?
we allow colour violations to happen in the first place
these violations remain in the tree
we repair them when we are about to revisit a node
this is all nicely loopifiable and requires no parent pointers

Slide21
In the imperative code
where we used to have...
  n = n.left;
...to continue in the left branch
we now have:
  n = n.left = n.left.repair();

Slide22
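The repair-before-descend step from the slide on the imperative code can be placed in a complete insertion loop. The sketch below is an assumption-laden illustration, not the author's implementation: repair is taken to be Okasaki-style four-case rebalancing applied to the node about to be visited, it is written as a static helper so that empty children need no sentinel object, and all names (`LazyRb`, `rebuild`, `isSearchTree`) are hypothetical.

```java
// Sketch only: the slides do not define repair(); four-case rebalancing is assumed.
final class LazyRb {
    static final class Node {
        final int key;
        boolean red = true;              // fresh nodes are red
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    private static boolean isRed(Node n) { return n != null && n.red; }

    /** Repair-before-descend insertion: a plain loop, no recursion, no parent pointers. */
    void insert(int x) {
        if (root == null) { root = new Node(x); root.red = false; return; }
        root = repair(root);
        root.red = false;                // recolouring the root keeps black heights equal
        Node n = root;
        while (true) {
            if (x < n.key) {
                if (n.left == null) { n.left = new Node(x); return; }   // may leave red under red
                n = n.left = repair(n.left);
            } else if (x > n.key) {
                if (n.right == null) { n.right = new Node(x); return; }
                n = n.right = repair(n.right);
            } else {
                return;                  // duplicate key
            }
        }
    }

    /** Fixes a black node that has a red child with a red grandchild below it. */
    static Node repair(Node n) {
        if (n == null || n.red) return n;
        Node l = n.left, r = n.right;
        if (isRed(l) && isRed(l.left))
            return rebuild(l.left, l, n, l.left.left, l.left.right, l.right, n.right);
        if (isRed(l) && isRed(l.right))
            return rebuild(l, l.right, n, l.left, l.right.left, l.right.right, n.right);
        if (isRed(r) && isRed(r.left))
            return rebuild(n, r.left, r, n.left, r.left.left, r.left.right, r.right);
        if (isRed(r) && isRed(r.right))
            return rebuild(n, r, r.right, n.left, r.left, r.right.left, r.right.right);
        return n;
    }

    /** Makes b a red root over black children a (with t1, t2) and c (with t3, t4). */
    private static Node rebuild(Node a, Node b, Node c, Node t1, Node t2, Node t3, Node t4) {
        a.left = t1; a.right = t2; a.red = false;
        c.left = t3; c.right = t4; c.red = false;
        b.left = a;  b.right = c;  b.red = true;
        return b;
    }

    /** Common black height of all paths, or -1 if two paths disagree. */
    static int blackHeight(Node n) {
        if (n == null) return 0;
        int l = blackHeight(n.left), r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    static int size(Node n) { return n == null ? 0 : 1 + size(n.left) + size(n.right); }

    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    /** True if every key lies strictly between lo and hi and the BST order holds. */
    static boolean isSearchTree(Node n, long lo, long hi) {
        return n == null || (lo < n.key && n.key < hi
                && isSearchTree(n.left, lo, n.key) && isSearchTree(n.right, n.key, hi));
    }

    public static void main(String[] args) {
        LazyRb t = new LazyRb();
        for (int i = 1; i <= 1000; i++) t.insert(i);   // biased (sorted) input
        System.out.println("height: " + height(t.root) + ", black height: " + blackHeight(t.root));
    }
}
```

In this sketch a repair can turn the visited node red underneath a red parent, and a new red leaf can land under a red node. Neither is fixed on the spot; both wait for the next insertion that passes through. Each step preserves the black height of every path, so invariant 2 holds throughout even while invariant 1 is temporarily broken.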
Invariants?
the standard red-black tree invariant is broken with this (affects search)
in principle, we can have B-R-R-B-R-R-B-R-R paths, though these are rare
but this is as bad as it gets, so we do have an invariant that guarantees O(log(n))
average path lengths are similar to RB trees

Slide23
Performance?
I implemented this in Java, and the performance data were initially inconclusive (JIT compiler, garbage collection)
after forcing gc between tests, standard RB remains faster (40% faster on random inputs), though this may still be tweakable
so what is the extra cost, and can we do anything about it?

Slide24
Checks!
most nodes we visit and check are fine
especially high up in the tree, as these are constantly repaired...
...and the ones low down do not matter that much anyway
so we could move from regular maintenance to student-flat maintenance, i.e. repair trees only once in a blue moon

Slide25
What?
yes, the colour invariant goes to pot with that
we do maintain black height though...
...and trust the healing powers of occasional repair: suppose we have a biased insertion sequence and don’t repair for a while...

Slide26
Example
[Figure: tree diagram; node labels as extracted: 12, 83, 7, 43, 96, 70]
suppose the tree has this shape, and now we insert a 5 in repair-mode

Slide27
Result
[Figure: tree diagram after the insertion; node labels as extracted: 12, 83, 7, 43, 96, 70, 5]

Slide28
Findings
on random data, performance of lazy red-black trees is virtually unaffected, even if we perform safe-insert only 1 time in 100
on biased data, it works a bit better under student-flat, but still loses to RB (15% slower for this bias)
average tree depth: 1.5 longer than RB
  on random inputs
  also on biased inputs (where BST falls off the cliff)

Slide29
Conclusions
Ultimately: failure!
Lazy RB trees are not faster than normal ones.
On random inputs, Lazy RB trees perform very similarly to plain BSTs
Some small room for improvement, though I doubt the gap to plain RB can be closed
Perhaps other algorithms would benefit more from lazy invariant maintenance?