Slide 1
CSE 326: Data Structures
Topic #10: Hashing (3)
Ashish Sabharwal
Autumn, 2003

Slide 2: Today's Outline
- Admin:
  - Hardcopy turn-in for Project 2 now!
  - Homework 2 due Friday
  - Start looking for a partner for Project 3 (must be someone different from your Project 2 partner)
- Finish Hashing
  - Double hashing, rehashing
  - Extendible hashing
- Group Quiz #4

Slide 3: When to Rehash?
Many alternatives:
- Rehash when table is half full
- Rehash when insertion fails in open addressing
- Rehash when insertion becomes very slow in separate chaining
- Rehash when the load factor λ crosses a certain threshold

Slide 4: Something We Again Forgot: Disk Accesses

Slide 5: We Want To Minimize Disk Accesses!
- Disk block (figure label): 1024 bytes
- Entire blocks transferred into memory at a time
- Disk access time = Seek time + Transfer time
- Transfer time much less than seek time
- Therefore we need to minimize disk accesses!

Slide 6: Solution: Extendible Hashing
- Hashing technique for huge data sets
- Optimizes to reduce disk accesses
Hash table contains:
1. Directory: 2^D entries, D bits per entry, pointers to leaf buckets
2. Leaf Buckets: keys in leaf L have d_L ≤ D bits in common with parent key; leaves store all data
Properties:
- Only 2 levels in the table => only 2 disk accesses for find!
- Each leaf bucket fits on one disk block => caching
- Better than B-Trees if order is not important. Why?
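The two-level layout on the "Solution: Extendible Hashing" slide (a directory indexed by the first D bits of the key, pointing at leaf buckets) can be sketched in a few lines. This is a minimal in-memory sketch, not the course's implementation. It assumes keys are fixed-length bit strings already produced by a hash function; the names `Bucket`, `directory_index`, and `find` and the sample keys are hypothetical. On disk, each bucket would occupy one block, so the two lookups in `find` correspond to the two disk accesses.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    local_depth: int                            # d_L: leading bits shared by all keys here
    items: dict = field(default_factory=dict)   # key bits -> associated data


def directory_index(key_bits: str, global_depth: int) -> int:
    """Read the first D bits of the key as an index into the 2**D-entry directory."""
    return int(key_bits[:global_depth], 2)


def find(directory, global_depth, key_bits):
    """Return the data stored for key_bits, or None if the key is absent."""
    bucket = directory[directory_index(key_bits, global_depth)]  # access 1: directory
    return bucket.items.get(key_bits)                            # access 2: leaf bucket


# D = 2, so the directory has 2**2 = 4 entries.
# Bucket b0 has d_L = 1 (< D), so directory entries 00 and 01 both point to it.
D = 2
b0 = Bucket(1, {"00101": "a", "01100": "b"})
b10 = Bucket(2, {"10001": "c"})
b11 = Bucket(2, {"11010": "d"})
directory = [b0, b0, b10, b11]

print(find(directory, D, "01100"))
print(find(directory, D, "11111"))
```

Sharing one bucket between several directory entries is what d_L < D means in practice: the bucket has not yet needed to distinguish keys on the later prefix bits.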
Slide 7: Extendible Hash Table
- Directory entry: key prefix (first D bits) and a pointer to the bucket with all keys starting with that prefix
- Bucket entry: keys matching on first d_L ≤ D bits, plus the data associated with those keys
Directory for D = 3: 000, 001, 010, 011, 100, 101, 110, 111
Bucket size = 4
Leaf buckets (keys + data):
  (d_L = 2)  00001, 00011, 00100, 00110
  (d_L = 2)  01001, 01011, 01100
  (d_L = 3)  10001, 10011
  (d_L = 3)  10101, 10110, 10111
  (d_L = 2)  11001, 11100, 11110
insert(11010)?
insert(11011)?

Slide 8: Inserting Using Bucket-Split
Directory for D = 3: 000, 001, 010, 011, 100, 101, 110, 111
Bucket size = 4
Leaf buckets (keys + data):
  (d_L = 2)  00001, 00011, 00100, 00110
  (d_L = 2)  01001, 01011, 01100
  (d_L = 3)  10001, 10011
  (d_L = 3)  10101, 10110, 10111
  (d_L = 3)  11001, 11010, 11011    (split)
  (d_L = 3)  11100, 11110           (split)

Slide 9: Insertion Using Directory-Expansion
1. insert(10010): but there is no room to insert, only one parent, and no adoption!
2. Solution: expand the directory (D = 2 becomes D = 3). Now do a bucket-split.
Directory for D = 2: 00, 01, 10, 11
Directory for D = 3: 000, 001, 010, 011, 100, 101, 110, 111
Leaf buckets shown (d_L in parentheses):
  (2)  01101
  (2)  10000, 10001, 10011, 10111
  (2)  11001, 11110
More expensive! How to ensure this is uncommon?

Slide 10: What if Extendible Hashing Doesn't Cut It?
- Option 1: Store only pointers/references to the items: (key, value) pairs separately on disk
- Option 2: Improve hash function; rehash

Slide 11: The One-Slide Hash
Hash function: maps keys to integers
Collision resolution
1. Separate Chaining
   - Expand beyond hashtable via secondary Dictionaries
   - Allows λ > 1
2. Open Addressing
   - Expand within hashtable
   - Secondary probing: {linear, quadratic, double hash}
   - λ ≤ 1 (by definition!)
   - λ ≤ ½ (by preference!)
Choosing a Hash Function
- Make sure table size is prime
- Careful choice for strings
- Perfect hashing: if keys known in advance, tune hash function for them!
Rehashing
- Tunes up hashtable when, e.g., λ crosses a threshold
Extendible hashing
- For disk-based data

Slide 12: Search ADT Implementations
                 insert       find         delete
Unsorted list    Θ(1)         Θ(n)         Θ(n)
Sorted list      Θ(n)         Θ(log n)?    Θ(n)
Trees            Θ(log n)     Θ(log n)     Θ(log n)
Hash Table       Θ(1)         Θ(1)         Θ(1)       (average case)
Is there anything a hash table cannot do efficiently?
You'll answer this in quiz #4!
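Returning to the bucket-split and directory-expansion slides, the insertion procedure can be sketched as follows. This is an illustrative in-memory sketch under stated assumptions, not the lecture's code: keys are distinct fixed-length bit strings (already hashed), the associated data is omitted, and the class and method names (`ExtendibleHash`, `_split`, `_expand_directory`) are hypothetical. The usage at the bottom replays the insert(10010) case from the directory-expansion slide; the table is built by successive inserts, so the bucket holding 01101 ends up with a smaller d_L than the slide shows.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # d_L: prefix bits shared by every key here
        self.keys = []                  # bit-string keys (data omitted in this sketch)


class ExtendibleHash:
    def __init__(self, bucket_size=4):
        self.bucket_size = bucket_size
        self.global_depth = 0           # D
        self.directory = [Bucket(0)]    # always 2**D entries

    def _index(self, key):
        # First D bits of the key as an integer (index 0 while D == 0).
        return int(key[:self.global_depth], 2) if self.global_depth else 0

    def find(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.keys:
                return                              # already present
            if len(bucket.keys) < self.bucket_size:
                bucket.keys.append(key)             # room in the bucket: done
                return
            if bucket.local_depth == self.global_depth:
                self._expand_directory()            # only one parent: grow D first
            self._split(bucket)                     # then split and retry

    def _expand_directory(self):
        # Old entry i becomes entries 2i and 2i+1, both pointing at the same bucket.
        self.directory = [b for b in self.directory for _ in (0, 1)]
        self.global_depth += 1

    def _split(self, bucket):
        d = bucket.local_depth
        zero, one = Bucket(d + 1), Bucket(d + 1)
        for k in bucket.keys:                       # redistribute on bit number d
            (one if k[d] == "1" else zero).keys.append(k)
        shift = self.global_depth - (d + 1)         # position of bit d within an index
        for i, b in enumerate(self.directory):
            if b is bucket:
                self.directory[i] = one if (i >> shift) & 1 else zero


t = ExtendibleHash(bucket_size=4)
for k in ["01101", "10000", "10001", "10011", "10111", "11001", "11110"]:
    t.insert(k)
depth_before = t.global_depth   # bucket for prefix 10 is now full with d_L == D
t.insert("10010")               # forces directory expansion, then a bucket split
print(depth_before, t.global_depth)
print(t.directory[4].keys, t.directory[5].keys)
```

The loop in `insert` retries after each split because a split can leave all old keys on one side, in which case the target bucket is still full and must be split again.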