

CSE 326: Data Structures
Topic #10: Hashing (3)
Ashish Sabharwal
Autumn, 2003

Today's Outline
• Admin:
  – Hardcopy turn-in for Project 2: now!
  – Homework 2 due Friday
  – Start looking for a partner for Project 3 (must be someone different from your Project 2 partner)
• Finish Hashing
  – Double hashing, rehashing
  – Extendible hashing
• Group Quiz #4

When to Rehash?
Many alternatives:
• Rehash when the table is half full
• Rehash when an insertion fails in open addressing
• Rehash when insertion becomes very slow in separate chaining
• Rehash when λ crosses a certain threshold

Something We Again Forgot: Disk Accesses

We Want To Minimize Disk Accesses!
[Figure: a 1024-byte disk block]
• Entire blocks are transferred into memory at a time
• Transfer time is much less than seek time
• Therefore we need to minimize disk accesses!
Disk access time = Seek time + Transfer time

Solution: Extendible Hashing
Hashing technique for huge data sets
  – Optimizes to reduce disk accesses
Hash "table" contains
  1. Directory: 2^D entries, D bits per entry, pointers to leaf buckets
  2. Leaf buckets: keys in leaf L have d_L ≤ D bits in common with the parent key; leaves store all data
Properties
  – Only 2 levels in the table: only 2 disk accesses for find!
  – Each leaf bucket fits on one disk block: caching
  – Better than B-Trees if order is not important. Why?
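The rehashing triggers from "When to Rehash?", combined with the double hashing listed in the outline, fit in a short sketch. This is an illustration under stated assumptions, not the course's code: keys are integers, the hash pair h1(k) = k mod M and h2(k) = 1 + (k mod (M − 1)) with M prime is a hypothetical choice, and the names `is_prime`, `next_prime`, `DoubleHashTable`, and `max_load` are made up for this example. The table rehashes into the next prime at least twice the old size whenever an insert would push λ above ½. Deletion is left out because open addressing needs lazy deletion to support it correctly.

```python
from typing import Optional


def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for table-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


class DoubleHashTable:
    """Open addressing with double hashing on integer keys (no deletion).

    Rehashes into a prime-sized table at least twice as large whenever an
    insert would push the load factor lambda above max_load.
    """

    def __init__(self, size: int = 7, max_load: float = 0.5) -> None:
        self.size = next_prime(size)          # keep the table size prime
        self.max_load = max_load
        self.slots: list[Optional[int]] = [None] * self.size
        self.count = 0

    def _probe(self, key: int):
        # Hypothetical hash pair: h1(k) = k mod M, h2(k) = 1 + (k mod (M - 1)).
        # h2 is never 0, and with M prime every step size 1..M-1 is coprime
        # to M, so h1, h1 + h2, h1 + 2*h2, ... visits every slot once.
        h1 = key % self.size
        h2 = 1 + key % (self.size - 1)
        for i in range(self.size):
            yield (h1 + i * h2) % self.size

    def find(self, key: int) -> bool:
        for idx in self._probe(key):
            if self.slots[idx] is None:       # empty slot ends the probe sequence
                return False
            if self.slots[idx] == key:
                return True
        return False

    def insert(self, key: int) -> None:
        if self.find(key):
            return                            # already stored
        if (self.count + 1) / self.size > self.max_load:
            self._rehash()                    # lambda would cross the threshold
        for idx in self._probe(key):
            if self.slots[idx] is None:
                self.slots[idx] = key
                self.count += 1
                return
        # Unreachable: the probe covers every slot and at least one is empty.
        raise RuntimeError("no empty slot found")

    def _rehash(self) -> None:
        old_keys = [k for k in self.slots if k is not None]
        self.size = next_prime(2 * self.size)
        self.slots = [None] * self.size
        self.count = 0
        for k in old_keys:
            self.insert(k)
```

Inserting the keys 0 through 19 into `DoubleHashTable(7)` grows the table through sizes 17, 37, and 79 while λ never exceeds ½. Growing to a prime keeps the double-hash probe sequence covering the whole table, and growing by at least a factor of two means the total rehashing work stays linear in the number of inserts.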
Extendible Hash Table
Directory entry: key prefix (first D bits) and a pointer to the bucket with all keys starting with that prefix
Bucket entry: keys matching on first d_L ≤ D bits, plus the data associated with those keys
[Figure: directory for D = 3 (entries 000 through 111); bucket size = 4; each key is stored with its data]
  (d_L = 2) 00001, 00011, 00100, 00110
  (d_L = 2) 01001, 01011, 01100
  (d_L = 3) 10001, 10011
  (d_L = 3) 10101, 10110, 10111
  (d_L = 2) 11001, 11100, 11110
insert(11010)?
insert(11011)?

Inserting Using Bucket-Split
[Figure: directory for D = 3; bucket size = 4; the d_L = 2 bucket for prefix 11 has been split]
  (d_L = 2) 00001, 00011, 00100, 00110
  (d_L = 2) 01001, 01011, 01100
  (d_L = 3) 10001, 10011
  (d_L = 3) 10101, 10110, 10111
  (d_L = 3) 11001, 11010, 11011
  (d_L = 3) 11100, 11110

Insertion Using Directory-Expansion
1. insert(10010)
   But, no room to insert, only one parent, and no adoption!
2. Solution: Expand directory
   Now do a bucket-split
[Figure: directory for D = 2 (entries 00, 01, 10, 11) with buckets (d_L = 2) 01101; (d_L = 2) 10000, 10001, 10011, 10111; (d_L = 2) 11001, 11110; after expansion, the directory for D = 3 (entries 000 through 111)]
More expensive! How to ensure this is uncommon?

What if Extendible Hashing Doesn't Cut It?
Option 1: Store only pointers/references to the items: (key, value) pairs separately on disk
Option 2: Improve hash function; rehash

The One-Slide Hash
Collision resolution
  1. Separate Chaining
     – Expand beyond hashtable via secondary Dictionaries
     – Allows λ > 1
  2. Open Addressing
     – Expand within hashtable
     – Secondary probing: {linear, quadratic, double hash}
     – λ ≤ 1 (by definition!)
     – λ ≤ ½ (by preference!)
Choosing a Hash Function
  • Make sure table size is prime
  • Careful choice for strings
  • "Perfect hashing"
     – If keys known in advance, tune hash function for them!
Rehashing
  • Tunes up hashtable when, e.g., λ crosses a threshold
Extendible hashing
  • For disk-based data
Hash function: maps keys to integers

Search ADT Implementations (average case)
                  insert     find        delete
  Unsorted list   Θ(1)       Θ(n)        Θ(n)
  Sorted list     Θ(n)       Θ(log n)?   Θ(n)
  Trees           Θ(log n)   Θ(log n)    Θ(log n)
  Hash Table      Θ(1)       Θ(1)        Θ(1)
Is there anything a hash table cannot do efficiently?
You'll answer this in quiz #4!
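The directory lookup, bucket split, and directory expansion from the extendible hashing slides can be replayed with a small in-memory model. This is a sketch under stated assumptions: keys are fixed-width bit strings used directly as their own hash values (as on the slides), a Python list stands in for the directory, each `Bucket` object stands in for one disk block, the data stored alongside each key is omitted, and the names `Bucket`, `ExtendibleHash`, `key_bits`, and `capacity` are hypothetical. An insert into a full bucket with d_L < D only splits that bucket; when d_L = D (only one parent), the directory is doubled first and then the bucket is split.

```python
class Bucket:
    """A leaf bucket; in the slides' model, one disk block."""

    def __init__(self, depth: int) -> None:
        self.depth = depth            # d_L: leading bits shared by every key here
        self.keys: list[str] = []     # keys only; the slides also keep each key's data


class ExtendibleHash:
    """Extendible hashing over fixed-width bit-string keys such as "11010".

    The directory has 2**D entries indexed by the first D bits of a key.
    Several entries share one bucket whenever that bucket's d_L < D.
    """

    def __init__(self, key_bits: int = 5, capacity: int = 4) -> None:
        self.key_bits = key_bits      # hypothetical: every key has this many bits
        self.capacity = capacity      # bucket size (4 on the slides)
        self.D = 0                    # global depth
        self.directory = [Bucket(0)]  # 2**0 = 1 entry to start

    def _index(self, key: str) -> int:
        # Directory slot = the key's first D bits read as a binary number.
        return int(key[: self.D], 2) if self.D > 0 else 0

    def find(self, key: str) -> bool:
        # One directory read plus one bucket read: the "2 disk accesses".
        return key in self.directory[self._index(key)].keys

    def insert(self, key: str) -> None:
        if len(key) != self.key_bits or set(key) - {"0", "1"}:
            raise ValueError(f"expected a {self.key_bits}-bit string, got {key!r}")
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return                    # already stored
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)   # room in the bucket: done
            return
        if bucket.depth == self.key_bits:
            raise ValueError("bucket cannot be split further")
        if bucket.depth == self.D:
            # d_L == D: only one directory entry (parent), so double the directory.
            self._expand_directory()
        self._split(bucket)
        self.insert(key)              # retry; may split again if keys all went one way

    def _expand_directory(self) -> None:
        # Prefix p becomes p0 and p1; both new entries point to p's old bucket.
        self.directory = [b for b in self.directory for _ in (0, 1)]
        self.D += 1

    def _split(self, bucket: Bucket) -> None:
        bit = bucket.depth            # split on the next bit after the shared prefix
        zero, one = Bucket(bit + 1), Bucket(bit + 1)
        for k in bucket.keys:
            (one if k[bit] == "1" else zero).keys.append(k)
        # Repoint every directory entry that referenced the old bucket.
        for i, b in enumerate(self.directory):
            if b is bucket:
                prefix = format(i, f"0{self.D}b")
                self.directory[i] = one if prefix[bit] == "1" else zero
```

Inserting the fifteen keys from the D = 3 example reproduces that slide's buckets (entries 110 and 111 still share the d_L = 2 bucket); insert("11010") then fills that bucket, and insert("11011") splits it into the 110 and 111 buckets shown on the bucket-split slide without touching the directory. Building a D = 2 table whose 10 bucket holds 10000, 10001, 10011, 10111 and then calling insert("10010") takes the directory-expansion path instead.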