Slide1
Sorting Really Big Files
Sorting Part 3
Slide2
Using K Temporary Files
Given N records in file F
M records will fit into internal memory
Use K temp files, where K = N / M
Create K sorted files from F, then merge them
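The run-creation step can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: it assumes one record per line, every line ending in a newline, and plain string comparison. The names `create_runs`, `m`, and `out_dir` are hypothetical. When N is not a multiple of M, the last run is shorter, so the number of runs is N / M rounded up.

```python
import os
from itertools import islice


def create_runs(path, m, out_dir):
    """Split the file at `path` into sorted runs of at most `m` records.

    Assumption: one record per line, each line newline-terminated,
    records compared as plain strings.
    Returns the run file paths in the order they were written.
    """
    run_paths = []
    with open(path) as src:
        while True:
            # Read at most m records: the amount that fits in internal memory.
            chunk = list(islice(src, m))
            if not chunk:
                break
            chunk.sort()  # the internal sort
            run_path = os.path.join(out_dir, f"run{len(run_paths) + 1}.txt")
            with open(run_path, "w") as run:
                run.writelines(chunk)
            run_paths.append(run_path)
    return run_paths
```

Each returned file is one sorted run; merging the runs is a separate step.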
Problems
computers compare 2 values at once, not K values
merging only 2 of K runs at once creates LOTS of temp files
in the illustration on the next page, notice that we soon begin merging small runs with big temp files
too many comparisons
Slide3
Alternative Merging Strategy
[Figure: two merge trees over runs R1 to R5, with temp files T1, T2, T3 and final file F; remaining labels: empty, S1, S2]
R1 = Run 1
R2 = Run 2
etc
What would these trees look like with 8 runs?
Slide4
N-Way Merge
We can create that tree using just 4 temp files
2 are input and 2 are output, the pairs alternate being input and output files
Algorithm
Write Run 1 into T1
Write Run 2 into T2
Write Run 3 into T1
Write Run 4 into T2
...
Merge first runs in T1 and T2 into T3
Merge second runs in T1 and T2 into T4
Merge third runs in T1 and T2 into T3
...
Merge first runs in T3 and T4 into T1
Merge second runs in T3 and T4 into T2
...
Slide5
N-Way Merge
Step Number    Files Contain Runs
1              T1 - R1 R3 R5 R7 R9
               T2 - R2 R4 R6 R8 R10
               T3 -
               T4 -
2              T1 -
               T2 -
               T3 - R1-R2 R5-R6 R9-R10
               T4 - R3-R4 R7-R8
3              T1 - R1-R4 R9-R10
               T2 - R5-R8
               T3 -
               T4 -
4              T1 -
               T2 -
               T3 - R1-R8
               T4 - R9-R10
5              T1 - R1-R10
               T2 -
               T3 -
               T4 -
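The four-file algorithm can be traced with a small Python simulation. This is only a sketch: it tracks run labels rather than real records, each run is a hypothetical `(first, last)` pair meaning "R first through R last", and the function names are illustrative. A run with no partner in the other input file is copied to the output unchanged, as in the table above.

```python
def distribute(num_runs):
    """Step 1: write the runs alternately into T1 and T2."""
    files = {"T1": [], "T2": [], "T3": [], "T4": []}
    for r in range(1, num_runs + 1):
        target = "T1" if r % 2 == 1 else "T2"
        files[target].append((r, r))  # a run covering only R_r
    return files


def merge_pass(files, inputs, outputs):
    """Merge the i-th runs of both input files, alternating the output file."""
    a, b = files[inputs[0]], files[inputs[1]]
    for i in range(max(len(a), len(b))):
        parts = [f[i] for f in (a, b) if i < len(f)]
        merged = (min(p[0] for p in parts), max(p[1] for p in parts))
        files[outputs[i % 2]].append(merged)
    # Both input files are now consumed and become the next output pair.
    files[inputs[0]], files[inputs[1]] = [], []


def balanced_merge(num_runs):
    """Return a snapshot of all four files after each step."""
    files = distribute(num_runs)
    steps = [{name: list(runs) for name, runs in files.items()}]
    inputs, outputs = ("T1", "T2"), ("T3", "T4")
    while sum(len(files[name]) for name in inputs) > 1:
        merge_pass(files, inputs, outputs)
        steps.append({name: list(runs) for name, runs in files.items()})
        inputs, outputs = outputs, inputs  # the pairs swap roles
    return steps


def label(run):
    """Format a run as in the table: R3 or R1-R4."""
    first, last = run
    return f"R{first}" if first == last else f"R{first}-R{last}"


if __name__ == "__main__":
    for number, snapshot in enumerate(balanced_merge(10), start=1):
        print(number)
        for name in ("T1", "T2", "T3", "T4"):
            print(f"  {name} - " + " ".join(label(r) for r in snapshot[name]))
```

Running it with 10 runs walks through the same five steps as the table; calling `balanced_merge(8)` is one way to explore the 8-run question from the earlier slide.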
[Figure: runs move from F into T1 and T2, then into T3 and T4, then back into T1 and T2, and so on]
Slide6
Analysis
Number of Comparisons:
N-Way Merge -- O(n log2 n)
K Temp Files -- O(n^2)
Disk Space
Could the run size be one record?
In other words, is the internal sort necessary?
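One way to explore both the comparison counts and the one-record question is to count comparisons directly. The sketch below works on in-memory lists rather than files, and it makes an assumption about the K-temp-files strategy: that runs are merged one at a time into a single growing result (`sequential_merge`). The pairwise strategy is modelled as repeated passes that merge neighbouring runs (`pairwise_passes`). All function names are hypothetical.

```python
def merge(a, b, counter):
    """2-way merge of two sorted lists; counter[0] counts comparisons."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        counter[0] += 1
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def sequential_merge(runs):
    """Merge each run into one growing result (assumed K-temp-files behaviour)."""
    counter = [0]
    result = runs[0]
    for run in runs[1:]:
        result = merge(result, run, counter)
    return result, counter[0]


def pairwise_passes(runs):
    """Merge neighbouring runs pass by pass until a single run remains."""
    counter = [0]
    while len(runs) > 1:
        runs = [
            merge(runs[i], runs[i + 1], counter) if i + 1 < len(runs) else runs[i]
            for i in range(0, len(runs), 2)
        ]
    return runs[0], counter[0]


def one_record_runs(records):
    """Run size of one record: no internal sort is needed to build the runs."""
    return [[r] for r in records]
```

With one-record runs, `pairwise_passes` is an ordinary bottom-up merge sort, so the result is still correctly sorted without any internal sort; what changes is the number of runs and therefore the number of passes over the data. Already-sorted input is a bad case for `sequential_merge`, because every new record is compared against the whole growing result.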