I am having trouble understanding an external sort. I have to code up an implementation for the following below. I have to sort 120 records(they are really just integers), and have a memory restriction of 20 records. So I will have 6 blocks. Let me know if this is right: For step 2, on Tb2 I will have three runs, and Tb1 three runs as well, which equals 60 records per file. For step 3, I just don't understand the merging process. What does it mean by reading two records into main memory, and from which file? And to read the next record?
Basic Algorithm
Assumptions:
four tapes:
two for input - Ta1, Ta2,
two for output - Tb1, Tb2.
Initially the file is on Ta1.
N records on Ta1
M records can fit in the memory
Step 1: Break the file into blocks of size M, [N/M]+1 blocks
Step 2: Sorting the blocks:
read a block, sort, store on Tb1
read a block, sort, store on Tb2,
read a block, sort, store on Tb1,
etc, alternatively writing on Tb1 and Tb2
Each sorted block is called a run.
Each output tape will contain half of the runs
Step 3: Merge:
From Tb1, Tb2 to Ta1, Ta2.
Merge the first run on Tb1 and the first run on Tb2, and store the result on Ta1:
Read two records in main memory, compare, store the smaller on Ta1
Read the next record (from Tb1 or Tb2 - the tape that contained the record stored on Ta1) compare, store on Ta1, etc.
Merge the second run on Tb1 and the second run on Tb2, store the result on Ta2.
Merge the third run on Tb1 and the third run on Tb2, store the result on Ta1. Etc, storing the result alternatively on Ta1 and Ta2.
Now Ta1 and Ta2 will contain sorted runs twice the size of the previous runs on Tb1 and Tb2
From Ta1, Ta2 to Tb1, Tb2.
Merge the first run on Ta1 and the first run on Ta2, and store the result on Tb1.
Merge the second run on Ta1 and the second run on Ta2, store the result on Tb2 Etc, merge and store alternatively on Ta1 and Ta2.
Repeat the process until only one run is obtained. This would be the sorted file.