I am having trouble understanding an external sort. I have to code up an implementation for the following below. I have to sort 120 records(they are really just integers), and have a memory restriction of 20 records. So I will have 6 blocks. Let me know if this is right: For step 2, on Tb2 I will have three runs, and Tb1 three runs as well, which equals 60 records per file. For step 3, I just don't understand the merging process. What does it mean by reading two records into main memory, and from which file? And to read the next record?

Basic Algorithm

Assumptions:
four tapes:
two for input - Ta1, Ta2,
two for output - Tb1, Tb2.

Initially the file is on Ta1.

N records on Ta1
M records can fit in the memory

Step 1: Break the file into blocks of size M, [N/M]+1 blocks

Step 2: Sorting the blocks:

read a block, sort, store on Tb1
read a block, sort, store on Tb2,
read a block, sort, store on Tb1,
etc, alternatively writing on Tb1 and Tb2

Each sorted block is called a run.
Each output tape will contain half of the runs

Step 3: Merge:

From Tb1, Tb2 to Ta1, Ta2.
Merge the first run on Tb1 and the first run on Tb2, and store the result on Ta1:

Read two records in main memory, compare, store the smaller on Ta1
Read the next record (from Tb1 or Tb2 - the tape that contained the record stored on Ta1) compare, store on Ta1, etc.
Merge the second run on Tb1 and the second run on Tb2, store the result on Ta2.
Merge the third run on Tb1 and the third run on Tb2, store the result on Ta1. Etc, storing the result alternatively on Ta1 and Ta2.

Now Ta1 and Ta2 will contain sorted runs twice the size of the previous runs on Tb1 and Tb2
From Ta1, Ta2 to Tb1, Tb2.
Merge the first run on Ta1 and the first run on Ta2, and store the result on Tb1.
Merge the second run on Ta1 and the second run on Ta2, store the result on Tb2 Etc, merge and store alternatively on Ta1 and Ta2.

Repeat the process until only one run is obtained. This would be the sorted file.

The instructions for merging are given to you, including which files to get the data from.

I know they are. Could someone reword it for me :)

Basically what it means is take a record from each file(tb1-recordb1 = 10,tb2-recordb2 = 15). Compare them. The lesser one(recordb1) write to the new file(ta1). Now take another record(recordb1 = 13) from that file(tb1) that stored the record you just wrote. Compare the existing value(recordb2) with the new one(recordb1) and write the lesser one to the new file. continue like this, always reading from the file you just wrote the record from, until you have read a complete block from each file. ta1 will have a block of 40 records. Start the next block and write it to ta2.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.