Member Avatar for Griff0527

I am trying to write several different sorting algorithms in order to time the differences between them when reading a file of a half million integers. For testing purposes, I am only using a few hundred integers, but I am getting a stack overflow error. Also, for testing reasons, I have an output file being created during this run so that I can see if the code is working correctly. Originally, I had 3 nested for loops that I believed were causing the errors (and horrible effeciencey), but I cleaned those up and I am still having the error. Can someone look this over and see if there is a glaring issue that I am overlooking. Thanks for any help you can give. I will also upload the .txt file of integers, in case anyone feels up to compiling this to recreate the error. I am using VS2012 Ultimate for my editing/compiling.

#include <iostream>
#include <fstream>
#include <ctime>
#include <string>

void quickSort(int a[], int first, int last);
int pivot(int a[], int first, int last);
void swap(int& a, int& b);

using namespace std;


int main()
{
    const int SIZE = 500000;
    string s[SIZE];
    int test[SIZE];
    int i = 0;
    int N = sizeof(test)/sizeof(int);
    ifstream readFile;
    ofstream myOutputFile;
    int start_s=clock(); //start timer
    readFile.open("inputData.txt");

    if (readFile.is_open()) 
    {
        while (!readFile.eof())
            {
                if (i < SIZE)  //for all values in the inputData read file
                {
                    getline(readFile, s[i]); //store values in a string array
                    test[i] = atoi(s[i].c_str()); //convert string array to integer array
                    i++;
                }

            }
    }

    quickSort(test, 0, SIZE-1);

    readFile.close(); //close file
    int stop_s=clock();//stop timer
    cout <<"time: " << (stop_s-start_s)/double(CLOCKS_PER_SEC)*1000 << endl;//output timer
    myOutputFile.open("sortedValues.txt");
    for (int i =0; i < SIZE; i++)
        myOutputFile << test[i] << endl;
    myOutputFile.close(); 


    return 0;
    system("pause");
}

/*
* Recursive Quicksort algorithm.
*/
void quickSort( int a[], int first, int last )
{
    int pivotElement;

    if(first < last)
    {
        pivotElement = pivot(a, first, last);
        quickSort(a, first, pivotElement-1);
        quickSort(a, pivotElement+1, last);
    }
}

/*
* Find and return the index of pivot element.
*/
int pivot(int a[], int first, int last)
{
    int  p = first;
    int pivotElement = a[first];

    for(int i = first+1 ; i <= last ; i++)
    {
        if(a[i] <= pivotElement)
        {
            p++;
            swap(a[i], a[p]);
        }
    }

    swap(a[p], a[first]);

    return p;
}


/*
* Swap the parameters.
*/
void swap(int& a, int& b)
{
    int temp = a;
    a = b;
    b = temp;
}

Just throwing this out there, this may not be the solution.

In this line you set the following interger to be a random number, where do you get this number from?

const int SIZE = 500000;

But, where you call your quick sort you call the following:

quickSort(test, 0, SIZE-1);

This would therefore assume that the last value is at position 500000 - 1 what if the array contains only 100 values? Would there be a position 499999? This is probably where your code is segmenting. You should also probably set the size of the array, i.e. how many values are inside the array and have this value to be the total number of elements that the array can handle. So for example, I entered your 100 values that you supplied and did the following:

quickSort(test, 0, 99);

Where we can assume that 0 is the first value inside the array and 99 is the last value, this printed out a result. Please see my attatched file.

You should debug your code, find out where it is going wrong.

Good luck :)

Member Avatar for Griff0527

I got the number 500000 because the full file has that many entries in it. I am running it with a much smaller file to ensure that it would work. I adjusted the code to reflect

const int SIZE = 100;

and

quickSort(test, 0, 99);

Then I checked the file and reduced it to exactly 100 integers. The stack overflow is gone, but the program has been running for well over 5 minutes and isn't completed. This seems excessive for 100 numbers.

How long did it take when you were able to successfully run the code?
Also, you said I should debug my code and figure out where it is going wrong. Any suggestions on where I should start?

Could you please provide me with the ACTUAL data values that you have? You can post them onto pastebin

If it's taking a massive amount of time to run, you should look at optimising your algorithm so that it best fits to match your needs. C++ is very quick, however, it does depend on the system and the compiler that you are using and whether or not you are having any memory leaks and/or you are handling things in memory. You may find it a lot easier to read these values inside a 2D array and do your sorting that way.

Because your question was not broad and you were not seeking someone to do the work for you, post your values and I will take a look to try and improve the performance of this for you.

NOTE: Your program should not be running for more than 5 minutes with 100 values. There is something wrong here.

These lines will certainly cause a stack overflow:

const int SIZE = 500000;
string s[SIZE];
int test[SIZE];

This creates (stack-based) static arrays of integers and strings, of 500000 elements. Assuming you run this on a 32bit platform, this will probably take about 10Mb of memory (and probably twice that on a 64bit platform). The stack is a static and limited amount of memory to be used for local variables of functions as they are needed, it is not meant to be used to allocate large arrays like that. Typically, the limit of the stack is between 2Mb and 8Mb, depending on the platform and compiler (the stack size can usually be configured via a compiler option, but the default is usually about 2-8Mb).

For any kind of large arrays, you should allocate them dynamically, even if you know their size at compile-time. So, you can use either these C-style arrays:

int main() {
  const int SIZE = 500000;
  string* s = new string[SIZE];
  int* test = new int[SIZE];

  // the rest of the main() function's code here.

  delete[] s;
  delete[] test;

  system("pause");
  return 0;
};

Or, you can use the following C++-style vectors:

#include <vector>

int main() {
  const int SIZE = 500000;
  vector<string> s(SIZE);
  vector<int> test(SIZE);

  // the rest of the main() function's code here.

  system("pause");
  return 0;
};

The stack overflow is gone, but the program has been running for well over 5 minutes and isn't completed. This seems excessive for 100 numbers.

This indeed very extreme. I made a little program a while ago for fun that tests out a whole suite of classic sorting algorithms on arrays of integers. For 100 integers, the time required was always less than 10 micro-seconds (i.e., 0.00001 seconds), and even InsertionSort (O(N^2)) is below 10 micro-seconds. So, 5 minutes is more than extreme, probably an infinite loop.

Also, if you want to time the algorithm, you should not time the reading and writing to and from the files, because that's going to be far more costly in time than the actual sorting. So, you should have:

readFile.open("inputData.txt");
if (readFile.is_open()) 
{
    while (!readFile.eof())
        {
            if (i < SIZE)  //for all values in the inputData read file
            {
                getline(readFile, s[i]); //store values in a string array
                test[i] = atoi(s[i].c_str()); //convert string array to integer array
                i++;
            }
        }
    readFile.close(); //close file
}

int start_s=clock(); //start timer
quickSort(test, 0, SIZE-1);
int stop_s=clock();//stop timer

I took your code and created this program to test it:

#include <iostream>
#include <cstdlib>
#include <iomanip>
#include <ctime>
#include <vector>

int pivot(int* a, int first, int last)
{
    int  p = first;
    int pivotElement = a[first];
    for(int i = first+1 ; i <= last ; i++)
    {
        if(a[i] <= pivotElement)
        {
            p++;
            std::swap(a[i], a[p]);
        }
    }
    std::swap(a[p], a[first]);
    return p;
}

void quickSort(int* a, int first, int last )
{
    int pivotElement;
    if(first < last)
    {
        pivotElement = pivot(a, first, last);
        quickSort(a, first, pivotElement-1);
        quickSort(a, pivotElement+1, last);
    }
}


int main() {
  std::srand((unsigned int)time(0));

  std::vector<int> arr(100);

  for(int i = 0; i < 100; ++i)
    arr[i] = std::rand() % 1000; // 100 random integers in [ 0, 1000 ) range.

  int start_s = std::clock(); //start timer
  for(int i = 0; i < 1000000; ++i) {  // make 1 million runs.
    std::vector<int> temp = arr;  // copy the array.
    quickSort(&temp[0], 0, 99);   // run quick-sort.
  };
  int stop_s = std::clock();//stop timer
  std::cout <<"time: " << (stop_s-start_s)/double(CLOCKS_PER_SEC) << std::endl;//output timer

  std::cout << "Finished!" << std::endl;
  return 0;
};

With highest optimizations, this program runs in 1 second, without optimizations, it runs in 6 seconds. In other words, the time for sorting a single 100-element random array is a few micro-seconds. Even in the worst case performance O(N^2), the code should still run in the low micro-seconds range.

The problem is that you have an infinite loop in the reading of the data. I would try the following instead:

readFile.open("inputData.txt");
if (readFile.is_open()) 
{
    while ( (!readFile.eof()) && (i < SIZE))   //<--- NOTICE the extra condition here.
        {
            getline(readFile, s[i]);      // store values in a string array
            test[i] = atoi(s[i].c_str()); // convert string array to integer array
            i++;
        }
    readFile.close(); //close file
}

You had an infinite loop when you reached the maximum SIZE before you reached the end of the file. Because of your if-statement, you would never read anything more from the file, but because the end-of-file condition was the condition to end the loop, the loop would never end. So, your 5 minutes running time just means that you've left the program running on an infinite loop for 5 minutes.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.