eliminating int duplicates

Question

marnun 0 Light Poster

11 Years Ago

Write an EFFICIENT program that reads in the name of a file containing positive integers that are less than 1,000,000. The program prints out the integers in the file, in the order that they appear, eliminating duplicates. So if the file contained 8 3 1 9 2 8 15 9 8 4, the program would print 8 3 1 9 22 15 4.

Here I am not even sure how to approach. Do we read in the file name as an int array, and then eliminate duplicates and save the result in that same file?
Any suggestions would be great! thanks.

c++ duplicate fstream

6 Contributors
14 Replies
264 Views
1 Day Discussion Span
Latest Post 11 Years Ago Latest Post by marnun

phorce 131 Posting Whiz in Training

11 Years Ago

Yes, you can do.

Store the values inside an array, then send the reference (pointer) of this array to a function that checks to see if the value is unique. (You can even construct a temp array to store the unique values inside) then you can print these to a file.

vmanes 1,165 Posting Virtuoso

11 Years Ago

First, define "efficient". I can think of solutions that are efficient in terms of speed, and in terms of memory usage.

As to solving the probem, yes, an array will almost certainly be needed for any solution. It's a matter of what you put in the array, and what you do with it.

If you store all the inputs, you then have to search through it in some manner and eliminate duplicates. You could set them to be a negative value, which will be skipped in the printing phase. But, you would have to make many passess through the array to find duplicates. Also, how big do you make the input array - your problem only state the limit on the size of the values, not the size of the file!

I would look for a way to test each value as it's read in, determine if you've already seen it or not, print it if it's unique. That way you don't worry about storing all the input or accessing it multiple times. There's an easy way to do this.

nullptr 167 Occasional Poster

11 Years Ago

Here's a fairly memory efficient example. The idea is basically to create an array where each bit in the array represents an integer. If the bit representing that integer is set then the function returns false, otherwise it sets the corresponding bit and returns true.

bool add2intArray(unsigned int* iarray, int number)
{
    int index = number >> 3; // same as number / 8
    unsigned char bit = number & 7;

    if ((iarray[index] & 1 << bit) == 1 << bit)
        return false;

    iarray[index] |= 1 << bit;

    return true;
}

int main()
{
    const int size = 1000000/8 + 1;
    unsigned int arr[size] = {0};

    int test[ ] = {1 , 2, 8, 9, 7, 3, 203, 9, 2, 16, 54, 54, 99, 4, 6, 2, 9, 97, 8, 98};

    for (int i : test)
    {
        if (add2intArray(arr, i))
            std::cout << i << ", ";
    }

    getchar();

    return 0;
}

Edited 11 Years Ago by nullptr

nullptr 167 Occasional Poster

11 Years Ago

I just realised that I messed up slightly in the above example. The fixed code is as follows.

bool add2intArray(unsigned char* iarray, int number)
{
    int index = number >> 3;
    unsigned char bit_value = 1 << (number & 7);

    if ((iarray[index] & bit_value) == bit_value)
        return false;

    iarray[index] |= bit_value;

    return true;
}

int main()
{
    const int size = 1000000/8;
    unsigned char arr[size] = {0};

    int test[ ] = {1 , 2, 8, 9, 7, 3, 203, 9, 999999, 2, 16, 54, 54, 99, 4, 6, 2, 9, 97, 8, 98, 999999, 999991};

    for (int i : test)
    {
        if (add2intArray(arr, i))
            std::cout << i << ", ";
    }

    getchar();

    return 0;
}

Write an EFFICIENT program that reads in the name of a file containing positive integers that are less than 1,000,000. The program prints out the integers in the file, in the order that they appear, eliminating duplicates. So if the file contained 8 3 1 9 2 8 15 9 8 4, the program would print 8 3 1 9 22 15 4.

The way I read the question, you simply need to read the file and print the content minus any duplicate values. There's no need to then save the non-duplicate list back to a file.

Edited 11 Years Ago by nullptr because: clarity

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

marnun 0 Light Poster · Answer 1 · 2012-12-09T22:32:18+00:00

thanx for replies!! been very helpful.
i think i am in overdrive when simple seams complex. i'll get back to it.

marnun 0 Light Poster · Answer 2 · 2012-12-10T06:25:42+00:00

I am back on this exercise, and this is what I came up with. Does it make any sense?

    #include <iostream>
    #include <fstream>
    using namespace std;

    int main() {
        string nameFile;
        int k, n, a[k], b[k];
        k=0;
        cin>>nameFile;
        ifstream integerFile (nameFile.c_str());
        while (integerFile>>n) //loop to find number of integers in the file
            k++;
        integerFile.close();

        fstream integerFile (nameFile.c_str(), ios::in |ios::out);

        for (int i=0; i<k; i++) {
            integerFile>>a[i];
            for (int j=0; j<i; j++) {
                if (a[i]!=a[j]) b[i]=a[i]; // b[] is an array that holds all non-duplicated integers
            }
        }

        for (int i=o; i<k; i++)
            integerFile<<b[i]; //will this override information stored in that file? 

        integerFile.close();
        return 0;
    }

marnun 0 Light Poster · Answer 3 · 2012-12-10T21:03:38+00:00

thanks for your answer. not sure what you're doing in your code though...
here is my code, adjusted to save non-duplicates in a new file instead of overwriting the initial one.

    #include <iostream>
    #include <fstream>
    using namespace std;

    int main() {
            string nameFile, nameFile2;
            int k, n, a[k];
            k=0;
            cin>>nameFile;
            cin>>nameFile2;
            ifstream integerFile (nameFile.c_str());
            ofstream integerFile2 (nameFile2.c_str());

            while (integerFile>>n) // loop to find number of integers in the file
                k++;

            for (int i=0; i<k; i++) {
                integerFile>>a[i];
                for (int j=0; j<i; j++) {
                    if (a[i]!=a[j]) 
                          integerFile2<<a[i]; 
                }
            }

            integerFile2.close();
            integerFile.close();
            return 0;
        }

vmanes 1,165 Posting Virtuoso · Answer 4 · 2012-12-11T05:07:33+00:00

Why do you read the file to count items, then use that count to limit the for loop?

Why not just use that while in place of the first for? In fact, I don't think your code will successfully run as is, since you read the file to end, then start reading it again without resetting in any manner.

Your array a[]. You set its size with variable k, which has no value. So how big is your array?

This bit

for (int j=0; j<i; j++) {
    if (a[i]!=a[j])
        integerFile2<<a[i];

Is going to write the newly read value a[i] to the output many times, each time it is tested against a number it doesn't match. You should instead keep track of finding a match, if not, then output the number.

As I mentioned earlier, efficiency has many ways of being measured. In terms of time, you have something that's in the n^2 category, about the same time efficency as bubble sort. And you have the problem of not knowing how bit to make your array a[]. Is there a limit to how many numbers can be in the input file? 100, 1,000, 1,000,000?

marnun 0 Light Poster · Answer 5 · 2012-12-11T15:34:27+00:00

Thanks a lot for your comment!
1. The number of integers in the file is not specified. That's why I thought to use while loop to see the size of an array i need. Actually, you are right, I should declare an array after while loop, not before.
2. How do I reset file? Close it and re-open again?
2. I see what you are saying about testing for duplicates.

Modified code:

    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;

    int main() {
            string nameFile, nameFile2;
            int k, n, count;
            k=0; count=0;
            cin>>nameFile;
            cin>>nameFile2;
            ifstream integerFile (nameFile.c_str());
            ofstream integerFile2 (nameFile2.c_str());

            while (integerFile>>n) // loop to find number of integers in the file
                k++;

            int a[k];

            integerFile.close();
            ifstream integerFile (nameFile.c_str());

            for (int i=0; i<k; i++) {
                integerFile>>a[i];
                for (int j=0; j<i; j++) {
                    if (a[i]==a[j]) 
                          count++;
                    if (count==0)
                          integerFile2<<a[i]; 
                }
            }

            integerFile2.close();
            integerFile.close();
            return 0;
        }

marnun 0 Light Poster · Answer 6 · 2012-12-11T15:43:43+00:00

Just realized that I made a mistake in part checking for duplicates. It should be like this:

    for (int i=0; i<k; i++) {
            integerFile>>a[i];
            for (int j=0; j<i; j++) {
                if (a[i]==a[j]) 
                      count++;
            }
            if (count==0)
                integerFile2<<a[i]; 
   }

Lucaci Andrew 140 Za s|n · Answer 7 · 2012-12-11T16:23:16+00:00

You can always use the sorted set from STD:

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <set>
using namespace std;

int main() {
    set<int> a;
    ifstream fin ("filename.txt", ios::in);
    int nr;
    string line;
    while (getline(fin, line)){
        stringstream token(line);
        cout<<line<<endl;
        token>>nr;
        a.insert(nr);
    }
    set<int>::iterator it;
    for (it=a.begin();it!=a.end();it++)
        cout<<*it<<" ";
    cout<<endl;
    return 0;
}

NathanOliver 429 Veteran Poster Featured Poster · Answer 8 · 2012-12-11T16:33:25+00:00

I had thought to tell him that but the problem states to print the numbers in the order they appear. If his compiler supports c++11 he could use an unordered map and use the value for the key.

vmanes 1,165 Posting Virtuoso · Answer 9 · 2012-12-11T18:10:27+00:00

marnum, you're still making a few classic mistakes, and writing code that loses on the efficiency standpoint.
Assuming you make array a[] big enough for any possible input;
1. be sure to reset count to 0 for each new pass in the for loop.
2. don't write duplicate numbers to the array, you'll just be comparing to them numerous times.

How about:

   end = 0;
   while( integerFile >> temp ) 
    {
        count = 0;
        for (int j = 0; j < end; j++) 
        {
           if (temp == a[j]) 
               count++;
        }
        if (count == 0)
        {  
           a[end] = temp;
           integerFile2 << a[end]; 
        }
   }

This is still not the most time efficient method, but I hope these corrections to your chosen path help.

marnun 0 Light Poster · Answer 10 · 2012-12-11T18:13:02+00:00

i'll have to look up more on stringstream token, set and unordered_map - haven't used them yet.
yes, numbers are not supposed to be sorted.
thanx everybody! i'll get back to this code, hopefully, after my finals..