Hello, I need some help filling an array with UNIQUE random numbers. So far I've figured out how to fill an array with random numbers, that's easy, but I'm stuck on how to avoid filling it with duplicate values. I'm assuming that i'll have to use either a linear or binary search function while filling the array, but my attempts so far have failed. Anyone got any idea what i'm doing wrong? Here's my terrible code so far...any help would dearly be appreciated.

#include <iostream>
#include <string>
#include <cstdlib>
#include <iomanip>
using namespace std;

//Global constant declaration
const int MAXSIZE = 200;
const int SEED = 12;

//Function Prototypes
void Seed();
void FillMainList(int list[]);
void PrintArray(const int list[]);
int LinearSearch(const int list[], int value);

int main ()
{
    int list[MAXSIZE];
    Seed();
    FillMainList(list);
    PrintArray(list);
    return 0;
}

void Seed() //Set & Display the seed value
{
    srand(SEED);
}

void FillMainList(int list[])
{
    int temp, test;
    for (int i=0; i<MAXSIZE; i++)
    {
        temp = (rand() % 5000);
        test = LinearSearch(list, temp);
        if (test == -1)
        list[i] = temp;
    }
}

void PrintArray(const int list[])
{
        for (int i=0; i<MAXSIZE; i++)
        {
            cout << setw(5) << list[i] << endl;
        }
        cout << endl;
}
int LinearSearch(const int list[], int value)
{
    int index = 0;
    int position = -1;
    bool found = false;

    while (index < MAXSIZE && !found)
    {
        if (list[index] == value)
        {
            found = true;
            position = index;
        }
        index++;
    }
    return position;
}

I don't see anything particularly wrong with your code. Two things I do see though:
- By using the same seed, you'll always get the same list of numbers. Most people solve this by using the current time as a seed.
- Using the global constant MAXSIZE is probably bad style. It's fine for creating the array, but you should restructure your other functions to take an array and it's size as parameters, and then use that size value for your loop condition. This will be more adaptable to different sized arrays and especially useful if you deal with dynamic arrays.

Some points:

• You need to tell us a bit more than the normal "this doesn't work" thing. What do you mean when you are not able to generate unique numbers? Is the array not filled completely or the numbers are getting repeated.

• You need to break out of the loop in your linear search if the required value is found.

• The loop counter in the loop which generates random numbers should not be incremented if a duplicate in found in the list.

• Use the current time as seed instead of a small number like 7 or 12.

• You need to break out of the loop in your linear search if the required value is found.

He sets a boolean flag :icon_wink:

Some points:

• You need to tell us a bit more than the normal "this doesn't work" thing. What do you mean when you are not able to generate unique numbers? Is the array not filled completely or the numbers are getting repeated.

• You need to break out of the loop in your linear search if the required value is found.

• The loop counter in the loop which generates random numbers should not be incremented if a duplicate in found in the list.

• Use the current time as seed instead of a small number like 7 or 12.

So when I run my code I get these results (i truncated them) but as you can see I get some weird memory location for whenever the number is a duplicate...I think or something weird is happening here.

2260
-858993460
-858993460
1745
4438
2603
4623
1443
2441
543
681
2255
3247
4136
1088
631
2632
2949
1409
3450
732
2526
2737
1722
1313
4160
2304
733
4599
860
4366
-858993460
626
583
2788
1500
4504
3022
4262

Ah, I see it now. When I ran it with MAXSIZE of 20 on my machine that didn't come up. The problem is here:

void FillMainList(int list[])
{
  int temp, test;
  for (int i=0; i<MAXSIZE; i++)
  {
    temp = (rand() % 5000);
    test = LinearSearch(list, temp);
    if (test == -1) // if the new value isn't in the list
      list[i] = temp; // then save it
    // otherwise.... it'll skip this index and you'll have the uninitialized value
  }
}

To fix that, you should put the random generator in a loop and use the LinearSearch results as a loop condition:

for(int i = 0; i < MAXSIZE; i++)
{
  do {
    temp = rand() % 5000; // get random number
  } while (LinearSearch(list, temp) != -1); // repeat till it's unique
  list[i] = temp; // save it
}

Sure, I understand. I'm just trying to quickly generate some random numbers in an array, however to check if the number is in the array before adding another random number. That way i'll have a list of all random numbers.

Member Avatar for iamthwee

Can't you just create an array of 5000 ints

for (int i =0; i< 5000; i++ )
{
   array[i] =i;
}

Then shuffle them around. So you would be picking two random index positions and swapping them. Do this say a couple of hundred times.

Then just print out the first 200 elements and you will get 200 random elements with no repeats in a number range from 0 - 5000.

... but I'm stuck on how to avoid filling it with duplicate values. I'm assuming that i'll have to use either a linear or binary search function while filling the array...

1. you cannot do a binary search; the array is not ordered.
2. a linear search is possible, but it is not very efficient. filling up an array of size N would take O(N*N)
3. the smart thing to do would be to avoid the search alltogether.

to choose N unique numbers out of M
a. choose the first with probability N/M
b. choose the next with probability (N-1)/(M-1) if the first was chosen, N/(M-1) if it was not.
c. and so on.
this would take linear time. O(M)

here is an example of how this could be implemented.

#include <iostream>
#include <ctime>
#include <cstdlib>
#include <algorithm>
#include <iterator>
using namespace std ;

void fill_with_unique_rand( int* array, int N, int min_value, int max_value )
{
  int available = max_value - min_value +1 ;
  int required = N ;

  for( int i=0 ; i<available ; ++i )
    if( ( rand() % (available-i) ) < required )  // we have to choose required 
                                 // out of the remaining (available-i)
    {
      --required ;
      array[required] = min_value + i ;
    }

  random_shuffle( array, array+N ) ;   // if needed
}

template< int N > inline void fill_array_with_unique_rand( int (&array)[N], 
                                           int min_value, int max_value )
{ fill_with_unique_rand( array, N, min_value, max_value ) ; }

int main()
{
  srand( unsigned( time(0)) ) ;

  int a[10] ;  fill_array_with_unique_rand( a, 101, 200 ) ;
  copy( a, a+sizeof(a)/sizeof(*a), ostream_iterator<int>(cout," ") ) ; 
  cout << '\n' ;

  int b[20] ;  fill_array_with_unique_rand( b, 20, 40 ) ;
  copy( b, b+sizeof(b)/sizeof(*b), ostream_iterator<int>(cout," ") ) ; 
  cout << '\n' ;
}

Holy Schneikes!

That's it!!! Thanks a million infarction!

for(int i = 0; i < MAXSIZE; i++)
{
do {
temp = rand() % 5000; // get random number
} while (LinearSearch(list, temp) != -1); // repeat till it's unique
list = temp; // save it
}

Member Avatar for iamthwee

That has a very bad efficiency though. See my suggestion.

Can't you just create an array of 5000 ints

for (int i =0; i< 5000; i++ )
{
   array[i] =i;
}

Then shuffle them around. So you would be picking two random index positions and swapping them. Do this say a couple of hundred times.

Then just print out the first 200 elements and you will get 200 random elements with no repeats in a number range from 0 - 5000.

This would be time-efficient and space-inefficient.

iamthwee makes a good point. The fix I provided will only be moderately effective at best, with decreasing effectiveness as you try to build a longer list (since you have to search the list each time and have more chances of a duplicate).

Also, neither of these will work if you were to build a list over 5000, for obvious reasons. And by disallowing duplicates, the array is technically not random :icon_wink:

Member Avatar for iamthwee

>This would be time-efficient and space-inefficient.

Maybe, but for a newbie my way is easier to understand.

Wow, thanks vijayan that IS wonderful. Thank you all so much for your help and clever ideas. I definately feel that I have what i need to get past the hurdle I was stuck on. Thank you to iamthewee too. I appreciate your help

To fix that, you should put the random generator in a loop and use the LinearSearch results as a loop condition:

for(int i = 0; i < MAXSIZE; i++)
{
  do {
    temp = rand() % 5000; // get random number
  } while (LinearSearch(list, temp) != -1); // repeat till it's unique
  list[i] = temp; // save it
}

• The loop counter in the loop which generates random numbers should not be incremented if a duplicate in found in the list.

;)

Hi vijayan121,

I think you may be able to help me, I haven't touched a line C in years, but the algorithm is what interests me, not the syntax. I guess the same must be doable in java.

Anyhow, I've got the same requirement, and my problem is the performance, it takes ages for my code to produce me my list.

I was suspecting that it was not the best solution to iterate over the list of elements to check if the random number was not taken yet and in that case regenerating a random number. From what I understand, your code should do the trick far quicker.

However, I am having some trouble understanding the logic. Could you pls elaborate? What does 'M' represent? What is obstream? Any chance you could give us more explanationsK? Thanks!

> What does 'M' represent?
M is the cardinality of the set of numbers out of which we have to choose N *unique* numbers. (M>=N)

> Any chance you could give us more explanations?
a. probability that the first number is chosen is N/M b. probability that the second number is chosen is (N-1)/(M-1) if the first was chosen, N/(M-1) if it was not.
ie. N/M * (N-1)/(M-1) + (M-N)/M * N/(M-1) == N/M and so on.

here is a java transliteration of the alogorithm (caveat: java is a foreign language to me, there may be syntax errors. but the logic is sound).

public class unique_rand_generator 
{
	// fill up array with unique random integers 
	// from the range min_value .. max_value
        // invariant: max_value - min_value + 1 (M) 
        //      >= array.length (N)
    public static void fill( int[] array, int min_value, 
                             int max_value )
	{
	  int available = max_value - min_value + 1 ;
	  int required = array.length ;
          Random rng = new Random() ;
	  for( int i=0 ; i<available ; ++i )
	  {
		// we have to choose required
                // out of the remaining (available-i)
	    if( ( rng.nextInt(available-i) ) < required )   
	    {
	      --required ;
	      array[required] = min_value + i ;
	    }
	  }
	}
}

My algorithm is a little bit different. First, I generate the random number into the array in order. Then randomly swap the array elements to make it disorder. For example: I want to generate 11 unique numbers from range of 1 to 1000. Firstly, I would fill my array with random number in order by create recursive function that firstly random a number into the middle of the array. Then, on the left side of that number I generate the number from 1-random_number and on the right side of that number I generate the number from random_number to max_number.

___  ___  ___  ___  ___  ___   320  ___  ___  ___  ___  ___  ___
___  ___  150  245  ___  ___
16    85
                    270   290
                                    ___  ___  570  630  ___  ___

                                    421  524

                                                        724  815
16   85   150  245  270  290  320   421  424  570  630  724  815

However, this algorithm has its weak point, what happen if the first number is 1, but we can easily solve that problem.

commented: neat! +7

The trouble with that algorithm is that it doesn't uniformly select all the possible combinations of numbers; it's biased. The trouble with vijayan's algorithms is that they rely on walking through all the possible numbers one must select from, which is slow.

> My algorithm is a little bit different.

>> The trouble with that algorithm is that it doesn't uniformly select
>> all the possible combinations of numbers; it's biased.

actually it is a very good algorithm. with a simple adjustment, the bias can be (approximately) removed. instead of placing the selected number into the middle of the array, place it in a location such that the remaining numbers are (approximately) equally probable. there would be a slight skew because we need to map a larger integer range to a smaller one; but performance could not be bettered ( O(N) instead of O(M) for M>N ).

void fill_with_unique_rand( int* array, int N,
                            int min_value, int max_value )
{
   int M =  max_value - min_value + 1 ;
   int r = M==0 ? 0 : rand()%M ;
   int v = min_value + r ;
   if( N == 1 ) array[0] = v ;
   else
   {
     int i = double(r)/M * N + 0.5 ;
     array[ i ] = v ;
     if( i>0 ) 
      fill_with_unique_rand( array, i, min_value, v-1 ) ;
     if( i<(N-1) ) 
      fill_with_unique_rand( array+i+1, N-i-1, v+1, max_value ) ;
   }
}

The problem with that is that it eliminates many possible choices of numbers, making the probabilities of various combinations unequal since you're forcing many of the numbers to fit into a certain slot. You need to either select the number with respect to the probability distribution of the slot you're going to fill, or select the slot with an appropriate probability distribution for the number you've chosen.

> You need to either select the number with respect to the probability distribution
> of the slot you're going to fill,
yeah. it is not exact. for example, if M==1000 and N==100, if the random number chosen is 253, place it in position 25 of the array. 0-252 is now the range for 25 positions(0-24) and 254-999 is the range for 74 positions. it is approximate, but 253/25 is roughly equal to 746/74.

The way I might do it is to use a data structure of the following form:

struct tree {
    tree* left;
    tree* right;
    int split_value;
    int incr; // num nodes in left child
}

Then what you do is, assuming you're picking values for an array from [0,M), you approach the tree with a number, v , chosen uniformly in the range [0, M-S), where S is the number of nodes in the tree, i.e. the number of numbers you've picked so far. Then, if your value, v + incr , is currently greater than or equal to split_value, you set v += incr + 1; and dive down the right side of the tree. If you've reached a null tree with your value v , insert a new node with split_value v , incr 0. Then you need to update any parent nodes of your new node whose left child you traversed to, adding one to their incr .

Given that you're approaching this tree with knowledge of how many nodes are in the tree, you at any point in time know how many nodes are in the left and right child. So you can balance this tree using the load-balanced binary search tree algorithm. Thus each new number you take will take O(log S) operations.

So you select values in the range [0,M-S) and insert them into the tree using the above scheme, for S = 0 to N-1. Then you have your N selected random values sitting in the tree. And that's that. Put them in an array if you want to do so.

The total cost is then O(n log n).

yeah. it is not exact. ... it is approximate, but 253/25 is roughly equal to 746/74.

The problem is that it's too exact. Randomly chosen combinations are not likely to have their numbers lined up so perfectly.

> The problem is that it's too exact. Randomly chosen combinations
> are not likely to have their numbers lined up so perfectly.
true enough.

> The total cost is then O(n log n)
so, it means for large N, if M/N < log N, the tree is the way to go.
otherwise go for the old O(M) algorithm.

No, for large M, the tree is the way to go.

O( N log N ) would be greater than O(M) for *any* N and M
if M/N < log N

And you want the running time to be less, not greater.

  1. M > N
  2. Assuming that M = N + D
  3. M < N log(n)
  4. N + D < N log(n)
  5. D < N log(n) - N
  6. D < N [log(n) - 1]

If the difference of M and N is too large, then Tree is way to go
If the difference of M and N is too small, then O(M) is way to go.

Hi everybody, and thanks for contributing to the thread. Especialy thanks to vijayan, who put in a lot of effort to help me out. I tried your algorithm, but it didn't work out somehow, it returned the list back-front instead of randomized.

I found a simple solution, just thought I'd share it for all the help I had.

What I do (based on the fact you can pass as paramater the upper boundary of your random number in the java object Random):

1. populate an array of a given range with numbers. So array position 0 = "1", array position
1 = "2" and so on.

2. instantiate an empty output array

3. pick a random position in my input array.
Copy the value of that cell to the output array (position: iteration number)

4. copy the value of last position of my input array to the position the random indicated (the position I just copied to my output array)

5. decrease the random boundary by 1, so that the generated random number will be max (input array size) - 1

and do this over and over, till all the values in the input arrays are copied.

No looping, no difficult algorithm and great performance!

HTH


Loïc

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.