How can I use very long STL vectors? I need to write code where I use the vector container to store a very large number of values, and I need to access the elements by index. I am currently using myvec[i], where i is an integer. How can I do the same if I have more elements than the largest int value (which on my system is 2^31)? For example, in the sample "code" below, how can I access myvec if i is supposed to be larger than 2^31?

Thanks,

vector<double> myvec;

  while (Not Full)  // add elements until some condition says to stop
  {
      double x = some value;  // here there is a function to compute x
      myvec.push_back(x);
  }

  int size = myvec.size();
  for (int i=0; i < size; ++i)
  {
      cout << myvec[i] << endl;
  }

I think you're asking the wrong question. The right question would be (assuming 8-byte doubles): why are you trying to eat over 16 gigabytes for a single vector? The naive approach isn't likely to work in this case; you need to consider alternative algorithms that don't waste so much memory.

I wonder if you really need to store that much data? Do you expect to populate just a few values within that range? It's unlikely you'd have the memory capacity to support a fully populated vector that large, so perhaps you should consider other ways to store the data.
Maps, database...

I really need to store all that data. I need to compute and store all those values. Memory is not a problem, I have access to computers with 256GB of memory or more.

I could use different data structures to store the data, but it would make my algorithms much slower since I take advantage of the fact that I can access the i-th element in constant time. I think I really want a vector. Is there any other data structure that allows me to access the i-th element in constant time?

Thanks,

Give us a clue - just how big do you want to go?
I wonder what platform you're using - it seems that you need access to 64 bit indexes.

>I really need to store all that data.
Every time I hear somebody say that, they turn out to be wrong. :icon_rolleyes:

>I think I really want a vector.
No, you want multiple vectors. If you're hitting the memory limit of a single vector, it's only a minor inconvenience to extend that limit with a vector of vectors.

You obviously are not using MS-Windows or *nix operating systems, because the max amount of RAM supported by 64-bit MS-Windows is 128 gig.

One way to do what you want is to store the data in two or more arrays and use one of them depending on the value of i.

I run my code on Linux machines (64-bit), but an int is still 32 bits. A long int is sometimes still 32 bits, so just using long int won't necessarily do any good.

For the moment 64-bit indices would solve the problem, but I want a solution that would work even if I want to go past 64 bits.

Is there an integer type that is guaranteed to be 64 bits?

I thought about using multiple vectors instead of one, but then I still need an index to know which element I need to access. When I talk about the i-th element, what would i be?

Thanks

>When I talk about the i-th element, what would i be?
I'd probably use two indices where a simple calculation represents your value's index. So if you divide your vectors into 100 element blocks and there are ten of them, the item at "index" 356 would be located at v[3][56] in the actual vector (conceptually 3 * 100 + 56).
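Something along these lines, for example. This is just a rough, untested sketch - BLOCK and the variable names are made up, so pick whatever chunk size fits your data:

#include <iostream>
#include <vector>

int main()
{
    const unsigned long BLOCK = 100;            // elements per inner vector
    std::vector< std::vector<double> > v;       // the "vector of vectors"

    unsigned long i = 356;                      // conceptual index into one big array
    unsigned long outer = i / BLOCK;            // which block: 3
    unsigned long inner = i % BLOCK;            // offset inside that block: 56

    if (outer >= v.size())                      // grow the outer vector on demand
        v.resize(outer + 1, std::vector<double>(BLOCK));

    v[outer][inner] = 3.14;
    std::cout << v[i / BLOCK][i % BLOCK] << std::endl;
    return 0;
}

The same i / BLOCK and i % BLOCK arithmetic works no matter how many blocks you end up with.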

If you are using 64-bit Linux, then your long ints are 8 bytes.

Not sometimes 4, 5, 6, 7 but always 8.
The max mapped memory for 32-bit is 2^32, around 4 gig.
The max mapped memory for 64-bit is 2^64, around 16 million terabytes (this means a lot!).

The unsigned integer range for 32 bit is 0 to +4,294,967,295
The unsigned integer range for 64 bit is 0 to +18,446,744,073,709,551,615

This roughly means that
if you can have it in memory, then you can have it in a vector.

Now, what kind of algorithm are you using, to solve what kind of problem?

I've been using vector<string> to store 36 gig of data in a gene expression statistical problem.
But in the end I ended up using a plain char**, since I couldn't .reserve() because I didn't know the size at the beginning of the program.
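If you want to double-check what your own toolchain gives you, a quick (untested) sketch like this will print the sizes and limits:

#include <iostream>
#include <limits>

int main()
{
    std::cout << "sizeof(int)   = " << sizeof(int)   << '\n';
    std::cout << "sizeof(long)  = " << sizeof(long)  << '\n';
    std::cout << "sizeof(void*) = " << sizeof(void*) << '\n';
    std::cout << "max unsigned long = "
              << std::numeric_limits<unsigned long>::max() << std::endl;
    return 0;
}

On a 64-bit Linux toolchain you should see 4, 8, 8 and 18,446,744,073,709,551,615.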

>>if you can have it in memory, then you can have it in a vector.

Not necessarily. If he is still using a 32-bit compiler, then the size of long int is no different from what it is on a 32-bit operating system, and the STL libraries will still be 32-bit libraries. He will need to use a 64-bit compiler to make any use of the 64-bit OS.

Well, the poster says he is using 64-bit Linux, so I found it safe to assume he is using a 64-bit toolchain.

Your argument is valid, though. For the sake of completeness, the same could be said about a 16-bit compiler :)


If I use a 64-bit compiler, then a long int is always 8 bytes? That basically solves my problem. I didn't know this was always true. Sometimes I see 4-byte long ints; maybe those were compiled with a 32-bit compiler.

So I will just use long ints for my indices, and this will make it possible to address all the memory I have. I will also use multiple vectors for speed. It takes a long time to resize the vectors once they are big.
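For the indexing part, what I have in mind is roughly this (just an untested sketch - N is a made-up element count, and I'd still split the data into multiple vectors on top of it):

#include <iostream>
#include <vector>

int main()
{
    typedef std::vector<double>::size_type index_t;  // 64 bits with a 64-bit toolchain

    const index_t N = 10000000;        // placeholder count; the real one is much larger
    std::vector<double> myvec;
    myvec.reserve(N);                  // avoid repeated reallocation while growing

    for (index_t i = 0; i < N; ++i)
        myvec.push_back(static_cast<double>(i));   // placeholder for the real computation

    for (index_t i = 0; i < myvec.size(); ++i)
        if (i % (N / 10) == 0)
            std::cout << myvec[i] << '\n';
    return 0;
}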

Thanks for all your replies.

One more question: if v is a vector, what is the type of v.size()? Is it a long int?

Thanks.

It's simple (no need for DaniWeb ;)) - just ask your compiler:

#include <iostream>
#include <vector>

int main()
{
    typedef std::vector<double> DblVector;
    typedef std::vector<int>    IntVector;
    DblVector dv;
    IntVector iv;
    std::cout << sizeof(DblVector::size_type) << '\n';  // size of the vector's index type
    std::cout << dv.max_size() << '\n';                 // largest possible element count
    std::cout << iv.max_size() << std::endl;
    return 0;
}

If you are using that much memory, you will be much better off using the boost array/multi_array classes. Better still, use Blitz.

Word of warning: write the code using boost/STL until it works with smaller data sets. THEN switch to Blitz. Otherwise the error messages and other problems are incomprehensible.
