I am having problems with operator new, specifically within
std::string

I appear to have hit a memory limit in visual studio.

Ideally I would like someone to help me find one of the following solutions:

1 - Can I access the memory in a very big file directly without a memory buffer?
2 - How do you increase the maximum size of the free store in visual studio?

My approach at the moment is not working. I have a very big file that I want to
chop before processing.

- I create a membuffer for a 1Gb file (this was made from 25Gb file)
- then try to use a std::string to iterate over the string.

but I cannot create the string, as I get a
bad allocation error
from the std::string constructor

A simplified version of my code

#include <string>
#include <exception>
#include <fstream>
#include <iostream>

int main()
{

 std::ifstream fin("D:\\Gig1.txt", std::ios::in);
 if(fin.is_open())
 {
  unsigned int sz(1073741825); // 1 gig (1024)^3, plus one byte
  char * memblock = new char[sz]; //this is fine
  //clamp sz to the number of bytes actually read
  fin.seekg(0, std::ios::beg);
  fin.read(memblock, sz);
  if(sz > fin.gcount())
  {
    sz = static_cast<unsigned int>(fin.gcount());
  }

  try
  {
    std::string str(memblock, sz); //too much for visio :(
  }
  catch(std::exception &e)
  {
   std::cout << e.what() << std::endl;
  }

  delete [] memblock;
  fin.close();
 }

 return 0;
}

This is approximate code, and I am posting here as my first post asked the wrong question.

Thanks,
David

I get the same behavior. I have a Windows 7 64-bit computer with 5 Gig RAM. I also just tried the resize() method and that doesn't work either. Possibly the program cannot find enough contiguous memory for that size.

Code::Blocks doesn't like that either -- it throws an error.

When thinking about a question posted the other day about creating a LargeInteger class, I realized that indexing a string with the [] operator takes a std::string::size_type, an unsigned integer type that is typically 32 bits wide in a 32-bit build, so the index has an upper limit well below 1,000,000,000,000 there. I do not know offhand whether the standard limits the size of std::string in any other way, though; max_size() reports the limit of a given implementation.

Using the standard fstream libraries should enable you to read from a large file just fine. Here's some example code that compiles and works for parsing a 5.3 GB file I made just for this experiment

#include <fstream>
#include <string>

using namespace std;

int main(int argc, char *argv[]){
    ifstream ifile( "op.txt" );
    string container;
    char linebuffer[400];
    while( ifile ){
        // Test the read itself rather than eof() before reading.
        // Note: a line longer than 399 characters sets failbit and ends the loop.
        while( container.size() < 1048576 && ifile.getline( linebuffer, sizeof linebuffer ) ){
            container.append( linebuffer );
        }
        // Do something with container holding roughly 1MB of string data
        // (getline strips the newline characters)
        container.clear();
    }
    ifile.close();
    return 0;
}

Notice that this example uses a char buffer with the member getline(); the free function std::getline() can read straight into a std::string instead. Either way, you can append each line to the end of your container string. You should be able to parse the data line by line without putting it into a storage string at all. However, since this is what you asked for, I've included it. Try this out and see how it works for you.

> I appear to have hit a memory limit in visual studio.

Since you are trying to allocate rather huge amounts of memory (2GB), I'd guess that the limitations are more probably imposed by your operating system (which one is it?) than the std::string implementation.

See MSDN documentation on Memory and Address Space Limits. Are you perhaps reaching the 2GB user-mode virtual address space limit there?

PS. As already suggested, perhaps process the file in much much smaller chunks than what you are now planning.

>>Since you are trying to allocate rather huge amounts of memory (2GB),

Nope. It's only 1 Gig, which is less than what std::string::max_size() returns.

> I create a membuffer for a 1Gb file (this was made from 25Gb file)
Start with something smaller, like 1 MB instead.

If your buffer is too large, then it will just end up in swap space (yet more disk accesses). Every part of the file will then require TWO disk accesses to process rather than just the one.

Actually the OP tries to get ~2 GBs .. I think you missed ...

std::string str(memblock, sz); //too much for visio :(

Unless I'm missing something line 12 sets the value of sz to 1 Gig. All that line does is initialize str to the first sz bytes of memblock.

The OP is trying to get a brand new std::string in:

std::string str(memblock, sz);

containing a copy of the data that was read in, so that doubles the allocated memory: roughly 1 GB for memblock plus roughly 1 GB for the string.

You are right -- but that may not be the problem.
This works

#include <iostream>
#include <string>

int main()
{
     unsigned int sz(1073741825);
     std::string str;
     str.resize(sz); // a single allocation up front
     std::cout << "Done\n";
     std::cin.get();

}

But this does not

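#include <iostream>
#include <string>

int main()
{
     unsigned int sz(1073741825);
     unsigned int i = 0;
     std::string str;
     for(i = 0; i < sz; i+= 10)
         str += "xxxxxxxxxx"; // the buffer is regrown repeatedly as the string fills
     std::cout << "Done\n";
     std::cin.get();

}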

Thanks everyone for your input,

As I am using XP there is a 4 GB physical memory limit, and
I suspect this is restricting the application to ~2 GB max.

I was completely unaware that this limit existed. As most Windows classes
appear to have been built with longs, I figured that since memory address registers
can deal with 100 GB this issue wasn't applicable.

I am concerned that, like the millennium bug, there is a lot
of Microsoft code that won't cope beyond the 4 GB limit, despite claims
such as IE 8 downloads :(

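Here is a workaround not involving a new file:

try
{
 // file is the std::ifstream opened earlier
 file.seekg (0, std::ios::beg);
 std::string data;
 for(int i(0); i < 16; ++i)
 {
  std::streamsize size = 64 * 1024 * 1024; // 64 MB per chunk
  char * smaller_memblock = new char [size];
  file.read(smaller_memblock, size);
  if(size > file.gcount())
  {
    size = file.gcount(); // short read at end of file
  }
  // append the chunk just read (not memblock); leaks the chunk if append throws
  data.append(smaller_memblock, static_cast<std::size_t>(size));
  std::cout << data.size() << "<>" << size << std::endl;
  delete [] smaller_memblock;
 }
}
catch(std::exception &e)
{
 std::cout << e.what() << std::endl;
}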

Interestingly, this doesn't allow for a std::string above 640 MB.
Although direct allocation allows for 1 GB without throwing, appending does not.

This is not a simple memory leak from delete[] not freeing:
if a temp string is constructed from each chunk instead of appending, the for loop completes,
but it does slow down with each extra read() and string construction.

Append or += fails, I suspect, because it predicts the size of the container
by going up in factors of two rather than just allocating exactly the memory needed.
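
That suspicion can be checked with a small experiment. The growth factor is implementation-defined (1.5x and 2x are common choices), and while the string regrows, the old and new buffers exist at the same time. If the factor were 1.5, which is an assumption here, regrowing a 640 MB string would need a new block of about 960 MB next to the old one, roughly 1.6 GB in total. The sketch below counts how often capacity() changes, with and without a reserve() call up front; count_reallocations is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends `piece` until the string holds at least
// `total` bytes and counts how often capacity() changed, which is how often
// the underlying buffer was reallocated.
std::size_t count_reallocations(std::size_t total, const std::string& piece,
                                bool reserve_first)
{
    if (piece.empty()) return 0;   // nothing to append, avoid looping forever
    std::string s;
    if (reserve_first) s.reserve(total + piece.size());   // one allocation up front
    std::size_t reallocations = 0;
    std::size_t last_capacity = s.capacity();
    while (s.size() < total) {
        s += piece;
        if (s.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = s.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set, the count stays at zero because the final size never exceeds the reserved capacity. In the workaround above, the equivalent would be calling data.reserve() with the expected total before the loop, so that append never has to hold two large buffers at once.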

So it appears that you have to use a smaller file if strings are to be used.

A wrapper around the string class would allow bigger strings, but as this would involve
rewriting append I am not going to touch it. Hence I will make smaller files.

If someone has a way to overcome the append problem, it would still be interesting,
so as to better understand the design.

David

The problem is that you are probably compiling with a 32-bit compiler, which produces code that has a 2 gig limit even if you are running a 64-bit operating system. Get a 64-bit compiler and that limitation will go away, although there will be another, much larger, limit.
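
A quick way to see which kind of build you actually have is to look at the pointer size and at what the string class reports as its own maximum. This is a minimal sketch; pointer_bits and string_max_size are hypothetical helper names.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Number of bits in a pointer for this build: 32 for a 32-bit target,
// 64 for a 64-bit target.
std::size_t pointer_bits()
{
    return sizeof(void*) * CHAR_BIT;
}

// Upper bound the library reports for the length of a std::string here.
// The practical limit is usually lower, set by available address space.
std::size_t string_max_size()
{
    return std::string().max_size();
}
```

Printing both values from main() shows whether the program was built as 32-bit even though the operating system is 64-bit.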
