How to use really large arrays in C++

I recently ran into issues when using really large arrays in C++ for a dataset of several hundred million points. I have access to a computer with 512 GB of RAM, but even there the program would quit with an error. It took me a while to figure out that the issue was not the size of the array in bytes, but the number of elements in it. I was constructing and accessing the array with int indexes, which are usually 32 bit integers, just as a long integer is with Microsoft Visual C++. The problem was that the number of elements in the array was too large to fit into a 32 bit integer, so the index computations overflowed and the program ended up accessing memory out of bounds.
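To make the limits concrete, here is a minimal sketch that prints the relevant maxima. On a 64 bit Visual C++ build it should print 2147483647 for both INT_MAX and LONG_MAX, while size_t goes far beyond that:

#include <cstdio>
#include <climits>

int main()
{
    // On 64 bit Visual C++, int and long are both 32 bit,
    // so neither can count more than about 2.1 billion array elements.
    printf("INT_MAX    = %d\n", INT_MAX);
    printf("LONG_MAX   = %ld\n", LONG_MAX);
    // size_t is 64 bit on a 64 bit build.
    printf("max size_t = %llu\n", (unsigned long long)(size_t)-1);
    return 0;
}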

The solution was to switch to size_t as the index type, which is a 64 bit unsigned integer on 64 bit Visual C++; unsigned long long would be an equivalent. You also need to take care that, when allocating an array, the number of elements is computed in 64 bit. If you simply multiply a couple of 32 bit integers, the result will not be promoted to 64 bit and you will get an overflow. Hence

int a = largeNumber;
int b = anotherLargeNumber;
float *c = new float[a*b]; // a*b is evaluated as a 32 bit int and can overflow

may not work even if largeNumber and anotherLargeNumber fit into 32 bit integers. In this case you have to use size_t for variables a and b, or add a cast like

float *c = new float[(size_t)a*(size_t)b];
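Equivalently, declaring the extents as size_t in the first place avoids the cast (largeNumber and anotherLargeNumber stand for whatever your actual dimensions are):

size_t a = largeNumber;
size_t b = anotherLargeNumber;
float *c = new float[a*b]; // the product is now computed in 64 bit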

One possible issue with size_t is that OpenMP does not accept unsigned index variables for parallel for loops; at least the OpenMP 2.0 implementation that ships with Visual C++ requires a signed integer loop index. So

#pragma omp parallel for
for(size_t i=0;i<n;i++)
{
    // do fancy parallel stuff here
}

will not compile. The solution is to use long long as the type for the index variable. This seems to be somewhat compiler dependent, as I’ve had to change index types in libraries that apparently compiled fine for other people.
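A minimal sketch of that workaround, wrapped in a hypothetical scale function (the cast back to size_t when indexing just avoids signed/unsigned comparison warnings):

#include <cstddef>

void scale(float *data, size_t n, float factor)
{
    // long long is signed, so OpenMP accepts it as the loop index,
    // and it is still wide enough for element counts beyond 32 bit.
    #pragma omp parallel for
    for(long long i = 0; i < (long long)n; i++)
    {
        data[(size_t)i] *= factor; // do fancy parallel stuff here
    }
}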
