C++ Programming – Using size_t or int for Dimensions and Indexes


In C++, size_t (or, more precisely, T::size_type, which is "usually" size_t, i.e., an unsigned type) is used as the return type of size(), the parameter type of operator[], and so on (see std::vector et al.).
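
As a quick illustration, here is a minimal sketch of that convention, assuming a typical standard-library implementation (the standard only requires size_type to be some unsigned integer type, but in practice it is size_t):

#include <cstddef>
#include <type_traits>
#include <vector>

// Holds on typical implementations; the standard only guarantees that
// size_type is an unsigned integer type, not that it is size_t.
static_assert(std::is_same<std::vector<int>::size_type, std::size_t>::value,
              "size_type is usually size_t");

int main()
{
    std::vector<int> v{1, 2, 3};
    std::size_t n = v.size(); // size() returns an unsigned type
    v[n - 1] = 4;             // operator[] takes an unsigned type
}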

On the other hand, .NET languages use int (and, optionally, long) for the same purpose; in fact, CLS-compliant languages are not required to support unsigned types.

Given that .NET is newer than C++, something tells me that there may be problems using unsigned integers even for things that "can't possibly" be negative, like an array index or length. Is the C++ approach a "historical artifact" kept for backwards compatibility? Or are there real and significant design trade-offs between the two approaches?

Why does this matter? Well … what should I use for a new multi-dimensional class in C++: size_t or int?

#include <cstddef> // std::size_t
#include <cstdint> // std::int32_t / std::int64_t

struct Foo final // e.g., image, matrix, etc.
{
    typedef std::int32_t /* or std::int64_t */ dimension_type; // *OR* always "size_t"?
    typedef std::size_t size_type; // c.f., std::vector<>

    dimension_type bar_; // maybe rows, or x
    dimension_type baz_; // e.g., columns, or y

    // STL-like interface (one plausible definition: total element count)
    size_type size() const { return static_cast<size_type>(bar_) * static_cast<size_type>(baz_); }
};

Best Answer

Given that .NET is newer than C++, something tells me that there may be problems using unsigned integers even for things that "can't possibly" be negative, like an array index or length.

Yes. For certain types of applications such as image processing or array processing, it is often necessary to access elements relative to the current position:

sum = data[k - 2] + data[k - 1] + data[k] + data[k + 1] + ...

In these types of applications, you cannot perform a range check with unsigned integers without thinking carefully. Suppose k has type size_t:

if (k - 2 < 0) {
    throw std::out_of_range("will never be thrown"); 
}

if (k < 2) {
    throw std::out_of_range("will be thrown"); 
}

if (k < 2uL) {
    throw std::out_of_range("will be thrown, without signedness ambiguity"); 
}

Instead, you have to rearrange your range-check expression. That is the main difference. Programmers must also remember the integer conversion rules; when in doubt, re-read http://en.cppreference.com/w/cpp/language/operator_arithmetic#Conversions
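
For instance, here is a minimal sketch of the rearranged check for the windowed sum above (window_sum is a hypothetical helper; the window is truncated to data[k - 2] through data[k + 1]):

#include <cstddef>
#include <stdexcept>
#include <vector>

// Sketch only: with unsigned k, the lower bound must be written as k < 2
// rather than k - 2 < 0, and the upper bound must avoid k + 1 wrapping
// around (k + 1 >= data.size() is subtly wrong when k == SIZE_MAX).
double window_sum(const std::vector<double>& data, std::size_t k)
{
    if (data.size() < 2 || k < 2 || k > data.size() - 2) {
        throw std::out_of_range("window out of range");
    }
    return data[k - 2] + data[k - 1] + data[k] + data[k + 1];
}

Note how even the upper-bound test has to be phrased carefully to avoid the very wraparound it is guarding against.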

A lot of applications do not need to use very large array indices, but they do need to perform range checks. Furthermore, a lot of programmers are not trained to do this expression-rearrangement gymnastics. A single missed rearrangement opens the door to an exploit.

C# is indeed designed for applications that will not need more than 2^31 elements per array. For example, a spreadsheet application does not need to deal with that many rows, columns, or cells. C# handles the upper limit by offering optional checked arithmetic, which can be enabled for a block of code with the checked keyword rather than through compiler options. For this reason, C# favors signed integers. When these decisions are considered together, the design makes good sense.

C++ is simply different, and it is harder to write correct code in it.

Regarding the practical importance of signed arithmetic in avoiding a violation of the principle of least astonishment, a case in point is OpenCV, which uses signed 32-bit integers for matrix element indexes, array sizes, pixel channel counts, etc. Image processing is an example of a programming domain that uses relative array indexes heavily, and unsigned integer underflow (a negative result wrapping around to a huge positive value) would severely complicate algorithm implementations.
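
To make that concrete, here is a minimal sketch in OpenCV's style (plain C++, not actual OpenCV code; box_blur is a hypothetical name) showing why signed indexes help at image borders:

#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch only: a 1-D box blur with clamp-to-edge borders. Because x and dx
// are signed, x + dx can legitimately go negative at the left border and is
// simply clamped; with unsigned indexes, x + dx would silently wrap around.
std::vector<int> box_blur(const std::vector<int>& src)
{
    const int n = static_cast<int>(src.size());
    std::vector<int> dst(src.size());
    for (int x = 0; x < n; ++x) {
        int sum = 0;
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = std::min(std::max(x + dx, 0), n - 1); // clamp to [0, n-1]
            sum += src[static_cast<std::size_t>(nx)];
        }
        dst[static_cast<std::size_t>(x)] = sum / 3;
    }
    return dst;
}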