I am currently working on an application for a low-memory platform that requires an std::set of many short strings (>100,000 strings of 4-16 characters each). I recently transitioned this set from std::string to const char * to save memory and I was wondering whether I was really avoiding all that much overhead per string.
I tried using the following:
std::string sizeTest = "testString";
std::cout << sizeof(sizeTest) << " bytes";
But it just gave me an output of 4 bytes, indicating that the string contains a pointer. I'm well aware that strings store their data in a char * internally, but I thought the string class would have additional overhead.
Does the GCC implementation of std::string incur more overhead than sizeof(std::string) would indicate? More importantly, is it significant over this size of data set?
Here are the sizes of relevant types on my platform (it is 32-bit and has 8 bits per byte):
char: 1 bytes
void *: 4 bytes
char *: 4 bytes
std::string: 4 bytes
Best Answer
Well, at least with GCC 4.4.5, which is what I have handy on this machine,
std::string
is a typdef forstd::basic_string<char>
, andbasic_string
is defined in/usr/include/c++/4.4.5/bits/basic_string.h
. There's a lot of indirection in that file, but what it comes down to is that nonemptystd::string
s store a pointer to one of these:Followed in-memory by the actual string data. So
std::string
is going to have at least three words of overhead for each string, plus any overhead for having a highercapacity
than`length
(probably not, depending on how you construct your strings -- you can check by asking thecapacity()
method).There's also going to be overhead from your memory allocator for doing lots of small allocations; I don't know what GCC uses for C++, but assuming it's similar to the
dlmalloc
allocator it uses for C, that could be at least two words per allocation, plus some space to align the size to a multiple of at least 8 bytes.