The STL does tend to cause this sort of thing. The spec doesn't allow memcpy'ing because that doesn't work in all cases. There's a document describing EASTL, a bunch of alterations made by EA to make it more suitable for their purposes, which does have a method of declaring that a type is safe to memcpy. Unfortunately it's not open source AFAIK so we can't play with it.
IIRC Dinkumware STL (the one in VS) grows vectors by 50% each time.
However, doing a series of push_back's on a vector is a common inefficiency. You can either use reserve to alleviate it (at the cost of possibly wasting memory if you overestimate significantly) or use a different container - deque performs better for a series of insertions like that but is a little slower in random access, which may/may not be a good tradeoff for you.
Or you could look at storing pointers instead of values which will make the resizing much cheaper if you're storing large elements. If you're storing large objects this will always win because you don't have to copy them ever - you'll always save that one copy for each item on insertion at least.
I agree with R. Pate and Todd Gardner; a std::set
might be a good idea here. Even if you're stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.
Let's compare three approaches:
Just using vector, sort + unique
sort( vec.begin(), vec.end() );
vec.erase( unique( vec.begin(), vec.end() ), vec.end() );
Convert to set (manually)
set<int> s;
unsigned size = vec.size();
for( unsigned i = 0; i < size; ++i ) s.insert( vec[i] );
vec.assign( s.begin(), s.end() );
Convert to set (using a constructor)
set<int> s( vec.begin(), vec.end() );
vec.assign( s.begin(), s.end() );
Here's how these perform as the number of duplicates changes:
Summary: when the number of duplicates is large enough, it's actually faster to convert to a set and then dump the data back into a vector.
And for some reason, doing the set conversion manually seems to be faster than using the set constructor -- at least on the toy random data that I used.
Best Answer
They aren't the same though, are they? One is a copy, the other is a swap. Hence the function names.
My favourite is:
Where
a
andb
are vectors.