C++ STL Data Structures – std::vector Non-Array Implementation

c++, data structures, stl

I've seen some posts on the StackExchange family of sites talking about std::vector implementations. They all seem to indicate that std::vector is implemented strictly as an array (in practice), and that C++ 2003 dictates contiguity of elements – pretty much closing non-array loopholes.

Now I thought that I had read once of a non-array implementation of std::vector – perhaps this predated the 2003 enforcement of contiguity? (Edit: Herb Sutter makes note of this here) I believe it was something like a series of linked arrays with decreasing or increasing sizes under the hood but I can't remember the details. Does anyone know of std::vector implementations (or perhaps, more broadly, non-STL vector implementations) that use a non-array approach like this?

Edit: I'd like to clarify here that the emphasis is less on strict std::vector implementation for C++ and rather more on 1) historical STL implementations prior to C++ 2003 contiguous elements constraints or possibly even 2) "vector" implementations in other languages – that do not use the usual array-like structure. A VList implementation of a vector might be a potential example and I'm looking for others.
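To make the "series of linked arrays with increasing sizes" idea concrete, here is a minimal sketch of one possible non-contiguous layout. It is not taken from any real STL implementation and it is not a true VList; the class name `ChunkedVector` and its members are hypothetical. It stores elements in blocks of size 1, 2, 4, 8, ..., so growing never moves existing elements. It assumes `T` is default-constructible and copy-assignable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch: a "vector" stored as blocks of sizes 1, 2, 4, 8, ...
// Elements never move once written, so growth needs no copying, but the
// storage is not contiguous (so this could not be a conforming std::vector).
template <typename T>
class ChunkedVector {
public:
    void push_back(const T& value) {
        std::size_t block = block_of(size_);
        if (block == blocks_.size()) {
            // Block k holds 2^k elements.
            blocks_.push_back(std::make_unique<T[]>(std::size_t{1} << block));
        }
        blocks_[block][offset_of(size_, block)] = value;
        ++size_;
    }

    T& operator[](std::size_t i) {
        assert(i < size_);
        std::size_t block = block_of(i);
        return blocks_[block][offset_of(i, block)];
    }

    std::size_t size() const { return size_; }
    std::size_t block_count() const { return blocks_.size(); }

private:
    // Index i lives in block floor(log2(i + 1)).
    static std::size_t block_of(std::size_t i) {
        std::size_t block = 0;
        for (std::size_t n = i + 1; n > 1; n >>= 1) ++block;
        return block;
    }

    // Block k starts at global index 2^k - 1.
    static std::size_t offset_of(std::size_t i, std::size_t block) {
        return (i + 1) - (std::size_t{1} << block);
    }

    std::vector<std::unique_ptr<T[]>> blocks_;
    std::size_t size_ = 0;
};
```

The trade-off in this sketch is that indexing needs a logarithm (computed with a simple loop here) plus two pointer hops instead of one, in exchange for element addresses that stay stable as the container grows.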

Best Answer

StackOverflow would be a somewhat better place for asking programming/algorithm related questions. In any case, the implementation you read about was probably based on "tables". Here is how such an implementation works:

Initialize vector with size n, say n = 16

Address: 0xAAA0 to 0xAAB0 Memory reserved

Insert 17 elements. First 16 inserted fine. Next element requires more memory.

STL Library: Allocate memory for 16 * 2 = 32. Copy 16 elements. (Actual time taken = 16 units). Insert the 17th element.

Insert 16 more elements. First 15 inserted fine. Next element requires more space.

STL Library: Allocate memory for 16 * 2 * 2 = 64. Copy 32 elements. (Actual time taken = 32 units). Insert the 33rd element.

Insert 32 more elements. First 31 inserted fine. Next element requires more space.

STL Library: Allocate memory for 16 * 2 * 2 * 2 = 128. Copy 64 elements. (Actual time taken = 64 units). Insert the 65th element.
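The walk-through above can be reproduced with a small sketch. This is not the code of any particular STL; `GrowableArray` and its `copies()` counter are hypothetical names used only for illustration, and the growth factor of 2 is the one assumed in the steps above (real implementations pick their own factor).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical sketch of the doubling strategy described above.
// copies() reports how many element copies reallocation has caused.
class GrowableArray {
public:
    explicit GrowableArray(std::size_t initial_capacity = 16)
        : data_(std::make_unique<int[]>(initial_capacity)),
          capacity_(initial_capacity) {
        assert(initial_capacity > 0);
    }

    void push_back(int value) {
        if (size_ == capacity_) grow();
        data_[size_++] = value;
    }

    int operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    std::size_t copies() const { return copies_; }

private:
    // Allocate a block twice as large and copy every element across.
    void grow() {
        std::size_t new_capacity = capacity_ * 2;
        auto bigger = std::make_unique<int[]>(new_capacity);
        for (std::size_t i = 0; i < size_; ++i) {
            bigger[i] = data_[i];
            ++copies_;
        }
        data_ = std::move(bigger);
        capacity_ = new_capacity;
    }

    std::unique_ptr<int[]> data_;
    std::size_t capacity_;
    std::size_t size_ = 0;
    std::size_t copies_ = 0;
};
```

Starting from the default capacity of 16 and pushing 65 elements, the capacity goes 16, 32, 64, 128 and the copy counter accumulates 16 + 32 + 64 = 112, matching the steps listed above.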

This implementation is O(1) for access and amortized O(1) for insertion at the end. How? Counting one unit per insert and one unit per copied element, and starting from a capacity of 1, the total time over a very large number of inserts would be:

Time = 2^0 (insert) + 2^0 (copy) + 2^1/2 (inserts) + 2^1 (copy) + 2^2/2 (inserts) + 2^2 (copy) + ... + 2^n (copy) + ...

Case 1: total number of inserts = 2^n

Time = 2^0 + 2^0 + 2^0 + 2^1 + 2^1 + 2^2 + 2^2 + ... = 1 + 2*2^0 + 2*2^1 + ... + 2*2^(n-1) = 1 + 2*(2^n - 1)

Average time per insert = about 2 units

Case 2: total number of inserts = 2^n + 1 (the worst case, just after a reallocation)

Time = 1 + 2*(2^n - 1) + 2^n (copy) + 1 (insert) = 3*2^n

Average time per insert = about 3 units
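The totals above can be checked numerically. The following sketch uses the same unit-cost model as the analysis (one unit per insert, one unit per copied element, capacity starting at 1 and doubling when full); the function name `total_cost` is made up for this illustration.

```cpp
#include <cstddef>

// Unit-cost model from the analysis above: each insert costs 1, each
// element copied during a reallocation costs 1, and the capacity starts
// at 1 and doubles whenever the array is full.
std::size_t total_cost(std::size_t inserts) {
    std::size_t capacity = 1;
    std::size_t size = 0;
    std::size_t cost = 0;
    for (std::size_t i = 0; i < inserts; ++i) {
        if (size == capacity) {
            cost += size;   // copy every existing element to the new block
            capacity *= 2;
        }
        ++size;
        ++cost;             // the insert itself
    }
    return cost;
}
```

For n = 10, `total_cost(1024)` comes to 2047 = 1 + 2*(2^10 - 1), just under 2 units per insert, and `total_cost(1025)` comes to 3072 = 3*2^10, just under 3 units per insert. The average stays bounded by a constant either way, which is what amortized O(1) means here.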

It's not a linked list, but I'm pretty sure this is what you read about: 'increasing/decreasing' sizes. Shrinking on deletes works similarly. When the used size drops below 1/4 of the capacity, free half of the memory. If you're using your own memory allocator, it shouldn't be too hard to free only the part you want. But even if you copy the elements over to a smaller block as well, the analysis shows that deletes remain amortized O(1).
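The shrink rule can be sketched in the same style. This is only an illustration under stated assumptions: `ShrinkableArray` and the minimum capacity `kMinCapacity` are hypothetical, and the sketch takes the simple route of copying into a smaller block rather than asking a custom allocator to release the tail.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical sketch: double when full, halve when less than 1/4 is used.
class ShrinkableArray {
public:
    void push_back(int value) {
        if (size_ == capacity_) reallocate(capacity_ * 2);
        data_[size_++] = value;
    }

    void pop_back() {
        assert(size_ > 0);
        --size_;
        // Shrink only below 1/4 usage, and never below the minimum capacity.
        if (capacity_ > kMinCapacity && size_ < capacity_ / 4) {
            reallocate(capacity_ / 2);
        }
    }

    int operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    static constexpr std::size_t kMinCapacity = 16;  // assumed lower bound

    // Move the live elements into a block of the requested capacity.
    void reallocate(std::size_t new_capacity) {
        auto fresh = std::make_unique<int[]>(new_capacity);
        for (std::size_t i = 0; i < size_; ++i) fresh[i] = data_[i];
        data_ = std::move(fresh);
        capacity_ = new_capacity;
    }

    std::unique_ptr<int[]> data_ = std::make_unique<int[]>(kMinCapacity);
    std::size_t capacity_ = kMinCapacity;
    std::size_t size_ = 0;
};
```

Shrinking at 1/4 rather than 1/2 leaves the array half full after each resize, so an alternating run of inserts and deletes at the boundary cannot trigger a reallocation on every operation.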

Related Solutions

C Data Structures – Pure C Vector Implementation

Yes, the design of your Vector is 'wrong'.

Such a vector is usually thought of as a resizeable array, with the associated expectation that, like an array, it directly contains the elements. Your requirement that Vector stores pointers to allocated elements breaks that expectation.
Storing pointers is inefficient if the elements to be stored are small (for example, integers or doubles), because the overhead of dynamic allocation becomes significant (2 to 3 times the data size) and you need to store both the pointer and the actual data, where in the traditional design, you only need to store the few bytes for the actual data. Additionally, in the traditional design, all the elements are in adjacent memory locations, which makes the typical access patterns (looping over the items) much more cache-friendly for the CPU than your design, where the elements are scattered around in memory.
Sometimes it is necessary to conceptually store items in multiple containers. With the conventional design, this can be accommodated by storing pointers in the containers and managing the actual objects yourself. This technique can also be used to limit the amount of copying that needs to be done, if the user of the container thinks that it is a reasonable trade-off. With your design, you would need to dynamically allocate the pointers that get stored in the container, which is far from convenient and completely counter-intuitive for most developers.
Any interface that requires a pointer to allocated memory is prone to misuse, because the compiler can't see that Vector_append(&a) is an error and to a human reviewer it is also not immediately obvious. If you are lucky, this is detected during testing, but in some cases it will go undetected for a very long time.
The fact that Vector_remove() deletes the data item will also not sit nicely in most development circles, because
- Many coding guidelines require that corresponding malloc and free calls should be made from the same module. Your design makes it impossible to keep that requirement.
- For complex structures that contain pointers to allocated memory, it is common practice to create a pair of 'constructor'/'destructor' functions that take care of all the allocations/cleanup in one go, including the 'root' struct. This also does not work correctly with Vector_remove calling free.
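The point about storing items in multiple containers can be illustrated with a short sketch. It is written in C++ rather than C, and `Item`, `heavier_than` and `total_weight` are hypothetical names: one container holds the elements by value, and a second container holds pointers to some of them, with the caller responsible for keeping the pointed-to objects alive.

```cpp
#include <vector>

struct Item {
    int id;
    double weight;
};

// Build a second container that refers to a subset of the elements.
// The pointers stay valid only while `items` is not resized or destroyed.
std::vector<const Item*> heavier_than(const std::vector<Item>& items,
                                      double limit) {
    std::vector<const Item*> view;
    for (const Item& item : items) {
        if (item.weight > limit) view.push_back(&item);
    }
    return view;
}

// Work through the pointers without copying any Item.
double total_weight(const std::vector<const Item*>& view) {
    double sum = 0.0;
    for (const Item* item : view) sum += item->weight;
    return sum;
}
```

Here indirection is opt-in: small elements sit directly in adjacent memory, and pointers appear only where sharing is actually wanted.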

C++ – 2D linked list vs. multidimensional array/vector

I hope it's ok to answer my own question. I believe I have found the optimal (without overcomplicating the problem) data structure for my problem. There was at least minor idiocy on my part for not recognising this earlier. The data doesn't need to be accessed by (x, y, z) but instead by (x, y, range of z (say 0 - 3)). This gives a C++ struct as follows:

struct node {
  struct node *next;
  int zGroup;
  int z;
  char misc[50];  /* placeholder for the 50 bytes of misc data */
};

I can then address this through a 3D dynamic array (vectors):

vector< vector < vector < node* > > > Data;

Any given Data[x][y][zGroup] points to the first element of a linked list, the entirety of which is needed every time one element of it is needed. No value of this array is NULL, every one contains a linked-list of at least one element.
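A minimal sketch of this layout follows. The `node` fields come from the struct above, with `char misc[50]` standing in for the unspecified 50 bytes; the helpers `add_node`, `list_length` and `free_all` and the alias `Grid` are hypothetical. Unlike the fully populated array described above, lists in this sketch start out empty (`nullptr`) until nodes are added.

```cpp
#include <cstddef>
#include <vector>

struct node {
    node* next;
    int zGroup;
    int z;
    char misc[50];  // placeholder for the 50 bytes of misc data
};

using Grid = std::vector<std::vector<std::vector<node*>>>;

// Push a new node onto the front of the list at data[x][y][zGroup].
void add_node(Grid& data, std::size_t x, std::size_t y, int zGroup, int z) {
    node*& head = data[x][y][static_cast<std::size_t>(zGroup)];
    head = new node{head, zGroup, z, {}};
}

// Walk one list; the whole list is visited, as the access pattern requires.
std::size_t list_length(const Grid& data, std::size_t x, std::size_t y,
                        std::size_t zGroup) {
    std::size_t count = 0;
    for (const node* n = data[x][y][zGroup]; n != nullptr; n = n->next) {
        ++count;
    }
    return count;
}

// Release every node and reset the heads, since raw pointers own the lists.
void free_all(Grid& data) {
    for (auto& plane : data) {
        for (auto& line : plane) {
            for (node*& head : line) {
                while (head != nullptr) {
                    node* next = head->next;
                    delete head;
                    head = next;
                }
            }
        }
    }
}
```

A grid can be created with `Grid data(maxX, std::vector<std::vector<node*>>(maxY));` and each `data[x][y]` then resized to its own number of z-groups, which is what makes the third dimension jagged.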

The third dimension of the array, the zGroup, has jagged dimensions, but with dynamic arrays this isn't an issue. Given the data and the computations being performed on it, I know that the max x and y values are set when the file is read and do not change, and neither does the number of z-groups on any given (x, y) line. The actual z-values of nodes may change, but they will remain inside the same z-groups, giving a constant-sized, fully populated array.

With the way that the file is structured it is also easy enough to page it in and out of memory if I am brought to do this with much larger data sets.
